DPro-SM – A distributed framework for proactive straggler mitigation using LSTM.

January 3, 2024 Distributed Computing, Machine Learning

Researchers

Aswathy Ravikumar, Harini Sriraman

Journal

Heliyon

Modalities

Models

LSTM

Abstract

Distributed Deep Learning (DDL) is a recent advance in deep learning, driven by the growth of big data and high-performance computing. The rapid rise in data volume and network complexity has led to significant growth in DDL. Distributing the network across nodes requires heavy communication and computation among them, which increases training time and lowers accuracy. The primary cause of communication delay is the presence of straggler nodes, which create a communication bottleneck. Because of the enormous volume of parameters transferred, data parallelism in DDL incurs substantial communication costs. Newer model-parallel methods may reduce the communication effort, but they introduce load imbalance and severe straggler issues. This work proposes DPro-SM, a distributed framework for proactive straggler mitigation using LSTM in distributed deep learning. DPro-SM uses an LSTM to predict the completion time of each worker and proactively allocates resources to reduce the overall training time. The results show that DPro-SM can significantly reduce training time and improve the scalability and efficiency of large-scale machine learning tasks. © 2023 The Authors. Published by Elsevier Ltd.
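The predict-then-reallocate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the LSTM architecture or the allocation policy, so the predictor below is a plainly swapped-in exponentially weighted moving average standing in for the LSTM, and the function names (`predict_next_time`, `flag_stragglers`, `rebalance`), the `slack` threshold, and the proportional batch split are all assumptions made for illustration.

```python
from typing import Dict, List


def predict_next_time(history: List[float], alpha: float = 0.5) -> float:
    """Stand-in predictor (assumption): exponentially weighted moving average
    of past per-iteration times. DPro-SM uses an LSTM here; this is not it."""
    if not history:
        raise ValueError("history must contain at least one measurement")
    estimate = history[0]
    for t in history[1:]:
        estimate = alpha * t + (1 - alpha) * estimate
    return estimate


def flag_stragglers(predicted: Dict[str, float], slack: float = 1.5) -> List[str]:
    """Flag workers whose predicted time exceeds slack * median (hypothetical rule)."""
    times = sorted(predicted.values())
    mid = len(times) // 2
    median = times[mid] if len(times) % 2 else (times[mid - 1] + times[mid]) / 2
    return sorted(w for w, t in predicted.items() if t > slack * median)


def rebalance(predicted: Dict[str, float], total_batch: int) -> Dict[str, int]:
    """Split total_batch across workers in proportion to predicted speed (1 / time)."""
    speeds = {w: 1.0 / t for w, t in predicted.items()}
    total_speed = sum(speeds.values())
    shares = {w: int(total_batch * s / total_speed) for w, s in speeds.items()}
    # Rounding down can leave a few samples unassigned; give them to the fastest workers.
    leftover = total_batch - sum(shares.values())
    for w in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[w] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical per-iteration times (seconds) observed on three workers.
    histories = {
        "worker-0": [1.0, 1.1, 1.0],
        "worker-1": [1.0, 1.0, 1.2],
        "worker-2": [1.0, 2.0, 3.0],  # slowing down: a likely straggler
    }
    predicted = {w: predict_next_time(h) for w, h in histories.items()}
    print("predicted stragglers:", flag_stragglers(predicted))
    print("next batch shares:", rebalance(predicted, total_batch=256))
```

The mitigation is proactive because the batch shares are adjusted before the slow worker delays the next synchronization step, rather than after a timeout is observed. Replacing `predict_next_time` with a trained sequence model would leave the rest of the sketch unchanged.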
