Straggler-Resilient Decentralized Learning via Adaptive Asynchronous Updates

  • Guojun Xiong
  • Gang Yan
  • Shiqiang Wang
  • Jian Li
  • Stony Brook University
  • University of California Merced
  • IBM

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

With the increasing demand for large-scale training of machine learning models, fully decentralized optimization methods have recently been advocated as alternatives to the popular parameter server framework. In this paradigm, each worker maintains a local estimate of the optimal parameter vector and iteratively updates it by waiting for and averaging the estimates of all its neighbors, then correcting the result on the basis of its local dataset. However, this synchronization phase is sensitive to stragglers, since every worker must wait for its slowest neighbor. An efficient way to mitigate this effect is to use asynchronous updates, where each worker computes stochastic gradients and communicates with other workers at its own pace. Unfortunately, fully asynchronous updates suffer from the staleness of stragglers’ parameters. To address these limitations, we propose DSGD-AAU, a fully decentralized algorithm with adaptive asynchronous updates that adaptively determines the number of neighbor workers each worker communicates with. We show that DSGD-AAU achieves a linear speedup for convergence (i.e., convergence performance increases linearly with respect to the number of workers). Experimental results on a suite of datasets and deep neural network models are provided to verify our theoretical results.
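The synchronous paradigm described in the abstract (wait for all neighbors, average, then apply a local correction) can be illustrated with a minimal sketch. This is the baseline that the paper improves on, not DSGD-AAU itself. Everything concrete here is an assumption made for illustration: the ring topology, the scalar quadratic loss 0.5 * (x - target_i)^2 per worker, the Gaussian gradient noise, and the hypothetical names `ring_neighbors`, `dsgd_round`, and `run_dsgd`.

```python
import random


def ring_neighbors(i, n):
    """Indices of worker i's two neighbors on a ring of n workers (assumed topology)."""
    return [(i - 1) % n, (i + 1) % n]


def dsgd_round(estimates, targets, lr, rng, noise=0.1):
    """One synchronous round of decentralized SGD on a toy problem.

    Worker i holds a scalar estimate and a local loss 0.5 * (x - targets[i])**2,
    which stands in for "its local dataset" (an illustrative assumption).
    """
    n = len(estimates)
    new_estimates = []
    for i in range(n):
        # Synchronization phase: worker i needs the estimates of ALL its
        # neighbors, so in a real system it would block on the slowest one.
        group = [estimates[i]] + [estimates[j] for j in ring_neighbors(i, n)]
        averaged = sum(group) / len(group)
        # Local correction: a noisy (stochastic) gradient of the local loss,
        # evaluated at the worker's current estimate.
        grad = (estimates[i] - targets[i]) + rng.gauss(0.0, noise)
        new_estimates.append(averaged - lr * grad)
    return new_estimates


def run_dsgd(targets, rounds=500, lr=0.05, seed=0):
    """Run several synchronous rounds starting from all-zero estimates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(targets)
    for _ in range(rounds):
        estimates = dsgd_round(estimates, targets, lr, rng)
    return estimates


if __name__ == "__main__":
    targets = [float(i) for i in range(8)]
    final = run_dsgd(targets)
    print("network average:", sum(final) / len(final))
    print("global optimum :", sum(targets) / len(targets))
```

In this toy setting the network-wide average of the estimates drifts toward the minimizer of the summed losses (the mean of the targets), while individual workers stay slightly apart because the step size is constant. The blocking read of every neighbor's estimate inside `dsgd_round` is the step that stragglers slow down in practice.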

Original language: English
Title of host publication: MobiHoc 2024 - Proceedings of the 2024 International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing
Publisher: Association for Computing Machinery
Pages: 434-439
Number of pages: 6
ISBN (Electronic): 9798400705212
DOIs
State: Published - Oct 1 2024
Event: 2024 International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, MobiHoc 2024 - Athens, Greece
Duration: Oct 14 2024 to Oct 17 2024

Publication series

Name: Proceedings of the International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc)

Conference

Conference: 2024 International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, MobiHoc 2024
Country/Territory: Greece
City: Athens
Period: 10/14/24 to 10/17/24

Keywords

  • Asynchronous Updates
  • Decentralized Learning
  • Stragglers
