Two-Stage Coded Distributed Learning: A Dynamic Partial Gradient Coding Perspective

  • Xinghan Wang
  • Xiaoxiong Zhong
  • Jiahong Ning
  • Tingting Yang
  • Yuanyuan Yang
  • Guoming Tang
  • Fangming Liu
  • Southeast University, Nanjing
  • Peng Cheng Laboratory
  • Dalian Maritime University
  • National University of Defense Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Distributed learning has been widely adopted to train a global model from local data. However, its performance can be severely degraded by stragglers. Recent research has addressed the straggler problem with gradient coding, whose essence is to tolerate stragglers by adding data redundancy. However, the large amount of data redundancy, together with the computation and communication overhead it brings, remains hard to resolve. In addition, the complexity of encoding and decoding increases linearly with the number of local workers. To this end, we design a lightweight coding method for the computing phase and seek to ensure fair transmission in the communication phase. Specifically, to tolerate stragglers in the computing phase, we propose a two-stage dynamic coding scheme: in the first stage, part of the workers compute partial gradients from their assigned data partitions, and the remaining workers that compute in the second stage are decided based on which workers have finished in the first stage. To further tolerate stragglers in the communication phase, we design a perturbed Lyapunov function that maximizes admission data balancing fairness as well as throughput. The experimental results verify the derived properties and demonstrate that the proposed solution achieves better accuracy and resource utilization in a distributed learning system under practical network parameters and benchmark data.
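
The two-stage idea in the abstract can be illustrated with a toy round. The sketch below is not the paper's coding scheme: the scalar least-squares model, the one-partition-per-worker layout, and the names `partial_gradient`, `two_stage_round`, `finished_stage1`, and `standby_workers` are all assumptions made for illustration. It only shows the dynamic part, namely that the second-stage assignment is decided after observing which first-stage workers finished.

```python
def partial_gradient(w, partition):
    """Gradient of 0.5 * (w*x - y)**2 summed over one partition (toy scalar model)."""
    return sum((w * x - y) * x for x, y in partition)


def two_stage_round(w, partitions, finished_stage1, standby_workers):
    """Run one toy two-stage round.

    Assumption: stage-1 worker i holds partition i.
    finished_stage1: indices of stage-1 workers that returned in time.
    standby_workers: hypothetical IDs of workers held back for stage 2.
    Returns (summed gradient, stage-2 assignment, uncovered partitions).
    """
    # Stage 1: keep the partial gradients of workers that finished.
    grads = {i: partial_gradient(w, partitions[i])
             for i in range(len(partitions)) if i in finished_stage1}

    # Stage 2: assign only the partitions that stage 1 left uncovered.
    missing = [i for i in range(len(partitions)) if i not in grads]
    assignment = dict(zip(standby_workers, missing))
    for worker, i in assignment.items():
        grads[i] = partial_gradient(w, partitions[i])

    # Partitions that no worker covered (too few standby workers).
    uncovered = [i for i in missing if i not in grads]
    return sum(grads.values()), assignment, uncovered


# Four single-sample partitions; stage-1 workers 1 and 3 straggle.
partitions = [[(1.0, 2.0)], [(2.0, 1.0)], [(3.0, 3.0)], [(1.0, 0.0)]]
total, assignment, uncovered = two_stage_round(
    w=1.0, partitions=partitions,
    finished_stage1={0, 2}, standby_workers=["s0", "s1"])
print(total, assignment, uncovered)
```

In this toy setting the two standby workers pick up exactly the partitions of the two stragglers, so the summed gradient matches the full-data gradient without every partition being replicated up front.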

Original language: English
Title of host publication: Proceedings - 2023 IEEE 43rd International Conference on Distributed Computing Systems, ICDCS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 942-952
Number of pages: 11
ISBN (Electronic): 9798350339864
DOIs
State: Published - 2023
Event: 43rd IEEE International Conference on Distributed Computing Systems, ICDCS 2023 - Hong Kong, China
Duration: Jul 18 2023 to Jul 21 2023

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2023-July

Conference

Conference: 43rd IEEE International Conference on Distributed Computing Systems, ICDCS 2023
Country/Territory: China
City: Hong Kong
Period: 07/18/23 to 07/21/23

Keywords

  • Distributed learning (DL)
  • dynamic coding scheme
  • two-stage
