
LCP: A Low-Communication Parallelization Method for Fast Neural Network Inference for IoT

  • Ramyad Hadidi
  • Bahar Asgari
  • Jiashen Cao
  • Younmin Bae
  • Da Eun Shim
  • Hyojong Kim
  • Sung Kyu Lim
  • Michael S. Ryoo
  • Hyesoon Kim
  • Rain AI
  • University of Maryland, College Park
  • Georgia Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Deep neural networks (DNNs) have stimulated research in diverse edge applications, including robotics and Internet-of-Things (IoT) devices. However, IoT-based DNN inference poses significant challenges due to resource constraints. Further, because communication is costly, taking advantage of other available IoT devices through data- or model-parallelism methods is not an effective solution. We introduce a low-communication parallelization (LCP) method to minimize communication overhead in distributed IoT systems. LCP models consist of multiple, largely independent, narrow branches, providing enhanced distribution and parallelization opportunities while reducing memory and computational requirements. Implemented on AWS instances, Raspberry Pis, and PYNQ boards, as well as a customized 16 mW, 0.107 mm² ASIC chip at 7 nm, LCP models yield maximum and average speedups of 56x and 7x, respectively, compared to the original models. These speedups could be improved further by incorporating common optimizations such as pruning and quantization.

Original language: English
Title of host publication: Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1670-1677
Number of pages: 8
ISBN (Electronic): 9798350327595
State: Published - 2023
Event: 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023 - Las Vegas, United States
Duration: Jul 24 2023 to Jul 27 2023

Publication series

Name: Proceedings - 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023

Conference

Conference: 2023 Congress in Computer Science, Computer Engineering, and Applied Computing, CSCE 2023
Country/Territory: United States
City: Las Vegas
Period: 07/24/23 to 07/27/23

Keywords

  • Distributed
  • DNN
  • FPGA
  • Inference
  • IoT
  • Parallel
