
Taming the noisy gradient: Train deep neural networks with small batch sizes

  • Yikai Zhang
  • Hui Qu
  • Chao Chen
  • Dimitris Metaxas
  • Rutgers - The State University of New Jersey, New Brunswick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Scopus citations

Abstract

Deep learning architectures are usually proposed with millions of parameters, which creates a memory bottleneck when training deep neural networks with stochastic gradient descent (SGD)-type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions because of the large variance of stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches and noisy gradients. During optimization, our method iteratively applies a proximal-type regularizer to make the loss function strongly convex. This regularizer stabilizes the gradient, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to that of vanilla SGD, even with small batch sizes. Our framework is simple to implement and can potentially be combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
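The proximal idea in the abstract can be illustrated with a minimal sketch, assuming a generic stochastic proximal-point loop on a hypothetical toy least-squares problem. It is not the authors' algorithm or code (their implementation is in the repository linked above). The names `make_data`, `minibatch_grad`, and `prox_sgd`, and the hyperparameters `gamma`, `lr`, `outer_steps`, `inner_steps`, and `batch_size`, are illustrative assumptions. At each outer step, the current weights are frozen as an anchor θ̃, and small-batch SGD is run on f(θ) + (γ/2)‖θ − θ̃‖². The added quadratic raises the curvature by γ in every direction, which is what makes the subproblem strongly convex whenever the curvature of f is bounded below by more than −γ.

```python
import random


def make_data(n=200, seed=0):
    # Hypothetical toy problem: y = 3x - 1 plus small Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x - 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return list(zip(xs, ys))


def minibatch_grad(w, b, batch):
    # Gradient of the mean of 0.5 * (w*x + b - y)^2 over a (small) batch.
    gw = gb = 0.0
    for x, y in batch:
        r = w * x + b - y
        gw += r * x
        gb += r
    m = len(batch)
    return gw / m, gb / m


def prox_sgd(data, gamma=1.0, lr=0.1, outer_steps=50, inner_steps=20,
             batch_size=2, seed=0):
    # Stochastic proximal-point sketch (an assumption, not the paper's method):
    # each outer step approximately minimizes
    #   f(theta) + (gamma / 2) * ||theta - anchor||^2
    # with plain SGD on tiny batches.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(outer_steps):
        aw, ab = w, b  # anchor point, held fixed during the inner loop
        for _ in range(inner_steps):
            batch = rng.sample(data, batch_size)
            gw, gb = minibatch_grad(w, b, batch)
            # Gradient of the proximal term pulls the iterate toward the anchor.
            gw += gamma * (w - aw)
            gb += gamma * (b - ab)
            w -= lr * gw
            b -= lr * gb
    return w, b
```

With a batch size of only 2, `w, b = prox_sgd(make_data())` should land close to the true parameters (3, −1). Larger `gamma` keeps each inner loop closer to its anchor, trading slower progress per outer step for less gradient-noise-driven drift.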

Original language: English
Title of host publication: Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Editors: Sarit Kraus
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 4348-4354
Number of pages: 7
ISBN (Electronic): 9780999241141
State: Published - 2019
Event: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019 - Macao, China
Duration: Aug 10 2019 to Aug 16 2019

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2019-August
ISSN (Print): 1045-0823

Conference

Conference: 28th International Joint Conference on Artificial Intelligence, IJCAI 2019
Country/Territory: China
City: Macao
Period: 08/10/19 to 08/16/19

