Achieving small-batch accuracy with large-batch scalability via Hessian-aware learning rate adjustment.


Abstract

We consider synchronous data-parallel neural network training with a fixed large batch size. While a large batch size provides a high degree of parallelism, it degrades generalization performance due to the low gradient noise scale. We propose a general learning rate adjustment framework, together with three critical heuristics, that tackles this poor generalization. The key idea is to adjust the learning rate based on geometric information about the loss landscape, encouraging the model to converge to a flat minimum, which is known to generalize better to unseen data. Our empirical study demonstrates that the Hessian-aware learning rate schedule remarkably improves generalization performance in large-batch training. For CIFAR-10 classification with ResNet20, our method achieves 92.31% accuracy with a batch size of 16,384, close to the 92.83% achieved with a batch size of 128, at negligible extra computational cost.

Copyright © 2022 Elsevier Ltd. All rights reserved.
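The abstract does not spell out the update rule, but the core idea of damping the step size where the loss surface is sharp can be illustrated with a minimal sketch. Below, the top Hessian eigenvalue is estimated by power iteration using Hessian-vector products, and the learning rate is scaled down when that curvature estimate is large. The function names, the scaling rule, and the `ref_sharpness` parameter are illustrative assumptions, not the authors' published heuristics.

```python
import torch


def hessian_top_eigenvalue(loss, params, iters=10):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    by power iteration with Hessian-vector products (no explicit Hessian)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    eigenvalue = 0.0
    for _ in range(iters):
        # Normalize the probe direction.
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / (norm + 1e-12) for u in v]
        # Hessian-vector product: differentiate (grad . v) w.r.t. params.
        grad_dot_v = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(grad_dot_v, params, retain_graph=True)
        # Rayleigh quotient v^T H v gives the current eigenvalue estimate.
        eigenvalue = sum((h * u).sum() for h, u in zip(hv, v)).item()
        v = [h.detach() for h in hv]
    return eigenvalue


def curvature_scaled_lr(base_lr, sharpness, ref_sharpness=1.0):
    # Hypothetical scaling rule: shrink the step when local curvature is
    # large so the optimizer favors flatter regions of the loss surface.
    return base_lr * min(1.0, ref_sharpness / max(sharpness, 1e-12))


# Usage sketch (illustrative): measure curvature on one batch, then adapt
# the learning rate before the next optimization step.
model = torch.nn.Linear(10, 2)
x, y = torch.randn(512, 10), torch.randint(0, 2, (512,))
loss = torch.nn.functional.cross_entropy(model(x), y)
sharpness = hessian_top_eigenvalue(loss, list(model.parameters()))
lr = curvature_scaled_lr(0.1, sharpness)
```

Because the Hessian-vector products reuse the existing backward graph, the curvature probe adds only a few extra backward passes per adjustment, consistent with the paper's claim of negligible extra computational cost.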
