Scalable and Practical Natural Gradient for Large-Scale Deep Learning.

Journal

IEEE transactions on pattern analysis and machine intelligence

Abstract

Large-scale distributed training of deep neural networks yields models that generalize worse because the effective mini-batch size increases. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or by ad hoc modifications of batch normalization. We propose Scalable and Practical Natural Gradient Descent (SP-NGD), a principled approach for training models that allows them to attain generalization performance similar to that of models trained with first-order optimization methods, but with accelerated convergence. Furthermore, SP-NGD scales to large mini-batch sizes with negligible computational overhead compared to first-order methods. We evaluated SP-NGD on a benchmark task where highly optimized first-order methods are available as references: training a ResNet-50 model for image classification on ImageNet. We demonstrate convergence to a top-1 validation accuracy of 75.4% in 5.5 minutes using a mini-batch size of 32,768 with 1,024 GPUs, as well as an accuracy of 74.9% with an extremely large mini-batch size of 131,072 in 873 steps of SP-NGD.
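The abstract does not spell out the update rule, so the following is only a minimal sketch of a generic damped natural gradient step, not the SP-NGD implementation. It preconditions the gradient with the inverse of a Fisher information matrix estimated from per-example gradients. The function names, the empirical Fisher estimate, and the `damping` parameter are assumptions made for illustration. Forming and solving with a dense Fisher matrix, as done here, is feasible only for tiny models, and the abstract does not say how SP-NGD keeps this cost low.

```python
import numpy as np


def empirical_fisher(per_example_grads):
    """Estimate the Fisher matrix as the mean outer product of per-example gradients.

    Illustrative assumption: this is the "empirical Fisher", one common estimate.
    """
    g = np.asarray(per_example_grads, dtype=float)  # shape (n_examples, n_params)
    return g.T @ g / g.shape[0]


def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-3):
    """Return theta - lr * (F + damping * I)^{-1} grad.

    The damping term (a hypothetical default here) keeps the solve
    well-conditioned when the Fisher estimate is singular or nearly so.
    """
    damped = fisher + damping * np.eye(fisher.shape[0])
    # Solve the linear system instead of forming an explicit inverse.
    direction = np.linalg.solve(damped, grad)
    return theta - lr * direction
```

With an identity Fisher matrix and zero damping, this step reduces to plain gradient descent. A non-identity Fisher matrix rescales the step according to the curvature estimate in each direction.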
