Hierarchical Weight Averaging for Deep Neural Networks


Abstract

Despite their simplicity, stochastic gradient descent (SGD)-like algorithms are remarkably successful at training deep neural networks (DNNs). Among the many attempts to improve SGD, weight averaging (WA), which averages the weights of multiple models, has recently received much attention in the literature. Broadly, WA falls into two categories: 1) online WA, which averages the weights of multiple models trained in parallel and is designed to reduce the gradient communication overhead of parallel mini-batch SGD, and 2) offline WA, which averages the weights of a single model at different checkpoints and is typically used to improve the generalization ability of DNNs. Although online and offline WA are similar in form, they are seldom associated with each other, and existing methods perform either online or offline parameter averaging, but not both. In this work, we take a first step toward unifying online and offline WA in a general training framework termed hierarchical WA (HWA). By leveraging both averaging schemes, HWA achieves faster convergence and superior generalization performance without any elaborate learning-rate adjustment. We also empirically analyze the issues faced by existing WA methods and show how HWA addresses them. Finally, extensive experiments verify that HWA significantly outperforms state-of-the-art methods.
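To make the two averaging levels described above concrete, here is a minimal NumPy sketch of the general idea: workers run local SGD steps, their weights are averaged and broadcast every few steps (online WA), and a running average of those synchronized checkpoints is maintained on the side (offline WA). The toy quadratic loss, the schedule constants, and all variable names are illustrative assumptions, not the paper's exact HWA algorithm.

```python
import numpy as np

# Illustrative sketch only: toy loss and constants are assumptions.
rng = np.random.default_rng(0)
dim, n_workers = 10, 4
lr, sync_every, total_steps = 0.1, 5, 100

# Toy quadratic loss L(w) = 0.5 * ||w - w_star||^2 with a noisy gradient.
w_star = rng.normal(size=dim)

def grad(w):
    return w - w_star + 0.01 * rng.normal(size=dim)

# All workers start from the same initial weights.
init = rng.normal(size=dim)
workers = [init.copy() for _ in range(n_workers)]

offline_avg = None   # running (offline) average of synchronized checkpoints
n_syncs = 0

for step in range(1, total_steps + 1):
    # Local SGD step on every worker (no per-step gradient communication).
    for i in range(n_workers):
        workers[i] -= lr * grad(workers[i])

    if step % sync_every == 0:
        # Online WA: average weights across workers and broadcast back.
        synced = np.mean(workers, axis=0)
        workers = [synced.copy() for _ in range(n_workers)]

        # Offline WA: running average over the synchronized checkpoints.
        n_syncs += 1
        if offline_avg is None:
            offline_avg = synced.copy()
        else:
            offline_avg += (synced - offline_avg) / n_syncs

print("final synced loss :", 0.5 * np.sum((np.mean(workers, axis=0) - w_star) ** 2))
print("offline-avg loss  :", 0.5 * np.sum((offline_avg - w_star) ** 2))
```

In this sketch the online average plays the role of communication-efficient parallel training (weights, not gradients, are exchanged every `sync_every` steps), while the offline running average acts as the checkpoint-level smoothing that typically improves generalization; HWA's contribution, per the abstract, is combining the two levels in one training framework.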
