
Efficient learning rate adaptation based on hierarchical optimization approach.


Abstract

This paper proposes a new hierarchical approach to learning rate adaptation in gradient methods, called learning rate optimization (LRO). LRO formulates the learning rate adaptation problem as a hierarchical optimization problem that minimizes the loss function with respect to the learning rate, given the current model parameters and gradients. LRO then optimizes the learning rate using the alternating direction method of multipliers (ADMM). This learning rate optimization requires neither second-order information nor a probabilistic model, so it is highly efficient. Furthermore, LRO introduces no additional hyperparameters compared to the vanilla gradient method with simple exponential learning rate decay. In the experiments, we integrated LRO with vanilla SGD and Adam, and compared their optimization performance with state-of-the-art learning rate adaptation methods as well as the most commonly used adaptive gradient methods. SGD and Adam with LRO outperformed all competitors on benchmark image classification datasets.

Copyright © 2022 The Author(s). Published by Elsevier Ltd. All rights reserved.
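To make the hierarchical idea concrete, the sketch below shows the two levels on a toy quadratic problem: an inner step that picks the learning rate minimizing the loss along the current negative-gradient direction, and an outer step that applies the resulting SGD update. This is a minimal illustration only; the `choose_lr` grid search, the candidate range, and the toy loss are our own assumptions, standing in for the paper's ADMM-based solve of the inner subproblem.

```python
import numpy as np

# Toy quadratic loss: L(theta) = 0.5 * theta^T A theta - b^T theta
A = np.array([[3.0, 0.2], [0.2, 1.0]])
b = np.array([1.0, -2.0])

def loss(theta):
    return 0.5 * theta @ A @ theta - b @ theta

def grad(theta):
    return A @ theta - b

def choose_lr(theta, g, candidates):
    # Inner (lower-level) problem: pick the learning rate that minimizes
    # the loss along the negative-gradient direction for the CURRENT
    # parameters and gradient. Here a simple grid search stands in for
    # the ADMM-based solve described in the paper.
    losses = [loss(theta - eta * g) for eta in candidates]
    return candidates[int(np.argmin(losses))]

theta = np.array([5.0, 5.0])
candidates = np.geomspace(1e-4, 1.0, 20)   # hypothetical search grid
for step in range(50):
    g = grad(theta)
    eta = choose_lr(theta, g, candidates)  # adapt learning rate per step
    theta = theta - eta * g                # outer (upper-level) SGD update

print("final loss:", loss(theta), "theta:", theta)
```

In this simplified setting the adapted step size shrinks automatically as the iterates approach the minimum, without any decay schedule or extra hyperparameters beyond the candidate range.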
