Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces

Abstract

The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading generalization ability. On the other hand, training neural networks without over-parameterization faces many practical problems, e.g., being trapped in local optima. Although techniques such as pruning and distillation have been developed, as backward selection methods they are expensive, since they require fully training a dense network first, and there is still a lack of systematic exploration of forward selection methods for learning structural sparsity in deep networks. To fill this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces. Specifically, our method generates a family of models, from simple to complex ones, along the dynamics by coupling a pair of parameters, so that over-parameterized deep models and their structural sparsity can be explored simultaneously. This differential inclusion scheme admits a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks can be established under the Kurdyka-Łojasiewicz framework. In particular, we explore several applications of DessiLBI, including finding sparse structures of networks directly via the coupled structural parameter and growing networks progressively from simple to complex ones.
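
To make the coupled dynamics more concrete, below is a minimal, hypothetical sketch of one discretized update step in the spirit of a split Linearized Bregman Iteration, written in PyTorch. The augmented-loss coupling term, the hyperparameter names (kappa, alpha, nu), and the choice of an ℓ1 penalty with soft thresholding are illustrative assumptions and are not claimed to reproduce the paper's exact algorithm.

```python
import torch

def dessilbi_style_step(W, Gamma, Z, grad_fn, kappa=1.0, alpha=0.1, nu=100.0):
    """One sketch step of a coupled split-LBI-style update (illustrative only).

    W       : dense network weights (torch.Tensor)
    Gamma   : coupled structural parameter, same shape as W, kept sparse
    Z       : auxiliary variable accumulating evidence for Gamma
    grad_fn : callable returning dL/dW of the task loss L(W)
    """
    # Assumed augmented loss: L(W) + ||W - Gamma||^2 / (2 * nu)
    grad_W = grad_fn(W) + (W - Gamma) / nu       # gradient w.r.t. W
    grad_Gamma = (Gamma - W) / nu                # gradient w.r.t. Gamma

    # Dense weights follow (scaled) gradient descent on the augmented loss
    W = W - kappa * alpha * grad_W
    # Auxiliary variable follows a Linearized Bregman-style update
    Z = Z - alpha * grad_Gamma
    # Soft thresholding (prox of the l1 penalty) keeps Gamma sparse
    Gamma = kappa * torch.sign(Z) * torch.clamp(Z.abs() - 1.0, min=0.0)
    return W, Gamma, Z
```

The intuition illustrated here is that the dense weights W continue to be trained essentially as usual, while the coupled parameter Gamma stays at zero until Z accumulates enough evidence for a structure to enter the model, which is what produces the simple-to-complex family of models along the dynamics described above.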
