Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces.
Abstract
The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. On the other hand, training neural networks without over-parameterization faces many practical problems, e.g., being trapped in local optima. Although techniques such as pruning and distillation have been developed, as backward selection methods they require fully training a dense network, which is expensive, and forward selection methods for learning structural sparsity in deep networks remain largely unexplored. To fill this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces. Specifically, by coupling a pair of parameters, our method generates a family of models from simple to complex ones along the dynamics, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This differential inclusion scheme admits a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks can be established under the Kurdyka-Łojasiewicz framework. In particular, we explore several applications of DessiLBI, including finding sparse structures of networks directly via the coupled structural parameter and growing networks progressively from simple to complex ones.
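The following is a minimal sketch, in NumPy, of the kind of coupled update the abstract describes: the dense weights W follow (scaled) gradient descent on an augmented loss, while an auxiliary variable Z integrates the inverse-scale-space dynamics and its sparse projection Gamma tracks the structural parameter. The function names, the hyperparameters (alpha, kappa, nu), and the choice of an l1 penalty with a soft-threshold proximal map are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def soft_threshold(z, thresh=1.0):
    """Proximal map of the l1 penalty: shrink each entry toward zero (assumed penalty)."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def dessilbi_step(W, Gamma, Z, grad_loss_W, alpha=0.1, kappa=1.0, nu=10.0):
    """One sketched coupled update on a single weight tensor (hypothetical helper).

    W           : dense network weights (over-parameterized path)
    Gamma       : coupled structural-sparsity parameter (sparse path)
    Z           : auxiliary variable integrating the inverse-scale-space dynamics
    grad_loss_W : gradient of the data loss with respect to W
    """
    # Gradients of an augmented loss of the form  L(W) + ||W - Gamma||^2 / (2 * nu)
    grad_W = grad_loss_W + (W - Gamma) / nu
    grad_Gamma = (Gamma - W) / nu

    # Coupled updates: W descends the augmented loss, Z accumulates the dynamics,
    # and Gamma is recovered as the sparse (soft-thresholded) projection of Z.
    W_new = W - kappa * alpha * grad_W
    Z_new = Z - alpha * grad_Gamma
    Gamma_new = kappa * soft_threshold(Z_new)
    return W_new, Gamma_new, Z_new
```

Read this way, entries of Gamma stay exactly zero until their accumulated evidence in Z exceeds the threshold, which is how a family of models from simple (very sparse Gamma) to complex (dense Gamma approaching W) emerges along the iteration path.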