Self-Growing Binary Activation Network: A Novel Deep Learning Model With Dynamic Architecture.


Abstract

For a deep learning model, the network architecture is crucial, as a model with an inappropriate architecture often suffers from performance degradation or parameter redundancy. However, finding an appropriate architecture for a given application is largely empirical and difficult. To tackle this problem, we propose a novel deep learning model with a dynamic architecture, named the self-growing binary activation network (SGBAN), which extends the design of a fully connected network (FCN) progressively, resulting in a more compact architecture with higher performance on a given task. This construction process is more efficient than neural architecture search methods, which train a large number of networks to search for the optimal one. Concretely, the training technique of SGBAN is based on function-preserving transformations that expand the architecture and incorporate the information in new data without neglecting the knowledge learned in previous steps. Experimental results on four different classification tasks, i.e., Iris, MNIST, CIFAR-10, and CIFAR-100, demonstrate the effectiveness of SGBAN. On the one hand, SGBAN achieves competitive accuracy compared with an FCN of the same architecture, which indicates that the new training technique has optimization ability equivalent to that of traditional optimization methods. On the other hand, the architecture generated by SGBAN achieves a 0.59% accuracy improvement with only 33.44% of the parameters when compared with FCNs composed of manually designed architectures, i.e., 500 + 150 hidden units, on MNIST. Furthermore, we demonstrate that replacing the fully connected layers of a well-trained VGG-19 with SGBAN yields slightly improved performance with less than 1% of the parameters on all these tasks. Finally, we show that the proposed method can handle incremental learning tasks and outperforms three prominent incremental learning methods, i.e., learning without forgetting, elastic weight consolidation, and gradient episodic memory, on incremental learning tasks on both Disjoint MNIST and Disjoint CIFAR-10.
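As a rough illustration of what a function-preserving expansion can look like (a generic sketch, not the authors' SGBAN algorithm), the example below widens the hidden layer of a small network with a binary activation by adding units whose outgoing weights are zero, so the grown network computes exactly the same function until the new units are trained. The activation choice, layer sizes, and helper names here are illustrative assumptions.

```python
# Minimal sketch of a function-preserving widening step for a fully connected
# layer: new hidden units are appended with zero outgoing weights, so the
# expanded network's output is unchanged before further training.
import numpy as np

def binary_act(x):
    # Illustrative binary (sign) activation; SGBAN's exact activation may differ.
    return np.where(x >= 0.0, 1.0, -1.0)

def forward(x, W1, b1, W2, b2):
    h = binary_act(x @ W1 + b1)   # hidden layer with binary activations
    return h @ W2 + b2            # linear output layer

def widen(W1, b1, W2, n_new, rng):
    """Add n_new hidden units while preserving the network's function."""
    d_in, _ = W1.shape
    _, d_out = W2.shape
    W1_new = np.concatenate([W1, rng.normal(0, 0.1, (d_in, n_new))], axis=1)
    b1_new = np.concatenate([b1, np.zeros(n_new)])
    # Zero outgoing weights => the new units contribute nothing yet.
    W2_new = np.concatenate([W2, np.zeros((n_new, d_out))], axis=0)
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

y_before = forward(x, W1, b1, W2, b2)
W1, b1, W2 = widen(W1, b1, W2, n_new=4, rng=rng)
y_after = forward(x, W1, b1, W2, b2)
assert np.allclose(y_before, y_after)  # function preserved after growth
```

In this kind of scheme, growth never disturbs what the network has already learned; the new capacity only starts to matter once subsequent training updates the zero-initialized outgoing weights.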
