FV-MViT: Mobile Vision Transformer for Finger Vein Recognition.

Abstract

To address the high parameter counts and limited training samples that hamper finger vein recognition, we present FV-MViT, a lightweight deep learning model designed for high accuracy, portability, and low latency. FV-MViT introduces two key components. The Mul-MV2 Block uses a dual-path inverted residual connection structure with multi-scale convolutions to extract additional local features. The Enhanced MobileViT Block removes the large-scale convolution block at the front of the original MobileViT Block, replaces the Transformer’s self-attention with separable self-attention of linear complexity, and optimizes the back end of the block with depth-wise separable convolutions. Together, these changes extract global features while reducing the parameter count and feature extraction time. We also introduce a soft target center cross-entropy loss function to improve generalization and accuracy. Experimental results show that FV-MViT achieves recognition accuracies of 99.53% and 100.00% on the Shandong University (SDU) and Universiti Sains Malaysia (USM) datasets, with equal error rates of 0.47% and 0.02%, respectively. The model has 5.26 million parameters and a latency of 10.00 milliseconds from sample input to recognition output. Comparison with state-of-the-art (SOTA) methods shows that FV-MViT is competitive.
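
To illustrate why separable self-attention scales linearly with the number of tokens, the following is a minimal pure-Python sketch. It assumes the MobileViTv2-style formulation (one scalar score per token, a single context vector, and ReLU-activated values); the abstract does not spell out the exact variant used in FV-MViT, and the function and weight names (separable_self_attention, w_i, w_k, w_v, w_o) are hypothetical, chosen only for this example.

```python
import math


def linear(x, w):
    """Multiply a (tokens x d_in) list of rows by a (d_in x d_out) weight matrix."""
    d_out = len(w[0])
    return [
        [sum(value * w[i][j] for i, value in enumerate(row)) for j in range(d_out)]
        for row in x
    ]


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Sketch of separable self-attention (assumed MobileViTv2-style).

    x   : n tokens, each a list of d floats
    w_i : d weights projecting every token to ONE scalar score
    w_k, w_v, w_o : d x d projection matrices (hypothetical names)
    No n x n attention matrix is ever built, so cost grows linearly in n.
    """
    # 1. One scalar per token, normalized across the n tokens.
    scores = softmax([sum(a * b for a, b in zip(row, w_i)) for row in x])

    # 2. Keys are summed, weighted by the scores, into a single context vector.
    keys = linear(x, w_k)
    dim = len(keys[0])
    context = [sum(s * key[j] for s, key in zip(scores, keys)) for j in range(dim)]

    # 3. ReLU-activated values are scaled element-wise by the shared context.
    values = [[max(0.0, v) for v in row] for row in linear(x, w_v)]
    mixed = [[c * v for c, v in zip(context, row)] for row in values]

    # 4. Output projection.
    return linear(mixed, w_o)


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Zero score weights give uniform attention, so the context is the token mean.
    print(separable_self_attention(tokens, [0.0, 0.0], identity, identity, identity))
```

Standard self-attention compares every token with every other token, which costs on the order of n squared operations for n tokens. In the sketch above each token interacts only with one shared context vector, so doubling the number of tokens roughly doubles the work.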
