DctViT: Discrete Cosine Transform meets vision transformers

Abstract

Vision transformers (ViTs) have become one of the dominant frameworks for vision tasks in recent years because self-attention lets them efficiently capture long-range dependencies in image recognition. Both CNNs and ViTs have strengths and weaknesses in vision tasks, and several studies suggest that combining the two is an effective way to balance performance and computational cost. In this paper, we propose a new hybrid network based on CNNs and transformers, using CNNs to extract local features and transformers to capture long-range dependencies. We also propose a new feature-map down-sampling method based on the Discrete Cosine Transform and self-attention, named DCT-Attention Down-sample (DAD). Our DctViT-L achieves 84.8% top-1 accuracy on ImageNet-1K, far outperforming CMT, Next-ViT, SpectFormer and other state-of-the-art models at a lower computational cost. With DctViT-B as the backbone, RetinaNet achieves 46.8% mAP on COCO val2017, improving mAP by 2.5% and 1.1% over CMT-S and SpectFormer backbones, respectively, with less computation.

Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.
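The abstract describes DAD only at a high level. As a rough illustration of how DCT-based down-sampling can be combined with self-attention, here is a minimal PyTorch sketch. The module name `DctAttentionDownsample`, the low-frequency-truncation strategy, and the attention hyper-parameters are assumptions made for this example, not the paper's actual implementation.

```python
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)   # frequency index
    i = torch.arange(n, dtype=torch.float32).unsqueeze(0)   # spatial index
    basis = torch.cos(math.pi * (2 * i + 1) * k / (2 * n))
    basis[0] *= 1.0 / math.sqrt(2)
    return basis * math.sqrt(2.0 / n)


class DctAttentionDownsample(nn.Module):
    """Hypothetical sketch: 2x spatial down-sampling in the DCT (frequency)
    domain, followed by multi-head self-attention over the reduced tokens."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map; H and W assumed even
        b, c, h, w = x.shape
        dh = dct_matrix(h).to(x.device)
        dw = dct_matrix(w).to(x.device)
        # 2D DCT: transform rows, then columns
        freq = dh @ x @ dw.t()                               # (B, C, H, W)
        # keep only the low-frequency quadrant -> halves spatial resolution
        low = freq[..., : h // 2, : w // 2]
        # inverse DCT with the truncated orthonormal bases; the 0.5 factor
        # keeps a constant input at the same value after resizing
        ih = dct_matrix(h // 2).to(x.device)
        iw = dct_matrix(w // 2).to(x.device)
        y = 0.5 * (ih.t() @ low @ iw)                        # (B, C, H/2, W/2)
        # self-attention over the down-sampled tokens
        tokens = y.flatten(2).transpose(1, 2)                # (B, H/2*W/2, C)
        q = self.norm(tokens)
        tokens = tokens + self.attn(q, q, q, need_weights=False)[0]
        return tokens.transpose(1, 2).reshape(b, c, h // 2, w // 2)
```

For example, `DctAttentionDownsample(dim=64)(torch.randn(2, 64, 32, 32))` returns a tensor of shape `(2, 64, 16, 16)`. In a real implementation the DCT bases would be precomputed and cached as buffers rather than rebuilt on every forward pass.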
