MResTNet: A Multi-Resolution Transformer Framework with CNN Extensions for Semantic Segmentation.

Abstract

Semantic segmentation, the task of differentiating and identifying the objects or entities in a visual scene, is a fundamental problem in computer vision. Transformer networks have surpassed traditional convolutional neural network (CNN) architectures in segmentation performance. However, the continuous pursuit of higher scores on popular evaluation metrics has led to very large architectures that require significant computational power, making them prohibitive for real-time applications such as autonomous driving. In this paper, we propose a model that pairs a visual transformer encoder with a parallel twin decoder, consisting of a visual transformer decoder and a CNN decoder with multi-resolution connections working in parallel. The two decoders are merged with the aid of two trainable CNN blocks: the fuser, which combines the information from the two decoders, and the scaler, which scales the contribution of each decoder. The proposed model achieves state-of-the-art performance on the Cityscapes and ADE20K datasets while maintaining a low-complexity network that can be used in real-time applications.
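
The abstract does not specify how the fuser and scaler are built, so the following is only a minimal sketch of one plausible reading of the merging step, not the authors' implementation. It assumes the scaler reduces to a softmax over two learnable scalars (one per decoder) and the fuser to a 1x1 convolution over the concatenated decoder outputs. All function and parameter names (`scale`, `fuse`, `merge_decoders`, `scaler_params`, `fuser_weights`) are hypothetical, and plain Python lists stand in for tensors of shape [C][H][W].

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def scale(transformer_logits, cnn_logits, scaler_params):
    """Hypothetical scaler: weight each decoder by a softmax over two scalars.

    Assumption: the paper's scaler is a trainable CNN block; a pair of
    learnable scalars is used here only to keep the sketch small.
    """
    w_t, w_c = softmax(scaler_params)
    scaled_t = [[[w_t * v for v in row] for row in ch] for ch in transformer_logits]
    scaled_c = [[[w_c * v for v in row] for row in ch] for ch in cnn_logits]
    return scaled_t, scaled_c


def fuse(scaled_t, scaled_c, fuser_weights):
    """Hypothetical fuser: a 1x1 convolution mapping 2C channels to C channels.

    fuser_weights has shape [C][2C]; each output channel is a per-pixel
    weighted sum over the concatenated input channels.
    """
    stacked = scaled_t + scaled_c  # channel-wise concatenation, [2C][H][W]
    height, width = len(stacked[0]), len(stacked[0][0])
    return [
        [
            [sum(w * stacked[k][y][x] for k, w in enumerate(out_w)) for x in range(width)]
            for y in range(height)
        ]
        for out_w in fuser_weights
    ]


def merge_decoders(transformer_logits, cnn_logits, scaler_params, fuser_weights):
    """Scale both decoder outputs, then fuse them into one logit map."""
    scaled_t, scaled_c = scale(transformer_logits, cnn_logits, scaler_params)
    return fuse(scaled_t, scaled_c, fuser_weights)


def predict_classes(logits):
    """Per-pixel argmax over the channel axis, returning an [H][W] label map."""
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(len(logits)), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]
```

In a trained network both blocks would be learned jointly with the decoders. With equal scaler parameters and fuser weights that add matching channels, this sketch reduces to averaging the two decoders' logits.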
