TransCrack: revisiting fine-grained road crack detection with a transformer design.

July 16, 2023 Civil Engineering, Computer Vision

Abstract

Prior convolution-based road crack detectors typically learn increasingly abstract visual representations with a growing receptive field via an encoder-decoder architecture. Despite their promising accuracy, the progressive reduction in spatial resolution blurs semantic features, leading to coarse and discontinuous distress detection. To this end, an alternative sequence-to-sequence perspective is introduced for road crack detection, using a transformer network termed TransCrack. Specifically, an image is decomposed into a grid of fixed-size crack patches, which are flattened and combined with position embeddings to form a sequence. We further propose a pure transformer-based encoder with multi-head reduced self-attention modules and feed-forward networks to explicitly model long-range dependencies in the sequential input over a global receptive field. More importantly, a simple decoder with a cross-layer aggregation architecture is developed to combine global and local attention across different regions for detailed feature recovery and pixel-wise crack mask prediction. Empirical studies are conducted on three publicly available damage detection benchmarks. The proposed TransCrack achieves state-of-the-art performance, outperforming all counterparts by a substantial margin, and qualitative results further demonstrate its superiority in contiguous crack recognition and fine-grained profile extraction. This article is part of the theme issue ‘Artificial intelligence in failure analysis of transportation infrastructure and materials’.
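The patch-to-sequence step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the patch size, and the fixed sinusoidal position embedding are assumptions made for illustration (the abstract does not say whether the embedding is learned or fixed), and the linear projection that normally maps each flattened patch to the model dimension is omitted.

```python
import numpy as np


def image_to_patch_sequence(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row across the grid.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, C) -> (gh, gw, p, p, C): group pixels by grid cell.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def sinusoidal_position_embedding(num_positions: int, dim: int) -> np.ndarray:
    """Fixed sine/cosine position embedding (an assumed stand-in, see lead-in)."""
    if dim % 2:
        raise ValueError("dim must be even")
    positions = np.arange(num_positions)[:, None]
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    emb = np.zeros((num_positions, dim))
    emb[:, 0::2] = np.sin(positions * freqs)  # even channels
    emb[:, 1::2] = np.cos(positions * freqs)  # odd channels
    return emb


# Hypothetical 64x64 RGB input with 16x16 patches: a 4x4 grid, 16 tokens.
image = np.random.rand(64, 64, 3)
patches = image_to_patch_sequence(image, patch_size=16)
tokens = patches + sinusoidal_position_embedding(*patches.shape)
print(tokens.shape)
```

Each row of `tokens` is one patch with its position encoded, which is the form of sequence a transformer encoder consumes.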


