Revisiting Video Saliency Prediction in the Deep Learning Era.

Abstract

Predicting where people look in static scenes, a.k.a. visual saliency, has received significant research interest recently. However, relatively little effort has been spent on understanding visual attention over dynamic scenes. This work makes three contributions to video saliency research. First, we introduce a new benchmark, called DHF1K, for predicting fixations during free-viewing of dynamic scenes, addressing a long-standing need in this field. DHF1K consists of 1K high-quality, carefully selected videos annotated by 17 observers using an eye tracker. The videos span a wide range of scenes, motions, object types and backgrounds. Second, we propose a novel video saliency model, called ACLNet, that augments a CNN-LSTM network with a supervised attention mechanism to enable fast end-to-end learning. The attention mechanism explicitly encodes static saliency information, thus allowing the LSTM to focus on learning a more flexible temporal saliency representation. Such a design leverages existing large-scale static fixation datasets, avoids overfitting, and significantly improves training efficiency and testing performance. Third, we perform an extensive evaluation of state-of-the-art saliency models on three current datasets (i.e., DHF1K, Hollywood2, UCF Sports). Experimental results over more than 1.2K testing videos containing 400K frames demonstrate that ACLNet outperforms other contenders while running at a fast processing speed (40 fps).
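To make the described architecture more concrete, below is a minimal PyTorch sketch of a CNN-LSTM saliency model with a supervised attention branch, in the spirit of ACLNet as summarized in the abstract. The module names, channel sizes, residual weighting, and the simplified ConvLSTM cell are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a CNN-LSTM video saliency model with a supervised
# attention branch (ACLNet-style). All hyperparameters are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class ConvLSTMCell(nn.Module):
    """A single convolutional LSTM cell operating on feature maps."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class ACLNetSketch(nn.Module):
    def __init__(self, hid_ch=256):
        super().__init__()
        # CNN backbone extracting per-frame static features (VGG-16 conv layers).
        self.backbone = models.vgg16(weights=None).features[:30]
        feat_ch = 512
        # Attention branch: predicts a static saliency map, which can be
        # supervised with large-scale image fixation data during training.
        self.attention = nn.Sequential(
            nn.Conv2d(feat_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1), nn.Sigmoid())
        # ConvLSTM models the temporally varying (dynamic) saliency.
        self.convlstm = ConvLSTMCell(feat_ch, hid_ch)
        self.readout = nn.Conv2d(hid_ch, 1, 1)

    def forward(self, clip):
        # clip: (batch, time, 3, H, W)
        b, t, _, h, w = clip.shape
        state = None
        dyn_maps, static_maps = [], []
        for i in range(t):
            feat = self.backbone(clip[:, i])              # (b, 512, h', w')
            att = self.attention(feat)                    # (b, 1, h', w')
            # Attention-weighted features with a residual connection so the
            # LSTM still receives the unweighted evidence.
            feat = feat * att + feat
            if state is None:
                zeros = feat.new_zeros(b, self.convlstm.hid_ch, *feat.shape[2:])
                state = (zeros, zeros)
            state = self.convlstm(feat, state)
            dyn_maps.append(torch.sigmoid(self.readout(state[0])))
            static_maps.append(att)
        # Static maps feed an auxiliary attention loss; dynamic maps are the
        # final per-frame saliency predictions.
        return torch.stack(dyn_maps, dim=1), torch.stack(static_maps, dim=1)


if __name__ == "__main__":
    model = ACLNetSketch()
    clip = torch.randn(1, 4, 3, 224, 224)  # one 4-frame clip
    dyn, stat = model(clip)
    print(dyn.shape, stat.shape)
```

The key design choice mirrored here is that the attention branch carries the static-saliency signal, freeing the ConvLSTM to model only the temporal component; in training, the attention output would receive its own supervision from static fixation datasets while the dynamic output is supervised by video fixations.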
