Sparse Representations for Object- and Ego-Motion Estimations in Dynamic Scenes

Abstract

Disentangling the sources of visual motion in a dynamic scene during self-movement or ego motion is important for autonomous navigation and tracking. In the dynamic image segments of a video frame containing independently moving objects, optic flow relative to the next frame is the sum of the motion fields generated due to camera and object motion. The traditional ego-motion estimation methods assume the scene to be static, and the recent deep learning-based methods do not separate pixel velocities into object- and ego-motion components. We propose a learning-based approach to predict both ego-motion parameters and object-motion field (OMF) from image sequences using a convolutional autoencoder while being robust to variations due to the unconstrained scene depth. This is achieved by: 1) training with continuous ego-motion constraints that allow solving for ego-motion parameters independently of depth and 2) learning a sparsely activated overcomplete ego-motion field (EMF) basis set, which eliminates the irrelevant components in both static and dynamic segments for the task of ego-motion estimation. In order to learn the EMF basis set, we propose a new differentiable sparsity penalty function that approximates the number of nonzero activations in the bottleneck layer of the autoencoder and enforces sparsity more effectively than L1- and L2-norm-based penalties. Unlike the existing direct ego-motion estimation methods, the predicted global EMF can be used to extract OMF directly by comparing it against the optic flow. Compared with the state-of-the-art baselines, the proposed model performs favorably on pixelwise object- and ego-motion estimation tasks when evaluated on real and synthetic data sets of dynamic scenes.
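To make the two learning signals in the abstract concrete, below is a minimal sketch in PyTorch-style Python. The first function is a smooth surrogate for the number of nonzero bottleneck activations (the paper's differentiable sparsity penalty); the Gaussian form, the sigma value, the function names, and the tensor shapes are illustrative assumptions, not the paper's exact formulation. The second function recovers the object-motion field (OMF) by subtracting the predicted ego-motion field (EMF) from the optic flow, which follows from the abstract's statement that optic flow in dynamic segments is the sum of the camera- and object-induced motion fields.

    import torch

    def approx_l0_penalty(z, sigma=0.1):
        """Differentiable surrogate for the count of nonzero activations in the
        bottleneck tensor z. Each activation contributes ~0 when near zero and
        ~1 otherwise, so the sum approximates the L0 norm while staying smooth.
        NOTE: the Gaussian form and sigma are assumptions for illustration."""
        return torch.sum(1.0 - torch.exp(-(z ** 2) / (sigma ** 2)))

    def extract_object_motion_field(optic_flow, predicted_emf):
        """If optic flow = EMF + OMF in dynamic segments, the OMF is recovered
        by subtracting the predicted global EMF from the observed optic flow."""
        return optic_flow - predicted_emf

    # Usage with dummy tensors (batch, 2 channels for (u, v) flow, height, width)
    z = torch.randn(8, 128)                  # bottleneck activations
    sparsity_loss = approx_l0_penalty(z)
    flow = torch.randn(1, 2, 64, 64)         # observed optic flow
    emf = torch.randn(1, 2, 64, 64)          # predicted ego-motion field
    omf = extract_object_motion_field(flow, emf)
    print(sparsity_loss.item(), omf.shape)

In practice the sparsity term would be added, with a weighting coefficient, to the autoencoder's reconstruction and ego-motion losses; the exact weighting and penalty shape are design choices described in the paper, not reproduced here.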
