Tracking Beyond Detection: Learning a Global Response Map for End-to-End Multi-Object Tracking.

Abstract

Most existing Multi-Object Tracking (MOT) approaches follow the Tracking-by-Detection and Data Association paradigm, in which objects are first detected and then associated in the tracking process. In recent years, deep neural networks have been used to obtain more discriminative appearance features for cross-frame association, and noticeable performance improvements have been reported. However, the Tracking-by-Detection framework is still not completely end-to-end, which leads to heavy computation and limited performance, especially in the inference (tracking) process. To address this problem, we present an effective end-to-end deep learning framework that directly takes an image sequence or video as input and outputs the located and tracked objects of the learned types. Specifically, a novel global response network is learned to project multiple objects in the image sequence or video into a continuous response map, from which the trajectory of each tracked object can then be easily picked out. The overall process is similar to how a detector takes an image as input and outputs the bounding boxes of each detected object. Experimental results on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.
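
For readers unfamiliar with the two-stage paradigm the abstract contrasts itself with, the following is a minimal sketch of the "detect, then associate" step. It is not the proposed global response network. It assumes detections are already available as axis-aligned boxes and substitutes a simple greedy IoU matching rule for the learned appearance features the abstract mentions. The names `iou`, `associate`, and `min_iou` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    min_iou: float = 0.3,  # illustrative threshold, not taken from the paper
) -> Tuple[Dict[int, Box], int]:
    """Greedily match this frame's detections to existing tracks by IoU.

    Unmatched detections start new tracks; unmatched tracks are dropped.
    """
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    updated: Dict[int, Box] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little to match
        if tid in updated or j in used:
            continue  # track or detection already assigned
        updated[tid] = detections[j]
        used.add(j)
    # Any detection left over becomes a new track with a fresh identity.
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated, next_id


# Two frames of (assumed) detector output; the objects swap list order.
frame1 = [(0, 0, 10, 10), (50, 50, 60, 60)]
frame2 = [(51, 50, 61, 60), (1, 0, 11, 10)]
tracks, next_id = associate({}, frame1, 0)
tracks, next_id = associate(tracks, frame2, next_id)
```

Even in this toy form, detection and association are separate steps joined by a hand-written matching rule, which is the non-end-to-end structure the abstract argues against.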
