A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future.

June 14, 2024 Artificial Intelligence, Computer Vision

Researchers

Chaoyang Zhu, Long Chen

Journal

IEEE Transactions on Pattern Analysis and Machine Intelligence

Models

Deep Learning, Transfer Learning

Abstract

As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Because manual labeling is expensive, the annotated categories in existing datasets are often small-scale and pre-defined; consequently, state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, the community has in the last few years paid increasing attention to Open-Vocabulary Detection (OVD) and Segmentation (OVS). By “open-vocabulary”, we mean that the models can classify objects beyond the pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that whether weak supervision signals are permitted, and how they are used, distinguishes the methodologies well; the resulting categories are visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and the strengths and weaknesses of each methodology are thoroughly analyzed. In addition, we benchmark each task, along with the vital components of each method, in the appendix, with results updated online at awesome-ovd-ovs. Finally, several promising directions are discussed to stimulate future research.
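To make the idea of classifying "beyond pre-defined categories" concrete, here is a minimal sketch of the visual-semantic space mapping idea named in the abstract. It assumes that region embeddings and class-name embeddings already live in a shared space produced by aligned visual and text encoders (for example, a CLIP-style model); here those encoders are replaced by hypothetical hand-written 3-D vectors. The function names `l2_normalize` and `classify_region` and the `temperature` value are illustrative assumptions, not taken from the survey. Because a region is scored against whatever class-name embeddings are supplied, the vocabulary can be extended at inference time without retraining.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def classify_region(
    region_emb: Vector,
    vocab: Dict[str, Vector],
    temperature: float = 0.01,  # hypothetical value, chosen for illustration
) -> Tuple[str, Dict[str, float]]:
    """Label a region by cosine similarity to class-name embeddings.

    `vocab` maps category names to (hypothetical) text embeddings; any
    category can be added or removed at inference time.
    """
    r = l2_normalize(region_emb)
    # Cosine similarity between the region and each category, scaled by temperature.
    logits = {
        name: sum(a * b for a, b in zip(r, l2_normalize(t))) / temperature
        for name, t in vocab.items()
    }
    # Softmax over the current vocabulary (subtract the max for numerical stability).
    m = max(logits.values())
    exps = {name: math.exp(l - m) for name, l in logits.items()}
    total = sum(exps.values())
    probs = {name: e / total for name, e in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    # Hypothetical "base" vocabulary seen during training.
    base = {"cat": [1.0, 0.0, 0.0], "car": [0.0, 1.0, 0.0]}
    region = [0.1, 0.2, 0.9]  # hypothetical embedding of a zebra region

    label, _ = classify_region(region, base)
    print(label)  # car (the closed vocabulary forces a wrong label)

    # Open vocabulary: add a novel category name at inference time.
    extended = dict(base, zebra=[0.0, 0.0, 1.0])
    label, _ = classify_region(region, extended)
    print(label)  # zebra
```

Real OVD systems add region proposal, learned projections, and background handling on top of this scoring step; the sketch only isolates the shared-embedding classification that lets the label set stay open.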

Similar Posts

Accurate prediction of responses to transarterial chemoembolization for patients with hepatocellular carcinoma by using artificial intelligence in contrast-enhanced ultrasound.

Factor-GAN: Enhancing stock price prediction and factor investment with Generative Adversarial Networks.

Transfer learning for image classification using VGG19: Caltech-101 image data set.

Intelligent Perception System of Robot Visual Servo for Complex Industrial Environment.

Explainable Graph Wavelet Denoising Network for Intelligent Fault Diagnosis.

HAKE: A Knowledge Engine Foundation for Human Activity Understanding.
