ECCT: Efficient Contrastive Clustering via Pseudo-Siamese Vision Transformer and Multi-view Augmentation.

Researchers

Chong Zhao, Di Wu, Fan Yang, Taizhang Hu, Xing Wei, Yang Lu

Journal

Neural Networks: The Official Journal of the International Neural Network Society

Models

Hilbert Patch Embedding (HPE), Vision Transformer (ViT)

Abstract

Image clustering aims to divide a set of unlabeled images into multiple clusters. Recently, clustering methods based on contrastive learning have attracted much attention for their ability to learn discriminative feature representations. Nevertheless, existing clustering algorithms struggle to capture global information and preserve semantic continuity. In addition, their feature distributions tend to be relatively narrow, which limits the full potential of contrastive learning in clustering. Both problems degrade image clustering performance. To address them, we propose a deep clustering framework termed Efficient Contrastive Clustering via Pseudo-Siamese Vision Transformer and Multi-view Augmentation (ECCT). The core idea is to introduce a Vision Transformer (ViT) to provide the global view, and to improve it with a Hilbert Patch Embedding (HPE) module to construct a second ViT branch. We then fuse the features extracted from the two ViT branches to obtain both a global view and semantic coherence. In addition, we employ multi-view random aggressive augmentation to broaden the feature distribution, enabling the model to learn more comprehensive and richer contrastive features. Results on five datasets demonstrate that ECCT outperforms previous clustering methods. In particular, the ARI of ECCT on the STL-10 dataset is 0.852, which is 10.3% higher than the best baseline; on ImageNet-Dogs it is 0.424, which is 4.8% higher.

Copyright © 2024 Elsevier Ltd. All rights reserved.
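The abstract reports results as ARI (Adjusted Rand Index), a standard measure of agreement between predicted clusters and ground-truth classes that is invariant to how cluster labels are numbered. As a minimal sketch of how such a score is computed, using the standard pair-counting formula, the function below takes two label lists. The function name `adjusted_rand_index` and the toy labels are illustrative assumptions, not code or data from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Illustrative helper (not from the paper): 1.0 means identical
    partitions, values near 0 mean chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Contingency counts: how many items share each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in both / in each partition.
    sum_both = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2        # agreement of a perfect match
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_both - expected) / (max_index - expected)


# Toy example: same grouping, different cluster ids.
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the score depends only on which items are grouped together, the toy call above compares a partition with a relabeled copy of itself and therefore reaches the maximum value.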
