Learning clustering-friendly representations via partial information discrimination and cross-level interaction.

Journal

Neural Networks: the official journal of the International Neural Network Society

Abstract

Despite significant advances in deep clustering research, most existing approaches still have three critical limitations. First, they often derive the clustering result by attaching a distribution-based loss to specific network layers, neglecting the potential benefits of leveraging contrastive sample-wise relationships. Second, they frequently focus on representation learning at the full-image scale, overlooking the discriminative information latent in partial image regions. Third, although some prior studies perform the learning process at multiple levels, they mostly lack the ability to exploit the interaction between different learning levels. To overcome these limitations, this paper presents a novel deep image clustering approach via Partial Information discrimination and Cross-level Interaction (PICI). Specifically, we utilize a Transformer encoder as the backbone, coupled with two types of augmentations to formulate two parallel views. The augmented samples, integrated with masked patches, are processed through the Transformer encoder to produce the class tokens. Subsequently, three partial information learning modules are jointly enforced, namely, the partial information self-discrimination (PISD) module for masked image reconstruction, the partial information contrastive discrimination (PICD) module for simultaneous instance- and cluster-level contrastive learning, and the cross-level interaction (CLI) module to ensure consistency across different learning levels. Through this unified formulation, our PICI approach, for the first time to our knowledge, bridges the gap between masked image modeling and deep contrastive clustering, offering a novel pathway for enhanced representation learning and clustering. Experimental results across six image datasets demonstrate the superiority of our PICI approach over state-of-the-art methods. In particular, our approach achieves an ACC of 0.772 (0.634) on the RSOD (UC-Merced) dataset, which shows an improvement of 29.7% (24.8%) over the best baseline. The source code is available at https://github.com/Regan-Zhang/PICI. Copyright © 2024 Elsevier Ltd. All rights reserved.
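
The simultaneous instance- and cluster-level contrastive learning mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); it assumes one common formulation in which an NT-Xent-style loss is applied to rows of the feature matrix (instance level) and to columns of the soft cluster-assignment matrix (cluster level). The function names, temperatures, and tensor shapes below are hypothetical and chosen only for illustration.

```python
import numpy as np


def _normalize(x, eps=1e-12):
    """L2-normalize each row so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def softmax(x):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def nt_xent(a, b, temperature=0.5):
    """NT-Xent-style loss: row i of `a` and row i of `b` form a positive pair.

    All other rows of the stacked matrix act as negatives.
    """
    n = a.shape[0]
    z = _normalize(np.concatenate([a, b], axis=0))  # shape (2n, d)
    sim = z @ z.T / temperature                     # shape (2n, 2n)
    np.fill_diagonal(sim, -np.inf)                  # drop self-similarity
    # The positive for row i is row i + n, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.exp(sim - row_max).sum(axis=1, keepdims=True)
    )
    return float(-log_prob[np.arange(2 * n), pos].mean())


def instance_and_cluster_losses(h1, h2, logits1, logits2,
                                t_inst=0.5, t_clu=1.0):
    """Hypothetical helper combining both contrastive levels.

    h1, h2:           (n, d) features of the two augmented views.
    logits1, logits2: (n, k) cluster logits of the two views.
    """
    p1, p2 = softmax(logits1), softmax(logits2)
    loss_inst = nt_xent(h1, h2, t_inst)      # contrast samples (rows)
    loss_clu = nt_xent(p1.T, p2.T, t_clu)    # contrast clusters (columns)
    return loss_inst, loss_clu


rng = np.random.default_rng(0)
h1 = rng.normal(size=(8, 16))
h2 = h1 + 0.05 * rng.normal(size=(8, 16))    # slightly perturbed second view
logits1 = rng.normal(size=(8, 3))
logits2 = logits1 + 0.05 * rng.normal(size=(8, 3))
loss_inst, loss_clu = instance_and_cluster_losses(h1, h2, logits1, logits2)
```

In this sketch the two losses would simply be summed with any reconstruction and consistency terms; how the paper weights and couples its PISD, PICD, and CLI modules is not stated in the abstract and is not modeled here.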
