Detecting Errors with Zero-Shot Learning.

Abstract

Error detection is a critical step in data cleaning. Most traditional error detection methods rely on rules and external information, which are costly to obtain, especially when dealing with large-scale data. Recently, with the advances of deep learning, some researchers have turned to learning the semantic distribution of data for error detection; however, the low error rate in real datasets makes it hard to collect negative samples for training supervised deep learning models. Most existing deep-learning-based error detection algorithms address this class imbalance through data augmentation. Because negative samples are inadequately sampled, the features learned by those methods may be biased. In this paper, we propose an AEGAN (Auto-Encoder Generative Adversarial Network)-based deep learning model named SAT-GAN (Self-Attention Generative Adversarial Network) to detect errors in relational datasets. Combining the self-attention mechanism with a pre-trained language model, our model captures semantic features of the dataset, specifically the functional dependencies between attributes, so that no rules or constraints are needed for SAT-GAN to identify inconsistent data. To address the lack of negative samples, we propose training our model via zero-shot learning. As a model tailored to clean data, SAT-GAN recognizes erroneous data as outliers by learning the latent features of clean data. In our evaluation, SAT-GAN achieves an average F1-score of 0.95 on five datasets, which yields at least a 46.2% F1-score improvement over rule-based methods and outperforms state-of-the-art deep learning approaches in the absence of rules and negative samples.
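To make the zero-shot idea concrete, the sketch below shows one common way such a clean-data-only detector can be set up: an autoencoder is fit on embeddings of clean tuples (e.g. produced by a pre-trained language model), and tuples whose reconstruction error is far above what the model sees on clean data are flagged as outliers. This is a minimal illustration of the general technique, not the SAT-GAN architecture itself; the class names, layer sizes, and threshold rule are illustrative assumptions.

```python
# Minimal sketch of clean-data-only ("zero-shot") error detection:
# train an autoencoder on clean tuple embeddings, then flag tuples with
# unusually high reconstruction error as suspected errors.
# Architecture and threshold are illustrative, not the SAT-GAN design.
import torch
import torch.nn as nn


class TupleAutoencoder(nn.Module):
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, hidden // 2))
        self.decoder = nn.Sequential(nn.Linear(hidden // 2, hidden), nn.ReLU(),
                                     nn.Linear(hidden, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def train_on_clean(model, clean_embeddings, epochs=50, lr=1e-3):
    """Fit the autoencoder on embeddings of clean tuples only."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(clean_embeddings), clean_embeddings)
        loss.backward()
        opt.step()
    return model


def flag_errors(model, embeddings, clean_embeddings, k=3.0):
    """Flag tuples whose reconstruction error is far above the clean-data mean."""
    with torch.no_grad():
        clean_err = ((model(clean_embeddings) - clean_embeddings) ** 2).mean(dim=1)
        err = ((model(embeddings) - embeddings) ** 2).mean(dim=1)
    threshold = clean_err.mean() + k * clean_err.std()  # illustrative threshold
    return err > threshold  # True = suspected error


# Example usage with random 256-dim stand-ins for tuple embeddings.
clean = torch.randn(1000, 256)
model = train_on_clean(TupleAutoencoder(256), clean)
suspects = flag_errors(model, torch.randn(50, 256), clean)
```

The design choice worth noting is that nothing in training ever sees an error: the detector's notion of "anomalous" comes entirely from how far a tuple deviates from the clean-data reconstruction behavior, which is what lets this family of methods work without negative samples or hand-written rules.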
