High-performance deep learning pipeline predicts individuals in mixtures of DNA using sequencing data.

Researchers

Amrita Chattopadhyay, Eric Y Chuang, Hsiang-I Yin, Hsiao-Lin Hwa, Liang-Chuan Lai, Mong-Hsun Tsai, Nam Nhut Phan, Tsui-Ting Lee, Tzu-Pin Lu

Journal

Briefings in Bioinformatics

Models

deep learning

Abstract

In this study, we propose a deep learning (DL) model for classifying individuals from mixtures of DNA samples using 27 short tandem repeats and 94 single nucleotide polymorphisms obtained through a massively parallel sequencing protocol. The model was trained, tested, and validated with sequenced data from 6 individuals and then evaluated using mixtures from forensic DNA samples. The model identified both the major and the minor contributors with 100% accuracy for 90 DNA mixtures that were manually prepared by mixing sequence reads of 3 individuals at different ratios. Furthermore, the model identified 100% of the major contributors and 50-80% of the minor contributors in 20 two-sample external mixed samples at ratios of 1:39 and 1:9, respectively. To further demonstrate the versatility and applicability of the pipeline, we tested it on whole exome sequence data to classify subtypes of 20 breast cancer patients and achieved an area under the curve of 0.85. Overall, we present, for the first time, a complete pipeline, including sequencing data processing steps and DL steps, that is applicable across different NGS platforms. We also introduce a sliding window approach to overcome the sequence length variation problem of sequencing data, and demonstrate that it improves the model performance dramatically.

© The Author(s) 2021. Published by Oxford University Press. All rights reserved.
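The abstract does not describe how the sliding window is implemented, so the following is only a minimal sketch of the general idea: cutting reads of varying length into fixed-length windows so that every input to a model has the same size. The function name `sliding_windows`, the window size, the stride, and the `N` padding character are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List


def sliding_windows(read: str, window: int = 8, stride: int = 4,
                    pad: str = "N") -> List[str]:
    """Split one read into fixed-length windows (hypothetical helper).

    Reads shorter than `window` are right-padded with `pad`; longer reads
    are cut into overlapping windows, plus one extra window aligned to the
    end of the read if the stride would otherwise skip the tail.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(read) <= window:
        # Short read: pad up to the fixed window length (assumed strategy).
        return [read.ljust(window, pad)]
    windows = [read[i:i + window]
               for i in range(0, len(read) - window + 1, stride)]
    if (len(read) - window) % stride != 0:
        # The regular stride did not reach the end; add a tail window.
        windows.append(read[-window:])
    return windows


if __name__ == "__main__":
    # Reads of different lengths all yield windows of the same length.
    for r in ["ACG", "ACGTACGTAC", "ACGTACGTACG"]:
        print(r, sliding_windows(r, window=4, stride=3))
```

Every window returned has exactly `window` characters, which is the property a fixed-input model needs; how the windows are then encoded and aggregated per sample is left open here because the abstract does not say.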
