Deep learning and support vector machines for transcription start site identification.

Researchers

Alicia Olivares-Gil, César García-Osorio, José A Barbero-Aparicio, José F Díez-Pastor

Models

Long short-term memory (LSTM) neural networks, Support Vector Machines (SVM)

Abstract

Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have proven exceptionally effective for such tasks, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works neither compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied to related problems with remarkable results, we compared their performance in transcription start site prediction, concluding that SVMs are computationally much slower and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited to working with sequences than SVMs. For this purpose, we used the reference human genome GRCh38. Additionally, we studied two aspects of data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models was tested on the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices for transcription start site identification, as well as a method to generate transcription start site datasets, including negative instances, for any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data. We also created a transcription start site (TSS) dataset large enough to be used in deep learning experiments.

©2023 Barbero-Aparicio et al.
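
The abstract mentions generating training examples, including negative instances, from imbalanced data. As a rough illustration only, the sketch below cuts fixed-length windows centred on known TSS positions as positives and samples windows centred on other positions as negatives. The function names, the window size (`flank`), the negative-to-positive ratio (`neg_per_pos`) and the uniform random sampling are all assumptions made for this example; the abstract does not describe the authors' actual procedure.

```python
import random


def one_hot(seq):
    """Encode a DNA string as rows of (A, C, G, T); unknown bases become all zeros."""
    table = {
        "A": [1, 0, 0, 0],
        "C": [0, 1, 0, 0],
        "G": [0, 0, 1, 0],
        "T": [0, 0, 0, 1],
    }
    return [table.get(base, [0, 0, 0, 0]) for base in seq.upper()]


def make_examples(chrom_seq, tss_positions, flank=5, neg_per_pos=2, seed=0):
    """Return (window, label) pairs: 1 = centred on a TSS, 0 = centred elsewhere.

    Hypothetical helper: flank, neg_per_pos and uniform sampling are
    illustrative choices, not values taken from the paper.
    """
    rng = random.Random(seed)
    tss = set(tss_positions)
    # Only centres with a full window on both sides are usable.
    valid = range(flank, len(chrom_seq) - flank)

    examples = []
    for pos in sorted(tss):
        if pos in valid:
            examples.append((chrom_seq[pos - flank:pos + flank + 1], 1))
    n_pos = len(examples)

    # Negatives: random centres that are not annotated TSS positions.
    candidates = [pos for pos in valid if pos not in tss]
    n_neg = min(len(candidates), n_pos * neg_per_pos)
    for pos in rng.sample(candidates, n_neg):
        examples.append((chrom_seq[pos - flank:pos + flank + 1], 0))
    return examples


if __name__ == "__main__":
    toy_chrom = "ACGT" * 10          # toy sequence, not real genome data
    data = make_examples(toy_chrom, tss_positions=[10, 20])
    for window, label in data:
        print(label, window)
```

Raising `neg_per_pos` reproduces the class imbalance the abstract refers to, since true TSS positions are far rarer than non-TSS positions in a genome; the encoded windows from `one_hot` could then be fed to either an SVM or an LSTM.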
