|

RNA3DB: A dataset for training and benchmarking deep learning models for RNA structure prediction.

Researchers

Journal

Modalities

Models

Abstract

With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed by the existing literature, many of which report inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. Our dataset clusters RNA 3D chains into distinct groups that are non-redundant both with regard to sequence as well as structure, providing a robust way of dividing training, validation, and testing sets. For the PDB RNA chains as of 2024-01-10, RNA3DB produces 118 independent components with a total of 1,645 distinct RNA sequences with 21,005 reported crystal structures, representing 216 different Rfam structural families. A potential split consists of a training set of 1,152 RNA sequences, with 9,832 experimentally determined structures that belong to 169 distinct RNA structural Rfam families (at an E-value of 10 -3 ), and a test set of 493 RNA sequences with 1,344 structures that belong to 47 structural Rfam families. This split guarantees that all test RNA chains are distinct by sequence and structure from those in the training set. We provide the methodology along with the source-code, with the goal of creating a reproducible and customizable tool for RNA structure prediction.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *