|

An intrinsically interpretable neural network architecture for sequence to function learning.

Researchers

Journal

Modalities

Models

Abstract

Sequence-based deep learning approaches have been shown to predict a multitude of functional genomic readouts, including regions of open chromatin and RNA expression of genes. However, a major limitation of current methods is that model interpretation relies on computationally demanding post-hoc analyses, and even then, we often cannot explain the internal mechanics of highly parameterized models. Here, we introduce a deep learning architecture called tiSFM (totally interpretable sequence to function model). tiSFM improves upon the performance of standard multi-layer convolutional models while using fewer parameters. Additionally, while tiSFM is itself technically a multi-layer neural network, internal model parameters are intrinsically interpretable in terms of relevant sequence motifs.tiSFM’s model architecture makes use of convolutions with a fixed set of kernel weights representing known transcription factor (TF) binding site motifs. We analyze published open chromatin measurements across hematopoietic lineage cell-types and demonstrate that tiSFM outperforms a state- of-the-art convolutional neural network model custom-tailored to this dataset. We also show that it correctly identifies context specific activities of transcription factors with known roles in hematopoietic differentiation, including Pax5 and Ebf1 for B-cells, and Rorc for innate lymphoid cells. tiSFM’s model parameters have biologically meaningful interpretations, and we show the utility of our approach on a complex task of predicting the change in epigenetic state as a function of developmental transition.The source code, including scripts for the analysis of key findings, can be found at https://github.com/boooooogey/ATAConv , implemented in [email protected].

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *