Insertion-deletions are depleted in protein regions with predicted secondary structure.

Abstract

A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here we test the null hypothesis that insertion-deletion events (indels) in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes, then quantified their overlap with four types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning method AlphaFold2. Indels overlapped secondary structures 54% as much as expected, and were especially under-represented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structure. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequencies than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.

© The Author(s) 2024. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
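The "as much as expected" comparison in the abstract can be illustrated with a small observed-over-expected calculation. The sketch below is not the authors' pipeline: it assumes each indel is reduced to a single 0-based residue position, that the per-residue structure annotation is a simple boolean mask, and that the null expectation is uniform placement along the protein. The function name `observed_over_expected` and the toy data are hypothetical.

```python
from typing import Sequence


def observed_over_expected(indel_sites: Sequence[int],
                           in_structure: Sequence[bool]) -> float:
    """Ratio of observed to expected indel overlap with structured residues.

    Assumption: under the null, indels land uniformly at random along the
    protein, so the expected overlap equals the fraction of residues that
    are annotated as structured.
    """
    if not indel_sites:
        raise ValueError("need at least one indel site")
    n_structured = sum(in_structure)
    if n_structured == 0:
        raise ValueError("no structured residues; ratio is undefined")
    # Fraction of indels that fall on a structured residue.
    observed = sum(in_structure[i] for i in indel_sites) / len(indel_sites)
    # Fraction of all residues that are structured.
    expected = n_structured / len(in_structure)
    return observed / expected


# Toy protein of 10 residues; positions 2..6 are annotated as structured.
mask = [False, False, True, True, True, True, True, False, False, False]
# Four hypothetical indel positions; only position 3 is structured.
sites = [0, 1, 3, 8]
ratio = observed_over_expected(sites, mask)
print(f"observed/expected = {ratio:.2f}")
```

A ratio below 1 indicates depletion relative to the uniform null and a ratio above 1 indicates enrichment. A real analysis would also need to handle multi-residue indels and assess significance, which this sketch leaves out.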
