Improving protein structure prediction with extended sequence similarity searches and deep-learning-based refinement in CASP15.

Abstract

The human predictor team PEZYFoldings got first place with the assessor’s formulae (3rd place with Global Distance Test Total Score [GDT-TS]) in the single-domain category and 10th place in the multimer category in Critical Assessment of Structure Prediction 15. In this paper, I describe the exact method used by PEZYFoldings in the competition. As AlphaFold2 and AlphaFold-Multimer, developed by DeepMind, were state-of-the-art structure prediction tools, it was assumed that enhancing the input and output of the tools was an effective strategy to obtain the highest accuracy for structure prediction. Therefore, I used additional tools and databases to collect evolutionarily related sequences and introduced a deep-learning-based model in the refinement step. In addition to these modifications, manual interventions were performed to address various tasks. Detailed analyses were performed after the competition to identify the main contributors to performance. Comparing the number of evolutionarily related sequences I used with those of the other teams that provided AlphaFold2’s baseline predictions revealed that an extensive sequence similarity search was one of the main contributors. Nonetheless, there were specific targets for which I could not identify any evolutionarily related sequences, resulting in my inability to construct accurate structures for these targets. Notably, I noticed that I had gained large Z-scores with the subunits of H1137, for which I performed manual domain parsing considering the interfaces between the subunits. This finding implies that the manual intervention contributed to my performance. The influence of the refinement model on the accuracy of structure prediction was minimal. I could have predicted structures with a similar level of accuracy without employing the refinement model. However, from the perspective of accuracy self-estimate, many structures demonstrated improvement after refinement. 
This improvement likely had a substantial influence on improving my position in the assessor’s formulae rankings. These results highlight the opportunities for improvement in (1) multimer prediction, (2) building of larger and more diverse databases, and (3) developing tools to predict structures from primary sequences alone. In addition, transferring the manual intervention process to automation is a future concern. © 2023 Wiley Periodicals LLC.
