Protein structure prediction, a fundamental challenge in computational biology, aims to predict a protein's 3D structure from its amino acid sequence. This structure is pivotal for elucidating protein function and interactions and for driving innovation in drug discovery and enzyme engineering. AlphaFold, a powerful deep learning model, has revolutionized this field by leveraging phylogenetic information from multiple sequence alignments (MSAs) to achieve remarkable accuracy in protein structure prediction. However, a key question remains: how well does AlphaFold understand protein structures? This study investigates AlphaFold's capabilities when it relies primarily on high-quality template structures, without the additional information provided by MSAs. By designing experiments that probe local and global structural understanding, we dissect its dependence on specific features and its ability to handle missing information. Our findings reveal AlphaFold's reliance on sterically valid C-β atoms for correctly interpreting structural templates. Additionally, we observe its remarkable ability to recover 3D structures from certain perturbations. Collectively, these results support the hypothesis that AlphaFold has learned an accurate local biophysical energy function. However, this function seems most effective for local interactions. Our work advances the understanding of how deep learning models predict protein structures and provides guidance for researchers aiming to overcome limitations in these models.
make install-environment
- also install DSSP into
T_A
make install-attnpacker
- install OpenFold in its own environment
openfold_env
- install RFDiffusion into its own environment
- install other utilities if not done before:
- perform installation
bash bash_scripts/bash-packing.sh
→ you have to adjust for CASP13 or CASP14
- get results by running
analysis/Packing_results.ipynb
- get SASA analysis by running
analysis/SASA_analysis.ipynb
- perform installation
bash bash_scripts/bash-synthetic-backbones.sh
→ you have to adjust for CASP13 or CASP14 and give your own MAXIT_PATH
- get results by running
analysis/Recovery_and_prev_x_results.ipynb
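The steps above refer to several scripts and notebooks by path. As a minimal sketch, the helper below lists which of those paths are missing from a checkout before you start a run. The function `missing_files` is hypothetical and not part of this repository; it only tests for file existence and assumes it is called from the repository root.

```shell
# Hypothetical helper, not part of the repository: print every path from the
# argument list that does not exist under the given root directory.
# Note: `root` and `f` are plain (global) shell variables.
missing_files() {
    root=$1
    shift
    for f in "$@"; do
        [ -e "$root/$f" ] || printf '%s\n' "$f"
    done
}

# Check the paths named in the steps above, relative to the current directory.
missing_files . \
    bash_scripts/bash-packing.sh \
    bash_scripts/bash-synthetic-backbones.sh \
    analysis/Packing_results.ipynb \
    analysis/SASA_analysis.ipynb \
    analysis/Recovery_and_prev_x_results.ipynb
```

An empty output means all listed files were found. The check does not verify that the environments, DSSP, or MAXIT are installed.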
If there are questions, please file a GitHub issue or send an e-mail to thomas.lemmin@unibe.ch and jannik.gut@unibe.ch.