Researchers have used AlphaFold, an artificial intelligence system developed by Google DeepMind, to predict the structures of more than 200 million proteins from some 1 million species, covering almost every known protein on the planet. This achievement has opened up new possibilities for understanding the function and evolution of proteins, the building blocks of life.
To analyse the vast amount of data generated by AlphaFold, the researchers developed a new algorithm called Foldseek Cluster, which can efficiently compare and group similar protein structures. The algorithm identified over 2 million unique structural clusters, representing distinct protein shapes that have emerged throughout evolution. One third of these clusters had no previous annotations, meaning they had not been described or categorised before.
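The idea of comparing structures and grouping similar ones can be illustrated with a toy example. The sketch below is not the Foldseek Cluster implementation; it is a minimal greedy clustering, assuming each structure is reduced to a string of secondary-structure letters (H for helix, E for strand, C for coil) and that similarity is simply the fraction of matching positions. The names `toy_similarity` and `greedy_cluster`, the 0.8 threshold, and the example strings are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence


def toy_similarity(a: str, b: str) -> float:
    """Fraction of positions that match, relative to the longer string.

    A stand-in for a real structural alignment score (an assumption
    made for this example only).
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def greedy_cluster(
    items: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Dict[int, List[int]]:
    """Greedy representative-based clustering.

    Each item joins the first existing cluster whose representative it
    resembles closely enough; otherwise it starts a new cluster and
    becomes that cluster's representative. Keys are representative
    indices, values are the member indices.
    """
    clusters: Dict[int, List[int]] = {}
    for i, item in enumerate(items):
        for rep in clusters:
            if similarity(items[rep], item) >= threshold:
                clusters[rep].append(i)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[i] = [i]
    return clusters


# Hypothetical secondary-structure strings standing in for protein shapes.
shapes = ["HHHHEEEE", "HHHHEEEC", "EEEECCCC", "EEEECCCH", "HHHHEEEE"]
clusters = greedy_cluster(shapes, toy_similarity, threshold=0.8)
print(clusters)  # {0: [0, 1, 4], 2: [2, 3]}
```

Comparing each new item only against cluster representatives, rather than against every other item, is what keeps this kind of approach tractable as the number of structures grows; a clustering of hundreds of millions of real structures would additionally need fast prefiltering and a proper structural alignment score.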
The researchers, from EMBL’s European Bioinformatics Institute (EMBL-EBI), the Institute of Molecular Systems Biology at ETH Zurich, and the School of Biological Sciences at Seoul National University, published their findings in the journal Nature.
Insights into the origin of human immunity proteins
One of the surprising discoveries made by the researchers was the structural similarity between human and bacterial proteins involved in immunity. These proteins, called Toll-like receptors (TLRs), recognise foreign molecules and trigger an immune response. The researchers found that some human TLRs share a common ancestor with bacterial proteins that sense DNA damage.
This finding suggests that human immunity proteins may have evolved from ancient bacterial proteins that detected and repaired DNA damage. The researchers speculate that this evolutionary link may explain why some bacterial infections can trigger autoimmune diseases such as lupus, in which the immune system attacks the body’s own tissues and can produce antibodies against its own DNA.
The researchers also found that some human TLRs have a unique structural feature that allows them to bind to a variety of molecules, such as lipids, proteins, and nucleic acids. This feature may have evolved to enhance the diversity and specificity of immune recognition.
A transformative resource for protein research
Proteins are essential for all biological processes, from metabolism to signalling. Their function depends on their three-dimensional shape, which is determined by their sequence of amino acids. However, predicting a protein’s structure from its sequence alone was a problem that eluded scientists for decades.
AlphaFold, a deep-learning artificial intelligence system, has revolutionised the field of protein structure prediction by achieving unprecedented accuracy and speed. The system uses neural networks to recognise patterns in protein sequences and structures, and then predicts the most likely shape for a given sequence.
The AlphaFold database, which is publicly available, contains predicted structures for nearly all catalogued proteins. The database fills a critical gap in understanding protein function and evolution, as only a small fraction of proteins have had their structures determined experimentally.
The database has already enabled researchers to gain new insights into a range of biological problems, including malaria, Parkinson’s disease, honeybee health, and human evolution. As the database continues to expand, algorithms such as Foldseek Cluster are emerging as critical tools for navigating and interpreting the wealth of information made available by AI predictions.