In 1957, a biochemist and crystallographer named John Kendrew became the first person to determine the 3D structure of a protein. Deciphering that one structure – that of myoglobin, the protein responsible for supplying oxygen to our muscles – had taken him more than two decades of painstaking research, and it was such a significant discovery that it would later win him the Nobel Prize.
In 2022, DeepMind’s AlphaFold artificial intelligence predicted the structures of 200 million more.
“Essentially, you can think of it as covering the entire protein universe,” Demis Hassabis, DeepMind’s founder and chief executive, told The Guardian. “It includes predictive structures for plants, bacteria, animals, and many other organisms, opening up huge new opportunities for… important issues, such as sustainability, food insecurity, and neglected diseases.”
It’s hard to overstate what a big deal this is: proteins are the building blocks of life, determining every biological process that occurs – from kickstarting life itself to causing (and maybe curing) cancers. But so far, we’ve properly understood only a tiny fraction of a fraction of them, with current experimental methods having determined only 190,000 protein structures.
That may sound like a lot overall, but it’s the equivalent of one known protein structure for every 999 unknown. The problem is that, while it’s simple enough in this modern age to sequence the DNA that encodes a protein – thereby learning the chain of amino acids that makes it up – it’s the 3D structure that determines the protein’s actual function. You could think of it as being like tiny, super-complicated origami, except all you’ve been given is a sheet of paper and the information that if you fold it right, you should end up with some kind of bird.
“Determining the 3D structure of a protein used to take many months or years,” Eric Topol, Founder and Director of the Scripps Research Translational Institute, said in a statement. But in November 2020, the AI company DeepMind released AlphaFold: a program that can rapidly predict a protein’s 3D structure from its amino acid sequence.
Finding those structures “now takes seconds,” Topol said. “And with this new addition of structures illuminating nearly the entire protein universe, we can expect more biological mysteries to be solved each day.”
AlphaFold has already proven itself in the field: last year, DeepMind published its predictions for the structures of nearly every human protein – all 20,000 of them. That database “became an essential tool for biopharma research nearly overnight,” said Rosana Kapeller, President & CEO of biotech company ROME Therapeutics, in a statement.
“It is allowing us to predict protein structures in areas of the dark genome that have never been solved for before,” she explained. “AlphaFold speed and accuracy is accelerating the drug discovery process, and we’re only at the beginning of realising its impact on getting novel medicines to patients faster.”
But with this latest announcement, AlphaFold is expanding the number of known or predicted protein structures by more than 200 times. The update includes predicted structures for plants, bacteria, animals, and other organisms – in fact, nearly every protein known to science.
“As someone who’s been in genomics and computational biology since the 1990s, I’ve seen many of these moments come where you can sense the landscape shifting under you and the provision of new resources, and this has been one of the fastest,” Ewan Birney told New Scientist. “I mean, two years ago, we just simply did not realize that this was feasible.”
Birney is Deputy Director General of the European Molecular Biology Laboratory, or EMBL, and Joint Director of the EMBL’s European Bioinformatics Institute (EMBL-EBI). In collaboration with DeepMind, the Institute has created a free searchable database of AlphaFold’s predictions, which has been accessed by more than half a million researchers across the world in the year since its launch.
The availability of so many new protein structures is great news for working scientists across countless disciplines. “It’s been so inspiring to see the myriad ways the research community has taken AlphaFold,” said Demis Hassabis in a statement.
“[They are] using it for everything from understanding diseases, to protecting honey bees, to deciphering biological puzzles, to looking deeper into the origins of life itself.”
But as with so many scientific breakthroughs, the announcement is likely to raise as many questions as it answers – if not more. The structures in the database have so far turned out to be pretty accurate – but they’re only predictions, and they are based only on already-determined structures. More to the point, the algorithm can’t yet tell us anything about the interactions between proteins, or reveal how proteins fold, as opposed to just their final structure.
“Once you discover one thing, then there are more problems thrown up,” Keith Willison, chair of chemical biology at Imperial College London, told New Scientist.
“It’s quite terrifying actually, how complicated biology is.”