World-leading Australian science: 17M protein structures added to the AlphaFold Database to accelerate the fight against antimicrobial resistance
Australian researcher George Bouras has recently contributed an extraordinary 17 million protein structure predictions to an international open-access database. The availability of this large-scale dataset in the AlphaFold Protein Structure Database will have a transformative impact on the international fight against antimicrobial resistance.
As the lead of the AllTheBacteria protein structure prediction project, the Adelaide University bioinformatician and current PhD student completed his world-leading work using resources made available through a BioCommons partnership with Pawsey Supercomputing Research Centre. George’s work became possible only when the right human and compute resources came together. Working closely with BioCommons, Pawsey’s Dr Sarah Beecroft containerised the ColabFold tool for use on Setonix’s AMD GPUs. Once ColabFold was ported and stable, George could use the massive scale of Pawsey’s Setonix supercomputer to create the 17 million structural predictions for bacterial proteins.
Source image: AlphaFold prediction of a banna virus spike protein VP4 (AF-0000000365762994-v1). Design credit: Karen Arnott/EMBL-EBI.
George was honoured to contribute to the AlphaFold Database. His dataset is one of several high-value datasets for microbial and viral proteins selected from specialist communities. This integration of essential, high-quality datasets from users reinforces the AlphaFold Database’s role as an inclusive, community-driven resource. The database provides open access to over 200 million protein structure predictions, and its developers, Google DeepMind and EMBL’s European Bioinformatics Institute (EMBL-EBI), want to expand its impact in specialist areas including pandemic preparedness, antimicrobial resistance, neglected tropical diseases and environmental sciences.
“I hope that access to these novel bacterial protein structures derived from high-quality genome assemblies will lead to better understanding of the function of all bacterial proteins.” – George Bouras, lead of the AllTheBacteria protein structure prediction project
The availability of the ColabFold container for use on Setonix’s AMD GPUs also allowed George to generate more than 3 million phage and viral structures. Researchers around the world now use these structures hundreds of thousands of times each day for protein structure-informed bacteriophage genome annotation.
This example shows how close working relationships with both researchers and the Tier-1 HPC infrastructures enable BioCommons to respond precisely to community needs and accelerate Australian science at scale. Making valuable tools accessible on national platforms is a focus of the BioCommons BioCLI project, and this work paves the way for the creation of a new national protein folding service, further streamlining the use of Pawsey computing resources.
A publication about this work is currently under peer review. In the meantime, you can read the release from EMBL-EBI.