News
Multi-model 3D visualisation enhances Nextflow pipeline for protein structure prediction
Community-driven enhancements to the Nextflow-based nf-core proteinfold pipeline have simplified the parallel execution, visualisation and comparison of multiple protein structure prediction models, including AlphaFold2, ColabFold and ESMFold.
Predicted protein structures for LmrP visualised using the proteinfold pipeline
Advances in AI are taking protein structure predictions to a whole new level, accelerating research and enabling deeper analysis of protein structure and function. The nf-core community is embracing these developments by building the Nextflow proteinfold pipeline, which integrates models such as AlphaFold2, ColabFold and ESMFold and simplifies their use on a variety of computing infrastructures.
BioCommons’ Dr Ziad Al Bkhetan, Product Manager - Bioinformatics Platforms and Australian Nextflow Ambassador, identified an opportunity to optimise the existing nf-core proteinfold pipeline for Australian researchers using the Australian Nextflow Seqera Service. Ziad initiated this effort by reaching out to the original developers from the Center for Genomic Regulation (CRG) in Spain with an offer to reconfigure the pipeline and add new features. This sparked an international collaborative effort that connected researchers and experts from Australian BioCommons, the CRG, the Sydney Informatics Hub (SIH) at the University of Sydney and the Structural Biology Facility (SBF) at UNSW, at several hackathons and summits to enhance the pipeline. The enhanced, community-driven pipeline is now available to all through nf-core’s curated set of open‑source analysis pipelines.
The pipeline borrows a useful reporting and visualisation feature already implemented in Galaxy Australia. Front-end developer for BioCommons, Minh Vu, augmented the pipeline to implement this feature, which allows the parallel execution of multiple models and the generation of reports that visualise the resulting structures, simplifying comparison and benchmarking of the outputs. Several state-of-the-art tools such as AlphaFold2, ColabFold and ESMFold are included in the pipeline, with additional models including RoseTTAFold-All-Atom, HelixFold3, Boltz, RoseTTAFold2NA and AlphaFold3 to be added soon.
The ability to run different models through the pipeline without writing new code removes the barriers posed by the command line and complicated compute infrastructure. Reflecting on the project in the Nextflow Podcast, Phil Ewels, Product Manager for Open Source at Seqera, said:
“With almost no setup and no real prior experience, you can run these state of the art models and compare them all in a dynamic visual report. That’s pretty amazing.”
While it is designed to integrate with Seqera Platform, there’s no requirement to use it that way. Running the Nextflow pipeline on the command line gives the exact same reports. The code is freely available for others to use or improve via the nf-core repository of pipelines.
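As a rough illustration of what a command-line run could look like, the sketch below builds one `nextflow run` command per prediction model. The `--input`, `--outdir` and `--mode` parameters and the mode names are assumptions based on the pipeline's public usage notes and should be checked against the current nf-core/proteinfold documentation; the samplesheet name and output folders are hypothetical, and real runs also need database and hardware settings that are omitted here. The commands are only printed, not executed.

```python
import shlex


def build_proteinfold_command(mode, samplesheet, outdir, profile="docker"):
    """Assemble a `nextflow run` argument list for one prediction mode.

    Parameter names are assumptions; verify them against the pipeline docs.
    """
    return [
        "nextflow", "run", "nf-core/proteinfold",
        "--input", samplesheet,   # hypothetical samplesheet path
        "--outdir", outdir,       # hypothetical output folder
        "--mode", mode,           # assumed model selector
        "-profile", profile,      # container/executor profile
    ]


# Print one command per model so the resulting reports can be compared.
for mode in ("alphafold2", "colabfold", "esmfold"):
    cmd = build_proteinfold_command(mode, "samplesheet.csv", f"results/{mode}")
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the sketch safe to try on a machine where Nextflow is not installed.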
Ziad’s presentation about the collaboration and these new features was singled out in Seqera’s Nextflow Podcast as a highlight of the recent Nextflow Summit. Bioinformatics Engineer at Seqera, Dr Florian Wünnemann, acknowledges there is great value in improving shared resources:
“I think it really represents the best of the Nextflow community: they are developing tools and not just keeping it for themselves, but directly giving them back to the larger community.”
Rob Syme, Scientific Support Lead at Seqera Labs, believes the work speaks to the Nextflow and Seqera ethos of giving scientists and researchers the tools they need to build other tools.
“I love this project: it was an amazing outcome that required no input from Seqera or Nextflow. Yes, Seqera Platform could absolutely build an alignment viewer into the platform, but it wouldn’t be as good as if researchers themselves develop it. It wouldn’t be as good as the one that Ziad and the team have developed because research moves so incredibly quickly.”
The collaboration within the international nf-core community has been a rewarding experience for all involved parties, and CRG has forged a new working relationship with BioCommons to continue development and maintenance of the pipeline. CRG’s Dr Cedric Notredame said of the experience:
“The collaboration with BioCommons has been so valuable. It has showcased the effectiveness of nf-core as a collaborative tool. Thanks to this framework, all of our teams were able to simultaneously contribute to the pipeline with minimal technical coordination. The pipeline is now one of the most complete go-to resources, covering the needs of a wide community of biologists interested in structural aspects of genomics.”
The improvements made to the visualisation code during the project will also be fed back into the Galaxy codebase. BioCommons’ close connections with research communities mean that the national Structural Biology Computing community is now testing and finessing the pipeline, and supporting the creation of user documentation.
Sharing what’s been learnt through a publication about the nf-core/proteinfold pipeline is on the horizon, and a pilot Australian ProteinFold Service is under development.
An Australian community for computational structural biology
A passionate group of structural biologists has formed the Australian Structural Biology Computing Community, to share computational knowledge, methods, and resources.
The active new community is receiving support from a range of partners and advocates, including (L-R) Johan Gustafsson (BioCommons), Steven Manos (BioCommons), Kate Michie (UNSW) and Andrew Gilbert (Bioplatforms Australia).
The explosion of possibilities presented by deep learning approaches in structural biology research has created many new opportunities and challenges. A passionate group of structural biologists has formed the Australian Structural Biology Computing Community, to approach this new era as part of a community that shares computational knowledge, methods, and resources.
This community-driven approach brings together a diverse group of people, with initial contributions forming around leads from the Structural Biology Facility at UNSW, and an academic panel of experts from Monash University, Walter and Eliza Hall Institute of Medical Research (WEHI), University of Western Australia (UWA), Australian National University (ANU), Bio21 Institute of Molecular Science and Biotechnology (Bio21), University of Melbourne, La Trobe University, University of Queensland (UQ) - IMB, University of Sydney, Griffith University, Swinburne University of Technology, CSIRO, and the University of Adelaide. Anyone involved in structural biology in Australia is invited to join and there are lots of different ways to get involved.
The Community for Structural Biology Computing in Australia webpage is a useful new resource for all users of computing for structural biology research in Australia. The page is constantly evolving and expanding, and it currently focuses on the use of deep learning methods in structural biology. It includes practical guides on topics like “Best practices for presenting and sharing AlphaFold models in a paper” as well as news items and announcements for relevant courses and meetings.
Australian BioCommons supports the community by hosting quarterly online meetings that aim to tease out how computational structural biologists’ challenges might be addressed with community-scale responses and national research infrastructure solutions. If you join the mailing list via the community webpage, you will receive updates and invitations to community meetings and the discussions in Slack.
BioCommons began providing broad, fully subsidised access to structural prediction in 2022 by making AlphaFold2 available within its Galaxy Australia service. The Australian AlphaFold2 Service provides both an easy-to-use interface and dedicated GPUs to Australian researchers. When BioCommons hosted the international 2023 Galaxy Community Conference, the keynote speech by Chief Scientist of the Structural Biology Facility at UNSW, Kate Michie, generated much excitement around forming an Australian community of practice for computational structural biology as an avenue for collectively addressing the challenges presented by deep learning in structural biology.
BioCommons has supported key research stakeholders to refine the new community’s purpose, begun running quarterly community meetings, and helped to establish shared community spaces like the Australian Structural Biology Computing website and GitHub. BioCommons has also facilitated consultations with infrastructure partners and the broader computational infrastructure community, and a national panel of experts has been identified.
This community collaborates with their peers to:
Collectively create and maintain community forums and centralised collaboration platforms to support collaboration and knowledge sharing (i.e. methods and documentation);
Foster collaboration between structural biologists, computer scientists, and data scientists, thereby creating interdisciplinary teams to help tackle complex challenges, validate results and ensure robust applications of deep learning methods;
Lead the review, prioritisation, testing, optimisation, and sharing of deep learning codes, software and approaches that are of broad relevance and interest to the Australian research community;
Develop quality assessment tools to help evaluate the quality of calculated structures, and help guide researchers towards reliable predictions; and,
Address the ethical implications of AI-driven structural predictions, as well as discuss transparency, bias and interpretability to ensure responsible use of these technologies.
The Australian Structural Biology Computing Community is poised to tackle a set of pilot activities aimed at fast-tracking a national response to the challenges facing computational approaches in structural biology. A much-anticipated future output is an infrastructure roadmap document that will formalise and describe the high-level requirements of the community. This collaborative effort between the new Australian Structural Biology Computing Community, Australian BioCommons, and BioCommons infrastructure partners will support the Australian structural biology community as new needs arise relating to bioinformatics tools, software, infrastructure or training.
Keep in touch by subscribing for updates at the Community for Structural Biology Computing in Australia webpage.
Repurposed hardware boosts national capacity and powers innovation
QCIF Ltd has made high-performance hardware available to the Australian BioCommons, giving the hardware a second life and uplifting national capacity for running AlphaFold 2 jobs in Galaxy Australia while supporting innovation through other GPU-enabled tools.
This story is co-published with QCIF Ltd
After successfully completing a previous project, QCIF Ltd made high-performance hardware available to the Australian BioCommons, giving the hardware a second life in enabling research and uplifting national capacity for the benefit of the scientific community.
Well suited for running AlphaFold 2 jobs, the five General-Purpose Graphics Processing Units (GPGPUs) are now being used to enhance the national compute network behind the Galaxy Australia service.
The impact of this repurposing goes beyond infrastructure improvements. It has significantly expanded Galaxy Australia's capacity to support research and innovation by enabling the use of other GPU-enabled tools that offer major benefits to the scientific community. GPU processing can provide massive improvements in computational efficiency, decreasing processing times to less than 5% of conventional equivalents.
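To put the "less than 5%" figure in perspective, the short sketch below converts a run time expressed as a fraction of the conventional run time into a speed-up factor. The 10-hour baseline is a hypothetical value chosen only for illustration; it is not a measurement from Galaxy Australia.

```python
def speedup_factor(time_fraction: float) -> float:
    """Speed-up implied by a run time given as a fraction of the baseline."""
    if not 0 < time_fraction <= 1:
        raise ValueError("time_fraction must be in (0, 1]")
    return 1.0 / time_fraction


# A job that takes 5% of the conventional time runs about 20 times faster.
factor = speedup_factor(0.05)

hypothetical_baseline_hours = 10.0  # illustrative only, not a measured figure
accelerated_hours = hypothetical_baseline_hours * 0.05

print(f"about {factor:.0f}x faster: "
      f"{hypothetical_baseline_hours} h becomes {accelerated_hours} h")
```

Because the article states "less than 5%", the 20-fold figure is a lower bound on the speed-up for the cases it describes.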
Dr Cameron Hyde, a bioinformatician at QCIF who supports the development of national software platforms like Australian BioCommons' Galaxy and Apollo services, co-authored the original AlphaFold 2 wrapper that enabled the tool to run within Galaxy Australia, ensuring both a friendly user interface and instant access to the GPU clusters required to power the tool. The repurposed hardware was originally part of an investment made in 2021 by the Australian Research Data Commons (ARDC) to support national platform projects, and it now directly enhances the bioinformatics services he helps deliver to Australian researchers. He shared his enthusiasm for the new possibilities it unlocks:
“Now that we have five GPU nodes of our own, we have room to experiment and explore new GPU-enabled tools. This gives us room to innovate beyond AlphaFold and accelerate scientific discovery in other research domains.”
For example, Galaxy Australia’s lead Bioinformatician, Michael Thang, has been using the hardware to explore running Nanopore’s “Dorado” on Galaxy Australia. Dorado is a high-performance basecaller for Oxford Nanopore Technologies sequencing data. This innovation would enable researchers to conduct their entire analysis, from raw sequencing data through to assembled genome, all within the Galaxy Australia service.
Collaboration driving innovation
Developed by Google DeepMind, AlphaFold is an AI system that predicts a protein’s 3D structure from its amino acid sequence with accuracy comparable to experimental methods. In 2020, Australian BioCommons identified an opportunity to democratise access to this powerful tool by making AlphaFold 2 available through Galaxy Australia. This made AlphaFold 2 much more accessible to Australian researchers, allowing life scientists to easily visualise proteins in a manner previously available only to dedicated structural biology researchers. This advance has supported research into protein-protein interactions, activation and inhibition mechanisms, and drug design.
By 2025, use of AlphaFold 2 had surged, evolving from an analytical tool for individual proteins into a routine screening tool for studying protein-protein interactions. To support this shift, Dr Hyde collaborated closely with the Australian Structural Biology Computing Community to develop extensions to the AlphaFold Galaxy tool, including new output formats, input parameters, and an option to re-use intermediate files for improved efficiency.
Supported by the Australian BioCommons, AARNet, QCIF Ltd, and The University of Melbourne, the optimised system now provides fully subsidised access for all Australian researchers via the Australian AlphaFold Service. We extend our sincere thanks to the Australian Research Data Commons (ARDC) for providing the hardware to QCIF Ltd and enabling its reuse by Australian BioCommons.
Creative collisions: Bio Day a hit at Supercomputing Asia 2024
Learn more about the dedicated ‘Bio Day’ at SCA, which focused on the intersection of biology and computing.
This month's Supercomputing Asia (SCA) conference featured a dedicated ‘Bio Day’ which focused on the intersection of biology and computing. Life scientists were enthusiastically invited to interact with the Asia Pacific high performance computing (HPC) community at the Sydney event. The conference organisers offered special access to almost 40 researchers and research infrastructure providers who were keen to participate in the biology-focused sessions. This extra support to add the unique voice of life scientists to the HPC forum was generously provided through Bioplatforms Australia's platinum sponsorship of the event.
Bio Day commenced with Prof Alex Brown, Director - National Centre for Indigenous Genomics, delivering a keynote presentation ‘Towards a National Indigenous Genomics Ecosystem within Australia.’ As Professor of Indigenous Genomics at the Telethon Kids Institute and The Australian National University, Alex is an internationally leading Aboriginal clinician/researcher who has worked his entire career in Aboriginal health in the provision of public health services, infectious diseases and chronic disease care, health care policy and research.
Later, sessions titled ‘Building the Foundation: Genomic Data Infrastructure for Precision Medicine and Beyond’ showcased several key pieces of research infrastructure that Australian BioCommons has developed to support life scientists including:
The newly funded GUARDIANS project
A pilot program bringing Seqera Platform to Australian researchers
Some of BioCommons’ significant national partners such as the Australian Amphibian and Reptile Genomics Initiative (AusARG) and international collaborators ELIXIR were also showcased on Bio Day. Additionally, Dr Kate Michie’s (UNSW) talk revealed the ‘Transformative Impact of Deep Learning on Accelerating Molecular Research: A Focus on AlphaFold2 and its Implementation Challenges.’ The Skills and Training Track on the same day also featured our training guru, Dr Melissa Burke, presenting our unique Training Cooperative model.
Sessions held on Bio Day illuminated the unique challenges that bioinformatics research brings to HPC, including:
Episodic and extended access is required for compute resources
Compute use is reliant on experimental outcomes, and difficult to predict in advance
Software is diverse, rapidly evolving, and in many cases not optimised for HPC
Researchers may have limited experience working in HPC environments
The light shone on these unique challenges stimulated some uncommon conversations at SCA, which aim to improve life science researchers' access to appropriate and scalable bioinformatics methods and compute resources. Dr Johan Gustafsson, Bioinformatics Engagement Officer at BioCommons, said:
“The conference was a unique opportunity to bring two worlds together - researchers working hard in their particular field of biology don’t normally attend HPC conferences, and vice versa. So it was great to see them starting to speak the same language!”
Uwe Winter, BioCloud DevOps Engineer at BioCommons, attended a workshop on the recently launched Trillion Parameter Consortium (TPC), a group formed to address the challenges of building large-scale artificial intelligence (AI) systems and advancing trustworthy and reliable AI for scientific research.
“Discussions at the TPC workshop brought up a lot of exciting ideas on utilising AI in a fully automated research environment. I was inspired to hear TPC’s future plans and can’t wait to apply them to BioCommons infrastructure for the benefit of Australian researchers!”
Overall, Bio Day at SCA was a fantastic chance to continue important conversations around the specialised support and infrastructure that life scientists need. BioCommons extends our thanks to Bioplatforms Australia for their sponsorship and to the conference organisers for running a successful event.
Launching the Australian AlphaFold Service
A new service is now offering researchers the chance to accelerate their research through access to AlphaFold. The Australian AlphaFold Service delivers AlphaFold through Galaxy Australia, offering an easy to use interface coupled with the required computational resources at no cost to Australian researchers.
A new Australian BioCommons service is now offering researchers the chance to accelerate their research through access to AlphaFold. By providing this new capability through Galaxy Australia, the powerful tool is backed by the required computational resources at no cost to Australian researchers. After an intensive period of testing across many fields of biology, access is now open to the new Australian AlphaFold Service.
When the AI system AlphaFold was released by DeepMind, it promised to replace painstakingly laborious and time-consuming experimental techniques for determining protein structures with accurate predictions that can take minutes. Researchers jumped at the opportunity to investigate the 3D structure predictions built using only their primary amino acid sequences.
In July 2022, DeepMind partnered with EMBL’s European Bioinformatics Institute (EMBL-EBI) to release the predicted structures for nearly all catalogued proteins known to science, expanding the AlphaFold Database to over 200 million structures. With accuracy rivalling real-life experiments and the results shared freely, the impact of the tool on research was felt internationally.
Galaxy Australia seized the opportunity to facilitate access to the tool for even more researchers. They made AlphaFold 2 available in their easy-to-use interface, and paired it with the GPU clusters required to power AlphaFold.
To design the best possible service, Galaxy Australia worked with a group of beta testers who provided feedback on how to improve the experience for their particular research application. The testers were a diverse group of Australian researchers who brought with them a wide range of experience with AlphaFold and Galaxy.
Dr Kate Michie runs the Structural Biology Facility at the Mark Wainwright Analytical Centre, UNSW, and has incorporated AlphaFold into her standard practice. The protein structure prediction speeds up laborious and sometimes frustrating protein crystallisation, saving time and money in the wet lab process.
“As structural biologists, we can use AlphaFold for both designing constructs and solving protein structures. It is quickly becoming the first thing you do, to inform what happens in the lab.”
Kate is currently looking at ways to sustain her ongoing use of AlphaFold in the long term. The computationally intensive jobs are expensive to run, so she was keen to sign up to become a beta tester for the new service. These types of jobs may not run as fast as on a dedicated private facility, but once submitted to the Australian AlphaFold Service they conveniently run in the background at no cost to researchers.
Another benefit of using AlphaFold in this way comes from the easy-to-navigate interface that is provided by Galaxy Australia. Like many of the beta testers, Kate is already an experienced AlphaFold user, and found using the tool in Galaxy Australia was easier.
“AlphaFold is enormously helpful for lots of different projects, and they are often led by researchers who are not structural biologists, who come to me asking ‘what can I understand from this esoteric protein we found in some weird organism?’ So in order to really jump things forward and trim down which experiment is likely to work and which one isn’t, I become this mediator who helps interpret what AlphaFold can do for them and run their structures.
“Ideally researchers could be using the Australian AlphaFold Service themselves. It is easy to navigate and it would be awesome if I don’t have to perform the AlphaFold runs for them.”
Dr Frank Sainsbury from the Griffith Institute for Drug Discovery at Griffith University had not used Galaxy Australia before, and was concerned about whether he would have the computational skills to be one of the beta testers for the new AlphaFold service. But with Galaxy Australia removing any requirement to install and compile obscure software, he had such success that he is looking at integrating it into his future research.
“We are looking at how a particular genus of plant viruses evolved to only infect plants. We make virus-like particles in order to determine the structure of viruses and also for drug discovery and biomedical use. AlphaFold was used to check for evidence of a core structural domain of a putative coat protein and the fact that it was there gave us the confidence to go on and make virus-like particles.”
The beta testers also identified other potentially powerful capabilities they would like to see, like using AlphaFold to visualise and predict interactions between homo- and hetero-polymer proteins. This functionality is in the AlphaFold-Multimer tool, so Galaxy Australia is working to make it available in the future. The team is always open to finding new ways to support researchers, in this case to enable even greater insights into the 3D structural relationships between proteins.
If you are an Australian researcher who is keen to use AlphaFold to investigate protein structure, apply for access to the Australian AlphaFold Service now.
The Australian AlphaFold Service is part of Galaxy Australia which is managed by QCIF, Melbourne Bioinformatics and AARNet. The AlphaFold Service is specifically underpinned by scalable computational resources procured from Microsoft Azure. This service is supported by funding from the Queensland Government’s Research Infrastructure Co-investment Fund (RICF), Bioplatforms Australia (BPA) and Australian Research Data Commons (ARDC). BPA and ARDC are enabled by NCRIS.
Uncovering the secrets of parasites through AI and Cloud computing
A partnership between Australian BioCommons and Microsoft has resulted in University of Melbourne researchers being able to quickly leverage new methods through a large grant to use the Azure Cloud, with significant implications for the research community worldwide.
An exciting novel approach to investigating human and animal parasites is underway. New software and computational capabilities enabling automated structure-based genome annotation are overcoming some of the challenges of working with non-model organisms. A partnership between Australian BioCommons and Microsoft has resulted in University of Melbourne researchers being able to quickly leverage new methods through a large grant to use the Azure Cloud, with significant implications for the research community worldwide.
Conventional, primary sequence-based methods of annotation leave a large proportion of non-model organisms’ genes and proteins “unannotatable” because of the lack of known homologs in other species. But a new world of possibilities opens up when structure guides the search, using predictions from AlphaFold, the AI system that accurately predicts the 3D structure of proteins.
Parasites that cause major human and animal suffering are prime candidates for this new structure-based approach. The research of Dr Neil Young from Prof Robin B Gasser’s laboratory at The University of Melbourne’s Faculty of Veterinary and Agricultural Sciences focuses on a range of socioeconomically important parasites. They are poised to fast-track our understanding of how these complex organisms work by utilising AlphaFold to annotate all proteins encoded in the genomes of parasitic flatworms and their intermediate hosts (snails).
AlphaFold 2 is now available via the easy-to-use Galaxy Australia interface, with the platform also providing access to a rich catalogue of computational resources including the GPU clusters required to power AlphaFold. Further, Australian BioCommons has been working to deploy Galaxy Australia on commercial cloud resources, to enable massive scale-up of the platform and access specialised resources. The new Australian AlphaFold Service brings these significant developments together: researchers’ jobs are now running on the Azure Cloud thanks to an Australian BioCommons collaboration with BizData and Microsoft on Azure.
The partnership connected the Microsoft team with researchers who could immediately get to work: the proposal from the University of Melbourne team to perform automated structure-based proteome annotation on a genome-wide scale was granted a massive computational boost. Jobs have already begun capitalising on the generous 12,000-hour A100 GPGPU allocation on Microsoft’s Azure cloud (equivalent to approximately $65,000 AUD).
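The two figures quoted for the allocation imply an approximate value per GPU hour, worked through in the sketch below. The half-hour run time per prediction is a hypothetical assumption used only to show how the allocation might translate into a number of jobs; actual run times depend on protein length and settings.

```python
ALLOCATION_GPU_HOURS = 12_000   # A100 hours, as quoted in the article
ALLOCATION_VALUE_AUD = 65_000   # approximate value, as quoted in the article


def cost_per_gpu_hour(value_aud: float, gpu_hours: float) -> float:
    """Approximate value of one GPU hour in AUD."""
    return value_aud / gpu_hours


def predictions_within_allocation(gpu_hours: float,
                                  hours_per_prediction: float) -> int:
    """Whole number of predictions that fit into the allocation."""
    return int(gpu_hours // hours_per_prediction)


rate = cost_per_gpu_hour(ALLOCATION_VALUE_AUD, ALLOCATION_GPU_HOURS)

# Hypothetical average run time per prediction, for illustration only.
assumed_hours_per_prediction = 0.5
jobs = predictions_within_allocation(ALLOCATION_GPU_HOURS,
                                     assumed_hours_per_prediction)

print(f"about AUD {rate:.2f} per GPU hour; "
      f"roughly {jobs} predictions at {assumed_hours_per_prediction} h each")
```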
By applying new capabilities in structure-based homology, this project will overcome the problems associated with primary sequence-based methods that have long dogged researchers seeking to annotate protein-coding genes and their products in silico. A substantial improvement in the accuracy of the functional annotation of nine parasite proteomes is expected, and the approach can be immediately applied to any other groups of eukaryotic organisms by researchers worldwide.
Read more about the work of Robin B Gasser’s Lab.
All Australian researchers can now access AlphaFold 2 through Galaxy Australia by simply applying through the Australian AlphaFold Service.
Australian BioCommons’ Australian AlphaFold Service is part of Galaxy Australia which is managed by QCIF, Melbourne Bioinformatics and AARNet. The AlphaFold Service is specifically underpinned by scalable computational resources procured from Microsoft Azure. This service is supported by funding from the Queensland Government’s Research Infrastructure Co-investment Fund (RICF), Bioplatforms Australia (BPA) and Australian Research Data Commons (ARDC).
Galaxy Australia is underpinned by computational resources provided by AARNet, Nectar Research Cloud, University of Melbourne, QCIF, National Computational Infrastructure and the Pawsey Supercomputing Research Centre. These efforts are supported by funding from The University of Melbourne, the Queensland Government’s Research Infrastructure Co-investment Fund (RICF), Bioplatforms Australia (BPA) and Australian Research Data Commons (ARDC). BPA and ARDC are enabled by NCRIS.