Multi-model 3D visualisation enhances Nextflow pipeline for protein structure prediction
Predicted protein structures for LmrP visualised using the proteinfold pipeline
Advances in AI are taking protein structure predictions to a whole new level, accelerating research and enabling deeper analysis of protein structure and function. The nf-core community is embracing these developments by building the Nextflow proteinfold pipeline, which integrates models such as AlphaFold2, ColabFold and ESMFold and simplifies their use on a variety of computing infrastructures.
BioCommons’ Dr Ziad Al Bkhetan, Product Manager - Bioinformatics Platforms and Australian Nextflow Ambassador, identified an opportunity to optimise the existing nf-core proteinfold pipeline for Australian researchers using the Australian Nextflow Seqera Service. Ziad initiated this effort by reaching out to the original developers from the Center for Genomic Regulation (CRG) in Spain with an offer to reconfigure the pipeline and add new features. This sparked an international collaborative effort that connected researchers and experts from Australian BioCommons, the CRG, the Sydney Informatics Hub (SIH) at the University of Sydney and the Structural Biology Facility (SBF) at UNSW, at several hackathons and summits to enhance the pipeline. The enhanced, community-driven pipeline is now available to all through nf-core’s curated set of open‑source analysis pipelines.
The pipeline borrows a useful reporting and visualisation feature already implemented in Galaxy Australia. Minh Vu, front-end developer for BioCommons, augmented the pipeline to implement this feature, which allows multiple models to be run in parallel and generates reports that visualise the resulting structures, simplifying comparison and benchmarking of the outputs. Several state-of-the-art tools, including AlphaFold2, ColabFold and ESMFold, are already in the pipeline, and additional models including RoseTTAFold-All-Atom, HelixFold3, Boltz, RoseTTAFold2NA and AlphaFold3 are to be added soon.
The ability to run different models through the pipeline without writing new code removes the barriers posed by the command line and complicated compute infrastructure. Reflecting on the project in the Nextflow Podcast, Phil Ewels, Product Manager for Open Source at Seqera, said:
“With almost no setup and no real prior experience, you can run these state of the art models and compare them all in a dynamic visual report. That’s pretty amazing.”
While it is designed to integrate with Seqera Platform, there’s no requirement to use it that way. Running the Nextflow pipeline on the command line gives the exact same reports. The code is freely available for others to use or improve via the nf-core repository of pipelines.
Ziad’s presentation about the collaboration and these new features was singled out as a highlight of the recent Nextflow Summit in Seqera’s Nextflow Podcast. Bioinformatics Engineer at Seqera, Dr Florian Wünnemann, acknowledges there is great value in improving shared resources:
“I think it really represents the best of the Nextflow community: they are developing tools and not just keeping it for themselves, but directly giving them back to the larger community.”
Rob Syme, Scientific Support Lead at Seqera Labs, believes the work speaks to the Nextflow and Seqera ethos of giving scientists and researchers the tools they need to build other tools.
“I love this project: it was an amazing outcome that required no input from Seqera or Nextflow. Yes, Seqera Platform could absolutely build an alignment viewer into the platform, but it wouldn’t be as good as if researchers themselves develop it. It wouldn’t be as good as the one that Ziad and the team have developed because research moves so incredibly quickly.”
The collaboration within the international nf-core community has been a rewarding experience for all involved parties, and CRG has forged a new working relationship with BioCommons to continue development and maintenance of the pipeline. CRG’s Dr Cedric Notredame said of the experience:
“The collaboration with BioCommons has been so valuable. It has showcased the effectiveness of nf-core as a collaborative tool. Thanks to this framework, all of our teams were able to simultaneously contribute to the pipeline with minimal technical coordination. The pipeline is now one of the most complete go-to resources, covering the needs of a wide community of biologists interested in structural aspects of genomics.”
The improvements made to the visualisation code during the project will also be fed back into the Galaxy codebase. BioCommons’ close connections with research communities mean that the national Structural Biology Computing community is now testing and finessing the pipeline, and supporting the creation of user documentation.
Sharing what’s been learnt through a publication about the nf-core/proteinfold pipeline is on the horizon, and a pilot Australian ProteinFold Service is under development.
As an Australian researcher, you can use the protein 3D structure prediction pipeline on the fully subsidised Australian AlphaFold Service.