News

Patrick Capon 30/8/24

New resources power long-running workflows at Pawsey Supercomputing Research Centre

In response to community requests, new resources supporting cutting-edge bioinformatics workflows are available on Pawsey’s Setonix supercomputer.

Pawsey’s Setonix supercomputer (image supplied by Karina Nunez).

Specialised nodes designed to power long-running scientific workflows are now available at the Pawsey Supercomputing Research Centre. Responding to researcher demand, Pawsey has custom-built new Workflow Nodes on Setonix to support workflows, managed by tools such as Nextflow and Snakemake, that run longer than the standard 96-hour wall-time limit.
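Why can a workflow outgrow the 96-hour limit even when every individual task is short? The orchestrating (head) job has to stay alive for the whole workflow. The minimal Python sketch below illustrates this with simple arithmetic. It is not a model of how Setonix, Nextflow or Snakemake actually schedule work. It assumes that stages run one after another, that tasks within a stage run in parallel, and that queue waits count towards elapsed time. All stage names and runtimes are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Standard wall-time limit mentioned in the article (hours).
STANDARD_WALLTIME_H = 96.0


@dataclass
class Stage:
    """One hypothetical workflow stage whose tasks run in parallel."""
    name: str
    queue_wait_h: float            # assumed time the stage's tasks wait in the queue
    task_runtimes_h: list[float]   # assumed runtimes of the stage's parallel tasks


def orchestrator_walltime(stages: list[Stage]) -> float:
    """Estimate how long the head job must stay alive.

    Stages run sequentially; within a stage, tasks run in parallel,
    so each stage finishes when its slowest task does.
    """
    return sum(s.queue_wait_h + max(s.task_runtimes_h) for s in stages)


def fits_standard_limit(stages: list[Stage],
                        limit_h: float = STANDARD_WALLTIME_H) -> bool:
    """True if the head job would finish within the wall-time limit."""
    return orchestrator_walltime(stages) <= limit_h


if __name__ == "__main__":
    # Entirely made-up figures for a genome-assembly-style workflow.
    stages = [
        Stage("read QC", 2.0, [1.5, 2.0, 1.0]),
        Stage("assembly", 6.0, [40.0, 52.0]),
        Stage("scaffolding", 4.0, [20.0]),
        Stage("annotation", 3.0, [18.0, 15.0]),
    ]
    total = orchestrator_walltime(stages)
    print(f"Head job must run for about {total:.1f} h")
    print("Fits 96 h limit:", fits_standard_limit(stages))
```

With these made-up figures, the head job needs about 107 hours, even though the longest single task runs for only 52 hours. This is the kind of long-lived orchestration job the article describes.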

Researchers had described the difficulty of running long workflows, including numerous reports from the BioCommons computational workflows community that they were running out of wall-time (the elapsed clock time a computation takes from start to finish). One of these researchers was Lauren Huet, Bioinformatics Research Officer at the Minderoo OceanOmics Centre at UWA:

Our Ocean Genomes project is addressing a key gap where over 95% of marine vertebrates lack sequenced genomes. Building such a comprehensive reference genome library requires intensive compute power, and the workflows can be quite long. This project would not be possible without the capacity to scale up to process tens or hundreds of genomes in parallel.

Dr Sarah Beecroft, Life Sciences Supercomputing Specialist at Pawsey, led the team effort to build dedicated Workflow Nodes on Pawsey’s Setonix, the most powerful research computer in the Southern Hemisphere.

Setonix’s Workflow Nodes provide a stable and robust environment for workflow orchestration. Users can launch their master jobs interactively and keep their sessions alive for extended time periods, enhancing both productivity and performance. I’m really excited to see the new research that is enabled!

Lauren and the OceanOmics team are already benefiting greatly from the Workflow Nodes:

It’s been a game-changer for our research! The nodes enable us to run Nextflow pipelines directly in the terminal, offering unparalleled flexibility for developing and testing our workflows. The capability to execute long-running pipelines without interruptions has significantly increased our throughput, allowing us to produce results faster and more efficiently.

As a member of the BioCommons BioCLI project, Sarah is passionate about making command-line infrastructure accessible and well documented. Together with other supercomputing experts, she and the team have produced a comprehensive new technical user guide for researchers who want to run their workflows on the Setonix Workflow Nodes.

Learn how to run workflows on the Workflow Nodes in Pawsey’s user support documentation, or join the next meeting of the BioCommons computational workflows interest group to influence future research infrastructure developments.

Patrick Capon 27/6/24

BioCLI: Improving command-line infrastructure for life scientists

Learn more about how the newly established BioCLI Project is empowering Australian life scientists to access command-line infrastructure.

Australian life scientists are set to be empowered with the resources, skills, and knowledge required to access command-line infrastructure for bioinformatics research through the newly established BioCLI Project.

Data analysis in the life sciences is constantly evolving as new instrument types are rolled out and larger amounts of data are generated. The flexibility, scalability, and control uniquely afforded by the command-line interface (CLI) give users powerful capabilities to interrogate their data, so coding skills can be essential for particularly complex analyses. However, the sheer number and diversity of bioinformatics data types, tools, and working scales present a significant barrier to entry for life scientists wanting to use the CLI.

Australian BioCommons has established the BioCLI Project to uplift life scientists and help tackle the challenges of working at the CLI, offering environments and services that will reduce friction for processing and analysis of molecular data at scale. Working with our partners at Sydney Informatics Hub, the National Computational Infrastructure (NCI), and the Pawsey Supercomputing Research Centre, BioCLI will:

Develop key CLI infrastructure, such as public virtual machine images that come preconfigured with all the essentials for life sciences research (e.g. the BioImage)
Accelerate command-line job throughput by configuring key tools and workflows to run efficiently on specialised hardware or queuing systems (e.g. configuring Parabricks for NCI’s Gadi supercomputer)
Provide clear documentation for accessing and configuring all BioCLI outputs
Empower researchers through a dedicated training program

Keep up to date with the latest BioCLI project developments on our website, and be sure to register for our upcoming entry-level webinar “What exactly is bioinformatics?”, delivered by Dr Georgie Samaha, Product Owner of BioCLI and Bioinformatics Group Lead at the Sydney Informatics Hub, The University of Sydney.

Patrick Capon 15/12/23

Wrapping up the ‘Bring Your Own Data’ project and a look to the future

Read how BYOD enabled highly accessible, available, and scalable data analysis and sharing capabilities for the benefit of Australian life science researchers.

Important Outputs

Australian Apollo Service
Australian Alphafold Service
Australian Fgenesh++ Service
Grew Galaxy Australia from two million to seven million submitted jobs
Established Galaxy Australia’s Training Infrastructure as a Service (TIaaS) and the Genome Lab
Tool Finder
Seqera Platform pilot
BioImage, a purpose-built bioinformatics environment on the command line

Four new national services, major expansions to Galaxy Australia, 15 training workshops and webinars, many specialised workflows, and even more stories of impact, all thanks to the collaborative efforts of 12 organisations. It’s fair to say that the Australian BioCommons ‘Bring Your Own Data’ (BYOD) project met its aim to enable highly accessible, available, and scalable data analysis and sharing capabilities for the benefit of Australian life science researchers.

Although the BYOD Expansion project wound down at the end of 2023, its legacy will continue through the delivery and constant improvement of our national services.

The project began in June 2019 thanks to investment from Bioplatforms Australia and the Australian Research Data Commons (ARDC), and brought together a large group of collaborators and co-investors including AAF, AARNet, Melbourne Bioinformatics, NCI, Pawsey, QCIF via the Queensland Government RICF fund, The University of Sydney, AGRF, Griffith University and Monash University. There were three focus areas: web-based bioinformatics workbenches for life sciences researchers, a complementary platform focused on the command-line interface (CLI), and data infrastructure connecting ‘omics instruments and reference datasets to the analysis infrastructure. Work in these areas has had a wide-ranging and extremely positive impact on the life sciences research landscape, as showcased in the words of infrastructure end users.

TESTIMONIALS

Tool Finder will be a really useful resource for researchers, particularly those who are just getting started and want to understand what software is available for their analysis and what computing platform would be most suitable. It’s awesome to have all of that information on hand in the one place!

Dr Parice Brandies, The University of Sydney

Galaxy Australia is intuitive to use, it’s easy because students don’t have to install software, it has lots of really good documentation and visualisation, and all of this helps the students to understand what they are doing and more importantly why they are doing it.

Dr Kylie Munyard, Curtin Medical School

The Fgenesh++ service has helped us easily and efficiently annotate multiple diverse genomes to a high standard.

Dr Kate Farquharson, The University of Sydney

For my PhD project I assembled close to 4000 RNA-Seq datasets from samples from all over the world - a task that would have been impossible without Galaxy Australia.

Dr Rhys Parry, University of Queensland

So much software gets left without regular updates and from year to year you realise that it isn’t maintained or updated. So we look for things that are stable - this is the reason we call on the Australian Apollo Service.

Assoc Prof Charles Robin, University of Melbourne

We are looking at how a particular genus of plant viruses evolved to only infect plants. We make virus-like particles in order to determine the structure of viruses and also for drug discovery and biomedical use. AlphaFold was used to check for evidence of a core structural domain of a putative coat protein and the fact that it was there gave us the confidence to go on and make virus-like particles.

Dr Frank Sainsbury, Griffith Institute for Drug Discovery

TIaaS helps keep workshops on track. Trainers have live insight into how participants’ jobs are running and can identify sticking points almost before they happen. The special training queue means that everyone has a consistent experience. Even large jobs submitted simultaneously from all around Australia run fast.

Dr Melissa Burke, Australian BioCommons

The Bioimage was a great place to enter the world of bioinformatics and really helped me to upskill on the command-line. I was able to jump right in and make use of Nextflow pipelines, Singularity containers and interactive Rstudio sessions.

Alexandra Boyling, ANZAC Research Institute

Looking ahead, BioCommons is establishing two new activities, the ‘Workflow Commons’ and ‘BioCLI’, to continue where BYOD left off. Stay tuned for more in this space!

Read a full summary of the BYOD Expansion project

The Australian BioCommons BYOD Expansion Project is funded through NCRIS investments from Bioplatforms Australia and the Australian Research Data Commons (http://doi.org/10.47486/PL105) that are matched with co-investments from AARNet, Melbourne Bioinformatics, NCI, Pawsey, QCIF via the Queensland Government RICF fund, The University of Sydney, AGRF, Griffith University and Monash University.

Patrick Capon 4/10/23

A purpose built environment for bioinformatics on the command-line

A new welcoming interface to high performance computing environments has been created specifically for bioinformatics users, called the BioImage.

The “BioImage” comes pre-installed with software, tools and datasets commonly used in the bioinformatics domain.

The BioImage has been purpose-built to help researchers working on the command line, removing the need to spend time installing common software such as Singularity, Jupyter Notebook, RStudio or Nextflow. Researchers can follow bioinformatics best practices without having to set up the environment from scratch, and the image provides a reliable and reproducible starting point for new projects.

The BioImage is already in use on the Pawsey Nimbus Research Cloud and has detailed instructional documentation available. Its collaborative development has been led by Audrey Stott, Systems Administrator at the Pawsey Supercomputing Research Centre, as part of the BioCommons ‘Bring Your Own Data’ project. Given its usefulness, the BioImage will be made available at other national HPC facilities: work is currently underway to ensure compatibility with NCI’s Nirin Cloud, and it is expected to roll out across other facilities in the near future.

Two particularly exciting features of the BioImage are CernVM-FS and Singularity-HPC. CernVM-FS is a mounted, shared file system that gives users access to over 8,000 BioContainers tools, plus common reference datasets. Singularity-HPC then makes users’ lives easier by installing containers as software modules, so tools can be loaded and run like any other module without the need for container syntax.

BioCommons bioinformatician Dr Georgina Samaha has helped numerous University of Sydney researchers use the BioImage in her role as Bioinformatics Group Lead at the Sydney Informatics Hub. Alexandra Boyling, PhD candidate at the ANZAC Research Institute, said:

“The Bioimage was a great place to enter the world of bioinformatics and really helped me to upskill on the command-line. I was able to jump right in and make use of Nextflow pipelines, Singularity containers and interactive Rstudio sessions. As a wet-lab scientist, I felt quite intimidated by the idea of analysing the vast volumes of genomic data I was working with. Now I feel considerably more confident, and even excited, about the prospect of incorporating more bioinformatics into my projects in the future.”

The BioImage is also proving useful for group training, giving all trainees access to a consistent computational environment with everything pre-loaded and ready to go. When used with Pawsey’s powerful Nimbus compute, it presents a friendly gateway tailored for bioinformatics jobs, including a risk-free sandbox to work within. Using the BioImage for training also means that learning takes place in an enduring environment that trainees can return to over time, once they have a (quick and easy to obtain) allocation on Pawsey’s resources. Trainees attending the upcoming online BioCommons workshop ‘RNASeq: reads to differential genes and pathways’ will no doubt continue to use the BioImage well after the training event is over.

Mentor program breaking down barriers with Nextflow/nf-core

With researchers continuing to have more and more data to process and analyse, Dr Georgina Samaha is using her Nextflow/nf-core mentorship to break down the barriers that prevent researchers from accessing the compute resources they need.

Dr Georgina Samaha, Bioinformatics Group Lead at the Sydney Informatics Hub and BioCommons team member.

A place in the Nextflow/nf-core mentorship program has been awarded to Australian BioCommons bioinformatician Dr Georgina Samaha, Bioinformatics Group Lead at the Sydney Informatics Hub. The highly competitive program will pair Georgie with an experienced developer to work closely on a project she is particularly passionate about: breaking down the barriers that prevent life sciences researchers from using high performance computing (HPC).

The ever-increasing scale of life sciences data means that researchers need to process or analyse their data with large-scale compute resources. This can be daunting, particularly for those with less experience writing code or working at the command line. Georgie plans to address these challenges:

“I will create new resources and share my learnings about Nextflow/nf-core with the broader life science research and bioinformatics communities to make their lives easier, give them a starting point and increased confidence in approaching the difficult and intimidating aspects of their work on HPCs.”

Georgie is heavily involved in improving access to command-line infrastructure for life sciences researchers as part of the BioCommons Bring Your Own Data Expansion Project. She applied to the Nextflow/nf-core mentorship program because she frequently encounters researchers who need her help to run bioinformatics pipelines on HPCs. nf-core offers community-supported, reproducible pipelines that simplify data processing. These pipelines are popular in the bioinformatics community, but researchers still face challenges in using them, such as understanding their resource requirements and running them efficiently. Georgie’s project aims to address these challenges to make life sciences researchers’ lives easier. She also aims to demonstrate to national HPC providers that there is significant value in improving access to large-scale compute resources for life sciences researchers.
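Understanding resource requirements partly means seeing how per-task CPU and memory requests determine how many tasks fit on a compute node at once. The short Python sketch below illustrates that packing arithmetic for a hypothetical node with 128 cores and 256 GB of memory. The node size and task requests are invented for illustration. They do not describe any particular HPC system or any nf-core default.

```python
import math


def tasks_per_node(node_cpus: int, node_mem_gb: float,
                   task_cpus: int, task_mem_gb: float) -> int:
    """How many identical tasks fit on one node at the same time.

    Whichever resource runs out first (CPU cores or memory) sets the limit.
    """
    by_cpu = node_cpus // task_cpus
    by_mem = math.floor(node_mem_gb / task_mem_gb)
    return min(by_cpu, by_mem)


def idle_cpus(node_cpus: int, node_mem_gb: float,
              task_cpus: int, task_mem_gb: float) -> int:
    """Cores left unused once the node is packed with tasks."""
    n = tasks_per_node(node_cpus, node_mem_gb, task_cpus, task_mem_gb)
    return node_cpus - n * task_cpus


if __name__ == "__main__":
    # Hypothetical node: 128 cores and 256 GB of memory.
    node_cpus, node_mem_gb = 128, 256.0
    # Two hypothetical per-task requests: same cores, different memory.
    for task_cpus, task_mem_gb in [(8, 16.0), (8, 64.0)]:
        n = tasks_per_node(node_cpus, node_mem_gb, task_cpus, task_mem_gb)
        idle = idle_cpus(node_cpus, node_mem_gb, task_cpus, task_mem_gb)
        print(f"{task_cpus} CPUs / {task_mem_gb:.0f} GB per task -> "
              f"{n} tasks per node, {idle} cores idle")
```

With these made-up numbers, requesting 16 GB per 8-core task fills the node completely. Requesting 64 GB per task leaves 96 of the 128 cores idle, because memory runs out first.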

Georgie expects the mentorship program to greatly increase the level of support she can offer researchers, as well as give her valuable experience coding in Nextflow. She will bring this expertise to her role at the Sydney Informatics Hub and the BioCommons to empower the researchers she works with, and share her knowledge with partners including QCIF, NCI and Pawsey.

Georgie’s mentorship started in June and will run until the end of September 2023. You can find out more about the Nextflow/nf-core mentorship program at the nf-core website, and stay tuned to hear from Georgie later in the year!