BioCLI: Improving command-line infrastructure for life scientists

Vision

A collection of command-line environments and services, tailored to the needs of Australian life science researchers, deployed on the compute infrastructure they use, and supporting both research and training.

The BioCLI Project aims to empower life scientists with user-focused CLI environments and services that reduce friction when processing and analysing molecular data at scale.

Challenges

Data complexity: Handling the growing scale and complexity of ’omics data and complex analyses requires the flexibility, scalability, and control uniquely afforded by the command-line interface (CLI).

Varied methods and expertise: The diversity of bioinformatics data, scales of work, and available tools presents significant challenges when configuring CLI environments. Most life scientists do not have the expertise required, and need to be empowered with the resources, skills, and knowledge to navigate CLI environments and handle substantial workloads confidently and efficiently.

Current activities

Streamlining access and execution

Making it easier to access and execute bioinformatics software and workflows via the CLI by:

  • Enabling Nextflow plugins at NCI. Nextflow plugins are very popular in the Nextflow community, but they can be tricky to implement on different systems. A new version of Nextflow on NCI allows users to run plugins such as nf-schema, which streamlines workflow execution and parameter validation.

  • Developing a custom Nextflow task monitor for national HPC job schedulers, including detailed cost reporting

  • Configuring national HPCs to better accommodate complex and long-running bioinformatics workflows (e.g. specialised nodes on Pawsey’s Setonix)

  • Developing a simple Nextflow workflow template that helps newcomers construct and configure workflows for execution on HPC and cloud systems.

BioImage

A welcoming interface to HPC environments specifically created for bioinformatics users.

Training program

Empowering users to work at the CLI.

Hardware access

Facilitating wide access to specialist hardware by:

  • Developing code to run GPU-enabled structural biology tools like AlphaFold on Pawsey’s HPC, Setonix

Reference datasets

Enabling access to curated datasets that are needed for standard processing and analysis.


Project timeline

January 2024 - December 2026


Project partners

Australian BioCommons is collaborating with our partners at Sydney Informatics Hub, the National Computational Infrastructure (NCI), and the Pawsey Supercomputing Research Centre to deliver the BioCLI Project.


Project achievements

JULY 2025 - DECEMBER 2025

Nextflow for the life sciences

TRAINING

A workshop on building reproducible and scalable scientific workflows using Nextflow. 

Enabling population-scale human genomics analyses

Reference Data

An Oxford Nanopore Technologies (ONT) dataset from the 1000 Genomes Project, hosted within NCI’s HPC ecosystem, enables population-scale human genomics analyses.


JANUARY 2025 - JUNE 2025

AMD Container: ProteinMPNN

ProteinMPNN (Protein Message Passing Neural Network) is a deep learning-based method for protein sequence design: given a backbone structure, it proposes amino acid sequences likely to fold into it. This structural biology software, originally built for NVIDIA architecture, has been ported to AMD chips at Pawsey.

Software

nf-core configs for Gadi and Setonix

WORKFLOW Execution

Running proteinfold on Gadi requires use of the gpuvolta queue. The general Gadi nf-core config does not natively support proteinfold execution or use of the gpuvolta queue, so a pipeline-specific configuration is required.
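
A pipeline-specific configuration of this kind might look like the sketch below. It is a minimal illustration in Nextflow config syntax, not the configuration published for Gadi: the `process_gpu` label and the `-l ngpus=1` resource request are assumptions made for the example, and the real process labels and resource values should be taken from the pipeline and from NCI's documentation.

```groovy
// Hypothetical sketch of a pipeline-specific Nextflow config for GPU tasks on Gadi.
// Label name and resource values are illustrative assumptions, not the published config.
process {
    // Gadi uses a PBS Pro scheduler; 'pbspro' is Nextflow's executor for it.
    executor = 'pbspro'

    // Assumption: GPU-dependent processes in the pipeline carry this label.
    withLabel: 'process_gpu' {
        // Send only the GPU tasks to the gpuvolta queue.
        queue = 'gpuvolta'
        // Assumption: one GPU per task, requested as a native scheduler option.
        clusterOptions = '-l ngpus=1'
    }
}
```

A file like this could be passed on top of the general Gadi config with Nextflow's `-c` option, so that only the GPU-labelled processes are redirected while all other tasks keep their default queue.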

AMD Container: ColabFold

ColabFold combines the fast MMseqs2 search tool with the AlphaFold2 or RoseTTAFold models to predict protein structures. This structural biology software, originally built for NVIDIA architecture, has been ported to AMD chips at Pawsey.

Software

AMD Container: AlphaFold

AlphaFold predicts the 3D structure of proteins from their amino acid sequences with high accuracy. This structural biology software, originally built for NVIDIA architecture, has been ported to AMD chips at Pawsey.

Software


JULY 2024 - DECEMBER 2024

What exactly is bioinformatics? And is there a right (or a wrong) way to do it?

TRAINING

Webinar on the key concepts of bioinformatics to help life scientists looking to get started or build their bioinformatics skills. 

Powering long-running workflows

WORKFLOW Execution

Specialised nodes to power long-running scientific workflows made available on Pawsey’s Setonix.

Nextflow on NCI Infrastructure

WORKFLOW Execution

NCI have been improving how Nextflow works on NCI infrastructure.

Introduction to Nextflow workshop

TRAINING

Introduction to Nextflow workshop based on Seqera’s Hello Nextflow training series.