BioCLI: Improving command-line infrastructure for life scientists

Vision

A collection of command-line environments and services, tailored to the needs of Australian life science researchers, deployed at the compute infrastructures they use, supporting both research and training.

The BioCLI Project aims to empower life scientists with user-focused CLI environments and services that reduce friction for processing and analysis of molecular data at scale.

Challenges

Data complexity: Handling the growing scale and complexity of ‘omics data and analyses requires the flexibility, scalability, and control uniquely afforded by the command-line interface (CLI).

Varied methods and expertise: The diversity of bioinformatics data, scale of work, and available tools presents significant challenges when configuring CLI environments. Most life scientists do not have the expertise required, and need to be empowered with the resources, skills, and knowledge to navigate CLI environments and handle substantial workloads confidently and efficiently.

Current activities

Streamlining access and execution

Making it easier to access and execute bioinformatics software and workflows via the CLI by:

  • Enabling Nextflow plugins at NCI. Nextflow plugins are very popular in the Nextflow community, but they can be tricky to implement on different systems. A new version of Nextflow on NCI allows users to run plugins such as nf-schema, which streamlines workflow execution and parameter validation

  • Developing a custom Nextflow task monitor for national HPC job schedulers, including detailed cost reporting

  • Configuring national HPCs to better accommodate complex and long-running bioinformatics workflows

  • Developing a simple Nextflow workflow template that helps newcomers construct and configure workflows for execution on HPC and cloud systems.

Reference datasets

Enabling access to curated datasets that are needed for standard processing and analysis.

BioImage

A welcoming interface to HPC environments specifically created for bioinformatics users.

Training program

Empowering users to work at the CLI.

Hardware access

Facilitating wide access to specialist hardware by:

  • Developing code to run GPU-enabled structural biology tools like AlphaFold on Pawsey’s HPC, Setonix


Project timeline

January 2024 - December 2026


Project partners

Australian BioCommons is collaborating with our partners at Sydney Informatics Hub, the National Computational Infrastructure (NCI), and the Pawsey Supercomputing Research Centre to deliver the BioCLI Project.