A purpose-built environment for bioinformatics on the command line

A new, welcoming interface to high performance computing environments has been created specifically for bioinformatics users. The “BioImage” comes pre-installed with software, tools and datasets commonly used in the bioinformatics domain.

The BioImage has been purpose-built to help researchers working on the command line, removing the need to invest time in installing common software such as Singularity, Jupyter Notebook, RStudio or Nextflow. Researchers can follow bioinformatics best practices without having to set up the environment from scratch, and the BioImage gives them a reliable and reproducible starting point for new projects.

The BioImage is already in use on the Pawsey Nimbus Research Cloud and comes with detailed instructional documentation. Its collaborative development has been led by Audrey Stott, Systems Administrator at the Pawsey Supercomputing Research Centre, as part of the BioCommons ‘Bring Your Own Data’ project. Because it has proved so useful, the BioImage will be made available at other national HPC facilities, and work is currently underway to ensure compatibility with NCI’s Nirin Cloud. Expect to see it rolled out across other facilities in the near future.

Two particularly exciting features of the BioImage are CernVM-FS and Singularity-HPC. CernVM-FS is a mounted shared file system that gives users access to over 8,000 BioContainers tools, plus common reference datasets. Singularity-HPC (a container-specific software module) then makes users’ lives easier by letting them run containers as modules, so they do not need to write container syntax themselves.

BioCommons bioinformatician Dr Georgina Samaha has helped numerous University of Sydney researchers use the BioImage in her role as Bioinformatics Group Lead at the Sydney Informatics Hub. Alexandra Boyling, PhD candidate at the ANZAC Research Institute, said:

“The BioImage was a great place to enter the world of bioinformatics and really helped me to upskill on the command line. I was able to jump right in and make use of Nextflow pipelines, Singularity containers and interactive RStudio sessions. As a wet-lab scientist, I felt quite intimidated by the idea of analysing the vast volumes of genomic data I was working with. Now I feel considerably more confident, and even excited, about the prospect of incorporating more bioinformatics into my projects in the future.”

The BioImage is also proving useful for group training, giving all trainees access to a consistent computational environment with everything pre-loaded and ready to go. When used with Pawsey’s powerful Nimbus compute, it presents a friendly gateway that is tailored for bioinformatics jobs and includes a risk-free sandbox to work within. Because an allocation on Pawsey’s resources is quick and easy to obtain, trainees who pair one with the BioImage learn in an enduring environment that they can return to over time. Trainees attending the upcoming online BioCommons workshop ‘RNASeq: reads to differential genes and pathways’ will no doubt continue to use the BioImage well after the training event is over.

Read more about the BioImage

Patrick Capon