News
Bringing Seqera Platform to Australian researchers
Discover the latest progress on providing Australian researchers with a centralised command post for Nextflow workflows.
Dr Steven Manos, BioCommons’ A/Director of the BioCloud, recently returned from Barcelona where he opened the global Nextflow Summit by presenting “The national Nextflow Tower service for Australian researchers.” He was invited to share his experience in recognition of Australia’s sophisticated use of Seqera Platform, the workflow management and data analysis platform.
A partnership involving Seqera Labs, Pawsey and NCI has allowed BioCommons to investigate how to best support researchers in their Nextflow workflow use across high-performance computing (HPC) and commercial cloud environments. Since July 2022, the Seqera Platform has been integrated with the Pawsey and NCI supercomputers as well as compute environments at WEHI, AGRF and AWS. The service is in use by 60 individuals from 20 research groups/organisations.
Once fully operational, the Australian Nextflow Seqera Service will offer a fully subsidised, centralised command post for Australian researchers to manage and run their Nextflow workflows. Users can bring their own supercomputing allocations, local HPC or commercial cloud services. Alternatively, access to compute can be provided through the Australian BioCommons Leadership Share (ABLeS). BioCommons is now inviting Australian life sciences research groups and organisations to express interest in joining the pilot project that is building this national service. For now, the call out is for experienced users/groups with access to compute infrastructure, Nextflow workflows and testing datasets ready for use, and the requisite expertise to manage their environment.
Did you know? Nextflow Tower has recently been rebranded as “Seqera Platform”.
A progress update on the Seqera Platform pilot project will be presented at the 2023 ABACBS Conference by Dr Ziad Al Bkhetan, Bioinformatics Application Specialist in the ABLeS program and Seqera Platform Service Lead. As silver sponsors, we extend our long history of supporting the conference and we hope you will catch up with Ziad and other BioCommons staff to discuss the latest developments in Seqera Platform and other BioCommons services.
Learn more about Seqera Platform via this Australian BioCommons webinar.
The Seqera Platform pilot project is delivered through the Australian BioCommons ‘Bring Your Own Data’ Expansion Project. The BYOD Expansion Project is funded through NCRIS investments from Bioplatforms Australia and the Australian Research Data Commons (http://doi.org/10.47486/PL105) that are matched with co-investments from AARNet, Melbourne Bioinformatics, NCI, Pawsey, QCIF via the Queensland Government RICF fund, The University of Sydney, AGRF, Griffith University and Monash University.
BioCommons strengthens connections with the Asia-Pacific region
BioCommons has strengthened connections with the Asia-Pacific region and showcased our training capabilities and services at the 22nd International Conference on Bioinformatics (InCOB) hosted by the Asia-Pacific Bioinformatics Network (APBioNet) in Brisbane.
BioCommons has strengthened connections with the Asia-Pacific region at the 22nd International Conference on Bioinformatics (InCOB) hosted by the Asia-Pacific Bioinformatics Network (APBioNet) in Brisbane. In line with our mission to uplift life sciences digital research, BioCommons training capabilities and services were on show at the conference.
BioCommons provided virtual machines to support the Tyagi Lab at RMIT University and COMBINE to deliver an ‘Introduction to Machine Learning for Bioinformatics’ workshop on predictive modelling for genomics data for conference attendees. Attended by 25 people, this workshop is amplifying machine learning skills in the region and beyond. Find out more about how we enable events related to bioinformatics that benefit a national audience.
Dr Melissa Burke, BioCommons bioinformatics training specialist, attended a special session on bioinformatics education to hear how organisations and individuals are navigating the evolving landscape of bioinformatics education in the 21st century.
Dr Gareth Price, Project Lead of the Galaxy Australia service, presented a talk on ‘Making bioinformatics more user-friendly with custom interfaces’ and the Galaxy Australia team developed new training materials to support wrapping of tools for Galaxy.
In 2024, bioinformatics societies of the Asia-Pacific region will unite at the first Asia and Pacific Bioinformatics Joint Conference in Okinawa, Japan around the theme of ‘Creating bioinformatics synergy across the Asia Pacific region.’ Find out more about the conference.
New framework to improve effectiveness and inclusivity of life science training
A first-of-its-kind framework termed the ‘Bicycle Principles’ advocates for actions that will improve the effectiveness and inclusiveness of professional development in the life sciences and beyond.
Nearly all researchers participate in workshops or short courses to enhance their skill sets. However, there are almost no standards to guarantee training quality, and peer-reviewed evidence suggests that much of what is available is ineffective.
The Bicycle Principles, developed through an international project led by Jason Williams (Cold Spring Harbor Laboratory) and Rochelle Tractenberg (Georgetown University) with support from the National Science Foundation (USA), provide a set of 14 recommendations to systematically strengthen short format training. The Principles represent the consensus amongst international experts on instruction in the life sciences including the BioCommons’ Christina Hall (Associate Director of Training and Communications) and Melissa Burke (Training and Communications Officer). They provide a framework through which instructors, programs, organisations and funders can prioritise evidence-based teaching, inclusiveness and equity as well as the ability to scale, share, and sustain training.
For researchers, the implementation of this framework could provide assurances of training quality, assisting them in maximising their professional skills, achieving career goals, and increasing the impact of their scientific work.
The Bicycle Principles framework is published in PLOS ONE and is accompanied by an implementation roadmap that demonstrates how individuals, groups, communities of practice and organisations can put the principles into action. 
Read more about the Bicycle Principles on the project’s website.
Funding boost to bring game-changing growth to BioCommons
Australian BioCommons will undertake significant growth following the allocation of crucial new NCRIS funding to Bioplatforms Australia. Designed to support game-changing national infrastructure, the funding will enable BioCommons to deliver in three key areas: BioCloud, Australian Tree of Life Data Laboratories and GUARDIANS.
Australian BioCommons is poised to undertake a period of significant growth following the Department of Education’s announcement of crucial new funding to Australia’s National Collaborative Research Infrastructure Strategy (NCRIS). This will deliver a variety of game-changing national infrastructure developments to support omics-based life science research, with Bioplatforms Australia allocating funding to the BioCommons over the 2023-27 period in three growth areas:
Integrated analysis platforms for omics research through the development of a ‘BioCloud’ - a unified set of ‘research context aware’ digital services tailored to meet the requirements of life sciences researchers to work with molecular biological data, using bioinformatics tools and workflows on a variety of infrastructures.
Foundational infrastructure for accelerating biodiversity research and conservation (Australian Tree of Life (AToL) Data Laboratories) bringing together national ‘omics data with multidisciplinary data (environment, climate, trait) and connecting these to build a portfolio of transparent and repeatable analytical tools supporting deeply informed biodiversity and biosecurity management decisions.
A translational human ‘omics data infrastructure program (GUARDIANS) to drive a step change to cutting-edge national digital research infrastructure and unlock Australia’s potential in human ‘omics research through the provision of secure, scalable, and integrated data and analytics platforms.
These three projects will be delivered in collaboration with the growing network of research consortia and delivery partners established by the BioCommons during its first phase.
This exciting announcement builds on a 2023-28 extension to the BioCommons project that is already underway. Having delivered significant outcomes in the first 5-year term, the contractual agreements required for a further 5 years of continuity are currently being bedded down. This will ensure ongoing support for critical national services and community building activities.
Planning for the three growth areas is already underway and will intensify over the next six months. Discussions will broaden to include key partners soon, with a view to formalising initial projects and engagements in early 2024. Expect to see many exciting announcements, invitations to participate, and new opportunities to join BioCommons activities in the near future.
The outcomes of the NCRIS 2023 Funding Round are published on the Department of Education website.
The Australian BioCommons is supported by Bioplatforms Australia. Read their announcement: Bioplatforms Secures Crucial Funding from NCRIS Program to Propel Frontier Omics Technology.
Students investigate beetle genetic variation using new Apollo training instance
Teaching genetics is easier and more effective now that a powerful online tool is being made available to Australian researchers and trainers through a new feature of the Australian Apollo Service.
Teaching genetics is easier and more effective now that a powerful online tool is being made available to Australian researchers and trainers. Students can use a tailored training instance of the web-browser accessible system, Apollo, for real-time collaborative curation and editing of genome annotations.
This new feature of the Australian Apollo Service allows trainers to focus on teaching genome annotation curation, without being burdened by installation and maintenance of Apollo. All the hosting and system administration of customised Apollo instances is taken care of for service users. Life scientists and research consortia based in Australia can apply for an instance that is ready-made for training, and is up to date with the latest release of the software.
Molecular evolution and population genetics researcher, Assoc Prof Charles Robin signed on to the Australian BioCommons’ Apollo service through a recommendation from a colleague at Melbourne Bioinformatics. After facilitating some Australian BioCommons online workshops, Charles now uses an Apollo training instance when teaching genetics to third-year undergraduates at the University of Melbourne. Each student is provided with their own login where they can visualise DNA sequences, perform annotations and explore without overwriting each other’s work. Charles finds Apollo ideal for teaching:
“It’s great that the transcriptome maps really well to the genome. By playing within Apollo, students get to see the AC / GT rule and how this can be reinforced by the transcript. Things like alternate splicing are also easily visualised.”
Two adult beetles, one with a CRISPR deletion in the gene called cardinal that is involved in eye colour. The genetic modifications were performed in Charles’ lab.
The class answers research questions like ‘which of the genes in this region includes this particular mutation?’ or ‘how do you find the candidate gene in this region?’ using real world data from a beetle with a gene mutation. Charles deliberately chooses genomes that are ‘untidy’:
“You want a non-model genome to identify gaps and changes. Students can have an expectation that all annotations on a genome are true, and using Apollo allows them to see that this is not the case.”
Charles rates the reliability of the software as a key factor in why he uses Apollo for teaching genetics:
“So much software gets left without regular updates and from year to year you realise that it isn’t maintained or updated. So we look for things that are stable - this is the reason we call on the Australian Apollo Service.”
Using Apollo to curate genome annotations
The Australian Apollo Service performs all system administration, build and deployment of the instance on behalf of users, with support provided through a help desk, user documentation and training events. The deployment of a full technology stack, long term hosting of data, maintenance updates and security are all covered, providing customised, local instances of the Apollo software for individual genome projects or training.
Australian BioCommons are working with Apollo project principal investigator Prof Ian Holmes to understand researchers’ needs within Apollo, with an aim to provide improved annotation and visualisation features for genome annotation research.
Australian BioCommons delivers collaborative distributed infrastructure to enable life science research. BioCommons partner QCIF is offering the Apollo Portal service, and it is underpinned by computational resources provided by the Pawsey Supercomputing Research Centre. These efforts are supported by funding from Bioplatforms Australia (BPA) and Australian Research Data Commons (ARDC). BPA and ARDC are enabled by NCRIS.
Automated testing with Selenium assures robust systems
The latest release of the Galaxy software includes code that benchmarks new infrastructure robustness and monitors its performance to ensure users are getting the most out of their time spent computing.
To the untrained eye, the latest release of the Galaxy software is a long and technical list of 208 enhancements and 145 fixes of known issues, inherent to the development of a sophisticated analysis platform. But step back from the detail, and we can see a significant advance in the maturity of this global service. The enhancements heavily feature tests of the system’s robustness – code that is tasked with benchmarking new infrastructure and monitoring its performance to ensure users are getting the most out of their time spent computing.
Software Engineer Dr Nuwan Goonasekera has championed the use of Selenium to test new infrastructure and deployments for Galaxy Australia for several years now. This suite of tools enables automatic testing of applications. Galaxy Australia uses Selenium to monitor the health of the service as users experience it, by checking the time taken to perform various common user actions. Because the system is tested constantly, the team can watch a custom dashboard that alerts them to potential issues as soon as they begin.
“End-to-end testing is not just good software engineering practice, it should also be part of a holistic plan to measure scalability and monitor service quality that users experience day-to-day.”
Dr Nuwan Goonasekera, Melbourne Bioinformatics, Galaxy Australia
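The idea of timing common user actions and raising an alert when one is too slow can be illustrated with a minimal Python sketch. This is not Galaxy Australia's monitoring code: the names `CheckResult`, `time_action`, `run_checks` and `alerts`, and the time limits, are hypothetical. In a real setup each action would drive a browser through Selenium, whereas here the actions are plain callables so the sketch runs on its own.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CheckResult:
    """Outcome of timing one user-style action (hypothetical structure)."""
    name: str
    seconds: float
    limit: float

    @property
    def ok(self) -> bool:
        # The check passes when the action finished within its time limit.
        return self.seconds <= self.limit


def time_action(name: str, action: Callable[[], None], limit: float,
                clock: Callable[[], float] = time.perf_counter) -> CheckResult:
    """Run one action and record how long it took."""
    start = clock()
    action()
    return CheckResult(name, clock() - start, limit)


def run_checks(actions: Dict[str, Tuple[Callable[[], None], float]],
               clock: Callable[[], float] = time.perf_counter) -> List[CheckResult]:
    """Time every (action, limit) pair in insertion order."""
    return [time_action(name, fn, limit, clock)
            for name, (fn, limit) in actions.items()]


def alerts(results: List[CheckResult]) -> List[str]:
    """Build one alert message per action that exceeded its limit."""
    return [f"{r.name} took {r.seconds:.2f}s (limit {r.limit:.2f}s)"
            for r in results if not r.ok]


if __name__ == "__main__":
    # Stand-in actions; a real check might log in or upload a file via Selenium.
    demo = {
        "login": (lambda: time.sleep(0.01), 1.0),
        "upload": (lambda: time.sleep(0.02), 1.0),
    }
    for message in alerts(run_checks(demo)) or ["all checks within limits"]:
        print(message)
```

Passing the clock in as a parameter keeps the timing logic testable without real delays, and the alert messages could feed a dashboard of the kind described above.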
Galaxy Australia’s testing and ongoing monitoring drives the constant improvements that underpin a professional service. Their modelling of the use of Selenium has resulted in the tool now appearing in the procedures of other Galaxy servers around the world. It featured heavily in the new global release of Galaxy 23.1, with the list of enhancements made to the new version including many mentions of Selenium code.
After seeing the value of Galaxy Australia’s use of Selenium at a recent meeting for BioCommons partners, QCIF Software Engineer Brigette Gonch was keen to investigate how the tool might assist her work on the Bioplatforms Australia Data Portal. Galaxy Australia is also using Selenium during its move to a new data centre, to confirm improved performance and ensure that any kinks are ironed out before users are granted access to the new resources in November.
Keeping the Vortex spinning: How Galaxy Australia ensures jobs run smoothly
Total Perspective Vortex is a powerful locally-designed tool to streamline the management and routing of jobs. It led to such massive efficiencies that it has now been rolled out around the world.
When a researcher uses the Galaxy Australia platform for their data analysis, they expect a straightforward job launch, easy tool selection and a quick response time. This seemingly simple task requires a complex set of resources and a sophisticated process behind the scenes. Total Perspective Vortex (TPV) was designed locally to streamline the management and routing of jobs and led to such massive efficiencies that it has been rolled out around the world.
TPV dynamically sets resource requests and routes jobs to appropriate compute resources based on a set of configurable rules. Because it greatly reduces the manual intervention required to administer the system, managing and routing complex jobs in high-throughput environments becomes significantly more efficient. When it was deployed to dispatch jobs across Galaxy Australia’s highly distributed compute resources, it nearly doubled the maximum throughput.
One challenge administrators face is the huge diversity in jobs that Galaxy Australia users run. With TPV, a user’s choice of tool and their type of data is passed through a decision tree to ensure that the job is appropriately resourced, while balancing the competing demands of job turnaround time, throughput and overall efficiency.
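To make the idea of rule-based routing concrete, here is a small Python sketch of a first-match rule table that picks resources and a destination from a job's tool and input size. It is an illustration only and does not reproduce TPV's actual configuration format or API: the `Job`, `Rule` and `route` names, the tool names, the thresholds and the destination labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    tool: str        # hypothetical tool identifier
    input_gb: float  # total size of the job's input data


@dataclass
class Rule:
    tool: Optional[str]   # None matches any tool
    min_input_gb: float   # rule applies at or above this input size
    cores: int
    mem_gb: int
    destination: str      # hypothetical compute destination label

    def matches(self, job: Job) -> bool:
        tool_ok = self.tool is None or self.tool == job.tool
        return tool_ok and job.input_gb >= self.min_input_gb


# Hypothetical rules, most specific first; the last rule is a catch-all default.
RULES: List[Rule] = [
    Rule("assembler", 10.0, 32, 256, "hpc_highmem"),
    Rule("assembler", 0.0, 8, 64, "cloud_large"),
    Rule(None, 50.0, 16, 128, "cloud_large"),
    Rule(None, 0.0, 1, 4, "cloud_small"),
]


def route(job: Job, rules: Optional[List[Rule]] = None) -> Rule:
    """Return the first rule that matches the job (first-match wins)."""
    for rule in (RULES if rules is None else rules):
        if rule.matches(job):
            return rule
    raise LookupError(f"no rule matches {job}")


if __name__ == "__main__":
    for job in (Job("assembler", 25.0), Job("text_filter", 0.1)):
        chosen = route(job)
        print(job.tool, "->", chosen.destination, chosen.cores, "cores")
```

Ordering the rules from most to least specific is one simple way to express a decision tree as data, so that administrators adjust a table rather than code when the mix of jobs changes.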
Another strength and complexity of the Galaxy Australia system is its distributed network of computers. Galaxy Australia’s compute infrastructure is spread across a number of national cloud services. Underpinning the simple graphical user interface that users interact with are resources provided by AARNet, University of Melbourne, QCIF, Pawsey, NCI and the Microsoft Azure Cloud. The following table shows the sort of jobs the Galaxy Australia TPV distributes across these resources.
Examples of how TPV distributes jobs to compute infrastructure
TPV is authored by Software Engineer and Research Fellow Dr Nuwan Goonasekera, Melbourne Bioinformatics, University of Melbourne, with support from the global Galaxy community. It is evidence of the constant systemic improvements that are possible when an international community has a shared ambition and another example of Nuwan’s contribution to Galaxy’s international code base.
TPV has a domain-agnostic design that can be adapted to other complex resource management systems. If you are interested in technical details, ARDC is hosting an online information session on 26 Oct: TechTalk 23: Galaxy – Distributed Computing Through Total Perspective Vortex.
Galaxy Australia’s uptake reaches a new milestone: 7 million jobs and counting!
Over seven million jobs have been submitted to Galaxy Australia! Learn more about the lucky seven millionth job and Galaxy Australia’s versatility across a range of fields.
Kylie’s class celebrate submitting Galaxy’s seven millionth job
When the seven millionth job was submitted to Galaxy Australia, we were excited to see what area of research had triggered this milestone. The lucky job was submitted by a student in Dr Kylie Munyard’s undergraduate course ‘Introduction to Bioinformatics’ at Curtin University.
As Senior Lecturer in Molecular Genetics within the Curtin Medical School, Kylie regularly uses the extensive training materials available in Galaxy Training to teach her graduate researchers and undergraduate students how to use a wide range of data analysis tools:
“It is intuitive to use, it’s easy because students don’t have to install software, it has lots of really good documentation and visualisation, and all of this helps the students to understand what they are doing and more importantly why they are doing it.”
This university course teaches students how to import their raw sequence read data and move through a standard workflow of assembling genomes, calling variants, creating annotations and studying gene expression using Galaxy Australia.
Another key feature of Galaxy Australia that makes life easy for computational life science trainers is TIaaS (Training Infrastructure as a Service), or as Kylie describes it: “A little ‘corner’ of Galaxy Australia dedicated for our class use.”
TIaaS provides dedicated compute power, customised dashboards, and tracking of trainee jobs. All Australian trainers are eligible to use TIaaS through a simple application process, which can include setup assistance and backend support during training events. Trainers capture important technical event reports and trainees can access their data after the event for continued learning.
Rapid uptake of Galaxy Australia by researchers, trainers and trainees demonstrates the platform's versatility across a range of fields. Its popularity amongst bioinformatics teachers is just one of the reasons that it has been only a few months since Galaxy Australia celebrated reaching 6 million jobs!
Check out the wide range of tutorials available in Galaxy Training, and investigate how TIaaS can support successful training events.
A purpose-built environment for bioinformatics on the command line
A new welcoming interface to high performance computing environments has been created specifically for bioinformatics users, called the BioImage.
A new welcoming interface to high performance computing environments has been created specifically for bioinformatics users. The “BioImage” comes pre-installed with software, tools and datasets commonly used in the bioinformatics domain.
The BioImage has been purpose-built to help researchers working on the command line, removing the need to invest time in installing common software such as Singularity, Jupyter Notebook, RStudio or Nextflow. Researchers can follow bioinformatics best practices without having to set up the environment themselves from scratch, which gives them a reliable and reproducible starting point for new projects.
The BioImage is already in use on the Pawsey Nimbus Research Cloud and has detailed instructional documentation available. Its collaborative development has been led by Audrey Stott, Systems Administrator at the Pawsey Supercomputing Research Centre, as part of the BioCommons ‘Bring Your Own Data’ project. The utility of the BioImage will see it made available at other national HPC facilities, and work is currently underway to ensure compatibility with NCI’s Nirin Cloud. Expect to see it rolled out across other facilities in the near future.
Two particularly exciting features of the BioImage are CernVM-FS and Singularity-HPC. CernVM-FS is a mounted shared file system that provides users access to over 8,000 Biocontainer tools, plus common reference data sets. Singularity-HPC (a container-specific software module) then makes users’ lives easier by enabling them to execute containers as modules, bypassing the need for container syntax.
BioCommons bioinformatician Dr Georgina Samaha has assisted numerous University of Sydney researchers to use the BioImage in her role as Bioinformatics Group Lead at the Sydney Informatics Hub. Alexandra Boyling, PhD candidate at the ANZAC Research Institute said:
“The BioImage was a great place to enter the world of bioinformatics and really helped me to upskill on the command-line. I was able to jump right in and make use of Nextflow pipelines, Singularity containers and interactive RStudio sessions. As a wet-lab scientist, I felt quite intimidated by the idea of analysing the vast volumes of genomic data I was working with. Now I feel considerably more confident, and even excited, about the prospect of incorporating more bioinformatics into my projects in the future.”
The BioImage is also proving useful for group training, providing all trainees with access to a consistent computational environment with everything pre-loaded and ready to go. When used with Pawsey’s powerful Nimbus compute, it presents a friendly gateway that is tailored for bioinformatics jobs, and includes a risk-free sandbox to work within. Using the BioImage for training means that learning takes place in an enduring environment that can be returned to over time once paired with a (quick and easy to get) allocation to Pawsey’s resources. Trainees attending the upcoming online BioCommons workshop ‘RNASeq: reads to differential genes and pathways’ will no doubt continue to use the BioImage well after the training event is over.
Powering up the ACDC
The Australian Cardiovascular disease Data Commons (ACDC) will pave the way for researchers to make new mechanistic insights and identify potential markers for coronary artery disease.
The Australian Cardiovascular disease Data Commons (ACDC) will pave the way for researchers to make new mechanistic insights and identify potential markers for coronary artery disease (CAD), plus facilitate a translational pipeline to ensure new discoveries are deployed to clinical practice. A new comprehensive, secure, scalable, and internationally integrated data infrastructure will provide access to pooled data from approximately 400,000 individuals across up to 18 clinical cohorts within Australia.
CAD is the most common type of cardiovascular disease, both in terms of deaths and hospitalisations. Despite developing over several years, CAD is difficult to detect and many patients have no warning symptoms. Developments in data-intensive biomarker research techniques such as genomics, metabolomics, proteomics and immuno-phenotyping, paired with advances in image processing, machine learning, and systems biology pipelines, present opportunities to better understand and identify those at risk through pooled analysis using research infrastructures such as the ACDC.
Work on the ACDC commenced in September, with all stakeholders in the MRFF Critical Research Infrastructure Grant meeting online, including representatives of all 18 clinical cohorts that are planned to be integrated into the ACDC platform.
It’s a long way to the top, and small working groups will now hold fortnightly ‘sprints’ to keep the ACDC moving forward. You can read more about the ACDC on the BioCommons website.
A formal, in-person kick-off of the ACDC will occur in early December as part of a broader Australian Cardiovascular Alliance Precision Medicine Flagship event hosted by the Baker Heart and Diabetes Institute.
The ACDC project is led by the Baker Heart and Diabetes Institute and funded by the Medical Research Future Fund (MRFF) and Bioplatforms Australia (MRFF 2022 National Critical Research Infrastructure Grant: Building an Australian Cardiovascular disease Data Commons). Additional contributions are being made by the Baker Heart and Diabetes Institute, 23Strands, ACvA, CSL Limited, University of Sydney, Australian BioCommons, data custodians and other partners.