Establishing Gen3 to enable better Human Genome Data sharing in Australia


NOTE: This project was completed in Q3 2021. The aims were met and Gen3 instances were deployed at both CCI and UMCCR/AGHA to better enable human genome data management by those groups.


There are multiple initiatives in Australia generating human genomes at scale. To date, each has developed its own in-house solution for storing/warehousing genomes and describing their content, and most rely on manual, laborious systems for managing and providing access, each based on different technologies. The content of each collection is largely not transparent to outside users and, although there is a desire to share data wherever possible, most initiatives have no efficient way to expose the collection content to researchers or to distribute the data. As a result, sharing data under the current arrangements carries a substantial burden.

All need to operate scalable services and infrastructure that are easily administered; allow for the efficient management of data files and metadata (storage, security, access control, findability, and shareability/interoperability with collaborators and others); and use tools and infrastructure that align with global standards for human genome data sharing.

Gen3 is an open source software suite that allows data to be received, managed, described, quality controlled and shared with authorised/authenticated individuals. Data objects can be stored across any number of private or public clouds. Gen3 has been used to underpin several very large NIH-funded genomic datasets that collectively house and describe data derived from hundreds of thousands of human samples (e.g. NCI Genomic Data Commons, BloodPAC, BrainCommons, Kids First Data Commons). Access to the actual genomic data is managed by a Data Access Committee (DAC) for each dataset, and controlled by various tools that are part of the Gen3 system. Data held in the cloud in a Gen3-based storage system can also be directly linked to cloud-based analysis systems.

Working with key partners Zero Childhood Cancer (ZERO) / Children’s Cancer Institute and the University of Melbourne Centre for Cancer Research (UMCCR), this 6-month project (Apr to Sep 2021) is exploring the establishment of Gen3 technology in Australia. The broad objectives are to lay the necessary groundwork for ZERO and UMCCR to establish systems for easier management and sharing of their human genome data holdings, and to ensure that other Australian providers/institutions can easily deploy the same solution in the future.

The specific aims are to:

  1. Support ZERO and UMCCR to deploy basic Gen3 instances*

  2. Support ZERO and UMCCR to migrate existing cancer data into the Gen3 instances established

  3. Design procedures within ZERO and UMCCR that will enable the migration of future data into the Gen3 instances established

  4. Provide documentation and training material/events to enable users to use the systems deployed

  5. Provide documentation and training material/events so other Australian providers/institutions can also deploy a Gen3 instance

  6. Explore the applicability of an integrated identity and access management platform (CILogon) for potential future use by ZERO, UMCCR and others.

The project forms part of the Australian BioCommons Human Genome Informatics initiative, and is funded by NCRIS via Bioplatforms Australia, with contributions from UMCCR and ZERO.




*This enables a suite of microservices that support various core functionalities, each of which will be explored: Data discovery through a graphical user interface (GUI) and application programming interface (API); Metadata representation via a data dictionary; Metadata validation; Data upload through a command-line interface (CLI); User Authentication/Authorisation via Google or Shibboleth log-in; Data download through a GUI or CLI; and Data analysis via built-in Jupyter Notebooks.
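
As a rough illustration of the data dictionary and metadata validation functionalities listed above, the Python sketch below checks a metadata record against a small dictionary. It is a conceptual example only and does not use Gen3 code or its API: the node definition `SAMPLE_NODE`, its property names, and the `validate_record` function are all hypothetical, and a real Gen3 data dictionary is considerably richer than this.

```python
from typing import Any, Dict, List

# Hypothetical, simplified data dictionary for one node type ("sample").
# This is an assumption for illustration, not an actual Gen3 schema.
SAMPLE_NODE: Dict[str, Any] = {
    "required": ["submitter_id", "tissue_type"],
    "properties": {
        "submitter_id": {"type": str},
        "tissue_type": {"type": str, "enum": ["tumour", "normal"]},
        "age_at_collection": {"type": int},
    },
}


def validate_record(record: Dict[str, Any], node: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the record is valid."""
    errors: List[str] = []
    properties = node["properties"]

    # Every required property must be present.
    for name in node["required"]:
        if name not in record:
            errors.append(f"missing required property: {name}")

    # Every supplied property must be known, correctly typed and,
    # where the dictionary restricts values, one of the permitted values.
    for name, value in record.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unknown property: {name}")
            continue
        if not isinstance(value, spec["type"]):
            errors.append(
                f"wrong type for {name}: expected {spec['type'].__name__}"
            )
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"invalid value for {name}: {value!r}")
    return errors


if __name__ == "__main__":
    good = {"submitter_id": "S1", "tissue_type": "tumour"}
    bad = {"tissue_type": "blood", "colour": "red"}
    print(validate_record(good, SAMPLE_NODE))
    print(validate_record(bad, SAMPLE_NODE))
```

The point of the sketch is that the dictionary, rather than the validation code, defines what a valid record looks like, so one agreed dictionary can be applied consistently to every record submitted to a collection.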