Genome Assembly

Genome assembly is the process of taking large numbers of relatively short DNA sequences, termed ‘reads’, and computationally ordering and joining them to reconstruct a representation of the chromosome or genome from which they originated.
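As a rough illustration of what "computationally ordering" reads can mean, the toy sketch below greedily merges the pair of reads with the longest suffix-to-prefix overlap until no overlaps remain. The function names, the example reads and the `min_len` threshold are all hypothetical and chosen for illustration. Production assemblers must also handle sequencing errors, repeats, both DNA strands and vastly larger inputs, so this is not how any particular tool works.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the best-overlapping pair of sequences.

    Returns the remaining contigs once no pair overlaps by at least
    `min_len` bases. Toy example only: error-free reads, single strand.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three overlapping, made-up reads sampled from one short sequence.
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

Greedy merging is used here only because it is short and easy to follow. On real data it can mis-join repeated regions, which is one reason dedicated assembly tools are preferred.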

We offer fully subsidised resources to help Australian researchers undertake genome assembly. For the full range of relevant services and resources visit our genomics page.

If you're looking for assembly-specific tools, try these:

  • Access many genome assembly tools, backed by significant compute power, through your web browser via Galaxy Australia*

  • Take a look at the how-to guides that describe the steps and tools required to assemble a genome on the Galaxy Australia platform

  • Browse self-paced training materials freely available via the Galaxy Training Network

  • Find assembly software and tools across Australian computational infrastructure with ToolFinder

  • Find assembly workflows to deploy or implement with WorkflowHub.


Join the conversation - All welcome!

We coordinate a joint community for genome assembly and annotation, which aims to:

  • Provide a forum for the community to connect and share knowledge 

  • Understand the bioinformatics challenges of relevance to genome assembly and annotation

  • Identify gaps in the community-scale digital infrastructure currently available for genome assembly and annotation

  • Deploy national-scale solutions that address these gaps or challenges where relevant


Roadmap

Since 2020, we have been engaging with a broad group of Australian researchers working across a wide variety of taxa to develop a Genome Assembly Infrastructure Roadmap for Australia. The roadmap presents a community vision for shared national infrastructure that will help researchers undertake genome assembly.


*Galaxy Australia is capable of assembling an animal genome in less than two hours. For example, an avian genome was assembled from PacBio HiFi reads using hifiasm (input dataset size: 18 GB as compressed BAM, or 45 GB as uncompressed FASTQ) in 8 hours on a 16-CPU node, or in 1 hour 54 minutes on a 63-CPU node.