Key molecular structure prediction datasets now available in the new Structural Biology AI Reference Collection

The prediction and analysis of protein structure by Australian researchers has been accelerated by the publication of the Structural Biology AI Reference Collection at the National Computational Infrastructure (NCI). This collection includes key datasets required to support deep learning models for molecular structure prediction. Its creation is a result of a targeted collaboration between Australian BioCommons, UNSW Sydney and NCI staff.

How does this new resource help researchers?

The Structural Biology AI Reference Collection includes several replicated datasets that protein structure research depends on, including the reference data required by various deep learning models for molecular structure prediction. These copies of sequence and structure databases, which include UniProt, MGnify and the PDB, will be regularly updated, providing a citable, versioned resource that supports reproducible research. The datasets support higher quality model predictions using tools such as AlphaFold3, AlphaFold2, Boltz, RoseTTaFold-All-Atom and HelixFold3.

3D molecular visualisations showing the complex structural interactions between proteins and DNA (Image: Science Photo Library)

Created through national collaboration

A team from UNSW comprising Dr Tom Litfin (Australian BioCommons) and Joshua Caley (Structural Biology Facility) worked directly with the NCI Data Collections team to add the new resources to the NCI Data Catalogue. NCI is an important partner in the BioCommons BioCLI and Workflow Commons projects, and this data collection is an example of our shared ambitions to streamline processing and analysis of molecular data at scale, and establish national ecosystems that support life science researchers at the community level.

How can researchers access the data?

The Structural Biology AI Reference Collection is now freely available. Existing NCI users can register for file-system access through the MyNCI User Portal. New users should review the information on the data products, licences, and data access to get started. 

The Structural Biology AI Reference Collection is supported by the Australian Structural Biology Computing community, Australian BioCommons, and the UNSW Structural Biology Facility.