Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy
2024; Nature Portfolio; Volume: 42; Issue: 3; Language: English
10.1038/s41587-023-02100-3
ISSN 1546-1696
Authors: Delphine Larivière, Linelle Abueg, Nadolina Brajuka, Cristóbal Gallardo Alba, Björn Grüning, Byung June Ko, Alexander Ostrovsky, Marc Palmada‐Flores, Brandon D. Pickett, K Siddique-e Rabbani, Agostinho Antunes, Jennifer Balacco, Mark Chaisson, Haoyu Cheng, Joanna Collins, Mélanie Couture, Alexandra A. Denisova, Olivier Fédrigo, Guido Roberto Gallo, Alice Maria Giani, Grenville MacDonald Gooder, Kathleen Horan, Nivesh Jain, Cassidy Johnson, Heebal Kim, Chul Lee, Tomàs Marquès‐Bonet, Brian I. O’Toole, Arang Rhie, Simona Secomandi, Marcella Sozzoni, Tatiana Tilley, Marcela Uliano‐Silva, Marius van den Beek, Robert W. Williams, Robert M. Waterhouse, Adam M. Phillippy, Erich D. Jarvis, Michael C. Schatz, Anton Nekrutenko, Giulio Formenti
Topic(s): Chromosomal and Genetic Variations
Abstract: The Earth BioGenome Project aims to produce reference genomes for all ~1.8 million known eukaryotic species over the next decade [1–4]. Achieving this goal will require the current pace of reference genome production to increase by at least two orders of magnitude [1]. Reaching this speed-up will require automating the assembly process with a pipeline that any research group can access. Enabling this goal requires sustained effort in three major areas: genome assembly optimization and best-practice development, computational infrastructure provisioning, and dissemination and training. To optimize the assembly process and devise best practices, we combined the expertise of two projects: the Vertebrate Genomes Project (VGP) and the European Reference Genome Atlas (ERGA). The VGP is a collaborative effort to generate reference genomes for all ~70,000 vertebrate species [5]. In the past 5 years, the VGP has released hundreds of new reference genomes, supported by the development of automated assembly tools and workflows [1,5]. The ERGA is a pan-European scientific initiative to generate reference genomes for all ~200,000 European eukaryote species, many of which are on the International Union for Conservation of Nature Red List of species at risk of extinction [2]. Building on prior VGP work, originally on the DNAnexus platform (Supplementary Note, section 1.1), we developed a pipeline within the Galaxy ecosystem [6] that combines Pacific Biosciences (PacBio) high-fidelity (HiFi) reads with long-distance information from Hi-C maps and/or optical maps to generate nearly complete assemblies (Supplementary Note, section 1.3). The pipeline further uses Hi-C data or whole-genome sequence data from the parents to produce chromosomal-level or whole-genome-level phased genomes, respectively. To streamline the assembly process and ensure quality, the pipeline includes extensive quality control (QC) functions […] pipeline do not release those that are under specific embargo policies for genome-wide analyses (e.g., https://genome10k.ucsc.
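The scale of the Earth BioGenome Project target can be made concrete with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a figure from the paper: it assumes the ~1.8 million species are spread evenly over exactly ten years of 365 days, and it reads "at least two orders of magnitude" as a factor of 100. The function names are hypothetical.

```python
# Back-of-the-envelope throughput implied by the Earth BioGenome Project goal.
# Assumptions (not from the paper): the ~1.8 million species are spread evenly
# over exactly 10 years, a year has 365 days, and "two orders of magnitude"
# means a 100-fold increase.

TARGET_SPECIES = 1_800_000  # ~1.8 million known eukaryotic species
YEARS = 10                  # "over the next decade"
SPEEDUP = 100               # "at least two orders of magnitude" = 10**2


def required_rate(target: int, years: int) -> dict[str, float]:
    """Return the average number of assemblies needed per year, month and day."""
    per_year = target / years
    return {
        "per_year": per_year,
        "per_month": per_year / 12,
        "per_day": per_year / 365,
    }


def implied_current_rate(per_year: float, speedup: int) -> float:
    """Upper bound on today's yearly output if at least a `speedup`-fold increase is needed."""
    return per_year / speedup


if __name__ == "__main__":
    rates = required_rate(TARGET_SPECIES, YEARS)
    for unit, value in rates.items():
        print(f"{unit}: {value:,.0f}")
    current = implied_current_rate(rates["per_year"], SPEEDUP)
    print(f"implied current pace: <= {current:,.0f} per year")
```

Under these assumptions the script prints averages of 180,000 assemblies per year, 15,000 per month and about 493 per day, which puts today's pace at no more than roughly 1,800 genomes per year. These are averages only. Real output would ramp up over the decade rather than stay constant.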
References