Article · Open access · Peer-reviewed

A robust benchmark for detection of germline large deletions and insertions

2020; Nature Portfolio; Volume: 38; Issue: 11; Language: English

10.1038/s41587-020-0538-8

ISSN

1546-1696

Authors

Justin M. Zook, Nancy F. Hansen, Nathan D. Olson, Lesley M. Chapman, James C. Mullikin, Chunlin Xiao, Stephen T. Sherry, Sergey Koren, Adam M. Phillippy, Paul C. Boutros, Sayed Mohammad Ebrahim Sahraeian, Vincent Huang, Alexandre Rouette, Noah Alexander, Christopher E. Mason, Iman Hajirasouliha, Camir Ricketts, Joyce Lee, Rick Tearle, Ian T. Fiddes, Álvaro Martínez Barrio, Jeremiah A. Wala, Andrew Carroll, Noushin Ghaffari, Oscar L. Rodriguez, Ali Bashir, Shaun D. Jackman, John J. Farrell, Aaron M. Wenger, Can Alkan, Arda Söylev, Michael C. Schatz, Shilpa Garg, George M. Church, Tobias Marschall, Ken Chen, Xian Fan, Adam C. English, Jeffrey Rosenfeld, Weichen Zhou, Ryan E. Mills, Jay M. Sage, Jennifer R. Davis, Michael D. Kaiser, John S. Oliver, Anthony P. Catalano, Mark Chaisson, Noah Spies, Fritz J. Sedlazeck, Marc Salit

Topic(s)

Genomics and Rare Diseases

Abstract

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and contain 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping. Detection of structural variants in the human genome is facilitated by a benchmark set of large deletions and insertions.
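The abstract implies a simple accounting rule. Within the Tier 1 regions, a benchmark insertion or deletion of ≥50 bp with no matching call counts as a false negative. A call with no matching benchmark variant counts as a putative false positive. The Python sketch below illustrates that bookkeeping under simplifying assumptions. The `SV` record, the `compare` function, the 1,000 bp position tolerance and the 0.7 size-ratio threshold are all hypothetical, chosen only for illustration. The greedy one-to-one matching is a toy stand-in, not the comparison procedure used by the Genome in a Bottle Consortium or by any specific benchmarking tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a VCF parser)."""
    chrom: str
    pos: int      # start position
    svtype: str   # "INS" or "DEL"
    length: int   # absolute SV size in bp


MIN_SV_LEN = 50        # the benchmark covers insertions/deletions >= 50 bp
MAX_POS_DIST = 1000    # hypothetical position tolerance (bp) for a match
MIN_SIZE_RATIO = 0.7   # hypothetical size-similarity threshold for a match


def in_regions(sv, regions):
    """True if the SV start lies in any half-open (chrom, start, end) interval."""
    return any(sv.chrom == c and s <= sv.pos < e for c, s, e in regions)


def matches(a, b):
    """Toy matching rule: same chrom and type, nearby start, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > MAX_POS_DIST:
        return False
    return min(a.length, b.length) / max(a.length, b.length) >= MIN_SIZE_RATIO


def compare(truth, calls, regions):
    """Return (TP, FP, FN) counts restricted to the benchmark regions."""
    truth = [t for t in truth if t.length >= MIN_SV_LEN and in_regions(t, regions)]
    calls = [c for c in calls if c.length >= MIN_SV_LEN and in_regions(c, regions)]
    used = set()  # indices of benchmark SVs already matched (one-to-one)
    tp = 0
    for c in calls:
        for i, t in enumerate(truth):
            if i not in used and matches(c, t):
                used.add(i)
                tp += 1
                break
    fp = len(calls) - tp          # unmatched calls: putative false positives
    fn = len(truth) - len(used)   # unmatched benchmark SVs: false negatives
    return tp, fp, fn


# Illustrative, made-up data (not from the benchmark set).
regions = [("chr1", 0, 1_000_000)]
truth = [
    SV("chr1", 10_000, "DEL", 300),
    SV("chr1", 50_000, "INS", 120),
    SV("chr1", 2_000_000, "DEL", 500),  # outside the regions: ignored
]
calls = [
    SV("chr1", 10_050, "DEL", 280),     # matches the first benchmark DEL
    SV("chr1", 70_000, "INS", 90),      # too far from any benchmark INS
    SV("chr1", 90_000, "DEL", 30),      # below 50 bp: ignored
]
print(compare(truth, calls, regions))
```

With this made-up data the deletion call matches, the distant insertion call is a putative false positive, and the unmatched benchmark insertion is a false negative, giving `(1, 1, 1)`. Real SV comparison also has to handle representation differences (for example, the same insertion placed at shifted positions in a repeat), which is why dedicated tools use more elaborate matching than this sketch.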
