Review Open access Peer-reviewed

Reproducible acquisition, management and meta-analysis of nucleotide sequence (meta)data using q2-fondue

2022; Oxford University Press; Volume: 38; Issue: 22; Language: English

10.1093/bioinformatics/btac639

ISSN

1367-4811

Authors

Michał Ziemski, A. K. Adamov, Lina Kim, Lena Flörl, Nicholas A. Bokulich

Topic(s)

Microbial Community Ecology and Physiology

Abstract

Motivation: The volume of public nucleotide sequence data has blossomed over the past two decades and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open-source Python package q2-fondue to enable user-friendly acquisition, re-use and management of public sequence (meta)data while adhering to open data principles.

Results: q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss upon space exhaustion and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole-genome and metagenome datasets.

Availability and implementation: q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available at https://github.com/bokulich-lab/q2-fondue-examples.

Supplementary information: Supplementary data are available at Bioinformatics online.

Reference(s)