Letter · Open access · Peer reviewed

Reagent contamination in viromics: all that glitters is not gold

2019; Elsevier BV; Volume: 25; Issue: 10; Language: English

10.1016/j.cmi.2019.06.019

ISSN

1469-0691

Authors

Edward C. Holmes

Topic(s)

Molecular Biology Techniques and Applications

Abstract

High-throughput sequencing and metagenomics are revolutionizing virus discovery and enabling the rapid tracking of disease outbreaks on potentially clinically actionable time scales [[1] Grubaugh N.D., Ladner J.T., Lemey P., Pybus O.G., Rambaut A., Holmes E.C., et al. Tracking virus outbreaks in the twenty-first century. Nat Microbiol. 2019; 4: 10-19]. Metagenomics may also eventually reveal the total infectome (bacteria, viruses, eukaryotic parasites) of an individual host in a single sequencing run, turbo-charging diagnostics [[2] Zhang Y.-Z., Chen Y.-M., Wang W., Qin X.-C., Holmes E.C. Expanding the RNA virosphere by unbiased metagenomics. Annu Rev Virol. 2019; (May 17). https://doi.org/10.1146/annurev-virology-092818-015851]. Although the science of this new genomic era faces a number of challenges (with the sheer quantity and complexity of data generated by rapidly developing sequencing technologies both a help and a hindrance), one additional, major and often unanticipated problem is reagent contamination.

In a new study, Asplund et al. [[3] Asplund M., Kjartansdóttir K.R., Mollerup S., Vinner L., Fridholm H., Herrera J.A.R., et al. Contaminating viral sequences in high-throughput sequencing viromics: a linkage study of 700 sequencing libraries. Clin Microbiol Infect. 2019; 25: 1277-1285. https://doi.org/10.1016/j.cmi.2019.04.028] reveal a remarkable scale and pattern of virus contamination in reagents, which should send an important and timely warning shot to all those engaged in ‘viromics’. Although there have been previous discussions on the prevalence and impact of reagent contamination [[4] Smuts H., Kew M., Khan A., Korsman S. Novel hybrid parvovirus-like virus, NIH-CQV/PHV, contaminants in silica column-based nucleic acid extraction kits. J Virol. 2014; 88 (1398–8); [5] Naccache S.N., Greninger A.L., Lee D.,
Coffey L.L., Phan T., Rein-Weston A., et al. The perils of pathogen discovery: origin of a novel parvovirus-like hybrid genome traced to nucleic acid extraction spin columns. J Virol. 2013; 87: 11966-11977], and the echoes of some very high-profile cases (such as XMRV; [[6] Paprotka T., Delviks-Frankenberry K.A., Cingöz O., Martinez A., Kung H.J., Tepper C.G., et al. Recombinant origin of the retrovirus XMRV. Science. 2011; 333: 97-101; [7] Knox K., Carrigan D., Simmons G., Teque F., Zhou Y., Hackett Jr., J., et al. No evidence of murine-like gammaretroviruses in CFS patients previously identified as XMRV-infected. Science. 2011; 333: 94-97]) can still be heard, the work of Asplund et al. is notable because of its size: an analysis of over 700 sequencing libraries, many from humans, generated using eight different protocols including shotgun DNA and RNA sequencing as well as a variety of enrichment and capture protocols, and involving over 50 laboratory reagents.

A staggering 493 of the virus sequences analysed were significantly associated with the use of laboratory reagents, including those for sample storage and nucleic acid extraction, purification and library preparation. In addition, a remarkable 68% of all viral reads were part of clusters linked to laboratory components. Although a wide variety of viruses were statistically linked to the reagents, single-stranded (ss) DNA viruses, such as those related to circoviruses, were particularly common, including a number found in environmental samples such as sewage. To add to the complexity, these reagent-associated viruses often appeared only sporadically in subsets of the total data. Hence, some of the tell-tale signs of contamination, discussed in more detail below, may not be readily apparent.

Although the study of Asplund et al.
is based on statistical associations between viruses and reagents, greatly enhanced by the fact that some of the human samples were processed using different protocols and sequenced multiple times, the sheer scale of the contamination is remarkable, and few reagents are contamination-free. Figure 1 in the Asplund et al. paper, which starkly depicts the extent and structure of contamination, should probably be posted on every laboratory wall as a warning. Interestingly, components used in nucleic acid purification, nucleases and library kits were particularly prone to the presence of viruses, with RNeasy MinElute the most contaminated reagent, followed by ScriptSeq v2 and ScriptSeq Gold.

Although previous studies have shown that marine organisms may be an especially rich source of reagent contamination [5], one of the key observations of the Asplund et al. study was that common human viruses, such as parvovirus B19, may also be contaminants, greatly complicating both diagnostics and prevalence studies. Also of importance, although perhaps expected, was that reagent-associated contamination was primarily associated with individual sequence reads rather than assembled contigs. Hence, special care must be taken when assigning sequences to viruses based on read data alone, and it is important to drill down into the individual reads to confirm that they truly represent the microbe suggested by the bioinformatics analysis, rather than a mis-assignment, which is commonplace.

Although many of the contaminants identified by Asplund et al. were ssDNA viruses, it is clear that other types of microbe are also present in laboratory reagents.
The widespread presence of bacteria, and how this might impact studies of the microbiome, is now appreciated [[8] de Goffau M.C., Lager S., Salter S.J., Wagner J., Kronbichler A., Charnock-Jones D.S., et al. Recognizing the reagent microbiome. Nat Microbiol. 2018; 3: 851-853]. It is also clear that RNA viruses can find their way into laboratory materials, with Kadipiro virus (a double-stranded RNA virus) an excellent case in point [[9] Ngoi C.N., Siqueira J., Li L., Deng X., Mugo P., Graham S.M., et al. Corrigendum: the plasma virome of febrile adult Kenyans shows frequent parvovirus B19 infections and a novel arbovirus (Kadipiro virus). J Gen Virol. 2017; 98: 517]. My own research group has also found evidence that members of the Kinetoplastida, a group of protists that includes such clinically important parasites as trypanosomes, are strongly associated with reagents, along with the ubiquitous ssDNA viruses.

As well as reagents, the environment may represent an important source of analytical noise, as any virus sequence reads obtained could in theory come from microorganisms that are co-infecting the host, or be associated with host diet [2]. Indeed, some of the earliest metagenomic studies of humans revealed that plant viruses are a common component of human faecal matter, reflecting the consumption of plants as food [[10] Zhang T., Breitbart M., Lee W.H., Run J.Q., Wei C.L., Soh S.W., et al. RNA viral community in human feces: prevalence of plant pathogenic viruses. PLoS Biol. 2005; 4: e3], and so these should not be regarded as pathogens.

If reagent contamination is so pervasive, how best to deal with it?
First, and most obvious, in the context of trying to identify the possible microbiological cause of a specific infectious disease it is always important to confirm the presence of a microbe suggested by sequencing through an additional step, such as PCR or even cell culture. Hence, despite its immense power and potential, at present metagenomics might be best thought of as a first-line method to identify potential pathogens for follow-up, rather than the final word in pathogen identification. Indeed, in diagnostic settings it is advisable to sequence a blank control, such as a sterile water sample, alongside each case, and then to use this as a quality-control filter when comparing viromes obtained from the case sample. Hopefully cleaner reagents and more reliable bioinformatics screens are a not-too-distant dream.

Next, there are a few important tell-tale signals of reagent contamination: for example, reads associated with reagents are often (although not always) at low abundance and will be found in multiple libraries. Indeed, as pointed out by Asplund et al., it is critical to consider genome coverage, read depth and the distribution of read/contig alignments across the viral genome as key parameters that might be indicative of contaminant or bona fide sequences.

Finally, and perhaps most important of all, the analysis and interpretation of high-throughput sequencing and metagenomic data require a healthy dose of common sense. If the data contain virus sequence reads that are phylogenetically divergent from those normally associated with the host of interest, such as the presence of marine or plant viruses in a human sample, then the possibility of reagent contamination should be carefully considered.

The author declares no conflicts of interest. No external funding was received for this study.
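As a rough illustration of two of the checks suggested in this letter (screening against a sequenced blank control, and looking for taxa that recur at low abundance across many libraries), the following Python sketch flags suspect taxa from per-library read counts. It is not the statistical linkage method of Asplund et al.; the function name `flag_suspects`, the input layout, the example data and both thresholds are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TaxonFlag:
    taxon: str
    in_blank: bool       # detected in at least one blank-control library
    prevalence: float    # fraction of sample libraries containing the taxon
    max_fraction: float  # largest share of any sample library's total reads
    suspect: bool


def flag_suspects(counts, totals, blanks, min_prevalence=0.5, low_abundance=1e-4):
    """Flag taxa that look like possible reagent contaminants.

    counts: {library: {taxon: reads assigned to that taxon}}
    totals: {library: total reads in that library} (assumed > 0)
    blanks: set of library names that are blank controls (e.g. sterile water)
    The two thresholds are arbitrary illustrative defaults, not published values.
    """
    samples = {lib: taxa for lib, taxa in counts.items() if lib not in blanks}
    blank_taxa = {
        taxon
        for lib in blanks
        for taxon, reads in counts.get(lib, {}).items()
        if reads > 0
    }
    seen = {t for taxa in samples.values() for t, reads in taxa.items() if reads > 0}

    flags = []
    for taxon in sorted(seen):
        # Share of each library's reads, for libraries where the taxon occurs.
        fractions = [
            taxa[taxon] / totals[lib]
            for lib, taxa in samples.items()
            if taxa.get(taxon, 0) > 0
        ]
        prevalence = len(fractions) / len(samples)
        max_fraction = max(fractions)
        in_blank = taxon in blank_taxa
        # Tell-tale pattern: present in many libraries, but always at low abundance.
        widespread_and_rare = (
            prevalence >= min_prevalence and max_fraction <= low_abundance
        )
        flags.append(
            TaxonFlag(taxon, in_blank, prevalence, max_fraction,
                      in_blank or widespread_and_rare)
        )
    return flags


# Hypothetical example: one blank control and three patient libraries.
counts = {
    "blank_water": {"circovirus-like": 12},
    "patient_A": {"circovirus-like": 3, "norovirus": 9000},
    "patient_B": {"circovirus-like": 5},
    "patient_C": {"tobamovirus": 4},
}
totals = {
    "blank_water": 50_000,
    "patient_A": 1_000_000,
    "patient_B": 1_000_000,
    "patient_C": 1_000_000,
}

suspects = [f.taxon for f in flag_suspects(counts, totals, {"blank_water"}) if f.suspect]
print(suspects)  # ['circovirus-like']
```

A flag from a screen like this is only a prompt for closer inspection. It says nothing about genome coverage or the distribution of reads across the viral genome, and it does not capture host implausibility (the plant virus in `patient_C` is not flagged), so confirmation by PCR or culture and common-sense interpretation remain necessary.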

Reference(s)