Letter, open access, peer reviewed

The UNITE database for molecular identification of fungi – recent updates and future perspectives

2010; Wiley; Volume: 186; Issue: 2; Language: English

10.1111/j.1469-8137.2009.03160.x

ISSN

1469-8137

Authors

Kessy Abarenkov, R. Henrik Nilsson, Karl‐Henrik Larsson, Ian J. Alexander, Ursula Eberhardt, Susanne Erland, Klaus Høiland, Rasmus Kjøller, Ellen Larsson, Taina Pennanen, Robin Sen, Andy F. S. Taylor, Leho Tedersoo, Björn M. Ursing, Trude Vrålstad, Kare Liimatainen, Ursula Peintner, Urmas Kõljalg

Topic(s)

Yeasts and Rust Fungi Studies

Abstract

New Phytologist, Volume 186, Issue 2, p. 281-285
Letters, Free Access

Kessy Abarenkov: Department of Botany, Institute of Ecology and Earth Sciences, University of Tartu, 40 Lai St, 51005 Tartu, Estonia
R. Henrik Nilsson: Department of Botany, Institute of Ecology and Earth Sciences, University of Tartu, 40 Lai St, 51005 Tartu, Estonia; Department of Plant and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
Karl-Henrik Larsson: The Mycological Herbarium, Natural History Museum, University of Oslo, PO Box 1172, Blindern, N-0318 Oslo, Norway
Ian J. Alexander: School of Biological Sciences, University of Aberdeen, Cruickshank Building, Aberdeen AB24 3UU, UK
Ursula Eberhardt: Centraalbureau voor Schimmelcultures, PO Box 85167, 3508 AD Utrecht, the Netherlands
Susanne Erland: Max Planck Institute for Chemical Ecology, Hans-Knoell-Strasse 8, D-07745 Jena, Germany
Klaus Høiland: Department of Biology, University of Oslo, Box 1066, Blindern, N-0316 Oslo, Norway
Rasmus Kjøller: Biological Institute, Terrestrial Ecology, University of Copenhagen, Øster Farimagsgade 2D, DK-1353 Copenhagen, Denmark
Ellen Larsson: Department of Plant and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
Taina Pennanen: The Finnish Forest Research Institute, PL 18, FI-01301 Vantaa, Finland
Robin Sen: Department of Environmental and Geographical Sciences, Manchester Metropolitan University, John Dalton Building, Chester Street, Manchester M1 5GD, UK
Andy F. S. Taylor: Macaulay Institute, Craigiebuckler, Aberdeen AB15 8QH, UK
Leho Tedersoo: Department of Botany, Institute of Ecology and Earth Sciences, University of Tartu, 40 Lai St, 51005 Tartu, Estonia
Björn M. Ursing: Beronem AB, Kullagränd 6, 239 42 Falsterbo, Sweden
Trude Vrålstad: Department of Biology, University of Oslo, Box 1066, Blindern, N-0316 Oslo, Norway
Kare Liimatainen: Department of Biology, University of Washington Seattle, Box 351330, 235 Johnson Hall, WA 98195, USA
Ursula Peintner: Institute of Microbiology, University Innsbruck, Technikerstr. 25, 6020 Innsbruck, Austria
Urmas Kõljalg (corresponding author): Department of Botany, Institute of Ecology and Earth Sciences, University of Tartu, 40 Lai St, 51005 Tartu, Estonia (Author for correspondence: tel +372 738 3027; email [email protected])

First published: 25 March 2010
https://doi.org/10.1111/j.1469-8137.2009.03160.x
Citations: 1,175

Rationale

Ectomycorrhizal (ECM) fungi are typically examined for taxonomic affiliation through sequence similarity searches involving the internal transcribed spacer (ITS) region of the nuclear ribosomal repeat unit and the International Nucleotide Sequence Databases (INSD) (Peay et al., 2008; Taylor, 2008; Benson et al., 2009; Tedersoo et al., 2010). However, the usefulness of these searches is constrained by the technical quality and the taxonomic reliability of the reference sequences in the databases (Nilsson et al., 2006; Bidartondo, 2008). The meagre data on voucher specimen, country of collection and host, which are associated with many of the entries, place a further restriction on the usefulness of the entries in an ecological or taxonomic context (Ryberg et al., 2009). The UNITE project (Kõljalg et al., 2005) was initiated in 2001 to address these problems through a free online database for high-quality reference records of ITS sequences from North European ECM fungi. Taxonomic reliability was the founding principle of the initiative; all records were determined to species level (or as far as possible) by researchers well versed in the taxonomic group in question, and all sequences were obtained from, or in association with, richly annotated fruiting bodies (voucher specimens) deposited in public herbaria.

The years since the 2005 publication of UNITE have witnessed a proliferation of environmental sequencing efforts from all over the world, and there is a clear tendency in the recent literature to target entire communities of fungi rather than individual taxa or subsets of the full diversity. Such studies may still focus on ECM fungi, but inherent to many of these projects is a desire to examine also the non-ECM sequences to obtain a better view of the trophic processes and potential interdependence or interactions among taxa (Lindahl et al., 2007).
Any modern initiative aiming to provide facilities for sequence identification must therefore be prepared for geographically diverse, and ecologically disparate, query sequences. These observations, together with the prospects of emerging high-throughput sequencing technologies such as massively parallel (454) pyrosequencing (Margulies et al., 2005; Shendure & Ji, 2008; Hibbett et al., 2009), suggest that a sequence database with a limited geographical or nutritional coverage of taxa – and where sequences are processed one at a time – may no longer serve the needs of the research community in a fully efficient way.

We have been working to keep the UNITE database abreast of developments in the field, and in the present Letter we list the major updates in technology, methodology and policy that UNITE has undergone since its initial publication. A set of guidelines for its future development is also provided.

Sequence coverage and taxonomic inclusiveness: statistics and policy updates

The number of voucher-associated sequences in UNITE has increased from 811 in 2005 to nearly 3000 at present (Table 1). The number of species of North European ECM fungi represented in UNITE has more than doubled to 896 (73% of the known ECM fungi in North Europe; Hansen & Knudsen, 1997; Knudsen & Vesterholt, 2008), and the total number of species has increased from 480 to 1078. Taxon sampling is necessarily not uniform, but reflects the availability of taxonomic expertise and updated generic revisions. For example, Ramaria (two species in UNITE), Pezizales (30 species in UNITE) and Helotiales (32 species in UNITE) are taxa for which a substantial amount of data remain to be generated; by contrast, sequences for 80% of the known North European species of Lactarius and Boletus have been deposited in UNITE.

Table 1. Statistics on the records of the UNITE database as of November 2009. Columns give the number of sequences and the number of species per taxon.

Taxon name          Sequences 2009  Species 2009  Sequences Mar 2005  Species Mar 2005
Basidiomycota
  Agaricales                  1576           587                 330               227
  Boletales                    291           102                 133                62
  Cantharellales                18             8                  10                 7
  Dacrymycetales                 1             1                   0                 0
  Geastrales                     2             2                   0                 0
  Hymenochaetales                1             1                   0                 0
  Hysterangiales                 2             2                   0                 0
  Phallales                      8             7                   2                 2
  Polyporales                   56            27                   7                 5
  Russulales                   442           191                 216               115
  Thelephorales                275            78                  98                49
  Tremellales                   11             5                   3                 2
Ascomycota
  Eurotiales                     4             2                   3                 2
  Helotiales                    37            32                   0                 0
  Hypocreales                    1             1                   0                 0
  Lecanorales                    1             1                   0                 0
  Orbiliales                     1             1                   0                 0
  Pezizales                     33            30                   9                 9
Total                         2760          1078                 811               480

Where applicable the corresponding statistics from 2005 are also given. A list of all species and the number of sequences present in UNITE is available at http://unite.ut.ee/SearchPages.php

The first major change to announce is that the previous geographical and ecological restrictions on the scope and sequence coverage of UNITE have been lifted. Although we will continue to expand and enhance the taxon sampling of mycorrhizal fungi, we now accept fully identified and well-annotated reference sequences from any geographical locality, nutritional mode and group of fungi, as long as they are supported by vouchers or type cultures, and the sequence authors have a documented expertise of the taxa in question. The curators of UNITE reserve the right to send new reference sequences for peer review. Unidentified and environmental sequences may also be deposited in UNITE, provided that they are of high quality, well annotated and mutually nonredundant (i.e. large sets of identical sequences from the same author and study site will not be accepted); in addition they must cover the ITS region in full. The option to submit reference sequences with the sequence data open to query, but with the species name withheld (i.e. ‘locked’), will be allowed only when the sequences have been accepted for publication but are not yet online.
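The nonredundancy requirement for environmental sequences can be pictured with a small pre-submission check. The following is only a sketch under stated assumptions, not UNITE's actual procedure: the record keys ('id', 'author', 'site', 'seq') and the function name collapse_identical are hypothetical, and "identical" is taken to mean exact string identity after case normalisation.

```python
from collections import defaultdict


def collapse_identical(records):
    """Keep one representative per (author, site, sequence) group.

    `records` is an iterable of dicts with the hypothetical keys
    'id', 'author', 'site' and 'seq'.
    """
    groups = defaultdict(list)
    for rec in records:
        # Normalise case and surrounding whitespace so that trivially
        # different spellings of the same sequence compare equal.
        key = (rec["author"], rec["site"], rec["seq"].strip().upper())
        groups[key].append(rec["id"])
    # The first id seen represents the group; 'copies' records how many
    # identical submissions were merged into it.
    return [
        {"id": ids[0], "author": author, "site": site, "seq": seq, "copies": len(ids)}
        for (author, site, seq), ids in groups.items()
    ]


# Invented toy records, for illustration only.
records = [
    {"id": "env1", "author": "A", "site": "plot1", "seq": "ACGTAC"},
    {"id": "env2", "author": "A", "site": "plot1", "seq": "acgtac"},
    {"id": "env3", "author": "A", "site": "plot2", "seq": "ACGTAC"},
]
unique = collapse_identical(records)
```

Identical sequences from different study sites are kept apart here, since the policy quoted above concerns duplicates from the same author and study site.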
We are furthermore looking into supporting the provision of operational names of the accession number type for clusters of hypothetically conspecific sequences that cannot be identified to species level. Such informal names represent an unambiguous way of referring to such taxa until the data are there to warrant formal description of the species.

Although UNITE will maintain a focus on the ITS region, there is now generic support for any other gene and genetic marker pertinent to the identification of fungi. The nuclear large subunit (nLSU/28S) gene, arguably one of the mainstays of current fungal phylogenetic inference, is an example of this. Although not always discriminative at the species level, the nLSU can be used, in the absence of a good ITS match, to assign specimens to a higher taxonomic level. The number of nLSU sequences in UNITE is modest, but several sizable data sets of primarily saprophytic fungi have been scheduled for inclusion in the near future. We hope that these will be followed by more, and invite the mycological community to deposit nLSU sequences in UNITE. It is important that such data be accompanied by primer and amplification details, if this information is not available in a tagged publication.

Richer and more dynamic project-oriented relational database model

The initial database model has been expanded into a 130-table SQL-compliant database structure compatible with the Taxonomic Database Working Group standards (http://www.tdwg.org/standards/). The structure draws from Taxonomer (Pyle, 2004; and subsequent additions) to capture the full complexity of modern mycological taxonomy and nomenclature. Metadata pertaining to sequences or sets of sequences can now be stored in a way open to direct query and include locality, habitat, soil type and host (Supporting Information Fig. S1).
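The many-to-many relations used in this database model (a sequence carrying more than one correct name, for instance) are commonly expressed through a join table. The sketch below uses Python's built-in sqlite3 module purely for illustration; the table names, column names, accessions and taxon names are all hypothetical placeholders and are not taken from the actual UNITE schema.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    id        INTEGER PRIMARY KEY,
    accession TEXT UNIQUE NOT NULL
);
CREATE TABLE taxon_name (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- Join table: one row per (sequence, name) pair, so a sequence may
-- carry several names and a name may apply to several sequences.
CREATE TABLE sequence_name (
    sequence_id INTEGER NOT NULL REFERENCES sequence(id),
    name_id     INTEGER NOT NULL REFERENCES taxon_name(id),
    PRIMARY KEY (sequence_id, name_id)
);
""")

# Placeholder data, not real accessions or taxon names.
conn.executemany("INSERT INTO sequence VALUES (?, ?)",
                 [(1, "SEQ0001"), (2, "SEQ0002")])
conn.executemany("INSERT INTO taxon_name VALUES (?, ?)",
                 [(1, "Teleomorph name"), (2, "Anamorph name")])
conn.executemany("INSERT INTO sequence_name VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])


def names_for(accession):
    """Return all names linked to one sequence, sorted alphabetically."""
    rows = conn.execute(
        """SELECT t.name
           FROM sequence s
           JOIN sequence_name sn ON sn.sequence_id = s.id
           JOIN taxon_name t ON t.id = sn.name_id
           WHERE s.accession = ?
           ORDER BY t.name""",
        (accession,),
    )
    return [row[0] for row in rows]
```

Calling names_for("SEQ0001") returns both placeholder names, while names_for("SEQ0002") returns only one. The same pattern would extend to species and habitats or to studies and plots.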

Sequence sets can be formed to reflect contexts such as studies, plots and samples, and the sequences in such sets can be addressed jointly, separately, or in combination with all other sequences. Particular care has been taken to make sure that information can be represented in a fully nuanced way through many-to-many relations: a sequence can have more than one correct name (to account for anamorph–teleomorph relationships and synonyms), a species can have many habitats and ecological characteristics, and a study may be composed of any number of distinct or coupled plots and subplots. A researcher may, for instance, divide the sequences from some given project into sets – reflecting, for example, host or plot of origin – and compare these for differences in taxonomic composition and species richness. Research groups can be granted far-reaching access to the system, allowing, for example, tailoring of the submission procedure in the interest of exact and efficient storage and representation of particularly complex data. Indeed, we envision UNITE as being a fully fledged sequence-management system that individual researchers or research groups can use to store and analyze data from entire projects or study sites. Many of the features of such a system are already in operation. We feel that the inclusion of this sequence-management environment to process new and existing sequences distinguishes the UNITE database from the INSD.

Improved support for storage of auxiliary data

It is now possible to associate any number of binary files with sequences, species, studies, or other objects or contexts within UNITE. While this service was initially conceived as a response to the debate on the availability of primary sequence data such as chromatograms and other raw sequence data (Costello, 2009), any noncopyrighted and freely available file relevant to the interpretation of the underlying or downstream data will be accepted.
This includes, but is not limited to, photographs of fruiting bodies or root tips, drawings of spores and mantle structure, maps, GIS (geographic information system) data files and PDF (Portable Document Format) documents; size restrictions may, however, apply and the data authors are requested to use file formats with manageable file sizes. Scientific publications may be deposited for public view alongside sequence data as long as no copyright laws or legal limitations in distribution or dissemination are violated: it is the responsibility of the depositor/author to obtain such permission where needed. We are also willing to hyperlink to relevant external files provided that these are maintained by – or otherwise under the control of – the sequence author in question and that a reasonable permanency can be guaranteed (Ducut et al., 2008). All such links are checked for validity every 6 months.

New sequence submission and maintenance procedure, as well as improved INSD connectivity

The sequence-deposition process in UNITE has been reworked and now features a log-in system through which the user can deposit and annotate even large sets of sequences. Information can be updated by the user (sequence author) through the log-in system; any such change will take effect immediately. Users are not allowed to modify the records of other sequence authors, but a Wikipedia-style system for commenting on individual sequences is under development. There is a batch-submission system for environmental sequences, and a software package to examine fungal ITS sequences for the presence of chimeric elements is in the final stages of development. All sequence authors are encouraged also to submit their sequences to the INSD; UNITE is now an INSD LinkOut provider, such that all sequences in INSD that are also present in UNITE (possibly with richer annotation) are hyperlinked there. UNITE similarly offers the possibility to link entries to the INSD.
UNITE exchanges data with the INSD on a trimestrial basis and keeps a local copy of all fungal ITS sequences in INSD (approximately 135 000 sequences belonging to some 13 500 fully identified species as of July 2009). The fully identified sequences from INSD can be included in, or excluded from, sequence queries in UNITE. Similarly, environmental/unidentified sequences from UNITE and INSD can be excluded from searches.

Usage statistics as a window on the road ahead

UNITE (Kõljalg et al., 2005) has been cited about a hundred times since publication, with 2008 being the year with the highest number of citations (30 in total). The studies citing UNITE cover all five continents. The proportion of users from fields other than mycorrhizal and systematic mycology is growing. This increases the pressure on UNITE to provide information that is clear, accurate and up-to-date; the information should ideally be presented with a general scientific, rather than a strictly mycological, target audience in mind. It is equally clear that in future many users will not turn to UNITE with one or a handful of sequences for query and analysis, but with hundreds or thousands. This trend is taken to its extreme by the 454 pyrosequencing platform, whose voluminous output forms a challenge to any database effort (Buée et al., 2009; Jumpponen & Jones, 2009; Öpik et al., 2009). We do not currently envision UNITE as a full solution for newly generated, unprocessed raw sequence data from 454-based projects, but we will seek to make it a swift and useful resource for analysing pre-processed, clustered 454 data sets. As a first step we are preparing a new batch BLAST search function for joint analysis of multiple query sequences. A second, and more challenging, step is to employ phylogenetic analysis in the batch-mode identification process, but the details remain to be formalized here.

The pursuit of mycological knowledge is a global scientific enterprise.
UNITE collaborates with the Fungal Environmental Sampling and Informatics Network (FESIN) (Bruns et al., 2008; Horton et al., 2009) to establish guidelines and standards for how environmental samples of fungi should be processed and analysed. Much will be gained in terms of time and resources if software and infrastructural development can be co-ordinated. Furthermore, the geographical coverage of UNITE and FESIN together is considerable and should lead to a significant leap in the number of reference sequences in UNITE over the next few years. The challenges remain substantial, however, and we welcome assistance and collaboration to further the underlying objectives and help to bridge the gap between mycology and other disciplines.

We invite any researcher or research group with data or resources relevant to reliable molecular identification of fungi to either deposit their data in UNITE or to contact the UNITE team for further discussions. We similarly invite anyone with a set of fungal sequences in need of taxonomic assignment to consider the sequence-processing environment of UNITE and to make any information that would cast further light on data already residing in the database available to the scientific community. We have secured at least basic funding for UNITE for the foreseeable future, and we intend the database to be a permanent resource for the scientific community.

Supporting Information

NPH_3160_sm_legend to f1.doc (19.5 KB): Supporting info item
NPH_3160_sm_fig1.pdf (1.5 MB): Supporting info item

Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.

References

Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. 2009. GenBank. Nucleic Acids Research 37: D26–D31.
Bidartondo MI. 2008. Preserving accuracy in GenBank. Science 319: 1616.
Bruns TD, Arnold AE, Hughes KW. 2008. Fungal networks made of humans: UNITE, FESIN, and frontiers in fungal ecology. New Phytologist 177: 586–588.
Buée M, Reich M, Murat C, Morin E, Nilsson RH, Uroz S, Martin F. 2009. 454 pyrosequencing analyses of forest soils reveal an unexpected high fungal diversity. New Phytologist 184: 449–456.
Costello MJ. 2009. Motivating online publication of data. BioScience 59: 418–427.
Ducut E, Liu F, Fontelo P. 2008. An update on Uniform Resource Locator (URL) decay in MEDLINE abstracts and measures for its mitigation. BMC Medical Informatics and Decision Making 8: 23.
Hansen L, Knudsen H, eds. 1997. Nordic Macromycetes vol. 3 – Heterobasidioid, aphyllophoroid and gastromycetoid basidiomycetes. Copenhagen, Denmark: Nordsvamp.
Hibbett DS, Ohman A, Kirk PM. 2009. Fungal ecology catches fire. New Phytologist 184: 279–282.
Horton TR, Arnold AE, Bruns TD. 2009. FESIN workshops at ESA – the mycelial network grows. Mycorrhiza 19: 283–285.
Jumpponen A, Jones KL. 2009. Massively parallel 454-sequencing indicates hyperdiverse fungal communities in temperate Quercus macrocarpa phyllosphere. New Phytologist 184: 438–448.
Knudsen H, Vesterholt J, eds. 2008. Funga Nordica. Agaricoid, boletoid and cyphelloid genera. Copenhagen, Denmark: Nordsvamp.
Kõljalg U, Larsson K-H, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E et al. 2005. UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytologist 166: 1063–1068.
Lindahl BD, Ihrmark K, Boberg J, Trumbore SE, Hogberg P, Stenlid J, Finlay RD. 2007. Spatial separation of litter decomposition and mycorrhizal nitrogen uptake in a boreal forest. New Phytologist 173: 611–620.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen ZT et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–380.
Nilsson RH, Ryberg M, Kristiansson E, Abarenkov K, Larsson K-H, Kõljalg U. 2006. Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective. PLoS ONE 1: e59.
Öpik M, Metsis M, Daniell TJ, Zobel M, Moora M. 2009. Large-scale parallel 454 sequencing reveals host ecological group specificity of arbuscular mycorrhizal fungi in a boreonemoral forest. New Phytologist 184: 424–437.
Peay KG, Kennedy PG, Bruns TD. 2008. Fungal community ecology: a hybrid beast with a molecular master. BioScience 58: 799–810.
Pyle RL. 2004. Taxonomer: a relational data model for managing information relevant to taxonomic research. Phyloinformatics 1: 1–54.
Ryberg M, Kristiansson E, Sjökvist E, Nilsson RH. 2009. An outlook on the fungal ITS sequences in GenBank and the introduction of a web-based tool for the exploration of fungal diversity. New Phytologist 181: 471–477.
Shendure J, Ji H. 2008. Next-generation DNA sequencing. Nature Biotechnology 26: 1135–1145.
Taylor AFS. 2008. Recent advances in our understanding of fungal ecology. Coolia 51: 197–212.
Tedersoo L, May TW, Smith ME. 2010. Ectomycorrhizal lifestyle in fungi: global diversity, distribution and evolution of phylogenetic lineages. Mycorrhiza, in press.
