Topic modeling for untargeted substructure exploration in metabolomics
2016; National Academy of Sciences; Volume: 113; Issue: 48; Language: English
DOI: 10.1073/pnas.1608041113
ISSN: 1091-6490
Authors: Justin J. J. van der Hooft, Joe Wandy, Michael P. Barrett, Karl Burgess, Simon Rogers
Topic(s): Fermentation and Sensory Analysis
Abstract (Significance): Tandem MS is a technique for compound identification in untargeted metabolomics experiments. Because of a lack of reference spectra, most molecules cannot be identified, and many spectra cannot be used. We present MS2LDA, an unsupervised method (inspired by text mining) that extracts common patterns of mass fragments and neutral losses, called Mass2Motifs, from collections of fragmentation spectra. Structurally characterized Mass2Motifs can be used to annotate molecules for which no reference spectra exist and to expose biochemical relationships between molecules. For four beer extracts, without training data, we show that, with 30 structurally characterized Mass2Motifs, we can annotate approximately three times as many molecules as with library matching. These Mass2Motifs were validated in reference spectra from Global Natural Products Social Molecular Networking (GNPS) and MassBank.
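The text-mining analogy in the abstract can be made concrete with a small sketch: each fragmentation spectrum is treated as a "document" whose "words" are its mass fragments and neutral losses (precursor m/z minus fragment m/z). The function name `spectrum_to_document`, the word-naming scheme, the rounding to two decimals, and the example m/z values below are all illustrative assumptions, not the preprocessing actually used by MS2LDA.

```python
from collections import Counter


def spectrum_to_document(precursor_mz, peaks, decimals=2):
    """Turn one fragmentation spectrum into a bag of 'words'.

    precursor_mz: m/z of the fragmented precursor ion.
    peaks: iterable of (fragment_mz, intensity) pairs.
    decimals: rounding used to merge nearly identical masses
              (an assumed, simplistic binning scheme).
    """
    doc = Counter()
    for mz, intensity in peaks:
        if intensity <= 0:
            continue  # ignore empty peaks
        # Fragment word: the rounded fragment m/z.
        doc[f"fragment_{round(mz, decimals):.{decimals}f}"] += intensity
        # Neutral-loss word: mass difference between precursor and fragment.
        loss = precursor_mz - mz
        if loss > 0:
            doc[f"loss_{round(loss, decimals):.{decimals}f}"] += intensity
    return doc


# Two made-up spectra that share a neutral loss of 18.00.
doc_a = spectrum_to_document(180.0, [(162.0, 100.0), (60.0, 50.0)])
doc_b = spectrum_to_document(250.0, [(232.0, 80.0)])
shared_words = sorted(set(doc_a) & set(doc_b))
print(shared_words)
```

Once spectra are in this form, a topic model such as latent Dirichlet allocation could be fitted to the resulting word counts, so that words which co-occur across many spectra (like the shared loss above) group into candidate motifs.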