Article · Open access · Peer-reviewed

Topic modeling for untargeted substructure exploration in metabolomics

2016; National Academy of Sciences; Volume: 113; Issue: 48; Language: English

10.1073/pnas.1608041113

ISSN

1091-6490

Authors

Justin J. J. van der Hooft, Joe Wandy, Michael P. Barrett, Karl Burgess, Simon Rogers

Topic(s)

Fermentation and Sensory Analysis

Abstract

Significance

Tandem MS is a technique for compound identification in untargeted metabolomics experiments. Because reference spectra are lacking, most molecules cannot be identified, and many spectra go unused. We present MS2LDA, an unsupervised method inspired by text mining that extracts common patterns of mass fragments and neutral losses (Mass2Motifs) from collections of fragmentation spectra. Structurally characterized Mass2Motifs can be used to annotate molecules for which no reference spectra exist and to expose biochemical relationships between molecules. Applied to four beer extracts without training data, 30 structurally characterized Mass2Motifs let us annotate approximately three times as many molecules as library matching did. These Mass2Motifs were validated against reference spectra from Global Natural Products Social Molecular Networking (GNPS) and MassBank.
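The abstract describes MS2LDA only at a high level. As a rough illustration of the text-mining analogy, the sketch treats each fragmentation spectrum as a "document" whose "words" are binned fragment m/z values and neutral losses (precursor m/z minus fragment m/z), then fits latent Dirichlet allocation (LDA) so that each topic plays the role of a Mass2Motif. This is a minimal sketch under stated assumptions, not the authors' MS2LDA code: the bin width, the intensity-to-count scaling, the hyperparameters `alpha` and `beta`, the helper names `spectrum_to_words` and `lda_gibbs`, and the toy spectra are all hypothetical, and collapsed Gibbs sampling is used simply as one standard way to fit LDA.

```python
import random
from collections import Counter

# Hypothetical bin width (Da); the ".2f" word labels below assume 0.01.
BIN_WIDTH = 0.01


def spectrum_to_words(precursor_mz, peaks, bin_width=BIN_WIDTH):
    """Turn one MS/MS spectrum into a bag of fragment and neutral-loss words.

    peaks is a list of (fragment_mz, relative_intensity) pairs, intensity 0-100.
    """
    words = Counter()
    for mz, intensity in peaks:
        # Hypothetical scaling: a peak at intensity 100 contributes 10 tokens.
        count = max(1, round(intensity / 10))
        frag = round(mz / bin_width) * bin_width
        loss = round((precursor_mz - mz) / bin_width) * bin_width
        words[f"frag_{frag:.2f}"] += count
        if loss > 0:
            words[f"loss_{loss:.2f}"] += count
    return words


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (motifs, doc_topic_counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    n_vocab = len(vocab)
    # Expand each bag of words into an explicit list of tokens.
    tokens = [[w for w, c in d.items() for _ in range(c)] for d in docs]
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, toks in enumerate(tokens):
        z_d = []
        for w in toks:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, toks in enumerate(tokens):
            for i, w in enumerate(toks):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-topic word distributions play the role of motifs.
    motifs = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + n_vocab * beta) for w in vocab}
        for k in topics
    ]
    return motifs, doc_topic


# Hypothetical toy spectra: (precursor m/z, [(fragment m/z, intensity), ...]).
# The first two share neutral losses; the last two share fragments.
spectra = [
    (181.07, [(163.06, 100), (145.05, 50)]),
    (195.09, [(177.08, 100), (159.07, 40)]),
    (250.10, [(85.03, 100), (57.03, 60)]),
    (264.12, [(85.03, 100), (57.03, 70)]),
]
docs = [spectrum_to_words(p, peaks) for p, peaks in spectra]
motifs, doc_topic = lda_gibbs(docs, n_topics=2)
for k, motif in enumerate(motifs):
    top = sorted(motif, key=motif.get, reverse=True)[:3]
    print(f"motif {k}: {top}")
```

On real data one would fit many more topics over thousands of spectra and then inspect which motifs correspond to known substructures; this two-topic toy run only shows the mechanics of turning spectra into words and sharing them across documents.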
