Peer-reviewed article

Chemical Name to Structure: OPSIN, an Open Source Solution

2011; American Chemical Society; Volume: 51; Issue: 3; Language: English

10.1021/ci100384d

ISSN

1549-960X

Authors

Daniel M. Lowe, Peter Corbett, Peter Murray‐Rust, Robert C. Glen

Topic(s)

Analytical Chemistry and Chromatography

Abstract

We have produced an open source, freely available algorithm (Open Parser for Systematic IUPAC Nomenclature, OPSIN) that interprets the majority of organic chemical nomenclature quickly and precisely. This has been achieved using an approach based on a regular grammar. The grammar guides tokenization, a potentially difficult problem in chemical names. From the parsed chemical name, an XML parse tree is constructed and then operated on stepwise until the structure has been reconstructed from the name. Results from OPSIN on various computer-generated name/structure pair sets are presented. These show exceptionally high precision (99.8%+) and, for general organic chemical nomenclature, high recall (98.7–99.2%). This software can serve as the basis for future open source developments of chemical name interpretation.
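
The idea of a regular grammar guiding tokenization can be illustrated with a small sketch. This is not OPSIN's implementation: the lexicon, the token classes, the transition table, and the XML element names below are all hypothetical and cover only a handful of toy alkane names. The sketch treats the grammar as a finite automaton over token classes and only tries lexicon entries whose class the automaton allows next, backtracking when a choice leads to a dead end.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy lexicon (an assumption for illustration, not OPSIN's data).
WORDS = {
    "multiplier": ("di", "tri"),
    "substituent": ("methyl", "ethyl", "chloro"),
    "stem": ("meth", "eth", "prop", "but"),
    "suffix": ("ane", "ene"),
}
LOCANT = re.compile(r"\d+(?:,\d+)*-")  # e.g. "2-" or "1,2-"

# Regular grammar written as a finite automaton: for each token class,
# the classes that may follow it. Only "suffix" may end a name.
NEXT = {
    "start": ("locant", "multiplier", "substituent", "stem"),
    "locant": ("multiplier", "substituent"),
    "multiplier": ("substituent",),
    "substituent": ("locant", "multiplier", "substituent", "stem"),
    "stem": ("suffix",),
    "suffix": (),
}
ACCEPTING = {"suffix"}


def candidates(cls, name, pos):
    """Yield every token of class `cls` that starts at `pos` in `name`."""
    if cls == "locant":
        match = LOCANT.match(name, pos)
        if match:
            yield match.group()
    else:
        for word in WORDS[cls]:
            if name.startswith(word, pos):
                yield word


def tokenize(name, pos=0, state="start"):
    """Return a list of (class, text) tokens, or None if the name is rejected."""
    if pos == len(name):
        return [] if state in ACCEPTING else None
    for cls in NEXT[state]:  # the grammar restricts which classes are tried
        for text in candidates(cls, name, pos):
            rest = tokenize(name, pos + len(text), cls)
            if rest is not None:
                return [(cls, text)] + rest
    return None  # dead end: the caller backtracks to its next candidate


def to_xml(tokens):
    """Build a flat XML parse tree with one element per token."""
    root = ET.Element("molecule")
    for cls, text in tokens:
        ET.SubElement(root, cls).text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for name in ("2-methylpropane", "1,2-dichloroethane", "2-propane"):
        tokens = tokenize(name)
        print(name, "->", to_xml(tokens) if tokens else "rejected")
```

In this toy automaton, `2-propane` is rejected because a locant must be followed by a multiplier or a substituent, so the stem `prop` is never tried at that position. A real interpreter would go on to transform the tree stepwise into a structure, which this sketch does not attempt.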
