Article Open access Peer-reviewed

PMC text mining subset in BioC: about three million full-text articles and growing

2019; Oxford University Press; Volume: 35; Issue: 18; Language: English

10.1093/bioinformatics/btz070

ISSN

1367-4811

Authors

Donald C. Comeau, Chih-Hsuan Wei, Rezarta Islamaj, Zhiyong Lu

Topic(s)

Genetics, Bioinformatics, and Biomedical Research

Abstract

Motivation: Interest in text mining full-text biomedical research articles is growing. To facilitate automated processing of nearly 3 million full-text articles (in the PubMed Central® Open Access and Author Manuscript subsets) and to improve interoperability, we convert these articles to BioC, a simple, community-driven data structure, available in either XML or JavaScript Object Notation (JSON) format, for conveniently sharing text and annotations.

Results: The resulting articles can be downloaded via File Transfer Protocol (FTP) for bulk access, or via a Web API for updates or a more focused collection. Since the Web API became available in 2017, our BioC collection has been widely used by the research community.

Availability and implementation: https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/.
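As a rough illustration of the Web API route for a focused collection, the following Python sketch builds a request URL for one article and extracts passage text from a BioC JSON collection. The endpoint pattern in `BASE_URL`, the `unicode`/`ascii` encoding options, the helper names, and the article ID `PMC1234567` are assumptions made for illustration and are not taken from the abstract; the documentation page listed under Availability should be checked before relying on any of them. The sample collection is hand-written to mirror the general BioC layout (documents containing passages) and is not real article data.

```python
import json
from urllib.request import urlopen

# Assumed endpoint pattern; verify it against the documentation page.
BASE_URL = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"


def build_bioc_url(article_id, fmt="json", encoding="unicode"):
    """Build a request URL for one article (hypothetical helper)."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    if encoding not in ("unicode", "ascii"):
        raise ValueError("encoding must be 'unicode' or 'ascii'")
    return f"{BASE_URL}/BioC_{fmt}/{article_id}/{encoding}"


def passage_texts(collection):
    """Return the text of every passage in a BioC JSON collection."""
    return [
        passage.get("text", "")
        for document in collection.get("documents", [])
        for passage in document.get("passages", [])
    ]


def fetch_collection(article_id):
    """Download one article as BioC JSON (needs network access).

    Assumes the response body is a single collection object; adjust
    if the service wraps it differently.
    """
    with urlopen(build_bioc_url(article_id), timeout=30) as response:
        return json.load(response)


# Hand-written sample mirroring the BioC layout; not real article data.
SAMPLE = {
    "documents": [
        {
            "id": "PMC1234567",  # hypothetical article ID
            "passages": [
                {"offset": 0, "infons": {"type": "title"},
                 "text": "A sample title", "annotations": []},
                {"offset": 15, "infons": {"type": "abstract"},
                 "text": "A sample abstract.", "annotations": []},
            ],
        }
    ]
}

if __name__ == "__main__":
    print(build_bioc_url("PMC1234567"))
    print(passage_texts(SAMPLE))
```

Keeping URL construction separate from the download step lets the parsing logic be exercised offline; for large-scale processing, the bulk FTP download mentioned in the abstract is likely the better fit than per-article requests.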
