Peer-reviewed article

Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients

2020; Elsevier BV; Volume: 144; Language: English

10.1016/j.ejca.2020.11.030

ISSN

1879-0852

Authors

Nuria Ribelles, José M. Jerez, Pablo Rodríguez-Brazzarola, Begoña Jiménez, Tamara Díaz-Redondo, Héctor Mesa, Antonia Márquez, Alfonso Sánchez‐Muñoz, Bella Pajares, F. Carabantes, María José Bermejo, Ester Martín-Villar, María Emilia Domínguez-Recio, E Sáez, Laura Gálvez, Ana Godoy-Ortiz, Leonardo Franco, Sofía Ruiz-Medina, Irene López, Emilio Alba

Topic(s)

HER2/EGFR in Cancer Research

Abstract

Background: CDK4/6 inhibitors plus endocrine therapies are the current standard of care in the first-line treatment of HR+/HER2-negative metastatic breast cancer, but there are no well-established clinical or molecular predictive factors for patient response. In the era of personalised oncology, new approaches for developing predictive models of response are needed.

Materials and methods: Data derived from the electronic health records (EHRs) of real-world patients with HR+/HER2-negative advanced breast cancer were used to develop predictive models for early and late progression to first-line treatment. Two machine learning approaches were used: a classic approach using a data set of features manually extracted from the reviewed patients' EHRs, and a second approach using natural language processing (NLP) of free-text clinical notes recorded during medical visits.

Results: Of the 610 patients included, 473 (77.5%) progressed to first-line treatment, and 126 (20.6% of all patients) did so within the first 6 months. There were 152 patients (24.9%) who showed no disease progression before 28 months from the onset of first-line treatment. The best predictive model for early progression using the manually extracted data set achieved an area under the curve (AUC) of 0.734 (95% CI 0.687–0.782). Using the NLP free-text processing approach, the best model obtained an AUC of 0.758 (95% CI 0.714–0.800). The best model to predict long responders using manually extracted data obtained an AUC of 0.669 (95% CI 0.608–0.730). With NLP free-text processing, the best model attained an AUC of 0.752 (95% CI 0.705–0.799).

Conclusions: Using machine learning methods, we developed predictive models for early and late progression to first-line treatment of HR+/HER2-negative metastatic breast cancer. The NLP-based machine learning models performed slightly better than the predictive models based on manually obtained data.
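The abstract reports each model's performance as an AUC with a 95% confidence interval but does not say how the intervals were computed. As an illustration only, the sketch below computes an AUC by pairwise comparison of scores and estimates an interval with a percentile bootstrap. The bootstrap choice, the function names `auc` and `bootstrap_ci`, and the toy labels and scores are all assumptions made for this example and are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            # A resample with only one class has no defined AUC; redraw.
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data, purely illustrative: 1 = early progression, 0 = no early progression.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

print(auc(toy_labels, toy_scores))  # 0.75: three of four pairs ranked correctly
print(bootstrap_ci(toy_labels, toy_scores))
```

With only four cases the bootstrap interval is very wide; on a cohort the size of the one described here (610 patients) the same procedure would give a much narrower interval.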
