Article Open access Peer-reviewed

Construction of Atorvastatin Dose–Response Relationships Using Data from a Large Population‐Based DNA Biobank

2007; Wiley; Volume: 100; Issue: 4 Language: English

10.1111/j.1742-7843.2006.00035.x

ISSN

1742-7843

Authors

Peggy Peissig, Ekta Sirohi, Richard L. Berg, Christa Brown‐Switzer, Nader Ghebranious, Catherine A. McCarty, Russell A. Wilke

Topic(s)

Pharmaceutical Economics and Policy

Abstract

Large healthcare databases are increasingly being used for population-based studies of drug efficacy [1]. The Marshfield Clinic Personalized Medicine Research Project (PMRP) in central Wisconsin currently represents one of the largest population-based DNA Biobanks in the world (http://www.mfldclin.edu/pmrp) [2]. The PMRP database was constructed specifically for use in the areas of genetic epidemiology and pharmacogenetics [3]. With more than 19,000 participants, it would be inefficient and cost-prohibitive to manually reconstruct complete medication exposure histories within the PMRP. Validated electronic text searching algorithms will be required. Natural language processing (NLP) has shown promise in terms of reconstructing accurate medication use histories in the PMRP database [4]. The current study extends this work in the context of characterizing efficacy for a single drug class, the 3-hydroxy-3-methylglutaryl coenzyme A reductase inhibitors (statins).

Methods. This study received prior approval from the Marshfield Clinic Institutional Review Board. The study was conducted in accordance with the basic principles of the Declaration of Helsinki. The design represents a retrospective observational study utilizing a test sample of 100 participants from the Marshfield Clinic PMRP database.

At present, the PMRP database contains coded electronic medical records for more than 19,000 participants. The large majority receive their health care through Marshfield Clinic, a horizontally integrated, multi-specialty group practice located in central Wisconsin. This community exhibits very low in- and out-migration rates. For patients residing in ZIP codes associated with Marshfield Clinic, the electronic medical record captures 95% of all outpatient encounters and 93% of all inpatient encounters.
Because Marshfield Clinic has maintained this highly integrated medical record for over a decade, all clinical data moved into the PMRP database from medical records are amenable to electronic abstraction (i.e. programmable review through the use of text-mining algorithms).

For the current study, a test set of older study persons was selected because they typically have a higher frequency of clinical conditions associated with lipid-lowering therapy [5]. A sample cohort of older people was identified within the PMRP database through an initial data interrogation strategy that used age and diagnostic codes. In order to avoid a potential provider bias against the use of statin-based lipid-lowering therapy in patients with advanced cognitive impairment, PMRP participants with a prior diagnosis of dementia were not included in the current study. The initial electronic search strategy yielded 336 unique PMRP participants aged 80 years and older with no underlying diagnosis of dementia. From this pre-selected group of PMRP participants, a sample of 100 unique persons was selected for further study.

Manual abstraction. Documents for these 100 older participants were reviewed by a licensed practicing physician (C. B.-S.). Records were not abstracted for five of them because either paper charts were not available for confirmation (n = 2) or they had died since the date of their initial enrolment in PMRP (n = 3). For the remaining 95 persons, text documents were manually reviewed for subject identifiers, past medical history, all prescription medication use at the time of most recent clinic visit, and all statin use throughout the duration of their medical record. This required that the reviewer abstract clinical text documents dating back to 1991. Types of documents encompassed a variety of clinical encounters (e.g. primary care office notes, routine physical examinations, cardiology consultations).
In order to manually reconstruct a complete statin use history, the physician reviewer recorded drug name (generic and/or trade name), drug dose and the time period over which the drug had been prescribed. In all cases, there was sufficient information embedded within the clinical text documents to identify specific dates when each statin drug had been started and stopped.

Electronic abstraction. We have previously reported the application of NLP software to the reconstruction of complete retrospective drug use histories for each person participating in the PMRP [4]. This database currently contains more than 19,000 participants. For the 95 persons characterized in the current study, the previously reconstructed ancillary drug use database [4] was re-interrogated to produce 95 unique NLP-based drug use files. Each medication event contained a subject identifier, a clinical visit date (corresponding to the text document), a drug name (as it appeared in the text document), a National Drug Classification code, text surrounding the drug name (flanking text) and drug dose (if available). These data were scanned programmatically to generate a string of event dates (containing drug, dose and flanking text) representing each time the drug name was used within clinical free text documents. This was done for all six available statins, and the process was repeated for each of the 95 unique PMRP participants being characterized in this test set. In contrast to the manual abstraction strategy, no effort was made during the electronic process to determine when a particular statin was started or stopped.

Results. The study group included 66 (69.5%) women and 29 (30.5%) men, with a mean age at the time of PMRP enrolment of 88.1 ± 2.8 years (range, 82 to 95 years). For these 95 participants, the electronic abstraction process yielded a total of 993,633 medication records, covering a time period from April 1994 through February 2005.
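The medication event record described under Electronic abstraction, and the grouping of statin mentions into a dated string of events per participant and drug, can be sketched roughly as follows. This is a minimal illustration, not the study's software: the field names, the `ALIASES` table and the function `statin_timelines` are hypothetical, and the list of six statins is an assumption, since the source does not name them.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MedicationEvent:
    """One drug mention in a clinical text document (hypothetical field names)."""
    subject_id: str
    visit_date: date           # date of the text document
    drug_name: str             # as it appeared in the text
    drug_code: str             # drug classification code attached to the mention
    flanking_text: str         # text surrounding the drug name
    dose_mg: Optional[float]   # None when no dose was captured


# Assumed alias table (not from the source): lower-cased name -> generic statin.
# Only two brand names are listed; a real table would need many more.
ALIASES: Dict[str, str] = {
    "atorvastatin": "atorvastatin", "lipitor": "atorvastatin",
    "simvastatin": "simvastatin", "zocor": "simvastatin",
    "pravastatin": "pravastatin",
    "lovastatin": "lovastatin",
    "fluvastatin": "fluvastatin",
    "rosuvastatin": "rosuvastatin",
}


def statin_timelines(
    events: Iterable[MedicationEvent],
) -> Dict[Tuple[str, str], List[MedicationEvent]]:
    """Group statin mentions by (subject, generic drug), sorted by visit date.

    As in the electronic process described above, no attempt is made to
    infer when a statin was started or stopped.
    """
    timelines: Dict[Tuple[str, str], List[MedicationEvent]] = defaultdict(list)
    for ev in events:
        generic = ALIASES.get(ev.drug_name.lower())
        if generic is None:
            continue  # not a statin mention
        timelines[(ev.subject_id, generic)].append(ev)
    for mentions in timelines.values():
        mentions.sort(key=lambda ev: ev.visit_date)
    return dict(timelines)
```

Keeping the flanking text on every event matters in this design: it is what later allows dose corrections and a review of false-positive mentions without returning to the source documents.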
Electronic abstraction revealed that 23 of the 95 persons had text documents containing the mention of a statin at the time of their most recent office visit; manual abstraction confirmed that 20 out of the 95 were actually using a statin at the time of their most recent visit, a statin-use prevalence of 21.1% (95% confidence interval [CI], 13.4–30.6). Our initial data comparison therefore yielded a sensitivity of 100% (95% CI, 83.2–100), specificity of 96% (95% CI, 88.8–99.2) and precision of 87% (95% CI, 66.4–97.2).

Based on manual abstraction, nearly a third of our 95 participants were found to have taken a statin since this class of drugs became available (table 1). Because many of them were found to have taken more than one statin, the totals in table 1 are not additive. Atorvastatin was the most commonly prescribed statin: based on electronic abstraction, 30 of the 95 participants (31.6%) had at least one text note mentioning atorvastatin, and manual abstraction confirmed that 26 of them had actually used atorvastatin (i.e. four false positives). Simvastatin was the second most commonly prescribed statin: based on electronic abstraction, 9 of the 95 participants (9.5%) had at least one text note mentioning simvastatin, and manual abstraction confirmed that 7 of them had actually used simvastatin (i.e. two false positives).

For purposes of illustrating the types of false-positive electronic data encountered in this study, the latter two cases are discussed further. Upon review of the flanking text associated with each simvastatin text event, we observed one case where brand-name simvastatin had been negated within the text (provider dictated 'instead of Zocor'), and another case where the drug had been mentioned with a qualifier (provider dictated 'not sure if the patient is on Zocor').

Raw drug exposure data are illustrated in fig. 1A.
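The accuracy figures reported above for statin use at the most recent visit follow from a 2×2 comparison of electronic against manual abstraction. The sketch below assumes the counts implied by the text: 23 electronic positives, 20 manually confirmed users and 100% sensitivity give 20 true positives, 3 false positives, 0 false negatives and 72 true negatives. The source does not name its confidence-interval method; the exact binomial (Clopper-Pearson) interval used here is one common choice and is an assumption, as are the function names.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target: float) -> float:
    """Bisection on [0, 1] for f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # p is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Counts inferred from the text (see the assumptions stated above).
tp, fp, fn, tn = 20, 3, 0, 72

sensitivity = tp / (tp + fn)   # confirmed users that were flagged electronically
specificity = tn / (tn + fp)   # non-users that were not flagged
precision = tp / (tp + fp)     # flagged persons who were confirmed users

for name, k, n in [("sensitivity", tp, tp + fn),
                   ("specificity", tn, tn + fp),
                   ("precision", tp, tp + fp)]:
    low, high = clopper_pearson(k, n)
    print(f"{name}: {k}/{n} = {k / n:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Under these assumed counts the point estimates are 20/20, 72/75 and 20/23, which round to the 100%, 96% and 87% quoted in the text.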
Both manual and electronic data are shown as a function of calendar time for a single representative study person exposed to atorvastatin. Circles indicate text mention of atorvastatin, determined electronically. Solid horizontal lines represent manually confirmed atorvastatin use. To increase precision, data points lacking a dose were dropped from the final dataset (fig. 1B). This increased precision to 95%. It should also be noted that fig. 1A and B contain numerous discrepancies between the initial electronically abstracted dose (circles) and the manually abstracted dose (solid lines). In most cases, these discrepancies were attributed to the common practice of pill splitting (i.e. many patients chose to break tablets into halves in order to reduce medication cost). To improve the accuracy of each dose, flanking text was searched for occurrence of a fraction (e.g. using character strings such as 'one-half' or '1/2'). When a fraction was noted, the dose was adjusted accordingly (fig. 1C).

Fig. 1. Atorvastatin exposure for one representative subject. Open circles: electronic extraction. Solid horizontal bars: manual abstraction. (A) Raw data. (B) Data limited to only those electronic records containing a dose. (C) Data adjusted for "pill splitting".

In order to correlate atorvastatin exposure with outcome, all available clinical lipid panels were abstracted electronically for the 26 persons with a confirmed exposure to atorvastatin. Select lipid panels (those obtained 6 weeks or more after drug initiation, and 6 weeks or more after a change in dose) were then linked to atorvastatin dose, and dose–response relationships were constructed for low-density lipoprotein cholesterol (LDL-C) using those study persons exposed to atorvastatin (fig. 2).

Fig. 2. Atorvastatin dose–response relationships (LDL cholesterol).

Discussion.
The current study demonstrates that electronic data abstraction based on NLP can produce high-quality drug exposure histories for the most commonly prescribed class of lipid-lowering medications, the statins. This study also demonstrates that NLP-based exposure data can be further manipulated electronically to produce accurate dosing information. The algorithms used in this study were 100% sensitive and 96% specific, with a final precision of 95%. By electronically abstracting standard clinical lipid panels and linking these data to manually validated drug exposure histories, we also report the efficient construction of atorvastatin dose–response relationships. These relationships trend in the direction anticipated clinically (i.e. dose-dependent decrease in LDL-C). While the observed dose-dependence is consistent in magnitude with findings available in the established literature [6-8], the purpose of the current study was to validate and optimize the process of electronic data abstraction. The resulting algorithms will need to be applied to larger cohorts if they are to prove useful in determining genotype–phenotype association [9]. The application of such algorithms to population-based Biobanks may soon facilitate pharmacogenetic discovery on an unprecedented scale.

This work was funded in part by grants from the Marshfield Clinic Research Foundation (GHE 10,204) and the National Institutes of Health (U01HL069757–06). The authors wish to thank Linda Weis and Alice Stargardt for assistance during the preparation of this manuscript.
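For illustration, the dose clean-up described in the Results (dropping mentions that lack a dose, then adjusting the dose when the flanking text contains a fraction) might look like the sketch below. It is not the study's algorithm: the pattern covers only the two strings the text gives as examples ('one-half' and '1/2'), halving is assumed to be the adjustment for those strings, and the names `HALF_PATTERN` and `adjusted_dose` are hypothetical.

```python
import re
from typing import Optional

# Matches 'one-half' / 'one half', or a bare '1/2' that is not part of a
# longer number or a date such as '1/2/2004'.
HALF_PATTERN = re.compile(r"one[- ]half|(?<![\d/])1/2(?![\d/])", re.IGNORECASE)


def adjusted_dose(dose_mg: Optional[float], flanking_text: str) -> Optional[float]:
    """Return the dose corrected for pill splitting, or None if no dose was captured.

    A None result marks a mention that would be dropped from the final dataset.
    """
    if dose_mg is None:
        return None
    if HALF_PATTERN.search(flanking_text):
        return dose_mg / 2  # assumed: a 'half' cue means half the tablet strength
    return dose_mg


mentions = [
    (20.0, "Lipitor 20 mg, takes one-half tablet daily"),
    (40.0, "atorvastatin 40 mg daily, last seen 1/2/2004"),
    (None, "continues on Lipitor"),
]
cleaned = [d for d in (adjusted_dose(dose, text) for dose, text in mentions)
           if d is not None]
```

A simple pattern like this will miss other phrasings of a split tablet, so any real use would need the cue list validated against manually abstracted doses, as was done for the dataset in this study.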

Reference(s)