Letter Open access Peer reviewed

The matrix revolutions: How databases and database linkages will transform epidemiologic research

2023; Wiley; Volume: 37; Issue: 4 Language: English

10.1111/ppe.12974

ISSN

1365-3016

Authors

Neda Razaz, Sid John, K.S. Joseph

Topic(s)

Food Security and Health in Diverse Populations

Abstract

Over three decades ago, Arnold Relman, the then Editor of the N Engl J Med, predicted a 'third revolution in medical care'.1 Relman's thesis was that a focus on the quality of services and cost control would lead to dramatic improvements in health care. Such 'assessment and accountability' would be made possible by 'linking medical management decisions to new, systematic information about outcomes of treatment' using computerised health information systems.1 Although Relman's prediction has not come to pass, digital information systems that capture vast amounts of medical and related data are now on the verge of revolutionising the work of perinatal and other epidemiologists. This issue of Paediatric and Perinatal Epidemiology includes two articles that highlight recent developments in health information systems: Suárez-Idueta et al.2 present the results of linking live birth and death registration data (for infants and children <5 years of age) in Mexico, while Johansson et al.3 discuss avant-garde population-based database linkages that bring together detailed clinical and related information on pregnant women in the Stockholm-Gotland cohort. 
The linkage of 24 million live births to deaths in infancy and early childhood represents a significant epidemiologic success for Mexico, despite challenges related to the underestimation of deaths, and unlinked deaths.2 A similar attempt in Canada in the 1990s resulted in a 25% rate of unlinked infant deaths in Ontario, the largest province in Canada.4 Several investigations aimed at identifying the cause for the missing birth registrations in the vital statistics data proved unsuccessful, and ultimately led to a completely revamped perinatal program in Ontario, and live birth-infant death linkages using an alternative hospitalisation database.5
The previous development and recent consolidation of detailed database linkages in the Stockholm-Gotland cohort are highly inspirational even against the Swedish backdrop, where longstanding deterministic linkages between databases (based on unique personal identifiers) have already yielded numerous epidemiologic insights. The availability of longitudinal clinical, laboratory and other information on the mother, fetus and infant will provide boundless opportunities for groundbreaking epidemiologic research.
The explosion in detailed health information within databases and through database linkages has led to many different disciplines entering a field that has been, until recently, the almost exclusive prerogative of epidemiologists. Undoubtedly, the involvement of students of health informatics and data science (among others) in health research will facilitate and enrich medical research. The increasing use of machine learning techniques, and artificial intelligence, more generally, will provide a substantial impetus to diagnostic and non-causal prognostic research both by epidemiologists and other professionals. The benefits of an influx of new researchers into the health research arena notwithstanding, inter-disciplinary semantic and methodologic differences could pose communication and other challenges.
As members of the discipline traditionally involved with analysing and interpreting health data using non-experimental designs, epidemiologists have a duty to help manage this expansion in professional diversity. At the very least, epidemiologists have an obligation to introduce newcomers to the lessons learned and caveats derived from decades of working with non-experimental data. One crucial general lesson is the importance of integrating substantive (clinical) understanding and methodologic principles, while arguably the most outstanding particular caveat is the problem of confounding by indication. This latter phenomenon haunts non-experimental studies of therapeutic efficacy when the outcome of interest is an intended effect.6, 7 It is generally accepted that the efficacy of therapy, whether drug, surgery, or other intervention, is best assessed through randomised trials because confounding by indication in non-experimental studies is generally considered intractable. Higher maternal mortality rates following caesarean delivery (as compared with vaginal delivery) epitomise this issue. The Sixth Report of the Confidential Enquiry into Maternal and Child Health in the United Kingdom8 highlighted this problem of confounding by indication by stating that 'it is almost impossible to disentangle the consequences of a caesarean section from the indication for the operation'. 
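A small numerical sketch may help newcomers see how confounding by indication produces a crude association of the kind described above. All counts below are invented for illustration and are not taken from any report or database; the strata, the `Cell` class and the function names are assumptions made for this example. The toy data are built so that, within each indication stratum, the risk of death is identical for caesarean and vaginal delivery, yet the crude comparison suggests a more than fourfold excess risk after caesarean delivery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Deaths and deliveries for one stratum-by-mode combination."""
    deaths: int
    deliveries: int

    @property
    def risk(self) -> float:
        return self.deaths / self.deliveries


# Hypothetical counts (invented for illustration only).
# Women with a severe indication are both more likely to have a
# caesarean delivery and more likely to die, whatever the mode.
COUNTS = {
    ("severe indication", "caesarean"): Cell(deaths=40, deliveries=80_000),
    ("severe indication", "vaginal"): Cell(deaths=10, deliveries=20_000),
    ("no severe indication", "caesarean"): Cell(deaths=1, deliveries=20_000),
    ("no severe indication", "vaginal"): Cell(deaths=9, deliveries=180_000),
}


def pooled(mode: str) -> Cell:
    """Collapse over indication strata, as a crude analysis would."""
    cells = [cell for (_, m), cell in COUNTS.items() if m == mode]
    return Cell(
        deaths=sum(c.deaths for c in cells),
        deliveries=sum(c.deliveries for c in cells),
    )


def risk_ratio(exposed: Cell, unexposed: Cell) -> float:
    return exposed.risk / unexposed.risk


crude_rr = risk_ratio(pooled("caesarean"), pooled("vaginal"))

strata = sorted({stratum for stratum, _ in COUNTS})
stratum_rr = {
    s: risk_ratio(COUNTS[(s, "caesarean")], COUNTS[(s, "vaginal")])
    for s in strata
}

print(f"Crude risk ratio: {crude_rr:.2f}")
for s, rr in stratum_rr.items():
    print(f"Risk ratio within '{s}': {rr:.2f}")
```

The stratified analysis recovers the null only because the toy example assumes that the indication is fully and accurately recorded. In real non-experimental data the indication is seldom captured that well, which is why the problem is generally regarded as intractable outside randomised trials.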
The bias of confounding by indication is also evident in the crude positive associations between admission to intensive care units (compared with admission to general wards) and death; between people with hypertension on medication (compared with normotensive people and even those with hypertension not on medication) and stroke; and between hospital births (compared with home births) and perinatal death.6, 7
One of the more exciting developments associated with databases and linked database research is the ability to examine clinically and socially relevant determinants and outcomes that may be otherwise difficult to study. Rare outcomes and outcomes that require short- or long-term follow-up can be more easily studied by accessing large population databases or by linking different databases. Successive pregnancy and sibling studies are a particular genre of perinatal research that exemplifies the utility of such database linkages.9, 10 The population-based nature of some databases can address problems that may arise due to more selective inclusion into a study population, and the large study sizes, which typify many such databases, are also a clear benefit. Access to information on a given determinant, confounder or outcome that may be more accurate or complete is another potential advantage of linked database research. Figure 1A shows how linking the birth database to subsequent hospitalisation and outpatient-visit databases in Sweden provides a more complete picture of congenital malformations. Figure 1B shows the frequency of cyanotic congenital heart disease infant deaths in the United States, with more deaths among infants whose heart defect was not diagnosed at birth than among infants whose heart defect was diagnosed at birth. These figures provide a sense of the misclassification that occurs in studies that rely solely on birth data for ascertaining congenital anomaly status.
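The idea behind such a linkage can be sketched in a few lines of code. This is a minimal illustration that assumes a shared personal identifier; the field names (`person_id`, `malformation_at_birth`, `malformation`) and all records are hypothetical and do not reflect the schema or contents of any actual register. It shows deterministic linkage of birth records to later care records, the resulting gain in ascertainment of malformations, and how later records without a matching birth record surface as unlinked.

```python
# Hypothetical birth records (invented for illustration only).
births = [
    {"person_id": "A01", "malformation_at_birth": True},
    {"person_id": "A02", "malformation_at_birth": False},
    {"person_id": "A03", "malformation_at_birth": False},
    {"person_id": "A04", "malformation_at_birth": True},
    {"person_id": "A05", "malformation_at_birth": False},
]

# Hypothetical later hospitalisation or outpatient records.
later_records = [
    {"person_id": "A02", "malformation": True},   # missed at birth
    {"person_id": "A03", "malformation": False},
    {"person_id": "A04", "malformation": True},   # already known at birth
    {"person_id": "Z99", "malformation": True},   # no matching birth record
]


def link_by_identifier(births, later_records):
    """Deterministic linkage: exact match on the personal identifier."""
    linked = {b["person_id"]: [] for b in births}
    unlinked = []
    for record in later_records:
        if record["person_id"] in linked:
            linked[record["person_id"]].append(record)
        else:
            unlinked.append(record)
    return linked, unlinked


def ascertain(births, linked):
    """Return identifiers flagged at birth and identifiers flagged at any time."""
    at_birth = {b["person_id"] for b in births if b["malformation_at_birth"]}
    later = {
        pid
        for pid, records in linked.items()
        if any(r["malformation"] for r in records)
    }
    return at_birth, at_birth | later


linked, unlinked = link_by_identifier(births, later_records)
at_birth, ever = ascertain(births, linked)
missed_at_birth = ever - at_birth

print(f"Flagged at birth: {sorted(at_birth)}")
print(f"Flagged after linkage: {sorted(ever)}")
print(f"Missed by birth data alone: {sorted(missed_at_birth)}")
print(f"Unlinked later records: {len(unlinked)}")
```

In this toy example the birth data alone capture two of the three children identified after linkage, and one later record cannot be attached to any birth, which is the kind of unlinked record that has to be investigated before linked data are analysed.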
The general problem of misclassification of determinant and/or outcome status is not unique to research using databases and is well understood. Studies using the information on congenital anomalies at birth and the study by Suárez-Idueta et al.2 will yield reasonable and useful estimates of association if misclassification of determinant/outcome status is non-differential (with estimates biased towards the null). On the other hand, the need to obtain accurate quantitative information on confounders represents a more crucial requirement as the distortion in effect estimates given residual confounding is less predictable (see below). Attempts at including data such as those on socioeconomic, environmental and behavioural factors in the Stockholm-Gotland cohort are therefore a prudent step, and database managers elsewhere would be wise to follow suit.
Large databases tend to include some transcription and other related errors, and implausible values for all relevant data elements need to be addressed at the outset. Additionally, the idiosyncrasies of specific databases need to be understood if the results are to be interpreted correctly. Figure 2 presents the frequency distribution of birthweights obtained from birth databases in Sweden and the United States and shows end-digit preference in Sweden and end-digit and ounce preference in the United States. Although the phenomena illustrated in this example are trivial and will not impact the results of most studies, an awareness of such issues is required for the occasional situation when such quirks can lead to problems.
One potential problem with database research arises in connection with attempts to answer causal questions non-experimentally. Whereas research has traditionally required pre-specification of the object of study and ad hoc collection of data on the determinant, confounders and outcome status, database studies use data collected for a purpose not directly related to the object of any particular research study. Thus, databases may contain accurate information on the determinant and outcome, and only include poor quality or absent information on key confounders. Unlike non-differential misclassification of determinant/outcome status, which typically has predictable consequences, misclassification of confounder status poses a greater threat to validity. Although recent developments in the area of quantitative bias analysis permit an assessment of threats to validity due to unaddressed confounding, it is debatable whether researchers should attempt to answer causal questions when the database does not contain accurate information on key confounders.
Two other inter-related issues involving database research require mentioning, namely, ethics/confidentiality and data access. Ethics and confidentiality issues, while extremely important, are currently managed variably, with some countries and institutions allowing free access to anonymized data repositories, others providing more restricted access, and the scientific community increasingly advocating for data sharing along with publication. Data access issues have to be resolved by balancing the public good that could follow increased access with the risk of confidentiality being breached because of unconstrained access.
Databases and linked databases (which are literally matrices, hence the title) are ushering in an era where epidemiologists and other professionals will be required to work with vast quantities of detailed clinical and related health data.
Although many routine, big data analysis tasks will likely be automated in the future, the automated algorithms and new studies will require a multi-disciplinary integration of substantive and methodologic inputs. The adage that 'knowledge is power' requires data to be first processed into information and information to be distilled into knowledge.
Neda Razaz is an Assistant Professor in the Clinical Epidemiology Division, at Karolinska Institutet in Stockholm, Sweden. Her research program aims to understand the role of maternal and paternal chronic illness during pregnancy, neurodevelopment, and other long-term outcomes in childhood and early adulthood. Dr Razaz serves as a junior editor of Paediatric and Perinatal Epidemiology.
Sid John is a Staff Researcher in the Department of Obstetrics and Gynaecology at the University of British Columbia, Vancouver, Canada. He has a background in biological systems engineering, and his current research is on gestational diabetes and fetal growth.
K.S. Joseph is a Professor in the Department of Obstetrics and Gynaecology at the University of British Columbia, Vancouver, Canada, and his interests include maternal, fetal and infant health and health services. Dr Joseph serves on the editorial board of Paediatric and Perinatal Epidemiology.

Reference(s)