Open science: Considerations and issues for TESOL research
2024; Wiley; Volume: 58; Issue: 1; Language: English
DOI: 10.1002/tesq.3304
ISSN: 1545-7249
Authors: Ali H. Al‐Hoorie, Carlo Cinaglia, Phil Hiver, Amanda Huensch, Daniel R. Isbell, Constant Leung, Ekaterina Sudina,
Topic(s): Online Learning and Analytics
Abstract: Open science (OS; also known as “open research” and “open scholarship”) refers to various practices to make scientific knowledge openly available, accessible, and reusable. The core purpose of such practices is to open the process of scientific knowledge creation, evaluation, and communication to societal actors within and beyond the traditional scientific community (UNESCO, 2021). Looking across the different areas within TESOL and applied linguistics more broadly, it is clear that OS practices have become more common on the part of individual researchers and journals. For example, while Marsden, Morgan-Short, Thompson, and Abugaber (2018) and Marsden, Morgan-Short, Trofimovich, and Ellis (2018) noted generally low prevalence of replication studies, the number of replications identified from 2010 to 2015 (n = 20) was larger than all located in the period of 1973–1999 (n = 17). Open data and materials have become more common, too, as seen in the widespread use of the Instruments and Data for Research in Language Studies (IRIS) database (iris-database.org). OS badges now frequently adorn articles in several journals, and journals such as Language Learning have been recognized for adopting a range of support for OS practices, as seen in TOP (Transparency and Openness Promotion) Factor scores (https://topfactor.org, based on the TOP Guidelines, TOP Guidelines Committee, 2015). Given this momentum, we feel that it is time for TESOL researchers to seriously consider the benefits, and potential challenges, of more active, consistent engagement in OS practices. In this article, we focus on four aspects of OS: transparency, preregistration, data and participant protection, and open access. We begin by discussing the issue of transparency, exploring initiatives within and in related areas of our field, examining the benefits of enhancing transparency and challenges underlying certain practices.
Next, we explore affordances, challenges, and common perceptions surrounding the practices of preregistration and open data sharing, emphasizing that these are not all-or-nothing endeavors but instead can consist of specific decisions made by researchers depending on their particular situation. Following this, we consider barriers to accessibility of scholarship and discuss grassroots initiatives supporting open access research as a goal underlying OS. Finally, we reflect on potential (mis)conceptions of OS in TESOL, and rather than calling for a universal standard for OS practices in our field, we encourage individual scholars to consider employing contextually appropriate OS practices in their work. Overall, we agree with Liu's (2023) observation that the rich interdisciplinarity and methodological diversity within the field of TESOL in particular and applied linguistics overall have great potential to support “an inclusive understanding of open science” (p. 6). At the forefront of OS topics gaining attention in the field are initiatives to enhance transparency. Here we briefly explore several questions around transparency in TESOL-related research: What is transparency? Why is transparency necessary? What are the main barriers to achieving transparency? What are the costs of limiting transparency? Methodological transparency involves making all aspects of the research process fully available to its academic, institutional, and public stakeholders, from the conceptual framing of the research to the design, materials, data collection, analytic methods, and reporting and dissemination. Transparency is intended “to improve the accessibility, visibility, rigour, scrutiny, reproducibility, replicability, and systematicity of research” (Marsden, 2020, p. 26), and it is increasingly seen as coinciding with research quality.
Indeed, study design and methods “demand a kind of professional scrutiny that goes directly to the core of what we do and what we know and what we can tell our publics that we know” (Byrnes, 2013, p. 825). One of the core principles of methodological transparency is that the validity of a research claim depends not on the reputation of those making the claim, the venue in which the claim is disseminated, or the novelty of the findings, but rather on the empirical evidence supported by the underlying materials, data, and methods, all of which are made accessible to fellow researchers and the wider public (Nosek et al., 2015). There is an increasing array of empirically demonstrated benefits of methodological transparency (e.g., Gennetian, Tamis-LeMonda, & Frank, 2020; Miguel, 2021; Schroeder, Gaeta, El Amin, Chow, & Borders, 2023). Transparency creates a more complete understanding of the connected network of materials underlying our research and empirical claims. Peer reviewers benefit when they are better able to critically evaluate research. Journals benefit when they have more accurate information about the research that underlies their publications. Professional associations benefit from more complete access to research that is the basis for conclusions made available to practitioners, fellow researchers, and policymakers. In this way, transparency builds confidence in the knowledge we share and apply in practice. Transparency also promotes exchange and collaboration with underrepresented stakeholders, thus enhancing the diversity and equity of our work. Transparency increases the impact and widens the reach of our work as we benefit through increased credibility or understanding of our results, and the potential for more reuse, citation, and broader recognition of our outputs. Overall, we can see that transparency is an important investment in our field's future and advances our ethical imperative in generating knowledge (Ortega, 2005). 
There are many initiatives for increased transparency both within our field and beyond. For example, IRIS, the open repository for multilingual research instruments and materials, aims to promote the sharing and reuse of research data and materials and make study methods more transparent, down to the individual questions and stimuli used in studies (Marsden, Mackey, & Plonsky, 2016; Marsden, Thompson, & Plonsky, 2017). Open Accessible Summaries in Language Studies (OASIS) and TESOLgraphics, for their part, share the common aim to enhance accessibility of research findings to practitioners and to the wider public. These forms of sharing support access which can help address issues related to trust and research integrity. Contributor Role Taxonomy (CRediT) statements (Brand, Allen, Altman, Hlava, & Scott, 2015) are a tool used to provide a transparent and nuanced understanding of authors' diverse responsibilities and contributions to research projects. TOP Guidelines provide transparency standards in multiple domains of research (e.g., study design, materials, data analysis, and methods) that are used by journals and scholarly associations to model, endorse, and reward practices for making transparent disclosure the default in research publication (see https://topfactor.org). On this view, it is interesting to note that Language Learning has a TOP Factor of 11, Applied Psycholinguistics has a score of 8, The Modern Language Journal has a score of 2, while TESOL Quarterly has a score of 0. A salient question here is: Can we achieve transparency without mandating these procedures? In an effort to promote transparency, TESOL Quarterly will soon formally recognize the adoption of OS practices, including sharing research instruments and materials, making data publicly available, and preregistering research plans. 
While these practices will not be a requirement for publishing with TESOL Quarterly, they are intended to encourage authors to adopt OS practices where appropriate and increase the transparency of their work. It should come as no surprise that there are other, seemingly less effortful, ways of achieving transparency in the life cycle of research by relying on robust reporting practices. Included here are things such as conflict of interest disclosures, reflexivity and/or positionality statements, and the many considerations and disclosures that are part of research projects beginning with designing a study, gaining ethical approval, choosing and implementing materials and data elicitation instruments, collecting and analyzing data, through to reporting on the results. Readers of research no doubt expect to find many of these details in published manuscripts. However, limits to researchers' available time and effort and inherent space restrictions in such venues may lead to certain omissions and compromises in transparency. Reviews of TESOL-related research in certain domains indeed show evidence of low uptake of such reporting practices (Isbell & Kim, 2023). In this regard, Marsden (2020) cautions that relying primarily on robust reporting challenges methodological transparency because “the inevitable lack of standardisation and the organic nature of reporting standards” (p. 21) leads many to craft a good/sanitized methodological story rather than to tell the whole/real story no matter how imperfect. Seen in this light, achieving methodological transparency in our “everyday” practice should not be taken for granted. Methodological transparency can manifest in many ways, and because not all OS practices are transposable across all research paradigms, the different designs, methods, and epistemologies across TESOL-related research are accompanied by different challenges for achieving methodological transparency (Marsden & Morgan-Short, 2023). 
Chief among the challenges to increasing methodological transparency are systemic barriers including “inertia and the comfortable embrace of the status quo” (Center for Open Science, 2015). At a field-wide level, for instance, there are formal incentives and a publication infrastructure that make transparency not the obvious default. Given this reality, Marsden (2020, p. 25) argues that there is a collective need for stronger “unified directives and incentives from professional associations, promotion systems, funders, and journals” to encourage, incentivize, and support the field in making our research transparent by default. For individuals or research teams, there are other barriers to making our research transparent by default. These include concerns about ethical protections and confidentiality for research participants and their data (which we examine at length below), the desire to maintain a perceived competitive advantage, or intellectual property concerns for certain materials. There are also legitimate technical and resource limits with certain methodologies and with large datasets, and a lack of widespread training in methods or practices that enhance transparency in research (e.g., Liu & de Cat, in press). Unfortunately, limiting transparency can have serious costs and downstream consequences. As argued by many methodologists with expertise in comprehensive reviews, it decreases efficiency within thematic domains of research, lowers the rate of independent verification and replication, and hinders the comparability of results across studies and within meta-analyses. Limiting transparency also fosters unproductive competition between researchers and precludes science from being self-correcting. Perhaps most importantly, it erects barriers to trustworthiness of research findings and undercuts equity and inclusivity in knowledge creation and sharing (Marsden & Morgan-Short, 2023). 
Clearly one size does not fit all, and there are risks to any new initiatives for transparency in TESOL-related research. One is setting the bar for transparency so high that it alienates researchers and limits engagement in such practices (see Chiware & Skelly, 2023; Liu, 2023; Steinhardt, Mauermeister, & Schmidt, 2023). Another is setting the bar so low that it does not have a meaningful impact. As researchers in TESOL, we are not expected to be all-knowing as we conduct our research. We are not expected to ask all the right questions, make flawless decisions, or use the most advanced and sophisticated methods that exist. We are, however, expected to be honest and transparent in our research about what we did and why. Methodological transparency, then, is the minimal expectation for any research (Center for Open Science, 2015) and is critical to the trust that TESOL stakeholders and institutions place in our research and the evidence it provides for policy and practice. With this, we turn to preregistration—a significant practice that can promote methodological transparency. Preregistration is the OS practice of placing a time-stamped research plan onto a public repository before data collection has occurred. We will now look at the aims, benefits, and potential criticisms of preregistration. In addition, the differences between general preregistration and a specific type of preregistration called a registered report are highlighted. The purpose of preregistration is to increase transparency by providing a public record of the methodological decisions and analyses that were preplanned by researchers and differentiating those from exploratory investigations and unplanned decisions. This is particularly desirable for confirmatory, quantitative research studies (Nosek & Lakens, 2014); however, preregistration has also been argued to be useful for qualitative research studies (Haven & van Grootel, 2019).
Part of the logic behind preregistration is that by creating a public record of the methodological decisions that were made in advance of data collection and analysis, we can more likely identify questionable research practices such as HARKing (Hypothesizing After Results are Known; Kerr, 1998) or p-hacking (Simonsohn, Nelson, & Simmons, 2014, p. 534, “[attempting] multiple analyses to obtain statistical significance”), which may threaten the replicability and reproducibility of our work. With a publicly available record, readers can compare the documented plans included in preregistrations with the procedures and analyses reported in published studies. All researchers are familiar with preregistration at some level given the research proposals we likely had to create for our theses/dissertations during our graduate education or if we have submitted a grant proposal that required information on study design and methodology. The main difference for preregistration is that this information is placed in a public repository, potentially embargoed until publication. The depth or amount of detail included in a preregistration can range from relatively skeletal to a fully-fledged literature review and methods section write-up. The most important aspect of any preregistration is that it includes enough detail to document the critical decisions made in advance of the study being conducted. Multiple resources and examples exist to help guide researchers new to the process. For instance, the Center for Open Science has a host of information related to study preregistration (https://www.cos.io/initiatives/prereg) and also provides a free template and space for researchers to preregister their work (for an example, see https://osf.io/w4gj2). Simmons, Nelson, and Simonsohn (2021b) included a useful discussion of “good vs. 
bad” preregistration information, the American Psychological Association (APA) provides a template,1 and the Penn Wharton Credibility Lab provides free housing of preregistrations at https://aspredicted.org/. Some of the main critiques of preregistration appear to stem from misunderstandings regarding the practice (for a more detailed discussion, see Huensch, in press). Here we focus on three common criticisms/misunderstandings: preregistration stifles creativity, preregistration is onerous, and preregistration reduces productivity (see Pham & Oh, 2021a, 2021b; Simmons et al., 2021b; Simmons, Nelson, & Simonsohn, 2021a). The first criticism claims preregistration stifles creativity because once the research plan is placed in a repository, authors cannot modify their plan or conduct additional, exploratory analyses. This is most definitely a misunderstanding, and a very important point bears repeating: Preregistering a study does not mean that the author is required to follow all plans and cannot conduct unplanned analyses. Rather, preregistering a study means that if an author does modify a plan or conduct exploratory analyses, the former should be justified, and the latter should be labeled as such. The second two criticisms are related to each other in that they both critique preregistration for being time-consuming or effortful. Simmons et al. (2021b) responded to the “preregistration is onerous for authors” claim by countering that if it were that overly time-consuming, over 20,000 authors would not have preregistered their studies on the site AsPredicted.org without a requirement to do so. Preregistration might also be considered onerous for journal reviewers if there is an expectation that reviewers take on the additional burden of comparing submitted manuscripts to preregistration plans. 
Whether preregistration reduces productivity is an interesting empirical question, and some authors (e.g., Wagenmakers & Vazire, 2020) have even indicated that this might be a welcome change. One specific type of preregistration, registered reports, deserves special mention. Registered reports reflect a modified approach to the typical publication process in that they incorporate a peer review stage prior to data collection (see e.g., Marsden, Morgan-Short, Thompson, & Abugaber, 2018; Marsden, Morgan-Short, Trofimovich, & Ellis, 2018). Registered reports have been suggested as a potential solution to reducing publication bias and shifting incentives for both authors and reviewers (Nosek, 2020). In the registered report process, instead of submitting a full manuscript to a journal for review, authors submit everything up until the results section (i.e., introduction, literature review, and method/proposed analysis) for review before any data have been collected. This represents the most involved type of preregistration in that a partial manuscript has been written and reviewed before data collection begins. A successful outcome of the review process for a registered report would be an “in-principle accept,” meaning that if the method is followed and/or any deviations are justified, the manuscript will be published by the journal regardless of the results. This means that during the initial review stage, reviewers are asked to evaluate whether the research questions are interesting and relevant and whether the proposed method and analysis are rigorous and of high quality without being influenced by the results. Several journals in fields related to TESOL (e.g., Bilingualism: Language and Cognition, Language Learning, and Language and Speech) have adopted this practice, including the Journal of Child Language which requires an additional preliminary step in which authors write a brief letter of intent to the editors before submitting the registered report. 
Returning to the question of time/effort, it should be acknowledged that the registered report process involves some level of uncertainty. For example, planning for data collection might be difficult as it is unknown when the preliminary review process will be completed. Despite this, it could be argued that registered reports simply shift the timing of the stages of the research process without increasing the overall time/effort necessary (see e.g., the timelines presented in Huensch, in press). Regardless of format, there are multiple potential benefits to preregistering a study, including increased transparency, decreasing (unwitting) questionable research practices, demonstrating confidence in one's work, and having a written record of the critical study design decisions made in advance of data collection. Given the newness of this practice in our field, how we, as TESOL researchers, choose to adopt study preregistration, evaluate its effectiveness, and train future scholars (see Hui, Koh, & Ogawa, 2023) is something to focus upon in the coming months and years. Open data sharing is at the heart of the OS movement because it is conceptualized as a practice to increase robustness and reproducibility of research findings, thereby helping alleviate irreplicability across research fields through facilitated reanalysis and increased transparency (Hicks, 2023). Although regarded as a meritorious practice, it should not be taken lightly. 
Common issues with data sharing include variability in (a) open data standards (e.g., FAIR; https://www.go-fair.org/fair-principles/; Wilkinson et al., 2016), which are often unique to specific fields and may be difficult to implement without some degree of familiarity with advanced methodology, (b) data sharing journal policies (e.g., making open data sharing mandatory may lead to sharing highly sensitive information that should otherwise be protected), and (c) the organization of data files and explanation of the shared materials, which, if done casually, can make it difficult for others to use them (Hicks, 2023; Isbell, in press). Critically, the most challenging aspect of open data sharing concerns the protection of data and participants. Indeed, in our digital age, data are “the new oil” (Gstrein & Beaulieu, 2022, p. 2), but “privacy is a universal human right” (p. 10). It is worth being reminded that “the most valuable data are often the most sensitive” (Dennis et al., 2019, p. 1839). No one would like their personal data to be leaked and found on thousands of dark websites. For that reason, steps should be taken to protect participants' privacy and confidentiality to eliminate “the possibility of tracing, linking or deducing individuals from the data” (White, Blok, & Calhoun, 2022, p. 280). Although fully anonymized data (i.e., without personal identifiers) can arguably be shared without participants' consent, at least in some fields, “large datasets with multiple variables cannot truly be anonymized” (p. 280); the same holds for studies with small sample sizes (Maritsch et al., 2022). Additionally, reidentification is always a possibility with de-identified data (i.e., data without any personal identifiers in the dataset itself, but with a separate key that makes it possible to reidentify study participants). Furthermore, openly sharing sensitive data can lead to identifying “specific, nonconsenting individuals or communities” (Zipper et al., 2019, p. 5203).
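The distinction between anonymized and de-identified data can be made concrete: de-identification removes direct identifiers from the shared file but retains a separate, securely stored reidentification key. The sketch below illustrates this in Python with hypothetical field names; it is only an illustration, and indirect identifiers (e.g., rare L1s or institutional affiliations) would still need separate handling.

```python
import secrets

def pseudonymize(rows, id_field="name"):
    """De-identify tabular data by swapping a direct identifier
    for a random pseudonym.

    Returns (deidentified_rows, key): the key maps pseudonyms back
    to the original identifiers and, if retained at all, must be
    stored separately from the shared dataset, never published
    alongside it.
    """
    key = {}
    deidentified = []
    for row in rows:
        pseudonym = "P" + secrets.token_hex(4)  # e.g., "P3f9a1c20"
        key[pseudonym] = row[id_field]
        clean = dict(row)            # copy, so originals stay untouched
        clean[id_field] = pseudonym
        deidentified.append(clean)
    return deidentified, key

# Hypothetical example data
rows = [{"name": "Alice", "score": 87}, {"name": "Bilal", "score": 74}]
shared, key = pseudonymize(rows)
# `shared` carries pseudonyms only; reidentification requires `key`.
```

Destroying the key moves the dataset toward full anonymization, though, as noted above, rich multivariate data may still permit reidentification through the remaining variables.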
To illustrate, Maritsch et al. (2022) analyzed a total of 530 studies in clinical research and found that 13% thereof inadvertently put participants at risk. To reduce the possibility of reidentification, the authors of the at-risk studies were then asked to make changes in their publications (e.g., remove direct and indirect identifiers and revise their data sharing plans). Although TESOL and applied linguistics data more broadly are arguably rarely as sensitive as those in clinical trials, openly sharing data, if done without due caution, may violate the privacy of participants and other members of their respective communities. To help protect participants' data, at a minimum, the following steps should be taken: (a) conducting OS training sessions for data holders (Hicks, 2023; see also Hui & Huntley, in press, for a multi-tiered approach to incorporating OS training in applied linguistics graduate programs), (b) publishing field-specific decision trees on sharing research data (e.g., Zipper et al., 2019, for water science research), (c) submitting data management plans with the manuscripts (e.g., “a written data privacy and security statement as part of the submission process”; Zipper et al., 2019, p. 5207), (d) discussing with participants how their data will be shared and ensuring that they can make an informed decision even if the purpose of the study is unknown (Dennis et al., 2019), (e) using controlled or managed access for sensitive data that can be reidentified (Dyke et al., 2015; Maritsch et al., 2022), and (f) sharing simulated data if the original data cannot be shared (In'nami, Mizumoto, Plonsky, & Koizumi, 2022). The latter is a viable option if the data are being collected from vulnerable groups (e.g., minorities and children). Crucially, researchers should prioritize the well-being of their participants and have “cultural understanding of and sensitivity toward communities” (Zipper et al., 2019, p. 5203).
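Sharing simulated rather than raw data can be sketched briefly. The example below (hypothetical variables and values, not from any real study) draws simulated scores from a multivariate normal distribution matching the original sample's means and covariances; this is only one simple way to preserve the statistical structure while ensuring no row corresponds to a real participant (see In'nami et al., 2022, for fuller discussion).

```python
import numpy as np

def simulate_like(data, n=None, seed=0):
    """Generate simulated data matching the original variables'
    means and covariance structure via a multivariate normal draw.
    The simulated rows describe no real participant, so they can
    be shared when the originals cannot.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return rng.multivariate_normal(mean, cov, size=n or len(data))

# Hypothetical columns: vocabulary test score, writing rating
original = np.array([[52, 3.5], [61, 4.0], [47, 2.5], [70, 4.5], [58, 3.0]])
simulated = simulate_like(original)  # shareable stand-in for `original`
```

For bounded scales (e.g., a 1–5 writing rating), a normal draw can produce out-of-range values, so simulated scores may need clipping or a different generative model.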
Some communities (e.g., Indigenous peoples, refugees, and undocumented immigrants) may not allow open data sharing, and their decisions must be respected. Ethically sharing participants' data can be even more challenging in the case of qualitative data because such data are hard to anonymize (Dennis et al., 2019). This is especially true for video data. To illustrate, in sign language acquisition, participants' hands, body posture, and facial expressions should all be visible; blurring participants' faces would protect their privacy but inevitably lead to linguistic data loss (Kolbe, 2022). In situations like this, one might wonder what can be done to support OS when research material cannot be made anonymous, as in the NaKom DGS-Test project, a sign language study involving children that, to comply with European data protection laws, was more in line with “‘locked’ science, rather than open science” (Kolbe, 2022, p. 118). According to Kolbe (2022), alternative solutions for open data sharing and protection exist, even though their implementation and data preparation would require a considerable amount of time on the part of the researchers. For example, one can use technology to transcribe and notate sign language data; another solution is to blur participants' faces and use annotations to provide missing facial data; more user-friendly options include using avatar software or incorporating shadow signing, whereby “another person is filmed copying the communication of a signer” (p. 119). The latter can be made possible with the help of a trained research assistant, which has direct implications for TESOL research in such sensitive contexts. Overall, it is always advisable to pursue data-appropriate anonymization techniques. Linguistic data loss may be inevitable, but losing some aspects may be less consequential for research.
For instance, typing up handwritten essays, rather than sharing scans or pictures, would in most cases constitute minimal loss of information in research on L2 writing. Yet another challenge of open data sharing involves protecting the data and privacy of participants who self-disclose. The reasons for participants' self-disclosure range from being proud of their contribution to a research study to being willing to inspire others, which can be particularly encouraged by funding agencies (McKibbin, Malin, & Clayton, 2021). Nonetheless, public self-disclosure increases the risk of reidentification—a highly undesirable outcome that would harm researchers' and their institutions' reputations and may ultimately undermine individuals' trust in research (McKibbin et al., 2021). To minimize the risks of participants' self-disclosure, researchers should inform participants of the consequences thereof and consider implementing data use agreements (e.g., under which data users are only allowed to use the data to answer their proposed research questions but not to reidentify the participants) and imposing “penalties to users who re-identify or disclose research data” (McKibbin et al., 2021, p. 10). In addition to the above courses of action, a somewhat unorthodox solution to data and participant protection is to reconsider the issue of data ownership (Dennis et al., 2019; White et al., 2022). Typically, participants' data belong to researchers' institutions, journals, or funding agencies. However, OS-minded researchers might consider redelegating this ownership to participants themselves—a principle echoing Indigenous data sovereignty (Zipper et al., 2019).
This could be done by creating a repository of research data that would be populated by participants themselves and could be used for various research projects; “the researchers would be purchasing the right to analyze data, not the data themselves,” and participants would have more control over their personal data, which could arguably make them more invested in the research process (Dennis et al., 2019, p. 1842). Ultimately, data and participant protection is the researchers' ethical and professional responsibility. Given that participants predominantly share their data for the greater good (White et al., 2022) and that there are several examples of egregious misuse of open data in other disciplines (as noted by Dennis et al., 2019), researchers in TESOL and elsewhere in the language sciences are obliged to take the matter of open data sharing (even more) seriously and consult data sharing regulations and privacy experts prior to making their datasets publicly available. Open access is concerned with the availability of the final product of research (e.g., the journal article) so that interested scholars can read it without having to pay subscription fees. Without open access, a significant proportion of academics (in addition to practitioners, policymakers, and the broader public), especially in the Global South, are excluded from reading the latest literature, let alone contributing to it, due to their (and their institutions') inability to afford subscription fees. This status quo aggravates the already existing North–South inequality in academic communities. There are some very good reasons why we should take open access seriously. Some publishers charge fees in excess of US $35 for a single PDF, sometimes just to rent it for 24 hours. If we assume that the average journal article is 15 pages, and if we were then to apply this rate to books, a 300-page book would cost $700.
Such overpriced subscription fees apply no matter how old the article is, or how short it is, even if it is