Article Open access Peer-reviewed

Artificial intelligence takes center stage: exploring the capabilities and implications of ChatGPT and other AI‐assisted technologies in scientific research and education

2023; Wiley; Volume: 101; Issue: 10; Language: English

10.1111/imcb.12689

ISSN

1440-1711

Authors

Jessica G. Borger, Ashley P. Ng, Holly Anderton, George W. Ashdown, Megan Auld, Marnie E. Blewitt, Daniel V. Brown, Melissa Call, Peter Collins, Saskia Freytag, Leonard C. Harrison, Eva Hesping, Jaci Hoysted, Anna Johnston, Andrew McInneny, Phil Tang, Lachlan Whitehead, Aaron R. Jex, Shalin H. Naik

Topic(s)

Ethics in Clinical Research

Abstract

The emergence of large language models (LLMs) and assisted artificial intelligence (AI) technologies has revolutionized the way in which we interact with technology. A recent symposium at the Walter and Eliza Hall Institute explored the current practical applications of LLMs in medical research and canvassed the emerging ethical, legal and social implications for the use of AI-assisted technologies in the sciences. This paper provides an overview of the symposium's key themes and discussions, delivered by diverse speakers including early career researchers, group leaders, educators and policy-makers, highlighting the opportunities and challenges that lie ahead for scientific researchers and educators as we continue to explore the potential of this cutting-edge and emerging technology.

The emergence of large language models (LLMs) and assisted artificial intelligence (AI) technologies such as ChatGPT and Bard has revolutionized the way in which we interact with technology. These publicly available LLMs can generate cogent, human-like and human-level responses and have a wide range of potential applications across diverse knowledge areas, including scientific research and education. However, with such advancements comes a new set of ethical, legal and social implications. Medical research and education are no exceptions, and our organizations must contend with new models of governance and responsibility.

A recent ChatGPT symposium at the Walter and Eliza Hall Institute of Medical Research (WEHI)1 explored the current practical applications of LLMs in medical research and canvassed the emerging ethical, legal and social implications for the use of AI-assisted technologies in the sciences. The symposium was led by early career researchers, laboratory heads, educators and policy-makers who engage with ChatGPT in their work, representing the diverse academic landscape within medical research institutes; as experts in their fields, they are learning to navigate the appropriate, efficient and ethical application of AI and LLMs. Together, the speakers sought to provoke discussion among the 500+ in-person and online audience on the use of AI-assisted technologies in scientific research, including its use as an overtly friendly editor of scientific papers and grants, its ability to turn non-coders into bioinformaticians, and its ability to analyze big data at warp speed. In addition to the emergence of AI-driven tools such as AlphaFold and protein hallucination, the symposium addressed broader societal implications of using AI-assisted technologies in science, including concerns around the ethics, privacy, confidentiality and security of research data and writing entered into the AI-assisted technology ether. This paper provides an overview of the WEHI ChatGPT symposium's key themes and discussions, highlighting the opportunities and challenges that lie ahead for scientific researchers and educators as we continue to explore the potential of this cutting-edge and emerging technology.

“Artificial intelligence” has certainly captured the popular imagination since the launch of ChatGPT, but models of AI have been used widely in scientific and clinical research for some time.2 Models can drive cars, recognize images and even create synthetic data – but ChatGPT is different in several ways. As an LLM, it is immediately accessible, and interacting with the model is as easy as having an online conversation.
ChatGPT is designed to allow users to enter natural language “prompts” that “generate” a human-level response; depending on the nature of the prompt, the output can often surpass the knowledge, expertise and efficiency of the human entering it. This is relevant to many “human”-driven tasks, from basic editing and distillation of a topic to complex analysis and collation of dispersed information.

Crossing the threshold from science fiction into reality has required significant technological and financial investment in the development and training of neural network-based models by well-resourced, AI technology-focused companies and collaborations. GPT-4, for example, required significant investment from Microsoft to enable the AI startup OpenAI to develop and evolve the model to its current form. This highlights the scale of “training” required to produce such a predictive text-generating model and the important logistical considerations required before public release, including the ethical, legal and social implications of how such models interact with society. OpenAI has made a splash in the AI language model space before. The earlier GPT-2 LLM – a text-completion neural network launched by OpenAI in 2019 – had a “limited” release amid concerns that a full version might be used to create fake news articles or for other nefarious purposes.3 The full 1.5-billion-parameter network was released soon after, when these fears proved to rest on an overestimation of the network's performance: its output did not traverse the “uncanny valley” to human-like responses (Figure 1). However, the leap from GPT-2 to the subsequent iterations GPT-3.5 and, most recently, GPT-4 was stark. It is interesting that, while OpenAI has given access to the full model, they have built in several safeguards to mitigate its malicious use, as anyone who has used it has almost certainly come across the phrase “As an AI language model, I cannot…”.

Artificial intelligence tools designed for academic researchers have seen significant growth and adoption in recent years. These tools are created to address various challenges faced by researchers, helping them to streamline their work, improve efficiency and enhance the quality of their research. There now exists an expansive toolbox for academic researchers, supporting writing with inbuilt reference managers, image and video analysis, and AI-assisted survey and experimental design platforms, as well as plagiarism detectors in education (Table 1). These AI technologies and related tools have revolutionized the academic landscape, making it easier for researchers to tackle complex challenges and to focus more on the creative aspects of their work. While AI can offer tremendous benefits, it is essential for researchers to understand the limitations and potential biases of these tools to ensure the reliability and validity of their research findings.
Table 1. AI-assisted and related tools for academic researchers: Zotero, Mendeley, EndNote, ChatPDF, Scholarcy, Explainpaper, IBM SPSS, R, Pandas, NumPy, Google Translate, NLTK, spaCy, OpenCV, TensorFlow, DALL-E 2, Craiyon, Microsoft Teams, Slack, Google Workspace, Elicit, Qualtrics, SurveyMonkey, Semantic Scholar, Iris.ai, ResearchRabbit, R Discovery, Tableau, Power BI, Turnitin, iThenticate, Copyscape.

The WEHI ChatGPT symposium identified two broad, overlapping themes to be considered when adopting ChatGPT and related LLM and AI-based tools in scientific and medical research: (1) the wide range of applications for AI in scientific research, communication and education, and its potential to improve as well as to confuse; and (2) the implications, current challenges and potential future developments in the application of AI to broader academic domains, including ethics, law, security and intelligence, analyzed in the context of what it will mean to be a scientist in the future. We will now discuss these themes, with consideration of how these LLM and AI-based tools will further impact science and academic domains as they become integrated into widely used word-processing, spreadsheet and multimedia software. The following discussions are intended to provoke thought and invite discussion among readers on these broad themes, which, as scientists, educators and ethicists, we are navigating with limited but growing understanding and experience. Expert reviews on these themes can be found in Table 2.

Large language models can be applied across a breadth of scientific and medical research, not limited to work performed at the bench, including the potential for AI-assisted technologies to bridge language barriers in science. Indeed, there is exciting potential for AI-assisted technologies to improve accessibility and to facilitate collaboration between non-native English-speaking scientists from different parts of the world. However, real concerns have been identified around bias and accuracy in text generation, particularly in the context of scientific research, where accuracy and objectivity are paramount.

Scientific literature is vital in advancing knowledge, but poor readability often poses a significant challenge. This issue goes beyond the use of technical jargon and incorrect English syntax. Common barriers to comprehension in scientific writing include excessive passive voice, long and convoluted sentences and unnecessarily complex language. Poorly written articles can hinder effective communication and impede dissemination of scientific findings within the scientific community and beyond. Additionally, in the increasingly competitive landscape of scientific funding, it is essential to convey the quality of research, its significance and a sense of excitement, while remaining accessible. In this regard, ChatGPT has emerged as an effective writing assistant. ChatGPT can clarify convoluted text and ambiguous statements that may hinder a reader's understanding. It can also identify and simplify jargon and complex terminology, making scientific writing more accessible to non-experts. This is particularly beneficial in grant writing, where reviewers making career-defining decisions may lack subject matter expertise. However, the pitfalls of generative AI can undermine ChatGPT's performance in simplifying and summarizing scientific writing. The output from a generative model is statistically reconstructed language and does not guarantee scientific coherence.
While an LLM can generate summaries encompassing research questions, main findings, methodology, results and implications, these summaries may be unreliable and inaccurate. Thus, an expert human author must critically assess AI-generated text through careful fact-checking and cross-referencing. Even with experts able to identify inaccuracies within their domain, the confidence with which generative AI can produce misinformation poses a real and present risk. Additionally, the model's access to information is limited by its training cutoff date, excluding more up-to-date information unless newer applications that link the LLM to the internet are used. Nonetheless, by appropriately leveraging ChatGPT's capabilities, scientists can optimize their time and expertise, allowing them to focus on improving the scientific quality of their work while using AI to help enhance the readability and accessibility of their writing.

As the world and scientific research increasingly move towards globalization and international collaboration, the importance of proficient English language skills for biomedical researchers cannot be overstated. Non-native English speakers may face challenges with limited vocabulary, grammar rules and cultural nuances, creating a sense of separation among colleagues or collaborators. In this context, ChatGPT has emerged as an empowering and inclusive tool. LLMs can accurately translate complex sentences across multiple languages, including those with different alphabets (Figure 2). This helps to break down barriers between English as the primary scientific language and the thousands of languages spoken by scientists around the world, thus advancing science and its collaborations and networks globally. Excellent communication is an important attribute of a successful scientist, and the scientific community understands that individuals with English as a second language are disadvantaged. Communication between scientists, the foundation of a productive work environment and effective collaborations, can be compromised by the broad and potentially unfamiliar written communication styles of diverse cultures, as experienced in different email practices. Here, LLMs can help to bridge the communication gap. AI-aided translation of emails to colleagues or collaborators can improve context and vastly shorten the time needed to formulate complex matters that are, in comparison, more easily expressed by native English speakers (Figure 3). No longer restricted only to basic translation, ChatGPT can serve as a personal assistant, teacher and translator, all in one platform. Specialized text-to-speech AI software can even help non-native English speakers to improve their pronunciation, while applications such as Kickresume aid in resume writing. ChatGPT is a valuable resource for non-native English speakers for content-based questions, allowing them to retrieve information about specific biotechnological techniques, such as CRISPR, in their native language. Additional prompts can be used to request corresponding references to ensure accuracy. ChatGPT also excels as a proofreading tool. With Microsoft planning to integrate GPT-4 as a Copilot assistant, and Google intending to extend the Bard LLM into the Google Workspace suite of productivity tools, LLMs will inevitably help streamline many day-to-day tasks.
These integrations will facilitate AI assistance in activities such as drafting emails, summarizing discussions, creating presentations and interpreting spreadsheet content, such that non-native English speakers stand to benefit greatly from the incorporation of these AI tools. Although LLMs have limitations – indeed, translated text requires verification – effective communication is one of a professional's most powerful skills, and leveraging AI to improve individual communication skills will foster confidence and inclusion in the scientific community.

Supervisors of graduate students often provide multiple rounds of feedback as students construct their theses. In addition to reviewing scientific content, correcting language structure and grammar can form a large part of thesis revision. While students may also have access to writing courses and professional copyediting services that proofread their thesis, such services are not universally accessible or cost-effective. AI typing assistants, such as Grammarly, are already used with little controversy to provide real-time writing feedback on spelling, grammar, punctuation and clarity, suggesting replacements for identified errors. With GPT-4 soon to become part of the Microsoft 365 suite, ChatGPT will become just another icon, located next to Grammarly, to provide writing assistance. In principle, this should reduce the editing load for students and supervisors. Much less time will be spent removing commas and breaking up paragraphs, and more time talking about what is important – the science. Yet it raises broader concerns within the education sector regarding the originality of content produced by the student. Many supervisors and educators are worried that students will lose the critical skills of academic writing and editing that we are so passionate about developing in our students. However, we need to reflect on our own writing – with red-underlined typos and blue-underlined grammatical errors currently highlighted in Word and PowerPoint documents, we don't simply accept them; we review and correct them as appropriate. A thesis that has had any form of proofreading, from a supervisor, a professional copyeditor or ChatGPT, will still need to be revised by the student themselves. Supervisors and educators should acknowledge that, if used appropriately, students will still learn the same writing and editing skills they traditionally have, just differently, and that is OK. As LLMs are currently unable to undertake logical or critical thinking in any true sense, their use here is limited to writing assistance, not as a search engine that automates the critical skills required to produce exceptional researchers.

A WEHI researcher asked ChatGPT to organize a conference for an Australian Medical Research Consortium whose members and interests are readily accessible from websites and publications. To enhance accuracy and richness, the initial prompt included key details such as title/theme, purpose, topics, target audience and attendee numbers, location, duration and total budget. The aim was to see whether ChatGPT could produce more than just a generic program covering organization and content. Unfortunately, the result was rather disappointing. Firstly, of the ten scientists named as members of the organizing committee or as speakers, eight had no connection to the consortium or to the theme. Even for the two relevant individuals, their affiliations were incorrect.
None of the individuals named on the organizing committee were known consortium members. The selection of relevant contributors was not improved by regenerating the request with additional scientific keywords or even suggested speakers; in fact, apart from copying the suggested speakers, the accuracy declined. Secondly, the scientific content did not align well with the current or future themes, which would have been easily traceable. Presentation titles were too broad, e.g. ‘Multiomics in research’, ‘Controversies’, ‘Future Directions’. Finally, although ChatGPT provided a generic program structure with sessions for keynote speakers, oral and poster presentations, workshops and a suggested budget breakdown, the closest it got to specific and accurate content was to suggest conference venues and registration websites. At this time, although ChatGPT can provide a useful checklist for organizing a conference, it lacks the creative ability to go beyond this and to craft meaningful scientific content.

As the size and complexity of datasets in biomedical science have grown over time, the demand for programming skills has often outpaced the availability of such expertise, frequently leading to a bottleneck in experimental iteration. Analyzing large datasets using programming languages can be daunting for bench scientists. However, relying on professional bioinformaticians for such analyses can introduce delays and communication challenges. Tools such as ChatGPT, explicitly trained in part on programming code, can empower bench scientists to analyze their own datasets. A non-coding user can describe their inputs and desired outputs in natural language and use LLMs to generate the code required for their bioinformatic analysis (an illustrative sketch of such code follows below). The improvement from GPT-3.5 to GPT-4 was evident in the reduced number of prompts required to refine the execution of a basic task, with GPT-4 needing only a quarter of the prompts compared with GPT-3.5. Indeed, in a recent study, 97.3% of bioinformatics tasks were solved in seven or fewer prompts. Despite these excellent results, the study also points out that relying on AI-generated programming code can lead to erroneous outputs in the absence of code comprehension.8 The huge potential of ChatGPT for bioinformatics education was highlighted by prompting the model to explain the reasoning behind the selection of specific functions, which resulted in a detailed description of the underlying algorithm. Furthermore, ChatGPT can be employed to summarize and interpret code produced by a bioinformatician, providing a plain-language interpretation and summary for the bench scientist, including an assessment of the script's limitations. Further prompts can enhance the script, making it more comprehensible and reusable for other applications. However, to examine the true impact that AI has on learning bioinformatics, controlled experiments in a classroom setting should be conducted.9 A final consideration is how adopting AI models might impact bioinformatics as a profession. We expect an immediate surge in the productivity of computational laboratories as AI models automate code prototyping. In the medium term, tools such as the ChatGPT Code Interpreter could democratize bioinformatics by enabling direct dataset analysis via natural language using an integrated Python interpreter. The required skill set will likely shift from writing syntactically correct code to better comprehension and testing.
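To make this concrete, the sketch below illustrates the style of script an LLM typically returns when a bench scientist asks, in plain language, to rank the most changed genes between two conditions in a count table. This is our illustrative example rather than code presented at the symposium; the file name, sample names and pseudocount are assumptions, and the simple counts-per-million comparison is no substitute for a proper statistical test, underscoring why comprehension and testing remain essential.

# Illustrative sketch only: LLM-style answer to a plain-language request to
# "rank the most changed genes between control and treated samples".
# File name, column names and pseudocount are assumed for the example.
import pandas as pd
import numpy as np

counts = pd.read_csv("gene_counts.csv", index_col=0)  # genes as rows, samples as columns
control_cols = ["ctrl_1", "ctrl_2", "ctrl_3"]         # assumed sample names
treated_cols = ["treat_1", "treat_2", "treat_3"]

# normalize each sample to counts per million (CPM)
cpm = counts / counts.sum(axis=0) * 1e6

# mean expression per group, with a pseudocount of 1 to avoid division by zero
ctrl_mean = cpm[control_cols].mean(axis=1) + 1
treat_mean = cpm[treated_cols].mean(axis=1) + 1

# log2 fold change of treated relative to control; report the ten largest changes
log2_fc = np.log2(treat_mean / ctrl_mean)
print(log2_fc.sort_values(key=abs, ascending=False).head(10))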
In the long term, we envision professional bioinformaticians being liberated from routine analyses, allowing them to concentrate on more bespoke and creative tasks. This shift, in turn, may favor individuals with stronger logical, mathematical and creative skills over those focused on raw output.

The rise in the availability of large-scale imaging data through multi-channel, multi-dimensional, long-term and live-cell microscopy is providing information-rich data that require bespoke pipelines to extract meaningful results. To cope with the challenge of quantifying these imaging data, non-coding users have a few options: (1) collaborate with researchers who do code (ideally an imaging specialist); (2) try to work with previously published pipelines; or (3) rely on the proprietary software included with some imaging platforms, which is limited in its application and cannot be modified. Previously published pipelines often require workarounds and cannot always be applied directly to in-house datasets, even those generated with the same techniques in different laboratories. These challenges are clearly visible when developing an open-source pipeline for detecting and counting cells using training data from different research groups around the world to increase the robustness of the model. New imaging methods such as lattice light-sheet microscopy are now offering plug-and-play long-term (hours to days), video-rate, 3D, live-cell datasets. This has huge potential for answering complex biological questions for any researcher with access to a system. However, as with many imaging modalities, data handling and analysis is an ongoing challenge or even an afterthought. As such, ChatGPT's coding capability offers an exciting opportunity to accelerate users in the rapid and efficient generation of bespoke coding packages. An example presented by imaging specialists at WEHI asked whether ChatGPT could be used to assist in generating a framework, or at least in laying the coding foundations, for an imaging data analysis workflow. The AI-generated code (in this case Python) could import relevant libraries, load the imaging data, segment regions of interest using thresholding methods and attempt to quantify the data over time (Figure 4). The results could then be plotted accordingly. However, when running the image analysis code on real lattice light-sheet data, it quickly became apparent that there are significant limitations that LLMs may struggle to overcome when developing novel coding approaches (Figure 4). For example, when prompted to track segmented cells over time, ChatGPT's coding attempt could only plot direct paths between all detected cells. While we can appreciate that cells do not move in perfectly straight lines, this may not be apparent to the LLM. Mathematical errors were also common, so any steps requiring quantification required manual checking. It was also common, upon being asked to fix coding errors, for ChatGPT to offer two or three potentially non-functional alternatives, requiring an experienced coder's attention to resolve. While ChatGPT can provide a starting basis for bespoke analytic pipelines, its performance is still only as good as the researcher's own knowledge, the clarity of the question asked and, most importantly, the user's ability to fact-check and verify the code and the output. Despite the limitations of GPT-3.5, GPT-4 offers major advances, allowing researchers to generate bespoke code with little or no coding background.
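For illustration, the sketch below shows the kind of thresholding-based segmentation and per-frame quantification that such prompts typically yield. It is our illustrative example using scikit-image rather than the code shown in Figure 4; the file name and parameter choices are assumptions, and, as noted above, threshold selection, object filtering and any frame-to-frame tracking all still require expert verification.

# Illustrative sketch only: LLM-style segmentation and quantification of a
# time-lapse image stack. File name, stack layout and thresholds are assumed.
import numpy as np
from skimage import io, filters, measure, morphology

stack = io.imread("timelapse.tif")  # assumed shape: (time, y, x)

cell_counts, mean_areas = [], []
for frame in stack:
    threshold = filters.threshold_otsu(frame)  # per-frame Otsu threshold
    mask = morphology.remove_small_objects(frame > threshold, min_size=50)
    labels = measure.label(mask)                # connected-component labelling
    regions = measure.regionprops(labels)
    cell_counts.append(len(regions))
    mean_areas.append(np.mean([r.area for r in regions]) if regions else 0)

# report object count and mean object area per time point
for t, (n, a) in enumerate(zip(cell_counts, mean_areas)):
    print(f"frame {t}: {n} objects, mean area {a:.1f} px")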
ChatGPT (GPT-4) is already being incorporated into other pipelines; Omega (https://github.com/royerlab/napari-chatgpt) takes image processing and analysis code from ChatGPT and, through napari, attempts to fix bugs and errors in real time. Thus, using LLMs as part of an analytic ecosystem could be a significant area of growth in the future. For those of us who do code, ChatGPT is becoming an invaluable assistant when writing our own code, acting as an alternative to StackOverflow for finding fixes to errors.

The emerging concepts within Theme 1 demonstrate that ChatGPT can have real-world applications, but its effectiveness should be tempered by the user's expertise. In assisting with writing, it is crucial for users to guide the AI with specific prompts and to understand and edit the output appropriately. In many instances, experts may find it easier to write the text themselves and to use AI to handle mundane but time-consuming tasks, such as copyediting. The advantages of this are evident, particularly for users for whom English is a second language. Opinions vary among academics as to how useful ChatGPT is for coding, which largely depends on the user's area of expertise. ChatGPT can output compelling-looking code; however, if the user is unable to understand each step of the code to check for valid treatment of the data, the use of this tool is risky. For a wet lab scientist with a good understanding of the source data and the ability to assess the output of an analysis, ChatGPT can be extremely useful for routine coding tasks. For expert coders, or in instances where large chunks of code are generated for multistep tasks, ChatGPT significantly slows down the process by introducing errors that require extensive debugging. Used judiciously and for discrete, well-defined and well-understood tasks, ChatGPT can speed up data processing and be enabling for scientists with little coding experience.

The second theme of the symposium focused on the ethical, legal and social implications of using machine learning and LLMs in our day-to-day academic work. The discussion revolved around not only what we can do but what we should do. It raised questions about the impact that AI assistance may have on the role of scientists and educators and what intellectual contribution truly means in our work. The discussions explored what might be gained by using these AI platforms versus what could be lost or put at risk. These considerations have profound ethical, legal and social implications, particularly given that the scientists and students working with generative AI will have varying levels of understanding, experience and awareness of its shortcomings. As AI continues to shape the research landscape, it is essential that we continuously assess the broader impact of this transformative technology and actively strive to use it responsibly in research and education.

ChatGPT and other emerging machine learning technologies provide a significant opportunity for researchers. However, to ensure that our use of these technologies has us working towards discoveries that promote human flourishing, AI needs to be aligned with these goals. The inherent difficulty of specifying the full range of desired and undesired behaviors has sparked a new field called alignment research. This research is at the core of ensuring that the risks AI poses for humanity are kept to a minimum. AI can harm humans in two ways: by influencing humans to commit unethical behavior, or by enabling unethical behavior.
To navigate the corrupting effects of AI, researchers have suggested applying behavioral ethics, a field of social scientific research based on empirical observations. Adopting such an approach will provide better foundations for designing evidence-based, effective policies. However, this research is currently undertaken primarily by the companies that design and sell AI; as a research community, we must insist that this changes. An excellent review by Köbis et al.,7 evaluating the available research from behavioral science, human–computer interaction and AI research, provides a novel evidence-based discussion (Table 2).

Whilst there is a consensus that AI tools such as ChatGPT will increase efficiency and allow scientists to focus their enquiring minds on discovery, there are specific concerns of which to be mindful. The content shared in queries or prompts can become part of the data pool that companies such as OpenAI use to train their generative models. As users, we permit that use when we sign up. There are currently no restrictions placed on the future use of content entered as text prompts into any LLM platform. This raises the potential risk that, once shared on these platforms, the information is considered a public disclosure and is no longer confidential. Companies such as Samsung have banned the use of ChatGPT after engineers used it to fix proprietary code, inadvertently disclosing a valuable trade secret.10 Following this incident, Walmart, Amazon and Microsoft were among many companies that implemented similar bans, at least until guidelines are developed.11 Many Australian universities have also established guidelines or policies around the use of generative AI software for teaching, learning and assessment. They all remind their students of the value of academic integrity, but they are also strong advocates for the (careful) adoption of AI tools. Other concerns relate to posting personal data onto the platform, including information that could constitute a data breach and contravene privacy or medical laws. According to the OpenAI ChatGPT privacy policy, a user intending to post personal data is bound by the General Data Protection Regulation and must include a Data Processing Addendum with the AI provider.

Questions have been raised about copyright ownership and infringement of moral rights in the use of LLMs and other AI-based tools. Copyright subsists in a work that has been generated by a human and must have a human author. Since ChatGPT is an AI language model, it is not capable of creating content on its own but rather responds to user prompts. Copyright subsists in PhD or Master's theses or any scientific articles that may become part of the ChatGPT data pool

Reference(s)