Article; Open access; Peer reviewed

Rising to Meet the Challenge of Generative AI

2024; Wiley; Volume: 41; Issue: 1; Language: English

10.1111/jlse.12141

ISSN

1744-1722

Author

Inara Scott

Topic(s)

Evolutionary Algorithms and Applications

Abstract

ChatGPT launched in November 2022 and quickly became the fastest-growing user application in history, reaching one hundred million users in two months—a milestone that took TikTok nine months to achieve and Instagram two and a half years.1 That explosive growth has come with an explosion of concern about the ability of scientists and regulators to understand what it is, how it works,2 and its potential to change life as we know it. Politicians and technology executives alike are calling for more and better regulation to ensure apocalyptic scenarios of artificial intelligence (AI)-aided disasters (everything from AI-created weapons to sentient AI systems) do not come to pass.3 Meanwhile, the practice of law is dealing with the implications of a tool that can pass the Bar exam,4 and colleges and universities are grappling with the reality that students can use generative AI to complete just about any assignment they can give.5

This article is not intended to "solve" the problem of generative AI. Rather, recognizing the astonishing pace of development of generative AI tools and their impact on business law and other higher education classes, it seeks to provide specific, concrete steps that faculty can take to evolve alongside these tools. There is no way to "AI-proof" your classes. However, taking the steps outlined here can help you decide what you want to teach and how you should teach it. The article offers a structure for identifying the content you want to keep and the content you can let go of, as well as tips for redesigning assignments and syllabi to clarify your approach to students and reduce academic misconduct.

To understand the profound impact of generative AI and tools like ChatGPT, it is helpful to begin by unpacking some of the language used in this field.6 AI, short for "artificial intelligence," generally refers to the use of machines, particularly computers, to perform tasks that simulate human thinking, reasoning, and learning.7 We encounter AI throughout our day when using computer applications like a Google search, interacting with a chatbot on a consumer website, or using a virtual assistant like Siri or Alexa. The ubiquity of AI in our daily lives is predicted to increase.8 In the near term, AI will likely become even more deeply embedded in the way we interact with everyday items like our cars, office computers, and coffee machines.9

The term artificial "intelligence" is controversial because machines do not actually have the capacity to think or learn like humans. Their programming can simulate some aspects of human intelligence, but they do not reason like a human.10 For this reason, scientists distinguish between "strong AI," or "artificial general intelligence," and "weak AI."11 Weak AI is what we generally have in use today—computers simulating human intelligence while they complete a specific type of task they have been programmed to perform. Strong AI is a theoretical system that thinks, reasons, and has self-awareness like a human (and could evolve to exceed human intelligence—thus prompting doomsday "AI takes over the world" scenarios).
Machine learning is a type of AI that programs systems to "learn" and improve their ability to perform assigned tasks without explicit programming.12 Machine learning relies on algorithms that allow machines to sort vast quantities of data and use that data to make predictions and find patterns, ultimately teaching themselves through repetition and human intervention, as human programmers tweak the algorithms that guide the learning process.13 The most powerful and complex AI applications today use neural networks—a type of machine learning that simulates the way neurons in the human brain communicate information.14

Generative AI refers to the use of AI applications to create something new, including images, text, and other forms of media. Large language models (LLMs) use machine learning and highly sophisticated neural networks to process and generate text.15 LLMs are trained on enormous datasets with billions of inputs—imagine starting by dumping all of the Internet into a giant bucket and you get the idea. Somewhat disturbingly, the contents of those databases are not transparent,16 and litigation has alleged that the databases likely contain a significant amount of copyrighted material.17 The LLM uses this dataset to process inputs (i.e., human prompts to the LLM interface) and predict the most helpful, common, and relevant responses.18 In effect, the LLM crowdsources answers to the questions it is asked, using its skills in pattern identification and prediction to generate the most likely and useful textual responses given the information in its database. The tools we know as ChatGPT, Bard, or Bing are simply interfaces that use LLMs to produce responses to prompts from external users.

Generative AI's flaws are predictable, given how the technology works. Though as humans we are likely to anthropomorphize LLMs and attribute human characteristics to them,19 they are not thinking beings—they are simply computer models with predictable flaws arising out of an astonishingly complex but ultimately nonhuman system. One of the most glaring flaws of LLMs is their potential to produce inaccurate, false, or flawed responses, in which the model puts together strings of text that do not point to accurate information but mimic accurate information in a compelling way. For example, when prompted for a peer-reviewed journal article, an LLM might string together a series of notations that look like a journal article, with a title, volume, issue, and page numbers, but that do not point to an existing source. LLMs can create a variety of types of false information, such as false journal articles, false meetings of historical figures, and false medical information.20 Early in the use of LLMs, these responses were dubbed "hallucinations."21 Notably, even that term is anthropomorphically loaded—it suggests the LLM is operating in a semi-conscious human way, "imagining" events that did not occur, rather than simply acting as a flawed computer model producing inaccurate results.

Another essential limitation of LLMs is rooted in their technological structure. LLMs are, by their nature, predicting and generating text that aligns with information already in their database. The information in their database was generally created by humans—flawed and biased humans, producing information that is also flawed and biased, operating in societies with existing structural racism and bias.
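The toy sketch below is a minimal illustration in Python of the two points just made, not a description of how any production LLM is actually built: the "model" simply returns the most frequent completion found in a hypothetical training corpus, so whatever patterns, gaps, and skews exist in that corpus are exactly what it reproduces.

```python
from collections import Counter

# Toy illustration of "prediction as crowdsourcing": the answer is whatever
# appears most often in the training data. Real LLMs learn statistical
# patterns over billions of parameters, but the consequence is similar:
# skews in the source material become skews in the output.

training_corpus = [
    # Hypothetical documents; imagine millions of these scraped from the web.
    "the best professor is experienced",
    "the best professor is organized",
    "the best professor is experienced",
    "the best professor is demanding",
]

def predict_completion(prompt, corpus):
    """Return the most common word that follows the prompt in the corpus."""
    completions = Counter()
    for doc in corpus:
        if doc.startswith(prompt):
            completions[doc[len(prompt):].strip()] += 1
    # Greedy choice: the single most frequent continuation wins.
    return completions.most_common(1)[0][0] if completions else None

print(predict_completion("the best professor is", training_corpus))
# -> "experienced": the answer reflects frequency in the data, not merit.
```

The point of the sketch is only that the output is a frequency judgment about the source text, not an independent evaluation of the underlying question.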
If faced with a question like "please list the top twenty jazz pianists," the LLM does not independently assess the work of all living and dead jazz pianists and reach a conclusion about their relative merit. Rather, it crowdsources information, identifying patterns in commonly reported responses to this question and developing a prediction based on that information. LLMs cannot account for historical inequities and biases in reporting on artists, let alone the biases that go into the development of artists and their access to recording contracts or critical acclaim.22 In an area like jazz music, with deep historical roots of inequities in the treatment of men and women,23 one cannot expect an LLM to do anything other than repeat existing biases, which are reflected in popular source materials. As a result, when queried, ChatGPT, crowdsourcing from existing sources, reported no women among the top twenty jazz pianists.24 ChatGPT produced this result despite the historical significance of figures like Mary Lou Williams, Alice Coltrane, and Shirley Scott, iconic female artists well known to jazz scholars and aficionados alike.25

DALL-E, a generative AI tool that creates images based on textual prompts, produces similarly biased responses. When prompted to create a "realistic photograph quality picture of a college professor talking to a large group of students," DALL-E created the image shown in Figure 1: a white male professor surrounded by white students. Is this image "incorrect"? Survey data from 2019 suggest over seventy-five percent of professors are white, while fifty-five percent of undergraduates are white.26 Data from 2021 show white men continue to represent the largest percentage of tenure-track faculty.27 Given this landscape, it is entirely consistent with AI's programming to generate images that center white men as professors and to leave women out of a list of top jazz pianists.

Of course, the answer invites a closer look at the question. If we are asking which jazz musicians have been most widely recognized or what the "average" college professor looks like, the tools are performing as programmed. But there are many other questions embedded in this call and response, and they get at the heart of what it means to use generative AI in education. If the average user of an LLM assumes that this list of jazz pianists accurately represents the most talented jazz pianists ever to have lived, the technology is reinforcing inaccurate information based on existing societal biases. To be clear, the LLM cannot and does not answer that question. The LLM, in this scenario, is also not considering what cultural or legal barriers might prevent or have prevented women from becoming known in the field of jazz music (though it could, if directly asked).28 It is simply reproducing a list of commonly mentioned "top" jazz pianists. Similarly, DALL-E is not "incorrect" when it creates a likely image of a professor as white and male. The problem is that research suggests the images we view shape our ideas of what is possible.29 Thus, if popular media images are dominated by the "most likely" outcomes generated by AI models, stereotypical roles will be harder to reject and move past. Generative AI can amplify our cultural biases and blind us to the way they show up in everything from the medical literature to our notions of beauty, because these tools are, by their programming, designed to reflect a majoritarian view of the world.
To be clear, DALL-E and ChatGPT are not different in this respect from existing search engines or media companies, which are also built on crowdsourcing and majoritarian views. The challenge is that, as these tools become ubiquitous and embedded in our daily lives, we may come to view them as definitive sources of truth, even more so than we currently do with search engines. When a student asks ChatGPT a medical question and receives a conversational response that mimics a medical professional, they may assume the answer they are receiving is accurate for all populations rather than understanding that the information is likely based on a medical study of young, white men and may not be representative of the population as a whole.30

These are two of the most prominent concerns about generative AI, but there are other flaws worth noting. Generative AI cannot replace the need for professionals in real life to "think on their feet" and react quickly in scenarios that require human empathy and communication skills. It cannot respond in real time to complex scenarios with constantly changing fact patterns and complicated human actors. It bases its responses on existing scenarios and patterns, so it is unlikely to make truly innovative, creative leaps. Because it lacks human intuition and feeling, it also cannot look below the surface of a question to respond to a deeper, more essential question, or even draw out the question that is not asked. As a consultant might say, "AI can only give the client what they ask for—so my job is secure." The implication is that we often do not know what we want, in business or in life, and we rely on insight from the humans around us to help identify our underlying needs.

One approach or response to the wave of generative AI would be to plant a flag in the ground and refuse to move or evolve. Rather than changing the content of their courses, in this scenario, faculty might argue that universities and colleges should instead institute more stringent academic integrity policies and punishments and develop better ways to "catch the cheaters." This position is fundamentally flawed for two primary reasons: (1) faculty will never be able to "AI-proof" their courses, and (2) learning the responsible use of AI is essential to students' future success.

First, while creating an environment that supports academic integrity is important, some assignments and assessments will always leave room for students to evade restrictions on the use of AI. While many faculty have reasonably decided to turn toward more in-class assignments and assessments to reduce cheating, this option is not available to those teaching online. Moreover, not all assignments can be completed during class hours. What about developing more and better ways to catch the cheaters? That is, can we not just proctor our way out of this situation? Even keeping in mind that online proctoring has limited application and can only be used for time-limited assessments, the reality is that many universities had begun moving away from AI-enabled proctoring before ChatGPT came on the scene due to concerns about privacy, discrimination, and unreliability.31 What about AI-text detectors like ZeroGPT and GPTZero?32 Can we not use AI to detect AI?
Unfortunately, these tools have been shown to be unreliable33 and biased against non-native English writers.34 In addition, work-arounds are notoriously easy to find, most commonly (and ironically) by using AI paraphrasing tools.35

Faculty can create assignments that are easier to complete with generative AI and those that are harder. In most cases, however, particularly in large undergraduate survey courses, generative AI will be able to complete most assignments, particularly those completed outside of class. If ChatGPT can pass the Bar exam, it can probably draft a better legal essay than most undergraduate students. It certainly can answer any multiple-choice questions a faculty member might write. This means students will have to decide to complete their assignments knowing that they have the answer key (via ChatGPT) and knowing faculty probably cannot catch them cheating. Students inclined to cheat (for any number of reasons, which might include the struggle of balancing school with full-time work,36 caring for family members,37 or dealing with other challenging life circumstances like food insecurity38 or physical or mental illness39) now have a straightforward way to successfully complete their assignments. Students who do try to complete their assignments without assistance may as a result be at a disadvantage, particularly in classes using curved grading, and faculty would not be able to tell one group from the other. This could lead to grading inequities, not to mention even greater incentives to cheat for students who would not normally bend the rules, if only to stand on a level playing field.

Second, and I believe even more importantly, what faculty teach should have relevance to the way their students will undertake their future work. For those teaching undergraduate business law courses at a business school, students are unlikely to become lawyers. Rather, they will most likely be applying legal content to business situations. They need to learn how to apply the information they are learning to these real-world scenarios. Because their real world will likely include generative AI, they need to learn how to use it safely, responsibly, and effectively to identify legal risks and strategic legal opportunities, and how to fill in the known gaps and defects of the technology, particularly as it relates to bias and inaccuracies.

In my experience, faculty want to evolve but are not sure how. Even knowing that students will be likely to use AI tools in their future work, many faculty remain convinced that there is some basic, fundamental information that students need to use the technology responsibly and to avoid the limitations described above. At the same time, faculty are unsure how to integrate this technology into their courses while retaining the essential information students need and focusing on the human-thinking skills that generative AI cannot provide. Finally, many are aware of the limitations of generative AI discussed above and want to avoid using the tool in a way that furthers biases or inhibits creative and critical thinking.

In this part, I begin to address these challenges. I discuss how to start using generative AI and how to adapt the curriculum to meet the needs of students today. This process will include a look at both content and pedagogy—the what and the how. Before moving further, however, I must emphasize that faculty themselves must become conversant with the basics of using generative AI and LLMs.
Your subdiscipline's tools may vary, but at a minimum, anyone can start with ChatGPT, Bard, or Bing (all of which have a free version) to get familiar with the technology. Most colleges and universities with a center for teaching and learning have training resources available, including opportunities for coaching and mentoring and articles about the integration of AI within and beyond their organization,40 and a quick web search for articles about "using ChatGPT in higher education" will yield thousands of results.

To adapt the content of a course to meet the challenge of generative AI, I propose two primary steps. First, faculty should review their learning outcomes. Then they should consider whether the content they are covering (and assessing) falls into an essential content area. After doing so, faculty will likely find that some, or perhaps much, of the content currently included in the syllabus is not essential. They may also find content that will be difficult or even impossible to assess, given the availability of generative AI. Once faculty remove that content and revise learning outcomes, they can then focus on how to best teach essential legal content, while also integrating generative AI into class assignments in a meaningful and relevant manner, allowing students to use the technology responsibly and effectively in the future.

Learning outcomes are statements of an intended outcome for a course or program, often constructed in measurable terms to guide the development of metrics of assessment.41 A common tool used to guide the development of learning outcomes is Bloom's Taxonomy, a classification tool for learning outcomes that divides them into six categories of learning (remember, understand, apply, analyze, evaluate, and create). Bloom's Taxonomy is hierarchical, with lower levels serving as a base for higher levels of thinking and learning.42 An extensive literature surrounds the development of learning outcomes, and many higher education institutions now mandate the use of learning outcomes in courses and programs, often for accreditation purposes.43

To address generative AI, faculty must keep in mind that students will have basic information and content related to legal subject areas readily available. Moreover, generative AI can easily complete assignments that test lower levels of learning and cognition. As a result, focusing a course on recalling and identifying information is likely to make the course less useful to students and more vulnerable to cheating. Rather than focusing student learning on skills that AI is likely to perform as well as or better than a human, faculty should teach and assess uniquely human skills that AI cannot replace. Faculty should focus on how students will use, apply, and think critically about the information they can readily find rather than on memorizing content.

The Oregon State University (OSU) Ecampus division of online learning has reframed Bloom's Taxonomy in light of the pressures of generative AI.44 For each level, this revised Bloom's Taxonomy compares what AI can do with associated human skills AI cannot replicate. For example, for the learning outcome of "recall," this chart notes that AI can readily recall factual information, while the associated uniquely human skill would be recalling information when technology is not available.
At the level of "analyze," the chart notes that AI can compare and contrast data, and infer and predict, while the associated human skills are critical thinking and reasoning within cognitive and affective domains. In general, the chart suggests that "[a]ll course activities and assessments will benefit from review given the capabilities of AI tools; those at the Remember and Analyze levels may be more likely to need amendment."45

What might this amendment or reframing look like in a business law course? Consider a learning outcome such as: "[s]tudents will be able to identify and describe the concepts of offer and acceptance in a contractual setting." Identify and describe are learning outcomes at the lower end of Bloom's Taxonomy and can easily be replicated by generative AI. This learning outcome could be revised to call on higher levels of thinking: "[s]tudents will be able to evaluate whether a hypothetical scenario could be considered legally binding based on contractual concepts of offer and acceptance."

Many faculty reviewing this example will point out, appropriately, that students cannot apply the concepts of offer and acceptance without learning what those terms mean, or what it means to use legal reasoning in a hypothetical fact scenario. This brings us to the next question faculty face when revising learning outcomes: What content-based information must remain in a course in order for students to engage in higher-level reasoning? I suggest there are three primary areas of content that are necessary to support higher levels of thinking: structural basics, key content areas, and magic words. I provide examples of each of these concepts from an introductory business law class below (Figure 2).

Structural basics are organizing concepts upon which higher-order concepts build and rely. These are basic elements of your field or discipline that are essential to applying more specific concepts using higher-level thinking. For example, students must understand the concept of a legal theory and how legal claims are often organized into "elements" before they can determine what legal theories might apply to a given fact scenario. Students also need to understand how the legal system is structured, including the difference between federal and state courts, to understand how, for example, federal constitutional rights differ from state statutory rights.

Magic words are specialized words and concepts that can shift the outcome of an AI prompt. Students must learn magic words to create successful prompts that use generative AI capabilities effectively. In law, the concept of "magic words" can be considered the use of specific terminology to invoke legal meaning, typically created by statute or case law. For example, the concept of a "reasonable person"46 has a specific meaning in the context of negligence and tort law. In the context of generative AI, magic words can be considered language that will change the outcome of a prompt. Considering these concepts together, students may need specific legal language to craft a prompt in an LLM that will generate a useful and legally appropriate response. Consider the notion of a "reasonable accommodation" under the Americans with Disabilities Act (ADA). A "reasonable accommodation" has a specific legal meaning. As a result, asking ChatGPT whether an employee's request is a reasonable accommodation produces a different response than asking what might be a reasonable request.
For example, when I asked ChatGPT whether it was reasonable for an employee to be given dictation software, ChatGPT discussed efficiency, fairness, and a potential employee contribution, among other factors. When my prompt used the term reasonable accommodation, the response included information about the interactive process, essential functions of the job, and legal requirements, including the ADA.47

Key content or concepts are concepts or content that can be applied when learning related or analogous content areas. For example, the concept of the reasonable person can be found in tort, criminal, and contract law, though it has different meanings based on the specific context. Understanding the concept of "protected classes" is essential when applying Title VII of the Civil Rights Act, but the same concept, with different protected classes, is useful when understanding the ADA, the Age Discrimination in Employment Act, or state anti-discrimination laws. Understanding that employment law requires an analysis of whether a plaintiff has been discriminated against due to their identity or perceived identity as a member of a protected class allows students to learn about one area of content (i.e., Title VII) and apply that to another area of content (i.e., state-specific discrimination in employment laws).

By identifying these essential content areas, I do not mean to suggest these areas must appear in specific learning outcomes or that students should be expected to memorize the definitions of these terms. Rather, students should be able to use this information when applying these concepts in a complex scenario. This content should remain, but that does not require a new learning outcome at the "memorize" level.

Let us apply this concept in the context of our contracts learning outcome. As noted above, faculty may believe that the concepts of offer and acceptance are essential foundational knowledge for a learning outcome that requires applying those concepts. Certainly, any overview or discussion of contracts will include a discussion of contract formation, which requires an understanding of offer and acceptance. This, however, does not mean that students must be able to recall those concepts (i.e., memorize) and reproduce them, unaided, in an assessment. They simply must be able to use those concepts effectively when analyzing a hypothetical scenario or generating a query for ChatGPT.

When struggling to determine how to include this information in your curriculum, consider how students are likely to encounter those terms in their lives in the future. For example, I have worked with student-athletes who have received offers for name, image, and likeness (NIL)48 deals via Instagram direct message. The athletes often believe they cannot "accept" an offer via social media or through a direct message. When working with these students, I discuss the concepts of offer and acceptance. After a short conversation about these concepts, I can then ask them to assess whether a particular scenario (deliberately created to be ambiguous) includes a legally binding offer and acceptance. For an in-person class, I can have them discuss this scenario without the use of AI and then test their conclusions against what ChatGPT suggests. In an online class, where I cannot control the use of AI, I can have the students engage with ChatGPT to assess the scenario and then send me the prompts they used and a record of the conversation. I can then have them discuss which side they find more convincing, or ask them to decide which of several cases is most applicable to the scenario. In short, rather than trying to keep AI out of the conversation, I bring the technology in, just as they might in their real lives, and then ask them how they would take the next step of testing the AI's conclusion before they apply it to their lives.
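Faculty who want to prepare this kind of wording comparison for class (for example, the "reasonable" versus "reasonable accommodation" prompts above) could script it rather than retyping prompts in the chat window. The sketch below is a minimal example, not part of the article's method: it assumes the OpenAI Python client is installed and an API key is configured in the environment, and the model name is purely illustrative. Typing the same two prompts into the ChatGPT interface produces the same comparison.

```python
# Compare how prompt wording ("reasonable" vs. the legal term of art
# "reasonable accommodation") changes an LLM's response.
# Assumes the OpenAI Python client is installed and OPENAI_API_KEY is set;
# the model name below is illustrative only.
from openai import OpenAI

client = OpenAI()

prompts = [
    "Is it reasonable for an employer to give an employee dictation software?",
    "Is providing dictation software a reasonable accommodation under the ADA?",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any available chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt)
    print(response.choices[0].message.content)
    print("-" * 60)
```

Running both prompts side by side makes the effect of the "magic words" concrete for students: the second phrasing reliably pulls in the statutory framing, while the first tends to stay at the level of general workplace fairness.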
In addition to these areas of substantive content, three other areas of curriculum stand out as deserving to remain when updating a course in light of generative AI. They are focused on critical thinking skills that address the limitations of AI as applied to a specific topic area and include research tools, decision-making factors, and areas of bias (Figure 3).

Research tools: As described above, generative AI may provide false or misleading responses to prompts. It may invent citations, prior facts, and data. Thus, students need to be taught how to verify the information they get from a generative AI tool. They need the skills to find the primary source behind a citation, such as a legal statute or database. They may need to be able to navigate legal websites, read cases, and analyze statutory language to cross-reference an AI-generated response. Instead of teaching specific case law, this guideline stresses the importance of teaching students how to read cases. Instead of teaching rules based on an existing statute, this guideline also suggests that students need to learn how to find the statutes for their state and how to apply the statutory language.

Problem-solving: Metacognition is an umbrella term that has been used to describe a variety of cognitive processes.49 Meta-strategic knowledge (MSK) is a subcomponent of metacognition in which individuals have "an awareness of the type of thinking strategies being used in specific instances."50 MSK includes knowledge of specific thinking strategies tuned to particular content areas and why and how to use them.51 In law, there is a very specific way of analyzing problems we often call "thinking like a lawyer," which might also be called the MSK for law. The characteristics of legal thinking include considering both sides of an argument, developing counter-arguments, and stress-testing conclusions against opposing views.52 It also involves thinking through analogy—applying existing legal rules to novel situations that may look nothing like the original, and justifying that application through the identification of similarities in fact, precedent, and/or policy.53 To become adept at analyzing legal scenarios and developing good prompts for a generative AI tool, students must learn this way of thinking. Similarly, students in a chemistry class will need to understand the scientific process, while students in a literature class will need to learn what tools to use when analyzing literature.
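As a concrete instance of the research-tools point above, one small verification habit students can learn is checking whether an AI-supplied citation actually exists. The sketch below is an illustrative example, not a tool discussed in the article: it assumes the citation includes a DOI, uses the public Crossref lookup service, and requires the requests library. A hit only confirms the reference is real, not that it says what the AI claims, and many legal sources (cases, statutes) have no DOI at all, so primary-source research skills still matter.

```python
# One simple check on an AI-supplied citation: does its DOI resolve to a
# registered work in the public Crossref registry?
# Assumes the citation includes a DOI and the `requests` library is installed;
# a 404 response means Crossref has no record of that DOI.
import requests

def doi_exists(doi):
    """Return True if the DOI resolves to a registered work on Crossref."""
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return response.status_code == 200

# Example: this article's real DOI versus a made-up one.
print(doi_exists("10.1111/jlse.12141"))       # expected: True
print(doi_exists("10.9999/fake.citation.1"))  # expected: False
```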
