Forum on Artificial Intelligence
2024; University of Illinois Press; Volume 76, Issue 1; Language: English
DOI: 10.5406/19346018.76.1.05
ISSN: 1934-6018
Authors: Craig Erpelding, Jack Beck, J. D. Swerzenski, Thomas Brecheisen
Abstract: In this forum of short essays, Craig Erpelding, Jack Beck, J. D. Swerzenski, and Thomas Brecheisen explore the transformative role of artificial intelligence (AI) in screenwriting, postproduction, and teaching. AI language models and tools have emerged as valuable assets in script development because they offer ideas and provide informative feedback. Beyond scriptwriting, AI creates new opportunities for characterizations, and AI tools have useful applications in video editing and film education. The four essays examine the capabilities of AI language models as a development and scriptwriting tool; the impact of AI on characterizations in film from a Chionian perspective of voice–body duality; Adobe's AI-powered Text-Based Editing tools; and the integration of AI tools in film education, highlighting the potential benefits of enhanced student engagement and immersive learning experiences while also addressing the challenges of maintaining ethical standards and human creativity in the face of technological advancements. (Forum editor Thomas Brecheisen)

The advancement of artificial intelligence (AI) language models such as ChatGPT has impacted the creative industries, including movie development and screenwriting. This discussion considers the capabilities of ChatGPT as a tool for scriptwriting, looking not only at its ability to write scripts but also at its potential to assist in script evaluation and feedback. It explores research on AI's capacity to generate outlines based on concepts, loglines, synopses, and character descriptions, and its ability to conform to various film and television structures. It also considers how users might optimize the performance of ChatGPT in their writing process. At the time of this article's inception, Hollywood was embroiled in heated negotiations among writers, actors, and studios, some of them involving the impact of AI on the industry. This author feels that, as of 2023, current AI engines do not have the ability to replace human creatives in screenwriting. However, based on advances in AI in music, songwriting, animation, and other creative realms, AI could have a significant impact on the future of filmmaking.

First, what is ChatGPT? ChatGPT is a conversational assistant application that operates on an advanced generative AI engine: specifically, at the time of writing in July 2023, the GPT-3.5 architecture. It is pre-trained on a massive dataset containing diverse text sources to develop a deep understanding of human language patterns and contextual meaning. When a user inputs a prompt or a question, ChatGPT processes it and generates a response by predicting the most probable continuation based on its pre-training, producing coherent and contextually appropriate replies. The interface allows for multiple conversation threads and retains a progressive memory of information within each thread. As of July 2023, the engine's extensive GPT-3.5 pre-training seemed to draw on broad Internet sources, social media, and global hot topics, as will be described here.

ChatGPT's understanding of the movie and screenwriting context allows it to respond accurately to prompts such as "Write a dialogue between two detectives investigating a murder case" or "Write a scene for a movie using the following characters . . ." When a prompt such as "write a feature screenplay" is typed into ChatGPT, the content of the generated screenplay is configured from the knowledge and patterns the model learned from the vast amount of text data on which GPT-3.5 was pre-trained, which has a knowledge cutoff in 2021. Generated scripts may include elements, themes, and plot points commonly found in typical contemporary feature screenplays. Thus, scripts from ChatGPT result in what one might call "cliché" characters, dialogue, conflicts, settings, and story arcs. In the author's research, typing "write a feature screenplay" into ChatGPT in July 2023 produced a movie outline called The Enigma Equation, which focused on underground societies pulling the strings of government to shape human culture while exploiting the influence of mass media. This is very similar to hot topics and sentiments found across the Internet and social media around the engine's pre-training cutoff.

The language capabilities of ChatGPT enable it to produce scripts that incorporate defined plots, developed characters, and systematic dialogue. By leveraging its vast pre-trained knowledge of film scripts and narrative structures, it is entirely possible that ChatGPT could deliver compelling and professional-grade cinematic stories that cater to various genres and styles. When prompted to write a romantic comedy script, ChatGPT attempts to create a heartwarming storyline with witty dialogue and relatable characters that align with the genre's conventions. The author prompted ChatGPT to write "the scariest opening scene to a horror film" it could imagine, and the engine created a short scene that could understandably be the blueprint for a good opener that fit squarely within the genre. Similarly, the author prompted ChatGPT to write and rewrite scenes based on tone, such as "dark and vengeful" and then "whimsical and goofy fun," which did produce results, albeit clichéd ones.

Thus, through its generative AI capabilities, ChatGPT appears able to create sample movie scenes that evoke specific emotions, fit various genres, adopt distinct styles, and set appropriate tones. This feature could benefit writers and filmmakers in exploring creative possibilities and experimenting with different elements in the scripting and development phase.

According to the author's experimentation as of July 2023, the engine seemed to have difficulty generating scripts longer than a few pages or containing more than one location. Formatting, which is vital to the industry for a variety of reasons, is something the engine does not seem capable of producing correctly. It does understand the difference between action, slug lines, and dialogue, and it even incorporates "Fade In" and "Fade to Black." However, it does not use Courier New font or appropriate tabs; it does not capitalize important sounds, is inconsistent with character names from one scene to the next, and overuses parentheticals. Additionally, while the engine could seemingly write serviceable dialogue, it does not write action blocks visually, as would be expected in a screenplay. Rather, it generates these blocks more as prose in a novel.

It is important to note that while ChatGPT can generate coherent responses, it does not have consciousness or creativity like a human screenwriter. When the author asked more nuanced creative deliverables of the engine, it acknowledged its inability to generate a response, "as it would require extensive creativity and storytelling skills."
Moreover, ChatGPT can sometimes produce content that is repetitive, nonsensical, or off-topic, especially when the prompt is ambiguous or lacks clear instructions. In the case of The Enigma Equation, the author asked ChatGPT to write each scene of the outline, resulting in scenes that, when strung together as a full screenplay, lacked continuity, left huge gaps in plot, forgot or stranded characters who had been introduced with some importance, and more. Therefore, users will often need to provide more specific and detailed prompts to guide ChatGPT and to refine the generated content to meet their narrative requirements. A "thumbs up"/"thumbs down" function, available when ChatGPT regenerates a response, seems to help guide results toward a more satisfactory place.

While not particularly good at quickly or easily writing serviceable screenplays for the industry, ChatGPT might serve as a valuable tool for script evaluation. Writers can use ChatGPT to analyze scripts for strengths and weaknesses, identify potential plot holes, and assess the overall feasibility of a project. The author found that typing "evaluate this screenplay" and pasting in the text of the script, even when feature-length, resulted in both positive and negative feedback, such as commending a "well-defined plot" and noting that a character's "motivation for exploring the woods is not clearly defined." Evaluation of this kind could streamline the early stages of internal script reviews.

ChatGPT can quickly generate loglines, synopses, character descriptions, and outlines, aiding writers in structuring their concepts. A writer can request a logline based on a sci-fi concept, and ChatGPT will deliver a concise, attention-grabbing summary that highlights the movie's unique premise. The author tested this on a variety of concepts in the genre with positive results. With key details about the plot, characters, and setting provided, ChatGPT created seemingly well-organized outlines based on the supplied information.

Based on experimentation, the pre-training of ChatGPT appears to include well-known movie structures such as the traditional three-act structure, Joseph Campbell's Hero's Journey, Blake Snyder's "Save the Cat!" beat sheet, Syd Field's Paradigm, and the 15-Minute Movie Method. According to ChatGPT, it can also combine elements from various structures or create new storytelling frameworks, allowing for more unique and original narrative approaches. At the time of writing, this claim had not been tested or researched.

Ultimately, the results of the author's study show that ChatGPT may be a valuable tool for generating ideas and providing inspiration for screenplays. However, the author's testing leads to the confident conclusion that human creativity, critical thinking, and editing remain indispensable to crafting a compelling and polished feature screenplay.

A topic quickly becoming more pressing is the increasing use of AI in film production, and a growing concern is how AI can negatively affect the industry. The 2023 SAG-AFTRA strikers fear a future of actors being replaced by AI 3-D replicas. A recent Black Mirror (Channel 4/Netflix, 2011–present) episode, "Joan Is Awful," involves a Netflix-like company, Streamberry, using a quantum computer and CGI to create virtual actors in automatically generated storylines for broadcast. Before and still today, AI characters abound.
Recent feature films Mission: Impossible – Dead Reckoning Part One (Christopher McQuarrie, 2023) and The Creator (Gareth Edwards, 2023) both sport Earth-destroying AI.

However, what is a threat to some remains a wellspring of content for the screenwriter. Film historians and theorists are intrigued by characters that can be very much like humans, and yet not. This offers many creative possibilities: unexpected heroes, maniacal villains, hapless victims. AI portrayals often reflect societal fears, ethical conundrums, fascination with technological advancement, and philosophical questions surrounding the nature of consciousness and humanity. AI characters seek self-realization, rights and freedoms, and human desires (OpenAI): all core story content.

This author's particular fascination prioritizes the creative challenges and opportunities involving vocalization and AI characters. In sync dialogue, humans are simple and boring: if the onscreen character's voice is heard and the mouth is moving in sync, audiences have a match. To do things in a more interesting way, filmmakers need horror, the supernatural, animation, or AI.

A study of AI characterizations might begin with the following overview of character types and tropes:

- Malevolent: The Terminator (James Cameron, 1984), including Skynet
- Benevolent: Star Wars (George Lucas, 1977), R2-D2 (Kenny Baker)
- Sentient: Ex Machina (Alex Garland, 2014), Ava (Alicia Vikander)
- Companion: Her (Spike Jonze, 2013), Samantha (Scarlett Johansson)
- Dilemma (ethical): The Artifice Girl (Franklin Ritch, 2022), Cherry (Tatum Matthews)
- Augmentation: Iron Man (Jon Favreau, 2008), Jarvis (Paul Bettany) and the suit
- Dystopia: The Matrix (Lana and Lilly Wachowski, 1999)
- Comedy: The Hitchhiker's Guide to the Galaxy (Garth Jennings, 2005), Marvin (Warwick Davis and Alan Rickman)

AI runs the gamut of bodily categories. There is the android, with human qualities and appearance but mechanical and artificial design, emphasizing the contrast. Data (Brent Spiner) from Star Trek: The Next Generation (Paramount, 1987–94) and Ava from Ex Machina are two well-known examples. Data wishes in many ways to become more like his human colleagues, while Ava contemplates killing and then kills her human creator, representing the full spectrum.

What is not obvious is that even basic human character sync involves the "contract of belief," because, as theorist Michel Chion explains, "the process of 'embodying' a voice is not a mechanistic operation, but a symbolic one . . . provided that the rules of a sort of 'Contract of Belief' are respected" (Audio-Vision 129). A contract of belief exists when filmmakers provide clues or codes that there is a certain source-body for a voice. Embodiment is the translation of Chion's mise-en-corps: putting into a body. The spectator/auditor attaches the voice to the body; there is a separation, and the individual creates a suitable, desired union. The audience sees Data talking but remembers it is still an illusion. It is not Data talking; the sound is sourced from a speaker centered behind the screen. Data is not really there in the frame; colored light in rapid still images represents Data. The image sources no sound. Here sync is a fiction; it is ventriloquism. But all this takes on greater meaning as examples move further into objectified bodies.

There are AI avatars. In Tron (Steven Lisberger, 1982), Tron (Bruce Boxleitner) and Yori (Cindy Morgan) are programs written by users that are personified inside the computer. Flynn (Jeff Bridges) enters as a human, becoming Clu.
In TRON: Legacy (Joseph Kosinski, 2010), audiences see the AI facial de-aging of Jeff Bridges, ironically in a computer controlled by the evil AI MCP, personified in Sark (David Warner), who also plays the company executive, and so on.

Some films explore the idea of transference of AI consciousness or intelligence between different bodies or forms. Jarvis, the AI brain computer for Iron Man, is transferred into quasi-human form as Vision.

Another offshoot of the android is the cyborg: a person whose physical abilities are extended beyond normal human limitations by mechanical elements. In Terminator 2 (James Cameron, 1991), the Terminator (Arnold Schwarzenegger) can trick the T-1000 by disguising his voice as that of the boy, John Connor (Edward Furlong). Moreover, the T-1000 can shapeshift into and disguise its voice as John's foster mother. The body–voice presumption can be manipulated with technology, both within the film and in postproduction.

Additional sound-theory terms about voice and body are worth engaging here. Synthesis is the combining of visual and sound as one, in ways that include sync and other means. It is the grand attempt at uniting the two senses in the cinematic experience. Synchresis is an oft-used Chionian term: "mental fusion free of logic, occurs with sync" (The Voice in Cinema 129). If any sound is presented as seemingly sourced by any subject, the observer will (willingly) believe they belong together, sync or not. Chion argues that synthesis is symbolic in that it often avoids vocal phonation in favor of the body: consider the breathing diaphragm, abdomen and core muscles, gesturings, glances, the throat or larynx, and the particularly exaggerated machinations of AI robots.

Robots can provide a loose sync but often effectively create attachment not through source so much as through behavior and design. TARS (Bill Irwin) from Interstellar (Christopher Nolan, 2014) has a tinny voice, as though it reverberates off its body. The robot has no mouth but has aspects of eyes (monitors at eye level) and arms (integrated into a monolith-inspired design). In Star Wars, C-3PO (Anthony Daniels) also has that odd quality of a masked voice in a face that does not move or express. But all the exaggerated gesticulations and mechanical joint sounds are the sync substitutes, and audiences honor the contract and assign voice to body: synchresis. Note that R2-D2 emits mere sound effects, yet they are quite linguistic and well chosen; plus, there is the swiveling head as the robot considers what to do (with associated mechanical flourishes for the contract). In the scene where R2-D2 tries to convince C-3PO to follow him on Tatooine, R2-D2, uncertain, contemplates, looks both ways and at the lens, blurts a curse-effect, then exerts one more insulting final scream of desperation: no "moving mouth" sync, but quite effective dialogue nevertheless.

The quasi-embodied AI requires a greater reach, a gap that Chion expresses as the cinematic duality. He explains, "The sound film, for its part, is dualistic. Its dualism is hidden or disavowed to varying extents . . . the physical nature of film necessarily makes an incision or cut between the body and the voice. Then the cinema does its best to restitch the two together at the seam" (Audio-Vision 125). Chion goes on to say that this is the grafting of the non-localized voice onto a particular body. The operation leaves a scar, which is the sync, observed or intimated.
The act of synthesis with sync is thus described symbolically and graphically as bodily mutilation, surgically repaired.

HAL from 2001: A Space Odyssey (Stanley Kubrick, 1968) has a famously soothing voice provided by actor Douglas Rain. HAL's vocal signature is close and unlocalized, characteristics of a narrator or a god. He has acousmetric powers: being the entire ship, everywhere and all-seeing; having the ability to control and kill; being the all-knowing supercomputer. The scene where he plays chess with Poole (Gary Lockwood) is telling of his superior logic and intelligence. The diagonal angle on the eyeline favors HAL (the cyclops of this Odyssey). He is arguably embodied, with the red pupil lens and speaker beneath. Constant activity in panel screens keeps the body alive, or breathing, or thinking, in a sense. On another level, HAL is the ship, Discovery, with its bulbous head, spine, and tail design.

Later, Bowman (Keir Dullea) must ascend a "neck" of the ship to reach the "brain" of HAL. This is getting inside the head of a computer and killing the deity through electronic lobotomy. At the close of the scene, HAL sings "Daisy Bell" while reverting, devolving, descending to infancy: a birth/death combined, and a foreshadowing of the final scene and the birth of the Starchild. Herein is the common sync-duality dictum: kill the voice, kill the body.

A similar space supercomputer, but a more sympathetic one, can be found in Moon (Duncan Jones, 2009). Gerty (Kevin Spacey) is Sam's (Sam Rockwell) caretaker. It is friend, father, physician, therapist, protector, guardian. It resurrects and provides, and like HAL it keeps secrets from the humans against its nature. Gerty has a changing emoji face-screen on the front of its mobile, boxy machine body: an interesting "mask," representing its emotion and disposition without providing a sync source.

WarGames (John Badham, 1983) is less often cited in the AI oeuvre but has many innovative themes of duality: father/son, adult/child, death/resurrection, innocence/war, monster/friend. A teenager, David (Matthew Broderick), summons an intelligent computer. The bedroom terminal voice-speaker is affectionately named "Joshua" when awakened with the eponymous backdoor password. The human Joshua was the deceased son of the "resurrected" father-creator, Falken (John Wood). Falken's AI counterpart is the massive, remote, mute, warlike (adult) mainframe, WOPR. The present Joshua wishes to play chess, but David insists on a game of global thermonuclear war that takes over NORAD and nearly starts Armageddon. Violence and innocence, voice and body: these dualities mirror the dual personality of Joshua and WOPR. The body is objectified in two places at once.

Her asks important questions about AI. Can a fully realized emotional artificial intelligence be indistinguishable from a human? Can said AI be an actual, valid romantic partner? What is love? Theodore (Joaquin Phoenix) is a crestfallen man who discovers a "perfect" companion in his new, female-voiced AI operating system. In the "meet-cute" scene there is an unnatural closeness and dryness to Samantha's voice, while Theodore sounds markedly distant. One could argue that the same thing happens in 2001 and Moon: the elevated quality of the voice exists because the voice is the being; the body is less of a concern, almost neutralized, in the ether. The most theoretically significant scene occurs when Samantha convinces a surrogate woman to engage with Theodore, serving as a body for Samantha's voice in his pocket.
The symbolism indicates another duality: the voice as intellect, the body as pleasure. When things go expectedly wrong, two bodies and three voices become a triangular argument, with the women cornering Theodore over his transgressions. The women are disembodied or hidden; Theodore, seen, tries to console them both. It is an effectively sad and honest, but nevertheless absurdly funny, scenario.

With concerns about the impact of AI on daily life, art, and teaching continuing to grow, the opportunities remain vast for AI characterizations in fiction. The Chionian perspective keeps the theoretical and fundamental separation of sound and image, voice and body, in mind throughout.

Adobe's 2022 MAX conference, emceed by Kevin Hart and Bria Alexander in front of a packed house at the Los Angeles Crypto.com Arena, saw the debut of several splashy AI-powered tools for the developer's Creative Suite of programs. Garnering most of the attention was Firefly, a Midjourney/DALL-E-style image generator that allows users to alter specific elements within a Photoshop canvas via text prompt. Less heralded was a feature eventually slated for Adobe's Premiere Pro video editing software, one that would make editing video as easy as editing text ("Project Blink"). Instead of the usual video editing workflow of watching back and editing from raw footage, users could work directly from a transcript generated from that footage, highlighting specific lines and inserting them into the timeline as edited video clips. Since the conference, the feature, now known as the Text-Based Editor, has made its way from beta form into Premiere proper in May 2023, rolled out alongside a bevy of other AI-powered tools for the program ("AI Video Editing"). Yet despite its rather quiet debut and arrival, I argue that the Text-Based Editor may hold much wider significance in the realm of video editing, portending a reshaping of the practice reminiscent of recent AI-driven transformations in writing and illustration.

First, what is the significance of this specific piece of software? Adobe Premiere Pro is one of the "big four" professional video editors, along with Avid's Media Composer, Blackmagic's DaVinci Resolve, and Apple's Final Cut Pro. Since its debut in 1991, Premiere has become the standard editor in advertising, corporate settings, and higher education ("Companies Using Adobe Premiere"). It has also made some inroads in Hollywood production, notably as the editor used for Best Picture winner Everything Everywhere All at Once (Daniel Kwan and Daniel Scheinert, 2022), though Avid's Media Composer remains the accepted norm at present. More significantly for the purposes of this article, Premiere has introduced the most AI-based tools of the big four editors, part of a larger directive from Adobe to stay at the cutting edge of AI development across its suite of tools ("Future Vision").

Adobe has promoted the Text-Based Editor and many of its other new features as "AI-powered tools," largely to leverage current cultural fascination with the term. However, many of these functions have existed in some form in the program for years. Most notably, the Auto Transcribe tool, which serves as the foundation for the Text-Based Editor, dates back to version CS4 in 2008. The Auto Transcribe function allows users to convert the dialogue of a selected video clip into a text script, an affordance that, for most of its existence, primarily served as an easy means to generate captions.
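Since the argument turns on what a transcript affords, it may help to see how little structure is involved. The sketch below is a hypothetical, simplified model (not Adobe's implementation): timed transcript segments of the kind a speech-to-text tool produces are rendered as an SRT caption file, the use Auto Transcribe long served.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str     # one transcribed sentence
    start: float  # seconds into the source clip
    end: float

def _ts(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments) -> str:
    """Render timed transcript segments as numbered SRT caption blocks."""
    blocks = [
        f"{i}\n{_ts(seg.start)} --> {_ts(seg.end)}\n{seg.text}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)

print(to_srt([Segment("Thanks for joining me today.", 0.0, 2.1)]))
```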
In this sense, the Text-Based Editor is a reimagining of the existing Auto Transcribe feature: taking what had been a helpful though auxiliary tool and making it the basis of the video editing process itself. Rather than following the standard digital video editing practice of pulling raw footage from the clip bin, users instead start from the raw text of the transcript. The practice of setting in/out points or trimming and razoring raw footage into edited clips is replaced with the copy-paste logic of text editing: users highlight lines from the transcript, and the video of the individual saying those lines is inserted into the timeline. The Text-Based Editor also scans for pauses and other long silences and displays them as ellipses within the transcript, allowing users to right-click and remove these "annoying pauses" rather than having to identify, cut, and delete them from the timeline itself (see the sketch following this section).

The Text-Based Editor marks a quietly radical shift in the workflow of video editing, a workflow that has mostly retained the same clip bin/timeline interface since the launch of the first digital nonlinear editing programs in the 1980s (see Swerzenski). And there is significant evidence that this shift might soon become more pronounced. An earlier iteration of the Text-Based Editor, released under the name Project Blink, analyzed video clips not just for language but for a range of audio and visual content. Users could search the transcript for specific objects in the frame, non-speech-related sounds, or emotions expressed by a speaker. Whether Adobe declined to include these capabilities in Premiere because of the limitations of the technology or for other reasons is not clear. More recently, Adobe has also advertised an AI-analysis feature to "recommend b-roll clips" ("Future Vision"). The AI analysis here involves auto-transcribing the speech from a given clip, such as an interview, and from that transcript finding B-roll clips with appropriate footage to overlay atop the interview A-roll. More than likely, this function will eventually connect to Adobe's Stock platform, such that, with a subscription, users can have Premiere automatically insert relevant B-roll atop their footage by selecting (or, more probably, purchasing) clips from the Adobe Stock video library. These announced features represent just the first wave of many more functions to be built out from the Text-Based Editor.

So what are the implications of these new AI-powered video editing tools, and how might they alter the practice of video editing? Most immediately, they will likely affect the employment prospects of a certain set of video editors. The Text-Based Editor is built not for feature film editing but for the sort of corporate and prosumer market in which Premiere has long dominated. Tasks such as editing down a Zoom call or a conference presentation can now be completed without training in video production, which may well threaten these sorts of editing jobs. Should Premiere further develop the sentiment-analysis and face-detection functionality shown in the Project Blink beta, it is not inconceivable that it could also threaten other forms of video production, such as the wedding and event video markets.
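To make the transcript-to-timeline logic described above concrete, here is a minimal sketch of the underlying data model: a hypothetical reconstruction by this editor, not Adobe's code. Highlighting transcript lines yields in/out points on the source footage (an edit decision list rather than a new text document), and the "annoying pauses" are simply gaps between consecutive timed sentences.

```python
# Each transcript sentence carries in/out timecodes on the source clip.
transcript = [
    ("Thanks for joining me today.", 0.0, 2.1),
    ("So, tell me about the project.", 4.8, 7.0),
    ("It started as a student film.", 7.4, 9.6),
]

def find_pauses(segments, threshold=1.0):
    """Gaps between consecutive sentences longer than `threshold` seconds:
    the silences the Text-Based Editor renders as removable ellipses."""
    return [
        (prev_end, next_start)
        for (_, _, prev_end), (_, next_start, _) in zip(segments, segments[1:])
        if next_start - prev_end > threshold
    ]

def assemble(segments, keep):
    """Selecting transcript lines by index yields an edit decision list of
    (in, out) points on the source footage, not a new text document."""
    return [(segments[i][1], segments[i][2]) for i in keep]

print(find_pauses(transcript))       # [(2.1, 4.8)]
print(assemble(transcript, [0, 2]))  # [(0.0, 2.1), (7.4, 9.6)]
```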
As with ChatGPT or Midjourney, such AI-automated outputs pose an immediate threat to human producers not because of their quality, but because they require little skill or cost to produce.

Beyond this labor threat to non-entertainment-industry editors, the Text-Based Editor also presents a more fundamental challenge to the practice of video editing. Media theorist Lev Manovich calls software like Premiere a metamedium, one that by nature flattens other mediums (photography, film, animation, print design) into digital content (101–06). What is lost when people treat all media with the same set of tools, however, are several important medium-specific practices. Tools for working with one medium simply get used on another, a maneuver Adobe Creative Suite users might recognize in the similar functionality and interface design across Premiere, Photoshop, After Effects, and Audition. This same tendency is on display in Premiere's Text-Based Editor, which treats the transcript of a video as an analogue for the visual. Absent in the conversion are the many visual nuances a video editor uses to determine where to make a cut: how not just the content of the words but the way they are spoken affects the shot, or how those "annoying pauses" might be the most compelling footage.

This potential loss of medium-specific tools speaks to a real threat of deskilling in video editing practice. Platforms such as Premiere sell themselves as vessels for one's own creativity, providing "all the tools to tell your story," as their promotional copy states ("Adobe Premiere Pro"). Yet Premiere does have an agenda of its own: an economic incentive to build dependency on its tools so that users continue to pay a monthly subscription to access them. Along with the other big-four professional editors, Premiere has typically built this dependency by offering video editors more advanced color, mixing, cutting, and multitracking options. The Text-Based Editor, however, represents a shift away from giving editors more tools and toward making sure editors edit as little as possible. In its aim to help "save time," Adobe casts editing itself as a menial task: a process of assembling footage that AI can handle for you ("Text-Based Editing").

Speaking personally as a video editor, I can attest to the utility of the Text-Based Editing tool, particularly for the sort of interview-based work described above.