The Actotron
2024; Queensland University of Technology; Volume 27; Issue 6; Language: English
DOI: 10.5204/mcj.3118
ISSN: 1441-2616
Authors: Justin Matthews, Angelique Nairn
Introduction – The Advent of the Actotron

Imagine a movie production where leading actors are not bound by human limitations, and digital entities render every emotion, movement, and line with breathtaking precision. This is no longer merely a conceptual idea: it is becoming increasingly possible as artificial intelligence (AI) is integrated into screen production activities. Essentially, we are at the dawn of the Actotron era. These advanced virtual actors, equipped with artificial intelligence, could transform not just how movies are made, but who makes them and what stories they tell. The Actotron promises to redefine the creative landscape, challenging our perceptions of artistry and authenticity in the digital age.

The potential of the Actotron marks a milestone at the intersection of artificial intelligence, performance, and technology. This virtual human represents both a technological leap and a cultural shift that may revolutionise entertainment globally. Synthesising advancements in AI, motion capture, and voice synthesis, the Actotron enables autonomous performance, raising questions about creativity, copyright law, and the ethics of digital personalities. Its capability for real-time learning and interaction pushes boundaries beyond CGI and deepfakes. Driven by AI algorithms and real-time graphics, the Actotron simulates nuanced human emotions, allowing dynamic interaction with human actors in media. Using future studies, we consider the potential emergence of the Actotron as the next step in digital actors and the place of artificial intelligence in the screen production industry.

Method: Future Studies and Futurecasting

To explore the potential and implications of the Actotron, this article employs methodologies from Future Studies and Futurecasting. These approaches suit an assessment of the Actotron because they focus on creating plausible scenarios that envision future technological and societal shifts (Brown). Future Studies, as outlined by Miller, provides a structured way to consider potential outcomes and how current trends might evolve, utilising the "possibility-space" approach to explore future scenarios (Miller, "Futures"). This method allows us to escape the constraints of conventional forecasting, which relies heavily on past trends and thereby limits creative exploration of more impactful future scenarios.

Exploring the Actotron's impact within a non-ergodic context—where historical precedents do not dictate future results—is useful. Miller explains that in unpredictable environments, traditional forecasting methods falter because they cannot accommodate radical changes and emergent patterns (Miller, "From Trends"). This insight is vital for navigating uncertainties and recognising that the past may not be a reliable guide to future developments. Understanding this is critical for assessing how technologies like the Actotron could reshape media and entertainment, fostering a more adaptable approach to future possibilities.

Futurecasting, as elaborated by Steve Brown, involves modelling future possibilities not to predict changes definitively but to prepare strategically for potential new realities. This approach aligns with the innovative essence of the Actotron—aimed at transforming performance landscapes and interactive experiences by anticipating shifts in technology and audience engagement dynamics. Miller highlights the critical role of anticipation in shaping decisions, emphasising its impact on developing technologies like the Actotron ("Futures").
By transitioning from trend-based forecasting to futures literacy, we can explore a wider array of possibilities beyond traditional prediction methods. By integrating Future Studies and Futurecasting and applying insights from the non-ergodic context and the possibility-space approach, this analysis does not merely predict but prepares strategically, providing a robust framework for understanding the societal impacts of technologies like the Actotron.

CGI, Deepfakes, and Digital Actors

The inception of Computer-Generated Imagery (CGI) revolutionised visual storytelling in cinema. Starting with simple wire-frame graphics in the 1970s, exemplified by Westworld (1973), CGI evolved into today's complex imagery. The 1980s and 1990s saw landmark films like Tron (1982), Terminator 2: Judgment Day (1991), and Jurassic Park (1993), demonstrating CGI's potential to create realistic environments and characters that enhanced narrative depth (Das).

In the late 1990s and early 2000s, digital actors or "synthespians" emerged. Films like Final Fantasy: The Spirits Within (2001) and The Polar Express (2004) used full CGI and motion capture technologies to create human-like characters. Advances in motion capture, which translates human actions into digital models, were critical in developing digital actors that convincingly emulate real human emotions and interact with live actors on screen (Gratch et al.).

Building on earlier developments, this period saw significant advancements in digital doubles: highly realistic digital replicas of actors created using motion capture and digital modelling techniques. This progress was exemplified by The Matrix Reloaded (2003) and The Curious Case of Benjamin Button (2008), which leveraged sophisticated motion capture to create detailed digital replicas of actors, refining digital doubles in mainstream cinema (Deguzman). Characters like Gollum from the Lord of the Rings trilogy showcased this technology's peak by combining motion capture with digital modelling to perform complex emotional roles alongside live actors (Patterson).

Alongside these developments came the exploration of Autonomous Digital Actors (ADAs), integral to virtual actors and interactive media and extensively documented in research. ADAs represent significant advancements in digital media and interactive entertainment, offering novel methods for creating and animating 3D characters (Perlin and Seidman). These virtual actors can perform complex scenes autonomously, using procedural animation to respond to dynamic directions without pre-scripted motions, enriching interaction and storytelling (Iurgel, da Silva, and dos Santos). This technology allowed for cost-effective and versatile character animation, potentially transforming industries from gaming to educational software by enabling more nuanced and emotionally responsive character interactions.

From 2017 onwards, deepfake technology captured public attention for convincingly—if controversially—manipulating video and audio, serving as both a precursor and a foundational element for more sophisticated digital actors (Sample). Originally, deepfake technology focussed on manipulating video and audio recordings. Utilising machine learning and sophisticated algorithms, deepfakes could alter facial expressions, sync lips, or replace faces entirely (Pavis 976). This required understanding the video's three-dimensional space to apply realistic modifications, conducted during lengthy post-production workflows involving multiple VFX artists.
In The Book of Boba Fett (2021), deepfake technology enabled the realistic portrayal of a youthful Mark Hamill as Luke Skywalker. The technique merged over 80 shots of deepfakes, CG heads, a body double, and Hamill's own performance to seamlessly depict his younger self (Bacon; Industrial Light & Magic).

From pioneering CGI in the 1970s to sophisticated digital doubles in the early 2000s, the trajectory of visual storytelling has led to the advent of the Actotron. This technology has become a mainstay in visual effects and digital character generation, offering means to modify appearance and age, or to enable actors to play different characters within a production (Xu 24). Synthesising these advancements through futurecasting, we consider the Actotron a virtual human tool that democratises filmmaking. To understand how this future operates, we turn to the fictitious but possible scenario of Alex, an imaginative director who harnesses the Actotron to bring cinematic visions to life.

The Actotron Scenario

Imagine a near future where film production has been revolutionised and democratised by the advent of Actotron technology—an advanced form of virtual human capable of comprehensive autonomous performance. We follow a day in the life of Alex, an aspiring young director with a passion for storytelling and a flair for technology.

Alex's day begins in the quiet of her home studio, illuminated by the glow of dual screens. Today, Alex will create the lead character for an upcoming short film. Opening a sophisticated software portal, Alex interacts with a generative AI engine designed to craft an Actotron. Alex inputs desired traits and styles—courageous, empathetic, with a hint of mystery. The artificial intelligence proposes several faces; Alex selects one with captivating eyes and a resolute expression. Next, she sculpts the body—athletic and poised for action. Alex then tests different voice samples presented by the AI, blending them to forge a unique voice that mirrors her character's essence—a calming tone with a resilient undertone.

With the character finalised, Alex uploads the script. The Actotron, "Kai", analyses it, intelligently querying to grasp the character's motivations fully. Content with Kai's comprehension, Alex moves to the virtual set. Alex commands, "Action!" and Kai begins the scene. Observing how Kai's expressions shift authentically with each line, Alex notes the performance. After a take, Alex suggests prompt changes—"Let's try it with more surprise on discovering the clue"—and Kai adapts seamlessly. This process repeats, with Alex refining Kai's performance until it aligns with her vision.

As the day progresses, Alex introduces more Actotrons into different scenes. She directs interactions between Kai and other virtual actors, creating complex, dynamic exchanges that would be costly and challenging to shoot in a traditional setting. By dusk, Alex reviews the day's footage—digital dailies that can be edited or re-shot, as needed, by discussing them with the Actotron. The flexibility is exhilarating; changes that once would have taken days now happen in minutes.

Reflecting on the day, Alex sees the transformative power of Actotron technology as a revolution in filmmaking that democratises cinema. Alex appreciates a future where directors can quickly bring visions to life, making filmmaking accessible, commonplace, and diverse, and showcasing the Actotron's potential to redefine storytelling and innovate production.
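To ground the scenario, the sketch below imagines Alex's session as code. It is entirely speculative: no Actotron software exists, and every name in it (ActotronSpec, load_script, perform, the script filename) is a hypothetical stand-in for the portal the scenario describes, not a real API.

```python
# A speculative sketch of the Alex scenario as a directorial loop.
# All classes, methods, and names here are hypothetical illustrations.

from dataclasses import dataclass, field


@dataclass
class ActotronSpec:
    """Traits a director supplies before generation, as in the scenario."""
    traits: list[str]            # e.g. "courageous", "empathetic"
    body_type: str = "athletic"
    voice_blend: dict[str, float] = field(default_factory=dict)  # sample -> weight


class Actotron:
    """Stand-in for the generated virtual actor ('Kai' in the scenario)."""

    def __init__(self, spec: ActotronSpec) -> None:
        self.spec = spec
        self.script: str | None = None

    def load_script(self, script: str) -> list[str]:
        """Analyse the script and return clarifying questions about motivation."""
        self.script = script
        return [f"What does the character want most in {script!r}?"]

    def perform(self, scene: str, direction: str | None = None) -> str:
        """Render a take; an optional prompt ('more surprise') adjusts delivery."""
        note = f" ({direction})" if direction else ""
        return f"Take of {scene}{note} using traits {self.spec.traits}"


# Usage mirroring Alex's session: create, query, direct, then re-direct.
spec = ActotronSpec(
    traits=["courageous", "empathetic", "a hint of mystery"],
    voice_blend={"calming_sample": 0.6, "resilient_sample": 0.4},
)
kai = Actotron(spec)
print(kai.load_script("short_film_script.txt"))
print(kai.perform("scene 3"))
print(kai.perform("scene 3", direction="more surprise on discovering the clue"))
```

The point of the sketch is the cost model the scenario implies: each retake becomes a cheap call with a new direction prompt rather than a scheduled reshoot.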
The Concept of Actotrons as Digital Actors

Building on these technological advancements, the Actotron is the next step in virtual actors. Unlike predecessors relying on predefined scripts and animations, Actotrons use a modular system combining human appearance and behaviour to create fully customisable, interactive characters, simplifying creation and increasing accessibility. Historically, developing virtual humans was a multidisciplinary challenge integrating complex components like natural language processing, emotional modelling, graphics, and animation (Gratch et al.). Early efforts struggled to achieve believable human-like behaviour because their disparate technologies were not designed to work together.

Actotrons depart from traditional CGI and deepfake technologies by embracing a modular construction philosophy, revolutionising virtual human creation. This approach offers unprecedented customisation and flexibility, enabling creators to assemble bespoke digital personas for specific needs. Central to Actotron technology is a component-based architecture with interchangeable modules covering human attributes:

Visual Appearance: modules for facial features, skin tones, and body shapes enable diverse identities, from unique characters to archetypes.

Vocal Characteristics: various voice modulations, accents, and language fluencies for role-specific needs.

Kinetic Abilities: motion capture libraries provide diverse movements and gestures, enabling realistic performances from athletic feats to nuanced expressions.

AI-Driven Encapsulation and Integration

What fundamentally distinguishes the Actotron from its predecessors is the sophisticated AI that seamlessly encapsulates and orchestrates these components into a coherent entity. Actotron technology uses AI to dynamically synchronise models, movements, and expressions in real time, something that remains difficult today (Gratch et al.). This encapsulation into an "AI-entity" via plug-and-play components dynamically integrates multiple inputs, ensuring that the Actotron's movements, voice, and emotional expressions are perfectly synchronised and respond in real time to situational changes. This advanced capability enhances the Actotron's realism and allows instant adaptation to directorial inputs or script changes, offering interactivity unmatched by traditional virtual human technology.

Integrating generative AI—like that developed by Google and NVIDIA—into Actotron technology allows this sophisticated level of dynamic interaction. For example, NVIDIA's development of digital humans interacting in real time shows that AI-driven systems can handle complex inputs and generate lifelike responses (Burnes, "NVIDIA & Developers"). Moreover, these AI systems' ability to simulate detailed human emotions is enhanced by leading GPT chat technology, as seen in Unreal Engine's real-time digital human rendering (Burnes, "NVIDIA Digital Human"). This technology captures subtle human nuances, enabling AI to produce characters that both mimic basic actions and convey deep emotional expressions.

Modern crowd simulation tools such as the HiDAC (High-Density Autonomous Crowds) system further demonstrate advancements in creating lifelike digital behaviours. Recent enhancements include the integration of human personality models, notably the OCEAN framework—Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism—to improve the authenticity and diversity of virtual agents' behaviours.
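As a loose illustration of how such a personality model can drive behaviour, the sketch below maps the five OCEAN traits onto a handful of low-level agent parameters. The parameter names and weightings are assumptions invented for this illustration; they are not the actual HiDAC parameters or Pelechano's published mapping.

```python
# Illustrative mapping from OCEAN personality traits to agent parameters.
# Parameter names and coefficients are invented for this sketch, not HiDAC's.

from dataclasses import dataclass


@dataclass
class Ocean:
    """Big Five traits, each normalised to [0, 1]."""
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float


def to_agent_params(p: Ocean) -> dict[str, float]:
    """Derive low-level behaviour from a high-level personality profile.

    Extraverted agents move faster and tolerate closer contact; agreeable
    agents push less in a crowd; neurotic agents startle at lower thresholds.
    """
    return {
        "walking_speed":   0.8 + 0.6 * p.extraversion,   # metres per second
        "personal_space":  0.6 - 0.3 * p.extraversion,   # metres
        "pushing_force":   0.5 * (1.0 - p.agreeableness),
        "panic_threshold": 0.9 - 0.6 * p.neuroticism,
        "path_variety":    p.openness,                   # willingness to explore
        "queue_patience":  p.conscientiousness,
    }


# Two contrasting characters produce visibly different behaviour profiles.
outgoing_lead = Ocean(0.7, 0.8, 0.9, 0.6, 0.2)
anxious_extra = Ocean(0.3, 0.4, 0.2, 0.5, 0.8)
print(to_agent_params(outgoing_lead))
print(to_agent_params(anxious_extra))
```

The architectural point is that one compact personality profile can coordinate many low-level behaviours at once, which is what makes such models attractive for Actotron-style characters.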
Research indicates that mapping OCEAN traits to agent parameters in HiDAC can generate nuanced crowd dynamics, enhancing the realism of character simulations (Pelechano). Incorporating these personality models into Actotron performances could yield virtual characters with more complex and varied behaviours, enriching the landscape of digital storytelling. However, this development also prompts ethical questions about designing virtual entities with human-like psychological profiles, suggesting a need for further exploration of their societal impact.

A key feature of the modularised aspects encapsulated as an AI entity is the ability to learn and evolve from processed data, akin to the generative AI used by Google to drive hardware robots (Vanhoucke). In Actotrons, however, this AI entity does not control a physical robot but drives a digital entity that can interact within any narrative framework created (Vanhoucke; NVIDIA Developer). Its ability to integrate and synthesise human-like attributes from data makes Actotrons versatile and capable of diverse performances without needing a human actor behind the scenes. Whether adjusting tone of voice to match an emotional setting or altering physical responses to script changes, the generative AI-entity in Actotron technology handles it seamlessly, pushing the boundaries of digital storytelling (NVIDIA Developer).

Generative AI models, such as those discussed by Vincent Vanhoucke of Google DeepMind, adeptly process vast amounts of data and learn to improve over time (NVIDIA Developer). Such AI could analyse feedback from Actotron performances to refine actions and expressions, ensuring each iteration is more nuanced than the last. These advancements highlight AI's transformative impact on digital acting, where Actotrons equipped with such technologies will set new standards for virtual performances. An Actotron will combine various human traits into a single, versatile model that can perform dynamically and respond in real time (see Table 1).

Capability          Description
Motion Capture      Captures subtle human movements for realistic animations.
3D Modelling        Provides detailed body shapes for diverse appearances.
Facial Animation    Creates expressions and emotions with high-fidelity models.
Voice Synthesis     Generates lifelike speech patterns.

Tab. 1: Capabilities of Actotron Technology.

Implications and Considerations of Actotron Tech

Actotrons, synthesised using advanced AI algorithms, represent a revolutionary step in digital actor technology. These self-contained, autonomous digital actors could interact within any virtual environment, delivering dynamic, context-aware performances directed by digital creators. This would mark a significant departure from the static manipulations of earlier technologies like deepfakes and traditional CGI, which are currently pre-rendered, allowing Actotrons to redefine traditional roles in cinema, gaming, and virtual reality by operating dynamically and in real time, like a real actor.

The modular design of Actotrons could offer unmatched flexibility, enabling directors to adapt these virtual actors for various roles across different media without starting from scratch for each project. This reduces production costs and development time and enables rapid adjustments to feedback, enhancing responsiveness in environments where changes are costly and time-consuming (Pulliam-Moore).
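A minimal sketch of that plug-and-play reuse follows, assuming purely hypothetical module names and fields; nothing here corresponds to an existing product or API.

```python
# Illustrative plug-and-play modularity: an Actotron assembled from
# interchangeable modules, recast by swapping one module at a time.
# All module names and fields are hypothetical.

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Appearance:
    face: str
    body: str


@dataclass(frozen=True)
class Voice:
    timbre: str
    accent: str


@dataclass(frozen=True)
class MotionLibrary:
    styles: tuple[str, ...]   # e.g. stunt work, dance, subtle gesture


@dataclass(frozen=True)
class ActotronBuild:
    appearance: Appearance
    voice: Voice
    motion: MotionLibrary


# One build, reused across productions by swapping a single module.
kai = ActotronBuild(
    appearance=Appearance(face="resolute_01", body="athletic"),
    voice=Voice(timbre="calm-resilient", accent="neutral"),
    motion=MotionLibrary(styles=("parkour", "close-quarters")),
)

# Recast for a period drama: new accent, same face, body, and movement set.
kai_period = replace(kai, voice=Voice(timbre="calm-resilient", accent="period"))
print(kai_period)
```

Because each module is independent, recasting touches one component while the rest of the build carries over unchanged, which is where the claimed savings in cost and development time would come from.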
Additionally, by utilising generic modules that do not directly copy real individuals, Actotrons could circumvent ethical and copyright issues associated with digital likenesses (Roth). The Actotron democratises acting and performance by providing capabilities at the desktop level and on demand. By moving beyond the limitations of deepfake technology and traditional CGI, modularised Actotron technology embodies a new era in creating virtual humans, but this necessitates ongoing discussions about its ethical, legal, and social implications. AI re-creations of personalities' and celebrities' voices, likenesses, and styles—such as the AI "Drake" and "The Weeknd" track (Coscarelli), the generative AI image of Pope Francis in a puffer jacket (Huang), and the AI deepfake of Tom Hanks promoting a dental plan (Taylor)—present challenges to ethical and legal spaces that Actotron technology would only amplify. The recent dispute between studios and SAG-AFTRA over the rights to actors' digitally scanned likenesses and AI use highlights the significant tensions surrounding virtual human technology and creative performance (Pulliam-Moore).

Conclusion

Futurecasting suggests the Actotron is the next evolution of virtual actors, heralding a new era in creative industries by integrating generative AI to enhance performances. AI enables Actotrons to deliver dynamic performances, deepening engagement and expanding creativity. Lifelike animations allow complex storytelling previously unattainable due to cost or technical constraints. Economically, AI reduces reliance on human actors, cutting costs and increasing efficiency. However, this raises concerns about job displacement and challenges regarding AI's authenticity and ethics in art. Advancing AI promises innovative, interactive viewer experiences and democratises content creation, empowering untrained individuals to produce sophisticated works. This convergence will drive discussions on the future of creativity and labour in the digital age.

References

Bacon, T. "Why Luke’s CGI in Boba Fett Is So Much Better (Explained Properly)." Screenrant, 4 May 2022. <https://screenrant.com/book-boba-fett-luke-skywalker-cgi-hamill-improved-explained/>.

Brown, S. Futurecasting: A White Paper by Steve Brown, CEO of Possibility and Purpose, LLC. 26 Sep. 2024. <https://static1.squarespace.com/static/54beba03e4b0cb3353d443df/t/57c76f39ff7c50f29964a8d1/1472687934439/Futurecasting_white+paper.pdf>.

Burnes, A. "NVIDIA & Developers Pioneer Lifelike Digital Characters for Games and Applications with NVIDIA ACE." NVIDIA Blog, 8 Jan. 2024. <https://www.nvidia.com/en-us/geforce/news/nvidia-ace-architecture-ai-npc-personalities/>.

———. "NVIDIA Digital Human Technologies Bring AI Game Characters to Life." NVIDIA Blog, 19 Mar. 2024. <https://www.nvidia.com/en-us/geforce/news/nvidia-ace-gdc-gtc-2024-ai-character-game-and-app-demo-videos/>.

Coscarelli, J. "An AI Hit of Fake ‘Drake’ and ‘The Weeknd’ Rattles the Music World." The New York Times, 19 Apr. 2023. <https://www.nytimes.com/2023/04/19/arts/music/ai-drake-the-weeknd-fake.html>.

Das, S. "The Evolution of Visual Effects in Cinema: A Journey from Practical Effects to CGI." Journal of Emerging Technologies and Innovative Research 10.11 (2023): 303–309.

Deguzman, K. "What Is Mocap—The Science and Art behind Motion Capture." Studiobinder, 7 Nov. 2021. <https://www.studiobinder.com/blog/what-is-mocap-definition/>.

Gratch, J., et al. "Creating Interactive Virtual Humans: Some Assembly Required." IEEE Intelligent Systems 17.4 (2002): 54–63.
<https://doi.org/10.1109/mis.2002.1024753>.

Huang, K. "Why Pope Francis Is the Star of AI-Generated Photos." The New York Times, 8 Apr. 2023. <https://www.nytimes.com/2023/04/08/technology/ai-photos-pope-francis.html>.

Industrial Light & Magic. "Behind the Magic | The Visual Effects of The Book of Boba Fett." YouTube, 20 Aug. 2022. <https://www.youtube.com/watch?v=M74Jb8iggew>.

Iurgel, I.A., R.E. da Silva, and M.F. dos Santos. "Towards Virtual Actors for Acting Out Stories." Entertainment for Education: Digital Techniques and Systems. Lecture Notes in Computer Science 6249 (2010): 570–581.

Miller, R. "From Trends to Futures Literacy: Reclaiming the Future." Centre for Strategic Education, 2006. 1–20. <https://doi.org/10.13140/2.1.2214.4329>.

———. "Futures Studies, Scenarios, and the 'Possibility-Space' Approach." Think Scenarios, Rethink Education. OECD Publishing, 2006. 93–105. <https://doi.org/10.1787/9789264023642-7-en>.

NVIDIA Developer. "Robotics in the Age of Generative AI with Vincent Vanhoucke, Google DeepMind | NVIDIA GTC 2024." YouTube, 12 Apr. 2024. <https://www.youtube.com/watch?v=vOrhfyMe_EQ>.

Pavis, M. "Rebalancing Our Regulatory Response to Deepfakes with Performers’ Rights." Convergence: The International Journal of Research into New Media Technologies 27.4 (2021): 974–998. <https://doi.org/10.1177/13548565211033418>.

Pelechano, N. "How the Ocean Personality Model Affects the Perception of Crowds." IEEE Computer Graphics and Applications (2011).

Perlin, K., and G. Seidman. "Autonomous Digital Actors." Motion in Games: First International Workshop, MIG 2008, Utrecht, The Netherlands, 14–17 June 2008, Revised Papers. Lecture Notes in Computer Science (2008): 246–255. <https://doi.org/10.1007/978-3-540-89220-5_24>.

Pulliam-Moore, C. "SAG Strike Negotiations Have Once Again Dissolved over the Use of AI." The Verge, 8 Nov. 2023. <https://www.theverge.com/2023/11/7/23950491/sag-aftra-amptp-ai-negotiations-strike-actor-likenes>.

Roth, E. "James Earl Jones Lets AI Take Over the Voice of Darth Vader." The Verge, 25 Sep. 2022. <https://www.theverge.com/2022/9/24/23370097/darth-vader-james-earl-jones-obi-wan-kenobi-star-wars-ai-disney-lucasfilm>.

Sample, I. "What Are Deepfakes – and How Can You Spot Them?" The Guardian, 13 Jan. 2020. <https://www.theguardian.com/technology/2020/jan/13/what-are-deepfakes-and-how-can-you-spot-them>.

Taylor, D.B. "Tom Hanks Warns of Dental Ad Using A.I. Version of Him." The New York Times, 2 Oct. 2023. <https://www.nytimes.com/2023/10/02/technology/tom-hanks-ai-dental-video.html>.

Vanhoucke, V. "RT-2: New Model Translates Vision and Language into Action." Google DeepMind Blog, 28 July 2023. <https://deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/>.

Xu, K. "The Application and Challenges of Face-Swapping Technology in the Film Industry." Highlights in Business, Economics and Management 29 (2024).