Article, Open Access, Peer-Reviewed

The Allure of Artificial Worlds

2024; Queensland University of Technology; Volume: 27; Issue: 6; Language: English

DOI: 10.5204/mcj.3105

ISSN

1441-2616

Authors

Daniel Binns

Topic(s)

Space Science and Extraterrestrial Life

Abstract

Fig. 1: ‘Vapourwave Hall’, generated by the author using Leonardo.Ai, 2024.

Introduction

With generative AI (genAI) and its outputs, visual and aural cultures are grappling with new practices in storytelling, artistic expression, and meme-farming. Some artists and commentators sit firmly on the critical side of the discourse, citing valid concerns around utility, longevity, and ethics. But more spurious judgements abound, particularly when it comes to quality and artistic value. This article presents and explores AI-generated audiovisual media and AI-driven simulative systems as worlds: virtual technocultural composites, assemblages of material and meaning. In doing so, this piece seeks to consider how new genAI expressions and applications challenge traditional notions of narrative, immersion, and reality. What ‘worlds’ do these synthetic media hint at or create? And by what processes of visualisation, mediation, and aisthesis do they operate on the viewer? I suggest here that these AI worlds offer a glimpse of a future aesthetic, where the lines between authentic and artificial are blurred, and the human and the machinic are irrevocably enmeshed across society and culture. Where the uncanny is not the exception, but the rule.

Analytic Survey

The term ‘composite’ is co-opted here from Lisa Purse, whose writings have become perhaps inadvertent champions of digital augmentation and visual effects in film. The critical and academic response to AI media is not dissimilar from that to the advent of high-concept, visual effects-laden, digitally-encoded cinema. An “overdetermined nexus of loss”, Purse dubs the digital screen, “of material presence, of an indexical relation to the world and lived experience” (Purse 149). James Verdon says that there is an incontrovertible “indexical severance when pro- or a-filmic reality is recorded or manipulated digitally” (Verdon 197), and photography and cinema seemingly continue to struggle with this severance.
In terms of AI media, though, there is no harsh ‘severance’ with which to grapple; the dilemma is much more existential, in that the ‘real’ of these objects never existed. Despite their often realistic outputs, AI media still possess an eerie, uncanny quality. Some scholars suggest that the result is a kind of ‘haunted’ media:

the main thing haunting these new AI images is actually the camera itself, rendered a ghost now by its total absence from the new medium, a seemingly unnecessary anachronism, but one that nevertheless exerts a strong spectral influence on everything that is generated. (Schofield 17)

Andreas Ervik also observes this spectral influence in how generation models structure their outputs with a clear predisposition towards older forms of media. Ervik calls these images without ‘real’ referents “views of nowhere”, offering the example of “an ahistorical emulation of the general vibe of classical portrait painting” (Ervik 83-4). This uncanniness persists in abstract or glitched AI outputs as well as in those that are more realistically rendered. There is always some trace of something recognisable or tangible, be that a human feature, a graphical element, or an assortment of colours.

AI media, tools, and applications are lingering, surviving, and in the process are changing visual culture, particularly in terms of aesthetics. Shane Denson notes that AI tools and generators are “dissolving the industrial-era wedge between art and tech” (Denson 147), leading to a profound shift in aisthesis:

these ... technologies are transforming the domain of sensation itself, opening up new objects of perceptual and cognitive experience, and changing the scope and parameters of embodied relation to the environment. (Denson 147)

Denson’s general argument is that, love them or loathe them, the power of AI media lies in their visceral impact, rather than the technical accomplishment of their generation or any immanent artistic value or quality.
The outputs of these AI models are received and filtered through the body and mind; they are thus triggers for ‘felt’ experience, where the viewer is immersed, if only momentarily, in a hallucinated reality. What kinds of felt experiences can AI media conjure, and through what mechanisms do these conjurings occur? How might these experiences change our understanding of reality and representation when we return to everyday life?

Eryk Salvaggio writes that “when we look at AI images ... we are looking at infographics about these datasets, including their categories, biases, and stereotypes” (Salvaggio, "How to Read" 87). Salvaggio’s process-driven analysis is based on his experience with Stable Diffusion, where concepts or forms that are highly represented in the model’s training data emerge more clearly in outputs than less-represented concepts. This notion remains true for most media generators, particularly in the realm of image and video. In his creative work, too, Salvaggio finds that more provocative results come from pushing at the less-represented, at the gaps in the model’s defined ‘vision’ or ‘understanding’ of the world:

AI models, when used as intended, don’t move us away from the bias of human vision, it constrains us to that bias. This bias is infused into training data, a bias that merges images into the categories of their descriptions, reconstructing links between words and what they represent. (Salvaggio, "Moth")

Despite the constraints of a bias towards the human, many models have hallucination purposely built in. In these cases, hallucination is a feature rather than a bug. It is hallucination—the introduction of some chaos and randomness into the maths underpinning the mechanisms of generation—that may give that uncanny feeling of human connection when conversing with ChatGPT, but that also causes glitches and unexpected mutations in outputs from media generators.
But these glitches, too, are what artists and creatives often gravitate towards in working with generative AI. As Nataliia Laba notes, in engaging with the machinic, one positions oneself as an agent with multiplicities:

as a promptor, I assume the role of a social actor engaging in the co-creation of visual outputs alongside [Midjourney’s] bot on Discord. As a researcher, I operate on the premise that AI-generated images are not straightforward extrapolations from existing generative AI technology, but are to be understood as the contingent outcome of a series of social, political, and organizational factors. (Laba 10)

Echoing earlier work around algorithmic agency, AI generators and their human users sit at the nexus of technology, mythology, and representation (boyd and Crawford 663). The results of this enmeshment are AI-generated media. The moment of generation, thus, “fixes a unity from scattered data elements, at that same moment fabulating new connections and traits, forging attributes that will attach to other beings in the future” (Amoore 102-3). AI media are not just fixed unities or instances, the results of algorithmic and mathematical operations, but also complex, networked assemblages enacting particular effects. They are, essentially, worlds unto themselves. This is why AI media are so rapidly provoking and affecting visual culture and the media landscape more broadly. Thus, more nuanced analysis is required of their origin and creation, their forms and qualities, and their potential or actual effects.

This article looks directly at AI media, and how they are being reconstituted into media artefacts—specifically short films—for consumption, enjoyment, and provocation. Beyond this, I engage with ‘simulative AI’, or rather, AI-driven simulations, and where and how these might attract and engage users. With both films and simulative experiences, I observe how these media have innate agency and power to influence us viscerally and psychologically.
They seduce us with their aesthetic and material qualities: that these qualities are synthetic is also, paradoxically, a part of their charm.

Two Video Works

AI systems like RunwayML and Pika generate moving images by adding a “video-aware temporal backbone” to the process (Blattmann et al. 3). These systems are becoming increasingly adept at producing videos of some length, though their outputs remain susceptible to hallucinations and glitching. For many creators, this is precisely where the ‘point’ of AI media lies.

Fig. 2: Screen capture from “You Are, Unfortunately, an A.I. Artist” (Mind Wank, 2024).

“You Are, Unfortunately, an A.I. Artist” (Mind Wank, 2024) interrogates the legitimacy and artistic value of AI media. Its main character, a small, fluffy, bug-eyed creature in spectacles, toils at their computer, generating endless outputs to edit into small clips to sell via the blockchain. The editing is judicious, avoiding the worst of the morphing or glitching, though some remains, lending an unnatural taint to the creature’s movements and its environment. We observe the pitiful protagonist as though through rippling water or oil, a shifting and dynamic carnival-mirror visual filter. The robotic narration does little to allay this feeling of unease, addressing the viewer directly per the work’s title: “You are trying to create something new, something meaningful.” The critical moment of the piece comes when the Wi-Fi connection cuts out: “Without Internet, are you really an AI artist? Or are you just some guy? Obviously you have no technical skills, or you would’ve done something else.” The central creature retains its general form, as a fluffy figure, though as it ‘grows’ through the film, it changes somewhat, with different body structures, and occasionally featuring human-like hands. This is likely a result of the AI model filling in some visual information with its best guess.
The effect of this, though, works for the narrative; the creature is unstable as a character, but also as perceived by the viewer. Visuals, narration, music—all entirely AI-generated—here combine in a multi-valent experience that is both unnatural and universal.

Fig. 3: Screen capture from “Midnight Odyssey” (Ethereal Gwirl and LeMoon, 2024).

“Midnight Odyssey” (Ethereal Gwirl and LeMoon, 2024) is a short fantasy film telling the story of a princess born into a world without sunlight. The princess must confront the moon witch Lune, who presents her with three challenges to restore the sun. The creators spent two months crafting the ambitious ten-minute work, making use of tools like Midjourney, Suno, Eleven Labs, and Topaz. The result is an intriguing blend of conventional filmic storytelling and technology-driven hallucination. The aesthetic is dynamic, with the creators clearly prioritising an engaging and varied colour palette, and contrasting lighting elements within each shot. Crystalline rays illuminate the moon witch’s chamber, for instance, bouncing off her glittering gown; this contrasts with the neon-drenched and spot-lit action of the track featured in the racing challenge. The elements—both narrative and visual—that “Midnight Odyssey” presents are on the one hand conventional, but on the other ever-so-slightly askew. The creators have adhered to a ‘handmade’ visual style, with characters and environments appearing material and tangible, almost like puppets on a miniature stage, or stop-motion animation. This justifies or ‘grounds’ some of the quirkier, more glitchy moments per the logic of AI hallucination: one example is the sequence in which the protagonist has psychedelic visions after an encounter with a talking frog, a sequence seemingly designed with AI errors in mind. The overall piece, too, alludes to earlier media forms through its colour grading, bleached film treatment, and even its 4:3 aspect ratio.

Fig. 4: Screen capture from “Midnight Odyssey” (Ethereal Gwirl and LeMoon, 2024).

These two video works exemplify both the strengths and weaknesses of AI as a tool of storytelling and visual representation. The tools can be used to generate a very particular kind of aesthetic; both projects are highly stylised and visually dynamic. Character consistency remains a shortcoming for some AI media generators, and this is apparent in a few scenes from both videos. Similarly, there are significant issues with aesthetic continuity, and there is little to no synchronisation of speech with mouth movement. The generated media rely heavily on static shots, and on existing conventions of visual storytelling in their blocking and framing of characters, and presentation of settings and environments. This is reflective of the AI tools’ training data, and how representative these data are of tried and tested creative and cultural approaches; the AI’s latent space remixes and reconstitutes these data into composite spaces, novel, unique worlds that are eerily familiar. Per Ervik and Schofield, these worlds are ‘haunted’ by older media forms.

These features lend an artificial but hand-crafted character to both videos. These images and videos feel materially attractive: as if it would not take much to reach out and touch the screen, to feel the texture of the characters or the setting. This is an example of what Denson describes as “the body subperceptually filtering the extraperceptual stimulus of AI art”. Denson borrows Merleau-Ponty’s term ‘diaphragm’, referring to what it is in a human’s pre-cognitive perception that must react to what is in front of it; he also models AI generators’ processing of inputs and outputs “prior to and without regard for any integral conception of subjective or objective form” as its own kind of diaphragm (Denson 154).
It is these duelling (or duetting) diaphragms that, I suggest, offer us a shortcut into the worlds AI media present to us, as well as some explanation as to how such transport is possible. As Denson notes, we respond to these media viscerally, corporeally, before we respond cognitively; the bizarre, uncanny, yet inviting tactility of the AI aesthetic is at work on us before we know it.

Like all generative AI tools, video generators will continue to improve, in terms of realism, generation length, and integration with other apps. But where is generative AI going more broadly? Not necessarily in terms of the killer app, but in terms of creative or immersive possibilities? The answer may lie not with higher-resolution image outputs, or more contiguous and plausible AI video, but rather in simulation.

Simulative AI

In 2023, tech start-up Fable Studio released several short Web videos featuring characters from the long-running comedy show South Park. They were all generated using Fable’s custom SHOW-1 system, which integrates various custom large language models and image generators. OpenAI’s GPT-4 had already been trained on South Park episodes and metadata, but the Fable team supplemented these data with extra information around decision-making, creativity, and character information. Specific details of the SHOW-1 interface are not presented in the Fable working paper, but the team stress the maintenance of human input at key parts of the process. Underlying SHOW-1 is a ‘simulation’, where characters exist in a virtual world, seeking to satisfy their encoded needs based on “affordance providers”. Interactions between characters and providers are recorded by the system, which then generates “Reveries: reflections on each event and their day as a whole” (Maas et al. 7). If all of this sounds a little familiar, Westworld is frequently mentioned as a conceptual inspiration by the Fable team, and as a logical apocalyptic endpoint by critics (Takahashi).
The endgame of SHOW-1 is a project called Showrunner, which would allow creators and audiences to track their favourite characters and create consumable content based on their experiences and interactions. This content could take the form of social media posts or more conventional TV-like ‘episodes’: a kind of artificial/virtual reality TV.

Fig. 5: Screen capture from “Exit Valley”, Episode 1 (Showrunner, 2024).

It is important to note that the development of multi-agent simulations did not begin with Fable Studio; previous work in this area includes a social AI sandbox experiment from Stanford (Waite; Park et al.) and articulations of videogame AI systems (McCoy et al.). Games have proven to be remarkable testing grounds for AI both generative and not, with various approaches employed to influence and govern the behaviour of non-player characters, the direction of narratives, and broader structures like environments and ecosystems.

In this vein, the tech company Altera launched Project Sid in 2024. Project Sid is a virtual biome that resembles sandbox games like Minecraft. The biome is populated with hundreds of virtual agents, programmed to pursue their needs similarly to the agents in SHOW-1. Rather than entertainment, necessarily, Altera is using Project Sid to observe how individual agents and agent populations behave over time (Altera, “Project Sid”). The goal is to use these observations to build next-generation AI/ML products that require less human input as they ‘learn’ and develop over time, and then, potentially, become virtual human assistants (Altera, “Building Digital Humans”). There are also broader applications for these kinds of AI-driven simulations: to gain deeper understandings of how humans and other species behave in response to certain stimuli, how they form communities, and how those communities behave, rupture, and break down.
In the examples of Showrunner and Project Sid, AI is being employed in the creation of virtual worlds, within which stories might naturally evolve. The visual style of these worlds is almost irrelevant; indeed, as noted, Project Sid is a low-resolution, low-poly environment in the vein of Roblox or Minecraft. What builds the sense of ‘worldness’ in AI-driven simulations (or simulative AI) are the systems at work: be it the needs or behaviours of particular agents (or characters) or groups, or those governing the environment itself, like virtual physics, geology, or weather. These are systems that affect the viewer or user: one is drawn in to observe interactions and conflicts, or one becomes an active participant and subject, doing one’s own interacting, and being affected by the broader environmental systems at play. As with media generators, there may be unexpected phenomena: a miscommunication, an action that is inconsistent or surprising, an unforeseen effect or change. But these simulations are built to absorb these glitches: in the cases of SHOW-1 and Showrunner, these aberrant behaviours may become character or story arcs in and of themselves; with Project Sid, such phenomena may enable observation or analysis of how communities react in real-world situations. These artificial worlds have the capacity to generate—so to speak—very tangible consequences.

Conclusion

The term ‘artificial’ is often used pejoratively: something seems fake, shallow, superficial, and therefore has a manufactured, confected ‘gloss’. However, it can also be used accurately in describing something intentional and crafted. While some AI media can provoke intense negative reactions, that same provocative power can also seduce.
Perfectly-arranged AI worlds offer up a vision of precision: a crafted reality that embodies the ideal version of what the user desires, be it in terms of low-stakes social or romantic interactions, influence over virtual communities, or perfectly-rendered bodies or spaces. These worlds are a hyperreality: the artificial becomes accessible and tangible. Rather than destroying this contiguous sense of ‘worldness’, the glitches or errors typical of AI systems may sometimes enhance this immersive hyperrealism. In a sense, these faults and imperfections become a machinic maker’s mark: we sympathise as fallible creators ourselves. These errors, too, can sometimes create unexpected, surprising, even humorous, outcomes, be it a whimsical background character, a nonsensical environmental element, or a baffling encounter with an AI agent. Again, these glitches tend to make these worlds more inviting, enticing, and fun to explore.

Like some videogame or cinematic spaces, AI worlds can be seductive because they offer the promise of controlled fantasy, of escape with boundaries. These spaces may be governed by rules encoded to resemble our actual world; the user is then free to inhabit, influence, or interfere as much as they wish. The tension between the deliberately arranged and the imperfect and unexpected offers a fluid, dynamic experience that one may find addictive. While interactions with or between artificial agents may be simulated, users may feel a sense of investment or attachment, making the emotional stakes or effects very real. Despite the very real and valid ethical concerns around the data on which AI systems are trained, it seems that AI media, simulative AI systems, and other AI-driven tools are not just lingering, but becoming embedded and enmeshed across both individual practices and entire industries.
Much is being made of AI media’s negative qualities and connotations, but this article has explored how their fantastical visions and interactions are just as intriguing, and perhaps even dangerous in their own way. AI worlds offer a sense of control and fulfilment, that notion of bounded fantasy, of ‘safe’ escape. As we find new ways to integrate these synthetic media and simulative systems with our lives, we will doubtless find new points of blurring and friction between the authentic and the artificial. With its quirky blend of the banal and the bonkers, the AI aesthetic of the future will become the now. The uncanny may appear so but no longer feel so; when that happens, who would honestly choose to return to, or fully embody, the desert of the completely real?

References

Altera. “Building Digital Humans: Shaping the Future of Human-Machine Interaction.” Substack newsletter. Altera’s Substack, 8 May 2024. 12 Sep. 2024 <https://digitalhumanity.substack.com/p/building-digital-humans>.

———. “Project Sid.” Substack newsletter. Altera’s Substack, 4 Sep. 2024. 12 Sep. 2024 <https://digitalhumanity.substack.com/p/project-sid>.

Amoore, Louise. Cloud Ethics: Algorithms and the Attributes of Ourselves and Others. Durham: Duke UP, 2020.

Blattmann, Andreas, et al. “Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models.” arXiv, 27 Dec. 2023. 14 Mar. 2024 <http://arxiv.org/abs/2304.08818>.

boyd, danah, and Kate Crawford. “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.” Information, Communication & Society 15.5 (2012): 662–679.

Denson, Shane. “From Sublime Awe to Abject Cringe: On the Embodied Processing of AI Art.” Journal of Visual Culture 22.2 (2023): 146–175.

Ervik, Andreas. “The Work of Art in the Age of Multiverse Meme Generativity.” Media Theory 7.2 (2023): 77–102.

Ethereal Gwirl, and LeMoon. Midnight Odyssey. 2024. 3 Sep. 2024 <https://www.youtube.com/watch?v=sQXpVyXbjY4>.

Laba, Nataliia.
“Engine for the Imagination? Visual Generative Media and the Issue of Representation.” Media, Culture & Society (2024): 01634437241259950.

Maas, Philipp, et al. “To Infinity and Beyond: SHOW-1 and Showrunner Agents in Multi-Agent Simulations.” 24 Jul. 2023 <https://fablestudio.github.io/showrunner-agents/>.

McCoy, Josh, Michael Mateas, and Noah Wardrip-Fruin. “Comme il Faut: A System for Simulating Social Games between Autonomous Characters.” After Media: Embodiment and Context. Irvine: Digital Arts and Culture, University of California, 2009.

Mind Wank. You Are, Unfortunately, an A.I. Artist. 2024. 3 Sep. 2024 <https://www.youtube.com/watch?v=I2A1TwYlT5g>.

Park, Joon Sung, et al. “Generative Agents: Interactive Simulacra of Human Behavior.” arXiv, 5 Aug. 2023. 12 Sep. 2024 <http://arxiv.org/abs/2304.03442>.

Purse, Lisa. “Layered Encounters: Mainstream Cinema and the Disaggregate Digital Composite.” Film-Philosophy 22.2 (2018): 148–167.

Salvaggio, Eryk. “How to Read an AI Image: Toward a Media Studies Methodology for the Analysis of Synthetic Images.” IMAGE 37.1 (2023): 83–99.

———. “Moth Glitch.” Cybernetic Forests, 10 Aug. 2024. 2 Sep. 2024 <https://www.cyberneticforests.com/news/moth-glith-2024>.

Schofield, Michael Peter. “Camera Phantasma: Reframing Virtual Photographies in the Age of AI.” Convergence: The International Journal of Research into New Media Technologies (2023): 13548565231220314.

Showrunner. “Exit Valley.” Showrunner, n.d. 12 Sep. 2024 <https://www.showrunner.xyz/exitvalley>.

Takahashi, Dean. “The Simulation by Fable Open Sources AI Tool to Power Westworlds of the Future.” VentureBeat, 20 Dec. 2023. 10 Sep. 2024 <https://venturebeat.com/ai/the-simulation-by-fable-open-sources-ai-tool-to-power-westworlds-of-the-future/>.

Verdon, James. “Indexicality or Technological Intermediate? Moving Image Representation, Materiality, and the Real.” Acta Universitatis Sapientiae, Film and Media Studies 12.1 (2016): 191–209.

Waite, Thom. “Inside Smallville, the Wholesome Village Populated Solely by AIs.” Dazed, 12 Apr. 2023. 12 Sep. 2024 <https://www.dazeddigital.com/life-culture/article/59633/1/smallville-inside-the-wholesome-village-populated-solely-by-ai-experiment>.
