DeepFake videos pose significant challenges to conventional modes of viewing. Indeed, the use of machine learning algorithms in these videos’ production complicates not only traditional forms of moving-image media but also deeply anchored phenomenological categories and structures. By paying close attention to the exchange of energies around these videos, including the consumption of energy in their production but especially the investment of energy on the part of the viewer struggling to discern the provenance and veracity of such images, we discover a mode of viewing that recalls pre-cinematic forms of fascination while relocating them in a decisively post-cinematic field. The human perceiver no longer stands clearly opposite the image object but instead interfaces with the spectacle at a pre-subjective level that approximates the nonhuman processing of visual information known as machine vision. While the depth referenced in the name “deep fake” is that of “deep learning,” the aesthetic engagement with these videos implicates an intervention in the depths of embodied sensibility—at the level of what Merleau-Ponty referred to as the “inner diaphragm” that precedes stimulus and response or the distinction of subject and intentional object. While the overt visual thematics of these videos is often highly gendered (their most prominent examples being so-called “involuntary synthetic pornography” targeting mostly women), viewers are also subject to affective syntheses and pre-subjective blurrings that, beyond the level of representation, open their bodies to fleshly “ungenderings” (Hortense Spillers) and re-typifications with far-reaching consequences for both race and gender.
Let me try to demonstrate these claims. To begin with, DeepFake videos are a species of what I have called discorrelated images, in that they trade crucially on the incommensurable scales and temporalities of computational processing, which altogether defies capture as the object of human perception (or the “fundamental correlation between noesis and noema,” as Husserl puts it). To be sure, DeepFakes, like many other forms of discorrelated images, still present something to us that is recognizable as an image. But in them, perception has become something of a by-product, a precipitate form or supplement to the invisible operations that occur in and through them. We can get a glimpse of such discorrelation by noticing how such images fail to conform or settle into stable forms or patterns, how they resist their own condensation into integral perceptual objects—for example, the way that they blur figure/ground distinctions.
The article widely credited with making the DeepFake phenomenon known to a wider public in December 2017 notes with regard to a fake porn video featuring Gal Gadot: “a box occasionally appeared around her face where the original image peeks through, and her mouth and eyes don’t quite line up to the words the actress is saying—but if you squint a little and suspend your belief, it might as well be Gadot.” There’s something telling about the formulation, which hinges the success of the DeepFake not on the suspension of disbelief—a suppression of active resistance—but on the suspension of belief—seemingly, a more casual form of affirmation—whereby the flickering reversals of figure and ground, or of subject and object, are flattened out into a smooth indifference.
In this regard, DeepFake videos are worth comparing to another type of discorrelated image: the digital lens flare, which is both to-be-looked-at (as a virtuosic display of technical achievement) and to-be-overlooked (after all, the height of its technical achievement is reached when it can appear as a transparently naturalized simulation of a physical camera’s optical properties). The tension between opacity and transparency, or objecthood and invisibility, is never fully resolved, thus undermining a clear distinction between diegetic and medial or material levels of reality. Is the virtual camera that registers the simulated lens flare to be seen as part of the world represented on screen, or as part of the machinery responsible for revealing it to us? The answer, it seems, must be both. And in this, such images embody something like what Neil Harris termed the “operational aesthetic” that characterized nineteenth-century science and technology expos, magic shows, and early cinema alike; in these contexts, spectatorial attention oscillated between the surface phenomenon, the visual spectacle of a machine or a magician in motion, and the hidden operations that made the spectacle possible.
It was such a dual or split attention that powered early film as a “cinema of attractions,” where viewers came to see the Cinématographe in action, as much as or more than they came to see images of workers leaving the factory or a train arriving at the station. And it is in light of this operational aesthetic that spectators found themselves focusing on the wind rustling in the trees or the waves lapping at the rocks—phenomena supposedly marginal to the main objects of visual interest.
DeepFakes also trade essentially on an operational aesthetic, or a dispersal of attention between visual surface and the algorithmic operation of machine learning. However, I would argue that the post-cinematic processes to whose operation DeepFakes refer our attention fundamentally transform the operational aesthetic, relocating it from the oscillations of attention that we see in the cinema to a deep, pre-attentional level that computation taps into with its microtemporal speed.
Consider the way digital glitches undo figure/ground distinctions. Whereas the cinematic image offered viewers opportunities to shift their attention from one figure to another and from these figures to the ground of the screen and projector enabling them, the digital glitch refuses to settle into the role either of figure or of ground. It is, simply, both—it stands out, figurally, as the pixelated appearance of the substratal ground itself. Even more fundamentally, though, it points to the inadequacy, which is not to say dispensability, of human perception and attention with respect to algorithmic processing. While the glitch’s visual appearance effects a deformation of the spatial categories of figure and ground, it does so on the basis of a temporal mismatch between human perception and algorithmic processing. The latter, operating at a scale measured in nanoseconds, by far outstrips the window of perception and subjectivity, so that by the time the subject shows up to perceive the glitch, the “object” (so to speak) has already acted upon our presubjective sensibilities and moved on. This is why glitches, compression artifacts, and other discorrelated images are not even bound to appear to us as visual phenomena in the first place in order to exert a material force on us. Another way to account for this is to say that the visually and subjectively delineated distinction between figure and ground itself depends on the deeper ground of presubjective embodiment, and it is the latter that defines for us our spatial situations and temporal potentialities. DeepFakes, like other discorrelated images, are able to dis-integrate coherent spatial forms so radically because they undercut the temporal window within which visual perception occurs. The operation at the heart of their operational aesthetic is itself an operationalization of the flesh, prior to its delineation into subjective and objective forms of corporeality.
The seamfulness of DeepFakes—their occasional glitchy appearance or just the threat or presentiment that they might announce themselves as such—points to our fleshly imbrication with technical images today, which is to say: to the recoding not only of aesthetic form but of embodied aesthesis itself.
In other words: especially as long as they still routinely fail to cohere as seamless suturings of viewing subjects together with visible objects, but instead retain their potential to fall apart at the seams and thus still require a suspension of belief, DeepFake videos are capable of calling attention to the ways that attention itself is bypassed, providing aesthetic form to the substratal interface between contemporary technics and embodied aesthesis. To be clear, and lest there be any mistake about it, I in no way wish to celebrate DeepFakes as a liberating media-technology, the way that the disruption of narrative by cinematic self-reflexivity was sometimes celebrated as opening a space where structuring ideologies gave way to an experience of materiality and the dissolution of the subject positions inscribed and interpellated by the apparatus. No amount of glitchy seamfulness will undo the gendered violence inflicted, mostly upon women, in involuntary synthetic pornography. Not only that, but the pleasure taken by viewers in their consumption of this violence seems to depend, at least in part, precisely on the failure or incompleteness of the spectacle: what such viewers desire is not to be tricked into actually believing that it is Gal Gadot or their ex-girlfriend that they are seeing on the screen, but precisely to know that it is a fake likeness or simulation, still open to glitches, upon which the operational aesthetic depends. Nevertheless, we should not look away from the paradoxical opening signaled by these viewers’ suspension of belief. The fact that they have to “squint a little” to complete the gendered fantasy of domination also means that they have to compromise, at least to a certain degree or for a short duration, their subjective mastery of the visual object, that they have to abdicate their own subjective ownership of their bodies as the bearers of experience.
Though it is hard to believe that any trace of conscious awareness of it remains, much less that viewers will be reformed as a result of the experience, it seems reasonable to believe that viewers of DeepFake videos must experience at least an inkling of their own undoing as their de-subjectivized vision interfaces with the ahuman operation of machine vision.
What I am saying, then, and I am trying to be careful about how I say it, is that DeepFake videos open the door, experientially, to a highly problematic space in which our predictive technologies participate in processes of subjectivation by outpacing the subject, anticipating the subject, and intervening materially in the pre-personal realm of the flesh, out of which subjectivized and socially “typified” bodies emerge. The late Sartre, writing in the Critique of Dialectical Reason, defined commodities and the built environment in terms of the “practico-inert,” in light of the ways that “worked matter” stored past human praxis but condensed it into inert physical form. Around these objects, increasingly standardized through industrial capitalism’s serialized production processes, are arrayed alienated and impotent social collectives of interchangeable, fungible subjects. Compellingly, feminist philosopher Iris Marion Young takes Sartre’s argument as the basis for rethinking gender as a non-essentialist formation, a nascent collectivity, that is imposed on bodies materially—through architecture, clothing, and gender-specific objects that serve to enforce patriarchy and heterosexism. The practico-inert, in other words, participated in the gendered typification of the body—and we could extend the argument to racialization processes as well. But the computational infrastructures of today’s built environment are no longer adequately captured by the concept of the practico-inert. These infrastructures and objects are still the products of praxis, but they are far from inert. In their predictive and interactive operations, they are better thought of under the concept of the practico-alert: they are highly active, always on alert, and just as the viewers of DeepFake videos are on the lookout for a telling glitch, so are we ever and exhaustingly on the alert.
In these circuits, which are located deeper than subjective attention, the standardization and typification processes I just mentioned are more fine-grained, more “personalized” or targeted, operating directly on the presubjective flesh. In this sense, the flattening of subjectivity, the suspension of belief and depersonalization of vision in DeepFake videos, points towards the contemporary “ungendering” of the flesh, as Hortense Spillers calls it in a different context, that marks a preliminary step in the computational intensification of racialized and gendered subjectivization. This is a truly insidious aesthetics of the flesh.