Malte Giesen: Latent diffusion with slop
for ensemble with electronics and portrait-format video
(2025)

Interestingly, AI models use the same method to generate music as they do to generate images: latent diffusion. Put simply, noise (so-called Gaussian noise) is gradually added to an image, and the model learns what the image looks like at each degree of ‘noising’ (why this is only a rough simplification is explained below). The model can then reverse this process and generate a new image from pure noise, guided by a prompt. To generate music, spectrograms are used instead of images; after generation they can be converted back into sound.
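The forward ‘noising’ half of this process can be sketched in a few lines. This is a minimal illustration, assuming a simple linear noise schedule; actual models (and their reverse, learned denoising step) are far more involved, and as the text notes, operate on latent representations rather than raw pixels.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_image(x0, t, alpha_bars, rng):
    """Sample a noised version of x0 at step t: a weighted blend of the
    clean data and fresh Gaussian noise."""
    eps = rng.standard_normal(x0.shape)  # the Gaussian noise
    a = alpha_bars[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))   # stand-in for an image or spectrogram
alpha_bars = make_alpha_bars()

slightly_noisy = noise_image(x0, 10, alpha_bars, rng)      # still recognisable
almost_pure_noise = noise_image(x0, 999, alpha_bars, rng)  # essentially noise
```

By the last step the weight on the original data is nearly zero, so the result is indistinguishable from pure noise; generation runs this chain in the opposite direction, step by step removing the noise the model has learned to predict.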
My new piece, ‘Latent diffusion with slop,’ aims to make these processes musically and aesthetically tangible. From a musical-philosophical perspective, acoustic white noise contains all music that has ever existed and ever will exist. I find it fascinating that this idea now also functions technically as a process in the creation of music. The key point about the noising of the training material is that it is not the raw data itself that is noised, but its representation in the so-called latent space: a multidimensional mathematical space in which higher-level properties and meanings are represented as points whose mutual proximity or distance encodes their relatedness. ‘Dog,’ for example, lies closer to ‘wolf’ than to ‘guinea pig.’ On a musical level, genuinely musical characteristics and meanings are arranged in the same way. Noising the music therefore does not simply mean adding white noise to the sound; instead, the noise enters a wide variety of musical parameters: rhythm, pitch, dynamics, instrumentation, timbre, form, figuration, harmony, metre, style, genre, etc.
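The latent-space idea above (concepts as points whose distances encode relatedness) can be illustrated with a toy example. The three-dimensional vectors here are invented purely for illustration; real models use learned embeddings with hundreds or thousands of dimensions.

```python
import numpy as np

# Hypothetical toy embeddings: each concept is a point in a 3-D space.
# Nearby points stand for related concepts.
embeddings = {
    "dog":        np.array([0.9, 0.8, 0.1]),
    "wolf":       np.array([0.8, 0.9, 0.2]),
    "guinea pig": np.array([0.3, 0.1, 0.9]),
}

def cosine_similarity(a, b):
    """Similarity of two vectors by the angle between them (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_wolf = cosine_similarity(embeddings["dog"], embeddings["wolf"])
sim_gp = cosine_similarity(embeddings["dog"], embeddings["guinea pig"])
# In this toy space, 'dog' sits much closer to 'wolf' than to 'guinea pig'.
```

Noise added in such a space perturbs positions, and hence meanings, rather than the raw signal, which is why, transferred to music, it scrambles parameters like rhythm, harmony or style instead of simply adding hiss.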
In this respect, the piece is a continuation of algorithmic composition processes and a kind of ‘neural variation movement.’ The original musical material comprises fragments from an older work for ascolta (Tu M, 2014), relatively generic nu jazz for which the ascolta instrumentation is ideally suited, as well as various glitch/noise fragments taken from faulty outputs of various generative audio models.
Since the diffusion method was initially used for images and is now also used for video, I wanted to reintegrate the visual level into the ensemble. The second half of the title, ‘… with slop,’ indicates the direction this takes. Since the widespread adoption of generative AI, the internet has been flooded with AI slop: cheap, sloppily generated images and videos that can be produced in seconds and in vast quantities, further inflating the already excessive flood of images online. The ‘dead internet theory’ seems to be becoming reality. Since most of this slop is viewed on smartphones, the video is integrated into the ensemble in portrait format on a TV screen, as an additional visual player on an equal footing with the musicians on stage.
The visual material consists of viral AI-ASMR videos of cut glass fruits (sharper than reality, hyper-aesthetic and hyper-smooth) and, later on, equally high-gloss staged shots of the ascolta ensemble’s instruments. These short video fragments interact with the ensemble’s sound, acting as counterpoint or stimulus, as visual dividers of the musical form and as extensions of the electronic means.
(Malte Giesen)