In George Orwell’s 1984, music is a tool of the totalitarian state. The working classes are kept docile with sentimental songs “composed entirely by mechanical means on a special kind of kaleidoscope”.
It’s later than Orwell predicted, and there are no kaleidoscopes involved, but machine-made music is now very much a thing (for some funny, weird and slightly scary examples, check out https://openai.com/blog/jukebox/). Many people see this as a threat to creativity. If music can be produced without human involvement, and we can’t tell the difference, where does that leave musicians?
Like any new technology, AI in music does have the potential to be misused. However, new technology also has the potential to make our lives better. What if, instead of making human creativity redundant, artificial intelligence could short-circuit some of the things that stand in the way of creativity?
Wants and needs
There are skills that musicians learn because they are at the core of what we do, such as singing, songwriting and playing instruments. There are also skills we learn out of necessity, though, and recording and mixing our own music falls into that category. It can be satisfying, but it can also be slow and immensely frustrating. When inspiration strikes, we want to capture it as quickly as possible, not spend hours worrying about whether the bass synth is clashing with the kick drum.
So, how would it be if artificial intelligence could take over the mundane and boring parts of audio engineering, and leave us free to focus on the creative aspects? That’s the key aim behind the plug-ins in Focusrite’s FAST suite, developed in conjunction with Sonible. At this point you might be thinking that surely all plug-ins are designed to make our lives easier. What can AI do that existing plug-ins can’t?
Why AI is different
Let’s consider two ways of developing a plug-in that automates some part of the mixing process. One approach would be for a human programmer to study the decisions a human sound engineer makes, and try to turn those into a set of rules. As Sonible’s Alexander Wankhammer explains, a plug-in like this is completely predictable in its behaviour, but it’s only as good as the rules we can devise. “In traditional algorithm design you know exactly what an algorithm will do with a certain input signal, because every processing step is designed and implemented by hand. The nice thing about that is that everything is fully transparent, and you can control every little detail of a certain logic. However, this transparency can also be a limiting factor since you have to fully understand and model every aspect of a certain problem.”
The big advantage of the AI approach is that it escapes this limiting factor. A machine-learning algorithm is ‘trained’ by feeding in lots of examples of good and bad sounds, until it learns how to process new examples to match the ‘good’ set. It isn’t applying rules that we have fed it; it’s inventing its own rules, and ultimately these could be far more subtle and complex than anything a human developer could come up with.
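To make the training idea concrete, here is a deliberately tiny sketch, not Sonible’s actual method: each sound is reduced to one invented feature (the ratio of low-frequency to total energy), and a minimal logistic-regression model learns its own decision rule from labelled ‘good’ and ‘bad’ examples rather than from hand-written thresholds. All data and numbers are made up for illustration.

```python
import math

# (feature, label): 1 = 'good' balance, 0 = 'bad' (boomy) -- invented data
examples = [(0.20, 1), (0.25, 1), (0.30, 1), (0.55, 0), (0.60, 0), (0.70, 0)]

w, b = 0.0, 0.0          # model parameters, learned from the data
lr = 1.0                 # learning rate

for _ in range(2000):    # gradient-descent training loop
    for x, y in examples:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))   # predicted P('good')
        w -= lr * (p - y) * x                      # nudge parameters toward
        b -= lr * (p - y)                          # the labelled answers

def sounds_good(low_freq_ratio):
    """Classify a new, unseen sound using the learned rule."""
    return 1.0 / (1.0 + math.exp(-(w * low_freq_ratio + b))) > 0.5

print(sounds_good(0.22))  # close to the 'good' training examples
print(sounds_good(0.65))  # close to the 'bad' ones
```

The point is that nobody typed in a threshold: the boundary between ‘good’ and ‘bad’ emerges from the examples, which is what lets a trained system capture rules more subtle than a developer could write by hand.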
“In contrast to traditional algorithms, systems based on machine learning are content-aware,” continues Alexander. “That means that they can observe and interpret data, not simply process it based on some fixed logic. When allowing a system to learn from data, the system is able to come up with an internal representation and interpretation of the data. The great thing about that is that you don’t have to fully understand why a system is doing something – it just does it, and it works. And as we’ve seen over the last couple of years in many different fields, approaches based on such a concept are able to tackle problems that simply couldn’t be solved with traditional algorithms.”
The black box problem
The idea that an AI system might not be understood even by its creators is one of the things that has people worried. If a self-driving car drives off the edge of a cliff, we want to know why it did that, and how to stop it happening again. It’s a concern that Alexander is ready to acknowledge: “‘Black box’ approaches like Deep Neural Networks simply get a huge amount of training data and find a way to map the input data to some desired output. The huge benefit of these systems is their ability to solve extremely complex problems without any modelling assumptions. The downside is that even the engineers who developed the system cannot fully predict the behaviour of the system when presented with new data – and the data used to train the system is crucial. This means that the system is fully data-driven, and it’s rather hard to tune or control the processing.”
However, as Alexander explains, not all AI is about creating ‘black box’ systems. In a ‘rule-based’ system, the developers choose what parameters should be available, and what the machine learns is how to adjust these for best results. Most of the parameters in FAST Equaliser and FAST Compressor, for example, are exactly the same ones you’ll find in conventional equalisers and compressors. The AI element of these plug-ins lies in analysing the source and suggesting appropriate settings for these parameters. An experienced engineer can use these plug-ins exactly like any traditional equaliser or compressor; a less experienced one benefits from the guidance and insight of the AI.
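A hypothetical sketch of that ‘rule-based’ idea: the plug-in exposes the familiar parameters (per-band gain, in this toy version), and the intelligent part only suggests starting values by comparing the track’s measured band energies with a target profile that, in a real product, would have been learned from training data. The band names, the target profile and all the numbers here are invented.

```python
# Assumed 'learned' target profile (dB per band) -- invented for illustration
TARGET_DB = {"low": -12.0, "mid": -15.0, "high": -18.0}

def suggest_eq(measured_db):
    """Return a per-band gain suggestion (dB) nudging the track toward the target."""
    return {band: round(TARGET_DB[band] - level, 1)
            for band, level in measured_db.items()}

# A boomy source: too much low end relative to the profile
suggestion = suggest_eq({"low": -6.0, "mid": -16.0, "high": -20.0})
print(suggestion)  # {'low': -6.0, 'mid': 1.0, 'high': 2.0}

# The user keeps full control: the suggestion is just an ordinary EQ setting
suggestion["low"] += 2.0   # the engineer decides the cut was too deep
```

Because the output is an ordinary set of EQ gains, an experienced engineer can ignore or override every value, which is exactly the distinction between this approach and a ‘black box’.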
Isn’t that inexperience part of the creative process that keeps music fresh, though? When new engineers and musicians don’t know how tools are ‘supposed’ to be used, sometimes they stumble on new and compelling sounds. Won’t an AI system trained on sounds from the past stifle that innovation, and steer everything towards sounding like something that already exists? Not at all, says Alexander. “Although AI tools typically learn from existing samples, that doesn’t mean that they are only able to exactly reproduce what they ‘saw’. The tools learn how to interpret or modify new data based on examples, but they are not simply mimicking these examples.
“Let’s take the example of an intelligent equaliser like the FAST Equaliser. The tool helps to quickly fix typical technical problems in the signal, because it knows from its training data what a ‘good’ signal typically looks like. But that doesn’t mean that the tool tries to make everything sound the same. It simply learned how to interpret audio data – in the case of an EQ, particularly its spectral content – and how to find and fix problems in the data. So FAST Equaliser may clean up problems in a signal, but the creative part, the final style and character that make it unique, still lies in the hands of the user.”
Could an AI mixing algorithm even learn to mimic the moves of a specific human engineer, giving us a virtual Chris Lord-Alge, Susan Rogers or Andy Wallace? “Yes and no,” replies Alexander. “If a system was trained with data from these specific engineers it would indeed be able to learn their specific ways of ‘optimising’ signals. At the same time, though, the AI system would never be able to fully reproduce the results of such an engineer, since a lot of creative decisions will be very specific to a certain song, artist or even the mood of the engineer. So, yes, it’s possible to give an AI system a ‘twist’ towards a certain mixing style, but the system would not (and should not) be able to fully copy the decisions and work of a world-class mixing engineer.”
To completely recreate what a human engineer does, an AI system would also have to take decisions in context. When we’re deciding on EQ or compressor settings, what matters is how the mix as a whole sounds, not what the individual tracks sound like when they’re soloed. That’s not the goal of the FAST plug-ins; rather, the idea is to suggest starting points from which it’s easier to make those contextual decisions. FAST Equaliser, says Alexander, “will find the most balanced and clean sound for the track – no matter what’s happening on other channels. That’s why an intelligent equaliser is not a fully automatic mixing system. Still, having multiple tracks with a nicely balanced and clean sound is – most often – a surprisingly good starting point for a mix.”
Nevertheless, there are many situations where two sounds fight for the same real estate in our mixes, kick drum and bass guitar being the classic example. For this reason, the FAST suite includes a plug-in called FAST Reveal that can compare sources from two different tracks and balance them in the mix, so that one remains audibly in the foreground.
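FAST Reveal’s actual algorithm is not public, but the underlying idea can be sketched very simply: where two tracks carry energy in the same frequency band, attenuate the ‘background’ track in that band by just enough to keep the ‘foreground’ track a fixed margin ahead. The 6 dB margin and all the level figures below are invented for illustration.

```python
MARGIN_DB = 6.0  # assumed margin the foreground should hold in the shared band

def duck_background(fg_band_db, bg_band_db):
    """Return the gain cut (dB, <= 0) to apply to the background track."""
    headroom = fg_band_db - bg_band_db
    if headroom >= MARGIN_DB:
        return 0.0                   # no clash: leave the background alone
    return headroom - MARGIN_DB      # cut just enough to restore the margin

# Kick drum (foreground) and bass guitar (background) both loud around 80 Hz
print(duck_background(-10.0, -9.0))   # bass is louder here, so it gets cut
print(duck_background(-10.0, -20.0))  # bass already well clear: no cut
```

A real plug-in would apply this per frequency band and smooth the gain changes over time, but the core decision, comparing the two sources and yielding space to one of them, is the same.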
Power to spare
Ultimately, then, the idea of the FAST Equaliser and FAST Compressor is to speed up the initial stages of the mix, and get you more quickly to a point where you can start to make creative decisions. However, AI can be very demanding on computer resources, so will you actually have any CPU cycles left once you’ve reached this point? As Alexander explains, that needn’t be a big problem in practice.
“The most computationally intensive processing of an AI system typically happens during its training phase. But since this happens during development, that’s not a problem for the user. Once a system is fully ‘trained’, model-based systems can be quite lightweight, though systems based on deep learning are almost always computationally heavy. Still, depending on the task at hand, that doesn’t mean that the processing eats up all CPU resources. For example, a tool may analyse a signal over a certain period of time, and during that analysis it needs more resources. The good thing is that the analysis is typically not time-critical, so the computational load can be spread over the whole analysis period. Once a system has learned its parameters, it typically doesn’t need many more resources than ‘classical’ algorithms.”
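The load-spreading idea in that quote can be sketched as a streaming analysis: instead of measuring a stretch of audio in one expensive pass, the analyser keeps a running statistic that is updated one block at a time, so each audio callback pays only a small, constant cost. The block sizes and the choice of statistic (mean energy) are simplifications invented for this example.

```python
class StreamingAnalyser:
    def __init__(self):
        self.blocks = 0
        self.mean_energy = 0.0

    def process_block(self, samples):
        """Called once per audio block; O(block size) work, no big batch job."""
        energy = sum(s * s for s in samples) / len(samples)
        self.blocks += 1
        # incremental update of the running mean -- no stored history needed
        self.mean_energy += (energy - self.mean_energy) / self.blocks

analyser = StreamingAnalyser()
for block in ([0.5, -0.5, 0.5, -0.5], [1.0, -1.0, 1.0, -1.0]):
    analyser.process_block(block)
print(analyser.mean_energy)  # 0.625
```

Because nothing is buffered and each update is cheap, the cost of a long analysis is spread evenly across the whole period, exactly the property Alexander describes.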
A force for good
We think that the FAST plug-ins represent a genuine breakthrough. They won’t mix your track for you, but they will make it quicker and easier for you to arrive at the mix that you want. For those who are concerned about the effect on human creativity, it might be worth looking back to 1982.
Two years before Orwell’s dystopian masterpiece was set, another new technology was making its presence felt. Musicians feared being put out of a job, and on May 20th of that year, the Central London Branch of the Musicians’ Union passed a resolution calling for an end to the practice of recreating real instruments by electronic means.
They didn’t succeed in banning sampling and, contrary to their fears, sampling didn’t put symphony orchestras out of work. Instead, instruments such as the Fairlight and Emulator kickstarted a golden age in pop music. Sampling became the foundation of rap, hip-hop, EDM, house, techno, dubstep, drum & bass and much more. Neither its inventors nor its opponents could have foreseen the uses human creativity would find for this new technology — but who now would wish that sampling had never been invented?
From the electric guitar to the MIDI sequencer, new tools haven’t stifled creativity. Instead, they’ve opened up new ways for us to express it — and AI is no different.
Words: Sam Pryor
Photo credit: H. Heyerlein