Intelligent Audio Editing Technologies

Date: 11 Jan 2011
Time: 18:30

Location: Royal Academy of Engineering
3 Carlton House Terrace
London SW1Y 5DG

Lecture by Dr. Josh Reiss, Senior Lecturer, Centre for Digital Music, Queen Mary University of London.
A recording of the lecture is available here (81MB mp3)

The tools of our trade have transformed in the last twenty years, but the workflow of a mixing engineer is almost the same. A large proportion of the time and effort spent mixing down a multitrack recording is invested not in the execution of creative judgement, but in the mundane manipulation of equalisers, dynamics compressors, panning, and replay levels, so that the timbre and blend of individual channels are correct enough to attempt a balance.

There are two good reasons why much of this work has not already been automated. The first is that the task is not trivial: it is a highly parallel and cross-adaptive problem, and the correct value for every setting will depend to some extent on every other. The second reason is a resistance from those who assume that automating the mixdown process will either remove the requirement for a skilled hand and ear, or result in lazy use of automation to the extent that their careers or their integrity will be threatened. To make all music sound the same is not the goal of automation. Rather, automatic mixing will speed up the repetitive parts of an engineer’s job so that more effort can be expended on the art of production.

We need only look at the evolution of digital cameras to see what could be possible with audio. A typical consumer camera of twenty years ago would have had a fixed focal length and aperture, and perhaps an adjustable shutter speed. Now, multi-point auto-focus is a standard feature, the exposure time, aperture, and colour balance are adjusted automatically, a digital signal processor ameliorates camera shake, and so on. Poor shots may be recognised and retaken as many times as is necessary, because the photographer can immediately view their photograph. In spite of these enhancements, professional photographers still exist, and still need to be taught about the optics and anatomy of a camera. However, the emphasis of photographic discipline has shifted towards the creative side of the profession: there is less time spent setting up the camera and developing exposures, and more time in perfecting the technique and shot, and retouching the images.

There are, broadly speaking, four kinds of automatic sound processing tool:

Adaptive processing. Adaptive processes adjust instantaneously to the material that is being played through them. De-noisers and transient shapers are adaptive in nature.

Automatic processing. Automatic processes place some aspects of operation under user control, and make intelligent guesses about the positions of other controls. The ‘automatic’ mode on a dynamics compressor is such an example.

Cross-adaptive processing. A cross-adaptive tool must be aware of, and react to, every signal within the system. For example, the automatic level control on a public address system that adjusts to the ambient noise level may be cross-adaptive.

Reverse engineering tools. Deconstruction of a mix for historical reasons would involve taking the multitrack session master and the stereo master, and determining which processes must be applied to the former to derive the latter. It would be useful to automate some of this.
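The difference between adaptive and cross-adaptive processing can be sketched in a few lines. The following is only an illustration under stated assumptions, not the method shown in the lecture: RMS level stands in for perceived loudness, and the function names (`rms_db`, `adaptive_gain_db`, `cross_adaptive_gains_db`) and the -20 dB target are hypothetical.

```python
import math

def rms_db(samples):
    """Level of a block in dB relative to full scale (RMS as a crude loudness proxy)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))

def adaptive_gain_db(channel, target_db=-20.0):
    """Adaptive: the gain depends only on the channel's own signal."""
    return target_db - rms_db(channel)

def cross_adaptive_gains_db(channels):
    """Cross-adaptive: each gain depends on the levels of all channels.

    Every channel is moved to the average level of the whole set, so
    changing any one input changes the gain applied to all the others.
    """
    levels = [rms_db(ch) for ch in channels]
    average = sum(levels) / len(levels)
    return [average - level for level in levels]

# Two synthetic channels: one about 20 dB louder than the other.
loud = [0.5, -0.5] * 100
quiet = [0.05, -0.05] * 100
gains = cross_adaptive_gains_db([loud, quiet])
```

Here the louder channel is turned down and the quieter one turned up by the same amount, and neither gain can be computed without knowing the other channel's level.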

Adaptive mixing tools require two components: an accumulative feature extraction process, and a set of constrained control rules. Much of the difficulty of getting these tools right is in obtaining the correct information from the audio in the first place: to detect, for example, the pattern of onsets, the correct loudness, and thus precise masking information. The target for an equaliser can then be to reduce temporal and spectral masking, rather than to aim for a flat frequency response. Panning can be used to reduce spatial masking. A compressor can be inserted when the probability of a particular instrument being heard falls below a certain threshold, and that instrument can then be boosted to a certain average loudness without its peak loudness exceeding a higher threshold.
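The two components can be illustrated with a small sketch: a feature extractor that accumulates level statistics block by block, and a control rule that raises a channel towards a target average level while keeping its peak under a ceiling. This is a simplification under stated assumptions: mean-square level stands in for a perceptual loudness model, a static gain stands in for a compressor, and the class name, function name, and threshold values are all hypothetical.

```python
import math

class RunningLevel:
    """Accumulative feature extraction: long-term mean square and peak of a channel."""

    def __init__(self):
        self.sum_squares = 0.0
        self.count = 0
        self.peak = 0.0

    def update(self, block):
        """Fold one block of samples into the running statistics."""
        for s in block:
            self.sum_squares += s * s
            self.peak = max(self.peak, abs(s))
        self.count += len(block)

    def average_db(self):
        mean_square = self.sum_squares / max(self.count, 1)
        return 10.0 * math.log10(max(mean_square, 1e-12))

    def peak_db(self):
        return 20.0 * math.log10(max(self.peak, 1e-6))

def constrained_gain_db(level, target_avg_db=-20.0, peak_ceiling_db=-3.0,
                        max_boost_db=12.0):
    """Constrained control rule: aim for the target average level, but never
    push the peak past the ceiling or exceed the maximum allowed boost."""
    wanted = target_avg_db - level.average_db()
    headroom = peak_ceiling_db - level.peak_db()
    return min(wanted, headroom, max_boost_db)

# A quiet channel with one loud transient, analysed in two blocks.
level = RunningLevel()
level.update([0.01] * 500)
level.update([0.01] * 499 + [0.5])
gain_db = constrained_gain_db(level)
```

In this example the average level alone would ask for a large boost, but the single transient limits the gain to the available peak headroom. A real tool would use a compressor to reduce that transient so that the remaining boost could be applied.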

Dr. Reiss played some examples of automated mixing from the Centre for Digital Music, showing us the system element by element. First, each instrument was manipulated in isolation. Then an automatic fader balance was performed. Finally, with one button, the compressor, equaliser, panning, and fader settings were set up for an entire multitrack jazz recording. The result was surprisingly effective, although the automatic nature of the balancing was clear. The vocals, for example, were somewhat quieter than custom usually allows, and the mix was equalised to a fairly flat spectrum whereas most commercial music is boosted at the top and bottom ends. Nevertheless, the power of automated mixing was effectively demonstrated – the result was perfectly reasonable for a monitor mix and, as the algorithms are perfected, the results will certainly improve further.

Other automatic tools were suggested and demonstrated. One was a feedback eliminator for live sound, which set itself the target of keeping the loop gain of the system below 0 dB in every frequency band. It achieved this by finding the transfer function of the system and calculating its inverse. Another was a plug-in for automatically correcting inter-channel delay, which successfully reduced the artefacts created by spill between one microphone and another. The aim of these tools is again to free up the balance engineer’s hands and mind for the more creative aspects of live sound engineering.
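The report does not say how the delay-correction plug-in measured the delay. One common approach, assumed here purely for illustration, is to find the lag that maximises the cross-correlation between the two microphone signals and then advance the later channel by that amount. The function names and the synthetic test signal below are hypothetical.

```python
import random

def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples (0..max_lag) at which `delayed` best matches
    `reference`, using a brute-force cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(delayed) - lag)
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def align(delayed, lag):
    """Advance the late channel by `lag` samples, padding the end with silence."""
    return delayed[lag:] + [0.0] * lag

# Synthetic example: noise picked up by a close microphone, and the same
# sound arriving 7 samples later and 6 dB quieter at a second microphone.
rng = random.Random(1)
close = [rng.uniform(-1.0, 1.0) for _ in range(200)]
spill = [0.5 * s for s in close]
delayed = [0.0] * 7 + spill

lag = estimate_delay(close, delayed, max_lag=20)
aligned = align(delayed, lag)
```

Once the two channels are aligned, summing them no longer produces the comb filtering that the uncorrected delay would cause. A practical implementation would probably use an FFT-based correlation and handle fractional-sample delays, which this sketch does not attempt.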

The scope for further work in refining these tools is clear, although they already work impressively well. Informal blind testing has shown that it is hard to distinguish the automated mixes from those executed by students (at least, in short excerpts). In an act of subterfuge, Dr. Reiss entered an automated mixdown into a student competition, and confessed his crime only after the competition was judged. Although the mix failed to win a place in the competition, it also failed to pique the judges. Inevitably, technology will soon change our craft beyond recognition. Fortunately for us, the researchers appear no closer to developing a substitute for talent.

Report by Ben Supper
