Meetings Archive – 2009
Date: 8 Dec 2009
Christmas Lecture by Peter Eastty of Oxford Digital
The PowerPoint visuals for this lecture are available from the website, and it’s highly recommended that you view them while reading this report or listening to the recording, because many of the key concepts are graphical and make no sense without them. Download the visuals here (4MB PDF).
Peter opened his lecture with the assertion that it would not be mathematical, an interesting proposition for a topic notorious for the complexity and abstraction of its mathematics. He also pointed out that he has never designed an analogue filter in his life: instead of approaching digital filter design by designing analogue filters and translating them to the digital domain, he has always considered it from an exclusively discrete-time, sampled perspective. The first design aims are to create a bell-shaped presence filter, and a shelf filter: essentially the same as EQ controls on a mixing console (see page 3 of the visuals). Both of these filter types are defined by three independent parameters, as illustrated: gain, frequency, and Q (for the bell filter) or overshoot for the shelf. To achieve this, there are just three building blocks available (page 4): multiplication, addition, and a delay of one or more samples. So, how do we go about arranging these building blocks to make filters?
Peter took us on a rapid yet simple-to-follow tour of the effects of simple combinations of multiplication, addition and delay, with the most intuitive explanation I’ve ever heard of visualising filter responses in the z-plane. It’s difficult to paraphrase without simply writing down the whole lecture, but listening to the recording whilst viewing the visuals will convey the message. He started by considering the behaviour of the simplest possible combination of a delay and adder – effectively a one-tap FIR – derived the z-plane representation and frequency response entirely graphically and intuitively, then proceeded to extend it by adding a multiplier (for coefficient values other than 1), resulting in the insight that the coefficient moves a zero along the x-axis of the z-plane (page 19).
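The graphical idea can be sketched in a few lines of Python. This is an illustration only, not code from the lecture: it assumes the filter is y[n] = x[n] + c·x[n-1], so that the zero sits on the real axis at z = -c, and the function names are invented for this example. The gain at a given frequency is simply the distance from the corresponding point on the unit circle to the zero.

```python
import cmath
import math


def fir1_zero(c):
    """Zero of H(z) = 1 + c*z^-1; it lies on the real (x) axis at z = -c."""
    return -c


def fir1_magnitude(c, omega):
    """Gain at normalised frequency omega (radians per sample).

    Computed geometrically: the distance from the point e^(j*omega)
    on the unit circle to the zero.
    """
    point_on_circle = cmath.exp(1j * omega)
    return abs(point_on_circle - fir1_zero(c))


def fir1_filter(c, samples):
    """Apply y[n] = x[n] + c*x[n-1] using one delay, one multiply, one add."""
    out = []
    previous = 0.0
    for x in samples:
        out.append(x + c * previous)
        previous = x
    return out
```

With c = 1 the zero sits on the unit circle at z = -1, so the gain is 2 at DC and falls to zero at half the sampling rate; changing c slides the zero along the axis and changes the depth of that notch.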
Next, it was shown that responses could be combined by cascading filters together (page 20 and following), but that the elements of cascaded filters can also be combined into one structure with a single adder (accumulator) with identical behaviour, with simple mathematical relationships between the multiplier values (coefficients) in the cascaded filters and the combined structure. Based on this, a relationship was derived between the coefficients and the positions of zeroes on the x-axis – a little maths involving a square root, but still pretty straightforward (page 23). Of course, square roots often tend to give rise to roots of negative numbers (page 26 – looks remarkably like the quadratic formula) – so what do you do then? Well, in a move highly reminiscent of complex numbers, each zero moves off the x-axis in relation to a couple of new equations, to create a symmetrical pair (pages 27-29), and all the findings so far are summarised on page 30.
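As a rough sketch of the relationships described above (using standard textbook algebra rather than the notation of the visuals, with function names invented here): cascading (1 + a·z^-1) and (1 + b·z^-1) gives the single structure 1 + (a+b)·z^-1 + ab·z^-2, and going the other way, the zeros of 1 + b1·z^-1 + b2·z^-2 come from the quadratic formula. When the term under the square root goes negative, the two zeros leave the real axis as a conjugate pair.

```python
import cmath


def combine_sections(a, b):
    """Coefficients (b1, b2) of the single structure equivalent to
    cascading (1 + a*z^-1) with (1 + b*z^-1)."""
    return (a + b, a * b)


def zeros_of_second_order(b1, b2):
    """Zeros of 1 + b1*z^-1 + b2*z^-2, i.e. roots of z^2 + b1*z + b2 = 0.

    cmath.sqrt handles a negative discriminant, in which case the
    zeros come out as a complex-conjugate pair.
    """
    root = cmath.sqrt(b1 * b1 - 4.0 * b2)
    return ((-b1 + root) / 2.0, (-b1 - root) / 2.0)


def fir(coeffs, samples):
    """Direct FIR: one accumulator summing delayed, scaled inputs."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out
```

Feeding the same input through `fir([1, a], fir([1, b], x))` and through `fir([1, a + b, a * b], x)` gives the same output, which is the "identical behaviour" the lecture refers to.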
So far, the lecture had focussed on things that reduce gain (zeroes) – what about things that increase it, pulling that 3D surface upwards (page 31) instead of downwards? Just like with op-amps, positive gain is created using feedback loops, and the feedback loop contains very similar topologies to the filters already discussed (pages 32-34). Second-order filters with only negative gain response (all-zero) can be combined with second order filters with only positive gain response (all-pole) (pages 35-36), and the resulting structure is often known as a biquad, beloved of digital mixing console designers for (among many other things) digital versions of traditional parametric equalisers and filters. It is shown to have two symmetrical pole/zero pairs, and when the frequency response is plotted against log frequency, it can give rise to the familiar bell-curve EQ frequency response (page 41) if the pole and zero are associated with the same frequency. The distance between the pole and the zero was shown to be related to the Q or bandwidth of the filter (pages 43-50), and the geometry for the curves of constant frequency is calculated.
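For readers who have not met a biquad in code, here is a minimal sketch of the generic textbook form (Direct Form I). It is not Oxford Digital's implementation, and it does not reproduce the design equations from page 61 of the visuals; the coefficient naming (b0, b1, b2 for the all-zero part, a1, a2 for the feedback part) is the common convention and is an assumption here.

```python
import cmath
import math


def biquad_process(b, a, samples):
    """Run samples through y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]
                                  - a1*y[n-1] - a2*y[n-2].

    b = (b0, b1, b2) is the feed-forward (zero) part,
    a = (a1, a2) is the feedback (pole) part.
    """
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0  # delayed inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def biquad_gain_db(b, a, omega):
    """Gain in dB at normalised frequency omega (radians per sample)."""
    z1 = cmath.exp(-1j * omega)  # z^-1 evaluated on the unit circle
    numerator = b[0] + b[1] * z1 + b[2] * z1 * z1
    denominator = 1.0 + a[0] * z1 + a[1] * z1 * z1
    return 20.0 * math.log10(abs(numerator / denominator))
```

If the feed-forward and feedback coefficients describe the same pair of points in the z-plane, the poles and zeros cancel and the gain is 0 dB at every frequency; the bell response appears as the zeros and poles are moved apart.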
Curves of constant gain were also shown (pages 51-55), and it was then shown (pages 56-58) that the curves of constant frequency and curves of constant gain are orthogonal at all points – important for independent control of them. All these equations were pulled together, and with the addition of a gain correction term (page 60), resulted in the definitive equations for biquads (page 61). It was demonstrated with code snippets that these equations are directly implemented in Oxford Digital’s products. The effect of a biquad with same gain for the pole and zero, but at different frequencies, was illustrated (pages 65-66): it was shown that the perfectly-damped response is achieved when the gain circle has its origin on the unit circle (page 67).
Higher-order filters can be created by adding extra pole/zero pairs on the same constant frequency curves, but on carefully-chosen constant-gain curves (pages 76-81). It was then demonstrated how to make non-integer-order filters, by using the fact that a coincident pole/zero pair cancel each other, so by introducing such a pair (no effect on the filter) and then slowly moving them apart, the pole/zero configurations for integer orders can be interpolated between. This is truly novel, and although the graphics illustrating the configurations are not in the visuals linked to above, they can be seen in Peter’s convention paper presented at the 125th AES Convention, entitled “Accurate IIR Equalization to an Arbitrary Frequency Response, with Low Delay and Low Noise Real-Time Adjustment”.
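The cancellation that this technique relies on is easy to check numerically. The sketch below is not Peter's interpolation scheme, which the report does not describe; it only shows that a conjugate zero pair placed exactly on a conjugate pole pair has no effect, and that a response emerges smoothly as the two are separated. The parameterisation by radius and angle, and the function names, are assumptions made for this illustration.

```python
import cmath
import math


def pair_coeffs(radius, angle):
    """Coefficients (c1, c2) of 1 + c1*z^-1 + c2*z^-2 whose roots are the
    conjugate pair radius * e^(+j*angle) and radius * e^(-j*angle)."""
    return (-2.0 * radius * math.cos(angle), radius * radius)


def section_gain_db(zero_radius, pole_radius, angle, omega):
    """Gain in dB at frequency omega of a second-order section with a
    conjugate zero pair and a conjugate pole pair at the same angle."""
    z1 = cmath.exp(-1j * omega)
    n1, n2 = pair_coeffs(zero_radius, angle)
    d1, d2 = pair_coeffs(pole_radius, angle)
    numerator = 1.0 + n1 * z1 + n2 * z1 * z1
    denominator = 1.0 + d1 * z1 + d2 * z1 * z1
    return 20.0 * math.log10(abs(numerator / denominator))
```

With the zeros and poles at the same radius the section is flat at 0 dB; pulling the zeros slightly inwards while leaving the poles in place produces a small boost around the chosen angle, and the boost grows as the separation increases.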
This ability to have non-integer order IIR filters permits the construction of arbitrary filter responses, but without the usual penalties of FIR filters (namely, long processing delays and poor phase performance). Peter demonstrated his real-time filter software, running on a laptop with a frequency response curve that is manipulated by attaching handles and moving them arbitrarily as desired. Naturally, some extreme frequency responses result in filter orders in the hundreds, and CPU power is limited, so the filter order can be limited and the response gracefully falls away from the handles if an impractical response is requested. The filter changes response quickly and completely smoothly (to the audio) in real time, even with rapid changes to extreme filter responses with orders greater than 100 – how this is achieved, Peter declined to elaborate further! Controlling the coefficients of IIR filters such that smooth changes in gain/frequency/bandwidth are achieved without artefacts or (worse) instability is regarded as a challenging task for simple conventional filter designs, so achieving this for Peter’s much more sophisticated arbitrary-response EQ with extremely high orders is impressive.
Peter concluded his fascinating lecture with the observation – made possible by his EQ – that if one creates a dramatic comb-like filter response (in this case, alternating 12dB gain boost at roughly octave intervals), then shifts the frequencies of all the gain/cut points together in logarithmic frequency (i.e. group all the handles together and drag them left/right at once), the resulting effect sounds like playback pitch is being increased or decreased, despite the audio remaining at constant pitch and playback speed. Peter makes the entirely plausible suggestion that the rapid scaling of a complex frequency-domain structure in log frequency creates a psychoacoustic illusion of pitch shift, because it sounds like the frequency scaling of harmonic structures characteristic of pitch shift.
Many thanks to Peter Eastty for an entertaining Christmas lecture, which delivered fascinating insights for both seasoned digital audio engineers and those new to the field, and revealed genuinely groundbreaking technology.
Meeting report by Michael Page
Date: 14 Apr 2009
‘Surround sound audio codecs in broadcasting – an introduction and latest results from independent listening tests’
Lecture by David Marston, BBC R&D
Surround sound systems are now becoming a popular addition to many people’s homes. This means there is now a demand for surround sound content to be delivered to homes via broadcasting, Internet or recorded media. Whichever way it gets to its destination, it is going to require data reduction along its journey. This may be in the transmission end of a broadcast chain, or in the transport of audio from a studio out over a broadcaster’s network.
This data reduction uses audio coders designed for surround sound. There are currently numerous different audio coders available, often with different attributes and performance. Choosing which coder to use is not a simple choice, and one of the key factors in this choice is the sound quality. It is inevitable that for serious data reduction, the coder will have to be lossy and therefore compromise sound quality. Our work assessed the sound quality of a selection of audio coders using the most accurate instrument of measurement available: the human ear. Here we present the codecs tested, how the tests were done, and of course the results.
This paper described the methodology used for a series of evaluation tests conducted by members of the EBU on a range of commercially available audio codecs.
In his introduction, David explained that the measurement of perceptual audio coding systems cannot be carried out using conventional objective measuring tools, as one would do for wow and flutter, for example. An objective measure based on psychoacoustic principles such as PEAQ can work reasonably well with MPEG-style stereo codecs, but there is nothing available yet for surround systems. A disadvantage of using such a measurement is that any new method is likely to be incorporated into a codec’s design to ensure good test results.
The only effective test, therefore, is subjective listening using humans – a slow and expensive process if a good sized sample is employed, although you do get useful results.
There are various parameters that can be looked at: overall quality, spatial quality in the case of surround sound, intelligibility, cascaded codecs, and so on. When a selection of different codecs and coding rates have to be tested in multiple combinations, the complexity increases further. In these instances a measurement system such as PEAQ can be used as a pre-filter.
The main subjective testing methods today are MUSHRA (MUlti Stimulus test with Hidden Reference and Anchors, BS1534), which is designed for mid-range to higher quality codecs, can test multiple codecs at the same time and was used in the EBU tests; BS1116, which is designed for high quality codecs but assesses only one at a time; and P800. The latter is for speech and was not relevant for these tests.
MUSHRA produces a quality value and BS1116 an impairment value for each codec. On occasions it may be relevant to have more than one value, for example for temporal and spatial quality. A single value makes testing faster, as well as being easier for the listener and for analysis, however it can hide differences in listeners’ perceptions.
Ensuring a gender balance has also been a problem as most of the listeners have been male. Training is important, whoever takes the tests. Listeners must be taught to identify coding artefacts and other problems, as well as how to use the assessment interface. For scoring, a numerical scale is useful because it avoids interpretations of words like ‘Fair’ or ‘Good’.
Each listener hears five codecs; any more would make the test too tiresome and could degrade the accuracy of results. During the MUSHRA test listeners are always given the reference, and also included in the randomised sequence is a hidden low quality anchor reference, a 3.5kHz low-pass filtered version of the original. In the EBU test another, spatially reduced, anchor was added. For BS1116, listeners hear one codec at a time, which is compared with a hidden reference and the known reference. This takes much longer, therefore each listener is limited to four codecs.
It is important to select a cross-section of experienced and novice listeners. Some may prove to have poor listening skills, or have a hearing impairment, but it is not always possible to identify this in advance. So it is better to use them for the test and reject their findings afterwards, often based on their ability to rank the hidden reference and low quality anchor.
David showed a slide of the MUSHRA test interface and explained how the listener can select each of the examples in order to make direct comparisons with the reference. He went on to describe the listening set-up at Kingswood Warren – soon to disappear!
Choosing the test material is always difficult. It must be critical, in order to highlight coding artefacts, but at the same time be unbiased, eg not material that is known to disadvantage a specific codec. The material must also be appropriate for the application: a mixture of music, speech and jingles (which will already have been compressed) for a broadcast codec, for example. The final choice of ten pieces of test material was made by a selection panel.
One of the techniques used by Institut für Rundfunktechnik (IRT), which analysed the results, was the Spearman Rank Correlation. This looks at the ranking of all the scores, and if anybody’s ranking was massively different from the average they were rejected. Around ten percent of listeners were eliminated at this stage.
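A minimal sketch of this kind of post-screening is given below. It is not IRT's analysis code: the rejection threshold, the data layout and the function names are all assumptions made for illustration. Each listener's scores are rank-correlated against the mean scores of the whole panel, and listeners whose correlation falls below the threshold are dropped.

```python
import math


def ranks(values):
    """Rank values from 1 (lowest) upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(rx, ry))
    var_x = sum((p - mean_x) ** 2 for p in rx)
    var_y = sum((q - mean_y) ** 2 for q in ry)
    return cov / math.sqrt(var_x * var_y)


def screen_listeners(scores_by_listener, threshold):
    """Return the sorted names of listeners whose ranking agrees with the
    panel mean at least as well as the (hypothetical) threshold."""
    all_scores = list(scores_by_listener.values())
    n_items = len(all_scores[0])
    panel_mean = [sum(s[i] for s in all_scores) / len(all_scores)
                  for i in range(n_items)]
    return sorted(name for name, scores in scores_by_listener.items()
                  if spearman(scores, panel_mean) >= threshold)
```

A listener who ranks the conditions in roughly the opposite order to everyone else gets a correlation near -1 and is excluded, while listeners who merely use a different part of the scale are kept, because only the ordering matters.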
There are three phases to this series of tests. The first two covered the most commonly used codecs for emission (transmission), the last link in the chain and usually the one with the lowest bit rate. Phase three looked at combinations of higher bit rate codecs used in the production/distribution chain – which are designed to be cascaded – combined with low bit rate emission codecs, and how they interact.
To ensure randomisation it was decided to split the codecs into three groups, based on their bit rates. Each listener’s five codecs contained at least one from each of these high, medium and low bit-rate groups, with the remaining two being from a single group to ensure a strong intra-group comparison within the test; eg a listener might hear one high, three medium and one low bit-rate codec.
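The allocation rule described above can be sketched as follows. This is a hypothetical illustration of the stated rule only, not the procedure the EBU actually used; the codec labels, group sizes and function name are invented.

```python
import random


def pick_codecs(groups, rng):
    """Choose five codecs for one listener: one from every bit-rate group,
    plus two more drawn from a single randomly chosen group.

    groups maps a group name to a list of codec labels; rng is a
    random.Random instance so that allocations are reproducible.
    """
    # One codec from each of the high, medium and low groups.
    chosen = [rng.choice(codecs) for codecs in groups.values()]
    # The remaining two come from one group that has enough codecs left.
    eligible = [name for name, codecs in groups.items() if len(codecs) >= 3]
    extra_group = rng.choice(eligible)
    remaining = [c for c in groups[extra_group] if c not in chosen]
    chosen += rng.sample(remaining, 2)
    rng.shuffle(chosen)  # presentation order should not reveal the group
    return chosen
```

Each listener therefore hears three codecs from one group and one from each of the other two, matching the example of one high, three medium and one low bit-rate codec.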
Ten test items were used covering a varied selection of material, including applause, harpsichord, sax and piano, a church organ and Robert Plant.
IRT carried out the analysis to produce the test results. Some listeners were rejected if they fell outside of the Spearman Rank Correlation threshold, which compares the ranking given by each listener with the overall rankings. After this process some codecs dropped below the minimum of 15 listeners and so extra listening tests had to be carried out.
David went on to show the various test results and explained that some of the codecs used for the test were pre-production prototypes, or have since been upgraded. One common element was that the most difficult item to encode – usually the applause – normally ranked much lower than the mean. For example, one codec was rated 30 on applause but 90 on music, proving that perceptual coding is very content-dependent. [Note: This report does not list the codecs involved or their rankings due to the risk of misrepresenting the current performance of those codecs.]
The conclusion from Phase 1, as would be expected, was that higher bit rates produce better quality. The detailed results for each codec have hopefully given their developers something to work on in terms of improving their performance.
Phase 2 retained the applause sample from Phase 1 as a reference item but the other samples, although similar in terms of content type, were different. When results from Phases 1 and 2 were compared they were similar, proving that the testing methodology was valid. Phase 2 again showed that excellent quality can be achieved from low bit-rate codecs, but not for every type of content, and again it gave the developers guidance on areas where improvements can be made.
Phase 3 combined cascaded high bit-rate distribution codecs such as Dolby E, apt-x and Linear Acoustics with a selection of emission codecs. Ten items were selected from the samples used in the previous tests and these were cascaded five times through the same distribution codec before being passed through one or two different emission codecs. Various combinations were tested.
It was decided to use BS1116 rather than MUSHRA for this phase. Because this is an impairment scale, it was not possible to make any direct comparisons with the results of Phases 1 and 2. The conclusion was that distribution codecs still introduce some impairment, having the effect of creating a ‘ceiling’ to the overall quality attainable. The recommendation therefore is to use the highest bit rate possible.
Overall conclusions from these listening tests were that perceptual coding is still an imperfect art and there is room for improvement. Analysis is not easy, but these tests do reveal things that objective tests could never do, as well as uncovering things you wouldn’t expect.
Meeting report by Bill Foster
Date: 15 Dec 2009
Conducted by Keith Howard
Bob Stuart has been a major figure in the British audio industry for over 30 years. Best known as Chairman and co-founder, with Allen Boothroyd, of what is today Meridian Audio Ltd, he has done much more than steer the company through challenging times to its current high-profile position manufacturing some of the most sophisticated audio equipment available. A pioneer of active and then DSP-equipped loudspeakers, he was quick to recognise the potential of CD and, as part of the ARA, to push for a version of DVD dedicated to high-resolution multichannel audio. Meridian’s own lossless compression algorithm, MLP, was developed in anticipation of this and selected by the DVD Forum for DVD-Audio in a technology shoot-out against stern competition. In expanded form it remains the basis of the Dolby TrueHD lossless compression scheme used in Blu-ray Disc. With a long-standing interest in psychoacoustics, which he studied alongside electronic engineering at Birmingham University, Bob is one of very few creators of high-quality audio equipment to have explored the fundamentals of sound perception and generated computer models of human hearing to help guide the design process. In recent years, in collaboration with Peter Craven, he has investigated the effects of digital anti-aliasing and reconstruction filters, one intriguing result being that Meridian’s latest flagship CD player – the 808.2 Signature Reference – uses minimum-phase rather than linear-phase output filtering.
These subjects and many others are covered in this interview, with Bob presenting supporting material to clarify the issues.
Date: 14 Jul 2009
Lecture by Michael Page of Peavey Digital Research
Advances in audio distribution and control over digital networks have delivered tremendous benefits for operators of large venues and premises, such as theme parks, cruise ships, stadiums, live performance venues, airports and industrial complexes. Audio for entertainment attractions, background music, paging systems and evacuation purposes may all be transported and controlled on a single distributed system, via Ethernet and IP local area networks. Audio processing for acoustic correction, routing, mixing and other processes is all easily performed using programmable DSP, located both centrally and at distributed nodes. Michael Page of Peavey Digital Research will discuss and demonstrate the technology used to achieve this.
Michael started his talk by listing the range of applications for the networked audio DSP systems he’d come to talk about: a diverse range including airports, stadiums, theme parks, ports, houses of worship, legislatures, and convention centres. Then, displaying an aerial view of the truly gigantic Hartsfield-Jackson Atlanta Airport, he posed the question: what does it take to wire an airport for sound?
It sounded like a straightforward question, until Michael started discussing it. He started by talking about the audio system outputs: each boarding gate area (all 179 of them, at Atlanta) needs an individual output, each lounge, each concession, each arrivals hall zone, each check-in zone, each customs hall, each luggage reclaim zone… not to mention all the non-public areas. Each of these many hundreds of outputs needs, in addition to level control: EQ for loudspeaker correction, EQ for room correction, delay for time-alignment, possibly dynamic range processing, and possibly ambient level sensing. Ambient level sensing is a particularly complex DSP function: it uses a measurement microphone to detect the ambient level in a space, so that the level of the loudspeakers can be adjusted to ensure a consistent signal-to-ambient-noise ratio for the listeners. But if the audio system is active while this measurement is being made – as is often the case – sophisticated DSP is needed to “null-out” the contribution from the loudspeakers from the signal picked up by the microphone, in order to obtain an accurate measurement.
Next, Michael considered the inputs. Each boarding gate has a paging station, plus paging stations for every lounge, concourse and information desk, that may be routed to any system output. There may be background music inputs for lounges; automated message playback systems (“Please do not leave your bags unattended”, etc.); automatic announcements from the fire alarm system; and all these need to be prioritised, so that evacuation announcements aren’t blocked by the background music, for example. Each input typically needs EQ and dynamics processing, and needs to be routable or mixable to any combination of the several hundred outputs. So: we’ve got an audio system with several hundred inputs, several hundred outputs, all connected with a giant intelligent mixer, and a sizeable amount of DSP on every input and output. The inputs and outputs are distributed over six huge buildings, across a site a mile long and half a mile wide, and it has to integrate with the security, life safety, building management and enterprise management systems. Finally, it needs to be extremely robust and redundant, so that it keeps running even if the system sustains major component or infrastructure failures, perhaps caused by a large fire or a bomb explosion. Wiring this for sound isn’t as simple as first thought!
Michael next considered another application: stadiums need to get high-quality sound to every seat in the stadium, despite huge acoustic differences in the seat and loudspeaker placements. So each block of seats needs separate loudspeakers and processing – plus zones for all the internal areas: locker rooms, bars, restaurants, VIP areas, conference centres, car parks, atriums, etc. Stadiums don’t need as many inputs as an airport, but the ambient level processing is even more critical due to the difference in ambient level between sides of the stadium, at crucial points in events!
Finally, Michael discussed the requirements of theme parks. Audio-visual experience attractions such as “Terminator 2 3D” at Universal Studios are an obvious application: audio is a fundamental part of these attractions, and they require high-level, high-quality audio reproduction from a large number of independent channels to be precisely synchronised, sometimes interactively, with the motion control and video control systems that create the other dimensions of the visitor experience. Audio reproduction, even if it is only zone-specific background music, is usually present at pretty much every publicly accessible location in a theme park, and all public areas must have audio coverage for life safety announcements such as fire evacuation. As with airports, the wide range of different acoustic environments and geographic spread requires a large number of independently-addressable audio zones. The audio system inputs may be local, such as interactive audio playback within rides, or remote, such as background music, advertisements or paging announcements. Parade grounds and live shows complicate matters further, with many radio microphones and loudspeakers covering a very large area.
Now the problem is understood – how is it solved? Traditionally, it was analogue: large quantities of analogue multicore, thousands of crosspoints of punch-on patchbay, and many racks of analogue signal processing. It was difficult and expensive to engineer robust system redundancy, and very difficult to get computer-interfaced control of the audio signal processing. Each audio channel required a balanced line connection, requiring a huge quantity – and weight – of cable.
All this was revolutionised in the early 1990s by the arrival of DSP technology. DSP brought huge cost and functional benefits to the audio installation industry, for two principal reasons: it allows very simple interfacing of audio functionality to computer systems; and it permits arbitrary, heterogeneous arrangements of audio DSP functionality to be realised cost-effectively in generic hardware. The other crucial development from digital audio technology was digital audio networking, carrying many channels of uncompressed audio at low latencies on standard computer networking infrastructure. Analogue audio multicore cables were hugely expensive to buy, and even more expensive to install, whereas computer networking cables are flood-wired into all commercial and public buildings. So despite the relatively high transceiver costs, audio networking was a vastly cheaper way of getting audio around a commercial building. It also implicitly provided computer-controlled audio signal routing, saving the cost of expensive dedicated audio routers. The de-facto standard audio networking technology for the commercial installation industry has been CobraNet since the late 1990s, which is Ethernet-based (layer 2), has a latency of about 5 milliseconds, and conveys up to 64 channels of audio in both directions.
To indicate the state of the art in audio networking, Michael spoke about a relatively new technology called Audinate Dante. It’s Internet Protocol based, typically runs over Gigabit Ethernet, and it’s scalable for both bandwidth and latency, making it very flexible for a wide range of applications. It may be configured for performance comparable to CobraNet, but in principle it can also function (with higher latency and lower bandwidth) over poorer-quality networks such as the public internet, or alternatively it can function as an ultra-low-latency, ultra-high-bandwidth point-to-point link between audio processors.
Michael then explained how these technologies are brought together. A system typically comprises some number of analogue audio i/o units, DSP units and control interfaces, connected by an Ethernet network for audio and control data, but these units may all be physically remote from each other. Control interfaces are used to communicate with user interface devices, uninterruptible power supplies, fire and life safety systems, building services management (HVAC) systems, show control systems, and many other possibilities.
This is a very successful technology area, with a number of companies actively competing. Peak Audio developed the first product of this kind in the early 1990s, the MediaMatrix system, which comprises a PC-AT motherboard with custom ISA backplane, DSP ISA cards with Motorola 56K DSPs, and analogue i/o boards. This was first used to provide an adaptive, distributed sound reinforcement system in the US Senate Chamber, which posed some unique challenges that could only be solved by computer-controlled DSP. The MediaMatrix product was licensed to Peavey, who manufactured and distributed it, and it was extremely successful.
The second generation MediaMatrix product was the Nion, launched in 2004. It has a PowerPC CPU running embedded Linux, for distributed control and communications with the other Nions on the network, and monitoring the DSPs and audio interfaces. It has a number of Analog Devices SHARC floating-point DSPs, a proprietary high-bandwidth low-latency audio link bus that uses Cat-5 cable to connect Nion units together, and a CobraNet interface module. CobraNet has such high bandwidths and low latencies that it needs dedicated data processing hardware: a generic CPU doesn’t have sufficient network performance. It also features a selection of “general purpose I/O” connections for control interfacing: logic i/o, relays, high-current outputs (for driving lamps, solenoids, etc.), control voltage inputs and outputs, and rotary encoder connections, for creating simple custom control panels.
Michael demonstrated the NWare software, a Windows application used for defining the DSP and control functionality. It has a graphical user interface resembling a CAD drawing tool, allowing the user to drag-and-drop blocks representing DSP functions, audio i/o, control functions, control scripts, and many other functions. It also allows creation of custom control panels for PC-based or touch-screen user interfaces. When the design is complete, the “deploy” button is pressed to generate the DSP and control code, and download it to Nions connected on the network, which immediately take on the designed functionality.
The lecture was wrapped up with a look at the NWare system design for the MediaMatrix system at Emirates Stadium. As well as huge quantities of signal processing blocks, it featured touch-screen graphical user interfaces based on architectural plans of the stadium for ergonomic control and monitoring of audio across many different zones in the stadium at once. Custom support for communicating with UPS devices is implemented in the Python scripting language, which executes on the Nion. This vast system design gave a flavour of the tremendous complexity of the audio system implemented with MediaMatrix.
The NWare software can be downloaded for free from the downloads section on the MediaMatrix website.
Meeting report by Michael Page
Date: 3 Jun 2009
Special lecture by George Massenburg of George Massenburg Labs
George Massenburg needs little introduction – even if you don’t know of him, you have probably heard his recordings. For a detailed biography, see www.massenburg.com/cgi-bin/ml/bio.html.
What is difficult to represent in this report is the passion George exudes about music, a passion which drives him to strive (and help others to strive) to continually improve the quality of recorded music. Many recordings were replayed in the course of this lecture, some made by George, others not. Most were 192kHz, 24-bit; some were transferred from analogue master tapes.
George began by replaying a Diana Krall track, pointing out the subtlety and detail captured by Al Schmitt. In a change of style, the next track was by Neil Young – a new song about the recent financial crisis with the chorus line “A bailout is coming, but not for you”. Elements of the recording were described, there being a pair of guitars (slide and acoustic), rock’n’roll drums and a hi-hat “somewhere in the background”.
George then played a clip from YouTube of a recent and currently very popular track by Autotune The News (their second track, pirates. drugs. gay marriage), an original piece where television newsreaders have been cleverly edited in time and pitch such that they appear to be singing. The point here? Although the YouTube clip has been extremely popular (it received 1.5 million hits in the first week, possibly setting a web record), and although George admitted to thinking it “brilliant”, the audio quality is very poor. George pointed out that repeated listening at this YouTube-quality quickly gets very annoying because of the low-fidelity sound.
George then played the results of some subtraction tests on lossy audio codecs, a technique which George refers to as the Moorer test as it was originally suggested by James A Moorer. In these tests, high-quality 192kHz, 24-bit recordings were converted to various encoded forms such as MP3 and AAC. The encoded files were then decoded and upsampled back to the original 192kHz, 24-bit. A sample-by-sample subtraction was then performed, and the resultant difference – the error introduced by the codec – then replayed. The resulting error signal is surprisingly high in amplitude (estimated by George as typically 25-30% peak), clearly correlated to the signal and with a complex relationship to the original sound (not simple harmonic distortion).
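The subtraction itself is simple to sketch. The following minimal example is not George's actual tooling: `crude_codec` is a hypothetical stand-in (coarse requantisation) for a real encode/decode/upsample chain, used only so that the example runs on its own. With a real codec the decoded file would also need to be time-aligned with the original before subtracting.

```python
import math

def difference_signal(original, decoded):
    """Sample-by-sample subtraction: what remains is the error the codec added."""
    if len(original) != len(decoded):
        raise ValueError("signals must be time-aligned and equal in length")
    return [a - b for a, b in zip(original, decoded)]

def peak(signal):
    """Largest absolute sample value."""
    return max(abs(v) for v in signal)

def crude_codec(signal, steps=8):
    """Hypothetical stand-in for encode + decode: coarse requantisation only."""
    return [round(v * steps) / steps for v in signal]

FS = 192_000  # sample rate of the original recording, in Hz
# 10 ms of a 1 kHz sine as a placeholder for the high-resolution original
original = [math.sin(2 * math.pi * 1_000 * n / FS) for n in range(FS // 100)]
decoded = crude_codec(original)

error = difference_signal(original, decoded)
error_ratio = peak(error) / peak(original)
print(f"peak error relative to peak signal: {100 * error_ratio:.1f}%")
```

Writing `error` out as an audio file and listening to it is the step that matters for the test; the peak ratio is only a rough indicator of how large the difference is.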
George takes the view that his students should learn to recognise the nature of the codec error using this subtraction method and then listen to the encoded music. Using this learning technique, listeners can familiarise themselves with the artefacts’ sound in isolation, and can subsequently pick them out more readily when the encoded material is played.
George believes that every time we hear a piece of music we should have the possibility of hearing something new – “to take home something else” – and that this is more readily achieved with high resolution recordings. Although George concedes that it’s possible to make “pretty good” 44.1kHz/16-bit CDs, he remembers the first time he heard a digital recording: rather than being impressed, he was “horrified”.
Subsequent work to push the boundaries of converter technology (George recalls the contribution of Paul Frindle in this area) has convinced him that good digital now is good. He believes that we don’t have to go back to magnetic tape to make good records, and describes himself as having “an easy peace” with both vinyl and analogue tape.
George deprecates recording techniques in which small elements are recorded separately and later combined/corrected/stretched/re-tuned, etc. He believes a key to great music recording is to maintain a performance focus. Preferably, the band should perform and be recorded playing simultaneously in the same space. George offers these suggestions to help your next recording:
- 1) Only use destructive record.
- 2) No punch-ins.
- 3) No one is allowed to take the recording home and ‘tweak’ it – they can do another take, but the previous one will be overwritten.
The AES UK section and George wish to thank the companies who kindly supplied equipment for this lecture, namely ATC (monitor loudspeakers), Digidesign (ProTools system), Arcam (DVD player) and Prism Sound (D/A converters).
Report by Nathan Bentall (edited by Keith Howard)
Date: 12 May 2009
Lecture by Jim Anderson of Jim Anderson Sounds
The former New York Times film critic Vincent Canby wrote: “all of us have different thresholds at which we suspend disbelief, and then gladly follow fictions to conclusions that we find logical.” Any recording is a ‘fiction’, a falsity, even in its most pure form. It is the responsibility, if not the duty, of the recording engineer, and producer, to create a universe so compelling and transparent that the listener isn’t aware of any manipulation. Using basic recording techniques, and standard manipulation of audio, a recording is made, giving the listener an experience that is not merely logical but better than reality. How does this occur? What techniques can be applied? How does an engineer create a convincing loudspeaker illusion that a listener will perceive as a plausible reality?
Jim Anderson: Professor of Recorded Music, Clive Davis Department of Recorded Music, New York University
Jim started his lecture with the attention-grabbing statement that audio recording is trickery, a devious deception – then expanded the point to explain that the aim is to make you, the listener, believe you’re hearing the truth: but actually it’s sleight of hand. He set about illustrating that by playing back a diverse range of audio recordings over the course of the lecture and discussing them, casting some light onto the techniques and tricks he’d used to exercise that devious deception: and without exception, create musical listening experiences of quite exceptional quality.
Jim started by playing the commercial release of J. J. Johnson’s “The Brass Orchestra” – it was extremely punchy, dynamic, and live-sounding. He then played another track: while obviously the same piece, and possibly the very same performance, it had much less impact, drums were much quieter, the soloist was clearly off-mic – this was from a simple stereo pair of mics to capture the “air” of the room, and illustrated the striking difference between the somewhat artificial, yet highly-appealing experience created by the commercial release, and the fly-on-the-wall experience of the performance – which is arguably the “real” experience. Jim then discussed some of the details of this performance and the techniques he’d used to create the “false”, yet plausible and appealing final product: it was captured live-performance-style in a single take with no overdubbing; microphone selection was key in realising tonal and dynamic differences within the group; the studio had a “good” acoustic for performance, but this was enhanced with artificial concert-hall reverb. The artist wanted to mix first without the solos, in order to get all the internal balances right: then add the solos later – so the whole thing was mixed twice.
Jim expanded on the microphone selection points by playing “High Noon – The Jazz Soul of Frankie Laine” featuring Gary Smulyan, baritone sax player. Jim used ribbon microphones, with their smooth, easy sound, on all the nine-piece backing group; but used condensers to bring the baritone sax and French horn into sharp dynamic focus. This allows the backing to be up-front in the mix while keeping the sax solo sounding appropriately prominent.
To illustrate another interesting technique, Jim played drummer Marvin “Smitty” Smith tracking “The Road Less Travelled”: Marvin had requested “more depth, more breadth” in the kick drum. Jim met this requirement by using a Beyer Opus 51, a boundary effect mic designed for piano, under a sheet of wood to isolate it from the rest of the kit. He used two Opus 51s and an M88 in the middle, to create a mid/side array. In stereo, it creates a perfect image of the kit: in mono, it collapses and provides a remarkably leakage-free kick drum.
Among other recordings Jim discussed, he played a track by Patricia Barber, recorded in Chicago. It had an extraordinarily huge, deep, broad-sounding kick drum, very prominent and snappy drums in general, whereas the female vocal is up-front yet full in the low-mids. He then played another recording, with the same trumpeter in the same room, yet smoother-sounding – because it’s a tube mic rather than ribbon. The kick drum is only 18”, but with good tuning and an M/S mic it gives the huge depth and finish.
All recordings played so far had been tracked straight to digital: Jim’s next recording was a modern attempt to recreate the classic 1970s Blue Note sound, for an album called “Hubsound – The Music of Freddie Hubbard”. Contrary to direct-to-digital tracking, this was done using a 16-track 2” at 15 inches-per-second with no noise reduction. It’s impossible to make lots of overdubs because 16 tracks is very limited. In this way, it emulates not only the sound, but also the practical constraints and therefore the recording techniques, of the Blue Note vintage.
Next up, we heard Gonzalo Rubalcaba performing “Here’s that Rainy Day” in Criteria Studio A in LA: solo piano in a large live rectangular room. Mics were a U87 above, DPA 4007 close, DPA 4006 a little further back: and beyond that, a pair of U87s in a modified Polyhymnia configuration, so the room sound was also captured in case a surround mix was subsequently needed.
He then played for us Bebo Valdes, a live recording done in a recording truck at the Village Vanguard nightclub. Mics were just a Sanken CUW180 with a pair of ratchet movable capsules, here set up for X/Y. Mic pres with A-D were on stage, plus an audience microphone, and optical links connected the A/Ds to the truck. The recording setup was triple-redundant with Tascam DA98s, but the primary recorder was ProTools HD. Jim created a rough mix on a Yamaha DM2000, for the performers to check each performance immediately afterwards. Mics were a combination of omnis and cardioids on piano, the Sanken X/Y on bass, and omnis on audience. The worth of the latter was shown when the audience started singing along – precise capture of the audience really added atmosphere to the final product.
Jim concluded by playing us his first ever jazz recording – Ella Fitzgerald at the New Orleans Jazz and Heritage Festival in 1977. She knew Stevie Wonder was in the audience, so called him up to join in! The encore was the duet “You Are The Sunshine Of My Life”. It was a pretty magical moment to capture for a first jazz recording: particularly as immediately after the end of the song, the tape ran out, right then! A close-run thing.
Jim wrapped up this interesting talk – and listening session – by maintaining he’s the liar! Thanks to PMC and Arcam for the superlative audio reproduction system kindly lent to us for the evening.
Meeting report by Michael Page
Date: 9 Jun 2009
Lecture by Philip Hobbs of Linn Records
Philip Hobbs is a Producer and Audio Consultant at Linn Records Ltd. He first worked for Linn in 1982, leaving to study on the Tonmeister course, returning to Linn in 1987 after graduating. Philip’s main roles at Linn have been in music recording and speaker design. Philip described himself as being the ‘worst sort of communicator’, because, according to him, he is both ‘Scottish and an engineer’.
Philip talked tonight of how Linn’s business has been ‘transformed over the last 3 years’ by the introduction of their music download service, a service where the customer can choose the download quality all the way to 192kHz, 24-bit and where all the downloads are DRM-free.
Philip gives a ‘two minute trip down memory lane’ of how Linn started
Linn was founded by Ivor Tiefenbrun as an offshoot of Castle Precision Engineering, a machining company who made parts for such things as aircraft and Rolls Royce. The original home of Linn, Linn Business Park, gave Linn its name, and Linn established its HiFi pedigree with the Linn Sondek LP12, a record player still in production today.
Linn expanded its product range and has made amplifiers, CD players, active loudspeakers and Digital Stream Players which stream files from hard disk.
Linn The Record Label
Like many other hardware manufacturers, Linn developed an interest in the recording industry. The initial motivation was to make recordings to test the reproduction capability of the LP12 and to investigate vinyl cutting lathes to the same end, but Linn has subsequently blossomed into a serious audiophile record label with many original recordings.
Although Linn has traditionally focused on classical music, Carol Kidd was the first Linn ‘proper jazz artist’ and Linn released her first album in 1983. Linn made an initial pressing of 7000 records, selling them through record shops. In 1984, Linn hooked up with a band called Blue Nile, releasing their first album ‘A Walk Across the Rooftops’. Blue Nile were keen to sell lots of records and Linn ‘spent hundreds of thousands trying to get them to release their second album’. Philip, who was designing speakers for Linn at this time, estimates the total bill in relation to Blue Nile to be just under £1 million. By 1992, Linn were working on building their classical catalogue in the ‘standard boutique label’ philosophy by focusing on recording quality.
By 2006, Linn had around 250 titles and had established distribution in Japan and America. In Philip’s words, the business ‘was a complete catastrophe’, as it was ‘not economically viable to sell CDs in commercial retail space’, a trend, according to Philip, that had been developing since the 1990s. Philip recalls that the situation had become so bad by 2006, Linn were faced with a decision either to leave the record business altogether, or to find some radical new approach – to find ‘a way to get back to the customers’ avoiding ‘the frustration that traditional retailing gives to many companies, and record companies in particular’, namely that ‘the company is so far away from the people they’re selling to’.
The conclusion at Linn was that they needed to use the internet ‘to connect directly to the customers without compromising on quality’. At this time, Apple’s iTunes service was well established and, thanks to the widening availability of fast broadband services, the ‘possibility you could sell someone 1GB of data’ was becoming a reality.
So Linn built a web site where customers could download music directly – similar to the iTunes idea, but with a unique selling point – the ability to provide lossless downloads at up to 24-bit, 192kHz sampling rate, where the customer is free to choose the download resolution/sampling frequency from studio-master down to MP3-level quality. Like iTunes, customers can buy individual tracks or whole albums, with the higher quality downloads commanding a higher price tag.
According to Philip, despite an initial cost of £100k, the site is now profitable after around 2.5 years of service.
Philip stated that the more traditional music distribution method of physical media in shop-retail results in around a 15-20% share of the ticket price being returned to the record label. For example, if a CD retails for £15, £2.50 for the record company would be considered as ‘doing pretty well’. With the download service, which operates in the absence of ‘middlemen’, the margins increase considerably. Philip estimates that the download business returns around 80% of profit, and further, their profits would probably increase were they to move entirely away from physical media (which Linn continue to support out of loyalty to a minority of, presumably similarly loyal, customers).
High Quality Master Recordings
A key factor that made Linn’s Hi-Res download business viable was their long-term focus on making recordings of the best quality possible, a commitment which had led them to make many of their master recordings at 96kHz or 192kHz. This resulted in a ready supply of high-resolution back catalogue. This situation was, according to Philip, in contrast to many other record companies whose masters were typically made/archived at 44.1kHz or 48kHz.
Linn have a base of around 120,000 customers. With their focussed direct-marketing approach, a Friday evening email-newsletter often results in many £1000s of business by Monday from their download service.
Philip also points out that these direct marketing activities rarely offer significant discounts (which would reduce profits) – usually, they are simply aimed to draw the customer’s attention to some new material or other works that may be similar to previously purchased material.
Download Usage – how are the customers using the downloads?
Philip sees 4 main customer types split by playback method.
- PCs with sound cards, Windows Media Player, iTunes etc.
- Portable Devices (iPods, Zune etc.)
- Burn-to-Disc – customers making CDR/DVD-R copies
- Streamed Media Players – from Linn and others – files streamed from local server
The DRM Issue
Linn considered the possibility of using DRM to protect their downloaded material, but at the time the web site was being prepared, it became clear to Linn that DRM just didn’t work sufficiently well. According to Philip, many people felt that moral arguments eventually killed DRM, but he wonders whether it was in large part due to an inability to make a smoothly working system (and without imposing excessively limiting restrictions on the customers).
Offering such a wide range of download quality options has provided Linn with some interesting statistics on the decisions customers make when offered a quality/cost choice. Despite price differentials, in 2007, 25% of purchases were of the ‘studio master’ quality. By 2008, the figure had risen to around 50%, and so far in 2009, this seems to have risen further to around 70%. Of the CD quality albums downloaded, customers are showing a 50/50 split between choosing FLAC and WMA.
Additionally, of those customers who purchased studio master quality downloads, where they were offered a choice between 96kHz or 192kHz, 80% chose the higher rate in spite of the fact that many players can’t play 192kHz!
Further, Philip is convinced that around half the customers who have purchased studio-master quality downloads don’t currently have the playback equipment to support the sample rate/bit-depth they bought. His conclusion is that given a choice, Linn’s customers prefer to buy the best quality available. If this seems odd, there may be some logic to it, one that in some ways maintains the Linn tradition. The original Linn LP12 can be upgraded all the way to its current production specification. This ability to upgrade has been a Linn philosophy, at least for the LP12, for many years. By upgrading the equipment, the customer can benefit without buying into a whole new format. If the customer buys the Studio Master, the data they get is all that was recorded – it is essentially ‘as good as it will ever be’ – and with such a music collection, future equipment upgrades may offer further sound improvements when replaying the original material, in many ways similar to vinyl.
High Resolution Benefits
Philip gave a striking demonstration of the potential enjoyment offered by high resolution and high quality recording by playing Handel’s Messiah conducted by John Butt (and where Philip was himself the recording engineer). Unbeknown to the audience, the recording began at 88.2kHz/24-bit, but as playback progressed, the quality dropped to 44.1kHz/16-bit, then to 192kb/s MP3, then to 96kb/s MP3. Although these differences were not immediately obvious to all (at least in the listening environment in which they were presented), Philip described how it was common for the listener’s attention to progressively drift to other matters as the bit-rate dropped: they ‘tend to get bored and start thinking about something else’. This certainly described my personal experience with surprising accuracy.
Philip briefly demonstrated one of the Linn Streaming Players which offer one possible method of replaying the downloaded material. One of the benefits Philip sees for customers with this type of equipment is a significant increase in convenience. Gone are the walls of CD/LP shelves, replaced by a compact hard-disk-based server and controlled via a little application on their iPhone, a use case Philip describes as ‘addictive’.
Linn are beginning to diversify. They have taken on a couple of small labels and are offering downloads for them alongside their own material. For those interested in purchasing downloads, Linn’s web site may be found at http://www.linnrecords.com/ and test files for evaluating quality (and compatibility) can be found at http://www.linnrecords.com/linn-downloads-testfiles.aspx
The AES would like to thank Philip for his fascinating talk. I’m sure many members were greatly encouraged to hear that there are still many customers for whom recording quality is something worth paying for.
Report by Nathan Bentall
Edited by Keith Howard
Date: 13 Jan 2009
Lecture by Thomas Lund of TC Electronic
Thomas Lund, TC Electronic A/S
Thomas Lund’s background includes work as a recording engineer and musician and the study of medicine – an unusual combination which may contribute to his understanding of loudness perception. Thomas has also been involved in the design of many of TC Electronic’s products; he has contributed to various standardisation groups on the subject of loudness, and has authored many papers presented to the AES and other bodies.
Traditional Loudness Measurement
Recent years have seen the ‘level’ of pop/rock music, as delivered by CD, steadily increase. Thomas cited the simple way that audio level has been measured as a partial cause. Historically, audio level has often been measured by peak programme meters, and commonly used definitions of overload have been very simplistic methods such as peak-level-counting (eg three consecutive full-scale samples equals overload). Such simple techniques of measuring (and by association, limiting) the level may have worked well when systems consisted of a microphone, a preamp and an ADC but with digital processing techniques numerous methods have been devised to increase the apparent loudness of material delivered on CD while ‘working around’ the peak-level limitations, apparently (we must assume) to some perceived commercial benefit to the record industry.
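The sample-counting definition of overload mentioned above can be sketched in a few lines. This is only an illustration of the idea, assuming 16-bit integer samples; the function name and the default threshold are assumptions for the example, not taken from any particular meter.

```python
def has_overload(samples, full_scale=32767, run_length=3):
    """Flag an overload when `run_length` consecutive samples sit at full scale.

    Assumes 16-bit integer samples; -32768 also counts as full scale.
    """
    run = 0
    for s in samples:
        if abs(s) >= full_scale:
            run += 1
            if run >= run_length:
                return True
        else:
            run = 0
    return False

clipped = [0, 12000, 32767, 32767, 32767, 9000]
# Limited one code below full scale: never counted, however loud it is
limited = [32766, -32766] * 50

print(has_overload(clipped), has_overload(limited))
```

The second signal shows why such a test is easy to work around: material held just below full scale passes it regardless of how heavily it has been processed.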
Many hold the opinion that such ‘hot mastering’ techniques are severely detrimental to the overall quality of modern music releases. Thomas calls this drive for increased level whatever the cost, coupled with a high willingness of broadcasters and consumers to use large amounts of data compression (for archiving, broadcast and replay), a ‘war on music’.
The Problems of Incorrect Levels
With such hot-mastering techniques, it is trivial to generate digital signals that exceed 0dBFS in the analogue output, after the assumed reconstruction or up-sampling filters. The greater-than-0dB peak levels can cause serious problems in the reproduction chain where some processes have been implemented with the assumption that 0dBFS is the largest signal they should expect.
Thomas offered demonstrations based on a commercially available ‘professional-grade’ sample rate converter, subtracting output from input. In this experiment the output should have been silent but differences could be heard clearly, manifested as ticks and signal-related noise. Other potential problem areas, according to Thomas, include limiting in mix-busses and codecs such as MPEG-1 Layer 3. These processes can all exhibit similar problems when faced with very high level inputs, a phenomenon Thomas further demonstrated. The codec problems can depend on the implementation of the codec as well as the codec itself.
Because of these issues, Thomas recommends normalising to -3dBFS, not to 0dBFS, in digital mixing and recording situations. He pointed out that the final 3dB increase can be done in the mastering room without any real quality loss, given that most recordings use 24 bits.
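As a rough worked example of that recommendation, the sketch below computes the gain needed to bring a peak to -3dBFS and expresses the 3dB of headroom in bits, using the usual figure of about 6.02dB per bit. The function name and the example peak value are assumptions for illustration.

```python
import math

def normalise_gain(peak, target_dbfs=-3.0):
    """Linear gain that moves a linear peak (1.0 = full scale) to target_dbfs."""
    if peak <= 0.0:
        raise ValueError("peak must be positive")
    return 10.0 ** (target_dbfs / 20.0) / peak

DB_PER_BIT = 20.0 * math.log10(2.0)   # about 6.02 dB per bit of word length
headroom_bits = 3.0 / DB_PER_BIT      # word length given up by leaving 3 dB

gain = normalise_gain(peak=0.5)       # hypothetical mix peaking at about -6 dBFS
print(f"gain: {gain:.4f}, headroom cost: {headroom_bits:.2f} of 24 bits")
```

Leaving 3dB unused costs about half a bit, which is why it can be restored at mastering with negligible penalty in a 24-bit system.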
Better Methods of Loudness/Level Measurement
Thomas gave a functional summary of various improved methods of measuring loudness level and showed relative results based on ITU-R BS.1770. A simple improvement is the over-sampling peak programme meter which offers a more accurate representation of the true peak level.
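A minimal sketch of an over-sampling peak estimate follows. It is not the BS.1770 reference filter: it uses a Hann-windowed sinc interpolator with an assumed 4x oversampling ratio and an assumed kernel half-width of 16 samples, which is enough to show the effect. The test signal is a sine at a quarter of the sample rate whose samples all land 45 degrees away from the waveform peaks, so a sample-peak meter reads 0dBFS while the reconstructed waveform peaks about 3dB higher.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def true_peak(samples, oversample=4, half_width=16):
    """Estimate the reconstructed peak by windowed-sinc interpolation."""
    peak = 0.0
    for n in range(half_width, len(samples) - half_width):
        for step in range(oversample):
            t = n + step / oversample          # fractional sample position
            acc = 0.0
            for k in range(n - half_width + 1, n + half_width + 1):
                d = t - k
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
                acc += samples[k] * sinc(d) * window
            peak = max(peak, abs(acc))
    return peak

def to_db(value):
    return 20.0 * math.log10(value)

# Sine at fs/4 with 45 degree phase offset, scaled so every sample is +/-1.0
signal = [math.sqrt(2.0) * math.sin(math.pi * n / 2.0 + math.pi / 4.0)
          for n in range(256)]

sample_peak_db = to_db(max(abs(v) for v in signal))
true_peak_db = to_db(true_peak(signal))
print(f"sample peak {sample_peak_db:+.2f} dBFS, true peak {true_peak_db:+.2f} dBFS")
```

A production meter would use a proper polyphase interpolation filter, but the principle is the same: look between the samples before deciding what the peak level is.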
Thomas also presented a loudness meter available from TC Electronic as a plugin for Pro Tools as ‘LM5 Loudness Radar Meter’.
TC LM5 Loudness Radar Meter
This meter includes representations described as ‘Loudness Units’ (LU) or LkFS, ‘Consistency’ and ‘Center of Gravity’, where Center of Gravity indicates the overall loudness of the programme material or music track, and Consistency indicates the ‘intrinsic loudness changes’ present in the track, with 0 representing a steady-state signal (one which has no loudness changes at all, eg a sine-wave) and progressively more negative numbers indicating reducing Consistency. Low Consistency scores such as -4 or lower indicate that the material may have a large dynamic range.
In conclusion, Thomas offered the following recommendations:
- Stop Counting Samples: There are better methods of measuring peak levels than counting the number of consecutive full-scale samples
- True Peak Level: Set maximum peak level at -1dBFS using a true peak meter equipped with oversampling capability.
- Dialog Level: Suggested level of dialog is -26 to -22 LkFS.
- Music: Suggested level of music is -20 to -20 LkFS.
- Avoid Peak level normalisation
If audio level is anchored only to peak level or only to dialogue, both commonly used techniques, loudness chaos is likely to ensue with extreme level jumps between programme, commercials and other home sources.
The tools and understanding exist to provide well-balanced loudness levels between different programmes and material, providing the end-listener a more pleasant viewing/listening experience and the potential for reduced distortion and overall quality improvement. Thomas outlined the problems and offered tools and methods for solving them.
Report by Nathan Bentall (edited by Keith Howard)
Date: 20 Oct 2009
Lecture by Peter Mapp, Mapp Associates
Everyone wants ‘high quality’ sound – but what does this mean? Is sound quality measurable? Is it predictable? The talk will look at how we can assess sound quality – both in large spaces such as concert halls, cathedrals and even railway stations as well as in small rooms such as home theatres and hi-fi listening spaces. After introducing a number of parameters and concepts that affect sound quality and the listening experience, Peter will discuss how these can be measured and potentially predicted. In particular the use of 3D computer modelling of rooms will be highlighted together with the importance of bass frequency reproduction. A number of case studies and examples of problem sound systems/rooms will be presented. Peter will conclude the talk with an insight into some of his latest research and the introduction of a new measurement/assessment concept, SQI – the Sound Quality Index.
Date: 10 Nov 2009
Lecture by John Dawson, Arcam
Download audio recording of lecture (14MB MP3)
A modern Audio-Video amplifier/receiver (AVR) is an exceedingly complex piece of consumer electronics, requiring expertise in many aspects of analogue and digital audio and high definition video, plus considerable software skills. As such it represents a huge project for any small to medium sized audio company. This lecture takes a look inside the Arcam AVR600 – one of the few such units developed outside of the large Japanese CE companies – and will discuss some of the design choices made in order to try to ensure a good chance of commercial success.
Date: 24 Nov 2009
Special Lecture: Interview conducted by Keith Howard
Download audio recording of lecture (24MB MP3)
An excellent Tutorial by Neville Thiele can be found here (AES Members only, log-in required for www.aes.org)
Neville Thiele’s name is known to anyone who has ever taken an interest in the practical design of moving coil loudspeakers, through the Thiele-Small parameters that bear his name and that of Richard Small. In 1961 he wrote a seminal paper on the design of vented (reflex) loudspeakers that – although it was largely ignored for 10 years until reproduced in the AES Journal – is now acknowledged as initiating the filter parameter based approach to loudspeaker analysis and synthesis which today is routinely used by the audio industry at large. In recognition of this, in 1994 he was awarded the AES Silver Medal.
In this interview-based lecture, Neville Thiele will talk about what led up to this breakthrough and its significance to the speaker design process. He will then give three short presentations on loudspeaker-related topics: filter-assisted bass alignments and novel crossover approaches; driver ageing effects; and driver impedance correction in crossover networks. Questions will then be invited from the audience.
Date: 10 Mar 2009
Lecture by John Vanderkooy, Audio Research Group, University of Waterloo, Canada, with Steyning Research Establishment, B&W Group Ltd, UK.
John Vanderkooy presented research into methods to improve loudspeaker measurements made in non-anechoic rooms. The lecture began with a discussion of the motivation for the research:
- Not everyone has access to an anechoic chamber
- Anechoic chambers may not be effective below 100Hz due to inadequate LF absorption
- Low frequency calibration of anechoic chambers may be ineffective
- Low frequency noise from air conditioning, industry and the environment can easily contaminate the measurements.
Impulse response measurements made in an echoic room or an imperfect anechoic chamber will have reflections that contaminate the results and will also often have significant levels of added noise. John presented measurements from a 110mm driver in a small sealed cabinet to illustrate the algorithm developed to overcome these limitations.
The algorithm comprises the following steps:
1) Measure an impulse response, typically 5–6ms of which is reflection-free following the initial response of the loudspeaker, and obtain the frequency response.
2) Apply a minimum phase filter to the impulse data such that the frequency response becomes flat to DC and, optionally, a high-pass filter with a corner frequency significantly above that of the loudspeaker.
3) Truncate the impulse response such that all room reflections are removed. The resulting frequency response will have a high-pass characteristic at a higher corner frequency.
4) Apply an inverse filter to that of step 2.
Now the impulse response has the low frequency persistent decaying oscillation extending cleanly beyond the first reflection arrival time.
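The four steps can be illustrated with a deliberately simplified sketch. It is not John's implementation: it assumes the loudspeaker behaves as a first-order high-pass at 40Hz (real bass alignments are second order or higher), raises the corner to an assumed 400Hz in step 2, simulates one reflection arriving after 5ms, and uses bilinear-transform filters. All names and values are assumptions chosen for the example.

```python
import cmath
import math

FS = 48_000        # sample rate in Hz (assumed)
N = 4_800          # 100 ms of impulse response
F_SPEAKER = 40.0   # assumed corner of the loudspeaker's LF roll-off
F_RAISED = 400.0   # assumed raised corner applied in step 2
T_CLEAN = 0.005    # reflection-free time: 5 ms

def first_order(x, w_num, w_den):
    """Bilinear-transform version of H(s) = (s + w_num) / (s + w_den)."""
    k = 2.0 * FS
    b0, b1 = (k + w_num) / (k + w_den), (w_num - k) / (k + w_den)
    a1 = (w_den - k) / (k + w_den)
    y, x1, y1 = [], 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 - a1 * y1
        y.append(yn)
        x1, y1 = xn, yn
    return y

def magnitude_at(h, freq):
    """Magnitude of the frequency response of h at a single frequency."""
    w = 2.0 * math.pi * freq / FS
    return abs(sum(v * cmath.exp(-1j * w * n) for n, v in enumerate(h)))

def truncate(h, keep):
    """Rectangular window: keep the first `keep` samples, zero the rest."""
    return h[:keep] + [0.0] * (len(h) - keep)

w0, w1 = 2.0 * math.pi * F_SPEAKER, 2.0 * math.pi * F_RAISED
n_refl = int(T_CLEAN * FS)

# Step 1: the "measurement" is the clean response plus one delayed reflection
clean = first_order([1.0] + [0.0] * (N - 1), 0.0, w0)
measured = list(clean)
for n in range(n_refl, N):
    measured[n] += 0.5 * clean[n - n_refl]

naive = truncate(measured, n_refl)           # conventional gating only
shortened = first_order(measured, w0, w1)    # step 2: flatten, then high-pass
gated = truncate(shortened, n_refl)          # step 3: remove the reflection
restored = first_order(gated, w1, w0)        # step 4: inverse of step 2

reference = magnitude_at(clean, F_SPEAKER)
err_naive = abs(magnitude_at(naive, F_SPEAKER) - reference)
err_restored = abs(magnitude_at(restored, F_SPEAKER) - reference)
print(f"error at {F_SPEAKER:.0f} Hz: gated {err_naive:.3f}, processed {err_restored:.6f}")
```

In this idealised case the processed response recovers the low-frequency behaviour almost exactly because the filter model matches the simulated loudspeaker; with a real driver the accuracy depends on how well the assumed alignment fits.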
There are several impulse response windowing methods and filter types that can be used. John explained that a rectangular window introduces ripples into the frequency response, while other types cause data to be lost towards the end of the truncated impulse response.
Methods of shortening the impulse response discussed were the Backman method and the Fincham method. The Backman method of flattening the frequency response to DC causes the impulse response to have a very long but zero-valued tail, making it suitable for truncation. The Fincham method, which raises the apparent corner frequency of the loudspeaker’s LF roll-off, shortens the impulse response, again allowing truncation to be applied without significant loss of data in the tail. As originally described, the Fincham method seemed to apply the step 2 filter to the test signal, which results (when the inverse filter is applied) in increased contamination of the acoustic measurement by low frequency noise. This can be avoided by applying the step 2 filter to the measured impulse response instead, and apparently this was the method actually employed.
Results obtained from a mid-size test speaker measured in a reverberant space were presented to show that reflections contaminate the measured frequency response if not windowed out. If they are windowed out conventionally, however, the frequency response at low frequencies is inaccurate because the impulse response is truncated prematurely. If, on the other hand, the impulse response is processed using a 5ms rectangular window and Fincham filtering, the result is a much more accurate frequency response below 200Hz.
Design of the Fincham filter requires knowledge of the loudspeaker’s bass alignment, which can be obtained either from analysis of its impedance versus frequency behaviour or from a near-field acoustic measurement. Accuracy of the frequency response obtained from the processed impulse response is not too dependent on the alignment parameters used.
John explained that the resulting low frequency response has a strong imprint of the model applied but argued that the result is still useful because we have good knowledge of the behaviour of loudspeakers at low frequencies. He also demonstrated that cabinet diffraction does not compromise the method, whereas it does provide difficulties for Prony Method modelling of the impulse response because diffraction cannot be modelled as an exponentially decaying oscillation.
John concluded the lecture by showing that conventionally gated impulse responses have validity at mid and high frequencies, so that obtaining the low frequency response using the method described gives a final measurement result which is in large part free of imperfections caused by room reflections across the entire audible frequency range. John ended the lecture by encouraging all present to try this methodology for themselves.
Report by Matthew Neighbour and Keith Howard