Archive for the ‘Meeting’ Category

A review of the sound system at Lord’s cricket ground

Title: A review of the sound system at Lord’s cricket ground
Location: Royal Academy of Engineering, London
Description: Lecture by Roland Hemming of RH Consulting
Start Time: 18:30 for 19:00
Date: Thursday 14th October 2010

Abstract

Stadium sound systems are always complicated projects. There is the need to deal with filling large spaces with sound and the requirement to comply with the many (and conflicting) voice alarm regulations.

Roland Hemming will explain the story behind the new sound system at Lord’s cricket ground. It uses the latest audio networking technology and a brand-new signal processor, and the project involved unprecedented co-operation between two manufacturers. He will also explain how he dealt with project risk, with the complexity of carrying out the work in many phases, and with the constraints of a ground located in a residential area.

About the presenter

Roland Hemming has established himself as one of the leading independent audio consultants and project managers for large technical installations.  His wide experience ranges from live events to construction sites, from manufacturing to installation, from cruise ships to theatre, rail, corporate AV, broadcast, education and stadia.  He has managed a number of very large projects including the Millennium Dome, Ascot Racecourse, Twickenham Stadium and St Pancras station.  He is also a consultant to manufacturers on the development of forthcoming audio products and is helping to develop and introduce the next generation of audio networking systems.

Headphone processing for a three-dimensional world

Title: Headphone processing for a three-dimensional world
Location: Royal Academy of Engineering, London
Description: Lecture by Ben Supper, Focusrite
Start Time: 18:30 for 19:00 start
Date: Tuesday 13th July 2010

Abstract

The practice of processing audio signals to impose lifelike room acoustics on them for headphone presentation is called auralisation. The two most commercially exploited applications of auralisation are the conversion of headphone stereo listening into an experience more like loudspeaker stereo listening, and the simulation of proposed architectural spaces.

Although the tools required for auralisation are fairly well understood, experiments that test the response of the human auditory system to spatial cues are generally designed to investigate one changing parameter in a complicated sound field, and the way in which stimuli are synthesised is not standardised. These limitations mean that little has been written recently about the ways in which the various parts of the human auditory system interact to experience a spatial illusion presented over headphones.

This talk presents, informally, some observations learned from several years’ experience trying to analyse and deceive the spatial parts of the human auditory system. It discusses how we perceive the spatial cues present in direct sound and reverberation that are central to auralisation, and the most effective and efficient ways of presenting a convincing illusion without causing listening fatigue.
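At its core, auralisation for headphones means convolving a dry signal with a pair of binaural room impulse responses (BRIRs), one per ear. The minimal Python/NumPy sketch below shows only that step; the function name and the toy BRIRs (a direct path plus a single 10 ms reflection, with the right ear hearing the direct sound 0.5 ms later and quieter) are hypothetical stand-ins for measured or modelled responses, and the sketch does not represent the presenter's processing.

```python
import numpy as np

def auralise(dry: np.ndarray, brir_left: np.ndarray, brir_right: np.ndarray) -> np.ndarray:
    """Convolve a dry mono signal with a BRIR pair; returns a (2, N) headphone signal."""
    left = np.convolve(dry, brir_left)
    right = np.convolve(dry, brir_right)
    return np.stack([left, right])

fs = 48000
# Toy, hypothetical BRIRs: direct sound plus one reflection 10 ms (480 samples) later.
brir_l = np.zeros(1024)
brir_l[0] = 1.0
brir_l[480] = 0.3
# Right ear: direct sound 0.5 ms (24 samples) later and quieter, as for a source on the left.
brir_r = np.zeros(1024)
brir_r[24] = 0.7
brir_r[500] = 0.3

dry = np.random.default_rng(0).standard_normal(fs // 10)  # 100 ms of noise
out = auralise(dry, brir_l, brir_r)
```

Real systems would use measured BRIRs and, for long responses, FFT-based convolution; direct convolution is used here only because it is the simplest correct form.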

‘Synchronising the synchronisation standards’

Title: ‘Synchronising the synchronisation standards’
Location: Royal Academy of Engineering, London
Description: Lecture by John Emmett
Start Time: 19:00 for 19:30
Date: Tuesday 16th February, 2010

Download recording of lecture here (20MB MP3)

Lecture Report

Dr Emmett opened the lecture by summarising the audio-video synchronisation challenges encountered when putting together a television programme. It is better to correct synchronisation problems as they occur in the broadcasting chain than to attempt to correct them all immediately prior to transmission, as the former practice greatly simplifies video editing. With this achieved, attention turns to keeping audio synchronised during broadcast transmission and reception. This is particularly important for human speech: humans are exquisitely sensitive to lip sync. We develop this facility almost as soon as we can see, and the psychological need for lip movement to be attached to speech is so great that each Dalek must display a light that pulses in sync with speech, in order to bond dialogue to a particular character.

A number of techniques were employed in the days of purely analogue transmission to ensure that audio and video were kept in sync. It was not unusual for a programme’s video signal to be relayed via satellite and its audio via telephone, so a compensating audio delay had to be inserted to offset the uplink and downlink delays. One such technique was used at ITN in the early 1980s: an in-band, masked ‘bong’ was timed to follow any video cut in the programme by exactly one second, which allowed engineers to adjust the audio delay manually to maintain sync, even where the delay varied during the programme. Similar timestamps must still be maintained in digital systems, although this facility is now generally accommodated within the channel code.
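The arithmetic behind a cue of this kind is simple enough to sketch. Assuming the times of a video cut and of the following ‘bong’ have been measured at some point in the chain (the helper name and its sign convention are hypothetical, not ITN's procedure), any departure from the one-second spacing is the audio timing error to be corrected:

```python
CUE_OFFSET_S = 1.0  # the 'bong' was timed to follow each video cut by exactly one second

def audio_timing_error_s(cut_time_s: float, bong_time_s: float) -> float:
    """Hypothetical helper. Positive: audio is late, so reduce the audio delay.
    Negative: audio is early, so insert more audio delay."""
    return (bong_time_s - cut_time_s) - CUE_OFFSET_S

print(round(audio_timing_error_s(10.0, 11.12), 3))  # 0.12  -> audio 120 ms late
print(round(audio_timing_error_s(10.0, 10.96), 3))  # -0.04 -> audio 40 ms early
```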

It is increasingly common for audio and video to be streamed by piggy-backing on a packet-based protocol and transmitting via existing IT infrastructure. This works as long as there is sufficient bandwidth. Otherwise, heavy-duty interleaving is required to compensate for dropped packets, which increases the transmission delay and the chances of sync loss and system failure. As with real piggy-backs, the heavier the payload, the slower the system, and the greater the likelihood of collapse.
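Why interleaving trades delay for robustness can be illustrated with a simple block interleaver. This is a generic sketch with hypothetical function names, not the scheme of any particular streaming protocol: symbols are written into a rows × cols block row by row and sent column by column, so a burst of consecutive losses (such as a dropped packet) is spread into isolated gaps that error correction can repair more easily. The price is that the receiver must buffer a whole block before de-interleaving, so latency grows with block size.

```python
def interleave(symbols, rows, cols):
    """Block interleaver: write row by row, read out column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): restores the original row-by-row order."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain rows * cols symbols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

def block_latency_s(rows, cols, symbol_rate):
    """Extra delay: a whole block must arrive before it can be de-interleaved."""
    return rows * cols / symbol_rate

block = list(range(24))
tx = interleave(block, rows=4, cols=6)
# Simulate a burst loss of 4 consecutive transmitted symbols (None marks a lost symbol).
tx = [None if 4 <= i < 8 else s for i, s in enumerate(tx)]
rx = deinterleave(tx, rows=4, cols=6)
lost = [i for i, s in enumerate(rx) if s is None]
print(lost)  # [1, 7, 13, 19]: the burst is now spread out, 6 symbols apart
```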

Consider what the word ‘standard’ means: this is where problems are compounded. The word has two distinct meanings. It can refer to an outgoing or obsolescent paradigm (such as ‘standard definition’), or to standard-bearing in its original sense: being at the technological vanguard. We frequently encounter problems when we must choose between a plenitude of competing standards of different ages, some of which have yet to be adopted, and many of which should not be. Standards are necessary only when current best practice is unclear, and Dr Emmett suggested some clues for choosing ‘good standards’. A good standard must be fit for purpose, timely, and robustly defined: if the plug fits, the signal should work. There are also caveats, because not all standards are intended to be friendly (DRM systems were cited as an example), and even de facto standards undergo sudden and complete changes. Finally, although a standard needs to be owned by a company or committee to avoid obsolescence, it should contain no element for revenue generation.

The emergence of competing delivery standards in broadcasting has brought the synchronisation problem into the home. Compressed multichannel audio formats such as Dolby Digital and DTS can be conveyed over the S/PDIF channel code using IEC 61937, alongside conventional linear PCM, and a home cinema amplifier may typically accommodate sixty connectors and a dozen multichannel formats. As for the picture, high-definition video formats such as 720p and 1080p co-exist with conventional 625-line 4:3 and 16:9 broadcasts. There are a number of video interconnection formats, each with different costs, advantages, and limitations. Four digital video broadcasting standards are in use in different regions of the world, encompassing several standard frame rates. Meanwhile, individual consumer products are designed for world markets and are simultaneously compatible with many of these standards. In fact, UK broadcasters have been unable to rely on viewers possessing ‘standard’ receiving equipment since 625-line broadcasts began in the 1960s.

Now that it can take half a day for a professional engineer to set up a new television, it is quite likely that a set-top box in a typical home may be configured to down-convert 720p video to standard definition, and transmit this signal over RGB SCART to a plasma television, which will then up-convert it to 1080p. Audio-video synchronisation is then at the mercy of equipment manufacturers.

Dr Emmett summarised his lecture with advice from Antoine de Saint-Exupéry: ‘No design is finished until the last superfluous item has been removed.’

Report by Ben Supper

‘When All the Songs Sound the Same: Insights into the Musical Brain’

Title: When All the Songs Sound the Same: Insights into the Musical Brain

Location: Royal Academy of Engineering, London
Description: Lecture by Dr Lauren Stewart, Goldsmiths, University of London
Start Time: 19:00
Date: Thursday 10th June 2010

The ability to make sense of musical sound has been observed in every culture since the beginning of recorded history. In early infancy, it allows us to respond to the sing-song interactions of a primary caregiver and to engage in musical play. In later life it shapes our social and cultural identities and modulates our affective and emotional states. But a few per cent of the population fail to develop the ability to make sense of or engage with music. The study of disordered musical development throws into sharp relief the perceptual and cognitive abilities which most of us take for granted, and gives us a unique chance to investigate how musical perceptual ability develops, from the level of the gene, through brain development, to the emergence of a complex and fundamental human behaviour.

Dr Stewart is Senior Lecturer at Goldsmiths, University of London, and director of its new MSc course in Music, Mind and Brain.

Lauren originally studied Physiological Sciences at Balliol College Oxford, but transferred from bodies to brains with an MSc in Neuroscience and doctoral and postdoctoral training at the Institute of Cognitive Neuroscience, the Wellcome Department of Imaging Neuroscience (both UCL) and Harvard Medical School.

Her current research interests range from studying people with congenital amusia, who are unable to make sense of musical sound, to studying the acquisition of perceptual, cognitive and motor skills in trained musicians.

‘Can we make quasi-anechoic measurements in normal rooms?’

Title: ‘Can we make quasi-anechoic measurements in normal rooms?’
Location: Royal Academy of Engineering, London
Description: Lecture by John Vanderkooy, Audio Research Group, University of Waterloo, Canada, with Steyning Research Establishment, B&W Group Ltd, UK
Start Time: 18:30 for 19:00
Date: Tuesday 10th March, 2009

Lecture Report

John Vanderkooy presented research into methods to improve loudspeaker measurements made in non-anechoic rooms. The lecture began with a discussion of the motivation for the research:

  • Not everyone has access to an anechoic chamber
  • Anechoic chambers may not be effective below 100Hz due to inadequate LF absorption
  • Low frequency calibration of anechoic chambers may be ineffective
  • Low frequency noise from air conditioning, industry and the environment can easily contaminate the measurements.

Impulse response measurements made in an echoic room or an imperfect anechoic chamber will have reflections that contaminate the results and will also often have significant levels of added noise. John presented measurements from a 110mm driver in a small sealed cabinet to illustrate the algorithm developed to overcome these limitations.

The algorithm comprises the following steps:

1) Measure an impulse response, of which typically 5–6ms following the initial response of the loudspeaker is reflection-free, and obtain the frequency response.
2) Apply a minimum phase filter to the impulse data such that the frequency response becomes flat to DC and, optionally, a high-pass filter with a corner frequency significantly above that of the loudspeaker.
3) Truncate the impulse response such that all room reflections are removed. The resulting frequency response will have a high-pass characteristic at a higher corner frequency.
4) Apply the inverse of the step 2 filter.

The impulse response now has its persistent, decaying low-frequency oscillation extending cleanly beyond the arrival time of the first reflection.

There are several impulse response windowing methods and filter types that can be used. John explained that a rectangular window introduces ripples into the frequency response, while tapered window types attenuate the data towards the end of the truncated impulse response, so that information is lost.

The methods of shortening the impulse response discussed were the Backman method and the Fincham method. The Backman method of flattening the frequency response to DC causes the impulse response to have a very long but zero-valued tail, making it suitable for truncation. The Fincham method, which raises the apparent corner frequency of the loudspeaker’s LF roll-off, shortens the impulse response, again allowing truncation to be applied without significant loss of data in the tail. As originally described, the Fincham method seemed to apply the step 2 filter to the test signal, which (when the inverse filter is applied) increases the contamination of the acoustic measurement by low-frequency noise. This can be avoided by applying the step 2 filter to the measured impulse response instead, and apparently this was the method actually employed.
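The Fincham variant described above, with the step 2 filter applied to the measured impulse response, can be sketched numerically in Python with NumPy and SciPy. Everything here is an assumption for illustration rather than John's implementation: the loudspeaker's bass alignment is modelled as a 2nd-order Butterworth high-pass at 60Hz (roughly a sealed box with Q ≈ 0.707), the target corner is 400Hz, and the "room" is a single reflection at half amplitude arriving 5ms after the direct sound. Because both high-pass filters share a double zero at z = 1, the step 2 filter reduces to a gain times a_spk/a_tgt, which is stable and minimum phase, as is its step 4 inverse.

```python
import numpy as np
from scipy.signal import butter, lfilter

def fincham_sketch(fs=48000, f_spk=60.0, f_tgt=400.0, t_refl=0.005, n=4800):
    # Hypothetical LF model of the loudspeaker and a higher-corner target high-pass
    b_spk, a_spk = butter(2, f_spk, btype="highpass", fs=fs)
    b_tgt, a_tgt = butter(2, f_tgt, btype="highpass", fs=fs)

    impulse = np.zeros(n)
    impulse[0] = 1.0
    h_true = lfilter(b_spk, a_spk, impulse)  # reflection-free reference response

    # Step 1: simulated room measurement = direct sound + one reflection
    n_refl = int(round(t_refl * fs))
    h_meas = h_true.copy()
    h_meas[n_refl:] += 0.5 * h_true[: n - n_refl]

    # Conventional gating for comparison: keep only the reflection-free part
    h_gated = h_meas.copy()
    h_gated[n_refl:] = 0.0

    # Step 2: H_tgt/H_spk; the double zeros at z = 1 cancel, leaving g * a_spk / a_tgt
    g = b_tgt[0] / b_spk[0]
    y = lfilter(g * a_spk, a_tgt, h_meas)
    # Step 3: truncate before the first reflection arrives
    y[n_refl:] = 0.0
    # Step 4: the inverse of the step 2 filter regrows the low-frequency tail
    h_est = lfilter(a_tgt, g * a_spk, y)

    err_gated = float(np.sum((h_gated - h_true) ** 2))
    err_fincham = float(np.sum((h_est - h_true) ** 2))
    return err_gated, err_fincham

err_gated, err_fincham = fincham_sketch()
```

With these assumed values the reconstruction error should be far smaller than that of plain gating, because truncation discards only the short tail of the 400Hz target response rather than the long 60Hz tail of the loudspeaker itself.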

Results obtained from a mid-size test speaker measured in a reverberant space were presented to show that reflections contaminate the measured frequency response if they are not windowed out. If they are windowed out conventionally, however, the low-frequency response is inaccurate because the impulse response is truncated prematurely. If, instead, the impulse response is processed using a 5ms rectangular window and Fincham filtering, the result is a much more accurate frequency response below 200Hz.

Design of the Fincham filter requires knowledge of the loudspeaker’s bass alignment, which can be obtained either from analysis of its impedance-versus-frequency behaviour or from a near-field acoustic measurement. The accuracy of the frequency response obtained from the processed impulse response does not depend greatly on the alignment parameters being known within tight tolerances.

John explained that the resulting low-frequency response bears a strong imprint of the model applied, but argued that the result is still useful because the behaviour of loudspeakers at low frequencies is well understood. He also demonstrated that cabinet diffraction does not compromise the method, whereas it does cause difficulties for Prony-method modelling of the impulse response, because diffraction cannot be modelled as an exponentially decaying oscillation.

Finally, John showed that conventionally gated impulse responses are valid at mid and high frequencies, so obtaining the low-frequency response using the method described gives a final measurement that is largely free of imperfections caused by room reflections across the entire audible frequency range. He ended the lecture by encouraging all present to try this methodology for themselves.

Report by Matthew Neighbour and Keith Howard

‘How I Does Filters’

Title: ‘How I Does Filters: An uneducated person’s way to design highly-regarded digital equalisers and filters’
Location: Royal Academy of Engineering, London
Description: Christmas Lecture by Peter Eastty of Oxford Digital
Start Time: 18:30 for 19:00
Date: Tuesday 8th December 2009

Abstract

“Much has been written in many learned papers about the design of audio filters and equalisers – this is not another one of those. The presenter is a bear of little brain and has over the years had to reduce the subject of digital filtering into bite-sized lumps containing a number of simple recipes that have got him through most of his professional life. Complete practical implementations of bell (or presence) filters, high pass and low pass multi-order filters and shelving filters will be given. The infrequently seen higher order shelving filters will also be used to generate minimum phase IIR filters of arbitrary shape. The tutorial is designed for the complete novice, is light on mathematics and heavy on explanation and visualization – even so, the provided code works and can be put to practical use.”

The audio recording of this lecture is available to download here (MP3, 15MB).

Meeting Report

The PowerPoint visuals for this lecture are available from the website, and it’s highly recommended that you view them while reading this report or listening to the recording, because many of the key concepts are graphical and make no sense without the pictures. Download the visuals here (4MB PDF).

Peter opened his lecture with the assertion that it would not be mathematical, an interesting proposition for a topic notorious for the complexity and abstraction of its mathematics. He also pointed out that he has never designed an analogue filter in his life: rather than designing analogue filters and translating them to the digital domain, he has always approached digital filter design from an exclusively discrete-time, sampled perspective. The first design aims are to create a bell-shaped presence filter and a shelf filter: essentially the same as the EQ controls on a mixing console (see page 3 of the visuals). Both of these filter types are defined by three independent parameters, as illustrated: gain, frequency, and Q (for the bell filter) or overshoot (for the shelf). To achieve this, just three building blocks are available (page 4): multiplication, addition, and a delay of one or more samples. So, how do we go about arranging these building blocks to make filters?

Peter took us on a rapid yet easy-to-follow tour of simple combinations of multiplication, addition and delay, with the most intuitive explanation I’ve ever heard of how to visualise filter responses in the z-plane. It is difficult to convey this in paraphrase without writing down the whole lecture, but listening to the recording whilst viewing the visuals makes it clear. He started by considering the behaviour of the simplest possible combination of a delay and an adder, effectively a one-tap FIR, and derived its z-plane representation and frequency response entirely graphically and intuitively. He then extended this by adding a multiplier (for coefficient values other than 1), leading to the insight that the coefficient moves a zero along the x-axis of the z-plane (page 19).
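The delay/multiplier/adder structure can be made concrete in a few lines of Python/NumPy. The sign convention here is an assumption of this sketch and not necessarily the one on Peter's slides: y[n] = x[n] − c·x[n−1], i.e. H(z) = 1 − c·z^-1 = (z − c)/z, so the coefficient c is exactly the position of the single zero on the real (x) axis. Setting c = 1 puts the zero at DC; c = −1 puts it at the Nyquist frequency.

```python
import numpy as np

def one_delay_fir(x, c: float) -> np.ndarray:
    """y[n] = x[n] - c * x[n-1]: one delay, one multiplier and one adder."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= c * x[:-1]
    return y

def magnitude(c: float, w: float) -> float:
    """|H(e^{jw})| for H(z) = 1 - c z^-1, whose single zero sits at z = c."""
    return float(abs(1.0 - c * np.exp(-1j * w)))

# Zero at z = 1: a complete null at DC (w = 0), gain 2 at Nyquist (w = pi).
print(magnitude(1.0, 0.0), magnitude(1.0, np.pi))
# Moving the zero to z = 0.5 turns the null into a shallower dip.
print(magnitude(0.5, 0.0))
```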

Next, it was shown that responses could be combined by cascading filters (page 20 and following), and that the elements of cascaded filters can also be merged into one structure with a single adder (accumulator) and identical behaviour, with simple mathematical relationships between the multiplier values (coefficients) in the cascaded filters and those in the combined structure. From this, a relationship was derived between the coefficients and the positions of the zeroes on the x-axis: a little maths involving a square root, but still pretty straightforward (page 23). Of course, square roots can give rise to roots of negative numbers (page 26, which looks remarkably like the quadratic formula), so what do you do then? In a move highly reminiscent of complex numbers, each zero moves off the x-axis, following a new pair of equations, to form a symmetrical pair (pages 27-29). All the findings so far are summarised on page 30.
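The cascade-and-combine step and the quadratic-formula relationship can be sketched in Python, again using the assumed convention that a first-order section 1 − c·z^-1 has its zero at z = c. The function names are hypothetical. Two such sections multiply into a single structure 1 + a1·z^-1 + a2·z^-2, and the zeros are recovered with the quadratic formula; when the discriminant goes negative, they leave the x-axis as a complex-conjugate (symmetrical) pair.

```python
import math
import numpy as np

def cascade(c1: float, c2: float) -> np.ndarray:
    """Combine (1 - c1 z^-1)(1 - c2 z^-1) into one structure [1, a1, a2]."""
    return np.polymul([1.0, -c1], [1.0, -c2])

def zeros_from_coeffs(a1: float, a2: float):
    """Zeros of 1 + a1 z^-1 + a2 z^-2, i.e. roots of z^2 + a1 z + a2."""
    disc = a1 * a1 - 4.0 * a2
    if disc >= 0.0:
        r = math.sqrt(disc)
        # Two real zeros, both on the x-axis of the z-plane
        return float((-a1 + r) / 2.0), float((-a1 - r) / 2.0)
    r = math.sqrt(-disc)
    # Negative discriminant: the zeros move off the axis as a conjugate pair
    return complex(-a1 / 2.0, r / 2.0), complex(-a1 / 2.0, -r / 2.0)

a = cascade(0.5, -0.25)                 # combined coefficients 1, -0.25, -0.125
print(zeros_from_coeffs(a[1], a[2]))    # recovers the original zeros 0.5 and -0.25
print(zeros_from_coeffs(0.0, 0.81))     # conjugate pair at +0.9j and -0.9j
```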

So far, the lecture had focussed on things that reduce gain (zeroes). What about things that increase it, pulling the 3D surface upwards (page 31) instead of downwards? Just as with op-amps, positive gain is created using feedback loops, and the feedback loop contains topologies very similar to the filters already discussed (pages 32-34). Second-order filters with only negative gain response (all-zero) can be combined with second-order filters with only positive gain response (all-pole) (pages 35-36), and the resulting structure is often known as a biquad, beloved of digital mixing console designers for (among many other things) digital versions of traditional parametric equalisers and filters. It was shown to have two symmetrical pole/zero pairs, and when its frequency response is plotted against log frequency, it can give rise to the familiar bell-curve EQ response (page 41) if the pole and zero are associated with the same frequency. The distance between the pole and the zero was shown to be related to the Q, or bandwidth, of the filter (pages 43-50), and the geometry of the curves of constant frequency was calculated.
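For readers who want a working bell-curve biquad to experiment with, here is a short Python/NumPy sketch. Note that it swaps in a different, well-known design: the peaking-EQ formulas from Robert Bristow-Johnson's "Audio EQ Cookbook", not Peter's own equations from page 61 of the visuals. The filter itself is run as a direct-form I biquad: the feed-forward (zero) terms followed by the feedback (pole) terms, summed in a single accumulator.

```python
import numpy as np

def peaking_biquad(f0: float, gain_db: float, q: float, fs: float):
    """Bell (peaking) EQ coefficients from the RBJ Audio EQ Cookbook,
    normalised so that a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

def response_db(b, a, f: float, fs: float) -> float:
    """Evaluate H(z) on the unit circle at frequency f and return its gain in dB."""
    z1 = np.exp(-1j * 2.0 * np.pi * f / fs)  # z^-1
    h = (b[0] + b[1] * z1 + b[2] * z1**2) / (a[0] + a[1] * z1 + a[2] * z1**2)
    return float(20.0 * np.log10(np.abs(h)))

def biquad(x, b, a) -> np.ndarray:
    """Direct form I: zeros (feed-forward) plus poles (feedback) in one adder."""
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y

fs = 48000
b, a = peaking_biquad(1000.0, 6.0, 1.0, fs)
print(round(response_db(b, a, 1000.0, fs), 6))  # 6.0 dB boost at the centre frequency
```

With these formulas the gain at the centre frequency is exactly the requested boost, and the response returns to 0dB at DC and at the Nyquist frequency.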

Curves of constant gain were also shown (pages 51-55), and it was then shown (pages 56-58) that the curves of constant frequency and curves of constant gain are orthogonal at all points, which is important for independent control of the two parameters. All these equations were pulled together and, with the addition of a gain correction term (page 60), resulted in the definitive equations for biquads (page 61). It was demonstrated with code snippets that these equations are directly implemented in Oxford Digital’s products. The effect of a biquad with the same gain for the pole and zero, but at different frequencies, was illustrated (pages 65-66): it was shown that the perfectly-damped response is achieved when the gain circle has its origin on the unit circle (page 67).

Higher-order filters can be created by adding extra pole/zero pairs on the same constant-frequency curves, but on carefully chosen constant-gain curves (pages 76-81). It was then demonstrated how to make non-integer-order filters, using the fact that a coincident pole/zero pair cancel each other: by introducing such a pair (which has no effect on the filter) and then slowly moving them apart, the pole/zero configurations between integer orders can be interpolated. This is truly novel, and although the graphics illustrating these configurations are not in the visuals linked to above, they can be seen in Peter’s convention paper presented at the 125th AES Convention, entitled “Accurate IIR Equalization to an Arbitrary Frequency Response, with Low Delay and Low Noise Real-Time Adjustment”.

This ability to build non-integer-order IIR filters permits the construction of arbitrary filter responses, but without the usual penalties of FIR filters (namely, long processing delays and poor phase performance). Peter demonstrated his real-time filter software running on a laptop: a frequency response curve is manipulated by attaching handles to it and moving them as desired. Naturally, some extreme frequency responses would require filter orders in the hundreds, and CPU power is limited, so the filter order can be capped, in which case the response falls away gracefully from the handles if an impractical response is requested. The filter changes response quickly and completely smoothly (as heard) in real time, even with rapid changes to extreme filter responses with orders greater than 100; Peter declined to elaborate on how this is achieved! Controlling the coefficients of IIR filters so that smooth changes in gain, frequency or bandwidth are achieved without artefacts or, worse, instability is regarded as challenging even for simple conventional filter designs, so achieving this for Peter’s much more sophisticated arbitrary-response EQ with extremely high orders is impressive.

Peter concluded his lecture with an observation made possible by his EQ. If one creates a dramatic comb-like filter response (in this case, alternating 12dB gain boosts at roughly octave intervals), then shifts the frequencies of all the gain/cut points together in logarithmic frequency (i.e. groups all the handles together and drags them left or right at once), the result sounds as though playback pitch is being increased or decreased, despite the audio remaining at constant pitch and playback speed. Peter made the entirely plausible suggestion that rapidly scaling a complex frequency-domain structure in log frequency creates a psychoacoustic illusion of pitch shift, because it resembles the frequency scaling of harmonic structures that is characteristic of pitch shift.

Many thanks to Peter Eastty for a fascinating and entertaining Christmas lecture, which offered insights for both seasoned digital audio engineers and those new to the field, and revealed genuinely groundbreaking technology.

Meeting report by Michael Page

‘Who’s the bad guy now? Maintaining audio/video sync in today’s broadcast environment’

Title: ‘Who’s the bad guy now? Maintaining audio/video sync in today’s broadcast environment’
Location: Royal Academy of Engineering, London
Description: Lecture by Andy Quested, Head of Technology, BBC R&D
Start Time: 18:30 for 19:00
Date: Tuesday 12th January 2010

To complain that “the audio is out of sync” was, in the past, doing audio an injustice. The use of visual effects units, time base correctors and other digital processing in the video chain, while the audio continued to pass through an analogue signal path, meant that it was, in fact, the video which was usually out of sync. However, the move to digital audio processing, and in particular surround sound broadcasting – which often requires six channels to be passed through a two-channel infrastructure – has significantly moved the goalposts. The advent of HD, with its more clearly defined imaging, has exacerbated the problem. Andy Quested will highlight some of the audio/video synchronisation issues that the BBC HD channel has had to deal with, and will outline the measures it is taking to put audio back into its rightful place.

Andy’s BBC blog provides some more background: http://www.bbc.co.uk/blogs/bbcinternet/2009/12/the_hitchhikers_guide_to_encod.html

The lecture recording is available to download here (45MB MP3)

Meeting Report

Andy was joined for the lecture by a colleague from BBC Future Media & Technology, Rowan de Pomerai, who provided details of BBC HD’s audio/video transmission infrastructure and the points where sync errors can be introduced. This was comprehensively illustrated by slides showing block diagrams of the various elements in the chain, many of which can be found in an excellent white paper on the EBU’s website: http://tech.ebu.ch/docs/techreview/trev_2009-Q1_HD-Audio-Delays.pdf. Andy’s contribution was more anecdotal, highlighting the actual problems encountered, and this report will focus primarily on his part of the lecture.

Andy opened with some statistics on HD adoption in the UK. Sky has 1.8m HD subscribers, Virgin has 280,000 and 48,000 watch HD via Freesat. Freeview HD is launching and is expected to become the biggest single platform. In 2009, Wimbledon and Torchwood attracted HD audiences of 1.75m. Overall, 2009 was not a bumper year for sport but there will be plenty in 2010, including the Winter Olympics and the World Cup. Launched in April 2009, the HD iPlayer is now the most successful version of the BBC’s catch-up service.

In a recent survey, viewers were asked what they considered to be the most important elements of an HD channel. Not surprisingly, picture quality was placed top by 56 per cent, followed by choice of programming by 48 per cent. Sound quality was fourth at 34 per cent, a figure that hasn’t really changed since BBC HD was launched in 2007. Part of the problem is that, unlike cinemas, which have established standards for audio replay, home speaker layouts can vary enormously, particularly in the placement of the centre speaker. This can make it difficult to predict the listening experience.

Moving on to the specific topic of audio sync, Andy noted that the BBC HD channel suffered from several audio sync and metadata problems in the early days. Programmes affected were the Proms, Electric Proms, Olympics and Strictly Come Dancing.

One of the earliest instances of a major problem involved the 2008 Eurovision Song Contest. Andy was watching at home and immediately noticed that there was no music track on the HD broadcast, only vocals from the centre speaker. Somehow, what should have been a 5.1 track was actually 1.0. This should not happen, because the BBC HD channel is locked to 5.1 even when broadcasting stereo, to prevent the clicks or mutes that occur when some AV receivers switch modes.

Andy phoned the broadcast centre, which was unaware of the problem – they were hearing 5.1 all the way through the chain. Andy suspected a metadata issue, but where was the problem occurring? The broadcast chain includes many elements, and the BBC’s outsourcing policy complicates matters further because several companies are involved (see Rowan’s white paper). The decision was made to switch to an upconverted BBC 1 feed with stereo audio while the problem was investigated, because taking the audio alone would have resulted in sync problems.

At Andy’s request the Dolby encoder was checked and found to be set to disable the metadata (an option that has since been removed by a software update). With no metadata, the Dolby decoders in set-top boxes revert to their default mode, which is 1.0. This is a legacy of Dolby systems in cinemas, where the centre dialogue channel is the most important element and is therefore the most logical default.

With regard to maintaining sync, BBC HD has taken the approach that audio and video should be in sync at every stage of the chain – known as in-sync encoded. However, this hasn’t stopped numerous complaints about audio/video sync from viewers.

Rowan de Pomerai explained that many of the problems are due to delays created within set-top boxes and flat-panel displays, the latter introducing a video delay of up to 100ms. Hearing the audio before the video is counter-intuitive: light travels faster than sound, so we are used to hearing the audio delayed relative to the picture, not vice versa. Many set-top boxes have a delay function, but this has to be configured. The BBC has developed a sync test to assist in setup, which is broadcast at regular intervals during the daytime. (For a full description of the test see Rowan’s white paper.)
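The audio delay a viewer dials into a set-top box can be estimated with simple arithmetic. The sketch below is only an illustration, assuming the aim is for sound to reach the listener’s seat at the same moment as the delayed picture, a speed of sound of 343 m/s, and a known display latency; the function names are hypothetical and not part of any set-top box interface.

```python
# Minimal sketch: choosing an audio delay to match a flat-panel display's video latency.
# Assumptions (not from the lecture): speed of sound 343 m/s, display latency known,
# and the target is sound and picture arriving together at the listener's seat.

SPEED_OF_SOUND_M_PER_S = 343.0


def acoustic_delay_ms(distance_m: float) -> float:
    """Time taken for sound to travel from the loudspeakers to the listener."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


def audio_delay_setting_ms(video_latency_ms: float, seat_distance_m: float) -> float:
    """Hypothetical helper: extra audio delay so sound arrives with the delayed picture.

    Light reaches the viewer effectively instantly, so the audio path needs
    (video latency - acoustic travel time) of extra delay, never less than zero.
    """
    return max(0.0, video_latency_ms - acoustic_delay_ms(seat_distance_m))


if __name__ == "__main__":
    # A display adding 100 ms of video delay, viewed from about 2.7 m away.
    print(f"acoustic delay: {acoustic_delay_ms(2.7):.1f} ms")
    print(f"audio delay to set: {audio_delay_setting_ms(100.0, 2.7):.1f} ms")
```

For a 100ms display latency viewed from 2.7m this suggests roughly 92ms of audio delay, since the sound itself already takes about 8ms to cross the room.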

Providing a sync test is a great idea but for it to work correctly it’s essential that the audio and video signals arriving at the set-top box are in sync. The broadcast chain was measured all the way through to the broadcast encoder and adjustments made for minor sync errors introduced throughout the system. A duplicate system at BBC R&D Kingswood Warren was also measured to verify the figures. However, the only way to check categorically that everything was OK was actually to broadcast a test.

The final problem was how to measure the sync off-air. A set-top box was not reliable enough, so the solution was to record the MPEG transport stream, decode it offline and measure the analogue waveform and video frame numbers. The BBC was aiming for ±5ms – a quarter of the EBU’s recommendation – and this test measured ±2ms. “So, it’s no longer just ‘OK leaving me’, it’s also ‘OK arriving at you,’” Andy noted, adding that servers do drift, so ±5ms is BBC HD’s target as an average. This is still an excellent figure, considering that there is around 8ms of acoustic delay between a TV and the viewer.
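The offline measurement boils down to comparing the time of an audio event with the time of a video frame. The sketch below is a simplified illustration, not the BBC’s tool: it assumes 48 kHz audio, 25 frames per second video, a test signal containing one click and one flash frame, and that the flash frame number has already been read from the decoded pictures. All names are hypothetical.

```python
# Minimal sketch of an off-air A/V offset measurement on decoded media.
# Assumptions (not from the lecture): 48 kHz audio, 25 frames/s video, a test
# signal with a single click in the audio and a single flash frame in the video,
# and the flash frame number already identified from the decoded pictures.
from typing import Optional, Sequence

AUDIO_RATE_HZ = 48_000
FRAME_RATE_HZ = 25.0


def first_click(samples: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first sample whose magnitude reaches the threshold, if any."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None


def av_offset_ms(click_sample: int, flash_frame: int,
                 audio_rate: float = AUDIO_RATE_HZ,
                 frame_rate: float = FRAME_RATE_HZ) -> float:
    """Audio time minus video time in ms; positive means the audio is late."""
    return (click_sample / audio_rate - flash_frame / frame_rate) * 1000.0


def within_target(offset_ms: float, tolerance_ms: float = 5.0) -> bool:
    """Check an offset against a symmetric tolerance such as a ±5 ms target."""
    return abs(offset_ms) <= tolerance_ms


if __name__ == "__main__":
    # Synthetic example: the flash is on frame 50 (2.000 s) and the click
    # arrives 96 samples (2 ms) later at 48 kHz.
    audio = [0.0] * 96_096 + [0.9] + [0.0] * 100
    click = first_click(audio)
    assert click is not None
    offset = av_offset_ms(click, flash_frame=50)
    print(f"offset {offset:+.1f} ms, within target: {within_target(offset)}")
```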

Before this test was transmitted, BBC HD received 20-30 complaints a week about sync, but afterwards these dropped to zero. The only complaints received since have concerned one live broadcast that genuinely was out of sync. In that instance BBC HD knew the feed itself was out of sync because they were confident that their broadcast chain was 100 per cent in sync.

In conclusion, Andy stressed that audience education is essential. The BBC receives about 90,000 hits on its website and 3,000 calls per month about HD. The days of just plugging everything in and having it all work are gone. Users need to understand 5.1, audio delay adjustment and speaker positioning, and – very important – to remove the SCART lead. Countless viewers are watching HD programmes in SD because pin 8 on their SCART lead has switched the TV from HDMI to the AV input!

‘An introduction to forensic audio’

Title: ‘An introduction to forensic audio’
Location: Royal Academy of Engineering, London
Description: Lecture by Gordon Reid, CEDAR Audio Ltd
Start Time: 18:30 for 19:00
Date: Thursday 15th April 2010

Speech enhancement has come a long way in the digital era, but it is not the ‘magic wand’ depicted on TV and in Hollywood movies. Adaptive filters have traditionally been the basis of forensic audio work, but a combination of techniques – including broadband noise reduction, buzz removal, equalisation and background noise suppression – can provide superior results when compared with any single approach. This introduction, illustrated using examples processed in real-time on a CEDAR Cambridge Forensic system, aims to shed light on this, demonstrating how signal processing can aid investigators in areas including criminal investigation, counter-terrorism and air accident investigation.

Lecture Report

Note: we are unable to provide a recording of this lecture because some of CEDAR’s police and security customers place strict constraints on the public dissemination of the audio clips and details of cases used in demonstrations of CEDAR’s forensic technology.

Gordon Reid is the Managing Director of CEDAR Audio, a leading manufacturer of audio restoration and speech enhancement products. He kicked off his lecture with a scenario of how video surveillance, without audio content, can give ambiguous or even completely misleading indications of intent.

Audio forensics is a relatively new field that first entered common use in the 1960s and 1970s. Thanks to the technology of companies like CEDAR, it is now well established, and the most recent trend is for audio and video surveillance data to be integrated. Before the arrival of digital technology in the 1990s, audio forensics was relatively crude, relying on often poorly maintained analogue tape recorders, with no single-ended noise reduction and often just analogue EQ and dynamics processing for clean-up.

Nowadays, recordings are mostly digital, and can be made using low-cost consumer equipment. But this brings some new problems. Recordings are often made by untrained people using small, cheap recorders: he highlighted a divorce litigation case in which a woman concealed a recorder at the bottom of her handbag, covered by a scarf and jumper to make sure it wasn’t found. Unsurprisingly, there was almost no discernible speech on the resulting recording. Fortunately, DSP algorithms and powerful computers can help get around many of these problems.

But even these have limits: Gordon described a phenomenon known as the “CSI Effect”, whereby the public has unrealistic, fantasy-based expectations of surveillance restoration technology. He cited the apparently genuine example of a person who had photographed the side of a speeding getaway vehicle with a mobile phone, and handed the picture to the police in the expectation that, by rotating the side-on (and blurred, low-quality) image in a 3-D computer imaging system, they could read the licence plate!

Absurd cases aside, there is a growing problem: the bad guys are increasingly aware of surveillance techniques, making (for example) body wires impractical because criminals know how to frisk for them effectively. They also know to hold sensitive conversations in locations with loud, effective masking noise such as running water or a television.

Gordon divided noise reduction for audio forensics into two main applications: real-time surveillance and non-real-time laboratory investigation. Surveillance systems have live listeners (typically police or security officers), who may need to make fast, accurate and life-critical decisions based on what is heard. The principal requirements are low latency, high intelligibility and low listener fatigue. Non-real-time systems are typically used to produce evidence admissible in court, so the requirements are high transcription accuracy, the retrieval of otherwise unintelligible speech, and reduced transcriptor fatigue. Moreover, jurors are not trained listeners and courtrooms typically have very poor acoustics, so background noise may affect their judgement. He cited the case of a defence lawyer who used the presence of modest traffic and street noise on an intelligible recording of incriminating statements to cast doubt on the transcription of the recorded speech – and won.

Gordon listed the long-established principles of good non-covert audio evidence: a suitable recorder, competent operator, authentic recordings, recordings preserved such that they are demonstrable in court, speakers identified, evidence made voluntarily and in good faith – and no edits or changes made. The last point is potentially problematic as, in principle, it could exclude the enhancement processes that render noisy evidence intelligible. This is a grey area, with the degree of processing admissible dependent on the judge, court and jurisdiction. Clearly, there is a need to demonstrate that the processing has not modified the meaning of the evidence. For example, it’s not possible for the microscopic editing of a real-time declicking algorithm to change phonemes, and so change the meaning, but the court may need to be convinced of this. Additionally, proposed UK government regulations on handling evidence may be applied to audio evidence, potentially causing substantial problems when regulations designed to protect physical items are applied to digital media.

Gordon moved on to talk about the specifics of the technology used: it’s usually some combination of noise reduction, equalisation and level processing (e.g. dynamics processing). Dialogue noise suppression is a technology originally developed for the film industry, and CEDAR’s first product in this field, a real-time, very easy-to-use device, was aimed at post-production for film, video and TV: the typical application was to save a take that had been spoiled by ambient sound intrusion. This was contrasted with lab systems: large computer-based systems intended for off-line batch processing rather than real-time use.

Gordon demonstrated the use of declickers. The earliest algorithms in this field were developed for 78rpm archives, but have since been refined much further and are now extremely helpful in removing GSM noise, the familiar buzzing/pulsing interference caused by mobile phones. GSM noise can be shown to comprise a buzz at around 217Hz and a series of pulses. The declicker can remove the impulsive noises, and the buzz can be removed with a dedicated Debuzz algorithm. The results were demonstrated with a 999 call recording that was originally almost completely inaudible but which, when processed, revealed much more information, including the presence of a second, previously unheard speaker in the background – of crucial importance to the court case in which the recording was presented as evidence!
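CEDAR’s Debuzz is a proprietary product, so the sketch below does not reproduce it. It substitutes a much simpler, textbook technique: a fixed cascade of notch filters at an assumed 217Hz fundamental and its harmonics, using the notch biquad from Robert Bristow-Johnson’s “Audio EQ Cookbook”. The Q value, harmonic count and function names are illustrative assumptions; a 400Hz aircraft power buzz would be handled the same way by changing the fundamental.

```python
# Minimal debuzz sketch: a cascade of notch filters at a buzz fundamental and its harmonics.
# Assumptions (not CEDAR's algorithm): a fixed, known fundamental (about 217 Hz for GSM
# interference), RBJ "Audio EQ Cookbook" notch biquads, and the same Q for every notch.
import math
from typing import List, Sequence, Tuple

Coeffs = Tuple[Tuple[float, float, float], Tuple[float, float]]


def notch_coeffs(f0: float, fs: float, q: float) -> Coeffs:
    """Notch biquad coefficients normalised so that a0 == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0)
    a = (-2.0 * cos_w0 / a0, (1.0 - alpha) / a0)  # (a1, a2)
    return b, a


def biquad(x: Sequence[float], coeffs: Coeffs) -> List[float]:
    """Direct form I biquad filter."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y: List[float] = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def debuzz(x: Sequence[float], fs: float, fundamental: float = 217.0,
           harmonics: int = 10, q: float = 30.0) -> List[float]:
    """Notch out the fundamental and its harmonics below the Nyquist frequency."""
    y = list(x)
    for k in range(1, harmonics + 1):
        f = k * fundamental
        if f >= fs / 2:
            break
        y = biquad(y, notch_coeffs(f, fs, q))
    return y


def rms(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    fs = 8000.0
    n = 8000
    buzz = [math.sin(2 * math.pi * 217.0 * i / fs) for i in range(n)]
    tone = [math.sin(2 * math.pi * 1000.0 * i / fs) for i in range(n)]
    # Compare steady-state levels after the filters' start-up transients have died away.
    print("buzz kept:", rms(debuzz(buzz, fs)[-2000:]) / rms(buzz[-2000:]))
    print("1 kHz kept:", rms(debuzz(tone, fs)[-2000:]) / rms(tone[-2000:]))
```

A fixed comb like this cannot follow a drifting buzz frequency, and it does nothing about the GSM pulses, which is why the lecture pairs a declicker with a dedicated debuzzer.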

Gordon next discussed the use of adaptive filters. If the statistics of the noise are relatively constant, it’s possible to design a filter to separate speech (which tends to change rapidly) from the noise. Additional improvements can sometimes be achieved by treating low and mid frequencies differently to high frequencies, based on perceptual models of hearing and intelligibility.

Some of the interesting applications of adaptive filters include cleaning up recordings made in reverberant spaces such as holding cells and transfer vans, and removing the 400Hz buzz from aircraft power systems that can degrade air traffic control recordings. And, in a reversal of the normal filtering, Gordon described how CEDAR removed the shouting from a cockpit voice recording in a helicopter that had just suffered a catastrophic mechanical failure, so that the investigators could listen to the mechanical sounds to trace the cause of the accident.

Cross-channel adaptive filters can overcome steps taken to defeat surveillance, such as using a loud radio or TV to mask a conversation. This type of filter exploits the correlation between the direct broadcast signal (if available) and the tonally altered version of it present in the surveillance recording, and can effectively remove it from the surveillance signal. If no convenient reference copy of the broadcast is available, using microphones at several locations means that some capture more speech and others more of the interfering signal, giving the cross-channel adaptive filter enough to work with. A reconstructed demonstration was played in which a transcription expert, working from a single-mic recording of speech in the presence of loud music from a radio, achieved approximately 30% accuracy. When a second mic was positioned closer to the radio and used as the reference channel for the cross-channel adaptive filter, intelligibility improved hugely and transcription accuracy rose to 100%.
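The principle behind a two-channel canceller can be illustrated with the textbook normalised LMS (NLMS) adaptive noise canceller. This is a swapped-in, generic algorithm, not necessarily what CEDAR uses. The sketch assumes the reference channel contains the interfering broadcast but no speech, models the leakage into the main mic as a short filter, and uses a sine wave as a stand-in for speech; every name and parameter value is illustrative.

```python
# Minimal adaptive noise canceller using normalised LMS (NLMS).
# Assumptions (not CEDAR's implementation): the reference channel carries the
# interfering broadcast but no speech, the leakage path into the main mic is a
# short FIR filter, and the tap count and step size are illustrative only.
import math
import random
from typing import List, Sequence, Tuple


def nlms_cancel(primary: Sequence[float], reference: Sequence[float],
                taps: int = 8, mu: float = 0.05, eps: float = 1e-8) -> List[float]:
    """Return the error signal e[n] = primary[n] - w . x[n], i.e. the cleaned output."""
    w = [0.0] * taps
    x_buf = [0.0] * taps  # most recent reference samples, newest first
    cleaned: List[float] = []
    for d, x in zip(primary, reference):
        x_buf.insert(0, x)
        x_buf.pop()
        estimate = sum(wi * xi for wi, xi in zip(w, x_buf))  # predicted interference
        e = d - estimate
        step = mu * e / (eps + sum(xi * xi for xi in x_buf))  # normalise by input power
        w = [wi + step * xi for wi, xi in zip(w, x_buf)]
        cleaned.append(e)
    return cleaned


def demo(n: int = 20000, seed: int = 1) -> Tuple[float, float]:
    """Synthetic test: (interference power, residual power) over the last 2000 samples."""
    rng = random.Random(seed)
    reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical leakage path from the radio to the main mic: a gain plus a one-sample echo.
    interference = [0.8 * reference[i] + (0.3 * reference[i - 1] if i > 0 else 0.0)
                    for i in range(n)]
    # A sine wave stands in for speech (8 kHz sample rate assumed).
    speech = [0.5 * math.sin(2.0 * math.pi * 300.0 * i / 8000.0) for i in range(n)]
    primary = [s + v for s, v in zip(speech, interference)]
    cleaned = nlms_cancel(primary, reference)
    tail = range(n - 2000, n)
    before = sum(interference[i] ** 2 for i in tail) / len(tail)
    after = sum((cleaned[i] - speech[i]) ** 2 for i in tail) / len(tail)
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"interference power {before:.3f}, residual after cancelling {after:.4f}")
```

The filter only learns the path from the reference to the main microphone; whatever in the main channel is uncorrelated with the reference (here, the speech) passes through in the error signal. Speech leaking into the reference mic would be partly cancelled too, which is the main practical limitation of the two-microphone arrangement.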

The form of broadband noise reduction known as spectral subtraction is an impressive tool in music production and restoration, but its use in forensics can be more limited: although it improves listenability and reduces fatigue, the best that can be hoped for regarding intelligibility is that it doesn’t damage it. Nonetheless, it has other significant uses in audio forensics, such as removing the hiss that adaptive filters can add.

EQ, despite its simplicity and ubiquity, has been a staple forensic processor since long before the days of DSP and adaptive filters: removing low frequencies and adding a little boost in the upper mids can hugely increase intelligibility. Limiters are used to reduce the impact of sudden loud noises. By its nature, forensic audio can involve extreme dynamic ranges. When a surveillance officer or transcriptor is listening closely to very low-level signals at very high gain, loud sounds such as gunshots or vehicle crashes could, without limiting, damage the listener’s hearing. In other cases, such as a telephone conversation recorded on a hand-held recorder, balancing the levels of the local and remote speakers can make the evidence more intelligible and therefore more useful in court.
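The core of spectral subtraction can be sketched for a single analysis frame. This is the generic textbook form, not CEDAR’s noise reducer: it assumes the magnitude spectra come from a short-time Fourier transform computed elsewhere, that the noise estimate is averaged over frames believed to contain only noise, and that a spectral floor is used to limit “musical noise”. Parameter values and names are illustrative.

```python
# Minimal magnitude spectral subtraction sketch for one analysis frame.
# Assumptions: magnitude spectra come from an STFT computed elsewhere, the noise
# estimate is averaged over noise-only frames, and a spectral floor limits
# "musical noise". Parameter values are illustrative only.
from typing import List, Sequence


def estimate_noise(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average magnitude per frequency bin over frames believed to contain only noise."""
    count = len(noise_frames)
    return [sum(frame[k] for frame in noise_frames) / count
            for k in range(len(noise_frames[0]))]


def spectral_subtract(noisy: Sequence[float], noise: Sequence[float],
                      over_subtraction: float = 1.0, floor: float = 0.05) -> List[float]:
    """Subtract the noise estimate from each bin, never going below floor * input."""
    return [max(y - over_subtraction * n, floor * y) for y, n in zip(noisy, noise)]


if __name__ == "__main__":
    noise = estimate_noise([[0.1, 0.2, 0.1], [0.3, 0.2, 0.1]])  # roughly [0.2, 0.2, 0.1]
    print(spectral_subtract([1.0, 0.25, 0.05], noise))
```

In a complete system the cleaned magnitudes are usually recombined with the noisy phase and inverse-transformed frame by frame. Because the process only removes energy, it can make a recording easier to listen to without recovering words that the noise has already obscured, which matches the limited role described above.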

Gordon mentioned the increasingly widespread suspicion that audio data mining is being deployed by security agencies: that is, mass interception of all voice communications with automatic recognition of certain key words (e.g. bomb, jihad). Gordon’s view is that this is not currently technically practical, but that its use may increase within a decade or two. What is currently feasible, and is being used to an ever greater degree, is automatic speaker recognition: commercial solutions are developing fast, but their robustness to voice signals that have been altered by enhancement processing is an ongoing research field. Another significant recent development is the prevalence of low bit-rate, highly compressed perceptual codecs, which can make both enhancement and automatic speaker recognition more problematic.

Gordon concluded his lecture with a mention of spectrographic editing, which was invented by CEDAR. Time-domain edits can be recognised in a spectrogram, making this kind of evidence tampering obvious. But spectrographic editing allows powerful manipulation of the signal that is often invisible to later investigation. This tampering can be very dangerous in the wrong hands, but when used ethically it can reduce or remove masking signals, making it a powerful enhancement tool.

Many thanks to Gordon for an eye-opening lecture, and his fascinating insights into the remarkable technology his company has created.

Lecture report by Michael Page

‘Remastering and Audio Restoration at Abbey Road Studios’

Title: ‘Remastering and Audio Restoration at Abbey Road Studios’
Location: Royal Academy of Engineering, London
Description: Lecture by Simon Gibson of Abbey Road Studios
Start Time: 18:30 for 19:00
Date: Tuesday 11th May 2010

EMI has an archive going back to 1898 and, since Abbey Road Studios opened in 1931, there has been a gradual increase in the remastering of that back catalogue for new formats. Starting with a potted history of EMI, the early years of recording and the work of Alan Blumlein, we move on to the emergence of remastering at Abbey Road and the systems and techniques used today. The talk will then concentrate on the use made of CEDAR Audio’s Retouch software in the audio restoration of The Beatles album remasters, as well as its more usual use in the creation of music for the video game The Beatles: Rock Band. Along the way we will hear rare audio extracts from EMI’s archive and clips from The Beatles’ recordings to demonstrate these remastering and restoration techniques.

‘Improved methods for controlling touring loudspeaker arrays’

Title: ‘Improved methods for controlling touring loudspeaker arrays’
Location: Royal Academy of Engineering, London
Description: Lecture by Ambrose Thompson, Martin Audio
Start Time: 18:30 for 19:00
Date: Tuesday 9th March 2010

Download the recording of this lecture here (12MB MP3)

The focus of this paper is a popular type of line array loudspeaker used for large- and medium-scale sound reinforcement. These systems are required to deliver very high SPL to a large audience area, sometimes as far as 100m from the array but typically in the 30-70m range. This class of line array is characterised by relatively widely spaced acoustic sources, each with high vertical directionality, compared with the more traditional steered column loudspeaker, in which the acoustic sources are small and tightly spaced. These differences, together with the fact that large audience regions are typically in the near field, preclude the use of existing techniques for controlling linear arrays.

Currently successful control methods were examined and found inadequate for meeting a new, more stringent set of user requirements. This paper describes how users of the modern articulated line array loudspeakers used for high-level sound reinforcement can control these systems with more precision, and explains how these requirements can be formulated as a mathematical model of the system suitable for numerical optimisation. The primary design variables for the optimisation were the complex transfer functions applied to each acoustic source. The paper explains how the optimised transfer functions were implemented with IIR/FIR filters on typically available hardware, and compares the predicted and measured output for a large array.
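To make the idea of “a mathematical model suitable for numerical optimisation” concrete, the sketch below shows the simplest possible forward model and cost function at a single frequency. It is not Martin Audio’s model: it assumes ideal omnidirectional point sources in a 2-D vertical plane and free-field propagation, whereas the lecture stresses that real line-array elements are highly directional. Each complex weight can be read as one frequency point of the transfer function applied to that source; the array geometry, target and names are hypothetical.

```python
# Minimal forward model for array optimisation: complex pressure from several sources.
# Assumptions (not Martin Audio's model): ideal omnidirectional point sources in a 2-D
# vertical plane, free-field propagation at one frequency, and a least-squares cost on
# level error at listener positions. Real line-array elements are highly directional.
import cmath
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
SPEED_OF_SOUND_M_PER_S = 343.0


def pressure_at(point: Point, sources: Sequence[Point],
                weights: Sequence[complex], freq_hz: float) -> complex:
    """Sum of each source's complex weight times exp(-jkr)/r spherical spreading."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND_M_PER_S
    total = 0j
    for (sx, sy), w in zip(sources, weights):
        r = math.hypot(point[0] - sx, point[1] - sy)
        total += w * cmath.exp(-1j * k * r) / r
    return total


def level_db(p: complex) -> float:
    """Relative level in dB (0 dB = a unit-weight source heard at 1 m)."""
    return 20.0 * math.log10(abs(p))


def cost(listeners: Sequence[Point], targets_db: Sequence[float],
         sources: Sequence[Point], weights: Sequence[complex], freq_hz: float) -> float:
    """Sum of squared level errors; an optimiser would vary `weights` to minimise this."""
    return sum((level_db(pressure_at(pt, sources, weights, freq_hz)) - t) ** 2
               for pt, t in zip(listeners, targets_db))


if __name__ == "__main__":
    # Hypothetical 8-element array hung 10 m up, elements 0.35 m apart, driven uniformly.
    sources = [(0.0, 10.0 - 0.35 * i) for i in range(8)]
    uniform = [1.0 + 0j] * 8
    listeners = [(30.0, 1.7), (50.0, 1.7), (70.0, 1.7)]
    for pt in listeners:
        print(pt, round(level_db(pressure_at(pt, sources, uniform, 1000.0)), 1))
    # Cost against a hypothetical flat target equal to the level at the nearest seat.
    flat = [level_db(pressure_at(listeners[0], sources, uniform, 1000.0))] * 3
    print("cost vs flat target:", cost(listeners, flat, sources, uniform, 1000.0))
```

A practical formulation would repeat this over many frequencies and listener positions, include measured element directivity, and add constraints such as limits on drive level, before handing the cost to a numerical optimiser.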