Archive for the ‘Past meeting reports’ Category

Active Acoustic Absorbers: Do They Work? [London]

Title: Active Acoustic Absorbers: Do They Work?
Location: Royal Academy of Engineering, London, SW1Y 5DG
Description: Lecture by John Vanderkooy
Start Time: 18:30 for 19:00
Date: Tuesday 10th May 2011

Lecture Report

Active acoustic absorbers can replace low-frequency ‘passive’ absorption techniques. Passive techniques generally involve solving resonance problems in a room by introducing absorbing materials or resonant structures. General-purpose acoustic absorbers, comprising sheets of heavy material attached to rigid frames, are somewhat impractical in many rooms, because the size at which they become effective is between a quarter- and a half-wavelength of the frequency of interest: around six feet when treating a 50Hz resonance. Membrane absorbers and resonators are tuned to move when stimulated by certain frequencies, and hence to terminate standing waves. These can be relatively small in size, but because they can react to only a small range of frequencies, several may be required to treat a room. We can alleviate serious problems in a room by equalising the loudspeakers that excite them, but this addresses only problems of sound pressure level. Equalisation cannot treat the equally insidious phase and reverberation time discontinuities that afflict specific frequencies.

Active absorption is effected by positioning subwoofers or full-range loudspeakers strategically, and driving them with a specially-calculated signal that cancels a large range of frequencies, using less space and treating a greater range of frequencies than a passive absorber could.

A relatively simple and theoretically ideal example of this is the ‘delay and cancel’ scheme. Taking a rectangular room, we can place two loudspeakers 25% and 75% of the distance along a wall, and drive them coherently. The images of these loudspeakers reflected in the other walls are evenly spaced, creating a plane wave along the room. This can be cancelled at the rear of the room using a similar arrangement of loudspeakers, delayed appropriately. The bass then becomes effectively anechoic at low frequencies.

Given a rectangular room of dimensions 8 × 7 × 3.5 metres and a reverberation time of one second, we can use the Sabine formula to calculate that the room contains 31.5 sabins (square metres of ideal absorption). The effective area of an active absorber is equal to:

$$A_{\mathrm{abs}} = \frac{\lambda^2}{4\pi}$$

This is about 3.7 sabins at 50Hz (an extra 12% of absorption in this room), and 24 sabins at 20Hz (an extra 74%). The theoretical benefits of ideal active absorption are clear, but there are some practical difficulties. Firstly, an ideal loudspeaker is a point source radiator, and the wavefront that we want to treat is generally closer to a plane wave. How do we know that our absorber is not interacting elsewhere with the wave that we are attempting to absorb? Secondly, how does this treatment work in a room where the absorption signal itself is reflected?
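The arithmetic behind these figures can be reproduced in a few lines. This is only a sketch: the speed of sound (343 m/s) and the metric Sabine constant (0.161 s/m) are assumed values, not figures quoted in the lecture, and the function names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at about 20 degrees C
SABINE_CONSTANT = 0.161  # s/m, assumed metric form of the Sabine formula


def sabine_absorption(volume_m3, rt60_s):
    """Total absorption in metric sabins, from T60 = 0.161 * V / A."""
    return SABINE_CONSTANT * volume_m3 / rt60_s


def active_absorber_area(frequency_hz, c=SPEED_OF_SOUND):
    """Effective area of an ideal active absorber: wavelength squared over 4 pi."""
    wavelength = c / frequency_hz
    return wavelength ** 2 / (4.0 * math.pi)


room_absorption = sabine_absorption(8.0 * 7.0 * 3.5, 1.0)
print(f"room: {room_absorption:.1f} sabins")
for f in (50.0, 20.0):
    area = active_absorber_area(f)
    extra = 100.0 * area / room_absorption
    print(f"{f:.0f} Hz: {area:.1f} sabins, {extra:.0f}% extra")
```

With these assumed constants the 20Hz result comes out a little under the rounded figure quoted in the report; a slightly different speed of sound accounts for differences of that size.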

John Vanderkooy’s derivation of the driving voltage for an acoustic absorber was performed very rapidly at the lecture, but the brief answer is that both conditions are met without difficulty. The emerging formula for the absorbing signal is:

$$q(t) = \frac{2\pi c}{\rho} \iint p(r,t)\,dt\,dt$$

Here q(t) is the desired volume velocity of the active absorber loudspeaker, c is the speed of sound, and ρ is the density of air. Thus, the cone velocity of the absorber is proportional to the double time integral of the pressure at the loudspeaker from the room. To produce this volume velocity we could use a velocity-sensing coil on the loudspeaker in feedback, for example. The pressure p(r,t) must not be contaminated by the absorber signal itself, so we must know the absorber response and subtract it from the microphone signal. Eliminating the self-pressure of the loudspeaker (caused by its provision of the absorbing signal), and shaping the output transfer function to be both stable and correct for the loudspeaker, are significant challenges.

In summary, active absorption is an acoustically valid way of treating low-frequency problems in real rooms, but there are considerable practical difficulties in doing it well. One practical barrier is the necessity for near zero-latency analogue-to-digital conversion and DSP in order to suppress the local absorber signal, read the instantaneous external pressure from the room, react to it, and hence calculate the desired absorption signal.

Report by Ben Supper

Harmonic phase – the missing factor in distortion measurement [London]

Title: Harmonic phase – the missing factor in distortion measurement
Location: Royal Academy of Engineering, London, SW1Y 5DG
Description: Lecture by Keith Howard
Start Time: 18:30 for 19:00
Date: Tuesday 12th April 2011

Lecture Report

It is a truism that harmonic distortion affects the perceived quality of an audio signal. It is less readily accepted that such distortion may sometimes be pleasant. In 1977, Hiraga’s article ‘Amplifier Musicality’ [1] controversially suggested that certain kinds of harmonic distortion may improve the perceived quality of a Hi‑Fi system. This notion is now dubbed ‘euphonic distortion’, although more than thirty years later, few, if any, new insights exist on the subject.

Less controversially, many recording engineers insist on specifying equipment that introduces certain types of harmonic distortion at high input levels (valve amplifiers and analogue tape, for example) and deliberately overdriving it. The effect of this distortion is not always obvious, but it imparts a diaphanous quality of warmth or complexity. Other types of harmonic distortion, such as Class B amplifier crossover distortion, are undoubtedly dysphonic: even tiny amounts of crossover distortion are audible, and very unpleasant.

Keith Howard has measured, characterised and emulated harmonic distortion in certain situations. This research led to a number of important conclusions. One of these gives this lecture its title: whether reproducing distortion or measuring it, it is not sufficient to record only the level and spectrum of harmonics. Rendering the harmonic phase correctly is just as important.

The distortion algorithm that Keith Howard uses in his experiments is based around a waveshaping kernel. This is a function which maps every input sample value to an output sample value. This process is time and frequency invariant, but forms the core of a class of systems that are commonly used for non-linear signal processing. The mapping function may be controlled, using any of a number of methods, to generate a certain pattern of harmonics for an input of a certain amplitude. To add a second harmonic, for example, a waveshaping kernel is derived using the following trigonometric identity as a starting point:

$$2\cos^2 x - 1 = \cos 2x$$

So $y = 2x^2 - 1$ is a waveshaping kernel function for generating a second harmonic, mapping an input sample value between -1 and +1 to an output in the same range.

X-squared kernel

Neither this example nor even-harmonic kernels in general pass through the origin, so a d.c. component is generated that must be filtered from the output if anything but a full-amplitude sinusoid is presented. For the third harmonic, a different identity is used:

$$4\cos^3 x - 3\cos x = \cos 3x$$

So $y = 4x^3 - 3x$ is a waveshaping kernel function that generates a third harmonic.

X-cubed kernel

In waveshaping, the amplitude of a harmonic falls faster than that of the input signal, so that attenuating the input (in this case, by 3dB) changes the shape of the output wave.

X-cubed kernel, 3dB down
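The behaviour of the second- and third-harmonic kernels can be checked numerically. The sketch below passes a cosine of a chosen amplitude through each kernel and measures individual harmonics with a plain DFT sum over one period. The function names are illustrative and are not taken from AddDistortion.

```python
import math


def harmonic_amplitude(kernel, input_amplitude, harmonic, points=1024):
    """Cosine-series amplitude of one harmonic of kernel(a * cos(theta)).

    harmonic=0 returns the d.c. level.
    """
    total = 0.0
    for n in range(points):
        theta = 2.0 * math.pi * n / points
        total += kernel(input_amplitude * math.cos(theta)) * math.cos(harmonic * theta)
    scale = 1.0 if harmonic == 0 else 2.0
    return scale * total / points


def second_harmonic_kernel(x):
    return 2.0 * x * x - 1.0


def third_harmonic_kernel(x):
    return 4.0 * x ** 3 - 3.0 * x


def to_db(ratio):
    return 20.0 * math.log10(abs(ratio))


minus_3_db = 10.0 ** (-3.0 / 20.0)
for a in (1.0, minus_3_db):
    dc = harmonic_amplitude(second_harmonic_kernel, a, 0)
    h2 = harmonic_amplitude(second_harmonic_kernel, a, 2)
    h1 = harmonic_amplitude(third_harmonic_kernel, a, 1)
    h3 = harmonic_amplitude(third_harmonic_kernel, a, 3)
    print(f"input {to_db(a):+.1f} dB: d.c. {dc:+.3f}, H2 {h2:.3f}, "
          f"fundamental {h1:+.3f}, H3 {h3:.3f}")
```

For an input of amplitude a, the second-harmonic kernel yields a harmonic of amplitude a² plus a d.c. term of a² − 1. The third-harmonic kernel yields a harmonic of amplitude a³ and lets 3a³ − 3a of the fundamental through. Attenuating the input by 3dB therefore drops the third harmonic by 9dB.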

Waveshaping functions of higher order than this may be generated iteratively using Chebyshev polynomials.
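As a minimal sketch of that iteration, the standard Chebyshev recurrence T0 = 1, T1 = x, T(n+1) = 2x·T(n) − T(n−1) produces the kernel for any harmonic order. The helper names here are illustrative, and nothing is implied about how AddDistortion is implemented.

```python
import math


def chebyshev_coefficients(order):
    """Coefficients of T_order(x), lowest power first."""
    previous, current = [1], [0, 1]
    if order == 0:
        return previous
    for _ in range(order - 1):
        doubled_shifted = [0] + [2 * c for c in current]  # 2x * T_n
        padded_previous = previous + [0] * (len(doubled_shifted) - len(previous))
        previous, current = current, [
            a - b for a, b in zip(doubled_shifted, padded_previous)
        ]
    return current


def evaluate(coefficients, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coefficients):
        result = result * x + c
    return result


for n in range(2, 6):
    print(n, chebyshev_coefficients(n))

# T_n(cos(theta)) equals cos(n * theta), which is why T_n adds the nth harmonic
# to a full-amplitude cosine.
theta = 0.3
print(evaluate(chebyshev_coefficients(5), math.cos(theta)), math.cos(5 * theta))
```

Orders 2 and 3 reproduce the kernels 2x² − 1 and 4x³ − 3x derived above from the trigonometric identities.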

Keith’s method of designing and applying waveshaping distortion is encapsulated in a free program called AddDistortion, available from the freeware page of his web site.

Beyond the second and third harmonics, the fractions of each order of polynomial become strongly interdependent. For any input signal that is not a sinusoid at full amplitude, it is not possible to add a fourth harmonic without also introducing a second harmonic. The same is true for any other harmonic beyond the third. Also, because the distortion kernel is derived from a series of continuous functions, discontinuities such as corners or jumps in the transfer characteristic cannot be modelled. A final complication is that the signal must be interpolated before waveshaping and decimated afterwards. This prevents aliasing distortion from occurring when the upper harmonics pass the Nyquist limit.
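The interdependence of the fourth and second harmonics can be illustrated with the fourth-order Chebyshev kernel, 8x⁴ − 8x² + 1. This is a sketch using illustrative function names; the half-amplitude test level is an arbitrary choice.

```python
import math


def fourth_harmonic_kernel(x):
    """Fourth-order Chebyshev polynomial, T4(x) = 8x^4 - 8x^2 + 1."""
    return 8.0 * x ** 4 - 8.0 * x ** 2 + 1.0


def harmonic_amplitude(kernel, input_amplitude, harmonic, points=1024):
    """Cosine-series amplitude of one harmonic of kernel(a * cos(theta))."""
    total = 0.0
    for n in range(points):
        theta = 2.0 * math.pi * n / points
        total += kernel(input_amplitude * math.cos(theta)) * math.cos(harmonic * theta)
    scale = 1.0 if harmonic == 0 else 2.0
    return scale * total / points


for a in (1.0, 0.5):
    h2 = harmonic_amplitude(fourth_harmonic_kernel, a, 2)
    h4 = harmonic_amplitude(fourth_harmonic_kernel, a, 4)
    print(f"input amplitude {a}: H2 {h2:+.4f}, H4 {h4:+.4f}")
```

For an input of amplitude a, this kernel produces a fourth harmonic of a⁴ and a second harmonic of 4a⁴ − 4a², which vanishes only when a = 1. At half amplitude the unwanted second harmonic is considerably larger than the intended fourth.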

The ramifications of these limitations are powerful. For example, we could attempt to correct a system that distorts audio in a known way, by applying pre-distortion to the input. However, this results in problems. If the system introduces a second harmonic, we might generate this harmonic in antiphase in the input so that it cancels the distortion product. However, the second harmonic introduced in the input will itself be distorted by the system, and will generate a fourth harmonic in the output, and very likely a third harmonic as an intermodulation product. We have eliminated the second harmonic, but have possibly made the problem somewhat worse. If we then anticipate the fourth harmonic, there will then be an eighth harmonic in the output, and so on. Such correction cannot therefore be performed using analogue circuitry. This rule was often advanced in the argument against the use of corrective feedback when the debate raged in the Hi-Fi community a few decades ago. However, a correct transfer characteristic may carefully be derived in the digital domain by generating a true inverse function, which is effective at least until a certain maximum frequency is reached.

When more complicated signals are distorted by nonlinear functions, it is known that harmonic distortion is a very small part of the overall picture: Brockbank and Wass determined analytically that, for a signal containing thirty harmonic products, the intermodulation distortion generated by a nonlinearity in the system comprises 99% of the total distortion power [2]. Full measurement and analysis of intermodulation distortion requires at least as many components in the input signal as harmonics that are under scrutiny.

This method and these observations take us to a practical example of the importance of harmonic phase. Keith advanced three case studies, the first of which demonstrates the point effectively; the other two highlight the opportunities for wider research.

Case study 1: Crossover distortion

In 1975, James Moir performed a series of listening experiments in which a Class AB amplifier was biased at different levels, and the audibility of the resulting distortion measured [3]. Keith Howard’s first attempt to reproduce these results using a waveshaping kernel was not effective: amounts of distortion that would have been perceived as unacceptable in the listening test were barely audible in practice. The generated transfer characteristic looks nothing like crossover distortion, and has very little effect on a low-amplitude signal.

Crossover distortion with same polarity

However, by alternating the polarity of the harmonic partials but keeping them at the same level, a more familiar characteristic is revealed:

Different polarity

For a full-deflection sine tone, these would measure exactly the same on a spectrogram or a THD+n meter, but they are clearly not the same. The resulting waveform reproduces the results of Moir’s test satisfactorily, and keeps the distortion components far higher as the amplitude falls. It also proves that when we are analysing or modelling distortion, we are interested just as much in the waveshaping function as the absolute level of the harmonic partials.
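The effect of harmonic polarity on the transfer characteristic can be sketched with two kernels built from Chebyshev polynomials. The harmonic levels below are hypothetical values chosen for illustration; they are not Moir’s or Keith Howard’s data, and the function names are likewise illustrative.

```python
import math


def chebyshev(order, x):
    """Evaluate T_order(x) with the recurrence T_{n+1} = 2x*T_n - T_{n-1}."""
    previous, current = 1.0, x
    if order == 0:
        return previous
    for _ in range(order - 1):
        previous, current = current, 2.0 * x * current - previous
    return current


def make_kernel(harmonic_levels):
    """Build y = x + sum(level * T_k(x)) from a {harmonic: signed level} mapping."""
    def kernel(x):
        return x + sum(level * chebyshev(k, x) for k, level in harmonic_levels.items())
    return kernel


def harmonic_amplitude(kernel, input_amplitude, harmonic, points=1024):
    """Cosine-series amplitude of one harmonic of kernel(a * cos(theta))."""
    total = 0.0
    for n in range(points):
        theta = 2.0 * math.pi * n / points
        total += kernel(input_amplitude * math.cos(theta)) * math.cos(harmonic * theta)
    return 2.0 * total / points


def gain_at_origin(kernel, step=1e-6):
    """Small-signal gain: slope of the transfer characteristic at zero input."""
    return (kernel(step) - kernel(-step)) / (2.0 * step)


LEVEL = 0.02  # hypothetical level for each odd harmonic
same_polarity = make_kernel({3: LEVEL, 5: LEVEL, 7: LEVEL, 9: LEVEL})
alternating = make_kernel({3: LEVEL, 5: -LEVEL, 7: LEVEL, 9: -LEVEL})

for name, kernel in (("same polarity", same_polarity), ("alternating", alternating)):
    levels = [abs(harmonic_amplitude(kernel, 1.0, k)) for k in (3, 5, 7, 9)]
    print(name, [round(v, 4) for v in levels],
          "gain near zero:", round(gain_at_origin(kernel), 3))
```

With a full-amplitude input, both kernels give the same harmonic magnitudes, so a level-only measurement cannot tell them apart. In this sketch the alternating-polarity kernel has roughly half the small-signal gain of the other, which is the flattened region around zero that characterises crossover distortion.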

Case study 2: Hysteresis in transformers

In addition to the nonlinearities caused by saturation, audio transformers exhibit an asymmetrical transfer characteristic caused by their magnetic memory (hysteresis). As well as being frequency dependent, this characteristic makes modelling the distortion very difficult, because phase shift is introduced into the signal as well as wave shaping. Keith suggested a number of ways in which this could be incorporated into the distortion model in future, by using two waveshaping kernels in quadrature.

Case study 3: Loudspeaker distortion

The mechanisms that cause loudspeaker distortion are split into many different types: some, such as the cone or spider hitting their maximum excursion, are proportional to the displacement of the loudspeaker; some, such as eddy currents, are proportional to the force applied to the coil; others are proportional to cone velocity. The problem of modelling this distortion therefore falls into the same category as hysteresis in transformers: non-linearities act in different ways at different phases of the signal, and a static waveshaping kernel is clearly of limited use.

References

1. Hiraga, J.  ‘Amplifier Musicality’. Hi-Fi News & Record Review, Vol. 22(3), March 1977, pp.41–45.

2. Brockbank, R. A., and Wass, C. A. A.  ‘Non-linear distortion in transmission systems’.  J. I.E.E., Vol. 92, III, 17, 1945, pp.45–56.

3. Moir, J.  ‘Crossover Distortion in Class AB Amplifiers’.  50th AES Convention, March 1975. Paper number L-47.

Further reading

Howard, K.  ‘Weighting Up’.  Multimedia Manufacturer, September/October 2005, pp.7–11.

Report by Ben Supper

‘Intelligent Audio Editing Technologies’

Title: Intelligent Audio Editing Technologies
Location: Royal Academy of Engineering, London
Description: Lecture by Dr. Josh Reiss, Senior Lecturer, Centre for Digital Music, Queen Mary University of London
Start Time: 18:30 for 19:00
Date: Tuesday 11th January 2011

A recording of the lecture is available here (81MB mp3)

The tools of our trade have transformed in the last twenty years, but the workflow of a mixing engineer is almost the same. A large proportion of the time and effort spent mixing down a multitrack recording is invested not in the execution of creative judgement, but in the mundane manipulation of equalisers, dynamics compressors, panning, and replay levels, so that the timbre and blend of individual channels is correct enough to attempt a balance.

There are two good reasons why much of this work has not already been automated. The first is that the task is not trivial: it is a highly parallel and cross-adaptive problem, and the correct value for every setting will depend to some extent on every other. The second reason is a resistance from those who assume that automating the mixdown process will either remove the requirement for a skilled hand and ear, or result in lazy use of automation to the extent that their careers or their integrity will be threatened. To make all music sound the same is not the goal of automation. Rather, automatic mixing will speed up the repetitive parts of an engineer’s job so that more effort can be expended on the art of production.

We need only look at the evolution of digital cameras to see what could be possible with audio. A typical consumer camera of twenty years ago would have had a fixed focal length and aperture, and perhaps an adjustable shutter speed. Now, multi-point auto-focus is a standard feature, the exposure time, aperture, and colour balance are adjusted automatically, a digital signal processor ameliorates camera shake, and so on. Poor shots may be recognised and retaken as many times as is necessary, because the photographer can immediately view their photograph. In spite of these enhancements, professional photographers still exist, and still need to be taught about the optics and anatomy of a camera. However, the emphasis of photographic discipline has shifted towards the creative side of the profession: there is less time spent setting up the camera and developing exposures, and more time in perfecting the technique and shot, and retouching the images.

There are, broadly speaking, four kinds of automatic sound processing tool:

Adaptive processing. Adaptive processes adjust instantaneously to the material that is being played through them. De-noisers and transient shapers are adaptive in nature.

Automatic processing. Automatic processes place some aspects of operation under user control, and make intelligent guesses about the positions of other controls. The ‘automatic’ mode on a dynamics compressor is such an example.

Cross-adaptive processing. A cross-adaptive tool must be aware of, and react to, every signal within the system. For example, the automatic level control on a public address system that adjusts to the ambient noise level may be cross-adaptive.

Reverse engineering tools. Deconstruction of a mix for historical reasons would involve taking the multitrack session master and the stereo master, and determining which processes must be applied to the former to derive the latter. It would be useful to automate some of this.

Adaptive mixing tools require two components: an accumulative feature extraction process, and a set of constrained control rules. Much of the difficulty of getting these tools right is in obtaining the correct information from the audio in the first place: to detect, for example, the pattern of onsets, the correct loudness, and thus precise masking information. The target for an equaliser can then be to reduce temporal and spectral masking, rather than to aim for a flat frequency response. Panning can be used to reduce spatial masking. A compressor can be inserted when the probability of a particular instrument being heard falls below a certain threshold, and the instrument can then be boosted to have a certain average loudness without its peak loudness exceeding a higher threshold.

Dr. Reiss played some examples of automated mixing from the Centre for Digital Music, showing us the system element by element. First, each instrument was manipulated in isolation. Then an automatic fader balance was performed. Finally, with one button, the compressor, equaliser, panning, and fader settings were set up for an entire multitrack jazz recording. The result was surprisingly effective, although the automatic nature of the balancing was clear. The vocals, for example, were somewhat quieter than custom usually allows, and the mix was equalised to a fairly flat spectrum whereas most commercial music is boosted at the top and bottom ends. Nevertheless, the power of automated mixing was effectively demonstrated – the result was perfectly reasonable for a monitor mix and, as the algorithms are perfected, the results will certainly improve further.

Suggestions and examples of other automatic tools were shown: an eliminator of feedback for live sound, which set itself the target of keeping the loop gain of the system below 0dB in every frequency band. It achieved this by finding the transfer function of the system and calculating its inverse. A plug-in for automatically correcting inter-channel delay was also demonstrated, which successfully reduced the artefacts created by spill between one microphone and another. The aim of these tools is again to free up the balance engineer’s hands and mind for the more creative aspects of live sound engineering.

The scope for further work in refining these tools is clear, although they already work impressively well. Informal blind testing has shown that it is hard to discern the automated mixes from those executed by students (at least, in short excerpts). In an act of subterfuge, Dr Reiss entered an automated mixdown into a student competition, and confessed his crime only after the competition was judged. Although the mix failed to win a place in the competition, it also failed to pique the judges. Inevitably, technology will soon change our craft beyond recognition. Fortunately for us, the researchers appear no closer to developing a substitute for talent.

Report by Ben Supper

‘Santa Baby, Come Creep a Codec Under the Tree for Me’

Title: Christmas Lecture – Santa Baby, Come Creep a Codec Under the Tree for Me
Location: Royal Academy of Engineering, London
Start Time: 18:30 for 19:00
Date: Tuesday 7th December 2010
Description: Lecture by Prof. Jamie Angus, Professor of Audio Technology, University of Salford

A recording of the lecture is available here (50MB mp3)

Lecture Report

By the time it reaches the listener, most recorded music has been processed using at least one lossy encoder. Love it or hate it, such a process is central to the convenience of personal music devices, digital broadcasts, and domestic video technology. Countless people awoke on Christmas morning to discover a new codec, in one guise or another, beneath the tree.

So how does bit-rate reduction work, and what can be done to improve both its efficiency and the quality of its output? A codec can be seen as a four-stage process, as illustrated below:
 

Codec flowchart

The black arrows symbolise audio data, while the green arrows represent side data: information that informs neighbouring processes, but is not directly related to the audio samples.

Apart from the psychoacoustic model, every stage reduces the bit rate of incoming data. The signal redundancy remover is tailored to audio, and can be a process such as a discrete cosine transform or a predictive filter that alters the statistical distribution of the data to make it easier to compress. The entropy coder exploits the non-uniformity of the input data to represent it in a more compact way. Psychoacoustic quantisation removes data which is perceptually masked, and therefore inaudible to the listener. This stage makes the decoded output data non-identical to the input data, but enables considerably higher compression ratios to be obtained by discarding a proportion of the input signal.

We know from demonstrations of existing systems that digital audio can be compressed satisfactorily to a ratio of between 2:1 and 3:1 using lossless methods. To explain how these work, it is convenient to start at the end of the chain, the entropy coder, and approach the problem from the point of view of information theory.

Entropy coding

Change and surprise are what make information interesting, and audio is no exception. A sine wave is not interesting to listen to. Silence is interesting only when it interrupts what has come before, or changes what follows. A human voice is more interesting when the speaker modulates pitch and speed, and is conveying information that is engaging. Our input data, then, is a background of predictable information, punctuated at intervals by unpredictable elements, and it is this unpredictability that we and our codecs work hardest to convey.

All information is composed of an alphabet of symbols, and the use of these symbols is seldom uniform. The 65 536 sample levels that comprise 16-bit audio are such an alphabet, and their use is very non-uniform. This is due partly to the statistical nature of sound, but also to our desire for dynamic novelty in music. The data below comes from two commercial recordings. A sine wave does not present such a distribution, and neither does white noise.

Graph: distribution of samples

This graph shows the distribution of samples in two CDs. Blue: Kind of Blue by Miles Davis. Red: Come To Daddy EP by Aphex Twin. The former is a re-mastering of a 1950s jazz recording. The latter, released in 1997, is an archetype of high-ratio compression and distortion. The distribution of samples is nonetheless similar. Although the dynamic range of the programme material under investigation may change the offset or initial slope of this graph, it is clear that the frequency of occurrence falls by approximately half for every linear increase of 2000 sampling intervals. Sinusoids do not behave like this, but sinusoids are not musically interesting.

We need 16 bits to convey 16-bit samples, but if we look at the frequency of use, the most common symbols are used with a probability of $2^{-11}$ (about twenty per second at 44.1kHz), and the least-used symbols with a probability of about $2^{-24}$ (fewer than one every six minutes). If we use symbols of variable lengths, with short symbols for the most frequently-used sample values, and longer symbols for the least-used ones, the size of the data is considerably reduced. The number of bits of information we would require to encode an arbitrary sample in the data set is referred to as the self-information of the data. This value is around 12.3 bits for the Miles Davis example above, and 13.2 bits for Aphex Twin.
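The figure described here is the Shannon entropy of the sample histogram: the average of −log2(p) over all samples, where p is the probability of each sample value. A minimal sketch follows; the function name and the toy data are illustrative and are not taken from the recordings discussed.

```python
import math
from collections import Counter


def entropy_bits(samples):
    """Average information per symbol, in bits: -sum(p * log2(p))."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy data: value 0 occurs half the time, 1 a quarter, -1 and 2 an eighth each.
toy = [0, 0, 0, 0, 1, 1, -1, 2]
print(entropy_bits(toy))  # 1.75
```

Four distinct values would need 2 bits each as plain sample data, but the uneven distribution brings the average down to 1.75 bits. The same calculation applied to the full histogram of a 16-bit recording gives figures of the kind quoted above.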

Huffman coding

The most commonly-used method to exploit self-information is Huffman coding. A binary tree is built from the bottom up using a recursive algorithm:

  1. Join the two symbols or structures with the lowest probability of occurring, so that ‘0’ symbolises the first and ‘1’ symbolises the second.
  2. Add their probabilities together:  this is now the probability of that structure occurring.
  3. Repeat from stage 1, until all the symbols are connected.
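The three steps above can be sketched with a priority queue. This is one common way of writing the procedure rather than a reference implementation; the function name and the tie-breaking counter are choices made for this example.

```python
import heapq
from collections import Counter


def huffman_code(samples):
    """Return {symbol: bit string}, built bottom-up from symbol frequencies."""
    counts = Counter(samples)
    if len(counts) == 1:  # degenerate case: only one symbol in the alphabet
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie-breaker, {symbol: code so far}).
    # The tie-breaker stops Python from ever comparing the dictionaries.
    heap = [(n, i, {symbol: ""}) for i, (symbol, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, first = heapq.heappop(heap)   # the two least probable structures
        w1, _, second = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in first.items()}
        merged.update({s: "1" + c for s, c in second.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))  # summed probability
        tie += 1
    return heap[0][2]


toy = [0] * 8 + [1] * 4 + [-1] * 2 + [2] * 2
code = huffman_code(toy)
average_bits = sum(len(code[s]) for s in toy) / len(toy)
print(code, average_bits)
```

For this toy distribution the code lengths are 1, 2, 3 and 3 bits, and the average of 1.75 bits per sample equals the entropy exactly because every probability is a power of two. Real audio histograms will not be this tidy.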

David A. Huffman, incidentally, was a graduate student when he was assigned this problem as an exercise by his professor, Robert M. Fano. Fano and Claude Shannon had together spent some years developing the theory of creating binary trees, and could not find a method that was optimal in every case. They were building their binary trees from the top down, tackling the highest probabilities first. After many months of effort, Huffman had a sudden realisation that the solution was to build the trees the other way. Professor Fano’s response to this revelation: ‘Is that all there is to it!’

Huffman binary tree: audio samples

This is a Huffman binary tree of a five-bit rendition of Kind of Blue. Starting at the top, a binary zero is used for movement left down the tree, and one for a movement right. The binary code for zero is thus given by 1; for -2 it is 0100; 5, which is used about one hundredth as often, is 0101010001. Data that does not have the same binomial distribution as audio will generate a bushier tree; for the purposes of illustration, a Huffman binary tree based on letter frequencies in the complete works of Shakespeare is shown below.

Huffman binary tree: complete works of Shakespeare

The average number of bits we would need to represent our data using the Huffman code is 1.55, which is fairly close to its self-information coefficient of 1.36 bits, and considerably less than the 4 bits that would be needed using plain sample data.

By encoding our data this way, we can remove a substantial proportion of storage demand without changing a single sample. However, we cannot easily use Huffman encoding for large sample sizes, as we need to distribute the binary tree along with the audio. The memory requirements for storing a deep binary tree quickly become unreasonable, since the tree doubles in size for every bit added.

We can instead use the distribution of data to our advantage in a slightly different way: first, to restrict our alphabet to a number of symbols of increasing length that get us to the region of interest, and then to convey the rest of the data in raw binary. This approach is known as Golomb-Rice coding. The data that conveys the region of interest is generally conveyed using a thermometer code: 0 for the first region, 10 for the second, then 110, 1110, and so on. This is the same code as would be encountered by rotating some of the branches of the calculated tree above, but is much simpler to manipulate.
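A minimal sketch of Rice coding (the power-of-two case of Golomb-Rice coding) follows. The thermometer code is written here as a run of ones closed by a zero, so that no codeword is a prefix of another. The parameter k, the number of raw bits, and the zigzag mapping for signed samples are assumptions of this example rather than details given in the lecture.

```python
def rice_encode(value, k):
    """Encode a non-negative integer: unary quotient, then k raw remainder bits."""
    quotient, remainder = value >> k, value & ((1 << k) - 1)
    unary = "1" * quotient + "0"  # thermometer code selecting the region
    raw = format(remainder, "b").zfill(k) if k else ""
    return unary + raw


def rice_decode(bits, k):
    """Decode one value from the front of a bit string; return (value, rest)."""
    quotient = bits.index("0")  # number of leading ones
    end = quotient + 1 + k
    remainder = int(bits[quotient + 1:end], 2) if k else 0
    return (quotient << k) | remainder, bits[end:]


def zigzag(sample):
    """Map signed samples 0, -1, 1, -2, ... to non-negative 0, 1, 2, 3, ..."""
    return 2 * sample if sample >= 0 else -2 * sample - 1


def unzigzag(n):
    return n // 2 if n % 2 == 0 else -((n + 1) // 2)


samples = [0, -1, 3, -7, 2]
stream = "".join(rice_encode(zigzag(s), 2) for s in samples)
decoded = []
while stream:
    value, stream = rice_decode(stream, 2)
    decoded.append(unzigzag(value))
print(decoded)
```

Small sample values cost only k + 1 bits, and the decoder needs no stored tree, only the value of k.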

Redundancy removal

Further savings can be obtained by considering the nature of audio, and by moving somewhat towards the frequency domain. Audio is to some extent predictable: it is the response of a number of resonant systems to an excitation. The resonance and excitation components may be conveyed separately, and fairly compactly, by sending the parameters of a predictive filter together with an excitation signal. Since human speech is also produced by an excited resonant system, this approach, called adaptive or predictive encoding, forms the basis of many speech compression algorithms. The disadvantage of such systems is that they are not robust to errors: an undetected error between the encoder and decoder will upset the filter coefficients. This causes instability, and makes the audio data diverge from its proper values.
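The idea can be sketched with a fixed second-order predictor, a deliberate simplification: an adaptive codec would transmit filter parameters that follow the signal, whereas here the coefficients are constant. The test tone, the function names and the position of the injected error are all assumptions made for illustration.

```python
import math


def prediction_residual(samples):
    """Residual of a fixed predictor: e[n] = x[n] - (2*x[n-1] - x[n-2])."""
    history = [0, 0]
    residual = []
    for x in samples:
        predicted = 2 * history[-1] - history[-2]
        residual.append(x - predicted)
        history = [history[-1], x]
    return residual


def reconstruct(residual):
    """Decoder: run the same predictor and add the residual back."""
    history = [0, 0]
    output = []
    for e in residual:
        x = e + 2 * history[-1] - history[-2]
        output.append(x)
        history = [history[-1], x]
    return output


# A 100 Hz tone at 44.1 kHz, rounded to integers, as a stand-in for a resonant signal.
tone = [round(20000 * math.sin(2 * math.pi * 100 * n / 44100)) for n in range(2000)]
residual = prediction_residual(tone)
assert reconstruct(residual) == tone  # lossless round trip
print(max(abs(x) for x in tone), max(abs(e) for e in residual[2:]))

# A single undetected error in the residual makes the decoder drift away.
corrupted = list(residual)
corrupted[100] += 1
drift = [a - b for a, b in zip(reconstruct(corrupted), tone)]
print(drift[100], drift[-1])
```

Once the predictor has settled, the residual needs only a few bits per sample where the original tone needs sixteen. The corrupted case shows the fragility described above: a one-unit error at sample 100 grows steadily, and by the end of the excerpt the output is wrong by 1900 units.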

The greatest economies, those found in MPEG encoders, are obtained when the signal is considered in the frequency domain. The discrete cosine transform (DCT) is used to shift to the frequency domain, and a number of tricks are then played to simplify the data in that domain. Spectral masking, where some content masks coincident, quieter content (particularly that at higher frequencies), allows many of the coefficients of the DCT to be ignored, or stored at a lower resolution than would otherwise be necessary. Similar economies are obtained using temporal masking, where a transient sound masks events that follow closely. Audio content assumed to be below the threshold of perception can also be removed. The coefficients of the simplified data are then compressed using Huffman encoding, which is rendered more effective by the greater simplicity of data distribution.

AAC includes an extra component to improve the transient response: quantisation error introduced by the bit-rate reduction process is fed back into the system via a noise-shaping filter to improve the result. This is temporal noise shaping, or TNS.

These processes form the basis of every audio bit-rate reduction system in use today. However, there is plenty of room for improvement. Assuming that Santa’s elves are handy at numerical methods and DSP algorithms, what codec should we wish for next Christmas? Even greater economies and better perceived quality would be obtained by informing our codec using the most recent developments in audio engineering, exploiting more sophisticated psychoacoustic models, auditory scene decomposition, dereverberation, and pitch tracking.

Report by Ben Supper

‘Lord’s Cricket Ground: Voice Alarm’

Title: Lord’s Cricket Ground: Voice Alarm
Location: Royal Academy of Engineering, London
Description: Lecture by Roland Hemming of RH Consulting
Start Time: 18:30 for 19:00
Date: Thursday 14th October 2010

A recording of the lecture is available here (13MB mp3)

Lecture Report

With a portfolio encompassing the Millennium Dome, Ascot Racecourse, Twickenham, and St. Pancras International station, Roland Hemming most recently turned his sound system design and project management expertise to Lord’s Cricket Ground, which is currently undergoing a multi-year renovation project.

Three interrelated aspects of the new system were covered in Roland’s talk. The first of these concerned the correct approach to standards compliance. The second covered the more general technical challenges involved in fitting out a large and complex sports venue, where the public address system is used routinely to entertain as well as inform. Thus the system must at once be versatile enough to cope with any conceivable situation, simple enough for a novice to use, and robust enough to withstand partial failures and still carry on working in an emergency. The third aspect is the diplomatic side of the work, and the importance of communication and commercial skills in managing a large installation project.

Voice alarm systems for public address are covered by a number of standards. These specify such things as the speech intelligibility of the system and the need for distributed redundancy of circuits and amplification to avoid any single point of failure. They also stipulate requirements for fire resistance, remote fault monitoring, and operability in the event of a power failure. Many of these needs are specified fairly loosely, providing scope for interpretation. Consequently, much of the skill in working with voice alarm systems is in knowing how much redundancy and fireproofing to build into the system, and where to put it. The latest of these standards is EN54, which will be enforced from 2011, and introduces product testing to the mix. Further to this, there are other standards that cover specific installations, including BS7827, which concerns sound systems for sports venues.

The best practice for voice alarm installation often diverges from that found in professional audio engineering, as the emphasis is on failsafe design that entrusts dynamic control of loudspeaker amplification and routing to pre-programmed paths that are self-regulating, without the need for human intervention. Redundancy is harder to achieve over data networks: unlike audio, Ethernet must be singly-connected and wired point-to-point, and cannot normally be run in loops, but this can be done in stadia using spanning tree technology.

Certain special cases are exempt from EN54, including self-powered loudspeakers and loudspeakers for ‘special’ applications (those, for example, with particular directivity characteristics). Also exempt are ‘kit systems’, made from individual elements of non-Voice Alarm equipment that together comprise the system: the discretion is then left to the project manager to justify the safety of the resulting system, and the local safety authority to approve it.

The Lord’s system comprises eight distributed digital rack rooms. This distribution not only assists redundancy, but also keeps down the length of cable runs. The system is designed to be truly expandable, which can mean anything from moving the walls around in a hospitality box to razing and rebuilding an entire stand. Four control stations provide the opportunity for live announcements, controlled by a touch-screen user interface that allows these to be directed appropriately. Although the venue will eventually divide into 165 sound zones, this is greatly simplified for normal operation so that the system can be employed by an announcer with just a few minutes of training. Hybrid transformers allow audio to be injected into individual areas to provide localised input where this is required. The system features Dante audio networking technology providing audio over IP, ASL Vipedia audio processors, Lab.gruppen amplifiers, and DAS loudspeakers. A local analogue loop provides emergency backup in each rack room, and the use of many small speakers allows every spectator to be reached without disturbing the neighbours.

One of the most difficult elements of any installation, not least one concerning such a historic venue in such an exclusive neighbourhood, is the balancing of the many vested interests. The local council, residents, the various committees, the operators themselves, and the safety team must all be satisfied with the system’s specifications and performance. Discussions prior to installation take years, and can continue after the major part of the installation is complete. The installation itself, being by far the most expensive stage of the operation, is often over in a matter of weeks and must therefore be planned with precision. The potential for catastrophe makes these large projects an exercise in risk management. Without satisfaction, there can be no compliance; without compliance, there can be no venue; without a venue, there can be no business.

Report by Ben Supper

For those who wish to know more about voice alarm, Roland has co-written a book on the subject that is available from avitas-global.com.

‘Synchronising the synchronisation standards’

Title: ‘Synchronising the synchronisation standards’
Location: Royal Academy of Engineering, London
Description: Lecture by John Emmett
Start Time: 19:00 for 19:30
Date: Tuesday 16th February, 2010

Download recording of lecture here (20MB MP3)

Lecture Report

Dr Emmett opened the lecture by summarising the audio-video synchronisation challenges encountered when putting together a television programme. It is better to correct synchronisation problems as they occur in the broadcasting chain than to attempt to correct them all immediately prior to transmission, as the former practice greatly simplifies video editing. With this achieved, attention turns to keeping audio synchronised during broadcast transmission and reception. This is particularly important for human speech: humans are exquisitely sensitive to lip sync. We develop this facility almost as soon as we can see, and the psychological need for lip movement to be attached to speech is so great that it extends even to characters without mouths. Each Dalek needs a light that pulses in sync with its speech, to bond the dialogue to that character.

A number of techniques were employed in the days of purely analogue transmission to ensure that audio and video were kept in sync. It was not unusual for a programme’s video signal to be relayed via satellite and its audio via telephone, and a compensating audio delay had to be inserted to offset uplink and downlink delays. An example of this was used in ITN in the early 1980s. An in-band masked ‘bong’ was timed to follow any video cut in the programme by exactly one second. It was possible then for engineers to adjust the audio delay manually to maintain sync, even where this varied during the programme. Similar timestamps must still be maintained in digital systems, although this facility is now generally accommodated within the channel code.
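
The arithmetic behind the ‘bong’ scheme can be sketched in a few lines. This is only an illustration of the principle described above, not a description of ITN’s equipment: the function name, the timestamps, and the idea of reading both times from one clock at the receiving end are assumptions made for the example.

```python
def audio_delay_correction(cut_time_s, bong_time_s, nominal_gap_s=1.0):
    """Extra delay (in seconds) to add to the audio path.

    The bong should arrive nominal_gap_s after the video cut. If it arrives
    sooner, the audio is running ahead of the picture and needs more delay;
    a negative result means delay should be taken out instead.
    """
    observed_gap_s = bong_time_s - cut_time_s
    return nominal_gap_s - observed_gap_s

# Hypothetical readings: the cut is seen at 12.5 s and the bong heard at 13.25 s.
correction = audio_delay_correction(12.5, 13.25)
print(f"add {correction:.2f} s of audio delay")
```

Because every cut carries its own marker, the correction can be recalculated whenever the path delays drift during the programme.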

It is increasingly common for audio and video to be streamed by piggy-backing on a packet-based protocol and transmitting via existing IT infrastructure. This works as long as there is sufficient bandwidth. Otherwise, heavy-duty interleaving is required to compensate for dropped packets, which increases transmission delay, and the chances of sync loss and system failure. As with real piggy-backs, the heavier the payload, the slower the system, and the greater the likelihood of collapse.
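
A minimal sketch of why interleaving helps with dropped packets, and why it costs delay, is given below. It uses a simple block interleaver; the lecture did not specify any particular scheme, so the grid size and function names here are illustrative assumptions.

```python
def interleave(symbols, rows, cols):
    """Write into a rows x cols grid row by row, read it out column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): the same operation with the grid transposed."""
    return interleave(symbols, cols, rows)

rows, cols = 3, 4
block = list(range(rows * cols))          # packets 0..11 in programme order
sent = interleave(block, rows, cols)

# Simulate a burst that drops three consecutive packets in transit.
received = [None if 3 <= i <= 5 else p for i, p in enumerate(sent)]
restored = deinterleave(received, rows, cols)
lost = [i for i, p in enumerate(restored) if p is None]

print(sent)   # transmission order
print(lost)   # positions of the gaps after de-interleaving
```

After de-interleaving, the burst is broken into isolated single losses spaced `cols` packets apart, which are far easier to conceal or correct. The price is that a full block of `rows * cols` packets has to be buffered at each end, which is the extra transmission delay referred to above.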

Now consider what the word ‘standard’ means: this is where problems are compounded. The word has two distinct meanings. It can refer to an outgoing or obsolescent paradigm (such as ‘standard definition’), or to standard-bearing in its original sense, at the technological vanguard. We frequently encounter problems when it is necessary to choose between a plenitude of competing standards of different ages, some of which have yet to be adopted, and many of which should not. Standards are necessary only when the current best practice is unclear, but there are usually clues about which standards are ‘good’. A good standard must be fit for purpose, timely, and robustly defined: if the plug fits, the signal should work. There are caveats, too: not all standards are intended to be friendly (DRM systems are such an example), and even de facto standards undergo sudden and complete changes. Finally, although a standard needs to be owned by a company or committee to avoid obsolescence, it should contain no element for revenue generation.

The emergence of competing delivery standards in broadcasting has brought the synchronisation problem into the home. Many digital multichannel audio transport layers can be conveyed over S/PDIF channel code using IEC 61937 (Dolby Digital; DTS; linear PCM), and a home cinema amplifier may typically accommodate sixty connectors and a dozen multichannel formats. As for the picture, high-definition video formats such as 720p and 1080p co-exist with conventional 625-line 4:3 and 16:9 broadcasts. There are a number of video interconnection formats with different costs, advantages, and limitations. Four digital video broadcasting standards are in use in different regions throughout the world, encompassing several standard frame rates. Meanwhile, individual consumer products are designed for world markets, and are simultaneously compatible with many of these standards. In fact, UK broadcasters have been unable to rely on viewers possessing ‘standard’ receiving equipment since 625-line broadcasts began in the 1960s.

Now that it can take half a day for a professional engineer to set up a domestic television, it is quite likely that a set-top box in a typical home, set up without specialist expertise, may be configured to down-convert 720p video to standard definition, and transmit this signal over RGB SCART to a plasma television, which will then up-convert it to 1080p. Audio-video synchronisation is then at the mercy of equipment manufacturers.

Dr Emmett summarised his lecture with advice from Antoine de Saint-Exupéry: ‘No design is finished until the last superfluous item has been removed.’

Report by Ben Supper

Lecture Report, January 2009: Loudness

Thomas Lund, TC Electronic A/S

Thomas Lund’s background includes work as a recording engineer and musician and the study of medicine – an unusual combination which may contribute to his understanding of loudness perception. Thomas has also been involved in the design of many of TC Electronic’s products, has contributed to various standardisation groups on the subject of loudness, and has authored many papers presented to the AES and other bodies.

Traditional Loudness Measurement

Recent years have seen the ‘level’ of pop/rock music, as delivered by CD, steadily increase. Thomas cited the simple way that audio level has been measured as a partial cause. Historically, audio level has often been measured by peak programme meters, and commonly used definitions of overload have been very simplistic methods such as peak-level-counting (eg three consecutive full-scale samples equals overload). Such simple techniques of measuring (and by association, limiting) the level may have worked well when systems consisted of a microphone, a preamp and an ADC. With digital processing techniques, however, numerous methods have been devised to increase the apparent loudness of material delivered on CD while ‘working around’ the peak-level limitations, apparently (we must assume) to some perceived commercial benefit to the record industry.

Many hold the opinion that such ‘hot mastering’ techniques are severely detrimental to the overall quality of modern music releases. Thomas calls this drive for increased level whatever the cost, coupled with a high willingness of broadcasters and consumers to use large amounts of data compression (for archiving, broadcast and replay), a ‘war on music’.

The Problems of Incorrect Levels

With such hot-mastering techniques, it is trivial to generate digital signals that exceed 0dBFS in the analogue output, after the assumed reconstruction or up-sampling filters. The greater-than-0dB peak levels can cause serious problems in the reproduction chain where some processes have been implemented with the assumption that 0dBFS is the largest signal they should expect.

Thomas offered demonstrations based on a commercially available ‘professional-grade’ sample rate converter, subtracting output from input. In this experiment the output should have been silent but differences could be heard clearly, manifested as ticks and signal-related noise. Other potential problem areas, according to Thomas, include limiting in mix-busses and codecs such as MPEG-1 Layer 3. These processes can all exhibit similar problems when faced with very high level inputs, a phenomenon Thomas further demonstrated. The codec problems can depend on the implementation of the codec as well as the codec itself.

Because of these issues, Thomas recommends normalising to -3dBFS, not to 0dBFS, in digital mixing and recording situations. He pointed out that the final 3dB increase can be done in the mastering room without any real quality loss, given that most recordings use 24 bits.

Better Methods of Loudness/Level Measurement

Thomas gave a functional summary of various improved methods of measuring loudness level and showed relative results based on ITU-R BS.1770. A simple improvement is the over-sampling peak programme meter which offers a more accurate representation of the true peak level.
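
To illustrate why an over-sampling meter reads higher than a sample-peak meter, the sketch below builds a tone whose samples just reach full scale while the waveform between them goes higher. It is not the ITU-R BS.1770 reference meter: the windowed-sinc interpolator, the 4x ratio, the kernel length and the test tone are all assumptions chosen to keep the example short.

```python
import math

def sample_peak_db(x):
    """Peak of the stored sample values, in dB relative to full scale."""
    return 20 * math.log10(max(abs(v) for v in x))

def true_peak_db(x, oversample=4, half_width=32):
    """Estimate the inter-sample peak by Hann-windowed sinc interpolation."""
    peak = 0.0
    # Skip the edges, where the interpolation kernel would run off the signal.
    for n in range(half_width, len(x) - half_width):
        for k in range(oversample):
            t = n + k / oversample               # fractional sample position
            acc = 0.0
            for m in range(n - half_width + 1, n + half_width + 1):
                d = t - m
                sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
                window = 0.5 * (1 + math.cos(math.pi * d / half_width))
                acc += x[m] * sinc * window
            peak = max(peak, abs(acc))
    return 20 * math.log10(peak)

# A tone at a quarter of the sample rate, phased so that every sample falls
# 45 degrees away from the crest, then scaled so the samples reach full scale.
raw = [math.sin(2 * math.pi * 0.25 * n + math.pi / 4) for n in range(256)]
scale = 1 / max(abs(v) for v in raw)
tone = [v * scale for v in raw]

print(f"sample peak: {sample_peak_db(tone):+.2f} dBFS")
print(f"true peak:   {true_peak_db(tone):+.2f} dBFS")
```

For this tone the samples sit at 0dBFS, yet the underlying waveform is larger by a factor of the square root of two, so the interpolated reading comes out about 3dB higher. A meter that only counts full-scale samples would miss this entirely.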

Thomas also presented a loudness meter available from TC Electronic as a plugin for Pro Tools as ‘LM5 Loudness Radar Meter’.

TC LM5 Loudness Radar Meter

This meter includes representations described as ‘Loudness Units’ (LU) or LkFS, ‘Consistency’ and ‘Center of Gravity’. Center of Gravity indicates the overall loudness of the programme material or music track. Consistency indicates the ‘intrinsic loudness changes’ present in the track, with 0 representing a steady-state signal (one which has no loudness changes at all, eg a sine-wave) and progressively more negative numbers indicating reducing Consistency. Low Consistency scores such as -4 or lower indicate that the material may have a large dynamic range.

Conclusions

In conclusion, Thomas offered the following recommendations:

  • Stop Counting Samples: There are better methods of measuring peak levels than counting the number of consecutive full-scale samples.
  • True Peak Level: Set maximum peak level at -1dBFS using a true peak meter equipped with oversampling capability.
  • Dialog Level: Suggested level of dialog is -26 to -22 LkFS.
  • Music: Suggested level of music is -20 to -20 LkFS.
  • Avoid peak-level normalisation.

If audio level is anchored only to peak level or only to dialogue, both commonly used techniques, loudness chaos is likely to ensue with extreme level jumps between programme, commercials and other home sources.

The tools and understanding exist to provide well-balanced loudness levels between different programmes and material, providing the end-listener a more pleasant viewing/listening experience and the potential for reduced distortion and overall quality improvement. Thomas outlined the problems and offered tools and methods for solving them.

Report by Nathan Bentall (edited by Keith Howard)

‘Reality is Not a Recording / A Recording is Not Reality’

Title: ‘Reality is Not a Recording / A Recording is Not Reality’
Location: Royal Academy of Engineering
Description: Jim Anderson of Jim Anderson Sounds
Start Time: 19:00 for 19:00
Date: May 12th 2009

Abstract

The former New York Times film critic Vincent Canby wrote: “all of us have different thresholds at which we suspend disbelief, and then gladly follow fictions to conclusions that we find logical.” Any recording is a ‘fiction’, a falsity, even in its most pure form. It is the responsibility, if not the duty, of the recording engineer, and producer, to create a universe so compelling and transparent that the listener isn’t aware of any manipulation. Using basic recording techniques, and standard manipulation of audio, a recording is made, giving the listener an experience that is not merely logical but better than reality. How does this occur? What techniques can be applied? How does an engineer create a convincing loudspeaker illusion that a listener will perceive as a plausible reality?

Meeting Report

Jim Anderson: Professor of Recorded Music, Clive Davis Department of Recorded Music, New York University

Jim started his lecture with the attention-grabbing statement that audio recording is trickery, a devious deception – then expanded the point to explain that the aim is to make you, the listener, believe you’re hearing the truth: but actually it’s sleight of hand. He set about illustrating that by playing back a diverse range of audio recordings over the course of the lecture and discussing them, casting some light onto the techniques and tricks he’d used to exercise that devious deception: and without exception, create musical listening experiences of quite exceptional quality.

Jim started by playing the commercial release of J. J. Johnson’s “The Brass Orchestra” – it was extremely punchy, dynamic, and live-sounding. He then played another track: while obviously the same piece, and possibly the very same performance, it had much less impact, the drums were much quieter, and the soloist was clearly off-mic. This was from a simple stereo pair of mics set up to capture the “air” of the room, and illustrated the striking difference between the somewhat artificial, yet highly appealing experience created by the commercial release, and the fly-on-the-wall experience of the performance – which is arguably the “real” experience. Jim then discussed some of the details of this performance and the techniques he’d used to create the “false”, yet plausible and appealing final product: it was captured live-performance-style in a single take with no overdubbing; microphone selection was key in realising tonal and dynamic differences within the group; the studio had a “good” acoustic for performance, but this was enhanced with artificial concert-hall reverb. The artist wanted to mix first without the solos, in order to get all the internal balances right, then add the solos later – so the whole thing was mixed twice.

Jim expanded on the microphone selection points by playing “High Noon – The Jazz Soul of Frankie Laine” featuring Gary Smulyan, baritone sax player. Jim used ribbon microphones, with their smooth, easy sound, on all the nine-piece backing group, but used condensers to bring the baritone sax and French horn into sharp dynamic focus. This allows the backing to be up-front in the mix while keeping the sax solo sounding appropriately prominent.

To illustrate another interesting technique, Jim played drummer Marvin “Smitty” Smith tracking “The Road Less Travelled”: Marvin had requested “more depth, more breadth” in the kick drum. Jim met this requirement by using a Beyer Opus 51, a boundary effect mic designed for piano, under a sheet of wood to isolate it from the rest of the kit. He used two Opus 51s and an M88 in the middle, to create a mid/side array. In stereo, it creates a perfect image of the kit; in mono, it collapses and provides a remarkably leakage-free kick drum.
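
The mono behaviour follows from the usual mid/side arithmetic, sketched below. This assumes the standard sum-and-difference matrix; how the three microphones in Jim’s array were actually combined was not detailed, and the sample values are invented for illustration.

```python
def ms_decode(mid, side):
    """Sum-and-difference matrix: returns (left, right) sample lists."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def mono_fold(left, right):
    """Mono fold-down as the average of the two channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]

mid = [0.5, -0.25, 0.75, 0.0]      # hypothetical mid (kick drum) samples
side = [0.25, 0.5, -0.5, 0.125]    # hypothetical side (rest of kit) samples

left, right = ms_decode(mid, side)
mono = mono_fold(left, right)

print(mono == mid)   # the side signal cancels completely in mono
```

Because the side signal appears with opposite polarity in the two channels, it vanishes when they are summed, leaving only what the mid microphone picked up.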

Among other recordings Jim discussed, he played a track by Patricia Barber, recorded in Chicago. It had an extraordinarily huge, deep, broad-sounding kick drum and very prominent and snappy drums in general, while the female vocal was up-front yet full in the low-mids. He then played another recording, with the same trumpeter in the same room, yet smoother-sounding because it used a tube mic rather than a ribbon. The kick drum is only 18”, but with good tuning and an M/S mic it gives the huge depth and finish.

All recordings played so far had been tracked straight to digital: Jim’s next recording was a modern attempt to recreate the classic 1970s Blue Note sound, for an album called “Hubsound – The Music of Freddie Hubbard”. In contrast to direct-to-digital tracking, this was done using 16-track 2” tape at 15 inches-per-second with no noise reduction. It’s impossible to make lots of overdubs because 16 tracks is very limited. In this way, it emulates not only the sound, but also the practical constraints and therefore the recording techniques, of the Blue Note vintage.

Next up, we heard Gonzalo Rubalcaba performing “Here’s that Rainy Day” in Criteria Studio A in LA: solo piano in a large live rectangular room. Mics were a U87 above, a DPA 4007 close, and a DPA 4006 a little further back; beyond that was a pair of U87s in a modified Polyhymnia configuration, so the room sound was also captured in case a surround mix was subsequently needed.

He then played for us Bebo Valdes, a live recording done in a recording truck at the Village Vanguard nightclub. Mics were just a Sanken CUW180 with a pair of ratchet movable capsules, here set up for X/Y. Mic pres with A-D were on stage, plus an audience microphone, and optical links connected the A/Ds to the truck. The recording setup was triple-redundant with Tascam DA98s, but the primary recorder was ProTools HD. Jim created a rough mix on a Yamaha DM2000, for the performers to check each performance immediately afterwards. Mics were a combination of omnis and cardioids on piano, the Sanken X/Y on bass, and omnis on audience. The worth of the latter was shown when the audience started singing along – precise capture of the audience really added atmosphere to the final product.

Jim concluded by playing us his first ever jazz recording: Ella Fitzgerald at the New Orleans Jazz and Heritage Festival in 1977. Ella knew Stevie Wonder was in the audience, so called him up to join in! The encore was the duet “You Are The Sunshine Of My Life”. It was a pretty magical moment to capture for a first jazz recording, particularly as the tape ran out immediately after the end of the song! A close-run thing.

Jim wrapped up this interesting talk – and listening session – by maintaining he’s the liar! Thanks to PMC and Arcam for the superlative audio reproduction system kindly lent to us for the evening.

Meeting report by Michael Page

Lecture Report, November 2008: The Engineering Art Behind the Beolab 5 Loudspeaker

Gert Munch, Bang & Olufsen

During this lecture Gert Munch will demonstrate how the development of several key technologies, including the development of “acoustical lenses,” led to the design and implementation of the BeoLab 5 loudspeakers.

Gert is based at the Acoustics Research division of Bang & Olufsen, Denmark; he is a specialist in electro-acoustics and has worked at B&O for 30 years. In that time he contributed to the development and design of numerous speaker models, including the subject of this evening’s lecture, the BeoLab 5.

The aims for the BeoLab 5 design included:

  • to make the best possible loudspeakers with the most convincing total sound experience
  • to give best possible experience wherever you sit, wherever the loudspeakers are placed
  • to reproduce the whole audible spectrum and dynamic range
  • to make a loudspeaker that didn’t sound like one!

In order to realise the ambition, the following requirements were specified:

  • Adaptive bass control including a moving microphone measurement system
  • Active loudspeaker design using high power ICE power amps
  • Thermal compression compensation (to remove temperature dependency of response)
  • Advanced thermal protection including thermal modelling and monitoring
  • Precise mechanical control and fitting for consistency
  • DSP Processing for response correction and manufacturing variation control

A little history: In the mid-1980s, B&O made the ‘Penta’ loudspeaker, which embodied the early attempts at B&O to take control of speaker directivity. It had a tapered design, with centralised tweeters, to minimise effects of the floor and ceiling reflections, a factor recognised by B&O engineers as critical to the sound in a real room.

To further understand these reflection issues, the Archimedes project was established (running from 1988 to 1992), and carried out by B&O in conjunction with the Technical University of Denmark and KEF (the UK-based loudspeaker manufacturer). This work led to many ideas about improving loudspeakers and a new, improved unit was designed that, unfortunately, never made it to market.

BeoLab 5 evolution: Also around this time, Sausalito Audio Works was pioneering loudspeaker design incorporating what it dubbed ‘Acoustic Lens Technology’. Despite some initial scepticism, B&O engineers concluded that the speakers from Sausalito actually sounded good.

After several iterations at B&O of the initial Sausalito design, the BeoLab 5 was the evolutionary result. Its distinctive shape (some liken it to a Dalek or a pylon) makes it easily recognisable – and it weighs in at a hefty 61kg!

The Acoustic Lens (perhaps a ‘lens’ in the sense that a curved mirror in a reflector telescope can be a lens) is a mechanical structure that consists of a specially shaped reflector mounted atop an upward-facing driver, the special shape being a quarter of an ellipsoid.

An ellipse has two focal points; the drive unit is located at the first so that, by virtue of the shape, all sound passes through the second (assuming a ray-tracing model and an infinitesimal source).
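
The focal property can be checked numerically with a simple ray model. The sketch below works in a two-dimensional cross-section with made-up dimensions; it is a geometric illustration only and says nothing about the real reflector’s size or the diffraction effects discussed next.

```python
import math

a, b = 2.0, 1.2                      # hypothetical semi-axes of the ellipse
c = math.sqrt(a * a - b * b)         # distance from the centre to each focus
f1, f2 = (-c, 0.0), (c, 0.0)

def reflect_from_f1(theta):
    """Trace a ray from f1 to the ellipse point at parameter theta and reflect it.

    Returns (path length f1 -> P -> f2, alignment), where alignment is the
    cosine of the angle between the reflected ray and the direction from P to
    f2. An alignment of 1 means the reflected ray heads straight for f2.
    """
    px, py = a * math.cos(theta), b * math.sin(theta)
    dx, dy = px - f1[0], py - f1[1]              # incoming ray, f1 -> P
    d1 = math.hypot(dx, dy)
    dx, dy = dx / d1, dy / d1
    nx, ny = px / a**2, py / b**2                # surface normal (gradient)
    nn = math.hypot(nx, ny)
    nx, ny = nx / nn, ny / nn
    dot = dx * nx + dy * ny
    rx, ry = dx - 2 * dot * nx, dy - 2 * dot * ny   # mirror reflection
    ex, ey = f2[0] - px, f2[1] - py              # direction P -> f2
    d2 = math.hypot(ex, ey)
    alignment = (rx * ex + ry * ey) / d2
    return d1 + d2, alignment

for degrees in (20, 60, 90, 135):
    length, alignment = reflect_from_f1(math.radians(degrees))
    print(f"{degrees:3d} deg  path = {length:.6f}  alignment = {alignment:.6f}")
```

Every ray arrives at the second focus having travelled the same distance, twice the semi-major axis, so in this idealised model the sound appears to come from a point at the second focus.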

Prior to the building of the speaker, some ray-tracing based simulations were attempted. This simulation technique was later abandoned because such a basic model lacks the ability to predict diffraction effects, a critical factor in loudspeaker directivity.

An audience member asks, ‘Why not place the speaker at the second focal point and do away with the lens?’ Gert’s answer is that such an approach would not provide any control over the radiation pattern – and it is this radiation pattern control that the ‘acoustic lens’ technology seeks to master.

Later modelling attempts included Boundary Element and Finite Element Analysis.

An animated picture is shown to demonstrate a radiation pattern simulation. The key point is that the response looks the same at a wide range of angles in the horizontal plane. Comparing the two-dimensional finite element model with the three-dimensional boundary element model, it is noted that, as presented, they look very similar, providing further confidence in their validity and the concept in general.

Gert points out that, at least initially, the ideal radiation pattern of this speaker appears to be similar to that of a dipole. However, the problem with traditional dipole loudspeakers is that they must be placed at least 1m away from the wall behind to achieve good performance, a restriction which can prove inconvenient in real-world situations, usually due to restrictions imposed by one’s cohabitee.

In the BeoLab 5, B&O have aimed to make a design with a forward directivity similar to that of a dipole but, due to the attenuated rear-response, one which can be placed directly against a wall.

Taking the power average from nine measurements made at random room positions yields some kind of loudspeaker power response. Other measurement techniques have been tried, but this power averaging technique, Gert reports, shows better correlation with subjective testing.
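
Power averaging means averaging the squared magnitudes rather than the decibel values. A minimal sketch follows, using three invented measurements of two frequency bins instead of nine real ones; the function name and data are assumptions for illustration.

```python
import math

def power_average_db(responses_db):
    """Average several magnitude responses (dB per frequency bin) on a power basis."""
    n_bins = len(responses_db[0])
    averaged = []
    for i in range(n_bins):
        # Convert each reading to relative power, average, convert back to dB.
        mean_power = sum(10 ** (r[i] / 10) for r in responses_db) / len(responses_db)
        averaged.append(10 * math.log10(mean_power))
    return averaged

# Hypothetical readings at three microphone positions, two frequency bins each.
measurements = [
    [0.0, -10.0],
    [0.0, 0.0],
    [0.0, -20.0],
]

averaged = power_average_db(measurements)
print([round(v, 2) for v in averaged])
```

In the second bin a plain average of the decibel figures would give -10dB, whereas the power average is a little over -4dB: deep notches at individual positions carry little energy and so have little influence on the result.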

Efficiency of loudspeakers is generally low and the BeoLab 5 is no exception. Free-field, 200W of electrical power input might yield 1W of acoustic power. The BeoLab 5 contains amplifiers capable of supplying around 2.5kW of power!
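
The figures quoted can be turned into an efficiency with a couple of lines of arithmetic. The last step assumes, purely for illustration, that the same efficiency would hold at the full amplifier rating, which a real loudspeaker would not guarantee.

```python
import math

electrical_w = 200.0     # electrical input quoted in the lecture
acoustic_w = 1.0         # acoustic output quoted in the lecture

efficiency = acoustic_w / electrical_w
efficiency_db = 10 * math.log10(efficiency)

amplifier_w = 2500.0     # approximate total amplifier power quoted
# Assumption: efficiency stays constant up to full power.
max_acoustic_w = amplifier_w * efficiency

print(f"efficiency: {efficiency:.1%} ({efficiency_db:.1f} dB)")
print(f"acoustic power at full rating: about {max_acoustic_w:.1f} W")
```

An efficiency of half a percent corresponds to roughly 23dB between electrical input and acoustic output, which is why kilowatts of amplification translate into only a few watts of sound.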

Gert notes there can be huge changes in power response at around 100Hz for differing speaker placement, so a filter is introduced to equalise the power response for the speaker’s position in the room. A normal tone control can never compensate for this kind of problem; much more precise control is provided in the BeoLab 5 using digital signal processing.

The BeoLab 5 includes a formidable array of signal processing. The crossovers are performed digitally, and much more besides.

During factory test, the response of each driver is automatically equalised to compensate for manufacturing tolerances. Overall equalisation is also applied to achieve the target frequency response. This production testing employs a total of six microphones: four at the front (one close to each of the drivers) and two at the rear. A reference speaker provides the target for the equalisation process. Each production speaker is adjusted to match the frequency response of this reference unit with a target error of less than 0.5dB.

Temperature and air pressure can alter the measurements significantly, so these are monitored during this phase.

Using an in-built, motorised microphone which slides out from under the speaker, automatic correction of low-frequency response up to around 300Hz can be invoked by the user to reduce the effects of the room in which the speakers are placed. Gert points out that this correction is not a modal correction – it’s more like a general equalisation, with the filter response being smoothed during the measurement process.

Interestingly, the target response for this ‘auto-correction’ system is not, as one might expect, a flat response, but rather a response that has been determined empirically through critical listening.

The thermal monitoring uses a combined technique of feed-forward modelling in conjunction with average temperature measurement of the driver mechanical assembly. Each driver also has thermal modelling, arranged such that should, on average, too much power be applied to any driver, progressive attenuation is applied to its output (and also to outputs to all drivers of higher frequency to maintain a consistent tonal balance).
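
The general idea of model-based protection can be sketched as follows. This is not B&O’s algorithm: the first-order thermal model, every constant in it, and the function names are hypothetical, and the measured-temperature half of the scheme is left out.

```python
def step_temperature(temp_c, power_w, dt_s, ambient_c=25.0,
                     r_th_c_per_w=2.0, tau_s=30.0):
    """Advance a first-order thermal model by one time step (forward Euler)."""
    target_c = ambient_c + power_w * r_th_c_per_w    # steady-state temperature
    return temp_c + (dt_s / tau_s) * (target_c - temp_c)

def attenuation_db(temp_c, limit_c=120.0, db_per_c=0.5):
    """Progressive attenuation once the modelled temperature passes a limit."""
    return max(0.0, (temp_c - limit_c) * db_per_c)

def band_attenuations(temps_c):
    """temps_c is ordered from the lowest- to the highest-frequency driver.

    Each driver receives the largest attenuation required by itself or by any
    lower-frequency driver, so the balance above a hot driver is preserved.
    """
    result, running = [], 0.0
    for t in temps_c:
        running = max(running, attenuation_db(t))
        result.append(running)
    return result

# Feed a hypothetical 50 W average into one driver for 100 seconds.
temp = 25.0
for _ in range(1000):
    temp = step_temperature(temp, power_w=50.0, dt_s=0.1)
print(f"modelled temperature after 100 s: {temp:.1f} C")

# Four drivers, lowest frequency first; only the second is over the limit.
print(band_attenuations([100.0, 130.0, 90.0, 110.0]))
```

In the last line only the second driver is too hot, but it and both drivers above it are turned down by the same amount, while the lowest-frequency driver is left alone.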

A “party test” is also carried out which runs the speakers at full-power for three days!

The BeoLab 5 is a no-compromise design that might at first appear to be at the more esoteric end of hi-fi. But many thousands of units have been sold, proving that many consumers still aspire to achieve great audio reproduction and are prepared to buy-in to new technology to achieve it.

It was fascinating to hear about the design philosophy and gain some insight into the processes. On behalf of all present I’d like to extend thanks both to Gert for the presentation, and to B&O for making it possible.

Report by Nathan Bentall

Lecture, December 2008: An Interview with Bob Stuart of Meridian Audio

Conducted by Keith Howard

Bob Stuart has been a major figure in the British audio industry for over 30 years. Best known as Chairman and co-founder, with Allen Boothroyd, of what is today Meridian Audio Ltd, he has done much more than steer the company through challenging times to its current high-profile position manufacturing some of the most sophisticated audio equipment available. A pioneer of active and then DSP-equipped loudspeakers, he was quick to recognise the potential of CD and, as part of the ARA, to push for a version of DVD dedicated to high-resolution multichannel audio. Meridian’s own lossless compression algorithm, MLP, was developed in anticipation of this and selected by the DVD Forum for DVD-Audio in a technology shoot-out against stern competition. In expanded form it remains the basis of the Dolby TrueHD lossless compression scheme used in Blu-ray Disc. With a long-standing interest in psychoacoustics, which he studied alongside electronic engineering at Birmingham University, Bob is one of very few creators of high-quality audio equipment to have explored the fundamentals of sound perception and generated computer models of human hearing to help guide the design process. In recent years, in collaboration with Peter Craven, he has investigated the effects of digital anti-aliasing and reconstruction filters, one intriguing result being that Meridian’s latest flagship CD player – the 808.2 Signature Reference – uses minimum-phase rather than linear-phase output filtering.

These subjects and many others are covered in this interview, with Bob presenting supporting material to clarify the issues.

An Interview with Bob Stuart (audio, 23MB)