25th AES UK Conference: Programme with Abstracts
25th AES UK Conference: Spatial Audio in Today’s 3D World
in association with the
4th International Symposium on Ambisonics and Spherical Acoustics
University of York
Heslington, York, YO10 5DD, UK
Sunday 25th March – Tuesday 27th March 2012
PROGRAMME WITH ABSTRACTS (Updated 6th March)
Timings and running order may be subject to change.
Sunday 25th March
16:00 – Registration opens
17:00 – 18:00: Optional pre-conference Ambisonics ‘primer’
This lecture is intended as a tutorial on some basic and more advanced mathematical concepts and tools that are used (or may be used) when dealing with Ambisonics and sound field representation.
The talk covers some elementary concepts like the Fourier series, both in its classical version (involving sines and cosines) and the spherical harmonic expansion. The concept of spaces of functions is then introduced and the analogy with the more familiar Cartesian spaces is discussed, including the representation of a function as a linear combination of the elements of an orthogonal basis. Finally, the more advanced topic of integral representation of sound fields is presented and the integral operators known as the Herglotz wave function and the single layer potential are introduced. Their relation with Ambisonics and spherical acoustics is explained.
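As a brief illustration of the objects named in this description, the expansions and integral operators can be written as follows. The notation is an assumption (orthonormal spherical harmonics, time convention giving the free-field Green function shown); the lecture's own conventions may differ.

```latex
% Classical Fourier series of a 2\pi-periodic function
f(\theta) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos n\theta + b_n \sin n\theta \right)

% Spherical harmonic expansion of a function on the unit sphere
% (coefficient formula assumes orthonormal Y_n^m)
f(\theta,\phi) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} f_{nm}\, Y_n^m(\theta,\phi),
\qquad
f_{nm} = \int_{S^2} f(\theta,\phi)\, \overline{Y_n^m(\theta,\phi)}\, d\Omega

% Herglotz wave function with density \varphi on the unit sphere
p(\mathbf{x}) = \int_{S^2} e^{\,i k\, \mathbf{x}\cdot\hat{\mathbf{y}}}\, \varphi(\hat{\mathbf{y}})\, dS(\hat{\mathbf{y}})

% Single layer potential with density \varphi on a surface \partial\Omega
(S\varphi)(\mathbf{x}) = \int_{\partial\Omega} G(\mathbf{x},\mathbf{y})\, \varphi(\mathbf{y})\, dS(\mathbf{y}),
\qquad
G(\mathbf{x},\mathbf{y}) = \frac{e^{\,i k |\mathbf{x}-\mathbf{y}|}}{4\pi |\mathbf{x}-\mathbf{y}|}
```

In both expansions the function is written as a linear combination of orthogonal basis elements, which is the analogy with Cartesian coordinates that the talk draws.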
Monday 26th March
08:45 – Registration opens
09:30 – Welcome
09:40 – Session A: Microphones
Paper 1: Metrics for performance assessment of mixed-order Ambisonics spherical microphone arrays
Sylvain Favrot & Marton Marschall, Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark
Mixed-order Ambisonics (MOA) combines horizontal 2D higher-order Ambisonics (HOA) with lower order periphonic (3D) Ambisonics. Using MOA for spherical microphone arrays permits, compared to HOA, the improvement of horizontal source directivity while retaining some directivity for elevated sources. Because MOA does not capture all source directions in the same way, performance measures should be split into horizontal and vertical characteristics. (i) Reproduced power and (ii) spatial aliasing error as well as (iii) white noise gain and (iv) directivity index are averaged over horizontal sources and elevated sources.
Paper 2: Investigations on cylindrical microphone arrays
Fabio Kaiser, Student, University of Music and Performing Arts, Graz; Hannes Pomberger & Franz Zotter, Institute of Electronic Music and Acoustics, University of Music and Performing Arts, Graz, Austria
This contribution discusses sound field recording with microphone arrays on the finite rigid cylinder. Sound field analysis with rigid cylinders has been discussed most often using the solutions of the Helmholtz equation in two dimensions. Rigid cylindrical arrays of finite length cannot be formulated in such an elegant way. Nevertheless, it is possible to simulate a rigid cylindrical array using the boundary element method to obtain the responses to an incident field.
To compare the characteristics of rigid cylinder arrays to rigid spherical arrays, the incident field is expanded into spherical harmonics. For both spherical and cylindrical arrays, the decomposition with regard to azimuth is easily achieved in terms of trigonometric functions, even when using only microphones on the equatorial plane. Discrimination between different elevation angles is improved by adding more microphones at different heights on the cylinder. This is shown for cylindrical bodies with different diameter-to-height ratios in comparison with spherical arrays.
Paper 3: A comparison of different surround sound recording and reproduction techniques based on the use of a 32 capsules microphone array, including the influence of panoramic video
Fabio Manola, Adriano Farina & Andrea Genovese, University of York, UK
This paper provides a comparison between the operational results of Ambisonics (1st, 2nd and 3rd order), and the first examples of SPS and 3DVMS techniques. Audio and video were recorded at the same time, employing the Eigenmike microphone and a panoramic video capture system of our design. The results were submitted to a pool of test subjects, who evaluated several common psycho-acoustical parameters. Furthermore, the same tests were repeated with and without the accompanying panoramic video.
These tests were performed in the 3Sixty room at the University of York, an immersive space with all-around video projection and a 32-loudspeaker array. A complete set of IRs has been measured by placing a Soundfield microphone in the center of the room and sending a sine sweep test signal to each of the 32 loudspeakers. These IRs have been employed for equalizing each loudspeaker individually for Ambisonics and 3DVMS playback, and for computing the matrix of decoding FIR filters for the SPS method. In addition, we experimented with the capture of a live performance in the room and with the virtual reconstruction of the audio-visual experience.
12:25 – Lunch
13:35 – Welcome back
13:40 – Session B: Binaural and Multichannel
The Head-Related Transfer Function (HRTF) represents the acoustic signature for human spatial hearing, characterizing the diffraction of sound waves by the anatomy of the listener. HRTFs vary greatly from person to person, representing a major issue in reproduction quality of spatial sources over headphones. As each individual is morphologically different, these functions are particularly difficult to transpose to other individuals without audible artifacts.
As the inter-aural time delay (ITD) plays a predominant role in azimuthal location, we focus here on the personalization of this cue. Using HRTF measurements obtained for a KEMAR dummy head measured on the full sphere with a 5° resolution, something not easily possible on actual individuals, the spectral part of the HRTF and the ITD were separated. This allows for the relatively independent combination of pinnae and head contributions. The error in interpolation and recreation of the spectral component was analyzed as a function of modal order through subsampling of the dataset.
A method of synthesizing the ITD using individual morphological data and an external HRTF database was also evaluated. The dataset matching between measurement grids was also accomplished via spherical harmonics interpolation and the effect of modal order was again evaluated.
Paper 5: Fast measurement system for spatially continuous individual HRTFs
Martin Pollow, Bruno Masiero, Pascal Dietrich, Janina Fels & Michael Vorländer, Institute of Technical Acoustics, RWTH Aachen University, Germany
Head-related transfer functions (HRTFs) play a major role in the auralization of virtual sources around the listener as well as in cross-talk cancellation systems. Generic HRTFs of artificial heads are often used, as measuring individuals using a high spatial resolution is usually time-consuming and tedious.
A fast measurement system for HRTFs is presented, consisting of a circular arc of 40 broadband loudspeakers placed on the elevations of a Gaussian grid. By rotating the subject horizontally (either in discrete steps or continuously), HRTFs can be acquired on a spherical surface. Using an optimized version of the multiple exponential sweep technique [Majdak2007], thousands of discrete points can be measured within a few minutes, making the use of individual HRTFs feasible in practice. This measurement data is used to obtain a spatially continuous representation of the HRTFs by using a reciprocal formulation as modal components of an outgoing spherical wave. The assumed acoustical centre of the wave is varied in order to get the best possible reconstruction for a finite order of spherical harmonics coefficients. This results in a setup-independent and compact description of individual HRTFs, allowing the binaural transfer functions to be evaluated for any point in the near field or far field.
The paper will also look into the practicalities of recording surround sound with height information. The approach is based on a standard 5.1 system, but with four additional loudspeakers for height: centre front, centre rear, hard left and hard right in a cross formation.
Various microphone combinations will be shown and discussed, as well as their use in recording sessions. Also on show will be a microphone Mike Skeet has designed for co-incident recording with height and the reasons for its design will be explained.
Paper 7: Vambu Sound: A mixed-technique 4-D reproduction system with a heightened frontal localisation area
Martin J. Morrell, Chris Baume & Joshua D. Reiss, School of Electronic Engineering and Computer Science, Queen Mary University of London, UK
A system, Vambu Sound, was developed for BBC R&D to create a spatial audio production environment. The specification of the system is to provide better localisation around a main television screen and diffuse sound from around the listener. The developed system uses vector base amplitude panning for six loudspeakers in front of the listener and dual-band decoded first order Ambisonics for eight loudspeakers in the corners of a cube configuration. The system is made “4-D” by the incorporation of a dedicated haptic feedback channel within the audio format.
The workflow of the system is described, with the advantages and disadvantages over the conventional single Digital Audio Workstation environment highlighted. The incorporation of spatial audio control within Nuendo and the development of the Vambu Sound spatializer application built in Max/MSP are presented alongside the inter-application MIDI communication protocol. Real-world problems in its use are drawn upon to emphasise the system’s positive and negative attributes.
Finally, user feedback on the Vambu Sound demonstration is presented, showing initial reactions to the system. The overall feedback indicates a good level of immersion, particularly from the use of haptic feedback in key places of the drama-based point-of-view demonstration material produced.
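For readers unfamiliar with vector base amplitude panning, the following is a minimal sketch of the generic pairwise (2D) form of the technique. It is not the Vambu Sound implementation: the abstract does not give the layout of the six frontal loudspeakers, so the function name, the angle convention (degrees, counter-clockwise from straight ahead) and the example loudspeaker angles are all assumptions made for illustration.

```python
import math


def vbap_2d_gains(source_deg, left_deg, right_deg):
    """Return (g_left, g_right) for a source panned between two loudspeakers.

    All angles are in degrees, measured counter-clockwise from straight
    ahead (an assumed convention, not taken from the abstract).
    """
    # Unit vectors pointing at the source and at the two loudspeakers.
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    ax, ay = math.cos(math.radians(left_deg)), math.sin(math.radians(left_deg))
    bx, by = math.cos(math.radians(right_deg)), math.sin(math.radians(right_deg))

    # Solve p = g_a * a + g_b * b with Cramer's rule (2x2 linear system).
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g_a = (px * by - py * bx) / det
    g_b = (ax * py - ay * px) / det

    # Normalise so that g_a**2 + g_b**2 == 1 (constant total power).
    norm = math.hypot(g_a, g_b)
    return g_a / norm, g_b / norm


if __name__ == "__main__":
    # Hypothetical loudspeaker pair at +30 and -30 degrees, source at +10.
    print(vbap_2d_gains(10.0, 30.0, -30.0))
```

A source placed exactly at one loudspeaker receives all of the gain, and a source midway between the pair receives equal gains. A negative gain indicates that the source lies outside the arc spanned by the pair, in which case a different pair would normally be chosen.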
Freely available plug-ins and software tools allowing mixing and production using Ambisonics have been around for some years, but it is only since the release of the Reaper DAW software that using the system has become truly viable and manageable in a way endorsed by the software itself (using speaker-agnostic tracks).
In this workshop, a production workflow using freely available plug-ins and Reaper will be described and discussed. The workflow utilises Reaper’s hierarchical routing structure, and methodologies for multiple surround, stereo and headphone renditions are presented, allowing for a mix-once, future-proof project. New encoding and regular/irregular decoding plug-ins, up to 4th order, will also be introduced and described. Example mixes created by 2nd year undergraduate students at the University of Derby will be presented, along with newly developed teaching aids and visualisations that are soon to be released as open educational material for the dissemination of Ambisonics to an audience new to the subject.
17:50 – Close of Day 1 followed by Drinks Reception
Tuesday 27th March
09:00 – Welcome
09:10 – Session C: Synthesis and Simulation
Paper 8: Synthesis of directional sound sources with complex radiation patterns using a planar array of loudspeakers
Christoph Sladeczek, Albert Zhykhar & Sandra Brix, Akustik, Fraunhofer Institute for Digital Media Technology, IDMT Ilmenau, Germany
Sound field synthesis techniques aim at the creation of virtual source sound fields that are physically equivalent to the ones radiated by real sound sources. These techniques use an array of loudspeakers where an individual driving signal for each loudspeaker is determined according to a driving function. The synthesis of angle-dependent radiation characteristics, as is common for real sound sources, is a challenge for such reproduction systems.
In this paper we present an approach for the synthesis of arbitrary sound source radiation patterns using a planar array of loudspeakers. The method is based on the first Rayleigh integral formula and utilises an expansion of the virtual source sound field into spherical harmonics. Using this approach, complex radiation patterns are conveniently represented by a set of expansion coefficients. Besides the derivation of the loudspeaker driving function, an analysis of its properties is performed. Numerical simulation results using a finite planar array of loudspeakers are also presented.
Paper 9: A loudspeaker-based room acoustics simulation for real-time musical performance
Jude S. Brereton, Damian T Murphy & David M Howard, AudioLab, Department of Electronics, University of York, UK
Recent advances in the research and development of Virtual Acoustic Environments allow the user to interact with the virtual space, usually by controlling their position within the virtual environment via joystick or other movement controls. More recently virtual acoustic environments have been developed which allow the user to physically move about the virtual space and hear the resulting changes in their own sound (hand claps, speech, singing) according to the geometry and room acoustics of the acoustically rendered environment.
This paper will report on the design and implementation of the Virtual Singing Studio – a real-time interactive loudspeaker based room acoustics simulation for musical performance. The process of designing and implementing such a ‘vocally interactive’ virtual acoustic environment is examined and compared to “off-line” auralization techniques. Whereas others have used synthetic reverberation techniques, the Virtual Singing Studio is based on real-time convolution of Ambisonic B-format room impulse responses which have been measured in an existing performance venue. Furthermore, particular challenges arise at all stages of the process due to the eventual user, the singer, being at once sound source and sound receiver. These challenges will be outlined and potential solutions explored. The paper will also report on objective testing of the simulation and initial subjective evaluations by singers.
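The core signal operation named in this abstract, convolving a dry source signal with a measured B-format room impulse response, can be sketched as below. This is an offline, direct-form illustration only and not the Virtual Singing Studio implementation; a real-time system would typically use block-based FFT convolution instead. The function names and the dictionary layout keyed by channel name (W, X, Y, Z) are assumptions.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for m, h in enumerate(impulse_response):
            out[n + m] += x * h
    return out


def render_b_format(dry_signal, b_format_ir):
    """Convolve one dry signal with each channel of a B-format impulse response.

    b_format_ir maps a channel name ("W", "X", "Y", "Z") to that channel's
    impulse response; the result uses the same keys.
    """
    return {channel: convolve(dry_signal, ir) for channel, ir in b_format_ir.items()}


if __name__ == "__main__":
    # Toy impulse responses standing in for measured data (hypothetical values).
    toy_ir = {"W": [1.0, 0.5], "X": [0.0, 1.0], "Y": [0.0], "Z": [2.0]}
    print(render_b_format([1.0, 2.0], toy_ir))
```

The four output channels form a B-format signal that would then be decoded to the loudspeaker array.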
Paper 10: Ambisonic synthesis of directional sources using non-spherical loudspeaker arrays
Jorge Treviño, Yukio Iwaya & Yôiti Suzuki, Graduate School of Information Sciences/Research Institute of Electrical Communication, Tohoku University; Takuma Okamoto, Graduate School of Information Sciences/School of Engineering, Tohoku University, Katahira, Japan
Ambisonics has been touted as a system-independent technique to reproduce and synthesize arbitrary sound fields. It offers several advantages over other technologies, including the potential for compact recording systems and a scalable encoding scheme that focuses on the spatial features of the sound field while omitting the peculiarities of recording and reproduction systems. To date, however, most research surrounding the reproduction of Ambisonics has relied on the assumption of a fairly regular loudspeaker array where all transducers are equidistant to the listener. Mainstream deployment of Ambisonics requires weaker constraints on the reproduction array.
Previously, we have introduced a new approach to decode Ambisonic data for reproduction over arrays with irregular angular spacing between loudspeakers. We now extend our proposal with near-field corrections for each loudspeaker. Unlike previous approaches [J. Daniel, 2003], our proposal considers per-loudspeaker near-field corrections in the decoder design stage; this allows us to precisely reproduce Ambisonic-encoded sound fields over non-spherical arrays. Furthermore, the proper treatment of near-field effects makes it possible to re-create sources exhibiting complex directivity patterns.
We evaluated our proposal using a 157-channel, irregular loudspeaker array covering the walls and ceiling of a rectangular room to synthesize directional sound sources encoded using fifth-order Ambisonics.
A listening room equipped with an Ambisonics-capable sound system has been designed and implemented within our premises to serve as a design aid. Among other uses, the system is employed to illustrate audio ambiances associated with a large number of persons talking in a confined space such as a bar.
These simulations are made computationally efficient by using a low number of distributed virtual sound sources (each playing back a recording of several simultaneous talkers) that are positioned at optimized locations in the modelled space. The choice of the virtual sound source locations is informed by previous research where the relation between the sound field diffusive characteristics and perceived location of auditory events is investigated (see e.g., P. Novo, “Aspects of Hearing and Reproduction of Diffuse Sound Fields and Extended Sound Sources”, Proc. ICA 2004, Kyoto; P. Novo, “Speech Generated by Crowds: A spatial Analysis”, Proc. ICSV13, Vienna, 2006).
Results of tests undertaken to assess the relation between the number and location of virtual sound sources, sound field diffusive characteristics and plausibility of the simulations are presented and discussed.
12:20 – Lunch
13:20 – Welcome back
This demonstration provides a hands-on introduction to the powerful soundfield authoring, transformation and playback techniques made available to the SuperCollider3 user through the Ambisonic Toolkit (ATK).
The ATK brings together a number of classic and novel tools for the artist working with Ambisonic surround sound and makes these available to the SuperCollider3 user. The toolset is intended to be both ergonomic and comprehensive, and is framed so that the user is enabled to ‘think Ambisonic’. By this, the ATK addresses the holistic problem of creatively controlling a complete soundfield, facilitating spatial composition beyond simple placement of sounds in a sound-scene. The artist is empowered to address the impression and imaging of a soundfield—taking advantage of the native soundfield-kernel paradigm the Ambisonic technique presents.
Along with powerful soundfield transforms (spatial filtering), the ATK provides a comprehensive set of Ambisonic encoders (including pseudo-inverse encoders) and decoders (5.1, binaural, UHJ, full-3D) allowing the artist to thoroughly leverage the potential of the Ambisonic technique.
14:25 – Session D: Higher Order Ambisonics / Perception
Paper 12: WITHDRAWN
Paper 13: ESPRO 2.0 – Implementation of a surrounding 350-loudspeaker array for 3D sound field reproduction
Markus Noisternig, Thibaut Carpentier & Olivier Warusfel, Acoustic and Cognitive Spaces Research Group, IRCAM – CNRS UMR STMS, Paris, France
The “Espace de projection” (ESPRO), the variable acoustics performance hall of Ircam, was designed and built for providing the largest variability possible with regard to form, volume, and acoustical properties. The walls and ceiling consist of individually rotatable prisms with three different material surfaces to absorb, reflect or diffuse the incident sound. To vary the hall’s volume and shape three ceiling panels can be raised or lowered independently, and a roller curtain allows for separating the different volumes.
Despite the remarkable flexibility of this room, a surrounding 350-loudspeaker array has recently been installed that aims at varying and controlling the acoustics to a greater extent than is possible by passive variable acoustics. Higher-order Ambisonics (HOA) provides the means for creating immersive 2D/3D audio scenes including reverberation and environmental effects.
This article reviews the theory with regard to the installation of a HOA array in the ESPRO. The discussion will be mainly focused on the definition of a feasible grid of loudspeakers and the design of HOA decoders in order to overcome the practical limitations of using non-uniform loudspeaker arrays.
Sound fields can be represented in terms of spherical and cylindrical harmonics using higher order Ambisonics (HOA). The representation can be made more accurate by increasing the order N of the harmonics used.
It has been shown by Solvang that, assuming plane wave emitting loudspeakers, for a listener placed such that kr > N there will be a spectral impairment of the sound image when more than 2N + 1 loudspeakers are used, where k is the wave number. However, in practical situations the loudspeakers are placed at a finite distance from the centre of the array, and the assumption that they emit plane waves often no longer holds. This may cause a spectral colouration that differs from far field loudspeaker placement. Further colouration may occur due to boundary reflections, depending on the room as well as the placement of the reproduced sound source.
This paper gives an account of the effect of near field loudspeakers in free field as well as in a simulated room. The influence of reflections on the image quality is analysed objectively. A comparison of simulated enclosed HOA systems with sound field measurements in the reproduced area is made.
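To give a feel for the kr > N condition quoted in this abstract, the sketch below computes where kr equals N for a given order. It assumes k = 2πf/c, takes r to be the listener's distance from the array centre (the abstract does not define r explicitly) and uses 343 m/s for the speed of sound. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def wavenumber(freq_hz, c=SPEED_OF_SOUND):
    """Wave number k = 2*pi*f / c in radians per metre."""
    return 2.0 * math.pi * freq_hz / c


def radius_where_kr_equals_order(order, freq_hz, c=SPEED_OF_SOUND):
    """Distance r (metres) at which k*r equals the Ambisonic order N."""
    return order / wavenumber(freq_hz, c)


def frequency_where_kr_equals_order(order, radius_m, c=SPEED_OF_SOUND):
    """Frequency (Hz) at which k*r equals the Ambisonic order N."""
    return order * c / (2.0 * math.pi * radius_m)


def circular_harmonic_count(order):
    """Number of 2D (circular) harmonic components up to order N: 2N + 1."""
    return 2 * order + 1


if __name__ == "__main__":
    n = 3  # example order, chosen only for illustration
    print(frequency_where_kr_equals_order(n, 0.1))
    print(circular_harmonic_count(n))
```

For a fixed order, a listener further from the centre crosses into the kr > N region at a lower frequency, so the region free of the impairment described above shrinks as frequency rises.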
Paper 15: Distance perception in real and virtual environments
Marcin Gorzel, David Corrigan, John Squires & Frank Boland, Dept of Electronic and Electrical Engineering, Trinity College, Dublin, Ireland; Gavin Kearney, University of York, UK
Spatial localisation of sounding objects is affected not only by auditory cues but also by other modalities, e.g. vision. This is particularly true for the perception of distance, where the number of auditory cues is limited in comparison to, for example, localisation in the horizontal and vertical planes.
In this study, a group of participants was presented simultaneously with a visual object (either a real-world loudspeaker or a virtual rendering thereof using head-tracked, stereoscopic technology) and an accompanying auditory stimulus (head-tracked, binaural presentation of either pink noise bursts or female speech). Visual objects were presented at random at distances ranging from 1 m to 8 m. At the same time, the accompanying sound stimuli matched the location of the visual objects horizontally and vertically but were randomly misaligned with the visuals on the distance axis. After each presentation, users were asked to evaluate the spatial location of the audio with respect to the visuals (in front of, at the same location or behind).
Preliminary results show that one can allow for a significant audio-visual mismatch in both real and virtual scenes before they cease to be perceived as a unity by most of the participants. Also, a strong effect of presentation distance has been detected, which is congruent with previous studies on the subject.
16:55 – Closing remarks
17:10 – Conference close