As broadcasters and production companies embrace VR and AR, audio needs to play its part in putting the viewer at the centre of the action. Kevin Hilton reports

The visually immersive power of virtual reality (VR) is immediately obvious. Less obvious, but in many ways much more powerful, is the part sound plays in VR, augmented reality (AR) and 360 video presentations. True spatial surround sound puts you right in the middle of things and fills in details not seen in the images.

Broadcasters and production companies are realising the potential of VR, and with it the necessity for all-encompassing audio. “There’s phenomenal growth in VR, AR and hybrid formats,” says Soho Voices managing director Peter Morris.

“Clients are saying they want to do things in VR and there are many applications for it, including business meetings and broadcast.”

Soho Voices recently installed four new studios, one with Dolby Atmos. This format is already dominant in cinema production and now Dolby has developed mastering tools to adapt it for VR. Morris sees the integration of Dolby with the Avid S6 console and Pro Tools digital audio workstation as providing a range of mixes suited to different formats from one master.

“Immersive sound is not there yet,” he says. “But we’re seeing people talking about music videos or concerts that would be something not unlike – or even better than – going to the gig.”

Bamsound Creative, based at Vaudeville Post Production in London, is one of the facilities already producing immersive VR music. Established in 2006 by Scott Marshall, Bamsound is a VR/AR/360/immersive specialist.

“It was predominately just me for most of the 11 years but now we’ve got three to four people and are growing exponentially because of VR and 360,” says Marshall.

Eloise Whitmore recording movement effects for Turning Forest

Recent work includes promos for Tottenham Hotspur and Reading football clubs, a “horror experience” VR tie-in for the Universal film Ouija: Origin Of Evil and a recording of Scottish band Biffy Clyro at the NME nominations party.

“You feel like you’re part of the audience,” Marshall says of the latter. “It’s a nice effect, with the audience singing along to the track and being behind you.”

Music has also been a focus for the BBC, which produced test material primarily for 360 video, with some VR production as well. (360 video is effectively a panoramic view of an event or performance, while VR is more immersive and interactive.)

Welsh rockers Super Furry Animals were filmed in 360 at Motorpoint Arena in Cardiff using eight cameras, each of which had its own immersive audio mix.

“It gives an idea of what it’s like to be at a concert,” says audio supervisor Catherine Robinson. “If you move your head, the sound comes round to the front as well.”


  • AMBISONICS Developed by mathematician and tape recorder enthusiast Michael Gerzon in the early 1970s, to go beyond what he saw as the confines of stereo and the inadequacies of quadrophonics.
    Gerzon theorised that proper spatial imaging could only be achieved if the actual acoustical signals contained in the recording environment were recorded. He defined the sound field as comprising the absolute sound pressure level and the three pressure gradients: left/right, front/back and up/down.
    These are known as B-format signals, which are achieved after processing the raw A-format data from the four capsules in the SoundField microphone.
  • BINAURAL Uses microphones in the ‘ears’ of a dummy head to accurately record sonic events, making for realistic playback. The concept is that the listener is put in the position of the dummy head.
    While impressive, the system can only be used with headphones and the images are fixed to the original recording position, making the format unsuited to the moving images of VR/360.
  • OBJECT/CHANNEL-BASED IMMERSIVE AUDIO SYSTEMS Dolby Atmos and Auro Technologies’ Auro-3D were developed initially for feature film production. Both now have versions for VR, with the advantage of allowing individual sounds and tracks to be positioned as ‘objects’ anywhere in the sound field.
  • VR-SPECIFIC AUDIO SYSTEMS Such as Facebook’s 360 Video Spatial Audio Workstation.
  • DSPATIAL Immersive 3D audio workstation (also for film and broadcast).
  • AUDIO SPATIALIZER SDK For the Unity games engine (as used on the BBC’s VR film Turning Forest).
  • MACH1 SPATIAL AUDIO For VR, AR and music.
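
The A-format to B-format step described in the Ambisonics entry can be sketched in a few lines. This is a simplified illustration, not SoundField's actual processing: it assumes the common tetrahedral capsule layout (front-left-up, front-right-down, back-left-down, back-right-up), uses an arbitrary 0.5 scaling, and leaves out the frequency-dependent correction filters a real converter applies. The function name is hypothetical.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Convert one sample from each of four tetrahedral capsules to B-format.

    Assumed capsule order: front-left-up, front-right-down,
    back-left-down, back-right-up. The 0.5 gain is an illustrative choice.
    """
    w = 0.5 * (flu + frd + bld + bru)   # omnidirectional pressure
    x = 0.5 * (flu + frd - bld - bru)   # front/back gradient
    y = 0.5 * (flu - frd + bld - bru)   # left/right gradient
    z = 0.5 * (flu - frd - bld + bru)   # up/down gradient
    return w, x, y, z


if __name__ == "__main__":
    # A sound reaching only the two front capsules shows up in W and X.
    print(a_to_b_format(1.0, 1.0, 0.0, 0.0))
```

Sums and differences of the four capsule signals yield the pressure component (W) and the three gradients (X, Y, Z) that Gerzon defined.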

Robinson works in the 3D sound studio at BBC Wales. Audio projects are recorded and mixed in Ambisonics and binaural (see Audio Formats box, above), the two technologies most associated with VR/AR and 360. While binaural recording is recognised as a true representation of an audio ‘scene’, it is locked to the ‘ears’/microphones of the dummy head.

This, says Robinson, means it can’t be made to match moving VR/360 images. “Instead we take an ordinary recording in mono and translate it into Ambisonics,” she says.

“By capturing in mono, we can add binaural impulse response, which gives azimuth [horizontal direction and angle] and elevation in space. Using a renderer, we then make a multi-WAV for different playout systems.”
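
Placing a mono source in an Ambisonic sound field by azimuth and elevation can be illustrated with the classic first-order panning equations. The sketch below is not the BBC's workflow, and it does not include the binaural impulse response stage Robinson mentions. It assumes traditional B-format weighting (W scaled by 1/√2), azimuth measured anticlockwise from straight ahead and elevation measured upwards from the horizontal. The function name is hypothetical.

```python
import math


def encode_mono_to_b_format(sample, azimuth_deg, elevation_deg):
    """Pan one mono sample into first-order B-format (W, X, Y, Z).

    Assumptions: azimuth is anticlockwise from the front, elevation is
    upwards from the horizontal plane, and W carries a 1/sqrt(2) gain.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)
    x = sample * math.cos(az) * math.cos(el)   # front/back
    y = sample * math.sin(az) * math.cos(el)   # left/right
    z = sample * math.sin(el)                  # up/down
    return w, x, y, z


if __name__ == "__main__":
    # A source directly to the listener's left, at ear height.
    print(encode_mono_to_b_format(1.0, 90.0, 0.0))
```

Because direction is carried by the relative levels of X, Y and Z, the same encoded signal can later be rendered for different playout systems.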

Røde Videomic SoundField

These include the YouTube technology based on First Order Ambisonics, with four virtual loudspeakers in space, and Facebook’s, which uses Second Order Ambisonics featuring nine virtual speakers.

“The higher the order of Ambisonics, the more definition of spatial audio you get,” says Robinson.
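
The figures of four and nine quoted above line up with the standard count of signals in a full-sphere Ambisonic stream, which is (order + 1) squared. A minimal sketch follows. The function name is hypothetical, and individual platforms may carry a different number of channels in practice.

```python
def ambisonic_channel_count(order):
    """Return the number of full-sphere Ambisonic signals for a given order."""
    if order < 0:
        raise ValueError("order must be zero or greater")
    return (order + 1) ** 2


if __name__ == "__main__":
    for order in (1, 2, 3):
        print(order, ambisonic_channel_count(order))
```

Each step up in order adds more directional components, which is where the extra spatial definition comes from.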

Soho Voices’ first VR project involved Dolby Atmos but Morris says clients are asking for binaural recordings to go with headset productions.

“There’s the whole Ambisonics side of recording, but we are able to mix for this as well,” he says. “It’s not a case of upmixing from stereo but creating something [in a high-end format] and down-mixing from that.”

Marshall feels some education is still necessary when discussing the options for VR/360 audio, with people often using the terms binaural and Ambisonic as synonyms. He says Ambisonics is the “most accessible and popular” of the technologies:

“It’s more of an open format now but it only works for certain things, like simple action or if you don’t need very clear dialogue. With Ambisonics, everything is already mixed together, whereas with Dolby Atmos and some other immersive systems, everything is separate, which gives you more control.”

Sound designer/recordist Eloise Whitmore agrees that the ideal is to record all sound elements separately, which she says doesn’t usually happen.

Whitmore worked on last year’s Turning Forest production, which started out as audio-only but had VR animation added later. Dialogue was recorded on individual mics in the multi-purpose audio studio at Dock 10, while Whitmore used an array of 20 mics plus a boom on location. Since then, Whitmore says, most of her VR work has been fixing soundtracks that don’t work with the visuals.

“I’m getting phone calls saying the audio is in the wrong place and doesn’t match the head-turns,” she says. “The cheapest way to fix that is to put speech down as mono, so it doesn’t matter if you turn your head, and then record atmospheres and backgrounds in binaural.

“But mono does feel like audio has taken a step back because on Turning Forest we had everything as objects, which were linked [to metadata] so they were always in the right place. The problem seems to be planning. People are not taking on board that a lot needs to be done for the audio to be as good as the pictures.”

Robinson says producing audio for VR and for 360 productions is very similar, but both differ significantly from what is done for TV: “Less is more in VR. With a TV dub, you throw everything at it and create a rich soundscape. That’s too much for VR. The more going on masks the sound that you want to hear.”


Microphone manufacturer Røde bought the SoundField Ambisonic mic brand last year, producing its own on-camera immersive mic, the Videomic SoundField. It plans to further integrate the two brands with new software-based Ambisonics products next year.

Røde and Sennheiser have taken a more mass market/production approach to Ambisonics mic production. Sennheiser’s Ambeo is a four-capsule Ambisonic mic in a tetrahedral configuration. While it is intended for production recording, VR is very much a target.

At IBC, Sennheiser launched the Ambeo for VR Partnership Programme with the aim of creating a seamless production chain for VR and AR work.

The first products were Zoom and Aaton Digital location recorders, Orah 360 video software, Sphericam 360 cameras, Noise Makers mixing plug-ins and Harpex signal processor algorithms. Another recorder manufacturer, Sonosax, has since joined.

Veronique Larcher, director of Ambeo immersive audio for Sennheiser, says the partnership is intended to “make the life of the content creator easier” by integrating the Ambeo mic with other devices.

In the case of the Zoom F8 and F4 recorders, a plug-in embeds raw Ambeo signals into the machine, which Larcher says means “one fewer step for the creative to think about”.

Noise Makers produces the Binauralizer plug-in for binaural panning and down-mixing of multichannel signals. Larcher says this is used to convert Ambisonic information to binaural audio, approximating the output of the Neumann KU100 dummy head.