Working with Multi-channel Audio: The Challenges and Benefits

For many in media production, working with multi-channel audio is a daily fact of life. There are numerous challenges that come along with the format and it can be complex to manage – sometimes even more complicated than video.

In this blog post, we examine and explain the format, discuss the challenges that come along with it, and look at solutions to help content producers work with it successfully and more efficiently. When we talk about efficiency here, we are referring to the ability to configure audio and work with multiple audio layouts side-by-side rather than the inefficient approach of making a copy for each possible audio layout.

💡 More about setting up your workspace to support multi-channel audio on our knowledge base

What is it?

As the name suggests, multi-channel audio refers to audio recordings, playback, or file formats that feature more than two audio channels. Unlike traditional stereo audio, which uses two channels (left and right), multi-channel audio systems involve three or more channels.

From a production perspective, multi-channel audio relates to the recording of different sources on specific tracks. Examples could be different microphones positioned around an orchestra, separate speakers in a debate or audio coming from multiple sources around a racing track. This allows for greater flexibility in the post-production stage, e.g. editing and mixing, more accurate audio transcription, etc.

From a distribution point of view, these channels can be used to create immersive audio experiences by providing spatial positioning and directionality to sound. This type of audio can be used in various applications, including cinema distribution, home entertainment, gaming, virtual reality, music production, film production, and more. By using multiple audio channels, these systems can recreate realistic soundscapes with enhanced spatial awareness, and provide a more immersive experience for listeners or viewers.

When we discuss content delivery applications, it is worth noting that broadcasters often specify a certain number of audio tracks as part of their acceptance criteria and quality control process for the delivery of finished content from a producer.

Different Types of Multi-channel Audio

The most common multi-channel audio formats include:

Poly WAV: A file format most commonly found in TV/broadcast applications, poly WAV is simply a WAV file containing multiple audio channels and extra metadata identifying the channels.
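To make the idea of several channels in one WAV file concrete, here is a minimal sketch using Python's standard `wave` module. It assumes 16-bit PCM audio; the helper names `write_demo_wav` and `split_channels` are hypothetical and exist only for this illustration. The sketch writes a tiny four-channel file into memory and then separates the interleaved samples back into one list per channel. The channel-identifying metadata of a real poly WAV is not read here.

```python
import io
import struct
import wave


def write_demo_wav(buffer, channels=4, frames=3, sample_rate=48000):
    """Write a tiny 16-bit PCM WAV whose sample values encode their channel."""
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        for frame in range(frames):
            # One frame holds one sample per channel, stored interleaved.
            samples = [ch * 100 + frame for ch in range(channels)]
            wav.writeframes(struct.pack(f"<{channels}h", *samples))


def split_channels(buffer):
    """Return one list of samples per channel from a 16-bit PCM WAV."""
    with wave.open(buffer, "rb") as wav:
        channels = wav.getnchannels()
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    # Sample i belongs to channel i % channels.
    return [list(samples[ch::channels]) for ch in range(channels)]


demo = io.BytesIO()
write_demo_wav(demo)
demo.seek(0)
tracks = split_channels(demo)
print(len(tracks), tracks[1])
```

In a real production file each of these lists would hold the audio of one source, for example one microphone or one language track.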

5.1 Surround Sound: This format consists of six channels: front left, front centre, front right, surround left, surround right, and a low-frequency effects (LFE) channel for bass. It is commonly used in home theatre systems and provides a more immersive audio experience compared to stereo.

An illustration of how a 5.1 surround sound system would be configured in a typical living room, showing loudspeaker placement for each of the six channels of audio.

7.1 Surround Sound: This format expands on 5.1 surround sound by adding two additional channels: surround back left and surround back right. It offers even more spatial immersion and is often used in larger home theatre setups or professional audio environments.

Dolby Atmos: Dolby Atmos is an advanced audio format that adds height channels to traditional surround sound setups. It allows sound engineers to position audio objects in a three-dimensional space, creating a more realistic and immersive audio experience. Dolby Atmos is commonly used in theatres, home theatres, and select gaming setups.

DTS:X: Similar to Dolby Atmos, DTS:X is an object-based audio format that provides immersive, three-dimensional sound. It allows sound objects to move freely around the listener, creating a lifelike audio experience. DTS:X is also used in theatres, home theatres, and gaming systems.

8K Super Hi-Vision: Developed primarily by Japanese national broadcaster NHK, the 8K Super Hi-Vision format features 22.2 audio (i.e. 24 channels) that can reproduce a natural 3D sound. It is standardised in the international standard Recommendation ITU-R BS.2051, and in the Japanese standard ARIB STD-B59. NHK has played a central role in the development of these standards, which include guidelines on loudspeaker layout, channel configuration, and other aspects.
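The "x.y" labels used in the list above follow a simple convention: the number before the dot counts the full-range channels and the number after it counts the low-frequency effects channels. The sketch below illustrates that arithmetic with the channel names given in this post. The function names are hypothetical, the order of the names is illustrative only (the order in which channels are stored varies between file formats), and labels with a third part for height channels are not handled.

```python
LFE = "low-frequency effects"

# Channel names as listed in this post; the order is illustrative only.
SURROUND_5_1 = [
    "front left", "front centre", "front right",
    "surround left", "surround right", LFE,
]
SURROUND_7_1 = SURROUND_5_1 + ["surround back left", "surround back right"]


def layout_label(channel_names):
    """Build an 'x.y' label: full-range channels, then LFE channels."""
    lfe = channel_names.count(LFE)
    return f"{len(channel_names) - lfe}.{lfe}"


def channel_count(label):
    """Total number of channels implied by a two-part label such as '22.2'."""
    full_range, lfe = label.split(".")
    return int(full_range) + int(lfe)


print(layout_label(SURROUND_5_1), layout_label(SURROUND_7_1))
print(channel_count("22.2"))
```

This is why 22.2 audio is described as having 24 channels: 22 full-range channels plus 2 low-frequency effects channels.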

💡 In general, Limecraft supports all possible audio layout configurations. Please refer to the complete overview of supported file formats on the knowledge base.

History

Up until the 1940s, mono sound recording was the most popular format. In November 1940, Walt Disney’s Fantasia became the first commercial motion picture with stereophonic sound. However, it took until 1957 for stereo recording to become a feature of the music business. The first commercial stereo recordings were produced in New York City around the autumn of 1957 by famed producer Sidney Frey and Audio Fidelity Records.

Quadraphonic sound – what would now be called 4.0 Surround Sound – was briefly popular in the late 1960s and 1970s and was used for both recorded music and live events, most notably by the English rock group Pink Floyd during some of their 1970s concerts.

The 5.1 surround sound format was designed by Dolby Laboratories in 1976 and first used for the theatrical release of Batman Returns in 1992.

The first TV displays capable of handling multi-channel audio appeared in the 1980s. The advent of DVDs saw multi-channel audio (notably 5.1 home theatre setups) become a standard part of the home audience experience, albeit requiring manual setup by the viewer.

Dolby’s 7.1 surround sound format was introduced in 2010, and first featured in that year’s release of Toy Story 3 by Walt Disney Pictures and Pixar.

The first Dolby Atmos installation was in the El Capitan Theatre in Los Angeles for the premiere of Disney and Pixar’s Brave in June 2012.

In the broadcast and production world, recent technologies such as the Interoperable Master Format (IMF) and MPEG-DASH have allowed more flexible production and distribution of content that features multi-channel audio.

Drivers of change

There are a number of reasons why multi-channel audio has become increasingly popular and prevalent in the media and entertainment industry:

The explosion in the number of online and streaming platforms plus an increasingly globalised content market have, in turn, led to a greater demand for localisation and personalised content. It is more efficient and convenient for producers to create multiple language audio tracks within one finished piece of content.
Changes in technology have brought about increasingly complex production setups, allowing multiple microphones to be used to capture different elements of the event in question.
Consumer demand for more immersive sound (e.g. surround sound) has driven the technology landscape forward, with new formats such as binaural audio and Apple’s spatial audio.
Many countries now mandate a policy of ‘open governance’, requiring national, regional, and local parliaments, councils, assemblies, and other legislative bodies to record and broadcast proceedings in different languages (especially in multi-lingual countries like Belgium and Canada).

In addition to these points, recent advances in synthetic dubbing and voice cloning by companies such as Veritone and Ooona suggest that multi-lingual content (wrapped as multi-channel audio files) will become pervasive as the process of creating it becomes more automated and, as a result, faster and more cost-effective.

The challenges

Multi-channel audio can be complicated to work with, and one of the biggest challenges relates to workflow design and post-production. Productions featuring multi-channel audio require greater organisation and effort at the post-production stage, and this therefore calls for more thought to be given to initial workflow design.

Another significant challenge relates to storage and its associated cost. The need to make a copy of each item of content for each audio layout may have a dramatic effect on storage requirements and the overall costs incurred.
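A rough back-of-the-envelope sketch shows why the copy-per-layout approach is costly. All figures below are hypothetical assumptions chosen for illustration: a one-hour programme, a 50 Mbps video stream, and uncompressed 48 kHz, 24-bit PCM audio in three layouts. The function names are likewise made up for this example.

```python
def size_gb(bitrate_mbps, duration_s):
    """Size in gigabytes (10^9 bytes) of a stream at a constant bitrate."""
    return bitrate_mbps * 1_000_000 * duration_s / 8 / 1_000_000_000


def pcm_mbps(channels, sample_rate=48_000, bit_depth=24):
    """Bitrate of uncompressed PCM audio in megabits per second."""
    return channels * sample_rate * bit_depth / 1_000_000


def copy_per_layout(video_mbps, layouts, duration_s):
    """Every layout gets its own file containing a full copy of the video."""
    return sum(
        size_gb(video_mbps + pcm_mbps(channels), duration_s)
        for channels in layouts.values()
    )


def shared_video(video_mbps, layouts, duration_s):
    """The video is stored once and each layout only adds its audio."""
    audio = sum(
        size_gb(pcm_mbps(channels), duration_s)
        for channels in layouts.values()
    )
    return size_gb(video_mbps, duration_s) + audio


# Hypothetical layouts: name -> number of audio channels.
layouts = {"stereo": 2, "5.1": 6, "8 mono tracks": 8}
copies = copy_per_layout(50, layouts, 3600)
shared = shared_video(50, layouts, 3600)
print(f"copy per layout: {copies:.1f} GB, shared video: {shared:.1f} GB")
```

Under these assumptions the difference is exactly the two extra copies of the video, because the audio takes the same space either way. The more layouts a piece of content needs, the larger that gap becomes.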

There are very practical considerations at play here, too. For example, the browsers used as interfaces to Media Asset Management solutions generally only support a simple stereo track, which complicates the management of clips with multiple audio tracks or different audio layouts.

Solutions

Various content sharing and collaboration solutions are available from vendors in the media and production market, and each takes a slightly different approach to multi-channel audio.

Emotion Systems – Emotion solutions are designed to ‘automate repetitive audio processing tasks’, cutting down time in the edit suite and making it easier to comply with loudness, accessibility and localisation requirements and prepare multiple versions of content. The team at Emotion therefore have a great deal of experience in working with audio, and multi-channel audio specifically.

“We’ve noticed several important trends where multi-channel audio is concerned, especially around regulation, the audience experience, and the competitive environment,” notes MC Patel of Emotion Systems. “Producers of broadcast content are increasingly having to consider audio dynamics (especially loudness) because many countries are now enforcing regulations around this topic more strictly. Movies are often delivered to broadcasters with the original theatrical audio mix and that’s generally not suitable for TV broadcast, and so rework is required. Also, the transitions between a 5.1 movie audio track and a regular stereo mix during, say, a commercial break can be problematic and require careful management. The big streaming platforms have invested a great deal in the audience experience by commissioning content in immersive formats such as Dolby Atmos, and this has bled over into other TV platforms. We’re now seeing archive content being remixed and reworked because the audience has an expectation of certain standards of audio quality and ‘performance’. As the media market has grown more and more competitive, the traditional broadcast networks have had to up their game, and that’s challenging when they have to consider their broadcast infrastructure and hardware in a way that the large streaming platforms do not.”

“The production landscape has become increasingly complex, and multi-channel audio has been a big part of that. From localisation and audio description to broadcaster QC and verification standards, we’re dealing with more ingredients than ever before, and we see automation as an important way of improving efficiency and controlling costs.”

Limecraft – Limecraft’s Media Asset Management solution fully supports multi-channel audio end to end. Importantly, it lets users create and work with different audio track layout configurations. Users can create a multi-track audio proxy. MPEG-DASH (Dynamic Adaptive Streaming over HTTP) enables users to switch between different audio layouts while playing out, in real time.
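As a general illustration of how MPEG-DASH makes this kind of switching possible, a DASH manifest (MPD) can describe several audio alternatives next to a single video, and a player picks one of them. The sketch below parses a small hand-written manifest and lists its audio options. The manifest is a hypothetical, heavily simplified example and not output from Limecraft; real manifests contain much more information and may identify audio through `mimeType` rather than `contentType`. The function name `list_audio_options` is also made up for this example.

```python
import xml.etree.ElementTree as ET

MPD_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Hypothetical, simplified manifest: one video and three audio alternatives.
EXAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet contentType="video" mimeType="video/mp4"/>
    <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="en">
      <AudioChannelConfiguration
          schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
          value="2"/>
    </AdaptationSet>
    <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="en">
      <AudioChannelConfiguration
          schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
          value="6"/>
    </AdaptationSet>
    <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="fr">
      <AudioChannelConfiguration
          schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
          value="2"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def list_audio_options(mpd_text):
    """Return (language, channel configuration value) for each audio set."""
    root = ET.fromstring(mpd_text)
    options = []
    for aset in root.findall(".//mpd:AdaptationSet[@contentType='audio']", MPD_NS):
        config = aset.find("mpd:AudioChannelConfiguration", MPD_NS)
        value = config.get("value") if config is not None else None
        options.append((aset.get("lang"), value))
    return options


for lang, value in list_audio_options(EXAMPLE_MPD):
    print(lang, value)
```

The point of the design is that the video is described once, while each audio layout or language is a separate alternative that can be selected during playback without duplicating the picture.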

The benefits of this approach are important and wide-ranging:

Provides better and more consistent organisation of audio content, which is also helpful when we consider archiving, search and retrieval.
The ability to create several different multi-channel proxies helps optimise storage and reduce file sizes.
Offers significant savings in storage costs.
Helps avoid the unnecessary duplication and storage of media files if only the audio differs.
Enables easier and more efficient localisation, helping content to reach a wider audience.