Coding for connection: Voice codec and the foundation of communication

by Kari Järvinen, Lasse Laaksonen

15 Jun 2022

Mobile phones and other devices have advanced enormously since their earliest iterations. Think of the different equipment you have used to call people and how the quality of that experience has changed through the years. One crucial piece of technology that has made all these human connections possible is the voice codec.

A voice codec is a way to encode and compress your voice and send it from one device to another over a network. It encodes voice into a digital data stream using fewer bits than the original representation and then reconstructs the voice signal through decoding. This helps enable a vast number of simultaneous voice calls over cellular networks.
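To make the encode-and-decode idea concrete, here is a minimal Python sketch of a toy codec based on mu-law companding, the principle behind the classic fixed-line G.711 codec. It is only an illustration: it uses the continuous mu-law formula with plain 8-bit quantization rather than the exact G.711 bit layout, the function names are our own, and the mobile codecs discussed in this article rely on far more sophisticated techniques. The sketch assumes input samples are floats between -1.0 and 1.0.

```python
import math

MU = 255  # companding constant used in mu-law coding


def encode_sample(x: float) -> int:
    """Compress one sample in [-1.0, 1.0] into an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    # Logarithmic compression: quiet sounds get finer resolution than loud ones.
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def decode_sample(code: int) -> float:
    """Reconstruct an approximate sample from an 8-bit code."""
    y = code / 255 * 2.0 - 1.0
    # Inverse of the logarithmic compression used by the encoder.
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode(samples):
    """Encode a sequence of samples into one byte per sample."""
    return bytes(encode_sample(s) for s in samples)


def decode(data):
    """Decode a byte stream back into approximate samples."""
    return [decode_sample(b) for b in data]


# Example: 0.1 s of a 440 Hz tone sampled at 8 kHz (assumed test signal).
tone = [0.8 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
stream = encode(tone)      # 800 bytes, versus 1600 bytes as 16-bit PCM
restored = decode(stream)  # close to, but not identical to, the original
```

The decoded signal is not bit-identical to the input. Like any lossy codec, the sketch trades a small reconstruction error for a stream that is half the size of 16-bit samples.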

Voice codecs ensure that our voice communication has the low latency required to enable natural conversation flow. Over time, voice codecs have evolved into generic audio codecs with high quality for all audio (including music) while maintaining low latency. They set the maximum voice and audio quality any communication system can provide.

Nokia has been part of this innovation from the start – playing an integral part in developing every voice codec for each generation of digital cellular technology – and we are continuing to innovate for the communication of the future. Thanks to the power of 5G, interaction in Extended Reality (XR) and the metaverse is possible, but this will bring new challenges in how we connect and communicate. Our inventors are already meeting these challenges head-on and ensuring that voice communication will keep pace as new technologies become mainstream.

Before jumping into the exciting future of voice-driven innovation, let’s look at the remarkable impact developments in voice codecs have had on our lives in the past few decades.

Underpinning innovation

Each generation of technology, from the earliest handsets to the latest smart speakers, advances the ways we communicate and enjoy content. One of the earliest codecs, the Enhanced Full Rate (EFR) standard developed in 1996, enabled voice quality equivalent to fixed-line telephony across mobile devices and networks, while the next generation of codecs made mobile voice quality even higher than that of fixed lines. This was an important step forward in the adoption of mobile devices.

The Adaptive Multi-Rate Wideband (AMR-WB) codec from 2001 extended the audio bandwidth from 300-3400 Hz to 50-7000 Hz. This improved the intelligibility and naturalness of voice, added a feeling of transparent communication, and made it easier to recognize who is speaking. The Enhanced Voice Services (EVS) codec from 2014 pushed the audio bandwidth up to 20 kHz with high quality for any audio, including music. Today, the EVS codec is widely deployed in devices and networks around the world. Effective remote collaboration over a call would not be possible without these subtle but crucial advances.

The future sounds amazing

The emergence of 5G, and even 6G beyond that, opens further remarkable possibilities for voice-driven innovation. Spatial audio over headphones will be a major step toward realistic communication, enabling users to hear audio as coming from outside their head, as in real life. Head-tracking will compensate for head movements. For example, turning your head left while wearing headphones makes a sound source rotate right relative to your head. This means that the spatial audio scene surrounding you won’t move along with the movements of your head but stay still – again just like in real life! This will help create a more natural and enjoyable environment for people in virtual meetings or at livestreamed events.
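The head-tracking compensation described above comes down to simple angle arithmetic, sketched here in Python. This is an illustration under our own assumptions, not how any particular product or codec renders audio: angles are in degrees, 0 means straight ahead, positive values are to the listener's left, only horizontal rotation (yaw) is considered, and the function names are hypothetical. A real renderer would go on to filter the audio for the resulting direction.

```python
def wrap_degrees(angle: float) -> float:
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def head_relative_azimuth(source_azimuth: float, head_yaw: float) -> float:
    """Direction of a fixed sound source as heard by a rotated head.

    source_azimuth: where the source sits in the room (0 = front, + = left).
    head_yaw: how far the listener has turned their head (+ = left).
    """
    # Subtracting the head rotation keeps the source fixed in the room.
    return wrap_degrees(source_azimuth - head_yaw)


# A talker placed straight ahead; the listener turns 30 degrees to the left.
relative = head_relative_azimuth(source_azimuth=0.0, head_yaw=30.0)
print(relative)  # negative value: the talker is now to the listener's right
```

Because the rendered direction moves opposite to the head, the talker stays anchored to the same spot in the room, which is what makes the scene feel still.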

A key enabler of new lifelike acoustic experiences is the Immersive Voice and Audio Services (IVAS) codec, which is being developed to address use cases such as spatial voice and sharing of immersive experiences. In the future, you will hear your teleconference colleagues around you on your headset, even while on the move, and you will be able to share the vivid experience of a live concert with your friends. This standard builds on the success of the EVS codec, and it will power voice and audio communications across mobile networks and 5G devices, including multi-microphone smartphones and Augmented Reality (AR) glasses.

Our inventors continue to lead in the development of codecs to ensure that each new step forward makes audio quality richer and more realistic.

About Kari Järvinen

Kari Järvinen (M.Sc., Lic.Sc. (Tech.)) is a Distinguished Scientist at Nokia Technologies and a Nokia Bell Labs Fellow. He is an internationally acclaimed expert in voice and audio compression and transmission, with about 20 years’ experience as chairman of working groups in ETSI and 3GPP standardization.

About Lasse Laaksonen

Lasse Laaksonen is a Principal Researcher at Nokia Technologies and a Nokia Bell Labs Distinguished Member of Technical Staff. He heads voice and audio codec development and standardization at Nokia, and he is the Nokia Head of Delegation in 3GPP SA4.
