Multimedia research and standardization
The latest multimedia technology innovation from Nokia
Our portfolio of innovations continues to grow thanks to our ongoing investment in multimedia R&D and our internationally acclaimed team of experts. The work of our inventors in video research and standardization has been recognized with numerous prestigious awards, including five Technology & Engineering Emmy® Awards.
Hard and fast point sampling with NeRF
Neural radiance field (NeRF) training can create incredibly complex 3D scenes from 2D images – at a high computational cost. Our method uses hard sampling to efficiently optimize these networks at twice the speed, saving both time and memory otherwise lost through traditional random sampling. Imagine NeRF’s potential in efficiently representing a variety of multimedia content (beyond just scene representation) with Nokia-enabled consumer-grade hardware compatibility.
Like the sound of speeding up and reducing the memory cost of neural network training down to inference levels? Read the paper by Juuso Korhonen, Goutham Rangu, Hamed Rezazadegan Tavakoli and Juho Kannala to learn more.
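To make the idea of hard sampling concrete, here is a minimal sketch, assuming a generic "keep the highest-loss samples" rule rather than the authors' exact selection criterion. A cheap forward pass scores every candidate point, and only the hardest fraction goes on to the expensive gradient step. The helper `select_hard_samples` and the `keep_ratio` parameter are hypothetical names used for illustration only.

```python
def select_hard_samples(losses, keep_ratio=0.25):
    """Return indices of the highest-loss samples, hardest first.

    Hypothetical helper: `losses` holds one loss value per sampled point,
    obtained from a cheap forward pass without gradient tracking.
    """
    k = max(1, round(len(losses) * keep_ratio))
    # Rank sample indices from largest to smallest loss.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return ranked[:k]


# Eight candidate points; only the hardest quarter is kept for backpropagation.
per_sample_loss = [0.02, 0.90, 0.10, 0.55, 0.01, 0.30, 0.75, 0.05]
hard_idx = select_hard_samples(per_sample_loss, keep_ratio=0.25)
print(hard_idx)
```

Because gradients are computed only for the selected points, both the backward-pass time and the activation memory shrink with `keep_ratio` in this simplified picture.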
Significant improvement for temporal consistency in video semantic segmentation
Semantic segmentation is a far trickier task for video than for static images: predictions tend to be either temporally inconsistent or costly and inaccurate. Momentum Adapt is an unsupervised online method that improves temporal performance to deliver the consistency your AI applications need. Uncover how this approach outperforms state-of-the-art algorithms in adapting to even the most severe environmental changes.
Bringing richness, quality and more lifelike interaction to voice and video calling in 5G advanced
IVAS stands for Immersive Voice and Audio Services, and it is a new voice and audio codec standardized by 3GPP. It is part of 3GPP Rel. 18. IVAS is the first 3GPP standard for transmitting conversational stereo and immersive voice and audio.
The IVAS codec enables live immersive audio for any device form factor, bringing people together for real-life interaction with accurate and immersive three dimensional rendering of captured sound.
Nokia participates in IVAS standardization in 3GPP and is one of its most active contributors and proponents.
Read more about IVAS in 3GPP Rel. 18 and in our whitepaper.
Getting the full picture of lossless image codecs
Image codec performance can be efficiently enhanced through domain adaptation – but its adaptation overhead can compromise its gain. This is where an adaptive multi-scale progressive probability model delivers: effective domain adaptation without the significant overhead. See how this technique could reduce the bitstream size of lossless image codecs by up to 4.8%.
Want to enhance your lossless image compression? Read the whitepaper from Honglei Zhang, Francesco Cricri, Nannan Zou, Hamed R. Tavakoli and Miska M. Hannuksela.
New AI frontiers for image compression
For the last 30 years, image and video compression algorithms have been designed by engineers – but changes may be afoot. With artificial intelligence set to step up the game, model overfitting at inference time may be necessary to improve the efficiency of learning-based codecs. Learn why Nokia is exploring the potential for modified neural networks to streamline the compression process.
Temporal dependencies: the life hack for federated learning
Federated learning (FL) mitigates some long-lasting challenges of large-scale machine learning including privacy and computation costs, but it also comes with bandwidth challenges of its own. Discover how temporal dependencies are key to improving the communication efficiency in FL without sacrificing model accuracy.
More, more, more: Convolutional cross-component modeling answers streaming demands
Our growing appetite for streaming high-quality media seems insatiable, driving the need for new and advanced coding technologies. Nokia's convolutional cross-component modeling (CCCM) approach excels in next-generation video coding, utilizing advanced filtering for cross-component prediction. Learn how Nokia is leading the way in this exciting technological advancement, achieving significant bit rate reductions compared to current codecs.
Curious about what this means for video streaming? Read the paper by Pekka Astola, Alireza Aminlou, Ramin G. Youvalari, and Jani Lainema to find out more.
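Cross-component prediction estimates chroma samples from luma samples that have already been reconstructed. The sketch below is a deliberately simplified stand-in: a single-tap linear model fitted by least squares on neighbouring reference samples, not the multi-tap convolutional filter that CCCM uses. All function names and sample values are hypothetical.

```python
def fit_linear_model(luma, chroma):
    """Least-squares fit of chroma ~ a * luma + b on reference samples."""
    n = len(luma)
    mean_l = sum(luma) / n
    mean_c = sum(chroma) / n
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    var = sum((l - mean_l) ** 2 for l in luma)
    a = cov / var if var else 0.0  # flat luma: fall back to the mean chroma
    b = mean_c - a * mean_l
    return a, b


def predict_chroma(luma_block, a, b):
    """Apply the fitted model to every luma sample of the current block."""
    return [[a * l + b for l in row] for row in luma_block]


# Reference samples from already-decoded neighbours (made-up values).
ref_luma = [100, 120, 140, 160]
ref_chroma = [60, 70, 80, 90]
a, b = fit_linear_model(ref_luma, ref_chroma)
pred = predict_chroma([[110, 130], [150, 170]], a, b)
print(a, b, pred)
```

The decoder can repeat the same fit from the same reference samples, so no model parameters need to be transmitted in this simplified setting.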
A pathway to VVC-based broadcasting and streaming
With 50% better efficiency than HEVC, Versatile Video Coding (VVC) is a dream for broadcast and streaming – if you know how to use it. Thankfully, the Media Coding Industry Forum (MC-IF) has published the first technical guidelines for broadcast and streaming applications to help you navigate this state-of-the-art standard. Discover best practices for compression performance, interoperability, bitrate ranges and more.
VVC: A great all-rounder for immersive video
Immersive video, with its wide range of exciting content types and services, is taking over the show from conventional 2D. Discover why Versatile Video Coding (VVC) rules the roost when it comes to immersive video compression and implementing advanced features.
VVC caught your eye? Learn more about it in the article by Miska M. Hannuksela and Sachin Deshpande.
Neural network based video post-processing, this time with content adaptation
Decoded video is usually affected by coding artefacts. This can be alleviated by post-processing – for example using neural network based filters – and better filtering can be achieved by adapting the neural network to the video content. However, this comes with a bitrate overhead. In our paper, we show how efficient content adaptation can be performed, with the aid of the MPEG NNR standard for compressing the adaptation signal.
A new low latency feature for Versatile Video Coding
Everything from video conferencing to computer vision depends on keeping latency low. We have developed Gradual Decoding Refresh (GDR), a new feature that builds on Versatile Video Coding (VVC). Learn how GDR alleviates delay issues related to intra coded pictures – putting them on par with their inter coded counterparts – and maximizes coding efficiency while minimizing leaks.
Dive deeper into the topic with Limin Wang, Seungwook Hong and Krit Panusopone
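The core idea of a gradual refresh can be sketched in a few lines. Instead of sending one large intra coded picture, the encoder intra-codes a narrow strip in each picture and moves that strip across the frame, so the whole picture is refreshed after a fixed number of pictures. The sketch below assumes a vertical strip sweeping left to right and measures width in coding tree units; `gdr_schedule` and its parameters are hypothetical names, not part of any VVC API.

```python
def gdr_schedule(width_in_ctus, refresh_period):
    """Return (picture_index, first_col, end_col) for each intra-coded strip.

    Columns are half-open ranges [first_col, end_col). Everything left of
    end_col has been refreshed once the picture is decoded.
    """
    schedule = []
    for pic in range(refresh_period):
        start = pic * width_in_ctus // refresh_period
        end = (pic + 1) * width_in_ctus // refresh_period
        schedule.append((pic, start, end))
    return schedule


# A picture 10 CTUs wide, fully refreshed over 4 pictures.
schedule = gdr_schedule(width_in_ctus=10, refresh_period=4)
for pic, start, end in schedule:
    print(f"picture {pic}: intra columns {start}..{end - 1}")
```

Spreading the intra coded area this way keeps picture sizes closer to those of inter coded pictures, which is what helps with delay. A real encoder must additionally stop the refreshed area from predicting from the not-yet-refreshed area.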
Competitive learning: the content-specific post-processing frontier
For machines intending to perform vision tasks, adapting reconstructed human-ready videos is a must. But how do we address artifacts caused by varying compression rates and unique content? Joint optimization pits content-specific filters against each other for the right to post-process video with fewer resulting artifacts. Discover how this competitive learning can result in greatly improved performance of reconstructed data.
Ready to claim victory over video artifacts? Read the paper by Honglei Zhang, Jukka I. Ahonen, Nam Le, Ruiying Yang and Francesco Cricri.
NN-VVC: A sight for all eyes
While video compression technologies are traditionally tailored to human viewers, growing AI activity is driving up the demand for machine consumption too. NN-VVC combines machine learning and conventional codecs to optimize video compression, transmission and storage for both human and machine consumption. Learn how this ground-breaking research surpassed today’s state-of-the-art codecs to win IEEE ISM 2023’s Best Paper Award.
Less distraction, more machine learning action
E2E learned compression may take the lead in image coding for machines, but its insufficient flexibility in adaptively allocating bits can sacrifice machine vision performance. Leveraging Regions-of-Interest can minimize the bits allocated for backgrounds, resulting in reduced bitrates while retaining the accuracy of machine tasks. Learn more about how this method can achieve impressive gains within learned image codecs.
Ready to find out more? Read the whitepaper by Jukka I. Ahonen, Nam Le, Honglei Zhang, Francesco Cricri and Esa Rahtu.
Eliminating numerical instability from convolutional neural networks’ equations
Convolutional neural networks can unlock extraordinary tools for image and video coding, but their limited precision in floating point arithmetic is inescapably problematic. Our post-training quantization technique stops data corruption in its tracks, dividing operations between integer and floating-point domains for maximum numerical stability. See how this technique can realize uncompromised deep learning performance across a variety of platforms.
Ready for better machine performance? Take a look at the whitepaper by Honglei Zhang, Nam Le, Francesco Cricri, Jukka Ahonen and Hamed Rezazadegan Tavakoli.
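Why would moving operations into the integer domain help stability? A minimal sketch, assuming simple per-tensor scaling (not necessarily the scheme used in the whitepaper): inputs and weights are mapped to integers, the accumulation is done exactly in integer arithmetic, and a single floating-point rescale happens at the end. The scales and helper names are hypothetical.

```python
def quantize(values, scale):
    """Map floating-point values to integers using a fixed scale."""
    return [round(v / scale) for v in values]


def int_dot(q_x, q_w):
    """Integer dot product: exact, so the result is independent of
    summation order and of the platform's floating-point behaviour."""
    return sum(x * w for x, w in zip(q_x, q_w))


x = [0.5, -1.25, 2.0]
w = [0.1, 0.2, -0.3]
scale_x, scale_w = 0.25, 0.1      # hypothetical per-tensor scales

q_x = quantize(x, scale_x)
q_w = quantize(w, scale_w)
acc = int_dot(q_x, q_w)           # integer domain
y = acc * scale_x * scale_w       # back to floating point once, at the end
print(q_x, q_w, acc, y)
```

Floating-point sums can differ slightly between devices depending on the order of operations, and in a codec such small differences between encoder and decoder can corrupt the decoded data. The integer accumulation above avoids that by construction.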
Vision enhanced for human- and machine-kind
Images compressed with neural network-based codecs are often plagued with checkerboard artifacts, degrading picture quality for human, if not machine, eyes. In steps a new codec fine-tuning technique to remove these problematic artifacts, enhancing details for humans and retaining machine performance at no extra cost. Discover how every vision can benefit from this technique.
Machine oriented image compression: a content-adaptive approach
An increasing number of videos and images are watched by computer algorithms instead of humans. Our research considers how image coding can adapt to non-human eyes, with implications for smart cities, factory robotics, security and much more. Discover how an inference-time content-adaptive approach can improve compression efficiency for machine-consumption without modifying codec parameters.
Dynamic mesh coding: Realizing photorealistic metaverse experiences on every device
Dynamic meshes bring immersive experiences to life, but their full potential can only be unleashed by standards that ensure interoperability. Initially designed for point clouds, the recent MPEG Visual Volumetric Video-based Coding (V3C) framework can extend its talents to efficiently encode and decode these dynamic meshes – on any device. Discover how this approach exceeds the compression performance of today’s best prior art to support tomorrow’s metaverse experiences.
Ready to unlock new immersive opportunities? Get the article by Patrice Rondao Alface, Aleksei Martemianov, Lauri Ilola, Lukasz Kondrad, Christoph Bachhuber and Sebastian Schwarz.
Breaking the barriers of immersive content with volumetric video
Virtual, augmented and mixed reality applications are on the rise, and volumetric video is the fundamental technology enabling the exploration of real-world captured immersive content. Learn how to efficiently store and distribute volumetric video, which is encoded with the family of Visual Volumetric Video-based Coding (V3C) standards.
Real-time decoding goes mobile with point cloud compression
From education to entertainment, capturing the real world in multi-dimensional immersive experiences presents a multitude of opportunities – alongside data-heavy complications. The release of the MPEG standard for video-based point cloud compression (V-PCC) for mobile is an immersive media gamechanger. Discover how V-PCC distribution, storage and real-time decoding can now be achieved on every single media device on the market.
Find out more in this article by Sebastian Schwarz and Mika Pesonen
Navigating realities in 3-Dimensions with Point Cloud Compression
Point clouds are integral to immersive digital representations, enabling quick 3D assessments for navigating autonomous vehicles, robotic sensing and other use cases. This level of innovation requires massive amounts of data – and that’s where Point Cloud Compression (PCC) comes in. See how PCC lightens point cloud transmission for current and next-generation networks.
I-Frame splicing: a smarter way to stream
Adaptive streaming allows us to “tag in” higher quality segments when network conditions improve. The same mechanism can be used for swapping in low-quality background segments in 360-degree viewport-dependent streaming. But what about all that wasted bandwidth? I-Frame splicing blends pre-downloaded low quality segments with higher ones, enabling better service experiences without wait or waste. Discover how I-Frame splicing can have an immense impact on bandwidth savings for networks and internet traffic dominated by video.
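The adaptive streaming mechanism mentioned above can be pictured with a minimal rendition-selection sketch. It shows only the generic step of switching up when the network allows, not I-Frame splicing itself. The bitrate ladder, the `pick_rendition` helper and the `safety` margin are hypothetical.

```python
def pick_rendition(bitrates_kbps, throughput_kbps, safety=0.8):
    """Pick the highest bitrate that fits within a safety margin of the
    measured throughput; fall back to the lowest rendition otherwise."""
    budget = throughput_kbps * safety
    affordable = [b for b in sorted(bitrates_kbps) if b <= budget]
    return affordable[-1] if affordable else min(bitrates_kbps)


ladder = [500, 1500, 3000, 6000]           # made-up bitrate ladder in kbit/s
good_network = pick_rendition(ladder, throughput_kbps=4000)
poor_network = pick_rendition(ladder, throughput_kbps=400)
print(good_network, poor_network)
```

In a plain setup like this, a segment that was already downloaded at low quality is fetched again in full when the player switches up, which is the wasted bandwidth the paragraph above refers to.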
Growing OMAF’s vision in its second generation
Omnidirectional Media Format (OMAF) was the first VR standard to store and distribute immersive media. Now its second edition has its sights set on even more, building upon its predecessor’s best features from overlays to multiple viewpoints. Unveil how to leverage these tools for maximum quality of experience in immersive applications.