A sense of presence requires depth, motion, timing, and sound working in harmony; only then can we feel fully immersed, says Nokia’s Saverio Blasi

For decades, progress in media has meant higher resolution and faster networks. Recent multimedia advancements are delivering breathtaking visual detail and next-level audio. These achievements have transformed how content is consumed and enjoyed across platforms.
But despite all our progress – more pixels, cleaner data, and sharper definition – remote connections still don’t feel truly real. We may be able to see and hear content or call participants more clearly than before, but rarely do we experience presence. Our digital interactions feel strangely thin.
Take high definition, for example. It addressed technical problems like low resolution, pixelation, blur and poor motion handling, but it didn’t make digital experiences feel more human.
We have made huge strides in pushing sharper images through faster pipes, but along the way we treated sight and sound as separate challenges, optimising each in isolation. The result may look impressive but, in reality, feels slightly flat.
Just like in the real world, presence requires depth, motion, timing, and sound working in harmony. When this happens, our imagination no longer needs to fill in the gaps, and we feel fully immersed.
Anyone who has spent hours in video calls understands this intuitively. Faces are crisp, voices are clear, and yet something essential is often missing. Movement lacks weight. Micro-delays disrupt conversation. Audio floats without a sense of space. Small mismatches add up, and the cost is emotional distance. We are connected, but not together.
For broadcasters, this gap is most evident in live content. Sports, news and events can be delivered in great visual quality, yet still feel emotionally distant. The camera is closer, the sound is cleaner, and the delay is shorter, but the sense of being there often remains elusive.
These are not shortfalls of ambition, but of configuration. Presence is qualitative, whereas the things we usually optimise, such as resolution, bitrate and delay, can be measured in numbers. Presence depends on how multiple signals combine in real time, and whether they behave as they do in the physical world.
Can streaming become something we inhabit? The short answer is yes. Recent breakthroughs mean that the tools, and the coordination across networks, codecs and devices, are within reach. But presence will have to be treated as a system-level challenge rather than a visual upgrade.
Full-fidelity streaming is not about overwhelming the senses, but about restoring the subtle cues that make interaction feel natural. For instance, spatial audio anchors voices and sounds in three dimensions, so you experience them around you as you would in real life rather than from a single direction.
Photorealistic 3D and volumetric video introduce depth and perspective, allowing viewers to feel present inside a scene and move around it instead of just looking at it. Finally, adaptive AI-based video compression keeps these experiences responsive and efficient, preserving realism without demanding impossible amounts of data. What unites all of these is that they treat media not as a window, but as a shared space.
It’s critical that these elements operate in unison. We cannot feel presence if there is a lag between image and sound, or between motion and response. This is why advances in video compression, AI, edge processing and emerging network technologies matter so much. These are what make it possible to coordinate complexity, to adapt streams dynamically, and to prioritise what the human senses actually notice.
As networks evolve towards 6G, the emphasis moves from raw speed to responsiveness: from delivering more data to delivering the right data at the right moment.
What emerges from all of these advancements is a new phase of digital media. This shift will not arrive overnight and must not be oversold. But it does mark a change of direction. Rather than just pushing for more pixels, more data and more definition, we can turn towards coherence, where sound, motion and interaction move together through space and time.
If the last 20 years were about making media clearer, the next may be about making it truer: not simply by adding more pixels, but by understanding presence itself and engineering for the way humans actually experience being together.

Saverio Blasi is principal researcher at Nokia