The visual and sound sensations that they create in the audience must closely match what would be perceived in the real context they intend to recreate.

High realism audiovisual environments

To achieve total immersion, the audiovisual interface must present both image and sound in the same way the user would perceive them in a real situation. For both stimuli, the human perceptual system obtains a three-dimensional sensation of space from two sensors: two eyes and two ears. It should therefore be possible to recreate the relevant sensations if a realistic perception of the image in space is provided together with a sensation of spatial sound.

Regarding the image, two kinds of technology are usually used to produce stereoscopic images: systems in which the user wears special glasses (polarized, shutter, or anaglyph), and autostereoscopic displays, which provide three-dimensional perception without special glasses or any other device.
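
As a rough illustration of the anaglyph approach mentioned above, the following sketch combines a left and a right view into one red-cyan image by taking the red channel from the left view and the green and blue channels from the right view. It assumes images are plain nested lists of RGB tuples; the name `make_anaglyph` is hypothetical and chosen only for this example, and a real system would also need calibrated views and color correction.

```python
# A pixel is (red, green, blue), each component in 0-255.
Pixel = tuple[int, int, int]


def make_anaglyph(left: list[list[Pixel]], right: list[list[Pixel]]) -> list[list[Pixel]]:
    """Build a red-cyan anaglyph from two views of the same size.

    The red channel comes from the left view; green and blue come
    from the right view, so red-cyan glasses route one view to each eye.
    """
    same_rows = len(left) == len(right)
    if not same_rows or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("both views must have the same dimensions")
    return [
        [(l[0], r[1], r[2]) for l, r in zip(row_left, row_right)]
        for row_left, row_right in zip(left, right)
    ]
```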

Regarding sound, the simplest and most widespread method for providing spatial audio is stereo, which has been used for the last 50 years as an added value of sound recordings, especially in music.
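
To make the idea of stereo positioning concrete, here is a minimal sketch of constant-power amplitude panning, one common way to place a mono source between two speakers. The function name and the pan range of -1 to +1 are assumptions made for this example, not something specified in the text.

```python
import math


def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Split a mono sample into (left, right) speaker signals.

    pan ranges from -1.0 (fully left) to +1.0 (fully right).
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie in [-1, 1]")
    # Map pan to an angle in [0, pi/2]. Using cos/sin as gains keeps
    # left**2 + right**2 constant, so perceived loudness stays roughly
    # the same as the source moves across the stereo image.
    angle = (pan + 1.0) * math.pi / 4.0
    return sample * math.cos(angle), sample * math.sin(angle)
```

With `pan = 0.0` both channels receive the same signal, which the listener perceives as a phantom source midway between the speakers.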

Since the mid-1970s, surround sound systems, which try to provide a better spatial impression than stereo by using more reproduction channels, have been used first in cinemas and, in recent years, at home.

However, these systems are only intended to heighten the sense of spectacle in film screenings by artificially adding, during production, special effects, explosions, reverberation in the rear speakers, ambience, and so on. They do not provide a real sense of 3D sound. Also, the useful listening area (sweet spot) is practically restricted to the center of the speaker circle, and perception degrades away from it. For realistic multimedia immersion in video conferencing, these systems are unsuitable: their purpose is to reproduce movie effects, and the rear speakers add nothing significant to a meeting.

Another, much more realistic strategy is to reproduce directly at the listener’s ears the signal that he or she would hear in the acoustic space being simulated. The sensation obtained by the listener depends on the fidelity of this reproduction. This strategy is commonly called binaural reproduction and can be implemented with either headphones or speakers. The 3D sound signal can also be synthesized if the listener’s head-related transfer function (HRTF) is known.
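
The synthesis step can be sketched as follows: a mono signal is convolved with a pair of head-related impulse responses (HRIRs, the time-domain counterpart of the HRTF), one per ear. This is only a minimal sketch. The HRIR values below are hypothetical placeholders, not measured data, and the function names are chosen for this example.

```python
def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Direct linear convolution; output length is len(signal) + len(ir) - 1."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binaural_synthesis(
    mono: list[float], hrir_left: list[float], hrir_right: list[float]
) -> tuple[list[float], list[float]]:
    """Filter one mono source with the HRIR of each ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical placeholder HRIRs for a source on the listener's left:
# the right ear receives the sound two samples later and attenuated.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.6]

left_ear, right_ear = binaural_synthesis([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
```

In practice the HRIRs would come from measurements of the listener, or of a dummy head, for each source direction, and the convolution would normally be done with an FFT-based method for efficiency.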

Because this approach is sensitive to deviations of the listener from the optimal reproduction position, in practice it is only valid for a single listener in a highly controlled listening environment, e.g., a user in front of a computer screen.

As an alternative to surround sound systems, there are more advanced systems such as Ambisonics or Virtual Surround Panning that are suitable for more or less restricted listening areas, although always somewhat larger than those of binaural systems with a crosstalk canceller.

The solution to extending the listening area in these systems involves increasing the number of speakers used, with the added complexity and difficulty that entails, as well as making the transmission formats more flexible.
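
As an illustration of how Ambisonics represents a sound field independently of the speaker layout, the sketch below encodes a mono sample into the four first-order B-format channels (W, X, Y, Z). It assumes the traditional convention in which W is scaled by 1/√2, azimuth is measured counterclockwise from the front, and elevation is measured upward from the horizontal plane. The function name is hypothetical, and decoding to a concrete speaker array is a separate step not shown here.

```python
import math


def encode_b_format(
    sample: float, azimuth_deg: float, elevation_deg: float = 0.0
) -> tuple[float, float, float, float]:
    """Encode a mono sample as first-order Ambisonics (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back axis
    y = sample * math.sin(az) * math.cos(el)  # left-right axis
    z = sample * math.sin(el)                 # up-down axis
    return w, x, y, z
```

Because the direction is carried by the channel weights rather than by fixed speaker feeds, the same four signals can be decoded for different numbers of speakers, which relates to the flexibility of transmission formats mentioned above.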