B.1 General

3GPP TS 26.118, Release 17: Virtual Reality (VR) profiles for streaming applications


Binaural rendering allows 3D audio content to be played back via headphones. The rendering is performed as a fast convolution of point sound source streams in the 3D space with head-related impulse responses (HRIRs) or binaural room impulse responses (BRIRs) corresponding to the direction of incidence relative to the listener. HRIRs will be provided from an external source.
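As an illustration of the fast convolution described above, the following is a minimal sketch and not the normative renderer. It assumes a single point source signal and one pair of left/right HRIRs that has already been selected for the source direction. The names `fft`, `fast_convolve` and `binauralize` are hypothetical, and the unpartitioned, single-block FFT is a simplification chosen for brevity; a real-time renderer would typically process audio block by block.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fast_convolve(signal, ir):
    """Linear convolution computed as a product in the frequency domain."""
    out_len = len(signal) + len(ir) - 1
    n = 1
    while n < out_len:  # zero-pad to a power of two to avoid circular wrap-around
        n *= 2
    a = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    b = fft([complex(v) for v in ir] + [0j] * (n - len(ir)))
    y = fft([p * q for p, q in zip(a, b)], inverse=True)
    return [v.real / n for v in y[:out_len]]  # scale the unnormalised inverse FFT


def binauralize(source, hrir_left, hrir_right):
    """Return the (left, right) ear signals for one point source (hypothetical helper)."""
    return fast_convolve(source, hrir_left), fast_convolve(source, hrir_right)
```

For several point sources, the per-source ear signals would be summed per ear to form the headphone output.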

Figure B.1-1: High level overview of an external binaural renderer setup.

The renderer has three input interfaces (see Fig. B.1-1): the audio streams and metadata from the MPEG-H decoder, a head tracking interface for scene displacement information (for listener tracking), and a head-related impulse response (HRIR) interface providing binaural impulse responses for a given direction of incidence. The metadata as described in B.3, together with the scene displacement information, is used to construct a scene model, from which the renderer can infer the proper listener-relative point source positions.
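The step from scene position to listener-relative position can be sketched as follows. This is a simplified illustration under stated assumptions, not the scene model of this annex: it considers only head rotation about the vertical axis (yaw), and it assumes that azimuth and yaw are measured in degrees with the same sign convention. The function name `listener_relative_direction` is hypothetical.

```python
def listener_relative_direction(azimuth_deg, elevation_deg, head_yaw_deg):
    """Direction of a point source relative to a listener rotated by head_yaw_deg.

    Assumption: yaw-only head rotation, with azimuth and yaw sharing one
    sign convention. Pitch and roll are ignored in this sketch.
    """
    # Turning the head by +yaw moves the source by -yaw relative to the listener.
    relative_azimuth = (azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    return relative_azimuth, elevation_deg
```

The resulting direction is what would be passed to the HRIR interface to obtain the impulse responses for that direction of incidence. The wrap-around keeps the azimuth in the range [-180, 180).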

The audio input streams may include Channel content, Object content, and HOA content. The renderer performs pre-processing steps to translate each content type into a set of point sources, which are then processed for binaural rendering. Channel groups and objects that are marked as non-diegetic in the metadata are excluded from any scene displacement processing.
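The exclusion of non-diegetic content can be sketched as a simple gate in front of the scene displacement step. The `PointSource` type, its `diegetic` field and the function `apply_scene_displacement` are hypothetical names introduced for illustration; they do not reflect the metadata syntax described in B.3. As a further simplification, only yaw is compensated.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PointSource:
    """Hypothetical point source produced by the pre-processing step."""
    name: str
    azimuth_deg: float
    elevation_deg: float
    diegetic: bool = True  # assumed flag derived from the metadata


def apply_scene_displacement(sources, head_yaw_deg):
    """Compensate head yaw for diegetic sources; pass non-diegetic ones through."""
    displaced = []
    for src in sources:
        if src.diegetic:
            az = (src.azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
            src = replace(src, azimuth_deg=az)
        # Non-diegetic sources keep their position relative to the head.
        displaced.append(src)
    return displaced
```

With this gate, a non-diegetic element such as a narrator stays fixed relative to the listener's head, while diegetic sources remain anchored in the scene as the head turns.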