4.2 End-to-end Architecture

3GPP TS 26.118, Release 17: Virtual Reality (VR) profiles for streaming applications


The architecture introduced in this clause addresses service scenarios for the distribution of VR content in file-based download and DASH-based streaming services.

Figure 4.2-1 shows a functional architecture for such scenarios. VR content is acquired and pre-processed such that all media components are mapped to the 3GPP 3DOF coordinate system and are temporally synchronized. Such pre-processing may include video stitching, rotation or other transformations. The 3GPP VR Headend is responsible for generating content that can be consumed by receivers conforming to the present document.

Typically, 3D audio and spherical video signals are encoded. For video in particular, the processing follows a two-step approach: first mapping, projecting and pre-processing to a 2D texture, then encoding with regular 2D video codecs. After media encoding, the content is made available to the file format encapsulation engine as elementary streams. The encapsulated streams are referred to as 3GPP VR Tracks, i.e. they are mapped to the same timing system for synchronized playback.

For file-based distribution, a complete file for delivery is generated by multiplexing the 3GPP VR Tracks into a single file. For DASH-based delivery, the content is mapped to DASH segments and proper Adaptation Sets are generated, including the necessary MPD signalling. The Adaptation Sets are included in a VR Media Presentation, documented in a DASH MPD. Content may be made available such that it is optimized for a specific viewport, so the same content may be encoded in an ensemble of multiple viewport-optimized versions.
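The two packaging branches described above can be sketched as follows. This is an informal illustration only: the class and function names (VRTrack, AdaptationSet, package_for_file, package_for_dash) are hypothetical, the segment naming is invented, and the one-Adaptation-Set-per-track layout is a simplifying assumption rather than something the present document mandates.

```python
from dataclasses import dataclass, field


@dataclass
class VRTrack:
    """One encapsulated elementary stream (hypothetical model of a 3GPP VR Track)."""
    media: str   # "video" or "audio"
    label: str   # illustrative identifier, not a normative field


@dataclass
class AdaptationSet:
    """Hypothetical stand-in for a DASH Adaptation Set built from one track."""
    track: VRTrack
    segments: list = field(default_factory=list)


def package_for_file(tracks):
    """File-based distribution: multiplex all tracks into one deliverable."""
    return {"kind": "file", "tracks": list(tracks)}


def package_for_dash(tracks, segment_count):
    """DASH-based delivery: map each track to segments inside an Adaptation Set.

    Assumption: one Adaptation Set per track, with invented segment names.
    """
    sets = []
    for t in tracks:
        names = [f"{t.label}-seg{i}" for i in range(1, segment_count + 1)]
        sets.append(AdaptationSet(track=t, segments=names))
    return {"kind": "dash", "adaptation_sets": sets}
```

The same list of tracks feeds both branches, which mirrors the text: encoding and encapsulation are shared, and only the final packaging step differs between download and streaming.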

The content is delivered through file-based delivery or DASH-based delivery, potentially using 3GPP services such as DASH in PSS or DASH-over-MBMS.

At the receiving end, a VR application is assumed to communicate with the different functional blocks in the receiver's 3GPP VR service platform, namely the DASH client or the download client, the file processing units for each media profile, the media decoding units, the rendering environment and the pose generator. The reverse operations of the VR Headend are performed. The operation is expected to be dynamic, especially taking into account updated pose information in the different stages of the receiver. The pose information is essential in the rendering units, but may also be used in the download or DASH client for delivery and decoding optimizations. For more details on the client reference architecture, refer to clause 4.3.
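As one possible illustration of using pose information in the DASH client, the sketch below picks, from an ensemble of viewport-optimized versions, the one whose optimized centre is angularly closest to the current viewing direction. The function names, the (yaw, pitch) representation in degrees and the version labels are all assumptions made for this example; the present document does not prescribe this selection logic.

```python
import math


def direction(yaw_deg, pitch_deg):
    """Unit vector for a viewing direction given yaw and pitch in degrees."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))


def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (yaw, pitch) directions."""
    va, vb = direction(*a), direction(*b)
    dot = sum(x * y for x, y in zip(va, vb))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def select_version(pose, versions):
    """Return the label of the version whose centre is closest to the pose.

    `versions` maps a hypothetical label to its optimized (yaw, pitch) centre.
    """
    return min(versions, key=lambda name: angular_distance_deg(pose, versions[name]))
```

A client could call `select_version((100, 10), {"front": (0, 0), "left": (90, 0)})` each time the pose generator reports a new pose. A practical client would likely add hysteresis and bandwidth checks, which are omitted here.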

Figure 4.2-1: Architecture for VR streaming services

Based on the architecture in Figure 4.2-1, the following components are relevant for 3GPP VR Streaming Services:

– Consistent source formats that can be distributed by a 3GPP VR Headend:

    – For audio that can be used by a 3D audio encoding profile according to the present document.

    – For video that can be used by a spherical video encoding profile according to the present document.

– Mapping formats from a 3-dimensional representation to a 2D representation in order to use regular video encoding engines.

– Encapsulation of the media format tracks together into the ISO file format, adding sufficient information on how to decode and render the VR content. The necessary metadata may be on codec level, file format level, or both.

– Delivery of the formats through file download, DASH delivery and DASH-over-MBMS delivery.

– Static and dynamic capabilities and environmental data, including decoding and rendering capabilities, as well as dynamic pose information.

– Media decoders that support the decoding of the formats delivered to the receiver.

– Information for audio and video rendering to present the VR Presentation on the VR device.
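To make the idea of a mapping format from a 3-dimensional to a 2D representation concrete, the sketch below uses the equirectangular projection, chosen here only as a familiar example of such a mapping. The function names and the sign convention (azimuth in [-180, 180] degrees, elevation in [-90, 90] degrees, direction (0, 0) at the picture centre, positive azimuth towards the left edge, positive elevation towards the top) are assumptions of this example and are not taken from the text above.

```python
def sphere_to_erp(azimuth_deg, elevation_deg, width, height):
    """Map a direction on the sphere to (column, row) in a 2D texture.

    Assumed convention: (0, 0) maps to the picture centre; positive azimuth
    moves left, positive elevation moves up. Results are fractional sample
    positions, not rounded pixel indices.
    """
    u = (0.5 - azimuth_deg / 360.0) * width
    v = (0.5 - elevation_deg / 180.0) * height
    return u, v


def erp_to_sphere(u, v, width, height):
    """Inverse mapping: texture position back to (azimuth, elevation) in degrees."""
    azimuth_deg = (0.5 - u / width) * 360.0
    elevation_deg = (0.5 - v / height) * 180.0
    return azimuth_deg, elevation_deg
```

Once the spherical signal is laid out as a rectangular picture in this way, it can be handed to a regular 2D video encoder. The receiver applies the inverse mapping during rendering.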

Based on the considerations above, to support the use case of VR Streaming, the following functions are defined in the present document:

– Consistent content contribution formats for audio and video for 360/3D AV applications, including their metadata. This aspect should be considered informative, but example formats are provided to explain the workflow.

– Efficient encoding of 360 video content. In the present document, this encoding is split into two steps, namely pre-processing and projection mapping from 360 video to a 2D texture, followed by regular video encoding.

– Efficient encoding of 3D audio including channels, objects and scene-based audio.

– Encapsulation of VR media into a file format for download delivery.

– The relevant enablers for DASH delivery of VR experiences.

– The necessary capabilities for static and dynamic consumption of the encoded and delivered experiences in the Internet media type and the DASH MPD.

– A reference client architecture that provides the signalling and processing steps for download delivery and DASH delivery, as well as the interfaces between the 3GPP VR service platform, a VR application (e.g. pose information), and the VR rendering system (displays, GPU, loudspeakers).

– Decoding requirements for the defined 360 video formats.

– Decoding requirements for the defined 3D audio formats.

– Rendering requirements or recommendations for the above formats, for both separate and integrated decoding/rendering.