8 Split Rendering User Plane

3GPP TS 26.565 (Release 18): Split Rendering Media Service Enabler


MSE-4

8.1 Split Rendering Signalling Protocols

8.2 Split Rendering Formats

8.2.1 General

8.2.2 Pixel Streaming Profile

8.2.2.1 Overview

The pixel streaming (full-prerendering) profile supports 2D content only. The capabilities of the receiving UE are shared with the split rendering server before the split rendering session starts. These capabilities and configurations indicate the audio-visual output setup of the UE. For example, they may indicate that the output device is an HMD that supports two views and stereo audio.
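The capability exchange can be pictured as a small descriptor that the UE serializes and sends before the session starts. The sketch below is illustrative only: the class name `UeOutputCapabilities`, its fields, the string values, and the JSON encoding are all assumptions, not a message format defined by this profile.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UeOutputCapabilities:
    """Hypothetical description of the UE's audio-visual output setup."""
    device_type: str   # illustrative values, e.g. "hmd" or "flat-display"
    view_count: int    # 1 = mono, 2 = stereo (one view per eye)
    audio_output: str  # illustrative values, e.g. "stereo" or "hoa"

    def __post_init__(self) -> None:
        # Only the mono and stereo view configurations are listed for this profile.
        if self.view_count not in (1, 2):
            raise ValueError("view_count must be 1 (mono) or 2 (stereo)")


def capability_message(caps: UeOutputCapabilities) -> str:
    """Serialize the capabilities so they can be sent before the session starts."""
    return json.dumps(asdict(caps), sort_keys=True)


# The example from the text: an HMD with two views and stereo audio.
hmd = UeOutputCapabilities(device_type="hmd", view_count=2, audio_output="stereo")
message = capability_message(hmd)
```

Validating the view count on construction keeps an unsupported configuration from ever reaching the server.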

8.2.2.2 Downlink Formats

The visual formats supported in this profile match the options that are supported by OpenXR. The supported view configurations are:

Mono: a single view
Stereo: one view per eye

The following composition layers are supported:

Projection: projection of the scene to a 2D plane using a perspective camera
Quad: a 2D surface that is composed into the 3D space by the XR runtime
Equirectangular: an equirectangular projection of the 3D space that is usually used to provide a background
Cubemap: a set of 6 swapchain images that represent a projection of the 3D scene onto a cube
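Because the view configurations and layer types mirror OpenXR, a server can map them directly onto the corresponding OpenXR names and estimate how many images it must render per output frame. This is a minimal sketch: the enum values are the OpenXR type and structure names, but the counting rule (one image per view for a projection layer, six face images for a cubemap, one shared image for quad and equirectangular layers) is a simplifying assumption, and `images_per_frame` is a hypothetical helper.

```python
from enum import Enum
from typing import List


class ViewConfiguration(Enum):
    # Values mirror OpenXR's XrViewConfigurationType names.
    MONO = "XR_VIEW_CONFIGURATION_TYPE_PRIMARY_MONO"
    STEREO = "XR_VIEW_CONFIGURATION_TYPE_PRIMARY_STEREO"

    @property
    def view_count(self) -> int:
        return 1 if self is ViewConfiguration.MONO else 2


class LayerType(Enum):
    # Values mirror OpenXR's composition layer structure names.
    PROJECTION = "XrCompositionLayerProjection"
    QUAD = "XrCompositionLayerQuad"
    EQUIRECT = "XrCompositionLayerEquirectKHR"
    CUBEMAP = "XrCompositionLayerCubeKHR"


def images_per_frame(config: ViewConfiguration, layers: List[LayerType]) -> int:
    """Count the swapchain images rendered for one output frame.

    Assumption: projection layers need one image per view, cubemaps need six
    face images, and quad/equirect layers use a single image shared by all views.
    """
    total = 0
    for layer in layers:
        if layer is LayerType.PROJECTION:
            total += config.view_count
        elif layer is LayerType.CUBEMAP:
            total += 6
        else:
            total += 1
    return total
```

For example, a stereo projection layer over an equirectangular background gives `images_per_frame(ViewConfiguration.STEREO, [LayerType.PROJECTION, LayerType.EQUIRECT])`, i.e. three images.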

Each swapchain image has the following properties:

Format: RGB, RGB with alpha (RGBA), and single-channel depth formats with different precisions. RGB may be recovered from the coded YUV video stream. Depth information may be coded as a separate video stream.
Dimension: width and height of the swapchain image
Mipmap: the number of mipmap levels (levels of detail) of the swapchain image. The swapchain images may be created at the UE side. Some graphics engines expect the image dimensions to be a power of 2.
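The properties above can be captured in a small descriptor that also checks the mipmap count against the longest possible mip chain. In that chain each level halves both dimensions (never below 1) until 1x1, so it has floor(log2(max(width, height))) + 1 levels. A minimal sketch; the class name, the `PixelFormat` member names, and the power-of-two helper are hypothetical illustrations, not formats named by this profile.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class PixelFormat(Enum):
    # Hypothetical names for the RGB, RGBA, and depth formats of different precisions.
    RGB8 = "rgb8"
    RGBA8 = "rgba8"
    DEPTH16 = "depth16"
    DEPTH32F = "depth32f"


def is_power_of_two(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def max_mip_levels(width: int, height: int) -> int:
    # floor(log2(max(w, h))) + 1, computed exactly with integer bit length.
    return max(width, height).bit_length()


@dataclass(frozen=True)
class SwapchainImageDesc:
    fmt: PixelFormat
    width: int
    height: int
    mip_count: int = 1

    def __post_init__(self) -> None:
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        if not 1 <= self.mip_count <= max_mip_levels(self.width, self.height):
            raise ValueError("mip_count exceeds the full mip chain length")

    def mip_dimensions(self) -> List[Tuple[int, int]]:
        # Each level halves both dimensions, clamped to a minimum of 1.
        return [(max(1, self.width >> i), max(1, self.height >> i))
                for i in range(self.mip_count)]

    @property
    def power_of_two(self) -> bool:
        # Some graphics engines expect power-of-two dimensions.
        return is_power_of_two(self.width) and is_power_of_two(self.height)
```

A 1024x512 RGBA image can carry up to 11 levels, ending at 1x1, while a 1920x1080 image is valid but not power-of-two.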

For audio, the following formats are to be supported:

Stereo audio mixed and binauralized based on the viewer’s current pose
HOA (Higher-Order Ambisonics) audio mixed based on the viewer's current position, as extracted from the pose
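The two audio formats consume different parts of the pose: binauralized stereo uses the full pose, while HOA mixing uses only the extracted position. The sketch below makes that selection explicit and also computes the HOA channel count, which for ambisonic order N is (N + 1)^2. The format strings, `ListenerPose`, and both helpers are hypothetical; the quaternion component order (x, y, z, w) follows OpenXR's `XrQuaternionf`.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ListenerPose:
    position: Tuple[float, float, float]            # 3D coordinates
    orientation: Tuple[float, float, float, float]  # quaternion (x, y, z, w)


def audio_render_inputs(audio_format: str, pose: ListenerPose) -> Dict[str, tuple]:
    """Select the part of the pose that drives server-side audio mixing."""
    if audio_format == "stereo-binaural":
        # Binauralization depends on both listener position and head orientation.
        return {"position": pose.position, "orientation": pose.orientation}
    if audio_format == "hoa":
        # HOA mixing uses only the position extracted from the pose.
        return {"position": pose.position}
    raise ValueError(f"unsupported audio format: {audio_format}")


def hoa_channel_count(order: int) -> int:
    # An ambisonic signal of order N carries (N + 1)^2 channels.
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2
```

For instance, third-order HOA carries `hoa_channel_count(3) == 16` channels.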

8.2.2.3 Uplink Formats

The rendering process relies on the reception of pose predictions and user input. The pose information is formatted as follows:

An array of multiple pose predictions
Each pose prediction consists of a position and an orientation component, expressed as a 3D vector (coordinates) and a 4D vector (quaternion), respectively.
The prediction timestamp associated with the predicted pose
An XR space for which the pose is created. If not present, this defaults to the viewer’s XR space.
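The pose payload can be modeled as a list of predictions, each carrying a position, a unit quaternion, a timestamp, and an optional XR space that falls back to the viewer's space when absent. This is a sketch under stated assumptions: the JSON field names, the `"viewer"` space identifier, and the nanosecond timestamp (in the style of OpenXR's `XrTime`) are hypothetical choices, not the wire format of this specification.

```python
import json
import math
from dataclasses import dataclass
from typing import List, Tuple

VIEWER_SPACE = "viewer"  # hypothetical identifier for the viewer's XR space


@dataclass(frozen=True)
class PosePrediction:
    position: Tuple[float, float, float]            # 3D coordinates
    orientation: Tuple[float, float, float, float]  # unit quaternion (x, y, z, w)
    timestamp: int                                  # predicted time, assumed in ns
    xr_space: str = VIEWER_SPACE                    # defaults to the viewer's space

    def __post_init__(self) -> None:
        if len(self.position) != 3 or len(self.orientation) != 4:
            raise ValueError("position needs 3 components, orientation needs 4")
        norm = math.sqrt(sum(c * c for c in self.orientation))
        if not math.isclose(norm, 1.0, rel_tol=1e-3):
            raise ValueError("orientation must be a unit quaternion")


def encode_pose_predictions(preds: List[PosePrediction]) -> str:
    """Serialize an array of pose predictions (hypothetical JSON layout)."""
    return json.dumps([{"position": list(p.position),
                        "orientation": list(p.orientation),
                        "time": p.timestamp,
                        "space": p.xr_space} for p in preds])


def decode_pose_predictions(payload: str) -> List[PosePrediction]:
    """Parse predictions; a missing "space" falls back to the viewer's XR space."""
    return [PosePrediction(position=tuple(e["position"]),
                           orientation=tuple(e["orientation"]),
                           timestamp=e["time"],
                           xr_space=e.get("space", VIEWER_SPACE))
            for e in json.loads(payload)]
```

Rejecting non-unit quaternions at parse time lets the renderer assume a valid rotation without renormalizing every prediction.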

User input may be described following OpenXR's convention for naming actions.
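In OpenXR, actions are bound to input source paths such as `/user/hand/left/input/select/click`. A UE could reuse such paths to identify user input sent uplink. The check below is a simplified subset of OpenXR's well-formed path rules: only lowercase letters, digits, `-`, `_`, `.`, and `/`; a leading `/`; no trailing `/`; no empty components; and no component made only of periods. The event dictionary and its field names are hypothetical.

```python
import re
from typing import Dict, Union

# Simplified subset of OpenXR's allowed path characters.
_ALLOWED_PATH = re.compile(r"[a-z0-9_./-]+")


def is_well_formed_path(path: str) -> bool:
    """Approximate OpenXR's well-formed path rules (not the full specification)."""
    if not path.startswith("/") or path.endswith("/"):
        return False
    if not _ALLOWED_PATH.fullmatch(path):
        return False
    components = path[1:].split("/")
    # Reject empty components ("//") and components consisting only of periods.
    return all(c and c.strip(".") for c in components)


def make_input_event(path: str, value: float, time_ns: int) -> Dict[str, Union[str, float, int]]:
    """Build a hypothetical uplink input event keyed by an OpenXR-style path."""
    if not is_well_formed_path(path):
        raise ValueError(f"malformed input path: {path}")
    return {"input": path, "value": value, "time": time_ns}
```

For example, `make_input_event("/user/hand/right/input/select/click", 1.0, 123)` is accepted, while a path with uppercase letters or `..` is rejected.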