4.1.3 Video Signal Representation

3GPP TS 26.118 Release 17: Virtual Reality (VR) profiles for streaming applications

Commonly used video encoders cannot encode spherical video directly; they can only encode 2D textures. At the same time, there is significant benefit in reusing conventional 2D video encoders. For this reason, Figure 4.1-4 shows the basic video signal representation for omnidirectional video as used in the present document. In pre-processing, the spherical video is mapped to a 2D texture. The 2D texture is encoded with a regular 2D video encoder, and the VR rendering metadata (i.e. the data describing the mapping from spherical coordinates to the 2D texture) is encoded and provided along with the video bitstream, so that the receiving end can apply the inverse process to reconstruct the spherical video.
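To make this chain concrete, the following minimal Python sketch models it with ERP as the example mapping. All names (`RenderingMetadata`, `preprocess_erp`, `encode_decode_2d`, `render`) are hypothetical: the metadata here carries only the projection type and picture size, the 2D codec is replaced by a lossless copy, and the receiver uses a simple sample lookup. It does not reflect the normative metadata syntax.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Texture = List[List[float]]


@dataclass
class RenderingMetadata:
    """Hypothetical stand-in for the VR rendering metadata."""
    projection: str  # mapping type, e.g. "ERP"
    width: int       # picture width in luma samples
    height: int      # picture height in luma samples


def preprocess_erp(sphere: Callable[[float, float], float],
                   width: int, height: int) -> Tuple[Texture, RenderingMetadata]:
    """Pre-processing: sample sphere(phi, theta) at every ERP sample centre."""
    texture = [[sphere((0.5 - (x + 0.5) / width) * 360.0,
                       (0.5 - (y + 0.5) / height) * 180.0)
                for x in range(width)]
               for y in range(height)]
    return texture, RenderingMetadata("ERP", width, height)


def encode_decode_2d(texture: Texture) -> Texture:
    """Placeholder for a regular 2D encoder/decoder pair (lossless copy here)."""
    return [row[:] for row in texture]


def render(texture: Texture, meta: RenderingMetadata,
           phi: float, theta: float) -> float:
    """Receiver: use the metadata to invert the mapping for one viewing direction."""
    if meta.projection != "ERP":
        raise ValueError("only ERP is handled in this sketch")
    # Invert the ERP mapping to a continuous sample position, then take the
    # sample whose area contains it (clamped to the picture borders).
    x = min(max(int((0.5 - phi / 360.0) * meta.width), 0), meta.width - 1)
    y = min(max(int((0.5 - theta / 180.0) * meta.height), 0), meta.height - 1)
    return texture[y][x]
```

For example, sampling `lambda phi, theta: theta` into an 8 x 4 texture and rendering the direction (0°, 67.5°) returns 67.5, the elevation stored in the top row. The point of the sketch is that the receiver can only invert the mapping because the metadata tells it which projection and picture size were used.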

Figure 4.1-4: Video Signal Representation

Mapping of a spherical picture to a 2D texture signal is illustrated in Figure 4.1-5. The most commonly used spherical-to-2D mapping is the equirectangular projection (ERP). The mapping is bijective, i.e. it can be applied in both directions, so the spherical picture can be recovered from the 2D texture.

Figure 4.1-5: Examples of Spherical to 2D mappings

Following the definitions in clause 4.1.2, the colour samples of 2D texture images are mapped onto a spherical coordinate space in angular coordinates (φ, θ) for use in omnidirectional video applications, in which the viewing perspective is from the origin looking outward toward the inside of the sphere. The spherical coordinates are defined so that φ is the azimuth and θ is the elevation.
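Each pair (φ, θ) corresponds to a viewing direction from the origin. The sketch below converts angles to a unit direction vector under an assumed OMAF-style axis convention (X pointing forward at φ = θ = 0, Y to the left, Z up); the normative definition is the one in clause 4.1.2, and the function name `sphere_to_direction` is hypothetical.

```python
import math


def sphere_to_direction(phi_deg: float, theta_deg: float) -> tuple:
    """Unit viewing direction from the origin for azimuth phi, elevation theta (degrees).

    Assumed axes: X forward (phi = 0, theta = 0), Y to the left (phi = 90),
    Z up (theta = 90).
    """
    phi = math.radians(phi_deg)
    theta = math.radians(theta_deg)
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))
```

With this convention, increasing φ turns the view to the left and increasing θ tilts it upward; every returned vector has unit length.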

Let pictureWidth and pictureHeight be the width and height, respectively, of a monoscopic projected luma picture, in luma samples, and let (i, j) be the center point of a sample location along the horizontal and vertical axes, respectively. Then, for the equirectangular projection, the sphere coordinates (φ, θ) of the luma sample location, in degrees, are given by the following equations:

φ = ( 0.5 − i ÷ pictureWidth ) * 360
θ = ( 0.5 − j ÷ pictureHeight ) * 180
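The two equations translate directly into code. In this sketch the function names are hypothetical, and evaluating a sample at its centre, i = x + 0.5 for an integer column x (and likewise j = y + 0.5), is an assumption of the example. The inverse solves each equation for i and j, which illustrates that the ERP mapping is bijective.

```python
def erp_sample_to_sphere(i: float, j: float,
                         picture_width: int, picture_height: int) -> tuple:
    """Sample centre (i, j) in luma samples -> (phi, theta) in degrees."""
    phi = (0.5 - i / picture_width) * 360.0
    theta = (0.5 - j / picture_height) * 180.0
    return phi, theta


def erp_sphere_to_sample(phi: float, theta: float,
                         picture_width: int, picture_height: int) -> tuple:
    """Inverse mapping: (phi, theta) in degrees -> continuous position (i, j)."""
    i = (0.5 - phi / 360.0) * picture_width
    j = (0.5 - theta / 180.0) * picture_height
    return i, j
```

For a 4096 x 2048 picture, the centre of the top-left sample, (0.5, 0.5), maps to φ = 179.9560546875° and θ = 89.9560546875°, just inside the corner at φ = 180°, θ = 90°; the picture centre (2048, 1024) maps to (0°, 0°).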

Whereas ERP is commonly used for production formats, other mappings may be applied, especially for distribution. The present document also introduces cubemap projection (CMP) for distribution in clause 5. In addition to the projection itself, other pre-processing may be applied to the spherical video when it is mapped onto 2D textures, for example region-wise packing, stereo frame packing or rotation. The present document defines several such pre- and post-processing schemes as part of its video rendering schemes.