5 Video

3GPP TS 26.118, Release 17: Virtual Reality (VR) profiles for streaming applications

5.1 Video Operation Points

5.1.1 Definition of Operation Point

To define the interfaces to a conforming video decoder, video operation points are specified. In this context, the following definitions hold:

– Operation Point: A collection of discrete combinations of different content formats including spatial and temporal resolutions, colour mapping, transfer functions, VR specific rendering metadata, etc. and the encoding format.

– Receiver: A receiver that can decode and render any bitstream that is conforming to a certain Operation Point.

– Bitstream: A video bitstream that conforms to a video encoding format and certain Operation Point including VR rendering metadata.

Figure 5.1-1: Video Operation Points

This clause focuses on the interoperability point to a media decoder as indicated in Figure 5.1-1. It does not deal with the access engine and file parser, which address how the video bitstream is delivered.

In all video operation points, the VR Presentation can be rendered using a single media decoder which provides decoded signals and rendering metadata by decoding relevant SEI messages.

5.1.2 Parameters of Visual Operation Point

This clause defines the potential parameters of Visual Operation Points. This includes the video decoder profile and levels with additional restrictions, conventional video signal parameters and VR rendering metadata. The requirements are defined from the perspective of the video decoder and renderer.

Parameters for a Visual Operation Point include:

– Codec, Profile and level requirements

– Restrictions of regular video parameters, typically expressed in the Video Usability information

– Usage and restrictions of VR rendering metadata

5.1.3 Operation Point Summary

The present document defines several operation points for different target applications and scenarios. In particular, two legacy operation points are defined that use existing video codecs H.264/AVC and H.265/HEVC to enable distribution of up to 4K full 360 mono video signals up to 60 Hz by using simple equirectangular projection.

In addition, one operation point for each codec is defined that enables enhanced features, in particular stereo video, up to 8K mono, higher frame rates and HDR.

Furthermore, one additional operation point is defined that uses H.265/HEVC to enable distribution of up to 8K full 360 mono video signals up to 60 Hz and with HDR using equirectangular projection.

Table 5.1-1 summarizes the Operation Points; the detailed definitions are provided in the remainder of clause 5.1. In the table, 3k refers to 2880 × 1440 pixels, 4k to 4096 × 2048 pixels, 6k to 6144 × 3072 pixels and 8k to 8192 × 4096 pixels (expressed in luminance pixel width × luminance pixel height).

Note: The Table only provides an informative high-level summary and is not considered to be complete. The specification text in the remainder of clause 5.1 refines the table and takes precedence over any information documented in the table.

Restrictions on source formats such as resolution and frame rates, content generation and encoding guidelines are provided in Annex A.

Table 5.1-1: High-level Summary of Operation Points

| Operation Point name | Decoder | Bit depth | Typical Original Spatial Resolution | Frame Rate | Colour space format | Transfer Characteristics | Projection | Rotation | RWP | Stereo |
|---|---|---|---|---|---|---|---|---|---|---|
| Basic H.264/AVC | H.264/AVC HP@L5.1 | 8 | Up to 4k | Up to 60 Hz | BT.709 | BT.709 | ERP w/o padding | No | No | No |
| Main H.265/HEVC | H.265/HEVC MP10@L5.1 | 8, 10 | Up to 6k in mono and 3k in stereo | Up to 60 Hz | BT.709, BT.2020 | BT.709 | ERP w/o padding | No | Yes | Yes |
| Flexible H.265/HEVC | H.265/HEVC MP10@L5.1 | 8, 10 | Up to 8k in mono and 3k in stereo | Up to 120 Hz | BT.709, BT.2020 | BT.709, BT.2100 PQ, BT.2100 HLG | ERP w/o padding, CMP | No | Yes | Yes |
| Main 8K H.265/HEVC | H.265/HEVC MP10@L6.1 | 10 | Up to 8k in mono and 6k in stereo | Up to 60 Hz for 8K and 120 Hz for 4k | BT.709, BT.2020 | BT.709, BT.2100 PQ, BT.2100 HLG | ERP w/o padding | No | Yes, but restricted to coverage | Yes |

VR Rendering metadata in the Operation Points is carried in SEI messages. Receivers are expected to be able to process the VR metadata carried in SEI messages. However, the same VR metadata may be duplicated on system-level. In this case, the Receiver may rely on the system level processing to extract the relevant VR Rendering metadata rather than extracting this from the SEI message.

5.1.4 Basic H.264/AVC

5.1.4.1 General

This operation point targets simple deployments and legacy receivers at basic quality. A full 360-degree video signal with equirectangular projection following the 3GPP reference system may be provided to the decoding and rendering system for immediate decoding and rendering. Note that this operation point enables the distribution of 4k video at regular frame rates and 3k video at higher frame rates.

Restricted coverage is supported as well, but only in a basic and backward-compatible fashion.

A Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation point shall conform to the requirements in the remainder of clause 5.1.4.

A receiver conforming to the 3GPP VR Basic H.264/AVC Operation point shall support decoding and rendering a Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation point. Detailed receiver requirements are provided in the remainder of clause 5.1.4.

5.1.4.2 Profile and level

A Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation point shall conform to H.264/AVC Progressive High Profile Level 5.1 [5] for H.264/AVC with the following additional restrictions and requirements:

– the maximum VCL Bit Rate is constrained to be 120 Mbps with cpbBrVclFactor and cpbBrNalFactor being fixed to be 1250 and 1500, respectively.

– the bitstream does not contain more than 10 slices per picture.

Note: High Profile for H.264/AVC excludes Flexible macro-block order, Arbitrary slice ordering, Redundant slices, Data partition.

Hence, for a Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation point, the following applies:

– The profile_idc shall be set to 100 indicating the High profile.

– The constraint_set0_flag, constraint_set1_flag, constraint_set2_flag and constraint_set3_flag shall all be set to 0.

– The value of level_idc shall not be greater than 51 (corresponding to the level 5.1) and should indicate the lowest level to which the Bitstream conforms.
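The constraints above can be checked mechanically once the SPS has been parsed. The following is a minimal sketch, assuming the SPS fields are already available as a plain dictionary keyed by the H.264/AVC [5] syntax element names. The function name and the dictionary layout are illustrative assumptions and not part of the present document.

```python
def check_basic_avc_profile_level(sps):
    """Return a list of violations of the profile/level rules in clause 5.1.4.2.

    `sps` is a hypothetical dict of already-parsed SPS syntax elements.
    """
    problems = []
    if sps.get("profile_idc") != 100:
        problems.append("profile_idc shall be 100 (High profile)")
    for i in range(4):
        name = f"constraint_set{i}_flag"
        if sps.get(name) != 0:
            problems.append(f"{name} shall be 0")
    # A missing level_idc is treated as non-conforming (255 is out of range).
    if sps.get("level_idc", 255) > 51:
        problems.append("level_idc shall not be greater than 51 (Level 5.1)")
    return problems


example_sps = {
    "profile_idc": 100,
    "constraint_set0_flag": 0,
    "constraint_set1_flag": 0,
    "constraint_set2_flag": 0,
    "constraint_set3_flag": 0,
    "level_idc": 51,
}
print(check_basic_avc_profile_level(example_sps))
```

An empty list means that none of the three rules is violated; the sketch does not cover the bit rate and slice count restrictions.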

5.1.4.3 Aspect Ratios and Spatial resolutions

Picture aspect ratio 2:1 should be used for the encoded picture.

The spatial resolution of the original format in equirectangular projection (ERP) should be one of the following (expressed in luminance pixel width × luminance pixel height):

– 4096 × 2048, 3840 × 1920, 3072 × 1536, 2880 × 1440, 2048 × 1024.

The spatial resolution of the distribution format should be one of the following (expressed in luminance pixel width × luminance pixel height):

– 3840 × 1920, 2880 × 1440, 1920 × 960, 1440 × 720, 960 × 480.

– 4096 × 2048, 3072 × 1536, 2048 × 1024, 1536 × 768, 1024 × 512.

NOTE: Distribution formats do not exceed the native resolution of the Operation Point, but they may be subsampled in order to optimize distribution or adapt to the viewing conditions.

A Receiver conforming to the 3GPP VR Basic H.264/AVC Operation Point shall be capable of decoding and rendering Bitstreams that contain spatial resolutions as above.

5.1.4.4 Colour information

A Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation Point shall use Recommendation ITU-R BT.709 [3] colorimetry. Hence, in the VUI, the colour parameter information shall be present, i.e.

– video_signal_type_present_flag value and colour_description_present_flag value shall be set to 1.

– The colour_primaries value, the transfer_characteristics value and the matrix_coefficients value in the Video Usability Information shall all be set to 1.

A Receiver conforming to the 3GPP VR Basic H.264/AVC Operation Point shall be capable of decoding and rendering Bitstreams that use Recommendation ITU-R BT.709 [3] colorimetry according to the bitstream requirements documented above.

5.1.4.5 Frame rates

A Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation Point shall have one of the following frame rates: 24; 25; 30; 24/1.001; 30/1.001; 50; 60; 60/1.001 Hz.

The profile and level constraints of H.264/AVC Progressive High Profile Level 5.1 require careful balance of the permitted frame rates and spatial resolutions. Table 5.1-2 provides the permitted combinations of spatial resolutions and frame rates.

Table 5.1-2: Permitted combinations of spatial resolutions and frame rates

| Spatial Resolution | Permitted Frame Rates |
|---|---|
| 4096 × 2048 | 24; 25; 30; 24/1.001; 30/1.001 Hz |
| 3840 × 1920 | 24; 25; 30; 24/1.001; 30/1.001 Hz |
| 3072 × 1536 | 24; 25; 30; 24/1.001; 30/1.001; 50 Hz |
| 2880 × 1440 | 24; 25; 30; 24/1.001; 30/1.001; 50; 60; 60/1.001 Hz |
| 2048 × 1024 | 24; 25; 30; 24/1.001; 30/1.001; 50; 60; 60/1.001 Hz |
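The combinations in Table 5.1-2 follow from the Level 5.1 limits on frame size and macroblock processing rate. The sketch below reproduces the table under the assumption that these limits are MaxFS = 36 864 macroblocks and MaxMBPS = 983 040 macroblocks per second, quoted here from the level limits in Annex A of [5] and worth verifying against that document. The fractional rates are taken to mean 24000/1001, 30000/1001 and 60000/1001 Hz. All names are illustrative.

```python
from fractions import Fraction

# Assumed H.264/AVC Level 5.1 limits (Annex A of [5]); verify before relying on them.
MAX_FS = 36_864      # macroblocks per frame
MAX_MBPS = 983_040   # macroblocks per second

FRAME_RATES = {
    "24": Fraction(24), "25": Fraction(25), "30": Fraction(30),
    "24/1.001": Fraction(24000, 1001), "30/1.001": Fraction(30000, 1001),
    "50": Fraction(50), "60": Fraction(60), "60/1.001": Fraction(60000, 1001),
}


def macroblocks(width, height):
    """Number of 16x16 macroblocks, rounding partial macroblocks up."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def permitted_rates(width, height):
    """Frame rates that stay within the frame size and processing rate limits."""
    mbs = macroblocks(width, height)
    if mbs > MAX_FS:
        return []
    return [name for name, fps in FRAME_RATES.items() if mbs * fps <= MAX_MBPS]


for size in [(4096, 2048), (3840, 1920), (3072, 1536), (2880, 1440), (2048, 1024)]:
    print(size, permitted_rates(*size))
```

For example, 4096 × 2048 corresponds to 32 768 macroblocks, which at 30 Hz exactly reaches the assumed processing rate limit, so 50 Hz and above are excluded at that resolution.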

In the VUI, the timing information may be present:

– If the timing information is present, i.e. the value of timing_info_present_flag is set to 1, then the values of num_units_in_tick and time_scale shall be set according to the frame rates allowed above. The timing information present in the video Bitstream should be consistent with the timing information signalled at the system level.

– The frame rate shall not change between two RAPs. fixed_frame_rate_flag value shall be set to 1.
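As a worked example of the timing signalling: in H.264/AVC, a progressive stream with fixed_frame_rate_flag equal to 1 has a frame rate of time_scale / (2 × num_units_in_tick). The sketch below derives one valid pair of values for a given frame rate. The helper name is an assumption, and any other pair with the same ratio is equally valid.

```python
from fractions import Fraction


def avc_timing_for(frame_rate):
    """Return (num_units_in_tick, time_scale) for a progressive H.264/AVC stream."""
    rate = Fraction(frame_rate)
    # frame rate = time_scale / (2 * num_units_in_tick)
    return rate.denominator, 2 * rate.numerator


for label, rate in [("25", Fraction(25)), ("30/1.001", Fraction(30000, 1001))]:
    print(label, avc_timing_for(rate))
```

Using exact fractions avoids rounding the 1.001-based rates to values such as 29.97, which would not match the permitted frame rates exactly.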

A Receiver conforming to the 3GPP VR Basic H.264/AVC Operation Point shall be capable of decoding and rendering Bitstreams that use frame rates according to the bitstream requirements documented above.

5.1.4.6 Random access point

For H.264/AVC random access point (RAP) definition refer to TS 26.116 [12], clause 4.4.1.1.

RAPs shall be present in the Bitstream at least once every 5 seconds. It is recommended that RAPs occur in the video Bitstream on average at least every 2 seconds. The time interval between successive RAPs is measured as the difference between their respective decoding time values.
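The two RAP spacing rules can be evaluated from the decoding times of the RAPs alone. The following sketch assumes a sorted list of RAP decoding times in seconds; the function name and return convention are illustrative, and the distance from the last RAP to the end of the Bitstream is deliberately not considered.

```python
def check_rap_spacing(rap_decode_times, max_gap=5.0, recommended_avg=2.0):
    """Evaluate RAP spacing from sorted RAP decoding times (seconds).

    Returns (conforms, meets_recommendation):
    - conforms: no interval between successive RAPs exceeds max_gap
    - meets_recommendation: the average interval is at most recommended_avg
    """
    gaps = [b - a for a, b in zip(rap_decode_times, rap_decode_times[1:])]
    if not gaps:
        return True, True  # fewer than two RAPs: no interval to measure
    conforms = max(gaps) <= max_gap
    meets_recommendation = sum(gaps) / len(gaps) <= recommended_avg
    return conforms, meets_recommendation


print(check_rap_spacing([0.0, 2.0, 4.0, 6.0]))
print(check_rap_spacing([0.0, 1.0, 5.0]))
```

The second call illustrates a Bitstream that satisfies the mandatory 5 second bound while missing the recommended 2 second average.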

5.1.4.7 Sequence parameter set

The following restrictions apply to the active Sequence Parameter Set (SPS):

– gaps_in_frame_num_value_allowed_flag value shall be set to 0.

– The Video Usability Information shall be present in the active Sequence Parameter Set. The vui_parameters_present_flag shall be set to 1.

– The source video format shall be progressive. frame_mbs_only_flag shall be set to 1 for every picture of the Bitstream.

5.1.4.8 Video usability information

In addition to the previous constraints on the VUI on colour information in clause 5.1.4.4 and on frame rates in clause 5.1.4.5, this clause contains further requirements.

The aspect ratio information shall be present, i.e.

– The aspect_ratio_info_present_flag value shall be set to 1.

– The aspect_ratio_idc value shall be set to 1 indicating a square pixel format.

There are no requirements on output timing conformance for H.264/AVC decoding (Annex C of [5]). The Hypothetical Reference Decoder (HRD) parameters, if present, should be ignored by the Receiver.

5.1.4.9 Omni-directional Projection Format

This operation point uses equirectangular projection, such that the video is automatically rendered in the 3GPP reference system. This is enabled by using the MPEG metadata on equirectangular projection.

A Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation Point shall include the equirectangular projection SEI message (payloadType equal to 150) at every RAP. The erp_guard_band_flag shall be set to 0.

A Receiver conforming to the 3GPP VR Basic H.264/AVC Operation Point shall be able to process the information contained in the equirectangular projection SEI message (payloadType equal to 150) with erp_guard_band_flag set to 0.

5.1.4.10 Restricted Coverage

This operation point permits the decoding and rendering of restricted coverage video signals in a rudimentary way. In this case it is expected that pixels that are projected to a non-covered region are included in the full image, but are visually differentiated from the covered region, for example using black, grey or white colour.

Application or system-based signalling may support signalling the coverage region.

5.1.4.11 Other VR Metadata

For a Bitstream conforming to the 3GPP VR Basic H.264/AVC Operation Point:

– the equirectangular projection SEI message (payloadType equal to 150) with erp_guard_band_flag not set to 0 shall not be present,

– the sphere rotation SEI message (payloadType equal to 154) shall not be present,

– the region-wise packing SEI message (payloadType equal to 155) shall not be present,

– the frame-packing arrangement SEI message (payloadType equal to 45) shall not be present.
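Together with clause 5.1.4.9, these rules amount to a simple filter on SEI payload types. The sketch below assumes that the Bitstream has already been parsed into a list of access units, each a dictionary with an `is_rap` flag and a list of SEI messages; this data layout and the function name are hypothetical.

```python
ERP_SEI = 150
FORBIDDEN_SEI = {
    154: "sphere rotation",
    155: "region-wise packing",
    45: "frame-packing arrangement",
}


def check_basic_avc_sei(access_units):
    """Return a list of SEI-related violations for the Basic H.264/AVC Operation Point."""
    problems = []
    for index, au in enumerate(access_units):
        has_valid_erp = False
        for sei in au["sei"]:
            ptype = sei["payload_type"]
            if ptype in FORBIDDEN_SEI:
                problems.append(f"AU {index}: {FORBIDDEN_SEI[ptype]} SEI shall not be present")
            elif ptype == ERP_SEI:
                if sei.get("erp_guard_band_flag") != 0:
                    problems.append(f"AU {index}: erp_guard_band_flag shall be 0")
                else:
                    has_valid_erp = True
        if au["is_rap"] and not has_valid_erp:
            problems.append(f"AU {index}: RAP without equirectangular projection SEI")
    return problems


stream = [
    {"is_rap": True, "sei": [{"payload_type": 150, "erp_guard_band_flag": 0}]},
    {"is_rap": False, "sei": []},
]
print(check_basic_avc_sei(stream))
```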

5.1.4.12 Receiver Compatibility

Receivers conforming to the 3GPP VR Basic H.264/AVC Operation Point shall support decoding and displaying 3GPP VR Basic H.264/AVC Operation Point Bitstreams.

Receivers conforming to the 3GPP VR Basic H.264/AVC Operation Point shall support all Receiver requirements in clause 5.1.4.

5.1.5 Main H.265/HEVC

5.1.5.1 General

This operation point targets enhanced 360 video decoding and rendering of H.265/HEVC video for VR applications. Among others, this operation point supports rendering of:

– 4K mono video at up to 60 Hz frame rates

– 3K stereoscopic video at up to 60 Hz frame rates

– Higher than 4K resolutions for restricted coverage

– Rendering of certain viewports in higher quality than others beyond 4K

– extended colour space and SDR transfer characteristics

A Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation point shall conform to the requirements in the remainder of clause 5.1.5.

A Receiver conforming to the 3GPP VR Main H.265/HEVC Operation point shall support decoding and rendering a Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation point. Detailed receiver requirements are provided in the remainder of clause 5.1.5.

5.1.5.2 Profile and level

A Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation point shall conform to H.265/HEVC Main-10 Profile Main Tier Profile Level 5.1 [6].

Hence, a Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation point shall comply with the following restrictions:

– The general_profile_idc shall be set to 2 indicating the Main10 profile.

– The general_tier_flag shall be set to 0 indicating the Main tier.

– The value of general_level_idc shall not be greater than 153 (corresponding to the Level 5.1) and should indicate the lowest level to which the Bitstream conforms.
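The value 153 follows from the H.265/HEVC convention that the level indicator (general_level_idc in [6]) equals 30 times the level number. A short sketch of this arithmetic and of the three restrictions above is given below; it assumes the profile, tier and level fields have already been parsed into a dictionary, and all function names are illustrative.

```python
from fractions import Fraction


def hevc_level_idc(level):
    """Level indicator for a level number given as a string, e.g. '5.1' -> 153."""
    return int(Fraction(level) * 30)


def check_main_hevc_ptl(ptl):
    """Return violations of the profile, tier and level restrictions in clause 5.1.5.2."""
    problems = []
    if ptl.get("general_profile_idc") != 2:
        problems.append("general_profile_idc shall be 2 (Main10 profile)")
    if ptl.get("general_tier_flag") != 0:
        problems.append("general_tier_flag shall be 0 (Main tier)")
    if ptl.get("general_level_idc", 255) > hevc_level_idc("5.1"):
        problems.append("general_level_idc shall not be greater than 153")
    return problems


print(hevc_level_idc("5.1"))
print(check_main_hevc_ptl({"general_profile_idc": 2, "general_tier_flag": 0,
                           "general_level_idc": 150}))
```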

5.1.5.3 Bit depth

Bitstreams conforming to the 3GPP VR Main H.265/HEVC Operation point shall be encoded with either 8 or 10 bit precision:

– bit_depth_luma_minus8 = 0 or 2 (8 or 10 bits respectively)

– bit_depth_chroma_minus8 = bit_depth_luma_minus8

Receivers conforming to the 3GPP VR Main H.265/HEVC Operation Point shall support 8 bit and 10 bit precision.

5.1.5.4 Spatial Resolutions

Due to the options provided in this operation point, additional original formats may be considered that can then be decoded and rendered by a Receiver conforming to this operation point. Recommended original formats beyond those specified in clause 5.1.4.3 for equirectangular projection (ERP) are:

– Mono formats: 6144 × 3072, 5880 × 2880

– Stereo formats with resolution for each eye: 3840 × 1920, 2880 × 1440, 2048 × 1024

If original signals are beyond the maximum permitted resolution of the video codec, then the region-wise packing needs to be applied to generate suitable distribution formats.

The distribution formats are more flexible as additional VR metadata as defined in the remainder of clause 5.1.5 may be used. However, for the distribution formats, all requirements of H.265/HEVC Main-10 Profile Main Tier Profile Level 5.1 [6] shall apply to the decoded texture signal.

According to H.265/HEVC Main-10 Profile Main Tier Profile Level 5.1 [6], the maximum luminance width and height do not exceed 8,444 pixels. In addition to these constraints, for a Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation point, the decoded texture signal shall:

– not exceed the luminance width of 8192 pixels, and

– not exceed the luminance height of 8192 pixels.

A Receiver conforming to the 3GPP VR Main H.265/HEVC Operation Point shall be capable of decoding and rendering Bitstreams with a decoded texture signal of maximum luminance width of 8192 pixels and maximum luminance height of 8192 pixels, within the overall profile/level constraints.
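The 8,444 figure can be reproduced from the Level 5.1 picture size limit: H.265/HEVC bounds each luma dimension by the square root of 8 × MaxLumaPs. The sketch below assumes MaxLumaPs = 8 912 896 luma samples for Level 5.1, quoted from the level limits in Annex A of [6] and worth verifying there. It also illustrates why a 6144 × 3072 original cannot be carried directly as the decoded texture signal and needs region-wise packing. The names are illustrative.

```python
import math

MAX_LUMA_PS_L51 = 8_912_896  # assumed Level 5.1 MaxLumaPs, see Annex A of [6]
OP_MAX_DIM = 8192            # additional limit of this operation point


def max_dimension(max_luma_ps):
    """Largest luma width or height permitted by the level."""
    return math.isqrt(8 * max_luma_ps)


def fits_decoded_texture(width, height):
    """True if the picture size respects both the level and the operation point limits."""
    return (width <= OP_MAX_DIM and height <= OP_MAX_DIM
            and width * height <= MAX_LUMA_PS_L51)


print(max_dimension(MAX_LUMA_PS_L51))
print(fits_decoded_texture(4096, 2048), fits_decoded_texture(6144, 3072))
```

The check covers picture size only; the luma sample rate limit of the level, which couples resolution and frame rate, is not modelled here.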

5.1.5.5 Colour information and Transfer Characteristics

A Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation Point shall use either Recommendation ITU-R BT.709 [3] colorimetry or Recommendation ITU-R BT.2020 [4] colorimetry in non-constant luminance for standard dynamic range (SDR).

Specifically, in the VUI, the colour parameter information shall be present, i.e.:

– video_signal_type_present_flag value and colour_description_present_flag value shall be set to 1.

– If BT.709 [3] is used, it shall be signalled by setting colour_primaries to the value 1, transfer_characteristics to the value 1 and matrix_coeffs to the value 1.

– If BT.2020 [4] and SDR is used,

– it shall be signalled by setting colour_primaries to the value 9, transfer_characteristics to the value 14 and matrix_coeffs to the value 9;

– the chroma_loc_info_present_flag should be equal to 1, and if set, the chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field shall both be equal to 2.

A Receiver conforming to the 3GPP VR Main H.265/HEVC Operation Point shall be capable of decoding and rendering according to either of the two above configurations.

5.1.5.6 Frame rates

A Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation Point shall have one of the following frame rates: 24; 25; 30; 24/1.001; 30/1.001; 50; 60; 60/1.001 Hz.

Selected combinations of frame rates with other source parameters are provided in Annex A.2.2.2.

In the VUI, the timing information may be present:

– If the timing information is present, i.e. the value of vui_timing_info_present_flag is set to 1, then the values of vui_num_units_in_tick and vui_time_scale shall be set according to the frame rates allowed in this clause. The timing information present in the video Bitstream should be consistent with the timing information signalled at the system level.

– The frame rate shall not change between two RAPs. fixed_frame_rate_flag value, if present, shall be set to 1.
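Whether signalled timing values match a permitted frame rate can be checked with exact rational arithmetic. The sketch below assumes one picture per clock tick, so that the picture rate equals vui_time_scale / vui_num_units_in_tick, and interprets the fractional rates as 24000/1001, 30000/1001 and 60000/1001 Hz. The function name is an assumption.

```python
from fractions import Fraction

ALLOWED_RATES = {
    Fraction(24), Fraction(25), Fraction(30), Fraction(50), Fraction(60),
    Fraction(24000, 1001), Fraction(30000, 1001), Fraction(60000, 1001),
}


def signalled_rate_is_allowed(vui_num_units_in_tick, vui_time_scale):
    """True if the signalled picture rate is one of the permitted frame rates."""
    return Fraction(vui_time_scale, vui_num_units_in_tick) in ALLOWED_RATES


print(signalled_rate_is_allowed(1001, 60000))
print(signalled_rate_is_allowed(1, 48))
```

Because the comparison is on the reduced fraction, equivalent pairs such as (1, 50) and (2, 100) are treated alike.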

There are no requirements on output timing conformance for H.265/HEVC decoding (Annex C of [6]). The Hypothetical Reference Decoder (HRD) parameters, if present, should be ignored by the Receiver.

A Receiver conforming to the 3GPP VR Main H.265/HEVC Operation Point shall be capable of decoding and rendering Bitstreams that use frame rates according to the bitstream requirements documented above.

5.1.5.7 Random access point

For H.265/HEVC random access point (RAP) definition refer to TS 26.116 [12], clause 4.5.1.2.1.

RAPs shall be present in the Bitstream at least once every 5 seconds. It is recommended that RAPs occur in the video Bitstream on average at least every 2 seconds. The time interval between successive RAPs is measured as the difference between their respective decoding time values.

If viewport adaptation is offered, then RAPs should occur even more frequently to enable transitioning across these viewport-optimized bitstreams.

5.1.5.8 Video and Sequence Parameter Sets

Receivers conforming to the 3GPP VR Main H.265/HEVC Operation Point should ignore the content of all Video Parameter Sets (VPS) NAL units [9] as defined in Recommendation ITU-T H.265 / ISO/IEC 23008-2 [6].

The following restrictions apply to the active Sequence Parameter Set (SPS):

– The Video Usability Information (VUI) shall be present in the active Sequence Parameter Set. The vui_parameters_present_flag shall be set to 1.

– The chroma sub-sampling shall be 4:2:0, chroma_format_idc value shall be set to 1.

– The source video format shall be progressive, i.e.:

– The general_progressive_source_flag shall be set to 1,

– The general_interlaced_source_flag shall be set to 0,

– The general_frame_only_constraint_flag shall be set to 1.

Receivers conforming to the 3GPP VR Main H.265/HEVC Operation Point shall support Bitstreams with the restrictions on the SPS defined above.

5.1.5.9 Video usability information

In addition to the previous constraints on the VUI on colour information in clause 5.1.5.5 and on frame rates in clause 5.1.5.6, this clause contains further requirements.

The aspect ratio information shall be present, i.e.:

– The aspect_ratio_info_present_flag value shall be set to 1.

– The aspect_ratio_idc value shall be set to 1 indicating a square pixel format.

There are no requirements on output timing conformance for H.265/HEVC decoding (Annex C of [6]). The Hypothetical Reference Decoder (HRD) parameters, if present, should be ignored by the Receiver.

5.1.5.10 Omni-directional Projection Formats

This operation point uses equirectangular projection following the MPEG metadata specifications, such that the video is automatically rendered in the 3GPP reference system.

A Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation Point shall include at every RAP the equirectangular projection SEI message (payloadType equal to 150) with the erp_guard_band_flag set to 0.

5.1.5.11 Restricted Coverage

This operation point permits the distribution of content with less than 360-degree coverage in an encoding-optimized manner by the use of region-wise packing.

It is recommended that the number of pixels that are projected to non-covered regions is minimized in the decoded texture signal. If this is applied and the full 360 video is not encoded, the region-wise packing SEI message (payloadType equal to 155) shall be included in the bitstream to signal the encoded regions of the 360 video. If present, it shall be present in a H.265/HEVC RAP.

Application or system-based signalling may support signalling the exact coverage region in the spherical coordinates.

5.1.5.12 Viewport-Optimized Content

This operation point permits the use of region-wise packing, for example to optimize the spatial resolution of specific viewports. For some example usage and settings, refer to Annex A.2.

A Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation Point may include the region-wise packing SEI message (payloadType equal to 155). If present, it shall be present in a H.265/HEVC RAP.

A Receiver conforming to the 3GPP VR Main H.265/HEVC Operation Point shall be able to process the region-wise packing SEI message (payloadType equal to 155).

5.1.5.13 Frame packing arrangement

A Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation Point may include the frame packing arrangement SEI message (payloadType equal to 45). If present, then the following settings shall apply:

– The SEI message is present in a H.265/HEVC RAP.

– The value of frame_packing_arrangement_cancel_flag is equal to 0.

– The value of frame_packing_arrangement_type is equal to 4.

– The value of quincunx_sampling_flag is equal to 0.

– The value of spatial_flipping_flag is equal to 0.

– The value of field_views_flag is equal to 0.

– The value of frame0_grid_position_x is equal to 0.

– The value of frame0_grid_position_y is equal to 0.

– The value of frame1_grid_position_x is equal to 0.

– The value of frame1_grid_position_y is equal to 0.

A Receiver conforming to the 3GPP VR Main H.265/HEVC Operation Point shall process the frame packing arrangement SEI (payloadType equal to 45) with settings restrictions as above. If processing is supported, then the Receiver shall render the viewport indicated by the message.
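The field restrictions on the frame packing arrangement SEI message lend themselves to a table-driven check. The sketch below assumes the SEI payload has been parsed into a dictionary keyed by syntax element name; the function name is hypothetical. The comment on the arrangement type reflects the reading of [6] that value 4 denotes top-bottom packing, which should be verified against that document. The requirement that the message is present in a RAP is not covered.

```python
REQUIRED_FPA_VALUES = {
    "frame_packing_arrangement_cancel_flag": 0,
    "frame_packing_arrangement_type": 4,  # understood to be top-bottom packing
    "quincunx_sampling_flag": 0,
    "spatial_flipping_flag": 0,
    "field_views_flag": 0,
    "frame0_grid_position_x": 0,
    "frame0_grid_position_y": 0,
    "frame1_grid_position_x": 0,
    "frame1_grid_position_y": 0,
}


def fpa_violations(sei):
    """Names of fields whose value differs from the required setting (or is missing)."""
    return sorted(name for name, value in REQUIRED_FPA_VALUES.items()
                  if sei.get(name) != value)


conforming = dict(REQUIRED_FPA_VALUES)
print(fpa_violations(conforming))
print(fpa_violations(dict(conforming, frame_packing_arrangement_type=3)))
```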

5.1.5.14 Other VR Metadata

For a Bitstream conforming to the 3GPP VR Main H.265/HEVC Operation Point:

– the sphere rotation SEI message (payloadType equal to 154) shall not be present.

– any frame-packing arrangement SEI message (payloadType equal to 45) that does not conform to an SEI message defined in clause 5.1.5.13 shall not be present.

5.1.5.15 Receiver Compatibility

Receivers conforming to the 3GPP VR Main H.265/HEVC Operation Point shall support decoding and displaying 3GPP VR Main H.265/HEVC Operation Point Bitstreams.

Receivers conforming to the 3GPP VR Main H.265/HEVC Operation Point shall support all Receiver requirements in clause 5.1.5. Specifically, receivers conforming to the 3GPP VR Main H.265/HEVC Operation Point shall support decoding and rendering Bitstreams that include the following VR rendering metadata:

– the region-wise packing SEI message (for details see clauses 5.1.5.11 and 5.1.5.12)

– the equirectangular projection SEI message (for details see clause 5.1.5.10)

– the frame-packing arrangement SEI message (for details see clause 5.1.5.13)

– any combinations of those

5.1.6 Flexible H.265/HEVC

5.1.6.1 General

This operation point targets enhanced 360 video decoding and rendering of H.265/HEVC video for VR applications. Among others, this operation point supports rendering of:

– 4K mono video at up to 120 Hz frame rates

– 3K stereoscopic video at up to 60 Hz frame rates

– Higher than 4K resolutions for restricted coverage

– Rendering of certain viewports in higher quality than others beyond 4K

– ERP and CMP projection

– SDR and HDR transfer characteristics

A Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation point shall conform to the requirements in the remainder of clause 5.1.6.

A Receiver conforming to the 3GPP VR Flexible H.265/HEVC Operation point shall support decoding and rendering a Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation point. Detailed receiver requirements are provided in the remainder of clause 5.1.6.

5.1.6.2 Profile and level

A Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation point shall conform to H.265/HEVC Main-10 Profile Main Tier Profile Level 5.1 [6].

Hence, a Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation point shall comply with the following restrictions:

– The general_profile_idc shall be set to 2 indicating the Main10 profile.

– The general_tier_flag shall be set to 0 indicating the Main tier.

– The value of general_level_idc shall not be greater than 153 (corresponding to the Level 5.1) and should indicate the lowest level to which the Bitstream conforms.

5.1.6.3 Bit depth

Bitstreams conforming to the 3GPP VR Flexible H.265/HEVC Operation point shall be encoded with either 8 or 10 bit precision:

– bit_depth_luma_minus8 = 0 or 2 (8 or 10 bits respectively)

– bit_depth_chroma_minus8 = bit_depth_luma_minus8

Receivers conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall support 8 bit and 10 bit precision.

5.1.6.4 Spatial Resolutions

Due to the options provided in this operation point, additional original formats may be considered that can then be decoded and rendered by a Receiver conforming to this operation point. Recommended original formats beyond those specified in clause 5.1.5.4 for equirectangular projection (ERP) are:

– Mono formats: 8192 × 4096

This operation point permits the distribution of ERP signals directly as well as the conversion of ERP signals to cube-map (CMP) projection. A conversion operation is provided in Annex A.2.3. Typical original cube-map formats, either generated by conversion or provided by the content provider, that are suitable for this operation point are:

– Mono Formats: 6144×4096, 4608×3072, 4320×2880, 3072×2048, 2880×1920, 2304×1536, 2160×1440

– Stereo Formats with resolution for each eye: 4320×2880, 3072×2048, 2880×1920, 2304×1536, 2160×1440

If original signals are beyond the maximum permitted resolution of the video codec, then region wise packing needs to be applied to generate suitable distribution formats.

The distribution formats are more flexible as additional VR metadata as defined in the remainder of clause 5.1.6 may be used. However, for the distribution formats, all requirements of H.265/HEVC Main-10 Profile Main Tier Profile Level 5.1 [6] shall apply to the decoded texture signal.

According to H.265/HEVC Main-10 Profile Main Tier Profile Level 5.1 [6], the maximum luminance width and height does not exceed 8,444 pixels. However, for improved interoperability, for a Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation point, the decoded texture signal:

– shall not exceed the luminance width of 8192 pixels, and

– shall not exceed the luminance height of 8192 pixels.

A Receiver conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall be capable of decoding and rendering Bitstreams with a decoded texture signal of maximum luminance width of 8192 pixels and maximum luminance height of 8192 pixels.

5.1.6.5 Colour information and Transfer Characteristics

A Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall use either Recommendation ITU-R BT.709 [3] colorimetry or Recommendation ITU-R BT.2020 [4] colorimetry in non-constant luminance for standard dynamic range (SDR).

For Perceptual Quantization (PQ) High Dynamic Range (HDR), BT.2020 [4] colorimetry in non-constant luminance and the PQ electro-optical transfer function (EOTF) as defined in Recommendation ITU-R BT.2100 [11] are used.

For Hybrid Log–Gamma (HLG) High Dynamic Range (HDR), BT.2020 [4] colorimetry in non-constant luminance and the HLG opto-electronic transfer function (OETF) as defined in Recommendation ITU-R BT.2100 [11] are used.

Specifically, in the VUI, the colour parameter information shall be present, i.e.:

– video_signal_type_present_flag value and colour_description_present_flag value shall be set to 1.

– If BT.709 [3] is used, it shall be signalled by setting colour_primaries to the value 1, transfer_characteristics to the value 1 and matrix_coeffs to the value 1.

– If BT.2020 [4] and SDR is used,

– it shall be signalled by setting colour_primaries to the value 9, transfer_characteristics to the value 14 and matrix_coeffs to the value 9,

– the chroma_loc_info_present_flag should be equal to 1, and if set the chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field shall both be equal to 2

– If BT.2020 [4] and ITU-R BT.2100 [11] are used in HDR,

– it shall be signalled by setting colour_primaries to the value 9 and matrix_coeffs to the value 9,

– the chroma_loc_info_present_flag should be equal to 1, and if set, the chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field shall both be equal to 2

– If the PQ EOTF is used, transfer_characteristics shall be set to the value 16.

– If the HLG OETF is used, transfer_characteristics shall be set to the value 14. The Bitstream shall also contain the alternative_transfer_characteristics SEI message. The alternative_transfer_characteristics SEI message shall be inserted at each RAP, and its parameter preferred_transfer_characteristics shall be set to the value 18.

NOTE 1: HLG is specified using the alternative_transfer_characteristics method only to ensure backwards compatibility with earlier releases at this Operation Point.

NOTE 2: If the content is provided to a receiver that is not able to process the SEI message, the receiver uses the backward-compatibility mode of HLG to present an SDR representation of the signal.

A Receiver conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall be capable of decoding and rendering according to any of the above configurations.
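The four signalling configurations of this clause can be summarized as a small decision procedure on the VUI values and the optional alternative_transfer_characteristics SEI message. The sketch below is illustrative only; the function name and the returned labels are assumptions. It also shows the backward-compatible behaviour described in NOTE 2: without the SEI message, an HLG Bitstream is indistinguishable from BT.2020 SDR.

```python
HLG_PREFERRED_TC = 18  # preferred_transfer_characteristics carried in the SEI message


def classify_colour_config(colour_primaries, transfer_characteristics,
                           matrix_coeffs, preferred_transfer_characteristics=None):
    """Map VUI values (plus optional SEI value) to a configuration label, or None."""
    triple = (colour_primaries, transfer_characteristics, matrix_coeffs)
    if triple == (1, 1, 1):
        return "BT.709 SDR"
    if triple == (9, 16, 9):
        return "BT.2100 PQ"
    if triple == (9, 14, 9):
        # HLG reuses the SDR VUI values and is identified by the SEI message only.
        if preferred_transfer_characteristics == HLG_PREFERRED_TC:
            return "BT.2100 HLG"
        return "BT.2020 SDR"
    return None


print(classify_colour_config(9, 14, 9, preferred_transfer_characteristics=18))
print(classify_colour_config(9, 14, 9))
```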

SEI messages for HDR metadata signalling may be used. The requirements and recommendations for Bitstreams and Receivers as documented in TS 26.116 [12], clause 4.5.5.7 also apply for the 3GPP VR Flexible H.265/HEVC Operation Point.

5.1.6.6 Frame rates

A Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall have one of the following frame rates: 24; 25; 30; 24/1.001; 30/1.001; 50; 60; 60/1.001; 90; 100; 120 Hz.

Selected combinations of frame rates with other source parameters are provided in Annex A.2.2.2.

In the VUI, the timing information may be present:

– If the timing information is present, i.e. the value of vui_timing_info_present_flag is set to 1, then the values of vui_num_units_in_tick and vui_time_scale shall be set according to the frame rates allowed in this clause. The timing information present in the video Bitstream should be consistent with the timing information signalled at the system level.

– The frame rate shall not change between two RAPs. fixed_frame_rate_flag value, if present, shall be set to 1.

There are no requirements on output timing conformance for H.265/HEVC decoding (Annex C of [6]). The Hypothetical Reference Decoder (HRD) parameters, if present, should be ignored by the Receiver.

A Receiver conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall be capable of decoding and rendering Bitstreams that use frame rates according to the bitstream requirements documented above.
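As an illustration of the timing rule in this clause, the following Python sketch maps a frame rate to one possible pair of vui_num_units_in_tick and vui_time_scale. It assumes that the picture rate equals vui_time_scale divided by vui_num_units_in_tick, and that the "/1001" entries in the list denote the customary rates N × 1000/1001 (for example 60000/1001 Hz), which this clause does not state explicitly. The names ALLOWED_FRAME_RATES and vui_timing are hypothetical.

```python
from fractions import Fraction

# Frame rates allowed for the Flexible H.265/HEVC Operation Point.
# Assumption: "N/1001" in the clause means N * 1000 / 1001 Hz.
ALLOWED_FRAME_RATES = {
    Fraction(24), Fraction(25), Fraction(30), Fraction(50), Fraction(60),
    Fraction(90), Fraction(100), Fraction(120),
    Fraction(24000, 1001), Fraction(30000, 1001), Fraction(60000, 1001),
}


def vui_timing(frame_rate):
    """Return (vui_num_units_in_tick, vui_time_scale) for an allowed rate."""
    rate = Fraction(frame_rate)
    if rate not in ALLOWED_FRAME_RATES:
        raise ValueError(f"frame rate {rate} is not allowed")
    # rate = time_scale / num_units_in_tick, expressed in lowest terms
    return rate.denominator, rate.numerator
```

Any other pair with the same ratio would describe the same frame rate; the lowest-terms pair is used here only for simplicity.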

5.1.6.7 Random access point

For the definition of an H.265/HEVC random access point (RAP), refer to TS 26.116 [12], clause 4.5.1.2.1.

RAPs shall be present in the Bitstream at least once every 5 seconds. It is recommended that RAPs occur in the video Bitstream on average at least every 2 seconds. The time interval between successive RAPs is measured as the difference between their respective decoding time values.

If viewport adaptation is offered, then RAPs should occur even more frequently to enable transitioning across these viewport-optimized bitstreams.
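The RAP spacing rules of this clause can be checked with a short Python sketch such as the one below. It takes the decoding times of successive RAPs in seconds and reports whether the 5-second requirement and the 2-second average recommendation are met. The function name check_rap_spacing and the input representation are assumptions made for illustration.

```python
def check_rap_spacing(rap_decode_times_s):
    """Check RAP spacing given ascending RAP decoding times in seconds.

    Returns (meets_requirement, meets_recommendation).
    """
    if len(rap_decode_times_s) < 2:
        raise ValueError("need at least two RAPs to measure an interval")
    # Interval = difference between decoding times of successive RAPs.
    intervals = [b - a for a, b in zip(rap_decode_times_s, rap_decode_times_s[1:])]
    meets_requirement = max(intervals) <= 5.0       # at least once every 5 s
    average = sum(intervals) / len(intervals)
    meets_recommendation = average <= 2.0           # on average every 2 s
    return meets_requirement, meets_recommendation
```

For example, RAPs at 0, 1 and 6 seconds satisfy the requirement (the largest gap is 5 seconds) but not the recommendation (the average gap is 3 seconds).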

5.1.6.8 Video and Sequence Parameter Sets

Receivers conforming to the 3GPP VR Flexible H.265/HEVC Operation Point should ignore the content of all Video Parameter Sets (VPS) NAL units [9] as defined in Recommendation ITU-T H.265 / ISO/IEC 23008-2 [6].

The following restrictions apply to the active Sequence Parameter Set (SPS):

– The Video Usability Information (VUI) shall be present in the active Sequence Parameter Set. The vui_parameters_present_flag shall be set to 1.

– The chroma sub-sampling shall be 4:2:0, chroma_format_idc value shall be set to 1.

– The source video format shall be progressive, i.e.:

– The general_progressive_source_flag shall be set to 1,

– The general_interlaced_source_flag shall be set to 0,

– The general_frame_only_constraint_flag shall be set to 1.

Receivers conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall support Bitstreams with the restrictions on the SPS defined above.
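The SPS restrictions listed in this clause can be expressed as a simple table of required values. The Python sketch below assumes that the relevant syntax elements have already been parsed into a dictionary keyed by syntax element name; the parser itself, the dictionary layout and the name sps_violations are hypothetical.

```python
# Required values for the active SPS and its profile/tier/level fields,
# taken from the restrictions in this clause.
REQUIRED_SPS_FIELDS = {
    "vui_parameters_present_flag": 1,
    "chroma_format_idc": 1,                    # 4:2:0 chroma sub-sampling
    "general_progressive_source_flag": 1,
    "general_interlaced_source_flag": 0,
    "general_frame_only_constraint_flag": 1,
}


def sps_violations(sps):
    """Return the names of fields in `sps` (a dict) that break the restrictions."""
    return [name for name, want in REQUIRED_SPS_FIELDS.items()
            if sps.get(name) != want]
```

An empty result means that none of the listed restrictions is violated; a missing field is reported as a violation.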

5.1.6.9 Video usability information

In addition to the previous constraints on the VUI in clauses 5.1.6.5 (colour information) and 5.1.6.6 (frame rates), this clause contains further requirements.

The aspect ratio information shall be present, i.e.:

– The aspect_ratio_present_flag value shall be set to 1.

– The aspect_ratio_idc value shall be set to 1 indicating a square pixel format.

There are no requirements on output timing conformance for H.265/HEVC decoding (Annex C of [6]). The Hypothetical Reference Decoder (HRD) parameters, if present, should be ignored by the Receiver.

5.1.6.10 Omni-directional Projection Formats

This operation point permits using either equirectangular projection or cubemap projection following the MPEG metadata specifications, such that the video is automatically rendered in the 3GPP reference system.

A Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall include at every RAP either:

– the equirectangular projection SEI message (payloadType equal to 150) with the erp_guard_band_flag set to 0, or

– the cubemap projection SEI message (payloadType equal to 151).

A Receiver conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall be able to process the equirectangular projection SEI message (payloadType equal to 150) and the cubemap projection SEI message (payloadType equal to 151).

5.1.6.11 Restricted Coverage

This operation point permits distributing content with less than 360 degree coverage in an encoding-optimized manner by the use of region-wise packing.

It is recommended that the number of pixels that are projected to non-covered regions is minimized in the decoded texture signal. If this is applied and not the full 360 video is encoded, the region-wise packing SEI message (payloadType equal to 155) shall be included in the bitstream to signal the encoded regions of the 360 video. If present, it shall be present in an H.265/HEVC RAP.

Application or system-based signalling may support signalling the exact coverage region in the spherical coordinates.

5.1.6.12 Viewport-Optimized Content

This operation point permits the use of region-wise packing, for example to optimize the spatial resolution of specific viewports. For some example usage and settings, refer to Annex A.2.

A Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation Point may include the region-wise packing SEI message (payloadType equal to 155). If present, it shall be present in an H.265/HEVC RAP.

A Receiver conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall be able to process the region-wise packing SEI message (payloadType equal to 155).

5.1.6.13 Frame packing arrangement

A Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation Point may include the frame packing arrangement SEI message (payloadType equal to 45). If present, then the following settings shall apply:

– The SEI message is present in an H.265/HEVC RAP.

– The value of frame_packing_arrangement_cancel_flag is equal to 0.

– The value of frame_packing_arrangement_type is equal to 4.

– The value of quincunx_sampling_flag is equal to 0.

– The value of spatial_flipping_flag is equal to 0.

– The value of field_views_flag is equal to 0.

– The value of frame0_grid_position_x is equal to 0.

– The value of frame0_grid_position_y is equal to 0.

– The value of frame1_grid_position_x is equal to 0.

– The value of frame1_grid_position_y is equal to 0.

A Receiver conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall process the frame packing arrangement SEI (payloadType equal to 45) with settings restrictions as above. If processing is supported, then the Receiver shall render the viewport indicated by the message.

5.1.6.14 Other VR Metadata

For a Bitstream conforming to the 3GPP VR Flexible H.265/HEVC Operation Point:

– the sphere rotation SEI message (payloadType equal to 154) shall not be present.

– any frame-packing arrangement SEI message (payloadType equal to 45) that does not conform to an SEI message defined in clause 5.1.6.13 shall not be present.

5.1.6.15 Receiver Compatibility

Receivers conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall support decoding and displaying 3GPP VR Main H.265/HEVC Operation Point Bitstreams and 3GPP VR Flexible H.265/HEVC Operation Point Bitstreams.

Receivers conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall support all Receiver requirements in clause 5.1.6. Specifically, receivers conforming to the 3GPP VR Flexible H.265/HEVC Operation Point shall support decoding and rendering Bitstreams that include the following display or VR rendering metadata:

– the region-wise packing SEI message (for details see clauses 5.1.6.11 and 5.1.6.12),

– the equirectangular projection SEI message (for details see clause 5.1.6.10),

– the cubemap projection SEI message (for details see clause 5.1.6.10),

– the frame-packing arrangement SEI message (for details see clause 5.1.6.13),

– the alternative_transfer_characteristics SEI message with preferred_transfer_characteristics set to the value 18 (for details see clause 5.1.6.5),

– any combinations of those.

5.1.7 Main 8K H.265/HEVC

5.1.7.1 General

This operation point targets enhanced 360 video decoding and rendering of H.265/HEVC video for VR applications. Among others, this operation point supports rendering of:

– 8K mono video at up to 60 Hz frame rates for full coverage

– 6K stereoscopic video at up to 60 Hz frame rates for full coverage

– 4K mono video at up to 120 Hz frame rates for full coverage

– Higher than 8K resolutions for restricted coverage

– Extended colour space, and SDR and HDR transfer characteristics

A Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation point shall conform to the requirements in the remainder of clause 5.1.7.

A Receiver conforming to the 3GPP VR Main 8K H.265/HEVC Operation point shall support decoding and rendering a Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation point. Detailed receiver requirements are provided in the remainder of clause 5.1.7.

5.1.7.2 Profile and level

A Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall conform to H.265/HEVC Main-10 Profile Main Tier Profile Level 6.1 [6], i.e. for a Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point,

– the general_profile_idc shall be set to 2 indicating the Main10 profile.

– the general_tier_flag shall be set to 0 indicating the Main tier.

– the value of level_idc shall not be greater than 183 (corresponding to the Level 6.1) and should indicate the lowest level to which the Bitstream conforms.

– the general_progressive_source_flag shall be equal to 1,

– the general_interlaced_source_flag shall be equal to 0,

– the general_frame_only_constraint_flag shall be equal to 1,

– the following further limitations shall apply:

– if frame rate 60 fps is used, then the maximum luma picture size in samples of 33,554,432 is not exceeded,

– the maximum VCL Bit Rate is constrained to be 80 Mbps with CpbVclFactor and CpbNalFactor being fixed to be 1000 and 1100, respectively.
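Two of the numbers in this clause can be reproduced with simple arithmetic, as sketched below in Python. The sketch relies on the H.265/HEVC convention that the level indicator equals 30 times the level number, and on the observation that 33,554,432 equals 8192 × 4096. The function names are hypothetical.

```python
MAX_LEVEL_IDC = 183                   # corresponds to Level 6.1
MAX_LUMA_PS_AT_60_FPS = 33_554_432    # equals 8192 * 4096


def level_idc(major, minor=0):
    """Level indicator for HEVC level major.minor (30 times the level number)."""
    return 30 * major + 3 * minor


def luma_size_ok_at_60fps(width, height):
    """True if the luma picture size respects the 60 fps limit of this clause."""
    return width * height <= MAX_LUMA_PS_AT_60_FPS
```

With these definitions, level_idc(6, 1) evaluates to 183, and an 8192 × 4096 picture exactly meets the 60 fps luma picture size limit.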

5.1.7.3 Bit depth

Bitstreams conforming to the 3GPP VR Main 8K H.265/HEVC Operation point shall be encoded with 10-bit precision:

– bit_depth_luma_minus8 = 2 (10 bits)

– bit_depth_chroma_minus8 = bit_depth_luma_minus8

Receivers conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall support 10-bit precision.

5.1.7.4 Spatial Resolutions

Due to the options provided in this operation point, additional original formats may be considered that can then be decoded and rendered by a Receiver conforming to this operation point. Recommended original formats beyond those specified in clause 5.1.4.3 for equirectangular projection (ERP) are:

– Mono formats: 8192 × 4096, 7680 × 3840, 6144 × 3072, 5760 × 2880

– Stereo formats with resolution for each eye: 6144 × 3072, 5860 × 2880, 4096 × 2048, 3840 × 1920, 2880 × 1440, 2048 × 1024

The distribution formats are more flexible as additional VR metadata as defined in the remainder of clause 5.1.7 may be used. However, for the distribution formats, all requirements of H.265/HEVC Main-10 Profile Main Tier Profile Level 6.1 [6] shall apply to the decoded texture signal.

According to H.265/HEVC Main-10 Profile Main Tier Profile Level 6.1 [6], the maximum luminance width and height do not exceed 16,888 pixels. In addition to the H.265/HEVC Main-10 Profile Main Tier Profile Level 6.1 [6] constraints, for a Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point, the decoded texture signal shall:

– not exceed the luminance width of 16,384 pixels, and

– not exceed the luminance height of 16,384 pixels.

A Receiver conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall be capable of decoding and rendering Bitstreams with a decoded texture signal of maximum luminance width of 16,384 pixels, a maximum luminance height of 16,384 pixels and the overall profile/level constraints.
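The 16,888 pixel figure quoted in this clause can be reproduced from the level limits, as the following Python sketch shows. It assumes the H.265/HEVC rule that the luma width and height may not exceed the square root of 8 × MaxLumaPs, and it uses MaxLumaPs = 35,651,584 for Level 6.1; this value is quoted from memory and should be verified against [6]. The function names are hypothetical.

```python
import math

MAX_LUMA_PS_LEVEL_6_1 = 35_651_584   # assumed MaxLumaPs for Level 6.1, verify against [6]
OPERATION_POINT_MAX_DIM = 16_384     # additional limit set by this operation point


def level_max_dimension(max_luma_ps):
    """Largest width or height allowed by the level: floor(sqrt(8 * MaxLumaPs))."""
    return math.isqrt(8 * max_luma_ps)


def texture_size_ok(width, height):
    """Check a decoded texture size against the level and operation point limits."""
    limit = min(level_max_dimension(MAX_LUMA_PS_LEVEL_6_1), OPERATION_POINT_MAX_DIM)
    return (width <= limit and height <= limit
            and width * height <= MAX_LUMA_PS_LEVEL_6_1)
```

The operation point limit of 16,384 is the tighter of the two bounds on each dimension, so a 16,888 pixel wide texture would be rejected even though the level alone permits it.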

5.1.7.5 Colour information and Transfer Characteristics

A Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall use either Recommendation ITU-R BT.709 [3] colorimetry or Recommendation ITU-R BT.2020 [4] colorimetry in non-constant luminance for standard dynamic range (SDR).

For Perceptual Quantization (PQ) High Dynamic Range (HDR), BT.2020 [4] colorimetry in non-constant luminance and PQ electro-optical transfer function (EOTF) as defined in Recommendation ITU-R BT.2100 [11] are used.

For Hybrid Log–Gamma (HLG) High Dynamic Range (HDR), BT.2020 [4] colorimetry in non-constant luminance and the HLG opto-electronic transfer function (OETF) as defined in Recommendation ITU-R BT.2100 [11] are used.

Specifically, in the VUI, the colour parameter information shall be present, i.e.:

– video_signal_type_present_flag value and colour_description_present_flag value shall be set to 1.

– If BT.709 [3] is used, it shall be signalled by setting colour_primaries to the value 1, transfer_characteristics to the value 1 and matrix_coeffs to the value 1.

– If BT.2020 [4] and SDR is used, it shall be signalled by setting colour_primaries to the value 9, transfer_characteristics to the value 14 and matrix_coeffs to the value 9,

– the chroma_loc_info_present_flag should be equal to 1, and if set the chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field shall both be equal to 2.

– If BT.2020 [4] and ITU-R BT.2100 [11] are used in HDR, it shall be signalled by setting colour_primaries to the value 9 and matrix_coeffs to the value 9. The chroma_sample_loc_type_top_field shall be set to 2.

– If the PQ EOTF is used, transfer_characteristics shall be set to the value 16.

– If the HLG OETF is used, transfer_characteristics shall be set to either the value 18 or 14. In the latter case, the Bitstream shall also contain the alternative_transfer_characteristics SEI message. The alternative_transfer_characteristics SEI message shall be inserted at each RAP, and its parameter preferred_transfer_characteristics shall be set to the value 18.

– the chroma_loc_info_present_flag should be equal to 1, and if set, the chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field shall both be equal to 2.

A Receiver conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall be capable of decoding and rendering according to any of the above configurations.

SEI messages for HDR metadata signalling may be used. The requirements and recommendations for Bitstreams and Receivers as documented in TS 26.116 [12], clause 4.5.5.7 also apply for the 3GPP VR Main 8K H.265/HEVC Operation Point.
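The colour signalling combinations permitted in this clause can be summarised as a small decision function. The Python sketch below maps the three VUI values, plus the optional preferred_transfer_characteristics value from the alternative_transfer_characteristics SEI message, to a configuration name. The function name, the argument atc_preferred_tc and the returned labels are hypothetical and chosen for illustration only.

```python
def classify_colour_signalling(colour_primaries, transfer_characteristics,
                               matrix_coeffs, atc_preferred_tc=None):
    """Map VUI colour values to a configuration name, or None if not allowed.

    atc_preferred_tc is preferred_transfer_characteristics from the
    alternative_transfer_characteristics SEI message, or None if absent.
    """
    cp, tc, mc = colour_primaries, transfer_characteristics, matrix_coeffs
    if (cp, tc, mc) == (1, 1, 1):
        return "BT.709 SDR"
    if (cp, mc) != (9, 9):
        return None
    if tc == 16:
        return "BT.2020 PQ HDR"
    if tc == 18:
        return "BT.2020 HLG HDR"
    if tc == 14:
        # The value 14 is shared by BT.2020 SDR and backward-compatible HLG;
        # the SEI message with preferred value 18 tells them apart.
        return "BT.2020 HLG HDR" if atc_preferred_tc == 18 else "BT.2020 SDR"
    return None
```

A receiver that cannot process the SEI message would see only the value 14 and, in this sketch, classify the stream as BT.2020 SDR.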

5.1.7.6 Frame rates

A Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall have one of the following frame rates: 24; 25; 30; 24/1001; 30/1001; 50; 60; 60/1001; 90; 100; 120; 120/1001 Hz.

Selected combinations of frame rates with other source parameters are provided in Annex A.2.2.2a.

In the VUI, the timing information may be present:

– If the timing information is present, i.e. the value of vui_timing_info_present_flag is set to 1, then the values of vui_num_units_in_tick and vui_time_scale shall be set according to the frame rates allowed in this clause. The timing information present in the video Bitstream should be consistent with the timing information signalled at the system level.

– The frame rate shall not change between two RAPs. fixed_frame_rate_flag value, if present, shall be set to 1.

There are no requirements on output timing conformance for H.265/HEVC decoding (Annex C of [6]). The Hypothetical Reference Decoder (HRD) parameters, if present, should be ignored by the Receiver.

A Receiver conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall be capable of decoding and rendering Bitstreams that use frame rates according to the bitstream requirements documented above.

5.1.7.7 Random access point

The same requirements as those defined for 3GPP VR Main H.265/HEVC Operation Point in clause 5.1.5.7 apply.

5.1.7.8 Video and Sequence Parameter Sets

Receivers conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall satisfy the same requirements as those defined for 3GPP VR Main H.265/HEVC Operation Point in clause 5.1.5.8.

5.1.7.9 Video usability information

The same requirements as those defined for 3GPP VR Main H.265/HEVC Operation Point in clause 5.1.5.9 apply.

5.1.7.10 Omni-directional Projection Formats

Bitstreams conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall satisfy the same requirements as those defined for 3GPP VR Main H.265/HEVC Operation Point in clause 5.1.5.10.

5.1.7.11 Restricted Coverage

The same requirements as those defined for 3GPP VR Main H.265/HEVC Operation Point in clause 5.1.5.11 apply.

However, the region-wise packing SEI message (payloadType equal to 155) shall only be used for restricted coverage signalling (in contrast to 3GPP VR Main H.265/HEVC Operation Point, where region-wise packing SEI message is used for signalling both restricted coverage and viewport-optimized content).

When the video does not cover the entire sphere, for each picture, there shall be a region-wise packing SEI message present in the bitstream that applies to the picture. Furthermore, the following restrictions apply:

– num_packed_regions shall be set to 1.

– rwp_guard_band_flag[0] shall be equal to 0.

– rwp_transform_type[0] shall be equal to 0.

– The value of packed_region_width[0] shall be equal to proj_region_width[0].

– The value of packed_region_height[0] shall be equal to proj_region_height[0].
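The restrictions on the region-wise packing SEI message for restricted coverage amount to a single untransformed region without guard bands. The Python sketch below checks them on a message that is assumed to be already parsed into a dictionary, with per-region fields stored as lists; this representation and the function name are hypothetical.

```python
def rwp_restricted_coverage_ok(rwp):
    """Check a parsed region-wise packing SEI message against the restrictions."""
    if rwp.get("num_packed_regions") != 1:
        return False
    return (rwp["rwp_guard_band_flag"][0] == 0
            and rwp["rwp_transform_type"][0] == 0
            # The packed region must have the same size as the projected region.
            and rwp["packed_region_width"][0] == rwp["proj_region_width"][0]
            and rwp["packed_region_height"][0] == rwp["proj_region_height"][0])
```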

5.1.7.12 Frame packing arrangement

Bitstreams and receivers conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall satisfy the same requirements as those defined for 3GPP VR Main H.265/HEVC Operation Point in clause 5.1.5.13.

5.1.7.13 Other VR Metadata

For a Bitstream conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point:

– the equirectangular projection SEI message (payloadType equal to 150) with erp_guard_band_flag not set to 0 shall not be present,

– the sphere rotation SEI message (payloadType equal to 154) shall not be present.

– the region-wise packing SEI message (payloadType equal to 155) not conforming to the restrictions stated in clause 5.1.7.11 shall not be present.

5.1.7.14 Receiver Compatibility

Receivers conforming to the 3GPP VR Main 8K H.265/HEVC Operation Point shall satisfy the same requirements as those defined for 3GPP VR Main H.265/HEVC Operation Point in clause 5.1.5.15.

5.2 Video Media Profiles

5.2.1 Introduction and Overview

This clause defines the media profiles for video. Media profiles include specification on the following:

– Elementary stream constraints based on the video operation points defined in clause 5.1.

– File format encapsulation constraints and signalling including capability signalling. This defines a 3GPP VR Track as defined above.

– DASH Adaptation Set constraints and signalling including capability signalling. This defines a DASH content format profile.

Table 5.2-1 provides an overview of the Media Profiles defined in the remainder of clause 5.2.

Table 5.2-1 Video Media Profiles

| Media Profile | Operation Point | Sample Entry | DASH Integration |
|---|---|---|---|
| Basic Video | Basic H.264/AVC | resv; avc1 | Single Adaptation Set; Single Representation streaming |
| Main Video | Main H.265/HEVC or Main 8K H.265/HEVC | resv; hvc1 | Single or Multiple independent Adaptation Sets offered; Single Representation streaming |
| Advanced Video | Flexible H.265/HEVC | resv; hvc1, hvc2 | Single or Multiple dependent Adaptation Sets offered; Single or Multiple representation streaming |

Note: Advanced Video Profile Receivers are expected to play back content conforming to the Main Video Media Profile.

5.2.2 Basic Video Media Profile

5.2.2.1 Overview

The Basic Video Media Profile permits downloading and streaming elementary streams for VR content generated according to the H.264/AVC Basic Operation Point as defined in clause 5.1.4. This enables reuse of the avc1 sample entry as, for example, also used in the TV Video Profiles in TS 26.116 [12]. It also permits streaming the VR video content in an adaptive manner by offering multiple switchable Representations in a single Adaptation Set in a DASH MPD.

For content generation guidelines for this media profile refer to Annex A.2.3.

5.2.2.2 File Format Signaling and Encapsulation

3GP VR Tracks conforming to this media profile used in the context of the specification shall conform to ISO BMFF [17] with the following further requirements:

– The bitstream included in the track shall comply with the Bitstream requirements and recommendations for the Basic H.264/AVC Operation Point as defined in clause 5.1.4.

– The sample entry type of each sample entry of the track shall be equal to ‘resv’.

– The scheme_type value of SchemeTypeBox in the RestrictedSchemeInfoBox shall be ‘podv’, and all instances of CompatibleSchemeTypeBox defined in ISO/IEC 23090-2 [13] in the same RestrictedSchemeInfoBox shall include at least the scheme_type value ‘erpv’.

– The untransformed sample entry type shall be equal to ‘avc1’.

Note: If a file decoder experiences issues in the playback of the VR Track with the restricted sample ‘resv’, but the application is able to control the rendering according to the VR rendering metadata, then the untransformed sample entry could be used to initialize the decoding process for the file decoder.

– The Track Header Box (‘tkhd’) shall obey the following constraints:

– The width and height fields for a visual track shall specify the track’s visual presentation size as fixed-point 16.16 values expressed on a uniformly sampled grid (commonly called square pixels) of the decoded texture signal.

– The Video Media Header (‘vmhd’) shall obey the following constraints:

– The value of the version field shall be set to ‘0’.

– The value of the graphicsmode field shall be set to ‘0’.

– The value of the opcolor field shall be set to {‘0’, ‘0’, ‘0’}.

– The Sample Description Box (‘stsd’) obeys the following constraints:

– A visual sample entry shall be used.

– The box shall include a NAL Structured Video Parameter Set.

– The width and height fields shall correspond to the cropped horizontal and vertical sample counts provided in the Sequence Parameter Set of the track.

– It shall contain a Decoder Configuration Record which signals the Profile, Level, and other parameters of the video track.

– It shall contain AVCConfigurationBox which signals the Profile, Level, Bit depth, and other parameters conforming to the bitstream constraints specified in clause 5.1.4.

– The Colour Information Box (‘colr’) should be present. If present, it shall signal the colour_primaries, transfer_characteristics and matrix_coeffs applicable to all the bitstreams associated with this sample entry.

– The ProjectionFormatBox with projection_type equal to 0 as defined in ISO/IEC 23090-2 [13] should be present in the sample entry applying to the sample containing the picture.

– It shall not contain the RegionWisePackingBox and StereoVideoBox.

– If the content contained in the Bitstream in the track does not cover the entire sphere, the CoverageInformationBox as defined in ISO/IEC 23090-2 [13] should be present. If present, only a single region may be signaled and the following restrictions apply:

– The coverage_shape_type shall be set to 1.

– The num_regions value shall be set to 1.

– The view_idc_presence_flag shall be set to 0.

– The default_view_idc shall be set to 0.

If 3GP VR Tracks conform to the constraints of this media profile, the ‘3vrb’ ISO brand should be set as a compatible_brand in the File Type Box (‘ftyp’).
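The fixed-point 16.16 representation used for the width and height fields of the Track Header Box stores the integer part in the upper 16 bits and the fractional part in the lower 16 bits of a 32-bit value. A minimal Python sketch of the conversion follows; the function names are hypothetical.

```python
def to_fixed_16_16(value):
    """Encode a non-negative number as an unsigned 16.16 fixed-point integer."""
    fixed = round(value * 65536)
    if not 0 <= fixed <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 16.16 fixed point")
    return fixed


def from_fixed_16_16(fixed):
    """Decode an unsigned 16.16 fixed-point integer to a float."""
    return fixed / 65536
```

For example, a presentation width of 3840 is stored as 0x0F000000.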

5.2.2.3 DASH Integration

5.2.2.3.1 Definition

If all Representations in an Adaptation Set conform to the requirements in clause 5.2.2.3.2 and the Adaptation Set conforms to the requirements in clause 5.2.2.3.3, then the @profiles parameter in the Adaptation Set may signal conformance to this Operation Point by using "urn:3GPP:vrstream:mp:video:basic".

5.2.2.3.2 Additional Restrictions for DASH Representations

If a VR Track conforming to this media profile is included in a DASH Representation, the Representation uses movie fragments and, therefore, the following additional requirements apply:

– The Media Header Box (‘mdhd’) shall obey the following constraints:

– The value of the duration field shall be set to ‘0’.

– The value of the duration field in the Movie Header Box (‘mvhd’) shall be set to a value of ‘0’.

– The Sample Table Box (‘stbl’) shall obey the following constraints:

– The entry_count field of the Sample-to-Chunk Box (‘stsc’) shall be set to ‘0’.

– Both the sample_size and sample_count fields of the Sample Size Box (‘stsz’) shall be set to zero (‘0’). The sample_count field of the Compact Sample Size Box (‘stz2’) shall be set to zero (‘0’). The actual sample size information can be found in the Track Fragment Run Box (‘trun’) for the track.

Note: This is because the Movie Box (‘moov’) contains no media samples.

– The entry_count field of the Chunk Offset Box (‘stco’) shall be set to ‘0’.

– The Track Header Box (‘tkhd’) shall obey the following constraints:

– The value of the duration field shall be set to ‘0’.

– Movie Fragment Header Boxes (‘mfhd’) shall contain sequence_number values that are sequentially numbered starting with the number 1 and incrementing by +1, sequenced by movie fragment storage and presentation order.

– Any Segment Index Box (‘sidx’), if present, shall obey the additional constraints:

– The timescale field shall have the same value as the timescale field in the Media Header Box (‘mdhd’) within the same track; and

– The reference_ID field shall be set to the track_ID of the ISO Media track as defined in the Track Header Box (‘tkhd’).

– The Segment Index shall describe the entire file and only a single Segment Index Box shall be present.

For all Representations in an Adaptation Set, the following shall apply:

– The identical coverage information shall be present on all Representations in one Adaptation Set.

– The frame rates of all Representations in one Adaptation Set shall be identical.

5.2.2.3.3 DASH Adaptation Set Constraints

For a video Adaptation Set, the following constraints apply:

– The @codecs parameter shall be present on Adaptation Set level and shall signal the maximum required capability to decode any Representation in the Adaptation Set. The @codecs parameter should be signalled on the Representation level if different from the one on Adaptation Set level.

– The attributes @maxWidth and @maxHeight shall be present. They are expected to be used to signal the original projected source content format. This means that they may exceed the actual largest size of any coded Representation in one Adaptation Set.

– The @width and @height shall be signalled for each Representation (possibly defaulted on Adaptation Set level) and shall match the values of the maximum width and height in the Sample Description box of the contained Representation.

– The Chroma Format may be signalled. If signalled:

– An Essential or Supplemental Descriptor shall be used to signal the value by setting the @schemeIdURI attribute to urn:mpeg:mpegB:cicp:MatrixCoefficients as defined in ISO/IEC 23001-8 [10] and the @value attribute according to Table 4 of ISO/IEC 23001-8 [10]. The values shall match the values set in the VUI.

– The signalling shall be on Adaptation Set level.

– The Colour Primaries and Transfer Function may be signalled. If signalled:

– An Essential or Supplemental Descriptor shall be used to signal the value by setting the @schemeIdURI attribute to urn:mpeg:mpegB:cicp:ColourPrimaries and urn:mpeg:mpegB:cicp:TransferCharacteristics as defined in ISO/IEC 23001-8 [10] and the @value attribute according to Table 4 of ISO/IEC 23001-8 [10]. The values shall match the values set in the VUI.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.

– The @frameRate should be signalled on Adaptation Set level.

– Random Access Points shall be signalled by @startWithSAP set to 1 or 2.

– A Supplemental Descriptor should be used to signal the projection by setting the @schemeIdURI attribute to urn:mpeg:mpegI:omaf:2017:pf as defined in ISO/IEC 23090-2 [13] and the omaf:@projection_type attribute set to 0.

– If the CoverageInformationBox is present, a Supplemental Descriptor should be used to signal the value by setting the @schemeIdURI attribute to urn:mpeg:mpegI:omaf:2017:cc as defined in ISO/IEC 23090-2 [13] and shall match the information provided in the CoverageInformationBox. Specifically,

– the cc@shape_type shall be present and be set to 1.

– the cc@view_idc_presence_flag shall not be present.

– exactly one cc.CoverageInfo element shall be present.

– any cc.CoverageInfo attribute that is not centre_azimuth, centre_elevation, azimuth_range and elevation_range, shall not be present.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.

– The FramePacking element shall not be present.

– The @profiles parameters may be present to signal the constraints for the Adaptation Set.
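A few of the Adaptation Set constraints above lend themselves to an automated check. The following Python sketch parses a single AdaptationSet element and reports missing mandatory attributes, a forbidden FramePacking element, and Representations without @width and @height. It covers only this subset of the constraints, and the function name, the reported strings and the example values (including the placeholder @codecs value) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MPD_NS = "urn:mpeg:dash:schema:mpd:2011"   # DASH MPD XML namespace


def adaptation_set_issues(xml_text):
    """Return a list of constraint violations found in an AdaptationSet element."""
    aset = ET.fromstring(xml_text)
    issues = []
    for attr in ("codecs", "maxWidth", "maxHeight"):
        if attr not in aset.attrib:
            issues.append(f"missing @{attr}")
    if aset.find(f"{{{MPD_NS}}}FramePacking") is not None:
        issues.append("FramePacking element present")
    for rep in aset.findall(f"{{{MPD_NS}}}Representation"):
        for attr in ("width", "height"):
            # @width and @height may be defaulted on Adaptation Set level.
            if attr not in rep.attrib and attr not in aset.attrib:
                issues.append(f"Representation {rep.get('id')}: missing @{attr}")
    return issues


# Hypothetical example; the codecs string is a placeholder, not a real value.
EXAMPLE = """<AdaptationSet xmlns="urn:mpeg:dash:schema:mpd:2011"
    profiles="urn:3GPP:vrstream:mp:video:basic" codecs="PLACEHOLDER"
    maxWidth="3840" maxHeight="1920" frameRate="30">
  <Representation id="1" width="3840" height="1920" bandwidth="1"/>
</AdaptationSet>"""
```

Calling adaptation_set_issues(EXAMPLE) returns an empty list, since the example carries all attributes that the sketch checks.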

5.2.3 Main Video Media Profile

5.2.3.1 Overview

The Main Video Media Profile permits downloading and streaming elementary streams for VR content generated according to the H.265/HEVC Main Operation Point as defined in clause 5.1.5 or the H.265/HEVC Main 8K Operation Point as defined in clause 5.1.7. This enables reuse of the hvc1 sample entry as, for example, also used in the TV Video Profiles in TS 26.116 [12]. It also permits streaming the VR video content in an adaptive manner by offering multiple switchable Representations in a single Adaptation Set in a DASH MPD. Furthermore, this profile allows multiple Video Adaptation Sets to be offered for the same content, each encoded for a preferred viewport. Multiple Viewpoints may be signaled, for example expressing different types of content or different camera positions.

For content generation guidelines for this media profile refer to Annex A.2.3.2.

5.2.3.2 File Format Signaling and Encapsulation

3GP VR Tracks conforming to this media profile used in the context of the specification shall conform to ISO BMFF [17] with the following further requirements:

– The Bitstream included in the video track shall comply with the Bitstream requirements and recommendations for the Main H.265/HEVC Operation Point as defined in clause 5.1.5 or the Main 8K H.265/HEVC Operation Point as defined in clause 5.1.7, with the following additional constraint:

– the region-wise packing SEI message (payloadType equal to 155), if present in any H.265/HEVC RAP, shall be present in all H.265/HEVC RAPs and shall be identical in all of them.

– The sample entry type of each sample entry of the track shall be equal to ‘resv’.

– The scheme_type value of SchemeTypeBox in the RestrictedSchemeInfoBox shall be ‘podv’, and all instances of CompatibleSchemeTypeBox defined in ISO/IEC 23090-2 [13] in the same RestrictedSchemeInfoBox shall include at least one of the scheme_type values ‘erpv’ and ‘ercm’.

– The untransformed sample entry type shall be equal to ‘hvc1’.

Note: If a file decoder experiences issues in the playback of the VR Track with the restricted sample ‘resv’, but the application is able to control the rendering according to the VR rendering metadata, then the untransformed sample entry could be used to initialize the decoding process for the file decoder.

– The Track Header Box (‘tkhd’) shall obey the following constraints:

– The width and height fields for a visual track shall specify the track’s visual presentation size as fixed-point 16.16 values expressed on a uniformly sampled grid (commonly called square pixels) of the decoded texture signal.

– The Video Media Header (‘vmhd’) shall obey the following constraints:

– The value of the version field shall be set to ‘0’.

– The value of the graphicsmode field shall be set to ‘0’.

– The value of the opcolor field shall be set to {‘0’, ‘0’, ‘0’}.

– The Sample Description Box (‘stsd’) obeys the following constraints:

– A visual sample entry shall be used.

– The box shall include at least one Sequence Parameter Set NAL unit.

– The width and height fields shall correspond to the cropped horizontal and vertical sample counts provided in the Sequence Parameter Set of the track.

– It shall contain a Decoder Configuration Record which signals the Profile, Level, and other parameters of the video track.

– The Colour Information Box (‘colr’) should be present. If present, it shall signal the colour_primaries, transfer_characteristics and matrix_coeffs applicable to all the bitstreams associated with this sample entry.

– The ProjectionFormatBox with projection_type equal to 0 as defined in ISO/IEC 23090-2 [13] shall be present in the sample entry applying to the sample containing the picture.

– If the content contained in the Bitstream in the track does not cover the entire sphere, the CoverageInformationBox as defined in ISO/IEC 23090-2 [13] should be present. If present, only a single region may be signaled and the following restrictions apply:

– The coverage_shape_type shall be set to 1, i.e. the sphere region is specified by two azimuth circles and two elevation circles.

– The num_regions value shall be set to 1.

– The view_idc_presence_flag shall be set to 0.

– The default_view_idc shall be set to 0 or 3.

– If the content contained in the Bitstream in the track includes the region-wise packing SEI message (payloadType equal to 155), then the RegionWisePackingBox as defined in ISO/IEC 23090-2 [17] shall be present. It shall signal the same information that is included in the region-wise packing SEI message(s) in the elementary stream.

– If the content contained in the Bitstream in the track includes the frame packing arrangement SEI message (payloadType equal to 45) in the video stream, the StereoVideoBox shall be present in the sample entry applying to the sample containing the picture. When StereoVideoBox is present, it shall signal the frame packing format that is included in the frame packing arrangement SEI message(s) in the elementary stream.

If 3GP VR Tracks conform to the constraints of this media profile, the ‘3vrm’ ISO brand should be set as a compatible_brand in the File Type Box (‘ftyp’).
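The fixed-point 16.16 representation required above for the width and height fields of the Track Header Box can be illustrated with a minimal Python sketch. The function names and the 3840 x 1920 texture size are assumptions chosen for illustration only; they are not defined by the present document.

```python
def to_fixed_16_16(pixels: float) -> int:
    """Encode a non-negative size as an unsigned 32-bit 16.16 fixed-point value."""
    value = round(pixels * 65536)  # 16 fractional bits
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("size does not fit in 16.16 fixed point")
    return value


def from_fixed_16_16(value: int) -> float:
    """Decode a 16.16 fixed-point value back to a size in square pixels."""
    return value / 65536


# Hypothetical equirectangular texture of 3840 x 1920 samples.
width_field = to_fixed_16_16(3840)
height_field = to_fixed_16_16(1920)
```

The integer part occupies the upper 16 bits, so a whole-pixel size is simply shifted left by 16 bits.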

5.2.3.3 DASH Integration

5.2.3.3.1 Definition

If all Representations in an Adaptation Set conform to the requirements in clause 5.2.3.3.2 and the Adaptation Set conforms to the requirements in clause 5.2.3.3.3, then the @profiles parameter in the Adaptation Set may signal conformance to this Operation Point by using "urn:3GPP:vrstream:mp:video:main".

Clause 5.2.3.3.4 defines Adaptation Set Ensembles for viewport-optimized offering.

5.2.3.3.2 Additional Restrictions for DASH Representations

If a VR Track conforming to this media profile is included in a DASH Representation, the Representation uses movie fragments and, therefore, the following additional requirements apply:

– The Media Header Box (‘mdhd’) shall obey the following constraints:

– The value of the duration field shall be set to ‘0’.

– The value of the duration field in the Movie Header Box (‘mvhd’) shall be set to a value of ‘0’.

– The Sample Table Box (‘stbl’) shall obey the following constraints:

– The entry_count field of the Sample-to-Chunk Box (‘stsc’) shall be set to ‘0’.

– Both the sample_size and sample_count fields of the Sample Size Box (‘stsz’) shall be set to zero (‘0’). The sample_count field of the Compact Sample Size Box (‘stz2’) shall be set to zero (‘0’). The actual sample size information can be found in the Track Fragment Run Box (‘trun’) for the track.

NOTE: This is because the Movie Box (‘moov’) contains no media samples.

– The entry_count field of the Chunk Offset Box (‘stco’) shall be set to ‘0’.

– The Track Header Box (‘tkhd’) shall obey the following constraints:

– The value of the duration field shall be set to ‘0’.

– Movie Fragment Header Boxes (‘mfhd’) shall contain sequence_number values that are sequentially numbered starting with the number 1 and incrementing by +1, sequenced by movie fragment storage and presentation order.

– Any Segment Index Box (‘sidx’), if present, shall obey the additional constraints:

– The timescale field shall have the same value as the timescale field in the Media Header Box (‘mdhd’) within the same track; and

– The reference_ID field shall be set to the track_ID of the ISO Media track as defined in the Track Header Box (‘tkhd’).

– The Segment Index shall describe the entire file and only a single Segment Index Box shall be present.
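The sequence_number rule for Movie Fragment Header Boxes stated above can be checked with a short Python sketch. It assumes the sequence_number values have already been extracted from the ‘mfhd’ boxes in storage order; the function name is hypothetical.

```python
from typing import Iterable


def sequence_numbers_are_sequential(sequence_numbers: Iterable[int]) -> bool:
    """Return True if the values are 1, 2, 3, ... in the given storage order."""
    return all(
        actual == expected
        for expected, actual in enumerate(sequence_numbers, start=1)
    )


# Hypothetical values read from consecutive 'mfhd' boxes.
conforming = sequence_numbers_are_sequential([1, 2, 3, 4])
gap_detected = not sequence_numbers_are_sequential([1, 2, 4])
```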

5.2.3.3.3 DASH Adaptation Set Constraints

For all Representations in an Adaptation Set, the following shall apply:

– The identical coverage information shall be present on all Representations in one Adaptation Set, both on ISO BMFF and elementary stream level.

– The frame rates of all Representations in one Adaptation Set shall be identical.

– The identical region-wise packing information shall be present on all Representations in one Adaptation Set, both on ISO BMFF and elementary stream level.

– The identical stereoscopic information shall be present on all Representations in one Adaptation Set, both on ISO BMFF and elementary stream level.

For an Adaptation Set, the following constraints apply:

– The @codecs parameter shall be present on Adaptation Set level and shall signal the maximum required capability to decode any Representation in the Adaptation Set. The @codecs parameter should be signalled on the Representation level if different from the one on Adaptation Set level.

– The attributes @maxWidth and @maxHeight shall be present. They are expected to be used to signal the format used prior to encoding. This means that they may exceed the actual largest size of any coded Representation in one Adaptation Set.

– The @width and @height shall be signalled for each Representation (possibly defaulted on Adaptation Set level) and shall match the values of the maximum width and height in the Sample Description box of the contained Representation.

– The Chroma Format may be signalled. If signalled:

– An Essential or Supplemental Descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegB:cicp:MatrixCoefficients as defined in ISO/IEC 23001-8 [10] and the @value attribute according to Table 4 of ISO/IEC 23001-8 [10]. The values shall match the values set in the VUI.

– The signalling shall be on Adaptation Set level.

– The Colour Primaries and Transfer Function may be signalled. If signalled:

– An Essential or Supplemental Descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegB:cicp:ColourPrimaries and urn:mpeg:mpegB:cicp:TransferCharacteristics as defined in ISO/IEC 23001-8 [10] and the @value attribute according to Table 4 of ISO/IEC 23001-8 [10]. The values shall match the values set in the VUI.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.

– The @frameRate shall be signalled on Adaptation Set level.

– Random Access Points shall be signalled by @startsWithSAP set to 1 or 2.

– A Supplemental Descriptor should be used to signal the projection by setting the @schemeIdUri attribute to urn:mpeg:mpegI:omaf:2017:pf as defined in ISO/IEC 23090-2 [13] and the omaf:@projection_type attribute set to 0.

– If the CoverageInformationBox is present then the Coverage information should be signaled on Adaptation Set. If signalled:

– A Supplemental Descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegI:omaf:2017:cc as defined in ISO/IEC 23090-2 [13] and shall match the information provided in the CoverageInformationBox. Specifically:

– The cc@shape_type shall be present and be set to 1.

– The cc@view_idc_presence_flag shall not be present.

– Exactly one cc.CoverageInfo element shall be present.

– Any cc.CoverageInfo attribute that is not centre_azimuth, centre_elevation, azimuth_range and elevation_range, shall not be present.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.

– If the StereoVideoBox is present then the stereo information should be signaled on Adaptation Set. If signalled:

– A FramePacking descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegB:cicp:VideoFramePackingType as defined in ISO/IEC 23001-8 [10] and the @value attribute shall be set to 4.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.
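As an illustration of the Adaptation Set constraints above, the following Python sketch builds a minimal Adaptation Set element with the standard library. The attribute names follow the DASH MPD schema, but the codec string, resolution, frame rate and bandwidth are illustrative placeholders, and the MPD namespace declaration and the OMAF projection and coverage descriptors are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_adaptation_set() -> ET.Element:
    """Build a stereoscopic Adaptation Set signalling the Main video media profile."""
    aset = ET.Element("AdaptationSet", {
        "profiles": "urn:3GPP:vrstream:mp:video:main",
        "codecs": "hvc1.1.6.L153.B0",   # placeholder codec string
        "maxWidth": "3840",             # placeholder source format
        "maxHeight": "1920",
        "frameRate": "60",
        "startsWithSAP": "1",
    })
    # Stereo signalling: FramePacking descriptor with @value set to 4.
    ET.SubElement(aset, "FramePacking", {
        "schemeIdUri": "urn:mpeg:mpegB:cicp:VideoFramePackingType",
        "value": "4",
    })
    # One Representation carrying @width and @height (placeholder values).
    ET.SubElement(aset, "Representation", {
        "id": "1", "width": "3840", "height": "1920", "bandwidth": "20000000",
    })
    return aset


xml_text = ET.tostring(build_adaptation_set(), encoding="unicode")
```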

5.2.3.3.4 Adaptation Set Ensembles for Viewport-Optimized offering

5.2.3.3.4.1 Introduction

If multiple Adaptation Sets are offered for the same content in order to permit seamless switching across Representations for different Viewports, each offered in a different Adaptation Set, then this forms an Ensemble of Adaptation Sets. Note that switching across Viewports is not a DASH client functionality, but it is enabled by possible access to the pose and/or viewport information by the DASH client using the 3GPP VR API as shown in Figure 4.6.

5.2.3.3.4.2 Definition and Adaptation Set Signalling

An Ensemble is defined by Adaptation Sets with a Viewpoint Descriptor for which the value of @schemeIdUri is prefixed with urn:3GPP:vrstream:ve and the actual value is urn:3GPP:vrstream:ve:<id>, where <id> is an unsigned integer that is identical for all Adaptation Sets in one Ensemble. By using different ids, multiple Ensembles may be defined, each defining different content (for example, different camera angles). The value of @value of the descriptor, if present, is either:

– a single unsigned integer value that is different for each Adaptation Set in the Ensemble. If this is present, then the spherical region-wise quality ranking (SRQR) descriptor for which the value of @schemeIdUri is prefixed with urn:mpeg:mpegI:omaf:2017:srqr shall be present in each Adaptation Set, or

– a tuple of integer values, separated by white space. The semantics and order are as follows:

– centre_azimuth: Specifies the azimuth of the centre point of the sphere region in units of 2^-16 degrees relative to the 3GPP coordinate system for which this Ensemble has been optimized.

– centre_elevation: Specifies the elevation of the centre point of the sphere region in units of 2^-16 degrees relative to the 3GPP coordinate system for which this Ensemble has been optimized.

The spherical region-wise quality ranking (SRQR) descriptor for which the value of @schemeIdUri is prefixed with urn:mpeg:mpegI:omaf:2017:srqr may additionally be present to provide additional information.

If the @value attribute is not present, then this Adaptation Set is not optimized for any Viewport. At most one Adaptation Set without the @value attribute shall be present.

One Adaptation Set of one Ensemble shall be signalled as the main content. Signaling as main content shall be done by using the Role descriptor with @schemeIdUri="urn:mpeg:dash:role:2011" and @value="main". If for the main Ensemble an Adaptation Set is present for which the @value of the Viewpoint descriptor is not present, then this should be signalled as the main Adaptation Set.

The content should be offered such that, within an Ensemble, if multiple Adaptation Sets with different centre points are signalled, the one with the minimum squared distance to the actual Viewport centre is preferred.
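The preference rule above can be sketched in Python for the case where @value of the Viewpoint descriptor carries the centre_azimuth and centre_elevation tuple. The present document does not state in which space the distance is measured; this sketch assumes a squared difference in degrees with azimuth wrap-around at 360 degrees, and all function names and example values are hypothetical.

```python
from typing import Dict, Tuple

UNITS_PER_DEGREE = 1 << 16  # centre values are signalled in units of 2^-16 degrees


def parse_centre(value: str) -> Tuple[float, float]:
    """Parse '<centre_azimuth> <centre_elevation>' into degrees."""
    azimuth, elevation = (int(token) for token in value.split())
    return azimuth / UNITS_PER_DEGREE, elevation / UNITS_PER_DEGREE


def squared_distance(centre: Tuple[float, float],
                     viewport: Tuple[float, float]) -> float:
    """Squared angular offset in degrees (assumed metric, azimuth wraps)."""
    d_az = abs(centre[0] - viewport[0]) % 360.0
    d_az = min(d_az, 360.0 - d_az)
    d_el = centre[1] - viewport[1]
    return d_az * d_az + d_el * d_el


def pick_adaptation_set(values: Dict[str, str],
                        viewport: Tuple[float, float]) -> str:
    """Return the id of the Adaptation Set whose centre is closest to the viewport."""
    return min(values,
               key=lambda k: squared_distance(parse_centre(values[k]), viewport))


# Hypothetical Ensemble with three viewport-optimized Adaptation Sets.
ensemble = {
    "front": "0 0",
    "left": f"{90 * UNITS_PER_DEGREE} 0",
    "back": f"{180 * UNITS_PER_DEGREE} 0",
}
chosen = pick_adaptation_set(ensemble, viewport=(170.0, 5.0))
```

With these example values the Adaptation Set centred at 180 degrees azimuth is selected, also for a viewport at -170 degrees azimuth because of the wrap-around.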

5.2.3.3.4.3 Representation Constraints in an Ensemble

For all Representations in an Ensemble, the following shall apply:

– The identical coverage information shall be present on all Representations in one Ensemble, both on ISO BMFF and elementary stream level.

– The frame rates of all Representations in one Ensemble shall be identical.

– The identical stereoscopic information shall be present on all Representations in one Ensemble, both on ISO BMFF and elementary stream level.

5.2.3.3.4.4 Adaptation Set Constraints in an Ensemble

For all Adaptation Sets in an Ensemble, the following shall apply:

– The @codecs parameter shall be identical for all Adaptation Sets in one Ensemble.

– The Chroma Format shall be identical for all Adaptation Sets in one Ensemble.

– The Colour Primaries and Transfer Function shall be identical for all Adaptation Sets in one Ensemble.

– The @frameRate shall be identical for all Adaptation Sets in one Ensemble.

– Segments and subsegments shall be aligned, i.e. @segmentAlignment or @subsegmentAlignment shall be present and shall signal the same unsigned integer value for all Adaptation Sets in an Ensemble.

– Coverage information shall be identical for all Adaptation Sets in one Ensemble.
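The Ensemble constraints above lend themselves to a simple consistency check. The following Python sketch groups Adaptation Sets by the <id> carried in the Viewpoint scheme identifier and reports attributes that differ within an Ensemble. The dictionary layout and function names are assumptions made for illustration, and only three of the listed attributes are checked.

```python
from collections import defaultdict
from typing import Dict, List

ENSEMBLE_PREFIX = "urn:3GPP:vrstream:ve:"
CHECKED = ("codecs", "frameRate", "segmentAlignment")


def ensemble_id(viewpoint_scheme: str) -> int:
    """Extract the unsigned integer <id> from urn:3GPP:vrstream:ve:<id>."""
    suffix = viewpoint_scheme[len(ENSEMBLE_PREFIX):]
    if not viewpoint_scheme.startswith(ENSEMBLE_PREFIX) or not suffix.isdigit():
        raise ValueError("not an Ensemble Viewpoint scheme")
    return int(suffix)


def inconsistent_attributes(adaptation_sets: List[Dict[str, str]]) -> Dict[int, List[str]]:
    """Map each Ensemble id to the checked attributes that are not identical."""
    groups = defaultdict(list)
    for aset in adaptation_sets:
        groups[ensemble_id(aset["viewpoint_scheme"])].append(aset)
    problems = {}
    for eid, members in groups.items():
        bad = [name for name in CHECKED
               if len({m.get(name) for m in members}) > 1]
        if bad:
            problems[eid] = bad
    return problems


# Hypothetical MPD summary: Ensemble 1 mixes two frame rates.
example_sets = [
    {"viewpoint_scheme": ENSEMBLE_PREFIX + "1", "codecs": "hvc1.1.6.L153.B0",
     "frameRate": "60", "segmentAlignment": "1"},
    {"viewpoint_scheme": ENSEMBLE_PREFIX + "1", "codecs": "hvc1.1.6.L153.B0",
     "frameRate": "30", "segmentAlignment": "1"},
    {"viewpoint_scheme": ENSEMBLE_PREFIX + "2", "codecs": "hvc1.1.6.L153.B0",
     "frameRate": "60", "segmentAlignment": "1"},
]
report = inconsistent_attributes(example_sets)
```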

5.2.4 Advanced Video Media Profile

5.2.4.1 Overview

This Profile permits download and streaming of elementary streams for VR content generated according to the Flexible H.265/HEVC operation point as defined in clause 5.1.6. It also allows unconstrained use of rectangular region-wise packing, and supports monoscopic and stereoscopic spherical video up to 360 degrees. With region-wise packing, the resolution or quality of the omnidirectional video can be emphasized in certain regions, e.g. according to the user’s viewing orientation. In addition, the untransformed sample entry type ‘hvc2’ is allowed, making it possible to use extractors and obtain a conforming HEVC bitstream when tile-based streaming is used.

5.2.4.2 File Format Signaling and Encapsulation

When a track is the only track in a file, compatible_brands containing a brand equal to ‘3vra’ in FileTypeBox indicates that the track conforms to this media profile. When a file contains multiple tracks, compatible_brands containing a brand equal to ‘3vra’ in FileTypeBox indicates that at least one of the tracks conforms to this media profile.

– The video track shall be indicated to conform to this media profile through one or both of FileTypeBox and TrackTypeBox.

– At least one sample entry type of each sample entry of the track shall be equal to ‘resv’.

– The scheme_type value of SchemeTypeBox in the RestrictedSchemeInfoBox shall be ‘podv’, and all instances of CompatibleSchemeTypeBox defined in ISO/IEC 23090-2 [13] in the same RestrictedSchemeInfoBox shall include at least one of the scheme_type values ‘erpv’ and ‘ercm’.

– The untransformed sample entry type shall be equal to ‘hvc1’ or ‘hvc2’.

– When the untransformed sample entry type is ‘hvc2’, the track shall include one or more ‘scal’ track references.

– LHEVCConfigurationBox shall not be present in VisualSampleEntry.

– HEVCConfigurationBox in VisualSampleEntry shall be added such that it does not contradict the Bitstream requirements of the Flexible H.265/HEVC operation point in clause 5.1.6.

– The track_not_intended_for_presentation_alone flag of the TrackHeaderBox may be used to indicate that a track is not intended to be presented alone.

– The Track Header Box (‘tkhd’) shall obey the following constraints:

– The width and height fields for a visual track shall specify the track’s visual presentation size as fixed-point 16.16 values expressed on a uniformly sampled grid (commonly called square pixels) of the decoded texture signal.

– The Video Media Header (‘vmhd’) shall obey the following constraints:

– The value of the version field shall be set to ‘0’.

– The value of the graphicsmode field shall be set to ‘0’.

– The value of the opcolor field shall be set to {‘0’, ‘0’, ‘0’}.

– The Sample Description Box (‘stsd’) obeys the following constraints:

– A visual sample entry shall be used.

– The box shall include a NAL Structured Video Parameter Set.

– The width and height fields shall correspond to the cropped horizontal and vertical sample counts provided in the Sequence Parameter Set of the track.

– It shall contain a Decoder Configuration Record which signals the Profile, Level, and other parameters of the video track.

– The Colour Information Box (‘colr’) should be present. If present, it shall signal the colour_primaries, transfer_characteristics and matrix_coeffs applicable to all the bitstreams associated with this sample entry.

– A ProjectionFormatBox as defined in ISO/IEC 23090-2 [13] shall be present in the sample entry with projection_type equal to 0 or 1.

– If the content contained in the Bitstream in the track does not cover the entire sphere, the CoverageInformationBox as defined in ISO/IEC 23090-2 [13] should be present.

– If the video content contained in the Bitstream in the track is a subset of the entire video content carried in the file and the CoverageInformationBox as defined in ISO/IEC 23090-2 [13] is present, the following restrictions apply:

– If the equirectangular projection is used, then:

– The coverage_shape_type shall be set to 1, i.e. the sphere region is specified by two azimuth circles and two elevation circles.

– The num_regions value shall be set to 1.

– If the cubemap projection is used, then one of the two following options applies:

a) The coverage_shape_type shall be set to 1, i.e. the sphere region is specified by two azimuth circles and two elevation circles and the num_regions value shall be set to 1, or

b) The coverage_shape_type shall be set to 0, i.e. the sphere region is specified by four great circles.

– The view_idc_presence_flag shall be set to 0.

– The default_view_idc shall be set to 0 or 3.

– If the content contained in the Bitstream in the track includes the region-wise packing SEI message (payloadType equal to 155), then the RegionWisePackingBox as defined in ISO/IEC 23090-2 [17] shall be present. It shall signal the same information that is included in the region-wise packing SEI message(s) in the elementary stream.

– If the content contained in the Bitstream in the track includes the frame packing arrangement SEI message (payloadType equal to 45) in the video stream, the StereoVideoBox shall be present in the sample entry applying to the sample containing the picture. When StereoVideoBox is present, it shall signal the frame packing format that is included in the frame packing arrangement SEI message(s) in the elementary stream.
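The ‘3vra’ brand signalling described at the start of this clause can be checked by reading the compatible_brands list of the File Type Box. The Python sketch below assumes a complete ‘ftyp’ box with a 32-bit size field has already been read into memory; the helper names and the ‘isom’ major brand are illustrative only.

```python
import struct
from typing import List


def compatible_brands(ftyp_box: bytes) -> List[str]:
    """Return the compatible_brands of a complete 'ftyp' box."""
    size, box_type = struct.unpack(">I4s", ftyp_box[:8])
    if box_type != b"ftyp" or size != len(ftyp_box):
        raise ValueError("not a complete 'ftyp' box with 32-bit size")
    # Skip header (8 bytes), major_brand (4) and minor_version (4).
    brands = ftyp_box[16:]
    return [brands[i:i + 4].decode("ascii") for i in range(0, len(brands), 4)]


def make_ftyp(major: bytes, minor: int, compat: List[bytes]) -> bytes:
    """Assemble an 'ftyp' box for the example below."""
    payload = major + struct.pack(">I", minor) + b"".join(compat)
    return struct.pack(">I4s", 8 + len(payload), b"ftyp") + payload


# Hypothetical file that lists the Advanced Video Media Profile brand.
example_box = make_ftyp(b"isom", 0, [b"isom", b"3vra"])
has_advanced_profile = "3vra" in compatible_brands(example_box)
```

For a file with multiple tracks, a positive result only indicates that at least one track conforms; the TrackTypeBox of each track would still need to be inspected.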

5.2.4.3 DASH Integration

5.2.4.3.1 Definition

If all Representations in an Adaptation Set conform to the requirements in clause 5.2.4.3.2 and the Adaptation Set conforms to the requirements in clause 5.2.4.3.3, then the @profiles parameter in the Adaptation Set may signal conformance to this Operation Point by using "urn:3GPP:vrstream:mp:video:advanced".

5.2.4.3.2 Additional Restrictions for DASH Representations

If a VR Track conforming to this media profile is included in a DASH Representation, the Representation uses movie fragments and, therefore, the following additional requirements apply:

– The value of the duration field in the Media Header Box (‘mdhd’) shall be set to a value of ‘0’.

– The value of the duration field in the Movie Header Box (‘mvhd’) shall be set to a value of ‘0’.

– The value of the duration field in the Track Header Box (‘tkhd’) shall be set to a value of ‘0’.

– Movie Fragment Header Boxes (‘mfhd’) may contain sequence_number values that are not sequentially numbered.

– Any Segment Index Box (‘sidx’), if present, shall obey the additional constraints:

– the timescale field shall have the same value as the timescale field in the Media Header Box (‘mdhd’) within the same track;

– the reference_ID field shall be set to the track_ID of the ISO Media track as defined in the Track Header Box (‘tkhd’).

– The Sample Table Box (‘stbl’) shall obey the following constraints:

– The entry_count field of the Sample-to-Chunk Box (‘stsc’) shall be set to ‘0’.

– Both the sample_size and sample_count fields of the Sample Size Box (‘stsz’) shall be set to zero (‘0’). The sample_count field of the Compact Sample Size Box (‘stz2’) shall be set to zero (‘0’). The actual sample size information can be found in the Track Fragment Run Box (‘trun’) for the track.

NOTE 1: This is because the Movie Box (‘moov’) contains no media samples.

– The entry_count field of the Chunk Offset Box (‘stco’) shall be set to ‘0’.

– The same projection format shall be used on all Representations in one Adaptation Set.

– The same frame packing format shall be used on all Representations in one Adaptation Set.

– The same coverage information shall be used on all Representations in one Adaptation Set.

– The same spatial resolution shall be used on all Representations in one Adaptation Set.

– When @dependencyId is used, the values of profiles of the respective dependent and complementary Representations shall be the same.

When the MPD contains a Representation with a track for which the untransformed sample entry type is equal to ‘hvc2’, the following applies:

– Either the Representations carrying a track conforming to the media profile track constraints with the untransformed sample entry type equal to ‘hvc2’ shall contain @dependencyId listing all dependent Representations that carry a track conforming to the media profile track constraints with the untransformed sample entry type equal to ‘hvc1’ or a Preselection property descriptor shall be present and constrained as follows:

– The Main Adaptation Set shall contain a Representation carrying a track conforming to the media profile track constraints with the untransformed sample entry type equal to ‘hvc2’.

– The Partial Adaptation Sets shall contain Representations each carrying a track conforming to the media profile track constraints with the untransformed sample entry type equal to ‘hvc1’.

NOTE 2: When using the Preselection property descriptor, the number of Representations for carrying tracks with the untransformed sample entry type equal to ‘hvc2’ is typically smaller than when using @dependencyId. However, the use of @dependencyId might be needed for encrypted video tracks.

– The Initialization Segment of the Representation that contains @dependencyId or belongs to the Main Adaptation Set is constrained as follows:

– Tracks conform to the media profile track constraints.

– The track corresponding to the untransformed sample entry type equal to ‘hvc2’ refers to the tracks indicated in the TrackReferenceBox of the Initialization Segment.

NOTE 3: When Preselection is used, the sequence_number integer values are not required to be processed and therefore the concatenation of the Subsegments (of the different Representations of the Adaptation Sets of a Preselection) in any order results in a conforming file.

NOTE 4: The conforming Segment sequence formed on the basis of the Preselection property descriptor or by resolving @dependencyId attribute(s) as specified in ISO/IEC 23009-1 [18] and the track_ID value of the track with the untransformed sample entry type equal to ‘hvc2’ produces the HEVC bitstream which conforms to H.265/HEVC Flexible Operation Point.
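For the @dependencyId option described above, the following Python sketch checks that each Representation carrying an ‘hvc2’ track lists all Representations it depends on. The dictionary layout is an assumption: the "referenced" entry is a hypothetical, pre-computed list of the Representations whose ‘hvc1’ tracks the extractor track refers to, and "dependencyId" holds the white-space-separated attribute value.

```python
from typing import Dict, Set


def missing_dependency_ids(representations: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map each 'hvc2' Representation id to the dependencies it fails to list."""
    missing = {}
    for rep_id, rep in representations.items():
        if rep["sample_entry"] != "hvc2":
            continue
        listed = set(rep.get("dependencyId", "").split())
        gap = set(rep["referenced"]) - listed
        if gap:
            missing[rep_id] = gap
    return missing


# Hypothetical MPD summary with two tile tracks and two extractor tracks.
example_mpd = {
    "tile1": {"sample_entry": "hvc1"},
    "tile2": {"sample_entry": "hvc1"},
    "ext_ok": {"sample_entry": "hvc2", "dependencyId": "tile1 tile2",
               "referenced": ["tile1", "tile2"]},
    "ext_bad": {"sample_entry": "hvc2", "dependencyId": "tile1",
                "referenced": ["tile1", "tile2"]},
}
gaps = missing_dependency_ids(example_mpd)
```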

When switching or accessing Representations at each segment or subsegment is relevant, the following DASH profiles include sufficient constraints:

– ISO Base Media File Format Live profile: urn:mpeg:dash:profile:isoff-live:2011

– ISO Base Media File Format Main profile: urn:mpeg:dash:profile:isoff-main:2011

When low latency considerations are relevant, the following DASH profiles provide tools to support efficient low latency services:

– ISO Base Media File Format On Demand profile: urn:mpeg:dash:profile:isoff-on-demand:2011

– ISO Base Media File Format Broadcast TV profile: urn:mpeg:dash:profile:isoff-broadcast:2015

5.2.4.3.3 DASH Adaptation Set Constraints

For all Representations in an Adaptation Set, the following shall apply:

– The identical coverage information shall be present on all Representations in one Adaptation Set, both on ISO BMFF and elementary stream level.

– The frame rates of all Representations in one Adaptation Set shall be identical.

– The identical region-wise packing information shall be present on all Representations in one Adaptation Set, both on ISO BMFF and elementary stream level.

– The identical stereoscopic information shall be present on all Representations in one Adaptation Set, both on ISO BMFF and elementary stream level.

For an Adaptation Set, the following constraints apply:

– The @codecs parameter shall be present on Adaptation Set level and shall signal the maximum required capability to decode any Representation in the Adaptation Set. The @codecs parameter should be signalled on the Representation level if different from the one on Adaptation Set level.

– The attributes @maxWidth and @maxHeight shall be present. They are expected to be used to signal the decoded texture format of the original signal. This means that they may exceed the actual largest size of any coded Representation in one Adaptation Set.

– The @width and @height shall be signalled for each Representation (possibly defaulted on Adaptation Set level) and shall match the values of the maximum width and height in the Sample Description box of the contained Representation.

– The Chroma Format may be signalled. If signalled:

– An Essential or Supplemental Descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegB:cicp:MatrixCoefficients as defined in ISO/IEC 23001-8 [10] and the @value attribute according to Table 4 of ISO/IEC 23001-8 [10]. The values shall match the values set in the VUI.

– The signalling shall be on Adaptation Set level.

– The Colour Primaries and Transfer Function may be signalled. If signalled:

– An Essential or Supplemental Descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegB:cicp:ColourPrimaries and urn:mpeg:mpegB:cicp:TransferCharacteristics as defined in ISO/IEC 23001-8 [10] and the @value attribute according to Table 4 of ISO/IEC 23001-8 [10]. The values shall match the values set in the VUI.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.

– The @frameRate shall be signalled on Adaptation Set level.

– Random Access Points shall be signalled by @startsWithSAP set to 1 or 2.

– An Essential Descriptor shall be used to signal the projection by setting the @schemeIdUri attribute to urn:mpeg:mpegI:omaf:2017:pf as defined in ISO/IEC 23090-2 [13] and the omaf:@projection_type attribute set to 0 or 1.

– If the CoverageInformationBox is present, then the Coverage information should be signaled on Adaptation Set. If signalled:

– A Supplemental Descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegI:omaf:2017:cc as defined in ISO/IEC 23090-2 [13] and shall match the information provided in the CoverageInformationBox. Specifically:

– the cc@shape_type shall be present and be set to 0 or 1.

– the cc@view_idc_presence_flag shall not be present.

– exactly one cc.CoverageInfo element shall be present.

– any cc.CoverageInfo attribute that is not centre_azimuth, centre_elevation, azimuth_range and elevation_range, shall not be present.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.

– If the StereoVideoBox is present, then the stereo information should be signaled on Adaptation Set. If signalled:

– A FramePacking descriptor shall be used to signal the value by setting the @schemeIdUri attribute to urn:mpeg:mpegB:cicp:VideoFramePackingType as defined in ISO/IEC 23001-8 [10] and the @value attribute shall be set to 4.

– The signalling shall be on Adaptation Set level only, i.e. the value shall not be different for different Representations in one Adaptation Set.

– The following applies for the use of @mimeType:

– @mimeType of the Main Adaptation Set shall include the profiles parameter ‘3vra’.

– When Preselection is used, the value of profiles of the main Adaptation Set shall be the same as the value of profiles of its partial Adaptation Sets.

– When Preselection is used, the following applies:

– The value of @subsegmentAlignment in the Main Adaptation Set shall be an unsigned integer and equal to the value of @subsegmentAlignment of each associated Partial Adaptation Set.

– The value of @segmentAlignment in the Main Adaptation Set shall be an unsigned integer and equal to the value of @segmentAlignment of each associated Partial Adaptation Set.

5.2.4.3.4 Adaptation Set Constraints for Viewport Selection

If multiple Adaptation Sets are offered for the same content which have emphasized quality regions for different Viewports, then, in order to provide signalling information for switching across Viewports, the spherical region-wise quality ranking (SRQR) descriptor for which the value of @schemeIdUri is prefixed with urn:mpeg:mpegI:omaf:2017:srqr shall be present in each Adaptation Set with the following restrictions:

– The sphRegionQuality@view_idc_presence_flag shall be set to 0.

– The sphRegionQuality@default_view_idc shall be set to 0 or 3.

– The value of sphRegionQuality.qualityInfo@quality_ranking shall be greater than 0.
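The three SRQR restrictions above can be expressed as a small validation routine. This Python sketch assumes the descriptor has already been parsed into a dictionary; the key names mirror the attribute names in the list above, while the "quality_rankings" list and the function name are hypothetical.

```python
from typing import Dict, List


def srqr_violations(descriptor: Dict) -> List[str]:
    """Return a list of restrictions that a parsed SRQR descriptor violates."""
    problems = []
    if descriptor.get("view_idc_presence_flag") != 0:
        problems.append("view_idc_presence_flag shall be 0")
    if descriptor.get("default_view_idc") not in (0, 3):
        problems.append("default_view_idc shall be 0 or 3")
    if any(rank <= 0 for rank in descriptor.get("quality_rankings", [])):
        problems.append("quality_ranking shall be greater than 0")
    return problems


# Hypothetical parsed descriptors.
good = {"view_idc_presence_flag": 0, "default_view_idc": 3,
        "quality_rankings": [1, 2]}
bad = {"view_idc_presence_flag": 0, "default_view_idc": 1,
       "quality_rankings": [0, 2]}
good_report = srqr_violations(good)
bad_report = srqr_violations(bad)
```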

For all Representations in multiple Adaptation Sets for switching across Viewports, the following shall apply:

– The identical coverage information shall be present on all Representations, both on ISO BMFF and elementary stream level.

– The frame rates of all Representations in Adaptation Sets shall be identical.

– The identical stereoscopic information shall be present on all Representations, both on ISO BMFF and elementary stream level.

For all Adaptation Sets with SRQR descriptors for switching across Viewports, the following shall apply:

– The @codecs parameter shall be identical for all Adaptation Sets.

– The Chroma Format shall be identical for all Adaptation Sets.

– The Colour Primaries and Transfer Function shall be identical for all Adaptation Sets.

– The @frameRate shall be identical for all Adaptation Sets.

– Segments and subsegments shall be aligned, i.e. @segmentAlignment or @subsegmentAlignment shall be present and shall signal the same unsigned integer value for all Adaptation Sets.

– Coverage information shall be identical for all Adaptation Sets.