Y.6 Media configuration

3GPP TS 26.114: IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction (Release 18)

Y.6.1 General

Based on the architecture described in clause Y.2, this clause presents an SDP framework for immersive video and immersive voice/audio exchange for ITT4RT, used to negotiate codec support, SEI messages for decoder rendering metadata, and the RTP/RTCP signaling necessary for viewport-dependent processing.

The SDP attributes 3gpp_360video, 3gpp_fisheye and 3gpp_overlay shall be used to indicate, respectively, a 360-degree projected video stream, a 360-degree fisheye video stream and a spherical overlay. ITT4RT-Tx clients that support both 360-degree projected video and 360-degree fisheye video may include both the 3gpp_360video and 3gpp_fisheye attributes as alternatives in the SDP offer, but an ITT4RT-Rx client willing to receive 360-degree video shall include only one of these attributes (either 3gpp_360video or 3gpp_fisheye, based on support or selection) in the SDP answer. The 3gpp_overlay attribute may be included in the SDP answer independent of whether projected or fisheye video is selected, since spherical overlays are applicable to both types of 360-degree video streams. The detailed definition and usage of these SDP attributes are presented in the clauses below.

Y.6.2 Main 360-degree video

Y.6.2.1 General

A new SDP attribute 3gpp_360video is defined with the following ABNF syntax:

att-field = "3gpp_360video"

att-value = <payload type> [SP FOV] [SP FOV_CENTER] [SP "Stereo"] [SP VDP]

VDP = "VDP" [SP SLVL] [SP Projection] [SP PPM] SP viewport_ctrl SP viewport_size

NOTE: If the SDP negotiations become too complex, defining profiles can be considered.

The semantics of the above attribute and parameters are provided below. Unsupported parameters of the 3gpp_360video attribute may be ignored. The payload type is the RTP payload type number of the media stream associated with the 3gpp_360video attribute.

An ITT4RT client supporting the 3gpp_360video attribute shall support the following procedures:

– when sending an SDP offer, the ITT4RT client includes the 3gpp_360video attribute in the media description for video in the SDP offer,

– when sending an SDP answer, the ITT4RT client includes the 3gpp_360video attribute in the media description for video in the SDP answer if the 3gpp_360video attribute was received in an SDP offer,

– after successful negotiation of the 3gpp_360video attribute in the SDP, for the video streams based on the HEVC or AVC codec, the ITT4RT clients exchange an RTP-based video stream containing an HEVC or AVC bitstream with omnidirectional video specific SEI messages as defined in clause Y.3.

An ITT4RT client supporting the 3gpp_360video attribute that also supports viewport-dependent processing (VDP) shall include the VDP parameter in the SDP offer and answer. Depending on the value indicated by the VDP parameter, the ITT4RT client shall further support the following procedures:

– the RTCP feedback (FB) message described in clause Y.7.2 of type ‘Viewport’ to carry requested viewport information during the RTP streaming of media (signalled from the ITT4RT-Rx client to the ITT4RT-Tx client).

An ITT4RT client shall not include the VDP parameter in the SDP answer if the SDP offer contains the 3gpp_360video attribute without the VDP parameter.

An ITT4RT-Tx client that supports VDP may use viewport margins to maintain consistent quality during small head motion and also to reduce the need for frequent viewport updates. Viewport margins can be extended on all or some sides of the viewport and may be at the same quality (or resolution) as the viewport or at a quality (or resolution) lower than the viewport but higher than the background. Viewport margins may be extended around the viewport evenly or unevenly depending on head motion or network quality.

Y.6.2.2 Projection

An ITT4RT client supporting the 3gpp_360video attribute with VDP and supporting projection may include in the SDP the Projection parameter, indicating the types of projection (e.g. ERP, CMP) it prefers, in order of preference. An ITT4RT client may respond to an SDP offer with multiple options indicated in the Projection parameter with the agreed option. An ITT4RT-Tx client is not required to provide the preferred form of projection indicated by an ITT4RT-Rx client but may do so when possible.

The ABNF syntax is defined as follows:

Projection = "projection=" proj-type *("," proj-type)

proj-type = "ERP" / "CMP"
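As a sketch of how an answerer might handle the Projection parameter, the code below parses the comma-separated proj-type list (kept in order of preference) and picks an agreed option. The selection policy (first offered projection that the local client also supports) and the function names parse_projection and choose_projection are hypothetical; the clause only requires that the answer carries the agreed option.

```python
from typing import List, Optional, Sequence

# proj-type values allowed by the ABNF above
PROJ_TYPES = ("ERP", "CMP")


def parse_projection(param: str) -> List[str]:
    """Parse e.g. 'projection=ERP,CMP' into ['ERP', 'CMP'], keeping the order of preference."""
    prefix = "projection="
    if not param.startswith(prefix):
        raise ValueError("not a Projection parameter")
    values = param[len(prefix):].split(",")
    for value in values:
        if value not in PROJ_TYPES:
            raise ValueError(f"unknown proj-type: {value!r}")
    return values


def choose_projection(offered: Sequence[str], supported: Sequence[str]) -> Optional[str]:
    """Hypothetical answer policy: the first offered projection this client also supports."""
    for proj in offered:
        if proj in supported:
            return proj
    return None  # no common projection; the Projection parameter could be omitted
```

For example, an answerer that only handles CMP would answer "projection=CMP" to an offer of "projection=ERP,CMP".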

Y.6.2.3 Field Of View (FOV)

An ITT4RT-Tx client may support sending a limited 360-degree video.

i. An ITT4RT-Tx client supporting the 3gpp_360video attribute and capable of sending a limited 360-degree video shall include the FOV parameter in its SDP offer to indicate the cfov (Capture FoV), i.e., the extent (range) of the 360-degree video with respect to the unit sphere. The range is expressed in units of 2^-16 degrees, with an x parameter for the azimuth range and a y parameter for the elevation range, sent as a comma-separated tuple. The values for the azimuth range shall be in the range of 0 to 360 * 2^16 (i.e., 23 592 960), inclusive, and the values for the elevation range shall be in the range of 0 to 180 * 2^16 (i.e., 11 796 480), inclusive. In the absence of cfov, the default values of x and y are 360 and 180 degrees, respectively.

ii. An ITT4RT-Rx client supporting the 3gpp_360video attribute that wants to receive a limited 360-degree video shall include the FOV parameter in its SDP offer/answer to indicate the pfov (Preferred FoV), where pfov <= cfov in one or both of the x and y dimensions when cfov is known. The pfov range is expressed in units of 2^-16 degrees, with an x parameter for the azimuth range and a y parameter for the elevation range, sent as a comma-separated tuple. The values for the azimuth range shall be in the range of 0 to 360 * 2^16 (i.e., 23 592 960), inclusive, and the values for the elevation range shall be in the range of 0 to 180 * 2^16 (i.e., 11 796 480), inclusive. In the absence of pfov, the default values of x and y are 360 and 180 degrees, respectively.

iii. An ITT4RT-Tx client that has received an SDP offer from an ITT4RT-Rx client with the parameter FOV shall include the parameter FOV in its SDP answer to indicate the range of the 360-degree video it will provide. This value may be the same as the FOV in the SDP offer or may differ from it, based on the ITT4RT-Tx client capabilities.
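One way an ITT4RT-Tx client could derive the FOV for its answer is to take the pfov from the offer and limit it, per dimension, to its own cfov, using the 360 x 180 degree default when either value is absent. This clipping rule and the name answer_fov are assumptions for illustration; the clause leaves the answered value to the ITT4RT-Tx client capabilities.

```python
from typing import Optional, Tuple

UNITS_PER_DEGREE = 1 << 16  # FOV values are carried in units of 2^-16 degrees
DEFAULT_FOV = (360 * UNITS_PER_DEGREE, 180 * UNITS_PER_DEGREE)  # used when cfov/pfov is absent

Fov = Tuple[int, int]  # (azimuth range, elevation range)


def answer_fov(offered_pfov: Optional[Fov], capture_fov: Optional[Fov]) -> Fov:
    """Hypothetical ITT4RT-Tx policy: answer with the pfov, clipped to what is captured."""
    pfov = DEFAULT_FOV if offered_pfov is None else offered_pfov
    cfov = DEFAULT_FOV if capture_fov is None else capture_fov
    # Each dimension is limited independently; the Tx cannot provide more than it captures.
    return (min(pfov[0], cfov[0]), min(pfov[1], cfov[1]))
```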

An ITT4RT client supporting the 3gpp_360video attribute with the FOV parameter may include the parameter FOV_CENTER in the SDP. FOV_CENTER is expressed as a comma-separated tuple (x,y), where x is the azimuth (in units of 2^-16 degrees) and y is the elevation (in units of 2^-16 degrees) with respect to the global coordinates, such that the range defined by FOV passes through the coordinates defined by FOV_CENTER. The values for azimuth shall be in the range of −180 * 2^16 (i.e., −11 796 480) to 180 * 2^16 − 1 (i.e., 11 796 479), inclusive, and the values for elevation shall be in the range of −90 * 2^16 (i.e., −5 898 240) to 90 * 2^16 (i.e., 5 898 240), inclusive. The imageattr attribute indicates the resolution of the delivered content based on the cfov and pfov options.

The ABNF syntax is defined as follows:

FOV = "fov=" 1*(fovset)

fovset = "[x=" azimuthrange ",y=" elevationrange "]"

azimuthrange = "0"

/ POS-DIGIT *6DIGIT

/ "1" 7DIGIT

/ "2" ("0"/"1"/"2") 6DIGIT

/ "23" ("0"/"1"/"2"/"3"/"4") 5DIGIT

/ "235" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 4DIGIT

/ "2359" ("0"/"1") 3DIGIT

/ "23592" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 2DIGIT

/ "235929" ("0"/"1"/"2"/"3"/"4"/"5") DIGIT

/ "23592960"

; 0 to 23 592 960, inclusive

elevationrange = "0"

/ POS-DIGIT *6DIGIT

/ "10" 6DIGIT

/ "11" ("0"/"1"/"2"/"3"/"4"/"5"/"6") 5DIGIT

/ "117" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 4DIGIT

/ "1179" ("0"/"1"/"2"/"3"/"4"/"5") 3DIGIT

/ "11796" ("0"/"1"/"2"/"3") 2DIGIT

/ "117964" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7") 1DIGIT

/ "11796480"

; 0 to 11 796 480, inclusive

FOV_CENTER = "fov_center=[x=" centerazimuth ",y=" centerelevation "]"

centerazimuth = "0"

/ ["-"] POS-DIGIT *6DIGIT

/ ["-"] "10" 6DIGIT

/ ["-"] "11" ("0"/"1"/"2"/"3"/"4"/"5"/"6") 5DIGIT

/ ["-"] "117" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 4DIGIT

/ ["-"] "1179" ("0"/"1"/"2"/"3"/"4"/"5") 3DIGIT

/ ["-"] "11796" ("0"/"1"/"2"/"3") 2DIGIT

/ ["-"] "117964" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7") 1DIGIT

/ "-11796480"

; -11 796 480 to 11 796 479, inclusive

centerelevation = "0"

/ ["-"] POS-DIGIT *5DIGIT

/ ["-"] ("1"/"2"/"3"/"4") 6DIGIT

/ ["-"] "5" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7") 5DIGIT

/ ["-"] "58" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 4DIGIT

/ ["-"] "589" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7") 3DIGIT

/ ["-"] "5898" ("0"/"1") 2DIGIT

/ ["-"] "58982" ("0"/"1"/"2"/"3") DIGIT

/ ["-"] "5898240"

; -5 898 240 to 5 898 240, inclusive
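The digit-by-digit alternatives above encode plain numeric ranges. A minimal sketch of producing conforming fov and fov_center values from degrees, checking the same inclusive bounds the ABNF comments state, is shown below; the helper names (to_units, format_fov, format_fov_center) are hypothetical.

```python
UNITS_PER_DEGREE = 1 << 16  # 2^-16 degree resolution

AZ_RANGE_MAX = 360 * UNITS_PER_DEGREE       # 23 592 960
EL_RANGE_MAX = 180 * UNITS_PER_DEGREE       # 11 796 480
CENTER_AZ_MIN = -180 * UNITS_PER_DEGREE     # -11 796 480
CENTER_AZ_MAX = 180 * UNITS_PER_DEGREE - 1  # 11 796 479
CENTER_EL_MIN = -90 * UNITS_PER_DEGREE      # -5 898 240
CENTER_EL_MAX = 90 * UNITS_PER_DEGREE       # 5 898 240


def to_units(degrees: float) -> int:
    """Convert degrees to the integer 2^-16-degree units used in the SDP parameters."""
    return round(degrees * UNITS_PER_DEGREE)


def _checked(name: str, value: int, lo: int, hi: int) -> int:
    """Reject values outside the inclusive range allowed by the ABNF."""
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return value


def format_fov(azimuth_deg: float, elevation_deg: float) -> str:
    x = _checked("azimuthrange", to_units(azimuth_deg), 0, AZ_RANGE_MAX)
    y = _checked("elevationrange", to_units(elevation_deg), 0, EL_RANGE_MAX)
    return f"fov=[x={x},y={y}]"


def format_fov_center(azimuth_deg: float, elevation_deg: float) -> str:
    x = _checked("centerazimuth", to_units(azimuth_deg), CENTER_AZ_MIN, CENTER_AZ_MAX)
    y = _checked("centerelevation", to_units(elevation_deg), CENTER_EL_MIN, CENTER_EL_MAX)
    return f"fov_center=[x={x},y={y}]"
```

Note that an azimuth centre of +180 degrees is rejected because the upper bound is 180 * 2^16 − 1; −180 degrees expresses the same direction.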

Y.6.2.4 Picture Packing

An ITT4RT client supporting mixed-quality tiled encoding, mixed-resolution tiled encoding and/or a 360-degree low-quality background frame-packed with an overlapping high-quality viewport shall include the PPM parameter in the 3gpp_360video attribute of the SDP offer. PPM has the following ABNF syntax:

PPM = "ppm=" ppm-list 

ppm-list = ppm-value *("/" ppm-value)

ppm-value = "1" / "2" / packing

packing = "[" PPWHQ "," PPHHQ "," TRHQ "," PPWLQ "," PPHLQ "," TRLQ "]"

PPWHQ = pos-integer

PPHHQ = pos-integer

TRHQ = transform-value

PPWLQ = pos-integer

PPHLQ = pos-integer

TRLQ = transform-value

pos-integer = POS-DIGIT *DIGIT

POS-DIGIT = %x31-39 ;1-9

transform-value = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" ; transform values as per Table Y.6.1

A list of all supported options as defined by ppm-list above shall be included in the SDP offer, where:

– A ppm-value of 1 indicates mixed-quality tiled encoding

– A ppm-value of 2 indicates mixed-resolution tiled encoding

– A ppm-value set to the comma-separated list ‘packing’ indicates low-quality viewport-independent background 360-degree video frame-packed with a high-quality viewport (possibly with margins) such that the two regions have overlapping content

An ITT4RT client that receives an SDP offer with a ppm-list of more than one ppm-value shall include only one preferred/supported ppm-value in the SDP answer. An ITT4RT-Rx client that includes the PPM parameter in its SDP offer with the ppm-value set to ‘packing’ (as defined in the PPM syntax above) shall set all fields of ‘packing’ to zero. An ITT4RT-Tx client that receives the PPM parameter in an SDP offer with the ppm-value set to ‘packing’ shall set all fields of ‘packing’ appropriately in its SDP answer.

Tiled encoding may be used to deliver the full 360-degree video or a high-quality video which includes the viewport and may include viewport margins.

The ppm-value ‘packing’ consists of the following six fields:

– PPWHQ defines packed_picture_width of the high-quality region in pixels

– PPHHQ defines packed_picture_height of the high-quality region in pixels

– TRHQ defines transform operations applied on the high-quality region

– PPWLQ defines packed_picture_width of the low-quality region in pixels

– PPHLQ defines packed_picture_height of the low-quality region in pixels

– TRLQ defines transform operations applied on the low-quality region
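The sketch below parses a PPM parameter into its ppm-values: 1, 2, or a six-field packing record. It is an illustration under stated assumptions: the names parse_ppm and Packing are hypothetical, and the width/height fields accept 0 (and leading zeros), which is looser than pos-integer but lets the all-zero ‘packing’ sent by an ITT4RT-Rx client in its offer be parsed.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Packing:
    ppw_hq: int  # packed_picture_width of the high-quality region (pixels)
    pph_hq: int  # packed_picture_height of the high-quality region (pixels)
    tr_hq: int   # transform value for the high-quality region (Table Y.6.1)
    ppw_lq: int  # packed_picture_width of the low-quality region (pixels)
    pph_lq: int  # packed_picture_height of the low-quality region (pixels)
    tr_lq: int   # transform value for the low-quality region (Table Y.6.1)


PpmValue = Union[int, Packing]  # 1 = mixed-quality tiles, 2 = mixed-resolution tiles

_PACKING_RE = re.compile(r"\[(\d+),(\d+),([0-7]),(\d+),(\d+),([0-7])\]")


def parse_ppm(param: str) -> List[PpmValue]:
    prefix = "ppm="
    if not param.startswith(prefix):
        raise ValueError("not a PPM parameter")
    values: List[PpmValue] = []
    for item in param[len(prefix):].split("/"):
        if item in ("1", "2"):
            values.append(int(item))
            continue
        match = _PACKING_RE.fullmatch(item)
        if match is None:
            raise ValueError(f"invalid ppm-value: {item!r}")
        values.append(Packing(*(int(g) for g in match.groups())))
    return values
```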

The transform operations have a value of 0-7 as defined in Table Y.6.1:

Table Y.6.1: Transform values

Transform value | Transform operation
0 | no transform
1 | mirrored horizontally
2 | rotation by 180 degrees (counter-clockwise)
3 | rotation by 180 degrees (counter-clockwise) before mirroring horizontally
4 | rotation by 90 degrees (counter-clockwise) before mirroring horizontally
5 | rotation by 90 degrees (counter-clockwise)
6 | rotation by 270 degrees (counter-clockwise) before mirroring horizontally
7 | rotation by 270 degrees (counter-clockwise)
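To make the table concrete, the following sketch applies each transform value to a small 2D array (a stand-in for a region's samples), rotating counter-clockwise first and then mirroring horizontally where the table says so. It is illustrative only: the names apply_transform, rotate_ccw and mirror_horizontally are hypothetical, and it shows the forward operation, not the inverse a receiver would apply when unpacking a region.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[int]]

# transform value -> (counter-clockwise quarter turns, mirror horizontally after rotating)
_TRANSFORMS: Dict[int, Tuple[int, bool]] = {
    0: (0, False), 1: (0, True), 2: (2, False), 3: (2, True),
    4: (1, True), 5: (1, False), 6: (3, True), 7: (3, False),
}


def rotate_ccw(m: Sequence[Sequence[int]], quarter_turns: int) -> Matrix:
    """Rotate a row-major 2D array counter-clockwise by 90 degrees per quarter turn."""
    out = [list(row) for row in m]
    for _ in range(quarter_turns % 4):
        # Transposing and then reversing the row order is a 90-degree CCW rotation.
        out = [list(col) for col in zip(*out)][::-1]
    return out


def mirror_horizontally(m: Sequence[Sequence[int]]) -> Matrix:
    """Mirror left-right by reversing each row."""
    return [list(reversed(row)) for row in m]


def apply_transform(m: Sequence[Sequence[int]], value: int) -> Matrix:
    if value not in _TRANSFORMS:
        raise ValueError(f"transform value must be 0-7, got {value}")
    turns, mirror = _TRANSFORMS[value]
    out = rotate_ccw(m, turns)
    return mirror_horizontally(out) if mirror else out
```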

An ITT4RT-Rx client shall render the high-quality viewport region where these two regions are overlapping. The PPM parameter for defining the HQ and LQ regions should be used when the information remains constant during the session. When the packed regions are not overlapping, the high-quality and low-quality regions do not need to be explicitly defined and SEI messages for region-wise packing may be used instead of the SDP PPM parameter.

NOTE: The viewport_size and FOV parameters define the sizes of the projected regions. [It should be considered whether the two values should be included explicitly here].

Y.6.2.5 Viewport Control

An ITT4RT client that supports the 3gpp_360video attribute with VDP shall include in its SDP offer the parameter viewport_ctrl with one or more of the following control options:

– device_controlled, if the ITT4RT-Tx client will provide VDP based on the requested viewport indicated by the RTCP feedback (FB) message type ‘Viewport’ sent by the corresponding ITT4RT-Rx client;

– recommended_viewport, if the ITT4RT-Tx client will provide VDP with the help of a recommendation/prediction engine;

– presenter_viewport, if the ITT4RT-Tx client will provide VDP based on the viewport of an ITT4RT-Rx client other than the one the SDP offer is being sent to.

Table Y.6.2 provides a mapping between viewport control values and viewport control options.

Table Y.6.2: Viewport control values

Viewport control value | Viewport control option
0 | device_controlled
1 | recommended_viewport
2 | presenter_viewport

Multiple options are provided as a comma-separated list. An ITT4RT client that receives an SDP offer with multiple viewport_ctrl options may include its preferred viewport_ctrl option in the SDP answer. If no options are given in the answer, the sender shall use the first option in the list. If the recommended_viewport is successfully negotiated as viewport_ctrl, the ITT4RT-Rx client should not use viewport prediction when sending the RTCP feedback (FB) message type ‘Viewport’ to avoid any conflicts with the prediction engine of the ITT4RT-Tx client. The ABNF syntax is defined as follows:

viewport_ctrl = "viewport_ctrl=" vc-value *2("," [SP] vc-value)

vc-value = "0" / "1" / "2"

; viewport control values as per Table Y.6.2
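A minimal sketch of handling viewport_ctrl: parse up to three comma-separated values (each comma optionally followed by one space, as in the ABNF) into the option names of Table Y.6.2, and apply the rule that the sender uses the first offered option when the answer carries none. The function names parse_viewport_ctrl and effective_viewport_ctrl are hypothetical.

```python
import re
from typing import List, Optional

VIEWPORT_CTRL_OPTIONS = {"0": "device_controlled", "1": "recommended_viewport", "2": "presenter_viewport"}

# One vc-value followed by at most two more, each preceded by "," and an optional SP.
_VC_RE = re.compile(r"viewport_ctrl=([012])((?:, ?[012]){0,2})")


def parse_viewport_ctrl(param: str) -> List[str]:
    match = _VC_RE.fullmatch(param)
    if match is None:
        raise ValueError(f"invalid viewport_ctrl: {param!r}")
    codes = [match.group(1)] + re.findall(r"[012]", match.group(2))
    return [VIEWPORT_CTRL_OPTIONS[c] for c in codes]


def effective_viewport_ctrl(offered: List[str], answered: Optional[List[str]]) -> str:
    """Sender side: use the answered option if present, else the first option in the offer."""
    if answered:
        return answered[0]
    return offered[0]
```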

Y.6.2.6 Overlay and 360-degree video

An ITT4RT client that sends an SDP message with at least one 360-degree video/audio and at least one overlay shall include the itt4rt_group attribute in the SDP before any media lines. The itt4rt_group attribute is used to group 360-degree media and overlay media using the mid attribute, and the syntax for the SDP attribute is:

a=itt4rt_group: <group-1> / … / <group-N>

where <group-X> shall include at least one mid associated with 360-degree media and at least one mid associated with an overlay as defined by the mid attribute in the corresponding media description.

The ABNF syntax for this attribute is the following:

att-field = "itt4rt_group"

att-value = rest-group *(" /" rest-group)

rest-group = 2*(SP identification-tag)

; identification-tag is defined in RFC 5888
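The following sketch parses an itt4rt_group line into groups of mid values and checks the rule that every group contains at least one mid of 360-degree media and at least one mid of an overlay; it also returns the first mid of a group as the synchronization anchor. The function names and the idea of passing the two mid sets in explicitly are assumptions for illustration.

```python
from typing import List, Set


def parse_itt4rt_group(line: str) -> List[List[str]]:
    """Parse 'a=itt4rt_group: 1 2 / 1 3' into [['1', '2'], ['1', '3']]."""
    prefix = "a=itt4rt_group:"
    if not line.startswith(prefix):
        raise ValueError("not an itt4rt_group attribute")
    groups = []
    for chunk in line[len(prefix):].split("/"):
        mids = chunk.split()
        if len(mids) < 2:  # rest-group = 2*(SP identification-tag)
            raise ValueError(f"group needs at least two mids: {chunk!r}")
        groups.append(mids)
    return groups


def groups_valid(groups: List[List[str]], mids_360: Set[str], mids_overlay: Set[str]) -> bool:
    """Each group must reference 360-degree media and at least one overlay."""
    return all(
        any(m in mids_360 for m in g) and any(m in mids_overlay for m in g)
        for g in groups
    )


def sync_anchor(group: List[str]) -> str:
    """The first media of a group is the synchronization anchor when sync is required."""
    return group[0]
```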

An ITT4RT-Tx client and an ITT4RT-Rx client may negotiate the overlays that can be associated with the 360-degree video offered by the ITT4RT-Tx client using the itt4rt_group attribute. An ITT4RT client shall indicate in an offer the overlays to be grouped with the 360-degree video using the itt4rt_group attribute. The overlays that are acceptable shall be retained in the answer and the ones that are not acceptable shall be removed. An ITT4RT-Tx client may offer overlay configuration options using the 3gpp_overlay attribute based on the list of media lines (i.e., potential overlay sources) provided in the itt4rt_group attribute in an SDP offer initiated by an ITT4RT-Rx client. The 3gpp_overlay attribute is offered in an SDP renegotiation.

The order of the media included in the itt4rt_group indicates the synchronization source with the first media always being the synchronization anchor when synchronization is required.

NOTE: 2D video received from an ITT4RT-Rx client may be offered as an overlay by the ITT4RT MRF to other ITT4RT-Rx clients. The ITT4RT MRF (acting as the ITT4RT-Tx client) would be the source of overlay media in this case.

Y.6.2.7 Viewport Size

An ITT4RT client that includes the 3gpp_360video attribute with the VDP parameter shall also include in the SDP the parameter viewport_size to indicate the size of the device viewport using the azimuth and elevation ranges expressed in degrees. An ITT4RT-Tx client may include the viewport_size of the ITT4RT-Rx client when this is known (e.g., in response to an SDP offer from an ITT4RT-Rx client), or otherwise include "viewport=0x0", in which case the value can be ignored by the ITT4RT-Rx client.

The ABNF syntax is given below:

viewport_size = "viewport=" azimuthrange "x" elevationrange

where the syntax of azimuthrange and elevationrange is as defined in clause Y.6.2.3.
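A sketch of formatting and parsing viewport_size follows. It assumes the 2^-16-degree units of clause Y.6.2.3 apply, since viewport_size reuses azimuthrange and elevationrange, and treats "viewport=0x0" as "unknown, may be ignored". The helper names are hypothetical.

```python
from typing import Optional, Tuple

UNITS_PER_DEGREE = 1 << 16  # assumed units, as for azimuthrange/elevationrange
AZ_MAX, EL_MAX = 360 * UNITS_PER_DEGREE, 180 * UNITS_PER_DEGREE


def format_viewport_size(size_deg: Optional[Tuple[float, float]]) -> str:
    if size_deg is None:
        return "viewport=0x0"  # ITT4RT-Rx viewport not known to the sender
    az = round(size_deg[0] * UNITS_PER_DEGREE)
    el = round(size_deg[1] * UNITS_PER_DEGREE)
    if not (0 <= az <= AZ_MAX and 0 <= el <= EL_MAX):
        raise ValueError("viewport size out of range")
    return f"viewport={az}x{el}"


def parse_viewport_size(param: str) -> Optional[Tuple[int, int]]:
    """Return (azimuth, elevation) in 2^-16-degree units, or None for the ignorable 0x0."""
    prefix = "viewport="
    if not param.startswith(prefix):
        raise ValueError("not a viewport_size parameter")
    az_text, sep, el_text = param[len(prefix):].partition("x")
    if sep != "x" or not az_text.isdigit() or not el_text.isdigit():
        raise ValueError(f"invalid viewport_size: {param!r}")
    az, el = int(az_text), int(el_text)
    return None if (az, el) == (0, 0) else (az, el)
```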

NOTE: The viewport size defines the size of the viewport of the ITT4RT-Rx client UE. An ITT4RT-Tx client provides VDP based on this viewport size. The capture and preferred FOV, on the other hand, define the range of the 360-degree content, which should be larger than the viewport size and can be negotiated even for viewport-independent processing.

Y.6.2.8 Viewport-only VDP and Image attributes

An ITT4RT-Tx client supporting VDP may deliver only the viewport or viewport with viewport margins and not the full captured/preferred field-of-view of the 360-degree video. If the viewport region (with or without a viewport margin) is extracted from a projected picture (e.g., ERP), the resolution would change depending on where the viewport is located on the picture. To avoid this, the ITT4RT-Tx client may rotate the desired viewport region to the centre of the ERP before cropping it to the desired size as indicated by imageattr. The delivered bitstream shall contain the rotation SEI and the region-wise packing SEI message if the ITT4RT-Rx client is expected to do sphere-locked rendering by reversing the rotation of the received image before rendering.

An ITT4RT client may support viewport-locked VDP for delivering the viewport region only. A viewport-locked VDP bitstream should include only the viewport region and should not include rotation SEI messages. An ITT4RT client that supports viewport-locked VDP shall include in its SDP offer the parameter SLVL, as defined below:

SLVL = "VL"/"VL,SL"/"SL"

The value "SL" refers to sphere-locked rendering, which requires the receiver to render the received picture according to the global coordinate axes. The value "VL" refers to viewport-locked rendering, which requires the receiver to render the received picture such that the center of the received picture is aligned to the center of the current viewport.

An ITT4RT client that supports only viewport-locked VDP shall include "VL" in its SDP offer. An ITT4RT client that supports both viewport-locked VDP and a sphere-locked type of VDP shall include "VL,SL". An ITT4RT client that receives an SDP offer with "VL" shall include it in its response if it supports viewport-locked VDP and chooses to use it. An ITT4RT client that receives an SDP offer with "VL" shall remove the VDP parameter in its response if it does not support viewport-locked VDP or does not wish to use it; the 360-degree video is then delivered using viewport-independent processing. An ITT4RT client that receives an SDP offer with "SL" shall include it in its response if it supports sphere-locked VDP and chooses to use it. An ITT4RT client that receives an SDP offer with "SL" shall remove the VDP parameter in its response if it does not support sphere-locked VDP or does not choose to use it; the 360-degree video is then delivered using viewport-independent processing. An ITT4RT client that receives an SDP offer with "VL,SL" shall include either "SL" or "VL" in the SDP response based on its preferred mode. Alternatively, if it supports neither VL nor SL, or chooses to use neither, it shall remove the VDP parameter from the 3gpp_360video attribute in the response.
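The answer rules above reduce to a small decision function: keep one mode that is both offered and usable locally, or signal that VDP must be removed. In this sketch, `usable` stands for the modes the answerer supports and chooses to use, and the tie-break `preferred` argument is a hypothetical local preference; a return value of None means "remove VDP, fall back to viewport-independent processing".

```python
from typing import Optional, Set


def answer_slvl(offered: str, usable: Set[str], preferred: str = "SL") -> Optional[str]:
    """Return 'VL' or 'SL' for the answer, or None to remove the VDP parameter."""
    if offered not in ("VL", "SL", "VL,SL"):
        raise ValueError(f"invalid SLVL value: {offered!r}")
    candidates = [mode for mode in offered.split(",") if mode in usable]
    if not candidates:
        return None  # answer without VDP: viewport-independent processing
    if preferred in candidates:
        return preferred
    return candidates[0]
```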

Y.6.2.9 Viewport Feedback Trigger

An ITT4RT client may include the parameter viewportfb_trigger in the 3gpp_360video attribute to define the minimum viewport change that initiates an early or event-based RTCP feedback, with the following syntax:

viewportfb_trigger = "<"D | D_azimuth, D_elevation">"

where D_azimuth and D_elevation are the minimum changes, in degrees, of the viewport in the horizontal and vertical directions, respectively. The value D is the minimum spherical distance in degrees between the centres of the old and the new viewport. The values for D shall be in the range 0 to 180 * 2^16 − 1 (i.e., 11 796 479). The spherical distance between the centre of a first viewport (x1, y1) and the centre of a second viewport (x2, y2) is calculated as:

where x1 and x2 are the azimuths in radians and y1 and y2 are the elevations in radians. The value c is in radians and must be converted to degrees for use with D.
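As a sketch, the spherical distance c between two viewport centres can be computed with the haversine form of the great-circle distance on the unit sphere, which yields c in radians from the azimuths x1, x2 and elevations y1, y2. Using the haversine formulation is an assumption here (any equivalent great-circle formula gives the same c). The trigger check also assumes that D is carried in units of 2^-16 degrees, as its signalled range suggests; the function names are hypothetical.

```python
import math
from typing import Tuple

UNITS_PER_DEGREE = 1 << 16  # assumed units of the signalled D value


def spherical_distance_deg(x1: float, y1: float, x2: float, y2: float) -> float:
    """Great-circle distance between two viewport centres; inputs and result in degrees."""
    x1, y1, x2, y2 = map(math.radians, (x1, y1, x2, y2))
    a = math.sin((y2 - y1) / 2) ** 2 + math.cos(y1) * math.cos(y2) * math.sin((x2 - x1) / 2) ** 2
    a = min(1.0, max(0.0, a))  # guard against rounding just outside [0, 1]
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))  # radians
    return math.degrees(c)  # converted to degrees for comparison with D


def exceeds_trigger(old_centre: Tuple[float, float], new_centre: Tuple[float, float], d_units: int) -> bool:
    """True if the viewport centre moved at least D (D given in 2^-16-degree units)."""
    return spherical_distance_deg(*old_centre, *new_centre) >= d_units / UNITS_PER_DEGREE
```

The haversine form handles azimuth wrap-around naturally: centres at −179 and +179 degrees azimuth on the equator are 2 degrees apart.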

The viewport feedback trigger value is estimated by the ITT4RT-Tx client based on the viewport margin configuration it intends to use. An ITT4RT-Rx client supporting RTCP viewport feedback shall use periodic RTCP viewport feedback. The frequency of the periodic feedback should be such that it does not exceed the allocated RTCP bandwidth as defined in RFC 4585. An ITT4RT-Rx client may use immediate/early RTCP feedback in addition to the periodic feedback as long as the allocated RTCP bandwidth requirements are met. An ITT4RT-Tx client may define a viewport feedback trigger value for an early/immediate feedback and signal this value to the ITT4RT-Rx client in the SDP. The ITT4RT-Tx client should select a threshold value that is suitable for the margin configuration that it intends to use for that stream. The threshold value should be defined within the viewport margin region such that the ITT4RT-Tx client would update the high-quality region (viewport and viewport margin) if the viewport breaches this threshold.

If an ITT4RT-Rx client does not have the capability to provide an RTCP viewport feedback at the viewport feedback threshold value provided by the ITT4RT-Tx client in an SDP offer, it may respond with the minimum threshold value it can support. The ITT4RT-Tx client may adjust its viewport margin configuration based on the threshold value in the answer. If an ITT4RT-Rx client only supports periodic feedback, it shall remove the viewportfb_trigger parameter from the response.

An ITT4RT-Rx client that supports a viewport feedback trigger shall include the parameter viewportfb_trigger with the minimum threshold value it can support in an SDP offer. The ITT4RT-Tx client may remove the parameter if it does not support this value, or respond with an acceptable value that is equal to or higher than the one in the ITT4RT-Rx client's offer.

If both sides acknowledge the support of viewportfb_trigger, the ITT4RT-Rx client shall use event-driven/early viewport feedback in addition to periodic feedback. If viewportfb_trigger is not defined by the ITT4RT-Tx client, the ITT4RT-Rx client may still use immediate/early feedback. An ITT4RT-Rx client may use the velocity of the viewport during head motion and the viewport margin (if known) to trigger an immediate feedback. Alternatively, it may use the spherical distance between the viewport in the last feedback and the current viewport to trigger an immediate feedback. The spherical distance can be selected based on viewport margins (if known). An ITT4RT-Rx client may suppress an immediate/early feedback if the time to the next periodic viewport feedback is less than an application-defined threshold.
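The ITT4RT-Rx behaviour described above can be sketched as a single decision: send an early report when the viewport has moved past the negotiated trigger (or a local threshold if none was negotiated), unless the next periodic report is due within an application-defined window. All parameter names are hypothetical, and the RTCP bandwidth limits of RFC 4585 are deliberately not modelled here.

```python
from typing import Optional


def should_send_early_feedback(
    distance_deg: float,                     # movement since the last reported viewport
    negotiated_trigger_deg: Optional[float], # viewportfb_trigger, if both sides agreed on it
    local_threshold_deg: float,              # Rx's own threshold, e.g. derived from known margins
    time_to_next_periodic_s: float,          # time until the next periodic viewport report
    suppress_window_s: float,                # application-defined suppression threshold
) -> bool:
    threshold = negotiated_trigger_deg if negotiated_trigger_deg is not None else local_threshold_deg
    if distance_deg < threshold:
        return False
    # Suppress the early report if a periodic report is due soon anyway.
    return time_to_next_periodic_s >= suppress_window_s
```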

Y.6.3 Still Background

Still image backgrounds may be supported by ITT4RT clients. The format and signaling shall follow the static image format and signaling as defined in clauses 5.2.4, 6.2.11, and 7.4.8. An ITT4RT-Tx client should send the image/image sequence as a video bitstream if still images are not supported.

The signaling in clause Y.6.2 shall apply to indicate that the still background is a 360-degree image. The 3gpp_360video or 3gpp_fisheye attribute shall be used for that purpose.

Y.6.4 Overlays

Y.6.4.1 General

ITT4RT clients supporting the ‘Overlay’ feature may define an overlay source and overlay configuration in the SDP. An ITT4RT client that supports overlays shall support a video or image stream indicated by a media line in the SDP as the source of an overlay.

Y.6.4.2 Visual Media

Any visual media that is defined with the ‘m=video’ line, includes the attribute mid and does not have the attribute a=3gpp_360video in the SDP may be rendered as an overlay by an ITT4RT-Rx client. An ITT4RT client shall include the attribute mid in the overlay media description. If an overlay is to be associated with a particular overlay configuration, the mid shall be used to associate the overlay media description with the overlay configuration described in clause Y.6.4.3.

Visual media may consist of an encoded video bitstream or HEVC encoded images/image sequences, both of which are defined with the ‘m=video’ line in SDP. ITT4RT clients that support images shall use the “a=imageseq” attribute as defined in clause 6.2.11.

Y.6.4.3 Overlay Configuration

Y.6.4.3.1 General

ITT4RT clients may support the following types of rendering for overlays, as defined in clause 7.14 of the OMAF specification [179]:

– viewport-relative overlay, specifying that the overlay is displayed on a rectangular area at an indicated position relative to the top-left corner of the viewport;

– [sphere-relative projected omnidirectional overlay, specifying that the overlay is displayed on a sphere surface at an indicated position within or on the unit sphere];

– sphere-relative 2D overlay, specifying that the overlay is displayed on a plane at an indicated position within the unit sphere.

An ITT4RT client supporting overlays may include in its SDP media description the attribute 3gpp_overlay to define one or more parameters for configuring the rendering properties of an overlay. The 3gpp_overlay attribute has the following syntax:

a=3gpp_overlay:overlay_id SP type SP (sphere_relative_overlay_config / viewport_relative_overlay_config) [SP overlay_info]

The 3gpp_overlay attribute is included as part of the 360-degree video media description. More than one 3gpp_overlay attribute may exist as part of the media description if more than one overlay is to be configured. The overlay_id in 3gpp_overlay is set to the mid of the media description of the overlay source for which the configuration is provided. An ITT4RT-Tx client may include more than one 3gpp_overlay attribute with the same non-zero overlay_id (associated with an overlay source) in an SDP offer. The overlay configurations are listed in order of preference. An ITT4RT-Rx client that receives an SDP offer with multiple 3gpp_overlay attributes with the same non-zero overlay_id shall reply with only one acceptable 3gpp_overlay line for one acceptable overlay source. If an overlay source is rejected, all 3gpp_overlay attribute lines with the overlay_id associated with that source are excluded from the SDP answer.
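The answer rule above can be sketched as a filter over the offered 3gpp_overlay lines: walk them in order of preference, drop lines of rejected overlay sources, and keep only the first acceptable line per overlay_id. The configuration fields after overlay_id are treated opaquely, and `acceptable` is a hypothetical callback standing in for the ITT4RT-Rx client's own capability check.

```python
from typing import Callable, Dict, Iterable, List, Set


def overlay_id_of(line: str) -> str:
    """Return the overlay_id (first token after the colon) of an a=3gpp_overlay line."""
    _, _, value = line.partition(":")
    tokens = value.split()
    if not tokens:
        raise ValueError(f"malformed 3gpp_overlay line: {line!r}")
    return tokens[0]


def select_overlay_lines(
    offered: Iterable[str],
    acceptable: Callable[[str], bool],
    rejected_sources: Set[str],
) -> List[str]:
    chosen: Dict[str, str] = {}  # overlay_id -> selected line (insertion order kept)
    for line in offered:  # offered lines are in order of preference
        oid = overlay_id_of(line)
        if oid in rejected_sources or oid in chosen:
            continue
        if acceptable(line):
            chosen[oid] = line
    return list(chosen.values())
```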

An ITT4RT-Tx client may set the free_overlay flag to indicate that the ITT4RT-Tx client associates no media source with this overlay configuration; the corresponding overlay region in the video may then be used by the ITT4RT-Rx client to overlay any media, including external media sources (e.g., a text notification from an external application, an external video overlay, or a source describing an occlude-free region, i.e., a region where placing an overlay on top would not result in any significant loss of information). For overlays with the free_overlay flag set to 1, the overlay_id in 3gpp_overlay shall not be equal to any mid in the SDP media description. If an overlay_id is not associated with an overlay source (i.e., it identifies an external source overlay and/or an occlude-free region), the free_overlay flag shall be included with a value equal to 1. If the free_overlay flag is not present or has a value of 0, the overlay_id shall be a value that corresponds to at least one mid of the media description of the overlay source. An ITT4RT-Tx client may include more than one 3gpp_overlay attribute with the same non-zero overlay_id and with the free_overlay flag set to 1 in an SDP offer. The implementation details for the use of an overlay region with the free_overlay flag set to 1 are left to the application.

An ITT4RT-Rx client that sends an SDP offer to receive 360-degree media and overlay source media shall not include the 3gpp_overlay attribute. An ITT4RT-Tx client that receives such an offer may initiate an SDP renegotiation offer to configure the overlays for the overlay sources as indicated by the itt4rt_group attribute.

An ITT4RT client that receives an SDP offer including overlay sources without the 3gpp_overlay attribute may decline the offer and send a renegotiation offer with the 3gpp_overlay attribute.

The type shall have the value ‘0’ for viewport-relative overlays and ‘1’ for sphere-relative overlays. Depending on the value of type, the 3gpp_overlay attribute may further include the corresponding configuration information sphere_relative_overlay_config (type = ‘1’) or the viewport_relative_overlay_config (type = ‘0’).

A set of flags describing interactivity of the overlay may be included in the optional overlay_info parameter defined in clause Y.6.4.3.4.

NOTE 1: Other overlay definitions in OMAF are not excluded from ITT4RT. Which overlay definition(s) from OMAF are adopted for overlays in ITT4RT is currently TBD. Optional user interactivity flags, such as flags defining the movability or resizing of the overlay, may be added later to the 3gpp_overlay attribute.

NOTE 2: There should be a default configuration for overlay when explicit configuration is not provided to ensure that multiple receivers have similar experience.

Y.6.4.3.2 Sphere-relative Overlay Configuration

An ITT4RT client supporting the 3gpp_overlay attribute to configure a sphere-relative overlay shall set parameter type = ‘1’ and additionally include the parameter sphere_relative_overlay_config defined as follows:

sphere_relative_overlay_config = Overlay_azimuth "," Overlay_elevation "," Overlay_tilt "," Overlay_azimuth_range "," Overlay_elevation_range "," Overlay_rot_yaw "," Overlay_rot_pitch "," Overlay_rot_roll "," region_depth_minus1 "," timeline_change_flag

– Overlay_azimuth: Specifies the azimuth angle of the centre of the overlay region on the unit sphere, in units of 2^-16 degrees, relative to the global coordinate axes.

– Overlay_elevation: Specifies the elevation angle of the centre of the overlay region on the unit sphere, in units of 2^-16 degrees, relative to the global coordinate axes.

– Overlay_tilt: Specifies the tilt angle of the offered overlay region, in units of 2^-16 degrees, relative to the global coordinate axes.

– Overlay_azimuth_range: Specifies the azimuth range, through the centre point of the overlay region, of the region corresponding to the 2D plane on which the overlay is rendered, in units of 2^-16 degrees.

– Overlay_elevation_range: Specifies the elevation range, through the centre point of the overlay region, of the offered region corresponding to the 2D plane on which the overlay is rendered, in units of 2^-16 degrees.

– Overlay_rot_yaw, Overlay_rot_pitch and Overlay_rot_roll specify the rotation of the 2D plane on which the overlay is rendered. Prior to rendering, the 2D plane may be rotated as specified by overlay_rot_yaw, overlay_rot_pitch and overlay_rot_roll, and placed at a certain distance as specified by region_depth_minus1. The rotations are relative to the coordinate system specified in clause 5.1 of ISO/IEC 23090-2, in which the origin of the coordinate system is in the centre of the overlay region, the X axis is towards the origin of the global coordinate axes, the Y axis is towards the point on the plane that corresponds to cAzimuth1 in Figure 7-4 of ISO/IEC 23090-2, and the Z axis is towards the point on the plane that corresponds to cElevation2 in Figure 7-4 of ISO/IEC 23090-2. overlay_rot_yaw expresses a rotation around the Z axis, overlay_rot_pitch a rotation around the Y axis, and overlay_rot_roll a rotation around the X axis. Rotations are extrinsic, i.e., around the fixed X, Y and Z reference axes. The angles increase clockwise when looking from the origin towards the positive end of an axis. The rotations are applied starting with overlay_rot_yaw, followed by overlay_rot_pitch, and ending with overlay_rot_roll.

– region_depth_minus1: indicates the depth (z-value) of the region on which the overlay is to be rendered. The depth value is the norm of the normal vector of the overlay region. region_depth_minus1 + 1 specifies the depth value relative to a unit sphere in units of 2^−16.

– timeline_change_flag equal to ‘1’ specifies that the overlay content playback shall pause if the overlay is not in the user’s current viewport, and when the overlay is back in the user’s viewport the overlay content playback shall resume with the global presentation timeline of the content. The content in the intermediate interval is skipped. timeline_change_flag equal to ‘0’ specifies that the overlay content playback shall pause if the overlay is not in the user’s current viewport, and when the overlay is back in the user’s viewport the overlay content playback resumes from the paused sample. This prevents loss of any content due to the overlay being away from the user’s current viewport.
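The fixed-point units and the extrinsic rotation order above can be illustrated with a minimal Python sketch. It assumes standard right-handed rotation matrices (a positive angle turns clockwise when looking from the origin towards the positive end of the axis, as stated above); all function names are hypothetical and not defined by this specification.

```python
import math

FIXED_ONE = 1 << 16  # 2^16: angles in units of 2^-16 degrees, depth in units of 2^-16


def to_degrees(value: int) -> float:
    """Convert a signalled angle (units of 2^-16 degrees) to degrees."""
    return value / FIXED_ONE


def overlay_depth(region_depth_minus1: int) -> float:
    """Depth relative to the unit sphere: (region_depth_minus1 + 1) * 2^-16."""
    return (region_depth_minus1 + 1) / FIXED_ONE


def rot_x(a: float):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a: float):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def overlay_plane_rotation(rot_yaw: int, rot_pitch: int, rot_roll: int):
    """Extrinsic rotation about fixed axes: yaw (Z) first, then pitch (Y), then roll (X).

    Because yaw is applied first it is the right-most factor:
    R = Rx(roll) * Ry(pitch) * Rz(yaw).
    """
    yaw, pitch, roll = (math.radians(to_degrees(v)) for v in (rot_yaw, rot_pitch, rot_roll))
    return matmul(rot_x(roll), matmul(rot_y(pitch), rot_z(yaw)))
```

For example, overlay_rot_yaw = 90 * 2^16 with zero pitch and roll maps the plane's X axis onto its Y axis.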

Y.6.4.3.3 Viewport-relative Overlay Configuration

An ITT4RT client supporting the 3gpp_overlay attribute to configure a viewport-relative overlay shall set parameter type = ‘0’ and additionally include the parameter viewport_relative_overlay_config defined as follows:

viewport_relative_overlay_config = Overlay_rect_left_percent "," Overlay_rect_top_percent "," Overlay_rect_width_percent "," Overlay_rect_height_percent "," Relative_disparity_flag "," (Disparity_in_percent / Disparity_in_pixels) "," Media_alignment "," Layering_order "," Opacity "," Overlay_priority
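A minimal Python sketch of splitting the ten comma-separated fields and checking the ranges stated in this clause. It assumes every field, including Media_alignment (whose value set is not defined here), is signalled as a decimal integer; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ViewportRelativeOverlayConfig:
    rect_left: int           # units of 2^-16 of the viewport width
    rect_top: int            # units of 2^-16 of the viewport height
    rect_width: int
    rect_height: int
    relative_disparity: bool
    disparity: int           # 2^-16 of the view width if relative_disparity, else pixels
    media_alignment: int     # assumed integer code
    layering_order: int
    opacity: int             # 0 (transparent) .. 100 (opaque)
    overlay_priority: int    # lower value = higher priority


def parse_viewport_relative_overlay_config(text: str) -> ViewportRelativeOverlayConfig:
    """Parse the comma-separated parameter and validate the documented ranges."""
    fields = [int(f.strip()) for f in text.split(",")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    if any(not 0 <= v < 65536 for v in fields[:4]):
        raise ValueError("rectangle values must be in [0, 65536)")
    if fields[4] not in (0, 1):
        raise ValueError("Relative_disparity_flag must be 0 or 1")
    if not 0 <= fields[8] <= 100:
        raise ValueError("opacity values greater than 100 are reserved")
    # The disparity field is interpreted according to Relative_disparity_flag.
    return ViewportRelativeOverlayConfig(*fields[:4], fields[4] == 1, *fields[5:])
```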

– Overlay_rect_left_percent: Specifies the x-coordinate of the left corner of the rectangular region of the overlay to be rendered on the viewport, as a percentage of the viewport width. The values are indicated in units of 2^-16 in the range of 0 (indicating 0%), inclusive, up to but excluding 65536 (indicating 100%).

– Overlay_rect_top_percent: Specifies the y-coordinate of the top corner of the rectangular region of the overlay to be rendered on the viewport, as a percentage of the viewport height. The values are indicated in units of 2^-16 in the range of 0 (indicating 0%), inclusive, up to but excluding 65536 (indicating 100%).

– Overlay_rect_width_percent: Specifies the width of the rectangular region of the overlay to be rendered on the viewport, as a percentage of the viewport width. The values are indicated in units of 2^-16 in the range of 0 (indicating 0%), inclusive, up to but excluding 65536 (indicating 100%).

– Overlay_rect_height_percent: Specifies the height of the rectangular region of the overlay to be rendered on the viewport, as a percentage of the viewport height. The values are indicated in units of 2^-16 in the range of 0 (indicating 0%), inclusive, up to but excluding 65536 (indicating 100%).

NOTE 1: The size of the overlay region on the viewport changes according to the viewport resolution and aspect ratio. However, the aspect ratio of the overlaid media is not intended to change.
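A minimal Python sketch of mapping the four rectangle values (units of 2^-16, where 65536 would be 100 %) to pixels for a given viewport, and of fitting the overlay media into that rectangle without changing its aspect ratio, as NOTE 1 intends. Rounding down and the "fit inside" scaling are assumptions for illustration; the actual scaling and placement follow Media_alignment. Function names are hypothetical.

```python
FULL_SCALE = 1 << 16  # 65536 corresponds to 100 % of the viewport dimension


def overlay_rect_pixels(left, top, width, height, viewport_w, viewport_h):
    """Convert the percent-of-viewport values to a pixel rectangle (rounded down)."""
    for v in (left, top, width, height):
        if not 0 <= v < FULL_SCALE:
            raise ValueError("values must be in [0, 65536)")
    return (
        left * viewport_w // FULL_SCALE,
        top * viewport_h // FULL_SCALE,
        width * viewport_w // FULL_SCALE,
        height * viewport_h // FULL_SCALE,
    )


def fit_preserving_aspect(media_w, media_h, rect_w, rect_h):
    """Largest media size that fits the rectangle with the media aspect ratio unchanged."""
    scale = min(rect_w / media_w, rect_h / media_h)
    return int(media_w * scale), int(media_h * scale)
```

For a 1920x1080 viewport, width = 32768 yields a 960-pixel-wide region; on a 1280x720 viewport the same value yields 640 pixels, so the region scales with the viewport while the media keeps its own aspect ratio.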

– Relative_disparity_flag indicates whether the disparity is provided as a percentage value of the width of the display window for one view (when the value is equal to 1) or as a number of pixels (when the value is equal to 0). This applies to the case of a monoscopic overlay.

– Disparity_in_percent: Specifies the disparity, in units of 2^−16, as a fraction of the width of the display window for one view. The value may be negative, in which case the displacement direction is reversed. This value is used to displace the region to the left on the left eye view and to the right on the right eye view. This applies to the case of a monoscopic overlay and stereoscopic background visual media.

– Disparity_in_pixels: Indicates the disparity in pixels. The value may be negative, in which case the displacement direction is reversed. This value is used to displace the region to the left on the left eye view and to the right on the right eye view. This applies to the case of a monoscopic overlay and stereoscopic background visual media.

– Media_alignment: Specifies the default intended scaling of the overlay source depending on the dimensions of the specified rectangular region and the intended placement of the scaled overlay source relative to the specified rectangular region.

– Layering_order: Indicates the default layering order among the overlays that are relative to the viewport, and separately among each set of overlays that have the same depth. Viewport-relative overlays are overlaid on top of the viewport in descending order of layering_order, i.e., an overlay with a smaller layering_order value shall be in front of an overlay with a greater layering_order value. The layering order for overlays of the 360-degree video should be decided by the ITT4RT-Tx client.

– Opacity: An integer value, assigned by the ITT4RT-Tx client, that specifies the default opacity to be applied to the overlay. Value 0 is fully transparent and value 100 is fully opaque, with a linear weighting between the two extremes. Values greater than 100 are reserved.

– Overlay_priority: Indicates which overlay should be prioritized in the case the ITT4RT-Rx client does not have enough decoding capacity to decode all overlays. A lower overlay_priority indicates higher priority. The value of overlay_priority, when present, shall be equal to 0 for overlays that are essential for displaying. More than one overlay may have the same overlay_priority and an ITT4RT-Rx client that does not support all overlays with the same priority may choose any subset of these.
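The Layering_order, Opacity and Overlay_priority semantics can be sketched in Python as follows. This is a minimal illustration, assuming a receiver that can decode at most a fixed number of overlays and keeps the input order among overlays of equal priority (the text above allows any subset in that case); names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Overlay:
    name: str
    layering_order: int
    opacity: int           # 0 (fully transparent) .. 100 (fully opaque)
    overlay_priority: int  # lower value = higher priority; 0 = essential


def select_overlays(overlays, max_decodable):
    """Keep the highest-priority overlays that fit the decoding capacity."""
    ranked = sorted(overlays, key=lambda o: o.overlay_priority)  # stable sort
    if any(o.overlay_priority == 0 for o in ranked[max_decodable:]):
        raise RuntimeError("not enough decoding capacity for essential overlays")
    return ranked[:max_decodable]


def drawing_order(overlays):
    """Descending layering_order: the smallest value is drawn last, i.e. in front."""
    return sorted(overlays, key=lambda o: o.layering_order, reverse=True)


def blend(background, overlay_sample, opacity):
    """Linear weighting between the background and the overlay sample."""
    alpha = opacity / 100
    return (1 - alpha) * background + alpha * overlay_sample
```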

NOTE 2: Other overlay definitions in OMAF are not excluded from ITT4RT. Which overlay definition(s) from OMAF are adopted for overlays in ITT4RT is currently TBD.

NOTE 3: For both of the above overlay types from OMAF, incorporate from the spec further details about the order of operations for overlay rendering (in particular order of translation and rotation).

Y.6.4.3.4 Overlay info parameter

The parameter overlay_info is a bit field consisting of flags describing the type of overlay interactivity recommended:

overlay_info= ‘overlay_info=’b4b3b2b1b0

– b0 = ‘0’: changing the position of the overlay is not recommended, overlay position is fixed
b0 = ‘1’: changing the position of the overlay is allowed, ITT4RT-Rx client may change position

– b1 = ‘0’: switching the overlay on/off is not recommended
b1 = ‘1’: switching the overlay on/off is allowed, ITT4RT-Rx client may switch overlay off

– b2 = ‘0’: rotating the overlay is not recommended
b2 = ‘1’: rotating the overlay is allowed

– b3 = ‘0’: resizing the overlay is not recommended
b3 = ‘1’: resizing the overlay is allowed

– b4 = ‘0’: changing the opacity of the overlay is not recommended
b4 = ‘1’: changing the opacity of the overlay is allowed
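A minimal Python sketch of decoding the overlay_info bit field, assuming it is written as five binary digits with b4 as the leftmost character, as in the syntax above. The dictionary keys are hypothetical labels for the flags.

```python
def decode_overlay_info(value: str) -> dict:
    """Decode 'overlay_info=b4b3b2b1b0' into the recommended interactivity flags."""
    prefix = "overlay_info="
    if value.startswith(prefix):
        value = value[len(prefix):]
    if len(value) != 5 or set(value) - {"0", "1"}:
        raise ValueError("expected five binary digits b4..b0")
    b4, b3, b2, b1, b0 = (c == "1" for c in value)  # leftmost character is b4
    return {
        "position_change_allowed": b0,
        "on_off_switch_allowed": b1,
        "rotation_allowed": b2,
        "resize_allowed": b3,
        "opacity_change_allowed": b4,
    }
```

For example, "overlay_info=00011" allows the ITT4RT-Rx client to move the overlay and to switch it off, while rotating, resizing and changing the opacity are not recommended.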

Y.6.4.3.5 Additional Overlay Configuration

An ITT4RT client supporting the 3gpp_overlay attribute to configure a sphere-relative overlay or viewport-relative overlay may include the following additional parameter for overlay support:

overlay_overlap_flag: Indicates if the ITT4RT-Rx client is allowed to overlap overlays from the ITT4RT-Tx client. If set to 1, the ITT4RT-Rx client may overlap overlays shared by the ITT4RT-Tx client.

Y.6.4.4 Captured Content Replacement

To prevent the degradation of presentation material (e.g., slides, screen share, video, notes) that may be captured from a display (screen or projector) with a 360-degree camera, the captured content in the 360-degree video can be replaced with the original presentation material. Such replacement implies decoding the content (360-degree video and presentation material), identifying the position of the presentation material in the 360-degree video, replacing the captured presentation content at the display coordinates in the 360-degree video, and finally encoding the new 360-degree video (i.e., with the same encoding parameters as the original 360-degree video). The replacement could be performed either in the ITT4RT-Tx client in the terminal that sends the 360-degree video, or in the ITT4RT-MRF.

When replacement is to be performed, the availability of the original presentation content must be signalled by the source of the content to the client performing the replacement (that is, the ITT4RT-Tx client in terminal or the ITT4RT-MRF) using the SDP attribute “a=content:slides” [81] which may include different content, for example slides, screen share, video, notes. The client performing the replacement shall determine an appropriate configuration for performing the content replacement in the 360-degree video, unless overlay parameters are given by the source of the original presentation content (e.g., configuration in terms of sphere-relative overlay coordinates as defined in clause Y.6.4.3.2).

When the SDP negotiation is initiated by the ITT4RT-Tx client in terminal, the ITT4RT-Tx client in terminal shall include the attribute “a=3gpp_360video_replacement” in its SDP offer to indicate that the content captured in the 360-degree video can be replaced. If the ITT4RT-MRF supports content replacement and receives an SDP offer with the attribute “a=3gpp_360video_replacement”, then the ITT4RT-MRF shall include the attribute “a=3gpp_360video_replacement” in its SDP answer and shall perform content replacement.

If the ITT4RT-Tx client in terminal includes the attribute “a=3gpp_360video_replacement” in its SDP offer but does not receive the attribute in the SDP answer (that is, replacement is not supported in the ITT4RT-MRF), then the ITT4RT-Tx client in terminal may send the original presentation content using a different process than ITT4RT-MRF replacement (e.g., the presentation can be sent as an overlay as defined in clause Y.6.4, or inserted into the 360-degree video by the ITT4RT-Tx client in terminal as described above).

If the ITT4RT-MRF does not receive the attribute “a=3gpp_360video_replacement” in an SDP offer, it shall not perform any replacement and will not include the attribute in its SDP answer.

When replacement is to be performed by the ITT4RT-MRF and the SDP negotiation is initiated by the ITT4RT-MRF, the offer sent by the ITT4RT-MRF to the ITT4RT-Tx client in terminal shall include the attribute “a=3gpp_360video_replacement”. If the ITT4RT-Tx client in terminal accepts the offer by the MRF to perform replacement, the ITT4RT-Tx client in terminal shall include the attribute “a=3gpp_360video_replacement” in the SDP answer and the ITT4RT-MRF shall perform content replacement.

If the ITT4RT-MRF does not receive the attribute “a=3gpp_360video_replacement” in the SDP answer of the ITT4RT-Tx client in terminal (i.e., the content captured in the 360-degree video cannot be replaced), the ITT4RT-MRF shall not perform any replacement.

If the ITT4RT-MRF does not support content replacement, it shall not include the attribute “a=3gpp_360video_replacement” in an SDP offer, it will not perform any replacement, and the ITT4RT-Tx client in terminal may send the original presentation content using a different process (e.g., the presentation can be sent as an overlay as defined in clause Y.6.4, or inserted into the 360-degree video by the ITT4RT-Tx client in terminal as described above). In the case that the ITT4RT-MRF does not send the attribute “a=3gpp_360video_replacement” in an offer, the ITT4RT-Tx client in terminal shall not send the attribute “a=3gpp_360video_replacement” in an answer.

After an accepted offer/answer between the ITT4RT-Tx client in terminal and the ITT4RT-MRF, with both offer and answer including the attribute “a=3gpp_360video_replacement”, the ITT4RT-MRF shall perform content replacement once the original presentation content is available from the source of the content and the replacement configuration is determined.

If the replacement configuration of the content is analysed and determined by the ITT4RT-Tx client in terminal, the client shall include the configuration as sphere-relative overlay coordinates (defined in clause Y.6.4.3.2) in the SDP offer/answer while negotiating the stream with the ITT4RT-MRF. If the sphere-relative overlay coordinates are not signalled in the SDP offer/answer by the ITT4RT-Tx client, the ITT4RT-MRF shall analyse and determine an appropriate configuration for performing the content replacement in the 360-degree video.
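The offer/answer rules above for an offer initiated by the ITT4RT-Tx client in terminal can be summarised in a minimal Python sketch. It assumes SDP is available as a list of text lines and ignores the session-level versus media-level distinction; all function names are hypothetical.

```python
from typing import Iterable, Optional

ATTR = "3gpp_360video_replacement"


def replacement_value(sdp_lines: Iterable[str]) -> Optional[str]:
    """Return the attribute value ('' if present without a value), or None if absent."""
    for line in sdp_lines:
        line = line.strip()
        if not line.startswith("a="):
            continue
        name, _, value = line[2:].partition(":")
        if name == ATTR:
            return value.strip()
    return None


def mrf_accepts_replacement(offer: Iterable[str], mrf_supports: bool) -> bool:
    """MRF as answerer: echo the attribute (and replace) only if offered and supported."""
    return mrf_supports and replacement_value(offer) is not None


def tx_needs_fallback(offered: bool, answer: Iterable[str]) -> bool:
    """Tx in terminal offered replacement but the answer lacks the attribute."""
    return offered and replacement_value(answer) is None


def configuration_owner(value: str) -> str:
    """No sphere-relative overlay coordinates signalled: the MRF determines the configuration."""
    return "ITT4RT-MRF analysis" if value == "" else "coordinates from ITT4RT-Tx client"
```

When tx_needs_fallback returns True, the Tx client may instead send the presentation as an overlay or insert it into the 360-degree video itself.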

The ABNF syntax for the replacement attribute is as follows:

att-field = "3gpp_360video_replacement"

att-value = [sphere_relative_overlay_config]

Y.6.5 Fisheye Video

Y.6.5.1 Identifying the 360-degree fisheye video stream

The SDP attribute 3gpp_fisheye is used to indicate a 360-degree fisheye video stream.

The semantics of the attribute and its parameters are provided below.

ITT4RT clients supporting 360-degree fisheye video shall support the 3gpp_fisheye attribute and shall support the following procedures:

– when sending an SDP offer, the ITT4RT-Tx client includes the 3gpp_fisheye attribute in the media description for video in the SDP offer

– when sending an SDP answer, the ITT4RT-Rx client includes the 3gpp_fisheye attribute in the media description for video in the SDP answer if the 3gpp_fisheye attribute was received in an SDP offer

– after successful negotiation of the 3gpp_fisheye attribute in the SDP, the MTSI clients exchange an RTP-based video stream containing an HEVC or AVC bitstream with fisheye omnidirectional video specific SEI messages as defined in clause Y.3

ITT4RT-Tx clients that support both 360-degree projected video and 360-degree fisheye video may include both 3gpp_360video and 3gpp_fisheye attributes as alternatives in the SDP offer, but an ITT4RT-Rx client shall include only one attribute (either 3gpp_360video or 3gpp_fisheye, based on support or selection) in the SDP answer.

Y.6.5.2 360-degree fisheye video SDP attribute parameters

Media-line level parameters are defined in order to aid session establishment between the ITT4RT-Tx and ITT4RT-Rx clients for 360-degree fisheye video, as well as to describe the fisheye video stream as identified by the 3gpp_fisheye attribute.

The syntax for the SDP attribute is:

a=3gpp_fisheye: <fisheye> <fisheye-img> <maxpack>

– Total number of fisheye circular videos at the capturing terminal.

Depending on the camera configuration of the sending terminal, the 360-degree fisheye video may consist of multiple different fisheye circular videos, each captured through a different fisheye lens.

– <fisheye>: this parameter inside an SDP offer sent by an ITT4RT-Tx client indicates the total number of fisheye circular videos output by the camera configuration at the terminal.

– Fisheye circular video static parameters.

In order to enable the quick selection of desired fisheye circular videos by the ITT4RT-Rx client during SDP negotiation, the following static parameters are defined for each fisheye circular video. These parameters are defined from the video bitstream fisheye video information SEI message as defined in ISO/IEC 23008-2 [119] and ISO/IEC 23090-2 [179].

– <fisheye-img> = <fisheye-img-1> … <fisheye-img-N>

– <fisheye-img-X> = [<id-X> <azi> <ele> <til> <fov>] for 1 ≤ X ≤ N where:

– <id>: an identifier for the fisheye video.

– <azi>, <ele>: azimuth and elevation indicating the spherical coordinates that correspond to the centre of the circular region that contains the fisheye video, in units of 2^-16 degrees. The values for azimuth shall be in the range of −180 * 2^16 (i.e., −11 796 480) to 180 * 2^16 − 1 (i.e., 11 796 479), inclusive, and the values for elevation shall be in the range of −90 * 2^16 (i.e., −5 898 240) to 90 * 2^16 (i.e., 5 898 240), inclusive.

– <til>: tilt indicating the tilt angle of the sphere region that corresponds to the fisheye video, in units of 2^−16 degrees. The values for tilt shall be in the range of −180 * 2^16 (i.e., −11 796 480) to 180 * 2^16 − 1 (i.e., 11 796 479), inclusive.

– <fov>: specifies the field of view of the lens that corresponds to the fisheye video in the coded picture, in units of 2^−16 degrees. The field of view shall be in the range of 0 to 360 * 2^16 (i.e., 23 592 960), inclusive.

– Stream packing of fisheye circular videos

Depending on the terminal device capabilities and bandwidth availability, the packing of fisheye circular videos within the stream can be negotiated between the sending and receiving terminals.

– <maxpack>: this parameter inside an SDP offer indicates the maximum supported number of fisheye videos which can be packed into the video stream by the ITT4RT-Tx client. The value of this parameter inside an SDP answer indicates the number of fisheye videos to be packed, as selected by the ITT4RT-Rx client.
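A minimal Python sketch of how an ITT4RT-Tx client might convert lens angles in degrees to the 2^-16-degree units above, enforce the stated ranges, and format one fisheye-img-X entry. Rounding to the nearest unit is an assumption; function names are hypothetical. Because the azimuth range excludes +180 degrees, a lens pointing backwards is signalled as azi=11796479 (or equivalently −11796480).

```python
UNITS = 1 << 16  # 2^16 units per degree

RANGES = {
    "azi": (-180 * UNITS, 180 * UNITS - 1),
    "ele": (-90 * UNITS, 90 * UNITS),
    "til": (-180 * UNITS, 180 * UNITS - 1),
    "fov": (0, 360 * UNITS),
}


def to_units(name: str, degrees: float) -> int:
    """Convert degrees to units of 2^-16 degrees and check the allowed range."""
    value = round(degrees * UNITS)
    lo, hi = RANGES[name]
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return value


def fisheye_img(fid, azi, ele, til, fov) -> str:
    """Format one [id=...,azi=...,ele=...,til=...,fov=...] entry."""
    parts = [f"id={fid}"] + [
        f"{name}={to_units(name, deg)}"
        for name, deg in (("azi", azi), ("ele", ele), ("til", til), ("fov", fov))
    ]
    return "[" + ",".join(parts) + "]"
```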

The ABNF syntax for this attribute is the following:

att-field = "3gpp_fisheye"

att-value = [SP fisheye] SP fisheye-img SP maxpack

fisheye = integer

fisheye-img = 1*fisheye-img-X

fisheye-img-X = "[" "id=" idvalue "," "azi=" azivalue "," "ele=" elevalue "," "til=" tilvalue "," "fov=" fovvalue "]"

idvalue = byte-string ; byte-string defined by RFC 4566

azivalue = "0"

/ ["-"] POS-DIGIT *6DIGIT

/ ["-"] "10" 6DIGIT

/ ["-"] "11" ("0"/"1"/"2"/"3"/"4"/"5"/"6") 5DIGIT

/ ["-"] "117" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 4DIGIT

/ ["-"] "1179" ("0"/"1"/"2"/"3"/"4"/"5") 3DIGIT

/ ["-"] "11796" ("0"/"1"/"2"/"3") 2DIGIT

/ ["-"] "117964" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7") 1DIGIT

/ "-11796480"

; -11 796 480 to 11 796 479, inclusive

elevalue = "0"

/ ["-"] POS-DIGIT *5DIGIT

/ ["-"] ("1"/"2"/"3"/"4") 6DIGIT

/ ["-"] "5" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7") 5DIGIT

/ ["-"] "58" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 4DIGIT

/ ["-"] "589" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7") 3DIGIT

/ ["-"] "5898" ("0"/"1") 2DIGIT

/ ["-"] "58982" ("0"/"1"/"2"/"3") DIGIT

/ ["-"] "5898240"

; -5 898 240 to 5 898 240, inclusive

tilvalue = azivalue

; -11 796 480 to 11 796 479, inclusive

fovvalue = "0"

/ POS-DIGIT *6DIGIT

/ "1" 7DIGIT

/ "2" ("0"/"1"/"2") 6DIGIT

/ "23" ("0"/"1"/"2"/"3"/"4") 5DIGIT

/ "235" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 4DIGIT

/ "2359" ("0"/"1") 3DIGIT

/ "23592" ("0"/"1"/"2"/"3"/"4"/"5"/"6"/"7"/"8") 2DIGIT

/ "235929" ("0"/"1"/"2"/"3"/"4"/"5") DIGIT

/ "23592960"

; 0 to 23 592 960, inclusive

maxpack = integer

POS-DIGIT = %x31-39 ; 1-9

integer = POS-DIGIT *DIGIT
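A minimal Python sketch of parsing an a=3gpp_fisheye line and checking the value ranges that the ABNF above encodes digit by digit. It accepts whitespace between entries, as in the example offer, and restricts id values to characters other than comma, "]" and whitespace, which is narrower than the RFC 4566 byte-string; function and key names are hypothetical.

```python
import re

IMG = re.compile(
    r"\[id=([^,\]\s]+),azi=(0|-?[1-9]\d*),ele=(0|-?[1-9]\d*),"
    r"til=(0|-?[1-9]\d*),fov=(0|[1-9]\d*)\]")
LIMITS = {"azi": (-11796480, 11796479), "ele": (-5898240, 5898240),
          "til": (-11796480, 11796479), "fov": (0, 23592960)}


def parse_3gpp_fisheye(line: str) -> dict:
    """Split an a=3gpp_fisheye line into fisheye count, entries and maxpack."""
    value = line.split(":", 1)[1].strip()
    first, last = value.find("["), value.rfind("]")
    if first < 0 or last < first:
        raise ValueError("no fisheye-img-X entries")
    head, body, tail = value[:first].strip(), value[first:last + 1], value[last + 1:].strip()
    matches = list(IMG.finditer(body))
    if len(matches) != body.count("["):
        raise ValueError("malformed fisheye-img-X entry")
    images = []
    for m in matches:
        img = {"id": m.group(1)}
        for name, text in zip(("azi", "ele", "til", "fov"), m.groups()[1:]):
            lo, hi = LIMITS[name]
            if not lo <= int(text) <= hi:
                raise ValueError(f"{name}={text} out of range")
            img[name] = int(text)
        images.append(img)
    return {"fisheye": int(head) if head else None,  # <fisheye> is optional
            "images": images,
            "maxpack": int(tail)}
```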

An example SDP offer is shown in Table Y.6.5.2-1.

Table Y.6.5.2-1: Example SDP offer with 360-degree fisheye video attribute parameters

SDP offer

m=video 49154 RTP/AVP 99
a=tcap:1 RTP/AVPF
a=pcfg:1 t=1
b=AS:10000
b=RS:0
b=RR:2500
a=rtpmap:99 H265/90000
a=fmtp:99 profile-id=1; level-id=93;
a=3gpp_fisheye: 2 [id=1,azi=0,ele=0,til=0,fov=11796480] [id=2,azi=11796479,ele=0,til=0,fov=11796480] 2
a=sendonly

As an example, a receiving terminal that only receives 360-degree fisheye video (and possibly sends a 2D video to the sender) replies with an SDP answer in which the corresponding m-line is set to recvonly and only the selected fisheye videos are listed, their number being equal to the value of maxpack selected in the answer.

Y.6.5.3 Viewport dependent delivery of fisheye video

By exposing the coverage information of each fisheye circular video (the fisheye circular videos collectively make up the whole 360-degree video) using the parameters in clause Y.6.5.2, an ITT4RT-Rx client can choose to receive only the fisheye circular videos needed to render the user's current viewport.

Through the parameters defined in clause Y.6.5.2, an ITT4RT-Rx client can select the desired fisheye packing configuration of the video stream during SDP negotiation, as well as the initial desired fisheye videos using the id parameter.
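One possible receiver-side selection heuristic is sketched below in Python. It is a minimal illustration, not a procedure defined by this specification: it treats the viewport as a circle of angular radius vp_radius (degrees), ignores tilt, keeps the fisheye videos whose coverage circle (fov/2 around the signalled centre) overlaps the viewport, and returns at most maxpack ids ordered by distance. Names are hypothetical.

```python
import math


def angular_distance(azi1, ele1, azi2, ele2):
    """Great-circle distance in degrees between two (azimuth, elevation) points."""
    a1, e1, a2, e2 = map(math.radians, (azi1, ele1, azi2, ele2))
    cos_d = (math.sin(e1) * math.sin(e2)
             + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def select_fisheye_ids(images, vp_azi, vp_ele, vp_radius, maxpack):
    """images: dicts with id, azi, ele, fov in units of 2^-16 degrees."""
    candidates = []
    for img in images:
        d = angular_distance(vp_azi, vp_ele, img["azi"] / 65536, img["ele"] / 65536)
        if d <= img["fov"] / 65536 / 2 + vp_radius:  # coverage circle overlaps viewport
            candidates.append((d, img["id"]))
    candidates.sort()
    return [fid for _, fid in candidates[:maxpack]]
```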

Once a session is established, dynamic delivery of the desired fisheye videos depending on the ITT4RT-Rx client user's viewport can be enabled using RTCP-based signalling, specifically with the RTCP feedback message with type "Viewport" as defined in clause Y.7.2.

Y.6.6 Camera Calibration for Network-based Stitching

Network-based stitching in the context of ITT4RT refers to generation of 360-degree videos in the ITT4RT MRF based on 2D video captures received from MTSI clients. This clause describes SDP-based signalling of camera calibration parameters for this purpose using the a=3gpp-camera-calibration attribute and SDP-based grouping of the corresponding 2D video captures using the a=stitch_group attribute.

The SDP syntax for a=3gpp-camera-calibration is defined with the following semantics (detailed ABNF presented at the end of the clause):

3gpp-camera-calibration = "a=3gpp-camera-calibration:" [SP "Param 1" SP "Param 2" SP ……. SP "Param K"]

where “Param 1”, …. , “Param K” express the set of intrinsic and extrinsic camera parameters as specified below.

If the ITT4RT-Tx client in the ITT4RT MRF intends to perform network-based stitching to generate 360-degree video from a particular set of 2D video captures received from an MTSI sender, it shall use the SDP session-level attribute a=stitch_group before any media lines that correspond to the particular 2D video captures during the SDP negotiation of the corresponding media. Likewise, an MTSI sender capable of capturing 2D videos for 360-degree video generation shall use the session-level a=stitch_group attribute in the SDP before any media lines that correspond to the particular 2D video captures. The a=stitch_group attribute is used to group the corresponding to-be-stitched 2D video captures using the mid attribute as defined according to the ABNF below:

a=stitch_group: <mid1> SP <mid2> SP <mid3> …

The mid attribute with the appropriate value as defined in the other parts of the SDP shall be included in the media description for the relevant 2D video captures when the a=stitch_group attribute is used. Furthermore, for each of these 2D video captures, the MTSI sender shall also include the SDP attribute 3gpp-camera-calibration in the SDP under the relevant m= line for that particular video to signal the relevant camera calibration information. The order of the media included in the a=stitch_group indicates the synchronization source with the first media always being the synchronization anchor when synchronization is required.

More specifically, detailed camera calibration parameters based on ISO/IEC 23008-2 [3] are provided as follows, considering the multi-view acquisition information SEI message for HEVC. With these specifications, a 3-dimensional world point, wP = [ x y z ] is mapped to a 2-dimensional camera point, cP[ i ] = [ u v 1 ], for the i-th camera according to:

$$ s \cdot cP[i] = A[i] \cdot R^{-1}[i] \cdot \left( wP - T[i] \right) \qquad \text{(eqn. Y.6.6.1)} $$

where A[ i ] denotes the intrinsic camera parameter matrix, R^−1[ i ] denotes the inverse of the rotation matrix R[ i ], T[ i ] denotes the translation vector, and s (a scalar value) is an arbitrary scale factor chosen to make the third coordinate of cP[ i ] equal to 1.

Equation Y.6.6.1 can be extended to incorporate the entrance pupil variation to correct the incidence ray of cP[ i ] = [ u v 1 ] such that it always passes through the camera optical centre, thereby removing distortion. The resulting entrance pupil coefficients E[ i ] may be incorporated into Equation Y.6.6.1 as

$$ s \cdot cP[i] = A[i] \cdot R^{-1}[i] \cdot \left( \left( wP + E[i] \right) - T[i] \right) \qquad \text{(eqn. Y.6.6.2)} $$

where wP + E[ i ] = [ x y z+E ], with E = e1·θ^3 + e2·θ^5 + e3·θ^7 + e4·θ^9, θ is the incidence angle of the ray formed by the pixel cP[ i ] = [ u v 1 ], and [e1, e2, e3, e4] are the entrance pupil coefficients. In addition, the accuracy of these entrance pupil parameters influences the accuracy of the estimated extrinsic parameters and thus of subsequent imaging tasks. If the coefficients are not available, E is considered to be 0 and a fallback to eqn. Y.6.6.1 is expected.
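Eqns. Y.6.6.1 and Y.6.6.2 can be evaluated with a minimal Python sketch. It assumes the incidence angle θ (radians) is already known for the ray in question and uses the fact that a rotation matrix is orthonormal, so R^−1 = R^T. With all entrance pupil coefficients equal to 0 it reduces to eqn. Y.6.6.1. Function names are hypothetical.

```python
def matvec(m, v):
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def entrance_pupil_offset(e, theta):
    """E = e1*theta^3 + e2*theta^5 + e3*theta^7 + e4*theta^9."""
    return sum(ek * theta ** p for ek, p in zip(e, (3, 5, 7, 9)))


def project(wp, A, R, T, e=(0.0, 0.0, 0.0, 0.0), theta=0.0):
    """Map world point wP = [x, y, z] to camera point [u, v] (eqn. Y.6.6.2)."""
    E = entrance_pupil_offset(e, theta)
    shifted = [wp[0] - T[0], wp[1] - T[1], wp[2] + E - T[2]]  # (wP + E[i]) - T[i]
    cam = matvec(A, matvec(transpose(R), shifted))             # A * R^-1 * (...)
    s = cam[2]  # scale factor that makes the third coordinate equal to 1
    return [cam[0] / s, cam[1] / s]
```

For example, with focal lengths of 1000, principal point (640, 360), identity rotation and zero translation, the world point [0.1, −0.2, 2] maps to the camera point (690, 260).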

Accordingly, the following intrinsic camera parameters can be signalled in the SDP for each 2D video capture using the a=3gpp-camera-calibration attribute:

focalLengthX[ i ] specifies the focal length of the i-th camera in the horizontal direction as a signed floating-point number.

focalLengthY[ i ] specifies the focal length of the i-th camera in the vertical direction as a signed floating-point number.

principalPointX[ i ] specifies the principal point of the i-th camera in the horizontal direction as a signed floating-point number.

principalPointY[ i ] specifies the principal point of the i-th camera in the vertical direction as a signed floating-point number.

skewFactor[ i ] specifies the skew factor of the i-th camera as a signed floating-point number.

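The intrinsic matrix A[ i ] for the i-th camera is represented by:

$$ A[i] = \begin{bmatrix} focalLengthX[i] & skewFactor[i] & principalPointX[i] \\ 0 & focalLengthY[i] & principalPointY[i] \\ 0 & 0 & 1 \end{bmatrix} $$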

It is possible that the intrinsic camera parameters are equal for all of the cameras. In that case, only one set of values based on the above parameters would need to be signalled, e.g., via SDP signalling at the session level.

Furthermore, the following extrinsic camera parameters can be signalled in the SDP for each camera as per ISO/IEC 23008-2 [3]:

rE[ i ][ j ][ k ] specifies the ( j, k ) component of the rotation matrix for the i-th camera as a signed floating-point number.

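The rotation matrix R[ i ] for the i-th camera is represented as follows:

$$ R[i] = \begin{bmatrix} rE[i][0][0] & rE[i][0][1] & rE[i][0][2] \\ rE[i][1][0] & rE[i][1][1] & rE[i][1][2] \\ rE[i][2][0] & rE[i][2][1] & rE[i][2][2] \end{bmatrix} $$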

tE[ i ][ j ] specifies the j-th component of the translation vector for the i-th camera as a signed floating-point number.

The translation vector T[ i ] for the i-th camera is represented by:

$$ T[i] = \begin{bmatrix} tE[i][0] \\ tE[i][1] \\ tE[i][2] \end{bmatrix} $$

For the i-th camera, E[ i ][ j ] specifies the j-th component of the entrance pupil coefficient vector [e1, e2, e3, e4], where j = 1, …, 4. The parameters are represented as signed floating-point numbers, as per eqn. Y.6.6.2 above.

The syntax for the "a=3gpp-camera-calibration" attribute shall conform to the following ABNF:

3gpp-camera-calibration = "3gpp-camera-calibration:" PT 1*WSP attr-list

PT = 1*DIGIT / "*"

attr-list = ( set *(1*WSP set) ) / "*"

; WSP and DIGIT defined in [RFC5234]

set= "[" "focalLengthX=" sfloatvalue "," "focalLengthY=" sfloatvalue "," "skewFactor=" sfloatvalue "," "principalPointX=" sfloatvalue "," "principalPointY=" sfloatvalue "," "rotation00=" sfloatvalue "," "rotation01=" sfloatvalue "," "rotation02=" sfloatvalue "," "rotation10=" sfloatvalue "," "rotation11=" sfloatvalue "," "rotation12=" sfloatvalue "," "rotation20=" sfloatvalue "," "rotation21=" sfloatvalue "," "rotation22=" sfloatvalue "," "translation0=" sfloatvalue "," "translation1=" sfloatvalue "," "translation2=" sfloatvalue "," "epupil1=" sfloatvalue "," "epupil2=" sfloatvalue "," "epupil3=" sfloatvalue "," "epupil4=" sfloatvalue "]"

sfloatvalue= [sign] sizevalue ["." 6*DIGIT]

sign = "-"

sizevalue = onetonine *5DIGIT

; Digit between 1 and 9 that is

; followed by 0 to 5 other digits

onetonine = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"

; Digit between 1 and 9
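A minimal Python sketch of parsing one calibration set into A[ i ], R[ i ], T[ i ] and the entrance pupil coefficients. It assumes comma-separated name=value pairs in the order listed in the ABNF, and the usual pinhole layout of A[ i ] with skewFactor in the first row. As an explicit deviation, the number pattern is more permissive than sfloatvalue: it also accepts 0 and fewer than six fractional digits, since the ABNF as written cannot express zero (e.g., the off-diagonal entries of an identity rotation). Names are hypothetical.

```python
import re

SFLOAT = re.compile(r"-?(?:0|[1-9]\d{0,5})(?:\.\d+)?")  # relaxed vs. the ABNF
NAMES = (["focalLengthX", "focalLengthY", "skewFactor", "principalPointX", "principalPointY"]
         + [f"rotation{j}{k}" for j in range(3) for k in range(3)]
         + [f"translation{j}" for j in range(3)]
         + [f"epupil{j}" for j in range(1, 5)])


def parse_calibration_set(text: str):
    """Parse '[name=value,...]' and return (A, R, T, E)."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("set must be enclosed in [ ]")
    params = {}
    for item in text[1:-1].split(","):
        name, _, value = item.strip().partition("=")
        if not SFLOAT.fullmatch(value):
            raise ValueError(f"bad number for {name!r}: {value!r}")
        params[name] = float(value)
    if list(params) != NAMES:
        raise ValueError("parameters missing, duplicated or out of order")
    A = [[params["focalLengthX"], params["skewFactor"], params["principalPointX"]],
         [0.0, params["focalLengthY"], params["principalPointY"]],
         [0.0, 0.0, 1.0]]
    R = [[params[f"rotation{j}{k}"] for k in range(3)] for j in range(3)]
    T = [params[f"translation{j}"] for j in range(3)]
    E = [params[f"epupil{j}"] for j in range(1, 5)]
    return A, R, T, E
```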

Y.6.7 Support for Stream Pausing/Resuming

An ITT4RT-Tx client shall use the a=rtcp-fb ccm pause attribute and parameter values as specified in [43] and [156] to indicate the capability to support receiving and acting on PAUSE and RESUME requests targeted for RTP streams it sends. The optional parameter setting of a=rtcp-fb ccm pause config=3 could be used by the ITT4RT-Tx client to indicate that it will only receive and react to PAUSE and RESUME requests but will not send them.

An ITT4RT-Rx client shall use the a=rtcp-fb ccm pause attribute and parameter values as specified in [43] and [156] to indicate the capability to support sending PAUSE and RESUME requests targeted for RTP streams it receives. The optional parameter setting of a=rtcp-fb ccm pause config=2 could be used by the ITT4RT-Rx client to indicate that it will only send PAUSE and RESUME requests but does not support receiving these requests.

Y.6.8 Multiple 360-degree videos

An ITT4RT conference may contain multiple 360-degree videos which originate from multiple conference rooms at the conference location, or from remote participants. When multiple 360-degree videos are present in an ITT4RT conference, an ITT4RT MRF shall negotiate an SDP session with every remote participant.

In the SDP offer from the ITT4RT MRF, 360-degree video is identified by either the a=3gpp_360video or a=3gpp_fisheye media line attributes. When multiple 360-degree videos are present in the SDP offer, the ITT4RT MRF shall include the a=content attribute under the media lines for 2D or 360-degree video originating from the conference location. For media streams originating from the main default conference room, the content attribute is set to a=content:main. For media streams originating from other conference rooms, the content attribute is set to a=content:alt. 2D and 360-degree video from remote participants shall not include the a=content attribute under their corresponding media lines.

When there are multiple 360-degree videos from multiple sources available to the ITT4RT MRF, the ITT4RT MRF may include the ‘itt4rt_group’ attribute (as defined in Y.6.2.6) to define one or more restricting groups, each group containing at least one mid associated with a 360-degree video media line, and at least one mid associated with an overlay.

On receipt of an SDP offer containing multiple 360-degree videos from the ITT4RT MRF, an ITT4RT-Rx client shall select to receive only one 360-degree video media together with possible 2D video media from other sources, rejecting the other 360-degree video media.

Example SDP offers for multiple 360-degree video with and without group restrictions are shown in clause Y.8.

Y.6.8.2 Excluding other participants’ overlays

When an ITT4RT-Tx client in terminal sends a 360-degree video media stream to the MRF, it may include an attribute "a=no_other_overlays", which indicates that the MRF shall not group the 360-degree media stream from that ITT4RT-Tx client with overlay media streams from other ITT4RT clients. In this case, the MRF shall group the 360-degree video media stream and one or more overlays of that ITT4RT-Tx client in a separate <rest-group> in the itt4rt_group attribute when describing them to any ITT4RT-Rx client.

The ABNF syntax for this attribute is the following:

att-field = "no_other_overlays"

NOTE: If multiple itt4rt_group are created, an ITT4RT-Rx client in terminal would need to re-negotiate the session to switch to media streams from other itt4rt_group. However, doing so may add further burden on the signaling nodes.

Y.6.9 Scene Description-Based Overlays

Y.6.9.1 General

ITT4RT clients that support the “Overlay” feature may support the scene description as defined in [182] for signaling the overlay configuration.

If scene description-based overlays are supported, the following subset of the MPEG-I scene description extensions and features shall be supported:

– The MPEG_media extension: used to reference the media streams.

– The MPEG_accessor_timed and the MPEG_buffer_circular: used to bind timed media.

– The MPEG_texture_video: used to define video textures for the overlay and the 360 video.

– The scene description update mechanism as defined in clause 5.2.4 of [182].

If scene description-based overlays are used in an ITT4RT session with multiple participants, then the ITT4RT MRF shall be used for the session and shall own the scene description.

If scene description-based overlays are used, then the ITT4RT-Tx client in the ITT4RT MRF shall:

– Create a sphere or cubemap mesh node (depending on the selected projection) in the scene description for each 360 video stream in the ITT4RT session. The source of the node’s texture shall reference the ITT4RT media stream of the corresponding 360 video as signaled by the SDP.

– Create a rectangular or spherical mesh node in the scene description for each overlay stream in the ITT4RT session. The source of the node’s texture shall reference the media stream of the corresponding overlay stream as signaled by the SDP.

– The location of the overlay shall be indicated by the transformation of the corresponding overlay node in the scene description.

NOTE: In a scene description-based overlay solution, the scene camera corresponds to the viewer's position and tracks the user's 3DoF movements. The camera's projection determines the field of view of the user.

The URL format specified in Annex C of ISO/IEC 23090-14 shall be used to reference media streams in the ITT4RT session.

For participants that support scene description, the overlay information and positioning that is provided as part of the scene description shall take precedence over any information provided as part of the 3gpp_overlay attribute.
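The precedence rule above can be read as a merge in which scene description values override 3gpp_overlay values. The sketch below assumes that overlay parameters from both sources have already been parsed into dictionaries with common keys. The key names used in the usage example (`azimuth`, `elevation`, `width`) and the field-by-field merge behaviour are illustrative assumptions, not definitions from this clause.

```python
from typing import Optional


def effective_overlay(supports_sd: bool,
                      sdp_overlay: Optional[dict],
                      sd_overlay: Optional[dict]) -> dict:
    """Overlay parameters a participant applies; scene description values take precedence."""
    merged = dict(sdp_overlay or {})   # start from the 3gpp_overlay attribute values
    if supports_sd and sd_overlay:
        merged.update(sd_overlay)      # scene description overrides matching keys
    return merged


sdp = {"azimuth": 10, "elevation": 0, "width": 30}
sd = {"azimuth": 45}
print(effective_overlay(True, sdp, sd))   # {'azimuth': 45, 'elevation': 0, 'width': 30}
print(effective_overlay(False, sdp, sd))  # {'azimuth': 10, 'elevation': 0, 'width': 30}
```

Participants without scene description support keep using the 3gpp_overlay values unchanged.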

An ITT4RT-Tx client in terminal that offers overlays may choose to signal each overlay either through the 3gpp_overlay attribute or through a scene update that adds the overlay node. The scene update mechanism is described in [182]. If the ITT4RT-Tx client in terminal uses the 3gpp_overlay attribute to describe its overlays, the ITT4RT-Tx client in the ITT4RT MRF shall generate the scene description or scene description update document that signals the presence and position of that overlay.
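When the MRF converts a 3gpp_overlay-signaled overlay into a scene update, it needs to append a node and make that node part of the scene. The sketch below assumes that scene updates are expressed as JSON Patch (RFC 6902) operations on the glTF document; check clause 5.2.4 of [182] for the normative update format. `overlay_add_patch` and `apply_add_ops` are hypothetical helpers. The applier handles only `add` operations and does not unescape `~0`/`~1` in JSON Pointer paths.

```python
import copy


def overlay_add_patch(doc: dict, overlay_node: dict) -> list:
    """JSON Patch that appends an overlay node and lists it as a root node of scene 0."""
    new_index = len(doc["nodes"])  # index the appended node will receive
    return [
        {"op": "add", "path": "/nodes/-", "value": overlay_node},
        {"op": "add", "path": "/scenes/0/nodes/-", "value": new_index},
    ]


def apply_add_ops(doc: dict, patch: list) -> dict:
    """Minimal applier for 'add' operations; returns a patched copy of doc."""
    result = copy.deepcopy(doc)
    for op in patch:
        if op["op"] != "add":
            raise ValueError("only 'add' is handled in this sketch")
        *parents, last = op["path"].lstrip("/").split("/")
        target = result
        for key in parents:
            target = target[int(key)] if isinstance(target, list) else target[key]
        if isinstance(target, list):
            if last == "-":
                target.append(op["value"])          # "-" means append to the array
            else:
                target.insert(int(last), op["value"])
        else:
            target[last] = op["value"]
    return result


doc = {"scenes": [{"nodes": [0]}], "nodes": [{"name": "video360-0"}]}
patch = overlay_add_patch(doc, {"name": "overlay-1", "translation": [0.0, 0.0, -2.0]})
updated = apply_add_ops(doc, patch)
```

After the update, the overlay becomes node 1 and the root node list of the scene becomes `[0, 1]`. The original document is left untouched, so the MRF can keep its previous scene version.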

Y.6.9.2 Offer/Answer Negotiation

An ITT4RT-Tx client that supports scene description-based overlays shall offer a data channel indicating the “mpeg-sd” sub-protocol. The ITT4RT-Rx client in the MRF that supports scene description-based overlays shall answer by accepting the scene description data channel.

If the offer is accepted, the ITT4RT MRF shall generate and send the scene description to the offerer upon establishment of the data channel.

If the ITT4RT MRF receives an offer that does not contain a data channel with the “mpeg-sd” sub-protocol, it shall assume that the ITT4RT client does not support scene description-based overlays. In such a case, the answer shall describe any overlays using the 3gpp_overlay attribute.
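The MRF-side decision in this clause reduces to one check on the offer. The sketch below assumes the offer is available as SDP text and that the sub-protocol appears in an `a=dcmap` line in the quoted form `subprotocol="mpeg-sd"`. The `OverlayPlan` type, `plan_answer`, and the `mrf_supports_sd` flag are hypothetical names. The substring test is intentionally simple, and a real implementation would parse the dcmap options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlayPlan:
    accept_sd_channel: bool    # accept the "mpeg-sd" data channel in the answer
    send_scene_on_open: bool   # push the scene description once the channel is up
    overlay_signaling: str     # "scene-description" or "3gpp_overlay"


def offer_has_mpeg_sd(offer_sdp: str) -> bool:
    """True if any a=dcmap line in the offer declares the mpeg-sd sub-protocol."""
    return any(line.strip().startswith("a=dcmap:") and 'subprotocol="mpeg-sd"' in line
               for line in offer_sdp.splitlines())


def plan_answer(offer_sdp: str, mrf_supports_sd: bool = True) -> OverlayPlan:
    """Decide how the MRF signals overlays toward the offering client."""
    if mrf_supports_sd and offer_has_mpeg_sd(offer_sdp):
        return OverlayPlan(True, True, "scene-description")
    # No mpeg-sd channel offered: the client is assumed not to support SD-based overlays.
    return OverlayPlan(False, False, "3gpp_overlay")


offer = ("v=0\r\n"
         "m=application 9 UDP/DTLS/SCTP webrtc-datachannel\r\n"
         'a=dcmap:4 subprotocol="mpeg-sd"\r\n')
print(plan_answer(offer).overlay_signaling)  # scene-description
```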

Y.6.9.3 SDP Signaling

An ITT4RT-Tx client in the ITT4RT MRF that supports scene description-based overlays shall support MTSI data channel media and act as a DCMTSI client. The data channel stream ID shall be in the range allocated for bootstrap channels, i.e. between 1 and 1000, excluding the values in Table 6.2.10.1-2. A single data channel with sub-protocol “mpeg-sd” shall be present in the offer/answer SDP. If multiple data channels with the “mpeg-sd” sub-protocol are detected, the one with the lowest stream ID shall be used. The scene description data channel shall be configured as ordered and reliable, with normal SCTP multiplexing priority.
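The stream ID and channel configuration rules above can be sketched as a small dcmap parser. The parser assumes `a=dcmap:<stream-id> <options>` lines with semicolon-separated options, as in RFC 8864, and does not handle semicolons inside quoted labels. In this sketch, "reliable" means that neither `max-retr` nor `max-time` is present, and a missing `ordered` option is treated as ordered. The values of Table 6.2.10.1-2 are not reproduced here, so `excluded_ids` is a placeholder the caller fills in. All function names are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

DCMAP = re.compile(r"^a=dcmap:(\d+)(?:\s+(.*))?$")


def parse_dcmap(line: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Split an a=dcmap line into (stream ID, option dict); None for other lines."""
    m = DCMAP.match(line.strip())
    if not m:
        return None
    opts: Dict[str, str] = {}
    for part in (m.group(2) or "").split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            opts[key.strip()] = value.strip().strip('"')
    return int(m.group(1)), opts


def select_sd_channel(sdp_lines: Iterable[str],
                      excluded_ids=frozenset()) -> Optional[Tuple[int, Dict[str, str]]]:
    """Lowest-numbered mpeg-sd channel within the bootstrap range, or None."""
    candidates = []
    for line in sdp_lines:
        parsed = parse_dcmap(line)
        if parsed is None or parsed[1].get("subprotocol") != "mpeg-sd":
            continue
        stream_id = parsed[0]
        if 1 <= stream_id <= 1000 and stream_id not in excluded_ids:
            candidates.append(parsed)
    return min(candidates, key=lambda c: c[0], default=None)


def is_ordered_reliable(opts: Dict[str, str]) -> bool:
    """Check the ordered/reliable configuration required for the SD channel."""
    reliable = "max-retr" not in opts and "max-time" not in opts
    return reliable and opts.get("ordered", "true") == "true"
```

For example, with `a=dcmap:10 subprotocol="mpeg-sd"` and `a=dcmap:4 label="sd";subprotocol="mpeg-sd"` in the same SDP, stream 4 is selected because it has the lower ID.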

When scene description-based overlays are offered, the ITT4RT-Tx client in the ITT4RT MRF shall offer a data channel with a stream ID that indicates the “mpeg-sd” sub-protocol in the dcmap attribute. The “mpeg-sd” messages shall be JSON-formatted and UTF-8 encoded without a byte order mark (BOM).

Scene description-based overlay descriptions, including complete scene descriptions and scene updates, shall be delivered through the same data channel.
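The message encoding rules above can be illustrated with Python’s standard `json` module. Full scene descriptions and scene updates share one channel, so the receiver has to tell them apart. This sketch assumes that a complete scene description is a JSON object and that an update is a JSON Patch array. That distinction, and the helper names `encode_sd_message`, `decode_sd_message` and `message_kind`, are illustrative assumptions and not defined by this clause.

```python
import json

UTF8_BOM = b"\xef\xbb\xbf"


def encode_sd_message(doc) -> bytes:
    """Serialize a scene description or scene update as UTF-8 JSON without a BOM."""
    # json.dumps never emits a BOM; ensure_ascii=False keeps non-ASCII as UTF-8 bytes.
    return json.dumps(doc, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def decode_sd_message(payload: bytes):
    """Parse an mpeg-sd message, rejecting payloads that start with a BOM."""
    if payload.startswith(UTF8_BOM):
        raise ValueError("mpeg-sd messages must not start with a BOM")
    return json.loads(payload.decode("utf-8"))


def message_kind(msg) -> str:
    """Assumed convention: an array is a scene update, an object is a full scene."""
    return "scene-update" if isinstance(msg, list) else "scene-description"


payload = encode_sd_message({"asset": {"version": "2.0"}, "nodes": [{"name": "Überlay"}]})
print(message_kind(decode_sd_message(payload)))  # scene-description
```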