8 Audio/Video Parameters

3GPP TS 26.223: Telepresence using the IP Multimedia Subsystem (IMS); Media handling and interaction (Release 18)

8.1 Overview

The audio/video parameters provided in clauses 8.2 and 8.3 should be supported by TP UEs as part of CLUE-based signalling in IMS-based telepresence sessions, both at session initiation and during a session.

Collectively, these audio/video parameters and their associated values can be expected to provide a high quality telepresence experience for 3GPP’s IMS-based telepresence services from a media handling point of view.

Clause 8.2 describes the set of capture-related audio/video parameters for 3GPP IMS-based telepresence services, while clause 8.3 describes the audio/video parameters related to the telepresence system environment. Furthermore, guidance is provided in these clauses on the need for signalling these parameters at session initiation and during a session. While most of the parameters are already part of the CLUE framework, some of them are not, and references to suitable signalling options for such parameters are also provided. Some of these parameters are signalled neither during session initiation nor during a session, but are still recommended to be supported in TP UEs for the purposes of quality monitoring.

8.2 Capture-Related Parameters

8.2.1 General Parameters

Table 8.2.1.1: General parameters

| Parameter | Need for signalling at session initiation | Need for signalling during session | Remarks |
|---|---|---|---|
| mediaType | Y | Y | See the "mediaType" attribute in clause 11.2 of IETF CLUE data model schema [10]. |
| captureScene description | Y | Y | See Capture Scene attributes in clause 7.3.1 of IETF CLUE framework [7] and the <captureScene> element in clause 16 of IETF CLUE data model schema [10]. |
| sceneView description | Y | Y | See Capture Scene View attributes in clause 7.3.2 of IETF CLUE framework [7] and the <sceneView> element in clause 17 of IETF CLUE data model schema [10]. |
| lang | Y | N | See the Language attribute in clause 7.1.1.9 of IETF CLUE framework [7] and the <lang> element in clause 11.15 of IETF CLUE data model schema [10]. |
| priority | Y | Y | See the Priority attribute in clause 7.1.1.12 of IETF CLUE framework [7] and the <priority> element in clause 11.14 of IETF CLUE data model schema [10]. |
| embeddedText | Y | Y | See the Embedded Text attribute in clause 7.1.1.13 of IETF CLUE framework [7] and the <embeddedText> element in clause 11.20 of IETF CLUE data model schema [10]. |
| relatedTo | Y | Y | See the Related To attribute in clause 7.1.1.14 of IETF CLUE framework [7] and the <relatedTo> element in clause 11.17 of IETF CLUE data model schema [10]. |
| presentation | Y | Y | See the Presentation attribute in clause 7.1.1.7 of IETF CLUE framework [7] and the <presentation> element in clause 11.19 of IETF CLUE data model schema [10]. |
| personInfo | Y | Y | See the Person Information attribute in clause 7.1.1.10 of IETF CLUE framework [7] and the <personInfo> element in clause 21.1.2 of IETF CLUE data model schema [10]. |
| personType | Y | Y | See the Person Type attribute in clause 7.1.1.11 of IETF CLUE framework [7] and the <personType> element in clause 21.1.3 of IETF CLUE data model schema [10]. |
| sceneInformation | Y | Y | See the Scene Information attribute in clause 7.3.1.1 of IETF CLUE framework [7] and the <sceneInformation> element in clause 16.1 of IETF CLUE data model schema [10]. |
| mediaCapture description | Y | Y | See the Description attribute in clause 7.1.1.6 of IETF CLUE framework [7] and the <description> element in clause 11.13 of IETF CLUE data model schema [10]. |
| captureScene scale | Y | N | See Capture Scene attributes in clause 7.3.1 of IETF CLUE framework [7] and the "scale" attribute in clause 16.4 of IETF CLUE data model schema [10]. |
| mediaCapture mobility | Y | N | See the Mobility of Capture attribute in clause 7.1.1.4 of IETF CLUE framework [7] and the <mobility> element in clause 11.16 of IETF CLUE data model schema [10]. |
| mediaCapture view | Y | Y | See the View attribute in clause 7.1.1.8 of IETF CLUE framework [7] and the <view> element in clause 11.18 of IETF CLUE data model schema [10]. |
| maxGroupBandwidth | Y | N | See the Encoding Group data structure in clause 9.2 of IETF CLUE framework [7] and the <maxGroupBandwidth> element in clause 18.1 of IETF CLUE data model schema [10]. |
| Simulcast | Y | Y | Telepresence systems may provide multiple encodings of a single capture through a technique known as simulcast, e.g. by sending multiple video streams with different characteristics so that a receiving endpoint can choose the stream that meets its needs. Mechanisms for accomplishing simulcast in RTP and for signalling it in SDP are provided in [21]. |
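As an illustration of the simulcast signalling referenced in [21], an SDP offer can combine the RID restriction mechanism with the simulcast attribute along the following lines; the payload types, RID identifiers and resolution restrictions are illustrative only:

```sdp
m=video 49300 RTP/AVP 97 98
a=rtpmap:97 H264/90000
a=rtpmap:98 H264/90000
a=rid:hi send pt=97;max-width=1280;max-height=720
a=rid:lo send pt=98;max-width=640;max-height=360
a=simulcast:send hi;lo
```

Here the sender offers two simulcast versions of the same capture, a high-resolution and a low-resolution stream, which the receiver can select between or prune in its answer.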

8.2.2 Visual Parameters

Table 8.2.2.1: Visual parameters

| Parameter | Need for signalling at session initiation | Need for signalling during session | Remarks |
|---|---|---|---|
| colorGamut | Y | N | This parameter indicates the Colour Gamut used in a Telepresence Video Stream. Signalled as part of the codec information, e.g. in H.264 and H.265 SEI [16]-[17]. |
| lumaBitDepth | Y | N | This parameter indicates the bit depth of the luma samples in a digital picture. Signalled as part of the codec information, e.g. in H.264 and H.265 SEI [16]-[17]. |
| chromaBitDepth | Y | N | This parameter indicates the bit depth of the chroma samples in a digital picture. Signalled as part of the codec information, e.g. in H.264 and H.265 SEI [16]-[17]. |
| effectiveResolution | N | N | This parameter indicates the effective resolution of a rendered video stream as perceived by the viewer, as defined by ITU-T H.TPS-AV [41]. Not signalled. |
| captureArea | Y | Y | See the Area of Capture attribute in clause 7.1.1.3 of IETF CLUE framework [7] and the <captureArea> element in clause 11.5.2 of IETF CLUE data model schema [10]. |
| capturePoint | Y | Y | See the Point of Capture attribute in clause 7.1.1.1 of IETF CLUE framework [7] and the <captureOrigin> element in clause 11.5.1 of IETF CLUE data model schema [10]. |
| lineOfCapturePoint | Y | Y | See the Point on Line of Capture attribute in clause 7.1.1.2 of IETF CLUE framework [7] and the <captureOrigin> element in clause 11.5.1 of IETF CLUE data model schema [10]. |
| fovAzimuth (NOTE 1) | Y | N | This parameter indicates the azimuth range of the captured Field of View of a 360-degree video and is signalled in SDP. See azimuthrange in clause Y.6.2.3 of TS 26.114 [2]. |
| fovElevation (NOTE 1) | Y | N | This parameter indicates the elevation range of the captured Field of View of a 360-degree video and is signalled in SDP. See elevationrange in clause Y.6.2.3 of TS 26.114 [2]. |
| fovCentreAzimuth (NOTE 1) | Y | N | This parameter indicates the azimuth of the centre of the captured Field of View of a 360-degree video and is signalled in SDP. See centreazimuth in clause Y.6.2.3 of TS 26.114 [2]. |
| fovCentreElevation (NOTE 1) | Y | N | This parameter indicates the elevation of the centre of the captured Field of View of a 360-degree video and is signalled in SDP. See centreelevation in clause Y.6.2.3 of TS 26.114 [2]. |
| azivalue (NOTE 2) | Y | N | This parameter indicates the azimuth for the circular region that contains the fisheye video and is signalled in SDP. See azivalue in clause Y.6.5.2 of TS 26.114 [2]. |
| elevalue (NOTE 2) | Y | N | This parameter indicates the elevation for the circular region that contains the fisheye video and is signalled in SDP. See elevalue in clause Y.6.5.2 of TS 26.114 [2]. |
| tiltvalue (NOTE 2) | Y | N | This parameter indicates the tilt for the circular region that contains the fisheye video and is signalled in SDP. See tiltvalue in clause Y.6.5.2 of TS 26.114 [2]. |
| fovvalue (NOTE 2) | Y | N | This parameter indicates the field of view of the lens that corresponds to the fisheye video in the coded picture and is signalled in SDP. See fovvalue in clause Y.6.5.2 of TS 26.114 [2]. |
| maxVideoBitrate | Y | Y | This parameter indicates the maximum number of bits per second relating to a single video encoding and is signalled in the SDP. See "max-br" in IETF RFC 6184 [18] and "CustomMaxBRandCPB" in ITU-T H.241 [22]. |
| maxWidth | Y | N | This parameter indicates the maximum video resolution width in pixels and is signalled in the SDP. See "horizontal image size" in IETF RFC 6236 [23] and "CustomPictureFormat" in ITU-T H.245 [24]. |
| maxHeight | Y | N | This parameter indicates the maximum video resolution height in pixels and is signalled in the SDP. See "vertical image size" in IETF RFC 6236 [23] and "CustomPictureFormat" in ITU-T H.245 [24]. |
| maxFramerate | Y | N | This parameter indicates the maximum video framerate and is signalled in the SDP. See "framerate" in IETF RFC 4566 [25] and "MaxFPS" in ITU-T H.241 [22]. |

NOTE 1: The parameters fovAzimuth, fovElevation, fovCentreAzimuth and fovCentreElevation should be used in the case of immersive 360-degree video capture for ITT4RT clients, as defined in clause 15 of this document. In this case captureArea is not used.

NOTE 2: The parameters azivalue, elevalue, tiltvalue and fovvalue should be used in the case of immersive 360-degree fisheye video capture for ITT4RT clients, as defined in clause 15 of this document. In this case captureArea is not used.
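Several of the maximum-capability parameters above map onto well-known SDP attributes: maxWidth/maxHeight correspond to the imageattr attribute of IETF RFC 6236 [23], maxFramerate to the framerate attribute of IETF RFC 4566 [25], and maxVideoBitrate to the max-br fmtp parameter of IETF RFC 6184 [18]. A minimal sketch of a video media description, assuming an H.264 payload on payload type 99 (all numeric values are illustrative):

```sdp
m=video 49154 RTP/AVP 99
a=rtpmap:99 H264/90000
a=fmtp:99 profile-level-id=42e01f;max-br=4000
a=imageattr:99 send [x=1280,y=720] recv [x=1280,y=720]
a=framerate:30
```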

8.2.3 Audio Parameters

Table 8.2.3.1: Audio parameters

| Parameter | Need for signalling at session initiation | Need for signalling during session | Remarks |
|---|---|---|---|
| Audio capturePoint | Y | Y | See the Point of Capture attribute in clause 7.1.1.1 of IETF CLUE framework [7] and the <captureOrigin> element in clause 11.5.1 of IETF CLUE data model schema [10]. |
| Audio lineOfCapturePoint | Y | Y | See the Point on Line of Capture attribute in clause 7.1.1.2 of IETF CLUE framework [7] and the <captureOrigin> element in clause 11.5.1 of IETF CLUE data model schema [10]. |
| Audio sensitivityPattern | Y | Y | See the Audio Capture Sensitivity Pattern attribute in clause 7.1.1.5 of IETF CLUE framework [7] and the <sensitivityPattern> element in clause 12.1 of IETF CLUE data model schema [10]. |
| maxAudioBitrate | Y | Y | This parameter indicates the maximum number of bits per second relating to a single audio encoding and is signalled in the SDP. See "bandwidth" in IETF RFC 4566 [25] and "maxBitRate" in ITU-T H.245 [24]. |
| nominalAudioLevel | Y | Y | This parameter indicates the nominal audio level sent in the Telepresence audio stream. See ITU-T H.245 [24] and clause 7.1.3.3 of ITU-T H.TPS-AV [41]. |
| dynamicAudioLevel | N | Y | This parameter indicates the actual audio level sent in the Telepresence audio stream as it varies as a function of time, and may be signalled in the RTP header extension. See IETF RFC 6464 [26] and clause 7.1.3.4 of ITU-T H.TPS-AV [41]. |
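For dynamicAudioLevel, use of the client-to-mixer audio level RTP header extension of IETF RFC 6464 [26] is negotiated in SDP via the extmap attribute; in the sketch below the extension identifier value 1 and the EVS payload mapping are illustrative:

```sdp
m=audio 49170 RTP/AVP 96
a=rtpmap:96 EVS/16000/1
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
```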

8.2.4 Delay Parameters

Table 8.2.4.1: Delay parameters

| Parameter | Need for signalling at session initiation | Need for signalling during session | Remarks |
|---|---|---|---|
| endToEndVideoDelay | N | N | This parameter indicates the one-way end-to-end delay (camera lens to video display) of the video media sent between two Telepresence terminals. In order to provide a high-QoE telepresence experience to end-users, it is desirable for the end-to-end video delay to be less than 320 milliseconds [39]-[41]. Not signalled. |
| endToEndAudioDelay | N | N | This parameter indicates the one-way end-to-end delay (mouth to ear) of the audio media sent between two Telepresence terminals. In order to provide a high-QoE telepresence experience to end-users, it is desirable for the end-to-end audio delay to be less than 280 milliseconds [39]-[41]. Not signalled. |
| audioVideoSynchronization | N | N | This parameter indicates the synchronization between an audio stream and the corresponding video media stream (endToEndVideoDelay minus endToEndAudioDelay). In order to provide high-QoE telepresence services to end-users, telepresence systems should maintain synchronization within +40 and -60 milliseconds (i.e. the synchronization error is less than 40 ms if the audio stream is ahead of the video stream and less than 60 ms if the video stream is ahead of the audio stream) [39]-[41]. Not signalled. |

NOTE: Delay numbers are based on ITU-T references [39]-[41] and 3GPP-specific modifications are FFS.
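The synchronization criterion above can be evaluated directly from the two measured one-way delays; a minimal sketch (the function name and the delay values are hypothetical, for illustration only):

```python
def av_sync_ok(end_to_end_video_delay_ms: float,
               end_to_end_audio_delay_ms: float) -> bool:
    """Check the audio/video synchronization criterion of clause 8.2.4.

    skew = endToEndVideoDelay - endToEndAudioDelay:
      skew > 0 means the audio stream is ahead of the video stream (limit +40 ms);
      skew < 0 means the video stream is ahead of the audio stream (limit -60 ms).
    """
    skew = end_to_end_video_delay_ms - end_to_end_audio_delay_ms
    return -60.0 <= skew <= 40.0

# Hypothetical measurements:
# 300 ms video vs 270 ms audio: audio ahead by 30 ms, within the +40 ms limit.
print(av_sync_ok(300.0, 270.0))  # True
# 330 ms video vs 270 ms audio: audio ahead by 60 ms, exceeds the +40 ms limit.
print(av_sync_ok(330.0, 270.0))  # False
```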

8.2.5 Multiple Source Capture Parameters

Table 8.2.5.1: Multiple Source Capture parameters

| Parameter | Need for signalling at session initiation | Need for signalling during session | Remarks |
|---|---|---|---|
| multiContentCapture | Y | Y | See the Multiple Content Capture in clause 7.2 of IETF CLUE framework [7]. |
| MCC sources | Y | Y | See the Multiple Content Capture in clause 7.2 of IETF CLUE framework [7]. |
| MCC maxCaptures | Y | Y | See the MaxCaptures attribute in clause 7.2.1.1 of IETF CLUE framework [7]. |
| MCC policy | Y | Y | See the Policy MCC attribute in clause 7.2.1.2 of IETF CLUE framework [7]. |
| MCC synchronizationID | Y | Y | See the SynchronizationID MCC attribute in clause 7.2.1.3 of IETF CLUE framework [7]. |
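For illustration, a switched Multiple Content Capture referencing three source video captures might be advertised along the following lines. This fragment is a simplified sketch in the spirit of the CLUE data model [10] and is not schema-validated; the capture identifiers and the policy value are hypothetical:

```xml
<mediaCapture xsi:type="videoCaptureType" captureID="MCC0" mediaType="video">
  <captureSceneIDREF>CS1</captureSceneIDREF>
  <!-- The MCC switches between these three source captures. -->
  <content>
    <mediaCaptureIDREF>VC0</mediaCaptureIDREF>
    <mediaCaptureIDREF>VC1</mediaCaptureIDREF>
    <mediaCaptureIDREF>VC2</mediaCaptureIDREF>
  </content>
  <!-- At most one source capture is sent at a time, chosen by loudest speaker. -->
  <maxCaptures>1</maxCaptures>
  <policy>SoundLevel:0</policy>
</mediaCapture>
```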

8.3 Telepresence System Environment Parameters

Table 8.3.1: Telepresence System Environment parameters

| Parameter | Need for signalling at session initiation | Need for signalling during session | Remarks |
|---|---|---|---|
| illuminantType | Y | Y | This parameter describes the profile of the visible light at a telepresence endpoint. May need to be signalled if lighting changes during a session. Signalling is based on Annex E of ITU-T H.264 [16] and Annex E of ITU-T H.265 [17]. |
| illuminantCRIIndex | Y | Y | This parameter describes the colour rendering index (CRI) of the visible (ambient) light at the telepresence endpoint. Signalling is based on Annex E of ITU-T H.264 [16] and Annex E of ITU-T H.265 [17]. |
| illuminantColourTemperature | Y | Y | This parameter describes the correlated colour temperature (CCT) of the visible (ambient) light at the telepresence endpoint. Signalling is based on Annex E of ITU-T H.264 [16] and Annex E of ITU-T H.265 [17]. |