7 Data transport

3GPP TS 26.114 (Release 18): IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction


7.1 General

MTSI clients shall support an IP-based network interface for the transport of session control and media data. Control-plane signalling is sent using SIP; see TS 24.229 [7] for further details. Real-time user plane media data is sent over RTP/UDP/IP. Real-time interaction uses data channels over SCTP/DTLS/UDP/IP. Non-real-time media may use other transport protocols, for example UDP/IP or TCP/IP. An overview of the user plane protocol stack can be found in figure 4.3 of the present document.

7.2 RTP profiles

MTSI clients shall transport speech, video and real-time text using RTP (RFC 3550 [9]) over UDP (RFC 0768 [39]). The following profiles of RTP shall be supported for all media types:

– RTP Profile for Audio and Video Conferences with Minimal Control (RFC 3551 [10]), also called RTP/AVP;

The following profiles of RTP shall be supported for video and should be supported for all other media types:

– Extended RTP Profile for RTCP-based Feedback (RTP/AVPF) (RFC 4585 [40]), also called RTP/AVPF.

The support of AVPF requires an MTSI client in terminal to implement the RTCP transmission rules, the signalling mechanism for SDP and the feedback messages explicitly mentioned in the present document.

For a given RTP based media stream, the MTSI client in terminal shall use the same port number for sending and receiving RTP packets. This facilitates interworking with fixed/broadband access. However, the MTSI client shall accept RTP packets that are not received from the same remote port where RTP packets are sent by the MTSI client.

7.3 RTCP usage

7.3.1 General

The RTP implementation shall include an RTCP implementation.

For a given RTP based media stream, the MTSI client in terminal shall use the same port number for sending and receiving RTCP packets. This facilitates interworking with fixed/broadband access. However, the MTSI client shall accept RTCP packets that are not received from the same remote port where RTCP packets are sent by the MTSI client.

The bandwidth for RTCP traffic shall be described using the "RS" and "RR" SDP bandwidth modifiers at media level, as specified by RFC 3556 [42]. Therefore, an MTSI client shall include the "b=RS:" and "b=RR:" fields in SDP, and shall be able to interpret them. There shall be an upper limit on the allowed RTCP bandwidth for each RTP session signalled by the MTSI client. This limit is defined as follows:

– 8 000 bps for the RS field (at media level);

– 6 000 bps for the RR field (at media level).

The RS and RR values included in the SDP answer should be treated as the negotiated values for the session and should be used to calculate the total RTCP bandwidth for all terminals in the session.
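As an informal illustration of the RTCP bandwidth signalling above, the following Python sketch formats the media-level "b=RS:" and "b=RR:" lines and rejects values above the limits stated in this clause. The function and constant names are hypothetical and not part of the present document; the values are in bits per second, as in RFC 3556 [42].

```python
RS_LIMIT_BPS = 8000  # upper limit for the RS field at media level (this clause)
RR_LIMIT_BPS = 6000  # upper limit for the RR field at media level (this clause)


def rtcp_bandwidth_lines(rs_bps, rr_bps):
    """Return the media-level SDP lines for the RTCP bandwidth modifiers.

    Hypothetical helper: raises ValueError if a value is negative or
    exceeds the upper limit given in this clause.
    """
    if not 0 <= rs_bps <= RS_LIMIT_BPS:
        raise ValueError(f"RS must be within 0..{RS_LIMIT_BPS} bps")
    if not 0 <= rr_bps <= RR_LIMIT_BPS:
        raise ValueError(f"RR must be within 0..{RR_LIMIT_BPS} bps")
    return [f"b=RS:{rs_bps}", f"b=RR:{rr_bps}"]


# Example: no sender bandwidth, 2 000 bps for receiver reports.
print("\n".join(rtcp_bandwidth_lines(0, 2000)))
```

Passing zero for both values produces the lines that request deactivation of RTCP for a point-to-point speech only session.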

If the session described in the SDP is a point-to-point speech only session, the MTSI client may request the deactivation of RTCP by setting its RTCP bandwidth modifiers to zero.

If an MTSI client receives SDP bandwidth modifiers for RTCP equal to zero from the originating MTSI client, it should reply (via SIP) by setting its own RTCP bandwidth modifiers in SDP to zero.

RTCP packets should be sent for all types of multimedia sessions to enable synchronization with other RTP transported media, remote end-point aliveness information, monitoring of the transmission quality, and carriage of feedback messages such as TMMBR for video and RTCP APP for speech. The RR value should be set greater than zero to enable RTCP packets to be sent when media is put on hold and during active RTP media transmission, including real-time text sessions which may have infrequent RTP media transmissions.

Point-to-point speech only sessions may not require the above functionalities and may therefore turn off RTCP by setting the SDP bandwidth modifiers (RR and RS) to zero. When RTCP is turned off (for point-to-point speech only sessions) and the media is put on hold, the MTSI client should re-negotiate the RTCP bandwidth with the SDP bandwidth modifier RR value set greater than zero, and send RTCP packets (i.e., Receiver Reports) to the other end. This allows the remote end to detect link aliveness during hold. When media is resumed, the resuming MTSI client should request to turn off the RTCP sending again through a re-negotiation of the RTCP bandwidth with SDP bandwidth modifiers equal to zero.

When RTCP is turned off (for point-to-point speech only sessions) and if sending of an additional associated RTP stream becomes required and both RTP streams need to be synchronized, or if transport feedback due to lack of end-to-end QoS guarantees is needed, an MTSI client should re-negotiate the bandwidth for RTCP by sending an SDP with the RR bandwidth modifier greater than zero. Setting the RR bandwidth modifier greater than zero allows sending of RTCP Receiver Reports even when the session is put on hold and neither terminal is actively sending RTP media.

NOTE: Deactivating RTCP will disable the adaptation mechanism for speech defined in clause 10.2.

7.3.2 Speech

MTSI clients in terminals offering speech should support AVPF (RFC 4585 [40]). When allocating RTCP bandwidth, it is recommended to allocate RTCP bandwidth and set the values for the "b=RR:" and the "b=RS:" parameters such that a good compromise between the RTCP reporting needs for the application and bandwidth utilization is achieved, see also Annex A.6. The value of "trr-int" should be set to zero or not transmitted at all (in which case the default "trr‑int" value of zero will be assumed) when Reduced-Size RTCP (see clause 7.3.6) is not used.

For speech sessions it is beneficial to keep RTCP packets as small as possible in order to reduce the potential disruption of the RTP stream by RTCP in bandwidth-limited channels. RTCP packet sizes can be minimized by using Reduced-Size RTCP packets or by using only the parts of RTCP compound packets (according to RFC 3550 [9]) which are required by the application. RTCP compound packets should be no larger than 1 time, and shall be no larger than 4 times, the size of the RTP packets (including UDP/IP headers) corresponding to the highest bit rate of the speech codec modes used in the session. Reduced-Size RTCP and semi-compound RTCP packets should be no larger than 1 time, and shall be no larger than 2 times, the size of the RTP packets (including UDP/IP headers) corresponding to the highest bit rate of the speech codec modes used in the session.
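The size limits above can be sketched as a simple check. This is an illustration only: the function name and the return labels are hypothetical, and it assumes that the RTCP packet size and the reference RTP packet size are measured in the same way (both including UDP/IP headers).

```python
def check_rtcp_size(rtcp_len, max_rtp_len, reduced_size):
    """Classify an RTCP packet size against the speech size limits.

    rtcp_len     -- size of the RTCP packet in bytes
    max_rtp_len  -- size in bytes of an RTP packet (incl. UDP/IP headers) at
                    the highest bit rate of the codec modes used in the session
    reduced_size -- True for Reduced-Size or semi-compound RTCP packets,
                    False for compound RTCP packets
    """
    shall_factor = 2 if reduced_size else 4  # hard ("shall") limit
    if rtcp_len > shall_factor * max_rtp_len:
        return "violates-shall"
    if rtcp_len > max_rtp_len:  # recommended ("should") limit is 1 time
        return "exceeds-should"
    return "ok"


# Example with a hypothetical 72-byte reference RTP packet.
print(check_rtcp_size(200, 72, reduced_size=False))
print(check_rtcp_size(200, 72, reduced_size=True))
```

With the hypothetical 72-byte reference, a 200-byte compound packet is above the recommended size but within the hard limit, whereas a 200-byte Reduced-Size packet exceeds the hard limit.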

An MTSI client using ECN for speech in RTP sessions may support the RTCP AVPF ECN feedback message and the RTCP XR ECN summary report [84]. If the MTSI client supports the RTCP AVPF ECN feedback message then the MTSI client shall also support the RTCP XR ECN summary report.

NOTE 1: This can improve the interworking with non-MTSI ECN peers.

When an MTSI client that has negotiated the use of ECN receives RTP packets with ECN-CE marks, the MTSI client shall send application specific adaptation requests (RTP CMR [28] or RTCP-APP CMR, as defined in Subclause 10.2.1.5) and shall not send RTCP AVPF ECN feedback messages, even if RTCP AVPF ECN feedback messages were negotiated.

NOTE 2: RTP CMR is mandated to be supported by any AMR or AMR-WB implementation using the RTP profile [28].

When an MTSI client in terminal that has negotiated the use of ECN for speech and RTCP AVPF ECN feedback messages receives both application specific requests and RTCP AVPF ECN feedback messages, the MTSI client should follow the application specific requests for performing media bit rate adaptation.

When an MTSI client in terminal that has negotiated the use of ECN for speech and RTCP XR ECN summary reports receives an RTCP XR ECN summary report, the MTSI client should use the RTCP XR ECN summary report as specified in [84]. If the MTSI client received and acted upon a recent application specific adaptation request, then the MTSI client shall not perform any additional rate adaptation based on the received RTCP XR ECN summary report.

If ANBR (see clause 10.7) is available to the MTSI client in terminal, it should use this information when performing media bitrate adaptation. In addition, a media receiving MTSI client in terminal may send RTCP-APP or RTP CMR messages for speech rate adaptation based on adaptation decisions, including ANBR information.

For speech, RTCP APP packets are used for adaptation (see clause 10.2). If the MTSI client determines that RTCP APP cannot be used or does not work, then the MTSI client may use the inband CMR in the AMR RTP payload [28] or other RTCP mechanisms for adaptation.

An MTSI client that requests mode adaptation shall use the CMR in the AMR/AMR-WB RTP payload [28] when using the AMR or the AMR-WB codec or in the EVS payload [125] when using the EVS codec, respectively, when:

– the RTCP bandwidth is set to zero,

– the MTSI client detected that the remote end-point does not respond to adaptation requests sent with RTCP APP during the session, or

– the support for RTCP APP was not negotiated for the session.
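The three conditions listed above can be read as a single predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and it covers only these three conditions, not the additional EVS and ECN rules that follow.

```python
def must_use_inband_cmr(rtcp_bandwidth_zero, remote_ignores_rtcp_app,
                        rtcp_app_negotiated):
    """Return True when mode adaptation requests go in the RTP payload CMR.

    rtcp_bandwidth_zero     -- the RTCP bandwidth is set to zero
    remote_ignores_rtcp_app -- the remote end-point was detected not to respond
                               to RTCP APP adaptation requests in this session
    rtcp_app_negotiated     -- support for RTCP APP was negotiated
    """
    return (rtcp_bandwidth_zero
            or remote_ignores_rtcp_app
            or not rtcp_app_negotiated)


# RTCP APP negotiated and working, RTCP bandwidth non-zero.
print(must_use_inband_cmr(False, False, True))
```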

If RTCP-APP was negotiated, an MTSI client that requests mode adaptation for EVS shall use RTCP-APP when the CMR in the EVS RTP payload has been disabled for the session.

NOTE 3: It is not possible to send adaptation requests if both CMR in the EVS RTP payload has been disabled and if RTCP-APP is not negotiated for the session.

An MTSI client using AMR or AMR-WB that requests mode adaptation when no MTSI feature tag was received (see clause 5.2 of [57]) may use the CMR in the AMR/AMR-WB RTP payload [28] when AMR or AMR-WB is used, and may use the CMR in the EVS RTP payload [125] when EVS is used. If ECN-triggered adaptation is used and an MTSI client requests mode adaptation when no MTSI feature tag was received, it should use the CMR in the AMR RTP payload [28].

NOTE 4: Other procedures by which the MTSI client determines that RTCP APP cannot be used or does not work are implementation specific.

If ECN-triggered adaptation is used with AVP then the RTCP APP signalling could be too slow and CMR in the AMR RTP payload [28] should be used for faster feedback.

An MTSI client that requests mode adaptation in combination with other codec control requests (as defined in clause 10.2.1) shall use RTCP APP.

An MTSI client that requests rate adaptation for unidirectional streams shall use RTCP-based adaptation signalling (RTCP APP or RTCP SR/RR), since the CMR in the AMR RTP payload [28] is not usable for unidirectional streams.

7.3.3 Video

MTSI clients offering video shall support AVPF (RFC 4585 [40]). The behaviour can be controlled by allocating enough RTCP bandwidth using "b=RR:" and "b=RS:" (see clause 7.3.1) and setting the value of "trr-int".

MTSI clients offering video shall support transmission and reception of AVPF NACK messages, as an indication of non-received media packets. MTSI terminals offering video shall also support transmission and reception of AVPF Picture Loss Indication (PLI). The actions of an MTSI client receiving NACK or PLI to improve the situation for the MTSI client that sent NACK or PLI are defined in clause 9.3. Note that by setting the bitmask of following lost packets (BLP) the frequency of transmitting NACK can be reduced, but the repairing action by the MTSI client receiving the message can be delayed correspondingly.
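To illustrate how the BLP reduces the number of NACK entries, the sketch below groups lost RTP sequence numbers into (PID, BLP) pairs following the Generic NACK format of RFC 4585 [40], where bit i of the BLP (counting the least significant bit as bit 1) marks packet PID+i as lost. The function name is hypothetical, and the grouping is a simple greedy pass that does not try to be optimal across sequence number wrap-around.

```python
import struct


def build_nack_entries(lost_seqs):
    """Group lost 16-bit RTP sequence numbers into (PID, BLP) pairs."""
    remaining = sorted({seq & 0xFFFF for seq in lost_seqs})
    entries = []
    while remaining:
        pid = remaining[0]
        blp = 0
        rest = []
        for seq in remaining[1:]:
            offset = (seq - pid) & 0xFFFF  # distance modulo 2^16
            if 1 <= offset <= 16:
                blp |= 1 << (offset - 1)   # bit 1 is the least significant bit
            else:
                rest.append(seq)
        entries.append((pid, blp))
        remaining = rest
    return entries


def pack_nack_fci(entries):
    """Serialize (PID, BLP) pairs as 32-bit FCI entries in network byte order."""
    return b"".join(struct.pack("!HH", pid, blp) for pid, blp in entries)


entries = build_nack_entries([100, 101, 103, 116, 117])
print(entries)
print(pack_nack_fci(entries).hex())
```

Five losses are reported with two FCI entries here; waiting to fill the BLP is what can delay the repair action at the other end.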

The Temporary Maximum Media Bit-rate Request (TMMBR) and Temporary Maximum Media Bit-rate Notification (TMMBN) messages of Codec-Control Messages (CCM) [43] shall be supported by MTSI clients in terminals supporting video. The TMMBR notification messages along with RTCP sender reports and receiver reports are used for dynamic video rate adaptation. See clause 10.3 for usage and Annexes B and C for examples of bitrate adaptation.

MTSI clients supporting video shall support Full Intra Request (FIR) of CCM [43]. A sender should ignore FIR messages that arrive within Response Wait Time (RWT) duration after responding to a previous FIR message. Response Wait Time (RWT) is defined as RTP-level round-trip time, estimated by RTCP or some other means, plus twice the frame duration.
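The Response Wait Time rule can be sketched as follows. The class and attribute names are hypothetical, and the round-trip time and frame duration are assumed to be supplied by the caller in seconds.

```python
class FirResponder:
    """Decides whether a received FIR should be acted upon (illustrative)."""

    def __init__(self, rtt_s, frame_duration_s):
        # RWT = RTP-level round-trip time + twice the frame duration.
        self.rwt_s = rtt_s + 2 * frame_duration_s
        self._last_response_s = None

    def should_respond(self, now_s):
        """Return False for a FIR arriving within RWT of the last response."""
        if (self._last_response_s is not None
                and now_s - self._last_response_s < self.rwt_s):
            return False
        self._last_response_s = now_s
        return True


# Example: 100 ms round-trip time, 50 ms frame duration (20 frames/s).
responder = FirResponder(rtt_s=0.1, frame_duration_s=0.05)
print(responder.should_respond(0.00))  # first FIR is answered
print(responder.should_respond(0.15))  # within RWT of the response at 0.00
print(responder.should_respond(0.25))  # RWT has elapsed
```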

MTSI clients in terminals shall not use the SIP INFO message, as specified in [96], for video picture fast update.

The usage of the AVPF and CCM feedback messages is negotiated in SDP offer/answer, see Clause 6.2.3.2. Any AVPF or CCM feedback messages that have not been agreed in the SDP offer/answer negotiation shall not be used in the session, [40].

An MTSI client using ECN for video in RTP sessions may support the RTCP AVPF ECN feedback message and the RTCP XR ECN summary report [84]. If the MTSI client supports the RTCP AVPF ECN feedback message then the MTSI client shall also support the RTCP XR ECN summary report.

NOTE: This can improve the interworking with non-MTSI ECN-capable peers.

When an MTSI client that has negotiated the use of ECN and TMMBR receives RTP packets with ECN-CE marks, the MTSI client shall send application specific adaptation requests (TMMBR) and shall not send RTCP AVPF ECN feedback messages, even if RTCP AVPF ECN feedback messages were negotiated in addition to TMMBR.

When an MTSI client that has negotiated the use of ECN for video and RTCP AVPF ECN feedback messages receives both application specific requests and RTCP AVPF ECN feedback messages, the MTSI client should follow the application specific requests for performing media bit rate adaptation.

When an MTSI client that has negotiated the use of ECN for video and RTCP XR ECN summary reports receives an RTCP XR ECN summary report, the MTSI client should use the RTCP XR ECN summary report as specified in [84]. If the MTSI client received and acted upon a recent application specific adaptation request, then the MTSI client shall not perform any additional rate adaptation based on the received RTCP XR ECN summary report.

If ANBR (see clause 10.7) information is available to the MTSI client in terminal, it should use this information when performing media bitrate adaptation. In addition, a media receiving MTSI client in terminal may send RTCP feedback messages (e.g., TMMBR, TMMBN messages of CCM, etc.) for video rate adaptation based on adaptation decisions, including ANBR information.

7.3.4 Real-time text

For real-time text, RTCP reporting should be used according to general recommendations for RTCP.

7.3.5 Void

7.3.6 Reduced-Size RTCP

MTSI clients should support the use of Reduced-Size RTCP reports [87]. A Reduced-Size RTCP packet is an RTCP packet that does not follow the sending rules outlined in RFC 3550 [9] in the aspect that it does not necessarily contain the mandated RR/SR report blocks and SDES CNAME items.

As specified in RFC 5506 [87], a client that supports Reduced-Size RTCP shall also support AVPF, see clause 7.2. An SDP offer to use Reduced-Size RTCP shall also offer using AVPF.

When Reduced-Size RTCP is used, the following requirements apply on the RTCP receiver:

– The RTCP receiver shall be capable of parsing and decoding report blocks of the RTCP packet correctly even though some of the items mandated by RFC3550 [9] are missing.

– An SDP attribute "a=rtcp-rsize" is used to enable Reduced-Size RTCP. A receiver that accepts the use of Reduced-Size RTCP shall include the attribute in the SDP answer. If this attribute is not set in offer/answer, then Reduced-Size RTCP shall not be used in any direction.

When Reduced-Size RTCP is used, an RTCP sender transmitting Reduced-Size RTCP packets shall follow the requirements listed below:

– AVPF early or immediate mode shall be used according to RFC4585 [40].

– The "a=rtcp-rsize" attribute shall be included in the SDP offer, see Annex A.9a.

– Reduced-Size RTCP packets should be used for transmission of adaptation feedback messages, for example APP packets as defined in Clause 10.2 and TMMBR as defined in Clause 10.3. When regular feedback packets are transmitted, the individual packets that would belong to a compound RTCP packet shall be transmitted in a serial fashion, although adaptation feedback packets shall take precedence.

– Two or more RTCP packets should be stacked together, within the limits allowed by the maximum size of Reduced-Size RTCP packets (see clause 7.3.2) (i.e., to form a semi-compound RTCP packet which is smaller than a compound RTCP packet). The RTCP sender should not send Reduced-Size RTCP packets that are larger than the regularly scheduled compound RTCP packets.

– Compound RTCP packets with an SR/RR report block and CNAME SDES item should be transmitted on a regular basis as outlined in RFC 3550 [9] and RFC 4585 [40]. In order to control the allocation of bandwidth between Reduced-Size RTCP and compound RTCP, the AVPF "trr-int" parameter should be used to set the minimum report interval for compound RTCP packets.

– The first transmitted RTCP packet shall be a compound RTCP packet as defined in RFC3550 [9] without the size restrictions defined in clause 7.3.2.
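The SDP conditions for Reduced-Size RTCP given in the two lists above can be sketched as a check on one media description from the offer and the corresponding one from the answer. The function names are hypothetical, and the sketch only inspects the "m=" line profile and the "a=rtcp-rsize" attribute; it is not a full SDP parser.

```python
def _uses_avpf(media_lines):
    """True if the first m= line uses an AVPF-based profile (e.g. RTP/AVPF)."""
    for line in media_lines:
        if line.startswith("m="):
            fields = line.split()
            return len(fields) >= 3 and fields[2].endswith("AVPF")
    return False


def _has_rsize(media_lines):
    return any(line.strip() == "a=rtcp-rsize" for line in media_lines)


def reduced_size_rtcp_enabled(offer_media, answer_media):
    """Reduced-Size RTCP is used only if both sides signal it together with AVPF."""
    return (_uses_avpf(offer_media) and _has_rsize(offer_media)
            and _uses_avpf(answer_media) and _has_rsize(answer_media))


offer = ["m=video 49154 RTP/AVPF 99", "a=rtcp-rsize"]
answer_with = ["m=video 51372 RTP/AVPF 99", "a=rtcp-rsize"]
answer_without = ["m=video 51372 RTP/AVPF 99"]
print(reduced_size_rtcp_enabled(offer, answer_with))
print(reduced_size_rtcp_enabled(offer, answer_without))
```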

The application should verify that the Reduced-Size RTCP packets are successfully received by the other end-point. Verification can be done by implicit means; for instance, an RTCP sender that sends adaptation feedback requests is expected to detect some kind of response to the requests in the media stream. If verification fails, then the RTCP sender shall switch to the use of compound RTCP packets according to the rules outlined in RFC 3550 [9].

Examples of SDP negotiation for Reduced-Size RTCP are given in Clause A.9a.

7.3.7 Video Region-of-Interest (ROI) Signaling

Video Region-of-Interest (ROI) consists of signalling the currently requested region-of-interest (ROI) of the video on the receiver side to the sender for appropriate encoding and transmission.

Video ROI is composed of three modes of signalling from an MTSI receiver to an MTSI sender in order to request a desired region of interest, and an MTSI client supporting ROI shall support at least one of these modes:

– ‘FECC’ mode, in which the MTSI client uses the FECC protocol based on ITU-T H.281 over H.224 [135]-[138] to signal ROI information as a sequence of ‘Pan’, ‘Tilt’, ‘Zoom’ and ‘Focus’ (PTZF) commands.

– ‘Arbitrary ROI’ mode, in which the MTSI receiver determines a specific ROI and signals this ROI to the MTSI sender.

– ‘Pre-defined ROI’ mode, in which the MTSI receiver selects one of the ROIs pre-determined by the MTSI sender and signals this ROI to the MTSI sender. In this mode, the MTSI receiver obtains the set of pre-defined ROIs from the MTSI sender during the SDP capability negotiation.

In the FECC mode, the ROI information shall be signaled by the MTSI client via RTP packets that carry H.224 frames using the stack IP/UDP/RTP/H.224/H.281. FECC is internal to the H.224 frame and is identified by the client ID field of the H.224 packet. The zooming to a particular region of interest is enabled by the H.281 protocol that supports the 4 basic camera movements "PTZF" (Pan, Tilt, Zoom, and Focus). In case of a fixed camera without pan/tilt capabilities, the pan command should be mapped to left/right movements/translations and tilt command should be mapped to up/down movements/translations over the 2D image plane. As such, a combination of PTZ commands can still allow for zooming into an arbitrary ROI.

The signalling of ‘Arbitrary ROI’ and ‘Pre-defined ROI’ requests uses RTCP feedback messages as specified in IETF RFC 4585 [40]. The RTCP feedback message is identified by PT (payload type) = PSFB (206) which refers to payload-specific feedback message. FMT (feedback message type) shall be set to the value ‘9’ for ROI feedback messages. The IANA registration information for the FMT value for ROI is provided in Annex R.1. The RTCP feedback method may involve signaling of ROI information in both of the immediate feedback and early RTCP modes.

The FCI (feedback control information) format for ROI shall be as follows. The FCI shall contain exactly one ROI. The ROI information is composed of the following parameters:

– Position_X – specifies the x-coordinate for the upper left corner of the ROI area covered in the original content (i.e., uncompressed captured content) in units of pixels

– Position_Y – specifies the y-coordinate for the upper left corner of the ROI area covered in the original content in units of pixels

– Size_X – specifies the horizontal size of the ROI area covered in the original content in units of pixels

– Size_Y – specifies the vertical size of the ROI area covered in the original content in units of pixels

– ROI_ID – identifies the pre-defined ROI selected by the MTSI receiver

For ‘Arbitrary ROI’ requests, the RTCP feedback message for ROI shall contain the parameters Position_X, Position_Y, Size_X and Size_Y. The values of the parameters Position_X, Position_Y, Size_X and Size_Y shall each be indicated using two bytes. The MTSI sender shall ignore ROI requests describing regions outside the original video. The FCI for the RTCP feedback message for ‘Arbitrary ROI’ shall follow the following format:

For each two-byte indication of the Position_X, Position_Y, Size_X and Size_Y parameters, the high byte (indicated by ‘(h)’ above) shall be followed by the low byte (indicated by ‘(l)’ above), where the low byte holds the least significant bits.
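As an illustration of the byte ordering described above, the following sketch serializes the four ‘Arbitrary ROI’ parameters as two-byte values, high byte first. The function names are hypothetical. The field order (Position_X, Position_Y, Size_X, Size_Y) follows the order in which the parameters are listed in the text, and the bounds check is one possible reading of "regions outside the original video"; both are assumptions.

```python
import struct


def pack_arbitrary_roi(position_x, position_y, size_x, size_y):
    """Return the 8-byte FCI: four unsigned 16-bit values, high byte first."""
    return struct.pack("!HHHH", position_x, position_y, size_x, size_y)


def roi_inside_frame(position_x, position_y, size_x, size_y,
                     frame_width, frame_height):
    """True if the ROI lies entirely within the original (captured) frame."""
    return (size_x > 0 and size_y > 0
            and position_x + size_x <= frame_width
            and position_y + size_y <= frame_height)


# A 640x360 region whose upper left corner is at (320, 180).
fci = pack_arbitrary_roi(320, 180, 640, 360)
print(fci.hex())  # 014000b402800168
print(roi_inside_frame(320, 180, 640, 360, 1280, 720))
```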

For ‘Pre-defined ROI’ requests, the RTCP feedback message for ROI shall contain the ROI_ID parameter. The value of ROI_ID shall be acquired from the "a=predefined_ROI" attributes that are indicated in the SDP offer-answer negotiation (see clause 6.2.3.4 for the related SDP-based procedures). The value for the ROI_ID parameter shall be indicated using one byte. The FCI for the RTCP feedback message for ‘Pre-defined ROI’ shall follow the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   all ones                    |    ROI_ID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

If ‘Arbitrary ROI’ and ‘Pre-defined ROI’ are both successfully negotiated, then the RTCP feedback message from the MTSI receiver shall conform to one of the two message formats specified above for ‘Arbitrary ROI’ or ‘Pre-defined ROI’, respectively. The MTSI sender should distinguish between the two RTCP feedback message formats by parsing the first 24 bits, which are set to all ones only in the case of ‘Pre-defined ROI’ requests.
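A receiver-side sketch of this distinction is given below. The function name and the returned tuples are hypothetical, and the 8-byte ‘Arbitrary ROI’ layout (four two-byte values in the order Position_X, Position_Y, Size_X, Size_Y) is an assumption based on the parameter order in the text.

```python
import struct


def parse_roi_fci(fci):
    """Parse an ROI FCI, returning ("predefined", roi_id) or
    ("arbitrary", (position_x, position_y, size_x, size_y))."""
    if fci[:3] == b"\xff\xff\xff":  # first 24 bits all ones
        if len(fci) != 4:
            raise ValueError("Pre-defined ROI FCI must be 4 bytes")
        return ("predefined", fci[3])
    if len(fci) != 8:
        raise ValueError("Arbitrary ROI FCI must be 8 bytes")
    return ("arbitrary", struct.unpack("!HHHH", fci))


print(parse_roi_fci(b"\xff\xff\xff\x02"))
print(parse_roi_fci(bytes.fromhex("014000b402800168")))
```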

The semantics of the ROI feedback messages is independent of the payload type.

‘Sent ROI’ involves signalling from the MTSI sender to the MTSI receiver. It lets the MTSI receiver know the actually sent ROI corresponding to the video transmitted by the MTSI sender. The sent ROI may or may not agree with the ROI requested by the MTSI receiver, but shall contain it so that the end user is still able to see the desired ROI. When ‘Sent ROI’ is successfully negotiated, it shall be signalled by the MTSI sender.

If the sent ROI corresponds to an arbitrary ROI (indicated via the URN urn:3gpp:roi-sent in the SDP negotiation, see clause 6.2.3.4), the signalling of the ROI shall use RTP header extensions as specified in IETF RFC 5285 [95] and shall carry the Position_X, Position_Y, Size_X and Size_Y parameters corresponding to the actually sent ROI. The one-byte form of the header should be used. The values for the parameters Position_X, Position_Y, Size_X and Size_Y shall each be indicated using two bytes, with the following format:

The 4-bit ID is the local identifier as defined in [95]. The length field takes the value 7 to indicate that 8 bytes follow. For each two-byte indication of the Position_X, Position_Y, Size_X and Size_Y parameters, the high byte (indicated by ‘(h)’ above) shall be followed by the low byte (indicated by ‘(l)’ above), where the low byte holds the least significant bits.

If the sent ROI corresponds to one of the pre-defined ROIs (indicated via the URN urn:3gpp:predefined-roi-sent in the SDP negotiation, see clause 6.2.3.4), then the signalling of the ROI shall again use the RTP header extensions and shall carry the ROI_ID parameter corresponding to the actually sent pre-defined ROI. The one-byte form of the header should be used. The value for the ROI_ID parameter shall be indicated using one byte, with the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | len=0 |    ROI_ID     |         zero padding          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In this case, the length field takes the value 0 to indicate that only a single byte follows.
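The two ‘Sent ROI’ header extension elements can be sketched as follows, using the one-byte header form of RFC 5285 [95], in which the length field carries the number of data bytes minus one. The function names are hypothetical, the local identifier value 3 is an arbitrary example (the real value comes from the SDP negotiation), and the parameter order of the arbitrary ROI element is assumed to follow the order listed in the text.

```python
import struct


def sent_arbitrary_roi_element(ext_id, position_x, position_y, size_x, size_y):
    """One-byte-header element: ID, len=7, then four two-byte values."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are 1..14")
    header = (ext_id << 4) | 7  # len=7 indicates that 8 bytes follow
    return bytes([header]) + struct.pack(
        "!HHHH", position_x, position_y, size_x, size_y)


def sent_predefined_roi_element(ext_id, roi_id):
    """One-byte-header element: ID, len=0, then the one-byte ROI_ID."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are 1..14")
    header = ext_id << 4        # len=0 indicates that a single byte follows
    return bytes([header, roi_id])


def pad_to_word(data):
    """Append zero padding up to the next 32-bit boundary."""
    return data + b"\x00" * (-len(data) % 4)


print(pad_to_word(sent_predefined_roi_element(3, 5)).hex())  # 30050000
print(pad_to_word(sent_arbitrary_roi_element(3, 320, 180, 640, 360)).hex())
```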

‘Arbitrary ROI’ and ‘Pre-defined ROI’ may be supported bi-directionally or uni-directionally depending on how clients negotiate to support the feature during SDP capability negotiations. For terminals with asymmetric capability (e.g. the ability to process ROI information but not detect/signal ROI information), the sendonly and recvonly attributes may be used. Terminals should express their capability in each direction sufficiently clearly such that signals are only sent in each direction to the extent that they both express useful information and can be processed by the recipient.

‘Arbitrary ROI’ and ‘Pre-defined ROI’ support may be offered at the same time, or only one of them may be offered. When both capabilities are successfully negotiated by the MTSI sender and receiver, it is the MTSI receiver’s decision to request an arbitrary ROI or one of the pre-defined ROIs at a given time. When pre-defined ROIs are offered by the MTSI sender, it is also the responsibility of the MTSI sender to detect and track any movements of the ROI (e.g., the ROI could be a moving car or a moving person) and refine the content encoding accordingly.

The presence of ROI signalling should not impact the negotiated resolutions (based on SDP imageattr attribute) between the sending and receiving terminals. The only difference is that the sending terminal should encode only the ROI with the negotiated resolution rather than the whole captured frame, and this would lead to a higher overall resolution and better user experience than having the receiving terminal zoom in on the ROI and crop out the rest of the frame.

The ROI information parameters exchanged via the RTP/RTCP signalling defined above are independent of the negotiated video resolution for the encoded content. Instead, the ROI information parameters defined above take as reference the original video content, i.e., uncompressed captured video content. Therefore, no modifications or remappings of ROI parameters are necessary during any transcoding that results in changes in video resolution or during potential dynamic adaptations of encoded video resolution at the sender.

An MTSI sender may have to handle multiple simultaneously received ROI requests. The encoder at the MTSI sender may consider the multiple ROI requests to determine a proximity ROI that is a larger area that contains all the requested ROIs, and encode the transmitted video stream according to the proximity ROI. The encoder may iteratively adjust the proximity ROI based on the interactive additional ROI requests received from the remote clients. These additional ROI requests can be in the form of PTZF commands (using the FECC protocol) corresponding to the desired translation of the proximity ROI each MTSI receiver wishes the MTSI sender to make. Alternatively, the MTSI sender may offer the set of candidate proximity ROIs to the MTSI receivers using the pre-defined ROI signalling framework, and collect responses from the MTSI receivers to determine their preferred proximity ROIs. By considering these additional ROI requests, the MTSI sender can make a better decision on the proximity ROI to fulfil the requests of as many MTSI receivers as possible.
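One simple way to derive a proximity ROI is the smallest rectangle that contains all requested ROIs. The sketch below shows only this bounding-box step; the function name is hypothetical, and how an encoder actually chooses or refines the proximity ROI is left to the implementation.

```python
def proximity_roi(requests):
    """Return (position_x, position_y, size_x, size_y) of the smallest
    rectangle containing every requested ROI.

    requests -- iterable of (position_x, position_y, size_x, size_y) tuples,
                all expressed in pixels of the original captured content
    """
    requests = list(requests)
    if not requests:
        raise ValueError("at least one ROI request is required")
    left = min(x for x, _, _, _ in requests)
    top = min(y for _, y, _, _ in requests)
    right = max(x + w for x, _, w, _ in requests)
    bottom = max(y + h for _, y, _, h in requests)
    return (left, top, right - left, bottom - top)


# Two overlapping requests from different MTSI receivers.
print(proximity_roi([(100, 100, 200, 100), (250, 50, 100, 100)]))  # (100, 50, 250, 150)
```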

When the MTSI sender is not able to derive a proximity ROI from the received concurrent ROI requests, the MTSI sender should transmit the full-size view of the video to those users whose ROI requests cannot be satisfied. In case of ‘Pre-defined ROI’, this can be achieved by including the full-size view of the video in the list of pre-defined ROIs. Then, the MTSI sender can transmit the full-size view of the video and also signal the corresponding ROI_ID (via the RTP header extension using ‘Sent ROI’) if a specific pre-defined ROI request cannot be satisfied. In case of ‘Arbitrary ROI’, the MTSI sender can transmit the full-size view of the video and also signal the corresponding coordinates of the full-size view (via the RTP header extension using ‘Sent ROI’) during times when an ROI request cannot be satisfied.

7.3.8 Delay Budget Information (DBI) Signaling

RAN delay budget reporting is specified in TS 36.331 [160] for E-UTRA and TS 38.331 [163] for NR, while the use of RAN delay budget reporting is specified for coverage enhancements only in E-UTRA. RAN delay budget reporting through the use of RRC signalling to eNB / gNB allows UEs to locally adjust air interface delay. Based on the reported delay budget information, a good coverage UE on the receiving end (i.e., the UE that contains the MTSI receiver) can reduce its air interface delay, e.g., by turning off CDRX or via other means. This additional delay budget can then be made available for the sending UE (i.e., the UE that contains the MTSI sender), and can be quite beneficial for the sending UE when it suffers from poor coverage. When the sending UE is in bad coverage, it would request the additional delay from its local eNB / gNB, and if granted, it would utilize the additional delay budget to improve the reliability of its uplink transmissions in order to reduce packet loss, e.g., via suitable repetition or retransmission mechanisms, and thereby improve end-to-end delay and quality performance.

While RAN-level delay budget reporting as defined in TS 36.331 [160] and TS 38.331 [163] allows UEs (i.e., MTSI sender and MTSI receiver) to locally adjust air interface delay, such a mechanism does not provide coordination between the UEs on an end-to-end basis. To alleviate this issue, this clause defines RTCP signalling to realize the following capabilities on signalling of delay budget information (DBI) across UEs: (i) an MTSI receiver can indicate available delay budget to an MTSI sender, and (ii) an MTSI sender can explicitly request delay budget from an MTSI receiver.

More specifically, the RTCP-based signalling of DBI is composed of a dedicated RTCP feedback (FB) message type to carry available additional delay budget during the RTP streaming of media, signalled from the MTSI receiver to the MTSI sender. In addition, the defined RTCP feedback message type may also be used to carry requested additional delay budget during the RTP streaming of media, signalled from the MTSI sender to the MTSI receiver.

A corresponding dedicated SDP parameter on the RTCP-based ability to signal available or requested additional delay budget during the IMS/SIP based capability negotiations is also defined, as described in sub-clause 6.2.8.

Such RTCP-based signaling of DBI can also be used by an MTSI receiver to indicate delay budget availability created via other means such as jitter buffer size adaptation as mentioned in clause 8.2.1.

The signalling of available or requested additional delay budget information (DBI) shall use RTCP feedback messages as specified in IETF RFC 4585 [40]. The RTCP feedback message is identified by PT (payload type) = RTPFB (205) which refers to RTP-specific feedback message. FMT (feedback message type) shall be set to the value ‘10’ for delay budget information (DBI). The RTCP feedback method may involve signalling of available or requested additional delay budget in both of the immediate feedback and early RTCP modes.

As such, the RTCP feedback message shall be sent from the MTSI receiver to the MTSI sender to convey to the sender the available additional delay budget from the perspective of the receiver. The recipient UE of the RTCP feedback message (i.e., the UE containing the MTSI sender) may then use this information in determining how much delay budget it may request from its eNB / gNB over the RAN interface, e.g. by using RRC signalling based on UEAssistanceInformation as defined in TS 36.331 [160] and TS 38.331 [163].

The FCI (feedback control information) format shall be as follows. The FCI shall contain exactly one instance of the available additional delay budget information, composed of the following parameters:

– Available additional delay budget delay – specified in milliseconds (16 bits)

– Sign ‘s’ for the additional delay budget delay, indicating whether it is positive or negative – specified as a Boolean (1 bit)

– Query ‘q’ for additional delay budget – specified as a Boolean (1 bit)

The sign value ‘s’ may be positive, indicated by ‘1’, or negative, indicated by ‘0’. When the additional delay parameter takes on a positive value, the UE indicates that there is additional delay budget available. When the additional delay parameter takes on a negative value, the UE indicates that the available delay budget has been reduced. A sequence of RTCP feedback messages may be sent by the UE to report on the additional delay budget availability in increments.

When the MTSI receiver sends RTCP feedback messages indicating the available delay budget for the received RTP stream, the query parameter shall be set to ‘0’. When the MTSI sender sends RTCP feedback messages indicating the requested delay budget for the RTP stream sent from the MTSI sender to the MTSI receiver, the query parameter shall be set to ‘1’. In this case, the value of delay indicates the additional delay budget requested by the sender of the RTCP feedback message (i.e., the MTSI sender) for the RTP stream sent from the MTSI sender to the MTSI receiver.

The FCI for the proposed RTCP feedback message shall follow the following format where (i) ‘s’ stands for the single-bit message on the sign of the additional delay parameter and (ii) ‘q’ stands for the single-bit message on query:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             delay             |s|q|       zero padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The high byte of delay shall be followed by the low byte, where the low byte holds the least significant bits.
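The FCI layout above can be sketched in Python as follows. This is a minimal illustration, not a normative encoder: it assumes that bit 0 in the diagram is the most significant bit of the first octet (the usual IETF convention), that delay is carried as an unsigned magnitude with the direction given by ‘s’, and the function names are hypothetical.

```python
import struct


def pack_dbi_fci(delay_ms: int, positive: bool, query: bool) -> bytes:
    """Pack delay (16 bits), s (1 bit), q (1 bit) and 14 zero padding bits."""
    if not 0 <= delay_ms <= 0xFFFF:
        raise ValueError("delay must fit in 16 bits")
    # s is the top bit of the second 16-bit word, q the next one.
    flags = (int(positive) << 15) | (int(query) << 14)
    # Network byte order puts the high byte of delay first.
    return struct.pack("!HH", delay_ms, flags)


def unpack_dbi_fci(fci: bytes):
    """Return (delay_ms, positive, query) from a 4-byte FCI."""
    delay_ms, flags = struct.unpack("!HH", fci)
    return delay_ms, bool(flags & 0x8000), bool(flags & 0x4000)


def signed_delay_ms(fci: bytes) -> int:
    """Relative change in delay budget: positive when s = 1, negative when s = 0."""
    delay_ms, positive, _ = unpack_dbi_fci(fci)
    return delay_ms if positive else -delay_ms


# Receiver reports 300 ms of additional budget (q = 0).
report = pack_dbi_fci(300, positive=True, query=False)
# Sender requests 20 ms of additional budget (q = 1).
request = pack_dbi_fci(20, positive=True, query=True)
```

Because reports are relative, a receiver of these messages would add `signed_delay_ms` of each message to a running total to track the currently indicated budget.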

Annex V presents example signalling flows on RAN delay budget reporting usage for voice in MTSI with and without DBI signalling.

An MTSI receiver shall not indicate available delay budget to an MTSI sender via DBI signalling more frequently than once every T_DBI seconds, provided that the necessary amount of RTCP bandwidth is available. If an MTSI receiver indicates available delay budget to an MTSI sender via DBI signalling, this shall mean that the indicated delay budget amount is available to the MTSI sender for at least the duration of T_DBI seconds. An MTSI sender shall not request delay budget from an MTSI receiver via DBI signalling more frequently than once every T_DBI seconds. T_DBI shall be set to a value between 1 and 3 seconds.

NOTE: The requirement on how to set the exact value of T_DBI is FFS.
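The pacing rule above can be sketched as a small gate that an implementation consults before emitting a DBI message. This is an illustrative sketch only: the class and method names are hypothetical, time is passed in explicitly so the logic can be exercised without a clock, and the default of 1.6 seconds simply follows the recommendation given later in this clause.

```python
class DbiPacer:
    """Allow at most one DBI message per T_DBI seconds in one direction."""

    def __init__(self, t_dbi_s: float = 1.6):
        if not 1.0 <= t_dbi_s <= 3.0:
            raise ValueError("T_DBI shall be between 1 and 3 seconds")
        self.t_dbi_s = t_dbi_s
        self._last_sent_s = None  # time of the last DBI message, if any

    def try_send(self, now_s: float) -> bool:
        """Return True and record the time if a DBI message may be sent now."""
        if self._last_sent_s is not None and now_s - self._last_sent_s < self.t_dbi_s:
            return False
        self._last_sent_s = now_s
        return True


pacer = DbiPacer(t_dbi_s=2.0)
decisions = [pacer.try_send(t) for t in (0.0, 1.5, 2.0, 3.9, 4.0)]
```

A separate pacer instance would be kept for the indication direction and for the request direction, since the two limits apply independently.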

Timing-wise, it is possible that DBI signalling may happen concurrently or asynchronously between the MTSI sender and MTSI receiver, i.e., the MTSI receiver may indicate available delay budget to the MTSI sender, while the MTSI sender may request delay budget from an MTSI receiver.

If the MTSI sender receives available delay budget information from an MTSI receiver via DBI signalling, this delay budget is available for its uplink over the duration of at least T_DBI seconds. Thus, if an MTSI receiver has already indicated available delay budget to the MTSI sender via DBI signalling, reception of a DBI request from the MTSI sender at any time within the time window of T_DBI seconds shall not trigger any further DBI signalling from the MTSI receiver to the MTSI sender on the available delay budget sooner than T_DBI seconds following the last indication of the available delay budget.

Once the period of T_DBI seconds following the last indication of the available delay budget is over, if the available delay budget has changed, the MTSI receiver shall inform the MTSI sender on the new delay budget availability (as a relative value as explained above) using DBI signalling. If the MTSI sender does not receive any new DBI signalling on the available delay budget from the MTSI receiver after the T_DBI second period is over, it shall mean the continued availability of the same amount of delay budget indicated to the MTSI sender via the latest DBI signalling.

Likewise, if the MTSI sender no longer needs the additional delay budget it has requested earlier or has a delay budget request that is different from what it had requested earlier, it shall inform the MTSI receiver about the new delay budget request (as a relative value as explained above) via DBI signalling. If the MTSI receiver does not receive any new DBI signalling on the requested delay budget from the MTSI sender after the T_DBI second period is over, this shall mean that the MTSI sender is still requesting the same amount of delay budget indicated to the MTSI receiver via the latest DBI signalling.

It should be noted that the delayBudgetReportingProhibitTimer parameter for RAN delay budget reporting as defined in TS 36.331 [160] for E-UTRA and TS 38.331 [163] for NR may take any of the values among 0, 0.4, 0.8, 1.6, 3, 6, 12 and 30 seconds, as set by the local eNB / gNB. Hence, if an MTSI receiver is to provide additional delay budget by locally adjusting air interface delay via RAN delay budget reporting (as opposed to adjusting its jitter buffer size, which can be set independently from the delayBudgetReportingProhibitTimer parameter), the frequency of its signalling to eNB / gNB is subject to the delayBudgetReportingProhibitTimer parameter. Likewise, when an MTSI sender requests delay budget from its local eNB / gNB via RAN delay budget reporting, the frequency of this signalling is subject to the delayBudgetReportingProhibitTimer parameter. Therefore, it should be observed that end-to-end delay adaptation through the use of RAN delay budget reporting and DBI signalling may be limited when the eNB / gNB sets the delayBudgetReportingProhibitTimer parameter to a large value. In particular, if delayBudgetReportingProhibitTimer is set to a value larger than T_DBI seconds, then DBI signalling cannot be used in conjunction with RAN delay budget reporting.

Provided that the delayBudgetReportingProhibitTimer configurations over the uplink and downlink access networks of the respective MTSI sender and MTSI receiver both do not exceed 3 seconds, T_DBI should be set to a value greater than or equal to the maximum of the delayBudgetReportingProhibitTimer configurations over uplink and downlink access networks. In case an MTSI receiver adjusts its jitter buffer size and does not use RAN delay budget reporting, delayBudgetReportingProhibitTimer parameter for downlink may be considered to be set to zero as part of this recommendation. Typical delayBudgetReportingProhibitTimer configurations will be in the values of 0, 0.4, 0.8, 1.6 seconds, so setting T_DBI to 1.6 seconds is recommended to operate with typical delayBudgetReportingProhibitTimer configurations.
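The recommendation above can be illustrated with a short Python sketch. It is one possible reading, not a normative rule: it picks the smallest T_DBI that is at least the larger of the two prohibit timer values while staying within the 1 to 3 second range, and it returns no recommendation when either timer exceeds 3 seconds. The function name and the choice of the smallest admissible value are assumptions.

```python
from typing import Optional

T_DBI_MIN_S = 1.0
T_DBI_MAX_S = 3.0


def recommend_t_dbi(uplink_timer_s: float,
                    downlink_timer_s: float = 0.0) -> Optional[float]:
    """Smallest admissible T_DBI for the given prohibit timer settings.

    downlink_timer_s defaults to 0, matching a receiver that adapts its
    jitter buffer instead of using RAN delay budget reporting.
    """
    worst = max(uplink_timer_s, downlink_timer_s)
    if worst > T_DBI_MAX_S:
        return None  # the recommendation does not cover this configuration
    return max(T_DBI_MIN_S, worst)


typical = recommend_t_dbi(0.4, 1.6)
jitter_buffer_only = recommend_t_dbi(0.8)
too_large = recommend_t_dbi(6.0, 0.4)
```

For the typical timer values of 0, 0.4, 0.8 and 1.6 seconds the result never exceeds 1.6 seconds, which is consistent with the recommended setting.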

When transcoding is present on the media path between the MTSI sender and MTSI receiver in the packet-switched domain, the end-to-end delay and quality performance enhancements realized by DBI signalling are still applicable as long as the media gateway in between passes the RTCP feedback messages carrying DBI. There may be a possible reduction however on the end-to-end performance gains, due to the additional delays incurred from transcoding.

When transcoding is present on the media path between an MTSI sender in the packet-switched domain and a media receiver in the circuit-switched domain, the end-to-end delay and quality performance enhancements realized by DBI signalling may still be applicable if the media gateway is able to offer additional delay budget, e.g., by extending its jitter buffer size, while also considering the fixed delay over the circuit-switched domain. In this case, the media gateway may receive delay budget request from the MTSI sender via DBI signalling, and the media gateway may further inform the MTSI sender about available delay budget via DBI signalling (note that no DBI signalling happens in the circuit switched domain).

In case of multiparty conferencing, DBI signalling may also be useful to improve end-to-end delay and quality performance of the RTP streams exchanged between the clients and conferencing server. In particular, an MSMTSI client (as defined in Annex S) and an MSMTSI MRF (as defined in Annex S) may negotiate DBI signalling using the SDP based procedures described in sub-clause 6.2.8. An MSMTSI client may then use DBI signalling to indicate available additional delay budget for the RTP streams received from the MSMTSI MRF and also request additional delay budget for the RTP streams it sends to the MSMTSI MRF. Likewise, an MSMTSI MRF may then use DBI signalling to indicate available additional delay budget for the RTP streams received from the MSMTSI client and also request additional delay budget for the RTP streams it sends to the MSMTSI client.

7.4 RTP payload formats for MTSI clients

7.4.1 General

This clause specifies RTP payload formats for MTSI clients, except for MTSI media gateways, which are specified in clause 12.3.2, for all codecs supported by MTSI in clause 5.2. Note that each RTP payload format also specifies media type signalling for usage in SDP.

7.4.2 Speech

When the AMR codec is selected in the SDP offer-answer negotiation the AMR payload format [28] shall be used between RTP termination points.

When the AMR-WB codec is selected in the SDP offer-answer negotiation the AMR-WB payload format [28] shall be used between RTP termination points.

NOTE 1: It may happen that EVS AMR-WB IO encoded speech is transported using the AMR-WB payload format between an EVS-capable MTSI client and a legacy (not EVS capable) MTSI client. This may also happen after SRVCC (see Clause 12.3.4) when an EVS-capable MTSI client sends EVS AMR-WB IO encoded speech in EVS payload format to the ATGW and the ATGW then re-packetizes the EVS AMR-WB IO packet into AMR-WB payload format without performing transcoding of the media.

When the EVS codec is selected in the SDP offer-answer negotiation the EVS payload format [125] shall be used between RTP termination points.

NOTE 2: After SRVCC when a CS UE (not EVS capable) sends AMR-WB encoded speech to the ATGW, it may happen that the ATGW then re-packetizes this AMR-WB packet into the EVS payload format without performing transcoding of the media, see clause 12.3.4.

In case of ambiguity the present specification shall take precedence over RFC 4867 [28].

MTSI clients (except MTSI MGW) shall support both the bandwidth-efficient and the octet-aligned payload format of the AMR/AMR-WB payload format [28]. The bandwidth‑efficient payload format shall be preferred over the octet-aligned payload format.

When sending AMR or AMR-WB encoded media, the RTP Marker Bit shall be set according to Section 4.1 of the AMR/AMR-WB payload format [28]. When sending EVS encoded media, the RTP Marker Bit shall be set as described in the EVS payload format [125].

The MTSI clients (except MTSI MGW) should use the SDP parameters defined in table 7.1 for the session. For all access technologies, and for normal operating conditions, the MTSI client should encapsulate the number of non-redundant (a.k.a. primary) speech frames in the RTP packets that corresponds to the ptime value received in SDP from the other MTSI client, or if no ptime value has been received then according to "Recommended encapsulation" defined in table 7.1. The MTSI client may encapsulate more non-redundant speech frames in the RTP packet but shall not encapsulate more than 4 non-redundant speech frames in the RTP packets. The MTSI client may encapsulate any number of redundant speech frames in an RTP packet but the length of an RTP packet, measured in ms, shall never exceed the maxptime value.

NOTE 3: The terminology "non-redundant speech frames" refers to speech frames that have not been transmitted in any preceding packet.

Table 7.1: Encapsulation parameters (to be used as defined above)

Radio access bearer technology	Recommended encapsulation (if no ptime and no RTCP_APP_REQ_AGG has been received)	ptime	maxptime
Default	1 non-redundant speech frame per RTP packet Max 12 speech frames in total but not more than a received maxptime value requires	20	240
HSPA E-UTRAN NR	1 non-redundant speech frame per RTP packet Max 12 speech frames in total but not more than a received maxptime value requires	20	240
EGPRS	2 non-redundant speech frames per RTP packet, but not more than a received maxptime value requires Max 12 speech frames in total but not more than a received maxptime value requires	40	240
GIP	1 to 4 non-redundant speech frames per RTP packet but not more than a received maxptime value requires. Max 12 speech frames in total but not more than a received maxptime value requires	20, 40, 60 or 80	240

NOTE 4: It is possible to send only redundant speech frames in one RTP packet.

When the radio access bearer technology is not known to the MTSI client, the default encapsulation parameters defined in Table 7.1 shall be used.
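The encapsulation rules and Table 7.1 can be summarised in a short Python sketch. It is a simplified illustration: it assumes 20 ms speech frames, treats the packet length in ms as the total number of frames times 20 ms, and uses the smallest ptime listed for GIP. The function name, dictionary keys and argument names are hypothetical.

```python
FRAME_MS = 20          # speech frame duration assumed for AMR, AMR-WB and EVS
MAX_PRIMARY_FRAMES = 4  # upper bound on non-redundant frames per RTP packet

# (ptime, maxptime) in ms per Table 7.1; GIP uses its smallest listed ptime.
TABLE_7_1 = {
    "default": (20, 240),
    "hspa": (20, 240),
    "e-utran": (20, 240),
    "nr": (20, 240),
    "egprs": (40, 240),
    "gip": (20, 240),
}


def plan_packet(bearer=None, received_ptime=None, received_maxptime=None,
                redundant_frames=0):
    """Return (non-redundant frames, redundant frames) for one RTP packet."""
    ptime, maxptime = TABLE_7_1.get(bearer or "default", TABLE_7_1["default"])
    if received_ptime is not None:
        ptime = received_ptime        # a received ptime overrides the table
    if received_maxptime is not None:
        maxptime = received_maxptime
    max_frames = maxptime // FRAME_MS
    primary = min(ptime // FRAME_MS, MAX_PRIMARY_FRAMES, max_frames)
    # Redundant frames fill whatever room is left under maxptime.
    redundant = max(0, min(redundant_frames, max_frames - primary))
    return primary, redundant


unknown_bearer = plan_packet()
egprs = plan_packet("egprs")
constrained = plan_packet("gip", received_ptime=40, received_maxptime=60,
                          redundant_frames=5)
```

An unknown bearer name falls back to the default row, mirroring the rule that the default parameters apply when the radio access bearer technology is not known.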

When the AMR/AMR-WB payload formats are used, the bandwidth-efficient payload format should be used unless the session setup concludes that the octet-aligned payload format is the only payload format that all parties support. The SDP offer shall include an RTP payload type where octet-align=0 is defined or where octet-align is not specified and should include another RTP payload type with octet-align=1. MTSI client offering wide-band speech shall offer these parameters and parameter settings also for the RTP payload types used for wide-band speech.

For examples of SDP offers and answers, see annex A.

The RTP payload format for DTMF events is described in Annex G.

7.4.3 Video

The following RTP payload formats shall be used:

– H.264 (AVC) video codec RTP payload format according to RFC 6184 [25], where the interleaved packetization mode shall not be used. Receivers shall support both the single NAL unit packetization mode and the non‑interleaved packetization mode of RFC 6184 [25], and transmitters may use either one of these packetization modes.

– H.265 (HEVC) video codec RTP payload format according to [120].

7.4.4 Real-time text

The following RTP payload format shall be used:

– T.140 text conversation RTP payload format according to RFC 4103 [31].

Real-time text shall be the only payload type in its RTP stream because the RTP sequence numbers are used for loss detection and recovery. The redundant transmission format shall be used for keeping the effect of packet loss low.

Media type signalling for usage in SDP is specified in section 10 of RFC 4103 [31] and section 3 of RFC 4102 [49].

7.4.5 Coordination of Video Orientation

Coordination of Video Orientation (CVO) consists of signalling the current orientation of the image captured on the sender side to the receiver for appropriate rendering and displaying. When CVO is successfully negotiated it shall be signalled by the MTSI client. The signalling of the CVO uses RTP Header Extensions as specified in IETF RFC 5285 [95]. The one-byte form of the header should be used. CVO information for a 2 bit granularity of Rotation (corresponding to urn:3gpp:video-orientation) is carried as a byte formatted as follows:

Bit#	7	6	5	4	3	2	1	0 (LSB)
Definition	0	0	0	0	C	F	R1	R0

With the following definitions:

C = Camera: indicates the direction of the camera used for this video stream. It can be used by the MTSI client in receiver to e.g. display the received video differently depending on the source camera.

0: Front-facing camera, facing the user. If camera direction is unknown by the sending MTSI client in the terminal then this is the default value used.

1: Back-facing camera, facing away from the user.

F = Flip: indicates a horizontal (left-right flip) mirror operation on the video as sent on the link.

0: No flip operation. If the sending MTSI client in terminal does not know if a horizontal mirror operation is necessary, then this is the default value used.

1: Horizontal flip operation

R1, R0 = Rotation: indicates the rotation of the video as transmitted on the link. The receiver should rotate the video to compensate that rotation. E.g. a 90° Counter Clockwise rotation should be compensated by the receiver with a 90° Clockwise rotation prior to displaying.

Table 7.2: Rotation signalling for 2 bit granularity

R1	R0	Rotation of the video as sent on the link	Rotation on the receiver before display
0	0	0° rotation	None
0	1	90° Counter Clockwise (CCW) rotation or 270° Clockwise (CW) rotation	90° CW rotation
1	0	180° CCW rotation or 180° CW rotation	180° CW rotation
1	1	270° CCW rotation or 90° CW rotation	90° CCW rotation
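The 2 bit CVO byte and Table 7.2 can be sketched in Python as follows. This is an illustrative encoder and decoder only; the function names and the dictionary keys of the decoded result are hypothetical.

```python
# R1 R0 code -> counter clockwise rotation of the video as sent, per Table 7.2.
ROTATION_CCW_DEG = {0b00: 0, 0b01: 90, 0b10: 180, 0b11: 270}
CODE_FOR_CCW_DEG = {deg: code for code, deg in ROTATION_CCW_DEG.items()}


def encode_cvo2(rotation_ccw_deg: int, back_camera: bool = False,
                flipped: bool = False) -> int:
    """Build the byte 0 0 0 0 C F R1 R0 for 2 bit granularity."""
    code = CODE_FOR_CCW_DEG[rotation_ccw_deg % 360]
    return (int(back_camera) << 3) | (int(flipped) << 2) | code


def decode_cvo2(byte: int) -> dict:
    """Extract camera, flip and rotation from a 2 bit granularity CVO byte."""
    sent_ccw = ROTATION_CCW_DEG[byte & 0x03]
    return {
        "back_camera": bool(byte & 0x08),
        "flipped": bool(byte & 0x04),
        "sent_rotation_ccw_deg": sent_ccw,
        # The receiver compensates with a clockwise rotation of equal size.
        "receiver_rotation_cw_deg": sent_ccw,
    }


portrait = encode_cvo2(90)
decoded = decode_cvo2(0x0B)
```

A clockwise compensation of 270° is the same operation as the 90° CCW rotation listed in the last row of Table 7.2.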

CVO information for a higher granularity of Rotation (corresponding to urn:3GPP:video-orientation:6) is carried as a byte formatted as follows:

Bit#	7	6	5	4	3	2	1	0 (LSB)
Definition	R5	R4	R3	R2	C	F	R1	R0

where C and F are as defined above and the bits R5,R4,R3,R2,R1,R0 represent the Rotation, which indicates the rotation of the video as transmitted on the link. Table 7.3 describes the rotation to be applied by the receiver based on the rotation bits.

Table 7.3: Rotation signalling for 6 bit granularity

R1	R0	R5	R4	R3	R2	Rotation of the video as sent on the link	Rotation on the receiver before display
0	0	0	0	0	0	0° rotation	None
0	0	0	0	0	1	(360/64)° Counter Clockwise (CCW) rotation	(360/64)° CW rotation
0	0	0	0	1	0	(2*360/64)° CCW rotation	(2*360/64)° CW rotation
.	.	.	.	.	.	.	.
.	.	.	.	.	.	.	.
.	.	.	.	.	.	.	.
1	1	1	1	1	0	(62*360/64)° CCW rotation	(2*360/64)° CCW rotation
1	1	1	1	1	1	(63*360/64)° CCW rotation	(360/64)° CCW rotation
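The 6 bit rotation value is split across the byte, which the following Python sketch makes explicit. It assumes, based on the column order of Table 7.3 (R1 R0 R5 R4 R3 R2), that R1 and R0 are the two most significant bits of the rotation index and R5 to R2 the four least significant bits, so that a receiver reading only R1 and R0 still obtains the coarse quadrant. The function names are hypothetical.

```python
STEP_DEG = 360 / 64  # rotation step for 6 bit granularity


def encode_cvo6(index: int, back_camera: bool = False,
                flipped: bool = False) -> int:
    """Build the byte R5 R4 R3 R2 C F R1 R0 from a rotation index 0..63."""
    if not 0 <= index < 64:
        raise ValueError("rotation index must be in 0..63")
    coarse = index >> 4    # R1 R0: most significant bits (assumed ordering)
    fine = index & 0x0F    # R5 R4 R3 R2: least significant bits
    return (fine << 4) | (int(back_camera) << 3) | (int(flipped) << 2) | coarse


def decode_cvo6(byte: int):
    """Return (rotation index, CCW rotation in degrees as sent on the link)."""
    coarse = byte & 0x03
    fine = (byte >> 4) & 0x0F
    index = (coarse << 4) | fine
    return index, index * STEP_DEG


one_step = encode_cvo6(1)
quarter_turn = encode_cvo6(16)
```

Under this reading, index 16 (a 90° CCW rotation) produces the same R1 R0 bits as the 90° CCW entry of the 2 bit format.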

The sending MTSI client in the terminal using a camera as source and equipped with appropriate orientation sensor(s) should compute the image orientation from the sensor(s) that indicate the rotation of the device with respect to the default camera orientation. It is recommended that appropriate filtering on the time and angular domain is applied onto the sensor’s indications to prevent a "ping-pong" effect between two quantization levels in the case where the measured value is fluctuating between two quantization levels. The sending MTSI client may choose to send any orientation information not necessarily based on orientation sensor(s).

For higher granularity CVO, a terminal shall send a report at least as frequently as it would have sent a 2-bit report. A report interval shorter than this requirement should only be used when the report contains a value that differs significantly from the previous report, i.e. after taking noise removal, sensor precision, and any other relevant factors into account.

The rotation is a quantized value of the angle between the earth vertical projected onto the plane of the image as sent on the link and the image vertical. The earth vertical is a radial line starting at the center of the earth and passing through the depicted scene while the image vertical is a line passing from the middle of the bottom to the middle of the top of the image. For the case where the camera is pointing vertical or nearly vertical, the last valid value used for rotation should be used. In case there is no previous valid value, a suitable default value should be chosen.

When compensating for both rotation and flip at the receiving MTSI client, the operations shall be performed in the order of rotation compensation followed by flipping, because the order of flip and rotation operations matters when rotating 90° or 270°. The sending MTSI client shall correspondingly, when the transmitted image is both flipped and rotated, include information in the RTP Header Extension as if the transmitted image on the link was first flipped (mirrored) and then rotated, using an image perceived as upright (regardless if using portrait or landscape format) as starting point.
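The importance of the operation order can be checked with a toy image. The Python sketch below represents an image as a list of pixel rows; the helper names are hypothetical and the example only illustrates the ordering rule for a 90° rotation.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_cw(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_ccw(img):
    """Rotate the image by 90 degrees counter clockwise."""
    return [list(col) for col in zip(*img)][::-1]


upright = [[1, 2, 3],
           [4, 5, 6]]

# Sender side: described as first flipped, then rotated 90 degrees CCW.
sent = rotate_ccw(flip_horizontal(upright))

# Receiver side: rotation compensation first, then flipping.
restored = flip_horizontal(rotate_cw(sent))

# Reversed order at the receiver: flipping first, then rotation compensation.
wrong_order = rotate_cw(flip_horizontal(sent))
```

Here `restored` equals the upright image, while `wrong_order` is the upright image rotated by 180°.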

The MTSI client shall add the payload bytes as defined in this clause onto the last RTP packet in each group of packets which make up a key frame (I-frame or IDR frame in H.264 (AVC), or an IRAP picture in H.265 (HEVC)). The MTSI client may also add the payload bytes onto the last RTP packet in each group of packets which make up another type of frame (e.g. a P-Frame) only if the current value is different from the previous value sent.

If this is the only header extension present, a total of 8 bytes are appended to the RTP header, and the last packet in the sequence of RTP packets will be marked with both the marker bit and the Extension bit, as defined in RFC 3550 [9].
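The 8 byte figure can be reproduced with a short Python sketch of the one-byte header form of RFC 5285: a 16 bit 0xBEDE marker, a 16 bit length in 32 bit words, one extension element (4 bit ID, 4 bit length minus one, one data byte) and two padding bytes. The function name is hypothetical, and the extension ID 4 is an arbitrary example of a value that would be assigned during SDP negotiation.

```python
import struct

ONE_BYTE_FORM_MARKER = 0xBEDE  # RFC 5285 one-byte header form


def build_cvo_header_extension(ext_id: int, cvo_byte: int) -> bytes:
    """Build an RTP header extension block carrying a single CVO byte."""
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte form allows extension IDs 1..14")
    # Length field 0 means one byte of extension data follows.
    element = bytes([(ext_id << 4) | 0, cvo_byte & 0xFF])
    # Pad with zero bytes to a 32 bit boundary.
    padded = element + b"\x00" * (-len(element) % 4)
    return struct.pack("!HH", ONE_BYTE_FORM_MARKER, len(padded) // 4) + padded


# Hypothetical extension ID 4, CVO byte for a 90 degree CCW rotation.
extension = build_cvo_header_extension(4, 0x01)
```

The resulting block is 4 bytes of extension header plus 4 bytes of element and padding, giving the 8 bytes mentioned above.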

When CVO is not successfully negotiated the MTSI clients are said to be in non-CVO operation. The sender in non-CVO operation should operate as follows to compensate for image rotation and potential misalignment.

If the receiver has explicitly indicated support for both [x,y] and [y,x] resolutions via the imageattr attribute during SDP negotiation (see clause 6.2.3.3 and an example in clause A.4.6), and when video is negotiated for the session, the sender should rotate the image prior to video encoding and compensate image rotation by changing the signaled Sequence Parameter Set in the video bitstream between [x,y] and [y,x] as applicable.

If the receiver has not explicitly indicated support for both [x,y] and [y,x] resolutions via the imageattr attribute during SDP negotiation, then the sender should apply rotation/padding/cropping/resizing prior to video encoding as the sender considers appropriate while keeping the resolution unchanged. As for CVO operation, the sending MTSI client in the terminal using a camera as source and equipped with appropriate orientation sensor(s) should compute the image orientation from the output of the sensor(s) that indicates the rotation of the device with respect to the default camera orientation. It is recommended that appropriate filtering on the time and angular domain is applied onto the sensor’s indications to prevent a "ping-pong" effect in the case where the measured value is fluctuating between two quantization levels. The decision of an MTSI client transmitting video to change the image size need not necessarily be based on input from orientation sensor(s).

7.4.6 RTP Retransmission

AVPF NACK messages are used by MTSI clients to indicate non-received RTP packets for video (see clause 7.3.3). The RTP Retransmission Payload Format RFC 4588 [140] supports retransmission of lost packets based on NACK feedback. Retransmission is useful if retransmitted packets arrive within the end-to-end delay requirements of the system. It is suitable for low RTT networks with relatively low observed packet loss [142]. If support for RTP retransmission payload format has been negotiated, the receivers shall support handling of RTP retransmission packets defined in RFC 4588 sent using SSRC multiplexing. Similarly, senders shall use RTP retransmission packets defined in RFC 4588, sent using SSRC multiplexing, for the packets they retransmit.

7.4.7 Forward Error Correction (FEC)

Forward Error Correction (FEC) can provide effective error resiliency under certain packet loss and network RTT conditions [142]. If support for FEC is negotiated, then use of a separate SSRC multiplexed FEC stream with the RTP payload defined in [141] shall be supported at both the receiver and the sender. The receiver can demultiplex the incoming stream by the SSRC field and map it to the source by using the ssrc-group mechanism defined in RFC 5956 [143]. The systematic FEC scheme defined in [141] is a flexible parity FEC scheme that supports various signalling of source packets used to generate the parity packets.

Other types of FEC schemes may be supported. The use of a particular FEC scheme shall be negotiated before it is used.

7.4.8 Still Images

The RTP payload format for HEVC as defined in [120] shall be used for the delivery of images and image sequences.

NOTE 1: The time distance between RTP timestamps for HEVC encoded images/image sequences may vary widely and may be very long compared to typical HEVC encoded video, in the order of several seconds.

7.5 Media flow

7.5.1 General

This clause contains considerations on how to use media in RTP, packetization guidelines, and other transport considerations. The use of ECN for RTP sessions is also described for speech in this clause.

The general handling of bitrate variations is described in clause 7.5.5.1. Media specific handling is described in clause 7.5.5.2 for video.

7.5.2 Media specific

7.5.2.1 Speech

7.5.2.1.1 General

This clause describes how the speech media should be packetized during a session. It includes definitions both for the cases where the access type is known and one default operation for the case when the access type is not known.

Requirements for transmission of DTMF events are described in Annex G.

7.5.2.1.2 Default operation

7.5.2.1.2.1 General

When the radio access bearer technology is not known to the MTSI client, the default encapsulation parameters defined in Table 7.1 shall be used.

The codec modes and the other codec parameters (mode-change-capability, mode-change-period, mode-change-neighbor, etc), applicable for each session, are negotiated as described in clauses 6.2.2.2 and 6.2.2.3.

When transmitting AMR or AMR-WB encoded media, codec mode changes should be aligned to every other frame border and should be performed to one of the neighbouring codec modes in the negotiated mode set, except for a MTSI media gateway, see clause 12.3.1.1. In the transmitted media, the highest codec mode of the negotiated mode-set (or of all modes, if no mode-set was included in the SDP answer) should be used, unless it is restricted by the most recently received CMR. In the received media, codec mode changes shall be accepted at any frame border and to any codec mode within the negotiated mode set.

The bandwidth-efficient payload format should be used for AMR and AMR-WB encoded media unless the session setup determines that the octet-aligned payload format must be used.

The adaptation of codec mode, aggregation and redundancy is defined in clause 10.2.

7.5.2.1.2.2 Codec Mode Requests

For AMR and AMR-WB, if the highest mode within the negotiated mode-set is acceptable for media reception, then the MTSI client in terminal shall either indicate that no codec mode request is present (i.e. value 15) or shall indicate the CMR value corresponding to the highest mode within the negotiated mode-set in the CMR bits in the AMR and/or AMR-WB payload format [28] in every outgoing RTP packet. Otherwise the highest acceptable mode within the negotiated mode-set shall be sent in CMR in each outgoing RTP packet.

NOTE 1: The MTSI client sending CMR values relies on the remote media-sender not sending media with higher codec modes than requested by CMR, after some reaction time (round trip delay). However, the remote party can send with lower modes within the negotiated mode-set, because there could be other mode limiting effects in the voice path.

For AMR and AMR-WB, the MTSI client shall accept that the remote party sends with lower modes within the negotiated mode-set than requested by the CMR. The MTSI client shall follow each received CMR and shall not use higher modes in media-encoding than indicated by the most recently received CMR, while lower modes within the negotiated mode-set are allowed any time.
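The CMR rules for AMR and AMR-WB can be sketched in Python as follows. Codec modes are represented by their mode indices, with a higher index meaning a higher bit rate. The function names are hypothetical, and falling back to the lowest negotiated mode when a received CMR lies below the whole mode-set is an assumption made for this example.

```python
NO_MODE_REQUEST = 15  # CMR value meaning that no codec mode request is present


def outgoing_cmr(mode_set, highest_acceptable: int) -> int:
    """CMR value to place in every outgoing RTP packet."""
    if highest_acceptable >= max(mode_set):
        # Highest negotiated mode is acceptable; max(mode_set) is also allowed.
        return NO_MODE_REQUEST
    acceptable = [m for m in mode_set if m <= highest_acceptable]
    # Assumption: request the lowest negotiated mode if none is acceptable.
    return max(acceptable) if acceptable else min(mode_set)


def highest_allowed_mode(mode_set, received_cmr: int) -> int:
    """Highest mode the encoder may use after the most recently received CMR."""
    if received_cmr == NO_MODE_REQUEST:
        return max(mode_set)
    allowed = [m for m in mode_set if m <= received_cmr]
    # Assumption: use the lowest negotiated mode if the CMR is below the set.
    return max(allowed) if allowed else min(mode_set)


mode_set = [0, 2, 4, 7]
cmr_when_all_ok = outgoing_cmr(mode_set, 7)
cmr_when_limited = outgoing_cmr(mode_set, 4)
encoder_ceiling = highest_allowed_mode(mode_set, 4)
```

The encoder may still choose any negotiated mode below the returned ceiling, since lower modes within the mode-set are allowed at any time.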

NOTE 2: The codec modes in media-sending and media-receiving direction may differ in general. Received CMR values have no influence on or relation to received media-frames.

The MTSI client shall accept Codec Mode Requests signalled with the CMR bits in the AMR and/or AMR-WB payload format in every incoming RTP packet.

For EVS, the CMR related procedures in subclause A.2.2.1.1 of TS 26.445 [125] apply.

If the MTSI client supports RTCP-APP packets, it shall also accept CMR in every incoming RTCP-APP packet.

The MTSI client shall follow each received CMR as soon as possible.

NOTE 3: There is no upper limit defined for the reaction time; it is expected that typically the media-sender reacts within less than 40ms after the reception of a new CMR value.

7.5.2.1.2.3 Frame aggregation and redundancy

The MTSI client should send one speech frame encapsulated in each RTP packet unless the session setup or adaptation request defines that the other MTSI client wants to receive another encapsulation variant.

The MTSI client should request to receive one speech frame encapsulated in each RTP packet but shall accept any number of frames per RTP packet up to the maximum limit of 12 speech frames per RTP packet.

For application-layer redundancy, see clause 9.2.

7.5.2.1.3 HSPA

Use default operation as defined in clause 7.5.2.1.2.

NOTE: The RLC PDU sizes defined in TR 25.993 [33] have been optimized for the codec modes, payload formats and frame encapsulations defined in the default operation in clause 7.5.2.1.2.

7.5.2.1.4 EGPRS

Use default operation as defined in clause 7.5.2.1.2, except that the MTSI client in terminal:

– should send two speech frames encapsulated in each RTP packet unless the session setup or adaptation request defines that the other PS end-point wants to receive another encapsulation variant;

– should request receiving two speech frames encapsulated in each RTP packet but shall accept any number of frames per RTP packet up to the maximum limit of 12 speech frames per RTP packet.

7.5.2.1.5 GIP

Use default operation as defined in clause 7.5.2.1.2, except that the MTSI client in terminal:

– should send 0, 1, 2, 3 or 4 non-redundant speech frames encapsulated in each RTP packet unless the session setup or adaptation request defines that the other PS end-point wants to receive another encapsulation variant;

– should request receiving 1 to 4 speech frames in each RTP packet but shall accept any number of frames per RTP packet up to the maximum limit of 12 speech frames per RTP packet;

– may use application layer redundancy, in which case the MTSI client in terminal may encapsulate up to 12 speech frames in each RTP packet, with a maximum of four non-redundant speech frames.

7.5.2.1.6 Initial codec mode for AMR and AMR-WB

To avoid congestion on the link and to improve inter-working with CS GERAN when AMR or AMR-WB is used and when more than one codec mode is allowed in the session, the MTSI client in terminal should limit the initial codec mode (ICM) to one of the lowest codec modes for an Initial Waiting Time from the beginning of the RTP stream, or until it receives one of the following:

– a frame-block with rate control information; or:

– an RTCP message with rate control information; or:

– reception quality feedback information, e.g. PLR or jitter in RTCP Sender Reports or Receiver Reports, indicating that the currently used codec mode is too high for the current operating condition.

The value for the Initial Waiting Time is 600 ms when ECN is not used and 500 ms when ECN is used, unless configured differently by the MTSI Media Adaptation Management as described in Clause 17.

The rate control information can either be: a CMR with a value other than ‘15’ in the RTP payload; or a CMR with a value other than ‘15’ in an RTCP_APP message (see Clause 10.2.1).

NOTE 1: A CMR with a value of ‘15’ means that no mode request is present [28].

If no rate control information is received within the Initial Waiting Time, then the sending MTSI client in terminal should gradually increase the codec mode from the ICM towards the highest codec mode allowed in the session. As long as it neither detects poor transmission performance nor receives rate control information, the sending MTSI client in terminal should use step-wise up-switch to avoid introducing congestion during the upwards adaptation. The step-wise up-switch should be performed by switching to the next higher codec mode in the allowed mode set and then waiting for an Initial Up-switch Waiting Time before each subsequent up-switch until the first down-switch occurs.

The value for the Initial Up-switch Waiting Time is 600 ms when ECN is not used and 500 ms when ECN is used, unless configured differently by the MTSI Media Adaptation Management as described in Clause 17.
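The timing of the step-wise up-switch can be illustrated with a minimal sketch. It assumes that no rate control information arrives and no poor transmission performance is detected, so no down-switch interrupts the sequence. The function name `upswitch_schedule` and its parameters are hypothetical and introduced only for this illustration; codec modes are represented by their bit-rates in kbps.

```python
def upswitch_schedule(mode_set, icm, ecn_used,
                      initial_wait_ms=None, upswitch_wait_ms=None):
    """Return (time_ms, codec_mode) pairs for an uninterrupted up-switch.

    Assumption: nothing triggers a down-switch, so the sender climbs from
    the ICM to the highest allowed mode one step at a time.
    """
    # Default waiting times from this clause: 500 ms with ECN, 600 ms without.
    default_ms = 500 if ecn_used else 600
    if initial_wait_ms is None:
        initial_wait_ms = default_ms
    if upswitch_wait_ms is None:
        upswitch_wait_ms = default_ms

    modes = sorted(set(mode_set))
    start = modes.index(icm)          # raises ValueError if ICM not allowed

    schedule = [(0, icm)]             # the stream starts at the ICM
    t = initial_wait_ms               # first up-switch after the Initial Waiting Time
    for mode in modes[start + 1:]:
        schedule.append((t, mode))
        t += upswitch_wait_ms         # wait before each subsequent up-switch
    return schedule


if __name__ == "__main__":
    for t_ms, mode in upswitch_schedule([4.75, 5.9, 7.4, 12.2], 5.9, ecn_used=False):
        print(t_ms, "ms ->", mode, "kbps")
```

The optional parameters stand in for values configured by the MTSI Media Adaptation Management; when they are omitted, the defaults stated in this clause apply.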

The following rules can be used for determining the ICM:

– If 1 codec mode is included in the mode-set then this should be the ICM.

– If 2 or 3 codec modes are included in the mode-set then the ICM should be the codec mode with the lowest rate.

– If 4 or more codec modes are included in the mode-set then the ICM should be the codec mode with the 2nd lowest rate.
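The three rules above can be sketched as a small helper. The function name `select_icm` is hypothetical, and the mode-set is assumed to be given as a list of codec mode bit-rates in kbps.

```python
def select_icm(mode_set):
    """Pick the initial codec mode (ICM) from a mode-set.

    mode_set: iterable of codec mode bit-rates in kbps (an assumed
    representation; SDP carries mode indices rather than rates).
    """
    modes = sorted(set(mode_set))
    if not modes:
        raise ValueError("mode-set must contain at least one codec mode")
    if len(modes) >= 4:
        return modes[1]   # 4 or more modes: second-lowest rate
    return modes[0]       # 1 mode: that mode; 2 or 3 modes: lowest rate


if __name__ == "__main__":
    print(select_icm([12.2]))                    # single mode
    print(select_icm([4.75, 12.2]))              # two modes
    print(select_icm([4.75, 5.9, 7.4, 12.2]))    # four modes
```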

NOTE 2: Without ECN, the Initial Waiting Time needs to be long enough to allow the receiver to collect reliable statistics for the adaptation, e.g. for PLR-triggered or jitter-triggered adaptation. With ECN, a congested network can immediately mark IP packets with ECN-CE, which allows the ECN-triggered adaptation to react sooner. The Initial Waiting Time can therefore be shorter when ECN is used. The same applies for the Initial Up-switch Waiting Time.

7.5.2.1.7 E-UTRAN and NR

Use the default operation as defined in Clause 7.5.2.1.2.

7.5.2.1.8 Initial codec mode for EVS

When the EVS AMR-WB IO mode is used from the start of the session, the Initial Codec Mode (ICM) should be selected as defined in Clause 7.5.2.1.6 for AMR-WB.

When EVS Primary mode is used from the start of the session, the following principles apply for the selection of the Initial Codec Mode bit-rate (ICMbr):

– If GBR is known and if GBR is less than MBR, the ICMbr should be aligned with the GBR or should be lower than GBR.

When EVS Primary mode is used from the start of the session, the Initial Codec Mode audio bandwidth (ICMab) should be the highest audio bandwidth negotiated for the Initial Codec Mode bit-rate (ICMbr).

7.5.2.1.9 Dual-mono

An MTSI client may support dual-mono operation for EVS.

The packetization of dual-mono for EVS is described in [125]. When the EVS Primary mode is used for dual-mono encoding, the Header-full format must be used for all RTP packets.

When offering dual-mono for an RTP payload type number, the number of channels is set to 2, see SDP example in Annex A.14.

7.5.2.2 Video

An MTSI client should follow general strategies for error-resilient coding (segmentation) and packetization as specified by each codec [24][119] and RTP payload format [25][120] specification. Further guidelines on how the video media data should be packetized during a session are provided in this clause.

Coded pictures should be encoded into individual segments:

– For H.264 (AVC), a slice corresponds to such a segment.

– For H.265 (HEVC), a slice segment corresponds to such a segment.

Each individual segment should be encapsulated in one RTP packet. Each RTP packet should be smaller than the Maximum Transfer Unit (MTU) size.

NOTE 1: Unnecessary video segmentation, e.g. within RTP packets, may reduce coding efficiency.

NOTE 2: RTP packet fragmentation, e.g. across UDP boundaries, may decrease transport overhead and reduce error robustness. Hence, packet size granularity is a trade-off between error robustness and overhead that may be tuned according to bearer access characteristics if available.

NOTE 3: In most cases, the MTU-size has a direct relationship with the bearer of the radio network.

7.5.2.3 Text

Real-time text is intended for human conversation applications. Text shall not be transferred at a rate higher than 30 characters per second (as defined for cps in section 6 of RFC 4103 [31]). A text-capable MTSI client shall be able to receive text with cps set up to 30.
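One way a sender could enforce the character rate limit is a sliding-window budget, sketched below. The class name `TextRateLimiter` and the averaging interval `interval_s` are assumptions of this sketch; RFC 4103 [31] gives the normative definition of how cps is measured.

```python
from collections import deque


class TextRateLimiter:
    """Sliding-window limiter for real-time text (illustrative only)."""

    def __init__(self, cps=30, interval_s=10.0):
        # interval_s is an assumed averaging interval, not taken from this clause.
        self.interval_s = interval_s
        self.budget = int(cps * interval_s)   # characters allowed per interval
        self.sent = deque()                   # (timestamp_s, n_chars) entries
        self.in_window = 0

    def allow(self, now, n_chars):
        """Return True and record the send if n_chars fit in the budget."""
        # Drop entries that have left the averaging interval.
        while self.sent and now - self.sent[0][0] >= self.interval_s:
            _, n = self.sent.popleft()
            self.in_window -= n
        if self.in_window + n_chars > self.budget:
            return False                      # caller should buffer and retry later
        self.sent.append((now, n_chars))
        self.in_window += n_chars
        return True
```

A caller that receives `False` would keep the characters buffered and retry on the next transmission opportunity rather than discard them.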

7.5.3 Media synchronization

7.5.3.1 General

RTCP SR shall be used for media synchronization by setting the NTP and RTP timestamps according to RFC 3550 [9]. To enable quick media synchronization when a new media component is added, or an MTSI session is initiated, the RTP sender should send RTCP Sender Reports for all newly started media components as early as possible.
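As an illustration of how a receiver can use the NTP and RTP timestamp pair from a Sender Report, the sketch below maps an RTP timestamp to the sender's wallclock time. The function name `rtp_to_wallclock` is hypothetical, and for simplicity the 64-bit NTP timestamp is assumed to have already been converted to seconds as a floating-point value.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map an RTP timestamp to sender wallclock time using the latest SR.

    rtp_ts, sr_rtp_ts: 32-bit RTP timestamps (packet and Sender Report).
    sr_ntp_seconds:    NTP time from the same SR, in seconds (assumed float).
    clock_rate:        RTP clock rate of the stream in Hz.
    """
    # Signed 32-bit difference so that timestamp wrap-around is handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


if __name__ == "__main__":
    # Example values are made up: a 16 kHz speech stream and a 90 kHz video stream.
    speech = rtp_to_wallclock(176000, 160000, 1000.0, 16000)
    video = rtp_to_wallclock(945000, 900000, 1000.5, 90000)
    print("skew between streams (s):", video - speech)
```

Two media units whose computed wallclock times are equal were sampled at the same instant and can be rendered together. The receiver cannot perform this mapping until it has one SR per stream, which is why early Sender Reports speed up synchronization.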

NOTE: An MTSI sender can signal in SDP that no synchronization between media components is required. See clause 6.2.6 and clause A.7.

7.5.3.2 Text

The media synchronization requirements for real-time text are relaxed. A synchronization error between text and other media of a maximum of 3 seconds is accepted. Since this is longer than the maximum accepted latency, no specific methods need to be applied to ensure that the requirement is met.

7.5.4 ECN usage in RTP sessions

Once the ECN negotiation has been completed as defined in [84], only ECT(0) shall be used when marking packets with ECT [83]. When ECN is used for an RTP stream, the sending MTSI client shall mark every packet with ECT until the end of the session or until the session is re-negotiated to no longer use ECN. The leap-of-faith method is used for the ECN initiation.
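The ECT(0) marking can be illustrated at the bit level. The ECN field occupies the two least-significant bits of the IPv4 TOS octet (or IPv6 Traffic Class octet), and ECT(0) is the codepoint binary 10. The helper names below are hypothetical and the sketch only manipulates the octet value; how the value is applied to outgoing packets is platform-dependent.

```python
# ECN codepoints in the two least-significant bits of the TOS / Traffic Class octet.
ECN_NOT_ECT = 0b00
ECN_ECT_1 = 0b01
ECN_ECT_0 = 0b10
ECN_CE = 0b11
ECN_MASK = 0b11


def mark_ect0(tos):
    """Return the TOS octet with the ECN field set to ECT(0), DSCP unchanged."""
    return (tos & ~ECN_MASK & 0xFF) | ECN_ECT_0


def is_ce(tos):
    """Return True if a received TOS octet carries the ECN-CE codepoint."""
    return (tos & ECN_MASK) == ECN_CE


if __name__ == "__main__":
    dscp_ef = 46 << 2                  # example DSCP value (EF), an assumption
    print(hex(mark_ect0(dscp_ef)))     # DSCP bits preserved, ECN field = ECT(0)
```

On many platforms the resulting octet can be applied to a UDP socket through the `IP_TOS` socket option, although some operating systems restrict or ignore writes to the ECN bits.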

Handling of ECN Congestion Experienced (ECN-CE) marked packets is described in clause 10.

7.5.5 Handling of bit-rate variations

7.5.5.1 General

An MTSI client in terminal using variable bitrate encoding shall ensure for speech, and should ensure for video, that the transmitted bandwidth does not exceed the negotiated bandwidth (b=AS) nor the QoS bandwidth parameters (MBR, GBR) for the bearer, if defined and known. This can, in general, be done in two ways: either by controlling the bit-rate used for encoding each media frame or by controlling when the generated packets are transmitted.

NOTE 1: This bit-rate control is not to be confused with rate adaptation, which is described in clause 10. The main purpose of the bitrate control is to reduce the risk that policing functions in the network drop media packets because the bandwidth of the packets temporarily exceeds the allowed bandwidth, which is typically configured in a static fashion. Rate adaptation is instead used to ensure that the used bandwidth is sufficiently low to avoid packet losses and extended delays when operating conditions change.

NOTE 2: Controlling when packets are transmitted is called "packet pacing". The drawback with packet pacing is the increase of end-to-end delay. Excessive use of packet pacing can lead to not fulfilling the service requirements defined in TS 22.105 [34]. Packet pacing therefore needs to be used with care.

The method to control the encoding bitrate and/or the transmission of packets is left to the discretion of the implementation. However, the average bandwidth of transmitted packets shall be calculated over a sliding window that is no longer than T seconds. Different media types may use different window lengths. The default window length T is 2 seconds if nothing else is specified below.

The bitrate control ensures that the average bandwidth of transmitted media packets does not exceed the maximum allowed bandwidth in the sending direction.

The maximum allowed bandwidth for the sending direction is the smaller of:

– The b=AS bandwidth,

– The MBR for the bearer, after compensating for the RTCP bandwidth, if known.
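The limit and the sliding-window average described in this clause can be sketched as follows. The names `max_allowed_bandwidth_kbps` and `SlidingWindowMeter` are hypothetical. The sketch assumes that "compensating for the RTCP bandwidth" means subtracting the RTCP bandwidth from the MBR, and that packet sizes are counted including IP/UDP/RTP headers so that they are comparable with b=AS.

```python
from collections import deque


def max_allowed_bandwidth_kbps(b_as_kbps, mbr_kbps=None, rtcp_kbps=0.0):
    """Smaller of b=AS and the RTCP-compensated MBR (if the MBR is known)."""
    if mbr_kbps is None:
        return b_as_kbps
    # Assumption: compensation = MBR minus the bandwidth reserved for RTCP.
    return min(b_as_kbps, mbr_kbps - rtcp_kbps)


class SlidingWindowMeter:
    """Average bandwidth of transmitted packets over the last window_s seconds."""

    def __init__(self, window_s=2.0):       # default window length T = 2 s
        self.window_s = window_s
        self.packets = deque()              # (send_time_s, size_bits)
        self.bits = 0

    def _expire(self, now):
        while self.packets and now - self.packets[0][0] >= self.window_s:
            _, size_bits = self.packets.popleft()
            self.bits -= size_bits

    def record(self, now, size_bytes):
        """Register a transmitted packet (size assumed to include headers)."""
        self._expire(now)
        self.packets.append((now, size_bytes * 8))
        self.bits += size_bytes * 8

    def average_kbps(self, now):
        self._expire(now)
        return self.bits / self.window_s / 1000.0

    def would_exceed(self, now, size_bytes, limit_kbps):
        """True if sending size_bytes now would push the average over the limit."""
        self._expire(now)
        return (self.bits + size_bytes * 8) / self.window_s / 1000.0 > limit_kbps
```

A sender could call `would_exceed` before each transmission and then either lower the encoding bit-rate of the next frame or delay the packet (packet pacing), which are the two options named in this clause.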

7.5.5.2 Video

Video encoders use variable bitrate encoding to ensure a sufficiently low average bit-rate, which allows temporarily encoding a few frames at high bit-rate to match the video content with high motion activity or complex scenes, or to send an Intra frame, e.g., when a Full Intra Request (FIR) is received. The bitrate control should ensure that the average bandwidth of transmitted media packets does not exceed the maximum allowed bandwidth in the sending direction.

The bit-rate control requires managing the following properties:

– The size of the large frame to transmit;

– The proportion of neighboring frames that can be reduced in size;

– The length of the averaging window.

When any two of the properties are known, it is possible to determine the third.
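The relationship between the three properties can be illustrated with a simplified bookkeeping model. This is an assumption made for illustration and is not the method of TR 26.924 [144] Annex A: it assumes a constant frame rate, a nominal frame size equal to the average bit-rate divided by the frame rate, one large frame per window, and a uniform fractional reduction applied to all other frames in the window. All function and parameter names are hypothetical.

```python
def _nominal_bits(avg_kbps, fps):
    """Nominal frame size in bits under the assumed constant-rate model."""
    return avg_kbps * 1000.0 / fps


def reduction_proportion(large_bits, avg_kbps, fps, window_s):
    """Fraction by which each neighbouring frame must shrink."""
    nominal = _nominal_bits(avg_kbps, fps)
    others = fps * window_s - 1            # frames in the window besides the large one
    return (large_bits - nominal) / (others * nominal)


def window_length(large_bits, avg_kbps, fps, proportion):
    """Averaging window (s) needed to absorb the large frame."""
    nominal = _nominal_bits(avg_kbps, fps)
    return ((large_bits - nominal) / (proportion * nominal) + 1) / fps


def large_frame_bits(avg_kbps, fps, window_s, proportion):
    """Largest frame (bits) that fits in the window at the given reduction."""
    nominal = _nominal_bits(avg_kbps, fps)
    return nominal * (1 + proportion * (fps * window_s - 1))


if __name__ == "__main__":
    # Example values are made up: 300 kbps, 15 fps, 2 s window, 50 % reduction.
    print(large_frame_bits(300, 15, 2.0, 0.5))
```

In this model the bits saved on the neighbouring frames equal the excess of the large frame over the nominal size, which is why fixing any two of the three quantities determines the third.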

TR 26.924 [144] Annex A describes a method for determining the length of the averaging window from the other two properties. The same method can also be used to determine the portion of the bit-rate for frames neighboring the large frame, which needs to be reduced depending on the size of the frame and the length of the averaging window.

The recommended procedure consists of reducing the encoding bit-rate of the frames neighboring the large frame in a balanced fashion, taking into account the spatio-temporal tradeoff of video quality when the encoding parameters such as quantization step or frame rate are adjusted. This may even require dropping some frames before they are encoded.

When a FIR is received, the MTSI client in terminal may delay the generation and transmission of an Intra frame by up to a half of the averaging window.

7.5.5.3 Speech

When speech is operated in a Source-Controlled Variable Bit Rate mode (e.g., EVS 5.9VBR), both the GBR and MBR need to be set at least as high as the highest rate used by the codec in this mode (e.g., 8.0 kbps for EVS 5.9VBR). These rates may be set higher for a session if, in addition to the VBR mode, a higher rate codec mode is also negotiated for the session. Therefore, packet policing on the MBR or GBR will not prevent the transport of VBR media packets regardless of the averaging window used.