5 Media codecs


5.1 Media components

The Multimedia Telephony Service for IMS supports simultaneous transfer of multiple media components with real-time characteristics. Media components denote the actual components that the end-user experiences.

The following media components are considered as core components. Multiple media components (including media components of the same media type) may be present in a session. At least one of the first three of these components is present in all conversational multimedia telephony sessions.

– Speech: The sound that is picked up by a microphone and transferred from terminal A to terminal B and played out in an earphone/loudspeaker. Speech includes detection, transport and generation of DTMF events.

– Video: The moving image that is, for example, captured by a camera of terminal A, transmitted to terminal B and, for example, rendered on the display of terminal B.

– Text: The characters typed on a keyboard or drawn on a screen on terminal A and rendered in real time on the display of terminal B. The flow is time-sampled so that no specific action is needed from the user to request transmission.

– Data: Any other data for real-time interaction, closely related to the multimedia telephony session that may be generated or consumed by either one of terminal A or terminal B, possibly via terminal external connections and/or physical connectors, optionally processed by application-specific logic at one or both terminals, and optionally presented on and controlled by the user interface at one or both terminals.

The first three of the above core media components are transported in real time from one MTSI client to the other using RTP (IETF RFC 3550 [9]). The "data" media component for real-time interaction is transported using SCTP (IETF RFC 4960 [173]) over DTLS (IETF RFC 8261 [174]), as described by WebRTC data channels [175]. All media components can be added or dropped during an ongoing session as required either by the end-user or by controlling nodes in the network, assuming that when adding components, the capabilities of the MTSI client support the additional component.

NOTE: The terms voice and speech are synonyms. The present document uses the term speech. The media type is called "audio" in SDP and therefore the term "audio" is also used as a synonym.
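
As an informative illustration, an SDP offer for a session containing all four core media components could contain media descriptions along the following lines, where the port numbers and RTP payload type numbers are examples only and the "m=application" line follows the SDP signalling for WebRTC data channels (IETF RFC 8841); the normative session negotiation procedures are defined in clause 6:

m=audio 49152 RTP/AVP 96 97 98
m=video 49154 RTP/AVP 99 100
m=text 49156 RTP/AVP 101 102
m=application 49158 UDP/DTLS/SCTP webrtc-datachannel
a=sctp-port:5000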

MTSI specifications also support other media types than the core components described above, for example facsimile (fax) transmission.

Facsimile transmission is described in Annex L.

5.2 Codecs for MTSI clients in terminals

5.2.1 Speech

5.2.1.1 General codec requirements

MTSI clients in terminals offering speech communication shall support narrowband, wideband and super-wideband communication. The only exception to this requirement is for the MTSI client in constrained terminal offering speech communication, in which case the MTSI client in constrained terminal shall support narrowband and wideband, and should support super-wideband communication.

In addition, MTSI clients in terminals offering speech communication shall support:

– AMR speech codec (TS 26.071 [11], TS 26.090 [12], TS 26.073 [13] and TS 26.104 [14]) including all 8 modes and source controlled rate operation (TS 26.093 [15]). The MTSI client in terminal shall be capable of operating with any subset of these 8 codec modes. More detailed codec requirements for the AMR codec are defined in clause 5.2.1.2.

MTSI clients in terminals offering wideband speech communication at 16 kHz sampling frequency shall support:

– AMR-WB codec (TS 26.171 [17], TS 26.190 [18], TS 26.173 [19] and TS 26.204 [20]) including all 9 modes and source controlled rate operation (TS 26.193 [21]). The MTSI client in terminal shall be capable of operating with any subset of these 9 codec modes. More detailed codec requirements for the AMR-WB codec are defined in clause 5.2.1.3. When the EVS codec is supported, the EVS AMR-WB IO mode may serve as an alternative implementation of AMR-WB as defined in clause 5.2.1.4.

MTSI clients in terminals offering super-wideband or fullband speech communication shall support:

– EVS codec (TS 26.441 [121], TS 26.444 [124], TS 26.445 [125], TS 26.447 [127], TS 26.451 [131], TS 26.442 [122], TS 26.452 [165] and TS 26.443 [123]) as described below including functions for backwards compatibility with AMR-WB (TS 26.446 [126]) and discontinuous transmission (TS 26.449 [129] and TS 26.450 [130]). More detailed codec requirements for the EVS codec are defined in clause 5.2.1.4.

Encoding of DTMF is described in Annex G.

5.2.1.2 Detailed codec requirements, AMR

When transmitting, the MTSI client in terminal shall be capable of aligning codec mode changes to every frame border, and shall also be capable of restricting codec mode changes to be aligned to every other frame border, e.g. like UMTS_AMR_2 (TS 26.103 [16]). The MTSI client in terminal shall also be capable of restricting codec mode changes to neighbouring codec modes within the negotiated codec mode set. When receiving, the MTSI client in terminal shall allow codec mode changes at any frame border and to any codec mode within the negotiated codec mode set.

The codec modes and the other codec parameters (mode-change-capability, mode-change-period, mode-change-neighbor, etc.), applicable for each session, are negotiated as described in clauses 6.2.2.2 and 6.2.2.3.
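
As an informative illustration, an AMR payload type that restricts codec mode changes to every other frame border and to neighbouring modes within the negotiated codec mode set could be described in SDP as follows, where the port number, payload type number and mode set are examples only (the parameters are defined in the AMR RTP payload format, IETF RFC 4867):

m=audio 49152 RTP/AVP 97
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-set=0,2,4,7; mode-change-period=2; mode-change-neighbor=1; mode-change-capability=2
a=ptime:20
a=maxptime:240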

5.2.1.3 Detailed codec requirements, AMR-WB

When transmitting, the MTSI client in terminal shall be capable of aligning codec mode changes to every frame border, and shall also be capable of restricting codec mode changes to be aligned to every other frame border, e.g. like UMTS_AMR_WB‎ (TS 26.103 [16]). The MTSI client in terminal shall also be capable of restricting codec mode changes to neighbouring codec modes within the negotiated codec mode set. When receiving, the MTSI client in terminal shall allow codec mode changes at any frame border and to any codec mode within the negotiated codec mode set.

The codec modes and the other codec parameters (mode-change-capability, mode-change-period, mode-change-neighbor, etc.), applicable for each session, are negotiated as described in clauses 6.2.2.2 and 6.2.2.3.

5.2.1.4 Detailed codec requirements, EVS

When the EVS codec is supported, the MTSI client in terminal may support dual-mono encoding and decoding.

When the EVS codec is supported, EVS AMR-WB IO may serve as an alternative implementation of the AMR-WB codec (TS 26.445 [125]). In this case, the requirements and recommendations defined in this specification for the AMR-WB codec also apply to EVS AMR-WB IO.

NOTE: The DTX operation of EVS Primary and AMR-WB IO can be configured in sending direction with either a fixed SID update interval (from 3 to 100 frames) or an adaptive SID update interval – more details can be found in clauses 4.4.3 and 5.6.1.1 of TS 26.445 [125]. Implementers of MTSI clients are advised to take into account this SID flexibility of EVS.

5.2.1.5 Offering multiple audio bandwidths and multiple channels

MTSI clients in terminals offering wideband speech communication shall also offer narrowband speech communications.

When offering super-wideband speech, both wideband speech and narrowband speech shall also be offered. When offering fullband speech, super-wideband speech, wideband speech and narrowband speech shall also be offered.

MTSI clients in terminals offering dual-mono shall also offer mono.

5.2.1.6 Codec preference order

When offering both wideband speech and narrowband speech communication, payload types offering wideband shall be listed before payload types offering only narrowband speech in the ‘m=’ line of the SDP offer (RFC 4566 [8]).

When offering super-wideband speech, wideband and narrowband speech communication, payload types offering super-wideband shall be listed before payload types offering lower bandwidths than super-wideband speech in the ‘m=’ line of the SDP offer (RFC 4566 [8]).

For an MTSI client in terminal supporting EVS, the following rules apply when creating the list of payload types on the m= line:

– When the EVS codec is offered for NB by an MTSI client in terminal supporting NB only, it shall be listed before other NB codecs.

– When the EVS codec is offered for up to WB, it shall be listed before other WB codecs.

When dual-mono is offered, it may be preferable over mono depending on the call scenario.
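
As an informative illustration of the above preference rules, an SDP offer from an MTSI client in terminal supporting EVS up to super-wideband, AMR-WB and AMR could list the payload types in the following order, where the port number, payload type numbers and EVS parameters are examples only:

m=audio 49152 RTP/AVP 96 97 98
a=rtpmap:96 EVS/16000/1
a=fmtp:96 br=5.9-24.4; bw=nb-swb
a=rtpmap:97 AMR-WB/16000/1
a=fmtp:97 mode-change-capability=2; max-red=220
a=rtpmap:98 AMR/8000/1
a=fmtp:98 mode-change-capability=2; max-red=220
a=ptime:20
a=maxptime:240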

5.2.2 Video

MTSI clients in terminals offering video communication shall support:

– H.264 (AVC) [24] Constrained Baseline Profile (CBP) Level 1.2;

– H.265 (HEVC) [119] Main Profile, Main Tier, Level 3.1. The only exception to this requirement is for the MTSI client in constrained terminal offering video communication, in which case the MTSI client in constrained terminal should support H.265 (HEVC) Main Profile, Main Tier, Level 3.1.

In addition, they should support:

– H.264 (AVC) [24] Constrained High Profile (CHP) Level 3.1.

For backwards compatibility to previous releases, if H.264 (AVC) [24] Constrained High Profile Level 3.1 is supported, then H.264 (AVC) [24] Constrained Baseline Profile (CBP) Level 3.1 should also be offered.

H.264 (AVC) shall be used without requirements on output timing conformance (annex C of [24]). Each sequence parameter set of H.264 (AVC) shall contain the vui_parameters syntax structure including the num_reorder_frames syntax element set equal to 0.

H.265 (HEVC) Main Profile shall be used with general_progressive_source_flag equal to 1, general_interlaced_source_flag equal to 0, general_non_packed_constraint_flag equal to 1, general_frame_only_constraint_flag equal to 1, and sps_max_num_reorder_pics[ i ] equal to 0 for all i in the range of 0 to sps_max_sub_layers_minus1, inclusive, without requirements on output timing conformance (annex C of [119]).

For both H.264 (AVC) and H.265 (HEVC), the decoder needs to know the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS) to be able to decode the received video packets. A compliant H.265 (HEVC) bitstream must include a Video Parameter Set (VPS), although the VPS may be ignored by the decoder in the context of the present specification.

When H.264 (AVC) or H.265 (HEVC) is used, it is recommended to transmit the parameter sets within the SDP description of a stream, using the relevant MIME/SDP parameters as defined in RFC 6184 [25] for H.264 (AVC) and in [120] for H.265 (HEVC), respectively.

Each media source (SSRC) shall transmit the currently used parameter sets at least once in the beginning of the RTP stream before they are referenced by the encoded video data, to ensure that the parameter sets are available when needed by the receiver. If the video encoding is changed during an ongoing session such that the previously used parameter set(s) are no longer sufficient, then the new parameter sets shall be transmitted at least once in the RTP stream prior to being referenced by the encoded video data, to ensure that the parameter sets are available when needed by the receiver. When a specific version of a parameter set is sent in the RTP stream for the first time, it should be repeated at least 3 times in separate RTP packets, with a single copy per RTP packet and with an interval not exceeding 0.5 seconds, to reduce the impact of packet loss. A single copy of the currently active parameter sets shall also be part of the data sent in the RTP stream as a response to FIR.

Moreover, it is recommended to avoid using the same sequence or picture parameter set identifier value during the same session to signal two or more parameter sets of the same type having different values, such that if a parameter set identifier for a certain type is used more than once in either the SDP description or the RTP stream, or both, the identifier always indicates the same set of parameter values of that type.
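
As an informative illustration, the parameter sets can be conveyed in the SDP using the sprop parameters of the respective payload formats (RFC 6184 [25] and [120]), where the payload type numbers are examples only and the base64 values are placeholders:

a=rtpmap:99 H265/90000
a=fmtp:99 sprop-vps=<base64 VPS>; sprop-sps=<base64 SPS>; sprop-pps=<base64 PPS>
a=rtpmap:100 H264/90000
a=fmtp:100 sprop-parameter-sets=<base64 SPS>,<base64 PPS>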

The video decoder in a multimedia MTSI client in terminal shall either start decoding immediately when it receives data, even if the stream does not start with an IDR/IRAP access unit (IDR access unit for H.264, IRAP access unit for H.265), or alternatively start decoding no later than when it receives the next IDR/IRAP access unit or the next recovery point SEI message, whichever is earlier in decoding order. The decoding process for a stream not starting with an IDR/IRAP access unit shall be the same as for a valid video bit stream. However, the MTSI client in terminal shall be aware that such a stream may contain references to pictures not available in the decoded picture buffer. The display behaviour of the MTSI client in terminal is out of scope of the present document.

An MTSI client in terminal offering H.264 (AVC) CBP support at a level higher than Level 1.2 shall support negotiation to use a lower Level as described in [25] and [58].

An MTSI client in terminal offering H.264 (AVC) CHP support at a level higher than Level 3.1 shall support negotiation to use a lower Level as described in [25] and [58].

An MTSI client in terminal offering video support shall include in the SDP offer H.264 CBP at Level 1.2 or higher.

An MTSI client in terminal offering video support for H.265 (HEVC) [119] Main Profile, Main Tier, Level 3.1 should normally set it to be the preferred video codec.

An MTSI client in terminal offering H.265 (HEVC) shall support negotiation to use a lower Level than the one in the offer, as described in [120] and [58].

If a codec is supported at a certain level, then all (hierarchically) lower levels shall be supported as well.

NOTE 1: An example of a lower level than Level 1.2 is Level 1 for H.264 (AVC) Constrained Baseline Profile.

NOTE 2: All levels are minimum requirements. Higher levels may be supported and used for negotiation.

NOTE 3: MTSI clients in terminals may use full-frame freeze and full-frame freeze release SEI messages of H.264 (AVC) to control the display process. For H.265 (HEVC), MTSI clients may set the value of pic_output_flag in the slice segment headers to either 0 or 1 to control the display process.

NOTE 4: An H.264 (AVC) encoder should code redundant slices only if it knows that the far-end decoder makes use of this feature (which is signalled with the redundant-pic-cap MIME/SDP parameter as specified in RFC 6184 [25]). H.264 (AVC) encoders should also pay attention to the potential implications on end‑to‑end delay. The redundant slice header is not supported in H.265 (HEVC).

NOTE 5: If a codec is supported at a certain level, it implies that on the receiving side, the decoder is required to support the decoding of bitstreams up to the maximum capability of this level. On the sending side, the support of a particular level does not imply that the encoder will produce a bitstream up to the maximum capability of the level. This method can be used to set up an asymmetric video stream. For H.264 (AVC), another method is to use the SDP parameters ‘level-asymmetry-allowed’ and ‘max-recv-level’ that are defined in the H.264 payload format specification, [25]. For H.265 (HEVC) it is possible to use the SDP parameter ‘max-recv-level-id’ defined in the H.265 payload format specification, [120], to indicate a higher level in the receiving direction than in the sending direction. See also clause 6.2.3.2, Annex A.4.5 for SDP examples with asymmetric video using H.264 (AVC) and Annex A.4.8 for SDP examples with asymmetric video using both H.264 (AVC) and H.265 (HEVC). Other methods for asymmetric video transmission are also possible.

NOTE 6: If video is used in a session, an MTSI client in terminal should offer at least one video stream with a picture aspect ratio in the range from 0.7 to 1.4. For all offered video streams, the width and height of the picture should be integer multiples of 16 pixels. For example, 224×176, 272×224, and 320×240 are image sizes that satisfy these conditions.

NOTE 7: For H.264 (AVC) and H.265 (HEVC), respectively, multiple sequence and picture parameter sets can be defined, as long as they have unique parameter set identifiers, but only one sequence and picture parameter set can be active between two consecutive IDRs and IRAPs, respectively.

NOTE 8: For H.264 (AVC), Constrained High Profile (CHP) Level 3.1 is not required to be supported as it is less bit rate efficient than H.265 (HEVC) Main Profile, Main Tier, Level 3.1. However, it is recommended for interoperability.
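
The following informative SDP fragment illustrates the profile and level signalling described in this clause, with H.265 (HEVC) Main Profile, Main Tier, Level 3.1 listed first, followed by H.264 (AVC) CHP and CBP; the port number, payload type numbers and the choice of CBP level are examples only:

m=video 49154 RTP/AVP 99 100 101
a=rtpmap:99 H265/90000
a=fmtp:99 profile-id=1; tier-flag=0; level-id=93
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=640c1f
a=rtpmap:101 H264/90000
a=fmtp:101 profile-level-id=42e01f

In this example, level-id=93 corresponds to HEVC Level 3.1 in the H.265 payload format [120], and the profile-level-id values correspond to H.264 (AVC) Constrained High Profile Level 3.1 and Constrained Baseline Profile Level 3.1, respectively.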

5.2.3 Real-time text

MTSI clients in terminals offering real-time text conversation shall support:

– ITU-T Recommendation T.140 [26] and [27].

T.140 specifies coding and presentation features of real-time text usage. Text characters are coded according to the UTF-8 transform of ISO 10646-1 (Unicode).

A minimal subset of the Unicode character set, corresponding to the Latin-1 part, shall be supported, while the languages in the regions where the MTSI client in terminal is intended to be used should be supported.

Presentation control functions from ISO 6429 are allowed in the T.140 media stream. A mechanism for extending control functions is included in ITU-T Recommendation T.140 [26] and [27]. Any received non-implemented control code must not influence presentation.

An MTSI client in terminal shall store the conversation in a presentation buffer during a call for possible scrolling, saving, display re-arranging, erasure, etc. At least 800 characters shall be kept in the presentation buffer during a call.

Note that erasure (backspace) of characters is included in the T.140 editing control functions. It shall be possible to erase all characters in the presentation buffer. The display of the characters in the buffer shall also be impacted by the erasure.
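
As an informative illustration, real-time text is carried over RTP using the t140 payload format with a clock rate of 1000 Hz, typically combined with redundancy as specified in IETF RFC 4103; the port and payload type numbers below are examples only:

m=text 49156 RTP/AVP 100 98
a=rtpmap:98 t140/1000
a=rtpmap:100 red/1000
a=fmtp:100 98/98/98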

5.2.4 Still Images

MTSI clients supporting still images shall support HEVC encoded images conforming to the HEVC bitstream requirements of clause 5.2.2.

Still images encoded using HEVC shall have general_progressive_source_flag equal to 1, general_interlaced_source_flag equal to 0, general_non_packed_constraint_flag equal to 1, and general_frame_only_constraint_flag equal to 1.

For HEVC encoded images/image sequences, the display properties are carried in SEI and VUI within the bitstream, and the RTP timestamps determine the presentation time of the images.