7 Codecs

3GPP TS 26.234, Release 17: Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs

7.1 General

For PSS clients supporting a particular media type, corresponding media decoders are specified in the following clauses.

7.2 Speech

If speech is supported, the AMR decoder shall be supported for narrow-band speech [18][63][64][65]. The AMR wideband speech decoder, [20][66][67][68], shall be supported when wideband speech working at 16 kHz sampling frequency is supported.

7.3 Audio

If audio is supported, then one or both of the following two audio decoders should be supported:

– Enhanced aacPlus [86] [87] [88]

– Extended AMR-WB [82] [83] [84]

Specifically, based on the audio codec selection test results, Extended AMR-WB is strong for the scenarios marked in blue, Enhanced aacPlus is strong for the scenarios marked in orange, and both are strong for the scenarios marked in green in the table below:

Content type (columns): Music; Speech over Music; Speech between Music; Speech

Bit rate (rows): 14 kbps mono; 18 kbps stereo; 24 kbps stereo; 24 kbps mono; 32 kbps stereo; 48 kbps stereo

More recent performance information, based on later versions of the codecs, can be found in TR 26.936 [95].

Enhanced aacPlus decoder is also able to decode AAC-LC content.

Extended AMR-WB decoder is also able to decode AMR-WB content.

In addition, MPEG-4 AAC Low Complexity (AAC-LC) and MPEG‑4 AAC Long Term Prediction (AAC-LTP) object type decoders [21] may be supported. The maximum sampling rate to be supported by the decoder is 48 kHz. The channel configurations to be supported are mono (1/0) and stereo (2/0).

When a server offers an AAC-LC or AAC-LTP stream with the specified restrictions, it shall include the "profile-level-id" and "object" MIME parameters in the SDP "a=fmtp" line. The following values shall be used:

Object Type    profile-level-id    object
AAC-LC         15                  2
AAC-LTP        15                  4
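As an illustration of the values above, the following Python sketch builds an "a=fmtp" attribute carrying the two required MIME parameters. The payload type number 96 and the helper name are assumptions made for the example, and a real SDP description would normally carry further parameters that are not shown here.

```python
# Values from the table above: object type -> (profile-level-id, object).
AAC_FMTP_VALUES = {
    "AAC-LC": (15, 2),
    "AAC-LTP": (15, 4),
}


def build_aac_fmtp(object_type: str, payload_type: int = 96) -> str:
    """Return an SDP a=fmtp line with the two required MIME parameters.

    payload_type is a hypothetical dynamic RTP payload type number.
    """
    profile_level_id, obj = AAC_FMTP_VALUES[object_type]
    return (f"a=fmtp:{payload_type} "
            f"profile-level-id={profile_level_id}; object={obj}")


print(build_aac_fmtp("AAC-LC"))
```

Keeping the two values in one lookup table makes it harder for a server to emit a "profile-level-id" that does not match the "object" value.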

7.3a Synthetic audio

If a PSS client supports synthetic audio, both the Scalable Polyphony MIDI (SP-MIDI) content format defined in Scalable Polyphony MIDI Specification [44] and the device requirements defined in Scalable Polyphony MIDI Device 5-to-24 Note Profile for 3GPP [45] should be supported.

SP-MIDI content is delivered in the structure specified in Standard MIDI Files 1.0 [46], either in format 0 or format 1.

In addition, a PSS client supporting synthetic audio should also support both the Mobile DLS instrument format defined in [70] and the Mobile XMF content format defined in [71].

A PSS client supporting Mobile DLS shall meet the minimum device requirements defined in section 1.3 of [70] and the requirements for the common part of the synthesizer voice defined in section 1.2.1.2 of [70]. If Mobile DLS is supported, wavetables encoded with the G.711 A-law codec (wFormatTag value 0x0006, as defined in [70]) shall also be supported. The optional group of processing blocks as defined in [70] may be supported. Mobile DLS resources are delivered either in the file format defined in [70], or within Mobile XMF as defined in [71]. For Mobile DLS files delivered outside of Mobile XMF, the loading application should unload Mobile DLS instruments so that the sound bank required by the SP-MIDI profile [45] is not persistently altered by temporary loadings of Mobile DLS files.

Content that pairs Mobile DLS and SP-MIDI resources is delivered in the structure specified in Mobile XMF [71]. As defined in [71], a Mobile XMF file shall contain one SP-MIDI SMF file and no more than one Mobile DLS file. PSS clients supporting Mobile XMF must not support any other resource types in the Mobile XMF file. Media handling behaviours for the SP-MIDI SMF and Mobile DLS resources contained within Mobile XMF are defined in [71].
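The resource constraints on a Mobile XMF file can be expressed as a small check. The sketch below assumes the file has already been parsed into a list of resource type labels; the labels and the function name are hypothetical and are not taken from [71].

```python
from collections import Counter

SP_MIDI = "SP-MIDI SMF"    # hypothetical label for an SP-MIDI SMF resource
MOBILE_DLS = "Mobile DLS"  # hypothetical label for a Mobile DLS resource


def check_xmf_resources(resource_types):
    """Return a list of problems found in the resource types of one file."""
    counts = Counter(resource_types)
    problems = []
    if counts[SP_MIDI] != 1:
        problems.append("expected exactly one SP-MIDI SMF resource")
    if counts[MOBILE_DLS] > 1:
        problems.append("more than one Mobile DLS resource")
    if set(counts) - {SP_MIDI, MOBILE_DLS}:
        problems.append("unsupported resource type present")
    return problems
```

An empty result means the resource list is consistent with the constraints stated above.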

7.4 Video

7.4.1 General video decoder requirements

If a PSS client supports video, the following applies:

– H.264 (AVC) Progressive High Profile Level 3.1 decoder [90] shall be supported, wherein the maximum VCL Bit Rate is constrained to 14 Mbps, with cpbBrVclFactor and cpbBrNalFactor fixed to 1000 and 1200, respectively.

– H.265 (HEVC) Main Profile, Main Tier, Level 3.1 decoder [117] should be supported.
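The H.264 (AVC) bit rate constraint follows from simple arithmetic. The sketch below assumes the Level 3.1 MaxBR value of 14 000 from Table A-1 of [90], which is not restated in this clause; the function name is illustrative.

```python
CPB_BR_VCL_FACTOR = 1000   # fixed by this clause
CPB_BR_NAL_FACTOR = 1200   # fixed by this clause
MAX_BR_LEVEL_3_1 = 14_000  # assumption: MaxBR for Level 3.1, Table A-1 of [90]


def max_bit_rates(max_br: int) -> tuple:
    """Return (maximum VCL bit rate, maximum NAL bit rate) in bits per second."""
    return CPB_BR_VCL_FACTOR * max_br, CPB_BR_NAL_FACTOR * max_br


vcl_bps, nal_bps = max_bit_rates(MAX_BR_LEVEL_3_1)
print(vcl_bps, nal_bps)
```

With these inputs the VCL limit is 14 000 000 bit/s, matching the 14 Mbps stated above, and the corresponding NAL limit is 16 800 000 bit/s.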

When an H.265 (HEVC) Main Profile decoder is supported, the client is only required to process H.265 (HEVC) Main Profile bitstreams that have general_progressive_source_flag equal to 1, general_interlaced_source_flag equal to 0, general_non_packed_constraint_flag equal to 1, and general_frame_only_constraint_flag equal to 1.

NOTE 1: An H.264 (AVC) High Profile decoder is able to decode an H.264 (AVC) Main Profile stream that is progressively encoded.
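The H.265 (HEVC) flag constraints in this clause lend themselves to a table-driven check. The following sketch assumes the four flags have already been parsed from the profile, tier and level syntax into a dictionary; the bitstream parsing itself is not shown and the function name is hypothetical.

```python
# Flag values a PSS client is required to handle, as listed in this clause.
REQUIRED_HEVC_FLAGS = {
    "general_progressive_source_flag": 1,
    "general_interlaced_source_flag": 0,
    "general_non_packed_constraint_flag": 1,
    "general_frame_only_constraint_flag": 1,
}


def violated_flags(parsed_flags: dict) -> list:
    """Return the names of flags that are missing or differ from the required value."""
    return [name for name, required in REQUIRED_HEVC_FLAGS.items()
            if parsed_flags.get(name) != required]
```

A client could use an empty result as the condition for treating a bitstream as one it is required to process.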

7.4.2 Output timing and buffer model

There are no requirements on output timing conformance for H.264 (AVC) decoding (Annex C of [90]) or H.265 (HEVC) decoding (Annex C of [117]).

The H.264 (AVC) decoder in a PSS client shall start decoding immediately when it receives data (even if the stream does not start with an IDR access unit), or alternatively no later than it receives the next IDR access unit or the next recovery point SEI message, whichever is earlier in decoding order. Note that when the interleaved packetization mode of H.264 (AVC) is in use, de-interleaving is done normally before starting the decoding process. The decoding process for a stream not starting with an IDR access unit shall be the same as for a valid H.264 (AVC) bitstream. However, the client shall be aware that such a stream may contain references to pictures not available in the decoded picture buffer. The display behaviour of the client is out of scope of this specification.
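The two permitted start behaviours can be sketched as follows. The AccessUnit type and the function name are hypothetical; the sketch only models where decoding begins in decoding order, not the handling of missing reference pictures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AccessUnit:
    is_idr: bool = False
    has_recovery_point_sei: bool = False


def first_decoded_index(units: Sequence[AccessUnit],
                        start_immediately: bool) -> Optional[int]:
    """Return the index of the first access unit passed to the decoder.

    start_immediately=True models decoding from the first received data.
    Otherwise decoding starts at the next IDR access unit or recovery
    point SEI message, whichever comes first in decoding order.
    """
    if start_immediately:
        return 0 if units else None
    for index, unit in enumerate(units):
        if unit.is_idr or unit.has_recovery_point_sei:
            return index
    return None  # no random access point received yet
```

The second branch gives the latest start point the text allows; a client may begin earlier.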

A PSS client supporting H.264 (AVC) should ignore any VUI HRD parameters, buffering period SEI message, and picture timing SEI message in H.264 (AVC) streams or conveyed in the "sprop-parameter-sets" MIME/SDP parameter. Instead, a PSS client supporting H.264 (AVC) shall follow buffering parameters conveyed in SDP, as specified in clause 5.3.3.2, and in RTSP, as specified in clause 5.3.2.4. A PSS client supporting H.264 (AVC) shall also use the RTP timestamp or NALU-time (as specified in [92]) of a picture as its presentation time, and, when the interleaved RTP packetization mode is in use, follow the "sprop-interleaving-depth", "sprop-deint-buf-req", "sprop-init-buf-time", and "sprop-max-don-diff" MIME/SDP parameters for the de-interleaving process. However, if VUI HRD parameters, buffering period SEI messages, and picture timing SEI messages are present in the bitstream, their contents shall not contradict any of the buffering parameters conveyed in SDP, as specified in clause 5.3.3.2, or in RTSP, as specified in clause 5.3.2.4, or any of the timing information conveyed by the RTP timestamps.
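As a rough illustration of reading the de-interleaving parameters named above from SDP, the sketch below splits an "a=fmtp" line into name/value pairs. The example line, its numeric values and the function names are assumptions made for the example only.

```python
INTERLEAVING_PARAMS = (
    "sprop-interleaving-depth",
    "sprop-deint-buf-req",
    "sprop-init-buf-time",
    "sprop-max-don-diff",
)


def parse_fmtp(line: str) -> dict:
    """Split 'a=fmtp:<pt> name=value; name=value' into a dict of strings."""
    _, _, params = line.partition(" ")
    result = {}
    for item in params.split(";"):
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = item.strip().partition("=")
        if sep:
            result[name] = value
    return result


def deinterleaving_params(line: str) -> dict:
    """Return the de-interleaving parameters present in the line, as integers."""
    parsed = parse_fmtp(line)
    return {name: int(parsed[name])
            for name in INTERLEAVING_PARAMS if name in parsed}


# Hypothetical example line; the values are illustrative only.
example = ("a=fmtp:96 packetization-mode=2; sprop-interleaving-depth=4; "
           "sprop-deint-buf-req=8192; sprop-init-buf-time=90000; "
           "sprop-max-don-diff=8")
print(deinterleaving_params(example))
```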

A PSS client supporting H.265 (HEVC) should ignore any VPS or SPS HRD parameters, buffering period SEI message, picture timing SEI message, and decoding unit information SEI message in H.265 (HEVC) streams or conveyed in the "sprop-vps" and "sprop-sps" MIME parameters. Instead, a PSS client supporting H.265 (HEVC) shall follow buffering parameters conveyed in SDP, as specified in clause 5.3.3.2, and in RTSP, as specified in clause 5.3.2.4. A PSS client supporting H.265 (HEVC) shall also use the RTP timestamp (as specified in [118]) of a picture as its presentation time, and follow the de-packetization process (as specified in [118]). However, if VPS or SPS HRD parameters, buffering period SEI messages, picture timing SEI messages, and decoding unit information SEI messages are present in the bitstream, their contents shall not contradict any of the buffering parameters conveyed in SDP, as specified in clause 5.3.3.2, or in RTSP, as specified in clause 5.3.2.4, or any of the timing information conveyed by the RTP timestamps.

7.4.3 Television services

If the 3GPP PSS client supports Television (TV) over 3GPP Services, it shall comply with

– the H.264/AVC 720p HD Operation Point Receiver requirements as specified in TS 26.116 [120], clause 4.4.2.6.

If the 3GPP PSS client supports Television (TV) over 3GPP Services, it should comply with

– H.264/AVC Full HD Operation Point Receiver requirements as specified in TS 26.116 [120], clause 4.4.3.6,

– H.265/HEVC 720p HD Operation Point Receiver requirements as specified in TS 26.116 [120], clause 4.5.2.7,

– H.265/HEVC Full HD Operation Point Receiver requirements as specified in TS 26.116 [120], clause 4.5.3.7,

– H.265/HEVC UHD Operation Point Receiver requirements as specified in TS 26.116 [120], clause 4.5.4.7,

– H.265/HEVC Full HD HDR Operation Point Receiver requirements as specified in TS 26.116 [120], clause 4.5.5.8, and

– H.265/HEVC UHD HDR Operation Point Receiver requirements as specified in TS 26.116 [120], clause 4.5.6.8.

7.4.4 Stereoscopic 3D Video

If a PSS client supports stereoscopic 3D video, it should support frame-packed stereoscopic 3D video with the following characteristics:

– The bitstream conforms to H.264 (AVC) Constrained Baseline Profile Level 1.3, or conforms to H.264 (AVC) Progressive High Profile Level 3.1. The maximum VCL Bit Rate shall be constrained to 14 Mbps, with cpbBrVclFactor and cpbBrNalFactor fixed to 1000 and 1200, respectively, irrespective of the profile.

– Frame packing type is indicated by the frame packing arrangement SEI messages of H.264 (AVC) [90] as follows:

– The syntax element frame_packing_arrangement_type has one of the defined values: 3 for Side-by-Side, 4 for Top-and-Bottom;

– The syntax element quincunx_sampling_flag is equal to 0;

– The syntax element content_interpretation_type is equal to 1;

– The syntax element spatial_flipping_flag is equal to 0;

– The syntax element field_views_flag is equal to 0;

– The syntax element current_frame_is_frame0_flag is equal to 0;

– When an access unit contains a frame packing arrangement SEI message A and the access unit is neither an IDR access unit nor an access unit containing a recovery point SEI message, the following two constraints apply:

– There shall be another access unit that precedes the access unit in both decoding order and output order and that contains a frame packing arrangement SEI message B.

– The two frame packing arrangement SEI messages A and B shall have the same value for the syntax element frame_packing_arrangement_type.
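The syntax element constraints listed above can be collected into a small validator. The sketch assumes the frame packing arrangement SEI message has already been parsed into a dictionary keyed by syntax element name; the parsing step and the function names are hypothetical.

```python
ALLOWED_PACKING_TYPES = {3: "Side-by-Side", 4: "Top-and-Bottom"}

REQUIRED_VALUES = {
    "quincunx_sampling_flag": 0,
    "content_interpretation_type": 1,
    "spatial_flipping_flag": 0,
    "field_views_flag": 0,
    "current_frame_is_frame0_flag": 0,
}


def check_frame_packing_sei(sei: dict) -> list:
    """Return the names of syntax elements that violate the constraints."""
    problems = []
    if sei.get("frame_packing_arrangement_type") not in ALLOWED_PACKING_TYPES:
        problems.append("frame_packing_arrangement_type")
    problems += [name for name, value in REQUIRED_VALUES.items()
                 if sei.get(name) != value]
    return problems


def same_packing_type(sei_a: dict, sei_b: dict) -> bool:
    """Check that SEI messages A and B signal the same arrangement type."""
    return (sei_a.get("frame_packing_arrangement_type")
            == sei_b.get("frame_packing_arrangement_type"))
```

The second helper covers only the type-equality constraint between messages A and B; locating the preceding access unit that carries message B is left to the caller.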

If a PSS client supports frame-packed stereoscopic 3D video, it shall support parsing of frame packing arrangement SEI messages as specified in H.264 (AVC) [90].

If a PSS client supports stereoscopic 3D video, it should support multiview stereoscopic 3D video with the following characteristics:

– The video stream conforms to ITU-T Recommendation H.264 / MPEG-4 (Part 10) AVC [90] Stereo High Profile (SHP) Level 3.1 with frame_mbs_only_flag=1.

– When an H.264 (AVC) SHP sub-bitstream containing the base view only conforms to Level 1.3 or below, the value of the profile_idc should be equal to 66 and the value of the constraint_set1_flag should be equal to 1 in all active sequence parameter sets, i.e. the H.264 (AVC) Constrained Baseline Profile should be indicated to be used for the base view.

NOTE 1: Any PSS (Release 11) client supporting video can play back the base view of any H.264 (AVC) SHP stream if the base view is indicated to conform to H.264 (AVC) Constrained Baseline Profile Level 1.3.

NOTE 2: At the time of publication of this specification there is no RTP payload format specified for H.264 (AVC) SHP bitstreams. Thus, the format is not available for services utilizing RTP transport of media.

NOTE 3: In order to provide the optimal range of perceived depth, a PSS server is recommended to adapt the stereoscopic 3D video content to fit the indicated screen size of a target device. For more information and example steps to perform stereoscopic video content re-targeting see [113].

7.4.5 360 video and 3D audio for VR (Virtual Reality)

7.4.5.1 Video

7.4.5.1.1 Operation Points

If the PSS client supports 360 VR video, it shall include a receiver that complies with

– the Basic H.264/AVC Operation Point Receiver requirements as specified in TS 26.118 [121], clause 5.1.4.

If the PSS client supports 360 VR video, it should include a receiver that complies with

– the Main H.265/HEVC Operation Point Receiver requirements as specified in TS 26.118 [121], clause 5.1.5.

If the PSS client supports 360 VR video, it may include a receiver that complies with

– the Flexible H.265/HEVC Operation Point Receiver requirements as specified in TS 26.118 [121], clause 5.1.6.

7.4.5.1.2 Progressive Download

If the 3GPP PSS client supports 360 VR video for Progressive Download services, it shall include a receiver that complies with

– the Basic Video Media Profile Receiver requirements for file format signalling and encapsulation as specified in TS 26.118 [121], clause 5.2.2.2.

If the 3GPP PSS client supports 360 VR video for Progressive Download services, it should include a receiver that complies with

– the Main Video Media Profile Receiver requirements for file format signalling and encapsulation as specified in TS 26.118 [121], clause 5.2.3.2.

If the 3GPP PSS client supports 360 VR video for Progressive Download services, it may include a receiver that complies with

– the Advanced Video Media Profile Receiver requirements for file format signalling and encapsulation as specified in TS 26.118 [121], clause 5.2.4.2.

NOTE: This profile requires extensions beyond the 3GP file format that are only defined in TS 26.118 [121].

7.4.5.1.3 DASH

If the 3GPP PSS client supports 360 VR video for DASH services, it shall include a receiver that complies with

– the Basic Video Media Profile Receiver requirements for DASH as specified in TS 26.118 [121], clause 5.2.2.3.

If the 3GPP PSS client supports 360 VR video for DASH services, it should include a receiver that complies with

– the Main Video Media Profile Receiver requirements for DASH as specified in TS 26.118 [121], clause 5.2.3.3.

If the 3GPP PSS client supports 360 VR video for DASH services, it may include a receiver that complies with

– the Advanced Video Media Profile Receiver requirements for DASH as specified in TS 26.118 [121], clause 5.2.4.3.

7.4.5.2 Audio

7.4.5.2.1 Operation Points

If the PSS client supports 3D/VR audio, it should include a receiver that complies with

– the 3GPP MPEG-H Audio Operation Point Receiver requirements as specified in TS 26.118 [121], clause 6.1.4.

7.4.5.2.2 Progressive Download

If the 3GPP PSS client supports 3D/VR audio for Progressive Download services, it should include a receiver that complies with

– the OMAF 3D Audio Baseline Media Profile Receiver requirements for file format signalling and encapsulation as specified in TS 26.118 [121], clause 6.2.2.2.

7.4.5.2.3 DASH

If the 3GPP PSS client supports 3D/VR audio for DASH services, it should include a receiver that complies with

– the OMAF 3D Audio Baseline Media Profile Receiver requirements for DASH as specified in TS 26.118 [121], clause 6.2.2.3.

7.4.5.3 Metrics

If the 3GPP PSS client supports 360 VR video for Progressive Download or DASH services, it should include a receiver that complies with

– the VR Metrics requirements as specified in TS 26.118 [121], clause 9.

7.5 Still images

If a PSS client supports still images, ISO/IEC JPEG [26] together with JFIF [27] decoders shall be supported. The support requirement for ISO/IEC JPEG only applies to the following two modes:

– baseline DCT, non-differential, Huffman coding, as defined in table B.1, symbol ‘SOF0’ in [26];

– progressive DCT, non-differential, Huffman coding, as defined in table B.1, symbol ‘SOF2’ in [26].
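The two modes above correspond to the SOF0 and SOF2 frame markers. As a rough sketch, a client could scan the marker segments of a JPEG stream for the first start-of-frame marker and compare it against these two. The marker byte values 0xC0 and 0xC2 are the codes assigned to SOF0 and SOF2 in table B.1 of [26]; the function names are illustrative, and the scan assumes that every segment before the frame header carries a length field.

```python
from typing import Optional

SUPPORTED_SOF = {0xC0: "baseline DCT (SOF0)", 0xC2: "progressive DCT (SOF2)"}
NOT_SOF = {0xC4, 0xC8, 0xCC}  # DHT, JPG and DAC share the 0xC0..0xCF range


def first_sof_marker(data: bytes) -> Optional[int]:
    """Return the second byte of the first SOFn marker, or None if not found."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            pos += 1
            continue
        if 0xC0 <= marker <= 0xCF and marker not in NOT_SOF:
            return marker
        # Skip this segment: two marker bytes plus the length it declares.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        pos += 2 + length
    return None


def is_required_mode(data: bytes) -> bool:
    """True if the stream uses one of the two modes listed above."""
    return first_sof_marker(data) in SUPPORTED_SOF
```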

7.6 Bitmap graphics

If a PSS client supports bitmap graphics, the following bitmap graphics decoders should be supported:

– GIF87a, [32];

– GIF89a, [33];

– PNG, [38].
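A client can tell the three formats apart from the first bytes of the file. The following sketch uses the well-known GIF and PNG file signatures; the function name is hypothetical.

```python
from typing import Optional

SIGNATURES = (
    (b"GIF87a", "GIF87a"),
    (b"GIF89a", "GIF89a"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
)


def sniff_bitmap_format(data: bytes) -> Optional[str]:
    """Return the bitmap format indicated by the leading bytes, or None."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return None
```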

7.7 Vector graphics

If a PSS client supports vector graphics, SVG Tiny 1.2 [42] [43] and ECMAScript [94] shall be supported.

NOTE 1: The compression format for SVG content is GZIP [59], in accordance with the SVG specification [42].

NOTE 2: Only codecs and MIME media types supported by PSS, as specified in clause 7 and in subclause 5.4, respectively, shall be used. In particular, PSS clients are not required to support the Ogg Vorbis format.

NOTE 3: Content creators of SVG Tiny 1.2 are strongly recommended to follow the content creation guidelines provided in Annex L.

7.8 Text

Text is provided as part of the HTML5 scene description. The following character coding formats shall be supported:

– UTF-8, [30];

– UCS-2, [29].
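A minimal sketch of decoding the two character coding formats is given below. Python has no dedicated UCS-2 codec, so the example decodes as UTF-16 and rejects any code point outside the 16-bit range; the big-endian byte order without a byte order mark and the function name are assumptions made for the example.

```python
def decode_text(data: bytes, coding: str) -> str:
    """Decode text in one of the two character coding formats listed above."""
    if coding == "UTF-8":
        return data.decode("utf-8")
    if coding == "UCS-2":
        # Assumption: big-endian byte order, no byte order mark.
        text = data.decode("utf-16-be")
        if any(ord(ch) > 0xFFFF for ch in text):
            raise ValueError("code point outside the UCS-2 range")
        return text
    raise ValueError(f"unsupported character coding: {coding}")
```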

7.9 Timed text

PSS clients supporting timed text shall support [51]. Timed text may be transported over RTP or downloaded contained in 3GP files using Basic profile.

NOTE: A PSS client supporting timed text shall receive and parse 3GP files containing the text streams. This does not imply a requirement on PSS clients to be able to render other continuous media types contained in 3GP files, e.g. AMR and H.264, if such media types are included in a presentation together with timed text. Audio and video are instead streamed to the client using RTSP/RTP (see clause 6.2).

7.10 3GPP file format

3GP files [50] can be used by both PSS clients and PSS servers. The following profiles are used:

– Basic profile shall be supported by PSS clients if timed text is supported;

– Basic profile and Extended-presentation profile should be supported by PSS clients;

– Streaming server profile should be supported by PSS servers.

More details on the support of the 3GPP file format and segments for Dynamic Adaptive Streaming over HTTP are specified in 3GPP TS 26.247 [112].

7.11 Timed graphics

PSS clients supporting timed graphics shall support 3GPP TS 26.430 [109].