6 Codec registration

3GPP TS 26.244 Release 17: Transparent end-to-end Packet-switched Streaming Service (PSS); 3GPP file format (3GP)

6.1 General

The purpose of this clause is to define the structure needed to integrate H.263, AMR, AMR-WB, Extended AMR-WB (AMR-WB+), EVS, Enhanced aacPlus and AAC media-specific information in a 3GP file. Clause 6.2 gives background information about the Sample Description box in the ISO base media file format [7], and clause 6.3 about the MP4VisualSampleEntry box in the MPEG-4 file format [14]. The Sample Entry boxes for AMR, AMR-WB, AMR-WB+ and H.263 are defined in clauses 6.5 to 6.10, those for CVO, location and orientation timed metadata in clauses 6.11 to 6.13, and the Sample Entry box for EVS in clause 6.14. The integration of timed text in a 3GP file is specified in [4], that of H.264 (AVC) in clause 5 of [20], that of H.265 (HEVC) in clause 8 of [20], that of the Quality metrics timed metadata track in clause 4 of [53] and clause 16 of the present document, and that of DIMS in [36] and clauses 5.4.3, 5.4.6 and 11 of the present document. Requirements for integrating video codecs in the context of the TV Video Profile are documented in TS 26.116 [56].

AMR and AMR-WB data is stored in the stream according to the AMR and AMR-WB storage format for single channel header of Annex E [15], without the AMR magic numbers. The 3GPP file format is the native storage format for AMR-WB+. The data stream, stored in samples of a 3GP file, shall be formatted according to clause 8.3 of [21]. Each sample contains one or more AMR-WB+ storage units. The number of storage units per sample may differ from sample to sample.
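The following Python sketch illustrates how a reader might split one AMR (narrow-band) sample into its frames. It assumes that each stored frame consists of a one-octet header (a padding bit, the 4-bit FT field, the Q bit and two padding bits) followed by the speech bits padded to whole octets, as in the AMR file storage format of IETF RFC 4867, and it assumes the AMR-NB payload sizes listed in the table. The helper name split_amr_sample is hypothetical, and AMR-WB would need its own size table.

```python
from __future__ import annotations

# Speech payload size in octets (excluding the one-octet frame header) for each
# AMR-NB frame type FT; assumed values. FT 8 is the SID (comfort noise) frame and
# FT 15 is NO_DATA. Other frame types are not handled by this sketch.
AMR_NB_PAYLOAD_OCTETS = {0: 12, 1: 13, 2: 15, 3: 17, 4: 19, 5: 20,
                         6: 26, 7: 31, 8: 5, 15: 0}


def split_amr_sample(sample: bytes) -> list[tuple[int, bool, bytes]]:
    """Split one AMR-NB sample into (FT, Q, speech payload) tuples."""
    frames = []
    pos = 0
    while pos < len(sample):
        header = sample[pos]
        ft = (header >> 3) & 0x0F        # 4-bit frame type
        q = bool((header >> 2) & 0x01)   # frame quality indicator
        size = AMR_NB_PAYLOAD_OCTETS.get(ft)
        if size is None:
            raise ValueError(f"unsupported AMR-NB frame type {ft}")
        payload = sample[pos + 1:pos + 1 + size]
        if len(payload) != size:
            raise ValueError("truncated AMR frame")
        frames.append((ft, q, payload))
        pos += 1 + size
    return frames
```

For example, a 12.2 kbit/s frame (FT = 7) with Q = 1 has the header octet 0x3C and occupies 32 octets in total.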

For EVS, each sample of the media is one speech frame block as specified in Annex A.2.6.2 of [55]. A speech frame block consists of N ToC entries and N speech frames, where N is the value of channelcount in the EVSSampleEntry box specified in clause 6.14 of the present document.

6.2 Sample Description box

In an ISO file, the Sample Description Box gives detailed information about the coding type used and any initialisation information needed for that coding. The position of the Sample Description Box in the ISO file format box structure hierarchy is shown in figure 6.1.

Figure 6.1: ISO File Format Box Structure Hierarchy

The Sample Description Box can have one or more Sample Entries.

Valid Sample Entries already defined for ISO and MP4 include MP4AudioSampleEntry and HintSampleEntry. The other Sample Entries shall be as follows:

– AMR, AMR-WB: AMRSampleEntry

– AMR-WB+: AMRWPSampleEntry

– EVS: EVSSampleEntry

– H.263: H263SampleEntry

– H.264 (AVC): AVCSampleEntry

– H.265 (HEVC): HEVCSampleEntry

– Timed text: TextSampleEntry

– DIMS: DIMSSampleEntry

– CVO timed metadata: CVOSampleEntry

– Location timed metadata: LocationSampleEntry

– Quality metrics timed metadata: QualityMetricsSampleEntry

– Orientation timed metadata: OrientationSampleEntry

SampleEntry and its fields are defined as follows:

SampleEntry ::= MP4AudioSampleEntry |
AMRSampleEntry |
AMRWPSampleEntry |
EVSSampleEntry |
H263SampleEntry |
AVCSampleEntry |
TextSampleEntry |
DIMSSampleEntry |
HintSampleEntry |
CVOSampleEntry |
HEVCSampleEntry |
LocationSampleEntry |
QualityMetricsSampleEntry |
OrientationSampleEntry

Table 6.1: SampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| MP4AudioSampleEntry | | Entry type for audio samples defined in the MP4 specification [14]. | |
| AMRSampleEntry | | Entry type for AMR and AMR-WB speech samples defined in clause 6.5 of the present document. | |
| AMRWPSampleEntry | | Entry type for AMR-WB+ audio samples defined in clause 6.9 of the present document. | |
| EVSSampleEntry | | Entry type for EVS samples defined in clause 6.14 of the present document. | |
| H263SampleEntry | | Entry type for H.263 visual samples defined in clause 6.6 of the present document. | |
| AVCSampleEntry | | Entry type for H.264 (AVC) visual samples defined in the AVC file format specification in clause 5 of [20]. | |
| TextSampleEntry | | Entry type for timed text samples defined in the timed text specification [4]. | |
| DIMSSampleEntry | | Entry type for DIMS scene description samples defined in the DIMS specification [36]. | |
| HintSampleEntry | | Entry type for hint track samples defined in the ISO specification [7]. | |
| CVOSampleEntry | | Entry type for the CVO timed metadata track as defined in clause 6.11 of the present document. | |
| HEVCSampleEntry | | Entry type for H.265 (HEVC) visual samples defined in the H.265 (HEVC) file format specification in clause 8 of [20]. | |
| LocationSampleEntry | | Entry type for the Location timed metadata track as defined in clause 6.12 of the present document. | |
| QualityMetricsSampleEntry | | Entry type for the Quality metrics timed metadata track as defined in clause 4 of [53]. | |
| OrientationSampleEntry | | Entry type for the Orientation timed metadata track as defined in clause 6.13 of the present document. | |

From the Sample Entries in Table 6.1, only the MP4AudioSampleEntry, H263SampleEntry, AMRSampleEntry, AMRWPSampleEntry, EVSSampleEntry, CVOSampleEntry, LocationSampleEntry and OrientationSampleEntry are taken into consideration here. TextSampleEntry is defined in [4], HintSampleEntry in [7], AVCSampleEntry in clause 5 of [20], HEVCSampleEntry in clause 8 of [20], QualityMetricsSampleEntry in clause 4 of [53] and DIMSSampleEntry in [36].
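As an illustration of how Sample Entries sit inside the Sample Description Box, the sketch below lists the entries of an 'stsd' box and names those whose box types appear in clauses 6.3 to 6.14. It assumes the ISO base media file format layout of 'stsd' (a full box whose 4-octet version/flags field is followed by a 32-bit entry_count and then the Sample Entry boxes), handles only 32-bit box sizes (no 64-bit largesize), and adds 'mp4a' as the box type of MP4AudioSampleEntry from [14]. The table and function names are hypothetical.

```python
from __future__ import annotations

import struct

# Sample entry box types from clauses 6.3 to 6.14, plus 'mp4a' (assumed to be
# the box type of MP4AudioSampleEntry in the MPEG-4 file format).
ENTRY_NAMES = {
    b"mp4v": "MP4VisualSampleEntry",
    b"mp4a": "MP4AudioSampleEntry",
    b"samr": "AMRSampleEntry (AMR)",
    b"sawb": "AMRSampleEntry (AMR-WB)",
    b"s263": "H263SampleEntry",
    b"sawp": "AMRWPSampleEntry",
    b"3gvo": "CVOSampleEntry",
    b"3glo": "LocationSampleEntry",
    b"3gor": "OrientationSampleEntry",
    b"sevs": "EVSSampleEntry",
}


def list_sample_entries(stsd: bytes) -> list[tuple[bytes, str]]:
    """Return (box type, entry name) for every Sample Entry in an 'stsd' box."""
    size, box_type, _version_and_flags, entry_count = struct.unpack_from(">I4sII", stsd, 0)
    if box_type != b"stsd" or size > len(stsd):
        raise ValueError("not a complete 'stsd' box")
    entries = []
    pos = 16  # box header (8) + version/flags (4) + entry_count (4)
    for _ in range(entry_count):
        entry_size, entry_type = struct.unpack_from(">I4s", stsd, pos)
        if entry_size < 8 or pos + entry_size > size:
            raise ValueError("malformed Sample Entry")
        entries.append((entry_type, ENTRY_NAMES.get(entry_type, "unknown")))
        pos += entry_size
    return entries
```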

6.3 MP4VisualSampleEntry box

The MP4VisualSampleEntry Box is defined as follows:

MP4VisualSampleEntry ::= BoxHeader
Reserved_6
Data-reference-index
Reserved_16
Width
Height
Reserved_4
Reserved_4
Reserved_4
Reserved_2
Reserved_32
Reserved_2
Reserved_2
ESDBox

Table 6.2: MP4VisualSampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘mp4v’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| ESDBox | | Box containing an elementary stream descriptor for this stream. | |

The stream type specific information is in the ESDBox structure, as defined in [14].

This version of the MP4VisualSampleEntry, with explicit width and height, shall be used for MPEG-4 video streams conformant to this specification.

NOTE: The width and height parameters together may be used to allocate the necessary memory in the playback device without the need to analyse the video stream.
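The NOTE above can be illustrated with a short Python sketch that reads Width and Height from the fixed part of a visual sample entry without touching the video data. It relies only on the field layout of Table 6.2, which H263SampleEntry (Table 6.5) shares, and it assumes the child box (ESDBox or H263SpecificBox) starts immediately after the 86-octet fixed part. The function name and the returned dictionary are hypothetical.

```python
import struct

# Fixed part of MP4VisualSampleEntry / H263SampleEntry: BoxHeader, Reserved_6,
# Data-reference-index, Reserved_16, Width, Height, then 50 octets of reserved
# fields (three Reserved_4, Reserved_2, Reserved_32, Reserved_2, Reserved_2).
VISUAL_ENTRY = struct.Struct(">I4s6xH16xHH50x")  # 86 octets, big-endian


def parse_visual_sample_entry(entry: bytes) -> dict:
    """Read the fields a player needs before allocating decoder buffers."""
    size, box_type, data_ref_index, width, height = VISUAL_ENTRY.unpack_from(entry, 0)
    if box_type not in (b"mp4v", b"s263"):
        raise ValueError(f"not a visual sample entry: {box_type!r}")
    if size < VISUAL_ENTRY.size:
        raise ValueError("sample entry shorter than its fixed part")
    return {
        "type": box_type.decode("ascii"),
        "data_reference_index": data_ref_index,
        "width": width,
        "height": height,
        "child_box_offset": VISUAL_ENTRY.size,  # ESDBox or H263SpecificBox
    }
```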

6.4 Void


6.5 AMRSampleEntry box

For narrow-band AMR, the box type of the AMRSampleEntry Box shall be ‘samr’. For AMR wideband (AMR-WB), the box type of the AMRSampleEntry Box shall be ‘sawb’.

The AMRSampleEntry Box is defined as follows:

AMRSampleEntry ::= BoxHeader
Reserved_6
Data-reference-index
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
AMRSpecificBox

Table 6.4: AMRSampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘samr’ or ‘sawb’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Copied from the media header box of this media | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| AMRSpecificBox | | Information specific to the decoder. | |

Compared with the MP4AudioSampleEntry Box, the main difference in the AMRSampleEntry Box is that the ESDBox, which is specific to MPEG-4 systems, is replaced with a box suitable for AMR and AMR-WB. The AMRSpecificBox field structure is described in clause 6.7.
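A minimal Python sketch of how a writer might serialize an AMRSampleEntry following Table 6.4 is given below. The AMRSpecificBox is passed in as already-encoded bytes (its content is defined in clause 6.7), all fields are written big-endian as in the ISO base media file format, and the function name is hypothetical.

```python
import struct


def build_amr_sample_entry(box_type: bytes, data_reference_index: int,
                           timescale: int, amr_specific_box: bytes) -> bytes:
    """Serialize an AMRSampleEntry ('samr' or 'sawb') following Table 6.4."""
    if box_type not in (b"samr", b"sawb"):
        raise ValueError("box type must be 'samr' (AMR) or 'sawb' (AMR-WB)")
    fields = struct.pack(
        ">6xH8xHH4xHH",        # 6x = Reserved_6, 8x = Reserved_8, 4x = Reserved_4
        data_reference_index,  # Data-reference-index
        2,                     # Reserved_2 = 2
        16,                    # Reserved_2 = 16
        timescale,             # TimeScale, copied from the media header box
        0,                     # Reserved_2 = 0
    )
    payload = fields + amr_specific_box
    # BoxHeader: 32-bit size covering the whole box, then the 4-octet type.
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload
```

The fixed part is 36 octets, so an entry carrying a 17-octet AMRSpecificBox is 53 octets long.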

6.6 H263SampleEntry box

The box type of the H263SampleEntry Box shall be ‘s263’.

The H263SampleEntry Box is defined as follows:

H263SampleEntry ::= BoxHeader
Reserved_6
Data-reference-index
Reserved_16
Width
Height
Reserved_4
Reserved_4
Reserved_4
Reserved_2
Reserved_32
Reserved_2
Reserved_2
H263SpecificBox

Table 6.5: H263SampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘s263’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_16 | Const unsigned int(32) [4] | | 0 |
| Width | Unsigned int(16) | Maximum width, in pixels, of the stream | |
| Height | Unsigned int(16) | Maximum height, in pixels, of the stream | |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0x00480000 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Reserved_2 | Const unsigned int(16) | | 1 |
| Reserved_32 | Const unsigned int(8) [32] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 24 |
| Reserved_2 | Const int(16) | | -1 |
| H263SpecificBox | | Information specific to the H.263 decoder. | |

Compared with the MP4VisualSampleEntry Box, the main difference in the H263SampleEntry Box is that the ESDBox, which is specific to MPEG-4 systems, is replaced with a box suitable for H.263. The H263SpecificBox field structure is described in clause 6.8.

6.7 AMRSpecificBox field for AMRSampleEntry box

The AMRSpecificBox fields for AMR and AMR-WB shall be as defined in table 6.6. The AMRSpecificBox for the AMRSampleEntry Box shall always be included if the 3GP file contains AMR or AMR-WB media.

Table 6.6: The AMRSpecificBox fields for AMRSampleEntry

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘damr’ |
| DecSpecificInfo | AMRDecSpecStruc | Structure which holds the AMR and AMR-WB specific information | |

BoxHeader Size and Type: indicate the size and type of the AMR decoder-specific box. The type must be ‘damr’.

DecSpecificInfo: the structure where the AMR and AMR-WB stream specific information resides.

The AMRDecSpecStruc is defined as follows:

struct AMRDecSpecStruc{
Unsigned int (32) vendor
Unsigned int (8) decoder_version
Unsigned int (16) mode_set
Unsigned int (8) mode_change_period
Unsigned int (8) frames_per_sample
}
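A reader might decode the AMRSpecificBox of Table 6.6 with a sketch like the following, which unpacks AMRDecSpecStruc in its declared field order (32-bit vendor, 8-bit decoder_version, 16-bit mode_set, 8-bit mode_change_period, 8-bit frames_per_sample). Big-endian byte order is assumed, as in the ISO base media file format; the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class AMRDecSpecStruc(NamedTuple):
    vendor: bytes
    decoder_version: int
    mode_set: int
    mode_change_period: int
    frames_per_sample: int


# vendor (4 octets), decoder_version (1), mode_set (2),
# mode_change_period (1), frames_per_sample (1): 9 octets in total.
_DAMR_BODY = struct.Struct(">4sBHBB")


def parse_amr_specific_box(box: bytes) -> AMRDecSpecStruc:
    """Decode an AMRSpecificBox ('damr') into its AMRDecSpecStruc members."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"damr":
        raise ValueError(f"expected 'damr', got {box_type!r}")
    if size < 8 + _DAMR_BODY.size or size > len(box):
        raise ValueError("truncated 'damr' box")
    return AMRDecSpecStruc(*_DAMR_BODY.unpack_from(box, 8))
```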

The definitions of AMRDecSpecStruc members are as follows:

vendor: four-character code of the manufacturer of the codec, e.g. ‘VXYZ’. The vendor field gives information about the vendor whose codec is used to create the encoded data. It is an informative field, which may be used by the decoding end. If a manufacturer already has a four-character code, it is recommended that it uses the same code in this field. Otherwise, it is recommended that the manufacturer creates a four-character code which best represents the manufacturer’s name. It can be safely ignored.

decoder_version: version of the vendor’s decoder which can decode the encoded stream in the best (i.e. optimal) way. This field is closely tied to the vendor field. It may benefit a vendor that has optimal encoder-decoder version pairs. The value is set to 0 if the decoder version is of no importance to the vendor. It can be safely ignored.

mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to one mode. The bit index of the mode is given by the 4-bit FT field of the AMR or AMR-WB frame structure. The mode_set bit structure is (B15xxxxxxB8B7xxxxxxB0), where B0 (the least significant bit) corresponds to Mode 0 and B8 corresponds to Mode 8.

The mapping of existing AMR modes to FT is given in table 1.a in [16]. A value of 0x81FF means all modes and comfort noise frames are possibly present in an AMR stream.

The mapping of existing AMR-WB modes to FT is given in Table 1.a in TS 26.201 [17]. A value of 0x83FF means all modes and comfort noise frames are possibly present in an AMR-WB stream.

As an example, if mode_set = 0000000110010101b, only Modes 0, 2, 4, 7 and 8 are present in the stream.
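The bit mapping described above can be checked with a small Python sketch that lists the FT indices set in a mode_set value. The function name is hypothetical; it simply treats bit i (with B0 as the least significant bit) as frame type FT = i.

```python
from __future__ import annotations


def active_modes(mode_set: int) -> list[int]:
    """List the FT indices whose bits are set in the 16-bit mode_set field."""
    if not 0 <= mode_set <= 0xFFFF:
        raise ValueError("mode_set is a 16-bit field")
    return [bit for bit in range(16) if (mode_set >> bit) & 1]
```

For the example above, active_modes(0b0000000110010101) returns [0, 2, 4, 7, 8], and the AMR value 0x81FF yields FT 0 to 8 plus bit 15.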

mode_change_period: defines a number N such that mode changes are only allowed at multiples of N frames. If no restriction is applied, this value should be set to 0. If mode_change_period is not 0, the following restrictions relative to the frames_per_sample field apply:

    if (mode_change_period < frames_per_sample)
        frames_per_sample = k × mode_change_period
    else if (mode_change_period > frames_per_sample)
        mode_change_period = k × frames_per_sample

where k is an integer, k ≥ 2.

If mode_change_period is equal to frames_per_sample, the mode is the same for all frames inside one sample.

frames_per_sample: defines the number of frames to be considered as ‘one sample’ inside the 3GP file. This number shall be greater than 0 and less than 16. A value of 1 means that each frame is treated as one sample. A value of 10 means that 10 frames (each of 20 ms duration) are put together and treated as one sample; in this case the duration of one sample is 20 ms/frame × 10 frames = 200 ms. For the last sample of the stream, the number of frames can be smaller than frames_per_sample if fewer than frames_per_sample frames remain.
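The constraints on mode_change_period and frames_per_sample can be expressed as a short validation routine. This Python sketch (hypothetical name) accepts the pair when one value is an integer multiple of the other, which is equivalent to the rule above because k ≥ 2 follows automatically whenever the two values differ; it returns the nominal sample duration assuming 20 ms frames.

```python
AMR_FRAME_DURATION_MS = 20  # AMR and AMR-WB frames are 20 ms long


def check_amr_sample_timing(frames_per_sample: int, mode_change_period: int) -> int:
    """Check the damr timing fields and return the nominal sample duration in ms.

    The duration applies to every sample except possibly the last one,
    which may hold fewer frames.
    """
    if not 0 < frames_per_sample < 16:
        raise ValueError("frames_per_sample shall be greater than 0 and less than 16")
    if not 0 <= mode_change_period <= 0xFF:
        raise ValueError("mode_change_period is an 8-bit field")
    if mode_change_period != 0:
        smaller, larger = sorted((mode_change_period, frames_per_sample))
        # Equal values are allowed; otherwise larger must be k x smaller.
        if larger % smaller != 0:
            raise ValueError("one value must be an integer multiple of the other")
    return AMR_FRAME_DURATION_MS * frames_per_sample
```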

NOTE 1: The "hinter", for the creation of the hint tracks, can use the information given by the AMRDecSpecStruc members.

NOTE 2: The following AMR MIME parameters are not relevant to PSS: {mode-set, mode-change-period, mode-change-neighbor}. PSS servers should not send these parameters in SDP, and PSS clients shall ignore these parameters if received.

6.8 H263SpecificBox field for H263SampleEntry box

The H263SpecificBox fields for H.263 shall be as defined in table 6.7. The H263SpecificBox for the H263SampleEntry Box shall always be included if the 3GP file contains H.263 media.

The H263SpecificBox for H263 is composed of the following fields.

Table 6.7: The H263SpecificBox fields for H263SampleEntry

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘d263’ |
| DecSpecificInfo | H263DecSpecStruc | Structure which holds the H.263 specific information | |
| BitrateBox | | Specific bitrate information (optional) | |

BoxHeader Size and Type: indicate the size and type of the H.263 decoder-specific box. The type must be ‘d263’.

DecSpecificInfo: the structure where the H.263 stream-specific information resides.

H263DecSpecStruc is defined as follows:

struct H263DecSpecStruc{
Unsigned int (32) vendor
Unsigned int (8) decoder_version
Unsigned int (8) H263_Level
Unsigned int (8) H263_Profile
}

The definitions of H263DecSpecStruc members are as follows:

vendor: four-character code of the manufacturer of the codec, e.g. ‘VXYZ’. The vendor field gives information about the vendor whose codec is used to create the encoded data. It is an informative field, which may be used by the decoding end. If a manufacturer already has a four-character code, it is recommended that it uses the same code in this field. Otherwise, it is recommended that the manufacturer creates a four-character code which best represents the manufacturer’s name. It can be safely ignored.

decoder_version: version of the vendor’s decoder which can decode the encoded stream in the best (i.e. optimal) way. This field is closely tied to the vendor field. It may benefit a vendor that has optimal encoder-decoder version pairs. The value is set to 0 if the decoder version is of no importance to the vendor. It can be safely ignored.

H263_Level and H263_Profile: these two parameters define which H.263 profile and level are used. They are based on the MIME media type video/H263-2000. The profile and level specifications can be found in [9].

EXAMPLE 1: H.263 Baseline = {H263_Level = 10, H263_Profile = 0}

EXAMPLE 2: H.263 Profile 3 @ Level 10 = {H263_Level = 10, H263_Profile = 3}

NOTE: The "hinter", for the creation of the hint tracks, can use the information given by the H263DecSpecStruc members.
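The H263SpecificBox of Table 6.7 could be serialized as in the following Python sketch, which writes H263DecSpecStruc in its declared order (vendor, decoder_version, H263_Level, H263_Profile) and appends an optional, already-encoded BitrateBox. Big-endian byte order is assumed as in the ISO base media file format, and the function name is hypothetical.

```python
import struct


def build_h263_specific_box(vendor: bytes, decoder_version: int, h263_level: int,
                            h263_profile: int, bitrate_box: bytes = b"") -> bytes:
    """Serialize an H263SpecificBox ('d263'); bitrate_box is an optional 'bitr' box."""
    if len(vendor) != 4:
        raise ValueError("vendor must be a four-character code")
    # H263DecSpecStruc: vendor (4 octets), decoder_version, H263_Level, H263_Profile.
    body = struct.pack(">4sBBB", vendor, decoder_version, h263_level, h263_profile)
    payload = body + bitrate_box
    return struct.pack(">I4s", 8 + len(payload), b"d263") + payload
```

With EXAMPLE 1 (Baseline, Level 10) and no BitrateBox, build_h263_specific_box(b"VXYZ", 0, 10, 0) produces a 15-octet box.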

The BitrateBox field shall be as defined in table 6.8. The BitrateBox may be included if the 3GP file contains H.263 media.

The BitrateBox is composed of the following fields.

Table 6.8: The BitrateBox fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘bitr’ |
| DecBitrateInfo | DecBitrStruc | Structure which holds the bitrate information | |

BoxHeader Size and Type: indicate the size and type of the bitrate box. The type must be ‘bitr’.

DecBitrateInfo: This is the structure where the stream bitrate information resides.

DecBitrStruc is defined as follows:

struct DecBitrStruc{
Unsigned int (32) Avg_Bitrate
Unsigned int (32) Max_Bitrate
}

The definitions of DecBitrStruc members are as follows:

Avg_Bitrate: the average bitrate in bits per second of this elementary stream. For streams with variable bitrate this value shall be set to zero.

Max_Bitrate: the maximum bitrate in bits per second of this elementary stream in any time window of one second duration.
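One way to derive Max_Bitrate is to slide a one-second window over the samples and keep the largest number of bits inside it. The Python sketch below assumes that each sample is counted at its decoding time, expressed in media timescale ticks, and that the samples are sorted by that time; it only needs to examine windows ending at a sample time, since those contain the maximum. It also packs the values into a BitrateBox per Table 6.8. Both helper names are hypothetical.

```python
from __future__ import annotations

import struct


def max_bitrate(samples: list[tuple[int, int]], timescale: int) -> int:
    """Largest number of bits in any one-second window (t - 1 s, t], in bit/s.

    samples: (decoding time in timescale ticks, sample size in octets),
    sorted by decoding time.
    """
    best = total = start = 0
    for t_end, size in samples:
        total += size
        # Drop samples that fall out of the one-second window ending at t_end.
        while samples[start][0] <= t_end - timescale:
            total -= samples[start][1]
            start += 1
        best = max(best, total)
    return best * 8  # octets in a one-second window -> bits per second


def build_bitr(avg_bitrate: int, max_bitrate_bps: int) -> bytes:
    """BitrateBox ('bitr'): 8-octet header, then Avg_Bitrate and Max_Bitrate."""
    return struct.pack(">I4sII", 16, b"bitr", avg_bitrate, max_bitrate_bps)
```

For a variable-bitrate stream, Avg_Bitrate is 0 as stated above, so a writer could call build_bitr(0, max_bitrate(samples, timescale)).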

6.9 AMRWPSampleEntry box

The box type of the AMRWPSampleEntry Box shall be ‘sawp’.

The AMRWPSampleEntry Box is defined as follows:

AMRWPSampleEntry ::= BoxHeader
Reserved_6
Data-reference-index
Reserved_8
Reserved_2
Reserved_2
Reserved_4
TimeScale
Reserved_2
AMRWPSpecificBox

Table 6.9: AMRWPSampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘sawp’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| Reserved_2 | Const unsigned int(16) | | 2 |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| Sampling rate | Unsigned int(16) | See note 3. | |
| Reserved_2 | Const unsigned int(16) | | 0 |
| AMRWPSpecificBox | | Information specific to the AMR-WB+ decoder. | |

Compared with the MP4AudioSampleEntry Box, the main difference in the AMRWPSampleEntry Box is that the ESDBox, which is specific to MPEG-4 systems, is replaced with a box suitable for AMR-WB+. The AMRWPSpecificBox field structure is described in clause 6.10.

NOTE 1: In order to maintain backward compatibility with Releases 4 and 5, the AMRWPSampleEntry should not be used for AMR-WB+ streams that contain only AMR-WB modes. Such streams should be stored as AMR-WB, i.e. by using the AMRSampleEntry with box type ‘sawb’ defined in clause 6.5 and the storage format for single channel header of Annex E [15], without the AMR magic numbers. This way, file readers of previous releases will always be able to read AMR-WB streams stored in 3GP files.

NOTE 2: In order to enhance interoperability in Release 6, file readers capable of parsing tracks with AMR-WB+ should also be capable of parsing AMR-WB tracks (see note 1).

NOTE 3: The timescale of AMR-WB+ is fixed to 72 kHz to accommodate the internal sampling rate, which may vary over time. The sampling rate field of the AMRWPSampleEntry is therefore not coupled to the timescale; it contains the recommended playback sampling rate.

6.10 AMRWPSpecificBox field for AMRWPSampleEntry box

The AMRWPSpecificBox fields for AMR-WB+ shall be as defined in table 6.10. The AMRWPSpecificBox for the AMRWPSampleEntry Box shall always be included if the 3GP file contains AMR-WB+ media.

Table 6.10: The AMRWPSpecificBox fields for AMRWPSampleEntry

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘dawp’ |
| DecSpecificInfo | AMRWPDecSpecStruc | Structure which holds the AMR-WB+ specific information | |

BoxHeader Size and Type: indicate the size and type of the AMR-WB+ decoder-specific box. The type must be ‘dawp’.

DecSpecificInfo: the structure where the AMR-WB+ stream specific information resides.

The AMRWPDecSpecStruc is defined as follows:

struct AMRWPDecSpecStruc{
Unsigned int (32) vendor
Unsigned int (8) decoder_version
}

The definitions of AMRWPDecSpecStruc members are as follows:

vendor: four-character code of the manufacturer of the codec, e.g. ‘VXYZ’. The vendor field gives information about the vendor whose codec is used to create the encoded data. It is an informative field, which may be used by the decoding end. If a manufacturer already has a four-character code, it is recommended that it uses the same code in this field. Otherwise, it is recommended that the manufacturer creates a four-character code which best represents the manufacturer’s name. It can be safely ignored.

decoder_version: version of the vendor’s decoder which can decode the encoded stream in the best (i.e. optimal) way. This field is closely tied to the vendor field. It may benefit a vendor that has optimal encoder-decoder version pairs. The value is set to 0 if the decoder version is of no importance to the vendor. It can be safely ignored.

NOTE: For AMR and AMR-WB the AMRSpecificBox defines the number of frames that are stored in a sample. For AMR-WB+, however, the AMRWPSpecificBox does not specify an overall sample structure, as the number of storage units per sample may differ from sample to sample.

6.11 CVOSampleEntry box

The box type of the CVOSampleEntry Box shall be ‘3gvo’.

The CVOSampleEntry Box is defined as follows:

CVOSampleEntry ::= BoxHeader
Reserved_6
Data-reference-index
Granularity

Table 6.11: CVOSampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘3gvo’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |
| Granularity | Unsigned int(8) | Granularity used in CVO rotation as defined in TS 26.114 [50] | 2 for CVO; 6 for high granularity CVO as defined in TS 26.114 [50] |
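A CVOSampleEntry following Table 6.11 is only 17 octets long; the Python sketch below serializes one and rejects Granularity values other than 2 and 6. Big-endian byte order is assumed as in the ISO base media file format, and the function name is hypothetical.

```python
import struct


def build_cvo_sample_entry(data_reference_index: int, granularity: int) -> bytes:
    """Serialize a CVOSampleEntry ('3gvo') following Table 6.11."""
    if granularity not in (2, 6):
        raise ValueError("Granularity is 2 (CVO) or 6 (high granularity CVO)")
    # Reserved_6 (zero padding), Data-reference-index, Granularity.
    body = struct.pack(">6xHB", data_reference_index, granularity)
    return struct.pack(">I4s", 8 + len(body), b"3gvo") + body
```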

6.12 LocationSampleEntry box

The box type of the LocationSampleEntry Box shall be ‘3glo’.

The LocationSampleEntry Box is defined as follows:

LocationSampleEntry ::= BoxHeader
Reserved_6
Data-reference-index

Table 6.12: LocationSampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘3glo’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |

6.13 OrientationSampleEntry box

The box type of the OrientationSampleEntry Box shall be ‘3gor’.

The OrientationSampleEntry Box is defined as follows:

OrientationSampleEntry ::= BoxHeader
Reserved_6
Data-reference-index

Table 6.13: OrientationSampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘3gor’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |

6.14 EVSSampleEntry box

The box type of the EVSSampleEntry Box shall be ‘sevs’.

The EVSSampleEntry Box is defined as follows:

EVSSampleEntry ::= BoxHeader
Reserved_6
Data-reference-index
Reserved_8
channelcount
Reserved_2
Reserved_4
TimeScale
Reserved_2

Table 6.14: EVSSampleEntry fields

| Field | Type | Details | Value |
| --- | --- | --- | --- |
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘sevs’ |
| Reserved_6 | Unsigned int(8) [6] | | 0 |
| Data-reference-index | Unsigned int(16) | Index to a data reference used to retrieve the sample data. Data references are stored in data reference boxes. | |
| Reserved_8 | Const unsigned int(32) [2] | | 0 |
| channelcount | Unsigned int(16) | Number of mono channels present in this media. | |
| Reserved_2 | Const unsigned int(16) | | 16 |
| Reserved_4 | Const unsigned int(32) | | 0 |
| TimeScale | Unsigned int(16) | Sample rate of the media according to the maximum encoded bandwidth, e.g. 32000 for SWB. Set to 48000 if unknown. | One of the values 8000, 16000, 32000 or 48000 |
| Reserved_2 | Const unsigned int(16) | | 0 |
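A writer might serialize an EVSSampleEntry as in the Python sketch below, which follows the field order of the EVSSampleEntry definition in clause 6.14, restricts TimeScale to the four permitted values and defaults it to 48000 for the case where the maximum encoded bandwidth is unknown. Big-endian byte order is assumed as in the ISO base media file format, and the function name is hypothetical. Unlike AMRSampleEntry, the EVSSampleEntry carries no codec-specific child box.

```python
import struct

EVS_TIMESCALES = (8000, 16000, 32000, 48000)


def build_evs_sample_entry(data_reference_index: int, channelcount: int,
                           timescale: int = 48000) -> bytes:
    """Serialize an EVSSampleEntry ('sevs') following clause 6.14."""
    if timescale not in EVS_TIMESCALES:
        raise ValueError(f"TimeScale must be one of {EVS_TIMESCALES}")
    if not 1 <= channelcount <= 0xFFFF:
        raise ValueError("channelcount must be a positive 16-bit value")
    body = struct.pack(
        ">6xH8xHH4xHH",        # 6x = Reserved_6, 8x = Reserved_8, 4x = Reserved_4
        data_reference_index,  # Data-reference-index
        channelcount,          # number of mono channels (N ToC entries per sample)
        16,                    # Reserved_2 = 16
        timescale,             # TimeScale
        0,                     # Reserved_2 = 0
    )
    return struct.pack(">I4s", 8 + len(body), b"sevs") + body
```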