7 Streaming-server extensions

3GPP TS 26.244, Release 17: Transparent end-to-end Packet-switched Streaming Service (PSS); 3GPP file format (3GP)

7.1 General

This clause defines extensions to 3GP files to be used by streaming servers. The extensions enable a PSS server to relate different tracks and use them for selection and adaptation. In particular, they enable a PSS server to

– generate SDP descriptions with alternatives, as specified in subclauses 5.3.3.3 – 5.3.3.4 of [3];

– select and combine tracks with alternative encodings of media before a presentation;

– switch between tracks with alternative encodings during a streaming session;

– determine the decoding order, playout timestamp, and size for any ADU in an RTP payload.

In addition, the streaming server extensions enable a PSS server to

– use SRTP hint tracks for integrity protection.

The streaming-server extensions are intended to be used with hint tracks, although their use is not limited to hint tracks. Hint tracks are defined in the ISO base media file format [7] and provide (RTP) packetization instructions for media stored in a file.

NOTE: The present document defines syntax and semantics for streaming-server extensions in 3GP files. It does not define protocols for, e.g., how a PSS server signals alternative encodings or switches between different bitrate encodings. All protocols used by a PSS server are defined in [3].

7.2 Groupings of alternative tracks

By default all enabled tracks in a 3GP file are streamed (played) simultaneously. However, the ISO base media file format [7] specifies that tracks that are alternatives to each other can be grouped into an alternate group. Tracks in an alternate group that can be used for switching can be further grouped into a switch group, as defined here.

7.2.1 Alternate group

An alternate group is identified by an integer, alternate_group, in the Track Header box of each track. If this integer is 0 (default value), there is no information on possible relations to other tracks. If this integer is not 0, it should be the same for tracks that contain alternate data for one another and different for tracks belonging to different such groups. Only one track within an alternate group should be streamed or played at any time, and that track must be distinguishable from the other tracks in the group via attributes such as bitrate, codec, language, or packet size.

7.2.2 Switch group

A switch group is identified by an integer, switch_group, in the Track Selection box of each track, as defined below. If this box is absent or if this integer is 0 (default value), there is no information on whether the track can be used for switching during streaming or playing. If this integer is not 0, it shall be the same for tracks that can be used for switching between each other. Tracks that belong to the same switch group shall belong to the same alternate group.
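The constraint that a switch group must lie within a single alternate group can be checked mechanically. The following is a minimal sketch, assuming the track header and Track Selection box values have already been read into plain dictionaries; the function name and the dictionary keys are hypothetical and not part of the file format. How to treat a track with a non-zero switch_group but alternate_group 0 is not covered here.

```python
from collections import defaultdict

def check_switch_groups(tracks):
    """Return the switch_group values whose tracks span several alternate groups.

    `tracks` is a hypothetical list of dicts with the keys 'track_id',
    'alternate_group' and 'switch_group'; a switch_group of 0 means
    "no information" and is skipped.
    """
    alt_groups_by_switch = defaultdict(set)
    for track in tracks:
        if track["switch_group"] != 0:
            alt_groups_by_switch[track["switch_group"]].add(track["alternate_group"])
    # A switch group is inconsistent if its tracks carry different alternate_group values.
    return sorted(sg for sg, alts in alt_groups_by_switch.items() if len(alts) > 1)

tracks = [
    {"track_id": 1, "alternate_group": 1, "switch_group": 1},
    {"track_id": 2, "alternate_group": 1, "switch_group": 1},
    {"track_id": 3, "alternate_group": 1, "switch_group": 2},
    {"track_id": 4, "alternate_group": 2, "switch_group": 2},
]
print(check_switch_groups(tracks))  # [2]
```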

7.3 Track Selection box

This subclause defines an optional box that aids the selection between tracks. It is used to encode switch groups and the criteria that should be used to differentiate tracks within alternate and switch groups.

The Track Selection box is defined in table 7.1. It is contained in the User data box of the track it modifies.

Note that the Track Selection box is also defined in [7], with a slightly different set of defined attributes. One difference is that the present document defines the attribute "Language", identified by ‘lang’, whereas [7] defines the attribute "Media language", identified by ‘mela’.

Table 7.1: Track Selection box fields

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘tsel’ |
| BoxHeader.Version | Unsigned int(8) | | 0 |
| BoxHeader.Flags | Bit(24) | | 0 |
| SwitchGroup | int(32) | Switch group of track. | 0 (default) |
| AttributeList | Unsigned int(32) [N] | List of N attributes to the end of the box. | |

BoxHeader Size, Type, Version and Flags: indicate the size, type, version and flags of the Track Selection box. The type shall be ‘tsel’ and the version shall be 0. No flags are defined.

SwitchGroup: indicates switch group as defined in clause 7.2.2. It shall be 0 if the track is not intended for switching.

AttributeList: is a list of attributes to the end of the box. The attributes in this list should be used as differentiation criteria for tracks in the same alternate or switch group. Each attribute is associated with a pointer to the field or information that distinguishes the track. Attributes and pointers are listed in table 7.2.

Table 7.2: Attributes for AttributeList of the Track Selection box

| Name | Attribute | Pointer |
|---|---|---|
| Language | ‘lang’ | Value of grouping type LANG of "alt-group" attribute in session-level SDP (defined in clause 5.3.3.4 of [3]) |
| Bandwidth | ‘bwas’ | Value of "b=AS" attribute in media-level SDP |
| Codec | ‘cdec’ | SampleEntry (in Sample Description box of media track) |
| Screen size | ‘scsz’ | Width and height fields of MP4VisualSampleEntry and H263SampleEntry (in media track) |
| Max packet size | ‘mpsz’ | Maxpacketsize field in RTPHintSampleEntry |
| Media type | ‘mtyp’ | Handlertype in Handler box (of media track) |
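The layout in table 7.1 can be read with a few fixed-offset unpacks. The sketch below is illustrative only: it assumes the whole box is already in memory, that the ordinary 32-bit size field is used (no extended size), and that all fields are big-endian as elsewhere in the ISO base media file format. The function name parse_tsel is hypothetical.

```python
import struct

def parse_tsel(box: bytes):
    """Parse a Track Selection box; return (switch_group, attribute 4ccs)."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"tsel":
        raise ValueError("not a Track Selection box")
    if size < 16 or size > len(box):
        raise ValueError("unexpected box size")
    version = box[8]                      # bytes 9..11 hold the 24-bit flags
    if version != 0:
        raise ValueError("unsupported version")
    (switch_group,) = struct.unpack_from(">i", box, 12)
    payload = box[16:size]                # AttributeList runs to the end of the box
    if len(payload) % 4:
        raise ValueError("attribute list is not a multiple of 4 bytes")
    attributes = [payload[i:i + 4].decode("ascii") for i in range(0, len(payload), 4)]
    return switch_group, attributes

# Hand-built example box: switch group 1, differentiated by bandwidth and codec.
box = struct.pack(">I4sB3si", 24, b"tsel", 0, b"\x00\x00\x00", 1) + b"bwas" + b"cdec"
print(parse_tsel(box))  # (1, ['bwas', 'cdec'])
```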

7.4 Combining alternative tracks

Tracks from different alternate groups are streamed (played) simultaneously. However, not all combinations of tracks form suitable presentations. To suggest suitable combinations of tracks and to reduce the number of possible combinations, a content provider can encode preferred combinations of alternative tracks in a 3GP file. Such combinations are encoded by the "alt-group" attribute in the session-level SDP fragment, as described in clause 7.5.3.

If information on suitable combinations of tracks is missing, tracks with the lowest track IDs of each alternate group should be streamed (played) by default.
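The default rule above can be sketched as follows. This is an illustration under stated assumptions: tracks are given as (track ID, alternate_group) pairs, and tracks with alternate_group 0 are kept because enabled tracks are streamed simultaneously by default. The function name is hypothetical.

```python
def default_tracks(tracks):
    """Pick the track with the lowest track ID from each alternate group.

    `tracks` is a list of (track_id, alternate_group) pairs. Tracks with
    alternate_group 0 have no known alternatives and are all kept.
    """
    lowest_per_group = {}
    ungrouped = []
    for track_id, alt_group in tracks:
        if alt_group == 0:
            ungrouped.append(track_id)
        elif alt_group not in lowest_per_group or track_id < lowest_per_group[alt_group]:
            lowest_per_group[alt_group] = track_id
    return sorted(ungrouped + list(lowest_per_group.values()))

print(default_tracks([(1, 1), (2, 1), (3, 2), (4, 2), (5, 0)]))  # [1, 3, 5]
```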

7.5 SDP

7.5.1 Session- and media-level SDP

Fragments that together constitute an SDP description shall be contained in a 3GP file with streaming-server extensions. Session-level SDP, i.e. all lines before the first media-specific line ("m=" line), shall be stored as Movie SDP information within the User Data box, as specified in [7]. Media-level SDP, i.e. an "m=" line and the lines before the next "m=" line (or end of SDP) shall be stored as Track SDP information within the User data box of the corresponding track. Media-level SDP shall be contained in hint tracks (if provided).
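The split between session-level and media-level SDP described above follows directly from the position of the "m=" lines. A minimal sketch, assuming the SDP description is available as one text string; the function name and the sample SDP are hypothetical:

```python
def split_sdp(sdp: str):
    """Split an SDP description into session-level lines and media-level fragments."""
    session, media = [], []
    for line in sdp.splitlines():
        if not line.strip():
            continue
        if line.startswith("m="):
            media.append([line])          # a new media-level fragment starts here
        elif media:
            media[-1].append(line)        # belongs to the most recent "m=" line
        else:
            session.append(line)          # everything before the first "m=" line
    return session, media

example = "\n".join([
    "v=0",
    "s=Example",
    "t=0 0",
    "m=audio 0 RTP/AVP 97",
    "a=rtpmap:97 AMR/8000",
    "m=video 0 RTP/AVP 98",
    "a=rtpmap:98 H264/90000",
])
session, media = split_sdp(example)
print(len(session), len(media))  # 3 2
```

In a 3GP file, the first list would correspond to the Movie SDP information and each fragment in the second list to the Track SDP information of one track.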

7.5.2 Stored versus generated SDP fields

The SDP information stored in a 3GP file should be as complete as possible, although some fields must be generated or modified by the server when a presentation is composed. Table 7.3 gives an overview of the SDP fields used by PSS (cf. Table A.1 in [3]) and indicates whether they are required to be included in 3GP files or whether the server is required to generate them.

Table 7.3: Overview of stored and generated fields in SDP

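| Type | Description | Contained in 3GP file | Generated by PSS server |
|---|---|---|---|
| Session Description | | | |
| V | Protocol version | R | O |
| O | Owner/creator and session identifier | O | R |
| S | Session Name | R | O |
| I | Session information | O | O |
| U | URI of description | O | O |
| E | Email address | O | O |
| P | Phone number | O | O |
| C | Connection Information | O | R |
| B | Bandwidth information: AS | O | O (see note 7) |
| B | Bandwidth information: RS | O | O |
| B | Bandwidth information: RR | O | O |
| B | Bandwidth information: TIAS | O | O |
| | One or more Time Descriptions (See below) | | |
| Z | Time zone adjustments | O | O |
| K | Encryption key | O | O |
| A | Session attributes: control | O | R |
| A | Session attributes: range | R | O |
| A | Session attributes: alt-group | R (see note 4) | O |
| A | Session attributes: QoE-Metrics | O | O |
| A | Session attributes: 3GPP-Asset-Information | O | O |
| A | Session attributes: 3GPP-Integrity-Key | N | R (see note 6) |
| A | Session attributes: 3GPP-SDP-Auth | N | R (see note 6) |
| A | Session attributes: maxprate | O | O |
| | One or more Media Descriptions (See below) | | |
| Time Description | | | |
| T | Time the session is active | R | O |
| R | Repeat times | O | O |
| Media Description | | | |
| M | Media name and transport address | R | O |
| I | Media title | O | O |
| C | Connection information | O | R |
| B | Bandwidth information: AS | R | O (see note 7) |
| B | Bandwidth information: RS | O | R |
| B | Bandwidth information: RR | O | R |
| B | Bandwidth information: TIAS | R | O |
| K | Encryption Key | O | O |
| A | Attribute Lines: control | O | R |
| A | Attribute Lines: range | R | O |
| A | Attribute Lines: fmtp | R | O |
| A | Attribute Lines: rtpmap | R | O |
| A | Attribute Lines: X-predecbufsize | R (see note 5) | O |
| A | Attribute Lines: X-initpredecbufperiod | R (see note 5) | O |
| A | Attribute Lines: X-initpostdecbufperiod | R (see note 5) | O |
| A | Attribute Lines: X-decbyterate | R (see note 5) | O |
| A | Attribute Lines: framesize | R | O |
| A | Attribute Lines: alt | N | R |
| A | Attribute Lines: alt-default-id | N | R |
| A | Attribute Lines: 3GPP-Adaptation-Support | N | O |
| A | Attribute Lines: QoE-Metrics | O | O |
| A | Attribute Lines: 3GPP-Asset-Information | O | O |
| A | Attribute Lines: 3GPP-SRTP-Config | N | R (see note 6) |
| A | Attribute Lines: rtcp-fb | N | R |
| A | Attribute Lines: maxprate | R | O |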

Note 1: Fields in 3GP files are Required (R), Optional (O), or Not allowed (N).

Note 2: Servers are Required (R) to generate (possibly by copying or modifying from file), or have the Option (O) to generate/copy/modify, or are Not allowed (N) to modify fields. If a field is present in a file, it shall be copied or modified, but not omitted, by the server.

Note 3: Some types shall only be included under certain conditions, as specified by PSS [3].

Note 4: The "alt-group" attribute is required to be stored in 3GP files if it is used.

Note 5: The "X-" attributes are required to be stored in 3GP files if they are used. They may either be specified in the PSS Annex G box ‘3gag’ (see Clause 9) or in media-level SDP fragments.

Note 6: The server is required to generate the "3GPP-Integrity-Key", "3GPP-SDP-Auth", and "3GPP-SRTP-Config" attributes if integrity protection is used.

Note 7: The "b=AS" session bandwidth shall include UDP/IP overhead. The value shall be based on IPv4 when stored in a file, but may be modified by the server to accommodate IPv6. The "maxprate" attribute is useful for such a conversion.
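The conversion mentioned in note 7 can be illustrated with a short calculation. This is a sketch under explicit assumptions that are not stated in the present document: the IPv4 header is taken as 20 bytes (no options), the IPv6 header as 40 bytes (no extension headers), "b=AS" is expressed in kilobits per second, "maxprate" in packets per second, and the result is rounded up. The function name is hypothetical.

```python
import math

IPV4_HEADER_BYTES = 20   # assumption: no IPv4 options
IPV6_HEADER_BYTES = 40   # assumption: no IPv6 extension headers

def as_for_ipv6(as_ipv4_kbps: int, maxprate: float) -> int:
    """Estimate a b=AS value for IPv6 from the IPv4-based value stored in the file."""
    extra_bytes_per_packet = IPV6_HEADER_BYTES - IPV4_HEADER_BYTES
    extra_bits_per_second = maxprate * extra_bytes_per_packet * 8
    # Round up so the signalled bandwidth is never below the estimate.
    return as_ipv4_kbps + math.ceil(extra_bits_per_second / 1000)

print(as_for_ipv6(64, 50))  # 72
```

With 50 packets per second, the 20 additional header bytes per packet add 8000 bit/s, so a stored value of 64 becomes 72.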

7.5.3 SDP attributes for alternatives

Clauses 5.3.3.3 and 5.3.3.4 of [3] define SDP attributes that a server can use for presenting options to a client. These attributes can be used to encode suggested groupings of tracks, e.g. for selecting a certain language or target bitrate.

Suggested groupings of tracks from different alternate groups, i.e. groupings of tracks that should be streamed together, are encoded by using the "alt-group" attribute in the session-level SDP. Note that a server may have to prune options from such groupings if certain tracks are not presented to the client.

Media-level SDP fragments shall not contain alternative-media attributes ("alt" and "alt-default-id") as they are difficult to pre-encode. When the server combines several media-level SDP fragments from alternative tracks into one media-level SDP, it must generate the appropriate "alt" and "alt-default-id" attributes. This can be done by using the information provided in the "alt-group" attributes in the session-level SDP.

NOTE 1: Track IDs given by the Track Header boxes shall be used for alternative IDs ("alt-id") in attributes for SDP alternatives.

NOTE 2: Tracks with the lowest track IDs of each alternate group should be used as default tracks, i.e. used with the "alt-default-id" attributes.

7.6 SRTP

Hinted content may require the use of SRTP [19] for streaming, e.g. for integrity protection, by using the hint-track format for SRTP defined here. This format consists of a dedicated sample entry, which will be ignored by 3GP servers not capable of handling SRTP.

SRTP hint tracks are formatted identically to RTP hint tracks defined in [7], except that:

– the sample entry name is changed from ‘rtp ’ to ‘srtp’ to indicate to the server that SRTP is required;

– an extra box is added to the sample entry, which can be used to instruct the server on the nature of the on-the-fly encryption and integrity protection that must be applied.

Samples of an SRTP hint track follow the same syntax for constructing RTP packets as RTP hint tracks.

An SRTP Hint Sample Entry (‘srtp’) shall include an SRTP Process Box (‘srpp’) that may instruct the server as to which SRTP algorithms should be applied. The SRTP Process Box is defined in [7] and is reproduced in Table 7.4 for information.

Table 7.4: SRTPProcessBox

| Field | Type | Details | Value |
|---|---|---|---|
| BoxHeader.Size | Unsigned int(32) | | |
| BoxHeader.Type | Unsigned int(32) | | ‘srpp’ |
| BoxHeader.Version | Unsigned int(8) | | 0 |
| BoxHeader.Flags | Bit(24) | | 0 |
| EncryptionAlgorithmRTP | Unsigned int(32) | 4cc identifying the algorithm | |
| EncryptionAlgorithmRTCP | Unsigned int(32) | 4cc identifying the algorithm | |
| IntegrityAlgorithmRTP | Unsigned int(32) | 4cc identifying the algorithm | |
| IntegrityAlgorithmRTCP | Unsigned int(32) | 4cc identifying the algorithm | |
| SchemeTypeBox | | Box containing the protection scheme. | |
| SchemeInformationBox | | Box containing the scheme information. | |

The SchemeTypeBox and SchemeInformationBox have the syntax defined in Tables 10.7 and 10.8, respectively. They provide the parameters required for applying SRTP. The Scheme Type Box indicates the key management and security policy required for the stream, extending the algorithmic pointers provided by the SRTP Process Box. The key management functionality is also used to establish all the necessary SRTP parameters, as listed in section 8.2 of [19]. The exact definition of protection schemes is outside the scope of the file format.

The algorithms for encryption and integrity protection are defined by SRTP. Table 7.5 summarizes the format identifiers defined here. An entry of four spaces ($20$20$20$20) may be used to indicate that a process outside the file format decides the choice of algorithm for either encryption or integrity protection.

Table 7.5: Algorithms for encryption and integrity protection

| Format | Algorithm |
|---|---|
| $20$20$20$20 | The choice of algorithm for either encryption or integrity protection is decided by a process outside the file format |
| ACM1 | Encryption using AES in Counter Mode with 128-bit key, as defined in Section 4.1.1 of [19] |
| AF81 | Encryption using AES in F8-mode with 128-bit key, as defined in Section 4.1.2 of [19] |
| ENUL | Encryption using the NULL-algorithm as defined in Section 4.1.3 of [19] |
| SHM2 | Integrity protection using HMAC-SHA-1 with 160-bit key, as defined in Section 4.2.1 of [19] |
| ANUL | Integrity protection not applied to RTP (but still applied to RTCP). Note: this is valid only for IntegrityAlgorithmRTP. |
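The four algorithm fields of the SRTP Process Box sit at fixed offsets after the box header, so they can be read and mapped to the identifiers of table 7.5 as sketched below. The sketch assumes big-endian fields and a 32-bit box size, and it ignores the SchemeTypeBox and SchemeInformationBox that follow. The function name and the short descriptions in the dictionary are illustrative, not normative.

```python
import struct

# Short, informal descriptions of the identifiers in table 7.5.
ALGORITHMS = {
    "    ": "decided by a process outside the file format",
    "ACM1": "AES Counter Mode, 128-bit key",
    "AF81": "AES F8-mode, 128-bit key",
    "ENUL": "NULL encryption",
    "SHM2": "HMAC-SHA-1, 160-bit key",
    "ANUL": "no integrity protection for RTP",
}

FIELDS = ("EncryptionAlgorithmRTP", "EncryptionAlgorithmRTCP",
          "IntegrityAlgorithmRTP", "IntegrityAlgorithmRTCP")

def parse_srpp_algorithms(box: bytes):
    """Return a dict mapping each algorithm field of an 'srpp' box to its 4cc."""
    size, box_type, version = struct.unpack_from(">I4sB", box, 0)
    if box_type != b"srpp" or version != 0:
        raise ValueError("not a version 0 SRTP Process Box")
    codes = struct.unpack_from(">4s4s4s4s", box, 12)   # after size, type, version, flags
    result = {name: code.decode("ascii") for name, code in zip(FIELDS, codes)}
    # ANUL is valid only for IntegrityAlgorithmRTP.
    if any(v == "ANUL" for k, v in result.items() if k != "IntegrityAlgorithmRTP"):
        raise ValueError("ANUL is only valid for IntegrityAlgorithmRTP")
    return result

# Hand-built example containing only the fixed fields (no child boxes).
box = struct.pack(">I4sB3s4s4s4s4s", 28, b"srpp", 0, b"\x00\x00\x00",
                  b"ACM1", b"ACM1", b"SHM2", b"SHM2")
for field, code in parse_srpp_algorithms(box).items():
    print(field, code, ALGORITHMS.get(code, "unknown"))
```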

7.7 Aggregated RTP payloads

An application data unit (ADU), normally being the smallest independently usable data unit, is specified as follows for coding formats and RTP payload formats allowed in 3GP files:

– For audio and speech, an ADU is specified as a coded frame intended for transport.

– For H.263 an ADU consists of an entire RTP payload.

– For H.264 (AVC) or H.265 (HEVC), an ADU is a Network Abstraction Layer Unit (NALU).

– For timed text, an ADU consists of any of the type 1-5 RTP payload units [28].

For encrypted RTP payloads, the actual ADUs are hidden within the encrypted payload. Some RTP payload formats allow aggregation of multiple ADUs into a single RTP payload. When any hint sample in an RTP hint track defines a payload including multiple ADUs, each hint sample in the hint track shall comply with the following requirements:

– The extra-flag in the RTPPacket class of the hint sample shall be set to 1. This indicates that there is extra information before the RTP constructors in the form of type-length-value sets.

– The extra information in the hint sample shall include a ‘3gau’ structure as specified below.

class 3gppApplicationDataUnitInfoTLV extends Box(‘3gau’) {
    unsigned int(16) entrycount;
    for (i=1; i<=entrycount; i++) {
        unsigned int(32) numbytes;
        unsigned int(64) decorder;
        unsigned int(32) timestampoffset;
    }
}

entrycount indicates the number of ADUs in the RTP payload.

numbytes indicates the number of bytes of the i’th ADU in the RTP payload.

decorder indicates the decoding order of ADUs within the RTP hint track. The smaller the value of decorder, the earlier the ADU is in decoding order. All ADUs shall have a unique value of decorder, and the assignment shall be done using consecutive numbers. If two or more ADUs can be decoded virtually simultaneously, i.e. their relative decoding order is undefined, they shall still be assigned consecutive numbers.

timestampoffset indicates the RTP timestamp offset of the i’th ADU relative to the timestamp in the RTP header of the packet in which it will be transmitted. The ADU’s own timestamp value is the one it would have had if it were transmitted in an RTP packet containing only that ADU.
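The ‘3gau’ structure can be unpacked entry by entry, after which the per-ADU timestamps follow from the packet timestamp. The sketch below assumes big-endian fields, a plain 8-byte box header (32-bit size and type), and RTP timestamp arithmetic modulo 2^32. The function names are hypothetical.

```python
import struct

def parse_3gau(box: bytes):
    """Return one dict per ADU described in a '3gau' structure."""
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"3gau":
        raise ValueError("not a '3gau' structure")
    (entry_count,) = struct.unpack_from(">H", box, 8)
    entries, offset = [], 10
    for _ in range(entry_count):
        # Each entry is 16 bytes: numbytes (32), decorder (64), timestampoffset (32).
        numbytes, decorder, ts_offset = struct.unpack_from(">IQI", box, offset)
        entries.append({"numbytes": numbytes, "decorder": decorder,
                        "timestampoffset": ts_offset})
        offset += 16
    return entries

def adu_timestamps(packet_timestamp: int, entries):
    """RTP timestamp of each ADU; assumes wrap-around modulo 2**32."""
    return [(packet_timestamp + e["timestampoffset"]) % 2**32 for e in entries]

# Hand-built example: two ADUs of 100 and 120 bytes in one payload.
box = (struct.pack(">I4sH", 42, b"3gau", 2)
       + struct.pack(">IQI", 100, 0, 0)
       + struct.pack(">IQI", 120, 1, 160))
entries = parse_3gau(box)
print(adu_timestamps(1000, entries))  # [1000, 1160]
```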