9 Video buffer information

3GPP TS 26.244, Release 17: Transparent end-to-end Packet-switched Streaming Service (PSS); 3GPP file format (3GP)

9.1 General

A 3GP file can include video-buffer parameters associated with video streams. When only one set of parameters is associated with an entire video stream, the parameters can be included in the corresponding media-level SDP fragment. However, to provide buffer parameters for different operation points, as defined below, and for different synchronization points, a track can contain a video buffer sample grouping. The type of sample grouping depends on which video-buffer model is used for the particular video codec.

For H.264 (AVC) or H.265 (HEVC), there are two types of buffers:

– Hypothetical Reference Decoder (HRD) model;

– For H.264 (AVC) the de-interleaving buffer of the interleaved RTP packetization mode as specified in [30], or for H.265 (HEVC), the de-packetization buffer as specified in [52].

Buffer parameters for several operation points and synchronization points of the HRD model may be specified by a video HRD sample grouping as defined in clause 9.2.2.

Only one set of de-interleaving parameters for H.264 (AVC), and only one set of de-packetization parameters for H.265 (HEVC), can be associated with a stream. The de-interleaving or de-packetization parameters are therefore included in the corresponding media-level SDP fragment according to the H.264 (AVC) MIME/SDP specification in [30] or the H.265 (HEVC) MIME/SDP specification in [52].

NOTE: Any HRD parameters in parameter sets and SEI messages in the bitstream, or included in the MIME/SDP parameters of a media-level SDP fragment, must not contradict each other or the information in the video HRD sample grouping, if any.

9.2 Sample groupings for video-buffer parameters

A sample grouping is an assignment of each sample in a track to be a member of one (or none) of several sample groups, based on a grouping criterion. The assignment of buffer parameters to synchronization points (sync samples) provides one sample grouping of the samples in a track. The usage of sample groups in 3GP files shall follow the syntax defined in [7].

Each sample is associated with zero or one sample group entry of any given grouping type in the sample group description box (‘sgpd’). Sample group entries for sample groups defined by the grouping type ‘3gag’ are given by the 3GPP PSS Annex G Sample group entry, defined in Table 9.1, and sample group entries for sample groups defined by the grouping type ‘avcb’ are given by the video HRD Sample group entry, defined in Table 9.2.

Sample group entries provide buffer parameters relevant to all samples in the corresponding sample group(s). A sync sample and all following non-sync samples before the next sync sample shall be members of the same sample group with respect to the video-buffer grouping type. The indicated buffer parameters for a sync sample are applicable for the stream from that sync sample onwards.
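The membership rule above can be checked mechanically. The following is a minimal sketch, not part of the file format: the function name and the two input lists are hypothetical, and it assumes that a group index of 0 means "member of no group of this grouping type", as in the sample-to-group box of [7]. Samples that precede the first sync sample are left unconstrained here, which is also an assumption.

```python
from typing import Sequence


def follows_sync_grouping_rule(is_sync: Sequence[bool],
                               group_index: Sequence[int]) -> bool:
    """Return True if every non-sync sample shares the group of the
    most recent preceding sync sample.

    is_sync[i]     -- True if sample i is a sync sample.
    group_index[i] -- group index of sample i for one grouping type
                      (assumed: 0 means the sample is in no group).
    """
    if len(is_sync) != len(group_index):
        raise ValueError("one sync flag and one group index per sample expected")
    current = None  # group of the most recent sync sample, if any
    for sync, group in zip(is_sync, group_index):
        if sync:
            current = group          # a sync sample may start a new group
        elif current is not None and group != current:
            return False             # non-sync sample left its sync sample's group
    return True
```

For example, the assignment `[1, 1, 1, 2, 2]` over the sync pattern `[True, False, False, True, False]` satisfies the rule, whereas a group change on a non-sync sample does not.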

NOTE: A file in which some but not all samples are associated with sample groups with respect to the grouping type ‘3gag’ or ‘avcb’ may have been edited and may therefore no longer conform to the corresponding buffer model.

9.2.1 3GPP PSS Annex G sample grouping

The grouping type ‘3gag’ defines the grouping criterion for 3GPP PSS Annex G buffer parameters. Zero or one sample-to-group box (‘sbgp’) for the grouping type ‘3gag’ can be contained in the sample table box (‘stbl’) of a track. It shall reside in a hint track, if a hint track is used, otherwise in the video track. The presence of this box and grouping type indicates that the associated video stream complies with PSS Annex G. Note that the nature of the track defines the media transport for which the buffer parameters are calculated, e.g. for an RTP hint track, the media transport is RTP.

Table 9.1: 3GPP PSS Annex G sample group entry

| Field | Type | Details | Value |
|---|---|---|---|
| BufferParameters | AnnexGstruc | Structure which holds the buffer parameters of PSS Annex G | |

BufferParameters: the structure where the PSS Annex G buffer parameters reside.

AnnexGstruc is defined as follows:

struct AnnexGstruc {
    Unsigned int(16) operation_point_count
    for (i = 0; i < operation_point_count; i++) {
        Unsigned int(32) tx_byte_rate
        Unsigned int(32) dec_byte_rate
        Unsigned int(32) pre_dec_buf_size
        Unsigned int(32) init_pre_dec_buf_period
        Unsigned int(32) init_post_dec_buf_period
    }
}
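As an illustration of the layout above, the following sketch reads an AnnexGstruc payload into one record per operation point. It is not a normative parser: the names `AnnexGOperationPoint` and `parse_annex_g_struct` are hypothetical, the fields are assumed to be stored big-endian as is usual for the file format in [7], and the input is assumed to be the bare structure with no surrounding box headers.

```python
import struct
from typing import List, NamedTuple


class AnnexGOperationPoint(NamedTuple):
    tx_byte_rate: int              # bytes per second
    dec_byte_rate: int             # bytes per second
    pre_dec_buf_size: int          # bytes
    init_pre_dec_buf_period: int   # 90 kHz clock ticks
    init_post_dec_buf_period: int  # 90 kHz clock ticks


def parse_annex_g_struct(payload: bytes) -> List[AnnexGOperationPoint]:
    """Parse a bare AnnexGstruc (assumed big-endian) into operation points."""
    if len(payload) < 2:
        raise ValueError("payload too short for operation_point_count")
    (count,) = struct.unpack_from(">H", payload, 0)
    if count == 0:
        raise ValueError("operation_point_count shall be greater than 0")
    entry = struct.Struct(">5I")  # five unsigned 32-bit fields per point
    if len(payload) < 2 + count * entry.size:
        raise ValueError("payload too short for the declared operation points")
    return [
        AnnexGOperationPoint(*entry.unpack_from(payload, 2 + i * entry.size))
        for i in range(count)
    ]
```

A structure with a single operation point occupies 2 + 5 × 4 = 22 bytes under these assumptions.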

The definitions of the AnnexGstruc members are as follows:

operation_point_count: specifies the number of operation points, each characterized by a pair of transmission byte rate and decoding byte rate. Values of buffering parameters are specified separately for each operation point. The value of operation_point_count shall be greater than 0.

tx_byte_rate: indicates the transmission byte rate (in bytes per second) that is used to calculate the transmission timestamps of media-transport packets for the PSS Annex G buffering verifier as follows. Let t1 be the transmission time of the previous media-transport packet and size1 be the number of bytes in the payload of the previous media-transport packet in transmission order, excluding the media-transport payload header and any lower-layer headers. For the first media-transport packet of the stream, t1 and size1 are equal to 0. The media track shall comply with PSS Annex G when each sample is packetized in one media-transport packet, the transmission order of media-transport packets is the same as their decoding order, and the transmission time of a media-transport packet is equal to t1 + size1 / tx_byte_rate. The value of tx_byte_rate shall be greater than 0.
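The transmission-time rule can be sketched as a simple recurrence. The function below is illustrative only: its name and arguments are hypothetical, it assumes one sample per media-transport packet in decoding order, and it uses exact fractions so that no rounding is introduced.

```python
from fractions import Fraction
from typing import List, Sequence


def transmission_times(payload_sizes: Sequence[int],
                       tx_byte_rate: int) -> List[Fraction]:
    """Return the transmission time in seconds of each packet.

    payload_sizes -- payload bytes per packet, in transmission order,
                     excluding payload and lower-layer headers.
    """
    if tx_byte_rate <= 0:
        raise ValueError("tx_byte_rate shall be greater than 0")
    times: List[Fraction] = []
    t1, size1 = Fraction(0), 0       # both are 0 for the first packet
    for size in payload_sizes:
        t = t1 + Fraction(size1, tx_byte_rate)
        times.append(t)
        t1, size1 = t, size          # current packet becomes the previous one
    return times
```

With payloads of 1000, 500 and 2000 bytes and a tx_byte_rate of 1000, the packets are scheduled at 0 s, 1 s and 1.5 s: each packet waits for the previous payload to drain at the transmission rate.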

dec_byte_rate: indicates the peak decoding byte rate that was used in this operation point to verify the compatibility of the stream with PSS Annex G. Values are given in bytes per second. The value of dec_byte_rate shall be greater than 0.

pre_dec_buf_size: indicates the size of the PSS Annex G hypothetical pre-decoder buffer in bytes that guarantees pauseless playback of the entire stream under the assumptions of PSS Annex G.

init_pre_dec_buf_period: indicates the required initial pre-decoder buffering period that guarantees pauseless playback of the entire stream under the assumptions of PSS Annex G. Values are interpreted as clock ticks of a 90-kHz clock. That is, the value is incremented by one for each 1/90 000 seconds. For example, the value 180 000 corresponds to a two-second initial pre-decoder buffering period.

init_post_dec_buf_period: indicates the required initial post-decoder buffering period that guarantees pauseless playback of the entire stream under the assumptions of PSS Annex G. Values are interpreted as clock ticks of a 90-kHz clock.
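Both buffering periods use the same 90 kHz tick unit, so a pair of small helpers is enough to convert them. The names below are hypothetical, and rounding to the nearest tick in `seconds_to_ticks` is an assumption made for this sketch rather than something the text prescribes.

```python
from fractions import Fraction

TICKS_PER_SECOND = 90_000  # 90 kHz clock


def ticks_to_seconds(ticks: int) -> Fraction:
    """Convert a buffering period in 90 kHz ticks to exact seconds."""
    return Fraction(ticks, TICKS_PER_SECOND)


def seconds_to_ticks(seconds) -> int:
    """Convert seconds to 90 kHz ticks, rounded to the nearest tick (assumed)."""
    return round(Fraction(seconds) * TICKS_PER_SECOND)
```

For instance, `ticks_to_seconds(180_000)` yields 2 seconds, matching the example given for init_pre_dec_buf_period.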

9.2.2 Video HRD sample grouping

The grouping type ‘avcb’ defines the grouping criterion for video HRD parameters. Zero or one sample-to-group box (‘sbgp’) for the grouping type ‘avcb’ can be contained in the sample table box (‘stbl’) of a track. It shall reside either in a hint track or a video track for H.264 (AVC) or H.265 (HEVC). The presence of this box and grouping type indicates that the associated video stream complies with the HRD with the indicated parameters.

Table 9.2: Video HRD sample group entry

| Field | Type | Details | Value |
|---|---|---|---|
| AVCHRDParameters | AVCHRDstruc | Structure which holds the video HRD parameters | |

AVCHRDParameters: the structure where the video HRD parameters reside.

AVCHRDstruc is defined as follows:

struct AVCHRDstruc {
    Unsigned int(16) operation_point_count
    for (i = 0; i < operation_point_count; i++) {
        Unsigned int(32) tx_byte_rate
        Unsigned int(32) pre_dec_buf_size
        Unsigned int(32) post_dec_buf_size
        Unsigned int(32) init_pre_dec_buf_period
        Unsigned int(32) init_post_dec_buf_period
    }
}
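To show the writing side of this layout, the following sketch serializes a list of operation points into an AVCHRDstruc payload. The function name and the tuple ordering are hypothetical, big-endian storage is assumed as in [7], and only the bare structure is produced, without any enclosing box.

```python
import struct
from typing import Iterable, Tuple

# (tx_byte_rate, pre_dec_buf_size, post_dec_buf_size,
#  init_pre_dec_buf_period, init_post_dec_buf_period)
HrdPoint = Tuple[int, int, int, int, int]


def pack_avc_hrd_struct(points: Iterable[HrdPoint]) -> bytes:
    """Serialize operation points as a bare AVCHRDstruc (assumed big-endian)."""
    points = list(points)
    if not 0 < len(points) <= 0xFFFF:
        raise ValueError("operation_point_count shall be in the range 1..65535")
    out = bytearray(struct.pack(">H", len(points)))
    for point in points:
        if point[0] <= 0:
            raise ValueError("tx_byte_rate shall be greater than 0")
        out += struct.pack(">5I", *point)  # five unsigned 32-bit fields
    return bytes(out)
```

The sketch rejects an empty list and a zero tx_byte_rate because both values are required to be greater than 0; other fields are written as given.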

The definitions of the AVCHRDstruc members are as follows:

operation_point_count: specifies the number of operation points. Values of the HRD parameters are specified separately for each operation point. The value of operation_point_count shall be greater than 0.

tx_byte_rate: indicates the input byte rate (in bytes per second) to the coded picture buffer (CPB) of the HRD.

For H.264 (AVC), the bitstream is constrained by the value of BitRate equal to 8 * the value of tx_byte_rate for NAL HRD parameters as specified in [29]. For VCL HRD parameters, the value of BitRate is equal to tx_byte_rate * 40 / 6.

For H.265 (HEVC), the bitstream is constrained such that, for the NAL HRD parameters, the value of BitRate[ i ] for at least one value of i in the range of 0 to cpb_cnt_minus1[ HighestTid ], inclusive, as specified in [51], is less than or equal to 8 * tx_byte_rate, and for the VCL HRD parameters, the value of BitRate[ i ] for at least one value of i in the range of 0 to cpb_cnt_minus1[ HighestTid ], inclusive, is less than or equal to tx_byte_rate * 80 / 11.

The value of tx_byte_rate shall be greater than 0.
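The scaling between tx_byte_rate and BitRate can be written out as follows. This is a sketch with hypothetical function names. For H.264 (AVC) it returns the BitRate value implied by tx_byte_rate; for H.265 (HEVC) it returns the upper bound that at least one BitRate[ i ] has to meet. Exact fractions are used because 40 / 6 and 80 / 11 are not integers.

```python
from fractions import Fraction


def avc_bit_rate(tx_byte_rate: int, vcl: bool = False) -> Fraction:
    """BitRate in bits per second implied by tx_byte_rate for H.264 (AVC)."""
    factor = Fraction(40, 6) if vcl else Fraction(8)
    return tx_byte_rate * factor


def hevc_bit_rate_bound(tx_byte_rate: int, vcl: bool = False) -> Fraction:
    """Upper bound on BitRate[ i ] for H.265 (HEVC), in bits per second."""
    factor = Fraction(80, 11) if vcl else Fraction(8)
    return tx_byte_rate * factor
```

For example, a tx_byte_rate of 150 000 gives 1 200 000 bit/s for the AVC NAL HRD and 1 000 000 bit/s for the AVC VCL HRD. The VCL factors are smaller than 8, which appears to reflect the different NAL and VCL scale factors used in the level limits of [29] and [51].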

pre_dec_buf_size: gives the required size of the pre-decoder buffer or coded picture buffer in bytes.

For H.264 (AVC), the bitstream is constrained by the value of CpbSize equal to pre_dec_buf_size * 8 for NAL HRD parameters as specified in [29]. For VCL HRD parameters, the value of CpbSize is equal to pre_dec_buf_size * 40 / 6.

For H.265 (HEVC), the bitstream is constrained such that, for the NAL HRD parameters, the value of CpbSize[ i ] for at least one value of i in the range of 0 to cpb_cnt_minus1[ HighestTid ], inclusive, as specified in [51], is less than or equal to pre_dec_buf_size * 8, and for the VCL HRD parameters, the value of CpbSize[ i ] for at least one value of i in the range of 0 to cpb_cnt_minus1[ HighestTid ], inclusive, is less than or equal to pre_dec_buf_size * 80 / 11.

At least one pair of values of tx_byte_rate and pre_dec_buf_size of the same operation point shall conform to the maximum bitrate and CPB size allowed by profile and level of the stream.
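For H.265 (HEVC), the "at least one value of i" conditions on BitRate[ i ] and CpbSize[ i ] can be sketched as two existence checks. The function name and arguments are hypothetical. The sketch follows the wording literally and tests the two conditions independently, so it does not require the same index i to satisfy both; a stricter reading would be an additional assumption.

```python
from fractions import Fraction
from typing import Sequence


def hevc_operation_point_ok(bit_rates: Sequence[int],
                            cpb_sizes: Sequence[int],
                            tx_byte_rate: int,
                            pre_dec_buf_size: int,
                            vcl: bool = False) -> bool:
    """Check one operation point against signalled HEVC HRD values.

    bit_rates -- BitRate[ i ] for i = 0..cpb_cnt_minus1[ HighestTid ]
    cpb_sizes -- CpbSize[ i ] for the same range of i
    """
    factor = Fraction(80, 11) if vcl else Fraction(8)
    rate_ok = any(rate <= tx_byte_rate * factor for rate in bit_rates)
    size_ok = any(size <= pre_dec_buf_size * factor for size in cpb_sizes)
    return rate_ok and size_ok
```

With NAL HRD values BitRate = [2 000 000, 1 000 000] and CpbSize = [3 000 000, 1 500 000], an operation point with tx_byte_rate 125 000 and pre_dec_buf_size 187 500 passes, because 8 × 125 000 = 1 000 000 and 8 × 187 500 = 1 500 000.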

post_dec_buf_size: gives the required size of the post-decoder buffer, or the decoded picture buffer, in units of bytes.

For H.264 (AVC), the bitstream is constrained by the value of max_dec_frame_buffering equal to Min( 16, Floor( post_dec_buf_size / ( PicWidthInMbs * FrameHeightInMbs * 256 * ChromaFormatFactor ) ) ) as specified in [29]. If the SDP attribute 3gpp-videopostdecbufsize is not present for an H.264 (AVC) stream, the value of max_dec_frame_buffering is inferred as specified in [29].

For H.265 (HEVC) Main profile, the bitstream is constrained such that the value of sps_max_dec_pic_buffering_minus1[ HighestTid ] + 1 as specified in [51] is less than or equal to Floor( post_dec_buf_size / ( PicSizeInSamplesY * 3 / 2 ) ), where PicSizeInSamplesY is as specified in [51].
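The relation between post_dec_buf_size and the number of buffered pictures can be worked through numerically. The sketch below uses hypothetical function names, reads the H.264 (AVC) expression as Min( 16, Floor( post_dec_buf_size / ( width in macroblocks * height in macroblocks * 256 * ChromaFormatFactor ) ) ), and assumes a ChromaFormatFactor of 3/2, which corresponds to 4:2:0 content.

```python
import math
from fractions import Fraction


def avc_max_dec_frame_buffering(post_dec_buf_size: int,
                                pic_width_in_mbs: int,
                                frame_height_in_mbs: int,
                                chroma_format_factor: Fraction = Fraction(3, 2)) -> int:
    """Frames that fit in the post-decoder buffer, capped at 16 (H.264)."""
    # 256 luma samples per macroblock, scaled to include chroma (assumed 4:2:0)
    frame_bytes = pic_width_in_mbs * frame_height_in_mbs * 256 * chroma_format_factor
    return min(16, math.floor(Fraction(post_dec_buf_size) / frame_bytes))


def hevc_main_max_pictures(post_dec_buf_size: int,
                           pic_size_in_samples_y: int) -> int:
    """Upper bound on sps_max_dec_pic_buffering_minus1[ HighestTid ] + 1."""
    picture_bytes = Fraction(pic_size_in_samples_y) * 3 / 2
    return math.floor(Fraction(post_dec_buf_size) / picture_bytes)
```

For a 1280 x 720 picture (80 x 45 macroblocks, 921 600 luma samples), one 4:2:0 frame takes 1 382 400 bytes, so a post_dec_buf_size of 6 912 000 bytes corresponds to 5 frames under both expressions.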

init_pre_dec_buf_period: gives the required delay between the time of arrival in the pre-decoder buffer of the first bit of the first access unit and the time of removal from the pre-decoder buffer of the first access unit. It is in units of a 90 kHz clock.

For H.264 (AVC), the bitstream is constrained by the value of the nominal removal time of the first access unit from the coded picture buffer (CPB), t_{r,n}( 0 ), equal to init_pre_dec_buf_period as specified in [29].

For H.265 (HEVC), the bitstream is constrained such that the value of the nominal removal time of the first access unit from the coded picture buffer (CPB), AuNominalRemovalTime[ 0 ], as specified in [51] is equal to init_pre_dec_buf_period.

init_post_dec_buf_period: gives the required delay between the time of arrival in the post-decoder buffer of the first decoded picture and the time of output from the post-decoder buffer of the first decoded picture. It is in units of a 90 kHz clock.

For H.264 (AVC), the bitstream is constrained by the value of dpb_output_delay for the first decoded picture in output order equal to init_post_dec_buf_period as specified in [29], assuming that the clock tick variable, t_c, is equal to 1 / 90 000.

For H.265 (HEVC), the bitstream is constrained such that the value of pic_dpb_output_delay for the first decoded picture in output order, as specified in [51], is equal to init_post_dec_buf_period, assuming that the clock tick, ClockTick, is equal to 1 / 90 000.