10 Adaptation of continuous media

3GPP TS 26.234, Release 17: Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs

10.1 General

The PSS includes a number of protocols and functionalities that can be utilized to allow the PSS session to adapt transmission and content rates to the available network resources. The goal is to achieve the highest possible quality of experience for the end-user with the available resources, while maintaining interrupt-free playback of the media. This requires that the available network resources are estimated and that transmission rates are adapted to the available network link rates. This can prevent overflowing network buffers and thereby avoid packet losses. The real-time properties of the transmitted media must be considered so that media does not arrive too late to be useful. This requires that the media content rate is adapted to the transmission rate.

To avoid buffer overflows, which force the client to discard useful data, while still allowing the server to deliver as much data as possible into the client buffer, a functionality for client buffer feedback is defined. This allows the server to closely monitor the buffering situation on the client side and to do what it can to avoid client buffer underflow. The client specifies how much buffer space the server can utilize and the minimum target level of protection the client perceives necessary to provide interrupt-free playback. Once this target level of protection is achieved, the server may utilize any resources beyond what is needed to maintain that protection level to increase the quality of the media, or the server may choose to leave the transmission rate alone and simply accrue additional time in the client buffer at the present rate. The server can also utilize the buffer feedback information to decide if the media quality needs to be lowered in order to avoid a buffer underflow and the resulting playback interruption.

10.2 Bit-rate adaptation

The bit-rate adaptation for PSS is server-centric in the sense that transmission and content rate are controlled by the server. The server uses RTCP and RTSP as the basic information sources about the state of the client and network. This allows link-rate adaptation also when communicating with PSS clients of earlier releases, as long as they send RTCP receiver reports frequently enough.

10.2.1 Link-rate estimation

The actual algorithm providing the link-rate estimation is implementation specific. However, this chapter describes and gives rules for the different information sources that can be used for link-rate estimation.

10.2.1.1 Initial values

A PSS client should inform the server of the quality of service parameters for the used wireless link. The known parameters should be included in the RTSP "3GPP-Link-Char" header (chapter 5.3.2.1) in either the RTSP SETUP or PLAY request. This enables the server to set some basic assumptions about the possible bit-rates and link response. If the client has initially reported these parameters and they are changed during the session, the client shall update these parameters by including the "3GPP-Link-Char" header in a SET_PARAMETER or OPTIONS request.

A PSS client should inform the server about the initial bit-rate available over the link, if known. This reporting shall be done using the RTSP "Bandwidth" header in either the RTSP SETUP or PLAY request. The QoS negotiated guaranteed bit-rate is the best estimate for the bandwidth value.

10.2.1.2 Regular information sources

The basic information source giving regular reports useful for bit-rate estimations is the RTCP receiver reports as defined by [9]. The RTCP reporting interval is dependent on the RTP profile in use, the bit-rate assigned to RTCP, the average size of RTCP packets, and the number of reporting entities. Most of these parameters can be set or affected by the PSS server through signalling. This allows the server to configure the reporting interval to a desirable working point. See chapter 5.3.3.1 for specification on how the RTCP bandwidth is signalled by the server.
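As a rough illustration of how these factors interact, the following Python sketch computes a simplified, deterministic reporting interval in the spirit of the RTCP timing rules of [9]. It is not the normative algorithm: it omits the randomisation and the minimum interval, and it assumes that the given bandwidth is shared equally by the reporting entities. The function name and the example values are illustrative assumptions.

```python
def rtcp_report_interval(avg_rtcp_packet_bytes: float,
                         reporting_entities: int,
                         rtcp_bandwidth_bps: float) -> float:
    """Approximate seconds between reports from one entity.

    Assumption: all reporting entities share rtcp_bandwidth_bps equally
    and send packets of the same average size.
    """
    if rtcp_bandwidth_bps <= 0:
        raise ValueError("RTCP bandwidth must be positive")
    if reporting_entities < 1:
        raise ValueError("at least one reporting entity is required")
    rtcp_bytes_per_second = rtcp_bandwidth_bps / 8.0
    return avg_rtcp_packet_bytes * reporting_entities / rtcp_bytes_per_second


# One client SSRC, 100-byte reports, 1600 bit/s of RTCP bandwidth.
single = rtcp_report_interval(100, 1, 1600)
# The same bandwidth shared by two SSRCs doubles the interval.
double = rtcp_report_interval(100, 2, 1600)
```

Under these assumptions, larger reports or more SSRCs lengthen the interval, while a larger RTCP bandwidth shortens it.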

In most PSS RTP sessions the server and the client only have one SSRC each, thus providing the highest possible reporting rate. However, some scenarios could result in a larger number of SSRCs being used, thereby possibly lowering the effective reporting rate for the client, the server, or both.

The average size of the RTCP packets cannot be tightly controlled, but a loose control is possible by controlling which RTCP packet types are used. This will depend on which of the below-listed RTCP extensions are in use.

The PSS server can signal the PSS client in SDP to request that the "Loss RLE Report Block" in RTCP XR (section 6.2.3) is used to report packet loss vectors.

10.2.2 Transmission adaptation

The transmission adaptation is implementation dependent. The 3GPP file format server extensions [50] provide a server the possibility to store alternative encodings useful for stream switching.

A server doing transmission rate adaptation through content rate adaptation shall still deliver content according to the SDP description of the media streams, e.g. a video stream delivered after content rate adaptation must still belong to the SDP announced profile and be consistent with any configuration. This will either put restrictions on the possible alternatives or require declaration of several RTP payload types or media encodings that might not be used.

10.2.3 Signalling for client buffer feedback

The client buffer feedback signalling functionality should be supported by PSS clients and PSS servers. For PSS clients and servers that support the client buffer feedback signalling functionality, the following parts shall be implemented:

– SDP service support, as described in clause 5.3.3.5.

– The size (in bytes) of the buffer the client provides for rate adaptation. It is signalled to the server through RTSP, as described in clause 5.3.2.2.

– The target buffer protection time (in milliseconds). It is signalled to the server through RTSP, as described in clause 5.3.2.2.

– The client buffer status feedback information, including free buffer space, next ADU to be decoded and playout delay. It is signalled to the server via RTCP, as described in clause 6.2.3.2.

If a PSS server supports client buffer feedback, it shall include the attribute "3GPP-Adaptation-Support" in the SDP, as described in clause 5.3.3.5. If a PSS client supports client buffer feedback, upon reception of an SDP containing the "3GPP-Adaptation-Support" attribute, it shall include the "3GPP-Adaptation" header in the SETUP for each individual media. Furthermore, upon reception of a successful SETUP response (including "3GPP-Adaptation" header), the PSS client shall send NADU APP packets according to clause 5.3.3.5 and 6.2.3.2.

The "3GPP-Adaptation" header may be included in PLAY, OPTIONS and SET_PARAMETER requests in order to update the target buffer protection time value during a session. However, the target-protection-time is intended to be stable for the entire session with the server; there are very few reasons for a client to modify the target buffer protection time once a session is established. The buffer size value shall not be modified during a session.

With the total buffer size, and the reported amount of free buffer space, the server can avoid overflowing the buffer. A server should assume that any sent RTP packet will consume receiver buffer space equal to the complete RTP packet size. For interleaved or aggregated media, the actual buffer space consumption may be slightly larger if buffering is done in the ADU domain. This is because each ADU may save metadata corresponding to the RTP header and payload fields, like timestamp and decoding sequence numbers individually. This should only be a problem if a server tries to fill exactly to the last free memory block.
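The overflow check described above can be sketched as follows. This is a minimal illustration, not a normative procedure. It assumes that the reported free space has already been converted to bytes, that the report accounts for every packet up to the highest received sequence number, and that sequence numbers do not wrap. The class name `OverflowGuard` and the `margin_bytes` allowance for per-ADU metadata are hypothetical.

```python
from typing import Dict


class OverflowGuard:
    """Server-side estimate of usable client buffer space (illustrative)."""

    def __init__(self, margin_bytes: int = 0) -> None:
        self.free_bytes = 0                  # free space from the latest report
        self.margin_bytes = margin_bytes     # assumed allowance for ADU metadata
        self.unreported: Dict[int, int] = {} # seq -> size, not yet covered by a report

    def on_feedback(self, free_bytes: int, highest_received_seq: int) -> None:
        """Apply a client report; forget packets the report already covers."""
        self.free_bytes = free_bytes
        self.unreported = {seq: size for seq, size in self.unreported.items()
                           if seq > highest_received_seq}

    def on_sent(self, seq: int, packet_bytes: int) -> None:
        """Record a sent RTP packet, counted at its complete packet size."""
        self.unreported[seq] = packet_bytes

    def can_send(self, packet_bytes: int) -> bool:
        """True if the packet fits next to everything still unreported."""
        pending = sum(self.unreported.values())
        return packet_bytes + pending + self.margin_bytes <= self.free_bytes


guard = OverflowGuard(margin_bytes=100)
guard.on_feedback(free_bytes=3000, highest_received_seq=10)
first_ok = guard.can_send(1400)
guard.on_sent(11, 1400)
guard.on_sent(12, 1400)
third_ok = guard.can_send(1400)
```

Counting unreported packets against the last reported free space is a conservative choice: the server never assumes more room than the client has confirmed.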

The server can determine the time to underflow by calculating the amount of media time present in the buffer. This is done using the next ADU numbers, the highest received sequence number, and the playout delay, combined with the server’s view of the sent ADUs and their decoding order and playout time. The information about the ADUs for 3GP files that are produced according to the streaming-server profile can be read from the "3gau" box [50]. It is also possible to derive some of the information about the ADUs from the media track, or hint-track, or the actual RTP packets.

A client needs to choose the target-time and the point on the playout timeline from which it will measure PlayoutDelay such that it will never re-buffer when the target-time is fulfilled. A client should typically begin rebuffering only when it has reached 0 ms buffered data. Once rebuffering has begun, the client should resume playback when the target-time has been fulfilled for all synchronized media streams.
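The rebuffering behaviour recommended above can be expressed as a small state machine. The sketch below is illustrative only. The class name and the per-stream dictionary of buffered media time are assumptions, and initial buffering before the first playback is not modelled.

```python
from typing import Mapping


class RebufferController:
    """Tracks whether the client should play or rebuffer (illustrative)."""

    def __init__(self, target_time_ms: int) -> None:
        self.target_time_ms = target_time_ms
        self.rebuffering = False

    def update(self, buffered_ms: Mapping[str, int]) -> bool:
        """Return True if playback should run for the given buffer levels.

        buffered_ms maps each synchronized stream to its buffered media time.
        """
        if not self.rebuffering:
            # Start rebuffering only when some stream has run completely dry.
            if min(buffered_ms.values()) <= 0:
                self.rebuffering = True
        else:
            # Resume only when every synchronized stream meets the target time.
            if all(v >= self.target_time_ms for v in buffered_ms.values()):
                self.rebuffering = False
        return not self.rebuffering


ctrl = RebufferController(target_time_ms=5000)
states = [
    ctrl.update({"audio": 3000, "video": 2000}),  # below target, still playing
    ctrl.update({"audio": 1000, "video": 0}),     # video is empty: rebuffer
    ctrl.update({"audio": 6000, "video": 4000}),  # video still below target
    ctrl.update({"audio": 6000, "video": 5000}),  # all streams at target: resume
]
```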

The level of protection needed against transmission rate variations over a wireless network can be substantial (throughput variation because of network load, radio conditions, several seconds of interruption because of handovers, possible extra buffering to perform retransmission). In order to minimise the initial buffering delay, the client may choose an initial buffering that is less than the required buffering it has determined would be satisfactory. The client needs to take into account, however, that it may be unsafe to begin playback prior to fulfilling its target time. For this reason, the target buffer protection time indicates the amount of playable media (in time), which the client perceives necessary to have in its buffer. Therefore a server should not perform content adaptation towards higher content rates until the given target time of media units is available in the buffer.

It is important to note that target-protection-time is intended only to guide the server in its attempts to sustain or improve the quality of the media. There are many situations in which the server may disregard the target-protection-time and thereby actually achieve better media quality for the client (e.g. when the client sends a target-protection-time smaller than the perceived jitter or when the client sends a target-protection-time that is close to or exceeds the client buffer maximum). The only requirement the target-time places on the server is that the server shall not attempt to upshift prior to attaining the target-time.
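The upshift restriction can be sketched as a simple gate in the server's adaptation logic. The function below is an illustration only. The additional check that the estimated link rate can carry the higher content rate is an assumption about a sensible server, not a requirement stated here, and all names are hypothetical.

```python
def may_upshift(buffered_media_ms: int,
                target_time_ms: int,
                estimated_link_rate_bps: float,
                higher_content_rate_bps: float) -> bool:
    """Decide whether a switch to a higher content rate may be attempted."""
    # Gate from the text: no upshift before the target time is attained.
    if buffered_media_ms < target_time_ms:
        return False
    # Assumed extra condition: the link must be able to carry the new rate.
    return estimated_link_rate_bps >= higher_content_rate_bps


too_early = may_upshift(3000, 5000, 256_000, 128_000)
allowed = may_upshift(5000, 5000, 256_000, 128_000)
link_limited = may_upshift(8000, 5000, 96_000, 128_000)
```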

Furthermore, while it is possible for the client to modify the target protection time in the 3GPP adaptation header with each RTSP request that is sent to the server, the target protection time is intended to be a stable value for the entire session with the server and should only be modified in circumstances where the client has a more accurate understanding of network and transmission jitter and the efficiency of its ability to process the network buffer. In these circumstances, adjusting the target time up could prevent buffer low points which will cause rebuffering, or adjusting the target time down could provide more head room to allow the server to adapt to the most appropriate rate.

10.3 Issues with deriving adaptation information (informative)

This clause attempts to provide some insight into the functions and issues that exist in deriving the client’s buffer status in the server. The issues and the complexity of the functions depend on the media format, but can be characterised by media properties, in particular how much flexibility the media format allows in transmission, decoding, and playout order. As there are three orderings of encoded media data that are possible, there are two re-orderings:

a) Data may be interleaved (i.e. the transmission order of data differs from the decoding order), and it must be de-interleaved before passing to the decoder.

b) If there are forward references in the encoding, e.g. in a video stream, then those references are decoded ‘early’ (out of order) compared to playout order. Thus, the playout order in this case differs from the decode order.

In buffer management, we are trying to ensure

1. that the client’s receiver buffer does not get over-filled (this is over-run);

2. that data does not arrive at an operation point after its need. Specifically, this means that ADUs should not be placed into the final playout queue with a timestamp that has already been passed in playout (this is under-run).

The parameters supplied enable a server to deduce at least this much. The server can always protect against buffer over-run by respecting the ‘free space’ that is periodically signalled by the client. This free-space is totalled over all data held before the decoder (decoder and de-interleave buffers). If the server desires more visibility, it can inspect the ADU that has been reported as ‘next to decode’. If there has been no interleaving, the client holds all data between that ADU and the highest sequence number received, and will probably hold up to the last packet the server has sent. If interleaving is used, then there may have been ADUs that were sent after the reported ADU, but which passed out of both the de-interleaving and decoder buffers before that ADU. The server would have to analyze the de-interleaving process to work out which ADUs these are. The hint-track extension "3gau" to the 3GP file format [50] provides extended information about both the decoding and playout order in relation to transmission order of the ADUs. This extension also provides the size of the ADUs to the server.

Protection against under-run is more subtle. It is in general not possible for the client to know which ADUs yet to be decoded (or yet to be received) have earlier timestamps than ADUs already received and decoded. Therefore the client does not in fact know what is the ‘latest playable timestamp’, up to which it has received all the ADUs in the sequence to that time.

If the server does not adapt its transmission bit-rate and the transmission path has sufficient bit-rate, the parameters supplied at stream setup (such as the initial buffering delay) are sufficient to protect against under-run. The simple generalization of this is that if the server calculates its average bit-rate since starting the stream, and ensures that the average never falls below the bit-rate that would have been used without rate adaptation, it must be safe. Put in another way, the server may send a packet earlier than it would without rate-adaptation, but it might not be safe to send it later.
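The average-rate rule can be checked against a log of send events. The sketch below is one possible reading of that rule, under the assumption that the non-adapted transmission is a constant `reference_rate_bps` and that the log holds `(elapsed_seconds, packet_bytes)` pairs in send order. Both the function name and the log format are hypothetical.

```python
from typing import Iterable, Tuple


def average_rate_is_safe(send_log: Iterable[Tuple[float, int]],
                         reference_rate_bps: float) -> bool:
    """True if the running average never drops below the reference rate.

    The average is lowest just before each packet is sent, so it is
    checked at those instants, before the packet is counted.
    """
    sent_bits = 0
    for elapsed_s, packet_bytes in send_log:
        if elapsed_s > 0 and sent_bits < reference_rate_bps * elapsed_s:
            return False
        sent_bits += packet_bytes * 8
    return True


# Reference schedule: 1000 bytes per second (8000 bit/s).
on_time = average_rate_is_safe([(0.0, 1000), (1.0, 1000), (2.0, 1000)], 8000)
early = average_rate_is_safe([(0.0, 1000), (0.5, 1000), (1.0, 1000)], 8000)
late = average_rate_is_safe([(0.0, 1000), (1.5, 1000)], 8000)
```

Sending early keeps the running average at or above the reference, whereas the delayed second packet in the last example lets it fall below.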

A more subtle analysis uses the reported information about the next-to-be-decoded ADU: the sequence number of the packet that contained it, the ADU number within that packet, and the offset (playout delay) of its timestamp (playback time) from the current playback time. Given the first pair of numbers, the server can find the ADU and therefore its timestamp. By subtracting the reported play-out delay from this timestamp, the server can now estimate the current playback time. It can find the earliest timestamp in the ADUs it has yet to transmit, and it can also examine the data that has been sent that will still be in the de-interleaving buffer, for the earliest timestamp still held in the client’s de-interleaving buffer. If the earlier of these two timestamps is at, or close to, the current play time, the client has, or is about to, under-run.
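This analysis can be sketched as follows. The example assumes that the server has already mapped the reported packet sequence number and ADU number to a timestamp, and that all timestamps are expressed in milliseconds on a common media timeline. The function names and the `margin_ms` threshold for "close to" are illustrative assumptions.

```python
from typing import Iterable


def estimate_playback_time_ms(next_adu_timestamp_ms: int,
                              playout_delay_ms: int) -> int:
    """Current playback time = timestamp of next ADU minus playout delay."""
    return next_adu_timestamp_ms - playout_delay_ms


def underrun_imminent(next_adu_timestamp_ms: int,
                      playout_delay_ms: int,
                      unsent_timestamps_ms: Iterable[int],
                      deinterleave_timestamps_ms: Iterable[int],
                      margin_ms: int = 0) -> bool:
    """True if the earliest outstanding timestamp is at or near playback time."""
    playback_ms = estimate_playback_time_ms(next_adu_timestamp_ms,
                                            playout_delay_ms)
    candidates = list(unsent_timestamps_ms) + list(deinterleave_timestamps_ms)
    if not candidates:
        return False  # nothing outstanding that could arrive too late
    earliest_ms = min(candidates)
    return earliest_ms - playback_ms <= margin_ms


playback = estimate_playback_time_ms(10_000, 2_000)
relaxed = underrun_imminent(10_000, 2_000, [12_000, 11_000], [9_000],
                            margin_ms=500)
tight = underrun_imminent(10_000, 2_000, [12_000, 11_000], [9_000],
                          margin_ms=1_000)
```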

Consider now the following cases, in increasing order of complexity:

1. simple data that is neither interleaved nor re-ordered for display (e.g. AMR without interleave, AAC, H.263).

2. data that is interleaved, but not re-ordered (e.g. AMR with interleave).

3. data that is re-ordered, but not interleaved (AVC without interleave).

4. data that is both interleaved and re-ordered (AVC with interleave).

Consider now over-run and under-run protection for these streams. In all cases, the free-space can be used to protect against over-run, and the maintenance of the average rate at or above the static rate protects against under-run.

1. By subtracting the reported free-space from the overall buffer size (reported in stream setup) the buffered data can be calculated. If this is nearly exhausted, the buffer is about to under-run. However for codecs with variable bit-rate encoding, the buffered space may represent different amounts of playout time. In these cases the playout time present in the yet to be decoded part of the buffer can easily be calculated as the RTP timestamp difference between the latest ADU received by the client as reported implicitly by Highest Received Sequence number and the ADU reported by NADU.

2. The server can estimate the playback time as above. However to perform the calculation of the playout time of the buffer before the decoding, the server may need to maintain a list of the ADUs in the decoding order, rather than in transmission order. Also the data present in the de-interleaving buffer is not complete and would have holes in it and should not be considered to be playable. The server can determine, by looking at the decoding order of the different ADUs present in the transmitted packets, how far the client is expected to have a receiver buffer without holes, due to not yet transmitted packets.

3. In this case it may be fairly complicated to estimate the actual playout time of the un-decoded media. The reason is that the present RTP timestamp associated with the ADUs may fluctuate widely in ADUs consecutive in both transmission and decoding order, due to the early decoding of referenced ADUs. Therefore to perform an accurate estimation the server needs to make special consideration of any ADU with early decoding so that it does not skew the measurement. Note that for AVC bitstreams, a bound of the difference between presentation order and decoding order is given by the bitstream restriction parameter num_reorder_frames.

4. As 3 above, but with the further consideration of needing to perform any investigation in decoding order and consider the holes of the de-interleaving buffer.
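The calculations for cases 1 and 2 can be sketched as follows. For case 1, the buffered playout time is taken as the RTP timestamp difference between the latest received ADU and the next ADU to be decoded, computed modulo 2^32 because RTP timestamps are 32-bit values. For case 2, the server walks its own list of ADUs in decoding order to find how far the client buffer extends without holes. The function names and the use of integer decoding-order indices are assumptions made for illustration.

```python
from typing import Iterable

RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit and wrap around


def buffered_media_seconds(ts_latest_received: int,
                           ts_next_to_decode: int,
                           clock_rate_hz: int) -> float:
    """Case 1: playout time held between next-to-decode and latest ADU."""
    diff = (ts_latest_received - ts_next_to_decode) % RTP_TS_MODULUS
    return diff / clock_rate_hz


def hole_free_end(next_decode_index: int,
                  received_decode_indices: Iterable[int]) -> int:
    """Case 2: first decoding-order index the client is missing.

    ADUs with indices in [next_decode_index, result) are contiguous in
    decoding order and can be treated as playable.
    """
    received = set(received_decode_indices)
    end = next_decode_index
    while end in received:
        end += 1
    return end


plain = buffered_media_seconds(270_000, 0, 90_000)
wrapped = buffered_media_seconds(45_000, (1 << 32) - 45_000, 90_000)
end = hole_free_end(5, [5, 6, 7, 9, 10])
```

In the last example, ADUs 9 and 10 have been received but lie beyond the hole at index 8, so they are not counted as playable. Cases 3 and 4 would additionally require handling of ADUs that are decoded early, which this sketch does not attempt.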