9 Packet-loss handling

3GPP TS 26.114, Release 18: IP Multimedia Subsystem (IMS); Multimedia telephony; Media handling and interaction

9.1 General

This clause specifies some methods to handle conditions with packet losses. Packet losses in general will also trigger adaptation, which is specified in clause 10.

The ‘a=bw-info’ attribute defined in clause 19 allows for negotiating how much additional bandwidth (if any) may be used for application layer redundancy in the session. When application layer redundancy is used, the media bandwidth negotiated for the session may need to be increased, i.e. by increasing the value used for the b=AS bandwidth modifier. The b=AS bandwidth modifier is however only a single value, which also applies only to the receiving direction. When an MTSI client in terminal sends the SDP, it is therefore not possible for the networks and the other client to know if the intention is to use the entire media bandwidth all the time (both with and without redundancy); or if the intention is to use the b=AS bandwidth only when redundancy is needed and to use a lower bandwidth when redundancy is not needed. It is also not possible to know what the MTSI client in terminal can do in the sending direction. The ‘a=bw-info’ attribute (see clause 19) offers an improved negotiation mechanism to better know what the MTSI client in terminal can do and what it intends to do. This is further discussed in TR 26.924 [144].

Improved error robustness can be enabled by packet-loss handling procedures of the codec or codec mode, via the adaptation procedures described in clause 10, or by other mechanisms. Annex X specifies the CHEM feature, which enables the 3GPP system to exploit error robustness to avoid, delay, or reduce the need to hand off a terminal due to degradation in the media quality. Annex Y provides example PLR threshold values that can be used for different codec configurations.

9.2 Speech

9.2.1 General

This clause recommends a simple application layer redundancy scheme for handling operational conditions with severe packet loss rates. Simple application layer redundancy is generated by encapsulating one or more previously transmitted speech frames into the same RTP packet as the current, not previously transmitted, frame(s). An RTP packet may thus contain zero, one or several redundant speech frames and zero, one or several non-redundant speech frames.

When transmitting redundancy, the MTSI client should switch to a lower codec mode, if possible. An MTSI client using AMR or AMR-WB shall utilize the codec mode rates within the negotiated codec mode set with the negotiated adaptation steps and limitations as defined by mode-change-neighbor and mode-change-period. It is recommended to not send redundant speech frames before the targeted codec mode is reached. Table 9.1 defines the recommended codec modes for different redundancy level combinations.

When application layer redundancy is used for AMR or AMR-WB encoded speech media, the transmitting application may use up to 300 % redundancy, i.e. a speech frame transported in one RTP packet may be repeated in 3 other RTP packets.

Table 9.1: Recommended codec modes and redundancy level combinations when redundancy is supported

| Redundancy level | No redundancy | 100 % redundancy |
|---|---|---|
| Narrow-band speech | AMR 12.2 | AMR 5.9 |
| Wide-band speech (when wide-band is supported) | AMR-WB 12.65 | AMR-WB 6.60 |
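As an informal illustration of why Table 9.1 pairs redundancy with a lower codec mode, the sketch below compares the speech bit rate carried with and without redundancy. The function name is hypothetical, and payload header, RTP, UDP and IP overhead are deliberately left out, so the figures are only indicative.

```python
def speech_payload_rate_kbps(codec_mode_kbps, redundancy_percent):
    """Speech bit rate carried when every frame is sent
    1 + redundancy_percent/100 times (no header overhead counted)."""
    return codec_mode_kbps * (1 + redundancy_percent / 100)

# Narrow-band row of Table 9.1: AMR 5.9 with 100 % redundancy stays
# below the rate of AMR 12.2 without redundancy.
no_red = speech_payload_rate_kbps(12.2, 0)
with_red = speech_payload_rate_kbps(5.9, 100)
print(f"no redundancy: {no_red:.2f} kbps, 100 % redundancy: {with_red:.2f} kbps")
```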

9.2.2 Transmitting redundant frames

When transmitting redundant frames, the redundant frames should be encapsulated together with non-redundant media data as shown in figure 9.1. The frames shall be consecutive with the oldest frame placed first in the packet and the most recent frame placed last in the packet. The RTP Timestamp shall represent the sampling time of the first sample in the oldest frame transmitted in the packet.

NOTE 1: When switching from no redundancy to using redundancy, the RTP Timestamp may be the same for consecutive RTP packets.
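The packing and timestamp rules above can be sketched as follows. This is a minimal illustration rather than a normative procedure: the function name, the use of integer frame indices in place of encoded frames, and the 160-sample frame step (20 ms frames on the 8 kHz AMR RTP clock) are assumptions made for the example, with one new frame per packet.

```python
FRAME_SAMPLES = 160  # assumption: 20 ms frame at the 8 kHz AMR RTP clock

def packetize(redundant_frames_per_packet, frame_samples=FRAME_SAMPLES):
    """Build one packet per new frame.

    redundant_frames_per_packet[i] is the number of previously sent
    frames repeated in packet i. Returns (rtp_timestamp, frame_indices)
    tuples, with the oldest frame first and the newest frame last.
    """
    packets = []
    for newest, n_redundant in enumerate(redundant_frames_per_packet):
        oldest = max(0, newest - n_redundant)
        frames = list(range(oldest, newest + 1))  # consecutive, oldest first
        # The timestamp is the sampling time of the oldest frame in the packet.
        packets.append((oldest * frame_samples, frames))
    return packets

# Two packets without redundancy, then a switch to 100 % redundancy.
packets = packetize([0, 0, 1, 1])
for timestamp, frames in packets:
    print(timestamp, frames)
```

In this run the second and third packets carry the same RTP Timestamp, which is the situation described in NOTE 1.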

Figure 9.1: Redundant and non-redundant frames in the case of 100 % redundancy,
when the original packing is 1 frame per packet

Figure 9.1 shows only one non-redundant frame encapsulated together with one redundant frame. It is allowed to encapsulate several non-redundant frames with one or several redundant frames. The following combinations of non-redundant frames and redundant frames can be used.

Table 9.2: Example frame encapsulation with different redundancy levels and when maxptime is 240

| Original encapsulation (without redundancy) | Encapsulation with 100 % redundancy | Encapsulation with 200 % redundancy | Encapsulation with 300 % redundancy |
|---|---|---|---|
| 1 frame per packet | ≤ 1 non-redundant frame and ≤ 1 redundant frame | ≤ 1 non-redundant frame and ≤ 2 redundant frames | ≤ 1 non-redundant frame and ≤ 3 redundant frames |
| 2 frames per packet | ≤ 2 non-redundant frames and ≤ 2 redundant frames | ≤ 2 non-redundant frames and ≤ 4 redundant frames | ≤ 2 non-redundant frames and ≤ 6 redundant frames |
| 3 frames per packet | ≤ 3 non-redundant frames and ≤ 3 redundant frames | ≤ 3 non-redundant frames and ≤ 6 redundant frames | ≤ 3 non-redundant frames and ≤ 9 redundant frames |
| 4 frames per packet | ≤ 4 non-redundant frames and ≤ 4 redundant frames | ≤ 4 non-redundant frames and ≤ 8 redundant frames | Not allowed since maxptime does not allow more than 12 frames per RTP packet in this example |

With a maxptime value of 240, it is possible to encapsulate up to 12 frames per packet. It is therefore not allowed to use 300 % when the original encapsulation is 4 frames per packet, as shown in table 9.2. If the receiver’s maxptime value is lower than 240 then even more combinations of original encapsulation and redundancy level will be prohibited.
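The maxptime limit can be checked with a few lines of arithmetic. The sketch below assumes 20 ms speech frames and counts the worst case for each combination (all non-redundant and all redundant frame slots filled). The function names are hypothetical, and the MTU check in the next paragraph is not covered.

```python
FRAME_MS = 20  # assumption: 20 ms speech frames (AMR, AMR-WB)

def frames_in_packet(frames_per_packet, redundancy_percent):
    """Worst-case frame count: non-redundant plus redundant frames."""
    return frames_per_packet * (100 + redundancy_percent) // 100

def is_allowed(frames_per_packet, redundancy_percent, maxptime_ms=240):
    """True if the combination fits within the receiver's maxptime."""
    total = frames_in_packet(frames_per_packet, redundancy_percent)
    return total * FRAME_MS <= maxptime_ms

for fpp in (1, 2, 3, 4):
    print(fpp, [is_allowed(fpp, level) for level in (100, 200, 300)])
```

With the default maxptime of 240 only the combination of 4 frames per packet and 300 % redundancy is rejected. Passing a smaller maxptime_ms shows how further combinations drop out.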

The sender shall also ensure that the Maximum Transfer Unit (MTU) is not exceeded when sending the IP/UDP/RTP packet.

Figure 9.2 shows an example where the frame aggregation is 2 frames per packet and 100 % redundancy is added.

Figure 9.2: Redundant and non-redundant frames in the case of 100 % redundancy,
when the original packing is 2 frames per packet

A redundant frame may be replaced by a NO_DATA frame. If the transmitter wants to encapsulate non-consecutive frames into one RTP packet, then NO_DATA frames shall be inserted for the frames that are not transmitted in order to create frames that are consecutive within the packet. This method is used when sending redundancy with an offset, see figure 9.3.

Figure 9.3: Redundant and non-redundant frames in the case of 100 % redundancy, when the original
packing is 1 frame per packet and when the redundancy is transmitted with an offset of 20 ms

Note that with this scheme, the receiver may receive a frame 3 times: first the non-redundant encoding; then as a NO_DATA frame; and finally the redundant frame. Other combinations of redundancy and offset may result in receiving even more copies of a frame. The proper receiver behaviour is described in the AMR/AMR-WB payload format [28] and in the EVS payload format [125], respectively.
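The offset case can be sketched as follows, assuming 1 frame per packet, 100 % redundancy and an offset counted in whole frames (one frame corresponds to 20 ms). The function name and the (frame index, kind) tuples are hypothetical stand-ins for real payload data.

```python
NO_DATA = "NO_DATA"
SPEECH = "speech"

def payload_with_offset(newest, offset_frames=1):
    """Frame slots, oldest first, for the packet whose new frame is `newest`.

    The redundant copy is `offset_frames` frames older than it would be
    without an offset. NO_DATA slots fill the gap so that the frames in
    the packet stay consecutive.
    """
    redundant = newest - 1 - offset_frames
    if redundant < 0:  # start of stream: nothing old enough to repeat yet
        return [(newest, SPEECH)]
    gap = [(index, NO_DATA) for index in range(redundant + 1, newest)]
    return [(redundant, SPEECH)] + gap + [(newest, SPEECH)]

print(payload_with_offset(5))

# What the receiver sees for frame 5 across three consecutive packets.
copies_of_frame_5 = [kind
                     for newest in (5, 6, 7)
                     for index, kind in payload_with_offset(newest)
                     if index == 5]
print(copies_of_frame_5)
```

The second print shows the three receptions described above: the non-redundant encoding, a NO_DATA frame, and the redundant copy.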

For any combination of frame aggregation, redundancy and redundancy offset, the transmitter shall not exceed the frame encapsulation limit indicated by the receiver’s maxptime value when constructing the RTP packet.

When source controlled rate operation is used, it is allowed to send redundant media data without any non-redundant media, if no non-redundant media is available.

NOTE 2: When going from active speech to DTX, there may be no non-redundant frames at the end of the talk spurt while there still are redundant frames that need to be transmitted.

At the end of a talk spurt, when there are no more non-redundant frames to transmit, it is allowed to drop the redundant frames that are in the queue for transmission.

NOTE 3: This ensures that it is possible to use redundancy without increasing the packet rate. The quality degradation by having less redundancy for the last frames should be negligible since these last frames typically contain only background noise.

9.2.3 Receiving redundant frames

In order to receive and decode redundant media properly, the receiving application shall sort the received frames based on the RTP Timestamp and shall remove duplicated frames. If multiple versions of a frame are received, i.e. encoded with different bitrates, then the frame encoded with the highest bitrate should be used for decoding.
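A minimal sketch of this receiver step is shown below. It assumes each received frame is available as a (RTP timestamp, bitrate, payload) tuple after depacketization, treats NO_DATA as bitrate 0, and ignores RTP timestamp wrap-around. These representation choices and the function name are assumptions for the example.

```python
def select_frames(received):
    """Sort frames by RTP timestamp and keep one version per timestamp,
    preferring the version encoded with the highest bitrate.

    `received` is an iterable of (rtp_timestamp, bitrate_bps, payload).
    """
    best = {}
    for timestamp, bitrate, payload in received:
        current = best.get(timestamp)
        if current is None or bitrate > current[1]:
            best[timestamp] = (timestamp, bitrate, payload)
    return [best[timestamp] for timestamp in sorted(best)]

received = [
    (320, 12200, "f2"),
    (160, 5900, "f1-low"),
    (160, 12200, "f1-high"),
    (0, 12200, "f0"),
    (320, 0, "NO_DATA"),      # placeholder slot, loses to any speech frame
    (160, 5900, "f1-low"),    # exact duplicate
]
decoded_order = select_frames(received)
print(decoded_order)
```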

9.3 Video

9.3.1 General

MTSI clients can use the following mechanisms to recover from packet losses:

– AVPF Generic NACK and Picture Loss Indication (PLI) feedback messages

– RTP Retransmission

– Forward Error Correction (FEC)

These mechanisms offer different performance trade-offs depending on channel conditions: end-to-end delay, bandwidth, and the rate and profile of packet losses.

AVPF NACK messages are used by MTSI clients to indicate non-received RTP packets for video (see clause 7.3.3). An MTSI client transmitting video can use this information, as well as the AVPF Picture Loss Indication (PLI), to take appropriate action at its earliest opportunity to recover video from errors for the MTSI client that sent the NACK or PLI message. An error recovery action is defined as sending a recovery picture that is equivalent to a good frame in clause 16.2.1, sending a Gradual Decoder Refresh (GDR) that results in a good frame, or retransmitting missing packets. Requirements and recommendations for packet loss handling are described below.

Recommendations regarding the usage, under various channel conditions, of the error recovery mechanisms available for MTSI clients are provided in clause 9.3.4.

9.3.2 Receiver behaviour

When NACK and PLI have been negotiated without retransmission support for the session, an MTSI client in terminal receiving media:

– shall immediately queue a NACK message for RTCP scheduling upon detection of the first error after decoding a good frame.

– should repeat queuing NACK messages for RTCP scheduling after an RWT duration if a recovery picture does not arrive.

– shall queue a PLI message for RTCP scheduling if a recovery picture does not arrive within two RWT durations, and shall then stop sending NACK messages that relate to the same data as that PLI.

– shall repeat queuing PLI messages for RTCP scheduling after an RWT duration if the initially requested recovery picture does not arrive.
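The escalation from NACK to PLI listed above can be sketched as a simple timeline. The sketch measures time from the detection of the first error, takes the RWT value as a parameter (the 100 ms used below is a placeholder, not a value from this clause), and leaves actual RTCP scheduling delays out. The function name is hypothetical.

```python
def feedback_schedule(rwt_ms, recovery_at_ms=None, max_messages=6):
    """Return (time_ms, message) pairs queued for RTCP scheduling.

    Time 0 is the detection of the first error after a good frame.
    `recovery_at_ms` is when a recovery picture arrives (None = never).
    """
    schedule = []
    for k in range(max_messages):
        t_ms = k * rwt_ms
        if recovery_at_ms is not None and recovery_at_ms <= t_ms:
            break  # recovered: nothing more to report for this loss
        # NACK at 0 (shall) and at RWT (should); PLI from 2 x RWT onwards,
        # repeated every RWT, with no further NACK for the same data.
        schedule.append((t_ms, "NACK" if k < 2 else "PLI"))
    return schedule

print(feedback_schedule(100, recovery_at_ms=250))
print(feedback_schedule(100, max_messages=5))
```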

The receiver may report more losses or repeat messages if it deems this necessary. As a minimum requirement, the receiver shall support picture level error detection or tracking so that it can stop reporting losses that precede the recovery point. If FEC is supported, NACK messages should signal the lost packets that remain irrecoverable after FEC recovery, rather than the lost packets detected before recovery. The receiver should wait for the "repair-window" duration before issuing a NACK message for the missing packets.

When retransmission is supported, NACK messages are requests for retransmission of missing packets rather than indications of the error position for recovery without retransmission. The receiver should queue NACK messages for RTCP scheduling as necessary. The importance of the lost packets, along with the possibility of timely arrival of the requested packets, should be considered before requesting retransmission.

Annex P.2 gives further description of receiver behaviour for error correction.

9.3.3 Sender behaviour

When NACK and PLI have been negotiated without retransmission support for the session, an MTSI client in terminal sending media:

– shall send a recovery picture or Gradual Decoder Refresh (GDR) within 500 ms of receiving a NACK message if the loss indicated by the message corresponds to an error in a reference picture. If a recovery picture corresponding to the NACK message was sent less than an RWT duration before the NACK message was received, the sender does not have to respond to this particular NACK message.

– shall send an Instantaneous Decoder Refresh (IDR) or GDR picture within 500 ms of receiving a PLI message.

– should not respond to further incoming NACK or PLI messages of the same message type that indicate the same loss and arrive within an RWT duration of the reception of the initial feedback message triggered by the onset of the loss.
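The first two sender rules can be sketched as a small decision function. The sketch reads the 500 ms figure as a deadline for sending the response, which is an interpretation, and it does not model the suppression of repeated feedback in the third rule. The function name, the action strings and the RWT value are hypothetical.

```python
RESPONSE_DEADLINE_MS = 500  # read here as the latest time to respond

def sender_action(msg_type, now_ms, rwt_ms,
                  last_recovery_sent_ms=None, hits_reference_picture=True):
    """Return (action, deadline_ms), or None if no response is needed."""
    deadline_ms = now_ms + RESPONSE_DEADLINE_MS
    if msg_type == "PLI":
        return ("send IDR or GDR", deadline_ms)
    if msg_type == "NACK":
        if not hits_reference_picture:
            return None  # the rule only covers errors in reference pictures
        recently_recovered = (last_recovery_sent_ms is not None
                              and now_ms - last_recovery_sent_ms < rwt_ms)
        if recently_recovered:
            return None  # a recovery picture is already on its way
        return ("send recovery picture or GDR", deadline_ms)
    raise ValueError(f"unexpected feedback type: {msg_type}")

print(sender_action("NACK", now_ms=1000, rwt_ms=100))
print(sender_action("NACK", now_ms=1000, rwt_ms=100, last_recovery_sent_ms=950))
```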

An IDR picture is an intra picture for which the pictures following it can be decoded without reference (inter prediction) to pictures decoded prior to the IDR picture. This corresponds to IDR pictures in H.264 and HEVC. GDR performs intra refresh by distributing intra picture data over N pictures. At the end of N pictures from the start of GDR, all macroblock regions have been intra coded (refreshed), generating a good frame. As in the IDR case, if an intra picture or GDR is used as a recovery mechanism, the pictures following the intra picture or the GDR pictures should not reference pictures decoded prior to these pictures.

When retransmission is supported, the sender should retransmit packets that it deems beneficial for timely recovery. Only source packets should be retransmitted. The minimum time the sender should keep an original RTP packet in its buffers available for retransmission, i.e. the "rtx-time" value, should be RTT, and the maximum time the sender should keep an original RTP packet should be 400 ms.
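The retention bounds can be sketched with a small buffer that forgets packets older than the chosen "rtx-time". The class and function names are hypothetical. Capping at 400 ms when the RTT itself exceeds 400 ms is a choice made for this sketch, and sequence number wrap-around is not handled.

```python
from collections import OrderedDict

MAX_RTX_TIME_MS = 400

def clamp_rtx_time(desired_ms, rtt_ms):
    """Keep packets for at least RTT and for at most 400 ms."""
    return min(MAX_RTX_TIME_MS, max(rtt_ms, desired_ms))

class RtxBuffer:
    """Holds sent source packets until they are older than rtx_time_ms."""

    def __init__(self, rtx_time_ms):
        self.rtx_time_ms = rtx_time_ms
        self._packets = OrderedDict()  # seq -> (sent_ms, payload), oldest first

    def store(self, seq, payload, now_ms):
        self._packets[seq] = (now_ms, payload)
        self._expire(now_ms)

    def lookup(self, seq, now_ms):
        """Return the payload if it is still retained, else None."""
        self._expire(now_ms)
        entry = self._packets.get(seq)
        return None if entry is None else entry[1]

    def _expire(self, now_ms):
        while self._packets:
            seq, (sent_ms, _) = next(iter(self._packets.items()))
            if now_ms - sent_ms <= self.rtx_time_ms:
                break
            del self._packets[seq]

buf = RtxBuffer(clamp_rtx_time(desired_ms=200, rtt_ms=80))
buf.store(1, b"a", now_ms=0)
buf.store(2, b"b", now_ms=150)
print(buf.lookup(1, now_ms=180), buf.lookup(1, now_ms=250))
```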

When FEC is supported, the amount of time that spans the source and the corresponding repair packets, i.e. "repair-window", should not be less than the minimum required duration for the bitmask used for generation of the repair packets. The maximum time that spans the source and corresponding repair packets should not be more than 300 ms.

An MTSI client in terminal sending video shall obey the rate restrictions imposed by the video rate adaptation specified in clause 10.3.

Annex P.2 gives further description of sender behaviour for error correction. Annex P.3 gives further description of sender and receiver behaviour for RTP retransmission based recovery.

9.3.4 Recommendations for packet loss recovery mechanisms usage

FEC should be used to provide robustness against moderate packet loss rates in high-delay scenarios. FEC is particularly suited to random losses and short burst losses, and can be beneficial in environments with high packet loss rates and/or high delay (RTT). The use of FEC may not be appropriate when packet losses are caused by insufficient throughput (over the radio access or due to congestion in the network), since FEC introduces bit rate overhead. To compensate for this overhead, FEC should be used with efficient rate adaptation mechanisms that reduce the source bit rate according to channel conditions, so that the total RTP bitrate does not increase. When errors cannot be recovered by FEC, other mechanisms are needed in combination with FEC:

– Retransmission in combination with FEC should be used for the low RTT case with relatively high packet loss since retransmission can efficiently handle the FEC failure case.

– Generic NACK based recovery in combination with FEC should be used for high RTT, relatively high packet loss conditions since generic NACK based recovery does not introduce additional delay.

Selective retransmission should be used under low delay (RTT) and low failure (loss) rate conditions. Retransmission needs to ensure that retransmitted packets arrive in time to meet delay requirements of the end-to-end system. Higher packet loss rates may cause loss of retransmitted packets, hence leading to larger end-to-end delay.

The generic NACK and PLI based error correction mechanism should be used in combination with FEC or selective retransmission, or under low packet loss rates with high RTT conditions. A generic NACK message can be used to indicate packets to be retransmitted, as well as to inform the sender of the loss of particular RTP packets so that the sender can take the necessary actions to recover from errors.

NOTE: Under unknown and varying conditions the MTSI client should dynamically select and adapt the proper mechanisms.

Additional information on the usage of these mechanisms is provided in [142].
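The recommendations in this clause can be summarised as a lookup on RTT and packet loss rate. This clause gives no numeric boundary between "low" and "high", so the two thresholds below are placeholders invented for the sketch, and the function name is hypothetical. The case where losses are caused by insufficient throughput, in which FEC may not be appropriate, is not modelled.

```python
def select_mechanisms(rtt_ms, loss_rate,
                      rtt_threshold_ms=100.0,  # placeholder, not from this clause
                      loss_threshold=0.02):    # placeholder, not from this clause
    """Return the recovery mechanisms suggested for the given conditions."""
    high_rtt = rtt_ms >= rtt_threshold_ms
    high_loss = loss_rate >= loss_threshold
    if high_loss and high_rtt:
        # NACK based recovery adds no extra delay on top of FEC.
        return ("FEC", "Generic NACK")
    if high_loss:
        # Retransmission covers the cases where FEC fails.
        return ("FEC", "Retransmission")
    if high_rtt:
        return ("Generic NACK", "PLI")
    return ("Selective retransmission",)

for rtt_ms, loss_rate in [(40, 0.005), (40, 0.05), (250, 0.005), (250, 0.05)]:
    print(rtt_ms, loss_rate, select_mechanisms(rtt_ms, loss_rate))
```

A client following the NOTE above would re-evaluate such a choice as its RTT and loss estimates change.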

9.4 Text

Redundant transmission provided by the RTP payload format as described in RFC 4103 [31] shall be supported. The transmitting application may use up to 200 % redundancy, i.e. a T140block transported in one RTP packet may be repeated once or twice in subsequent RTP packets. 200 % redundancy shall be used when the conditions along the call path are not known to be free of loss. However, the result of media negotiation shall be followed, and transmission without redundancy used if one of the parties does not show capability for redundancy.

The sampling time shall be 300 ms as a minimum (in order to keep the bandwidth down) and should not be longer than 500 ms. New text after an idle period shall be sent as soon as possible. The first packet after an idle period shall have the M-bit set.
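The 200 % redundancy case can be sketched as follows. Each list entry stands for the T140block collected during one sampling interval, and the whole list is treated as one burst of text following an idle period. The function names are hypothetical, and the empty packets that can trail the last text block are not modelled.

```python
SAMPLING_MIN_MS = 300
SAMPLING_MAX_MS = 500

def sampling_time_ms(requested_ms):
    """Clamp a requested sampling time to the 300 ms to 500 ms range."""
    return min(SAMPLING_MAX_MS, max(SAMPLING_MIN_MS, requested_ms))

def text_packets(blocks, generations=2):
    """Return (marker_bit, t140blocks) per packet.

    Each packet carries up to `generations` earlier blocks as redundancy,
    oldest first, followed by the new (primary) block. With
    generations=2 a block can appear in three packets, i.e. 200 %
    redundancy.
    """
    packets = []
    for i, primary in enumerate(blocks):
        redundant = blocks[max(0, i - generations):i]
        marker = (i == 0)  # first packet after the idle period
        packets.append((marker, redundant + [primary]))
    return packets

for marker, payload in text_packets(["He", "llo", " wo", "rld"]):
    print(marker, payload)
```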

The procedure described in section 5 of RFC 4103 [31], or a procedure with equivalent or better performance, shall be used for packet-loss handling in the receiving MTSI client in terminal.