5.6 De-Jitter Buffer

3GPP TS 26.448, Codec for Enhanced Voice Services (EVS); Jitter Buffer Management, Release 17

RTP packets are transmitted on the network with a time-varying delay (network jitter) and might be reordered, lost or duplicated. The De-Jitter Buffer stores the frames contained in RTP packets received from the network and prepares them to be fed into the audio decoder in the correct order.

The De-Jitter Buffer uses a ringbuffer data structure with a fixed capacity. To avoid excessive delay and memory usage in extreme cases such as major delay bursts on the network, the ringbuffer memory is allocated at initialization with a capacity to store up to three seconds of active audio data, i.e. 150 entries for a frame duration of 20 ms. If the De-Jitter Buffer overflows, the oldest frames (those with the lowest timestamps) are dropped from the ringbuffer.
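The capacity figure and the overflow rule can be illustrated with a minimal sketch. It is not the reference implementation: the names `RingBuffer`, `push_back` and `pop_front` are hypothetical, and the sketch assumes frames are appended in timestamp order so that the front entry is always the oldest one.

```python
FRAME_DURATION_MS = 20
MAX_BUFFERED_MS = 3000
# Three seconds of 20 ms frames: 3000 / 20 = 150 entries.
CAPACITY = MAX_BUFFERED_MS // FRAME_DURATION_MS


class RingBuffer:
    """Fixed-capacity ring buffer; storage is allocated once at construction."""

    def __init__(self, capacity=CAPACITY):
        self.slots = [None] * capacity  # preallocated storage
        self.head = 0                   # index of the oldest entry
        self.count = 0                  # number of stored entries

    def pop_front(self):
        """Remove and return the oldest entry (front of the buffer)."""
        if self.count == 0:
            raise IndexError("ring buffer is empty")
        frame = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return frame

    def push_back(self, frame):
        """Append a frame; on overflow, drop the oldest entry first.

        Returns the dropped frame, or None if nothing was dropped.
        """
        dropped = None
        if self.count == len(self.slots):
            dropped = self.pop_front()
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = frame
        self.count += 1
        return dropped
```

Because the storage is allocated once, memory use stays bounded during a delay burst; the cost is that the oldest audio is discarded once more than three seconds have accumulated.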

The ringbuffer is sorted by media timestamp in ascending order. To forward a frame from the De-Jitter Buffer to the audio decoder, the frame stored at the beginning of the ringbuffer, i.e. the frame with the lowest timestamp, is dequeued. To enqueue a frame, the De-Jitter Buffer uses a binary search over the stored timestamps to insert the new frame at the correct position in the ringbuffer, which undoes any reordering that occurred on the network. To handle packet duplication on the network, a newly received frame is ignored if another frame with equal timestamp and size is already stored in the ringbuffer. If a newly received frame has the same timestamp as an existing frame but a different size, the frame with the greater buffer size is stored in the ringbuffer and the other frame is ignored, or dropped if already stored. This allows redundant frames (either identical copies of a frame or frames in which the same signal is encoded with a lower bitrate mode) to be sent as forward error correction to combat high packet loss on the network.
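The enqueue rules above can be sketched as follows. This is an illustration under simplifying assumptions, not the reference implementation: the buffer is modelled as a plain Python list of `(timestamp, payload)` tuples rather than a ringbuffer, the capacity limit is not enforced, RTP timestamp wrap-around is ignored, and the helper names `find_position` and `enqueue` are hypothetical.

```python
def find_position(frames, timestamp):
    """Binary search: index of the first entry whose timestamp is >= timestamp."""
    lo, hi = 0, len(frames)
    while lo < hi:
        mid = (lo + hi) // 2
        if frames[mid][0] < timestamp:
            lo = mid + 1
        else:
            hi = mid
    return lo


def enqueue(frames, timestamp, payload):
    """Insert a frame in timestamp order; return True if it was stored.

    frames is a list of (timestamp, payload) tuples sorted by timestamp.
    """
    pos = find_position(frames, timestamp)
    if pos < len(frames) and frames[pos][0] == timestamp:
        existing_payload = frames[pos][1]
        if len(payload) > len(existing_payload):
            # Same timestamp, larger frame: replace the stored copy.
            frames[pos] = (timestamp, payload)
            return True
        # Exact duplicate or smaller redundant copy: ignore the new frame.
        return False
    frames.insert(pos, (timestamp, payload))
    return True


# Usage: frames arriving out of order end up sorted by timestamp.
buffer = []
for ts in (640, 0, 320):
    enqueue(buffer, ts, b"frame")
```

Keeping only the larger of two frames with the same timestamp means that a primary frame always wins over a lower-bitrate redundant copy, regardless of which one arrives first.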

The De-Jitter Buffer does not explicitly handle missing frames, e.g. frames which are lost on the network or have not yet been received, and thus does not store NO_DATA frames. Instead, the Adaptation Control Logic module is responsible for deciding whether a frame should be dequeued from the De-Jitter Buffer. Before dequeuing a frame, the Adaptation Control Logic module previews the first frame stored in the De-Jitter Buffer and compares its timestamp with the expected timestamp for playout. The depth of the De-Jitter Buffer is therefore dynamic and controlled by the Adaptation Control Logic module.
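The preview-then-dequeue interaction might look like the sketch below. It is a deliberately reduced illustration: the function name `next_frame_for_playout` is hypothetical, the buffer is again modelled as a sorted list of `(timestamp, payload)` tuples, and the only decision shown is an exact timestamp match. What the Adaptation Control Logic does when the timestamps differ (for example with frames older than the expected timestamp) is not described in this clause and is left to the caller here.

```python
def next_frame_for_playout(frames, expected_timestamp):
    """Return the frame due for playout, or None if it is not available.

    frames is a list of (timestamp, payload) tuples sorted by timestamp.
    The first entry is only previewed; it is removed from the buffer
    solely when its timestamp matches the expected playout timestamp.
    """
    if not frames:
        return None
    timestamp, _payload = frames[0]  # preview without dequeuing
    if timestamp == expected_timestamp:
        return frames.pop(0)
    # Mismatch: the buffer is left untouched; the caller decides what to do
    # (the buffer itself never stores a placeholder for the missing frame).
    return None
```

Since the buffer never inserts placeholders for missing frames, a gap shows up only as a mismatch between the previewed timestamp and the expected one.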