5.2 Depacketization of RTP packets (informative)

3GPP TS 26.448, Codec for Enhanced Voice Services (EVS); Jitter Buffer Management, Release 17

The RTP Depacker module of the JBM performs the depacketization of the incoming RTP packet stream. During this operation the EVS frames, embedded in RTP packets according to the respective RTP payload format [2], [9], are extracted and pushed to the de-jitter buffer. The RTP timestamp in an RTP packet for EVS always refers to the first EVS frame in the RTP payload. Any further EVS frames in the RTP payload are indexed in the RTP Payload Format Header by a Table of Contents (ToC) [2], [9]. The RTP Depacker performs the unpacking and calculates and assigns a media timestamp to every speech frame present in each received RTP packet.
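The media timestamp assignment described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name media_timestamps is hypothetical, and the default frame duration of 320 units assumes a 16 kHz RTP clock with 20 ms frames, which should be checked against the RTP payload format [2], [9].

```python
def media_timestamps(rtp_timestamp, num_frames, frame_duration=320):
    """Return one media timestamp per frame carried in an RTP payload.

    rtp_timestamp  -- RTP timestamp of the packet (refers to the first frame)
    num_frames     -- number of frames listed in the ToC
    frame_duration -- frame length in RTP timescale units
                      (320 is an assumption: 20 ms at a 16 kHz RTP clock)
    """
    # RTP timestamps are 32-bit values, so the sum is reduced modulo 2**32
    # to follow the same wrap-around as the RTP timestamp itself.
    return [(rtp_timestamp + i * frame_duration) % 2**32
            for i in range(num_frames)]
```

For a packet with RTP timestamp 1000 carrying three frames, the sketch assigns the timestamps 1000, 1320 and 1640.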

The Jitter Buffer Management (JBM) for the EVS codec depends on information that is part of the received RTP packet stream. Each RTP packet consists of an RTP header and the RTP payload. The following data fields of the RTP header are of relevance for the JBM:

– RTP timestamp

– RTP sequence number

The marker bit in the RTP header is not evaluated by this JBM solution. Other fields in the RTP header are needed to correctly assign the incoming RTP packets to an RTP session, which is outside the scope of this specification.
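As an illustration of how the two relevant header fields could be read, the sketch below parses the 12-byte fixed RTP header defined in RFC 3550 and returns only the sequence number and the timestamp. The function name is hypothetical, and session handling (SSRC matching, CSRC lists, header extensions) is deliberately left out because it is outside the scope described here.

```python
import struct


def parse_rtp_header_fields(packet: bytes):
    """Return (sequence_number, rtp_timestamp) from a raw RTP packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Fixed header layout in network byte order:
    #   byte 0: V/P/X/CC, byte 1: M/PT, 16-bit sequence number,
    #   32-bit timestamp, 32-bit SSRC.
    # The marker bit sits in byte 1 and is ignored, matching the text above.
    _vpxcc, _mpt, sequence_number, rtp_timestamp, _ssrc = struct.unpack(
        "!BBHII", packet[:12]
    )
    return sequence_number, rtp_timestamp
```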

All extracted frames, except NO_DATA frames, are fed to the JBM. The data structure for one frame consists of:

– Frame payload data, including the size of the payload

– Arrival timestamp of the RTP packet containing the frame

– Media timestamp in RTP timescale units, derived from the RTP timestamp of the packet

– Media duration in RTP timescale units (20 ms for EVS frames)

– RTP timescale as specified in the specification of the RTP payload format

– RTP sequence number

– SID flag

– Partial copy flag
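One possible in-memory representation of the frame data structure listed above is sketched below. All field names are hypothetical, as is the choice of milliseconds for the arrival timestamp and of bits for the payload size; an implementation may use different units or types.

```python
from dataclasses import dataclass


@dataclass
class JbmFrame:
    """One frame as handed from the RTP Depacker to the JBM (sketch)."""
    payload: bytes           # frame payload data
    payload_size_bits: int   # size of the payload (unit is an assumption)
    arrival_time_ms: int     # arrival timestamp of the containing RTP packet
    media_timestamp: int     # in RTP timescale units
    media_duration: int      # in RTP timescale units
    timescale: int           # RTP timescale in Hz, from the payload format
    sequence_number: int     # RTP sequence number of the containing packet
    is_sid: bool             # SID flag
    is_partial_copy: bool    # partial copy flag

    def duration_ms(self) -> float:
        # Convert the media duration from timescale units to milliseconds.
        return self.media_duration * 1000 / self.timescale
```

With a timescale of 16000 and a media duration of 320 units (both assumed values), duration_ms() gives 20 ms, consistent with the frame length stated above.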

To optimize the JBM behaviour for DTX, the JBM needs to be aware of SID frames. How this information is determined depends on the underlying audio codec. To keep the JBM independent of the audio codec, the SID flag is therefore determined outside the JBM and fed to it with each frame. For the EVS, AMR and AMR-WB codecs, the SID flag can be determined from the size of the frame payload data.
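A size-based SID check could look like the sketch below. The table of SID payload sizes is an assumption made for illustration (48 bits for EVS Primary, 40 bits for AMR-WB, 39 bits for AMR) and should be verified against the respective codec and payload format specifications; the function and table names are hypothetical.

```python
# Assumed SID payload sizes in bits; verify against the codec specifications.
SID_SIZE_BITS = {
    "EVS": 48,
    "AMR-WB": 40,
    "AMR": 39,
}


def is_sid_frame(codec: str, payload_size_bits: int) -> bool:
    """Return True if a frame of this size is a SID frame for the codec."""
    if codec not in SID_SIZE_BITS:
        raise ValueError(f"unknown codec: {codec}")
    # A frame is treated as SID purely because its size matches the
    # codec's SID size; no payload bits are inspected.
    return payload_size_bits == SID_SIZE_BITS[codec]
```

Keeping this check in the depacketization layer means the JBM only ever sees a boolean flag and needs no codec-specific knowledge.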

Audio encoders supporting DTX typically output NO_DATA frames between SID frames to signal that a frame was not encoded because it does not contain an active signal and should be substituted with comfort noise by the audio decoder. Instead of relying on NO_DATA frames, this JBM solution uses the RTP timestamp for media time calculation. Therefore, the RTP Depacker should not feed NO_DATA frames into the JBM.
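The following sketch shows one way a depacker might drop NO_DATA frames while keeping the media timestamps of the remaining frames correct. It assumes that every ToC entry, including a NO_DATA entry, occupies one frame slot in media time, that a NO_DATA entry is represented as None, and that one frame lasts 320 RTP timescale units; the function name is hypothetical.

```python
def frames_for_jbm(rtp_timestamp, toc_payloads, frame_duration=320):
    """Return (media_timestamp, payload) pairs for all non-NO_DATA frames.

    toc_payloads   -- frame payloads in ToC order; None marks NO_DATA
                      (this representation is an assumption of the sketch)
    frame_duration -- frame length in RTP timescale units (assumed 320)
    """
    frames = []
    for index, payload in enumerate(toc_payloads):
        if payload is None:
            # NO_DATA: nothing is fed to the JBM, but the slot still
            # advances media time for the frames that follow.
            continue
        media_timestamp = (rtp_timestamp + index * frame_duration) % 2**32
        frames.append((media_timestamp, payload))
    return frames
```

For a payload with the entries frame, NO_DATA, frame and an RTP timestamp of 1000, only two frames are produced, with media timestamps 1000 and 1640. The gap between them carries the information that the NO_DATA frame would otherwise have conveyed.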

The JBM itself handles packet reordering and duplication on the network, so the RTP Depacker can feed frames into the JBM exactly as received. A typical RTP Depacker implementation can therefore be stateless.