5.8 Channel Aware Coding
26.4453GPPCodec for Enhanced Voice Services (EVS)Detailed algorithmic descriptionRelease 15TS
5.8.1 Introduction
EVS offers partial redundancy based error robust channel aware mode at 13.2 kbps for both wideband and super-wideband audio bandwidths. Depending on the criticality of the frame, the partial redundancy is dynamically enabled or disabled for a particular frame, while keeping a fixed bit budget of 13.2 kbps.
5.8.2 Principles of Channel Aware Coding
In a VoIP system, packets arrive at the decoder with random jitters in their arrival time. Packets may also arrive out of order at the decoder. Since the decoder expects to be fed a speech packet every 20 msec to output speech samples in periodic blocks, a de-jitter buffer [7] is required to absorb the jitter in the packet arrival time. Larger the size of the de-jitter buffer, the better is its ability to absorb the jitter in the arrival time and consequently, fewer late arriving packets are discarded. Voice communications is also a delay critical system and therefore it becomes essential to keep the end to end delay as low as possible so that a two way conversation can be sustained.
The design of an adaptive de-jitter buffer reflects the above mentioned trade-offs. While attempting to minimize packet losses, the jitter buffer management algorithm in the decoder also keeps track of the delay in packet delivery as a result of the buffering. The jitter buffer management algorithm suitably adjusts the depth of the de-jitter buffer in order to achieve the trade-off between delay and late losses.
EVS channel aware mode uses partial redundant copies of current frames along with a future frame for error concealment. The partial redundancy technology transmits partial copies of the current frame along with a future frame with the hope that in the event of the loss of the current frame (either due to network loss or late arrival) the partial copy from the future frame can be retrieved from the jitter buffer to improve the recovery from the loss.
The difference in time units between the transmit time of the primary copy of a frame and the transmit time of the redundant copy of the frame (piggy backed onto a future frame) is called the FEC offset. If the depth of the jitter buffer at any given time is at least equal to the FEC offset, then it is quite likely that the future frame is available in the de-jitter buffer at the current time instance. The FEC offset is a configurable parameter at the encoder which can be dynamically adjusted depending on the network conditions. The concept of partial redundancy in EVS with FEC offset equal to 3 is shown in figure .
Figure 87 : Concept of partial redundancy in channel aware mode
The redundant copy is only a partial copy that includes just a subset of parameters that are most critical for decoding or arresting error propagation.
The EVS channel aware mode transmits redundancy in-band as part of the codec payload as opposed to transmitting redundancy at the transport layer (e.g., by including multiple packets in a single RTP payload). Including the redundancy in-band allows the transmission of redundancy to be either channel controlled (e.g., to combat network congestion) or source controlled. In the latter case, the encoder can use properties of the input source signal to determine which frames are most critical for high quality reconstruction at the decoder and selectively transmit redundancy for those frames only. Another advantage of in-band redundancy is that source control can be used to determine which frames of input can best be coded at a reduced frame rate in order to accommodate the attachment of redundancy without altering the total packet size. In this way, the channel aware mode includes redundancy in a constant-bit-rate channel (13.2 kbps).
5.8.3 Bit-Rate Allocation for Primary and Partial Redundant Frame Coding
5.8.3.1 Primary frame bit-rate reduction
A measure of compressibility of the primary frame is used to determine which frames can best be coded at a reduced frame rate. For TCX frame the 9.6kpbs setup is applied for WB as well as for SWB. For ACELP the following apply. The coding mode decision coming from the signal classification algorithm is first checked. Speech frames classified for Unvoiced Coding (UC) or Voiced Coding (VC) are suitable for compression. For Generic Coding (GC) mode, the correlation (at pitch lag) between adjacent sub-frames within the frame is used to determine compressibility. Primary frame coding of upper band signal (i.e., from 6.4 to 14.4 kHz in SWB and 6.4 to 8 kHz in WB) in channel aware mode uses time-domain bandwidth extension (TBE) as described in subclause 5.2.6.1. For SWB TBE in channel aware mode, a scaled down version of the non-channel aware mode framework is used to obtain a reduction of bits used for the primary frame. The LSF quantization is performed using an 8-bit vector quantization in channel aware mode while a 21-bit scalar quantization based approach is used in non-channel aware mode. The SWB TBE primary frame gain parameters in channel aware mode are encoded similar to that of non-channel aware mode at 13.2 kbps, i.e., 8 bits for gain parameters. The WB TBE in channel aware mode uses similar encoding as used in 9.6 kbps WB TBE of non-channel aware mode, i.e., 2 bits for LSF and 4 bits for gain parameters.
5.8.3.2 Partial Redundant Frame Coding
The size of the partial redundant frame is variable and depends on the characteristics of the input signal. Also criticality measure is an important metric. A frame is considered as critical to protect when loss of the frame would cause significant impact to the speech quality at the receiver. The criticality also depends on if the previous frames were lost or not. For example, a frame may go from being non-critical to critical if the previous frames were also lost. Parameters computed from the primary copy coding such as coder type classification information, subframe pitch lag, factor M etc are used to measure the criticality of a frame. The threshold, to determine whether a particular frame is critical or not, is a configurable parameter at the encoder which can be dynamically adjusted depending on the network conditions. For example, under high FER conditions it may be desirable to adjust the threshold to classify more frames as critical. Partial frame coding of upper band signal relies on coarse encoding of gain parameters and interpolation/extrapolation of LSF parameters from primary frame. The TBE gain parameters estimated during the primary frame encoding of the (n – FEC offset )-th frame is re-transmitted during the n-th frame as partial copy information. Depending on the partial frame coding mode, i.e., GENERIC or VOICED or UNVOICED, the re-transmission of the gain frame, uses different quantization resolution and gain smoothing.
The following sections describe the different partial redundant frame types and their composition.
5.8.3.2.1 Construction of partial redundant frame for Generic and Voiced Coding modes
In the coding of the redundant version of the frame, a factor is determined based on the adaptive and fixed codebook energy.
()
In this equation, denotes the adaptive codebook energy and
denotes the fixed codebook energy. A low value of
indicates that most of the information in the current frame is carried by the fixed codebook contribution. In such cases, the partial redundant copy (RF_NOPRED) is constructed using one or more fixed codebook parameters only (FCB pulses and gain). A high value of M indicates that most of the information in the current frame is carried by the adaptive codebook contribution. In such cases, the partial redundant copy (RF_ALLPRED) is constructed using one or more adaptive codebook parameters only (pitch lag and gain). If M takes mid values then a mixed coding mode is selected where one or more adaptive codebook parameters and one or more fixed codebook parameters are coded (RF_GENPRED). Under Generic and Voiced Coding modes, the TBE gain frame values are typically low and demonstrate less variance. Hence a coarse TBE gain frame quantization with gain smoothing is used.
5.8.3.2.2 Construction of partial redundant frame for Unvoiced Coding mode
The low bit-rate Noise Excited Linear Prediction coding scheme (subclause 5.2.5.4) is used to construct a partial redundant copy for an unvoiced frame type (RF_NELP). In Unvoiced coding mode, the TBE gain frame has a wider dynamic range. To preserve this dynamic range, the TBE gain frame quantization in Unvoiced coding mode uses a similar quantization range as that of the one used in the primary frame.
5.8.3.2.3 Construction of partial redundant frame for TCX frame
In case of TCX partial redundant frame type, a partial copy consisting of some helper parameters is used to enhance the frame loss concealment algorithm. There are three different partial copy modes available, which are RF_TCXFD, RF_TCXTD1 and RF_TCX_TD2. Similar to the PLC mode decision on the decoder side, the selection of the partial copy mode for TCX is based on various parameters such as the mode of the last two frames, the frame class, LTP pitch and gain.
5.8.3.2.3.1 Frequency domain concealment (RF_TCXFD) partial redundant frame type
29 bits are used for the RF_TCXFD partial copy mode.
• 13bits are used for the LSF quantizer which is the same as used for regular low rate TCX coding.
• The global TCX gain is quantized using 7 bits.
• The classifier info is coded on 2 bits.
5.8.3.2.3.2 Time domain concealment (RF_TCXTD1 and RF_TCXTD2) partial redundant frame type
The partial copy mode RF_TCXTD1 is selected if the frame contains a transient or if the global gain of the frame is much lower than the global gain of the previous frame. Otherwise RF_TCXTD2 is chosen.
Overall 18bits of side data are used for both modes.
• 9bits are used to signal the TCX LTP lag
• 2 bits for signalling the classifier info
5.8.3.2.4 RF_NO_DATA partial redundant frame type
This is used to signal a configuration where the partial redundant copy is not sent and all bits are used towards primary frame coding.
The primary frame bit-rate reduction and partial redundant frame coding mechanisms together determine the bit-rate allocation between the primary and redundant frames to be included within a 13.2 kbps payload.
5.8.3.3 Decoding
At the receiver, the de-jitter buffer provides a partial redundant copy of the current lost frame if it is available in any of the future frames. If present, the partial redundant information is used to synthesize the lost frame. In the decoding, the partial redundant frame type is identified and decoding performed based on whether only one or more adaptive codebook parameters, only one or more fixed codebook parameters, or one or more adaptive codebook parameters and one or more fixed codebook parameters, TCX frame loss concealment helper parameters, or Noise Excited Linear Prediction parameters are coded. If either the current frame or the previous frame adjacent to the current frame is a partial redundant frame, then the decoding parameter of the current frame, such as the LSP parameters, the gain of the adaptive codebook, the gain of the fixed codebook or the BWE gain, is firstly obtained and then post-processed according to the decoding parameters or signal type from previous frames relative to the current frame position and the decoding parameters or signal type from the current frame. Additionally, the decoding parameters or signal type from the future frames relative to the current frame position may also be used for the post-processing of the gains of the adaptive codebook and the fixed codebook, and the spectral tilt may also be used for the post-processing of the gain of the adaptive codebook. The post-processed parameters are used to reconstruct the output signal. Finally, the frame is reconstructed based on the coding scheme. The TCX partial info is decoded, but in contrast to ACELP partial copy mode, the decoder is run in concealment mode. The difference to regular concealment is just that the parameters available from the bitstream are directly used and not derived by concealment.
5.8.4 Channel aware mode encoder configurable parameters
The channel aware mode encoder may use the following configurable parameters to adapt its operation to track the channel characteristics seen at the receiver. These parameters maybe computed at the receiver (as described in subclause 6.3) and communicated to the encoder via a receiver triggered feedback mechanism.
Partial redundancy offset (): The difference in time units between the transmit time of the primary copy of a frame
and the transmit time of the redundant copy of that frame which is piggy backed onto a future frame
is called the partial redundancy offset or FEC offset
. The optimal partial redundancy offset that maximizes the probability of partial redundant copy retrival may be computed in the decoder JBM solution and fed back to the encoder as described above.
Frame erasure rate indicator () having the following values:
(low) for FER rates <5% or
(high) for FER>5%. This parameter controls the threshold used to determine whether a particular frame is critical or not as described in subclause 5.8.3.2. Such an adjustment of the criticality threshold is used to control the frequency of partial copy transmission. The
setting adjusts the criticality threshold to classify more frames as critical to transmit as compared to the
setting.
It is noted that these encoder configurable parameters are optional with default set to =
and
=3.