5 Functions on the transmit (TX) side

3GPP TS 26.192, Release 17: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Comfort noise aspects


The comfort noise evaluation algorithm uses the following parameters of the AMR-WB speech encoder, defined in [2]:

– the unquantized Linear Prediction (LP) parameters, using the Immittance Spectral Pair (ISP) representation, where the unquantized Immittance Spectral Frequency (ISF) vector is given by $\mathbf{f} = [f_0, f_1, \ldots, f_{15}]$;

The algorithm computes the following parameters to assist in comfort noise generation:

– the weighted averaged ISF parameter vector $\mathbf{f}^{\mathrm{mean}}$ (weighted average of the ISF parameters of the eight most recent frames);

– the averaged logarithmic frame energy $en_{\log}^{\mathrm{mean}}$ (average of the logarithmic energy of the eight most recent frames).

These parameters give information on the level ($en_{\log}^{\mathrm{mean}}$) and the spectrum ($\mathbf{f}^{\mathrm{mean}}$) of the background noise.

The evaluated comfort noise parameters ($\mathbf{f}^{\mathrm{mean}}$ and $en_{\log}^{\mathrm{mean}}$) are encoded into a special frame, called a Silence Descriptor (SID) frame, for transmission to the RX side.

A hangover logic is used to enhance the quality of the silence descriptor frames. A hangover of seven frames is added to the VAD flag, so that the coder delays the switch from active to inactive mode by seven frames. During this time, the decoder can compute a silence descriptor frame from the quantized ISFs and the logarithmic frame energy of the decoded speech signal. Therefore, no comfort noise description is transmitted in the first SID frame after active speech. If the background noise contains transients that cause the coder to switch to active mode and then back to inactive mode within a very short time, no hangover is used; instead, the previously used comfort noise frames are used for comfort noise generation.

The first SID frame also serves to initiate the comfort noise generation on the receive side, as a first SID frame is always sent at the end of a speech burst, i.e., before the transmission is terminated.
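The seven-frame hangover described above can be pictured as a per-frame counter. The Python below is a minimal illustrative sketch, not the codec's implementation: the function name and the labels "SPEECH", "SID_FIRST" and "INACTIVE" are hypothetical, inactive frames before the first speech frame are assumed to be "INACTIVE", all frames after SID_FIRST are lumped together because their scheduling is defined in [4], and the short-burst exception (no hangover after brief transients) is not modelled.

```python
from typing import List

HANGOVER_FRAMES = 7  # hangover length stated in the text


def tx_frame_types(vad_flags: List[int], hangover: int = HANGOVER_FRAMES) -> List[str]:
    """Label each frame from its VAD flag (hypothetical labels, simplified logic)."""
    types: List[str] = []
    inactive_run = None  # None until the first speech frame has been seen
    for vad in vad_flags:
        if vad:
            inactive_run = 0
            types.append("SPEECH")
        elif inactive_run is None:
            # Assumption: inactive frames before any speech are simply "INACTIVE".
            types.append("INACTIVE")
        else:
            inactive_run += 1
            if inactive_run <= hangover:
                types.append("SPEECH")     # hangover frame, still coded as speech
            elif inactive_run == hangover + 1:
                types.append("SID_FIRST")  # first SID frame ends the speech burst
            else:
                types.append("INACTIVE")   # SID update / no-data scheduling, see [4]
    return types
```

For example, two active frames followed by nine inactive ones yield nine "SPEECH" frames (two active plus seven hangover), then "SID_FIRST", then "INACTIVE".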

The scheduling of SID or speech frames on the network path is described in [4].

5.1 ISF evaluation

The comfort noise parameters to be encoded into a SID frame are calculated over N=8 consecutive frames marked with VAD=0, as follows:

Prior to averaging the ISF parameters over the CN averaging period, a median replacement is performed on the set of ISF parameters to be averaged, to remove the parameters which are not characteristic of the background noise on the transmit side. First, the spectral distances from each of the ISF parameter vectors $\mathbf{f}_i$ to the other ISF parameter vectors $\mathbf{f}_j$, $i = 0, \ldots, 7$, $j = 0, \ldots, 7$, $i \neq j$, within the CN averaging period are approximated according to the equation:

$$\Delta_{ij} = \sum_{k=0}^{15} \left( f_i(k) - f_j(k) \right)^2, \tag{1}$$

where $f_i(k)$ is the $k$th ISF parameter of the ISF parameter vector $\mathbf{f}_i$ at frame $i$.

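To find the spectral distance $\Delta S_i$ of the ISF parameter vector $\mathbf{f}_i$ to the ISF parameter vectors $\mathbf{f}_j$ of all the other frames $j = 0, \ldots, 7$, $j \neq i$, within the CN averaging period, the sum of the spectral distances is computed as follows:

$$\Delta S_i = \sum_{j=0,\, j \neq i}^{7} \Delta_{ij} \tag{2}$$

for all $i = 0, \ldots, 7$.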

The ISF parameter vector with the smallest spectral distance $\Delta S_i$ of all the ISF parameter vectors within the CN averaging period is considered as the median ISF parameter vector of the averaging period, and its spectral distance is denoted as $\Delta S_{\mathrm{med}}$. The median ISF parameter vector is considered to contain the best representation of the short-term spectral detail of the background noise of all the ISF parameter vectors within the averaging period. If there are ISF parameter vectors within the CN averaging period with

$$\frac{\Delta S_i}{\Delta S_{\mathrm{med}}} > TH_{\mathrm{med}}, \tag{3}$$

where $TH_{\mathrm{med}}$ is the median replacement threshold, then at most two of these ISF parameter vectors (the ISF parameter vectors causing $TH_{\mathrm{med}}$ to be exceeded the most) are replaced by the median ISF parameter vector prior to computing the averaged ISF parameter vector $\mathbf{f}^{\mathrm{mean}}$.

The set of ISF parameter vectors obtained as a result of the median replacement is denoted as $\mathbf{f}'(n-i)$, where $n$ is the index of the current frame and $i$ is the averaging period index ($i = 0, \ldots, 7$).

When the median replacement is performed at the end of the hangover period (first CN update), all of the ISF parameter vectors of the 7 previous frames (the hangover period, i=1,…,7) have quantized values, while the ISF parameter vector at the most recent frame n has unquantized values. In the subsequent CN updates, the ISF parameter vectors of the CN averaging period in the frames overlapping with the hangover period have quantized values, while the parameter vectors of the more recent frames of the CN averaging period have unquantized values. When the period of the eight most recent frames is non-overlapping with the hangover period, the median replacement of ISF parameters is performed using only unquantized parameter values.
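The median replacement can be sketched as follows. This is an illustrative floating-point version, not the reference implementation: the function name is hypothetical, $TH_{\mathrm{med}}$ is passed in as `th_med` because its value is not given in this subclause, ties for the median are resolved by taking the first frame, and the caller is assumed to supply the eight vectors already mixed as quantized or unquantized exactly as described above.

```python
from typing import List, Sequence


def median_replacement(isf: Sequence[Sequence[float]], th_med: float,
                       max_replaced: int = 2) -> List[List[float]]:
    """Replace up to `max_replaced` outlier ISF vectors by the median ISF vector."""
    n = len(isf)
    # Eqs. (1) and (2): summed squared distances Delta S_i.
    ds = [sum(sum((a - b) ** 2 for a, b in zip(isf[i], isf[j]))
              for j in range(n) if j != i)
          for i in range(n)]
    med = min(range(n), key=lambda i: ds[i])  # median vector: smallest Delta S_i
    ds_med = ds[med]
    out = [list(v) for v in isf]
    if ds_med == 0:
        return out  # every vector equals the median vector: nothing to replace
    # Eq. (3): vectors whose ratio exceeds the threshold, most extreme first.
    candidates = [i for i in range(n) if ds[i] / ds_med > th_med]
    candidates.sort(key=lambda i: ds[i], reverse=True)
    for i in candidates[:max_replaced]:
        out[i] = list(isf[med])
    return out
```

Sorting the candidates by $\Delta S_i$ before truncating to two implements "the vectors causing the threshold to be exceeded the most", since $\Delta S_{\mathrm{med}}$ is common to every ratio.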

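The averaged ISF parameter vector $\mathbf{f}^{\mathrm{mean}}(n)$ at frame $n$ shall be computed according to the equation:

$$\mathbf{f}^{\mathrm{mean}}(n) = \frac{1}{8} \sum_{i=0}^{7} \mathbf{f}'(n-i), \tag{4}$$

where $\mathbf{f}'(n-i)$ is the ISF parameter vector of one of the eight most recent frames ($i = 0, \ldots, 7$) after performing the median replacement, $i$ is the averaging period index, and $n$ is the frame index.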

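The averaged ISF parameter vector $\mathbf{f}^{\mathrm{mean}}(n)$ at frame $n$ is quantized using the comfort noise ISF quantization tables. The mean-removed ISF vector to be quantized is obtained according to the following equation:

$$\mathbf{r}^{\mathrm{mean}}(n) = \mathbf{f}^{\mathrm{mean}}(n) - \mathbf{f}^{\mathrm{dc}}, \tag{5}$$

where $\mathbf{f}^{\mathrm{mean}}(n)$ is the averaged ISF parameter vector at frame $n$, $\mathbf{f}^{\mathrm{dc}}$ is the constant mean ISF vector, $\mathbf{r}^{\mathrm{mean}}(n)$ is the computed mean-removed ISF vector at frame $n$, and $n$ is the frame index.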
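The averaging and mean removal steps are simple element-wise operations. The sketch below assumes that equation (4) is a plain arithmetic mean over the frames of the averaging period; the function names are hypothetical, and the constant mean ISF vector belongs to the codec's tables and is not reproduced here, so `[1.0, 1.0]` in the example is only a placeholder.

```python
from typing import List, Sequence


def average_isf(isf: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (4): element-wise mean of the median-replaced ISF vectors f'(n-i)."""
    n = len(isf)  # eight frames in the codec
    return [sum(col) / n for col in zip(*isf)]


def remove_mean(f_mean: Sequence[float], f_dc: Sequence[float]) -> List[float]:
    """Eq. (5): subtract the constant mean ISF vector before quantization."""
    return [a - b for a, b in zip(f_mean, f_dc)]


f_mean = average_isf([[1.0, 2.0], [3.0, 4.0]])
print(f_mean, remove_mean(f_mean, [1.0, 1.0]))  # [2.0, 3.0] [1.0, 2.0]
```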

5.2 Frame energy calculation

The frame energy is computed for each frame marked with VAD=0 according to the equation:

$$en_{\log}(i) = \log_2 \left( \frac{1}{N} \sum_{m=0}^{N-1} s^2(m) \right), \tag{6}$$

where $s(m)$ is the high-pass-filtered input speech signal of the current frame $i$, and $N$ is the number of samples in the frame. The energy is also adjusted according to the signalled speech mode capabilities, so as to provide high-quality transitions from comfort noise to speech.
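A per-frame log energy can be sketched as below, assuming equation (6) takes the base-2 logarithm of the mean-square value of the frame. The function name and the `ENERGY_FLOOR` constant (which keeps an all-zero frame finite) are hypothetical, the mode-dependent energy adjustment mentioned above is not modelled, and the 256-sample frame in the example is an assumption; any frame length works.

```python
import math
from typing import Sequence

ENERGY_FLOOR = 1e-10  # hypothetical floor so that log2 of a silent frame stays finite


def frame_log_energy(s: Sequence[float]) -> float:
    """Log2 of the mean-square value of one frame (assumed form of eq. (6)).

    The mode-dependent adjustment described in the text is not applied.
    """
    mean_square = sum(x * x for x in s) / len(s)
    return math.log2(max(mean_square, ENERGY_FLOOR))


print(frame_log_energy([2.0] * 256))  # 2.0
```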

The averaged logarithmic energy is computed by:

$$en_{\log}^{\mathrm{mean}}(n) = \frac{1}{8} \sum_{i=0}^{7} en_{\log}(n-i). \tag{7}$$

The averaged logarithmic energy is quantized using a 6-bit arithmetic quantizer. The 6 bits for the energy index are transmitted in the SID frame (see the bit allocation in Table 1).
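The averaging of equation (7) and a 6-bit scalar quantizer can be sketched together. The quantizer here is a hypothetical uniform one: the range constants `EN_MIN` and `EN_MAX` (log2-energy values from -2 to 22 spread over indices 0 to 63) are illustrative assumptions, not values quoted from this subclause, and the function names are likewise hypothetical.

```python
import math
from typing import Sequence

# Hypothetical 6-bit uniform quantizer: log2-energy values in [EN_MIN, EN_MAX]
# are mapped onto indices 0..63 (illustrative constants only).
EN_MIN = -2.0
EN_MAX = 22.0
STEPS_PER_UNIT = 63 / (EN_MAX - EN_MIN)  # 2.625 indices per log2 unit


def average_log_energy(en_log: Sequence[float]) -> float:
    """Eq. (7): mean of the per-frame log energies of the averaging period."""
    return sum(en_log) / len(en_log)


def quantize_log_energy(en_mean: float) -> int:
    """Map the averaged log energy to a 6-bit index, clamped to 0..63."""
    index = math.floor((en_mean - EN_MIN) * STEPS_PER_UNIT)
    return min(max(index, 0), 63)


def dequantize_log_energy(index: int) -> float:
    """Lower edge of the quantization cell for `index`."""
    return EN_MIN + index / STEPS_PER_UNIT


print(quantize_log_energy(average_log_energy([4.0] * 8)))  # 15
```

Clamping keeps every index within the 6 bits reserved for the energy in the SID frame, whatever the input level.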

5.3 Analysis of the variation and stationarity of the background noise

The encoder first determines how stationary the background noise is. Dithering is employed for non-stationary background noise. Whether or not to use dithering is signalled to the decoder with a binary flag, the CN_dith flag.

The value of the CN_dith flag is found by using the spectral distance $\Delta S_i$ of the spectral parameter vector $\mathbf{f}_i$ to the spectral parameter vectors $\mathbf{f}_j$ of all the other frames j = 0, …, l_dtx−1, j ≠ i, within the CN averaging period (l_dtx). The computation of the spectral distance is described in subclause 5.1. A sum of spectral distances, D_S, is then computed. If D_S is small, the CN_dith flag is set to 0; otherwise, it is set to 1. Additionally, the variation of energy between frames is examined: the sum of the absolute deviations of en_log(i) from the average of en_log(i) is computed. If this sum is large, the CN_dith flag is set to 1, even if it was previously set to 0.
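The CN_dith decision can be sketched as two threshold tests. This is an illustrative sketch under stated assumptions: D_S is taken to be the sum of $\Delta S_i$ over all frames of the averaging period, "small" and "large" are represented by two hypothetical parameters, `dist_threshold` and `energy_threshold`, whose values are not given here, and the function name is hypothetical.

```python
from typing import Sequence


def cn_dith_flag(isf: Sequence[Sequence[float]], en_log: Sequence[float],
                 dist_threshold: float, energy_threshold: float) -> int:
    """Return 1 if dithering should be used (non-stationary noise), else 0."""
    n = len(isf)
    # Assumed D_S: sum over i of Delta S_i, i.e. all ordered pairwise squared distances.
    d_s = sum(sum((a - b) ** 2 for a, b in zip(isf[i], isf[j]))
              for i in range(n) for j in range(n) if j != i)
    flag = 0 if d_s < dist_threshold else 1
    # Energy variation: sum of absolute deviations from the average log energy.
    en_mean = sum(en_log) / len(en_log)
    if sum(abs(e - en_mean) for e in en_log) > energy_threshold:
        flag = 1  # overrides a spectral decision of 0
    return flag
```

The energy test is applied after the spectral test so that it can only raise the flag, matching the text's "even if it was previously set to 0".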

5.4 Modification of the speech encoding algorithm during SID frame generation

When the TX_TYPE is not equal to SPEECH, the speech encoding algorithm is modified in the following way:

– The non-averaged LP parameters which are used to derive the filter coefficients of the filters and of the speech encoder are not quantized;

– The open-loop pitch lag search is performed, but the closed-loop pitch lag search is deactivated. The adaptive codebook memory is set to zero.

– No fixed codebook search is made.

– The memory of the weighting filter is set to zero, i.e., the memory of is not updated.

– The ordinary LP parameter quantization algorithm is inactive. The averaged ISF parameter vector is calculated each time a new SID frame is to be sent. This parameter vector is encoded into the SID frame as defined in subclause 5.1.

– The ordinary gain quantization algorithm is inactive.

– The predictor memories of the ordinary LP parameter quantization algorithm are initialized when TX_TYPE is not SPEECH, so that the quantizers start from known initial states when speech activity begins again.

In the 23.85 kbit/s mode, when the TX_TYPE is equal to SPEECH and VAD is OFF, the speech encoding algorithm is modified in the following way:

– The generation of the high-band gain g_HB is modified: during the non-active speech period, g_HB is adapted towards the estimated gain to ensure a smooth transition of the high-band gain. g_HB is then given by

(8)

where hang_DTX is the DTX counter.

5.5 SID-frame encoding

The encoding of the comfort noise bits in a SID frame is described in [5], where the indication of the first SID frame is also described. The bit allocation and the sequence of the bits from comfort noise encoding are shown in Table 1.