5 Functions on the transmit (TX) side

3GPP TS 46.022, Comfort noise aspects for the half rate speech traffic channels, Release 17


The comfort noise evaluation algorithm uses the following parameters of the GSM half rate speech encoder, defined in GSM 06.20 [2]:

– the unquantized frame energy value R0;

– the unquantized (normalized) autocorrelation sequence R(i) derived from the optimal reflection coefficients rj;

– the quantized energy tweak parameter GS.

These parameters give information on the level (R0 and GS) and the spectrum (R(i)) of the background noise.

Two of the evaluated comfort noise parameters (R0 and R(i)) are encoded into a special frame, called a SIlence Descriptor (SID) frame, for transmission to the RX side. The energy tweak parameter GS can be evaluated in the encoder and in the decoder in the same way, as given in subclause 5.1, so no transmission of GS is necessary.

The SID frame also serves to initiate the comfort noise generation on the RX side, as a SID frame is always sent at the end of a speech burst, i.e. before the radio transmission is terminated.

The scheduling of SID or speech frames on the radio path is described in GSM 06.41 [3].

5.1 Background acoustic noise evaluation

The comfort noise parameters to be encoded into a SID frame are calculated over 8 consecutive frames marked with Voice Activity Detector (VAD) flag = "0", as follows:

The frame energy values shall be averaged according to the equation:

$$\operatorname{mean}(R0[j]) = \frac{1}{8} \sum_{n=0}^{7} R0[j-n]$$

where:

R0[j] is the frame energy value of the current frame j (n=0);

R0[j-n] is the frame energy of the previous frames (n=1,…,7);

n is the averaging period index n=0,1,…,7;

j is the frame index.

The averaged value mean(R0[j]) is encoded using the same encoding table that is also used by the GSM half rate speech codec for the encoding of the non-averaged R0 values in ordinary speech encoding mode.
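The frame energy averaging described above can be sketched as a sliding window over the last 8 frames. This is only an illustrative sketch: the class name `R0Averager` and its methods are hypothetical, and floating point is used for readability, whereas the codec in GSM 06.20 [2] is specified in fixed-point arithmetic.

```python
from collections import deque


class R0Averager:
    """Hypothetical helper: keeps the most recent unquantized R0 values.

    Assumption: plain floats stand in for the codec's fixed-point values.
    """

    def __init__(self, period=8):
        self.period = period
        # deque with maxlen drops the oldest value automatically
        self.history = deque(maxlen=period)

    def push(self, r0):
        """Store the unquantized frame energy of the newest frame j."""
        self.history.append(r0)

    def mean(self):
        """Return 1/8 * sum of R0[j-n] for n = 0..7."""
        if len(self.history) < self.period:
            raise ValueError("need %d frames before averaging" % self.period)
        return sum(self.history) / self.period
```

Calling `push()` once per frame and `mean()` when a SID frame is due yields the value that is then quantized with the ordinary R0 encoding table.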

The (normalized) autocorrelation sequence R(i) shall be averaged according to the equation:

$$\operatorname{mean}(R[j](i)) = \frac{1}{8} \sum_{n=0}^{7} R[j-n](i), \qquad i = 0, 1, 2, \ldots, 10$$

where:

R[j](i) is the i’th autocorrelation value of the current frame j (n=0);

R[j-n](i) is the i’th autocorrelation value of one of the previous frames (n=1,…,7);

n is the averaging period index n=0,1,…,7;

j is the frame index.

The averaged values mean(R[j](i)) are used as input parameters of the Autocorrelation Fixed Point LAttice Technique (AFLAT) recursion algorithm which calculates the Vector Quantization (VQ) indices of the reflection coefficients, see GSM 06.20 [2].
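The autocorrelation averaging is applied separately to each of the 11 lags. A minimal sketch follows, assuming each frame's sequence is available as a list of 11 floats; the function name `mean_autocorrelation` is hypothetical, and the AFLAT recursion that consumes the result is not shown.

```python
def mean_autocorrelation(frames):
    """Average R(i), i = 0..10, over the 8 most recent frames.

    frames: list of 8 sequences, each holding the 11 normalized
    autocorrelation values of one frame (assumed layout).
    """
    if len(frames) != 8:
        raise ValueError("exactly 8 frames are required")
    if any(len(r) != 11 for r in frames):
        raise ValueError("each frame must provide R(0)..R(10)")
    # Lag-wise mean: 1/8 * sum over n of R[j-n](i)
    return [sum(r[i] for r in frames) / 8 for i in range(11)]
```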

The SID frame containing the quantization index of mean(R0[j]), the VQ indices of mean(R[j](i)) and the SID codeword is passed to the radio subsystem instead of frame number j (see subclause 5.3, SID-frame encoding).

The averaging of the energy tweak parameters GS is made on the basis of the quantized GS parameters. The quantized GS parameters can be derived from the GSP0 indices. These indices are used as pointers to the GSP0 vector quantization codebook. The GS components of the selected GSP0 vectors are the quantized GS values which will be averaged.

The quantized energy tweak parameters GS shall be averaged according to the equation:

$$\operatorname{mean}(GS[j]) = \frac{1}{28} \sum_{n=1}^{7} \sum_{i=1}^{4} GS[j-n](i)$$

where:

GS[j](i) is the quantized energy tweak parameter in subframe i of the current frame j (n=0);

GS[j-n](i) is the quantized energy tweak parameter in subframe i of one of the last frames (n=1,…,7);

n is the averaging period index n=1,2,…,7;

i is the subframe index i=1,2,3,4;

j is the frame index.

NOTE: The averaging of GS is made over 7 frames only.
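The GS averaging can be sketched as a codebook lookup followed by a mean over 7 frames of 4 subframes each (28 values). The codebook layout used here, a list of (GS, P0) pairs indexed by the GSP0 index, and the function name `mean_gs` are assumptions made for illustration; the actual codebook is defined in GSM 06.20 [2].

```python
def mean_gs(gsp0_indices_per_frame, codebook):
    """Average the quantized GS values of the last 7 frames.

    gsp0_indices_per_frame: 7 lists, each with the 4 GSP0 indices of
        one frame (frames j-1 .. j-7).
    codebook: list of (gs, p0) pairs; hypothetical layout.
    """
    if len(gsp0_indices_per_frame) != 7:
        raise ValueError("exactly 7 frames are required")
    total = 0.0
    for frame in gsp0_indices_per_frame:
        if len(frame) != 4:
            raise ValueError("each frame must have 4 subframes")
        # The GS component of the selected GSP0 vector is the quantized GS
        total += sum(codebook[k][0] for k in frame)
    return total / 28
```

Because only the GSP0 indices are needed as input, the same routine could run unchanged at the decoder.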

For each comfort noise insertion period, the averaging of the GS parameters is done only once, before the first SID frame is sent to the decoder. For the rest of the comfort noise insertion period, the averaged value mean(GS[j]) is frozen.

Under normal conditions, the GS parameters are averaged during the hangover period. When short speech bursts are handled, however, the hangover period can be skipped under certain conditions, see GSM 06.41 [3]. In such cases, the GS parameters of the last seven speech frames marked with SP flag="1" are averaged.

The hangover period is defined in GSM 06.41 [3]. It is a period added at the end of a speech burst in which no voice activity is detected (VAD flag="0"), but the speech encoder remains in speech encoding mode (SP flag="1") for the processing of 7 speech frames. This hangover period and the first SID frame are used for averaging the comfort noise parameters contained in the first SID frame.

mean(GS[j]) can be evaluated at the decoder in the same way as in the encoder, because in both the encoder and decoder, the GSP0 indices of the last 7 speech frames shall be kept in memory. In case of an error free transmission, the GSP0 indices are identical at the encoder and decoder.

5.2 Modification of the speech encoding algorithm during SID frame generation

When the SP flag is equal to "0", the speech encoding algorithm is modified in the following way:

– the non-averaged reflection coefficients which are used to derive the filter coefficients of the filters H(z) and W(z) of the speech encoder are not quantized;

– the unvoiced speech encoding mode is forced. This simplifies the open loop long term prediction processing: only the integer lags have to be calculated, no determination of fractional lags is necessary and the frame lag trajectory derivation can be avoided;

– no fixed codebook search is made. In each subframe, the indices of both fixed codebooks (CODE1_1, …,CODE1_4 and CODE2_1, …,CODE2_4) are replaced by pseudo random numbers uniformly distributed in [0,127] (7 bit random numbers);

– no GSP0 determination is made. The GSP0 codeword is selected as follows:

– at the beginning of a comfort noise insertion period, mean(GS[j]) is calculated as defined in subclause 5.1. Then mean(GS[j]) is quantized, using only the GS component of the GSP0 vector quantization codebook of the unvoiced speech encoding mode as quantization table. The P0 parameter is not averaged. Instead, the P0 value associated with the quantized mean(GS[j]) value in the GSP0 codebook of the unvoiced speech encoding mode is used. For the rest of the comfort noise insertion period, the GSP0 indices are frozen.
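Two of the modifications listed above can be sketched in a few lines: selecting the GSP0 index from mean(GS[j]) and replacing the fixed codebook indices by 7 bit pseudo random numbers. The sketch makes several assumptions that are not taken from the specification: the codebook is a list of (GS, P0) pairs, the nearest GS value is found by absolute difference, and Python's `random.Random` stands in for the pseudo random generator of the codec. The function names are hypothetical.

```python
import random


def select_gsp0_index(mean_gs_value, unvoiced_codebook):
    """Pick the GSP0 index whose GS component is closest to mean(GS[j]).

    Only the GS component is compared; P0 is taken as whatever value
    is stored with the selected entry. The distance measure (absolute
    difference) is an assumption.
    """
    return min(
        range(len(unvoiced_codebook)),
        key=lambda k: abs(unvoiced_codebook[k][0] - mean_gs_value),
    )


def random_code_indices(rng, subframes=4):
    """Return (CODE1, CODE2) index pairs, each uniform in [0, 127]."""
    return [(rng.randrange(128), rng.randrange(128)) for _ in range(subframes)]


# Illustrative values only; not the GSM 06.20 codebook.
example_codebook = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
frozen_gsp0 = select_gsp0_index(0.45, example_codebook)
codes = random_code_indices(random.Random(0))
```

The selected index would be computed once at the start of a comfort noise insertion period and reused for every subframe until the period ends.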

A simplified block diagram of the GSM half rate speech encoder in comfort noise insertion mode is shown in figure 1.

Figure 1: GSM half rate speech encoder in comfort noise insertion mode

5.3 SID-frame encoding

The SID frame encoding algorithm exploits the fact that only some of the 112 bits in a frame are needed to code the comfort noise parameters. The other bits can then be used to mark the SID frame by means of a fixed bit pattern, called the SID codeword.

SID frames are encoded in the encoder output format for voiced frames (MODE = 3), because the two voicing mode bits are part of the SID codeword.

The index of the frame energy value R0 is replaced by the quantization index derived from mean(R0[j]). mean(R0[j]) is defined in subclause 5.1 and is encoded as described in GSM 06.20 [2].

The VQ indices of the reflection coefficients are replaced by VQ indices derived from mean(R[j](i)). mean(R[j](i)) is defined in subclause 5.1 and the VQ of the reflection coefficients is described in GSM 06.20 [2].

The SID codeword consists of 79 bits which are all "1". To mark a frame as a SID frame, the parameters in table 1 have to be set as shown.

Table 1: SID codeword

Parameter	Number of bits	Value (Hex)
MODE	2	0x0003
INT_LPC	1	0x0001
LAG_1	8	0x00ff
LAG_2	4	0x000f
LAG_3	4	0x000f
LAG_4	4	0x000f
CODE_1	9	0x01ff
CODE_2	9	0x01ff
CODE_3	9	0x01ff
CODE_4	9	0x01ff
GSP0_1	5	0x001f
GSP0_2	5	0x001f
GSP0_3	5	0x001f
GSP0_4	5	0x001f

The parameters in table 1 are defined in GSM 06.20 [2].
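As a cross-check of table 1, the sketch below lists the codeword fields with their bit widths, confirms that they add up to 79 bits, and sets every field to all ones. Representing a frame as a dictionary of parameter values is an assumption made for illustration, as are the function names; the exact-match test shown here is not the SID frame detection procedure of the RX side.

```python
# (parameter, number of bits) as given in table 1
SID_CODEWORD_FIELDS = [
    ("MODE", 2), ("INT_LPC", 1),
    ("LAG_1", 8), ("LAG_2", 4), ("LAG_3", 4), ("LAG_4", 4),
    ("CODE_1", 9), ("CODE_2", 9), ("CODE_3", 9), ("CODE_4", 9),
    ("GSP0_1", 5), ("GSP0_2", 5), ("GSP0_3", 5), ("GSP0_4", 5),
]


def all_ones(bits):
    """Value of a field of the given width with every bit set to 1."""
    return (1 << bits) - 1


def mark_as_sid(params):
    """Return a copy of params with all SID codeword fields set to ones.

    Other entries (for example the R0 and reflection coefficient
    indices) are left untouched.
    """
    marked = dict(params)
    for name, bits in SID_CODEWORD_FIELDS:
        marked[name] = all_ones(bits)
    return marked


def has_sid_codeword(params):
    """True if every codeword field holds the all-ones pattern."""
    return all(params.get(n) == all_ones(b) for n, b in SID_CODEWORD_FIELDS)


total_bits = sum(bits for _, bits in SID_CODEWORD_FIELDS)
```

Here `total_bits` evaluates to 2 + 1 + 8 + 3×4 + 4×9 + 4×5 = 79, matching the codeword length stated in this subclause.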