6.2.3 High Quality MDCT decoder (HQ)
3GPP TS 26.445, Release 15: Codec for Enhanced Voice Services (EVS); Detailed algorithmic description
6.2.3.1 Low-rate HQ decoder
6.2.3.1.1 Mode decoding
Based on the encoded bandwidth and the operating bit rate, the mode information is decoded from 1 or 2 bits. The decoded mode information then determines decoding configurations such as the band structure. The band structure definitions for NB, WB, SWB, and FB are the same as in the encoder and are given in tables 103 to 108.
6.2.3.1.2 Energy Envelope decoding
From the received low-rate HQ envelope coding method bit and coding mode bit, the low-rate HQ energy decoding mode is determined, and the coded quantization differential indices are decoded by either the Large symbol decoding method or the Small symbol decoding method. The envelope coding mode flag is determined from the received coding mode bits, and the transmitted differential indices are decoded according to it: if the flag has value 1, the Small symbol decoding method is used; otherwise, the Large symbol decoding method is used.
The final reconstructed quantized energies are obtained in the same way as in the encoder, as described in subclause 5.3.4.1.3.
6.2.3.1.2.1 Small symbol decoding method
If the flag has value 1, the LCmode flag is extracted from the bit stream. If LCmode has value 1, the resized Huffman decoding mode is used; otherwise, the context based Huffman decoding mode is used for decoding the differential indices.
If IsTransient is True,
The decoded differential indices are extracted using either the context based or the resized Huffman coding mode, according to the flag LCmode, and the differential index for band b=0 is unpacked directly with 5 bits. The decoded differential indices are adjusted to recover the original values according to
(1806)
If IsTransient is False,
The decoded differential indices are extracted using either the context based or the resized Huffman coding mode, according to the flag LCmode, and the differential index for band b=0 is unpacked directly with 5 bits. Once the differential indices are extracted, the least significant code
is unpacked directly with 1 bit and the differential indices are reconstructed according to
(1807)
The decoded differential indices are adjusted to recover the original values according to
(1808)
6.2.3.1.2.1.1 Context based Huffman decoding mode
If the context based Huffman decoding mode has been selected, decoding is performed using table 168 and table 169, chosen according to the context described in subclause 5.3.4.1.3.3.1. The four LSBs of each entry in table 168 and table 169 indicate how many bits shall be read from the bit-stream buffer to decode the next symbol, the remaining upper bits give the offset from the current entry to its first child entry, and the sign of the entry indicates whether the Huffman decoding has terminated. The Huffman decoding procedure is shown below:
i = 0
bits_read = 0
while( hufftab[i] > 0 )
{
    n = hufftab[i] & 0xF                      /* number of bits to read next */
    bits_read += n
    i += (hufftab[i] >> 4) + read_bits(n)     /* jump relative to the current entry */
}
return -hufftab[i]                            /* terminal entries hold the negated symbol */
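The table walk above can be sketched in C as follows. This is a minimal illustration and not the reference implementation: the `BitReader` type and the `read_bits()` and `huff_decode()` helpers are hypothetical names, bits are assumed to be consumed MSB first, the entry offset is assumed to be relative to the current index, and terminal entries are assumed to hold the negated symbol. Under those assumptions every entry of the tables is reachable exactly once. The data used here is the small table 170 from subclause 6.2.3.1.2.2.

```c
#include <assert.h>
#include <stddef.h>

/* Table 170 in index order; entries <= 0 are terminal (negated symbol). */
static const int huff_large_sym[15] = {
    0x11, 0x21, -0x04, 0x21, -0x03, 0x21, -0x02, -0x05,
    0x11, 0x21, -0x01, -0x06, 0x11, -0x07, 0x00
};

/* Hypothetical bit reader: one bit (0 or 1) per array element. */
typedef struct {
    const unsigned char *bits;
    size_t len;
    size_t pos;
} BitReader;

/* Read n bits, MSB first (assumed bit order). */
static int read_bits(BitReader *br, int n)
{
    int v = 0;
    while (n-- > 0) {
        assert(br->pos < br->len);
        v = (v << 1) | br->bits[br->pos++];
    }
    return v;
}

/* Walk the table until a terminal (non-positive) entry is reached. */
static int huff_decode(const int *hufftab, BitReader *br)
{
    int i = 0;
    while (hufftab[i] > 0) {
        int nbits = hufftab[i] & 0xF;                   /* four LSBs: bits to read */
        i += (hufftab[i] >> 4) + read_bits(br, nbits);  /* relative jump */
    }
    return -hufftab[i];
}
```

With this table, the bit string 1 decodes to symbol 4, 01 to symbol 3, and 001 to symbol 2, so shorter codes go to the entries nearest the root.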
Table 168: Huffman decoding table for the context based Huffman decoding (group0,group2)
| Index | Code | Index | Code | Index | Code | Index | Code | Index | Code |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0X13 | 11 | -0X0D | 22 | -0X18 | 33 | 0X41 | 44 | -0X05 |
| 1 | -0X10 | 12 | 0X51 | 23 | -0X16 | 34 | -0X1C | 45 | -0X1E |
| 2 | -0X0F | 13 | 0X62 | 24 | 0X71 | 35 | -0X08 | 46 | -0X04 |
| 3 | -0X11 | 14 | -0X14 | 25 | -0X0B | 36 | 0X31 | 47 | -0X1F |
| 4 | 0X51 | 15 | 0X81 | 26 | 0X71 | 37 | 0X41 | 48 | 0X11 |
| 5 | 0X61 | 16 | -0X0C | 27 | -0X1A | 38 | -0X1D | 49 | -0X03 |
| 6 | -0X0E | 17 | 0X81 | 28 | 0X71 | 39 | -0X06 | 50 | 0X11 |
| 7 | -0X12 | 18 | -0X15 | 29 | -0X09 | 40 | 0X31 | 51 | -0X02 |
| 8 | 0X51 | 19 | -0X17 | 30 | -0X1B | 41 | 0X41 | 52 | 0X11 |
| 9 | 0X61 | 20 | 0X71 | 31 | -0X0A | 42 | -0X07 | 53 | -0X01 |
| 10 | -0X13 | 21 | 0X81 | 32 | -0X19 | 43 | 0X41 | 54 | 0X00 |
Table 169: Huffman decoding table for the context based Huffman decoding (group1)
| Index | Code | Index | Code | Index | Code | Index | Code | Index | Code |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0X12 | 12 | -0X12 | 24 | 0X51 | 36 | -0X18 | 48 | -0X1B |
| 1 | 0X41 | 13 | 0X42 | 25 | 0X61 | 37 | -0X05 | 49 | -0X1A |
| 2 | -0X0F | 14 | -0X0C | 26 | -0X16 | 38 | -0X04 | 50 | 0X11 |
| 3 | 0X41 | 15 | 0X61 | 27 | -0X09 | 39 | -0X03 | 51 | 0X00 |
| 4 | -0X10 | 16 | -0X13 | 28 | 0X51 | 40 | 0X51 | 52 | 0X11 |
| 5 | -0X0E | 17 | 0X61 | 29 | 0X61 | 41 | -0X06 | 53 | -0X1D |
| 6 | 0X31 | 18 | 0X71 | 30 | -0X17 | 42 | 0X51 | 54 | 0X11 |
| 7 | -0X11 | 19 | -0X0A | 31 | 0X62 | 43 | -0X19 | 55 | -0X1E |
| 8 | 0X31 | 20 | 0X71 | 32 | -0X08 | 44 | 0X51 | 56 | -0X1F |
| 9 | 0X41 | 21 | -0X0B | 33 | 0X81 | 45 | -0X01 | 57 | -0X1B |
| 10 | -0X0D | 22 | -0X14 | 34 | -0X07 | 46 | -0X1C | – | – |
| 11 | 0X41 | 23 | -0X15 | 35 | 0X81 | 47 | -0X02 | – | – |
6.2.3.1.2.1.2 Resized Huffman decoding mode
If IsTransient is True, Huffman decoding is performed on the transmitted differential indices. The Huffman codes for the differential indices are given in table 111 in subclause 5.3.4.1.3.3.
For non-transient frames, Huffman decoding is performed on the transmitted differential indices using the Huffman codes given in table 115 in subclause 5.3.4.1.3.3.3. The differential indices decoded using table 115 are in the modified form produced by the encoder; they are reconstructed by exactly reversing the modification described in subclause 5.3.4.1.3.3.3, equation (1040). The differential index is reconstructed as shown in the following equation.
(1809)
6.2.3.1.2.2 Large symbol decoding method
If the Large symbol coding method is determined, the encoded envelope data should be decoded using the reverse of the encoding process, for either the pulse mode or the scale mode, as described in subclause 5.3.4.1.3.4.
The Huffman data in the encoded envelope data is decoded by the Huffman decoding method described in subclause 6.2.3.1.2.1.1, using table 170.
Table 170: Huffman decoding table for the Large symbol decoding method
| Index | Code | Index | Code | Index | Code |
|---|---|---|---|---|---|
| 0 | 0X11 | 5 | 0X21 | 10 | -0X01 |
| 1 | 0X21 | 6 | -0X02 | 11 | -0X06 |
| 2 | -0X04 | 7 | -0X05 | 12 | 0X11 |
| 3 | 0X21 | 8 | 0X11 | 13 | -0X07 |
| 4 | -0X03 | 9 | 0X21 | 14 | -0X00 |
6.2.3.1.3 Spectral coefficients decoding
6.2.3.1.3.1 Normal Mode
Figure 94 shows the overview of the normal mode decoder.
Figure 94: Block diagram of the Normal mode decoder overview
6.2.3.1.3.1.1 Energy envelope decoding
Details are described in subclause 6.2.3.1.2.
6.2.3.1.3.1.2 Tonality flag decoding
Tonality flags described in subclause 5.3.4.1.4.1.3 are decoded and used for calculation of the bit allocations.
6.2.3.1.3.1.3 Bit allocation
The processing is the same as at the encoder side.
Firstly, the fine gain adjustment bits are derived.
Secondly, if the limited-band mode was used in encoding, the bands of the limited-band mode are identified from the decoded limited-band mode flags and their corresponding bandwidths are set. As described in subclause 5.3.4.1.4.1.4.4.2, the band is limited to the vicinity of the maximum amplitude spectrum frequency of the previous frame. The position of that frequency is found by searching the decoded MDCT spectrum of the previous frame and is stored in memory. The information on the limited band (i.e. the identified vicinity of the maximum amplitude spectrum frequency position) is output to the TCQ decoder along with the bit budget allocated through the following bit allocation process.
Thirdly, the bands encoded using PFSC are identified among the four highest bands based on the decoded tonality flags, and the necessary bits (1 or 2 bits) are allocated to each identified band.
Finally, the remaining bits are allocated to the other bands based on perceptual importance, using the decoded quantized band energies. If any of the four highest bands is assigned zero bits, that band is re-identified as a PFSC encoded band and the bit allocations are re-calculated.
6.2.3.1.3.1.4 Fine structure decoding
6.2.3.1.3.1.4.1 TCQ decoding
6.2.3.1.3.1.4.1.1 Joint USQ and TCQ
To de-quantize the fine structure of the normalized spectrum, the information for the selected ISCs in each band is decoded in terms of the position, number, sign and magnitude of the ISCs.
The magnitude information is decoded by the joint USQ and TCQ with arithmetic decoding, while the position, number and sign are decoded by arithmetic decoding.
The decoding method is selected in the Selecting Decoding Method block according to the bit allocation and the information for each band. If the number of bits allocated to a band is zero, all the samples in the band are decoded to zero by the zero decoding block. Otherwise, each band is decoded by the selected de-quantizer.
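The per-band selection described above can be sketched as follows. This is only an illustration under stated assumptions: `DecMethod`, `select_decoding_method()` and `zero_decode_band()` are hypothetical names, the band bit allocation is taken as a non-negative floating-point value, and the quantizer selection information is reduced to a single `use_tcq` flag.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEC_ZERO, DEC_TCQ, DEC_USQ } DecMethod;

/* Choose the decoding path for one band from its bit allocation. */
static DecMethod select_decoding_method(float band_bits, int use_tcq)
{
    assert(band_bits >= 0.0f);
    if (band_bits == 0.0f) {
        return DEC_ZERO;                 /* no bits: band decodes to zeros */
    }
    return use_tcq ? DEC_TCQ : DEC_USQ;  /* otherwise follow the quantizer selection */
}

/* Zero decoding: set coefficients start..end-1 of a band to zero. */
static void zero_decode_band(float *coef, size_t start, size_t end)
{
    assert(coef != NULL && start <= end);
    for (size_t k = start; k < end; k++) {
        coef[k] = 0.0f;
    }
}
```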
The quantizer selection information selects between the TCQ and USQ quantizers so that the same result as in the encoder is obtained.
The Estimating Number of Pulses block determines the number of pulses per band using the band length and the bit allocation data R[]. It operates on the same principle as the method used in the Scaling Bands module in the encoder, see subclause 5.3.4.1.4.1.5.1.1.
The Lossless Decoding and the Decoding Position Info blocks reconstruct the position information of the ISCs, i.e. the number of ISCs and their positions. This process mirrors the encoder side, and the same probabilities should be used for proper decoding, see subclause 5.3.4.1.4.1.5.1.
In the Joint USQ and TCQ Decoding block, the magnitudes of the gathered ISCs are decoded by arithmetic decoding and de-quantized by the joint USQ and TCQ decoding. In this block, the non-zero positions and the number of ISCs are used for the arithmetic decoding. The joint USQ and TCQ has two types of decoding methods. One is TCQ and USQ with second bit allocation, used for NB and WB, and the other is the LSB TCQ for USQ, used for SWB and FB. These methods are described in subclause 6.2.3.1.3.1.4.1.2 (TCQ and USQ with second bit allocation) and subclause 6.2.3.1.3.1.4.1.3 (LSB TCQ for USQ).
In the Decoding Signs block, the sign information of the selected ISC is decoded by the arithmetic decoding with equal probabilities for the positive and negative signs.
In order to recover the quantized components for each band, the position, sign and magnitude information is added to the quantized components to recover the real components at the Recovering Quantized Components block.
At this point the determined bands with no transmitted data are filled by zeroes. Then the number of pulses in the non-zero bands is estimated and the position information, including the number and position of ISCs, is decoded using this estimated number. After the magnitude information is decoded using the lossless decoder, the joint USQ and TCQ decoding is performed. For non-zero magnitude values the signs and quantized components are finally reconstructed.
In the Inverse Scaling Bands block, the inverse scaling of the quantized components is performed by using the transmitted norm information. The inverse scaled signal is the output of the TCQ decoding.
Figure 95: Block diagram of fine structure decoding using TCQ
6.2.3.1.3.1.4.1.2 TCQ and USQ with second bit allocation
The general de-quantization and decoding scheme of the TCQ and USQ with second bit allocation consists of several main blocks: quantizer decision, TCQ decoder, USQ decoder, lossless decoder, and Second bit allocation. In the quantizer decision module the quantization mode of the current band is selected by using the results of the Selecting Decoding Method block. Then the selected decoder restores the current band in association with the lossless decoder, based on the arithmetic decoding with the transmitted bit stream.
Figure 96: Block diagram of TCQ and USQ decoding with second bit allocation
The decoding process starts by recovering the non-zero bands and positions using the transmitted bit-stream and the bit allocation R[] for the selected quantizer. Using this information, the appropriate magnitude for the decoded band is selected. The difference between the bit allocation R[] and the actual number of decoded bits per band is accumulated and called the surplus. This surplus is used when decoding the two bands determined by the second bit allocation procedure described for the encoder in subclause 5.3.4.1.4.1.5.1.2.
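The surplus bookkeeping described above amounts to a running sum of allocated minus consumed bits. A minimal sketch follows, assuming hypothetical names (`accumulate_surplus`, `used`) and a floating-point representation of the per-band bit counts; how the surplus is then redistributed to the two selected bands is specified in the encoder subclause and is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Sum over bands of (bits allocated in R[] - bits actually decoded). */
static float accumulate_surplus(const float *R, const float *used, size_t nbands)
{
    float surplus = 0.0f;
    assert(R != NULL && used != NULL);
    for (size_t b = 0; b < nbands; b++) {
        surplus += R[b] - used[b];   /* may be negative for a single band */
    }
    return surplus;
}
```

For example, allocations of 8, 6 and 4 bits with 7, 6.5 and 3 bits actually decoded leave a surplus of 1.5 bits.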
The magnitude decoding based on binary arithmetic decoding is as follows. First the probability of symbol is calculated by the equations in encoder subclause 5.3.4.1.4.1.5.1.2. Then the number of pulses for each magnitude is decoded by using the probabilities and
, where
corresponds to last pulse in magnitude and
to all other pulses. The pulse magnitude probabilities are then modified after this calculation with respect to the trellis code limitation, i.e. magnitudes that are impossible are assigned zero probability.
This algorithm was modified to save complexity for bands with a large number of pulses. The procedure is the same as in the encoder, see subclause 5.3.4.1.4.1.5.1.2.
Location decoding uses the same algorithm as magnitude decoding, together with the same complexity reduction technique.
Signs are decoded with the arithmetic decoder, using equal probabilities of positive and negative signs.
6.2.3.1.3.1.4.1.3 LSB TCQ for USQ
The idea of the LSB TCQ for USQ is to use advantages of both quantizers (USQ and TCQ) in one scheme and exclude the path limitation from the TCQ.
Figure 97: Block diagram for LSB TCQ decoding
The decoding process starts from receiving the bit allocation R[] and the decoding of the band information including:
- Number of nonzero positions for ISCs
- Nonzero positions
- USQ magnitude
- Signs for nonzero magnitudes
First the number of nonzero pulses and their positions are decoded using the arithmetic decoder. Then the USQ magnitudes are decoded band by band using bit allocation with surplus control. This generates Delta values in the same manner as the encoder, see subclause 5.3.4.1.4.1.5.1.3. The difference between the bit allocation R[] and actual decoded bits per band is accumulated and called the surplus, which is then used in the next bands.
The algorithms used for decoding positions and magnitudes are the same as those described for the TCQ and USQ decoder in subclause 6.2.3.1.3.1.4.1.2.
After receiving the USQ magnitudes, the TCQ path is decoded from the bit-stream using the arithmetic decoder.
The decoded path is used to reconstruct the residual array according to the decoded trellis state. From each path bit, two LSBs are generated in the residual array. This process is shown in the following pseudo code:
for( state = 0, i = 0; i < bcount; i++ )
{
    residualbuffer[2*i] = dec_LSB[state][dpath[i]] & 0x1;
    residualbuffer[2*i + 1] = dec_LSB[state][dpath[i]] & 0x2;
    state = trellis_nextstate[state][dpath[i]];
}
Starting from state 0, the decoder moves through the trellis using decoded dpath bits, and extracts two bits corresponding to the current trellis edge.
In the Spectrum recovering block the decoded residual array is added to the non-zero spectral components. The output of this block is the reconstructed spectrum.
The decoded MDCT coefficients are de-normalized using the decoded band energies.
Finally, as described in subclause 5.3.4.1.4.1.4.4.1, fine gain adjustment is performed on the dominant bands. Decoded fine gain adjustment factor is applied to the de-normalized decoded MDCT coefficients.
6.2.3.1.3.1.4.2 Noise-filling
Noise-filling is performed between the “De-norm. and Fine gain adj.” and “PFSC decoder” blocks in figure 94, and the process is the same as at the encoder side.
6.2.3.1.3.1.4.3 PFSC decoding
6.2.3.1.3.1.4.3.1 Envelope normalization
This process is the same as the one described in subclause 5.3.4.1.4.1.5.3.2.
6.2.3.1.3.1.4.3.2 Lag information decoding
Lag indices for the last four sub-bands (i.e. b=18 to 21 at 13.2 kbps and b=20 to 23 at 16.4 kbps) are decoded if the corresponding decoded tonality flag is set to “0”. The starting position is decoded as described in subclause 5.3.4.1.4.1.5.3.3. Based on the starting position and the width of the search band, the predicted high-frequency spectrum is generated from the envelope normalized TCQ-decoded low-frequency spectrum.
6.2.3.1.3.1.4.3.3 Scaling and noise smoothing
Scaling factors are calculated for the predicted bands using the decoded band energies. Each scaling factor is calculated as the square root of the quotient of the quantized band energy divided by the corresponding band energy of the predicted high-frequency spectrum. The calculated scaling factors are attenuated by a factor of 0.9 and applied to the predicted high-frequency spectrum.
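Written out as a formula, the scaling step described above could take the following form. The symbols are introduced here for illustration only and are not the notation of the specification: $\hat{E}(b)$ denotes the decoded (quantized) energy of band $b$, $E_{\mathrm{pred}}(b)$ the energy of the predicted high-frequency spectrum in the same band, $g(b)$ the attenuated scaling factor, and $X_{\mathrm{pred}}(k)$ the predicted coefficients. Both energies are assumed to be expressed in the same linear domain.

```latex
g(b) = 0.9\,\sqrt{\frac{\hat{E}(b)}{E_{\mathrm{pred}}(b)}},
\qquad
\hat{X}_{\mathrm{HF}}(k) = g(b)\, X_{\mathrm{pred}}(k), \quad k \in \text{band } b
```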
An inter-frame smoothing process is applied to the noise components as described in subclause 5.3.4.1.4.1.5.3.3.3.
The Normal mode PFSC decoding overview is shown in figure 98.
Figure 98: Block diagram of the Normal mode PFSC decoder
6.2.3.1.3.2 Transient Mode
6.2.3.1.3.2.1 Energy envelope decoding
Details are described in subclause 6.2.3.1.2.
6.2.3.1.3.2.2 Bit allocation
The processing is the same as in subclause 5.3.4.1.4.2.2.
6.2.3.1.3.2.3 Fine structure decoding
TCQ decoding with Transient mode configurations is performed.
6.2.3.1.3.3 Harmonic Mode
6.2.3.1.3.3.1 Overview
The high-level decoder structure of the Harmonic mode is basically the same as that of the Normal mode. The main difference lies in the detailed structure of the PFSC block, which is shown in the following figure.
Figure 99: Block diagram of the Harmonic mode decoder overview
6.2.3.1.3.3.2 Energy envelope decoding
Details are described in subclause 6.2.3.1.2.
6.2.3.1.3.3.3 Bit allocation
The processing is the same as at the encoder side.
At first, the fine gain adjustment bits are derived as explained in subclause 5.3.4.1.4.3.2.1. The remaining bits are then allocated adaptively, with more bits allocated to the bands in a perceptually significant group than to those in a less significant group. The detailed procedure is explained in subclause 5.3.4.1.4.3.2.2.
6.2.3.1.3.3.4 Fine structure decoding
6.2.3.1.3.3.4.1 TCQ decoding
This part is the same as in the Normal mode, as described in subclause 6.2.3.1.3.1.4.1.
6.2.3.1.3.3.4.2 Noise filling for quantized spectrum
In this subclause, noise is filled into the quantized spectrum where coefficients have been quantized to zero in bands to which the bit allocation assigned a non-zero number of bits. The un-quantized bands up to the transition frequency are also filled, in the same manner as in the encoder, see subclause 5.3.4.1.4.3.3.2.
6.2.3.1.3.3.4.3 PFSC-based gap filling
6.2.3.1.3.3.4.3.1 Overview
This subclause applies only to SWB and FB input signals. The spectral coefficients belonging to bands that are assigned zero bits by the bit allocation are not quantized, so not all transform coefficients are transmitted to the decoder. From the noise filled quantized spectrum, the gaps in the high frequency region that have zero bit allocation are identified and filled with a newly generated spectrum. The predicted spectrum is generated from the normalized noise filled quantized spectrum described in subclause 6.2.3.1.3.3.4.3.2.
Based on the bit allocation described in subclause 6.2.3.1.3.3.3, if any of is allocated with zero bits, the corresponding band with start and end positions
according to table 108 in the
has a gap and it is filled with the predicted spectrum described in subclause 6.2.3.1.3.3.4.3.5 corresponding to
.in
6.2.3.1.3.3.4.3.2 Envelope Normalization
The envelope normalization is performed in the same way as in the encoder, as described in subclause 5.3.4.1.4.3.3.3.2. As a result, the envelope normalized signal is obtained, where
is the envelope normalized low frequency quantized spectrum and
is the envelope normalized low frequency noise spectrum.
6.2.3.1.3.3.4.3.3 Decoding of lag index
Lag index for sub-bands i=0,1 is decoded from the bit stream. For sub-bands 0 and 1, encoded best match position is decoded using the starting position
and the lag index
,
is defined in equation (1147).
Based on the best match position, the predicted spectrum is generated from the envelope normalized noise filled quantized spectrum. The predicted spectrum generation is described in detail in subclause 6.2.3.1.3.3.4.3.5.
6.2.3.1.3.3.4.3.4 Structure analysis for Harmonics
The structure analysis for the Harmonic mode is performed in the same way as in the encoder, as described in subclause 5.3.4.1.4.3.3.3.4. As a result, the estimated harmonic is obtained; it is used for generating the predicted spectrum for the HF region.
6.2.3.1.3.3.4.3.5 Predicted spectrum generation
Predicted spectrum is generated for the high frequency region by using the envelope normalized noise-filled quantized spectrum
, which is obtained from subclause 6.2.3.1.3.3.4.3.2. Predicted spectrum is generated, first by extracting the desired noise components from the
described in subclause 6.2.3.1.3.3.4.3.6 followed by tonal generation using
described in subclause 6.2.3.1.3.3.4.3.7.
Noise filled spectrum is used for estimating the tonal energy
and the tonal components
of the spectrum in the high frequency region, which is obtained from subclause 6.2.3.1.3.3.4.3.7 are normalized using the estimated tonal energy
, where
is calculated as follows:
(1810)
: is the noise energy obtained using the noise filled spectrum
according to
(1811)
The noise energy obtained from equation (1811) is adjusted, when the noise filled spectrum has low level noise and / or when the noise filled spectrum has high level noise. Low noise level is detected using the energy ratio
between noise and the total band energy and high noise level is detected when the estimated tonal energy
is negative. The adjustment factor
is estimated according to
(1812)
For each band, based on the obtained from equation () is used to re-calculate the tonal energy
and estimated noise
using equations () and (). The tonal components
of the spectrum in the high frequency region are normalized using the scale factor
calculated as follows
(1813)
The calculated scale factor and extracted tonal components are used for injecting the tonal components into the noise filled spectrum
according to
(1814)
where, is the tonal positions obtained from subclause 5.3.4.1.4.3.3.3.5
6.2.3.1.3.3.4.3.6 Noise filling for the predicted spectrum
Noise filling for the predicted spectrum is performed equally as in the encoder, described in subclause 5.3.4.1.4.3.3.3.5. As a result noise filled spectrum and tonal positions
is obtained, where j is the pulse resolution. The obtained predicted spectrum which contains noise is adjusted using the noise factor
according to
(1815)
where is the noise factor which is decoded from the bit stream and the decoded noise factor is converted to linear domain as follows
(1816)
6.2.3.1.3.3.4.3.7 Tonal generation for predicted spectrum
First, the tonal components are extracted from the desired portion of envelope normalized quantized spectrum
based on the decoded best match position . The extracted tonal components
are used for the spectrum in the high frequency region. As the normalized quantized spectrum characteristics are flat all the values during the normalization process will have similar values, all the non-zero coefficients in the desired region of
is identified as follows
where, are defined as follows
is the tonal resolution obtained from the normalized quantized spectrum for sub band i=0,1
is the tonal components extracted from the normalized quantized spectrum and used as the spectrum in the high frequency for sub band i=0,1
The tonal information, for i=0, 1 obtained from normalized quantized spectrum is used for sub band i=2, 3. Using the estimated harmonic frequency obtained from subclause 6.2.3.1.3.3.4.3.4 frequency positions of the extracted tonal components are adjusted as described in subclause 6.2.3.1.3.3.4.3.5.
Based on the band definition described in table 108, the high frequency band ranges are defined . Using the band definitions for high frequency region, the extracted tonal components and its corresponding pulse resolutions are restructured, and used for generating predicted spectrum. For example, the restructured information for sub band i=0 is equivalent to
.
6.2.3.2 High-rate HQ decoder
A high level structural block diagram of the high-rate HQ decoder is shown in figure 100.
Figure 100: High level structure of the high-rate HQ decoder
Firstly, the High-rate HQ coding mode information is decoded.
6.2.3.2.1 Normal Mode
6.2.3.2.1.1 Envelope decoding
From the received high-rate HQ norm coding mode bits, the high-rate HQ norm coding mode is determined and the transmitted differential indices are decoded using the selected method. The quantization index of the lowest-frequency band, i.e., , is directly decoded in all modes.
6.2.3.2.1.1.1 Context based Huffman decoding mode
If this coding mode is determined for the current frame, the context based Huffman decoding is performed on the transmitted quantization differential indices using the method described in subclause 6.2.3.1.2.1.1 and tables 168 and 169.
6.2.3.2.1.1.2 Re-sized Huffman decoding mode
If this coding mode is determined for the current frame, the resized Huffman decoding is then performed on the transmitted quantization differential indices using the method described in subclause 6.2.3.1.2.1.2. The Huffman codes for the differential indices are given in table 105.
6.2.3.2.1.1.3 Normal Huffman decoding and bit-packing mode
If this coding mode is determined for the current frame, the Normal Huffman decoding is then performed on the transmitted differential indices. The Huffman codes for the differential indices are given in subclause 5.3.4.2.1.2.3.
When the bit-packing mode is determined, the adjusted differential indices are unpacked directly with 5 bits.
The actual quantized norms are obtained from the lookup table defined in subclause 5.3.4.2.1.1.
6.2.3.2.1.2 Normal mode fine structure inverse quantization
6.2.3.2.1.2.1 Fine structure inverse PVQ-quantization
The spectral coefficient inverse quantization is done as described in subclause 6.2.3.2.6.
6.2.3.2.1.2.2 Fine gain prediction, inverse quantization and application
The bit allocation for the PVQ shape vector and fine gain adjustment
, as well as
and
are obtained as in subclause 5.3.4.2.1.3a.1. The quantized gain prediction error
is obtained by using the assigned bitrate
and the fine gain adjustment
is obtained by
()
with for
. The gain of the synthesis is adjusted by scaling the decoded fine structure with the fine gain
.
6.2.3.2.1.3 Spectral filling
This subclause gives a technical overview of the spectrum filling processing which is applied at the decoder in HQ high rate mode.
6.2.3.2.1.3.1 Wideband adaptive noise filling at 24.4/32kbps
Wideband adaptive noise filling at 24.4 and 32 kbps proceeds by calculating the total available bits and the bits variance for the sub-bands in non-transient frames over the index range,
()
()
The average bit allocation threshold is initialized for each coefficient in each sub-band according to the values in table 171.
Table 171: Threshold for average bit allocation
| Band | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Threshold | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 |
| Band | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
| Threshold | 1.5 | 1.5 | 1.5 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.8 | 0.8 |
For the sub-bands in the index range,
denotes the number of the sub-bands where the average number of allocated bits for each coefficient is not less than the threshold
. The harmonic parameter
for those sub-bands is calculated as follows:
()
()
The step length, , is calculated according to:
()
For any sub-band in non-transient frames, the following procedure is then followed. If the average number of allocated bits for each coefficient in the sub-band is greater than or equal to the threshold 1.5, the bit allocation for the sub-band is saturated and the un-decoded coefficients of the sub-band are not processed further by the noise filling. Otherwise, the bit allocation to the sub-band is un-saturated, and the un-decoded coefficients of the sub-band are reconstructed by noise filling. For any un-saturated sub-band with zero bits allocated to its coding, the envelope of the un-decoded coefficients in the sub-band is set to the decoded norm for that sub-band. Otherwise, if the un-saturated sub-band has bits allocated to it, the envelope of the un-decoded coefficients is calculated as follows:
The average energy of the sub-band is then calculated using the de-quantized norm
as follows:
()
The energy sum of all decoded non-zero coefficients in this sub-band is then calculated
()
The maximum magnitude and the minimum magnitude
of the decoded coefficients in each sub-band are also found by a search, for further processing.
The energy difference is next calculated. If
, the envelope of the un-decoded coefficients is set to zero
. Otherwise, the envelope of the un-decoded coefficients
is calculated as follows:
The initial envelope of the un-decoded coefficients in the un-saturated sub-band is calculated by the energy difference,
()
The average norm of the un-saturated sub-band is calculated
()
If or
, then the spectrum of the sub-band is sharp. The envelope of the un-decoded coefficients is obtained by modifying the initial envelope as follows:
()
If the spectrum of the sub-band is not sharp and ,
, then the harmonic parameter
is added by the step length
.
If the envelope is more than half of the minimum magnitude, the envelope is set equal to half of the minimum magnitude
.
()
If the ratio of the average norm in the current frame to that of the previous frame lies in the range (0.5,…, 2), and the previous frame is a non-transient frame, the envelopes of the un-decoded coefficients for the current frame and the previous frame are weighted as follows.
()
For the un-decoded coefficients in the sub-band, the coefficients are generated using a random noise generator and multiplied by the estimated envelope as described above.
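The control flow of this procedure (saturation test, envelope limiting, and noise generation) can be sketched as below. This is an illustration under several assumptions and not the reference implementation: all function names are hypothetical, the saturation test uses the fixed value 1.5 quoted in the text, un-decoded coefficients are taken to be those that are exactly zero after decoding, the envelope is assumed to have been computed beforehand, and the 16-bit linear congruential generator is only a stand-in for the codec's random noise generator.

```c
#include <assert.h>
#include <stddef.h>

#define SATURATION_THRESHOLD 1.5f

/* Stand-in noise source: 16-bit LCG mapped to [-1, 1). */
static float next_noise(unsigned *seed)
{
    *seed = (*seed * 31821u + 13849u) & 0xFFFFu;
    return ((float)*seed - 32768.0f) / 32768.0f;
}

/* A sub-band is saturated when bits per coefficient reach the threshold. */
static int is_saturated(float bits, size_t width)
{
    assert(width > 0);
    return bits / (float)width >= SATURATION_THRESHOLD;
}

/* Limit the envelope to half of the smallest decoded magnitude. */
static float limit_envelope(float env, float min_mag)
{
    float cap = 0.5f * min_mag;
    return env > cap ? cap : env;
}

/* Replace zero coefficients by noise scaled with the envelope.
 * Returns the number of coefficients that were filled. */
static size_t noise_fill_band(float *coef, size_t width, float env, unsigned *seed)
{
    size_t filled = 0;
    for (size_t k = 0; k < width; k++) {
        if (coef[k] == 0.0f) {
            coef[k] = env * next_noise(seed);
            filled++;
        }
    }
    return filled;
}
```

A caller would skip `noise_fill_band()` for saturated sub-bands and pass the decoded norm as `env` for un-saturated sub-bands that received zero bits.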
For the last sub-band, a check is made whether the mode of the previous frame was not a transient, and whether the ratio of the decoded norms of the current frame to those of the previous frame are in the range (0.5, …,2.0), and the bit variance is not more than 0.3, and the sub-band of the current frame is bit allocated and the sub-band of the previous frame is not bit allocated or vice versa. If all of the above conditions are fulfilled, then the coefficients in the current frame and the previous frame for the last 20 coefficients in the last sub-band are weighted as follows:
()
In the transient mode, un-decoded coefficients in a sub-band are generated from random noise, and de-normalized by the decoded norm for that sub-band.
6.2.3.2.1.3.2 General spectral filling
Based on the received bit-allocation, the transition frequency is estimated in the same manner as in the encoder, see subclause 5.3.4.2.1.4. Spectral filling consists of two algorithms. The first algorithm fills the low-frequency spectrum up to the transition frequency
, the second algorithm regenerates the possibly non-coded high-frequency components by using the low-frequency noise-filled spectrum.
The interaction between these two algorithms is shown in figure 101. The resulting spectrum from both the noise-filling algorithm and the high frequency noise fill is a normalized spectrum which is shaped by the received quantized norms.
Figure 101: Spectrum filling block diagram
6.2.3.2.1.3.2.1 Noise filling
The first step of the noise fill procedure relies on the building of the so-called spectral codebook from the received (decoded) normalized transform coefficients. This step is achieved by concatenating the perceptually relevant coefficients of the decoded spectrum. Figure 102 illustrates this procedure. The decoded spectrum has several series of zero coefficients that are called spectral holes of a certain length. This length is the sum of the consecutive lengths of bands which were allocated zero bits.
Figure 102: Building the spectral codebook from the decoded transform signal
Since the length of all spectral holes can be higher than the length of the spectral codebook, the codebook elements might be re-used for filling several spectral holes.
Figure 103: Noise filling from the spectral codebook up to the transition frequency
Figure 103 shows how, based on the spectral codebook C, the non-quantized spectral coefficients are filled. Spectral holes are filled by increasing the codebook index j as much as the index i, used to cover all the spectral holes up to the transition frequency. Reading from the spectral codebook is done sequentially and as a circular buffer according to the following:
i=0; j=0
(1:) if then
,
increment i,j (if out of bound, rewind j to start of codebook)
if i=0 then
STOP
else
goto (1:)
endif
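The circular-buffer reading described above can be illustrated with a minimal Python sketch. It is not the normative procedure: the function name `fill_spectral_holes`, the per-bin `is_hole` mask, and the choice to advance the codebook index only when a coefficient is actually filled are assumptions made for illustration.

```python
def fill_spectral_holes(spectrum, is_hole, codebook, transition_bin):
    """Fill hole bins below transition_bin from a circular spectral codebook.

    spectrum       : decoded normalized coefficients (list of floats)
    is_hole        : per-bin flags, True where the bin lies in a zero-bit band
                     (hypothetical representation of the bit allocation)
    codebook       : spectral codebook C built from the decoded coefficients
    transition_bin : first bin that is NOT noise filled by this step
    """
    filled = list(spectrum)
    if not codebook:
        return filled  # nothing to read from, leave the holes untouched
    j = 0  # codebook read index
    for i in range(min(transition_bin, len(filled))):
        if is_hole[i]:
            filled[i] = codebook[j]
            # Rewind to the start of the codebook when running out of bounds.
            j = (j + 1) % len(codebook)
    return filled
```

Because the codebook index wraps around, a short codebook can cover holes whose total length exceeds the codebook length.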
For low bit rates, many of the quantized bands will contain few pulses and have a sparse structure. For signals which require a more dense and noise-like fill, a set of two anti-sparseness processed codebooks is created instead of the regular spectral codebook, as illustrated in figure 104.
Figure 104: Creation of two parallel codebooks to handle sparse coded vectors.
The compression of the coded residual vectors is done according to the following definition:
()
The virtual codebook which constitutes the spectral codebook is built only from “populated” sub-vectors, where each sub-vector has a length of 8. If a coded sub-vector does not fulfill the criterion:
()
it is considered sparse, and is rejected. Since the sub-vector length is 8, this corresponds to a rejection criterion if less than 25% of the vector positions are populated. The remaining compressed sub-vectors are concatenated into Spectral codebook 1, with the length
. The final step of the anti-sparseness processing is to combine the codebook samples pair-wise sample-by-sample with a frequency reversed version of the codebook. The combination can be described with the following relation:
()
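The rejection of sparse sub-vectors can be sketched as follows in Python. The sketch only covers the selection of "populated" sub-vectors of length 8 with the 25% threshold stated above; the compression of the coded residual and the pair-wise combination with the frequency reversed codebook are not reproduced. The function name `populated_subvectors` and its parameters are illustrative assumptions.

```python
def populated_subvectors(coeffs, sub_len=8, min_fraction=0.25):
    """Concatenate the sub-vectors in which at least min_fraction of the
    positions are non-zero; sparser sub-vectors are rejected.

    With sub_len = 8 and min_fraction = 0.25, a sub-vector is kept when it
    has at least 2 non-zero positions.
    """
    kept = []
    # Only complete sub-vectors are considered (assumption of this sketch).
    for start in range(0, len(coeffs) - sub_len + 1, sub_len):
        sub = coeffs[start:start + sub_len]
        populated = sum(1 for c in sub if c != 0)
        if populated >= min_fraction * sub_len:
            kept.extend(sub)
    return kept
```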
For SWB processing at 24.4 or 32 kbps in case of low spectral stability, spectral codebook 1 is used below band and the spectral codebook 2 is used above and including band
. The spectral filling using these two codebooks is depicted in figure 105.
Figure 105: Creation of two parallel codebooks to handle sparse coded vectors.
6.2.3.2.1.3.2.2 High frequency noise fill
Based on the low-frequency filled spectrum, and prior to noise level attenuation, as described in the previous clause, the last step of the spectral filling consists of the generation of the target bandwidth audio signal. In other words, the process synthesizes a high-frequency spectrum from the filled spectrum by spectral folding based on the value of the transition frequency.
The target bandwidth generation is based on the spectral folding of the spectrum below the transition frequency to the high-frequency spectrum (zeroes above the transition frequency), see figure 106. A first spectral folding is achieved with respect to the point of symmetry defined by the transition frequency. No spectrum coefficients from frequencies below
are folded into the high frequencies. In other words, only the upper half of the low frequencies are folded. If there are not enough coefficients in the upper half of the low frequencies to fill the whole spectrum above the transition frequency, the spectrum is folded again around the last filled coefficient. This process is repeated until the last band is filled.
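The repeated folding can be sketched in Python as shown below. This is an illustration under stated assumptions, not the normative procedure: the lower limit of the folded region is taken as half the transition bin, the mirror maps bin `pivot + i` to bin `pivot - 1 - i`, and every additional fold mirrors a segment of the same length around the last filled coefficient. The function name `fold_high_band` is hypothetical.

```python
def fold_high_band(spectrum, transition_bin):
    """Regenerate the bins at and above transition_bin by spectral folding.

    Only the upper half of the low band (bins transition_bin // 2 up to
    transition_bin - 1) is used as source for the first fold.
    """
    out = list(spectrum)
    seg_len = transition_bin - transition_bin // 2  # length of folded segment
    if seg_len <= 0:
        return out
    pivot = transition_bin  # current point of symmetry
    while pivot < len(out):
        for i in range(seg_len):
            dst = pivot + i
            if dst >= len(out):
                break  # last band is filled
            out[dst] = out[pivot - 1 - i]  # mirror around the pivot
        pivot += seg_len  # fold again around the last filled coefficient
    return out
```

With this mirroring convention, every second fold restores the original ordering of the source segment.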
Figure 106: The spectrum above the transition frequency is regenerated using spectral folding from the transition frequency
6.2.3.2.1.3.2.3 Noise level adjustment
After the fine structure of the spectral holes has been determined, the noise-filled part of the spectrum is attenuated according to the received NoiseLevel index. In the case of transient mode, the NoiseLevel is not estimated in the encoder and is automatically set to the value corresponding to zero index, i.e., 0 dB.
This operation is summarized by the following equation:
()
For SWB processing at 24.4 or 32 kbps in case of low spectral stability, an additional adaptive noise-fill level adjustment is employed. First, an envelope adjustment vector is derived according to the following pseudo-code:
For ,
if ,
if ,
if
if ,
else
else
else
if and
,
else
else
where . Further,
denotes the number of pulses for band
as described in subclause 5.3.4.2.7, where
corresponds to the case when zero bits are assigned to band
. In short it permits strong attenuation for short bands where the neighboring bands are quantized, and gradually less when these requirements are not fulfilled. Once
has been obtained, attenuation regions of consecutive bands
where
are identified. The attenuation for each of these regions is adjusted according to
()
where is the number of consecutive bands in the attenuation region. The width-dependent attenuation function
is a piece-wise linear function defined as
()
The resulting vector is further combined with a limiting function which prevents attenuation during audio with high spectral stability. The spectral stability is calculated based on a low-pass filtered Euclidean distance
between the spectral envelope values
of adjacent frames:
()
()
Here denotes the value of the variable for frame
. The spectral envelope stability parameter
is derived by mapping
to the range
using a discretely sampled sigmoid function implemented as a lookup table
. Due to the symmetry of the function, the table is mirrored around the mid-point such that the final stability parameter can be obtained by
()
where the quantization index is found by
and clamping the index
to the range
. Finally, the gain adjustment vector
is derived as
()
where the envelope stability acts as a limiting function for the gain adjustment vector.
For WB processing, a slightly different gain adjustment vector is derived. Here, it is computed as
()
where is a gain attenuation table for index
, which in turn is derived by
()
For SWB at 24.4 and 32 kbps, the gain adjustment is applied using a hangover logic which only permits attenuation once a sequence of 150 frames without transients has been observed. If this requirement is met for SWB encoded bandwidth, or if the encoded bandwidth is WB, the gain adjustment vector is combined with the quantized envelope vector to form the gain adjusted envelope vector
.
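The hangover logic for SWB can be illustrated with a small Python sketch. The class name `TransientHangover`, the reset of the counter on every transient frame, and the decision to count the current frame are assumptions of this sketch; only the 150-frame requirement is taken from the text.

```python
class TransientHangover:
    """Permit attenuation only after a run of frames without transients."""

    def __init__(self, required_frames=150):
        self.required_frames = required_frames
        self.frames_without_transient = 0

    def update(self, is_transient):
        """Register one frame; return True if attenuation is permitted."""
        if is_transient:
            self.frames_without_transient = 0  # restart the sequence
        else:
            # Saturate so the counter cannot grow without bound.
            self.frames_without_transient = min(
                self.frames_without_transient + 1, self.required_frames
            )
        return self.frames_without_transient >= self.required_frames
```

A decoder would call `update()` once per frame and apply the gain adjustment vector only when the returned flag is set.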
6.2.3.2.1.3.2.4 Spectral fill envelope shaping
When the full-bandwidth fine spectral structure is generated, the resulting spectrum is shaped by applying the gain adjusted envelope vectors for each band according to:
()
6.2.3.2.2 Transient Mode
6.2.3.2.2.1 Envelope decoding
The envelope is decoded as described in subclause 6.2.3.2.1.1. In addition to those steps, the norms are also sorted as is done in the encoder, see subclause 5.3.4.2.2.1.
6.2.3.2.2.2 Fine structure inverse quantization (spectral coefficients decoding)
The spectral coefficients are decoded as for the Normal HQ mode as described in subclause 6.2.3.2.1.2.
6.2.3.2.2.3 Spectral filling
The spectral filling is done as described in 6.2.3.2.1.3, but the bandwidth extension in subclause 6.2.3.2.1.3.2.2 is not done.
6.2.3.2.3 Harmonic Mode
6.2.3.2.3.1 Core decoding
Envelope decoding and the PVQ decoder are described in subclause 6.2.3.2.1.1 and subclause 6.2.3.2.1.2, respectively.
If a sub-band has bits allocated to it, then the decoded coefficients of the sub-band are de-normalized by multiplying them by the de-quantized norm of the sub-band, which yields the de-normalized coefficients. Otherwise, if a sub-band has no bits allocated to it, the de-normalized coefficients
of that sub-band are set to 0. And the higher frequency band coefficients with the index of sub-band above
are 0 and are reconstructed by bandwidth extension, where
is the index of the highest frequency sub-band of the decoded low frequency band signal.
6.2.3.2.3.2 Bandwidth extension decoding for harmonic mode
The start index for the bandwidth extension is adaptively obtained according to the value of.
First, the start index for bandwidth extension is preset:
()
Then, in order to predict the excitation signal of bandwidth extension, it is determined whether the index of the highest frequency sub-band of the decoded low frequency band signal is less than the start index for bandwidth extension
, i.e. whether the highest frequency bin of bit allocation is less than the preset start frequency bin for bandwidth extension,
- if
,
is then set to
. The excitation signal of bandwidth extension is predicted by the preset start index
and the chosen excitation signal from the decoded low frequency band signal with the given bandwidth length.
- Otherwise, the excitation signal of bandwidth extension is predicted by the preset start index
, the index of the decoded highest frequency sub-band
and the chosen excitation signal from the decoded low frequency signal with the given bandwidth length.
Finally, the higher frequency band signal is reconstructed by the predicted excitation signal and the envelopes as described in subclause 6.2.3.2.2.1.
6.2.3.2.3.2.1 Calculate excitation adaptive normalization lengths
The de-normalized coefficients calculated in subclause 6.2.3.2.3.1 need to be recovered to remove the original core envelope effects to give the excitation for bandwidth extension. The normalization length is adaptively obtained according to the signal characteristics. The normalization length of the previous frame
is initialized to 8.
208 MDCT coefficients in the 0-5200 Hz frequency range are split into 13 normalization sub-bands with 16 coefficients per sub-band. The peak magnitude and average magnitude in each normalization sub-band are then calculated. The counter is initialized to zero and increased by one if
and
, where
()
The normalization length is set to
, and it is adjusted with reference to the value from the previous frame,
()
6.2.3.2.3.2.2 Calculate envelopes for excitation normalization
The normalization envelopes for each spectral bin are calculated as follows:
()
The values are then normalized using the normalization envelopes
to obtain the normalized coefficients
,
()
6.2.3.2.3.2.3 Adaptive excitation generation
The normalized coefficients in the frequency range 1500-5025 Hz, i.e. the coefficients, are selected for the excitation calculation. The starting frequency bin of the excitation,
, is calculated as follows,
()
The selected low frequency normalized coefficients from which the re-constructed higher band coefficients are obtained are copied to the high band starting at frequency,
, as follows
()
The low frequency normalized coefficients may in practice be copied N times as a circular buffer in order to fill in the re-constructed higher bands, where N can be a decimal fraction.
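The circular copy can be sketched in Python as follows. The function name `copy_excitation` and its parameters are hypothetical; the sketch only shows how a source segment is repeated N times, where N = `dst_len / src_len` need not be an integer.

```python
def copy_excitation(norm_coeffs, src_start, src_len, dst_len):
    """Return dst_len high band coefficients read circularly from the
    source segment norm_coeffs[src_start : src_start + src_len]."""
    return [norm_coeffs[src_start + (i % src_len)] for i in range(dst_len)]
```

For example, a source segment of 3 coefficients copied into 8 high band bins is repeated 8/3 times, so the last repetition is truncated.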
6.2.3.2.3.2.4 Weighting the re-constructed higher band coefficients and random noise
The envelopes of the re-constructed higher band coefficients are calculated according to the band structure given in table 129, and then the re-constructed higher band signal is weighted and random noise added.
()
Where and
.
The weighting factor for the normalized re-constructed higher band signal, , is
()
The weighting value of the normalized re-constructed higher band signal, , is
()
where the noise level is estimated as follows:
()
and the sum of the differences between the consecutive norms and the sum of the norms
in the index range
, are given by
()
()
6.2.3.2.4 HVQ
First the HVQ decoder extracts the number of coded peaks from the bitstream, and reconstructs the spectral peak positions and peak gains
. The peak positions are decoded with either a Huffman decoder or a sparse coding decoder, based on the received mode decision. The peak shape vectors
are reconstructed from the received VQ indices and further scaled with reconstructed peak gains
for the corresponding shape region. The low-frequency bands are PVQ decoded, with the number of bands determined as described in subclause 5.3.4.2.5.
The unquantized coefficients below 5.6 kHz for 24.4 kbps and 8 kHz for 32 kbps are grouped into 2 sections, noise filled and scaled. Each of the sections covers half of the coded band (of 112 bins at 24.4 kbps and 160 bins at 32 kbps). After the noise fill each of the sections is scaled with the corresponding reconstructed gains and
. The gains reconstructed in the current frame
are smoothed with the levels from the past frame
()
The reconstructed envelope levels used above 5.6 kHz for 24.4 kbps and 8 kHz for 32 kbps are adjusted based on the presence or absence of a peak in the low-frequency fine structure used in the noise-fill.
()
6.2.3.2.5 Generic Mode
Figure 107: Generic mode Decoder Block Diagram
6.2.3.2.5.1 Low frequency envelope decoding
This is described in subclause 6.2.3.2.1.1.
6.2.3.2.5.2 High frequency envelope de-quantization
The envelope VQ indices for SWB or
for FB are used to de-quantize the high frequency envelope.
At 24.4kbps, the de-quantized high frequency envelope can be determined by:
(1859)
While at 32kbps, the de-quantized high frequency envelope can be determined by: (1860)
The final de-quantized envelope is then calculated:
()
In FB case, is further used to generate the de-quantized high frequency envelope.
6.2.3.2.5.3 High frequency envelope refinement
The high frequency envelope refinement is described in subclause 5.3.4.2.6.5. After de-quantizing the high frequency envelope using the VQ described in subclause 6.2.3.2.5.2, the de-quantized high frequency envelope is mapped to one of the HQ high rate normal mode bands. To generate the norms, the mapped high frequency envelope is quantized and de-quantized with the scalar quantizer, as shown in subclause 5.3.4.2.6. The de-quantized low frequency envelope and the de-quantized high frequency norms are then combined. Using the combined norms, the bit allocation information for each band is calculated using the fractional bit allocation method. If there are any high bands which have allocated bits, the refinement data is decoded and used to update the high frequency norms. The updated norms are used for de-normalizing the de-quantized spectrum by the PVQ algorithm in subclause 6.2.3.2.5.4 and the noise filling algorithm in subclause 6.2.3.2.5.5. Then the initial bit allocation information is updated, based on the number of bits used for representing the refinement data.
6.2.3.2.5.4 PVQ
This is described in subclause 6.2.3.2.1.2.2
6.2.3.2.5.5 Noise filling
Noise filling is performed as described in subclause 6.2.3.2.1.3.2.1. The last band for this noise filling in Generic mode is defined as, where
is the last band index where the spectrum was quantized using PVQ. The filled spectrum is further de-normalized to generate
as described in subclause 6.2.3.2.1.3.2.4. If core_sfm is higher than Nband_LF-1, then the
are the high frequency norms described in subclause 6.2.3.2.5.3.
6.2.3.2.5.6 High frequency excitation spectrum
The high frequency excitation spectrum is based on a copy of the decoded low frequency spectrum. First, spectral anti-sparseness processing is applied, and then dynamic range control is applied depending on the decoded excitation class. Finally, a simple spectral copy is done to create the high frequency excitation spectrum.
6.2.3.2.5.6.1 Spectral anti-sparseness processing
The spectral anti-sparseness processing is performed on the low frequency spectrum by inserting a 0.5 amplitude coefficient, with a random sign, where the normalized spectrum is zero. The end band for the spectral anti-sparseness processing is specified by Banti (=max(core_sfm,Nband_LF-1)) and the end frequency is specified by Lanti(=kend(Banti)).
()
where is a random seed and updated by
.
After applying this anti-sparseness processing, the energy is further modified by applying the low band dequantized envelope as described in subclause 6.2.3.2.5.1.
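The insertion of random-sign coefficients can be sketched in Python as shown below. The seed update rule and the way the sign is derived from the seed are not recoverable from this text, so the 16-bit linear congruential update in `next_seed`, the use of bit 15 as the sign, and the treatment of `end_bin` as an exclusive limit are assumptions of this sketch. The subsequent application of the low band de-quantized envelope is not shown.

```python
def next_seed(seed):
    """Placeholder 16-bit linear congruential update (assumed, not normative)."""
    return (seed * 31821 + 13849) & 0xFFFF


def anti_sparseness(norm_spectrum, end_bin, seed):
    """Replace zero coefficients below end_bin by +/-0.5 with a random sign.

    Returns the processed spectrum and the updated seed.
    """
    out = list(norm_spectrum)
    for k in range(min(end_bin, len(out))):
        if out[k] == 0:
            seed = next_seed(seed)
            # Assumed convention: top bit of the seed selects the sign.
            out[k] = -0.5 if seed & 0x8000 else 0.5
    return out, seed
```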
6.2.3.2.5.6.2 Control of dynamics based on the excitation class
Following the spectral anti-sparseness processing, the spectrum is further modified by additional processing to control the dynamics.
The spectrum is first normalised by calculating the envelope of the processed spectrum, then dividing the spectrum by this envelope. The window size for this normalisation depends on the signal characteristics.
The 256 low frequency MDCT coefficients in the 0-6400 Hz frequency range, are split into 16 sharpness bands (16 coefficients per band). In sharpness band j, if
and
, the counter
is incremented by one.
The maximum magnitude of the spectral coefficients in a sharpness band, denoted, is:
()
The counter is initialized to 0 and calculated for every frame. Then the normalization length
is obtained:
()
where the current normalization length is calculated as follows:
()
and the current normalization length is preserved as.
The spectrum is then normalized
()
where is the number of bands used in the control of dynamics.
The sign vectors for the spectrum are then removed, leaving just the magnitude, and the mean is then calculated for each band p. The bands are 16 frequency bins wide, and start at frequency bin 2. For the SWB case, at 24.4kbps there are 9 bands ending at frequency bin 145, while for 32kbps there are 8 bands, ending at frequency bin 129. For the FB case, at 24.4kbps there are 19 bands ending at frequency bin 305, while for 32kbps there are 18 bands, ending at frequency bin 289.
The amplitude of each frequency bin is then reduced by a dynamics control factor of the difference between the bin amplitude and mean of the band.
()
where drf is the dynamics control factor depending on the decoded excitation class, ()
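The band layout and the reduction of dynamics can be illustrated with the following Python sketch. It assumes that each magnitude is moved towards its band mean by the fraction `drf` of the difference, which is one reading of the description above. The function names are hypothetical and the actual `drf` values per excitation class are not reproduced here.

```python
def last_bin(num_bands, start_bin=2, band_width=16):
    """Index of the last frequency bin covered by num_bands bands."""
    return start_bin + num_bands * band_width - 1


def control_dynamics(magnitudes, drf, num_bands, start_bin=2, band_width=16):
    """Reduce each magnitude by drf times its distance to the band mean.

    magnitudes must cover all bins up to last_bin(num_bands).
    """
    out = list(magnitudes)
    for p in range(num_bands):
        lo = start_bin + p * band_width
        hi = lo + band_width
        mean = sum(magnitudes[lo:hi]) / band_width
        for k in range(lo, hi):
            out[k] = magnitudes[k] - drf * (magnitudes[k] - mean)
    return out
```

With `drf = 0` the magnitudes are unchanged, and with `drf = 1` every bin in a band takes the band mean, so a larger factor yields a flatter excitation.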
The original signs are then re-applied for the HF_Speech_excitation_class and the HF_excitation_class1; however random signs are used for the HF_excitation_class0. If is higher than 0, the original sign is re-applied, otherwise, a reversed sign of the original is applied. The initial random seed is
()
where is the number of allocated integer bits for each band.
The spectrum is then normalised:
()
The normalised spectrum is then copied, using the mapping in table 172, to create the high frequency excitation spectrum.
Table 172: Frequency mapping to generate high frequency excitation spectrum
| Mode | l | Source start bin | Source end bin | Destination start bin | Destination end bin |
| 24.4kbps | 0 | 2 | 129 | 320 | 447 |
| 24.4kbps | 1 | 2 | 129 | 448 | 575 |
| 24.4kbps | 2 | 80 | 143 | 576 | 639 |
| 24.4kbps FB | 3 | 144 | 303 | 640 | 799 |
| 32kbps | 0 | 2 | 129 | 384 | 511 |
| 32kbps | 1 | 2 | 129 | 512 | 639 |
| 32kbps FB | 2 | 130 | 289 | 640 | 799 |
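The copy operation driven by table 172 can be sketched in Python as follows. The four numeric columns of the table are interpreted here as inclusive source start, source end, destination start and destination end bins; this interpretation, the constant names, and the function `copy_to_high_band` are assumptions of the sketch.

```python
# (source start, source end, destination start, destination end), inclusive
MAPPING_24K4 = [(2, 129, 320, 447), (2, 129, 448, 575), (80, 143, 576, 639)]
MAPPING_24K4_FB = MAPPING_24K4 + [(144, 303, 640, 799)]
MAPPING_32K = [(2, 129, 384, 511), (2, 129, 512, 639)]
MAPPING_32K_FB = MAPPING_32K + [(130, 289, 640, 799)]


def copy_to_high_band(norm_spectrum, mapping, out_len=800):
    """Build the high frequency excitation by copying low band segments."""
    hf = [0.0] * out_len
    for src_start, src_end, dst_start, dst_end in mapping:
        # Source and destination segments must have equal length.
        assert src_end - src_start == dst_end - dst_start
        hf[dst_start:dst_end + 1] = norm_spectrum[src_start:src_end + 1]
    return hf
```

Every row maps segments of equal length (128, 64 or 160 bins), so no resampling is involved.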
Finally the high frequency excitation spectrum is adjusted at the junction boundaries,
()
where ,
,
and
.
If and then
()
where and
At 24.4kbps,
()
where ,
,
and
, and then
()
where,
and
.
6.2.3.2.5.7 Spectral envelope adjustment
Spectral envelope adjustment is used to generate the high frequency spectrum by combining the high frequency excitation spectrum and the interpolated envelope according to the frequency. The low frequency envelope is used in the first band. The interpolated envelope is calculated as follows:
(1875)
where p (p=1,…,Nband_G-1) is a band index;
wp(k) is an interpolation function where and
; and
is the initial value of the spectral envelope
.
The envelope is then multiplied by the generated high frequency excitation spectrum in subclause 6.2.3.2.5.6.
(1876)
In the FB case, the maximum value of k in equations (1875) and (1876) is corrected to 799 and a decision is made on whether or not to interpolate the envelope by comparing the envelope of the first band in FB and the last band in SWB.
()
where pb is the index of the first band in FB, 14 at 24.4kbps or 12 at 32kbps.
6.2.3.2.5.8 Spectral combining
The final step of the Generic mode is to combine the noise filled spectrum, obtained from decoding the quantized spectrum in subclauses 6.2.3.2.5.4 and 6.2.3.2.5.5, with the high frequency spectrum
, generated in subclause 6.2.3.2.5.7. The noise filled spectrum includes the low frequency spectrum and some bands in the high frequency spectrum where bits were allocated during the spectral quantization. The generated high frequency spectrum includes only the high frequency spectrum.
There are two kinds of overlap bands between these two spectra; one is a partial overlap at the junction between the low frequency and the high frequency (376~400 at 32kbps, 304~328 at 24.4kbps). The other is a full overlap due to the difference between the two band allocations, i.e. due to some bands in the high frequency spectrum being allocated bits during the spectral quantisation.
In the partial overlap band, the spectral combining is performed based on an overlap and add process. If there are any allocated bits from the spectral quantizer, the noise filled spectrum is used directly for the final decoded spectrum. If there were no bits allocated by the spectral quantizer, an overlap and add process between the two spectra is performed:
()
where is the overlapped length at the junction band, 16 at 24.4kbps and 8 at 32kbps.
In the full overlap bands, the spectrum is combined in a selective way. If there are any allocated bits from the spectral quantizer, the noise filled spectrum is used directly for the final decoded spectrum. If there were no bits allocated by the spectral quantizer, the high frequency spectrum
is used for generating the final decoded spectrum
.
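The selective combining in the full overlap bands can be sketched in Python as shown below. The function name `combine_full_overlap`, the half-open `(start, end)` band edges, and the per-band `band_bits` list are hypothetical. The overlap and add weighting of the partial overlap band is not reproduced.

```python
def combine_full_overlap(noise_filled, high_freq, band_edges, band_bits):
    """Select, per band, the noise filled or the generated HF spectrum.

    band_edges : list of (start, end) bin ranges, end exclusive
    band_bits  : bits allocated by the spectral quantizer for each band
    """
    out = list(noise_filled)
    for (start, end), bits in zip(band_edges, band_bits):
        if bits == 0:
            # No bits allocated: use the generated high frequency spectrum.
            out[start:end] = high_freq[start:end]
    return out
```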
6.2.3.2.6 PVQ decoding and de-indexing
6.2.3.2.6.1 High dynamic range arithmetic decoding
The PVQ-codewords are extracted from the bit stream using the Range decoder.
6.2.3.2.6.2 Split-PVQ decoding approach
The PVQ-split parameters are obtained as the inverse of the functions in subclause 5.3.4.2.7.2.
6.2.3.2.6.2.1 Split-PVQ Decoder band splitting calculation
The initial number of segments (parts) is computed according to the first equation in subclause 5.3.4.2.7.2.1. In case
and the band bit rate is high, the flag
is read from the bit stream. Finally,
is computed as
()
6.2.3.2.6.2.2 PVQ sub vector gain decoding
The decoded Split-PVQ angles are converted into sub-vector gains.
6.2.3.2.6.3 PVQ sub-vector MPVQ de-indexing
First the,
values for the sub vector
to be decoded are used in a ‘FindSizeAndOffsets(N,K)’ function which pre-computes the row
of the MPVQ offset matrix [
,
] and also calculates the integer size of the MPVQ-index MPVQ-size using this last row. The last four equations in subclause 5.3.4.2.7.4.1 are employed for the offset and size calculations.
Secondly, the 1 bit leading sign index and the MPVQ-index are obtained from the Range decoder using the calculated MPVQ-size information. The leading sign
is decoded from the 1 bit leading sign index, where a zero sign index yields a positive
value of “+1”, and a non-zero sign index yields a negative
value of “-1”.
The third step is the actual MPVQ-de-indexing scheme, converting the leading sign and the
to a valid integer
vector
.
The MPVQ de-indexing loop is carried out according to figure 108, where is the MPVQ-index,
is the current row number of the MPVQ offset matrix,
is the pointer into the samples/coefficients of the received PVQ-vector
. The function “FindAmplitudeAndOffset” obtains the amplitude
and the MPVQ indexing offset
for the current number of accumulated pulses
, by searching in the current row
in the MPVQ offset matrix. Further the function “UpdateOffsetsBwd” iteratively updates the required MPVQ-offsets for the next larger dimension using combinations of the last four equations in subclause 5.3.4.2.7.4.1. The function “GetLeadSign” obtains the next leading sign value
from the LSB of
, and shifts the
one bit to the right. On the decoder side the MPVQ recursion is run in the order of position 0 to position
, with a dimension
decreasing from
to 1.
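The two sign-related steps can be sketched in Python as follows. The mapping of the 1 bit leading sign index follows the text above. For "GetLeadSign", the mapping of an LSB of 0 to +1 and of 1 to -1 is assumed by analogy and is not confirmed by this clause. The function names are illustrative.

```python
def decode_leading_sign(sign_index):
    """Zero sign index yields +1, non-zero sign index yields -1."""
    return 1 if sign_index == 0 else -1


def get_lead_sign(mpvq_index):
    """Take the next leading sign from the LSB of the MPVQ-index and shift
    the index one bit to the right.

    Returns (sign, shifted_index). LSB 0 -> +1, LSB 1 -> -1 (assumed).
    """
    sign = 1 if (mpvq_index & 1) == 0 else -1
    return sign, mpvq_index >> 1
```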
Figure 108: Detailed MPVQ-de-indexing
The calculation of the indexing offset matrix is optimized to use direct calculations up to row for any combination of
and
, further if the number of unit pulses
is low enough and the dimension
is 5 or lower, a direct row
initialization of the offset matrix is used for the offset determination, where the last column in row
is calculated using the low dynamic “row-only” relation:
()