5.3.4 High Quality MDCT coder (HQ)
3GPP TS 26.445, Codec for Enhanced Voice Services (EVS); Detailed algorithmic description (Release 15)
5.3.4.1 Low-rate HQ coder
The structure of the Low-rate HQ MDCT core coder is presented in figure 65. Encoding is performed in the MDCT domain. The coding mode is decided based on the input signal bandwidth, the operational bitrate and the signal characteristics. For example, an input signal with a sampling frequency of 32 kHz or 48 kHz at an operational bitrate of 13.2 kbps or 16.4 kbps is encoded using one of three alternative modes: Transient mode, Normal mode or Harmonic mode. For an input signal with a sampling frequency of 8 kHz or 16 kHz, the tonality estimation block is not used, so the input signal is encoded in either Transient or Normal mode. The mode classification decision is described in subclause 5.3.4.1.1. Table 101 summarizes the supported modes per bitrate and bandwidth. Mode information (one bit for Transient/non-Transient, and one bit for Normal/Harmonic) is encoded and transmitted to the decoder side.
Table 101: Supported modes for low-rate HQ coder
| Bitrate [kbps] | Bandwidth | Supported modes |
|---|---|---|
| 7.2, 8 | NB | Normal, Transient |
| 13.2, 16.4 | NB, WB | Normal, Transient |
| 13.2, 16.4 | SWB | Normal, Transient, Harmonic |
Based on the mode, the obtained spectral coefficients are grouped into bands of unequal lengths. The energy of each band is estimated and the resulting spectral envelope consisting of the energies of all bands is quantized and encoded using Huffman coding. The quantized energies are used as input for bit allocation. The spectral coefficients are quantized using TCQ and USQ and encoded based on the allocated bits for each frequency band. The encoded spectral coefficients are adjusted using the quantized energies. The level of the most significant spectral coefficients is adjusted using an estimated gain which is coded and transmitted to the decoder. The spectral bands which are not quantized in the HF region are identified and coded with relatively few bits using the quantized spectral coefficients information.
The parameters transmitted from the encoder to the decoder are the mode selection, the energy envelope information, the quantized spectral coefficients, and the LF and HF parameters.
Figure 65: Structural block diagram of the Low-rate HQ coder
5.3.4.1.1 Tonality Estimation
The Non-transient mode is referred to as Normal mode for NB and WB inputs, whereas for SWB and FB inputs at 13.2 and 16.4 kbps, Non-transient signals are further classified into Normal and Harmonic mode. The classification between Normal and Harmonic mode is described as follows.
The classification uses the 448 MDCT coefficients in the 2400-13600 Hz frequency range. These coefficients are split into 14 sub-bands of 32 coefficients each. The frequency sharpness is defined as the ratio between the peak and the average magnitudes in each sharpness band:
(1026)
where the peak magnitude of spectral coefficients in a sharpness band, denoted, is:
(1027)
The counter denotes the number of sub-bands which have harmonic characteristics corresponding to the first 5 sub-bands. is initialized to zero and increased by one if . The counter denotes the number of sub-band which have harmonic characteristics corresponding to the remaining 9 sub-bands. is initialized to zero and increased by one if and .
The mode counters and are introduced for harmonic mode detection and are initialized to zero. is increased by one and is decreased by one if and . Otherwise, the mode counter is decreased by one and is increased by one. and are constrained to only hold values between 0 and 8. The calculations of and are summarized as follows,
(1028)
The current frame mode is classified as harmonic if , and ; or if .
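As a rough illustration of the sharpness measure described in this subclause, the following Python sketch computes the peak-to-average magnitude ratio for each of the 14 sub-bands. The function name `band_sharpness`, the start index 96 (2400 Hz at an assumed spacing of 25 Hz per MDCT coefficient, which is consistent with 448 coefficients spanning 2400-13600 Hz) and the handling of an all-zero band are assumptions made for illustration. The counters and thresholds of the mode decision are not reproduced here.

```python
def band_sharpness(coeffs, start=96, n_bands=14, band_len=32):
    """Peak-to-average magnitude ratio per sharpness band (illustrative sketch)."""
    sharpness = []
    for j in range(n_bands):
        lo = start + j * band_len
        mags = [abs(x) for x in coeffs[lo:lo + band_len]]
        mean = sum(mags) / band_len
        # Returning 0 for an all-zero band is an assumption, not taken from the text.
        sharpness.append(max(mags) / mean if mean > 0.0 else 0.0)
    return sharpness


# A flat spectrum with a single strong coefficient in the first sharpness band.
spectrum = [1.0] * 640
spectrum[96] = 33.0
s = band_sharpness(spectrum)
```

A flat band yields a ratio of 1, while a band dominated by one coefficient yields a much larger ratio, which is the property the harmonic classification relies on.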
5.3.4.1.2 Grouping of spectral coefficients
The spectral coefficients are divided to obtain bands of variable lengths; the total number of bands varies depending on the signal bandwidth (NB, WB, SWB, and FB), operational bitrates and the classifier. Tables 103 to 108 describe the band structure for different signal bandwidths and operational bitrates. The NB band structure is used for NB signals, the WB band structure is used for WB signals, and the SWB band structure is used for SWB and FB signals.
The total number of bands and the corresponding bandwidth used for NB, WB and SWB are presented in table 102. The band structure and the number of bands for FB are the same as for SWB at 13.2 and 16.4 kbps.
In the Transient mode, the coefficients of the equivalent four 5-ms transforms are consecutively joined, and the bandwidth varies based on the signal bandwidth. Tables 103, 104 and 105 show the detailed band structure for the NB, WB, and SWB (and FB) Transient frames based on the operational bitrates. In each table, b denotes the index of the band, kwidth(b) is the corresponding band length, and kstart(b) and kend(b) denote the start and end index of the spectral coefficients forming the band.
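The relation between the three quantities can be sketched in Python as follows: the bands are contiguous, so kstart(b) and kend(b) follow from the running sum of the band lengths. The helper name `band_edges` is hypothetical; the band lengths are those listed for NB at 7.2 and 8 kbps in tables 103 and 106.

```python
def band_edges(widths):
    """Return (kstart, kend) pairs for consecutive bands of the given lengths."""
    edges = []
    start = 0
    for width in widths:
        edges.append((start, start + width - 1))
        start += width
    return edges


# NB Transient frames at 7.2 and 8 kbps: the four band lengths of one 5-ms
# transform, repeated for the four consecutively joined transforms.
nb_transient = band_edges([6, 8, 11, 15] * 4)

# NB Normal mode frames at 7.2 and 8 kbps.
nb_normal = band_edges([6, 6, 6, 6, 7, 8, 9, 10, 13, 15, 19, 24, 31])
```

In both cases the last band ends at coefficient 159, i.e. the 160 coefficients of the NB bandwidth are fully covered.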
Table 102: Number of bands and its corresponding bandwidth
|  | Bitrate (kbps) | Nbands (Transient) | Nbands (Normal/Harmonic) | Bandwidth |
|---|---|---|---|---|
| NB | 7.2, 8 | 16 (4*4) | 13 | 160 |
| NB | 13.2 | 20 (4*5) | 17 | 160 |
| WB | 13.2 | 28 (4*7) | 18 | 320 |
| WB | 16.4 | 28 (4*7) | 20 | 320 |
| SWB, FB | 13.2 | 32 (4*8) | 22 | 568 |
| SWB, FB | 16.4 | 32 (4*8) | 24 | 640 |
Table 103: Band structure for NB Transient frames
| b | 7.2, 8 kbps: kwidth(b) | 7.2, 8 kbps: kstart(b) | 7.2, 8 kbps: kend(b) | 13.2 kbps: kwidth(b) | 13.2 kbps: kstart(b) | 13.2 kbps: kend(b) |
|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 5 | 6 | 0 | 5 |
| 1 | 8 | 6 | 13 | 7 | 6 | 12 |
| 2 | 11 | 14 | 24 | 7 | 13 | 19 |
| 3 | 15 | 25 | 39 | 9 | 20 | 28 |
| 4 | 6 | 40 | 45 | 11 | 29 | 39 |
| 5 | 8 | 46 | 53 | 6 | 40 | 45 |
| 6 | 11 | 54 | 64 | 7 | 46 | 52 |
| 7 | 15 | 65 | 79 | 7 | 53 | 59 |
| 8 | 6 | 80 | 85 | 9 | 60 | 68 |
| 9 | 8 | 86 | 93 | 11 | 69 | 79 |
| 10 | 11 | 94 | 104 | 6 | 80 | 85 |
| 11 | 15 | 105 | 119 | 7 | 86 | 92 |
| 12 | 6 | 120 | 125 | 7 | 93 | 99 |
| 13 | 8 | 126 | 133 | 9 | 100 | 108 |
| 14 | 11 | 134 | 144 | 11 | 109 | 119 |
| 15 | 15 | 145 | 159 | 6 | 120 | 125 |
| 16 | – | – | – | 7 | 126 | 132 |
| 17 | – | – | – | 7 | 133 | 139 |
| 18 | – | – | – | 9 | 140 | 148 |
| 19 | – | – | – | 11 | 149 | 159 |
Table 104: Band structure for WB Transient frames
| b | 13.2, 16.4 kbps: kwidth(b) | 13.2, 16.4 kbps: kstart(b) | 13.2, 16.4 kbps: kend(b) |
|---|---|---|---|
| 0 | 6 | 0 | 5 |
| 1 | 7 | 6 | 12 |
| 2 | 8 | 13 | 20 |
| 3 | 10 | 21 | 30 |
| 4 | 12 | 31 | 42 |
| 5 | 16 | 43 | 58 |
| 6 | 21 | 59 | 79 |
| 7 | 6 | 80 | 85 |
| 8 | 7 | 86 | 92 |
| 9 | 8 | 93 | 100 |
| 10 | 10 | 101 | 110 |
| 11 | 12 | 111 | 122 |
| 12 | 16 | 123 | 138 |
| 13 | 21 | 139 | 159 |
| 14 | 6 | 160 | 165 |
| 15 | 7 | 166 | 172 |
| 16 | 8 | 173 | 180 |
| 17 | 10 | 181 | 190 |
| 18 | 12 | 191 | 202 |
| 19 | 16 | 203 | 218 |
| 20 | 21 | 219 | 239 |
| 21 | 6 | 240 | 245 |
| 22 | 7 | 246 | 252 |
| 23 | 8 | 253 | 260 |
| 24 | 10 | 261 | 270 |
| 25 | 12 | 271 | 282 |
| 26 | 16 | 283 | 298 |
| 27 | 21 | 299 | 319 |
Table 105: Band structure for SWB, FB Transient frames
| b | 13.2 kbps: kwidth(b) | 13.2 kbps: kstart(b) | 13.2 kbps: kend(b) | 16.4 kbps: kwidth(b) | 16.4 kbps: kstart(b) | 16.4 kbps: kend(b) |
|---|---|---|---|---|---|---|
| 0 | 7 | 0 | 6 | 8 | 0 | 7 |
| 1 | 8 | 7 | 14 | 9 | 8 | 16 |
| 2 | 10 | 15 | 24 | 11 | 17 | 27 |
| 3 | 11 | 25 | 35 | 13 | 28 | 40 |
| 4 | 15 | 36 | 50 | 17 | 41 | 57 |
| 5 | 21 | 51 | 71 | 23 | 58 | 80 |
| 6 | 29 | 72 | 100 | 32 | 81 | 112 |
| 7 | 41 | 101 | 141 | 47 | 113 | 159 |
| 8 | 7 | 142 | 148 | 8 | 160 | 167 |
| 9 | 8 | 149 | 156 | 9 | 168 | 176 |
| 10 | 10 | 157 | 166 | 11 | 177 | 187 |
| 11 | 11 | 167 | 177 | 13 | 188 | 200 |
| 12 | 15 | 178 | 192 | 17 | 201 | 217 |
| 13 | 21 | 193 | 213 | 23 | 218 | 240 |
| 14 | 29 | 214 | 242 | 32 | 241 | 272 |
| 15 | 41 | 243 | 283 | 47 | 273 | 319 |
| 16 | 7 | 284 | 290 | 8 | 320 | 327 |
| 17 | 8 | 291 | 298 | 9 | 328 | 336 |
| 18 | 10 | 299 | 308 | 11 | 337 | 347 |
| 19 | 11 | 309 | 319 | 13 | 348 | 360 |
| 20 | 15 | 320 | 334 | 17 | 361 | 377 |
| 21 | 21 | 335 | 355 | 23 | 378 | 400 |
| 22 | 29 | 356 | 384 | 32 | 401 | 432 |
| 23 | 41 | 385 | 425 | 47 | 433 | 479 |
| 24 | 7 | 426 | 432 | 8 | 480 | 487 |
| 25 | 8 | 433 | 440 | 9 | 488 | 496 |
| 26 | 10 | 441 | 450 | 11 | 497 | 507 |
| 27 | 11 | 451 | 461 | 13 | 508 | 520 |
| 28 | 15 | 462 | 476 | 17 | 521 | 537 |
| 29 | 21 | 477 | 497 | 23 | 538 | 560 |
| 30 | 29 | 498 | 526 | 32 | 561 | 592 |
| 31 | 41 | 527 | 567 | 47 | 593 | 639 |
In the Normal mode of operation, the bands have different sizes that increase with increasing frequency. This subdivision allows a consistent representation of the spectrum which closely resembles that of the human ear. Higher frequency resolution is used for low frequencies, while lower frequency resolution is used for high frequencies. The detailed allocation of spectral coefficients to bands for NB, WB, SWB and FB signals is presented in tables 106, 107 and 108.
Table 106: Band structure for NB Normal mode frames
| b | 7.2, 8 kbps: kwidth(b) | 7.2, 8 kbps: kstart(b) | 7.2, 8 kbps: kend(b) | 13.2 kbps: kwidth(b) | 13.2 kbps: kstart(b) | 13.2 kbps: kend(b) |
|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 5 | 6 | 0 | 5 |
| 1 | 6 | 6 | 11 | 6 | 6 | 11 |
| 2 | 6 | 12 | 17 | 6 | 12 | 17 |
| 3 | 6 | 18 | 23 | 6 | 18 | 23 |
| 4 | 7 | 24 | 30 | 6 | 24 | 29 |
| 5 | 8 | 31 | 38 | 6 | 30 | 35 |
| 6 | 9 | 39 | 47 | 7 | 36 | 42 |
| 7 | 10 | 48 | 57 | 7 | 43 | 49 |
| 8 | 13 | 58 | 70 | 8 | 50 | 57 |
| 9 | 15 | 71 | 85 | 8 | 58 | 65 |
| 10 | 19 | 86 | 104 | 9 | 66 | 74 |
| 11 | 24 | 105 | 128 | 10 | 75 | 84 |
| 12 | 31 | 129 | 159 | 11 | 85 | 95 |
| 13 | – | – | – | 13 | 96 | 108 |
| 14 | – | – | – | 15 | 109 | 123 |
| 15 | – | – | – | 17 | 124 | 140 |
| 16 | – | – | – | 19 | 141 | 159 |
Table 107: Band structure for WB Normal mode frames
| b | 13.2 kbps: kwidth(b) | 13.2 kbps: kstart(b) | 13.2 kbps: kend(b) | 16.4 kbps: kwidth(b) | 16.4 kbps: kstart(b) | 16.4 kbps: kend(b) |
|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 5 | 6 | 0 | 5 |
| 1 | 6 | 6 | 11 | 6 | 6 | 11 |
| 2 | 6 | 12 | 17 | 6 | 12 | 17 |
| 3 | 6 | 18 | 23 | 6 | 18 | 23 |
| 4 | 6 | 24 | 29 | 6 | 24 | 29 |
| 5 | 7 | 30 | 36 | 6 | 30 | 35 |
| 6 | 7 | 37 | 43 | 7 | 36 | 42 |
| 7 | 8 | 44 | 51 | 8 | 43 | 50 |
| 8 | 10 | 52 | 61 | 8 | 51 | 58 |
| 9 | 11 | 62 | 72 | 9 | 59 | 67 |
| 10 | 13 | 73 | 85 | 11 | 68 | 78 |
| 11 | 16 | 86 | 101 | 12 | 79 | 90 |
| 12 | 19 | 102 | 120 | 14 | 91 | 104 |
| 13 | 24 | 121 | 144 | 17 | 105 | 121 |
| 14 | 30 | 145 | 174 | 20 | 122 | 141 |
| 15 | 37 | 175 | 211 | 23 | 142 | 164 |
| 16 | 47 | 212 | 258 | 28 | 165 | 192 |
| 17 | 61 | 259 | 319 | 34 | 193 | 226 |
| 18 | – | – | – | 42 | 227 | 268 |
| 19 | – | – | – | 51 | 269 | 319 |
Table 108: Band structure for SWB and FB Normal and Harmonic frames
| b | 13.2 kbps: kwidth(b) | 13.2 kbps: kstart(b) | 13.2 kbps: kend(b) | 16.4 kbps: kwidth(b) | 16.4 kbps: kstart(b) | 16.4 kbps: kend(b) |
|---|---|---|---|---|---|---|
| 0 | 6 | 0 | 5 | 6 | 0 | 5 |
| 1 | 6 | 6 | 11 | 6 | 6 | 11 |
| 2 | 6 | 12 | 17 | 6 | 12 | 17 |
| 3 | 6 | 18 | 23 | 6 | 18 | 23 |
| 4 | 6 | 24 | 29 | 6 | 24 | 29 |
| 5 | 6 | 30 | 35 | 6 | 30 | 35 |
| 6 | 7 | 36 | 42 | 7 | 36 | 42 |
| 7 | 8 | 43 | 50 | 7 | 43 | 49 |
| 8 | 9 | 51 | 59 | 8 | 50 | 57 |
| 9 | 10 | 60 | 69 | 9 | 58 | 66 |
| 10 | 11 | 70 | 80 | 10 | 67 | 76 |
| 11 | 13 | 81 | 93 | 11 | 77 | 87 |
| 12 | 16 | 94 | 109 | 13 | 88 | 100 |
| 13 | 19 | 110 | 128 | 15 | 101 | 115 |
| 14 | 23 | 129 | 151 | 18 | 116 | 133 |
| 15 | 28 | 152 | 179 | 21 | 134 | 154 |
| 16 | 34 | 180 | 213 | 26 | 155 | 180 |
| 17 | 42 | 214 | 255 | 32 | 181 | 212 |
| 18 | 55 | 256 | 310 | 39 | 213 | 251 |
| 19 | 68 | 311 | 378 | 48 | 252 | 299 |
| 20 | 84 | 379 | 462 | 59 | 300 | 358 |
| 21 | 105 | 463 | 567 | 74 | 359 | 432 |
| 22 | – | – | – | 92 | 433 | 524 |
| 23 | – | – | – | 115 | 525 | 639 |
5.3.4.1.3 Energy Envelope coding
The energy envelope coding module is applied to all signal types, i.e. NB, WB, SWB and FB, at the bitrates described in table 101. In this module, the spectrum energy of each band is computed, and the differential indices of the scalar quantized band energies are encoded using either a Large symbol coding method or a Small symbol coding method. The coding method is selected according to the range required to represent all the differential indices and the bit consumption. The encoding of band energies is detailed below.
The spectrum energy of a band, EM(b) is computed as follows:
(1029)
In case of transient mode, the energies to be quantized are first reordered such that the energies corresponding to the even sub-frame indices m = 0, 2 are in frequency-increasing order while the energies of the odd sub-frame indices m = 1, 3 are in frequency-decreasing order, which allows for an efficient differential energy encoding.
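One possible reading of this reordering is sketched below in Python: the band energies of each sub-frame are kept together, and the blocks belonging to the odd sub-frames are reversed. The function name `reorder_transient` and the assumption that the energies arrive as four consecutive per-sub-frame blocks are illustrative and not taken from the text.

```python
def reorder_transient(energies, bands_per_subframe):
    """Reverse the band order of odd sub-frames (m = 1, 3); keep even ones as is."""
    reordered = []
    for offset in range(0, len(energies), bands_per_subframe):
        block = energies[offset:offset + bands_per_subframe]
        m = offset // bands_per_subframe
        if m % 2 == 1:
            block = block[::-1]  # frequency-decreasing order
        reordered.extend(block)
    return reordered


# Four sub-frames with two bands each, labelled 0..7 for readability.
example = reorder_transient([0, 1, 2, 3, 4, 5, 6, 7], 2)
```

With this layout, the last band of an even sub-frame is followed by the highest band of the next sub-frame, so neighbouring entries tend to have similar energies. Applying the same function a second time restores the original order.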
In each frame, the energies are scalar quantized with a uniform scalar quantizer qint. The value of qint is selected based on table 109. The index of the quantized energy, IM (b), is obtained as:
(1030)
Table 109: Scalar quantizer values for NB, WB, SWB, FB modes
| BW | Mode | 7.2 kbps | 8 kbps | 13.2 kbps | 16.4 kbps |
|---|---|---|---|---|---|
| NB | Transient | 1.8 | 2.2 | 1.4 | – |
| NB | Normal | 1 | 1 | 0.8 | – |
| WB | Transient | – | – | 1.8 | 1.8 |
| WB | Normal | – | – | 0.8 | 0.8 |
| SWB, FB | Transient | – | – | 3 | 1.2 |
| SWB, FB | Normal, Harmonic | – | – | 0.6 | 0.6 |
The quantized indices of the band energies are differentially coded by computing
(1031)
where lowest frequency band quantized index is differentially coded using a reference band energy .
The differential indices ∆IM (b) are constrained into the range of [–256, 255]. This is performed by first adjusting the negative differential indices and then adjusting the positive differential indices as follows:
(1032)
The constrained differential indices are used for selecting the more efficient mode from either the Small symbol coding method or the Large symbol coding method. The method selection between the Small symbol coding mode and the Large symbol coding mode is described in subclause 5.3.4.1.3.2.
Using the coding method information obtained from subclause 5.3.4.1.3.2 differential indices are coded using the respective modes as indicated below.
A flag bit DENG_CMODE, obtained as described in subclause 5.3.4.1.3.2, indicates which of the Small symbol coding method and the Large symbol coding method is used, and is transmitted as side information to the decoder. The flag DENG_CMODE is set to 1 when the Small symbol coding method is used and to 0 when the Large symbol coding method is used, as described in table 110.
If the flag DENG_CMODE is set to 1, in the Small symbol coding the band energies are either coded by resized or context based Huffman coding. A flag bit LC_MODE is used to indicate the mode selection between resized or context based Huffman coding and it is transmitted as side information to the decoder. The mode selection between resized and context based coding is done based on the estimated number of bits consumed by the respective coding modes. The resized Huffman coding described in subclause 5.3.4.1.3.3.2 is used for coding the differential indices when LC_MODE is set to 1 and LC_MODE is set to 0 for coding the differential indices using context based coding which is described in subclause 5.3.4.1.3.3.1.
5.3.4.1.3.1 Reconstruction of quantized energies
The quantized differential indices are reconstructed according to
(1033)
where is the reference band energy The final resulting reconstructed quantized energies are obtained as follows
(1034)
where the value of qint varies and it is selected based on table 109.
If band energies are quantized under transient mode i.e., IsTransient is True, the energies are reordered back to original.
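The chain of uniform quantization, differential coding and reconstruction can be illustrated with the following Python sketch. It assumes round-to-nearest quantization, first-order differences between neighbouring bands, and a reference index `ref_index` for the lowest band; these details, like the function names, are assumptions for illustration and are not the normative equations.

```python
import math


def quantize_energies(energies, qint):
    """Uniform scalar quantization; rounding to nearest is an assumption."""
    return [math.floor(e / qint + 0.5) for e in energies]


def differential_indices(indices, ref_index):
    """Difference of each index to its lower neighbour (reference for band 0)."""
    diffs = []
    previous = ref_index
    for index in indices:
        diffs.append(index - previous)
        previous = index
    return diffs


def reconstruct_energies(diffs, ref_index, qint):
    """Accumulate the differences and scale back by the quantizer step."""
    energies = []
    index = ref_index
    for d in diffs:
        index += d
        energies.append(index * qint)
    return energies


qint = 0.8  # step size of table 109 for WB Normal mode
idx = quantize_energies([10.2, 11.9, 9.4], qint)
diffs = differential_indices(idx, ref_index=10)  # ref_index is hypothetical
rec = reconstruct_energies(diffs, ref_index=10, qint=qint)
```

The reconstructed energies differ from the input by at most half a quantizer step, and the decoder needs only the differences, the reference and qint.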
5.3.4.1.3.2 Energy envelope coding mode selection
The differential quantization indices are encoded by one of two coding methods. The coding method is selected according to the range required to represent all the differential indices and the bit consumption. The range of the large symbol coding method enables it to represent larger number of bits. The large symbol coding method consists of a scale mode and a pulse mode. The small symbol coding method uses an upper bit coding method which consists of a context based Huffman coding mode and a re-sized Huffman coding mode as well as bit packing for the lower bit.
Table 110: Low-rate HQ envelope coding modes
| Coding Method index DENG_CMODE (1 bit) | Coding Method | Coding Mode index (1 bit) | Description |
|---|---|---|---|
| 0 | Large symbol method | 0 | Pulse mode |
| 0 | Large symbol method | 1 | Scale mode |
| 1 | Small symbol method | 0 | Context based Huffman coding mode |
| 1 | Small symbol method | 1 | Re-sized Huffman coding mode |
If at least one of the differential quantization indices in all the bands of a frame cannot be represented in [-32, 31]([-46,17] for the first index), the large symbol coding method is always used. If this larger range is not required the bits consumption for both large and small symbol coding modes are compared and the coding mode with the least bits is selected. The corresponding coding method information is transmitted for each frame.
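The selection rule of the preceding paragraph can be sketched in Python as follows. The function returns the value of DENG_CMODE (1 for the Small symbol method, 0 for the Large symbol method). The function name, the argument names and the tie-breaking in favour of the Small symbol method are assumptions; the bit estimates themselves are taken as inputs.

```python
def select_envelope_method(diffs, small_bits, large_bits):
    """Return DENG_CMODE: 1 for the Small symbol method, 0 for the Large one."""
    first_fits = -46 <= diffs[0] <= 17
    others_fit = all(-32 <= d <= 31 for d in diffs[1:])
    if not (first_fits and others_fit):
        return 0  # range exceeded: the Large symbol method is always used
    # Otherwise the cheaper method wins; preferring Small on a tie is an assumption.
    return 1 if small_bits <= large_bits else 0
```

For example, `select_envelope_method([5, -40, 3], 60, 90)` returns 0 because -40 lies outside [-32, 31], even though the Small symbol estimate is lower.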
The Small symbol coding method and the large symbol method are used for estimating the bits consumption for coding the differential indices which are obtained in equation (1031). The detailed descriptions of the respective coding modes are given in the following subclause 5.3.4.1.3.3.
The number of estimated bits hcode1 and flag LCmode information obtained from subclause 5.3.4.1.3.3. is used for selecting the best mode with the Large symbol coding method bits ULbits obtained from subclause 5.3.4.1.3.4, as shown below.
5.3.4.1.3.3 Small symbol coding method
If IsTransient is True,
In this module, the quantized indices obtained from equation (1030) are constrained into the range of [–15, 16] by calculating the differences as in equation (1031) and constraining the range. This is performed by first adjusting the negative differential indices and then adjusting the positive differential indices as follows:
- Compute the differential indices for lowest frequency band using reference band energy as in equation (1031)
- For b = 0, if and
- For the rest of the bands, compute the differential indices defined in equation (1031) in order from the highest-frequency band to the lowest-frequency band.
- If
- Re-compute the differential indices in order from the lowest-frequency sub-vector from band b=1 to the highest-frequency sub-vector.
- If
- The adjusted differential indices in the range [0, 31] are obtained by adding an offset of 15 to ∆IM(b).
Context based Huffman coding mode is used for estimating the bit consumption. If the range of the differential indices lies within [10, 22], the resized Huffman coding [b-Huffman] mode is also enabled for estimating the bit consumption. The Huffman codes for the differential indices for the resized Huffman coding mode when IsTransient is True are given in table 111. In table 111, Hi denotes the index of the Huffman code, Hc is the Huffman code corresponding to index Hi, and Hb denotes the number of bits required for representing the Huffman code corresponding to index Hi.
Table 111: Huffman code for Transient frames
| Hi | Hc | Hb | Hi | Hc | Hb | Hi | Hc | Hb | Hi | Hc | Hb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 8 | 0 | 0 | 16 | 11 | 2 | 24 | 0 | 0 |
| 1 | 0 | 0 | 9 | 0 | 0 | 17 | 0010 | 4 | 25 | 0 | 0 |
| 2 | 0 | 0 | 10 | 0 | 0 | 18 | 011010 | 6 | 26 | 0 | 0 |
| 3 | 0 | 0 | 11 | 1111010 | 7 | 19 | 00111010 | 8 | 27 | 0 | 0 |
| 4 | 0 | 0 | 12 | 01010 | 5 | 20 | 010111010 | 9 | 28 | 0 | 0 |
| 5 | 0 | 0 | 13 | 110 | 3 | 21 | 110111010 | 9 | 29 | 0 | 0 |
| 6 | 0 | 0 | 14 | 01 | 2 | 22 | 0 | 0 | 30 | 0 | 0 |
| 7 | 0 | 0 | 15 | 00 | 2 | 23 | 0 | 0 | 31 | 0 | 0 |
The estimated bits for context based coding which was obtained from subclause 5.3.4.1.3.3.1 is represented as ,while for resized coding it is represented as, a best coding mode is selected based on
.
If IsTransient is False,
The differential quantization indices are adjusted to have positive values by adding 46 in the first band and 32 in the other bands. The differential quantization indices are split into 5 upper bits and 1 lower bit. The 5 upper bits are encoded by either a context based Huffman coding mode described in subclause 5.3.4.1.3.3.1 or a re-sized Huffman coding mode described in subclause 5.3.4.1.3.3.2 and the 1 lower bit is packed.
In more detail, the context based coding mode or the resized Huffman coding is used for estimating the bits. The differential indices ∆IM(b) obtained in equation (1031) are constrained into the range of [0, 63] by adding an offset of 32 to ∆IM(b) for b = 1, …, Nbands-1 and an offset of 46 to ∆IM(0). If the constrained differential indices exceed [0, 63] when IsTransient is False, or [0, 31] when IsTransient is True, hcode1 is set to -1 and the differential indices are coded using the Large symbol coding method.
The least significant bit is extracted from the constrained differential indices ∆IM(b) for b = 0,….,Nbands-1 using
(1035)
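For non-transient frames, the offsetting and the split into 5 upper bits and 1 lower bit can be sketched in Python as below. The function name `split_non_transient` and the use of `None` to signal that the range [0, 63] is exceeded are illustrative assumptions; the offsets 46 and 32 are those given in the text.

```python
def split_non_transient(diffs):
    """Offset the differential indices and split them into upper bits and LSB."""
    upper, lower = [], []
    for b, d in enumerate(diffs):
        adjusted = d + (46 if b == 0 else 32)
        if not 0 <= adjusted <= 63:
            return None  # caller falls back to the Large symbol coding method
        upper.append(adjusted >> 1)  # 5 upper bits, Huffman coded
        lower.append(adjusted & 1)   # 1 lower bit, packed as is
    return upper, lower
```

The original adjusted index is recovered as `2 * upper + lower`, so the split itself is lossless.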
The updated differential indices are used for estimating the bits consumed by the two different coding modes. Based on the estimated bits obtained from context based coding, , and resized Huffman coding mode, , the best coding mode is selected as shown below.
The differential index ∆IM (0) is usually transmitted as is, whether IsTransient is True or False, i.e. 5 bits per norm index are required, and the estimated bits are updated with these bits as shown below.
(1036)
The least significant bit extracted from the constrained differential indices is transmitted as is, if the Small symbol coding method is selected and is updated as shown below.
(1037)
By default flag LCmode is set to 0 and represents context based coding, while if flag LCmode is reset to 1, it indicates resized Huffman coding mode consumes less bits compared to context based coding, and the estimated bits is used for selecting the best mode with the Large symbol coding method bits ULbits.
5.3.4.1.3.3.1 Context based Huffman coding mode
If this coding mode is selected for the current frame, the context based Huffman coding is applied to the adjusted differential indices. These indices are encoded using a context model which corresponds to the adjusted differential index in the previous band. The first band must be handled separately, so the context for encoding is adjusted by subtracting
(1038)
There are three groups depending on the and two probability models as shown in table 112, and share the Huffman tables depending on the probability model.
Table 112: The groups and the probability models
| Group index | Lower bound | Upper bound | Probability model |
|---|---|---|---|
| 0 | – | 12 | 0 |
| 1 | 13 | 17 | 1 |
| 2 | 18 | – | 0 |
A Huffman table for and is defined in table 113 and a Huffman table for is defined in table 114.
If is located in, is reversed toand then the reversed value is encoded.
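The mapping of table 112 from a context value to its group and probability model can be sketched in Python as follows. The function name `context_group` is hypothetical, and the reversal applied to one of the groups is not reproduced because its details are not recoverable from the text.

```python
def context_group(context):
    """Return (group index, probability model) for a context value (table 112)."""
    if context <= 12:
        return 0, 0
    if context <= 17:
        return 1, 1
    return 2, 0
```

Groups 0 and 2 share probability model 0 and therefore the same Huffman table, while group 1 uses its own table.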
Table 113: Huffman coefficient table for the group0 Huffman coding (group0,group2)
| Index | Code | Index | Code | Index | Code | Index | Code |
|---|---|---|---|---|---|---|---|
| 0 | 11100111101111 | 8 | 11100110 | 16 | 000 | 24 | 111011 |
| 1 | 11100111101110 | 9 | 1110100 | 17 | 010 | 25 | 0110011 |
| 2 | 1110011110110 | 10 | 0110010 | 18 | 110 | 26 | 1110010 |
| 3 | 111001111010 | 11 | 100100 | 19 | 0111 | 27 | 1110101 |
| 4 | 1001010001 | 12 | 01101 | 20 | 1111 | 28 | 1001011 |
| 5 | 1110011111 | 13 | 1000 | 21 | 10011 | 29 | 10010101 |
| 6 | 111001110 | 14 | 101 | 22 | 011000 | 30 | 1001010000 |
| 7 | 100101001 | 15 | 001 | 23 | 111000 | 31 | 11100111100 |
Table 114: Huffman coefficient table for the context based Huffman coding (group1)
| Index | Code | Index | Code | Index | Code | Index | Code |
|---|---|---|---|---|---|---|---|
| 0 | 0010000100110 | 8 | 001000101 | 16 | 11 | 24 | 0010000101 |
| 1 | 001000100110 | 9 | 00100110 | 17 | 100 | 25 | 00100001000 |
| 2 | 00100111010 | 10 | 0010010 | 18 | 1011 | 26 | 001000010010 |
| 3 | 00100010010 | 11 | 101000 | 19 | 10101 | 27 | 00100111011 |
| 4 | 00100010001 | 12 | 00101 | 20 | 101001 | 28 | 001000100111 |
| 5 | 00100010000 | 13 | 0011 | 21 | 00100000 | 29 | 00100001001110 |
| 6 | 0010011100 | 14 | 000 | 22 | 00100011 | 30 | 001000010011110 |
| 7 | 001001111 | 15 | 01 | 23 | 001000011 | 31 | 001000010011111 |
5.3.4.1.3.3.2 Resized Huffman coding mode
Resized Huffman coding is applied to the adjusted differential indices. In this method, the span of the differential indices is reduced while being able to perfectly reconstruct the differential indices. Based on the newly modified differential indices obtained from equation (1040), number of bits consumed for coding the new differential indices is estimated as shown below.
(1039)
5.3.4.1.3.3.3 Differential Indices Modification
The modification of differential indices is done according to the value of the differential index for the preceding sub band and a threshold. Equation (1040) is used for modifying the span of differential indices. It should be noted that this modification is not applied to the first differential index, i.e. the case of "b = 1" and the differential indices which are not true for the both of the “if” conditions.
(1040)
Resized Huffman coding is applied to the new differential indices obtained from equation (1040). If any of the new differential indices lies outside the [0, 31] range, resized Huffman coding is not used for coding the differential indices.
The range of the new differential indices for Huffman coding is identified as shown below.
(1041)
where b = 1, … Nbands-1
Based on the range obtained from equation (1041), range difference is calculated as shown below.
(1042)
Resized Huffman coding is used for coding the new differential indices if the RangeDiff value is below 11; otherwise resized Huffman coding is not used. The Huffman codes and their corresponding bit consumption for coding the new differential indices are given in table 115.
Table 115: Huffman code for Non-Transient frames
| Hi | Hc | Hb | Hi | Hc | Hb | Hi | Hc | Hb | Hi | Hc | Hb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 8 | 001111111 | 9 | 16 | 10 | 2 | 24 | 1011111111 | 10 |
| 1 | 0 | 0 | 9 | 00111111 | 8 | 17 | 101 | 3 | 25 | 1111111111 | 11 |
| 2 | 0 | 0 | 10 | 0011111 | 7 | 18 | 1011 | 4 | 26 | 0 | 0 |
| 3 | 0 | 0 | 11 | 001111 | 6 | 19 | 10111 | 5 | 27 | 0 | 0 |
| 4 | 0 | 0 | 12 | 00111 | 5 | 20 | 101111 | 6 | 28 | 0 | 0 |
| 5 | 01111111111 | 11 | 13 | 0011 | 4 | 21 | 1011111 | 7 | 29 | 0 | 0 |
| 6 | 0111111111 | 10 | 14 | 001 | 3 | 22 | 10111111 | 8 | 30 | 0 | 0 |
| 7 | 0011111111 | 10 | 15 | 00 | 2 | 23 | 101111111 | 9 | 31 | 0 | 0 |
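A bit estimate for the resized Huffman coding mode can be obtained by summing the Hb entries of table 115, as in the following Python sketch. The names `HB_TABLE_115` and `resized_huffman_bits` are hypothetical, and returning `None` for an index without a code (Hb equal to 0) is an illustrative convention. The handling of the first differential index and of the lower bits is not included.

```python
# Hb values of table 115 for the indices that have a Huffman code (5..25).
HB_TABLE_115 = {
    5: 11, 6: 10, 7: 10, 8: 9, 9: 8, 10: 7, 11: 6, 12: 5, 13: 4, 14: 3,
    15: 2, 16: 2, 17: 3, 18: 4, 19: 5, 20: 6, 21: 7, 22: 8, 23: 9, 24: 10,
    25: 11,
}


def resized_huffman_bits(indices):
    """Sum of Hb over the indices, or None if an index has no code."""
    if any(i not in HB_TABLE_115 for i in indices):
        return None
    return sum(HB_TABLE_115[i] for i in indices)
```

Indices close to 15 cost only 2 or 3 bits each, which is why reducing the span of the differential indices pays off.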
5.3.4.1.3.4 Large symbol coding method
If the large symbol coding method is used then either the pulse mode or the scale mode is selected to encode the differential quantization indices. The pulse mode is adequate when no differential quantization index is over [-4,3]. If this range is exceeded, the pulse mode cannot be used, and instead the scale mode is always used. Additionally, if the first quantization differential index is over [-64,63], the scale mode is always used. In the large symbol coding method a Huffman coding with 8 symbols shown in table 116 is used.
Table 116: Huffman coefficient table in the large symbol coding method
| Index | Code |
|---|---|
| -4 | 0001011 |
| -3 | 00011 |
| -2 | 001 |
| -1 | 01 |
| 0 | 1 |
| 1 | 0000 |
| 2 | 000100 |
| 3 | 0001010 |
5.3.4.1.3.4.1 Pulse mode
In the pulse mode, there are two indicators; an indicator to show whether the first index is transmitted separately and an indicator to show if there is a differential quantization index exceeding the range [-4,3].
If the first index is within [-4,3], is set to 0 and the first index is then encoded by the Huffman coding defined in table 116 with the other indices. Otherwise, is set to 1 and the first index is then packed using 7 bits after adding 64.
If a pulse exists in the current frame, is set to 1 and the pulse position and amplitude are transmitted using 5 bits and 7 bits respectively. All the other indices are then encoded by the Huffman coding in table 116. If no pulse exists, all indices are encoded by the Huffman coding in table 116.
Table 117: bit allocation for the scale mode
|
Huffman bits |
||||||||
|---|---|---|---|---|---|---|---|---|
|
bits |
1 |
1 |
1 |
1 |
7 |
5 |
7 |
– |
5.3.4.1.3.4.2 Scale mode
In the scale mode, all the indices are split into 3 upper bits and a few lower bits depending on the minimum and maximum of all the indices. The 3 upper bits are encoded by the Huffman coding in table 116 and the lower bits are packed. The number of lower bits is defined as . It is calculated to make all the differential quantization indices fit within the range [-4,3] by scaling down the indices, and is represented by three bits.
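A minimal Python sketch of this split is given below, assuming that scaling down is an arithmetic right shift and that the smallest shift which brings every index into [-4, 3] is chosen. The function names are hypothetical and the actual procedure may differ in detail.

```python
def scale_mode_lower_bits(diffs):
    """Smallest number of lower bits such that all upper parts fit in [-4, 3]."""
    for n_lower in range(8):  # the count is represented by three bits
        if all(-4 <= (d >> n_lower) <= 3 for d in diffs):
            return n_lower
    raise ValueError("indices do not fit even with 7 lower bits")


def split_scale_mode(diffs):
    """Split each index into an upper part (Huffman coded) and packed lower bits."""
    n_lower = scale_mode_lower_bits(diffs)
    mask = (1 << n_lower) - 1
    upper = [d >> n_lower for d in diffs]
    lower = [d & mask for d in diffs]
    return n_lower, upper, lower
```

Each index is recovered as `(upper << n_lower) | lower`. For indices constrained to [-256, 255], at most 6 lower bits are needed under this assumption.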
5.3.4.1.4 MDCT coefficients quantization
5.3.4.1.4.1 Normal Mode
5.3.4.1.4.1.1 Overview
The figure below shows the overview of the normal mode encoder.
Figure 66: Block diagram of the Normal mode encoder overview
5.3.4.1.4.1.2 Energy envelope coding
Details are described in subclause 5.3.4.1.3.
5.3.4.1.4.1.3 Tonality flag calculation
Tonality flags are calculated for the high bands b= Nbands– hb, ..,Nbands-1 as described in table 118. For example, for the last (highest) five bands, i.e. b=17 to 21 for 13.2 kbps and b=19 to 23 for 16.4kbps in table 108, and the last three bands b=15 to 17 for 13.2 kbps and b=17 to 19 for 16.4kbps in case of WB as in table 107 and last two bands b=15 to 16 for 13.2 kbps for NB inputs as in table 106, peak-to-average ratios are calculated and compared with a threshold. Flags indicating whether the peak-to-average ratios are greater than the threshold are sent to the decoder side using one bit per band. If the peak-to-average ratio is greater than the threshold, the tonality flag is set to “1”, otherwise it is set to “0”. For the SWB and FB case, a limited-band mode flag is further sent to the decoder side as described in subclause 5.3.4.1.4.1.4.4.2, if the tonality flag is set to “1”.
Table 118: Total number of high bands for tonality calculation
| Bandwidth | Bitrate (kbps) | High bands, hb |
|---|---|---|
| NB | 13.2 | 2 |
| WB | 13.2, 16.4 | 3 |
| SWB, FB | 13.2, 16.4 | 5 |
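The selection of the high bands and the flag decision can be sketched in Python as follows. The band range b = Nbands - hb, ..., Nbands - 1 follows the text. The peak-to-average computation on magnitudes and the `threshold` argument are assumptions, since the threshold value is not stated here, and the function names are hypothetical.

```python
def high_band_indices(n_bands, hb):
    """Indices of the hb highest bands: Nbands - hb, ..., Nbands - 1."""
    return list(range(n_bands - hb, n_bands))


def tonality_flags(coeffs, bands, threshold):
    """One flag per (kstart, kend) band: 1 if peak-to-average exceeds threshold."""
    flags = []
    for k_start, k_end in bands:
        mags = [abs(x) for x in coeffs[k_start:k_end + 1]]
        mean = sum(mags) / len(mags)
        ratio = max(mags) / mean if mean > 0.0 else 0.0
        flags.append(1 if ratio > threshold else 0)
    return flags


# SWB at 13.2 kbps: 22 bands and hb = 5 give bands 17 to 21.
swb_high = high_band_indices(22, 5)
# Toy spectrum: a flat band followed by a band with a single peak.
flags = tonality_flags([1.0] * 10 + [8.0, 0.0, 0.0, 0.0], [(0, 9), (10, 13)], 2.0)
```

One bit per band is then sufficient to convey the flags to the decoder.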
5.3.4.1.4.1.4 Bit allocation
5.3.4.1.4.1.4.1 Bit allocation overview
In the Normal Mode, band spectra are encoded by either Trellis Coded Quantization (TCQ) or Pitch Filtering Spectrum Coding (PFSC) with the bits assigned to the bands. TCQ is used for encoding peaky/tonal spectrum bands for NB, WB, SWB and FB, while PFSC is mainly applied for encoding the other spectrum bands in the high-frequency region for bandwidths other than NB. This switching principle is analogous to the coding mode switching between Generic mode and Sinusoidal mode in G.718 Annex B [25].
The bit-allocation process is performed in the following manner. Firstly, the bands encoded using PFSC are identified: for SWB and FB, based on the tonality flags among the four highest bands; for WB, based on the peak-to-average ratio calculated for the last band. The necessary bits (1 or 2 bits) are allocated to each of the identified bands. Secondly, the remaining bits are allocated to the other bands based on perceptual importance. When any of the four bands for SWB and FB signals is assigned zero bits, that band is re-identified as a PFSC encoding band and the bit allocations are re-calculated.
Figure 67: A flowchart of bit allocation processing for LR-HQ Normal mode.
5.3.4.1.4.1.4.2 The adjustment of quantized energy envelope prior to bit allocation
For the Non-Transient mode in the NB (bitrate less than or equal to 13.2 kbps) and WB cases, in order to make the inter-frame reconstruction more continuous and to allocate more bits to perceptually important bands, the quantized energy envelopes of the highest bands are adjusted prior to bit allocation. Some of the quantized energy envelopes of the high frequency bands and low frequency bands are adjusted. The bit allocation to bands is then performed according to the adjusted energy envelopes. Finally, the coefficients of the bands with allocated bits are quantized and written to the bit-stream.
To adjust the quantized energy envelope of high frequency bands at NB, the following steps are performed:
- Two bits are encoded to indicate whether the highest two bands of the previous frame is encoded. Initialize a band boundary to 6 and initialize adjustment factors , respectively:
(1043)
- For each band, calculate the magnitude envelope:
(1044)
where denotes the bandwidth of each band, and is computed as follows:
(1045)
where denotes the quantized energy envelope of each band.
- Then, calculate the sum of the differences between the consecutive two magnitude envelopes and the sum of the magnitude envelopes for high frequency bands:
(1046)
(1047)
- Search the peak of magnitude envelopes and calculate the sum of the magnitude envelopes for low frequency bands:
(1048)
(1049)
- Initialize an adjustment factor for each band to 1, and then adjust the quantized energy envelopes of high frequency bands as follows:
- if is satisfied at the bit-rates below 13.2kbps or
is satisfied at 13.2kbps,
the quantized energy envelopes of the last bands are adjusted as follows:
(1050)
- Otherwise, the adjustment factors of the last 2 bands are calculated:
(1051)
Then, the adjustment factors of the last 2 bands are updated further according to :
(1052)
Finally, the quantized energy envelopes of the last 2 bands are adjusted: ,
where is the tonality flag which is calculated in subclause 5.3.4.1.4.1.3, i.e. the classification mode of the band, and is obtained from the previous frame and indicates whether bits were allocated to the highest two bands of the previous frame. If is equal to 1, the band of the previous frame was allocated bits; otherwise, if is equal to 0, the band of the previous frame was not allocated bits. After bit allocation, the flag of the current frame is preserved for the next frame.
To adjust the quantized energy envelopes of low frequency bands at NB, the following steps are performed:
- Initialize to 3, select low frequency bands from bands. For each band, calculate the magnitude envelope:
(1053)
- Then, calculate the sum of the magnitude envelopes for high frequency bands:
(1054)
The variable is assigned a different value for each bit rate: 13, 14, 15 and 17 for 7.2 kbps, 8 kbps, 9.6 kbps and 13.2 kbps, respectively.
- Search the peak of magnitude envelopes and calculate the sum of the magnitude envelopes for low frequency bands:
(1055)
(1056)
- Initialize a flag to 0. This flag indicates whether the second stage bit allocation algorithm is used in the TCQ module. When the flag is equal to 0, the second stage bit allocation algorithm is used. When the flag is equal to 1, the second stage bit allocation algorithm is not used. The flag is utilized in subclause 5.3.4.1.4.1.5.1.2.
- Determine whether to modify the quantized energy envelopes of the six low frequency bands according to their energy characteristics and spectral characteristics. The energy characteristics denote the ratio between the energy of the six low frequency bands and the energy of the other bands, which is determined by and ; the spectral characteristics denote the degree of spectrum fluctuation, which is determined by.
- If the following conditions are satisfied,
, the flag is set to 1, and the quantized energy envelopes of the three low frequency bands are adjusted as follows:
(1057)
To adjust the quantized energy envelopes of high frequency bands at WB, the following steps are performed:
- Encode two bits to indicate whether each of the highest two bands of the previous frame was encoded. Define two band boundaries. If the bit rate is 13.2kbps, set them to 8 and 15, respectively; otherwise, set them to 8 and 16, respectively. The bandwidths of low frequency bands and high frequency bands are obtained as follows:
(1058)
- For each band, calculate the magnitude envelope:
(1059)
- Then, calculate the sum of the differences between the consecutive 2 magnitude envelopes and the sum of the magnitude envelopes for part of the high frequency bands:
(1060)
(1061)
- The energies of low frequency bands and high frequency bands are computed as follows:
(1062)
- Obtain adjustment factors for the highest bands according to the tonality flag and the energies of low frequency bands and high frequency bands:
(1063)
and for the highest bands, update the adjustment factors according to :
(1064)
and then adjust the quantized energy envelopes of the highest bands .
To adjust the quantized energy envelopes of low frequency bands at WB, the following steps are performed:
- Initialize a low frequency band boundary to 6. For each band, calculate the magnitude envelope:
(1065)
- Then, calculate the sum of the magnitude envelopes for high frequency bands:
(1066)
The variable is assigned a different value for each bit rate: 18 and 20 for 13.2 kbps and 16.4 kbps, respectively.
- Search the peak of magnitude envelopes and calculate the sum of the magnitude envelopes for low frequency bands:
(1067)
(1068)
- Determine whether to modify the quantized energy envelopes of the six low frequency bands according to their energy characteristics and spectral characteristics. The energy characteristics denote the ratio between the energy of the six low frequency bands and the energy of the other bands, which is determined by and ; the spectral characteristics denote the degree of spectrum fluctuation, which is determined by .
If the conditions or are satisfied, the flag, and the quantized energy envelopes of the six low frequency bands are adjusted as follows:
(1069)
Finally, the adjusted quantized energy envelopes and the initial quantized energy envelopes are used in the first bit allocation module.
5.3.4.1.4.1.4.3 Bit allocation for PFSC
Based on the tonality flags, whether TCQ is used for encoding the band in the high frequency region is determined. When the tonality flag is set to “0”, such band is excluded from target bands of the TCQ, i.e. no bit is assigned to the band for TCQ.
In the Normal Mode, the four bands in the high-frequency region for SWB/FB and one band for WB are assumed to be quantized by TCQ in default operation. However, when the tonality flag is set to “0”, such band is quantized with the PFSC scheme using a procedure similar to the “sub-band search” (called “band search” in this specification) in [25], whereas for WB cases such band is filled with noise. This means the peaky/tonal bands are encoded by the TCQ while the PFSC scheme is used for other bands.
In Figure 66, whether the PFSC is used for quantizing the high-frequency bands is indicated by ‘Quantizing mode’.
In the PFSC scheme, the band search is based on a pitch filter based prediction (, where T is a pitch coefficient, and low-frequency spectrum is used as its filter state (filter memory)), and the pitch coefficient (i.e. lag information as a filter parameter) is encoded. The lag information is encoded with 2 bits or 1 bit. Lower two bands among the four bands are encoded with 2 bits, while higher two bands among the four are encoded with 1 bit. When the PFSC is selected for encoding some of the four bands, necessary bits for encoding those bands are reserved and assigned for the bands before starting the bit-allocation process for TCQ encoding bands.
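The reservation of lag bits described above can be sketched as follows. This is a minimal illustration for the SWB/FB case only; the function name and the list layout (lowest of the four high bands first) are assumptions made for this example and are not taken from the specification.

```python
def pfsc_reserved_bits(tonality_flags):
    """Return the PFSC lag bits reserved for each of the four highest bands.

    tonality_flags: four flags, lowest band first. A flag of 0 marks a band
    that is excluded from TCQ and therefore coded with PFSC.
    """
    # Lower two bands use 2 bits for the lag, higher two bands use 1 bit.
    lag_bits = (2, 2, 1, 1)
    return [bits if flag == 0 else 0
            for flag, bits in zip(tonality_flags, lag_bits)]
```

The sum of the returned list is the amount to set aside before the TCQ bit allocation starts.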
5.3.4.1.4.1.4.4 Bit allocation for TCQ
5.3.4.1.4.1.4.4.1 Allocating bits for fine gain adjustment
In the TCQ based MDCT coefficients quantization, fine gain adjustment is applied to several bands whose energies are larger than the others. The number of those bands is configured based on encoding bit-rate and signal bandwidth as shown in table 119.
Table 119: Bits reserved for fine gain adjustments
| | 7.2 kbit/s | 8.0 kbit/s | 13.2 kbit/s | 16.4 kbit/s |
| NB | 2 bands | 2 bands | 4 bands | 2 bands |
| WB | – | – | 6 bands | 6 bands |
| SWB | – | – | 4 bands | 4 bands |
One-bit scalar quantization is used for the fine gain adjustment. Therefore the number of reserved bits for NB, WB, SWB, and FB is as shown in table 119. The bits for the TCQ are obtained by subtracting the fine gain bits, and expressed by, where is the available bit budget for spectrum coding, ebits is the number of bits consumed for quantizing the band energies which is obtained from subclause 5.3.4.1.3, and R is the total bits. It should be noted that the mode bits for switching Transient/non-Transient and Normal/Harmonic are included in the available bit budget. Therefore the necessary bits (1 or 2) are subtracted from the available bit budget in the following subclauses, i.e. (NB and WB cases) or (SWB and FB cases).
5.3.4.1.4.1.4.4.2 Limited-band mode
For the SWB case, bandwidth, , is more than 50 bins for the last four bands (i.e. b=18 to 21 in 13.2 kbps and b=20 to 23 in 16.4 kbps). At low bit-rates, such wide bandwidth would result in insufficient quantization performance. Therefore, a limited-band mode is introduced for achieving efficient encoding. In the limited-band mode, only the vicinity of a perceptually important spectrum in each band is targeted to be quantized, i.e., outside the vicinity in each band is not targeted to be quantized. This is realized by the following principle. A bandwidth is adaptively shortened if the maximum amplitude spectrum frequency falls into a range of frequencies around the maximum amplitude spectrum frequency in the previous frame. The difference between the maximum amplitude spectrum frequencies for the current and previous frame is calculated. When it is smaller than a threshold, the limited-band mode is used for such band. The frequency position of the maximum amplitude spectrum is searched for each of the four bands. The position for the previous frame is stored in a memory, which was searched using the quantized MDCT spectrum (i.e. decoded MDCT spectrum) in the previous frame. The position for the current frame is searched using the MDCT spectrum calculated from the input signal in the current frame. When the limited-band mode is selected, the targeted band is limited to the vicinity of the maximum amplitude spectrum frequency of the previous frame. The range of the vicinity is 15 or 31 spectrum bins depending on coding bit-rate and band index. One bit is used for indicating whether the limited band mode is used when tonality flag is set to “1”.
The limited-band mode can be used only in the case where the corresponding previous band was quantized by TCQ.
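The limited-band decision can be sketched as below. This is an illustration under stated assumptions, not the normative procedure: the threshold value is not given in this subclause and is passed in as a hypothetical parameter, the vicinity is assumed to be centred on the previous peak position and clamped to the band edges, and the tonality flag check is assumed to be done by the caller.

```python
def limited_band(band, prev_peak, prev_was_tcq, threshold, width):
    """Return (start, end) of the shortened band, or None for the full band.

    band:         MDCT coefficients of the band in the current frame
    prev_peak:    peak position found in the previous decoded spectrum
    prev_was_tcq: True if this band was quantized by TCQ in the previous frame
    threshold:    hypothetical limit on the peak position difference
    width:        vicinity size in bins (15 or 31 in the text above)
    """
    # Position of the maximum amplitude coefficient in the current frame.
    cur_peak = max(range(len(band)), key=lambda k: abs(band[k]))
    if not prev_was_tcq or abs(cur_peak - prev_peak) >= threshold:
        return None
    # Assumed layout: centre the vicinity on the previous peak, clamp to band.
    half = width // 2
    start = min(max(prev_peak - half, 0), max(len(band) - width, 0))
    return start, min(start + width, len(band))
```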
5.3.4.1.4.1.4.4.3 Final bit allocation for TCQ and PFSC
Band widths and quantized band energies are used for bit allocation. The band widths are basically configured by Table 108. When the limited-band mode is selected, corresponding band widths are shortened. Bit allocation for TCQ is as follows. Firstly, bits are distributed according to the quantized band energy. Secondly, if there is a band whose assigned bits are less than a minimum number of bits, no bit is assigned to the band and the assigned bits will be re-allocated to the other bands. Thirdly, if there is a noise-like band whose assigned bits are not sufficient in comparison with its bandwidth, no bit is assigned to the band and the assigned bits will be re-allocated to the other bands. Furthermore, assigned bits are compared with predefined band-based threshold, and no bit is re-assigned if the assigned bits are less than the threshold for the band.
If no bits are assigned to one of the highest four bands of an SWB or FB signal, PFSC is used for quantizing that band. This can happen when the tonality flag is set to “1” but the assigned bits result in zero. In this case, the bits necessary for the PFSC encoding of the band are taken from the bit budget for TCQ. The TCQ bit budget is therefore updated by subtracting the necessary bits, and the bit allocation is re-calculated.
5.3.4.1.4.1.5 Fine structure encoding
5.3.4.1.4.1.5.1 Trellis Coding Quantization (TCQ)
5.3.4.1.4.1.5.1.1 Joint USQ and TCQ
Trellis Coded Quantization (TCQ) quantizes the fine structure of the normalized spectrum by selecting the Important Spectral Components (ISCs). The information for the selected ISCs in each band is coded as the position, number, sign and magnitude of the ISCs. The magnitude information is quantized by the joint Uniform Scalar Quantization (USQ) and TCQ with arithmetic coding, while the information on position, number and sign is coded by arithmetic coding. A block diagram of the fine structure encoding using TCQ is depicted in figure 68.
Figure 68: Block diagram of fine structure encoding using TCQ
The encoding method is selected at the Selecting Encoding Method block by the bit allocation and the information for each band. If the number of bits allocated to a band is zero, all the samples in that band are coded as zero by the zero encoding block. Otherwise, each band is quantized by the selected quantizer.
The quantizer selection information selects the most efficient quantizer between the USQ and TCQ by considering the input signal characteristics, i.e. the bit allocation and the length of each band. If the average number of bits for each sample in a band is greater than or equal to 0.75, the band is of high importance and USQ is used; for all other bands TCQ is used. The quantizer selection flag, USQ_TCQ[i], is set to 1 when USQ is used.
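The selection rule can be illustrated with a short sketch. The function name and the string labels are chosen for this example only; the 0.75 bits per sample threshold and the zero-bit case are taken from the text above.

```python
def select_quantizer(bits, band_length):
    """Pick the coding method for one band from its bit allocation."""
    if bits == 0:
        return "ZERO"  # all samples in the band are coded as zero
    # Average number of bits per sample decides between USQ and TCQ.
    return "USQ" if bits / band_length >= 0.75 else "TCQ"
```

For a band of 16 coefficients, 12 bits give exactly 0.75 bits per sample and select USQ, while 11 bits select TCQ.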
The Scaling Bands block performs scaling at each band to control the bit rate, using the bit allocations and the normalized spectrum for each band. The scaling is done by considering the average bit allocation for each spectral component in the band. If the average bit allocation is bigger than the number specified by the bit allocation, then more scaling is done.
The detailed scaling process is as follows. First the estimation of the number of pulses for the current band is obtained using the length and the bit allocation information for each band. Then the number of nonzero positions is obtained by the following equation, which is based on the probabilities.
(1070)
where b is the number of bits and calculated as:
(1071)
and the number of required bits for the positions is estimated as:
(1072)
where n is the band length, m is the number of pulses, i is number of non-zero positions which have an ISC, and b is the number of bits required for the given size of band and number of pulses. Finally the number of pulses is selected by the b value which is the closest to the value of the allocated bits for the band.
The initial scaling factor is decided by the estimation of the number of pulses and the absolute value of the input signal. The input signal for each band is scaled by this factor. If the estimated number of pulses is not the same as the summation of the number of pulses for the quantized signal after scaling, a pulse redistribution process will be performed using the updated scaling coefficient. The redistribution process is as follows: if the number of selected pulses is smaller than the number of estimated pulses, the scaling coefficient will be decreased, otherwise the scaling coefficient will be increased.
The distortion function for the TCQ is the sum of the squared distances between the quantized and un-quantized values in each band. It is similar to the Euclidean distance but avoids the square root, since only the relative magnitudes of the values are needed rather than the exact distance.
(1073)
where is actual value and is quantized value.
For the USQ module the Euclidean distance is used to determine the best quantized values. To minimize the computational complexity, the modified equation including the scaling factor is used, which is done by calculating the d1 as:
(1074)
If the number of pulses per band does not match the required value, it is necessary to add or delete some pulses while preserving the minimal metric. This procedure is done in an iterative manner by adding or deleting a single pulse, and then repeating until the number of pulses reaches the required value.
To add or delete one pulse it is necessary to calculate n distortion values in order to select the best one. For example, the distortion value j corresponds to adding a pulse at the j-th position in a band:
(1075)
To avoid the full calculation of this formula n times, the following derivation can be used:
(1076)
where values can be calculated just once, n is the band length (number of coefficients in a band), p is the input signal of the quantizer, q is the quantized signal, and g is the scaling factor. Finally the position j which minimizes the distortion d is selected and qj is updated.
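The saving obtained by not recomputing the full distortion n times can be illustrated as follows. The sketch assumes a distortion of the form d = sum over i of (p_i - g*q_i)^2, which is an assumption made for this example since the equations are not reproduced here. Under that assumption, adding one pulse at position j changes the distortion by g^2 - 2*g*(p_j - g*q_j), so only this per-position term needs to be compared. Function names are illustrative.

```python
def distortion(p, q, g):
    """Assumed distortion: squared error between input and scaled pulses."""
    return sum((pi - g * qi) ** 2 for pi, qi in zip(p, q))


def best_add_position_bruteforce(p, q, g):
    """Evaluate the full distortion for every candidate position."""
    costs = []
    for j in range(len(q)):
        trial = list(q)
        trial[j] += 1
        costs.append(distortion(p, trial, g))
    return min(range(len(q)), key=costs.__getitem__)


def best_add_position_incremental(p, q, g):
    """Compare only the change in distortion caused by the added pulse."""
    # The base distortion is common to all candidates and can be dropped.
    deltas = [g * g - 2.0 * g * (pj - g * qj) for pj, qj in zip(p, q)]
    return min(range(len(q)), key=deltas.__getitem__)
```

Both functions select the same position; the incremental form needs one multiply-add per coefficient instead of a full pass over the band for each candidate.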
To control the bit rate, an appropriate ISC has to be selected using the scaled spectral coefficients at the Selecting Important Spectral Components (ISCs) block. The spectral component to quantize is selected by using the bit allocations for each band. This selection can have various combinations which depend on the distribution and variance of the spectral component. The actual non-zero position is then calculated. The non-zero position is obtained by analyzing the amount of scaling and redistribution operations. The non-zero position will be the ISC.
If the number of pulses is not controlled by the scaling and the redistribution operations, the selected pulse will be quantized by TCQ and the surplus adjusted with the results of the quantization. Hence, if the number of non-zero position is greater than 1, and not equal to the estimated number of the pulse, and the quantizer selection information indicates the TCQ, the surplus redistribution process will be used. If this condition is satisfied, the surplus will be redistributed by the TCQ quantization operation in advance. If the real number of the pulses from the TCQ quantization is smaller than the estimated number of pulses which was derived for each band in advance, the scale factor will be multiplied by 1.1. However, if the number of pulses from the TCQ quantization is bigger than the estimated number of pulses, the scale factor will be multiplied by 0.9.
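The scale factor update at the end of the surplus redistribution can be written compactly. This sketch covers only the final multiplication by 1.1 or 0.9 stated above; the function name is illustrative, and leaving the factor unchanged when the counts already match is an assumption.

```python
def update_scale_factor(scale, tcq_pulses, estimated_pulses):
    """Nudge the scale factor toward the estimated pulse count."""
    if tcq_pulses < estimated_pulses:
        return scale * 1.1  # too few pulses after TCQ
    if tcq_pulses > estimated_pulses:
        return scale * 0.9  # too many pulses after TCQ
    return scale  # assumption: no change when the counts agree
```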
The finally selected non-zero position is called the ISC, and its information is encoded at the Encoding Position Info block. The information consists of the number of selected ISCs and the non-zero positions. In order to enhance the efficiency of the information encoding, arithmetic coding is used.
Given a stream of symbols and their probabilities, the arithmetic coder produces a space efficient bit-stream to represent these symbols and, given the bit stream and the probabilities, the arithmetic decoder reverses the process.
In the arithmetic coding algorithm two 16-bit integers are taken as the numerators of fractions called “low” and “high”. These fractions have a common denominator equal to, such that all the fractions fall in the range . These integers define a range so that a single number stores all the symbols of the message. This number is saved bit by bit to the bit stream during the arithmetic coding process. To avoid precision loss, renormalization is performed at each step of the coding process when the difference between “low” and “high” is less than 0.5.
In the decoder, 16 bits will first be read and stored in the arithmetic decoder accumulator. The decoder can then replicate the coding process, but instead of encoding the values, it uses the bit-stream to produce symbols. These symbols will be reconstructed correctly, if their probabilities are the same in both the encoder and decoder.
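The principle of a 16-bit integer arithmetic coder can be sketched as below. This is a generic textbook-style coder with a static binary model, given only to illustrate the "low"/"high" registers, renormalization, and the 16-bit decoder accumulator; it is not the coder of this specification, and all names and the probability model (freq0 out of total for symbol 0) are assumptions of the example.

```python
CODE_BITS = 16
TOP = (1 << CODE_BITS) - 1
HALF = 1 << (CODE_BITS - 1)
QUARTER = HALF >> 1
THREE_QUARTERS = HALF + QUARTER


def _narrow(low, high, symbol, freq0, total):
    """Shrink [low, high] to the sub-interval of the coded symbol."""
    span = high - low + 1
    lo_c, hi_c = (0, freq0) if symbol == 0 else (freq0, total)
    return low + span * lo_c // total, low + span * hi_c // total - 1


def arith_encode(symbols, freq0, total):
    low, high, pending, out = 0, TOP, 0, []

    def emit(bit):
        nonlocal pending
        out.append(bit)
        out.extend([1 - bit] * pending)  # resolve deferred middle bits
        pending = 0

    for s in symbols:
        low, high = _narrow(low, high, s, freq0, total)
        while True:  # renormalize while the interval is too narrow
            if high < HALF:
                emit(0)
            elif low >= HALF:
                emit(1)
                low -= HALF
                high -= HALF
            elif low >= QUARTER and high < THREE_QUARTERS:
                pending += 1
                low -= QUARTER
                high -= QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
    pending += 1
    emit(0 if low < QUARTER else 1)  # flush enough bits to pin the interval
    return out


def arith_decode(bits, count, freq0, total):
    stream = iter(bits)
    value = 0
    for _ in range(CODE_BITS):  # fill the 16-bit accumulator
        value = 2 * value + next(stream, 0)
    low, high, symbols = 0, TOP, []
    for _ in range(count):
        span = high - low + 1
        cum = ((value - low + 1) * total - 1) // span
        s = 0 if cum < freq0 else 1
        symbols.append(s)
        low, high = _narrow(low, high, s, freq0, total)
        while True:  # mirror the encoder's renormalization
            if high < HALF:
                pass
            elif low >= HALF:
                low, high, value = low - HALF, high - HALF, value - HALF
            elif low >= QUARTER and high < THREE_QUARTERS:
                low, high, value = low - QUARTER, high - QUARTER, value - QUARTER
            else:
                break
            low, high = 2 * low, 2 * high + 1
            value = 2 * value + next(stream, 0)
    return symbols
```

The decoder reproduces the symbols only because it uses the same freq0 and total as the encoder, which mirrors the requirement above that the probabilities be identical on both sides.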
In the Gathering ISCs block, the new buffer is constructed by the selected ISCs, as shown in figure 69. The zero-band and the position which is not selected are both excluded from this buffer.
In the Joint USQ and TCQ Coding block, the magnitudes of the gathered ISCs are quantized by the joint USQ and TCQ, and the quantized information is additionally coded by arithmetic coding. In order to enhance the efficiency of the arithmetic coding, the non-zero positions and the number of ISCs are utilized for the arithmetic coding. The joint USQ and TCQ has two types of coding methods. One is TCQ and USQ with second bit allocation for NB and WB, and the other is the LSB TCQ for USQ for SWB and FB. These methods are described in subclause 5.3.4.1.4.1.5.1.2 (TCQ and USQ with second bit allocation) and subclause 5.3.4.1.4.1.5.1.3 (LSB TCQ for USQ).
Figure 69: Concept for the gathering ISCs
In the Encode Signs block, the sign information of the selected ISCs is coded by arithmetic coding. At the Recovering Quantized Coefficients block, the position, sign and magnitude information is combined to recover the quantized components. In this block, zero is assigned to the zero positions.
In the Inverse Scaling Bands block, the inverse scaling of the quantized components is performed. The inverse scaling factor can be derived from the scaling factor used in the Scaling Bands block. The inverse scaled signal has the same level as the normalized input signal and is the output of the TCQ quantization.
5.3.4.1.4.1.5.1.2 TCQ and USQ with second bit allocation
The following information is needed for TCQ and USQ with second bit allocation:
- Determine the total number of bits which will be allocated to the corresponding bands to be processed in current frame in subclause 5.3.4.1.4.1.4.4.1.
- In order to obtain the number of bits for each band by first bit allocation, implement the first bit allocation to the bands based on the total number of bits to be allocated in subclause 5.3.4.1.4.1.4.4.3.
- Based on the number of bits for each band by first bit allocation, the first detailed scaling process in subclause 5.3.4.1.4.1.5.1.1 is implemented on each band which is allocated bits by first bit allocation. Then, the total number of redundant bits of the current frame and the number of pulses of each band are obtained.
The general quantization and coding scheme of the TCQ and USQ with second bit allocation consists of several main blocks: quantizer decision, TCQ quantizer, USQ quantizer, lossless coder, and second bit allocation. In the quantizer decision module the quantization mode of the current band is selected by using the results of the Selecting Encoding Method block. Then the selected quantizer quantizes the current band, and the lossless encoder based on the arithmetic coding transmits the data to the bit stream. The second bit allocation block distributes surplus bits from the previously coded bands. The second bit allocation procedure detects two bands that will be encoded separately.
Figure 70: Block diagram of TCQ and USQ encoding with second bit allocation
Based on the second bit allocation parameters, two bands that will be allocated more bits are selected from the bands that were allocated bits during the first bit allocation. The second bit allocation parameters include at least one of the total number of redundant bits and the characteristics of each band. The characteristics of each band include the harmonic characteristics and the bit allocation state of each band. The tonality flags for the bands of the current frame, which are calculated in subclause 5.3.4.1.4.1.3, represent the harmonic characteristics of each band, and whether the highest two bands of the previous frame were quantized represents the bit allocation state of each band. If the tonality flag is equal to one, the signal type of the corresponding band is harmonic; otherwise, it is not harmonic. If the tonality flags are not all zero or any one of the highest two bands of the previous frame was quantized, the two bands to be quantized last are selected among the highest few bands. Otherwise, they are selected among the bands other than the highest few.
The first of these two bands is selected using the analysis of the average number of bits per bin in the band by the first bit allocation:
For each band, calculate the average number of bits per bin as follows:
(1077)
where R(b) is the number of bits allocated to each band by the first bit allocation based on the total number of bits in the subclause 5.3.4.1.4.1.4.4.3, and is the bandwidth of each band.
To determine the first band, the average numbers of bits per bin of the bands are compared. The candidate bands should satisfy the following conditions:
(1078)
where is the tonality flag, and indicates whether the highest two bands of the previous frame were quantized. If is equal to 1, the band of the previous frame was quantized; otherwise, if is equal to 0, the band of the previous frame was not quantized.
If the average number of bits per bin in band is the least one in the bands which satisfy the above conditions, the band will be determined as the first band for a second bit allocation.
When no band satisfies the above conditions or the encoded bandwidth is NB, the first band for a second bit allocation is selected among the bands which satisfy the following conditions:
(1079)
If the average number of bits per bin in band is the least one in the bands which satisfy the above conditions, the band will be determined as the first band for a second bit allocation.
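The selection of the first band can be sketched as a minimum search over the average number of bits per bin. Because the eligibility conditions are not reproduced here, the sketch takes them as a precomputed boolean list; that list, the function name and the return value of None for an empty candidate set are assumptions of this example.

```python
def first_band_for_second_allocation(bits, widths, eligible):
    """Return the index of the eligible band with the fewest bits per bin.

    bits:     bits assigned to each band by the first bit allocation, R(b)
    widths:   bandwidth of each band in bins
    eligible: True for bands that satisfy the selection conditions
    """
    candidates = [b for b in range(len(bits)) if eligible[b]]
    if not candidates:
        return None  # caller falls back to the alternative conditions
    return min(candidates, key=lambda b: bits[b] / widths[b])
```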
For the first band for a second bit allocation, if or, then the band or the band will be selected as the second band for a second bit allocation. Otherwise, for the band and the band, if the average number of bits per bin of the band is less than the band, the band will be selected as the second band for a second bit allocation, if not, the band will be selected as the second band for a second bit allocation.
Thus implement the second bit allocation on the two selected bands to be allocated bits once again. When the total number of redundant bits is replenished, the redundant bits are allocated to the two bands selected above, with the first one being allocated the majority, or all, of the surplus. The number of bits allocated to the two bands is obtained respectively and the proportion of the surplus allocated to each band is shown in the following:
When the encoded bandwidth is NB:
(1080)
When the encoded bandwidth is WB:
(1081)
where denotes the bit surplus, and denotes the number of bits allocated to the first and the second band for a second bit allocation respectively.
Finally, based on the number of bits allocated to the two selected bands by the first bit allocation and by the second bit allocation, the second detailed scaling process is implemented on each of the two selected bands in the same way as the first detailed scaling process in subclause 5.3.4.1.4.1.5.1.1. The number of pulses for each of the two selected bands is thus obtained again.
In the TCQ quantization module a trellis is used with 8 states, 4-coset (subsets) with 2 zero levels. The quantization indexes are derived from the TCQ codebook, which consists of the branch information of a trellis state (path information) and the information for quantization level allocated to the selected coset (subset). The quantization indexes are always positive integers and are coded by arithmetic coding. The detailed magnitude encoding is as follows.
Figure 71: Trellis structure with 8 states 4-coset with 2 zero levels
Each quantized band starts from state (0), because in other cases it is required to transmit two additional bits per band. Then, by using the Viterbi algorithm, the optimal path through trellis is selected, in order to have best possible SNR when comparing the original and quantized sequences. To calculate the SNR, it is necessary to have quantization scale before starting TCQ quantization, so the band is initially quantized by USQ and the optimal scale is calculated. After the lossless encoding the bit surplus is accumulated.
The accumulated surplus is used as an additional resource for the encoding of the last two bands determined by the second bit allocation procedure. Depending on the coding mode and surplus value, the surplus is shared between these two bands in different proportions. Then the encoding of these two bands is performed in the same way as described above.
The magnitude coding based on binary arithmetic coding is carried out as follows. First the probability of a symbol is calculated as
(1082)
(1083)
where is the number of magnitudes left to transmit in the band, is the number of pulses left to transmit in the band, is the current coded pulse in magnitude and is the set of existing magnitudes at trellis state .
Each magnitude pulse is encoded by using probabilities and , where corresponds to last pulse in magnitude and to all other pulses.
The magnitude pulse probabilities are modified after this calculation with respect to the trellis code limitation. This information is determined by the trellis structure and is available to both the encoder and decoder. Thus any magnitude values that are impossible are encoded using zero probability and hence do not require any bits.
The encoding algorithm was modified to save complexity for bands with a large number of pulses. The idea is to introduce a non-binary arithmetic coder, where the probabilities of a coded symbol are calculated as the mutual probability of the binary symbols for the current magnitude. To avoid very low probabilities and hence loss of precision, escape symbols are utilized. After an escape symbol is transmitted, the remaining magnitude pulses are coded starting again from the initial probability condition.
The location coding is carried out based on the same algorithm as used for the magnitude coding and uses the same complexity reduction technique. The signs and LSB path vector are transmitted as is, because the distributions are random.
5.3.4.1.4.1.5.1.3 LSB TCQ for USQ
The aim of the LSB TCQ for USQ is to use the advantages of both quantizers (USQ and TCQ) in one scheme and exclude the path limitation from the TCQ. Conceptually the LSB coding of the quantized data is shown in figure 72.
Figure 72: Concept of LSB coding
Each quantized value that is greater than one contains an LSB which can be zero or one. The sequence of LSBs can then be quantized by TCQ to find the best match between that sequence and the available trellis paths. In terms of the SNR criteria, it does not matter where the error occurs. Thus at the cost of some errors in the quantized sequence, the length of the sequence is decreased.
The encoding of the spectral data is done by the TCQ and USQ quantizers and lossless coding based on the binary arithmetic coder. Before processing the norms, the extraction and spectrum normalization are done. The combined quantizer scheme is presented in figure 73.
Figure 73: Block diagram for LSB TCQ for USQ encoding
The spectral data in each band are quantized by the USQ quantizer with the number of bits R[] determined by the bit allocation module. In order to fit the bit requirement, the number of bits which will be used for TCQ data is extracted from each non-zero band evenly, and then the bands are quantized by the USQ. The quantization procedure is the same as described above for the TCQ and USQ algorithm. For all bands that have non-zero data after quantization, the difference between the quantized and un-quantized data is collected and is called the residual. If some frequencies are quantized as zero in a non-zero band, they are not included in the residual.
Figure 74: Concept for constructing the residual array
The residual array is quantized by TCQ using the rate-½ code known as the (7,5)₈ code:
Figure 75: Trellis structure for the 4 states TCQ
Quantization using TCQ is performed for the first magnitudes. After quantization the path metrics are checked and the best one selected. For lossless coding the data for the best trellis path is stored in separate array while the trace back procedure is performed. The constant was defined as 10, which allows up to 20 magnitudes per frame to be encoded.
The trellis path data is a binary sequence which is encoded by the arithmetic encoder as equi-probable symbols, i.e. with a uniform probability model.
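The 4-state trellis of the rate-½ (7,5) octal code can be tabulated as below. The sketch uses the standard convolutional code definition with generators 111 and 101; how the two output bits of each branch map onto the quantization subsets of figure 75 is not reproduced here, and the state numbering (most recent input bit as the high bit) is an assumption of this example.

```python
def build_trellis_75():
    """Map (state, input_bit) to (next_state, output_bits) for the (7,5) code."""
    trellis = {}
    for state in range(4):
        s1, s2 = state >> 1, state & 1  # s1 is the most recent input bit
        for u in (0, 1):
            out_hi = u ^ s1 ^ s2  # generator 7 octal = 111
            out_lo = u ^ s2       # generator 5 octal = 101
            next_state = (u << 1) | s1
            trellis[(state, u)] = (next_state, (out_hi << 1) | out_lo)
    return trellis
```

Each state has two outgoing and two incoming branches, so one path bit per coded magnitude is enough to describe the chosen path.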
The quantized spectral data produced by the USQ are encoded by the same method as described in subclause 5.3.4.1.4.1.5.1.2.
The quantized MDCT coefficients are de-normalized using the quantized band energies.
Finally, as described in subclause 5.3.4.1.4.1.4.4.1, quantization of fine gain adjustment is performed on the dominant bands. The inner product between target MDCT coefficients and the de-normalized quantized MDCT coefficients is calculated, and fine gain adjustment factor is calculated by dividing the inner product by the energy of the de-normalized quantized MDCT coefficients. The fine gain adjustment factor is quantized by a 1-bit scalar quantizer.
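The fine gain adjustment factor for one band can be computed as in the sketch below. Only the ratio described above is shown; the levels of the 1-bit scalar quantizer are not given in this subclause and are left out. The function name and the return value of 1.0 for an all-zero quantized band are assumptions of this example.

```python
def fine_gain_factor(target, quantized):
    """Least-squares gain that best maps the quantized band onto the target."""
    energy = sum(q * q for q in quantized)
    if energy == 0.0:
        return 1.0  # assumption: no adjustment for an all-zero band
    inner = sum(t * q for t, q in zip(target, quantized))
    return inner / energy
```

A factor above 1 means the de-normalized quantized coefficients are too small compared with the target, and a factor below 1 means they are too large.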
5.3.4.1.4.1.5.2 Noise-filling for 0-bit assigned sub-bands
The MDCT coefficients of each sub-band are decoded from the received bit stream. In order to reconstruct the un-decoded MDCT coefficients, the sub-bands are classified into bit allocation saturated sub-bands and bit allocation un-saturated sub-bands according to the average number of bits allocated to a coefficient in a sub-band. The noise filling module is applied to the un-saturated sub-bands. First, the noise gain for noise filling of each sub-band is calculated. Then, the appropriate noise is filled into the un-decoded MDCT coefficients of each sub-band, so that the un-decoded MDCT coefficients are reconstructed. Finally, the whole frequency domain signal is obtained based on the decoded MDCT coefficients and the reconstructed MDCT coefficients.
5.3.4.1.4.1.5.2.1 Parameters calculation for noise gain
Initialize a critical number of the sub-band to. If the encoded bandwidth is SWB and the encoding mode is harmonic or normal, is updated to 19 at 16.4 kbps and is updated to 17 at 13.2 kbps.
Calculate the average number of allocated bits for each coefficient in each sub-band:
(1084)
where denotes the number of bits allocated to each sub-band, and denotes the bandwidth of each sub-band.
Calculate the average envelope of each sub-band:
(1085)
where denotes the quantized energy of each sub-band.
The maximum magnitude of the decoded coefficients in each sub-band and the number of the non-zero decoded coefficients in each sub-band are determined by the decoded coefficients.
Suppose denotes the number of the encoded pulses of each sub-band and means the average number of bit allocation to a coefficient in a sub-band is larger than 0. And the energy difference is obtained as follows:
(1086)
If , compute the initial noise gain of each sub-band by the energy difference and sub-band bandwidth :
(1087)
Otherwise, .
And, if , compute the parameter of harmonic character . The parameter of harmonic character is used to calculate the noise gain, and noise is then filled into the un-decoded coefficients of the bit allocation un-saturated sub-bands.
5.3.4.1.4.1.5.2.2 Noise gain calculation
If the average number of bit allocation to a coefficient in a sub-band is not less than 0.8, the bit allocation of the sub-band is defined as saturated; otherwise, the bit allocation of the sub-band is defined as un-saturated.
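The saturation rule above can be sketched as follows, taking the average number of bits per coefficient as the allocated bits of a sub-band divided by its bandwidth, as described in subclause 5.3.4.1.4.1.5.2.1. The function and variable names are illustrative only.

```python
SATURATION_THRESHOLD = 0.8  # bits per coefficient, from the text above


def classify_subbands(bits, widths):
    """Label each sub-band as saturated or un-saturated."""
    labels = []
    for r, bw in zip(bits, widths):
        avg_bits = r / bw  # average number of bits per coefficient
        if avg_bits >= SATURATION_THRESHOLD:
            labels.append("saturated")
        else:
            labels.append("un-saturated")
    return labels


if __name__ == "__main__":
    print(classify_subbands([8, 4, 0], [8, 8, 16]))
```

Noise filling is then applied only to the sub-bands labelled un-saturated.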
When the bit allocation of sub-band is un-saturated and , based on the envelope and decoded coefficients the noise gain is calculated as follows:
(a) First estimate the overall noise factors under different conditions:
(a1) When and the encoded bandwidth is SWB:
If the encoding mode is harmonic and :
(1088)
If the encoding mode is harmonic and :
(1089)
If the encoding mode is normal and :
(1090)
If the encoding mode is normal and :
(1091)
If the encoding mode is transient:
(1092)
(a2) When and the encoded bandwidth is WB or NB:
Initialize as:
(1093)
While and, is updated as follows:
(1094)
While and the encoded bandwidth is WB, is updated as follows, where is the number of sub-bands at the highest frequencies which have a flag to indicate whether the sub-band is harmonic or not. If is equal to 1, the sub-band has harmonic character; otherwise, the sub-band does not have harmonic character.
(1095)
where and denote the energy of the high frequency bands and the low frequency bands, respectively. and denote the numbers of corresponding high frequency coefficients and low frequency coefficients, respectively.
(a3) When, is determined as follows:
If the encoded bandwidth is SWB and the encoding mode is harmonic; Otherwise, .
(b) Based on the overall noise factor, the noise gain is calculated:
(1096)
When the sub-band is a bit allocation saturated sub-band or is less than zero, the noise gain is set to zero.
(c) In order to smooth the noise gain between the current frame and the previous frame, a sub-band boundary is defined. If the encoded bandwidth is SWB, . Otherwise, . Here denotes the last sub-band with encoded pulses in current frame, denotes the last sub-band with encoded pulses in previous frame.
For the sub-band, if, then the noise gain is smoothed as follows:
When is less than:
(1097)
When is equal to:
(1098)
where denotes the average envelope of the previous frame, denotes the noise gain of the previous frame.
5.3.4.1.4.1.5.2.3 Noise filling
First, if the encoded bandwidth is not SWB and, modify the decoded coefficients:
(1099)
For sub-band, if the conditions: bit allocation is un-saturated, the encoded bandwidth is not SWB, , are all satisfied, the decoded coefficients are filled with noise as follows:
Initialize a counter to zero. Based on the smoothed noise gain, calculate a noise filling factor:
(1100)
If the decoded coefficient is equal to zero,
(1101)
where is generated by random noise, is increased by 1.
Otherwise, the noise filling of the decoded coefficients is described as follows:
(1102)
Finally, update the memories as follows:
,, (1103)
5.3.4.1.4.1.5.3 Pitch filtering spectrum coding (PFSC)
5.3.4.1.4.1.5.3.1 PFSC overview
Pitch filtering spectrum coding is applied only for the SWB and FB signals.
To encode and decode high frequency MDCT coefficients, PFSC utilizes both the decoded MDCT coefficients, which are the output of the “TCQ” and “De-norm and Fine gain SQ” blocks in Figure 66, and the noise-filled low-frequency decoded MDCT coefficients. The bands which are encoded by PFSC are determined through the bit-allocation process described in the previous subclauses. In the PFSC, the four highest bands are the high-frequency bands, and the other bands lower than the high-frequency bands are the low-frequency bands. For example, in Table 108, the low-frequency bands are b = 0 to 17, and the high-frequency bands are b = 18 to 21, for 13.2 kbps.
For each band, the most similar match with the selected similarity criteria is searched from the TCQ-quantized and envelope normalized low-frequency content (i.e., the envelope normalized version of the decoded MDCT coefficients in the low-frequency bands). The most similar match is scaled with a scaling factor calculated using the quantized band energy to obtain the synthesized high frequency content.
5.3.4.1.4.1.5.3.2 Envelope normalization
In a manner similar to G.718 Annex B, quantized low-frequency content is normalized with its envelope. However, the TCQ-quantized low-frequency content (which is the decoded low-frequency content before the noise filling) is a sparse pulse sequence, and a normalization process is therefore used to flatten the low-frequency content.
The low-frequency content is normalized by dividing it by the maximum amplitude value in each sub-band. Here, the sub-band configuration is special and used only for this normalization. Each sub-band consists of 12 MDCT coefficients. By performing this process, each sub-band will have the same maximum amplitude value, and the low-frequency content can therefore be converted to an MDCT coefficient sequence whose spectral characteristic is flat and smoothed.
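A minimal sketch of the peak normalization described above, using sub-bands of 12 MDCT coefficients. The function name is illustrative, and leaving an all-zero sub-band at zero is an assumption of this example.

```python
def normalize_by_subband_peak(coeffs, subband_len=12):
    """Divide each sub-band by its maximum absolute amplitude."""
    out = []
    for start in range(0, len(coeffs), subband_len):
        block = coeffs[start:start + subband_len]
        peak = max(abs(c) for c in block)
        # Assumption: an all-zero sub-band is left unchanged.
        out.extend(c / peak if peak > 0.0 else 0.0 for c in block)
    return out


if __name__ == "__main__":
    x = [0.0, 3.0, 0.0, -6.0] + [0.0] * 8 + [2.0] + [0.0] * 11
    print(normalize_by_subband_peak(x))
```

After this step every non-empty sub-band has a maximum amplitude of 1, which is what flattens the sparse pulse sequence.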
The envelope normalization (or smoothing) process is performed separately on the TCQ-quantized low-frequency content and on the filled low-frequency noise content. The amplitude of the normalized noise content is adaptively scaled according to the sparseness of the TCQ-quantized low-frequency content. The sparseness is calculated by dividing the number of non-zero spectral coefficients in the TCQ-quantized low-frequency content by the bandwidth of the low-frequency content. A threshold is calculated using the sparseness and is used for removing low-amplitude TCQ-quantized low-frequency content and for scaling the maximum amplitude of the noise content. The normalized TCQ-quantized low-frequency content is further modified so that its non-zero content has a larger amplitude than the maximum amplitude of the normalized noise content; the dynamic range of the normalized TCQ-quantized low-frequency content is thereby modified for better matching with a targeted high-frequency spectrum. Finally, the scaled low-frequency noise content and the modified TCQ-quantized low-frequency content are added to generate the envelope normalized spectrum (MDCT coefficients). If a component of the generated spectrum becomes zero, that component is replaced with randomly generated noise whose maximum amplitude is limited to half of the maximum amplitude of the scaled noise component.
A block diagram of this envelope normalization is shown in Figure 98 of subclause 6.2.3.1.3.1.4.3.3. In Figure 98, all processing blocks except ‘Lag info. decoding’ are common to both the encoder and the decoder.
5.3.4.1.4.1.5.3.3 Band search
5.3.4.1.4.1.5.3.3.1 Selection of representative MDCT coefficients
Similarly to G.718 Annex B, a band search approach is used. The last four bands (i.e. b=18 to 21 at 13.2 kbps and b=20 to 23 at 16.4 kbps) are encoded with PFSC. As shown in table 108, the widths of these bands are 55, 68, 84, and 105 for 13.2 kbps, and 59, 74, 92, and 115 for 16.4 kbps.
To reduce the computational load of the correlation calculation, only a limited number of input (target) MDCT coefficients are selected as representative MDCT coefficients and used for the correlation calculation. The selection of the MDCT coefficients is performed by an amplitude threshold process, i.e. an MDCT coefficient is selected if its absolute value is greater than a threshold. The threshold is determined using the average and standard deviation of the absolute values of the MDCT coefficients in the high-frequency band concerned.
(1104)
Here is the initial threshold for the i-th high-frequency band, is the average of the absolute values of the MDCT coefficients in the i-th high-frequency band, is the standard deviation of the absolute values of the MDCT coefficients in the i-th high-frequency band, and is a factor for controlling the selected number of the MDCT coefficients. is chosen so that a calculated threshold becomes higher than a threshold which is expected to be appropriate for selecting the limited number of the MDCT coefficients.
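The initial selection can be sketched as below. The exact form of equation (1104) is not reproduced in this text, so the sketch assumes the threshold is the average plus a factor (called `beta` here) times the standard deviation of the absolute values; that form, the use of the population standard deviation, and all names are assumptions of this example.

```python
import statistics


def select_representatives(band, beta):
    """Return the positions of coefficients whose magnitude exceeds
    the assumed threshold mean + beta * standard deviation."""
    mags = [abs(x) for x in band]
    # Assumed threshold form; see the lead-in above.
    threshold = statistics.mean(mags) + beta * statistics.pstdev(mags)
    return [k for k, m in enumerate(mags) if m > threshold]


if __name__ == "__main__":
    band = [0.1, -0.2, 5.0, 0.1, -4.0, 0.2]
    print(select_representatives(band, beta=0.5))
```

Only the returned frequency positions need to be stored for the subsequent band search.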
If the number of selected MDCT coefficients is less than a pre-determined number, the threshold is updated by the following equation, and an additional selection process is performed.
(1105)
Here, is the weakest attenuation factor and is the strongest attenuation factor, and . is the pre-determined number of the MDCT coefficients to be selected in the end, and is the remaining number of MDCT coefficients to be selected. is the updated threshold. By using this equation, the threshold is calculated according to the number of non-selected MDCT coefficients, i.e. the larger the number of non-selected MDCT coefficients is, the lower the threshold becomes. The above equation is equivalent to the following equation, where is the number of already selected MDCT coefficients.
(1106)
This threshold update is performed twice using a different set of and , unless the number of selected MDCT coefficients has already reached the pre-determined number.
The selected MDCT coefficients (target MDCT coefficients for the band search) are stored in a memory as their frequency positions and used for the band search process.
5.3.4.1.4.1.5.3.3.2 Matching process
Once the representative MDCT coefficients are selected from the j-th band input MDCT coefficients , a matching process is performed by calculating the correlation between the representative MDCT coefficients and normalized low-frequency MDCT coefficients derived from the envelope normalized MDCT coefficients calculated in subclause 5.3.4.1.4.1.5.3.2. Since the correlation is calculated using only the selected MDCT coefficients, computational complexity is saved.
The task of the matching process is to find k’ which maximizes S(k’).
(1107)
where , and denote the following.
: correlation between representative MDCT coefficients and normalized low-frequency MDCT coefficients for the k’-th lag candidate,
: energy of normalized low-frequency MDCT coefficients for the k’-th lag candidate
: number of lag candidates for the j-th band.
and are calculated by the following equations.
(1108)
(1109)
, , and denote the following.
: selected number of representative MDCT coefficients in the j-th band,
: frequency position of the k-th representative MDCT coefficient in the j-th band,
: the k’-th lag candidate for the j-th band,
: starting frequency position of normalized low-frequency MDCT coefficients for the j-th band.
The lag candidates are defined as the frequency positions of non-zero normalized low-frequency spectrum. Therefore means the k’-th non-zero normalized low-frequency MDCT coefficients frequency position in the j-th band search range. The j-th sub-band search range is started at, which is defined as offsets from the zero frequency point of the low-frequency spectrum. = {0, 0, 64, 64}
By using this lag candidate representation, the actual lag search range can be wide even when the bit budget for the lag information is small, and the generated candidate spectra are guaranteed to have at least one non-zero spectral coefficient.
Figure 76 shows a conceptual block diagram of the matching process. For the lag search, only the TCQ quantized component is used as the low-frequency MDCT coefficients while both of the TCQ-quantized and noise-filled low-frequency MDCT coefficients are used for generating high-frequency spectrum and calculating the scaling factors.
The best k’, which maximizes S(k’), is packed into the bit stream as an encoded parameter of the best lag candidate.
Figure 76: Conceptual block diagram of the matching process
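The matching process can be sketched as follows. Equations (1107) to (1109) are not reproduced in this text, so the sketch rests on stated assumptions: the similarity is the squared correlation divided by the energy, the correlation uses only the representative positions, the energy is taken over the whole band width, and only positive correlations are considered so that squaring does not hide the sign. All names are illustrative.

```python
def find_best_lag(target, rep_positions, lf_norm, search_start=0):
    """Return (candidate index k', lag position) maximizing the assumed
    similarity corr^2 / energy, or None if no candidate qualifies."""
    width = len(target)
    # Lag candidates: positions of non-zero normalized LF coefficients.
    candidates = [i for i in range(search_start, len(lf_norm))
                  if lf_norm[i] != 0.0]
    best, best_score = None, 0.0
    for k, pos in enumerate(candidates):
        segment = lf_norm[pos:pos + width]
        if len(segment) < width:
            continue  # candidate would run past the low-frequency content
        corr = sum(target[p] * segment[p] for p in rep_positions)
        energy = sum(v * v for v in segment)  # > 0: segment[0] is non-zero
        if corr <= 0.0:
            continue  # assumption: reject negatively correlated candidates
        score = corr * corr / energy
        if score > best_score:
            best, best_score = (k, pos), score
    return best


if __name__ == "__main__":
    lf = [0.0, 0.0, 1.0, 0.0, 0.5, 0.0, 0.0, -1.0, 0.0, 0.5, 0.0, 0.0]
    print(find_best_lag([-2.0, 0.0, 1.0, 0.0], [0, 2], lf))
```

Because every candidate starts at a non-zero coefficient, each candidate spectrum contains at least one non-zero value, which matches the property stated in the text.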
5.3.4.1.4.1.5.3.3.3 Scaling and noise smoothing
Once the best match has been found through the matching process, scaling factors are calculated for the searched bands using the quantized band energies. Each scaling factor is calculated as the square root of the quotient of each quantized band energy divided by the corresponding band energy of the generated high-frequency spectrum. The band energy of the generated high-frequency spectrum is equal to for the selected and is calculated according to equation (1109). The calculated scaling factors are attenuated by a factor of 0.9.
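The scaling factor computation above amounts to the following minimal sketch. Both energies are assumed to be in the linear domain, and the function and parameter names are illustrative.

```python
import math


def pfsc_scaling_factor(quantized_band_energy, generated_band_energy,
                        attenuation=0.9):
    """Square root of the energy ratio, attenuated by 0.9."""
    return attenuation * math.sqrt(quantized_band_energy / generated_band_energy)


if __name__ == "__main__":
    print(pfsc_scaling_factor(16.0, 4.0))
```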
An inter-frame smoothing process is applied to the generated high-frequency noise components. The generated MDCT coefficients whose amplitudes are below a threshold calculated using the sparseness of the TCQ-quantized low-frequency components are targeted for the smoothing process. When the energy of the targeted components in the previous frame is sufficiently less than the current band energy, relatively strong smoothing is applied, while relatively weak smoothing is applied in other cases (i.e. the current band energy is not sufficiently larger than the noise energy in the previous frame). The smoothing is performed on the energy of the targeted (noise) components. Then the smoothed energy of the noise components is divided by the energy of the noise components to calculate the final scaling factor of the smoothing process. The final scaling factor is applied to the noise components to obtain the smoothed MDCT coefficients for the noise components.
Local decoding at the encoder side is necessary for switching to non-MDCT-based modes. For the local decoding, the noise-filling process is performed between the “De-norm. & Fine gain SQ” and the “PFSC” blocks in Figure 66.
5.3.4.1.4.2 Transient mode
5.3.4.1.4.2.1 Energy envelope coding
The energy envelope coding is performed as described in subclause 5.3.4.1.3.
5.3.4.1.4.2.2 Bit allocation
The bit allocation to bands based on quantized band energies is performed as described in subclause 5.3.4.1.4.1.4.4. However, for transient frames the Limited band mode described in subclause 5.3.4.1.4.1.4.4.2 is not used. For SWB, no bits are allocated at the 6th and 7th bands.
5.3.4.1.4.2.3 Fine structure encoding
5.3.4.1.4.2.3.1 Trellis Coding Quantization (TCQ)
The spectral coefficient quantization is done as is described in subclause 5.3.4.1.4.1.5.1.
5.3.4.1.4.2.4 Noise-filling for 0-bit assigned bands
The noise filling is done as is described in subclause 5.3.4.1.4.1.5.2.
5.3.4.1.4.3 Harmonic Mode
5.3.4.1.4.3.1 Energy envelope coding
The overall framework of the Harmonic mode is the same as that of the Normal mode shown in Figure 66, but the PFSC block is called PFSC-based gap filling in the Harmonic mode.
The energy envelope coding is performed as described in subclause 5.3.4.1.3.
5.3.4.1.4.3.2 Bit allocation
5.3.4.1.4.3.2.1 Allocating bits for fine gain adjustment
The bit-allocation process is performed in the following manner when the signal is classified as harmonic. First, two bits are reserved for transmitting the noise factor information (cf. subclause 5.3.4.1.4.3.3.3.6), and four bits are then allocated for performing gap filling using the PFSC-based approach. Then some bits are reserved for applying fine gain quantization to the band energies that are larger than the others.
The bit budget used for the spectrum quantization is obtained using
(1110)
where ebits is the bits consumed for quantizing the band energies which is obtained from subclause 5.3.4.1.3, is the available bit budget for spectrum quantization. is the number of bits reserved for fine gain quantization and described in table 119. For SWB and FB at the bit rates 13.2 and 16.4 kbps, .
5.3.4.1.4.3.2.2 Adaptive bit allocation to bands
Based on the available bit budget, bits are allocated to the bands using the adaptive bit allocation. The adaptive bit-allocation scheme uses the quantized band energies to allocate the available bits in a frame among the bands. In this scheme, the band energies are adaptively grouped into a number of groups; each group is allocated a variable number of bits based on the characteristics between the groups, and bits are then allocated to the bands within each group based on the bit budget available to that group. Figure 77 illustrates the overview of the adaptive bit allocation. The detailed procedure is described in the following subclauses.
Figure 77: Overview of the Adaptive bit allocation scheme
5.3.4.1.4.3.2.2.1 Bits allocation to groups
The adaptive bit-allocation scheme uses the quantized band energies to allocate the available bits in a frame among the bands.
First, the bit-allocation vector entries, i.e. the bit allocation of each band in bits per sample, are set to one, and the band energies are stored temporarily in a buffer. This is done according to:
(1111)
In order to allocate bits to bands using the band energies, the bands are first adaptively grouped, where each group contains a variable number of bands. The grouping separates the dominant frequency bands from the non-dominant frequency bands. In total, four groups are formed: the widths of the first three groups are adaptively varied, and the last group has a fixed width consisting of the last four bands. The maximum width of each adaptive group is limited as given in table 120.
Table 120: Maximum widths of the groups
| | 13.2 kbps | 16.4 kbps |
| | 7 | 7 |
| | 9 | 10 |
| | 2 | 3 |
The grouping of bands is done by identifying the dominant frequency vectors (the frequency bands with large and local maximum energy factor values in the spectrum). Using the dominant frequency band as the center, the descending slopes on both sides are included in one group; the group width is adaptively determined according to the input signal characteristics. If the dominant frequency band is at the edge, only one side of the descending slope is included in the group. In order to reduce the complexity, the sensitivity of the human ear is considered when searching.
Once the width of each group is decided, bits are allocated to each group in the following steps.
- Calculate the energy of each group according to
(1112)
where is the start and end of each group and width of each group is defined according to
. (1113)
- Allocate bits to the last group which has a fixed width based on the following steps.
- Calculate average number of bits that can be allocated for a group
(1114)
- Calculate mean for the group energies according to
(1115)
- Calculate energy difference for last group according to
(1116)
- Calculate group energy ratio between the last group and average group energy of the remaining groups according to
(1117)
- Calculate the energy difference between the maximum energy of a sub-vector in the last group and the average sub-vector energy of the rest of the groups.
(1118)
where is the maximum energy in the last group according to equation (1119) and is the average band energy of the rest of the groups, calculated according to equation (1120).
(1119)
(1120)
- Allocate bits to the last group according to
(1121)
where bw1 is the bit allocation weight, which takes value based on
- Once bits are allocated to last group, remaining bits are calculated according to
(1122)
- Based on the remaining bits obtained from c), bits are allocated to the rest of the groups using following steps.
- Calculate the band energy difference between bands.
(1123)
- Identify the maximum band energy difference according to
(1124)
- Identify the group which has maximum band energy difference and allocate more bits to the group .
If belongs to group g=1, bits are allocated to the groups according to
(1125)
If belongs to group g=0 or group g=2, bits are allocated to the groups according to
(1126)
where the Scale1 and Scale2 values are given in table 121.
Table 121: Scale factors for groups
| | | 13.2 kbps | 16.4 kbps |
| | Scale1 | 1.05 | 1.1 |
| | Scale2 | 0.97 | 0.92 |
| | Scale1 | 0.92 | 0.97 |
| | Scale2 | 1.0 | 1.0 |
Bits are allocated to individual bands in a group based on the available bit budget for the groups and it is done according to subclause 5.3.4.1.4.3.2.2.2. If 0 bits are available to any of the groups the corresponding bands in the group are allocated 0 bits.
5.3.4.1.4.3.2.2.2 Bits allocation to bands in a group
The bits allocation to individual bands in a group is applied using the steps given below
- Calculate sum of band energies in a group
(1127)
- Calculate Mean energy for the group
(1128)
- Set scale3 value according to
(1129)
where is a bit allocation weight constant which is used for controlling the amount of bits being allocated to the bands.
is the number of bands for which the difference between the mean energy of the group and the band energy exceeds 5 dB.
- For each iteration in a group,
- Calculate Mean energy for the group
(1130)
- Allocate bits to the individual sub-vectors in a group according to
(1131)
- Once the bits are allocated to the individual bands in a group g, identify the band which has the least energy in the group and verify, using a threshold, whether the bits allocated to the identified band are under-allocated. If the band is under-allocated, update the sum of energies using
where Gmin is the index corresponding to minimum energy Rmin of a band in the group g
- Repeat Step g, until there is no under allocation.
The process described in subclause 5.3.4.1.4.3.2.2.2, is used for allocating bits to individual bands for groups g=0,1,2,3.
5.3.4.1.4.3.3 Fine structure encoding
5.3.4.1.4.3.3.1 Trellis Coding Quantization (TCQ)
The spectral coefficient quantization is done as is described in subclause 5.3.4.1.4.1.5.1.
5.3.4.1.4.3.3.2 Noise-filling for 0-bit assigned bands
This subclause gives a technical overview of the noise filling processing. Noise filling consists of two algorithms. The first algorithm, described in subclause 5.3.4.1.4.1.5.2, fills the gaps in the quantized spectrum where coefficients have been quantized to zero in bands to which the bit allocation has assigned a non-zero number of bits, and fills the un-quantized bands up to the transition frequency, where the transition frequency is estimated based on the received bit allocation. The second algorithm is described as follows.
5.3.4.1.4.3.3.3 PFSC-based gap filling
5.3.4.1.4.3.3.3.1 Overview
This subclause is only applied to SWB and FB input signals. The spectral coefficients which belong to bands that are assigned zero bits by the bit allocation are not quantized. This means that not all transform coefficients are transmitted to the decoder. From the noise-filled quantized spectrum, the gaps in the high frequency region which have zero bit allocation are identified, predicted spectrum information is generated, and the missing frequency bands (gaps) are filled with the predicted spectrum information.
In order to perform gap filling, the most similar match according to the selected similarity criteria is searched from the coded and envelope normalized content, and the lag index parameter is encoded, followed by encoding of the noise factor. The lag index and noise factor parameters are used in the decoder for generating the predicted spectrum, which is used for filling the gaps. Figure 78 illustrates the overview of PFSC-based gap filling for the Harmonic mode.
Figure 78: Overview of the Harmonic mode PFSC based gap filling
5.3.4.1.4.3.3.3.2 Envelope Normalization
Low frequency content is extracted from the quantized noise-filled spectrum and stored temporarily in a buffer. The buffer content is supplied to the noise separator, which separates the noise and the quantized content. The envelope of the quantized and noise contents is normalized before further processing. The normalization is performed in the logarithmic domain as follows.
Peak to average information is calculated in the log domain for the quantized content:
(1132)
where is the maximum absolute amplitude of the quantized spectrum and is the mean of the absolute amplitude of the quantized spectrum.
The quantized content is divided into sub-vectors, and the total number of sub-vectors is given in table 122. Each sub-vector consists of eight spectral coefficients. This sub-vector division is used only for envelope normalization.
(1133)
The arithmetic means of the 1st and 2nd half of each sub-band and are calculated as follows:
(1134)
(1135)
Next, the geometric mean of and in logarithmic domain of each sub-band, is obtained as:
(1136)
Here, the square root in the linear domain corresponds to a multiplication by 0.5 in the logarithmic domain.
The , l = 0, …, Nsv, is smoothed using a moving average with a span of seven samples. The envelope smoothed values are obtained as:
(1137)
The smoothed envelope is transformed from logarithmic domain to linear domain as follows:
(1138)
The quantized contents are normalized by multiplying the smoothed envelope in order to obtain the normalized contents:
(1139)
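The log-domain envelope computation described above can be sketched as follows. Equations (1134) to (1139) are not reproduced in this text, so several details are assumptions of this example: base-2 logarithms, means taken over absolute values, a small `eps` to avoid the logarithm of zero, a moving-average window that shrinks at the edges, and a normalization that divides by the linear-domain envelope (which presumes that the text's "multiplying" refers to a reciprocal envelope). All names are illustrative.

```python
import math


def smoothed_envelope(content, sv_len=8, span=7, eps=1e-6):
    """Per-sub-vector envelope: geometric mean of the two half means,
    computed in the log domain and smoothed by a moving average."""
    n_sv = len(content) // sv_len
    half = sv_len // 2
    log_env = []
    for l in range(n_sv):
        sv = [abs(x) for x in content[l * sv_len:(l + 1) * sv_len]]
        m1 = sum(sv[:half]) / half + eps   # arithmetic mean, 1st half
        m2 = sum(sv[half:]) / half + eps   # arithmetic mean, 2nd half
        # Geometric mean in the log domain: the square root becomes * 0.5.
        log_env.append(0.5 * (math.log2(m1) + math.log2(m2)))
    radius = span // 2
    smooth = []
    for l in range(n_sv):
        window = log_env[max(0, l - radius):min(n_sv, l + radius + 1)]
        smooth.append(sum(window) / len(window))
    return [2.0 ** v for v in smooth]      # back to the linear domain


def normalize_content(content, envelope, sv_len=8):
    """Flatten the content with the smoothed envelope (assumed division)."""
    n = len(envelope) * sv_len
    return [x / envelope[i // sv_len] for i, x in enumerate(content[:n])]


if __name__ == "__main__":
    flat = [4.0] * 32
    env = smoothed_envelope(flat)
    print(env, normalize_content(flat, env)[:4])
```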
Table 122: Number of sub-vectors for energy normalization
| Bitrate in kbps | Nsv |
| 13.2 | 32 |
| 16.4 | 38 |
From the normalized content, thresholds are calculated according to
(1140)
where value is according to.
(1141)
is calculated according to
(1142)
Absolute values of the normalized coefficients are compared to the thresholds and , which are adaptively calculated by equation (1140). The normalized coefficients with absolute amplitudes above are clipped to to avoid excessive spectral peaks, and the normalized coefficients with absolute amplitudes below are suppressed to 0 to enhance the harmonic structure, according to
The above process, i.e. normalization and clipping of the normalized contents, is also applied for normalizing the low frequency noise contents, and the obtained normalized noise contents are represented as . The normalized noise contents are further adjusted according to
(1143)
where is the sparseness ratio and it is calculated according to
(1144)
is the number of non-zero spectral coefficients in the TCQ-quantized low-frequency content.
The normalized low frequency quantized noise filled spectrum is obtained according to
(1145)
5.3.4.1.4.3.3.3.3 Band Search
The high-frequency region of with width hfw is divided into four bands as follows
(1146)
where is the input MDCT spectrum, is the starting position of high frequency, high frequency width is represented as and its corresponding values are according to table 123. The values of band widths corresponding to index i and offsets are according to tables 124 and 125.
Table 123: High frequency width and starting position
| Bitrate in kbps | Starting position | High frequency width hfw |
| 13.2 | 256 | 312 |
| 16.4 | 300 | 340 |
Table 124: High frequency band width and offsets for 13.2 kbps
| Index i | Band width | Offset |
| 0 | 56 | 0 |
| 1 | 100 | 56 |
| 2 | 100 | 156 |
| 3 | 56 | 256 |
Table 125: High frequency band width and offsets for 16.4 kbps
| Index i | Band width | Offset |
| 0 | 60 | 0 |
| 1 | 110 | 60 |
| 2 | 110 | 170 |
| 3 | 60 | 280 |
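The values in tables 123 to 125 are mutually consistent: each offset is the running sum of the preceding band widths, and the four widths add up to 312 at 13.2 kbps and 340 at 16.4 kbps. The sketch below checks this. Reading 312 and 340 as the high frequency width hfw (and 256 and 300 as the starting position) is an inference from these sums, and the dictionary layout and names are illustrative.

```python
HF_BANDS = {
    # bitrate: starting position, hfw (assumed column reading), widths, offsets
    "13.2": {"start": 256, "hfw": 312,
             "widths": [56, 100, 100, 56], "offsets": [0, 56, 156, 256]},
    "16.4": {"start": 300, "hfw": 340,
             "widths": [60, 110, 110, 60], "offsets": [0, 60, 170, 280]},
}


def cumulative_offsets(widths):
    """Return the offsets implied by the widths and their total width."""
    offsets, total = [], 0
    for w in widths:
        offsets.append(total)
        total += w
    return offsets, total


if __name__ == "__main__":
    for rate, cfg in HF_BANDS.items():
        offsets, total = cumulative_offsets(cfg["widths"])
        print(rate, offsets == cfg["offsets"], total == cfg["hfw"])
```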
For sub-band , i=0,1 there is a search band which defines where the best match is searched.
The start position and the width of each sub-band are set as follows:
(1147)
(1148)
where is the number of search positions and it is calculated according to
(1149)
where the number of bits is reserved for the band search for index i.
The best match is searched for every band i. In order to reduce the computational load of the correlation calculation, only a limited number of input spectral coefficients are selected and used for the calculation of the correlation. The selection of the MDCT coefficients is described in subclause 5.3.4.1.4.1.5.3.3.1.
Once the input spectral coefficients are selected, searching process is performed by calculating the correlation between the input spectral coefficients and normalized low-frequency coefficients derived from the envelope normalized coefficients calculated in subclause 5.3.4.1.4.3.3.3.2. Since the correlation is calculated only using the selected coefficients, required computational complexity can be reduced.
The best match is searched for every band i as follows. The correlation for index k’ is computed as
(1150)
where ,, and denote the following:
: correlation between the input spectral coefficients and the search band for the k’-th lag candidate,
: selected number of representative spectral coefficients in the i-th band,
: frequency position of the k-th representative spectral coefficient in the i-th sub-band,
: start and end position of the search lag, whose values are calculated according to
(1151)
where is the previous frame best match position for band i.
Similarly, the energy for index k’ is obtained as
(1152)
where is the energy of the search band for the k’-th lag candidate
The actual similarity measure used is of the form
(1153)
The task is to find k’ which maximizes S’(k’). The complexity of the implementation can be reduced by squaring S’(k’), which removes the absolute value signs as well as the square root from the denominator, as in equation (1107). The best match is now searched efficiently for band i using
where is the best match position for band i. If there are no spectral coefficients in the search band, the value is set according to
(1154)
The best match index is packed into the bit stream as the parameter. Once the best match index is obtained, past frame best match index is updated with current frame best match index according to
(1155)
5.3.4.1.4.3.3.3.4 Structure analysis for Harmonics
This subclause gives a technical overview of the structure analysis for harmonics which is applied for SWB and FB Harmonic signals. The purpose of this subclause is to estimate the harmonics from the quantized spectrum and the estimated harmonic is used for generating the new tonal spectrum for the HF region at the decoder, where the HF region is calculated using the frequency transition which is estimated based on the received bit allocation.
The first step of the structure analysis is to extract a portion from the quantized spectrum. A portion from 2 kHz to 6.4 kHz for 13.2 kbps and from 2 kHz to 7.5 kHz for 16.4 kbps is used for structure analysis.
From the extracted portion, the structure for harmonics is analysed. The detailed procedure of this subclause comprises the following steps:
- Group the spectral coefficients into a number of blocks of equal length; in total 16 and 19 blocks (Nblock) are formed for 13.2 and 16.4 kbps, respectively, with a length of 16 coefficients per block.
- Extract the spectral peak magnitude and their corresponding positions within each block; as shown
(1156)
- From the extracted spectral peaks, identify the spectral peaks which lie close together and discard from the analysis those spectral peaks which are not perceptually important.
- Split the extracted portion into regions with cut-off at 140, 200, 16Nblock-1 spectral coefficients and count the peaks in each region according to
(1157)
Before the process counteris set to zero.
- Calculate the spacing between the identified peak positions and identify the minimum and maximum spacing between spectral peaks.
(1158)
6) Calculate sum of spacing according to
where are the sum of spacing and the counter increments when the respective conditions are satisfied.
7) Estimate the harmonic frequency based on the spacing between the identified peak positions according to the following.
where are the estimated harmonic frequencies, which can be used for generating the missing bands.
In order to improve the stability of present frame, previous frame estimated frequency information is used.
If and the calculated estimated frequencies have a zero value, this indicates that no harmonics were extracted from 0 to Nblock-1; this usually happens when the decoded spectrum is sparse. In this case, is set to the previous frame's estimated frequencies if they are not equal to zero; otherwise it is set to a default value of 80. In the encoder, the estimated harmonic is used for generating the noise spectrum for the HF region. As the estimated harmonic is determined from the quantized spectrum, there is no need to transmit the parameter to the decoder. Under rate switching conditions, this structure analysis for harmonics is also used at bitrates above 16.4 kbps for extracting the harmonics.
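The block-wise peak extraction and the spacing computation described in the steps above can be illustrated with the following informative sketch. It covers only the grouping into blocks of 16 coefficients, the per-block peak search and the spacing between consecutive peaks; the discarding of closely spaced peaks, the region counters and the harmonic frequency estimate are not reproduced. The function names `block_peaks` and `peak_spacings` are hypothetical and chosen for illustration only.

```python
def block_peaks(spectrum, block_len=16):
    """Return (magnitude, position) of the largest-magnitude coefficient per block."""
    peaks = []
    n_blocks = len(spectrum) // block_len
    for b in range(n_blocks):
        start = b * block_len
        block = spectrum[start:start + block_len]
        # Offset of the largest absolute value inside this block.
        offset = max(range(block_len), key=lambda k: abs(block[k]))
        peaks.append((abs(block[offset]), start + offset))
    return peaks


def peak_spacings(positions):
    """Spacing between consecutive peak positions, plus minimum and maximum spacing."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        return gaps, 0, 0
    return gaps, min(gaps), max(gaps)
```

With three blocks and one dominant coefficient per block, `block_peaks` returns one peak per block, and `peak_spacings` applied to the peak positions gives the distances that the later steps operate on.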
5.3.4.1.4.3.3.3.5 Noise filling for the predicted spectrum
First, the high-frequency region with width hfw is divided into four bands with the same band configuration as described in subclause 5.3.4.1.4.3.3.3.3.
From the envelope normalized noise spectrum, a desired portion of noise spectrum is extracted. The start position and the end of each desired portion in the normalized noise spectrum are set as follows:
(1159)
(1160)
where lagsi is the number of search positions, given by equation (1149), and the lag index value is obtained from subclause 5.3.4.1.4.3.3.3.3.
Estimate the position of harmonics for band i =0, 1 in the predicted spectrum, according to
(1161)
where represents the first tonal position of the HF spectrum estimated based on the end tonal frequency position in the low frequency spectrum. represents pulse resolution for predicted spectrum and is obtained from subclause 5.3.4.1.4.3.3.3.4.
Fill the bands i = 0, 1 using noise extracted from the normalized spectrum, except at the positions obtained in equation (1161), and fill the remaining bands i = 2, 3 by copying the information obtained from the lower bands i = 0, 1 in reverse order.
5.3.4.1.4.3.3.3.6 Noise factor
On the decoder side, a predicted spectrum is generated for the high frequency region by using the envelope normalized noise-filled quantized spectrum, which is obtained from subclause 5.3.4.1.4.3.3.3.5. The predicted spectrum is generated first by extracting the desired noise components from the described in subclause 5.3.4.1.4.3.3.3.5, followed by tonal generation using the envelope normalized quantized spectrum. To control the amount of noise inserted in the predicted spectrum at the decoder, a noise factor is computed on the encoder side and quantized using a 2-bit scalar quantizer. The procedure for noise factor calculation is described in this subclause.
From the input spectral coefficients , extract the high-frequency region according to
(1162)
where the starting position of the high frequency region is represented as and the high frequency width is represented as. The corresponding values of are the same as in table 123.
Tonal components in the are selected using the values obtained in subclause 5.3.4.1.4.3.3.3.3. Calculate energy of the selected coefficients according to
(1162a)
Calculate noise factor using and according to
(1163)
where is the energies obtained using, according to
(1164)
The calculated noise factor Nfac is encoded using a two bit scalar quantizer.
5.3.4.2 High-rate HQ coder
Table 126: High rate HQ supported modes
| Bitrate [kbps] | Bandwidth | Supported modes |
|---|---|---|
| 32 | WB | Normal, Transient |
| 24.4, 32 | SWB | Transient, Harmonic, HVQ, Generic |
| 24.4, 32 | FB | Transient, Generic |
| 64 | SWB, FB | Normal, Transient |
The high level structure of the high-rate HQ coder is shown in figure 79. The operating modes of the high-rate HQ coder are outlined in table 126. The high-rate HQ coder is used for WB, SWB and FB at bit-rates of 24.4, 32 and 64 kb/s. The coding is done in the MDCT domain. There are five modes: Transient mode handles transient signals by using shorter transforms, Harmonic mode handles harmonic signals, and HVQ mode handles strongly harmonic signals. Normal and Generic modes are used for all other signals.
Figure 79: High level structure of the high-rate HQ encoder
The whole frequency spectrum is divided into bands. Three different band structures are used: default (normal), wideband and harmonic; see tables 127, 128 and 129, respectively. The harmonic band structure is used for HVQ and Harmonic mode, and the wideband band structure is used for WB. Otherwise, the default band structure is used.
Table 127: Normal band structure used in high rate HQ
| Band | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Start | 0 | 8 | 16 | 24 | 32 | 40 | 48 | 56 | 64 | 72 | 80 | 88 | 96 |
| End | 7 | 15 | 23 | 31 | 39 | 47 | 55 | 63 | 71 | 79 | 87 | 95 | 103 |
| Band | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
| Start | 104 | 112 | 120 | 128 | 144 | 160 | 176 | 192 | 208 | 224 | 240 | 256 | 280 |
| End | 111 | 119 | 127 | 143 | 159 | 175 | 191 | 207 | 223 | 239 | 255 | 279 | 303 |
| Band | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 |
| Start | 304 | 328 | 352 | 376 | 400 | 424 | 448 | 472 | 496 | 520 | 554 | 576 | 608 |
| End | 327 | 351 | 375 | 399 | 423 | 447 | 471 | 495 | 519 | 553 | 575 | 607 | 639 |
| Band | 39 | 40 | 41 | 42 | 43 | | | | | | | | |
| Start | 640 | 672 | 704 | 736 | 768 | | | | | | | | |
| End | 671 | 703 | 735 | 767 | 799 | | | | | | | | |
Table 128: Wideband band structure used in high rate HQ
| Band | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Start | 0 | 8 | 16 | 24 | 32 | 40 | 48 | 56 | 64 | 72 | 80 | 88 | 96 |
| End | 7 | 15 | 23 | 31 | 39 | 47 | 55 | 63 | 71 | 79 | 87 | 95 | 103 |
| Band | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
| Start | 104 | 112 | 120 | 128 | 144 | 160 | 176 | 192 | 208 | 224 | 240 | 256 | 288 |
| End | 111 | 119 | 127 | 143 | 159 | 175 | 191 | 207 | 223 | 239 | 255 | 287 | 319 |
Table 129: Harmonic band structure used in high rate HQ
| Band | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Start | 0 | 8 | 16 | 24 | 32 | 40 | 48 | 56 | 64 | 72 | 80 | 88 | 96 |
| End | 7 | 15 | 23 | 31 | 39 | 47 | 55 | 63 | 71 | 79 | 87 | 95 | 103 |
| Band | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 |
| Start | 104 | 112 | 120 | 128 | 144 | 160 | 176 | 192 | 208 | 224 | 256 | 288 | 320 |
| End | 111 | 119 | 127 | 143 | 159 | 175 | 191 | 207 | 223 | 255 | 287 | 319 | 367 |
| Band | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 |
| Start | 368 | 416 | 464 | 512 | 576 | | | | | | | | |
| End | 415 | 463 | 511 | 575 | 639 | | | | | | | | |
The number of bands used in WB is 26, and for FB it is 44. For SWB 39 bands are used, except if Harmonic or HVQ is used. For Harmonic mode 31 bands are used. The HVQ will use bands 21-30 for 24.4 kb/s, or 24-30 for 32 kb/s.
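As an informative illustration, the band boundaries of tables 128 and 129 can be regenerated from runs of equal-width bands. The run lists below were read off the tabulated start and end values; the function name `build_bands` and the run representation are hypothetical and serve only to show how the band count and the last coefficient index follow from the band widths.

```python
def build_bands(width_runs):
    """Expand (count, width) runs into lists of first and last coefficient indices."""
    starts, ends = [], []
    pos = 0
    for count, width in width_runs:
        for _ in range(count):
            starts.append(pos)
            ends.append(pos + width - 1)
            pos += width
    return starts, ends


# Runs read off table 128 (wideband) and table 129 (harmonic).
WB_RUNS = [(16, 8), (8, 16), (2, 32)]
HARMONIC_RUNS = [(16, 8), (6, 16), (3, 32), (4, 48), (2, 64)]

wb_starts, wb_ends = build_bands(WB_RUNS)
hm_starts, hm_ends = build_bands(HARMONIC_RUNS)
```

The wideband runs give 26 bands ending at coefficient 319, and the harmonic runs give 31 bands ending at coefficient 639, in line with the band counts stated above.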
Based on the different modes, the low frequency band signal is encoded with the different bandwidths. In addition, the envelopes of the higher frequency band are encoded differently according to the different modes.
Then, the mode information, the indices of the low frequency band signal and the envelopes of the higher frequency band signal are written to the bitstream.
5.3.4.2.1 Normal Mode
5.3.4.2.1.1 Envelope calculation and quantization
The norm or spectrum energy of a band is defined as the root-mean-square (rms) value of the band and computed as follows:
(1165)
where is the length of the band .
Table 130: Envelope quantization table
| Index | Code | Index | Code | Index | Code | Index | Code |
|---|---|---|---|---|---|---|---|
| 0 | | 10 | | 20 | | 30 | |
| 1 | | 11 | | 21 | | 31 | |
| 2 | | 12 | | 22 | | 32 | |
| 3 | | 13 | | 23 | | 33 | |
| 4 | | 14 | | 24 | | 34 | |
| 5 | | 15 | | 25 | | 35 | |
| 6 | | 16 | | 26 | | 36 | |
| 7 | | 17 | | 27 | | 37 | |
| 8 | | 18 | | 28 | | 38 | |
| 9 | | 19 | | 29 | | 39 | |
In each frame, the norms are scalar quantized with a uniform logarithmic scalar quantizer with 40 steps of 3 dB.
The index of the quantized norm is obtained as:
(1166)
and is saturated such that it is limited to the range [0, 31] for band 0 and [0, 39] for the other bands.
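The norm computation and the logarithmic quantization with 3 dB steps can be sketched as follows. This is an informative example under a stated assumption: index i is taken to represent the level 2^(17 - i/2), so that one index step corresponds to a factor of the square root of 2 in amplitude (about 3 dB). That mapping, the parameter `top_exp` and the function names are assumptions made for illustration and are not taken from the text above.

```python
import math


def band_rms(coeffs):
    """Root-mean-square value of the coefficients of one band."""
    return math.sqrt(sum(c * c for c in coeffs) / len(coeffs))


def quantize_norm(rms, is_first_band=False, top_exp=17.0):
    """Map an rms value to a quantization index with about 3 dB per step.

    Assumption: index i represents the level 2 ** (top_exp - i / 2).
    """
    max_index = 31 if is_first_band else 39
    if rms <= 0.0:
        return max_index  # smallest representable level
    index = round(2.0 * (top_exp - math.log2(rms)))
    return max(0, min(max_index, index))  # saturation to the allowed range
```

Under this assumption an rms of 2^17 maps to index 0, halving the rms raises the index by two, and the index of the first band saturates at 31 instead of 39.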
The quantization index of the lowest-frequency band, i.e., , is directly transmitted to the decoder. The quantization indices of the norms in the remaining bands are differentially coded by computing
(1167)
The differential indices are constrained into the range of [–15, 16]. This is performed by first adjusting negative differential indices and then adjusting positive differential indices as follows:
1) Compute the differential indices defined in equation (1167) in order from the highest-frequency band to the lowest-frequency.
2)
3) Recompute the differential indices in order from the lowest-frequency band to the highest-frequency.
4)
5) The adjusted differential indices in the range [0, 31] are obtained by adding an offset of 15 to.
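The two-pass adjustment can be sketched as follows. Because the expressions for steps 2) and 4) are not reproduced above, the update rules in this informative example are an assumption: a difference below –15 is repaired by changing the index of the lower band, and a difference above 16 is repaired by changing the index of the upper band. The function name `constrain_differences` is hypothetical.

```python
def constrain_differences(idx, lo=-15, hi=16):
    """Limit idx[b + 1] - idx[b] to [lo, hi] and return the offset differences.

    Assumed update rules (not taken from the text):
      pass 1, highest to lowest band: if the difference is below lo,
              set idx[b] = idx[b + 1] - lo;
      pass 2, lowest to highest band: if the difference is above hi,
              set idx[b + 1] = idx[b] + hi.
    """
    idx = list(idx)
    n = len(idx)
    for b in range(n - 2, -1, -1):       # negative differences first
        if idx[b + 1] - idx[b] < lo:
            idx[b] = idx[b + 1] - lo
    for b in range(n - 1):               # then positive differences
        if idx[b + 1] - idx[b] > hi:
            idx[b + 1] = idx[b] + hi
    diffs = [idx[b + 1] - idx[b] for b in range(n - 1)]
    adjusted = [d - lo for d in diffs]   # offset of 15 gives the range [0, 31]
    return idx, adjusted
```

For the indices [30, 2, 39, 10] the sketch yields the constrained indices [17, 2, 18, 10] and the adjusted differential indices [0, 31, 7], all of which fit into 5 bits.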
5.3.4.2.1.2 Envelope coding
The first index of the quantized norm is transmitted by packing the bits directly using 5 bits. The adjusted quantization differential indices are coded by one of four high-rate HQ norm coding modes shown in table 131. The mode requiring the least bits is selected and used for high-rate HQ norm coding. This HQ norm coding mode information is signalled with 2 bits. The final resulting reconstructed quantized norms are denoted by, .
Table 131: High-rate HQ norm coding modes
| Mode index | Description |
|---|---|
| 0 | Context based Huffman coding |
| 1 | Resized Huffman coding |
| 2 | Normal Huffman coding |
| 3 | Bit-packing |
5.3.4.2.1.2.1 Context based Huffman coding mode
If this coding mode is selected for the current frame, the context based Huffman coding is applied to the quantization differential indices described in subclause 5.3.4.1.3.3.1. In this case equation (1038) shall be replaced with
(1168)
When the bit-packing mode is selected the adjusted differential rates are packed directly with 5 bits.
5.3.4.2.1.2.2 Re-sized Huffman coding mode
If this coding mode is selected for the current frame, the Re-sized Huffman coding is applied to the adjusted differential indices in the same way as in the low-rate HQ coder. Details are described in subclause 5.3.4.1.3.3.2. The Huffman codes for the differential indices are given in table 115. When the bit-packing mode is selected the adjusted differential rates are packed directly with 5 bits.
5.3.4.2.1.2.3 Normal Huffman coding and bit-packing mode
If this coding mode is selected for the current frame, the Normal Huffman coding is then applied to the adjusted differential indices. The Huffman codes for the differential indices are given in table 132.
Table 132: Huffman coefficient table
| Index | Code | Index | Code | Index | Code | Index | Code |
|---|---|---|---|---|---|---|---|
| 0 | 0011010 | 8 | 001100 | 16 | 000 | 24 | 0011110 |
| 1 | 0111010 | 9 | 011100 | 17 | 010 | 25 | 0111110 |
| 2 | 1011010 | 10 | 101100 | 18 | 1010 | 26 | 1011110 |
| 3 | 1111010 | 11 | 111100 | 19 | 1110 | 27 | 1111110 |
| 4 | 0011011 | 12 | 0010 | 20 | 001110 | 28 | 0011111 |
| 5 | 0111011 | 13 | 0110 | 21 | 011110 | 29 | 0111111 |
| 6 | 1011011 | 14 | 100 | 22 | 101110 | 30 | 1011111 |
| 7 | 1111011 | 15 | 110 | 23 | 111110 | 31 | 1111111 |
When the bit-packing mode is selected the adjusted differential rates are packed directly with 5 bits.
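The following informative sketch encodes adjusted differential indices with the codes of table 132 and compares the resulting bit count with 5-bit packing. It assumes that the code strings are read left to right as printed; the bit order in the actual bitstream is not specified here. Only two of the four modes of table 131 are compared, and the names `encode`, `decode` and `cheaper_mode` are hypothetical.

```python
# Codes of table 132, listed by adjusted differential index 0..31.
HUFFMAN_CODES = [
    "0011010", "0111010", "1011010", "1111010",  # 0-3
    "0011011", "0111011", "1011011", "1111011",  # 4-7
    "001100", "011100", "101100", "111100",      # 8-11
    "0010", "0110", "100", "110",                # 12-15
    "000", "010", "1010", "1110",                # 16-19
    "001110", "011110", "101110", "111110",      # 20-23
    "0011110", "0111110", "1011110", "1111110",  # 24-27
    "0011111", "0111111", "1011111", "1111111",  # 28-31
]


def encode(indices):
    """Concatenate the code words of the given indices."""
    return "".join(HUFFMAN_CODES[i] for i in indices)


def decode(bits):
    """Decode by prefix matching (assumes left-to-right reading)."""
    lookup = {code: i for i, code in enumerate(HUFFMAN_CODES)}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out


def cheaper_mode(indices):
    """Pick the cheaper of Normal Huffman coding and 5-bit packing."""
    huffman_bits = sum(len(HUFFMAN_CODES[i]) for i in indices)
    packed_bits = 5 * len(indices)
    if huffman_bits < packed_bits:
        return "huffman", huffman_bits
    return "bit-packing", packed_bits
```

Indices near 15 and 16 (small differences between neighbouring norms) cost 3 bits each, so Huffman coding wins for smooth envelopes, while indices near 0 or 31 cost 7 bits each and favour bit-packing.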
5.3.4.2.1.3 Bit allocation
5.3.4.2.1.3.1 Envelope adjustment before bit allocation
In order to account for psycho-acoustical weighting and masking effects, the quantized norms are adjusted prior to bit allocation. The algorithm consists of mapping the quantized norms by using spectral weighting functions. This algorithm is only used at FB.
First, the quantized norms are mapped to the spectral domain. This is equivalent to copying the quantized norms in case of stationary signals and averaging the time-dependent quantized norms of the four spectra in case of transients. This is performed according to:
(1169)
The obtained spectrum is afterwards mapped to a function which is similar to the ear’s output of auditory filters; this gives a representation of the psycho-acoustical importance of the input signal. This operation is performed according to the following linear operation:
(1170)
where the constants, and the summation interval are given in table 133. when the bit-rate is 24.4 kb/s, at 32 kb/s, and at 64 kb/s.
Table 133: Spectrum mapping variables
| | | | | |
|---|---|---|---|---|
| 0 | 0 | 1 | 3 | 8 |
| 1 | 1 | 1 | 3 | 6 |
| 2 | 2 | 1 | 3 | 3 |
| 3 | 3 | 1 | 3 | 3 |
| 4 | 4 | 1 | 3 | 3 |
| 5 | 5 | 1 | 3 | 3 |
| 6 | 6 | 1 | 3 | 3 |
| 7 | 7 | 1 | 3 | 3 |
| 8 | 8 | 1 | 3 | 3 |
| 9 | 9 | 1 | 3 | 3 |
| 10 | 10, 11 | 2 | 4 | 3 |
| 11 | 12, 13 | 2 | 4 | 3 |
| 12 | 14, 15 | 2 | 4 | 3 |
| 13 | 16, 17 | 2 | 5 | 3 |
| 14 | 18, 19 | 2 | 5 | 3 |
| 15 | 20, 21, 22, 23 | 4 | 6 | 3 |
| 16 | 24, 25, 26 | 3 | 6 | 4 |
| 17 | 27, 28, 29 | 3 | 6 | 5 |
| 18 | 30, 31, 32, 33, 34 | 5 | 7 | 7 |
| 19 | 35, 36, 37, 38, 39, 40, 41, 42, 43 | 9 | 8 | 11 |
The mapped spectrum is forward-smoothed according to the following:
(1171)
and the resulting in-place function is backward-smoothed according to:
(1172)
After the smoothing operation, the resulting function is thresholded in order to take into account an average level of absolute threshold of hearing. Thresholding and renormalization are performed according to:
(1173)
where is given by table 133 and represents a pseudo-threshold of hearing.
The resulting function is further adaptively mapped, companded or expanded depending on the dynamic range of the spectrum, to the range of [–0,…,3] according to the following linear mapping:
(1174)
The resulting spectrum is mapped back to bands according to:
(1175)
If the variable IsTransient is set to TRUE, i.e., transient mode, the resulting spectrum is further smoothed according to:
(1176)
Finally, the norms used for bit allocation are computed as:
(1177)
5.3.4.2.1.3.2 Envelope based bit allocation
5.3.4.2.1.3.2.1 Wideband bit allocation of non-transient mode at 24.4/32kbps
A group based bit allocation scheme has been introduced for wideband signal coding at 24.4 kbps and 32 kbps to avoid zero bit allocations to some of the sub-bands when they may be perceptually important.
5.3.4.2.1.3.2.1.1 Group based bit allocation
The sub-bands are divided into three groups. Firstly, the initial number of allocated bits to each group is determined according to the sum or the average of the norms in each group. Based on the initial number of allocated bits to each group, the second stage group based bit allocation is performed according to the characteristics and the energy information of the signal.
To achieve the group based bit allocation, firstly the average of the quantized norms in the index range is calculated,
(1178)
For the first 4 sub-bands, if the norm for bit allocation is less than, then the norm is set to .
As a first step in the bit allocation, 1 bit is allocated to code each sub-band, and then the sub-bands are divided into 3 groups, i.e. [0,…, 15], [16,…, 23], [24,…, 25].
Then calculate the following parameters which indicate the characteristics and the energy information of the signal for group based bit allocation.
The factors are initialized. These factors are then employed to adjust the norms, and then the average of the norms in each group is calculated along with the sum of the averages.
(1179)
(1180)
Table 134: Spectrum mapping group structure
| Group | First sub-band | Last sub-band | Number of sub-bands |
|---|---|---|---|
| 0 | 0 | 15 | 16 |
| 1 | 16 | 23 | 8 |
| 2 | 24 | 25 | 2 |
The differences between consecutive averages are calculated:
(1181)
If is greater than or equal to 6, or is greater than or equal to 3.75, then the number of bits allocated to each group is described as follows,
(1182)
where is the total available bit budget. Otherwise, the bit allocation for each group is detailed as follows:
Initialize the number of bits allocated to each group using the following equation,
(1183)
If the average envelope of the third group is more than 12, then adjust the number of bits of each group further:
(1184)
where is the number of the saved bits and initialized to zero. If, the saved bits are re-allocated to each group according to:
(1185)
and reset the factors.
If the number of bits allocated to the third group is less than 3, then the number of allocated bits in the third group are moved to the second group , and the number of bits allocated to the third group is set to zero, .
5.3.4.2.1.3.2.1.2 Bit allocation in each group
For the 3 sub-band groups, the smallest thresholds for the number of allocated bits are 5, 6 and 7, respectively, and the largest number of sub-bands which are bit-allocated is calculated according to:
(1186)
The norms in each group are re-ordered and the re-ordered norms are , where is the number of the sub-bands in each group.
The step length is initialized, and the re-ordered norms are adjusted according to the step length, and :
(1187)
The bits allocated to each sub-band are set according to the adjusted norms in each group as follows:
If, initialize the allocated bit of each sub-band to 1. If the adjusted norm is less than 0, reset it to 0. Then the following 4 steps are performed :
- Initialize the counter.
- Calculate the sum of the first norms in the group, , and allocate the bits of each sub-band in the index range :
(1188)
- If the number of bits allocated to the last sub-band is fewer than thresholds, , then the number of allocated bits in the sub-band is set to zero and the counter is decremented by one. Processing now returns to step 2.
- If the number of bits allocated to the sub-band is not fewer than thresholds, then the bit-allocation of the sub-bands in each group has been completed.
Otherwise, if , initialize the number of allocated bits of the first sub-bands to 1, and initialize the bit of the last sub-bands to 0. If the adjusted norms of the first sub-bands are less than 0, set them to 0. Then the following 4 steps are performed:
- Initialize the counter.
- Calculate the sum of the first norms in the group, , and allocate the bits of each sub-band in the index range :
(1189)
- If the number of bits allocated to sub-band is fewer than the threshold, , then the number of allocated bits in the last sub-bands is set to zero and the counter is set to . Processing now returns to step 2.
- If the number of bits allocated to all sub-bands is not fewer than thresholds, then the bit-allocation of the sub-bands in each group has been completed.
5.3.4.2.1.3.2.2 General envelope based bit allocation
The adaptive bit-allocation scheme uses the adjusted quantized norms , to allocate the available bits in a frame among the bands.
The maximum number of bits assigned to each normalized transform coefficient is by default set tobits/coefficient.
First, the bit-allocation vector entries, i.e., bit allocation of each band in bits per sample, are set to zero. This is done according to:
for (1190)
The number of remainder bits, denoted , is set to the total available bit budget. The latter is calculated after subtraction of the number of signalling bits, the bits used by the envelope coding and possibly the noise level bits (see subclause 5.3.4.2.1.4) from the total available bits for the frame.
The vector of bit-allocations,, remainder and fulfil:
(1191)
At each iteration, the index of the band which has the largest norm among number of bands used is found:
(1192)
For this band, the algorithm allocates 1 bit for each spectral coefficient, i.e.,is incremented by 1. The norm will, on the other hand, be decremented by 6 dB, i.e., the norm index is decreased by two. These two operations are performed according to:
(1193)
The remainder is updated as well to take into account the updated bit-allocation vector:
(1194)
Whenever the bit-allocationreaches the maximum allowable bit-rate, its norm is set to minus infinity (in reality, to ) such that this vector is not taken into account in the next iterations. This procedure is repeated until is less than.
When the iterative procedure stops and, depending on the value of the remainder bits, the remaining bits are allocated to bands of a lower dimension than which caused the stop of the bit‑allocation loop.
The last step is to convert the bit allocation from bits per coefficients to total bits per band:
(1195)
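The iterative allocation loop described above can be sketched as follows. This is an informative example, not the normative procedure: norms are represented as levels in 3 dB steps where a larger value means a higher energy, the default of 9 bits per coefficient for `max_bits_per_coef` is a placeholder, and the loop stops when no remaining band fits into the remainder. All three points are assumptions, as is the function name `allocate_bits`.

```python
def allocate_bits(norm_levels, band_widths, budget, max_bits_per_coef=9):
    """Greedy allocation: one bit per coefficient to the band with the largest norm.

    Assumptions: larger norm level means higher energy; the loop ends when
    no eligible band is affordable; max_bits_per_coef is a placeholder value.
    """
    norms = [float(n) for n in norm_levels]
    bits_per_coef = [0] * len(norms)
    remainder = budget
    while True:
        # Bands that have not hit the cap and still fit into the remainder.
        candidates = [b for b in range(len(norms))
                      if norms[b] != float("-inf") and band_widths[b] <= remainder]
        if not candidates:
            break
        best = max(candidates, key=lambda b: norms[b])
        bits_per_coef[best] += 1          # one more bit per coefficient
        norms[best] -= 2                  # 6 dB, i.e. two 3 dB steps
        remainder -= band_widths[best]
        if bits_per_coef[best] >= max_bits_per_coef:
            norms[best] = float("-inf")   # exclude from further iterations
    # Convert bits per coefficient to total bits per band.
    bits_per_band = [r * w for r, w in zip(bits_per_coef, band_widths)]
    return bits_per_band, remainder
```

Because affordability is checked per band, narrower bands can still receive bits after a wider band no longer fits, which mirrors the handling of the remaining bits described above.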
5.3.4.2.1.3a Fine structure quantization
The normalized MDCT spectrum is encoded in bands using the bit allocation .
5.3.4.2.1.3a.1 Fine gain bit allocation
The spectral envelope coefficients have been quantized in 3 dB steps. For higher rates, this resolution becomes too coarse and additional fine gain adjustments are made. The bits assigned to the band are split into a number of bits for the PVQ shape quantizer and bits for the fine gain quantizer . The assignment is done using a pre-trained look-up table based on the assigned band bits and the bandwidth according to
(1196)
where is a lookup-table for fine gain bits (see table 135) and is the number of bits per sample rounded down. The bits for the PVQ shape quantization are obtained by subtracting the fine gain bits, .
Table 135: Fine gain bits table
| Bits per sample (rounded down) | Fine gain bits |
|---|---|
| 0 | 0 |
| 1 | 0 |
| 2 | 0 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
| 6 | 4 |
| 7 | 5 |
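The split between PVQ shape bits and fine gain bits can be illustrated with the informative sketch below, which uses the values of table 135. Clipping the lookup index to the last table entry for more than 7 bits per sample is an assumption, and the function name `split_band_bits` is hypothetical.

```python
# Fine gain bits of table 135, indexed by bits per sample rounded down.
FINE_GAIN_BITS = [0, 0, 0, 1, 2, 2, 4, 5]


def split_band_bits(band_bits, band_width):
    """Return (pvq_bits, fine_gain_bits) for one band."""
    bits_per_sample = band_bits // band_width          # rounded down
    # Assumption: indices beyond the table use the last entry.
    lookup_index = min(bits_per_sample, len(FINE_GAIN_BITS) - 1)
    fine_gain_bits = FINE_GAIN_BITS[lookup_index]
    return band_bits - fine_gain_bits, fine_gain_bits
```

For example, a band of 8 coefficients with 32 bits has 4 bits per sample, so 2 bits go to the fine gain and 30 bits remain for the PVQ shape.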
5.3.4.2.1.3a.2 Fine structure quantization using PVQ
The quantization of the PVQ shape vector is performed as described in subclause 5.3.4.2.7. A decoded version of the PVQ shape vector is also obtained.
5.3.4.2.1.3a.3 Fine gain prediction, quantization and application
The fine gain adjustment is based on a predicted value, . It is formed using an accuracy measure which combines the band bitrate , band size and the largest absolute integer pulse value of the synthesized band.
(1197)
where is the actual number of pulses used for encoding and depends on the bitrate and the PVQ encoding process. The accuracy measure is then translated into a gain prediction following this relation:
(1198)
where approximates the optimal MMSE gain. In case there are bits allocated, further refinement of the fine gain may be done by encoding the gain prediction error , defined as
(1199)
where the optimal MMSE gain is defined as
(1200)
and the normalization factor to RMS=1.0 is
(1201)
The gain prediction error is encoded with a non-uniform scalar quantizer in log domain using bits.
5.3.4.2.1.4 Noise level adjustment
The spectral coefficients which belong to bands which are assigned zero bits from the bit‑allocation procedure are not encoded. This means that not all transform coefficients are transmitted to the decoder.
The level of these non-coded spectral coefficients is estimated and quantized in the encoder. This is not done in WB at rates 24.4 kb/s and 32 kb/s.
The estimation of the non-quantized signal level is done directly in the normalized spectrum domain. Prior to estimating the noise level, a transition frequency between the noise fill region and high frequency noise fill region is estimated. This transition frequency is identically estimated in the encoder and decoder and marks the start of high frequency noise fill and the end of noise fill.
The transition frequency is estimated according to the criterion of the last quantized band. The general method consists in looping through all the bands, starting at down to 0. If there are no quantized coefficients in the current band, it is flagged as filled by high frequency noise fill. If there are quantized coefficients in the band, the holes of this band as well as of the following bands are filled using noise fill. Figure 80 illustrates this procedure, where the transition frequency separates noise fill from high frequency noise fill.
Figure 80: Estimation of transition frequency
Figure 81: Noise level estimation done below the transition frequency
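The search for the transition between noise fill and high frequency noise fill can be sketched as follows. In this informative example a band is considered to contain quantized coefficients when it has received a non-zero number of bits, which is a simplifying assumption; the function name `find_transition_band` is hypothetical.

```python
def find_transition_band(bits_per_band):
    """Return the index of the first band handled by high frequency noise fill.

    Bands are scanned from the highest index down to 0. Bands above the last
    band with quantized coefficients belong to the high frequency noise fill
    region; that band and all lower bands use regular noise fill.
    Assumption: a band has quantized coefficients if it was allocated bits.
    """
    for band in range(len(bits_per_band) - 1, -1, -1):
        if bits_per_band[band] > 0:
            return band + 1
    return 0  # no band was coded at all
```

Since the result depends only on the bit allocation, the same value can be derived in the encoder and in the decoder without transmitting it.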
The noise level is in turn estimated by measuring the average level of the normalized non-coded signal below the transition frequency, see figure 81. Formally this is obtained by:
(1202)
where is the set of indices of coefficients allocated zero bits below the transition frequency ft.
The above equation always returns a value which is below zero because of the concavity of the logarithm. The noise level is quantized using a two-bit scalar quantizer according to table 136.
Table 136: Codebook entries for the uniform noise level quantizer
| Index | Output quantized NoiseLevel (dB) |
|---|---|
| 0 | 0 |
| 1 | –6 |
| 2 | –12 |
| 3 | –18 |
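The two-bit quantization of the noise level can be sketched with the codebook of table 136. Selecting the nearest codebook entry is an assumption made for this informative example, and the function name `quantize_noise_level` is hypothetical.

```python
# Codebook of table 136 in dB, indexed by the transmitted 2-bit index.
NOISE_LEVELS_DB = [0, -6, -12, -18]


def quantize_noise_level(level_db):
    """Return (index, quantized level in dB) of the nearest codebook entry."""
    index = min(range(len(NOISE_LEVELS_DB)),
                key=lambda i: abs(NOISE_LEVELS_DB[i] - level_db))
    return index, NOISE_LEVELS_DB[index]
```

Estimated levels below –18 dB map to the last entry, so the decoder never inserts noise weaker than –18 dB relative to the normalized spectrum.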
5.3.4.2.2 Transient Mode
5.3.4.2.2.1 Envelope calculation and quantization
The bands are sorted according to equation (1203), where is defined in tables 137, 138, and 139. Then the envelope calculation is done as in subclause 5.3.4.2.1.1 but using instead of . After the quantization, the sorting is inverted by applying the inverse of equation (1203).
(1203)
Table 137: Envelope sorting table in transient mode for WB
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 0 | 1 | 8 | 9 | 16 | 20 | 24 | 21 | 17 | 11 | 10 |
| Index | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| Value | 3 | 2 | 4 | 5 | 12 | 13 | 18 | 22 | 25 | 23 | 19 |
| Index | 22 | 23 | 24 | 25 | | | | | | | |
| Value | 15 | 14 | 7 | 6 | | | | | | | |
Table 138: Envelope sorting table in transient mode for SWB
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 0 | 1 | 8 | 9 | 16 | 20 | 24 | 28 | 32 | 36 | 37 |
| Index | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| Value | 33 | 29 | 25 | 21 | 17 | 11 | 10 | 3 | 2 | 4 | 5 |
| Index | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 |
| Value | 12 | 13 | 18 | 22 | 26 | 30 | 34 | 38 | 35 | 31 | 27 |
| Index | 33 | 34 | 35 | 36 | 37 | 38 | | | | | |
| Value | 23 | 19 | 15 | 14 | 7 | 6 | | | | | |
Table 139: Envelope sorting table in transient mode for FB
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Value | 0 | 1 | 8 | 9 | 16 | 20 | 24 | 28 | 32 | 36 | 40 |
| Index | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| Value | 41 | 37 | 33 | 29 | 25 | 21 | 17 | 11 | 10 | 3 | 2 |
| Index | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 |
| Value | 4 | 5 | 12 | 13 | 18 | 22 | 26 | 30 | 34 | 38 | 42 |
| Index | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 |
| Value | 43 | 39 | 35 | 31 | 27 | 23 | 19 | 15 | 14 | 7 | 6 |
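The sorting of the envelope and its inversion after quantization can be illustrated with the WB values of table 137. In this informative sketch the sorted value at position i is taken from the band whose index is the table entry at position i; the direction of the mapping is an assumption, since the body of equation (1203) is not reproduced here. The function names are hypothetical.

```python
# Values of table 137 (WB), listed by index 0..25.
WB_SORT = [0, 1, 8, 9, 16, 20, 24, 21, 17, 11, 10,
           3, 2, 4, 5, 12, 13, 18, 22, 25, 23, 19,
           15, 14, 7, 6]


def sort_bands(values, table):
    """Reorder values so that position i holds values[table[i]] (assumed direction)."""
    return [values[table[i]] for i in range(len(table))]


def unsort_bands(sorted_values, table):
    """Invert sort_bands: put each value back at its original band index."""
    out = [None] * len(table)
    for i, src in enumerate(table):
        out[src] = sorted_values[i]
    return out
```

Because the table is a permutation of the band indices, applying `unsort_bands` after `sort_bands` restores the original order regardless of which direction the normative equation uses.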
5.3.4.2.2.2 Bit allocation
For most bit rates, bit allocation is done as described in subclauses 5.3.4.2.1.2.1 and 5.3.4.2.1.2.2. For SWB at 24.4 and 32 kbps the Generic mode bit allocation is used, as described in subclause 5.3.4.2.5.6.
5.3.4.2.2.3 Fine structure quantization using PVQ
The spectral coefficient quantization is done as is described in subclause 5.3.4.2.1.4.
5.3.4.2.3 Generic, Harmonic and HVQ mode detector
Non-transient signals are further classified as Generic mode, Harmonic mode or HVQ at 24.4 kb/s and 32 kb/s for SWB inputs. The detailed classification is described in subclause 5.3.4.1.1, with slight differences in the final decision logic as follows.
At 24.4 kb/s the current frame mode is classified as harmonic if , and .
At 32 kb/s the current frame mode is classified as harmonic if , and ; or if .
5.3.4.2.3.1 HVQ classifier
Instantaneous noise-level and peak-level are estimated from the absolute values of transform coefficients, where at 24.4 kb/s and at 32 kb/s. The noise-level is calculated as:
(1204)
where
(1205)
The peak-level is calculated as
(1206)
where
(1207)
and both and are initialized to 800.
The per-band averages of noise-level and peak-level are calculated by averaging instantaneous level in a band (every 32 bins). The number of bands is at 24.4 kb/s and at 32 kb/s.
The average of the elements of the first half of gives the noise-floor gain, while the second half of coefficients forms. In a similar way the per-band peak-levels produce two peak energy gains and.
The decision to switch to HVQ mode is based on the threshold in table 140, and three variables, , , and , defined below.
The threshold for selecting peak candidates is calculated as:
(1208)
Absolute values of transform coefficients are compared to the threshold, and the ones with amplitude above it, form a vector of peak candidates. Elements from the peaks candidate vector are extracted in decreasing order, and when a peak is extracted the threshold over the neighboring peaks is adjusted with {0.7071068, 0.5000000, 0.2500000, 0.5000000, 0.7071068}. This procedure produces a set of spectral peaks, with number.
A measure of frequency sharpness per-band , similar to the one in 5.3.4.1.1, is defined as peak to noise-floor ratio in each band:
(1209)
Variable is calculated as the number of bands for each
Variable is calculated as
(1210)
Table 140: Thresholds for HVQ mode decision
| rate | | | |
|---|---|---|---|
| 24.4 kb/s | 4 | 20 | 22 |
| 32 kb/s | 7 | 23 | 22 |
The current frame is encoded in HVQ mode if:, , and
5.3.4.2.4 Harmonic Mode
The harmonic mode is used to code harmonic-like signals. For harmonic signals, usually less noise filling is performed. It is also important to mitigate any discontinuity in the higher sub-bands due to changes in core coding. For the most part it is preferable at 24.4 kb/s and 32 kb/s, when there are insufficient bits to encode the full signal, to stop short of coding right up to the Nyquist frequency.
5.3.4.2.4.1 Envelope calculation and quantization
Envelope calculation and quantization is performed as described in subclause 5.3.4.2.1.1.
In order to reconstruct harmonic characteristics of the higher frequency band signal, the widths of the bands are larger than the ones for non-harmonic mode.
5.3.4.2.4.2 Bit allocation
The band-energy limitation factor is introduced before bit allocation to mitigate discontinuous core coding in the higher sub-band.
Initialize the band-energy limitation factor
(1211)
Calculate the energy of the first 10 sub-bands and the energy of the first 28 sub-bands , and add the norms to when . When , is the index of the highest encoded sub-band.
Reorder the quantized norms with the index range to obtain the reordered norms, and adjust the quantized norms as follows:
(1212)
Then, the bit allocation is performed based on the adjusted norms to the sub-bands with the index range as described in subclause 5.3.4.2.1.2.2.
5.3.4.2.4.3 PVQ
The spectral coefficient quantization and coding is done according to the number of bits allocated to each sub-band as is described in subclause 5.3.4.2.1.3.
5.3.4.2.5 HVQ
The HVQ mode is used only for SWB signals at 24.4 kb/s and 32 kb/s. At 24.4 kb/s it initially codes the first 224 MDCT coefficients (this corresponds to frequency range up to 5.6 kHz), while at 32 kb/s it initially codes the first 320 coefficients (this corresponds to frequency range up to 8 kHz). The major algorithmic steps at the encoder are: detect and code spectral peaks regions, code low-frequency spectral coefficients (the size of coded region depends on the remaining bits after peaks coding), code noise-floor gains for spectral coefficients outside the peaks regions, code high-frequency spectrum envelope to be used with the high-frequency noise-fill.
The input to the HVQ mode is the set of MDCT coefficients, the noise-floor gains, and the spectral peaks (both noise-floor gains and spectral peaks are calculated in the classification module, described in subclause 5.3.4.2.3.1). Each peak is normalized to unit energy and the surrounding 4 neighbours are normalized with the peak gain. The peak positions, gains and signs are quantized. A VQ is applied to the four MDCT bins surrounding each peak. The number of coded peaks, the peak positions, gains and signs, as well as the surrounding shape vectors are quantized and the quantization indices are transmitted to the decoder.
First, the peaks, initially sorted by energy, are rearranged by position. Then the peak amplitudes are differentially coded with a 5-bit SQ in the log domain, and the indices are Huffman coded to form the quantized peak gains. Prior to quantization the peak gains are multiplied by 0.25, and after quantization they are multiplied by 4.
The spectral peak positions are coded by choosing between two alternative lossless coding schemes, where the coding scheme that requires the least number of bits is selected, and explicitly indicated to the decoder.
The first lossless spectral peak position coding scheme is delta Huffman coding, suitable for periodic or semi-periodic spectral peak position distributions. The second lossless spectral peak position coding scheme is or-ing, suitable for sparse spectral peak position distributions.
The first scheme consists of the following steps: deltas (differences) between consecutive elements (positions) are created. Then these deltas are Huffman coded. Since the peaks are selected in a way that they cannot be positioned closer than 3 positions apart, is subtracted from the peak differences.
(1213)
This eliminates the need of keeping codewords in the Huffman table, corresponding to unused deltas.
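As an informative illustration of the delta step of the first scheme, the following Python sketch forms the differences between consecutive peak positions and removes a constant offset. The offset value of 3 is an assumption derived from the stated minimum peak spacing (the symbol in the text above did not survive), the function names are hypothetical, and the Huffman coding of the resulting deltas is not shown.

```python
# Assumption: the subtracted constant equals the minimum peak spacing (3).
MIN_PEAK_DISTANCE = 3


def peak_position_deltas(positions, min_distance=MIN_PEAK_DISTANCE):
    """Return offset-reduced differences between consecutive peak positions.

    `positions` must be sorted in increasing order.
    """
    deltas = []
    for prev, cur in zip(positions, positions[1:]):
        gap = cur - prev
        if gap < min_distance:
            raise ValueError("peaks are closer than the minimum distance")
        # Removing the offset means no codeword is needed for unused deltas.
        deltas.append(gap - min_distance)
    return deltas


def positions_from_deltas(first, deltas, min_distance=MIN_PEAK_DISTANCE):
    """Inverse operation: rebuild positions from the first position and deltas."""
    positions = [first]
    for d in deltas:
        positions.append(positions[-1] + d + min_distance)
    return positions
```

For example, the positions 4, 7, 15 and 30 give the deltas 0, 5 and 12 under this assumption, and the inverse function restores the original positions.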
The second scheme consists of the following steps: the bit vector representing the spectral peak positions (absence of a peak is indicated with 0) is divided into consecutive equal-size bit groups of 5 elements, and the bits in each group are OR-ed, forming a group bit vector (second layer) which is 5 times shorter. Each bit in this second layer indicates the presence or absence of a peak in the corresponding 5-dim group of the layer below. In this way, only those 5-dim groups from the first layer which are not indicated as all-zero by the second layer have to be transmitted. The second (control) layer is always transmitted. The non-zero bit groups from the first layer are also mapped to exploit the constraint that peaks cannot be closer than 3 positions (not all possible 5-dim vectors are allowed).
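The two-layer structure of the second scheme can be sketched in Python as follows. This is only an illustration: the zero padding of the last group and the function names are assumptions, and the additional mapping of the non-zero groups that exploits the minimum peak distance is not shown.

```python
GROUP_SIZE = 5  # group length stated in the text


def sparse_layers(peak_flags, group_size=GROUP_SIZE):
    """Split a 0/1 peak indicator vector into a control layer and non-zero groups."""
    # Assumption: the vector is zero padded to a multiple of the group size.
    padded = list(peak_flags) + [0] * (-len(peak_flags) % group_size)
    groups = [padded[i:i + group_size] for i in range(0, len(padded), group_size)]
    # Second (control) layer: OR of the bits in each group.
    control = [1 if any(g) else 0 for g in groups]
    # Only groups flagged as non-zero need to be transmitted.
    nonzero_groups = [g for g in groups if any(g)]
    return control, nonzero_groups


def rebuild_flags(control, nonzero_groups, group_size=GROUP_SIZE):
    """Decoder side: expand the control layer back into the indicator vector."""
    remaining = iter(nonzero_groups)
    flags = []
    for bit in control:
        flags.extend(next(remaining) if bit else [0] * group_size)
    return flags
```

With 15 positions and peaks at positions 2, 10 and 14, the control layer is 1, 0, 1 and only two of the three 5-dim groups are transmitted.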
The selected coding scheme is explicitly signaled to the decoder with one bit: indicates sparse coding scheme, while indicates usage of delta coding scheme. Here is the largest distance between two consecutive peaks, which is compared to the largest difference possible to code with the pre-stored Huffman tables, is the total number of bits consumed by the delta coding scheme, and is the total number of bits consumed by the sparse coding scheme
(1214)
Sign of the spectral peaks is coded separately with 1 bit per-peak, with 0 indicating negative sign and 1 indicating positive sign.
The peak regions to be coded are 5-dim MDCT vectors; a bin corresponding to the spectral peak and 2 MDCT bins on each side of the peak. The peak amplitude of the central bin is used to normalize the entire peak region. In this way the central bin is scaled to unit energy, while surrounding 4 bins are normalized relative to the central one. The shape vector of the peak region, centered at bin is defined as:
(1215)
Each shape vector is quantized with 9 bits: 8 bits for the VQ index and 1 bit for classification. The number of peak regions varies over frames; this means that a different number of bits will be required for coding the shape vectors, which will result in a variable number of bits used in the PVQ coding of low-frequency MDCT bands (except for the first low-frequency band, which has reserved bits).
Variations in the number of peaks per frame result in a different number of VQs, which leads to large variation in complexity. To keep the complexity nearly constant while achieving low quantization error, the following approach is used: the search for each shape vector is performed in a structured CB, with dynamically selected offset and size of the search region. The starting point for the search is determined by an initial classification of the input shape vector, while the length of the search region depends on the number of received shape vectors.
The codewords in the CB used for quantization of the shape vectors are ordered based on their distance to two pre-determined classes, with centroids and. The CB is structured in a way that the codewords closest to and most distanced to are in one side of the CB, while codewords closest to and most distanced to are clustered in the other side of the CB. The distance between the input vectors to each of the classes determines the starting point for the search. Since the codevectors in the codebook are sorted according to a distortion measure reflecting the distance between each codevector and the centroids, the search procedure goes first over the set of vectors that is likely to contain the best match.
The search space is dynamically adjusted to the number of input vectors. The maximum search space is used with 8 peaks or less at 24.4 kb/s, and 12 peaks or less at 32 kb/s. When larger numbers of peaks are to be quantized in the current frame, the search space is reduced to limit the peak complexity.
Table 141: Adaptive search space in HVQ at 24.4 kb/s
| Number of peaks | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Search space | 128 | 136 | 145 | 155 | 167 | 181 | 197 | 217 | 241 | 256 | 256 |
Table 142: Adaptive search space in HVQ at 32 kb/s
| Number of peaks | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Search space | 128 | 134 | 141 | 149 | 158 | 168 | 179 | 192 | 206 | 224 | 244 | 256 | 256 |
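The adaptive search space of tables 141 and 142 can be expressed as a simple lookup, sketched below in Python for illustration. The function and dictionary names are hypothetical, bitrates are given in bit/s, and it is assumed that every peak count at or below the stated limit (8 at 24.4 kb/s, 12 at 32 kb/s) uses the full search space of 256 codevectors.

```python
FULL_SEARCH_SPACE = 256

# Values copied from tables 141 and 142, keyed by number of peaks.
SEARCH_SPACE = {
    24400: {17: 128, 16: 136, 15: 145, 14: 155, 13: 167,
            12: 181, 11: 197, 10: 217, 9: 241},
    32000: {23: 128, 22: 134, 21: 141, 20: 149, 19: 158, 18: 168,
            17: 179, 16: 192, 15: 206, 14: 224, 13: 244},
}

# Largest peak count that still uses the full search space.
FULL_SEARCH_LIMIT = {24400: 8, 32000: 12}


def search_space(num_peaks, bitrate):
    """Return the number of codevectors searched per shape vector."""
    if bitrate not in SEARCH_SPACE:
        raise ValueError("HVQ is only used at 24.4 kb/s and 32 kb/s")
    if num_peaks < 1:
        raise ValueError("at least one peak is required")
    if num_peaks <= FULL_SEARCH_LIMIT[bitrate]:
        return FULL_SEARCH_SPACE
    try:
        return SEARCH_SPACE[bitrate][num_peaks]
    except KeyError:
        raise ValueError("peak count not covered by the tables") from None
```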
The target shape vector exhibits certain symmetries (the MDCT coefficients on both sides of the spectral peak have similar statistics) that can be used to optimize the centroids for the class selection and the CB. A “flipped” version of the centroids used in the initial classification and a “flipped” version of the CB (not pre-stored, but realized through a modified search) can capture this symmetry in the shape vectors.
First the input shape vector is compared to 4 centroids (each centroid representing a respective class of codevectors in a codebook), which determines a starting point to the search. If or are selected, the same logic is used, but the search is performed in a “flipped” CB.
Table 143: Centroids for the class selection in HVQ search
| Class | | | | |
|---|---|---|---|---|
| | -0.2324457 | -0.4390556 | 0.0651793 | 0.2109977 |
| | 0.1471332 | -0.1351437 | 0.4312476 | -0.1384814 |
| | 0.2109977 | 0.0651793 | -0.4390556 | -0.2324457 |
| | -0.1384814 | 0.4312476 | -0.1351437 | 0.1471332 |
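The third and fourth rows of table 143 are the first and second rows in reversed order, which reflects the “flipped” centroids mentioned above. The following Python sketch illustrates a possible class selection. It assumes a squared Euclidean distance and assumes that the two reversed centroids are the ones that trigger the search in the flipped CB; the class numbering and function name are hypothetical.

```python
# First two centroids of table 143 (4 neighbouring bins around the peak).
BASE_CENTROIDS = [
    (-0.2324457, -0.4390556, 0.0651793, 0.2109977),
    (0.1471332, -0.1351437, 0.4312476, -0.1384814),
]

# The remaining two centroids are the reversed ("flipped") versions.
CENTROIDS = BASE_CENTROIDS + [tuple(reversed(c)) for c in BASE_CENTROIDS]


def classify_shape(shape):
    """Return (base class index, flipped flag) for a 4-dim shape vector."""
    # Assumption: nearest centroid under squared Euclidean distance.
    dists = [sum((s - c) ** 2 for s, c in zip(shape, cen)) for cen in CENTROIDS]
    best = min(range(len(CENTROIDS)), key=dists.__getitem__)
    return best % 2, best >= 2
```

The returned base class would select the starting point of the search, and the flag would indicate whether the search runs over the flipped CB.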
In case of close peaks with overlapping shape vectors, a weighted minimum-mean-squared error is used in the VQ search. Zero weights are assigned to the overlapping shape elements that belong to the peak with lowest amplitude. In this way the CB entries are matched against the meaningful coefficients only.
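A weighted search of this kind can be sketched in Python as shown below. The exhaustive loop, the function name and the toy codebook in the usage note are illustrative assumptions; the structured CB, the adaptive search range and the flipped search are not modelled.

```python
def weighted_vq_search(target, codebook, weights):
    """Return the index of the codevector with minimum weighted squared error.

    A zero weight removes the corresponding element from the error measure,
    as done for overlapping elements owned by the lower-amplitude peak.
    """
    best_index, best_err = -1, float("inf")
    for index, codevector in enumerate(codebook):
        err = sum(w * (t - c) ** 2
                  for t, c, w in zip(target, codevector, weights))
        if err < best_err:
            best_index, best_err = index, err
    return best_index
```

With the target (1, 0, 0, 1) and the two codevectors (1, 0, 0, -1) and (0.6, 0, 0, 1), unit weights select the second codevector, whereas a zero weight on the last element selects the first one.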
After the peak regions are extracted and quantized, all remaining bits that are not reserved for signalling are used to quantize the low frequency MDCT coefficients. This is done by grouping the remaining un-quantized MDCT coefficients into 24-dimensional bands (not including already coded peak regions). A bit budget for coding the first band always exists, but the total number of coded bands depends on the bits left after peak coding. The number of bands to be coded with remaining bits is determined by dividing the number of available bits by the maximum number allowed per-band, which is set to at 24.4 kb/s and at 32 kb/s.
(1216)
Then the selected bands are gain-shape quantized. The gains are scalar quantized in the log domain with 5 bits, and the shapes are PVQ quantized as described in subclause 5.3.4.2.7.
A coded band is introduced above 5.6 kHz for 24.4 kb/s and 8 kHz for 32 kb/s if energy in the high band is relatively high compared to the peak coded region in the lower band, the band has high energy compared to the neighbouring high-frequency bands, and there is sufficient number of bits for encoding band of that size.
To encode the high-frequency band with band log energy and bandwidth , the following conditions have to be simultaneously met:
(1217)
where an estimate of the low band energy is obtained through summation over all peak amplitudes, coded at low-frequencies:
(1218)
and denotes the number of bits required to encode one pulse in a band of width . The average band log energy at high-frequencies is denoted by and the amount available bits by .
void (1219)
The two noise floor gains used in the low-frequency noise-fill and are scalar quantized with 5 bits in the log domain. The high-frequency gains and are transmitted to the decoder for the high-frequency noise-fill. First, two variables, the noise-floor and peak-energy gains and , are calculated in a similar way as described in subclause 5.3.4.2.3.1, but over the high-frequency MDCT coefficients (coefficients above 224 at 24.4 kb/s and above 320 at 32 kb/s). That is, the summation is done over for 24.4 kb/s and for 32 kb/s. Then the two gains are calculated as:
(1220)
These gains are quantized with a 2-bit uniform SQ to form the quantized high-frequency gains, which are transmitted and used in the high-frequency noise-fill.
5.3.4.2.6 Generic Mode
The first stage in this mode is the selection of one of three types of high frequency excitation class, which is followed by the separate encoding of the low and high frequency envelopes. The low frequency envelope is quantized and coded in the same manner as the normal high rate mode of the HQ coder. The high frequency envelope is first quantised in the generic mode domain before being mapped onto the HQ normal mode domain, re-quantized and then combined with the low frequency envelope. An initial bit allocation is determined, and then deltas are calculated and coded. The coded values are used to update the combined envelope, before the final bit allocation is determined.
Figure 82: Generic mode Encoder Block Diagram
5.3.4.2.6.1 Band allocation for the Generic Mode
The band allocation for the low frequency envelope at 24.4 and 32kbps is the same as default mode band allocation defined in table 127. The band allocation for the high frequency envelope at 24.4 and 32kbps is shown in table 144.
Table 144: Band Allocation for the high frequency envelope
| Band index | Start (24.4kbps) | End (24.4kbps) | Width (24.4kbps) | Start (32kbps) | End (32kbps) | Width (32kbps) |
|---|---|---|---|---|---|---|
| 0 | 320 | 335 | 16 | 384 | 399 | 16 |
| 1 | 336 | 359 | 24 | 400 | 423 | 24 |
| 2 | 360 | 375 | 16 | 424 | 439 | 16 |
| 3 | 376 | 399 | 24 | 440 | 463 | 24 |
| 4 | 400 | 415 | 16 | 464 | 479 | 16 |
| 5 | 416 | 439 | 24 | 480 | 503 | 24 |
| 6 | 440 | 455 | 16 | 504 | 519 | 16 |
| 7 | 456 | 479 | 24 | 520 | 543 | 24 |
| 8 | 480 | 503 | 24 | 544 | 567 | 24 |
| 9 | 504 | 527 | 24 | 568 | 591 | 24 |
| 10 | 528 | 551 | 24 | 592 | 615 | 24 |
| 11 | 552 | 575 | 24 | 616 | 639 | 24 |
| 12 | 576 | 607 | 32 | 640 (FB) | 679 (FB) | 40 (FB) |
| 13 | 608 | 639 | 32 | 680 (FB) | 719 (FB) | 40 (FB) |
| 14 | 640 (FB) | 679 (FB) | 40 (FB) | 720 (FB) | 799 (FB) | 80 (FB) |
| 15 | 680 (FB) | 719 (FB) | 40 (FB) | – | – | – |
| 16 | 720 (FB) | 799 (FB) | 80 (FB) | – | – | – |

The columns apply to both SWB and FB at the indicated bitrate; entries marked (FB) are used for FB only.
5.3.4.2.6.2 High frequency Excitation Class
There are three different high frequency excitation classes, one each for speech, tonal music and non-tonal music. The HF_Speech_excitation_class is determined by the instantaneous result of the speech/music classifier, i.e. by applying the output of the first speech/music classifier without adding any hang-over in subclause 5.1.13.6.3.
(1221)
If the output of the classifier indicates music then a tonality measurement is calculated.
(1222)
where is the number of bands for calculating tonality, 10 at 24.4kbps and 8 at 32kbps.
This tonality value is then thresholded to further subdivide the excitation into HF_excitation_class0 for noisy signals or HF_excitation_class1 for tonal signals. The bit allocations are shown in table 145.
Table 145: Bit Allocation for the HF Excitation Classes
| High Frequency Excitation Classes | Code | Num of bits |
|---|---|---|
| HF_excitation_class0 | 00 | 2 |
| HF_Speech_excitation_class | 1 | 1 |
| HF_excitation_class1 | 01 | 2 |
5.3.4.2.6.3 Low Frequency Envelope Quantization and Coding
The low frequency envelope is quantized and coded in the same manner as described for the normal high rate mode of the HQ coder in subclause 5.3.4.2.1.1.
For both SWB and FB the transmitted low frequency envelope is approximately 8 kHz at 24.4 kbps and increases to approximately 9.6 kHz at 32kbps. The number of bands transmitted by the low frequency envelope quantization and coding is defined as 27 at 24.4kbps and 30 at 32kbps.
5.3.4.2.6.4 High Frequency Envelope Quantization
The high frequency envelope is quantized in a similar manner as described in subclause 5.2.6.2.1.5, but with different starting frequencies. For SWB at 24.4kbps the frequency range is 8 to 16 kHz with 14 dimensions, while at 32kbps the range is 9.6 to 16 kHz with 12 dimensions. For FB at 24.4kbps the frequency range is 8 to 20 kHz, while at 32kbps the range is 9.6 to 20 kHz with 3 additional bands.
First the high frequency envelope is calculated, and then the energy control tool is applied as described in subclause 5.2.6.2.1.5 at SWB bands (14 bands at 24.4kbps, 12 bands at 32kbps) for both SWB and FB. When performing energy control, the variable from equation (715) in subclause 5.2.6.2.1.5, is set to 0.55. However, the copied spectrum used to generate the simulated spectrum is generated by using the frequency mapping as defined in the following table, rather than the mapping used in subclause 5.2.6.2.1.5 and defined in table 60.
Table 146: Frequency mapping to generate base excitation spectrum
| BW, Bit-rate | l | | | | |
|---|---|---|---|---|---|
| SWB, FB @ 24.4kbps | 0 | 2 | 239 | 320 | 447 |
| | 1 | 2 | 239 | 448 | 575 |
| | 2 | 80 | 143 | 576 | 639 |
| SWB, FB @ 32kbps | 0 | 2 | 239 | 384 | 511 |
| | 1 | 2 | 239 | 512 | 639 |
VQ is then applied as described in subclause 5.2.6.2.1.5. At 24.4kbps the VQ is identical to that used in Non-TRANSIENT mode, but at 32kbps the VQ is modified as shown in figure 83:
Figure 83: VQ for HQ generic mode at 32kbps
In the first stage, three candidate indices are chosen using the weighted mean squared error minimization criterion. The 6 values in even positions, as well as the last (11th) position are selected and quantized using a 7 dimensional VQ with 5 bits.
(1223)
The quantization error is calculated:
(1224)
where is the de-quantized value.
Then the errors are split into and and quantized:
(1225)
The two quantized and de-quantized values and are then combined:
(1226)
At odd positions (excluding position 11), an interpolation using boundary values is applied for intra-frame prediction and the predicted error is calculated and quantized:
(1227)
is then split into and and each is then quantised to 5 bits with a 3 dimensional VQ.
For FB, the energies for three additional bands are calculated. These are then quantised using 5 bits, after subtracting the mean vector (shown in table 147).
Table 147: Mean vector in FB
| j | FB |
|---|---|
| 0 | 13.75 |
| 1 | 6.29 |
| 2 | 3.70 |
The final selected set of indices for SWB, or for FB are then transmitted.
5.3.4.2.6.5 High Frequency Envelope Refinement and Fine structure quantization
After de-quantisation the value in each band of the de-quantized high frequency envelope is mapped to one of the HQ high rate normal mode bands in order to match the frequencies. The following energy displacement factors are used:
The mapping for SWB at 24.4kbps is:
(1228)
The mapping for SWB at 32kbps is:
(1229)
The mapping for FB is:
(1230)
The resulting mapped envelope is then re-quantized using the same method as is used for the HQ high rate normal mode, as described in subclause 5.3.4.2.1.1.
The quantized high and low frequency envelopes are then combined, and an initial fractional bit allocation is carried out. The initial bit allocation for the SWB generic mode is described in 5.3.4.2.6.6. However, in the first step, equation (1235) is replaced with the following equation:
(1231)
The initial bit allocation used for FB generic mode is the same as the bit allocation used for the Normal mode as is described in 5.3.4.2.1.3.
The information obtained from the bit allocation indicates whether or not the envelope refinement is required for the current frame. If there are any high bands which have allocated bits, delta coding needs to be done to refine the high frequency envelope. In other words, if there are any important spectral components in the higher bands, the refinement is performed to provide a finer spectral envelope. If there are no bits allocated to the higher bands during the initial bit allocation, the envelope refinement is not required and the initial bit allocation is used. In this case the envelope requires no further modification so the following steps are not required and the fine structure quantization is applied to spectrum to quantize the coefficients as is described in subclause 5.3.4.2.1.3.
The quantized and de-quantized norms are calculated with the scalar quantizer as shown in subclause 5.3.4.2.1.1 by using the original spectrum, which it is defined to. The scalar quantized and de-quantized norms are also calculated using the mapped norms with the vector quantization, which is defined to.
Deltas are calculated at every band with allocated bits:
(1232)
The possible number of bits for representing all the deltas is 2, 3, 4 or 5, and this number is encoded using 2 bits. The possible number of bits is calculated with and is then adjusted to fit within the maximum number of bits (5). Any delta which exceeds the representable range is corrected to 31 or -32, depending on its sign.
(1233)
The corrected is transmitted with .
The value of is then used to compute a final update to the envelope.
(1234)
where is the updated norm.
The fine structure quantization is then applied to the resulting spectrum to quantize the coefficients as is described in subclause 5.3.4.2.1.3.
Finally the initial bit allocation information is updated, based on the number of bits used for representing the deltas. This is done by reducing the bits allocated to some sub bands, to provide enough bits to code the deltas. If during the initial bit allocation a sub band was allocated more than 3 bits, its allocation is reduced by one bit until all the bits required for the deltas have been accounted for. This is shown in more detail in the pseudo code below.
while (Bits_needed_for_deltas > 0)
{
    Current_band = number_of_bands - 1
    while (Bits_needed_for_deltas > 0 and Current_band >= 0)
    {
        if (Bits_allocated_to_current_band > 3)
        {
            Bits_allocated_to_current_band--
            Bits_needed_for_deltas--
        }
        Current_band--
    }
}
This procedure provides the final bit allocation.
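For illustration, the procedure can be written as the following runnable Python sketch. The function name and the list representation of the allocation are hypothetical, and whole bits are used for simplicity. One guard is added that is not part of the pseudo code: if no sub band holds more than 3 bits during a full pass, the function raises an error instead of looping without progress.

```python
def take_bits_for_deltas(band_bits, bits_needed, min_bits=3):
    """Remove `bits_needed` bits from the allocation, one bit per band per pass.

    Bands are visited from the highest index downwards, and only bands that
    currently hold more than `min_bits` bits give up a bit.
    """
    bits = list(band_bits)
    while bits_needed > 0:
        changed = False
        for band in range(len(bits) - 1, -1, -1):
            if bits_needed == 0:
                break
            if bits[band] > min_bits:
                bits[band] -= 1
                bits_needed -= 1
                changed = True
        if not changed:
            # Added guard (assumption): avoid an endless loop when no band
            # can give up a bit any more.
            raise ValueError("not enough bits available for the deltas")
    return bits
```

For example, an initial allocation of 5, 2, 4 and 6 bits with 4 bits needed for the deltas becomes 4, 2, 3 and 4 bits.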
5.3.4.2.6.6 Bit Allocation
In the Generic mode, a fractional bit allocation is used for the spectral quantizer bit allocation. This permits the allocation of bits with a three-bit fractional part. Initially, the bits for each band are estimated by:
(1235)
where is a total bit budget.
The fully allocated bits are calculated as a starting point and the first-stage iterations are done to re-distribute the allocated bits to the bands with non-zero bits until the number of fully allocated bits is equal to the total bit budget.
(1236)
where is the number of spectral lines in all bands with allocated bits after k iterations.
If too few bits are allocated, this can cause a quality degradation due to the reduced SNR. To avoid this problem a minimum bit limitation is applied to the allocated bits. The first minimums for the bits consist of constant values depending on the band index and bit-rate.
(1237)
In the second-stage iterations, the re-distribution of bits is done again to allocate bits to the bands with more than bits. The value bits indicates the number of bits that corresponds to 1 bit/sample in band and is the second minimum required bits for each band. Initially, the allocated bits are calculated based on the result of the first-stage iteration and the first and second minimum required bits for each band.
(1238)
where is the allocated bits after the first-stage iterations and is 2 at 24.4kbps and 3 at 32kbps.
The is updated by subtracting the number of bits in bands with bits , and the band index is updated to which indicates the band indices with higher bits than bits. is updated to which is the number of bands for . The second-stage iterations are then done until the updated () is equal to the number of bits in bands with more than bits.
(1239)
where is the number of spectral lines in all bands with more than bits after k iterations.
During the second-stage iterations, if there are no bands with more than bits, the bits in bands with non-zero allocated bits from the highest bands are set to zero until is equal to zero. Then the iterations are terminated.
Then, a final re-distribution of over-allocated and under-allocated bits is performed and finally, the fractional parts of bit allocation are adjusted to have three bits.
(1240)
5.3.4.2.7 Pyramid Vector Quantization (PVQ) and indexing
The PVQ is a lattice quantizer, with complexity growing linearly with the vector dimension. The PVQ quantizes a dimensional vector by allocating signed pulses to match the shape of that vector. In this way the PVQ codebook is defined as a combination of signed pulses in a dimensional space (with pulses at the same position having the same sign). The maximum size of PVQ codebooks is set to 32 bits, and if the allocated bits are more than 32, the vector is split into sub-vectors, the bits are re-distributed, and a gain parameter is quantized to represent the relative energy between these sub-vectors. The largest allowed target vector dimension is 64.
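The size of a PVQ codebook, i.e. the number of integer vectors of a given dimension whose absolute values sum to a given number of unit pulses, follows a well-known recurrence. The Python sketch below is informative only: the function names are hypothetical, and comparing the size against 2^32 is an assumed reading of the 32 bit limit rather than the exact splitting rule of subclause 5.3.4.2.7.2.1.

```python
def pvq_codebook_size(n, k):
    """Number of integer vectors of dimension n with L1 norm exactly k.

    Uses the recurrence S(n, k) = S(n-1, k) + S(n, k-1) + S(n-1, k-1),
    with S(n, 0) = 1 and S(0, k) = 0 for k > 0.
    """
    row = [1] + [0] * k          # sizes for dimension 0
    for _ in range(n):
        new = [1] + [0] * k      # S(n, 0) = 1 (the all-zero vector)
        for j in range(1, k + 1):
            new[j] = row[j] + new[j - 1] + row[j - 1]
        row = new
    return row[k]


def fits_in_32_bits(n, k):
    """True if every codevector can be indexed with at most 32 bits."""
    return pvq_codebook_size(n, k) <= 2 ** 32
```

For dimension 3 and 2 unit pulses the codebook has 18 vectors: 6 with both pulses at one position and 12 with the pulses at two different positions.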
5.3.4.2.7.2 PVQ split methodology
When the bits assigned for the band are above a pre-determined threshold, as described in subclause 5.3.4.2.7.2.1, an algorithm for band splitting is activated. First the input vector is split into uniform (or close to uniform) segments in a non-recursive way. Then angles, which represent the ratio between energies and of a left and a right level segment, are calculated recursively. At each iteration, the level segments consist of one or several of the pre-determined segments from the initial split of the input vector. The angles are calculated from the top level (full size of the band to be quantized), and continuing towards the levels of shorter sub-vectors, i.e. shorter level segments.
The angle is determined from the energy ratio between one left and one right level segment. In case of an even number of splits, the left and right level segments will consist of an equal number of segments. In case of odd number of splits, the angle calculated during the first iteration of the recursion will be with the right segment having a larger number of segments than the left level segment, and the recursion will require more steps for the right level segment.
For example if the bit budget for the band size require split in 3 segments (here assuming that a split into equal sized segments is possible), the angle calculation and the bit distribution is done in the following way:
Step 0: the angle and the shape bits are, the left level segment is of length, while the right level segment is of length, and bits for the two level segments are and.
Step 1: the right level segment from the previous step is split in a left and a right level segment, both with the length, the angle and the shape bits to be distributed are , the two level segments (here equal to the second and third segment of the initial split of the input vector) are allocated the bits and.
The angles are used to distribute bits recursively to the already determined segments. At each iteration, the number of bits and for a left and a right level segment are derived from the available number of bits for shape coding of these segments , lengths of the level segments and , and the angle :
(1241)
(1242)
where and are compensation factors for differences in segment lengths within the level segments, defined as
, (1242a)
, (1242b)
where the function gives the number of unit pulses that giving the segment lengths and of the segments within the left and the right level segment respectively, can be represented using bits. is the total number of pre-determined segments/splits within the left and right level segments. The function gives the number of bits used to represent the by the function determined number of unit pulses for the minimum dimension among the pre-determined segments.
The angle, which captures the relation between the energies of the left and right segment at certain split level, is defined as:
(1243)
These angles are calculated recursively, starting with the top level, and continuing towards the levels of shorter sub-vectors. The angles are quantized by a range coder with asymmetric triangular PDF.
5.3.4.2.7.2.1 Band splitting analysis
The number of segments (parts) in the split PVQ vector is determined in two steps. First, an initial number of segments is computed as
(1244)
where is the maximum bitrate for each PVQ vector segment. If and the band bit rate is high, an additional split is considered. Using segments, the energy of each PVQ target vector segment is computed.
(1245)
If the maximum absolute deviation from the mean energy ,
(1246)
is larger than the difference between the maximum number of bits for each segment and the average bit rate for each segment, an additional split will be added. That is,
(1247)
Depending on if the conditions for the additional split are met, the split increment flag or is signalled in the bitstream using 1 bit.
5.3.4.2.7.3 PVQ sub-vector shape search and shape normalization
5.3.4.2.7.3.1 PVQ-search introduction
The goal of the search procedure is to find the vector , which is defined as:
(1248)
where is a point on the surface of an -dimensional hyper-pyramid and the L1 norm of is . I.e. is the selected integer shape code vector of size N according to:
(1249)
i.e. is the unit energy normalized integer sub vector scaled with a sub vector splitting gain , (if the band is not split in sub vectors the gain is ).
The best vector is the one minimizing the mean squared shape error between the target vector and the scaled normalised quantized output vector. The target vector can be in the time domain or in the frequency domain. Finding the best is achieved by minimizing the following search distortion:
(1250)
By squaring the numerator and denominator and eliminating the predetermined constant gain scale factor, we may maximize the quotient:
(1251)
The L1-norm structured PVQ-quantizer, allows for several search optimisations, where the primary optimization is to move the target to the all positive “quadrant” in -dimensional space and the second optimization is to use an L1-norm projection as a starting approximation for. A third optimization is to iteratively update the quotient, instead of re-computing equation (1251) over the whole vector space for every candidate change to the vector in pursuit of reaching the L1-norm , which is required for the subsequent PVQ-indexing step .
In the search of the optimal PVQ vector shape with L1-norm , iterative updates of the variables are made in the all positive “quadrant” in -dimensional space according to:
(1252)
(1253)
where signifies the correlation achieved so far by placing the previous unit pulses, and signifies the accumulated energy achieved so far by placing the previous unit pulses, and signifies the amplitude of at position from the previous placement of unit pulses. To further speed up the in-loop iterative processing the energy term is scaled down by 2, thus saving one multiplication in the inner-loop.
(1254)
(1255)
The best position for the ’th unit pulse, is iteratively updated by iterating over .
(1256)
To avoid divisions the maximization update decision is performed using a cross-multiplication of the saved best squared correlation numerator and the saved best energy denominator so far.
(1257)
The iterative maximization of may start from a zero number of placed unit pulses or from an adaptive lower cost pre-placement number of unit pulses, based on an integer projection to a point below the ’th-pyramid’s surface, with a guaranteed undershoot of unit pulses in the target L1 norm .
Due to the structured nature of the PVQ integer sub vector, where all possible sign combinations are allowed and it is possible to encode all sign combinations, as long as the resulting vector adheres to the L1 norm of unit pulses, the search is performed in the all positive first “quadrant”. Further, to achieve as high an accuracy as possible for a limited precision implementation, the maximum absolute value of the input signal is pre-analysed for future use in the setup of the inner search loop precision.
(1258)
(1259)
In case the input vector is an all zero vector or the sub-vector gain is very low, the PVQ-search is bypassed, and a valid PVQ-vector is deterministically created by assigning half of the unit pulses to the first position ( ), and the remaining unit pulses to the last position ( ). (With this approach PVQ-search complexity is reduced and the indexing complexity is spread between encoder indexing and decoder de-indexing.)
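The search principle described in this subclause can be illustrated with the following floating-point Python sketch. It is not the fixed-point procedure of the specification: it starts from zero pre-placed pulses, does not halve the energy term, and uses no 16/32 bit precision selection. The function name is hypothetical, and for the all-zero case the rounding of "half" towards zero is an assumption.

```python
def pvq_search(x, k):
    """Greedy PVQ shape search: place k unit pulses one at a time.

    The search runs on absolute values (all positive "quadrant") and each
    pulse goes to the position maximizing corr^2 / energy, compared by
    cross-multiplication so that no division is needed.
    """
    n = len(x)
    y = [0] * n
    if all(v == 0 for v in x):
        # Bypass for an all-zero target (rounding of "half" is an assumption).
        y[0] += k // 2
        y[-1] += k - k // 2
        return y
    ax = [abs(v) for v in x]
    corr, energy = 0.0, 0.0      # accumulated correlation and energy of y
    for _ in range(k):
        best_pos, best_num, best_den = 0, -1.0, 1.0
        for pos in range(n):
            c = corr + ax[pos]                 # correlation with one more pulse
            e = energy + 2 * y[pos] + 1        # energy with one more pulse
            num = c * c
            # num / e > best_num / best_den, without dividing
            if num * best_den > best_num * e:
                best_pos, best_num, best_den = pos, num, e
        corr += ax[best_pos]
        energy += 2 * y[best_pos] + 1
        y[best_pos] += 1
    # Restore the signs of the target vector.
    return [yi if xi >= 0 else -yi for yi, xi in zip(y, x)]
```

The returned integer vector always has an L1 norm equal to the requested number of unit pulses, which is the property required by the subsequent indexing step.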
5.3.4.2.7.3.4 PVQ pre-search projection
If the pulse density ratio is larger than 0.5 unit pulses per coefficient and is larger than 1, a projection to the sub pyramid is made and used as a starting point for y. On the other hand, if the pulse density is less than 0.5 or is 1, the iterative PVQ-search will start off from zero pre-placed unit pulses. The projection is performed as:
(1260)
(1261)
If no projection is made the starting point is an all zeroed vector. In preparation for the fine search to reach the’th-pyramid’s surface the accumulated number of unit pulses, the accumulated correlation and the accumulated energy for the starting point is computed as:
(1262)
(1263)
(1264)
(1265)
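A pre-search projection of this kind can be sketched in Python as follows. The sketch is illustrative and rests on stated assumptions: the pulse density is taken as the number of unit pulses divided by the vector dimension, the projection targets a pyramid with one pulse less than the final L1 norm, rounding is towards zero so that the result never overshoots, and a density of exactly 0.5 is treated as "no projection". The function name is hypothetical.

```python
import math


def pvq_projection(x, k):
    """Return (y, pulses, corr, energy) as a starting point for the fine search.

    y is a non-negative integer vector with at most k - 1 unit pulses.
    """
    n = len(x)
    ax = [abs(v) for v in x]
    abs_sum = sum(ax)
    y = [0] * n
    # No projection for low pulse density, a single pulse, or a zero target.
    if k > 1 and k / n > 0.5 and abs_sum > 0:
        target = k - 1  # assumption: project to the (k - 1) pyramid
        # Rounding down guarantees an undershoot of the final L1 norm.
        y = [math.floor(a * target / abs_sum) for a in ax]
    pulses = sum(y)                              # unit pulses placed so far
    corr = sum(a * v for a, v in zip(ax, y))     # accumulated correlation
    energy = sum(v * v for v in y)               # accumulated energy
    return y, pulses, corr, energy
```

The fine search then only has to add the remaining unit pulses, which shortens the iterative search for dense vectors.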
5.3.4.2.7.3.5 PVQ fine search
The final integer shape vector must adhere to the L1 norm of pulses. The fine search starts from a lower point in the pyramid and iteratively finds its way to the surface of the N-dimensional ’th-hyperpyramid. The -value in the fine search ranges from 1 pulse to 512 unit pulses. To keep the complexity of the search at a reasonable level, the search is split into two main branches: one branch is used when the in-loop energy representation of will stay within a signed 16 bit word, and another branch is used if the in-loop energy may or may not exceed the dynamic range of a 16 bit word.
When the final is lower than or equal to 127 unit pulses, the dynamics of the 1 bit upshifted will always stay within 15 bits, allowing efficient use of a signed 16 bit word for representing every within all the fine pulse search inner loop iterations up to .
In preparation for the next unit pulse addition, the near optimal maximum possible upshift of the next loop’s accumulated in-loop correlation value in a signed 32 bit word is pre-analysed using the previously calculated maximum absolute input value as:
(1266)
To make the critical equation (1257) update as efficient as possible, the numerator is represented by a 16 bit signed word, by the following approach:
(1267)
(1268)
where the function “Round16” extracts the top 16 bits of a signed 32 bit variable with rounding. This near optimal upshift and the use of a 16 bit representation of the squared correlation bestCorrSq16 enable a very fast inner-loop search for performing the equation (1268) test and variable updates.
The location of the next unit pulse is now determined by iterating over the possible positions, while employing equations (1267), (1253) and (1268).
When the best position of the unit pulse has been determined , the accumulated correlation the accumulated inloop energy and the number of accumulated unit pulses are updated. If there are further unit pulses to add , a new inner-loop is started with a new near optimal analysis (equation (1266)) for the next unit pulse.
When the final is higher than 127 unit pulses, the dynamics of the 1 bit upshifted may exceed 15 bits, so the fine search will adaptively choose between 16 bit representation and 32 bit representation of the pair . The fine search will keep track of the maximum pulse amplitude in achieved so far. This information is used in a pre-analysis step before entering the optimized inner dimension loop, to determine the precision to use for the inner unit pulse addition loop. If the pre-analysis indicates that more than a signed 16 bit word is needed to represent the in-loop energy without losing any energy information, a high precision unit pulse addition loop is employed, where both the saved best squared correlation term and the saved best accumulated energy term are represented by 32 bit words.
(1269)
(1270)
If the outcome of the pre-analysis in equations (1269) and (1270) is FALSE, the lower precision inner search loop of equations (1267), (1253) and (1268) is employed. If it is TRUE, the location of the next unit pulse is determined by iterating over the possible positions using equations (1271), (1253) and (1272).
(1271)
(1272)
When the best position of the unit pulse has been determined, the accumulated correlation, the accumulated in-loop energy and the number of accumulated unit pulses are updated. Further, the maximum amplitude in the best integer vector so far is kept up to date for the next unit pulse addition loop.
(1273)
If there are further unit pulses to add, a new inner loop is started with a new near optimal upshift analysis (equation (1266)) and a new energy precision analysis (equations (1269) and (1270)), before commencing the next unit pulse loop (equations (1271), (1253) and (1272)).
The effect of the in-loop accumulated energy based inner loop precision selection is that target sub vectors that have a high peakiness, or have very fine shape granularity (the final $K$ is high), will use the higher precision loop, while non-peaky or lower shape granularity sub vectors will more often use the lower precision loop.
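The idea behind the precision pre-analysis can be sketched as a worst-case bound. The sketch below is an assumption-laden illustration, not the analysis of equations (1269) and (1270): it ignores the 1 bit upshift, and the name `needs_high_precision` and its parameters are hypothetical. It relies only on the fact that adding a unit pulse to an element of amplitude a raises the energy by 2a + 1, so the largest amplitude seen so far bounds the next energy value.

```python
def needs_high_precision(en_y, max_amp, word_bits=16):
    """Return True if the in-loop energy after one more unit pulse could
    exceed a signed word of word_bits bits.
    en_y    : accumulated energy of the integer vector so far
    max_amp : maximum pulse amplitude in the integer vector so far"""
    # Worst case: the next pulse lands on the largest element.
    worst_case_en = en_y + 2 * max_amp + 1
    return worst_case_en > (1 << (word_bits - 1)) - 1
```

A peaky vector reaches a large `max_amp` and a large energy with fewer pulses than a flat one, which is why peaky or finely quantized sub vectors end up in the higher precision loop more often.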
5.3.4.2.7.3.6 PVQ sub-vector finalization and normalization
After the shape search, each non-zero PVQ sub-vector element is assigned its proper sign and the vector is L2-normalized to unit energy. Additionally, if the band was split, the vector is further scaled with the previously determined sub-vector gain.
(1274)
(1275)
(1276)
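The finalization step can be sketched as follows. This is a floating-point illustration under stated assumptions, not the fixed-point procedure of equations (1274) to (1276): the function name `finalize_subvector` is hypothetical, the “proper sign” is assumed to be the sign of the corresponding target coefficient, and the sub-vector gain is passed in as a plain scale factor that defaults to 1 for a band that was not split.

```python
import math

def finalize_subvector(y_mag, x_target, gain=1.0):
    """Apply signs to the searched magnitudes, normalize to unit energy,
    then scale by the sub-vector gain (assumed 1.0 when the band is not split)."""
    # Assumption: each element takes the sign of its target coefficient.
    signed = [-m if x < 0 else m for m, x in zip(y_mag, x_target)]
    norm = math.sqrt(sum(v * v for v in signed))
    if norm == 0.0:
        return [0.0] * len(signed)          # no pulses: nothing to normalize
    return [gain * v / norm for v in signed]
```

With `y_mag = [3, 1, 0]` and a target whose first coefficient is negative, the result has a negative first element and an energy of 1; passing `gain=0.5` yields an energy of 0.25.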
5.3.4.2.7.4 PVQ short codeword indexing
The PVQ short codeword indexing assigns a unique index to any possible vector in the PVQ codebook. The incoming vector, which represents a point on the $K$'th hyper-pyramid in $N$-dimensional space, is indexed into two short codewords using a leading sign, sectioned Modular PVQ recursive indexing method, denoted MPVQ. After the MPVQ indexing procedure, the integer codewords and the size of each codeword are sent to a Range encoder (an arithmetic encoder), which in turn provides a stream of arithmetically encoded bits to the bit stream.
5.3.4.2.7.4.1 MPVQ modular leading sign recursion definition
The first codeword represents a leading sign, which is the sign of the first non-zero position in the vector. The second codeword is a recursive representation of the amplitudes (including zeroes) and leading signs of the remaining vector after the first occurring non-zero position sign has been extracted. The modular leading sign extraction is performed recursively until all amplitudes (including zeroes) and signs have been consumed.
The number of possible combinations of the PVQ vector is denoted $N_{PVQ}(N,K)$. The number of possible combinations of an integer vector in dimension $N$ with $K$ unit pulses, i.e. with an L1-norm of $K$, and an initial positive sign is denoted $N_{MPVQ}(N,K)$. The elimination of the initial leading sign from the PVQ vector yields the number of possible entries in the MPVQ structure as:
$$N_{MPVQ}(N,K) = \frac{N_{PVQ}(N,K)}{2} \qquad (1277)$$
The MPVQ-index for a vector is based on a recursive scheme where we:
- Define $U(n,k)$ as the number of integer vectors of dimension $n$ and L1-norm $k$ that do not have a leading zero, do not have the leading value $k$, have a positive leading value, and have a positive next leading sign (where the next leading sign is the first sign encountered after the current value in the direction of the recursion).
- Define $A(n,k)$ as the number of integer vectors of dimension $n$ and L1-norm $k$ that have a positive leading value and do not have a leading zero.
See table 148 for a structured view of $U(n,k)$ and its relation to the total number of vectors $N_{MPVQ}(n,k)$ in the MPVQ structure.
Table 148: MPVQ recursion overview: the three subsections for the leading sign based recursion
| Subsection | Number of entries for each subsection |
|---|---|
| Leading value $k$: all unit pulses consumed. | 1 |
| Non-zero leading value (with a positive or negative next leading sign): the first encountered non-zero position requires 1 bit of next leading sign information. The recursion consumes the amplitude value and moves on to the next position with the remaining unit pulses. | $2 \cdot U(n,k)$ |
| Leading zero: the recursion moves on to the next position without any additional next leading sign information. | $N_{MPVQ}(n-1,k)$ |
The recursive definition of the total number of entries in the MPVQ structure becomes:
$$N_{MPVQ}(n,k) = 1 + 2 \cdot U(n,k) + N_{MPVQ}(n-1,k) \qquad (1278)$$
By applying Fischer’s original PVQ recursion together with equations (1277) and (1278), we establish that:
(1279)
From the definition (and table 148), the relation between $A(n,k)$ and $U(n,k)$ is:
$$A(n,k) = 1 + 2 \cdot U(n,k) \qquad (1280)$$
where the “1” comes from the single initial-“k” valued vector, and the factor “2” is due to the positive and negative sign possibilities of the next leading sign.
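The counting definitions in this subclause can be checked by brute-force enumeration for small dimensions. The sketch below is only a verification aid, not part of the codec: all function names are hypothetical, and the counts are obtained by listing every integer vector of a given dimension and L1-norm rather than by any recursion.

```python
import itertools

def pvq_vectors(n, k):
    """All integer vectors of dimension n with L1-norm k."""
    return [v for v in itertools.product(range(-k, k + 1), repeat=n)
            if sum(abs(c) for c in v) == k]

def first_nonzero(v):
    """First non-zero element of v, or 0 if v is all zero or empty."""
    return next((c for c in v if c != 0), 0)

def n_pvq(n, k):
    return len(pvq_vectors(n, k))

def n_mpvq(n, k):
    # Vectors whose leading sign (sign of the first non-zero element) is positive.
    return sum(1 for v in pvq_vectors(n, k) if first_nonzero(v) > 0)

def count_a(n, k):
    # Positive leading value, hence no leading zero.
    return sum(1 for v in pvq_vectors(n, k) if v[0] > 0)

def count_u(n, k):
    # Positive leading value below k, and a positive next leading sign.
    return sum(1 for v in pvq_vectors(n, k)
               if 0 < v[0] < k and first_nonzero(v[1:]) > 0)
```

For dimension 2 and L1-norm 2 this gives 8 PVQ vectors, 4 of which have a positive leading sign; exactly one of them, (1, 1), has a leading value below 2 and a positive next leading sign, and 3 have a positive leading value.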
On the encoder side, recursive relations are used for the forward update of the MPVQ indexing offsets $A(n,k)$ and $U(n,k)$ as the dimension increases up to the size of the vector to be indexed:
(1281)
(1282)
where the low dynamic range calculation of equation (1282) is used for the forward update of the last column of the MPVQ offset matrix, a column which is needed for the calculation of the exact MPVQ size. An $A(n,k)$ based recursion is used for the remaining offset matrix columns, as these columns are known in advance to have a low dynamic range. To keep the dynamic range within a signed 32 bit word, the final MPVQ size calculation is performed as:
(1283)
5.3.4.2.7.4.2 Detailed MPVQ indexing approach
The maximum index range of the found PVQ vectors is [0 .. 2^32 − 1], so the total resulting index would require an unsigned 32 bit word. With the introduction of the modular leading sign approach, however, the maximum index range is reduced to [0 .. 2^31 − 1], which enables a fast implementation using signed 32 bit word arithmetic.
The MPVQ-indexing scheme on the encoder side is run in the order of position $N-1$ to position 0 (on the decoder side the de-indexing starts from position 0 and ends at position $N-1$). The MPVQ indexing loop is carried out according to figure 84. The loop tracks the row number of the MPVQ offset matrix and a pointer into the samples/coefficients of the PVQ vector, “>>1” denotes a 1 bit right shift, and the function “UpdateOffsetsFwd” iteratively updates the required MPVQ-offsets for the next larger dimension using combinations of equations (1281), (1280) and (1282).
Figure 84: Detailed MPVQ-indexing, leading sign index extraction and size calculation
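The indexing loop can be illustrated with a small sketch that follows the described order (last position first, offsets updated forward as the dimension grows, leading sign kept separate from the index). Several details are assumptions made for illustration rather than the normative procedure of figure 84: the offset row `a_row` is assumed to hold, for the current dimension, the number of entries that precede each amplitude section; the update in `update_offsets_fwd` is a Fischer-style three-term recursion chosen here because it reproduces those counts; and Python integers are unbounded, so the 32 bit dynamic range handling is not modelled. All names are hypothetical.

```python
import itertools

def update_offsets_fwd(a_row):
    """Advance the offset row from dimension n to n + 1 (assumed recursion:
    new[k] = old[k] + old[k-1] + new[k-1], with column 0 fixed at 0)."""
    nxt = [0] * len(a_row)
    for k in range(1, len(a_row)):
        nxt[k] = a_row[k] + a_row[k - 1] + nxt[k - 1]
    return nxt

def mpvq_index(vec):
    """Return (leading_sign, index, size) for an integer vector with at
    least one pulse; leading_sign is 1 for negative, 0 for positive."""
    k_total = sum(abs(v) for v in vec)
    assert len(vec) >= 1 and k_total >= 1
    a_row = [0] + [1] * (k_total + 1)           # dimension 1, columns 0..K+1
    index, k_acc, next_sign = 0, 0, None
    for pos in range(len(vec) - 1, -1, -1):     # position N-1 down to 0
        val = vec[pos]
        if val != 0:
            if next_sign is not None:
                index = 2 * index + next_sign   # store the previously seen sign
            next_sign = 1 if val < 0 else 0
        index += a_row[k_acc]                   # offset of this amplitude section
        k_acc += abs(val)
        if pos != 0:
            a_row = update_offsets_fwd(a_row)
    # Offsets are odd for k >= 1, so ">> 1" recovers (offset - 1) / 2.
    size = 1 + (a_row[k_total] >> 1) + (a_row[k_total + 1] >> 1)
    return next_sign, index, size

def all_pvq_vectors(n, k):
    """Brute-force list of all integer vectors of dimension n, L1-norm k."""
    return [v for v in itertools.product(range(-k, k + 1), repeat=n)
            if sum(abs(c) for c in v) == k]
```

For example, `mpvq_index([1, -1])` returns `(0, 2, 4)`: a positive leading sign, index 2, and a codeword size of 4, which is half of the 8 vectors of dimension 2 with two unit pulses. Enumerating `all_pvq_vectors(n, k)` for small `n` and `k` is a simple way to confirm that the indices of all vectors sharing a leading sign cover the range from 0 to size − 1 exactly once.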
To speed up the indexing when the number of unit pulses is high (and the dimension is low), direct functions are used to calculate both the MPVQ-offsets and the MPVQ-size.
5.3.4.2.7.5 High Dynamic Range Arithmetic Encoding
The Split PVQ scheme relies on a Range encoder to encode symbols with higher granularity than bits. A high dynamic range implementation, in which up to 24 bits of resolution may be used for the CDF, is used to encode short codewords in three distinct ways:
- Direct bit encoding
- 1/8 bit resolution uniform PDF encoding
- Asymmetric triangular PDF encoding of band split energy angles, using tailor made CDFs.
Further, special range encoder functions are used to determine the remaining bit budget in the frame and band quantization loops, and to correct the bit budget for coming frames or coming bands.