5.7 AMR-WB-interoperable modes

3GPP TS 26.445, Release 15: Codec for Enhanced Voice Services (EVS); Detailed algorithmic description

The EVS codec supports modes that allow interoperability with AMR-WB (and ITU-T G.722.2). Including the interoperable modes is straightforward because the core ACELP coder described in subclause 5.2 is similar to AMR-WB (when operating at a 12.8 kHz internal sampling rate, it uses the same pre-emphasis, perceptual weighting, etc.).

5.7.1 Pre-processing

The high-pass filtering, sampling conversion, pre-emphasis, spectral analysis, and signal activity detection functions are the same as those described in subclause 5.1.

5.7.2 Linear prediction analysis and quantization

5.7.2.1 Windowing and autocorrelation computation

Short-term prediction analysis is performed once per speech frame using the autocorrelation approach of subclause 5.1.9. However, the 30 ms asymmetric window defined in subclause 5.2.1 of [9] and a look-ahead of 5 ms are used in the autocorrelation calculation. The frame structure is depicted in figure 86.

Figure 86: Relative positions and length of the LP analysis windows for the AMR-WB interoperable option

The autocorrelations of the windowed signal are computed in the same way as described in subclause 5.1.9.2, except that $L = 384$ in equation (45), corresponding to the 30 ms window at the 12.8 kHz internal sampling rate. Note that the autocorrelations are computed in the same way as in subclause 5.2.1 of [9], but with a different white noise correction factor and with the lag windowing described in subclause 5.1.9.3 of this Specification.
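As a rough illustration of the numbers above, the following Python sketch derives the window and look-ahead lengths in samples and computes autocorrelations over a 384-sample window. The function names, the LP order of 16, and the constant test input are assumptions made for illustration only. The actual window shape is defined in [9], and the white noise correction and lag windowing of subclause 5.1.9.3 are not shown.

```python
FS_INTERNAL_HZ = 12800   # internal sampling rate named in this clause
WINDOW_MS = 30           # asymmetric LP analysis window duration
LOOKAHEAD_MS = 5         # look-ahead used by the interoperable modes


def samples(duration_ms, fs_hz=FS_INTERNAL_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return duration_ms * fs_hz // 1000


def autocorrelation(windowed, order=16):
    """Return r[k] = sum_{n=k}^{L-1} s_w(n) * s_w(n-k) for k = 0..order.

    `order=16` is an assumed LP order, not a value stated in this clause.
    """
    length = len(windowed)
    return [
        sum(windowed[n] * windowed[n - k] for n in range(k, length))
        for k in range(order + 1)
    ]


L = samples(WINDOW_MS)              # window length in samples
lookahead = samples(LOOKAHEAD_MS)   # look-ahead in samples

# Placeholder input: a constant signal, for which r[k] = L - k.
r = autocorrelation([1.0] * L)
```

With a real input, `windowed` would be the pre-emphasized signal multiplied by the asymmetric window of [9] before `autocorrelation` is called.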

5.7.2.2 Levinson-Durbin algorithm

The Levinson-Durbin algorithm is the same as in subclause 5.5.1.3.

5.7.2.3 LP to ISP conversion

For a linear predictive model $A(z)$ of order $m$ we can define the line spectral polynomials as

$$P(z) = A(z) + z^{-m-l} A(z^{-1}), \qquad Q(z) = A(z) - z^{-m-l} A(z^{-1}) \tag{1405}$$

where $l=1$ for the line spectrum polynomials and $l=0$ for the immittance spectrum polynomials. The polynomials $P(z)$ and $Q(z)$ are symmetric and anti-symmetric, respectively, with the point of symmetry at $z^{-(m+l)/2}$. It follows that, when evaluated on the unit circle, the spectra of $z^{(m+l)/2} P(z)$ and $z^{(m+l)/2} Q(z)$ are real and imaginary, respectively. Further, since the polynomials $P(z)$ and $Q(z)$ have their roots on the unit circle, the roots can be located by a zero-crossing search of the two spectra.

The evaluation of the polynomials on the unit circle can be implemented with an FFT of length $N=256$. Since the two spectra are real and imaginary, respectively, both can be evaluated simultaneously: since $P(z) + Q(z) = 2A(z)$, we can determine the spectrum of $2A(z)$ by an FFT, multiply by $z^{(m+l)/2}$, and then obtain the two spectra in the real and imaginary parts. Scaling by the factor 2 does not influence the location of the zeros, so it can be omitted.
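The simultaneous evaluation can be sketched in Python as follows. This is a minimal illustration, not the codec's implementation. It uses a direct DFT instead of an FFT, omits the range-reduction filter and the pruning discussed in this subclause, and refines each zero crossing by linear interpolation, which is an assumption of this sketch. The function names and the order-16 test filter are hypothetical.

```python
import cmath
import math


def rotated_spectra(a, l=0, n_fft=256):
    """Evaluate A(z) at bins 0..n_fft/2 and rotate by z^{(m+l)/2}.

    The real part is proportional to the spectrum of P(z) and the
    imaginary part to that of Q(z); the common factor 2 is omitted.
    """
    m = len(a) - 1
    p_spec, q_spec = [], []
    for k in range(n_fft // 2 + 1):
        w = 2.0 * math.pi * k / n_fft
        # Direct DFT of the coefficients; an FFT yields the same values.
        a_w = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        rotated = cmath.exp(1j * w * (m + l) / 2.0) * a_w
        p_spec.append(rotated.real)
        q_spec.append(rotated.imag)
    return p_spec, q_spec


def zero_crossings(spec, n_fft=256):
    """Locate sign changes strictly inside (0, pi), in radians.

    The end bins are skipped because Q(z) has fixed roots at z = +1/-1
    when l = 0. Linear interpolation refines each crossing.
    """
    roots = []
    for k in range(1, len(spec) - 2):
        f0, f1 = spec[k], spec[k + 1]
        if f0 * f1 < 0.0:
            frac = f0 / (f0 - f1)
            roots.append(2.0 * math.pi * (k + frac) / n_fft)
    return roots


# Hypothetical minimum-phase test filter of order m = 16.
a = [1.0, -0.9] + [0.0] * 15
p_spec, q_spec = rotated_spectra(a, l=0)
p_roots = zero_crossings(p_spec)
q_roots = zero_crossings(q_spec)
```

For this test filter with $l=0$, the roots of $P(z)$ and $Q(z)$ found in $(0, \pi)$ alternate along the frequency axis, as expected for a minimum-phase $A(z)$.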

To reduce numerical range of the spectrum, we can convolve by a filter , where

()

and the constants are and . That is, we calculate the spectrum of and multiply with the phase-shift .

When calculating the FFT, we can reduce complexity by applying pruning methods. That is, since is a sequence of length , but the FFT is of length , we can omit all those operations which involve computations with the zeros.

5.7.2.4 ISP to LP conversion

The ISP to LP conversion is the same as in subclause 5.2.4 of [9].

5.7.2.5 Quantization of the ISP coefficients

For interoperability reasons, ISF quantization is the same as in subclause 5.2.5 of [9].

5.7.2.6 Interpolation of the ISPs

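The set of LP parameters is used for the 4th subframe, whereas the 1st, 2nd and 3rd subframes use a linear interpolation of the parameters in the adjacent frames. The interpolation is performed on the ISPs in the $q$ domain. Let $q_4^{(n)}$ be the ISP vector at the 4th subframe of the current frame $n$, and $q_4^{(n-1)}$ the ISP vector at the 4th subframe of the previous frame $n-1$. The interpolated ISP vectors at the 1st, 2nd and 3rd subframes are given by

$$\begin{aligned} q_1^{(n)} &= 0.55\, q_4^{(n-1)} + 0.45\, q_4^{(n)} \\ q_2^{(n)} &= 0.2\, q_4^{(n-1)} + 0.8\, q_4^{(n)} \\ q_3^{(n)} &= 0.04\, q_4^{(n-1)} + 0.96\, q_4^{(n)} \end{aligned}$$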

The same formula is used for interpolation of both quantized and unquantized ISPs. The interpolated ISP vectors are used to compute a different LP filter at each subframe (both quantized and unquantized) using the ISP to LP conversion method described in subclause 5.2.4 of [9].
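The per-subframe interpolation can be sketched in Python as below. The weights are an assumption: they are taken to follow the AMR-WB interpolation of [9] and should be checked against the normative text. The function and constant names are hypothetical.

```python
# Weight applied to the current-frame ISP vector for subframes 1..4.
# ASSUMPTION: values taken to match the AMR-WB interpolation in [9];
# verify against the normative text before relying on them.
CURRENT_FRAME_WEIGHTS = (0.45, 0.8, 0.96, 1.0)


def interpolate_isps(isp_prev, isp_cur, weights=CURRENT_FRAME_WEIGHTS):
    """Return one ISP vector per subframe.

    Each vector is (1 - w) * isp_prev + w * isp_cur, so the 4th subframe
    (w = 1.0) reproduces the current-frame vector exactly.
    """
    if len(isp_prev) != len(isp_cur):
        raise ValueError("ISP vectors must have the same length")
    return [
        [(1.0 - w) * p + w * c for p, c in zip(isp_prev, isp_cur)]
        for w in weights
    ]


# Two-element toy vectors, only to show the weighting.
subframe_isps = interpolate_isps([0.0, 1.0], [1.0, 0.0])
```

The same function can be applied to quantized and unquantized ISPs alike. Each returned vector would then be converted to LP coefficients with the ISP to LP conversion of [9].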

5.7.3 Perceptual weighting

Perceptual weighting is performed as described in subclause 5.1.10.1 for a sub-frame size of 64 samples.

5.7.4 Open-loop pitch analysis

The open-loop pitch analysis is performed as described in subclause 5.1.10.

5.7.5 Impulse response computation

Same as subclause 5.2.3.1.3.

5.7.6 Target signal computation

Same as subclause 5.2.3.1.2.

5.7.7 Adaptive codebook search

Same as subclause 5.2.3.1.4.

5.7.8 Algebraic codebook search

For interoperability reasons, the algebraic codebook structure and pulse indexing are the same as in subclauses 5.8.1 and 5.8.2 of [9]. The algebraic codebook search procedure is the same as described in subclause 5.8.3 of [9], except for the pulse-sign pre-selection described in the last paragraph of subclause 5.2.3.1.5.9 (The search criterion at lower bitrates), which is also used at the lowest bit rate of the AMR-WB-interoperable modes.

5.7.9 Quantization of the adaptive and fixed codebook gains

For interoperability reasons, the quantization of gains is conducted in the same manner as described in subclause 5.9 of [9].

5.7.10 Memory update

The memory update for the AMR-WB-interoperable modes is similar to subclause 5.10 of [9]. However, some extra states defined in the EVS codec are also updated so that operation remains consistent in the next coding frame when the last frame was coded with an AMR-WB-interoperable mode.

5.7.11 High-band gain generation

For interoperability reasons, the quantization of the high-band gain in each 5 ms sub-frame is conducted in the same manner as described in subclause 5.11 of [9].

5.7.12 CNG coding

The CNG encoding in AMR-WB-interoperable mode follows subclause 5.6.2, with the differences described below.

Instead of the LSF vector that is quantized and transmitted in the SID frame in primary mode, the ISF vector is quantized and transmitted in the SID frame in AMR-WB-interoperable mode. 28 bits are used for ISF quantization, which is one bit fewer than in the primary mode. The quantization of the ISF vector is described in [9].

The excitation energy quantized and transmitted in the SID frame is calculated in the same way as described in subclause 5.6.2.1.5. However, in AMR-WB-interoperable mode it is quantized with a different number of bits and a different quantization step size than in the primary mode: 6 bits are used instead of the 7 bits used in the primary mode, and the quantization step Δ described in subclause 5.6.2.1.5 is 2.625 instead of 5.25. The quantization index is limited to [0, 63] in AMR-WB-interoperable mode, instead of [0, 127] in primary mode.

The smoothed LP synthesis filter used for local CNG synthesis is obtained in the same way as described in subclause 5.6.2.1.4, and the smoothed quantized excitation energy used for local CNG synthesis is obtained in the same way as described in subclause 5.6.2.1.6.

The CNG type flag bit, the bandwidth indicator bit, the core sampling rate indicator bit, the hangover frame counter bits and the low-band excitation spectral envelope bits used in primary SID mode are not encoded into the SID frame in AMR-WB-interoperable mode; one dithering bit (always set to 0) is encoded instead. The high-band analysis and quantization are also skipped in AMR-WB-interoperable mode, so the high-band energy bits are not encoded into the AMR-WB-interoperable mode SID frame either.

Local CNG synthesis is performed by filtering the excitation signal (see subclause 6.8.4) through the smoothed LP synthesis filter.
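The differences in the energy quantizer and in the SID payload can be summarised in a short Python sketch. The class, constant and function names are hypothetical. The mapping from excitation energy to a raw index is defined in subclause 5.6.2.1.5 and is deliberately not reproduced here; only the index limiting and the bit counts stated above are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SidEnergyParams:
    """Energy quantizer parameters for one SID mode (values from the text)."""
    bits: int
    step: float

    @property
    def max_index(self):
        # Largest index representable with `bits` bits.
        return (1 << self.bits) - 1


PRIMARY = SidEnergyParams(bits=7, step=5.25)
AMR_WB_IO = SidEnergyParams(bits=6, step=2.625)


def clamp_index(raw_index, params):
    """Limit a raw quantization index to [0, params.max_index]."""
    return max(0, min(params.max_index, raw_index))


# Fields the text lists for the AMR-WB-interoperable SID frame, in bits.
AMR_WB_IO_SID_FIELDS = {"isf": 28, "energy": 6, "dithering": 1}
sid_bits = sum(AMR_WB_IO_SID_FIELDS.values())
```

The listed fields sum to 35 bits. Whether any further bits are carried in the frame is not stated in this subclause, so the sketch makes no claim about the complete frame layout.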