5.7 Adaptive codebook

3GPP TS 26.190: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions


The adaptive codebook search is performed on a subframe basis. It consists of a closed-loop pitch search, followed by computation of the adaptive codevector by interpolating the past excitation at the selected fractional pitch lag.

The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the search stage, the excitation is extended by the LP residual to simplify the closed-loop search.

In the 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, a fractional pitch delay is used in the first and third subframes, with a resolution of 1/4 in the range [34, 127 3/4], a resolution of 1/2 in the range [128, 159 1/2], and integer resolution in the range [160, 231]. For the second and fourth subframes, a pitch resolution of 1/4 is always used in the range [T₁-8, T₁+7 3/4], where T₁ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe.

In the 8.85 kbit/s mode, a fractional pitch delay is used in the first and third subframes, with a resolution of 1/2 in the range [34, 91 1/2] and integer resolution in the range [92, 231]. For the second and fourth subframes, a pitch resolution of 1/2 is always used in the range [T₁-8, T₁+7 1/2], where T₁ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe.

In the 6.60 kbit/s mode, a fractional pitch delay is used in the first subframe, with a resolution of 1/2 in the range [34, 91 1/2] and integer resolution in the range [92, 231]. For the second, third and fourth subframes, a pitch resolution of 1/2 is always used in the range [T₁-8, T₁+7 1/2], where T₁ is the nearest integer to the fractional pitch lag of the first subframe.

Closed-loop pitch analysis is performed around the open-loop pitch estimates on a subframe basis. In the 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, the range T_op ± 7, bounded by 34…231, is searched in the first and third subframes. In the 6.60 kbit/s mode, the range T_op ± 7, bounded by 34…231, is searched in the first subframe. In all modes, for the other subframes, closed-loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above.

In the 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, the pitch delay is encoded with 9 bits in the first and third subframes, and the relative delay of the other subframes is encoded with 6 bits. In the 8.85 kbit/s mode, the pitch delay is encoded with 8 bits in the first and third subframes, and the relative delay of the other subframes is encoded with 5 bits. In the 6.60 kbit/s mode, the pitch delay is encoded with 8 bits in the first subframe, and the relative delay of the other subframes is encoded with 5 bits.
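The bit allocations above can be cross-checked by counting the distinct lag values in each range. The following is a minimal sketch, assuming that every integer lag inside a fractional range carries all of its fractional positions (so each range is counted as half-open up to the next integer); the function and variable names are illustrative and not part of the specification.

```python
from fractions import Fraction

def count_lags(segments):
    """Count lag values over half-open segments [lo, hi) sampled every `step`.

    Assumption: each integer lag in a fractional range carries all of its
    fractional positions, so a range ending at integer L runs up to L + 1.
    """
    total = 0
    for lo, hi, step in segments:
        total += int((Fraction(hi) - Fraction(lo)) / Fraction(step))
    return total

# Absolute lag, 12.65 kbit/s and above: 1/4 steps over 34..127,
# 1/2 steps over 128..159, integer steps over 160..231.
high_rate_abs = count_lags([(34, 128, Fraction(1, 4)),
                            (128, 160, Fraction(1, 2)),
                            (160, 232, 1)])

# Absolute lag, 8.85 and 6.60 kbit/s: 1/2 steps over 34..91, integers over 92..231.
low_rate_abs = count_lags([(34, 92, Fraction(1, 2)), (92, 232, 1)])

# Relative lag around T1: integer positions T1-8 .. T1+7.
high_rate_rel = count_lags([(-8, 8, Fraction(1, 4))])
low_rate_rel = count_lags([(-8, 8, Fraction(1, 2))])

print(high_rate_abs, low_rate_abs, high_rate_rel, low_rate_rel)
```

Under this counting, the totals fill the 9-bit, 8-bit, 6-bit and 5-bit codeword spaces exactly.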

The closed-loop pitch search is performed by minimizing the mean-square weighted error between the original and synthesized speech. This is achieved by maximizing the term

$$T_k = \frac{\sum_{n=0}^{63} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{63} y_k(n)\, y_k(n)}} \qquad (37)$$

where x(n) is the target signal and y_k(n) is the past filtered excitation at delay k (past excitation convolved with h(n)). Note that the search range is limited around the open-loop pitch as explained earlier.
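The integer-lag stage of this search can be sketched as follows. This is an illustrative floating-point sketch, not the fixed-point reference implementation: the buffer layout (`exc[origin + n]` holding u(n)) and all function names are assumptions made for the example, and the convolution is recomputed directly for every delay for clarity.

```python
import math

def filtered_past_excitation(exc, origin, h, k, n_sub):
    """y_k(n) = sum_{i=0..n} u(i - k) h(n - i), with u(n) stored at exc[origin + n]."""
    return [sum(exc[origin + i - k] * h[n - i] for i in range(n + 1))
            for n in range(n_sub)]

def normalized_correlation(x, y):
    """Correlation of x with y, normalized by the square root of the energy of y."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(b * b for b in y))
    return num / den if den > 0.0 else 0.0

def search_integer_lag(x, exc, origin, h, k_min, k_max):
    """Return the integer delay in [k_min, k_max] that maximizes the criterion."""
    best_k, best_t = k_min, -math.inf
    for k in range(k_min, k_max + 1):
        y = filtered_past_excitation(exc, origin, h, k, len(x))
        t = normalized_correlation(x, y)
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t
```

For example, with an open-loop estimate of 40 the caller would pass `k_min = max(34, 40 - 7)` and `k_max = min(231, 40 + 7)`.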

The convolution y_k(n) is computed for the first delay in the searched range, and for the other delays, it is updated using the recursive relation

$$y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n) \qquad (38)$$

where u(n), n = –(231+17), …, 63, is the excitation buffer. Note that in the search stage, the samples u(n), n = 0, …, 63, are not known, and they are needed for pitch delays less than 64. To simplify the search, the LP residual is copied to u(n) in order to make the relation in Equation (38) valid for all delays.
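The recursion in Equation (38) can be checked numerically against a direct convolution. The sketch below assumes a floating-point buffer in which u(n) is stored at `exc[origin + n]`, with the samples for n ≥ 0 already filled in (for instance with the LP residual); the helper names are hypothetical.

```python
def direct_convolution(exc, origin, h, k, n_sub):
    """y_k(n) = sum_{i=0..n} u(i - k) h(n - i), computed from scratch."""
    return [sum(exc[origin + i - k] * h[n - i] for i in range(n + 1))
            for n in range(n_sub)]

def update_convolution(y_prev, exc, origin, h, k):
    """Derive y_k from y_{k-1}: y_k(n) = y_{k-1}(n-1) + u(-k) h(n).

    y_{k-1}(-1) is taken as zero, so y_k(0) = u(-k) h(0).
    """
    u_minus_k = exc[origin - k]
    y = [u_minus_k * h[0]]
    for n in range(1, len(y_prev)):
        y.append(y_prev[n - 1] + u_minus_k * h[n])
    return y
```

Each update costs one multiply-add per sample, instead of a full convolution per candidate delay.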

Once the optimum integer pitch delay is determined, the fractions from -3/4 to 3/4 with a step of 1/4 around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in Equation (37) and searching for its maximum. Once the fractional pitch lag is determined, v'(n) is computed by interpolating the past excitation signal u(n) at the given phase (fraction). The interpolation is performed using two FIR filters (Hamming windowed sinc functions): one for interpolating the term in Equation (34) with the sinc truncated at ±17, and the other for interpolating the past excitation with the sinc truncated at ±63.

The filters have their cut-off frequency (-3 dB) at 6000 Hz in the oversampled domain, which means that the interpolation filters exhibit a low-pass frequency response. Thus, even when the pitch delay is an integer value, the adaptive codebook excitation consists of a low-pass filtered version of the past excitation at the given delay and not a direct copy thereof. Further, for delays smaller than the subframe size, the adaptive codebook excitation is completed based on the low-pass filtered interpolated past excitation and not by repeating the past excitation.

In order to enhance the pitch prediction performance in wideband signals, a frequency-dependent pitch predictor is used. This is important in wideband signals since the periodicity does not necessarily extend over the whole spectrum. In this algorithm, there are two signal paths associated with respective sets of pitch codebook parameters. Each signal path comprises a pitch prediction error calculating device, which calculates the pitch prediction error of a pitch codevector supplied by a pitch codebook search device. One of the two paths comprises a low-pass filter for filtering the pitch codevector. The pitch prediction error is calculated for both signal paths, and the signal path having the lowest calculated pitch prediction error is selected, along with the associated pitch gain.

The low-pass filter used in the second path has the form B_LP(z) = 0.18z + 0.64 + 0.18z^-1. Note that 1 bit is used to encode the chosen path.

Thus, for the 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s modes, there are two possibilities to generate the adaptive codebook v(n): v(n) = v'(n) in the first path, or v(n) = Σ_{i=-1}^{1} b_LP(i+1) v'(n+i) in the second path, where b_LP = [0.18, 0.64, 0.18]. The path which results in the minimum energy of the target signal x₂(n) defined in Equation (40) is selected for the filtered adaptive codebook vector. For the 6.60 and 8.85 kbit/s modes, v(n) is always v(n) = v'(n).
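The two-path selection can be illustrated with a short floating-point sketch. Several things here are assumptions rather than part of the text above: the updated target is taken to be x₂(n) = x(n) − g_p y(n) with an unbounded least-squares gain, the interpolated codevector is supplied with one extra sample on each side so that the 3-tap filter can be applied at the subframe edges, and all function names are hypothetical.

```python
B_LP = (0.18, 0.64, 0.18)

def lowpass_codevector(v_ext):
    """Apply b_LP to v'(n); v_ext holds v'(-1), v'(0), ..., v'(N)."""
    return [B_LP[0] * v_ext[n] + B_LP[1] * v_ext[n + 1] + B_LP[2] * v_ext[n + 2]
            for n in range(len(v_ext) - 2)]

def convolve(v, h):
    """Zero-state filtering of v by the impulse response h, truncated to len(v)."""
    return [sum(v[i] * h[n - i] for i in range(n + 1)) for n in range(len(v))]

def residual_energy(x, y):
    """Energy of x - g*y for the least-squares gain g (assumed form of x2)."""
    xy = sum(a * b for a, b in zip(x, y))
    yy = sum(b * b for b in y)
    g = xy / yy if yy > 0.0 else 0.0
    return sum((a - g * b) ** 2 for a, b in zip(x, y)), g

def select_path(x, v_ext, h):
    """Return (flag, v, gain): flag 0 is the direct path, flag 1 the low-pass path."""
    candidates = [(0, v_ext[1:-1]), (1, lowpass_codevector(v_ext))]
    best = None
    for flag, v in candidates:
        energy, g = residual_energy(x, convolve(v, h))
        if best is None or energy < best[0]:
            best = (energy, flag, v, g)
    return best[1], best[2], best[3]
```

The returned flag corresponds to the single bit that signals the chosen path.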

The adaptive codebook gain is then found by

$$g_p = \frac{\sum_{n=0}^{63} x(n)\, y(n)}{\sum_{n=0}^{63} y(n)\, y(n)} \qquad (39)$$

where y(n) = v(n) * h(n) is the filtered adaptive codebook vector (zero-state response of H(z)W(z) to v_i(n)). To ensure stability, the adaptive codebook gain g_p is bounded by 0.95 if the adaptive codebook gains of the previous subframes have been small and the LP filters of the previous subframes have been close to being unstable.
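A minimal sketch of the gain computation with the stability cap follows. The decision of when to apply the cap depends on the history of previous gains and LP filters, which is not detailed here, so it is modelled as a caller-supplied flag `limit_gain`; that flag and the function name are hypothetical.

```python
def adaptive_codebook_gain(x, y, limit_gain=False, gain_cap=0.95):
    """Least-squares gain between target x and filtered codevector y.

    `limit_gain` is a hypothetical flag standing in for the stability
    condition (small previous gains, LP filters close to unstable).
    """
    xy = sum(a * b for a, b in zip(x, y))
    yy = sum(b * b for b in y)
    g = xy / yy if yy > 0.0 else 0.0
    if limit_gain and g > gain_cap:
        g = gain_cap
    return g
```

For a target that is exactly twice the filtered codevector, the uncapped gain is 2, and the capped call returns 0.95.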