5.6.1 Adaptive codebook search

3GPP TS 26.090, Release 17: Mandatory speech CODEC speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions

Adaptive codebook search is performed on a subframe basis. It consists of performing closed‑loop pitch search, and then computing the adaptive codevector by interpolating the past excitation at the selected fractional pitch lag.

The adaptive codebook parameters (or pitch parameters) are the delay and gain of the pitch filter. In the adaptive codebook approach for implementing the pitch filter, the excitation is repeated for delays less than the subframe length. In the search stage, the excitation is extended by the LP residual to simplify the closed‑loop search.

12.2 kbit/s mode

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/6 in the range $[17\tfrac{3}{6},\, 94\tfrac{3}{6}]$ and integers only in the range [95, 143]. For the second and fourth subframes, a pitch resolution of 1/6 is always used in the range $[T_1 - 5\tfrac{3}{6},\, T_1 + 4\tfrac{3}{6}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded by 18…143.

Closed‑loop pitch analysis is performed around the open‑loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$, bounded by 18…143, is searched. For the other subframes, closed‑loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 9 bits in the first and third subframes and the relative delay of the other subframes is encoded with 6 bits.
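As a quick consistency check on the bit allocation, the admissible lags can be enumerated and counted. The sketch below assumes a fractional range of [17 3/6, 94 3/6] in the first and third subframes and [T1 - 5 3/6, T1 + 4 3/6] in the others; the helper name `lag_grid` is illustrative only and is not taken from the specification.

```python
from fractions import Fraction

def lag_grid(lo, hi, step):
    """Enumerate lags lo, lo + step, ... up to and including hi (exact arithmetic)."""
    lags = []
    lag = Fraction(lo)
    while lag <= hi:
        lags.append(lag)
        lag += step
    return lags

SIXTH = Fraction(1, 6)

# Assumed absolute grid (subframes 1 and 3): 1/6 steps, then integers only.
absolute_122 = lag_grid(Fraction(35, 2), Fraction(189, 2), SIXTH) + lag_grid(95, 143, 1)

# Assumed relative grid (subframes 2 and 4), expressed as offsets from T1.
relative_122 = lag_grid(Fraction(-11, 2), Fraction(9, 2), SIXTH)

print(len(absolute_122), len(relative_122))  # prints "512 61"
```

Under these assumptions the absolute grid has exactly 512 = 2^9 entries, which matches the 9-bit budget, and the relative grid has 61 entries, which fits in 6 bits.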

The closed‑loop pitch search is performed by minimizing the mean‑square weighted error between the original and synthesized speech. This is achieved by maximizing the term:

$$R(k) = \frac{\sum_{n=0}^{39} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{39} y_k(n)\, y_k(n)}} \qquad (37)$$

where $x(n)$ is the target signal and $y_k(n)$ is the past filtered excitation at delay $k$ (past excitation convolved with $h(n)$). Note that the search range is limited around the open‑loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range, $k = t_{min}+1, \ldots, t_{max}$, it is updated using the recursive relation:

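$$y_k(n) = y_{k-1}(n-1) + u(-k)\, h(n), \qquad n = 1, \ldots, 39 \qquad (38)$$

and $y_k(0) = u(-k)\, h(0)$, where $u(n)$, $n = -(143+11), \ldots, 39$, is the excitation buffer. Note that in the search stage the samples $u(n)$, $n = 0, \ldots, 39$, are not known, and they are needed for pitch delays less than 40. To simplify the search, the LP residual is copied to $u(n)$ in order to make the relation in equation (38) valid for all delays.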
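The integer-lag search can be sketched as follows. This is a minimal floating-point illustration, not the bit-exact reference code: it assumes the criterion R(k) is the correlation between the target x(n) and the filtered past excitation y_k(n), normalized by the square root of the energy of y_k(n), and that y_k(n) is updated from y_{k-1}(n) by a one-sample shift plus u(-k) h(n). The function name, the `origin` offset convention (`exc[origin + n]` holds u(n)), and the synthetic test signal are all assumptions made for the example.

```python
import math
import random

L_SUBFR = 40  # subframe length in samples

def closed_loop_integer_search(x, h, exc, origin, t_min, t_max):
    """Return (best_lag, best_R) over the integer lags t_min..t_max."""
    def u(n):
        return exc[origin + n]

    # Direct convolution for the first delay: y(n) = sum_i u(i - t_min) h(n - i).
    y = [sum(u(i - t_min) * h[n - i] for i in range(n + 1)) for n in range(L_SUBFR)]

    best_lag, best_r = t_min, -math.inf
    for k in range(t_min, t_max + 1):
        if k > t_min:
            # Recursive update: shift the previous response by one sample
            # and add the contribution of the newly exposed sample u(-k).
            y = [u(-k) * h[0]] + [y[n - 1] + u(-k) * h[n] for n in range(1, L_SUBFR)]
        num = sum(a * b for a, b in zip(x, y))
        energy = sum(b * b for b in y)
        r = num / math.sqrt(energy) if energy > 0.0 else 0.0
        if r > best_r:
            best_lag, best_r = k, r
    return best_lag, best_r

# Synthetic check: the target is the past excitation at lag 50, filtered by h.
rng = random.Random(0)
origin = 143
exc = [rng.uniform(-1.0, 1.0) for _ in range(origin)] + [0.0] * L_SUBFR
h = [0.8 ** n for n in range(L_SUBFR)]
true_lag = 50
x = [sum(exc[origin + i - true_lag] * h[n - i] for i in range(n + 1))
     for n in range(L_SUBFR)]
best_lag, best_r = closed_loop_integer_search(x, h, exc, origin, 47, 53)
```

In this toy setup the search should recover lag 50, because the normalized correlation cannot exceed the norm of the target and reaches it only when y_k(n) is proportional to x(n). For lags below 40 the zeros placed at `exc[origin:]` would have to be replaced by the LP residual, as the text describes.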

Once the optimum integer pitch delay is determined, the fractions from –3/6 to 3/6 with a step of 1/6 around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and searching for its maximum. The interpolation is performed using an FIR filter $b_{24}$ based on a Hamming windowed $\sin(x)/x$ function truncated at ±23 and padded with zeros at ±24 ($b_{24}(24) = 0$). The filter has its cut‑off frequency (‑3 dB) at 3 600 Hz in the over‑sampled domain. The interpolated values of $R(k)$ for the fractions –3/6 to 3/6 are obtained using the interpolation formula:

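$$R(k)_t = \sum_{i=0}^{3} R(k-i)\, b_{24}(t + i \cdot 6) + \sum_{i=0}^{3} R(k+1+i)\, b_{24}(6 - t + i \cdot 6), \qquad t = 0, \ldots, 5 \qquad (39)$$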

where $t = 0, \ldots, 5$ corresponds to the fractions 0, 1/6, 2/6, 3/6, –2/6, and –1/6, respectively. Note that it is necessary to compute the correlation terms in equation (37) using a range $t_{min} - 4, \ldots, t_{max} + 4$ to allow for the proper interpolation.
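The interpolation of the correlation can be illustrated with a short sketch. It assumes the two-sided form in which four correlation values on each side of the fractional position are weighted by filter taps spaced 6 apart, with phase index t = 0…5. The filter built by `hamming_sinc` is an illustrative stand-in (a plain Hamming windowed sinc with a zero at the last tap), not the normative coefficient table, and all function names are hypothetical.

```python
import math

UP = 6        # over-sampling factor for 1/6 resolution
L_INTERP = 4  # correlation values used on each side

def hamming_sinc(up, half_len):
    """One-sided taps b(0..half_len) of a Hamming windowed sinc, with b(half_len) = 0."""
    taps = []
    for n in range(half_len):
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_len)
        arg = math.pi * n / up
        taps.append(window * (1.0 if n == 0 else math.sin(arg) / arg))
    taps.append(0.0)  # zero padding at the last index
    return taps

def interpolate_corr(R, k, t, b):
    """Interpolated correlation at integer lag k and phase t (0..5).

    R maps integer lags to correlation values and must cover k-3 .. k+4.
    """
    acc = 0.0
    for i in range(L_INTERP):
        acc += R[k - i] * b[t + UP * i]
        acc += R[k + 1 + i] * b[UP - t + UP * i]
    return acc

def best_phase(R, k, b):
    """Phase index t in 0..5 that maximizes the interpolated correlation."""
    return max(range(UP), key=lambda t: interpolate_corr(R, k, t, b))

b24 = hamming_sinc(UP, 24)
R = {lag: math.cos(0.3 * (lag - 50)) for lag in range(42, 60)}
r_at_integer = interpolate_corr(R, 50, 0, b24)
```

With this stand-in filter, phase t = 0 reproduces R(k) up to rounding, because the sinc is zero at every non-zero multiple of 6. The need for correlation values up to four lags outside the searched range follows directly from the index ranges k-3 … k+4 used above.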

Once the fractional pitch lag is determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$ at the given integer delay $k$ and phase (fraction) $t$:

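$$v(n) = \sum_{i=0}^{9} u(n-k-i)\, b_{60}(t + i \cdot 6) + \sum_{i=0}^{9} u(n-k+1+i)\, b_{60}(6 - t + i \cdot 6), \qquad n = 0, \ldots, 39, \quad t = 0, \ldots, 5 \qquad (40)$$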

The interpolation filter $b_{60}$ is based on a Hamming windowed $\sin(x)/x$ function truncated at ±59 and padded with zeros at ±60 ($b_{60}(60) = 0$). The filter has a cut‑off frequency (‑3 dB) at 3 600 Hz in the over‑sampled domain.
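A sketch of the codevector computation is given below. It assumes ten excitation samples on each side of the interpolation point and filter taps spaced 6 apart, and it writes each output sample back into the excitation buffer so that lags shorter than the subframe length repeat the newly built excitation. The windowed sinc used here is again an illustrative stand-in for the normative filter, and the names and the `origin` convention (`exc[origin + n]` holds u(n)) are assumptions.

```python
import math
import random

def hamming_sinc(up, half_len):
    """One-sided taps b(0..half_len) of a Hamming windowed sinc, with b(half_len) = 0."""
    taps = []
    for n in range(half_len):
        window = 0.54 + 0.46 * math.cos(math.pi * n / half_len)
        arg = math.pi * n / up
        taps.append(window * (1.0 if n == 0 else math.sin(arg) / arg))
    taps.append(0.0)
    return taps

def adaptive_codevector(exc, origin, k, t, b, up=6, taps=10, n_sub=40):
    """Compute v(n) in place in exc[origin : origin + n_sub] and return it."""
    for n in range(n_sub):
        acc = 0.0
        for i in range(taps):
            acc += exc[origin + n - k - i] * b[t + up * i]
            acc += exc[origin + n - k + 1 + i] * b[up - t + up * i]
        # Writing back lets later samples reuse this one when k < n_sub.
        exc[origin + n] = acc
    return exc[origin:origin + n_sub]

b60 = hamming_sinc(6, 60)
origin = 154  # room for the longest lag plus the filter span
rng = random.Random(1)
exc = [rng.uniform(-1.0, 1.0) for _ in range(origin)] + [0.0] * 40
past = list(exc)
v = adaptive_codevector(exc, origin, 20, 0, b60)
```

With phase t = 0 and lag 20, the first 20 samples copy the past excitation and the last 20 repeat them, which is the periodic extension the text describes for delays below the subframe length.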

The adaptive codebook gain is then found by:

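$$g_p = \frac{\sum_{n=0}^{39} x(n)\, y(n)}{\sum_{n=0}^{39} y(n)\, y(n)}, \qquad \text{bounded by } 0 \le g_p \le 1.2 \qquad (41)$$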

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (zero state response of $H(z)W(z)$ to $v(n)$).
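The gain computation amounts to a least-squares fit of the filtered codevector to the target, followed by clipping. The sketch below assumes the gain is the ratio of the cross-correlation of x(n) and y(n) to the energy of y(n), limited to the range [0, 1.2] quoted for the quantizer; the function names are illustrative.

```python
def zero_state_response(v, h):
    """Truncated convolution y(n) = sum over i <= n of v(i) h(n - i)."""
    return [sum(v[i] * h[n - i] for i in range(n + 1)) for n in range(len(v))]

def adaptive_codebook_gain(x, y, g_max=1.2):
    """Least-squares gain of y against x, clipped to [0, g_max]."""
    num = sum(a * b for a, b in zip(x, y))
    den = sum(b * b for b in y)
    if den <= 0.0:
        return 0.0  # no energy in the filtered codevector
    return min(max(num / den, 0.0), g_max)

v = [1.0, 0.5, -0.25, 0.0]
h = [1.0, 0.5, 0.25, 0.125]
y = zero_state_response(v, h)                               # [1.0, 1.0, 0.25, 0.125]
g_half = adaptive_codebook_gain([0.5 * s for s in y], y)    # 0.5
g_clipped = adaptive_codebook_gain([2.0 * s for s in y], y) # 1.2
```

The short four-sample vectors are only for illustration; in the codec the sums run over the 40 samples of a subframe.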

The computed adaptive codebook gain is quantized using 4‑bit non‑uniform scalar quantization in the range [0.0, 1.2].

7.95 kbit/s mode

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\tfrac{1}{3},\, 84\tfrac{2}{3}]$ and integers only in the range [85, 143]. For the second and fourth subframes, a pitch resolution of 1/3 is always used in the range $[T_1 - 10\tfrac{2}{3},\, T_1 + 9\tfrac{2}{3}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded by 20…143.

Closed‑loop pitch analysis is performed around the open‑loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$, bounded by 20…143, is searched. For the other subframes, closed‑loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 8 bits in the first and third subframes and the relative delay of the other subframes is encoded with 6 bits.

The closed‑loop pitch search is performed by minimizing the mean‑square weighted error between the original and synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited around the open‑loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range, $k = t_{min}+1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from –2/3 to 2/3 with a step of 1/3 around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector is computed by interpolating the past excitation signal at the given integer delay and phase (fraction). The interpolation is performed using two FIR filters (Hamming windowed sinc functions): one for interpolating the term in equation (37), with the sinc truncated at ±11, and the other for interpolating the past excitation, with the sinc truncated at ±29. The filters have their cut‑off frequency (‑3 dB) at 3 600 Hz in the over‑sampled domain.
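To make the filter description more concrete, the following sketch designs a Hamming windowed sinc for an over-sampling factor of 3. It is a textbook design under stated assumptions and does not reproduce the normative coefficient tables: the ideal low-pass prototype is placed at a design cut-off of 3 600 Hz relative to an over-sampled rate of 3 × 8 000 Hz, scaled by the over-sampling factor, and a zero tap is appended after the truncation point by analogy with the 12.2 kbit/s description. The function and parameter names are hypothetical.

```python
import math

def windowed_sinc(up, trunc, cutoff_hz=3600.0, fs_hz=8000.0):
    """One-sided taps b(0..trunc+1); the last tap is a padding zero."""
    fc = 2.0 * cutoff_hz / (up * fs_hz)  # cut-off relative to the over-sampled Nyquist
    taps = []
    for n in range(trunc + 1):
        window = 0.54 + 0.46 * math.cos(math.pi * n / (trunc + 1))
        arg = math.pi * fc * n
        sinc = 1.0 if n == 0 else math.sin(arg) / arg
        taps.append(up * fc * sinc * window)  # gain `up` compensates zero insertion
    taps.append(0.0)
    return taps

b_corr = windowed_sinc(3, 11)  # stand-in for the filter applied to the correlation
b_exc = windowed_sinc(3, 29)   # stand-in for the filter applied to the excitation

# Sum of the two-sided response; close to `up` for a well-behaved interpolator.
dc_gain = b_corr[0] + 2.0 * sum(b_corr[1:])
```

The centre tap of such a design is 0.9 rather than 1, so phase 0 slightly smooths the signal instead of copying it. The exact ‑3 dB point depends on the window and would have to be checked against the real tables.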

The adaptive codebook gain is then found as in equation (41).

The computed adaptive codebook gain is quantized using 4‑bit non‑uniform scalar quantization as described in clause 5.8.

10.2, 7.40 kbit/s modes

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\tfrac{1}{3},\, 84\tfrac{2}{3}]$ and integers only in the range [85, 143]. For the second and fourth subframes, a pitch resolution of 1/3 is always used in the range $[T_1 - 5\tfrac{2}{3},\, T_1 + 4\tfrac{2}{3}]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded by 20…143.

Closed‑loop pitch analysis is performed around the open‑loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$, bounded by 20…143, is searched. For the other subframes, closed‑loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 8 bits in the first and third subframes and the relative delay of the other subframes is encoded with 5 bits.

The closed‑loop pitch search is performed by minimizing the mean‑square weighted error between the original and synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited around the open‑loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range, $k = t_{min}+1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from –2/3 to 2/3 with a step of 1/3 around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector is computed by interpolating the past excitation signal at the given integer delay and phase (fraction). The interpolation is performed using two FIR filters (Hamming windowed sinc functions): one for interpolating the term in equation (37), with the sinc truncated at ±11, and the other for interpolating the past excitation, with the sinc truncated at ±29. The filters have their cut‑off frequency (‑3 dB) at 3 600 Hz in the over‑sampled domain.

The adaptive codebook gain is then found as in equation (41).

The computed adaptive codebook gain (and the fixed codebook gain) is quantized using 7‑bit non‑uniform vector quantization as described in clause 5.8.

6.70, 5.90 kbit/s modes

In the first and third subframes, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\tfrac{1}{3},\, 84\tfrac{2}{3}]$ and integers only in the range [85, 143]. For the second and fourth subframes, integer pitch resolution is used in the range $[T_1 - 5,\, T_1 + 4]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe, bounded by 20…143. Additionally, a fractional resolution of 1/3 is used in the range $[T_1 - 1\tfrac{2}{3},\, T_1 + \tfrac{2}{3}]$.

Closed‑loop pitch analysis is performed around the open‑loop pitch estimates on a subframe basis. In the first (and third) subframe the range $T_{op} \pm 3$, bounded by 20…143, is searched. For the other subframes, closed‑loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 8 bits in the first and third subframes and the relative delay of the other subframes is encoded with 4 bits.
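The mixed-resolution grid used for the relative delay can be enumerated to see how it fits the 4-bit budget. The sketch assumes integer steps over [T1 - 5, T1 + 4] with 1/3 steps inserted over [T1 - 1 2/3, T1 + 2/3]; these bounds and the function name are assumptions made for the example.

```python
from fractions import Fraction

def relative_lag_grid(t1):
    """Admissible lags around the previous integer lag t1, in increasing order."""
    third = Fraction(1, 3)
    lags = [Fraction(t1 + d) for d in range(-5, -1)]  # T1-5 .. T1-2, integer steps
    lag = t1 - Fraction(5, 3)
    while lag <= t1 + Fraction(2, 3):                 # T1-1 2/3 .. T1+2/3, 1/3 steps
        lags.append(lag)
        lag += third
    lags += [Fraction(t1 + d) for d in range(1, 5)]   # T1+1 .. T1+4, integer steps
    return lags

grid = relative_lag_grid(60)
print(len(grid))  # prints "16"
```

Four integer lags below, eight fractional positions in the middle, and four integer lags above give 16 candidates, exactly what 4 bits can index.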

The closed‑loop pitch search is performed by minimizing the mean‑square weighted error between the original and synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited around the open‑loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range, $k = t_{min}+1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from –2/3 to 2/3 with a step of 1/3 around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector is computed by interpolating the past excitation signal at the given integer delay and phase (fraction). The interpolation is performed using two FIR filters (Hamming windowed sinc functions): one for interpolating the term in equation (37), with the sinc truncated at ±11, and the other for interpolating the past excitation, with the sinc truncated at ±29. The filters have their cut‑off frequency (‑3 dB) at 3 600 Hz in the over‑sampled domain.

The adaptive codebook gain is then found as in equation (41).

The computed adaptive codebook gain (and the fixed codebook gain) is quantized using vector quantization as described in clause 5.8.

5.15, 4.75 kbit/s modes

In the first subframe, a fractional pitch delay is used with resolutions: 1/3 in the range $[19\tfrac{1}{3},\, 84\tfrac{2}{3}]$ and integers only in the range [85, 143]. For the second, third, and fourth subframes, integer pitch resolution is used in the range $[T_1 - 5,\, T_1 + 4]$, where $T_1$ is the nearest integer to the fractional pitch lag of the previous subframe, bounded by 20…143. Additionally, a fractional resolution of 1/3 is used in the range $[T_1 - 1\tfrac{2}{3},\, T_1 + \tfrac{2}{3}]$.

Closed‑loop pitch analysis is performed around the open‑loop pitch estimates on a subframe basis. In the first subframe the range $T_{op} \pm 5$, bounded by 20…143, is searched. For the other subframes, closed‑loop pitch analysis is performed around the integer pitch selected in the previous subframe, as described above. The pitch delay is encoded with 8 bits in the first subframe and the relative delay of the other subframes is encoded with 4 bits.
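The text does not spell out how the search window is bounded when the open-loop estimate lies near the edge of the lag range. One plausible reading, shown below as an assumption rather than as the normative behaviour, is to shift the window so that it keeps its full width while staying inside the limits. The function name and argument names are illustrative.

```python
def search_window(t_op, delta, lag_min=20, lag_max=143):
    """Return (t_min, t_max) for a search of t_op +/- delta kept inside the lag limits."""
    t_min = max(t_op - delta, lag_min)
    t_max = t_min + 2 * delta
    if t_max > lag_max:
        # Shift the whole window down instead of shrinking it.
        t_max = lag_max
        t_min = t_max - 2 * delta
    return t_min, t_max

print(search_window(60, 5))   # prints "(55, 65)"
print(search_window(22, 5))   # prints "(20, 30)"
print(search_window(141, 5))  # prints "(133, 143)"
```

Keeping the width constant means the number of integer candidates is the same in every subframe, which keeps the search cost predictable. Simple clipping of both ends would be an equally valid reading of the sentence.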

The closed‑loop pitch search is performed by minimizing the mean‑square weighted error between the original and synthesized speech. This is achieved by maximizing the term of equation (37). Note that the search range is limited around the open‑loop pitch as explained earlier.

The convolution $y_k(n)$ is computed for the first delay $t_{min}$ in the searched range, and for the other delays in the search range, $k = t_{min}+1, \ldots, t_{max}$, it is updated using the recursive relation of equation (38).

Once the optimum integer pitch delay is determined, the fractions from –2/3 to 2/3 with a step of 1/3 around that integer are tested. The fractional pitch search is performed by interpolating the normalized correlation in equation (37) and searching for its maximum. Once the fractional pitch lag is determined, the adaptive codebook vector is computed by interpolating the past excitation signal at the given integer delay and phase (fraction). The interpolation is performed using two FIR filters (Hamming windowed sinc functions): one for interpolating the term in equation (37), with the sinc truncated at ±11, and the other for interpolating the past excitation, with the sinc truncated at ±29. The filters have their cut‑off frequency (‑3 dB) at 3 600 Hz in the over‑sampled domain.

The adaptive codebook gain is then found as in equation (41).

The computed adaptive codebook gain (and the fixed codebook gain) is quantized using vector quantization as described in clause 5.8.