5.7.2 Algebraic codebook search

3GPP TS 26.090, Release 17: Mandatory speech CODEC speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions

The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesized speech. The target signal used in the closed‑loop pitch search is updated by subtracting the adaptive codebook contribution. That is:

$$x_2(n) = x(n) - \hat{g}_p\, y(n), \qquad n = 0, \ldots, 39 \qquad (42)$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector and $\hat{g}_p$ is the quantized adaptive codebook gain. If $c_k$ is the algebraic codevector at index $k$, then the algebraic codebook is searched by maximizing the term:

$$A_k = \frac{(C_k)^2}{E_{D_k}} = \frac{(d^t c_k)^2}{c_k^t\, \Phi\, c_k}, \qquad (43)$$

where $d = H^t x_2$ is the correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, $H$ is the lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$, and $\Phi = H^t H$ is the matrix of correlations of $h(n)$. The vector $d$ (backward filtered target) and the matrix $\Phi$ are computed prior to the codebook search. The elements of the vector $d$ are computed by

$$d(n) = \sum_{i=n}^{39} x_2(i)\, h(i - n), \qquad n = 0, \ldots, 39, \qquad (44)$$

and the elements of the symmetric matrix $\Phi$ are computed by:

$$\phi(i,j) = \sum_{n=j}^{39} h(n - i)\, h(n - j), \qquad j \ge i. \qquad (45)$$
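As an illustration of equations (44) and (45), the pre-computation of the backward filtered target and the correlation matrix can be sketched in floating point as follows. The function names are hypothetical, the subframe length is taken from the length of the inputs rather than fixed at 40, and no fixed-point scaling is modelled.

```python
def backward_filtered_target(x2, h):
    """d(n) = sum_{i=n}^{L-1} x2(i) * h(i - n), as in equation (44)."""
    length = len(x2)
    return [sum(x2[i] * h[i - n] for i in range(n, length))
            for n in range(length)]


def correlation_matrix(h):
    """phi(i, j) = sum_{n=j}^{L-1} h(n - i) * h(n - j) for j >= i, equation (45)."""
    length = len(h)
    phi = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i, length):
            value = sum(h[n - i] * h[n - j] for n in range(j, length))
            phi[i][j] = value
            phi[j][i] = value  # the matrix is symmetric, so mirror the entry
    return phi
```

Only the upper triangle is computed explicitly; the lower triangle is filled by symmetry.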

The algebraic structure of the codebooks allows for very fast search procedures since the innovation vector $c_k$ contains only a few nonzero pulses. The correlation in the numerator of equation (43) is given by:

$$C = \sum_{i=0}^{N_p - 1} \vartheta_i\, d(m_i), \qquad (46)$$

where $m_i$ is the position of the $i$th pulse, $\vartheta_i$ is its amplitude, and $N_p$ is the number of pulses. The energy in the denominator of equation (43) is given by:

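$$E_D = \sum_{i=0}^{N_p - 1} \phi(m_i, m_i) + 2 \sum_{i=0}^{N_p - 2} \sum_{j=i+1}^{N_p - 1} \vartheta_i\, \vartheta_j\, \phi(m_i, m_j). \qquad (47)$$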
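A minimal sketch of how equations (46) and (47) evaluate the criterion of equation (43) for one candidate codevector is given below. It assumes pulse amplitudes of +1 or -1 and a positive energy; the function name and argument layout are hypothetical.

```python
def search_criterion(positions, amplitudes, d, phi):
    """Return (C, E_D, A) for pulses at `positions` with unit `amplitudes`."""
    n_pulses = len(positions)
    # Equation (46): correlation between the codevector and d(n).
    corr = sum(amplitudes[i] * d[positions[i]] for i in range(n_pulses))
    # Equation (47): diagonal terms (the squared amplitudes equal 1) ...
    energy = sum(phi[m][m] for m in positions)
    # ... plus twice the signed cross terms between distinct pulses.
    for i in range(n_pulses - 1):
        for j in range(i + 1, n_pulses):
            energy += (2.0 * amplitudes[i] * amplitudes[j]
                       * phi[positions[i]][positions[j]])
    return corr, energy, corr * corr / energy
```

Because only the pulse positions contribute, the cost per candidate grows with the number of pulses rather than with the subframe length.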

To simplify the search procedure, the pulse amplitudes are preset by the mere quantization of an appropriate signal $b(n)$. This is simply done by setting the amplitude of a pulse at a certain position equal to the sign of $b(n)$ at that position. The simplification proceeds as follows (prior to the codebook search). First, the sign signal $s_b(n) = \mathrm{sign}[b(n)]$ and the signal $d'(n) = d(n)\, s_b(n)$ are computed. Second, the matrix $\Phi$ is modified by including the sign information; that is, $\phi'(i,j) = s_b(i)\, s_b(j)\, \phi(i,j)$. The correlation in equation (46) is now given by:

$$C = \sum_{i=0}^{N_p - 1} d'(m_i) \qquad (48)$$

and the energy in equation (47) is given by:

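$$E_D = \sum_{i=0}^{N_p - 1} \phi'(m_i, m_i) + 2 \sum_{i=0}^{N_p - 2} \sum_{j=i+1}^{N_p - 1} \phi'(m_i, m_j). \qquad (49)$$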
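The sign presetting and the simplified expressions (48) and (49) can be sketched as follows. The helper names are hypothetical, and treating a zero sample of the amplitude-presetting signal as positive is an assumption of this sketch, not something stated in the text.

```python
def preset_signs(b, d, phi):
    """Fold the signs of b(n) into d(n) and phi(i, j) before the search."""
    length = len(d)
    # Assumption: a sample equal to zero is given a positive sign.
    signs = [1.0 if value >= 0.0 else -1.0 for value in b]
    d_signed = [d[n] * signs[n] for n in range(length)]
    phi_signed = [[signs[i] * signs[j] * phi[i][j] for j in range(length)]
                  for i in range(length)]
    return signs, d_signed, phi_signed


def signed_criterion_terms(positions, d_signed, phi_signed):
    """Return (C, E_D) from equations (48) and (49); no amplitudes needed."""
    corr = sum(d_signed[m] for m in positions)
    energy = sum(phi_signed[m][m] for m in positions)
    for i in range(len(positions) - 1):
        for j in range(i + 1, len(positions)):
            energy += 2.0 * phi_signed[positions[i]][positions[j]]
    return corr, energy
```

After this step the inner search loops only add precomputed values, since the amplitude products of equation (47) are already contained in the modified matrix.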

12.2 kbit/s mode

In this case the signal $b(n)$, used for presetting the amplitudes, is a sum of the normalized vector $d(n)$ and the normalized long‑term prediction residual $res_{LTP}(n)$:

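$$b(n) = \frac{res_{LTP}(n)}{\sqrt{\sum_{i=0}^{39} res_{LTP}(i)\, res_{LTP}(i)}} + \frac{d(n)}{\sqrt{\sum_{i=0}^{39} d(i)\, d(i)}}, \qquad n = 0, \ldots, 39. \qquad (50)$$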

Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an efficient non‑exhaustive analysis‑by‑synthesis search technique. In this technique, the term $A_k$ in equation (43) is tested for a small percentage of position combinations.
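The amplitude-presetting signal of equation (50) is the sum of two unit-energy vectors. A minimal floating-point sketch, assuming both inputs have nonzero energy (the function and argument names are hypothetical):

```python
import math


def amplitude_reference(res_ltp, d):
    """Sum of the normalized LTP residual and the normalized d(n) vector."""
    # Assumption: neither vector is all zeros, so the norms are nonzero.
    norm_res = math.sqrt(sum(value * value for value in res_ltp))
    norm_d = math.sqrt(sum(value * value for value in d))
    return [r / norm_res + v / norm_d for r, v in zip(res_ltp, d)]
```

Normalizing each term first means both signals influence the sign decision regardless of their relative levels.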

First, for each of the five tracks the pulse positions with maximum absolute values of $b(n)$ are searched. From these the global maximum value for all the pulse positions is selected. The first pulse i0 is always set into the position corresponding to the global maximum value.
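The selection of the per-track maxima and of the global maximum can be sketched as below. The sketch assumes interleaved tracks, with track t holding positions t, t+5, t+10 and so on (8 positions per track in a 40-sample subframe); the track layout is defined with the codebook structure, which is not reproduced in this subclause, so it is stated here as an assumption. The function name is hypothetical.

```python
def track_maxima(b, n_tracks=5):
    """Return (per-track argmax of |b(n)|, position of the global maximum)."""
    maxima = []
    for track in range(n_tracks):
        # Assumed interleaved layout: track, track + n_tracks, ...
        positions = range(track, len(b), n_tracks)
        maxima.append(max(positions, key=lambda n: abs(b[n])))
    # The global maximum is the largest of the local (per-track) maxima.
    global_max = max(maxima, key=lambda n: abs(b[n]))
    return maxima, global_max
```

In the search described here, pulse i0 would be fixed at `global_max`, while the remaining entries of `maxima` supply the candidate positions for pulse i1 in successive iterations.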

Next, four iterations are carried out. During each iteration the position of pulse i1 is set to the local maximum of one track. The rest of the pulses are searched in pairs by sequentially searching each of the pulse pairs {i2,i3}, {i4,i5}, {i6,i7} and {i8,i9} in nested loops. Every pulse has 8 possible positions, i.e., there are four 8×8‑loops, resulting in 256 different combinations of pulse positions for each iteration.

In each iteration all the 9 pulse starting positions are cyclically shifted, so that the pulse pairs are changed and the pulse i1 is placed in a local maximum of a different track. The rest of the pulses are searched also for the other positions in the tracks. At least one pulse is located in a position corresponding to the global maximum and one pulse is located in a position corresponding to one of the 4 local maxima.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre‑filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed‑loop fractional pitch lag of the subframe, and $\beta$ is the quantized pitch gain of the current subframe bounded by [0.0,1.0]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre‑filter $F_E(z)$. That is, for values of $T$ less than 40, the impulse response is modified according to

$$h(n) = h(n) + \beta\, h(n - T), \qquad n = T, \ldots, 39. \qquad (50a)$$
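A minimal sketch of the impulse response modification of equation (50a) follows. The update is applied in place from the lowest index upwards, so samples that have already been modified feed later ones, which corresponds to filtering through a pre-filter of the form 1/(1 - beta z^-T); the function and argument names are hypothetical.

```python
def include_prefilter(h, pitch_lag, beta):
    """Return h(n) with the adaptive pre-filter folded in, equation (50a)."""
    h_mod = list(h)
    # For pitch_lag >= len(h) the loop is empty and h is left unchanged.
    for n in range(pitch_lag, len(h_mod)):
        h_mod[n] = h_mod[n] + beta * h_mod[n - pitch_lag]
    return h_mod
```

Folding the pre-filter into the impulse response once, before the search, means the codebook search itself needs no per-candidate filtering step.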

The fixed codebook gain is then found by:

$$g_c = \frac{x_2^t\, z}{z^t\, z}, \qquad (51)$$

where $x_2$ is the target vector for fixed codebook search and $z$ is the fixed codebook vector convolved with $h(n)$,

$$z(n) = \sum_{i=0}^{n} c(i)\, h(n - i), \qquad n = 0, \ldots, 39. \qquad (52)$$
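Equations (51) and (52) can be illustrated with the following floating-point sketch. The function names are hypothetical, and the filtered codevector is assumed to have nonzero energy.

```python
def filtered_codevector(c, h):
    """z(n) = sum_{i=0}^{n} c(i) * h(n - i), equation (52)."""
    return [sum(c[i] * h[n - i] for i in range(n + 1))
            for n in range(len(c))]


def fixed_codebook_gain(x2, z):
    """g_c = (x2 . z) / (z . z), equation (51)."""
    numerator = sum(a * b for a, b in zip(x2, z))
    denominator = sum(value * value for value in z)  # assumed nonzero
    return numerator / denominator
```

This is the least-squares gain: it minimizes the energy of the difference between the target and the scaled filtered codevector.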

10.2 kbit/s mode

In this case the signal $b(n)$, used for presetting the amplitudes, is given by equation (50). Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an efficient non‑exhaustive analysis‑by‑synthesis search technique. In this technique, the term $A_k$ in equation (43) is tested for a small percentage of position combinations.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre‑filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed‑loop fractional pitch lag of the subframe, and $\beta$ is the quantized pitch gain of the previous subframe bounded by [0.0,0.8]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre‑filter $F_E(z)$. That is, for values of $T$ less than 40, the impulse response is modified according to equation (50a).

The fixed codebook gain is then found by equation (51).

7.95, 7.40 kbit/s modes

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an efficient non‑exhaustive analysis‑by‑synthesis search technique. In this technique, the term $A_k$ in equation (43) is tested for a small percentage of position combinations.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre‑filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed‑loop fractional pitch lag of the subframe, and $\beta$ is the quantized pitch gain of the previous subframe bounded by [0.0,0.8]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre‑filter $F_E(z)$. That is, for values of $T$ less than 40, the impulse response is modified according to equation (50a).

The fixed codebook gain is then found by equation (51).

6.70 kbit/s mode

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an efficient non‑exhaustive analysis‑by‑synthesis search technique. In this technique, the term $A_k$ in equation (43) is tested for a small percentage of position combinations.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre‑filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed‑loop fractional pitch lag of the subframe, and $\beta$ is the quantized pitch gain of the previous subframe bounded by [0.0,0.8]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre‑filter $F_E(z)$. That is, for values of $T$ less than 40, the impulse response is modified according to equation (50a).

The fixed codebook gain is then found by equation (51).

5.90 kbit/s mode

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an exhaustive analysis‑by‑synthesis search technique.
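An exhaustive search of this kind can be sketched as follows, using the sign-modified quantities of equations (48) and (49). The candidate position lists per pulse are hypothetical inputs (the actual track layout is defined with the codebook structure and is not reproduced here), the energy is assumed positive, and the function name is illustrative.

```python
from itertools import product


def exhaustive_search(candidates, d_signed, phi_signed):
    """Return the position tuple maximizing C^2 / E_D over all combinations."""
    best_positions = None
    best_corr_sq, best_energy = -1.0, 1.0
    for positions in product(*candidates):
        corr = sum(d_signed[m] for m in positions)            # equation (48)
        energy = sum(phi_signed[m][m] for m in positions)     # equation (49)
        for i in range(len(positions) - 1):
            for j in range(i + 1, len(positions)):
                energy += 2.0 * phi_signed[positions[i]][positions[j]]
        # Compare the two ratios by cross-multiplication to avoid a division.
        if corr * corr * best_energy > best_corr_sq * energy:
            best_positions = positions
            best_corr_sq, best_energy = corr * corr, energy
    return best_positions
```

With only two pulses the number of combinations is small enough that every one can be tested, which is why no pruning is needed in this sketch.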

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre‑filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed‑loop fractional pitch lag of the subframe, and $\beta$ is the quantized pitch gain of the previous subframe bounded by [0.0,0.8]. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre‑filter $F_E(z)$. That is, for values of $T$ less than 40, the impulse response is modified according to equation (50a).

The fixed codebook gain is then found by equation (51).

5.15, 4.75 kbit/s modes

In this case the signal $b(n)$, used for presetting the amplitudes, is equal to the signal $d(n)$. Having preset the pulse amplitudes, as explained above, the optimal pulse positions are determined using an exhaustive analysis‑by‑synthesis search technique. Note that both subsets are searched.

A special feature incorporated in the codebook is that the selected codevector is filtered through an adaptive pre‑filter $F_E(z)$ which enhances special spectral components in order to improve the synthesized speech quality. Here the filter $F_E(z) = 1/(1 - \beta z^{-T})$ is used, where $T$ is the nearest integer pitch lag to the closed‑loop fractional pitch lag of the subframe, and $\beta$ is the quantized pitch gain, bounded by [0.0,0.8], of the previous subframe for the 5.15 kbit/s mode and of the previous odd subframe for the 4.75 kbit/s mode. Note that prior to the codebook search, the impulse response $h(n)$ must include the pre‑filter $F_E(z)$. That is, for values of $T$ less than 40, the impulse response is modified according to equation (50a).

The fixed codebook gain is then found by equation (51).