5.8 Algebraic codebook
26.1903GPPAdaptive Multi-Rate - Wideband (AMR-WB) speech codecSpeech codec speech processing functionsTranscoding functionsTS
5.8.1 Codebook structure
The codebook structure is based on interleaved single-pulse permutation (ISPP) design. The 64 positions in the codevector are divided into 4 tracks of interleaved positions, with 16 positions in each track. The different codebooks at the different rates are constructed by placing a certain number of signed pulses in the tracks (from 1 to 6 pulses per track). The codebook index, or codeword, represents the pulse positions and signs in each track. Thus, no codebook storage is needed, since the excitation vector at the decoder can be constructed through the information contained in the index itself (no lookup tables).
An important feature of the used codebook is that it is a dynamic codebook consisting of an algebraic codebook followed by an adaptive prefilter F(z) which enhances special spectral components in order to improve the synthesis speech quality. A prefilter relevant to wideband signals is used whereby F(z) consists of two parts: a periodicity enhancement part 1/(1-0.85z–T) and a tilt part (1 – 1 z-1), where T is the integer part of the pitch lag and 1 is related to the voicing of the previous subframe and is bounded by [0.0,0.5]. The codebook search is performed in the algebraic domain by combining the filter F(z) with the weighed synthesis filter prior to the coddedbook search. Thus, the impulse response h(n) must be modified to include the prefilter F(z). That is, .
The codebook structures of different bit rates are given below.
5.8.1.1 23.85 and 23.05 kbit/s mode
In this codebook, the innovation vector contains 24 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 4 tracks, where each track contains six pulses, as shown in Table 4.
Table 4. Potential positions of individual pulses in the algebraic codebook, 23.85 and 23.05 kbit/s
Track |
Pulse |
Positions |
1 |
i0, i4, i8, i12, i16, i20 |
0, 4, 8, 12, 16, 20, 24, 28, 32 36, 40, 44, 48, 52, 56, 60 |
2 |
i1, i5, i9, i13, i17, i21 |
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61 |
3 |
i2, i6, i10, i14, i18, i22 |
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62 |
4 |
i3, i7, i11, i15, i19, i23 |
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63 |
The six pulses in one track are encoded with 22 bits.
This gives a total of 88 bits (22+22+22+22) for the algebraic code.
5.8.1.2 19.85 kbit/s mode
In this codebook, the innovation vector contains 18 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 4 tracks, where each of the first two tracks contains five pulses and each of the other tracks contains four pulses, as shown in Table 5.
Table 5. Potential positions of individual pulses in the algebraic codebook, 19.85 kbit/s
Track |
Pulse |
Positions |
1 |
i0, i4, i8, i12, i16 |
0, 4, 8, 12, 16, 20, 24, 28, 32 36, 40, 44, 48, 52, 56, 60 |
2 |
i1, i5, i9, i13, i17 |
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61 |
3 |
i2, i6, i10, i14 |
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62 |
4 |
i3, i7, i11, i15 |
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63 |
The five pulses in one track are encoded with 20 bits. The four pulses in one track is encoded with 16 bits.
This gives a total of 72 bits (20+20+16+16) for the algebraic code.
5.8.1.3 18.25 kbit/s mode
In this codebook, the innovation vector contains 16 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 4 tracks, where each track contains four pulses, as shown in Table 6.
Table 6. Potential positions of individual pulses in the algebraic codebook, 18.25 kbit/s
Track |
Pulse |
Positions |
1 |
i0, i4, i8, i12 |
0, 4, 8, 12, 16, 20, 24, 28, 32 36, 40, 44, 48, 52, 56, 60 |
2 |
i1, i5, i9, i13 |
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61 |
3 |
i2, i6, i10, i14 |
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62 |
4 |
i3, i7, i11, i15 |
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63 |
The four pulses in one track are encoded with 16 bits.
This gives a total of 64 bits (16+16+16+16) for the algebraic code.
5.8.1.4 15.85 kbit/s mode
In this codebook, the innovation vector contains 12 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 4 tracks, where each track contains three pulses, as shown in Table 7.
Table 7. Potential positions of individual pulses in the algebraic codebook, 15.85 kbit/s
Track |
Pulse |
Positions |
1 |
i0, i4, i8 |
0, 4, 8, 12, 16, 20, 24, 28, 32 36, 40, 44, 48, 52, 56, 60 |
2 |
i1, i5, i9 |
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61 |
3 |
i2, i6, i10 |
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62 |
4 |
i3, i7, i11 |
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63 |
The three pulses in one track are encoded with 13 bits.
This gives a total of 52 bits (13+13+13+13) for the algebraic code.
5.8.1.5 14.25 kbit/s mode
In this codebook, the innovation vector contains 10 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 4 tracks, where each track contains two or three pulses, as shown in Table 8.
Table 8. Potential positions of individual pulses in the algebraic codebook, 14.25 kbit/s
Track |
Pulse |
Positions |
1 |
i0, i4, i8 |
0, 4, 8, 12, 16, 20, 24, 28, 32 36, 40, 44, 48, 52, 56, 60 |
2 |
i1, i5, i9 |
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61 |
3 |
i2, i6 |
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62 |
4 |
i3, i7 |
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63 |
Each two pulse positions in one track are encoded with 8 bits (4 bits for the position of every pulse), and the sign of the first pulse in the track is encoded with 1 bit.
The three pulse in one track are encoded with 13 bits.
This gives a total of 44 bits (13+13+9+9) for the algebraic code.
5.8.1.6 12.65 kbit/s mode
In this codebook, the innovation vector contains 8 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 4 tracks, where each track contains two pulses, as shown in Table 9.
Table 9. Potential positions of individual pulses in the algebraic codebook, 12.65 kbit/s
Track |
Pulse |
Positions |
1 |
i0, i4 |
0, 4, 8, 12, 16, 20, 24, 28, 32 36, 40, 44, 48, 52, 56, 60 |
2 |
i1, i5 |
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61 |
3 |
i2, i6 |
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62 |
4 |
i3, i7 |
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63 |
Each two pulse positions in one track are encoded with 8 bits (total of 32 bits, 4 bits for the position of every pulse), and the sign of the first pulse in the track is encoded with 1 bit (total of 4 bits). This gives a total of 36 bits for the algebraic code.
5.8.1.7 8.85 kbit/s mode
In this codebook, the innovation vector contains 4 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 4 tracks, where each track contains one pulse, as shown in Table 10.
Table 10. Potential positions of individual pulses in the algebraic codebook, 8.85 kbit/s
Track |
Pulse |
Positions |
1 |
i0 |
0, 4, 8, 12, 16, 20, 24, 28, 32 36, 40, 44, 48, 52, 56, 60 |
2 |
i1 |
1, 5, 9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49, 53, 57, 61 |
3 |
i2 |
2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62 |
4 |
i3 |
3, 7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47, 51, 55, 59, 63 |
Each pulse position in one track are encoded with 4 bits and the sign of the pulse in the track is encoded with 1 bit. This gives a total of 20 bits for the algebraic code.
5.8.1.8 6.60 kbit/s mode
In this codebook, the innovation vector contains 2 non‑zero pulses. All pulses can have the amplitudes +1 or ‑1. The 64 positions in a subframe are divided into 2 tracks, where each track contains one pulse, as shown in Table 11.
Table 11. Potential positions of individual pulses in the algebraic codebook, 6.60 kbit/s
Track |
Pulse |
Positions |
1 |
i0 |
0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62 |
2 |
i1 |
1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63 |
Each pulse position in one track are encoded with 5 bits and the sign of the pulse in the track is encoded with 1 bit. This gives a total of 12 bits for the algebraic code.
5.8.2 Pulse indexing
In the above section, the number of bits needed to encode a number of pulses in a track was given. In this section, the procedures used for encoding from 1 to 6 pulses per track will be described. The description will be given for the case of 4 tracks per subframe, with 16 positions per track and pulse spacing of 4 (which is the case for all modes except the 6.6 kbit/s mode).
Encoding 1 signed pulse per track
The pulse position index is encoded with 4 bits and the sign index with 1 bit. The position index is given by the pulse position in the subframe divided by the pulse spacing (integer division). The division remainder gives the track index. For example, a pulse at position 31 has a position index of 31/4 = 7 and it belong to the track with index 3 (4th track).
The sign index here is set to 0 for positive signs and 1 for negative signs.
The index of the signed pulse is given by
I1p= p +s2M
where p is the position index, s is the sign index, and M=4 is the number of bits per track.
Encoding 2 signed pulses per track
In case of two pulses per track of K=2M potential positions (here M=4), each pulse needs 1 bit for the sign and M bits for the position, which gives a total of 2M+2 bits. However, some redundancy exists due to the unimportance of the pulse ordering. For example, placing the first pulse at position p and the second pulse at position q is equivalent to placing the first pulse at position q and the second pulse at position p. One bit can be saved by encoding only one sign and deducing the second sign from the ordering of the positions in the index. Here the index is given by
I2p = p1 + p02M + s22M
where s is the sign index of the pulse at position index p0. If the two signs are equal then the smaller position is set to p0 and the larger position is set to p1. On the other hand, of the two signs are not equal then the larger position is set to p0 and the smaller position is set to p1. At the decoder, the sign of the pulse at position p0 is readily available. The second sign is deduced from the pulse ordering. If p0 is larger than p1 then the sign of the pulse at position p1 is opposite to that at position p0. If this is not the case then the two signs are set equal
Encoding 3 signed pulses per track
In case of three pulses per track, similar logic can be used as in the case of two pulses. For a track with 2M positions, 3M+1 bits are needed instead of 3M+3 bits. A simple way of indexing the pulses is to divide the track positions in two sections (or halves) and identify a section that contains at least two pulses. The number of positions in the section is K/2 = 2M/2 = 2M-1, which can be represented with M-1 bits. The two pulses in the section containing at least two pulses are encoded with the procedure for encoding 2 signed pulses which requires 2(M-1)+1 bits and the remaining pulse which can be anywhere in the track (in either section) is encoded with the M+1 bits. Finally, the index of the section that contains the two pulses is encoded with 1 bit. Thus the total number of required bits is 2(M-1)+1 + M+1 + 1 = 3M+1.
A simple way of checking if two pulses are positioned in the same section is done by checking whether the most significant bits (MSB) of their position indices are equal or not. Note that a MSB of 0 means that the position belongs to the lower half of the track (0-7) and MSB of 1 means it belongs to the upper half (8-15). If the two pulses belong to the upper half, they need to be shifted to the range (0-7) before encoding them using 23+1 bits. This can be done by masking the M-1 least significant bits (LSB) with a mask consisting of M-1 ones (which corresponds to the number 7 in this case).
The index of the 3 signed pulses is given by
I3p = I2p +k22M-1+ I1p22M
where I2p is the index of the two pulses in the same section, k is the section index (0 or 1), and I1p is the index of the third pulse in the track.
Encoding 4 signed pulses per track
The 4 signed pulses in a track of length K=2M can be encoded using 4M bits. Similar to the case of 3 pulses, the K positions in the track are divided into 2 sections (two halves) where each section contains K/2=8 positions. Here we denote the sections as Section A with positions 0 to K/2-1 and Section B with positions K/2 to K-1. Each section can contain from 0 to 4 pulses. The table below shows the 5 cases representing the possible number of pulses in each section:
case |
Pulses in Section A |
Pulses in Section B |
Bits needed |
0 |
0 |
4 |
4M-3 |
1 |
1 |
3 |
4M-2 |
2 |
2 |
2 |
4M-2 |
3 |
3 |
1 |
4M-2 |
4 |
4 |
0 |
4M-3 |
In cases 0 or 4, the 4 pulses in a section of length K/2=2M-1 can be encoded using 4(M-1)+1=4M-3 bits (this will be explained later on).
In cases 1 or 3, the 1 pulse in a section of length K/2=2M-1 can be encoded with M-1+1 = M bits and the 3 pulses in the other section can be encoded with 3(M-1)+1 = 3M-2 bits. This gives a total of M+3M-2 = 4M-2 bits.
In case 2, the pulses in a section of length K/2=2M-1 can be encoded with 2(M-1)+1 = 2M–1 bits. Thus for both sections, 2(2M–1) = 4M–2 bits are required.
Now the case index can be encoded with 2 bits (4 possible cases) assuming cases 0 and 4 are combined. Then for cases 1, 2, or 3, the number of needed bits is 4M-2. This gives a total of 4M-2 + 2 = 4M bits. For cases 0 or 4, one bit is needed for identifying either case, and 4M-3 bits are needed for encoding the 4 pulses in the section. Adding the 2 bits needed for the general case, this gives a total of 1+4M-3+2= 4M bits.
The index of the 4 signed pulses is given by
I4p = IAB + k24M-2
where k is the case index (2 bits), and IAB is the index of the pulses in both sections for each individual case.
For cases 0 and 1, IAB is given by
IAB_0,4 = I4p_section + j24M-3
where j is a 1-bit index identifying the section with 4 pulses and I4p_section is the index of the 4 pulses in that section (which requires 4M-3 bits).
For case 1, IAB is given by
IAB_1 = I3p_B + I1p_A 23(M-1)+1
where I3p_B is the index of the 3 pulses in Section B (3(M-1)+1 bits) and I1p_A is the index of the pulse in Section A ((M-1)+1 bits).
For case 2, IAB is given by
IAB_2 = I2p_B + I2p_A 22(M-1)+1
where I2p_B is the index of the 2 pulses in Section B (2(M-1)+1 bits) and I2p_A is the index of the two pulses in Section A (2(M-1)+1 bits).
Finally, for case 3, IAB is given by
IAB_3 = I1p_B + I3p_A 2M
where I1p_B is the index of the pulse in Section B ((M-1)+1 bits) and I3p_A is the index of the 3 pulses in Section A (3(M-1)+1 bits).
For cases 0 and 4, it was mentioned that the 4 pulses in one section are encoded using 4(M-1)+1 bits. This is done by further dividing the section into 2 subsections of length K/4=2M-2 (=4 in this case); identifying a subsection that contains at least 2 pulses; coding the 2 pulses in that subsection using 2(M-2)+1=2M-3 bits; coding the index of the subsection that contains at least 2 pulses using 1 bit; and coding the remaining 2 pulses, assuming that they can be anywhere in the section, using 2(M-1)+1=2M-1 bits. This gives a total of (2M-3)+(1)+(2M-1) = 4M-3 bits
Encoding 5 signed pulses per track
The 5 signed pulses in a track of length K=2M can be encoded using 5M bits. Similar to the case of 4 pulses, the K positions in the track are divided into 2 sections A and B. Each section can contain from 0 to 5 pulses. A simple approach to encode the 5 pulses is to identify a section that contains at least 3 pulses and to encode the 3 pulses in that section using 3(M-1)+1= 3M-2 bits, and to encode the remaining 2 pulses in the whole track using 2M+1 bits. This gives 5M-1 bits. An extra bit is needed to identify the section that contains at least 3 pulses. Thus a total of 5M bits are needed to encode the 5 signed pulses.
The index of the 5 signed pulses is given by
I5p = I2p + I3p22M + k25M-1
Where k is the index of the section that contains at least 3 pulses, I3p is the index of the 3 pulses in that section (3(M-1)+1 bits), and I2p is the index of the remaining 2 pulses in the track (2M+1 bits).
Encoding 6 signed pulses per track
The 6 signed pulses in a track of length K=2M are encoded using 6M-2 bits. Similar to the case of 5 pulses, the K positions in the track are divided into 2 sections A and B. Each section can contain from 0 to 6 pulses. The table below shows the 7 cases representing the possible number of pulses in each sections:
case |
Pulses in Section A |
Pulses in Section B |
Bits needed |
0 |
0 |
6 |
6M-5 |
1 |
1 |
5 |
6M-5 |
2 |
2 |
4 |
6M-5 |
3 |
3 |
3 |
6M-4 |
4 |
4 |
2 |
6M-5 |
5 |
5 |
1 |
6M-5 |
6 |
6 |
0 |
6M-5 |
Note that cases 0 and 6 are similar except that the 6 pulses are in different section. Similarly, cases 1 and 5 as well as cases 2 and 4 differ only in the section that contains more pulses. Therefore these cases can be coupled and an extra bit can be assigned to identify the section that contains more pulses. Since these cases initially need 6M-5 bits, the coupled cases need 6M-4 bits taking into account the Section bit. Thus, we have now 4 states of coupled cases, that is (0,6), (1,5), (2,4), and (3),with 2 extra bits needed for the state. This gives a total of 6M-4+2=6M-2 bits for the 6 signed pulses.
In cases 0 and 6, 1 bit is needed to identify the section which contains 6 pulses. 5 pulses in that section are encoded using 5(M-1) bits (since the pulses are confined to that section), and the remaining pulse is encoded using (M-1)+1 bits. Thus a total of 1+5(M-1)+M=6M-4 bits are needed for this coupled case. Extra 2 bits are needed to encode the state of the coupled case, giving a total of 6M-2 bits. For this coupled case, the index of the 6 pulses is given by
I6p = I1p + I5p2M+ j26M-5 + k26M-4
where k is the index of the coupled case (2 bits), j is the index of the section containing 6 pulses (1 bit), I5p is the index of 5 pulses in that section (5(M-1) bits), and I1p is the index of the remaining pulse in that section ((M-1)+1 bits).
In cases 1 and 5, 1 bit is needed to identify the section which contains 5 pulses. The 5 pulses in that section are encoded using 5(M-1) bits and the pulse in the other section is encoded using (M-1)+1 bits. For this coupled case, the index of the 6 pulses is given by
I6p = I1p + I5p2M+ j26M-5 + k26M-4
where k is the index of the coupled case (2 bits), j is the index of the section containing 5 pulses (1 bit), I5p is the index of the 5 pulses in that section (5(M-1) bits), and I1p is the index of the pulse in the other section ((M-1)+1 bits).
In cases 2 or 4, 1 bit is needed to identify the section which contains 4 pulses. The 4 pulses in that section are encoded using 4(M-1) bits and the 2 pulses in the other section are encoded using 2(M-1)+1 bits. For this coupled case, the index of the 6 pulses is given by
I6p = I2p + I4p22(M-1)+1 + j26M-5 + k26M-4
where k is the index of the coupled case (2 bits), j is the index of the section containing 4 pulses (1 bit), I4p is the index of 4 pulses in that section (4(M-1) bits), and I2p is the index of the 2 pulses in the other section (2(M-1)+1 bits).
In case 3, the 3 pulses in each section are encoded using 3(M-1)+1 bits in each Section. For this case, the index of the 6 pulses is given by
I6p = I3pB + I3pA23(M-1)+1 + k26M-4
where k is the index of the coupled case (2 bits), I3pB is the index of 3 pulses Section B (3(M-1)+1 bits), and I3pA is the index of the 3 pulses in Section A (3(M-1)+1 bits).
5.8.3 Codebook search
The algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesis speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is
( 40 )
where is the filtered adaptive codebook vector and gp is the unquantized adaptive codebook gain.
The matrix H is defined as the lower triangular Toeplitz convolution matrix with diagonal h(0) and lower diagonals h(1),…,h(63), and is the correlation between the target signal x2(n) and the impulse response h(n) (also known as the backward filtered target vector), and is the matrix of correlations of h(n).
The elements of the vector d are computed by
( 41 )
and the elements of the symmetric matrix are computed by
( 42 )
If ck is the algebraic codevector at index k, then the algebraic codebook is searched by maximizing the search criterion
(43)
The vector d and the matrix are usually computed prior to the codebook search.
The algebraic structure of the codebooks allows for very fast search procedures since the innovation vector ck contains only a few nonzero pulses. The correlation in the numerator of Equation (43) is given by
( 44 )
where mi is the position of the ith pulse, ai is its amplitude, and Np is the number of pulses. The energy in the denominator of Equation (43) is given by
( 45 )
To simplify the search procedure, the pulse amplitudes are predetermined based on a certain reference signal b(n). In this so-called signal-selected pulse amplitude approach, the sign of a pulse at position i is set equal to the sign of the reference signal at that position. Here, the reference signal b(n) is given by
(46)
where is the energy of the signal d(n) and is the energy of the signal which is the residual signal after long term prediction. The scaling factor controls the amount of dependence of the reference signal on d(n), and it is lowered as the bit rate is increased. Here =2 for 6.6 and 8.85 modes; =1 for 12.65, 14.25, and 15.85 modes; =0.8 for 18.25 mode; =0.75 for 19.85 mode; and =0.5 for 23.05 and 23.85 modes.
To simplify the search the signal d(n) and matrix are modified to incorporate the pre-selected signs. Let sb(n) denote the vector containing the signs of b(n). The modified signal d’(n) is given by
n=0,…,N-1
and the modified autocorrelation matrix ’ is given by
, i=0,…,N-1; j=i,…,N-1.
The correlation at the numerator of the search criterion Qk is now given by
and the energy at the denominator of the search criterion Qk is given by
The goal of the search now is to determine the codevector with the best set of Np pulse positions assuming amplitudes of the pulses have been selected as described above. The basic selection criterion is the maximization of the above mentioned ratio Qk.
In order to reduce the search complexity, a fast search procedure known as depth-first tree search procedure is used, whereby the pulse positions are determined Nm pulses at a time. More precisely, the Np available pulses are partitioned into M non-empty subsets of Nm pulses respectively such that N1+N2…+Nm…+NM = Np. A particular choice of positions for the first J = N1+N2…+Nm-1 pulses considered is called a level-m path or a path of length J. The basic criterion for a path of J pulse positions is the ratio Qk(J) when only the J relevant pulses are considered.
The search begins with subset #1 and proceeds with subsequent subsets according to a tree structure whereby subset m is searched at the mth level of the tree. The purpose of the search at level 1 is to consider the N1 pulses of subset #1 and their valid positions in order to determine one, or a number of, candidate path(s) of length N1 which are the tree nodes at level l. The path at each terminating node of level m-1 is extended to length N1+N2…+Nm at level m by considering Nm new pulses and their valid positions. One, or a number of, candidate extended path(s) are determined to constitute level-m nodes. The best codevector corresponds to that path of length Np which maximizes the criterion Qk(Np) with respect to all level-M nodes.
A special form of the depth-first tree search procedure is used here, in which two pulses are searched at a time, that is, Nm=2, and these 2 pulses belong to two consecutive tracks. Further, instead of assuming that the matrix is precomputed and stored, which requires a memory of NN words (6464= 4k words), a memory-efficient approach is used which reduces the memory requirement. In this approach, the search procedure is performed in such a way that only a part of the needed elements of the correlation matrix are precomputed and stored. This part corresponds to the correlations of the impulse response corresponding to potential pulse positions in consecutive tracks, as well as the correlations corresponding to (j,j), j=0,…,N-1 (that is the elements of the main diagonal of matrix ).
In order to reduce complexity, while testing possible combinations of two pulses, a limited number of potential positions of the first pulse are tested. Further, in case of large number of pulses, some pulses in the higher levels of the search tree are fixed. In order to guess intelligently which potential pulse positions are considered for the first pulse or in order to fix some pulse positions, a "pulse-position likelihood-estimate vector" b is used, which is based on speech-related signals. The pth component b(p) of this estimate vector b characterizes the probability of a pulse occupying position p (p = 0, 1, … N-1) in the best codevector we are searching for. Here the estimate vector b is the same vector used for preselecting the amplitudes and given in Equation (46).
The search procedures for all bit rate modes are similar. Two pulses are searched at a time, and these two pulses always correspond to consecutive tracks. That is the two searched pulses are in tracks T0-T1, T1-T2, T2-T3, or T3-T0.
Before searching the positions, the sign of at pulse a potential position n is set the sign of b(n) at that position. Then the modified signal d’(n) is computed as described above by including the predetermined signs.
For the first 2 pulses (1st tree level), the correlation at the numerator of the search criterion is given by
.
and the energy at the denominator of the search criterion Qk is given by
where the correlations has been modified to include the preselected signs at positions mi and mj.
For subsequent levels, the numerator and denominator are updated by adding the contribution of two new pulses. Assuming that two new pulses at a certain tree level with positions mk and mk+1 from two consecutive tracks are searched, then the updated value of R is given by
(47)
and the updated energy is given by
(48)
where Rhv(m) is the correlation between the impulse response h(n) and a vector vh(n) containing the addition of delayed versions of impulse response at the previously determined positions. That is,
and
At each tree level, the values of Rhv(m) are computed online for all possible positions in each of the two tracks being tested. It can be seen from Equation (48) that only the correlations corresponding to pulse positions in two consecutive tracks need to be stored (41616 words), along with the correlations corresponding to the diagonal of the matrix (64 words). Thus the memory requirement in the present algebraic structure is 1088 words instead of 6464=4096 words.
The search procedures at the different bit rates modes are similar. The difference is in the number of pulses, and accordingly, the number of levels in the tree search. In order to keep a comparable search complexity across the different codebooks, the number of tested positions is kept similar.
The search in the 12.65 kbit/s mode will be described as an example. In this mode, 2 pulses are placed in each track giving a total of 8 pulses per subframe of length 64. Two pulses are searched at a time, and these two pulses always correspond to consecutive tracks. That is the two searched pulses are in tracks T0-T1, T1-T2, T2-T3, or T3-T0. The tree has 4 levels in this case. At the first level, pulse P0 is assigned to track T0 and pulse P1 to track T1. In this level, no search is performed and the two pulse positions are set to the maximum of b(n) in each track. In the second level, pulse P2 is assigned to track T2 and pulse P3 to track T3. 4 positions for pulse P2 are tested against all 16 positions of pulse P3. The 4 tested positions of P2 are determined based on the maxima of b(n) in the track. In the third level, pulse P4 is assigned to track T1 and pulse P5 to track T2. 8 positions for pulse P4 are tested against all 16 positions of pulse P5. Similar to the previous search level, the 8 tested positions of P4 are determined based on the maxima of b(n) in the track. In the fourth level, pulse P6 is assigned to track T3 and pulse P7 to track T0. 8 positions for pulse P6 are tested against all 16 positions of pulse P7. Thus the total number of tested combination is 416+816+816=320. The whole process is repeated 4 times (4 iterations) by assigning the pulses to different tracks. For example, in the 2nd iteration, pulses P0 to P7 are assigned to tracks T1, T2, T3, T0, T2, T3, T0, and T1, respectively. Thus the total number of tested position combinations is 4320=1280.
As another search example, in the 15.85 kbit/s mode, 3 pulses are placed in each track giving a total of 12 pulses. There are 6 levels in the tree search whereby two pulses are searched in each level. In the first two levels, 4 pulses are set to the maxima of b(n). In the subsequent 4 levels, the number of tested combinations are 416, 616, 816, and 816, respectively. 4 iterations are used giving a total of 42616=1664 combinations.