4 Outline description
3GPP TS 26.190: Speech codec speech processing functions; Adaptive Multi‑Rate ‑ Wideband (AMR‑WB) speech codec; Transcoding functions
This TS is structured as follows:
Subclause 4.1 contains a functional description of the audio parts including the A/D and D/A functions. Subclause 4.2 describes the input format for the AMR‑WB encoder and the output format for the AMR‑WB decoder. Subclauses 4.3 and 4.4 present a simplified description of the principles of the AMR‑WB codec encoding and decoding process respectively. In subclause 4.5, the sequence and subjective importance of encoded parameters are given.
Clause 5 presents the functional description of the AMR‑WB codec encoding, whereas clause 6 describes the decoding procedures. In clause 7, the detailed bit allocation of the AMR‑WB codec is tabulated. Clause 8 describes the homing operation.
4.1 Functional description of audio parts
The analogue‑to‑digital and digital‑to‑analogue conversion will in principle comprise the following elements:
1) Analogue to uniform digital PCM
– microphone;
– input level adjustment device;
– input anti‑aliasing filter;
– sample‑hold device sampling at 16 kHz;
– analogue‑to‑uniform digital conversion to 14‑bit representation.
The uniform format shall be represented in two’s complement.
2) Uniform digital PCM to analogue
– conversion from 14‑bit/16 kHz uniform PCM to analogue;
– a hold device;
– reconstruction filter including x/sin( x ) correction;
– output level adjustment device;
– earphone or loudspeaker.
In the terminal equipment, the A/D function may be achieved by direct conversion to 14‑bit uniform PCM format.
For the D/A operation, the inverse operations take place.
4.2 Preparation of speech samples
The encoder is fed with data comprising samples with a resolution of 14 bits, left justified in a 16‑bit word. The decoder outputs data in the same format. Outside the speech codec, further processing must be applied if the traffic data occurs in a different representation.
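As an illustration of the sample format, the following sketch packs a signed 14‑bit sample into a 16‑bit two's complement word and unpacks it again. The function names are hypothetical, and the treatment of the two unused least significant bits (written as zero, ignored on reading) is an assumption rather than something stated in this subclause.

```python
def pack_sample(value_14bit: int) -> int:
    """Left-justify a signed 14-bit sample in a 16-bit two's complement word."""
    if not -8192 <= value_14bit <= 8191:
        raise ValueError("sample does not fit in 14 bits")
    # Shifting by two places moves the sample into bits 15..2.
    # Assumption: the two unused low bits are written as zero.
    return (value_14bit << 2) & 0xFFFF


def unpack_sample(word_16bit: int) -> int:
    """Recover the signed 14-bit sample from a left-justified 16-bit word."""
    word_16bit &= 0xFFFC      # assumption: the two low bits carry no data
    if word_16bit & 0x8000:   # sign bit set: interpret as two's complement
        word_16bit -= 0x10000
    return word_16bit >> 2
```

A round trip such as `unpack_sample(pack_sample(-1))` returns the original sample value.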
4.3 Principles of the adaptive multi‑rate wideband speech encoder
The AMR‑WB codec consists of nine source codecs with bitrates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85 and 6.60 kbit/s.
The codec is based on the code‑excited linear predictive (CELP) coding model. The input signal is pre‑emphasized using the filter H_{pre‑emph}(z) = 1 − μ z^{−1}. The CELP model is then applied to the pre‑emphasized signal. A 16th order linear prediction (LP), or short‑term, synthesis filter is used which is given by:
H(z) = \frac{1}{Â(z)} = \frac{1}{1 + \sum_{i=1}^{m} â_{i} z^{−i}} , ( 1 )
where â_{i},i=1,…,m are the (quantized) linear prediction (LP) parameters, and m = 16 is the predictor order. The long‑term, or pitch, synthesis filter is usually given by:
\frac{1}{B(z)} = \frac{1}{1 − g_{p} z^{−T}} , ( 2 )
where T is the pitch delay and g_{p} is the pitch gain. The pitch synthesis filter is implemented using the so‑called adaptive codebook approach.
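The three filters described above can be sketched as direct difference equations. This is an illustrative floating‑point sketch, not the codec's implementation. It assumes the conventional forms H_{pre‑emph}(z) = 1 − μ z^{−1}, Â(z) = 1 + Σ â_{i} z^{−i} and B(z) = 1 − g_{p} z^{−T}, zero initial filter states, and an integer pitch delay. The function names and the parameter `mu` are hypothetical.

```python
def preemphasis(x, mu):
    """Apply 1 - mu*z^-1: y(n) = x(n) - mu * x(n-1)."""
    y, prev = [], 0.0
    for sample in x:
        y.append(sample - mu * prev)
        prev = sample
    return y


def lp_synthesis(excitation, a_hat):
    """Apply 1/A^(z): s(n) = u(n) - sum_{i=1..m} a_hat[i-1] * s(n-i)."""
    out = []
    for n, u in enumerate(excitation):
        acc = u
        for i, coef in enumerate(a_hat, start=1):
            if n - i >= 0:            # zero initial state assumed
                acc -= coef * out[n - i]
        out.append(acc)
    return out


def pitch_synthesis(x, gain, delay):
    """Apply 1/B(z): y(n) = x(n) + g_p * y(n-T), integer T only."""
    out = []
    for n, sample in enumerate(x):
        feedback = gain * out[n - delay] if n >= delay else 0.0
        out.append(sample + feedback)
    return out
```

For example, feeding a unit impulse through `pitch_synthesis` with a delay of T samples produces a decaying pulse train with period T, which is the periodicity the adaptive codebook models.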
The CELP speech synthesis model is shown in Figure 1. In this model, the excitation signal at the input of the short‑term LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed (innovative) codebooks. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short‑term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis‑by‑synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure.
The perceptual weighting filter used in the analysis‑by‑synthesis search technique is given by:
W(z) = A(z/γ_{1}) H_{de‑emph}(z) , ( 3 )
where A(z) is the unquantized LP filter, H_{de‑emph}(z) = \frac{1}{1 − 0.68 z^{−1}}, and γ_{1} = 0.92 is the perceptual weighting factor. The weighting filter uses the unquantized LP parameters.
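Applying a weighting factor γ to an LP filter amounts to evaluating A(z/γ), which scales the i‑th coefficient by γ^i. The sketch below shows only that coefficient scaling; it assumes the coefficient list starts with the leading 1 of A(z), and the function name is hypothetical. Any further filtering stage of the weighting filter is not covered.

```python
def weighted_lp(a, gamma=0.92):
    """Return the coefficients of A(z/gamma) given those of A(z).

    `a` is [1, a_1, ..., a_m]; coefficient a_i is multiplied by gamma**i,
    which widens the formant bandwidths of the resulting filter.
    """
    return [coef * gamma ** i for i, coef in enumerate(a)]
```

With γ = 1 the filter is unchanged; values below 1 progressively flatten the spectral envelope of A(z).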
The encoder performs the analysis of the LPC, LTP and fixed codebook parameters at 12.8 kHz sampling rate. The coder operates on speech frames of 20 ms. At each frame, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebooks’ indices and gains). In addition to these parameters, high‑band gain indices are computed in 23.85 kbit/s mode. These parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.
The signal flow at the encoder is shown in Figure 2. After decimation, high‑pass and pre‑emphasis filtering is performed. LP analysis is performed once per frame. The set of LP parameters is converted to immittance spectrum pairs (ISP) and vector quantized using split‑multistage vector quantization (S‑MSVQ). The speech frame is divided into 4 subframes of 5 ms each (64 samples at 12.8 kHz sampling rate). The adaptive and fixed codebook parameters are transmitted every subframe. The quantized and unquantized LP parameters or their interpolated versions are used depending on the subframe. An open‑loop pitch lag is estimated in every other subframe or once per frame based on the perceptually weighted speech signal.
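The frame and subframe sizes follow directly from the sampling rates and durations given above. The following sketch works through that arithmetic and splits one frame into its subframes; the constant and function names are hypothetical.

```python
INPUT_RATE_HZ = 16_000      # sampling rate at the codec input
INTERNAL_RATE_HZ = 12_800   # rate used for LP, LTP and fixed codebook analysis
FRAME_MS = 20
SUBFRAMES_PER_FRAME = 4

input_frame_len = INPUT_RATE_HZ * FRAME_MS // 1000   # samples per input frame
frame_len = INTERNAL_RATE_HZ * FRAME_MS // 1000      # samples per frame at 12.8 kHz
subframe_len = frame_len // SUBFRAMES_PER_FRAME      # samples per 5 ms subframe


def split_frame(frame):
    """Split one 12.8 kHz frame into consecutive, equally sized subframes."""
    if len(frame) != frame_len:
        raise ValueError("expected exactly one 20 ms frame")
    return [frame[i:i + subframe_len] for i in range(0, frame_len, subframe_len)]
```

Here `frame_len` evaluates to 256 and `subframe_len` to 64, matching the subframe size stated above.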
Then the following operations are repeated for each subframe:
– The target signal x(n) is computed by filtering the LP residual through the weighted synthesis filter with the initial states of the filters having been updated by filtering the error between LP residual and excitation (this is equivalent to the common approach of subtracting the zero input response of the weighted synthesis filter from the weighted speech signal).
– The impulse response, h(n), of the weighted synthesis filter is computed.
– Closed‑loop pitch analysis is then performed (to find the pitch lag and gain), using the target x(n) and impulse response h(n), by searching around the open‑loop pitch lag. Fractional pitch with 1/4 or 1/2 sample resolution (depending on the mode and the pitch lag value) is used. The interpolating filter in the fractional pitch search has a low‑pass frequency response. Further, there are two potential low‑pass characteristics in the adaptive codebook, and this information is encoded with 1 bit.
– The target signal x(n) is updated by removing the adaptive codebook contribution (filtered adaptive codevector), and this new target, x_{2}(n), is used in the fixed algebraic codebook search (to find the optimum innovation).
– The gains of the adaptive and fixed codebook are vector quantized with 6 or 7 bits (with moving average (MA) prediction applied to the fixed codebook gain).
– Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.
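The target update in the subframe loop can be sketched as follows: a codevector is filtered through h(n), a gain is chosen, and the scaled contribution is subtracted from x(n) to give x_{2}(n). The unconstrained least‑squares gain used here is an assumption made for illustration; the codec's actual gain bounds, fractional pitch interpolation and gain quantization are not shown, and all function names are hypothetical.

```python
def filtered(codevector, h):
    """Convolve a codevector with h(n), truncated to the subframe length."""
    return [
        sum(codevector[k] * h[n - k] for k in range(n + 1) if n - k < len(h))
        for n in range(len(codevector))
    ]


def least_squares_gain(x, y):
    """Gain g minimizing sum((x(n) - g*y(n))**2); assumed, unclipped."""
    energy = sum(b * b for b in y)
    if energy == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / energy


def update_target(x, y, gain):
    """x2(n) = x(n) - gain * y(n): remove the adaptive codebook contribution."""
    return [a - gain * b for a, b in zip(x, y)]
```

The updated target returned by `update_target` is what the fixed codebook search would then try to match.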
The bit allocation of the AMR‑WB codec modes is shown in Table 1. In each 20 ms speech frame, 132, 177, 253, 285, 317, 365, 397, 461 and 477 bits are produced, corresponding to a bitrate of 6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05 or 23.85 kbit/s. More detailed bit allocation among the codec parameters is given in tables 12a‑12i. Note that the most significant bits (MSB) are always sent first.
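Table 1: Bit allocation of the AMR‑WB coding algorithm for 20 ms frame

| Mode | Parameter | 1st subframe | 2nd subframe | 3rd subframe | 4th subframe | Total per frame |
|---|---|---|---|---|---|---|
| 23.85 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | LTP‑filtering | 1 | 1 | 1 | 1 | 4 |
| | Pitch delay | 9 | 6 | 9 | 6 | 30 |
| | Algebraic code | 88 | 88 | 88 | 88 | 352 |
| | Codebook gain | 7 | 7 | 7 | 7 | 28 |
| | HB‑energy | 4 | 4 | 4 | 4 | 16 |
| | Total | | | | | 477 |
| 23.05 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | LTP‑filtering | 1 | 1 | 1 | 1 | 4 |
| | Pitch delay | 9 | 6 | 9 | 6 | 30 |
| | Algebraic code | 88 | 88 | 88 | 88 | 352 |
| | Gains | 7 | 7 | 7 | 7 | 28 |
| | Total | | | | | 461 |
| 19.85 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | LTP‑filtering | 1 | 1 | 1 | 1 | 4 |
| | Pitch delay | 9 | 6 | 9 | 6 | 30 |
| | Algebraic code | 72 | 72 | 72 | 72 | 288 |
| | Codebook gain | 7 | 7 | 7 | 7 | 28 |
| | Total | | | | | 397 |
| 18.25 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | LTP‑filtering | 1 | 1 | 1 | 1 | 4 |
| | Pitch delay | 9 | 6 | 9 | 6 | 30 |
| | Algebraic code | 64 | 64 | 64 | 64 | 256 |
| | Gains | 7 | 7 | 7 | 7 | 28 |
| | Total | | | | | 365 |
| 15.85 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | LTP‑filtering | 1 | 1 | 1 | 1 | 4 |
| | Pitch delay | 9 | 6 | 9 | 6 | 30 |
| | Algebraic code | 52 | 52 | 52 | 52 | 208 |
| | Gains | 7 | 7 | 7 | 7 | 28 |
| | Total | | | | | 317 |
| 14.25 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | LTP‑filtering | 1 | 1 | 1 | 1 | 4 |
| | Pitch delay | 9 | 6 | 9 | 6 | 30 |
| | Algebraic code | 44 | 44 | 44 | 44 | 176 |
| | Gains | 7 | 7 | 7 | 7 | 28 |
| | Total | | | | | 285 |
| 12.65 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | LTP‑filtering | 1 | 1 | 1 | 1 | 4 |
| | Pitch delay | 9 | 6 | 9 | 6 | 30 |
| | Algebraic code | 36 | 36 | 36 | 36 | 144 |
| | Gains | 7 | 7 | 7 | 7 | 28 |
| | Total | | | | | 253 |
| 8.85 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 46 |
| | Pitch delay | 8 | 5 | 8 | 5 | 26 |
| | Algebraic code | 20 | 20 | 20 | 20 | 80 |
| | Gains | 6 | 6 | 6 | 6 | 24 |
| | Total | | | | | 177 |
| 6.60 kbit/s | VAD‑flag | | | | | 1 |
| | ISP | | | | | 36 |
| | Pitch delay | 8 | 5 | 5 | 5 | 23 |
| | Algebraic code | 12 | 12 | 12 | 12 | 48 |
| | Gains | 6 | 6 | 6 | 6 | 24 |
| | Total | | | | | 132 |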
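The per‑frame totals in Table 1 and the resulting bitrates can be cross‑checked with a few lines of arithmetic. In the sketch below the per‑parameter totals are transcribed from Table 1; the dictionary layout, the key names (with "Gains" used for every mode) and the function names are hypothetical.

```python
FRAMES_PER_SECOND = 50  # one frame every 20 ms

# Total bits per frame for each parameter, transcribed from Table 1.
TABLE_1 = {
    "23.85": {"VAD-flag": 1, "ISP": 46, "LTP-filtering": 4, "Pitch delay": 30,
              "Algebraic code": 352, "Gains": 28, "HB-energy": 16},
    "23.05": {"VAD-flag": 1, "ISP": 46, "LTP-filtering": 4, "Pitch delay": 30,
              "Algebraic code": 352, "Gains": 28},
    "19.85": {"VAD-flag": 1, "ISP": 46, "LTP-filtering": 4, "Pitch delay": 30,
              "Algebraic code": 288, "Gains": 28},
    "18.25": {"VAD-flag": 1, "ISP": 46, "LTP-filtering": 4, "Pitch delay": 30,
              "Algebraic code": 256, "Gains": 28},
    "15.85": {"VAD-flag": 1, "ISP": 46, "LTP-filtering": 4, "Pitch delay": 30,
              "Algebraic code": 208, "Gains": 28},
    "14.25": {"VAD-flag": 1, "ISP": 46, "LTP-filtering": 4, "Pitch delay": 30,
              "Algebraic code": 176, "Gains": 28},
    "12.65": {"VAD-flag": 1, "ISP": 46, "LTP-filtering": 4, "Pitch delay": 30,
              "Algebraic code": 144, "Gains": 28},
    "8.85": {"VAD-flag": 1, "ISP": 46, "Pitch delay": 26,
             "Algebraic code": 80, "Gains": 24},
    "6.60": {"VAD-flag": 1, "ISP": 36, "Pitch delay": 23,
             "Algebraic code": 48, "Gains": 24},
}


def frame_bits(mode):
    """Sum the per-parameter bits of one mode to get the bits per frame."""
    return sum(TABLE_1[mode].values())


def bitrate_bps(mode):
    """Bits per frame times frames per second gives the bitrate in bit/s."""
    return frame_bits(mode) * FRAMES_PER_SECOND
```

For instance, the 12.65 kbit/s mode sums to 253 bits per frame, and 253 bits × 50 frames/s gives 12 650 bit/s.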
4.4 Principles of the adaptive multi‑rate speech decoder
The signal flow at the decoder is shown in Figure 3. At the decoder, the transmitted indices are extracted from the received bitstream. The indices are decoded to obtain the coder parameters at each transmission frame. These parameters are the ISP vector, the 4 fractional pitch lags, the 4 LTP filtering parameters, the 4 innovative codevectors, and the 4 sets of vector quantized pitch and innovative gains. In 23.85 kbit/s mode, the high‑band gain index is also decoded. The ISP vector is converted to the LP filter coefficients and interpolated to obtain LP filters at each subframe. Then, at each 64‑sample subframe:
– The excitation is constructed by adding the adaptive and innovative codevectors scaled by their respective gains.
– The 12.8 kHz speech is reconstructed by filtering the excitation through the LP synthesis filter.
– The reconstructed speech is de‑emphasized.
Finally, the reconstructed speech is upsampled to 16 kHz and a high‑band speech signal is added in the frequency band from 6 kHz to 7 kHz.
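Two of the per‑subframe decoder steps above can be sketched in a few lines: building the excitation from the two scaled codevectors, and de‑emphasizing the synthesized signal. This is a floating‑point illustration only. The de‑emphasis filter is assumed to be the inverse of a first‑order pre‑emphasis 1 − μ z^{−1}, the parameter `mu` and the function names are hypothetical, and the LP synthesis, upsampling and high‑band steps are omitted.

```python
def build_excitation(adaptive, innovative, pitch_gain, code_gain):
    """u(n) = g_p * v(n) + g_c * c(n) for one subframe."""
    return [pitch_gain * v + code_gain * c for v, c in zip(adaptive, innovative)]


def deemphasis(x, mu, state=0.0):
    """Apply 1/(1 - mu*z^-1): y(n) = x(n) + mu * y(n-1).

    `state` is the last output sample of the previous subframe, so the
    filter can be run continuously across subframe boundaries.
    """
    out, prev = [], state
    for sample in x:
        prev = sample + mu * prev
        out.append(prev)
    return out
```

Passing the last output sample of one subframe as `state` for the next keeps the de‑emphasis continuous across subframes.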
4.5 Sequence and subjective importance of encoded parameters
The encoder will produce the output information in a unique sequence and format, and the decoder must receive the same information in the same way. In tables 12a‑12i, the sequence of output bits and the bit allocation for each parameter are shown.
The different parameters of the encoded speech and their individual bits have unequal importance with respect to subjective quality. The output and input frame formats for the AMR wideband speech codec are given in [2], where a reordering of bits takes place.