4 Outline description
3GPP TS 26.290, Release 17: Audio codec processing functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions
This TS is structured as follows:
Section 4.1 contains a functional description of the audio parts including the A/D and D/A functions. Section 4.2 describes the input format for the AMR-WB+ encoder and the output format for the AMR-WB+ decoder. Section 4.3 presents a simplified description of the principles of the AMR-WB+ codec. In subclause 4.4, the sequence and subjective importance of encoded parameters are given.
Section 5 presents the functional description of the encoding functions of the AMR-WB+ extension modes, whereas clause 6 describes the decoding procedures for the extension modes. In section 7, the detailed bit allocation of the AMR-WB+ codec extension modes is tabulated. The AMR-WB speech modes are functionally unchanged, as is their bit allocation. Detailed information on them is found in [1].
4.1 Functional description of audio parts
The analogue‑to‑digital and digital‑to‑analogue conversion will in principle comprise the elements given below. In case of stereo codec operation, the given principles will be applied to the 2 available audio channels.
1) Analogue to uniform digital PCM
– microphone;
– input level adjustment device;
– input anti‑aliasing filter;
– sample‑hold device sampling at 16/24/32/48 kHz;
– analogue‑to‑uniform digital conversion to 16‑bit representation.
The uniform format shall be represented in two’s complement.
2) Uniform digital PCM to analogue
– conversion from 16‑bit uniform PCM sampled at 16/24/32/48 kHz to analogue;
– a hold device;
– reconstruction filter including x/sin(x) correction;
– output level adjustment device;
– earphone or loudspeaker.
In the terminal equipment, the A/D function may be achieved by direct conversion to 16‑bit uniform PCM format.
For the D/A operation, the inverse operations take place.
4.2 Preparation of input samples
The encoder is fed with data from one or two input channels comprising samples with a resolution of 16 bits in a 16‑bit word. The decoder outputs data in the same format and with the same number of channels. Mono output of decoded stereo signals is also supported.
4.3 Principles of the extended adaptive multi-rate wideband codec
The AMR-WB+ audio codec contains all the AMR-WB speech codec modes 1-9 and AMR-WB VAD and DTX. AMR-WB+ extends the AMR-WB codec by adding TCX, bandwidth extension, and stereo.
The AMR-WB+ audio codec processes input frames of 2048 samples at an internal sampling frequency Fs. The internal sampling frequency is limited to the range 12800-38400 Hz; see section 8 for more details. The 2048-sample frames are split into two critically sampled frequency bands of equal width. This results in two superframes of 1024 samples each, corresponding to the low frequency (LF) and high frequency (HF) bands. Each superframe is divided into four 256-sample frames.
Sampling at the internal sampling rate is obtained by using a variable sampling conversion scheme, which re-samples the input signal.
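The frame arithmetic above can be sketched as follows. This is only an illustration of the sample counts and durations stated in the text; the function and constant names are hypothetical and do not come from the specification or its reference code.

```python
# Illustrative helper; names are hypothetical, not taken from the specification.
SUPERFRAME_INPUT_SAMPLES = 2048   # input samples per superframe at internal rate Fs
BANDS = 2                         # LF and HF, each critically sampled
FRAMES_PER_SUPERFRAME = 4


def frame_layout(fs_hz):
    """Return sample counts and durations for one superframe at internal rate fs_hz."""
    if not 12800 <= fs_hz <= 38400:
        raise ValueError("internal sampling frequency must lie in 12800-38400 Hz")
    band_samples = SUPERFRAME_INPUT_SAMPLES // BANDS         # 1024 samples per band
    frame_samples = band_samples // FRAMES_PER_SUPERFRAME    # 256 samples per frame
    superframe_ms = 1000.0 * SUPERFRAME_INPUT_SAMPLES / fs_hz
    return {
        "band_samples": band_samples,
        "frame_samples": frame_samples,
        "superframe_ms": superframe_ms,
        "frame_ms": superframe_ms / FRAMES_PER_SUPERFRAME,
    }


if __name__ == "__main__":
    for fs in (25600, 32000):
        print(fs, frame_layout(fs))
```

Because each band is critically sampled at Fs/2, a 256-sample frame covers 512 samples at the internal rate, which is why the frame duration shortens as Fs increases.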
The LF and HF signals are then encoded using two different approaches: the LF is encoded and decoded using the "core" encoder/decoder, based on switched ACELP and transform coded excitation (TCX). In ACELP mode, the standard AMR-WB codec is used. The HF signal is encoded with relatively few bits (16 bits/frame) using a bandwidth extension (BWE) method.
The basic set of rates is built from the AMR-WB rates with bandwidth extension added. The basic set of mono rates is shown in Table 1.
Table 1: Basic set of mono rates

| Mono rate (incl. BWE) (bits/frame) | Corresponding AMR-WB mode |
|---|---|
| 208 | NA |
| 240 | NA |
| 272 | 12.65 |
| 304 | 14.25 |
| 336 | 15.85 |
| 384 | 18.25 |
| 416 | 19.85 |
| 480 | 23.05 |
Note that in the ACELP mode of operation, compared to AMR-WB, the VAD bit is removed, 2 bits per frame are added for gain prediction, and 2 bits are added for signalling the frame encoding type, for a net addition of 3 bits per frame. Note also that 16 bits/frame are always used for bandwidth extension (to encode the HF band). The first two basic mono rates are similar to the other rates except that they use a fixed codebook with 20 bits or 28 bits, respectively.
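The relation between an AMR-WB mode and its mono rate in Table 1 can be checked with a short calculation. The sketch below assumes that the AMR-WB bits per frame equal the mode bit rate multiplied by a 20 ms frame (a property of AMR-WB that is not stated in this clause); the function name is hypothetical.

```python
# Bit accounting for the ACELP-based mono rates of Table 1.
# Assumption: an AMR-WB frame lasts 20 ms, so bits/frame = kbit/s * 20.
AMR_WB_MODES_KBPS = (12.65, 14.25, 15.85, 18.25, 19.85, 23.05)

VAD_BITS_REMOVED = 1
GAIN_PREDICTION_BITS = 2
FRAME_TYPE_BITS = 2
BWE_BITS = 16


def mono_rate_from_amr_wb(mode_kbps):
    """Return the AMR-WB+ mono rate (bits/frame, incl. BWE) for an AMR-WB mode."""
    amr_wb_bits = round(mode_kbps * 20)  # e.g. 12.65 kbit/s -> 253 bits
    return (amr_wb_bits - VAD_BITS_REMOVED + GAIN_PREDICTION_BITS
            + FRAME_TYPE_BITS + BWE_BITS)


if __name__ == "__main__":
    for mode in AMR_WB_MODES_KBPS:
        print(mode, mono_rate_from_amr_wb(mode))
```

Under that assumption, each AMR-WB mode gains 3 + 16 = 19 bits per frame, which reproduces the six AMR-WB-based entries of Table 1.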
For stereo coding, the set of stereo extension rates given in Table 2 is used.
Table 2: Basic set of stereo rates

| Stereo extension rates (incl. BWE) (bits/frame) | |
|---|---|
| 40 | 104 |
| 48 | 112 |
| 56 | 120 |
| 64 | 128 |
| 72 | 136 |
| 80 | 144 |
| 88 | 152 |
| 96 | 160 |
Note that the bandwidth extension is applied to both channels, which requires an additional 16 bits/frame for the stereo extension.
A certain mode of operation is obtained by choosing a rate from Table 1, in case of mono operation, or by combining a rate from Table 1 with a stereo extension rate from Table 2, in case of stereo operation. The resulting coding bit-rate is (mono rate + stereo rate) × Fs / 512 bits per second, with the rates in bits/frame and Fs in Hz.
Examples:
– For an internal sampling frequency of 32 kHz, choosing a mono rate of 384 bits/frame without stereo gives a bit-rate of 24 kbps and a frame duration of 16 ms.
– For an internal sampling frequency of 25.6 kHz, choosing a mono rate of 272 bits/frame and a stereo rate of 88 bits/frame gives a bit-rate of 18 kbps and a frame duration of 20 ms.
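The two examples can be reproduced with a few lines of code. This is a minimal sketch of the bit-rate relation given above; the function name and the validation against Tables 1 and 2 are illustrative choices, not part of the specification.

```python
# Rates in bits/frame, taken from Table 1 and Table 2.
MONO_RATES = (208, 240, 272, 304, 336, 384, 416, 480)
STEREO_RATES = tuple(range(40, 161, 8))   # 40, 48, ..., 160


def coding_bitrate_bps(fs_hz, mono_rate, stereo_rate=0):
    """Return (mono rate + stereo rate) * Fs / 512 in bits per second.

    stereo_rate=0 denotes mono operation.
    """
    if mono_rate not in MONO_RATES:
        raise ValueError("mono rate is not listed in Table 1")
    if stereo_rate != 0 and stereo_rate not in STEREO_RATES:
        raise ValueError("stereo rate is not listed in Table 2")
    return (mono_rate + stereo_rate) * fs_hz / 512


if __name__ == "__main__":
    print(coding_bitrate_bps(32000, 384))       # first example, mono
    print(coding_bitrate_bps(25600, 272, 88))   # second example, stereo
```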
Note. The documentation of the AMR-WB+ floating-point C-code in [4] contains further information on how to use the executables compiled from this source code to exercise the various possible uses, in the codec, of mono bit rate, stereo bit rate and internal sampling frequency, and the resulting total bit rates.
4.3.1 Encoding and decoding structure
Figure 1 presents the AMR-WB+ encoder structure. The input signal is separated into two bands. The first band is the low-frequency (LF) signal, which is critically sampled at Fs/2. The second band is the high-frequency (HF) signal, which is also downsampled to obtain a critically sampled signal. The LF and HF signals are then encoded using two different approaches: the LF signal is encoded and decoded using the "core" encoder/decoder, based on switched ACELP and transform coded excitation (TCX). In ACELP mode, the standard AMR-WB codec is used. The HF signal is encoded with relatively few bits using a bandwidth extension (BWE) method.
The parameters transmitted from encoder to decoder are the mode selection bits, the LF parameters and the HF parameters. The parameters for each 1024-sample superframe are decomposed into four packets of identical size.
When the input signal is stereo, the left and right channels are combined into a mono signal for ACELP/TCX encoding, whereas the stereo encoding receives both input channels.
Figure 2 presents the AMR-WB+ decoder structure. The LF and HF bands are decoded separately after which they are combined in a synthesis filterbank. If the output is restricted to mono only, the stereo parameters are omitted and the decoder operates in mono mode.
Figure 1: High-level structure of AMR-WB+ encoder
Figure 2: High-level structure of AMR-WB+ decoder
4.3.2 LP analysis and synthesis in low-frequency band
The AMR-WB+ codec applies LP analysis for both the ACELP and TCX modes when encoding the LF signal. The LP coefficients are interpolated linearly at every 64-sample sub-frame. The LP analysis window is a half-cosine of length 384 samples.
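As a rough illustration of the two operations named above, the sketch below builds a 384-sample window and interpolates a parameter vector over the four 64-sample sub-frames of a 256-sample frame. The window formula, the interpolation weights and the function names are assumptions made for this example; the normative window shape and the domain in which the LP parameters are interpolated are defined in the encoder clauses, not here.

```python
import math

LP_WINDOW_LENGTH = 384   # samples, from the text
SUBFRAME_LENGTH = 64     # samples, from the text
FRAME_LENGTH = 256       # samples per ACELP/TCX frame


def half_cosine_window(length=LP_WINDOW_LENGTH):
    """Half-period sine-shaped window; this exact formula is an assumption."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def interpolate_lp_parameters(previous, current):
    """Linearly interpolate one parameter vector per 64-sample sub-frame.

    Assumed weights: 1/4, 2/4, 3/4 and 1 towards the current frame's vector.
    """
    subframes = FRAME_LENGTH // SUBFRAME_LENGTH   # 4 sub-frames per frame
    vectors = []
    for i in range(subframes):
        weight = (i + 1) / subframes
        vectors.append([(1.0 - weight) * p + weight * c
                        for p, c in zip(previous, current)])
    return vectors


if __name__ == "__main__":
    window = half_cosine_window()
    print(len(window), window[0], window[LP_WINDOW_LENGTH // 2])
    print(interpolate_lp_parameters([0.0, 1.0], [4.0, 1.0]))
```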
4.3.3 ACELP and TCX coding
To encode the core mono signal (0-Fs/4 kHz band), the AMR-WB+ codec utilises either ACELP or TCX coding for each frame. The coding mode is selected using a closed-loop analysis-by-synthesis method. ACELP uses only 256-sample frames (as in AMR-WB), whereas frames of 256, 512 or 1024 samples are possible in TCX mode.
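To give a feel for the search space of the closed-loop selection, the following sketch enumerates the ways a 1024-sample superframe can be tiled with ACELP and TCX frames. It assumes that a 512-sample TCX frame always occupies the first or the second half of the superframe; that alignment is an assumption of this example rather than something stated in this clause, and all names are hypothetical.

```python
from itertools import product

# Number of 256-sample frames covered by each coding block.
FRAMES_COVERED = {"ACELP": 1, "TCX256": 1, "TCX512": 2, "TCX1024": 4}


def half_superframe_options():
    """Layouts for one 512-sample half: one TCX512 block, or two 256-sample
    frames that are each independently ACELP or TCX256."""
    options = [("TCX512",)]
    options.extend(product(("ACELP", "TCX256"), repeat=2))
    return options


def superframe_layouts():
    """All layouts of a 1024-sample superframe under the stated assumption."""
    layouts = [("TCX1024",)]
    for first, second in product(half_superframe_options(), repeat=2):
        layouts.append(first + second)
    return layouts


if __name__ == "__main__":
    layouts = superframe_layouts()
    print(len(layouts))
    for layout in layouts[:5]:
        print(layout)
```

Under that assumption the enumeration yields 26 layouts (5 × 5 combinations of the two halves plus the single TCX1024 case), which is the number of candidates an exhaustive closed-loop search would have to compare.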
ACELP encoding and decoding are similar to those of the standard AMR-WB speech codec. The ACELP coding consists of LTP analysis and synthesis and algebraic codebook excitation. The ACELP coding mode is used in AMR-WB operation within the AMR-WB+ codec.
In TCX mode the perceptually weighted signal is processed in the transform domain. The Fourier transformed weighted signal is quantised using split multi-rate lattice quantisation (algebraic VQ). The transform is calculated over windows of 1024, 512 or 256 samples. The excitation signal is recovered by inverse filtering the quantised weighted signal through the inverse weighting filter (same weighting filter as in AMR-WB).
4.3.4 Coding of high-frequency band
Whereas the LF signal (0-Fs/4 kHz band) is encoded using the previously described switched ACELP/TCX encoding approach, the HF signal is encoded using a low-rate parametric bandwidth extension (BWE) approach. Only gains and spectral envelope information are transmitted in the BWE approach used to encode the HF signal.
The bandwidth extension is done separately for the left and right channels in stereo operation.
4.3.5 Stereo coding
In the case of stereo coding, a band decomposition similar to that of the mono case is used. The two channels L and R are decomposed into LF and HF signals. The LF signals of the two channels are down-mixed to form an LF mono signal (0-Fs/4 kHz band). This mono signal is encoded separately by the core codec.
The LF part of the two channels is further decomposed into two bands (0-5Fs/128 kHz band) and (5Fs/128-Fs/4 kHz band). The lower of these, the very low frequency (VLF) band, is critically down-sampled, and the side signal is computed. The resulting signal is semi-parametrically encoded in the frequency domain using the algebraic VQ. The frequency domain encoding is performed in closed loop by choosing among 40-, 80- and 160-sample frame lengths.
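The band edges scale with the internal sampling frequency. The sketch below evaluates them for a given Fs; the function name and the dictionary keys are hypothetical, and the HF band is taken as the upper half of the spectrum (Fs/4 to Fs/2) that results from splitting the input into two equal critically sampled bands. With Fs given in Hz, the edges are returned in Hz.

```python
def stereo_band_edges_hz(fs_hz):
    """Return (lower, upper) edges in Hz for the bands used in stereo coding."""
    vlf_upper = 5 * fs_hz / 128   # upper edge of the very low frequency band
    lf_upper = fs_hz / 4          # upper edge of the LF (core) band
    return {
        "vlf": (0.0, vlf_upper),
        "upper_lf": (vlf_upper, lf_upper),
        "hf": (lf_upper, fs_hz / 2),
    }


if __name__ == "__main__":
    for fs in (12800, 25600, 38400):
        print(fs, stereo_band_edges_hz(fs))
```

For example, at Fs = 25600 Hz the VLF band spans 0 to 1000 Hz and the LF band ends at 6400 Hz.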
The high frequency part of the LF signals (Midband) is parametrically encoded. In the decoder, the parametric model is applied to the mono signal excitation in order to restore the high frequency part of the original LF part of the two channels.
The HF parts of the two channels are encoded using the parametric BWE described in 4.3.4.
4.3.6 Low complexity operation
In the low complexity operation (use case B) the decision on the usage of ACELP and TCX mode is done in an open-loop manner. This approach introduces computational savings in the encoder.
4.3.7 Frame erasure concealment
When missing packets occur at the receiver, the decoder applies concealment. The concealment algorithm depends on the mode of the correctly received packets preceding and following the missing packet. Concealment uses either time-domain coefficient extrapolation, as in AMR-WB, or frequency-domain interpolation for some of the TCX modes.
4.3.8 Bit allocation
The bit allocation for the different parameters in the low-frequency band coding (Core) (0-Fs/4 kHz band) is shown in Tables 3, 4, 5, and 6. Note that there are two mode bits sent in each 256-sample packet. These mode bits are not shown in the bit allocation tables. The bit allocations for the stereo part are shown in Tables 7, 8, and 9. Note that there are also two additional mode bits for the VLF stereo encoder, which are not shown in the bit allocation. The bit allocation for the stereo HF part is by definition that of the bandwidth extension, as presented in Tables 7, 8 and 9.
Tables 1 and 2 show the total bits per 256-sample packet, including mode bits.
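Table 3: Bit allocations for ACELP core rates including BWE (per frame)

| Parameter | Number of bits | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Mode bits | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| ISF Parameters | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 |
| Mean Energy | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Pitch Lag | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| Pitch Filter | 4 × 1 | 4 × 1 | 4 × 1 | 4 × 1 | 4 × 1 | 4 × 1 | 4 × 1 | 4 × 1 |
| Fixed-codebook Indices | 4 × 20 | 4 × 28 | 4 × 36 | 4 × 44 | 4 × 52 | 4 × 64 | 4 × 72 | 4 × 88 |
| Codebook Gains | 4 × 7 | 4 × 7 | 4 × 7 | 4 × 7 | 4 × 7 | 4 × 7 | 4 × 7 | 4 × 7 |
| HF ISF Parameters | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| HF gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Total in bits | 208 | 240 | 272 | 304 | 336 | 384 | 416 | 480 |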
Table 4: Bit allocations for 256-sample TCX window (Core)

| Parameter | Number of bits | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Mode bits | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| ISF Parameters | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 |
| Noise factor | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Global Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Algebraic VQ | 134 | 166 | 198 | 230 | 262 | 310 | 342 | 406 |
| HF ISF Parameters | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| HF gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Total in bits | 208 | 240 | 272 | 304 | 336 | 384 | 416 | 480 |
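Table 5: Bit allocations for 512-sample TCX window (Core)

| Parameter | Number of bits | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Mode bits | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 |
| ISF Parameters | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 |
| Noise factor | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Global Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Gain redundancy | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Algebraic VQ | 318 | 382 | 446 | 510 | 574 | 670 | 734 | 862 |
| HF ISF Parameters | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| HF gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| HF Gain correction | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 |
| Total in bits | 416 | 480 | 544 | 608 | 672 | 768 | 832 | 960 |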
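Table 6: Bit allocations for 1024-sample TCX window (Core)

| Parameter | Number of bits | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Mode bits | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 |
| ISF Parameters | 46 | 46 | 46 | 46 | 46 | 46 | 46 | 46 |
| Noise factor | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Global Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Gain redundancy | 3+3+3 | 3+3+3 | 3+3+3 | 3+3+3 | 3+3+3 | 3+3+3 | 3+3+3 | 3+3+3 |
| Algebraic VQ | 695 | 823 | 951 | 1079 | 1207 | 1399 | 1527 | 1783 |
| HF ISF Parameters | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| HF gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| HF Gain correction | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 |
| Total in bits | 832 | 960 | 1088 | 1216 | 1344 | 1536 | 1664 | 1920 |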
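The totals in Tables 3 to 6 can be cross-checked by summing the rows. The sketch below does so and also confirms that a 512-sample or 1024-sample TCX frame carries exactly two or four times the bits of a 256-sample frame at the same rate, consistent with packets of identical size. The data layout and function name are illustrative only.

```python
MONO_RATES = [208, 240, 272, 304, 336, 384, 416, 480]   # Table 1, bits/frame

# For each table: frames covered, rate-independent rows, rate-dependent row, totals.
CORE_TABLES = {
    "Table 3 (ACELP)": {
        "frames": 1,
        "fixed": [2, 46, 2, 30, 4 * 1, 4 * 7, 9, 7],
        "variable": [4 * bits for bits in (20, 28, 36, 44, 52, 64, 72, 88)],
        "totals": [208, 240, 272, 304, 336, 384, 416, 480],
    },
    "Table 4 (TCX 256)": {
        "frames": 1,
        "fixed": [2, 46, 3, 7, 9, 7],
        "variable": [134, 166, 198, 230, 262, 310, 342, 406],
        "totals": [208, 240, 272, 304, 336, 384, 416, 480],
    },
    "Table 5 (TCX 512)": {
        "frames": 2,
        "fixed": [2 + 2, 46, 3, 7, 6, 9, 7, 8 * 2],
        "variable": [318, 382, 446, 510, 574, 670, 734, 862],
        "totals": [416, 480, 544, 608, 672, 768, 832, 960],
    },
    "Table 6 (TCX 1024)": {
        "frames": 4,
        "fixed": [2 + 2 + 2 + 2, 46, 3, 7, 3 + 3 + 3, 9, 7, 16 * 3],
        "variable": [695, 823, 951, 1079, 1207, 1399, 1527, 1783],
        "totals": [832, 960, 1088, 1216, 1344, 1536, 1664, 1920],
    },
}


def computed_totals(table):
    """Sum the rate-independent rows with each rate-dependent entry."""
    fixed_sum = sum(table["fixed"])
    return [fixed_sum + bits for bits in table["variable"]]


if __name__ == "__main__":
    for name, table in CORE_TABLES.items():
        rows_ok = computed_totals(table) == table["totals"]
        packets_ok = table["totals"] == [table["frames"] * r for r in MONO_RATES]
        print(name, "rows sum to totals:", rows_ok, "| equal packets:", packets_ok)
```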
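Table 7: Bit allocations for stereo encoder for 256-sample window

| Parameter | Number of bits | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mode bits | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Global Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Unused bits | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Midband | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| Algebraic VQ | 1 | 9 | 17 | 25 | 33 | 41 | 49 | 51 | 59 | 67 | 75 | 83 | 91 | 99 | 107 | 115 |
| HF ISF Parameters | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| HF gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Total in bits | 40 | 48 | 56 | 64 | 72 | 80 | 88 | 96 | 104 | 112 | 120 | 128 | 136 | 144 | 152 | 160 |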
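Table 8: Bit allocations for stereo encoder for 512-sample window

| Parameter | Number of bits | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mode bits | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 | 2+2 |
| Global Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Unused bits | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 | 1+1 |
| Midband | 6×2 | 6×2 | 6×2 | 6×2 | 6×2 | 6×2 | 6×2 | 12×2 | 12×2 | 12×2 | 12×2 | 12×2 | 12×2 | 12×2 | 12×2 | 12×2 |
| Algebraic VQ | 16 | 32 | 48 | 64 | 80 | 96 | 112 | 116 | 132 | 148 | 164 | 180 | 196 | 212 | 228 | 244 |
| HF ISF Parameters | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| HF gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| HF Gain correction | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 | 8 × 2 |
| Total in bits | 80 | 96 | 112 | 128 | 144 | 160 | 176 | 192 | 208 | 224 | 240 | 256 | 272 | 288 | 304 | 320 |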
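Table 9: Bit allocations for stereo encoder for 1024-sample window

| Parameter | Number of bits | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mode bits | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 | 2+2+2+2 |
| Global Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Unused bits | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 | 1+1+1+1 |
| Midband | 6×4 | 6×4 | 6×4 | 6×4 | 6×4 | 6×4 | 6×4 | 12×4 | 12×4 | 12×4 | 12×4 | 12×4 | 12×4 | 12×4 | 12×4 | 12×4 |
| Algebraic VQ | 46 | 78 | 110 | 142 | 174 | 206 | 238 | 246 | 278 | 310 | 342 | 374 | 406 | 438 | 470 | 502 |
| HF ISF Parameters | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| HF gain | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| HF Gain correction | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 | 16 × 3 |
| Total in bits | 160 | 192 | 224 | 256 | 288 | 320 | 352 | 384 | 416 | 448 | 480 | 512 | 544 | 576 | 608 | 640 |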
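In Tables 7 to 9 the Midband row takes one of two values depending on the stereo rate. Which value applies to which rate follows from the totals, since every other row is known. The sketch below derives the Midband bits per rate by subtraction, assuming that each total is the plain sum of the rows above it; the function and variable names are hypothetical.

```python
STEREO_TABLES = {
    "Table 7 (256)": {
        # Mode, Global Gain, Gain, Unused, HF ISF, HF gain
        "fixed": [2, 7, 7, 1, 9, 7],
        "vq": [1, 9, 17, 25, 33, 41, 49, 51, 59, 67, 75, 83, 91, 99, 107, 115],
        "totals": list(range(40, 161, 8)),
    },
    "Table 8 (512)": {
        # As above, plus HF Gain correction (8 x 2)
        "fixed": [2 + 2, 7, 7, 1 + 1, 9, 7, 8 * 2],
        "vq": [16, 32, 48, 64, 80, 96, 112, 116, 132, 148, 164, 180, 196, 212,
               228, 244],
        "totals": list(range(80, 321, 16)),
    },
    "Table 9 (1024)": {
        # As above, with HF Gain correction (16 x 3)
        "fixed": [2 + 2 + 2 + 2, 7, 7, 1 + 1 + 1 + 1, 9, 7, 16 * 3],
        "vq": [46, 78, 110, 142, 174, 206, 238, 246, 278, 310, 342, 374, 406,
               438, 470, 502],
        "totals": list(range(160, 641, 32)),
    },
}


def infer_midband_bits(table):
    """Midband bits per rate = total - rate-independent rows - algebraic VQ bits."""
    fixed_sum = sum(table["fixed"])
    return [total - fixed_sum - vq
            for total, vq in zip(table["totals"], table["vq"])]


if __name__ == "__main__":
    for name, table in STEREO_TABLES.items():
        print(name, infer_midband_bits(table))
```

With the figures in the tables, the seven lowest stereo rates use the smaller Midband allocation and the nine highest rates use the larger one, in all three window sizes.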