A.1 Encoder usage
3GPP TS 26.304, Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Floating-point ANSI-C code, Release 17
A.1.1 Simple mode
Simple mode is easy to use and requires no knowledge of AMR-WB+ to use the full capacity of the codec. The usage is as follows:
AmrwbPlusEncode -rate <bit rate> [-mono] [-ff <3gp|raw>] -if <infile.wav> -of <outfile.wb+>

Where:

| Argument | Description |
|---|---|
| AmrwbPlusEncode | Name of the AMR-WB+ encoder program, compiled either from the floating-point C-code of this specification or from the fixed-point C-code of [1]. |
| -rate | Bit rate: 6-36 kbps for mono encoding or 7-48 kbps for stereo encoding. |
| -mono | Forces mono encoding for stereo inputs. If this option is not used, the encoder performs mono encoding for mono WAV files and stereo encoding for stereo WAV files. |
| -ff | File format: 3gp or raw. The default is the 3gp file format. |
| -if | Input audio WAV file with one (mono) or two (stereo) channels. Supported audio sampling rates are 8, 16, 24, 32, 48, 11.025, 22.05, and 44.1 kHz. |
| -of | Output file (according to the -ff argument). |
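As an illustration of the usage line above, the following minimal Python sketch assembles a simple-mode command line. The helper name `simple_mode_args` and its parameters are hypothetical and not part of the codec package; only the option names and the documented bit rate ranges come from this clause. Whether the encoder itself rejects out-of-range values is not stated here, so the range check is an assumption of the sketch.

```python
def simple_mode_args(rate_kbps, infile, outfile, stereo_input,
                     force_mono=False, file_format="3gp"):
    """Build an argument list for the simple-mode usage line (hypothetical helper)."""
    # Mono encoding is used for mono inputs, or when -mono is forced on stereo inputs.
    mono = force_mono or not stereo_input
    low, high = (6, 36) if mono else (7, 48)  # documented ranges in kbps
    if not low <= rate_kbps <= high:
        raise ValueError(f"bit rate {rate_kbps} kbps is outside {low}-{high} kbps")
    if file_format not in ("3gp", "raw"):
        raise ValueError("file format must be '3gp' or 'raw'")

    args = ["AmrwbPlusEncode", "-rate", str(rate_kbps)]
    if force_mono:
        args.append("-mono")
    args += ["-ff", file_format, "-if", infile, "-of", outfile]
    return args


if __name__ == "__main__":
    # Stereo WAV input, forced to mono at 24 kbps, default 3gp output.
    print(" ".join(simple_mode_args(24, "in.wav", "out.wb+",
                                    stereo_input=True, force_mono=True)))
```

The returned list can be passed to a process launcher such as `subprocess.run` once the encoder binary has been built.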
The codec will use the best combination of mono and stereo bit rates and internal sampling frequency (ISF) according to the bit rate specified by the user. In this simple mode, the codec uses a set of predefined configurations covering the bit rate range from 6 to 48 kbps, and it chooses the configuration closest to the requested bit rate. The configurations have been chosen to optimize the quality/bandwidth trade-off at a given bit rate for 48 kHz sampled input.
Tables A.1 and A.2 show the default codec configurations for mono and stereo operation, respectively. The tables give the mapping from the selected bit rate to mode index and ISF index, together with the resulting bit rate factor and coded audio bandwidth (BW). The mode index (or frame type) is defined and explained in Table 25 of 26.290 [2] (modes 16 to 23 are mono modes and modes 24 to 47 are stereo modes). The mapping from ISF index to ISF, corresponding frame size and bit rate factor is defined in Table 24 of 26.290 [2].
Table A.1: Mono default configurations

| Bit rate (kbps) | Mode index (bit rate at nominal ISF of 25.6 kHz) | ISF index (sampling frequency) | Bit rate factor | BW (kHz) |
|---|---|---|---|---|
| 5.85 | 16 (10.4 kbps) | 2 (14.4 kHz) | 9/16 | 7.2 |
| 6.933 | 16 (10.4 kbps) | 4 (17.067 kHz) | 2/3 | 8.533 |
| 7.8 | 16 (10.4 kbps) | 5 (19.2 kHz) | 3/4 | 9.6 |
| 8.667 | 16 (10.4 kbps) | 6 (21.33 kHz) | 5/6 | 10.67 |
| 9.75 | 16 (10.4 kbps) | 7 (24 kHz) | 15/16 | 12 |
| 11.25 | 17 (12 kbps) | 7 (24 kHz) | 15/16 | 12 |
| 12 | 17 (12 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 13.6 | 18 (13.6 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 15.2 | 19 (15.2 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 17.1 | 19 (15.2 kbps) | 9 (28.8 kHz) | 9/8 | 14.4 |
| 18.9 | 20 (16.8 kbps) | 9 (28.8 kHz) | 9/8 | 14.4 |
| 21.6 | 21 (19.2 kbps) | 9 (28.8 kHz) | 9/8 | 14.4 |
| 24 | 21 (19.2 kbps) | 10 (32 kHz) | 5/4 | 16 |
| 26 | 22 (20.8 kbps) | 10 (32 kHz) | 5/4 | 16 |
| 30 | 23 (24 kbps) | 10 (32 kHz) | 5/4 | 16 |
| 32 | 23 (24 kbps) | 11 (34.13 kHz) | 4/3 | 17.06 |
| 33.75 | 23 (24 kbps) | 12 (36 kHz) | 45/32 | 18 |
| 36 | 23 (24 kbps) | 13 (38.4 kHz) | 3/2 | 19.2 |
Table A.2: Stereo default configurations

| Bit rate (kbps) | Mode index (bit rate at nominal ISF of 25.6 kHz) | ISF index (sampling frequency) | Bit rate factor | BW (kHz) |
|---|---|---|---|---|
| 6.975 | 24 (12.4 kbps) | 2 (14.4 kHz) | 9/16 | 7.2 |
| 8.267 | 24 (12.4 kbps) | 4 (17.067 kHz) | 2/3 | 8.533 |
| 9.3 | 24 (12.4 kbps) | 5 (19.2 kHz) | 3/4 | 9.6 |
| 10.33 | 24 (12.4 kbps) | 6 (21.33 kHz) | 5/6 | 10.67 |
| 11.65 | 24 (12.4 kbps) | 7 (24 kHz) | 15/16 | 12 |
| 13.125 | 26 (14 kbps) | 7 (24 kHz) | 15/16 | 12 |
| 14.25 | 28 (15.2 kbps) | 7 (24 kHz) | 15/16 | 12 |
| 15 | 29 (16 kbps) | 7 (24 kHz) | 15/16 | 12 |
| 16 | 29 (16 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 17.2 | 31 (17.2 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 18 | 32 (18 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 19.2 | 34 (19.2 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 20 | 35 (20 kbps) | 8 (25.6 kHz) | 1 | 12.8 |
| 22.5 | 35 (20 kbps) | 9 (28.8 kHz) | 9/8 | 14.4 |
| 23.85 | 37 (21.2 kbps) | 9 (28.8 kHz) | 9/8 | 14.4 |
| 25.2 | 38 (22.4 kbps) | 9 (28.8 kHz) | 9/8 | 14.4 |
| 27 | 40 (24 kbps) | 9 (28.8 kHz) | 9/8 | 14.4 |
| 30 | 40 (24 kbps) | 10 (32 kHz) | 5/4 | 16 |
| 32 | 40 (24 kbps) | 11 (34.13 kHz) | 4/3 | 17.06 |
| 34.13 | 41 (25.6 kbps) | 11 (34.13 kHz) | 4/3 | 17.06 |
| 36 | 41 (25.6 kbps) | 12 (36 kHz) | 45/32 | 18 |
| 37.6875 | 43 (26.8 kbps) | 12 (36 kHz) | 45/32 | 18 |
| 40.2 | 43 (26.8 kbps) | 13 (38.4 kHz) | 3/2 | 19.2 |
| 43.2 | 44 (28.8 kbps) | 13 (38.4 kHz) | 3/2 | 19.2 |
| 45 | 46 (30 kbps) | 13 (38.4 kHz) | 3/2 | 19.2 |
| 48 | 47 (32 kbps) | 13 (38.4 kHz) | 3/2 | 19.2 |
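The selection of the closest predefined configuration can be sketched as a nearest-neighbour lookup over the rows of Table A.1. This is only an illustration: the exact distance measure and tie-breaking used by the reference encoder are not described in this clause, so the absolute-difference rule below (with ties going to the lower bit rate) is an assumption, and the names `MONO_CONFIGS` and `closest_config` are hypothetical.

```python
# (bit rate in kbps, mode index, ISF index), copied from Table A.1.
MONO_CONFIGS = [
    (5.85, 16, 2), (6.933, 16, 4), (7.8, 16, 5), (8.667, 16, 6),
    (9.75, 16, 7), (11.25, 17, 7), (12.0, 17, 8), (13.6, 18, 8),
    (15.2, 19, 8), (17.1, 19, 9), (18.9, 20, 9), (21.6, 21, 9),
    (24.0, 21, 10), (26.0, 22, 10), (30.0, 23, 10), (32.0, 23, 11),
    (33.75, 23, 12), (36.0, 23, 13),
]


def closest_config(rate_kbps, configs=MONO_CONFIGS):
    """Return the table row whose bit rate is nearest to the requested rate.

    Assumption: "closest" means smallest absolute difference; min() keeps the
    first (lowest-rate) row when two rows are equally close.
    """
    return min(configs, key=lambda row: abs(row[0] - rate_kbps))


if __name__ == "__main__":
    for requested in (6, 20, 36):
        rate, mode_index, isf_index = closest_config(requested)
        print(f"requested {requested} kbps: table rate {rate} kbps, "
              f"mode index {mode_index}, ISF index {isf_index}")
```

The same function can be applied to stereo operation by passing a list built from the rows of Table A.2.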
A.1.2 Flexible mode
Flexible ("expert") mode requires more knowledge of AMR-WB+ to use the full capacity of the codec. In this mode, the user can choose the core bit rate and the stereo extension rate, in case of stereo operation, and the internal sampling frequency, to attain a certain bit rate.
The usage is as follows:
AmrwbPlusEncode -mi <mode> [-isf <factor>] [-lc] [-dtx] [-ff <3gp|raw>] -if <infile.wav> -of <outfile.wb+> [-cf <configfile.txt>]

Where:

| Argument | Description |
|---|---|
| AmrwbPlusEncode | Name of the AMR-WB+ encoder program, compiled either from the floating-point C-code of this specification or from the fixed-point C-code of [1]. |
| -mi | Mode index: 0..8 AMR-WB; 10..13 AMR-WB+ special modes; 16..23 AMR-WB+ mono modes; 24..47 AMR-WB+ stereo modes. |
| -isf | Internal sampling frequency factor, 0.5..1.5. If this option is missing, the default is 1.0. |
| -lc | Low complexity (for AMR-WB+ modes). If this option is missing, the default is to use normal encoding. Note: This option is designed for terminal-based encoding. For optimal quality, it is recommended not to use this parameter. |
| -dtx | Enables VAD/DTX functionality (only for AMR-WB modes). If this option is missing, the default is no DTX. |
| -ff | File format: 3gp or raw. If this option is missing, the default is the 3gp file format. |
| -if | Input audio WAV file. Supported audio sampling rates are 8, 16, 24, 32, 48, 11.025, 22.05, and 44.1 kHz. Modes 0..8 require a 16 kHz input audio sampling rate. Modes 10..13 require a 16 or 24 kHz audio sampling rate. |
| -of | Output file (according to the -ff argument). |
| -cf | Configuration file: an auxiliary file that can be used for bit rate switching. |
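The constraints listed in the argument descriptions above can be collected into a small pre-flight check. The following Python sketch is hypothetical (the function names are not part of the codec package) and encodes only what the descriptions state. It assumes that "AMR-WB+ modes" for -lc means every mode index outside 0..8; the text does not spell this out.

```python
SUPPORTED_INPUT_RATES_HZ = {8000, 16000, 24000, 32000, 48000, 11025, 22050, 44100}


def mode_family(mode_index):
    """Map a mode index to the family named in the -mi description, or None."""
    if 0 <= mode_index <= 8:
        return "AMR-WB"
    if 10 <= mode_index <= 13:
        return "AMR-WB+ special"
    if 16 <= mode_index <= 23:
        return "AMR-WB+ mono"
    if 24 <= mode_index <= 47:
        return "AMR-WB+ stereo"
    return None


def check_flexible_options(mode_index, input_rate_hz, isf_factor=1.0,
                           dtx=False, low_complexity=False):
    """Return a list of human-readable problems; an empty list means none found."""
    family = mode_family(mode_index)
    if family is None:
        return [f"mode index {mode_index} is outside the documented ranges"]

    problems = []
    if input_rate_hz not in SUPPORTED_INPUT_RATES_HZ:
        problems.append(f"unsupported input sampling rate {input_rate_hz} Hz")
    elif family == "AMR-WB" and input_rate_hz != 16000:
        problems.append("modes 0..8 require 16 kHz input")
    elif family == "AMR-WB+ special" and input_rate_hz not in (16000, 24000):
        problems.append("modes 10..13 require 16 or 24 kHz input")
    if not 0.5 <= isf_factor <= 1.5:
        problems.append("-isf factor must lie in 0.5..1.5")
    if dtx and family != "AMR-WB":
        problems.append("-dtx applies only to AMR-WB modes")
    if low_complexity and family == "AMR-WB":
        # Assumption: -lc is meaningful only outside modes 0..8.
        problems.append("-lc is intended for AMR-WB+ modes")
    return problems


if __name__ == "__main__":
    print(check_flexible_options(7, 48000))          # wrong input rate for AMR-WB
    print(check_flexible_options(23, 48000, 1.5))    # no problems expected
```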
The mode index (mi) can be found in Tables 21 and 25 of 26.290 [2] and the Internal Sampling Frequency (ISF) in Table 24 of 26.290 [2].
Table 21 of 26.290 [2] contains the AMR-WB–compatible modes and four AMR-WB+ special modes (mode index 10-13). The AMR-WB+ special modes have a fixed ISF (ISF index = 0 from Table 24 of 26.290 [2]). The codec can switch dynamically between the AMR-WB and AMR-WB+ special modes (Table 21 of 26.290 [2]) if AMR-WB+ is operated at 16 kHz.
Table 25 of 26.290 [2] contains the AMR-WB+ mono and stereo modes. The core and stereo mode indices of Table 25 of 26.290 [2] correspond to Tables 22 and 23 of 26.290 [2]. The bit rates specified in Table 25 of 26.290 [2] are for a nominal ISF of 25600 Hz (bit rate factor = 1.0). The output bit rate can be computed by multiplying the bit rate value from Table 25 of 26.290 [2] and the bit rate factor from Table 24 of 26.290 [2].
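The scaling rule in the last sentence can be checked against Tables A.1 and A.2 with exact rational arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the relation ISF = 25.6 kHz × bit rate factor is inferred from the nominal ISF and the ISF columns of Tables A.1 and A.2 rather than quoted from Table 24 of 26.290 [2].

```python
from fractions import Fraction

NOMINAL_ISF_KHZ = Fraction("25.6")


def output_bit_rate(nominal_kbps, factor):
    """Output bit rate = nominal bit rate (Table 25) x bit rate factor (Table 24)."""
    return Fraction(nominal_kbps) * Fraction(factor)


def internal_sampling_frequency(factor):
    """ISF in kHz, assuming it scales with the same factor as the bit rate."""
    return NOMINAL_ISF_KHZ * Fraction(factor)


if __name__ == "__main__":
    # (nominal kbps, bit rate factor) pairs taken from Tables A.1 and A.2.
    for nominal, factor in (("10.4", "9/16"), ("24", "45/32"), ("26.8", "45/32")):
        rate = output_bit_rate(nominal, factor)
        isf = internal_sampling_frequency(factor)
        print(f"{nominal} kbps x {factor} = {float(rate)} kbps at ISF {float(isf)} kHz")
```

Using `Fraction` avoids floating-point rounding, so the results can be compared directly with the tabulated values such as 5.85, 33.75, and 37.6875 kbps.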
Stereo flexibility
In case of stereo operation, the flexible mode provides some flexibility in the trade-off between the mono and stereo extension bit rates. The best trade-off can be content dependent: higher or lower stereo extension bit rates can be used depending on the correlation between the two channels. In Table 25 of 26.290 [2], there are 24 stereo modes (mode indices 24 to 47). These modes correspond to the 8 core mono modes, each combined with 3 different extension rates. For example, mode indices 39, 40, and 41 correspond to a core bit rate of 19.2 kbps combined with stereo extension rates of 4.0, 4.8, and 6.4 kbps, respectively. This results in total bit rates of 23.2, 24.0, and 25.6 kbps. In these stereo modes, the ratio between the stereo extension rate and the total bit rate is approximately 17.2%, 20%, and 25%, respectively.
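The arithmetic behind the example for mode indices 39 to 41 can be reproduced with a few lines of Python. This is a minimal sketch; `stereo_share` is a hypothetical helper name, and the core and extension rates are the ones quoted in the paragraph above.

```python
def stereo_share(core_kbps, extension_kbps):
    """Return (total bit rate, stereo extension share of the total in percent)."""
    total = core_kbps + extension_kbps
    return total, 100.0 * extension_kbps / total


if __name__ == "__main__":
    core = 19.2  # kbps, core mono rate shared by mode indices 39, 40, and 41
    for mode_index, extension in zip((39, 40, 41), (4.0, 4.8, 6.4)):
        total, share = stereo_share(core, extension)
        print(f"mode {mode_index}: total {total:.1f} kbps, "
              f"stereo extension share {share:.1f}%")
```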
Choosing encoded bandwidth
The flexibility in choosing the ISF gives the user the choice to adjust the coded audio bandwidth depending on the input signal. For instance, in case of speech signals, a bandwidth up to 14 kHz is sufficient to attain transparent quality.
To determine the AMR-WB+ mode, one approach is to first determine the bandwidth of the signal that needs to be encoded. Table 24 of 26.290 [2] can then be used to choose the ISF, after which the core and stereo rates are chosen from Tables 25, 22, and 23 of 26.290 [2]. (Note that these bit rates should be scaled with the bit rate factor from Table 24 of 26.290 [2].)
Further tuning can be done by adjusting the ISF. The HF encoding uses relatively few bits compared to the LF, so the LF part of the signal is represented with a higher definition than the HF part. If the bit budget allows it, increasing the ISF can therefore be considered even in cases where the resulting bandwidth exceeds the input signal bandwidth.
The graph below explains the three different possibilities.
Case A is when the ISF is set to be smaller than the signal spectrum. This can be used when the bit rate or the CPU load has to be reduced.
Case B is probably the most usual situation. The signal spectrum is matched with the ISF.
Case C can be used when high quality is required and an adequate bit rate budget is available. In this case, the HF band exceeds the signal bandwidth, but the bits allocated for the HF will be used to encode the active part only. For example, this case can be used to encode a signal with input bandwidth limited to 14 kHz at a bit rate of 36 kbps. If we use mode index 23 (24 kbps at nominal ISF, see Table A.1) and an ISF of 38.4 kHz (bit rate factor 1.5), the resulting bit rate will be 36 kbps, with the LF encoded up to 9.6 kHz and the HF from 9.6 to 14 kHz.
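The three cases can be told apart by comparing the coded bandwidth with the signal bandwidth. The sketch below is an illustration under stated assumptions, not part of the reference code: the coded bandwidth is taken as half the ISF (as in the BW columns of Tables A.1 and A.2), the LF/HF boundary is taken as a quarter of the ISF (generalised from the 9.6 kHz figure in the Case C example), and the 0.5 kHz tolerance used to call two bandwidths "matched" is an arbitrary choice. All names are hypothetical.

```python
NOMINAL_ISF_KHZ = 25.6


def describe_bandwidth(factor, signal_bw_khz, nominal_kbps=None, match_tol_khz=0.5):
    """Classify an ISF choice as case A, B, or C for a given signal bandwidth."""
    isf_khz = NOMINAL_ISF_KHZ * factor
    coded_bw_khz = isf_khz / 2   # assumption: BW column equals ISF / 2
    lf_edge_khz = isf_khz / 4    # assumption: LF band is the lower half of the coded band

    if abs(coded_bw_khz - signal_bw_khz) <= match_tol_khz:
        case = "B"               # coded bandwidth matches the signal spectrum
    elif coded_bw_khz < signal_bw_khz:
        case = "A"               # coded bandwidth is smaller than the signal spectrum
    else:
        case = "C"               # coded bandwidth exceeds the signal spectrum

    result = {
        "case": case,
        "isf_khz": isf_khz,
        "lf_edge_khz": lf_edge_khz,
        # Only the active part of the HF band carries signal content.
        "hf_top_khz": min(coded_bw_khz, signal_bw_khz),
    }
    if nominal_kbps is not None:
        result["bit_rate_kbps"] = nominal_kbps * factor
    return result


if __name__ == "__main__":
    # Case C example from the text: 24 kbps nominal, factor 1.5, 14 kHz signal.
    print(describe_bandwidth(1.5, 14.0, nominal_kbps=24.0))
```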
Bit rate switching using a configuration file
Bit rate switching can be simulated using an auxiliary configuration file. The option -cf refers to a text file that allows the mode index and ISF to be changed dynamically during a program run. Each entry of the configuration file contains a time reference, a specific extension (AMR-WB or AMR-WB+), a mode index (mi), and an internal sampling frequency (ISF) factor to be used for encoding from that time onwards. The encoder keeps the last setting to encode the remaining part of the file. To use the -cf option to switch between AMR-WB and AMR-WB+ special modes, input files at 16 kHz sampling rate need to be used and the decoder needs to use the option -fs 16000 (see Section A.2 below).
Each configuration file consists of 4 columns: "time", "ext", "mode_index", and "fscale". They are referred to below as "time", "extension", "mi", and "isf", respectively.
"time" is specified in seconds and must always be > 0.
"extension" is 0 or 1, selecting AMR-WB or AMR-WB+ modes, respectively.
"mi" = [0..47], where [0..15] covers the AMR-WB and AMR-WB+ special modes, and [16..47] covers the AMR-WB+ extension modes.
"isf" = [0.5..1.5] for AMR-WB+ extension modes, and represents the bit rate factor. "isf" is set to zero for mode indices 0 to 15.
The following is an example of a configuration file for switching between AMR-WB and AMR-WB+ special modes.

| #time | Extension | mi | isf |
|---|---|---|---|
| 0.08 | 1 | 10 | 0.0 |
| 1.08 | 0 | 7 | 0.0 |
| 2.08 | 1 | 10 | 0.0 |
| 3.08 | 0 | 0 | 0.0 |

Here, "time" is in seconds, "extension"=0 means AMR-WB, and "extension"=1 means AMR-WB+. The value of "isf" for modes 0-15 must be zero. In this example, the encoder uses the initial configuration for the first 80 ms. At 80 ms, it starts encoding with AMR-WB+ mode 10 (13.6 kbps). At 1080 ms, it switches to AMR-WB mode 7 (23.05 kbps). At 2080 ms, it switches back to AMR-WB+ mode 10. Finally, at 3080 ms, it switches to AMR-WB mode 0 (6.6 kbps), which it uses until the end of the file. In other words, the initial configuration is used for the first 80 ms, followed by mode 10 for 1 second, mode 7 for 1 second, mode 10 for 1 second, and mode 0 for the remainder of the file.
In the above example, if the following command line is used:
AmrwbPlusEncode -ff raw -mi 12 -cf switch_amrwb.txt -if Input.wav -of bit_stream
then the first 80 ms will be encoded with AMR-WB+ mode 12.
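The switching behaviour described above can be sketched as a small parser plus a lookup of the setting active at a given time. This is a hypothetical illustration, not the parser of the reference encoder: it assumes whitespace-separated columns and that lines starting with "#" are header or comment lines, neither of which is stated explicitly in this clause. The function names and the tuple layout are also assumptions.

```python
EXAMPLE = """\
#time  extension  mi  isf
0.08   1          10  0.0
1.08   0          7   0.0
2.08   1          10  0.0
3.08   0          0   0.0
"""


def parse_switch_config(text):
    """Parse a switching file into (time_s, extension, mi, isf) tuples."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):   # assumption: '#' marks a header line
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns")
        time_s, ext = float(fields[0]), int(fields[1])
        mi, isf = int(fields[2]), float(fields[3])
        if time_s <= 0:
            raise ValueError(f"line {line_no}: time must be > 0")
        if ext not in (0, 1):
            raise ValueError(f"line {line_no}: extension must be 0 or 1")
        if not 0 <= mi <= 47:
            raise ValueError(f"line {line_no}: mi must be in 0..47")
        if mi <= 15 and isf != 0.0:
            raise ValueError(f"line {line_no}: isf must be zero for modes 0-15")
        if mi >= 16 and not 0.5 <= isf <= 1.5:
            raise ValueError(f"line {line_no}: isf must be in 0.5..1.5")
        entries.append((time_s, ext, mi, isf))
    return entries


def active_setting(entries, t, initial):
    """Return the last entry whose time is <= t, or the initial configuration."""
    current = initial
    for entry in sorted(entries):
        if entry[0] <= t:
            current = entry
    return current


if __name__ == "__main__":
    initial = (0.0, 1, 12, 0.0)   # stands in for "-mi 12" on the command line
    schedule = parse_switch_config(EXAMPLE)
    for t in (0.05, 0.5, 1.5, 2.5, 5.0):
        print(t, active_setting(schedule, t, initial))
```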
The example below shows a configuration file where mode index and ISF are switched.

| #time | Extension | mi | isf |
|---|---|---|---|
| 0.5 | 1 | 16 | 1.0 |
| 1.0 | 1 | 20 | 1.5 |
| 2.0 | 1 | 23 | 1.5 |
| 3.0 | 1 | 35 | 0.8 |
| 4.0 | 1 | 40 | 1.0 |
| 5.0 | 1 | 47 | 1.5 |
| 10.0 | 1 | 23 | 1.0 |
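To see what this second configuration file means in terms of bit rate, each entry can be scaled by its "isf" factor. The sketch below is illustrative: the nominal bit rates are the values listed for these mode indices in Tables A.1 and A.2, the names are hypothetical, and the factor is applied exactly as written in the file. Whether the encoder first maps a factor such as 0.8 onto an entry of Table 24 of 26.290 [2] is not described here, so that step is left out.

```python
# Nominal bit rates (kbps at ISF 25.6 kHz) for the mode indices used in the example,
# as listed in the mode index columns of Tables A.1 and A.2.
NOMINAL_KBPS = {16: 10.4, 20: 16.8, 23: 24.0, 35: 20.0, 40: 24.0, 47: 32.0}

# (time in seconds, mode index, isf factor) rows of the example configuration file.
SCHEDULE = [
    (0.5, 16, 1.0), (1.0, 20, 1.5), (2.0, 23, 1.5), (3.0, 35, 0.8),
    (4.0, 40, 1.0), (5.0, 47, 1.5), (10.0, 23, 1.0),
]


def segment_bit_rates(schedule):
    """Return (start time, mode index, bit rate in kbps) for each entry."""
    return [(t, mi, round(NOMINAL_KBPS[mi] * isf, 3)) for t, mi, isf in schedule]


if __name__ == "__main__":
    for start, mode_index, rate in segment_bit_rates(SCHEDULE):
        print(f"from {start:5.1f} s: mode index {mode_index}, about {rate} kbps")
```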