3 Functional description of the RPE‑LTP codec
3GPP TS 46.010, Release 17: Full rate speech; Transcoding
The block diagram of the RPE‑LTP‑coder is shown in figure 3.1. The individual blocks are described in the following clauses.
3.1 Functional description of the RPE‑LTP encoder
The Pre‑processing clause of the RPE‑LTP encoder comprises the following two sub‑blocks:
‑ Offset compensation (3.1.1);
‑ Pre‑emphasis (3.1.2).
The LPC analysis clause of the RPE‑LTP encoder comprises the following five sub‑blocks:
‑ Segmentation (3.1.3);
‑ Auto‑Correlation (3.1.4);
‑ Schur Recursion (3.1.5);
‑ Transformation of reflection coefficients to Log.‑Area Ratios (3.1.6);
‑ Quantization and coding of Log.‑Area Ratios (3.1.7).
The Short term analysis filtering clause of the RPE‑LTP comprises the following four sub‑blocks:
‑ Decoding of the quantized Log.‑Area Ratios (LARs) (3.1.8);
‑ Interpolation of Log.‑Area Ratios (3.1.9);
‑ Transformation of Log.‑Area Ratios into reflection coefficients (3.1.10);
‑ Short term analysis filtering (3.1.11).
The Long Term Predictor (LTP) clause comprises 4 sub‑blocks working on subsegments (3.1.12) of the short term residual samples:
‑ Calculation of LTP parameters (3.1.13);
‑ Coding of the LTP lags (3.1.14) and the LTP gains (3.1.15);
‑ Decoding of the LTP lags (3.1.14) and the LTP gains (3.1.15);
‑ Long term analysis filtering (3.1.16), and Long term synthesis filtering (3.1.17).
The RPE encoding clause comprises five different sub‑blocks:
‑ Weighting filter (3.1.18);
‑ Adaptive sample rate decimation by RPE grid selection (3.1.19);
‑ APCM quantization of the selected RPE sequence (3.1.20);
‑ APCM inverse quantization (3.1.21);
‑ RPE grid positioning (3.1.22).
Pre‑processing clause
3.1.1 Offset compensation
Prior to the speech encoder, an offset compensation by a notch filter is applied to remove the offset of the input signal so and to produce the offset‑free signal sof.
sof(k) = so(k) – so(k‑1) + alpha*sof(k‑1) (3.1.1)
alpha = 32735*2^(-15)
3.1.2 Pre‑emphasis
The signal sof is applied to a first order FIR pre‑emphasis filter leading to the input signal s of the analysis clause.
s(k) = sof(k) – beta*sof(k‑1) (3.1.2)
beta = 28180*2^(-15)
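As an informal illustration, the two pre‑processing filters (3.1.1) and (3.1.2) can be sketched in floating point as follows. The function name `preprocess` and the state arguments `so_prev` and `sof_prev` are hypothetical names introduced here; the sketch is not a bit‑exact fixed‑point implementation.

```python
ALPHA = 32735 * 2.0 ** -15  # offset compensation coefficient, equation (3.1.1)
BETA = 28180 * 2.0 ** -15   # pre-emphasis coefficient, equation (3.1.2)


def preprocess(so, so_prev=0.0, sof_prev=0.0):
    """Offset compensation followed by pre-emphasis (floating-point sketch).

    so       : iterable of input samples so(k)
    so_prev  : so(k-1) carried over from the previous call (assumed state)
    sof_prev : sof(k-1) carried over from the previous call (assumed state)
    Returns (s, so_prev, sof_prev) so that the state can be reused.
    """
    s = []
    for x in so:
        sof = x - so_prev + ALPHA * sof_prev   # sof(k), equation (3.1.1)
        s.append(sof - BETA * sof_prev)        # s(k), equation (3.1.2)
        so_prev, sof_prev = x, sof
    return s, so_prev, sof_prev
```

Feeding a constant input shows the notch behaviour: the first output sample equals the input step, and the output then decays towards zero.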
LPC analysis clause
3.1.3 Segmentation
The speech signal s(k) is divided into non‑overlapping frames having a length of T0 = 20 ms (160 samples). A new LPC‑analysis of order p=8 is performed for each frame.
3.1.4 Autocorrelation
The first p+1 = 9 values of the Auto‑Correlation function are calculated by:
$$
ACF(k) = \sum_{i=k}^{159} s(i)\, s(i-k); \quad k = 0, 1, \ldots, 8 \tag{3.2}
$$
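A direct floating‑point sketch of equation (3.2) is given below. The function name `autocorrelation` and the parameter `p` are illustrative choices, and no scaling of the input is performed.

```python
def autocorrelation(s, p=8):
    """Return ACF(0..p) of one frame s, following equation (3.2)."""
    n = len(s)  # 160 samples for a full frame
    return [sum(s[i] * s[i - k] for i in range(k, n)) for k in range(p + 1)]
```

For a normal frame, `s` holds the 160 pre‑processed samples and the result has nine entries.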
3.1.5 Schur Recursion
The reflection coefficients are calculated as shown in figure 3.2 using the Schur Recursion algorithm. The term "reflection coefficient" comes from the theory of linear prediction of speech (LPC), where a vocal tract representation consisting of a series of uniform cylindrical sections is assumed. Such a representation can be described by the reflection coefficients or the area ratios of connected sections.
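Figure 3.2 is not reproduced in this text. The following is a floating‑point sketch of a Schur recursion of the kind commonly used for this purpose, not a transcription of the figure. It assumes the sign convention r(1) = ‑ACF(1)/ACF(0), which is the value that minimizes the output energy of the first stage of the lattice filter in 3.1.11. The names `schur`, `P` and `K` are illustrative.

```python
def schur(acf, order=8):
    """Compute reflection coefficients r(1..order) from ACF(0..order)."""
    if acf[0] <= 0.0:
        return [0.0] * order                # silent frame: all coefficients zero
    P = list(acf[:order + 1])               # working array, initially ACF(0..order)
    K = [0.0] + list(acf[1:order])          # working array, K[1..order-1] used
    r = []
    for n in range(1, order + 1):
        if P[0] <= 0.0 or P[0] < abs(P[1]):
            break                           # unstable case: remaining r set to 0
        rn = -P[1] / P[0]
        r.append(rn)
        if n == order:
            break
        P[0] += P[1] * rn                   # update of the prediction error energy
        for m in range(1, order - n + 1):
            new_p = P[m + 1] + K[m] * rn    # uses the old K[m]
            K[m] = K[m] + P[m + 1] * rn     # uses the old P[m+1]
            P[m] = new_p
    return r + [0.0] * (order - len(r))
```

For an autocorrelation sequence of the form ACF(k) = 0.5^k, only the first coefficient is non‑zero, which gives a quick sanity check of the recursion.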
3.1.6 Transformation of reflection coefficients to Log.‑Area Ratios
The reflection coefficients r(i), (i=1..8), calculated by the Schur algorithm, are in the range:
‑1 <= r(i) <= + 1
Due to the favourable quantization characteristics, the reflection coefficients are converted into Log.‑Area Ratios which are strictly defined as follows:
$$
Logarea(i) = \log_{10}\left(\frac{1 + r(i)}{1 - r(i)}\right) \tag{3.3}
$$
Since it is the companding characteristic of this transformation that is of importance, the following segmented approximation is used.
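$$
LAR(i) = \begin{cases}
r(i), & |r(i)| < 0.675 \\
\operatorname{sign}[r(i)] \cdot [2|r(i)| - 0.675], & 0.675 \le |r(i)| < 0.950 \\
\operatorname{sign}[r(i)] \cdot [8|r(i)| - 6.375], & 0.950 \le |r(i)| \le 1.000
\end{cases}
\tag{3.4}
$$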
with the result that instead of having to divide and obtain the logarithm of particular values, it is merely necessary to multiply, add and compare these values.
The following equation (3.5) gives the inverse transformation.
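$$
r'(i) = \begin{cases}
LAR'(i), & |LAR'(i)| < 0.675 \\
\operatorname{sign}[LAR'(i)] \cdot [0.500\,|LAR'(i)| + 0.337500], & 0.675 \le |LAR'(i)| < 1.225 \\
\operatorname{sign}[LAR'(i)] \cdot [0.125\,|LAR'(i)| + 0.796875], & 1.225 \le |LAR'(i)| \le 1.625
\end{cases}
\tag{3.5}
$$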
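The segmented approximation (3.4) and its inverse (3.5) can be sketched as below. The sketch assumes that the segment limits apply to the magnitudes |r(i)| and |LAR'(i)|, as the sign[] factors imply; the function names `r_to_lar` and `lar_to_r` are illustrative.

```python
import math


def r_to_lar(r):
    """Segmented approximation of the Log.-Area Ratio, equation (3.4)."""
    a = abs(r)
    if a < 0.675:
        return r
    if a < 0.950:
        return math.copysign(2.0 * a - 0.675, r)
    return math.copysign(8.0 * a - 6.375, r)


def lar_to_r(lar):
    """Inverse transformation, equation (3.5)."""
    a = abs(lar)
    if a < 0.675:
        return lar
    if a < 1.225:
        return math.copysign(0.500 * a + 0.337500, lar)
    return math.copysign(0.125 * a + 0.796875, lar)
```

The segments meet at the breakpoints (for example 2*0.950 ‑ 0.675 = 8*0.950 ‑ 6.375 = 1.225), so the mapping is continuous and `lar_to_r(r_to_lar(r))` returns `r` up to rounding.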
3.1.7 Quantization and coding of Log.‑Area Ratios
The Log.‑Area Ratios LAR(i) have different dynamic ranges and different asymmetric distribution densities. For this reason, the transformed coefficients LAR(i) are limited and quantized differently according to the following equation (3.6), with LARc(i) denoting the quantized and integer coded version of LAR(i).
LARc(i) = Nint{A(i)*LAR(i) + B(i)} (3.6)
with
Nint{z} = int{z+sign{z}*0.5} (3.6a)
Function Nint defines the rounding to the nearest integer value, with the coefficients A(i), B(i), and different extreme values of LARc(i) for each coefficient LAR(i) given in table 3.1.
Table 3.1: Quantization of the Log.‑Area Ratios LAR(i)

| LAR No i | A(i) | B(i) | Minimum LARc(i) | Maximum LARc(i) |
|---|---|---|---|---|
| 1 | 20.000 | 0.000 | -32 | +31 |
| 2 | 20.000 | 0.000 | -32 | +31 |
| 3 | 20.000 | 4.000 | -16 | +15 |
| 4 | 20.000 | -5.000 | -16 | +15 |
| 5 | 13.637 | 0.184 | -8 | +7 |
| 6 | 15.000 | -3.500 | -8 | +7 |
| 7 | 8.334 | -0.666 | -4 | +3 |
| 8 | 8.824 | -2.235 | -4 | +3 |
Short‑term analysis filtering clause
The current frame of the speech signal s is retained in memory until calculation of the LPC parameters LAR(i) is completed. The frame is then read out and fed to the short term analysis filter of order p=8. However, prior to the analysis filtering operation, the filter coefficients are decoded and pre‑processed by interpolation.
3.1.8 Decoding of the quantized Log.‑Area Ratios
In this block the quantized and coded Log.‑Area Ratios (LARc(i)) are decoded according to equation (3.7).
LAR''(i) = ( LARc(i) – B(i) ) / A(i) (3.7)
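The quantization rule (3.6), the rounding function (3.6a) and the decoding rule (3.7) can be sketched together as follows, with the constants taken from table 3.1. The names `nint`, `quantize_lar` and `decode_lar` are illustrative, and the limiting to the minimum and maximum of LARc(i) is implemented here as a simple clamp, which is an assumption about how the limits are applied.

```python
import math

# Constants from table 3.1, index 0 corresponds to LAR No 1.
A = [20.000, 20.000, 20.000, 20.000, 13.637, 15.000, 8.334, 8.824]
B = [0.000, 0.000, 4.000, -5.000, 0.184, -3.500, -0.666, -2.235]
LARC_MIN = [-32, -32, -16, -16, -8, -8, -4, -4]
LARC_MAX = [31, 31, 15, 15, 7, 7, 3, 3]


def nint(z):
    """Nint{z} = int{z + sign{z}*0.5}, equation (3.6a); int truncates."""
    return int(z + math.copysign(0.5, z))


def quantize_lar(lar):
    """LARc(i) per equation (3.6), clamped to the limits of table 3.1."""
    return [min(LARC_MAX[i], max(LARC_MIN[i], nint(A[i] * lar[i] + B[i])))
            for i in range(8)]


def decode_lar(larc):
    """LAR''(i) per equation (3.7)."""
    return [(larc[i] - B[i]) / A[i] for i in range(8)]
```

The coarser quantization of the higher‑order coefficients is visible when the same value is passed for all eight coefficients: the coded value of LAR(7) saturates at its maximum of 3 for an input of 0.5.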
3.1.9 Interpolation of Log.‑Area Ratios
To avoid spurious transients which may occur if the filter coefficients are changed abruptly, two subsequent sets of Log.‑Area Ratios are interpolated linearly. Within each frame of 160 analysed speech samples the short term analysis filter and the short term synthesis filter operate with four different sets of coefficients derived according to table 3.2.
Table 3.2: Interpolation of LAR parameters (J = actual segment)

| k | LAR'J(i) = |
|---|---|
| 0…12 | 0.75*LAR''J-1(i) + 0.25*LAR''J(i) |
| 13…26 | 0.50*LAR''J-1(i) + 0.50*LAR''J(i) |
| 27…39 | 0.25*LAR''J-1(i) + 0.75*LAR''J(i) |
| 40…159 | LAR''J(i) |
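The interpolation of table 3.2 can be sketched as a function of the sample index k within the frame. The name `interpolate_lar` and its argument names are illustrative; `lar_prev` stands for the decoded set LAR'' of segment J‑1 and `lar_cur` for that of segment J.

```python
def interpolate_lar(lar_prev, lar_cur, k):
    """Return the interpolated LAR' set valid for sample index k (0..159)."""
    if k <= 12:
        w = 0.25   # weight of the current set for k = 0..12
    elif k <= 26:
        w = 0.50   # k = 13..26
    elif k <= 39:
        w = 0.75   # k = 27..39
    else:
        w = 1.00   # k = 40..159
    return [(1.0 - w) * a + w * b for a, b in zip(lar_prev, lar_cur)]
```

Only four distinct coefficient sets result per frame, so an implementation would typically evaluate the function once per region rather than once per sample.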
3.1.10 Transformation of Log.‑Area Ratios into reflection coefficients
The reflection coefficients are finally determined using the inverse transformation according to equation (3.5).
3.1.11 Short term analysis filtering
The Short term analysis filter is implemented according to the lattice structure depicted in figure 3.3.
$$d_0(k) = s(k) \tag{3.8a}$$
$$u_0(k) = s(k) \tag{3.8b}$$
$$d_i(k) = d_{i-1}(k) + r'_i \, u_{i-1}(k-1); \quad i = 1,\ldots,8 \tag{3.8c}$$
$$u_i(k) = u_{i-1}(k-1) + r'_i \, d_{i-1}(k); \quad i = 1,\ldots,8 \tag{3.8d}$$
$$d(k) = d_8(k) \tag{3.8e}$$
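A floating‑point sketch of the lattice recursion (3.8a) to (3.8e) is given below. The function name and the state vector `u_state`, which holds the delayed values u0(k‑1) to u7(k‑1), are illustrative. For simplicity the sketch uses one fixed coefficient set, whereas the codec switches sets within a frame according to table 3.2.

```python
def short_term_analysis(s, r, u_state=None):
    """Filter samples s with reflection coefficients r (r[0] is r'1).

    Returns (d, u_state), where u_state can be passed to the next call.
    """
    p = len(r)
    u_prev = [0.0] * p if u_state is None else list(u_state)
    out = []
    for sample in s:
        d = sample                         # d0(k), equation (3.8a)
        u = sample                         # u0(k), equation (3.8b)
        for i in range(p):                 # stage i+1
            u_delayed = u_prev[i]          # u_i(k-1)
            u_prev[i] = u                  # store u_i(k) for the next sample
            u_next = u_delayed + r[i] * d  # equation (3.8d), uses d_i(k)
            d = d + r[i] * u_delayed       # equation (3.8c)
            u = u_next
        out.append(d)                      # d(k) = d8(k), equation (3.8e)
    return out, u_prev
```

With all coefficients zero the filter passes the signal unchanged; with only r'1 = ‑0.5 it reduces to d(k) = s(k) ‑ 0.5*s(k‑1).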
Long‑Term Predictor (LTP) clause
3.1.12 Sub‑segmentation
Each input frame of the short term residual signal contains 160 samples, corresponding to 20 ms. The long term correlation is evaluated four times per frame, once for each 5 ms sub‑segment. For convenience, the sub‑segment number is denoted j = 0,…,3 in the following, so that the samples of the j‑th sub‑segment of the residual signal are denoted d(kj+k), with j = 0,…,3; kj = k0 + j*40 and k = 0,…,39, where k0 corresponds to the first sample of the current frame.
3.1.13 Calculation of the LTP parameters
For each of the four sub‑segments a long term correlation lag Nj, (j=0,…,3), and an associated gain factor bj, (j=0,…,3) are determined. For each sub‑segment, the determination of these parameters is implemented in three steps.
1) The first step is the evaluation of the cross‑correlation Rj(lambda) of the current sub‑segment of short term residual signal d(kj+i),(i=0,…,39) and the previous samples of the reconstructed short term residual signal d'(kj+i), (i=‑120,…,‑1):
$$
R_j(\lambda) = \sum_{i=0}^{39} d(k_j+i)\, d'(k_j+i-\lambda); \quad j = 0,\ldots,3; \quad k_j = k_0 + j \cdot 40; \quad \lambda = 40,\ldots,120 \tag{3.9}
$$
The cross‑correlation is evaluated for lags lambda greater than or equal to 40 and less than or equal to 120, i.e. corresponding to samples outside the current sub‑segment and not delayed by more than two sub‑segments.
2) The second step is to find the position Nj of the peak of the cross‑correlation function within this interval:
Rj(Nj) = max { Rj(lambda); lambda = 40..120 }; j = 0,…,3 (3.10)
3) The third step is the evaluation of the gain factor bj according to:
bj = Rj(Nj) / Sj(Nj); j = 0,…,3 (3.11)
with
$$
S_j(N_j) = \sum_{i=0}^{39} \left[ d'(k_j+i-N_j) \right]^2; \quad j = 0,\ldots,3 \tag{3.12}
$$
It is clear that the last 120 samples of the reconstructed short term residual signal d'(kj+i),(i=‑120,…,‑1) shall be retained until the next sub‑segment so as to allow the evaluation of the relations (3.9),…,(3.12).
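The three steps (3.9) to (3.12) can be sketched for a single sub‑segment as follows. The function name `ltp_parameters` and the buffer name `d_rec_hist` (the last 120 reconstructed samples d'(kj‑120),…,d'(kj‑1)) are illustrative. Returning a zero gain when Sj(Nj) is zero is an assumption made to keep the sketch well defined; ties in the maximum search are resolved in favour of the smallest lag.

```python
def ltp_parameters(d, d_rec_hist):
    """Return (Nj, bj) for one sub-segment of 40 residual samples d."""
    if len(d) != 40 or len(d_rec_hist) != 120:
        raise ValueError("expected 40 current and 120 past samples")

    def past(i, lag):
        # d'(kj + i - lag); index 120 in the history corresponds to kj
        return d_rec_hist[120 + i - lag]

    best_lag, best_r = 40, None
    for lag in range(40, 121):                                  # equation (3.9)
        r = sum(d[i] * past(i, lag) for i in range(40))
        if best_r is None or r > best_r:                        # equation (3.10)
            best_lag, best_r = lag, r
    s = sum(past(i, best_lag) ** 2 for i in range(40))          # equation (3.12)
    gain = best_r / s if s > 0.0 else 0.0                       # equation (3.11)
    return best_lag, gain
```

If the current sub‑segment is an exact copy of the history delayed by 50 samples and scaled by 0.5, the search returns the lag 50 and the gain 0.5.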
3.1.14 Coding/Decoding of the LTP lags
The long term correlation lags Nj,(j=0,…,3) can have values in the range (40,…,120), and so shall be coded using 7 bits with:
Ncj = Nj; j = 0,…,3 (3.13)
At the receiving end, assuming an error free transmission, the decoding of these values will restore the actual lags:
Nj’ = Ncj; j = 0,…,3 (3.14)
3.1.15 Coding/Decoding of the LTP gains
The long term prediction gains bj,(j=0,…,3) are encoded with 2 bits each, according to the following algorithm:
if bj <= DLB(i) then bcj = 0; i=0
if DLB(i‑1) < bj <= DLB(i) then bcj = i; i=1,2 (3.15)
if DLB(i‑1) < bj then bcj = 3; i=3
where DLB(i),(i=0,…,2) denotes the decision levels of the quantizer, and bcj represents the coded gain value. Decision levels and quantizing levels are given in table 3.3.
Table 3.3: Quantization table for the LTP gain

| i | Decision level DLB(i) | Quantizing level QLB(i) |
|---|---|---|
| 0 | 0.2 | 0.10 |
| 1 | 0.5 | 0.35 |
| 2 | 0.8 | 0.65 |
| 3 | | 1.00 |
The decoding rule is implemented according to:
bj’ = QLB(bcj) ; j = 0,…,3 (3.16)
where QLB(i),(i=0,…,3) denotes the quantizing levels, and bj’ represents the decoded gain value (see table 3.3).
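The coding rule (3.15) and the decoding rule (3.16) amount to a table lookup against the levels of table 3.3, which can be sketched as below. The function names are illustrative.

```python
DLB = [0.2, 0.5, 0.8]           # decision levels, table 3.3
QLB = [0.10, 0.35, 0.65, 1.00]  # quantizing levels, table 3.3


def code_ltp_gain(b):
    """Return the 2-bit code bcj for gain bj, equation (3.15)."""
    for i, level in enumerate(DLB):
        if b <= level:
            return i
    return 3


def decode_ltp_gain(bc):
    """Return the decoded gain bj' for code bcj, equation (3.16)."""
    return QLB[bc]
```

A gain exactly equal to a decision level is assigned to the lower code, since the comparisons in (3.15) use "<=".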
3.1.16 Long term analysis filtering
The short term residual signal d(k0+k),(k=0,…,159) is processed by sub‑segments of 40 samples. From each of the four sub‑segments (j=0,…,3) of short term residual samples, denoted here d(kj+k), (k=0,…,39), an estimate d"(kj+k), (k=0,…,39) of the signal is subtracted to give the long term residual signal e(kj+k), (k=0,…,39) (see figure 3.1):
j = 0,…,3
e(kj+k) = d(kj+k) – d"(kj+k) ; k = 0,…,39 (3.17)
kj = k0 + j*40
Prior to this subtraction, the estimated samples d"(kj+k) are computed from the previously reconstructed short term residual samples d’, adjusted to the current sub‑segment LTP lag Nj’ and weighted with the sub‑segment LTP gain bj’:
j = 0,…,3
d"(kj+k) = bj’*d'(kj+k–Nj’) ; k = 0,…,39 (3.18)
kj = k0 + j*40
3.1.17 Long term synthesis filtering
The reconstructed long term residual signal e'(k0+k),(k=0,…,159) is processed by sub‑segments of 40 samples. To each sub‑segment, denoted here e'(kj+k), (k=0,…,39), the estimate d"(kj+k), (k=0,…,39) of the signal is added to give the reconstructed short term residual signal d'(kj+k),(k=0,…,39):
j = 0,…,3
d'(kj+k) = e'(kj+k) + d"(kj+k) ; k = 0,…,39 (3.19)
kj = k0 + j*40
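The long term analysis and synthesis operations (3.17) to (3.19) can be sketched for one sub‑segment as follows. The function names and the buffer name `d_rec_hist` (previously reconstructed samples d', most recent sample last, at least as long as the lag) are illustrative.

```python
def ltp_estimate(d_rec_hist, lag, gain):
    """d''(kj+k) = bj' * d'(kj+k-Nj'), k = 0..39, equation (3.18)."""
    n = len(d_rec_hist)  # index n corresponds to kj
    return [gain * d_rec_hist[n + k - lag] for k in range(40)]


def long_term_analysis(d, estimate):
    """e(kj+k) = d(kj+k) - d''(kj+k), equation (3.17)."""
    return [a - b for a, b in zip(d, estimate)]


def long_term_synthesis(e_rec, estimate):
    """d'(kj+k) = e'(kj+k) + d''(kj+k), equation (3.19)."""
    return [a + b for a, b in zip(e_rec, estimate)]
```

Because the lag is at least 40, the estimate for a sub‑segment depends only on samples reconstructed before that sub‑segment. After synthesis, the 40 new samples d' would be appended to the history so that the most recent 120 samples are available for the next sub‑segment.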
RPE encoding clause
3.1.18 Weighting Filter
An FIR "block filter" algorithm is applied to each sub‑segment by convolving 40 samples e(k) with the impulse response H(i) ; i=0,…,10 (see table 3.4).
Table 3.4: Impulse response of block filter (weighting filter)

| i | 5 | 4 (6) | 3 (7) | 2 (8) | 1 (9) | 0 (10) |
|---|---|---|---|---|---|---|
| H(i)*2^13 | 8192 | 5741 | 2054 | 0 | -374 | -134 |
H(Omega=0) = 2.779;
The conventional convolution of a sequence having 40 samples with an 11‑tap impulse response would produce 40+11‑1=50 samples. In contrast to this, the "block filter" algorithm produces the 40 central samples of the conventional convolution operation. For notational convenience the block filtered version of each sub‑segment is denoted by x(k), k=0,…,39.
$$
x(k) = \sum_{i=0}^{10} H(i)\, e(k+5-i); \quad k = 0,\ldots,39 \tag{3.20}
$$
NOTE: e(k+5‑i) = 0 for k+5‑i<0 and k+5‑i>39.
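The block filter (3.20), including the zero extension stated in the NOTE, can be sketched as below with the coefficients of table 3.4. The names `H_SCALED`, `H` and `weighting_filter` are illustrative, and the sketch works in floating point.

```python
# H(i)*2^13 for i = 0..10, from table 3.4 (the response is symmetric about i = 5)
H_SCALED = [-134, -374, 0, 2054, 5741, 8192, 5741, 2054, 0, -374, -134]
H = [h / 8192.0 for h in H_SCALED]


def weighting_filter(e):
    """Return x(k), the central samples of the convolution of e with H."""
    n = len(e)  # 40 for one sub-segment
    x = []
    for k in range(n):
        acc = 0.0
        for i in range(11):
            idx = k + 5 - i
            if 0 <= idx < n:        # e(k+5-i) = 0 outside the sub-segment
                acc += H[i] * e[idx]
        x.append(acc)
    return x
```

The sum of the coefficients is 22766/8192, approximately 2.779, which agrees with the value quoted for H(Omega=0).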
3.1.19 Adaptive sample rate decimation by RPE grid selection
For the next step, the filtered signal x is down‑sampled by a ratio of 3 resulting in 3 interleaved sequences of lengths 14, 13 and 13, which are split up again into 4 sub‑sequences xm of length 13:
xm(i) = x(kj+m+3*i) ; i = 0,…,12 (3.21)
m = 0,…,3
with m denoting the position of the decimation grid. According to the explicit solution of the RPE mean squared error criterion, the optimum candidate sub‑sequence xM is selected which is the one with the maximum energy:
$$
E_M = \max_{m} \sum_{i=0}^{12} x_m^2(i); \quad m = 0,\ldots,3 \tag{3.22}
$$
The optimum grid position M is coded as Mc with 2 bits.
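The grid selection (3.21) and (3.22) can be sketched as follows for one filtered sub‑segment x(0),…,x(39). The function name `select_rpe_grid` is illustrative; when two candidates have equal energy, the sketch keeps the one with the smaller m, which is an assumption.

```python
def select_rpe_grid(x):
    """Return (M, xM) where xM is the 13-sample sub-sequence of maximum energy."""
    best_m, best_energy, best_seq = 0, -1.0, None
    for m in range(4):
        seq = [x[m + 3 * i] for i in range(13)]   # equation (3.21)
        energy = sum(v * v for v in seq)          # inner sum of equation (3.22)
        if energy > best_energy:
            best_m, best_energy, best_seq = m, energy, seq
    return best_m, best_seq
```

Candidates m = 0 and m = 3 lie on the same decimation phase and differ only in whether the sample at k = 0 or at k = 39 is included.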
3.1.20 APCM quantization of the selected RPE sequence
The selected sub‑sequence xM(i) (RPE sequence) is quantized by applying APCM (Adaptive Pulse Code Modulation). For each RPE sequence, consisting of a set of 13 samples xM(i), the maximum xmax of the absolute values |xM(i)| is selected and quantized logarithmically with 6 bits as xmaxc, as given in table 3.5.
Table 3.5: Quantization of the block maximum xmax

| xmax | x'max | xmaxc | xmax | x'max | xmaxc |
|---|---|---|---|---|---|
| 0 .. 31 | 31 | 0 | 2048 .. 2303 | 2303 | 32 |
| 32 .. 63 | 63 | 1 | 2304 .. 2559 | 2559 | 33 |
| 64 .. 95 | 95 | 2 | 2560 .. 2815 | 2815 | 34 |
| 96 .. 127 | 127 | 3 | 2816 .. 3071 | 3071 | 35 |
| 128 .. 159 | 159 | 4 | 3072 .. 3327 | 3327 | 36 |
| 160 .. 191 | 191 | 5 | 3328 .. 3583 | 3583 | 37 |
| 192 .. 223 | 223 | 6 | 3584 .. 3839 | 3839 | 38 |
| 224 .. 255 | 255 | 7 | 3840 .. 4095 | 4095 | 39 |
| 256 .. 287 | 287 | 8 | 4096 .. 4607 | 4607 | 40 |
| 288 .. 319 | 319 | 9 | 4608 .. 5119 | 5119 | 41 |
| 320 .. 351 | 351 | 10 | 5120 .. 5631 | 5631 | 42 |
| 352 .. 383 | 383 | 11 | 5632 .. 6143 | 6143 | 43 |
| 384 .. 415 | 415 | 12 | 6144 .. 6655 | 6655 | 44 |
| 416 .. 447 | 447 | 13 | 6656 .. 7167 | 7167 | 45 |
| 448 .. 479 | 479 | 14 | 7168 .. 7679 | 7679 | 46 |
| 480 .. 511 | 511 | 15 | 7680 .. 8191 | 8191 | 47 |
| 512 .. 575 | 575 | 16 | 8192 .. 9215 | 9215 | 48 |
| 576 .. 639 | 639 | 17 | 9216 .. 10239 | 10239 | 49 |
| 640 .. 703 | 703 | 18 | 10240 .. 11263 | 11263 | 50 |
| 704 .. 767 | 767 | 19 | 11264 .. 12287 | 12287 | 51 |
| 768 .. 831 | 831 | 20 | 12288 .. 13311 | 13311 | 52 |
| 832 .. 895 | 895 | 21 | 13312 .. 14335 | 14335 | 53 |
| 896 .. 959 | 959 | 22 | 14336 .. 15359 | 15359 | 54 |
| 960 .. 1023 | 1023 | 23 | 15360 .. 16383 | 16383 | 55 |
| 1024 .. 1151 | 1151 | 24 | 16384 .. 18431 | 18431 | 56 |
| 1152 .. 1279 | 1279 | 25 | 18432 .. 20479 | 20479 | 57 |
| 1280 .. 1407 | 1407 | 26 | 20480 .. 22527 | 22527 | 58 |
| 1408 .. 1535 | 1535 | 27 | 22528 .. 24575 | 24575 | 59 |
| 1536 .. 1663 | 1663 | 28 | 24576 .. 26623 | 26623 | 60 |
| 1664 .. 1791 | 1791 | 29 | 26624 .. 28671 | 28671 | 61 |
| 1792 .. 1919 | 1919 | 30 | 28672 .. 30719 | 30719 | 62 |
| 1920 .. 2047 | 2047 | 31 | 30720 .. 32767 | 32767 | 63 |
For the normalization, the 13 samples are divided by the decoded version x’max of the block maximum. Finally, the normalized samples:
x'(i) = xM(i)/x’max ; i=0,…,12 (3.23)
are quantized uniformly with three bits to xMc(i) as given in table 3.6.
Table 3.6: Quantization of the normalized RPE‑samples

| x'*2^15 (interval limits) | xM'*2^15 | xMc (channel) |
|---|---|---|
| -32768 … -24577 | -28672 | 0 = 000 |
| -24576 … -16385 | -20480 | 1 = 001 |
| -16384 … -8193 | -12288 | 2 = 010 |
| -8192 … -1 | -4096 | 3 = 011 |
| 0 … 8191 | 4096 | 4 = 100 |
| 8192 … 16383 | 12288 | 5 = 101 |
| 16384 … 24575 | 20480 | 6 = 110 |
| 24576 … 32767 | 28672 | 7 = 111 |
3.1.21 APCM inverse quantization
The xMc(i) are decoded to xM'(i) and denormalized using the decoded value x’maxc leading to the decoded sub‑sequence x’M(i).
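The APCM quantization of 3.1.20 and its inverse can be sketched as below. The closed‑form expressions used for xmaxc and x'max are an observation that reproduces the rows of table 3.5 (step sizes doubling from 32 to 2048); they are not stated in the text. The sketch assumes integer samples in the range ‑32768 to 32767, treats the intervals of table 3.6 as eight uniform steps of width 8192, and clamps a normalized value of exactly +1 to channel 7. All function names are illustrative.

```python
import math


def decode_xmax(xmaxc):
    """Return x'max (upper limit of the interval) for a 6-bit code xmaxc."""
    if xmaxc < 16:
        exp, mant = 0, xmaxc
    else:
        exp, mant = (xmaxc >> 3) - 1, (xmaxc & 7) + 8
    return ((mant + 1) << (exp + 5)) - 1


def quantize_xmax(xmax):
    """Return (xmaxc, x'max) for an integer block maximum 0..32767."""
    exp = max(0, xmax.bit_length() - 9)        # 0 for xmax < 512, up to 6
    xmaxc = (xmax >> (exp + 5)) + 8 * exp
    return xmaxc, decode_xmax(xmaxc)


def apcm_quantize(xm):
    """Return (xmaxc, [xMc(i)]) for the 13 integer samples xM(i)."""
    xmax = max(abs(v) for v in xm)
    xmaxc, xmax_dec = quantize_xmax(xmax)
    # x'(i) = xM(i)/x'max, equation (3.23), then 3-bit uniform quantization
    codes = [min(7, max(0, math.floor(4.0 * v / xmax_dec) + 4)) for v in xm]
    return xmaxc, codes


def apcm_dequantize(xmaxc, codes):
    """Return the decoded, denormalized samples x'M(i)."""
    xmax_dec = decode_xmax(xmaxc)
    # channel c decodes to (c*8192 - 28672)/2^15 = c/4 - 0.875 (table 3.6)
    return [(c / 4.0 - 0.875) * xmax_dec for c in codes]
```

For a block whose largest magnitude is 300, the sketch yields xmaxc = 9 and x'max = 319, and the sample 300 is coded as channel 7 and decoded as 0.875*319 = 279.125.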
3.1.22 RPE grid positioning
The quantized sub‑sequence is upsampled by a ratio of 3 by inserting zero values according to the grid position given with Mc.
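The grid positioning can be sketched as the inverse of the decimation (3.21): the 13 decoded samples are placed at positions Mc + 3*i of a 40‑sample sub‑segment and all other positions are set to zero. The function name `rpe_grid_position` is illustrative.

```python
def rpe_grid_position(xm_dec, mc):
    """Return the 40-sample reconstructed long term residual of one sub-segment."""
    e_rec = [0.0] * 40
    for i, v in enumerate(xm_dec):   # 13 decoded samples
        e_rec[mc + 3 * i] = v
    return e_rec
```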
3.2 Decoder
The decoder comprises the following 4 clauses. Most of the sub‑blocks are also needed in the encoder and have been described already. Only the short term synthesis filter and the de‑emphasis filter are added in the decoder as new sub‑blocks.
‑ RPE decoding clause (3.2.1);
‑ Long Term Prediction clause (3.2.2);
‑ Short term synthesis filtering clause (3.2.3);
‑ Post‑processing (3.2.4).
The complete block diagram for the decoder is shown in figure 3.4. The variables and parameters of the decoder are marked by the index r to distinguish the received values from the encoder values.
3.2.1 RPE decoding clause
The input signal of the long term synthesis filter (reconstruction of the long term residual signal) is formed by decoding and denormalizing the RPE‑samples (APCM inverse quantization ‑ 3.1.21) and by placing them in the correct time position (RPE grid positioning ‑ 3.1.22). At this stage, the sampling frequency is increased by a factor of 3 by inserting the appropriate number of intermediate zero‑valued samples.
3.2.2 Long Term Prediction clause
The reconstructed long term residual signal er’ is applied to the long term synthesis filter (see 3.1.16 and 3.1.17) which produces the reconstructed short term residual signal dr’ for the short term synthesizer.
3.2.3 Short term synthesis filtering clause
The coefficients of the short term synthesis filter (see figure 3.5) are reconstructed applying the identical procedure to that in the encoder (3.1.8 ‑ 3.1.10). The short term synthesis filter is implemented according to the lattice structure depicted in figure 3.5.
$$s_r(0)(k) = d_r'(k) \tag{3.24a}$$
$$s_r(i)(k) = s_r(i-1)(k) - r_r'(9-i)\, v_{8-i}(k-1); \quad i = 1,\ldots,8 \tag{3.24b}$$
$$v_{9-i}(k) = v_{8-i}(k-1) + r_r'(9-i)\, s_r(i)(k); \quad i = 1,\ldots,8 \tag{3.24c}$$
$$s_r'(k) = s_r(8)(k) \tag{3.24d}$$
$$v_0(k) = s_r(8)(k) \tag{3.24e}$$
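A floating‑point sketch of the synthesis lattice (3.24a) to (3.24e) is given below, reading the garbled indices as 9‑i and 8‑i. The function name and the state vector `v_state`, which holds v0(k‑1) to v8(k‑1), are illustrative. As in the analysis case, one fixed coefficient set is used for simplicity.

```python
def short_term_synthesis(dr, rr, v_state=None):
    """Filter residual dr with reflection coefficients rr (rr[0] is rr'(1)).

    Returns (sr, v_state), where v_state can be passed to the next call.
    """
    p = len(rr)  # 8
    v = [0.0] * (p + 1) if v_state is None else list(v_state)
    out = []
    for sample in dr:
        sri = sample                              # sr(0)(k), equation (3.24a)
        for i in range(1, p + 1):
            coeff = rr[p - i]                     # rr'(9-i)
            sri = sri - coeff * v[p - i]          # equation (3.24b), v(8-i)(k-1)
            v[p + 1 - i] = v[p - i] + coeff * sri # equation (3.24c), v(9-i)(k)
        v[0] = sri                                # equation (3.24e)
        out.append(sri)                           # equation (3.24d)
    return out, v
```

The in‑place update is safe because stage i writes v(9‑i) and every later stage reads only lower indices. With only rr'(1) = ‑0.5, the filter reduces to sr(k) = dr(k) + 0.5*sr(k‑1), the inverse of the corresponding first‑order analysis filter.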
3.2.4 Post‑processing
The output of the synthesis filter sr(k) is fed into the IIR de‑emphasis filter leading to the output signal sro.
sro(k) = sr(k) + beta*sro(k‑1) ; beta = 28180*2^(-15) (3.25)
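The de‑emphasis filter (3.25) can be sketched in floating point as follows. The function name `deemphasis` and the state argument `sro_prev`, standing for sro(k‑1), are illustrative.

```python
BETA = 28180 * 2.0 ** -15  # de-emphasis coefficient, equation (3.25)


def deemphasis(sr, sro_prev=0.0):
    """Return (sro, sro_prev) for the input samples sr(k)."""
    out = []
    for x in sr:
        sro_prev = x + BETA * sro_prev   # sro(k) = sr(k) + beta*sro(k-1)
        out.append(sro_prev)
    return out, sro_prev
```

The impulse response is 1, beta, beta^2, …, which is the inverse of the first order FIR pre‑emphasis filter (3.1.2).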
Figure 3.1: Block diagram of the RPE ‑ LTP encoder
Figure 3.2: LPC analysis using Schur recursion
Figure 3.3: Short term analysis filter
Figure 3.4: Block diagram of the RPE‑LTP decoder
Figure 3.5: Short term synthesis filter