6.1.2 TCX mode decoding and signal synthesis
3GPP TS 26.290, Release 17: Audio codec processing functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions
The TCX decoder is shown in Figure 13.
Figure 13: Block diagram of the TCX decoder
Figure 13 shows a block diagram of the TCX decoder including the following two cases:
Case 1: Packet-erasure concealment in TCX-256, when the TCX frame length is 256 samples and the related packet is lost, i.e. BFI_TCX = (1), as shown in Figure 13-a.
Case 2: Normal TCX decoding, possibly with partial packet losses, as shown in Figure 13-b.
In Case 1, no information is available to decode the 256-sample TCX frame. The TCX synthesis is found by processing the past excitation delayed by T, where T = pitch_tcx is a pitch lag estimated in the previously decoded TCX frame, by a non-linear filter roughly equivalent to the LP synthesis filter 1/Â(z). A non-linear filter is used instead of 1/Â(z) to avoid clicks in the synthesis. This filter is decomposed into 3 steps:
Step 1: filtering by the cascade of the LP synthesis filter and the perceptual weighting filter, to map the excitation delayed by T into the TCX target domain;
Step 2: applying a limiter (the magnitude is limited to rmswsyn);
Step 3: filtering by the inverse perceptual weighting filter to find the synthesis.
Note that the buffer OVLP_TCX is set to zero in this case.
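As an illustration only, the three steps of Case 1 might be sketched as follows in Python. The function name conceal_tcx256 and the callables to_target and to_synthesis are hypothetical placeholders for the Step 1 and Step 3 filters, which are not implemented here. Reading the limiter as a symmetric clip to ±rms_wsyn, and building the delayed excitation by periodic repetition with lag T, are also assumptions of this sketch.

```python
def conceal_tcx256(past_exc, pitch_tcx, rms_wsyn, to_target, to_synthesis,
                   frame_len=256):
    """Sketch of TCX-256 packet-erasure concealment (Case 1).

    past_exc     : list of past excitation samples (most recent last)
    pitch_tcx    : pitch lag T estimated in the previous TCX frame
    rms_wsyn     : limiter threshold (assumed to be applied symmetrically)
    to_target    : hypothetical callable implementing the Step 1 filter
    to_synthesis : hypothetical callable implementing the Step 3 filter
    """
    if not 0 < pitch_tcx <= len(past_exc):
        raise ValueError("pitch lag must fit inside the excitation buffer")

    # Past excitation delayed by T: each new sample repeats the sample
    # located T positions earlier in the (growing) buffer.
    buf = list(past_exc)
    for _ in range(frame_len):
        buf.append(buf[-pitch_tcx])
    exc = buf[len(past_exc):]

    # Step 1: map the delayed excitation into the TCX target domain.
    target = to_target(exc)

    # Step 2: limiter, clip every sample to [-rms_wsyn, +rms_wsyn].
    limited = [max(-rms_wsyn, min(rms_wsyn, v)) for v in target]

    # Step 3: map the limited target back to the synthesis domain.
    return to_synthesis(limited)
```

With identity callables for both filters, the function simply returns the periodically extended excitation after clipping, which makes the role of the limiter easy to inspect in isolation.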
Decoding of the algebraic VQ parameters
In Case 2, TCX decoding involves decoding the algebraic VQ parameters describing each quantized block of the scaled spectrum X’, where X’ is as described in Step 2 of Section 5.3.5.7. Recall that X’ has dimension N, where N = 288, 576 and 1152 for TCX-256, 512 and 1024 respectively, and that each block B’k has dimension 8. The number K of blocks B’k is thus 36, 72 and 144 for TCX-256, 512 and 1024 respectively. The algebraic VQ parameters for each block B’k are described in Step 5 of Section 5.3.5.7. For each block B’k , three sets of binary indices are sent by the encoder:
a) the codebook index nk, transmitted in unary code as described in Step 5 of Section 5.3.5.7;
b) the rank Ik of a selected lattice point c in a so-called base codebook, which indicates what permutation has to be applied to a specific leader (see Step 5 of Section 5.3.5.7) to obtain a lattice point c;
c) and, if the quantized block (a lattice point) was not in the base codebook, the 8 indices of the Voronoi extension index vector k calculated in sub-step V1 of Step 5 in Section 5.3.5.7; from the Voronoi extension indices, an extension vector z can be computed as in reference [7]. The number of bits in each component of index vector k is given by the extension order r, which can be obtained from the unary code value of index nk. The scaling factor M of the Voronoi extension is given by M = 2^r.
Then, from the scaling factor M, the Voronoi extension vector z (a lattice point in RE8) and the lattice point c in the base codebook (also a lattice point in RE8), each quantized scaled block can be computed as
B̂’k = M c + z
When there is no Voronoi extension (i.e. nk < 5, M = 1 and z = 0), the base codebook is either codebook Q0, Q2, Q3 or Q4 from reference [6]. No bits are then required to transmit vector k. Otherwise, when Voronoi extension is used because the quantized block is large enough, only Q3 or Q4 from reference [6] is used as a base codebook. The selection of Q3 or Q4 is implicit in the codebook index value nk, as described in Step 5 of Section 5.3.5.7.
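The reconstruction of one quantized block from M, c and z can be sketched as below (Python, illustrative only). The function name reconstruct_block is hypothetical. The vector z is assumed to have been computed already from the index vector k as in reference [7], and c is assumed to have been decoded already from the rank Ik; neither computation is shown.

```python
def reconstruct_block(c, z, r):
    """Return the quantized scaled block M*c + z for one 8-dimensional block.

    c : lattice point from the base codebook (sequence of 8 numbers)
    z : Voronoi extension vector (sequence of 8 numbers, all zero if r == 0)
    r : extension order; the scaling factor is M = 2**r
    """
    if len(c) != 8 or len(z) != 8:
        raise ValueError("blocks have dimension 8")
    if r < 0:
        raise ValueError("extension order must be non-negative")
    m = 1 << r  # M = 2^r, so M = 1 when there is no Voronoi extension
    return [m * ci + zi for ci, zi in zip(c, z)]


def blocks_per_frame(n):
    """Number K of 8-dimensional blocks for a spectrum of dimension N."""
    return n // 8
```

For the three spectrum dimensions given in the text, blocks_per_frame returns 36, 72 and 144. The example vectors used to exercise reconstruct_block need not be actual RE8 lattice points; the function only performs the scaling and addition.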
Decoding of the noise-fill parameter
The noise fill-in level noise is decoded by inverting the 3-bit uniform scalar quantization calculated at the encoder as in Step 4 of Section 5.3.5.7. For an index 0 ≤ idx1 ≤ 7, noise is given by: noise = 0.1 * (8 − idx1). However, the index idx1 may not be available. This is the case when BFI_TCX = (1) in TCX-256, (1 X) in TCX-512 and (X 1 X X) in TCX-1024, with X representing an arbitrary binary value. In this case, noise is set to its maximal value, i.e. noise = 0.8.
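A minimal sketch of this decoding rule is given below in Python. The function names, the use of None to signal a missing index, and the representation of BFI_TCX as a tuple of 0/1 flags are assumptions made for illustration.

```python
# Position, inside BFI_TCX, of the flag whose loss makes idx1 unavailable:
# (1) for TCX-256, (1 X) for TCX-512 and (X 1 X X) for TCX-1024.
LOSS_POSITION = {256: 0, 512: 0, 1024: 1}


def noise_index_lost(bfi_tcx, frame_len):
    """Return True if idx1 is unavailable for the given frame length."""
    return bfi_tcx[LOSS_POSITION[frame_len]] == 1


def decode_noise_level(idx1=None):
    """Invert the 3-bit uniform quantizer; idx1=None means 'not available'."""
    if idx1 is None:
        return 0.8  # maximal value, same as idx1 = 0
    if not 0 <= idx1 <= 7:
        raise ValueError("idx1 is a 3-bit index")
    return 0.1 * (8 - idx1)
```

A caller would typically evaluate noise_index_lost first and pass None to decode_noise_level when it returns True.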
Comfort noise is injected in the subvectors Bk that are rounded to zero and correspond to a frequency above Fs/24 kHz. More precisely, Z is initialized as Z = Y and, for K/6 ≤ k < K (only), if Yk = (0, 0, …, 0), Zk is replaced by the 8-dimensional vector:
noise * [ cos(θ1) sin(θ1) cos(θ2) sin(θ2) cos(θ3) sin(θ3) cos(θ4) sin(θ4) ],
where the phases θ1, θ2, θ3 and θ4 are randomly selected.
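The noise fill-in described above might be sketched as follows (Python, illustrative only). The function name fill_noise is hypothetical, and drawing each phase uniformly in [0, 2π) is an assumption: the text only states that the phases are randomly selected.

```python
import math
import random


def fill_noise(y_blocks, noise, rng=None):
    """Return Z: a copy of Y where all-zero blocks with k >= K/6 hold noise.

    y_blocks : list of K blocks, each a sequence of 8 numbers
    noise    : decoded noise fill-in level
    rng      : optional random.Random instance (for reproducible tests)
    """
    rng = rng or random.Random()
    k_total = len(y_blocks)
    z_blocks = [list(block) for block in y_blocks]  # Z is initialized as Y
    for k in range(k_total):
        in_upper_band = 6 * k >= k_total            # k >= K/6
        if in_upper_band and not any(y_blocks[k]):
            vec = []
            for _ in range(4):                      # four random phases
                theta = rng.uniform(0.0, 2.0 * math.pi)
                vec.append(noise * math.cos(theta))
                vec.append(noise * math.sin(theta))
            z_blocks[k] = vec
    return z_blocks
```

Each injected (cos, sin) pair has magnitude equal to noise, so every filled block carries the same energy regardless of the phases drawn.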
Low-frequency de-emphasis
After decoding the algebraic VQ parameters and noise-fill parameter, we obtain the quantized pre-shaped TCX spectrum X’. De-shaping is then applied as in Section 5.3.5.6.
Estimation of the dominant pitch value
The estimation of the dominant pitch is performed so that the next frame to be decoded can be properly extrapolated if it corresponds to TCX-256 and if the related packet is lost. This estimation is based on the assumption that the peak of maximal magnitude in the spectrum of the TCX target corresponds to the dominant pitch. The search for the maximum M is restricted to frequencies below Fs/64 kHz:
M = max_{i=1..N/32} [ (X’_{2i})^2 + (X’_{2i+1})^2 ]
and the minimal index imax, with 1 ≤ imax ≤ N/32, such that (X’_{2 imax})^2 + (X’_{2 imax + 1})^2 = M is also found. Then the dominant pitch is estimated in number of samples as Test = N / imax (this value may not be an integer). Recall that the dominant pitch is calculated for packet-erasure concealment in TCX-256. To avoid buffering problems (the excitation buffer being limited to 256 samples), if Test > 256 samples, pitch_tcx is set to 256; otherwise, if Test ≤ 256, multiple pitch periods in 256 samples are avoided by setting pitch_tcx to
pitch_tcx = max { ⌊n Test⌋ | n integer > 0 and n Test ≤ 256 }
where ⌊.⌋ denotes rounding to the nearest integer towards −∞.
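The pitch estimation can be sketched in Python as below. The function name estimate_pitch_tcx is hypothetical. The sketch uses integer arithmetic for the final step, which relies on the observation that the floor of n Test grows with n, so the maximum is reached at the largest admissible n; this is a convenience of the sketch, not something stated in the text.

```python
def estimate_pitch_tcx(x, max_lag=256):
    """Estimate pitch_tcx from the spectrum x = (X'_0, ..., X'_{N-1}).

    x[2*i] and x[2*i + 1] are the real and imaginary parts of bin i.
    """
    n = len(x)
    i_max, peak = 1, -1.0
    for i in range(1, n // 32 + 1):
        mag = x[2 * i] ** 2 + x[2 * i + 1] ** 2
        if mag > peak:          # strict '>' keeps the minimal index on ties
            i_max, peak = i, mag

    # Test = N / i_max; if Test > max_lag the lag is capped at max_lag.
    if n > max_lag * i_max:
        return max_lag

    # Largest integer n_rep with n_rep * Test <= max_lag.
    n_rep = (max_lag * i_max) // n
    # floor(n_rep * Test) computed exactly with integers.
    return (n_rep * n) // i_max
```

For example, with N = 288 and a peak at bin 5, Test = 57.6 samples, four periods fit in 256 samples, and the function returns 230.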
Inverse transform
To obtain the quantized perceptual signal, an inverse transform is applied to the de-shaped spectrum X’. The transform used at the encoder and decoder is the discrete Fourier transform, implemented as an FFT and an IFFT, respectively. Recall that due to the ordering used at the TCX encoder, the transform coefficients X’=(X’0,…,X’N-1) are such that:
X’0 corresponds to the DC coefficient,
X’1 corresponds to the Nyquist frequency, and
the coefficients X’2k and X’2k+1, for k=1..N/2-1, are the real and imaginary parts of the Fourier component at frequency k/(N/2) * Fs/4 kHz.
X’1 is always forced to 0. After this zeroing, the time-domain TCX target signal x’w is found by applying an inverse FFT to the quantized scaled spectrum X’. Rescaling is applied in the following section to obtain the total quantized weighted signal prior to windowing and overlapping.
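To make the coefficient ordering concrete, the sketch below inverts a spectrum stored in this packed layout with a direct (slow) inverse DFT in Python. The function name inverse_packed_dft is hypothetical. The 1/N normalization and the sign convention of the complex exponential are assumptions of this sketch, and a real decoder would use an IFFT rather than this quadratic-time loop.

```python
import math


def inverse_packed_dft(xq):
    """Inverse real DFT of a packed spectrum of even length N.

    Layout: xq[0] = DC, xq[1] = Nyquist (ignored, i.e. forced to 0),
    xq[2*k], xq[2*k + 1] = real and imaginary parts of bin k, k = 1..N/2-1.
    """
    n = len(xq)
    if n % 2:
        raise ValueError("N must be even")
    half = n // 2
    out = []
    for t in range(n):
        acc = xq[0]  # DC term; the Nyquist term is zeroed and thus skipped
        for k in range(1, half):
            ang = 2.0 * math.pi * k * t / n
            # A bin and its conjugate-symmetric mirror add up to
            # 2 * (Re * cos - Im * sin) for a real output signal.
            acc += 2.0 * (xq[2 * k] * math.cos(ang) - xq[2 * k + 1] * math.sin(ang))
        out.append(acc / n)
    return out
```

With N = 8, a spectrum whose only non-zero entry is xq[2] = 4 yields the cosine cos(πt/4), and any value placed in xq[1] has no effect on the output.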
Decoding of the global TCX gain and scaling
The (global) TCX gain gTCX is decoded by inverting the 7-bit logarithmic quantization calculated in the TCX encoder as in Section 5.2.5.10. First, the r.m.s. value of the TCX target signal x’w is computed as:
rms = sqrt( (1/N) (x’w_0^2 + x’w_1^2 + … + x’w_{N-1}^2) )
From the received 7-bit index 0 ≤ idx2 ≤ 127, the TCX gain is given by:
The (logarithmic) quantization step is around 0.71 dB.
This gain is used to scale x’w into xw. Note that from the mode extrapolation and the gain repetition strategy, the index idx2 is available in case of frame loss. However, in case of partial packet losses (1 loss for TCX-512 and up to 2 losses for TCX-1024) the least significant bit of idx2 may be set by default to 0 in the demultiplexer.
Windowing and overlap
Since the TCX encoder employs windowing with overlap and weighted ZIR removal prior to transform coding of the target signal, the reconstructed TCX target signal x = (x0, x1, …, xN-1) is actually found by overlap-add. The overlap-add depends on the type of the previous decoded frame (ACELP or TCX). The TCX target signal is first multiplied by a window w = [w0 w1 … wN-1], whose shape is described in Section 5.3.5.4.
Then, the overlap from the past decoded frame (OVLP_TCX) is added to the present windowed signal x. The overlap length OVLP_TCX depends on the past TCX frame length and on the mode of the past frame (ACELP or TCX).
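The windowing and overlap-add can be sketched as follows in Python. The function name window_and_overlap is hypothetical and the window is passed in as data, since its shape is defined in Section 5.3.5.4 and not reproduced here. Splitting the result into a frame of frame_len samples and a tail kept as the next overlap is an assumption of this sketch, suggested by the fact that N exceeds the TCX frame length.

```python
def window_and_overlap(x, window, ovlp, frame_len):
    """Window the TCX target, add the past overlap, split off the new tail.

    x         : TCX target signal of length N
    window    : window w of length N (shape defined elsewhere)
    ovlp      : overlap samples saved from the past decoded frame
    frame_len : TCX frame length without the overlap (256, 512 or 1024)

    Returns (frame, new_ovlp); keeping the tail as the next overlap is an
    assumption made for this illustration.
    """
    if len(x) != len(window):
        raise ValueError("signal and window must have the same length")
    if frame_len > len(x):
        raise ValueError("frame length cannot exceed the signal length")

    y = [xi * wi for xi, wi in zip(x, window)]   # windowing
    for i, o in enumerate(ovlp[:len(y)]):        # overlap-add at frame start
        y[i] += o
    return y[:frame_len], y[frame_len:]
```

Passing an all-zero ovlp corresponds to the situation in which the overlap buffer has been reset.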
Computation of the synthesis signal
The reconstructed TCX target is then filtered through the zero-state inverse perceptual filter to find the synthesis signal which will be applied to the synthesis filter. The excitation is also calculated, both to update the ACELP adaptive codebook and to allow switching from TCX to ACELP in a subsequent frame. Note that the length of the TCX synthesis is given by the TCX frame length (without the overlap): 256, 512 or 1024 samples.