6.2.2 MDCT based TCX
3GPP TS 26.445, Codec for Enhanced Voice Services (EVS); Detailed algorithmic description, Release 15
A general description of the MDCT based TCX coder module can be found in subclause 5.3.3.1. The following subclauses describe the configurations, the initialization process from the bit stream data, and the general decoding process.
6.2.2.1 Rate dependent configurations
The rate dependent configuration of the TCX coder is described for the encoder in subclause 5.3.3.1.2. The configurations are valid for the decoder as well. Depending on the configuration, the initialization order of the modules changes. The following subclause therefore describes the module initialization only in principle.
6.2.2.2 Init module parameters
6.2.2.2.1 TCX block configuration
The coding mode, TCX20 or TCX10, is signalled within the bit stream. The binary code for the overlap width, as defined in subclause 5.3.2.3, is read from the bit stream. The overlap code for the current frame is formed from the short/long transform decision bit and from this binary code. The TCX block is then configured as described in subclause 6.2.4.2.
6.2.2.2.2 LPC parameter
This subclause describes the inverse quantization of LPC.
6.2.2.2.2.1 Low-rate LPC
MDCT based TCX relies on a smoothed LPC spectral envelope. The LSF decoding process is common to that of ACELP, except at 9.6 kbps (NB/WB/SWB). The decoding of the weighted LSF and the conversion of the LSF used at 9.6 kbps (NB/WB/SWB) are described in this subclause.
The weighted-domain quantized LSF vector is reconstructed with a two-stage or three-stage VQ as described in 5.3.3.2.1. These VQ codebooks are used in two ways. In primary decoding, the weighted LSF vector is retrieved from the two-stage VQ, and the reconstructed LSF vector is obtained after adding the mean vector and the MA-predicted contribution vector. The reconstructed vector corresponds to the weighted envelope and can be used directly for inverse shaping of the MDCT coefficients. The VQ uses 5 bits for 16 samples at the first stage, 4 bits for the lower 6 LSF parameters, and 4 bits for the higher 10 LSF parameters.
In secondary decoding, the weighted LSF vector is reconstructed without adding the MA-predicted contribution vector. This provides the shape of the envelope for envelope based arithmetic coding even when the decoder cannot get the information from the previous frame. A conditional third-stage VQ, together with the addition of the mean vector, is applied to the vector retrieved from the two-stage VQ. After the LSF vector has been reconstructed with the two-stage VQ and the mean vector has been added, the first and the second lowest positions of the reconstructed LSF vector are checked. If a large spectral distortion is expected, the third-stage VQ is applied. The vector retrieved at the third stage with 2 bits modifies these two positions, but only when one of the checked values is smaller than the threshold. Otherwise, the LSF vector reconstructed up to the second-stage VQ is used as the final reconstructed LSF vector.
For the interpolation with the LSF of a possible ACELP frame following the current frame, the reconstructed LSF vector with MA prediction needs to be converted to an unweighted-domain LSF vector. In envelope based arithmetic coding, an unweighted LSF vector that does not depend on MA prediction is also necessary to estimate the MDCT envelope. Both can be obtained by the low-complexity direct matrix conversion described in 5.3.3.2.1.1.
6.2.2.2.2.2 Mid-rate LPC
The inverse quantization for mid-rate bitrates (13.2 to 32 kbps) is the same as for ACELP (with the quantizer being in AUDIO mode). However, the interpolation is different; it is the one described in subclause 5.3.3.2.1.2.
6.2.2.2.2.3 High-rate LPC
Inverse quantization of an LPC filter is performed as described in figure 93. The LPC filters are quantized using the line spectral frequency (LSF) representation. A first-stage approximation is computed by inverse Vector Quantization by a simple table look-up with an 8 bit index. An algebraic vector quantized (AVQ) refinement is then calculated as described in 6.1.1.2.1.4. The quantized LSF vector is reconstructed by adding the first-stage approximation and the inverse-weighted AVQ contribution.
Depending on the frame being coded as a single TCX20 frame or subdivided into TCX10/TCX5 sub-frames one or two sets of LPC have to be de-quantized. In case two sets are transmitted the first set is decoded in the same way a single set would be decoded. For the second set an initial bit signals if it has to be decoded depending on the first set or not. If zero, the first stage approximation of the second set has to be decoded with another 8 bit inverse Vector Quantizer. If one, the inverse quantized first set will be used as first stage approximation. The inverse weights in figure 93 are the reciprocal of the weights used in the encoder (see subclause 5.3.3.2.1.3.3).
The AVQ decoder is described in subclause 6.1.1.2.1.4.
Figure 93: Overview high-rate LPC decoding
The inverse-quantized LSF vector is subsequently converted into a vector of LSF coefficients, then interpolated and converted again into LPC parameters. The interpolation is the same as described in 5.3.3.2.1.2.
6.2.2.2.3 PLC Waveform adjustment
The waveform adjustment tool reads one bit for initialization where 1 stands for harmonic and 0 for non-harmonic.
6.2.2.2.4 Global Gain
The TCX global gain index is transmitted in the bitstream as a 7 bit unsigned integer.
6.2.2.2.5 Noise fill parameter
The TCX noise factor index is transmitted in the bitstream as a 3 bit unsigned integer.
6.2.2.2.6 LTP
The LTP on/off flag is transmitted in the bitstream as a single bit. If the LTP flag is one, the quantized pitch lag and gain are transmitted after the LTP flag. The pitch lag index
is transmitted as unsigned 9 bit integer. The gain index
is transmitted as unsigned 2 bit integer.
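The field layout described above (a 1-bit flag, then a 9-bit lag index and a 2-bit gain index when the flag is set) can be illustrated with a minimal C sketch. The `BitReader` type, `read_bits()`, `LtpParams` and `read_ltp_params()` are hypothetical names introduced for this illustration only; an MSB-first bit order and the lag-before-gain order are assumptions of the sketch, not statements of the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; not part of the specification. */
typedef struct {
    const uint8_t *buf; /* bit stream bytes */
    size_t pos;         /* index of the next bit to read */
} BitReader;

/* Read n bits as an unsigned integer, most significant bit first. */
unsigned read_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    while (n--) {
        unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
        v = (v << 1) | bit;
        br->pos++;
    }
    return v;
}

typedef struct {
    int on;            /* LTP on/off flag */
    unsigned lag_idx;  /* 9-bit pitch lag index, only valid if on != 0 */
    unsigned gain_idx; /* 2-bit gain index, only valid if on != 0 */
} LtpParams;

/* Read the LTP side information: flag first, then lag and gain if enabled. */
LtpParams read_ltp_params(BitReader *br)
{
    LtpParams p = {0, 0, 0};
    p.on = (int)read_bits(br, 1);
    if (p.on) {
        p.lag_idx = read_bits(br, 9);  /* assumed order: lag before gain */
        p.gain_idx = read_bits(br, 2);
    }
    return p;
}
```

When the flag is zero, the sketch consumes exactly one bit and leaves the lag and gain indices at zero.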
6.2.2.2.7 TNS parameter
The TNS on/off flag is transmitted in the bitstream as a single bit. If the TNS flag is one and if the configuration (see subclause 5.3.3.2.2) indicates that 2 filters are possible, then an additional single bit transmitted in the bitstream indicates whether the parameters for 1 or 2 filters are transmitted in the bitstream (nMaxFilters = 1 or nMaxFilters = 2). For each filter, the order and the coefficients are the parameters transmitted in the bitstream. The order and the coefficients are coded using Huffman coding. The Huffman code used for the filter order depends on the frame configuration (TCX20/TCX10/TCX5) and on the bandwidth (SWB, WB). The Huffman code used for a parcor coefficient depends on the frame configuration (TCX20/TCX10/TCX5), on the bandwidth (SWB, WB) and on the parcor coefficient's index. If the parameters for 1 filter are transmitted in the bitstream, then the second filter is inactive. If the parameters for 2 filters are transmitted in the bitstream, then order 1 and a filter coefficient set to 0 for the first filter indicate that the first filter is disabled and that only the second filter is active.
6.2.2.2.8 Harmonic model
For both context and envelope based arithmetic coding, a harmonic model is used for efficient coding of frames with harmonic content. The harmonic model is disabled if any of the following conditions apply:
– The bit-rate is not one of 9.6, 13.2, 16.4, 24.4, 32, 48 kbps.
– The previous frame was coded by ACELP.
– Envelope based arithmetic coding is used and the coder type is neither Voiced nor Generic.
In the above cases, no further signalling is used.
Otherwise, a single-bit harmonic model flag is read from the bit-stream. When the flag is non-zero, the decoding proceeds by reading the harmonic model interval parameter as follows.
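The three disabling conditions listed above can be read as a single predicate that tells the decoder whether the harmonic model flag is present in the bit stream. The following C sketch illustrates this; the function name, the `CoderType` enumeration and the representation of the bit-rate in bit/s are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical coder type labels; only Voiced and Generic matter here. */
typedef enum { CODER_VOICED, CODER_GENERIC, CODER_OTHER } CoderType;

/* Returns true if the single-bit harmonic model flag is read from the
 * bit stream, false if the harmonic model is disabled without signalling. */
bool harmonic_model_flag_is_read(long bitrate_bps,
                                 bool prev_frame_was_acelp,
                                 bool envelope_based_coding,
                                 CoderType coder_type)
{
    static const long allowed[] = {9600, 13200, 16400, 24400, 32000, 48000};
    bool rate_ok = false;

    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++) {
        if (bitrate_bps == allowed[i]) {
            rate_ok = true;
        }
    }
    if (!rate_ok) {
        return false; /* bit-rate not in the supported set */
    }
    if (prev_frame_was_acelp) {
        return false; /* previous frame was coded by ACELP */
    }
    if (envelope_based_coding &&
        coder_type != CODER_VOICED && coder_type != CODER_GENERIC) {
        return false; /* envelope based coding, neither Voiced nor Generic */
    }
    return true;
}
```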
6.2.2.2.8.1 Decoding of Interval of harmonics
When pitch lag and gain are used for the LTP post processing, the lag parameter is utilized for representing the interval of harmonics in the frequency domain. Otherwise, normal representation of interval is applied.
6.2.2.2.8.1.1 Decoding interval depending on time domain pitch lag
The required parameters are set up according to the procedure in subclause 5.3.3.2.8.1.8.1.1, the interval index is read from the bit stream, and finally the interval of harmonics is calculated.
6.2.2.2.8.1.2 Decoding interval without depending on time domain pitch lag
When the pitch lag and gain in the time domain are not used, or the pitch gain is less than or equal to 0.46, normal decoding of the interval with unequal resolution is used. The 8-bit index is read from the bit stream and the interval parameters are calculated as in subclause 5.3.3.2.8.1.8.1.1.
6.2.2.2.8.2 Decoding of gain
In case of envelope based arithmetic coding and Voiced coder type, a 2-bit gain index is read from the bit-stream.
6.2.2.2.9 IGF bit stream reader
On the decoder side the IGF scale factors, the IGF whitening levels and the IGF temporal flatness indicator flag are extracted from the bit stream and subsequently decoded. The decoding of the IGF scale factors is described in subclause 6.2.2.2.9.2.
6.2.2.2.9.1 IGF whitening level decoding
The IGF whitening levels are decoded according to the following pseudo code:
nT = nTiles;
for (k = 0; k < nT; k++) {
  currWLevel[k] = 0;
}
tmp = -1;
if (isIndep) {
  tmp = 0;
} else {
  tmp = read_bit();
}
if (tmp == 1) {
  for (k = 0; k < nT; k++) {
    currWLevel[k] = prevWLevel[k];
  }
} else {
  k = 0;
  currWLevel[k] = decode_whitening_level(k);
  tmp = read_bit();
  if (tmp == 1) {
    for (k = 1; k < nT; k++) {
      currWLevel[k] = decode_whitening_level(k);
    }
  } else {
    for (k = 1; k < nT; k++) {
      currWLevel[k] = currWLevel[0];
    }
  }
}
for (k = 0; k < nT; k++) {
  prevWLevel[k] = currWLevel[k];
}
wherein nTiles is the number of IGF tiles, isIndep is the IGF independency flag, the vector currWLevel holds the whitening levels of the current frame, the vector prevWLevel contains the whitening levels from the previous frame, and the function decode_whitening_level decodes the actual whitening level from the bit stream. The function is implemented according to the pseudo code below:
tmp = read_bit();
if (tmp == 1) {
tmp = read_bit();
if (tmp == 1) {
return 2;
} else {
return 0;
}
} else {
return 1;
}
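The pseudo code above amounts to a three-entry prefix code: "0" maps to level 1, "10" to level 0 and "11" to level 2. A self-contained C sketch follows, assuming a hypothetical MSB-first `BitReader` in place of the codec's bit stream reader.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; not part of the specification. */
typedef struct {
    const uint8_t *buf; /* bit stream bytes */
    size_t pos;         /* index of the next bit to read */
} BitReader;

/* Read a single bit. */
int read_bit(BitReader *br)
{
    int bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Prefix code for one whitening level: "0" -> 1, "10" -> 0, "11" -> 2. */
int decode_whitening_level(BitReader *br)
{
    if (read_bit(br) == 1) {
        return (read_bit(br) == 1) ? 2 : 0;
    }
    return 1;
}
```

The shortest code word is assigned to level 1, so a frame that uses level 1 for all tiles spends the fewest bits on whitening levels.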
Finally the IGF temporal flatness indicator flag is extracted from the bit stream.
In case of a TCX10 frame, the IGF side information for both sub-frames is extracted from the bit stream prior to the IGF processing. The decoded side information for the individual sub-frames is therefore stored in a temporary buffer, so that the side information for the sub-frame currently under IGF processing can be accessed via this buffer.
6.2.2.2.9.2 IGF noiseless decoding of scale factors
The noiseless decoding of the IGF scale factor vector is very similar to the noiseless encoding part. The entire encoding and decoding procedures are highly symmetric, and therefore the decoding procedure can be uniquely and unambiguously derived from the encoding procedure.
The module uses the common raw arithmetic decoder functions from the infrastructure, which are available from the core coder. The functions used are ari_decode_14bits_bit_ext, which decodes one bit; ari_decode_14bits_s27_ext, which decodes one value from an alphabet of 27 symbols (SYMBOLS_IN_TABLE) using a cumulative frequency table; and the initialization function of the arithmetic decoder. Note that there is no common function to finalize the arithmetic decoder. Instead, an equivalent function was locally defined, which returns the last 14 bits read back to the bit stream reader.
6.2.2.2.9.2.1 IGF independency flag
The behaviour and processing related to the independency flag are identical to the encoder side.
6.2.2.2.9.2.2 IGF all-Zero flag
The all-Zero flag is read from the bit stream first. In case the flag is 1, the decoder state is reset and no further data is read from the bit stream, because the decoded scale factors are all set to zero:
$g(k) = 0, \quad 0 \le k < nB$ (1762)
Otherwise, if the flag is 0, the arithmetic coded scale factor vector is decoded from the bit stream.
6.2.2.2.9.2.3 IGF arithmetic decoding helper functions
6.2.2.2.9.2.3.1 The reset function
The behaviour and processing related to the reset function is identical to the encoder side.
6.2.2.2.9.2.3.2 The arith_decode_bits function
The arith_decode_bits function decodes an unsigned integer of length nBits bits, by reading one bit at a time.
arith_decode_bits(nBits)
{
x = 0;
for (i = 0; i < nBits; ++i) {
ari_decode_14bits_bit_ext(&bit);
x = (x << 1) | bit;
}
return x;
}
6.2.2.2.9.2.4 IGF arithmetic decoding
The arith_decode_residual function decodes an integer valued prediction residual, using the cumulative frequency table cumulativeFrequencyTable and the table offset tableOffset.
The behaviour and processing related to the function are very similar to the corresponding function in the encoder.
arith_decode_residual(cumulativeFrequencyTable, tableOffset)
{
  ari_decode_14bits_s27_ext(&val, cumulativeFrequencyTable);
  if ((val != 0) && (val != SYMBOLS_IN_TABLE - 1)) {
    x = (val - 1) + MIN_ENC_SEPARATE;
    x -= tableOffset;
    return x;
  }
  extra = arith_decode_bits(4);
  if (extra == 15) {
    extra_tmp = arith_decode_bits(6);
    if (extra_tmp == 63) {
      extra_tmp = 63 + arith_decode_bits(7);
    }
    extra = 15 + extra_tmp;
  }
  if (val == 0) {
    x = (MIN_ENC_SEPARATE - 1) - extra;
  } else { /* val == SYMBOLS_IN_TABLE - 1 */
    x = (MAX_ENC_SEPARATE + 1) + extra;
  }
  x -= tableOffset;
  return x;
}
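The escape mechanism in arith_decode_residual can be isolated into two small steps: decoding the variable-length `extra` value (4 bits, extended by 6 and then 7 bits) and mapping the symbol plus `extra` to the residual. The C sketch below is illustrative only. A plain MSB-first `BitReader` stands in for arith_decode_bits, and the constant values MIN_ENC_SEPARATE = -12 and MAX_ENC_SEPARATE = 12 are assumptions chosen to be consistent with an alphabet of 27 symbols (25 directly coded values plus two escape symbols); they are not stated in this subclause.

```c
#include <stddef.h>
#include <stdint.h>

#define SYMBOLS_IN_TABLE 27
#define MIN_ENC_SEPARATE (-12) /* assumed value for this sketch */
#define MAX_ENC_SEPARATE 12    /* assumed value for this sketch */

/* Hypothetical MSB-first bit reader standing in for arith_decode_bits(). */
typedef struct {
    const uint8_t *buf;
    size_t pos;
} BitReader;

unsigned read_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    while (n--) {
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
        br->pos++;
    }
    return v;
}

/* Escape magnitude: 4 bits, extended by 6 bits, then by 7 more bits. */
unsigned decode_extra(BitReader *br)
{
    unsigned extra = read_bits(br, 4);
    if (extra == 15) {
        unsigned extra_tmp = read_bits(br, 6);
        if (extra_tmp == 63) {
            extra_tmp = 63 + read_bits(br, 7);
        }
        extra = 15 + extra_tmp;
    }
    return extra;
}

/* Map the decoded symbol (and escape magnitude, if any) to the residual. */
int residual_value(int val, unsigned extra, int table_offset)
{
    int x;
    if (val != 0 && val != SYMBOLS_IN_TABLE - 1) {
        x = (val - 1) + MIN_ENC_SEPARATE;       /* directly coded value */
    } else if (val == 0) {
        x = (MIN_ENC_SEPARATE - 1) - (int)extra; /* negative escape */
    } else {
        x = (MAX_ENC_SEPARATE + 1) + (int)extra; /* positive escape */
    }
    return x - table_offset;
}
```

With these assumed constants, symbol 13 maps to a residual of 0 before the table offset is subtracted, and the escape extension reaches 15 + 63 + 127 = 205 at most.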
The decode_sfe_vector function decodes the scale factor vector g, which consists of nB integer values. The value t and the prev vector, which constitute the decoder state, are used as additional parameters for the function. Note that the top level function must call the common arithmetic decoder initialization function before calling decode_sfe_vector, and must also call the locally defined arithmetic decoder finalization function afterwards.
decode_sfe_vector(t, prev, g, nB)
{
  for (f = 0; f < nB; f++) {
    if (t == 0) {
      if (f == 0) {
        ari_decode_14bits_s27_ext(&pred, cf_se00);
        g[f] = pred << 2;
        g[f] += arith_decode_bits(2); /* LSBs as 2 bit raw */
      }
      else if (f == 1) {
        pred = g[f - 1]; /* pred = b */
        g[f] = pred + arith_decode_residual(cf_se01, cf_off_se01);
      } else { /* f >= 2 */
        pred = g[f - 1]; /* pred = b */
        ctx = quant_ctx(g[f - 1] - g[f - 2]); /* Q(b - e) */
        g[f] = pred + arith_decode_residual(cf_se02[CTX_OFFSET + ctx],
                                            cf_off_se02[CTX_OFFSET + ctx]);
      }
    }
    else { /* t == 1 */
      if (f == 0) {
        pred = prev[f]; /* pred = a */
        g[f] = pred + arith_decode_residual(cf_se10, cf_off_se10);
      } else { /* (t == 1) && (f >= 1) */
        pred = prev[f] + g[f - 1] - prev[f - 1]; /* pred = a + b - c */
        ctx_f = quant_ctx(prev[f] - prev[f - 1]); /* Q(a - c) */
        ctx_t = quant_ctx(g[f - 1] - prev[f - 1]); /* Q(b - c) */
        g[f] = pred + arith_decode_residual(
                 cf_se11[CTX_OFFSET + ctx_t][CTX_OFFSET + ctx_f],
                 cf_off_se11[CTX_OFFSET + ctx_t][CTX_OFFSET + ctx_f]);
      }
    }
  }
}
The cumulative frequency tables and the corresponding table offsets are initialized identically to the encoder side.
6.2.2.2.10 Spectral data
The quantized spectral coefficients are read from the bit-stream by the means of arithmetic decoding.
6.2.2.2.11 Residual bits
The left-over bits (to the target bit budget) after arithmetic decoding are read by the residual decoding module.
6.2.2.3 Decoding process
6.2.2.3.1 Arithmetic decoder
The arithmetic decoder is described by the following pseudo-code. It takes as input arguments the cumulative frequency table cum_freq[] and the size of the alphabet cfl.
symbol = ari_decode(cum_freq[], cfl) {
  if (arith_first_symbol()) {
    value = 0;
    for (i=1; i<=16; i++) {
      value = (value<<1) | arith_get_next_bit();
    }
    low = 0;
    high = 65535;
  }
  range = high-low+1;
  cum = ((((int) (value-low+1))<<14)-((int) 1))/range;
  p = cum_freq-1;
  do {
    q = p + (cfl>>1);
    if ( *q > cum ) { p=q; cfl++; }
    cfl>>=1;
  }
  while ( cfl>1 );
  symbol = p-cum_freq+1;
  if (symbol)
    high = low + ((range*cum_freq[symbol-1])>>14) - 1;
  low += (range*cum_freq[symbol])>>14;
  for (;;) {
    if (high<32768) { }
    else if (low>=32768) {
      value -= 32768;
      low -= 32768;
      high -= 32768;
    }
    else if (low>=16384 && high<49152) {
      value -= 16384;
      low -= 16384;
      high -= 16384;
    }
    else break;
    low += low;
    high += high+1;
    value = (value<<1) | arith_get_next_bit();
  }
  return symbol;
}
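The do/while loop in the pseudo-code above is a bisection over the cumulative frequency table: it returns the number of table entries that are strictly greater than cum. The following C sketch isolates this step using array indices instead of pointers. It assumes, as the pseudo-code implies, that cum_freq[] is in descending order, that its last entry is 0, and that cfl is at least 2; the function name is hypothetical.

```c
/* Bisection search of the arithmetic decoder, isolated from ari_decode().
 * cum_freq: descending cumulative frequencies, cum_freq[cfl-1] == 0
 * cfl:      alphabet size (number of table entries), assumed >= 2
 * cum:      scaled cumulative value derived from the decoder state
 * Returns the decoded symbol, i.e. the count of entries greater than cum. */
int ari_find_symbol(const unsigned short *cum_freq, int cfl, int cum)
{
    int p = -1; /* index form of "p = cum_freq - 1" */

    do {
        int q = p + (cfl >> 1);
        if (cum_freq[q] > cum) {
            p = q;  /* answer lies above q: move the lower bound up */
            cfl++;  /* round the remaining interval size up */
        }
        cfl >>= 1;
    } while (cfl > 1);

    return p + 1; /* index form of "symbol = p - cum_freq + 1" */
}
```

For the table {12000, 8000, 3000, 0} and cum = 5000, two entries exceed cum, so the symbol is 2; the interval update in ari_decode then uses cum_freq[1] and cum_freq[2] as the new upper and lower bounds.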
6.2.2.3.1.1 Context-based arithmetic decoder
The context-based arithmetic decoder reads the following data in the following order:
- The bits for decoding lastnz/2-1
- The entropy-coded MSB bits
- The sign bits
- The residual quantization bits
- The LSB bits, which are read backwards from the end of the bit stream buffer
The following pseudo-code shows how the spectral coefficients X[] are decoded. It takes as input arguments the allocated bit budget target_bits and the number of coded samples lastnz. The helper functions are given in the encoder subclause 5.3.3.2.8.1.2.
X=ari_context_decode(target_bits,pi,hi,lastnz) {
  c[0]=c[1]=p1=p2=0;
  for (k=0; k<L; k++) {
    X[k]=0;
  }
  for (k=0; k<lastnz; k+=2) {
    a=b=0;
    (a1_i,p1,idx1) = get_next_coeff(pi,hi,lastnz);
    (b1_i,p2,idx2) = get_next_coeff(pi,hi,lastnz);
    t=get_context(idx1,idx2,c,p1,p2);
    /* MSBs decoding */
    for (lev=esc_nb=0;;) {
      pki = ari_context_lookup[t + 1024*esc_nb];
      r = ari_decode(ari_cf_m[pki],17);
      if (r<16) break;
      /* LSBs decoding */
      a = a + (read_bit_end()<<lev);
      b = b + (read_bit_end()<<lev);
      lev++;
      esc_nb=min(lev,3);
    }
    /* MSBs contributions */
    b1 = r>>2;
    a1 = r&0x3;
    a += (a1)<<lev;
    b += (b1)<<lev;
    /* Detect overflow */
    if (nbbits>target_bits) {
      break;
    }
    c=update_context(a,b,a1,b1,c,p1,p2);
    /* Store decoded data */
    X[a1_i] = a;
    X[b1_i] = b;
  }
  /* decode signs */
  for (i=0; i<L; i++) {
    if (X[i]>0) {
      if (read_bit()==1) {
        X[i] = -X[i];
      }
    }
  }
}
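The way the pseudo-code above assembles each coefficient pair can be summarized as follows: every escape symbol (r >= 16) contributes one LSB plane, and the final symbol r < 16 carries two MSB bits for each coefficient, shifted above the lev LSB planes read so far. A small C sketch of this composition is given below; the function name and the array-based interface for the already read LSB bits are assumptions of the example.

```c
/* Compose one coefficient pair from the final MSB symbol and the LSB planes.
 * r:     final MSB symbol, 0..15 (a1 = r & 3, b1 = r >> 2)
 * lsb_a: LSB bits of the first coefficient, one per escape level 0..lev-1
 * lsb_b: LSB bits of the second coefficient, one per escape level 0..lev-1
 * lev:   number of escape levels that preceded the final symbol */
void compose_pair(int r, const int *lsb_a, const int *lsb_b, int lev,
                  int *a, int *b)
{
    int va = 0;
    int vb = 0;

    for (int l = 0; l < lev; l++) {
        va += lsb_a[l] << l; /* LSB plane l of the first coefficient */
        vb += lsb_b[l] << l; /* LSB plane l of the second coefficient */
    }
    va += (r & 0x3) << lev;  /* MSB contribution a1 */
    vb += (r >> 2) << lev;   /* MSB contribution b1 */

    *a = va;
    *b = vb;
}
```

For example, r = 11 gives a1 = 3 and b1 = 2; after two escape levels with LSB bits {1, 0} and {0, 1}, the magnitudes are a = 1 + (3 << 2) = 13 and b = 2 + (2 << 2) = 10.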
6.2.2.3.1.2 Envelope-based arithmetic decoder
The probability model is computed as described in the encoder subclause 5.3.3.2.8.1.2.3.
6.2.2.3.2 Adaptive low frequency de-emphasis
A general description of ALFE can be found in subclause 5.3.3.2.4.1.
6.2.2.3.2.1 Adaptive de-emphasis algorithm 1
ALFE algorithm 1 reverses the encoder-side LF emphasis 1 (see subclause 5.3.3.2.4.2). First, as was done in the encoder, the minimum and maximum of the first nine gains are found using comparison operations executed within a loop over the gain indices 0 to 8.
Then, if the ratio between the minimum and maximum exceeds a threshold of 1/32, a gradual lowering of the lowest lines in X is performed such that the first line is attenuated by (max/(32·min))^0.25 and the 33rd line is not attenuated:
tmp = 32 * min;
if ((max < tmp) && (tmp > 0)) {
  fac = tmp = pow(max / tmp, 1.0/128.0);
  for (i = 31; i >= 0; i--) { /* gradual lowering of lowest 32 lines */
    X[i] *= fac;
    fac *= tmp;
  }
}
6.2.2.3.2.2 Adaptive de-emphasis algorithm 2
ALFE algorithm 2 reverses the encoder-side LF emphasis 2 (see subclause 5.3.3.2.4.3) by checking for modifications to the quantized LF MDCT lines and undoing them. As in the encoder, the procedure is split into five steps:
- Step 1: find the first magnitude maximum at index i_max in the lower spectral quarter (the first quarter of the spectral lines) for which |Xq[k]| ≥ 4, and modify the maximum as follows: Xq[i_max] += (Xq[i_max] < 0) ? 2 : -2
- Step 2: expand the value range of all X[k] up to i_max by multiplying all lines at k = 0…i_max-1 with 0.5
- Step 3: again find the first magnitude maximum in the lower quarter of the spectrum if the i_max found in step 1 is > -1
- Step 4: again expand the value range of all X[i] up to i_max as in step 2, but using the i_max found in step 3
- Step 5: finish and always expand two lines at the latest i_max found, i.e. at k = i_max+1, i_max+2. If the line magnitude at k is greater than or equal to 4, move it toward zero by two, otherwise multiply it by 0.5. As in the encoder, all i_max are initialized to -1. For details, see AdaptLowFreqDeemph() in tcx_utils.c.
6.2.2.3.3 Global gain decoding
The global gain is decoded from the index
transmitted in the bit stream as follows:
(1763)
6.2.2.3.4 Residual bits decoding
At 13.2 kbps and above the 3 first bits are used for refining the global gain. The variable n is initialized to 0:
The following bits refine the non-zeroed decoded lines. One bit per non-zeroed spectral value is read. The rounding offset used in the first quantization stage with dead-zone is taken into account when computing the reconstructed points:
If at least 2 bits are left to read, a zeroed value is refined as:
6.2.2.3.5 TCX formant enhancement
The TCX formant enhancement intends to replicate a behavior similar to that of the ACELP formant enhancement. It operates based on the LPC frequency-band gains, lpcGains[]. First, the square-root of each gain is computed. Then,
fac = 1 / min(sqrtGains[0], sqrtGains[1]);
k = 0;
for (i = 1; i < numGains - 1; i++) {
  if ((sqrtGains[i-1] <= sqrtGains[i]) && (sqrtGains[i+1] <= sqrtGains[i])) {
    step = max(sqrtGains[i-1], sqrtGains[i+1]);
    step = (1 / step - fac) / (i - k);
    sqrtGains[k] = 1;
    fac += step;
    for (j = k + 1; j < i; j++) {
      sqrtGains[j] = min(1, sqrtGains[j] * fac);
      fac += step;
    }
    k = i;
  }
}
where sqrtGains[] contains the square roots of the lpcGains[], and numGains denotes the number of LPC gains. In order to complete the above algorithm for the last gain at i = numGains - 1, the following operation is executed,
step = min(sqrtGains[i-1], sqrtGains[i]),
and the above steps inside the if-condition, starting with "step = (1 / step - fac) / (i - k)", are repeated. Finally, sqrtGains[numGains - 1] is set to 1 and the modified set of gains is multiplied onto the decoded spectrum:
for (i = j = 0; i < L; j++) {
  for (k = 0; k < L / numGains; i++, k++) {
    x[i] *= sqrtGains[j];
  }
}
with x being the decoded spectrum and L the number of spectral lines covered by the LPC gains. Like its ACELP counterpart, TCX formant enhancement is only used at 9.6 kbps.
6.2.2.3.6 Noise Filling
Noise filling is applied to fill gaps in the MDCT spectrum where coefficients have been quantized to zero. Pseudo-random noise is inserted into the gaps between a start bin and a stop bin. The amount of noise inserted is controlled by a noise factor transmitted in the bit stream. To compensate for the LPC tilt, a tilt compensation factor is computed. At each side of a noise filling segment, a fadeout over the transition width is applied to the inserted noise to smooth the transition.
The start and stop bins are determined as described in subclause 5.3.3.2.10.2.
is set to the same value as
:
(1764)
Computation of the tilt compensation factor is described in subclause 5.3.3.2.10.1. The transition width
is computed as described in 5.3.3.2.10.3.
6.2.2.3.6.1 Decoding of Noise Factor
The dequantized noise factor is obtained from the transmitted index
as follows:
(1765)
6.2.2.3.6.2 Noise Filling Seed
The inserted noise is generated as a sequence of pseudo-random numbers, which is computed in a recursive way starting with a seed computed from the quantized MDCT coefficients
:
(1766)
6.2.2.3.6.3 Filling Noise Segments
Determining the number and start/stop bins of noise filling segments is described in 5.3.3.2.10.2 and 5.3.3.2.10.4.
For each segment pseudo-random noise is generated and normalized, so that it has an RMS of one. Then tilt compensation, noise factor and transition fadeouts are applied. The resulting coefficients are inserted to the quantized MDCT spectrum and replace the zeroes in the noise filling segments. The following pseudo-code defines the exact procedure:
6.2.2.3.7 Apply global gain and LPC shaping in MDCT domain
The decoded global gain factor is applied to all MDCT coefficients. The LPC shaping of the MDCT spectrum applied on encoder side is inverted by multiplying the spectral coefficients by the LPC shaping gains
.
The computation of the shaping gains is performed in the same way as on encoder side, see subclause 5.3.3.2.3.2.
The following pseudo-code defines how global gain and LPC shaping are applied to the MDCT bins corresponding to the CELP frequency range:
For the remaining MDCT coefficients above the CELP frequency range (if any) the last LPC shaping gain is used:
(1767)
6.2.2.3.8 IGF apply
6.2.2.3.8.1 IGF independent noise filling
IGF uses independent noise filling in the IGF range. Through independent noise filling, core coder noise filling is replaced by random noise which is de-correlated from the core coder noise filling. For this purpose, an indicator vector is filled with either 0 or 1 by evaluating the TCX noise-filling routine from subclause 6.2.2.3.6.3, such that every subband that is noise-filled by TCX noise filling is marked with 1 and all other entries are set to 0.
First, the total noise energy in the IGF source range in the decoded MDCT spectrum
is calculated:
, (1768)
where . The noise indicated by
is replaced according to the following formula:
(1769)
where contains
copies of the spectrum with independent noise per copy, i.e. per IGF tile. For creating pseudo random numbers r(i), the random generator described in subclause 6.2.2.3.6.3 is used.
The energy of the inserted pseudo random numbers is measured with
. (1770)
Now the inserted noise is adjusted to the same energy level as the original noise. Therefore the correction factor is calculated:
(1771)
Using , the replaced noise is rescaled to match the original noise energy level in
:
(1772)
6.2.2.3.8.2 IGF whitening generation
In order to remove possible formant structure of the tiled signal and to suppress strong tonal components the routine IGF_getWhiteSpectralData() will be applied to the TCX spectrum
if the bitstream element
is 1 for any tile
. The algorithm is a low complex simplification of the following formula:
(1773)
which is a division of the spectrum by the square root of a moving average calculated on the spectrum. denotes the TCX coefficient of the decoded core signal with index
prior to application of the LPC filter.
Since the above formula would need a division and a square root operation per line, two complex operations (18 and 10 OPS), the operations are done in the logarithmic domain, and the logarithm is replaced with a low-complexity rounded integer logarithm to base 2.
(1774)
where
(1775)
The length of the moving average (MA) filter is 15 bins in total.
The range of bins on which this whitening operation has to be carried out is going from
to
, where
is the index of the first scale factor band – 1. Because
is always greater 7, the MA filter will be calculated from
on. However, the calculation of the MA filter has to be different for the last
bins below
:
(1776)
If the bitstream element is 2 for any tile
no core signal will be copied but a sequence of pseudo-random numbers will be used instead as described in subclause 6.2.2.3.6.2.
The seed for the pseudo random number generator (described in subclause 6.2.2.3.6.3) is derived from the TCX noise filling seed by:
(1777)
6.2.2.3.8.3 IGF envelope reconstruction
The IGF envelope reconstruction tool shapes the noise components filled into the gaps in the IGF range in order to adjust the spectral envelope as a function of the transmitted IGF scale factors.
6.2.2.3.8.3.1 Dequantizing IGF scale factors
For de-quantizing the IGF scale factors , transmitted in the bitstream, to
the following mapping is applied:
(1778)
6.2.2.3.8.3.2 Refining IGF scale factor borders
In order to optionally smooth the transmitted scale factors along the frequency axis, a refinement of the IGF scale-factor borders is introduced:
(1779)
The de-quantized IGF scale factors shall be mapped:
(1780)
For simplicity, is also mapped:
(1781)
The IGF envelope refinement is active for bitrates ≤ 64 kbps for all operating modes WB, SWB, FB.
6.2.2.3.8.3.3 Collecting energies below
To stabilize energy distribution in the range of , energy below
is collected:
(1782)
The energy is later used to adapt the first IGF scale factor band energy as described in subclause 6.2.2.3.8.3.7.
6.2.2.3.8.3.4 Collecting residual energies in IGF range
The residual energy determines the energy of the non-zero subbands in the IGF range:
(1783)
is therefore the energy of the de-quantized subband values above
which are not quantized to zero by the tonal mask detection of the encoder described in subclause 5.3.3.2.11.5.
6.2.2.3.8.3.5 Collecting tile energies in IGF range
The tile energy determines the energy of the signal which is filled into the gaps in the IGF range:
(1784)
(1785)
where is the signal after filling the gaps in
using the mapping function
as described in subclause 5.3.3.2.11.1.8:
(1786)
is therefore the energy of the synthesized subband values above
which are quantized to zero by the tonal mask detection of the encoder described in subclause 5.3.3.2.11.5.
6.2.2.3.8.3.6 Rescaling IGF scale factor band energies
The rescaling of the IGF scale factors has to be done in order to bring them on the correct energy level for the subsequent calculation of the IGF gains. In dependency of the refinement rescaling is applied on a scale factor band basis or on a group of scale factor bands. The rescaled IGF scale factor band energy is therefore called IGF destination energy .
In case refinement is not active, the rescaling is applied as follows:
(1787)
In case refinement is active, two subsequent scale factor band energies are mapped:
(1788)
6.2.2.3.8.3.7 Adaption of IGF scale factor band energies
The IGF scale factor band energies have to be adapted to fulfil the signal requirements. The first scale factor band energy is adapted to the energy below using
as introduced in subclause 6.2.2.3.8.3.3:
(1789)
where is the adapted IGF scale factor band energy
and
is the adaption factor for the first scale factor band energy according to table :
Table 166: IGF scale factor band energy adaption factors
| Bitrate | Mode | | | |
| 9.6 kbps | WB | 0.7 | 0.8 | 0.6 |
| 9.6 kbps | SWB | 0 | 1.0 | 1.0 |
| 13.2 kbps | SWB | 0.2 | 0.93 | 0.85 |
| 16.4 kbps | SWB | 0.2 | 0.93 | 0.85 |
| 24.4 kbps | SWB | 0.2 | 0.965 | 0.85 |
| 32.0 kbps | SWB | 0.2 | 0.965 | 0.85 |
| 48.0 kbps | SWB | 0.2 | 1.0 | 1.0 |
| 64.0 kbps | SWB | 0.2 | 1.0 | 1.0 |
| 16.4 kbps | FB | 0.2 | 0.93 | 0.85 |
| 24.4 kbps | FB | 0.2 | 0.965 | 0.85 |
| 32.0 kbps | FB | 0.2 | 0.965 | 0.85 |
| 48.0 kbps | FB | 0.2 | 1.0 | 1.0 |
| 64.0 kbps | FB | 0.2 | 1.0 | 1.0 |
| 96.0 kbps | FB | 0 | 1.0 | 1.0 |
| 128.0 kbps | FB | 0 | 1.0 | 1.0 |
The last scale factor band energy is adapted as follows:
(1790)
where is the adaption factor for the last scale factor band energy according to table 166.
If refinement is active and , the remaining scale factor band energies are low-pass filtered in order to smooth the frequency envelope:
(1791)
Otherwise the scale factor band energies are not affected by further modifications:
(1792)
6.2.2.3.8.3.8 Calculation of IGF gain factors
The IGF gain factors are used to finally shape the tiled subband values in order to adjust the spectral envelope of the synthesized signal above . First, the target energy level
has to be calculated:
(1793)
(1794)
where is the hop-size of the refinement in dependency of
and the maximal possible hop-size
according to table 167:
(1795)
Table 167: Maximal IGF hop-size
| Bitrate | Mode | Maximal hop-size |
| 9.6 kbps | WB | 4 |
| 9.6 kbps | SWB | 2 |
| 13.2 kbps | SWB | 4 |
| 16.4 kbps | SWB | 4 |
| 24.4 kbps | SWB | 4 |
| 32.0 kbps | SWB | 4 |
| 48.0 kbps | SWB | 4 |
| 64.0 kbps | SWB | 4 |
| 16.4 kbps | FB | 4 |
| 24.4 kbps | FB | 2 |
| 32.0 kbps | FB | 2 |
| 48.0 kbps | FB | 2 |
| 64.0 kbps | FB | 2 |
| 96.0 kbps | FB | 1 |
| 128.0 kbps | FB | 1 |
Second, a normalization term for normalizing the target energy
is calculated as follows:
(1796)
(1797)
Finally, the IGF gain factors have to be calculated according to the following formula:
(1798)
where is the general adaption factor for all scale factor band energy according to table 166.
If the hop-size is greater than 1, the IGF gain factors within a particular hop are held at the value of the hop start:
(1799)
void (1800)
6.2.2.3.8.3.9 IGF envelope adjustment
With the calculated IGF gain factors as described in subclause 6.2.2.3.8.3.8, the envelope of the spectrum above
is adjusted as follows:
(1801)
where shall be already mapped with the function
see subclause 5.3.3.2.11.1.1, and
being the number of bands.
is the gap-filled signal in accordance with subclause 6.2.2.3.8.3.5.
6.2.2.3.9 Inverse window grouping (TCX5 separation)
If the configuration, determined as described in subclause 6.2.4.2, indicates that some sub-frames are coded using TCX5 then a sub-frame containing MDCT bins of 2 TCX5 sub-frames is de-interleaved to form 2 consecutive TCX5 sub-frames before the optional Temporal Noise Shaping, the optional IGF temporal flattening and before the transformation with the inverse MDCT:
()
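The de-interleaving step can be pictured with a short C sketch. It assumes that the interleaved sub-frame alternates bins of the two TCX5 sub-frames, with even positions belonging to the first and odd positions to the second sub-frame; this layout, the function name and the parameter names are assumptions made for illustration and are not taken from the text above.

```c
/* De-interleave one sub-frame holding the bins of two TCX5 sub-frames.
 * in:        interleaved bins, 2 * tcx5_size values
 * out:       two consecutive TCX5 sub-frames, 2 * tcx5_size values
 * tcx5_size: number of MDCT bins per TCX5 sub-frame
 * Assumed layout: in[2*i] is bin i of the first sub-frame and
 * in[2*i + 1] is bin i of the second sub-frame. */
void tcx5_deinterleave(const float *in, float *out, int tcx5_size)
{
    for (int i = 0; i < tcx5_size; i++) {
        out[i] = in[2 * i];                 /* first TCX5 sub-frame */
        out[tcx5_size + i] = in[2 * i + 1]; /* second TCX5 sub-frame */
    }
}
```

The sketch writes to a separate output buffer, which keeps the index mapping easy to follow; an in-place variant would need a temporary copy.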
6.2.2.3.10 Temporal Noise Shaping
The decoding process for Temporal Noise Shaping is carried out separately on each window of the current frame by applying the so-called lattice filter to selected regions of the spectral coefficients (see subclause 5.3.3.2.2). The number of noise shaping filters applied to each window is specified by nMaxFilters.
For TCX5 the same rearrangement is done as described in subclause 5.3.3.2.2 prior to the TNS filtering and the rearrangement is reverted after the filtering.
First the transmitted filter coefficients have to be decoded, i.e. conversion to signed numbers, inverse quantization. Then the so called lattice filters are applied to the target frequency regions of the spectral coefficients (see subclause 5.3.3.2.2). The maximum possible filter order is defined by the constant TNS_MAX_FILTER_ORDER.
The TNS shaping filter is applied before the optional IGF temporal flattening and before the transformation with the inverse MDCT.
The decoding process for one window can be described by the following pseudo code:
/* TNS decoding for one window */
tns_decode()
{
  set_zero( state, TNS_MAX_FILTER_ORDER );
  for (iFilter = nMaxFilters-1; iFilter >= 0; iFilter--) {
    tns_decode_coef( order, index[iFilter], parCoeff[iFilter] );
    tns_filter( spectrum, startLine, endLine, parCoeff[iFilter], order, state );
  }
}
/* Decode transmitted coefficients index[] for one TNS filter */
tns_decode_coef( order, index[], parCoeff[] )
{
  /* Conversion to signed integer */
  for (i = 0; i < order; i++)
    tmp[i] = index[i] + (1 << (TNS_COEF_RES-1));
  /* Inverse quantization */
  iqfac = ((1 << (TNS_COEF_RES-1)) - 0.5) / (π/2.0);
  iqfac_m = ((1 << (TNS_COEF_RES-1)) + 0.5) / (π/2.0);
  for (i = 0; i < order; i++) {
    parCoeff[i] = sin( tmp[i] / ((tmp[i] >= 0) ? iqfac : iqfac_m) );
  }
}
/* Lattice filter */
tns_filter( spectrum[], startLine, endLine, parCoeff[], order, state )
{
  for (j = startLine; j <= endLine; j++) {
    spectrum[j] -= parCoeff[order-1] * state[order-1];
    for (i = order-2; i >= 0; i--) {
      spectrum[j] -= parCoeff[i] * state[i];
      state[i+1] = parCoeff[i] * spectrum[j] + state[i];
    }
    state[0] = spectrum[j];
  }
}
Filter order 1 with the first coefficient equal to 0 identifies a disabled filter.
Please note that this pseudo code uses a "C"-style interpretation of arrays and vectors, i.e. if parCoeff describes the coefficients for all filters, parCoeff[iFilter] is a pointer to the coefficients of one particular filter.
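The lattice filter can be exercised in isolation to see why a filter of order 1 with a zero coefficient acts as a disabled filter: the only subtraction is multiplied by 0, so the spectrum passes through unchanged. The C sketch below is a direct transcription of the tns_filter pseudo code into compilable form; the value chosen for TNS_MAX_FILTER_ORDER and the use of single-precision floats are assumptions of the example.

```c
#define TNS_MAX_FILTER_ORDER 8 /* assumed value for this sketch */

/* All-pole lattice filter applied in place to spectrum[startLine..endLine].
 * parCoeff: parcor coefficients of one filter (order values)
 * state:    filter memory with at least order entries, zeroed by the caller */
void tns_filter(float *spectrum, int startLine, int endLine,
                const float *parCoeff, int order, float *state)
{
    for (int j = startLine; j <= endLine; j++) {
        spectrum[j] -= parCoeff[order - 1] * state[order - 1];
        for (int i = order - 2; i >= 0; i--) {
            spectrum[j] -= parCoeff[i] * state[i];
            state[i + 1] = parCoeff[i] * spectrum[j] + state[i];
        }
        state[0] = spectrum[j];
    }
}
```

For order 1 the filter reduces to y[j] = x[j] - parCoeff[0] * y[j-1], so an impulse {1, 0, 0} filtered with a coefficient of 0.5 yields {1, -0.5, 0.25}.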
6.2.2.3.11 IGF temporal flattening
The signal reconstructed by IGF is temporally flattened in the frequency domain when the IGF temporal flatness indicator flag is set. The temporal flattening is performed in a frequency-selective manner as follows.
The selection of the spectral contents to be temporally flattened is done by comparing the quantized spectral coefficients with 0 and the contents whose coefficients are quantized to 0 are selected.
In order to maintain the significant spectral contents, they are temporarily replaced by the spectra which are similarly generated to the filled spectra by IGF:
(1803)
where is the quantized MDCT coefficient after arithmetic decoding and
is the reconstructed MDCT coefficient by IGF.
The linear prediction of the spectra is done and the linear prediction coefficients
are calculated. Then the temporally flattened spectrum is given by the following filtering:
. (1804)
Finally, the significant spectral contents are restored by:
, (1805)
and then the frequency-selectively temporally flattened spectrum is output to IMDCT for getting the time domain signal.