5.3.3 MDCT based TCX
3GPP TS 26.445: Codec for Enhanced Voice Services (EVS); Detailed algorithmic description (Release 15)
5.3.3.1 General description
5.3.3.1.1 High level overview
The MDCT based TCX (TCX) mode codes the MDCT spectrum using an LPC scheme for noise shaping. The LPC parameters are estimated in the time domain and applied in the MDCT domain. The basic mode consists of a 20 ms MDCT transform (TCX20). For higher rates, smaller transform sizes of 10 ms (TCX10) and 5 ms (TCX5) are also supported. The TCX mode provides several coding tools, such as adaptive low frequency emphasis, temporal noise shaping, intelligent gap filling and noise filling, to improve coding efficiency. The spectral data is finally quantized by a uniform quantizer with dead-zone and noiselessly coded by an arithmetic coder with several modes.
For the MDCT based TCX, the total length of the MDCT spectrum depends on the input sampling rate, the TCX mode and the coding mode of the previous frame. Some of the coding tools used in TCX only work on the lower MDCT bins corresponding to the CELP frequency range, as LP analysis is performed at the CELP sampling rate; this number of bins is referred to as the number of CELP-range bins. The maximum number of bins encoded by TCX is determined by the coding bandwidth. If the total MDCT length exceeds this bandwidth-limited number of bins, the remaining bins are filled with zeroes.
The following table lists the total MDCT spectrum length, in bins:
Table 83: Overview frame sizes of MDCT-based TCX
| Bandwidth | TCX20, previous mode MDCT | TCX20, previous mode CELP | TCX10/TCX5 |
|---|---|---|---|
| NB | 160 | 200 | 80 |
| WB | 320 | 400 | 160 |
| SWB | 640 | 800 | 320 |
| FB | 960 | 1200 | 480 |
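Table 83 can be read as a simple lookup: each TCX20 entry after CELP is 1.25 times the entry after MDCT, and the TCX10/TCX5 column is half of the TCX20-after-MDCT column. The following sketch encodes the table as arrays. The enum and function names are hypothetical illustrations, not identifiers from the reference code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enum and function names; only the numbers come from Table 83. */
typedef enum { BW_NB = 0, BW_WB = 1, BW_SWB = 2, BW_FB = 3 } Bandwidth;
typedef enum { MODE_TCX20, MODE_TCX10, MODE_TCX5 } TcxMode;

/* Total number of MDCT bins for one transform, as listed in Table 83.
 * The previous coding mode only matters for TCX20; TCX10 and TCX5 share
 * one column of the table and therefore return the same value. */
int tcx_num_bins(Bandwidth bw, TcxMode mode, bool prev_was_celp)
{
    static const int tcx20_after_mdct[4] = { 160, 320, 640, 960 };
    static const int tcx20_after_celp[4] = { 200, 400, 800, 1200 };
    static const int tcx10_tcx5[4]       = {  80, 160, 320, 480 };

    assert((int)bw >= 0 && (int)bw <= 3);
    if (mode != MODE_TCX20) {
        return tcx10_tcx5[bw];
    }
    return prev_was_celp ? tcx20_after_celp[bw] : tcx20_after_mdct[bw];
}
```

For example, `tcx_num_bins(BW_SWB, MODE_TCX20, true)` gives the 800 bins of a SWB TCX20 frame that follows a CELP frame.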
5.3.3.1.2 Rate dependent configuration
The MDCT-based TCX contains different setups depending on the bit rate for encoding. In general, the setup consists of three configurations: Low-rate, mid-rate and high-rate configuration. The following table lists the main tools used in the various configurations.
Table 84: Overview configurations MDCT-based TCX
| Bitrate [kbps] | Bandwidth | Specific configurations |
|---|---|---|
| 9.6 (low-rate) | NB, WB, SWB | low-rate LPC, envelope based arithmetic coder, ALFE 1, TCX20 |
| 13.2–32 (mid-rate) | NB, WB, SWB, FB | mid-rate LPC, context based arithmetic coder, ALFE 2, TCX20 |
| 48–128 (high-rate) | WB, SWB, FB | high-rate LPC, context based arithmetic coder, ALFE 1, TCX20/10/5 |
One additional bit rate dependent tool is temporal noise shaping (TNS). TNS is activated only for bit rates of 24.4 kbps and higher.
5.3.3.2 General encoding procedure
5.3.3.2.1 LPC parameter calculation
5.3.3.2.1.1 Low rate LPC
MDCT based TCX relies on a smoothed LPC spectral envelope. The analysis methods are identical to those of ACELP. LSF quantization is also shared with ACELP, except at 9.6 kbps (NB/WB/SWB). The quantization of the weighted LSF and the conversion of the LSF used at 9.6 kbps (NB/WB/SWB) are described in the following subclauses.
5.3.3.2.1.1.1 Quantization of weighted LSF
For MDCT based TCX at 9.6 kbps (NB/WB), envelope based arithmetic coding and low rate quantization of weighted LSF are used. After normal LPC analysis and interpolation in LSP domain, perceptual weighting is applied to the prediction coefficients to get weighted prediction coefficients as
. (909)
The weighted prediction coefficients are converted to an LSF vector, which is quantized with a two-stage or three-stage VQ. These VQ codebooks are used in two ways.
The first and primary use of the quantization is to represent the weighted LPC envelope, which shapes the MDCT coefficients, with MA inter-frame prediction. The secondary use is to convey an estimate of the envelope to the arithmetic coder without depending on MA prediction.
In the primary encoding, the mean vector and the MA-predicted contribution vector are subtracted from the input weighted LSF vector to generate the input of the two-stage VQ:
. (910)
For the first-stage VQ, 5 bits are allocated for the 16 coefficients. For the second-stage VQ, 4 bits are assigned to the lower 6 LSF parameters and 4 bits to the higher 10 parameters. The quantization criterion is to minimize the weighted distortion between the input and the quantized output LSF vectors. The reconstructed LSF vector, obtained by adding the mean vector and the MA-predicted contribution vector, can be converted to the weighted envelope and used for noise shaping of the MDCT-domain transform coefficients.
In the secondary encoding and decoding, the same LSF vector selected from the VQ table in the primary encoding is retrieved, and the mean vector is added to reconstruct a weighted LSF vector without MA prediction. This informs the envelope based arithmetic coding about the envelope shape even when the decoder cannot obtain information from the previous frame.
The envelope shape may be distorted due to the lack of MA prediction, and a third-stage VQ is used to compensate for this distortion only when necessary. Compensation is preferable when a large spectral envelope or a large distortion in the spectral envelope is expected. To determine whether the third-stage VQ is necessary, the first and second (lowest) values of the reconstructed vector without MA prediction are checked. If either of them is smaller than a threshold, i.e. the expected envelope values and distortion are large, a 2-bit third-stage VQ is applied to these two values as a correction. The vector retrieved at the third stage is selected to minimize the weighted distortion between the input and the final reconstructed LSF vector without MA prediction. If both values are equal to or larger than the threshold, the third-stage VQ is skipped and the second-stage reconstruction is used as the final reconstructed LSF vector.
The reconstructed LSF vector corresponds to the weighted envelope without dependence on MA prediction. In envelope based arithmetic coding, an unweighted LSF vector without dependence on MA prediction is also needed to estimate the MDCT envelope. It can be obtained by the low-complexity direct matrix conversion described in the next subclause.
5.3.3.2.1.1.2 Direct conversion of LSF
To allow interpolation towards the LSF of a possible ACELP frame in the next frame, the reconstructed LSF vector with MA prediction needs to be converted to an unweighted-domain LSF vector. Similarly, the reconstructed LSF vector without MA prediction needs to be converted to the unweighted domain to assist the envelope based arithmetic coding.
To obtain the corresponding unweighted LSF, the weighted LSF is converted using a conversion matrix and an equally spaced LSF vector as follows:
(911)
(912)
The conversion matrix has non-zero elements only on the diagonal and at the adjacent positions, as shown below; the non-zero elements are listed in table 85.
(913)
Table 85: Non-zero elements of conversion matrices
The 12.8 kHz columns give the conversion from 0.92 to 1.0 (12.8 kHz sampling), and the 16 kHz columns give the conversion from 0.94 to 1.0 (16 kHz sampling). Entries marked "–" fall outside the matrix.

| Row i | 12.8 kHz: (i, i−1) | 12.8 kHz: (i, i) | 12.8 kHz: (i, i+1) | 16 kHz: (i, i−1) | 16 kHz: (i, i) | 16 kHz: (i, i+1) |
|---|---|---|---|---|---|---|
| 1 | – | 1.19764 | -0.59173 | – | 0.78925 | -0.38537 |
| 2 | -0.91173 | 1.79182 | -0.80921 | -0.57154 | 1.19486 | -0.54136 |
| 3 | -0.51779 | 1.44703 | -0.81871 | -0.33642 | 0.99096 | -0.56792 |
| 4 | -0.44862 | 1.36777 | -0.75103 | -0.29856 | 0.93785 | -0.51255 |
| 5 | -0.4515 | 1.30719 | -0.7422 | -0.29716 | 0.89303 | -0.50509 |
| 6 | -0.43157 | 1.21326 | -0.68538 | -0.28264 | 0.81530 | -0.46020 |
| 7 | -0.43606 | 1.21317 | -0.69131 | -0.27926 | 0.80997 | -0.46378 |
| 8 | -0.392 | 1.04941 | -0.58674 | -0.25334 | 0.69596 | -0.38969 |
| 9 | -0.45208 | 1.10009 | -0.59175 | -0.29656 | 0.72916 | -0.38888 |
| 10 | -0.42553 | 0.99725 | -0.49992 | -0.27488 | 0.65949 | -0.32999 |
| 11 | -0.50168 | 1.07575 | -0.51401 | -0.32630 | 0.70913 | -0.33659 |
| 12 | -0.498 | 1.06563 | -0.50592 | -0.33069 | 0.70668 | -0.33105 |
| 13 | -0.53101 | 1.16372 | -0.58033 | -0.35437 | 0.77582 | -0.38003 |
| 14 | -0.48744 | 1.07596 | -0.52531 | -0.31771 | 0.70752 | -0.34216 |
| 15 | -0.51899 | 1.04998 | -0.49495 | -0.35066 | 0.70177 | -0.31664 |
| 16 | -0.4773 | 0.90959 | – | -0.33404 | 0.62528 | – |
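Because Table 85 lists only the three non-zero entries per row, the conversion matrix can be applied as a banded matrix-vector product with 3n − 2 multiplications (46 for n = 16) instead of n². The sketch below shows only that tridiagonal product. Equations (911) and (912) are not reproduced here, so how the equally spaced LSF vector enters the input is left to the caller; the function name is hypothetical, and C index i − 1 corresponds to table row i.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply a tridiagonal matrix by a vector: y = B x.
 * sub[i]  holds element (i, i-1); sub[0] is unused (no left neighbour).
 * diag[i] holds element (i, i).
 * sup[i]  holds element (i, i+1); sup[n-1] is unused (no right neighbour). */
void tridiag_apply(const double *sub, const double *diag, const double *sup,
                   const double *x, double *y, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++) {
        double acc = diag[i] * x[i];
        if (i > 0) {
            acc += sub[i] * x[i - 1];
        }
        if (i + 1 < n) {
            acc += sup[i] * x[i + 1];
        }
        y[i] = acc;
    }
}
```

Storing the three bands as separate arrays mirrors the column layout of Table 85 directly, with the "–" cells simply never read.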
5.3.3.2.1.2 Mid-rate LPC
For mid-rate bitrates (between 13.2 and 32 kbps), the LP analysis method is the same as for ACELP. The quantizer used is also the same, but it is operated in AUDIO mode only (see subclause 5.2.2.1.2). However, the interpolation of the quantized LSF coefficients is different: each interpolated LSF coefficient is a weighted sum of the quantized LSF coefficients of the current and the previous frame:
(914)
These interpolated coefficients are converted back to LPC domain in the same way as done for ACELP frames.
5.3.3.2.1.3 High-rate LPC
This subclause describes the LPC analysis and quantization scheme for high-rate bitrates (48 kbps and higher). The description in the following subclauses refers to the quantization of either a single set of LPC parameters for the whole frame or two sets if the frame is divided into sub-frames. As described in subclause 5.3.2.3, a frame can be subdivided into TCX5 or TCX10 sub-frames. In this case, however, the LPC quantizer treats two adjacent TCX5 sub-frames as a single TCX10 sub-frame.
The LP coefficients are obtained by calculating the autocorrelation of a windowed and pre-emphasized time-domain signal, with the pre-emphasis factor set to 0.9 for SWB and to 0.72 for WB; the input signal to the pre-emphasis filter is for WB, for 48 kbps SWB and for 96 kbps and 128 kbps SWB (see subclause 5.1.4). The window used here is the window used for the TCX MDCT transform (described in subclause 5.3.2.3), with the only exception that ALDO windows are replaced by FULL overlap windows (see subclause 5.3.2.5). After the autocorrelation, adaptive lag-windowing (see subclause 5.1.9.3) is applied to the autocorrelation coefficients, which are finally converted to LP coefficients by the Levinson recursion. The sampling rate used for adaptive lag-windowing is the TCX sampling rate (25.6 or 32 kHz), and the open-loop (OL) pitch estimate of the current frame is used. For quantization, the LP coefficients are converted to the LSF domain.
The first set of LSF coefficients is always quantized without inter-frame dependency. In case two sets of LSF coefficients are quantized, the second set may be quantized with or without dependency on the first set. This dependency (explained below) is signalled with one bit.
5.3.3.2.1.3.1 General Principle
The general principle for the quantization of a given LPC filter is represented in figure 62. A first-stage approximation is computed, and then subtracted from the input LSF vector to produce a residual LSF vector.
An LSF weighting function is derived from the first-stage approximation and applied to the residual LSF vector. The purpose and calculation of the weighting function are described in subclause 5.3.3.2.1.3.3.
The resulting weighted residual LSF vector is finally fed to the algebraic VQ encoder described in subclause 5.3.3.2.1.3.4.
Figure 62: Principle of weighted algebraic LPC quantizer
5.3.3.2.1.3.2 First-stage approximation
The first-stage approximation is a 16-dimensional, 8-bit stochastic vector quantizer applied to the input LSF vector. The codebook search uses a weighted Euclidean distance in which each component of the squared difference between the input LSF vector and the codebook entry is multiplied by:
(915)
with:
(916)
where LSF is the input LSF vector to be quantized and is the internal sampling frequency of the TCX-based codec (25600 for 48 kbps and 32000 for 96 kbps and above).
The difference between the input LSF vector and the first-stage approximation is called the residual LSF vector.
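The first-stage search described above is a weighted nearest-neighbour search. The sketch below takes the per-component weights as an input array (their derivation via equations (915) and (916) is not reproduced here) and returns the index of the codebook entry with the smallest weighted squared error; the residual LSF vector is then the input minus that entry. The function name and the row-major codebook layout are assumptions for illustration.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Weighted nearest-neighbour search over a stochastic codebook.
 * codebook: num_entries rows of dim values each, stored row-major.
 * w:        per-component weights (computed elsewhere; passed in here).
 * Returns the index minimising sum_i w[i] * (x[i] - c[i])^2;
 * ties keep the earliest entry. */
size_t weighted_vq_search(const double *x, const double *w,
                          const double *codebook, size_t num_entries, size_t dim)
{
    assert(num_entries > 0);
    size_t best = 0;
    double best_dist = DBL_MAX;
    for (size_t e = 0; e < num_entries; e++) {
        const double *c = codebook + e * dim;
        double dist = 0.0;
        for (size_t i = 0; i < dim; i++) {
            double d = x[i] - c[i];
            dist += w[i] * d * d;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = e;
        }
    }
    return best;
}
```

With an 8-bit first stage, `num_entries` would be 256 and `dim` 16; the exhaustive loop is affordable at that size, which is exactly what the later algebraic stage cannot do.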
5.3.3.2.1.3.3 LSF weighting
Initially a weighted sum of the residual LSF vector is compared to a threshold:
(917)
where are the first-stage approximation of the LSF coefficients and are the weights according to equation (918) with the factor being . If the above relation is true, a special codebook is selected and no further quantization is done. Otherwise, the residual LSF vector is quantized with stochastic vector quantization.
The principle of stochastic vector quantization is to search a codebook of vectors for the nearest neighbour (in terms of Euclidean distance) of the vector to be quantized. When quantizing LPC filters in the LSF (line spectral frequency) domain, a weighted Euclidean distance is generally used, with each component of the vector weighted differently depending on its value and on the values of the other components. The purpose of this weighting is to make the minimization of the Euclidean distance behave as closely as possible to a minimization of the spectral distortion.
Unlike a stochastic quantizer, a uniform algebraic vector quantizer does not perform an exhaustive search of the codebook. It is therefore difficult to introduce a weighting function into the distance computation.
The solution used here is to warp the residual LSF vector (i.e. the difference between the input LSF vector and a first stage approximation) using a weighting function computed from the first-stage LSF approximation. By warping we mean applying different weights to the components of the LSF residual vector. Because the first-stage LSF approximation is also available at the decoder, the inverse weighting factors can also be computed at the decoder and the inverse warping can be applied to the quantized residual LSF vector. Warping the residual LSF vector according to a model that minimizes the spectral distortion is especially useful when the quantizer is uniform.
The weights applied to the components of the residual LSF vector are:
(918)
with
(919)
where is the internal sampling frequency of the TCX-based codec (25600 for 48 kbps and 32000 for 96 kbps and above). The factor is always for the first set of LSF coefficients.
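The warping idea can be illustrated with a pair of functions: the encoder scales each residual component by its weight before the uniform algebraic VQ, and the decoder, which can recompute the same weights from the first-stage approximation, divides them out again. This sketch assumes that "applying different weights" means a component-wise multiplication; the weights themselves (equations (918) and (919)) are passed in, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Encoder side: warp the residual LSF vector component-wise,
 * r_w[i] = w[i] * r[i], before it enters the uniform algebraic VQ.
 * Assumption: warping is a plain per-component multiplication. */
void lsf_residual_warp(const double *r, const double *w, double *r_w, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r_w[i] = w[i] * r[i];
    }
}

/* Decoder side: the same weights are recomputed from the first-stage
 * approximation, so the quantized warped residual is unwarped by division. */
void lsf_residual_unwarp(const double *rq_w, const double *w, double *rq, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(w[i] != 0.0);
        rq[i] = rq_w[i] / w[i];
    }
}
```

A uniform quantization step on the warped residual then corresponds to a step of 1/w[i] on component i of the original residual, which is how the weighting reaches a quantizer that cannot weight its distance directly.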
5.3.3.2.1.3.4 Algebraic vector quantizer
The algebraic VQ used for quantizing the refinement is described in subclause 5.2.3.1.6.9.
5.3.3.2.1.3.5 Quantization of second set of LSF
In case a second set of LSF coefficients needs to be quantized, the quantization scheme for the second set may or may not depend on the first set. Two approaches are tried. The first approach is identical to the quantization of the first set. In the second approach, the first-stage approximation described in subclause 5.3.3.2.1.3.2 is replaced by the quantized first set of LSF. The factor for calculating the weights is changed to. Finally, the bit consumption of both approaches is compared, and the approach requiring fewer bits is chosen and signalled with one bit.
5.3.3.2.2 Temporal Noise Shaping
Temporal Noise Shaping (TNS) is used to control the temporal shape of the quantization noise within each window of the transform.
If TNS is active in this encoder, up to two filters per MDCT spectrum are applied. The steps of the TNS encoding are described below. TNS is always calculated on a per-subwindow basis, so in case of a sequence of 4 TCX5 windows these steps have to be applied once for each of the 4 subwindows.
Table 86: TNS configurations
| Bitrate [kbps] | Bandwidth | TCX20: number of filters (start–stop frequency) | TCX10: number of filters (start–stop frequency) | TCX5: number of filters (start–stop frequency) |
|---|---|---|---|---|
| 24.4, 32 | SWB | 1 (600–16000 Hz) | TCX10 not used | TCX5 not used |
| 48, 96, 128 | WB | 1 (600–8000 Hz) | 2 (800–4400 Hz, 4400–8000 Hz) | 1 (800–8000 Hz) |
| 48, 96, 128 | SWB | 2 (600–4500 Hz, 4500–16000 Hz) | 2 (800–8400 Hz, 8400–16000 Hz) | 1 (800–16000 Hz) |
| 24.4, 32 | FB | 1 (600–20000 Hz) | TCX10 not used | TCX5 not used |
| 48, 96, 128 | FB | 2 (600–4500 Hz, 4500–20000 Hz) | 2 (800–10400 Hz, 10400–20000 Hz) | 1 (800–20000 Hz) |
The number of filters for each configuration and the start and the stop frequency of each filter are given in table 86.
The MDCT bins of 2 TCX5 sub-frames are rearranged prior to TNS filtering:
(920)
For the rearranged and concatenated 2 TCX5 frames, a special TNS configuration with 2 filters is used, which effectively functions as one filter per TCX5. The rearrangement is reverted after the filtering described below, to again obtain 2 separate TCX5 subwindows.
5.3.3.2.2.1 TNS detection
The spectral coefficients between the start and stop frequency are divided into equal consecutive portions, and for each portion the normalized autocorrelation function is calculated. The values of the autocorrelation function of all portions are then summed up and lag windowed. An exception is the first filter for 48/96/128 kbps SWB from 600 Hz to 4500 Hz, which has only one portion.
The next step is an LPC calculation using the Levinson-Durbin algorithm (defined in subclause 5.1.9.4). The filter order is limited to 8 and to a quarter of the number of bins that the TNS filter covers. As a result, the so-called PARCOR (reflection) coefficients, as defined in subclause 5.1.9.4, and the prediction gain are available; the prediction gain is derived from the residual error energy defined in subclause 5.1.9.4.
The TNS parcor coefficients will be quantized with a resolution of 4 bits.
The filter order is reduced such that the last PARCOR coefficient is non-zero.
TNS filter will be used only if the prediction gain is greater than a given threshold, which is dependent on the filter and varies between 1.35 and 1.85, or if the average of the squared filter coefficients is greater than a given threshold, which is dependent on the filter and varies between 0.03 and 0.075. For configurations with 2 filters per spectrum, each filter can be independently disabled or enabled.
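The detection steps above (autocorrelation, Levinson-Durbin, prediction gain, enable decision) can be sketched as follows. This is a simplified illustration, not the reference implementation: it computes a single plain autocorrelation over one range instead of summing normalized per-portion autocorrelations, omits lag windowing and PARCOR quantization, assumes the sign convention A(z) = 1 + a1 z^-1 + ..., takes the prediction gain as r[0] divided by the final residual energy, and uses the mean squared PARCOR value as the "average of the squared filter coefficients". The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TNS_MAX_ORDER 8

/* Autocorrelation r[0..order] of spec[start..stop-1] (no normalization). */
void tns_autocorr(const double *spec, size_t start, size_t stop, int order, double *r)
{
    for (int lag = 0; lag <= order; lag++) {
        double acc = 0.0;
        for (size_t n = start + (size_t)lag; n < stop; n++) {
            acc += spec[n] * spec[n - (size_t)lag];
        }
        r[lag] = acc;
    }
}

/* Levinson-Durbin recursion. Writes parcor[0..order-1] and returns the
 * prediction gain r[0]/err. Assumes r is a positive-definite autocorrelation
 * (lag windowing normally guarantees this); returns 1.0 for zero energy. */
double levinson_parcor(const double *r, int order, double *parcor)
{
    double a[TNS_MAX_ORDER + 1] = {1.0};
    double tmp[TNS_MAX_ORDER + 1];
    double err = r[0];
    assert(order >= 1 && order <= TNS_MAX_ORDER);
    if (err <= 0.0) {
        for (int m = 0; m < order; m++) parcor[m] = 0.0;
        return 1.0;
    }
    for (int m = 1; m <= order; m++) {
        double acc = r[m];
        for (int i = 1; i < m; i++) acc += a[i] * r[m - i];
        double k = -acc / err;              /* reflection coefficient */
        parcor[m - 1] = k;
        for (int i = 1; i < m; i++) tmp[i] = a[i] + k * a[m - i];
        for (int i = 1; i < m; i++) a[i] = tmp[i];
        a[m] = k;
        err *= 1.0 - k * k;                 /* residual error energy */
    }
    return r[0] / err;
}

/* Enable the filter if either criterion exceeds its (filter-dependent)
 * threshold; thresholds are caller-supplied inputs here. */
int tns_filter_enabled(double pred_gain, const double *parcor, int order,
                       double gain_thr, double coef_thr)
{
    double sum = 0.0;
    for (int i = 0; i < order; i++) sum += parcor[i] * parcor[i];
    return pred_gain > gain_thr || (sum / order) > coef_thr;
}
```

For example, an autocorrelation of {1, 0.5} yields a single PARCOR of −0.5 and a prediction gain of 4/3, which falls just below a 1.35 gain threshold but still enables the filter if the coefficient threshold is 0.03.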
5.3.3.2.2.2 TNS filtering
The spectral coefficients are replaced by the spectral coefficients filtered with the TNS filter. In the following text, the spectrum refers to either the TNS-filtered or the unfiltered spectral coefficients, depending on the configuration; the configurations in which TNS-filtered spectral coefficients are used are listed in table 86. The filtering is performed with a lattice filter, so no conversion from PARCOR coefficients to linear prediction coefficients is required.
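A lattice filter works directly on the PARCOR coefficients, which is why no conversion to direct-form coefficients is needed. The sketch below runs an FIR (analysis) lattice along the frequency axis over the bins [start, stop), in place. It assumes the sign convention in which a first-order stage computes y[n] = x[n] + k·x[n−1], i.e. the filter A(z) = 1 + a1 z^-1 + ... obtained from a Levinson recursion with a[m] = k_m; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TNS_MAX_ORDER 8

/* FIR lattice filtering along frequency, in place on spec[start..stop-1].
 * Stage m: f_{m+1}[n] = f_m[n] + k_m * b_m[n-1]
 *          b_{m+1}[n] = b_m[n-1] + k_m * f_m[n]
 * state[m] keeps b_m[n-1]; the filter starts from zero state. */
void tns_lattice_analysis(const double *parcor, int order, double *spec,
                          size_t start, size_t stop)
{
    double state[TNS_MAX_ORDER] = {0.0};
    assert(order >= 0 && order <= TNS_MAX_ORDER);
    for (size_t n = start; n < stop; n++) {
        double f = spec[n];
        double b_cur = f;                        /* b_0[n] = x[n] */
        for (int m = 0; m < order; m++) {
            double b_delayed = state[m];         /* b_m[n-1] */
            double f_next = f + parcor[m] * b_delayed;
            double b_next = b_delayed + parcor[m] * f;
            state[m] = b_cur;                    /* store b_m[n] for n+1 */
            f = f_next;
            b_cur = b_next;
        }
        spec[n] = f;                             /* prediction residual */
    }
}
```

As a check of the structure, PARCOR values {0.5, 0.25} applied to a unit impulse give {1, 0.625, 0.25, 0}, the impulse response of 1 + 0.625 z^-1 + 0.25 z^-2 obtained from the step-up recursion.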
5.3.3.2.2.3 Coding of the TNS parameters
The TNS filter order and the quantized PARCOR coefficients are coded using Huffman codes. The Huffman code used for the filter order depends on the frame configuration (TCX20/TCX10/TCX5) and on the bandwidth (SWB, WB). The Huffman code used for a PARCOR coefficient depends on the frame configuration (TCX20/TCX10/TCX5), on the bandwidth (SWB, WB) and on the PARCOR coefficient's index.
5.3.3.2.3 LPC shaping in MDCT domain
5.3.3.2.3.1 General Principle
LPC shaping is performed in the MDCT domain by applying gain factors computed from the weighted quantized LP filter coefficients to the MDCT spectrum. The input sampling rate, on which the MDCT transform is based, can be higher than the CELP sampling rate, for which the LP coefficients are computed. Therefore, LPC shaping gains can only be computed for the part of the MDCT spectrum corresponding to the CELP frequency range. For the remaining part of the spectrum (if any), the shaping gain of the highest frequency band is used.
5.3.3.2.3.2 Computation of LPC shaping gains
To compute the 64 LPC shaping gains the weighted LP filter coefficients are first transformed into the frequency domain using an oddly stacked DFT of length 128:
(921)
The LPC shaping gains are then computed as the reciprocal absolute values of :
(922)
5.3.3.2.3.3 Applying LPC shaping gains to MDCT spectrum
The MDCT coefficients corresponding to the CELP frequency range are grouped into 64 sub-bands. The coefficients of each sub-band are multiplied by the reciprocal of the corresponding LPC shaping gain to obtain the shaped spectrum . If the number of MDCT bins corresponding to the CELP frequency range is not a multiple of 64, the width of sub-bands varies by one bin as defined by the following pseudo-code:
,
if then
,
else if then
, ,
else
, ,
for
{
if then
else
for
{
}
}
The remaining MDCT coefficients above the CELP frequency range (if any) are multiplied by the reciprocal of the last LPC shaping gain:
(923)
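The grouping into 64 sub-bands and the treatment of the bins above the CELP range can be sketched as below. The sub-band boundary pseudo-code above is not fully recoverable, so this sketch substitutes a simple assumed partition in which band b covers bins [b·n/64, (b+1)·n/64), which also yields widths that differ by at most one bin. It is not necessarily the exact layout of the reference encoder. Following the text, each bin is multiplied by the reciprocal of its band's gain, and bins above the CELP range use the last gain, as in (923). Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SHAPING_NUM_BANDS 64

/* Multiply bins [0, n_celp) by 1/gains[band] and bins [n_celp, n_total)
 * by 1/gains[63]. ASSUMPTION: band b spans [b*n_celp/64, (b+1)*n_celp/64),
 * a stand-in for the specification's own band-width rule. */
void apply_lpc_shaping(double *x, size_t n_celp, size_t n_total, const double *gains)
{
    assert(n_celp >= SHAPING_NUM_BANDS && n_celp <= n_total);
    for (size_t b = 0; b < SHAPING_NUM_BANDS; b++) {
        size_t lo = b * n_celp / SHAPING_NUM_BANDS;
        size_t hi = (b + 1) * n_celp / SHAPING_NUM_BANDS;
        for (size_t k = lo; k < hi; k++) {
            x[k] /= gains[b];
        }
    }
    for (size_t k = n_celp; k < n_total; k++) {
        x[k] /= gains[SHAPING_NUM_BANDS - 1];   /* above the CELP range */
    }
}
```

With n_celp = 256, every band is exactly 4 bins wide; with n_celp = 200, the integer division spreads the 8 extra bins evenly so the bands are 3 or 4 bins wide.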
For the 9.6 kbit/s and 13.2 kbit/s SWB configurations, the remaining spectral coefficients above the CELP frequency range are post-processed:
First, the highest amplitudes of the MDCT spectrum below and above are determined. The search procedure returns the following values:
a) max_low_pre: The maximum MDCT coefficient below , evaluated on the spectrum of absolute values before the application of reciprocal LPC shaping gains
b) max_high_pre: The maximum MDCT coefficient above , evaluated on the spectrum of absolute values before the application of reciprocal LPC shaping gains
max_low_pre = 0;
for(i=0; i<; i++)
{
tmp = fabs();
if(tmp > max_low_pre)
{
max_low_pre = tmp;
}
}
max_high_pre = 0;
for(i=0; i<; i++)
{
tmp = fabs( ( + i));
if(tmp > max_high_pre)
{
max_high_pre = tmp;
}
}
Second, a peak-distance metric analyzes the impact of spectral peaks above on the arithmetic coder. To this end, the maximum amplitudes of the MDCT spectrum below and above are searched on the MDCT spectrum after the application of reciprocal LPC shaping gains, i.e. in the domain where the arithmetic coder is also applied. In addition to the maximum amplitude, the distance from is also evaluated. The search procedure returns the following values:
a) max_low: The maximum MDCT coefficient below , evaluated on the spectrum of absolute values after the application of reciprocal LPC shaping gains
b) dist_low: The distance of max_low from
c) max_high: The maximum MDCT coefficient above , evaluated on the spectrum of absolute values after the application of reciprocal LPC shaping gains
d) dist_high: The distance of max_high from
max_low = 0;
dist_low = 0;
for(i=0; i<; i++)
{
tmp = fabs( ( – 1 – i));
if(tmp > max_low)
{
max_low = tmp;
dist_low = i;
}
}
max_high = 0;
dist_high = 0;
for(i=0; i< – ; i++)
{
tmp = fabs( ( + i));
if(tmp > max_high)
{
max_high = tmp;
dist_high = i;
}
}
Third, the peak-amplitudes in psycho-acoustically similar spectral regions are compared. Thus, the maximum amplitude of the MDCT spectrum below and above is searched on the MDCT spectrum after the application of reciprocal LPC shaping gains. The maximum amplitude of the MDCT spectrum below is not searched for the full spectrum, but only starting at . This is to discard the lowest frequencies, which are psycho-acoustically most important and usually have the highest amplitude after the application of reciprocal LPC shaping gains, and to only compare components with a similar psycho-acoustical importance. The search procedure returns the following values:
a) max_low2: The maximum MDCT coefficient below , evaluated on the spectrum of absolute values after the application of reciprocal LPC shaping gains starting from
b) max_high: The maximum MDCT coefficient above , evaluated on the spectrum of absolute values after the application of reciprocal LPC shaping gains
max_low2 = 0;
for(i=; i<; i++)
{
tmp = fabs( (i));
if(tmp > max_low2)
{
max_low2 = tmp;
}
}
max_high = 0;
for(i=0; i< – ; i++)
{
tmp = fabs( ( + i));
if(tmp > max_high)
{
max_high = tmp;
}
}
Finally, the relation of max_low, dist_low, max_high, dist_high, max_low2 and max_high is evaluated and, if applicable, a correction factor for is determined and applied:
if( (16.0 * max_low_pre > max_high_pre) &&
(4.0 * dist_high * max_high > dist_low * max_low) &&
(max_high > max_fac * max_low2)
)
{
fac = max_fac * max_low2 / max_high;
for(i = ; i< ; i++)
{
(i) = (i) * fac;
}
}
where max_fac is set to 1.5 in case the Envelope based arithmetic coder is used, or max_fac is set to 3.0 for all other cases.
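The three searches and the final correction can be collected into one function. Several loop bounds in the pseudo-code above were lost, so this sketch assumes the following hypothetical parameters: `n_celp` is the boundary between the CELP range and the bins above it, `n_bw` is the end of the coded range, and `low2_start` is the start index for `max_low2`; the correction is assumed to apply to bins [n_celp, n_bw). `x_pre` is the spectrum before and `x` the spectrum after the reciprocal LPC shaping gains.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

static double max_abs(const double *x, size_t from, size_t to)
{
    double m = 0.0;
    for (size_t i = from; i < to; i++) {
        double t = fabs(x[i]);
        if (t > m) m = t;
    }
    return m;
}

/* Returns the factor applied to x[n_celp..n_bw-1], or 1.0 if unchanged.
 * ASSUMED bounds: n_celp, n_bw and low2_start stand in for lost symbols. */
double hf_peak_correction(const double *x_pre, double *x, size_t n_celp,
                          size_t n_bw, size_t low2_start, double max_fac)
{
    assert(low2_start < n_celp && n_celp < n_bw);

    /* Step 1: peaks before shaping. */
    double max_low_pre = max_abs(x_pre, 0, n_celp);
    double max_high_pre = max_abs(x_pre, n_celp, n_bw);

    /* Step 2: peaks after shaping and their distance from n_celp. */
    double max_low = 0.0, max_high = 0.0;
    size_t dist_low = 0, dist_high = 0;
    for (size_t i = 0; i < n_celp; i++) {
        double t = fabs(x[n_celp - 1 - i]);
        if (t > max_low) { max_low = t; dist_low = i; }
    }
    for (size_t i = 0; i < n_bw - n_celp; i++) {
        double t = fabs(x[n_celp + i]);
        if (t > max_high) { max_high = t; dist_high = i; }
    }

    /* Step 3: low-band peak excluding the lowest bins. */
    double max_low2 = max_abs(x, low2_start, n_celp);

    if (16.0 * max_low_pre > max_high_pre &&
        4.0 * (double)dist_high * max_high > (double)dist_low * max_low &&
        max_high > max_fac * max_low2) {
        double fac = max_fac * max_low2 / max_high;
        for (size_t i = n_celp; i < n_bw; i++) {
            x[i] *= fac;
        }
        return fac;
    }
    return 1.0;
}
```

The correction only attenuates: the third condition guarantees `fac` is below 1, and it caps the high-band peak at `max_fac` times the psycho-acoustically comparable low-band peak.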
5.3.3.2.4 Adaptive low frequency emphasis
5.3.3.2.4.1 General Principle
The purpose of the adaptive low-frequency emphasis and de-emphasis (ALFE) processes is to improve the subjective performance of the frequency-domain TCX codec at low frequencies. To this end, the low-frequency MDCT spectral lines are amplified prior to quantization in the encoder, thereby increasing their quantization SNR, and this boosting is undone prior to the inverse MDCT process in the internal and external decoders to prevent amplification artifacts.
There are two different ALFE algorithms, which are selected consistently in encoder and decoder based on the choice of arithmetic coding algorithm and bit rate. ALFE algorithm 1 is used at 9.6 kbps (envelope based arithmetic coder) and at 48 kbps and above (context based arithmetic coder). ALFE algorithm 2 is used from 13.2 kbps up to and including 32 kbps. In the encoder, ALFE operates on the spectral lines in vector x[] directly before (algorithm 1) or after (algorithm 2) every MDCT quantization, which runs multiple times inside a rate-loop in case of the context based arithmetic coder (see subclause 5.3.3.2.8.1).
5.3.3.2.4.2 Adaptive emphasis algorithm 1
ALFE algorithm 1 operates based on the LPC frequency-band gains, lpcGains[]. First, the minimum and maximum of the first nine gains – the low-frequency (LF) gains – are found using comparison operations executed within a loop over the gain indices 0 to 8.
Then, if the ratio between the minimum and the maximum exceeds a threshold of 1/32, a gradual boosting of the lowest lines in x is performed such that the first line (DC) is amplified by (32·min/max)^0.25 and the 33rd line is not amplified:
tmp = 32 * min;
if ((max < tmp) && (max > 0))
{
  fac = tmp = pow(tmp / max, 1.0 / 128);
  for (i = 31; i >= 0; i--)
  { /* gradual boosting of lowest 32 lines */
    x[i] *= fac;
    fac *= tmp;
  }
}
5.3.3.2.4.3 Adaptive emphasis algorithm 2
ALFE algorithm 2, unlike algorithm 1, does not operate based on transmitted LPC gains but is signaled by means of modifications to the quantized low-frequency (LF) MDCT lines. The procedure is divided into five consecutive steps:
- Step 1: first find first magnitude maximum at index i_max in lower spectral quarter (k = 0 … / 4) utilizing invGain = 2/gTCX and modifying the maximum: xq[i_max] += (xq[i_max] < 0) ? -2 : 2
- Step 2: then compress value range of all x[i] up to i_max by requantizing all lines at k = 0 … i_max–1 as in the subclause describing the quantization, but utilizing invGain instead of gTCX as the global gain factor.
- Step 3: find first magnitude maximum below i_max (k = 0 …/ 4) which is half as high if i_max > –1 using invGain = 4/gTCX and modifying the maximum: xq[i_max] += (xq[i_max] < 0) ? -2 : 2
- Step 4: re-compress and quantize all x[i] up to the half-height i_max found in the previous step, as in step 2
- Step 5: finish and always compress two lines at the latest i_max found, i.e. at k = i_max+1, i_max+2, again utilizing invGain = 2/gTCX if the initial i_max found in step 1 is greater than –1, or using invGain = 4/gTCX otherwise. All i_max are initialized to –1. For details please see AdaptLowFreqEmph() in tcx_utils_enc.c.
5.3.3.2.5 Spectrum noise measure in power spectrum
For guidance of quantization in the TCX encoding process, a noise measure between 0 (tonal) and 1 (noise-like) is determined for each MDCT spectral line above a specified frequency, based on the current transform's power spectrum. The power spectrum is computed from the MDCT coefficients and the MDST coefficients, which are computed on the same time-domain signal segment with the same windowing operation:
(924)
Each noise measure in is then calculated as follows. First, if the transform length changed (e.g. after a TCX transition transform following an ACELP frame) or if the previous frame did not use TCX20 coding (e.g. in case a shorter transform length was used in the last frame), all up to are reset to zero. The noise measure start line is initialized according to the following table 87.
Table 87: Initialization of the noise measure start line

| Bitrate (kbps) | 9.6 | 13.2 | 16.4 | 24.4 | 32 | 48 | 96 | 128 |
|---|---|---|---|---|---|---|---|---|
| bw = NB, WB | 66 | 128 | 200 | 320 | 320 | 320 | 320 | 320 |
| bw = SWB, FB | 44 | 96 | 160 | 320 | 320 | 256 | 640 | 640 |
For ACELP to TCX transitions, is scaled by 1.25. Then, if the noise measure start line is less than , the at and above are derived recursively from running sums of power spectral lines:
(925)
(926)
Furthermore, every time is given the value zero in the above loop, the variable lastTone is set to k. The upper 7 lines are treated separately since cannot be updated any more (, however, is computed as above):
(927)
The uppermost line at is defined as being noise-like, hence . Finally, if the above variable lastTone (which was initialized to zero) is greater than zero, then . Note that this procedure is only carried out in TCX20, not in other TCX modes ().
5.3.3.2.6 Low pass factor detector
A low pass factor is determined based on the power spectrum for all bitrates below 32.0 kbps. To this end, the power spectrum is compared iteratively against a threshold for all , where for regular MDCT windows and for ACELP to MDCT transition windows. The iteration stops as soon as.
The low pass factor is determined as , where is the last determined low pass factor. At encoder startup, is set to 1.0. The low pass factor is used to determine the noise filling stop bin (see subclause 5.3.3.2.10.2).
5.3.3.2.7 Uniform quantizer with adaptive dead-zone
For uniform quantization of the MDCT spectrum after or before ALFE (depending on the applied emphasis algorithm, see subclause 5.3.3.2.4.1), the coefficients are first divided by the global gain (see subclause 5.3.3.2.8.1.1), which controls the step-size of quantization. The results are then rounded toward zero with a rounding offset which is adapted for each coefficient based on the coefficient’s magnitude (relative to ) and tonality (as defined by in subclause 5.3.3.2.5). For high-frequency spectral lines with low tonality and magnitude, a rounding offset of zero is used, whereas for all other spectral lines, an offset of 0.375 is employed. More specifically, the following algorithm is executed.
Starting from the highest coded MDCT coefficient at index , we set and decrement by 1 as long as condition and evaluates to true. Then downward from the first line at index where this condition is not met (which is guaranteed since ), rounding toward zero with a rounding offset of 0.375 and limiting of the resulting integer values to the range –32768 to 32767 is performed:
(928)
with . Finally, all quantized coefficients of at and above are set to zero.
5.3.3.2.8 Arithmetic coder
The quantized spectral coefficients are noiselessly coded by entropy coding, more specifically by arithmetic coding.
The arithmetic coding uses probabilities with 14-bit precision for computing its code. The alphabet probability distribution can be derived in different ways. At low rates, it is derived from the LPC envelope, while at high rates it is derived from the past context. In both cases, a harmonic model can be added to refine the probability model.
The following pseudo-code describes the arithmetic encoding routine, which is used for coding any symbol associated with a probability model. The probability model is represented by a cumulative frequency table cum_freq[]. The derivation of the probability model is described in the following subclauses.
/* global variables */
low
high
bits_to_follow
ar_encode(symbol, cum_freq[])
{
  if (ari_first_symbol()) {
    low = 0;
    high = 65535;
    bits_to_follow = 0;
  }
  range = high-low+1;
  /* cum_freq[] is decreasing; for symbol 0 the upper bound is unchanged */
  if (symbol > 0) {
    high = low + ((range*cum_freq[symbol-1])>>14) - 1;
  }
  low += (range*cum_freq[symbol])>>14;
  for (;;) {
    if (high < 32768) {
      write_bit(0);
      while (bits_to_follow) {
        write_bit(1);
        bits_to_follow--;
      }
    }
    else if (low >= 32768) {
      write_bit(1);
      while (bits_to_follow) {
        write_bit(0);
        bits_to_follow--;
      }
      low -= 32768;
      high -= 32768;
    }
    else if ((low >= 16384) && (high < 49152)) {
      bits_to_follow += 1;
      low -= 16384;
      high -= 16384;
    }
    else break;
    low += low;
    high += high+1;
  }
  if (ari_last_symbol()) { /* flush bits */
    if (low < 16384) {
      write_bit(0);
      while (bits_to_follow > 0) {
        write_bit(1);
        bits_to_follow--;
      }
    } else {
      write_bit(1);
      while (bits_to_follow > 0) {
        write_bit(0);
        bits_to_follow--;
      }
    }
  }
}
The helper functions ari_first_symbol() and ari_last_symbol() detect the first symbol and the last symbol of the generated codeword respectively.
5.3.3.2.8.1 Context based arithmetic codec
5.3.3.2.8.1.1 Global gain estimator
The estimation of the global gain for the TCX frame is performed in two iterative steps. The first estimate assumes an SNR gain of 6 dB per sample per bit from scalar quantization (SQ). The second estimate refines it by taking the entropy coding into account.
The energy of each block of 4 coefficients is first computed:
(929)
A bisection search is performed with a final resolution of 0.125 dB:
Initialization: set fac = offset = 12.8 and target = 0.15·(target_bits - L/16)
Iteration: repeat the following block of operations 10 times:
1- fac = fac/2
2- offset = offset - fac
3- compute ener from the block energies of (929) and the current offset
4- if (ener > target) then offset = offset + fac
The first estimate of gain is then given by:
(930)
5.3.3.2.8.1.2 Rate-loop for constant bit rate and global gain
In order to set the best gain within the constraints of , a convergence process for and is carried out using the following variables and constants:
and denote the weights corresponding to the lower bound and the upper bound,
and denote the gains corresponding to the lower bound and the upper bound, and
and denote flags indicating that and have been found, respectively.
and are variables with and .
and are constants, set to 10 and 0.96.
After the initial estimate of the bit consumption by arithmetic coding, is set to 0 when is larger than , while is set to when is larger than .
If is larger than 0, i.e. is larger than ,
must be increased relative to the previous value: is set to TRUE, is set to the previous gain, and is set as
, (931)
When has been set, i.e. was smaller than , is updated as an interpolated value between the upper bound and the lower bound:
, (932)
Otherwise, i.e. if is FALSE, the gain is amplified as
, (933)
with a larger amplification ratio when the ratio of (= ) and is larger, in order to accelerate convergence to .
If equals 0, i.e. is smaller than ,
should be smaller than the previous value: is set to 1, is set to the previous gain, and is set as
, (934)
If has already been set, the gain is calculated as
, (935)
otherwise, in order to accelerate convergence to the lower bound gain , the gain is reduced as
, (936)
with a larger reduction rate when the ratio of and is small.
After the above gain correction, quantization is performed and the estimate of by arithmetic coding is obtained. As a result, is set to 0 when is larger than , and is set to when it is larger than . If the loop count is less than 4, either the lower bound setting process or the upper bound setting process is carried out in the next loop, depending on the value of . If the loop count is 4, the final gain and the quantized MDCT sequence are obtained.
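The bound-tracking structure of this rate loop can be sketched in C. This is a simplified sketch, not the formulas (931) to (936): the weights are taken as the bit overshoot and undershoot at each bound, the gain between two known bounds is linearly interpolated in the bit domain, and fixed factors (2 and 0.5) replace the adaptive amplification and reduction rules. BitsFn, toy_bits(), rate_loop_gain() and the choice of returning the last gain known to fit the budget are all assumptions.

```c
#include <assert.h>

/* Bits needed when quantizing with gain g (hypothetical callback). */
typedef double (*BitsFn)(double g, void *ctx);

/* Toy bit model bits = C / g, used only to exercise the loop. */
static double toy_bits(double g, void *ctx)
{
    return *(const double *)ctx / g;
}

static double rate_loop_gain(BitsFn bits, void *ctx, double g, double target)
{
    int lb_found = 0, ub_found = 0;
    double g_lb = 0.0, g_ub = 0.0, w_lb = 0.0, w_ub = 0.0;

    assert(g > 0.0);
    for (int loop = 0; loop < 4; loop++) {
        double used = bits(g, ctx);
        if (used > target) {  /* gain too small: new lower bound */
            lb_found = 1;
            g_lb = g;
            w_lb = used - target;
        } else {              /* gain large enough: new upper bound */
            ub_found = 1;
            g_ub = g;
            w_ub = target - used;
        }
        if (lb_found && ub_found) {
            /* linear interpolation between the bounds, weighted by bit distance */
            g = (g_lb * w_ub + g_ub * w_lb) / (w_lb + w_ub);
        } else if (lb_found) {
            g *= 2.0;         /* assumed fixed amplification */
        } else {
            g *= 0.5;         /* assumed fixed reduction */
        }
    }
    return ub_found ? g_ub : g;  /* last gain known to fit the budget, if any */
}
```

With bits = 1000/g, a target of 100 and a start gain of 4, the loop visits 4, 8 and 16, interpolates to 11.2 (89.3 bits) and returns that gain.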
5.3.3.2.8.1.3 Probability model derivation and coding
The quantized spectral coefficients X are noiselessly encoded starting from the lowest-frequency coefficient and progressing to the highest-frequency coefficient. They are encoded in groups of two coefficients a and b, gathered in a so-called 2-tuple {a,b}.
Each 2-tuple {a,b} is split into three parts, namely the MSBs, the LSBs and the signs. The sign is coded independently of the magnitude using a uniform probability distribution. The magnitude itself is further divided into two parts, the two most significant bits (MSBs) and the remaining least significant bit planes (LSBs, if applicable). The 2-tuples for which the magnitudes of both spectral coefficients are lower than or equal to 3 are coded directly by the MSB coding. Otherwise, an escape symbol is transmitted first for signalling each additional bit plane.
The relation between the 2-tuple, the individual spectral values a and b of a 2-tuple, the most significant bit planes m and the remaining least significant bit planes r is illustrated in the example in figure 63. In this example, three escape symbols are sent prior to the actual value m, indicating three transmitted least significant bit planes.
Figure 63: Example of a coded pair (2-tuple) of spectral values a and b and their representation as m and r.
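The split of a 2-tuple into escape symbols, the MSB symbol m and the LSB planes r can be illustrated with a short C sketch that mirrors the MSB and LSB loops of ari_context_encode(). TupleSplit, split_2tuple() and join_a() are hypothetical names; the signs are left out because they are coded separately.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PLANES 16

typedef struct {
    int n_esc;              /* escape symbols sent = number of LSB planes */
    int msb_symbol;         /* a1 + 4*b1 with a1, b1 in 0..3 */
    int lsb_a[MAX_PLANES];  /* bit plane lev of |a|, lev = 0 .. n_esc-1 */
    int lsb_b[MAX_PLANES];
} TupleSplit;

static TupleSplit split_2tuple(int a, int b)
{
    TupleSplit s = {0};
    int a1 = abs(a), b1 = abs(b);

    while (a1 > 3 || b1 > 3) {  /* one escape per removed bit plane */
        a1 >>= 1;
        b1 >>= 1;
        s.n_esc++;
    }
    assert(s.n_esc <= MAX_PLANES);
    s.msb_symbol = a1 + 4 * b1;
    for (int lev = 0; lev < s.n_esc; lev++) {
        s.lsb_a[lev] = (abs(a) >> lev) & 1;
        s.lsb_b[lev] = (abs(b) >> lev) & 1;
    }
    return s;
}

/* Rebuilds |a| from the MSB symbol and the LSB planes. */
static int join_a(const TupleSplit *s)
{
    int v = (s->msb_symbol % 4) << s->n_esc;
    for (int lev = 0; lev < s->n_esc; lev++)
        v |= s->lsb_a[lev] << lev;
    return v;
}
```

For a = 25 and b = -3, three escape symbols are needed (as in the figure 63 example), the MSB symbol is 3, and the three LSB planes of |a| and |b| are 1, 0, 0 and 1, 1, 0.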
The probability model is derived from the past context. The past context is translated into a 12-bit index, which is mapped by the lookup table ari_context_lookup[] to one of the 64 available probability models stored in ari_cf_m[].
The past context is derived from two 2-tuples already coded within the same frame. The context can be derived from the direct neighbourhood or located further in the past frequencies. Separate contexts are maintained for the peak regions (coefficients belonging to the harmonic peaks) and other (non-peak) regions according to the harmonic model. If no harmonic model is used, only the other (non-peak) region context is used.
The zeroed spectral values lying in the tail of the spectrum are not transmitted. This is achieved by transmitting the index of the last non-zero 2-tuple. If the harmonic model is used, the tail of the spectrum is defined as the tail of the reordered spectrum consisting of the peak region coefficients followed by the other (non-peak) region coefficients, as this definition tends to increase the number of trailing zeros and thus improves coding efficiency. The number of samples to encode is computed as follows:
(937)
The following data are written into the bitstream in the following order:
- lastnz/2-1 is coded on bits.
- The entropy-coded MSBs along with escape symbols.
- The signs, each with a 1-bit codeword.
- The residual quantization bits described in subclause 5.3.3.2.9.3 when the bit budget is not fully used.
- The LSBs, written backwards from the end of the bitstream buffer.
The following pseudo-code describes how the context is derived and how the bitstream data for the MSBs, signs and LSBs are computed. The input arguments are the quantized spectral coefficients X[], the size of the considered spectrum L, the bit budget target_bits, the harmonic model parameters (pi, hi), and the index of the last non-zero symbol lastnz.
ari_context_encode(X[], L, target_bits, pi[], hi[], lastnz)
{
    c[0] = c[1] = p1 = p2 = 0;
    ii[0] = ii[1] = 0;  /* counters used by get_next_coeff() */
    for (k = 0; k < lastnz; k += 2) {
        ari_save_states();
        (a1_i, p1, idx1) = get_next_coeff(pi, hi, lastnz);
        (b1_i, p2, idx2) = get_next_coeff(pi, hi, lastnz);
        t = get_context(idx1, idx2, c, p1, p2);
        esc_nb = lev1 = 0;
        a = a1 = abs(X[a1_i]);
        b = b1 = abs(X[b1_i]);
        /* sign encoding */
        if (a1 > 0) save_bit(X[a1_i] > 0 ? 0 : 1);
        if (b1 > 0) save_bit(X[b1_i] > 0 ? 0 : 1);
        /* MSB encoding */
        while (a1 > 3 || b1 > 3) {
            pki = ari_context_lookup[t + 1024*esc_nb];
            /* write escape codeword: symbol 16 (symbols 0..15 carry a1+4*b1) */
            ari_encode(16, ari_cf_m[pki]);
            a1 >>= 1; b1 >>= 1; lev1++;
            esc_nb = min(lev1, 3);
        }
        pki = ari_context_lookup[t + 1024*esc_nb];
        ari_encode(a1 + 4*b1, ari_cf_m[pki]);
        /* LSB encoding */
        for (lev = 0; lev < lev1; lev++) {
            write_bit_end((a >> lev) & 1);
            write_bit_end((b >> lev) & 1);
        }
        /* check budget: nbbits is the number of bits written so far */
        if (nbbits > target_bits) {
            ari_restore_states();
            break;
        }
        c = update_context(a, b, a1, b1, c, p1, p2);
    }
    write_sign_bits();
}
The helper functions ari_save_states() and ari_restore_states() save and restore the arithmetic coder states, respectively. They allow the encoding of the last symbols to be cancelled if it violates the bit budget. Moreover, in case of a bit budget overflow, the remaining bits can be filled with zeros until the end of the bit budget is reached or until lastnz samples of the spectrum have been processed.
The other helper functions are described in the following subclauses.
5.3.3.2.8.1.4 Get next coefficient
(a, p, idx) = get_next_coeff(pi, hi, lastnz)
{
    if ((ii[0] >= lastnz - min(#pi, lastnz)) ||
        (ii[1] < min(#pi, lastnz) && pi[ii[1]] < hi[ii[0]])) {
        p = 1;
        idx = ii[1];
        a = pi[ii[1]];
    } else {
        p = 0;
        idx = ii[0] + #pi;
        a = hi[ii[0]];
    }
    ii[p] = ii[p] + 1;
}
The ii[0] and ii[1] counters are initialized to 0 at the beginning of ari_context_encode() (and ari_context_decode() in the decoder). #pi denotes the number of elements of pi.
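The interleaving performed by get_next_coeff() can be sketched in C as follows. CoeffIter is a hypothetical structure holding the pi and hi sequences, lastnz and the ii[] counters; npi plays the role of #pi, and the function returns the spectral index a while writing p and idx through pointers.

```c
#include <assert.h>

typedef struct {
    const int *pi;  /* peak-region indices, ascending */
    int npi;        /* #pi */
    const int *hi;  /* non-peak indices, ascending */
    int lastnz;
    int ii[2];      /* ii[0]: next non-peak, ii[1]: next peak */
} CoeffIter;

static int min_int(int a, int b) { return a < b ? a : b; }

static int get_next_coeff(CoeffIter *it, int *p, int *idx)
{
    int np = min_int(it->npi, it->lastnz);
    int a;

    assert(it->ii[0] + it->ii[1] < it->lastnz);  /* at most lastnz calls */
    if (it->ii[0] >= it->lastnz - np ||
        (it->ii[1] < np && it->pi[it->ii[1]] < it->hi[it->ii[0]])) {
        *p = 1;                              /* take the next peak coefficient */
        *idx = it->ii[1];
        a = it->pi[it->ii[1]];
    } else {
        *p = 0;                              /* take the next non-peak coefficient */
        *idx = it->ii[0] + it->npi;          /* position in the concatenation pi||hi */
        a = it->hi[it->ii[0]];
    }
    it->ii[*p]++;
    return a;
}
```

With pi = {5, 6, 7}, hi = {0, 1, 2, 3, 4, 8, 9} and lastnz = 6, the coding order is 0, 1, 2, 5, 6, 7: all peak-region coefficients are kept, while only the first three non-peak coefficients precede the tail.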
5.3.3.2.8.1.5 Context update
The context is updated as described by the following pseudo-code. It consists of the concatenation of two 4-bit context elements.
if ()
{
if ()
{
If ()
}
if ()
{
if ()
}
}
else
{
if ()
else
}
5.3.3.2.8.1.6 Get context
The final context is amended in two ways:
if then
if then
The context t is an index from 0 to 1023.
5.3.3.2.8.1.7 Bit consumption estimation
The bit consumption estimation of the context-based arithmetic coder is needed for the rate-loop optimization of the quantization. The estimation is done by computing the bit requirement without calling the arithmetic coder. The generated bits can be accurately estimated by repeating the following steps for each coded MSB symbol:
cum_freq = ari_cf_m[pki] + m;
proba *= cum_freq[0] - cum_freq[1];
nlz = norm_l(proba); /* get the number of leading zeros */
proba <<= nlz;
nbits += nlz;
proba >>= 14;
where proba is an integer initialized to 16384, nbits is initialized to 0, and m is an MSB symbol.
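The estimator can be exercised in C with the following sketch. It assumes the normalizing reading of the steps (proba is shifted left by nlz before the right shift by 14, and nlz is accumulated into nbits); norm_l() is implemented here as a plain loop, and estimate_bits() is a hypothetical name that takes the per-symbol 14-bit frequencies cum_freq[0] - cum_freq[1] directly. Under this reading proba stays in [2^16, 2^17) after every step, so the product never overflows 31 bits, and the accumulated count exceeds -log2 of the probability product by roughly 2 to 3 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of left shifts that bring a positive 32-bit value into [2^30, 2^31). */
static int norm_l(int32_t x)
{
    int n = 0;
    assert(x > 0);
    while (x < 0x40000000) {
        x <<= 1;
        n++;
    }
    return n;
}

/* freqs[i]: 14-bit frequency (cum_freq[0] - cum_freq[1]) of the i-th coded symbol. */
static int estimate_bits(const int32_t *freqs, int n)
{
    int32_t proba = 16384;
    int nbits = 0;

    for (int i = 0; i < n; i++) {
        proba *= freqs[i];      /* < 2^17 * 2^14: fits in 31 bits */
        int nlz = norm_l(proba);
        proba <<= nlz;          /* normalize to [2^30, 2^31) */
        nbits += nlz;
        proba >>= 14;           /* back to [2^16, 2^17) */
    }
    return nbits;
}
```

Ten symbols of probability 1/2 (frequency 8192) give 12, that is the true 10 bits plus the constant offset of 2; five certain symbols (frequency 16384) give 2.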
5.3.3.2.8.1.8 Harmonic model
For both context-based and envelope-based arithmetic coding, a harmonic model is used for more efficient coding of frames with harmonic content. The model is disabled if any of the following conditions applies:
– The bit-rate is not one of 9.6, 13.2, 16.4, 24.4, 32, 48 kbps.
– The previous frame was coded by ACELP.
– Envelope based arithmetic coding is used and the coder type is neither Voiced nor Generic.
– The single-bit harmonic model flag in the bit-stream is set to zero.
When the model is enabled, the frequency-domain interval of the harmonics is a key parameter; it is analysed and encoded in the same way for both flavours of arithmetic coder.
5.3.3.2.8.1.8.1 Encoding of Interval of harmonics
When the pitch lag and gain are used for the post-processing, the lag parameter is used to represent the interval of harmonics in the frequency domain. Otherwise, the normal representation of the interval is applied.
5.3.3.2.8.1.8.1.1 Encoding interval depending on time domain pitch lag
If the integer part of the pitch lag in the time domain is less than the MDCT frame size , the frequency-domain interval unit (between harmonic peaks corresponding to the pitch lag) with 7-bit fractional accuracy is given by
(938)
where denotes the fractional part of the pitch lag in the time domain, and denotes the maximum number of allowable fractional values, which is either 4 or 6 depending on the conditions.
Since has a limited range, the actual interval between harmonic peaks in the frequency domain is coded relative to using the number of bits specified in table 88. Among the candidate multiplication factors given in table 89 or table 90, the multiplier that gives the most suitable harmonic interval of the MDCT domain transform coefficients is selected.
(939)
(940)
Table 88: Number of bits for specifying the multiplier depending on
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| NB | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| WB | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 2 | 2 | 2 |
Table 89: Candidates of multiplier in the order of depending on (NB)
| 0 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 34, 36, 38, 40 |
| 1 | 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 16, 20, 24, 30 |
| 2 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 24, 30 |
| 3 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 24, 30 |
| 4 | 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 24, 30 |
| 5 | 1, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20 |
| 6 | 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9, 10, 12, 16 |
| 7 | 1, 2, 3, 4, 5, 6, 8, 10 |
| 8 | 1, 2, 3, 4, 5, 6, 8, 10 |
| 9 | 1, 1.5, 2, 3, 4, 5, 6, 8 |
| 10 | 1, 2, 2.5, 3, 4, 5, 6, 8 |
| 11 | 1, 2, 3, 4 |
| 12 | 1, 2, 4, 6 |
| 13 | 1, 2, 3, 4 |
| 14 | 1, 1.5, 2, 4 |
| 15 | 1, 1.5, 2, 3 |
| 16 | 0.5, 1, 2, 3 |
Table 90: Candidates of multiplier in the order of depending on (WB)
| 0 | 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 32, 34, 36, 38, 40 |
| 1 | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 44, 48, 54, 60, 68, 78, 80 |
| 2 | 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 52, 54, 68 |
| 3 | 1, 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 40, 44, 48, 54 |
| 4 | 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 9, 10, 11, 12, 13, 14, 15, 16, 18, 20, 22, 24, 26, 28, 34, 40, 41 |
| 5 | 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22.5, 24, 25, 27, 28, 30, 35 |
| 6 | 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 7, 8, 9, 10 |
| 7 | 1, 2, 2.5, 3, 4, 5, 6, 7, 8, 9, 10, 12, 15, 16, 18, 27 |
| 8 | 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 8, 10, 15, 18, 22, 24, 26 |
| 9 | 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 6, 8, 10, 12, 13, 14, 18, 21 |
| 10 | 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 8, 9, 11, 12, 13.5, 16, 20 |
| 11 | 0.5, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 20 |
| 12 | 0.5, 1, 1.5, 2, 2.5, 3, 4, 4.5, 6, 7.5, 9, 10, 12, 14, 15, 18 |
| 13 | 0.5, 1, 1.25, 1.5, 1.75, 2, 2.5, 3, 3.5, 4, 4.5, 5, 6, 8, 9, 14 |
| 14 | 0.5, 1, 2, 4 |
| 15 | 1, 1.5, 2, 4 |
| 16 | 1, 2, 3, 4 |
5.3.3.2.8.1.8.1.2 Encoding interval without depending on time domain pitch lag
When the pitch lag and gain in the time domain are not used, or the pitch gain is less than or equal to 0.46, normal encoding of the interval with unequal resolution is used.
Unit interval of spectral peaks is coded as
, (941)
and the actual interval is represented with a fractional resolution of as
. (942)
Each parameter is given in table 91, where “small size” means that the frame size is smaller than 256 or the target bit rate is less than or equal to 150.
Table 91: Unequal resolution for coding of (0 <= index < 256)
| | Res | base | bias |
| | 3 | 6 | 0 |
| | 4 | 8 | 16 |
| | 3 | 12 | 80 |
| “small size” or | 1 | 28 | 208 |
| | 0 | 188 | 224 |
5.3.3.2.8.1.8.2 Void
5.3.3.2.8.1.8.3 Search for interval of harmonics
In the search for the best interval of harmonics, the encoder tries to find the index that maximizes the weighted sum of the peak parts of the absolute MDCT coefficients. denotes the sum of 3 samples of the absolute values of the MDCT domain transform coefficients:
(943)
(944)
where num_peak is the maximum number of peaks that fits within the limit of samples in the frequency domain.
If the interval does not rely on the time-domain pitch lag, a hierarchical search is used to save computational cost. If the index of the interval is less than 80, the periodicity is first checked with a coarse step of 4. After the best interval has been found, a finer search is carried out around it, from -2 to +2. If the index is equal to or larger than 80, the periodicity is checked for each index.
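The periodicity score and the hierarchical index search can be sketched in C. Assumptions: peak_sum() is an unweighted version of (943) with peak centres at round(k*U) for an interval U in bins (the weights of (944) are not reproduced), and search_interval_index() takes precomputed scores per interval index for brevity, whereas an encoder would evaluate only the indices it visits. Both function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Unweighted sum of |X| over 3 bins around each multiple of the interval U (in bins). */
static double peak_sum(const double *X, int L, double U)
{
    double sum = 0.0;
    assert(U >= 1.0);
    for (int k = 1; ; k++) {
        int c = (int)(k * U + 0.5);
        if (c + 1 >= L)
            break;  /* num_peak reached: stop at the spectrum limit */
        sum += fabs(X[c - 1]) + fabs(X[c]) + fabs(X[c + 1]);
    }
    return sum;
}

/* Coarse step 4 plus +-2 refinement below index 80, exhaustive search from 80 up. */
static int search_interval_index(const double *score, int n_idx)
{
    int best = 0, coarse;

    assert(n_idx >= 80);
    for (int idx = 4; idx < 80; idx += 4)
        if (score[idx] > score[best])
            best = idx;
    coarse = best;
    for (int idx = coarse - 2; idx <= coarse + 2; idx++)
        if (idx >= 0 && idx < 80 && score[idx] > score[best])
            best = idx;
    for (int idx = 80; idx < n_idx; idx++)
        if (score[idx] > score[best])
            best = idx;
    return best;
}
```

The coarse pass evaluates 20 of the first 80 indices and the refinement at most 4 more, instead of all 80.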
5.3.3.2.8.1.8.4 Decision of harmonic model
At the initial estimation, the number of bits used without the harmonic model, , and the number of bits used with the harmonic model, , are obtained, and the indicator of consumed bits is defined as
, (945)
, (946)
, (947)
where denotes the additional bits for the model parameters of the periodic harmonic structure, and and indicate the consumed bits when they are larger than the target bits. Thus, the larger , the more preferable it is to use the harmonic model. The relative periodicity is defined as the normalized sum of absolute values for the peak regions of the shaped MDCT coefficients:
, (948)
where is the harmonic interval that attains the maximum value of . When the periodicity score of this frame exceeds the threshold, that is,
, (949)
this frame is considered for coding with the harmonic model. The shaped MDCT coefficients divided by the gain are quantized to produce a sequence of integer values of MDCT coefficients, , which is compressed by arithmetic coding with the harmonic model. This requires an iterative convergence process (rate loop) to obtain and with consumed bits . At the end of convergence, in order to validate the harmonic model, the bits consumed by arithmetic coding of with the normal (non-harmonic) model are additionally calculated and compared with . If is larger than , the arithmetic coding of reverts to the normal model, and can be used for residual quantization for further enhancement. Otherwise, the harmonic model is used in arithmetic coding.
In contrast, if the periodicity indicator of this frame is smaller than or equal to the threshold, quantization and arithmetic coding are carried out assuming the normal model, producing a sequence of integer values of the shaped MDCT coefficients, , with consumed bits . After convergence of the rate loop, the bits consumed by arithmetic coding of with the harmonic model are calculated. If is larger than , the arithmetic coding of is switched to the harmonic model. Otherwise, the normal model is used in arithmetic coding.
5.3.3.2.8.1.9 Use of harmonic information in Context based arithmetic coding
For context-based arithmetic coding, all regions are classified into two categories. One is the peak part, which consists of 3 consecutive samples centred at each harmonic peak of ( is a positive integer up to the limit):
. (950)
The other samples belong to the normal or valley part. The harmonic peak part is specified by the interval of harmonics and its integer multiples. Arithmetic coding uses different contexts for peak and valley regions.
For ease of description and implementation, the harmonic model uses the following index sequences:
, (951)
, (952)
, the concatenation of and . (953)
In case of disabled harmonic model, these sequences are , and .
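The index sequences can be built as in the following C sketch. It assumes peak centres at round(k*U) for an interval U given in bins and, for the disabled model (signalled here by U <= 0), an empty pi and hi = 0, 1, ..., L-1; the concatenation of (953) is then simply pi followed by hi. build_index_sequences() and MAX_L are hypothetical.

```c
#include <assert.h>

#define MAX_L 2048

/* Returns #pi. pi gets the peak-region indices, hi the remaining ones, both ascending. */
static int build_index_sequences(double U, int L, int *pi, int *hi)
{
    unsigned char is_peak[MAX_L] = {0};
    int npi = 0, nhi = 0;

    assert(L > 0 && L <= MAX_L);
    assert(U <= 0.0 || U >= 1.0);      /* U <= 0: harmonic model disabled */
    if (U > 0.0) {
        for (int k = 1; ; k++) {
            int c = (int)(k * U + 0.5);  /* assumed peak centre: round(k*U) */
            if (c + 1 >= L)
                break;                   /* limit of the spectrum reached */
            is_peak[c - 1] = is_peak[c] = is_peak[c + 1] = 1;
        }
    }
    for (int n = 0; n < L; n++) {
        if (is_peak[n])
            pi[npi++] = n;
        else
            hi[nhi++] = n;
    }
    return npi;
}
```

For L = 20 and U = 6, pi = {5, 6, 7, 11, 12, 13, 17, 18, 19} and hi holds the other 11 indices.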
5.3.3.2.8.2 Envelope based arithmetic coder
In the MDCT domain, spectral lines are weighted with the perceptual model such that each line can be quantized with the same accuracy. The variance of individual spectral lines follows the shape of the linear predictor weighted by the perceptual model, whereby the weighted shape is . is calculated by transforming to frequency domain LPC gains as detailed in subclauses 5.3.3.2.4.1 and 5.3.3.2.4.2. is derived from after conversion to direct-form coefficients, applying tilt compensation , and finally transforming to frequency domain LPC gains. All other frequency-shaping tools, as well as the contribution from the harmonic model, shall also be included in this envelope shape . Observe that this gives only the relative variances of spectral lines, while the overall envelope has arbitrary scaling, so we must begin by scaling the envelope.
5.3.3.2.8.2.1 Envelope scaling
We will assume that spectral lines are zero-mean and distributed according to the Laplace distribution, whose probability density function is
(954)
The entropy and thus the bit-consumption of such a spectral line is . However, this formula assumes that the sign is encoded also for those spectral lines which are quantized to zero. To compensate for this discrepancy, we use instead the approximation
(955)
which is accurate for . We will assume that the bit-consumption of lines with is which matches the bit-consumption at . For large we use the true entropy for simplicity.
The variance of spectral lines is then If is the th element of the power of the envelope shape then describes the relative energy of spectral lines such that where is scaling coefficient. In other words, describes only the shape of the spectrum without any meaningful magnitude and is used to scale that shape to obtain the actual variance .
Our objective is that, when all lines of the spectrum are encoded with an arithmetic coder, the bit-consumption matches a pre-defined level , that is, . We can then use a bisection algorithm to determine the appropriate scaling factor such that the target bit-rate is reached.
Once the envelope shape has been scaled such that the expected bit-consumption of signals matching that shape yield the target bit-rate, we can proceed to quantizing the spectral lines.
5.3.3.2.8.2.2 Quantization rate loop
Assume that is quantized to an integer such that the quantization interval is ; then the probability of a spectral line occurring in that interval is, for ,
(956)
and for
(957)
It follows that, in the ideal case, the bit-consumption for these two cases is
(958)
By pre-computing the terms and , we can efficiently calculate the bit-consumption of the whole spectrum.
The rate-loop can then be applied with a bisection search, where we adjust the scaling of the spectral lines by a factor , and calculate the bit-consumption of the spectrum , until we are sufficiently close to the desired bit-rate. Note that the above ideal-case values for the bit-consumption do not necessarily coincide exactly with the final bit-consumption, since the arithmetic codec works with a finite-precision approximation. This rate-loop thus relies on an approximation of the bit-consumption, but with the benefit of a computationally efficient implementation.
When the optimal scaling has been determined, the spectrum can be encoded with a standard arithmetic coder. A spectral line which is quantized to a value is encoded to the interval
(959)
and is encoded onto the interval
(960)
The sign of will be encoded with one further bit.
Observe that the arithmetic coder must operate with a fixed-point implementation such that the above intervals are bit-exact across all platforms. Therefore, all inputs to the arithmetic coder, including the linear predictive model and the weighting filter, must be implemented in fixed-point arithmetic throughout the system.
5.3.3.2.8.2.3 Probability model derivation and coding
When the optimal scaling has been determined, the spectrum can be encoded with a standard arithmetic coder. A spectral line which is quantized to a value is encoded to the interval
(961)
and is encoded onto the interval
(962)
The sign of will be encoded with one further bit.
5.3.3.2.8.2.4 Harmonic model in envelope based arithmetic coding
In the case of envelope-based arithmetic coding, the harmonic model can be used to enhance the arithmetic coding. A search procedure similar to the one used in context-based arithmetic coding estimates the interval between harmonics in the MDCT domain. However, the harmonic model is used in combination with the LPC envelope, as shown in figure 64. The shape of the envelope is rendered according to the information from the harmonic analysis.
The harmonic shape at frequency data sample is defined as
, (963)
, otherwise , where denotes center position of harmonics.
(964)
and are the height and width of each harmonic, depending on the unit interval as follows:
(965)
(966)
The height and width increase as the interval increases.
The spectral envelope is modified by the harmonic shape at as
, (967)
where the gain for the harmonic components is always set to 0.75 in Generic mode, while in Voiced mode it is selected from {0.6, 1.4, 4.5, 10.0}, using 2 bits, as the value that minimizes ,
, (968)
. (969)
Figure 64: Example of harmonic envelope combined with LPC envelope used in envelope based arithmetic coding.
5.3.3.2.9 Global gain coding
5.3.3.2.9.1 Optimizing global gain
The optimum global gain is computed from the quantized and unquantized MDCT coefficients. For bit rates up to 32 kbps, the adaptive low frequency de-emphasis (see subclause 6.2.2.3.2) is applied to the quantized MDCT coefficients before this step. In case the computation results in an optimum gain less than or equal to zero, the global gain determined before (by estimate and rate loop) is used.
(970)
(971)
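The optimum gain can be sketched in C, assuming the usual least-squares form g = <x, q> / <q, q>, which minimizes the squared error between the unquantized MDCT coefficients x and g times the quantized values q; (970) and (971) are not reproduced, and the low-frequency de-emphasis step is left out. optimum_global_gain() is a hypothetical name, and the fallback mirrors the rule in the text.

```c
#include <assert.h>

/* Least-squares gain <x, q> / <q, q>; keeps prev_gain if the result is not positive. */
static double optimum_global_gain(const double *x, const int *q, int n, double prev_gain)
{
    double num = 0.0, den = 0.0;

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        num += x[i] * q[i];
        den += (double)q[i] * q[i];
    }
    if (den <= 0.0)
        return prev_gain;  /* all lines quantized to zero */
    double g = num / den;
    return g > 0.0 ? g : prev_gain;
}
```

For x = {2, 4, 6} and q = {1, 2, 3} the gain is 28/14 = 2; if the correlation is negative, the previously determined gain is kept.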
5.3.3.2.9.2 Quantization of global gain
For transmission to the decoder, the optimum global gain is quantized to a 7-bit index :
(972)
The dequantized global gain is obtained as defined in subclause 6.2.2.3.3.
5.3.3.2.9.3 Residual coding
The residual quantization is a refinement quantization layer refining the first SQ stage. It exploits any unused bits, target_bits - nbbits, where nbbits is the number of bits consumed by the entropy coder. The residual quantization adopts a greedy strategy without entropy coding, so that coding can stop as soon as the bitstream reaches the desired size.
The residual quantization can refine the first quantization in two ways. The first is the refinement of the global gain quantization. The global gain refinement is only done for rates at and above 13.2 kbps, and at most three additional bits are allocated to it. The quantized gain is refined sequentially, starting from n=0 and incrementing n by one after each iteration:
The second refinement consists of re-quantizing the quantized spectrum line by line. First, the non-zero quantized lines are processed with a 1-bit residual quantizer:
Finally, if bits remain, the zeroed lines are considered and quantized on 3 levels. The rounding offset of the SQ with dead-zone was taken into account in the residual quantizer design:
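The greedy 1-bit pass over the non-zero lines can be sketched in C. This is a hypothetical sketch: the decision rule (bit 1 when x/g lies above the quantized value q, else 0) is an assumption, the dead-zone-aware reconstruction offsets of the actual quantizer are not reproduced, and the 3-level pass over zeroed lines is omitted. It only shows the greedy structure: lines are visited in order and coding stops when the budget is exhausted.

```c
#include <assert.h>

/* Writes up to budget refinement bits for non-zero lines; returns the number written. */
static int residual_bits_nonzero(const double *x, const int *q, int n, double g,
                                 int budget, int *bits_out)
{
    int nb = 0;

    assert(g > 0.0);
    for (int i = 0; i < n && nb < budget; i++) {
        if (q[i] == 0)
            continue;  /* zeroed lines are left for the later 3-level pass */
        bits_out[nb++] = (x[i] / g > (double)q[i]) ? 1 : 0;  /* assumed rule */
    }
    return nb;
}
```

With two spare bits only the first two non-zero lines are refined; with more bits the pass stops after the last non-zero line.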
5.3.3.2.10 Noise Filling
On the decoder side noise filling is applied to fill gaps in the MDCT spectrum where coefficients have been quantized to zero. Noise filling inserts pseudo-random noise into the gaps, starting at bin up to bin . To control the amount of noise inserted in the decoder, a noise factor is computed on encoder side and transmitted to the decoder.
5.3.3.2.10.1 Noise Filling Tilt
To compensate for LPC tilt, a tilt compensation factor is computed. For bitrates below 13.2 kbps the tilt compensation is computed from the direct form quantized LP coefficients , while for higher bitrates a constant value is used:
(973)
(974)
5.3.3.2.10.2 Noise Filling Start and Stop Bins
The noise filling start and stop bins are computed as follows:
(975)
(976)
5.3.3.2.10.3 Noise Transition Width
At each side of a noise filling segment a transition fadeout is applied to the inserted noise. The width of the transitions (number of bins) is defined as:
(977)
where denotes that the harmonic model is used for the arithmetic codec and denotes the previous codec mode.
5.3.3.2.10.4 Computation of Noise Segments
The noise filling segments are the segments of successive bins of the MDCT spectrum between and for which all coefficients are quantized to zero. The segments are determined as defined by the following pseudo-code:
where and are the start and stop bins of noise filling segment j, and is the number of segments.
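The segment search can be sketched in C as a scan for runs of zero-quantized bins. This sketch assumes half-open ranges: the search covers [start, stop) and segment j spans [seg_start[j], seg_stop[j]); the stop-bin convention of the text may differ. noise_segments() is a hypothetical name, and the output arrays must have room for every segment.

```c
#include <assert.h>

/* Finds runs of zero-quantized bins in [start, stop); returns the number of segments. */
static int noise_segments(const int *q, int start, int stop, int *seg_start, int *seg_stop)
{
    int nseg = 0;
    int k = start;

    assert(start >= 0 && start <= stop);
    while (k < stop) {
        while (k < stop && q[k] != 0)  /* skip non-zero bins */
            k++;
        if (k >= stop)
            break;
        seg_start[nseg] = k;
        while (k < stop && q[k] == 0)  /* extend the zero run */
            k++;
        seg_stop[nseg] = k;
        nseg++;
    }
    return nseg;
}
```

For q = {0, 0, 1, 0, 0, 0, 2, 0} searched over [1, 8), the segments are [1, 2), [3, 6) and [7, 8).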
5.3.3.2.10.5 Computation of Noise Factor
The noise factor is computed from the unquantized MDCT coefficients of the bins for which noise filling is applied.
If the noise transition width is 3 bins or less, an attenuation factor is computed based on the energy of even and odd MDCT bins:
(978)
(979)
(980)
For each segment an error value is computed from the unquantized MDCT coefficients, applying global gain, tilt compensation and transitions:
(981)
A weight for each segment is computed based on the width of the segment:
(982)
The noise factor is then computed as follows:
(983)
5.3.3.2.10.6 Quantization of Noise Factor
For transmission, the noise factor is quantized to obtain a 3-bit index:
(984)
5.3.3.2.11 Intelligent Gap Filling
The Intelligent Gap Filling (IGF) tool is an enhanced noise filling technique to fill gaps (regions of zero values) in spectra. These gaps may occur due to coarse quantization in the encoding process where large portions of a given spectrum might be set to zero to meet bit constraints. However, with the IGF tool these missing signal portions are reconstructed on the receiver side (RX) with parametric information calculated on the transmission side (TX). IGF is used only if TCX mode is active.
See table 92 below for all IGF operating points:
Table 92: IGF application modes
| Bitrate | Mode |
| 9.6 kbps | WB |
| 9.6 kbps | SWB |
| 13.2 kbps | SWB |
| 16.4 kbps | SWB |
| 24.4 kbps | SWB |
| 32.2 kbps | SWB |
| 48.0 kbps | SWB |
| 16.4 kbps | FB |
| 24.4 kbps | FB |
| 32.0 kbps | FB |
| 48.0 kbps | FB |
| 96.0 kbps | FB |
| 128.0 kbps | FB |
On the transmission side, IGF calculates levels on scale factor bands, using a complex or real valued TCX spectrum. Additionally, spectral whitening indices are calculated using a spectral flatness measurement and a crest factor. An arithmetic coder is used for noiseless coding and efficient transmission to the receiver (RX) side.
5.3.3.2.11.1 IGF helper functions
5.3.3.2.11.1.1 Mapping values with the transition factor
If there is a transition from CELP to TCX coding () or a TCX 10 frame is signalled (), the TCX frame length may change. In case of a frame length change, all values related to the frame length are mapped with the function :
(985)
where is a natural number, for example a scale factor band offset, and is a transition factor, see table 97.
5.3.3.2.11.1.2 TCX power spectrum
The power spectrum of the current TCX frame is calculated with:
(986)
where is the actual TCX window length, is the vector containing the real valued part (cos-transformed) of the current TCX spectrum, and is the vector containing the imaginary (sin-transformed) part of the current TCX spectrum.
5.3.3.2.11.1.3 The spectral flatness measurement function
Let be the TCX power spectrum as calculated according to subclause 5.3.3.2.11.1.2 and the start line and the stop line of the SFM measurement range.
The function, applied with IGF, is defined with:
(987)
where is the actual TCX window length and is defined with:
(988)
5.3.3.2.11.1.4 The crest factor function
Let be the TCX power spectrum as calculated according to subclause 5.3.3.2.11.1.2 and the start line and the stop line of the crest factor measurement range.
The function, applied with IGF, is defined with:
(989)
where is the actual TCX window length and is defined with:
(990)
5.3.3.2.11.1.5 The mapping function
The mapping function is defined with:
(991)
where is a calculated spectral flatness value and is the noise band in scope. For threshold values , refer to table 93 below.
Table 93: Thresholds for whitening for nT, ThM and ThS
| Bitrate | Mode | nT | ThM | ThS |
| 9.6 kbps | WB | 2 | 0.36, 0.36 | 1.41, 1.41 |
| 9.6 kbps | SWB | 3 | 0.84, 0.89, 0.89 | 1.30, 1.25, 1.25 |
| 13.2 kbps | SWB | 2 | 0.84, 0.89 | 1.30, 1.25 |
| 16.4 kbps | SWB | 3 | 0.83, 0.89, 0.89 | 1.31, 1.19, 1.19 |
| 24.4 kbps | SWB | 3 | 0.81, 0.85, 0.85 | 1.35, 1.23, 1.23 |
| 32.2 kbps | SWB | 3 | 0.91, 0.85, 0.85 | 1.34, 1.35, 1.35 |
| 48.0 kbps | SWB | 1 | 1.15 | 1.19 |
| 16.4 kbps | FB | 3 | 0.63, 0.27, 0.36 | 1.53, 1.32, 0.67 |
| 24.4 kbps | FB | 4 | 0.78, 0.31, 0.34, 0.34 | 1.49, 1.38, 0.65, 0.65 |
| 32.0 kbps | FB | 4 | 0.78, 0.31, 0.34, 0.34 | 1.49, 1.38, 0.65, 0.65 |
| 48.0 kbps | FB | 1 | 0.80 | 1.0 |
| 96.0 kbps | FB | 1 | 0 | 2.82 |
| 128.0 kbps | FB | 1 | 0 | 2.82 |
5.3.3.2.11.1.6 Void
5.3.3.2.11.1.7 IGF scale factor tables
IGF scale factor tables are available for all modes where IGF is applied.
Table 94: Scale factor band offset table
| Bitrate | Mode | Number of bands (nB) | Scale factor band offsets (t[0],t[1],…,t[nB]) |
| 9.6 kbps | WB | 3 | 164, 186, 242, 320 |
| 9.6 kbps | SWB | 3 | 200, 322, 444, 566 |
| 13.2 kbps | SWB | 6 | 256, 288, 328, 376, 432, 496, 566 |
| 16.4 kbps | SWB | 7 | 256, 288, 328, 376, 432, 496, 576, 640 |
| 24.4 kbps | SWB | 8 | 256, 284, 318, 358, 402, 450, 508, 576, 640 |
| 32.2 kbps | SWB | 8 | 256, 284, 318, 358, 402, 450, 508, 576, 640 |
| 48.0 kbps | SWB | 3 | 512, 534, 576, 640 |
| 16.4 kbps | FB | 9 | 256, 288, 328, 376, 432, 496, 576, 640, 720, 800 |
| 24.4 kbps | FB | 10 | 256, 284, 318, 358, 402, 450, 508, 576, 640, 720, 800 |
| 32.0 kbps | FB | 10 | 256, 284, 318, 358, 402, 450, 508, 576, 640, 720, 800 |
| 48.0 kbps | FB | 4 | 512, 584, 656, 728, 800 |
| 96.0 kbps | FB | 2 | 640, 720, 800 |
| 128.0 kbps | FB | 2 | 640, 720, 800 |
Table 94 above refers to the TCX 20 window length and a transition factor of 1.00.
For all window lengths, the following remapping applies:
(992)
where is the transition factor mapping function described in subclause 5.3.3.2.11.1.1.
5.3.3.2.11.1.8 The mapping function
Table 95: IGF minimal source subband,
| Bitrate | Mode | |
| 9.6 kbps | WB | 30 |
| 9.6 kbps | SWB | 32 |
| 13.2 kbps | SWB | 32 |
| 16.4 kbps | SWB | 32 |
| 24.4 kbps | SWB | 32 |
| 32.2 kbps | SWB | 32 |
| 48.0 kbps | SWB | 64 |
| 16.4 kbps | FB | 32 |
| 24.4 kbps | FB | 32 |
| 32.0 kbps | FB | 32 |
| 48.0 kbps | FB | 64 |
| 96.0 kbps | FB | 64 |
| 128.0 kbps | FB | 64 |
For every mode, a mapping function is defined that gives access to the source lines for a given target line in the IGF range.
Table 96: Mapping functions for every mode
|
Bitrate |
Mode |
nT |
Mapping function |
|
9.6 kbps |
WB |
2 |
|
|
9.6 kbps |
SWB |
3 |
|
|
13.2 kbps |
SWB |
2 |
|
|
16.4 kbps |
SWB |
3 |
|
|
24.4 kbps |
SWB |
3 |
|
|
32.2 kbps |
SWB |
3 |
|
|
48.0 kbps |
SWB |
1 |
|
|
16.4 kbps |
FB |
3 |
|
|
24.4 kbps |
FB |
4 |
|
|
32.0 kbps |
FB |
4 |
|
|
48.0 kbps |
FB |
1 |
|
|
96.0 kbps |
FB |
1 |
|
|
128.0 kbps |
FB |
1 |
The mapping function is defined with:
(993)
The mapping function is defined with:
(994)
The mapping function is defined with:
(995)
The mapping function is defined with:
(996)
The mapping function is defined with:
(997)
The mapping function is defined with:
(998)
The mapping function is defined with:
(999)
The mapping function is defined with:
(1000)
The value f is the appropriate transition factor (see table 97), as described in subclause 5.3.3.2.11.1.1.
Note that all values t(k) shall already be mapped with the transition factor mapping function described in subclause 5.3.3.2.11.1.1. The values of t are defined in table 94.
The mapping functions described here are referred to in the text as "mapping function m", assuming that the proper function for the current mode is selected.
5.3.3.2.11.2 IGF input elements (TX)
The IGF encoder module expects the following vectors and flags as an input:
: vector with real part of the current TCX spectrum
: vector with imaginary part of the current TCX spectrum
: vector with values of the TCX power spectrum
: flag, signalling if the current frame contains a transient, see subclause 5.3.2.4.1.1
: flag, signalling a TCX 10 frame
: flag, signalling a TCX 20 frame
: flag, signalling a CELP to TCX transition; generated by testing whether the last frame was CELP
: flag, signalling that the current frame is independent from the previous frame
The following combinations of the TCX 10, TCX 20 and CELP to TCX transition flags, listed in table 97, are allowed with IGF:
Table 97: TCX transitions, transition factor and window length
|
Bitrate / Mode |
TCX 10 |
TCX 20 |
CELP to TCX |
Transition factor |
Window length |
|
9.6 kbps / WB |
false |
true |
false |
1.00 |
320 |
|
false |
true |
true |
1.25 |
400 |
|
|
9.6 kbps / SWB |
false |
true |
false |
1.00 |
640 |
|
false |
true |
true |
1.25 |
800 |
|
|
13.2 kbps / SWB |
false |
true |
false |
1.00 |
640 |
|
false |
true |
true |
1.25 |
800 |
|
|
16.4 kbps / SWB |
false |
true |
false |
1.00 |
640 |
|
false |
true |
true |
1.25 |
800 |
|
|
24.4 kbps / SWB |
false |
true |
false |
1.00 |
640 |
|
false |
true |
true |
1.25 |
800 |
|
|
32.0 kbps / SWB |
false |
true |
false |
1.00 |
640 |
|
false |
true |
true |
1.25 |
800 |
|
|
48.0 kbps / SWB |
false |
true |
false |
1.00 |
640 |
|
false |
true |
true |
1.00 |
640 |
|
|
true |
false |
false |
0.50 |
320 |
|
|
16.4 kbps / FB |
false |
true |
false |
1.00 |
960 |
|
false |
true |
true |
1.25 |
1200 |
|
|
24.4 kbps / FB |
false |
true |
false |
1.00 |
960 |
|
false |
true |
true |
1.25 |
1200 |
|
|
32.0 kbps / FB |
false |
true |
false |
1.00 |
960 |
|
false |
true |
true |
1.25 |
1200 |
|
|
48.0 kbps / FB |
false |
true |
false |
1.00 |
960 |
|
false |
true |
true |
1.00 |
960 |
|
|
true |
false |
false |
0.50 |
480 |
|
|
96.0 kbps / FB |
false |
true |
false |
1.00 |
960 |
|
false |
true |
true |
1.00 |
960 |
|
|
true |
false |
false |
0.50 |
480 |
|
|
128.0 kbps / FB |
false |
true |
false |
1.00 |
960 |
|
false |
true |
true |
1.00 |
960 |
|
|
true |
false |
false |
0.50 |
480 |
5.3.3.2.11.3 IGF functions on transmission (TX) side
All function declarations assume that input elements are provided on a frame-by-frame basis. The only exception is two consecutive TCX 10 frames, where the second frame is encoded dependent on the first frame.
5.3.3.2.11.4 IGF scale factor calculation
This subclause describes how the IGF scale factor vector is calculated on transmission (TX) side.
5.3.3.2.11.4.1 Complex valued calculation
If the TCX power spectrum is available, the IGF scale factor values are calculated from it:
(1001)
Then, with m being the mapping function that maps the IGF target range into the IGF source range (see subclause 5.3.3.2.11.1.8), calculate:
(1002)
(1003)
where t shall already be mapped with the transition factor mapping function (see subclause 5.3.3.2.11.1.1), and nB is the number of IGF scale factor bands (see table 94).
Calculate g(k) with:
(1004)
and limit g(k) to its valid range with:
(1005)
The values g(k) are transmitted to the receiver (RX) side after further lossless compression with the arithmetic coder described in subclause 5.3.3.2.11.8.
5.3.3.2.11.4.2 Real valued calculation
If the TCX power spectrum is not available, calculate:
(1006)
where t shall already be mapped with the transition factor mapping function (see subclause 5.3.3.2.11.1.1), and nB is the number of bands (see table 94).
Calculate g(k) with:
(1007)
and limit g(k) to its valid range with:
(1008)
The values g(k) are transmitted to the receiver (RX) side after further lossless compression with the arithmetic coder described in subclause 5.3.3.2.11.8.
5.3.3.2.11.5 IGF tonal mask
To determine which spectral components should be transmitted with the core coder, a tonal mask is calculated. All significant spectral content is identified, whereas content that is well suited for parametric coding through IGF is quantized to zero.
5.3.3.2.11.5.1 IGF tonal mask calculation
If the TCX power spectrum is not available, all spectral content above the start of the IGF range is deleted (set to zero):
(1009)
where is the real valued TCX spectrum after applying TNS and is the current TCX window length.
In case the TCX power spectrum is available, calculate:
(1010)
where is the first spectral line in IGF range and is 1.0 for 9.6 and 13.2 kbit/s SWB and 2.0 for all other configurations.
Given , apply the following algorithm:
Initialize and :
for (i = ; i < ; i++) {
if () {
} else if() {
}
}
if , set
5.3.3.2.11.6 IGF spectral flatness calculation
Table 98: Number of tiles and tile width
|
Bitrate |
Mode |
Number of tiles (nT) |
Tile width |
|
9.6 kbps |
WB |
2 |
|
|
9.6 kbps |
SWB |
3 |
|
|
13.2 kbps |
SWB |
2 |
|
|
16.4 kbps |
SWB |
3 |
|
|
24.4 kbps |
SWB |
3 |
|
|
32.2 kbps |
SWB |
3 |
|
|
48.0 kbps |
SWB |
1 |
|
|
16.4 kbps |
FB |
3 |
|
|
24.4 kbps |
FB |
4 |
|
|
32.0 kbps |
FB |
4 |
|
|
48.0 kbps |
FB |
1 |
|
|
96.0 kbps |
FB |
1 |
|
|
128.0 kbps |
FB |
1 |
For the IGF spectral flatness calculation, two static arrays, both of size nT, are needed to hold filter states over frames. Additionally, a static flag is needed to save the value of the input flag from the previous frame.
5.3.3.2.11.6.1 Resetting filter states
The two filter-state vectors are static arrays of size nT in the IGF module, and both arrays are initialised with zeroes:
(1011)
This initialisation shall be done:
- with codec start-up
- with any bitrate switch
- with any codec type switch
- with a transition from CELP to TCX, i.e. when the CELP to TCX flag is true
- if the current frame has transient properties, i.e. when the transient flag is true
5.3.3.2.11.6.2 Resetting current whitening levels
The vector of current whitening levels shall be initialised with zero for all tiles
(1012)
in the following cases:
- with codec start-up
- with any bitrate switch
- with any codec type switch
- with a transition from CELP to TCX, i.e. when the CELP to TCX flag is true
5.3.3.2.11.6.3 Calculation of spectral flatness indices
The following steps 1) to 4) shall be executed consecutively:
1) Update previous level buffers and initialize current levels:
(1013)
2) In case or is true, apply
(1014)
else, if the power spectrum is available, calculate
(1015)
with
(1016)
where the first function is a spectral flatness measurement function, described in subclause 5.3.3.2.11.1.3, and the second is a crest-factor function, described in subclause 5.3.3.2.11.1.4.
Calculate:
(1017)
After calculation of this vector, the filter states are updated with:
(1018)
3) A mapping function is applied to the calculated values to obtain a whitening level index vector. The mapping function is described in subclause 5.3.3.2.11.1.5.
(1019)
4) For selected modes (see table 99), apply the following final mapping:
(1020)
(1020)
Table 99: modes for step 4) mapping
|
Bitrate |
mode |
mapping |
|
9.6 kbps |
WB |
apply |
|
9.6 kbps |
SWB |
apply |
|
13.2 kbps |
SWB |
NOP |
|
16.4 kbps |
SWB |
apply |
|
24.4 kbps |
SWB |
apply |
|
32.2 kbps |
SWB |
apply |
|
48.0 kbps |
SWB |
NOP |
|
16.4 kbps |
FB |
apply |
|
24.4 kbps |
FB |
apply |
|
32.0 kbps |
FB |
apply |
|
48.0 kbps |
FB |
NOP |
|
96.0 kbps |
FB |
NOP |
|
128.0 kbps |
FB |
NOP |
After executing step 4) the whitening level index vector is ready for transmission.
5.3.3.2.11.6.4 Coding of IGF whitening levels
IGF whitening levels, defined in the vector currWLevel, are transmitted using 1 or 2 bits per tile. The exact total number of bits required depends on the actual values contained in currWLevel and on the value of the independency flag isIndepFlag. The detailed processing is described in the pseudo code below:
isSame = 1;
nTiles = nT;
k = 0;
if (isIndepFlag) {
  isSame = 0;
} else {
  for (k = 0; k < nTiles; k++) {
    if (currWLevel[k] != prevWLevel[k]) {
      isSame = 0;
      break;
    }
  }
}
if (isSame) {
  write_bit(1);
} else {
  if (!isIndepFlag) {
    write_bit(0);
  }
  encode_whitening_level(currWLevel[0]);
  for (k = 1; k < nTiles; k++) {
    isSame = 1;
    if (currWLevel[k] != currWLevel[k - 1]) {
      isSame = 0;
      break;
    }
  }
  if (!isSame) {
    write_bit(1);
    for (k = 1; k < nTiles; k++) {
      encode_whitening_level(currWLevel[k]);
    }
  } else {
    write_bit(0);
  }
}
where the vector prevWLevel contains the whitening levels from the previous frame, and the function encode_whitening_level takes care of the actual mapping of a whitening level to a binary code. The function is implemented according to the pseudo code below:
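encode_whitening_level(whiteningLevel)
{
  if (whiteningLevel == IGF_WHITENING_MID) {
    write_bit(0);
  } else {
    write_bit(1);
    if (whiteningLevel == IGF_WHITENING_OFF) {
      write_bit(0);
    } else {
      write_bit(1);
    }
  }
}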
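The whitening level signalling above can be exercised with a small bit sink. This is a minimal sketch, not the reference implementation: the level names WL_OFF, WL_MID and WL_STRONG, the BitSink type, and the code words (mid as "0", off as "10", strong as "11") are assumptions made for illustration. The text only states that 1 or 2 bits are used per tile. The control flow of write_whitening_levels follows the pseudo code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical level names and values for this sketch only. */
enum { WL_OFF = 0, WL_MID = 1, WL_STRONG = 2 };

/* Minimal bit sink that keeps the written bits as a '0'/'1' string. */
typedef struct {
    char bits[64];
    int n;
} BitSink;

static void bitsink_init(BitSink *s) { memset(s, 0, sizeof *s); }

static void write_bit(BitSink *s, int b) {
    assert(s->n < (int)sizeof s->bits - 1);
    s->bits[s->n++] = b ? '1' : '0';
    s->bits[s->n] = '\0';
}

/* Assumed code words: mid -> "0", off -> "10", strong -> "11". */
static void encode_whitening_level(BitSink *s, int level) {
    if (level == WL_MID) {
        write_bit(s, 0);
    } else {
        write_bit(s, 1);
        write_bit(s, level == WL_OFF ? 0 : 1);
    }
}

/* Same control flow as the whitening level pseudo code. */
static void write_whitening_levels(BitSink *s, const int *curr, const int *prev,
                                   int nTiles, int isIndepFlag) {
    int isSame = 1;
    if (isIndepFlag) {
        isSame = 0;                       /* no reference to previous frame */
    } else {
        for (int k = 0; k < nTiles; k++) {
            if (curr[k] != prev[k]) { isSame = 0; break; }
        }
    }
    if (isSame) {
        write_bit(s, 1);                  /* same as previous frame */
        return;
    }
    if (!isIndepFlag) write_bit(s, 0);    /* differs from previous frame */
    encode_whitening_level(s, curr[0]);   /* first tile is always coded */
    for (int k = 1; k < nTiles; k++) {    /* are all tiles equal? */
        isSame = 1;
        if (curr[k] != curr[k - 1]) { isSame = 0; break; }
    }
    if (!isSame) {
        write_bit(s, 1);                  /* remaining tiles coded explicitly */
        for (int k = 1; k < nTiles; k++) encode_whitening_level(s, curr[k]);
    } else {
        write_bit(s, 0);                  /* remaining tiles equal tile 0 */
    }
}
```

For example, with three tiles, a dependent frame and levels {off, strong, mid} that differ from the previous frame, the sketch writes "0", "10", "1", "11", "0", which is 7 bits in total. An unchanged dependent frame costs a single bit.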
5.3.3.2.11.7 IGF temporal flatness indicator
The temporal envelope of the signal reconstructed by IGF is flattened on the receiver (RX) side according to the transmitted information on the temporal envelope flatness, the IGF temporal flatness indicator.
The temporal flatness is measured as the linear prediction gain in the frequency domain. First, linear prediction is performed on the real part of the current TCX spectrum, and then the prediction gain is calculated:
(1021)
where $k_i$ is the i-th PARCOR coefficient obtained by the linear prediction.
From the prediction gain and the prediction gain described in subclause 5.3.3.2.2.3, the IGF temporal flatness indicator flag is defined as
(1022)
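The text expresses the prediction gain through PARCOR coefficients but does not say how they are obtained. The following is a minimal sketch under two assumptions. First, the PARCOR coefficients come from a Levinson-Durbin recursion on the autocorrelation of the real-valued TCX spectrum. Second, (1021) has the usual form of 1 over the product of (1 - k_i^2). The LP order limit MAX_ORDER and the function name prediction_gain are hypothetical.

```c
#include <assert.h>

#define MAX_ORDER 16  /* hypothetical maximum LP order for this sketch */

/* Levinson-Durbin recursion on the autocorrelation r[0..order].
   Stores the PARCOR (reflection) coefficients in k[1..order] (k[0] unused)
   and returns the prediction gain, accumulated as 1 / prod(1 - k_i^2),
   which equals r[0] divided by the final prediction error energy. */
static double prediction_gain(const double *r, int order, double *k) {
    double a[MAX_ORDER + 1] = {0};   /* predictor: x[n] ~ sum a[j] x[n-j] */
    double tmp[MAX_ORDER + 1];
    double err = r[0];
    double gain = 1.0;
    assert(order >= 1 && order <= MAX_ORDER && r[0] > 0.0);
    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++) acc -= a[j] * r[i - j];
        double ki = acc / err;
        assert(ki > -1.0 && ki < 1.0);   /* positive definite input */
        k[i] = ki;
        for (int j = 1; j < i; j++) tmp[j] = a[j] - ki * a[i - j];
        for (int j = 1; j < i; j++) a[j] = tmp[j];
        a[i] = ki;
        err *= 1.0 - ki * ki;
        gain /= 1.0 - ki * ki;
    }
    return gain;
}
```

For the autocorrelation {1, 0.5, 0.25} of a first-order process, the recursion yields k_1 = 0.5 and k_2 = 0, so the gain is 1 / 0.75. A spectrum that is well predictable across frequency (a large gain) corresponds to a temporally non-flat envelope.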
5.3.3.2.11.8 IGF noiseless coding
The IGF scale factor vector is noiseless encoded with an arithmetic coder in order to write an efficient representation of the vector to the bit stream.
The module uses the common raw arithmetic encoder functions from the infrastructure, which are provided by the core encoder. The functions used are ari_encode_14bits_sign(bit), which encodes the value bit; ari_encode_14bits_ext(value, cumulativeFrequencyTable), which encodes value from an alphabet of 27 symbols (SYMBOLS_IN_TABLE) using the cumulative frequency table cumulativeFrequencyTable; and two further functions, which initialize and finalize the arithmetic encoder, respectively.
5.3.3.2.11.8.1 IGF independency flag
The internal state of the arithmetic encoder is reset if the independency flag is true. This flag may be set to false only in modes where TCX10 windows (see table 97) are used, and only for the second frame of two consecutive TCX 10 frames.
5.3.3.2.11.8.2 IGF all-Zero flag
The IGF all-Zero flag signals that all of the IGF scale factors are zero:
(1023)
The flag is written to the bit stream first. If the flag is set, the encoder state is reset and no further data is written to the bit stream; otherwise, the arithmetic coded scale factor vector follows in the bit stream.
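The all-zero decision and its side effect on the encoder state can be sketched as follows. This is a hypothetical helper, not the reference code. It assumes that the flag value 1 means "all scale factors are zero" and that resetting the encoder state amounts to setting t = 0, as stated for the reset function. The SfeEncState type and MAX_NB bound are illustrative.

```c
#include <assert.h>

#define MAX_NB 16  /* hypothetical upper bound on the number of bands */

/* Hypothetical container for the encoder state (t and prev). */
typedef struct {
    int t;             /* 0: no usable previous frame, 1: prev[] is valid */
    int prev[MAX_NB];  /* scale factors preserved from the previous frame */
} SfeEncState;

/* Returns the all-zero flag for g[0..nB-1]. When the flag is set, the
   encoder state is reset (t = 0) and nothing else is written for the
   scale factors; otherwise the arithmetic coded vector follows. */
static int igf_all_zero_flag(SfeEncState *st, const int *g, int nB) {
    assert(nB >= 0 && nB <= MAX_NB);
    for (int k = 0; k < nB; k++) {
        if (g[k] != 0) {
            return 0;
        }
    }
    st->t = 0;
    return 1;
}
```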
5.3.3.2.11.8.3 IGF arithmetic encoding helper functions
5.3.3.2.11.8.3.1 The reset function
The arithmetic encoder state consists of t and the vector prev, which represents the value of the vector g preserved from the previous frame. When encoding the vector g, the value 0 for t means that there is no previous frame available; therefore, prev is undefined and not used. The value 1 for t means that there is a previous frame available; therefore, prev has valid data and is used. This is the case only in modes where TCX10 windows (see table 97) are used, and only for the second frame of two consecutive TCX 10 frames. To reset the arithmetic encoder state, it is enough to set t = 0.
If a frame has the independency flag set, the encoder state is reset before encoding the scale factor vector g. Note that the combination of a cleared independency flag and t = 0 is valid. It may occur for the second frame of two consecutive TCX 10 frames when the first frame had the all-zero flag set. In this particular case, the frame uses no context information from the previous frame (the prev vector), because t = 0, and it is effectively encoded as an independent frame.
5.3.3.2.11.8.3.2 The arith_encode_bits function
The function arith_encode_bits encodes an unsigned integer x of length nBits bits by writing one bit at a time.
arith_encode_bits(x, nBits)
{
  for (i = nBits - 1; i >= 0; --i) {
    bit = (x >> i) & 1;
    ari_encode_14bits_sign(bit);
  }
}
5.3.3.2.11.8.3.3 The save and restore encoder state functions
Saving the encoder state is achieved using the function , which copies and vector into and vector, respectively. Restoring the encoder state is done using the complementary function , which copies back and vector into and vector, respectively.
5.3.3.2.11.8.4 IGF arithmetic encoding
Note that the arithmetic encoder should be capable of counting bits only, i.e. performing arithmetic encoding without writing bits to the bit stream. If the arithmetic encoder is called with a counting request, by setting the corresponding parameter to true, the caller shall save the internal state of the arithmetic encoder before the call to the top level function and restore it after the call. In this particular case, the bits internally generated by the arithmetic encoder are not written to the bit stream.
The function arith_encode_residual encodes the integer valued prediction residual x, using the cumulative frequency table cumulativeFrequencyTable and the table offset tableOffset. The table offset is used to adjust the value x before encoding. This minimizes the total probability that a very small or a very large value will be encoded using escape coding, which is slightly less efficient. The values between MIN_ENC_SEPARATE and MAX_ENC_SEPARATE, inclusive, are encoded directly using the cumulative frequency table cumulativeFrequencyTable and an alphabet size of SYMBOLS_IN_TABLE = 27.
For the above alphabet of SYMBOLS_IN_TABLE symbols, the values 0 and SYMBOLS_IN_TABLE - 1 are reserved as escape codes to indicate that a value is too small or too large to fit in the default interval. In these cases, the value extra indicates the position of the value in one of the tails of the distribution. The value extra is encoded in one of three ways:
- using 4 bits if it is in the range {0, ..., 14};
- using 4 bits with value 15 followed by 6 additional bits if it is in the range {15, ..., 77};
- using 4 bits with value 15, followed by 6 bits with value 63, followed by 7 additional bits if it is greater than or equal to 78.
The last of the three cases is mainly useful to avoid the rare situation where a purposely constructed artificial signal produces an unexpectedly large residual value in the encoder.
arith_encode_residual(x, cumulativeFrequencyTable, tableOffset)
{
  x += tableOffset;
  if ((x >= MIN_ENC_SEPARATE) && (x <= MAX_ENC_SEPARATE)) {
    ari_encode_14bits_ext((x - MIN_ENC_SEPARATE) + 1, cumulativeFrequencyTable);
    return;
  } else if (x < MIN_ENC_SEPARATE) {
    extra = (MIN_ENC_SEPARATE - 1) - x;
    ari_encode_14bits_ext(0, cumulativeFrequencyTable);
  } else { /* x > MAX_ENC_SEPARATE */
    extra = x - (MAX_ENC_SEPARATE + 1);
    ari_encode_14bits_ext(SYMBOLS_IN_TABLE - 1, cumulativeFrequencyTable);
  }
  if (extra < 15) {
    arith_encode_bits(extra, 4);
  } else { /* extra >= 15 */
    arith_encode_bits(15, 4);
    extra -= 15;
    if (extra < 63) {
      arith_encode_bits(extra, 6);
    } else { /* extra >= 63 */
      arith_encode_bits(63, 6);
      extra -= 63;
      arith_encode_bits(extra, 7);
    }
  }
}
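The escape scheme can be traced without an arithmetic coder. This is a minimal sketch, assuming MIN_ENC_SEPARATE = -12 and MAX_ENC_SEPARATE = 12. The text fixes the alphabet at 27 symbols, which leaves 25 directly coded values plus the two escape symbols; the symmetric split is an assumption. The function does not call ari_encode_14bits_ext and arith_encode_bits. Instead, the hypothetical function code_residual returns the table symbol and packs the raw escape bits, most significant bit first.

```c
#include <assert.h>

/* Assumed constants: 25 directly coded values plus 2 escape symbols. */
#define MIN_ENC_SEPARATE (-12)
#define MAX_ENC_SEPARATE 12
#define SYMBOLS_IN_TABLE 27

/* What would be sent: one table symbol and the raw escape bits. */
typedef struct {
    int symbol;    /* symbol for the table coder, 0..SYMBOLS_IN_TABLE-1 */
    unsigned raw;  /* raw escape bits, packed MSB first */
    int nRaw;      /* number of raw escape bits (0, 4, 10 or 17) */
} ResidualCode;

/* Appends the n least significant bits of v to the packed raw bits. */
static void put_bits(ResidualCode *c, unsigned v, int n) {
    c->raw = (c->raw << n) | (v & ((1u << n) - 1u));
    c->nRaw += n;
}

static ResidualCode code_residual(int x, int tableOffset) {
    ResidualCode c = {0, 0u, 0};
    int extra;
    x += tableOffset;
    if (x >= MIN_ENC_SEPARATE && x <= MAX_ENC_SEPARATE) {
        c.symbol = (x - MIN_ENC_SEPARATE) + 1;   /* direct symbols 1..25 */
        return c;
    } else if (x < MIN_ENC_SEPARATE) {
        extra = (MIN_ENC_SEPARATE - 1) - x;      /* lower tail */
        c.symbol = 0;
    } else {
        extra = x - (MAX_ENC_SEPARATE + 1);      /* upper tail */
        c.symbol = SYMBOLS_IN_TABLE - 1;
    }
    if (extra < 15) {
        put_bits(&c, (unsigned)extra, 4);
    } else {
        put_bits(&c, 15u, 4);
        extra -= 15;
        if (extra < 63) {
            put_bits(&c, (unsigned)extra, 6);
        } else {
            put_bits(&c, 63u, 6);
            extra -= 63;
            assert(extra < 128);                 /* must fit in 7 bits */
            put_bits(&c, (unsigned)extra, 7);
        }
    }
    return c;
}
```

For example, a residual of 0 maps to the middle symbol 13 with no raw bits. A residual of 33 uses the upper escape symbol 26, followed by the 4-bit value 15 and the 6-bit value 5.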
The function encode_sfe_vector encodes the scale factor vector g, which consists of nB integer values. The value t and the vector prev, which constitute the encoder state, are used as additional parameters for the function. Note that the top level function must call the common arithmetic encoder initialization function before calling encode_sfe_vector, and must also call the arithmetic encoder finalization function afterwards.
The function quant_ctx quantizes a context value ctx by limiting it to the range {-3, ..., 3}. It is defined as:
quant_ctx(ctx)
{
  if (abs(ctx) <= 3) {
    return ctx;
  } else if (ctx > 3) {
    return 3;
  } else { /* ctx < -3 */
    return -3;
  }
}
The symbolic names used in the pseudo code comments to compute the context values are defined in table 100:
Table 100: Definition of symbolic names
|
the previous frame (when available) |
the current frame |
|
a = prev[f] |
g[f] (the value to be coded) |
|
c = prev[f - 1] |
b = g[f - 1] (when available) |
|
(none) |
e = g[f - 2] (when available) |
encode_sfe_vector(t, prev, g, nB)
{
  for (f = 0; f < nB; f++) {
    if (t == 0) {
      if (f == 0) {
        ari_encode_14bits_ext(g[f] >> 2, cf_se00);
        arith_encode_bits(g[f] & 3, 2); /* LSBs as 2 bit raw */
      } else if (f == 1) {
        pred = g[f - 1]; /* pred = b */
        arith_encode_residual(g[f] - pred, cf_se01, cf_off_se01);
      } else { /* f >= 2 */
        pred = g[f - 1]; /* pred = b */
        ctx = quant_ctx(g[f - 1] - g[f - 2]); /* Q(b - e) */
        arith_encode_residual(g[f] - pred, cf_se02[CTX_OFFSET + ctx],
                              cf_off_se02[CTX_OFFSET + ctx]);
      }
    } else { /* t == 1 */
      if (f == 0) {
        pred = prev[f]; /* pred = a */
        arith_encode_residual(g[f] - pred, cf_se10, cf_off_se10);
      } else { /* (t == 1) && (f >= 1) */
        pred = prev[f] + g[f - 1] - prev[f - 1]; /* pred = a + b - c */
        ctx_f = quant_ctx(prev[f] - prev[f - 1]); /* Q(a - c) */
        ctx_t = quant_ctx(g[f - 1] - prev[f - 1]); /* Q(b - c) */
        arith_encode_residual(g[f] - pred,
                              cf_se11[CTX_OFFSET + ctx_t][CTX_OFFSET + ctx_f],
                              cf_off_se11[CTX_OFFSET + ctx_t][CTX_OFFSET + ctx_f]);
      }
    }
  }
}
There are five cases in the above function, depending on the value of t and on the position f of a value in the vector g:
- when t = 0 and f = 0, the first scale factor of an independent frame is coded by splitting it into two parts. The most significant bits are coded using the cumulative frequency table cf_se00, and the two least significant bits are coded directly.
- when t = 0 and f = 1, the second scale factor of an independent frame is coded (as a prediction residual) using the cumulative frequency table cf_se01.
- when t = 0 and f ≥ 2, the third and following scale factors of an independent frame are coded (as prediction residuals) using the cumulative frequency table cf_se02[CTX_OFFSET + ctx], determined by the quantized context value ctx.
- when t = 1 and f = 0, the first scale factor of a dependent frame is coded (as a prediction residual) using the cumulative frequency table cf_se10.
- when t = 1 and f ≥ 1, the second and following scale factors of a dependent frame are coded (as prediction residuals) using the cumulative frequency table cf_se11[CTX_OFFSET + ctx_t][CTX_OFFSET + ctx_f], determined by the quantized context values ctx_t and ctx_f.
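The case selection of encode_sfe_vector can be checked in isolation. The hypothetical helper sfe_step below is not part of the text. It reproduces only the choice of case, the prediction and the quantized contexts for one position f, and leaves out the table lookups and the arithmetic coder. It assumes g[0] is non-negative, so that the right shift in the first case is well defined.

```c
#include <assert.h>

/* The five coding cases of encode_sfe_vector, named after their tables. */
enum SfeCase { CASE_SE00, CASE_SE01, CASE_SE02, CASE_SE10, CASE_SE11 };

typedef struct {
    enum SfeCase which;
    int value;  /* g[f] >> 2 for CASE_SE00, otherwise residual g[f] - pred */
    int ctx;    /* Q(b - e), CASE_SE02 only */
    int ctx_t;  /* Q(b - c), CASE_SE11 only */
    int ctx_f;  /* Q(a - c), CASE_SE11 only */
} SfeStep;

/* Clamps a context value to the range -3..3. */
static int quant_ctx(int ctx) {
    if (ctx > 3) return 3;
    if (ctx < -3) return -3;
    return ctx;
}

/* Case, coded value and contexts for g[f], given the encoder state t and
   the previous-frame vector prev (only read when t == 1). */
static SfeStep sfe_step(int t, const int *prev, const int *g, int f) {
    SfeStep s = {CASE_SE00, 0, 0, 0, 0};
    assert((t == 0 || t == 1) && f >= 0);
    if (t == 0) {
        if (f == 0) {
            assert(g[0] >= 0);
            s.which = CASE_SE00;
            s.value = g[0] >> 2;                 /* the 2 LSBs go out raw */
        } else if (f == 1) {
            s.which = CASE_SE01;
            s.value = g[1] - g[0];               /* pred = b */
        } else {
            s.which = CASE_SE02;
            s.value = g[f] - g[f - 1];           /* pred = b */
            s.ctx = quant_ctx(g[f - 1] - g[f - 2]);
        }
    } else if (f == 0) {
        s.which = CASE_SE10;
        s.value = g[0] - prev[0];                /* pred = a */
    } else {
        int pred = prev[f] + g[f - 1] - prev[f - 1];  /* a + b - c */
        s.which = CASE_SE11;
        s.value = g[f] - pred;
        s.ctx_t = quant_ctx(g[f - 1] - prev[f - 1]);
        s.ctx_f = quant_ctx(prev[f] - prev[f - 1]);
    }
    return s;
}
```

In a dependent frame, the predictor a + b - c extrapolates the previous frame's spectral slope. Slowly varying envelopes therefore produce residuals close to zero, which the context-dependent tables can code cheaply.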
Please note that the predefined cumulative frequency tables , , and the table offsets , depend on the current operating point and implicitly on the bitrate, and are selected from the set of available options during initialization of the encoder for each given operating point. The cumulative frequency table is common for all operating points, and cumulative frequency tables and , and the corresponding table offsets and are also common, but they are used only for operating points corresponding to bitrates larger or equal than 48 kbps, in case of dependent TCX 10 frames (when ).
5.3.3.2.11.9 IGF bit stream writer
The arithmetic coded IGF scale factors, the IGF whitening levels and the IGF temporal flatness indicator are consecutively transmitted to the decoder side via bit stream. The coding of the IGF scale factors is described in subclause 5.3.3.2.11.8.4. The IGF whitening levels are encoded as presented in subclause 5.3.3.2.11.6.4. Finally the IGF temporal flatness indicator flag, represented as one bit, is written to the bit stream.
For a TCX20 frame for which no counting request is signalled to the bit stream writer, the output of the bit stream writer is fed directly to the bit stream. For a TCX10 frame, two sub-frames are coded dependently within one 20 ms frame. The output of the bit stream writer for each sub-frame is written to a temporary buffer, which thus contains the output for the individual sub-frames. The content of this temporary buffer is finally written to the bit stream.
5.3.3.2.12 Memory updates
5.3.3.2.12.1 Internal decoder
Subsequent to the MDCT based TCX encoding, a simplified decoding at is performed to enable and update the filter memories for CELP coding. Based on the quantized MDCT spectral coefficients, the following decoding steps are performed:
- Adaptive low frequency deemphasis (subclause 6.2.2.3.2)
- Global gain decoding (subclause 6.2.2.3.3)
- Residual dequantizer (subclause 6.2.2.3.4)
- Formant enhancement (subclause 6.2.2.3.5)
- Noise filling (subclause 6.2.2.3.6)
- Application of global gain and LPC shaping in MDCT domain (subclause 6.2.2.3.7)
- Inverse window grouping (subclause 6.2.2.3.9)
- Temporal noise shaping (subclause 6.2.2.3.10)
The resulting MDCT spectrum is transformed to a time-domain signal at using the corresponding frequency-to-time transformation, as described in subclause 6.2.4, resulting in the synthesized TCX decoder output .
5.3.3.2.12.2 Update of filter memories
The states of the filter are updated by applying the perceptually weighted LPC coefficients to the synthesized TCX decoder output and subtracting the result from the perceptually weighted speech signal. The LPC coefficients correspond to the 4th subframe for the 12.8 kHz internal sampling rate and to the 5th subframe for the 16 kHz internal sampling rate.
(1024)
The pre-emphasis filter is applied to the synthesized TCX decoder output to update two sets of states: the synthesis buffers used in the target signal computation to calculate the residual signal (subclause 5.2.3.1.2), and the LPC synthesis filter states used in the CELP encoder. For updating the filter states of the LP residual signal computation, the pre-emphasized synthesis buffer is filtered by the quantized LPC coefficients (see subclause 5.2.3.1.1).
5.3.3.2.13 Global Gain Adjuster
For bitrates less than 13.2 kbps the optimum global gain is partially recomputed after noise filling and formant enhancement have been applied in the internal decoder:
(1025)
The new global gain is quantized again to an index for transmission in the bit stream, replacing the previously computed index.