5.6.2 Encoding for LP-CNG
3GPP TS 26.445, Release 15: Codec for Enhanced Voice Services (EVS); Detailed algorithmic description
This section describes the operation of LP-CNG. Like the default operation, LP-CNG operates on a split-band basis. In WB/NB operation, only the LP parameters of the low-band signal are analyzed and encoded. In SWB/FB operation, in addition to the LP analysis of the low-band signal, the high-band signal is analyzed and encoded separately as a kind of bandwidth extension. The low-band analysis yields three parameters: the low-band excitation energy, the low-band signal spectrum and the low-band excitation envelope. The high-band analysis yields a single parameter, the high-band energy. The one-bit CNG type flag (see subclause 7.2) is set to “0” for LP-CNG and transmitted in each SID frame.
5.6.2.1 LP-CNG CN parameters estimation
The CN parameters to be encoded into a LP-CNG SID frame are calculated over a certain period, which is called the CN averaging period. These parameters give information about the level and the spectrum of the background noise. The CN averaging period, NCN, is equal to the number of consecutive frames comprising the current SID frame and its preceding NO_DATA frames, upper-limited to 8 consecutive frames. Its value therefore varies with the current SID transmission rate. In particular, the first SID frame immediately after an active signal burst always uses the value NCN = 1.
The LP-CNG can generate two different types of SID frame: the WB SID frame, containing low-band (WB) CN parameters only, and the SWB SID frame, containing both low-band and high-band (SHB) CN parameters. One bit is encoded into the LP-CNG SID frame to indicate the bandwidth type of the SID frame, where “0” indicates WB SID and “1” indicates SWB SID. Only WB SID frames are transmitted when operating in NB/WB mode. In SWB/FB operation, SWB SID frames are not always transmitted; WB SID frames can also be transmitted between two adjacent SWB SID updates. This means that the high-band CN parameters are not updated at the decoder at the same rate as the low-band CN parameters. Details of SWB/FB operation are described in subclause 5.6.2.1.8. The bit allocation for the CN parameters in the respective WB and SWB SID frames is described in subclause 7.2.
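As an illustration only, the length of the CN averaging period described above could be derived as in the following sketch. The function and argument names are hypothetical and do not come from the specification; only the cap of 8 frames and the value 1 for the first SID frame after an active burst are taken from the text.

```python
MAX_CN_AVERAGING_FRAMES = 8  # upper limit stated in the text


def cn_averaging_period(no_data_frames_before_sid: int,
                        first_sid_after_active: bool) -> int:
    """Return the CN averaging period for the current SID frame.

    no_data_frames_before_sid: number of consecutive NO_DATA frames that
        immediately precede the current SID frame (hypothetical input).
    first_sid_after_active: True for the first SID frame after an active burst.
    """
    if first_sid_after_active:
        return 1
    # Current SID frame plus its preceding NO_DATA frames, capped at 8.
    return min(1 + no_data_frames_before_sid, MAX_CN_AVERAGING_FRAMES)
```

For example, a SID frame preceded by two NO_DATA frames gives a period of 3, and any SID frame preceded by seven or more NO_DATA frames gives the maximum of 8.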
5.6.2.1.1 LP-CNG Hangover analysis period determination
To enable high quality comfort noise synthesis on the receiving side, the encoder sends a three-bit counter value to the decoder. The transmitted value is derived from the hangover counter as:
()
where the hangover counter is the number of consecutive frames without any primary SAD active decision for which the DTX SAD flag, as described in subclause 5.1.12.8, is set to 1. These hangover frames without primary SAD active decisions are deemed relevant for comfort noise analysis. In DTX operation, the counter is incremented in actively encoded speech segments whenever the primary SAD flag is set to 0 and the DTX SAD flag is set to 1, and it is reset to zero whenever this is not the case.
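The counter logic above can be sketched as follows. The update rule follows the text; the mapping to the transmitted three-bit value is not recoverable from this copy, so the saturation at 7 used here is an assumption, and all names are hypothetical.

```python
MAX_THREE_BIT_VALUE = 7  # largest value a three-bit field can carry


def update_hangover_counter(counter: int, primary_sad_flag: int,
                            dtx_sad_flag: int) -> int:
    """Update the hangover counter for one actively encoded frame."""
    if primary_sad_flag == 0 and dtx_sad_flag == 1:
        return counter + 1  # hangover frame: relevant for CN analysis
    return 0  # any other flag combination resets the counter


def transmitted_counter_value(counter: int) -> int:
    """Assumed mapping: saturate the counter so that it fits in three bits."""
    return min(counter, MAX_THREE_BIT_VALUE)
```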
5.6.2.1.2 LP-CNG filter parameters evaluation for low-band signal
In DTX/LP-CNG operation, the LP filter coefficients are quantized in the LSF domain, as in the default option. However, in DTX/CNG operation the LP filter coefficients are not interpolated within the frame. From the LP analysis described in subclause 5.1.9, only the end-frame LSP vector is used for quantization purposes in SID frames.
The end-frame LSP vector is not quantized directly. Instead, an averaged end-frame LSP vector is calculated over the CN averaging period, converted to an LSF vector and quantized. Not all end-frame LSP vectors in the CN averaging period are used for the average: two outlier LSP vectors are removed. The two outliers are the two LSP vectors in the CN averaging period that represent the lowest spectral entropy. This mitigates possible corruption of the averaged LSP vector by interfering background frames, on the assumption that such frames usually have a more structured spectrum (and hence lower spectral entropy) than normal background noise frames. A parameter reflecting this spectral entropy is therefore calculated for each end-frame LSP vector in the CN averaging period. Each end-frame LSP vector is first converted to its LSF representation, and the parameter is calculated as
()
where M equals 16, the order of the LP filter. The reference partition bandwidth is the bandwidth of one partition when the signal bandwidth is divided by M equally spaced LSF coefficients. The signal bandwidth equals 6400 for the 12.8 kHz sampling rate core and 8000 for the 16 kHz sampling rate core. The partition length is the length of each partition into which the LSF coefficients of a given LSP vector divide the signal bandwidth, that is
()
where the partition boundaries are the LSF coefficients of that LSP vector, and the upper band edge is either 6400 or 8000 depending on the sampling rate of the core. A more structured spectrum results in a higher parameter value, representing lower spectral entropy. The two LSP vectors with the two largest parameter values over the CN averaging period are therefore identified as the two outliers. The averaged LSP vector is then calculated as
()
where the average runs over the CN averaging period, excluding the indices of the two LSP outliers. The outlier-removed LSP vector average is considered the best representation of the short-term spectral envelope of the background noise. It is converted to the LSF representation, quantized (see subclause 5.6.2.1.3) and transmitted in the SID frame.
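The outlier removal described above can be illustrated with the sketch below. The exact entropy-related parameter is not recoverable from this copy; the sketch assumes it is the mean squared deviation of the partition lengths from the uniform partition width, which is larger for more structured spectra. Skipping the removal when there are two or fewer vectors is also an assumption, and all names are hypothetical.

```python
def partition_lengths(lsf, bandwidth):
    """Lengths of the M+1 partitions that M LSF values cut out of [0, bandwidth]."""
    edges = [0.0] + list(lsf) + [bandwidth]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def structure_measure(lsf, bandwidth):
    """Assumed measure: mean squared deviation from the uniform partition width."""
    m = len(lsf)
    uniform = bandwidth / (m + 1)
    return sum((d - uniform) ** 2 for d in partition_lengths(lsf, bandwidth)) / (m + 1)


def average_without_outliers(lsp_vectors, lsf_vectors, bandwidth):
    """Average the LSP vectors, dropping the two with the largest measure."""
    scores = [structure_measure(lsf, bandwidth) for lsf in lsf_vectors]
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    # Assumption: with two or fewer vectors nothing is removed.
    outliers = set(order[:2]) if len(scores) > 2 else set()
    kept = [v for i, v in enumerate(lsp_vectors) if i not in outliers]
    dim = len(kept[0])
    average = [sum(v[k] for v in kept) / len(kept) for k in range(dim)]
    return average, outliers
```

In a real encoder the LSF vectors would be obtained from the LSP vectors by the usual conversion; here they are passed in separately to keep the sketch self-contained.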
5.6.2.1.3 LP-CNG CNG-LSF quantization for low-band signal
The quantization of the LSF vector follows the procedures used for the LSF vector within the ACELP block, described in subclause 5.2.2.1. The quantization is done with a two-stage quantizer. The first stage consists of a non-predictive, non-structured, optimized VQ codebook. The second stage consists of a multiple scale lattice vector quantizer whose structure and search procedure are detailed in subclause 5.2.2.1.4. The number of lattice structures in the second stage equals the number of codevectors in the first stage, so that if a particular codevector is selected in the first stage, its corresponding lattice structure is used in the second stage. A multiple scale lattice structure corresponds to a set of 6 numbers specifying the number of leader classes in each of the 6 lattice truncations and 6 numbers specifying the scale values for the same truncations. Three lattice truncations define the codebook for the first 8-dimensional LSF subvector and three lattice truncations define the codebook for the second 8-dimensional LSF subvector. In addition, a 16-dimensional vector of normalization values is defined for each multiple scale lattice structure.
The first-stage codebook uses 4 bits, and each second-stage lattice structure is defined for 25 bits, so a total of 29 bits is used to quantize the LSF vector. With this structure, the table ROM storing the CNG LSF codebook data for these 29 bits occupies 1.408 kBytes.
The search in the first-stage codebook takes into account the value of the last component of the 16-dimensional LSF vector. Based on this value, only part of the first-stage codebook is searched. If the last LSF vector component is larger than 6350, the search covers only the first 6 codevectors of the first stage, and the LSF vector corresponds to an internal sampling frequency of 16 kHz; otherwise, the search covers the last 10 codevectors of the first stage, which correspond to the internal sampling rate of 12.8 kHz. At the second stage, prior to quantization with the lattice structure, some components of the LSF vector are permuted depending on the selected first-stage codevector, as specified in the following table:
Table : LSF vector component permutation
| First stage codevector index | Permutations |
| 0 | (6,11), (7,15) |
| 1 | (6,15) |
| 2 | (5,8), (7,15) |
| 3 | (7,10) |
| 7 | (0,9), (7,10) |
| 9 | (7,15) |
| 12 | (6,10), (7,11) |
| 13 | (6,10), (7,12) |
| 14 | (6,10), (7,12) |
| 15 | (6,10), (7,12) |
A permutation defined as (6,11) signifies that the 6th component, numbered starting from 0, is exchanged with the 11th one. The permutations are performed between the two groups or subvectors, i.e. the first index in the permutation is from the first half of the codevector and the second one from the second half. The permutations are performed only when the codevector selected at the first stage has an index listed in the table above. The resulting vector is multiplied component-wise by the inverse of the corresponding normalization vector and quantized with the corresponding multiple scale lattice structure. After quantization, the components of the obtained second-stage multiple scale lattice codevector are permuted back and added to the first-stage codevector in order to obtain the quantized LSF vector.
One bit indicating the core sampling rate is transmitted in each SID frame. This bit signals to the decoder the sampling domain of the quantized LSF vector. The bit is set to “0” for the 12.8 kHz sampling rate and “1” for the 16 kHz sampling rate.
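The first-stage search range and the component permutation can be sketched as follows. The permutation pairs and the 6350 threshold are taken from the text above; the function names are hypothetical, and the lattice quantization itself is not shown.

```python
# Permutation pairs per first-stage codevector index (from the table above).
PERMUTATIONS = {
    0: [(6, 11), (7, 15)],
    1: [(6, 15)],
    2: [(5, 8), (7, 15)],
    3: [(7, 10)],
    7: [(0, 9), (7, 10)],
    9: [(7, 15)],
    12: [(6, 10), (7, 11)],
    13: [(6, 10), (7, 12)],
    14: [(6, 10), (7, 12)],
    15: [(6, 10), (7, 12)],
}


def first_stage_search_range(last_lsf_component: float) -> range:
    """Codevector indices searched in the first stage."""
    if last_lsf_component > 6350:
        return range(0, 6)   # first 6 codevectors: 16 kHz core
    return range(6, 16)      # last 10 codevectors: 12.8 kHz core


def permute(vector, first_stage_index):
    """Swap the listed component pairs; indices without an entry are unchanged."""
    out = list(vector)
    for a, b in PERMUTATIONS.get(first_stage_index, []):
        out[a], out[b] = out[b], out[a]
    return out
```

Because the pairs listed for each index do not share components, applying `permute` a second time with the same index restores the original order, which is how the second-stage codevector can be permuted back.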
5.6.2.1.4 LP-CNG synthesis filter computation for local CNG synthesis
A smoothed LSP vector is used in every inactive frame to obtain the LP synthesis filter. The quantized LSF vector is converted back to the LSP domain. In each inactive frame except the first SID frame after an active burst, the smoothed LSP vector is updated with the last quantized LSP vector by means of an AR low-pass filter. That is
()
where the filter combines the smoothed LSP vector of the previous frame with the last quantized LSP vector to give the smoothed LSP vector of the current frame, and the smoothing factor equals 0.9. An additional constraint applies to the inactive frames after the first SID frame of an inactive segment and before the second SID frame: the update of the smoothed LSP vector described above is suspended if the last SID excitation energy is an outlier and the last active burst contained sufficient hangover frames. The two conditions involve the quantized excitation energy of the last SID frame, as calculated in equation (1339), the number of entries used for the calculation described in subclause 6.7.2.1.2, and the smoothed quantized excitation energy, further described in subclause 5.6.2.1.6.
For the first SID frame after an active burst, the smoothed LSP vector is updated depending on whether the frame is an outlier in either energy or spectrum and whether there are past CN parameters to analyze in the CNG analysis buffer, as described in subclause 6.7.2.1.2. If the step update flag is set to 1 or there are no past CN parameters to analyze in the CNG analysis buffer, the smoothed LSP vector is initialized to the quantized LSP vector of the current SID frame. Otherwise, if step update is not allowed and there are past CN parameters to analyze in the CNG analysis buffer, the overall and the maximum individual spectral distortion between the quantized LSP vector of the current SID frame and the average LSP vector calculated over the hangover frames in subclause 6.7.2.1.2 are calculated.
()
()
where the first quantity is the overall spectral distance and the second is the maximum spectral distance, both measured between the average LSP vector calculated over the hangover frames and the quantized LSP vector of the current SID frame. If the two vectors deviate from each other, that is, if either of the two threshold conditions on these distances is met, the quantized LSP vector of the current SID frame is considered an outlier and the smoothed LSP vector is initialized to the average LSP vector calculated over the hangover frames. Otherwise, the smoothed LSP vector is initialized by
()
The smoothed LSP vector is initially set to the quantized end-frame LSP vector from the previous frame when the first SID frame is processed at the encoder. The step update flag is set in each inactive frame by measuring the energy step between the instant energy and the long-term energy. For the first inactive frame after an active signal period, the flag is additionally set if there are past CN parameters and the most recent energy value among them is more than four times larger than the smoothed quantized excitation energy. Finally, the smoothed LSP vector is converted to LP coefficients to obtain a smoothed LP synthesis filter, which is used in the local CNG synthesis.
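A minimal sketch of the smoothing and of the distance measures used in this subclause is given below. The first-order AR form with a factor of 0.9 applied to the previous smoothed vector is the usual reading of the description. The use of absolute differences for the overall and maximum spectral distances is an assumption, since the equations are not recoverable from this copy. All names are hypothetical.

```python
SMOOTHING_FACTOR = 0.9  # value stated in the text


def smooth_lsp(prev_smoothed, quantized, alpha=SMOOTHING_FACTOR):
    """First-order AR low-pass update of the smoothed LSP vector."""
    return [alpha * p + (1.0 - alpha) * q
            for p, q in zip(prev_smoothed, quantized)]


def spectral_distances(average_lsp, quantized_lsp):
    """Assumed form: sum and maximum of the absolute component differences."""
    diffs = [abs(a - q) for a, q in zip(average_lsp, quantized_lsp)]
    return sum(diffs), max(diffs)
```

With this form, each update moves the smoothed vector one tenth of the way towards the latest quantized vector, so a single unusual SID frame has limited influence on the synthesis filter.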
5.6.2.1.5 LP-CNG energy calculation and quantization
The excitation energy in the current frame is computed for each inactive frame according to the following equation:
()
where the LP residual signal is calculated by filtering the pre-emphasized inactive input signal through the filter Â(z), and the length of the residual signal is 256 or 320 depending on the sampling rate of the core. A weighted average energy is then computed over the whole CN averaging period by
()
where the weights are defined as [0.2, 0.16, 0.128, 0.1024, 0.08192, 0.065536, 0.0524288, 0.01048576], and the energy offset value is set to 0 for input bandwidth NB, to 1.5 for input bandwidth greater than WB, and, for WB signals, to a value chosen from an energy attenuation table depending on the latest bitrate used for actively encoded frames, as defined by Table 151a. The energy offset is only updated in the first SID frame after an active signal period, and only if two criteria are both fulfilled. The first criterion is satisfied if AMR-WB IO mode is used or the bandwidth is WB. The second criterion is met if the number of consecutive active frames in the latest active signal segment was at least a given number of frames, or if the current SID is the very first encoded SID frame. The superscript [n] denotes a particular frame, e.g., [0] is the current frame.
Table 151a: Energy offset selection for LP-CNG
| Latest active bitrate [kbps] | Energy offset |
| | 1.7938412 |
| | 1.3952098 |
| | 1.0962363 |
| | 0.9965784 |
| | 0.9965784 |
The weighted average energy is then quantized using a 7-bit arithmetic quantizer. The integer quantization index in the current SID frame is found using the relation
(1336)
where Δ = 5.25 is the quantization step. The quantization index is limited to [0, 127]. If the previous frame was also an inactive frame, the quantization index is further limited so that it does not exceed the value of the previous frame by more than one. As an exception, if the step update flag is set to 1, the quantization index is allowed to increase by more than one from the value of the previous frame, using the relation
()
which gives the final quantization index transmitted in each SID frame from the quantization index transmitted in the previous SID frame and the quantization index calculated in equation (1336). The quantized value of the energy is further used in the local CNG synthesis and is found by
()
which is converted to the linear domain by
(1339)
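The index limiting described above can be sketched as follows for the case where the step update flag is not set. The relation used when the flag is set is not recoverable from this copy and is deliberately left out. The function and argument names are hypothetical.

```python
MAX_ENERGY_INDEX = 127  # 7-bit quantizer


def limit_energy_index(raw_index: int, prev_index: int,
                       prev_frame_inactive: bool) -> int:
    """Limit the energy quantization index (step update flag not set).

    raw_index: index obtained from the quantization relation.
    prev_index: index of the previous frame.
    prev_frame_inactive: True if the previous frame was also inactive.
    """
    index = max(0, min(MAX_ENERGY_INDEX, raw_index))
    if prev_frame_inactive and index > prev_index + 1:
        # The index may rise by at most one per frame; decreases are not limited.
        index = prev_index + 1
    return index
```

The asymmetric limit lets the comfort noise level fall immediately but rise only gradually, which avoids sudden loud noise bursts when a single frame has unusually high energy.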
5.6.2.1.6 LP-CNG energy smoothing for local CNG synthesis
The quantized excitation energy calculated in equation (1339) is not used directly in the local CNG synthesis. Instead, a smoothed quantized excitation energy is computed. The smoothed quantized excitation energy is updated in every inactive frame in the general form
()
where the superscript [-1] denotes the value from the previous frame, the quantized energy is that of the SID frame, calculated in equation (1339), and the smoothing factor controls the update rate. Variable update rates are used, as follows.
For the first inactive frame after an active signal period, if the step update flag is not set to 1 in either of the latest two SID frames, the smoothing factor is 0.8 if the number of preceding hangover frames is less than 3 or if the condition () holds, and 0.95 otherwise. Otherwise, if the step update flag is set to 1 in either of the latest two SID frames, the smoothing factor is 0, i.e. the smoothed quantized excitation energy is set directly to the quantized excitation energy.
For subsequent frames, the smoothing factor is 0.8 if the step update flag at the latest SID frame is not set to 1. Otherwise, the smoothing factor is 0, i.e. the smoothed quantized excitation energy is set directly to the quantized excitation energy.
For the first inactive frame after an active signal period, before the smoothed quantized excitation energy is updated with the quantized excitation energy, its value from the previous frame is initialized to the age-weighted average excitation energy of the DTX hangover frames calculated in subclause 6.7.2.1.2, provided that the step update flag is not set to 1 and there are past CN parameters to analyze in the CNG analysis buffers. The smoothed quantized excitation energy is initially set equal to ().
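The update rule for frames after the first inactive frame can be sketched as below. The general form is inferred from the statement that a smoothing factor of 0 sets the smoothed energy directly to the quantized energy; the names are hypothetical, and the first-inactive-frame rules are omitted because one of their conditions is not recoverable from this copy.

```python
def smoothing_factor_subsequent(step_update_at_latest_sid: bool) -> float:
    """Smoothing factor for inactive frames after the first one."""
    return 0.0 if step_update_at_latest_sid else 0.8


def smooth_energy(prev_smoothed: float, quantized: float, alpha: float) -> float:
    """First-order smoothing; alpha = 0 copies the quantized energy."""
    return alpha * prev_smoothed + (1.0 - alpha) * quantized
```

For example, with a previous smoothed energy of 10 and a quantized energy of 20, a factor of 0.8 gives 12, while a step update (factor 0) jumps straight to 20.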
5.6.2.1.7 LP-CNG LF-BOOST determination and quantization
While the quantized LSF spectrum generally estimates the spectrum of most background noises well, it is less adequate for noises with a strong low-frequency component, for example car noise. To compensate for the missing low-frequency component, the spectral envelope of the low-frequency portion of the LP residual signal is quantized and transmitted in the SID frame. Note that this quantized residual spectral envelope is only transmitted in WB SID frames.
The LP residual signal calculated in subclause 5.6.2.1.5 is first attenuated by multiplying it by an attenuation factor for all input bandwidths except NB. The attenuation factor is calculated as
()
where () applies if the bandwidth is not WB or the latest bitrate used for actively encoded frames is larger than 16.4 kbps. Otherwise, the corresponding value is determined from a hangover attenuation floor table as defined in table 35b. The attenuation factor is finally lower-limited to (). Then an FFT is used to obtain the frequency representation of the LP residual signal, and a spectral envelope, consisting of the energies of the first 20 FFT bins in the low-frequency portion of the frequency representation (excluding the DC bin), is calculated as
(1342)
where the two terms are, respectively, the real and the imaginary parts of each frequency bin as output by the FFT, and the size of the FFT analysis is 256. This low-frequency spectral envelope of the LP residual is not quantized directly. Instead, an averaged spectral envelope is calculated over the CN averaging period. As in subclause 5.6.2.1.2, the spectral envelopes of the two outlier frames identified there are excluded from the average. The averaged low-frequency spectral envelope is calculated as
()
where the average runs over the low-frequency envelopes of the frames in the CN averaging period, excluding the indices of the two outliers. To encode this averaged low-frequency spectral envelope, its spectral details are extracted and used for the actual quantization. The spectral details are obtained by subtracting an envelope floor, equal to two times the quantized average excitation energy, from the averaged low-frequency spectral envelope, that is
()
where the result is bounded to non-negative values. The spectral details are then converted to the log domain
()
where the energy offset value is the one calculated in subclause 5.6.2.1.5 and the results are bounded to non-negative values. A distance vector is calculated as
()
where the quantized total excitation energy is calculated as
()
where the length of the excitation is 256. The distance vector is quantized by vector quantization. The codeword with the minimum prediction error is found by a direct search of the codebook, and the index of the codeword is transmitted in the WB SID frame. The quantized low-frequency envelope is recovered and used in the local CNG synthesis.
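The direct codebook search mentioned above can be illustrated with a generic exhaustive search. The squared-error criterion is an assumption standing in for the "minimum prediction error" of the text, and the codebook passed in is a hypothetical placeholder, not the codebook of the specification.

```python
def nearest_codeword_index(vector, codebook):
    """Exhaustive search for the codeword with the smallest squared error."""
    best_index = 0
    best_error = float("inf")
    for index, codeword in enumerate(codebook):
        error = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if error < best_error:  # strict comparison keeps the first of equal matches
            best_index, best_error = index, error
    return best_index
```

Only the returned index would be written to the SID frame; the decoder would look up the same codeword to recover the quantized envelope.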
5.6.2.1.8 LP-CNG high band analysis and quantization
To enable high perceptual quality in the inactive portions of speech on the decoder side, the high band noise signal (6.4 – 14.4 kHz or 8 – 16 kHz depending on the core sampling rate) is analyzed and quantized during SWB mode DTX/LP-CNG operation of the codec. However, no extra parameters are transmitted from the encoder to the decoder to model the high band spectral characteristics of the inactive frames. Instead, the high band spectrum of the comfort noise is modelled purely on the decoder side. Only the energy of the high band signal is quantized and transmitted in the SID frame.
The average energy of the high band signal is first calculated
()
where the length of the high band signal is 320. The log average energy of the high band signal is then calculated, and an attenuation of 6.5 dB is applied to it
()
The attenuated log energy is smoothed by AR filtering as
()
which yields the smoothed high band log average energy from its value in the previous frame. The average energy of the low band signal is also calculated
()
where the low band signal is the synthesized low band signal described in subclause 5.6.2.2 and the sum runs over its length. The log average energy of the low band signal is calculated
()
The log average energy of the low band signal is also smoothed by AR filtering.
()
which yields the smoothed low band log average energy from its value in the previous frame. A step update of the smoothed low band and high band log average energies is allowed. If the low band log average energy of the current frame deviates from the smoothed low band log average energy of the previous frame by more than 12 dB, a step update flag is set to 1, indicating that a step update is permitted; otherwise it is set to 0. If the flag is set to 1, the smoothed low band log average energy and the smoothed high band log average energy are set to the low band log average energy and the high band log average energy, respectively.
The high band parameter, i.e. the energy of the high band signal, is not quantized and transmitted in every SID frame. Instead, an SWB SID frame, which contains both the low band and high band parameters, is only transmitted when the energy relationship (the energy ratio) between the low band and high band signals at the current frame deviates from that relationship at the previous SWB SID frame by more than 3 dB. That is, when a SID frame is about to be transmitted, the high band parameter is transmitted in the SID frame if
()
In addition, the following conditions also trigger the transmission of an SWB SID frame:
– the SID frame is the first SID frame immediately after active frames;
– the SID frame is within the high band analysis initialization period;
– there is no active frame and no SWB SID frame in the 100 frames preceding the SID frame when operating in SWB mode or above;
– there is bandwidth switching between WB and SWB at the SID frame.
If SWB SID transmission is not triggered at a SID frame update, a WB SID frame is transmitted instead.
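The decision between a WB and an SWB SID frame can be sketched as follows. The sketch assumes that the band energies are expressed in dB, so that the energy ratio becomes a difference, and it treats "no active and no SWB SID frame in the 100 preceding frames" as a frame count of at least 100. All names are hypothetical.

```python
def send_swb_sid(low_db: float, high_db: float,
                 last_low_db: float, last_high_db: float, *,
                 first_sid_after_active: bool = False,
                 in_high_band_init_period: bool = False,
                 frames_since_active_or_swb_sid: int = 0,
                 bandwidth_switched: bool = False,
                 threshold_db: float = 3.0) -> bool:
    """Return True if the SID frame about to be sent should be an SWB SID frame.

    low_db, high_db: band energies at the current frame (assumed in dB).
    last_low_db, last_high_db: the same energies at the previous SWB SID frame.
    """
    # Change in the high-to-low band energy ratio, expressed in dB.
    ratio_change = abs((high_db - low_db) - (last_high_db - last_low_db))
    return (ratio_change > threshold_db
            or first_sid_after_active
            or in_high_band_init_period
            or frames_since_active_or_swb_sid >= 100
            or bandwidth_switched)
```

When the function returns False, a WB SID frame would be sent and the decoder would keep using the high band energy from the last SWB SID frame.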
In each SWB SID frame, the smoothed high band log average energy is quantized and transmitted. It is first converted to the () domain as
()
Then a 4-bit arithmetic quantizer is used for quantizing the converted value. The integer quantization index is found by
()
where the quantization step is 0.9. The quantization index is bounded to [0, 15].
5.6.2.2 LP-CNG local CNG synthesis
The local CNG synthesis is performed at the encoder for the low-band signal in order to update the filters and the adaptive codebook memories, and to guide the high-band analysis and the DTX hangover control (see subclause 5.1.12.8). The local CNG synthesis is performed by filtering a scaled excitation signal through a smoothed LP synthesis filter. The scaled excitation is a combination of a random excitation and an excitation representing the low frequency spectral details of the excitation signal. For the generation of the scaled excitation, see subclause 6.7.2.1.5. For the computation of the smoothed LP synthesis filter, see subclause 5.6.2.1.4.
5.6.2.3 LP-CNG CNG Memory update
When an inactive signal frame is encoded, the following updates are carried out:
– MA memory of the ISF quantizer is set to zero;
– AR memory of the ISF quantizer is set to its mean values (UC mode, WB case);
– synthesis excitation spectrum tilt is set to zero;
– weighting filter denominator memory is set to zero;
– gain of pitch clipping memory is set to initial values;
– open-loop pitch estimator parameters are set to zero;
– per-bin NR last critical band is set to zero (the whole spectrum subtraction);
– noise enhancer memory is set to zero;
– phase dispersion memory is set to zero;
– previous pitch gains are all set to zero;
– previous codebook gain is set to zero;
– voicing factors used by bandwidth extension are all set to 1;
– active frame counter is set to zero;
– bass post-filter is turned off;
– floating point pitch for each subframe is set to the subframe length;
– class of last received good frame for FEC is set to UNVOICED_CLAS;
– synthesis filter memories are updated.