5.3.4 Handling of multiple frame losses and muting

3GPP TS 26.447, Release 17: Codec for Enhanced Voice Services (EVS); Error concealment of lost packets

5.3.4.1 Specifics for rates 5.9, 6.8, 8.0, 13.2, 32 and 64 kbps

The principle of attenuation in case of packet loss was introduced in subclause 5.3.1. However, there are exceptions to the general case. The following three exceptions, which are based on the coding mode of the last good frame, take precedence over the table below. All apply only for up to 3 consecutive lost frames. First, if the last good received frame is coded with UC mode, α is set to 1. Second, if the last good received frame is coded with VC or is an ONSET, α is set to 1.0. Finally, a stability factor, θ, is computed based on a distance measure between the adjacent LP filters. The factor θ is related to the LSF distance measure and is bounded by 0 ≤ θ ≤ 1, with larger values of θ corresponding to more stable signals. This limits energy and spectral envelope fluctuations when an isolated frame erasure occurs inside a stable unvoiced segment. Note that the class ARTIFICIAL ONSET is set at the decoder if the frame follows an erased frame and artificial onset reconstruction is used, as described in subclause 5.3.3.4.2, at bit rates 32 and 64 kbps.

The signal classification is implicit for VC, UC and TC frames. Further, a more precise classification can be decoded from the bitstream, depending on the bit rate.

Table 7: Values of the PLC attenuation factor α

Last good received frame	Number of successive erased frames	α
ARTIFICIAL ONSET		0.6
ONSET, VOICED_CLAS	≤ 3	1.0
ONSET, VOICED_CLAS	> 3	0.4
VOICED TRANSITION		0.4
UNVOICED TRANSITION		0.8
UNVOICED_CLAS	= 1	0.2 θ + 0.8
	= 2	0.6
	> 2	0.4
AUDIO || INACTIVE	> 1 if GSC had temporal contribution	0.8
	≤ 5	0.995
	> 5	0.95
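
The table lookup can be sketched in a few lines of Python. This is an illustrative reading of Table 7 only, not the normative decoder logic: the function name, the string labels and the `gsc_temporal` flag are hypothetical, the first-frame UNVOICED_CLAS entry is assumed to scale with the stability factor θ, and the order in which the AUDIO/INACTIVE rows are tested is an assumption. The coding-mode exceptions described above the table are not modelled.

```python
def plc_attenuation(last_class, n_lost, theta=1.0, gsc_temporal=False):
    """Return the attenuation factor alpha as read from Table 7.

    last_class   -- class of the last good received frame (hypothetical labels)
    n_lost       -- number of successive erased frames (>= 1)
    theta        -- stability factor in [0, 1] (assumed to enter the
                    first-frame UNVOICED_CLAS entry)
    gsc_temporal -- hypothetical flag: GSC had a temporal contribution
    """
    if last_class == "ARTIFICIAL_ONSET":
        return 0.6
    if last_class in ("ONSET", "VOICED_CLAS"):
        return 1.0 if n_lost <= 3 else 0.4
    if last_class == "VOICED_TRANSITION":
        return 0.4
    if last_class == "UNVOICED_TRANSITION":
        return 0.8
    if last_class == "UNVOICED_CLAS":
        if n_lost == 1:
            return 0.2 * theta + 0.8
        return 0.6 if n_lost == 2 else 0.4
    if last_class in ("AUDIO", "INACTIVE"):
        # Assumed precedence: the GSC row is checked before the count rows.
        if n_lost > 1 and gsc_temporal:
            return 0.8
        return 0.995 if n_lost <= 5 else 0.95
    raise ValueError("unknown frame class: %r" % (last_class,))
```

For example, under these assumptions a single erased frame after a fully stable unvoiced segment (θ = 1) is not attenuated at all, while the same loss after an unstable segment (θ = 0) is scaled by 0.8.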

5.3.4.2 Specifics for rates 9.6, 16.4 and 24.4 kbps

5.3.4.2.1 Fading to background level

Both the innovative and the harmonic excitation are faded to individual target levels by changing the codebook gains.

(107)

where: is the gain of the current frame;

is the gain of the previous frame;

is the target gain;

is the fading factor; its derivation is outlined in subclause 5.3.4.2.3.

The fading is performed as follows:

(107a)

Where is the input signal, e.g. the harmonic or the innovative excitation, and is the faded output signal.

The harmonic excitation is faded towards zero: .

The innovative excitation is faded towards a target background noise level: . It is derived during the first lost frame based on the background noise spectrum derived by CNG during clean channel decoding (see clause 4.3 of [5]). Its derivation is performed as follows:

a) Derive target level in time domain based on background noise spectrum :

(108)

b) Compensate the gain of the LPC synthesis / de-emphasis (see also subclause 5.2.5):

(109)

where is derived subframe-wise as stated in equation (26).
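
The gain fading described in this subclause can be illustrated with a short Python sketch. Equations (107) and (107a) are not reproduced in this text, so the forms used here are assumptions chosen for illustration: the gain is assumed to follow a first-order recursion towards its target, and the faded output is assumed to be obtained by interpolating the gain linearly across the frame. The function names `fade_gain` and `apply_fade` are hypothetical.

```python
def fade_gain(g_prev, g_target, alpha):
    """Assumed recursion: move the previous gain towards the target gain.

    alpha close to 1 keeps the previous gain (slow fading);
    alpha close to 0 jumps almost directly to the target (fast fading).
    """
    return alpha * g_prev + (1.0 - alpha) * g_target


def apply_fade(x, g_prev, g_cur):
    """Assumed application: ramp the gain linearly from g_prev to g_cur.

    The last sample of the frame is scaled by exactly g_cur, so the next
    frame can continue from there without a discontinuity.
    """
    n = len(x)
    return [x[i] * (g_prev + (i + 1) / n * (g_cur - g_prev)) for i in range(n)]


# Harmonic excitation: target gain 0 (faded towards zero).
g_harm = fade_gain(1.0, 0.0, 0.5)
faded = apply_fade([1.0, 1.0, 1.0, 1.0], 1.0, g_harm)
```

Under these assumptions the harmonic excitation decays geometrically with the fading factor, while the innovative excitation converges to the background noise level instead of to silence.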

5.3.4.2.2 Fading to background spectral shape

Separate LPCs are applied for the innovative and the harmonic excitation as described in subclause 5.3.1. The innovative and the harmonic excitations are faded to individual target spectral shapes by altering the LPC coefficients. The fading from the last good LPC coefficients to the target LPC coefficients is performed in the LSF domain as follows:

(110)

where: are LPC coefficients in the LSF domain of the current frame;

are LPC coefficients in the LSF domain of the previous frame;

are the target LPC coefficients;

is the fading factor as described in subclause 5.3.4.2.3. For the innovative excitation, the fading factor is at least 0.8.

The target spectral shape of the harmonic excitation is the short term mean of the last three LPC coefficient sets. Its derivation is performed in the LSF domain as follows:

(111)
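
As an illustration, the harmonic target and the LSF-domain fading can be sketched as follows. Equations (110) and (111) are not reproduced in this text: the mean over the last three LSF vectors follows the description above, while the first-order recursion used for the fading itself is an assumption. The function names and the `innovative` flag are hypothetical.

```python
def mean_lsf_target(history):
    """Element-wise mean of the last three LSF vectors.

    history -- list of LSF vectors, most recent last; if fewer than three
               are available, the mean is taken over what is there.
    """
    last = history[-3:]
    return [sum(values) / len(last) for values in zip(*last)]


def fade_lsf(lsf_prev, lsf_target, alpha, innovative=False):
    """Assumed recursion: move the previous LSFs towards the target LSFs.

    For the innovative excitation the fading factor is lower-bounded
    at 0.8, as stated in the text.
    """
    if innovative:
        alpha = max(alpha, 0.8)
    return [alpha * p + (1.0 - alpha) * t for p, t in zip(lsf_prev, lsf_target)]
```

Because the interpolation is done per coefficient between two ordered LSF vectors, the result of this sketch stays ordered whenever both inputs are ordered.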

The target spectral shape of the innovative excitation is derived during the first lost frame based on the background noise spectrum derived by CNG during clean channel decoding (see clause 4.3 of [5]). Its derivation is performed as follows:

a) Compute the power spectrum of the background noise spectrum.

b) Apply an inverse Fourier transform with length 640 on the power spectrum to obtain the autocorrelation values with .

c) Do a normalisation of to obtain , if set to 100 and multiply by 1.0005.

d) Execute the Levinson-Durbin algorithm with the order 16 to obtain the LP parameters from .

e) Finally, transform the LPC coefficients to the LSF domain to obtain
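
Steps a) to d) can be sketched in plain Python as shown below. This is a hedged illustration, not the reference implementation. It assumes that the noise spectrum is given as 321 magnitude (or complex) bins of a length-640 real transform, and that step c) means flooring the zero-lag autocorrelation at 100 and then multiplying it by 1.0005. All function names are hypothetical, and the LSF conversion of step e) is left out.

```python
import math


def autocorr_from_spectrum(noise_spectrum, n_fft=640, n_lags=17):
    """Steps a) and b): power spectrum, then inverse real DFT.

    Only the first n_lags autocorrelation values are evaluated, since an
    order-16 LP analysis needs lags 0..16.
    """
    half = n_fft // 2
    power = [abs(v) ** 2 for v in noise_spectrum[: half + 1]]
    power += [0.0] * (half + 1 - len(power))  # zero-pad short inputs
    r = []
    for k in range(n_lags):
        sign = 1.0 if k % 2 == 0 else -1.0  # cos(pi * k) for the Nyquist bin
        acc = power[0] + sign * power[half]
        acc += 2.0 * sum(
            power[m] * math.cos(2.0 * math.pi * m * k / n_fft)
            for m in range(1, half)
        )
        r.append(acc / n_fft)
    return r


def condition_autocorr(r, floor=100.0, factor=1.0005):
    """Step c) as read here: floor r[0], then scale it slightly upwards."""
    r = list(r)
    r[0] = max(r[0], floor) * factor
    return r


def levinson_durbin(r, order=16):
    """Step d): solve for A(z) = 1 + sum_j a[j] z^-j from autocorrelation r.

    Returns the coefficient list a[0..order] (a[0] = 1) and the final
    prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_from_noise_spectrum(noise_spectrum, order=16):
    """Chain steps a) to d) for one background noise spectrum."""
    r = autocorr_from_spectrum(noise_spectrum, n_lags=order + 1)
    return levinson_durbin(condition_autocorr(r), order)
```

A flat noise spectrum yields an autocorrelation that is non-zero only at lag 0, so the resulting LP filter is (numerically) the identity, which is a convenient sanity check for the chain.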

5.3.4.2.3 Fading speed

The damping factor controls the fading speed of the innovative and the harmonic excitation and depends on several parameters: the number of consecutive lost frames, the LSF stability factor θ, the coder type, the class of the last good frame, the pitch gain and the current coding mode. With this set of parameters, the damping factor is determined as follows:

– Firstly, if the current coding mode is ACELP_CORE, then

    – if the coder type is UNVOICED and the number of consecutive lost frames is at most three, then the damping factor is set to 1

    – else if the last good frame was UNVOICED_CLAS and

        – if it is the first lost frame, then

        – else if exactly two frames are lost, then

        – otherwise, if three or more frames are lost, then

    – else if the last good frame was UNVOICED_TRANSITION, then

    – else if the last good frame was ONSET, the number of lost frames is at most three and the coder type is GENERIC, then

    – else if the last good frame was VOICED_CLAS, or the last good frame was ONSET and the number of lost frames is at most three, then

    – otherwise,

    – in addition, if the last good frame was not one of { UNVOICED_CLAS, UNVOICED_TRANSITION, VOICED_TRANSITION }, then

        – if it is the first erased frame, then , whereas is limited from 0.85 to 0.98

        – else if exactly two frames are lost, then

        – otherwise, if more than two consecutive frames are lost, then the pitch gain is changed to the new gain and the damping factor is subsequently calculated as

– Otherwise, if the current coding mode is not ACELP_CORE and

    – if it is the first lost frame, then

    – else if exactly two frames are lost, then

    – otherwise, if three or more frames are lost, then