6 Computational description overview

3GPP TS 46.082 Release 17: Voice Activity Detector (VAD) for Enhanced Full Rate (EFR) speech traffic channels


The computational details necessary for the fixed point implementation of the speech transcoding and DTX functions are given in the form of an American National Standards Institute (ANSI) C program contained in GSM 06.53 [2]. This clause provides an overview of the modules which describe the computation of the VAD algorithm.

6.1 VAD modules

The computational description of the VAD is divided into three ANSI C modules. These modules are:

– vad_reset;

– vad_computation;

– periodicity_update.

The vad_reset module sets the VAD variables to their initial values.

The vad_computation module is divided into nine sub-modules which correspond to the blocks of figure 1 in the high level description of the VAD algorithm. The vad_computation module can be called as soon as the acf[0..8] and rc[1..4] variables are known. This means that the VAD computation can take place after the levinson routine of the second half of the frame in the speech encoder (GSM 06.60 [4]). The vad_computation module also requires the value of the ptch variable calculated in the previous frame.

The ptch variable is calculated by the periodicity_update module from the lags[1..2] variables. The individual lag values are calculated by the open loop pitch search routine in the speech encoder (GSM 06.60 [4]). The periodicity_update module is called after the VAD decision, once the current LTP lag values lags[1..2] are available.

6.2 Pseudo-floating point arithmetic

All the arithmetic operations follow the precision and format used in the computational description of the speech codec in GSM 06.53 [2]. To increase the precision within the fixed point implementation, a pseudo-floating point representation of some variables is used. This applies to the following variables (and related constants) of the VAD algorithm:

– pvad: Energy of filtered signal;

– thvad: Threshold of the VAD decision;

– acf0: Energy of input signal.

For the representation of these variables, two 16-bit integers are needed:

– one for the exponent (e_pvad, e_thvad, e_acf0);

– one for the mantissa (m_pvad, m_thvad, m_acf0).

The value e_pvad is the exponent of the smallest power of 2 that is greater than or equal to pvad, and m_pvad is an integer that is always greater than or equal to 16 384 (normalized mantissa). The value of pvad is therefore:

$$
pvad = 2^{\mathrm{e\_pvad}} \cdot \frac{\mathrm{m\_pvad}}{32768} \tag{7}
$$

This scheme provides a large dynamic range for the pvad value while always keeping a precision of 16 bits. Comparisons are simple: the exponents of the two variables are compared first, and the mantissas only when the exponents are equal. The VAD algorithm needs only one pseudo-floating point addition and one pseudo-floating point multiplication. All the computations related to the pseudo-floating point variables require only simple 16- or 32-bit arithmetic operations, as defined in the detailed description of the speech codec.

Some constants are also represented in pseudo-floating point format, and symbolic names (in capital letters) are used for their exponents and mantissas. Table 8 lists these constants with their symbolic names and numerical values.

Table 8: List of floating point constants

Constant	Exponent	Mantissa
pth	E_PTH = 17	M_PTH = 32500
margin	E_MARGIN = 27	M_MARGIN = 16927
plev	E_PLEV = 19	M_PLEV = 21667

Annex A (informative):
Simplified block filtering operation

Consider an 8th order transversal filter with filter coefficients a[0..8], through which a signal s[n] is passed. The output of the filter is:

$$
s'[n] = -\sum_{i=0}^{8} a[i]\,s[n-i] \tag{1}
$$

If we apply block filtering over 20 ms segments, then this equation becomes:

$$
s'[n] = -\sum_{\substack{i=0 \\ 0 \le n-i \le 159}}^{8} a[i]\,s[n-i], \qquad n = 0,\dots,167 \tag{2}
$$

If the energy of the filtered signal is then obtained for every 20 ms segment, the equation for this is:

$$
pvad = \sum_{n=0}^{167} \Bigl( -\sum_{\substack{i=0 \\ 0 \le n-i \le 159}}^{8} a[i]\,s[n-i] \Bigr)^{2} \tag{3}
$$

We know that:

$$
acf[i] = \sum_{\substack{n=0 \\ 0 \le n-i \le 159}}^{159} s[n]\,s[n-i], \qquad i = 0,\dots,8 \tag{4}
$$

If equation (3) is expanded and the resulting sums of sample products s[n]*s[n-i] are replaced by acf[0..8] from equation (4), we arrive at:

$$
pvad = r[0]\,acf[0] + 2\sum_{i=1}^{8} r[i]\,acf[i] \tag{5}
$$

where:

$$
r[i] = \sum_{k=0}^{8-i} a[k]\,a[k+i], \qquad i = 0,\dots,8 \tag{6}
$$

Annex B (informative):
Pole frequency calculation

This annex describes the algorithm used to determine whether the pole frequency for a second order analysis of the signal frame is less than 385 Hz.

The filter coefficients for a second order synthesis filter are calculated from the first two unquantized reflection coefficients rc[1..2] obtained from the speech encoder. This is done using the step up routine described in GSM 06.53 [2]. If the filter coefficients a[0..2] are defined such that the synthesis filter response is given by:

$$
H(z) = \frac{1}{a[0] + a[1]z^{-1} + a[2]z^{-2}} \tag{1}
$$

then the positions of the poles in the Z-plane are given by the solutions to the following quadratic:

$$
a[0]z^{2} + a[1]z + a[2] = 0, \qquad a[0] = 1 \tag{2}
$$

The positions of the poles, z, are therefore:

$$
z = re \pm j\sqrt{im}, \qquad j^{2} = -1 \tag{3}
$$

where:

$$
re = -\frac{a[1]}{2} \tag{4}
$$

$$
im = \frac{4a[2] - a[1]^{2}}{4} \tag{5}
$$

If im is negative, the poles lie on the real axis of the Z-plane, the signal is not a tone, and the algorithm terminates. If re is negative, the poles lie in the left-hand half of the Z-plane, the pole frequency is greater than 2000 Hz, and the prediction error test can be performed.

If im is positive and re is positive, the poles are complex and lie in the right-hand half of the Z-plane. With the 8000 Hz sampling rate, the pole frequency in Hz is then related to re and im by the expression:

$$
freq = \arctan\!\left(\frac{\sqrt{im}}{re}\right) \cdot \frac{4000}{\pi} \tag{6}
$$

Having ensured that both im and re are positive, the test for a pole frequency less than 385 Hz can be derived by substituting equations (4) and (5) into equation (6) and rearranging:

$$
\frac{4a[2] - a[1]^{2}}{a[1]^{2}} < \tan^{2}\!\left(\frac{385\pi}{4000}\right) \tag{7}
$$

$$
\frac{4a[2] - a[1]^{2}}{a[1]^{2}} < 0.0973 \tag{8}
$$
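The rearrangement can be written out step by step. This worked derivation uses only equations (4) to (6), re > 0 and im >= 0. Both sides may be squared because they are non-negative, and tan is increasing on [0, π/2):

$$
\begin{aligned}
freq < 385
&\iff \arctan\!\left(\frac{\sqrt{im}}{re}\right) < \frac{385\pi}{4000} \\
&\iff \frac{\sqrt{im}}{re} < \tan\!\left(\frac{385\pi}{4000}\right) \approx 0.3119 \\
&\iff \frac{im}{re^{2}} = \frac{(4a[2] - a[1]^{2})/4}{a[1]^{2}/4} = \frac{4a[2] - a[1]^{2}}{a[1]^{2}} < \tan^{2}\!\left(\frac{385\pi}{4000}\right) \approx 0.0973
\end{aligned}
$$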

If this test is true, the signal is not a tone and the algorithm terminates; otherwise, the prediction error test is performed.

Annex C (informative):
Change history

Change history
SMG No.	TDoc. No.	CR. No.	Clause affected	New version	Subject/Comments
SMG#22				4.0.1	ETSI Publication
SMG#20				5.0.3	Release 1996 version
SMG#27				6.0.0	Release 1997 version
SMG#29				7.0.0	Release 1998 version
				7.0.1	Version update to 7.0.1 for Publication
SMG#31				8.0.0	Release 1999 version
				8.0.1	Update to Version 8.0.1 for Publication

Change history
Date	TSG #	TSG Doc.	CR	Rev	Subject/Comment	Old	New
03-2001	11				Version for Release 4		4.0.0
06-2002	16				Version for Release 5	4.0.0	5.0.0
12-2004	26				Version for Release 6	5.0.0	6.0.0
06-2007	36				Version for Release 7	6.0.0	7.0.0
07-2007					Makes matrices in §5.2.3 visible	7.0.0	7.0.1
12-2008	42				Version for Release 8	7.0.1	8.0.0
12-2009	46				Version for Release 9	8.0.0	9.0.0
03-2011	51				Version for Release 10	9.0.0	10.0.0
09-2012	57				Version for Release 11	10.0.0	11.0.0
09-2014	65				Version for Release 12	11.0.0	12.0.0
12-2015	70				Version for Release 13	12.0.0	13.0.0

Change history
Date	Meeting	TDoc	CR	Rev	Cat	Subject/Comment	New version
03-2017	SA#75					Version for Release 14	14.0.0
06-2018	SA#80					Version for Release 15	15.0.0
2020-07	–	–	–	–	–	Update to Rel-16 version (MCC)	16.0.0
2022-04	–	–	–	–	–	Update to Rel-17 version (MCC)	17.0.0