6 Computational description overview
3GPP TS 46.082, Release 17: Voice Activity Detector (VAD) for Enhanced Full Rate (EFR) speech traffic channels
The computational details necessary for the fixed point implementation of the speech transcoding and DTX functions are given in the form of an American National Standards Institute (ANSI) C program contained in GSM 06.53 [2]. This clause provides an overview of the modules which describe the computation of the VAD algorithm.
6.1 VAD modules
The computational description of the VAD is divided into three ANSI C modules. These modules are:
– vad_reset;
– vad_computation;
– periodicity_update.
The vad_reset module sets the VAD variables to their initial values.
The vad_computation module is divided into nine sub-modules which correspond to the blocks of figure 1 in the high level description of the VAD algorithm. The vad_computation module can be called as soon as the acf[0..8] and rc[1..4] variables are known. This means that the VAD computation can take place after the levinson routine of the second half of the frame in the speech encoder (GSM 06.60 [4]). The vad_computation module also requires the value of the ptch variable calculated in the previous frame.
The ptch variable is calculated by the periodicity_update module from the lags[1..2] variable. The individual lag values are calculated by the open loop pitch search routine in the speech encoder (GSM 06.60 [4]). The periodicity_update module is called after the VAD decision and when the current LTP lag values lags[1..2] are available.
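The call order described above can be sketched as follows. This is only an illustration of the sequencing: the `sketch_` function names, their signatures, the `Word16`/`Word32` type names and the trace buffer are assumptions made for this example and are not the interfaces of GSM 06.53 [2]. The stub bodies do no VAD processing; they only record that they were called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t Word16;   /* assumed 16-bit type name */
typedef int32_t Word32;   /* assumed 32-bit type name */

static char   trace[64];  /* records the call order, for illustration only */
static size_t trace_len;
static Word16 ptch_state; /* ptch carried over from the previous frame */

static void note(char c)
{
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

/* Hypothetical stub: sets the VAD variables to their initial values. */
static void sketch_vad_reset(void)
{
    ptch_state = 0;
    note('R');
}

/* Hypothetical stub: needs acf[0..8], rc[1..4] and the previous frame's ptch. */
static Word16 sketch_vad_computation(const Word32 acf[9], const Word16 rc[4],
                                     Word16 ptch)
{
    (void)acf; (void)rc; (void)ptch;
    note('C');
    return 0;             /* placeholder VAD decision */
}

/* Hypothetical stub: would derive ptch from lags[1..2] for the next frame. */
static void sketch_periodicity_update(const Word16 lags[2], Word16 *ptch)
{
    (void)lags; (void)ptch;
    note('P');
}

static void sketch_encode_frames(int n_frames)
{
    Word32 acf[9]  = {0};
    Word16 rc[4]   = {0};
    Word16 lags[2] = {0};
    int f;

    assert(n_frames >= 0);
    sketch_vad_reset();
    for (f = 0; f < n_frames; f++) {
        /* Encoder: LP analysis of the second half of the frame has
           produced acf and rc at this point. */
        (void)sketch_vad_computation(acf, rc, ptch_state);
        /* Encoder: the open loop pitch search has produced the lags. */
        sketch_periodicity_update(lags, &ptch_state);
    }
}
```

Calling `sketch_encode_frames(2)` leaves the trace `RCPCP`: one reset, then for each frame the VAD computation followed by the periodicity update.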
6.2 Pseudo-floating point arithmetic
All the arithmetic operations follow the precision and format used in the computational description of the speech codec in GSM 06.53 [2]. To increase the precision within the fixed point implementation, a pseudo-floating point representation of some variables is used. This applies to the following variables (and related constants) of the VAD algorithm:
– pvad: Energy of filtered signal;
– thvad: Threshold of the VAD decision;
– acf0: Energy of input signal.
For the representation of these variables, two 16-bit integers are needed:
– one for the exponent (e_pvad, e_thvad, e_acf0);
– one for the mantissa (m_pvad, m_thvad, m_acf0).
The value e_pvad is the exponent of the lowest power of 2 that is greater than or equal to the actual value of pvad, and m_pvad is an integer which is always greater than or equal to 16 384 (normalized mantissa). The pvad value is therefore equal to:
$$ \mathrm{pvad} = 2^{\mathrm{e\_pvad}} \cdot \frac{\mathrm{m\_pvad}}{32768} \qquad (7) $$
This scheme provides a large dynamic range for the pvad value while always keeping a precision of 16 bits. Comparisons are easy to make, starting with the exponents of the two variables. The VAD algorithm needs only one pseudo-floating point addition and one pseudo-floating point multiplication. All the computations related to the pseudo-floating point variables require only simple 16- or 32-bit arithmetic operations defined in the detailed description of the speech codec.
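As an illustration of the representation in equation (7), the following minimal sketch normalizes a positive 32-bit integer into an exponent and mantissa pair and compares two such pairs. The type `pfloat` and the functions `pf_from_u32` and `pf_compare` are hypothetical names introduced here; they are not taken from GSM 06.53 [2], which uses its own basic operators. The sketch assumes positive, normalized values, and for an exact power of two it picks the next higher exponent so that the mantissa fits in a signed 16-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Pseudo-floating point value: value = 2^e * (m / 32768). */
typedef struct {
    int16_t e;   /* exponent */
    int16_t m;   /* mantissa, 16384..32767 when normalized */
} pfloat;

/* Normalize a positive 32-bit integer (hypothetical helper). */
static pfloat pf_from_u32(uint32_t v)
{
    pfloat   p;
    int      e = 0;
    uint32_t t = v;

    assert(v > 0);      /* zero is not handled by this sketch */
    while (t != 0) {    /* e = number of significant bits: 2^(e-1) <= v < 2^e */
        t >>= 1;
        e++;
    }
    p.e = (int16_t)e;
    /* Scale so that bit 14 of the mantissa is set; low bits are
       truncated when v has more than 15 significant bits. */
    p.m = (int16_t)(e <= 15 ? v << (15 - e) : v >> (e - 15));
    return p;
}

/* Compare two normalized positive values: exponents first, then
   mantissas. Returns -1, 0 or 1. */
static int pf_compare(pfloat a, pfloat b)
{
    if (a.e != b.e)
        return (a.e > b.e) ? 1 : -1;
    if (a.m != b.m)
        return (a.m > b.m) ? 1 : -1;
    return 0;
}
```

For example, the value 3 becomes e = 2, m = 24576, since 2^2 * (24576/32768) = 3. Because both mantissas are normalized, a larger exponent always means a larger value, so the mantissas only need to be inspected when the exponents are equal.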
Some constants, represented by a pseudo-floating point format, are needed and symbolic names (in capital letters) for their exponent and mantissa are used; table 8 lists all these constants with the associated symbolic names and their numerical constant values.
Table 8: List of floating point constants
| Constant | Exponent | Mantissa |
|---|---|---|
| pth | E_PTH = 17 | M_PTH = 32500 |
| margin | E_MARGIN = 27 | M_MARGIN = 16927 |
| plev | E_PLEV = 19 | M_PLEV = 21667 |
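To make the magnitudes of these constants concrete, the sketch below decodes each exponent and mantissa pair using value = 2^exponent * (mantissa/32768). The helper `pf_to_int` is a hypothetical name introduced for this example and is not part of the reference C code; it uses integer shifts only and truncates any fractional part.

```c
#include <assert.h>
#include <stdint.h>

#define E_PTH     17
#define M_PTH     32500
#define E_MARGIN  27
#define M_MARGIN  16927
#define E_PLEV    19
#define M_PLEV    21667

/* Decode 2^e * (m / 32768) to an integer (hypothetical helper). */
static int64_t pf_to_int(int e, int32_t m)
{
    assert(e >= 0 && e <= 46 && m >= 0 && m <= 32767);
    return (e >= 15) ? ((int64_t)m << (e - 15))
                     : ((int64_t)m >> (15 - e));
}
```

With the values in table 8, `pf_to_int(E_PTH, M_PTH)` gives 130000, `pf_to_int(E_MARGIN, M_MARGIN)` gives 69332992 and `pf_to_int(E_PLEV, M_PLEV)` gives 346672.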
Annex A (informative):
Simplified block filtering operation
Consider an 8th order transversal filter with filter coefficients a[0..8] through which a signal s[n] is passed. The output of the filter is:
$$ s'[n] = -\sum_{i=0}^{8} a[i]\, s[n-i] \qquad (1) $$
If we apply block filtering over 20 ms segments, then this equation becomes:
$$ s'[n] = -\sum_{i=0}^{8} a[i]\, s[n-i], \qquad n = 0, \ldots, 167, \quad 0 \le n-i \le 159 \qquad (2) $$
If the energy of the filtered signal is then obtained for every 20 ms segment, the equation for this is:
$$ \mathrm{pvad} = \sum_{n=0}^{167} \Bigl( -\sum_{i=0}^{8} a[i]\, s[n-i] \Bigr)^{2}, \qquad 0 \le n-i \le 159 \qquad (3) $$
We know that:
$$ \mathrm{acf}[i] = \sum_{n=0}^{159} s[n]\, s[n-i], \qquad i = 0, \ldots, 8, \quad 0 \le n-i \le 159 \qquad (4) $$
If equation (3) is expanded and the resulting sums of products of s[n] terms are replaced by acf[0..8], we arrive at the equations:
$$ \mathrm{pvad} = r[0]\, \mathrm{acf}[0] + 2 \sum_{i=1}^{8} r[i]\, \mathrm{acf}[i] \qquad (5) $$
Where:
$$ r[i] = \sum_{k=0}^{8-i} a[k]\, a[k+i], \qquad i = 0, \ldots, 8 \qquad (6) $$
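The equivalence of equations (3) and (5) can be checked numerically. The sketch below computes the energy both ways using exact 64-bit integer arithmetic. The function names `pvad_direct` and `pvad_from_acf` and the constants `ORDER` and `FRAME` are assumptions made for this example. In the codec the acf[0..8] values are supplied by the speech encoder rather than recomputed from the samples, and the reference implementation uses fixed point and pseudo-floating point arithmetic rather than 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

#define ORDER 8     /* filter order */
#define FRAME 160   /* samples s[0..159] in one 20 ms segment */

/* Equation (3): block filter with zero history, then sum the
   squared outputs for n = 0..167. */
static int64_t pvad_direct(const int32_t a[ORDER + 1], const int32_t s[FRAME])
{
    int64_t energy = 0;
    int n, i;

    assert(a && s);
    for (n = 0; n < FRAME + ORDER; n++) {
        int64_t out = 0;
        for (i = 0; i <= ORDER; i++) {
            int k = n - i;
            if (k >= 0 && k < FRAME)        /* 0 <= n-i <= 159 */
                out -= (int64_t)a[i] * s[k];
        }
        energy += out * out;
    }
    return energy;
}

/* Equations (4), (5) and (6): the same energy obtained from the
   autocorrelation of the signal and of the filter coefficients. */
static int64_t pvad_from_acf(const int32_t a[ORDER + 1], const int32_t s[FRAME])
{
    int64_t acf[ORDER + 1], r[ORDER + 1], energy;
    int i, n, k;

    assert(a && s);
    for (i = 0; i <= ORDER; i++) {
        acf[i] = 0;                          /* equation (4) */
        for (n = i; n < FRAME; n++)
            acf[i] += (int64_t)s[n] * s[n - i];
        r[i] = 0;                            /* equation (6) */
        for (k = 0; k <= ORDER - i; k++)
            r[i] += (int64_t)a[k] * a[k + i];
    }
    energy = r[0] * acf[0];                  /* equation (5) */
    for (i = 1; i <= ORDER; i++)
        energy += 2 * r[i] * acf[i];
    return energy;
}
```

For any coefficient set and any 160-sample segment the two functions return the same value. The second form needs only nine multiply-accumulate pairs once acf[0..8] and r[0..8] are known, which is the simplification this annex describes.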
Annex B (informative):
Pole frequency calculation
This annex describes the algorithm used to determine whether the pole frequency for a second order analysis of the signal frame is less than 385 Hz.
The filter coefficients for a second order synthesis filter are calculated from the first two unquantized reflection coefficients rc[1..2] obtained from the speech encoder. This is done using the step up routine described in GSM 06.53 [2]. If the filter coefficients a[0..2] are defined such that the synthesis filter response is given by:
$$ H(z) = \frac{1}{a[0] + a[1]\, z^{-1} + a[2]\, z^{-2}} \qquad (1) $$
Then the positions of the poles in the Z-plane are given by the solutions to the following quadratic:
$$ a[0]\, z^{2} + a[1]\, z + a[2] = 0, \qquad a[0] = 1 \qquad (2) $$
The positions of the poles, z, are therefore:
$$ z = \mathrm{re} \pm j \sqrt{\mathrm{im}}, \qquad j^{2} = -1 \qquad (3) $$
where:
$$ \mathrm{re} = -\frac{a[1]}{2} \qquad (4) $$
$$ \mathrm{im} = \frac{4\, a[2] - a[1]^{2}}{4} \qquad (5) $$
If im is negative, the poles lie on the real axis of the Z-plane, the signal is not a tone, and the algorithm terminates. If re is negative, the poles lie in the left-hand half of the Z-plane, the frequency is greater than 2000 Hz, and the prediction error test can be performed.
If im and re are both positive, the poles are complex and lie in the right-hand half of the Z-plane, and the frequency in Hz is related to re and im by the expression:
freq = arctan(sqrt(im)/re)*4000/pi (6)
Having ensured that both im and re are positive the test for a pole frequency less than 385 Hz can be derived by substituting equations 4 and 5 into equation 6 and re-arranging:
$$ \frac{4\, a[2] - a[1]^{2}}{a[1]^{2}} < \tan^{2}\!\left(\frac{385\, \pi}{4000}\right) \qquad (7) $$
or
$$ \frac{4\, a[2] - a[1]^{2}}{a[1]^{2}} < 0.0973 \qquad (8) $$
If this test is true then the signal is not a tone and the algorithm terminates, otherwise the prediction error test is performed.
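The decision sequence of this annex can be sketched as below. The sketch uses floating point for readability, whereas the reference implementation is fixed point. The names `pole_result`, `pole_frequency_test` and `TAN2_385HZ` are assumptions introduced for this example. The inputs are the coefficients a[1] and a[2] with a[0] = 1. Equation (8) is cross-multiplied by a[1]^2, a choice made here so that a[1] = 0 does not cause a division by zero.

```c
#include <assert.h>

typedef enum {
    POLE_NOT_TONE,           /* signal is not a tone, algorithm terminates */
    POLE_RUN_PRED_ERR_TEST   /* continue with the prediction error test */
} pole_result;

#define TAN2_385HZ 0.0973    /* tan^2(pi*385/4000), equation (8) */

/* Hypothetical helper: pole frequency test for a second order filter. */
static pole_result pole_frequency_test(double a1, double a2)
{
    double four_im = 4.0 * a2 - a1 * a1;   /* 4*im, equation (5) */
    double re      = -a1 / 2.0;            /* equation (4) */

    if (four_im < 0.0)
        return POLE_NOT_TONE;              /* poles on the real axis */
    if (re < 0.0)
        return POLE_RUN_PRED_ERR_TEST;     /* frequency above 2000 Hz */
    /* Equation (8), cross-multiplied by a1^2. */
    if (four_im < TAN2_385HZ * a1 * a1)
        return POLE_NOT_TONE;              /* pole frequency below 385 Hz */
    return POLE_RUN_PRED_ERR_TEST;
}
```

As an example, a pole pair of radius 0.95 at 200 Hz corresponds to approximately a[1] = -1.87661 and a[2] = 0.9025, for which the ratio in equation (8) is about 0.025, so the function returns `POLE_NOT_TONE`. The same radius at 1000 Hz gives approximately a[1] = -1.3435 and a ratio of about 1, so the prediction error test is performed.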
Annex C (informative):
Change history
| SMG No. | TDoc. No. | CR. No. | Clause affected | New version | Subject/Comments |
|---|---|---|---|---|---|
| SMG#22 | | | | 4.0.1 | ETSI Publication |
| SMG#20 | | | | 5.0.3 | Release 1996 version |
| SMG#27 | | | | 6.0.0 | Release 1997 version |
| SMG#29 | | | | 7.0.0 | Release 1998 version |
| | | | | 7.0.1 | Version update to 7.0.1 for Publication |
| SMG#31 | | | | 8.0.0 | Release 1999 version |
| | | | | 8.0.1 | Update to Version 8.0.1 for Publication |

| Date | TSG # | TSG Doc. | CR | Rev | Subject/Comment | Old | New |
|---|---|---|---|---|---|---|---|
| 03-2001 | 11 | | | | Version for Release 4 | | 4.0.0 |
| 06-2002 | 16 | | | | Version for Release 5 | 4.0.0 | 5.0.0 |
| 12-2004 | 26 | | | | Version for Release 6 | 5.0.0 | 6.0.0 |
| 06-2007 | 36 | | | | Version for Release 7 | 6.0.0 | 7.0.0 |
| 07-2007 | | | | | Makes matrices in §5.2.3 visible | 7.0.0 | 7.0.1 |
| 12-2008 | 42 | | | | Version for Release 8 | 7.0.1 | 8.0.0 |
| 12-2009 | 46 | | | | Version for Release 9 | 8.0.0 | 9.0.0 |
| 03-2011 | 51 | | | | Version for Release 10 | 9.0.0 | 10.0.0 |
| 09-2012 | 57 | | | | Version for Release 11 | 10.0.0 | 11.0.0 |
| 09-2014 | 65 | | | | Version for Release 12 | 11.0.0 | 12.0.0 |
| 12-2015 | 70 | | | | Version for Release 13 | 12.0.0 | 13.0.0 |

| Date | Meeting | TDoc | CR | Rev | Cat | Subject/Comment | New version |
|---|---|---|---|---|---|---|---|
| 03-2017 | SA#75 | | | | | Version for Release 14 | 14.0.0 |
| 06-2018 | SA#80 | | | | | Version for Release 15 | 15.0.0 |
| 2020-07 | – | – | – | – | – | Update to Rel-16 version (MCC) | 16.0.0 |
| 2022-04 | – | – | – | – | – | Update to Rel-17 version (MCC) | 17.0.0 |