3 Definitions, symbols and abbreviations
3GPP TS 46.060, Enhanced Full Rate (EFR) speech transcoding, Release 17
3.1 Definitions
For the purposes of the present document, the following terms and definitions apply:
adaptive codebook: the adaptive codebook contains excitation vectors that are adapted for every subframe. The adaptive codebook is derived from the long term filter state. The lag value can be viewed as an index into the adaptive codebook.
adaptive postfilter: this filter is applied to the output of the short term synthesis filter to enhance the perceptual quality of the reconstructed speech. In the GSM enhanced full rate codec, the adaptive postfilter is a cascade of two filters: a formant postfilter and a tilt compensation filter.
algebraic codebook: fixed codebook in which an algebraic code is used to populate the excitation vectors (innovation vectors). The excitation contains a small number of nonzero pulses with predefined interlaced sets of positions.
closed‑loop pitch analysis: this is the adaptive codebook search, i.e., a process of estimating the pitch (lag) value from the weighted input speech and the long term filter state. In the closed‑loop search, the lag is searched using an error minimization loop (analysis‑by‑synthesis). In the GSM enhanced full rate codec, closed‑loop pitch search is performed for every subframe.
direct form coefficients: one of the formats for storing the short term filter parameters. In the GSM enhanced full rate codec, all filters which are used to modify speech samples use direct form coefficients.
fixed codebook: the fixed codebook contains excitation vectors for speech synthesis filters. The contents of the codebook are non‑adaptive (i.e., fixed). In the GSM enhanced full rate codec, the fixed codebook is implemented using an algebraic codebook.
fractional lags: set of lag values having sub‑sample resolution. In the GSM enhanced full rate codec a sub‑sample resolution of 1/6th of a sample is used.
frame: time interval equal to 20 ms (160 samples at an 8 kHz sampling rate).
integer lags: set of lag values having whole sample resolution.
interpolating filter: FIR filter used to produce an estimate of sub‑sample resolution samples, given an input sampled with integer sample resolution.
inverse filter: this filter removes the short term correlation from the speech signal. The filter models an inverse frequency response of the vocal tract.
lag: long term filter delay. This is typically the true pitch period, or a multiple or sub‑multiple of it.
Line Spectral Frequencies: (see Line Spectral Pair).
Line Spectral Pair: transformation of LPC parameters. Line Spectral Pairs are obtained by decomposing the inverse filter transfer function A(z) into a set of two transfer functions, one having even symmetry and the other having odd symmetry. The Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of these polynomials on the z-unit circle.
LP analysis window: for each frame, the short term filter coefficients are computed using the high pass filtered speech samples within the analysis window. In the GSM enhanced full rate codec, the length of the analysis window is 240 samples. For each frame, two asymmetric windows are used to generate two sets of LP coefficients. No samples of the future frames are used (no lookahead).
LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding (LPC) coefficients) is a generic descriptive term for the short term filter coefficients.
open‑loop pitch search: process of estimating the near optimal lag directly from the weighted speech input. This is done to simplify the pitch analysis and confine the closed‑loop pitch search to a small number of lags around the open‑loop estimated lags. In the GSM enhanced full rate codec, open‑loop pitch search is performed every 10 ms.
residual: output signal resulting from an inverse filtering operation.
short term synthesis filter: this filter introduces, into the excitation signal, short term correlation which models the impulse response of the vocal tract.
perceptual weighting filter: this filter is employed in the analysis‑by‑synthesis search of the codebooks. The filter exploits the noise masking properties of the formants (vocal tract resonances) by weighting the error less in regions near the formant frequencies and more in regions away from them.
subframe: time interval equal to 5 ms (40 samples at an 8 kHz sampling rate).
vector quantization: method of grouping several parameters into a vector and quantizing them simultaneously.
zero input response: output of a filter due to past inputs, i.e., due to the present state of the filter, given that an input of zeros is applied.
zero state response: output of a filter due to the present input, given that no past inputs have been applied, i.e., given that the state information in the filter is all zeros.
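The zero input response and zero state response defined above add up to the complete output of a linear filter. The following is a minimal sketch of that superposition, assuming a hypothetical second-order all-pole filter; the function name, the coefficients and the test signals are illustrative only and are not taken from the codec.

```python
def synthesis_filter(x, a, state):
    """All-pole filter 1/A(z): y[n] = x[n] - sum_k a[k-1] * y[n-k].

    `state` holds the most recent past outputs, newest first.
    Returns the output samples and the updated state.
    """
    state = list(state)
    y = []
    for sample in x:
        out = sample - sum(ak * sk for ak, sk in zip(a, state))
        y.append(out)
        state = [out] + state[:-1]
    return y, state


# Hypothetical stable coefficients (poles at 0.5 and 0.4), not codec values.
a = [-0.9, 0.2]
SUBFRAME = 40  # samples in one 5 ms subframe at 8 kHz

# Run some earlier input through the filter to obtain a non-zero state.
previous_input = [1.0, -0.5, 0.25, 0.0]
_, state = synthesis_filter(previous_input, a, [0.0, 0.0])

# Arbitrary input for the present subframe.
present_input = [0.1 * ((n % 7) - 3) for n in range(SUBFRAME)]

# Complete response: present input applied to the present state.
total, _ = synthesis_filter(present_input, a, state)
# Zero input response: zeros applied to the present state.
zir, _ = synthesis_filter([0.0] * SUBFRAME, a, state)
# Zero state response: present input applied to an all-zero state.
zsr, _ = synthesis_filter(present_input, a, [0.0, 0.0])

max_error = max(abs(t - (zi + zs)) for t, zi, zs in zip(total, zir, zsr))
```

Because the filter is linear, `max_error` is zero up to floating-point rounding. This split allows the contribution of the filter memory to be computed once per subframe, separately from the contribution of each candidate excitation.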
3.2 Symbols
For the purposes of the present document, the following symbols apply:
The inverse filter with unquantized coefficients
The inverse filter with quantified coefficients
The speech synthesis filter with quantified coefficients
The unquantized linear prediction parameters (direct form coefficients)
The quantified linear prediction parameters
The order of the LP model
The long‑term synthesis filter
The perceptual weighting filter (unquantized coefficients)
The perceptual weighting factors
Adaptive pre‑filter
The nearest integer pitch lag to the closed‑loop fractional pitch lag of the subframe
The adaptive pre‑filter coefficient (the quantified pitch gain)
The formant postfilter
Control coefficient for the amount of the formant post‑filtering
Control coefficient for the amount of the formant post‑filtering
Tilt compensation filter
Control coefficient for the amount of the tilt compensation filtering
A tilt factor, with being the first reflection coefficient
The truncated impulse response of the formant postfilter
The length of
The auto‑correlations of
The inverse filter (numerator) part of the formant postfilter
The synthesis filter (denominator) part of the formant postfilter
The residual signal of the inverse filter
Impulse response of the tilt compensation filter
The AGC‑controlled gain scaling factor of the adaptive postfilter
The AGC factor of the adaptive postfilter
Pre‑processing high‑pass filter
LP analysis windows
Length of the first part of the LP analysis window
Length of the second part of the LP analysis window
Length of the first part of the LP analysis window
Length of the second part of the LP analysis window
The auto‑correlations of the windowed speech
Lag window for the auto‑correlations (60 Hz bandwidth expansion)
The bandwidth expansion in Hz
The sampling frequency in Hz
The modified (bandwidth expanded) auto‑correlations
The prediction error in the ith iteration of the Levinson algorithm
The ith reflection coefficient
The jth direct form coefficient in the ith iteration of the Levinson algorithm
Symmetric LSF polynomial
Antisymmetric LSF polynomial
Polynomial with root eliminated
Polynomial with root eliminated
The line spectral pairs (LSPs) in the cosine domain
An LSP vector in the cosine domain
The quantified LSP vector at the ith subframe of the frame n
The line spectral frequencies (LSFs)
A th order Chebyshev polynomial
The coefficients of the polynomials and
The coefficients of the polynomials and
The coefficients of either or
Sum polynomial of the Chebyshev polynomials
Cosine of angular frequency
Recursion coefficients for the Chebyshev polynomial evaluation
The line spectral frequencies (LSFs) in Hz
The vector representation of the LSFs in Hz
The mean‑removed LSF vectors at frame n
The LSF prediction residual vectors at frame n
The predicted LSF vector at frame n
The quantified second residual vector at the past frame
The quantified LSF vector at quantization index k
The LSP quantization error
LSP‑quantization weighting factors
The distance between the line spectral frequencies and
The impulse response of the weighted synthesis filter
The correlation maximum of open‑loop pitch analysis at delay k
The correlation maxima at delays
The normalized correlation maxima and the corresponding delays
The weighted synthesis filter
The numerator of the perceptual weighting filter
The denominator of the perceptual weighting filter
The nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe
The windowed speech signal
The weighted speech signal
Reconstructed speech signal
The gain‑scaled post‑filtered signal
Post‑filtered speech signal (before scaling)
The target signal for adaptive codebook search
The target signal for algebraic codebook search
The LP residual signal
The fixed codebook vector
The adaptive codebook vector
The filtered adaptive codebook vector
The past filtered excitation
The excitation signal
The emphasized adaptive codebook vector
The gain‑scaled emphasized excitation signal
The best open‑loop lag
Minimum lag search value
Maximum lag search value
Correlation term to be maximized in the adaptive codebook search
The FIR filter for interpolating the normalized correlation term
The interpolated value of for the integer delay k and fraction t
The FIR filter for interpolating the past excitation signal to yield the adaptive codebook vector
Correlation term to be maximized in the algebraic codebook search at index k
The correlation in the numerator of at index k
The energy in the denominator of at index k
The correlation between the target signal and the impulse response , i.e., backward filtered target
The lower triangular Toeplitz convolution matrix with diagonal and lower diagonals
The matrix of correlations of
The elements of the vector d
The elements of the symmetric matrix
The innovation vector
The correlation in the numerator of
The position of the ith pulse
The amplitude of the ith pulse
The number of pulses in the fixed codebook excitation
The energy in the denominator of
The normalized long‑term prediction residual
The sum of the normalized vector and normalized long‑term prediction residual
The sign signal for the algebraic codebook search
Sign extended backward filtered target
The modified elements of the matrix , including sign information
The fixed codebook vector convolved with
The mean‑removed innovation energy (in dB)
The mean of the innovation energy
The predicted energy
The MA prediction coefficients
The quantified prediction error at subframe k
The mean innovation energy
The prediction error of the fixed‑codebook gain quantization
The quantization error of the fixed‑codebook gain quantization
The states of the synthesis filter
The perceptually weighted error of the analysis‑by‑synthesis search
The gain scaling factor for the emphasized excitation
The fixed‑codebook gain
The predicted fixed‑codebook gain
The quantified fixed codebook gain
The adaptive codebook gain
The quantified adaptive codebook gain
A correction factor between the gain and the estimated one
The optimum value for
Gain scaling factor
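The lag window listed among the symbols above applies a 60 Hz bandwidth expansion to the auto-correlations at the 8 kHz sampling frequency. The following is a minimal sketch of such a window, assuming the commonly used Gaussian form w(i) = exp(-0.5 (2 pi f0 i / fs)^2); the function names and the example auto-correlation values are hypothetical, and the sketch is not a statement of the normative window.

```python
import math


def lag_window(order, f0_hz=60.0, fs_hz=8000.0):
    """Gaussian lag window for lags 0..order (assumed form, see lead-in)."""
    return [
        math.exp(-0.5 * (2.0 * math.pi * f0_hz * i / fs_hz) ** 2)
        for i in range(order + 1)
    ]


def expand_bandwidth(autocorr, f0_hz=60.0, fs_hz=8000.0):
    """Multiply each auto-correlation lag by the matching window value."""
    window = lag_window(len(autocorr) - 1, f0_hz, fs_hz)
    return [r * w for r, w in zip(autocorr, window)]


# Made-up auto-correlation values for lags 0..3, for illustration only.
r = [1.0, 0.9, 0.7, 0.5]
r_mod = expand_bandwidth(r)
window = lag_window(10)  # order 10 is an assumption made for this sketch
```

The window equals 1 at lag 0 and decreases slowly with the lag, so the modified auto-correlations keep their energy term while higher lags are slightly attenuated.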
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply. Further GSM related abbreviations may be found in GSM 01.04 [1].
ACELP Algebraic Code Excited Linear Prediction
AGC Adaptive Gain Control
CELP Code Excited Linear Prediction
FIR Finite Impulse Response
ISPP Interleaved Single‑Pulse Permutation
LP Linear Prediction
LPC Linear Predictive Coding
LSF Line Spectral Frequency
LSP Line Spectral Pair
LTP Long Term Predictor (or Long Term Prediction)
MA Moving Average