4 C code structure
3GPP TS 26.243 Release 17: ANSI-C code for the fixed-point distributed speech recognition extended advanced front-end
This clause gives an overview of the structure, contents and organization of the bit‑exact C code attached to this document.
The C code has been verified on the following systems:
– Sun Microsystems workstations and GNU gcc compiler
– IBM PC compatible computers with Linux operating system and GNU gcc compiler.
ANSI‑C was selected as the programming language because portability was desirable.
4.1 Contents of the C source code
The distributed files with suffix "c" contain the source code and the files with suffix "h" are the header files.
Makefiles are provided for the platforms on which the C code has been verified (listed above).
4.2 Program execution
There are separate executables for the FrontEnd and Vector Quantization, with and without Extensions. The command line options are described below.
<> – indicates parameters for the given option for running the executable
() – indicates default parameter.
FrontEnd w/ Extension:
USAGE: bin/ExtAdvFrontEnd infile HTK_outfile pitch_outfile class_outfile [options]
OPTIONS:
-q Quiet Mode (FALSE)
-F format Input file format <NIST,HTK,RAW> (NIST)
-fs freq Sampling frequency in kHz <8,16> (8)
-swap Change input byte ordering (Native)
-noh No HTK header to output file (FALSE)
-noc0 No c0 coefficient to output feature vector (FALSE)
-nologE No logE component to output feature vector (FALSE)
-skip_header_bytes n Skip header, first n bytes (only for -F RAW)
The options -noh, -noc0, -nologE and -skip_header_bytes are not used and should be left at their default settings.
FrontEnd w/o Extension:
USAGE: bin/AdvFrontEnd infile HTK_outfile [options]
OPTIONS: – Same as FrontEnd w/ Extension
Vector Quantization w/ Extension:
Usage: extcoder htk_file_in pitch_file_in class_file_in bitstream_file_out pitch_file_out txt_file_out -freq x -VAD/No_VAD
htk_file_in Input mel-frequency cepstral coefficient file in HTK MFCC format.
pitch_file_in Input pitch period file.
class_file_in Input classification file.
bitstream_file_out Output binary bitstream.
pitch_file_out Output quantised pitch period file.
txt_file_out Vector quantiser output in text format.
-freq x Sampling frequency in kHz (8 or 16).
-VAD Use voice activity detector data. The voice activity input file must have the same name as htk_file_in, but with the extension .vad
-No_VAD Do not incorporate voice activity detector information in output bitstream.
Vector Quantization w/o Extension:
Usage: coder htk_file_in bitstream_file_out txt_file_out -freq x -VAD/No_VAD
htk_file_in Input mel-frequency cepstral coefficient file in HTK MFCC format.
bitstream_file_out Binary output bitstream.
txt_file_out Vector quantiser output in text format.
-freq x Sampling frequency in kHz (8 or 16).
-VAD Use voice activity detector data. The voice activity input file must have the same name as htk_file_in, but with the extension .vad
-No_VAD Do not incorporate voice activity detector information in output bitstream.
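The naming rule for the voice activity file can be illustrated with a small helper. This is a minimal sketch, not taken from the attached code: the function name make_vad_name() and its error convention are hypothetical, and it assumes that "same name, but extension .vad" means replacing whatever follows the last dot of the file name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the reference code): derive the VAD
 * file name from the HTK file name by replacing the extension after the
 * last '.' with ".vad", or appending ".vad" if there is no extension.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int make_vad_name(const char *htk_name, char *out, size_t out_size)
{
    const char *slash;
    const char *dot;
    size_t stem_len;

    assert(htk_name != NULL && out != NULL);

    slash = strrchr(htk_name, '/');
    dot = strrchr(htk_name, '.');

    /* A dot inside a directory name does not start an extension. */
    if (dot == NULL || (slash != NULL && dot < slash))
        stem_len = strlen(htk_name);
    else
        stem_len = (size_t)(dot - htk_name);

    /* sizeof(".vad") counts the terminating NUL as well. */
    if (stem_len + sizeof(".vad") > out_size)
        return -1;

    memcpy(out, htk_name, stem_len);
    memcpy(out + stem_len, ".vad", sizeof(".vad"));
    return 0;
}
```

With this sketch, an input file named data/utt01.cep would be paired with data/utt01.vad.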
File extension descriptions as generated by the sample script:
.cep – Binary file containing cepstral features in HTK format. Output from the FrontEnd, input to the vector quantizer.
.pitch – Binary file containing pitch information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
.class – ASCII file containing class information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
.bs – Binary file containing the bitstream. Output from the vector quantizer.
.log – Log files from the different executables.
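For readers who want to inspect a .cep file, the following sketch decodes the standard 12-byte HTK parameter file header (number of vectors, frame period in 100 ns units, bytes per vector, parameter kind). The struct and function names are hypothetical and are not taken from the attached code. The sketch assumes the header is stored big-endian, which is the usual HTK convention; whether the front-end writes that byte order on every platform is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the 12-byte HTK parameter file header. */
typedef struct {
    uint32_t n_samples;    /* number of feature vectors in the file */
    uint32_t samp_period;  /* frame period in units of 100 ns */
    uint16_t samp_size;    /* bytes per feature vector */
    uint16_t parm_kind;    /* parameter kind code (base type plus qualifiers) */
} HtkHeader;

/* Assemble big-endian integers byte by byte so that the result does not
 * depend on the byte order of the host. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Decode the first 12 bytes of an HTK file (assumed big-endian). */
static void parse_htk_header(const unsigned char raw[12], HtkHeader *h)
{
    assert(raw != NULL && h != NULL);
    h->n_samples   = read_be32(raw);
    h->samp_period = read_be32(raw + 4);
    h->samp_size   = read_be16(raw + 8);
    h->parm_kind   = read_be16(raw + 10);
}
```

A caller would read the first 12 bytes of the file with fread() and pass them to parse_htk_header(). Files written with the -noh option have no such header.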
4.3 Code hierarchy
Tables 1 to 3 are call graphs that show the functions used for AFE (table 1), VQ (table 2), and Extension (table 3).
Each column represents a call level and each cell a function. Each function calls the functions in the neighboring cells to its right. Time runs from top to bottom in the call graphs, following the processing of a frame. All standard C functions (printf(), fwrite(), etc.) have been omitted. Basic operations (add(), L_add(), mac(), etc.) and double precision extended operations (e.g. L_Extract()) also do not appear in the graphs.
Because the basic operations are not counted as extending the depth, the deepest level in this software is level 7.
Table 1: AFE call structure
|
main() |
||||
|
AdvProcessInit_B() |
||||
|
DoNoiseSupInit_B() |
||||
|
DoWaveProcInit_B() |
||||
|
DoCompCepsInit_B() |
||||
|
DoPostProcInit_B() |
||||
|
DoVADInit_F() |
||||
|
Do16kProcInit_B() |
||||
|
QMF_FIR_Init_B() |
||||
|
fir_initialization_B() |
||||
|
DP_HP_filters_B() |
||||
|
BufIn32Alloc() |
||||
|
AdvProcessAlloc_B() |
||||
|
DoNoiseSupAlloc_B() |
||||
|
DoWaveProcAlloc_B() |
||||
|
DoCompCepsAlloc_B() |
||||
|
DoPostProcAlloc_B() |
||||
|
DoVADAlloc_F() |
||||
|
Do16kProcAlloc_B() |
||||
|
FlushAdvProcess_B() |
||||
|
DoVADFlush_F() |
||||
|
CvFeatInt2Float() |
||||
|
AdvProcessDelete_B() |
||||
|
DoNoiseSupDelete_B() |
||||
|
DoWaveProcDelete_B() |
||||
|
DoCompCepsDelete_B() |
||||
|
DoPostProcDelete_B() |
||||
|
DoVADDelete_B() |
||||
|
BufIn32Free() |
||||
|
DoAdvProcess_B() |
||||
|
Do16kProcessing_B() |
||||
|
DoNoiseSup_B() |
||||
|
Get16k_p_bufferData16k_B() |
||||
|
Get16k_bufData16kSize_B() |
||||
|
Get16k_p_BandsForCoding16k_B() |
||||
|
Get16k_p_CodeForBands16k_B() |
||||
|
Get16k_dataHP_B() |
||||
|
VAD_F() |
||||
|
Log_2() |
||||
|
DoSigWindowing16_F1() |
||||
|
DoSigWindowing16_F2() |
||||
|
ff4NRFix32_B() |
||||
|
GetL15() |
||||
|
GetH15() |
||||
|
Mult16x32() |
||||
|
Add_Mult16x16_16() |
||||
|
Sub_Mult16x16_16() |
||||
|
Permut() |
||||
|
FFTtoPSD_F() |
||||
|
Square24d2_B() |
||||
|
Square24_B() |
||||
|
Get16k_BFC_dec_B() |
||||
|
GetBandsForCoding16k_B() |
||||
|
PSDMean_F() |
||||
|
NoiseEstimation_F1() |
||||
|
Sqrt_2() |
||||
|
Sqrt16_2() |
||||
|
NoiseEstimation_F2() |
||||
|
Sqrt_2() |
||||
|
Sqrt16_2() |
||||
|
FilterCalc_F() |
||||
|
SpeechQVar() |
||||
|
FilterBank16() |
||||
|
SpeechQSpec() |
||||
|
SpeechQMel() |
||||
|
DoGainFact_F1() |
||||
|
Log_2() |
||||
|
DoGainFact_F2() |
||||
|
Log_2() |
||||
|
DoMelIDCT_F16() |
||||
|
ApplyWF() |
||||
|
Get16k_dec1() |
||||
|
Get16k_dec2() |
||||
|
Get16k_dec3() |
||||
|
DoSigWindowing16_F3() |
||||
|
ff4NRFix32_B() |
||||
|
GetL15() |
||||
|
GetH15() |
||||
|
Mult16x32() |
||||
|
Add_Mult16x16_16() |
||||
|
Sub_Mult16x16_16() |
||||
|
Permut() |
||||
|
FFTtoPSD_F() |
||||
|
Square24d2_B() |
||||
|
Square24_B() |
||||
|
DoMelFB_B() |
||||
|
CodeBands16k_B() |
||||
|
DoSpecSub16k_B() |
||||
|
Log_2() |
||||
|
UpDateDecal() |
||||
|
ApplyDecal() |
||||
|
DCOffsetFil_F() |
||||
|
Get16k_hpBandsSize_B() |
||||
|
Get16k_p_hpBands_B() |
||||
|
Get16k_p_bufferCodeForBands16k_B() |
||||
|
Get16k_p_CodeForBands16k_B() |
||||
|
Get16k_p_bufferCodeWeights_B() |
||||
|
Get16k_p_codeWeights_B() |
||||
|
Set16k_hpBands_dec_B() |
||||
|
DoWaveProc_B() |
||||
|
TeagerEng() |
||||
|
GetTeagerFilter() |
||||
|
GetMaximaPositions() |
||||
|
DoCompCeps_B() |
||||
|
CepsCompute() |
||||
|
Get16k_p_bufferCodeWeights_B() |
||||
|
Get16k_p_bufferCodeForBands16k_B() |
||||
|
PreEmphHamm() |
||||
|
ff4NB16_B() |
||||
|
GetBandsForDecoding16k_B() |
||||
|
DecodeBands16k_B() |
||||
|
FilterBank() |
||||
|
Get16k_hpBands_dec_B() |
||||
|
Get16k_p_hpBands_B() |
||||
|
MergeSSandCoded_B() |
||||
|
CorrectEnergy_B() |
||||
|
CosInv16Khz() |
||||
|
cosInv() (only for 8kHz) |
||||
|
DoPostProc_B() |
||||
|
DoVADProc_F() |
||||
|
focalpoint() |
Table 2: VQ call structure
|
main() |
|||||
|
quantize_and_print() |
|||||
|
get_best_dataframe() |
|||||
|
best_centroid() |
|||||
|
quant_pitch_abs() |
|||||
|
get_class_bit() |
|||||
|
quant_pitch_diff() |
|||||
|
get_class_bit() |
|||||
|
mfcc_crc_encode() |
|||||
|
pc_crc_encode() |
Table 3: Extension call structure
|
main() |
|||||||
|
RVC_ConstructPitchRom_be() |
|||||||
|
RVC_ConstructPitchMeter_be() |
|||||||
|
Allocate_InterpolatedDft_be() |
|||||||
|
RVC_ResetPitchMeter_be() |
|||||||
|
RVC_DestructPitchRom_be() |
|||||||
|
RVC_DestructPitchMeter_be() |
|||||||
|
Deallocate_InterpolatedDft_be() |
|||||||
|
DoAdvProcess_B() |
|||||||
|
DoPitchExtract() |
|||||||
|
FilterBank() |
|||||||
|
dsr_afe_vad() |
|||||||
|
get_vm() |
|||||||
|
fnLog2() |
|||||||
|
IsLowBandNoise() |
|||||||
|
get_zcm() |
|||||||
|
pre_process() |
|||||||
|
iir_d() |
|||||||
|
iir_s() |
|||||||
|
RVC_MeasurePitch_be() |
|||||||
|
ClearPitch_be() |
|||||||
|
DirichletInterpolation_be() |
|||||||
|
IsLowLevelInput_be() |
|||||||
|
Finalize_be() |
|||||||
|
IsContinuousPitch_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
PrepareSpectralPeaks_be() |
|||||||
|
CalcSpectrum_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
Mpy_lw_sw_Add() |
|||||||
|
FindPeaks_be() |
|||||||
|
Prelim_ScaleDownAmpsOfHighFreqPeaks_be() |
|||||||
|
qsort_be()* |
|||||||
|
swap() |
|||||||
|
CompareIpointAmp_be() |
|||||||
|
RefineSpectralPeaks_be() |
|||||||
|
sqrt_l_fix() |
|||||||
|
Final_ScaleDownAmpsOfHighFreqPeaks_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
FindPitchCandidates_be() |
|||||||
|
NormalizeAmplitudes_be() |
|||||||
|
CalcUtilityFunction_be() |
|||||||
|
CreatePieceWiseConstantFunction_be() |
|||||||
|
L_Extract() |
|||||||
|
Mpy_32_16() |
|||||||
|
qsort_be()* |
|||||||
|
swap() |
|||||||
|
Compare_ARRAY_OF_XPOINTS_be() |
|||||||
|
LinkArrayOfPoints_be() |
|||||||
|
AddSortedArrayOfPoints_be() |
|||||||
|
LinkArrayOfPoints_be() |
|||||||
|
ConvertLinkedListOfDiffPointsToUtilFunc_be() |
|||||||
|
FindDominantLocalMaximaInUtilityFunction_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
UtilityFunctionAtGivenPitchFreq_be() |
|||||||
|
qsort_be()* |
|||||||
|
swap() |
|||||||
|
ComparePitchFreqAscending_be() |
|||||||
|
SelectTopPitchCandidates_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
compute_pcorr_be() |
|||||||
|
interpolate_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
Mpy_lw_lw() |
|||||||
|
sqrt_l_fix() |
|||||||
|
find_most_energetic_window_be() |
|||||||
|
accumulate_be() |
|||||||
|
find_most_energetic_window2_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
SelectFinalPitch_be() |
|||||||
|
qsort_be()* |
|||||||
|
swap() |
|||||||
|
ComparePitchFreqDescending_be() |
|||||||
|
ClearPitch_be() |
|||||||
|
GOOD_ENOUGH_be() |
|||||||
|
CLOSELY_LOCATED_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
BETTER_be() |
|||||||
|
IsContinuousPitch_be() |
|||||||
|
Mpy_lw_sw() |
|||||||
|
CalculateDoubleWindowDft_be() |
|||||||
|
classify_frame() |
* qsort_be() is a recursive function
4.5 Variables, constants and tables
The data types of variables and tables used in the fixed point implementation are signed integers in 2’s complement representation, defined by:
– Word16 16 bit variable;
– Word32 32 bit variable.
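As an illustration of these data types, the sketch below declares them with the C99 fixed-width integer types and shows a saturating 16-bit addition in the spirit of the add() basic operation mentioned in clause 4.3. It is an assumption-laden example: the attached code defines Word16 and Word32 itself and may do so differently, and sat_add16() is a hypothetical name, not the reference implementation of add().

```c
#include <stdint.h>

/* Assumed mapping onto C99 fixed-width types; the reference code
 * provides its own definitions. */
typedef int16_t Word16;   /* 16 bit signed, 2's complement */
typedef int32_t Word32;   /* 32 bit signed, 2's complement */

/* Hypothetical saturating addition: the sum is formed in 32 bits and
 * clipped to the Word16 range instead of wrapping around. */
static Word16 sat_add16(Word16 a, Word16 b)
{
    Word32 sum = (Word32)a + (Word32)b;

    if (sum > INT16_MAX)
        return INT16_MAX;
    if (sum < INT16_MIN)
        return INT16_MIN;
    return (Word16)sum;
}
```

Forming the sum in the wider type first avoids signed overflow, which is undefined behaviour in C.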
4.5.1 Description of constants used in the C-code
Table 5a: Global constants for AFE
|
Constant |
Value |
Description |
|
NS_SPEC_ORDER_16K |
64 |
Noise suppression Array length |
|
NS_HANGOVER_16K |
15 |
Noise suppression hangover count |
|
NS_MIN_SPEECH_FRAME_HANGOVER_16K |
4 |
Noise suppression minimum speech frame hangover count |
|
NS_ANALYSIS_WINDOW_16K |
80 |
Noise suppression analysis window |
|
PERC_CODED |
0.7 |
lambda merge (empirically set constant) |
|
LAMBDA_NSE16k |
0.99 |
Noise estimation Lambda |
|
NS_NB_FRAME_THRESHOLD_NSE |
100 |
Noise suppression number of frame threshold used for NSE |
|
LENGTH_QMF |
118 |
QMF filter length |
|
f24 |
1 |
multiplier for QMF filter coefficients |
|
SHFF_H |
8 |
shift to get higher value |
|
L_H |
16 |
shift to get lower value |
|
HP16k_MEL_USED |
3 |
Higher frequency band Mel used |
|
NB_LP_BANDS_CODING |
3 |
Lower frequency band used in coding |
|
NE16k_FRAMES_THRESH |
100 |
Noise estimation frames threshold |
|
NB_TOPOSTPROC |
12 |
Number of coefficients to postprocess |
|
CEP_FRAME_LENGTH |
200 |
Frame length for cepstral coefficients |
|
CEP_NB_COEF |
13 |
Number of cepstral coefficients (including c0) |
|
CEP_NB_CHANNELS |
23 |
Number of filters used for cepstral coefficients |
|
CEP_FFT_LENGTH |
256 |
FFT length for cepstral coefficients |
|
FRAME_BUF_SIZE |
241 |
Denoised Output buffer size |
|
FRAME_SHIFT |
80 |
WaveProcessing input frame shift |
|
FRAME_LENGTH |
200 |
WaveProcessing frame size |
|
NS_SPEC_ORDER |
65 |
Noise suppression array length (8 kHz) |
|
NS_BUFFER_SIZE |
180 |
Noise suppression past frame size |
|
NS_FRAME_SHIFT |
80 |
Noise suppression input frame shift |
|
NS_HALF_FILTER_LENGTH |
8 |
Noise suppression filter half size |
|
NS_NB_FRAME_THRESHOLD_LTE |
10 |
Noise suppression long term energy forgetting factor threshold (in frames) |
|
NS_NB_FRAME_THRESHOLD_NSE |
100 |
Noise suppression spectrum estimate forgetting factor threshold (in frames) |
|
NS_MIN_FRAME |
10 |
Frame count threshold for updating the average energy for noise suppression VAD |
|
NS_FFT_LENGTH |
256 |
FFT length for noise suppression |
|
WF_MEL_ORDER |
25 |
Noise suppression Wiener filter order |
|
SHFT_NOISE |
14 |
shift applied to noise spectrum estimate |
|
SHFT_FACT_MUL |
14 |
shift applied to gain coefficient (noise suppression gain factorization) |
|
IDCT_ORDER |
25 |
Noise suppression idct order |
|
NS_BETA |
0.98 |
Noiseless signal suppression factor |
|
NS_RSB_MIN |
0.079432823 |
Minimum a priori SNR |
|
NS_LAMBDA_NSE |
0.99 |
Forgetting factor for noise spectrum estimate |
|
NS_LOG_SPEC_FLOOR |
-10.0 |
average energy minimum threshold |
|
NS_SNR_THRESHOLD_VAD |
15 |
SNR threshold for noise suppression VAD |
|
NS_SNR_THRESHOLD_UPD_LTE |
20 |
Long term energy update threshold for noise suppression VAD |
|
NS_ENERGY_FLOOR |
80 |
Energy Minimum threshold for noise suppression VAD |
|
MaxPos |
10 |
Maximum number of maxima in waveprocessing |
|
WP_EPS |
0.2 |
weighting value added or subtracted for waveprocessing
Table 5b: Global constants for VQ
|
Constant |
Value |
Description |
|
MIN_PERIOD |
1245184 |
Minimum pitch period allowed |
|
MAX_PERIOD |
9175040 |
Maximum pitch period allowed |
|
NUM_MULTI_LEVELS_1 |
26 |
number of levels in pitch quantization |
|
NUM_MULTI_LEVELS_2 |
24 |
number of levels in pitch quantization |
|
UNVOICED_CODE |
0 |
init value for Qpindex |
Table 5c: Global constants for Extension
|
Constant |
Value |
Description |
|
HISTORY_LEN |
100 |
History length – past samples for pitch extraction |
|
DOWN_SAMP_FACTOR |
4 |
Down-sampling factor – used in computing correlation |
|
NO_OF_DFT_POINTS |
128 |
Number of DFT points |
|
BREAK_POINT |
12 |
Break point – marks the end of low frequency band |
|
LBN_HIST_WEIGHT |
32440 |
Low band noise history weight |
|
LBN_CURR_WEIGHT |
328 |
Low band noise current weight (32768 – LBN_HIST_WEIGHT) |
|
LBN_MAX_THR |
124518 |
Low band noise maximum threshold |
|
LBN_LOW_ENR_LEVEL_MANT |
32000 |
Low band noise low energy level mantissa |
|
LBN_LOW_ENR_LEVEL_SHFT |
22 |
Low band noise low energy level shift |
|
RVC_OK |
0 |
Return code for success |
|
RVC_ERR |
-1 |
Return code for unspecified error |
|
RVC_ERR_NOT_ENOUGH_MEMORY |
-2 |
Return code for not enough memory |
|
RVC_ERR_ILLEGAL_ARGUMENT |
-3 |
Return code for an illegal input / output argument |
|
RVC_ERR_IO_FAILED |
-4 |
Return code for failed input / output to a file |
|
RVC_ERR_BAD_FILE_FORMAT |
-5 |
Return code for a bad file header |
|
RVC_ERR_NOT_INITIALIZED |
-6 |
Return code for failure due to improper initialization |
|
RVC_ERR_ILLEGAL_USAGE |
-7 |
Return code for illegal usage of a function |
|
RVC_ERR_NOT_ENOUGH_SAMPLES |
-8 |
Return code for insufficient number of samples |
|
RVC_ERR_NOT_IMPLEMENTED |
-9 |
Return code for an unimplemented function |
|
RVC_ERR_FAIL_OPEN_FILE |
-10 |
Return code for failure to open a file |
|
UB_ENRG_FRAC |
59 |
Upper band energy fraction |
|
ZCM_THLD |
87 |
Zero crossing measure threshold |
|
SQRT_ONE_HALF |
0x5A82 |
Square root of 0.5 (0.707) |
|
FRAME_LEN_DS |
50 |
Frame length downsampled (200/4) |
|
FRAME_LEN_DS_BY_2 |
25 |
Frame length downsampled divided by 2 |
|
HISTORY_LEN_DS |
25 |
History length downsampled (100/4) |
|
WINDOW_LENGTH |
18 |
Window length used in computing correlation |
|
INV_WINDOW_LENGTH |
1820 |
Inverse of window length (1/18 = 0.05556) |
|
NUM_CHAN |
23 |
Number of channels or Mel-frequency bands |
|
MIN_CH_ENRG_MANTISSA |
20000 |
Minimum channel energy mantissa |
|
MIN_CH_ENRG_SHIFT |
25 |
Minimum channel energy shift |
|
INIT_SIG_ENRG_MANTISSA |
30518 |
Initial signal energy mantissa |
|
INIT_SIG_ENRG_SHIFT |
8 |
Initial signal energy shift |
|
CE_SM_FAC |
18022 |
Channel energy smoothing factor |
|
CE_SM_FAC_COMPL |
14746 |
Channel energy smoothing factor complement |
|
CNE_SM_FAC |
3277 |
Channel noise energy smoothing factor |
|
CNE_SM_FAC_COMPL |
29491 |
Channel noise energy smoothing factor complement |
|
LO_GAMMA |
22938 |
Low gamma value |
|
LO_GAMMA_COMPL |
9830 |
Low gamma value complement |
|
HI_GAMMA |
29491 |
High gamma value |
|
HI_GAMMA_COMPL |
3277 |
High gamma value complement |
|
LO_BETA |
31130 |
Low beta value |
|
HI_BETA |
32702 |
High beta value |
|
INIT_FRAMES |
10 |
Initial number of frames (considered to be noise frames) |
|
SINE_START_CHAN |
4 |
Sine start channel (for sine wave detection) |
|
PEAK_TO_AVE_THLD |
10 |
Peak to average threshold |
|
DEV_THLD |
1523942 |
Deviation threshold |
|
HYSTER_CNT_THLD |
9 |
Hysteresis count threshold |
|
F_UPDATE_CNT_THLD |
500 |
Forced update count threshold |
|
NON_SPEECH_THLD |
32 |
Non-speech threshold |
|
FIX_34 |
24576 |
(short) (32768.0 * 3.0/4.0) |
|
FIX_18 |
4096 |
(short) (32768.0 * 1.0/8.0) |
|
FIX_INVSQRT2 |
-23170 |
1 / sqrt(2) |
|
swTHIRD_REF_BANDWIDTH |
85 |
One third of the reference bandwidth |
|
swTWO_THIRDS_REF_BANDWIDTH |
171 |
Two thirds of the reference bandwidth |
|
MIN_ENERGY_MANTISSA |
25600 |
Minimum energy mantissa |
|
MIN_ENERGY_SHIFT |
18 |
Minimum energy shift |
|
swREF_SAMPLE_RATE_Q0 |
0x1F40 |
Reference sampling rate in Q0 format |
|
swCLOSE_FACTOR_Q14 |
0x4CCD |
Closeness factor in Q14 format |
|
swFD_SCORE_THLD1_Q15 |
0x63D7 |
Frequency domain score threshold 1 in Q15 format |
|
swFD_SCORE_THLD2_Q15 |
0x570A |
Frequency domain score threshold 2 in Q15 format |
|
swCORR_THLD_Q15 |
0x651F |
Correlation threshold in Q15 format |
|
swSUM_THLD_Q14 |
0x6667 |
Sum threshold in Q14 format |
|
lwCRIT0_OFFSET_Q15 |
0x0000170A |
Offset for finding a better pitch candidate in Q15 format |
|
swCANDCORR_THLD1_Q15 |
0x799A |
Pitch candidate correlation threshold 1 in Q15 format |
|
swCANDCORR_THLD2_Q15 |
0x599A |
Pitch candidate correlation threshold 2 in Q15 format |
|
swCANDCORR_THLD3_Q15 |
0x6CCD |
Pitch candidate correlation threshold 3 in Q15 format |
|
swCANDAMP_THLD3_Q15 |
0x68F6 |
Pitch candidate amplitude threshold 3 in Q15 format |
|
swSTARTFREQ_COEFF |
0x553F |
Start frequency coefficient (for candidate search) |
|
swENDFREQ_COEFF |
0x4666 |
End frequency coefficient (for candidate search) |
|
DIRICHLET_KERNEL_SPAN |
8 |
Dirichlet kernel span (for interpolation) |
|
REF_SAMPLE_RATE |
8000 |
Reference sampling rate |
|
REF_BANDWIDTH |
4000 |
Reference bandwidth |
|
lwTHIRD_REF_BANDWIDTH |
87381333 |
One third of the reference bandwidth |
|
lwTWO_THIRDS_REF_BANDWIDTH |
174762667 |
Two thirds of the reference bandwidth |
|
swCENTER_WEIGHT |
0x5000 |
Center weight |
|
swSIDE_WEIGHT |
0x1800 |
Side weight |
|
swAMP_SCALE_DOWN1 |
0x5333 |
Amplitude scale down factor 1 |
|
swAMP_SCALE_DOWN2 |
0x399A |
Amplitude scale down factor 2 |
|
swAMP_SCALE_DOWN2b |
0x7333 |
Amplitude scale down factor 2b |
|
swUDIST1 |
-4160 |
Utility function distance 1 |
|
swUDIST2 |
-6400 |
Utility function distance 2 |
|
swUSTEP |
-16384 |
Utility function step |
|
swFREQ_MARGIN1 |
0x4AE1 |
Frequency margin 1 |
|
swAMP_MARGIN1 |
0x07AE |
Amplitude margin 1 |
|
swAMP_MARGIN2 |
0x07AE |
Amplitude margin 2 |
|
MIN_STABLE_FRAMES |
6 |
Minimum number of stable frames |
|
MAX_TRACK_GAP_FRAMES |
2 |
Maximum pitch track gap frames |
|
swSTABLE_FREQ_UPPER_MARGIN |
0x4E14 |
Stable frequency upper margin |
|
swSTABLE_FREQ_LOWER_MARGIN |
0x68EB |
Stable frequency lower margin |
|
UNVOICED |
0 |
Pitch frequency of an unvoiced frame |
|
lwMAX_PITCH_FREQ |
0x01A40000L |
Maximum pitch frequency |
|
lwMIN_PITCH_FREQ |
0x00340000L |
Minimum pitch frequency |
|
MAX_PITCH_FREQ |
420 |
Maximum pitch frequency in Hz |
|
MIN_PITCH_FREQ |
52 |
Minimum pitch frequency in Hz |
|
HIGHPASS_CUTOFF_FREQ |
300 |
Highpass cut-off frequency in Hz |
|
NO_OF_FRACS |
77 |
Number of fractions in the fractions table |
|
lwSHORT_WIN_START_FREQ |
0x00C80000L |
Short window start frequency |
|
lwSHORT_WIN_END_FREQ |
0x01A40000 |
Short window end frequency |
|
lwSINGLE_WIN_START_FREQ |
0x00640000L |
Single window start frequency |
|
lwSINGLE_WIN_END_FREQ |
0x00D20000L |
Single window end frequency |
|
lwDOUBLE_WIN_START_FREQ |
0x00340000 |
Double window start frequency |
|
lwDOUBLE_WIN_END_FREQ |
0x00780000L |
Double window end frequency |
|
MAX_LOCAL_MAXIMA_ON_SPECTRUM |
70 |
Maximum number of local maxima on the spectrum |
|
MAX_PEAKS_FOR_SORT |
30 |
Maximum number of peaks for sorting |
|
MAX_PEAKS_PRELIM |
7 |
Maximum number of peaks (preliminary) |
|
MIN_PEAKS |
7 |
Minimum number of peaks |
|
MAX_PEAKS_FINAL |
20 |
Maximum number of peaks (final) |
|
MAX_PRELIM_CANDS |
4 |
Maximum number of preliminary candidates (pitch) |
|
CREATE_PIECEWISE_FUNC_LOOP_LIM_SH |
20 |
Create Piecewise function loop limit for short window |
|
CREATE_PIECEWISE_FUNC_LOOP_LIM_SNG |
30 |
Create Piecewise function loop limit for single window |
|
CREATE_PIECEWISE_FUNC_LOOP_LIM_DBL |
60 |
Create Piecewise function loop limit for double window |
|
swSUM_FRACTION |
0x799A |
Sum fraction |
|
swAMP_FRACTION |
0x33F8 |
Amplitude fraction |
|
MAX_BEST_CANDS |
2 |
Maximum number of best candidates (pitch) |
|
N_OF_BEST_CANDS_SHORT |
2 |
Number of best candidates for short window |
|
N_OF_BEST_CANDS_SINGLE |
2 |
Number of best candidates for single window |
|
N_OF_BEST_CANDS_DOUBLE |
2 |
Number of best candidates for double window |
|
N_OF_BEST_CANDS |
6 |
Number of best candidates for all windows |
|
SIZE_SCRATCH_DOPITCH |
1090 |
Scratch memory size for DoPitch() function (This is the actual size required. The declared size in C simulation is 1632) |
|
SIZE_SCRATCH_ADVPROCESS |
825 |
Scratch memory size for DoAdvProcess() function (This is the actual size required. The declared size in C simulation is 1100) |
|
RVC_PITCH_ROM_SIG |
11031 |
Signature for RVC_PITCH_ROM structure |
|
RVC_PITCH_METER_SIG |
21053 |
Signature for RVC_PITCH_METER structure |
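Several of the integer values in Table 5c can be reproduced from the real-valued quantities given in their descriptions. The sketch below is an illustration only, with hypothetical helper names. It assumes that the 16-bit fractional constants use Q15 scaling (value times 32768, rounded to nearest) and that the lw pitch frequency constants hold the frequency in Hz in the upper 16 bits. Both assumptions match the tabulated numbers but are not stated by the table itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a real value in [-1, 1) to Q15 with
 * rounding to nearest and clipping to the 16-bit range. */
static int16_t to_q15(double x)
{
    double scaled = x * 32768.0;

    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled > 32767.0)
        scaled = 32767.0;
    if (scaled < -32768.0)
        scaled = -32768.0;
    return (int16_t)scaled;   /* the cast truncates toward zero */
}

/* Hypothetical helper: place an integer frequency in Hz in the upper
 * 16 bits of a 32-bit word (assumed Q16 scaling). */
static int32_t hz_to_q16(int32_t hz)
{
    assert(hz >= 0 && hz <= 32767);
    return hz * 65536;
}
```

Under these assumptions, to_q15(0.70710678) gives 0x5A82 (SQRT_ONE_HALF), to_q15(1.0 / 18.0) gives 1820 (INV_WINDOW_LENGTH), to_q15(0.75) gives 24576 (FIX_34), and hz_to_q16(420) gives 0x01A40000 (lwMAX_PITCH_FREQ).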
4.5.2 Description of fixed tables used in the C-code
This section contains a listing of all fixed tables sorted by source file name and table name. All table data is declared as Word16.
Table 6a: Fixed tables for AFE
|
File |
Table Name |
Length |
Description |
|
16kHzProcessing_B.c |
table_pow2 |
33 |
Table for square root |
|
LambdaNSEx2 |
100 |
Table used to compute first 100 LambdaNSE |
|
|
dp02_h |
59 |
MSB of QMF filter coefficients |
|
|
dp02_l |
43 |
LSB of QMF filter coefficients |
|
|
PostProc_B.c |
targetLMS16 |
12 |
Target for blind equalization |
|
ComCeps_B.c |
HalfHamming16 |
100 |
Hamming window coefficients |
|
CosMatrix16 |
144 |
Inverse cosine coefficients at 8 kHz (not used at 16 kHz) |
|
|
CosMatrix16_16khz |
156 |
Inverse cosine coefficients at 16 kHz |
|
|
pondMelFilter |
309 |
Mel bank coefficients |
|
|
ff4nrFix16_B.c |
tabSin |
64 |
Sine table |
|
tabCos |
64 |
Cosine table |
|
|
MathFunc.c |
tbInt0 |
48 |
Coefficients for computation of square root |
|
ExtNoiseSup_B.c |
lambda_1divX |
20 |
Computation of 1/N |
|
Hann_sh32_hi |
100 |
MSB of hanning window coefficients (32 bits) |
|
|
Hann_sh32_lo |
100 |
LSB of hanning window coefficients (32 bits) |
|
|
Hann_sh24_hi |
100 |
MSB of hanning window coefficients (24 bits) |
|
|
Hann_sh24_lo |
100 |
LSB of hanning window coefficients (24 bits) |
|
|
pondMelFilterNoise |
157 |
Mel-frequency scale coefficients (applied to the Wiener filter) |
|
|
idctMel16 |
234 |
Mel-warped inverse DCT coefficients |
|
|
pondMelFilter16k |
134 |
Filter bank coefficients at 16 kHz |
|
|
M1_LamdaLTE |
8 |
Computation of 1/N |
|
|
M1_LambdaNSEx2 |
100 |
Computation of 2/N |
|
|
M1_LamdaNSE |
9 |
Computation of 1/N |
|
|
mInvLambda16 |
10 |
Computation of 2/N
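Some window tables in Table 6a have length 100, while the corresponding frame length (CEP_FRAME_LENGTH in Table 5a) is 200. One common way to use such a table, sketched below, is to store only half of a symmetric window and mirror it over the second half of the frame. This is an assumption for illustration: the sketch does not claim to reproduce how the attached code applies HalfHamming16, the function names are hypothetical, and the coefficients are assumed to be in Q15.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LEN 200   /* frame length, as CEP_FRAME_LENGTH in Table 5a */
#define HALF_LEN  100   /* number of stored window coefficients */

/* Hypothetical Q15 multiply: 16 x 16 -> 32 bit product, scaled back by
 * 15 bits and saturated. Assumes that >> on a negative value is an
 * arithmetic shift, as it is on common compilers. */
static int16_t mult_q15(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;

    return (p > 32767) ? (int16_t)32767 : (int16_t)p;
}

/* Apply a symmetric window stored as its first half: coefficient i
 * weights both sample i and its mirror sample FRAME_LEN - 1 - i. */
static void apply_half_window(const int16_t half[HALF_LEN],
                              int16_t frame[FRAME_LEN])
{
    int i;

    assert(half != NULL && frame != NULL);
    for (i = 0; i < HALF_LEN; i++) {
        frame[i] = mult_q15(frame[i], half[i]);
        frame[FRAME_LEN - 1 - i] = mult_q15(frame[FRAME_LEN - 1 - i], half[i]);
    }
}
```

Storing half of the window halves the table size at the cost of one extra index computation per sample.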
Table 6b: Fixed tables for VQ
|
File |
Table Name |
Length |
Description |
|
coder_VAD.c |
quantizer16kHz_0_1 |
128 |
vq table |
|
quantizer16kHz_2_3 |
128 |
vq table |
|
|
quantizer16kHz_4_5 |
128 |
vq table |
|
|
quantizer16kHz_6_7 |
128 |
vq table |
|
|
quantizer16kHz_8_9 |
128 |
vq table |
|
|
quantizer16kHz_10_11 |
64 |
vq table |
|
|
quantizer16kHz_12_13 |
512 |
vq table |
|
|
quantizer8kHz_0_1 |
128 |
vq table |
|
|
quantizer8kHz_2_3 |
128 |
vq table |
|
|
quantizer8kHz_4_5 |
128 |
vq table |
|
|
quantizer8kHz_6_7 |
128 |
vq table |
|
|
quantizer8kHz_8_9 |
128 |
vq table |
|
|
quantizer8kHz_10_11 |
64 |
vq table |
|
|
quantizer8kHz_12_13 |
512 |
vq table |
|
|
weight16kHz_c0_shift |
1 |
vq weights |
|
|
weight16kHz_c0_norm |
1 |
vq weights |
|
|
weight16kHz_logE |
1 |
vq weights |
|
|
weight8kHz_c0_shift |
1 |
vq weights |
|
|
weight8kHz_c0_norm |
1 |
vq weights |
|
|
weight8kHz_logE |
1 |
vq weights |
|
|
plwQuantLevels[127] |
127*2 |
vq tables for pitch/class quantization |
|
|
ppplwQuantSections[8][3] |
24*2 |
vq tables for pitch/class quantization |
|
|
plwQuantLevels[31] |
31*2 |
vq tables for pitch/class quantization |
|
|
pplwQuantSections[4][3] |
12*2 |
vq tables for pitch/class quantization |
|
|
pswRatioThld_1[4][6] |
24 |
vq tables for pitch/class quantization |
|
|
piMultiLevelIndex[4] |
4 |
vq tables for pitch/class quantization |
|
|
pswRatioThld_2[4][8] |
32 |
vq tables for pitch/class quantization |
|
|
piMultiLevelIndex_2[4] |
4 |
vq tables for pitch/class quantization |
|
|
swAlpha1 |
1 |
pitch/class constants |
|
|
swAlpha2 |
1 |
pitch/class constants |
Table 6c: Fixed Tables for Extension
|
File |
Table name |
Length |
Description |
|
ExtNoiseSup_B.c |
pswPePower |
129 |
Coefficients to compute the pre-emphasis power spectrum |
|
preProc_B.c |
pswHpfCoef |
15 |
High pass filter coefficients |
|
preProc_B.c |
pswLpfCoef |
15 |
Low pass filter coefficients |
|
preProc_B.c |
pswLfeCoef |
3 |
Low frequency emphasis filter coefficients |
|
dsrAfeVad_B.c |
piBurstConst |
20 |
Burst length constants for different SNR’s |
|
dsrAfeVad_B.c |
piHangConst |
20 |
Hang length constants for different SNR’s |
|
dsrAfeVad_B.c |
piVADThld |
20 |
VAD voice metric thresholds for different SNR’s |
|
dsrAfeVad_B.c |
piVMTable |
90 |
Voice metric table as a function of SNR index |
|
dsrAfeVad_B.c |
piSigThld |
20 |
Signal threshold table as a function of SNR |
|
dsrAfeVad_B.c |
piUpdateThld |
20 |
Update threshold table as a function of SNR |
|
dsrAfeVad_B.c |
pswShapeTable |
23 |
Spectral shape correction table |
|
fix_mathlib.c |
coeff_sqrt5_58 |
5 |
Coefficients for computation of square root |
|
fix_mathlib.c |
coeff_sqrt5_78 |
5 |
Coefficients for computation of square root |
|
rvc_pitch_init_B.h |
ROM_astFrac |
312 |
Fractions table |
|
rvc_pitch_init_B.h |
ROM_pstWindowshiftTable |
514 |
Complex exponents table for time shifting in frequency domain |
|
rvc_pitch_init_B.h |
ROM_aswDirichletImag |
8 |
Imaginary part of the Dirichlet kernel |
4.5.3 Static variables used in the C-code
The tables in this section specify the static variables for the AFE, VQ, and Extension, respectively.
Table 7a: AFE static variables
|
Struct Name |
Variable |
Type[Length] |
Description |
|
QMF_FIR |
|||
|
lengthQMF |
Word32 |
QMF Filter length |
|
|
*dp_l |
Word16 |
QMF filter low frequency Coeff |
|
|
*dp_h |
Word16 |
QMF filter high frequency Coeff |
|
|
*T |
Word16 |
Temporary QMF filter buffer |
|
|
T_dec |
Word16 |
Multiplier for T |
|
|
DataFor16kProc_B |
|||
|
FrameLength |
Word32 |
Input Frame length |
|
|
FrameShift |
Word32 |
Shift value for the frame |
|
|
numFramesInBuffer |
Word32 |
Number of frames in buffer |
|
|
SamplingFrequency |
Word32 |
Sampling frequency (8/16) |
|
|
Do16kHzProc |
BOOLEAN |
Flag to enable 16kHz processing |
|
|
*hpBands_B |
Word32 |
Buffer for HP bands |
|
|
hpBandsSize |
Word32 |
hpBands_B buffer size |
|
|
CodeForBands16k_B |
Word32[9] |
HP coding buffer |
|
|
bufferCodeForBands16k_B |
Word32[27] |
buffer used for HP coding |
|
|
codeWeights_B |
Word16[3] |
code Weights buffer |
|
|
bufferCodeWeights_B |
Word16[9] |
buffer used for code Weights |
|
|
*pQMF_Fir |
QMF_FIR |
Pointer to QMF_FIR structure |
|
|
*bufferData16k_B |
Word32 |
temporary buffer to carry QMF LP data |
|
|
bufData16kSize |
Word32 |
16k data buffer size |
|
|
*FirstWindow16k |
MelFB_Window |
pointer to MelFB_Window structure |
|
|
noiseSE16k_B |
Word32[3] |
noise spectral energy variable |
|
|
noise_dec |
Word16 |
Multiplier for noiseSE16k_B |
|
|
BandsForCoding16k_B |
Word32[9] |
buffer for storing Bands for Coding |
|
|
vadCounter16k |
Word32 |
vad flag counter |
|
|
vad16k |
Word32 |
vad flag |
|
|
nbSpeechFrames16k |
Word32 |
number of speech frames counter |
|
|
hangOver16k |
Word32 |
hang over used for VAD |
|
|
meanEn16k |
Word32 |
mean Energy variable |
|
|
nb_frame_threshold_nse |
Word32 |
threshold NSE for frame |
|
|
lambda_nse |
Word16 |
lambda NSE variable |
|
|
*dataHP_B |
Word32 |
buffer stores QMF HP value |
|
|
dec_16k |
Word16[5] |
Multiplier for dataHP_B buffer |
|
|
BFC_dec |
Word16[1] |
Multiplier for computing bands for coding |
|
|
fb16k_dec |
Word16[3] |
Buffer used to store the multiplier for the current and previous two frames |
|
|
PostProcStructX |
|||
|
weightLMS |
Word32[12] |
Current LMS weight |
|
|
CompCepsStructX |
|||
|
FFTLength |
Word32 |
FFT size |
|
|
Do16khzProc |
Word16 |
Flag to enable 16kHz processing |
|
|
*pData16k |
Word32 |
Pointer to data for 16 kHz processing |
|
|
WaveProcStructX |
|||
|
*TeagerFilter16 |
Word32 |
Pointer to teager filter |
|
|
*TeagerWindow32 |
Word32 |
Pointer to teager window |
|
|
TeagerOnset |
Word32 |
Unused |
|
|
FrameLength |
Word32 |
Input frame length |
|
|
ns_var_F |
|||
|
SampFreq |
Word16 |
Sampling frequency (8/16) |
|
|
Do16khzProc |
Word16 |
Flag to enable 16kHz processing |
|
|
buffers.nbFramesInFirstStage |
Word32 |
number of frames in first stage |
|
|
buffers.nbFramesInSecondStage |
Word32 |
number of frames in second stage |
|
|
buffers.nbFramesOutSecondStage |
Word32 |
number of frames out of second stage |
|
|
buffers.FirstStageIn16Buffer |
Word16[180] |
First stage buffer |
|
|
buffers.SecondStageInBuffer32 |
Word32[180] |
Second stage buffer |
|
|
buffers.SecondDecalSig |
Word16[4] |
Shift factor for each sub-frame of second stage buffer |
|
|
prevSamples32.lastSampleIn32 |
Word32 |
Last input sample of DC offset compensation |
|
|
prevSamples32.lastDCOut32 |
Word32 |
last output sample of DC offset compensation |
|
|
prevSamples32.oldShift |
Word16 |
previous window shift factor of DC offset compensation |
|
|
spectrum.indexBuffer1 |
Word16 |
Where to enter new PSD for first stage, alternatively 0 and 1 |
|
|
spectrum.indexBuffer2 |
Word16 |
Where to enter new PSD for second stage, alternatively 0 and 1 |
|
|
spectrum.noiseSE1_32 |
Word32[65] |
Noise spectrum estimate for first stage |
|
|
spectrum.noiseSE1_dec |
Word16[65] |
Shift factor for Noise spectrum estimate (first stage) |
|
|
spectrum.noiseSE2_32 |
Word32[65] |
Noise spectrum estimate for second stage |
|
|
spectrum.noiseSE2_dec |
Word16[65] |
Shift factor for Noise spectrum estimate (second stage) |
|
|
spectrum.PSDMeanAntBuffer1 |
Word32[65] |
1st stage PSD Mean buffer for precedent frame |
|
|
spectrum.nSigSE1Ant_dec |
Word16[65] |
Shift factor for PSD Mean buffer for precedent frame (1st stage) |
|
|
spectrum.PSDMeanAntBuffer2 |
Word32[65] |
2nd stage PSD Mean buffer for precedent frame |
|
|
spectrum.nSigSE2Ant_dec |
Word16[65] |
Shift factor for PSD Mean buffer for precedent frame (2nd stage) |
|
|
spectrum.denSigSE1_32 |
Word32[65] |
1st stage PSD Mean buffer |
|
|
spectrum.nSigSE1Cur_dec |
Word16[65] |
Shift factor for PSD Mean buffer (1st stage) |
|
|
spectrum.denSigSE2_32 |
Word32[65] |
2nd stage PSD Mean buffer |
|
|
spectrum.nSigSE2Cur_dec |
Word16[65] |
Shift factor for PSD Mean buffer (2nd stage) |
|
|
vad_data_ns_F.nbFrame |
Word16[2] |
Number of frames (for the 2 stages) |
|
|
vad_data_ns_F.flagVAD |
Word16 |
VAD flag (1 = SPEECH, 0 = NON SPEECH) |
|
|
vad_data_ns_F.hangOver |
Word16 |
hangover |
|
|
vad_data_ns_F.nbSpeechFrames |
Word16 |
Number of speech frames (used to set hangover) |
|
|
vad_data_ns_F.meanEn32 |
Word32 |
Mean energy for VAD |
|
|
vad_data_ca.flagVAD |
Word16 |
VAD flag (1 = SPEECH, 0 = NON SPEECH) |
|
|
vad_data_ca.hangOver |
Word16 |
hangover |
|
|
vad_data_ca.nbSpeechFrames |
Word16 |
Number of speech frames (used to set hangover) |
|
|
vad_data_ca.meanEn32 |
Word32 |
Mean energy for VAD |
|
|
vad_data_fd.MelMean |
Word16 |
SpeechQMel (for frame dropping) |
|
|
vad_data_fd.VarMean |
Word32 |
SpeechQVar (for frame dropping) |
|
|
vad_data_fd.AccTest |
Word32 |
SpeechQSpec (for frame dropping) |
|
|
vad_data_fd.AccTest2 |
Word32 |
||
|
vad_data_fd.SpecMean |
Word32 |
SpecMean (for frame dropping) |
|
|
vad_data_fd.MelValues |
Word16[2] |
SpeechQMel (for frame dropping) |
|
|
vad_data_fd.SpecValues |
Word32 |
SpeechQSpec (for frame dropping) |
|
|
vad_data_fd.SpeechInVADQ |
Word16 |
Flag (for frame dropping) |
|
|
vad_data_fd.SpeechInVADQ2 |
Word16 |
Flag (for frame dropping) |
|
|
gainFact.logDenEn1_32 |
Word32[3] |
Denoise frame energy for gain factorization |
|
|
gainFact.lowSNRtrack32 |
Word32 |
Low SNR level for gain factorization |
|
|
gainFact.alfaGF16 |
Word16 |
Wiener filter gain factorization coefficient |
|
|
VADStructX_F |
|||
|
Focus |
Word16 |
Position in circular buffer |
|
|
HangOver |
Word16 |
Hangover length |
|
|
FlushFocus |
Word16 |
Position in circular buffer when emptying at end |
|
|
H_CountDown |
Word16 |
Main hangover countdown |
|
|
V_CountDown |
Word16 |
Short hangover countdown |
|
|
**OutBuffer |
Word32 |
Pointer to the outBuffer pointer |
|
|
*OutBuffer |
Word32[7] |
outBuffer pointer |
|
|
OutBuffer |
Word16[7×15] |
outBuffer |
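
The VADStructX_F rows above describe a position index (Focus) into a 7×15 output buffer. As an illustration only, the following minimal sketch shows how such a circular buffer could be written and advanced. The struct name `VadBufferSketch`, the function `vad_sketch_push`, the `Word16`/`Word32` typedefs, and the wrap-around rule are assumptions made for this example and are not taken from the reference C code; the two pointer members listed in the table are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed-width equivalents of the basic-operator types. */
typedef int16_t Word16;
typedef int32_t Word32;

#define VAD_BUF_FRAMES 7   /* first dimension of OutBuffer in the table  */
#define VAD_FRAME_LEN  15  /* second dimension of OutBuffer in the table */

/* Hypothetical mirror of the VADStructX_F rows; layout is illustrative. */
typedef struct {
    Word16 Focus;        /* position in circular buffer                 */
    Word16 HangOver;     /* hangover length                             */
    Word16 FlushFocus;   /* position when emptying the buffer at end    */
    Word16 H_CountDown;  /* main hangover countdown                     */
    Word16 V_CountDown;  /* short hangover countdown                    */
    Word16 OutBuffer[VAD_BUF_FRAMES][VAD_FRAME_LEN];
} VadBufferSketch;

/* Store one frame at Focus, then advance Focus with wrap-around.
 * The modulo update is an assumption, not the reference behaviour. */
static void vad_sketch_push(VadBufferSketch *s, const Word16 *frame)
{
    assert(s->Focus >= 0 && s->Focus < VAD_BUF_FRAMES);
    memcpy(s->OutBuffer[s->Focus], frame, sizeof s->OutBuffer[0]);
    s->Focus = (Word16)((s->Focus + 1) % VAD_BUF_FRAMES);
}
```

With this layout, the eighth frame pushed overwrites the slot that held the first one, which is the behaviour a 7-entry circular buffer is expected to have.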
Table 7b: VQ static variables
|
Struct Name |
Variable |
Type [Length] |
Description |
|
coder_VAD.c |
four_frames[27] |
Word16[27] |
Previous frames used to build multiframe |
|
plwQPHistory[3] |
Word32[3] |
History of Pitch |
|
|
IReliableFlag |
Word16 |
Pitch reliability flag |
Table 7c: Extension static variables
|
Struct Name |
Variable |
Type[Length] |
Description |
|
iFirstFrameFlag |
Word16 |
First frame flag |
|
|
pswUBSpeech |
Word16[200] |
Upper band speech |
|
|
pswDownSampledProcSpeech |
Word16[75] |
Down-sampled processed speech |
|
|
lwCritMax |
Word32 |
Maximum power ratio |
|
|
iOldPitchPeriod |
Word16 |
Old pitch period value |
|
|
iOldFrameNo |
Word16 |
Old frame number |
|
|
PCORR_STATE_be |
s_be |
||
|
lwX1_X1 |
Word32 |
X1*X1 |
|
|
lwZ1_Z1 |
Word32 |
Z1*Z1 |
|
|
lwZ2_Z2 |
Word32 |
Z2*Z2 |
|
|
lwX1_Z1 |
Word32 |
X1*Z1 |
|
|
lwX1_Z2 |
Word32 |
X1*Z2 |
|
|
lwZ1_Z2 |
Word32 |
Z1*Z2 |
|
|
swX1_Sum |
Word16 |
Sum of X1 |
|
|
swZ1_Sum |
Word16 |
Sum of Z1 |
|
|
swZ2_Sum |
Word16 |
Sum of Z2 |
|
|
iBurstConst |
Word16 |
Burst constant |
|
|
iBurstCount |
Word16 |
Burst count |
|
|
iHangConst |
Word16 |
Hang constant |
|
|
iHangCount |
Word16 |
Hang count |
|
|
iVADThld |
Word16 |
VAD threshold |
|
|
iFrameCount |
Word16 |
Frame count |
|
|
iFUpdateFlag |
Word16 |
Forced update flag |
|
|
iHysterCount |
Word16 |
Hysteresis count |
|
|
iLastUpdateCount |
Word16 |
Last update count |
|
|
iSigThld |
Word16 |
Signal threshold |
|
|
iUpdateCount |
Word16 |
Update count |
|
|
iChanEnrgShift |
Word16 |
Channel energy shift |
|
|
iChanNoiseEnrgShift |
Word16 |
Channel noise energy shift |
|
|
pswChanEnrg |
Word16[23] |
Channel energy |
|
|
pswChanNoiseEnrg |
Word16[23] |
Channel noise energy |
|
|
swBeta |
Word16 |
Beta value |
|
|
swSnr |
Word16 |
SNR value |
|
|
NormSw |
pnsLogSpecEnrgLong |
||
|
swMantissa |
Word16[23] |
Mantissa |
|
|
iShift |
Word16[23] |
Shift |
|
|
swC0 |
Word16 |
C0 value |
|
|
swC1 |
Word16 |
C1 value |
|
|
swC2 |
Word16 |
C2 value |
|
|
pswHpfXState |
Word16[6] |
High pass filter input state |
|
|
pswHpfYState |
Word16[12] |
High pass filter output state |
|
|
pswLpfXState |
Word16[6] |
Low pass filter input state |
|
|
pswLpfYState |
Word16[12] |
Low pass filter output state |
|
|
pswLfeXState |
Word16 |
Low frequency emphasis filter input state |
|
|
pswLfeYState |
Word16[2] |
Low frequency emphasis filter output state |
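
To make the Type [Length] column concrete, the filter-state rows at the end of Table 7c can be read as C declarations. The sketch below is illustrative only: the grouping into a single struct, the names `FilterStateSketch`, `filter_state_words` and `filter_state_reset`, and the mapping of `Word16` to `int16_t` are assumptions for this example, not the organization used in the attached C code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int16_t Word16;  /* assumed 16-bit signed basic-operator type */

/* Hypothetical grouping of the filter-state rows of Table 7c. */
typedef struct {
    Word16 pswHpfXState[6];   /* high pass filter input state            */
    Word16 pswHpfYState[12];  /* high pass filter output state           */
    Word16 pswLpfXState[6];   /* low pass filter input state             */
    Word16 pswLpfYState[12];  /* low pass filter output state            */
    Word16 pswLfeXState;      /* listed as a scalar Word16 in the table  */
    Word16 pswLfeYState[2];   /* low frequency emphasis output state     */
} FilterStateSketch;

/* Number of 16-bit words of static memory held by the filter states:
 * 6 + 12 + 6 + 12 + 1 + 2 = 39. */
static size_t filter_state_words(void)
{
    return sizeof(FilterStateSketch) / sizeof(Word16);
}

/* Clear all filter memories, for example before a new input file. */
static void filter_state_reset(FilterStateSketch *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
}
```

Summing the lengths in this way gives a quick estimate of the static RAM that each group of rows contributes.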