A.1 Notations
3GPP TS 26.077, Release 17: Minimum performance requirements for Noise Suppresser; Application to the Adaptive Multi-Rate (AMR) speech encoder
The following notations are used in this document:
– The operator AMR() corresponds to applying the AMR speech encoder and decoder on the input.
– The operator NR() corresponds to applying the NS algorithm, and the AMR speech encoder and decoder on the input.
– The clean speech signals are referred to as $s_i$, $i = 1$ to $I$.
– The noise signals are referred to as $n_j$, $j = 1$ to $J$.
– The noisy speech test signals are referred to as $d_{ij} = \beta_{ij}(\mathrm{SNR})\, n_j + s_i$, $i = 1$ to $I$, $j = 1$ to $J$, where $d_{ij}$ is built by adding $s_i$ and $n_j$ with a pre-specified SNR as presented below.
– The processed signals are referred to as $y_{ij} = \mathrm{NR}(d_{ij})$.
– The reference signal in the calculations shall be either the noisy speech test signal $d_{ij}$ itself or $d_{ij}$ processed by the AMR speech codec without NS processing. The latter signal will be referred to as $c_{ij} = \mathrm{AMR}(d_{ij})$, $i = 1$ to $I$, $j = 1$ to $J$. The relevant reference signal will be indicated in the formulation of each objective measure below.
– The notation Log() indicates the decimal logarithm.
– $\beta_{ij}(\mathrm{SNR})$ is the scaling factor to be applied to the background noise signal $n_j$ in order to obtain a ratio SNR (in dB) between the clean speech signal $s_i$ and $n_j$. The scaling of the input speech and noise signals is to be carried out according to the following procedure:
1) The clean speech material is scaled to a desired dBov level with the ITU-T Recommendation P.56 [7] speech voltmeter, one file at a time, each file including a sequence of one to four utterances from one speaker.
2) A silence period of 2 s is inserted at the beginning of each of the resulting files to make up augmented clean speech files.
3) Within each noise type and level, a noise sequence is selected for every speech utterance file, each with the same length as the corresponding speech file, and each noise sequence is stored in a separate file.
4) Each of the noise sequences is scaled to a dBov level leading to the SNR condition corresponding to the $\beta_{ij}(\mathrm{SNR})$ value in each of the test cases by applying the RMS level based scaling according to ITU-T Recommendation P.56 [7].
– The determination of which frames contain active speech is to be carried out with reference to the ITU-T Recommendation P.56 [7] active speech level measurement and is related to the classification of the frames into the presented speech power classes which is explained below.
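The four-step scaling procedure above can be illustrated with a minimal Python sketch. It is not part of the specification and rests on several assumptions: samples are floating-point values normalized so that full scale is 1.0, a plain RMS measurement stands in for the P.56 [7] speech voltmeter (which measures the active speech level and would generally give a different value for speech), and the names rms_dbov, scale_to_level, build_noisy_signal, speech_dbov, fs and silence_s are hypothetical. The gain applied in each scaling step is 10^((target level - measured level)/20).

```python
import math

def rms_dbov(x):
    """RMS level in dB relative to full scale 1.0 (assumed stand-in for a dBov reading).
    Raises ValueError for an all-zero input, since log10(0) is undefined."""
    power = sum(v * v for v in x) / len(x)
    return 10.0 * math.log10(power)

def scale_to_level(x, target_dbov):
    """Apply a single gain so that the RMS level of x becomes target_dbov."""
    gain = 10.0 ** ((target_dbov - rms_dbov(x)) / 20.0)
    return [gain * v for v in x]

def build_noisy_signal(speech, noise, speech_dbov, snr_db, fs=8000, silence_s=2.0):
    """Sketch of steps 1) to 4): returns the sample-wise sum of scaled noise and speech."""
    # Step 1: scale the clean speech (plain RMS here, NOT the P.56 active level).
    s = scale_to_level(speech, speech_dbov)
    # Step 2: insert the silence period at the beginning of the speech.
    s = [0.0] * int(round(silence_s * fs)) + s
    # Step 3: take a noise sequence of the same length as the augmented speech.
    if len(noise) < len(s):
        raise ValueError("noise sequence is shorter than the augmented speech file")
    n = noise[:len(s)]
    # Step 4: scale the noise to the level that yields the requested SNR.
    n = scale_to_level(n, speech_dbov - snr_db)
    return [a + b for a, b in zip(s, n)]
```

In this sketch the noise gain computed in step 4 plays the role of the scaling factor applied to the noise signal, and the leading silence leaves a noise-only segment at the start of each test signal. The -26 dBov level and 10 dB SNR used when trying it out are arbitrary example values, not values taken from this document.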