A.1 Decoder Test
3GPP TS 26.444, Release 17: Codec for Enhanced Voice Services (EVS); Test sequences
A.1.1 General Considerations
The reference PCM signals are taken from the decoded floating-point test sequences of this specification. The PCM signals under test are obtained by running the floating-point bit-streams included in this specification through the Decoder under Test (Figure A.1). The reference decoder is the floating-point code of TS 26.443 [8].
Figure A.1: Flow diagram for the decoder test using signal-based metrics
All metrics are calculated on the reference PCM signal and the PCM signal under test in 20 ms frames. The frames of the two signals must be time aligned, which means that the delay compensation in the EVS encoder and decoder remains ON (the default configuration). Furthermore, the frame processing is aligned with the encoded frames by adding the decoder delay. Table A.1 shows the delay values used for the different sampling frequencies.
Table A.1: Delay used for alignment of processing frames with encoded frames
| Sampling frequency | 8000 Hz | 16000 Hz | 32000 Hz | 48000 Hz |
|---|---|---|---|---|
| Delay (samples) | 10 | 37 | 74 | 111 |
The number of samples in a 20 ms frame is defined by $N = 0.02 \cdot f_s$, where $f_s$ represents the sampling rate in Hz.
The reference PCM signal $x(n)$ and the PCM signal under test $y(n)$ should be scaled between -1 and 1.
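The framing and scaling described above can be sketched as follows. This is a minimal illustration, not the conformance tool: the function names are hypothetical, and skipping the Table A.1 delay before the first frame is an assumed reading of "adding the decoder delay".

```python
# Delay values from Table A.1 (samples), keyed by sampling frequency in Hz.
DELAY_SAMPLES = {8000: 10, 16000: 37, 32000: 74, 48000: 111}


def frame_length(fs_hz):
    """Number of samples in one 20 ms frame at sampling rate fs_hz."""
    return fs_hz // 50


def scale_int16(samples):
    """Scale 16-bit integer PCM samples to floats in the range [-1, 1)."""
    return [s / 32768.0 for s in samples]


def split_frames(signal, fs_hz):
    """Split a PCM signal into whole 20 ms frames.

    Assumption: alignment is done by skipping DELAY_SAMPLES[fs_hz] samples
    before the first frame. Trailing samples that do not fill a whole
    frame are dropped.
    """
    n = frame_length(fs_hz)
    usable = signal[DELAY_SAMPLES[fs_hz]:]
    count = len(usable) // n
    return [usable[i * n:(i + 1) * n] for i in range(count)]
```

Applying the same `split_frames` call to the reference signal and to the signal under test keeps the frame boundaries of the two signals identical, which is what the per-frame metrics require.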
A.1.2 Metrics
A.1.2.1 RMS Error Threshold
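The RMS method is derived from the decoder conformance used in ISO/IEC 14496-26 [10]. The RMS error is calculated for each 20 ms frame and compared to a threshold according to:
$$\sqrt{\frac{1}{N}\sum_{n=0}^{N-1}\bigl(x(n)-y(n)\bigr)^{2}} \le T_{RMS}$$
where $x(n)$ and $y(n)$ are the two PCM signals and $N$ is the number of samples per frame.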
The value chosen for the RMS error threshold assumes a change in the last bit of the audio signal:
$$T_{RMS} = \frac{2^{-(K-1)}}{\sqrt{12}}$$
with $K = 16$.
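A minimal sketch of the per-frame RMS check is given below. The function names are hypothetical, the threshold is passed in by the caller rather than fixed here, and the use of a non-strict comparison is an assumption.

```python
import math


def rms_error(ref_frame, test_frame):
    """Root-mean-square difference between two equal-length frames."""
    if len(ref_frame) != len(test_frame) or not ref_frame:
        raise ValueError("frames must be non-empty and of equal length")
    acc = sum((x - y) ** 2 for x, y in zip(ref_frame, test_frame))
    return math.sqrt(acc / len(ref_frame))


def passes_rms(ref_frame, test_frame, threshold):
    """True if the frame's RMS error does not exceed the threshold.

    Assumption: the comparison is non-strict (<=).
    """
    return rms_error(ref_frame, test_frame) <= threshold
```

Both frames are expected to hold samples already scaled to the range -1 to 1, so that the threshold is expressed relative to full scale.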
A.1.2.2 Signal to Noise Ratio (SNR)
The segmental SNR method is derived from the decoder conformance used in ISO/IEC 14496-26 [10]. For each 20 ms segment, the following values need to be calculated:
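Energy of the reference signal $x(n)$: $E_{ref} = \sum_{n=0}^{N-1} x(n)^{2}$
Noise energy, with $y(n)$ the signal under test: $E_{noise} = \sum_{n=0}^{N-1} \bigl(x(n)-y(n)\bigr)^{2}$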
Signal to noise ratio with
As EVS is a switched codec containing an LPC-based speech coder and an MDCT-based transform coder, the SNR values vary significantly depending on the coding mode used. A constant SNR threshold is therefore not suitable; instead, a reference value per frame and test vector should be specified. The SNR should be compared against the thresholds by
$$SNR(i,j) \ge T_{snr}(i,j) - \mathrm{SNRHEADROOM}$$
where $i$ is a 20 ms frame index and $j$ is the test vector index.
The set of SNR reference values is included in the zip file. This set was obtained using the reference implementations listed in clause A.4.
A.1.2.3 Spectral Distortion
The spectral distortion method can be conducted on a 20 ms frame basis by the following steps:
Calculate the absolute FFT spectra $X(k)$ and $Y(k)$ of $x(n)$ and $y(n)$ using a Hanning window
with
The factor 32768 accounts for MATLAB scaling and aligns the values with the 16-bit PCM range of the C code. This scaling depends on the input value range.
For all spectral bins the distortion d is calculated according to the following pseudo code:
cnt = 0
d = 0
for k = 1..N/2-1
    if (X(k)==0 && Y(k)==0)
        X_Y = 1;
        Y_X = 1;
    else
        if (X(k)==0)
            X_Y = 0;
            Y_X = 2;
        else if (Y(k)==0)
            X_Y = 2;
            Y_X = 0;
        else
            X_Y = (X(k) * X(k)) / (Y(k) * Y(k));
            Y_X = (Y(k) * Y(k)) / (X(k) * X(k));
        end
    end
    COSH = (X_Y + Y_X - 2)/2;
    d = d + COSH;
    cnt = cnt + 1;
end
d = d/cnt;
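The pseudo code above can be transcribed into a runnable form as sketched below. The sketch assumes that the operands of the comparisons and products are the magnitude spectra X(k) and Y(k) of the two signals, and it takes those spectra as already computed inputs, since the exact window and FFT scaling are not reproduced here. The function name is hypothetical.

```python
def spectral_distortion(X, Y, n):
    """Mean COSH-type distortion over bins k = 1 .. n/2 - 1.

    X, Y: magnitude spectra indexed by bin number (assumed inputs).
    n:    FFT length, equal to the number of samples per frame.
    """
    if n < 4:
        raise ValueError("n must be at least 4 so that one bin is evaluated")
    d = 0.0
    cnt = 0
    for k in range(1, n // 2):
        xk, yk = X[k], Y[k]
        if xk == 0 and yk == 0:
            x_y, y_x = 1.0, 1.0      # both bins empty: no distortion
        elif xk == 0:
            x_y, y_x = 0.0, 2.0      # avoids division by zero
        elif yk == 0:
            x_y, y_x = 2.0, 0.0      # avoids division by zero
        else:
            x_y = (xk * xk) / (yk * yk)
            y_x = (yk * yk) / (xk * xk)
        d += (x_y + y_x - 2.0) / 2.0
        cnt += 1
    return d / cnt
```

For example, if every evaluated bin of X is twice the corresponding bin of Y, each bin contributes (4 + 0.25 - 2)/2 = 1.125, so the function returns 1.125.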
The distortion value $d$ is to be compared against a threshold $T_{sd}$. The frame will be considered as passed if
with
A.1.3 Analysis Flow and Reporting
The three metrics are computed in a specific order, as shown in Figure A.2. Once a frame passes a metric, the process is stopped and the next frame is analysed. The SNR metric is computed on the frames failing the RMS error criterion. Similarly, the Spectral Distortion metric is computed on the frames failing the SNR criterion.
Figure A.2: Flow chart for decoder tool
In a file, one or two frames may be slightly above the threshold. To avoid relaxing the threshold, a constraint on the number of failing frames per file has been added as an additional criterion.
If number_of_frames_failing <= THRESH_GOOD_FRAMES_TO_PASS * number_of_frames_in_file, the test signal will be considered equivalent to the reference signal.
All the test sequences need to pass for the implementation to be conformant.
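The cascade of metrics and the per-file criterion can be sketched as follows. The function names and the representation of each metric as a callable are hypothetical; only the ordering, the early stop, and the value of THRESH_GOOD_FRAMES_TO_PASS (Table A.3) are taken from the text.

```python
THRESH_GOOD_FRAMES_TO_PASS = 0.005  # from Table A.3


def first_passing_metric(checks):
    """Run the metric checks in order and stop at the first one that passes.

    checks: ordered list of (name, zero-argument callable returning bool),
            for example RMS, then SNR, then spectral distortion.
    Returns the name of the first passing metric, or None if all fail.
    Later checks are not evaluated once an earlier one has passed.
    """
    for name, check in checks:
        if check():
            return name
    return None


def file_passes(frame_results, factor=THRESH_GOOD_FRAMES_TO_PASS):
    """Apply the per-file criterion to a list of per-frame results.

    frame_results: one entry per frame, None marking a failing frame.
    """
    failing = sum(1 for result in frame_results if result is None)
    return failing <= factor * len(frame_results)
```

With a factor of 0.005, a file of 400 frames tolerates at most two failing frames. Passing the checks as callables keeps the more expensive SNR and spectral distortion computations from running on frames that already pass the RMS check.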
In addition to the number of fail/pass test sequences, the statistics from the three methods should be displayed. Table A.2 shows an example of reporting.
Table A.2: Template for result presentation
| | RMS | WSNR | Spectral Distortion |
|---|---|---|---|
| Number of frames tested | | | |
| Number of frames passing | | | |
| Number of frames failing | | | |
| Ratio of frames passing | | | |
| Ratio of frames failing | | | |
As part of the conformance criteria, thresholds are set for the ratio of frames passing the RMS and WSNR tests (Ratio_RMSframespassing and RatioWSNRframespassing, respectively).
The thresholds used in the decoder test are summarized in Table A.3.
Table A.3: List of thresholds
| Threshold | Description | Value |
|---|---|---|
| SNRHEADROOM | Headroom compared to the Tsnr threshold | 3 dB |
| CDSNRMAX | Limit of SNR for the spectral distortion test | 0 dB |
| CDSNRHEADROOM | Headroom compared to the Tsnr threshold for the spectral distortion test | 10 dB |
| Tsd | Threshold for the spectral distance | 6.6 |
| THRESH_GOOD_FRAMES_TO_PASS | Factor for the number of failing frames per file | 0.005 |
| Ratio_RMSframespassing | Minimal percentage of frames passing the RMS error test | 47% |
| RatioWSNRframespassing | Minimal percentage of frames passing the WSNR test | 95% |
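The reporting rows of Table A.2 and the two ratio thresholds of Table A.3 can be combined as in the sketch below. The function names are hypothetical. Two points are assumptions: each ratio is taken relative to the number of frames tested by that metric, and a metric with no tested frames is treated as meeting its minimum.

```python
# Minimum passing ratios from Table A.3, expressed as fractions.
MIN_PASSING_RATIO = {"RMS": 0.47, "WSNR": 0.95}


def metric_report(tested, passing):
    """Build the Table A.2 rows for one metric from two frame counts."""
    if tested < 0 or passing < 0 or passing > tested:
        raise ValueError("counts must satisfy 0 <= passing <= tested")
    failing = tested - passing
    return {
        "Number of frames tested": tested,
        "Number of frames passing": passing,
        "Number of frames failing": failing,
        # Ratios are undefined (None) when no frame was tested.
        "Ratio of frames passing": passing / tested if tested else None,
        "Ratio of frames failing": failing / tested if tested else None,
    }


def meets_ratio(metric, report):
    """True if the passing ratio reaches the minimum for the given metric.

    Assumption: a metric with no tested frames cannot violate its minimum.
    """
    ratio = report["Ratio of frames passing"]
    return ratio is None or ratio >= MIN_PASSING_RATIO[metric]
```

For instance, `metric_report(1000, 480)` gives a passing ratio of 0.48, which meets the 47% minimum for the RMS test.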