C.8 Experiments 2a, 2b & 2c: No degradation of Speech and no Undesirable Effects in Residual Noise in Conditions with Background Noise (ACR)
3GPP TS 26.077, Release 17: Minimum performance requirements for Noise Suppresser; Application to the Adaptive Multi-Rate (AMR) speech encoder
C.8.1 Introduction
These ACR experiments are designed to test the requirement "No degradation of Speech and no Undesirable Effects in Residual Noise" in the Minimum Performance Requirements for Noise Suppresser Application to the AMR Speech Encoder [1]. They will be run for three types of acoustic background noise.
C.8.2 Test Factors and Conditions
The ACR test will be run for the following three types of acoustic background noise:
– A car noise that is stationary both in level and in spectrum.
– A street noise that is non-stationary in level but fairly stationary in spectrum.
– A babble noise that is fairly stationary in level but non-stationary in spectrum.
This results in a total of three ACR experiments, one for each noise type. Within each experiment, a low, a medium and a high SNR level will be tested. The values for the low SNR are SNR_C = 6 dB for the car noise, SNR_S = 9 dB for the street noise, and SNR_B = 9 dB for the babble noise. The medium and high SNRs will be equal to SNR + 6 dB and SNR + 12 dB, respectively, for all three noise types. The noise samples will have been recorded in scenarios representative of the respective low SNR value for each noise type (i.e. SNR = 6 or 9 dB).
All three experiments are run at the AMR bit rates of 12.2 kbit/s and 5.9 kbit/s.
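The three SNR levels per noise type follow directly from the low SNR values above. A minimal sketch of that arithmetic; the dictionary and function names are illustrative and not part of the specification:

```python
# Low SNR per noise type in dB, as stated in the text above.
LOW_SNR_DB = {"car": 6, "street": 9, "babble": 9}

# Offsets of the low, medium and high SNR levels relative to the low SNR.
SNR_OFFSETS_DB = (0, 6, 12)


def snr_levels(noise_type: str) -> list:
    """Return the low, medium and high SNR (in dB) for one noise type."""
    low = LOW_SNR_DB[noise_type]
    return [low + offset for offset in SNR_OFFSETS_DB]


for noise in LOW_SNR_DB:
    print(noise, snr_levels(noise))
```

The resulting values should agree with the "Acoustic Background Noise" row of the factors table.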
The following table shows the testing factors to be used in these experiments. A full list of test conditions is given in Clause 8.12.
Table 8.2.1: Factors and conditions for Experiments 2a, 2b, 2c
Main Codec Conditions | # | Notes
Noise Suppresser Algorithms | 1 |
Codec | 1 | AMR
Codec Modes | 2 | 12.2 kbps rate, 5.9 kbps rate
BERs | 0 | Clear channel, no transmission errors
Input level | 3 | nominal (high, low): -26 dB (-16 dB, -36 dB) relative to OVL
Acoustic Background Noise | 3 | Static Car @ 6 dB, 12 dB, 18 dB; Street @ 9 dB, 15 dB, 21 dB; Babble @ 9 dB, 15 dB, 21 dB
Input Characteristic | 1 | GSM Filtered
VAD/CNG/DTX | 2 | ON only at the nominal level, medium SNR values, zero value of Ideal NS. One VAD/CNG/DTX will be used; either VAD Option 1 or 2, depending on the implementers choice
Codec references | # | Notes
All Experiments | 1 | AMR wo/ NS
Other references | # | Notes
Direct | | Nominal level, GSM Filtered
MNRU, Exp 2a, 2b, 2c | 5 | Nominal level, with background noise, GSM Filtered, Q = 6, 12, 18, 24, 30 dB
Ideal Noise Suppression | 6 | 3 levels for each SNR
Common Conditions | # | Notes
GSM Channel | 0 | NO channel model
Number of talkers | 4 | 2 male + 2 female
Number of speech samples | 28 | 6/ talker for the main test + 1/ talker for the Practice session
Listening Level | 1 | -15 dBPa (79 dB SPL) at ERP
Listeners | 24 | Naive Listeners
Randomizations | 6 | 6 groups of 4 listeners
Rating Scale | 1 | Modified ACR Instructions
Replications | 1 | Original Presentation Only
C.8.3 Preliminary Conditions
The following 16 preliminary test conditions are recommended.
Table 8.3.1: List of preliminary conditions
Cond. | Presentation order | SNR value | Ideal NS (dB) | Codec | Talker and Sample Number
P1 | 5 | SNR | – | Direct | M1S07
P2 | 1 | SNR | – | MNRU-12 | M2S07
P3 | 3 | SNR | – | AMR@12.2 | M1S07
P4 | 7 | SNR | 7 | AMR@12.2 | M2S07
P5 | 6 | SNR+6 | 7 | AMR@12.2 | F1S07
P6 | 2 | SNR+12 | 7 | AMR@12.2 | F2S07
P7 | 4 | SNR | – | AMR@5.9 | F1S07
P8 | 8 | SNR+12 | – | AMR@5.9 | F2S07
P9 | 14 | SNR | – | Direct | F1S07
P10 | 10 | SNR | – | MNRU-12 | F2S07
P11 | 12 | SNR | – | AMR@12.2 | F1S07
P12 | 16 | SNR | 7 | AMR@12.2 | F2S07
P13 | 13 | SNR+6 | 7 | AMR@12.2 | M1S07
P14 | 9 | SNR+12 | 7 | AMR@12.2 | M2S07
P15 | 11 | SNR | – | AMR@5.9 | M1S07
P16 | 15 | SNR+12 | – | AMR@5.9 | M2S07
C.8.4 Speech Material
The speech material should be as defined in Clause 6.4 – Long Sentence Quads, with each sample containing 4 sentences. For each test condition there are:
– 6 samples / talker, each sample 16 s long with 4 sentences
– 24 unique sentences / talker
For the practice conditions there are:
– 1 sample / talker
– 4 unique sentences / talker
To reduce any speech material effect, each talker sample must be unique. For these experiments, the unique samples are not balanced across all conditions, candidates and subject groups. The same sample numbers for each talker are used for common conditions within a subject group and changed across subject groups. For a given language, the same speech material must be used for the three experiments 2a, 2b and 2c.
Speech samples numbered from 01 to 06 should be used for the test conditions; speech samples numbered as 07 should be used for the Practice session.
The noise material and its mix with the speech material should be as defined in Clause 6.10 and Clause 8.2.
C.8.5 Experimental Design
The design is based on a restricted randomization philosophy using 6 different randomizations, each one covered by a group of 4 of the 24 subjects. This means that up to 4 subjects can perform the experiment simultaneously.
Each subject will hear all of the conditions four times, once with speech from each of the four talkers. Over the experiment as a whole, each of the conditions will be paired with six different samples from each of the four talkers. Each of the six groups of subjects will hear different combinations of source material and condition.
C.8.6 Processing
Every condition has to be processed for each of the six stimuli of each of the four primary talkers. The actual samples used for each condition by each subject group are presented in Clause 8.12 Test Conditions.
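As an illustration of how the sample-number sequences in the test conditions table could be turned into a list of stimuli to process, the sketch below makes two assumptions that the text does not state explicitly: the position in each six-number sequence corresponds to subject groups 1 to 6, and stimulus labels follow the `M1S07` pattern used in the preliminary conditions table. Only three table rows are copied here; the names are hypothetical.

```python
TALKERS = ("M1", "M2", "F1", "F2")  # 2 male + 2 female talkers

# Sample-number sequences copied from three rows of the test conditions table.
# Assumption: position k in a sequence belongs to subject group k (1..6).
SAMPLE_SEQUENCES = {
    1: (4, 5, 6, 1, 2, 3),   # Direct
    7: (1, 2, 3, 4, 5, 6),   # AMR@12.2, SNR
    13: (2, 3, 4, 5, 6, 1),  # AMR@12.2, SNR+6
}


def stimulus_labels(condition: int, group: int) -> list:
    """Return the four talker/sample labels one subject group hears for a condition."""
    sample = SAMPLE_SEQUENCES[condition][group - 1]
    return [f"{talker}S{sample:02d}" for talker in TALKERS]


print(stimulus_labels(1, 1))
print(stimulus_labels(13, 6))
```

Under these assumptions, every condition is processed with all six samples of each talker once the six subject groups are taken together.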
C.8.7 Randomizations
Separate randomizations for each of the six subject groups shall be provided to reduce order effects and to minimize differences between the laboratories. There shall be six randomizations for the sub-experiments, one for each subject group. The same randomizations will be used for the three experiments (2a, 2b and 2c). Each one will therefore be used by four of the 24 subjects. Each randomization shall be balanced across 4 blocks of 36 stimuli to eliminate long sequences of similar conditions or identical talkers. The sequences shall provide for alternating male-female talkers.
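A minimal sketch of one way such a randomization could be generated. It assumes that each block contains every condition once and that talkers are rotated across blocks so that each condition meets each talker exactly once; neither assumption comes from the specification, and the function name, talker labels and seeds are illustrative. The sketch enforces alternating male and female talkers but does not try to break up runs of similar conditions, which would need an additional check.

```python
import random

# Talkers ordered so that male and female labels alternate by index.
TALKERS = ("M1", "F1", "M2", "F2")
N_CONDITIONS = 36
N_BLOCKS = 4


def make_randomization(seed: int) -> list:
    """Build one presentation order: 4 blocks of 36 (condition, talker) pairs."""
    rng = random.Random(seed)
    blocks = []
    for block in range(N_BLOCKS):
        male, female = [], []
        for cond in range(1, N_CONDITIONS + 1):
            # Rotate the talker with the block index so that, over the four
            # blocks, each condition is paired with each talker exactly once.
            talker = TALKERS[(cond + block) % len(TALKERS)]
            (male if talker.startswith("M") else female).append((cond, talker))
        rng.shuffle(male)
        rng.shuffle(female)
        # Interleave the two shuffled lists so male and female talkers alternate.
        blocks.append([stim for pair in zip(male, female) for stim in pair])
    return blocks


# One randomization per subject group (six in total).
randomizations = [make_randomization(seed) for seed in range(1, 7)]
print(randomizations[0][0][:4])
```

Each block holds 18 male-talker and 18 female-talker stimuli, so the alternation also carries over from one block to the next.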
C.8.8 Duration of the ACR Experiments 2a, 2b, and 2c
Each stimulus consists of a 16 s speech sample plus 5 s of voting time, or 21 seconds in total. For each of the three experiments, the 16 preliminary conditions x 21 seconds give 5.6 minutes for an introductory block. The test consists of 36 conditions x 4 talkers x 21 seconds or 50.4 minutes, presented as four 12.6 minute blocks of 36 stimuli, for 56 minutes of testing time per subject group. The 6 groups of 4 subjects require 5 hours and 36 minutes of total testing time.
To reduce the effects of subject fatigue, the blocks should be separated by short comfort breaks.
Note that the above calculations do not include the time needed to give the subjects their instructions, or for comfort breaks.
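The duration figures can be checked with a few lines of integer arithmetic. This is only a sketch of the calculation using the counts given in this clause; the variable names are illustrative.

```python
SAMPLE_S = 16        # length of one speech sample, seconds
VOTING_S = 5         # voting time per stimulus, seconds
N_PRELIMINARY = 16   # preliminary conditions
N_CONDITIONS = 36    # test conditions
N_TALKERS = 4
N_GROUPS = 6         # subject groups of 4 listeners each

stimulus_s = SAMPLE_S + VOTING_S                  # 21 s per stimulus
intro_s = N_PRELIMINARY * stimulus_s              # introductory block
test_s = N_CONDITIONS * N_TALKERS * stimulus_s    # main test
session_s = intro_s + test_s                      # per subject group
total_s = N_GROUPS * session_s                    # all six groups

hours, remainder_s = divmod(total_s, 3600)
minutes = remainder_s // 60

print(f"introductory block: {intro_s / 60:.1f} min")
print(f"main test:          {test_s / 60:.1f} min")
print(f"per subject group:  {session_s / 60:.1f} min")
print(f"all groups:         {hours} h {minutes} min")
```

As in the text, these totals exclude instructions and comfort breaks.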
C.8.9 Votes Per Condition
In each of the three experiments, every condition will have 24 subjects vote on one stimulus from each of four talkers, giving:
(24 subjects x 4 talkers) = 96 votes per condition
From past experience of ACR tests, this is the minimum number of votes per condition needed to give enough statistical certainty to differentiate the performance of one candidate process from another candidate process over the conditions and against the references.
C.8.10 Test Procedure
Factors important for the experimental environment are specified in clauses 6.5 and 6.6. As specified in clause 9.8, comfort breaks should be provided to reduce the effects of subject fatigue.
C.8.11 Opinion Scale
The question asked of the subject is a modification of the ACR Listening Quality Scale. The specific wording is designed to evaluate both the level of distortion of the speech and the presence of artefacts in the residual background noise signal. The subjects will listen to each sample and after it has completed they will be asked to give their opinion.
Annex A contains an example of the instructions for the subjects in English. The instructions in Annex A contain a modified version of the ACR instructions. They are aimed at focusing the subjects on rating artefacts introduced by the NS device. The test administrator should have the freedom to provide guidance to the subjects to reinforce this point, provided that such instructions are consistent across all 24 subjects. This is particularly important for tests not performed in English. Any additional instructions given to the subjects should be reported as an integral part of test reports.
C.8.12 Test Conditions for Experiments 2a, 2b and 2c
Cond. | Input level | SNR value | Ideal NS (dB) | VAD/DTX | Codec | Speech sample number (6 sequences)
1 | nominal | SNR | – | N/A | Direct | 4 5 6 1 2 3
2 | nominal | SNR | – | N/A | MNRU-30 | 4 5 6 1 2 3
3 | nominal | SNR | – | N/A | MNRU-24 | 4 5 6 1 2 3
4 | nominal | SNR | – | N/A | MNRU-18 | 4 5 6 1 2 3
5 | nominal | SNR | – | N/A | MNRU-12 | 4 5 6 1 2 3
6 | nominal | SNR | – | N/A | MNRU-6 | 4 5 6 1 2 3
7 | nominal | SNR | – | off | AMR@12.2 | 1 2 3 4 5 6
8 | nominal | SNR | 4 | off | AMR@12.2 | 1 2 3 4 5 6
9 | nominal | SNR | 7 | off | AMR@12.2 | 1 2 3 4 5 6
10 | nominal | SNR | – | off | AMR@5.9 | 1 2 3 4 5 6
11 | high | SNR | – | off | AMR@12.2 | 1 2 3 4 5 6
12 | high | SNR | – | off | AMR@5.9 | 1 2 3 4 5 6
13 | nominal | SNR+6 | – | off | AMR@12.2 | 2 3 4 5 6 1
14 | nominal | SNR+6 | 4 | off | AMR@12.2 | 2 3 4 5 6 1
15 | nominal | SNR+6 | 7 | off | AMR@12.2 | 2 3 4 5 6 1
16 | nominal | SNR+6 | – | off | AMR@5.9 | 2 3 4 5 6 1
17 | nominal | SNR+6 | – | on | AMR@12.2 | 2 3 4 5 6 1
18 | nominal | SNR+6 | – | on | AMR@5.9 | 2 3 4 5 6 1
19 | low | SNR+6 | – | off | AMR@12.2 | 2 3 4 5 6 1
20 | low | SNR+6 | – | off | AMR@5.9 | 2 3 4 5 6 1
21 | nominal | SNR+12 | – | off | AMR@12.2 | 3 4 5 6 1 2
22 | nominal | SNR+12 | 4 | off | AMR@12.2 | 3 4 5 6 1 2
23 | nominal | SNR+12 | 7 | off | AMR@12.2 | 3 4 5 6 1 2
24 | nominal | SNR+12 | – | off | AMR@5.9 | 3 4 5 6 1 2
25 | nominal | SNR | – | off | AMR/NS@12.2 | 1 2 3 4 5 6
26 | nominal | SNR | – | off | AMR/NS@5.9 | 1 2 3 4 5 6
27 | nominal | SNR+6 | – | off | AMR/NS@12.2 | 2 3 4 5 6 1
28 | nominal | SNR+6 | – | off | AMR/NS@5.9 | 2 3 4 5 6 1
29 | nominal | SNR+12 | – | off | AMR/NS@12.2 | 3 4 5 6 1 2
30 | nominal | SNR+12 | – | off | AMR/NS@5.9 | 3 4 5 6 1 2
31 | nominal | SNR+6 | – | on | AMR/NS@12.2 | 2 3 4 5 6 1
32 | nominal | SNR+6 | – | on | AMR/NS@5.9 | 2 3 4 5 6 1
33 | low | SNR+6 | – | off | AMR/NS@12.2 | 2 3 4 5 6 1
34 | low | SNR+6 | – | off | AMR/NS@5.9 | 2 3 4 5 6 1
35 | high | SNR | – | off | AMR/NS@12.2 | 1 2 3 4 5 6
36 | high | SNR | – | off | AMR/NS@5.9 | 1 2 3 4 5 6
NOTE: Experiment 2a: Car noise with SNR = SNR_C = 6 dB; Experiment 2b: Street noise with SNR = SNR_S = 9 dB; Experiment 2c: Babble noise with SNR = SNR_B = 9 dB.
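Each AMR/NS condition in the table has a counterpart AMR condition with the same input level, SNR, VAD/DTX setting and bit rate, and with no ideal NS. The sketch below derives that pairing by matching rows. The tuple layout and function name are illustrative, and the matching rule is an interpretation of the table rather than something the specification spells out.

```python
# (condition, input level, SNR offset in dB, ideal NS in dB or None, VAD/DTX, codec)
# Rows 7-36 copied from the table above; "SNR", "SNR+6", "SNR+12" become 0, 6, 12.
ROWS = [
    (7, "nominal", 0, None, "off", "AMR@12.2"),
    (8, "nominal", 0, 4, "off", "AMR@12.2"),
    (9, "nominal", 0, 7, "off", "AMR@12.2"),
    (10, "nominal", 0, None, "off", "AMR@5.9"),
    (11, "high", 0, None, "off", "AMR@12.2"),
    (12, "high", 0, None, "off", "AMR@5.9"),
    (13, "nominal", 6, None, "off", "AMR@12.2"),
    (14, "nominal", 6, 4, "off", "AMR@12.2"),
    (15, "nominal", 6, 7, "off", "AMR@12.2"),
    (16, "nominal", 6, None, "off", "AMR@5.9"),
    (17, "nominal", 6, None, "on", "AMR@12.2"),
    (18, "nominal", 6, None, "on", "AMR@5.9"),
    (19, "low", 6, None, "off", "AMR@12.2"),
    (20, "low", 6, None, "off", "AMR@5.9"),
    (21, "nominal", 12, None, "off", "AMR@12.2"),
    (22, "nominal", 12, 4, "off", "AMR@12.2"),
    (23, "nominal", 12, 7, "off", "AMR@12.2"),
    (24, "nominal", 12, None, "off", "AMR@5.9"),
    (25, "nominal", 0, None, "off", "AMR/NS@12.2"),
    (26, "nominal", 0, None, "off", "AMR/NS@5.9"),
    (27, "nominal", 6, None, "off", "AMR/NS@12.2"),
    (28, "nominal", 6, None, "off", "AMR/NS@5.9"),
    (29, "nominal", 12, None, "off", "AMR/NS@12.2"),
    (30, "nominal", 12, None, "off", "AMR/NS@5.9"),
    (31, "nominal", 6, None, "on", "AMR/NS@12.2"),
    (32, "nominal", 6, None, "on", "AMR/NS@5.9"),
    (33, "low", 6, None, "off", "AMR/NS@12.2"),
    (34, "low", 6, None, "off", "AMR/NS@5.9"),
    (35, "high", 0, None, "off", "AMR/NS@12.2"),
    (36, "high", 0, None, "off", "AMR/NS@5.9"),
]


def reference_for(ns_row: tuple) -> int:
    """Find the AMR-without-NS condition that matches an AMR/NS row."""
    _, level, snr, _, vad, codec = ns_row
    wanted = (level, snr, None, vad, codec.replace("AMR/NS", "AMR"))
    matches = [row[0] for row in ROWS if row[1:] == wanted]
    if len(matches) != 1:
        raise ValueError(f"no unique reference for condition {ns_row[0]}")
    return matches[0]


PAIRS = {row[0]: reference_for(row) for row in ROWS if row[5].startswith("AMR/NS")}
print(PAIRS)
```

Under this matching rule, the ideal NS conditions (8, 9, 14, 15, 22, 23) are not the reference for any AMR/NS condition.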
C.8.13 Statistical Analysis
The statistics to be reported from this ACR test are the averaged MOS ($\bar{Y}$) scores and the standard deviations ($S$) for all the conditions.
Additionally, the requirement in [1, Clause 6.1.3] should be checked for conditions 25-36 using a hypothesis test of whether the mean MOS score is greater than or equal to the MOS score of the corresponding reference condition for AMR without NS (all else being equal except that NS is activated), at a 95 % confidence level.
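The hypothesis test should be performed using a 2-tailed t-test. The NS algorithm has failed the requirement if, for any test condition,
$$\bar{Y}_{test} < \bar{Y}_{ref} - t_{0.05}(2N-2)\, S_{\bar{Y}}$$
where
$$S_{\bar{Y}} = \sqrt{\frac{S_{test}^{2} + S_{ref}^{2}}{N}}$$
and $\bar{Y}$ is the mean MOS, $S$ is the standard deviation of the votes, the subscripts $test$ and $ref$ denote the test condition and the corresponding reference condition, respectively, $N$ is the number of votes, and $t_{0.05}(2N-2)$ is the inverse of the Student's t-distribution with $2N-2$ degrees of freedom and probability 0.05.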
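A minimal sketch of how such a check might be coded. It assumes the failure criterion is the usual equal-sample-size comparison, in which the test mean falls below the reference mean by more than the critical t value times sqrt((S_test^2 + S_ref^2) / N). The critical value is passed in rather than computed; 1.9725 is an approximate two-tailed value for probability 0.05 and 190 degrees of freedom (N = 96 votes per condition). The votes below are made-up illustration data, not test results.

```python
import math
import statistics


def ns_fails(test_votes: list, ref_votes: list, t_crit: float) -> bool:
    """Return True if the test condition is significantly worse than the reference.

    Assumed criterion: mean_test < mean_ref - t_crit * sqrt((S_test^2 + S_ref^2) / N),
    with the same number of votes N in both conditions.
    """
    n = len(test_votes)
    if n != len(ref_votes):
        raise ValueError("both conditions must have the same number of votes")
    mean_test = statistics.mean(test_votes)
    mean_ref = statistics.mean(ref_votes)
    # statistics.variance returns the sample variance (divisor N - 1).
    pooled = math.sqrt(
        (statistics.variance(test_votes) + statistics.variance(ref_votes)) / n
    )
    return mean_test < mean_ref - t_crit * pooled


T_CRIT_190 = 1.9725  # approximate two-tailed critical value, df = 2 * 96 - 2

# Hypothetical votes on the 1-5 ACR scale, 96 per condition.
ref_votes = [4] * 48 + [3] * 48
slightly_lower = [4] * 40 + [3] * 56
clearly_lower = [3] * 48 + [2] * 48

print(ns_fails(slightly_lower, ref_votes, T_CRIT_190))
print(ns_fails(clearly_lower, ref_votes, T_CRIT_190))
```

With these illustrative votes, the small difference stays within the margin while the one-point drop does not. A statistics package could supply the critical value for other vote counts.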