C.10 Experiment 4: Influence of Input Level, Voice Activity Detection and Discontinuous Transmission (CCR)
3GPP TS 26.077 Release 17: Minimum performance requirements for Noise Suppresser; Application to the Adaptive Multi-Rate (AMR) speech encoder
C.10.1 Introduction
This experiment is designed to test the requirements in the associated clause of the Recommended Minimum Performance Requirements Specification ([1], 3GPP TS 06.77 [4]). Specifically, in a certain number of conditions, the AMR with noise suppression should be preferred to the AMR without noise suppression in a background noise environment, and it should provide a reasonable level of SNR improvement.
C.10.2 Test Factors and Conditions
Three types of background noise will be used, at two different SNRs:
– A car noise that is stationary in both level and spectrum.
– A street noise that is non-stationary in level, but fairly stationary in spectrum.
– A babble noise that is fairly stationary in level, but non-stationary in spectrum.
The factors and conditions to be used in Experiment 4 are presented in Table 10.2. The expanded set of test conditions is given in Clause C.10.12.
Table 10.2: Factors and conditions for Experiment 4

Main Codec Conditions | # | Notes
Noise Suppressor Candidates | 1 | NS algorithm under test
Codec | 1 | AMR
Codec Modes | 1 | 12.2 kbit/s rate
BERs | 0 | Clear channel, no transmission errors
Input level | 3 | Nominal (-26 dBov); high level (-16 dBov); low level (-36 dBov)
Acoustic Background Noise | 3 | Car noise at 6 dB and 15 dB SNR; street and babble noise at 9 dB and 18 dB SNR
Input Characteristic | 1 | GSM transmit filtered
VAD/CNG/DTX | 2 | ON for all noise/level combinations. One VAD/CNG/DTX scheme will be used: either VAD Option 1 or 2, at the implementer's choice

Codec references | # | Notes
 | 1 | AMR 12.2 kbit/s rate without NS

Other references | # | Notes
Direct | 3 | Nominal level, GSM transmit filtered
MNRU | 6 | Nominal level, GSM transmit filtered; Q = 30, 24, 21, 18, 12, 6 dB, compared against Q = 18 dB

Common Conditions | # | Notes
GSM Channel | 0 | No channel model
Number of talkers | 4 | 2 male + 2 female primary talkers
Number of speech samples | 28 | 7 sentence pairs per primary talker (6 for test, 1 for practice)
Listening Level | 1 | -15 dBPa (79 dB SPL) at ERP
Listeners | 24 | Naive listeners
Randomizations | 6 | 6 groups of 4 listeners
Rating Scale | 1 | CCR instructions
Replications | 1 |
C.10.3 Preliminary Conditions
The following 16 preliminary conditions are recommended for presentation before proceeding to the test samples. The samples shall be presented in the random order given in Table 10.3.
Table 10.3: List of preliminary conditions [TO BE REVISED]

Cond. | Presentation order | Noise | Input level | SNR (dB) | VAD/DTX | Reference | Processed | Speech sample number
P1 | 9 | Car | nominal | | | | | M1S07
P2 | 5 | Car | nominal | 6 | | Direct | MNRU-12 | F1S07
P3 | 12 | Car | nominal | 6 | | MNRU-16 | MNRU-12 | M2S07
P4 | 13 | Car | nominal | 15 | | | | F2S07
P5 | 2 | Street | nominal | 9 | | Direct | MNRU-12 | M1S07
P6 | 4 | Street | nominal | 9 | | MNRU-16 | MNRU-12 | F1S07
P7 | 8 | Street | nominal | | | | | M2S07
P8 | 16 | Babble | nominal | 9 | | Direct | MNRU-12 | F2S07
P9 | 7 | Babble | nominal | 9 | | MNRU-16 | MNRU-12 | M1S07
P10 | 1 | Babble | nominal | | | | | F1S07
P11 | 11 | Car | nominal | 6 | off | AMR@12.2 | AMR@12.2 | M2S07
P12 | 3 | Car | nominal | 6 | on | AMR@12.2 | AMR@12.2 | F2S07
P13 | 15 | Street | nominal | 9 | off | AMR@12.2 | AMR@12.2 | M1S07
P14 | 6 | Street | nominal | 9 | on | AMR@12.2 | AMR@12.2 | F1S07
P15 | 10 | Babble | nominal | 9 | off | AMR@12.2 | AMR@12.2 | M2S07
P16 | 14 | Babble | nominal | 9 | on | AMR@12.2 | AMR@12.2 | F2S07
C.10.4 Speech Material
The source speech material shall be as defined in Clause 6.3 and will consist of the material used during the AMR Noise Suppression Selection phase. Each sample consists of two sentences, and only primary talkers are needed. For each of the four talkers, the following source material should be prepared:
– Seven samples per talker: six for the test and one for the preliminaries;
– Each sample eight seconds long;
– Unique sentence pairs in each sample (i.e., no sentence pair is repeated across talkers).
To reduce any speech material effect, the samples for each talker must be unique. For these experiments, these unique stimuli are balanced across all conditions, candidates and subject groups. The noise material and its mix with the speech material should be as defined in Clause 6.8 and Clause 6.3.7, respectively.
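The level and SNR settings used in this experiment (nominal -26 dBov, high -16 dBov, low -36 dBov; SNRs of 6 to 18 dB) can be illustrated with a minimal Python sketch. It is not the procedure of Clause 6.3.7, which is not reproduced here: it assumes 16-bit PCM with 0 dBov taken as the RMS of a full-scale square wave, measures plain RMS instead of an active speech level (such as ITU-T P.56), and sets the speech level before adding the noise. The helper names `rms`, `level_dbov`, `scale_to_dbov` and `mix_at_snr` are hypothetical.

```python
import math
import random

FULL_SCALE = 32768.0  # assumption: 16-bit PCM, 0 dBov = RMS of a full-scale square wave


def rms(x: list[float]) -> float:
    """Root-mean-square value of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def level_dbov(x: list[float]) -> float:
    """Plain RMS level in dBov (simplification: no active-speech-level measurement)."""
    return 20 * math.log10(rms(x) / FULL_SCALE)


def scale_to_dbov(x: list[float], target_dbov: float) -> list[float]:
    """Apply a constant gain so that the RMS level equals target_dbov."""
    gain = 10 ** ((target_dbov - level_dbov(x)) / 20)
    return [v * gain for v in x]


def mix_at_snr(speech: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Add noise scaled so that 20*log10(rms(speech) / rms(scaled noise)) == snr_db."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage with synthetic signals (8 kHz, 1 s): a 440 Hz tone stands in for speech.
fs = 8000
speech = [1000 * math.sin(2 * math.pi * 440 * k / fs) for k in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 500.0) for _ in range(fs)]

speech_nominal = scale_to_dbov(speech, -26.0)   # nominal input level
mixed = mix_at_snr(speech_nominal, noise, 6.0)  # e.g. car noise at 6 dB SNR
print(round(level_dbov(speech_nominal), 2))
```

Whether the input level is defined on the clean speech or on the noisy mixture is governed by Clause 6.3.7; the order used here is only one possible reading.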
C.10.5 Experimental Design
The design is based on a restricted randomization philosophy using six different randomizations, each of which is used with a group of four of the 24 listeners. This means that up to four subjects can perform the experiment simultaneously.
Each listener will hear all of the conditions four times, once with speech from each of the four talkers. Over the experiment as a whole, each of the conditions will be paired with six different samples from each of the four talkers. Each of the six groups of subjects will hear different combinations of source material and condition.
C.10.6 Processing
Every condition is processed with each of the six samples of each of the four primary talkers; that is, every speech file is processed through all test conditions.
C.10.7 Randomizations
The test shall be completed using the randomizations provided by the experimenter. There shall be six randomizations for the sub-experiments, one for each group of four subjects. Each randomization shall be balanced across four blocks of 30 stimuli to eliminate long sequences of similar conditions or identical talkers. The sequences shall provide for alternating male-female talkers. Use of these randomizations will allow presentation order to be used as a factor in a global analysis, should that be necessary. The randomization shall be constrained to a randomized block design, which controls practice and fatigue effects that may occur over the course of a test session.
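The randomized block idea can be illustrated with a small Python sketch for one listener group. It is a simplified example under stated assumptions, not the randomizations provided by the experimenter: it assumes 48 conditions per group (conditions 1-24 plus their reversed-order versions, with the group's own variant of the "multiple" conditions), four talkers named `M1`, `F1`, `M2`, `F2`, and four blocks in which every condition appears once. The block length (48 trials) follows from these assumptions. It enforces alternating male and female talkers and a random order within each block, but does not explicitly prevent runs of similar conditions.

```python
import random

TALKERS = ["M1", "F1", "M2", "F2"]  # 2 male + 2 female primary talkers (hypothetical names)
N_CONDITIONS = 48  # assumption: conditions 1-24 plus their reversed-order versions
N_BLOCKS = 4


def randomize(seed: int) -> list[list[tuple[int, str]]]:
    """Return N_BLOCKS blocks of (condition, talker) trials for one listener group."""
    rng = random.Random(seed)
    blocks = []
    for b in range(N_BLOCKS):
        # Rotating the talker assignment by block means each condition meets
        # every talker exactly once over the four blocks.
        trials = [(c, TALKERS[(c + b) % len(TALKERS)]) for c in range(1, N_CONDITIONS + 1)]
        male = [t for t in trials if t[1].startswith("M")]
        female = [t for t in trials if t[1].startswith("F")]
        # Random order within the block (the "randomized block" part).
        rng.shuffle(male)
        rng.shuffle(female)
        # Interleave so that male and female talkers alternate.
        blocks.append([t for pair in zip(male, female) for t in pair])
    return blocks


# One randomization per listener group, e.g. seeds 1-6 for the six groups.
randomizations = {group: randomize(seed=group) for group in range(1, 7)}
```

Keeping each condition once per block is what lets practice and fatigue effects spread evenly over conditions, since no condition is concentrated at the start or end of a session.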
C.10.8 Duration of the Experiment
Each trial consists of an eight-second reference sample, an eight-second test sample and five seconds of voting time, totalling 21 seconds. For each of the four experiments, the 16 preliminary conditions x 21 seconds give 5.6 minutes for an introductory block. Each presentation set within an experiment consists of 60 conditions (A/B + B/A) x 4 talkers x 21 seconds, i.e. 84 minutes or approximately 1 h 30 min. If four listeners are tested at one time, the total testing time for each experiment will be 9 hours and 34 minutes.
Note that the above calculations do not include the time needed to give the subjects their instructions, or time taken for comfort breaks.
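The duration arithmetic of this clause can be reproduced with a short Python sketch that uses only the figures stated here (21-second trials, 16 preliminary conditions, 60 conditions x 4 talkers per presentation set, six groups of four listeners). It shows that the 9 h 34 min total follows from rounding each presentation set up to 1 h 30 min; the unrounded 84 minutes would give about 8 h 58 min. The helper names `total_minutes` and `hm` are hypothetical.

```python
# Figures taken from Clause C.10.8 (seconds or counts, as stated there).
REFERENCE_S = 8           # reference sample length
TEST_S = 8                # test sample length
VOTE_S = 5                # voting time
PRELIM_CONDITIONS = 16
CONDITIONS_PER_SET = 60   # A/B + B/A, as stated in the text
TALKERS = 4
GROUPS = 6                # six randomizations, four listeners each

trial_s = REFERENCE_S + TEST_S + VOTE_S                 # 21 s
prelim_min = PRELIM_CONDITIONS * trial_s / 60           # 5.6 min
set_min = CONDITIONS_PER_SET * TALKERS * trial_s / 60   # 84 min


def total_minutes(set_minutes: float) -> float:
    """Total test time when one group of four listeners is tested at a time."""
    return GROUPS * (prelim_min + set_minutes)


def hm(minutes: float) -> str:
    """Format minutes as 'H h MM min', rounded to the nearest minute."""
    h, m = divmod(round(minutes), 60)
    return f"{h} h {m:02d} min"


exact = total_minutes(set_min)   # uses the unrounded 84 min per set
rounded = total_minutes(90.0)    # uses the rounded 1 h 30 min per set
print(trial_s, round(prelim_min, 1), set_min)  # 21 5.6 84.0
print(hm(exact), hm(rounded))                  # 8 h 58 min 9 h 34 min
```

As the note above says, neither figure includes instructions or comfort breaks.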
C.10.9 Votes Per Condition
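In this experiment, 24 listeners rate every condition with four talkers in each of two presentation orders (A/B and B/A), giving:
(24 subjects x 4 talkers x 2 presentations) = 192 votes per condition
The "multiple" conditions 4, 5 and 6 are the exception: each of their variants (e.g. 4, 4′ and 4″) is presented to only two of the six groups of four listeners and therefore receives (8 subjects x 4 talkers x 2 presentations) = 64 votes.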
From past experience with CCR tests, this is the minimum number of votes per condition needed to give enough statistical certainty to differentiate the performance of one candidate process from another candidate process over the conditions and against the references.
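The vote count can be checked against the sample allocation in Clause C.10.12 with a small Python sketch. It assumes, as the notes to that table indicate, that the six entries of the speech sample column belong to randomization groups 1 to 6, and that every listener in a group rates a condition with all four talkers in both orders. Only a few rows are transcribed; `ALLOCATION` and `votes` are hypothetical names.

```python
LISTENERS_PER_GROUP = 4
TALKERS = 4
ORDERS = 2  # A/B (conditions 1-24) and B/A (conditions 25-48)

# Speech sample number used by randomization groups 1-6 (None = not presented),
# copied from a few rows of the table in Clause C.10.12.
ALLOCATION = {
    "7": [1, 2, 3, 4, 5, 6],
    "4": [4, None, None, 1, None, None],
    "4'": [None, 5, None, None, 2, None],
    "4''": [None, None, 6, None, None, 3],
}


def votes(condition: str) -> int:
    """Votes collected for one condition, pooling both presentation orders."""
    groups = sum(1 for sample in ALLOCATION[condition] if sample is not None)
    return groups * LISTENERS_PER_GROUP * TALKERS * ORDERS


for cond in ALLOCATION:
    print(cond, votes(cond))
# 7 192
# 4 64
# 4' 64
# 4'' 64
```

The three variants of a "multiple" condition together reach the same 192 votes as a full condition.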
C.10.10 Test Procedure
Factors important for the experimental environment are specified in Clauses 6.4, 6.5, and 6.6. Comfort breaks should be provided to reduce the effects of subject fatigue.
C.10.11 Opinion Scale
The question asked of the subjects is based on the CCR Listening Quality Comparison Scale. The listening subjects judge the quality of the second sample relative to the quality of the first sample: they listen to each pair of samples and, after both have been played, give their comparative opinion. Annex A contains an example of the instructions for the subjects in English. The instructions may need to be changed to specify the method of data collection being used (button press, paper and pencil, etc.).
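A minimal Python sketch of turning CCR responses into CMOS values is shown below. The seven-point scale and its numeric values are those of the CCR method in ITU-T P.800 Annex E. The sign handling is an assumption, not stated in this clause: votes from the reversed conditions 25-48, where the processed sample is heard first, are negated so that a positive value always means the processed sample was preferred to the reference. `normalized_vote` and `cmos` are hypothetical names.

```python
from statistics import mean

# CCR scale (ITU-T P.800 Annex E): the SECOND sample is rated relative to the FIRST.
CCR_SCALE = {
    "Much Better": 3, "Better": 2, "Slightly Better": 1, "About the Same": 0,
    "Slightly Worse": -1, "Worse": -2, "Much Worse": -3,
}


def normalized_vote(label: str, processed_first: bool) -> int:
    """Return the vote as 'processed relative to reference'.

    Conditions 1-24 play the reference first, so the vote is used as-is.
    Conditions 25-48 play the processed sample first, so the sign is flipped
    (assumption: this is how the two presentation orders are pooled).
    """
    score = CCR_SCALE[label]
    return -score if processed_first else score


def cmos(votes: list[tuple[str, bool]]) -> float:
    """Mean of the normalized votes for one condition."""
    return mean(normalized_vote(label, first) for label, first in votes)


example = [("Better", False), ("Slightly Better", False),
           ("Worse", True), ("About the Same", True)]
print(cmos(example))  # 1.25
```

Presenting both orders and flipping the sign in this way cancels any systematic bias listeners may have towards the first or second sample of a pair.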
C.10.12 Test Conditions for Experiment 4
Cond. | Noise | Input level | SNR (dB) | VAD/DTX | Reference codec | Processed codec | Speech sample number (randomization groups 1-6)
1 | Car | nominal | 6 | off | AMR@12.2 | AMR@12.2 | 4 5 6 1 2 3
2 | Street | nominal | 9 | off | AMR@12.2 | AMR@12.2 | 4 5 6 1 2 3
3 | Babble | nominal | 9 | off | AMR@12.2 | AMR@12.2 | 4 5 6 1 2 3
4 | Car | nominal | 6 | on | AMR@12.2 | AMR@12.2 | 4 – – 1 – –
5 | Car | nominal | 6 | N/A | Direct | MNRU-12 | 4 – – 1 – –
6 | Car | nominal | 6 | N/A | MNRU-16 | MNRU-12 | 4 – – 1 – –
4′ | Street | nominal | 9 | on | AMR@12.2 | AMR@12.2 | – 5 – – 2 –
5′ | Street | nominal | 9 | N/A | Direct | MNRU-12 | – 5 – – 2 –
6′ | Street | nominal | 9 | N/A | MNRU-16 | MNRU-12 | – 5 – – 2 –
4″ | Babble | nominal | 9 | on | AMR@12.2 | AMR@12.2 | – – 6 – – 3
5″ | Babble | nominal | 9 | N/A | Direct | MNRU-12 | – – 6 – – 3
6″ | Babble | nominal | 9 | N/A | MNRU-16 | MNRU-12 | – – 6 – – 3
7 | Car | nominal | 6 | on | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6
8 | Street | nominal | 9 | on | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1
9 | Babble | nominal | 9 | on | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2
10 | Car | low | 6 | off | AMR@12.2 | AMR/NS@12.2 | 5 6 1 2 3 4
11 | Street | low | 9 | off | AMR@12.2 | AMR/NS@12.2 | 6 1 2 3 4 5
12 | Babble | low | 9 | off | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6
13 | Car | high | 6 | off | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1
14 | Street | high | 9 | off | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2
15 | Babble | high | 9 | off | AMR@12.2 | AMR/NS@12.2 | 5 6 1 2 3 4
16 | Car | nominal | 15 | on | AMR@12.2 | AMR/NS@12.2 | 6 1 2 3 4 5
17 | Street | nominal | 18 | on | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6
18 | Babble | nominal | 18 | on | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1
19 | Car | low | 15 | off | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2
20 | Street | low | 18 | off | AMR@12.2 | AMR/NS@12.2 | 5 6 1 2 3 4
21 | Babble | low | 18 | off | AMR@12.2 | AMR/NS@12.2 | 6 1 2 3 4 5
22 | Car | high | 15 | off | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6
23 | Street | high | 18 | off | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1
24 | Babble | high | 18 | off | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2
25-48 | Reversed order of the reference and processed speech samples in conditions 1-24
NOTES
– 4 talkers are used for all conditions: 2 male and 2 female.
– 6 speech samples (8 s each) are used for each talker.
– The "multiple" conditions 4, 5 and 6 (e.g. 4, 4′ and 4″) are each presented to only a subset of listeners (e.g. to the first and the fourth randomization groups).
C.10.13 Statistical Analysis
The statistics to be reported from this CCR test are the mean CMOS scores ($\overline{CMOS}$) and the standard deviations ($\sigma$) for all conditions.
Additionally, the requirement in [1, Clause 6.1.4] should be checked for conditions 7 to 24 using hypothesis tests of whether the mean CMOS score is greater than zero (the NS performance is preferred) or greater than or equal to zero (the NS performance is equivalent), at a 95 % confidence level.
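The hypothesis test should be performed using a one-tailed t-test. The NS algorithm has failed the requirement at level "preferred" for a test condition $i$ if

$$\overline{CMOS}_i - t_{0.05,\,N_i-1}\,\frac{\sigma_i}{\sqrt{N_i}} \le 0$$

where $\overline{CMOS}_i$ and $\sigma_i$ are the mean CMOS score and the standard deviation of the votes (the subscript denotes the test condition), $N_i$ is the number of votes, and $t_{0.05,\,N_i-1}$ is the inverse of the Student's t-distribution with $N_i-1$ degrees of freedom and probability 0.05, i.e. the one-tailed critical value that is exceeded with probability 0.05.

Similarly, the NS algorithm has failed the requirement at level "equal" if

$$\overline{CMOS}_i + t_{0.05,\,N_i-1}\,\frac{\sigma_i}{\sqrt{N_i}} < 0$$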
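A minimal, standard-library-only Python sketch of this one-tailed test is given below. It assumes the failure criteria mean minus t·σ/√N ≤ 0 for "preferred" and mean plus t·σ/√N < 0 for "equal", and that the votes have already been normalized so that a positive value means the NS-processed sample was preferred. The critical value is computed numerically (Simpson integration of the t density plus bisection); with SciPy, `scipy.stats.t.ppf(1 - alpha, df)` gives the same quantity. The function names are hypothetical and the vote data in the usage line are made up for illustration.

```python
import math
from statistics import mean, stdev


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """Student's t CDF for x >= 0: 0.5 + Simpson integral of the density on [0, x]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3


def t_critical(df: int, alpha: float = 0.05) -> float:
    """One-tailed critical value t with P(T > t) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < 1 - alpha:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def check_condition(votes: list[int], alpha: float = 0.05) -> dict:
    """Apply the 'preferred' and 'equal' failure criteria to one condition's votes."""
    n = len(votes)
    m, s = mean(votes), stdev(votes)  # stdev uses the N - 1 denominator
    margin = t_critical(n - 1, alpha) * s / math.sqrt(n)
    return {
        "cmos": m,
        "std": s,
        "fails_preferred": m - margin <= 0,  # NS not significantly better than reference
        "fails_equal": m + margin < 0,       # NS significantly worse than reference
    }


# Usage: 192 normalized votes for one condition (made-up data).
votes = [1] * 100 + [0] * 60 + [-1] * 32
print(check_condition(votes))
```

A condition with a mean CMOS of exactly zero fails at level "preferred" but passes at level "equal", which is the intended difference between the two levels.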