C.10 Experiment 4: Influence of Input Level, Voice Activity Detection and Discontinuous Transmission (CCR)

3GPP TS 26.077, Release 17: Minimum performance requirements for Noise Suppresser; Application to the Adaptive Multi-Rate (AMR) speech encoder

C.10.1 Introduction

This experiment is designed to test the requirements in the associated clause of the Recommended Minimum Performance Requirements Specification ([1], 3GPP TS 06.77 [4]). Specifically, the AMR with noise suppression should, in a certain number of conditions, be preferred to the AMR without noise suppression in a background noise environment and should provide a reasonable level of SNR improvement.

C.10.2 Test Factors and Conditions

Three types of background noise will be used, at two different SNRs:

– A car noise that is stationary both in level and in spectrum.

– A street noise that is non-stationary in level, but fairly stationary in spectrum.

– A babble noise that is fairly stationary in level, but non-stationary in spectrum.

The factors and conditions to be used in Experiment 4 are presented in Table 10.2. The expanded set of test conditions is given in Clause 10.12.

Table 10.2: Factors and conditions for Experiment 4

| Main Codec Conditions | # | Notes |
|---|---|---|
| Noise Suppressor Candidates | 1 | NS algorithm under test |
| Codec | 1 | AMR |
| Codec Modes | 1 | 12.2 kbit/s rate |
| BERs | 0 | Clear channel, no transmission errors |
| Input level | 3 | Nominal (-26 dBov); High-level (-16 dBov); Low-level (-36 dBov) |
| Acoustic Background Noise | 3 | Car noise at 6 dB SNR; Street and Babble noise at 9 dB SNR |
| Input Characteristic | 1 | GSM transmit filtered |
| VAD/CNG/DTX | 2 | ON for all noise/level combinations; OFF for all noise types but only at the nominal level. One VAD/CNG/DTX will be used; either VAD Option 1 or 2, depending on the implementer's choice |

| Codec references | # | Notes |
|---|---|---|
| | 1 | AMR 12.2 kbit/s rate without NS |

| Other references | # | Notes |
|---|---|---|
| Direct | 3 | Nominal level, GSM transmit filtered |
| MNRU | 6 | Nominal level, GSM transmit filtered, Q=30, 24, 21, 18, 12, 6 dB, compared against Q=18 dB |

| Common Conditions | # | Notes |
|---|---|---|
| GSM Channel | 0 | NO channel model |
| Number of talkers | 4 | 2 male + 2 female primary talkers |
| Number of speech samples | 28 | 7 Sentence-pairs/primary talker (6 for Test, 1 for Practice) |
| Listening Level | 1 | -15 dBPa (79 dB SPL) at ERP |
| Listeners | 24 | Naive Listeners |
| Randomizations | 6 | 6 groups of 4 listeners |
| Rating Scale | 1 | CCR Instructions |
| Replications | 1 | |
C.10.3 Preliminary Conditions

The following 16 preliminary test conditions are recommended for presentation before proceeding to the test samples. The samples shall be presented in the random order given in Table 10.3.

Table 10.3: List of preliminary conditions [TO BE REVISED]

| Cond. | Presentation order | Noise | Input level | SNR (dB) | VAD/DTX | Reference | Processed | Speech Sample Number |
|---|---|---|---|---|---|---|---|---|
| P1 | 9 | Car | nominal | | | | | M1S07 |
| P2 | 5 | Car | nominal | 6 | | Direct | MNRU-12 | F1S07 |
| P3 | 12 | Car | nominal | 6 | | MNRU-16 | MNRU-12 | M2S07 |
| P4 | 13 | Car | nominal | 15 | | | | F2S07 |
| P5 | 2 | Street | nominal | 9 | | Direct | MNRU-12 | M1S07 |
| P6 | 4 | Street | nominal | 9 | | MNRU-16 | MNRU-12 | F1S07 |
| P7 | 8 | Street | nominal | | | | | M2S07 |
| P8 | 16 | Babble | nominal | 9 | | Direct | MNRU-12 | F2S07 |
| P9 | 7 | Babble | nominal | 9 | | MNRU-16 | MNRU-12 | M1S07 |
| P10 | 1 | Babble | nominal | | | | | F1S07 |
| P11 | 11 | Car | nominal | 6 | off | AMR@12.2 | AMR@12.2 | M2S07 |
| P12 | 3 | Car | nominal | 6 | on | AMR@12.2 | AMR@12.2 | F2S07 |
| P13 | 15 | Street | nominal | 9 | off | AMR@12.2 | AMR@12.2 | M1S07 |
| P14 | 6 | Street | nominal | 9 | on | AMR@12.2 | AMR@12.2 | F1S07 |
| P15 | 10 | Babble | nominal | 9 | off | AMR@12.2 | AMR@12.2 | M2S07 |
| P16 | 14 | Babble | nominal | 9 | on | AMR@12.2 | AMR@12.2 | F2S07 |

C.10.4 Speech Material

The source speech material shall be as defined in Clause 6.3 and will consist of the material used during the AMR Noise Suppression Selection phase: Each sample consists of two sentences. Only primary talkers are needed. For the four talkers, the following source material should be prepared:

– Seven samples for each talker, six for the test samples and one for the preliminaries,

– Each sample to be eight seconds long,

– Unique sentence pairs in each sample (i.e., none repeated across the talkers).

To reduce any speech material effect, the samples for each talker must be unique. For these experiments, these unique stimuli are balanced across all conditions, candidates and subject groups. The noise material and its mix with the speech material should be as defined in Clause 6.8 and Clause 6.3.7 respectively.

C.10.5 Experimental Design

The design is based on a restricted randomization philosophy using six different randomizations, each of which is used with a group of four of the 24 listeners. This means that up to four subjects can perform the experiment simultaneously.

Each listener will hear all of the conditions four times, once with speech from each of the four talkers. Over the experiment as a whole, each of the conditions will be paired with six different samples from each of the four talkers. Each of the six groups of subjects will hear different combinations of source material and condition.

C.10.6 Processing

Every condition is processed with each of the six samples of each of the four primary talkers. Every speech file will be processed through all test conditions.

C.10.7 Randomizations

The test shall be completed using the randomizations provided by the experimenter. There shall be six randomizations for the sub-experiments, one for each group of four subjects. Each randomization shall be balanced across four blocks of 30 stimuli to eliminate long sequences of similar conditions or identical talkers. The sequences shall provide for alternating male-female talkers. Use of these randomizations will allow presentation order to be used as a factor in a global analysis, should that be necessary. The randomization shall be constrained to a randomized block design, which controls practice and fatigue effects that may occur over the course of a test session.
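As an illustration only, the male-female alternation constraint for one block could be sketched as below. The function name `alternating_block`, the talker labels and the fixed seed are assumptions made for this example; the sketch does not enforce the other constraints named above (balance across blocks, avoidance of long runs of similar conditions), which the experimenter's randomizations must also satisfy.

```python
import random

def alternating_block(trials, rng):
    """Order one block at random with male and female talkers alternating.

    `trials` is a list of (condition, talker) pairs. Talker labels starting
    with "M" are treated as male and those starting with "F" as female
    (an assumption based on the M1/F1 style of sample naming).
    """
    males = [t for t in trials if t[1].startswith("M")]
    females = [t for t in trials if t[1].startswith("F")]
    if len(males) != len(females):
        raise ValueError("block needs equal numbers of male and female trials")
    rng.shuffle(males)
    rng.shuffle(females)
    # Randomly pick which gender opens the block, then interleave the two lists.
    first, second = (males, females) if rng.random() < 0.5 else (females, males)
    order = []
    for a, b in zip(first, second):
        order.extend([a, b])
    return order

rng = random.Random(1)  # fixed seed so that a randomization can be reproduced
block = [(cond, talker) for cond in range(1, 4) for talker in ("M1", "F1", "M2", "F2")]
ordered = alternating_block(block, rng)
```

Using one seeded generator per listener group would give six reproducible randomizations, which keeps presentation order available as a factor in a later analysis.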

C.10.8 Duration of the Experiment

Each trial consists of an eight-second reference sample, an eight-second test sample and a five-second voting time, totalling 21 seconds. For each of the four experiments there are 16 preliminary conditions x 21 seconds, or 5.6 minutes, for an introductory block. Each presentation set within an experiment consists of 60 conditions (A/B+B/A) x 4 talkers x 21 seconds, or approximately 1h30min. The total testing time for each experiment will be 9 hours and 34 minutes, if four listeners are tested at one time.

Note that the above calculations do not include the time needed to give the subjects their instructions, or time taken for comfort breaks.
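The per-trial and per-block figures can be checked with a few lines of arithmetic. This is a sketch only; the variable names are illustrative, and reading "approximately 1h30min" as the presentation set plus the introductory block is an assumption.

```python
REFERENCE_S = 8   # reference sample, seconds
TEST_S = 8        # test sample, seconds
VOTING_S = 5      # voting time, seconds

trial_s = REFERENCE_S + TEST_S + VOTING_S            # seconds per trial

preliminary_block_min = 16 * trial_s / 60            # 16 preliminary conditions
presentation_set_min = 60 * 4 * trial_s / 60         # 60 conditions x 4 talkers
session_min = preliminary_block_min + presentation_set_min

print(f"trial: {trial_s} s")
print(f"introductory block: {preliminary_block_min:.1f} min")
print(f"presentation set: {presentation_set_min:.1f} min")
print(f"block plus set: {session_min:.1f} min")
```

Instruction time and comfort breaks are not part of these numbers.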

C.10.9 Votes Per Condition

In each of the three experiments, 24 listeners rate every condition with four talkers in each of two presentation orders (A/B and B/A), giving:

(24 subjects x 4 talkers x 2 presentations) = 192 votes per condition

From past experience with CCR tests, this is the minimum number of votes per condition needed to give enough statistical certainty to differentiate the performance of one candidate process from another candidate process over the conditions and against the references.
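The vote count, and the degrees of freedom it implies for a t-test on one condition, can be written out as follows. The constant names are illustrative and not taken from the specification.

```python
LISTENERS = 24
TALKERS = 4
PRESENTATION_ORDERS = 2   # A/B and B/A
GROUPS = 6                # randomization groups of four listeners each

votes_per_condition = LISTENERS * TALKERS * PRESENTATION_ORDERS
votes_per_group = (LISTENERS // GROUPS) * TALKERS * PRESENTATION_ORDERS
degrees_of_freedom = votes_per_condition - 1   # for a t-test on a single condition

print(votes_per_condition, votes_per_group, degrees_of_freedom)
```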

C.10.10 Test Procedure

Factors important for the experimental environment are specified in Clauses 6.4, 6.5, and 6.6. Comfort breaks should be provided to reduce the effects of subject fatigue.

C.10.11 Opinion Scale

The question asked of the subject is based on the CCR Listening Quality Comparison Scale. The listening subjects will judge the quality of the second sample relative to the quality of the first sample. The subjects will listen to each pair of samples and, after these have been played, they will be asked to give their comparative opinion. Annex A contains an example of the instructions for the subjects in English. Changes to the instructions may be needed to specify the method of data collection being used (button-press, paper & pencil, etc.).

C.10.12 Test Conditions for Experiment 4

| Cond. | Noise | Input level | SNR (dB) | VAD/DTX | Reference codec | Processed codec | Speech sample number |
|---|---|---|---|---|---|---|---|
| 1 | Car | nominal | 6 | off | AMR@12.2 | AMR@12.2 | 4 5 6 1 2 3 |
| 2 | Street | nominal | 9 | off | AMR@12.2 | AMR@12.2 | 4 5 6 1 2 3 |
| 3 | Babble | nominal | 9 | off | AMR@12.2 | AMR@12.2 | 4 5 6 1 2 3 |
| 4 | Car | nominal | 6 | on | AMR@12.2 | AMR@12.2 | 4 – – 1 – – |
| 5 | Car | nominal | 6 | N/A | Direct | MNRU-12 | 4 – – 1 – – |
| 6 | Car | nominal | 6 | N/A | MNRU-16 | MNRU-12 | 4 – – 1 – – |
| 4′ | Street | nominal | 9 | on | AMR@12.2 | AMR@12.2 | – 5 – – 2 – |
| 5′ | Street | nominal | 9 | N/A | Direct | MNRU-12 | – 5 – – 2 – |
| 6′ | Street | nominal | 9 | N/A | MNRU-16 | MNRU-12 | – 5 – – 2 – |
| 4″ | Babble | nominal | 9 | on | AMR@12.2 | AMR@12.2 | – – 6 – – 3 |
| 5″ | Babble | nominal | 9 | N/A | Direct | MNRU-12 | – – 6 – – 3 |
| 6″ | Babble | nominal | 9 | N/A | MNRU-16 | MNRU-12 | – – 6 – – 3 |
| 7 | Car | nominal | 6 | on | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6 |
| 8 | Street | nominal | 9 | on | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1 |
| 9 | Babble | nominal | 9 | on | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2 |
| 10 | Car | low | 6 | off | AMR@12.2 | AMR/NS@12.2 | 5 6 1 2 3 4 |
| 11 | Street | low | 9 | off | AMR@12.2 | AMR/NS@12.2 | 6 1 2 3 4 5 |
| 12 | Babble | low | 9 | off | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6 |
| 13 | Car | high | 6 | off | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1 |
| 14 | Street | high | 9 | off | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2 |
| 15 | Babble | high | 9 | off | AMR@12.2 | AMR/NS@12.2 | 5 6 1 2 3 4 |
| 16 | Car | nominal | 15 | on | AMR@12.2 | AMR/NS@12.2 | 6 1 2 3 4 5 |
| 17 | Street | nominal | 18 | on | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6 |
| 18 | Babble | nominal | 18 | on | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1 |
| 19 | Car | low | 15 | off | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2 |
| 20 | Street | low | 18 | off | AMR@12.2 | AMR/NS@12.2 | 5 6 1 2 3 4 |
| 21 | Babble | low | 18 | off | AMR@12.2 | AMR/NS@12.2 | 6 1 2 3 4 5 |
| 22 | Car | high | 15 | off | AMR@12.2 | AMR/NS@12.2 | 1 2 3 4 5 6 |
| 23 | Street | high | 18 | off | AMR@12.2 | AMR/NS@12.2 | 2 3 4 5 6 1 |
| 24 | Babble | high | 18 | off | AMR@12.2 | AMR/NS@12.2 | 3 4 5 6 1 2 |

Cond. 25-48: Reversed order of the reference and processed speech samples in cond. 1-24

NOTES

– 4 talkers are used for all conditions: 2 male and 2 female

– 6 speech samples (8 s) are used for each talker

– 'Multiple' conditions "4s", "5s" and "6s" (e.g. 4, 4′ and 4″) are only presented to a subset of listeners (e.g. to the first and the fourth groups of randomisation)
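The B/A presentations (cond. 25-48) can be derived mechanically from the A/B rows. The sketch below assumes that a reversed condition takes the original number plus 24 and keeps any prime mark; that numbering, the `Condition` type and the `reversed_condition` helper are illustrative assumptions, not definitions from this annex.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Condition:
    number: int      # condition number in the A/B half (1-24)
    variant: str     # "" or a prime mark for the 'multiple' conditions
    reference: str   # first sample of the pair
    processed: str   # second sample of the pair

def reversed_condition(cond, offset=24):
    """Return the B/A counterpart: same material, presentation order swapped."""
    return replace(
        cond,
        number=cond.number + offset,   # assumed numbering for cond. 25-48
        reference=cond.processed,
        processed=cond.reference,
    )

cond7 = Condition(7, "", "AMR@12.2", "AMR/NS@12.2")
cond31 = reversed_condition(cond7)
print(cond31)
```

The 24 numbered rows plus the six primed variants make 30 rows, so both presentation orders together give the 60 conditions used in the duration estimate.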

C.10.13 Statistical Analysis

The statistics to be reported from this CCR test are the averaged CMOS ($\overline{\mathrm{CMOS}}_i$) scores and the standard deviations ($\sigma_i$) for all the conditions.

Additionally, the requirement in [1, Clause 6.1.4] should be checked using hypothesis tests for conditions 7-24, to determine whether the mean CMOS score is greater than zero (the NS performance is preferred) or greater than or equal to zero (the NS performance is equivalent) at a 95 % confidence level.

The hypothesis test should be performed using a 1-tailed t-test. The NS algorithm has failed the requirement at level "preferred" for any test condition if

$$\overline{\mathrm{CMOS}}_i \le \delta_i$$

where

$$\delta_i = t_{0.05,\,N-1}\,\frac{\sigma_i}{\sqrt{N}}$$

and the subscript $i$ denotes the test condition, $N$ is the number of votes, and $t_{0.05,\,N-1}$ is the inverse of the Student's t-distribution with $N-1$ degrees of freedom and probability 0.05.

Similarly, the NS algorithm has failed the requirement at level "equal" if

$$\overline{\mathrm{CMOS}}_i < -\delta_i$$
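A minimal sketch of the one-tailed test for a single condition follows. It assumes that the threshold takes the usual form of critical t value times standard deviation divided by the square root of the number of votes, that the sample (N-1) standard deviation is used, and that the critical value is supplied by the caller. The value 1.653 is an approximate tabulated one-tailed 5 % value for 191 degrees of freedom, and the function name and return labels are illustrative.

```python
import math
import statistics

T_CRIT_191 = 1.653  # approximate one-tailed 5 % critical value, 191 degrees of freedom

def ns_requirement_status(votes, t_crit):
    """Classify one test condition from its CMOS votes.

    Returns "preferred" if the mean exceeds +delta, "equal" if it is not
    below -delta, and "fail" otherwise.
    """
    n = len(votes)
    mean = statistics.fmean(votes)
    sd = statistics.stdev(votes)          # sample standard deviation (assumption)
    delta = t_crit * sd / math.sqrt(n)    # assumed form of the test threshold
    if mean > delta:
        return "preferred"
    if mean >= -delta:
        return "equal"
    return "fail"

# Synthetic votes (192 per condition), for illustration only.
votes_preferred = [1, 0, 1, 2, 0, 1, 1, 0] * 24
votes_equal = [1, -1] * 96
votes_fail = [-1, 0] * 96

for votes in (votes_preferred, votes_equal, votes_fail):
    print(ns_requirement_status(votes, T_CRIT_191))
```

A condition reported as "equal" here meets the "equal" level but not the "preferred" level.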