8.12 Send speech quality and noise intrusiveness in the presence of ambient noise

3GPP TS 26.132 Release 18: Speech and video telephony terminal acoustic test specification

8.12.1 Handset UE

The speech quality in sending for wideband systems is tested based on ETSI TS 103 106 [34]. This test method yields three MOS-LQOw quality scores:

N-MOS-LQOw: Transmission quality of the background noise

S-MOS-LQOw: Transmission quality of the speech

G-MOS-LQOw: Overall transmission quality

The test arrangement is given in clause 5.1.5. For connections with a handset UE, the measurement is conducted for the 8 noise conditions described in Table 2h. The measurements should be made within a single dedicated call. The noise types shall be presented in the order specified in Table 2h.

Table 2h: Noise conditions used for ambient noise simulation in handset mode as specified in ES 202 396-1 [35]

Description | File name | Duration | Level L | Level R | Type
Recording in pub | Pub_Noise_binaural_V2 | 30 s | 75,0 dB(A) | 73,0 dB(A) | Binaural
Recording at pavement | Outside_Traffic_Road_binaural | 30 s | 74,9 dB(A) | 73,9 dB(A) | Binaural
Recording at pavement | Outside_Traffic_Crossroads_binaural | 20 s | 69,1 dB(A) | 69,6 dB(A) | Binaural
Recording at departure platform | Train_Station_binaural | 30 s | 68,2 dB(A) | 69,8 dB(A) | Binaural
Recording at the driver's position | Fullsize_Car1_130Kmh_binaural | 30 s | 69,1 dB(A) | 68,1 dB(A) | Binaural
Recording at sales counter | Cafeteria_Noise_binaural | 30 s | 68,4 dB(A) | 67,3 dB(A) | Binaural
Recording in a cafeteria | Mensa_binaural | 22 s | 63,4 dB(A) | 61,9 dB(A) | Binaural
Recording in business office | Work_Noise_Office_Callcenter_binaural | 30 s | 56,6 dB(A) | 57,8 dB(A) | Binaural
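For test automation it can help to keep Table 2h as data, so that the eight noise conditions are always presented in the specified order within one call. The following is a minimal sketch; the names `NoiseCondition`, `HANDSET_NOISE_CONDITIONS` and `measurement_order` are hypothetical and are not defined by this specification or by ETSI ES 202 396-1 [35].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseCondition:
    """One row of Table 2h (hypothetical helper type)."""
    file_name: str
    duration_s: int
    level_left_dba: float
    level_right_dba: float


# Table 2h rows, kept in the order in which the noise types shall be presented.
HANDSET_NOISE_CONDITIONS = (
    NoiseCondition("Pub_Noise_binaural_V2", 30, 75.0, 73.0),
    NoiseCondition("Outside_Traffic_Road_binaural", 30, 74.9, 73.9),
    NoiseCondition("Outside_Traffic_Crossroads_binaural", 20, 69.1, 69.6),
    NoiseCondition("Train_Station_binaural", 30, 68.2, 69.8),
    NoiseCondition("Fullsize_Car1_130Kmh_binaural", 30, 69.1, 68.1),
    NoiseCondition("Cafeteria_Noise_binaural", 30, 68.4, 67.3),
    NoiseCondition("Mensa_binaural", 22, 63.4, 61.9),
    NoiseCondition("Work_Noise_Office_Callcenter_binaural", 30, 56.6, 57.8),
)


def measurement_order(conditions=HANDSET_NOISE_CONDITIONS):
    """Return the noise file names in presentation order (all within one call)."""
    return [condition.file_name for condition in conditions]


for index, name in enumerate(measurement_order(), start=1):
    print(index, name)
```

Using an immutable tuple rather than a dictionary keeps the presentation order explicit and prevents it from being changed accidentally at run time.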

1) Before starting the measurements, a proper conditioning sequence shall be used. The conditioning sequence shall consist of the four additional sentences 1 to 4 described in ETSI TS 103 106 [34], prepended to the 16-sentence test sequence.

NOTE: Concatenating speech samples from alternating talkers into one test signal in the sending direction reduces the overall test time, but may represent unrealistic conditions for certain voice enhancement technologies. Alternative concatenations are for further study.

2) The send speech signal consists of the 16 sentences of speech described in ETSI TS 103 106 [34]. The test signal level is -1.7 dBPa at the MRP, measured as active speech level according to ITU-T P.56 [37]. Three signals are required for the tests:

– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 106 [34], ETSI EG 202 396‑3 [36]).

– The speech plus undisturbed background noise signal is recorded at the terminal’s microphone position using an omnidirectional measurement microphone with a linear frequency response between 50 Hz and 12 kHz.

– The send signal is recorded at the POI.

3) N-MOS-LQOw, S-MOS-LQOw and G-MOS-LQOw are calculated as described in ETSI TS 103 106 [34] on a per-sentence basis and averaged over all 16 sentences. The results shall be reported as the average and standard deviation.

4) The measurement is repeated for each ambient noise condition described in Table 2h.

5) The average of the results derived from all ambient noise types is calculated.
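The evaluation in steps 3 to 5 above can be sketched as follows, assuming that the per-sentence N-MOS-LQOw, S-MOS-LQOw and G-MOS-LQOw values have already been produced by an ETSI TS 103 106 [34] implementation (not shown). The function names are hypothetical. The text does not state whether the sample or population standard deviation is meant; the sketch assumes the sample standard deviation.

```python
import statistics

# The four conditioning sentences are played first but are not scored,
# so each list of values below holds one score per test sentence.
TEST_SENTENCES = 16


def summarize_condition(scores):
    """Step 3: mean and standard deviation of each score over the 16 test sentences.

    `scores` maps a score name (e.g. "G-MOS-LQOw") to its per-sentence values
    for one ambient noise condition.
    """
    summary = {}
    for name, values in scores.items():
        if len(values) != TEST_SENTENCES:
            raise ValueError(f"{name}: expected {TEST_SENTENCES} values, got {len(values)}")
        # Assumption: sample standard deviation (n - 1); the text does not specify which.
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


def average_over_conditions(summaries):
    """Step 5: average the per-condition means over all ambient noise types."""
    names = summaries[0].keys()
    return {name: statistics.mean(s[name][0] for s in summaries) for name in names}


# Usage with made-up scores for two noise conditions (illustration only).
pub = summarize_condition({"G-MOS-LQOw": [3.0] * 8 + [3.5] * 8})
road = summarize_condition({"G-MOS-LQOw": [2.5] * 8 + [3.0] * 8})
print(average_over_conditions([pub, road]))
```

In a full run, step 4 corresponds to calling `summarize_condition` once per row of Table 2h before averaging.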

8.12.2 Hand-held hands-free UE

For connections with a hand-held hands-free UE, when using the simulation method described in ETSI TS 103 224 [43], the measurement is conducted for the 5 noise conditions described in Table 2h2. When using the ETSI ES 202 396-1 [35] method, the equivalent binaurally recorded noises described in Table 2h2, which are available in the source file directory of ETSI TS 103 224 [43], are used.

Table 2h2: Noise conditions used for ambient noise simulation in hand-held hands-free mode as specified in TS 103 224 [43], A-weighted

Name | Description | Length | Hands-free levels | Binaural L | Binaural R
Full-size car 130 km/h (FullSizeCar_130) | HATS and microphone array at co-driver's position | 30 s | 1: 69,5 dB; 2: 68,6 dB; 3: 68,6 dB; 4: 68,7 dB; 5: 68,8 dB; 6: 68,8 dB; 7: 69,2 dB; 8: 69,7 dB | 68,7 dB | 70,7 dB
Crossroadnoise (Crossroadnoise) | HATS and microphone array standing outside near a crossroad | 30 s | 1: 69,9 dB; 2: 69,6 dB; 3: 69,6 dB; 4: 69,9 dB; 5: 69,6 dB; 6: 69,5 dB; 7: 69,6 dB; 8: 69,7 dB | 70,8 dB | 71,6 dB
Cafeteria (Cafeteria) | HATS and microphone array inside a cafeteria | 30 s | 1: 69,0 dB; 2: 69,7 dB; 3: 69,6 dB; 4: 69,8 dB; 5: 69,5 dB; 6: 69,5 dB; 7: 69,7 dB; 8: 70,0 dB | 69,8 dB | 70,3 dB
Sales Counter (SalesCounter) | HATS and microphone array in a supermarket | 30 s | 1: 65,5 dB; 2: 65,3 dB; 3: 65,2 dB; 4: 65,5 dB; 5: 65,6 dB; 6: 65,3 dB; 7: 65,2 dB; 8: 65,3 dB | 66,7 dB | 66,6 dB
Callcenter 2 (Callcenter) | HATS and microphone array in business office | 30 s | 1: 59,3 dB; 2: 59,3 dB; 3: 59,5 dB; 4: 59,6 dB; 5: 59,4 dB; 6: 59,3 dB; 7: 59,3 dB; 8: 59,5 dB | 60,2 dB | 60,0 dB
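Which level column of Table 2h2 applies depends on the reproduction method chosen above: the hands-free levels for the ETSI TS 103 224 [43] simulation, or the binaural L/R levels for the ETSI ES 202 396-1 [35] recordings. A minimal lookup sketch follows. The names are hypothetical, and the reading of the eight hands-free levels as one value per microphone channel 1 to 8 is an assumption, not something stated in this clause.

```python
# Hypothetical encoding of Table 2h2 (A-weighted levels in dB).
# "array": the eight hands-free levels (assumed: one per microphone channel 1..8).
# "binaural": (left, right) levels of the equivalent binaural recordings.
HHHF_NOISE_CONDITIONS = {
    "FullSizeCar_130": {"array": (69.5, 68.6, 68.6, 68.7, 68.8, 68.8, 69.2, 69.7), "binaural": (68.7, 70.7)},
    "Crossroadnoise": {"array": (69.9, 69.6, 69.6, 69.9, 69.6, 69.5, 69.6, 69.7), "binaural": (70.8, 71.6)},
    "Cafeteria": {"array": (69.0, 69.7, 69.6, 69.8, 69.5, 69.5, 69.7, 70.0), "binaural": (69.8, 70.3)},
    "SalesCounter": {"array": (65.5, 65.3, 65.2, 65.5, 65.6, 65.3, 65.2, 65.3), "binaural": (66.7, 66.6)},
    "Callcenter": {"array": (59.3, 59.3, 59.5, 59.6, 59.4, 59.3, 59.3, 59.5), "binaural": (60.2, 60.0)},
}

SIMULATION_TS_103_224 = "ETSI TS 103 224"
BINAURAL_ES_202_396_1 = "ETSI ES 202 396-1"


def reference_levels(condition, method):
    """Return the calibration levels of a Table 2h2 condition for the chosen method."""
    entry = HHHF_NOISE_CONDITIONS[condition]
    if method == SIMULATION_TS_103_224:
        return entry["array"]
    if method == BINAURAL_ES_202_396_1:
        return entry["binaural"]
    raise ValueError(f"unknown reproduction method: {method}")


print(reference_levels("Cafeteria", BINAURAL_ES_202_396_1))
```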

1) Before starting the measurements, a proper conditioning sequence shall be used. The conditioning sequence shall consist of the four additional sentences 1 to 4 described in ETSI TS 103 106 [34], prepended to the 16-sentence test sequence.

NOTE: Concatenating speech samples from alternating talkers into one test signal in the sending direction reduces the overall test time, but may represent unrealistic conditions for certain voice enhancement technologies. Alternative concatenations are for further study.

2) The send speech signal consists of the 16 sentences of speech described in ETSI TS 103 106 [34]. For connections with a hand-held hands-free UE, the test signal level is +1.3 dBPa at the MRP, measured as active speech level according to ITU-T P.56 [37]. Three signals are required for the tests:

– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 106 [34], ETSI EG 202 396‑3 [36]).

– The speech plus undisturbed background noise signal is recorded at the terminal’s microphone position using an omnidirectional measurement microphone with a linear frequency response between 50 Hz and 12 kHz.

– The send signal is recorded at the POI.

3) N-MOS-LQOw, S-MOS-LQOw and G-MOS-LQOw are calculated as described in ETSI TS 103 106 [34] on a per-sentence basis and averaged over all 16 sentences. The results shall be reported as the average and standard deviation.

4) The measurement is repeated for each ambient noise condition described in Table 2h2.

5) The average of the results derived from all ambient noise types is calculated.
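The handset and hand-held hands-free procedures differ in the target speech level at the MRP (-1.7 dBPa versus +1.3 dBPa). The sketch below converts these levels to sound pressure and to dB SPL, and computes the linear gain needed to move a measured level to the target. It assumes that the measured value is already an active speech level according to ITU-T P.56 [37]; the P.56 measurement itself is not implemented here, and all names are hypothetical.

```python
import math

P0_SPL = 20e-6  # reference sound pressure for dB SPL, in Pa

HANDSET_LEVEL_DBPA = -1.7  # clause 8.12.1, step 2
HHHF_LEVEL_DBPA = 1.3      # clause 8.12.2, step 2


def dbpa_to_pa(level_dbpa):
    """RMS sound pressure in Pa for a level in dB re 1 Pa."""
    return 10 ** (level_dbpa / 20)


def dbpa_to_db_spl(level_dbpa):
    """Same level expressed in dB SPL (re 20 uPa); 0 dBPa is about 94 dB SPL."""
    return level_dbpa + 20 * math.log10(1.0 / P0_SPL)


def gain_to_target(measured_dbpa, target_dbpa):
    """Linear gain that moves a measured active speech level to the target level."""
    return 10 ** ((target_dbpa - measured_dbpa) / 20)


for label, level in (("handset", HANDSET_LEVEL_DBPA), ("hand-held hands-free", HHHF_LEVEL_DBPA)):
    print(f"{label}: {level:+.1f} dBPa = {dbpa_to_pa(level):.3f} Pa = {dbpa_to_db_spl(level):.1f} dB SPL")
```

The two target levels differ by 3 dB, which corresponds to a sound pressure ratio of about 1.41.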

8.12.3 Electrical interface UE

The speech quality in sending for wideband systems is tested based on ETSI TS 103 106 [34]. This test method yields three MOS-LQOw quality scores:

N-MOS-LQOw: Transmission quality of the background noise

S-MOS-LQOw: Transmission quality of the speech

G-MOS-LQOw: Overall transmission quality

For the measurement of electrical interface UE, pre-recorded noisy speech signals according to Annex B of Recommendation ITU‑T P.381 [53] shall be used. These noisy test sequences are available for the eight noise types described in Table 2h and were captured at the electrical output of a representative analogue headset. The corresponding speech level at the MRP was calibrated to -1.7 dBPa, as described in clause 8.12.1. All test signals also include the conditioning sequence described in ETSI TS 103 106 [34], prepended to the 16-sentence test sequence.

Annex B of Recommendation ITU‑T P.381 [53] also provides the corresponding unprocessed reference speech signals, which are required for the calculation of S-MOS-LQOw, N-MOS-LQOw and G-MOS-LQOw according to ETSI TS 103 106 [34]. These signals were recorded with an omnidirectional measurement microphone close to the input microphone of the representative headset.

1) The test arrangement is given in clause 5.1.6. For analogue interfaces, the noisy test sequences according to Annex B of Recommendation ITU‑T P.381 [53] shall be calibrated such that -26 dBov corresponds to -60 dBV. For digital interfaces, -26 dBov shall correspond to -16 dBm0.

2) The noisy test sequence is inserted into the electrical interface of the UE, and the send signal is recorded at the POI.

3) N-MOS-LQOw, S-MOS-LQOw and G-MOS-LQOw are calculated as described in ETSI TS 103 106 [34] (wideband mode) on a per-sentence basis and averaged over all 16 sentences. The results shall be reported as the average and standard deviation. Three signals are required for the tests:

– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 106 [34], ETSI EG 202 396‑3 [36]).

– The speech plus undisturbed background noise signal: for each noisy test signal, a corresponding signal is also available in Annex B of Recommendation ITU‑T P.381 [53].

– The send signal is recorded at the POI.

4) The measurement is repeated for each ambient noise condition described in Table 2h. For each of these noise types, a corresponding test signal is available in Annex B of Recommendation ITU‑T P.381 [53].

5) The average of the results derived from all ambient noise types is calculated.
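The calibration in step 1 above fixes a constant offset between digital full-scale levels and interface levels: -26 dBov maps to -60 dBV (an offset of -34 dB) for analogue interfaces, and to -16 dBm0 (an offset of +10 dB) for digital interfaces. The sketch below applies these offsets, assuming a linear interface so that the same offset holds at every level. The function names are hypothetical.

```python
# Offsets derived from the two calibration points in step 1.
ANALOGUE_OFFSET_DB = -60.0 - (-26.0)  # dBV minus dBov: -34 dB
DIGITAL_OFFSET_DB = -16.0 - (-26.0)   # dBm0 minus dBov: +10 dB


def dbov_to_dbv(level_dbov):
    """Analogue interface: level in dBV for a test-sequence level in dBov."""
    return level_dbov + ANALOGUE_OFFSET_DB


def dbov_to_dbm0(level_dbov):
    """Digital interface: level in dBm0 for a test-sequence level in dBov."""
    return level_dbov + DIGITAL_OFFSET_DB


def dbv_to_volts_rms(level_dbv):
    """RMS voltage in V for a level in dB re 1 V."""
    return 10 ** (level_dbv / 20)


# The calibration point itself: -26 dBov -> -60 dBV (1 mV RMS) or -16 dBm0.
print(dbov_to_dbv(-26.0), dbv_to_volts_rms(dbov_to_dbv(-26.0)), dbov_to_dbm0(-26.0))
```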