8.12 Send speech quality and noise intrusiveness in the presence of ambient noise

3GPP TS 26.132, Release 18: Speech and video telephony terminal acoustic test specification


8.12.1 Handset UE

The speech quality in sending for wideband systems is tested based on ETSI TS 103 106 [34]. This test method leads to three MOS-LQOw quality numbers:

N-MOS-LQOw: Transmission quality of the background noise

S-MOS-LQOw: Transmission quality of the speech

G-MOS-LQOw: Overall transmission quality

The test arrangement is given in clause 5.1.5. For connections with handset UE, the measurement is conducted for the 8 noise conditions described in Table 2h. All measurements should be made within a single dedicated call. The noise types shall be presented in the order specified in Table 2h.

Table 2h: Noise conditions used for ambient noise simulation in handset mode as specified in ES 202 396-1 [35]

Description	File name	Duration	Level	Type
Recording in pub	Pub_Noise_binaural_V2	30 s	L: 75,0 dB(A) R: 73,0 dB(A)	Binaural
Recording at pavement	Outside_Traffic_Road_binaural	30 s	L: 74,9 dB(A) R: 73,9 dB(A)	Binaural
Recording at pavement	Outside_Traffic_Crossroads_binaural	20 s	L: 69,1 dB(A) R: 69,6 dB(A)	Binaural
Recording at departure platform	Train_Station_binaural	30 s	L: 68,2 dB(A) R: 69,8 dB(A)	Binaural
Recording at the drivers position	Fullsize_Car1_130Kmh_binaural	30 s	L: 69,1 dB(A) R: 68,1 dB(A)	Binaural
Recording at sales counter	Cafeteria_Noise_binaural	30 s	L: 68,4 dB(A) R: 67,3 dB(A)	Binaural
Recording in a cafeteria	Mensa_binaural	22 s	L: 63,4 dB(A) R: 61,9 dB(A)	Binaural
Recording in business office	Work_Noise_Office_Callcenter_binaural	30 s	L: 56,6 dB(A) R: 57,8 dB(A)	Binaural
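As an illustration of how a test script might hold Table 2h, the following sketch encodes the eight conditions as an ordered list, so that iterating over it follows the presentation order required above. The `NoiseCondition` type and the names `TABLE_2H` and `presentation_order` are hypothetical and are not defined by this specification. Levels are written with decimal points, as Python requires.

```python
from typing import List, NamedTuple


class NoiseCondition(NamedTuple):
    """One row of Table 2h (hypothetical container, not part of the spec)."""
    description: str
    file_name: str
    duration_s: int
    level_left_dba: float
    level_right_dba: float


# Rows are listed in the same order as Table 2h; a list keeps that order.
TABLE_2H: List[NoiseCondition] = [
    NoiseCondition("Recording in pub", "Pub_Noise_binaural_V2", 30, 75.0, 73.0),
    NoiseCondition("Recording at pavement", "Outside_Traffic_Road_binaural", 30, 74.9, 73.9),
    NoiseCondition("Recording at pavement", "Outside_Traffic_Crossroads_binaural", 20, 69.1, 69.6),
    NoiseCondition("Recording at departure platform", "Train_Station_binaural", 30, 68.2, 69.8),
    NoiseCondition("Recording at the drivers position", "Fullsize_Car1_130Kmh_binaural", 30, 69.1, 68.1),
    NoiseCondition("Recording at sales counter", "Cafeteria_Noise_binaural", 30, 68.4, 67.3),
    NoiseCondition("Recording in a cafeteria", "Mensa_binaural", 22, 63.4, 61.9),
    NoiseCondition("Recording in business office", "Work_Noise_Office_Callcenter_binaural", 30, 56.6, 57.8),
]


def presentation_order(conditions: List[NoiseCondition]) -> List[str]:
    """Return the noise file names in the order they are to be presented."""
    return [c.file_name for c in conditions]


if __name__ == "__main__":
    for index, name in enumerate(presentation_order(TABLE_2H), start=1):
        print(f"{index}: {name}")
```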

1) Before starting the measurements, a proper conditioning sequence shall be used. The conditioning sequence shall comprise the four additional sentences 1-4 described in ETSI TS 103 106 [34], applied to the beginning of the 16-sentence test sequence.

NOTE: The sequence of speech samples concatenated for the test signal, consisting of alternating talkers in the sending direction, reduces the overall test time but may represent an unrealistic behaviour for certain voice enhancement technologies. Alternative concatenations are for further study.

2) The send speech signal consists of the 16 sentences of speech as described in ETSI TS 103 106 [34]. The test signal level is -1.7 dBPa at the MRP, measured as active speech level according to ITU-T P.56 [37]. Three signals are required for the tests:

– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 106 [34], ETSI EG 202 396‑3 [36]).

– The speech plus undisturbed background noise signal is recorded at the terminal’s microphone position using an omnidirectional measurement microphone with a linear frequency response between 50 Hz and 12 kHz.

– The send signal is recorded at the POI.

3) N-MOS-LQOw, S-MOS-LQOw and G-MOS-LQOw are calculated as described in ETSI TS 103 106 [34] on a per sentence basis and averaged over all 16 sentences. The results shall be reported as average and standard deviation.

4) The measurement is repeated for each ambient noise condition described in Table 2h.

5) The average of the results derived from all ambient noise types is calculated.
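The reporting in steps 3) to 5) can be sketched as follows. This is a minimal illustration, not the ETSI TS 103 106 prediction model itself: the per-sentence scores are assumed to come from an external implementation of that model. The function names are hypothetical. The specification does not state whether the sample or the population standard deviation is intended; the sketch assumes the sample form (`statistics.stdev`), and `statistics.pstdev` can be substituted if the population form is wanted.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

SENTENCES_PER_CONDITION = 16  # length of the test sequence in step 2)


def summarise_condition(per_sentence_scores: List[float]) -> Tuple[float, float]:
    """Return (average, standard deviation) over the 16 per-sentence scores."""
    if len(per_sentence_scores) != SENTENCES_PER_CONDITION:
        raise ValueError("expected exactly 16 per-sentence scores")
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return mean(per_sentence_scores), stdev(per_sentence_scores)


def overall_average(condition_averages: List[float]) -> float:
    """Step 5): average of the per-condition averages over all noise types."""
    return mean(condition_averages)


def summarise_all(scores_by_noise: Dict[str, List[float]]) -> Tuple[Dict[str, Tuple[float, float]], float]:
    """Summarise every noise condition, then average across conditions."""
    per_condition = {name: summarise_condition(s) for name, s in scores_by_noise.items()}
    overall = overall_average([avg for avg, _ in per_condition.values()])
    return per_condition, overall
```

The same routine would be applied three times, once each for the N-MOS-LQOw, S-MOS-LQOw and G-MOS-LQOw scores.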

8.12.2 Hand-held hands-free UE

For connections with hand-held hands-free UE, when using the simulation method described in TS 103 224 [43], the measurement is conducted for 5 noise conditions as described in Table 2h2. When using the ES 202 396-1 method, the equivalent binaurally recorded noises described in Table 2h2, and available in the source file directory of TS 103 224 [43], are used.

Table 2h2: Noise conditions used for ambient noise simulation in hand-held hands-free mode as specified in TS 103 224 [43], A-weighted

Name	Description	Length	Hands-free Levels	Binaural L	Binaural R
Full-size car 130 km/h (FullSizeCar_130)	HATS and microphone array at co-drivers position	30 s	1: 69,5 dB 2: 68,6 dB 3: 68,6 dB 4: 68,7 dB 5: 68,8 dB 6: 68,8 dB 7: 69,2 dB 8: 69,7 dB	68,7 dB	70,7 dB
Crossroadnoise (Crossroadnoise)	HATS and microphone array standing outside near a crossroad	30 s	1: 69,9 dB 2: 69,6 dB 3: 69,6 dB 4: 69,9 dB 5: 69,6 dB 6: 69,5 dB 7: 69,6 dB 8: 69,7 dB	70,8 dB	71,6 dB
Cafeteria (Cafeteria)	HATS and microphone array inside a cafeteria	30 s	1: 69,0 dB 2: 69,7 dB 3: 69,6 dB 4: 69,8 dB 5: 69,5 dB 6: 69,5 dB 7: 69,7 dB 8: 70,0 dB	69,8 dB	70,3 dB
Sales Counter (SalesCounter)	HATS and microphone array in a supermarket	30 s	1: 65,5 dB 2: 65,3 dB 3: 65,2 dB 4: 65,5 dB 5: 65,6 dB 6: 65,3 dB 7: 65,2 dB 8: 65,3 dB	66,7 dB	66,6 dB
Callcenter 2 (Callcenter)	HATS and microphone array in business office	30 s	1: 59,3 dB 2: 59,3 dB 3: 59,5 dB 4: 59,6 dB 5: 59,4 dB 6: 59,3 dB 7: 59,3 dB 8: 59,5 dB	60,2 dB	60,0 dB

2) The send speech signal consists of the 16 sentences of speech as described in ETSI TS 103 106 [34]. For connections with a hands-free UE, the test signal level is +1.3 dBPa at the MRP, measured as active speech level according to ITU-T P.56 [37]. Three signals are required for the tests:

– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 106 [34], ETSI EG 202 396‑3 [36]).

– The send signal is recorded at the POI.

4) The measurement is repeated for each ambient noise condition described in Table 2h2.

5) The average of the results derived from all ambient noise types is calculated.
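The test signal levels in this clause are given in dBPa (-1.7 dBPa for handset UE in clause 8.12.1, +1.3 dBPa for hands-free UE here). The sketch below shows how such a level relates to sound pressure in pascals and to dB SPL, and how a playback gain could be derived from a measured level. It does not implement the ITU-T P.56 active speech level measurement; the measured level is assumed to be supplied by a separate P.56 tool. All function names are hypothetical.

```python
import math

# 0 dBPa is 1 Pa; dB SPL is referenced to 20 micropascals,
# so the offset between the two scales is about 93.98 dB.
DB_SPL_OFFSET = 20.0 * math.log10(1.0 / 20e-6)

HANDSET_LEVEL_DBPA = -1.7      # clause 8.12.1
HANDS_FREE_LEVEL_DBPA = 1.3    # clause 8.12.2


def dbpa_to_pascal(level_dbpa: float) -> float:
    """Convert a level in dBPa to RMS sound pressure in pascals."""
    return 10.0 ** (level_dbpa / 20.0)


def dbpa_to_db_spl(level_dbpa: float) -> float:
    """Convert a level in dBPa to dB SPL (re 20 micropascals)."""
    return level_dbpa + DB_SPL_OFFSET


def gain_to_target(measured_dbpa: float, target_dbpa: float) -> tuple:
    """Return (gain in dB, linear amplitude factor) needed to reach the target.

    measured_dbpa is assumed to be an active speech level obtained
    with an ITU-T P.56 meter at the MRP.
    """
    gain_db = target_dbpa - measured_dbpa
    return gain_db, 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    for label, level in (("handset", HANDSET_LEVEL_DBPA), ("hands-free", HANDS_FREE_LEVEL_DBPA)):
        print(f"{label}: {level:+.1f} dBPa = {dbpa_to_db_spl(level):.1f} dB SPL "
              f"= {dbpa_to_pascal(level):.3f} Pa")
```

The two nominal levels differ by 3 dB, which corresponds to a linear pressure ratio of about 1.41.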

8.12.3 Electrical interface UE

The speech quality in sending for wideband systems is tested based on ETSI TS 103 106 [34]. This test method leads to three MOS-LQOw quality numbers:

N-MOS-LQOw: Transmission quality of the background noise

S-MOS-LQOw: Transmission quality of the speech

G-MOS-LQOw: Overall transmission quality

For the measurement of electrical interface UE, pre-recorded noisy speech signals according to Annex B of Recommendation ITU‑T P.381 [53] shall be used. These noisy test sequences are available for the eight noise types described in Table 2h and were captured at the electrical output of a representative analogue headset. The corresponding speech level at MRP was calibrated to -1.7 dBPa, as described in clause 8.12.1. All test signals also include the proper conditioning sequence described in ETSI TS 103 106 [34], which is applied to the beginning of the 16-sentence test sequence.

Annex B of Recommendation ITU‑T P.381 [53] also provides the corresponding unprocessed reference speech signals, which are necessary for the calculation of S-MOS, N-MOS and G-MOS according to ETSI TS 103 106 [34]. These signals were recorded with an omnidirectional measurement microphone close to the input microphone of the representative headset.

1) The test arrangement is given in clause 5.1.6. For analogue interfaces, the noisy test sequences according to Annex B of Recommendation ITU‑T P.381 [53] shall be calibrated such that -26 dBov corresponds to -60 dBV. For digital interfaces, -26 dBov shall correspond to -16 dBm0.

2) The noisy test sequence is inserted into the electrical interface UE and then recorded at the POI.

3) N-MOS-LQOw, S-MOS-LQOw and G-MOS-LQOw are calculated as described in ETSI TS 103 106 [34] (wideband mode) on a per sentence basis and averaged over all 16 sentences. The results shall be reported as average and standard deviation. Three signals are required for the tests:

– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 106 [34], ETSI EG 202 396‑3 [36]).

– The speech plus undisturbed background noise signal. For each noisy test signal, a corresponding signal is available in Annex B of Recommendation ITU‑T P.381 [53] as well.

– The send signal is recorded at the POI.

4) The measurement is repeated for each ambient noise condition described in Table 2h. For each of these noise types, a corresponding test signal is available in Annex B of Recommendation ITU‑T P.381 [53].

5) The average of the results derived from all ambient noise types is calculated.
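The calibration points in step 1) can be read as fixed offsets between the file level in dBov and the interface level. The sketch below works through that arithmetic under the assumption that the mapping is a pure gain, so that a single calibration point determines the offset for every level. The function and constant names are hypothetical.

```python
# Calibration points from step 1): file level in dBov -> interface level.
ANALOGUE_REF_DBOV, ANALOGUE_REF_DBV = -26.0, -60.0
DIGITAL_REF_DBOV, DIGITAL_REF_DBM0 = -26.0, -16.0

# Assumption: the mapping is a constant offset in dB (a pure gain).
ANALOGUE_OFFSET_DB = ANALOGUE_REF_DBV - ANALOGUE_REF_DBOV   # -34 dB
DIGITAL_OFFSET_DB = DIGITAL_REF_DBM0 - DIGITAL_REF_DBOV     # +10 dB


def dbov_to_dbv(level_dbov: float) -> float:
    """Analogue interface: map a file level in dBov to dBV."""
    return level_dbov + ANALOGUE_OFFSET_DB


def dbov_to_dbm0(level_dbov: float) -> float:
    """Digital interface: map a file level in dBov to dBm0."""
    return level_dbov + DIGITAL_OFFSET_DB


def dbv_to_volts_rms(level_dbv: float) -> float:
    """Convert dBV (re 1 V RMS) to volts RMS."""
    return 10.0 ** (level_dbv / 20.0)


if __name__ == "__main__":
    dbv = dbov_to_dbv(-26.0)
    print(f"-26 dBov -> {dbv:.1f} dBV = {dbv_to_volts_rms(dbv) * 1000:.3f} mV RMS")
    print(f"-26 dBov -> {dbov_to_dbm0(-26.0):.1f} dBm0")
```

With these offsets, a file whose speech level is -26 dBov is presented at 1 mV RMS on an analogue interface.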