A.3 MOS-LQO Test

3GPP TS 26.444, Codec for Enhanced Voice Services (EVS); Test sequences, Release 17

A.3.1 General consideration

For this test, P.863 [16] is used. An implementation of [16] corresponding to version 2 (2014) was used to determine the threshold values to be met.

The audio database is based on ITU-T P.501 [17] Annexes B and C items and on mixed/music items, as detailed in clause A.3.2. For speech with background noise, pre-mixed items based on the same speech samples are used.

This test was used in EVS characterization reported in TS 26.952 [15].

For this test, four combinations of encoder/decoder are used (3GPP EVS fixed-point encoder/decoder executables are taken from TS 26.442 [7]):

a) 3GPP fixed-point encoder and 3GPP fixed-point decoder (FX/FX),

b) floating-point Encoder under Test and floating-point Decoder under Test (FL/FL),

c) 3GPP fixed-point encoder and floating-point Decoder under Test (FX/FL),

d) floating-point Encoder under Test and 3GPP fixed-point decoder (FL/FX).

The MOS-LQO scores are computed for each of the four cases using the decoded files and the original test files.

For each speech test condition, 30 files representing various talkers and languages are used, and the average MOS-LQO scores are reported. In addition, 30 mixed/music files are used for the non-speech test conditions, as described in clause A.3.2.

Scenario a) is considered the reference score. For the three other scenarios (b, c and d), the MOS-LQO differences relative to a) are computed:

– a) – b)

– a) – c)

– a) – d)

The difference a) – b) assesses the combined floating-point encoder and decoder implementation, the difference a) – c) assesses the decoder implementation, and the difference a) – d) assesses the encoder implementation.
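The averaging and differencing described above can be sketched as follows. This is a minimal illustration, assuming the per-file MOS-LQO scores of one test condition are already available as lists, in the same item order for all four scenarios; the function names are hypothetical. Because the same files are used in every scenario, the difference of the averages equals the average of the per-file differences. The signed difference is kept as written above; whether absolute values are taken is not stated in this clause.

```python
from statistics import fmean


def mean_mos(scores_per_file):
    """Average MOS-LQO over the files (e.g. 30 items) of one test condition."""
    return fmean(scores_per_file)


def condition_differences(a_scores, b_scores, c_scores, d_scores):
    """Return the a)-b), a)-c) and a)-d) MOS-LQO differences for one condition.

    Each argument is the list of per-file MOS-LQO scores for one scenario:
    a) FX/FX (reference), b) FL/FL, c) FX/FL, d) FL/FX.
    """
    ref = mean_mos(a_scores)
    return {
        "A-B": ref - mean_mos(b_scores),  # encoder + decoder under test
        "A-C": ref - mean_mos(c_scores),  # decoder under test
        "A-D": ref - mean_mos(d_scores),  # encoder under test
    }
```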

Figures A.4, A.5 and A.6 show the flow diagrams used to obtain the MOS-LQO for scenarios b), c) and d), respectively.

Figure A.4: Flow diagram to obtain the MOS-LQO for floating-point Encoder under Test and floating-point Decoder under Test

Figure A.5: Flow diagram to obtain the MOS-LQO for 3GPP fixed-point Encoder and floating-point Decoder under Test

Figure A.6: Flow diagram to obtain the MOS-LQO for floating-point Encoder under Test and 3GPP fixed-point decoder
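The processing chains of Figures A.4 to A.6, together with the reference chain a), can be sketched as below. This is a simplified illustration under stated assumptions: the executable paths are hypothetical, the positional command-line form `EVS_cod R Fs input bitstream` / `EVS_dec Fs bitstream output` (bit-rate in bit/s, sampling rate in kHz) is assumed and should be checked against the executables in use, and the options for DTX, bandwidth limitation and bit-rate switching, as well as the pre- and post-processing steps of the EVS processing plan [6], are omitted. The decoded output of each chain is then scored with P.863 [16] against the original file; that step is not shown because the interface of a P.863 implementation is product-specific.

```python
# Hypothetical paths: the 3GPP fixed-point executables from TS 26.442 and the
# floating-point implementation under test. Adjust to the local build.
FX_ENC, FX_DEC = "./fx/EVS_cod", "./fx/EVS_dec"
FL_ENC, FL_DEC = "./fl/EVS_cod", "./fl/EVS_dec"

# Encoder/decoder pairing for each scenario of clause A.3.1.
SCENARIOS = {
    "a": (FX_ENC, FX_DEC),  # FX/FX, reference
    "b": (FL_ENC, FL_DEC),  # FL/FL
    "c": (FX_ENC, FL_DEC),  # FX/FL
    "d": (FL_ENC, FX_DEC),  # FL/FX
}


def chain(scenario, bitrate_bps, fs_khz, item):
    """Return the encode and decode command lines for one item and scenario.

    Assumes the positional form 'EVS_cod R Fs in bitstream' and
    'EVS_dec Fs bitstream out'; all option flags are omitted.
    """
    enc, dec = SCENARIOS[scenario]
    bitstream = f"{item}.{scenario}.bit"
    decoded = f"{item}.{scenario}.out"
    return [
        [enc, str(bitrate_bps), str(fs_khz), item, bitstream],
        [dec, str(fs_khz), bitstream, decoded],
    ]
```

Each returned command list could be passed in order to `subprocess.run(cmd, check=True)`; keeping the four pairings in one table makes it easy to confirm that only the encoder, only the decoder, or both are swapped relative to the reference.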

A.3.2 Test Files

The speech test files are based on ITU-T P.501 [17] Annexes B and C and consist of 30 files representing various talkers and languages, listed below:

an1f1s1 => AnnexC//P501_C_chinese_f1_FB_48k.wav

an1f1s2 => AnnexC//P501_C_chinese_f2_FB_48k.wav

an1f1s3 => Speech and Noise Signals Clause B/Dutch_FB_clause_B.3.2//female 1.wav

an1f1s4 => Speech and Noise Signals Clause B/Dutch_FB_clause_B.3.2//female 2.wav

an1f1s5 => AnnexC//P501_C_english_f1_FB_48k.wav

an1f2s1 => AnnexC//P501_C_english_f2_FB_48k.wav

an1f2s2 => AnnexC//P501_C_finnish_f1_FB_48k.wav

an1f2s3 => AnnexC//P501_C_finnish_f2_FB_48k.wav

an1f2s4 => AnnexC//P501_C_french_f1_FB_48k.wav

an1f2s5 => AnnexC//P501_C_french_f2_FB_48k.wav

an1f3s1 => AnnexC//P501_C_german_f1_FB_48k.wav

an1f3s2 => AnnexC//P501_C_german_f2_FB_48k.wav

an1f3s3 => AnnexC//P501_C_italian_f1_FB_48k.wav

an1f3s4 => AnnexC//P501_C_italian_f2_FB_48k.wav

an1f3s5 => AnnexC//P501_C_japanese_f1_FB_48k.wav

an1m1s1 => AnnexC//P501_C_chinese_m1_FB_48k.wav

an1m1s2 => AnnexC//P501_C_chinese_m2_FB_48k.wav

an1m1s3 => Speech and Noise Signals Clause B/Dutch_FB_clause_B.3.2//male 1.wav

an1m1s4 => Speech and Noise Signals Clause B/Dutch_FB_clause_B.3.2//male 2.wav

an1m1s5 => AnnexC//P501_C_english_m1_FB_48k.wav

an1m2s1 => AnnexC//P501_C_english_m2_FB_48k.wav

an1m2s2 => AnnexC//P501_C_finnish_m1_FB_48k.wav

an1m2s3 => AnnexC//P501_C_finnish_m2_FB_48k.wav

an1m2s4 => AnnexC//P501_C_french_m1_FB_48k.wav

an1m2s5 => AnnexC//P501_C_french_m2_FB_48k.wav

an1m3s1 => AnnexC//P501_C_german_m1_FB_48k.wav

an1m3s2 => AnnexC//P501_C_german_m2_FB_48k.wav

an1m3s3 => AnnexC//P501_C_italian_m1_FB_48k.wav

an1m3s4 => AnnexC//P501_C_italian_m2_FB_48k.wav

an1m3s5 => AnnexC//P501_C_japanese_m1_FB_48k.wav

The noisy speech items are created by mixing the clean speech items above with car, street or office noise.

The mixed content and music items are selected from [13] and [14] as follows:

an1a1s1 => samples 840000:1104000 from {26444}/stv48c.INP,

an1a1s2 => samples 1260000:1404288 from {26444}/stv48c.INP,

an1a1s3 => samples 1611888:1793270 from {26444}/stv48c.INP,

an1a1s4 => samples 1793270:2057040 from {26444}/stv48c.INP,

an1a1s5 => left channel from {26406}/guitar_cymbals.wav,

an1a2s1 => left channel from {26274}/m_cl_x_1_org.wav,

an1a2s2 => left channel from {26274}/m_cl_x_2_org.wav,

an1a2s3 => left channel from {26274}/m_ot_x_5_org.wav,

an1a2s4 => left channel from {26274}/m_ot_x_8_org.wav,

an1a2s5 => left channel from {26274}/m_si_x_3_org.wav,

an1a3s1 => left channel from {26274}/m_ch_x_1_org.wav,

an1a3s2 => left channel from {26274}/m_po_x_2_org.wav,

an1a3s3 => left channel from {26274}/m_ot_x_4_org.wav,

an1a3s4 => left channel from {26274}/m_ot_x_9_org.wav,

an1a3s5 => left channel from {26274}/m_po_x_3_org.wav,

an1a4s1 => left channel from {26274}/m_ot_x_6_org.wav,

an1a4s2 => left channel from {26274}/m_ot_x_3_org.wav,

an1a4s3 => left channel from {26274}/m_ot_x_a_org.wav,

an1a4s4 => left channel from {26274}/m_ot_x_b.org.wav,

an1a4s5 => left channel from {26406}/hihat.wav,

an1a5s1 => left channel from {26274}/m_ot_x_7_org.wav,

an1a5s2 => left channel from {26274}/m_po_x_1_org.wav,

an1a5s3 => left channel from {26274}/m_ot_x_2_org.wav,

an1a5s4 => left channel from {26274}/m_po_x_4_org.wav,

an1a5s5 => left channel from {26274}/m_po_x_5_org.wav,

an1a6s1 => left channel from {26274}/m_po_x_6_org.wav,

an1a6s2 => left channel from {26274}/m_po_x_3_org.wav,

an1a6s3 => left channel from {26274}/m_si_x_1_org.wav,

an1a6s4 => left channel from {26274}/m_si_x_2_org.wav,

an1a6s5 => left channel from {26274}/m_vo_x_1_org.wav
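Extracting the items listed above can be sketched with the Python standard library. The sketch assumes that the .INP file is headerless 16-bit little-endian mono PCM and that a range "start:end" is half-open (sample end excluded), as in Python slicing; neither detail is stated in this clause. The .wav sources are assumed to be 16-bit stereo, and "left channel" is taken as the first sample of each interleaved frame. The function names are hypothetical, and any resampling or level alignment required by the processing plan is not shown.

```python
import array
import sys
import wave


def pcm16_segment(raw: bytes, start: int, end: int) -> array.array:
    """Samples [start, end) from headerless 16-bit little-endian mono PCM."""
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # raw data assumed little-endian
    return samples[start:end]


def left_channel(interleaved: array.array) -> array.array:
    """Left channel (even-indexed samples) of interleaved stereo 16-bit PCM."""
    return interleaved[0::2]


def read_wav_left(source) -> array.array:
    """Read a 16-bit stereo WAV (path or file object) and return its left channel."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 2:
            raise ValueError("expected 16-bit stereo WAV")
        frames = array.array("h")
        frames.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        frames.byteswap()  # WAV samples are little-endian
    return left_channel(frames)
```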

A.3.3 Test Conditions

The MOS-LQO differences are computed for various test conditions:

– All the codec modes of EVS

– All the bandwidths of EVS

– All the bit-rates of EVS, including bit-rate switching

– DTX ON and OFF

– Various levels: -26 dB, -36 dB, -16 dB

– Various noise conditions

– Various impairment conditions

The files have been processed for the various test conditions according to EVS-7c, the EVS processing plan [6]. In total, 941 test conditions are assessed.

For all 941 test conditions, processing the items detailed in clause A.3.2 generates roughly 225 000 seconds (about 62 hours) of PCM data, which shall be assessed with P.863 [16] version 2 to obtain the average MOS-LQO differences per test condition.

NOTE: Implementers are advised to ensure that sufficient free storage space is available, as the processing may require up to 100 GB of storage. Processing and P.863 [16] evaluation may also require a significant amount of time.
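The time and storage figures can be checked with simple arithmetic. The sketch below assumes 16-bit mono PCM at 48 kHz, which gives an upper bound for the decoded audio alone; lower-bandwidth conditions use lower sampling rates, while bitstreams, intermediate files and input copies add to the total. Whether the 225 000 seconds cover one scenario or all four is not stated here. The helper name is hypothetical.

```python
def pcm_size_bytes(seconds, fs_hz, channels=1, bytes_per_sample=2):
    """Size of raw PCM audio in bytes."""
    return seconds * fs_hz * channels * bytes_per_sample


TOTAL_SECONDS = 225_000  # approximate figure from clause A.3.3

hours = TOTAL_SECONDS / 3600                             # 62.5 hours
gigabytes = pcm_size_bytes(TOTAL_SECONDS, 48_000) / 1e9  # assumes 48 kHz, 16-bit mono
print(f"{hours:.1f} h, {gigabytes:.1f} GB")              # prints "62.5 h, 21.6 GB"
```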

A.3.4 Thresholds and Criteria

From the MOS-LQO differences of the individual test conditions, the average, 95th percentile, 99th percentile and maximum are computed for all bandwidths combined, as well as for each bandwidth separately. The number of test conditions for each bandwidth and the total are summarized in Table A.6.

Table A.6: Number of test conditions per bandwidth excluding the EVS JBM [11]

Bandwidth | NB | WB | WBIO | SWB | FB | All
Number | 118 | 214 | 216 | 170 | 143 | 861
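The statistics described above can be sketched as follows, run once for each difference type (a) – b), a) – c) and a) – d)). The percentile definition is not specified in this clause; this sketch assumes linear interpolation between closest ranks (the default method of `numpy.percentile`), and other definitions can give slightly different values for small sets. The input format, one (bandwidth, difference) pair per non-JBM test condition, and the function names are hypothetical choices.

```python
import math
from statistics import fmean


def percentile(values, p):
    """p-th percentile using linear interpolation between closest ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(diffs):
    """Average, 95th percentile, 99th percentile and maximum of the differences."""
    return {
        "Average": fmean(diffs),
        "95%": percentile(diffs, 95),
        "99%": percentile(diffs, 99),
        "Max": max(diffs),
    }


def summarize_by_bandwidth(records):
    """records: iterable of (bandwidth, difference) pairs, one per test condition."""
    groups = {}
    for bw, diff in records:
        groups.setdefault(bw, []).append(diff)
    result = {bw: summarize(ds) for bw, ds in groups.items()}
    result["All"] = summarize([d for ds in groups.values() for d in ds])
    return result
```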

An implementation according to TS 26.443 version 16.3.0 is considered to pass the MOS-LQO verification if the average, 95th percentile, 99th percentile and maximum MOS-LQO differences are all below the thresholds given in Table A.7 for all bandwidths.

Table A.7: Thresholds for MOS-LQO difference excluding the EVS JBM [11]

Bandwidth | Difference | Average | 95% | 99% | Max
All | A-B | 0.001 | 0.04 | 0.07 | 0.11
All | A-C | 0.001 | 0.02 | 0.04 | 0.09
All | A-D | 0.002 | 0.04 | 0.07 | 0.12
NB | A-B | 0.01 | 0.07 | 0.09 | 0.1
NB | A-C | 0.002 | 0.02 | 0.02 | 0.04
NB | A-D | 0.012 | 0.07 | 0.09 | 0.11
WB | A-B | 0.002 | 0.05 | 0.06 | 0.08
WB | A-C | 0.001 | 0.02 | 0.04 | 0.09
WB | A-D | 0.004 | 0.05 | 0.07 | 0.09
WBIO | A-B | 0.006 | 0.02 | 0.04 | 0.08
WBIO | A-C | 0.003 | 0.01 | 0.02 | 0.04
WBIO | A-D | 0.004 | 0.01 | 0.03 | 0.08
SWB | A-B | 0.002 | 0.04 | 0.06 | 0.08
SWB | A-C | 0.003 | 0.03 | 0.04 | 0.05
SWB | A-D | 0.003 | 0.04 | 0.07 | 0.08
FB | A-B | 0.005 | 0.05 | 0.07 | 0.11
FB | A-C | 0.003 | 0.03 | 0.04 | 0.06
FB | A-D | 0.003 | 0.05 | 0.07 | 0.12

NOTE: The MOS-LQO verification does not include testing of the EVS JBM solution. Conformance for the EVS JBM solution in [11] is FFS.
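A check of the pass criterion against Table A.7 can be sketched as below. The threshold values are transcribed from Table A.7; the input layout (statistics keyed by bandwidth, difference and statistic name) and the function names are hypothetical choices. "Below" is implemented as a strict comparison, and the signed differences are compared as given. Only the non-JBM conditions are covered, in line with the NOTE above.

```python
# Thresholds from Table A.7: (Average, 95%, 99%, Max) per bandwidth and difference.
THRESHOLDS = {
    "All":  {"A-B": (0.001, 0.04, 0.07, 0.11), "A-C": (0.001, 0.02, 0.04, 0.09), "A-D": (0.002, 0.04, 0.07, 0.12)},
    "NB":   {"A-B": (0.01, 0.07, 0.09, 0.1),   "A-C": (0.002, 0.02, 0.02, 0.04), "A-D": (0.012, 0.07, 0.09, 0.11)},
    "WB":   {"A-B": (0.002, 0.05, 0.06, 0.08), "A-C": (0.001, 0.02, 0.04, 0.09), "A-D": (0.004, 0.05, 0.07, 0.09)},
    "WBIO": {"A-B": (0.006, 0.02, 0.04, 0.08), "A-C": (0.003, 0.01, 0.02, 0.04), "A-D": (0.004, 0.01, 0.03, 0.08)},
    "SWB":  {"A-B": (0.002, 0.04, 0.06, 0.08), "A-C": (0.003, 0.03, 0.04, 0.05), "A-D": (0.003, 0.04, 0.07, 0.08)},
    "FB":   {"A-B": (0.005, 0.05, 0.07, 0.11), "A-C": (0.003, 0.03, 0.04, 0.06), "A-D": (0.003, 0.05, 0.07, 0.12)},
}
STATS = ("Average", "95%", "99%", "Max")


def failures(measured):
    """measured[bw][diff][stat] -> value. Returns entries that are not below the limit."""
    out = []
    for bw, per_diff in THRESHOLDS.items():
        for diff, limits in per_diff.items():
            for stat, limit in zip(STATS, limits):
                value = measured[bw][diff][stat]
                if not value < limit:  # criterion requires strictly below
                    out.append((bw, diff, stat, value, limit))
    return out


def passes(measured):
    """True if every statistic is below its Table A.7 threshold."""
    return not failures(measured)
```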