9 Super-wideband telephony transmission performance test methods
3GPP TS 26.132, Release 18: Speech and video telephony terminal acoustic test specification
9.1 Applicability
The test methods in this clause shall apply when testing a UE that is used to provide super-wideband telephony, either as a stand-alone service, or as part of a multimedia service.
The application force used to apply the handset against the artificial ear shall be 8 ± 2 N. For the headset case, the application of the headset shall comply with ITU-T Recommendation P.57 [14].
9.2 Overall loss/loudness ratings
9.2.1 General
The SLR and RLR values for GSM, 3G, LTE, NR or WLAN networks apply up to the POI. However, the main determining factors are the characteristics of the UE, including the analogue to digital conversion (ADC) and digital to analogue conversion (DAC). In practice, it is convenient to specify loudness ratings to the Air Interface. For the normal case, where the GSM, 3G, LTE, NR or WLAN network introduces no additional loss between the Air Interface and the POI, the loudness ratings to the PSTN boundary (POI) will be the same as the loudness ratings measured at the Air Interface.
9.2.2 Connections with handset UE
9.2.2.1 Sending loudness rating (SLR)
The test method is the same as for wideband (see sub-clause 8.2.2.1).
9.2.2.2 Receiving loudness rating (RLR)
The test method is the same as for wideband (see sub-clause 8.2.2.2, observing the signal properties for super-wideband described in sub-clause 5.4).
9.2.2.3 Receiving loudness rating (RLR) in the presence of background noise
The test method is the same as for wideband (see sub-clause 8.2.2.3, observing the signal properties for super-wideband described in sub-clause 5.4).
9.2.3 Connections with desktop hands-free UE
The description is the same as for wideband (see sub-clause 8.2.3).
9.2.3.1 Sending loudness rating (SLR)
The test method is the same as for wideband (see sub-clause 8.2.3.1).
9.2.3.2 Receiving loudness rating (RLR)
The test method is the same as for wideband (see sub-clause 8.2.3.2, observing the signal properties for super-wideband described in sub-clause 5.4).
9.2.4 Connections with hand-held hands-free UE
9.2.4.1 Sending loudness rating (SLR)
The test method is the same as for wideband (see sub-clause 8.2.4.1).
9.2.4.2 Receiving loudness rating (RLR)
The test method is the same as for wideband (see sub-clause 8.2.4.2, observing the signal properties for super-wideband described in sub-clause 5.4).
9.2.5 Connections with headset UE
The description is the same as for wideband (see sub-clause 8.2.5).
9.2.6 Connections with electrical interface UE
9.2.6.1 Sending junction loudness rating (SJLR)
The description is the same as for wideband (see sub-clause 8.2.6.1).
9.2.6.2 Receiving junction loudness rating (RJLR)
The description is the same as for wideband (see sub-clause 8.2.6.2).
9.3 Idle channel noise (handset, headset and electrical interface UE)
9.3.0 Overview
For idle noise measurements in sending and receiving directions, care should be taken that only the noise is windowed out by the analysis and the result is not impaired by any remaining reverberation or by noise and/or interference from various other sources. Some examples are air-conducted or vibration-conducted noise from sources inside or outside the test chamber, disturbances from lights and regulators, mains supply induced noise including grounding issues, test system and system simulator inherent noise as well as radio interference from the UE to test equipment such as ear simulators, microphone amplifiers, etc.
The following steps shall be followed in advance of the measurements in both directions:
a) The test environment shall comply with the conditions described in subclause 6.1.
b) The terminal should be configured to the test equipment as described in subclause 5.1.
c) A test signal may have to be intermittently applied to prevent ‘silent mode’ operation of the MS. This is for further study.
d) An optional activation sequence may be used, e.g. to override voice activity detection. In this case, the additional test signal shall be suitable in level and bandwidth, e.g. the composite source signals described in clause 9.10.
To improve repeatability, the test sequence (optional activation followed by the noise level measurement) may be contiguously repeated one or more times.
9.3.1 Sending (handset and headset UE)
In advance of the measurement, the general steps listed in clause 9.3.0 shall be followed.
a) In advance of the noise level measurement, an optional activation sequence may be used.
b) The noise level at the output of the SS is measured from 100 Hz to 16 kHz with A‑weighting. The A-weighting filter is described in IEC 61672 [38].
c) The measured part of the noise shall be 170,667 ms (which equals 8192 samples in a 48 kHz sample rate test system). The spectral distribution of the noise is analyzed with an 8k FFT using windowing with ≤ 0,1 dB leakage for non bin-centered signals. This can be achieved with a window function commonly known as a "flat top window". Within the specified frequency range, the FFT bin that has the highest level is searched for; the level of this bin is the maximum level of a single frequency disturbance.
d) The total noise powers obtained from such repeats shall be averaged. The total result shall be 10 * log10 of this average in dB.
e) The single frequency maximum powers obtained from such repeats shall be averaged. The total result shall be 10 * log10 of this average in dB.
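The averaging in steps d) and e) is performed on powers, not on dB values. The following Python sketch illustrates that arithmetic, the analysis-window duration from step c), and a closed-form approximation of the A-weighting curve. The function names and sample levels are hypothetical, and the closed-form expression is an assumption used for illustration only; IEC 61672 [38] remains the normative definition of the filter.

```python
import math

def window_duration_ms(n_samples=8192, fs_hz=48000):
    """Duration of the analysed noise segment in milliseconds."""
    return 1000.0 * n_samples / fs_hz

def a_weighting_db(freq_hz):
    """Closed-form A-weighting gain in dB (assumed analytic form, about 0 dB at 1 kHz)."""
    f2 = freq_hz ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.00

def average_level_db(levels_db):
    """Average repeated measurements as powers, then convert back to dB."""
    powers = [10.0 ** (level / 10.0) for level in levels_db]
    return 10.0 * math.log10(sum(powers) / len(powers))

# Hypothetical total noise levels (dB) from three repeats of the test sequence.
repeats_db = [-70.0, -68.0, -71.0]
print(round(window_duration_ms(), 3))          # 170.667
print(round(average_level_db(repeats_db), 2))
```

Averaging the dB values directly would bias the result towards the quieter repeats, which is why the conversion to power comes first.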
9.3.2 Receiving (handset and headset UE)
In advance of the measurement, the general steps listed in clause 9.3.0 shall be followed.
a) In advance of the noise level measurement, an optional activation sequence may be used.
b) The noise shall be measured from 100 Hz to 20 kHz with A‑weighting at the DRP with diffuse-field correction. The A-weighting filter is described in IEC 61672 [38].
c) The measured part of the noise shall be 170,667 ms (which equals 8192 samples in a 48 kHz sample rate test system). The spectral distribution of the noise is analyzed with an 8k FFT using windowing with ≤ 0,1 dB leakage for non bin-centered signals. This can be achieved with a window function commonly known as a "flat top window". Within the specified frequency range, the FFT bin that has the highest level is searched for; the level of this bin is the maximum level of a single frequency disturbance.
d) The total noise powers obtained from such repeats shall be averaged. The total result shall be 10 * log10 of this average in dB.
e) The single frequency maximum powers obtained from such repeats shall be averaged. The total result shall be 10 * log10 of this average in dB.
9.3.3 Sending (electrical interface UE)
Same method as in clause 9.3.1.
9.3.4 Receiving (electrical interface UE)
Same method as in clause 9.3.1, except that the idle noise signal is captured at the receive output of the electrical reference interface.
9.4 Sensitivity/frequency characteristics
9.4.0 General
For checking the sensitivity/frequency characteristics against performance requirements (as in e.g., 3GPP TS 26.131 [1]), any given tolerance mask shall be defined for each center frequency of the fractional octave bands, which is used in the respective test method. If necessary, the tolerance mask is interpolated linearly for a certain center frequency between the two closest neighbouring data points on a log-frequency scale and the magnitude in dB.
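The interpolation described above (linear in dB over a logarithmic frequency axis) can be sketched in Python as follows. The function name and the mask data points are hypothetical and serve only to show the calculation.

```python
import math

def interpolate_mask_db(mask_points, freq_hz):
    """Interpolate a tolerance mask at freq_hz.

    mask_points: list of (frequency_hz, limit_db) tuples sorted by frequency.
    Interpolation is linear in dB against log10(frequency).
    """
    for (f_lo, l_lo), (f_hi, l_hi) in zip(mask_points, mask_points[1:]):
        if f_lo <= freq_hz <= f_hi:
            frac = ((math.log10(freq_hz) - math.log10(f_lo))
                    / (math.log10(f_hi) - math.log10(f_lo)))
            return l_lo + frac * (l_hi - l_lo)
    raise ValueError("frequency outside the range covered by the mask")

# Hypothetical upper-limit mask with two data points.
mask = [(100.0, -10.0), (1000.0, 0.0)]
# 200 Hz lies about 30 % of the way from 100 Hz to 1 kHz on a log axis.
print(round(interpolate_mask_db(mask, 200.0), 2))
```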
9.4.1 Handset and headset UE sending
9.4.1.1 Handset UE sending
The headset case is similar to the handset one, except for the application force.
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The spectrum of the acoustic signal produced by the artificial mouth is calibrated under free-field conditions at the MRP. The test signal level shall be –4,7 dBPa measured at the MRP. The test signal level is calculated over the complete test signal sequence.
b) The handset terminal is set up as described in clause 5. Measurements shall be made at both 1/3-octave and 1/12-octave intervals as given by the R.10 and R.40 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at the electrical reference point for each frequency band is referred to the averaged test signal level measured in each frequency band at the MRP.
c) The sensitivity is expressed in terms of dBV/Pa.
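As an illustration of step b), the sending sensitivity in each band is the level at the electrical reference point minus the level at the MRP in the same band. The sketch below assumes the band levels have already been measured; the dictionary layout, function name and numeric values are hypothetical.

```python
def sending_sensitivity(measured_dbv, reference_dbpa):
    """Per-band sensitivity in dBV/Pa: electrical level minus MRP level."""
    return {band: measured_dbv[band] - reference_dbpa[band]
            for band in sorted(measured_dbv)}

# Hypothetical 1/3-octave band levels, keyed by centre frequency in Hz.
measured_dbv = {100: -42.0, 1000: -25.5, 16000: -48.0}
reference_dbpa = {100: -18.0, 1000: -14.0, 16000: -40.0}

for band, value in sending_sensitivity(measured_dbv, reference_dbpa).items():
    print(f"{band} Hz: {value:.1f} dBV/Pa")
```

The same subtraction applies in the receiving direction, with acoustic levels in dBPa referred to electrical levels in dBV.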
9.4.1.2 Headset UE sending
The headset case is similar to the handset one, except for the application force and measurement intervals (only 1/3-octave intervals are used).
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The spectrum of the acoustic signal produced by the artificial mouth is calibrated under free-field conditions at the MRP. The test signal level shall be –4,7 dBPa measured at the MRP. The test signal level is calculated over the complete test signal sequence.
b) The handset terminal is set up as described in clause 5. Measurements shall be made at 1/3-octave intervals as given by the R.10 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at the electrical reference point for each frequency band is referred to the averaged test signal level measured in each frequency band at the MRP.
c) The sensitivity is expressed in terms of dBV/Pa.
9.4.2 Handset and headset UE receiving
9.4.2.1 Handset UE receiving
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The test signal level shall be ‑16 dBm0 measured at the digital reference point or the equivalent analogue point. The test signal level is calculated over the complete test signal sequence.
b) The handset terminal is set up as described in clause 5. Measurements shall be made at both 1/3-octave and 1/12-octave intervals as given by the R.10 and R.40 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at each frequency band is referred to the averaged test signal level measured in each frequency band.
c) The HATS is diffuse-field equalized. The sensitivity is expressed in terms of dBPa/V. Information about correction factors is available in subclause 5.1.4.
Optionally, the measurements may be repeated with 2 N and 13 N application force. For these test cases no normative values apply.
9.4.2.2 Headset UE receiving
The headset case is similar to the handset one, except for the measurement intervals (only 1/3-octave intervals are used).
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The test signal level shall be ‑16 dBm0 measured at the digital reference point or the equivalent analogue point. The test signal level is calculated over the complete test signal sequence.
b) The handset terminal is set up as described in clause 5. Measurements shall be made at 1/3-octave intervals as given by the R.10 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at each frequency band is referred to the averaged test signal level measured in each frequency band.
c) The HATS is diffuse-field equalized. The sensitivity is expressed in terms of dBPa/V. Information about correction factors is available in subclause 5.1.4.
Optionally, the measurements may be repeated with 2 N and 13 N application force. For these test cases no normative values apply.
9.4.3 Desktop hands-free UE sending
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The spectrum of the acoustic signal produced by the artificial mouth is calibrated under free-field conditions at the MRP. The test signal level shall be –4,7 dBPa measured at the MRP. The test signal level is calculated over the complete test signal sequence. The broadband signal level is then adjusted to –28,7 dBPa at the HFRP or the HATS HFRP (as defined in ITU-T Recommendation P.581) and the spectrum is not altered.
The spectrum at the MRP and the actual level at the MRP (measured in 1/3-octaves) are used as references to determine the sending sensitivity SmJ.
b) The hands-free terminal is set up as described in clause 5. Measurements shall be made at 1/3-octave intervals as given by the R.10 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at each frequency band is referred to the averaged test signal level measured in each frequency band.
c) The sensitivity is expressed in terms of dBV/Pa.
9.4.4 Desktop hands-free UE receiving
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The test signal level shall be ‑16 dBm0 measured at the digital reference point or the equivalent analogue point. The test signal level is calculated over the complete test signal sequence.
b) The hands-free terminal is set up as described in clause 5. If a HATS is used, then it is free-field equalized as described in ITU-T Recommendation P.581. The equalized output signal of each artificial ear is power-averaged over the total duration of the analysis; the right and left artificial ear signals are voltage-summed for each 1/3-octave frequency band; these 1/3-octave band data are considered as the input signal to be used for calculations or measurements. Measurements shall be made at 1/3-octave intervals as given by the R.10 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at each frequency band is referred to the averaged test signal level measured in each frequency band.
c) The sensitivity is expressed in terms of dBPa/V.
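The combination of the two HATS ear signals in step b) can be sketched as below. The sketch assumes that "power-averaged" means averaging per-frame band powers over the analysis duration and that "voltage-summed" means adding the resulting RMS amplitudes of the left and right bands; both readings, as well as the function names and values, are assumptions for illustration.

```python
import math

def band_power_average_db(frame_levels_db):
    """Power-average a sequence of per-frame band levels (dB) over time."""
    powers = [10.0 ** (level / 10.0) for level in frame_levels_db]
    return 10.0 * math.log10(sum(powers) / len(powers))

def binaural_voltage_sum_db(left_db, right_db):
    """Sum left and right band levels as amplitudes and return the level in dB."""
    amplitude = 10.0 ** (left_db / 20.0) + 10.0 ** (right_db / 20.0)
    return 20.0 * math.log10(amplitude)

# Hypothetical per-frame levels (dBPa) of one 1/3-octave band at each ear.
left = band_power_average_db([-21.0, -19.5, -20.2])
right = band_power_average_db([-22.0, -20.5, -21.1])
print(round(binaural_voltage_sum_db(left, right), 2))
```

Under this reading, two ears with identical band levels give a summed level about 6 dB above either ear alone.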
9.4.5 Hand-held hands-free UE sending
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The spectrum of the acoustic signal produced by the artificial mouth is calibrated under free-field conditions at the MRP. The test signal level shall be –4,7 dBPa measured at the MRP. The test signal level is calculated over the complete test signal sequence. The broadband signal level is then adjusted to –28,3 dBPa at the HFRP or the HATS HFRP (as defined in subclause 9.2.3.1) and the spectrum is not altered.
The spectrum at the MRP and the actual level at the MRP (measured in 1/3-octaves) are used as reference to determine the sending sensitivity SmJ.
b) The hands-free terminal is set up as described in clause 5.1.3.3. Measurements shall be made at 1/3-octave intervals as given by the R.10 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at each frequency band is referred to the averaged test signal level measured in each frequency band.
c) The sensitivity is expressed in terms of dBV/Pa.
9.4.6 Hand-held hands-free UE receiving
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The test signal level shall be ‑16 dBm0 measured at the digital reference point or the equivalent analogue point. The test signal level is calculated over the complete test signal sequence.
b) The hands-free terminal is set up as described in clause 5. If a HATS is used, then it is free-field equalized as described in ITU-T Recommendation P.581. The equalized output signal of each artificial ear is power-averaged over the total duration of the analysis; the right and left artificial ear signals are voltage-summed for each 1/3-octave frequency band; these 1/3-octave band data are considered as the input signal to be used for calculations or measurements. Measurements shall be made at 1/3-octave intervals as given by the R.10 series of preferred numbers in ISO 3 for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at each frequency band is referred to the averaged test signal level measured in each frequency band.
c) The sensitivity is expressed in terms of dBPa/V.
9.4.7 Electrical interface UE sending
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The active speech level of the signal shall be calibrated to -60 dBV for analogue and to -16 dBm0 for digital connections. The test signal level is calculated over the complete test signal sequence.
b) The reference signal to be used for the calculation shall be the same as the test signal and is calibrated to ‑4.7 dBPa (independent of analogue or digital connection).
c) The electrical interface is set up as described in clause 5.1.6. Measurements shall be made at both 1/3-octave and 1/12-octave intervals as given by the R.10 and R.40 series of preferred numbers in ISO 3 [54] for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the averaged measured level at the electrical reference point for each frequency band is referred to the averaged reference signal level measured in each frequency band.
d) The sensitivity is expressed in terms of dB.
9.4.8 Electrical interface UE receiving
a) The test signal to be used for the measurements shall be the British-English single talk sequence described in ITU-T Recommendation P.501 [22]. The test signal level shall be -16 dBm0 measured at the digital reference point or the equivalent analogue point. The test signal level is calculated over the complete test signal sequence.
b) The reference signal to be used for the calculation shall be the same as the test signal and is calibrated to ‑39 dBV for analogue and to ‑16 dBm0 for digital connections.
c) The handset terminal is set up as described in clause 5. Measurements shall be made at both 1/3-octave and 1/12-octave intervals as given by the R.10 and R.40 series of preferred numbers in ISO 3 [54] for frequencies from 100 Hz to 16 kHz inclusive. For the calculation, the average measured level at the output of the electrical interface UE for each frequency band is referred to the reference signal.
d) The sensitivity is expressed in terms of dB.
9.5 Sidetone characteristics
9.5.1 Connections with handset UE
The test method is the same as for wideband (see sub-clause 8.5.1).
9.5.2 Headset UE
The test method is the same as for wideband (see sub-clause 8.5.2).
9.5.3 Hands-free UE (all categories)
No requirement other than echo control.
9.5.3a Electrical interface UE
The test method is the same as for wideband (see sub-clause 8.5.3a).
9.5.4 Sidetone delay for handset, headset or electrical interface UE
The test method is the same as for wideband (see sub-clause 8.5.4).
9.6 Stability loss
Where a user-controlled volume control is provided it is set to maximum.
Handset UE: The handset is placed on a hard plane surface with the earpiece facing the surface.
Headset UE: The requirement applies for the closest possible position between microphone and headset receiver within the intended wearing position.
NOTE: Depending on the type of headset it may be necessary to repeat the measurement in different positions.
Hands-free UE (all categories): No requirement other than echo loss.
Before the actual test a training sequence consisting of the British-English single talk sequence described in ITU-T Recommendation P.501 [22] is applied. The training sequence level shall be ‑16 dBm0 in order to not overload the codec.
The test signal is a PN-sequence complying with ITU-T Recommendation P.501 with a length of 4 096 points (for a 48 kHz sampling rate system) and a crest factor of 6 dB instead of 11 dB. The PN-sequence is generated as described in P.501 with W(k) constant within the frequency range 100-16000 Hz and zero outside this range. The duration of the test signal is 250 ms. With an input signal of ‑3 dBm0, the attenuation from input to output of the system simulator shall be measured under the following conditions:
a) The handset or the headset, with the transmission circuit fully active, shall be positioned on a hard plane surface with at least 400 mm free space in all directions. The earpiece shall face towards the surface as shown in figure 20;
b) The headset microphone is positioned as close as possible to the receiver(s) within the intended wearing position;
c) For a binaural headset, the receivers are placed symmetrically around the microphone.
NOTE: All dimensions in mm.
Figure 20. Test configuration for stability loss measurement on handset or headset UE
The attenuation from input to output shall be measured in the frequency range from 100 Hz to 16 kHz. The spectral distribution of the output signal is analysed with a 4k FFT (for a 48 kHz sample rate test system), thus the measured part of the output signal is 85,333 ms. To avoid leakage effects the frequency resolution of the FFT must be the same as the frequency spacing of the PN-sequence.
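The relation between FFT length, analysed duration and frequency resolution stated above can be checked with a short Python sketch. The function names are hypothetical; the numbers follow from the 4k FFT and 48 kHz sampling rate given in this clause.

```python
import math

def fft_parameters(n_fft, fs_hz=48000):
    """Return (analysed duration in ms, frequency resolution in Hz)."""
    return 1000.0 * n_fft / fs_hz, fs_hz / n_fft

def bins_in_range(n_fft, fs_hz, f_low_hz, f_high_hz):
    """First and last FFT bin index whose centre lies inside the range."""
    resolution = fs_hz / n_fft
    return math.ceil(f_low_hz / resolution), math.floor(f_high_hz / resolution)

duration_ms, resolution_hz = fft_parameters(4096)
print(round(duration_ms, 3), resolution_hz)      # 85.333 11.71875
print(bins_in_range(4096, 48000, 100, 16000))    # (9, 1365)
```

A PN-sequence of 4 096 points at 48 kHz has the same 11,71875 Hz line spacing, so each spectral line of the test signal falls exactly on an FFT bin.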
9.7 Acoustic echo control
9.7.1 General
The echo loss (EL) presented by the GSM, 3G, LTE, NR or WLAN networks at the POI should be at least 46 dB during single talk. This value takes into account the fact that UE is likely to be used in a wide range of noise environments.
NOTE: A test method fully adapted to super-wideband acoustic echo control is for further study.
The calculation of terminal coupling loss (TCL) is based on the attenuation from reference point input to reference point output versus frequency bands. The following common measurement steps are applicable for all types of UE described below:
a) The attenuation from reference point input to reference point output shall be measured using the compressed real speech signal described in clause 7.3.3 of ITU-T P.501 Amendment 1 [33]. The test signal level shall be ‑10 dBm0.
b) The first 17,0 s of the test signal (6 sentences) are discarded from the analysis to allow for convergence of the acoustic echo canceller. The analysis is performed over the remaining length of the test sequence (last 6 sentences).
c) The analysis shall be conducted in 1/3-octave band intervals as given by the R.10 series of preferred numbers in ISO 3 [54]. For the calculation, the averaged measured echo level at each frequency band is referred to the averaged test signal level measured in each frequency band.
d) The TCL is calculated according to ITU-T Recommendation G.122 [8], annex B, clause B.4 (trapezoidal rule), but using the frequency range from 300 Hz to 6700 Hz (instead of 300 Hz to 3400 Hz).
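The calculation in step d) can be sketched as follows. The sketch assumes that the trapezoidal rule is a log-frequency weighted average of the per-band echo power ratios, and that for the wider band the normalization is taken over the actual integration range (for 300 Hz to 3400 Hz this normalization evaluates to about 3,24 dB). Both points are assumptions of this illustration rather than a restatement of the normative formula; the function name and sample values are hypothetical.

```python
import math

def tcl_trapezoidal_db(freqs_hz, loss_db):
    """Log-frequency trapezoidal average of per-band echo loss, in dB.

    freqs_hz: ascending band frequencies spanning the integration range.
    loss_db:  echo attenuation (dB) at each of those frequencies.
    """
    ratios = [10.0 ** (-a / 10.0) for a in loss_db]  # output/input power ratios
    weighted = 0.0
    for i in range(1, len(freqs_hz)):
        width = math.log10(freqs_hz[i]) - math.log10(freqs_hz[i - 1])
        weighted += (ratios[i] + ratios[i - 1]) * width
    # Assumed normalization: twice the log-width of the whole range.
    span = 2.0 * math.log10(freqs_hz[-1] / freqs_hz[0])
    return -10.0 * math.log10(weighted / span)

# Hypothetical per-band attenuation values between 300 Hz and 6700 Hz.
freqs = [300, 400, 500, 630, 800, 1000, 1250, 1600, 2000,
         2500, 3150, 4000, 5000, 6300, 6700]
losses = [50, 49, 48, 47, 46, 46, 47, 48, 49, 50, 51, 52, 52, 53, 53]
print(round(tcl_trapezoidal_db(freqs, losses), 1))
```

Because the average is taken on power ratios, bands with low attenuation dominate the result; a flat attenuation across the range returns that same value.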
9.7.2 Acoustic echo control in a hands-free UE
The hands-free UE is set up according to clause 5.
The TCL is measured and calculated according to clause 9.7.1.
9.7.3 Acoustic echo control in a handset UE
The handset UE is set up according to clause 5. The ambient noise level shall be ≤ -64 dBPa(A).
The TCL is measured and calculated according to clause 9.7.1.
9.7.4 Acoustic echo control in a headset UE
The headset is set up according to clause 5. The ambient noise level shall be ≤ -64 dBPa(A). The TCL is measured and calculated according to clause 9.7.1.
9.7.5 Acoustic echo control in an electrical interface UE
The electrical interface UE is set up according to clause 5.1.6. In order to simulate an acoustic echo, the electrical reference interface shall introduce an echo loss of 30 dB.
The TCL is measured and calculated according to clause 9.7.1.
9.8 Distortion
9.8.1 Sending distortion
The test method is the same as for wideband (see sub-clause 8.8.1).
9.8.2 Receiving distortion
The test method is the same as for wideband (see sub-clause 8.8.2, observing the signal properties for super-wideband described in sub-clause 5.4).
9.9 Void
9.10 Delay
9.10.0 UE Delay Measurement Methodologies
The test method is the same as in wideband (see clause 8.10.0).
9.10.1 Delay in sending direction (handset UE)
The test method is the same as in wideband (see clause 8.10.1).
9.10.1a Delay in sending direction (headset UE)
The test method is the same as in wideband (see clause 8.10.1a).
9.10.1b Delay in sending direction (electrical interface UE)
The test method is the same as in wideband (see clause 8.10.1b).
9.10.2 Delay in receiving direction (handset UE)
The test method is the same as in wideband (see clause 8.10.2, observing the test signal properties defined for super-wideband described in clause 5.4).
9.10.2a Delay in receiving direction (headset UE)
The test method is the same as in wideband (see clause 8.10.2a, observing the test signal properties for super-wideband described in clause 5.4).
9.10.2b Delay in receiving direction (electrical interface UE)
The test method is the same as in wideband (see clause 8.10.2b, observing the test signal properties for super-wideband described in clause 5.4).
9.10.3 Delay in sending + receiving direction using "echo" method (handset UE)
The test method is the same as in wideband (see clause 8.10.3, observing the test signal properties for super-wideband described in clause 5.4).
9.10.3a Delay in sending + receiving direction using "echo" method (headset UE)
The test method is the same as in wideband (see clause 8.10.3a, observing the test signal properties for super-wideband in clause 5.4).
9.10.3b Delay in sending + receiving direction using "echo" method (electrical interface UE)
The test method is the same as in wideband (see clause 8.10.3b, observing the test signal properties for super-wideband in clause 5.4).
9.10.4 Delay and speech quality in conditions with packet arrival time variations and packet loss (handset, headset, electrical interface UE)
9.10.4.1 Delay in sending direction
The test method is the same as in wideband (see clause 8.10.4.1).
9.10.4.2 Delay in receiving direction
For this test it shall be ensured that the call is originated from the UE.
NOTE 1: Differences have been observed between UE-originated and UE-terminated calls. For better consistency, calls originated from the UE are used.
The test signal consists of 3 repeats of the Composite Source Signal (CSS) according to ITU-T Recommendation P.501 [22] followed by a speech signal of 160s. During the first two CSS signals the terminal can adapt its jitter buffer. The third CSS is used for measuring the delay in constant-delay condition, and the speech signal is used for delay and quality measurement in the packet impairment condition.
A constant delay Tc corresponding to the minimum delay of the profile (i.e. the compensation value for the profile) shall be added at the beginning of the different delay/loss profiles, to avoid unnecessary delay jumps between the two measurement phases and to ensure realistic conditions for the second measurement phase.
In receiving direction, the delay between the electrical access point of the test equipment and the reference point (RP), TTEAP-RP(t) = TR-jitter(t) + TTER, is measured in two successive phases:
1 First the delay in constant-delay condition TTEAP-RP-constant is measured as described in steps 1 to 4, clause 9.10.2/ 9.10.2a/9.10.2b, using the third CSS signal. The constant delay Tc is subtracted from TTEAP-DRP to obtain TR-constant.
2 Then the delay with packet impairment TR-jitter(t) is measured continuously for a speech signal during the inclusion of packet delay and loss profiles in the receiving direction RTP voice stream.
The reference point is defined as follows:
– for handset and headset UE, the reference point is the DRP.
– for electrical interface UE, the reference point is the input of the electrical reference interface.
Packet impairments shall be applied between the reference client and system simulator eNodeB. Separate calls shall be established for each packet impairment condition.
The start of the delay profiles must be synchronized with the start of the downlink speech material reproduction (compensated by the delay between reproduction and the point of impairment insertion, i.e. the delay of the reference client) in order to ensure a repeatable application of impairments to the test speech signal. Tests shall be performed with DTX enabled in the reference client.
NOTE 2: RTP packet impairments representing packet delay variations and loss in LTE transmission scenarios are specified in Annex E. These LTE jitter/loss profiles are reused also for tests with WLAN and NR access. Care must be taken that the system simulator uses a dedicated bearer with no buffering/scheduling of packets for transmission.
For the CSS signal repeated 3 times, the pseudo random noise (pn)-part of the CSS has to be longer than the maximum expected delay. It is recommended to use a pn sequence of 32 k samples (with 48 kHz sampling rate). The test signal level is -16 dBm0 measured at the digital reference point or the equivalent analogue point.
For the speech signal, 8 English test sentences according to ITU-T P.501 Annex C.2.3, normalized to an active speech level of -16 dBm0, are used (2 male, 2 female speakers). The sequences are concatenated in such a way that all sentences are centered within a 4.0 s time window, which results in an overall duration of 32.0 s. The sequences are repeated 5 times, resulting in a test file 160.0 s long. The first 2 sentences are used for convergence of the UE jitter buffer manager and are discarded from the analysis. Equivalent implementations of the concatenation by repeating the test sentences in sequence may be used.
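The timing of the concatenated speech file follows directly from these numbers. A minimal Python sketch, using hypothetical function and variable names, lists the 4 s sentence windows and the ones that remain after the first 2 sentences are discarded.

```python
def sentence_windows(n_sentences=8, window_s=4.0, repeats=5):
    """Return (start, end) times in seconds of every sentence window."""
    total = n_sentences * repeats
    return [(i * window_s, (i + 1) * window_s) for i in range(total)]

windows = sentence_windows()
analysed = windows[2:]            # first 2 sentences are for convergence only

print(len(windows), windows[-1])  # 40 (156.0, 160.0)
print(len(analysed))              # 38
```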
For the delay calculation with the speech signal, a cross-correlation with a rectangular window length of 4 s, centered at each sentence of the stimulus file, is used. The process is repeated for each sample. For each cross correlation, the maximum of the envelope is obtained producing one delay value per sentence.
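The per-sentence delay estimate can be illustrated with a brute-force cross-correlation. This is a simplified sketch: it takes the lag of the largest absolute correlation value instead of the maximum of the envelope, it works on a short synthetic noise signal rather than on 4 s speech windows, and all names are hypothetical.

```python
import random

def estimate_delay_samples(reference, received, max_lag):
    """Return the lag (in samples) that maximizes |cross-correlation|."""
    best_lag, best_value = 0, -1.0
    for lag in range(max_lag + 1):
        acc = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if m < len(received):
                acc += x * received[m]
        if abs(acc) > best_value:
            best_value, best_lag = abs(acc), lag
    return best_lag

# Synthetic stimulus: 200 random samples, received 37 samples later.
rng = random.Random(0)
stimulus = [rng.uniform(-1.0, 1.0) for _ in range(200)]
received = [0.0] * 37 + stimulus

lag = estimate_delay_samples(stimulus, received, max_lag=100)
print(lag, "samples =", round(1000.0 * lag / 48000, 3), "ms at 48 kHz")
```

In a real measurement the search range has to cover the largest expected delay, and an FFT-based correlation would normally replace the nested loops for speed.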
The UE delay in the receive direction, TR-jitter(t), is obtained by subtracting the delay introduced by the test equipment and the simulated transport network packet delay introduced by the delay and loss profile (as specified for the respective profile in Annex E) from the first electrical event at the electrical access point of the test equipment to the first bit of the corresponding speech frame at the system simulator antenna, TTER, from the measured TTEAP-DRP(t).
The difference DT between maximum receiving delay obtained with at least 5 individual calls (see clause 7.10.2) and the delay TR-constant measured for the CSS signal in constant delay condition is calculated. The quantity "Call-to-Call Variability Adjustment" (CCVA) = max(0,DT) shall be added to the obtained delay for the speech signal TR-jitter(t).
For stationary packet delay variation test conditions (test condition 1 and 2), the first 2 sentences are used for convergence of the jitter buffer management and are discarded from the analysis. The CCVA-adjusted UE delay (TR-CCVA(t) = TR-jitter(t) + CCVA) in the receiving direction shall be reported as the maximum value excluding the two largest values of the remaining sequence of the 38 sentence delay values, i.e. the 95-percentile value of TR-CCVA(t). The TR-CCVA values for all 40 sentences shall be reported in the test report.
NOTE 3: The synchronization of the speech frame processing in the UE to the bits of the speech frames at the UE antenna may lead to a variability of up to 20 ms of the measured UE receive delay between different calls. This synchronization is attributed to the UE receiving delay according to the definition of the UE delay reference points. The effect of this possible call-to-call variation is taken into account with the CCVA = max(0,DT) value.
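The CCVA adjustment and the reported value for the stationary test conditions can be sketched in Python as follows. The function names and delay values are hypothetical; the sketch assumes the 40 per-sentence delays TR-jitter(t) are already available in milliseconds.

```python
def ccva_ms(max_delay_over_calls_ms, tr_constant_ms):
    """Call-to-Call Variability Adjustment: max(0, DT)."""
    return max(0.0, max_delay_over_calls_ms - tr_constant_ms)

def reported_delay_ms(sentence_delays_ms, ccva, discard=2, exclude_largest=2):
    """Return (all CCVA-adjusted delays, value to report).

    The first `discard` sentences are dropped, then the largest remaining
    value after excluding the `exclude_largest` highest ones is reported.
    """
    adjusted = [d + ccva for d in sentence_delays_ms]
    remaining = sorted(adjusted[discard:])
    return adjusted, remaining[-(exclude_largest + 1)]

# Hypothetical example: 40 sentence delays rising from 100 ms to 139 ms.
delays = [100.0 + i for i in range(40)]
adjustment = ccva_ms(max_delay_over_calls_ms=65.0, tr_constant_ms=60.0)
all_values, reported = reported_delay_ms(delays, adjustment)
print(adjustment, reported)   # 5.0 142.0
```

With 38 remaining values, excluding the two largest leaves the third-largest value, which is what the text refers to as the 95-percentile.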
9.10.4.3 Speech quality loss in conditions with packet arrival time variations and packet loss
The test method is the same as in wideband (see clause 8.10.4.3, observing the test signal properties for super-wideband described in clause 5.4).
9.10.5 UE send clock accuracy
The UE clock accuracy in send direction shall be measured according to Annex D.
NOTE 1: For this specific measurement, care should be taken about the clock accuracy of the test equipment. See Table 1a.
NOTE 2: As required in clause 5, prior to the actual measurements for MTSI-based speech with LTE, NR or WLAN access, the clocks of the reference client and the UE have to be synchronized. This measurement of UE send clock accuracy does not need to be repeated and can be obtained from this setup procedure.
9.10.6 UE receiving with clock skew
For further study.
9.11 Echo control characteristics
9.11.1 Test set-up and test signals
The test method is the same as for wideband (see sub-clause 8.11.1, observing the signal properties for super-wideband described in sub-clause 5.4).
9.11.2 Test method
The test method is the same as for wideband (see sub-clause 8.11.2, observing the signal properties for super-wideband described in sub-clause 5.4).
9.11.2.1 Signal alignment
The test method is the same as for wideband (see sub-clause 8.11.2.1).
9.11.2.2 Signal level computation and frame classification
The test method is the same as for wideband (see sub-clause 8.11.2.2).
9.11.2.3 Classification into categories
The test method is the same as for wideband (see sub-clause 8.11.2.3).
9.12 Send speech quality and noise intrusiveness in the presence of ambient noise
9.12.1 Handset UE
The speech quality in sending for super-wideband systems is tested based on ETSI TS 103 281 [50]. This test method leads to three MOS-LQOfb quality numbers:
– N-MOS-LQOfb: Transmission quality of the background noise
– S-MOS-LQOfb: Transmission quality of the speech
– G-MOS-LQOfb: Overall transmission quality
The test arrangement is given in clause 5.1.5. For connections with handset UE, the measurement is conducted for 8 noise conditions as described in Table 2i. The measurements should be made in the same unique and dedicated call. The noise types shall be presented according to the order specified in Table 2i.
Table 2i: Noise conditions used for ambient noise simulation in handset mode as specified in ES 202 396-1 [35]
Description | File name | Duration | Level | Type
Recording in pub | Pub_Noise_binaural_V2 | 30 s | L: 75,0 dB(A) R: 73,0 dB(A) | Binaural
Recording at pavement | Outside_Traffic_Road_binaural | 30 s | L: 74,9 dB(A) R: 73,9 dB(A) | Binaural
Recording at pavement | Outside_Traffic_Crossroads_binaural | 20 s | L: 69,1 dB(A) R: 69,6 dB(A) | Binaural
Recording at departure platform | Train_Station_binaural | 30 s | L: 68,2 dB(A) R: 69,8 dB(A) | Binaural
Recording at the drivers position | Fullsize_Car1_130Kmh_binaural | 30 s | L: 69,1 dB(A) R: 68,1 dB(A) | Binaural
Recording at sales counter | Cafeteria_Noise_binaural | 30 s | L: 68,4 dB(A) R: 67,3 dB(A) | Binaural
Recording in a cafeteria | Mensa_binaural | 22 s | L: 63,4 dB(A) R: 61,9 dB(A) | Binaural
Recording in business office | Work_Noise_Office_Callcenter_binaural | 30 s | L: 56,6 dB(A) R: 57,8 dB(A) | Binaural
1) Before starting the measurements, the calibration procedure described in clause 9.5 of ETSI TS 103 281 [50] shall be performed with the UE in handset mode. Also, a proper conditioning sequence shall be used. The conditioning sequence shall be comprised of the four additional sentences 1-4 described in ETSI TS 103 281 [50], applied to the beginning of the 16-sentence test sequence.
NOTE: The sequence of speech samples concatenated for the test signal, consisting of alternating talkers in the sending direction, reduces the overall test time but may represent an unrealistic behaviour for certain voice enhancement technologies. Alternative concatenations are for further study.
2) The send speech signal consists of the 16 sentences of speech as described in ETSI TS 103 281 [50]. The test signal level is -1.7 dBPa at the MRP, measured as active speech level according to ITU-T P.56 [37]. Two signals are required for the tests:
– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 281 [50])
– The send signal is recorded at the POI.
3) N-MOS-LQOfb, S-MOS-LQOfb and G-MOS-LQOfb are calculated according to the Model A objective predictor described in ETSI TS 103 281 [50] on a per sentence basis and averaged over all 16 sentences. The final results are derived as follows:
– S-MOS-LQOfb = S-MOS-LQOfb_modelA
– N-MOS-LQOfb = 1.438*N-MOS-LQOfb_modelA – 1.959
– G-MOS-LQOfb = G-MOS-LQOfb_modelA
4) The measurement is repeated for each ambient noise condition described in Table 2i.
5) The average of the results derived from all ambient noise types is calculated.
NOTE: Recent investigations indicated an improved prediction performance when combining both models A and B. The usage of model B according to ETSI TS 103 281 [50] is for further study, pending a commercially available implementation.
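Steps 3) to 5) above can be sketched as follows. This is an illustrative sketch only: the function names and the tuple layout of the per-sentence scores are assumptions for the example, and the Model A predictor itself is treated as an external input. Because the N-MOS mapping is linear, applying it to the 16-sentence average gives the same result as applying it per sentence and then averaging.

```python
from statistics import mean

def final_scores(per_sentence):
    """Combine Model A outputs for one noise condition.

    per_sentence: list of (n_mos, s_mos, g_mos) tuples, one per test sentence
                  (16 tuples in the procedure described above).
    """
    n_a = mean(n for n, _, _ in per_sentence)
    s_a = mean(s for _, s, _ in per_sentence)
    g_a = mean(g for _, _, g in per_sentence)
    return {
        "S": s_a,                   # S-MOS-LQOfb = S-MOS-LQOfb_modelA
        "N": 1.438 * n_a - 1.959,   # linear mapping of the Model A N-MOS
        "G": g_a,                   # G-MOS-LQOfb = G-MOS-LQOfb_modelA
    }

def average_over_conditions(results):
    """Average the per-condition scores over all ambient noise types."""
    return {k: mean(r[k] for r in results) for k in ("S", "N", "G")}
```

Calling final_scores once per noise condition and passing the list of results to average_over_conditions yields the three averaged figures of step 5).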
9.12.2 Hand-held hands-free UE
For connections with hand-held hands-free UE, when using the simulation method described in TS 103 224 [43], the measurement is conducted for 5 noise conditions as described in Table 2i2. When using the ES 202 396-1 method, the equivalent binaurally recorded noises described in Table 2i2, and available in the source file directory of TS 103 224 [43], are used.
Table 2i2: Noise conditions used for ambient noise simulation in hand-held hands-free mode as specified in TS 103 224 [43], A-weighted
Name | Description | Length | Hands-free Levels | Binaural L | Binaural R
Full-size car 130 km/h (FullSizeCar_130) | HATS and microphone array at co-drivers position | 30 s | 1: 69,5 dB 2: 68,6 dB 3: 68,6 dB 4: 68,7 dB 5: 68,8 dB 6: 68,8 dB 7: 69,2 dB 8: 69,7 dB | 68,7 dB | 70,7 dB
Crossroadnoise (Crossroadnoise) | HATS and microphone array standing outside near a crossroad | 30 s | 1: 69,9 dB 2: 69,6 dB 3: 69,6 dB 4: 69,9 dB 5: 69,6 dB 6: 69,5 dB 7: 69,6 dB 8: 69,7 dB | 70,8 dB | 71,6 dB
Cafeteria (Cafeteria) | HATS and microphone array inside a cafeteria | 30 s | 1: 69,0 dB 2: 69,7 dB 3: 69,6 dB 4: 69,8 dB 5: 69,5 dB 6: 69,5 dB 7: 69,7 dB 8: 70,0 dB | 69,8 dB | 70,3 dB
Sales Counter (SalesCounter) | HATS and microphone array in a supermarket | 30 s | 1: 65,5 dB 2: 65,3 dB 3: 65,2 dB 4: 65,5 dB 5: 65,6 dB 6: 65,3 dB 7: 65,2 dB 8: 65,3 dB | 66,7 dB | 66,6 dB
Callcenter 2 (Callcenter) | HATS and microphone array in business office | 30 s | 1: 59,3 dB 2: 59,3 dB 3: 59,5 dB 4: 59,6 dB 5: 59,4 dB 6: 59,3 dB 7: 59,3 dB 8: 59,5 dB | 60,2 dB | 60,0 dB
1) Before starting the measurements, the calibration procedure described in clause 9.5 of ETSI TS 103 281 [50] shall be performed with the UE in hands-free mode. Also, a proper conditioning sequence shall be used. The conditioning sequence shall be comprised of the four additional sentences 1-4 described in ETSI TS 103 281 [50], applied to the beginning of the 16-sentence test sequence.
NOTE: The sequence of speech samples concatenated for the test signal, consisting of alternating talkers in the sending direction, reduces the overall test time but may represent an unrealistic behaviour for certain voice enhancement technologies. Alternative concatenations are for further study.
2) The send speech signal consists of the 16 sentences of speech as described in ETSI TS 103 281 [50]. For connections with a hands-free UE, the test signal level is +1.3 dBPa at the MRP, measured as active speech level according to ITU-T P.56 [37]. Two signals are required for the tests:
– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 281 [50])
– The send signal is recorded at the POI.
3) N-MOS-LQOfb, S-MOS-LQOfb and G-MOS-LQOfb are calculated according to the Model A objective predictor described in ETSI TS 103 281 [50] on a per sentence basis and averaged over all 16 sentences. The final results are derived as follows:
– S-MOS-LQOfb = S-MOS-LQOfb_modelA
– N-MOS-LQOfb = 1.438*N-MOS-LQOfb_modelA – 1.959
– G-MOS-LQOfb = G-MOS-LQOfb_modelA
4) The measurement is repeated for each ambient noise condition described in Table 2i2.
5) The average of the results derived from all ambient noise types is calculated.
NOTE: Recent investigations indicated an improved prediction performance when combining both models A and B. The usage of model B according to ETSI TS 103 281 [50] is for further study, pending a commercially available implementation.
9.12.3 Electrical interface UE
The speech quality in sending for super-wideband systems is tested based on ETSI TS 103 281 [50]. This test method leads to three MOS-LQOfb quality numbers:
– N-MOS-LQOfb: Transmission quality of the background noise
– S-MOS-LQOfb: Transmission quality of the speech
– G-MOS-LQOfb: Overall transmission quality
For the measurement of electrical interface UE, pre-recorded noisy speech signals according to Annex B of Recommendation ITU‑T P.381 [53] shall be used. These noisy test sequences are available for the eight noise types described in Table 2i and were captured at the electrical output of a representative analogue headset. The corresponding speech level at MRP was calibrated to -1.7 dBPa, as described in clause 9.12.1. All test signals also include the proper conditioning sequence described in ETSI TS 103 281 [50], which is applied to the beginning of the 16-sentence test sequence.
Annex B of Recommendation ITU‑T P.381 [53] also provides a recording without ambient noise and without Lombard correction (-4.7 dBPa at MRP). This silence condition is needed for the calibration procedure described in clause 9.5 of ETSI TS 103 281 [50].
1) The test arrangement is given in clause 5.1.6. For analogue interfaces, the noisy test sequences according to Annex B of Recommendation ITU‑T P.381 [53] shall be calibrated such that -26 dBov corresponds to -60 dBV. For digital interfaces, -26 dBov shall correspond to -16 dBm0.
2) Before starting the measurements, the calibration procedure described in clause 9.5 of ETSI TS 103 281 [50] shall be performed with the electrical interface UE. A recording in silence as per Annex B of Recommendation ITU‑T P.381 [53] shall be used for the measurement.
3) The first noisy test sequence is inserted into the electrical interface UE and then recorded at the POI. Two signals are required for the prediction model:
– The clean speech signal is used as the undisturbed reference (see ETSI TS 103 281 [50])
– The send signal is recorded at the POI.
4) N-MOS-LQOfb, S-MOS-LQOfb and G-MOS-LQOfb are calculated according to the Model A objective predictor described in ETSI TS 103 281 [50] on a per sentence basis and averaged over all 16 sentences. The final results are derived as follows:
– S-MOS-LQOfb = S-MOS-LQOfb_modelA
– N-MOS-LQOfb = 1.438*N-MOS-LQOfb_modelA – 1.959
– G-MOS-LQOfb = G-MOS-LQOfb_modelA
5) The measurement is repeated for each ambient noise condition described in Table 2i. For each of these noise types, a corresponding test signal is available in Annex B of Recommendation ITU‑T P.381 [53].
6) The average of the results derived from all ambient noise types is calculated.
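The level mapping required in step 1) amounts to a fixed offset in dB between the file level in dBov and the level at the interface. The sketch below shows this arithmetic only; the function and constant names are assumptions made for the example.

```python
def level_offset_db(level_dbov, target_level_db):
    """Offset in dB that maps a level in dBov onto the target level scale."""
    return target_level_db - level_dbov

# Analogue interface: -26 dBov shall correspond to -60 dBV
ANALOGUE_OFFSET_DB = level_offset_db(-26.0, -60.0)
# Digital interface: -26 dBov shall correspond to -16 dBm0
DIGITAL_OFFSET_DB = level_offset_db(-26.0, -16.0)

def to_target_level(level_dbov, offset_db):
    """Level at the interface for a file level given in dBov."""
    return level_dbov + offset_db
```

For example, under the analogue calibration a file segment at -30 dBov would appear at -64 dBV at the interface.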
9.13 Jitter buffer management behaviour (handset, headset and electrical interface UE)
9.13.0 General
For MTSI-based speech-only with LTE, NR or WLAN access, a jitter buffer is used in receiving to handle the variation in packet receiver timing. To minimize the additional latency introduced by the jitter buffer, adaptation is used to minimize delay while preventing packet losses due to packet delivery timing variations. See clause 8 of TS 26.114 [39] for the definition of jitter buffer and minimum performance requirements on JBM.
The test method is used to characterize different possible strategies and trade-offs in the design of JBM implementations used in MTSI terminals.
9.13.1 Delay histogram
For this test it shall be ensured that the call is originated from the UE.
NOTE 1: Differences have been observed between UE-originated calls and UE-terminated calls. For better consistency, UE-originated calls are used.
The test signal consists of 3 repeats of the Composite Source Signal (CSS) according to ITU-T Recommendation P.501 [22] followed by a speech signal of 160 s. During the first two CSS signals the terminal can adapt its jitter buffer. The third CSS is used for measuring the delay in constant-delay condition, and the speech signal is used for delay and quality measurement in the packet impairment condition.
A constant delay Tc corresponding to the minimum delay of the profile (i.e. the compensation value for the profile) shall be added at the beginning of the different delay/loss profiles, to avoid unnecessary delay jumps between the two measurement phases and to ensure realistic conditions for the second measurement phase. In receiving direction, the delay between the electrical access point of the test equipment and the reference point (RP), TTEAP-RP(t) = TR-jitter(t) + TTER, is measured in two successive phases:
1) First the delay in constant-delay condition TTEAP-DRP-constant is measured as described in steps 1 to 4, clause 9.10.2/9.10.2a/9.10.2b, using the third CSS signal. The constant delay Tc is subtracted from TTEAP-RP to obtain TR-constant.
2) Then the delay with packet impairment TR-jitter(t) is measured continuously for a speech signal during the inclusion of packet delay and loss profiles in the receiving direction RTP voice stream.
The reference point is defined as follows:
– for handset and headset UE, the reference point is the DRP.
– for electrical interface UE, the reference point is the input of the electrical reference interface.
Packet impairments shall be applied between the reference client and system simulator eNodeB. Separate calls shall be established for each packet impairment condition.
The start of the delay profiles must be synchronized with the start of the downlink speech material reproduction (compensated by the delay between reproduction and the point of impairment insertion, i.e. the delay of the reference client) in order to ensure a repeatable application of impairments to the test speech signal. Tests shall be performed with DTX enabled in the reference client.
NOTE 2: RTP packet impairments representing packet delay variations and loss are specified in Annex F. Care must be taken that the system simulator uses a dedicated bearer with no buffering/scheduling of packets for transmission.
For the CSS signal repeated 3 times, the pseudo random noise (pn)-part of the CSS has to be longer than the maximum expected delay. It is recommended to use a pn sequence of 32 k samples (with 48 kHz sampling rate). The test signal level is -16 dBm0 measured at the digital reference point or the equivalent analogue point.
For the speech signal, 8 English test sentences according to ITU-T P.501 Annex C.2.3, normalized to an active speech level of -16 dBm0, are used (2 male, 2 female speakers). The sequences are concatenated in such a way that all sentences are centred within a 4.0 s time window, which results in an overall duration of 32.0 s. The sequences are repeated 5 times, resulting in a test file 160.0 s long. The first 2 sentences are used for convergence of the UE jitter buffer manager and are discarded from the analysis. Equivalent implementations of the concatenation by repeating the test sentences in sequence may be used.
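The timing of the speech test file described above can be sketched as a list of sentence windows. The helper below is illustrative only; its name and the tuple layout are assumptions made for the example.

```python
def sentence_windows(n_sentences=8, repeats=5, window_s=4.0):
    """Return (start_s, centre_s, end_s) for every sentence slot in the test file."""
    total = n_sentences * repeats
    return [
        (i * window_s, i * window_s + window_s / 2, (i + 1) * window_s)
        for i in range(total)
    ]

windows = sentence_windows()   # 40 slots covering 0 s to 160 s
analysed = windows[2:]         # the first 2 sentences are discarded (convergence)
```

The centre times of these windows are the positions at which the per-sentence delay analysis is anchored.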
For the delay calculation with the speech signal, a cross-correlation with a rectangular window length of 4 s, centred at each sentence of the stimulus file, is used. The process is repeated for each sample. For each cross correlation, the maximum of the envelope is obtained producing one delay value per sentence.
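A simplified sketch of the per-sentence delay estimation is given below. It is not the measurement algorithm itself: as a stated simplification it picks the maximum of the absolute cross-correlation instead of the maximum of its envelope, it searches non-negative lags only, and the function name and arguments are assumptions made for the example. Signals are plain lists of samples at a common sampling rate.

```python
def sentence_delay(stimulus, recording, centre, win_len, max_lag):
    """Estimate the delay (in samples) of one sentence.

    stimulus:  reference samples of the stimulus file.
    recording: recorded samples, time-aligned to the start of the stimulus.
    centre:    sample index of the sentence window centre in the stimulus.
    win_len:   rectangular window length in samples (4 s in the text above).
    max_lag:   largest delay searched, in samples.
    """
    start = centre - win_len // 2
    ref = stimulus[start:start + win_len]
    best_lag, best_val = 0, -1.0
    for lag in range(max_lag + 1):
        seg = recording[start + lag:start + lag + win_len]
        # Cross-correlation value at this lag (rectangular window)
        val = abs(sum(r * s for r, s in zip(ref, seg)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag
```

The delay in milliseconds follows as best_lag * 1000 / fs for a sampling rate fs in Hz. A practical implementation would typically use an FFT-based correlation for speed.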
The UE delay in the receive direction, TR-jitter(t), is obtained by subtracting the delay introduced by the test equipment and the simulated transport network packet delay introduced by the delay and loss profile (as specified for the respective profile in Annex F) from the first electrical event at the electrical access point of the test equipment to the first bit of the corresponding speech frame at the system simulator antenna, TTER, from the measured TTEAP-DRP(t).
The difference DT between maximum receiving delay obtained with at least 5 individual calls (see clause 9.10.2) and the delay TR-constant measured for the CSS signal in constant delay condition is calculated. The quantity "Call-to-Call Variability Adjustment" (CCVA) = max(0,DT) shall be added to the obtained delay for the speech signal TR-jitter(t).
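The CCVA computation described above can be sketched as follows. The function name, the argument layout and the use of milliseconds are assumptions made for the example; the per-call delays are taken as already measured inputs.

```python
def ccva_ms(receive_delays_per_call_ms, tr_constant_ms, min_calls=5):
    """Call-to-Call Variability Adjustment, CCVA = max(0, DT).

    receive_delays_per_call_ms: receiving delay measured in each individual call.
    tr_constant_ms:             TR-constant measured with the CSS signal in
                                constant delay condition.
    """
    if len(receive_delays_per_call_ms) < min_calls:
        raise ValueError("at least %d individual calls are required" % min_calls)
    # DT: maximum receiving delay over the calls minus TR-constant
    dt = max(receive_delays_per_call_ms) - tr_constant_ms
    return max(0.0, dt)
```

The returned value is added to every TR-jitter(t) value to obtain TR-CCVA(t).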
The UE delay in the receiving direction shall be reported in the form of a histogram covering the range of measured CCVA-adjusted values (TR-CCVA(t) = TR-jitter(t) + CCVA) with a step of 20 ms. The following pseudo code provides an example implementation for the histogram:
% TR_CCVA: vector of the 40 per-sentence TR-CCVA(t) values in ms
lo = min(floor(TR_CCVA/20)*20);
hi = max(ceil(TR_CCVA/20)*20);
[n, x] = hist(TR_CCVA, lo:20:hi);
bar(x, n);
The TR-CCVA values for all 40 sentences shall also be reported in the test report.
NOTE 3: The synchronization of the speech frame processing in the UE to the bits of the speech frames at the UE antenna may lead to a variability of up to 20 ms of the measured UE receive delay between different calls. This synchronization is attributed to the UE receiving delay according to the definition of the UE delay reference points. The effect of this possible call-to-call variation is taken into account with the CCVA = max(0,DT) value.
9.13.2 Speech quality loss histogram
For the evaluation of speech quality loss in conditions with packet arrival time variations and packet loss, the speech test signal described in clause 9.13.1 shall be used. Two 48 kHz recordings are used to produce the speech quality loss metric:
– A recording obtained in jitter and error free conditions with the test signal described in clause 9.13.1 (reference condition)
– A recording obtained during the application of packet arrival time variations and packet loss as described in clause 9.13.1 (test condition)
The speech quality of the signal is estimated using the measurement algorithm described in ITU-T Recommendation P.863 [44] in super-wideband mode. Level pre-alignment to -26 dBov of recordings shall be used – see P.863.1 clause 10.2 [45].
NOTE: For the analysis of acoustical measurements, ITU-T P.863 [44] assumes diffuse-field equalized recordings. For this reason, signals at DRP are diffuse-field corrected for testing handset and headset UE. For electrical interface UE, only the level pre-alignment is applied.
A score shall be computed for each 8 s speech sentence pair. The MOS-LQO values for the reference and test conditions shall be reported in the form of a histogram covering the range of measured values with a step of 0.1, and the values for all 20 sentence pairs shall also be reported in the test report. The following pseudo code provides an example implementation for the histogram:
% MOS_LQO_test: vector of the 20 per-pair MOS-LQO values for the test condition
lo = min(floor(MOS_LQO_test/0.1)*0.1);
hi = max(ceil(MOS_LQO_test/0.1)*0.1);
[n, x] = hist(MOS_LQO_test, lo:0.1:hi);
bar(x, n);
The synchronization between stimuli and degraded condition shall be done by the test system before applying the P.863 algorithm on each sentence pair.