7.10 Delay
3GPP TS 26.132, Release 18: Speech and video telephony terminal acoustic test specification
7.10.0 UE Delay Measurement Methodologies
For UMTS circuit-switched operation and MTSI-based speech with LTE, NR or WLAN access in error and jitter free conditions, the sum of the UE delays in the sending and receiving directions (TS+TR) shall be measured according to the methods described in clauses 7.10.1 and 7.10.2. In the event that the delays of the test equipment in send and/or receive directions are not stable between calls or cannot be accurately determined, the alternative method described in clause 7.10.3 may be used to obtain (TS+TR) and the measured instability or inaccuracy observed when the methods described in 7.10.1 and 7.10.2 were performed shall be recorded in the test report. The test method(s) used and all results obtained shall also be recorded in the test report.
For MTSI-based speech with LTE, NR or WLAN access in conditions with simulated packet arrival time variations, the sum of the UE delays in the sending and receiving directions (TS+TR-jitter) and the objective speech quality in the receive direction shall be measured according to the method described in clause 7.10.4.
For MTSI-based speech with LTE, NR or WLAN access, prior to the actual measurements, the clock skew between UE and reference client shall be compensated by adjusting the clock of the reference client to match the clock of the UE (as stated in clause 5). The inaccuracy of the clock skew adjustment shall be less than 1ppm measured according to the procedure in Annex D.
It shall be ensured that the packet generation by the reference client and the packet treatment of the test equipment are free of jitter.
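As an informative illustration of why the residual clock skew matters, the sketch below converts a skew in ppm into the timing drift it would accumulate over a measurement. The function name `drift_ms` is hypothetical, and the 160 s duration is chosen only because it matches the speech signal length used in clause 7.10.4.2; the normative skew measurement procedure is the one in Annex D.

```python
def drift_ms(skew_ppm, duration_s):
    """Timing drift in ms accumulated over duration_s at a constant clock skew."""
    # 1 ppm means one microsecond of drift per second of elapsed time.
    return skew_ppm * 1e-6 * duration_s * 1000.0


# Hypothetical example: residual skew at the 1 ppm limit over a 160 s signal.
residual_drift_ms = drift_ms(1.0, 160.0)
print(f"{residual_drift_ms:.2f} ms")  # prints "0.16 ms"
```

Under these assumptions the drift stays well below one millisecond, which is small compared with the delay values being measured.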
7.10.1 Delay in sending direction (Handset UE)
The handset terminal is set up as described in clause 5.1.1.
The UE delay in the sending direction is obtained by measuring the delay between MRP and the electrical access point of the test equipment and subtracting the delays introduced by the test equipment from the measured value.
Figure 17b1: Different entities when measuring the delay in sending direction
The delay measured from MRP to the electrical access point of the test equipment is TS + TTES.
TTES: The delay between the last bit of a speech frame at the system simulator antenna and the first electrical event at the electrical access point of the test equipment.
1. For the measurements, a Composite Source Signal (CSS) according to ITU-T Recommendation P.501 [22] is used. The pseudo random noise (pn)-part of the CSS has to be longer than the maximum expected delay. It is recommended to use a pn sequence of 32 k samples (with 48 kHz sampling rate). The test signal level is -4.7 dBPa at the MRP.
2. The reference signal is the original signal (test signal). The setup of the handset/headset terminal is made corresponding to clause 5.1.
3. The delay is determined by cross-correlation analysis between the measured signal at the electrical access point and the original signal. The measurement is corrected by subtracting the test equipment delay TTES.
4. The delay is measured in ms and the maximum of the cross-correlation envelope is used for the determination.
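Steps 1 to 4 above can be illustrated with the following minimal sketch, which is informative only. It is simplified in two labelled ways: it uses the magnitude of the raw cross-correlation rather than its envelope, and it uses a short noise sequence at a hypothetical 8 kHz rate instead of the recommended 32 k samples at 48 kHz, to keep the example small. The names `estimate_delay_ms` and `t_tes_ms`, and the 5 ms test equipment delay, are assumptions made for the example.

```python
import random


def estimate_delay_ms(reference, measured, fs_hz, max_lag):
    """Return the lag in ms at which the magnitude of the cross-correlation peaks."""
    best_lag, best_val = 0, -1.0
    for lag in range(max_lag + 1):
        # Correlate the reference with the measured signal shifted by `lag` samples.
        overlap = min(len(reference), len(measured) - lag)
        acc = sum(reference[i] * measured[i + lag] for i in range(max(overlap, 0)))
        if abs(acc) > best_val:
            best_lag, best_val = lag, abs(acc)
    return 1000.0 * best_lag / fs_hz


rng = random.Random(0)
fs_hz = 8000                      # hypothetical rate, chosen to keep the example fast
reference = [rng.uniform(-1.0, 1.0) for _ in range(512)]
true_delay_samples = 200          # simulated delay of 25 ms
measured = [0.0] * true_delay_samples + reference

measured_ms = estimate_delay_ms(reference, measured, fs_hz, max_lag=400)
t_tes_ms = 5.0                    # hypothetical test equipment delay TTES
ts_ms = measured_ms - t_tes_ms    # UE sending delay TS after correction
```

The noise sequence must be longer than the largest lag of interest, which mirrors the requirement that the pn-part of the CSS be longer than the maximum expected delay.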
For MTSI-based speech with LTE, NR or WLAN access, a variability of up to 20ms may be expected between different calls due to the synchronization between the speech frame processing in the sending UE and the bits of the speech frames at the UE antenna. This synchronization is attributed to the UE sending delay according to the definition of the UE delay reference points. Hence, the maximum value of the UE sending delay obtained from at least 5 individual calls shall be reported as the UE delay in the sending direction. All values shall be reported in the test report.
A further variability of up to 20ms may be expected between different calls due to the synchronization between the speech frames at the UE antenna and the speech frame processing in the receiving reference client of the test system. In an end-to-end call this synchronization of the frames will only take place at the receiver, and this variability of the measurement shall be deducted from the UE sending delay. Hence, if the reference client of the test equipment does not adjust for the effect of the speech frame synchronization (as specified by the manufacturer of the reference client), this maximum uncertainty shall be subtracted from the measured maximum value reported as the UE sending delay in order to compensate for the uncertainty of the test equipment. This correction value (i.e. maximum uncertainty) shall be reported in the test report.
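The reporting rule of the two paragraphs above can be sketched as follows (informative only). The function name, its parameters and the example delay values are hypothetical; in particular, the 20 ms uncertainty is used here only as an example of a value that the reference client manufacturer might specify.

```python
def reported_send_delay_ms(call_delays_ms, client_compensates_sync, sync_uncertainty_ms):
    """Return (reported UE sending delay, correction applied), both in ms."""
    if len(call_delays_ms) < 5:
        raise ValueError("at least 5 individual calls are required")
    worst_case = max(call_delays_ms)
    # Subtract the frame synchronization uncertainty of the reference client
    # only if the reference client does not already compensate for it.
    correction = 0.0 if client_compensates_sync else sync_uncertainty_ms
    return worst_case - correction, correction


# Hypothetical per-call TS values in ms, already corrected for TTES.
calls = [78.0, 91.5, 84.0, 95.0, 88.5]
reported, correction = reported_send_delay_ms(
    calls, client_compensates_sync=False, sync_uncertainty_ms=20.0
)
```

With these example values the reported delay is 75.0 ms with a 20.0 ms correction; all five per-call values and the correction would still be listed in the test report.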
7.10.1a Delay in sending direction (headset UE)
The UE delay in the sending direction is obtained by measuring the delay between MRP and the electrical access point of the test equipment and subtracting the delays introduced by the test equipment from the measured value.
Figure 17b2: Different entities when measuring the delay in sending direction with a headset connected via cable
NOTE: The test setup only applies to headsets connected by wire. Wireless headsets (e.g. connected by Bluetooth) are currently out of scope.
The test method is the same as for handset UE (clause 7.10.1).
7.10.1b Delay in sending direction (electrical interface UE)
The UE delay TS in the sending direction is obtained by measuring the delay between output of the electrical reference interface and the electrical access point of the test equipment; delays introduced by the test equipment are subtracted from the measured value.
Figure 17b2a: Different entities when measuring the delay in sending direction through electrical interface UE
The overall delay measured from output of the electrical reference interface to the electrical access point of the test equipment is TS + TTES, as illustrated in Figure 17b2a.
The test method is the same as for handset UE (clause 7.10.1), except that the source levels are as follows:
– for analogue connections, -60 dBV at electrical reference interface output.
– for digital connection, -16 dBm0 at electrical reference interface output.
7.10.2 Delay in receiving direction (handset UE)
The handset terminal is set up as described in clause 5.
The UE delay in the receiving direction is obtained by measuring the delay between the electrical access point of the test equipment and the DRP and subtracting the delays introduced by the test equipment from the measured value.
Figure 17b3: Different entities when measuring the delay in receiving direction
The delay measured from the electrical access point of the test equipment to DRP is TR + TTER.
TTER: The delay between the first electrical event at the electrical access point of the test equipment and the first bit of the corresponding speech frame at the system simulator antenna.
Before the actual test for MTSI-based speech with LTE, NR or WLAN access a conditioning sequence consisting of the British-English single talk sequence described in ITU-T Recommendation P.501 [22] is applied for convergence of the jitter buffer management of the UE. The conditioning sequence level shall be -16 dBm0 in order to not overload the codec.
1. For the measurements a Composite Source Signal (CSS) according to ITU-T Recommendation P.501 [22] is used. The pseudo random noise (pn)-part of the CSS has to be longer than the maximum expected delay. It is recommended to use a pn sequence of 32 k samples (with 48 kHz sampling rate). The test signal level is -16 dBm0 measured at the digital reference point or the equivalent analogue point.
2. The reference signal is the original signal (test signal). The setup of the handset/headset terminal is in correspondence to clause 5.1.
3. The delay is determined by cross-correlation analysis between the measured signal at the DRP and the original signal. The measurement is corrected by subtracting the test equipment delay TTER.
4. The delay is measured in ms and the maximum of the cross-correlation envelope is used for the determination.
For MTSI-based speech with LTE, NR or WLAN access, a variability of up to 20ms may be expected between different calls due to the synchronization between the bits of the speech frames at the UE antenna and the speech frame processing in the receiving UE. This synchronization is attributed to the UE receiving delay according to the definition of the UE delay reference points. Hence, the maximum value of the UE receiving delay obtained from at least 5 individual calls shall be reported as the UE delay in the receiving direction. All values shall be reported in the test report.
7.10.2a Delay in receiving direction (headset UE)
The UE delay in the receiving direction is obtained by measuring the delay between the electrical access point of the test equipment and the DRP and subtracting the delays introduced by the test equipment from the measured value.
Figure 17b4: Different entities when measuring the delay in receiving direction with a headset connected via cable
NOTE: The test setup only applies to headsets connected by wire. Wireless headsets (e.g. connected by Bluetooth) are currently out of scope.
The test method is the same as for handset UE (clause 7.10.2).
7.10.2b Delay in receiving direction (electrical interface UE)
The UE delay TR in the receiving direction is obtained by measuring the delay between the electrical access point of the test equipment and the input of the electrical reference interface; delays introduced by the test equipment are subtracted from the measured value.
Figure 17b4a: Different entities when measuring the delay in receiving direction through electrical interface UE
The overall delay measured from the electrical access point of the test equipment to the input of the electrical reference interface is TR + TTER, as illustrated in Figure 17b4a.
The test method is the same as for handset UE (clause 7.10.2).
7.10.3 Delay in sending + receiving direction using "echo" method (handset UE)
The UE delay is obtained by measuring the delay between the MRP and the DRP and subtracting the delays introduced by the test equipment from the measured value.
Figure 17b4bis: Different entities when measuring the delay in sending + receiving direction
The delay measured from MRP to DRP is (TS + TR + TSS).
TSS: The delay between the last bit of a speech frame at the system simulator antenna and the first bit of the looped back speech frame at the system simulator antenna.
Before the actual test for MTSI-based speech with LTE, NR or WLAN access a conditioning sequence consisting of the British-English single talk sequence described in ITU-T Recommendation P.501 [22] is applied for convergence of the jitter buffer management of the UE. The conditioning sequence level shall be -16 dBm0 in order to not overload the codec.
1. For the measurements a Composite Source Signal (CSS) according to ITU-T Recommendation P.501 [22] is used. It is recommended to use a pn sequence of 32 k samples (with 48 kHz sampling rate). The test signal level is -4.7 dBPa at the MRP.
2. The system simulator is configured for "loopback" or "echo" operation with the additional loopback delay as specified below when applicable. In "loopback" or "echo" operation, the packets in the sending direction are routed to the receiving direction by the system simulator.
3. The reference signal is the original signal (test signal). The setup of the mobile station is in correspondence to clause 5.1.
4. The mouth-to-ear delay is determined by cross-correlation analysis between the measured signal at DRP and the original signal. The analysis window for the cross-correlation shall start at an instant T > 50ms in order to discard the cross-correlation peaks corresponding to the direct acoustic path from mouth to ear and possible delayed sidetone signal. The measurement is corrected by subtracting the system simulator delay TSS to obtain the TS + TR delay.
5. The delay is measured in ms and the maximum of the cross-correlation envelope is used for the determination.
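The effect of the analysis window in step 4 can be illustrated with the minimal sketch below (informative only). It simulates a recording at the DRP that contains a strong direct acoustic path at 1 ms and a weaker looped-back signal at 180 ms, then searches for the cross-correlation peak with and without the 50 ms lower bound. The function name `peak_lag_ms`, the 8 kHz rate, the simulated delays and the 10 ms value for TSS are all assumptions made for the example, and the raw cross-correlation magnitude is used in place of the envelope.

```python
import random


def peak_lag_ms(reference, measured, fs_hz, min_lag_ms, max_lag_ms):
    """Lag in ms of the largest |cross-correlation| at lags strictly above min_lag_ms."""
    first = int(min_lag_ms * fs_hz / 1000) + 1
    last = int(max_lag_ms * fs_hz / 1000)
    best_lag, best_val = first, -1.0
    for lag in range(first, last + 1):
        overlap = min(len(reference), len(measured) - lag)
        acc = sum(reference[i] * measured[i + lag] for i in range(max(overlap, 0)))
        if abs(acc) > best_val:
            best_lag, best_val = lag, abs(acc)
    return 1000.0 * best_lag / fs_hz


rng = random.Random(1)
fs_hz = 8000                                  # hypothetical rate for a small example
reference = [rng.uniform(-1.0, 1.0) for _ in range(256)]
measured = [0.0] * (1440 + 256)
for i, x in enumerate(reference):
    measured[i + 8] += x                      # direct mouth-to-ear path, 1 ms
    measured[i + 1440] += 0.5 * x             # looped-back speech, 180 ms

unwindowed_ms = peak_lag_ms(reference, measured, fs_hz, 0.0, 250.0)
windowed_ms = peak_lag_ms(reference, measured, fs_hz, 50.0, 250.0)
t_ss_ms = 10.0                                # hypothetical system simulator delay TSS
ts_plus_tr_ms = windowed_ms - t_ss_ms
```

Without the lower bound the search locks onto the direct acoustic path; with it, the peak corresponds to the looped-back signal, from which TSS is then subtracted.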
For MTSI-based speech with LTE, NR or WLAN access, a variability of the UE delay of up to 20 ms in each of the sending and receiving directions may be expected due to the synchronization of the speech frame processing in the UE to the bits of the speech frame at the UE antenna. This synchronization is attributed to the UE delay according to the definition of the UE delay reference points. Hence, the UE delay shall be reported as the maximum value from at least 5 separate calls, each with a different loopback delay TSS, in at least 5 steps of 4 ms covering the full range from 0 to 16 ms. All values shall be reported in the test report.
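The loopback delay schedule and the reporting rule above can be sketched as follows (informative only). The function names and the measured values are hypothetical, and the sketch assumes that the TSS value configured for each call is the complete delay to be subtracted from the measured MRP-to-DRP delay.

```python
def loopback_delays_ms(step_ms=4, max_ms=16):
    """Loopback delays TSS covering 0 to max_ms in steps of step_ms."""
    return list(range(0, max_ms + 1, step_ms))


def reported_echo_delay_ms(measured_by_tss):
    """Return (maximum TS + TR, per-call TS + TR values) from measured delays in ms."""
    if len(measured_by_tss) < 5:
        raise ValueError("at least 5 separate calls are required")
    # Subtract the loopback delay used in each call from that call's measurement.
    per_call = {tss: measured - tss for tss, measured in measured_by_tss.items()}
    return max(per_call.values()), per_call


schedule = loopback_delays_ms()
# Hypothetical measured MRP-to-DRP delays in ms, keyed by the TSS used in the call.
measurements = {0: 172.0, 4: 181.0, 8: 190.0, 12: 179.0, 16: 188.0}
reported, per_call = reported_echo_delay_ms(measurements)
```

Stepping TSS across the range shifts the arrival time of the looped-back frames relative to the frame processing in the UE, so that the worst-case synchronization is more likely to be observed in at least one call.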
7.10.3a Delay in sending + receiving direction using "echo" method (headset UE)
The UE delay is obtained by measuring the delay between the MRP and the DRP and subtracting the delays introduced by the test equipment, TSS, from the measured value.
The test method is the same as for handset UE (clause 7.10.3).
7.10.3b Delay in sending + receiving direction using "echo" method (electrical interface UE)
The UE delay is obtained by measuring the delay between the input and output of the electrical reference interface; delays introduced by the test equipment and system simulator, TSS, are subtracted from the measured value.
The test method is the same as for handset UE (clause 7.10.3), except that the source levels are as follows:
– for analogue connections, -60 dBV at electrical reference interface output.
– for digital connection, -16 dBm0 at electrical reference interface output.
7.10.4 Delay and speech quality in conditions with packet arrival time variations and packet loss (handset, headset, electrical interface UE)
7.10.4.1 Delay in sending direction
The UE delay in the sending direction, TS, shall be measured in jitter and error free conditions according to clause 7.10.0.
7.10.4.2 Delay in receiving direction
For this test it shall be ensured that the call is originated from the UE.
NOTE 1: Differences have been observed between UE-originated calls and UE-terminated calls. For better consistency, calls from the UE are used.
The test signal consists of 3 repeats of the Composite Source Signal (CSS) according to ITU-T Recommendation P.501 [22] followed by a speech signal of 160s. During the first two CSS signals the terminal can adapt its jitter buffer. The third CSS is used for measuring the delay in constant-delay condition, and the speech signal is used for delay and quality measurement in the packet impairment condition.
A constant delay Tc corresponding to the minimum delay of the profile (i.e. the compensation value for the profile) shall be added at the beginning of the different delay/loss profiles, to avoid unnecessary delay jumps between the two measurement phases and to ensure realistic conditions for the second measurement phase.
In receiving direction, the delay between the electrical access point of the test equipment and reference point (RP), TTEAP-RP(t) = TR-jitter(t) + TTER, is measured in two successive phases:
1) First, the delay in constant-delay condition TTEAP-RP-constant is measured as described in steps 1 to 4 of clause 7.10.2/7.10.2a/7.10.2b, using the third CSS signal. The constant delay Tc is subtracted from TTEAP-RP-constant to obtain TR-constant.
2) Then the delay with packet impairment TR-jitter(t) is measured continuously for a speech signal during the inclusion of packet delay and loss profiles in the receiving direction RTP voice stream.
The reference point is defined as follows:
– for handset and headset UE, the reference point is the DRP.
– for electrical interface UE, the reference point is the input of the electrical reference interface.
Packet impairments shall be applied between the reference client and system simulator eNodeB. Separate calls shall be established for each packet impairment condition.
The start of the delay profiles must be synchronized with the start of the downlink speech material reproduction (compensated by the delay between reproduction and the point of impairment insertion, i.e. the delay of the reference client) in order to ensure a repeatable application of impairments to the test speech signal. Tests shall be performed with DTX enabled in the reference client.
NOTE 2: RTP packet impairments representing packet delay variations and loss in LTE transmission scenarios are specified in Annex E. These LTE jitter/loss profiles are reused also for tests with WLAN and NR access. Care must be taken that the system simulator uses a dedicated bearer with no buffering/scheduling of packets for transmission.
For the CSS signal repeated 3 times, the pseudo random noise (pn)-part of the CSS has to be longer than the maximum expected delay. It is recommended to use a pn sequence of 32 k samples (with 48 kHz sampling rate). The test signal level is -16 dBm0 measured at the digital reference point or the equivalent analogue point.
For the speech signal, 8 English test sentences according to ITU-T P.501 Annex C.2.3, normalized to an active speech level of -16dBm0, are used (2 male, 2 female speakers). The sequences are concatenated in such a way that all sentences are centered within a 4.0s time window, which results in an overall duration of 32.0s. The sequences are repeated 5 times, resulting in a test file 160.0s long. The first 2 sentences are used for convergence of the UE jitter buffer manager and are discarded from the analysis. Equivalent implementations of the concatenation by repeating the test sentences in sequence may be used.
For the delay calculation with the speech signal, a cross-correlation with a rectangular window length of 4s, centered at each sentence of the stimulus file, is used. The process is repeated for each sample. For each cross correlation, the maximum of the envelope is obtained producing one delay value per sentence.
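The timing of the speech stimulus and of the per-sentence analysis windows can be sketched as follows (informative only). The function name `sentence_windows` is hypothetical, and the sketch assumes that each 4 s cross-correlation window coincides with the 4 s slot in which the sentence is centered.

```python
def sentence_windows(n_sentences=8, repeats=5, slot_s=4.0):
    """Return (start, end) times in seconds of the analysis window of each sentence."""
    windows = []
    for k in range(n_sentences * repeats):
        start = k * slot_s
        windows.append((start, start + slot_s))
    return windows


windows = sentence_windows()
total_duration_s = windows[-1][1]
# The first 2 sentences serve jitter buffer convergence and are not analysed.
analysed = windows[2:]
```

This gives 40 windows over 160.0 s, of which 38 remain after the first two sentences are discarded; one delay value is produced per remaining window.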
The UE delay in the receive direction, TR-jitter(t), is obtained by subtracting two quantities from the measured TTEAP-RP(t): the delay introduced by the test equipment, TTER (from the first electrical event at the electrical access point of the test equipment to the first bit of the corresponding speech frame at the system simulator antenna), and the simulated transport network packet delay introduced by the delay and loss profile (as specified for the respective profile in Annex E).
The difference DT between maximum receiving delay obtained with at least 5 individual calls (see clause 7.10.2) and the delay TR-constant measured for the CSS signal in constant delay condition is calculated. The quantity "Call-to-Call Variability Adjustment" (CCVA) = max(0,DT) shall be added to the obtained delay for the speech signal TR-jitter(t).
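The Call-to-Call Variability Adjustment can be sketched as follows (informative only). The function name and the example delay values are hypothetical.

```python
def ccva_ms(per_call_receive_delays_ms, tr_constant_ms):
    """CCVA = max(0, DT), where DT = max per-call receiving delay - TR-constant."""
    if len(per_call_receive_delays_ms) < 5:
        raise ValueError("at least 5 individual calls are required")
    dt = max(per_call_receive_delays_ms) - tr_constant_ms
    return max(0.0, dt)


# Hypothetical receiving delays in ms from 5 calls measured per clause 7.10.2.
calls = [62.0, 70.0, 66.0, 74.0, 68.0]
ccva_positive = ccva_ms(calls, tr_constant_ms=65.0)   # DT = 9.0
ccva_clamped = ccva_ms(calls, tr_constant_ms=80.0)    # DT = -6.0, clamped to 0
```

The clamp at zero means the adjustment can only increase the reported delay: a call that happened to show a favourable frame synchronization is corrected upwards, while an unfavourable one is left unchanged.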
For stationary packet delay variation test conditions (test conditions 1 and 2), the first 2 sentences are used for convergence of the jitter buffer management and are discarded from the analysis. The CCVA-adjusted UE delay (TR-CCVA(t) = TR-jitter(t) + CCVA) in the receiving direction shall be reported as the maximum value excluding the two largest values of the remaining sequence of the 38 sentence delay values, i.e. the 95-percentile value of TR-CCVA(t). The TR-CCVA values for all 40 sentences shall be reported in the test report.
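The selection of the reported value from the 40 per-sentence delays can be sketched as follows (informative only). The function name and the example delay values are hypothetical.

```python
def reported_receive_delay_ms(tr_jitter_ms, ccva):
    """Return (reported 95-percentile delay, all 40 CCVA-adjusted delays), in ms."""
    if len(tr_jitter_ms) != 40:
        raise ValueError("expected one delay value per sentence (40 sentences)")
    tr_ccva = [d + ccva for d in tr_jitter_ms]
    # Discard the first 2 sentences (jitter buffer convergence), leaving 38 values.
    remaining = sorted(tr_ccva[2:])
    # Exclude the two largest remaining values and report the maximum of the rest.
    return remaining[-3], tr_ccva


# Hypothetical per-sentence TR-jitter values in ms: 100.0, 101.0, ..., 139.0.
delays = [100.0 + k for k in range(40)]
reported, all_values = reported_receive_delay_ms(delays, ccva=9.0)
```

In this example the adjusted values run from 109.0 to 148.0 ms, the two largest (148.0 and 147.0) are excluded, and 146.0 ms is reported.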
NOTE 3: The synchronization of the speech frame processing in the UE to the bits of the speech frames at the UE antenna may lead to a variability of up to 20 ms of the measured UE receive delay between different calls. This synchronization is attributed to the UE receiving delay according to the definition of the UE delay reference points. The effect of this possible call-to-call variation is taken into account with the CCVA = max(0,DT) value.
7.10.4.3 Speech quality loss in conditions with packet arrival time variations and packet loss
For the evaluation of speech quality loss in conditions with packet arrival time variations and packet loss, the test signal described in clause 7.10.4.2 shall be used. The first 2 sentences are used for convergence of the UE jitter buffer manager and are discarded from the analysis. Two 48 kHz recordings are used to produce the speech quality loss metric:
– A recording obtained in jitter and error free conditions with the test signal described in clause 7.10.4.2 (reference condition)
– A recording obtained during the application of packet arrival time variations and packet loss as described in clause 7.10.4.2 (test condition)
The speech quality of the signal is estimated using the measurement algorithm described in ITU-T Recommendation P.863 [44] in super-wideband mode. For narrowband speech, the method according to Appendix III of P.863 [44] shall be used. Level pre-alignment to -26 dBov of recordings shall be used – see P.863.1 clause 10.2 [45].
NOTE: For the analysis of acoustical measurements, ITU-T P.863 [44] assumes diffuse-field equalized recordings. For this reason, signals at DRP are diffuse-field corrected for testing handset and headset UE. For electrical interface UE, only the level pre-alignment is applied.
A score shall be computed for each 8s speech sentence pair and averaged to produce a mean MOS-LQO value for the reference and test conditions.
The resulting mean values are denoted MOS-LQOREF (reference condition) and MOS-LQOTEST (test condition).
NOTE: This evaluation of the speech quality requirement is only applicable to test conditions with a stationary statistic of the packet delay variation. Evaluation of the speech quality for a test condition with non-stationary packet delay variations is for further study.
The synchronization between stimuli and degraded condition shall be done by the test system before applying the P.863 algorithm on each sentence pair.
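The averaging of the per-pair scores can be sketched as follows (informative only). The sketch does not implement ITU-T P.863; it assumes the per-pair MOS-LQO scores have already been obtained from a conformant implementation. The function name, the example scores, the count of 19 analysed pairs (20 pairs of 8 s in 160 s, less the first pair that contains the 2 discarded sentences) and the expression of the quality loss as the difference of the two means are assumptions made for the example.

```python
from statistics import fmean


def mean_mos_lqo(pair_scores):
    """Mean MOS-LQO over the per-sentence-pair scores of one condition."""
    if not pair_scores:
        raise ValueError("at least one sentence-pair score is required")
    return fmean(pair_scores)


# Hypothetical per-pair scores for the 19 analysed 8 s sentence pairs.
ref_scores = [4.5] * 19
test_scores = [4.0] * 10 + [3.5] * 9

mos_lqo_ref = mean_mos_lqo(ref_scores)
mos_lqo_test = mean_mos_lqo(test_scores)
# Assumption: the quality loss is expressed as the difference of the two means.
quality_loss = mos_lqo_ref - mos_lqo_test
```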
7.10.5 UE send clock accuracy
The UE clock accuracy in send direction shall be measured according to Annex D.
NOTE 1: For this specific measurement, care should be taken about the clock accuracy of the test equipment. See Table 1a.
NOTE 2: As required in clause 5, prior to the actual measurements for MTSI-based speech with LTE, NR or WLAN access, the clocks of the reference client and the UE have to be synchronized. This measurement of UE send clock accuracy does not need to be repeated and can be obtained from this setup procedure.
7.10.6 UE receiving with clock skew
For further study.