4.2.3 Motion to Sound Latency in Dynamic Binaural Rendering Systems
3GPP TS 26.260 Release 18: Objective test methodologies for the evaluation of immersive audio systems
4.2.3.1 Introduction
Motion-to-sound latency is the time difference between a change in head rotation and the moment the immersive audio signal is compensated for that head motion. The method in this specification is intended to verify that the overall motion-to-sound latency a user experiences when rotating their head is within acceptable limits.
The method allows full measurement of motion-to-sound latency, i.e. including the latency of both the head-tracking sensor and the audio playback. It covers all components of a real setup and therefore captures every possible cause of additional latency that a user may experience.
The method also provides a latency value for the isolated audio processing of the binaural renderer without the aforementioned external hardware, assuming that the binaural renderer can process audio data as an audio processing plugin that can be evaluated in isolation.
NOTE: This method requires synchronized playback of two renderer instances and may not be suitable for measurement of UEs where such synchronization is not possible.
4.2.3.2 Requirements
The following will be required:
Software:
– Audio processing software to run and record output of two renderers simultaneously
– Head tracker software
Hardware:
– Host machine for audio processing
– Head tracker hardware
– Stereo audio recording interface
– Stereo audio playback interface
– Mechanical setup to rotate the head tracking sensor in a precise and reproducible way
An exemplary hardware setup is shown in Figure 2; the method can, however, also be implemented using different systems under test and accompanying equipment:
Figure 2: Hardware Overview (Setup in Position 1 on the left, Position 2 on the right)
The audio processing environment uses two parallel signal chains, each containing its own instance of the same binaural renderer being tested. The test is concerned only with yaw angles, so values of pitch and roll should be set to zero at the beginning of the test and can be ignored thereafter.
Figure 3: Generic Audio Processing Environment
The initial conditions are that Rendering Chain 1 (RC1) has a static yaw head rotation angle of 0 degrees and RC2 uses the physical rotation of the head tracker to obtain its yaw value. A white noise signal is virtually placed directly in front of the listener (0 degrees azimuth, 0 degrees elevation), meaning that rotation of the arm directly affects how the white noise source is rendered.
4.2.3.3 Calibration
The first step is to calibrate the final position of the rotating arm (Position 2 / P2). The rotating arm is moved manually and requires only a limited range of motion: from a small rotation away from the table (Position 1 / P1), for which 20 to 30 degrees is ample, through to contact with the table (P2). The arm should be placed at P2 and set up so that this position also corresponds to 0 degrees yaw.
4.2.3.4 Evaluation Environment
An object within the evaluation environment (e.g. in Max/MSP) should be created that sets the value of yaw to exactly 0 degrees once the real yaw value received from the head tracker falls below 0.2 degrees.
NOTE: This tolerance value was chosen to be as small as possible while ensuring that the latched value does not bounce (dependent on the accuracy of the tracking system).
This object should be designed to latch to zero once the actual value is under the tolerance threshold, so that any small accidental rebound of the rotating arm does not affect the yaw angle fed to the renderer – it artificially remains at exactly 0, which is important to ensure that both rendering chains have exactly the same head rotation when the arm is in its final position (P2). The output from the evaluation environment is captured by the recording audio interface, which therefore includes any latency introduced by playback.
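The latch-to-zero behaviour described above can be sketched as follows. This is a minimal illustrative model of the evaluation-environment object, not the normative Max/MSP implementation; the class name and default tolerance are assumptions taken from the text.

```python
class YawLatch:
    """Latches the yaw angle to exactly 0 degrees once the tracked value
    falls inside the tolerance window, so that small rebounds of the
    rotating arm no longer affect the yaw fed to the renderer.
    Illustrative sketch of the object described in clause 4.2.3.4."""

    def __init__(self, tolerance_deg: float = 0.2):
        self.tolerance_deg = tolerance_deg
        self.latched = False

    def process(self, yaw_deg: float) -> float:
        # Once latched, the output stays at exactly 0 regardless of input,
        # ensuring both rendering chains see identical head rotation at P2.
        if self.latched:
            return 0.0
        if abs(yaw_deg) < self.tolerance_deg:
            self.latched = True
            return 0.0
        return yaw_deg
```

A rebound of the arm after contact (e.g. a momentary 0.5-degree swing back) is thus ignored: once the tracked yaw has crossed the tolerance threshold, the output remains exactly 0.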
4.2.3.5 Data acquisition
A test run begins by starting to record on the recording audio interface. The rotating arm is set to Position 1; the audio processing is then set running and starts feeding the input source to both renderer instances. A microphone is positioned near the contact point at the table. This mono room microphone is recorded synchronously with the output from the evaluation environment; its purpose is to log the point of contact of the arm with the table. The arm should be brought down with a good amount of speed and vigour so that the microphone picks up a loud knock at the table. Shortly afterwards (one or two seconds, for example), with the test run now complete, playback and recording can be stopped.
Some milliseconds after the collision, the latest yaw value detected by the tracking system will have been passed into the evaluation environment (tTracking). With the target yaw value now reached (latched to zero in Max), both rendering chains will have the identical values of head rotation and therefore, after some further short delay, the output of both renderer instances will be identical.
4.2.3.6 Data Analysis
The overall motion-to-sound latency (tM2S) is taken as the time from the moment of collision until the point at which the two output signals are identical.
To easily visually inspect when this point occurs, one output channel of one signal chain (e.g. RC1-left) is subtracted from the same output channel of the complementary signal chain (RC2-left).
NOTE 1: This could be done manually in audio editor software after processing, but this would require recording at least three channels synchronously (one from each renderer chain, and one of the room microphone). Instead, the subtraction of signals can be done within Max/MSP, meaning only the output of this operation (one channel) and the room microphone can conveniently be recorded with a stereo audio interface. In addition to the stereo WAVE file recorded by a separate audio application, the Max/MSP application also writes to a separate mono WAVE file once it detects that it is in the final tolerated yaw position (latched on). This mono WAVE contains only the subtracted signals as described above, from which the tMspProc time can also be measured.
Evaluation is performed offline in audio editor software. The tMspProc time is measured from the start of the file until the point at which consecutive zero samples begin. This value encompasses any motion-to-sound latency caused by the tested renderer chain as well as any other latency caused by Virtual Studio Technology (VST) plugin framework buffering. The tMspProc time shall be measured from the audio frame boundary at which the latched-on yaw value is activated and applied within that audio frame.
NOTE 2: Since the yaw rotation update rate of the tracker is typically in the range of a few milliseconds, there is a framing mismatch when compared to the audio framing, but this mismatch will not be incorporated in the tMspProc value but rather only in the tM2S measurement.
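The tMspProc measurement, i.e. locating the point in the mono difference file where consecutive zero samples begin, can be sketched as follows. The function name and the minimum zero-run length are illustrative assumptions; the run-length guard merely prevents isolated zero crossings from being mistaken for true cancellation.

```python
def measure_tmspproc(diff_samples, sample_rate, min_zero_run=512):
    """Return the tMspProc latency in seconds: the time from the start
    of the mono difference file (subtracted renderer outputs) until a
    sustained run of zero samples begins. Illustrative sketch only."""
    run_start = None
    run_len = 0
    for i, s in enumerate(diff_samples):
        if s == 0:
            if run_len == 0:
                run_start = i  # candidate start of the silent region
            run_len += 1
            if run_len >= min_zero_run:
                return run_start / sample_rate
        else:
            run_len = 0  # signal still differs; reset the run
    return None  # no sufficiently long zero run found
```

Since the difference file is produced entirely in the digital domain, exact zero samples can be expected once both renderer instances produce identical output.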
An example measurement of tMspProc is displayed in Figure 4. tM2S is measured by selecting the duration between the visible collision peak in the microphone channel and the point at which the other channel reduces to silence. Figure 5 shows an example measurement for the motion-to-sound latency.
Figure 4: tMspProc latency measurement
Figure 5: Motion-to-sound (tM2S) latency measurement
In Figure 5 the room recording is on the top and the subtracted renderer output is on the bottom. The marked region is the time elapsed between the arm hitting the table (recorded knock) and the subtracted binaural renderer output reaching silence.
NOTE 3: Unlike the tMspProc measurement, the tM2S measurement is taken from signals recorded from hardware audio interfaces, hence it is not possible to look for continuous silence since the resulting file will always contain some noise added by the digital-to-analog and analog-to-digital converters. For this reason, it is important to ensure a high signal-to-noise ratio in the signal provided to audio interfaces, to make it easier to inspect where the cancellation occurs.
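The tM2S measurement from the hardware-recorded stereo file can be sketched as below. Because the converters add noise, the sketch searches for the difference channel's RMS level dropping below a threshold rather than for exact zeros, as NOTE 3 explains. The function name, threshold, and window size are illustrative assumptions, not part of the specification.

```python
import numpy as np

def measure_tm2s(mic, diff, sample_rate, silence_db=-60.0, window=1024):
    """Estimate tM2S in seconds from a synchronously recorded capture:
    `mic` is the room-microphone channel, `diff` the subtracted renderer
    output. Illustrative sketch; threshold/window are assumptions."""
    mic = np.asarray(mic, dtype=float)
    diff = np.asarray(diff, dtype=float)
    # Collision instant: the largest peak in the room-microphone channel.
    knock = int(np.argmax(np.abs(mic)))
    thresh = 10.0 ** (silence_db / 20.0)
    # Scan windows after the knock until the difference signal stays
    # below the silence threshold, i.e. cancellation has occurred.
    for start in range(knock, len(diff) - window, window):
        rms = np.sqrt(np.mean(diff[start:start + window] ** 2))
        if rms < thresh:
            return (start - knock) / sample_rate
    return None  # cancellation point not found
```

The windowed search resolves the cancellation point only to within one window length, which is consistent with the visual inspection the clause describes; a high signal-to-noise ratio in the recorded signals makes the threshold crossing unambiguous.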
Annex A (normative):
Order dependent directions
The following tables list the order-dependent directions; for each index, the elevation and azimuth are given in radians.
Index | Elevation (rad) | Azimuth (rad)
---|---|---
1 | 1.570796 | 0
2 | -0.339837 | 0
3 | -0.339837 | 2.094395
4 | -0.339837 | -2.0944
Index | Elevation (rad) | Azimuth (rad)
---|---|---
1 | 1.570796 | 0
2 | -0.790277 | 0
3 | 0.363207 | -1.95668
4 | 0.363207 | 1.956682
5 | -0.844382 | -1.95668
6 | 0.009757 | -3.14159
7 | -0.844382 | 1.956681
8 | 0.245128 | 0.687124
9 | 0.245129 | -0.68712
Index | Elevation (rad) | Azimuth (rad)
---|---|---
1 | 1.570796 | 0
2 | 0.716698 | 0
3 | -0.461173 | 1.119907
4 | -1.034310 | -0.25283
5 | 0.492174 | 1.155586
6 | -0.165812 | 2.040481
7 | -0.461172 | -1.38118
8 | -0.165813 | 0.270692
9 | 0.001916 | -2.20417
10 | 0.653709 | 2.297267
11 | 0.653709 | -2.80293
12 | -0.192680 | 3.010956
13 | -1.079056 | 2.154919
14 | 0.001915 | -0.63529
15 | 0.616834 | -1.41973
16 | -0.887326 | -2.46809
Index | Elevation (rad) | Azimuth (rad)
---|---|---
1 | 1.570796 | 0
2 | 0.747578 | 0
3 | -0.168324 | -2.00759
4 | 0.846499 | 1.927637
5 | 0.234515 | -1.41208
6 | 0.699165 | -2.10001
7 | 0.307091 | 2.512927
8 | 0.130649 | 1.667633
9 | -0.677517 | 1.442383
10 | 0.136843 | -0.60062
11 | -1.317269 | 0.329968
12 | -0.433118 | -1.18621
13 | -0.231864 | 2.983332
14 | 0.174242 | -2.69222
15 | -0.599985 | 0.507602
16 | -0.382009 | 2.208977
17 | -0.009394 | 0.952319
18 | -1.013813 | -1.71565
19 | 0.696199 | 0.934402
20 | -0.602139 | -0.38654
21 | -1.041921 | 2.675958
22 | -0.623111 | -2.62842
23 | 0.054056 | 0.165012
24 | 0.855489 | -1.02504
25 | 0.808243 | -3.13121
Index | Elevation (rad) | Azimuth (rad)
---|---|---
1 | 1.570796 | 0
2 | -0.454100 | 0
3 | 0.323739 | -1.19666
4 | -1.175381 | 0.184066
5 | 0.947221 | 0.124282
6 | -0.193698 | -2.84022
7 | 0.500281 | -1.84701
8 | -0.663529 | 0.698758
9 | -0.613332 | 2.280239
10 | -0.588043 | -2.28482
11 | 0.946645 | -2.37569
12 | 0.333311 | 2.883411
13 | 0.967374 | -1.18504
14 | 0.436854 | -2.76846
15 | 0.510141 | 0.763488
16 | -0.063811 | -0.46491
17 | 0.048266 | -2.27504
18 | -0.148392 | 1.762138
19 | 0.945735 | 2.804486
20 | -0.125777 | -1.69175
21 | -0.241518 | -1.0321
22 | -0.063824 | 0.509415
23 | -1.240392 | -1.95737
24 | 0.542172 | -0.567
25 | 0.043647 | 2.319619
26 | -0.291045 | 2.853233
27 | -0.841101 | -3.07101
28 | -1.213891 | 2.113132
29 | -0.706626 | -1.50877
30 | -0.774625 | -0.65404
31 | -0.707445 | 1.464227
32 | 0.990842 | 1.373127
33 | -0.122664 | 1.112751
34 | 0.598614 | 2.113949
35 | 0.306690 | 0.057137
36 | 0.381934 | 1.457925
Index | Elevation (rad) | Azimuth (rad)
---|---|---
1 | 1.570796 | 0
2 | 0.720144 | 0
3 | -0.308365 | 3.024454
4 | 0.068431 | 2.080642
5 | -0.495677 | -2.21373
6 | -0.018779 | -2.03598
7 | 0.426043 | 1.678014
8 | -0.259742 | 0.964363
9 | 0.179320 | -3.03552
10 | -0.249618 | -2.70206
11 | 1.074183 | 0.581055
12 | -0.781172 | -2.80103
13 | 0.457849 | 0.550136
14 | 0.523951 | -1.98436
15 | -0.006246 | -0.51212
16 | -0.788507 | -1.1411
17 | 0.228181 | -2.48765
18 | -0.418110 | -1.62282
19 | -0.512688 | -0.57506
20 | 0.572140 | 2.286204
21 | -0.867576 | -0.08741
22 | -0.624799 | 0.547028
23 | -0.446687 | 1.878965
24 | -0.789667 | 2.746717
25 | 1.047763 | -0.76025
26 | 0.247192 | -1.01978
27 | 0.720143 | 1.162107
28 | -0.081819 | 1.507148
29 | 0.226040 | 1.062706
30 | 0.709088 | -2.68135
31 | -0.249096 | -1.08377
32 | 0.573959 | 2.91352
33 | 1.069121 | 2.939099
34 | 0.135381 | -1.53966
35 | -0.057504 | 0.473238
36 | -0.975369 | -1.95522
37 | -0.666036 | 1.294994
38 | -1.146922 | 0.887936
39 | -0.357070 | 2.427548
40 | 0.200642 | -0.01608
41 | -0.965084 | 1.97199
42 | 0.681666 | -1.35341
43 | 0.112434 | 2.651183
44 | 0.528475 | -0.57647
45 | 1.003627 | 1.857517
46 | -1.275974 | -0.77916
47 | 1.051102 | -2.01121
48 | -1.315079 | 3.087768
49 | -0.326694 | -0.00446
Annex B (normative):
Directions in Gaussian spherical grid