4.2.3 Motion to Sound Latency in Dynamic Binaural Rendering Systems

3GPP TS 26.260 Release 18: Objective test methodologies for the evaluation of immersive audio systems

4.2.3.1 Introduction

Motion-to-sound latency is the time difference between a change in head rotation and the moment at which the immersive audio signal is compensated for that head motion. The method in this specification is intended to verify that the overall motion-to-sound latency a user experiences upon rotating their head is within acceptable limits.

The method allows full measurement of the motion-to-sound latency, i.e. including both the latency of the head-tracking sensor and that of the audio playback. It covers all components of a real setup and therefore captures all possible causes of additional latency that a user may experience.

The method also provides a latency value for the isolated audio processing of the binaural renderer without the aforementioned external hardware, assuming that the binaural renderer can process audio data as an audio processing plugin that can be evaluated in isolation.

NOTE: This method requires synchronized playback of two renderer instances and may not be suitable for the measurement of UEs where such synchronization is not possible.

4.2.3.2 Requirements

The following will be required:

Software:

– Audio processing software to run and record output of two renderers simultaneously

– Head tracker software

Hardware:

– Host machine for audio processing

– Head tracker hardware

– Stereo audio recording interface

– Stereo audio playback interface

– Mechanical setup to rotate the head tracking sensor in a precise and reproducible way

An exemplary hardware setup is shown in Figure 2; the method can, however, also be implemented using different systems under test and accompanying equipment:

Figure 2: Hardware Overview (Setup in Position 1 on the left, Position 2 on the right)

The audio processing environment uses two parallel signal chains, each containing its own instance of the same binaural renderer being tested. The test is concerned only with yaw angles, so values of pitch and roll should be set to zero at the beginning of the test and can be ignored thereafter.

Figure 3: Generic Audio Processing Environment

The initial conditions are that Rendering Chain 1 (RC1) has a static yaw head rotation angle of 0 degrees and RC2 uses the physical rotation of the head tracker to get its yaw value. A white noise signal is virtually placed directly in front of the listener (0 degrees azimuth, 0 degrees elevation), meaning that rotation of the arm directly affects how the white noise source is rendered.

4.2.3.3 Calibration

The first step is to calibrate the final position of the rotating arm (Position 2 / P2). The rotating arm is moved manually and requires only a limited range of motion: from a small rotation away from the table (Position 1 / P1), for which 20 to 30 degrees is ample, down to contact with the table (P2). The arm should be placed at P2 and set up so that this position also corresponds to 0 degrees yaw.

4.2.3.4 Evaluation Environment

An object within the evaluation environment, e.g. using Max/MSP, should be created to set the value of yaw to exactly 0 degrees once the absolute value of yaw received from the head tracker is below 0.2 degrees.

NOTE: This tolerance value was chosen to be as small as possible while ensuring that the latched value does not bounce (dependent on the accuracy of the tracking system).

This object should be designed to latch to zero once the actual value is under the tolerance threshold, so that any small accidental rebound of the rotating arm does not affect the yaw angle fed to the renderer: the value artificially remains at exactly 0 degrees. This is important to ensure that both rendering chains have exactly the same head rotation when the arm is in its final position (P2). The output from the evaluation environment is captured by the recording audio interface and therefore includes any latency introduced by playback.
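The latching behaviour described above can be sketched as follows; this is a minimal illustration of the logic, not part of any Max/MSP API, and the class and method names are hypothetical (the 0.2 degree tolerance is taken from this clause):

```python
class YawLatch:
    """Latches the yaw angle to exactly 0 degrees once the tracked
    value falls inside the tolerance, so that later rebounds of the
    rotating arm are ignored."""

    def __init__(self, tolerance_deg=0.2):
        self.tolerance_deg = tolerance_deg
        self.latched = False

    def process(self, tracked_yaw_deg):
        # Once latched, always output exactly 0 degrees.
        if self.latched:
            return 0.0
        if abs(tracked_yaw_deg) < self.tolerance_deg:
            self.latched = True
            return 0.0
        # Outside the tolerance: pass the tracker value through.
        return tracked_yaw_deg
```

With this behaviour, a small rebound of the arm after contact (e.g. a reading of 0.5 degrees after latching) still yields exactly 0 degrees at the renderer input.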

4.2.3.5 Data acquisition

A test run begins by starting the recording on the recording audio interface. The rotating arm is set to Position 1; the audio processing is then started and begins feeding the input source to both renderer instances. A microphone is positioned near the contact point on the table. This mono room microphone is recorded synchronously with the output from the evaluation environment; its purpose is to log the point of contact of the arm with the table. The arm should be released with a good amount of speed and vigour so that the microphone picks up a loud knock on the table. Shortly after this (one or two seconds, for example), with the test run now complete, playback and recording can be stopped.

A few milliseconds after the collision, the latest yaw value detected by the tracking system will have been passed into the evaluation environment (tTracking). With the target yaw value now reached (latched to zero in Max/MSP), both rendering chains have identical values of head rotation and therefore, after a further short delay, the outputs of both renderer instances will be identical.

4.2.3.6 Data Analysis

The overall motion-to-sound latency (tM2S) is taken as the time from the moment of collision until the point at which the two output signals are identical.

To allow easy visual inspection of when this point occurs, one output channel of one signal chain (e.g. RC1 left) is subtracted from the same output channel of the complementary signal chain (RC2 left).

NOTE 1: This could be done manually in audio editor software after processing, but this would require recording at least three channels synchronously (one from each renderer chain, and one of the room microphone). Instead, the subtraction of signals can be done within Max/MSP, meaning only the output of this operation (one channel) and the room microphone can conveniently be recorded with a stereo audio interface. In addition to the stereo WAVE file recorded by a separate audio application, the Max/MSP application also writes to a separate mono WAVE file once it detects that it is in the final tolerated yaw position (latched on). This mono WAVE contains only the subtracted signals as described above, from which the tMspProc time can also be measured.

Evaluation is performed offline in audio editor software. The tMspProc time is measured from the start of the file until the point at which consecutive zero samples begin. This value encompasses any motion-to-sound latency caused by the tested renderer chain as well as any other latency caused by Virtual Studio Technology (VST) plugin framework buffering. The tMspProc time shall be measured from the audio frame boundary at which the latched-on yaw value is activated and applied within that audio frame.

NOTE 2: Since the yaw rotation update rate of the tracker is typically in the range of a few milliseconds, there is a framing mismatch when compared to the audio framing, but this mismatch will not be incorporated in the tMspProc value but rather only in the tM2S measurement.
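The offline detection of the point at which consecutive zero samples begin can be sketched as follows; the function name and the min_zero_run guard are illustrative assumptions, not part of this specification:

```python
import numpy as np

def measure_tmspproc(diff_signal, sample_rate, min_zero_run=1024):
    """Return the tMspProc latency in seconds: the time from the start
    of the mono difference file to the first sustained run of zero
    samples, i.e. where the two renderer outputs become identical.

    min_zero_run guards against isolated zero crossings being
    mistaken for true cancellation."""
    is_zero = np.asarray(diff_signal) == 0
    run = 0
    for i, z in enumerate(is_zero):
        run = run + 1 if z else 0
        if run >= min_zero_run:
            start = i - min_zero_run + 1  # first sample of the zero run
            return start / sample_rate
    return None  # no sustained cancellation found
```

For example, a difference file with 4800 non-zero samples followed by silence at 48 kHz yields a tMspProc of 100 ms.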

An example measurement of tMspProc is displayed in Figure 4. tM2S is measured by selecting the duration between the visible collision peak in the microphone channel and the point at which the other channel falls to silence. Figure 5 shows an example measurement of the motion-to-sound latency.

Figure 4: tMspProc latency measurement

Figure 5: Motion-to-sound (tM2S) latency measurement

In Figure 5 the room recording is at the top and the subtracted renderer output at the bottom. The marked region is the time elapsed between the arm hitting the table (recorded knock) and the subtracted binaural renderer output reaching silence.

NOTE 3: Unlike the tMspProc measurement, the tM2S measurement is taken from signals recorded from hardware audio interfaces, hence it is not possible to look for continuous silence since the resulting file will always contain some noise added by the digital-to-analog and analog-to-digital converters. For this reason, it is important to ensure a high signal-to-noise ratio in the signal provided to audio interfaces, to make it easier to inspect where the cancellation occurs.
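Because of this converter noise, a practical tM2S evaluation looks for the first point after the knock where the difference channel stays below a noise-floor threshold for some minimum duration, rather than for exact zeros. A minimal sketch (function name, threshold and window values are illustrative and depend on the measured noise floor):

```python
import numpy as np

def measure_tm2s(mic, diff, sample_rate,
                 silence_threshold=0.01, min_quiet_s=0.05):
    """Estimate tM2S in seconds from a stereo hardware recording:
    mic  - room microphone channel (contains the knock),
    diff - subtracted renderer output channel.

    'Silence' is the first point after the knock where the difference
    signal stays below silence_threshold for at least min_quiet_s."""
    mic = np.abs(np.asarray(mic, dtype=float))
    diff = np.abs(np.asarray(diff, dtype=float))
    knock = int(np.argmax(mic))            # sample index of the collision peak
    win = int(min_quiet_s * sample_rate)   # required quiet window length
    quiet = diff < silence_threshold
    for i in range(knock, len(diff) - win):
        if quiet[i:i + win].all():
            return (i - knock) / sample_rate
    return None  # no sustained silence found after the knock
```

A high signal-to-noise ratio, as recommended above, widens the gap between the active difference signal and the noise floor and so makes the threshold choice uncritical.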

Annex A (normative):
Order dependent directions

The following tables list the order-dependent directions, where the elevations and azimuths are given in radians.

Order 1 (4 directions):

Index   Elevation   Azimuth
1       1.570796    0
2       -0.339837   0
3       -0.339837   2.094395
4       -0.339837   -2.0944

Order 2 (9 directions):

Index   Elevation   Azimuth
1       1.570796    0
2       -0.790277   0
3       0.363207    -1.95668
4       0.363207    1.956682
5       -0.844382   -1.95668
6       0.009757    -3.14159
7       -0.844382   1.956681
8       0.245128    0.687124
9       0.245129    -0.68712

Order 3 (16 directions):

Index   Elevation   Azimuth
1       1.570796    0
2       0.716698    0
3       -0.461173   1.119907
4       -1.034310   -0.25283
5       0.492174    1.155586
6       -0.165812   2.040481
7       -0.461172   -1.38118
8       -0.165813   0.270692
9       0.001916    -2.20417
10      0.653709    2.297267
11      0.653709    -2.80293
12      -0.192680   3.010956
13      -1.079056   2.154919
14      0.001915    -0.63529
15      0.616834    -1.41973
16      -0.887326   -2.46809

Order 4 (25 directions):

Index   Elevation   Azimuth
1       1.570796    0
2       0.747578    0
3       -0.168324   -2.00759
4       0.846499    1.927637
5       0.234515    -1.41208
6       0.699165    -2.10001
7       0.307091    2.512927
8       0.130649    1.667633
9       -0.677517   1.442383
10      0.136843    -0.60062
11      -1.317269   0.329968
12      -0.433118   -1.18621
13      -0.231864   2.983332
14      0.174242    -2.69222
15      -0.599985   0.507602
16      -0.382009   2.208977
17      -0.009394   0.952319
18      -1.013813   -1.71565
19      0.696199    0.934402
20      -0.602139   -0.38654
21      -1.041921   2.675958
22      -0.623111   -2.62842
23      0.054056    0.165012
24      0.855489    -1.02504
25      0.808243    -3.13121

Order 5 (36 directions):

Index   Elevation   Azimuth
1       1.570796    0
2       -0.454100   0
3       0.323739    -1.19666
4       -1.175381   0.184066
5       0.947221    0.124282
6       -0.193698   -2.84022
7       0.500281    -1.84701
8       -0.663529   0.698758
9       -0.613332   2.280239
10      -0.588043   -2.28482
11      0.946645    -2.37569
12      0.333311    2.883411
13      0.967374    -1.18504
14      0.436854    -2.76846
15      0.510141    0.763488
16      -0.063811   -0.46491
17      0.048266    -2.27504
18      -0.148392   1.762138
19      0.945735    2.804486
20      -0.125777   -1.69175
21      -0.241518   -1.0321
22      -0.063824   0.509415
23      -1.240392   -1.95737
24      0.542172    -0.567
25      0.043647    2.319619
26      -0.291045   2.853233
27      -0.841101   -3.07101
28      -1.213891   2.113132
29      -0.706626   -1.50877
30      -0.774625   -0.65404
31      -0.707445   1.464227
32      0.990842    1.373127
33      -0.122664   1.112751
34      0.598614    2.113949
35      0.306690    0.057137
36      0.381934    1.457925

Order 6 (49 directions):

Index   Elevation   Azimuth
1       1.570796    0
2       0.720144    0
3       -0.308365   3.024454
4       0.068431    2.080642
5       -0.495677   -2.21373
6       -0.018779   -2.03598
7       0.426043    1.678014
8       -0.259742   0.964363
9       0.179320    -3.03552
10      -0.249618   -2.70206
11      1.074183    0.581055
12      -0.781172   -2.80103
13      0.457849    0.550136
14      0.523951    -1.98436
15      -0.006246   -0.51212
16      -0.788507   -1.1411
17      0.228181    -2.48765
18      -0.418110   -1.62282
19      -0.512688   -0.57506
20      0.572140    2.286204
21      -0.867576   -0.08741
22      -0.624799   0.547028
23      -0.446687   1.878965
24      -0.789667   2.746717
25      1.047763    -0.76025
26      0.247192    -1.01978
27      0.720143    1.162107
28      -0.081819   1.507148
29      0.226040    1.062706
30      0.709088    -2.68135
31      -0.249096   -1.08377
32      0.573959    2.91352
33      1.069121    2.939099
34      0.135381    -1.53966
35      -0.057504   0.473238
36      -0.975369   -1.95522
37      -0.666036   1.294994
38      -1.146922   0.887936
39      -0.357070   2.427548
40      0.200642    -0.01608
41      -0.965084   1.97199
42      0.681666    -1.35341
43      0.112434    2.651183
44      0.528475    -0.57647
45      1.003627    1.857517
46      -1.275974   -0.77916
47      1.051102    -2.01121
48      -1.315079   3.087768
49      -0.326694   -0.00446
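As a cross-check when using the direction tables above, each (elevation, azimuth) pair can be converted to a Cartesian unit vector. A minimal sketch; the axis convention (x forward, y left, z up) is a common choice and an assumption here, not mandated by this specification:

```python
import math

def direction_to_cartesian(elevation, azimuth):
    """Convert an (elevation, azimuth) pair in radians to a unit
    vector (x, y, z), assuming x forward, y left, z up."""
    x = math.cos(elevation) * math.cos(azimuth)
    y = math.cos(elevation) * math.sin(azimuth)
    z = math.sin(elevation)
    return (x, y, z)
```

For example, the first row of every table, (1.570796, 0), maps to approximately (0, 0, 1), i.e. straight up; every converted direction has unit length.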

Annex B (normative):
Directions in Gaussian spherical grid