9 VR Metrics

9.1 General

VR metrics is a functionality whereby the client collects specific quality-related metrics during a session. The collected metrics can then be reported back to a network-side node for further analysis. The metrics functionality is based on the QoE metrics concept in 3GP-DASH [8], extended to also cover VR-specific metrics. A VR client supporting VR metrics shall support all metrics listed in clause 9.3, and shall handle metric configuration and reporting as specified in clause 9.4.

9.2 VR Client Reference Architecture

9.2.1 Architecture

The client reference architecture for VR metrics, shown below in Figure 9.2.1-1, is based on the client architecture in Figure 4.3-1. It also contains a number of observation points where specific metric-related information can be made available to the Metrics Collection and Computation (MCC) function. The MCC can use and combine information from the different observation points to calculate more complex metrics.

Note that these observation points are only defined conceptually, and might not always directly interface to the MCC. For instance, an implementation might relay information from the actual observation points to the MCC via the VR application. It is also possible that the MCC is not separately implemented, but simply included as an integral part of the VR application.

Note also that, in this version of the specification, not all of the described observation points are necessarily used to produce VR metrics.

Figure 9.2.1-1: Client reference architecture for VR metrics

9.2.2 Observation Point 1

The access engine fetches the MPD, constructs and issues segment requests for relevant adaptation sets or preselections as ordered by the VR application, and receives segments or parts of segments. It may also adapt between different representations due to changes in available bitrate. The access engine provides a conforming 3GPP VR track to the file decoder.

The interface from the access engine towards MCC is referred to as observation point 1 (OP1) and is defined to monitor:

– A sequence of transmitted network requests, each defined by its transmission time, contents, and the TCP connection on which it is sent

– For each network response, the reception time and contents of the response header and the reception time of each byte of the response body

– The projection/orientation metadata carried in the network manifest file, if applicable

– The reception time and intended playout time for each received segment

9.2.3 Observation Point 2

The file decoder processes the 3GPP VR Track and typically includes a file parser and a media decoder. The file parser processes the file or segments, extracts elementary streams, and parses the metadata, if present. The processing may be supported by dynamic information provided by the VR application, for example which tracks to choose based on static and dynamic configurations. The media decoder decodes media streams of the selected tracks into the decoded signals. The file decoder outputs the decoded signals and metadata which is used for rendering.

The interface from the file decoder towards MCC is referred to as observation point 2 (OP2) and is defined to monitor:

– Media resolution

– Media codec

– Media frame rate

– Media projection metadata, such as region-wise packing, region-wise quality ranking, and content coverage

– Mono vs. stereo 360 video

– Media decoding time

9.2.4 Observation Point 3

The sensor extracts the current pose according to the user's head and/or eye movement and provides it to the renderer for viewport generation. The current pose may also be used by the VR application to determine which adaptation sets or preselections the access engine is to fetch.

The interface from the sensor towards MCC is referred to as observation point 3 (OP3) and is defined to monitor:

– Head pose

– Gaze direction

– Pose timestamp

– Depth

9.2.5 Observation Point 4

The VR Renderer uses the decoded signals and rendering metadata, together with the pose and the knowledge of the horizontal/vertical field of view, to determine a viewport and render the appropriate part of the video and audio signals.

The interface from the VR renderer towards MCC is referred to as observation point 4 (OP4) and is defined to monitor:

– The media type

– The media sample presentation timestamp

– Wall clock counter

– Actual presentation viewport

– Actual presentation time

– Actual playout frame rate

– Audio-to-video synchronization

– Video-to-motion latency

– Audio-to-motion latency

9.2.6 Observation Point 5

The VR application manages the complete device, and controls the access engine, the file decoder and the rendering based on media control information, the dynamic user pose, and the display and device capabilities.

The interface from the VR application towards MCC is referred to as observation point 5 (OP5) and is defined to monitor:

– Display resolution

– Max display refresh rate

– Field of view, horizontal and vertical

– Eye to screen distance

– Lens separation distance

– OS support, e.g. OS type, OS version

9.3 Metrics Definitions

9.3.1 General

As the VR metrics functionality is based on the DASH QoE metrics [8], all metrics already defined in [8] are valid also for a VR client. Thus the following sub-clauses only define additional VR-related metrics.

9.3.2 Comparable quality viewport switching latency

The comparable quality viewport switching latency metric reports the latency and the quality-related factors when viewport movement causes quality degradation, such as when low-quality background content is briefly shown before the normal higher quality is restored. Note that this metric is only relevant if the Advanced Video Media profile and region-wise packing are used. Also note that the metric currently does not report factors related to foveated rendering.

The viewport quality is represented by two factors: the quality ranking (QR) value, and the pixel resolution of one or more regions within the viewport. The resolution is defined by the orig_width and orig_height values in ISO/IEC 23090-2 [13] for SRQR (Spherical-Region Quality Ranking) or 2DQR (2-Dimensional Quality Ranking). The resolution corresponds to the monoscopic projected picture from which the packed region covering the viewport is extracted.

In order to determine whether two viewports have a comparable quality, if more than one quality ranking region is visible inside the viewport, the aggregated viewport quality factors are calculated as the area-weighted average for QR and the area-weighted (effective) pixel resolution, respectively.

For instance, if 60% of the viewport is from a region with QR=1 and Res=3840 × 2160, and 40% is from a region with QR=2 and Res=960 × 540, then the average QR is 0.6 × 1 + 0.4 × 2 = 1.4, and the effective pixel resolution is 0.6 × 3840 × 2160 + 0.4 × 960 × 540 = 5 184 000 pixels (see also Annex D.1 for more examples).
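The aggregation can be sketched as follows (an illustrative sketch only, not part of this specification; the function name and the region tuple layout are assumptions made for the example):

def aggregate_viewport_quality(regions):
    # regions: iterable of (coverage, qr, width, height), where coverage is
    # the fraction of the viewport area taken from that region and
    # (width, height) is the region's source resolution in pixels.
    # Coverages are expected to sum to 1.0 over the visible regions.
    avg_qr = sum(c * qr for c, qr, _w, _h in regions)
    eff_res = sum(c * w * h for c, _qr, w, h in regions)
    return avg_qr, eff_res

# The example above: 60% from QR=1 at 3840x2160, 40% from QR=2 at 960x540.
print(aggregate_viewport_quality([(0.6, 1, 3840, 2160),
                                  (0.4, 2, 960, 540)]))
# -> (1.4, 5184000.0)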

If the viewport is moved so that the current viewport includes at least one new quality ranking region (i.e. a quality ranking region not included in the previous viewport), a switch event is started. The list of quality factors for the last evaluated viewport before the switch is assigned to the firstViewport log entry. The start time of the switch is also set to the time of the last evaluated viewport before the switch.

The end time for the switch is defined as when both the weighted average QR and the effective resolution for the viewport reach values comparable to the ones before the switch. A value is comparable if it is not more than QRT% (QR threshold) or ERT% (effective resolution threshold) worse than the corresponding values before the switch. If comparable values are not achieved within N milliseconds, a timeout occurs (for instance if an adaptation to a lower bitrate occurs, and the viewport never reaches comparable quality).

Note that smaller QR values and larger resolution values are better. For instance, QRT=5% requires a weighted average QR value equal to or smaller than 105% of the weighted average QR before the switch, whereas ERT=5% requires an effective resolution value equal to or larger than 95% of the effective resolution before the switch.
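Expressed as a predicate (an illustrative sketch, not normative text; the names are invented for the example):

def is_comparable(avg_qr, eff_res, ref_qr, ref_res, qrt=5.0, ert=5.0):
    # ref_qr/ref_res are the pre-switch reference values; qrt/ert are the
    # QRT and ERT thresholds in percent. The defaults here are placeholders,
    # not normative values.
    # Smaller QR is better: tolerate at most a QRT % increase.
    # Larger effective resolution is better: tolerate at most an ERT % decrease.
    return (avg_qr <= ref_qr * (1.0 + qrt / 100.0) and
            eff_res >= ref_res * (1.0 - ert / 100.0))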

The list of quality factors for the viewport which fulfils both thresholds is assigned to the secondViewport log entry, and the latency (end time minus start time) is assigned to the latency log entry. In case of a timeout, this is indicated under the cause log entry.

During the switch, the worst evaluated viewport is also stored and assigned to the worstViewport log entry. The worst viewport is defined as the viewport with the worst relative weighted average QR or relative effective resolution, as compared to the values before the switch.

If a new viewport switching event occurs (e.g. yet another new region becomes visible) before an ongoing switch event has ended, only the N milliseconds timeout is reset. The ongoing measurement process continues to evaluate the viewport quality until a comparable viewport quality value is achieved (or a timeout occurs).

The observation points needed to calculate the metrics are:

– OP2 File Decoder: SRQR/2DQR information

– OP3 Sensor: Gaze information

– OP4 VR Renderer: Start of switch event detection (alternatively, region coverage information from SRQR/2DQR can be used as strict rendering pixel-exactness is not required)

– OP5 VR Application: Field-of-view information of the device

The accuracy of the measured latency depends on how the client implements the viewport switching monitoring. As this might differ between clients, the client shall report the estimated accuracy.

The thresholds QRT and ERT, and the timeout N, can be specified during metrics configuration (see clause 9.4) as attributes within parentheses, e.g. "CompQualLatency(QRT=3.5,ERT=6.8,N=900)". If a threshold or the timeout is not specified, the client shall use appropriate default values.
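For illustration, such an attribute string could be parsed as follows (a sketch only; the helper name and the fallback defaults are invented, since this specification leaves default values to the client):

import re

def parse_metric_config(config, defaults):
    # Splits e.g. "CompQualLatency(QRT=3.5,ERT=6.8,N=900)" into the metric
    # name and its attributes, falling back to the caller-supplied defaults
    # for attributes that are absent.
    match = re.fullmatch(r"(\w+)(?:\((.*)\))?", config.strip())
    name, args = match.group(1), match.group(2)
    attrs = dict(defaults)
    for item in (args.split(",") if args else []):
        key, value = item.split("=")
        attrs[key.strip()] = float(value)
    return name, attrs

# parse_metric_config("CompQualLatency(QRT=3.5,ERT=6.8,N=900)",
#                     {"QRT": 5.0, "ERT": 5.0, "N": 1000.0})
# -> ("CompQualLatency", {"QRT": 3.5, "ERT": 6.8, "N": 900.0})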

The data type ViewportDataType is defined in Table 9.3.2-1 below, and identifies the direction and coverage of the viewport.

Table 9.3.2-1: ViewportDataType

| Key | Type | Description |
|-----|------|-------------|
| ViewportDataType | Object | |
| centre_azimuth | Integer | Specifies the azimuth of the centre of the viewport in units of 2^(−16) degrees. The value shall be in the range of −180 × 2^16 to 180 × 2^16 − 1, inclusive. |
| centre_elevation | Integer | Specifies the elevation of the centre of the viewport in units of 2^(−16) degrees. The value shall be in the range of −90 × 2^16 to 90 × 2^16, inclusive. |
| centre_tilt | Integer | Specifies the tilt angle of the viewport in units of 2^(−16) degrees. The value shall be in the range of −180 × 2^16 to 180 × 2^16 − 1, inclusive. |
| azimuth_range | Integer | Specifies the azimuth range of the viewport through the centre point of the viewport, in units of 2^(−16) degrees. |
| elevation_range | Integer | Specifies the elevation range of the viewport through the centre point of the viewport, in units of 2^(−16) degrees. |
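For illustration only (not normative), conversion between degrees and these fixed-point units:

# 2^16 = 65536 units per degree, so e.g. an azimuth of -90 degrees is
# encoded as -90 * 65536 = -5898240.
def degrees_to_units(deg):
    return round(deg * 65536)

def units_to_degrees(units):
    return units / 65536.0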

The data type ViewportItem is defined as shown in Table 9.3.2-2. ViewportItem is an Object which identifies a viewport and the quality-related factors for the region(s) covered by the viewport.

Table 9.3.2-2: ViewportItem

| Key | Type | Description |
|-----|------|-------------|
| ViewportItem | Object | |
| Position | ViewportDataType | Identifies the viewport. |
| QualityLevels | List | List of the different quality-level regions within the viewport. |
| Coverage | Float | Percentage of the viewport area covered by this region. |
| QR | Integer | Quality ranking (QR) value of this region. |
| Resolution | Object | Resolution for this region. |
| Width | Integer | Horizontal resolution for this region. |
| Height | Integer | Vertical resolution for this region. |

The comparable quality viewport switching latency metric is specified in Table 9.3.2-3 below.

Table 9.3.2-3: Comparable quality viewport switching latency metric

| Key | Type | Description |
|-----|------|-------------|
| CompQualLatency | List | List of comparable quality viewport switching latencies. |
| Entry | Object | |
| firstViewport | ViewportItem | Specifies information about the first viewport. |
| secondViewport | ViewportItem | Specifies information about the second viewport. |
| worstViewport | ViewportItem | Specifies information about the worst viewport seen during the switch duration. |
| time | Real-Time | Wall-clock time when the switch started. |
| mtime | Media-Time | Media presentation time when the switch started. |
| latency | Integer | Specifies the switching delay in milliseconds. |
| accuracy | Integer | Specifies the estimated accuracy of the latency metric in milliseconds. |
| cause | List | Specifies a list of possible causes for the latency. |
| Entry | Object | |
| code | Enum | A possible cause for the latency. The value is equal to one of the following: 0 = Segment duration; 1 = Buffer fullness; 2 = Availability of comparable quality segment; 3 = Timeout. |

9.3.3 Rendered viewports

The rendered viewports metric reports a list of viewports that have been rendered during the media presentation.

The client shall evaluate the current viewport gaze every X ms and potentially add the viewport to the rendered viewport list. To enable frequent viewport evaluations without unduly increasing the report size, consecutive viewports which are close to each other may be grouped into clusters, where only the average cluster viewport data is reported. Clusters with too short a duration may also be excluded from the report.

The viewport clustering is controlled by an angular distance threshold D. If the centre (i.e. the azimuth and the elevation) of the current viewport is closer than the distance D to the current cluster centre (i.e. the average cluster azimuth and elevation), the viewport is added to the cluster. Note that the distance is only compared against the current (i.e. last) cluster, not against any earlier clusters which might have been created.

If the distance to the cluster centre is instead equal to or larger than D, a new cluster is started based on the current viewport, and the average data, start time and duration of the old cluster are added to the viewport list.

Before reporting a viewport list, filtering based on viewport duration shall be performed. Each entry in the viewport list is first assigned an "aggregated duration" equal to the duration of that entry. Then, for each entry E, the other entries in the viewport list are checked. The duration of a checked entry is added to the aggregated duration of entry E if the checked entry is both less than T ms away from E in time and closer than the angular distance D to E.

After all viewport entries have been evaluated and have received a final aggregated duration, all viewport entries with an aggregated duration of less than T are deleted from the viewport list (and thus not reported). Note that the aggregated duration is only used for filtering purposes, and not itself included in the viewport list reports.
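The clustering and filtering described above can be sketched as follows (illustrative only, not normative; the data layout, the use of the great-circle angle as the angular distance, and the interpretation of "less than T ms away" as the difference in start times are assumptions):

import math

def angular_distance(az1, el1, az2, el2):
    # Great-circle angle, in degrees, between two viewing directions.
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_d = (math.sin(e1) * math.sin(e2) +
             math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

def cluster_viewports(samples, d_deg, x_ms):
    # samples: (azimuth, elevation) pairs taken every x_ms milliseconds.
    # Returns clusters as (start_ms, duration_ms, mean_az, mean_el).
    # NOTE: naive averaging of azimuths ignores the +/-180 degree wrap-around;
    # a real implementation would average unit vectors instead.
    clusters, members, start = [], [], 0
    for i, (az, el) in enumerate(samples):
        if members:
            mean_az = sum(a for a, _ in members) / len(members)
            mean_el = sum(e for _, e in members) / len(members)
            if angular_distance(az, el, mean_az, mean_el) >= d_deg:
                clusters.append((start, len(members) * x_ms, mean_az, mean_el))
                members, start = [], i * x_ms
        members.append((az, el))
    if members:
        mean_az = sum(a for a, _ in members) / len(members)
        mean_el = sum(e for _, e in members) / len(members)
        clusters.append((start, len(members) * x_ms, mean_az, mean_el))
    return clusters

def filter_clusters(clusters, d_deg, t_ms):
    # Keep a cluster only if its aggregated duration (its own duration plus
    # that of every other entry within t_ms in time and d_deg in angle)
    # reaches t_ms. With d_deg = 0 and t_ms = 0 nothing is filtered.
    kept = []
    for i, (s1, dur1, az1, el1) in enumerate(clusters):
        agg = dur1
        for j, (s2, dur2, az2, el2) in enumerate(clusters):
            if i != j and abs(s2 - s1) < t_ms and \
               angular_distance(az1, el1, az2, el2) < d_deg:
                agg += dur2
        if agg >= t_ms:
            kept.append((s1, dur1, az1, el1))
    return kept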

Some examples of metric calculation are shown in Annex D.2.

The observation points needed to calculate the metrics are:

– OP3 Sensor: Gaze information

– OP5 VR Application: Field-of-view information of the device

The viewport sample interval X (in ms), the distance threshold D (in degrees), and the duration threshold T (in ms) can be specified during metrics configuration as attributes within parentheses, e.g. "RenderedViewports(X=50,D=15,T=1500)". Note that if no clustering or duration filtering is wanted, the D and T thresholds can be set to 0 (e.g. specifying "RenderedViewports(X=1000,D=0,T=0)" will simply log the viewport every 1000 ms). If no sample interval or threshold values are specified, the client shall use appropriate default values.

The rendered viewports metric is specified in Table 9.3.3-1.

Table 9.3.3-1: Rendered viewports metric

| Key | Type | Description |
|-----|------|-------------|
| RenderedViewports | List | List of rendered viewports. |
| Entry | Object | |
| startTime | Media-Time | Specifies the media presentation time of the first played-out media sample from which the viewport cluster indicated in the current entry is rendered. |
| duration | Integer | The time duration, in units of milliseconds, of the continuously presented media samples for which the viewport cluster indicated in the current entry is rendered, starting from the media sample indicated by startTime. "Continuously presented" means that the media clock continued to advance at the playout speed throughout the interval. |
| viewport | ViewportDataType | Indicates the average region of the omnidirectional media corresponding to the viewport cluster rendered starting from the media sample time indicated by startTime. |

9.3.4 VR Device information

This metric contains information about the device, and is logged at the start of each session and whenever it changes (for instance, if the rendered field-of-view of the device is adjusted). If an individual metric cannot be logged, its value shall be set to 0 (zero) or to the empty string.

The observation point needed to report the metrics is:

– OP5 VR Application: Device Information

The VR device information metric is specified in Table 9.3.4-1.

Table 9.3.4-1: Device information

| Key | Type | Description |
|-----|------|-------------|
| VrDeviceInformation | List | A list of device information objects. |
| Entry | Object | A single object containing new device information. |
| start | Real-Time | Wall-clock time when the device information was logged. |
| mstart | Media-Time | The presentation time at which the device information was logged. |
| deviceIdentifier | String | The brand, model and version of the device. |
| horizontalResolution | Integer | The horizontal display resolution, per eye, in pixels. |
| verticalResolution | Integer | The vertical display resolution, per eye, in pixels. |
| horizontalFoV | Integer | Maximum horizontal field-of-view, per eye, in degrees. |
| verticalFoV | Integer | Maximum vertical field-of-view, per eye, in degrees. |
| renderedHorizontalFoV | Integer | Current rendered horizontal field-of-view, per eye, in degrees. |
| renderedVerticalFoV | Integer | Current rendered vertical field-of-view, per eye, in degrees. |
| refreshRate | Integer | Display refresh rate, in Hz. |

9.4 Metrics Configuration and Reporting

9.4.1 Configuration

Metrics configuration is done according to clauses 10.4 and 10.5 in 3GP-DASH [8], but can also include any of the metrics defined in clause 9.3.

9.4.2 Reporting

Metrics reporting is done according to clause 10.6 in 3GP-DASH [8], with the type QoeReportType extended to handle the additional VR-specific metrics according to the XML schema in clause 9.4.3. In this version of the specification, the element vrMetricSchemaVersion shall be set to 1.

9.4.3 Reporting Format

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:3gpp:metadata:2020:VR:metrics"
           xmlns:hsd="urn:3gpp:metadata:2011:HSD:receptionreport"
           xmlns="urn:3gpp:metadata:2020:VR:metrics" elementFormDefault="qualified">

  <xs:import namespace="urn:3gpp:metadata:2011:HSD:receptionreport"/>

  <xs:complexType name="VrQoeReportType">
    <xs:complexContent>
      <xs:extension base="hsd:QoeReportType">
        <xs:sequence>
          <xs:element name="vrMetric" type="VrMetricType"
                      minOccurs="0" maxOccurs="unbounded"/>
          <xs:element name="vrMetricSchemaVersion" type="xs:unsignedInt"/>
          <xs:any namespace="##other" processContents="lax"
                  minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:anyAttribute processContents="skip"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="VrMetricType">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="compQualLatency" type="CompQualLatencyType"
                  maxOccurs="unbounded"/>
      <xs:element name="renderedViewports" type="RenderedViewportsType"
                  maxOccurs="unbounded"/>
      <xs:element name="vrDeviceInformation" type="VrDeviceInformationType"
                  maxOccurs="unbounded"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:choice>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

  <xs:complexType name="CompQualLatencyType">
    <xs:sequence>
      <xs:element name="firstViewport" type="ViewportItem"/>
      <xs:element name="secondViewport" type="ViewportItem"/>
      <xs:element name="worstViewport" type="ViewportItem"/>
      <xs:element name="time" type="xs:dateTime"/>
      <xs:element name="mtime" type="xs:duration"/>
      <xs:element name="latency" type="xs:unsignedInt"/>
      <xs:element name="accuracy" type="xs:unsignedInt"/>
      <xs:element name="cause" type="xs:unsignedInt" minOccurs="0" maxOccurs="unbounded"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

  <xs:complexType name="RenderedViewportsType">
    <xs:sequence>
      <xs:element name="startTime" type="xs:duration"/>
      <xs:element name="duration" type="xs:unsignedInt"/>
      <xs:element name="viewport" type="ViewportDataType"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

  <xs:complexType name="VrDeviceInformationType">
    <xs:sequence>
      <xs:element name="start" type="xs:dateTime"/>
      <xs:element name="mstart" type="xs:duration"/>
      <xs:element name="deviceIdentifier" type="xs:string"/>
      <xs:element name="horizontalResolution" type="xs:unsignedInt"/>
      <xs:element name="verticalResolution" type="xs:unsignedInt"/>
      <xs:element name="horizontalFoV" type="xs:unsignedInt"/>
      <xs:element name="verticalFoV" type="xs:unsignedInt"/>
      <xs:element name="renderedHorizontalFoV" type="xs:unsignedInt"/>
      <xs:element name="renderedVerticalFoV" type="xs:unsignedInt"/>
      <xs:element name="refreshRate" type="xs:unsignedInt"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

  <xs:complexType name="ViewportItem">
    <xs:sequence>
      <xs:element name="position" type="ViewportDataType"/>
      <xs:element name="qualityLevel" type="QualityLevelEntry" maxOccurs="unbounded"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

  <xs:complexType name="ViewportDataType">
    <xs:sequence>
      <xs:element name="centreAzimuth" type="xs:int"/>
      <xs:element name="centreElevation" type="xs:int"/>
      <xs:element name="centreTilt" type="xs:int"/>
      <xs:element name="azimuthRange" type="xs:unsignedInt"/>
      <xs:element name="elevationRange" type="xs:unsignedInt"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

  <xs:complexType name="QualityLevelEntry">
    <xs:sequence>
      <xs:element name="coverage" type="xs:double"/>
      <xs:element name="qr" type="xs:unsignedInt"/>
      <xs:element name="width" type="xs:unsignedInt"/>
      <xs:element name="height" type="xs:unsignedInt"/>
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:anyAttribute processContents="skip"/>
  </xs:complexType>

</xs:schema>
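As an informative illustration, a vrMetric element carrying a single device information entry could look as follows (all values are invented placeholders; the enclosing QoE report structure is defined in 3GP-DASH [8]):

<vrMetric xmlns="urn:3gpp:metadata:2020:VR:metrics">
  <vrDeviceInformation>
    <start>2020-10-01T12:00:00Z</start>
    <mstart>PT30.5S</mstart>
    <deviceIdentifier>ExampleBrand Model-X 1.0</deviceIdentifier>
    <horizontalResolution>1440</horizontalResolution>
    <verticalResolution>1600</verticalResolution>
    <horizontalFoV>110</horizontalFoV>
    <verticalFoV>100</verticalFoV>
    <renderedHorizontalFoV>100</renderedHorizontalFoV>
    <renderedVerticalFoV>95</renderedVerticalFoV>
    <refreshRate>90</refreshRate>
  </vrDeviceInformation>
</vrMetric>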

Annex A (informative):
Content Generation Guidelines