4.1.2 3GPP 3DOF Coordinate System
3GPP TS 26.118, Release 17: Virtual Reality (VR) profiles for streaming applications
The coordinate system is specified for defining the sphere coordinates azimuth (φ) and elevation (θ), which identify the location of a point on the unit sphere, as well as the rotation angles yaw (α), pitch (β), and roll (γ). The origin of the coordinate system is usually the same as the centre point of the device or rig used for audio or video acquisition, as well as the position of the user’s head in the 3D space in which the audio or video is rendered. Figure 4.1-2 specifies the principal axes of the coordinate system. The X axis is the back-to-front axis, the Y axis is the side-to-side (or lateral) axis, and the Z axis is the vertical (or up) axis. These axes map to the reference system in Figure 4.1-1.
Figure 4.1-2: Coordinate system
Signals defined in the present document are represented in a spherical coordinate space in angular coordinates (φ, θ) for use in omnidirectional video and 3D audio. The viewing and listening perspective is from the origin, sensing/looking/hearing outward toward the inside of the sphere. Although spherical coordinates are generally expressed using radius, elevation, and azimuth, the present document assumes that a unit sphere is used for capturing and rendering VR media. Thus, the location of a point on the unit sphere is identified by the sphere coordinates azimuth (φ) and elevation (θ). As depicted in Figure 4.1-2, the coordinate axes are also used for defining the rotation angles yaw (α), pitch (β), and roll (γ). The angles increase clockwise when looking from the origin towards the positive end of an axis. The value ranges of azimuth, yaw, and roll are all −180.0 degrees, inclusive, to 180.0 degrees, exclusive. The value ranges of elevation and pitch are both −90.0 degrees to 90.0 degrees, inclusive.
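As an illustration of the value ranges and the unit-sphere assumption above, the following minimal Python sketch wraps an azimuth into [−180.0, 180.0), clamps an elevation into [−90.0, 90.0], and converts the pair to a point on the unit sphere. The function names are hypothetical and not part of the present document. The Cartesian mapping is an assumption: it takes positive azimuth as turning from the X axis towards the Y axis and positive elevation as turning towards the Z axis, which is a commonly used convention but is not stated explicitly in this clause.

```python
import math


def wrap_azimuth(deg: float) -> float:
    """Wrap an angle in degrees into the range [-180.0, 180.0)."""
    return (deg + 180.0) % 360.0 - 180.0


def clamp_elevation(deg: float) -> float:
    """Clamp an angle in degrees into the range [-90.0, 90.0]."""
    return max(-90.0, min(90.0, deg))


def sphere_to_cartesian(azimuth_deg: float, elevation_deg: float):
    """Map (azimuth, elevation) in degrees to (x, y, z) on the unit sphere.

    Assumed convention (not taken from this clause): X is back-to-front,
    Y is lateral, Z is up; azimuth turns from X towards Y, elevation
    turns towards Z.
    """
    phi = math.radians(wrap_azimuth(azimuth_deg))
    theta = math.radians(clamp_elevation(elevation_deg))
    x = math.cos(theta) * math.cos(phi)
    y = math.cos(theta) * math.sin(phi)
    z = math.sin(theta)
    return x, y, z
```

Under these assumptions, azimuth 0 and elevation 0 map to the point straight ahead on the X axis, and an input azimuth of 180.0 is wrapped to −180.0 because the upper bound is exclusive.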
Depending on the application or implementation, not all angles may be necessary or available in the signal. The 360 video may have restricted coverage, as shown in Figure 4.1-3. When the video signal does not cover the full sphere, the coverage information is described using the following parameters:
– centre azimuth: specifies the azimuth value of the centre point of the sphere region covered by the signal.
– centre elevation: specifies the elevation value of the centre point of the sphere region.
– azimuth range: specifies the azimuth range through the centre point of the sphere region.
– elevation range: specifies the elevation range through the centre point of the sphere region.
– tilt angle: indicates the amount of tilt of a sphere region, measured as the amount of rotation of the sphere region along the axis originating from the origin passing through the centre point of the sphere region, where the angle value increases clockwise when looking from the origin towards the positive end of the axis.
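The coverage parameters listed above can be held in a small data structure. The following Python sketch is illustrative only: the class and function names are hypothetical, and the containment test assumes a tilt angle of 0 and a region bounded by lines of constant azimuth and constant elevation, which is a simplification not defined in this clause.

```python
from dataclasses import dataclass


@dataclass
class SphereCoverage:
    """Coverage parameters of a sphere region, all in degrees."""
    centre_azimuth: float
    centre_elevation: float
    azimuth_range: float
    elevation_range: float
    tilt: float = 0.0


def azimuth_difference(a: float, b: float) -> float:
    """Signed difference a - b in degrees, wrapped into [-180.0, 180.0)."""
    return (a - b + 180.0) % 360.0 - 180.0


def covers(region: SphereCoverage, azimuth: float, elevation: float) -> bool:
    """Return True if the point lies inside the region (tilt 0 only)."""
    if region.tilt != 0.0:
        # Tilted regions need a rotation step that this sketch omits.
        raise NotImplementedError("tilt is not handled in this sketch")
    d_az = abs(azimuth_difference(azimuth, region.centre_azimuth))
    d_el = abs(elevation - region.centre_elevation)
    return (d_az <= region.azimuth_range / 2.0
            and d_el <= region.elevation_range / 2.0)
```

For example, a region centred at azimuth 170 with an azimuth range of 40 is treated as containing a point at azimuth −175, because the azimuth difference is computed across the ±180 wrap-around.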
Figure 4.1-3: Restricted coverage of the sphere region covered by the cropped output picture, with omni_projection_{yaw | pitch | roll}_center as the centre of the coverage region
For video, such a centre point may exist for each eye, in which case the signal is referred to as a stereo signal, and the video consists of three colour components, typically expressed as luminance (Y) and two chrominance components (U and V).
The coordinate systems for all media types are assumed to be aligned in the 3GPP 3DOF coordinate system. Within this coordinate system, the pose is expressed by a triple of azimuth, elevation, and tilt angle characterizing the head position of a user consuming the audio-visual content. The pose is generally dynamic, and the information may be provided through sensors in a frequently sampled form.
The field of view (FoV) of a rendering device is static and defined in two dimensions, the horizontal and vertical FoV, each in units of degrees in the angular coordinates (φ, θ). The pose together with the field of view of the device enables the system to generate the user viewport, i.e., the part of the content presented at a specific point in time.
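The relation between pose, field of view, and viewport can be sketched as follows. This is a coarse approximation under stated assumptions, not a rendering procedure from the present document: the names `Pose`, `FieldOfView`, and `viewport_bounds` are hypothetical, the tilt angle is ignored, and the viewport is approximated by an azimuth and elevation interval centred on the pose.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Pose of the user's head, all angles in degrees."""
    azimuth: float
    elevation: float
    tilt: float = 0.0  # carried for completeness, ignored below


@dataclass
class FieldOfView:
    """Static field of view of the rendering device, in degrees."""
    horizontal: float
    vertical: float


def wrap_azimuth(deg: float) -> float:
    """Wrap an angle in degrees into the range [-180.0, 180.0)."""
    return (deg + 180.0) % 360.0 - 180.0


def viewport_bounds(pose: Pose, fov: FieldOfView) -> dict:
    """Approximate the angular extent of the viewport (tilt ignored).

    The azimuth interval runs from azimuth_start to azimuth_end in the
    direction of increasing azimuth and may cross the +/-180 boundary.
    """
    half_h = fov.horizontal / 2.0
    half_v = fov.vertical / 2.0
    return {
        "azimuth_start": wrap_azimuth(pose.azimuth - half_h),
        "azimuth_end": wrap_azimuth(pose.azimuth + half_h),
        "elevation_min": max(-90.0, pose.elevation - half_v),
        "elevation_max": min(90.0, pose.elevation + half_v),
    }
```

Because the pose is dynamic while the FoV is static, a player following this sketch would recompute the bounds each time a new pose sample arrives from the sensors.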