A.2 EVS RTP Payload Format

3GPP TS 26.445, Release 15: Codec for Enhanced Voice Services (EVS); Detailed algorithmic description

The EVS RTP Payload Format includes a Compact format and a Header-Full format, which are used depending on the required functionalities within a session and whether only a single frame is transmitted. These two formats can be switched during a session by the media sender, if the EVS RTP Payload Format is not restricted to use only the Header-Full format, as described in Annex A.3 and TS 26.114 [13].

In addition to the EVS RTP Payload Format, the RFC 4867 [15] payload format shall also be supported for the EVS AMR-WB IO modes to provide backward interoperability with legacy AMR-WB terminals.

The media sender is the entity encoding the audio signal frames and sending the RTP packets including the encoded frames. The media receiver is the entity receiving the RTP packets and decoding the audio signal frames from the encoded frames.

The media receiver may send Codec Mode Requests (CMRs) in the Compact format (in the 3-bit CMR) or in the Header-Full format (in the CMR byte) to the media sender for adapting the bit rate, the audio bandwidth or the operational mode (EVS primary or EVS AMR-WB IO).

A.2.1 EVS codec Compact Format

In the Compact format, the RTP payload consists of exactly one coded frame for the EVS Primary mode, and one coded frame and one 3-bit CMR field for the EVS AMR-WB IO mode. The Compact format uses protected payload sizes that uniquely identify EVS codec modes (EVS Primary or EVS AMR-WB IO mode) and bit-rates. The protected payload sizes are used for determining the bit-rate of a received coded frame at the receiver.

Table A.1 shows the protected payload sizes and the corresponding bit-rates to be used for Compact RTP payload format.

Table A.1: Protected payload sizes

| Mode | Payload size (bits) | Bit rate (kbps) |
|---|---|---|
| EVS Primary | 48 | 2.4 (EVS Primary SID) |
| Special case (see clause A.2.1.3) | 56 | 2.8 |
| EVS AMR-WB IO | 136 | 6.6 |
| EVS Primary | 144 | 7.2 |
| EVS Primary | 160 | 8 |
| EVS AMR-WB IO | 184 | 8.85 |
| EVS Primary | 192 | 9.6 |
| EVS AMR-WB IO | 256 | 12.65 |
| EVS Primary | 264 | 13.2 |
| EVS AMR-WB IO | 288 | 14.25 |
| EVS AMR-WB IO | 320 | 15.85 |
| EVS Primary | 328 | 16.4 |
| EVS AMR-WB IO | 368 | 18.25 |
| EVS AMR-WB IO | 400 | 19.85 |
| EVS AMR-WB IO | 464 | 23.05 |
| EVS AMR-WB IO | 480 | 23.85 |
| EVS Primary | 488 | 24.4 |
| EVS Primary | 640 | 32 |
| EVS Primary | 960 | 48 |
| EVS Primary | 1280 | 64 |
| EVS Primary | 1920 | 96 |
| EVS Primary | 2560 | 128 |
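As an illustration of how a receiver might use Table A.1, the following Python sketch maps the size of a received Compact payload to its mode and bit rate. The names `PROTECTED_SIZES` and `classify_compact_size` are illustrative assumptions, not part of the specification. The 56-bit entry is only flagged as the special case here, because resolving it needs the check described in clause A.2.1.3.

```python
# Sketch only: entries are copied from Table A.1; all names are illustrative.
IO = "EVS AMR-WB IO"
PRIMARY = "EVS Primary"

PROTECTED_SIZES = {
    48: ("EVS Primary SID", 2.4),
    56: ("Special case (see clause A.2.1.3)", 2.8),
    136: (IO, 6.6), 144: (PRIMARY, 7.2), 160: (PRIMARY, 8.0),
    184: (IO, 8.85), 192: (PRIMARY, 9.6), 256: (IO, 12.65),
    264: (PRIMARY, 13.2), 288: (IO, 14.25), 320: (IO, 15.85),
    328: (PRIMARY, 16.4), 368: (IO, 18.25), 400: (IO, 19.85),
    464: (IO, 23.05), 480: (IO, 23.85), 488: (PRIMARY, 24.4),
    640: (PRIMARY, 32.0), 960: (PRIMARY, 48.0), 1280: (PRIMARY, 64.0),
    1920: (PRIMARY, 96.0), 2560: (PRIMARY, 128.0),
}


def classify_compact_size(payload_len_bytes):
    """Return (mode, bit rate in kbps) for a protected size, else None."""
    return PROTECTED_SIZES.get(payload_len_bytes * 8)
```

A return value of `None` means the size is not protected, so in default format handling the payload would be treated as Header-Full.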

A.2.1.1 Compact format for EVS Primary mode

In the Compact format for EVS Primary mode, the RTP payload consists of exactly one coded frame. Hence, the coded frame follows the RTP header without any additional EVS RTP payload header.

The payload represents a speech frame of 20 ms encoded with the EVS codec bit-rate identified by the payload size. The bits are in the same order as produced by the EVS encoder, where the first bit is placed left-most immediately following the RTP header.

A.2.1.2 Compact format for EVS AMR-WB IO mode (except SID)

In the Compact format for EVS AMR-WB IO mode, except SID, the RTP payload consists of one 3-bit CMR field, one coded frame, and zero-padding bits if necessary.

A.2.1.2.1 Representation of Codec Mode Request (CMR) in Compact format for EVS AMR-WB IO mode

The 3-bit CMR field carries the codec mode request information to signal to the media sender the requested AMR-WB [37] or EVS AMR-WB IO codec mode to be applied for encoding. The signalling of AMR-WB and EVS AMR-WB IO with the 3-bit CMR field is defined as shown in Table A.2. The 3-bit CMR field in Compact format for EVS AMR-WB IO mode comprises a 3-bit element [c(0), c(1), c(2)] for signalling codec mode requests for the following EVS AMR-WB IO or AMR-WB codec modes.

Table A.2: 3-bit signalling element and EVS AMR-WB IO/AMR-WB CMR

| c(0) c(1) c(2) | Requested mode (kbps) |
|---|---|
| 0 0 0 | 6.6 |
| 0 0 1 | 8.85 |
| 0 1 0 | 12.65 |
| 0 1 1 | 15.85 |
| 1 0 0 | 18.25 |
| 1 0 1 | 23.05 |
| 1 1 0 | 23.85 |
| 1 1 1 | none |

Due to the 3-bit limitation, there is not enough signalling space for all EVS AMR-WB IO codec modes. Consequently, CMRs in Compact format for EVS AMR-WB IO are limited to the most frequently used set of EVS AMR-WB IO / AMR-WB modes, as shown in Table A.2. CMRs for EVS AMR-WB IO / AMR-WB modes 14.25 and 19.85 are not supported in Compact format for EVS AMR-WB IO. If a request needs to be transmitted for either mode, it should be re-mapped to the next lower mode (12.65 and 18.25, respectively). Alternatively, the CMR byte in the Header-Full format may be used to transmit CMRs for the 14.25 and 19.85 modes.

If the allowed codec modes are restricted by the mode-set MIME parameter, the 3-bit CMR for an unsupported mode may be re-mapped to the next lower mode in this mode-set.
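The re-mapping rule can be sketched in Python as follows. The code points come from Table A.2. The function name `encode_cmr3`, the representation of the mode-set as a set of bit rates in kbps, and the fallback to "none" when no allowed lower mode exists are assumptions made for this illustration only.

```python
# Sketch only: code points from Table A.2; names and fallback are assumptions.
CMR3_CODES = {
    6.6: 0b000, 8.85: 0b001, 12.65: 0b010, 15.85: 0b011,
    18.25: 0b100, 23.05: 0b101, 23.85: 0b110,
}
CMR3_NONE = 0b111


def encode_cmr3(requested_kbps, mode_set=None):
    """Return the 3-bit CMR value for a requested EVS AMR-WB IO bit rate.

    Rates without a code point (14.25, 19.85), or rates outside the
    optional mode_set, are re-mapped to the next lower rate that can be
    signalled and is allowed.
    """
    allowed = [r for r in CMR3_CODES if mode_set is None or r in mode_set]
    candidates = [r for r in allowed if r <= requested_kbps]
    if not candidates:
        # Assumption: with no suitable lower mode, send no request at all.
        return CMR3_NONE
    return CMR3_CODES[max(candidates)]
```

For example, `encode_cmr3(14.25)` yields the code point for 12.65, and `encode_cmr3(19.85)` yields the code point for 18.25.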

Codec mode requests for EVS primary modes shall be made using the CMR byte in the Header-Full format.

The codec mode request indicated in the 3-bit-CMR shall comply with the media type parameters (the allowed bit-rates for EVS AMR-WB IO or AMR-WB) that are negotiated for the session. When a 3-bit-CMR is received, requesting a bit-rate that does not comply with the negotiated media parameters, it shall be ignored.

A 3-bit CMR indicates the highest EVS AMR-WB IO codec mode that the media receiver (CMR sender) wants to receive. When receiving a 3-bit CMR (except value "none") the media sender shall use the EVS AMR-WB IO operation mode. The media sender should use the EVS AMR-WB IO codec mode (bit rate) requested in the received 3-bit CMR and shall not use a higher codec mode (higher bit rate). The media sender may use a lower EVS AMR-WB IO codec mode within the negotiated mode-set.

CMR code-point "none" is specified as equivalent to no CMR-value being sent. The receiver of "none" shall ignore it.

NOTE: The meaning of "none" and "NO_REQ" (see A.2.2.1.1 below) for EVS is not equivalent to code-point "CMR=15" for AMR and AMR-WB, as specified according to TS 26.114 and RFC 4867 with its errata. MGWs in the path, repacking between the RTP format according to RFC 4867 and the RTP format according to the present document, translate between these code-points.

A.2.1.2.2 Payload structure of Compact EVS AMR-WB IO mode frame

In order to minimize the need for bit re-shuffling in media gateways in case of payload format conversion to or from AMR-WB bandwidth-efficient format according to [15], the speech data bits are inserted after CMR, starting with bit d(1). Speech data bit d(0) is appended after the last speech data bit.

Figure A.1. Payload structure of Compact EVS AMR-WB IO.

The speech data payload represents a speech frame of 20 ms encoded with EVS AMR-WB IO bit-rate (mode) identified by the payload size. The order and numbering notation of the bits are as specified for Interface Format 1 (IF1) in Annex B of [36] for AMR-WB. The bits of the speech frames are arranged in the order of decreasing sensitivity, giving a re-ordered bit sequence {d(0),d(1),…,d(K-1)}.

If the total of the three CMR bits and the coded frame bits is not a multiple of 8, zero-padding bits are added so that the total becomes a multiple of 8. One zero-padding bit is required for EVS AMR-WB IO mode 6.6 and four zero-padding bits are required for EVS AMR-WB IO mode 8.85. In the other modes, no padding bits are inserted. With the exception of SID frames, the EVS AMR-WB IO Compact payload follows the RTP header without any additional EVS RTP payload header.

Note that no Compact format is defined for EVS AMR-WB IO SID frames. For such frames, the Header-Full format with CMR byte shall be used (see clause A.2.1.3).
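The bit ordering and padding described in this clause can be sketched in Python as follows. This is a minimal illustration, not a normative packer: the function name `pack_compact_io` is hypothetical, and packing bits MSB-first within each octet is an assumption, since Figure A.1 is not reproduced here.

```python
# Sketch only: names are illustrative; MSB-first octet packing is assumed.
def pack_compact_io(cmr3, d):
    """Pack a Compact EVS AMR-WB IO payload.

    cmr3 is the 3-bit CMR value with c(0) as its most significant bit,
    and d is the list of speech bits d(0)..d(K-1). The layout is the CMR,
    then d(1)..d(K-1), then d(0), then zero padding to a whole octet.
    """
    bits = [(cmr3 >> 2) & 1, (cmr3 >> 1) & 1, cmr3 & 1]
    bits += list(d[1:]) + list(d[:1])
    bits += [0] * (-len(bits) % 8)  # zero-padding bits
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def padding_bits(frame_bits):
    """Number of zero-padding bits needed after 3 CMR bits plus the frame."""
    return -(3 + frame_bits) % 8
```

A 20 ms frame at 6.6 kbps has 132 bits, so `padding_bits(132)` is 1 and the payload is 136 bits. At 8.85 kbps the frame has 177 bits, so `padding_bits(177)` is 4 and the payload is 184 bits, matching Table A.1.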

NOTE: The Q bit defined in RFC 4867 [15] is not present in the Compact payload structure of EVS AMR-WB IO. Therefore it shall be ensured that the speech payload is not damaged. In case of a conversion of RFC 4867 formatted packets to Compact payload format, damaged frames (indicated by the Q bit) shall be discarded and not converted.

A.2.1.3 Special case for 56 bit payload size (EVS Primary or EVS AMR-WB IO SID)

The Compact format for EVS Primary 2.8 kbps frames (56 bits) has the same payload size as the Header-Full format for EVS AMR-WB IO SID frames with CMR byte. Hence, two types of frames can be carried in the 56-bit payload case:

– EVS Primary 2.8 kbps frame in Compact format.

– EVS AMR-WB IO SID frame in Header-Full format (see clause A.2.2) with one CMR byte.

The payload structure and bit ordering of the EVS Primary 2.8 kbps frame in Compact format are defined in Figure A.2.

Figure A.2. Payload structure for EVS Primary 2.8 kbps (56-bit) payload

The resulting ambiguity between EVS Primary 2.8 kbps and EVS AMR-WB IO SID frames is resolved through the most significant bit (MSB) of the first byte of the payload. By definition, the first data bit d(0) of the EVS Primary 2.8 kbps is always set to ‘0’. Therefore, if the MSB of the first byte of the payload is set to ‘0’ (see Figure A.2), then the payload is an EVS Primary 2.8 kbps frame in Compact format. Otherwise it is an EVS AMR-WB IO SID frame in Header-Full format with one CMR byte. The structure of EVS AMR-WB IO SID frame with Header-Full format is described in clause A.2.2.
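A receiver-side check for this special case could look like the following Python sketch. The function name and the returned labels are illustrative assumptions.

```python
# Sketch only: function name and return labels are illustrative.
def classify_56bit_payload(payload):
    """Resolve the 56-bit ambiguity using the MSB of the first byte."""
    if len(payload) != 7:
        raise ValueError("expected a 56-bit (7-byte) payload")
    if payload[0] & 0x80:
        # MSB = 1: the first byte is a CMR byte (Header-Full format).
        return "EVS AMR-WB IO SID, Header-Full"
    # MSB = 0: d(0) of an EVS Primary 2.8 kbps frame is always 0.
    return "EVS Primary 2.8 kbps, Compact"
```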

A.2.2 EVS codec Header-Full format

In the Header-Full format, the payload consists of one or more coded frame(s) with EVS RTP payload header(s). There are two types of EVS RTP payload header: Table of Content (ToC) byte and Codec Mode Request (CMR) byte. The detailed header structure is described in clause A.2.2.1.

A.2.2.1 EVS RTP payload structure

The complete payload of Header-Full EVS frames comprises an optional CMR byte, followed by one or several ToC bytes, followed by speech data bits and possible zero-padding bits. Padding bits shall be discarded by the receiver.

The purpose of padding is two-fold:

– In the case of EVS AMR-WB IO frames, payload data may need to be octet-aligned using zero-padding bits at the end of the payload. Note that EVS Primary frames are by definition octet-aligned (see clause A.2.2.1.4.1).

– When required, zero-padding bits are also used to increase the total payload size by byte increments such that conflicts with any of the protected sizes reserved for the Compact format are avoided (see clause A.2.2.1.4.2).

CMR and ToC bytes use MSB as Header Type identification bit (H) in order to identify the type of EVS RTP payload header. If the H bit is set to 0, the corresponding byte is a ToC byte, and if set to 1, the corresponding byte is a CMR byte. A CMR byte, if present, shall be located before ToC byte(s).

Figure A.3 shows the general structure of Header-Full payload format.

(a) Payload structure of Header-Full format with ToC single frame

(b) Payload structure of Header-Full format with ToC multiple frames

(c) Payload structure of Header-Full format with CMR + ToC single frame

(d) Payload structure of Header-Full format with CMR + ToC multiple frames

Figure A.3 Payload structure of Header-Full format

NOTE: The zero padding at the end of packet, indicated in Figure A.3 as “Zero P”, does not represent the octet-alignment for AMR-WB IO data described in clause A.2.2.1.4.1, but it represents the zero-padding for size collision avoidance described in clause A.2.2.1.4.2.

A.2.2.1.1 CMR byte

The Codec Mode Request (CMR) byte structure is shown in Figure A.4. This CMR byte shall be present for EVS AMR-WB IO speech and SID frames in Header-Full format. For EVS Primary mode, the CMR byte is only used when a CMR needs to be transmitted or if required by session negotiation. The request indicated in the CMR byte shall comply with the media type parameters (e.g. allowed bit-rates or audio bandwidths) that are negotiated in the session.

NOTE 1: There is no SDP MIME signalling parameter defined that can be used to disallow all CMRs with T-bits "001". However, the mode-set MIME parameter can be used to restrict the allowed EVS AMR-WB IO codec modes. If this mode-set parameter is not included in the media type parameters, then all 9 EVS AMR-WB IO codec modes are allowed.

The media receiver in the MTSI terminal shall be prepared to receive any speech frames within the negotiated media type parameter set as well as SID frames, irrespective of the CMR it sends or receives.

NOTE 2: The media receiver can receive such frames for various reasons. For instance, after a handover to AMR-WB, a MGW can send speech frames with an EVS AMR-WB IO codec mode even if it receives CMR byte of EVS Primary mode (T-bits not "001").

The bit-rate in the CMR byte of EVS Primary mode (T-bits not "001") indicates the highest bit-rate that the media receiver (CMR sender) wants to receive. The media sender should use the bit-rate requested in the received CMR and shall not use a higher bit-rate. The media sender may use a lower bit-rate than the requested bit-rate within the set of negotiated bit-rates.

If a single audio bandwidth is negotiated for EVS Primary mode, the CMR shall indicate this bandwidth in its T-bits, unless the mode of operation is switched by a received CMR from EVS Primary to EVS AMR-WB IO or is kept in EVS AMR-WB IO operation mode.

If a range of audio bandwidths is negotiated for EVS Primary mode, then the audio bandwidth in the CMR byte of EVS Primary mode indicates the highest audio bandwidth that the media receiver wants to receive. The media sender should use the audio bandwidth requested in the received CMR.

A CMR with T-bits "001" (i.e. a CMR for the EVS AMR-WB IO mode of operation) indicates the highest EVS AMR-WB IO codec mode that the media receiver wants to receive. When receiving a CMR with T-bits "001", the media sender shall use the EVS AMR-WB IO mode of operation. The media sender should use the EVS AMR-WB IO codec mode (bit rate) requested in the received CMR and shall not use a higher codec mode (higher bit rate). The media sender may use a lower EVS AMR-WB IO codec mode within the negotiated mode-set.

When a CMR is received, requesting a bit-rate and/or audio bandwidth that does not comply with the negotiated media parameters, it shall be ignored.

The request in the received CMR is valid until a new request is received.

Figure A.4. CMR byte

H (1 bit): Header Type identification bit. For the CMR byte this bit is always set to 1.

T (3 bits): These bits indicate the Type of Request in order to distinguish EVS AMR-WB IO and EVS Primary bandwidths.

D (4 bits): These bits indicate the requested bit rate (in cases the T-bits are "000", "001", "010", "011" and "100") or the EVS Channel Aware offset and level (in cases the T-bits are "101" and "110") of the codec mode request.
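The field split of the CMR byte can be sketched in Python as follows. The bit positions assume the fields appear in the order H, T, D from the most significant bit downwards, which follows the order of the descriptions above but is an assumption, since Figure A.4 is not reproduced here. The function names are illustrative.

```python
# Sketch only: assumes the layout H (1 bit) | T (3 bits) | D (4 bits), MSB first.
def parse_cmr_byte(byte):
    """Return (T, D) from a CMR byte, checking that H = 1."""
    if not byte & 0x80:
        raise ValueError("not a CMR byte (H = 0)")
    t_bits = (byte >> 4) & 0x07
    d_bits = byte & 0x0F
    return t_bits, d_bits


def is_no_req(byte):
    """True for the NO_REQ code point (T = 111, D = 1111) of Table A.3."""
    return byte == 0xFF
```

For example, under this assumed layout the byte `0x94` carries T = "001" and D = "0100", which Table A.3 defines as a request for EVS AMR-WB IO 15.85.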

The possible values of the CMR byte and corresponding CMRs are defined in Table A.3.

Table A.3: Structure of the CMR byte

Each cell gives the definition (bandwidth or mode, and bit rate in kbps) for the code formed by the T bits (column) and the D bits (row).

| D | T = 000 | T = 001 | T = 010 | T = 011 |
|---|---|---|---|---|
| 0000 | NB 5.9 (VBR) | IO 6.6 | WB 5.9 (VBR) | Not used |
| 0001 | NB 7.2 | IO 8.85 | WB 7.2 | Not used |
| 0010 | NB 8.0 | IO 12.65 | WB 8 | Not used |
| 0011 | NB 9.6 | IO 14.25 | WB 9.6 | SWB 9.6 |
| 0100 | NB 13.2 | IO 15.85 | WB 13.2 | SWB 13.2 |
| 0101 | NB 16.4 | IO 18.25 | WB 16.4 | SWB 16.4 |
| 0110 | NB 24.4 | IO 19.85 | WB 24.4 | SWB 24.4 |
| 0111 | Not used | IO 23.05 | WB 32 | SWB 32 |
| 1000 | Not used | IO 23.85 | WB 48 | SWB 48 |
| 1001 | Not used | Not used | WB 64 | SWB 64 |
| 1010 | Not used | Not used | WB 96 | SWB 96 |
| 1011 | Not used | Not used | WB 128 | SWB 128 |
| 1100 | Not used | Not used | Not used | Not used |
| 1101 | Not used | Not used | Not used | Not used |
| 1110 | Not used | Not used | Not used | Not used |
| 1111 | Not used | Not used | Not used | Not used |

Table A.3: Structure of the CMR byte (continued)

Each cell gives the definition (bandwidth, bit rate in kbps, and Channel Aware level and offset where applicable) for the code formed by the T bits (column) and the D bits (row).

| D | T = 100 | T = 101 | T = 110 | T = 111 |
|---|---|---|---|---|
| 0000 | Not used | WB 13.2 CA-L-O2 | SWB 13.2 CA-L-O2 | Reserved |
| 0001 | Not used | WB 13.2 CA-L-O3 | SWB 13.2 CA-L-O3 | Reserved |
| 0010 | Not used | WB 13.2 CA-L-O5 | SWB 13.2 CA-L-O5 | Reserved |
| 0011 | Not used | WB 13.2 CA-L-O7 | SWB 13.2 CA-L-O7 | Reserved |
| 0100 | Not used | WB 13.2 CA-H-O2 | SWB 13.2 CA-H-O2 | Reserved |
| 0101 | FB 16.4 | WB 13.2 CA-H-O3 | SWB 13.2 CA-H-O3 | Reserved |
| 0110 | FB 24.4 | WB 13.2 CA-H-O5 | SWB 13.2 CA-H-O5 | Reserved |
| 0111 | FB 32 | WB 13.2 CA-H-O7 | SWB 13.2 CA-H-O7 | Reserved |
| 1000 | FB 48 | Not used | Not used | Reserved |
| 1001 | FB 64 | Not used | Not used | Reserved |
| 1010 | FB 96 | Not used | Not used | Reserved |
| 1011 | FB 128 | Not used | Not used | Reserved |
| 1100 | Not used | Not used | Not used | Reserved |
| 1101 | Not used | Not used | Not used | Reserved |
| 1110 | Not used | Not used | Not used | Reserved |
| 1111 | Not used | Not used | Not used | NO_REQ |

CMR code-point "NO_REQ" is specified as equivalent to no CMR-value being sent. The receiver of "NO_REQ" shall ignore it.

NOTE: The meaning of "NO_REQ" and "none" (see A.2.1.2.1 above) for EVS is not equivalent to code-point "CMR=15" for AMR and AMR-WB, as specified according to TS 26.114 and RFC 4867 with its errata. MGWs in the path, repacking between the RTP format according to RFC 4867 and the RTP format according to the present document, translate between these code-points.

A.2.2.1.2 ToC byte

The Table of Content (ToC) byte structure is shown in Figure A.5.

Figure A.5. ToC byte

H (1 bit): Header Type identification bit. For the ToC byte this bit is always set to 0.

F (1 bit): If set to 1, the bit indicates that the corresponding frame is followed by another speech frame in this payload, implying that another ToC byte follows this entry. If set to 0, the bit indicates that this frame is the last frame in this payload and no further header entry follows this entry.

FT (6 bits): Frame type index. These bits indicate whether the EVS Primary or EVS AMR-WB IO mode, or comfort noise (SID) mode of the corresponding frame is carried in this payload. FT is further divided into 3 fields: EVS mode (1 bit), Unused/Q bit (1 bit) depending on the value of EVS mode bit, and EVS bit-rate (4 bits). The value of FT is defined in Tables A.4 and A.5.
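The following Python sketch shows one way a receiver could split the ToC byte into its fields and walk the EVS RTP payload header of a Header-Full payload. The bit positions assume the order H, F, EVS mode bit, Unused/Q bit, EVS bit-rate index from the most significant bit downwards. This follows the order of the field descriptions above but is an assumption, since Figure A.5 is not reproduced here. All function and key names are illustrative.

```python
# Sketch only: assumes H | F | mode | Q-or-unused | 4-bit rate index, MSB first.
def parse_toc(byte):
    """Split a ToC byte into its fields, checking that H = 0."""
    if byte & 0x80:
        raise ValueError("not a ToC byte (H = 1)")
    return {
        "followed": bool(byte & 0x40),   # F: another ToC entry follows
        "io_mode": bool(byte & 0x20),    # EVS mode bit: 1 = EVS AMR-WB IO
        "q_or_unused": (byte >> 4) & 1,  # Q bit in IO mode, unused otherwise
        "rate_index": byte & 0x0F,       # row in Table A.4 or Table A.5
    }


def split_header_full(payload):
    """Return (cmr_byte_or_None, list_of_toc_bytes, speech_data_offset)."""
    pos = 0
    cmr = None
    if payload and payload[0] & 0x80:    # H = 1: optional CMR byte comes first
        cmr = payload[0]
        pos = 1
    tocs = []
    while True:
        if pos >= len(payload):
            raise ValueError("truncated EVS RTP payload header")
        toc = payload[pos]
        fields = parse_toc(toc)
        tocs.append(toc)
        pos += 1
        if not fields["followed"]:       # F = 0: last ToC entry
            return cmr, tocs, pos
```

With `hf-only=1` the receiver can apply such a walk to every payload. In default format handling it applies only once the payload has been identified as Header-Full.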

Table A.4: Frame Type index when EVS mode bit = 0

| EVS mode bit (1 bit) | Unused (1 bit) | EVS bit rate (4 bits) | Indicated EVS mode and bit rate |
|---|---|---|---|
| 0 | 0 | 0000 | Primary 2.8 kbps |
| 0 | 0 | 0001 | Primary 7.2 kbps |
| 0 | 0 | 0010 | Primary 8.0 kbps |
| 0 | 0 | 0011 | Primary 9.6 kbps |
| 0 | 0 | 0100 | Primary 13.2 kbps |
| 0 | 0 | 0101 | Primary 16.4 kbps |
| 0 | 0 | 0110 | Primary 24.4 kbps |
| 0 | 0 | 0111 | Primary 32.0 kbps |
| 0 | 0 | 1000 | Primary 48.0 kbps |
| 0 | 0 | 1001 | Primary 64.0 kbps |
| 0 | 0 | 1010 | Primary 96.0 kbps |
| 0 | 0 | 1011 | Primary 128.0 kbps |
| 0 | 0 | 1100 | Primary 2.4 kbps SID |
| 0 | 0 | 1101 | For future use |
| 0 | 0 | 1110 | SPEECH_LOST |
| 0 | 0 | 1111 | NO_DATA |

Table A.5: Frame Type index when EVS mode bit = 1

| EVS mode bit (1 bit) | AMR-WB Q bit (1 bit) | EVS bit rate (4 bits) | Indicated EVS mode and codec mode |
|---|---|---|---|
| 1 | Q | 0000 | AMR-WB IO 6.6 kbps |
| 1 | Q | 0001 | AMR-WB IO 8.85 kbps |
| 1 | Q | 0010 | AMR-WB IO 12.65 kbps |
| 1 | Q | 0011 | AMR-WB IO 14.25 kbps |
| 1 | Q | 0100 | AMR-WB IO 15.85 kbps |
| 1 | Q | 0101 | AMR-WB IO 18.25 kbps |
| 1 | Q | 0110 | AMR-WB IO 19.85 kbps |
| 1 | Q | 0111 | AMR-WB IO 23.05 kbps |
| 1 | Q | 1000 | AMR-WB IO 23.85 kbps |
| 1 | Q | 1001 | AMR-WB IO 2.0 kbps SID |
| 1 | Q | 1010 | For future use |
| 1 | Q | 1011 | For future use |
| 1 | Q | 1100 | For future use |
| 1 | Q | 1101 | For future use |
| 1 | Q | 1110 | SPEECH_LOST |
| 1 | Q | 1111 | NO_DATA |

NOTE: The 4-bit EVS bit-rate index and the mapping to EVS AMR-WB IO codec mode in Table A.5 are the same as used for the Frame Type of AMR-WB. See Table 1a [36]. The Q bit for EVS AMR-WB IO has the same definition as in [15]. If the Q bit is set to 0, this indicates that the corresponding frame is severely damaged. The receiver should handle such a severely damaged frame properly by applying bad frame processing according to [6].

Packets containing only NO_DATA frames should not be transmitted in any payload format configuration, except when a CMR needs to be sent immediately. Frame-blocks containing only NO_DATA frames at the end of the packet should not be transmitted in any payload format configuration. In addition, frame-blocks containing only NO_DATA frames at the beginning of the packet should not be included in the payload.

For sessions with multiple mono-channels, see clause A.2.5.

A.2.2.1.3 Speech Data

In Header-Full format, the RTP payload comprises, apart from headers and possible padding, one or several coded frames, the Speech Data.

In case the frame is coded EVS Primary mode data, the bits are in the same order as produced by the EVS encoder, where the first bit is placed left-most immediately following the EVS RTP payload header (CMR byte if present, and ToC bytes).

In case the frame is coded EVS AMR-WB IO mode data, the Speech Data field is constructed as described in RFC 4867 [15] for octet-aligned Mode, sub-clause 4.4.3. In accordance with this, in case multiple frames are included in the payload, the last octet of each frame shall be padded with zero bits at the end if some bits in the octet are not used. The padding bits shall be ignored on reception.

In case the frame is coded EVS AMR-WB IO SID data, the payload structure and bit-ordering are defined in Figure A.6. The bits d(0) to d(39) are as defined in TS 26.201 [36], sub-clause 4.2.3.

Figure A.6. Payload structure for EVS AMR-WB IO SID (56 bit) payload

The EVS AMR-WB IO SID frame payload is identified by MSB of the first byte of the payload set to ‘1’.

A.2.2.1.4 Zero padding

A.2.2.1.4.1 Zero padding for octet alignment of speech data (EVS AMR-WB IO)

In EVS AMR-WB IO mode, the payload length is always made an integral number of octets by padding with zero bits if necessary (see clause A.2.2.1.3).

Note that, by definition, EVS Primary speech data is octet-aligned.

A.2.2.1.4.2 Zero padding for size collision avoidance

When “hf-only=0” or “hf-only” is not present, the RTP payload formatting function of the sender shall control the size of Header-Full RTP payload so that the Header-Full format RTP payload size does not collide with any of the protected Compact format RTP payload sizes listed in Table A.1, except for the special case of the 56-bit payload. If a Header-Full format RTP payload size collides with one of the protected Compact format RTP payload sizes, the RTP payload formatting function of the sender shall append an appropriate number of zero-padding bytes to the end of the payload such that payload sizes do not collide.

The Header-Full format representing an EVS AMR-WB IO SID frame (with one CMR byte and one ToC byte) is allowed to have the same 56 bits as EVS Primary 2.8 kbps in Compact format. In this special case, no padding bits shall be appended to the EVS AMR-WB IO SID frame.
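The size collision avoidance rule, including the 56-bit exception, can be sketched in Python as follows. The protected sizes are taken from Table A.1. The function name and the `is_io_sid_with_cmr` flag are illustrative assumptions about how a sender might track the special case.

```python
# Sketch only: protected sizes from Table A.1, converted from bits to bytes.
PROTECTED_BYTES = {
    bits // 8
    for bits in (48, 56, 136, 144, 160, 184, 192, 256, 264, 288, 320,
                 328, 368, 400, 464, 480, 488, 640, 960, 1280, 1920, 2560)
}


def avoid_size_collision(payload, is_io_sid_with_cmr=False):
    """Append zero-padding bytes until a Header-Full payload size is unprotected.

    The EVS AMR-WB IO SID frame with one CMR byte and one ToC byte
    (7 bytes in total) is the one permitted collision and is left as-is.
    """
    if is_io_sid_with_cmr and len(payload) == 7:
        return payload
    while len(payload) in PROTECTED_BYTES:
        payload += b"\x00"
    return payload
```

Because 48 and 56 bits, and likewise 480 and 488 bits, are adjacent protected sizes, more than one padding byte can be needed. This is why the sketch loops instead of adding a single byte.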

A.2.2.1.4.3 Additional zero padding

If additional padding is required to bring the payload length to a larger multiple of octets or for some other purposes, then the P bit in the RTP header may be set and padding bits are appended as specified in [30].

A.2.3 Header-Full/Compact format handling

There are two format handling modes: Default mode and Header-Full-only mode.

A.2.3.1 Default format handling

When “hf-only=0” is present or when the “hf-only” attribute is not present, the Compact format shall be used in the following cases:

– A single mono EVS Primary mode frame is carried in an RTP packet without sending CMR.

– A single mono EVS AMR-WB IO mode frame with 3-bit CMR is carried in an RTP packet.

Otherwise, the Header-Full format with size collision avoidance shall be used.

The only exception in this default format handling is as follows: the Header-Full format may be used to transmit a single EVS AMR-WB IO frame to request 14.25 or 19.85 kbps in EVS AMR-WB IO mode as these two bit-rates cannot be indicated with the 3-bit CMR defined for Compact format.
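The default format handling can be summarised in a short Python sketch. The function name, parameter names and return labels are hypothetical. In particular, `needs_cmr_byte` is an assumed caller-supplied flag covering every case in which a CMR byte is wanted, for example a CMR sent with an EVS Primary frame, a request for 14.25 or 19.85 kbps in EVS AMR-WB IO mode, or an EVS AMR-WB IO SID frame.

```python
# Sketch only: all names and labels are illustrative, not from the specification.
def select_format(hf_only, n_frames, n_channels, needs_cmr_byte):
    """Choose between Compact and Header-Full for one RTP packet."""
    if hf_only:
        # Header-Full-only handling: no size collision avoidance.
        return "header-full"
    single_mono_frame = n_frames == 1 and n_channels == 1
    if single_mono_frame and not needs_cmr_byte:
        # EVS Primary without CMR, or EVS AMR-WB IO with its 3-bit CMR.
        return "compact"
    return "header-full with size collision avoidance"
```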

A.2.3.2 Header-Full-only format handling

When “hf-only=1” is present, only the Header-Full format shall be used during the session. In other words, the Compact format shall not be used. The size collision avoidance shall not be performed by the RTP payload formatting function of the sender. The RTP payload decoding function of the receiver shall use ToC byte(s) to obtain the mode (i.e., EVS Primary or EVS AMR-WB IO) and the bit-rate regardless of the RTP payload size.

A.2.4 AMR-WB backward compatible EVS AMR-WB IO mode format

In order to provide backward interoperability with AMR-WB, the payload format in [15] shall also be supported for EVS AMR-WB IO mode. This payload format shall be used to communicate with a terminal not supporting EVS but supporting AMR-WB.

A.2.5 Sessions with multiple mono channels

The Header-Full EVS payload format supports transmission of multiple mono channels in the same way as described in the AMR-WB payload format [15].

A.2.5.1 Encoding of multiple mono channels

The speech encoders for different channels are not synchronized, which means that they may use different codec modes and may result in different VAD decisions depending on the content in each channel.

A.2.5.2 RTP header usage

The RTP time stamp is derived from the media time of the first frame of the first channel in the packet, even if that frame is a NO_DATA frame.

If a frame in the packet is an onset frame, then the Marker bit in the RTP header is set to ‘1’. However, since the encoders are not synchronized, they may make different VAD decisions for different channels. Hence, the Marker bit alone is not sufficient to detect onset frames and, for example, to reset the jitter buffers in the receiver. The receiver needs to monitor the content of the channels, e.g., the Frame Type identifier, to find the transition from DTX to active speech for each individual channel.

A.2.5.3 Construction of the RTP payload

The ToC bytes of all frames from a frame-block are placed in consecutive order as defined in Section 4.1 [38]. Therefore, with N channels and K speech frame-blocks in a packet, there shall be N*K ToC bytes in the EVS RTP payload header, and the first N ToC bytes will be from the first frame-block, the second N ToC bytes will be from the second frame-block, and so on.
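The resulting ToC ordering can be illustrated with a small Python sketch. The zero-based indices and the function names are assumptions made for this illustration.

```python
# Sketch only: zero-based indices and names are illustrative.
def toc_index(frame_block, channel, n_channels):
    """Position of a frame's ToC byte within the EVS RTP payload header."""
    return frame_block * n_channels + channel


def toc_layout(n_channels, n_blocks):
    """List (frame_block, channel) pairs in ToC byte order."""
    return [(k, c) for k in range(n_blocks) for c in range(n_channels)]
```

With N = 2 channels and K = 3 frame-blocks, `toc_layout(2, 3)` lists N*K = 6 entries: both channels of the first frame-block, then both channels of the second, and so on.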

The payload shall include frames from all channels for each media time that is included. If a frame is not available for a channel, e.g., when the encoder for that channel is currently in DTX mode, then a NO_DATA frame shall be included instead. Since the payload always contains two or more frames, the Header-Full payload format shall be used.

The payload may contain a CMR byte according to the same rules as defined for single-channel session. When a CMR is received, it is applied equally to all channels. It may still happen that different channels are encoded in different modes, especially if independent encoders are used.

A.2.6 Storage Format

The storage format is used for storing EVS Primary or EVS AMR-WB IO speech frames in a file or as an email attachment. Multiple channel content is supported.

For EVS AMR-WB IO, the storage format of [15] can be used.

For EVS, the storage format has the following structure:

Figure A.7. Storage format for EVS

There is another storage format that is suitable for applications with more advanced demands on the storage format, like random access or synchronization with video. This format is the 3GPP-specified ISO-based multimedia file format specified in [40]. Its media type is specified in [41].

A.2.6.1 Header

The header consists of a magic number followed by a 32-bit channel description field, giving the header the following structure:

Figure A.8. Header for EVS

The magic number shall consist of the ASCII character string:

"#!EVS_MC1.0\n" (0x23214556535f4d43312e300a in hexadecimal)

The version number in the magic number string refers to the version of the file format.

The 32-bit channel description field is defined as a 32-bit number (unsigned integer, MSB first). This number indicates the number of audio channels contained in this storage file starting from 1 for mono to N for a multi-mono signal with N channels.
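Writing and reading this header can be sketched in Python with the standard `struct` module. The function names are illustrative. The sketch takes the magic number from the ASCII string given above, including the trailing newline.

```python
import struct

# Sketch only: function names are illustrative.
MAGIC = b"#!EVS_MC1.0\n"


def write_header(n_channels):
    """Build the header: magic number plus 32-bit channel count, MSB first."""
    if not 1 <= n_channels <= 0xFFFFFFFF:
        raise ValueError("channel count must fit in an unsigned 32-bit field")
    return MAGIC + struct.pack(">I", n_channels)


def read_header(data):
    """Return (n_channels, offset_of_first_frame_block)."""
    if not data.startswith(MAGIC):
        raise ValueError("missing EVS storage format magic number")
    (n_channels,) = struct.unpack_from(">I", data, len(MAGIC))
    return n_channels, len(MAGIC) + 4
```

For a two-channel file, `write_header(2)` produces the 12-byte magic number followed by the bytes `00 00 00 02`.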

A.2.6.2 Speech Frames

After the header, speech frame-blocks consecutive in time are stored in the file. Each frame-block contains as many octet-aligned speech frames as there are channels, stored in increasing channel order, starting with channel 1. Each stored speech frame starts with a ToC byte (see clause A.2.2.1.2). Note that no CMR byte is needed.

Non-received speech frames or frame-blocks between SID frames during non-speech periods shall be stored as NO_DATA frames. Frames or frame-blocks lost during transmission shall be stored as SPEECH_LOST frames in complete frame-blocks to keep synchronization with the original media.