A.2 EVS RTP Payload Format
3GPP TS 26.445: Codec for Enhanced Voice Services (EVS); Detailed algorithmic description (Release 15)
The EVS RTP Payload Format includes a Compact format and a Header-Full format, which are used depending on the required functionalities within a session and whether only a single frame is transmitted. These two formats can be switched during a session by the media sender, if the EVS RTP Payload Format is not restricted to use only the Header-Full format, as described in Annex A.3 and TS 26.114 [13].
In addition to the EVS RTP Payload Format, the RFC 4867 [15] payload format shall also be supported for the EVS AMR-WB IO modes to provide backward interoperability with legacy AMR-WB terminals.
The media sender is the entity encoding the audio signal frames and sending the RTP packets including the encoded frames. The media receiver is the entity receiving the RTP packets and decoding the audio signal frames from the encoded frames.
The media receiver may send Codec Mode Requests (CMRs) in the Compact format (in the 3-bit CMR) or in the Header-Full format (in the CMR byte) to the media sender for adapting the bit rate, the audio bandwidth or the operational mode (EVS primary or EVS AMR-WB IO).
A.2.1 EVS codec Compact Format
In the Compact format, the RTP payload consists of exactly one coded frame for the EVS Primary mode, and one coded frame and one 3-bit CMR field for the EVS AMR-WB IO mode. The Compact format uses protected payload sizes that uniquely identify EVS codec modes (EVS Primary or EVS AMR-WB IO mode) and bit-rates. The protected payload sizes are used for determining the bit-rate of a received coded frame at the receiver.
Table A.1 shows the protected payload sizes and the corresponding bit-rates to be used for Compact RTP payload format.
Table A.1: Protected payload sizes
| Mode | Payload Size (bits) | Bitrate (kbps) |
|---|---|---|
| EVS Primary | 48 | 2.4 (EVS Primary SID) |
| Special case (see clause A.2.1.3) | 56 | 2.8 |
| EVS AMR-WB IO | 136 | 6.6 |
| EVS Primary | 144 | 7.2 |
| EVS Primary | 160 | 8 |
| EVS AMR-WB IO | 184 | 8.85 |
| EVS Primary | 192 | 9.6 |
| EVS AMR-WB IO | 256 | 12.65 |
| EVS Primary | 264 | 13.2 |
| EVS AMR-WB IO | 288 | 14.25 |
| EVS AMR-WB IO | 320 | 15.85 |
| EVS Primary | 328 | 16.4 |
| EVS AMR-WB IO | 368 | 18.25 |
| EVS AMR-WB IO | 400 | 19.85 |
| EVS AMR-WB IO | 464 | 23.05 |
| EVS AMR-WB IO | 480 | 23.85 |
| EVS Primary | 488 | 24.4 |
| EVS Primary | 640 | 32 |
| EVS Primary | 960 | 48 |
| EVS Primary | 1280 | 64 |
| EVS Primary | 1920 | 96 |
| EVS Primary | 2560 | 128 |
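As an illustration of how a receiver might use Table A.1, the following minimal sketch maps the size of a received RTP payload to the mode and bit-rate it identifies. The names `PROTECTED_SIZES` and `classify_compact_payload` are hypothetical and not part of this specification; the 56-bit entry still needs the additional check described in clause A.2.1.3.

```python
# Protected Compact payload sizes in bits -> (mode, bit-rate in kbps), copied from Table A.1.
# The dictionary and function names are illustrative assumptions, not defined by the specification.
PROTECTED_SIZES = {
    48: ("EVS Primary", 2.4),        # EVS Primary SID
    56: ("Special case (see clause A.2.1.3)", 2.8),
    136: ("EVS AMR-WB IO", 6.6),
    144: ("EVS Primary", 7.2),
    160: ("EVS Primary", 8.0),
    184: ("EVS AMR-WB IO", 8.85),
    192: ("EVS Primary", 9.6),
    256: ("EVS AMR-WB IO", 12.65),
    264: ("EVS Primary", 13.2),
    288: ("EVS AMR-WB IO", 14.25),
    320: ("EVS AMR-WB IO", 15.85),
    328: ("EVS Primary", 16.4),
    368: ("EVS AMR-WB IO", 18.25),
    400: ("EVS AMR-WB IO", 19.85),
    464: ("EVS AMR-WB IO", 23.05),
    480: ("EVS AMR-WB IO", 23.85),
    488: ("EVS Primary", 24.4),
    640: ("EVS Primary", 32.0),
    960: ("EVS Primary", 48.0),
    1280: ("EVS Primary", 64.0),
    1920: ("EVS Primary", 96.0),
    2560: ("EVS Primary", 128.0),
}


def classify_compact_payload(payload):
    """Return (mode, kbps) if the payload size is a protected size, else None."""
    return PROTECTED_SIZES.get(len(payload) * 8)
```

A return value of `None` means the payload size is not protected, so the payload cannot be in Compact format.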
A.2.1.1 Compact format for EVS Primary mode
In the Compact format for EVS Primary mode, the RTP payload consists of exactly one coded frame. Hence, the coded frame follows the RTP header without any additional EVS RTP payload header.
The payload represents a speech frame of 20 ms encoded with the EVS codec bit-rate identified by the payload size. The bits are in the same order as produced by the EVS encoder, where the first bit is placed left-most immediately following the RTP header.
A.2.1.2 Compact format for EVS AMR-WB IO mode (except SID)
In the Compact format for EVS AMR-WB IO mode, except SID, the RTP payload consists of one 3-bit CMR field, one coded frame, and zero-padding bits if necessary.
A.2.1.2.1 Representation of Codec Mode Request (CMR) in Compact format for EVS AMR-WB IO mode
The 3-bit CMR field carries the codec mode request information to signal to the media sender the requested AMR-WB [37] or EVS AMR-WB IO codec mode to be applied for encoding. The signalling of AMR-WB and EVS AMR-WB IO with the 3-bit CMR field is defined as shown in Table A.2. The 3-bit CMR field in Compact format for EVS AMR-WB IO mode comprises a 3-bit element [c(0), c(1), c(2)] for signalling codec mode requests for the following EVS AMR-WB IO or AMR-WB codec modes.
Table A.2: 3-bit signalling element and EVS AMR-WB IO/AMR-WB CMR
| C(0) C(1) C(2) | Requested Mode |
|---|---|
| 0 0 0 | 6.6 |
| 0 0 1 | 8.85 |
| 0 1 0 | 12.65 |
| 0 1 1 | 15.85 |
| 1 0 0 | 18.25 |
| 1 0 1 | 23.05 |
| 1 1 0 | 23.85 |
| 1 1 1 | none |
Due to the 3-bit limitation, there is not enough signalling space for all EVS AMR-WB IO codec modes. Consequently, CMRs in Compact format for EVS AMR-WB IO are limited to the most frequently used set of EVS AMR-WB IO / AMR-WB modes, as shown in Table A.2. CMRs for EVS AMR-WB IO / AMR-WB modes 14.25 and 19.85 are not supported in Compact format for EVS AMR-WB IO. If a request needs to be transmitted for either mode, it should be re-mapped to the next lower mode (12.65 and 18.25, respectively). Alternatively, the CMR byte in the Header-Full format may be used to transmit CMRs for the 14.25 and 19.85 modes.
If the allowed codec modes are restricted by the mode-set MIME parameter, the 3-bit CMR for an unsupported mode may be re-mapped to the next lower mode in this mode-set.
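The re-mapping rule can be sketched as follows. This is a minimal illustration, not a normative procedure: bit-rates are expressed in bit/s to avoid floating-point keys, the names `CMR3_CODES`, `CMR3_NONE` and `cmr3_for_request` are hypothetical, and returning "none" when no allowed mode lies at or below the request is an assumption of this sketch.

```python
# 3-bit CMR code-points from Table A.2, keyed by bit-rate in bit/s.
# All names here are illustrative assumptions.
CMR3_CODES = {
    6600: 0b000,
    8850: 0b001,
    12650: 0b010,
    15850: 0b011,
    18250: 0b100,
    23050: 0b101,
    23850: 0b110,
}
CMR3_NONE = 0b111


def cmr3_for_request(rate_bps, mode_set=None):
    """Pick the 3-bit CMR for a requested rate, re-mapping to the next lower
    signallable mode (optionally restricted to a negotiated mode-set)."""
    allowed = [r for r in CMR3_CODES if mode_set is None or r in mode_set]
    candidates = [r for r in allowed if r <= rate_bps]
    if not candidates:
        return CMR3_NONE  # assumption: send "none" if nothing suitable can be signalled
    return CMR3_CODES[max(candidates)]
```

For example, a request for 14.25 kbps yields the code-point for 12.65 kbps, and a request for 19.85 kbps yields the code-point for 18.25 kbps.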
Codec mode requests for EVS primary modes shall be made using the CMR byte in the Header-Full format.
The codec mode request indicated in the 3-bit-CMR shall comply with the media type parameters (the allowed bit-rates for EVS AMR-WB IO or AMR-WB) that are negotiated for the session. When a 3-bit-CMR is received, requesting a bit-rate that does not comply with the negotiated media parameters, it shall be ignored.
A 3-bit CMR indicates the highest EVS AMR-WB IO codec mode that the media receiver (CMR sender) wants to receive. When receiving a 3-bit CMR (except value "none") the media sender shall use the EVS AMR-WB IO operation mode. The media sender should use the EVS AMR-WB IO codec mode (bit rate) requested in the received 3-bit CMR and shall not use a higher codec mode (higher bit rate). The media sender may use a lower EVS AMR-WB IO codec mode within the negotiated mode-set.
CMR code-point "none" is specified as equivalent to no CMR-value being sent. The receiver of "none" shall ignore it.
NOTE: The meaning of "none" and "NO_REQ" (see A.2.2.1.1 below) for EVS is not equivalent to code-point "CMR=15" for AMR and AMR-WB, as specified according to TS 26.114 and RFC 4867 with its errata. MGWs in the path, repacking between the RTP format according to RFC 4867 and the RTP format according to the present document, translate between these code-points.
A.2.1.2.2 Payload structure of Compact EVS AMR-WB IO mode frame
In order to minimize the need for bit re-shuffling in media gateways in case of payload format conversion to or from AMR-WB bandwidth-efficient format according to [15], the speech data bits are inserted after CMR, starting with bit d(1). Speech data bit d(0) is appended after the last speech data bit.
Figure A.1. Payload structure of Compact EVS AMR-WB IO.
The speech data payload represents a speech frame of 20 ms encoded with EVS AMR-WB IO bit-rate (mode) identified by the payload size. The order and numbering notation of the bits are as specified for Interface Format 1 (IF1) in Annex B of [36] for AMR-WB. The bits of the speech frames are arranged in the order of decreasing sensitivity, giving a re-ordered bit sequence {d(0),d(1),…,d(K-1)}.
If the total of the three CMR bits and the coded frame bits is not a multiple of 8, zero-padding bits are added so that the total becomes a multiple of 8. One zero-padding bit is required for EVS AMR-WB IO mode 6.6 and four zero-padding bits are required for EVS AMR-WB IO mode 8.85. In the other modes, no padding bits are inserted. With the exception of SID frames, the EVS AMR-WB IO Compact payload follows the RTP header without any additional EVS RTP payload header.
Note that no Compact format is defined for EVS AMR-WB IO SID frames. For such frames, the Header-Full format with CMR byte shall be used (see clause A.2.1.3).
NOTE: The Q bit defined in RFC 4867 [15] is not present in the Compact payload structure of EVS AMR-WB IO. Therefore it shall be ensured that the speech payload is not damaged. In case of a conversion of RFC 4867 formatted packets to Compact payload format, damaged frames (indicated by the Q bit) shall be discarded and not converted.
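The payload construction described in this clause can be sketched as follows. This is only an illustration under stated assumptions: Figure A.1 is not reproduced here, so the sketch assumes that c(0) is transmitted first (left-most) and that bits are packed MSB first; the function name `pack_compact_io` is hypothetical.

```python
def pack_compact_io(cmr3, d):
    """Pack a Compact EVS AMR-WB IO payload.

    cmr3: 3-bit CMR value (c(0) assumed to be the most significant of the three bits).
    d:    list of speech bits d(0)..d(K-1), each 0 or 1, in IF1 order.
    """
    cmr_bits = [(cmr3 >> shift) & 1 for shift in (2, 1, 0)]
    # Speech bits start with d(1); d(0) is appended after the last speech bit.
    bits = cmr_bits + d[1:] + d[:1]
    # Zero-pad so that the total length is a multiple of 8.
    bits += [0] * (-len(bits) % 8)
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)
```

With 132 speech bits (mode 6.6) the result is 17 bytes (136 bits, one padding bit), and with 177 speech bits (mode 8.85) it is 23 bytes (184 bits, four padding bits), matching the protected sizes in Table A.1.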
A.2.1.3 Special case for 56 bit payload size (EVS Primary or EVS AMR-WB IO SID)
The Compact format for EVS Primary 2.8 kbps frames (56 bits) has the same payload size (56 bits) as the Header-Full format for EVS AMR-WB IO SID frames with CMR byte.
Hence, two types of frames can be carried in the 56 bit payload case:
– EVS Primary 2.8 kbps frame in Compact format.
– EVS AMR-WB IO SID frame in Header-Full format (see clause A.2.2) with one CMR byte.
The payload structure and bit ordering of the EVS Primary 2.8 kbps frame in Compact format are defined in Figure A.2.
Figure A.2. Payload structure for EVS Primary 2.8 kbps (56-bit) payload
The resulting ambiguity between EVS Primary 2.8 kbps and EVS AMR-WB IO SID frames is resolved through the most significant bit (MSB) of the first byte of the payload. By definition, the first data bit d(0) of the EVS Primary 2.8 kbps is always set to ‘0’. Therefore, if the MSB of the first byte of the payload is set to ‘0’ (see Figure A.2), then the payload is an EVS Primary 2.8 kbps frame in Compact format. Otherwise it is an EVS AMR-WB IO SID frame in Header-Full format with one CMR byte. The structure of EVS AMR-WB IO SID frame with Header-Full format is described in clause A.2.2.
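A receiver-side check for the 56-bit special case might look like the following sketch. The function name and the returned labels are hypothetical; only the MSB test itself is taken from the text above.

```python
def classify_56_bit_payload(payload):
    """Distinguish the two frame types that share the 56-bit payload size."""
    if len(payload) != 7:
        raise ValueError("expected a 56-bit (7-byte) payload")
    if payload[0] & 0x80:
        # MSB set: the first byte is a CMR byte (H bit = 1).
        return "EVS AMR-WB IO SID (Header-Full with CMR byte)"
    # MSB clear: d(0) of an EVS Primary 2.8 kbps frame is always 0.
    return "EVS Primary 2.8 kbps (Compact)"
```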
A.2.2 EVS codec Header-Full format
In the Header-Full format, the payload consists of one or more coded frame(s) with EVS RTP payload header(s). There are two types of EVS RTP payload header: Table of Content (ToC) byte and Codec Mode Request (CMR) byte. The detailed header structure is described in clause A.2.2.1.
A.2.2.1 EVS RTP payload structure
The complete payload of Header-Full EVS frames comprises an optional CMR byte, followed by one or several ToC bytes, followed by speech data bits and possible zero-padding bits. Padding bits shall be discarded by the receiver.
The purpose of padding is two-fold:
– In the case of EVS AMR-WB IO frames, payload data may need to be octet-aligned using zero-padding bits at the end of the payload. Note that EVS Primary frames are by definition octet-aligned (see clause A.2.2.1.4.1).
– When required, zero-padding bits are also used to increase the total payload size by byte increments such that conflicts with any of the protected sizes reserved for the Compact format are avoided (see clause A.2.2.1.4.2).
CMR and ToC bytes use MSB as Header Type identification bit (H) in order to identify the type of EVS RTP payload header. If the H bit is set to 0, the corresponding byte is a ToC byte, and if set to 1, the corresponding byte is a CMR byte. A CMR byte, if present, shall be located before ToC byte(s).
Figure A.3 shows the general structure of Header-Full payload format.
(a) Payload structure of Header-Full format with ToC single frame
(b) Payload structure of Header-Full format with ToC multiple frames
(c) Payload structure of Header-Full format with CMR + ToC single frame
(d) Payload structure of Header-Full format with CMR + ToC multiple frames
Figure A.3 Payload structure of Header-Full format
NOTE: The zero padding at the end of packet, indicated in Figure A.3 as “Zero P”, does not represent the octet-alignment for AMR-WB IO data described in clause A.2.2.1.4.1, but it represents the zero-padding for size collision avoidance described in clause A.2.2.1.4.2.
A.2.2.1.1 CMR byte
The Codec Mode Request (CMR) byte structure is shown in Figure A.4. This CMR byte shall be present for EVS AMR-WB IO speech and SID frames in Header-Full format. For EVS Primary mode, the CMR byte is only used when a CMR needs to be transmitted or if required by session negotiation. The request indicated in the CMR byte shall comply with the media type parameters (e.g. allowed bit-rates or audio bandwidths) that are negotiated in the session.
NOTE 1: There is no SDP MIME signalling parameter defined that can be used to disallow all CMRs with T-bits "001". However, the mode-set MIME parameter can be used to restrict the allowed EVS AMR-WB IO codec modes. If this mode-set parameter is not included in the media type parameters, then all 9 EVS AMR-WB IO codec modes are allowed.
The media receiver in the MTSI terminal shall be prepared to receive any speech frames within the negotiated media type parameter set as well as SID frames, irrespective of the CMR it sends or receives.
NOTE 2: The media receiver can receive such frames for various reasons. For instance, after a handover to AMR-WB, a MGW can send speech frames with an EVS AMR-WB IO codec mode even if it receives CMR byte of EVS Primary mode (T-bits not "001").
The bit-rate in the CMR byte of EVS Primary mode (T-bits not "001") indicates the highest bit-rate that the media receiver (CMR sender) wants to receive. The media sender should use the bit-rate requested in the received CMR and shall not use a higher bit-rate. The media sender may use a lower bit-rate than the requested bit-rate within the set of negotiated bit-rates.
If a single audio bandwidth is negotiated for EVS Primary mode, the CMR shall indicate this bandwidth in its T-bits, unless the mode of operation is switched by a received CMR from EVS Primary to EVS AMR-WB IO or is kept in EVS AMR-WB IO operation mode.
If a range of audio bandwidths is negotiated for EVS Primary mode, then the audio bandwidth in the CMR byte of EVS Primary mode indicates the highest audio bandwidth that the media receiver wants to receive. The media sender should use the audio bandwidth requested in the received CMR.
A CMR with T-bits "001" (i.e. a CMR for the EVS AMR-WB IO mode of operation) indicates the highest EVS AMR-WB IO codec mode that the media receiver wants to receive. When receiving a CMR with T-bits "001", the media sender shall use the EVS AMR-WB IO mode of operation. The media sender should use the EVS AMR-WB IO codec mode (bit rate) requested in the received CMR and shall not use a higher codec mode (higher bit rate). The media sender may use a lower EVS AMR-WB IO codec mode within the negotiated mode-set.
When a CMR is received, requesting a bit-rate and/or audio bandwidth that does not comply with the negotiated media parameters, it shall be ignored.
The request in the received CMR is valid until a new request is received.
Figure A.4. CMR byte
H (1 bit): Header Type identification bit. For the CMR byte this bit is always set to 1.
T (3 bits): These bits indicate the Type of Request in order to distinguish EVS AMR-WB IO and EVS Primary bandwidths.
D (4 bits): These bits indicate the requested bit rate (when the T-bits are "000", "001", "010", "011" or "100") or the EVS Channel Aware offset and level (when the T-bits are "101" or "110") of the codec mode request.
The possible values of the CMR byte and corresponding CMRs are defined in Table A.3.
Table A.3: Structure of the CMR byte
| T | D | Definition | T | D | Definition |
|---|---|---|---|---|---|
| 000 | 0000 | NB 5.9 (VBR) | 010 | 0000 | WB 5.9 (VBR) |
| 000 | 0001 | NB 7.2 | 010 | 0001 | WB 7.2 |
| 000 | 0010 | NB 8.0 | 010 | 0010 | WB 8 |
| 000 | 0011 | NB 9.6 | 010 | 0011 | WB 9.6 |
| 000 | 0100 | NB 13.2 | 010 | 0100 | WB 13.2 |
| 000 | 0101 | NB 16.4 | 010 | 0101 | WB 16.4 |
| 000 | 0110 | NB 24.4 | 010 | 0110 | WB 24.4 |
| 000 | 0111 | Not used | 010 | 0111 | WB 32 |
| 000 | 1000 | Not used | 010 | 1000 | WB 48 |
| 000 | 1001 | Not used | 010 | 1001 | WB 64 |
| 000 | 1010 | Not used | 010 | 1010 | WB 96 |
| 000 | 1011 | Not used | 010 | 1011 | WB 128 |
| 000 | 1100 | Not used | 010 | 1100 | Not used |
| 000 | 1101 | Not used | 010 | 1101 | Not used |
| 000 | 1110 | Not used | 010 | 1110 | Not used |
| 000 | 1111 | Not used | 010 | 1111 | Not used |
| 001 | 0000 | IO 6.6 | 011 | 0000 | Not used |
| 001 | 0001 | IO 8.85 | 011 | 0001 | Not used |
| 001 | 0010 | IO 12.65 | 011 | 0010 | Not used |
| 001 | 0011 | IO 14.25 | 011 | 0011 | SWB 9.6 |
| 001 | 0100 | IO 15.85 | 011 | 0100 | SWB 13.2 |
| 001 | 0101 | IO 18.25 | 011 | 0101 | SWB 16.4 |
| 001 | 0110 | IO 19.85 | 011 | 0110 | SWB 24.4 |
| 001 | 0111 | IO 23.05 | 011 | 0111 | SWB 32 |
| 001 | 1000 | IO 23.85 | 011 | 1000 | SWB 48 |
| 001 | 1001 | Not used | 011 | 1001 | SWB 64 |
| 001 | 1010 | Not used | 011 | 1010 | SWB 96 |
| 001 | 1011 | Not used | 011 | 1011 | SWB 128 |
| 001 | 1100 | Not used | 011 | 1100 | Not used |
| 001 | 1101 | Not used | 011 | 1101 | Not used |
| 001 | 1110 | Not used | 011 | 1110 | Not used |
| 001 | 1111 | Not used | 011 | 1111 | Not used |
Table A.3: Structure of the CMR byte (continued)
| T | D | Definition | T | D | Definition |
|---|---|---|---|---|---|
| 100 | 0000 | Not used | 110 | 0000 | SWB 13.2 CA-L-O2 |
| 100 | 0001 | Not used | 110 | 0001 | SWB 13.2 CA-L-O3 |
| 100 | 0010 | Not used | 110 | 0010 | SWB 13.2 CA-L-O5 |
| 100 | 0011 | Not used | 110 | 0011 | SWB 13.2 CA-L-O7 |
| 100 | 0100 | Not used | 110 | 0100 | SWB 13.2 CA-H-O2 |
| 100 | 0101 | FB 16.4 | 110 | 0101 | SWB 13.2 CA-H-O3 |
| 100 | 0110 | FB 24.4 | 110 | 0110 | SWB 13.2 CA-H-O5 |
| 100 | 0111 | FB 32 | 110 | 0111 | SWB 13.2 CA-H-O7 |
| 100 | 1000 | FB 48 | 110 | 1000 | Not used |
| 100 | 1001 | FB 64 | 110 | 1001 | Not used |
| 100 | 1010 | FB 96 | 110 | 1010 | Not used |
| 100 | 1011 | FB 128 | 110 | 1011 | Not used |
| 100 | 1100 | Not used | 110 | 1100 | Not used |
| 100 | 1101 | Not used | 110 | 1101 | Not used |
| 100 | 1110 | Not used | 110 | 1110 | Not used |
| 100 | 1111 | Not used | 110 | 1111 | Not used |
| 101 | 0000 | WB 13.2 CA-L-O2 | 111 | 0000 | Reserved |
| 101 | 0001 | WB 13.2 CA-L-O3 | 111 | 0001 | Reserved |
| 101 | 0010 | WB 13.2 CA-L-O5 | 111 | 0010 | Reserved |
| 101 | 0011 | WB 13.2 CA-L-O7 | 111 | 0011 | Reserved |
| 101 | 0100 | WB 13.2 CA-H-O2 | 111 | 0100 | Reserved |
| 101 | 0101 | WB 13.2 CA-H-O3 | 111 | 0101 | Reserved |
| 101 | 0110 | WB 13.2 CA-H-O5 | 111 | 0110 | Reserved |
| 101 | 0111 | WB 13.2 CA-H-O7 | 111 | 0111 | Reserved |
| 101 | 1000 | Not used | 111 | 1000 | Reserved |
| 101 | 1001 | Not used | 111 | 1001 | Reserved |
| 101 | 1010 | Not used | 111 | 1010 | Reserved |
| 101 | 1011 | Not used | 111 | 1011 | Reserved |
| 101 | 1100 | Not used | 111 | 1100 | Reserved |
| 101 | 1101 | Not used | 111 | 1101 | Reserved |
| 101 | 1110 | Not used | 111 | 1110 | Reserved |
| 101 | 1111 | Not used | 111 | 1111 | NO_REQ |
CMR code-point "NO_REQ" is specified as equivalent to no CMR-value being sent. The receiver of "NO_REQ" shall ignore it.
NOTE: The meaning of "NO_REQ" and "none" (see A.2.1.2.1 above) for EVS is not equivalent to code-point "CMR=15" for AMR and AMR-WB, as specified according to TS 26.114 and RFC 4867 with its errata. MGWs in the path, repacking between the RTP format according to RFC 4867 and the RTP format according to the present document, translate between these code-points.
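The field split of the CMR byte can be illustrated with the following sketch. Figure A.4 is not reproduced here, so the sketch assumes the fields are laid out from the MSB in the order H (1 bit), T (3 bits), D (4 bits), which is consistent with H being the MSB. The names `CMR_TYPES` and `parse_cmr_byte` and the type labels are hypothetical.

```python
# T-bit values and what they select, summarised from Table A.3.
# The labels are informal shorthand, not terms defined by the specification.
CMR_TYPES = {
    0b000: "NB",
    0b001: "AMR-WB IO",
    0b010: "WB",
    0b011: "SWB",
    0b100: "FB",
    0b101: "WB channel-aware",
    0b110: "SWB channel-aware",
    0b111: "Reserved / NO_REQ",
}


def parse_cmr_byte(byte):
    """Split a CMR byte into its T and D fields (assumed layout: H | T | D)."""
    if not byte & 0x80:
        raise ValueError("H bit is 0: this is a ToC byte, not a CMR byte")
    t = (byte >> 4) & 0x7
    d = byte & 0xF
    return {
        "T": t,
        "D": d,
        "type": CMR_TYPES[t],
        "no_req": t == 0b111 and d == 0b1111,
    }
```

Under this assumed layout, the NO_REQ code-point corresponds to the byte value 0xFF, and a receiver would simply ignore it.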
A.2.2.1.2 ToC byte
The Table of Content (ToC) byte structure is shown in Figure A.5.
Figure A.5. ToC byte
H (1 bit): Header Type identification bit. For the ToC byte this bit is always set to 0.
F (1 bit): If set to 1, the bit indicates that the corresponding frame is followed by another speech frame in this payload, implying that another ToC byte follows this entry. If set to 0, the bit indicates that this frame is the last frame in this payload and no further header entry follows this entry.
FT (6 bits): Frame type index. These bits indicate whether the EVS Primary or EVS AMR-WB IO mode, or comfort noise (SID) mode of the corresponding frame is carried in this payload. FT is further divided into 3 fields: EVS mode (1 bit), Unused/Q bit (1 bit) depending on the value of EVS mode bit, and EVS bit-rate (4 bits). The value of FT is defined in Tables A.4 and A.5.
Table A.4: Frame Type index when EVS mode bit = 0
| EVS mode bit (1 bit) | Unused (1 bit) | EVS bit rate (4 bits) | Indicated EVS mode and bit rate |
|---|---|---|---|
| 0 | 0 | 0000 | Primary 2.8 kbps |
| 0 | 0 | 0001 | Primary 7.2 kbps |
| 0 | 0 | 0010 | Primary 8.0 kbps |
| 0 | 0 | 0011 | Primary 9.6 kbps |
| 0 | 0 | 0100 | Primary 13.2 kbps |
| 0 | 0 | 0101 | Primary 16.4 kbps |
| 0 | 0 | 0110 | Primary 24.4 kbps |
| 0 | 0 | 0111 | Primary 32.0 kbps |
| 0 | 0 | 1000 | Primary 48.0 kbps |
| 0 | 0 | 1001 | Primary 64.0 kbps |
| 0 | 0 | 1010 | Primary 96.0 kbps |
| 0 | 0 | 1011 | Primary 128.0 kbps |
| 0 | 0 | 1100 | Primary 2.4 kbps SID |
| 0 | 0 | 1101 | For future use |
| 0 | 0 | 1110 | SPEECH_LOST |
| 0 | 0 | 1111 | NO_DATA |
Table A.5: Frame Type index when EVS mode bit = 1
| EVS mode bit (1 bit) | AMR-WB Q bit (1 bit) | EVS bit rate (4 bits) | Indicated EVS mode and codec mode |
|---|---|---|---|
| 1 | Q | 0000 | AMR-WB IO 6.6 kbps |
| 1 | Q | 0001 | AMR-WB IO 8.85 kbps |
| 1 | Q | 0010 | AMR-WB IO 12.65 kbps |
| 1 | Q | 0011 | AMR-WB IO 14.25 kbps |
| 1 | Q | 0100 | AMR-WB IO 15.85 kbps |
| 1 | Q | 0101 | AMR-WB IO 18.25 kbps |
| 1 | Q | 0110 | AMR-WB IO 19.85 kbps |
| 1 | Q | 0111 | AMR-WB IO 23.05 kbps |
| 1 | Q | 1000 | AMR-WB IO 23.85 kbps |
| 1 | Q | 1001 | AMR-WB IO 2.0 kbps SID |
| 1 | Q | 1010 | For future use |
| 1 | Q | 1011 | For future use |
| 1 | Q | 1100 | For future use |
| 1 | Q | 1101 | For future use |
| 1 | Q | 1110 | SPEECH_LOST |
| 1 | Q | 1111 | NO_DATA |
NOTE: The 4-bit EVS bit-rate index and the mapping to EVS AMR-WB IO codec mode in Table A.5 are the same as used for the Frame Type of AMR-WB. See Table 1a [36]. The Q bit for EVS AMR-WB IO has the same definition as in [15]. If the Q bit is set to 0, this indicates that the corresponding frame is severely damaged. The receiver should handle such a severely damaged frame properly by applying bad frame processing according to [6].
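To illustrate how the ToC byte fields and the H and F bits work together, the following sketch splits a ToC byte and walks the header bytes at the start of a Header-Full payload. Figure A.5 is not reproduced here, so the bit positions are an assumption: H, F, EVS mode bit, Unused/Q bit and the 4-bit rate index, in that order from the MSB. The function names are hypothetical, and a truncated payload simply raises `IndexError` in this sketch.

```python
def parse_toc_byte(byte):
    """Split a ToC byte (assumed layout: H | F | EVS mode | Unused/Q | 4-bit rate index)."""
    if byte & 0x80:
        raise ValueError("H bit is 1: this is a CMR byte, not a ToC byte")
    return {
        "F": (byte >> 6) & 1,            # 1: another ToC byte follows
        "evs_mode": (byte >> 5) & 1,     # 0: EVS Primary (Table A.4), 1: AMR-WB IO (Table A.5)
        "q_or_unused": (byte >> 4) & 1,  # Q bit when evs_mode is 1
        "rate_index": byte & 0xF,
    }


def split_headers(payload):
    """Return (cmr_byte_or_None, list_of_parsed_toc_bytes, offset_of_speech_data)."""
    pos = 0
    cmr = None
    if payload[0] & 0x80:                # a CMR byte, if present, precedes the ToC bytes
        cmr = payload[0]
        pos = 1
    tocs = []
    while True:
        toc = parse_toc_byte(payload[pos])
        pos += 1
        tocs.append(toc)
        if toc["F"] == 0:                # last ToC entry reached
            break
    return cmr, tocs, pos
```

For instance, a payload starting with the bytes 0xFF, 0x44, 0x04 would be read as a CMR byte followed by two ToC entries, both with rate index 0100 (Primary 13.2 kbps in Table A.4), with speech data starting at offset 3.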
Packets containing only NO_DATA frames should not be transmitted in any payload format configuration, except when a CMR needs to be sent immediately. Frame-blocks containing only NO_DATA frames at the end of the packet should not be transmitted in any payload format configuration. In addition, frame-blocks containing only NO_DATA frames at the beginning of the packet should not be included in the payload.
For sessions with multiple mono-channels, see clause A.2.5.
A.2.2.1.3 Speech Data
In Header-Full format, the RTP payload comprises, apart from headers and possible padding, one or several coded frames, the Speech Data.
In case the frame is coded EVS Primary mode data, the bits are in the same order as produced by the EVS encoder, where the first bit is placed left-most immediately following the EVS RTP payload header (CMR byte if present, and ToC bytes).
In case the frame is coded EVS AMR-WB IO mode data, the Speech Data field is constructed as described in RFC 4867 [15] for octet-aligned Mode, sub-clause 4.4.3. In accordance with this, in case multiple frames are included in the payload, the last octet of each frame shall be padded with zero bits at the end if some bits in the octet are not used. The padding bits shall be ignored on reception.
In case the frame is coded EVS AMR-WB IO SID data, the payload structure and bit-ordering are defined in Figure A.6. The bits d(0) to d(39) are as defined in TS 26.201 [36], sub-clause 4.2.3.
Figure A.6. Payload structure for EVS AMR-WB IO SID (56 bit) payload
The EVS AMR-WB IO SID frame payload is identified by MSB of the first byte of the payload set to ‘1’.
A.2.2.1.4 Zero padding
A.2.2.1.4.1 Zero padding for octet alignment of speech data (EVS AMR-WB IO)
In EVS AMR-WB IO mode, the payload length is always made an integral number of octets by padding with zero bits if necessary (see clause A.2.2.1.3).
Note that, by definition, EVS Primary speech data is octet-aligned.
A.2.2.1.4.2 Zero padding for size collision avoidance
When “hf-only=0” or “hf-only” is not present, the RTP payload formatting function of the sender shall control the size of Header-Full RTP payload so that the Header-Full format RTP payload size does not collide with any of the protected Compact format RTP payload sizes listed in Table A.1, except for the special case of the 56-bit payload. If a Header-Full format RTP payload size collides with one of the protected Compact format RTP payload sizes, the RTP payload formatting function of the sender shall append an appropriate number of zero-padding bytes to the end of the payload such that payload sizes do not collide.
The Header-Full format representing an EVS AMR-WB IO SID frame (with one CMR byte and one ToC byte) is allowed to have the same 56 bits as EVS Primary 2.8 kbps in Compact format. In this special case, no padding bits shall be appended to the EVS AMR-WB IO SID frame.
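The size collision avoidance rule can be sketched as a simple loop that appends zero bytes until the payload size is no longer protected. The names `PROTECTED_SIZES_BITS` and `pad_header_full`, and the two boolean parameters, are hypothetical; the caller is assumed to know whether the payload is the EVS AMR-WB IO SID special case.

```python
# Protected Compact payload sizes in bits, from Table A.1.
PROTECTED_SIZES_BITS = {
    48, 56, 136, 144, 160, 184, 192, 256, 264, 288, 320,
    328, 368, 400, 464, 480, 488, 640, 960, 1280, 1920, 2560,
}


def pad_header_full(payload, hf_only=False, is_io_sid_with_cmr=False):
    """Append zero-padding bytes so a Header-Full payload avoids protected sizes."""
    if hf_only:
        # With hf-only=1 no collision avoidance is performed (see clause A.2.3.2).
        return payload
    if is_io_sid_with_cmr and len(payload) == 7:
        # Special case: the 56-bit EVS AMR-WB IO SID payload is not padded.
        return payload
    while len(payload) * 8 in PROTECTED_SIZES_BITS:
        payload += b"\x00"
    return payload
```

Note that one padding byte is not always enough: a 17-byte payload (136 bits) grows to 18 bytes (144 bits), which is also protected, so it ends up at 19 bytes.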
A.2.2.1.4.3 Additional zero padding
If additional padding is required to bring the payload length to a larger multiple of octets or for some other purposes, then the P bit in the RTP header may be set and padding bits are appended as specified in [30].
A.2.3 Header-Full/Compact format handling
There are two format handling modes: Default mode and Header-Full-only mode.
A.2.3.1 Default format handling
When “hf-only=0” is present or when the “hf-only” attribute is not present, the Compact format shall be used in the following cases:
– A single mono EVS Primary mode frame is carried in an RTP packet without sending CMR.
– A single mono EVS AMR-WB IO mode frame with 3-bit CMR is carried in an RTP packet.
Otherwise, the Header-Full format with size collision avoidance shall be used.
The only exception in this default format handling is as follows: the Header-Full format may be used to transmit a single EVS AMR-WB IO frame to request 14.25 or 19.85 kbps in EVS AMR-WB IO mode as these two bit-rates cannot be indicated with the 3-bit CMR defined for Compact format.
A.2.3.2 Header-Full-only format handling
When “hf-only=1” is present, only the Header-Full format shall be used during the session. In other words, the Compact format shall not be used. The size collision avoidance shall not be performed by the RTP payload formatting function of the sender. The RTP payload decoding function of the receiver shall use ToC byte(s) to obtain the mode (i.e., EVS Primary or EVS AMR-WB IO) and the bit-rate regardless of the RTP payload size.
A.2.4 AMR-WB backward compatible EVS AMR-WB IO mode format
In order to provide backward interoperability with AMR-WB, the payload format in [15] shall also be supported for EVS AMR-WB IO mode. This payload format shall be used to communicate with a terminal not supporting EVS but supporting AMR-WB.
A.2.5 Sessions with multiple mono channels
The Header-Full EVS payload format supports transmission of multiple mono channels in the same way as described in the AMR-WB payload format [15].
A.2.5.1 Encoding of multiple mono channels
The speech encoders for different channels are not synchronized, which means that they may use different codec modes and may result in different VAD decisions depending on the content in each channel.
A.2.5.2 RTP header usage
The RTP time stamp is derived from the media time of the first frame of the first channel in the packet, even if that frame is a NO_DATA frame.
If a frame in the packet is an onset frame, then the Marker bit in the RTP header is set to ‘1’. However, since the encoders are not synchronized, they may make different VAD decisions for different channels. Hence, the Marker bit alone is not sufficient to detect onset frames and, for example, to reset the jitter buffers in the receiver. The receiver needs to monitor the content of the channels, e.g., the Frame Type identifier, to find the transition from DTX to active speech for each individual channel.
A.2.5.3 Construction of the RTP payload
The ToC bytes of all frames from a frame-block are placed in consecutive order as defined in Section 4.1 [38]. Therefore, with N channels and K speech frame-blocks in a packet, there shall be N*K ToC bytes in the EVS RTP payload header, and the first N ToC bytes will be from the first frame-block, the second N ToC bytes will be from the second frame-block, and so on.
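The resulting ToC ordering can be illustrated with a short sketch that lists, for each ToC byte position, the frame-block and channel it describes. The function name `toc_order` and the zero-based indices are illustrative assumptions.

```python
def toc_order(num_channels, num_frame_blocks):
    """Return the (frame_block, channel) pair described by each ToC byte, in payload order."""
    return [
        (block, channel)
        for block in range(num_frame_blocks)
        for channel in range(num_channels)
    ]
```

With N = 2 channels and K = 3 frame-blocks, this yields N*K = 6 entries, where the first two belong to the first frame-block, the next two to the second, and so on.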
The payload shall include frames from all channels for each media time that is included. If a frame is not available for a channel, e.g., when the encoder for that channel is currently in DTX mode, then a NO_DATA frame shall be included instead. Since the payload always contains two or more frames, the Header-Full payload format shall be used.
The payload may contain a CMR byte according to the same rules as defined for single-channel session. When a CMR is received, it is applied equally to all channels. It may still happen that different channels are encoded in different modes, especially if independent encoders are used.
A.2.6 Storage Format
The storage format is used for storing EVS Primary or EVS AMR-WB IO speech frames in a file or as an email attachment. Multiple channel content is supported.
For EVS AMR-WB IO, the storage format of [15] can be used.
For EVS, the storage format has the following structure:
Figure A.7. Storage format for EVS
There is another storage format that is suitable for applications with more advanced demands on the storage format, like random access or synchronization with video. This format is the 3GPP-specified ISO-based multimedia file format specified in [40]. Its media type is specified in [41].
A.2.6.1 Header
The header consists of a magic number followed by a 32-bit channel description field, giving the header the following structure:
Figure A.8. Header for EVS
The magic number shall consist of the ASCII character string:
"#!EVS_MC1.0\n" (or 0x23214556535f4d43312e300a in hexadecimal)
The version number in the magic number string refers to the version of the file format.
The 32-bit channel description field is defined as a 32-bit number (unsigned integer, MSB first). This number indicates the number of audio channels contained in this storage file starting from 1 for mono to N for a multi-mono signal with N channels.
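Writing and reading this header can be sketched as follows. The function names `build_storage_header` and `parse_storage_header` are hypothetical; the sketch treats the magic number as the ASCII string including its trailing line feed, followed by the channel count as a big-endian (MSB first) unsigned 32-bit integer.

```python
import struct

# ASCII magic number, including the trailing line feed.
MAGIC = b"#!EVS_MC1.0\n"


def build_storage_header(num_channels):
    """Return the file header for a storage file with the given channel count."""
    if num_channels < 1:
        raise ValueError("the channel count starts at 1 (mono)")
    return MAGIC + struct.pack(">I", num_channels)


def parse_storage_header(data):
    """Return the channel count from a storage file header."""
    if not data.startswith(MAGIC):
        raise ValueError("not an EVS storage file")
    (num_channels,) = struct.unpack_from(">I", data, len(MAGIC))
    return num_channels
```

The header produced this way is 16 bytes long: 12 bytes of magic number and 4 bytes of channel description.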
A.2.6.2 Speech Frames
After the header, speech frame-blocks consecutive in time are stored in the file. Each frame-block contains a number of octet-aligned speech frames equal to the number of channels, stored in increasing channel order, starting with channel 1. Each stored speech frame starts with a ToC byte (see clause A.2.2.1.2). Note that no CMR byte is needed.
Non-received speech frames or frame-blocks between SID frames during non-speech periods shall be stored as NO_DATA frames. Frames or frame-blocks lost during transmission shall be stored as SPEECH_LOST frames in complete frame-blocks to keep synchronization with the original media.