5.6 Packetization

3GPP TS 26.290, Audio codec processing functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions (Release 17)

5.6.1 Packetization of TCX encoded parameters

This section explains how the TCX encoded parameters are put in one or several binary packets for transmission. One packet is used for 256-sample TCX, while 2 and 4 packets are used for 512- and 1024-sample TCX, respectively. To split the TCX spectral information into multiple packets (in the case of 512- and 1024-sample TCX), the spectrum is divided into interleaved tracks, where each track contains a subset of the splits in the spectrum (each split represents an 8-dimensional vector encoded with algebraic VQ, and the bits of an individual split are never divided across different tracks). If we number the splits in the spectrum from low to high frequency with the split numbers 0, 1, 2, 3, etc., up to the last split at the highest frequency, then the tracks are as shown in the following table.

Table 13: Dividing spectral splits in different tracks for packetization

TCX frame type     Track      Split numbers

256-sample TCX     Track 1    0, 1, 2, 3, etc. (only one track)

512-sample TCX     Track 1    0, 2, 4, 6, etc.
                   Track 2    1, 3, 5, 7, etc.

1024-sample TCX    Track 1    0, 4, 8, 12, etc.
                   Track 2    1, 5, 9, 13, etc.
                   Track 3    2, 6, 10, 14, etc.
                   Track 4    3, 7, 11, 15, etc.
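The interleaving in Table 13 amounts to assigning split k to track (k mod P) + 1, where P is the number of tracks. A minimal sketch of this rule follows; the function name and the 0-based list index for tracks are illustrative assumptions, not part of the specification.

```python
def track_splits(num_splits, num_tracks):
    """Return, for each track, the split numbers it carries.

    List index p (0-based) corresponds to "Track p+1" in Table 13.
    Split k is assigned to track index k mod num_tracks.
    """
    return [list(range(p, num_splits, num_tracks)) for p in range(num_tracks)]


# 512-sample TCX uses 2 tracks, 1024-sample TCX uses 4 tracks.
print(track_splits(8, 2))
print(track_splits(16, 4))
```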

Recall that the algebraic VQ parameters consist of the codebook numbers n = [n0 … nK-1] and the indices i = [i0 … iK-1] of all K splits. The codebook numbers take values in the set of integers {0, 2, 3, 4, …}. The size (number of bits) of each index ik is 4nk. To write these bits into the different packets, we associate a track number with each packet. In the case of 256-sample TCX, only one track is used (i.e. all the splits in the spectrum) and it is written in a single packet. In the case of 512-sample TCX, two packets are used: the first packet is used for Track 1 and the second packet for Track 2. In the case of 1024-sample TCX, four packets are used: the first packet is used for Track 1, the second packet for Track 2, the third packet for Track 3 and the fourth packet for Track 4. However, the spectrum quantization and bit allocation were performed without constraining each track to have the same number of bits, so in general the different tracks do not have the same number of bits allocated to their respective splits. Hence, when writing the encoded splits (codebook numbers and lattice point indices) of a track into their respective packet, two situations can occur: 1) there are not enough bits in the track to fill the packet, or 2) there are more bits in the track than the packet can hold, so there is overflow. The third possibility (exactly the same number of bits in a track as the packet size) occurs rarely. This overflow has to be managed properly, so that all packets are completely filled and the decoder can properly interpret and decode the received bits. Overflow management is explained below, where the multiplexing for the case of multiple binary tables (i.e. tracks) is detailed.

The split indices are written in their respective packets starting from the lowest frequency split and scanning the track in the spectrum in increasing order of frequency. The codebook number nk and index ik of each split are written in separate sections of the packet. Specifically, the bits of the codebook number nk (actually, its unary code representation) are written sequentially starting from one end of the packet, and the bits of the index ik are written sequentially starting from the other end of the packet. Hence, overflow occurs when these two concurrent bit writing processes attempt to overwrite each other. Conversely, when the bits in one track do not completely fill a packet, there will be a "hole" (i.e. available positions for writing more bits) somewhere in the middle of the packet. In 512-sample TCX, overflow can only occur in one of the two packets, while the other packet will have a "hole" where the overflowing bits of the first are written. In 1024-sample TCX, there can be "holes" in more than one of the four packets after overflow has happened. In this case, all the "holes" are grouped together and the overflowing bits of the other packets are written into these "holes". Details of this procedure are given below.

Note that the use of a unary code to encode the lattice codebook numbers (n) implies that each split actually requires 5nk bits when it is quantized using a point in the lattice codebook with number nk. That is, nk bits are used by the unary code (nk - 1 successive "1"s and a final "0") to indicate how many blocks of 4 bits are used in the codebook index, and 4nk bits are used to form the actual lattice codebook index (in codebook nk) for the split. Note also that when a split is not quantized (i.e. set to zero by the TCX quantizer), it still requires 1 bit (a "0") in the unary code, to indicate that the decoder must skip this split and set it to zero.
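As an illustration of the bit count above, the following sketch builds the unary code of a codebook number and the resulting cost of one split. The function names are hypothetical, and the unary code is represented as a character string purely for readability.

```python
def unary_code(n):
    """Unary code of codebook number n: (n - 1) ones followed by a stop bit "0".

    For n = 0 (split not quantized) the code is the single stop bit "0".
    Codebook number 1 is not used, so it is rejected here.
    """
    if n < 0 or n == 1:
        raise ValueError("codebook numbers are 0 or >= 2")
    return "1" * max(n - 1, 0) + "0"


def split_bits(n):
    """Total bits for one split: unary code plus 4*n bits of lattice index."""
    return len(unary_code(n)) + 4 * n


for n in (0, 2, 3):
    print(n, unary_code(n), split_bits(n))
```

For nk ≥ 2 this gives 5nk bits, and a single bit for nk = 0, matching the text.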

More details on the multiplexing of algebraic vector quantizer indices in one or several packets are given below, in particular regarding the splitting of TCX indices into more than one packet (for 512- and 1024-sample TCX) and the management of overflow when writing the bits into the packets.

Recall that the codebook numbers are integers in the set {0, 2, 3, 4, …, 36}. Each nk has to be represented in a proper binary format, denoted hereafter nEk, for multiplexing.

5.6.1.1 Multiplexing principle for a single binary table

The multiplexing in a single binary table t consists of writing bit-by-bit all the elements of n and i inside t, where the table t = (t0,…, tR-1) contains R bits (which corresponds to the number of bits allocated to algebraic VQ).

A straightforward strategy amounts to writing sequentially the elements of nE and i in the binary table t, as follows:

[nE0 i0 nE1 i1 nE2 i2 …. ]

In this case, the bits of nE0 are written from position 0 in t and upward, the bits of i0 then follow, etc. This format is uniquely decodable, because the encoded codebook number nEk indicates the size of ik.

Instead, an alternative format is used as described below:

[ i0 i1 i2 …… nE2 nE1 nE0 ]

The codebook numbers are written sequentially and downward from the end of the binary table t, whereas the indices are written sequentially and upward from the beginning of the table. This format has the advantage of separating codebook numbers and indices, which makes it possible to take into account their different bit sensitivity. Indeed, with the multi-rate lattice vector quantization used, the codebook numbers are the most sensitive parameters. Thus, they are written from the beginning of the table t and take around 20% of the total bit consumption, giving bitstream ordering according to bit sensitivity.

For the actual multiplexing, two pointers are then defined on the binary table t: one for the (encoded) codebook numbers, posn, and another for the indices, posi. The pointer posi is initialized to 0 (i.e. the beginning of the binary table), and posn to R-1 (i.e. the end of the binary table). Positive increments are used for posi, and negative ones for posn. At any time, the number of bits left in the binary table is given by posn - posi + 1.

The table t is initialized to zero. This guarantees that if no data is written, the data inside this table corresponds to all-zero codebook numbers n (this follows from the definition of the unary code used here). The splits are then written sequentially in the binary table from k=0 to K-1: [nE0 i0] then [nE1 i1] then [nE2 i2], etc.

The data of the kth split are actually written in the binary table t only if the minimal bit consumption of the kth split, denoted Rk hereafter, does not exceed the number of bits left in table t, i.e. if Rk ≤ posn - posi + 1. For the multi-rate lattice vector quantization used here, the minimal bit consumption Rk equals 0 bits if nk = 0, or 5nk-1 bits if nk ≥ 2 (the stop bit of the unary code is not counted).

The multiplexing works as shown in the algorithm of Figure 11.

Initialization:

posi =0, posn =R-1

set binary table t to zero

For k=0 to K-1 (loop for all splits over the 4 steps below):

Compute the number of left bits in table t: nb = posn - posi + 1

Compute the minimal bit consumption of the kth split: Rk = 0 if nk = 0, 5nk-1 if nk ≥ 2

Figure 11: Multiplexing algorithm for one binary table
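The single-table principle can be sketched as follows. Only the initialization and the first two loop steps are taken from the text above; the writing steps are an assumed completion based on the description in this subclause (unary ones written downward from posn, index bits written upward from posi, stop bit written only if room remains). The function name, the MSB-first order of the index bits and the representation of t as a list of 0/1 integers are illustrative assumptions.

```python
def mux_single_table(n, idx, R):
    """Sketch of single-table multiplexing.

    n   : codebook numbers (0 or >= 2), one per split
    idx : lattice indices as integers, idx[k] holding 4*n[k] bits
    R   : number of bits in the binary table t
    """
    t = [0] * R                 # table initialized to zero
    posi, posn = 0, R - 1       # index pointer (upward), codebook pointer (downward)
    for nk, ik in zip(n, idx):
        nb = posn - posi + 1                   # bits left in t
        rk = 0 if nk == 0 else 5 * nk - 1      # minimal bit consumption
        if rk > nb:
            nk = 0              # ASSUMPTION: a split that does not fit is sent as zero
        # Unary code without its stop bit: (nk - 1) ones, written downward.
        for _ in range(max(nk - 1, 0)):
            t[posn] = 1
            posn -= 1
        # Index bits written upward (ASSUMPTION: most significant bit first).
        for b in range(4 * nk - 1, -1, -1):
            t[posi] = (ik >> b) & 1
            posi += 1
        # Stop bit "0": t is already zero, so only the pointer moves.
        if posn >= posi:
            posn -= 1
    return t


print(mux_single_table([2, 0], [0xA5, 0], 10))
```

Because the table is zero-initialized, a stop bit that does not fit is simply omitted, which is why the minimal consumption is 5nk-1 rather than 5nk.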

In practice, the binary table t is physically represented as having 4-bit elements instead of binary (1-bit) elements, so as to accelerate the write-in-table operations and avoid too many bit manipulations. This optimization is significant because the indices ik are typically formatted into 4-bit blocks. In this case, the value of posi is always a multiple of 4. However, this requires bit shifts and modular arithmetic on the pointers posn and posi to locate positions in the table.
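The pointer arithmetic implied by a table of 4-bit elements can be sketched as follows: a bit position pos maps to element pos >> 2 and to bit offset pos & 3 inside that element. The helper names and the choice of offset 0 as the least significant bit of an element are assumptions made for illustration only.

```python
def write_bit(t4, pos, bit):
    """Write one bit at bit position pos in a table of 4-bit elements."""
    elem, off = pos >> 2, pos & 3          # element index, offset inside element
    cleared = t4[elem] & ~(1 << off) & 0xF  # clear the target bit, keep 4 bits
    t4[elem] = cleared | ((bit & 1) << off)


def read_bit(t4, pos):
    """Read the bit stored at bit position pos."""
    return (t4[pos >> 2] >> (pos & 3)) & 1


t4 = [0, 0, 0]        # 12-bit table stored as three 4-bit elements
write_bit(t4, 5, 1)   # element 1, offset 1
print(t4, read_bit(t4, 5))
```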

5.6.1.2 Multiplexing in case of multiple binary tables

In the case of multiple binary tables, the algebraic VQ parameters are written in P tables t0, …, tP-1 (P ≥ 1) containing respectively r0, …, rP-1 bits, such that r0+…+rP-1 = R. In other words, the bit budget allocated to algebraic VQ parameters, R, is distributed to P binary tables. Here, P is set to 1 in the 256-sample TCX mode, 2 in the 512-sample TCX mode or 4 in the 1024-sample TCX mode.

Note that the multiplexing of algebraic VQ parameters in TCX modes employs frame-zero-fill if the bit budget allocated to algebraic VQ is not fully used.

We assume that the number of sub-vectors, K, is a multiple of P. Under this assumption, the algebraic VQ parameters are then divided into P groups of equal cardinality: each group comprises K/P (encoded) codebook numbers and K/P indices. By convention, the pth group is defined as the set (nEp+jP, ip+jP)j=0..K/P-1. This can be seen as a decimation operation (in the usual multi-rate signal processing sense).

Assuming the size of table tp is sufficient, the parameters of the pth group are written in table tp. For the sake of clarity, the division of sub-vectors is explained below in more detail for P=1 and 2:

If P=1, the set (nEp+jP, ip+jP)j=0..K/P-1 for p=0 simply corresponds to (nE0, i0, …, nEK-1, iK-1). These parameters are written in table t0. This is the single-table case.

If P=2, we have (nEp+jP, ip+jP)j=0..K/P-1 = (nE0, i0, nE2, i2…, nEK-2, iK-2) for p=0 and (nE1, i1, nE3, i3…, nEK-1, iK-1) for p=1. Assuming the table sizes are sufficient, the parameters (nE0, i0, nE2, i2…, nEK-2, iK-2) are written in table t0, while the other parameters (nE1, i1, nE3, i3…, nEK-1, iK-1) are written in table t1.

The case of P=4 can be readily understood from the case of P=2.

As a consequence, in principle the multiplexing in the multiple-table case boils down to applying the single-table multiplexing principle several times: the (encoded) codebook numbers (nEp+jP)j=0..K/P-1 can be written downward from the end of each table tp and the indices (ip+jP)j=0..K/P-1 can be written upward from the beginning of each table tp. Two pointers are defined for each binary table tp: posn,p and posi,p. These pointers are initialized to posi,p = 0 and posn,p = rp - 1, and are respectively incremented and decremented.

Nonetheless, the multiple-table case is not a straightforward extension of the single-table case. It may indeed happen that the number of bits in (nEp+jP, ip+jP)j=0..K/P-1 exceeds, for a given p, the number of bits, rp, available in the binary table tp. To deal with such an "overflow", an extra table tex is defined as a temporary buffer to hold the bits in excess (which have to be distributed in another table tq with q ≠ p). The size of tex is set to 4*36 bits.

The actual multiplexing algorithm in the multiple-table case is detailed below:

1) Initialize: (We assume a size of rp bits for each binary table tp.)

Set total number of bits to R: nb = R

Initialize the maximum position last such that nlast ≥ 2:

last = -1

For p=0…P-1,

posi,p = 0 and posn,p = rp - 1

set table tp to zero

2) Split and write all codebook numbers:

For p=0…P-1, the (encoded) codebook numbers (nEp+jP)j=0..K/P-1 are written sequentially (downward from the end) in table tp. This is done through two nested loops over p and j. In the illustrative embodiment a single loop is used with modular arithmetic, as detailed below:

For k=0,…,K-1

p =k mod P

Compute the minimal bit consumption of the kth split: Rk = 0 if nk = 0, 5nk-1 if nk ≥ 2

If Rk > nb, nk = 0 else nb = nb - Rk

If nk ≥ 2, last = k

Write downward nEk (except the stop bit) in table tp starting from posn,p, and decrement posn,p by nk-1

If nb > 0, write the stop bit of the unary code and decrement posn,p by 1

It can be checked that for P ≤ 4 with a near-equal distribution of R in rp, no overflow (i.e. bit in excess) in tables tp can happen at this step (for p=0,..,P-1). In general this property must be verified before the algorithm is applied.
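Steps 1 and 2 can be sketched as follows. The function name and the data representation (each table as a list of 0/1 integers) are illustrative. The text does not state whether the stop bit is deducted from the running budget nb; the sketch assumes that it is, and marks that line accordingly.

```python
def write_codebook_numbers(n, r):
    """Sketch of steps 1 and 2: write unary-coded codebook numbers into P tables.

    n : codebook numbers (0 or >= 2), one per split
    r : table sizes r[p] in bits (assumed near-equal, so no overflow here)
    Returns the tables, the pointers posn[p], the possibly modified n, and last.
    """
    P = len(r)
    nb = sum(r)                        # total bit budget R
    last = -1                          # highest k such that n[k] >= 2
    n = list(n)                        # copy: splits may be forced to zero
    tables = [[0] * rp for rp in r]    # tables initialized to zero
    posn = [rp - 1 for rp in r]
    for k in range(len(n)):
        p = k % P                      # table (track) of split k
        rk = 0 if n[k] == 0 else 5 * n[k] - 1
        if rk > nb:
            n[k] = 0                   # not enough bits: split set to zero
        else:
            nb -= rk
        if n[k] >= 2:
            last = k
        for _ in range(max(n[k] - 1, 0)):   # unary ones, written downward
            tables[p][posn[p]] = 1
            posn[p] -= 1
        if nb > 0:                     # stop bit "0" (table is already zero)
            posn[p] -= 1
            nb -= 1                    # ASSUMPTION: stop bit counted in the budget
    return tables, posn, n, last


tables, posn, n_out, last = write_codebook_numbers([2, 0, 3, 2], [12, 12])
print(posn, n_out, last)
```

In this example the third split (n = 3) no longer fits in the remaining budget and is therefore forced to zero.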

3) Split and write all indices:

This is the tricky part of the multiplexing algorithm due to the possibility of overflow.

Find the positions posovfp in each binary table tp (with p = 0…P-1) from which the bits in overflow can be written. These positions are computed assuming the indices are written in 4-bit blocks.

For p = 0..P-1

pos = 0

nb = posn,p + 1

For k = p to last with a step of P

If nk > 0,

If 4nk ≤ nb, nb1 = nk

else nb1 = nb >> 2 (where >> is a bit shift operator)

nb = nb – 4* nb1

pos = pos + nb1

posovfp = pos*4
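The computation of the overflow positions above can be sketched as follows, under the assumption that the statements updating nb and pos belong to the "If nk > 0" branch. The function name and argument layout are hypothetical; posn holds the pointers posn,p left after the codebook numbers have been written.

```python
def overflow_positions(n, posn, last, P):
    """Return posovf[p]: first bit position in table p not used by its own indices.

    n    : codebook numbers after step 2
    posn : pointers posn[p] after the codebook numbers have been written
    last : highest split number k with n[k] >= 2 (or -1 if none)
    """
    posovf = []
    for p in range(P):
        pos = 0
        nb = posn[p] + 1                    # bits available for indices in table p
        for k in range(p, last + 1, P):     # splits of track p, up to last
            if n[k] > 0:
                # Number of 4-bit blocks of i_k that fit in table p.
                nb1 = n[k] if 4 * n[k] <= nb else nb >> 2
                nb -= 4 * nb1
                pos += nb1
        posovf.append(4 * pos)
    return posovf


print(overflow_positions([2, 0, 0, 2], [8, 8], 3, 2))
print(overflow_positions([3, 0], [9, 11], 0, 2))
```

In the second call, table 0 can hold only two of the three 4-bit blocks of i0, while table 1 is unused from position 0 upward, so it offers the "hole" for the block in excess.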

The indices can then be written as follows:

For p = 0..P-1

pos = 0

For k = p to K-1 with a step of P

nb = posn,p - pos

Write the 4nk bits of ik:

Compute the number, nb1, of 4-bit blocks which can fit in table tp and the number, nb2, of 4-bit blocks in excess (to be written temporarily in table tex):

If 4nk ≤ nb, nb1 = nk, nb2 = 0

else nb1 = nb >> 2 (where >> is a bit shift operator), nb2 = nk – nb1

Write upward the 4nb1 bits of ik from posi,p to posi,p+4nb1-1 in table tp, and increment posi,p by 4nb1

If nb2 > 0,

Initialize posovf to 0

Write upward the remaining 4nb2 bits of ik from posovf to posovf+4nb2-1 in table tex, and increment posovf by 4nb2

Distribute the 4nb2 bits in the tables tq (with q ≠ p) based on the pointers posovfq and posn,q, and update the pointers posovfq.

5.6.2 Packetization procedure for all parameters

The coding parameters computed in a 1024-sample super-frame at the encoder are multiplexed into 4 binary packets of equal size. The packetization consists of a multiplexing loop over 4 iterations. The size of each packet is set to Rtotal / 4 where Rtotal is the number of bits allocated to the super-frame.

Recall that the mode selected in the 1024-sample super-frame has the form (m1, m2, m3, m4), where mk=0, 1, 2 or 3, with the mapping: 0 → 256-sample ACELP, 1 → 256-sample TCX, 2 → 512-sample TCX, 3 → 1024-sample TCX.

Figure 12: Structure of transmission packets for all four frame types

The multiplexing in the k-th packet is performed according to the value of mk. The corresponding packet format is shown in Figure 12. There are 3 cases:

If mk=0 or 1, the k-th packet simply contains all parameters related to a 256-sample frame, where the parameters are the 2-bit mode information (’00’ or ’01’ in binary format), the parameters of ACELP or those of 256-sample TCX, and the parameters of 256-sample HF coding.

If mk=2, the k-th packet contains half of the bits of the 512-sample TCX mode, half of the bits of 512-sample HF coding, plus the 2-bit mode information (’10’ in binary format).

If mk=3, the k-th packet contains one fourth of the bits describing the 1024-sample TCX mode, one fourth of the bits of 1024-sample HF coding, plus the 2-bit mode information (’11’ in binary format).

The packetization is therefore straightforward if the k-th packet corresponds to ACELP or 256-sample TCX. The packetization is slightly more involved if the 512- or 1024-sample TCX mode is used, because the bits of the 512- or 1024-sample modes have to be divided into equal parts.
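The three cases above can be summarized in a small sketch that, for a mode tuple (m1, m2, m3, m4), lists the 2-bit mode field of each packet and the share of its frame's core and HF bits that the packet carries. The function name, the frame-type labels and the tuple layout are illustrative assumptions, and no check is made that the mode tuple is a valid combination.

```python
from fractions import Fraction

# mk -> (frame type label, number of packets spanned by one frame of that type)
FRAME_TYPES = {
    0: ("ACELP256", 1),
    1: ("TCX256", 1),
    2: ("TCX512", 2),
    3: ("TCX1024", 4),
}


def packet_layout(mode):
    """For each of the 4 packets: (2-bit mode field, frame type, share of frame bits)."""
    layout = []
    for mk in mode:
        name, num_packets = FRAME_TYPES[mk]
        layout.append((format(mk, "02b"), name, Fraction(1, num_packets)))
    return layout


for entry in packet_layout((2, 2, 0, 1)):
    print(entry)
```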

5.6.3 TCX gain multiplexing

It was found that the TCX gain is important to maintain audible quality in case of packet loss. Thus, in 512-sample and 1024-sample TCX frames, the TCX gain value is encoded redundantly in multiple packets to protect against packet loss. The TCX gain is encoded at a resolution of 7 bits, and these bits are labelled "Bit 0" to "Bit 6", where "Bit 0" is the Least Significant Bit (LSB) and "Bit 6" is the Most Significant Bit (MSB). We consider two cases, TCX512 and TCX1024, where the encoded bits are split into two or four packets, respectively.

At the Encoder side

TCX512: The first packet contains the full gain information (7 bits). The second packet repeats the most significant 6 bits ("Bit 1" to "Bit 6").

TCX1024: The first packet contains the full gain information (7 bits). The third packet contains a copy of the three bits "Bit 4", "Bit 5" and "Bit 6". The fourth packet contains a copy of the three bits "Bit 1", "Bit 2" and "Bit 3".

Additionally, a 3-bit "parity" is formed as follows: "Bit 1" and "Bit 4" are combined by logical XOR to generate "Parity Bit 0", "Bit 2" and "Bit 5" are combined by logical XOR to generate "Parity Bit 1", and "Bit 3" and "Bit 6" are combined by logical XOR to generate "Parity Bit 2". These three parity bits are sent in the second packet.
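The encoder-side redundancy can be sketched as follows. The function names and the dictionary keyed by packet number (1 to 4) are illustrative assumptions; the field names parity, index0 and index1 follow the decoder description in this subclause. Only the gain-related fields of each packet are shown.

```python
def tcx512_gain_fields(gain):
    """Gain fields per packet for TCX512 (gain is a 7-bit integer, Bit 0 = LSB)."""
    return {
        1: gain & 0x7F,          # full 7-bit gain
        2: (gain >> 1) & 0x3F,   # copy of Bit 1..Bit 6
    }


def tcx1024_gain_fields(gain):
    """Gain fields per packet for TCX1024."""
    index0 = (gain >> 4) & 0x7   # Bit 4..Bit 6
    index1 = (gain >> 1) & 0x7   # Bit 1..Bit 3
    parity = index0 ^ index1     # Parity Bit j = Bit (1+j) XOR Bit (4+j)
    return {1: gain & 0x7F, 2: parity, 3: index0, 4: index1}


print(tcx512_gain_fields(0b1100111))
print(tcx1024_gain_fields(0b1100111))
```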

At the Decoder side

The following logic is applied at the decoder to recover the TCX gain when packets are missing in 512-sample TCX and 1024-sample TCX frames. We assume that at least one packet is missing before this recovery logic is entered.

TCX512: If the first packet is flagged as being lost, the TCX global gain is taken from the second packet, with the LSB ("Bit 0") being set to zero. If only the second packet is lost, then the full TCX gain is obtained from the first packet.

TCX1024: The gain recovery algorithm is only used if 1 or 2 packets forming a 1024-sample TCX frame are lost, as described in Section 6.5.1.1. If 3 or more packets are lost in a TCX1024 frame, the MODE is changed to (1,1,1,1) and BFI=(1,1,1,1). When only 1 or 2 packets are lost in a TCX1024 frame, the recovery algorithm is as follows:

As described above, the second, third and fourth packets of a TCX1024 frame contain the parity bits, "Bit 6" to "Bit 4", and "Bit 3" to "Bit 1" of the TCX gain. These bits (three each) are stored in "parity", "index0" and "index1" respectively.

If the third packet is lost, "index0" is replaced by the logical XOR combination of "parity" and "index1". That is, "Bit 6" is generated from the logical XOR of "Parity Bit 2" and "Bit 3", "Bit 5" is generated from the logical XOR of "Parity Bit 1" and "Bit 2", and "Bit 4" is generated from the logical XOR of "Parity Bit 0" and "Bit 1".

If the fourth packet is lost, "index1" is replaced by the logical XOR combination of "parity" and "index0". That is, "Bit 3" is generated from the logical XOR of "Parity Bit 2" and "Bit 6", "Bit 2" is generated from the logical XOR of "Parity Bit 1" and "Bit 5", and "Bit 1" is generated from the logical XOR of "Parity Bit 0" and "Bit 4".

Finally, the 7-bit TCX gain value is taken from the recovered bits ("Bit 1" to "Bit 6") and "Bit 0" is set to zero.
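The recovery rules above can be sketched as follows. The function names and the boolean loss flags are hypothetical. The text does not describe the case where both the third and the fourth packets are lost, so the sketch rejects it; this is an assumption of the sketch, not specified behaviour.

```python
def recover_tcx512_gain(second_packet_bits):
    """TCX512, first packet lost: Bit 1..Bit 6 from packet 2, Bit 0 set to zero."""
    return (second_packet_bits & 0x3F) << 1


def recover_tcx1024_gain(parity, index0, index1,
                         third_lost=False, fourth_lost=False):
    """TCX1024: rebuild the gain from parity (packet 2), index0 (packet 3),
    index1 (packet 4). A lost field may be passed as None."""
    if third_lost and fourth_lost:
        # ASSUMPTION: case not covered by the text, so it is rejected here.
        raise ValueError("index0 and index1 cannot both be recovered")
    if third_lost:
        index0 = parity ^ index1     # Bit 4..6 = parity XOR Bit 1..3
    if fourth_lost:
        index1 = parity ^ index0     # Bit 1..3 = parity XOR Bit 4..6
    # Bit 0 is set to zero.
    return ((index0 & 0x7) << 4) | ((index1 & 0x7) << 1)


# Fields for an original gain of 0b1100111: parity=5, index0=6, index1=3.
print(recover_tcx1024_gain(5, None, 3, third_lost=True))
print(recover_tcx1024_gain(5, 6, None, fourth_lost=True))
```

In both calls the recovered value equals the original gain with "Bit 0" cleared.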

5.6.4 Stereo Packetization

Stereo parameters computed in a 1024-sample super-frame at the encoder are multiplexed into 4 binary packets of equal size. The packetization consists of a similar multiplexing loop as for the core encoder. The stereo packets are appended at the end of the mono packets.