5.4 Time / frequency grid generation

3GPP TS 26.404, Release 17: General audio codec audio processing functions; Enhanced aacPlus general audio codec; Enhanced aacPlus encoder Spectral Band Replication (SBR) part

An introduction to the time / frequency grid generation, including a brief discussion of the frame classes, is given in the informal encoder description in [1], subclause 4.B.18.3. The present encoder implementation employs three tools for the grid generation:

– The Transient Detector (TD)

– The Frame Splitter (FS)

– The Frame Generator (FG)

Those tools are described in the subsequent sections. Figure 7 shows the ranges of the frame classes and the transient detector offset versus the indices used by the frame generator.

        |<------------tranPos---------->|
        |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
        0 1 2 3 4 5 6 7 8 9 A B C D E F                           TD index (hexadecimal)
|<------------FIXFIX----------->|
|<------------FIXVAR----------->:<--->:
:<--->:<------VARFIX----------->|
:<--->:<------VARVAR----------->:<--->:                           Ybuffer
................................................................. QMF slots
I-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-Io|o|o|-|-|-|-|-|-|-|-|-|-|-|-|-I SBR slots
0       4       8               16    19                        32 FG index

I: nominal frame boundaries

o: frame overlap region slots

Figure 7: The four frame classes and the transient detector range

5.4.1 Transient detector

The transient detection is performed according to the pseudo-code below. It operates on subband samples of one frame length, starting from sample 8. The outputs of the transient detector are the variables tranFlag and tranPos. The first is a boolean indicating whether there is a transient in the processed frame, and the second specifies the position (in time slots) of the onset of the transient. The time / frequency grid generation module uses the output of the transient detector, together with the stored transient detection output from the previous frame, to perform its operations.

t and a are static channel-dependent arrays of length 64 that need to be stored between calls to the transient detector. On start-up, all elements of both arrays must be set to zero.

5.4.2 Frame splitter

The frame splitting is accomplished according to the following algorithm. It is only active when the transient detector has found no transient in the current frame (i.e. when tranFlag = 0). It operates on subband samples of one and a half frame lengths, starting from subband sample 0. The output of the frame splitter is the variable splitFlag, which indicates whether the current (transient-free) frame should be divided into two envelopes of equal size.

The variable is a static channel-dependent variable that must be stored in between calls to the frame splitting module. This variable should be set to zero on start-up.

5.4.3 Frame generator

The frame generator creates the time/frequency grid for one SBR frame. Input signals are provided by the transient detector and the frame splitter. The frame generator produces two outputs: the sbr_grid() portion of the bitstream, and an internal representation of the time/frequency grid to be used by the envelope and noise floor estimators, see Figure 5.

When no transients are present (i.e. tranFlag = 0), FIXFIX class frames are used. The frame splitter decides whether to use one or two envelopes in the FIXFIX frames (splitFlag = 0 or splitFlag = 1 respectively). "Sparse" transients (separated by one or more frames with tranFlag = 0) are coded by means of FIXVAR-VARFIX sequences. "Tight" transients (tranFlag = 1 for two or more consecutive frames) are handled by inserting VARVAR class frames.

As most transients are "sparse", the frame generator prepares a grid for a FIXVAR-VARFIX pair upon detection of a transient after a sequence of FIXFIX frames. The present frame is encoded using the FIXVAR portion, and the VARFIX grid is stored. At the next call of the generator it is known whether the transient actually is "sparse" or not. If ‘yes’, the already calculated and stored VARFIX grid is used. If ‘no’, a new grid, meeting the requirements of the new transient, as well as those of the previous one, is calculated, whereby a VARVAR class frame is used.

The operation of the frame generator is further described below by means of pseudo-code, where the syntax

[out_0, out_1, …, out_(m-1)] = function(in_0, in_1, …, in_(n-1)) is used.

FrameGenerator(tranFlag, tranPos, splitFlag)
{
    static frameClassOld;   // frameClass used for previous frame
    static G1;              // grid designed during previous call

    [frameClass, frameClassOld] = calcFrameClass(frameClassOld, tranFlag);

    if (tranFlag)
        GP = fillFrameTran(tranPos);            // load transient borders into GP

    switch (frameClass) {
    case FIXFIX:
        BS = calcSbrGrid(FIXFIX, dc, splitFlag);
        break;
    case FIXVAR:
        if (tranPos > 8)
            GP = fillFramePre(GP);              // prepend fill borders before the transient borders
        if (tranPos < 10)
            GP = fillFramePost(GP, tranPos);    // append fill borders after the transient borders
        [G0, G1] = splitAndStore(GP);           // split GP into two grids, G0 and G1
        BS = calcSbrGrid(FIXVAR, G0, dc);       // calc BS using G0
        break;
    case VARFIX:
        BS = calcSbrGrid(VARFIX, G1, dc);       // calc BS using G1 (from previous call)
        break;
    case VARVAR:
        GP = fillFrameInter(G1, GP);            // resolve conflicts and merge G1 and GP
        if (tranPos < 10)
            GP = fillFramePost(GP, tranPos);    // append fill borders after the transient borders in GP
        [G0, G1] = splitAndStore(GP);           // split GP into two grids, G0 and G1
        BS = calcSbrGrid(VARVAR, G0, dc);       // calc BS using newly designed G0
        break;
    }

    FI = decodeSbrGrid(BS);                     // decode BS into FI
    return [BS, FI];
}

The following pseudo-variables are defined:

GP = "Grid-Pair":

– GP.aBorders: array holding envelope borders of two consecutive frames

– GP.aFreqRes: array holding envelope frequency resolutions of two consecutive frames

– GP.iTran : index of transient leading border

Gi = "Grid instance i":

– Gi.aBorders: array holding envelope borders of one frame

– Gi.aFreqRes: array holding envelope frequency resolutions of one frame

– Gi.iTran : index of transient leading border of one frame

BS = "Bit-Stream":

– sbr_grid() as defined in [1] Subclause 4.4.2.8, Table 4.61A

FI = "Frame-Info":

– FI.t_E: tE , envelope borders as defined in 3.2

– FI.r : r , envelope frequency resolutions as defined in 3.2

– FI.t_Q: tQ , noise floor borders as defined in 3.2

– FI.l_A: lA , index of border where the preceding envelope is to be "shortened"

the symbolic constant,

dc: don’t care

and the operations

cat(a, b): concatenate vectors a & b

length(a): number of elements of vector a

fliplr(a): reverse order of elements of vector a

ones(a) : generate vector of length a, where all elements are 1

The internal functions are defined below:

calcFrameClass(frameClassOld, tranFlag)
{
    switch (frameClassOld) {
    case FIXFIX:
        if (tranFlag)
            frameClass = FIXVAR;    // stationary to transient transition
        else
            frameClass = FIXFIX;    // when no transients are present, FIXFIX frames are used
        break;
    case FIXVAR:
        if (tranFlag)
            frameClass = VARVAR;    // "tight" transients are handled by VARVAR frames
        else
            frameClass = VARFIX;    // "sparse" transients are handled by [FIXVAR, VARFIX] pairs
        break;
    case VARFIX:
        if (tranFlag)
            frameClass = FIXVAR;
        else
            frameClass = FIXFIX;    // transient to stationary transition
        break;
    case VARVAR:
        if (tranFlag)
            frameClass = VARVAR;    // "tight" transients are handled by VARVAR frames
        else
            frameClass = VARFIX;
        break;
    }

    frameClassOld = frameClass;
    return [frameClass, frameClassOld];
}
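The state machine in calcFrameClass() can be exercised with a short Python sketch. The names FrameClass, calc_frame_class and classify are illustrative only, and the start state FIXFIX is an assumption, since the initial value of frameClassOld is not stated above.

```python
from enum import Enum


class FrameClass(Enum):
    FIXFIX = 0
    FIXVAR = 1
    VARFIX = 2
    VARVAR = 3


def calc_frame_class(frame_class_old, tran_flag):
    """Return the class of the current frame from the previous class and tranFlag."""
    if frame_class_old in (FrameClass.FIXFIX, FrameClass.VARFIX):
        # The previous frame ended on a fixed border.
        return FrameClass.FIXVAR if tran_flag else FrameClass.FIXFIX
    # The previous frame (FIXVAR or VARVAR) ended on a variable border.
    return FrameClass.VARVAR if tran_flag else FrameClass.VARFIX


def classify(tran_flags, start=FrameClass.FIXFIX):
    """Run the state machine over a sequence of tranFlag values.

    The start state is an assumption made for this sketch.
    """
    state = start
    names = []
    for flag in tran_flags:
        state = calc_frame_class(state, flag)
        names.append(state.name)
    return names


# One "sparse" transient followed by two "tight" transients.
sequence = classify([0, 1, 0, 0, 1, 1, 0])
print(sequence)
# ['FIXFIX', 'FIXVAR', 'VARFIX', 'FIXFIX', 'FIXVAR', 'VARVAR', 'VARFIX']
```

The two branches reflect that the next class depends only on whether the previous frame ended on a fixed or a variable border, which is why FIXFIX and VARFIX (and likewise FIXVAR and VARVAR) share their transitions.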

fillFrameTran(tranPos)
{
    GP.aBorders = {tranPos + 4, tranPos + 6, tranPos + 10};
    GP.aFreqRes = {0, 0, 1};
    GP.iTran = 0;
    return GP;
}
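As a quick check of the border arithmetic in fillFrameTran(), the following Python sketch mirrors the pseudo-code using a plain dictionary for GP. The dictionary layout and the name fill_frame_tran are assumptions made for illustration.

```python
def fill_frame_tran(tran_pos):
    """Mirror of fillFrameTran(): onset border plus two "decay" borders."""
    return {
        "aBorders": [tran_pos + 4, tran_pos + 6, tran_pos + 10],
        "aFreqRes": [0, 0, 1],  # 0 = low resolution, 1 = high resolution
        "iTran": 0,             # the onset border is the first entry
    }


gp_fig8 = fill_frame_tran(9)  # tranPos used in Figure 8
gp_fig9 = fill_frame_tran(2)  # tranPos used in Figure 9
print(gp_fig8["aBorders"])    # [13, 15, 19]
print(gp_fig9["aBorders"])    # [6, 8, 12]
```

The resulting borders agree with the FG indices shown for the transient borders in Figures 8 and 9.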

fillFramePre(GP)
{
    aBordersFill = fillHelper(GP.aBorders[0], 8);
    GP.aBorders = cat(fliplr(aBordersFill), GP.aBorders);
    GP.aFreqRes = cat(ones(length(aBordersFill)), GP.aFreqRes);
    GP.iTran += length(aBordersFill);
    return GP;
}

fillFramePost(GP, tranPos)
{
    if (tranPos < 4)
        maxStep = 6;
    else if (tranPos == 4 || tranPos == 5)
        maxStep = 4;
    else
        maxStep = 8;

    aBordersFill = fillHelper(32 - GP.aBorders[length(GP.aBorders) - 1], maxStep);
    GP.aBorders = cat(GP.aBorders, aBordersFill);
    GP.aFreqRes = cat(GP.aFreqRes, ones(length(aBordersFill)));
    return GP;
}

splitAndStore(GP)
{
    iSplit = 0;
    while (GP.aBorders[iSplit] < 16)    // find first border at or beyond the nominal frame boundary
        iSplit++;

    for (i = 0; i <= iSplit; i++) {     // G0: borders up to and including the split border
        G0.aBorders[i] = GP.aBorders[i];
        G0.aFreqRes[i] = GP.aFreqRes[i];
    }
    G0.iTran = GP.iTran;

    for (j = 0, i = iSplit; i < length(GP.aBorders); i++, j++) {
        G1.aBorders[j] = GP.aBorders[i] - 16;   // G1: re-indexed relative to the next frame
        G1.aFreqRes[j] = GP.aFreqRes[i];
    }
    G1.iTran = GP.iTran - iSplit;

    return [G0, G1];
}
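The split can be traced with a small Python sketch. The grid pair used here is the one implied by the Figure 8 example (borders at 7, 13, 15, 19 and 25), with aFreqRes and iTran derived from the fillFrameTran(), fillFramePre() and fillFramePost() pseudo-code. The dictionary layout and function name are assumptions made for illustration, and, like the pseudo-code, the sketch assumes that at least one border lies at or beyond index 16.

```python
def split_and_store(gp, frame_len=16):
    """Split a grid pair into G0 (current frame) and G1 (next frame)."""
    borders, freq_res = gp["aBorders"], gp["aFreqRes"]

    # Index of the first border at or beyond the nominal frame boundary.
    i_split = 0
    while borders[i_split] < frame_len:
        i_split += 1

    # The split border belongs to both grids: it closes G0 and opens G1.
    g0 = {
        "aBorders": borders[: i_split + 1],
        "aFreqRes": freq_res[: i_split + 1],
        "iTran": gp["iTran"],
    }
    g1 = {
        "aBorders": [b - frame_len for b in borders[i_split:]],
        "aFreqRes": freq_res[i_split:],
        "iTran": gp["iTran"] - i_split,
    }
    return g0, g1


gp = {"aBorders": [7, 13, 15, 19, 25], "aFreqRes": [1, 0, 0, 1, 1], "iTran": 1}
g0, g1 = split_and_store(gp)
print(g0["aBorders"])  # [7, 13, 15, 19]
print(g1["aBorders"])  # [3, 9]
```

The border at index 19 appears in both grids, as the trailing border of the current frame and, re-indexed to 3, as the leading border of the next frame.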

As is evident from the pseudo-code, every transient is initially processed by fillFrameTran(), which inserts one border at the onset of the transient and two "decay" borders after the onset, at distances of 2 and 6 slots from the first border respectively. The frequency resolutions of the two corresponding envelopes are ‘low’, whereas all other envelopes use ‘high’ resolution. Additional borders are inserted before and after said borders by fillFramePre() and fillFramePost() respectively, such that no envelope exceeds a length of 12 slots. The function fillHelper(A, B) subdivides the distance A by calculating segments quantized to the lengths {2, 4, 6, 8} slots while limiting the segment length to B. In splitAndStore() the borders are separated into two groups, each associated with one frame. The above procedures are illustrated by Figure 8.

tranFlag = 1
tranPos = 9

                         <T>
        |-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
        0 1 2 3 4 5 6 7 8 9 A B C D E F                           TD index
                          *
              |<-----6----|<-2|<--4---|-----6---->|
                          N           |           N
|<--------- Frame n: FIXVAR ----:--3->|<-- Frame n+1: VARFIX -->|
................................................................. QMF slots
I-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-Io|o|o|-|-|-|-|-|-|-|-|-|-|-|-|-I SBR slots
0             7           13  15      19          25            32 FG index

I: nominal frame boundaries

o: frame overlap region slots

*: border pointed to by bs_pointer

N: noise floor middle border

Figure 8: Example of isolated transient

In Figure 8, the borders at index 7, 13, 15 and 19 are used for the present FIXVAR class frame. Conversion into sbr_grid() bitstream elements is performed in calcSbrGrid(). The methods of the four classes for conversion of borders and frequency resolutions are implicitly defined by the bitstream and decoding equations in [1], subclause 4.4.2.8 (Table 4.61A) and 4.6.18.3, and are hence not described here. In the example, bs_var_bord_1 = 3 and bs_num_rel_1 = 3, the relative borders have the lengths 4, 2 and 6 ("right to left"), and the frequency resolutions are 0, 0, 1, 1 ("right to left"). The bs_pointer is set to point to the transient leading border, i.e. the value is 3, since FIXVAR borders are also indexed "right to left", starting from 1 (0 signals that no transient leading border is present within the frame). The border at index 19 must be followed up in the next frame by a leading border at index 3. The border at 25, however, may or may not yield a border at 9, since a transient is possible in frame n + 1. If the transient actually is "sparse", the VARFIX bitstream comprises bs_var_bord_0 = 3, bs_num_rel_0 = 1, one relative border of length 6, bs_pointer = 0 and frequency resolutions 1, 1.
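The FIXVAR numbers quoted for Figure 8 can be reproduced with the following Python sketch. It only illustrates the arithmetic on the G0 borders and is not the calcSbrGrid() conversion itself: the helper name fixvar_summary is hypothetical, and the way the relative border lengths are coded in the bitstream, which is defined in [1], is not reproduced.

```python
def fixvar_summary(borders, i_tran, frame_len=16):
    """Summarise a FIXVAR grid from absolute G0 borders (ascending order).

    borders[-1] is the trailing (variable) border; i_tran is the index of
    the transient leading border within borders.
    """
    # Trailing border position relative to the nominal frame boundary.
    var_bord_1 = borders[-1] - frame_len
    # Distances between neighbouring borders, listed "right to left".
    rel_lengths = [borders[k] - borders[k - 1]
                   for k in range(len(borders) - 1, 0, -1)]
    # Borders are counted "right to left" starting from 1.
    pointer = len(borders) - i_tran
    return {
        "bs_var_bord_1": var_bord_1,
        "bs_num_rel_1": len(rel_lengths),
        "rel_lengths": rel_lengths,
        "bs_pointer": pointer,
    }


summary = fixvar_summary([7, 13, 15, 19], i_tran=1)
print(summary)
# {'bs_var_bord_1': 3, 'bs_num_rel_1': 3, 'rel_lengths': [4, 2, 6], 'bs_pointer': 3}
```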

Figure 9 gives an example of "tight" transients, and also serves to outline the functionality of fillFrameInter(). Here G1 contains borders at index 1 and 7, but a transient is located already at index 6. In fillFrameInter() the preliminary border at 7 is simply removed, and the rest of the borders for the present frame are taken from GP. (If, on the other hand, the distance between the last border in G1 and the first border in GP exceeds 12, the segment in between said borders is subdivided analogously to the procedures in fillFramePre().) Hereafter GP is finalized and split in the same manner as described above, whereafter G0 is converted into a bitstream using the VARVAR method of calcSbrGrid(). Hereby the leading border yields bs_var_bord_0 = 1 and the trailing border bs_var_bord_1 = 2. Clearly bs_num_rel_0 = 0 and bs_num_rel_1 = 3. Figure 9 also shows that fillFramePost() has inserted a border at 18, thereby meeting the requirement that one border is present within the interval [16, 19]. This concludes the description of how to generate BS.

tranFlag = 1
tranPos = 2

           <T>
        I-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-I
        0 1 2 3 4 5 6 7 8 9 A B C D E F                           TD index
            *
            |<r2|<--r4--|<----6-----|-----6---->|
:1| | |
:1|<------- Frame n: VARVAR ----:2->|<--- Frame n+1: VARFIX --->|
................................................................. QMF slots
I-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-Io|o|oI-|-|-|-|-|-|-|-|-|-|-|-|-I SBR slots
0 1         6 7 8       12          18          24              32 FG index

Figure 9: Example of tight transients

The second output of the frame generator, FI, comprises tE, r, tQ and lA. Since these signals are equivalent to their counterparts at the decoder side, the relation between FI and BS is fully defined by the decoding equations in MPEG-4. Thus, as the last step in the frame generator, the decodeSbrGrid() function parses and decodes the now available sbr_grid() portion of the bitstream in accordance with the description in the MPEG-4 standard, which shall not be repeated here.