7 Enhanced aacPlus general audio codec: Enhanced aacPlus encoder
3GPP TS 26.401, Release 17: Enhanced aacPlus general audio codec; General audio codec audio processing functions; General description
Figure 2 shows a block diagram of the Enhanced aacPlus encoder. The input PCM time domain signal is first fed to a stereo-to-mono downmix unit, which is applied only if the input signal is stereo and the selected audio encoding mode is mono.
Next, the (mono or stereo) time domain signal is fed to an IIR resampling filter, which converts the input sampling rate fsin to the sampling rate fsenc best suited to the encoding process. The IIR resampler is applied only if the input sampling rate differs from the encoding sampling rate. It can run either as a 3:2 downsampler (e.g. from 48 kHz to 32 kHz) or as a 1:2 upsampler (e.g. from 16 kHz to 32 kHz).
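As an illustration of the resampler selection described above, the following Python sketch maps a pair of sampling rates to one of the two resampler modes named in the text. The function name select_resampler and the returned labels are hypothetical and are not taken from the specification or its reference code. Rate pairs not covered by the text are rejected rather than guessed.

```python
from fractions import Fraction


def select_resampler(fs_in: int, fs_enc: int) -> str:
    """Pick the IIR resampler mode for a given input/encoding rate pair.

    Hypothetical helper: only the cases named in the text are handled.
    """
    if fs_in <= 0 or fs_enc <= 0:
        raise ValueError("sampling rates must be positive")
    if fs_in == fs_enc:
        # Rates already match, so the resampler is not applied.
        return "bypass"
    ratio = Fraction(fs_in, fs_enc)
    if ratio == Fraction(3, 2):
        return "3:2 downsample"   # e.g. 48 kHz -> 32 kHz
    if ratio == Fraction(1, 2):
        return "1:2 upsample"     # e.g. 16 kHz -> 32 kHz
    raise ValueError(f"rate pair {fs_in} -> {fs_enc} is not covered by the text")
```

For example, select_resampler(48000, 32000) selects the 3:2 downsampler and select_resampler(16000, 32000) selects the 1:2 upsampler.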
The Enhanced aacPlus encoder consists of the AAC [2] (Advanced Audio Coding) waveform encoder, the SBR (Spectral Band Replication) high frequency reconstruction encoding tool and the Parametric Stereo encoding tool. The encoder operates in a dual rate mode: the SBR encoder operates at the encoding sampling rate fsenc delivered by the IIR resampler, and the AAC encoder operates at half of this sampling rate, fsenc/2. Consequently, a 2:1 downsampler is present at the input to the AAC encoder. For an efficient implementation, this downsampler uses an IIR (Infinite Impulse Response) filter algorithm. The Parametric Stereo tool is used for low-bitrate stereo coding, i.e. at bitrates of 44 kbit/s and below. The AAC encoder implementation complies with the AAC Low Complexity Object Type [5].
Figure 2: Enhanced aacPlus Encoder overview
The SBR encoder consists of a QMF (Quadrature Mirror Filter) analysis filter bank, which is used to derive the spectral envelope of the original input signal. Furthermore, the SBR related modules control the selection of an input-signal-adaptive grid partitioning of the QMF samples on the time axis (i.e. they control the framing), analyse the relation of noise floor to tonal components in the high band, collect guidance information for the transposition process in the decoder, and detect missing harmonic components that cannot be reconstructed by pure transposition. This information about the characteristics of the input signal, together with the spectral envelope data, forms the SBR stream. The number of bits used for the SBR stream is subtracted from the bits available to the AAC encoder in order to achieve constant bitrate encoding of the multiplexed Enhanced aacPlus stream.
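The bit allocation rule in the last sentence above can be sketched as follows. This is a simplified illustration, not the encoder's actual rate control: the function name aac_bit_budget is hypothetical, the default frame length of 2048 samples at fsenc is an assumption not stated in this clause, and any bit reservoir handling is ignored.

```python
def aac_bit_budget(bitrate: int, fs_enc: int, sbr_bits: int,
                   frame_len: int = 2048) -> int:
    """Bits left for the AAC core in one frame after the SBR stream is written.

    Hypothetical helper. frame_len (samples per frame at fs_enc) is an
    assumption for illustration; it is not given in this clause.
    """
    if bitrate <= 0 or fs_enc <= 0 or frame_len <= 0:
        raise ValueError("bitrate, fs_enc and frame_len must be positive")
    if sbr_bits < 0:
        raise ValueError("sbr_bits must be non-negative")
    # Average number of bits per frame at a constant bitrate.
    frame_bits = bitrate * frame_len // fs_enc
    if sbr_bits > frame_bits:
        raise ValueError("SBR stream exceeds the bits available for the frame")
    # The SBR bits are subtracted from the bits available to the AAC encoder.
    return frame_bits - sbr_bits
```

Under these assumptions, a 32 kbit/s stream at fsenc = 32 kHz has 2048 bits per frame on average, so an SBR stream of 200 bits leaves 1848 bits for the AAC encoder.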
For stereo bitrates of 44 kbit/s and below, the Parametric Stereo encoding tool in the Enhanced aacPlus encoder is used. For stereo bitrates above 44 kbit/s, normal stereo operation is performed. The Parametric Stereo encoding tool estimates parameters characterizing the perceived stereo image of the input signal. These stereo parameters are embedded in the SBR stream. At the same time, a signal-adaptive mono downmix of the input signal is generated in the QMF domain and fed into the SBR encoder, which operates in mono. This downmix is also processed by a downsampled QMF synthesis filter bank to obtain the time domain input signal for the AAC core encoder at the sampling rate fsenc/2. In this case, the 2:1 IIR downsampler is not active.
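The signal path decisions described in this clause can be summarised in a short Python sketch. The names EncoderPath and plan_encoder_path and the field names are hypothetical. Only the 44 kbit/s threshold, the dual rate relation and the activation conditions come from the text. A mono input combined with a stereo encoding mode is not covered by the text and is therefore rejected.

```python
from dataclasses import dataclass

PS_MAX_BITRATE = 44_000  # bit/s; Parametric Stereo is used at and below this rate


@dataclass(frozen=True)
class EncoderPath:
    downmix_to_mono: bool       # time domain stereo-to-mono downmix unit applied
    parametric_stereo: bool     # Parametric Stereo tool active
    core_channels: int          # channels coded by the AAC core encoder
    iir_downsampler_2to1: bool  # 2:1 IIR downsampler before the AAC encoder
    fs_sbr: int                 # SBR encoder sampling rate (fs_enc)
    fs_aac: int                 # AAC encoder sampling rate (fs_enc / 2)


def plan_encoder_path(input_channels: int, stereo_mode: bool,
                      bitrate: int, fs_enc: int) -> EncoderPath:
    """Derive which encoder blocks are active (hypothetical helper)."""
    if input_channels not in (1, 2):
        raise ValueError("only mono or stereo input is considered")
    fs_aac = fs_enc // 2  # dual rate mode: AAC runs at half the SBR rate
    if not stereo_mode:
        # Mono encoding: downmix only if the input is stereo.
        return EncoderPath(input_channels == 2, False, 1, True, fs_enc, fs_aac)
    if input_channels != 2:
        raise ValueError("stereo mode with mono input is not covered by the text")
    if bitrate <= PS_MAX_BITRATE:
        # Parametric Stereo: the mono core input comes from the downsampled
        # QMF synthesis filter bank, so the 2:1 IIR downsampler is not active.
        return EncoderPath(False, True, 1, False, fs_enc, fs_aac)
    # Normal stereo operation above 44 kbit/s.
    return EncoderPath(False, False, 2, True, fs_enc, fs_aac)
```

For instance, plan_encoder_path(2, True, 32000, 32000) selects Parametric Stereo with a mono AAC core at 16 kHz, while plan_encoder_path(2, True, 64000, 48000) selects normal stereo operation with the 2:1 IIR downsampler active.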
The embedding of the SBR stream (including the Parametric Stereo data) into the AAC stream is done in a backwards compatible way, i.e. a legacy Release 5 AAC decoder is able to parse the Enhanced aacPlus stream and decode the AAC core part.
The Enhanced aacPlus encoder is described in detail in [2], [3] and [4].