5.4 Mono Signal High-Band encoding (BWE)

3GPP TS 26.290 Release 17: Audio codec processing functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions

The encoding of the HF signal is shown in Figure 9. The HF signal consists of the frequency components of the input signal above Fs/4 kHz, so its bandwidth depends on the input sampling rate. To encode the HF signal at a low bit rate, a bandwidth extension (BWE) approach is used. Only energy information, in the form of a spectral envelope and frame energy, is sent to the decoder. The fine structure of the signal is extrapolated at the decoder from the received (decoded) excitation of the LF signal.
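Why the down-sampled HF signal is "folded" can be checked numerically. The sketch below uses a hypothetical sampling rate FS = 32 kHz and a pure tone already inside the HF band, so the band-splitting filter that precedes decimation in the codec is omitted. After keeping every other sample (new rate FS/2), a component at f0 reappears at FS/2 - f0. The band [FS/4, FS/2] is therefore mapped onto [0, FS/4] in reversed order, and the LF/HF junction at FS/4 lands on the new Nyquist frequency.

```python
import cmath
import math

FS = 32000.0  # hypothetical input sampling rate in Hz (illustration only)
N = 64        # number of decimated samples analysed

def tone(freq_hz, n, fs):
    """n samples of a unit-amplitude cosine at freq_hz, sampled at fs."""
    return [math.cos(2 * math.pi * freq_hz * i / fs) for i in range(n)]

def peak_frequency(x, fs):
    """Frequency (Hz) of the largest DFT bin in [0, fs/2], using a naive DFT."""
    n = len(x)
    mags = [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]
    k_max = max(range(len(mags)), key=mags.__getitem__)
    return k_max * fs / n

f0 = 12000.0                 # a component inside the HF band [FS/4, FS/2]
hf = tone(f0, 2 * N, FS)
decimated = hf[::2]          # down-sample by 2: new rate FS/2
print(peak_frequency(decimated, FS / 2))  # 4000.0, i.e. FS/2 - f0
```

A tone at exactly FS/4 maps to FS/4 (the Nyquist frequency of the decimated signal), while components approaching FS/2 map towards 0 Hz.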

The down-sampled HF signal is denoted sHF(n) in Figure 9. Its spectrum is a folded (frequency-reversed) version of the high-frequency band prior to down-sampling. An LP analysis is performed on sHF(n) to obtain a set of coefficients that model its spectral envelope. Fewer parameters are typically needed than for the LF signal; here, a filter of order 8 is used. The LP coefficients are then transformed into the ISP representation and quantized for transmission. The number of LP analyses in a 1024-sample super-frame depends on the frame lengths within that super-frame.
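The order-8 LP analysis can be sketched with the classic autocorrelation method and the Levinson-Durbin recursion. This is a simplified illustration, not the codec's exact procedure: analysis windowing, lag windowing, the ISP conversion and the quantization are all omitted, and the function names are hypothetical.

```python
ORDER_HF = 8  # LP order used for the HF band

def autocorr(x, order):
    """Autocorrelation r[0..order] of one analysis frame (windowing omitted)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Return (a, err) with A(z) = a[0] + a[1] z^-1 + ... + a[order] z^-order, a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:       # silent or degenerate frame: leave remaining coefficients at 0
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err       # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k   # prediction error after stage i
    return a, err

# Example: an exponentially decaying frame behaves like a first-order AR process,
# so only a[1] should be significant (close to -0.9).
frame = [0.9 ** n for n in range(256)]
a_hf, err = levinson_durbin(autocorr(frame, ORDER_HF), ORDER_HF)
```

Applied to a frame of sHF(n), `a_hf` would play the role of the unquantized AHF(z) coefficients.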

Figure 9: High frequency encoding

The LP filter for the HF signal is denoted AHF(z), and its quantized version is denoted ÂHF(z). From the LF signal (s(n) in Figure 9), a residual signal is first obtained by filtering s(n) through the inverse filter Â(z), the quantized LP filter of the LF signal. This residual is then filtered through the quantized HF synthesis filter 1/ÂHF(z). Up to a gain factor, this produces a good approximation of the HF signal, but in its spectrally folded version. The actual HF synthesis signal is recovered when up-sampling is applied to this signal.
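The two filtering steps above can be sketched as an FIR analysis filter Â(z) followed by an all-pole synthesis filter 1/ÂHF(z). This sketch assumes coefficient lists with a leading 1 and zero initial filter states; the codec carries filter memories across frames, which is not modelled here, and the function names are hypothetical.

```python
def analysis_filter(a, x):
    """Filter x through A(z) = sum_j a[j] z^-j (FIR), zero initial state."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n >= j) for n in range(len(x))]

def synthesis_filter(a, e):
    """Filter e through 1/A(z) (all-pole, a[0] == 1), zero initial state."""
    y = []
    for n, en in enumerate(e):
        acc = en
        for j in range(1, len(a)):
            if n >= j:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y

def folded_hf_excitation(s_lf, a_lf_q, a_hf_q):
    """LF residual through Â(z), then shaped by 1/ÂHF(z); not yet gain-scaled or up-sampled."""
    residual = analysis_filter(a_lf_q, s_lf)
    return synthesis_filter(a_hf_q, residual)
```

When both filters are identical, the cascade is transparent and returns s(n) unchanged, which is a convenient sanity check for the sign conventions.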

Since the excitation is taken from the LF signal, an important step is to compute the proper gain for the HF signal. This is done by comparing the energy of the reference HF signal, sHF(n), with the energy of the synthesized HF signal. The energy is computed once per 64-sample subframe, with energy match ensured at the Fs/4 kHz subband boundary. Specifically, the synthesized and reference HF signals are both filtered through a perceptual filter derived from AHF(z), and the ratio of the energies of these two filtered signals is computed every 64 samples and expressed in dB. This gives 4 gains per 256-sample frame, one per 64-sample subframe. This 4-gain vector represents the gain that should be applied to the HF signal to match the energy of the reference HF signal.

Instead of transmitting this gain directly, an estimated gain ratio is first computed by comparing the gains of the filters Â(z) from the lower band and ÂHF(z) from the higher band. This estimation is shown in Figure 10 and explained below. The estimated gain ratio is interpolated every 64 samples, expressed in dB, and subtracted from the measured gain ratio. The resulting gain differences, or gain corrections (see Figure 9), are quantized as 4-dimensional vectors, i.e. 4 values per 256-sample frame.
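The per-subframe gain measurement and the subtraction of the estimate can be sketched as follows. The sketch assumes both signals have already been passed through the perceptual filter derived from AHF(z) (not shown), that the 4 interpolated estimates `est_db` are given, and that a small epsilon is acceptable to avoid log(0); these names and the epsilon are illustrative assumptions, and quantization is not shown.

```python
import math

SUBFRAME = 64
FRAME = 256
EPS = 1e-9  # hypothetical guard against log(0) on silent subframes

def subframe_gains_db(ref_w, syn_w):
    """Energy ratio (dB) of weighted reference vs weighted synthesized HF, per 64 samples."""
    gains = []
    for start in range(0, FRAME, SUBFRAME):
        e_ref = sum(v * v for v in ref_w[start:start + SUBFRAME])
        e_syn = sum(v * v for v in syn_w[start:start + SUBFRAME])
        gains.append(10.0 * math.log10((e_ref + EPS) / (e_syn + EPS)))
    return gains

def gain_corrections(ref_w, syn_w, est_db):
    """Measured gains minus the interpolated estimates: the 4-dim vector to quantize."""
    return [m - e for m, e in zip(subframe_gains_db(ref_w, syn_w), est_db)]

# Example: the reference has twice the amplitude (4x the energy) of the synthesis,
# so each measured gain is about 6.02 dB; an estimate of 6.0 dB leaves about 0.02 dB.
corr = gain_corrections([2.0] * FRAME, [1.0] * FRAME, [6.0] * 4)
```

Only the small residual `corr` has to be coded, because the decoder can recompute the estimate itself.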

The gain estimation computed from the filters Â(z) and ÂHF(z) is shown in Figure 10. Both filters are available at the decoder side. First, the first 64 samples of a decaying sinusoid at the Nyquist frequency (π radians per sample) are computed by filtering a unit impulse through a one-pole filter. The Nyquist frequency is used because the goal is to match the filter gains around Fs/4 kHz, i.e. at the junction frequency between the LF and HF signals, which corresponds to the Nyquist frequency of the down-sampled signals. The 64-sample length of this reference signal equals the subframe length. The decaying sinusoid is filtered first through Â(z), to obtain a low-frequency residual, and then through 1/ÂHF(z), to obtain a synthesis signal from the HF synthesis filter. If Â(z) and ÂHF(z) have identical gains at the normalized frequency of π radians per sample, the energy of the output of 1/ÂHF(z) equals the energy of the input of Â(z), i.e. the decaying sinusoid. If the gains differ, this difference is reflected in the energy of the output signal, denoted x(n). The correction gain should increase as the energy of x(n) decreases. Hence, the gain estimate is computed as the multiplicative inverse of the energy of x(n) in the logarithmic domain (i.e. in dB), which is the negative of that energy expressed in dB. To obtain a true energy ratio, the energy of the decaying sinusoid, in dB, would have to be removed from the output. However, since this energy offset is constant, it is simply taken into account in the gain correction encoder.
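A sketch of this estimate under stated assumptions: the one-pole filter is taken as 1/(1 + αz⁻¹) with a hypothetical α = 0.9 (the actual coefficient is not given in this section), filters start from zero state, and, as in the text, the constant energy of the decaying sinusoid is not removed. The result is -10·log10 of the energy of x(n).

```python
import math

ALPHA = 0.9    # hypothetical pole magnitude of the one-pole filter
SUBFRAME = 64  # length of the reference signal

def decaying_nyquist(alpha=ALPHA, n=SUBFRAME):
    """Impulse response of 1/(1 + alpha z^-1): h[n] = (-alpha)^n, a decaying sinusoid at pi."""
    h, prev = [], 0.0
    for i in range(n):
        cur = (1.0 if i == 0 else 0.0) - alpha * prev
        h.append(cur)
        prev = cur
    return h

def fir(a, x):
    """Filter x through A(z) (FIR), zero initial state."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n >= j) for n in range(len(x))]

def all_pole(a, e):
    """Filter e through 1/A(z) with a[0] == 1, zero initial state."""
    y = []
    for n, en in enumerate(e):
        acc = en
        for j in range(1, len(a)):
            if n >= j:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y

def gain_estimate_db(a_lf_q, a_hf_q):
    """-10*log10(energy of x(n)); the constant sinusoid-energy offset is NOT removed."""
    x = all_pole(a_hf_q, fir(a_lf_q, decaying_nyquist()))
    return -10.0 * math.log10(sum(v * v for v in x))
```

If 1/ÂHF(z) amplifies the region near π more than Â(z) attenuates it, x(n) gains energy and the estimate drops, which matches the direction described above.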

At the decoder, the gain of the HF signal is recovered by adding the estimated gain ratio, which the decoder computes itself from Â(z) and ÂHF(z), to the decoded gain corrections.
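Written out per subframe, using symbols introduced here only for illustration (G_c(k) for the decoded gain correction of subframe k and G_e(k) for the interpolated estimate, both in dB), the reconstruction is the first relation below. Because these gains are energy ratios in dB, converting one into an amplitude scale factor for the HF synthesis would use a division by 20 rather than 10. The second relation is an assumption about how the gain is applied, not a statement of the decoder's exact procedure.

```latex
G_{\mathrm{HF}}(k) = G_c(k) + G_e(k) \quad [\mathrm{dB}], \qquad k = 0, 1, 2, 3,
\qquad
\gamma(k) = 10^{\,G_{\mathrm{HF}}(k)/20}
```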


Figure 10: Gain matching between low and high frequency envelope