6 Functions on the receive (RX) side
3GPP TS 26.192, Release 17: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Comfort noise aspects
The situations in which comfort noise shall be generated on the receive side are defined in [4]. In general, the comfort noise generation is started or updated whenever a valid SID frame is received.
6.1 Averaging and decoding of the LP and energy parameters
When speech frames are received by the decoder, the LP and the energy parameters of the last seven speech frames shall be kept in memory. The decoder counts the number of frames elapsed since the last SID frame was updated and passed to the RSS by the encoder. Based on this count, the decoder determines whether or not there is a hangover period at the end of the speech burst (defined in [4]). The interpolation factor is also adapted to the SID update rate.
As soon as a SID frame is received, comfort noise is generated at the decoder end. The parameters of the first SID frame are not received but computed from the parameters stored during the hangover period. If no hangover period is detected, the parameters from the previous SID update are used.
The averaging procedure for obtaining the comfort noise parameters for the first SID frame is as follows:
– when a speech frame is received, the ISF vector is decoded and stored in memory; the logarithmic frame energy of the decoded signal is also stored in memory.
– the averaged values of the quantized ISF vectors and the averaged logarithmic frame energy of the decoded frames are computed and used for comfort noise generation.
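The averaged value of the ISF vector for the first SID frame is given by:
$$f^{\mathrm{mean}}(i) = \frac{1}{8}\sum_{n=0}^{7} f_q(i-n) \qquad (9)$$
where $f_q(i-n)$, n > 0, is the quantized ISF vector of one of the frames of the hangover period and where $f_q(i-0) = f_q(i-1)$. The averaged logarithmic frame energy for the first SID frame is given by:
$$\mathrm{en}_{\log}^{\mathrm{mean}}(i) = \frac{1}{8}\sum_{n=0}^{7} \mathrm{en}_{\log}(i-n) \qquad (10)$$
where $\mathrm{en}_{\log}(i-n)$, n > 0, is the logarithmic frame energy of one of the frames of the hangover period computed for the decoded frames and where $\mathrm{en}_{\log}(i-0) = \mathrm{en}_{\log}(i-1)$.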
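The averaging for the first SID frame can be sketched in C as follows. This is a minimal illustration, not the reference implementation: it assumes the average runs over eight terms, namely the seven stored hangover frames plus a duplicate of the most recent one standing in for the n = 0 term. The function name `cn_average_first_sid`, the floating-point types and the flat history layout are assumptions made for the example.

```c
#define ISF_ORDER 16   /* dimension of the AMR-WB ISF vector */
#define HIST_LEN   7   /* speech frames kept in memory */

/* Average the stored ISF vectors and logarithmic frame energies.
 * isf_hist holds HIST_LEN vectors back to back; index 0 is the most
 * recent frame (n = 1). That frame is counted twice, because the
 * n = 0 term is taken to be a copy of the n = 1 term (assumption). */
static void cn_average_first_sid(const float *isf_hist,
                                 const float *en_hist,
                                 float *isf_mean,
                                 float *en_mean)
{
    for (int k = 0; k < ISF_ORDER; k++) {
        float acc = isf_hist[k];                    /* n = 0 term */
        for (int n = 0; n < HIST_LEN; n++)
            acc += isf_hist[n * ISF_ORDER + k];     /* n = 1..7 */
        isf_mean[k] = acc / 8.0f;
    }

    float acc = en_hist[0];                         /* n = 0 term */
    for (int n = 0; n < HIST_LEN; n++)
        acc += en_hist[n];                          /* n = 1..7 */
    *en_mean = acc / 8.0f;
}
```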
For ordinary SID frames, the ISF vector and logarithmic frame energy are computed by table lookup. The ISF vector is given by the sum of the decoded reference vector and the constant mean ISF vector.
During comfort noise generation, the spectrum and energy of the comfort noise are determined by interpolation between old and new SID frames.
When dithering is used, the ISF vector f is modified by
$$f(i) = f(i) + \mathrm{rand}(-L(i), L(i)) \qquad (11)$$
where L(i) = 100 + 0.8i Hz and rand(-L(i), L(i)) is a random function generating values between -L(i) and L(i). A minimum gap of 175 Hz is ensured between elements of f.
Dithering insertion for the energy parameter is similar to spectral dithering and can be computed as follows:
$$\mathrm{en}_{\log} = \mathrm{en}_{\log} + \mathrm{rand}(-L, L), \qquad (12)$$
where L = 75 and $\mathrm{en}_{\log}$ is the energy value used for scaling the energy of the comfort noise excitation.
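The spectral and energy dithering described above can be sketched as follows. This is only an illustration under stated assumptions: the random range is taken to be symmetric, from -L to +L; the index i is zero-based; the caller chooses how many ISF elements are dithered; the first element is merely kept non-negative; and the C library `rand()` stands in for the codec's own generator. The names `rand_sym`, `cn_dither_isf` and `cn_dither_energy` are hypothetical.

```c
#include <stdlib.h>

#define MIN_GAP_HZ   175.0f   /* minimum spacing between ISF elements */
#define EN_DITH_LIM   75.0f   /* L for the energy parameter */

/* Uniform value in [-limit, +limit]; rand() is a stand-in generator. */
static float rand_sym(float limit)
{
    return limit * (2.0f * ((float)rand() / (float)RAND_MAX) - 1.0f);
}

/* Dither isf[0..n-1] (values in Hz), then enforce the minimum gap. */
static void cn_dither_isf(float *isf, int n)
{
    for (int i = 0; i < n; i++) {
        float limit = 100.0f + 0.8f * (float)i;        /* L(i) in Hz */
        float v = isf[i] + rand_sym(limit);
        /* Assumption: element 0 is only clamped at 0 Hz. */
        float floor_hz = (i == 0) ? 0.0f : isf[i - 1] + MIN_GAP_HZ;
        isf[i] = (v < floor_hz) ? floor_hz : v;
    }
}

/* Dither the energy value used to scale the comfort noise excitation. */
static float cn_dither_energy(float en)
{
    return en + rand_sym(EN_DITH_LIM);
}
```

Processing the elements in ascending order lets each element be checked against its already-dithered lower neighbour, so the gap holds for the final vector.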
6.2 Comfort noise generation and updating
The comfort noise generation procedure uses the Adaptive Multi-Rate Wideband (AMR-WB) speech decoder algorithm defined in [2].
When comfort noise is to be generated, the various encoded parameters are set as follows:
In each subframe, the pulse positions and signs of the excitation are locally generated using uniformly distributed pseudo random numbers. The excitation pulses take values between +2047 and ‑2048 when comfort noise is generated. The fixed codebook comfort noise excitation generation algorithm works as follows:
for (i = 0; i < 64; i++) u[i] = shr(random(),4);
where:
u[0..63] excitation buffer;
random() generates a random integer value, uniformly distributed between -32768 and +32767;
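A self-contained C sketch of the excitation loop above is given below. The generator `cn_random` is a hypothetical stand-in, a 16-bit linear congruential update with illustrative constants, and `shr4` models an arithmetic right shift by 4 without relying on implementation-defined shifts of negative values. Neither name is taken from the specification.

```c
#include <stdint.h>

#define SUBFR_LEN 64   /* samples per subframe in the excitation buffer */

/* Stand-in generator: uniform integers in -32768..+32767. */
static int16_t cn_random(uint16_t *seed)
{
    *seed = (uint16_t)(*seed * 31821u + 13849u);
    int v = *seed;
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

/* Arithmetic right shift by 4 (rounds toward minus infinity). */
static int16_t shr4(int16_t x)
{
    int v = x;
    return (int16_t)(v < 0 ? ~((~v) >> 4) : v >> 4);
}

/* Fill u[0..63] with pseudo-random pulses in -2048..+2047. */
static void cn_fill_excitation(int16_t *u, uint16_t *seed)
{
    for (int i = 0; i < SUBFR_LEN; i++)
        u[i] = shr4(cn_random(seed));
}
```

Shifting a 16-bit value right by four bits is what confines the pulses to the stated range of -2048 to +2047.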
The excitation gain is computed from the logarithmic frame energy parameter by converting it to the linear domain.
The adaptive codebook gain values in each subframe are set to 0, and the memory of the adaptive codebook is also set to zero.
The pitch delay values in each subframe are set to 64.
The LP filter parameters used are those received in the SID frame.
The predictor memory of the ordinary LP parameter algorithm is initialized when RX_TYPE is not SPEECH, so that the quantizer starts from given initial states when speech activity begins again. With these parameters, the speech decoder now performs the standard operations described in [2] and synthesizes comfort noise. During CN generation, the high-band generation is performed using an estimated high-band gain, as in the 8.85, 12.65, 14.25, 15.85, 18.25, 19.85 or 23.05 kbit/s modes during active speech.
Updating of the comfort noise parameters (energy and LP filter parameters) occurs each time a valid SID frame is received, as described in [4].
When updating the comfort noise, the parameters above should be interpolated over the SID update period to obtain smooth transitions.
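One possible way to interpolate between the old and new SID parameters is sketched below. The text above does not give the interpolation rule, so a linear blend is assumed here purely for illustration; the function name `cn_interpolate`, the floating-point types and the `period` argument (the SID update period in frames) are likewise assumptions.

```c
#define ISF_ORDER 16   /* dimension of the AMR-WB ISF vector */

/* Linear blend between the previous and the newest SID parameters.
 * frame_idx counts frames since the newest SID update; values outside
 * 0..period are clamped, so the output never overshoots either end. */
static void cn_interpolate(const float *isf_old, const float *isf_new,
                           float en_old, float en_new,
                           int frame_idx, int period,
                           float *isf_out, float *en_out)
{
    if (frame_idx < 0)
        frame_idx = 0;
    if (frame_idx > period)
        frame_idx = period;

    /* Weight of the new parameters: 0 at the update, 1 one period later. */
    float w = (period > 0) ? (float)frame_idx / (float)period : 1.0f;

    for (int k = 0; k < ISF_ORDER; k++)
        isf_out[k] = (1.0f - w) * isf_old[k] + w * isf_new[k];
    *en_out = (1.0f - w) * en_old + w * en_new;
}
```

Calling this once per frame with an increasing `frame_idx` moves the spectrum and energy gradually from the old values to the new ones.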