A.4 Speech Codec

3GPP TS 26.110 Release 17: Codec for circuit switched multimedia telephony service; General description

A.4.1 3GPP AMR

The AMR codec uses eight source codecs with bit-rates of 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s. The coder operates on speech frames of 20 ms, corresponding to 160 samples at a sampling frequency of 8000 samples/s. It maps input blocks of 160 speech samples in 13‑bit uniform PCM format to encoded blocks of 95, 103, 118, 134, 148, 159, 204 and 244 bits (for the 4.75 kbit/s up to the 12.2 kbit/s mode, respectively), and maps encoded blocks of these sizes back to output blocks of 160 reconstructed speech samples. The coding scheme for the multi-rate coding modes is the so‑called Algebraic Code Excited Linear Prediction (ACELP) coder; the multi-rate ACELP coder is referred to as MR-ACELP. For every frame of 160 speech samples, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebook indices and gains). These parameters are encoded and transmitted. At the decoder, the parameters are decoded and speech is synthesised by filtering the reconstructed excitation signal through the LP synthesis filter.
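Because each mode's encoded block covers exactly one 20 ms frame, the block sizes listed above follow directly from the mode bit-rates (kbit/s multiplied by ms gives bits). The short Python sketch below checks that arithmetic; the table and function names are illustrative only and are not taken from the AMR reference code.

```python
# AMR frame arithmetic: one encoded block per 20 ms speech frame.
SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
PCM_BITS_PER_SAMPLE = 13  # 13-bit uniform PCM input

# Encoded block sizes (bits) quoted in the text, keyed by mode bit-rate (kbit/s).
AMR_BLOCK_BITS = {
    12.2: 244, 10.2: 204, 7.95: 159, 7.40: 148,
    6.70: 134, 5.90: 118, 5.15: 103, 4.75: 95,
}


def samples_per_frame() -> int:
    """Number of PCM samples in one frame (8000 samples/s x 20 ms)."""
    return SAMPLE_RATE_HZ * FRAME_MS // 1000


def bits_per_frame(rate_kbps: float) -> int:
    """Encoded bits per frame: kbit/s multiplied by ms gives bits."""
    return round(rate_kbps * FRAME_MS)


if __name__ == "__main__":
    pcm_bits = samples_per_frame() * PCM_BITS_PER_SAMPLE  # input bits per frame
    for rate, bits in AMR_BLOCK_BITS.items():
        assert bits_per_frame(rate) == bits
        print(f"{rate:5.2f} kbit/s -> {bits} bits/frame, "
              f"{pcm_bits / bits:.1f}x smaller than 13-bit PCM")
```

The same check makes the ordering of the two lists in the paragraph explicit: the smallest block (95 bits) belongs to the lowest mode (4.75 kbit/s).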

The adaptive multi-rate speech codec is specified in bit‑exact fixed-point arithmetic, in the form of ANSI-C code, to allow easy type approval as well as general testing of the codec.
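Bit-exactness means every arithmetic step, including overflow behaviour, is fully specified, so two conforming implementations produce identical output bits. As an illustration only, the Python sketch below mimics the style of saturating 16-bit operators found in ITU-T/3GPP fixed-point reference code: `add` clamps to the 16-bit range and `mult` multiplies two Q15 values. These definitions are a simplified assumption, not a copy of the AMR ANSI-C source.

```python
# Simplified saturating fixed-point operators (illustrative sketch).
MAX_16 = 32767   # largest 16-bit signed value
MIN_16 = -32768  # smallest 16-bit signed value


def saturate(x: int) -> int:
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(MIN_16, min(MAX_16, x))


def add(a: int, b: int) -> int:
    """16-bit addition with saturation on overflow."""
    return saturate(a + b)


def mult(a: int, b: int) -> int:
    """Q15 x Q15 -> Q15 multiply: drop 15 fractional bits, then saturate."""
    return saturate((a * b) >> 15)


if __name__ == "__main__":
    print(add(32767, 1))       # 32767 (saturates instead of wrapping to -32768)
    print(mult(16384, 16384))  # 8192, i.e. 0.5 * 0.5 = 0.25 in Q15
```

Saturation rather than two's-complement wrap-around is the key design choice: an overflow produces a bounded, reproducible error instead of a sign flip.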

The DTX (discontinuous transmission) mechanism comprises a Voice Activity Detector (VAD) on the TX side; evaluation of the background acoustic noise on the TX side, in order to transmit characteristic parameters to the RX side; and generation of comfort noise on the RX side during periods when the radio transmission is switched off.
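A highly simplified transmit-side sketch of DTX: frames flagged as speech by the VAD are sent normally, the first non-speech frame is replaced by a silence descriptor (SID) carrying background-noise parameters, further SID updates are sent periodically, and nothing is transmitted in between. The `sid_interval` value and the frame labels are hypothetical; the actual AMR DTX rules (for example hangover handling and the SID frame types and their timing) are more involved.

```python
from typing import List


def dtx_schedule(vad_flags: List[bool], sid_interval: int = 8) -> List[str]:
    """Map per-frame VAD decisions to a (simplified) TX action per frame.

    "SPEECH"  : normal speech frame is transmitted
    "SID"     : silence descriptor with background-noise parameters
    "NO_DATA" : radio transmission switched off for this frame
    sid_interval is a hypothetical parameter for this sketch.
    """
    actions = []
    silent_frames = 0  # non-speech frames since the last speech frame
    for is_speech in vad_flags:
        if is_speech:
            actions.append("SPEECH")
            silent_frames = 0
        else:
            # Send noise parameters at the start of silence and periodically afterwards.
            actions.append("SID" if silent_frames % sid_interval == 0 else "NO_DATA")
            silent_frames += 1
    return actions


if __name__ == "__main__":
    print(dtx_schedule([True, True] + [False] * 11 + [True]))
```

On the RX side, the comfort-noise generator would use the parameters from the most recent SID frame to fill the NO_DATA periods.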

The AMR specification includes error concealment. Frame substitution conceals the effect of lost AMR speech frames. When several frames are lost, the output is muted, both to indicate the breakdown of the channel to the user and to avoid potentially annoying sounds resulting from the frame substitution procedure.
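The interplay between frame substitution and muting can be sketched as follows: a lost frame is replaced by an attenuated copy of the last good frame, and after more than `mute_after` consecutive losses the output is muted. This is a waveform-level simplification under assumed parameters (`decay`, `mute_after` and `frame_len` are hypothetical); the AMR decoder itself substitutes and attenuates codec parameters rather than copying output samples.

```python
from typing import List, Optional

Frame = List[float]


def conceal(frames: List[Optional[Frame]], decay: float = 0.5,
            mute_after: int = 3, frame_len: int = 160) -> List[Frame]:
    """Replace lost frames (None) by attenuated repeats, then mute.

    decay, mute_after and frame_len are illustrative parameters.
    """
    out: List[Frame] = []
    last_good: Optional[Frame] = None
    lost_run = 0  # number of consecutive lost frames
    for frame in frames:
        if frame is not None:
            last_good = frame
            lost_run = 0
            out.append(list(frame))
            continue
        lost_run += 1
        if last_good is None or lost_run > mute_after:
            # Nothing to substitute, or too many losses: mute the output.
            length = len(last_good) if last_good is not None else frame_len
            out.append([0.0] * length)
        else:
            gain = decay ** lost_run  # stronger attenuation for each further loss
            out.append([gain * x for x in last_good])
    return out
```

Progressive attenuation makes the transition into muting gradual, which is one way to avoid the annoying artefacts that plain repetition of a single frame would cause.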

A.4.2 G.723.1

G.723.1 can be used to compress the speech or other audio signal component of multimedia services at a very low bit-rate as part of the H.324 family of standards. The coder has two bit-rates, 5.3 and 6.3 kbit/s. The higher bit-rate gives greater quality; the lower bit-rate still gives good quality and provides system designers with additional flexibility. Both rates are a mandatory part of the encoder and decoder, and it is possible to switch between the two rates at any frame boundary. Variable-rate operation is also possible as an option, using discontinuous transmission and noise fill during non-speech intervals; this uses either a series of silence frames or a single silence frame followed by no frames until speech is detected.
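Because a switch between rates, or into silence frames, may occur at any frame boundary, each G.723.1 frame has to identify its own type. The Python sketch below splits a byte stream into frames using the frame-type layout described for G.723.1 in the RTP audio/video profile (RFC 3551): the two least significant bits of the first octet select 6.3 kbit/s speech (24 octets), 5.3 kbit/s speech (20 octets) or a SID silence frame (4 octets). Treat this as an illustrative assumption to be checked against ITU-T G.723.1 itself; the function name is hypothetical.

```python
from typing import List, Tuple

# Frame type (two LSBs of the first octet) -> (description, frame length in octets).
FRAME_TYPES = {
    0b00: ("6.3 kbit/s speech", 24),
    0b01: ("5.3 kbit/s speech", 20),
    0b10: ("SID", 4),
}


def split_frames(data: bytes) -> List[Tuple[str, bytes]]:
    """Split concatenated G.723.1 frames into (type, frame bytes) pairs."""
    frames = []
    pos = 0
    while pos < len(data):
        frame_type = data[pos] & 0b11
        if frame_type not in FRAME_TYPES:
            raise ValueError(f"reserved frame type at offset {pos}")
        name, size = FRAME_TYPES[frame_type]
        if pos + size > len(data):
            raise ValueError(f"truncated {name} frame at offset {pos}")
        frames.append((name, data[pos:pos + size]))
        pos += size
    return frames


if __name__ == "__main__":
    # High-rate frame, then a rate switch to low rate, then a SID frame.
    stream = bytes([0b00] + [0] * 23 + [0b01] + [0] * 19 + [0b10, 0, 0, 0])
    for name, frame in split_frames(stream):
        print(name, len(frame))
```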

G.723.1 encodes speech or other audio signals in frames using linear predictive analysis-by-synthesis coding. The excitation signal is Multipulse Maximum Likelihood Quantization (MP-MLQ) for the high-rate coder and Algebraic-Code-Excited Linear-Prediction (ACELP) for the low-rate coder. The frame size is 30 ms, with an additional look-ahead of 7.5 ms. The coder is designed to operate on a digital signal obtained by first applying telephone bandwidth filtering (ITU-T Recommendation G.712) to the analogue input, then sampling at 8000 Hz and converting to 16-bit linear PCM for input to the encoder. The output of the decoder is converted back to analogue by similar means.
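At the 8000 Hz sampling rate, the 30 ms frame and 7.5 ms look-ahead translate into sample counts as shown below; the sum of frame length and look-ahead is the codec's algorithmic delay (processing and transmission delays not included). This is plain arithmetic on the figures above, with variable names chosen for illustration.

```python
SAMPLE_RATE_HZ = 8000
FRAME_MS = 30.0
LOOKAHEAD_MS = 7.5


def ms_to_samples(ms: float) -> int:
    """Convert a duration in ms to a whole number of samples at 8000 Hz."""
    return round(SAMPLE_RATE_HZ * ms / 1000)


frame_samples = ms_to_samples(FRAME_MS)          # 240 samples per frame
lookahead_samples = ms_to_samples(LOOKAHEAD_MS)  # 60 samples of look-ahead
algorithmic_delay_ms = FRAME_MS + LOOKAHEAD_MS   # 37.5 ms
input_bytes_per_frame = frame_samples * 2        # 16-bit linear PCM: 2 bytes per sample

if __name__ == "__main__":
    print(frame_samples, lookahead_samples, algorithmic_delay_ms, input_bytes_per_frame)
```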

G.723.1 has been designed to be robust to indicated frame erasures. An error concealment strategy for frame erasures is included in the decoder; however, it must be triggered by an external indication that the bit stream for the current frame has been erased. In H.324 this can be achieved using the AL2 Error Indication (EI) flag and the optional AL2 Sequence Number (SN). Because the coder was designed for burst errors, no error correction mechanism is provided for random bit errors. When a frame erasure has occurred, the decoder switches from regular decoding to frame erasure concealment mode.
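Since concealment must be triggered externally, a receiver has to turn the transport indications into per-frame decisions. The sketch below assumes each received frame carries a sequence number and an error-indication flag, as AL2 can provide; a set EI flag, or a gap in the sequence numbers, yields a "conceal" action. The sequence-number modulus of 256 and the function name are assumptions for illustration, and reordered or duplicated frames are not handled.

```python
from typing import List, Tuple


def plan_decoding(received: List[Tuple[int, bool]], modulus: int = 256) -> List[str]:
    """Turn (sequence_number, error_indication) pairs into decoder actions.

    "decode"  : regular decoding of an intact frame
    "conceal" : frame erasure concealment (flagged or missing frame)
    modulus is an assumed sequence-number range; reordering is not handled.
    """
    actions: List[str] = []
    expected = None  # next sequence number we expect to see
    for seq, error_flag in received:
        if expected is not None:
            missing = (seq - expected) % modulus  # frames lost in transit
            actions.extend(["conceal"] * missing)
        actions.append("conceal" if error_flag else "decode")
        expected = (seq + 1) % modulus
    return actions


if __name__ == "__main__":
    # Sequence number 0 is missing (wrap-around) and frame 2 arrives flagged as erased.
    print(plan_decoding([(254, False), (255, False), (1, False), (2, True)]))
```

Without the optional sequence number, only the EI flag is available, so frames lost entirely in transit could not be detected by this kind of logic.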

G.723.1 contains three annexes. Annex A describes the silence compression system designed for the G.723.1 speech coder (the discontinuous transmission option mentioned above). Annex B describes an alternative implementation of G.723.1 in floating-point C source code. Annex C specifies a channel coding scheme that can be used with the triple rate speech codec G.723.1; the channel codec is scalable in bit-rate and is designed for mobile multimedia applications as part of the overall H.324 family of standards.