5 AMR-WB SCR operation
3GPP TS 26.193, Release 17: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Source controlled rate operation
5.1 Transmit (TX) side
A block diagram of the transmit side SCR functions is shown in Figure 2.
Figure 2: Block diagram of SCR functions at the TX side
5.1.1 General operation
The TX SCR handler passes traffic frames, individually marked by TX_TYPE, to the Framing unit. Each frame consists of bit fields containing the information bits, the codec mode indication, and the TX_TYPE. TX_TYPE shall be used to specify the contents of the frame. The table below provides an overview of the different TX_TYPEs used and explains the required contents in the information bit and the mode indication bit fields.
Table 1: SCR TX_TYPE identifiers for UMTS_AMR-WB and FR_AMR-WB
TX_TYPE | Information Bits | Mode Indication |
SPEECH_GOOD | Speech frame, size 132..477 bits, depending on codec mode | Current codec mode |
SPEECH_BAD | Corrupt speech frame (bad CRC), size 132..477 bits, depending on codec mode | Current codec mode |
SPEECH_LOST | No useful information | No useful information |
SID_FIRST | Marker for the end of talkspurt, no further information, all 35 comfort noise bits set to "0" | The codec mode that would have been used if TX_TYPE had been "SPEECH_GOOD" |
SID_UPDATE | 35 comfort noise bits | The codec mode that would have been used if TX_TYPE had been "SPEECH_GOOD" |
SID_BAD | Corrupt SID update frame (bad CRC) | The codec mode that would have been used if TX_TYPE had been "SPEECH_GOOD" |
NO_DATA | No useful information, nothing to be transmitted | No useful information |
TX_TYPE = "SPEECH_LOST" indicates that the Information Bit and Codec Mode fields do not contain any useful data (but should still be transmitted over the AN). The purpose of this TX_TYPE is to indicate that the frame was transmitted but was lost at some earlier stage. This TX_TYPE may occur only in TFO and TrFO situations. Note that it is possible to replace SPEECH_LOST with SPEECH_BAD, but this may degrade the quality of the error concealment at the receiving end, because the concealment may try to use some of the received parameters from a frame that does not contain any useful information.
TX_TYPE = "NO_DATA" indicates that the Information Bit and Codec Mode fields do not contain any useful data (and should not be transmitted over the AN). The purpose of this TX_TYPE is to provide the option to save transmission capacity between the transcoder and the AN.
Note that the TX_TYPEs "SPEECH_BAD", "SPEECH_LOST" and "SID_BAD" may occur in TFO and TrFO situations.
The scheduling of the frames for transmission on the Access Network is controlled by the TX SCR handler by the use of the TX_TYPE field.
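As an informative illustration of Table 1, the following minimal Python sketch checks the bit fields of a frame against the row for its TX_TYPE. The names `TxFrame` and `fields_match_table1` are hypothetical and do not appear in this specification. The sketch only checks that a speech frame size lies within the 132..477 bit range given in the table; it does not check the exact size for each codec mode.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SPEECH_BITS_MIN, SPEECH_BITS_MAX = 132, 477  # speech frame size range from Table 1
COMFORT_NOISE_BITS = 35                      # comfort noise bits in SID frames


@dataclass(frozen=True)
class TxFrame:
    """Hypothetical container for one traffic frame passed to the Framing unit."""
    tx_type: str
    info_bits: Tuple[int, ...] = ()
    mode: Optional[int] = None  # codec mode indication; None when not meaningful


def fields_match_table1(frame: TxFrame) -> bool:
    """Check the bit fields of a frame against the Table 1 row for its TX_TYPE."""
    n = len(frame.info_bits)
    has_mode = frame.mode is not None
    if frame.tx_type in ("SPEECH_GOOD", "SPEECH_BAD"):
        return has_mode and SPEECH_BITS_MIN <= n <= SPEECH_BITS_MAX
    if frame.tx_type == "SID_FIRST":
        # All 35 comfort noise bits are set to "0".
        return has_mode and n == COMFORT_NOISE_BITS and not any(frame.info_bits)
    if frame.tx_type == "SID_UPDATE":
        return has_mode and n == COMFORT_NOISE_BITS
    if frame.tx_type == "SID_BAD":
        return has_mode  # Table 1 gives no size for the corrupt SID bits
    if frame.tx_type in ("SPEECH_LOST", "NO_DATA"):
        return True      # neither field carries useful information
    return False         # unknown TX_TYPE
```

For example, `fields_match_table1(TxFrame("SID_FIRST", (0,) * 35, mode=2))` holds, whereas a SID_FIRST frame with any comfort noise bit set to 1 is rejected.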
5.1.2 Functions of the TX SCR handler
If TX SCR operation is disabled, the TX SCR handler continuously generates speech frames, i.e. frames marked with TX_TYPE="SPEECH_GOOD".
If the TX SCR operation is enabled, the VAD flag controls the TX SCR handler operation as described in the following paragraphs.
5.1.2.1 AMR-WB SCR Timing procedures
To allow an exact verification of the TX SCR handler functions, all frames before the reset of the system are treated as if they had been speech frames for an infinitely long time. Therefore, and in order to ensure the correct estimation of comfort noise parameters at the RX SCR side, the first 7 frames after the reset or after enabling the SCR operation shall always be marked with TX_TYPE = "SPEECH_GOOD", even if VAD flag = "0" (hangover period, see figure 3).
The Voice Activity Detector (VAD) shall operate all the time in order to assess whether the input signal contains speech or not. The output is a binary flag (VAD flag = "1" or VAD flag = "0", respectively) on a frame-by-frame basis (see [7]).
The VAD flag controls indirectly, via the TX SCR handler operations described below, the overall SCR operation on the transmit side.
Whenever VAD flag = "1", the speech encoder output frame, along with the mode information, shall be passed directly to the AN, marked with TX_TYPE = "SPEECH_GOOD".
At the end of a speech burst (transition from VAD flag = "1" to VAD flag = "0"), it takes eight consecutive frames to make a new updated SID analysis available (see [6]). Normally, the first seven speech encoder output frames after the end of the speech burst shall therefore be passed directly to the AN, marked with TX_TYPE = "SPEECH_GOOD" ("hangover period").
The end of the speech is then indicated by passing frame eight after the end of the speech burst to the AN, marked with TX_TYPE = "SID_FIRST" (see figure 3). SID_FIRST frames do not contain data.
Figure 3: Normal hangover procedure for AMR-WB (Nelapsed > 23)
If, however, at the end of the speech burst, fewer than 24 frames have elapsed since the last SID_UPDATE frame was computed, then this last analysed SID_UPDATE frame should be passed to the AN whenever a SID_UPDATE frame is to be produced, until a new updated SID analysis is available (8 consecutive frames marked with VAD flag = "0"). This reduces the load on the network in cases where short background noise spikes are mistaken for speech, by avoiding the "hangover" wait for the SID frame computation.
Once the SID_FIRST frame has been passed to the AN, the TX SCR handler shall at regular intervals compute and pass updated SID_UPDATE (Comfort Noise) frames to the AN as long as VAD flag = "0". SID_UPDATE frames shall be generated every 8th frame. The first SID_UPDATE shall be sent as the third frame after the SID_FIRST frame.
The speech encoder is operated in full speech mode if TX_TYPE = "SPEECH_GOOD" and otherwise in a simplified mode, because not all encoder functions are required for the evaluation of comfort noise parameters and because comfort noise parameters are only to be generated at certain times.
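The timing procedures above can be sketched as a small scheduler that maps a sequence of VAD flags to TX_TYPEs. This is an informative sketch under stated assumptions, not the normative procedure or the reference implementation. The function name, the counter arrangement and the way Nelapsed is counted (reset on every non-speech frame outside the hangover) are assumptions. So is marking the first frame after a short speech burst as SID_FIRST, and marking the frames between SID frames as NO_DATA. Only the TX_TYPE is modelled, not the frame contents.

```python
from enum import Enum


class TxType(Enum):
    SPEECH_GOOD = "SPEECH_GOOD"
    SID_FIRST = "SID_FIRST"
    SID_UPDATE = "SID_UPDATE"
    NO_DATA = "NO_DATA"


HANGOVER_FRAMES = 7      # speech frames kept after the end of a speech burst
MIN_ELAPSED_FRAMES = 24  # hangover applies only if Nelapsed > 23
SID_UPDATE_PERIOD = 8    # a SID_UPDATE is generated every 8th frame
FIRST_UPDATE_DELAY = 3   # first SID_UPDATE is the 3rd frame after SID_FIRST


def schedule_tx_types(vad_flags):
    """Return one TxType per VAD flag, starting from a system reset."""
    hangover_left = HANGOVER_FRAMES
    elapsed = 10**9  # frames before reset count as an "infinitely long" speech period
    frames_to_update = SID_UPDATE_PERIOD
    previous = TxType.SPEECH_GOOD
    result = []
    for vad in vad_flags:
        elapsed += 1
        if vad:
            hangover_left = HANGOVER_FRAMES
            send_speech = True
        elif hangover_left == 0:
            elapsed = 0  # assumption: Nelapsed restarts at each comfort noise frame
            send_speech = False
        else:
            hangover_left -= 1
            # elapsed + hangover_left is constant over a run of VAD = 0 frames,
            # so the decision taken at the end of the burst holds for the whole run.
            send_speech = (elapsed + hangover_left
                           >= MIN_ELAPSED_FRAMES + HANGOVER_FRAMES - 1)

        if send_speech:
            tx = TxType.SPEECH_GOOD
            frames_to_update = SID_UPDATE_PERIOD
        elif previous is TxType.SPEECH_GOOD:
            tx = TxType.SID_FIRST  # end of speech
            frames_to_update = FIRST_UPDATE_DELAY
        else:
            frames_to_update -= 1
            if frames_to_update == 0:
                tx = TxType.SID_UPDATE
                frames_to_update = SID_UPDATE_PERIOD
            else:
                tx = TxType.NO_DATA
        previous = tx
        result.append(tx)
    return result
```

With 30 speech frames followed by silence, `schedule_tx_types([1] * 30 + [0] * 30)` marks frames 0 to 36 as SPEECH_GOOD (including the seven hangover frames), frame 37 as SID_FIRST, and frames 40, 48 and 56 as SID_UPDATE, with NO_DATA in between.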
5.1.3 The TX part of the AN
The TX part of the AN has the following overall functionality. The transmission is cut after the transmission of a SID_FIRST frame when the speaker stops talking. During speech pauses the transmission is resumed at regular intervals for transmission of one SID_UPDATE frame, in order to update the generated comfort noise on the RX side. The operation of the TX part of the AN is controlled by the TX SCR handler via the TX_TYPE.
All frames marked with SPEECH_GOOD, SID_FIRST or SID_UPDATE shall be transmitted by the TX part of the AN.
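The behaviour of the TX part of the AN can be illustrated with a short Python sketch that selects, from a sequence of TX_TYPEs, the frames that are transmitted. The function name `frames_on_air` is hypothetical. The sketch covers only the four TX_TYPEs generated by a local TX SCR handler; SPEECH_BAD, SPEECH_LOST and SID_BAD, which may occur in TFO and TrFO situations, are deliberately left out and raise an error.

```python
TRANSMITTED = frozenset({"SPEECH_GOOD", "SID_FIRST", "SID_UPDATE"})
NOT_TRANSMITTED = frozenset({"NO_DATA"})


def frames_on_air(tx_types):
    """Return the indices of the frames that the TX part of the AN transmits."""
    indices = []
    for i, tx_type in enumerate(tx_types):
        if tx_type in TRANSMITTED:
            indices.append(i)
        elif tx_type not in NOT_TRANSMITTED:
            # TFO/TrFO-only types are outside the scope of this sketch.
            raise ValueError(f"TX_TYPE {tx_type!r} is not handled by this sketch")
    return indices


# One speech frame, then a speech pause with two SID_UPDATE frames.
pause = (["SPEECH_GOOD", "SID_FIRST", "NO_DATA", "NO_DATA", "SID_UPDATE"]
         + ["NO_DATA"] * 7 + ["SID_UPDATE"])
print(frames_on_air(pause))  # [0, 1, 4, 12]
```

In this example only 4 of the 13 frames are transmitted, which is the transmission saving that SCR operation aims at during speech pauses.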
5.2 Receive (RX) side
A block diagram of the receive side SCR functions is shown in Figure 4 below.
Figure 4: Block diagram of the receive side SCR functions
5.2.1 General operation
The AN passes all the received traffic frames to the RX SCR handler, classified with RX_TYPE, as described in Table 2 (see TS 26.201). The RX SCR handler handles each frame accordingly.
Table 2: RX_TYPE identifiers for AMR-WB
RX_TYPE | Information Bits |
SPEECH_GOOD | Speech frame without detected errors. |
SPEECH_BAD | (Likely) speech frame with bad CRC (or estimated to be very bad by the RX part of the AN) |
SPEECH_LOST | No frame received. Indicates that this frame was transmitted, but never received. |
SID_FIRST | This SID-frame marks the beginning of a comfort noise period. |
SID_UPDATE | Correct SID update frame |
SID_BAD | Corrupt SID update frame (bad CRC; applicable only for SID_UPDATE frames) |
NO_DATA | Nothing usable was received. The synthesis mode of the previous frame type is used. |
5.2.3 Demands on the RX SCR handler
The RX SCR handler is responsible for the overall SCR operation on the RX side. It consists of two main modes: SPEECH and COMFORT_NOISE. The initial mode shall be SPEECH.
The SCR operation on the RX side shall be as follows:
– the RX SCR handler shall enter mode SPEECH when a frame classified as SPEECH_GOOD is received. Whenever a frame classified as SPEECH_GOOD is received, the RX SCR handler shall pass it directly on to the speech decoder;
– if the RX SCR handler is in mode SPEECH, then frames classified as SPEECH_BAD, SPEECH_LOST, or NO_DATA shall be substituted and muted as defined in [5]. Frames classified as NO_DATA shall be handled like SPEECH_LOST frames without valid speech information;
– if the error concealment of RX SCR handler does not support the RX_TYPE=SPEECH_LOST, then frames classified as SPEECH_LOST shall be substituted with RX_TYPE=SPEECH_BAD;
– frames classified as SID_FIRST, SID_UPDATE or SID_BAD shall bring the RX SCR handler into mode COMFORT_NOISE and shall result in comfort noise generation, as defined in [6]. SID_BAD frames shall be substituted and muted as defined in [5];
– in mode COMFORT_NOISE the RX SCR handler shall ignore all unusable frames (NO_DATA, SPEECH_BAD); comfort noise generation shall continue, subject to any timeout that may apply (see [5]).
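The rules above can be sketched as a two-mode state machine. The following Python sketch is informative only. The function name `rx_scr_step` and the action labels are hypothetical; the actual substitution, muting and comfort noise generation are defined in [5] and [6] and are not modelled. Treating SPEECH_LOST as an unusable frame in mode COMFORT_NOISE is an assumption, since the list above names only NO_DATA and SPEECH_BAD, and the timeout is omitted.

```python
from enum import Enum


class RxType(Enum):
    SPEECH_GOOD = 1
    SPEECH_BAD = 2
    SPEECH_LOST = 3
    SID_FIRST = 4
    SID_UPDATE = 5
    SID_BAD = 6
    NO_DATA = 7


class Mode(Enum):
    SPEECH = 1
    COMFORT_NOISE = 2


SID_TYPES = (RxType.SID_FIRST, RxType.SID_UPDATE, RxType.SID_BAD)


def rx_scr_step(mode, rx_type, supports_speech_lost=True):
    """Return (new_mode, action) for one received frame."""
    if rx_type is RxType.SPEECH_LOST and not supports_speech_lost:
        rx_type = RxType.SPEECH_BAD  # fallback when concealment lacks SPEECH_LOST
    if rx_type is RxType.SPEECH_GOOD:
        return Mode.SPEECH, "decode_speech"
    if rx_type in SID_TYPES:
        if rx_type is RxType.SID_BAD:
            return Mode.COMFORT_NOISE, "substitute_and_mute_sid"
        return Mode.COMFORT_NOISE, "generate_comfort_noise"
    # Remaining types: SPEECH_BAD, SPEECH_LOST, NO_DATA.
    if mode is Mode.SPEECH:
        return Mode.SPEECH, "substitute_and_mute_speech"
    # Assumption: SPEECH_LOST is ignored like NO_DATA and SPEECH_BAD here.
    return Mode.COMFORT_NOISE, "generate_comfort_noise"
```

Starting from the initial mode SPEECH, a SID_FIRST frame switches the handler to COMFORT_NOISE; subsequent NO_DATA frames leave it there, and the next SPEECH_GOOD frame returns it to SPEECH.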
5.3 AMR-WB SID Information format
When the TX SCR handler is ordered by the network to operate in AMR-WB mode with SCR operation enabled, the SID_UPDATE frame format is according to [5]. This is the default and only mandatory operating mode of the SCR handler.
Annex A (normative):
AMR-WB DTX handler for the GSM system