5 AMR-WB SCR operation

3GPP TS 26.193, Release 17: Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Source controlled rate operation

5.1 Transmit (TX) side

A block diagram of the transmit side SCR functions is shown in Figure 2.

Figure 2: Block diagram of SCR functions at the TX side

5.1.1 General operation

The TX SCR handler passes traffic frames, individually marked by TX_TYPE, to the Framing unit. Each frame consists of bit fields containing the information bits, the codec mode indication, and the TX_TYPE. TX_TYPE shall be used to specify the contents of the frame. The table below provides an overview of the different TX_TYPEs used and explains the required contents in the information bit and the mode indication bit fields.

Table 1: SCR TX_TYPE identifiers for UMTS_AMR-WB and FR_AMR-WB

| TX_TYPE | Information Bits | Mode Indication |
|---|---|---|
| SPEECH_GOOD | Speech frame, size 132..477 bits, depending on codec mode | Current codec mode |
| SPEECH_BAD | Corrupt speech frame (bad CRC), size 132..477 bits, depending on codec mode | Current codec mode |
| SPEECH_LOST | No useful information. (Note: If implementation does not support the SPEECH_LOST, SPEECH_BAD shall be used instead) | No useful information |
| SID_FIRST | Marker for the end of talkspurt, no further information, all 35 comfort noise bits set to "0" | The codec mode that would have been used if TX_TYPE had been "SPEECH_GOOD" |
| SID_UPDATE | 35 comfort noise bits | The codec mode that would have been used if TX_TYPE had been "SPEECH_GOOD" |
| SID_BAD | Corrupt SID update frame (bad CRC) | The codec mode that would have been used if TX_TYPE had been "SPEECH_GOOD" |
| NO_DATA | No useful information, nothing to be transmitted | No useful information |

TX_TYPE = "SPEECH_LOST" indicates that the Information Bit and Codec Mode fields do not contain any useful data (but should still be transmitted over AN). The purpose of this TX_TYPE is to indicate that the frame was transmitted but lost at some earlier stage. This TX_TYPE may occur only in TFO and TrFO situations. Note that it is possible to replace SPEECH_LOST with SPEECH_BAD, but this may degrade the quality of the error concealment at the receiving end, because the concealment may try to use some of the received parameters from a frame that does not contain any useful information.

TX_TYPE = "NO_DATA" indicates that the Information Bit and Codec Mode fields do not contain any useful data (and should not be transmitted over AN). The purpose of this TX_TYPE is to provide the option to save network transmission between the transcoder and AN.

Note that the TX_TYPEs "SPEECH_BAD", "SPEECH_LOST" and "SID_BAD" may occur in TFO and TrFO situations.

The scheduling of the frames for transmission on the Access Network is controlled by the TX SCR handler by the use of the TX_TYPE field.
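The field contents listed in Table 1 can be illustrated with a small consistency check. The following is a minimal sketch, not part of the specification: the `TxFrame` container, the `contents_match_table1` helper and the use of an integer codec mode index are assumptions made for illustration. Only the size range 132..477 is checked for speech frames, since the exact size per codec mode is not given in this clause.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SPEECH_MIN_BITS = 132  # smallest speech frame size stated in Table 1
SPEECH_MAX_BITS = 477  # largest speech frame size stated in Table 1
SID_BITS = 35          # number of comfort noise bits stated in Table 1


@dataclass(frozen=True)
class TxFrame:
    """Hypothetical container for one traffic frame (illustration only)."""
    tx_type: str               # one of the TX_TYPE identifiers of Table 1
    info_bits: Sequence[int]   # information bits, each 0 or 1
    codec_mode: Optional[int]  # assumed integer mode index; None if not useful


def contents_match_table1(frame: TxFrame) -> bool:
    """Check the field contents that Table 1 states for frame.tx_type."""
    n = len(frame.info_bits)
    has_mode = frame.codec_mode is not None
    if frame.tx_type in ("SPEECH_GOOD", "SPEECH_BAD"):
        return has_mode and SPEECH_MIN_BITS <= n <= SPEECH_MAX_BITS
    if frame.tx_type == "SID_FIRST":
        # All 35 comfort noise bits are set to "0".
        return has_mode and n == SID_BITS and not any(frame.info_bits)
    if frame.tx_type == "SID_UPDATE":
        return has_mode and n == SID_BITS
    if frame.tx_type == "SID_BAD":
        # Table 1 states no size for the corrupt payload, so none is checked.
        return has_mode
    if frame.tx_type in ("SPEECH_LOST", "NO_DATA"):
        # Neither field carries useful information, so nothing is checked.
        return True
    raise ValueError(f"unknown TX_TYPE: {frame.tx_type}")
```

For example, `contents_match_table1(TxFrame("SID_FIRST", [0] * 35, 2))` holds, whereas a SID_FIRST frame with any comfort noise bit set to 1 does not.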

5.1.2 Functions of the TX SCR handler

If TX SCR operation is disabled, the TX SCR handler continuously generates speech frames, i.e. frames marked with TX_TYPE="SPEECH_GOOD".

If the TX SCR operation is enabled, the VAD flag controls the TX SCR handler operation as described in the following paragraphs.

5.1.2.1 AMR-WB SCR Timing procedures

To allow an exact verification of the TX SCR handler functions, all frames before the reset of the system are treated as if they had been speech frames for an infinitely long time. Therefore, and in order to ensure the correct estimation of comfort noise parameters at the RX SCR side, the first 7 frames after the reset or after enabling the SCR operation shall always be marked with TX_TYPE = "SPEECH_GOOD", even if VAD flag ="0" (hangover period, see figure 3).

The Voice Activity Detector (VAD) shall operate all the time in order to assess whether the input signal contains speech or not. The output is a binary flag (VAD flag ="1" or VAD flag ="0", respectively) on a frame by frame basis (see [7]).

The VAD flag controls indirectly, via the TX SCR handler operations described below, the overall SCR operation on the transmit side.

Whenever VAD flag ="1", the speech encoder output frame along with mode information shall be passed directly to the AN, marked with TX_TYPE = "SPEECH_GOOD".

At the end of a speech burst (transition VAD flag ="1" to VAD flag ="0"), it takes eight consecutive frames to make a new updated SID analysis available (see [6]). Normally, the first seven speech encoder output frames after the end of the speech burst shall therefore be passed directly to the AN, marked with TX_TYPE = "SPEECH_GOOD" ("hangover period").

The end of the speech is then indicated by passing frame eight after the end of the speech burst to the AN, marked with TX_TYPE = "SID_FIRST" (see figure 3). SID_FIRST frames do not contain data.

Figure 3: Normal hangover procedure for AMR-WB (Nelapsed > 23)

If, however, at the end of the speech burst, fewer than 24 frames have elapsed since the last SID_UPDATE frame was computed, then this last analysed SID_UPDATE frame should be passed to the AN whenever a SID_UPDATE frame is to be produced, until a new updated SID analysis is available (8 consecutive frames marked with VAD flag ="0"). This reduces the load on the network in cases where short background noise spikes are taken for speech, by avoiding the "hangover" waiting for the SID frame computation.

Once the SID_FIRST frame has been passed to the AN, the TX SCR handler shall at regular intervals compute and pass updated SID_UPDATE (Comfort Noise) frames to the AN as long as VAD flag = "0". SID_UPDATE frames shall be generated every 8th frame. The first SID_UPDATE shall be sent as the third frame after the SID_FIRST frame.

The speech encoder is operated in full speech modality if TX_TYPE = "SPEECH_GOOD" and otherwise in a simplified mode, because not all encoder functions are required for the evaluation of comfort noise parameters and because comfort noise parameters are only to be generated at certain times.
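The timing rules of this subclause can be sketched as a small scheduler that maps a sequence of VAD flags to TX_TYPEs. This is an illustrative sketch under stated assumptions, not the normative handler: it models only the normal hangover procedure (Nelapsed > 23) and does not model the case where fewer than 24 frames have elapsed since the last SID_UPDATE. It also assumes that frames between SID frames are marked NO_DATA. The names `TxType` and `schedule_tx_types` are hypothetical.

```python
from enum import Enum


class TxType(Enum):
    SPEECH_GOOD = "SPEECH_GOOD"
    SID_FIRST = "SID_FIRST"
    SID_UPDATE = "SID_UPDATE"
    NO_DATA = "NO_DATA"


HANGOVER_FRAMES = 7      # frames still marked SPEECH_GOOD after a speech burst
FIRST_UPDATE_OFFSET = 3  # first SID_UPDATE is the third frame after SID_FIRST
UPDATE_PERIOD = 8        # afterwards a SID_UPDATE is generated every 8th frame


def schedule_tx_types(vad_flags):
    """Return one TxType per VAD flag (normal hangover procedure only)."""
    out = []
    # Frames before the reset are treated as speech, so the first frames
    # after the reset are covered by a full hangover period.
    hangover_left = HANGOVER_FRAMES
    in_pause = False  # True once SID_FIRST has been sent for this pause
    countdown = 0     # frames remaining until the next SID_UPDATE
    for vad in vad_flags:
        if vad:
            hangover_left = HANGOVER_FRAMES
            in_pause = False
            out.append(TxType.SPEECH_GOOD)
        elif hangover_left > 0:
            hangover_left -= 1
            out.append(TxType.SPEECH_GOOD)
        elif not in_pause:
            in_pause = True
            countdown = FIRST_UPDATE_OFFSET
            out.append(TxType.SID_FIRST)
        else:
            countdown -= 1
            if countdown == 0:
                countdown = UPDATE_PERIOD
                out.append(TxType.SID_UPDATE)
            else:
                # Assumption: nothing is to be transmitted between SID frames.
                out.append(TxType.NO_DATA)
    return out
```

With one speech frame followed by a long pause, `schedule_tx_types([1] + [0] * 20)` yields eight SPEECH_GOOD frames (the speech frame plus seven hangover frames), then SID_FIRST, two NO_DATA frames, a SID_UPDATE, and a further SID_UPDATE eight frames later.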

5.1.3 The TX part of the AN

The TX part of the AN has the following overall functionality. The transmission is cut after the transmission of a SID_FIRST frame when the speaker stops talking. During speech pauses the transmission is resumed at regular intervals for transmission of one SID_UPDATE frame, in order to update the generated comfort noise on the RX side. The operation of the TX part of the AN is controlled by the TX SCR handler via the TX_TYPE.

All frames marked with SPEECH_GOOD, SID_FIRST or SID_UPDATE shall be transmitted by the TX part of the AN.

5.2 Receive (RX) side

A block diagram of the receive side SCR functions is shown in Figure 4 below.

Figure 4: Block diagram of the receive side SCR functions

5.2.1 General operation

The AN passes all the received traffic frames to the RX SCR handler, classified with RX_TYPE, as described in Table 2 (see TS 26.201). The RX SCR handler processes each frame accordingly.

Table 2: RX_TYPE identifiers for AMR-WB

| RX_TYPE | Information Bits |
|---|---|
| SPEECH_GOOD | Speech frame without detected errors. |
| SPEECH_BAD | (likely) speech frame with bad CRC (or estimated to be very bad by the RX part of the AN) |
| SPEECH_LOST | No frame received. Indicates that this frame was transmitted, but never received. |
| SID_FIRST | This SID-frame marks the beginning of a comfort noise period. |
| SID_UPDATE | Correct SID update frame |
| SID_BAD | Corrupt SID update frame (bad CRC; applicable only for SID_UPDATE frames) |
| NO_DATA | Nothing useable was received. The synthesis mode of the previous frame type is used. |

5.2.3 Demands on the RX SCR handler

The RX SCR handler is responsible for the overall SCR operation on the RX side. It consists of two main modes: SPEECH and COMFORT_NOISE. The initial mode shall be SPEECH.

The SCR operation on the RX side shall be as follows:

– The RX SCR handler shall enter mode SPEECH, when a frame classified as SPEECH_GOOD is received.

Whenever a frame classified as SPEECH_GOOD is received the RX SCR handler shall pass it directly on to the speech decoder;

– if the RX SCR handler is in mode SPEECH, then frames classified as SPEECH_BAD, SPEECH_LOST, or NO_DATA shall be substituted and muted as defined in [5]. Frames classified as NO_DATA shall be handled like SPEECH_LOST frames without valid speech information;

– if the error concealment of RX SCR handler does not support the RX_TYPE=SPEECH_LOST, then frames classified as SPEECH_LOST shall be substituted with RX_TYPE=SPEECH_BAD;

– frames classified as SID_FIRST, SID_UPDATE or SID_BAD shall bring the RX SCR handler into mode COMFORT_NOISE and shall result in comfort noise generation, as defined in [6]. SID_BAD frames shall be substituted and muted as defined in [5];

– in mode COMFORT_NOISE the RX SCR handler shall ignore all unusable frames (NO_DATA, SPEECH_BAD); comfort noise generation shall continue until a timeout, if any, applies (see [5]).
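The rules above can be sketched as a two-mode state machine. This is a minimal illustration under stated assumptions, not the normative handler: the names `RxType`, `Mode`, `Action` and `rx_scr_step` are hypothetical, the timeout of [5] is not modelled, and SPEECH_LOST received in mode COMFORT_NOISE is assumed to be ignored like the other unusable frames, which the list above does not state explicitly.

```python
from enum import Enum, auto


class RxType(Enum):
    SPEECH_GOOD = auto()
    SPEECH_BAD = auto()
    SPEECH_LOST = auto()
    SID_FIRST = auto()
    SID_UPDATE = auto()
    SID_BAD = auto()
    NO_DATA = auto()


class Mode(Enum):
    SPEECH = auto()
    COMFORT_NOISE = auto()


class Action(Enum):
    DECODE = auto()                     # pass the frame to the speech decoder
    SUBSTITUTE_AND_MUTE = auto()        # error concealment as defined in [5]
    COMFORT_NOISE = auto()              # comfort noise generation as in [6]
    COMFORT_NOISE_SUBSTITUTED = auto()  # SID_BAD: substituted and muted per [5]


def rx_scr_step(mode, rx_type, supports_speech_lost=True):
    """Return (new_mode, action, effective_rx_type) for one received frame."""
    # A handler without SPEECH_LOST support treats the frame as SPEECH_BAD.
    if rx_type is RxType.SPEECH_LOST and not supports_speech_lost:
        rx_type = RxType.SPEECH_BAD
    if rx_type is RxType.SPEECH_GOOD:
        return Mode.SPEECH, Action.DECODE, rx_type
    if rx_type in (RxType.SID_FIRST, RxType.SID_UPDATE):
        return Mode.COMFORT_NOISE, Action.COMFORT_NOISE, rx_type
    if rx_type is RxType.SID_BAD:
        return Mode.COMFORT_NOISE, Action.COMFORT_NOISE_SUBSTITUTED, rx_type
    # Remaining types: SPEECH_BAD, SPEECH_LOST, NO_DATA.
    if mode is Mode.SPEECH:
        return Mode.SPEECH, Action.SUBSTITUTE_AND_MUTE, rx_type
    # Mode COMFORT_NOISE: unusable frames are ignored (assumed to include
    # SPEECH_LOST) and comfort noise generation continues.
    return Mode.COMFORT_NOISE, Action.COMFORT_NOISE, rx_type
```

Starting from the initial mode `Mode.SPEECH`, a caller would feed each classified frame to `rx_scr_step` and carry the returned mode forward to the next frame.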

5.3 AMR-WB SID Information format

When the TX SCR handler is ordered by the network to operate in AMR-WB mode with SCR operation enabled, the SID_UPDATE frame format is according to [5]. This is the default and only mandatory operating mode of the SCR handler.

Annex A (normative):
AMR-WB DTX handler for the GSM system