7 Quality under background noise conditions
3GPP TS 46.008 (Release 17): Half rate speech; Performance Characterization of the GSM Half Rate speech codec
7.1 Experiments 4 and 5
International subjective test programs have been conducted in the past, by both the ITU and ETSI, to investigate the effects of environmental noise. This has proved to be a difficult area to evaluate, and more satisfactory methodologies are continually being sought to improve the accuracy of these tests. Several methodologies have been used recently to investigate this factor:
a) the ACR (Absolute Category Rating) method using the classical Quality scale (second selection phase of the GSM half rate speech coding algorithm candidate, 1992);
b) the ACR method using the Listening Effort scale (second pre-selection test of the GSM Half Rate candidate, 1992);
c) the DCR (Degradation Category Rating) method, as in the ITU-T test methodology for the 16 kbit/s and 8 kbit/s speech coders, which is an adapted version of the standard DCR procedure (described in ITU-T Recommendation P.80) and in which several types of noise at different Signal-to-Noise Ratios were evaluated in a single experiment;
d) the DCR procedure adapted as in the first pre-selection phase of testing for the GSM half rate candidates in 1991, where only one type of noise was tested in each experiment in order to prevent the noise from being the predominant factor within the test; two experiments were therefore designed to cover two types of noise: babble noise at an SNR of 30 dB and vehicle noise at an SNR of 10 dB.
Analysis of results gathered from these four experimental designs led to the conclusion that the last procedure – DCR test per noise (d) – is the most appropriate one to study the effects of environmental noise on a codec’s behaviour.
For the final characterization phase of testing, it was decided to follow up two methodologies: the ACR and the DCR methods, i.e. to formally compare two distinct modes of collecting the subjects’ responses with exactly the same experimental test plan (four 24 x 24 interleaved graeco-latin squares). The following environmental noises were considered of interest: office babble, vehicular, and traffic.
A listening-only test was chosen. For Exp. 4, the Absolute Category Rating (ACR) method was adopted, and subjective tests were carried out by BT (United Kingdom) and DEUTSCHE TELEKOM (Germany). For Exp. 5, a modified version of the Degradation Category Rating (DCR) method was agreed, and subjective tests were carried out by CNET (France) and CSELT (Italy).
Tables 4 and 5 report the results obtained in experiments 4 and 5, respectively: each cell shows the difference, in terms of equivalent Q values, between the candidate and the full rate, negative values meaning worse performance than the full rate.
Table 4: Results from experiment 4 (ACR)
Noise | Office Babble | Vehicular | Traffic |
Low Noise | -0,78 | -2,19 | -1,06 |
High Noise | -1,75 | -0,87 | -1,25 |
Low Noise Tandem | -1,75 | -2,38 | -2,66 |
High Noise Tandem | -2,99 | -4,10 | -3,09 |
NOTE: The figures indicate DQ values in dB, where DQ = QHR – QFR.
Table 5: Results from experiment 5 (DCR)
Noise | Office Babble | Vehicular | Traffic |
Low Noise | -2,10 | -2,96 | -4,53 |
High Noise | -2,79 | -2,83 | -2,04 |
Low Noise Tandem | -4,03 | -4,39 | -5,31 |
High Noise Tandem | -4,96 | -5,85 | -5,68 |
NOTE: The figures indicate DQ values in dB, where DQ = QHR – QFR.
The main conclusion that can be drawn is that the performance of the half rate codec is worse than that of the full rate in every condition tested. The amount of perceived degradation, in terms of DQ in dB, depends on the method chosen for the test, DCR being clearly more discriminating than ACR. The background noise effect is most pronounced in tandem conditions.
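As a cross-check on this conclusion, the following minimal Python sketch re-tabulates the DQ values of Tables 4 and 5 and compares them. The plain averaging over cells is an illustrative assumption, not the statistical analysis used in the characterization tests, and all names (`ACR`, `DCR`, `tandem_split`, etc.) are hypothetical.

```python
from statistics import mean

# Column order in every row: Office Babble, Vehicular, Traffic.
# Values are DQ = QHR - QFR in dB, copied from Tables 4 and 5.
ACR = {
    "Low Noise": (-0.78, -2.19, -1.06),
    "High Noise": (-1.75, -0.87, -1.25),
    "Low Noise Tandem": (-1.75, -2.38, -2.66),
    "High Noise Tandem": (-2.99, -4.10, -3.09),
}
DCR = {
    "Low Noise": (-2.10, -2.96, -4.53),
    "High Noise": (-2.79, -2.83, -2.04),
    "Low Noise Tandem": (-4.03, -4.39, -5.31),
    "High Noise Tandem": (-4.96, -5.85, -5.68),
}


def all_values(table):
    """Flatten a table into a list of its DQ values."""
    return [v for row in table.values() for v in row]


def tandem_split(table):
    """Return (mean DQ of single-encoding rows, mean DQ of tandem rows)."""
    single = [v for name, row in table.items() if "Tandem" not in name for v in row]
    tandem = [v for name, row in table.items() if "Tandem" in name for v in row]
    return mean(single), mean(tandem)


def dcr_below_acr_everywhere(acr, dcr):
    """True if every DCR cell is more negative than the matching ACR cell."""
    return all(d < a for name in acr for a, d in zip(acr[name], dcr[name]))


if __name__ == "__main__":
    for label, table in (("ACR", ACR), ("DCR", DCR)):
        single, tandem = tandem_split(table)
        print(f"{label}: mean DQ {mean(all_values(table)):.2f} dB, "
              f"single {single:.2f} dB, tandem {tandem:.2f} dB")
    print("DCR below ACR in every cell:", dcr_below_acr_everywhere(ACR, DCR))
```

With the tabulated values, the unweighted mean DQ is about -2,07 dB for ACR and about -3,96 dB for DCR, and the tandem rows average lower than the single-encoding rows under both methods.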
7.2 Special background noise
7.2.1 Introduction
Some informal listening sessions were carried out to further investigate background noise effects. Speech samples from four different talkers were electronically mixed, at three different Signal-to-Noise Ratios (5 dB, 10 dB, and 20 dB), with a wide range of different background noises, reflecting the following types of environment:
– Industrial Setting;
– Babble (offices and public places such as airports);
– Trains;
– Cars and Lorries;
– Roadside.
These were processed through a simulation of the Half Rate codec (with no DTX) and were listened to (on an informal basis) under controlled listening conditions using headphones.
No formal method of voting or opinion collation was employed; observations were simply noted.
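The electronic mixing of speech and noise at a target Signal-to-Noise Ratio described above can be sketched as follows. This is a minimal illustration only: it assumes that SNR is defined as the ratio of mean signal powers over the whole sample, which the text does not state (the original sessions may have used a different level measure), and the function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db`
    (in dB), then add it to `speech` sample by sample.

    Returns (mixed_samples, noise_gain). Assumes both inputs have the
    same length and non-zero power.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Required noise power is p_speech / 10^(SNR/10); the gain is the
    # amplitude ratio, hence the square root.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


if __name__ == "__main__":
    # Synthetic stand-ins for real recordings: a tone and a sawtooth-like noise.
    speech = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
    noise = [((n * 7919) % 101) / 50.0 - 1.0 for n in range(800)]
    for target in (5, 10, 20):
        mixed, gain = mix_at_snr(speech, noise, target)
        print(f"target {target} dB: noise gain {gain:.3f}")
```

Summing the signals electrically in this way differs from recording speech in a noisy environment, where the talker and the acoustic path both interact with the noise.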
7.2.2 Observations
At the lower Signal-to-Noise Ratios, the speech was often unintelligible without considerable concentration and effort on the part of the listener. In some cases, even where the listener was familiar with the speech material, it was impossible to understand some parts of the speech.
The codec had the effect of making the background noises sound "babbley", which, for example, made most background noises sound more "busy". This effect was particularly bad at 5 dB SNR. At 10 dB, the listening was more comfortable although parts of it were still difficult to understand. At 20 dB the speech was clearly understandable, although the noise was still "babbley".
For the -12 dB and -22 dB input levels, peak clipping also distorted the speech. Understandably, this effect was worse for the higher input level and for the higher Signal-to-Noise Ratios.
When considering these results, it should be borne in mind that the listening was informal and used headphones, not a handset. Also, the use of electrically summed speech and noise will not give the same results as would have been obtained if the speech had actually been recorded in the noisy environment.