G.2 Theory to derive the numbers for statistical testing (informative)
3GPP TS 38.521-4, Release 17: NR; User Equipment (UE) conformance specification; Radio transmission and reception; Part 4: Performance
Editor’s note: This clause of Annex G is for information only. It describes the background theory and information for statistical testing.
G.2.1 Error Ratio (ER)
The Error Ratio (ER) is defined as the ratio of the number of errors (ne) to the number of all results, i.e. the number of samples (ns): ER = ne/ns.
(1-ER is the success ratio).
G.2.2 Test Design
A statistical test is characterized by:
Test-time, Selectivity and Confidence level.
G.2.3 Confidence level
The outcome of a statistical test is a decision. This decision may be correct or incorrect. The Confidence Level CL describes the probability that the decision is correct. The complement is the wrong decision probability (risk) D = 1-CL.
G.2.4 Introduction: Supplier Risk versus Customer Risk
There are two targets of decision:
(a) A measurement on the pass-limit shows that the DUT has the specified quality or is better, with probability CL (e.g. CL = 95 %). This shall lead to a "pass decision".
The pass-limit is on the good side of the specified DUT-quality. A more stringent CL (e.g. CL = 99 %) shifts the pass-limit further in the good direction. Given that the quality of the DUTs is distributed, a greater CL passes fewer, but better, DUTs.
A measurement on the bad side of the pass-limit is simply "not pass" (undecided or artificial fail).
(aa) Complementary:
A measurement on the fail-limit shows that the DUT is worse than the specified quality, with probability CL.
The fail-limit is on the bad side of the specified DUT-quality. A more stringent CL shifts the fail-limit further in the bad direction. Given that the quality of the DUTs is distributed, a greater CL fails fewer, but worse, DUTs.
A measurement on the good side of the fail-limit is simply "not fail".
(b) A DUT, known to have the specified quality, shall be measured and decided pass with probability CL. This leads to the test limit.
For e.g. CL = 95 %, the test limit is on the bad side of the specified DUT-quality. A CL of e.g. 99 % shifts the test limit further in the bad direction. Given that the DUT-quality is distributed, a greater CL passes more, and worse, DUTs.
(bb) A DUT, known to be an (ε → 0) beyond the specified quality, shall be measured and decided fail with probability CL.
For e.g. CL = 95 %, the test limit is on the good side of the specified DUT-quality.
NOTE 1: CL has a different sense in (a), (aa) than in (b), (bb).
NOTE 2: For constant CL in all four bullets, (a) is equivalent to (bb) and (aa) is equivalent to (b).
G.2.5 Supplier Risk versus Customer Risk
The table below summarizes the different targets of decision.
Table G.2.5-1: Equivalent statements
Equivalent statements, using different cause-to-effect directions, and assuming CL = constant > 1/2
| Cause-to-effect direction | Known measurement result → estimation of the DUT’s quality | Known DUT’s quality → estimation of the measurement’s outcome |
| Supplier Risk | A measurement on the pass-limit shows that the DUT has the specified quality or is better (a) | A DUT, known to be an (ε → 0) beyond the specified DUT-quality, shall be measured and decided fail (bb) |
| Customer Risk | A measurement on the fail-limit shows that the DUT is worse than the specified quality (aa) | A DUT, known to have the specified quality, shall be measured and decided pass (b) |
The shaded area shows the direct interpretation of Supplier Risk and Customer Risk.
The same statements can be based on other DUT-quality-definitions.
G.2.6 Introduction: Standard test versus early decision concept
In standard statistical tests, a certain number of results (ns) is predefined before the test. After ns results, the number of bad results (ne) is counted and the error ratio (ER) is calculated as ne/ns.
Applying statistical theory, a decision limit can be designed, against which the calculated ER is compared to derive the decision. Such a limit is one decision point and is characterized by:
– D: the wrong decision probability (a predefined parameter)
– ns: the number of results (a fixed predefined parameter)
– ne: the number of bad results (the limit based on just ns)
In the formula for the limit, D can be understood as a variable parameter and ns as a variable. However, the standard test execution requires fixed ns and D. Such a test discriminates between two states only, depending on the test design:
– pass (with CL) / undecided (undecided in the sense: finally undecided)
– fail (with CL) / undecided (undecided in the sense: finally undecided)
– pass (with CL) / fail (with CL) (however against two limits).
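The single decision point of a standard test can be illustrated with a minimal sketch. It assumes that bad results are independent, so that ne follows a binomial distribution for a fixed ns; the function names and the choice of rounding are illustrative and are not taken from this specification. `fail_limit` gives the count from which "fail (with CL)" may be declared against one error ratio, and `pass_limit` gives the count up to which "pass (with CL)" may be declared against a second, worse error ratio.

```python
from math import comb


def binom_cdf(k: int, ns: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(ns, p); evaluates to 0 when k < 0."""
    return sum(comb(ns, i) * p**i * (1.0 - p) ** (ns - i) for i in range(k + 1))


def fail_limit(ns: int, er: float, d: float):
    """Smallest ne that a DUT with true error ratio `er` reaches or
    exceeds with probability <= d (illustrative rounding choice)."""
    for ne in range(ns + 1):
        if 1.0 - binom_cdf(ne - 1, ns, er) <= d:
            return ne
    return None  # ns is too small for any fail decision at this d


def pass_limit(ns: int, er: float, d: float):
    """Largest ne that a DUT with true error ratio `er` stays at or
    below with probability <= d (illustrative rounding choice)."""
    best = None
    for ne in range(ns + 1):
        if binom_cdf(ne, ns, er) > d:
            break
        best = ne
    return best  # None if ns is too small for any pass decision
```

For example, `fail_limit(ns, 0.05, 0.05)` is the fail limit for D = 5 % against an error ratio of 0.05. A measured ne that lies between the pass limit for the worse error ratio and this fail limit remains finally undecided in the standard test.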
In contrast to the standard statistical tests, the early decision concept predefines a set of (ne,ns) co-ordinates, representing the limit-curve for decision. After each result a preliminary ER is calculated and compared against the limit-curve. After each result one may either make the decision or remain undecided and defer the decision. The parameters and variables in the limit-curve for the early decision concept have a similar but not identical meaning:
– D: the wrong decision probability (a predefined parameter)
– ns: the number of results (a variable parameter)
– ne: the number of bad results (the limit. It varies together with ns)
To avoid a "final undecided" in the standard test, a second limit shall be introduced, and the single decision co-ordinate (ne,ns) needs a high ne, leading to a fixed (high) test time. In the early decision concept, with the same selectivity and the same confidence level, an "undecided" need not be avoided, as it can be decided later. A perfect DUT will hit the decision co-ordinate (ne,ns) with ne=0, so its test time is short.
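The early decision procedure described above can be sketched as a loop over results. This is a minimal illustration, not the normative procedure: the limit-curve is represented by two hypothetical lookup tables, `pass_ns` and `fail_ns`, that map ne to a sample count, and the numbers used in the usage example are toy values. Comparing ns against a tabulated sample count at a given ne is equivalent to comparing the preliminary ER = ne/ns against the limit-curve.

```python
from typing import Dict, Iterable, Tuple


def early_decision(
    results: Iterable[bool],
    pass_ns: Dict[int, int],
    fail_ns: Dict[int, int],
) -> Tuple[str, int, int]:
    """Decide after each result; True in `results` marks a bad result.

    pass_ns[ne]: pass once ns has reached this count with ne bad results.
    fail_ns[ne]: fail if the ne-th bad result occurs at or before this ns.
    Both tables are hypothetical inputs supplied by the caller.
    """
    ne = 0
    ns = 0
    for bad in results:
        ns += 1
        if bad:
            ne += 1
            # The preliminary ER rises only on a bad result,
            # so the fail limit is checked only here.
            limit = fail_ns.get(ne)
            if limit is not None and ns <= limit:
                return ("fail", ne, ns)
        limit = pass_ns.get(ne)
        if limit is not None and ns >= limit:
            return ("pass", ne, ns)
    return ("undecided", ne, ns)  # results ran out before a decision
```

With the toy curve `pass_ns = {0: 5, 1: 9, 2: 12}` and `fail_ns = {2: 4, 3: 8}`, a DUT that produces no bad result is decided pass after five results, which illustrates the short test time of a perfect DUT.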
G.2.7 Standard test versus early decision concept
For Supplier Risk:
The wrong decision probability D in the standard test is the probability of deciding a DUT incorrectly at the single decision point. In the early decision concept there is a probability of incorrect decision d at each point of the limit-curve. All those wrong decision probabilities accumulate to D. Hence d<D.
For Customer Risk:
The correct decision probability CL in the standard test is the probability of deciding a DUT correctly at the single decision point. In the early decision concept there is a probability of correct decision cl at each point of the limit-curve. All those correct decision probabilities accumulate to CL. Hence cl<CL or d>D.
G.2.8 Selectivity
There is no statistical test which can discriminate between a limit DUT and a DUT which is an (ε → 0) apart from the limit in finite time and with high confidence level CL. Either the test discriminates against one limit with the results pass (with CL)/undecided or fail (with CL)/undecided, or the test ends in a result pass (with CL)/fail (with CL), but this requires a second limit.
For CL>1/2, a measurement result equal to the specified DUT-quality generates "undecided" in the test "supplier risk against pass limit" ((a) above) and also in the test "customer risk against fail limit" (aa).
For CL>1/2, a DUT, known to be on the limit, will be decided pass for the test "customer risk against pass limit" (b) and also "supplier risk against fail limit" (bb).
This overlap or undecided area is not a fault or a contradiction. However, it can be avoided by introducing a Bad or a Good DUT quality according to:
– Bad DUT quality: specified DUT-quality * M (M>1)
– Good DUT quality: specified DUT-quality * m (m<1)
Using e.g. M>1 and CL=95 %, the test for different DUT qualities yields different pass probabilities:
Figure G.2.8-1: Pass probability versus DUT quality
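As a worked instance of the two quality definitions above, the values that clause G.2.9 fixes for the receiver tests (specified ER = 0.05, M = 1.5) give the Bad DUT quality below. The subscripted names are labels introduced here for illustration only. No value of m is given in this clause, so the Good DUT quality is left symbolic.

```latex
\[
ER_{\mathrm{bad}} = M \cdot ER = 1.5 \times 0.05 = 0.075,
\qquad
ER_{\mathrm{good}} = m \cdot ER \quad (m < 1)
\]
```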
G.2.9 Design of the test
The receiver characteristic tests are defined by the following design principles:
1. The early decision concept is applied.
2. A second limit is introduced: Bad DUT factor M>1
3. To decide the test pass: Supplier Risk is applied, based on the Bad DUT quality.
To decide the test fail: Customer Risk is applied, based on the specified DUT quality.
The receiver characteristic tests are defined by the following parameters:
1. Limit ER = 0.05
2. Bad DUT factor M=1.5 (selectivity)
3. Confidence level CL = 95 % (for specified DUT and Bad DUT-quality)
This has the following consequences:
1. A measurement on the fail limit is connected with two equivalent statements:
| A measurement on the fail-limit shows that the DUT is worse than the specified DUT-quality | A DUT, known to have the specified quality, shall be measured and decided pass |
2. A measurement on the pass limit is connected with the complementary statements:
| A measurement on the pass-limit shows that the DUT is better than the Bad DUT-quality | A DUT, known to have the Bad DUT quality, shall be measured and decided fail |
The left column is used to decide the measurement.
The right column is used to verify the design of the test by simulation.
The simulation is based only on the two fulcrums A and B in Figure G.2.8-1.
3. Test time
The minimum and maximum test times are fixed.
The average test time is a function of the DUT’s quality.
The individual test time is not predictable.
4. The number of decision co-ordinates (ne,ns) in the early decision concept is responsible for the selectivity of the test and the maximum test time. Once the number of decision co-ordinates is fixed, there is still freedom to select the individual decision co-ordinates in many combinations, all leading to the same confidence level.
G.2.10 Simulation to derive the pass fail limits
There is freedom to design the decision co-ordinates (ne,ns).
The binomial distribution and its inverse are used to design the pass and fail limits. Note that this method is not unique and that other methods exist.
Where
– fail(..) is the error ratio for the fail limit
– pass(..) is the error ratio for the pass limit
– ER is the specified error ratio 0.05
– ne is the number of bad results. This is the variable in both equations
– M is the Bad DUT factor M=1.5
– df is the wrong decision probability of a single (ne,ns) co-ordinate for the fail limit.
It is found by simulation to be df = 0.004
– clp is the confidence level of a single (ne,ns) co-ordinate for the pass limit.
It is found by simulation to be clp = 0.9975
– qnbinom(..): The inverse cumulative function of the negative binomial distribution
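One reading that is consistent with the symbol list above can be sketched as follows. It is an assumption made here, not a quotation of the limit equations: qnbinom(q, ne, p) is taken as the q-quantile of the number of good results before the ne-th bad result (the convention of R's `qnbinom`), so that ns = ne + qnbinom(..) and each limit is the error ratio ne/ns. The function signatures and default arguments are illustrative.

```python
import math


def qnbinom(q: float, size: int, prob: float) -> int:
    """Smallest k with P(K <= k) >= q, where K counts the good results
    before the size-th bad result and each result is bad with
    probability prob (negative binomial distribution)."""
    log_base = size * math.log(prob) - math.lgamma(size)
    log_good = math.log(1.0 - prob)
    cdf, k = 0.0, 0
    while True:
        # log pmf(k) = lgamma(k+size) - lgamma(k+1) - lgamma(size)
        #              + size*log(prob) + k*log(1-prob)
        cdf += math.exp(
            math.lgamma(k + size) - math.lgamma(k + 1) + log_base + k * log_good
        )
        if cdf >= q:
            return k
        k += 1


def fail_limit(ne: int, df: float = 0.004, er: float = 0.05) -> float:
    """Assumed form: fail(ne, df) = ne / (ne + qnbinom(df, ne, ER))."""
    return ne / (ne + qnbinom(df, ne, er))


def pass_limit(ne: int, clp: float = 0.9975, er: float = 0.05, m: float = 1.5) -> float:
    """Assumed form: pass(ne, clp, M) = ne / (ne + qnbinom(clp, ne, ER*M))."""
    return ne / (ne + qnbinom(clp, ne, er * m))
```

Under this reading the fail limit lies above the specified ER and the pass limit lies below M*ER. The sketch makes no claim to reproduce published limit tables, which also depend on how the quantile is rounded to an integer sample count.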
The simulation works as follows:
– A large population of limit DUTs with true ER = 0.05 is decided against the pass and fail limits.
– clp and df are tuned such that CL (95 %) of the population passes and D (5 %) of the population fails.
– A population of Bad DUTs with true ER = M*0.05 is decided against the same pass and fail limits.
– clp and df are tuned such that CL (95 %) of the population fails and D (5 %) of the population passes.
– This procedure and the relationship to the measurement are justified in clause G.2.9. The number of DUTs decreases during the simulation, as the decided DUTs leave the population. That number decreases with an approximately exponential characteristic. After 169 bad results all DUTs of the population are decided.
NOTE: The exponential decrease of the population is an optimal design goal for the decision co-ordinates (ne,ns), which can be achieved with other formulas or methods as well.
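The simulation steps above can be sketched as a small Monte Carlo experiment. Several points are assumptions made for this illustration and are not taken from this specification: the limits use the negative binomial quantile in the R `qnbinom` convention, the fail limit is rounded conservatively (the "- 1" below), a pass is declared when the sample count exceeds the pass limit before the ne-th bad result arrives, and `NE_CAP` is a hypothetical safety cap. The per-co-ordinate values df and clp are those quoted in the text.

```python
import math
import random

ER, M = 0.05, 1.5          # specified error ratio and Bad DUT factor (G.2.9)
DF, CLP = 0.004, 0.9975    # per-co-ordinate values quoted in the text
NE_CAP = 400               # hypothetical safety cap on the curve length


def qnbinom(q, size, prob):
    """Smallest k with P(K <= k) >= q; K counts good results before the
    size-th bad result, each result being bad with probability prob."""
    log_base = size * math.log(prob) - math.lgamma(size)
    log_good = math.log(1.0 - prob)
    cdf, k = 0.0, 0
    while True:
        cdf += math.exp(
            math.lgamma(k + size) - math.lgamma(k + 1) + log_base + k * log_good
        )
        if cdf >= q:
            return k
        k += 1


def build_curves():
    """Sample-count limits indexed by ne (index 0 is unused)."""
    ns_fail, ns_pass = [0], [0]
    for ne in range(1, NE_CAP + 1):
        # Assumed rounding: a fail needs P(K <= k) < DF, hence the "- 1".
        ns_fail.append(ne + qnbinom(DF, ne, ER) - 1)
        ns_pass.append(ne + qnbinom(CLP, ne, ER * M))
        if ns_pass[ne] <= ns_fail[ne]:
            break  # the curves meet: every remaining DUT is decided here
    return ns_fail, ns_pass


def decide(true_er, rng, ns_fail, ns_pass):
    """Return the verdict for one simulated DUT."""
    ns = 0
    log_good = math.log(1.0 - true_er)
    for ne in range(1, len(ns_fail)):
        # Geometric draw of the good results before the next bad one.
        ns += int(math.log(1.0 - rng.random()) / log_good) + 1
        if ns > ns_pass[ne]:
            return "pass"  # pass limit reached with fewer than ne bad results
        if ns <= ns_fail[ne]:
            return "fail"  # the ne-th bad result arrived too early
    return "undecided"     # reachable only if NE_CAP cut the curves short


def simulate(true_er, n_duts, seed=1):
    """Fractions of pass / fail / undecided in a simulated population."""
    rng = random.Random(seed)
    ns_fail, ns_pass = build_curves()
    verdicts = [decide(true_er, rng, ns_fail, ns_pass) for _ in range(n_duts)]
    return {v: verdicts.count(v) / n_duts for v in ("pass", "fail", "undecided")}
```

Calling `simulate(ER, n)` for limit DUTs and `simulate(ER * M, n)` for Bad DUTs gives fractions that can be compared with CL and D. Tuning `DF` and `CLP`, as the text describes, would close any gap that remains under these assumed rounding conventions.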