Sampling methods for visual inspection or quality control of GIS data
Common quality errors for visual inspection
1. Missing features
2. Redundant features
3. Misplaced features
4. Wrong shape (line and area)
5. Miscoded (wrong attributes)
- Determine sampling method
- Create a sample
- Inspect sampling record
- Mark as pass or fail
- Generate report
- Determine acceptability
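The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed tool: the function name `run_inspection`, the `inspect` callback (standing in for a person's visual check), and the `max_failures` limit are all hypothetical names introduced here.

```python
import random

def run_inspection(feature_ids, sample_size, inspect, max_failures, seed=None):
    """Sample features, record pass/fail, and summarize the result.

    `inspect` is a hypothetical callback that returns True (pass) or
    False (fail) for a feature ID; in practice this is a visual check.
    """
    rng = random.Random(seed)
    population = list(feature_ids)
    # Create a sample: draw without replacement, capped at the population size.
    sample = rng.sample(population, min(sample_size, len(population)))
    # Inspect each sampled record and mark it as pass or fail.
    results = {fid: bool(inspect(fid)) for fid in sample}
    failures = sorted(fid for fid, ok in results.items() if not ok)
    # Generate a report and determine acceptability.
    return {
        "sampled": len(sample),
        "failed": len(failures),
        "failed_ids": failures,
        "accepted": len(failures) <= max_failures,
    }
```

The report keeps the failed IDs so they can be listed for the data producer, while the `accepted` flag applies to the dataset as a whole.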
Common sampling methods
- By fixed number - randomly choose a fixed number of features, sized to what the team can inspect.
- By grid or polygon - create an index grid and sample features within each grid cell.
- By percentage of features - set a sampling percentage of all features. The number of features for inspection/QC is a direct percentage count, or a weighted count if weights apply.
- By calculation - determine the statistically significant number of samples at a given confidence level, plus or minus the acceptable margin of error. (Given the sample size, how many features can fail inspection before the entire database fails?)
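The first three methods can be sketched with the Python standard library. This is a rough illustration under stated assumptions: features are represented as hypothetical `(id, x, y)` tuples, the grid is made of square cells of side `cell_size`, and the percentage count is rounded up. None of these choices comes from a specific GIS tool.

```python
import math
import random
from collections import defaultdict

def sample_fixed(features, count, rng=random):
    """Fixed number: draw `count` features at random, without replacement."""
    return rng.sample(features, min(count, len(features)))

def sample_percentage(features, percent, rng=random):
    """Percentage: draw `percent` % of all features (count rounded up)."""
    count = math.ceil(len(features) * percent / 100)
    return rng.sample(features, count)

def sample_per_grid(features, cell_size, per_cell, rng=random):
    """Grid: bucket (id, x, y) features into square cells, then sample each cell."""
    cells = defaultdict(list)
    for fid, x, y in features:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((fid, x, y))
    picked = []
    for members in cells.values():
        # Take up to `per_cell` features from every occupied cell.
        picked.extend(rng.sample(members, min(per_cell, len(members))))
    return picked
```

Sampling per cell spreads the inspection effort across the whole extent, whereas the fixed-number and percentage methods may cluster in dense areas.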
The sample size is determined based on four factors:
- The probability (p) of the outcome, that is, the probability that a given feature will “pass” rather than “fail.” With no prior knowledge of what share of features from a given client will pass or fail, passing and failing are treated as equally likely, so the tool uses 0.5. This is the most pessimistic (conservative) choice, because the variance term p(1 - p) is maximized when p = 0.5.
- The population size (N).
- The acceptable margin of error in the confidence interval (m).
- The z-statistic for the desired confidence level (z). This is used to compare the sample to a normal distribution. The value is supplied by a lookup table.
For an infinite population, the equation for determining the sample size (n) is:
n = (z/m)^2 * p(1 - p)
This value must then be adjusted to account for the actual (finite) population size, which gives the actual sample size (n'):
n' = n(N)/(n + (N - 1))
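As a sketch of the two equations above, the following Python computes n and n'. Two choices here are assumptions rather than part of the original method: the z-statistic is taken as the two-sided value from the standard library's `NormalDist` instead of a lookup table, and the final result is rounded up to a whole feature. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z-statistic for a confidence level (assumed convention)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size(population, margin, confidence=0.95, p=0.5):
    """Return the adjusted sample size n' for a finite population N."""
    z = z_for_confidence(confidence)
    # Infinite-population sample size: n = (z/m)^2 * p(1 - p)
    n = (z / margin) ** 2 * p * (1 - p)
    # Adjust for the actual population: n' = n(N)/(n + (N - 1))
    n_adjusted = n * population / (n + (population - 1))
    # Rounding up is an assumption; the equations do not specify rounding.
    return math.ceil(n_adjusted)
```

For example, `sample_size(10000, 0.05)` asks how many of 10,000 features to inspect for a 5% margin of error at 95% confidence.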
Determine acceptability by failure threshold
The failure threshold value is given by the Test of Proportions equation. This equation determines whether the number of failures is significant enough to fail the entire dataset, given a sample size, confidence level, and specified failure ratio. Determination of the failure threshold depends on three factors:
- Sample size (n' from above)
- The acceptable maximum failure ratio (r)
- The z-statistic for the desired confidence level (z), which is used to compare the sample to a normal distribution. This value is supplied by a lookup table.
The maximum failure ratio allowable (r') is given by this equation:
r' = z * sqrt(r(1 - r)/n') + r
Since this is a ratio, the resulting value must then be multiplied by the sample size to get the maximum number of failures allowable (f):
f = r'(n')
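The threshold calculation can be sketched as follows. The function name `failure_threshold` is hypothetical, z is passed in directly (as it would be from a lookup table), and rounding f down to a whole number of failures is an assumption, since the equations above do not say how to round.

```python
import math

def failure_threshold(sample_n, r, z):
    """Return (r', f): the maximum allowable failure ratio and failure count."""
    # r' = z * sqrt(r(1 - r)/n') + r
    r_max = z * math.sqrt(r * (1 - r) / sample_n) + r
    # f = r'(n'), rounded down to a whole feature (assumed rounding rule).
    f = math.floor(r_max * sample_n)
    return r_max, f
```

With illustrative inputs n' = 370, r = 0.05, and z = 1.96, this gives r' of roughly 0.072 and f = 26, so a 27th failure in the sample would fail the dataset.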
Remediation
If a given dataset fails to pass (that is, the number of actual failures exceeds the maximum allowable number of failures), it is not sufficient to fix the failures that were detected and then pass the dataset. If a dataset fails, it means that the sample has revealed a deficiency with the entire dataset, not just the detected failures. The quality of the entire dataset will have to be improved to pass a retest based on a new random sample.