Unless otherwise stated this page contains Version 1.0 content
6.1 Statistical methods for the treatment of experimental data
The result of an experiment to measure a physical quantity is never exact but is subject to errors. The error in a result x is defined as its difference $(x - x_t)$ from the true value $x_t$ of the quantity to be measured. This should not be confused with the uncertainty in x which is expressed by some convenient measure of how large the error might be or, equivalently, of the range about the result x in which the true value is thought to lie.
Errors may be divided into two classes: (a) random errors which vary unpredictably from one repetition of a measurement to another, and (b) systematic errors arising from bias in the measurement process perhaps due to the equipment used, calibration errors, corrections based on simplified error models or data processing techniques. If both types of error are small, the measurement will be accurate. If only random errors are small, it will merely be precise. The effect of random errors can be seen in the variation of results of repeated measurements. This permits statistical techniques to be employed to reduce them and to estimate corresponding components of uncertainty. In contrast, the treatment of systematic errors will generally depend on a worker's technical judgements about their causes.
Treatment of random errors
If the experiment is carried out n times, there will be n results, $x_1, \ldots, x_i, \ldots, x_n$, varying in value because of random errors. A first step which is often useful when n is large is to plot a histogram of the results, dividing the range of values taken by the $x_i$ into a number of equal intervals and plotting the number of readings falling in each interval. In a typical case the histogram will have a single peak, which, in the absence of systematic errors, we may assume to be somewhere near the true value of x, and will have a spread about this peak which is an indication of the precision of the measurements.
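The histogram step described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `histogram`, the example readings and the choice of five intervals are assumptions made for the example, not part of the original text.

```python
def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no range to divide")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would index one past the end, so clamp it
        # into the last interval.
        k = min(int((v - lo) / width), n_bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical readings, invented for illustration.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 10.1, 9.7, 10.0, 10.3]
edges, counts = histogram(readings, 5)

# Crude text plot: one star per reading in each interval.
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}  {'*' * c}")
```

With so few readings the shape is only suggestive; the single peak and the spread about it become meaningful when n is large.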
When systematic errors are negligible it is usual to take the mean $\bar{x}$ of the $x_i$ as best estimate of the true value of x where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
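The standard deviation, s, is taken as a quantitative measure of the spread of the readings. Its square is known as the sample variance and is given by
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
where $\bar{x}$ is the mean of the n readings.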
Another important quantity is the standard deviation of the mean, sometimes called the standard error and equal to $n^{-1/2}s$ when n is large. This quantity is a measure of the spread of the different values of $\bar{x}$ which would be obtained from successively measured sets of n values of x. It is therefore a measure of the precision of the result of the experiment and of that component of its uncertainty associated with random errors. For this reason it has been recommended that it be referred to as a ‘standard uncertainty’. Where systematic errors are negligible, the mean and its standard deviation (with the number of readings n) are sufficient to characterize an experimental result. To appreciate their significance more fully requires some underlying theory.
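As a minimal sketch of the three quantities just described, the following Python computes the mean, the sample standard deviation and the standard deviation of the mean. It assumes the usual n − 1 divisor for the sample variance; the function name `summarize` and the readings are invented for the example.

```python
import math

def summarize(readings):
    """Return (mean, sample standard deviation, standard deviation of the mean)."""
    n = len(readings)
    mean = sum(readings) / n
    # Sample variance with the n - 1 divisor (an assumption of this sketch).
    variance = sum((x - mean) ** 2 for x in readings) / (n - 1)
    s = math.sqrt(variance)
    # Standard deviation of the mean ("standard error"): s / sqrt(n).
    std_error = s / math.sqrt(n)
    return mean, s, std_error

# Hypothetical readings, invented for illustration.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, s, std_error = summarize(readings)
print(f"mean = {mean:.3f}, s = {s:.3f}, standard error = {std_error:.3f}")
```

Quoting the mean, its standard deviation and n together, as the text recommends, lets a reader recover s and judge the result for themselves.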
In biology and the social sciences it is often the object of an investigation to make an estimate of some property of a large but finite population of individuals, too numerous for each to be measured, by means of measurements on a limited sample. There is a unique true value of the property being estimated, which could be ascertained given sufficient time and effort. A range of statistical techniques is available to estimate the properties of the parent population from sample measurements with calculable uncertainties and confidence limits.
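In the physical sciences it is convenient to postulate that our series of measurements forms a finite sample from an infinite population. We can speak of this population having a probability distribution function F(x), where F(x) dx is the probability that a reading lies between x and x + dx,
with the following properties:
$$\int_{-\infty}^{\infty} F(x)\,dx = 1, \qquad \mu = \int_{-\infty}^{\infty} x\,F(x)\,dx, \qquad \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 F(x)\,dx$$
where μ is the mean and σ the standard deviation of the parent population.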
The histogram, mean and standard deviation derived from our set of measurements can be regarded as sample approximations to the probability distribution, mean and standard deviation of the parent population.
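It is sometimes necessary to combine the results of a number of independent determinations of a quantity into a single estimate. The best precision (minimum variance) is obtained if the individual results are ‘weighted’ inversely to their variances. Suppose that there are m separate results $X_j$, each the mean of $n_j$ readings with variance $s_j^2$, so that the variance of $X_j$ itself is $s_j^2/n_j$. Then the best estimate of the quantity is
$$\bar{X} = \frac{\sum_{j=1}^{m} n_j X_j / s_j^2}{\sum_{j=1}^{m} n_j / s_j^2}$$
and the expected variance of this will be
$$s_{\bar{X}}^2 = \left( \sum_{j=1}^{m} \frac{n_j}{s_j^2} \right)^{-1}$$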
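The inverse-variance weighting described above can be sketched as follows. To keep the example unambiguous, the function takes the variance of each result directly (for a result that is the mean of several readings this would be the variance of that mean); the function name `weighted_mean` and the numbers are assumptions made for illustration.

```python
def weighted_mean(results, variances):
    """Combine independent results, weighting each inversely to its variance.

    `variances` holds the variance of each result itself, not of the
    individual readings behind it (an assumption of this sketch).
    Returns (best estimate, expected variance of that estimate).
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    best = sum(w * x for w, x in zip(weights, results)) / total
    return best, 1.0 / total

# Two hypothetical determinations of the same quantity.
results = [10.0, 12.0]
variances = [1.0, 4.0]
best, var = weighted_mean(results, variances)
print(f"best estimate = {best:.3f}, variance = {var:.3f}")
```

The combined estimate lies closer to the more precise result, and its variance is smaller than that of either result taken alone.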
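In most physical measurements, where the random errors can be thought of as made up of a number of small independent contributions, the probability distribution has the form of the normal error function or Gaussian distribution,
$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
where μ and σ are the mean and standard deviation of the parent population.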
Much of the further statistical treatment of the data is based on theory which assumes that the distribution function is Gaussian. If the histogram has a form which differs widely from the Gaussian it is a warning to proceed with caution. However, the Central Limit Theorem states that the sample means from a non-Gaussian population have a distribution which approximates to the Gaussian, and the larger the number of observations the better the approximation. Consequently, many tests are valid even with non-Gaussian populations. The Gaussian and its integral are tabulated in a number of the references below.
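The Central Limit Theorem statement can be checked with a small simulation. The sketch below draws sample means from a uniform (clearly non-Gaussian) population and compares their spread with $\sigma n^{-1/2}$. The population, the sample size of 25, the number of trials and the fixed seed are all choices made for this illustration.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 25                   # readings per simulated experiment
trials = 2000            # number of simulated experiments

# Parent population: uniform on [0, 1), with mean 0.5 and
# standard deviation 1/sqrt(12).
mu = 0.5
sigma = 1 / math.sqrt(12)

# Mean of each simulated experiment.
means = [statistics.fmean([rng.random() for _ in range(n)])
         for _ in range(trials)]

observed = statistics.stdev(means)   # spread of the sample means
expected = sigma / math.sqrt(n)      # sigma * n**(-1/2)

# For a Gaussian, about 95% of values lie within 1.96 standard deviations.
inside = sum(abs(m - mu) < 1.96 * expected for m in means) / trials

print(f"observed spread {observed:.4f}, expected {expected:.4f}")
print(f"fraction within 1.96 standard errors: {inside:.3f}")
```

If the fraction printed is close to 0.95, the means are behaving approximately as a Gaussian would predict even though the individual readings are not Gaussian.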
The component of uncertainty in $\bar{x}$ due to random errors can be expressed in several ways, for example, as:
$s n^{-1/2}$ (the ‘standard uncertainty’);
$k s n^{-1/2}$ (the ‘expanded uncertainty’ measure defined as k standard uncertainties where k is some small number, e.g. k = 2);
a confidence interval (a range of values that is expected to include μ with a stated level of confidence).
The first two uncertainty measures have the advantage of not presupposing any particular form of distribution and are increasingly recommended in standards.
In order to derive a confidence interval for a single measured result x which would contain the mean μ of its parent population, assumed to have standard deviation σ, we form the test function c = |x − μ|/σ. By integrating the Gaussian function we can calculate the probability p(C) that a sample observation taken at random will have a value of c less than C, and construct the following table:
If in our experiment c > 2.58 we can say that, given the assumed value of σ, the difference between x and μ is significant at the 1% level of probability, meaning that there is a less than 1% chance that a difference this large would arise from random causes alone. Thus the range of values of μ for which c ≤ 2.58 defines a 99% confidence interval about x.
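The probabilities p(C) come from integrating the Gaussian between −C and +C standard deviations, which can be written with the error function as $p(C) = \mathrm{erf}(C/\sqrt{2})$. The short Python sketch below evaluates this for a few values of C. The function name `p_within` and the particular values of C are choices made for this example and are not necessarily those of the original table.

```python
import math

def p_within(C):
    """Probability that c = |x - mu| / sigma is less than C for a Gaussian."""
    return math.erf(C / math.sqrt(2))

for C in (1.0, 1.96, 2.0, 2.58, 3.0):
    print(f"C = {C:4.2f}   p(C) = {p_within(C):.4f}")
```

The value C = 2.58 gives a probability very close to 0.99, which is the basis of the 1% significance level and 99% confidence interval quoted above.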
In practice we do not usually know the standard deviation σ of the parent population but have to work with the standard deviation s of the observations. This causes an important change in the method as the following illustration will show: given a set of n observations with mean $\bar{x}$ and standard deviation s, test the hypothesis that the true value is μ (i.e. that they are a sample from a parent population with Gaussian distribution having mean μ). We now form the test function
$$t = \frac{|\bar{x} - \mu|}{s/n^{1/2}}$$
This has a distribution which can be calculated for a Gaussian population and is known as the ‘Student’ t distribution. It involves a parameter known as the number of degrees of freedom which is, loosely, the number of independent observations, n − 1 in our case (not n because for a given $\bar{x}$, once n − 1 values are known, the nth is determinate). Values of $t^2$ for given significance levels P and numbers of degrees of freedom are tabulated in the literature and in the following short table where values of $t^2$ are given against parameter values
As an illustration of the use of this we may return to our original experiment, to measure x, and improve on our earlier statement that, in the absence of systematic errors, $\bar{x}$ is an estimate of the true value of x by adding confidence limits. We can say that the value of x is
$$\bar{x} \pm t\,s/n^{1/2}$$
at the confidence level 1 − P, where
$t^2$ is the entry in the following table at the appropriate
values of P and the other parameters. We are then asserting that the
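The t test and the confidence limits built from it can be sketched in Python as follows. The critical value is supplied by the user from tables. The example uses t = 2.262, the standard two-sided value for P = 0.05 with 9 degrees of freedom (so $t^2 \approx 5.117$), taken from general statistical tables rather than from this page. The function names and the readings are invented for the illustration.

```python
import math

def mean_and_s(readings):
    """Mean and sample standard deviation (n - 1 divisor)."""
    n = len(readings)
    mean = sum(readings) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in readings) / (n - 1))
    return mean, s

def t_statistic(readings, mu):
    """Test function t = |mean - mu| / (s / sqrt(n))."""
    mean, s = mean_and_s(readings)
    return abs(mean - mu) / (s / math.sqrt(len(readings)))

def confidence_limits(readings, t_crit):
    """Limits mean -/+ t_crit * s / sqrt(n) for a tabulated t_crit."""
    mean, s = mean_and_s(readings)
    half_width = t_crit * s / math.sqrt(len(readings))
    return mean - half_width, mean + half_width

# Ten hypothetical readings, so n - 1 = 9 degrees of freedom.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 10.1, 9.7, 10.0, 10.3]
t_crit = 2.262   # two-sided P = 0.05, 9 degrees of freedom (from tables)

low, high = confidence_limits(readings, t_crit)
print(f"95% confidence limits: {low:.3f} to {high:.3f}")

# Testing a hypothesised true value: significant at P = 0.05 if t > t_crit.
print(f"t for mu = 10.2: {t_statistic(readings, 10.2):.2f}")
```

A hypothesised value of μ lying exactly on one of the limits gives t equal to the critical value, which is the link between the significance test and the confidence interval.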
Treatment of systematic errors
The discussion above has not dealt with uncertainty components associated with systematic errors. Sometimes a component of bias in a result is due to a random error ‘sampled’ only once, for example when taking an unbiased measured value subject to a random error from another worker. In this case a worker must use their technical judgement to estimate the corresponding standard deviation or confidence interval. A more difficult case arises when a component of bias is due to the method of the measurement. This can occur with an uncorrected error of method (e.g. due to a cosine error in a length measurement) or when a correction itself is subject to an error of method (e.g. due to the use of a simplified theoretical model for the error being corrected). Here there can be no question of a population of varying errors ‘sampled’ by the experiment. In this type of situation statisticians usually resort to ‘subjective probability distributions’ expressing degrees of belief in possible values of a variable. In this way subjective standard deviations or confidence intervals can be derived, again on the basis of technical judgement.
Combination of uncertainties
Usually a worker needs to calculate an overall uncertainty allowing for both random and systematic errors. There is considerable argument on how to proceed here, but a method increasingly recommended is to correct for all known systematic error components and then simply to combine all estimated standard deviations $s_j$ in quadrature, no matter what type of error they are associated with, to produce the standard uncertainty measure:
$$u = \left( \sum_{j=1}^{n} s_j^2 \right)^{1/2}$$
where n errors are involved. An expanded uncertainty measure U defining a range ±U can be stated using a k factor:
U = ku
The value k = 2 is recommended for general use, corresponding closely to a 95% confidence level in the case of a Gaussian distribution. Experimental results can be expressed in the form $\bar{x} \pm u$ or $\bar{x} \pm U$ and should always be accompanied by a statement of what uncertainty measure is being used, the value of any k factor and the associated number of degrees of freedom. The measures u and U have the merit of simplicity, not requiring the difficult classification of errors into the random and systematic categories. This is particularly helpful when estimating uncertainties for results calculated using the results of other workers where adequate information may be lacking. However, where errors of method are involved, it is not possible to interpret U in terms of the frequency with which the range ±U contains a true value.
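Combination in quadrature and the expanded uncertainty U = ku can be sketched in a few lines of Python. The function names and the three component values are assumptions made for this example.

```python
import math

def combined_standard_uncertainty(components):
    """Combine estimated standard deviations in quadrature."""
    return math.sqrt(sum(s ** 2 for s in components))

def expanded_uncertainty(u, k=2.0):
    """Expanded uncertainty U = k * u (k = 2 as the general-use default)."""
    return k * u

# Hypothetical standard deviations from three error sources.
components = [0.3, 0.4, 1.2]
u = combined_standard_uncertainty(components)
U = expanded_uncertainty(u, k=2.0)
print(f"u = {u:.3f}, U = {U:.3f} (k = 2)")
```

Because the components add as squares, the largest one dominates: here the 1.2 term alone accounts for most of u, so effort spent reducing the smaller components would change the result very little.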
References
K. A. Brownlee (1960) Statistical Theory and Methodology in Science and Engineering, Wiley.
Making statistical tests on data
Nine commonly needed tests are listed in column 1 including the two discussed above. Use of the transformations of the observations given in column 2 enables a single table to be employed. To make a test calculate the function of the observations given in column 2 and compare its value with those given in that cell of the table identified in column 3. Greater values should be adjudged significant. Smaller values are not conclusive; a larger experiment might show significance. P of the table is the level of significance to be quoted.
The three entries in each cell of the table are for three levels of significance (P = 0.05, 0.01, 0.001), the values for P = 0.01 being printed in bold type. P is the risk of a wrong decision when no difference exists; the risk of a wrong decision in other cases obviously cannot be stated generally because the differences may be of any magnitude. The smaller the value of P used, the larger will a real difference have to be before it makes itself apparent by these tests. The choice of P must therefore be made by balancing this risk against the magnitude of the difference which will just escape detection, i.e. the table value.
Except for test 9, the table is calculated on the assumption that the error or random sampling variation referred to above results in observations being normally distributed. Small departures from normality will not usually affect the decisions because their effect on P is small.
Table for significance tests
This site is hosted and maintained by the National Physical Laboratory