To understand the relationship between variables, Six Sigma practitioners may be required to perform a designed experiment. When analyzing the results using the Analysis of Variance (ANOVA) procedure, it may be appropriate to remove terms from the full model. This article describes a methodology for removing terms and explains why this approach may be legitimate.
It is important to note that mathematical models used in designed experiments are merely an attempt to understand relationships between variables. As George Box once observed, all models are wrong, but some are useful.
When performing a designed experiment, variables and factor levels are chosen in the hope of providing key insights into these relationships. Certain techniques such as blocking, replication and randomization may be appropriate in this endeavor.¹
From the outset, an experimenter typically considers the full model, namely a model that includes higher order interactions as well as main effects. Once the experiment has been performed and results obtained, it may be appropriate to remove certain terms if they do not explain a significant amount of the total variation. In practice, the experimenter attempts to develop an understanding of the few (owing to the sparsity of effects phenomenon) main and interaction effects that may be driving the relationship under investigation.
A hypothetical example, used solely for pedagogical purposes, illustrates the methodology behind model reduction.
Consider a chemical reaction in which four temperature settings (100°C, 120°C, 140°C and 160°C) and two pressure settings (200 PSI and 1000 PSI) are to be investigated. Each of the eight combinations is tested five times, in random order, giving forty observations in total.
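As an illustration of the replication and randomization just described, the following minimal sketch builds the forty runs and shuffles them. The function name `randomized_run_order` and the `seed` argument are assumptions made for this example only; they are not part of the original study.

```python
import itertools
import random

# Factor levels taken from the example in the text.
temperatures_c = [100, 120, 140, 160]
pressures_psi = [200, 1000]
replicates = 5


def randomized_run_order(seed=None):
    """Return all 40 runs (8 combinations x 5 replicates) in random order."""
    runs = [
        (temp, pressure)
        for temp, pressure in itertools.product(temperatures_c, pressures_psi)
        for _ in range(replicates)
    ]
    # The seed is hypothetical; it only makes this sketch reproducible.
    rng = random.Random(seed)
    rng.shuffle(runs)
    return runs


run_order = randomized_run_order(seed=1)
for run_number, (temp, pressure) in enumerate(run_order[:5], start=1):
    print(f"Run {run_number}: {temp} C, {pressure} PSI")
```

Shuffling the complete list of runs, rather than shuffling within each combination, corresponds to a completely randomized design with no blocking.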
Once the data are collected, the results are analyzed using the ANOVA procedure, employing the full model as shown in (1).
(1) $\text{Yield}_{ijk} = \mu + \text{Temperature}_i + \text{Pressure}_j + (\text{Temp} \times \text{Pressure})_{ij} + \varepsilon_{ijk}$
where i = 1, 2, 3, 4; j = 1, 2; k = 1, ..., 5
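To make the decomposition implied by model (1) concrete, the sketch below computes the sums of squares for a balanced two-factor design with replication. The function `two_way_anova_ss` and the synthetic data are assumptions introduced here purely for illustration; the data are randomly generated and are not the observations behind the article's results.

```python
import random
from statistics import mean


def two_way_anova_ss(data):
    """Sums of squares for a balanced two-factor design with replication.

    data maps (level_a, level_b) -> list of replicate observations.
    """
    levels_a = sorted({a for a, _ in data})
    levels_b = sorted({b for _, b in data})
    n_rep = len(next(iter(data.values())))  # replicates per cell (balanced)

    all_obs = [y for cell in data.values() for y in cell]
    grand = mean(all_obs)
    cell_mean = {key: mean(obs) for key, obs in data.items()}
    mean_a = {a: mean([y for b in levels_b for y in data[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([y for a in levels_a for y in data[(a, b)]]) for b in levels_b}

    # Main effects: squared deviations of level means from the grand mean.
    ss_a = n_rep * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n_rep * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    # Interaction: what the cell means show beyond the two main effects.
    ss_ab = n_rep * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    # Error: variation of replicates around their own cell mean.
    ss_error = sum((y - cell_mean[key]) ** 2 for key, obs in data.items() for y in obs)
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    return {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": ss_error, "Total": ss_total}


# Synthetic, made-up data: 4 temperatures x 2 pressures x 5 replicates.
rng = random.Random(0)
synthetic = {
    (temp, pressure): [50 + 0.1 * temp + rng.gauss(0, 5) for _ in range(5)]
    for temp in (100, 120, 140, 160)
    for pressure in (200, 1000)
}
ss = two_way_anova_ss(synthetic)
for source, value in ss.items():
    print(f"{source:>5}: {value:10.2f}")
```

For a balanced design such as this one, the four component sums of squares add up to the total, which is the property the ANOVA table relies on.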
Table 1: ANOVA results for the full model
| Source | DF | Seq SS | Adj SS | Adj MS | F | P |
|---|---|---|---|---|---|---|
| Temp | 3 | 457.28 | 457.28 | 152.43 | 5.25 | 0.005 |
| Pressure | 1 | 0.35 | 0.35 | 0.35 | 0.01 | 0.913 |
| Temp*Pressure | 3 | 38.09 | 38.09 | 12.70 | 0.44 | 0.728 |
| Error | 32 | 928.19 | 928.19 | 29.01 | | |
| Total | 39 | 1423.91 | | | | |
From Table 1, the total sum of squares is 1423.91, with 39 degrees of freedom.² Practitioners frequently, and mistakenly, disregard this final line of the ANOVA table. However, consider that 1423.91 divided by 39 gives the sample variance of the response (36.5). The ANOVA table shows how the variation behind this sample variance is partitioned among its constituent sources (this is, after all, why it is called ANOVA). As discussed by R.A. Fisher:³
When the variation of any quantity (variate) is produced by the action of two or more independent causes, it is known that the variance produced by all the causes simultaneously in operation is the sum of the values of the variance produced by each cause separately... The property of the variance, by which each independent cause makes its own contribution to the total, enables us to analyse the total, and to assign, with more or less accuracy, the several portions to their appropriate causes, or groups of causes.
In this example, roughly a third (457/1424) of the variation is explained by Temp, using only 3 of the 39 available degrees of freedom. In comparison, neither the Temp*Pressure interaction effect nor the Pressure main effect appears to be an important source of variation. Note also that their P-values (0.728 and 0.913, respectively) are not statistically significant at the 0.05 level.
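The arithmetic behind these statements can be checked directly from the degrees of freedom and sums of squares reported in the ANOVA table. The short sketch below does so; the variable names are assumptions chosen for this example.

```python
# (DF, Seq SS) for each source, copied from the full-model ANOVA table.
anova_full = {
    "Temp": (3, 457.28),
    "Pressure": (1, 0.35),
    "Temp*Pressure": (3, 38.09),
    "Error": (32, 928.19),
}

total_df = sum(df for df, _ in anova_full.values())
total_ss = sum(ss for _, ss in anova_full.values())

# Total SS divided by total DF is the sample variance of the response.
sample_variance = total_ss / total_df

# Share of the total sum of squares attributed to each source.
share = {source: ss / total_ss for source, (_, ss) in anova_full.items()}

print(f"Total DF: {total_df}, Total SS: {total_ss:.2f}")
print(f"Sample variance: {sample_variance:.1f}")
for source, fraction in share.items():
    print(f"{source:>14}: {fraction:6.1%} of total SS")
```

Laying the shares side by side makes the contrast plain: Temp accounts for about 32% of the total sum of squares, while Pressure and the interaction together account for less than 3%.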
From the evidence in the ANOVA table, this system appears to be driven primarily by the main effect of Temp. The other terms in the model are therefore pooled into the error term, as they are regarded as being, essentially, noise.
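Placing terms into the error term amounts to adding their degrees of freedom and sums of squares to those of Error, then recomputing the error mean square and the F statistic for the terms that remain. A minimal sketch of that bookkeeping follows, using the values from the ANOVA table; the helper `pool_into_error` is a hypothetical name introduced for this example, and P-values are omitted because they would require an F-distribution routine from a statistics library.

```python
# (DF, Seq SS) for each source, copied from the full-model ANOVA table.
full_model = {
    "Temp": (3, 457.28),
    "Pressure": (1, 0.35),
    "Temp*Pressure": (3, 38.09),
    "Error": (32, 928.19),
}


def pool_into_error(table, dropped):
    """Move the dropped terms' DF and SS into Error and recompute MS and F."""
    error_df, error_ss = table["Error"]
    for term in dropped:
        df, ss = table[term]
        error_df += df
        error_ss += ss
    error_ms = error_ss / error_df

    reduced = {}
    for term, (df, ss) in table.items():
        if term == "Error" or term in dropped:
            continue
        ms = ss / df
        reduced[term] = {"df": df, "ss": ss, "ms": ms, "f": ms / error_ms}
    reduced["Error"] = {"df": error_df, "ss": error_ss, "ms": error_ms}
    return reduced


reduced_model = pool_into_error(full_model, dropped=["Pressure", "Temp*Pressure"])
for term, row in reduced_model.items():
    f_text = f"{row['f']:.2f}" if "f" in row else ""
    print(f"{term:>6} {row['df']:>3} {row['ss']:>9.2f} {row['ms']:>8.2f} {f_text:>6}")
```

With these values the pooled error term has 36 degrees of freedom and a mean square of about 26.85, so the F statistic for Temp rises from 5.25 to about 5.68. Because the design is balanced, this pooled arithmetic should agree with refitting a Temp-only model to the raw data.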
The model would then be refit and the residuals assessed in the usual manner to ensure that the model assumptions are reasonably well met.⁴
When considering sources of variation, the ANOVA table provides key information for model reduction purposes. The Six Sigma practitioner needs to use common sense, along with process knowledge, to distinguish the important few sources of variation from the trivial many.
References
1. For more information on these terms, see Keith M. Bower, "Some Comments on Historical Designed Experiments," ASQ Six Sigma Forum, November 2004.
2. For a discussion of degrees of freedom, see Keith M. Bower, "Why Divide by n-1?" ASQ Six Sigma Forum, February 2005.
3. R.A. Fisher, "Studies in Crop Variation. I. An Examination of the Yield of Dressed Grain from Broadbalk," Journal of Agricultural Science 11, no. 2 (1920): 110-111.
4. For information on assessing model assumptions, see Douglas C. Montgomery, Design and Analysis of Experiments (New York: John Wiley and Sons, Inc., 2004), 75-85.
© Keith M. Bower. All rights reserved.