Model Reduction in Designed Experiments

by Keith M. Bower

To understand the relationship between variables, Six Sigma practitioners may be required to perform a designed experiment. When analyzing the results using the Analysis of Variance (ANOVA) procedure, it may be appropriate to remove terms from the full model. This article describes a methodology for removing terms and explains why this approach may be legitimate.

Models in Designed Experiments

It is important to note that mathematical models used in designed experiments are merely an attempt to understand relationships between variables. As George Box once observed, all models are wrong, but some are useful.

When designing an experiment, the practitioner chooses variables and factor levels in the hope of gaining key insights into these relationships. Techniques such as blocking, replication and randomization may be appropriate in this endeavor.[1]

From the outset, an experimenter typically considers the full model, namely a model that includes higher-order interactions as well as main effects. Once the experiment has been performed and results obtained, it may be appropriate to remove certain terms if they do not explain a significant amount of the total variation. In practice, the experimenter attempts to identify the few main and interaction effects (few, owing to the sparsity-of-effects phenomenon) that may be driving the relationship under investigation.

A hypothetical example, used solely for pedagogical purposes, illustrates the methodology behind model reduction.

Example

Consider a chemical reaction in which four temperature settings (100°C, 120°C, 140°C and 160°C) and two pressure settings (200 PSI and 1000 PSI) are to be investigated. Each of the eight combinations is tested five times, with the runs performed in a random order, leading to forty observations in total.
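The layout of this experiment can be sketched in a few lines of Python. This is only an illustration of a completely randomized run order for the 4 x 2 design with five replicates; the function name `build_run_order` and the seed value are assumptions made for the sketch, not part of the original study.

```python
import itertools
import random

# Factor levels taken from the example in the text.
temperatures_c = [100, 120, 140, 160]
pressures_psi = [200, 1000]
replicates = 5  # each combination is run five times


def build_run_order(seed=None):
    """Return every (temperature, pressure) run in a shuffled order."""
    runs = [
        combo
        for combo in itertools.product(temperatures_c, pressures_psi)
        for _ in range(replicates)
    ]
    # The seed is hypothetical; it only makes the shuffle reproducible.
    rng = random.Random(seed)
    rng.shuffle(runs)
    return runs


run_order = build_run_order(seed=2005)
print(len(run_order))  # 40
```

Shuffling the full list of forty runs, rather than shuffling within each combination, is what spreads any time-related nuisance effects across all eight treatment combinations.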

Once the data are collected, the results are analyzed using the ANOVA procedure, employing the full model shown in (1).

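(1) $\mathrm{Yield}_{ijk} = \mu + \mathrm{Temperature}_i + \mathrm{Pressure}_j + (\mathrm{Temp} \times \mathrm{Pressure})_{ij} + \varepsilon_{ijk}$

where i = 1, 2, 3, 4; j = 1, 2; k = 1, ..., 5 (replicates)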

Table 1

Source          DF    Seq SS    Adj SS    Adj MS     F       P
Temp             3    457.28    457.28    152.43    5.25   0.005
Pressure         1      0.35      0.35      0.35    0.01   0.913
Temp*Pressure    3     38.09     38.09     12.70    0.44   0.728
Error           32    928.19    928.19     29.01
Total           39   1423.91


From Table 1, we find that the total sum of squares is 1423.91, with 39 degrees of freedom.[2] Practitioners frequently, and mistakenly, disregard this final line in the ANOVA table. However, consider that 1423.91 divided by 39 gives us the sample variance of the forty observations (36.5). The ANOVA table shows how the variation behind this sample variance is partitioned among its constituent sources (this is, after all, why it is called ANOVA). As discussed by R.A. Fisher:[3]

When the variation of any quantity (variate) is produced by the action of two or more independent causes, it is known that the variance produced by all the causes simultaneously in operation is the sum of the values of the variance produced by each cause separately ... The property of the variance, by which each independent cause makes its own contribution to the total, enables us to analyse the total, and to assign, with more or less accuracy, the several portions to their appropriate causes, or groups of causes.
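The additivity that Fisher describes can be checked directly against Table 1. The short sketch below uses only the published sums of squares and degrees of freedom (no raw data are available here) and confirms that the rows add up to the Total line and that the total divided by its degrees of freedom gives the sample variance. The dictionary name `anova` is simply a convenience for the illustration.

```python
# Degrees of freedom and sums of squares copied from Table 1.
anova = {
    "Temp": (3, 457.28),
    "Pressure": (1, 0.35),
    "Temp*Pressure": (3, 38.09),
    "Error": (32, 928.19),
}

# The component rows should add up to the Total row.
total_df = sum(df for df, _ in anova.values())
total_ss = sum(ss for _, ss in anova.values())

# Total SS divided by total DF is the sample variance of the 40 yields.
sample_variance = total_ss / total_df

print(total_df)                   # 39
print(round(total_ss, 2))         # 1423.91
print(round(sample_variance, 1))  # 36.5
```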

In this example we find that roughly a third (457/1424) of the variation is explained by Temp, with only 3 of the 39 available degrees of freedom being used to explain this amount of variation. In comparison, neither the Pressure main effect nor the Temp*Pressure interaction effect seems to be an important source of variation. Note also that neither is statistically significant at the 0.05 level: their P-values (0.913 for Pressure and 0.728 for Temp*Pressure) are both well above 0.05.
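As a rough illustration of this comparison, the share of the total sum of squares attributable to each source can be computed from the Table 1 values. The variable names below are chosen for the sketch only, and the shares are descriptive; they do not replace the F-tests in the table.

```python
# Degrees of freedom and sums of squares copied from Table 1.
sources = {
    "Temp": (3, 457.28),
    "Pressure": (1, 0.35),
    "Temp*Pressure": (3, 38.09),
    "Error": (32, 928.19),
}
total_df = 39
total_ss = 1423.91

# Fraction of the total variation, and of the degrees of freedom,
# accounted for by each line of the ANOVA table.
shares = {name: ss / total_ss for name, (_, ss) in sources.items()}
df_shares = {name: df / total_df for name, (df, _) in sources.items()}

for name in sources:
    print(f"{name:<14} SS share = {shares[name]:6.1%}   "
          f"DF share = {df_shares[name]:6.1%}")
```

Temp accounts for about 32% of the variation while using under 8% of the degrees of freedom, whereas Pressure and Temp*Pressure together account for less than 3%.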

From the evidence in the ANOVA table, this system appears to be driven primarily by the main effect of Temp. The other terms in the model are therefore pooled into the error term, as they are regarded as being, essentially, noise.
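The effect of pooling can be sketched from the Table 1 figures alone. The calculation assumes that the sum of squares for Temp is unchanged when the other terms are dropped, which holds for a balanced design such as this one (the sequential and adjusted sums of squares in Table 1 agree). The resulting numbers are derived here for illustration and are not reported in the original analysis.

```python
# Values copied from Table 1.
temp_df, temp_ss = 3, 457.28
error_df, error_ss = 32, 928.19
dropped_terms = {"Pressure": (1, 0.35), "Temp*Pressure": (3, 38.09)}

# Pool the dropped terms into error: both DF and SS are added.
reduced_error_df = error_df + sum(df for df, _ in dropped_terms.values())
reduced_error_ss = error_ss + sum(ss for _, ss in dropped_terms.values())
reduced_error_ms = reduced_error_ss / reduced_error_df

# F-ratio for Temp in the reduced (Temp-only) model.
f_temp = (temp_ss / temp_df) / reduced_error_ms

print(reduced_error_df)            # 36
print(round(reduced_error_ss, 2))  # 966.63
print(round(reduced_error_ms, 2))  # 26.85
print(round(f_temp, 2))            # 5.68
```

Because the pooled terms contribute very little variation but four degrees of freedom, the error mean square falls slightly and the error term is estimated with more degrees of freedom.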

The model would then be refit and the residuals assessed in the usual manner to ensure that the model assumptions are reasonably well met.[4]

Summary

When considering sources of variation, the ANOVA table provides key information for model reduction purposes. The Six Sigma practitioner needs to use common sense, along with process knowledge, to distinguish the important few sources of variation from the trivial many.

About the Author

Keith M. Bower is a statistician and webmaster for www.KeithBower.com, a site devoted to providing access to online learning materials for quality improvement using statistical methods. He received a bachelor's degree in mathematics with economics from Strathclyde University in Great Britain and a master's degree in quality management and productivity from the University of Iowa in Iowa City, USA. He is a member of ASQ and the Six Sigma Forum.

References

1. For more information on these terms, see Keith M. Bower, "Some Comments on Historical Designed Experiments," ASQ Six Sigma Forum, November 2004.

2. For a discussion of degrees of freedom, see Keith M. Bower, "Why Divide by n-1?" ASQ Six Sigma Forum, February 2005.

3. R.A. Fisher, "Studies in Crop Variation. I. An Examination of the Yield of Dressed Grain from Broadbalk," Journal of Agricultural Science, 11 no. 2 (1920), 110-111.

4. For information on assessing model assumptions, see Douglas C. Montgomery, Design and Analysis of Experiments (New York: John Wiley and Sons, Inc., 2004): 75-85.



© Keith M. Bower. All rights reserved.