Six Sigma projects sometimes require practitioners to perform designed experiments. In planning or running these experiments, focusing too closely on desired outcomes or quick results can lead to misleading conclusions. Practitioners may, for instance, find it easy to overlook or misinterpret interaction effects, misapply tools of experimental design to historical data, or neglect the advantages of randomizing the order of experimental runs.
The first in a series of three articles developed to address common misconceptions regarding designed experiments, this discussion explores the impact of interaction effects, highlighting certain key points in experimental design strategies. A hypothetical example, used solely for pedagogical purposes, helps to illustrate the practical implications of interaction effects.
Consider a situation in which a saucepan of water containing pasta has been heated. Unfortunately, the water was allowed to evaporate completely, and the pasta subsequently burnt onto the pan. To remove the remnants from the saucepan, two factors are to be investigated: temperature and cleaning time.
The experimenter decides to run these two factors—(A) Temperature, and (B) Time—at two settings, or “levels,” each. The two levels of temperature are 20°C and 100°C. The two levels of cleaning time are 15 seconds and 120 seconds.
The response variable (Residue) is the amount of pasta (in grams) removed after the cleaning process has been completed. The experimenter is therefore interested in maximizing the response.
The factor level values for Temperature and Time are coded as “low” (–1), or “high” (+1), as shown in Table 1.
| Factor | Level | Coded | Uncoded |
| (A) Temperature | Low | -1 | 20°C |
| (A) Temperature | High | +1 | 100°C |
| (B) Time | Low | -1 | 15 seconds |
| (B) Time | High | +1 | 120 seconds |
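The coding in Table 1 is a linear rescaling of each factor around the midpoint of its two levels. A minimal Python sketch of that mapping follows; the function names `to_coded` and `to_uncoded` are illustrative and do not come from the article.

```python
def to_coded(value, low, high):
    """Map an uncoded factor setting onto the -1..+1 coded scale."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range


def to_uncoded(coded, low, high):
    """Inverse mapping: coded units back to the original units."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range


# Factor levels taken from Table 1.
TEMPERATURE = (20.0, 100.0)  # degrees Celsius
TIME = (15.0, 120.0)         # seconds

print(to_coded(20.0, *TEMPERATURE), to_coded(100.0, *TEMPERATURE))
print(to_coded(15.0, *TIME), to_coded(120.0, *TIME))
```

Working in coded units puts both factors on the same scale, so the sizes of their estimated coefficients can be compared directly.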
Too frequently, statisticians conduct an inadequate level of planning prior to experimentation. The need to make sense of a complicated dataset should be moot if the experiment is thought out in detail ahead of time. It is the effort put in at the planning stage that determines the legitimacy and usefulness of the results subsequently obtained.
Aspects of the experiment requiring consideration in the planning stage may include the following:
One key consideration is the number of experimental runs to be conducted. Enough runs are required to give an effect of practically important size ample opportunity to be detected.
For this experiment, assume that the standard deviation of the response variable, within the range of values being investigated, is 2 grams. In general, previous studies may produce this estimate, or the level of variation witnessed in a process may suggest it.
One would be advised to err on the side of using a larger standard deviation estimate for power and sample size calculations. Too small an estimate of the RMSE (root mean squared error) would suggest too few replicates; therefore, the power of the experiment, or the probability of correctly detecting certain differences, would be lower than desired.
Prior to running the experiment, the investigator determines the effect size that would be deemed practically significant and sets acceptable Type I and Type II error rates [1]. Consider a 4-gram (two-standard-deviation) difference, or more, in the response variable to be of practical interest.
As shown in Figure 1, each of the four temperature/time combinations needs to be replicated three times, leading to twelve runs in total. The actual power would then be slightly above 80% (power ≈ 85.73%).
2-Level Factorial Design
Alpha = 0.05 Assumed standard deviation = 2
Factors: 2 Base Design: 2, 4
Blocks: none
| Center Points | Effect | Reps | Total Runs | Target Power | Actual Power |
| 0 | 4 | 3 | 12 | 0.8 | 0.857290 |
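The power figure can be approximated with a short calculation. The sketch below is one way to do it, not necessarily the method used by the software that produced Figure 1. It assumes that all four model terms are fitted (so the error degrees of freedom are the total runs minus 4), that the test on an effect is a two-sided t-test, and that the critical t values are taken from standard tables (2.306004 for 8 degrees of freedom, 2.776445 for 4). The function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi2_pdf(v, df):
    """Chi-square density with df degrees of freedom."""
    half = df / 2.0
    return v ** (half - 1.0) * math.exp(-v / 2.0) / (2.0 ** half * math.gamma(half))


def factorial_power(effect, sigma, total_runs, n_terms, t_crit,
                    steps=20000, upper=150.0):
    """Power of a two-sided t-test on one effect in a 2-level factorial.

    The test statistic is (Z + delta) / sqrt(V / df), with Z standard normal
    and V chi-square; the rejection probability is integrated over V with
    Simpson's rule.
    """
    df_error = total_runs - n_terms
    # Coefficient (half the effect) divided by its standard error.
    delta = (effect / 2.0) / (sigma / math.sqrt(total_runs))
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        scale = math.sqrt(v / df_error)
        reject = (1.0 - normal_cdf(t_crit * scale - delta)) \
            + normal_cdf(-t_crit * scale - delta)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * reject * chi2_pdf(v, df_error)
    return total * h / 3.0


# Effect of 4 grams, standard deviation of 2 grams, alpha = 0.05.
power_three_reps = factorial_power(4.0, 2.0, total_runs=12, n_terms=4, t_crit=2.306004)
power_two_reps = factorial_power(4.0, 2.0, total_runs=8, n_terms=4, t_crit=2.776445)
print(round(power_three_reps, 4), round(power_two_reps, 4))
```

Under these assumptions, three replicates should give a value close to the actual power reported in Figure 1, while two replicates fall short of the 0.8 target.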
The experimental runs for cleaning the burnt pasta from the pan are randomized. Table 2 shows the results from 12 saucepans, each originally containing approximately equal amounts of burnt macaroni.
| Run Order | Temperature (°C) | Time (seconds) | Residue (grams) |
| 1 | 100 | 15 | 11.01 |
| 2 | 20 | 120 | 4.82 |
| 3 | 100 | 15 | 8.98 |
| 4 | 20 | 120 | 6.86 |
| 5 | 100 | 120 | 17.00 |
| 6 | 100 | 120 | 12.91 |
| 7 | 20 | 15 | 10.27 |
| 8 | 100 | 15 | 12.88 |
| 9 | 20 | 15 | 6.50 |
| 10 | 20 | 120 | 9.15 |
| 11 | 20 | 15 | 7.98 |
| 12 | 100 | 120 | 15.22 |
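A randomized run order like the one in Table 2 can be generated by listing every temperature/time combination once per replicate and shuffling the list. The sketch below uses Python's standard library; the function name and the seed value are hypothetical, so the order it produces will not match Table 2.

```python
import itertools
import random


def randomized_run_order(levels_a, levels_b, replicates, seed=None):
    """Return every factor combination `replicates` times, in random order."""
    rng = random.Random(seed)
    runs = [combo
            for combo in itertools.product(levels_a, levels_b)
            for _ in range(replicates)]
    rng.shuffle(runs)
    return runs


# Two temperatures (deg C), two cleaning times (seconds), three replicates.
plan = randomized_run_order([20, 100], [15, 120], replicates=3, seed=2024)
for run_number, (temperature, time_s) in enumerate(plan, start=1):
    print(run_number, temperature, time_s)
```

Fixing the seed makes the plan reproducible for documentation purposes; omitting it gives a fresh random order each time.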
For this experiment a model may be considered of the form:
(1) Residue = β0 + β1(Temperature) + β2(Time) + β12(Temperature*Time) + ε
Figure 2 shows the results using the Analysis of Variance (ANOVA) procedure and the corresponding regression coefficients.
Estimated Effects and Coefficients for Residue (coded units)
| Term | Effect | Coef | SE Coef | T | P |
| Constant | | 10.2983 | 0.5829 | 17.67 | 0.000 |
| Temperature | 5.4033 | 2.7017 | 0.5829 | 4.63 | 0.002 |
| Time | 1.3900 | 0.6950 | 0.5829 | 1.19 | 0.267 |
| Temperature*Time | 2.6967 | 1.3483 | 0.5829 | 2.31 | 0.049 |
S = 2.01931 R-Sq = 77.93% R-Sq(adj) = 69.66%
Analysis of Variance for Residue (coded units)
| Source | DF | Seq SS | Adj SS | Adj MS | F | P |
| Main Effects | 2 | 93.38 | 93.38 | 46.692 | 11.45 | 0.004 |
| 2-Way Interactions | 1 | 21.82 | 21.82 | 21.816 | 5.35 | 0.049 |
| Residual Error | 8 | 32.62 | 32.62 | 4.078 | | |
| Pure Error | 8 | 32.62 | 32.62 | 4.078 | | |
| Total | 11 | 147.82 | | | | |
Estimated Coefficients for Residue using data in uncoded units
| Term | Coef |
| Constant | 7.95262 |
| Temperature | 0.0242024 |
| Time | -0.0252857 |
| Temperature*Time | 0.000642063 |
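The coded-unit estimates in Figure 2 can be reproduced from the Table 2 data with elementary arithmetic. Because the design is balanced and the coded columns are orthogonal, each coefficient is simply the average of the response multiplied by the corresponding coded column. The sketch below is an illustration in plain Python, not the procedure used by the software behind Figure 2; the variable names are illustrative, and P-values are omitted because they require a t-distribution routine.

```python
import math

# (temperature in deg C, time in seconds, residue removed in grams), Table 2.
RUNS = [
    (100, 15, 11.01), (20, 120, 4.82), (100, 15, 8.98), (20, 120, 6.86),
    (100, 120, 17.00), (100, 120, 12.91), (20, 15, 10.27), (100, 15, 12.88),
    (20, 15, 6.50), (20, 120, 9.15), (20, 15, 7.98), (100, 120, 15.22),
]


def coded(value, low, high):
    """Rescale a factor setting to -1 (low) .. +1 (high)."""
    return (2.0 * value - (low + high)) / (high - low)


# Coded columns: a = Temperature, b = Time; the interaction column is a * b.
rows = [(coded(t, 20, 100), coded(s, 15, 120), y) for t, s, y in RUNS]
n = len(rows)

grand_mean = sum(y for _, _, y in rows) / n
coef = {
    "Temperature": sum(a * y for a, _, y in rows) / n,
    "Time": sum(b * y for _, b, y in rows) / n,
    "Temperature*Time": sum(a * b * y for a, b, y in rows) / n,
}
# An effect is the change from the low to the high level: twice the coefficient.
effect = {term: 2.0 * c for term, c in coef.items()}


def fitted(a, b):
    """Fitted value from the full model in coded units."""
    return (grand_mean + coef["Temperature"] * a + coef["Time"] * b
            + coef["Temperature*Time"] * a * b)


sse = sum((y - fitted(a, b)) ** 2 for a, b, y in rows)  # residual sum of squares
sst = sum((y - grand_mean) ** 2 for _, _, y in rows)    # total sum of squares
df_error = n - 4                                        # four fitted terms
s = math.sqrt(sse / df_error)                           # "S" in Figure 2
se_coef = s / math.sqrt(n)                              # same for every term
t_stat = {term: c / se_coef for term, c in coef.items()}
r_sq = 1.0 - sse / sst

for term in coef:
    print(term, round(effect[term], 4), round(coef[term], 4), round(t_stat[term], 2))
print(round(s, 5), round(se_coef, 4), round(100 * r_sq, 2))
```

The printed effects, coefficients, standard error, and S can be compared line by line with the coded-units section of Figure 2.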
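From the ANOVA table, the main effects taken together are statistically significant (P-value = 0.004 < 0.05), as is the two-way interaction (P-value = 0.049 < 0.05).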
Considering the results in the regression analysis output, the main effect of Time is not statistically significant at the α = 0.05 level (P-value = 0.267 > 0.05). Crucially, however, Time is important via the interaction effect.
The main effect of Temperature is statistically significant (P-value = 0.002 < 0.05).
Note that owing to the heredity of effects principle, typically either one or both main effects that compose the interaction term will be statistically significant if the interaction effect is significant. From a practical perspective, however, “optimal” settings from using the main effect estimates alone may be misleading when a significant interaction effect exists.

As shown in Figure 3, when using the high temperature setting (100°C), there is a marked increase in the amount of residue removed with the longer cleaning time (120 seconds). Note that the lines in an interaction plot do not need to cross for the interaction effect to be significant; a departure from parallel lines is what indicates an interaction. Graphically, the interaction effect appears as a twisting of the response plane, as shown in the surface plot in Figure 4 and in the contour plot in Figure 5.


Clearly, the optimum settings involve higher temperatures with longer cleaning times. From Figure 2, a regression model using uncoded units may be expressed as:
(2) Residue = 7.9526 + (0.0242*Temperature) - (0.0253*Time) + (0.000642*Temperature*Time)
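The uncoded coefficients follow from the coded ones by substituting (Temperature - 60)/40 and (Time - 67.5)/52.5 for the coded factors and collecting terms. The sketch below carries out that substitution using the coded coefficients from Figure 2 and the factor levels from Table 1; the function names are illustrative.

```python
# Coded-unit coefficients from Figure 2.
B0, B_TEMP, B_TIME, B_INT = 10.2983, 2.7017, 0.6950, 1.3483

# Centers and half-ranges of the factor levels in Table 1.
TEMP_CENTER, TEMP_HALF = 60.0, 40.0   # degrees Celsius
TIME_CENTER, TIME_HALF = 67.5, 52.5   # seconds


def uncoded_coefficients():
    """Expand the coded model into coefficients on the original units."""
    b_int = B_INT / (TEMP_HALF * TIME_HALF)
    b_temp = B_TEMP / TEMP_HALF - b_int * TIME_CENTER
    b_time = B_TIME / TIME_HALF - b_int * TEMP_CENTER
    b0 = (B0
          - B_TEMP * TEMP_CENTER / TEMP_HALF
          - B_TIME * TIME_CENTER / TIME_HALF
          + b_int * TEMP_CENTER * TIME_CENTER)
    return b0, b_temp, b_time, b_int


def predict(temperature, time_s):
    """Predicted residue removed (grams) at uncoded factor settings."""
    b0, b_temp, b_time, b_int = uncoded_coefficients()
    return b0 + b_temp * temperature + b_time * time_s + b_int * temperature * time_s


print([round(c, 6) for c in uncoded_coefficients()])
for temperature in (20, 100):
    for time_s in (15, 120):
        print(temperature, time_s, round(predict(temperature, time_s), 2))
```

The negative uncoded coefficient for Time does not mean that longer cleaning is harmful in general. It describes the slope at a temperature of 0°C, outside the experimental region; within the region, the interaction term determines whether a longer time helps.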
Note that the main effect of Time, although not statistically significant, remains in the model solely because it is a component of the significant interaction term. This follows the principle of hierarchy [2].
Six Sigma practitioners may sometimes overlook or misinterpret interaction effects in designed experiments. The above example demonstrates how to identify interaction effects and illustrates the effect on the response variable. When interactions are present, “optimal” settings obtained from main effects alone may produce non-optimal results.
References
1. For more information on power and sample size determination, see Keith M. Bower, “Sample Size Determination for the Test of One Proportion,” International Society of Six Sigma Professionals: EXTRAOrdinary Sense 3, no. 1 (2002): 6-7.
2. For more information on hierarchical models, see Douglas C. Montgomery, Design and Analysis of Experiments, 5th ed. (New York: John Wiley & Sons, Inc., 2001), 203, and Section 5-5 in Chapter 5 of the Supplemental Text Materials section.
Bibliography
1. Box, George E. P., William G. Hunter, and J. Stuart Hunter. Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: John Wiley & Sons, Inc., 1978.
2. Box, George E. P. “Do Interactions Matter?” Part B.1. in Box on Quality and Discovery: With Design, Control, and Robustness. New York: John Wiley & Sons, Inc., 2000.
3. Fisher, Ronald A. The Design of Experiments. 8th ed. New York: Hafner Publishing Company Inc., 1966.