Interaction Effects in Designed Experiments

by Keith M. Bower

Six Sigma projects sometimes require practitioners to perform designed experiments. In planning or running these experiments, focusing too closely on desired outcomes or quick results can create misleading circumstances. Practitioners may, for instance, find it easy to overlook or misinterpret interaction effects, misapply tools of experimental design to historical data, or neglect the advantages of randomizing the order of experimental runs.

The first in a series of three articles developed to address common misconceptions regarding designed experiments, this discussion explores the impact of interaction effects, highlighting certain key points in experimental design strategies. A hypothetical example, used solely for pedagogical purposes, helps to illustrate the practical implications of interaction effects.

Example

Consider a situation in which a saucepan of water containing pasta has been heated. Unfortunately, the water was allowed to evaporate completely, and the pasta subsequently burnt onto the pan. To remove the remnants from the saucepan, two factors are to be investigated:

  1. Temperature of the water used to clean the saucepan, measured in degrees Celsius (°C), and
  2. Cleaning time, measured in seconds

The experimenter decides to run these two factors—(A) Temperature, and (B) Time—at two settings, or “levels,” each. The two levels of temperature are 20°C and 100°C. The two levels of cleaning time are 15 seconds and 120 seconds.

The response variable (Residue) is the amount of pasta (in grams) removed after the cleaning process has been completed. The experimenter is therefore interested in maximizing the response.

The factor level values for Temperature and Time are coded as “low” (–1), or “high” (+1), as shown in Table 1.

Table 1

Factor           Level  Coded  Uncoded
(A) Temperature  Low    -1     20°C
                 High   +1     100°C
(B) Time         Low    -1     15 seconds
                 High   +1     120 seconds
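The coding in Table 1 is a linear rescaling of each factor about the midpoint of its range. A minimal sketch follows; the function names to_coded and to_uncoded are illustrative and do not come from any statistical package.

```python
def to_coded(value, low, high):
    """Map an uncoded factor setting onto the -1..+1 coded scale."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return (value - center) / half_range


def to_uncoded(coded, low, high):
    """Map a coded setting back to the original units."""
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


# Temperature runs from 20 to 100 degrees Celsius, Time from 15 to 120 seconds.
print(to_coded(20, 20, 100))    # -1.0
print(to_coded(100, 20, 100))   # 1.0
print(to_coded(120, 15, 120))   # 1.0
print(to_uncoded(0, 15, 120))   # 67.5
```

The coded value 0 corresponds to the center of the design region: 60°C and 67.5 seconds.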

Planning the experiment

Too frequently, statisticians conduct an inadequate level of planning prior to experimentation. The need to make sense of a complicated dataset should be moot if the experiment is thought out in detail ahead of time. It is the effort put in at the planning stage that determines the legitimacy and usefulness of the results subsequently obtained.

Aspects of the experiment requiring consideration in the planning stage may include the following:

  • Which factors are to be used?
  • At what settings should the factors be run?
  • Would blocking be appropriate?
  • How are “hard-to-change” factors to be addressed?

Power and sample size

One key consideration is the number of experimental runs to be conducted. Enough runs are required that an effect large enough to matter in practice has a good chance of being detected.

For this experiment, assume that the standard deviation of the response variable, within the range of values being investigated, is 2 grams. In general, previous studies may produce this estimate, or the level of variation witnessed in a process may suggest it.

One would be advised to err on the side of using a larger standard deviation estimate for power and sample size calculations. Too small an estimate of the RMSE (root mean squared error) would suggest too few replicates; therefore, the power of the experiment, or the probability of correctly detecting certain differences, would be lower than desired.

Prior to running the experiment, the investigator determines the effect size that would be deemed practically significant, along with acceptable Type I and Type II error rates (see Reference 1). Consider a 4-gram (two-standard-deviation) difference, or more, in the response variable to be of practical interest.

As shown in Figure 1, each of the four temperature/time combinations needs to be replicated three times, leading to twelve runs in total. The actual power would then be approximately 85.73%, above the 80% target.

Figure 1

Power and Sample Size

2-Level Factorial Design

Alpha = 0.05   Assumed standard deviation = 2

Factors: 2   Base Design: 2, 4
Blocks: none

Center                Total  Target
Points  Effect  Reps   Runs   Power  Actual Power
     0       4     3     12     0.8      0.857290
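The power figure can be checked approximately by simulation. The sketch below is a Monte Carlo approximation, not the exact calculation a statistical package performs. It assumes normally distributed errors, a full model with four terms (leaving 8 error degrees of freedom), and a two-sided t-test with the tabulated critical value 2.306. The function name simulate_power and its parameters are illustrative.

```python
import math
import random


def simulate_power(effect=4.0, sigma=2.0, n_runs=12, n_terms=4,
                   t_crit=2.306, n_sims=20000, seed=1):
    """Estimate the chance of detecting a true effect of the given size."""
    rng = random.Random(seed)
    df_error = n_runs - n_terms                    # 12 - 4 = 8
    se_effect = 2 * sigma / math.sqrt(n_runs)      # standard error of an effect
    rejections = 0
    for _ in range(n_sims):
        # Estimated effect: the true effect plus sampling noise.
        effect_hat = rng.gauss(effect, se_effect)
        # Estimated standard deviation: sigma scaled by sqrt(chi-square / df).
        chi_square = sum(rng.gauss(0, 1) ** 2 for _ in range(df_error))
        s = sigma * math.sqrt(chi_square / df_error)
        t_stat = effect_hat / (2 * s / math.sqrt(n_runs))
        if abs(t_stat) > t_crit:
            rejections += 1
    return rejections / n_sims


power = simulate_power()
print(f"Simulated power: {power:.3f}")
```

With three replicates the simulated value should land near the actual power reported in Figure 1, subject to simulation error. Changing n_runs to 8 (two replicates) shows how much power is lost with fewer runs.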

Interpreting results

The experimental runs for cleaning the burnt pasta from the pan are randomized. Table 2 shows the results from 12 saucepans, each originally containing approximately equal amounts of burnt macaroni.

Table 2

Run Order  Temperature  Time  Residue
1          100          15    11.01
2          20           120   4.82
3          100          15    8.98
4          20           120   6.86
5          100          120   17.00
6          100          120   12.91
7          20           15    10.27
8          100          15    12.88
9          20           15    6.50
10         20           120   9.15
11         20           15    7.98
12         100          120   15.22

  • Temperature = temperature of the water used to clean the pan, in degrees Celsius
  • Time = cleaning time, measured in seconds
  • Residue = amount of pasta, measured in grams, removed after cleaning process

For this experiment a model may be considered of the form:

(1) Residue = β0 + β1(Temperature) + β2(Time) + β12(Temperature*Time) + ε
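Because the design is balanced and the coded columns are orthogonal, the terms in model (1) can be estimated directly from the Table 2 data as differences of averages. The following sketch does this in coded units; the helper names code and effect are illustrative.

```python
# (temperature in degrees Celsius, time in seconds, residue in grams), from Table 2
runs = [
    (100, 15, 11.01), (20, 120, 4.82), (100, 15, 8.98), (20, 120, 6.86),
    (100, 120, 17.00), (100, 120, 12.91), (20, 15, 10.27), (100, 15, 12.88),
    (20, 15, 6.50), (20, 120, 9.15), (20, 15, 7.98), (100, 120, 15.22),
]


def code(value, low, high):
    """Rescale a factor setting to -1 (low) or +1 (high)."""
    return (value - (high + low) / 2) / ((high - low) / 2)


def effect(column, response):
    """Mean response where the column is +1 minus mean response where it is -1."""
    high = [y for x, y in zip(column, response) if x > 0]
    low = [y for x, y in zip(column, response) if x < 0]
    return sum(high) / len(high) - sum(low) / len(low)


a = [code(temp, 20, 100) for temp, _, _ in runs]    # Temperature, coded
b = [code(time, 15, 120) for _, time, _ in runs]    # Time, coded
ab = [x * z for x, z in zip(a, b)]                  # interaction column
y = [residue for _, _, residue in runs]

constant = sum(y) / len(y)
print(f"{'Constant':18s} coef={constant:.4f}")
for name, column in [("Temperature", a), ("Time", b), ("Temperature*Time", ab)]:
    e = effect(column, y)
    # In coded units each coefficient is half the corresponding effect.
    print(f"{name:18s} effect={e:.4f} coef={e / 2:.4f}")
```

The coefficient is half the effect because moving from the low to the high level changes the coded variable by two units.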

Figure 2 shows the results using the Analysis of Variance (ANOVA) procedure and the corresponding regression coefficients.

Figure 2

Factorial Fit: Residue versus Temperature, Time

Estimated Effects and Coefficients for Residue (coded units)

Term              Effect     Coef  SE Coef      T      P
Constant                  10.2983   0.5829  17.67  0.000
Temperature       5.4033   2.7017   0.5829   4.63  0.002
Time              1.3900   0.6950   0.5829   1.19  0.267
Temperature*Time  2.6967   1.3483   0.5829   2.31  0.049

S = 2.01931   R-Sq = 77.93%   R-Sq(adj) = 69.66%

Analysis of Variance for Residue (coded units)

Source              DF  Seq SS  Adj SS  Adj MS      F      P
Main Effects         2   93.38   93.38  46.692  11.45  0.004
2-Way Interactions   1   21.82   21.82  21.816   5.35  0.049
Residual Error       8   32.62   32.62   4.078
  Pure Error         8   32.62   32.62   4.078
Total               11  147.82

Estimated Coefficients for Residue using data in uncoded units

Term              Coef
Constant          7.95262
Temperature       0.0242024
Time              -0.0252857
Temperature*Time  0.000642063

From the ANOVA table:

  1. the interaction term is statistically significant at the α = 0.05 level (P-value = 0.049 < 0.05); therefore, we reject H0: β12 = 0
  2. we also reject the null hypothesis, H0: β1 = β2 = 0 (P-value = 0.004 < 0.05), i.e., at least one main effect is significant
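The sums of squares and F statistics behind these two tests can be reproduced from the Table 2 data. The sketch below relies on the design being balanced, so that each term's sum of squares is the number of runs multiplied by the squared coded coefficient. P-values are omitted because the Python standard library has no F distribution. The helper names are illustrative.

```python
from collections import defaultdict

# (temperature in degrees Celsius, time in seconds, residue in grams), from Table 2
runs = [
    (100, 15, 11.01), (20, 120, 4.82), (100, 15, 8.98), (20, 120, 6.86),
    (100, 120, 17.00), (100, 120, 12.91), (20, 15, 10.27), (100, 15, 12.88),
    (20, 15, 6.50), (20, 120, 9.15), (20, 15, 7.98), (100, 120, 15.22),
]

cells = defaultdict(list)
for temp, time, residue in runs:
    cells[(temp, time)].append(residue)
means = {key: sum(vals) / len(vals) for key, vals in cells.items()}
n = len(runs)

# Pure error: variation of replicates about their own cell mean.
ss_error = sum((r - means[key]) ** 2 for key, vals in cells.items() for r in vals)
df_error = n - len(cells)
ms_error = ss_error / df_error


def sign_temp(key):
    return 1 if key[0] == 100 else -1


def sign_time(key):
    return 1 if key[1] == 120 else -1


def contrast_ss(sign):
    """Sum of squares for one term: n times the squared coded coefficient."""
    coef = sum(sign(key) * mean for key, mean in means.items()) / len(means)
    return n * coef ** 2


ss_temp = contrast_ss(sign_temp)
ss_time = contrast_ss(sign_time)
ss_inter = contrast_ss(lambda key: sign_temp(key) * sign_time(key))

f_main = ((ss_temp + ss_time) / 2) / ms_error   # 2 degrees of freedom
f_inter = ss_inter / ms_error                   # 1 degree of freedom
print(f"Main effects:  SS={ss_temp + ss_time:.2f}  F={f_main:.2f}")
print(f"Interaction:   SS={ss_inter:.2f}  F={f_inter:.2f}")
print(f"Pure error:    SS={ss_error:.2f}  MS={ms_error:.3f}")
```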

Considering the results in the regression analysis output, the main effect of Time is not statistically significant at the α = 0.05 level (P-value = 0.267 > 0.05). Crucially, however, Time is important via the interaction effect.

The main effect of Temperature is statistically significant (P-value = 0.002 < 0.05).

Note that owing to the heredity of effects principle, typically either one or both main effects that compose the interaction term will be statistically significant if the interaction effect is significant. From a practical perspective, however, “optimal” settings from using the main effect estimates alone may be misleading when a significant interaction effect exists.

Figure 3


As shown in Figure 3, when using the high temperature setting (100°C), there is a marked increase in the amount of residue removed with the longer cleaning time (120 seconds). Note that the lines in an interaction plot do not need to cross for the interaction effect to be significant; it is enough that they are clearly not parallel. Graphically, the interaction effect is shown by a twisting in the response plane, as shown in the surface plot in Figure 4 and in the contour plot in Figure 5.
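The points plotted in an interaction plot are simply the four cell means. The sketch below computes them from the Table 2 data, along with the change due to Time at each temperature. At 20°C the longer cleaning time removes about 1.31 grams less on average, whereas at 100°C it removes about 4.09 grams more. The variable names are illustrative.

```python
from collections import defaultdict

# (temperature in degrees Celsius, time in seconds, residue in grams), from Table 2
runs = [
    (100, 15, 11.01), (20, 120, 4.82), (100, 15, 8.98), (20, 120, 6.86),
    (100, 120, 17.00), (100, 120, 12.91), (20, 15, 10.27), (100, 15, 12.88),
    (20, 15, 6.50), (20, 120, 9.15), (20, 15, 7.98), (100, 120, 15.22),
]

cells = defaultdict(list)
for temp, time, residue in runs:
    cells[(temp, time)].append(residue)
means = {key: sum(vals) / len(vals) for key, vals in cells.items()}

# Change in mean residue removed when Time goes from 15 to 120 seconds,
# evaluated separately at each temperature.
time_effect_at = {temp: means[(temp, 120)] - means[(temp, 15)] for temp in (20, 100)}
for temp, change in time_effect_at.items():
    print(f"At {temp} degrees C: {change:+.4f} grams")

# The main effect of Time averages the two changes;
# the interaction effect is half their difference.
main_effect_time = (time_effect_at[20] + time_effect_at[100]) / 2
interaction_effect = (time_effect_at[100] - time_effect_at[20]) / 2
print(f"Main effect of Time: {main_effect_time:.4f}")
print(f"Interaction effect:  {interaction_effect:.4f}")
```

The modest main effect of Time (1.39 grams) is an average of two changes with opposite signs, which is why the main effect on its own says little about how Time behaves at a given temperature.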

Figure 4


Figure 5


Clearly, the optimum settings involve higher temperatures with longer cleaning times. From Figure 2, a regression model using uncoded units may be expressed as:

(2) Residue = 7.9526 + (0.0242*Temperature) - (0.0253*Time) + (0.000642*Temperature*Time)

Note that the statistically insignificant main effect (Time) remains in the model solely because it is part of the interaction term, in keeping with the principle of hierarchy (see Reference 2).
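The uncoded coefficients follow from the coded ones by substituting the coding formulas into the model and collecting terms. The sketch below starts from the rounded coded coefficients in Figure 2, so its results match the uncoded values there only to about four decimal places. The variable names and the predict function are illustrative.

```python
# Coded coefficients from Figure 2 (rounded to four decimals).
b0, b1, b2, b12 = 10.2983, 2.7017, 0.6950, 1.3483

# Center and half-range of each factor in original units.
center_a, half_a = 60.0, 40.0     # Temperature: 20 to 100 degrees Celsius
center_b, half_b = 67.5, 52.5     # Time: 15 to 120 seconds

# Substitute (Temperature - 60) / 40 and (Time - 67.5) / 52.5 into the
# coded model and collect terms.
u12 = b12 / (half_a * half_b)
u1 = b1 / half_a - u12 * center_b
u2 = b2 / half_b - u12 * center_a
u0 = b0 - b1 * center_a / half_a - b2 * center_b / half_b + u12 * center_a * center_b


def predict(temperature, time):
    """Predicted residue removed (grams) at uncoded factor settings."""
    return u0 + u1 * temperature + u2 * time + u12 * temperature * time


print(f"Constant          {u0:.4f}")
print(f"Temperature       {u1:.6f}")
print(f"Time              {u2:.6f}")
print(f"Temperature*Time  {u12:.8f}")
for temperature in (20, 100):
    for time in (15, 120):
        print(f"{temperature:>3} degrees C, {time:>3} s: {predict(temperature, time):.2f} grams")
```

The interaction coefficient is small in uncoded units, but it multiplies a product as large as 12,000 (100°C × 120 seconds), so rounding it too coarsely shifts predictions at the high settings noticeably.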

Summary

Six Sigma practitioners may sometimes overlook or misinterpret interaction effects in designed experiments. The above example demonstrates how to identify interaction effects and illustrates the effect on the response variable. When interactions are present, “optimal” settings obtained from main effects alone may produce non-optimal results.


References

1. For more information on power and sample size determination, see Keith M. Bower, “Sample Size Determination for the Test of One Proportion,” International Society of Six Sigma Professionals: EXTRAOrdinary Sense 3, no. 1 (2002): 6-7.

2. For more information on hierarchical models, see Douglas C. Montgomery, Design and Analysis of Experiments, 5th ed. (New York: John Wiley & Sons, Inc., 2001), 203, and Section 5-5 in Chapter 5 of the Supplemental Text Materials section.


Bibliography

1. Box, George E. P., William G. Hunter, and Stuart J. Hunter. Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. New York: John Wiley & Sons, Inc., 1978.

2. Box, George E. P. “Do Interactions Matter?” Part B.1. in Box on Quality and Discovery: With Design, Control, and Robustness. New York: John Wiley & Sons, Inc., 2000.

3. Fisher, Ronald A. The Design of Experiments. 8th ed. New York: Hafner Publishing Company Inc., 1966.