Read the first article in the series:
Interaction Effects in Designed Experiments

Some Comments on "Historical" Designed Experiments

by Keith M. Bower

Six Sigma practitioners frequently analyze historical data to assess process behavior. Analysts are occasionally encouraged to use the tools of experimental design to investigate such data and draw conclusions. The second in a three-part series developed to address common misconceptions regarding designed experiments, this article exposes some drawbacks to historical designed experiments and recommends alternative strategies.

Overview of designed experiments

In analyzing a process, a designed experiment holds many advantages over ad hoc procedures. With a clearly defined and executed experiment, the investigator may hold certain noise factors constant by testing in a laboratory setting and alter other factors to obtain estimates of main and interaction effects. Including noise factors in the experiment may help assess the robustness of a given product to, say, environmental conditions.

The two-level factorial design (also known as the two-to-the-k, or 2^k) is typically taught as one way of planning and executing an experiment with k predictor variables. As an illustration of this type of experimental design, consider the following example that uses a 2^3 design.

Example of a 2^3 design

An experiment to be performed seeks to maximize the yield from a chemical reaction. Three factors are to be investigated: temperature, pressure, and the mass of the catalyst used on a fixed amount of iron. For each factor, high and low values have been selected and are coded as shown below.

(A) Temperature: low (-1) = 40 degrees Celsius (°C), high (+1) = 100 °C
(B) Pressure: low (-1) = 200 pounds per square inch (PSI), high (+1) = 1000 PSI
(C) Catalyst: low (-1) = 1 milligram (mg), high (+1) = 3 mg
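The coding above is a linear rescaling of each factor around the midpoint of its range. A minimal sketch of that mapping follows; the function names to_coded and to_natural are illustrative choices, not from the article, and the temperature levels are taken to be 40 °C and 100 °C.

```python
def to_coded(value, low, high):
    """Map a natural-unit setting onto the coded scale (low -> -1, high -> +1)."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range


def to_natural(coded, low, high):
    """Inverse mapping: coded units back to natural units."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return center + coded * half_range


# Factor ranges from the example (temperature assumed to be 40 and 100 degrees C).
TEMPERATURE = (40.0, 100.0)   # degrees Celsius
PRESSURE = (200.0, 1000.0)    # PSI
CATALYST = (1.0, 3.0)         # mg

print(to_coded(40.0, *TEMPERATURE))    # low temperature in coded units
print(to_coded(1000.0, *PRESSURE))     # high pressure in coded units
print(to_natural(0.0, *CATALYST))      # catalyst mass at the design center
```

In a properly designed two-level experiment every run sits exactly at a coded value of -1 or +1, which is why the coded and natural scales carry the same information.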

After power and sample size assessments, the experimenter decides to obtain four runs for each of the (2^3 = 8) possible combinations, randomizing the order of these (4 × 8 = 32) runs. After the experiment is executed, a regression model is used as in equation (1).

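$$\text{Yield} = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 C + \beta_{12} AB + \beta_{13} AC + \beta_{23} BC + \beta_{123} ABC + \varepsilon \tag{1}$$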

Appropriate model reduction may be required, using the principle of parsimony (a preference for the simplest formula) based on the sparsity of effects assumption (only a few effects will be significant). Analysis of the residuals may help verify model assumptions.
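The run plan described in this example (eight factor combinations, four replicates each, randomized order) can be sketched as follows. This is only an illustration: the function name randomized_full_factorial and the fixed seed are assumptions made here for reproducibility, not part of the original study.

```python
import itertools
import random


def randomized_full_factorial(n_factors=3, replicates=4, seed=0):
    """Return the runs of a replicated 2^k design in randomized order.

    Each run is a tuple of coded levels (-1 or +1), one entry per factor.
    """
    corners = list(itertools.product((-1, 1), repeat=n_factors))
    runs = [corner for corner in corners for _ in range(replicates)]
    rng = random.Random(seed)   # seed fixed only so the sketch is repeatable
    rng.shuffle(runs)
    return runs


runs = randomized_full_factorial()

# Show the first few runs in the order they would be executed.
for run_order, (a, b, c) in enumerate(runs[:5], start=1):
    print(f"run {run_order}: A={a:+d} B={b:+d} C={c:+d}")
```

Every factor appears 16 times at -1 and 16 times at +1, which is the balance property referred to in point (c).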

With regard to this example, the following circumstances have implications in relation to a historical designed experiment:

(a) The order of the runs has been randomized
(b) Low and high values of the factors correspond directly to specific values
(c) There are equal numbers of low and high values used for each factor (16 low, 16 high)
(d) The coded values of -1 and +1 are used in the setup for the low and high values, respectively
(e) A regression model is built and verified (using the coded values)

Implications for "Historical" Designed Experiments

The historical designed experiment (HDE) procedure as frequently taught consists of a practitioner assessing data collected over time and then using the 2^k factorial method of analysis. This discussion assumes that the analyst has not controlled the input factors and has actually recorded more than two levels of each input variable.

Consider the following points:

(a) The order of the runs has been randomized

Randomly assigning treatments to experimental units (here, randomizing the order in which the factor-level combinations are run) is widely regarded as the only valid basis for plausibly arguing a cause-and-effect relationship. Essentially, the experimenter's ability to purposefully alter the order in which the factor levels are used legitimizes such inferences. We may rely upon randomization to wash out the effects of other factors.

A statistician may, using probability theory alone, seek to assess the likelihood of obtaining the results merely by chance. In practice, this type of inference, based on permuting the original data, may lead to time-intensive computations. Analysts therefore typically depend upon the approximate results provided by normal theory techniques [1].
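To make the idea of permuting the original data concrete, here is a minimal sketch of an exact randomization test for a difference in means between two groups. The yield values are hypothetical numbers chosen only for illustration, and the function name exact_randomization_p_value is an assumption of this sketch.

```python
from itertools import combinations


def exact_randomization_p_value(group_a, group_b):
    """Two-sided exact randomization test for a difference in group means.

    Enumerates every way of splitting the pooled data into groups of the
    original sizes and counts the splits whose absolute mean difference is
    at least as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        return (total - sum_a) / n_b - sum_a / n_a

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        n_splits += 1
        if abs(mean_diff(sum_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Hypothetical yields at the low and high setting of a single factor.
low_yields = [61.0, 62.0, 63.0, 64.0]
high_yields = [65.0, 66.0, 67.0, 68.0]

p_value = exact_randomization_p_value(low_yields, high_yields)
print(f"exact randomization p-value: {p_value:.4f}")
```

With two groups of four there are only 70 possible splits, but the count grows very quickly with sample size, which is the computational burden the paragraph above refers to.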

For this discussion, merely note that inference of a cause-and-effect nature is invalid when using an HDE: the data are collected over some period of time, and the factor settings have not been purposefully and strategically adjusted. Calling the procedure a designed experiment is therefore a misnomer.

(b) Low and high values of the factors correspond directly to specific values

Consider the model in equation (1). Assume that after analyzing the results the reduced model takes the form:

$$\text{Yield} = \beta_0 + \beta_1 A + \beta_3 C + \beta_{13} AC + \varepsilon \tag{2}$$

Therefore, the main effects of Temperature (A) and Catalyst (C), and their interaction, are to be used. The low and high values for these factors (40 °C and 100 °C for Temperature, 1 mg and 3 mg for Catalyst) may be used to obtain estimates of β0, β1, β3, and β13 in equation (2) to predict Yield.

However, when using an arbitrary cut-off point (that is, when all values less than x are low and the rest are high), we cannot derive valid predicted values. The low and high values in an HDE have no explicit meaning other than discerning some threshold as to what will be judged low or high. Regression models built using the HDE procedure may thus provide incorrect parameter estimates.
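The dependence on the cut-off can be illustrated with a small synthetic data set. In the sketch below the response is constructed to be exactly 2 times the predictor, so the correct slope is known; the data, the two cut-offs, and the helper names are all assumptions made for illustration, and NumPy is used for the least-squares fits.

```python
import numpy as np

# Synthetic "historical" readings: the response is built as exactly 2 * x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])
y = 2.0 * x


def fit_line(predictor, response):
    """Least-squares fit of response = b0 + b1 * predictor; returns (b0, b1)."""
    design = np.column_stack([np.ones_like(predictor), predictor])
    coefficients, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefficients


def dichotomize(values, cutoff):
    """HDE-style coding: anything below the cut-off is -1, the rest is +1."""
    return np.where(values < cutoff, -1.0, 1.0)


slope_actual = fit_line(x, y)[1]
slope_cut_low = fit_line(dichotomize(x, 2.5), y)[1]
slope_cut_high = fit_line(dichotomize(x, 5.0), y)[1]

print("slope using recorded values:", round(float(slope_actual), 3))
print("coefficient with cut-off 2.5:", round(float(slope_cut_low), 3))
print("coefficient with cut-off 5.0:", round(float(slope_cut_high), 3))
```

The fit on the recorded values recovers the slope that was built into the data, while the dichotomized fits give coefficients that change with the choice of threshold and can predict only two distinct response values.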

(c) There are equal numbers of low and high values used for each factor (16 low, 16 high)

Using equal numbers of low and high values constitutes a balanced design. The statistical analysis is relatively straightforward and many statistical software packages can provide results directly.

Importantly, the concept of orthogonality (see section d, below) will be violated to some extent with unbalanced designs. I have become aware of practitioners who, when using the HDE procedure, purposefully discard data points (to obtain equal numbers of low and high values) solely in order for a statistical software package to provide results. These analysts have not only sought to fit their data to a 2^k factorial design model, but they have deleted data to do so.

Such arbitrary data destruction is extremely poor practice and should be strongly discouraged. When the tools of regression analysis are applied to a set of data, the design does not have to be balanced.

(d) The values of -1 and +1 are used in the setup for the low and high values, respectively

By using coded values of -1 and +1 for the two levels in our example, we obtain an orthogonal design structure. Essentially, when effect estimates are orthogonal as in the 2^3 design discussed in the chemical example, the inclusion or exclusion of other terms in the model does not affect the main and interaction effect size estimates.

When terms are not orthogonal, the presence of multicollinearity between factors must be of concern. In particular, when seeking to remove terms in a model when multicollinearity is present, the analyst needs to exercise more caution than in the orthogonal case, since inferences on the parameter estimates can be (seriously) affected.
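The contrast between the orthogonal and non-orthogonal cases can be sketched numerically. Below, a noise-free synthetic response with made-up coefficients (10 + 3A + 2C) is fitted with and without the C term, once on a full 2^3 design and once on a small set of coded runs in which A and C happen to be correlated. All data and names are assumptions for illustration; NumPy supplies the least-squares fit.

```python
import itertools
import numpy as np


def response(a, c):
    """Synthetic, noise-free response with made-up coefficients."""
    return 10.0 + 3.0 * a + 2.0 * c


def estimate_a(a, c, y, include_c):
    """Coefficient of A from a least-squares fit, with or without the C term."""
    columns = [np.ones_like(a), a] + ([c] if include_c else [])
    design = np.column_stack(columns)
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coefficients[1])


# Orthogonal case: the eight corner points of a 2^3 design (columns A, B, C).
corners = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))
a_orth, c_orth = corners[:, 0], corners[:, 2]

# Non-orthogonal case: coded settings in which A and C tend to move together.
a_hist = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
c_hist = np.array([-1.0, -1.0, 1.0, 1.0, 1.0, -1.0])

for label, a, c in [("orthogonal", a_orth, c_orth), ("correlated", a_hist, c_hist)]:
    y = response(a, c)
    with_c = estimate_a(a, c, y, include_c=True)
    without_c = estimate_a(a, c, y, include_c=False)
    print(f"{label}: A.C = {a @ c:+.0f}, "
          f"A estimate with C = {with_c:.3f}, without C = {without_c:.3f}")
```

In the orthogonal design the estimate for A is the same whether or not C is in the model; in the correlated set it shifts when C is removed, which is why term removal demands more caution there.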

Crucially, with an HDE the orthogonality assumption will necessarily be violated since the low and high values are not mapped to two unique values. A statistical software package would be unaware of this, however, if the user only indicates the low and high values as -1 and +1, respectively. The effect estimates produced in this way would therefore likely be incorrect and may be seriously misleading.

(e) A regression model is built and verified

From the chemical reaction example, suppose the experimenter finally decides upon equation (2) as a useful model for explaining the behavior of the reaction. It is imperative, using normal theory procedures, to verify model assumptions by analysis of the residuals [2].

With an HDE, the residuals computed would not validly represent the true errors since the fitted values were not obtained using the original independent variable values. Rather, the model has only noted whether that variable was operating at the high or low value. The model conclusions based on residuals obtained using the HDE methodology could not, therefore, be satisfactorily assessed.

Summary

There are several theoretical flaws in applying experimental design procedures to historical data. When presented with data collected over time, and when control over the predictor variables has not been exercised, practitioners should instead consider applying appropriate regression analysis techniques, which may require the incorporation of time series investigations.
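One simple time series check on a regression fitted to data collected over time is to look for serial correlation in the residuals. The article does not name a specific diagnostic; the sketch below uses the Durbin-Watson statistic as one common choice, applied to two made-up residual sequences listed in time order.

```python
import numpy as np


def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order.

    Values near 2 suggest little lag-1 autocorrelation; values toward 0
    suggest positive autocorrelation and values toward 4 negative.
    """
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Made-up residual sequences, for illustration only.
drifting = [1.0, 1.0, -1.0, -1.0]       # neighbours tend to share a sign
alternating = [1.0, -1.0, 1.0, -1.0]    # neighbours tend to flip sign

print("drifting residuals:   ", durbin_watson(drifting))
print("alternating residuals:", durbin_watson(alternating))
```

A statistic far from 2 would be a prompt to consider time-dependent structure before relying on the usual regression inferences.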

In choosing regression analysis over historical designed experiments, analysts may use all data available in the attempt to discover how processes truly behave. Obviously, one remains unable to argue cause-and-effect when employing regression analysis with historical data, though such an investigation may help identify important factors and regions of inspection for subsequent studies.


References

1. For a detailed discussion of normal theory tests as approximations to randomization tests, see George E. P. Box, William G. Hunter, and J. Stuart Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building (John Wiley & Sons, Inc., 1978). For an example of a randomization test and its relation to the two-sample t-test, see Keith M. Bower, "The Two-Sample t-Test and Randomization Test," ASQ Six Sigma Forum, June 2003.

2. For information on assessing model assumptions, see Douglas C. Montgomery, Design and Analysis of Experiments, 5th ed. (John Wiley & Sons, Inc., 2000): 79-93.



© Keith M. Bower. All rights reserved.