Six Sigma practitioners, especially those in the service quality/transactional arena, may encounter large volumes of data available for analysis. Large subgroup sizes present a relatively new challenge in the history of control chart usage. This article discusses the implications of large subgroup sizes for the control charts frequently used in Six Sigma projects and provides some general advice for practitioners.
An integral part of the DMAIC (Define, Measure, Analyze, Improve, Control) methodology is process monitoring via control charts. These charts (Xbar-R, I-MR, etc.) date from the pioneering work of Walter Shewhart. With continuous data, a small subgroup of four or five observations would typically be sampled from the process. A statistic such as the mean or range would then be computed from the sample and plotted on a control chart.
The X-bar chart serves as a specific illustration of the general form of a Shewhart chart. Consider a mathematical model with the response Y, composed of some constant mean, μ1, plus an error component, ε:
(1) Y = μ1 + ε
Next, consider the process to have shifted by an amount, δ, to a new process mean, μ2, where:
(2) μ2 = μ1 + δ
The new model takes the form:
(3) Y = (μ1 + δ) + ε, or equivalently
(4) Y = μ2 + ε
Larger subgroup sizes can be shown to result in a higher probability of correctly detecting a change in the mean (a shift from μ1 to μ2). This is merely a feature of the increased sensitivity obtained when looking at the mean of a group of observations as opposed to looking at, say, one data point.
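The effect of subgroup size on detection can be sketched numerically. The snippet below is a minimal illustration, not part of the original analysis: it assumes conventional three-sigma limits of the form μ1 ± 3σ/√n, assumes the subgroup mean is approximately normally distributed, and uses a hypothetical helper name, detection_probability. It returns the chance that a single subgroup mean signals after a shift of δ, expressed in process standard deviations.

```python
from math import sqrt
from statistics import NormalDist


def detection_probability(delta_in_sigmas: float, n: int, k: float = 3.0) -> float:
    """Chance that one subgroup mean falls outside mu1 +/- k*sigma/sqrt(n)
    after the process mean shifts by delta_in_sigmas standard deviations.

    Assumption: the subgroup mean is (approximately) normally distributed.
    """
    z = NormalDist()  # standard normal distribution
    shift = delta_in_sigmas * sqrt(n)  # shift measured in standard errors of the mean
    above_ucl = 1.0 - z.cdf(k - shift)  # mean lands above the upper limit
    below_lcl = z.cdf(-k - shift)       # mean lands below the lower limit
    return above_ucl + below_lcl


# Same shift, increasing subgroup size: the per-subgroup signal probability rises.
for n in (1, 4, 5, 25, 100):
    p = detection_probability(1.5, n)
    print(f"n={n:>3}  P(signal | delta = 1.5 sigma) = {p:.3f}")
```

These are per-subgroup probabilities; the chance of a signal within several consecutive subgroups is correspondingly higher.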
For large shifts in a process (a δ value of the order of 1.5 standard deviations or more), the mean from a sample of four or five observations will have a very good chance of falling outside the control chart limits (i.e., μ1 ± 3σ/√n, the conventional three-sigma limits for a subgroup of size n from a process with standard deviation σ). Incidentally, this does not depend upon an assumption of having sampled from a Normal distribution.
A chart with a large subgroup size will be not only very sensitive to large shifts in the process mean, but also sensitive to much smaller shifts. That is, δ may be very small indeed, yet it will be detected with a high probability.
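To put rough numbers on how small δ can become, the sketch below inverts the usual power calculation. It is an approximation under stated assumptions: three-sigma limits, normally distributed subgroup means, an upward shift, and a negligible chance of signalling at the opposite limit. The function name smallest_detectable_shift and the 90% detection target are illustrative choices, not values taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def smallest_detectable_shift(n: int, power: float = 0.90, k: float = 3.0) -> float:
    """Approximate upward shift (in process standard deviations) that a single
    subgroup mean flags with the requested probability.

    Uses delta = (k + z_power) / sqrt(n), which ignores the tiny probability
    of a signal at the lower limit and assumes normal subgroup means.
    """
    z_power = NormalDist().inv_cdf(power)  # standard normal quantile for the target
    return (k + z_power) / sqrt(n)


# The detectable shift shrinks in proportion to 1/sqrt(n).
for n in (5, 100, 1_000, 10_000):
    print(f"n={n:>6}  shift detected ~90% of the time: {smallest_detectable_shift(n):.4f} sigma")
```

Under these assumptions, a subgroup of 10,000 observations would flag a shift of only a few hundredths of a standard deviation, which may or may not be of any practical interest.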
As noted by Wadsworth, Stephens, and Godfrey:
[M]any applications may not require great sensitivity in the detection of small shifts as they may occur from day to day. Hence, using larger subgroups "just to be sure" may actually be counterproductive when too much time is spent checking the process for unimportant changes signaled by an oversensitive chart. 1
A pertinent question is, what is the practical impact on our interpretation of control charts when large sample sizes are employed?
With the advent of new technology, it is frequently possible to obtain very large numbers of observations from a process, as in the case of call centers receiving inbound telephone calls from customers. The manner in which these data are obtained for analysis may be as straightforward as simply querying a database where massive data sets are automatically updated. As such, one hour may be an entirely reasonable period in which data would be collected, and the resulting subgroup size may be extremely large.
It does not make sense to remove observations solely to fit the scheme of creating a chart that would be sensitive to large shifts in a process characteristic. However, one finds in practice that arbitrary data destruction schemes are employed in many Six Sigma projects.
A key consideration at this point is that by monitoring, say, the mean of a large sample, we are by the very construction of the chart looking solely at a measure of central tendency. If our concern is to isolate unusual observations such as long cycle times, it would be more appropriate to look at the individual values themselves rather than bury them in an estimate of the process mean. These data points are typically the raisons d'être for process monitoring and thereby warrant investigation, not camouflaging.
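One way to look at individual values is an individuals (I) chart, the "I" in the I-MR chart mentioned earlier. The following is a minimal sketch rather than a recommended procedure: it uses the common moving-range estimate of the limits (mean ± 2.66 × average moving range, where 2.66 = 3/1.128), the cycle times are made-up numbers for illustration, and the function names are hypothetical.

```python
def individuals_limits(values):
    """Return (LCL, center line, UCL) for an individuals chart, estimating
    spread from the average moving range of consecutive observations."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.66 * mr_bar  # 3 / d2, with d2 = 1.128 for ranges of two points
    return center - half_width, center, center + half_width


def flag_unusual(values):
    """List (index, value) pairs that fall outside the individuals limits."""
    lcl, _, ucl = individuals_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical cycle times in minutes; one call takes far longer than the rest.
cycle_times = [5, 6, 5, 7, 6, 5, 6, 30, 6, 5, 6, 7]

print("limits:", individuals_limits(cycle_times))
print("flagged:", flag_unusual(cycle_times))
```

With these invented data the single long call is flagged individually, whereas averaging the twelve values yields a mean of roughly eight minutes that gives no hint of it.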
Large subgroup sizes may result in very sensitive Shewhart control charts. In fact, a chart with a large subgroup size may be sensitive to small and even practically irrelevant changes in a process characteristic. 2
Six Sigma practitioners are advised against arbitrarily removing data points to desensitize a control chart. They should, however, keep in mind that a chart built on large subgroups will be sensitive to much smaller shifts. Moreover, a clear understanding of precisely why a particular control chart and subgrouping scheme are being used should, of course, be of paramount concern.
1 Harrison M. Wadsworth, Kenneth S. Stephens, and A. Blanton Godfrey, Modern Methods for Quality Control and Improvement (New York: John Wiley & Sons, 1986), 223.
2 For a more detailed perspective on the effects of large sample sizes in the use of P-charts (frequently encountered in the service quality/transactional arena), see Peter A. Heimann, "Attributes Control Charts with Large Sample Sizes," Journal of Quality Technology 28, no. 4 (1996): 451-459.
Montgomery, Douglas C. Introduction to Statistical Quality Control. 5th ed. New Jersey: John Wiley & Sons, Inc., 2004.
Shewhart, Walter A. Economic Control of Quality of Manufactured Product. New York: Van Nostrand-Reinhold, 1931.
© Keith M. Bower. All rights reserved.