Applied Psychological Measurement, Vol. 11, No. 4, 329-354 (1987)
DOI: 10.1177/014662168701100401
Methodology Review: Clustering Methods
Glenn W. Milligan
Ohio State University
Martha C. Cooper
Ohio State University
A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research. After an overview of the clustering literature, the clustering process is discussed within a seven-step framework. The four major types of clustering methods can be characterized as hierarchical, partitioning, overlapping, and ordination algorithms. The validation of such algorithms refers to the problem of determining the ability of the methods to recover cluster configurations that are known to exist in the data. Validation approaches include mathematical derivations, analyses of empirical datasets, and Monte Carlo simulation methods. Next, interpretation and inference procedures in cluster analysis are discussed. Inference procedures involve testing for significant cluster structure and the problem of determining the number of clusters in the data. The paper concludes with two sets of recommendations. One set deals with topics in clustering that would benefit from continued research into the methodology. The other set offers recommendations for applied analyses within the framework of the clustering process.
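The Monte Carlo validation idea mentioned in the abstract (checking whether a method recovers cluster structure that is known to exist) can be illustrated with a minimal sketch. Everything in it is an assumption chosen for brevity rather than the authors' procedure: the data are two well-separated 2-D Gaussian clusters, the method is a naive single-linkage hierarchical algorithm, and recovery is scored with the Rand index, which is only one of several possible agreement measures. All function names are hypothetical.

```python
import random
from itertools import combinations


def make_clusters(centers, n_per_cluster, spread, rng):
    """Draw n_per_cluster 2-D Gaussian points around each known center."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
            labels.append(label)
    return points, labels


def single_linkage(points, k):
    """Naive agglomerative single-linkage clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Single linkage: squared distance between the closest pair of members.
        return min(
            (points[i][0] - points[j][0]) ** 2 + (points[i][1] - points[j][1]) ** 2
            for i in a
            for j in b
        )

    while len(clusters) > k:
        # Merge the two closest clusters (a < b, so deleting b keeps a valid).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def recovery_experiment(n_replications=20, seed=0):
    """Mean Rand index over repeated artificial datasets with known structure."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_replications):
        # Assumed design: two clusters of 10 points, far apart relative to spread.
        points, truth = make_clusters([(0.0, 0.0), (10.0, 10.0)], 10, 0.5, rng)
        found = single_linkage(points, k=2)
        scores.append(rand_index(truth, found))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(recovery_experiment())
```

In a fuller study of this kind, the separation, cluster sizes, and error conditions would be varied across replications, and several algorithms would be compared on the same generated datasets.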
