      SUBROUTINE MIX(MM, M, N, A, CLAB, RLAB, TITLE, K, MXITER, NCOV,
     *               DMWORK, WORK1, DMWRK1, DMWRK2, WORK2, DMWRK3,
     *               WORK3, IWORK, IERR, OUNIT)
C
C<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
C
C   PURPOSE
C   -------
C
C      FITS THE MIXTURE MODEL BY A MAXIMUM LOG-LIKELIHOOD CRITERION
C
C   DESCRIPTION
C   -----------
C
C   1. THE DATA ARE ASSUMED TO BE A RANDOM SAMPLE OF SIZE M FROM A
C      MIXTURE OF K MULTIVARIATE NORMAL DISTRIBUTIONS IN N DIMENSIONS.
C      THE PROBABILITY THAT THE J-TH OBSERVATION WAS DRAWN FROM THE
C      I-TH NORMAL, FOR J=1,...,M AND I=1,...,K, IS USED TO ESTIMATE
C      WHICH NORMAL EACH OBSERVATION WAS SAMPLED FROM, AND HENCE TO
C      GROUP THE OBSERVATIONS INTO K CLUSTERS.  THE CRITERION TO BE
C      MAXIMIZED IS THE LOG LIKELIHOOD
C
C         SUM LOG(G(I)) OVER I=1,...,M
C
C      WHERE G(I) IS THE PROBABILITY DENSITY OF THE I-TH OBSERVATION.
C
C      SEE PAGE 116 OF THE REFERENCE FOR A FURTHER DESCRIPTION OF G.
C
C   2. THE MANY PARAMETERS PRESENT IN THE BETWEEN-NORMAL COVARIANCE
C      MATRICES REQUIRE MUCH DATA FOR THEIR ESTIMATION.  A RULE OF
C      THUMB IS THAT M SHOULD BE GREATER THAN (N+1)(N+2)K/2.  EVEN
C      WITH MANY OBSERVATIONS, THE PROCEDURE IS VULNERABLE TO
C      NONNORMALITY OR LINEAR DEPENDENCE AMONG THE VARIABLES.  TO
C      REDUCE THIS SENSITIVITY ONE CAN MAKE ASSUMPTIONS ON THESE
C      COVARIANCE MATRICES BY SETTING THE NCOV PARAMETER TO:
C
C         1  IF THE COVARIANCE MATRICES ARE ARBITRARY
C         2  IF THE COVARIANCE MATRICES IN DIFFERENT NORMALS ARE EQUAL
C         3  IF THE COVARIANCE MATRICES ARE EQUAL AND DIAGONAL
C         4  IF ALL VARIABLES HAVE THE SAME VARIANCE AND ARE PAIRWISE
C            INDEPENDENT
C
C   3. AFTER EVERY 5 ITERATIONS, THE CLUSTER PROBABILITIES, MEANS, AND
C      DETERMINANTS OF COVARIANCE MATRICES ARE PRINTED OUT.  ALSO
C      PRINTED ARE THE WITHIN-CLUSTER VARIANCES AND CORRELATIONS FOR
C      EVERY PAIR OF VARIABLES IN EACH CLUSTER AND, FINALLY, EVERY
C      OBSERVATION WITH ITS PROBABILITY OF BELONGING TO EACH CLUSTER.
C      THE LOG LIKELIHOOD IS PRINTED AFTER EACH ITERATION.
C      THE ITERATIONS STOP EITHER AFTER THE MAXIMUM NUMBER OF
C      ITERATIONS HAS BEEN REACHED OR AFTER THE INCREASE IN THE LOG
C      LIKELIHOOD FROM ONE ITERATION TO THE NEXT IS LESS THAN .0001.
C      ALL OUTPUT IS SENT TO FORTRAN UNIT OUNIT.
C
C   INPUT PARAMETERS
C   ----------------
C
C   MM     INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE FIRST DIMENSION OF THE MATRIX A.  MUST BE AT LEAST M.
C
C   M      INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE NUMBER OF CASES.
C
C   N      INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE NUMBER OF VARIABLES.
C
C   A      REAL MATRIX WHOSE FIRST DIMENSION MUST BE MM AND WHOSE
C          SECOND DIMENSION MUST BE AT LEAST N (UNCHANGED ON OUTPUT).
C          THE MATRIX OF DATA VALUES.
C
C          A(I,J) IS THE VALUE FOR THE J-TH VARIABLE FOR THE I-TH
C          CASE.
C
C   CLAB   VECTOR OF 4-CHARACTER VARIABLES DIMENSIONED AT LEAST N
C          (UNCHANGED ON OUTPUT).
C          THE LABELS OF THE VARIABLES.
C
C   RLAB   VECTOR OF 4-CHARACTER VARIABLES DIMENSIONED AT LEAST M
C          (UNCHANGED ON OUTPUT).
C          THE LABELS OF THE CASES.
C
C   TITLE  10-CHARACTER VARIABLE (UNCHANGED ON OUTPUT).
C          TITLE OF THE DATA SET.
C
C   K      INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE DESIRED NUMBER OF CLUSTERS.
C
C   MXITER INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE MAXIMUM NUMBER OF ITERATIONS ALLOWED.
C
C   NCOV   INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          DETERMINES THE STRUCTURE OF THE WITHIN-CLUSTER COVARIANCE
C          MATRICES.
C
C             NCOV = 1   GENERAL COVARIANCES
C             NCOV = 2   COVARIANCES EQUAL BETWEEN CLUSTERS
C             NCOV = 3   COVARIANCES EQUAL AND DIAGONAL
C             NCOV = 4   COVARIANCES SPHERICAL
C
C   DMWORK INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE LEADING DIMENSION OF THE MATRIX WORK1.  MUST BE AT LEAST
C          2*M+N+1.
C
C   WORK1  REAL MATRIX WHOSE FIRST DIMENSION MUST BE DMWORK AND WHOSE
C          SECOND DIMENSION MUST BE AT LEAST K.
C          WORK MATRIX.
C
C   DMWRK1 INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE FIRST DIMENSION OF THE MATRIX WORK2.  MUST BE AT LEAST
C          N.
C
C   DMWRK2 INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE SECOND DIMENSION OF THE MATRIX WORK2.  MUST BE AT LEAST
C          N+1.
C
C   WORK2  REAL ARRAY WHOSE FIRST DIMENSION MUST BE DMWRK1, WHOSE
C          SECOND DIMENSION MUST BE DMWRK2, AND WHOSE THIRD DIMENSION
C          MUST BE AT LEAST K+1.
C          WORK ARRAY.
C
C   DMWRK3 INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          THE LEADING DIMENSION OF THE MATRIX WORK3.  MUST BE AT LEAST
C          N.
C
C   WORK3  REAL MATRIX WHOSE FIRST DIMENSION MUST BE DMWRK3 AND WHOSE
C          SECOND DIMENSION MUST BE AT LEAST N+1.
C          WORK MATRIX.
C
C   IWORK  INTEGER VECTOR DIMENSIONED AT LEAST N.
C          WORK VECTOR.
C
C   OUNIT  INTEGER SCALAR (UNCHANGED ON OUTPUT).
C          UNIT NUMBER FOR OUTPUT.
C
C   OUTPUT PARAMETER
C   ----------------
C
C   IERR   INTEGER SCALAR.
C          ERROR FLAG.
C
C          IF IERR = 0, NO ERROR WAS DETECTED.
C
C          IF IERR = J (J .GT. 0), THE J-TH PIVOT BLOCK OF ONE OF THE
C                      COVARIANCE MATRICES WAS SINGULAR.  THEREFORE,
C                      AN INVERSE COULD NOT BE CALCULATED AND
C                      EXECUTION WAS TERMINATED.  THE ERROR FLAG WAS
C                      SET IN CMLIB SUBROUTINE SSIFA.  HERE J IS A
C                      PIVOT INDEX, NOT THE NUMBER OF CLUSTERS K.
C
C   REFERENCE
C   ---------
C
C      HARTIGAN, J. A. (1975).  CLUSTERING ALGORITHMS, JOHN WILEY &
C         SONS, INC., NEW YORK.  PAGES 113-129.
C
C<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
C
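C   USAGE SKETCH (ILLUSTRATIVE ONLY, NOT PART OF THE ORIGINAL ROUTINE)
C   ------------------------------------------------------------------
C
C      THE DRIVER BELOW IS A MINIMAL SKETCH OF ONE WAY TO SIZE THE
C      WORK ARRAYS FROM THE MINIMUM DIMENSIONS LISTED ABOVE.  THE
C      PROGRAM NAME MIXDRV, THE PROBLEM SIZES, THE TITLE, THE VALUES
C      OF MXITER AND NCOV, AND THE USE OF UNIT 6 FOR OUTPUT ARE ALL
C      ASSUMPTIONS MADE FOR ILLUSTRATION.  THE SIZES SATISFY THE RULE
C      OF THUMB M .GT. (N+1)(N+2)K/2 (50 .GT. 12).  TO TRY IT, REMOVE
C      THE FIRST THREE CHARACTERS OF EACH LINE.
C
C        PROGRAM MIXDRV
C        INTEGER MM, M, N, K, MXITER, NCOV, IERR, OUNIT
C        INTEGER DMWORK, DMWRK1, DMWRK2, DMWRK3
C  C     ASSUMED PROBLEM SIZES, CHOSEN ONLY FOR ILLUSTRATION
C        PARAMETER (MM = 50, M = 50, N = 2, K = 2)
C  C     MINIMUM WORK DIMENSIONS TAKEN FROM THE PARAMETER LIST ABOVE
C        PARAMETER (DMWORK = 2*M+N+1, DMWRK1 = N, DMWRK2 = N+1)
C        PARAMETER (DMWRK3 = N)
C        REAL A(MM,N), WORK1(DMWORK,K), WORK2(DMWRK1,DMWRK2,K+1)
C        REAL WORK3(DMWRK3,N+1)
C        INTEGER IWORK(N)
C        CHARACTER*4 CLAB(N), RLAB(M)
C        CHARACTER*10 TITLE
C  C     A, CLAB, AND RLAB MUST BE READ OR ASSIGNED HERE BEFORE THE
C  C     CALL.  THIS SKETCH DOES NOT SUPPLY ANY DATA.
C        TITLE = 'SAMPLE'
C        MXITER = 20
C        NCOV = 1
C        OUNIT = 6
C        CALL MIX(MM, M, N, A, CLAB, RLAB, TITLE, K, MXITER, NCOV,
C       *         DMWORK, WORK1, DMWRK1, DMWRK2, WORK2, DMWRK3,
C       *         WORK3, IWORK, IERR, OUNIT)
C  C     A NONZERO IERR MEANS A COVARIANCE MATRIX WAS SINGULAR
C        IF (IERR .NE. 0) WRITE (OUNIT,*) 'MIX FAILED, IERR =', IERR
C        END
C
C      IF IERR IS NONZERO, A MORE RESTRICTIVE NCOV (2, 3, OR 4) MAY
C      HELP, SINCE IT REDUCES THE NUMBER OF COVARIANCE PARAMETERS TO
C      BE ESTIMATED.
C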