ESAIM: Probability and Statistics

Research Article

A non asymptotic penalized criterion for Gaussian mixture model selection

Cathy Maugis (a1) and Bertrand Michel (a2)

a1 Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, 135 avenue de Rangueil, 31077 Toulouse Cedex 4, France.

a2 Laboratoire de Statistique Théorique et Appliquée, Université Paris 6, 175 rue du Chevaleret, 75013 Paris, France.


Specific Gaussian mixtures are considered to solve the variable selection and clustering problems simultaneously. A non-asymptotic penalized criterion is proposed to choose both the number of mixture components and the relevant variable subset. Because of the non-linearity of the associated Kullback-Leibler contrast on Gaussian mixtures, the form of the penalty function is obtained from a general model selection theorem for maximum likelihood estimation proposed by Massart [Concentration inequalities and model selection. Springer, Berlin (2007). Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23 (2003)]. Applying this theorem requires controlling the bracketing entropy of the Gaussian mixture families. Both the ordered and non-ordered variable selection cases are addressed in this paper.

(Received September 19 2008)

(Revised February 13 2009)

(Online publication January 5 2012)

Key Words:

  • Model-based clustering;
  • variable selection;
  • penalized likelihood criterion;
  • bracketing entropy

Mathematics Subject Classification:

  • 62H30;
  • 62G07

References

  • [1] H. Akaike, Information theory and an extension of the maximum likelihood principle, in Second International Symposium on Information Theory (Tsahkadsor, 1971), Akadémiai Kiadó, Budapest (1973) 267–281.
  • [2] S. Arlot and P. Massart, Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. (2008) (to appear).
  • [3] J.D. Banfield and A.E. Raftery, Model-based Gaussian and non-Gaussian clustering. Biometrics 49 (1993) 803–821.
  • [4] A. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113 (1999) 301–413.
  • [5] J.-P. Baudry, Clustering through model selection criteria. Poster session at One Day Statistical Workshop in Lisieux, June (2007).
  • [6] C. Biernacki, G. Celeux and G. Govaert, Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 719–725.
  • [7] C. Biernacki, G. Celeux, G. Govaert and F. Langrognet, Model-based cluster and discriminant analysis with the mixmod software. Comput. Stat. Data Anal. 51 (2006) 587–600.
  • [8] L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203–268.
  • [9] L. Birgé and P. Massart, A generalized Cp criterion for Gaussian model selection. Prépublication n° 647, Universités de Paris 6 et Paris 7 (2001).
  • [10] L. Birgé and P. Massart, Minimal penalties for Gaussian model selection. Probab. Theory Relat. Fields 138 (2007) 33–73.
  • [11] L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam. Springer, New York (1997) 55–87.
  • [12] C. Bouveyron, S. Girard and C. Schmid, High-Dimensional Data Clustering. Comput. Stat. Data Anal. 52 (2007) 502–519.
  • [13] K.P. Burnham and D.R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer-Verlag, New York, 2nd edition (2002).
  • [14] G. Castellan, Modified Akaike's criterion for histogram density estimation. Technical report, Université Paris-Sud 11 (1999).
  • [15] G. Castellan, Density estimation via exponential model selection. IEEE Trans. Inf. Theory 49 (2003) 2052–2060.
  • [16] G. Celeux and G. Govaert, Gaussian parsimonious clustering models. Pattern Recogn. 28 (1995) 781–793.
  • [17] A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 39 (1977) 1–38.
  • [18] C.R. Genovese and L. Wasserman, Rates of convergence for the Gaussian mixture sieve. Ann. Stat. 28 (2000) 1105–1127.
  • [19] S. Ghosal and A.W. van der Vaart, Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Stat. 29 (2001) 1233–1263.
  • [20] C. Keribin, Consistent estimation of the order of mixture models. Sankhyā Ser. A 62 (2000) 49–66.
  • [21] M.H. Law, M.A.T. Figueiredo and A.K. Jain, Simultaneous feature selection and clustering using mixture models. IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 1154–1166.
  • [22] E. Lebarbier, Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Proc. 85 (2005) 717–736.
  • [23] V. Lepez, Potentiel de réserves d'un bassin pétrolier: modélisation et estimation. Ph.D. thesis, Université Paris-Sud 11 (2002).
  • [24] P. Massart, Concentration inequalities and model selection. Springer, Berlin (2007). Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23 (2003).
  • [25] C. Maugis, Sélection de variables pour la classification non supervisée par mélanges gaussiens. Applications à l'étude de données transcriptomes. Ph.D. thesis, Université Paris-Sud 11 (2008).
  • [26] C. Maugis, G. Celeux and M.-L. Martin-Magniette, Variable Selection for Clustering with Gaussian Mixture Models. Biometrics (2008) (to appear).
  • [27] C. Maugis and B. Michel, Slope heuristics for variable selection and clustering via Gaussian mixtures. Technical Report 6550, INRIA (2008).
  • [28] A.E. Raftery and N. Dean, Variable Selection for Model-Based Clustering. J. Am. Stat. Assoc. 101 (2006) 168–178.
  • [29] G. Schwarz, Estimating the dimension of a model. Ann. Stat. 6 (1978) 461–464.
  • [30] D. Serre, Matrices. Springer-Verlag, New York (2002).
  • [31] M. Talagrand, Concentration of measure and isoperimetric inequalities in product spaces. Publ. Math., Inst. Hautes Étud. Sci. 81 (1995) 73–205.
  • [32] M. Talagrand, New concentration inequalities in product spaces. Invent. Math. 126 (1996) 505–563.
  • [33] F. Villers, Tests et sélection de modèles pour l'analyse de données protéomiques et transcriptomiques. Ph.D. thesis, Université Paris-Sud 11 (2007).