ESAIM: P&S, June 2007, Vol. 11, pp. 272-280
DOI: 10.1051/ps:2007019
A graph-based estimator of the number of clusters
Gérard Biau, Benoît Cadre and Bruno PelletierInstitut de Mathématiques et de Modélisation de Montpellier, UMR CNRS 5149, Équipe de Probabilités et Statistique, Université Montpellier II, CC 051, Place Eugène Bataillon, 34095 Montpellier Cedex 5, France; biau@math.univ-montp2.fr; cadre@math.univ-montp2.fr; pelletier@math.univ-montp2.fr
(Received June 20, 2006. Revised October 23, 2006. Published online 19 June 2007.)
Abstract
Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given n independent observations
drawn from an unknown multivariate probability density f, we propose a new approach to estimate the number of connected components, or clusters, of the t-level set
. The basic idea is to form a rough skeleton of the set
using any preliminary estimator of f, and to count the number of connected components of the resulting graph. Under mild analytic conditions on f, and using tools from differential geometry, we establish the consistency of our method.
Mathematics Subject Classification. 62G05, 62G20
Key words: Cluster analysis, connected component, level set, graph, tubular neighborhood.
© EDP Sciences, SMAI 2007



Document