How the result of graph clustering methods depends on the construction of the graph

Markus Maier; Ulrike von Luxburg; Matthias Hein

doi:10.1051/ps/2012001

All issues

Volume 17 (2013)

ESAIM: PS, 17 (2013) 370-418

Abstract

Free Access

Issue		ESAIM: PS Volume 17, 2013


Page(s)		370 - 418
DOI		https://doi.org/10.1051/ps/2012001
Published online		21 May 2013

ESAIM: PS 17 (2013) 370-418

How the result of graph clustering methods depends on the construction of the graph

Markus Maier¹, Ulrike von Luxburg¹^,2 and Matthias Hein³

¹ Max Planck Institute for Intelligent Systems, Spemannstr. 38, 72076 Tübingen, Germany
mmaier@tuebingen.mpg.de
² Department of Computer Science, University of Hamburg, Vogt-Kölln-Str. 30, 22527 Hamburg, Germany
luxburg@informatik.uni-hamburg.de
³ Faculty of Mathematics and Computer Science, Saarland University, Postfach 151150, 66041, Saarbrücken, Germany
hein@cs.uni-sb.de

Received: 23 November 2010
Revised: 21 December 2011

Abstract

We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.

Mathematics Subject Classification: 62G20 / 05C80 / 68Q87

Key words: Random geometric graph / clustering / graph cuts

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.