How many bins should be put in a regular histogram

Lucien Birgé; Yves Rozenholc

doi:10.1051/ps:2006001


Issue		ESAIM: PS Volume 10, September 2006


Page(s)		24 - 45
DOI		https://doi.org/10.1051/ps:2006001
Published online		31 January 2006

ESAIM: P&S, February 2006, Vol. 10, p. 24-45


Lucien Birgé¹ and Yves Rozenholc²

¹ UMR 7599 “Probabilités et modèles aléatoires”, Laboratoire de Probabilités, boîte 188, Université Paris VI, 4 Place Jussieu, 75252 Paris Cedex 05, France
² MAP5-UMR CNRS 8145, Université Paris 5, 45 rue des Saints-Pères, 75270 Paris Cedex 06, France

Received: 7 July 2003
Revised: 1 September 2004
Revised: 11 May 2005

Abstract

Given an n-sample from some unknown density f on [0,1], it is easy to construct a histogram of the data based on some given partition of [0,1], but little is known about an optimal choice of the partition, especially when the data set is not large, even if one restricts to partitions into intervals of equal length. Existing methods are either rules of thumb or based on asymptotic considerations, and often involve some smoothness properties of f. Our purpose in this paper is to give an automatic, easy-to-program and efficient method to choose the number of bins of the partition from the data. It is based on bounds on the risk of penalized maximum likelihood estimators due to Castellan, and on extensive simulations which allowed us to optimize the form of the penalty function. These simulations show that the method works quite well for sample sizes as small as 25.
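As an illustration of what a penalized maximum likelihood choice of the number of bins can look like, here is a minimal Python sketch. The abstract does not state the penalty, so the form used below, pen(D) = D - 1 + (log D)^2.5, and the search range 1 ≤ D ≤ n / log n are assumptions made for this example; the function names `penalized_log_likelihood` and `choose_num_bins` are hypothetical.

```python
import math


def penalized_log_likelihood(counts, n):
    """Log-likelihood of the regular histogram on [0, 1] with len(counts)
    bins, minus the ASSUMED penalty D - 1 + (log D)^2.5."""
    d = len(counts)
    # Empty bins contribute nothing (convention 0 * log 0 = 0).
    loglik = sum(c * math.log(d * c / n) for c in counts if c > 0)
    penalty = d - 1 + math.log(d) ** 2.5  # assumption, not from the abstract
    return loglik - penalty


def choose_num_bins(sample):
    """Return the D in 1..floor(n / log n) that maximizes the criterion.

    `sample` is a sequence of floats in [0, 1]; the upper limit of the
    search range is also an assumption of this sketch.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    if any(x < 0.0 or x > 1.0 for x in sample):
        raise ValueError("observations must lie in [0, 1]")
    d_max = max(1, int(n / math.log(n)))
    best_d, best_score = 1, -math.inf
    for d in range(1, d_max + 1):
        counts = [0] * d
        for x in sample:
            # x == 1.0 is placed in the last bin.
            counts[min(int(x * d), d - 1)] += 1
        score = penalized_log_likelihood(counts, n)
        if score > best_score:
            best_d, best_score = d, score
    return best_d


if __name__ == "__main__":
    # 200 evenly spread points: no structure to detect.
    flat = [(i + 0.5) / 200 for i in range(200)]
    # 200 points confined to the left half of [0, 1].
    left_half = [(i + 0.5) / 400 for i in range(200)]
    print(choose_num_bins(flat), choose_num_bins(left_half))
```

Data on another interval would first have to be rescaled to [0, 1], for instance by mapping the sample minimum and maximum to 0 and 1.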

Mathematics Subject Classification: 62E25 / 62G05

Key words: Regular histograms / density estimation / penalized maximum likelihood / model selection.
