Volume 22, 2018
Page(s): 96 - 128
Published online: 14 December 2018
Impact of subsampling and tree depth on random forests
Sorbonne Universités, UPMC Univ Paris 06,
2 Centre de Mathématiques Appliquées, École polytechnique, UMR 7641, 91128 Palaiseau, France.
* Corresponding author: firstname.lastname@example.org
Accepted: 10 February 2018
Random forests are ensemble learning methods introduced by Breiman [Mach. Learn. 45 (2001) 5–32] that operate by averaging several decision trees built on a randomly selected subspace of the data set. Despite their widespread use in practice, the respective roles of the different mechanisms at work in Breiman’s forests are not yet fully understood, nor is the tuning of the corresponding parameters. In this paper, we study the influence of two parameters, namely the subsampling rate and the tree depth, on the performance of Breiman’s forests. More precisely, we prove that quantile forests (a specific type of random forests) based on subsampling and quantile forests whose tree construction is terminated early have similar performance, provided that their respective parameters (subsampling rate and tree depth) are well chosen. Moreover, experiments show that properly tuning these parameters improves, in most cases, on Breiman’s original forests in terms of mean squared error.
Mathematics Subject Classification: 62G05 / 62G20
Key words: Random forests / randomization / parameter tuning / subsampling / tree depth
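The comparison described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' code or experimental setup: the tree below is a simplified median tree (one particular quantile split, on a randomly chosen coordinate), and the synthetic data, the function names, the subsample size of 64 and the depth of 6 are all illustrative assumptions. The two settings are chosen here only so that both kinds of tree end up with a comparable number of leaves.

```python
import math
import random
import statistics


def build_tree(points, max_depth, rng, depth=0):
    """Grow one tree; each split is at the empirical median of a random coordinate."""
    mean_y = statistics.fmean([y for _, y in points])
    if depth >= max_depth or len(points) < 2:
        return mean_y  # leaf: average response of the points in the cell
    dim = rng.randrange(len(points[0][0]))
    cut = statistics.median([x[dim] for x, _ in points])
    left = [p for p in points if p[0][dim] < cut]
    right = [p for p in points if p[0][dim] >= cut]
    if not left or not right:  # ties can make a split degenerate
        return mean_y
    return (dim, cut,
            build_tree(left, max_depth, rng, depth + 1),
            build_tree(right, max_depth, rng, depth + 1))


def predict_tree(node, x):
    # Internal nodes are tuples, leaves are plain floats.
    while isinstance(node, tuple):
        dim, cut, left, right = node
        node = left if x[dim] < cut else right
    return node


def build_forest(points, n_trees, subsample_size, max_depth, seed=0):
    """Each tree sees `subsample_size` points drawn without replacement."""
    rng = random.Random(seed)
    return [build_tree(rng.sample(points, subsample_size), max_depth, rng)
            for _ in range(n_trees)]


def predict_forest(forest, x):
    return statistics.fmean([predict_tree(tree, x) for tree in forest])


def mse(forest, points):
    return statistics.fmean([(predict_forest(forest, x) - y) ** 2 for x, y in points])


def make_data(n, rng):
    # Hypothetical regression problem: y = sin(2*pi*x0) + Gaussian noise.
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        data.append((x, math.sin(2 * math.pi * x[0]) + rng.gauss(0.0, 0.3)))
    return data


data_rng = random.Random(1)
train = make_data(512, data_rng)
test = make_data(200, data_rng)

# Setting 1: subsampling, trees grown until each leaf holds a single point.
sub_forest = build_forest(train, n_trees=30, subsample_size=64, max_depth=50)
# Setting 2: no subsampling, tree construction stopped early at depth 6.
depth_forest = build_forest(train, n_trees=30, subsample_size=len(train), max_depth=6)

print("subsampled forest test MSE:   ", mse(sub_forest, test))
print("depth-limited forest test MSE:", mse(depth_forest, test))
```

On a single synthetic data set the two printed errors only give a rough impression; they are not a check of the theoretical result, which concerns the behaviour of quantile forests when the parameters are well chosen.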
© EDP Sciences, SMAI 2018