Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation

Van Hanh Nguyen; Catherine Matias

doi:10.1051/ps/2013041

Issue		ESAIM: PS Volume 18, 2014


Page(s)		584 - 612
DOI		https://doi.org/10.1051/ps/2013041
Published online		15 October 2014

ESAIM: PS 18 (2014) 584-612

Van Hanh Nguyen¹,² and Catherine Matias²

¹ Laboratoire de Mathématiques d’Orsay, Université Paris Sud, UMR CNRS 8628, Bâtiment 425, 91405 Orsay cedex, France
² Laboratoire Statistique et Génome, Université d’Évry Val d’Essonne, UMR CNRS 8071, USC INRA, 23 bvd de France, 91037 Évry, France

Received: 24 October 2012
Revised: 29 March 2013

Abstract

In a multiple testing context, we consider a semiparametric mixture model with two components where one component is known and corresponds to the distribution of p-values under the null hypothesis and the other component f is nonparametric and stands for the distribution under the alternative hypothesis. Motivated by the issue of local false discovery rate estimation, we focus here on the estimation of the nonparametric unknown component f in the mixture, relying on a preliminary estimator of the unknown proportion θ of true null hypotheses. We propose and study the asymptotic properties of two different estimators for this unknown component. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. To our knowledge, this is the first result establishing convergence as well as the corresponding rate for the estimation of the unknown component in this nonparametric mixture. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate. Their respective performances are then compared on synthetic data.
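
For concreteness, the model described above can be written out as follows. This is a sketch in assumed notation: the symbols g (marginal density of the p-values) and ℓFDR (local false discovery rate) are labels chosen here, and taking the null component to be the uniform density on [0, 1] is the usual assumption for p-values rather than something stated in the abstract. Only θ and f are named in the text.

```latex
% Two-component mixture for p-values on [0,1] (notation assumed here):
% the known null component is taken to be the uniform density,
% and f is the unknown density under the alternative hypothesis.
\[
  g(x) = \theta \cdot 1 + (1-\theta)\, f(x), \qquad x \in [0,1].
\]
% Local false discovery rate: posterior probability that the null
% hypothesis is true given an observed p-value x.
\[
  \ell\mathrm{FDR}(x) = \frac{\theta}{\theta + (1-\theta)\, f(x)} = \frac{\theta}{g(x)}.
\]
```

Under this reading, any estimator of f combined with the preliminary estimator of θ yields a plug-in estimate of the local false discovery rate.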

Mathematics Subject Classification: 62G07 / 62G20

Key words: False discovery rate / kernel estimation / local false discovery rate / maximum smoothed likelihood / multiple testing / p-values / semiparametric mixture model
