Open Access
ESAIM: PS, Volume 29 (2025), pp. 243-280
DOI: https://doi.org/10.1051/ps/2025005
Published online: 13 June 2025
1. G. Casella and R.L. Berger, Statistical Inference. Cengage Learning (2021).
2. B.P. Carlin and T.A. Louis, Empirical Bayes: past, present and future. J. Am. Statist. Assoc. 95 (2000) 1286-1289.
3. C.M. Bishop, Pattern Recognition and Machine Learning, Vol. 4. Springer (2006).
4. D.M. Blei, A.Y. Ng and M.I. Jordan, Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (2003) 993-1022.
5. P. Smaragdis, B. Raj and M. Shashanka, A probabilistic latent variable model for acoustic modeling. Adv. Models Acoust. Process. Workshop, NIPS 148 (2006) 8-1.
6. P.D. Hoff, A.E. Raftery and M.S. Handcock, Latent space approaches to social network analysis. J. Am. Statist. Assoc. 97 (2002) 1090-1098.
7. A.P. Dempster, N.M. Laird and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B (Methodol.) 39 (1977) 1-38.
8. R.P. Sherman, Y.-Y.K. Ho and S.R. Dalal, Conditions for convergence of Monte Carlo EM sequences with an application to product diffusion modeling. Econom. J. 2 (1999) 248-267.
9. G.C.G. Wei and M.A. Tanner, A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Am. Statist. Assoc. 85 (1990) 699-704.
10. C. Liu and D.B. Rubin, The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81 (1994) 633-648.
11. X.-L. Meng and D.B. Rubin, Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80 (1993) 267-278.
12. K. Lange, A gradient algorithm locally equivalent to the EM algorithm. J. Roy. Statist. Soc. Ser. B (Methodol.) 57 (1995) 425-437.
13. G. Celeux, The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computat. Statist. Q. 2 (1985) 73-82.
14. J.G. Booth and J.P. Hobert, Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. Roy. Statist. Soc. Ser. B (Statist. Methodol.) 61 (1999) 265-285.
15. O. Cappé, A. Doucet, M. Lavielle and E. Moulines, Simulation-based methods for blind maximum-likelihood filter identification. Signal Process. 73 (1999) 3-25.
16. G. Celeux and J. Diebolt, A stochastic approximation type EM algorithm for the mixture problem. Stochastics 41 (1992) 119-134.
17. K.S. Chan and J. Ledolter, Monte Carlo EM estimation for time series models involving counts. J. Am. Statist. Assoc. 90 (1995) 242-252.
18. J. Diebolt and E.H.S. Ip, A stochastic EM algorithm for approximating the maximum likelihood estimate, in Markov Chain Monte Carlo in Practice, edited by W.R. Gilks, S.T. Richardson and D.J. Spiegelhalter (1996).
19. Y.F. Atchadé, G. Fort and E. Moulines, On perturbed proximal gradient algorithms. J. Mach. Learn. Res. 18 (2017) 310-342.
20. B.S. Caffo, W. Jank and G.L. Jones, Ascent-based Monte Carlo expectation-maximization. J. Roy. Statist. Soc. Ser. B (Statist. Methodol.) 67 (2005) 235-251.
21. B. Delyon, M. Lavielle and E. Moulines, Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist. 27 (1999) 94-128.
22. G. Fort, E. Moulines and P. Priouret, Convergence of adaptive and interacting Markov chain Monte Carlo algorithms. Ann. Statist. 39 (2011) 3262-3289.
23. G. Fort and E. Moulines, Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Statist. 31 (2003) 1220-1259.
24. A.S. Dalalyan, Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent, in Conference on Learning Theory. PMLR (2017) 678-689.
25. A.S. Dalalyan, Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. Roy. Statist. Soc. Ser. B (Statist. Methodol.) 79 (2017) 651-676.
26. A. Durmus and E. Moulines, Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27 (2017) 1551-1587.
27. G.O. Roberts and R.L. Tweedie, Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 (1996) 341-363.
28. N. Brosse, A. Durmus, E. Moulines and S. Sabanis, The tamed unadjusted Langevin algorithm. Stoch. Processes Appl. 129 (2019) 3638-3663.
29. S. Chewi, M.A. Erdogdu, M. Li, R. Shen and S. Zhang, Analysis of Langevin Monte Carlo from Poincaré to Log-Sobolev, in Conference on Learning Theory. PMLR (2022) 1-2.
30. A.S. Dalalyan and A. Karagulyan, User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stoch. Processes Appl. 129 (2019) 5278-5311.
31. A. Durmus, S. Majewski and B. Miasojedow, Analysis of Langevin Monte Carlo via convex optimization. J. Mach. Learn. Res. 20 (2019) 2666-2711.
32. A. Durmus and E. Moulines, High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli 25 (2019) 2854-2882.
33. S. Vempala and A. Wibisono, Rapid convergence of the unadjusted Langevin algorithm: isoperimetry suffices. Adv. Neural Inform. Process. Syst. 32 (2019).
34. V. De Bortoli, A. Durmus, M. Pereyra and A.F. Vidal, Efficient stochastic optimisation by unadjusted Langevin Monte Carlo. Statist. Comput. 31 (2021) 1-18.
35. J. Kuntz, J.N. Lim and A.M. Johansen, Particle algorithms for maximum likelihood training of latent variable models. AISTATS 206 (2023) 5134-5180.
36. R. Caprio, J. Kuntz, S. Power and A.M. Johansen, Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities. arXiv preprint arXiv:2403.02004 (2024).
37. C. Gaetan and J.-F. Yao, A multiple-imputation Metropolis version of the EM algorithm. Biometrika 90 (2003) 643-654.
38. E. Jacquier, M. Johannes and N. Polson, MCMC maximum likelihood for latent state models. J. Econom. 137 (2007) 615-640.
39. A.M. Johansen, A. Doucet and M. Davy, Particle methods for maximum likelihood estimation in latent variable models. Statist. Comput. 18 (2008) 47-57.
40. J.-C. Duan, A. Fulop and Y.-W. Hsieh, Maximum likelihood estimation of latent variable models by SMC with marginalization and data cloning. USC-INET Research Paper (17-27) (2017).
41. O.D. Akyildiz, D. Crisan and J. Miguez, Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization. Statist. Comput. 30 (2020) 1645-1663.
42. J. Kennedy and R. Eberhart, Particle swarm optimization, in Proceedings of ICNN'95 – International Conference on Neural Networks, Vol. 4. IEEE (1995) 1942-1948.
43. R. Pinnau, C. Totzeck, O. Tse and S. Martin, A consensus-based model for global optimization and its mean-field limit. Math. Models Methods Appl. Sci. 27 (2017) 183-204.
44. C. Totzeck and M.-T. Wolfram, Consensus-based global optimization with personal best. Math. Biosci. Eng. 17 (2020) 6026-6044.
45. S. Grassi and L. Pareschi, From particle swarm optimization to consensus based optimization: stochastic modeling and mean-field limit. Math. Models Methods Appl. Sci. 31 (2021) 1625-1657.
46. A. Borovykh, N. Kantas, P. Parpas and G. Pavliotis, Optimizing interacting Langevin dynamics using spectral gaps, in Proceedings of the 38th International Conference on Machine Learning (ICML 2021) (2021).
47. A. Doucet, S.J. Godsill and C.P. Robert, Marginal maximum a posteriori estimation using Markov chain Monte Carlo. Statist. Comput. 12 (2002) 77-84.
48. M. Raginsky, A. Rakhlin and M. Telgarsky, Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis, in Proceedings of the 2017 Conference on Learning Theory, Vol. 65 of Proceedings of Machine Learning Research, edited by S. Kale and O. Shamir. PMLR (2017) 1674-1703.
49. Y. Zhang, O.D. Akyildiz, T. Damoulas and S. Sabanis, Nonasymptotic estimates for stochastic gradient Langevin dynamics under local conditions in nonconvex optimization. Appl. Math. Optim. 87 (2023) 25.
50. R.M. Neal and G.E. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, in Learning in Graphical Models. Springer (1998) 355-368.
51. M. Bossy and D. Talay, A stochastic particle method for the McKean-Vlasov and the Burgers equation. Math. Computat. 66 (1997) 157-192.
52. A.-S. Sznitman, Topics in propagation of chaos, in École d'été de probabilités de Saint-Flour XIX-1989. Vol. 1464 of Lecture Notes in Mathematics. Springer, Berlin (1991) 165-251.
53. P. Billingsley, Probability and Measure. John Wiley & Sons (1995).
54. C.-R. Hwang, Laplace's method revisited: weak convergence of probability measures. Ann. Probab. 8 (1980) 1177-1182.
55. L. Bottou, F.E. Curtis and J. Nocedal, Optimization methods for large-scale machine learning. SIAM Rev. 60 (2018) 223-311.
56. M. Welling and Y.W. Teh, Bayesian learning via stochastic gradient Langevin dynamics, in Proceedings of the 28th International Conference on Machine Learning (ICML-11) (2011) 681-688.
57. M.D. Hoffman, D.M. Blei, C. Wang and J. Paisley, Stochastic variational inference. J. Mach. Learn. Res. 14 (2013) 1303-1347.
58. R. Jordan, D. Kinderlehrer and F. Otto, The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal. 29 (1998) 1-17.
59. F. Otto, The geometry of dissipative evolution equations: the porous medium equation. Commun. Part. Differ. Equ. 26 (2001) 101-174.
60. F. Malrieu, Logarithmic Sobolev inequalities for some nonlinear PDEs. Stoch. Processes Appl. 95 (2001) 109-132.
61. J. Kuntz and J.N. Lim, Code for "Particle algorithms for maximum likelihood training of latent variable models". https://github.com/juankuntz/ParEM (2023).
62. W. Wolberg, O. Mangasarian, N. Street and W. Street, Breast Cancer Wisconsin (Diagnostic). UCI Machine Learning Repository (1995).
63. L. Sharrock, D. Dodd and C. Nemeth, CoinEM: tuning-free particle-based variational inference for latent variable models. arXiv preprint arXiv:2305.14916 (2023).
64. Q. Liu and D. Wang, Stein variational gradient descent: a general purpose Bayesian inference algorithm. Adv. Neural Inform. Process. Syst. 29 (2016).
65. L. Sharrock and C. Nemeth, Coin sampling: gradient-based Bayesian inference without learning rates. arXiv preprint arXiv:2301.11294 (2023).
66. I. Karatzas and S. Shreve, Brownian Motion and Stochastic Calculus, Vol. 113. Springer Science & Business Media (1991).
67. T. Lelièvre and G. Stoltz, Partial differential equations and stochastic methods in molecular dynamics. Acta Numer. 25 (2016) 681-880.
68. R. Gardner, The Brunn-Minkowski inequality. Bull. Am. Math. Soc. 39 (2002) 355-405.
69. B. Gao and L. Pavel, On the properties of the softmax function with application in game theory and reinforcement learning. arXiv preprint arXiv:1704.00805 (2018).
70. C. Kumar and S. Sabanis, On Milstein approximations with varying coefficients: the case of super-linear diffusion coefficients. BIT Numer. Math. 59 (2019) 929-968.
71. L.-P. Chaintron and A. Diez, Propagation of chaos: a review of models, methods and applications. II. Applications. arXiv preprint arXiv:2106.14812 (2021).
