New error bounds for Laplace approximation via Stein's method

We use Stein's method to obtain explicit bounds on the rate of convergence for the Laplace approximation of two different sums of independent random variables: one a random sum of mean-zero random variables, the other a deterministic sum of mean-zero random variables in which the normalisation sequence is random. We make technical advances to the framework of Pike and Ren \cite{pike} for Stein's method for Laplace approximation, which allow us to give bounds in the Kolmogorov and Wasserstein metrics. Under the additional assumption of vanishing third moments, we obtain faster convergence rates in smooth test function metrics. As part of the derivation of our bounds for the Laplace approximation of the deterministic sum, we obtain new bounds for the solution, and its first two derivatives, of the Rayleigh Stein equation.


Introduction
The central limit theorem states that for a sequence of independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots$, with zero mean and variance $\sigma^2 \in (0, \infty)$, the standardised sum $W_n = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^n X_i$ converges in distribution to the standard normal distribution as $n \to \infty$. By modifying the sum $W_n$ appropriately, so that either the number of terms in the sum is random or the normalisation is random, we can instead naturally arrive at an asymptotic Laplace distribution. Studying the rate of convergence to the Laplace distribution in these two settings, via Stein's method, is the subject of this paper.
More precisely, consider the Laplace distribution with parameters $a \in \mathbb{R}$ and $b \in (0, \infty)$, with probability density function
$$f(x) = \frac{1}{2b} e^{-|x-a|/b}, \quad x \in \mathbb{R}. \qquad (1.1)$$
If a random variable $W$ has density (1.1), then we write $W \sim \mathrm{Laplace}(a, b)$. It is readily checked that $\mathbb{E}[W] = a$ and $\mathrm{Var}(W) = 2b^2$. For a comprehensive account of the properties and applications of the Laplace distribution, see [28]. The first limit theorem we consider concerns geometric sums, which arise in a variety of settings [26]. Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with zero mean and variance $\sigma^2 \in (0, \infty)$, and let $N_p \sim \mathrm{Geo}(p)$ be independent of the $X_i$ with probability mass function $\mathbb{P}(N_p = k) = p(1-p)^{k-1}$, $k = 1, 2, \ldots$, $0 < p < 1$. Then, with an obvious abuse of notation,
$$S_p := \sqrt{p} \sum_{i=1}^{N_p} X_i \xrightarrow{d} \mathrm{Laplace}\bigg(0, \frac{\sigma}{\sqrt{2}}\bigg), \quad \text{as } p \to 0.$$
This result is proved under the stronger assumption of symmetric $X_i$ in [28], whilst weaker Lindeberg-type conditions for the existence of the distributional limit are given by [47]. The second limit theorem considered in this paper concerns the case in which the sum $\sum_{i=1}^n X_i$ is normalised by a random variable. Let $B_n$ be a beta random variable with parameters 1 and $n \geq 1$, with probability density function $f_{B_n}(x) = n(1-x)^{n-1}$, $0 < x < 1$.
We write $B_n \sim \mathrm{Beta}(1, n)$. As in the first limit theorem, let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with zero mean and variance $\sigma^2 \in (0, \infty)$. For $n \geq 2$, let $B_{n-1} \sim \mathrm{Beta}(1, n-1)$ be independent of the $X_i$. Then, Proposition 2.2.12 of [28] states that
$$T_n := \sqrt{B_{n-1}} \sum_{i=1}^{n} X_i \xrightarrow{d} \mathrm{Laplace}\bigg(0, \frac{\sigma}{\sqrt{2}}\bigg), \quad \text{as } n \to \infty.$$
For characterisations of the Laplace distribution involving the random variables $S_p$ and $T_n$, see [25] and [33, 34], respectively. In this paper, we give explicit bounds on the distance, with respect to certain probability metrics, between the distributions of $S_p$ and $T_n$ and their limiting Laplace distributions via Stein's method, a powerful probabilistic technique introduced in 1972 by Charles Stein [45] for normal approximation. For a given target distribution $q$, the first step in Stein's method is to find a suitable operator $\mathcal{A}$ acting on a class of functions $\mathcal{F}$ such that $\mathbb{E}[\mathcal{A}f(Y)] = 0$ for all $f \in \mathcal{F}$ if and only if the random variable $Y$ has distribution $q$. For the $\mathrm{N}(\mu, \sigma^2)$ distribution, the classical Stein operator is $\mathcal{A}f(x) = \sigma^2 f'(x) - (x - \mu)f(x)$. This leads to the Stein equation
$$\sigma^2 f'(x) - (x - \mu)f(x) = h(x) - \mathbb{E}[h(Y)], \qquad (1.2)$$
where $Y \sim \mathrm{N}(\mu, \sigma^2)$ and the test function $h$ is real-valued. The second step is to solve (1.2) for $f_h$ (for which we require $f_h \in \mathcal{F}$) and obtain suitable bounds for the solution. Finally, to approximate the distribution of a random variable of interest $W$ by the target distribution $q$, one may evaluate both sides of (1.2) at $W$ and take expectations, so that bounding $|\mathbb{E}[h(W)] - \mathbb{E}[h(Y)]|$ reduces to bounding $|\mathbb{E}[\mathcal{A}f_h(W)]|$. Taking suprema over suitable classes of test functions (indicators of half-lines, 1-Lipschitz functions, and bounded Lipschitz functions) gives the Kolmogorov, Wasserstein and bounded Wasserstein distances, which we denote by $d_K$, $d_W$ and $d_{BW}$, respectively, as well as two smooth test function metrics, which we denote by $d_2$ and $d_{1,2}$, respectively. (Here and throughout the paper, $\|g\| := \|g\|_\infty = \sup_{x \in \mathbb{R}} |g(x)|$.) The $d_2$ and $d_{1,2}$ and similar smooth test function metrics are often found in applications of Stein's method in which 'fast' convergence rates are sought; see, for example, [3, 13, 21, 23]. Stein's method was adapted to the Laplace distribution by [38] (a number of their contributions are outlined in Section 2), and as an application they derived an explicit bound on the bounded Wasserstein distance between the distribution of $S_p$ and its limiting Laplace distribution. Their approach, which involves the introduction of the so-called centered equilibrium transformation for Laplace approximation, mirrored that of [35], who used Stein's method for exponential approximation to give explicit bounds on the rate of convergence in a generalisation of a well-known result of Rényi [40] concerning the convergence of geometric sums of positive random variables to the exponential distribution. In this paper, we make technical improvements on the work of [38] (through Lemma 2.1 and Theorem 2.5) that allow their framework of Laplace approximation by Stein's method to yield optimal order Kolmogorov and Wasserstein distance bounds, as well as faster convergence rates in the $d_2$ distance. As an application, we are able to obtain the following theorem. (1.5) Finally, suppose that $X_1, X_2, \ldots$ are identically distributed and that E[ is difficult to compute or large. Note, though, that as $k$ increases, the exponent $\frac{k}{2(k+1)}$ of $p$ in (1.5) approaches the exponent $\frac{1}{2}$ of (1.3).
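As a numerical illustration (our own, not part of the paper's argument), the geometric-sum limit theorem can be checked by simulation; the definition $S_p = \sqrt{p}\sum_{i=1}^{N_p} X_i$ and the choice of uniform summands below follow our reading of the limit theorem. With $X_i$ uniform on $(-\sqrt{3}, \sqrt{3})$ we have $\sigma^2 = 1$, so the limiting $\mathrm{Laplace}(0, 1/\sqrt{2})$ law has mean 0, variance 1 and fourth moment $24b^4 = 6$.

```python
import math
import random
import statistics

# Simulate S_p = sqrt(p) * sum_{i=1}^{N_p} X_i for small p and compare its
# low-order moments with those of Laplace(0, 1/sqrt(2)): mean 0, variance 1,
# fourth moment 6.  X_i ~ Uniform(-sqrt(3), sqrt(3)) gives sigma^2 = 1.
random.seed(1)
p = 0.01
samples = []
for _ in range(10000):
    # N_p ~ Geo(p) on {1, 2, ...}, generated by inversion of the geometric tail
    n_terms = 1 + int(math.log(1.0 - random.random()) / math.log(1.0 - p))
    s = sum(random.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(n_terms))
    samples.append(math.sqrt(p) * s)

mean_ = statistics.fmean(samples)
var_ = statistics.pvariance(samples)
fourth_ = statistics.fmean(x ** 4 for x in samples)
print(round(mean_, 2), round(var_, 2), round(fourth_, 1))
```

The fourth moment close to 6 (rather than the Gaussian value 3) is a quick way to see that the limit is genuinely Laplace and not normal.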
We are also able to obtain a similar theorem for the deterministic sum T n : Theorem 1.3. Let n ≥ 2 and suppose that X 1 , . . . , X n are independent random variables with E[ n .
In addition to the above assumptions, suppose that E[ 168σ n .
Written in the notation of Theorem 1.1, the bounded Wasserstein distance bound of [38] reads d BW (S p , Z) ≤ σ √ p 1 + 2 We see that, in addition to being given in a stronger metric, the Wasserstein distance bound (1.4) of Theorem 1.1 has a better dependence on $\sigma$ (the bound of [38] has an extra factor of $1 + \frac{2\sqrt{2}}{\sigma}$, meaning that their bound has a worse dependence on $\sigma$ when $\sigma$ is small) and a smaller numerical constant if $\sigma < 2\sqrt{2}$ (the bound of [38] has the smaller numerical constant if $\sigma > 2\sqrt{2}$). The bound (1.4) also improves on the recent Wasserstein distance bound given in Theorem 5.10 of [18], in which Laplace approximations were obtained as part of a more general work on variance-gamma approximation. As we work in a specialised Laplace framework, it is no surprise that we outperform the results of [18]; our Kolmogorov distance bound (1.3) is also an improvement on the analogous bound in Theorem 5.10 of that work. The $O(p)$ bound (1.6) is the first bound for the random sum $S_p$ in the literature with a rate faster than $O(p^{1/2})$. The faster convergence rate is a result of the vanishing third moment assumption, and as such it complements a number of other 'matching moments' limit theorems found in the Stein's method literature; see, for example, [5, 13, 16, 19, 22, 29]. Theorem 1.3 gives the first bounds in the literature on the rate of convergence of the deterministic sum $T_n$ to its asymptotic Laplace distribution. Again, under the assumption of vanishing third moments, we obtain a faster convergence rate. As part of our proof of the theorem, we obtain the first bounds in the literature for the solution, and its first two derivatives, of the Rayleigh Stein equation, which may be useful in future applications.
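The beta-normalised sum can be checked numerically in the same illustrative spirit (a simulation sketch under our reading of the definition $T_n = \sqrt{B_{n-1}}\sum_{i=1}^n X_i$; the uniform summands are our choice): with $\sigma^2 = 1$, its moments should be close to those of $\mathrm{Laplace}(0, 1/\sqrt{2})$ for large $n$.

```python
import math
import random
import statistics

# Simulate T_n = sqrt(B_{n-1}) * sum_{i=1}^n X_i and compare its moments with
# Laplace(0, 1/sqrt(2)) (mean 0, variance 1, fourth moment 6).
# B_{n-1} ~ Beta(1, n-1) is sampled by inversion: P(B <= x) = 1 - (1-x)^(n-1).
random.seed(2)
n = 200
samples = []
for _ in range(10000):
    beta = 1.0 - random.random() ** (1.0 / (n - 1))
    s = sum(random.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(n))
    samples.append(math.sqrt(beta) * s)

mean_ = statistics.fmean(samples)
var_ = statistics.pvariance(samples)
fourth_ = statistics.fmean(x ** 4 for x in samples)
print(round(mean_, 2), round(var_, 2), round(fourth_, 1))
```

Note that $\mathrm{Var}(T_n) = \mathbb{E}[B_{n-1}]\, n\sigma^2 = \sigma^2$ exactly for every $n$, so only the higher moments reveal the convergence to the Laplace law.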
The rest of the paper is organised as follows. In Section 2, we obtain new bounds for the solution of the Laplace Stein equation (Lemma 2.1) and give general bounds for Laplace approximation involving the centered equilibrium distribution (Theorem 2.5). In Sections 3 and 4, we prove Theorems 1.1 and 1.3, respectively. In Section 5, we obtain new bounds for the solution of the Rayleigh Stein equation that are used in the proof of Theorem 1.3.

Stein's method for the Laplace distribution
In this section, we recall some of the theory developed by [38] for Stein's method for Laplace approximation and make some technical improvements that allow their framework for Laplace approximation to be applied in the Kolmogorov and Wasserstein metrics, as well as in the $d_2$ metric when faster convergence rates are sought. We begin by recalling the following characterisation of the Laplace distribution [38, Theorem 1.1].
Let $W$ be a real-valued random variable. Then $W$ follows the $\mathrm{Laplace}(0, b)$ distribution if and only if
$$\mathbb{E}[f(W)] - f(0) = b^2\, \mathbb{E}[f''(W)]$$
for all $f : \mathbb{R} \to \mathbb{R}$ such that $f$ and $f'$ are locally absolutely continuous and $\mathbb{E}|f'(Z)| < \infty$ and $\mathbb{E}|f''(Z)| < \infty$, for $Z \sim \mathrm{Laplace}(0, b)$. Based on this characterisation, [38] were led to the initial value problem
$$b^2 f''(x) - f(x) = h(x) - \mathbb{E}[h(Z)], \quad f(0) = 0. \qquad (2.8)$$
At this point it is worth noting that an alternative Stein equation for the $\mathrm{Laplace}(0, b)$ distribution is given by
$$b^2 x f''(x) + 2b^2 f'(x) - x f(x) = h(x) - \mathbb{E}[h(Z)], \qquad (2.9)$$
which is a special case of the variance-gamma Stein equation of [15] (it is noted in Proposition 1.2 of [15] that the Laplace distribution is a special case of the variance-gamma distribution). A framework for variance-gamma approximation by Stein's method in the Kolmogorov and Wasserstein metrics was developed by [18], and a special case of this general framework gives a framework for Laplace approximation. However, the Stein equation (2.9) is more difficult to work with than (2.8), and it is therefore not surprising that all the comparable results for Laplace approximation obtained in this paper outperform those of [18]. We also remark that another Stein characterisation of the Laplace distribution is given by [1], as a special case of a general characterisation concerning infinitely divisible distributions, although the quantitative limit theorems derived in their work are quite different from ours. Let us now focus on the initial value problem (2.8). The solution was obtained by [38], as well as bounds for $f$ and its first three derivatives. In the following lemma, we improve on Lemma 2.2 of [38] by obtaining bounds for $f$ and its derivatives (of arbitrary order) that have smaller constants and hold for a larger class of functions. The latter improvement is crucial in enabling us to later obtain Kolmogorov and Wasserstein distance bounds for Laplace approximation.
Suppose that $h$ is Lipschitz. Then
Proof. It is easily verified that there is at most one bounded solution to (2.8). Suppose that $u$ and $v$ are bounded solutions to (2.8). Then $w = u - v$ satisfies $w(0) = 0$ and solves the differential equation $b^2 w''(x) - w(x) = 0$, the general solution to which is $w(x) = A e^{x/b} + B e^{-x/b}$. For $w$ to be bounded for all $x \in \mathbb{R}$, we must take $A = B = 0$, from which we conclude that $w = 0$, so that $u = v$. Now we establish the bounds in (2.11). Suppose $h$ is bounded. We first note that, for all $x \in \mathbb{R}$, Applying these inequalities to (2.10) gives the bound Differentiating both sides of (2.10) gives that and so From (2.8) and formula (2.10) we have that, for all $x \in \mathbb{R}$, Now we suppose that $h$ is Lipschitz. We shall now prove the non-uniform bound for $|f(x)|$. By the mean value theorem, We verify the first inequality; the second inequality is proved similarly. Putting all of the above together, we obtain, for $x \in \mathbb{R}$, Finally, we prove the uniform bounds. We note that applying integration by parts to (2.14) gives that We recognise this representation of $f'(x)$ as being the same as the representation (2.10) of $f(x)$, with $\tilde{h}(t)$ replaced by $h'(t)$, and so we can immediately deduce the bounds in (2.12) for $f'$, $f''$ and $f^{(3)}$. Repeating the procedure inductively yields the bounds for $f^{(k+1)}$, $f^{(k+2)}$ and $f^{(k+3)}$, $k \geq 0$.
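For intuition, the characterisation recalled at the start of this section can be sanity-checked numerically. We read it as the identity $\mathbb{E}[f(Z)] - f(0) = b^2\, \mathbb{E}[f''(Z)]$ for $Z \sim \mathrm{Laplace}(0, b)$ (our reading; the test functions and the value of $b$ below are illustrative). It can be verified exactly for $f(x) = x^4$ using the moment formula $\mathbb{E}|Z|^n = n!\, b^n$, and for $f(x) = \cos x$ using the characteristic function $\mathbb{E}[\cos(tZ)] = 1/(1 + b^2 t^2)$.

```python
import math

# Check E[f(Z)] - f(0) = b^2 E[f''(Z)] for Z ~ Laplace(0, b) on two test
# functions, using E[Z^2] = 2 b^2, E[Z^4] = 24 b^4 and E[cos Z] = 1/(1 + b^2).
b = 0.7

# f(x) = x^4: f''(x) = 12 x^2
lhs_poly = 24 * b ** 4 - 0.0
rhs_poly = b ** 2 * 12 * (2 * b ** 2)

# f(x) = cos(x): f''(x) = -cos(x)
lhs_cos = 1.0 / (1.0 + b ** 2) - 1.0
rhs_cos = b ** 2 * (-1.0 / (1.0 + b ** 2))

print(math.isclose(lhs_poly, rhs_poly), math.isclose(lhs_cos, rhs_cos))  # True True
```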
The following distributional transformation, introduced by [38], is very natural in the context of Stein's method for Laplace approximation. Let $W$ have mean zero and non-zero finite variance, and write $2b^2 = \mathbb{E}[W^2]$. Then we say that the random variable $W^L$ has the centered equilibrium distribution with respect to $W$ if
$$\mathbb{E}[f(W)] - f(0) = b^2\, \mathbb{E}[f''(W^L)] \qquad (2.15)$$
for all twice-differentiable functions $f$ for which the expectations exist. Stronger conditions were imposed on $f$ by [38], but on examining the proof of their Theorem 3.2 it can be seen that the weaker conditions presented here are sufficient to ensure that $W^L$ exists and is unique. We also refer the reader to [7] for a generalisation of (2.15) to all random variables $W$ with finite second moment, and we note that the centered equilibrium distribution is itself the Laplace analogue of the equilibrium distribution that is used in Stein's method for exponential approximation by [35]. Some useful properties of the centered equilibrium transformation are collected in Section 3 of [38] and Proposition 4.6 of [18]. In the sequel, the following moment relations will be important: assuming $\mathbb{E}|W|^{r+2} < \infty$,
$$\mathbb{E}[(W^L)^r] = \frac{\mathbb{E}[W^{r+2}]}{(r+1)(r+2) b^2}, \qquad \mathbb{E}|W^L|^r = \frac{\mathbb{E}|W|^{r+2}}{(r+1)(r+2) b^2}. \qquad (2.16)$$
The formulas in (2.16) are obtained by substituting $f_1(w) = w^{r+2}$ and $f_2(w) = |w|^{r+2}$, respectively, into (2.15) and using that $\mathbb{E}[W^2] = 2b^2$. Theorem 2.5 below gives general bounds for Laplace approximation involving the centered equilibrium transformation. Bounds (2.21)–(2.25) of the theorem are the Laplace analogues of the bounds of Theorem 2.1 of [35], which give Kolmogorov and Wasserstein distance bounds in terms of the absolute difference between a random variable $W$ and its equilibrium transformation. We additionally provide a bound in the weaker $d_2$ metric, which is used to obtain the $O(p)$ bound (1.6) of Theorem 1.1. We mostly follow the approach of [35], but the approach used to obtain the $d_2$ metric bound is similar to that used by [22, Theorem 3.1] to prove an analogous bound for the zero bias transformation. We begin by stating three lemmas.
The proofs of Lemmas 2.2 and 2.4 are simple and hence omitted, and the proof of Lemma 2.3 follows immediately from the estimates of Lemma 2.1.
Then, for any random variable $W$,

Lemma 2.3. For any $a \in \mathbb{R}$ and any $\varepsilon > 0$, define
$$h_{a,\varepsilon}(x) = \begin{cases} 1, & x \leq a, \\ 1 - \varepsilon^{-1}(x - a), & a < x \leq a + \varepsilon, \\ 0, & x > a + \varepsilon. \end{cases}$$
Let $f_{a,\varepsilon}$ be the solution (2.10) with test function $h_{a,\varepsilon}$. Let $h_{a,0}(x) = \mathbf{1}(x \leq a)$ and define $f_{a,0}$ accordingly. Then

Lemma 2.4. Let $W$ be a real-valued random variable and let $Z \sim \mathrm{Laplace}(0, b)$. Then, for any $\varepsilon > 0$,
with $h_{a,\varepsilon}$ defined as in Lemma 2.3.
Theorem 2.5. Let $W$ be a random variable with zero mean and variance $2b^2 \in (0, \infty)$, and let $W^L$ have the $W$-centered equilibrium distribution. Then, for any $\beta > 0$,

Remark 2.6. Analogues of inequalities (2.21)–(2.25) for variance-gamma approximation were given in Theorem 4.10 of [18], which as special cases give bounds for Laplace approximation in terms of the centered equilibrium distribution. In all cases, our bounds improve on those of [18].
Proof. For ease of notation, we let $\kappa = d_K(W, Z)$. We also let $\Delta := W - W^L$ and Using the bound (2.19) we have We also have where we used inequality (2.20) and Lemma 2.2 to obtain the last inequality. By a similar argument, and so we conclude that We now apply Lemma 2.4 and take the convenient choice $\varepsilon = \eta\beta$, $\eta > 2$, to obtain which on rearranging yields Choosing $\eta = 2 + \sqrt{10}$ minimises the second term in (2.27) and yields the bound (2.21). We elected to minimise the second term because in some applications the first term vanishes; for an example, see the proof of inequality (3.31). Now we prove inequality (2.22). We have By the mean value theorem, applying the triangle inequality and then using the bounds (2.19) and (2.20), we obtain yielding inequality (2.22). Now suppose that $\mathbb{E}[|W|^3] < \infty$. By the absolute moment relation (2.16), this assumption guarantees that $\mathbb{E}|W^L| < \infty$. Let $h \in \mathcal{H}_W$. We have where we used the bound $\|f^{(3)}\| \leq \frac{2}{b^2}\|h'\|$ of Lemma 2.1 in the final step. This proves inequality (2.23). Also, (2.28) Applying the bounds $\|f^{(3)}\| \leq \frac{1}{b}\|h''\|$ and $\|f^{(4)}\| \leq \frac{2}{b^2}\|h''\|$ from Lemma 2.1 then yields the bound (2.26), as required.
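The centered equilibrium transformation can be made concrete with a minimal example (ours, not from [38]): reading (2.15) as $\mathbb{E}[f(W)] - f(0) = b^2\,\mathbb{E}[f''(W^L)]$ with $2b^2 = \mathbb{E}[W^2]$, if $W$ is uniform on $\{-1, +1\}$ then $b^2 = 1/2$ and Taylor expansion with integral remainder shows that $W^L$ has the triangular density $1 - |t|$ on $(-1, 1)$. The script below verifies the identity numerically for $f(w) = w^4$.

```python
import math

# Verify E[f(W)] - f(0) = b^2 E[f''(W^L)] for W uniform on {-1, +1}, where
# b^2 = 1/2 and W^L has the triangular density 1 - |t| on (-1, 1).
def f(w):
    return w ** 4

def f2(w):  # second derivative of f
    return 12 * w ** 2

lhs = 0.5 * (f(1.0) + f(-1.0)) - f(0.0)   # E[f(W)] - f(0) = 1

# b^2 * E[f''(W^L)] by the trapezoidal rule on the triangular density
n = 200000
h = 2.0 / n
rhs = 0.5 * h * sum(f2(-1.0 + i * h) * (1.0 - abs(-1.0 + i * h)) for i in range(n + 1))

print(abs(lhs - rhs) < 1e-6)  # True: both sides equal 1
```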
Proof of Theorem 1.1

We begin by proving the following general theorem, which improves on Theorem 4.4 of [38] and Theorem 5.9 of [18]. The improvement comes from smaller constants than in both of those theorems, and from giving the bounds in metrics stronger than the bounded Wasserstein metric used by [38]. Very recently, [37] obtained an optimal order Wasserstein distance bound for a multivariate generalisation of the following theorem. In their result, $X_1, X_2, \ldots$ are i.i.d. random vectors, the limiting distribution is a centered multivariate symmetric Laplace distribution (see [28]), and an explicit constant is not given in their bound.
Theorem 3.1. Suppose that $X_1, X_2, \ldots$ is a sequence of independent random variables with $\mathbb{E}[X_i] = 0$ and $\mathbb{E}[X_i^2] = \sigma_i^2 \in (0, \infty)$. Let $N$ be a positive, integer-valued random variable with finite mean $\mu$, which is independent of the $X_i$. Define Now suppose that $|X_i| \leq C$ for all $i$ and $|N - M| \leq K$. Then we have

(3.31)
and if K = 0 the bound also holds for unbounded X i .

Proof. It was shown in the proof of Theorem 4.4 of [38] that
We take $X_m^L$ to be independent of $M$, $N$, and $X_k$ for all $k$. Therefore

Proof of Theorem 1.1. To ease notation, in this proof we drop the subscripts from $S_p$ and $N_p$. As noted by [38], the assumptions imposed on $N$ and the $X_i$ imply that $\mathcal{L}(M) = \mathcal{L}(N)$, meaning that we can take $M = N$. Inequality (1.3) now follows from inequality (3.31). To prove inequality (1.4), we note the following simple inequality (see [38])

Substituting into (2.23) and bounding
where in the final step the Cauchy-Schwarz inequality was applied. We are now able to obtain (1.4) from (3.30).
We end by establishing inequality (1.6). We now assume that $X_1, X_2, \ldots$ are identically distributed with $\mathbb{E}[X_1^3] = 0$ and $\mathbb{E}[X_1^4] < \infty$. We prove inequality (1.6) by applying inequality (2.26) of Theorem 2.5. We proceed similarly to how we did in obtaining (3.33), but this time use the independence of $X_N$ and $X_N^L$ to obtain , as $X_N^L$ and $S$ are independent. Also, due to the assumption that $\mathbb{E}[X_i^3] = 0$ for all $i \geq 1$, we have, by (2.16), that $\mathbb{E}[X_N^L] = \frac{1}{3\sigma^2}\mathbb{E}[X_N^3] = 0$. By the tower property of conditional expectation we then have where we used that, because the $X_i$ are i.i.d., and therefore exchangeable, Taking $h(x) = |x|$ in inequality (4.41) (note that $h \in \mathcal{H}_W$) gives the inequality (see [4] for a similar bound), and on applying this inequality to (3.35) we obtain the bound The expectation $\mathbb{E}[N^{-1}]$ is easily evaluated:
$$\mathbb{E}[N^{-1}] = \sum_{k=1}^{\infty} \frac{p(1-p)^{k-1}}{k} = \frac{p}{1-p}\log(1/p).$$
We can bound $\mathbb{E}[N^{-1/2}]$ through an application of the integral test: where we used the standard inequality $\log(1+x) < x$, for $x > -1$, in the last step.
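The evaluation of $\mathbb{E}[N^{-1}]$ can be double-checked numerically; the closed form below is our own computation from the series $\sum_{k \geq 1} q^k/k = -\log(1-q)$, compared against a truncated sum.

```python
import math

# For N ~ Geo(p) on {1, 2, ...}:  E[1/N] = sum_k p (1-p)^(k-1) / k
#                                        = (p / (1-p)) * log(1/p).
p = 0.05
series = sum(p * (1 - p) ** (k - 1) / k for k in range(1, 5000))
closed = p / (1 - p) * math.log(1 / p)
print(math.isclose(series, closed, rel_tol=1e-9))  # True
```

The truncation at 5000 terms is harmless here, since the geometric tail $(1-p)^{4999}$ is astronomically small for $p = 0.05$.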
Proof of Theorem 1.3

Recall that $T_n = \sqrt{B_{n-1}} \sum_{i=1}^n X_i$, where the $X_1, \ldots, X_n$ are independent random variables with zero mean and variance $\sigma^2 \in (0, \infty)$. Then we have the representations $T_n \stackrel{d}{=} U_n V_n$ and $Z \stackrel{d}{=} UV$, where $U_n = \sqrt{n B_{n-1}}$ and $V_n = n^{-1/2} \sum_{i=1}^n X_i$ are independent, and $U$, which follows the Rayleigh distribution with density function $f_U(x) = 2x e^{-x^2}$, $x > 0$, and $V \sim \mathrm{N}(0, \sigma^2)$ are mutually independent random variables. This representation of the Laplace distribution is given in [28, Proposition 2.2.1]. In the limit $n \to \infty$, $U_n$ converges in distribution to $U$, and, by the central limit theorem, $V_n$ converges in distribution to $V$. Indeed, $\mathbb{P}(U_n \leq u) = 1 - (1 - u^2/n)^{n-1}$, $u \in (0, \sqrt{n})$, which converges to $1 - e^{-u^2}$ as $n \to \infty$. We prove Theorem 1.3 by obtaining explicit bounds on the distance between the distributions of $U_n$ and $U$ and the distributions of $V_n$ and $V$ with respect to suitable probability metrics, and then combine these bounds to bound the distance between $\mathcal{L}(T_n)$ and the $\mathrm{Laplace}(0, \frac{\sigma}{\sqrt{2}})$ distribution. We combine these bounds through the following lemma.
where each inequality holds provided the expectations on the right-hand side of the inequality exist.
Proof. We prove the bound for $d_{1,2}$; the bounds for $d_K$ and $d_W$ are obtained through similar and slightly simpler arguments. Let $h \in \mathcal{H}_{1,2}$. Then, by the triangle inequality and conditioning, Now, for $a \in \mathbb{R} \setminus \{0\}$ and real-valued random variables $X$ and $Y$, we have that since $\mathcal{H}_{1,2} \subset \mathcal{H}_W$ and $\mathcal{H}_{1,2} \subset \mathcal{H}_2$. Applying these inequalities to (4.39), we obtain that, for $h \in \mathcal{H}_{1,2}$, The bound (4.40) holds for all $h \in \mathcal{H}_{1,2}$, and taking the supremum over all such $h$ completes the proof.

There is a vast literature on bounds for $d_\mathcal{H}(V_n, V)$. We will make use of three bounds from the literature, for the cases $\mathcal{H}_K$, $\mathcal{H}_W$ and $\mathcal{H}_2$.
Theorem 4.2 (Shevtsova [43]). Let $X_1, \ldots, X_n$ be independent random variables with where $C_0 = 0.5600$.

Theorem 4.3 (Reinert [39]). Under the same assumptions as Theorem 4.2, we have that, for $h \in \mathcal{H}_W$, Consequently,

Theorem 4.4 (Gaunt [16]). Let $X_1, \ldots, X_n$ be independent random variables with E[ Then (4.43)

Remark 4.5. The Berry–Esseen Theorem 4.2, with a larger constant $C_0$, was proved independently by Berry [2] and Esseen [12] in the early 1940s, and since then several works have improved on the constant, with the best current estimate $C_0 = 0.5600$ due to [43]. For i.i.d. random variables $X_1, \ldots, X_n$, the constant improves to $C_0 = 0.4748$ [44]. The assumption of bounded third absolute moments can also be relaxed at the expense of a slightly more complicated bound with bigger constants [14]. Theorem 4.3 is formulated slightly differently in Theorem 2.1 of [39], but by re-scaling we obtain the bound (4.42). This is also the case for Theorem 4.4, and we additionally obtain an improved constant in (4.43) by using the bound $\|f^{(4)}\| \leq 2\|h''\|$ (due to [5]) for the solution of the standard normal Stein equation $f'(x) - x f(x) = h(x) - \mathbb{E}[h(N)]$, $N \sim \mathrm{N}(0, 1)$, rather than the bound $\|f^{(4)}\| \leq 3\|h''\|$ that was used in the proof of Theorem 3.1 of [16].
As the Rayleigh distribution is a special case of the generalized gamma distribution, the following lemma follows as a special case of Proposition 2.3 of [17].

Lemma 4.6. Let $U$ denote a Rayleigh random variable with probability density function $p_U(x) = 2x e^{-x^2}$, $x > 0$. Suppose that $f : (0, \infty) \to \mathbb{R}$ is differentiable and such that where

Proof. Define the operator $T_r$ by $T_r y(x) = x y'(x) + r y(x)$, $r \in \mathbb{R}$. In this notation, the classical Stein operator for the $\mathrm{Beta}(1, n-1)$ distribution is given by $\mathcal{A}_{B_{n-1}} y(x) = T_1 y(x) - x T_n y(x)$ [6, 23]. Let $C_n = B_{n-1}^{1/2}$ and let $g : (0, 1) \to \mathbb{R}$ be such that $\mathbb{E}|C_n g'(C_n)| < \infty$, $\mathbb{E}|C_n^3 g'(C_n)| < \infty$, $\mathbb{E}|g(C_n)| < \infty$ and $\mathbb{E}|C_n^2 g(C_n)| < \infty$. Then, by equation (15) of [20], (The conditions on $g$ stated above are not specified in [20], but on examining their analysis one can see that these conditions ensure that (4.45) holds.) That is, We have that $U_n \stackrel{d}{=} \sqrt{n}\, C_n$, and on rescaling we deduce (4.44) from (4.46).
In the following lemma, the bound (4.47) is proved purely for reasons of exposition, as an improved bound will be stated in Remark 4.9. Proving both the Kolmogorov and Wasserstein distance bounds requires very little more work than proving the Wasserstein distance bound alone.
Remark 4.9. The following bounds will appear in the supplementary material of the arXiv version of the preprint [11]. For $n \geq 2$, , (4.54) where ${}_2F_1$ is the Gaussian hypergeometric function. (We define $0^0 := 1$, but this is irrelevant because the bound (4.53) is greater than 1 in this case.) These bounds were obtained using a recent technique of [11] for bounding distances between distributions that builds upon the formalism of [10] for new representations of solutions to Stein equations. For another recent approach to bounding distances between distributions, see [9]. Our Kolmogorov distance bound (4.47) outperforms (4.53) when $n = 2$ (although in this case the upper bound of 1 is trivial), but for all $n \geq 3$ the reverse is true. Numerical calculations carried out using Mathematica suggest that the Wasserstein bound (4.54) improves on our bound (4.48) for all $n \geq 2$, although verifying this assertion analytically seems to be difficult. Our bound is of course much simpler, and its dependence on $n$ is very clear. For this reason, we will use the bound (4.48) in our proof of Theorem 1.3.
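Before the proof, a quick numerical look (illustration only) at the distribution function of $U_n$ given earlier, $\mathbb{P}(U_n \leq u) = 1 - (1 - u^2/n)^{n-1}$, shows pointwise convergence to the Rayleigh distribution function $1 - e^{-u^2}$, with the error at a fixed point shrinking at roughly rate $1/n$.

```python
import math

# The distribution function of U_n, P(U_n <= u) = 1 - (1 - u^2/n)^(n-1),
# converges pointwise to the Rayleigh distribution function 1 - exp(-u^2);
# the error at a fixed point shrinks at roughly rate 1/n.
def F_n(u, n):
    return 1.0 - (1.0 - u ** 2 / n) ** (n - 1)

def F_rayleigh(u):
    return 1.0 - math.exp(-u ** 2)

u = 1.3
errors = [abs(F_n(u, n) - F_rayleigh(u)) for n in (10, 100, 1000)]
print([round(e, 6) for e in errors])
```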
Proof of Theorem 1.3. Recall that $T_n \stackrel{d}{=} U_n V_n$ and $Z \stackrel{d}{=} UV$. Then, by Lemma 4.1, By standard formulas for the moments and absolute moments of the beta and normal distributions, we have that $\mathbb{E}[U_n^2] = 1$ and $\mathbb{E}|V| = \sigma\sqrt{2/\pi}$. Also, by a similar calculation to the one used to obtain the formula (4.52), we have, for $n \geq 2$, where we used that $\sqrt{n}\,\Gamma(n)/\Gamma(n + 1/2)$ is a decreasing function of $n$ on $(0, \infty)$ [24].
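The monotonicity fact cited from [24] in the last step is easy to check numerically (an illustration, not a proof): the ratio $r(n) = \sqrt{n}\,\Gamma(n)/\Gamma(n + 1/2)$ decreases towards 1, in line with Stirling's formula.

```python
import math

# Check numerically that r(n) = sqrt(n) * Gamma(n) / Gamma(n + 1/2) is
# decreasing in n (and tends to 1), as used in the proof above.
def r(n):
    return math.sqrt(n) * math.gamma(n) / math.gamma(n + 0.5)

vals = [r(n) for n in range(2, 101)]
decreasing = all(a > b for a, b in zip(vals, vals[1:]))
print(decreasing, round(vals[0], 4), round(vals[-1], 4))
```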

The Rayleigh Stein equation
Let $R \sim \mathrm{Rayleigh}(\sigma)$, $\sigma > 0$, follow the Rayleigh distribution with density function
$$p_R(x) = \frac{x}{\sigma^2} e^{-x^2/(2\sigma^2)}, \quad x > 0.$$
The Rayleigh distribution is a special case of the chi distribution (up to scaling). A random variable $K$ following the chi distribution with $k > 0$ degrees of freedom, denoted by $\chi^{(k)}$, has probability density function
$$\rho_k(x) = \frac{2^{1-k/2}}{\Gamma(k/2)} x^{k-1} e^{-x^2/2}, \quad x > 0.$$
We proceed by obtaining bounds for the solution of the chi distribution Stein equation, before specialising to the solution of the Rayleigh Stein equation. We first note that the density $\rho_k$ satisfies the differential equation
$$(s(x) \rho_k(x))' = \tau(x) \rho_k(x), \qquad (5.58)$$
where $s(x) = x$ and $\tau(x) = k - x^2$. It therefore follows from Theorem 1 of [42] that a Stein equation for the $\chi^{(k)}$ distribution is given by
$$x f''(x) + (k - x^2) f'(x) = h(x) - \mathbb{E}[h(K)], \qquad (5.59)$$
where $K \sim \chi^{(k)}$. It is straightforward to solve (5.59) (see Proposition 1 of [42]): In order to bound the solution (5.60) and its first derivative, it will be useful to note the following straightforward extension of Lemmas 1 and 3 of [41].
Lemma 5.1. Let ρ be the probability density function of a random variable Y , supported on (a, b), which satisfies the differential equation (5.58), where s(x) is a polynomial of degree no greater than two and τ (x) is monotonic in (a, b) with exactly one sign change at the point m ∈ (a, b). Let h : (a, b) → R be bounded. Then, the solution of the Stein equation with F denoting the distribution function of Y .
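As a closing sanity check on the chi Stein equation (our reading of (5.59), built from $s(x) = x$ and $\tau(x) = k - x^2$: the operator $\mathcal{A}f(x) = x f''(x) + (k - x^2) f'(x)$ should have zero $\chi^{(k)}$-expectation for smooth $f$), the script below verifies this by quadrature for $f = \sin$ and $k = 3$, and also confirms that the chi density integrates to 1.

```python
import math

# Quadrature check: with rho_k the chi(k) density, E[A f(K)] = 0 where
# A f(x) = x f''(x) + (k - x^2) f'(x).  We take f = sin and k = 3, and use
# the midpoint rule on [0, 20] (the density is negligible beyond this range).
k = 3.0

def rho(x):  # chi density with k degrees of freedom
    return 2.0 ** (1.0 - k / 2.0) * x ** (k - 1.0) * math.exp(-x ** 2 / 2.0) / math.gamma(k / 2.0)

def Af(x):  # f = sin, so f' = cos and f'' = -sin
    return -x * math.sin(x) + (k - x ** 2) * math.cos(x)

n, top = 200000, 20.0
h = top / n
mass, expect = 0.0, 0.0
for i in range(n):
    x = (i + 0.5) * h
    w = rho(x) * h
    mass += w
    expect += Af(x) * w

print(abs(mass - 1.0) < 1e-5, abs(expect) < 1e-5)  # True True
```

That the expectation vanishes follows from integration by parts together with the identity $(x\rho_k(x))' = (k - x^2)\rho_k(x)$, which is exactly how (5.58) feeds into the Stein equation.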