ON BERNSTEIN–KANTOROVICH INVARIANCE PRINCIPLE IN HÖLDER SPACES AND WEIGHTED SCAN STATISTICS

Abstract. Let ξ_n be the polygonal line partial sums process built on i.i.d. centered random variables X_i, i ≥ 1. The Bernstein-Kantorovich theorem states the equivalence between the finiteness of E|X_1|^{max(2,r)} and the joint weak convergence in C[0,1] of n^{-1/2} ξ_n to a Brownian motion W together with the convergence of the moments E‖n^{-1/2} ξ_n‖^r_∞ to E‖W‖^r_∞. For 0 < α < 1/2 and p(α) = (1/2 − α)^{-1}, we prove that the convergence in distribution of n^{-1/2} ξ_n to W in the separable Hölder space H^o_α, jointly with the convergence of E‖n^{-1/2} ξ_n‖^r_α to E‖W‖^r_α, holds if and only if P(|X_1| > t) = o(t^{-p(α)}) when r < p(α), or E|X_1|^r < ∞ when r ≥ p(α). As an application we show that for every α < 1/2, all the α-Hölderian moments of the polygonal uniform quantile process converge to the corresponding ones of a Brownian bridge. We also obtain the asymptotic behavior of the rth moments of some α-Hölderian weighted scan statistics, where the natural border for α is 1/2 − 1/p when E|X_1|^p < ∞. In the case where the X_i's are p-regularly varying, we can complete these results for α > 1/2 − 1/p with an appropriate normalization.


Introduction
Let (Z_n)_{n≥1} be a sequence of random elements in some separable metric space S endowed with its Borel σ-field S. Let Z be a random element in S. Assume for notational simplicity that Z and the Z_n's are all defined on the same probability space (Ω, F, P). Then Z_n converges in distribution to Z when the sequence of distributions P∘Z_n^{-1} converges weakly to µ = P∘Z^{-1}. This means that for every continuous bounded function f : S → R,

(1.1)  E f(Z_n) → E f(Z), n → ∞.

Relaxing the boundedness assumption on f in (1.1) leads to the classical question of convergence of moments. When S is a separable Banach space with norm ‖·‖, one is interested in extending the convergence in (1.1) to the case of functions f satisfying, for some positive constants c_1, c_2, r,

(1.2)  |f(x)| ≤ c_1 + c_2 ‖x‖^r, x ∈ S.

It is well known that this extension is valid if and only if (‖Z_n‖^r)_{n≥1} is uniformly integrable (see [3], Thm. 5.4), that is,

lim_{a→∞} sup_{n≥1} E(‖Z_n‖^r 1_{‖Z_n‖ > a}) = 0.

Let us note that if (‖Z_n‖^r)_{n≥1} is uniformly integrable, necessarily sup_{n≥1} E‖Z_n‖^r < ∞.

In this paper we focus on the convergence of moments in the functional central limit theorem. Let (X_i)_{i≥1} be an i.i.d. sequence of real valued random variables with null expectation and variance one if they exist, S_n := X_1 + ··· + X_n, and ξ_n the random polygonal line with vertices (k/n, S_k), k = 0, 1, ..., n. From Bernstein's theorem [2] it is known that for r > 0 the joint convergence of n^{-1/2} S_n in distribution to G with E|n^{-1/2} S_n|^r → E|G|^r, where G is a Gaussian N(0,1) random variable, is equivalent to the finiteness of E|X_1|^{max{2,r}}. Note also that in the case where r = 2 the convergence of the corresponding moment is trivial, and that for 0 < r < 2 the convergence of E|n^{-1/2} S_n|^r follows immediately from E X_1^2 < ∞ by uniform integrability of (n^{-1} S_n^2)_{n≥1}. Let us denote by W a standard Brownian motion viewed as a random element in the space C[0,1] of continuous functions x : [0,1] → R endowed with the uniform norm ‖x‖_∞ = sup{|x(t)|, t ∈ [0,1]}. The classical Donsker-Prokhorov theorem provides the equivalence between E X_1^2 < ∞ and the convergence in distribution of n^{-1/2} ξ_n to W in C[0,1]. For r > 0, the Bernstein-Kantorovich functional central limit theorem (see [11], Thm. 11.2.1, p. 219) provides the equivalence between E|X_1|^{max{2,r}} < ∞ and the joint convergence of n^{-1/2} ξ_n in distribution to W in C[0,1] with E‖n^{-1/2} ξ_n‖^r_∞ → E‖W‖^r_∞. It turns out that the condition E|X_1|^r < ∞ for some r > 2 also provides the convergence in distribution of n^{-1/2} ξ_n to W in a stronger topology than that of C[0,1]. Define for 0 ≤ α < 1 the Hölder space H^o_α[0,1] as the set of functions x : [0,1] → R such that

ω_α(x, δ) := sup_{0 < t−s < δ} |x(t) − x(s)| / (t − s)^α → 0 as δ → 0,

endowed with the norm

‖x‖_α := |x(0)| + ω_α(x, 1),

which makes it a separable Banach space (isomorphic to C[0,1] in the special case α = 0). Let α ∈ (0, 1/2) and p(α) = (1/2 − α)^{-1}. By the necessary and sufficient condition for Lamperti's Hölderian invariance principle [12, 13], we know that n^{-1/2} ξ_n converges in distribution in the space H^o_α[0,1] to the standard Brownian motion if and only if P(|X_1| > t) = o(t^{-p(α)}) when t tends to infinity. When E|X_1|^{p(α)} < ∞, this condition is satisfied. Our first result extends the Bernstein-Kantorovich functional central limit theorem to the spaces H^o_α[0,1].

Theorem 1.1. Let 0 < α < 1/2, p(α) = (1/2 − α)^{-1} and r > 0. Then the joint convergence

(1.9)  n^{-1/2} ξ_n → W in distribution in H^o_α[0,1], with E‖n^{-1/2} ξ_n‖^r_α → E‖W‖^r_α,

holds if and only if

(1.10)  P(|X_1| > t) = o(t^{-p(α)}), t → ∞, in the case where r < p(α),

or

(1.11)  E|X_1|^r < ∞, in the case where r ≥ p(α).

It is worth noticing here that (1.9) is equivalent to the convergence of the distribution of n^{-1/2} ξ_n to the one of W with respect to the Wasserstein distance of order r associated to the norm ‖·‖_α, i.e.
with the mass transportation cost function c(x, y) = ‖x − y‖^r_α, see Section 3.2 for details. An immediate consequence of Theorem 1.1 is the convergence E f(n^{-1/2} ξ_n) → E f(W) for every continuous functional f : H^o_α[0,1] → R satisfying, for some positive constants c_1, c_2,

(1.12)  |f(x)| ≤ c_1 + c_2 ‖x‖^r_α, x ∈ H^o_α[0,1].

Among the various functionals f(n^{-1/2} ξ_n), where f satisfies a condition like (1.12), are the (powers of the) following weighted scan type statistics. We refer to [1, 10] for valuable information about scan statistics and their applications. The following result is a corollary of more general results obtained in this paper (see Thms. 4.1 and 4.2).
If X_1 is regularly varying with exponent p, then for any 0 ≤ r < p the rth moments of the suitably normalized scan statistics converge to the corresponding moments of the limit, where b_n is defined by (1.13) and Y_p has the Fréchet distribution with exponent p.
The paper is organized as follows. Section 2 is devoted to preliminaries, where uniform integrability and regularly varying random variables are discussed and the necessary tools on Hölder spaces are presented. In Section 3, we prove Theorem 1.1 and present some comments concerning the upper bound for the admissible Hölder index in the Bernstein-Kantorovich invariance principle. Convergence of moments of weighted scan statistics is considered in Section 4. The paper ends with an appendix devoted to some facts from Karamata theory.
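Before turning to the preliminaries, the basic objects above can be made concrete numerically. The following minimal sketch (our own illustration, not part of the paper) builds the polygonal line ξ_n from simulated N(0,1) increments and checks that, ξ_n being piecewise affine, the uniform norm ‖n^{-1/2} ξ_n‖_∞ reduces to n^{-1/2} max_{0≤k≤n} |S_k|:

```python
import random

def polygonal_partial_sums(xs):
    """Polygonal line xi_n with vertices (k/n, S_k), returned as a callable on [0, 1]."""
    n = len(xs)
    S = [0.0]
    for x in xs:
        S.append(S[-1] + x)

    def xi(t):
        u = t * n
        k = min(int(u), n - 1)                 # index of the affine segment containing t
        return S[k] + (u - k) * (S[k + 1] - S[k])

    return xi, S

random.seed(0)
n = 200
xi, S = polygonal_partial_sums([random.gauss(0.0, 1.0) for _ in range(n)])

# A piecewise affine function attains its sup at a vertex, so
# ||n^{-1/2} xi_n||_inf = n^{-1/2} max_k |S_k|.
sup_vertices = max(abs(s) for s in S) / n ** 0.5
sup_grid = max(abs(xi(i / 2000)) for i in range(2001)) / n ** 0.5  # grid contains all vertices
assert abs(sup_vertices - sup_grid) < 1e-9
```

The same reduction to vertices is what makes the Hölder norms of ξ_n computable in Section 2.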

Uniform integrability
Lemma 2.1. Let (Z_n)_{n≥1} be a sequence of random elements in the Banach space (S, ‖·‖). For r > 0, (‖Z_n‖^r)_{n≥1} is uniformly integrable if and only if

lim_{a→∞} sup_{n≥1} ∫_a^∞ r t^{r-1} P(‖Z_n‖ > t) dt = 0.

The proof is elementary and will be omitted.
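To see what uniform integrability rules in and out, consider the toy family (our own hypothetical example, not from the paper) Z_n = n^{1/r_0} with probability 1/n and Z_n = 0 otherwise. Then E(Z_n^r 1_{Z_n > a}) = n^{r/r_0 − 1} 1_{n^{1/r_0} > a}, so (Z_n^r)_{n≥1} is uniformly integrable for r < r_0 but not for r = r_0. A short script confirms the computation:

```python
def tail_moment(n, r, r0, a):
    """E(Z_n^r 1{Z_n > a}) for the toy family Z_n = n**(1/r0) with prob 1/n, else 0."""
    z = n ** (1.0 / r0)
    return z ** r / n if z > a else 0.0

def sup_tail(r, r0, a, N=10 ** 5):
    """Supremum over 1 <= n < N of the truncated r-th moments."""
    return max(tail_moment(n, r, r0, a) for n in range(1, N))

r0 = 2.0
# r < r0: the sup tends to 0 as the truncation level a grows (uniform integrability)
assert sup_tail(1.0, r0, 100.0) < sup_tail(1.0, r0, 10.0) < 0.1
# r = r0: the sup sticks at 1, so uniform integrability fails
assert abs(sup_tail(r0, r0, 10.0) - 1.0) < 1e-9
assert abs(sup_tail(r0, r0, 100.0) - 1.0) < 1e-9
```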

Hölderian tools
Let D_j denote the set of dyadic numbers of level j in [0,1], that is, D_0 := {0, 1} and for j ≥ 1, D_j := {(2k − 1)2^{-j}; 1 ≤ k ≤ 2^{j-1}}. For r ∈ D_j, j ≥ 1, put r^- := r − 2^{-j} and r^+ := r + 2^{-j}. The sequential norm defined on H^o_α[0,1] by

‖x‖^seq_α := sup_{j≥0} 2^{jα} max_{r∈D_j} |λ_r(x)|, where λ_r(x) := x(r) − (x(r^+) + x(r^-))/2 for r ∈ D_j, j ≥ 1, and λ_r(x) := x(r) for r ∈ D_0,

is equivalent to the natural norm ‖x‖_α, see [5]. The Hölder norm of a polygonal line function is very easy to compute according to the following lemma, for which we refer e.g. to [8], Lemma 3, where it is proved in a more general setting.

Lemma 2.2. Let t_0 = 0 < t_1 < ··· < t_n = 1 be a partition of [0,1] and x be a real-valued polygonal line function on [0,1] with vertices at the t_i's, i.e. x is continuous on [0,1] and its restriction to each interval [t_i, t_{i+1}] is an affine function. Then for any 0 ≤ α < 1,

sup_{0≤s<t≤1} |x(t) − x(s)| / (t − s)^α = max_{0≤i<j≤n} |x(t_j) − x(t_i)| / (t_j − t_i)^α.
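Lemma 2.2 reduces the α-Hölder seminorm of a polygonal line to a finite maximum over pairs of vertices. The following sketch (our own numerical sanity check, with arbitrary vertex data) compares this vertex maximum with a brute-force maximum over a fine grid containing the vertices; the two agree to floating-point accuracy:

```python
def polygonal(ts, xs, t):
    """Evaluate the polygonal line with vertices (ts[i], xs[i]) at t."""
    for i in range(len(ts) - 1):
        if ts[i] <= t <= ts[i + 1]:
            w = (t - ts[i]) / (ts[i + 1] - ts[i])
            return (1.0 - w) * xs[i] + w * xs[i + 1]
    raise ValueError("t outside the partition")

def seminorm_vertices(ts, xs, alpha):
    """Holder seminorm of a polygonal line via Lemma 2.2: max over vertex pairs."""
    return max(abs(xs[j] - xs[i]) / (ts[j] - ts[i]) ** alpha
               for i in range(len(ts)) for j in range(i + 1, len(ts)))

ts = [0.0, 0.2, 0.35, 0.6, 1.0]   # an arbitrary partition of [0, 1]
xs = [0.0, 1.3, -0.4, 0.9, 0.1]   # arbitrary vertex values
alpha = 0.3

grid = [i / 400 for i in range(401)]               # contains every vertex of ts
vals = [polygonal(ts, xs, t) for t in grid]
brute = max(abs(vals[j] - vals[i]) / (grid[j] - grid[i]) ** alpha
            for i in range(401) for j in range(i + 1, 401))
assert abs(seminorm_vertices(ts, xs, alpha) - brute) < 1e-9
```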

Regularly varying random variables
Throughout this paper we implicitly assume that all the random variables considered are defined on the same probability space (Ω, F , P) and we use the following notion of regularly varying random variable.
Definition 2.3. The random variable X is regularly varying with index p > 0 (denoted X ∈ RV_p) if there exists a slowly varying function L such that the distribution function F(t) = P(X ≤ t) satisfies the tail balance condition

P(X > t) ∼ a t^{-p} L(t) and P(X ≤ −t) ∼ b t^{-p} L(t), t → ∞,

where a, b ∈ (0, 1) and a + b = 1.
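For a first example, take X symmetric with P(|X| > t) = min(1, t^{-p}) (a symmetric Pareto law: L ≡ 1, a = b = 1/2, so X ∈ RV_p). The tail-integral formula E|X|^r = ∫_0^∞ r t^{r-1} P(|X| > t) dt gives E|X|^r = p/(p − r) for r < p, with divergence at r = p, which makes concrete the fact that RV_p variables have finite moments of every order r < p but not of order p. A numerical sketch of ours:

```python
def pareto_abs_moment(p, r, T=1000.0, cells=400000):
    """E|X|^r for the symmetric Pareto tail P(|X| > t) = min(1, t**-p),
    via the tail-integral formula E|X|^r = int_0^inf r t^(r-1) P(|X| > t) dt."""
    total = 1.0  # contribution of t in [0, 1], where P(|X| > t) = 1
    h = (T - 1.0) / cells
    for i in range(cells):
        t = 1.0 + (i + 0.5) * h      # midpoint rule on [1, T]
        total += r * t ** (r - 1.0 - p) * h
    return total

p = 3.0
assert abs(pareto_abs_moment(p, 1.0) - p / (p - 1.0)) < 1e-3   # exact value 3/2
assert abs(pareto_abs_moment(p, 2.0) - p / (p - 2.0)) < 1e-2   # exact value 3
```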
We refer to [4] for an encyclopaedic treatment of regular variation. Writing L^p or L^o_{p,∞} for the sets of random variables X verifying respectively E|X|^p < ∞ or lim_{t→∞} t^p P(|X| > t) = 0, we note that RV_p ⊂ L^r for 0 ≤ r < p. The next lemma plays a key role in our results on the scan statistics built on RV_p random variables. Its proof is detailed in the Appendix.
i) For any 0 < s < p, for n large enough, uniformly in y ∈ [1, ∞). ii) For any s > p, for n large enough, uniformly in y ∈ [1, ∞).

In this section we first prove Theorem 1.1 and then discuss some aspects of the Bernstein-Kantorovich theorem in the Hölder framework.

Proof of Theorem 1.1
The necessity of the integrability conditions on X_1 for the joint convergence (1.9) is easily seen. Indeed, when r < p(α), (1.10) follows from the first convergence in (1.9), see [12]. When r ≥ p(α), the convergence of moments in (1.9) gives the necessity of (1.11). Now let us prove that the integrability conditions (1.10) or (1.11) are sufficient for the joint convergence (1.9). By the Hölderian invariance principle [12], (1.10), or a fortiori (1.11), gives the convergence in distribution of n^{-1/2} ξ_n to W in H^o_α[0,1], whence by continuous mapping ‖n^{-1/2} ξ_n‖_α converges in distribution to ‖W‖_α. It remains to check the uniform integrability of the sequence (‖n^{-1/2} ξ_n‖^r_α)_{n≥1}. It is enough to consider the case r ≤ p(α) only. Indeed, if r > p(α), then we choose β = 1/2 − 1/r (so that r = p(β)) and notice that uniform integrability of the sequence (‖n^{-1/2} ξ_n‖^r_β)_{n≥1} implies that of (‖n^{-1/2} ξ_n‖^r_α)_{n≥1}, since ‖x‖_α ≤ ‖x‖_β for α < β. Now, to prove the uniform integrability of the sequence (‖n^{-1/2} ξ_n‖^r_α)_{n≥1}, we can obviously replace ‖·‖_α by an equivalent norm. The choice of ‖·‖^seq_α seems more convenient here. So, by Lemma 2.1, we have to prove that for r ≤ p(α),

(3.1)  lim_{a→∞} sup_{n≥1} ∫_a^∞ r t^{r-1} P(‖n^{-1/2} ξ_n‖^seq_α > t) dt = 0.

The following proof of (3.1) is essentially common to the cases r < p(α) and r = p(α), except for some nuance in the exploitation of the integrability of X_1. From now on, we write p for p(α).
The first task in establishing (3.1) is to obtain a good estimate for P(‖n^{-1/2} ξ_n‖^seq_α > t). Write for simplicity t_{k,j} = k2^{-j}, k = 0, 1, ..., 2^j, j = 1, 2, ..., and t_k = t_{k,j} whenever the context dispels any doubt on the value of j. It is easily seen that for any x ∈ H^o_α such that x(0) = 0, From this we deduce that with where log denotes the logarithm to base 2 (log 2 = 1).
In the first case we have If t_k and t_{k+1} are in consecutive intervals, noticing that the slope of each of the two involved segments of the polygonal line is bounded in absolute value by n max_{1≤i≤n} |X_i|, we get With both cases taken into account we obtain Noting that for j > log n, 2^{j(α−1)} n^{1/2} < n^{α−1/2} = n^{-1/p}, this leads to To control the contribution of P_2(n, t) when estimating the integral in (3.1), we note that for every n ≥ 1, In the case where r = p, as E|X_1|^p is supposed finite, we can bound the right hand side of (3.7) by ∫_a^∞ p s^{p-1} P(|X_1| > s) ds, uniformly in n ≥ 1. In the case where r < p, the hypothesis (1.10) implies that

(3.8)  P(|X_1| > s) ≤ K s^{-p}, s > 0,

for some constant K depending only on the distribution of X_1. Hence Gathering both cases, we obtain that for r ≤ p, where In P_{1,2}(n, t), max_{0≤j≤log n} 2^{jα} ≤ n^α, so Using (3.10), we obtain that for r ≤ p, To estimate P_{1,1}(n, t), we use a truncation method. Define, for t > 0 and 0 < δ ≤ 1, Let S'_k and S̄'_k be the random variables obtained from S_k by replacing each X_i with respectively its truncated version X'_i or its centered truncated version X̄'_i. We introduce also First, since on the event {max_{1≤i≤n} |X_i| ≤ δtn^{1/p}}, S'_k = S_k for every k, we note that (3.17) holds for t large enough, uniformly in n ≥ 1. By a Fubini argument, Now, in the case where r < p, using (3.8) we obtain Therefore (3.18) is satisfied for every t > t_0 not depending on n, since we can choose

(3.19)
The same holds in the case where r = p, replacing (3.8) by Markov's inequality and K by E|X_1|^p. Now it only remains to deal with sup_{n≥1} ∫_a^∞ r t^{r-1} P_{1,1}(n, t, δ) dt. For any q > p, we have (3.20) Next we bound E|S̄'_{k+1} − S̄'_k|^q by using the Rosenthal inequality: for independent centered random variables Y_1, ..., Y_m,

E|Y_1 + ··· + Y_m|^q ≤ C_q ( Σ_{i=1}^m E|Y_i|^q + ( Σ_{i=1}^m E Y_i^2 )^{q/2} ),

where C_q is a universal constant, i.e. not depending on the distribution of the X_i's. As the X_i's are i.i.d., Going back to (3.20) with this bound, we obtain With all these partial estimates, the upper bound obtained for P_{1,1}(n, t, δ) becomes where It is worth recalling here that the moment E|X̄'_1|^q of the centered truncated variable depends on n, δ and t. As the first term in the upper bound (3.21) depends neither on n nor on δ and goes to 0 as a tends to infinity, it remains only to investigate the asymptotic behavior of sup_{n≥1} I_{r,q}(a, n) as a tends to infinity. To transform I_{r,q}(a, n), we use the fact that if Y is a positive random variable and f a C^1 non-decreasing function on [0, ∞) with f(0) = 0, then by the Fubini-Tonelli theorem, for any positive constant c,

E( f(Y) 1_{Y > c} ) = f(c) P(Y > c) + ∫_c^∞ f'(t) P(Y > t) dt.

Exchanging the order of integrations in J_{r,q}(a, n) gives where We bound J' for r ≤ p using (3.8), agreeing for simplicity that K = E|X_1|^p when r = p. This gives

(3.23)  J' ≤ K ∫_0^{δan^{1/p}} s^{q−p−1} ds = (K/(q − p)) a^{q−p} δ^{q−p} n^{q/p−1}.

For J'', the same method would lead to a divergent integral in the special case r = p, so we restrict the use of (3.8) to the case where r < p. This gives (3.24). Going back to I_{r,q}(a, n) and taking account of (3.22)-(3.24), we obtain (recalling that δ ≤ 1) In the case r = p, bounding the integral in (3.24) by (1/p) E|X_1|^p = K/p, we obtain Recapitulating all the estimates obtained throughout the proof, we see that for every a > t_0(δ) defined by (3.19) and every n ≥ 1, We note in passing that in this case we did not really need the freedom to tune the value of δ, the simple choice δ = 1 would have done the job as well.
In the special case where r = p, we have only to modify the treatment of the last term in the bound (3.27). As q > p, for any ε > 0 we can fix a δ > 0 such that C_2(p, q)(1 − p/q)^{-2} K δ^{q−p} < ε. Taking (3.11), (3.13), (3.10) and (3.26) into account, there is some a_1, depending on ε, p, q and on the distribution of X_1, such that for every a ≥ a_1 and every n ≥ 1, As ε was arbitrary, the uniform convergence (3.1) is established and the proof is complete.

Comments
If we fix p > 2 and consider X_1 ∈ L^p, then the best possible Hölder index corresponding to the convergence of pth moments is α = α(p) := 1/2 − 1/p, as the following result shows.
Proof. By looking at the increments of n^{-1/2} ξ_n between k/n and (k+1)/n, 0 ≤ k < n, we see that which can be recast as It is well known that when the X_i's are i.i.d., Now choose for |X_1| the distribution given by

For probability measures P_1, P_2 on a separable metric space (S, d), the Kantorovich functional A_c is defined by

A_c(P_1, P_2) := inf { ∫_{S×S} c(x, y) µ(dx, dy); µ ∈ P(P_1, P_2) },

where P(P_1, P_2) denotes the set of all probabilities on the Borel σ-field of S × S with given marginals P_1, P_2, and c(x, y) = H(d(x, y)), where H(0) = 0, H is non-decreasing on [0, ∞) and satisfies the Orlicz condition sup_{t>0} H(2t)/H(t) < ∞. It is known, see Theorem 11.1.1 in [11], that if for some a ∈ S, ∫_S c(x, a) P_n(dx) < ∞ for every n ≥ 1, then lim_{n→∞} A_c(P_n, P_0) = 0 if and only if P_n converges weakly to P_0 and

∫_S c(x, b) P_n(dx) → ∫_S c(x, b) P_0(dx)

for some (and therefore for any) b ∈ S.
Let us denote by A^α_r the Kantorovich functional obtained by choosing S = H^o_α[0,1], d(x, y) = ‖x − y‖_α and H(t) = t^r. We observe that (A^α_r)^{1/r} is the Wasserstein distance W_r associated to the space H^o_α[0,1]. Write P_n for the distribution of n^{-1/2} ξ_n and P_0 for the Wiener measure. Then (3.32) can be rewritten as From this point of view, Theorem 1.1 means that the convergence of A^α_r(P_n, P_0) to 0 is equivalent to the moment condition (1.10) or (1.11), according as r < p(α) or r ≥ p(α). Similarly, from the Bernstein-Kantorovich invariance principle in C[0,1], the convergence of A^0_r(P_n, P_0) to 0 is equivalent to E|X_1|^{max(r,2)} < ∞. As already hinted at in the introduction, we see that starting from the classical Donsker-Prokhorov invariance principle in C[0,1] (A^0_2(P_n, P_0) → 0 iff E X_1^2 < ∞) and looking for a stronger convergence within the framework of C[0,1] (A^0_p(P_n, P_0) → 0) at the price of a stronger moment assumption, we obtain a similar convergence (A^α_p(P_n, P_0) → 0) in a path space with a stronger topology.
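The Wasserstein convergence discussed here lives on the path space, but the mechanism is already visible in one dimension. The sketch below (our own toy illustration on the real line, not the path-space distance of the text) estimates the order-2 Wasserstein distance between the law of n^{-1/2} S_n for Rademacher increments and N(0,1) by the classical quantile coupling, and checks that it shrinks as n grows:

```python
import random
from statistics import NormalDist

def w2_to_normal(sample):
    """Empirical order-2 Wasserstein distance to N(0,1) via quantile coupling."""
    nd = NormalDist()
    xs = sorted(sample)
    m = len(xs)
    return (sum((x - nd.inv_cdf((i + 0.5) / m)) ** 2
                for i, x in enumerate(xs)) / m) ** 0.5

def scaled_sum(n):
    """One draw of n^{-1/2} S_n with Rademacher increments."""
    return sum(random.choice((-1.0, 1.0)) for _ in range(n)) / n ** 0.5

random.seed(1)
m = 5000
w_n2 = w2_to_normal([scaled_sum(2) for _ in range(m)])      # far from Gaussian
w_n200 = w2_to_normal([scaled_sum(200) for _ in range(m)])  # close to Gaussian
assert w_n200 < w_n2   # the Wasserstein distance to the limit shrinks with n
```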

An application to uniform quantile processes
As a corollary of Theorem 1.1, we now look at the convergence of moments for the uniform quantile process. For the weak convergence of the uniform quantile process in Hölder spaces we refer to [7].
Let U_1, ..., U_n be a sample of i.i.d. random variables uniformly distributed on [0,1]. We denote by U_{n:1} ≤ ··· ≤ U_{n:n} the order statistics of the sample, which are distinct with probability one. For notational convenience, put u_{n:0} := 0, u_{n:i} := U_{n:i} for 1 ≤ i ≤ n, and u_{n:n+1} := 1. The polygonal uniform quantile process χ^pg_n is the random polygonal line on [0,1] which is affine on each [u_{n:i−1}, u_{n:i}], i = 1, ..., n+1, and satisfies As a corollary of Theorem 10 in [7], for any 0 < α < 1/2, χ^pg_n converges weakly in H^o_α[0,1] to the Brownian bridge B. Theorem 1.1 enables us to complete this convergence with the following convergence of moments.

Corollary 3.2. Let χ^pg_n be the polygonal uniform quantile process defined above. Then for every 0 ≤ α < 1/2 and every r > 0,

E‖χ^pg_n‖^r_α → E‖B‖^r_α, n → ∞,

where B is the Brownian bridge on [0,1].
Proof. We recall the distributional equality (see e.g. [15])

(U_{n:1}, ..., U_{n:n}) =_d (S_1/S_{n+1}, ..., S_n/S_{n+1}),

where S_k = X_1 + ··· + X_k and the X_k's are i.i.d. 1-exponential random variables. Following [7], introduce the polygonal process ζ_n which is affine on each interval [u_{n:i−1}, u_{n:i}], i = 1, ..., n+1, and such that Putting X̄_i = X_i − E X_i (note that E X̄_1^2 = 1) and S̄_k = S_k − E S_k, we consider also the normalized partial sums polygonal process Ξ_n built on the S̄_k's, i.e. the random polygonal line with vertices (k/n, n^{-1/2} S̄_k), k = 0, 1, ..., n. As shown in the proof of Theorem 10 in [7], To obtain (3.39) with any s > r, we just note the following facts. First, by elementary computation, Next, since X̄_1 has finite moments of every order, E‖Ξ_{n+1}‖^{2s}_α converges to E‖W‖^{2s}_α by Theorem 1.1.
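The distributional identity with exponential partial sums invoked at the start of this proof can be sanity-checked numerically: since (U_{n:1}, ..., U_{n:n}) has the same law as (S_1/S_{n+1}, ..., S_n/S_{n+1}), the mean of the ith coordinate must equal E U_{n:i} = i/(n+1). A Monte Carlo sketch of ours (fixed seed, small n):

```python
import random

def ratio_sample(n, i, m):
    """m Monte Carlo draws of S_i/S_{n+1}, with S_k the partial sums of i.i.d. Exp(1) variables."""
    draws = []
    for _ in range(m):
        e = [random.expovariate(1.0) for _ in range(n + 1)]
        s = [0.0]
        for x in e:
            s.append(s[-1] + x)
        draws.append(s[i] / s[n + 1])
    return draws

random.seed(2)
n, m = 4, 20000
for i in (1, 2, 3):
    mean = sum(ratio_sample(n, i, m)) / m
    # matches E U_{n:i} = i/(n+1) for the order statistics of a uniform sample
    assert abs(mean - i / (n + 1)) < 0.01
```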

Weighted scan statistics
In this section we consider several weighted scan type statistics, defined for α ≥ 0. The convergence (4.1) follows from Theorem 1.1.
To prove (4.2), we use the representation which is explained in detail in [14]. Here the functional g is defined by By the Hölderian invariance principle [12], continuous mapping and Slutsky's lemma, (4.3) provides the convergence in distribution of g_n(n^{-1/2} ξ_n) to g(W). Then, in view of (4.4), Theorem 1.1 gives (4.2), since g(W) = T_α(B).
where b n is defined by (1.13) and Y p has the Fréchet distribution with exponent p.
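Since the limiting moments are those of a Fréchet law, it is worth recalling that the standard Fréchet distribution with exponent p (distribution function exp(−y^{-p}), y > 0; the standard form is assumed here) has E Y_p^r = Γ(1 − r/p) for r < p and infinite rth moment for r ≥ p, consistently with the restriction 0 < r < p above. A numerical cross-check of ours for this classical formula:

```python
import math

def frechet_moment(p, r, a=1e-3, umax=50.0, cells=200000):
    """E[Y^r] for Y standard Frechet(p) (cdf exp(-y**-p)), valid for r < p.
    The substitution u = y**(-p) gives E[Y^r] = int_0^inf u**(-r/p) e**(-u) du."""
    c = r / p
    # near 0, expand e**(-u) ~ 1 - u and integrate u**(-c)(1 - u) in closed form
    head = a ** (1 - c) / (1 - c) - a ** (2 - c) / (2 - c)
    # midpoint rule on [a, umax], where the integrand is smooth
    h = (umax - a) / cells
    body = sum((a + (i + 0.5) * h) ** (-c) * math.exp(-(a + (i + 0.5) * h))
               for i in range(cells)) * h
    return head + body

p, r = 3.0, 1.5
assert abs(frechet_moment(p, r) - math.gamma(1 - r / p)) < 1e-3   # Gamma(1 - r/p)
```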
Proof. From [9] we know that Hence, in order to prove the convergence of moments, we need to check the uniform integrability of (b_n^{-r} M^r_{n,α})_{n≥1} for each 0 < r < p. Actually it is enough to prove that for each 0 < r < p, And to establish (4.7) it is clearly sufficient to prove that for some positive constant c and some integer n_0 (possibly depending on r), By Lemma 4.3, Markov's and Doob's inequalities with q > p, It is worth noticing here that for s > 0, recalling that log n denotes the dyadic logarithm: 2^{log n} = n. Moreover, we can always choose q such that q > max(2, p) and 1 + (α − 1/2)q > 0.