IMPROVED ONE-SIDED DEVIATION INEQUALITIES UNDER REGULARITY ASSUMPTIONS FOR PRODUCT MEASURES

Abstract. This note is concerned with lower tail estimates for product measures. Improved deviation inequalities are obtained for functions satisfying certain regularity and monotonicity assumptions. The arguments are based on semigroup interpolation together with Harris's negative association inequality and hypercontractive estimates.

Introduction

It is well known that concentration of measure is an effective tool in various mathematical areas (cf. [6]). In a Gaussian setting, classical concentration results typically state that, for a Lipschitz function f : R^n → R with Lipschitz constant ‖f‖_Lip,

γ_n(|f − E_{γ_n}(f)| ≥ t) ≤ 2 e^{−t²/(2‖f‖²_Lip)}, t ≥ 0, (1.1)

with γ_n the standard Gaussian measure on R^n. Another example of concentration of measure is the Poincaré inequality satisfied by γ_n. Namely, for f ∈ L²(γ_n) smooth enough,

Var_{γ_n}(f) ≤ ∫_{R^n} |∇f|² dγ_n. (1.2)

These inequalities are sharp in general but can be suboptimal on specific examples. For instance, (1.2) only yields Var(M_n) ≤ 1 with M_n = max_{i=1,...,n} X_i, where (X_1, ..., X_n) stands for a standard Gaussian random vector in R^n, whereas it has been proven that Var(M_n) ≤ C/log n with C > 0 a numerical constant. At an exponential level, (1.1) is not satisfying either. Indeed, it is well known in Extreme Value theory (cf. [11], pp. 14-15) that M_n can be renormalized by the numerical constants a_n = √(2 log n) and b_n = a_n − (log 4π + log log n)/(2a_n), n ≥ 1, such that

a_n(M_n − b_n) → Λ_0, n → ∞,

in distribution, where Λ_0 corresponds to the Gumbel distribution with cumulative distribution function Λ_0(x) = exp(−e^{−x}), x ∈ R. Then, it is clear that the asymptotics of Λ_0 are not Gaussian but rather exponential on the right tail and double exponential on the left tail. It is now obvious that (1.1) and (1.2) lead to sub-optimal results for the function f(x) = max_{i=1,...,n} x_i. This is referred to as the Superconcentration phenomenon (cf. [7]). This kind of phenomenon occurs for different functionals of Gaussian random variables and has been studied in [5, 18-21], etc. Recently, additional convexity assumptions have been fruitfully used by Paouris and Valettas in order to improve the concentration inequality (1.1). In the context of small ball probabilities and random Dvoretzky's Theorem, these two authors improved, thanks to Ehrhard's inequality, the lower tail of any convex function in [16]. More precisely, they obtained

Theorem 1.1 (Paouris, Valettas). Let f : R^n → R be a convex function. Then the following holds:

γ_n(f ≤ m − t) ≤ (1/2) e^{−c t²/Var_{γ_n}(f)}, t ≥ 0,

where m stands for a median of f under γ_n and c > 0 is a universal constant.
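The superconcentration of the maximum can be observed numerically (an illustration added here, not part of the original argument): the Poincaré inequality (1.2) only bounds Var(M_n) by 1, since the maximum is 1-Lipschitz, while a Monte Carlo sketch exhibits the decay in n.

```python
import numpy as np

rng = np.random.default_rng(0)

def var_max_gaussian(n, samples=20_000):
    """Monte Carlo estimate of Var(max of n iid standard Gaussians)."""
    X = rng.standard_normal((samples, n))
    return X.max(axis=1).var()

# Poincare only gives Var(M_n) <= 1, but the true variance decays like C / log n.
for n in [10, 100, 500]:
    print(n, var_max_gaussian(n))
```

The printed variances decrease with n (roughly like (π²/6)/(2 log n)), far below the Poincaré bound 1.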
Remark 1.2. Of course, the improvement lies in the fact that Var_{γ_n}(f) can be much smaller than ‖f‖²_Lip, as we have just seen on the basic example of the maximum of n independent standard Gaussian random variables. Ehrhard's inequality has also been used by Valettas in [19], where he proved that (1.1) is tight if the convex function f is not superconcentrated.
Besides, the work from [16] has been used by Valettas to produce some variations of Theorem 1.1. Indeed, as a consequence of his inequality with Paouris, combined with transportation-type arguments, he obtained (cf. [19], Sect. 2.1.3) concentration inequalities for nondecreasing, convex functions in a log-concave measure setting.
A similar result to Theorem 1.1 or 1.4 has also been obtained in [23]. Instead of convexity, the author of [23] assumes that f belongs to the set H+ of smooth functions whose partial derivatives ∂_i f, i = 1,...,n, are coordinate-wise nondecreasing, and obtained the following lower deviation estimate.

Theorem 1.3 (Nguyen Tien). Let f ∈ H+. Then the following holds:

γ_n(f ≤ E_{γ_n}(f) − t) ≤ e^{−t²/(2 Var_{γ_n}(f))}, t ≥ 0. (1.4)

The purpose of this note is the following: semigroup arguments together with Harris's negative association Lemma and hypercontractive estimates will be used to obtain a deviation inequality for the lower tail of functions belonging to

F+ = {f ∈ H+ : ∂_i f ≥ 0, i = 1,...,n}.

The obtained deviation inequalities can be seen as an extension, for the lower tail, of Theorem 1.3. The cost of this extension is the larger quantity ‖∇f‖²_φ (instead of Var_μ(f)) in the exponential. As will be explained in Remark 1.5, if the measures (μ_i)_{i=1,...,n} are symmetric, one can substitute F+ by the larger set H+.
At this stage, let us also notice that there exist some functions in F+ which are not convex. In dimension 2, f is not convex if Det Hess f(x, y) < 0 for some (x, y) ∈ R² (i.e. (x, y) is a saddle point and Hess f is not positive semi-definite).
For instance, consider the real function f(x) = max(x, 0) and set h(x, y) = f(x)² + 4f(x)f(y) + f(y)² for any (x, y) ∈ R². Then h is an element of F+ which is not convex. As another example, one can consider the function f(x) = x + 1 when x ≥ 0, and f(x) = e^x otherwise.
Then, set g(x, y) = f(x)f(y). It is a simple matter to check that g ∈ F+. Besides, g is not convex on R² since Det Hess g(x, y) < 0 on R²₊. In conclusion, H+ and F+ are not subclasses of the set of convex functions. Now, let us describe in more detail our setting and state our main result.
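The first example can be checked by hand or numerically (a sanity check added here, via finite differences at the interior point (1, 1) of the positive quadrant, where h is smooth): both first partials and the mixed second derivative of h are nonnegative, yet the Hessian determinant is negative, so h is not convex.

```python
import numpy as np

def f(x):
    return np.maximum(x, 0.0)

def h(x, y):
    return f(x)**2 + 4*f(x)*f(y) + f(y)**2

eps = 1e-4
x, y = 1.0, 1.0  # interior point of the positive quadrant

# first partials (central differences): both nonnegative, so h is nondecreasing
hx = (h(x+eps, y) - h(x-eps, y)) / (2*eps)
hy = (h(x, y+eps) - h(x, y-eps)) / (2*eps)

# second partials
hxx = (h(x+eps, y) - 2*h(x, y) + h(x-eps, y)) / eps**2
hyy = (h(x, y+eps) - 2*h(x, y) + h(x, y-eps)) / eps**2
hxy = (h(x+eps, y+eps) - h(x+eps, y-eps)
       - h(x-eps, y+eps) + h(x-eps, y-eps)) / (4*eps**2)

det = hxx*hyy - hxy**2  # = 2*2 - 4^2 = -12 < 0: h is not convex at (1, 1)
```

At (1, 1) one finds hx = hy = 6, hxx = hyy = 2, hxy = 4, hence det = −12 < 0.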
Let n ≥ 1 be fixed and consider μ = μ_1 ⊗ ... ⊗ μ_n where, for any i = 1,...,n, dμ_i = e^{−V_i(x)} dx is a probability measure on B(R), the Borel σ-algebra of R, and V_i : R → R is a smooth potential. In the sequel, we will assume that, for every i = 1,...,n, there exists κ_i ∈ R such that V_i''(x) ≥ −κ_i for all x ∈ R, and we will denote κ = max_{i=1,...,n} κ_i. Now, let us recall some facts about functional inequalities and their links with the related semigroups. General references on semigroups, functional inequalities and concentration of measure are [1, 6, 13].
In our setting, dμ(x) = e^{−V(x)} dx is a probability measure on B(R^n), the Borel σ-algebra of R^n, with V(x) = Σ_{i=1}^n V_i(x_i), x ∈ R^n. It is classical that such a measure can be seen as the invariant and reversible measure of the associated diffusion operator L = Δ − ∇V · ∇. The operator L generates the Markov semigroup of operators (P_t)_{t≥0} and defines, by integration by parts, the Dirichlet form

E(f, g) = ∫_{R^n} ∇f · ∇g dμ

for smooth functions f, g on R^n. The set of functions for which the preceding expression makes sense is called the Dirichlet domain of L; we denote it by D(L). Given such a couple (L, μ), it is said to satisfy a spectral gap, or Poincaré, inequality if there is a constant λ > 0 such that for all functions f of the Dirichlet domain

λ Var_μ(f) ≤ E(f, f). (1.6)

Similarly, it satisfies a logarithmic Sobolev inequality if there exists a constant ρ > 0 such that for all functions f of the Dirichlet domain with f > 0,

ρ Ent_μ(f) ≤ (1/2) ∫_{R^n} |∇f|²/f dμ. (1.7)

One speaks of the spectral gap constant (of (L, μ)) as the largest λ > 0 for which (1.6) holds, and of the logarithmic Sobolev constant (of (L, μ)) as the best ρ > 0 for which (1.7) holds. We still use λ and ρ to designate these constants. It is classical (cf. [13]) that ρ ≤ λ.
Let (P_t)_{t≥0} be a Markov semigroup with generator L acting on a suitable class of functions on (R^n, B(R^n)). A particular feature of logarithmic Sobolev inequalities is the (equivalent, cf. [9]) hypercontractive property of the semigroup (P_t)_{t≥0}. Precisely, the logarithmic Sobolev inequality (1.7) is equivalent to saying that, whenever 1 < p < q < ∞ satisfy q − 1 ≤ (p − 1)e^{2ρt}, for all functions f in L^p(μ),

‖P_t f‖_{L^q(μ)} ≤ ‖f‖_{L^p(μ)}.

For simplicity, we say below that a probability measure μ, in this context, is hypercontractive with constant ρ.
Finally, let us also recall that an Orlicz norm ‖·‖_φ is defined as follows: given a Young function φ, set

‖f‖_φ = inf{c > 0 : ∫_{R^n} φ(|f|/c) dμ ≤ 1},

the associated Orlicz norm of a measurable function f : R^n → R. In the sequel, let φ be the Young function such that φ(x) = x²/log(e + x) for x ≥ 1 and φ(0) = 0. To ease the notation, we set ‖∇f‖²_φ as a shorthand for Σ_{i=1}^n ‖∂_i f‖²_φ where ∂_i, for any i ∈ {1,...,n}, stands for the i-th partial derivative operator. In this context, the following Theorem is our main result.
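Since c ↦ ∫ φ(|f|/c) dμ is nonincreasing in c, the Orlicz norm can be computed by bisection. The sketch below (an illustration added here, with φ taken as x²/log(e + x) extended to all x ≥ 0 for simplicity, and a finite distribution standing in for μ) is purely for concreteness.

```python
import math

def phi(x):
    # Young function as in the text: phi(x) = x^2 / log(e + x), phi(0) = 0
    # (extended here to all x >= 0 for simplicity of the illustration)
    return x*x / math.log(math.e + x) if x > 0 else 0.0

def orlicz_norm(values, probs, tol=1e-10):
    """||f||_phi = inf{c > 0 : E[phi(|f|/c)] <= 1} for a finite distribution."""
    def excess(c):
        return sum(p * phi(abs(v)/c) for v, p in zip(values, probs)) - 1.0
    lo, hi = tol, 1.0
    while excess(hi) > 0:      # grow hi until E[phi(|f|/c)] <= 1
        hi *= 2.0
    while hi - lo > tol:       # bisect: excess(lo) > 0 >= excess(hi)
        mid = 0.5*(lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return hi

# constant function f = 1: the norm solves phi(1/c) = 1
c1 = orlicz_norm([1.0], [1.0])
```

As a consistency check, the norm is positively homogeneous: doubling f doubles the computed value.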
Theorem 1.4. Within the preceding framework, assume that (μ_i)_{i=1,...,n} are hypercontractive with constant ρ. Then, for any smooth f ∈ F+, the Laplace transform of the lower tail of f is controlled in terms of ‖∇f‖²_φ. In particular, the following holds:

μ(f ≤ E_μ(f) − t) ≤ e^{−c_{ρ,λ} t²/‖∇f‖²_φ}, t ≥ 0,

where c_{ρ,λ} > 0 is a constant depending only on ρ and λ.
Remark 1.5. 1. In practice, it is classical (cf. [8]) to bound ‖∇f‖²_φ by the quantity

C Σ_{i=1}^n ‖∂_i f‖²_2 / (1 + log(‖∂_i f‖_2/‖∂_i f‖_1)),

with C > 0 a numerical constant. 2. When the standard Gaussian measure is considered, i.e. V_i(x) = x²/2 for i = 1,...,n and x ∈ R, the quantity ‖∇f‖²_φ can be replaced by the variance Var_{γ_n}(f), which is smaller. This is essentially Tien's result, Theorem 1.3. 3. When the measures (μ_i)_{i=1,...,n} are symmetric (e.g. the Gaussian measure γ_n), one can consider the set H+ instead of F+. Indeed, suppose f ∈ H+ is nonincreasing; then it is enough to perform a change of variables and consider f(−x) ∈ F+.
We want to highlight the fact that only κ ∈ R is required here; it is a mild property shared by numerous potentials such as, for example, double-well potentials on the line of the form V(x) = ax⁴ − bx², a, b > 0. The stronger strict convexity assumption V'' ≥ ρ > 0 (satisfied by the standard Gaussian measure γ_n) actually implies that μ satisfies a logarithmic Sobolev inequality, and thus hypercontractivity, with constant ρ (cf. [1]).
To better understand where the improvement lies in Theorem 1.4, let us recall some facts: for a smooth function f : R^n → R it is known (cf. the introduction of [19] and references therein) that the relevant quantities can be compared with one another, and each term can be different from the others; see for instance [5, 7, 22] in the Gaussian case. Let us mention that (1.1) has already been improved for convex functions in [2, 24] (cf. [17], Sect. 5.2). Thus, in Theorem 1.4, we obtain something slightly better. However, this bound is a priori larger (except in the Gaussian case) than the one involving Var_μ(f), which would be the desired one for every μ. Now, let us describe the organization of the article. Section 2 is concerned with semigroup facts and negative association. In Section 3 we prove Theorem 1.4. Section 4 describes some extensions. Finally, in Section 5, we say a few words about Theorem 1.3.
In the sequel, we will always assume that the functions are sufficiently integrable with respect to μ so that the studied inequalities make sense and the commutations between integrals and derivatives are legitimate. Also, by convention, C > 0 denotes a numerical constant that may change at each occurrence.

Semigroup properties
In this section, we present some tools needed to prove Theorem 1.4. In the context described in the introduction, let us collect some important properties of the semigroup (P_t)_{t≥0}. Again, for more details, the reader is referred to [1] (or [12], pp. 306-328, for a shorter exposition).
Proposition 2.1. Within the preceding framework, the following holds.
- For any smooth function f : R^n → R, the semigroup (P_t)_{t≥0} solves the heat equation associated to L.
- For any i = 1,...,n and any smooth function f : R^n → R, the uniform lower bound V_i'' ≥ −κ_i is equivalent to the following commutation property:

|∂_i P_t f| ≤ e^{κt} P_t(|∂_i f|), t ≥ 0, (2.3)

where κ = max_{i=1,...,n} κ_i.

Semigroup representation of the entropy
As it will be needed in the sequel, we state below a representation (cf. [1], Sect. 5.5 or Sect. 2.1 in [12]) of the entropy of a function along the semigroup (P_t)_{t≥0}.

Proposition 2.2. For any positive function f ∈ L¹(μ),

Ent_μ(f) = ∫_0^∞ ∫_{R^n} |∇P_t f|²/(P_t f) dμ dt. (2.5)
As it is exposed in [8], when μ satisfies a logarithmic Sobolev inequality there is no need to deal with large values of t in (2.5). Indeed, a logarithmic Sobolev inequality is equivalently stated as an exponential decay of the entropy along the semigroup. Namely,

Ent_μ(P_t f) ≤ e^{−2ρt} Ent_μ(f), t ≥ 0, (2.6)

for every positive function f in L¹(μ). Therefore, combining the preceding semigroup representation (2.5) with the exponential decay of the entropy (cf. [1], p. 244) along the semigroup, we have, for any T > 0,

Ent_μ(f) ≤ (1 − e^{−2ρT})^{−1} ∫_0^T ∫_{R^n} |∇P_t f|²/(P_t f) dμ dt. (2.7)

In the sequel, we choose e.g. T = 1/(2ρ).
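For completeness, the truncation argument behind (2.7) can be spelled out (a short derivation added here, consistent with the normalizations above): splitting the representation (2.5) at time T and bounding the tail with the exponential decay (2.6) gives

```latex
\operatorname{Ent}_\mu(f)
  = \int_0^T \!\!\int_{\mathbb{R}^n} \frac{|\nabla P_t f|^2}{P_t f}\, d\mu\, dt
    + \operatorname{Ent}_\mu(P_T f)
  \le \int_0^T \!\!\int_{\mathbb{R}^n} \frac{|\nabla P_t f|^2}{P_t f}\, d\mu\, dt
    + e^{-2\rho T}\operatorname{Ent}_\mu(f),
```

and rearranging yields (2.7). With the choice T = 1/(2ρ), the prefactor is (1 − e^{−1})^{−1} ≤ 2.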

Semigroup and Harris inequality
As mentioned earlier, in order to investigate the lower tail, one has to use a negative association inequality. Therefore, we state below Harris's Lemma (cf. [6], p. 43) and see how it can be combined with semigroups. Recall that monotonicity or convexity properties of a function f : R^n → R are understood coordinate-wise.

Proposition 2.3 (Harris's negative association inequality). Let f : R^n → R and g : R^n → R be two monotone functions with opposite monotonicity. Then

E[f(X_1,...,X_n) g(X_1,...,X_n)] ≤ E[f(X_1,...,X_n)] E[g(X_1,...,X_n)],

with X_1,...,X_n independent random variables.
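The sign in Harris's inequality is easy to observe numerically (an illustration added here; the particular choice f(x) = x_1 + x_2 nondecreasing and g(x) = e^{−x_1−x_2} nonincreasing, with Gaussian coordinates, is ours): for independent coordinates, functions with opposite monotonicity are negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(1)

# independent coordinates (any product law works; Gaussian is used here)
X = rng.standard_normal((500_000, 2))

F = X.sum(axis=1)            # nondecreasing in each coordinate
G = np.exp(-X.sum(axis=1))   # nonincreasing in each coordinate

# Harris: E[f(X) g(X)] <= E[f(X)] E[g(X)], i.e. the covariance is <= 0
cov = np.mean(F * G) - F.mean() * G.mean()
```

Here the covariance can even be computed exactly: Cov(S, e^{−S}) = −2e for S ~ N(0, 2), and the Monte Carlo estimate is close to that value.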
In the sequel, this proposition will also be used at the level of the semigroup, that is to say for the underlying heat kernel measure p_t(x, dy), which is defined (cf. [1], p. 12) through

P_t f(x) = ∫_{R^n} f(y) p_t(x, dy), t ≥ 0, x ∈ R^n.

This is the content of the following Lemma.
Lemma 2.4. Let t ≥ 0 and x ∈ R^n be fixed, and consider f and g two monotone functions with opposite monotonicity. Then

P_t(fg)(x) ≤ P_t(f)(x) P_t(g)(x).

The following Lemma explains, in our context, that the semigroup (P_t)_{t≥0} preserves the monotonicity properties of a function.
Lemma 2.5. Let f : R^n → R be monotone. Then x → P_t(f)(x), t ≥ 0, shares the same monotonicity properties as the function f.

Proof. As it is exposed in [14], in our setting we have the following representation: for any x ∈ R^n, t ≥ 0 and i = 1,...,n,

∂_i P_t f(x) = E_x[e^{−∫_0^t V_i''(X_s^i) ds} ∂_i f(X_t)], (2.9)

where (X_t)_{t≥0} stands for the diffusion process with generator L starting at x. In particular, ∂_i P_t f has the same sign as ∂_i f. Thus, x → P_t f(x) shares the same monotonicity properties as f.
1. In the Gaussian setting, for quadratic potentials, this property is obvious thanks to Mehler's formula, which gives an explicit representation of the Ornstein-Uhlenbeck semigroup:

P_t f(x) = ∫_{R^n} f(e^{−t} x + √(1 − e^{−2t}) y) dγ_n(y), t ≥ 0, x ∈ R^n. (2.10)

2. Representations such as (2.9) are part of the so-called intertwining relations between a semigroup and some differential operator (cf. [3, 4] and references therein). 3. The fact that a semigroup preserves the monotonicity of a function has also been investigated in [15].
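In the Gaussian case, the monotonicity preservation of Lemma 2.5 can be checked directly from Mehler's formula (2.10) (a sketch added here, in dimension one; common random numbers are used so that monotonicity in x holds pathwise, hence also for the Monte Carlo averages).

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal(100_000)  # shared noise across all x (common random numbers)

def ou_semigroup(f, x, t):
    """Mehler's formula: P_t f(x) = E[f(e^{-t} x + sqrt(1 - e^{-2t}) G)]."""
    return f(np.exp(-t)*x + np.sqrt(1.0 - np.exp(-2.0*t))*G).mean()

f = lambda u: np.maximum(u, 0.0)   # a nondecreasing function
xs = np.linspace(-3.0, 3.0, 25)
vals = np.array([ou_semigroup(f, x, t=0.7) for x in xs])
# P_t f inherits the (nondecreasing) monotonicity of f
```

Since every sample path x → f(e^{−t}x + c G_k) is nondecreasing, the averaged values are exactly nondecreasing along the grid.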

Study of the lower tail -Proof of Theorem 1.4
Recall that the measures (μ_i)_{i=1,...,n} are assumed to be hypercontractive with constant ρ. In this section, we prove Theorem 1.4 thanks to Lemmas 2.4 and 2.5.
Proof. Let f ∈ F+. Then, start with the representation formula (2.7), applied to e^{−f/2}, and use the commutation property (2.3). Notice that, for any i ∈ {1,...,n}, ∂_i f and e^{−f} are monotone with opposite monotonicity. Therefore, by Lemma 2.5, this is also the case for P_t(∂_i f) and P_t(e^{−f}). Then, by applying Lemma 2.4 twice, we get the desired estimate, where in the last upper bound we used that μ is the invariant measure of (P_t)_{t≥0}; namely,

∫_{R^n} P_t(h) dμ = ∫_{R^n} h dμ, t ≥ 0,

for any smooth function h : R^n → R.
Finally, in the preceding inequality, the last factor can be upper bounded by hypercontractive arguments. To this end, we follow the proof of Talagrand's inequalities exposed in [8] (pages 8-9). The deviation inequality is then classically obtained by applying the resulting bound to e^{−θf} with θ ≥ 0.

Remark 3.1. Let us notice that the preceding scheme of proof can also be carried out at the level of the variance with the dynamical representation (used in [8])

Var_μ(f) = 2 ∫_0^∞ ∫_{R^n} |∇P_t f|² dμ dt.

Furthermore, when μ = γ_n, one can choose T = +∞. Then, thanks to the exact commutation property (2.4) between ∇ and (P_t)_{t≥0}, together with the preceding dynamical representation of the variance, we obtain the corresponding improvement.

Some extensions
Let us say a few words about some potential extensions. As it was emphasized in [8], one of the key features of the preceding methodology is the following. Given a Markov semigroup (P_t)_{t≥0} with generator L and invariant measure μ, assume that (L, μ) is hypercontractive and that the associated Dirichlet form E may be decomposed along directions Γ_i, acting on functions on some state space E, as

E(f, f) = Σ_{i=1}^n ∫_E Γ_i(f)² dμ,

in a way that, for each i = 1,...,n, Γ_i commutes with (P_t)_{t≥0} in the sense that, for some constant κ ∈ R, every t ≥ 0 and f smooth enough,

|Γ_i(P_t f)| ≤ e^{κt} P_t(|Γ_i f|). (4.1)

In the current article, this commutation property is obtained as a strong gradient bound from Bakry and Émery's Γ₂ criterion and is stated in (2.3).
According to [1, 20], the commutation property (4.1) is satisfied with κ = −1. Now, observe that the operators Γ_i, i = 1,...,n, preserve the key features of the function f. More precisely, assume f ∈ F+; then it is easy to check that Γ_i(f) and e^{−f} are monotone with opposite monotonicity. Besides, the identity Γ_i(e^{θf}) = θ Γ_i(f) e^{θf} holds for any θ ∈ R. Therefore, it is possible to apply Harris's negative association inequality (Proposition 2.3) in this situation. Indeed, in this setting, it is then easy to slightly extend the result of the current article: following the lines of the proof of our main result, one obtains an analogous deviation inequality. Notice also, according to [8], that hypercontractive estimates also yield an upper bound of the same flavour, with C > 0 a numerical constant. It is obvious that the same proof holds at the level of the variance. This bound has to be compared with Proposition 2.20 (for k = 1) in [16]. As exposed in [8], non-product measures can also be investigated. For instance, if μ stands for the uniform probability measure on the sphere S^{n−1}, one may consider the decomposition along the directions D_ij = x_i ∂_j − x_j ∂_i, i, j = 1,...,n. The operators D_ij commute in an essential way with the spherical Laplacian Δ = (1/2) Σ_{i,j=1}^n D²_ij, so that (4.1) holds with κ = 0. However, the monotonicity properties needed in the proof (in order to apply Harris's negative association inequality) seem to be difficult to characterize.

Some remarks about Theorem 1.3
We briefly want to highlight the fact that the arguments used in [23] can easily be expressed in terms of semigroup arguments. As we focus on the Gaussian case, notice that (P_t)_{t≥0} stands for the Ornstein-Uhlenbeck semigroup. This reformulation gives a shorter proof, as we will show in the sequel. Unfortunately, the strategy presented below relies on an exact commutation and cannot be extended to the measure μ.
Following [23], introduce the operator T_g defined as follows:

T_g = ∫_0^∞ e^{−t} ∇f(X) · P_t(∇g)(X) dt,

where f : R^n → R is fixed, g : R^n → R is centered under γ_n and L(X) = γ_n.
Lemma 5.1. With the preceding notation, for any θ ≥ 0, the following holds.

Proof. Since g is centered under γ_n and by ergodicity (2.2) of (P_t)_{t≥0}, we have

g = −∫_0^∞ L P_t(g) dt.

Thus, we conclude by the fundamental theorem of calculus.

Remark 5.2. The use of the operator T_g was the main idea of the article [23]; we state it in a slightly different way, which avoids a lot of calculus. For further purposes, observe that E_{γ_n}[T_g] = Cov_{γ_n}(f, g). In particular, E_{γ_n}[T_f] = Var_{γ_n}(f). As in [23], the proof of Theorem 1.3 relies on Lemma 5.1. To stay as close as possible to the original proof, consider g ∈ F+ and set f = −g. To conclude, it is enough to show that E_{γ_n}[e^{θ(f−m)}(T_f − Var_{γ_n}(f))] ≤ 0. Indeed, if it is the case, we have ψ'(θ) ≤ θ Var_{γ_n}(f) ψ(θ).
Once integrated, this differential inequality yields ψ(θ) ≤ e^{θ² Var_{γ_n}(f)/2}, θ ≥ 0, from which Theorem 1.3 classically follows. Besides, for any i = 1,...,n, ∂_i f · P_t(∂_i f) = ∂_i g · P_t(∂_i g) is nonnegative and coordinate-wise nondecreasing by hypothesis on f, so that T_f is nondecreasing while e^{θ(f−m)} is nonincreasing. Thus, by Harris's inequality, E_{γ_n}[e^{θ(f−m)}(T_f − Var_{γ_n}(f))] ≤ 0 and the proof is complete.
Remark 5.3. Theorem 1.3 implicitly uses a covariance identity (through the operator T_f). Similar identities have been used in [10] for infinitely divisible random vectors having finite exponential moments. In particular, sharp deviation inequalities were obtained. We wonder if Theorem 1.4 can be extended to this level of generality.