Elementary coupling approach for non-linear perturbation of Markov processes with mean-field jump mechanisms and related problems

Mean-field integro-differential equations are studied in an abstract framework, through couplings of the corresponding stochastic processes. In the perturbative regime, the equation is proven to admit a unique equilibrium, toward which the process converges exponentially fast. Similarly, in this case, the associated particle system is proven to converge toward its equilibrium at a rate independent of the number of particles.


Introduction
The initial motivation of this work is the study of the long-time behaviour of mean-field semi-linear integro-differential equations of the form

∂_t m_t(x) = L m_t(x) + Q_{m_t}(λ_{m_t} m_t)(x) − λ_{m_t}(x) m_t(x), (1.1)

The non-linear process
Let E be a Polish space, (Ω, F, P) be a probability space on which all the random variables in this work will be implicitly defined, and let L be the infinitesimal generator of a conservative Feller semi-group (P t ) t≥0 on E. Denote P(E) the set of probability measures on E and D(E) the set of càdlàg functions from R + to E, endowed with the Skorokhod topology. From Chapter 17 of [33], for every initial condition x ∈ E, there exists an associated strong Markov process (Z x t ) t≥0 on D(E) such that P t f (x) = E (f (Z x t )) for all t ≥ 0 and f ∈ C 0 (E). In order to avoid regularity or well-posedness considerations, we will work at the level of the stochastic process rather than at the level of the generator.
For all ν ∈ P(E), let λ ν be a measurable function from E to R + , and let Q ν be a Markov kernel on E, namely a measurable function from E to P(E). Throughout this work, if Q : E → P(E) is a Markov kernel, we also denote by Q the Markov operator defined by Qf (x) = (Q(x))(f ) for all x ∈ E and all bounded measurable functions f . We call λ ν the non-linear jump rate, and Q ν the non-linear jump kernel.
Let G ν : E × [0, 1] → E be a representation of Q ν , namely be such that, if U is a uniformly distributed random variable (r.v.) on [0, 1], then, for all x ∈ E, G ν (x, U ) is distributed according to Q ν (x). From Corollary 7.16.1 of [3], such a representation always exists. Similarly, let H : (x, u) ∈ E × [0, 1] → (H t (x, u)) t≥0 ∈ D(E) be a representation of the kernel x → Law ((Z x t ) t≥0 ).
The assumption that the jump rate is bounded uniformly, both in x ∈ E and ν ∈ P(E), is made for simplicity. Although it already holds in many interesting applications, it may sometimes be too restrictive; in many such cases it can be circumvented via proper a priori bounds (see e.g. Sect. 4.6) specific to the problem at hand.
Let M(R + , P(E)) be the set of measurable functions from R + to P(E). Under Assumption 2.1, for x ∈ E and µ : t → µ t in M(R + , P(E)), we want to define an inhomogeneous Markov process (X µ,x t ) t≥0 starting from x which, loosely speaking, follows the Markov dynamics of the semi-group (P t ) t≥0 between random jump times, drawn at rate t → λ µt (X µ,x t ), at which it jumps to a new value drawn according to the distribution Q µt (X µ,x t ). More precisely, consider an i.i.d. sequence (S k , U k , V k , W k ) k∈N where, for all k ∈ N, S k , U k , V k and W k are mutually independent, S k follows an exponential law with parameter 1, and U k , V k and W k follow a uniform distribution over [0, 1]. Set T 0 = 0, X µ,x 0 = x and suppose that T n ≥ 0 and (X µ,x t ) t∈[0,Tn] have been defined for some n ∈ N and are independent from (S k , U k , V k , W k ) k≥n . Set T n+1 = T n + S n /λ * .
By induction, (X µ,x t ) t∈[0,Tn] is thus defined for all n ∈ N and, since T n is the n-th jump time of a Poisson process, it goes almost surely to infinity as n goes to infinity, so that X µ,x t is almost surely defined for all t ≥ 0. Denote by (R µ s,t ) t≥s≥0 the inhomogeneous Markov semi-group defined by R µ s,t f (x) = E(f (X σsµ,x t−s )) for all bounded measurable functions f , all t ≥ s ≥ 0 and all x ∈ E, where σ s is the shift of time s defined by (σ s µ) t = µ s+t for all (µ t ) t≥0 and all t ≥ 0. Note that, at least formally for suitable functions f 0 , f t = R µ 0,t f 0 is a solution of (2.1).

Definition 2.2. We say that m ∈ M(R + , P(E)) is a solution of (1.1) with initial distribution m 0 if m 0 R m 0,t = m t for all t ≥ 0.
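The construction above is exactly simulation by uniformization (Poisson thinning): a clock rings at the constant rate λ*, and each ring is accepted as a real jump with probability λ_{µ_t}(x)/λ*, otherwise it is a phantom jump. A minimal illustrative sketch, where the state space, rate and kernel are our own toy choices (not objects from the text) and the linear part is taken trivial (L = 0, the state is frozen between jumps):

```python
import random

def simulate_jump_process(x0, mu, lam, G, lam_star, t_max, rng):
    """Simulate the time-inhomogeneous jump process X^{mu,x} by uniformization:
    propose jump times at the constant rate lam_star (T_{n+1} = T_n + S_n/lam_star),
    accept a proposal at time t with probability lam(mu(t), x)/lam_star, and
    jump to G(mu(t), x, u) upon acceptance; rejected proposals are 'phantom
    jumps' leaving the state unchanged. Linear part taken trivial (L = 0)."""
    t, x = 0.0, x0
    path = [(0.0, x0)]
    while True:
        t += rng.expovariate(lam_star)           # next proposed jump time
        if t > t_max:
            return path
        if rng.random() <= lam(mu(t), x) / lam_star:   # thinning: accept?
            x = G(mu(t), x, rng.random())
        path.append((t, x))                      # phantom jumps recorded too

# Toy instantiation (ours): E = {0,1}, rate lam_nu(x) = 1 + nu({1}) <= lam_star = 2,
# frozen measure flow, and a jump flips the state.
rng = random.Random(0)
mu = lambda t: {0: 0.5, 1: 0.5}
lam = lambda nu, x: 1.0 + nu[1]
flip = lambda nu, x, u: 1 - x
path = simulate_jump_process(0, mu, lam, flip, 2.0, 10.0, rng)
```

With the identity kernel in place of `flip`, every accepted jump is also a phantom jump, so the trajectory stays constant, which is a convenient sanity check of the thinning logic.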
Note that Definition 2.2 gives solutions in a quite weak sense but, in many cases, additional information on L allows one to make precise the meaning of (2.1) and thus to obtain, by uniqueness of the solution, that a solution in the sense of Definition 2.2 is in fact a strong solution of (1.1).
For ν 1 , ν 2 ∈ P(E), we denote their total variation distance ‖ν 1 − ν 2 ‖ T V .

Assumption 2.3. The function ν → λ ν (Q ν − Id) is Lipschitz for the total variation norm, in the sense that there exists θ > 0 such that, for all ν 1 , ν 2 ∈ P(E),

Example. Let us consider a simple process in order to illustrate the assumptions of this section: a birth and death Markov chain on E = N. The linear Markov part is given by, for some functions B, D : N² → R + with D(0, k) = 0 for all k ∈ N and for every probability measure ν on N. Then we set, with the convention Q ν (x) = δ x if λ ν (x) = 0. In this example, Assumption 2.1 holds if B and D are bounded, and, for a function f with ‖f‖ ∞ ≤ 1 and ν 1 , ν 2 two probability measures on N, which means that Assumption 2.3 also holds in this case. This will be proven in Section 3.2.2 through a fixed point procedure. As far as the long time behaviour of the process is concerned, let us first state a simple result:

Assumption 2.5. There exist t 0 , α ≥ 0 such that, for all x, y ∈ E, ‖δ x P t0 − δ y P t0 ‖ T V ≤ 2(1 − α).
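Since the displayed formulas of this example are not reproduced above, here is one plausible instantiation of the mean-field mechanisms, consistent with the surrounding text but an assumption of ours: the jump rate averages the interaction rates B and D against ν, and the kernel moves one step up or down, with Q_ν(x) = δ_x when λ_ν(x) = 0.

```python
def lam_nu(B, D, nu, x):
    """Assumed mean-field jump rate: lam_nu(x) = sum_k (B(x,k) + D(x,k)) nu({k})."""
    return sum((B(x, k) + D(x, k)) * p for k, p in nu.items())

def Q_nu(B, D, nu, x):
    """Assumed mean-field jump kernel on N: one step up with the averaged birth
    rate, one step down with the averaged death rate, renormalized, with the
    convention Q_nu(x) = delta_x if lam_nu(x) = 0."""
    b = sum(B(x, k) * p for k, p in nu.items())
    d = sum(D(x, k) * p for k, p in nu.items())
    tot = b + d
    if tot == 0.0:
        return {x: 1.0}
    q = {}
    if b > 0.0:
        q[x + 1] = b / tot
    if d > 0.0:
        q[x - 1] = d / tot
    return q

# Bounded rates with D(0, k) = 0, so Assumption 2.1 holds with lam_star = 2:
B = lambda x, k: 1.0 / (1.0 + k)
D = lambda x, k: 1.0 if x > 0 else 0.0
nu = {0: 0.3, 1: 0.5, 4: 0.2}
```

With ‖B‖∞, ‖D‖∞ ≤ 1, the uniform bound λ_ν(x) ≤ 2 of Assumption 2.1 is immediate from the averaging form of λ_ν.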
Example. In the previous example, this is true for instance if there exists x 0 ∈ N such that b(x) + b ν (x) = 0 for all x ≥ x 0 and all ν, in which case we can restrict the chain to the finite state space ⟦0, x 0 ⟧. If b(x) > 0 for all x < x 0 and d(x) > 0 for all x > 0, then the Markov chain on ⟦0, x 0 ⟧ with generator (2.2) is irreducible on a finite state space, hence satisfies Assumption 2.5.
Theorem 2.6. Under Assumptions 2.1, 2.3 and 2.5, if t → m t , h t are solutions of (1.1) with respective initial distributions m 0 , h 0 ∈ P(E), then, for all t ≥ 0, An equivalent result is proved in [10], for a particular model, via the Duhamel formula. We will instead establish it (in Sect. 3.2.2) from a similar but probabilistic point of view. This proof will introduce, in a simpler context, most of the ideas then used in the more general proof of Theorem 2.9 below.
The uniform Doeblin condition of Assumption 2.5 is quite demanding, in particular if E is not compact. The classical Foster-Lyapunov approach [30, 41] for Markov processes usually combines a local Doeblin condition on compact sets with the existence of a Lyapunov function that tends to decrease on average along a trajectory, away from some compact set. The following counterpart of this strategy for non-linear perturbations seems to be new.
Lyapunov conditions. For all µ ∈ M(R + , P(E)), x ∈ E, t ≥ 0, and almost all u ∈ [0, 1], Note that (2.3) and (2.4) can usually be established if, for all ν ∈ P V , Assume that i.e. for large populations the death rate is predominant, uniformly in the non-linear birth rate. Let a, ρ > 0 be small enough and x 0 ∈ N be large enough so that Then, for all ν ∈ P V and x ≥ x 0 , from which the first part of (2.8) holds with η = 0 and The second part of (2.8) corresponds to the same computation in the case B = D = 0, hence it holds with ρ * = −ρ. As a conclusion, in this example, Assumption 2.7 holds as soon as the rates B and D are bounded and c > 0. Besides, these conditions do not imply the stronger Assumption 2.5, which would require that the process with generator L comes down from infinity in finite time, which is for instance not the case if b and d are bounded (in which case it is easily checked that, at any fixed time t 0 , the probability that a process starting at x has reached a position lower than x/2 goes to 0 as x → ∞, which prevents Assumption 2.5 from holding with α > 0 uniformly in x, y).
Theorem 2.8. Under Assumptions 2.1 and 2.7, for all m 0 ∈ P V , there exists a unique solution t → m t of (1.1) with initial condition m 0 . Moreover, for all t ≥ 0,

Theorem 2.9. Under Assumptions 2.1 and 2.7, if α > 0, denote , and for all m 0 ∈ P V (E) and all t ≥ 0, Remark that, for fixed values of the other parameters of Assumptions 2.1 and 2.7, κ̄ < 1 if θ is small enough.

Example. In the previous example, let θ = 2(‖B‖ ∞ + ‖D‖ ∞ ). It is readily checked that all the estimates obtained in this example to establish Assumption 2.7 are uniform for small θ. As a conclusion, for fixed rates b, d satisfying b(x) > 0 for all x ∈ N, d(x) > 0 for all x > 0 and lim sup(b − d) < 0, there is an explicit θ 0 ≥ 0 such that Theorem 2.9 holds with κ̄ < 1 as soon as B and D are such that θ ≤ θ 0 .
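The exponential contraction of two solutions of (1.1), as in Theorems 2.6 and 2.9, can be observed numerically on a truncated version of the birth-death example. The concrete rates, the truncation at K, and the Euler scheme below are all our own choices for illustration; mass is conserved exactly by construction of the fluxes.

```python
import numpy as np

K = 25  # truncation of N at {0,...,K} (ours, for numerical illustration only)

def rhs(m, b, d, Bmat, Dmat):
    """Right-hand side of the non-linear equation on {0,...,K}: linear
    birth-death flow with rates b(x), d(x), plus a mean-field part with
    averaged rates (B m)(x), (D m)(x). Up/down fluxes conserve total mass."""
    up = b + Bmat @ m        # total upward jump rate at each site
    down = d + Dmat @ m      # total downward jump rate
    up[K] = 0.0              # reflecting truncation at K
    down[0] = 0.0            # no death at 0 (D(0,.) = d(0) = 0)
    out = -(up + down) * m
    out[1:] += up[:-1] * m[:-1]
    out[:-1] += down[1:] * m[1:]
    return out

x = np.arange(K + 1, dtype=float)
b = np.ones(K + 1)                        # linear birth rate b(x) = 1
d = 0.3 * x                               # linear death rate d(x) = 0.3 x
Bmat = 0.05 * np.ones((K + 1, K + 1))     # small mean-field perturbation (theta small)
Dmat = 0.05 * np.ones((K + 1, K + 1))
Dmat[0, :] = 0.0

def evolve(m, T, dt):
    for _ in range(int(T / dt)):
        m = m + dt * rhs(m, b, d, Bmat, Dmat)
    return m

m = np.zeros(K + 1); m[0] = 1.0           # solution started from delta_0
h = np.zeros(K + 1); h[K] = 1.0           # solution started from delta_K
tv0 = np.abs(m - h).sum()                 # = 2 (TV normalized to [0,2] as in the text)
mT, hT = evolve(m, 12.0, 0.005), evolve(h, 12.0, 0.005)
tvT = np.abs(mT - hT).sum()
```

Starting from the two extreme Dirac masses, the total variation distance between the two solutions shrinks well below its initial value 2, in line with the exponential contraction asserted by the theorems.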

Weakly interacting Markov particles
Let N ∈ N and, for i ∈ ⟦1, N⟧, let E i be a Polish space and (P i t ) t≥0 be a conservative Feller semi-group on E i . Define, for all x ∈ E, t ≥ 0 and all measurable bounded functions, where, for all z ∈ E i , (Z z,i t ) t≥0 is a Markov process associated to (P i t ) t≥0 with initial value z. In particular, if (U 1 , . . . , U N ) are i.i.d. r.v. uniformly distributed over [0, 1] and x ∈ E, then (H i t (x i , U i )) i∈⟦1,N⟧, t≥0 is a Markov process on E associated to (P t ) t≥0 with initial value x. For all i ∈ ⟦1, N⟧, let λ i be a jump rate on E, i.e. a measurable function from E to R + , and Q i be a measurable function from E to P(E i ) with representation G i . We extend Q i as a Markov kernel Q i on E defined for all x ∈ E by

Assumption 2.10. There exists λ * > 0 such that, for all x ∈ E and all i ∈ ⟦1, N⟧,

Under Assumption 2.10, let Q be the Markov kernel on E defined by for all x ∈ E, and let G be a representation of Q. Jumping from a position x to a r.v. drawn according to Q(x) is equivalent to choosing a coordinate i uniformly in ⟦1, N⟧ and, with probability λ i (x)/λ * , making it jump to a new position drawn according to Q i (x), else leaving it at its current position, and in either case leaving all the other coordinates unchanged. Let x ∈ E. We define a Markov process (X x t ) t≥0 = (X x i,t ) i∈⟦1,N⟧, t≥0 on E as follows. Consider an i.i.d. sequence (S k , (U i,k ) i∈⟦1,N⟧ , V k ) k∈N where, for all k ∈ N, S k , V k and (U i,k ) i∈⟦1,N⟧ are mutually independent, S k follows an exponential law with parameter 1, V k a uniform distribution over [0, 1], and (U i,k ) i∈⟦1,N⟧ are i.i.d. r.v. uniformly distributed over [0, 1]. Set T 0 = 0, X x 0 = x and, for all n ∈ N, T n+1 = T n + (N λ * ) −1 S n . Suppose that (X x t ) t∈[0,Tn] has been defined for some n ∈ N and is independent from (S k , (U i,k ) i∈⟦1,N⟧ , V k ) k≥n . For all t ∈ (T n , T n+1 ) and all i ∈ ⟦1, N⟧, set X x i,t = H i t−Tn (X x i,Tn , U i,n ).
Finally, set By induction, (X x t ) t∈[0,Tn] is thus defined for all n ∈ N and, since T n is the n-th jump time of a Poisson process, it goes almost surely to +∞ as n goes to infinity, so that X x t is almost surely defined for all t ≥ 0. Let (R t ) t≥0 be the associated Markov semi-group, i.e.
for all bounded measurable functions f on E.
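The jump mechanism just described (a global Poisson clock at rate Nλ*, a uniformly chosen coordinate, and acceptance with probability λ_i(x)/λ*) can be sketched as follows. The flip dynamics on {0,1}^N at the end is a toy instantiation of ours, not taken from the text, and the linear part is again taken trivial:

```python
import random

def simulate_particles(x0, lam_i, G_i, lam_star, t_max, rng):
    """Mean-field particle system: a global clock rings at rate N*lam_star
    (T_{n+1} = T_n + S_n/(N*lam_star)); at each ring a coordinate i is drawn
    uniformly and, with probability lam_i(x, i)/lam_star, particle i jumps to
    G_i(x, i, u); otherwise nothing happens (phantom jump). Between rings the
    linear part is taken trivial (frozen particles, L = 0)."""
    x = list(x0)
    N = len(x)
    t = 0.0
    while True:
        t += rng.expovariate(N * lam_star)
        if t > t_max:
            return x
        i = rng.randrange(N)                       # uniformly chosen coordinate
        if rng.random() <= lam_i(x, i) / lam_star: # thinning: accept the jump?
            x[i] = G_i(x, i, rng.random())

# Toy mean-field flip dynamics on {0,1}^N (our choice for illustration):
# each particle flips at rate 1 + (empirical mean), bounded by lam_star = 2.
rng = random.Random(0)
lam_i = lambda x, i: 1.0 + sum(x) / len(x)
flip_i = lambda x, i, u: 1 - x[i]
state = simulate_particles([0] * 50, lam_i, flip_i, 2.0, 5.0, rng)
```

As in the non-linear case, taking λ_i ≡ λ* with the identity kernel makes every ring a phantom jump and leaves the configuration unchanged, a quick consistency check of the construction.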
Example. Let us briefly introduce the particle system which is the analogue of the running example of the previous section, when the non-linearity is replaced by a mean-field interaction. The Markov part is independent of the particle, i.e. P i t = P t for all i, and it is the semi-group associated to the generator given by (2.2). Given B, D as in the previous section, set Then the interacting jump mechanisms are given by Then all the assumptions stated in the rest of this section can be proven, under suitable conditions on b, d, B, D, similarly to the non-linear case of the previous section. The computations being exactly the same, we will not repeat this discussion.
Consider on E the ℓ 0 distance d̄ 1 defined by for all x, y ∈ E, where ♯A denotes the cardinality of a finite set A. Remark that the topology induced by d̄ 1 is equivalent to the trivial one, with more precisely

Assumption 2.11. There exist θ, t 0 , α ≥ 0 such that, for all i ∈ ⟦1, N⟧, the following holds.
Lipschitz condition. For all x, y ∈ E with y i = x i , Here is the analogue of Theorem 2.6 for interacting particles.
Theorem 2.12. Under Assumptions 2.10 and 2.11, for all m 0 , h 0 ∈ P(E) and t ≥ 0, Moreover, denoting m t and h t the respective laws of X I,t and Y I,t , where X t ∼ m 0 R t , Y t ∼ h 0 R t and I is uniformly distributed over ⟦1, N⟧ and independent of X t and Y t , then

Note that, using a naive approach (trying to merge at time t 0 two whole systems of particles), it is easy to prove that the condition (2.12), together with Assumption 2.10, implies the uniform Doeblin condition, but then the contraction rate is geometrically poor with respect to N . This is why the Foster-Lyapunov approach is sometimes thought to scale badly with dimension. Let us show how Theorem 2.12 may apply in the case of a mean-field particle system associated to a non-linear process as defined in Section 2.1. Suppose that E i = E 1 , P i t = P 1 t and for all i ∈ ⟦1, N⟧ and x ∈ E = E N 1 , where λ ν and Q ν are given in Section 2.1 and is the empirical distribution of x. Remark that, for x, y ∈ E, Indeed, (x I , y I ), with I a r.v. uniformly distributed over ⟦1, N⟧, is a coupling of µ N (x) and µ N (y) with P(x I ≠ y I ) = d̄ 1 (x, y)/N . Hence, Assumptions 2.1, 2.3 and 2.5 imply Assumptions 2.10 and 2.11, with the same λ * , θ, t 0 , α. If the initial positions (X i,0 ) i∈⟦1,N⟧ are i.i.d. r.v. with a given law m 0 ∈ P(E 1 ), then the particles are exchangeable, in the sense that, for all t ≥ 0, (X σ(i),t ) i∈⟦1,N⟧ and (X i,t ) i∈⟦1,N⟧ have the same distribution for every permutation σ of ⟦1, N⟧. In particular, if I is uniformly distributed over ⟦1, N⟧ and independent of (X t ) t≥0 , then X I,t and X 1,t have the same law. This means that, under Assumptions 2.1, 2.3 and 2.5, the bound holds with m t and h t either solutions of the non-linear equation (1.1) or the laws of the first particle of the associated system. We go back to the general case.
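The coupling (x_I, y_I) of the two empirical measures used in this argument is easy to check numerically: with total variation normalized to take values in [0, 2] as elsewhere in the text, the coupling inequality gives ‖µ N (x) − µ N (y)‖ T V ≤ 2 ♯{i : x i ≠ y i }/N. A small self-contained check on arbitrary configurations:

```python
def empirical(x):
    """Empirical distribution mu_N(x) of a configuration x, as a dict of weights."""
    m = {}
    for xi in x:
        m[xi] = m.get(xi, 0.0) + 1.0 / len(x)
    return m

def tv(mu, nu):
    """Total variation distance, normalized to [0, 2] as in the text:
    ||mu - nu||_TV = sum_z |mu(z) - nu(z)| for discrete measures."""
    return sum(abs(mu.get(z, 0.0) - nu.get(z, 0.0)) for z in set(mu) | set(nu))

x = [0, 1, 1, 2, 3, 3, 3, 4]
y = [0, 1, 5, 2, 3, 6, 3, 4]          # differs from x in coordinates 2 and 5
n_diff = sum(1 for a, b in zip(x, y) if a != b)
bound = 2.0 * n_diff / len(x)
```

Here the bound is attained (the differing coordinates carry disjoint values), illustrating that the coupling (x_I, y_I) is optimal in such configurations.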
In order to relax the strong uniform Doeblin condition (2.12) under the assumption that the process admits a suitable Lyapunov function, we now state an analogue of Theorem 2.9 for the particle system.

Assumption 2.13. There exist θ ≥ 0, ρ, t 0 > 0, ρ * ∈ R, α ∈ (0, 1], η ∈ [1/2, 1), M, γ * ≥ 1 and, for all i ∈ ⟦1, N⟧, a measurable V i : , such that the following holds for all i ∈ ⟦1, N⟧. Lyapunov conditions. For all x ∈ E, t ≥ 0, and almost all u ∈ [0, 1], Under Assumption 2.13, denote d̄ V and ρ̄ the distances, respectively on E and P V (E), defined for all x, y ∈ E by d̄ and for all µ, ν ∈ P V (E) by ρ̄ (see Sect. 3.1 for a more detailed definition of couplings). Remark that ‖µ − ν‖ T V ≤ ρ̄(µ, ν) and that ‖µ − ν‖ V ≤ N ρ̄(µ, ν). Note also that, in the case of a particle system associated to a mean-field equation, i.e. if the particles are exchangeable and V i = V j for all i, j ∈ ⟦1, N⟧, denoting µ̄ and ν̄ the laws of X 1 and Y 1 when X ∼ µ and Y ∼ ν (hence of X I and Y I with an independent I uniform over ⟦1, N⟧), then ‖µ̄ − ν̄‖ V 1 ≤ ρ̄(µ, ν)/N .

Theorem 2.14. Under Assumptions 2.10 and 2.13, denote Suppose that κ̄ < 1. Then (R t ) t≥0 admits a unique invariant measure µ ∞ and, for all m 0 ∈ P V (E) and all t ≥ 0, ρ̄

Note that, in the case of a system of interacting particles associated to a mean-field equation, the assumptions of Theorem 2.14 are slightly stronger than those of Theorem 2.9. Indeed, the right-hand side of (2.16) involves d̄ 1 instead of d̄ V , which would be the analogue of (2.6). In other words, in this mean-field case, the assumptions of Theorem 2.14 are equivalent to the assumptions of Theorem 2.9 with the addition of Assumption 2.3. The interesting question of weakening the condition (2.16) in Theorem 2.14 raises non-trivial considerations about the non-independence of V i (X i,t ) for different i ∈ ⟦1, N⟧, and is postponed to a future work.

General considerations on couplings
Let us first give an alternative representation of the V-norm, for a given measurable function V from E to Remark that it induces the trivial topology on E, since d V (x, y) ≥ 2 · 1 x≠y for all x, y ∈ E. Hairer and Mattingly proved in [30] that where, for all measurable ϕ from E to R, Let us now give a coupling interpretation of ‖ · ‖ V . For µ 1 , µ 2 ∈ P(E), we denote ξ(µ 1 , µ 2 ) the set of transference plans between µ 1 and µ 2 , i.e. the set of probability measures ν on E² such that (X, Y ) ∼ ν implies X ∼ µ 1 and Y ∼ µ 2 . Such a random variable (X, Y ) is called a coupling of µ 1 and µ 2 . Alternatively, in that case, if Z i ∼ µ i for i = 1, 2, then we also say that (X, Y ) is a coupling of Z 1 and Z 2 .
Proof. Note that, according to (3.1), ‖ · ‖ V is the Wasserstein-1 distance on P(E) associated to the distance d V on E, so that the first statement of the lemma stems from the general duality result for Wasserstein distances. Nevertheless, in the general case, the coupling would depend on the metric. Let us then give an elementary proof in the present specific case, straightforwardly adapted from the classical case of the total variation distance.
In other words, the measure ν̃ on E defined by ν̃(dx) = ∫ E 1 x=y ν(dx, dy) satisfies ν̃ ≤ µ 1 ∧ µ 2 . Therefore, writing 1 x≠y = 1 − 1 x=y , It now remains to construct an optimal coupling, i.e. a measure ν ∈ ξ(µ 1 , µ 2 ) such that equality holds. Let p = (µ 1 ∧ µ 2 )(E). If p = 1 then µ 1 = µ 2 and the optimal coupling is to draw X according to µ 1 and then set Y = X. If p = 0 then µ 1 and µ 2 are mutually singular, and all couplings are optimal since If p ∈ (0, 1), we consider the probability measures Remark that ν 1 and ν 2 are mutually singular, and that Conditioning on the values of B, which concludes.
This representation of ‖ · ‖ V is convenient since, as soon as we are able to construct a coupling of two probability measures, we get an upper bound on their distance.
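The optimal coupling constructed in this proof can be sketched for discrete measures; the three cases p = 1, p = 0 and p ∈ (0, 1) collapse into one sampling rule (draw a common value from the normalized overlap with probability p, otherwise draw independently from the normalized residuals). A sketch, with V ≡ 1, so that P(X ≠ Y) = ‖µ 1 − µ 2 ‖ T V /2 in the normalization of the text:

```python
import random

def draw(weights, total, rng):
    """Inverse-CDF draw from the (sub-)probability weights, normalized by total."""
    u = rng.random() * total
    acc = 0.0
    for z, w in weights.items():
        acc += w
        if u < acc:
            return z
    return z  # guard against floating-point round-off

def optimal_coupling(mu1, mu2, rng):
    """Sample (X, Y) with X ~ mu1, Y ~ mu2 and
    P(X != Y) = 1 - (mu1 ^ mu2)(E) = ||mu1 - mu2||_TV / 2
    (TV normalized to [0, 2] as in the text)."""
    keys = set(mu1) | set(mu2)
    overlap = {z: min(mu1.get(z, 0.0), mu2.get(z, 0.0)) for z in keys}
    p = sum(overlap.values())
    if rng.random() < p:                      # synchronous draw from mu1 ^ mu2
        z = draw(overlap, p, rng)
        return z, z
    res1 = {z: mu1.get(z, 0.0) - overlap[z] for z in keys}   # mutually singular
    res2 = {z: mu2.get(z, 0.0) - overlap[z] for z in keys}   # residuals
    return draw(res1, 1.0 - p, rng), draw(res2, 1.0 - p, rng)

rng = random.Random(0)
mu1 = {0: 0.5, 1: 0.5}
mu2 = {1: 0.5, 2: 0.5}                        # overlap mass p = 0.5
n = 40000
samples = [optimal_coupling(mu1, mu2, rng) for _ in range(n)]
mismatch = sum(a != b for a, b in samples) / n
```

Here ‖µ 1 − µ 2 ‖ T V = 1, so the empirical mismatch frequency should be close to 1/2, and each marginal should be recovered exactly.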

and moreover, We call such a process an optimal coupling of (δ x P t ) t≥0 and of (δ y P t ) t≥0 at time t 0 .
Proof. Let (X x , X y ) be an optimal coupling of δ x P t0 and δ y P t0 as constructed in Lemma 3.1. Then, by usual conditioning arguments, (W t ) t∈[0,t0] is an inhomogeneous Markov process with transitions for all 0 ≤ s ≤ t ≤ t 0 and all events A, B of E. Let (R z s,t ) 0≤s≤t≤t0 be the inhomogeneous semi-group associated to In other words, we have defined (Z z t ) t∈(0,t0) to be a Markov bridge from z to X z . Then set Z z t = H t−t0 (X z , U ) for all t ≥ t 0 and z = x, y, where U is uniformly distributed over [0, 1] and independent from (Z x t , Z y t ) t∈[0,t0] , and H is a representation of z → (V z t ) t≥0 . By the Markov property, (Z z t ) t≥0 is then a Markov process with initial condition Z z 0 = z and such that Z z t0 = X z . Define a second coupling (Z̃ x t , Z̃ y t ) t≥0 as follows. Let τ = inf{t ≥ 0 : By the strong Markov property, (Z̃ z t ) t≥0 is a Markov process associated to (P t ) t≥0 with Z̃ z 0 = z, and the two other conditions are satisfied. Examples of couplings such that (3.2) holds and (3.3) does not are easily constructed for discrete-time Markov chains. Here is an example with continuous time. Consider the process (X t ) t≥0 on [−1, 1] with generator In other words, starting from x, where T is an exponential r.v. with parameter 1 and X |x|+T = R, where R is a Rademacher r.v. with parameter 1/2. It is clear that if |x| = |y| then δ x P t = δ y P t for all t ≥ |x|, since δ x P |x| = δ 0 . Now consider, for some t 0 > 2, the processes (X t , Y t ) t≥0 that jump at the same time |x| + T , the first one to a Rademacher r.v. R, the second one to R′ = R(1 − 2 · 1 |x|+T <t0−1 ), which is indeed a Rademacher r.v. independent from T . In other words, as long as the jump occurs early enough for the processes to have the time to come back to 0 before time t 0 , we send them to opposite points, else to the same one. Obviously (3.2) holds and (3.3) does not.

Study of the non-linear process
We use in this section the notations of Section 2.1. In particular, we consider a Markov semi-group (P t ) t≥0 and the non-linear jump rate and kernel ν → λ ν , Q ν .

Coupling with different inhomogeneous jump mechanisms
Before giving the formal construction of the coupling, let us briefly highlight the general strategy: we want to define simultaneously two processes, driven by different time-inhomogeneous jump mechanisms, in such a way that they coincide as much as possible. To do so, we use the linear Markov part of the dynamics in order to merge the two processes with some positive probability under a Doeblin condition (provided that no non-linear jump occurs meanwhile). For the non-linear jumps, we force the two processes to jump simultaneously as much as possible and, as much as possible, to the same location. Thanks to the Lipschitz assumptions on the jump mechanisms, the probability to achieve this is controlled by the distance between the time-dependent measures associated with the processes.
Consider µ 1 , µ 2 ∈ M(R + , P(E)). For all x, y ∈ E, t ≥ 0 and i = 1, 2, denote Then Q = t and Q i,≠ t for i = 1, 2 are Markov kernels from E² to P(E) such that, for all x, y ∈ E and i = 1, 2, and, as shown in the proof of Lemma 3.1, and similarly, with V = 1, for Assumption 2.3. For all t ≥ 0 and i = 1, 2 let , the latter being the law of the position after a jump at time t from the position x (except that, with respect to the initial kernel Q µ i t , we have artificially increased the jump rate to the upper bound λ * , at the cost of adding "phantom jumps", which are jumps from x to x). The decomposition (3.4) gives the following way to sample a position according to Q̃ 1 t (x) (when the other process is at position y): with probability p t (x, y), draw a position according to Q = t (x, y); otherwise, draw a position according to Q 1,≠ t (x, y). The same goes for Q̃ 2 t (x) and, since Q = t is the same for i = 1, 2, we can take the same random variable for the two processes in that case. In other words, Q = t is the law of the common position of the processes after a synchronous jump, while Q i,≠ t for i = 1, 2 are the laws of the positions when the coupling fails.
We define a Markov process (X t , Y t ) t≥0 on E² as follows. Let m 0 , h 0 ∈ P(E) and let (X 0 , Y 0 ) be an optimal coupling of m 0 and h 0 as constructed in the proof of Lemma 3.1. For a given t 0 , let (Z t , Z̃ t ) t≥0 be an optimal coupling of (δ X0 P t ) t≥0 and (δ Y0 P t ) t≥0 in the sense of Lemma 3.2. Consider an i.i.d. sequence (S k , U k , V k , W k ) k∈N , independent from (Z t , Z̃ t ) t≥0 , where, for all k ∈ N, S k , U k , V k and W k are mutually independent, S k follows an exponential law with parameter 1, and U k , V k and W k follow a uniform distribution over [0, 1]. Set T 0 = 0 and T n+1 = T n + S n /λ * for all n ∈ N, and suppose that (X t , Y t ) t∈[0,Tn] have been defined for some n ∈ N and are independent from If V n ≤ p Tn (X̃ Tn+1 , Ỹ Tn+1 ) then set and else set

Proof. By symmetry, we only consider the case of (X t ) t≥0 . Note that in the definition of (X t ) t≥0 , as in the definition of (X µ 1 ,x t ) t≥0 in Section 2.1, (T n ) n≥0 is a Poisson process with intensity λ * . Moreover, for all n ≥ 0, conditionally on X Tn , (X t ) t∈[Tn,Tn+1) is a Markov process associated to (P t ) t≥0 , i.e. has the same distribution as (Z X Tn t ) t∈[0,Tn+1−Tn) , and the same goes for (X µ 1 ,x t ) t∈[Tn,Tn+1) . As a consequence, it only remains to check that the Markov chain (X Tn ) n∈N has the same distribution as the Markov chain (X µ 1 ,x Tn ) n∈N , which amounts to saying that they have the same transition kernel. Yet, for any bounded measurable f on E, Hence, Similarly, and again which concludes.

The total variation case
Proof of Proposition 2.4. For (µ 1 t , µ 2 t ) t≥0 and m 0 , h 0 ∈ P(E), consider the associated process (X t , Y t ) t≥0 defined in Section 3.2.1. From Proposition 3.3, for all t ≥ 0, where we used (3.5). Note that τ split is independent from (X 0 , Y 0 ), so that Since (X 0 , Y 0 ) is an optimal coupling of m 0 and h 0 , using that 1 − exp(−a) ≤ a for all a ∈ R, we have thus obtained Now remark that, for fixed m 0 ∈ P(E), . Set t 1 = 1/(2θ) and consider Ψ : In other words, Ψ is a contraction of L ∞ ([0, t 1 ], P(E)), which is complete, hence it admits a unique fixed point, which is a solution of (1.1) in the sense of Definition 2.2 on [0, t 1 ]. Then there exists a unique solution on [nt 1 , (n + 1)t 1 ] for all n ∈ N, thus on R + . Considering two such solutions with respective initial distributions m 0 and h 0 , (3.6) reads and Gronwall's Lemma concludes.
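The contraction argument in this proof can be imitated numerically: discretize [0, t 1 ] and iterate the map Ψ sending a candidate measure flow µ to the law of the process driven by µ. In the sketch below, the two-point state space, the rates and the Euler scheme are our own toy choices (not from the text), with L = 0, so Ψ(µ) is obtained by solving the linear flow m′ = λ_{µ t}(Q_{µ t} m − m); successive Picard iterates should approach each other geometrically.

```python
import numpy as np

t1, dt = 0.4, 1e-3
grid = int(t1 / dt)
m0 = 0.9                       # initial mass at state 1

def Psi(mu_path):
    """One Picard step: given a candidate flow t -> mu_t (described by the
    mass mu_t({1})), solve the linear flow m' = lam_mu (Q m - m) on E = {0,1}
    with the flipping kernel Q(x) = delta_{1-x} and rate lam_mu = 1 + mu_t({1}),
    by explicit Euler steps. Returns the resulting flow t -> m_t({1})."""
    out = np.empty(grid + 1)
    m = m0
    out[0] = m
    for k in range(grid):
        lam = 1.0 + mu_path[k]
        m += dt * lam * (1.0 - 2.0 * m)   # m1' = lam*(m0_mass - m1) = lam*(1 - 2 m1)
        out[k + 1] = m
    return out

mu = np.full(grid + 1, m0)     # start the iteration from the constant flow
gaps = []                      # sup-distance between successive iterates
for _ in range(6):
    new = Psi(mu)
    gaps.append(float(np.max(np.abs(new - mu))))
    mu = new
```

The geometric decay of `gaps` is the numerical counterpart of Ψ being a contraction of L∞([0, t 1 ], P(E)); the fixed point reached is the (discretized) solution of (1.1) on [0, t 1 ].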
Proof of Theorem 2.6. Let t → m t , h t be two solutions of (1.1). Consider again the process (X t , Y t ) defined in Section 3.2.1, with µ 1 = m and µ 2 = h. In particular, In the proof of Proposition 2.4, we established that Now, for any t ≥ 0, using again Proposition 2.4, we get

The general V norm case
Lemma 3.4. Under Assumptions 2.1 and 2.7, for all µ 1 , µ 2 ∈ M(R + , P(E)), x ∈ E and t ≥ 0,

Proof. Consider the process (X t , Y t ) t≥0 defined in Section 3.2.1 with X 0 = Y 0 = x, and t 1 ≥ 0. As in the proof of Proposition 2.4, let τ split = inf{T n : n ∈ N, V n ≤ p Tn (X̃ Tn+1 , Ỹ Tn+1 )}. Since X t = Y t for all t < τ split , and we bound Set Γ = (S k , V k ) k∈N , and remark that τ split and K = ♯{n ∈ N : T n ≤ t 1 } are deterministic functions of Γ, while (Z t , Z̃ t ) t≥0 and (U k , W k ) k∈N are independent from Γ. Using (2.4) and (2.5), for all n ∈ N, Similarly, and then by direct induction, Besides, (V k ) k∈N is independent from K and, conditionally on K, Since K follows a Poisson law with parameter λ * t 1 , which concludes.
Proof of Theorem 2.8. Let t → µ 1 t , µ 2 t be two measurable functions from R + to P V , and m 0 ∈ P V . From Lemma 3.4, Let t 1 be small enough so that Let A be the set of measurable functions t → µ t from [0, Remark that A is not empty since it contains the function with constant value m 0 . As a closed subset of L ∞ ([0, t 1 ], P V ), A is complete for the distance

Proof. Consider (X 0 , Y 0 ) an optimal coupling of m 0 and h 0 as given by Lemma 3.1 and let (X t , Y t ) be an optimal coupling of δ X0 R m 0,t and δ Y0 R h 0,t , so that As in the proof of Theorem 2.8, Gronwall's Lemma yields, for all t ≥ 0, For t ≥ 0 and n ≥ 0, applying this result n times with the time t/n, we get where we used that, from (2.9), m kt/n (V) ≤ M/(1 − η) + 1 for all k ∈ N. Letting n go to infinity, which concludes.
Proof of Theorem 2.9. Applying (2.3) with x = x 0 and µ s = δ x0 for all s ≥ 0, where x 0 ∈ E is such that V(x 0 ) is arbitrarily close to inf V, we obtain that inf V ≤ M/(1 − η). In particular, P V,1 (E) := {ν ∈ P V : ν(V) ≤ M/(1 − η) + 1} is not empty, since it contains δ x0 for all x 0 ∈ E with V(x 0 ) sufficiently close to inf V. From (3.7), if t → m t is a solution of (1.1) with m 0 ∈ P V,1 (E), and then m t ∈ P V,1 (E) for all t ≥ 0.
Let m 0 , h 0 ∈ P V,1 (E) and (X 0 , Y 0 ) be an optimal coupling of m 0 and h 0 given by Lemma 3.1, i.e. be such that, for any β > 0, Conditioning on the initial value, let (X t0 , Y t0 ) be an optimal coupling of δ X0 R m 0,t0 and δ Y0 R h 0,t0 , so that .
Considering κ and β as defined in Lemma 3.6 and using Lemma 3.4 and the fact that Together with Lemma 3.5 and the fact that ‖ · ‖ V ≤ ρ β (since d V ≤ d β ), this means By assumption, κ̄ < 1, so that Ψ t0 : P V,1 (E) → P V,1 (E), which maps m 0 to m t0 where t → m t is a solution of (1.1), is a contraction. If a sequence (µ n ) n∈N in P V is such that ρ β (µ n , ν) → 0 for some ν ∈ P(E) as n → ∞, then µ n (V) → ν(V). Hence, P V,1 (E) is a closed subset of P V (E) endowed with the metric ρ β , which is complete. As a consequence, Ψ t0 admits a unique fixed point µ ∞ in P V,1 (E), which for now may depend on t 0 . From (2.9), for all n ∈ N and m 0 ∈ P V (E), which, applied to m 0 = µ ∞ and letting n go to infinity, implies that µ ∞ (V) ≤ M/(1 − η). More generally, (2.9) means that, for all m 0 ∈ P V (E) and all t ≥ s 0 := ln(m 0 (V))/(ρ(1 − η)), m t ∈ P V,1 (E). For all ν ∈ P V,1 (E), Combining this with (3.9), we obtain that, for all m 0 ∈ P V (E), Since κ̄ ≥ e −ρ(1−η)t0 , κ̄ −s0/t0 ≤ m 0 (V). We have then obtained, for all t ≥ 0 and all m 0 ∈ P V , In particular, if m 0 = µ ∞ , since in that case m s = h s+nt0 for all s ≥ 0 and n ∈ N, letting n go to infinity, we get that m s = µ ∞ for all s ≥ 0; in other words, µ ∞ is an equilibrium of (1.1). Finally, ρ β ≥ ‖ · ‖ V , which concludes.

Study of the particle system
We use in this section the notations of Section 2.2. In particular, for i ∈ ⟦1, N⟧, we consider a Markov semi-group (P i t ) t≥0 and jump rates and kernels λ i , Q i .

Coupling for interacting particles
For all x, y ∈ E and i ∈ ⟦1, N⟧, denote and p i (x, y) = (Q̃ i (x) ∧ Q̃ i (y))(E i ). If p i (x, y) = 0, set If p i (x, y) ∈ (0, 1), set Finally, if p i (x, y) = 1, set Then Q = i and Q ≠ i are Markov kernels from E² to P(E i ) such that, for all x, y ∈ E, and, as shown in the proof of Lemma 3.1, and similarly for (2.16). Let x, y ∈ E. We define a Markov process (X t , Y t ) t≥0 = (X i,t , Y i,t ) i∈⟦1,N⟧, t≥0 on E² and an auxiliary Markov process (J t ) t≥0 on ⟦1, N⟧ as follows. For a given t 0 and all i ∈ ⟦1, N⟧, let (Z i,t , Z̃ i,t ) t≥0 be an optimal coupling of (δ x P i t ) t≥0 and (δ y P i t ) t≥0 in the sense of Lemma 3.2, independent one from the other for different i. Consider an i.i.d. sequence (S k , (U i,k ) i∈⟦1,N⟧ , V k , W k , I k ) k∈N , independent from (X 0 , Y 0 ) and (Z i,t , Z̃ i,t ) t≥0, i∈⟦1,N⟧ , where, for all k ∈ N, S k , V k , W k , I k and (U i,k ) i∈⟦1,N⟧ are mutually independent, S k follows an exponential law with parameter 1, V k and W k (resp. I k ) a uniform distribution over [0, 1] (resp. over ⟦1, N⟧), and (U i,k ) i∈⟦1,N⟧ are i.i.d. r.v. uniformly distributed over [0, 1]. Set T 0 = 0, X 0 = x, Y 0 = y and J 0 = d̄ 1 (x, y)/2, and suppose that T n ≥ 0 and (X t , Y t , J t ) t∈[0,Tn] have been defined for some n ∈ N and are independent from (S k , (U i,k ) i∈⟦1,N⟧ , V k , W k , I k ) k≥n . Set T n+1 = T n + (N λ * ) −1 S n .
For all i ∈ ⟦1, N⟧ and all t ∈ (T n , T n+1 ), set J t = J Tn . If V n ≤ p In (X̃ Tn+1 , Ỹ Tn+1 ) then set and else set If V n ≥ 1 − θJ Tn /(N λ * ) and x In = y In , then set J Tn+1 = J Tn + 1; else, set J Tn+1 = J Tn .
For all i ∈ N, T i = {T n > 0 : I n = i, n ∈ N} are the jump times of a Poisson process of intensity λ * , independent from T j for j ≠ i and from (Z i,t , Z̃ i,t ) t≥0 . As a consequence, for all i ∈ ⟦1, N⟧ with x i = y i , 1 i∈At is a Bernoulli r.v. with parameter exp(−λ * t) P(Z i,t = Z̃ i,t ), independent from 1 j∈At if j ≠ i and from J t . In particular, On the other hand, the generator of the Markov process (J t ) t≥0 is .
Applied to f (s) = s, this yields and thus On the other hand, for all t ∈ [0, t 0 ], Considering on P(E) the distance ρ defined by we have thus obtained that, for all t ∈ [0, t 0 ] and x, y ∈ E, Finally, for all t ≥ 0 and x, y ∈ E, 2N 1 x≠y e θt0 (e θt0 − αe −λ * t0 ) t/t0 , which, integrated with respect to any initial distribution, concludes the proof of the first statement of Theorem 2.12. Now, let m t and h t be the respective distributions of X I,t and Y I,t , where I is independent from (X t , Y t ) t≥0 . Then Considering an optimal coupling (X 0 , Y 0 ) of m 0 and h 0 and, conditionally on (X 0 , Y 0 ), an optimal coupling (X t , Y t ) of δ X0 R t and δ Y0 R t , = N ‖m 0 − h 0 ‖ T V e θt0 (e θt0 − αe −λ * t0 ) t/t0 , which concludes.

The general V case
In all this section, Assumptions 2.10 and 2.13 are enforced. For i ∈ ⟦1, N⟧ and x ∈ E, set

Lemma 3.8. For all t ≥ 0, x ∈ E and i ∈ ⟦1, N⟧,

Proof. Summing (2.13) over i ∈ ⟦1, N⟧ yields, for all t ≥ 0 and x ∈ E, Gronwall's Lemma then yields (3.11) which, reinjected into (2.13), gives Applying R s for some s ≥ 0 to this inequality and using the semi-group property, we thus obtain, for all s, t ≥ 0, Hence, for all n ∈ N and t ≥ 0, Letting n go to infinity yields which is (3.12).
Similarly to [30] or Section 3.2.3, we now consider a parameter β > 0 and, for all x, y ∈ E and i ∈ ⟦1, N⟧,

Lemma 3.9. For all i ∈ ⟦1, N⟧ and all x, y ∈ E with x i = y i , let (X t , Y t ) t≥0 be the coupling of (δ x R t ) t≥0 and (δ y R t ) t≥0 defined in Section 3.3.1. Then

Proof. This is again essentially the proof of Theorem 3.1 of [30]. We bound From (3.12), as in the previous section, from (2.17), where we used the definition (3.13) of β, and where we used again the definition (3.13) of β.
Proof. Let t 1 0 and i ∈ 1, N be such that x i = y i . For all j ∈ 1, N , let τ j = inf{T n 0 : n ∈ N, I n = j, V n 1 − θJ Tn /N }. By construction and Lemma 3.7, Let Γ = (S k , V k , I k ) k∈N , K = {n ∈ N * : T n t 1 } and, for all j ∈ 1, N , A j = {n ∈ N * : T n t 1 , I n = j} and K j = A j . Note that K and {τ j , K j } j∈ 1,N are deterministic functions of Γ, and in particular are independent from (U i,k ) i∈ 1,N ,k∈N . Hence, for all n ∈ N and j ∈ 1, N , using (2.14) and (2.15), Similarly, and then by direct induction, We have thus obtained, for all j ∈ 1, N , For j ∈ 1, N , set Γ j = {(S k , V k , I k ) : I k = j, k ∈ N}. Remark that (J t ) t 0 is a deterministic function of {Γ j : j ∈ 1, N , x j = y j } and that, by Poisson thinning, Γ j (hence K j ) is independent from Γ k for all k ≠ j.
In particular, if j ∈ 1, N is such that x j = y j , For j ∈ 1, N with x j = y j we define the process (J j,t ) t 0 as follows. Set J j,0 = 0 and, for all n ∈ N and t ∈ (T n , T n+1 ), set J j,t = J j,Tn and J j,Tn+1 = J j,Tn + 1 In =j 1 Vn 1−θJ j,Tn /(N λ * ) .
In other words, (J j,t ) t 0 is like (J t ) t 0 but ignoring the jumps of the j th particle. By thinning of Poisson processes, (J j,t ) t 0 is independent from Γ j . Moreover J j,t J t for all t and, conditionally to τ j > t, J j,t = J t . As a consequence, if i ≠ j, we bound and thus, for all j ≠ i with x j = y j , Using that (J i,t ) t 0 , K i and (V n ) n∈Ai are independent r.v. and that, conditionally to With the same computations as in the proof of Lemma 3.4, this leads to where we used that, from (3.10), E(J t ) exp(θt)d 1 (x, y)/2 for all t 0. The case of j = i with x j = y j is identical and, similarly (i.e. applying this bound with γ * = 1), Together with (3.19), this yields Proof of Theorem 2.14. Let β and κ be given by Lemma 3.9, and consider on P(E) the distance ρ β given for all µ, ν ∈ P(E) by ρ Lemmas 3.9 and 3.10 mean that for all x, y ∈ E, Since η 1/2 by assumption, we get that ρ Then for all initial distributions µ, ν ∈ P(E), conditioning with respect to the initial values As in the proof of Theorem 2.9, the assumption that κ < 1 implies that R t0 admits a unique fixed point µ ∞ in P V (E). Moreover, by the semi-group property, for all s 0, so that µ ∞ R s is a fixed point of R t0 and thus by uniqueness it is equal to µ ∞ . Integrating (3.11) with respect to µ ∞ and letting t go to infinity yields µ ∞ (V) N M/(1 − η), and thus for all ν ∈ P V (E), Finally, for all ν ∈ P V (E) and t 0 where we used (3.11). The conclusion then follows from the fact that ρ β d V for all β 0.

Examples
This section provides illustrations of our main results. For the sake of clarity and since the approach would be the same in more general or sophisticated applications, the models are chosen to be simple in order to highlight the core arguments.
As has already been mentioned, the neuron network model of [10] provides a first example where Theorem 2.6 applies.

Mean-field run-&-tumble process
Run-&-tumble processes model, among other things, the motion of some bacteria [26,46] (see also [9,27,29] for more recent details and references). The rate at which a bacterium tumbles depends on the concentration of given chemo-attractants in the medium. For a large population of bacteria, a mean-field interaction is a natural extension of these dynamics.
We consider the non-linear integro-differential equation on E = R × {−1, 1} given by  The linear case θ = 0 corresponds to the run-&-tumble process attracted to a neighborhood of the origin studied in [29] (except that we don't assume that r is non-decreasing). Indeed, when |x| is large, the jump rate r(xy) is larger when xy > 0, namely when the process is drifting away from a given compact, than when xy < 0, i.e. when the process is going toward the compact. Similarly, the case θ = 1 would correspond to a process attracted toward the barycenter of its law. For θ ∈ (0, 1), the process is attracted to an average of the origin and of the barycenter of its law. Note that, if s → r(s) − r(0) were anti-symmetric, then the linear process would admit a symmetric equilibrium, which would then be an equilibrium for the non-linear equation since its mean is zero. In the following, this symmetry is not assumed. A similar multi-dimensional process could be considered with the same assumptions as in [27] to ensure the existence of a Lyapunov function. The arguments below could then be straightforwardly adapted.
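To fix ideas, the N-particle approximation of this mean-field run-&-tumble dynamics can be simulated with a simple Euler time-discretization. The rate function r below, as well as the constants a, b and all numerical parameters, are illustrative assumptions of this sketch, not taken from the text; r is smooth, bounded between a and b and increasing in its argument, in the spirit of condition (4.2).

```python
import random

def simulate_run_and_tumble(n_particles=200, theta=0.1, dt=0.01, n_steps=2000, seed=0):
    """Euler-discretized N-particle approximation of the mean-field
    run-&-tumble dynamics: each particle moves at velocity y in {-1, 1}
    and flips it at rate r(y * (x - theta * barycenter))."""
    rng = random.Random(seed)
    # Hypothetical smooth rate, bounded between a and b, larger when the
    # particle drifts away from the attracting point.
    a, b = 1.0, 3.0
    def r(s):
        return a + (b - a) * (0.5 + 0.5 * (s / (1.0 + abs(s))))
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_particles)]
    ys = [rng.choice((-1, 1)) for _ in range(n_particles)]
    for _ in range(n_steps):
        bary = sum(xs) / n_particles  # empirical stand-in for the mean of m_t
        for i in range(n_particles):
            # tumble with probability ~ rate * dt (one-step thinning)
            if rng.random() < min(1.0, r(ys[i] * (xs[i] - theta * bary)) * dt):
                ys[i] = -ys[i]
            xs[i] += ys[i] * dt
    return xs

positions = simulate_run_and_tumble()
```

Each particle transports at unit speed in the direction y and tumbles (y → −y) at a rate evaluated at y(x − θx̄), where x̄ is the empirical barycenter standing in for the mean of m t.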
Proposition 4.1. Assume (4.2) holds for some 0 < a < b and c := inf s∈R r(s) > 0. Then there exist C, θ * > 0 and κ ∈ (0, 1) such that, if θ θ * , then (4.1) admits a unique equilibrium µ ∈ P(E) such that, for every solution t → m t of (4.1) and all t 0, Proof. Let us establish that Assumption 2.7 holds for (4.1). To recover the notations of Section 2.1, consider the generator L on E given by the jump rate λ ν (x, y) = r(y(x − θx ν )) − c and the kernel Q ν (x, y) = Q(x, y) = δ (x,−y) . In particular, for z = (x, y) ∈ E, and for all V and k > 0 such that V(x, y) k|x| for all (x, y) ∈ E. The Markov process with generator L is the so-called integrated telegraph process, and it is clear, either by a simple controllability argument as in [43], by more precise coupling estimates as in [29], or just by explicit computation of the transition density, that for any compact set K of R there exist t 0 , α > 0 such that, for all x, x ∈ K and y, y ∈ {−1, 1}, the Doeblin condition holds. Note that t 0 and α only depend on K and c. It only remains to construct a Lyapunov function V that satisfies (2.3), (2.4) and (2.5) and such that V(x, y) k|x| for all (x, y) ∈ E for some k > 0. Given a measurable function t → µ t from R + to P(E), recall that we denote (R µ s,t ) t s 0 the semi-group associated to the inhomogeneous process defined in Section 2.1 with the generator (x, y)) . As a smooth function of x, it belongs to the domain of the generalized generator of L t for all t 0, see [15,21].
For (x, y) ∈ E with |x| 1, Take R 0 1 large enough so that r(z) > (7b + a)/8 and r(−z) < (b + 7a)/8 for all z R 0 . Then, for all t 0 and (x, y) ∈ E with |x| R 0 + θ|x µt |, if xy > 0 then Hence, for all x, y ∈ E with |x| R 0 + θ|x µt | and all t 0, We have obtained that, for all (x, y) ∈ E and all t 0, For all (x, y) ∈ E, let (X t , Y t ) t 0 be a process associated to the semi-group (R µ s,t ) t s 0 and initial conditions (x, y), and let N t be its number of jumps in [0, t], which is stochastically dominated by a Poisson r.v. with parameter r ∞ t. For n ∈ N, let τ n = inf{t, N t n, |X t | n}, which almost surely goes to infinity as n goes to infinity (note that |X t | |x| + t for all t 0). For all n ∈ N, Dynkin's formula yields Letting n go to infinity, we obtain (2.3). If θ θ * 1/M , then η 2/3.
The two other Lyapunov conditions of Assumption 2.7 are readily checked. Indeed, similar computations yield LV(x, y) (b + π/2)V(x, y) and, since G ν ((x, y), u) = (x, −y) for all ν ∈ P(E), (x, y) ∈ E and almost all u ∈ [0, 1], Condition (2.7) is deduced from (4.4). Finally, (2.6) ensues from (4.3) since for all (x, y) ∈ E. As a conclusion, if θ θ * 1/M , Assumption 2.7 holds, and Theorems 2.8 and 2.9 state that (4.1) admits a solution for every initial condition m 0 ∈ P V and that, provided θ * is sufficiently small (depending on r, or more precisely on a, b, c and r ∞ ), (4.1) admits a unique equilibrium toward which all solutions t → m t with m 0 ∈ P V (E) converge geometrically. The equivalence of V and W, hence of · V and · W , concludes.

MCMC for granular media equilibrium
Let E 1 = (R/Z) d for some d ∈ N * and let U : E 1 → R and W : E 2 1 → R be C 1 functions, respectively called the exterior and the interaction potential. We want to sample according to the probability law µ V on E 1 with density proportional to exp(−βV ), where β > 0 is the so-called inverse temperature and V solves In fact, this problem doesn't necessarily have a unique solution. More precisely, V solves (4.6) iff µ V is an equilibrium of the McKean-Vlasov (or granular media) equation [11,37,40], which admits a unique such equilibrium at large temperature [52,53], i.e. for β smaller than some threshold β 0 > 0. We will only consider this large-temperature regime. Sampling according to µ V can be achieved through interacting-particle MCMC. To that purpose, we consider three classical Markov chains: the Metropolis-Hastings (MH) chain with Gaussian proposal, the Unadjusted Langevin Algorithm (ULA) and the Metropolis Adjusted Langevin Algorithm (MALA), which we now define.
Let N ∈ N * and, for x ∈ E = E N 1 , write For i ∈ 1, N , we denote by e i the (N d) × d matrix whose d × d blocks are all zero except the i th one, which is the d-dimensional identity matrix. In other words, e i is such that, if x ∈ E and y ∈ R d , then x + e i y = (x 1 , . . . , for some τ > 0, and let q : E 2 1 → R + be a symmetric Markov density kernel, i.e. be such that for all x, q(x, ·) is the density of a probability measure on E 1 and q(x, y) = q(y, x) for all x, y ∈ E 1 . For instance, the image on the periodic torus of q(x, y) ∝ exp(−|x − y| 2 /(2σ 2 )) for some σ > 0, or q(x, y) ∝ 1. Define on E the Markov kernels Q M H , Q U LA and Q M ALA by Let x ∈ E, and consider independent r.v. Y , Z, U and I where Y follows the standard (mean zero, identity covariance) Gaussian distribution, and Z, U and I the uniform one respectively on E 1 , [0, 1] and 1, N . Then More discussions, motivations and comparisons of these processes can be found in [7,8,23] and references within. As far as the present work is concerned, these three cases are similar, so that we focus in the following on the MH case alone, with q(x, y) ∝ 1. For a given jump rate λ > 0, consider the continuous-time Markov chain on E with generator Let (R t ) t 0 be the associated semi-group. The associated process is constructed as follows. Let (S k , I k , U k , Z k ) k 0 be an i.i.d. sequence where, for all k ∈ N, S k is a standard exponential r.v., and Z k , U k and I k are uniformly distributed respectively on E 1 , [0, 1] and 1, N . Let x ∈ E, set X i,0 = x i for all i ∈ 1, N , T 0 = 0 and T n+1 = T n + (N λ) −1 S n for all n ∈ N. Suppose that (X t ) t∈[0,Tn] has been defined for some n ∈ N and is independent from (S k , I k , U k , Z k ) k n . Set X t = X Tn for all t ∈ (T n , T n+1 ). For all j ≠ I n , set X j,Tn+1 = X j,Tn . If U n < p I,1 (X Tn , Z n − X I,n ), set X In,Tn+1 = Z n and else, set X In,Tn+1 = X In,Tn . Proposition 4.2. 
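This construction can be sketched as follows for d = 1 and q(x, y) ∝ 1: exponential inter-jump times of mean (Nλ)⁻¹, a uniformly chosen particle, a uniform proposal on the torus, and a Metropolis acceptance. The potentials U, W and the mean-field 1/N scaling of the interaction energy below are illustrative assumptions of this sketch, not taken from the text.

```python
import math, random

def mh_particle_chain(n_particles=20, beta=0.5, lam=1.0, t_max=50.0, seed=1):
    """Continuous-time Metropolis-Hastings particle chain with uniform
    proposals on the torus R/Z.  Jump times are T_{n+1} = T_n + S_n/(N*lam)
    with S_n standard exponentials; at each jump time a uniformly chosen
    particle proposes a uniform position, accepted with the usual
    Metropolis ratio for the energy defined by U and W below."""
    rng = random.Random(seed)
    U = lambda x: math.cos(2 * math.pi * x)            # exterior potential (illustrative)
    W = lambda x, y: math.cos(2 * math.pi * (x - y))   # interaction potential (illustrative)
    x = [rng.random() for _ in range(n_particles)]
    def local_energy(i, xi):
        # energy terms involving particle i, with a mean-field 1/N scaling
        return U(xi) + sum(W(xi, x[j]) for j in range(n_particles) if j != i) / n_particles
    t = 0.0
    while True:
        t += rng.expovariate(n_particles * lam)        # T_{n+1} - T_n
        if t > t_max:
            break
        i = rng.randrange(n_particles)                 # I_n uniform over the particles
        z = rng.random()                               # Z_n uniform on the torus
        # Metropolis acceptance with a symmetric (uniform) proposal
        if rng.random() < math.exp(-beta * (local_energy(i, z) - local_energy(i, x[i]))):
            x[i] = z
    return x

chain = mh_particle_chain()
```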
Suppose that β = β/(osc(U ) + osc(W )) with β such that Then (4.6) admits a unique solution V and, moreover, for all m 0 , h 0 ∈ P(E) and all t 0, Proof. To recover the notations of Section 2.2, denote p * = exp(−β(osc(U ) + osc(W ))) and for all i ∈ 1, N consider the generator L i on E i = E 1 defined by for all z ∈ E 1 and all bounded measurable functions f on E 1 . Let (P i t ) t 0 be the associated Markov semi-group, which is simply For all i ∈ 1, N and x ∈ E, set λ i (x) = λ(1 − p * ) and, for a bounded measurable f on E i , Remark that p * p i (x, y) for all i ∈ 1, N , x ∈ E and y ∈ E 1 , so that Q i is indeed a Markov kernel. Then the MH process with generator L M H is the system of particles defined in Section 2.2 associated to ((P i t ) t 0 , λ i , Q i ) i∈ 1,N . Note that for all i ∈ 1, N and x, z ∈ E, If x, z ∈ E are such that x i = z i , then an optimal coupling of Q i (x) and Q i (z) is constructed as follows. Let Y and U be independent r.v. uniformly distributed over, respectively, E 1 and [0, 1]. Set and We have thus obtained, for all i ∈ 1, N and x, z ∈ E with x i = z i , which is (2.11), with θ = 4λ(1 − p * )osc(W )β exp(β). On the other hand, the Doeblin condition (2.12) is clear, since (4.7) immediately yields that for all t 0, all x, y ∈ E 1 and all i ∈ 1, N , Hence, Assumption 2.11 holds and, denoting (R t ) t 0 the semi-group on E associated to the particle system (X i,t ) i∈ 1,N ,t 0 , Theorem 2.12 means that for all t 0 > 0, n ∈ N and h 0 , m 0 ∈ P(E), which, applied with t 0 = t/n for a fixed t 0 and letting n go to infinity, yields Similarly, provided m 0 = (m 0 ) ⊗N for some m 0 ∈ P(E 1 ) and similarly for h 0 , following the remark made after Theorem 2.12, holds for all t 0 if m t is either the law of X 1,t or the solution of the non-linear mean-field limit of the particle system (applying Theorem 2.6). In particular, remark that V solves (4.6) iff the probability density proportional to exp(−βV ) is an equilibrium of this non-linear equation. 
Yet, if ρ = p * − θ/λ > 0, then the contraction in total variation norm implies that the latter admits a unique equilibrium, which concludes.

The Zig-Zag process with a close-to-tensor target
Let π ∈ P(R N ) be a probability law with a density proportional to exp(−U ), where U ∈ C 1 (R N ). Consider the so-called Zig-Zag process [4] where, for y ∈ {−1, 1} N and i ∈ 1, N , y −i = (y 1 , . . . , y i−1 , −y i , y i+1 , . . . , y N ). The Zig-Zag process admits π(dx) ⊗ ((δ −1 + δ 1 )/2) ⊗N as an invariant measure and it is ergodic under general conditions on U (see [5]), so that for all reasonable (e.g. bounded) functions f on R N , This makes it suitable for MCMC purposes. If the target measure is of tensor form, i.e. if U (x) = U 1 (x 1 ) + · · · + U N (x N ) for some one-dimensional functions U i , then the N coordinates of the process are independent one-dimensional Zig-Zag processes, so that the convergence to equilibrium of each of these coordinates is independent of N . Let us prove that this property is stable under the addition of correlations between the coordinates, provided they are small. Proposition 4.3. Suppose that there exist U 1 , . . . , U N ∈ C 1 (R), W ∈ C 1 (R N ), ρ > 0 and R 1 such that for all i ∈ 1, N and x ∈ R, if |x| R then xU i (x) ρ|x|. Let (R t ) t 0 be the semi-group associated to the generator Then there exist θ * , C > 0 and κ ∈ (0, 1) that depend only on R, ρ and C := sup{|U i (x)|, i ∈ 1, N , |x| < R} such that, if sup{ ∂ xi W ∞ , i ∈ 1, N } < θ * , then for all µ ∈ P(R N ), Proof. Let h, ϕ ∈ C 1 (R) be as defined in the proof of Proposition 4.1. For i ∈ 1, N and (z, On the other hand, if |x i | R, with C = 32 exp(ρR/2)(ρ + 1 + C)/ρ. Integrating in time similarly as in the proof of Proposition 4.1, we get that for all t 0, (x, y) ∈ E N and i ∈ 1, N . In particular, summing over i ∈ 1, N , To check the other conditions of Assumptions 2.10 and 2.13, we consider the semi-group (P i t ) t 0 on R with generator L i given by w)) , and the jump rates and kernels Then the Zig-Zag process corresponds to the process defined in Section 2.2 associated to the semi-groups (P i t ) t 0,i∈ 1,N and the jump mechanisms (λ i , Q i ) i∈ 1,N . 
Note that, if G i is a representation of Q i , then for all x ∈ E N and almost all u ∈ [0, 1], G i ((x, y), u) = (x, y −i ), so that V i (G i ((x, y), u)) 2V i (x i , y i ). To check (2.14), through computations similar to the previous ones, it is clear that for some C that does not depend on θ. Finally, the Doeblin condition (2.17) is a consequence of the ergodicity of the one-dimensional Zig-Zag process, as established in [5,29]. Hence, Theorem 2.14 holds, which concludes.
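As an aside, the one-dimensional Zig-Zag process invoked in the proof can be simulated exactly by Poisson thinning. Here is a minimal sketch, under the assumption (made for this illustration only) that |U′| grows at most linearly, so that a local bound on the flip rate (y U′(x))₊ is available over each time window.

```python
import random

def zigzag_1d(u_prime, x0=0.0, y0=1, t_max=50.0, window=1.0, seed=2):
    """One-dimensional Zig-Zag sampler by Poisson thinning: the velocity
    y in {-1, 1} flips at rate (y * U'(x))_+.  Over a window of length
    `window` the position moves by at most `window`, which for U'(x) = x
    (the usage below) gives the local rate bound |x| + window."""
    rng = random.Random(seed)
    x, y, t = x0, y0, 0.0
    samples = []
    while t < t_max:
        # local bound on the flip rate over [t, t + window], for U'(x) = x
        bound = abs(x) + window + 1e-9
        s = rng.expovariate(bound)
        if s > window:
            x += y * window; t += window      # no proposed event in the window
        else:
            x += y * s; t += s
            rate = max(y * u_prime(x), 0.0)   # true rate at the proposed time
            if rng.random() < rate / bound:   # accept the flip (thinning)
                y = -y
        samples.append(x)
    return samples

traj = zigzag_1d(lambda x: x)   # Gaussian target U(x) = x^2 / 2
```

After a rejected proposal, restarting with a fresh exponential clock at a valid local bound is correct by the memorylessness of the Poisson process.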

Hybrid drift/bounce kinetic samplers
Let E 1 = R d for some d ∈ N, U ∈ C 1 (E 1 ) be an exterior potential, W ∈ C 1 (E 2 1 ) an interaction potential and β > 0 an inverse temperature. Similarly as in Section 4.2, we want to compute expectations with respect to the granular media equilibrium, i.e. the probability measure with density proportional to exp(−βV ) where V solves (4.6). Again, this can be approximated with a system of N interacting particles. Denoting, for x ∈ E := E N 1 , consider the Markov process on E 2 with generator L N defined by and D is some dissipation operator on the velocities, ergodic with respect to the standard Gaussian measure γ dN on E = R dN , for instance D = hD i for some i ∈ {1, 2, 3}, with h > 0 and for some p ∈ [0, 1). Then the probability measure µ N ∝ exp(−β(U N + W N )) on E is invariant for L N . The motivation to use such a hybrid process to sample µ N is the following. The computations of ∇U N and ∇ x W N have respective numerical costs of O(N ) and O(N 2 ). Suppose that W is Lipschitz with a known bound ∇W ∞ η. Then, by the thinning method [35,36], it is rather simple to sample jump times with rate (v i · ∇ xi W N (x)) + , by proposing jumps at rate |v i |η and then accepting them with some probability. In that case, we only have to compute ∇ xi W at each of these proposed jump times, instead of computing it at all times nδt, n ∈ N, where δt is the time-step used for the discretization of the trajectory. If, on the other hand, no efficient bounds on ∇U are available, then it makes more sense to deal with U through a drift operator (with a discretization scheme with time-step δt) rather than through a jump one. This argument is in fact not restricted to mean-field processes: whenever the forces can be decomposed as a bounded but expensive part and a cheap but singular one (for instance, long-range versus short-range interactions in molecular dynamics), it is reasonable to treat the bounded part with jumps and the singular one with drift. 
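The thinning mechanism described above can be sketched as follows for a one-dimensional velocity. The function grad_w_i (mapping a time to the interaction gradient along the particle's trajectory) and the bound eta are assumptions of this illustration; the point is that the gradient is only evaluated at the proposed times.

```python
import math, random

def next_bounce_time(v_i, grad_w_i, eta, t, rng):
    """Sample the next jump time with rate (v_i * grad_w_i(t))_+ by thinning,
    given the a-priori bound |v_i * grad_w_i| <= |v_i| * eta.  Proposals
    arrive at the constant bounding rate |v_i| * eta; each one costs a
    single gradient evaluation and is accepted with ratio true/bound."""
    speed = abs(v_i)
    if speed == 0.0:
        return math.inf                            # a still particle never bounces
    while True:
        t += rng.expovariate(speed * eta)          # propose at the bounding rate
        rate = max(v_i * grad_w_i(t), 0.0)         # true rate, one gradient call
        if rng.random() < rate / (speed * eta):    # accept (thinning)
            return t

# usage with an illustrative gradient bounded by eta = 1
rng = random.Random(7)
jump_time = next_bounce_time(1.0, lambda t: 0.5 * math.sin(t), eta=1.0, t=0.0, rng=rng)
```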
Contrary to multiple time-step methods like RESPA [51], there is no contribution of the bounded forces to the systematic bias on the invariant measure. Now, to study the long-time behavior of the particle system with generator L N , we can decompose L N = L N + L N with If D = D 1 (resp. D 3 ) for instance, then L N is the generator of N independent kinetic Langevin (resp. BGK-like) processes. Remark that, in the case D = D 2 , L N is not the generator of N independent processes, because the partial refreshment of the velocities occurs at the same time for all the particles. As a consequence, this example doesn't exactly enter the framework of Section 2.2. Nevertheless, it is rather clear that, in this case, a coupling in the spirit of [21] still gives each coordinate, independently from the others, a probability to merge within some time t 0 independent from N . This is what is really required in the proof of Theorem 2.14 (more precisely, of Lem. 3.9). In fact, the partial refreshment of the velocities could also be done, for all particles at once, at a fixed deterministic time period, as in Hamiltonian Monte Carlo.
In any case, as in the previous section, establishing the existence of a Lyapunov function V(x) = N i=1 V i (x i ) and a local Doeblin condition for L N is possible under some assumptions on U and, assuming that η is small enough, one can then prove through Theorem 2.14 that the particle system converges toward its equilibrium at a rate independent from N .

Selection/Mutation algorithms
Let (P t ) t 0 be a Markov semi-group on a Polish space E that satisfies Assumption 2.5. Consider λ * > 0, N ∈ N * and a function p : E 2 → [0, 1]. For all i ∈ 1, N and x ∈ E N , set λ i (x) = λ * and where z = x j→i is defined by z k = x k for all k ∈ 1, N \ {i} and z i = x j . In other words, if x = (x 1 , . . . , x N ) is the position of N particles, a r.v. with law Q i (x) is drawn as follows: draw J uniformly over 1, N and, with probability p(x i , x J ), kill the i th particle and replace it by a copy of the J th one. This kind of dynamics is used in a variety of algorithms, see [12,14,16,17,28] and references within. Then Assumption 2.11 clearly holds since, if x, z ∈ E N are such that x i = z i , Hence, Theorem 2.12 holds.
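A single selection step, i.e. one draw from Q i (x), can be sketched as follows; the acceptance function p in the usage line is an illustrative fitness-type choice, not taken from the text.

```python
import random

def selection_step(x, p, i, rng):
    """Draw from Q_i(x): pick J uniformly over the N particles and, with
    probability p(x_i, x_J), kill the i-th particle and replace it by a
    copy of the J-th one; otherwise leave the configuration unchanged."""
    j = rng.randrange(len(x))          # J uniform over the particle indices
    if rng.random() < p(x[i], x[j]):
        z = list(x)
        z[i] = x[j]                    # the i-th particle becomes a copy of the J-th
        return z
    return list(x)

# usage: selection favoring lower values (illustrative acceptance function)
rng = random.Random(3)
p = lambda a, b: 1.0 if b < a else 0.0
conf = [3.0, 1.0, 2.0]
new_conf = selection_step(conf, p, 0, rng)
```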

The mean-field TCP process
We consider the non-linear integro-differential equation on R + given by with, for ν ∈ P(R + ) and x ∈ R + , where g 1 and g 2 are both positive, non-decreasing functions on R + that go to infinity at infinity. For references on this model, called the TCP process, see [2,13] and references within. The choice of this particular expression of λ ν is not motivated by any modeling consideration; it is only meant to provide a very basic, yet interesting, example where Assumption 2.1 is not satisfied, since the non-linear jump rate is unbounded. In particular, it should be checked that, given a measurable function t → µ t from R + to P(R + ), the associated process (X µ,x t ) t 0 , as defined in Section 2.1, is well-defined for all times, namely that the probability that an infinite number of jumps occurs in a finite time is zero.
This process is defined as follows. Suppose that t → µ t is such that for all x 0, g * (x) := sup t 0 ∫ 0 ∞ g 2 (x + y)µ t (dy) < ∞ . Let (S k ) k 0 be an i.i.d. sequence of standard exponential r.v., X 0 = x and T 0 = 0. Suppose that T n 0 and (X t ) t∈[0,Tn] have been defined for some n ∈ N. Set T n+1 = inf{t > T n : S n < ∫ Tn t λ µs (X Tn + (s − T n ))ds}, X Tn+1 = (X Tn + T n+1 − T n )/2 and, for t ∈ (T n , T n+1 ), X t = X Tn + t − T n . The process is then constructed by induction up to time T n for all n ∈ N. Moreover, by construction, X t∧Tn X 0 + t for all n ∈ N and all t 0, and since g 1 and g 2 are non-decreasing, λ µs (X s ) 1 + g 1 (X 0 + t) + g * (X 0 + t) for all s t ∧ T n . Hence, for all n such that T n t, T n Σ n−1 k=0 S k / (1 + g 1 (X 0 + t) + g * (X 0 + t)).
In particular, for all t 0, there is almost surely only a finite number of jumps before time t. In other words, T n almost surely goes to infinity as n goes to infinity, so that X t is almost surely defined for all t 0. Let (R µ s,t ) t s 0 be the associated inhomogeneous Markov semi-group, and L t be its generator, given by L t f (x) = f ′(x) + λ µt (x) (f (x/2) − f (x)) .
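For illustration, here is a sketch of the linear TCP process (i.e. with g 2 = 0 and jump rate λ(x) = 1 + g 1 (x), taking g 1 (x) = x as an illustrative choice): X grows at unit speed and halves at rate λ(X). Jump times are sampled by thinning over short windows, using that the rate is non-decreasing along the deterministic drift, so λ(x + window) bounds the rate over each window.

```python
import random

def tcp_trajectory(lam, x0=1.0, t_max=30.0, window=0.5, seed=4):
    """TCP process sketch: X grows linearly at speed 1 and halves at the
    state-dependent rate lam(x).  Jump times are sampled by thinning with
    the local bound lam(x + window), valid on each window because lam is
    assumed non-decreasing."""
    rng = random.Random(seed)
    x, t = x0, 0.0
    jumps = 0
    while t < t_max:
        bound = lam(x + window)               # rate bound over [t, t + window]
        s = rng.expovariate(bound)
        if s > window:
            x += window; t += window          # no proposed jump in the window
        else:
            x += s; t += s
            if rng.random() < lam(x) / bound: # accept: X halves
                x /= 2.0
                jumps += 1
    return x, jumps

final_x, n_jumps = tcp_trajectory(lambda x: 1.0 + x)   # g_1(x) = x, illustrative
```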
In the following we suppose that g 2 (x) K exp(ρx) for some K, ρ > 0, and that sup t 0 µ t (V) < ∞, where V(x) = exp(ρx) . In particular, (4.9) holds, so that the associated process and semi-group are well defined.
Then, for x R := inf{x 0, g 1 (x) 2ρ}, In particular, for any m 0 ∈ P V (R + ), t 0 and x 0, Note that, simply by changing ρ to 2ρ, the same computations show that there also exists Ĉ such that Besides, for all ν 1 , ν 2 ∈ P(R + ), denoting (Y, Ỹ ) an optimal coupling of ν 1 and ν 2 such as given by Lemma 3.1, Let m 0 ∈ P V (R + ) and t → µ 1 t , µ 2 t be such that µ i t (V) m 0 (V) ∨ C for i = 1, 2 and t 0. Consider the synchronous coupling (X t , X̃ t ) of m 0 R µ 1 0,t and m 0 R µ 2 0,t with X 0 = X̃ 0 ∼ m 0 , as defined in Section 3.2. Then 2E e ρ(X0+t) P X t ≠ X̃ t | X 0 .
For t large enough, from (4.11), m t (V 2 ) Ĉ + 1, so we can now restrict the study to initial distributions that satisfy m 0 (V 2 ) Ĉ + 1. Then, with the same argument used to establish (4.13), there exists C > 0 such that for all m 0 , h 0 with m 0 (V 2 ), h 0 (V 2 ) Ĉ + 1 and all t 0 0, which is the "splitting estimates" part of the proof of Theorem 2.9. Since the Lyapunov contraction (4.10) has already been established, what remains to obtain the "merging estimates" part, i.e. the equivalent of Lemma 3.6, is just a time t 0 and a probability α > 0 to merge two processes in a time t 0 , given that they started in some compact set. For the linear processes (i.e. with g 2 = 0), this has been done in [2,13]. As in Section 3.2.1, to couple two non-linear processes, we simply couple two linear processes (i.e. with jump rate 1 + g 1 ) and we hope that no non-linear jump occurs in the time interval [0, t 0 ]. In the present case, however, we should be cautious, since the non-linear jump rate is not uniformly bounded in x. Nevertheless, if the starting points are taken in [0, R 0 ] for some R 0 > 0, then during the time interval [0, t 0 ] the processes remain in [0, R 0 + t 0 ], on which the non-linear jump rate is bounded by (4.14). Thus, conditionally to the fact that they started in [0, R 0 ], the probability that two coupled non-linear processes with initial laws m 0 , h 0 that satisfy m 0 (V 2 ), h 0 (V 2 ) Ĉ + 1 have merged at time t 0 is uniformly bounded away from 0. Then the strategy of the proof of Theorem 2.9 can be adapted to get, provided that K is small enough, that (4.8) admits a unique equilibrium, toward which all solutions with m 0 ∈ P V converge exponentially fast.

Markov processes with delay
We claimed in the introduction that the coupling arguments used in Section 3 to deal with non-linear equations can also be used to deal with self-interacting processes. A general study of processes (X t ) t 0 interacting with a weighted empirical distribution for all t 0 such that ∫ 0 t w(t − s)ds ≠ 0 exceeds the scope of the present paper, and is thus postponed to future work. Nevertheless, as a proof of principle, consider the simple case where w is a Dirac mass at some given time t 0 . In other words, X t follows some Markovian dynamics and jumps at a rate and to a position that depend on X t−t0 . In particular, Y t = (X s ) s∈[t−t0,t] is a Markov process. Nevertheless, it is not necessary to consider the somewhat complicated process (Y t ) t 0 to obtain a speed of convergence toward equilibrium for the law of X t . If there is some probability to couple two non-delayed processes starting at different positions and if the delayed jump rate is bounded, then there is some probability to merge two delayed processes by some positive time t 1 , and then there is some probability that no delayed jump occurs in a time interval of length t 0 . If that happens, then after time t 0 + t 1 the two processes share the same position and the same memory, and they will stay equal forever. An exponential convergence toward some equilibrium is then obtained for the law of X t (which is not the solution of an integro-differential equation).
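As a proof of concept, a process whose jump rate depends on its state a fixed delay t 0 in the past can be simulated with a finite buffer storing the trajectory over the last window of length t 0 . The drift, the noise and the (bounded) delayed rate below are illustrative assumptions of this sketch.

```python
import random
from collections import deque

def delayed_jump_process(t0=1.0, dt=0.01, t_max=20.0, seed=5):
    """Sketch of a delayed jump process: X follows an Ornstein-Uhlenbeck-type
    drift toward 0 and jumps to 0 at the bounded rate lam(X_{t - t0}),
    evaluated on a buffer holding the last t0/dt discretized states."""
    rng = random.Random(seed)
    lam = lambda z: 1.0 / (1.0 + z * z)           # bounded delayed jump rate
    x = 1.0
    n_buf = int(t0 / dt)
    history = deque([x] * n_buf, maxlen=n_buf)    # states over [t - t0, t)
    t = 0.0
    while t < t_max:
        delayed = history[0]                      # state at time t - t0
        if rng.random() < lam(delayed) * dt:      # delayed jump (Euler thinning)
            x = 0.0
        else:                                     # mean-reverting diffusion step
            x += -x * dt + 0.1 * rng.gauss(0.0, dt ** 0.5)
        history.append(x)
        t += dt
    return x

final_state = delayed_jump_process()
```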