CONVERGENCE OF A PARTICLE APPROXIMATION FOR THE QUASI-STATIONARY DISTRIBUTION OF A DIFFUSION PROCESS: UNIFORM ESTIMATES IN A COMPACT SOFT CASE

Abstract. We establish the convergence, with respect to the simulation time t, the number of particles N and the timestep γ, of a Moran/Fleming-Viot type particle scheme toward the quasi-stationary distribution of a diffusion on the d-dimensional torus, killed at a smooth rate. In this setting, quantitative bounds are obtained that, for each parameter (t → ∞, N → ∞ or γ → 0), are independent of the other two.


Introduction
Consider the diffusion on the d-dimensional torus T^d solving

dZ_t = b(Z_t) dt + dB_t, (1.1)

with b ∈ C^1(T^d), where (B_t)_{t≥0} is a d-dimensional Brownian motion. Add a killing rate λ ∈ C(T^d) and, given a standard exponential random variable E independent from (Z_t)_{t≥0}, define the death time

T = inf{ t ≥ 0 : ∫_0^t λ(Z_s) ds ≥ E }. (1.2)

Then a probability measure ν on T^d is said to be a quasi-stationary distribution (QSD) associated to the SDE (1.1) and the rate λ if, when Z_0 ∼ ν, Law(Z_t | T > t) = ν for all t ≥ 0.

Keywords and phrases: Quasi-stationary distribution, interacting particle system, Wasserstein distance, couplings, propagation of chaos.
1 LJLL, Sorbonne Université. 2 LJLL and LCT, Sorbonne Université.
In our case, there exists a unique QSD ν* and, whatever the initial distribution η_0 of Z_0, Law(Z_t | T > t) → ν* as t → ∞ (see e.g. [10], Thm. 2.1, or Corollary 2.15 below). The present work is dedicated to the proof of convergence of an algorithm designed to approximate ν*. This is classically done through a system of N interacting particles whose empirical measure converges to Law(Z_t | T > t) as N → ∞, where killed particles are resurrected in a suitable way in order to keep the size of the system constant (while a naive Monte Carlo simulation would see the sample shrink over time). This question has already been addressed by many authors in various contexts, see the discussion in Section 1.4 below. Before introducing the algorithm, stating our results and comparing them with previous works, let us simply highlight the main specificities of the present work.
The first novelty is that we take into account the time-discretization of the continuous-time diffusion. That way, we establish error bounds between the theoretical target QSD and the empirical measure actually obtained with a practical implementation of the algorithm. There are three sources of error: first, the continuous-time SDE (1.1) has to be discretized with some time-step parameter γ > 0. Second, as will be detailed below, a non-linearity in the theoretical algorithm has to be approximated by a system of N particles. This leads to the definition of an ergodic Markov chain whose invariant measure is close, in some sense, for large N, to the QSD of the time-discretization of the diffusion. But then this Markov chain is only run for a finite simulation time t = mγ, m ∈ N. A third error term then comes from the fact that stationarity is not fully achieved. We will obtain quantitative error bounds in γ, N and t.
A second specificity is that the bound obtained for each parameter will be uniform in the other two. For instance, the only other work in which the long-time convergence of the chain is proven to be, under some (restrictive) conditions, uniform in N is [13], in a finite state space. Besides, our work is quite close in spirit to this work of Cloez and Thai. The question of the dependency or uniformity of the estimates in other previous works will be further discussed in Section 1.4.
Finally, although it was not the primary motivation of the present work, it seems that the particular definition of the system of interacting particles considered here, in particular the rebirth mechanism, was not considered in previous works (where, basically, killed particles are resurrected at the position of one of the other particles). Our variant is initially motivated by the property stated in Proposition 1.1 below, which was indicated to the second author by Bertrand Cloez. Yet, this variant has the unintended advantage of being both discrete in time and non-failable, in the sense that it is well-defined for all times, even if all particles die simultaneously from time to time (see also [37] on this question).
Note that we restrict the study to a compact state space. Moreover, we only consider soft killing at some continuous rate, and no hard killing, which would correspond to the case where T is the escape time from some sub-domain (see e.g. [4, 21]). Finally, as will be seen below, as far as the long-time behaviour of the process is concerned, we will work in a perturbative regime: namely, we will assume that the variations of λ are small with respect to the mixing time of the diffusion (1.1) (while ‖λ‖_∞ itself is not required to be small). These very restrictive conditions, which rule out many cases of practical interest, have to be considered in light of our very strong results (Thms. 1.2 and 1.3 below and all the corollaries of Sect. 2.5, gathered in Fig. 1). In fact, although already interesting by itself, this restricted framework can be thought of as a toy model motivated in particular by the case that arises in the parallel replica algorithm [29]. In that case, T is the escape time for (1.1) from a bounded metastable domain, so that the lifespan of the process is expected to be larger than its mixing time (and to depend little on the initial condition, provided it is far enough from the boundary). Hence, the compact and perturbative assumptions are consistent with this objective. The restriction to a smooth killing rate, however, is made to avoid additional difficulties in the hard case where, even in the metastable case, the probability to leave the domain is high (and exhibits high variations) when the process is close to the boundary. The initial motivation of the present study was to test the general strategy of the proof (via coupling arguments) in a first simple case, with the goal of extending it later on to the metastable hard case by combining it with Lyapunov arguments to control the variations of the killing rate near the boundary. This is postponed to future work.
This work is organized as follows. The algorithm and main results are presented in Section 1.3, and the relation with previous works is discussed in Section 1.4. Section 2 contains the proofs, and more precisely: a general coupling argument, which is the central tool for all our results, is presented in Section 2.1; the basic bounds in terms of t → +∞, N → +∞ and γ → 0 are then stated and proven respectively in Sections 2.2, 2.3 and 2.4; finally, these basic results are combined in Section 2.5, concluding the proofs of the main theorems and yielding a number of corollaries.

Notations and conventions
We respectively denote P(F) and B(F) the set of probability measures and of Borel sets of a Polish space F. Functions on T^d are sometimes identified with [0, 1]^d-periodic functions, and similar non-ambiguous identifications are performed: for instance, if x ∈ T^d and G is a d-dimensional standard Gaussian random variable, x + G has to be understood in T^d, etc. A Markov kernel Q on F is indiscriminately understood as, first, a function from F to P(F), in which case we denote Q : x ↦ Q(x, ·); second, a Markov operator on bounded measurable functions on F, in which case we denote Q : f ↦ Qf, where Qf(x) = ∫_F f(w) Q(x, dw); third, by duality, a function on P(F), in which case we denote Q : µ ↦ µQ, where µQ(A) = ∫_F Q(x, A) µ(dx). We denote E(1) the exponential law with parameter 1, U(I) the uniform law on a set I and N(m, Σ) the Gaussian law with mean m and covariance matrix Σ. We use bold letters for random variables in T^{dN} and decompose them in d-dimensional coordinates, like X = (X^1, ..., X^N) with X^i ∈ T^d, or X_1 = (X^{1,1}, ..., X^{N,1}).

The algorithm and main result
Starting from the diffusion (1.1) killed at time T given by (1.2), we introduce two successive approximations. The first is time discretization. For a given time step γ > 0 and a sequence (G_k)_{k∈N} of independent random variables with law N(0, I_d), we consider the Markov chain on T^d given by Ẑ_0 = Z_0 and

Ẑ_{k+1} = Ẑ_k + γ b(Ẑ_k) + √γ G_{k+1}, (1.3)

and, given E ∼ E(1) independent from (G_k)_{k∈N} and Z_0,

T̂ = γ min{ n ∈ N : γ ∑_{k=1}^n λ(Ẑ_k) ≥ E }.

From classical results for Euler schemes of diffusions (see e.g. [33]), it is quite clear that, for any A ∈ B(T^d) and all t ≥ 0, P(Ẑ_{⌊t/γ⌋} ∈ A, T̂ > t) converges to P(Z_t ∈ A, T > t) as γ → 0, from which, for all t ≥ 0, Law(Ẑ_{⌊t/γ⌋} | T̂ > t) converges to Law(Z_t | T > t) (we will prove this, see Cor. 2.11 below). Note that, from the memoryless property of the exponential law, given a sequence (U_k)_{k∈N} of independent variables uniformly distributed over [0, 1] and independent from (G_k)_{k∈N} and Z_0, ((Ẑ_n)_{n∈N}, T̂) has the same joint distribution as ((Ẑ_n)_{n∈N}, T̃) with

T̃ = γ min{ n ∈ N : U_n < p(Ẑ_n) },

where p(z) = 1 − exp(−γλ(z)) is the probability that, arriving at state z, the chain is killed. A naive Monte Carlo sampler for the QSD would be to simulate N independent copies of the chain (1.3) killed with probability z ↦ p(z) and to consider, after a large number of iterations, the distribution of the copies that have survived. However, after a long time, most copies (possibly all) would have died and the estimator would be very bad. To tackle this issue, we have to introduce a rebirth mechanism to reincorporate dead particles in the system.
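To fix ideas, the discretized killed dynamics can be sketched in a few lines. The following is a purely illustrative simulation on the one-dimensional torus (the drift b, the rate λ, the function name and all numerical values are our own choices, not taken from the paper): at each step the chain performs an Euler-Maruyama move modulo 1 and is then killed with probability p(z) = 1 − exp(−γλ(z)).

```python
import math
import random

def euler_killed_chain(z0, b, lam, gamma, n_steps, rng=None):
    """Simulate the Euler chain (1.3) on the 1-d torus [0, 1), killed at each
    step with probability p(z) = 1 - exp(-gamma * lam(z)).  Returns the
    trajectory up to (and including) the death step together with the death
    index, or the full trajectory and None if the chain survives."""
    rng = rng or random.Random(0)
    z = z0 % 1.0
    traj = [z]
    for n in range(1, n_steps + 1):
        g = rng.gauss(0.0, 1.0)                              # G_n ~ N(0, 1)
        z = (z + gamma * b(z) + math.sqrt(gamma) * g) % 1.0  # Euler step on T^1
        traj.append(z)
        if rng.random() < 1.0 - math.exp(-gamma * lam(z)):   # killed on arrival
            return traj, n
    return traj, None
```

Running N independent copies of this chain and discarding the dead ones is exactly the naive Monte Carlo sampler discussed above, whose sample shrinks over time.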
Denote K : T^d → P(T^d) the Markov kernel associated with the transition (1.3), i.e. K(x, ·) = N(x + γb(x), γ I_d).
For µ ∈ P(T^d), let Q_µ be the Markov kernel such that, for all x ∈ T^d, Q_µ(x, ·) is the law of the random variable X defined as follows. Let (X̃_k, U_k)_{k∈N} be a sequence of independent random variables such that, for all k ∈ N, U_k ∼ U([0, 1]), X̃_0 ∼ K(x, ·) and, for k ≥ 1, X̃_k ∼ µK. Let H = min{k ∈ N : U_k ≥ p(X̃_k)} and set X = X̃_H. Since λ is bounded, p is uniformly bounded away from 1 and thus H is almost surely finite, so that Q_µ is well-defined.
In other words, a random variable X ∼ Q_µ(x, ·) may be constructed through the following algorithm (in which "new" means: independent from all the variables previously drawn).
1. Draw X̃_0 ∼ N(x + γb(x), γ I_d) and a new U_0 ∼ U([0, 1]).
2. If U_0 ≥ p(X̃_0), set X = X̃_0 in T^d (in that case, we say the particle has moved from x to X̃_0 without dying).
3. If U_0 < p(X̃_0), then set i = 1 and, while X is not defined, do:
(a) Draw a new X_i distributed according to µ, a new X̃_i ∼ N(X_i + γb(X_i), γ I_d) and a new U_i ∼ U([0, 1]).
(b) If U_i ≥ p(X̃_i), set X = X̃_i in T^d (in that case, we say the particle has died, resurrected at X_i, moved to X̃_i and survived).
(c) If U_i < p(X̃_i), set i ← i + 1 (in that case, we say the particle has died, resurrected at X_i, moved to X̃_i and died again) and go back to step (a).
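The steps above can be sketched in code. The following is a hypothetical one-dimensional sampler for Q_µ, with µ given through a finite list of points (an empirical measure); the function names and the one-dimensional torus setting are our own illustrative choices.

```python
import math
import random

def sample_Q_mu(x, mu_points, b, lam, gamma, rng=None):
    """Sample X ~ Q_mu(x, .) on the 1-d torus [0, 1), following steps 1-3:
    move once by an Euler step and, while killed, resurrect on a point of
    mu (here an empirical measure given by `mu_points`) and move again."""
    rng = rng or random.Random(0)
    def p(z):                       # probability of being killed on arrival at z
        return 1.0 - math.exp(-gamma * lam(z))
    def euler(y):                   # one step of the Euler chain (1.3) on T^1
        return (y + gamma * b(y) + math.sqrt(gamma) * rng.gauss(0.0, 1.0)) % 1.0
    z = euler(x)                    # step 1: move from x
    while rng.random() < p(z):      # killed: resurrect on mu, move again
        z = euler(rng.choice(mu_points))
    return z                        # first arrival position that survives
```

Since λ is bounded, p is uniformly bounded away from 1, so the loop terminates almost surely, mirroring the almost-sure finiteness of H above.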
From this, we define a chain (Y_k)_{k∈N} as follows. Set Y_0 = Z_0 and suppose that Y_k has been defined for some k ∈ N; denoting η_k the law of Y_k, draw Y_{k+1} ∼ Q_{η_k}(Y_k, ·). This somewhat intricate definition is motivated by the following result (whose proof is postponed to Sect. 2):

Proposition 1.1. For all n ∈ N, the law η_n of Y_n is ν_n := Law(Ẑ_n | T̃ > γn).

In particular, as n → ∞, the law η_n of Y_n converges toward the QSD of Ẑ. Unfortunately, it is impossible to sample (Y_k)_{k∈N} in practice since this would require to sample according to η_k for any k ∈ N. This is a classical case of a time-inhomogeneous Markov chain which is interacting with its own law or, similarly, of a measure-valued sequence (η_k)_{k∈N} with a non-linear evolution. Such processes arise in many applications, see e.g. [15, 16] and references within. Motivated by the Law of Large Numbers, we are led to a second approximation, which is to use mean-field interacting particles. For a fixed N ∈ N* and x = (x^1, ..., x^N) ∈ T^{dN}, denote π(x) = (1/N) ∑_{i=1}^N δ_{x^i} the associated empirical distribution. Then we define the Markov operator R on T^{dN} as

R(x, ·) = Q_{π(x)}(x^1, ·) ⊗ · · · ⊗ Q_{π(x)}(x^N, ·).

In other words, a random variable X ∼ R(x, ·) has independent coordinates X^i ∼ Q_{π(x)}(x^i, ·), i ∈ [[1, N]]. In order to specify the parameters involved, we will sometimes write R_{N,γ} for R.
Let us informally describe the transitions of such a Markov chain (X_k)_{k∈N}: the i-th particle follows the transition given by (1.3) independently from the other particles until it dies. If it dies at a step k ∈ N*, then it is resurrected on another particle X^{J,k−1} with J uniformly distributed over [[1, N]] (in particular, and contrary to most previous works on similar algorithms, J = i is not excluded, although this doesn't change much since its probability vanishes as N → ∞) and immediately performs a step of (1.3); if it dies again after this step, it is resurrected again and performs a new step, and so on until it survives a resurrection followed by an Euler scheme step. The resulting position is the new value X^{i,k}, from which the particle again follows the transitions (1.3) until its next death, etc.
Note that there is no problem with simultaneous deaths, since at step k the particles are resurrected on positions at step k − 1, which are well-defined even if all particles die at once at step k.
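The full particle transition R can then be sketched as follows, again in a one-dimensional, illustrative implementation with hypothetical names: each particle performs an Euler step and, while killed, is resurrected on a position of the system frozen at the previous step, so the update is well defined even if every particle dies at once.

```python
import math
import random

def particle_step(xs, b, lam, gamma, rng=None):
    """One transition of the N-particle chain X_k ~ R(X_{k-1}, .) on the 1-d
    torus: each particle moves by an Euler step and, while killed, is
    resurrected on a particle position *from the previous step*."""
    rng = rng or random.Random(0)
    prev = list(xs)                  # positions at step k-1, frozen
    def euler(y):                    # one step of (1.3) on T^1
        return (y + gamma * b(y) + math.sqrt(gamma) * rng.gauss(0.0, 1.0)) % 1.0
    def killed(z):                   # killed with probability p(z) on arrival
        return rng.random() < 1.0 - math.exp(-gamma * lam(z))
    new = []
    for x in prev:
        z = euler(x)
        while killed(z):             # die -> resurrect on prev positions, move again
            z = euler(rng.choice(prev))
        new.append(z)
    return new
```

Because resurrections only read `prev`, the loop never references a position that is still being updated, which is exactly the simultaneous-death robustness noted above.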
It is easily seen that R admits a unique invariant measure toward which the law of the associated Markov chain converges exponentially fast (in the total variation sense, for instance), but a naive argument yields a convergence rate that heavily depends on N (and possibly γ). Similarly, classical studies can be conducted for the limits N → ∞ and γ → 0, but again with estimates that are typically exponentially bad with respect to the total simulation time (see the references in Sect. 1.4 or Props. 2.5 and 2.8). In the following we will focus on a somewhat perturbative regime under which we will establish estimates for each of these limits that are uniform with respect to the other parameters. Even for the continuous-time process (corresponding to γ = 0, see Sect. 2.4 for the definition), such uniform results are new (see Cors. 2.10 and 2.12).
Recall that the W_1 Wasserstein distance between µ, ν ∈ P(T^d) is defined by

W_1(µ, ν) = inf{ E[|X − Y|] ; X ∼ µ, Y ∼ ν }. (1.4)

More generally, for ρ a distance on some Polish space F, denote W_ρ the corresponding Wasserstein distance on P(F), defined by

W_ρ(µ, ν) = inf{ E[ρ(X, Y)] ; X ∼ µ, Y ∼ ν }.

If X ∼ µ and Y ∼ ν, we call (X, Y) a coupling of µ and ν. If (X, Y) is a coupling for which the infimum in (1.4) is attained, we say that it is an optimal coupling. From ([39], Cor. 5.22), such an optimal coupling always exists.
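For intuition about definition (1.4): on the real line, the W_1 distance between two empirical measures with the same number of atoms is realized by the monotone coupling, i.e. by sorting both samples and pairing them in order. This standard one-dimensional fact ignores the torus topology and is only meant as an illustration.

```python
def w1_empirical(xs, ys):
    """W_1 distance between two empirical measures with the same number of
    atoms on the real line: the optimal coupling pairs the sorted samples
    monotonically (specific to d = 1; does not account for the torus)."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

For instance, two samples that are permutations of each other are at distance 0, and translating every atom by c moves the empirical measure by exactly c in W_1.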
Our first main result is a long-time convergence rate uniform in N:

Theorem 1.2. There exist c_1, c_2, γ_0 > 0 and a distance ρ on T^d equivalent to the Euclidean distance, that depend only on the drift b and the dimension d, such that, if λ is Lipschitz with constant L_λ and κ is given by (1.5), then the following holds: for all γ ∈ (0, γ_0], N ∈ N and all µ, ν ∈ P(T^{dN}), considering the distance ρ_N introduced in Section 2.2,

W_{ρ_N}(µR_{N,γ}, νR_{N,γ}) ≤ (1 − γκ) W_{ρ_N}(µ, ν).

As a consequence, there exists C > 0 that depends only on b and d such that for all γ ∈ (0, γ_0], m, N ∈ N and all µ, ν ∈ P(T^{dN}),

W_1(µR^m_{N,γ}, νR^m_{N,γ}) ≤ C (1 − γκ)^m W_1(µ, ν).

This means that, with respect to the metric ρ_N, R_{N,γ} has a Wasserstein curvature of γκ in the sense of [28]. Theorem 1.2 is proven in Section 2.2. From this first result, similar bounds can be obtained for large N and small γ (see Sects. 2.3 and 2.4). Combining all these results eventually yields a quantitative bound on the error made in practice by approximating ν* by the empirical distribution of the particle system:

Theorem 1.3. Under the conditions of Theorem 1.2, suppose that κ given by (1.5) is positive. Then an explicit error bound holds between ν* and the empirical distribution of the particle system, exponential in the simulation time mγ, of order N^{−α} in the number of particles and of order √γ in the timestep.

All the constants in Theorems 1.2 and 1.3 (and all other results stated in this work) are explicit. More precisely, c_1 and γ_0 come from Corollary 2.2 of [32] (see Prop. 2.1), where an explicit value is given, and all the other constants involved in our results can be tracked by following the explicit computations.
In Theorem 1.3, the speeds of the different convergences (exponential in the simulation time, square-root in the timestep, and polynomial with exponent α in the number of particles) are optimal, since they are already optimal for non-interacting diffusions (i.e. the case λ = 0); see in particular [26] for the large-N asymptotic.
Other intermediary results, interesting by themselves, will be established in the rest of the paper: propagation of chaos (i.e. N → ∞) and the continuous-time limit at a fixed time (even without the condition κ > 0), respectively in Propositions 2.5 and 2.8. From these, results for the continuous-time process (γ = 0), the equilibria (t = ∞) or the non-linear process (N = ∞), or when two parameters among the three are sent to their limits, are simple corollaries, see Section 2.5. All these results are summarised in Figure 1 at the end of this work.
Note that exp(−γλ(x)) is the probability that the chain is not killed when it arrives at state x. The time step γ should be chosen in such a way that this probability is relatively large, say at least one half. In that case, exp(γ‖λ‖_∞) is typically close to 1. In other words, the positivity of κ given by (1.5) is mostly a condition about L_λ being small enough.
This perturbation condition is different from the one considered in [34], where ‖λ‖_∞ rather than L_λ is supposed to be small (while our main arguments are a direct adaptation of the coupling arguments of [34]). This difference comes from the fact that, in the present study, we work with the W_1 distance rather than the total variation one (which is a Wasserstein distance, but associated with the discrete metric d(x, y) = 1_{x≠y}). Indeed, in our coupling arguments, we need to control |λ(x) − λ(y)|, the difference between the death rates of two processes at different locations, which is bounded here by L_λ|x − y| and in [34] by 2‖λ‖_∞ 1_{x≠y}. In fact, our argument for the long-time convergence may easily be adapted to the total variation framework, following [34]. Nevertheless, this would be more troublesome in the study of the limit N → ∞. There, one needs to couple η_k (which admits a density with respect to the Lebesgue measure) with π(X_k) (which is a sum of Dirac masses), so that the total variation distance is not adapted. This may be solved by considering W_1 → total variation regularization results for (Euler schemes of) diffusions, which can be established by coupling arguments again. Nevertheless, in order to focus on the other difficulties of the problem and for the sake of clarity, we decided to stick to the W_1 distance in all the results of this work. Similar Wasserstein coupling arguments have been used in [13] on a similar problem (see next section) and in [41] for a different kind of mean-field interacting particle system (also with a similar perturbative condition, corresponding to the fact that σ in [41], Prop. 3.1, has to be positive, i.e. the interaction should be small with respect to the independent mixing).
Notice that, among all possible discretization schemes, we only considered the explicit Euler-Maruyama one. This choice was made for simplicity, but the proofs could be extended to other usual schemes. The main ingredient required is a Wasserstein curvature of order γ for a modified W_1 distance (see Prop. 2.1, based on [32], Cor. 2.2). Similarly, we only considered the case of an elliptic diffusion process with a constant diffusion matrix for simplicity (since we use [32], Cor. 2.2, which covers this case), although a similar Wasserstein contraction certainly holds in a much more general framework (even in hypoelliptic, non-elliptic settings, as in continuous time [22]). As stated in the introduction, the present paper does not aim at the broadest generality; by avoiding technicalities, we want to highlight the main issue (i.e. the question of the uniformity of the bounds in the various parameters).

Related works
The use of particle systems with death and rebirth to approximate the QSD of a Markov process was introduced in [7], for two-dimensional Brownian motions killed at the boundary of a box. This work refers to the system as a Fleming-Viot process. However, in the lecture notes of Dawson [14], a (continuous-time) system of N particles that move independently according to some Markov dynamics and interact through a sampling-replacement mechanism is called a Moran particle process, while the term Fleming-Viot process refers to a measure-valued (continuous-time) process that can be obtained as the limit of the Moran particle system as the number of particles goes to infinity. Besides, with these definitions, the empirical measure of a Moran particle process is nothing but a Fleming-Viot process in the particular case where the initial condition is a sum of N Dirac measures. Both the initial works of Moran [35] and Fleming and Viot [25] are motivated by population genetics models.
The seminal work [7] is a numerical study so that, although a continuous-time, continuous density-valued process is targeted, what is really implemented is in fact a discrete-time particle system. Since then, the use of similar processes in numerical schemes (for killed processes or more general non-linear problems such as non-linear filtering, rare-events analysis and so on [15, 16]) has been widely studied. Although the term Moran particle system is used in [18] and a few other works, most studies concerned with quasi-stationary distributions refer to the Fleming-Viot particle system, see e.g. [2, 9, 11, 13, 21, 23, 27, 30, 31] (for continuous-time processes, which thus correspond to the process introduced in Section 2.4, i.e. the limit γ → 0).
Different frameworks have been considered: finite space, discrete infinite space, compact and non-compact continuous space; continuous and discrete time; hard or soft killing, or more general non-linear evolutions. There are also some variants in the precise definition of the rebirth mechanism (as mentioned in the introduction, our specific scheme, where particles killed at step k are resurrected on the position of a particle at step k − 1 and then perform an Euler step conditioned not to be killed again, seems to be new). Disregarding these differences, let us discuss the kind of results established in the existing literature.
A first set of works, starting shortly after the initial numerical study of [7], is concerned with finite-time propagation of chaos, either for the marginal laws or at the level of a trajectory in the Skorohod topology [2, 8, 18, 23, 27, 40], possibly with a precise CLT [9, 21]. Similarly, propagation of chaos and/or CLTs as N → +∞ at stationarity (i.e. for the invariant measure of the Fleming-Viot particle system) are established e.g. in [2, 30]. Uniform-in-time propagation of chaos is established in [17, 19, 38]. Contrary to our results, this uniformity in time is not obtained from a long-time convergence of the particle system at a speed independent of N, but rather from the long-time convergence of the limit (N = +∞) non-linear process. This long-time convergence for the non-linear process (or, equivalently according to Proposition 1.1, for the process conditioned to survival) has recently been studied in general settings, in particular in a series of works by Champagnat, Villemonais and coauthors, see e.g. [3, 10, 12, 20] and references within. The idea of combining a finite-time convergence as N → +∞ with a long-time convergence of the limit process to obtain a convergence uniform in time dates back at least to [36]. Remark that, by combining uniform-in-time propagation of chaos estimates with the long-time convergence of the limit process, it is possible to obtain results in the spirit of Theorem 1.3, i.e. giving an error bound between the empirical measure of the chain simulated in practice and the target QSD, even if no long-time convergence of the particle system is available.
Contrary to the present paper, most of these previous works do not consider a perturbative framework where a condition similar to the positivity of κ given by (1.5) would be required. Such a condition is considered in [13], which, together with the present paper, is the only work that establishes a long-time convergence rate uniform in N for the particle system. Remark that geometric ergodicity for the particle system, stated for instance in [37], is usually easy to obtain (for a fixed N) from classical tools on Markov chains. Obtaining a rate that is uniform in N is much harder (hence the perturbative framework). Besides, for another class of mean-field particle systems, the McKean-Vlasov diffusions (for which the interaction is induced by an interaction potential force in the drift of the diffusion), it is well known that there are cases (for instance a non-convex confining potential with a convex interaction, at low temperature) where the non-linear limit system has several equilibria and the convergence time of the particle system (which has a unique invariant measure for all N) goes to +∞ with N. In fact we don't expect this to happen for the Fleming-Viot particle system in our context (compact space, elliptic diffusion, smooth killing), where the QSD is unique and the long-time convergence of the limit process and the uniform-in-time propagation of chaos have been established in non-perturbative cases (for a discussion of other cases where the QSD may not be unique, we refer to [1, 11]). This may indicate that the uniform-in-N long-time convergence could hold in much more general cases, far from the perturbative regime around the non-interacting case. In that case, our perturbative condition (and the one of [13]) would just be an artifact of the particular, non-optimal proof. However, the long-time convergence of the limit process does not directly imply the uniform convergence for the interacting system, so this is still an open question, and our results and those of [13] are the only ones of their kind. At least, we can say that we have no reason to think that our condition κ > 0, with the explicit expression of κ, is sharp in any way.
The differences between our work and [13] are the following. The latter is concerned with a continuous-time Markov chain on a finite state space rather than a diffusion on the torus. Moreover, it requires a strong Doeblin condition: the parameter λ in [13] is required to be positive, which implies that, for any pair of states i, i′, there is a positive probability to go either from i to i′, or from i′ to i, or from both i and i′ to some third state j. Relatedly, in [23], although uniform-in-N long-time convergence is not stated or proven, a coupling argument similar to ours or to that of [13] is used (in [23], Prop. 3.1) to obtain uniform-in-time propagation of chaos estimates. This work is concerned with a countable state space under an even more restrictive Doeblin condition than [13] (there exists at least one state i for which the transition rate from j to i is uniformly bounded over all other states j), and the uniform-in-time result requires a perturbative condition (α > C in [23], corresponding to λ > 0 in [13] and κ > 0 for us). We remark that, in a countable discrete state space, our arguments can easily be adapted to obtain a uniform long-time convergence under a condition of positive Wasserstein curvature for some distance (similarly to Prop. 2.1), which is much less restrictive than the Doeblin conditions of [13, 23]. Of course, in that case we still need λ to be Lipschitz (with respect to the distance for which the Wasserstein contraction holds) with a sufficiently small Lipschitz constant (with respect to the curvature).
As far as the time-discretization error is concerned, we are not aware of results similar to ours in previous works, but we refer the reader to [24] for weak error studies à la Talay-Tubaro for some non-linear evolutions, and to references within for more details.
Finally, let us mention another related set of works, [4-6], based on self-interacting processes. Indeed, the reason we introduced a system of N interacting particles was to approximate some non-linearity in the evolution of a measure-valued process. Yet, when it comes to the approximation of the QSD, we are not really interested in the non-linear evolution, but only in its long-time limit. A classical idea in the field of stochastic algorithms in that case is to construct a chain similar to (Y_k)_{k∈N} except that, in the rebirth mechanism, the unknown law η_n is replaced by the occupation measure of the past trajectory (which, by ergodicity, is expected to converge to equilibrium): there is only one particle and, when it dies at time t > 0, it is resurrected at its position at a time s uniformly distributed over [0, t]. Although the algorithms are quite similar (and can be combined), their theoretical studies rely on quite distinct arguments.

Proofs
Let us first establish the preliminary result stated in the introduction.

Proof of Proposition 1.1. Since ν_0 = η_0, suppose by induction that ν_n = η_n for some n ∈ N. Keeping the notations introduced in the definition of the kernel Q_µ, consider the events Then, for all bounded measurable f, In particular, integrating with respect to µ, we obtain Applied with µ = η_n, this reads On the other hand, from which ν_{n+1} = η_{n+1}, which concludes.

The basic coupling
The long-time estimates needed to prove convergence toward equilibrium, and the uniform-in-time estimates in N and γ, are based on the fact that, as long as particles don't die, they follow the chain (1.3) which, like its continuous-time counterpart (1.1), has some mixing properties. In order to quantify the latter, we start by stating ([32], Cor. 2.2) in a suitable way in our context.

Proposition 2.1. There exist c_1, a, γ_0 > 0 (all depending only on the drift b of (1.1) and on the dimension d) such that, denoting ρ(x, y) = (1 − exp(−a|x − y|))/a for x, y ∈ T^d, ρ is a metric on T^d with

∀γ ∈ (0, γ_0], ∀µ, ν ∈ P(T^d), W_ρ(µK, νK) ≤ (1 − c_1 γ) W_ρ(µ, ν). (2.1)

Proof. This is ([32], Cor. 2.2) applied to a diffusion with smooth drift on the torus, in which case the distance on T^d for which the contraction holds is ρ.
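The concave metric of Proposition 2.1 is easy to probe numerically. Below is a small one-dimensional sketch with an arbitrary value of a (the actual constant depends on b and d); the checks compare ρ with the Euclidean distance on a set of diameter 1/2, which is the two-sided comparison underlying the equivalence of W_ρ and W_1 used in the sequel.

```python
import math

def rho(x, y, a=1.0):
    """Concave metric rho(x, y) = (1 - exp(-a |x - y|)) / a of
    Proposition 2.1, in dimension one with an illustrative value of a."""
    r = abs(x - y)
    return (1.0 - math.exp(-a * r)) / a
```

Since r ↦ (1 − e^{−ar})/a is concave, vanishes at 0 and has slope 1 there, one gets ρ(x, y) ≤ |x − y| and, on a set of diameter D, the reverse bound ρ(x, y) ≥ |x − y| ρ(D)/D by concavity.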
In the rest of the paper, ρ is the metric and c_1, a, γ_0 are the constants given by Proposition 2.1. Remark that ρ is equivalent to the Euclidean metric: since r ↦ (1 − exp(−ar))/a is a concave function with derivative 1 at zero and the diameter of T^d is √d/2,

(1 − e^{−a√d/2})/(a√d/2) |x − y| ≤ ρ(x, y) ≤ |x − y|.

In particular, W_1 and W_ρ are equivalent. Now, in the particle system, the contraction property of the chain (1.3) may be counterbalanced by the death/resurrection mechanism through which particles interact. Indeed, considering two systems of N interacting particles, for i ∈ [[1, N]] the previous result means that we can couple the i-th particles of both systems so that they get closer to one another (on average), as long as they don't die. But then, one of the two particles can die and resurrect far from the other or, even if they die simultaneously, they may resurrect far apart from one another. That being said, first, the closer they get, the easier it is to couple them so that they die simultaneously and, second, when they die simultaneously, keeping the particles close to one another amounts to performing a suitable coupling of the laws from which the particles are resurrected. This is quantified in the following proposition.
In all the rest of the paper, we suppose that λ is L_λ-Lipschitz (but not necessarily that κ given by (1.5) is positive).

Proposition 2.2. Let µ_0, µ_1, µ'_0, µ'_1 ∈ P(T^d) and let (X_0, X'_0) (resp. (X_1, X'_1)) be a coupling of µ_0 K and µ'_0 K (resp. µ_1 K and µ'_1 K). Then the following bound holds, where the coefficients q_0 and q_1 are expressed in terms of the killing probabilities.

Proof. Let (X_k, X'_k, U_k)_{k∈N} be a sequence of independent triplets of random variables such that, for all k ∈ N, U_k ∼ U([0, 1]) is independent from (X_k, X'_k), which are as defined in the proposition for k = 0 and 1 and, for k > 1, have the same distribution as (X_1, X'_1). Set H = inf{n ∈ N, U_n ≥ p(X_n)} and H' = inf{n ∈ N, U_n ≥ p(X'_n)}. Then, by considering the law of (X_k, U_k)_{k∈N} alone, it is clear that X_H ∼ µ_0 Q_{µ_1} and, similarly, X'_{H'} ∼ µ'_0 Q_{µ'_1}. Different cases are distinguished depending on the values of H and H'. In the simplest case, none of the particles dies, where we use the independence between U_0 and (X_0, X'_0). In the second case, only one particle dies, and we use that ‖ρ‖_∞ ≤ 1/a. In the third case, both particles die k ≥ 1 times. Finally, combining the computations of the last two cases, the fourth one reads similarly for k ≥ 1. Summing these four cases concludes.

Long-time convergence
For N ∈ N*, denote ρ_N the metric on T^{dN} given by

ρ_N(x, y) = (1/N) ∑_{i=1}^N ρ(x^i, y^i).

The following result is similar to the results of [13, 34, 41] and based on the same coupling argument.

Proposition 2.3. There exists c_2 > 0 (depending only on the drift b of (1.1) and on the dimension d) such that for all γ ∈ (0, γ_0], N ∈ N and all µ, ν ∈ P(T^{dN}),

W_{ρ_N}(µR, νR) ≤ (1 − γκ) W_{ρ_N}(µ, ν),

with κ given by (1.5).
Proof. It is in fact sufficient to prove this for µ = δ_x and ν = δ_y for any x, y ∈ T^{dN}. Indeed, assuming the result proven for Dirac masses, in the general case, considering (X_0, Y_0) an optimal coupling of µ and ν and (X_1, Y_1) an optimal coupling of R(X_0, ·) and R(Y_0, ·), then X_1 ∼ µR and Y_1 ∼ νR, so that the general bound follows by conditioning on (X_0, Y_0). Hence, in the following, we fix x, y ∈ T^{dN}. We want to apply Proposition 2.2: for each i ∈ [[1, N]], we consider (X̃^i, Ỹ^i) an optimal coupling of K(x^i, ·) and K(y^i, ·), which satisfies the contraction of Proposition 2.1. If J ∼ U([[1, N]]) is independent from the (X̃^i, Ỹ^i)'s, we remark that (X̃^J, Ỹ^J) is a coupling of π(x)K and π(y)K. Proposition 2.2 applied with these couplings yields, for all i ∈ [[1, N]], a bound which, after conditioning on the value of J, summing over i ∈ [[1, N]] and applying (2.1), gives the announced contraction. As a direct consequence, Proposition 2.3 gives the first part of Theorem 1.2, with κ that depends neither on N nor on γ. Provided κ > 0, and since P(T^{dN}) is complete for W_1 (hence for W_{ρ_N}), the Banach fixed-point theorem then implies that R admits a unique invariant measure, toward which it converges at rate γκ.
In the rest of the paper, κ is given by (1.5) (but is not necessarily assumed positive).
Proof of Theorem 1.2. The first part of the theorem has already been proven in Proposition 2.3. The last statement then follows from the first part, the equivalence between the Euclidean distance and $\rho^N$, and the following fact, valid for all $\nu, \mu \in \mathcal{P}(\mathbb{T}^{dN})$:

Propagation of chaos
Recall that $\eta_k$ is the law at time $k$ of the non-homogeneous Markov chain $(Y_k)_{k\in\mathbb{N}}$ on $\mathbb{T}^d$ introduced in Section 1.3, with transition kernels $Q_{\eta_k}$ and initial condition $\eta_0$, and that $R = R_{N,\gamma}$ is the transition kernel of the Markov chain $(X_k)_{k\in\mathbb{N}}$ on $\mathbb{T}^{dN}$.

Lemma 2.4. There exists $C_1 > 0$ such that for all $N \in \mathbb{N}$, $\gamma \in (0, \gamma_0]$, $\eta \in \mathcal{P}(\mathbb{T}^d)$ and $\mu \in \mathcal{P}(\mathbb{T}^{dN})$,

Proof. Similarly to the proof of Proposition 2.3, we start with the case $\mu = \delta_x$ for some $x \in \mathbb{T}^{dN}$. Let

From Proposition 2.2 (bounding $q_0 \le \max p \le \gamma\|\lambda\|_\infty$ and $1 - q_1 \ge 1 - \max p \ge \exp(-\gamma_0\|\lambda\|_\infty)$),

Now, in the general case where $\mu$ is not a Dirac mass, considering $Z_0 \sim \mu$ and $(Z_1, Z_2)$ an optimal coupling of $R(Z_0, \cdot)$ and $Q_\eta^{\otimes N}(Z_0, \cdot)$, and conditioning with respect to $Z_0$,

Proposition 2.5. There exist $C_2, C_3 > 0$ such that for all $N \in \mathbb{N}$, $\gamma \in (0, \gamma_0]$, $m \in \mathbb{N}$ and $\eta_0 \in \mathcal{P}(\mathbb{T}^d)$, first,

and second, if $(X_k)_{k\in\mathbb{N}}$ is a Markov chain with initial distribution $\eta_0^{\otimes N}$ and transition kernel $R$, then

Remark that when κ > 0, $\gamma \sum_{s=1}^m (1-\gamma\kappa)^{s-1} \le 1/\kappa$, so that (2.3) and (2.4) yield estimates that are uniform in time. On the contrary, when κ < 0, the estimates are exponentially bad in $t = m\gamma$.
Proof. We start with the proof of (2.3), for $m \ge 1$ (the case $m = 0$ being trivial). From the triangle inequality, Proposition 2.3 and Lemma 2.4,

Since $W_\rho \le W_1$, estimating the last term is a classical question, namely to bound the expected Wasserstein distance between the empirical measure of a sample of $N$ independent and identically distributed random variables and their common law. From ([26], Thm. 1) (and since on the torus the moments of probability measures are uniformly bounded), there exists some $C > 0$, independent from $\eta_0$, $m$, $N$ and γ, such that

Since $r_0 = 0$, a direct induction concludes the proof of (2.3).
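In dimension one this classical rate is easy to observe numerically: for two samples of equal size, the $W_1$ distance between their empirical measures is exactly the average gap between order statistics. The sketch below is purely illustrative (function names and parameter choices are ours, not the paper's) and exhibits the expected $N^{-1/2}$ decay.

```python
import random

def w1_empirical(xs, ys):
    """Exact W1 distance between the empirical measures of two equal-size
    samples: the mean gap between sorted order statistics."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def mean_w1(n, trials=50, seed=0):
    """Average W1 between two i.i.d. uniform samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        total += w1_empirical(xs, ys)
    return total / trials

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mean_w1(n))  # the distance shrinks as n grows
```

The printed distances decrease roughly like $n^{-1/2}$, the one-dimensional instance of the regime quantified in [26].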
Corollary 2.6. With the notations of Proposition 2.5, for all $k \in [\![1, N]\!]$ and $m \in \mathbb{N}$,

Proof. Let $(X, Y)$ be an optimal coupling of $\eta_0^{\otimes N} R^m$ and $\eta_m^{\otimes N}$, and let σ be uniformly distributed over the set of permutations of $N$ elements, independent from $(X, Y)$. Since the laws of $X$ and $Y$ are exchangeable, $X_\sigma = (X_{\sigma(1)}, \ldots, X_{\sigma(N)})$ has the same law as $X$; in particular, $(X_{\sigma(1)}, \ldots, X_{\sigma(k)})$ has the same law as $(X_1, \ldots, X_k)$. The same goes for $Y_\sigma$, and

Proposition 2.5 then concludes.
Corollary 2.6 means that, for any fixed k ∈ N * , as N goes to infinity, the k-marginals of the system of particles converge toward the law of k independent non-linear chains, which is the so-called propagation of chaos phenomenon.

Discrete to continuous time
We start by defining $(Y_t)_{t\ge 0}$ and $(X_t)_{t\ge 0}$, the continuous-time analogues of the chains $(Y_k)_{k\in\mathbb{N}}$ on $\mathbb{T}^d$ and $(X_k)_{k\in\mathbb{N}}$ on $\mathbb{T}^{dN}$ defined in Section 1.3. We start with the non-linear process. For $t \ge 0$, let

where $Z$ solves (1.1) with initial distribution $\eta_0$ and $T$ is given by (1.2). We define $(Y_t)_{t\ge 0}$ as follows. Set $Y_0 = Z_0 \sim \eta_0$, $T_0 = 0$, and suppose that $T_n$ and $(Y_t)_{t\in[0,T_n]}$ have been defined for some $n \in \mathbb{N}$. Let $(B_t)_{t\ge 0}$ be a new Brownian motion on $\mathbb{T}^d$ and $E \sim \mathcal{E}(1)$, independent of each other. Let $\tilde Y$ be the solution of $d\tilde Y_t = b(\tilde Y_t)dt + dB_t$ for $t \ge T_n$ with $\tilde Y_{T_n} = Y_{T_n}$, and let

For $t \in (T_n, T_{n+1})$, set $Y_t = \tilde Y_t$. Finally, draw a new $Y_{T_{n+1}}$ according to $\eta_{T_{n+1}}$. By induction, $T_n$ and $(Y_t)_{t\in[0,T_n]}$ are then defined for all $n \in \mathbb{N}$. Since λ is bounded, $T_n$ almost surely goes to infinity as $n \to \infty$, so that $(Y_t)_{t\ge 0}$ is defined for all $t \ge 0$. Similarly to Proposition 1.1, it can be established that $\mathrm{Law}(Y_t) = \eta_t$ for all $t \ge 0$. Now, as in Section 1.3, from the non-linear process $(Y_t)_{t\ge 0}$, the interacting particles $(X_t)_{t\ge 0}$ are obtained by replacing $\eta_t$ by the empirical distribution of the system when particles die and are resurrected.

More precisely, let $(E_{i,k}, B_{i,k}, J_{i,k})_{i\in[\![1,N]\!],k\in\mathbb{N}}$ be a family of independent triplets of independent random variables where, for all $i$ and $k$, $E_{i,k} \sim \mathcal{E}(1)$, $B_{i,k}$ is a Brownian motion on $\mathbb{T}^d$ and $J_{i,k}$ is uniformly distributed over $[\![1,N]\!]$. From these variables, we simultaneously define by induction the process and its death times: for all $k \in \mathbb{N}$,

The process is well-defined for all $t \ge 0$. Indeed, it is well defined for all $t < S_1 := \min\{T_{i,1},\ i \in [\![1, N]\!]\}$, the first death time of some particle, and is equal on this interval to $(\tilde X_{1,0,t}, \ldots, \tilde X_{N,0,t})$, which is continuous on $[0, S_1]$. Hence, the limits involved in (2.5) are well defined for $k = 1$ and all $i \in [\![1, N]\!]$ such that $T_{i,1} = S_1$. The algorithm above then similarly defines the process up to the second time some particles die, etc.
Remark that most of the time (2.5) simply reads $X_{i,k,T_{i,k}} = X_{J_{i,k},T_{i,k}}$ (at its $k$-th death time, the $i$-th particle is resurrected at the current position of the $J_{i,k}$-th particle). Indeed, the only case when this is not true is when the $J_{i,k}$-th particle itself dies at time $T_{i,k}$. Since the probability that two or more particles die simultaneously is zero, this almost surely only occurs if $J_{i,k} = i$, i.e. if the particle is resurrected at its own position.
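The scheme just described can be sketched in a few lines of code. Below is a minimal one-dimensional simulation; the drift $b$ and killing rate λ are illustrative placeholders (not from the paper), and a killed particle is resurrected at the current position of a uniformly chosen particle, possibly itself, as in the remark above.

```python
import math
import random

def fv_step(x, gamma, b, lam, rng):
    """One step of the Fleming-Viot scheme on the torus [0,1):
    Euler-Maruyama move, then kill each particle with probability
    1 - exp(-gamma * lam(x_i)) and resurrect it at the position of a
    particle drawn uniformly among all N (possibly itself)."""
    n = len(x)
    # Euler-Maruyama step, wrapped on the torus
    x = [(xi + gamma * b(xi) + math.sqrt(gamma) * rng.gauss(0.0, 1.0)) % 1.0
         for xi in x]
    for i in range(n):
        if rng.random() < 1.0 - math.exp(-gamma * lam(x[i])):
            # resurrection; positions are read sequentially, a minor
            # simplification of the simultaneous rule (2.5)
            x[i] = x[rng.randrange(n)]
    return x

def simulate(n=200, steps=500, gamma=0.01, seed=0):
    rng = random.Random(seed)
    b = lambda z: math.sin(2 * math.pi * z)          # illustrative drift
    lam = lambda z: 1.0 + math.cos(2 * math.pi * z)  # illustrative rate
    x = [rng.random() for _ in range(n)]
    for _ in range(steps):
        x = fv_step(x, gamma, b, lam, rng)
    return x

if __name__ == "__main__":
    xs = simulate()
    print(len(xs), min(xs), max(xs))
```

For large $N$ and a long simulation time, the empirical measure of the returned positions approximates the quasi-stationary distribution; the estimates of this paper quantify the three error sources (γ, $N$, $t$) of exactly this kind of simulation.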
Denote by $(P_t)_{t\ge 0}$ the Markov semigroup associated with $(X_t)_{t\ge 0}$, i.e. for all $t \ge 0$, $P_t$ is the Markov kernel given by

We sometimes write $P_t = P_{N,t}$ to specify the number of particles.
Lemma 2.7. There exists $C_4 > 0$ such that for all $N \in \mathbb{N}$, $\gamma \in (0, \gamma_0]$ and $\mu \in \mathcal{P}(\mathbb{T}^{dN})$,

Proof. As in the proof of Lemma 2.4, it is sufficient to treat the case $\mu = \delta_x$ with a fixed $x \in \mathbb{T}^{dN}$. Let $(X_t)_{t\ge 0}$ be defined as above from random variables $(E_{i,k}, B_{i,k}, J_{i,k})_{i\in[\![1,N]\!],k\in\mathbb{N}}$. In particular, $X_\gamma \sim \delta_x P_\gamma$.
To define $X_1 \sim \delta_x R$, for all $i \in [\![1, N]\!]$ and $k \in \mathbb{N}$, consider $(\tilde X_{i,k,t})_{t\ge 0}$ the solution to $\tilde X_{i,k,0} = x_{J_{i,k}}$ and $d\tilde X_{i,k,t} = b(\tilde X_{i,k,0})dt + dB_{i,k,t}$. Denoting

set $X_1 := (\tilde X_{1,H_1,\gamma}, \ldots, \tilde X_{N,H_N,\gamma})$. Then $(X_1, X_\gamma)$ is a coupling of $R(x, \cdot)$ and $P_\gamma(x, \cdot)$, so that

We now distinguish four cases, considering the events

that is, respectively: none of the two $i$-th particles dies; both $i$-th particles die exactly once; one particle dies but not the other; at least two deaths are involved for one of the two particles. For all

Case 1. It reduces to the classical case of diffusions, since

By the Grönwall lemma, for all $t \ge 0$, almost surely,

Since $\tilde X_{i,0,s}$ is a Gaussian variable with mean $x_i + s b(x_i)$ and variance $s$,

As a consequence, for $\gamma \le \gamma_0$,

Case 2. We bound

Similarly to (2.7),

where we used the independence of $E_{i,0}$ from $J_{i,1}$ and $(\tilde X_{i,1,t})_{t\ge 0}$. Denote $(X_{i,t})_{t\ge 0}$ the solution of

$$dX_{i,t} = b(X_{i,t})\,dt + \begin{cases} dB_{J_{i,0},0,t} & \text{for } t < T_{i,0}, \\ dB_{i,1,t} & \text{for } t \ge T_{i,0}. \end{cases}$$

Conclusion follows by gathering the four cases.
Proof. The proof is similar to that of Proposition 2.5. Denoting $\mu_m = \mu R^m$ and $\nu_m = \mu P_{m\gamma}$, from the triangle inequality, Proposition 2.3 and Lemma 2.7,

and an induction concludes.

Conclusion
In this section we use the notations of the previous ones; in particular, κ is given by (1.5) and the constants $C_2$, $C_3$ and $C_5$ are those of Propositions 2.5 and 2.8. We now gather all these previous results.
Letting either γ vanish or N go to infinity in Proposition 2.3, we obtain long-time convergence for, respectively, the non-homogeneous self-interacting Markov chain (Y k ) k∈N introduced in Section 1.3 and the continuous-time Markov chain (X t ) t 0 defined in Section 2.4.
Corollary 2.9. Let $(\eta_n)_{n\in\mathbb{N}}$ be as defined in Section 1.3, and let $(\tilde\eta_n)_{n\in\mathbb{N}}$ be defined similarly but with a different initial distribution $\tilde\eta_0 \in \mathcal{P}(\mathbb{T}^d)$. For all $m \in \mathbb{N}$ and all $\gamma \in (0, \gamma_0]$,

Corollary 2.10. For all $N \in \mathbb{N}^*$, $t \ge 0$ and $\mu, \nu \in \mathcal{P}(\mathbb{T}^{dN})$,

Proof of Corollary 2.9. The proof is based on the following simple equality: for all $N \in \mathbb{N}$ and $\mu, \nu \in \mathcal{P}(\mathbb{T}^d)$,

Conversely, if $(X, Y)$ is an optimal coupling of $\mu^{\otimes N}$ and $\nu^{\otimes N}$, then

By the triangle inequality,

where we applied Propositions 2.3 and 2.5. Using the equality (2.9), dividing by $N$ and letting $N$ go to infinity concludes the proof of Corollary 2.9.
Remark that the beginning of the proof also applies to $\mu, \nu \in \mathcal{P}(\mathbb{T}^{dN})$ that are exchangeable (i.e. invariant under any permutation of the $d$-dimensional coordinates), in which case, denoting by $\mu^{(1)}$ and $\nu^{(1)}$ their $d$-dimensional marginals, we get that $W_\rho(\mu^{(1)}, \nu^{(1)}) \le \frac{1}{N} W_{\rho^N}(\mu, \nu)$. We now turn to the continuous-time limit of the non-linear chain $(Y_k)_{k\in\mathbb{N}}$.
Corollary 2.11. There exists $C_6 > 0$ such that for all $\eta_0 \in \mathcal{P}(\mathbb{T}^d)$ and all $\gamma \in (0, \gamma_0]$, if $(\eta_n)_{n\in\mathbb{N}}$ is as defined in Section 1.3 and $(\eta_t)_{t\ge 0}$ is as defined in Section 2.4 (with the same initial condition $\eta_0$), then

and for all $m \ge 1$,

Proof. For the first inequality, we could follow the proof of Lemma 2.7, but, using the notations of the introduction, we will rather use the fact that

where the Gaussian variable $G_0$ in (1.3) is equal to $B_\gamma/\sqrt{\gamma}$, with $(B_t)_{t\ge 0}$ the Brownian motion involved in (1.1), and $T$ and $\tilde T$ are defined with the same $E \sim \mathcal{E}(1)$. Recall the estimate (2.8) for the error between an Euler scheme and its initial diffusion. Then we bound

which concludes the first part of the corollary. For the second part, denoting $r_m = W_\rho(\eta_m, \eta_{m\gamma})$, we bound

where we used the first part of the corollary and Corollary 2.9. An induction concludes.
We can now prove propagation of chaos results for the continuous-time process.

Corollary 2.12. For all $N \in \mathbb{N}$, $k \in [\![1, N]\!]$ and all $t \ge 0$, if $(X_t)_{t\ge 0}$ is a Markov process with initial distribution $\eta_0^{\otimes N}$ associated to the semigroup $(P_{N,t})_{t\ge 0}$ then, first,

Proof. As shown in the proof of Proposition 2.5, if $(X, Y)$ is an optimal coupling of $\mu$ and $\nu$,

Thus, considering a time step $\gamma = t/m$, $m \in \mathbb{N}$, we decompose

take the expectation, apply Propositions 2.5 and 2.8 and Corollary 2.11, and let $m$ go to infinity. This proves the second point; the proof of the first one is similar, using Corollary 2.6.
Up to now, we have sent either $N$ or γ to their limit. When κ > 0, if we let $t = m\gamma$ go to infinity at fixed $N$ and γ, we recover results on the equilibria of the processes. Indeed, note that Corollary 2.9 together with the Banach fixed-point theorem implies that $(\eta_n)_{n\in\mathbb{N}}$ admits a limit which is independent from $\eta_0$. Together with Proposition 1.1, this limit is the unique QSD of the Markov chain (1.3); denote it $\nu_\gamma$. Similarly, Proposition 2.3 implies that $R_{N,\gamma}$ admits a unique invariant measure; denote it $\mu_{\infty,N,\gamma}$, and $\mu^{(k)}_{\infty,N,\gamma}$ its first $kd$-dimensional marginal for $k \in [\![1, N]\!]$ (i.e. the law of $(X_1, \ldots, X_k)$ if $X \sim \mu_{\infty,N,\gamma}$). Third, Corollary 2.10 implies that $(P_{N,t})_{t\ge 0}$ admits a unique invariant measure $\mu_{\infty,N}$.

Proofs of Corollaries 2.13 and 2.14. Considering any $\eta_0 \in \mathcal{P}(\mathbb{T}^d)$ and $m \in \mathbb{N}$, apply Proposition 2.3 with $\mu = \mu_{\infty,N,\gamma}$ and $\nu = \eta_0^{\otimes N}$, Corollary 2.10 with the same $\nu$ and with $\mu = \mu_{\infty,N}$, and Proposition 2.8. Letting $m$ go to infinity concludes the proof of Corollary 2.13. The proof of Corollary 2.14 is similar (based on Prop. 2.5 and Cor. 2.6, like Cor. 2.12).
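The chain of bounds used here can be written out; the following sketch uses the invariance $\mu_{\infty,N,\gamma} R^m = \mu_{\infty,N,\gamma}$ and $\mu_{\infty,N} P_{N,m\gamma} = \mu_{\infty,N}$ together with the triangle inequality, each term matching one of the three results invoked:

```latex
% Decomposition behind the proof of Corollary 2.13 (sketch):
W_{\rho^N}\!\big(\mu_{\infty,N,\gamma},\,\mu_{\infty,N}\big)
 \;\le\; \underbrace{W_{\rho^N}\!\big(\mu_{\infty,N,\gamma} R^m,\,
     \eta_0^{\otimes N} R^m\big)}_{\text{Proposition 2.3}}
 \;+\; \underbrace{W_{\rho^N}\!\big(\eta_0^{\otimes N} R^m,\,
     \eta_0^{\otimes N} P_{N,m\gamma}\big)}_{\text{Proposition 2.8}}
 \;+\; \underbrace{W_{\rho^N}\!\big(\eta_0^{\otimes N} P_{N,m\gamma},\,
     \mu_{\infty,N}\big)}_{\text{Corollary 2.10}}
```

When κ > 0, the first and third terms vanish as $m \to \infty$, while the middle (discretization) term stays bounded uniformly in $m$, which yields the announced error bound in γ.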
Next, we can send two parameters to their limit. Sending $N$ to infinity and γ to zero, we get the long-time convergence of the non-linear process $(Y_t)_{t\ge 0}$ introduced in Section 2.4 (or, equivalently, of the process $Z$ solving (1.1) conditioned not to be dead):

Corollary 2.15. Let $(\eta_t)_{t\ge 0}$ be as defined in Section 1.3, and let $(\tilde\eta_t)_{t\ge 0}$ be defined similarly but with a different initial distribution $\tilde\eta_0 \in \mathcal{P}(\mathbb{T}^d)$. For all $t \ge 0$, $W_\rho(\eta_t, \tilde\eta_t) \le e^{-\kappa t} W_\rho(\eta_0, \tilde\eta_0)$.
In particular, if $\tilde\eta_0$ is the QSD $\nu_*$, then by definition $\tilde\eta_t = \nu_*$ for all $t \ge 0$, so that Corollary 2.15 yields the uniqueness of the QSD and the exponential convergence of $\mathrm{Law}(Z_t \mid T > t)$ toward $\nu_*$ (a result in the spirit of [3, 10, 12, 20]). Now, at a fixed γ > 0, letting $t$ and $N$ go to infinity, we obtain an error bound between the QSD $\nu_*$ of the continuous process (1.1) and the QSD $\nu_\gamma$ of the discrete scheme.

Proof. Thanks to Corollaries 2.9 and 2.15 (applied with one of the initial conditions being the equilibrium), let $m$ go to infinity in Corollary 2.11.
Finally, letting γ vanish and $t$ go to infinity at a fixed $N \in \mathbb{N}$, we obtain a propagation of chaos result at stationarity (as established first in [2], and more recently with a CLT in [30] in the case of a finite state space) for the continuous-time system of interacting particles $(X_t)_{t\ge 0}$ introduced in Section 2.4.

Proof. The proof is similar to that of Corollary 2.12, letting $t$ go to infinity in Corollary 2.12 thanks to Corollaries 2.10 and 2.15.
Remark that our results at stationarity (Cors. 2.13, 2.14, 2.16 and 2.17) all require the perturbative condition κ > 0. Yet, propagation of chaos at stationarity for the continuous-time process follows from the works [2, 17, 19, 30, 38] in a much broader (non-perturbative) framework (and, although it does not seem to have been studied yet, the situation should be similar for error bounds in γ rather than $N$). As discussed in Section 1.4, error bounds in $N$ (and possibly γ) that are uniform in time can be obtained thanks to the long-time convergence of the limit ($N = +\infty$) non-linear process. In our case, when κ > 0, this long-time convergence follows from the (uniform in $N$) long-time convergence of the particle system (whether it is possible to obtain the latter from the former is unclear), but it holds in more general cases (see [3, 10, 12, 20] and references therein) and, in those cases, results similar to Corollaries 2.13, 2.14, 2.16 and 2.17 should hold. The question of establishing propagation of chaos or discretization error bounds at stationarity in a more general case (i.e. without the condition κ > 0) is out of the scope of the present work.

Figure 1. Summary of the different results. The label of each arrow indicates the corollary or proposition where the corresponding quantitative convergence is stated. Vertical, horizontal and diagonal arrows correspond respectively to $t$, γ and $N$ going to their limit.
All our results are summarised in Figure 1. Finally, we detail the proof of our main result.
Taking the expectation and applying Proposition 2.5 and Corollaries 2.9 (applied with $\tilde\eta_0 = \nu_\gamma$) and 2.16, the boundedness of ρ and the equivalence of $W_\rho$ and $W_1$ conclude.

1.1. The problem

Start from the diffusion on the $d$-dimensional periodic flat torus $\mathbb{T}^d$:

$$dZ_t = b(Z_t)\,dt + dB_t \tag{1.1}$$
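The time discretization (1.3) of this SDE with step γ is a standard Euler-Maruyama scheme wrapped on the torus (as recalled in the proof of Corollary 2.11, the Gaussian increment is $G_k = B_\gamma/\sqrt{\gamma}$ per step). A minimal sketch in dimension $d$, with an illustrative drift (the function names are ours, not the paper's):

```python
import math
import random

def euler_step(z, gamma, b, rng):
    """One Euler-Maruyama step of dZ = b(Z) dt + dB on the flat torus
    [0,1)^d: Z_{k+1} = Z_k + gamma * b(Z_k) + sqrt(gamma) * G_k (mod 1),
    with G_k a standard d-dimensional Gaussian."""
    drift = b(z)
    return [(zi + gamma * di + math.sqrt(gamma) * rng.gauss(0.0, 1.0)) % 1.0
            for zi, di in zip(z, drift)]

if __name__ == "__main__":
    rng = random.Random(0)
    b = lambda z: [-math.sin(2 * math.pi * zi) for zi in z]  # illustrative
    z = [0.25, 0.75]
    for _ in range(1000):
        z = euler_step(z, 0.01, b, rng)
    print(z)
```

Adding the killing/resurrection mechanism of Section 1.3 on top of this step yields the full particle scheme whose errors in γ, $N$ and $t$ the paper quantifies.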
Proof of Corollary 2.10. Similarly to the previous proof, Corollary 2.10 is a direct consequence of Propositions 2.3 and 2.8, letting $m$ go to infinity at fixed $t$ and $N$ in

$$W_{\rho^N}(\mu P_{N,t}, \nu P_{N,t}) \le W_{\rho^N}\big(\mu P_{N,t},\, \mu R^m_{N,t/m}\big) + W_{\rho^N}\big(\mu R^m_{N,t/m},\, \nu R^m_{N,t/m}\big) + W_{\rho^N}\big(\nu R^m_{N,t/m},\, \nu P_{N,t}\big).$$