Wasserstein contraction and Poincar\'e inequalities for elliptic diffusions at high temperature

We consider elliptic diffusion processes on $\mathbb R^d$. Assuming that the drift contracts distances outside a compact set, we prove that, at a sufficiently high temperature, the Markov semi-group associated to the process is a contraction of the $\mathcal W_2$ Wasserstein distance, which implies a Poincar\'e inequality for its invariant measure. The result requires neither reversibility nor an explicit expression of the invariant measure, and the estimates have a sharp dependency on the dimension. Variations of the arguments are then used to study, first, the stability of the invariant measure of the process with respect to its drift and, second, systems of interacting particles, yielding a criterion for dimension-free Poincar\'e inequalities and quantitative long-time convergence for non-linear McKean-Vlasov type processes.


Overview
Consider $(X_t)_{t\geq 0}$ a diffusion process on $\mathbb R^d$ solving
$$dX_t = b(X_t)\,dt + \sqrt{2T}\,dB_t, \qquad (1)$$
where $b \in \mathcal C^1(\mathbb R^d)$, $T > 0$ is a constant and $(B_t)_{t\geq 0}$ is a standard Brownian motion. Denote by $P_t$ the associated semi-group, namely $P_t f(x) = \mathbb E_x(f(X_t))$ for all suitable $f$ on $\mathbb R^d$. Let
$$k(x) = \inf_{y \neq x} \left\{ -\frac{(b(x)-b(y))\cdot(x-y)}{|x-y|^2} \right\}.$$
We are interested in cases where the following holds:

Assumption 1. There exist $K, R \geq 0$ and $c > 0$ such that
$$k(x) \geq -K \ \text{ if } |x| \leq R, \qquad k(x) \geq c \ \text{ if } |x| > R. \qquad (3)$$

Under this condition, it is standard to check that the process is non-explosive, admits a unique invariant measure $\mu$ with a positive Lebesgue density and that the law of the process converges to $\mu$ as $t \to \infty$, using e.g. Lyapunov/Doeblin conditions [32,43,2]. The first main problem considered in this work is to prove that $\mu$ satisfies a Poincaré inequality, namely that there exists a constant $C_P > 0$ such that, for all suitable $f \in \mathcal C^1(\mathbb R^d)$,
$$\mu(f^2) - (\mu f)^2 \leq C_P\, \mu|\nabla f|^2.$$
This inequality is related to concentration inequalities for $\mu$ and to the long-time convergence in $L^2(\mu)$ of the law of the process toward $\mu$ (see e.g. [3,9] and below). More precisely, for $\mu \in \mathcal P(\mathbb R^d)$ we write $C_P(\mu)$ for the optimal constant in this inequality.

When $b = -\nabla U$ for some $U \in \mathcal C^2(\mathbb R^d)$, Assumption 1 is equivalent to saying that $U$ is convex outside a compact set, and then a Poincaré inequality is known to hold. In fact, in this case, $\mu$ has an explicit density, proportional to $\exp(-U/T)$, and moreover the process is reversible, namely its generator $L = b\cdot\nabla + T\Delta$ is self-adjoint in $L^2(\mu)$. The optimal rate $T/C_P(\mu)$ is then exactly the spectral gap of $L$ (in non-reversible cases, the spectral gap may be larger). Many tools are available to establish Poincaré inequalities for reversible diffusions. In particular, under Assumption 1, a Poincaré inequality can be obtained by combining the Bakry-Émery curvature criterion and the Holley-Stroock perturbation argument (see e.g. [3, Propositions 4.2.7 and 4.8.1] or Section 3.3 below) or a local inequality/Lyapunov condition as in [10,2]; see also [3,8,2] and references therein concerning the reversible case.
Notice that different drifts $b$ can give the same invariant measure: for instance, if $b = -(I_d + J)\nabla U$ where $J$ is any skew-symmetric matrix, then $\exp(-U/T)$ is invariant for the process. Assumption 1 depends on $b$, but the Poincaré inequality depends only on $\mu$. Now, if $b$ is not a gradient and if $\mu$ is not explicit, much less is known. If $k(x) \geq c > 0$ for all $x \in \mathbb R^d$, then the Bakry-Émery argument still works [45,11]. If $k(x) \geq c > 0$ only for $|x|$ large enough, a Poincaré inequality should be expected but, to the best of our knowledge, it cannot be established by existing methods.
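The invariance claim for $b = -(I_d+J)\nabla U$ can be checked by hand in the Gaussian case. The sketch below is only an illustration under assumptions that are ours: a quadratic potential $U(x) = x\cdot Ax/2$ (so that $\nabla U = Ax$ and $\exp(-U/T)$ is the centered Gaussian law with covariance $TA^{-1}$), hypothetical $2\times 2$ matrices $A$ and $J$, and hypothetical helper names. For the linear drift $-Bx$ with $B = (I_d+J)A$ and generator $b\cdot\nabla + T\Delta$, the covariance $\Sigma$ of a stationary centered Gaussian law solves the Lyapunov equation $B\Sigma + \Sigma B^{\top} = 2T I_d$, and the code verifies that $\Sigma = TA^{-1}$ does.

```python
# Illustrative check (assumed setup): U(x) = x.Ax/2, drift b(x) = -(I + J) A x.

def matmul(P, Q):
    """Product of two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(P):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*P)]

def inverse_2x2(P):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lyapunov_residual(A, J, T):
    """Largest entry of |B S + S B^T - 2T I| with B = (I + J) A and S = T A^{-1}."""
    n = len(A)
    i_plus_j = [[(1.0 if i == j else 0.0) + J[i][j] for j in range(n)]
                for i in range(n)]
    B = matmul(i_plus_j, A)
    S = [[T * v for v in row] for row in inverse_2x2(A)]
    BS = matmul(B, S)
    SBt = matmul(S, transpose(B))
    return max(abs(BS[i][j] + SBt[i][j] - (2.0 * T if i == j else 0.0))
               for i in range(n) for j in range(n))

A = [[2.0, 0.5], [0.5, 1.0]]    # symmetric positive definite (hypothetical)
J = [[0.0, 3.0], [-3.0, 0.0]]   # skew-symmetric (hypothetical)
residual = lyapunov_residual(A, J, T=1.7)
print(residual)  # zero up to rounding errors
```

The residual vanishes for every skew-symmetric $J$ because $(I_d+J) + (I_d+J)^{\top} = 2I_d$, which is the algebraic reason why the non-reversible part of the drift does not change the invariant measure in this example.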
This issue leads to the second main question of this work, which is to prove that $P_t$ is a contraction of the $\mathcal W_2$ Wasserstein distance for $t$ large enough. Indeed, from classical arguments (see Section 3.1), this implies a Poincaré inequality for $\mu$. Recall that, for $\alpha \in [1,\infty)$, the $\mathcal W_\alpha$ Wasserstein distance between $\nu, \nu' \in \mathcal P(\mathbb R^d)$ (the set of probability measures on $\mathbb R^d$) is defined by
$$\mathcal W_\alpha(\nu,\nu') = \inf_{\pi\in\Pi(\nu,\nu')} \left(\int |x-y|^\alpha\, \pi(dx,dy)\right)^{1/\alpha},$$
where $\Pi(\nu,\nu')$ is the set of probability measures on $\mathbb R^d\times\mathbb R^d$ with marginals $\nu$ and $\nu'$. Writing $\nu P_t$ for the law at time $t$ of a process solving (1) with an initial condition $X_0$ distributed according to $\nu$ (so that $(\nu P_t) f = \nu(P_t f)$ for all bounded measurable $f$), we want to find $M, \lambda > 0$ such that
$$\forall t \geq 0,\ \forall \nu,\nu' \in \mathcal P(\mathbb R^d), \qquad \mathcal W_\alpha(\nu P_t, \nu' P_t) \leq M e^{-\lambda t}\, \mathcal W_\alpha(\nu,\nu'). \qquad (5)$$
Our main result is the following:

Theorem 1. Under Assumption 1, suppose furthermore that, for some $\alpha \geq 2$, $T \geq T_0$ with $T_0$ given by (6), where $R_* = R\,(2+2K/c)^{1/d}$. Then (5) holds with $\lambda = c/4$ and an explicit constant $M = M_\alpha$.

This is proven in Section 2. In contrast to the Poincaré inequality, this result is new even in the reversible case.
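In dimension one, the infimum defining $\mathcal W_\alpha$ can be computed explicitly for empirical measures: for two samples of the same size with uniform weights, the monotone coupling (matching sorted values) is optimal for every $\alpha \geq 1$. The following minimal sketch uses this fact; the function name `wasserstein_1d` is ours and the example data are arbitrary.

```python
def wasserstein_1d(xs, ys, alpha=2.0):
    """W_alpha distance between two uniform empirical measures on the real line.

    Both samples must have the same size; the sorted (monotone) coupling is
    optimal in dimension one for the cost |x - y|**alpha with alpha >= 1.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    if alpha < 1.0:
        raise ValueError("alpha must be at least 1")
    cost = sum(abs(x - y) ** alpha for x, y in zip(sorted(xs), sorted(ys)))
    return (cost / len(xs)) ** (1.0 / alpha)

# A translated sample is at distance |shift| for every alpha >= 1.
sample = [0.3, -1.2, 2.5, 0.0]
shifted = [x + 0.75 for x in sample]
print(wasserstein_1d(sample, shifted, alpha=2.0))
print(wasserstein_1d(sample, shifted, alpha=3.0))
```

In higher dimension no such closed form is available in general, which is one reason why the contraction (5) is established through an explicit coupling of the processes rather than by computing the distance.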
Let us now state some implications of a $\mathcal W_2$ contraction when $M > 1$, obtained from known arguments (see Section 3 for the proof of the next result). To avoid technical discussions, we assume that the coordinates of the force field $b$ are in $\mathcal A$, the set of $\mathcal C^\infty$ functions from $\mathbb R^d$ to $\mathbb R$ with all derivatives growing at most polynomially at infinity (with a slight abuse of notation we simply write $b \in \mathcal A$ in that case), and we only consider test functions in $\mathcal A$. Combined with Assumption 1, which implies a time-uniform Gaussian moment for $X_t$ via standard Lyapunov arguments, we get that, for all $f \in \mathcal A$ and all $t \geq 0$, $Lf \in \mathcal A$, $P_t f \in \mathcal A$ and $\partial_t P_t f = L P_t f = P_t L f$ (see e.g. the proof of [25, Theorem 2.5]).
Theorem 2. Assume that k(x) −K for all x ∈ R d for some K 0, that b ∈ A and that a Wasserstein contraction (5) holds with α = 2 for some M 1, λ > 0. Then: 1.The invariant measure µ satisfies a Poincaré inequality with 2. For all f ∈ A and all t 0, 3. For all t > 0 and any probability law ν on R d with finite second moment, νP t has a density h t with respect to µ and where J(t) K/(1 − e −2Kt ) for all t > 0 and for all t ln(1 + K/λ) 2K if K > 0, and the limit of these expressions as In both Theorems1 and 2, keep in mind that µ and P t depends on T .
As we see, we only have a partial answer to our initial questions, since the results only hold for $T$ large enough (or, equivalently, for $R$ small enough, which means that the results hold for small perturbations of the case where $k(x) \geq c$ for all $x \in \mathbb R^d$). We did not try to make the condition (6) on $T$ as sharp as possible: it can be slightly improved, but it cannot be removed simply by optimizing our proof. We do not know whether, for $T = 1$, Assumption 1 is sufficient to get (5) for some $M, \lambda$ or to get a Poincaré inequality for $\mu$. However, notice that, for $T \geq T_0$, Theorem 1 gives another important piece of information, namely that the contraction (5) and the Poincaré inequality hold respectively with $M, \lambda$ and $C_P(\mu)/T$ which are uniform in $T \geq T_0$. This part is clearly false if we drop the condition that $T$ has to be large enough: the statement "Under Assumption 1, there exist $M, \lambda > 0$ such that, for all $T > 0$, (5) holds" is false, and so is "Under Assumption 1, there exists $C > 0$ such that, for all $T > 0$, $C_P(\mu) \leq TC$". Indeed, the first statement would imply the second (according to Theorem 2), and it is well known that, if for instance $b = -\nabla U$ where $U$ has several isolated local minima, then $C_P(\mu) \geq e^{a/T}$ for some $a > 0$ for $T$ small enough [33,42].
Interestingly, apart from the dependency on T , the bounds on M, λ and C P (µ) given by Theorems 1 and 2 behave rather well with the dimension d, in contrast to what usually give the methods based on the existence of a Lyapunov function and of a local Poincaré inequality [10,2] or on the perturbation of a reference measure [33].For instance, consider the probability measure µ ∝ exp(−U ) with U (x) = |x| β /β, for β > 2. By applying Theorems 1 and 2, we get that see Section 3.2.By contrast, using a standard curvature+bounded perturbation argument, one cannot get better than what is given by the curvature result, which is dimension-free (see Section 3.3 or [11,Remark 5.21]).Besides, here, µ is a radial log-concave probability measure, for which two-sided bounds on the Poincaré constants are known [5,7], in relation to the KLS conjecture [16,38].In particular, for for d 2. Hence, the dependency in the dimension d in (9) (which is based on our general result and thus does not use that µ is radial) is optimal.
As a last remark on Theorem 2, notice that, in (7), the minimum is always given by $e^{-tT/C_P(\mu)}$ in the reversible case. However, there are non-reversible cases where $\lambda > T/C_P(\mu)$, so that the second term becomes smaller for large $t$. For instance, in the Gaussian case $\mu \propto \exp(-U/T)$ with $U(x) = x\cdot Ax$ for some positive definite symmetric matrix $A$, denote by $\nu_1,\dots,\nu_d$ the eigenvalues of $A$. Then it is well known that $T/C_P(\mu) = \min\{\nu_i,\ i \in \llbracket 1,d\rrbracket\}$, while non-reversible Gaussian processes with invariant measure $\mu$ are constructed in [39] with a linear drift $-Bx$ where the real parts of the eigenvalues of the matrix $B$ are larger than $\bar\nu := (\nu_1+\dots+\nu_d)/d$, so that a $\mathcal W_2$ contraction holds with $\lambda = \bar\nu$ (as can be seen using a synchronous coupling, see e.g. [45]).
Contrary to a simple long-time convergence at equilibrium in W α , a contraction of W α can easily lead to perturbative results.For instance, consider on R d a continuous process solving where Z = (Z t ) t 0 is a random càdlàg process on some state space E and b : Denote by νt the law of Y t .
Proposition 3. Let b be a C 1 vector field on R d satisfying Assumption 1 and P t be its associated semi-group.Let α 2. Assume that T T 0 with T 0 given by (6).Then, for all ν ∈ P(R d ) and t 0, where λ, M α are as in Theorem 1.
This is proven in Section 3.4.The right hand side in (11) can be bounded given additional informations on b, for instance if b(y, z) = b(y) and we simply assume that b − b ∞ < ∞ as in [24,18] then with P t the semi-group associated to b • ∇ + T ∆.In particular, any invariant measure μ of Hence, under our restrictive condition (6), we extend the results of [24], which are restricted to α = 1.More generally, the right hand side in (11) can be bounded under the assumption that E(|b(y) − b(y, Z t )| α ) Q(|y|) for all y ∈ R d for some polynomial Q and then with some moment estimates on Y t obtained by Lyapunov arguments (see e.g. the proof of Theorem 7 below).
A case of particular interest, in view both of the perturbation result and of the condition on the diffusion coefficient T (to be thought as a temperature parameter in statistical physics), is given by systems of interacting particles, detailed in Section 4.
In summary, the rest of this paper is organized as follows. Section 2 is devoted to the proof of Theorem 1. Section 3 gathers the proofs of the other results stated in this introduction, namely Theorem 2, the Poincaré inequality (9) and Proposition 3, together with a discussion of the reversible case. Systems of interacting particles are studied in Section 4. Finally, we conclude this work in Section 5 with an informal discussion of our method, related works and possible perspectives.
2 Proof of the main theorem

Assumption 1 is enforced throughout this section, which is devoted to the proof of Theorem 1.

A probabilistic proof

2.1.1 Synchronous coupling and modified cost
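For $(B_t)_{t\geq 0}$ a standard Brownian motion on $\mathbb R^d$, we consider $(X_t,Y_t)_{t\geq 0}$ the Markov process on $\mathbb R^d\times\mathbb R^d$ solving
$$dX_t = b(X_t)\,dt + \sqrt{2T}\,dB_t, \qquad dY_t = b(Y_t)\,dt + \sqrt{2T}\,dB_t,$$
which is called the parallel or synchronous coupling of two diffusions (1). The generator $L_s$ of this process is given by
$$L_s f(x,y) = b(x)\cdot\nabla_x f(x,y) + b(y)\cdot\nabla_y f(x,y) + T\,\nabla\cdot(A\nabla f)(x,y), \qquad A = \begin{pmatrix} I_d & I_d \\ I_d & I_d \end{pmatrix}.$$
Given $\alpha \geq 2$ and a bounded positive $\omega \in \mathcal C^2(\mathbb R^d)$, we consider a cost $\rho$ which is a modification of the usual transport cost $|x-y|^\alpha$ for the Wasserstein distances. Since $A\nabla f(x,y) = 0$ for $f(x,y) = |x-y|^2$, $L_s\rho$ can be computed explicitly. For now, assume that $\omega$ is such that (13) holds for some $\lambda > 0$. Then $L_s\rho + \alpha\lambda\rho \leq 0$, i.e. $(e^{\alpha\lambda t}\rho(X_t,Y_t))_{t\geq 0}$ is a supermartingale and, for all $t \geq 0$,
$$\mathbb E\left(\rho(X_t,Y_t)\right) \leq e^{-\alpha\lambda t}\, \mathbb E\left(\rho(X_0,Y_0)\right).$$
For any $\pi_0 \in \Pi(\nu,\nu')$, considering an initial condition $(X_0,Y_0) \sim \pi_0$ independent from $(B_t)_{t\geq 0}$, we obtain the corresponding bound on $\mathbb E(|X_t-Y_t|^\alpha)$. Finally, taking the infimum over $\pi_0 \in \Pi(\nu,\nu')$ yields (5). The proof is thus complete if we are able to construct a bounded positive $\omega \in \mathcal C^2(\mathbb R^d)$ such that (13) holds for some $\lambda > 0$. This is the content of the next section.

Figure 1: graph of $g$ and of $\omega(x) = g(|x|^2) - \min g$ (in dimension 1). For $r \leq R^2$, $g$ is affine decreasing; for $r \geq R_*^2$, it is constant; in between, it is convex but with $g''$ constrained not to be too large, which thus requires taking $R_*$ large enough.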
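The key feature of the synchronous coupling is that the noise cancels in the difference $X_t - Y_t$. The following sketch illustrates this with an Euler-Maruyama discretisation in dimension one. The step size, the linear drift and the function name `synchronous_coupling` are illustrative assumptions and do not come from the paper. For the drift $b(x) = -cx$, the discretised difference is multiplied by $(1 - ch)$ at each step, whatever the noise and the temperature.

```python
import math
import random

def synchronous_coupling(b, x0, y0, T, h, n_steps, seed=0):
    """Euler-Maruyama scheme for two copies of dX = b(X) dt + sqrt(2T) dB
    driven by the same Brownian increments (synchronous coupling)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * T * h)
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # one Gaussian increment shared by both copies
        x, y = x + h * b(x) + sigma * xi, y + h * b(y) + sigma * xi
        path.append((x, y))
    return path

c, h, n = 1.0, 0.01, 500  # hypothetical contraction rate, step size, number of steps
path = synchronous_coupling(lambda x: -c * x, 2.0, -1.0, T=5.0, h=h, n_steps=n)
x_end, y_end = path[-1]
# The difference decays deterministically: |X_n - Y_n| = |X_0 - Y_0| (1 - c h)^n.
print(abs(x_end - y_end), 3.0 * (1.0 - c * h) ** n)
```

For a drift that only contracts outside a compact set, the difference may grow while both copies are in the non-contracting region, which is why the proof replaces $|x-y|^\alpha$ by a weighted cost.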

The weight function ω
Our goal is to construct a bounded positive function ω such that Indeed, if this holds, taking λ = c/4, we get We take ω of the form ω(x) = g(|x| 2 ) − inf g with g ∈ C 2 (R + ) to be chosen.Since setting K * = K/4 + c/8, we take g as the C 1 solution of g(0) = 0, g (0) = −2K * /d and where R * > 0 remains to be chosen so that g (R 2 * ) = 0 (and thus g (r) = g (r) = 0 for all r > R 2 * ).See Figure 1 for a draft of the graph of g and ω.Notice that g is not C 2 , but this is easily solved by replacing this g by some We can conclude the proof with g ε and finally let ε vanish in the final result.For simplicity we write the proof directly with g.
For r ∈ [0, R 2 ], we simply have g and thus we choose The proof is concluded by using these estimates in the definition of M and the condition on T .

Alternative proof with Bakry-Emery interpolation
In this section we give a second proof of Theorem 1 (see Section 5 for a discussion of the specific interest of each proof). This proof is similar to the intertwining method of Joulin, Bonnefont and their coauthors [1,6,7,13] (who focus on reversible cases). For simplicity we only consider the case $\alpha = 2$, which is the main case of interest due to Theorem 2. The expressions of $T_0$ and $M$ obtained along this alternative proof are slightly different from those stated in Theorem 1 and established in the first proof (again, we do not try to optimize the estimate on $T_0$). Moreover, in order to justify the time derivatives in this section, we assume that the coordinate functions of $b$ are in $\mathcal A$.
The proof is organized in three steps. First, we rephrase Assumption 1 as a local condition on the drift, namely a condition on its Jacobian. Second, similarly to Section 2.1.1, we give the proof conditionally on the existence of a suitable weight function. Third, similarly to Section 2.1.2, we construct the weight function.

Step 1: an infinitesimal condition
We start by an equivalent formulation of Assumption 1.We denote by Db the Jacobian matrix of b and k In particular under Assumption 1, k also satisfies (3), with the same K, R, c.Alternatively, assume that there exist K, R 0 and c > 0 such that Then, from (14 . In other words, Assumption 1 is equivalent to assume that the infinitesimal condition (15) holds for some K, R 0 and c > 0. This latter condition is enforced for the rest of Section 2.2.

Step 2: Gamma calculus
Consider a positive a ∈ A, to be chosen later on.Fix t > 0 and f ∈ A. The carré du champ of the generator L is defined as Γ (where Γ(a)/a is understood as 0 if Γ(a) = 0), summing over i ∈ 1, d , we get Assume for now that a is such that there exists λ > 0 such that a and a −1 are bounded and Φ(a) λa .
(This condition is similar to the ones in [1,6,7,13], for instance [1,Theorem 3.2]; in this work, the authors do not assume that a is bounded, as they are interested in Brascamp-Lieb inequalities, namely weighted Poincaré inequalities, but in the present work we are interested in classical Poincaré inequalities and contraction of the Wasserstein distances associated to the standard Euclidean distance, which is why we add this condition so that a|∇f | 2 is equivalent to |∇f | 2 ).Integrating the previous inequality yields and then , which concludes the proof since, thanks to the work of Kuwada [36,37], it is equivalent to the W 2 contraction (5) with the same λ, α = 2 and M = a ∞ a −1 ∞ .Indeed, more precisely, in the case of a diffusion process on R d , as a corollary of [36, Theorem 2.2] (applied with v the Lebesgue measure), we get the following: Proposition 4. Assume that b ∈ A and that k(x) −K for all x ∈ R d for some K 0. For ρ ∈ R, M > 0 and t 0, the two following assertions are equivalent: • For all probability measures ν, µ on R d , Notice that, in [36, Theorem 2.2], the equivalence is stated for the class of functions f which are bounded and Lipschitz.For f ∈ A, we can find a sequence f n of bounded Lipschitz functions ) for some polynomial q, where ε n → 0 as n → 0. Using a synchronous coupling and that the process admits Gaussian moments, it is easily seen that

Step 3: the weight function
It remains to construct a weight a satisfying ( 16) for some λ > 0 under the condition (15).As in Section 2.1.2,we focus on the leading term for large T , setting where ω is a bounded positive function to be chosen.Then, using that a T and Γ(ω Let ω be such that Moreover, for T T0 , we get  Setting V (x) = e c|x| 2 /4 , Assumption 1 classically implies that LV −aV + C for some constants C, a > 0, and that V ∈ L 2 (µ).Moreover, by standard elliptic theory (see e.g.[19, Theorem 0.5 and Condition 0.24.A1]), the process (1) admits a continuous positive transition kernel, hence, for all compact set K ⊂ R d and all t > 0, there exists η > 0 such that inf x∈K P t f (x) K f (z)dz for all positive f .From [32], we get that where we used the equivalence of Wasserstein and gradient contractions of Proposition 4. Letting t → ∞ in the previous equality yields the Poincaré inequality (for all f ∈ A, and then for all f ∈ L 2 (µ) by density).
From the Lumer-Philips Theorem [54, Chapter IX, p.250], the Poincaré inequality is equivalent to ∀f ∈ L 2 (µ) , ∀t 0 Besides, for t > 0 and f ∈ A, we can also bound Integrating with respect to µ and applying this with f replaced by f − µf yields Together with the Poincaré inequality, this means that, for all t 0, i.e.
Now, in the reversible case where b = −∇U , since this implies (see e.g.[12, Lemma 2.14]) that, in fact, and thus C P (µ) T /λ.Finally, the first inequality in ( 8) is simply the Pinsker inequality, and the entropy/W 2 regularisation is proven in [46].It is assumed in the latter work that b is Lipschitz, but it is not used in this part of the proof.We briefly recall the proof for completeness.Remark that if Assumption 1 holds with K = 0 then it also holds with all K > 0 and thus it is sufficient to treat the case K > 0. From [52, Theorem 3.3], under Assumption 1, for all t > 0 all positive f and all x, y ∈ R d , .
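Denoting by $P_t^*$ the dual of $P_t$ in $L^2(\mu)$, we apply the previous inequality with $f$ replaced by $P_t^* f$ for some positive $f$ with $\mu f = 1$ and integrate with respect to a coupling measure $\pi \in \Pi(f\mu,\mu)$. Using that $\mu$ is invariant by $P_t P_t^*$ and Jensen's inequality,
$$\mu(\ln P_t P_t^* f) \leq \ln \mu(P_t P_t^* f) = \ln \mu f = 0,$$
and taking the infimum over $\pi$ concludes the proof of (8) with $J(t) = K/(1-e^{-2Kt})$. Then, for all $s \in [0,t)$, the $\mathcal W_2$ contraction applied on $[0,s]$ can be combined with the previous bound applied on $[s,t]$. The minimum of $s \mapsto e^{-2\lambda s}/(1-e^{-2K(t-s)})$ for $s \leq t$, for a fixed $t > 0$, is attained at $s = s_* := t - \ln(1+K/\lambda)/(2K)$. When $s_* \geq 0$, the proof is concluded by taking $s = s_*$ in the previous bound, since
$$\frac{e^{-2\lambda s_*}}{1-e^{-2K(t-s_*)}} = \left(1+\frac{\lambda}{K}\right)\left(1+\frac{K}{\lambda}\right)^{\lambda/K} e^{-2\lambda t}.$$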
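The optimisation in $s$ at the end of this proof is elementary and can be double-checked numerically. The sketch below, with arbitrary values of $t$, $K$ and $\lambda$ chosen by us so that $s_* \geq 0$, compares the closed-form minimiser $s_* = t - \ln(1+K/\lambda)/(2K)$ of $s \mapsto e^{-2\lambda s}/(1-e^{-2K(t-s)})$ with a brute-force grid search; the function names are hypothetical.

```python
import math

def objective(s, t, K, lam):
    """The function s -> exp(-2 lam s) / (1 - exp(-2 K (t - s))), for 0 <= s < t."""
    return math.exp(-2.0 * lam * s) / (1.0 - math.exp(-2.0 * K * (t - s)))

def s_star(t, K, lam):
    """Closed-form minimiser, obtained by setting the derivative in s to zero."""
    return t - math.log(1.0 + K / lam) / (2.0 * K)

t, K, lam = 3.0, 1.5, 0.4  # arbitrary illustrative values with s_star >= 0
s_opt = s_star(t, K, lam)

# Brute-force minimisation over a uniform grid of [0, t), excluding s = t.
n_grid = 10000
grid = [t * i / n_grid for i in range(n_grid)]
s_grid = min(grid, key=lambda s: objective(s, t, K, lam))

# Value of the objective at the minimiser, in closed form.
closed_form = (1.0 + lam / K) * (1.0 + K / lam) ** (lam / K) * math.exp(-2.0 * lam * t)
print(s_opt, s_grid)
print(objective(s_opt, t, K, lam), closed_form)
```

The grid minimiser agrees with $s_*$ up to the grid spacing, and the value at $s_*$ matches the product $(1+\lambda/K)(1+K/\lambda)^{\lambda/K}e^{-2\lambda t}$.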

Degenerate convex potential
Let U (x) = |x| β /β for some β > 2. For T > 0, let µ T be the probability law on R d with density proportional to exp(−U/T ).If X is a random variable with law µ 1 , then T 1/β X is distributed according to µ T .The scaling properties of the Poincaré inequality imply that Then, for all x = 0 and all y = x, The term in the inf is non-increasing for r 1, non-decreasing for r 2 and, for r ∈ [1, 2], 1 r .
As a conclusion, for all which means that, for all r > 0, Assumption 1 holds with K = 0, R = r and c = c(r) := r β−2 /(2β − 2).Besides, −x • b(x) = |x| β for all x ∈ R d and thus Theorem 1 applied with α = 2 and , yields (according to Theorem 2 since we are in the reversible case) Notice that, as expected due to the homogeneity of the problem, the powers of r have disappeared.

The reversible case
) for some g ∈ C 2 to be chosen.Assume that g is bounded and that ∇ 2 (U + V )(x) c/2 for all x ∈ R d .Then the Bakry-Emery criterion states that μ ∝ exp(−(U + V )/T ) satisfies a Poincaré inequality with constant 2T /c, and by bounded perturbation we get that µ ∝ exp(−U/T ) satisfies a Poincaré inequality with constant 2T e (max g−min g)/T /c.It remains to choose g.We choose g to be non-increasing and convex, so that we bound As in Section 2.1.2,we simply take g as (a C 1 non-decreasing approximation of) the continuous solution of g (0) = −K/2 − c/4 and In other words, g is exactly such as constructed in Section 2.1.2with d = 1.We end up with As a consequence,

Perturbation of the drift
This section is devoted to the proof of Proposition 3. Define ρ as ρ(x, y) for some α 2, where ω is a positive bounded function to be chosen.Let (X t ) t 0 and (Y t ) t 0 be the solution respectively of (1) and (10) driven by (B t ) t 0 with some initial condition ν.Then with Ψ defined in (12).Let ω be as defined in Section 2.1.2,and λ = c/4.Then, writing m(t) = E (ρ(X t , Y t )) and using the Hölder inequality, and thus Using the equivalence between ρ and |x − y| α and taking the infimum over all coupling of the initial conditions yields which concludes the proof.
4 Interacting particles at high temperature where (B 1,t , . . ., B N,t ) t 0 are N independent standard d-dimensional Brownian motions, In other words, X solves (1) with a drift b whose Theorem 5. Assume that there exist c > 0, a < c and C F , C G , R, M G 0 such that and, for all x, y ∈ R dN N i=1 and Then the semi-group P t associated to the process X satisfies the W 2 contraction (5) .
Notice that the first part of (18) implies the second one with $a = C_G$, but in many cases we can have $a < C_G$ (possibly $a \leq 0$, see next section), and the result is much more sensitive to the value of $a$ (in particular through the condition $a < c$) than to the value of $C_G$.
where ω is a positive function to be chosen.As in Section 2.1.2,writing L s the generator on R dN × R dN of a parallel coupling of two processes, we consider separately the leading terms with respect to T and the rest in We take ω as in Section 2.1.2but with K, c replaced respectively by C F +a and c−a (in particular ω is indeed constant outside the ball {|x| R * }).The following holds: Hence, provided T T 0 , the previous bounds yield The conclusion is now similar to the end of the proof of Theorem 1, using that The point of Theorem 5 is that if all the constants in the assumption are independent from N , then so are T 0 , λ and M .In particular, from Theorem 2, we get for the invariant measure of the process a Poincaré inequality independent from N (for T large enough).The restriction to a sufficiently high temperature is very natural for interacting particles systems where phase transitions are expected in the behaviour of the Poincaré inequality at low temperature [48].
A Poincaré constant uniform in N for T large enough is established in [28] in a reversible framework with an explicit invariant measure.Although Theorem 5 does not require reversibility, on the other hand it needs the interaction force G to be bounded, which is not the case in [28] and is a restrictive condition.It is however satisfied in many cases of interest, for instance in adaptive algorithms such as studied in [15], a typical choice is which induces a local repulsion of particles, enhancing the exploration of the state space.More generally, assume that there exist a graph on 1, N of degree D and a bounded and Lipschitz function where i ∼ j means that (i, j) is an edge of the graph.This is the case for mean field interaction (with the complete graph and D = N ) or for interaction with closest neighbors in (Z/nZ) k or 1, n k (with i ∼ j if |i − j| = 1, D = 2k).Then G satisfies the assumptions of Theorem 5 with M G , C G , a which only depend on H (and thus not on the number of particles).For instance, in the particular (reversible) case where the forces are the gradients of some potentials, we get the following corollary of Theorem 5.
Assume that ∇W is bounded and Lipschitz and that there exists c > 0 and a < c such that ∇ 2 V c > 0 outside a compact set and ∇ 2 W −a/2. Then there exist T 0 , C > 0 such that the following holds.For all T T 0 , all N ∈ N, all graph on 1, N , denoting by D the degree of the graph and considering on R dN the potential then e −U/T is integrable and the probability measure proportional to this density satisfies a Poincaré inequality with constant C P CT .
This is a result in the spirit of [28, Theorem 1].

The mean-field case and propagation of chaos
Let us now focus on the case of mean field interactions.More precisely, we work under the following condition: Moreover, there exist c > 0, a < c and C F , R, C G , M G 0 such that F satisfies (17), H is 2C H /3-Lipschitz continuous and for all x, y, x , y ∈ R d , Finally, T T 0 , where T 0 is given in Theorem 5.
It is straightforward to check that this condition implies the assumptions of Theorem 5. As soon as $H$ is $2C_H/3$-Lipschitz, (19) holds with $a = C_H$, and the condition $a < c$ is then satisfied if the interaction is sufficiently small. However, in some cases, $a$ may be smaller than $C_H$. In particular, in the usual case where $H(x,y) = \tilde H(x-y) = -\tilde H(y-x)$ for some $\tilde H$, the condition (19) holds with $a = 0$ if $x\cdot\tilde H(x) \leq 0$ for all $x \in \mathbb R^d$. This is the case for instance for $\tilde H(x) = -\nabla W(x)$ with $W(x) = \gamma\sqrt{1+|x|^2}$, for any $\gamma \geq 0$. Since $\nabla W$ is bounded, in this case, Theorem 5 applies whatever the value of $\gamma$, i.e. even if the interaction force is not small with respect to the confining force (however, as $\gamma$ increases, so does the temperature $T_0$).
As N → +∞, according to the propagation of chaos phenomenon, it is well-known that two given particles of the system behave like independent McKean-Vlasov processes solving In other words, νt solves the non-linear equation where The existence, uniqueness of the process ( 20) and of the solution of the equation ( 21), together with time-dependent propagation of chaos estimates, follow from standard arguments [47,41] for initial conditions ν0 in P 2 (R d ) the set of probability measures on R d with finite second moment.
With a $\mathcal W_2$ Wasserstein contraction such as the one given by Theorem 5 at hand, it is straightforward to obtain time-uniform propagation of chaos and a Wasserstein contraction for the limit equation.
Theorem 7.Under Assumption 2, there exist (explicit) constants α, β > 0 such that the following holds.For N ∈ N, let P N t be the semi-group associated to L = b • ∇ + T ∆ on R dN and let νt be a solution of (21).Then, for all t 0, where M and λ are given in Theorem 5.
Proof.The proof is essentially the same as the proof of Proposition 3, except that we consider a cost ρ as in the proof of Theorem 5, namely Let (X t ) t 0 be a system of particles with drift b and initial condition ν and Y = (Y 1 , . . ., Y N ) be solutions of (20) (with the same Brownian motions as X) with initial condition ν ⊗N 0 .In particular, Y t ∼ ν⊗N t for all t 0. As in the proof of Proposition 3, writing m(t) = E (ρ(X t , Y t )), we get where λ is given in Theorem 5 and, using that the Y j 's all have the same law, Developing the square and using that the variables A j := H(Y 1,t , Y j,t ) − H * νt (Y 1,t ) for j = 1 are independent and centered, we get Then we bound Under Assumption 2, for all y, y ∈ R d , Integrating in time, Integrating in time (and noticing that λ (c − a)/4), m(t) e −λt m(0) and the proof is concluded by using the equivalence between ρ and the Euclidean norm and taking the infimum over the couplings of the initial distributions.
Theorem 7 has the following consequences.
Corollary 8.Under Assumption 2, considering M, λ, α, β as in Theorem 7, then, for all N ∈ N, k ∈ 1, N and all ν0 , μ0 ∈ P 2 (R d ), the following holds.Let (X t ) t 0 be a system of N interacting particles on R dN with drift b and with initial condition ν⊗N 0 and denote by ν k,N t the law of (X 1,t , . . ., X k,t ).Let νt , μt be the solutions of (21) with respective initial conditions ν0 , μ0 .Then, for all t 0, and there exists a unique stationary solution to (21 .
where C is a constant which depends only on d and on the parameters of Assumption 2, and In the last claim we assumed a finite 5 th moment to get a simple statement but, as can be seen in the proof and from the results of [26], a similar result would hold assuming only a q th finite moment for any q > 2. If only a second moment is available, we still get a similar result if W 2  2 is replaced by W p p for any p < 2.
Proof.Using the interchangeability of particles and that any coupling of νP t and ν⊗N t gives a coupling of the k first particles immediatly yield The first claim then follows from Theorem 7. By the same argument, denoting by µ 1,N t the first d-dimensional marginal of μ⊗N 0 P N t , we get where π 0 ∈ Π(ν 0 , μ0 ) and then taking the infimum over π 0 yields and the second claim is thus obtained by letting N go to infinity in (23).As a consequence, for t large enough, the function Φ t : ν0 → νt , where (ν t ) t 0 is the solution of ( 21) with initial condition ν0 , is a contraction of P 2 (R d ) endowed with the W 2 distance, which is complete.Hence, Φ t admits a unique fixed point for t large enough, and using that Φ t Φ s = Φ s Φ t for all s 0 and the uniqueness of the fixed point we get that the fixed point of Φ t is in fact a fixed point of Φ s for all s 0, i.e. is a stationary solution of (21).
For the last claim, let We bound with c d a constant that depends only on d, where we used the coupling (X J,t , Y J,t ) with J a random variable uniformly distributed over 1, N independent from (X t , Y t ) to bound the first term, and [26, Theorem 1] for the second one.Then, reasoning exactly as in the proof of Theorem 7, we get that for some Q > 0 which depend on the parameters of Assumption 1. Taking the infimum over π t ∈ Π ν⊗N 0 P t , ν⊗N t we end up with , and Theorem 7 concludes the proof.

Discussion
On the two proofs

First, let us notice that the main ingredient of the two proofs of Theorem 1 is the very simple construction of a weighted distance. Indeed, the weighted gradient $\sqrt a\nabla$ of the second proof is the gradient associated to a weighted distance $r$. Weighted distances are a very standard tool, in particular for the study of the long-time convergence of Markov processes. However, it seems to us that the main originality of our work is that the role of the weight is exactly the converse of the usual one. Indeed, under Assumption 1, one typically (see e.g. [32,31,22,44,23]) considers costs of the form $\rho(x,y) = d(x,y)(1+V(x)+V(y))$, where $d(x,y) = \mathbb 1_{x\neq y}$ or $d(x,y) = f(|x-y|)$ for some $f$ is the initial distance we are interested in and $V$ is a Lyapunov function (say, $V(x) = |x|^2$), which satisfies $LV \leq -cV$ outside a compact ball thanks to the deterministic drift part $b\cdot\nabla$. This Lyapunov condition is then combined with some local information (local Poincaré inequality, local Doeblin condition, local coupling condition...) which is available on compact sets. In other words, the weight is used to obtain a decay outside a given compact set. On the contrary, in our case, we take a weight of the form $V(x) = C - |x|^2$ in a compact ball, so that $LV \leq -cV$ in this ball thanks to the dissipative part $T\Delta$.

One interest of the first proof is that it gives more information than simply a contraction of the Wasserstein distance at time $t > 0$: it shows that the contraction is realized by $(X_t,Y_t)_{t\geq 0}$, the synchronous coupling of two diffusions. In particular, this is a Markovian coupling (i.e. $(X,Y)$ is a Markov process), realized with a single process for all times. On the other hand, one interest of the second proof is that it can be adapted to deal with quantities integrated with respect to $\mu$, which could be interesting in the perspective of proving a Poincaré inequality for non-explicit invariant measures of non-reversible diffusion processes without any restriction on $T$ but with the assumption that $\mu$ satisfies local Poincaré inequalities (which straightforwardly follow from elliptic lower and upper bounds on the transition density). Indeed, the computations of Section 2.2 can be carried out after integration with respect to $\mu$, and then yield $\mu(a|\nabla P_t f|^2) \leq \|a\|_\infty \|a^{-1}\|_\infty e^{-2\lambda t}\mu|\nabla f|^2$, which does not give a $\mathcal W_2$ contraction but is sufficient to get a Poincaré inequality (see Section 3.1). An argument somewhat in this spirit is given in [4], although in a completely different framework, i.e. with a non-elliptic hypoelliptic diffusion, a singular drift and weighted Poincaré inequalities. On the other hand, although the process is non-reversible in this case, its invariant measure is known, and thus the proof relies on the knowledge of $L^*$, the adjoint of $L$ in $L^2(\mu)$, which is unavailable if $\mu$ is unknown. Besides, in the elliptic case, if $L^*$ is known, it should be possible to adapt the arguments of [10,2] to get a Poincaré inequality by considering a Lyapunov function with respect to $L^*$ rather than $L$, as in [4].

Flat torus
As emphasized in the previous paragraph, the proof relies on a synchronous coupling. Now, consider the very simple case of the Brownian motion on the torus $\mathbb T = \mathbb R/\mathbb Z$, i.e. $L = \Delta$. Considering a synchronous coupling of two such processes leads to $|X_t - Y_t| = |X_0 - Y_0|$ (where we write $|x-y|$ for the distance on $\mathbb T$). Hence, any $\rho : \mathbb T\times\mathbb T \to \mathbb R_+$ such that $c|x-y|^\alpha \leqslant \rho(x,y) \leqslant C|x-y|^\alpha$ for some $c, C, \alpha > 0$ necessarily satisfies Hence, the first proof of Theorem 1 cannot apply in this case.
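The obstruction on the torus can be observed numerically. The following is a minimal sketch, not taken from the paper: it discretizes two Brownian motions on $\mathbb R/\mathbb Z$ with generator $\Delta$ using the same Gaussian increments, and the helper names `synchronous_coupling_torus` and `torus_distance`, the step size and the seed are all hypothetical choices made for the illustration.

```python
import math
import random


def synchronous_coupling_torus(x0, y0, n_steps, h, seed=0):
    """Simulate two Brownian motions on R/Z driven by the SAME noise.

    The generator is the Laplacian, so each increment is sqrt(2h) * N(0, 1).
    Returns the final positions (x, y), both in [0, 1].
    """
    rng = random.Random(seed)
    x, y = x0 % 1.0, y0 % 1.0
    for _ in range(n_steps):
        dw = math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)  # shared increment
        x = (x + dw) % 1.0
        y = (y + dw) % 1.0
    return x, y


def torus_distance(x, y):
    """Geodesic distance on the circle R/Z."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)
```

Calling `synchronous_coupling_torus(0.1, 0.4, 1000, 0.01)` and then `torus_distance` on the result returns the initial distance up to rounding errors, whatever the noise realization: with no drift, the synchronous coupling neither contracts nor expands.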
Notice that, for compact manifolds, Poincaré inequalities can be obtained by lower and upper bounds on the transition density and then perturbation of the Lebesgue measure. See also [50,51] in the reversible case. Besides, notice that a weighted distance is used in [51], but with the motivation of handling boundaries.

Exponential tail
It is well-known that $\mu \propto \exp(-U)$ satisfies a Poincaré inequality. However, using that $\|\nabla U\|_\infty = 1$, and using that This clearly forbids a Wasserstein contraction (5), for any $\alpha \geqslant 1$. Indeed,

Wasserstein contraction versus Wasserstein convergence
In the case where $\mu \propto \exp(-U)$ and $U$ is convex at infinity, $\mu$ is known to satisfy a so-called log-Sobolev inequality, see [3], which is stronger than the Poincaré inequality and implies (combining the exponential decay of the entropy, the $T_2$ Talagrand inequality implied by the log-Sobolev one and the $\mathcal W_2$/entropy regularization of [46], see e.g. the proof of [30, Theorem 2]) that there exist $C, \lambda > 0$ such that for all $t \geqslant 0$ and any probability law $\nu$ on $\mathbb R^d$, $\mathcal W_2(\nu P_t, \mu) \leqslant C e^{-\lambda t}\,\mathcal W_2(\nu,\mu)$.
However, this convergence towards equilibrium in $\mathcal W_2$ is weaker than the contraction (5) for $\alpha = 2$. Besides, we are interested in cases where the Poincaré inequality does not follow from standard arguments, and thus neither does the log-Sobolev inequality. In [51, Corollary 1.3], the log-Sobolev inequality is proven from the exponential decay of weighted gradients along the semi-group, but in contrast to the present work it concerns reversible processes; more precisely, the perturbation argument used at the end of [51, Corollary 1.3] requires an explicit invariant measure. In order to use a perturbation argument when the invariant measure of the semi-group $P_t$ is unknown, we can still say that the measure $\varphi\mu$, where $\varphi$ is bounded above and below by positive constants, is invariant for the semi-group $P_t^\varphi f = \varphi^{-1}P_t(\varphi f)$, but it is unclear whether this could be used to adapt the proof of [51] to non-reversible cases with unknown $\mu$.

Non-reversible sampling
In the context of Markov Chain Monte Carlo algorithms, a question is the following: given a known target probability measure $\mu \propto \exp(-U)$, one would like to find, among all the drifts $b$ such that $\mu$ is invariant for $L = b\cdot\nabla + \Delta$, the one for which convergence toward equilibrium is the fastest (here for simplicity we fix the diffusion matrix to be the identity), see e.g. [39,29,34,35]. If the convergence toward equilibrium is quantified in terms of the $L^2(\mu)$-norm, then $\sup_{f\in L^2(\mu)} \|P_t f - \mu f\|_{L^2(\mu)} / \|f - \mu f\|_{L^2(\mu)} \leqslant e^{-t/C_P(\mu)}$, see e.g. Theorem 2, and moreover this is an equality in the reversible case. As a consequence, adding a non-reversible part to the generator $-\nabla U\cdot\nabla + \Delta$ can only improve the $L^2$ convergence rate. For instance, as already mentioned in Section 1, in the Gaussian case where $U(x) = x\cdot Ax$ for some positive definite symmetric matrix $A$, $C_P(\mu)$ is the minimum of the eigenvalues of $A$ while the optimal rate obtained with non-reversible elliptic diffusions is the mean of the eigenvalues of $A$, see [39].
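The Gaussian case can be checked by hand in dimension 2. The sketch below is an illustration under our own assumptions, not the construction of [39]: the drift is taken of the form $b(x) = -(I+J)Hx$, where $H$ is the (symmetric) Hessian of a quadratic $U$ and $J = \begin{pmatrix}0 & a\\ -a & 0\end{pmatrix}$ is antisymmetric, a standard perturbation that leaves $\mu \propto \exp(-U)$ invariant. The function names and the matrix $H = \mathrm{diag}(1,4)$ used in the usage note are hypothetical, and the rate computed is the decay rate of the linear drift, i.e. the smallest real part of the eigenvalues of $(I+J)H$.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix [[a, b], [c, d]] from trace and determinant."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4.0 * det)  # complex square root, handles rotation
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def drift_matrix(hess, a):
    """Matrix M = (I + J) H with J = [[0, a], [-a, 0]], so that b(x) = -M x."""
    (h11, h12), (h21, h22) = hess
    return [[h11 + a * h21, h12 + a * h22],
            [-a * h11 + h21, -a * h12 + h22]]


def decay_rate(hess, a):
    """Smallest real part of the eigenvalues of (I + J) H."""
    return min(ev.real for ev in eigenvalues_2x2(drift_matrix(hess, a)))
```

With `H = [[1.0, 0.0], [0.0, 4.0]]`, `decay_rate(H, 0.0)` gives the reversible rate, the smallest eigenvalue of `H`, while for `a = 1.0` the two eigenvalues become complex conjugates whose common real part is the mean of the eigenvalues of `H`.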
Alternatively, the efficiency of the process can be measured in terms of $\mathcal W_2$-contraction (and this leads to the same conclusion for Gaussian processes). Let B be the set of drifts $b$ which are $K$-Lipschitz and such that $\mu$ is invariant for $b\cdot\nabla + \Delta$ (in practice, the Lipschitz constant impacts the stability of the numerical schemes used to discretize (1), and thus the non-reversible part should not be too large). Then, instead of seeing Theorem 2 as a way to obtain a Poincaré inequality from a Wasserstein contraction, here we can use this result to obtain a constraint on $M$ and $\lambda$ such that (5) holds in terms of $\mu$, uniformly over B. Another way to see this is the following: for a fixed $t > 0$ (corresponding to a fixed computational budget), what is the smallest $\gamma(t)$ one can obtain such that there exists a drift $b \in$ B such that where $P_t$ is the semi-group associated to $b$? More generally, for a given $b$ and for $s \geqslant 0$, let Since $b$ is $K$-Lipschitz, we immediately obtain from a synchronous coupling that $\gamma(s) \leqslant e^{Ks}$ and, following the proof of Theorem 2, (24) implies that Hence the lower bound on the contraction rate Of course this is not very informative for large $t$.
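The synchronous-coupling bound invoked here can be spelled out as follows. This is a sketch under the assumptions that the dynamics is $dX_t = b(X_t)\,dt + \sqrt 2\, dB_t$ (generator $b\cdot\nabla + \Delta$), that both copies are driven by the same Brownian motion, and that $\gamma(s)$ stands for the best constant in a $\mathcal W_2$ bound at time $s$; the symbols $Z_t$, $\nu$ and $\nu'$ are introduced only for this illustration.

```latex
\begin{align*}
Z_t &:= X_t - Y_t, \qquad dZ_t = \big(b(X_t) - b(Y_t)\big)\,dt
      \quad\text{(the noise cancels)},\\
\frac{d}{dt}|Z_t|^2 &= 2\,Z_t\cdot\big(b(X_t) - b(Y_t)\big) \leqslant 2K|Z_t|^2,\\
|Z_s| &\leqslant e^{Ks}|Z_0| \quad\text{(Gr\"onwall)},\\
\mathcal W_2(\nu P_s, \nu' P_s) &\leqslant \big(\mathbb E|Z_s|^2\big)^{1/2}
      \leqslant e^{Ks}\,\mathcal W_2(\nu,\nu')
      \quad\text{if } (X_0,Y_0) \text{ is an optimal coupling of } \nu \text{ and } \nu'.
\end{align*}
```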
In fact, for sampling algorithms, one is more interested in ergodic averages rather than marginal laws at given times. The bias of an MCMC estimator for a Lipschitz test function is typically bounded as Trying to minimize the right hand side by a suitable choice of drift in B, in any case it is not possible to get better than $C_P(\mu)/t + o(1/t)$ due to the constraint (25). Having a better contraction rate in large times is thus only useful if the estimator is $\frac{1}{t-t_0}\int_{t_0}^t f(X_s)\,ds$ for a suitable warm-up time $t_0$.
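For concreteness, the estimator with warm-up can be discretized as below. This is a minimal sketch with hypothetical names: `values` is assumed to hold the samples $f(X_{ih})$ of a trajectory observed on a regular grid of step `h`, and the integral is replaced by a plain Riemann sum over the samples at times at least `t0`.

```python
def ergodic_average(values, h, t0):
    """Riemann-sum approximation of 1/(t - t0) * int_{t0}^{t} f(X_s) ds.

    values[i] is assumed to be f(X_{i*h}); samples taken before the warm-up
    time t0 are discarded, the remaining ones are averaged with equal weights.
    """
    start = int(round(t0 / h))  # index of the first sample that is kept
    kept = values[start:]
    if not kept:
        raise ValueError("warm-up time exceeds the length of the trajectory")
    return sum(kept) / len(kept)
```

Taking `t0 = 0` recovers the usual ergodic average over the whole trajectory; a positive `t0` only discards the transient phase and does not change the long-time variance of the estimator.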
Finally, notice that, in Theorem 1, $M \leqslant \sqrt 2$, and thus $\lambda \leqslant 2T/C_P(\mu)$, which means, for instance, that the improvement of the $L^2$ decay rate from $T/C_P(\mu)$ to $\lambda$ obtained by combining Theorems 1 and 2 is small.

Link with a Feynman-Kac eigenvalue problem
This section has to be credited to the anonymous referee who made the following remark: in Section 2.1, writing $u(x) = T + \alpha\omega(x)$, the proof works as soon as we find $\lambda > 0$ and a positive function $u$ such that

$$Lu - \alpha k u \leqslant -\lambda u, \qquad (26)$$

which is (13), but in fact looking for $u, \lambda$ such that equality holds in (26) is an eigenvalue problem for the operator $u \mapsto Lu - \alpha k u$, and the Krein-Rutman theorem (which holds under Assumption 1, see e.g. [14, Corollary 4.2]) states that in fact such an eigenpair $u, \lambda$ always exists, with $u > 0$.
The remaining question is whether $\lambda > 0$. Since $f_t(x) = e^{-\lambda t}u(x)$ solves $\partial_t f_t = Lf_t - \alpha k f_t$, we have for $u$ a Feynman-Kac representation $u(x) = e^{\lambda t}\,\mathbb E_x\left(e^{-\alpha\int_0^t k(X_s)\,ds}\,u(X_t)\right)$ for any $t \geqslant 0$. Normalizing $u$ to have $\mu(u) = 1$, using that $u$ grows at most polynomially (again thanks to [14, Corollary 4.2] since under Assumption 1, $x \mapsto |x|^2$ is a Lyapunov function for $L$) and that $\mu$ has all its polynomial moments finite, we get by integrating the previous identity with respect to $\mu$ and using the Hölder inequality that 1 e −λt E µ e −aα t 0 k(Xs)ds This quantity naturally appears when intertwining the semi-group $P_t$ with the gradient, as in [13,8,1] (however, notice that in our case, we are just interested in the asymptotic exponential rate, and the expectation is with respect to the invariant measure $\mu$ instead of taking the supremum over all initial conditions $x \in \mathbb R^d$).
It remains to see whether λ > 0. Notice that, by Jensen's inequality, λ − lim sup which means that having a positive mean curvature µ(k) is necessary to proceed with our proof based on the synchronous coupling.
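The Jensen step can be written out explicitly. The sketch below assumes that the process is started from its invariant measure $\mu$ (so that $X_s \sim \mu$ for all $s$), that $k$ is $\mu$-integrable, and that $a \geqslant 1$ is the exponent coming from the Hölder step; the normalization by $1/(at)$ is our choice for the illustration.

```latex
\begin{align*}
\mathbb E_\mu\Big(e^{-a\alpha\int_0^t k(X_s)\,ds}\Big)
  &\geqslant \exp\Big(-a\alpha\,\mathbb E_\mu\int_0^t k(X_s)\,ds\Big)
  && \text{(Jensen, convexity of } \exp)\\
  &= \exp\big(-a\alpha\,t\,\mu(k)\big)
  && \text{(stationarity)},\\
\text{so that}\quad
-\limsup_{t\to\infty}\frac{1}{at}\ln\mathbb E_\mu\Big(e^{-a\alpha\int_0^t k(X_s)\,ds}\Big)
  &\leqslant \alpha\,\mu(k).
\end{align*}
```

In particular, an exponential rate defined in this way can be positive only if $\mu(k) > 0$.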
Under Assumption 1, it is classically seen that $V(x) = e^{\delta|x|^2}$ is a Lyapunov function for $L$ (in the sense that $LV \leqslant -rV$ outside some compact set for some $r > 0$) as soon as $\delta < c/(2T)$. According to [17, Theorem 2.3], $\mu$ thus satisfies a $T_1$ Talagrand transport inequality (for the Euclidean distance on $\mathbb R^d$) with constant $\theta T$ for some $\theta > 0$ independent of $T$ (we refer to [17] for definitions and details). Then, we can use [27, Corollary 2.4] (although it is written for reversible processes, this assumption is only used in its second part; the first part is a direct corollary of [27, Theorem 2.2] where reversibility is not assumed) to bound, for any $v > 0$, (where, for consistency with [27], we rescaled the process in time so that the corresponding carré du champ is $\Gamma(f) = |\nabla f|^2$, corresponding to the standard distance, instead of $T|\nabla f|^2$). Here we have to assume that $k$ is a Lipschitz function, which is not a problem since our proof in Section 2.1 works if $k$ is replaced by a lower bound of $k$.
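The threshold $\delta < c/(2T)$ can be recovered by a direct computation. The sketch below takes $L = b\cdot\nabla + T\Delta$ and assumes, as a simplified stand-in for Assumption 1, that $x\cdot b(x) \leqslant -c|x|^2 + C_0$ for all $x$, where $C_0$ is a hypothetical constant introduced for the illustration.

```latex
\begin{align*}
V(x) &= e^{\delta|x|^2}, \qquad \nabla V = 2\delta x\,V, \qquad
\Delta V = \big(2\delta d + 4\delta^2|x|^2\big)V,\\
\frac{LV}{V} &= 2\delta\,x\cdot b(x) + T\big(2\delta d + 4\delta^2|x|^2\big)
  \leqslant -2\delta\,(c - 2T\delta)\,|x|^2 + 2\delta\,(C_0 + Td).
\end{align*}
```

If $\delta < c/(2T)$, the coefficient of $|x|^2$ is negative, so the right-hand side is below $-r$ for any fixed $r > 0$ once $|x|$ is large enough.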
Then, we follow the proof of [8, Corollary 4.4]. Assuming that $\mu(k) > 0$, we introduce the event Finally, as already mentioned, since we only need $k$ to be a lower bound of the curvature, we can take it with a Lipschitz constant arbitrarily small. However, this reduces $\mu(k)$, which can become negative. But then by assuming $T$ is large enough we can make $\mu(k)$ positive again. More precisely: first, fix a lower bound $\tilde k$ of the real curvature $k$ defined by (2) (with $\inf \tilde k = -K$ and $\tilde k(x) = c$ for $x$ large enough) with a Lipschitz constant $\|\tilde k\|_{lip}$ sufficiently small so that $\alpha K < \frac{c^2}{64\theta\|\tilde k\|_{lip}^2}$.
At the conclusion of this sketch of proof, we have thus obtained the following: under Assumption 1, for any $\alpha > 0$, there exists $T_0 > 0$ such that the operator $L - \alpha\tilde k$ has an eigenpair $(u, \lambda)$ with $u > 0$ and $\lambda > 0$ if $T \geqslant T_0$, where $\tilde k$ is some lower bound of the curvature (which implies in particular that $Lu - \alpha k u \leqslant -\lambda u$). Now, in order to conclude with a result similar to Theorem 1, it remains to do a bit of work on $u$, which we will not discuss here. Notice that, with an argument which starts with a non-explicit existence of the positive eigenfunction $u$, it is not necessarily easy to end up with explicit estimates as in Theorem 1 (in particular for the constant $M$). However, this approach can be applied in a much more general framework, for instance for non-elliptic hypoelliptic diffusion processes. This question will be the topic of a future work. Moreover, this discussion suggests that the high-temperature regime is in fact a necessary condition for the synchronous coupling to contract distances for sufficiently large times under Assumption 1 (which is consistent with the remarks after Theorem 2).

Figure 1: Left: $r \mapsto g(r)$. Right: $x \mapsto \omega(x) = g(|x|^2) - \min g$ (in dimension 1). For $r \leqslant R^2$, $g$ is affine decreasing, for $r \geqslant R_*^2$, it is constant, and in between it is convex but with $g''$ constrained not to be too large, which thus requires to take $R_*$ large enough.
) holds with M = 4/3 and λ = c/4.An explicit upper bound of T0 follows from the estimates on ω given in Section 2.1.2.

using Theorem 5 .
Considering a coupling of $\nu_0^{\otimes N}$ and $\mu_0^{\otimes N}$ of the form $\pi_0^{\otimes N}$
s )ds −αµ(k) + v = − s/T )ds −αµ(k) + v for some $v > 0$ to be chosen later on. Since $k$ is lower bounded by $-K$,

$$\mathbb E_\mu\left(e^{-\alpha\int_0^t k(X_s)\,ds}\right) \leqslant e^{\alpha K t}\,\mathbb P(A) + e^{-\alpha\mu(k)t + vt} \leqslant \exp\left(\alpha K t - \frac{v^2 t}{4\theta\alpha^2\|k\|_{lip}^2}\right) + e^{-\alpha\mu(k)t + vt}.$$

Taking e.g. $v = \alpha\mu(k)/2$, we end up with λ λ

If $b = -\nabla U$ and Assumption 1 holds, let us check what explicit estimate can be obtained by the Holley-Stroock perturbation argument together with the Bakry-Emery curvature one. As in Section 2.2.1, we use that Assumption 1 implies that $\nabla^2 U(x)$