Quadratic transportation cost in the conditional central limit theorem for dependent sequences

In this paper, we give estimates of the quadratic transportation cost in the conditional central limit theorem for a large class of dependent sequences. Applications to irreducible Markov chains, dynamical systems generated by intermittent maps and $\tau$-mixing sequences are given.


Introduction
Let $(X_i)_{i \in \mathbb{Z}}$ be a strictly stationary sequence of real-valued random variables (r.v.) with mean zero and finite variance. Set $S_n = X_1 + X_2 + \cdots + X_n$. We denote by $P_{n^{-1/2} S_n}$ the law of $n^{-1/2} S_n$ and by $G_{\sigma^2}$ the normal distribution $N(0, \sigma^2)$. In this paper, we assume furthermore that the series $\sigma^2 = \sum_{k \in \mathbb{Z}} \operatorname{Cov}(X_0, X_k)$ converges (under this assumption, $\lim_n n^{-1} \operatorname{Var} S_n = \sigma^2$), and we shall give quantitative estimates of the approximation of $P_{n^{-1/2} S_n}$ by $G_{\sigma^2}$ in terms of the quadratic cost, which is the square of the $L^2$-minimal distance. With this aim, we first recall the definition of the $L^p$-minimal metrics: for $p \geqslant 1$ and probability laws $\mu$ and $\nu$ on $\mathbb{R}$ with finite absolute moments of order $p$,
$$W_p(\mu, \nu) = \inf \big\{ \|X - Y\|_p : X \text{ has law } \mu \text{ and } Y \text{ has law } \nu \big\} .$$
$W_p$ is usually called the $L^p$-minimal distance, and sometimes the Wasserstein distance of order $p$. It is well known that for probability laws $\mu$ and $\nu$ on $\mathbb{R}$ with respective distribution functions (d.f.) $F$ and $G$,
$$W_p^p(\mu, \nu) = \int_0^1 |F^{-1}(u) - G^{-1}(u)|^p \, du , \tag{1.1}$$
where $F^{-1}$ and $G^{-1}$ denote respectively the generalized inverse functions of $F$ and $G$. We refer to Villani [Vil09, Chapter 6] for the properties of this metric. For $(X_i)_{i \in \mathbb{Z}}$ a sequence of independent and identically distributed (iid) centered real-valued random variables in $L^4$ with variance $\sigma^2$, Rio [Rio09, inequality (1.7)] states that there exists a universal constant $c$ such that for any positive integer $n$,
$$n \, W_2^2\big(P_{S_n/\sqrt{n}}, G_{\sigma^2}\big) \leqslant c \, \sigma^{-2} \|X_1\|_4^4 . \tag{1.2}$$
In addition, it is also shown in the same paper that this upper bound is optimal. More precisely, for any $\kappa \geqslant 1$, let $M(4, \kappa)$ be the class of probability measures $\mu$ on the real line such that $\int x \, d\mu(x) = 0$, $\int x^2 \, d\mu(x) = 1$ and $\int x^4 \, d\mu(x) = \kappa$. In the case where $(X_i)_{i \in \mathbb{Z}}$ is a sequence of iid random variables with common law $\mu$ in $M(4, \kappa)$, Theorem 5.1 in [Rio09] provides a lower bound of the same order, uniformly over the class $M(4, \kappa)$. We refer to Bobkov [Bob13] for another proof of (1.2) based on relative entropy and Talagrand's entropy-transport inequality. Actually, a more general result holds: for any $p \geqslant 1$ and any iid centered sequence with $X_1 \in L^{p+2}$, there exists a universal constant $c_p$ such that $W_p(P_{S_n/\sqrt{n}}, G_{\sigma^2}) \leqslant c_p \, n^{-1/2}$, up to a factor depending only on the moments of $X_1$ (see Rio [Rio09] for $p \in [1, 2]$ and Bobkov [Bob18] for $p > 2$). Extensions to random vectors in $\mathbb{R}^d$ are given in Bonis [Bon20]. We also mention the extensions of the upper bound (1.2) to the $m$-dependent case and to $U$-statistics obtained by Fang [Fan19].
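To make the order of magnitude in (1.2) concrete, here is a minimal Monte Carlo sketch (our illustration, not part of the original argument; the Rademacher distribution is an arbitrary choice of a centered law in $L^4$). It estimates $W_2^2(P_{S_n/\sqrt{n}}, G_1)$ through the quantile representation (1.1), coupling the sorted simulated values of $S_n/\sqrt{n}$ with a deterministic grid of Gaussian quantiles, and checks that $n W_2^2$ stays bounded in $n$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def w2_sq_to_normal(sample, sigma=1.0):
    """Estimate W_2^2(law of sample, N(0, sigma^2)) via the quantile
    coupling (1.1): match the sorted sample with the Gaussian quantiles
    taken at the midpoints (i + 1/2)/N."""
    N = sample.size
    grid = sigma * norm.ppf((np.arange(N) + 0.5) / N)
    return np.mean((np.sort(sample) - grid) ** 2)

reps = 200_000  # Monte Carlo copies of S_n / sqrt(n)
for n in (10, 100, 1000):
    # For iid Rademacher signs (sigma^2 = 1), S_n = 2 * Binomial(n, 1/2) - n
    s = (2.0 * rng.binomial(n, 0.5, size=reps) - n) / np.sqrt(n)
    print(n, n * w2_sq_to_normal(s))  # n * W_2^2 should stay O(1), as in (1.2)
```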
In this paper, one of our motivations is to relax the independence assumption and to find sufficient conditions, in the case of dependent sequences, ensuring that
$$W_2^2\big(P_{S_n/\sqrt{n}}, G_{\sigma^2}\big) = O(n^{-1}) . \tag{1.4}$$
In the dependent setting, a well-known class is the class of irreducible, aperiodic and positively recurrent Markov chains $(\xi_n)$ with an atom denoted by $A$ (see the definition in [Bol82, p. 286]). Let $\pi$ be the unique invariant distribution of the Markov chain. From now on, $(\xi_n)$ will be the Markov chain starting from $\pi$. Let us then consider the strictly stationary sequence $(X_k)$ defined by $X_k = f(\xi_k)$, with $f$ a bounded function such that $\pi(f) = 0$. In view of the regeneration scheme and the upper bound (1.2), one can conjecture that (1.4) holds for $S_n = \sum_{k=1}^n X_k$ provided that $E_A(\tau_A^4) < \infty$, where $\tau_A$ is the first return time to $A$ and $E_A$ stands for the expectation under $P_x$ for $x \in A$. Next, from [Bol82, Lemma 3] and [Rio17b, p. 165], it is known that $E_A(\tau_A^4) < \infty$ is equivalent to
$$\sum_{n \geqslant 1} n^2 \alpha_n < \infty . \tag{1.5}$$
In this paper we shall prove that (1.4) holds true for any stationary sequence $(X_k)_{k \in \mathbb{Z}}$ of bounded real-valued random variables satisfying (1.5) for the sequence $(\alpha_n)_{n \geqslant 0}$ of strong mixing coefficients in the sense of Rosenblatt (see for instance [MPU19, Section 5.1.1] for a definition of these coefficients in the general case), which includes the case of Markov chains described above. This will be a consequence of a more general result, also valid for a class of weakly dependent sequences which may fail to be strongly mixing. In order to give more precise statements of our results, let us now introduce the dependence coefficients that we will use in this paper.
Definition 1.1.- Let $(X_i)_{i \in \mathbb{Z}}$ be a stationary sequence of bounded real-valued random variables and let $\mathcal{F}_0 = \sigma(X_i, i \leqslant 0)$. For $p$ and $q$ positive integers, let $\Gamma_{p,q} = \{(a_i)_{1 \leqslant i \leqslant p} \in \mathbb{N}^p : a_1 \geqslant 1 \text{ and } \sum_{i=1}^p a_i \leqslant q\}$. For $k \geqslant 0$, set
$$\theta_{X,p,q}(k) = \sup_{(a_i)_{1 \leqslant i \leqslant p} \in \Gamma_{p,q}} \; \sup_{i_p \geqslant \cdots \geqslant i_1 \geqslant k} \Big\| E\big(X_{i_1}^{a_1} X_{i_2}^{a_2} \cdots X_{i_p}^{a_p} \,\big|\, \mathcal{F}_0\big) - E\big(X_{i_1}^{a_1} X_{i_2}^{a_2} \cdots X_{i_p}^{a_p}\big) \Big\|_1 .$$
As a consequence of our Theorem 2.1, we will obtain that if
$$\sum_{k \geqslant 1} k^2 \, \theta_{X,4,3}(k) < \infty , \tag{1.6}$$
then (1.4) holds, which immediately implies that (1.4) holds for additive bounded functionals of a Markov chain satisfying (1.5). In fact we shall give a conditional version of (1.4) and show that when $(X_k)_{k \in \mathbb{Z}}$ is a stationary sequence of centered and bounded real-valued random variables satisfying (1.6), then
$$E\Big(W_2^2\big(P_{S_n/\sqrt{n} \mid \mathcal{F}_0}, G_{\sigma^2}\big)\Big) = O(n^{-1}) . \tag{1.7}$$
Note that in the case of bounded functions of a Markov chain $(\xi_k)_k$ satisfying $E_A(\tau_A^4) < \infty$, with invariant distribution $\pi$, the Schwarz inequality together with (1.7) imply that $W_2^2\big(P^{\mu}_{S_n/\sqrt{n}}, G_{\sigma^2}\big) = O(n^{-1})$ for any positive measure $\mu$ such that $d\mu = f \, d\pi$ with $\int f^2 \, d\pi < \infty$. Above, $E_\mu$ stands for the expectation of the chain under the initial law $\mu$.
It is worth pointing out that (1.7) implies (1.4). Indeed, the following fact holds.
Fact 1.2.- Let $X$ and $Y$ be two random variables defined on $(\Omega, \mathcal{A}, P)$ and let $\mathcal{F}$ be a sub-$\sigma$-algebra of $\mathcal{A}$. Then
$$W_2^2(P_X, P_Y) \leqslant E\big(W_2^2(P_{X \mid \mathcal{F}}, P_{Y \mid \mathcal{F}})\big) .$$
To see this, let $U$ be a random variable with uniform distribution over $[0, 1]$, independent of $\mathcal{F}$, and let $F_{X|\mathcal{F}}$ and $F_{Y|\mathcal{F}}$ denote respectively the conditional distribution functions of $X$ and $Y$ given $\mathcal{F}$. Set $X^* = F_{X|\mathcal{F}}^{-1}(U)$ and $Y^* = F_{Y|\mathcal{F}}^{-1}(U)$. Then $X^*$ has the law $P_X$, $Y^*$ has the law $P_Y$ and, by (1.1),
$$E\big(|X^* - Y^*|^2 \mid \mathcal{F}\big) = W_2^2\big(P_{X \mid \mathcal{F}}, P_{Y \mid \mathcal{F}}\big) .$$
Taking the expectation, this implies the above fact, since $W_2^2$ is the minimal quadratic cost.
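The following toy simulation (our sketch; the choice of $\mathcal{F} = \sigma(B)$ with a Bernoulli variable $B$, a conditionally Gaussian $X$ and an independent standard normal $Y$ is arbitrary) illustrates Fact 1.2 numerically, computing each $W_2^2$ through the quantile coupling of equal-size sorted samples, as in (1.1).

```python
import numpy as np

rng = np.random.default_rng(1)

def w2_sq(x, y):
    """Squared W_2 distance between two empirical measures with the same
    number of equally weighted atoms: the optimal coupling sorts both."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

n = 400_000
B = rng.integers(0, 2, size=n)                    # F = sigma(B)
X = rng.normal(np.where(B == 1, 1.0, -1.0), 1.0)  # X | F ~ N(+-1, 1)
Y = rng.normal(0.0, 1.0, size=n)                  # Y independent of F

lhs = w2_sq(X, Y)
rhs = sum((B == b).mean() * w2_sq(X[B == b], Y[B == b]) for b in (0, 1))
print(lhs, "<=", rhs)  # here E[W_2^2(P_{X|F}, P_{Y|F})] is about 1
```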
To prove Theorem 2.1, we shall apply Lindeberg's method, which was used by Billingsley [Bil61] and Ibragimov [Ibr63] in the case of martingales with stationary differences to prove the central limit theorem (we also consider this particular case in our Theorem 2.7). Note that this method was adapted to a large class of dependent sequences (not necessarily martingale differences) to evaluate the $L^1$-minimal distance between $P_{S_n/\sqrt{n}}$ and $G_{\sigma^2}$: by Pène [Pèn05] in the bounded multidimensional case, and next by Dedecker and Rio [DR08] in the unbounded case (under conditions involving coefficients similar to $\theta_{X,4,3}$, or weak mixing coefficients such as those described in Definition 3.2 below). Recently, estimates of the $L^1$-minimal distance between $P_{S_n/\sqrt{n}}$ and $G_{\sigma^2}$ when the underlying process is a function of iid random variables were given in [JWZ21, Theorem 3.1]. Their conditions are expressed in terms of coupling coefficients.
Our paper is organized as follows. Section 2 is devoted to the statements of upper bounds concerning the quadratic transportation cost in the conditional central limit theorem, and to their applications to pointwise estimates for the distribution function of the normalized sums and its generalized inverse. Applications to $\alpha$-dependent sequences, $\tau$-mixing sequences and the symmetric random walk on the circle are given in Section 3. The proofs are postponed to Section 4. Links between $|F_{S_n/\sigma_n}^{-1}(u) - \Phi^{-1}(u)|$ and $W_p(P_{S_n/\sigma_n}, G_1)$, for any $p \geqslant 1$, are given in Proposition A.1, where $\sigma_n = \sqrt{\operatorname{Var} S_n}$, $\Phi^{-1}$ is the inverse of the distribution function of the standard normal distribution and $F_{S_n/\sigma_n}^{-1}$ is the generalized inverse of the distribution function of $S_n/\sigma_n$. In particular, rates of convergence for the quadratic cost provide rates of convergence for $|F_{S_n/\sigma_n}^{-1}(u) - \Phi^{-1}(u)|$ (see Corollary 2.5). In the rest of the paper, we shall use the following notation: for two sequences $(a_n)_{n \geqslant 1}$ and $(b_n)_{n \geqslant 1}$ of positive reals, $a_n \ll b_n$ means that there exists a positive constant $C$ not depending on $n$ such that $a_n \leqslant C b_n$ for any $n \geqslant 1$. Moreover, for a real-valued random variable $X$ in $L^1$, the notation $X^{(0)}$ means $X - E(X)$.

Quadratic cost in the conditional CLT
The main result of this paper is Theorem 2.1 below.
Now, from the definition of the coefficients $\theta_{X,1,1}(k)$, the bound (2.1) holds, and its right-hand side is always of smaller order than the upper bounds (a) and (b). Hence Theorem 2.1 also holds for $E\big(W_2^2(P_{S_n/\sigma_n \mid \mathcal{F}_0}, G_1)\big)$. We now give applications of Theorem 2.1 to pointwise estimates, starting with Berry-Esseen type estimates. Arguing for instance as in [DMR09, Remark 2.4], Theorem 2.1 together with Comment 2.2 imply the Berry-Esseen type upper bound stated as Corollary 2.4.
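To recall why rates for the quadratic cost yield Berry-Esseen type rates, here is the standard comparison (our reconstruction of the argument behind [DMR09, Remark 2.4]). Let $\mu$ and $\nu$ have d.f.'s $F$ and $G$, assume $G$ has a density bounded by $D$, and set $\Delta = \sup_x |F(x) - G(x)|$. If, say, $F(x_0) - G(x_0) = \Delta$, then for $u = G(x_0) + t$ with $0 < t < \Delta$ one has $F^{-1}(u) \leqslant x_0 \leqslant G^{-1}(u) - t/D$, so that
$$W_2^2(\mu, \nu) \geqslant \int_0^{\Delta} \Big(\frac{t}{D}\Big)^2 dt = \frac{\Delta^3}{3 D^2}, \qquad \text{whence} \qquad \Delta \leqslant \big(3 D^2 \, W_2^2(\mu, \nu)\big)^{1/3}$$
(up to exchanging the roles of $F$ and $G$). Applied with $\nu = G_1$, so that $D = (2\pi)^{-1/2}$, a bound on the quadratic cost thus yields a Berry-Esseen type bound.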
We now give applications of our main result to estimates of the quantiles and the superquantiles of $S_n/\sigma_n$ in the nondegenerate case. Define the 1-risk $Q_{1,X}$ of a random variable $X$ as in Pinelis [Pin14]; then $Q_{1,X}(u)$ is the value of the superquantile of $X$ at the point $(1-u)$.
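For concreteness, here is a minimal empirical version of these quantities (our sketch; we read the 1-risk as $Q_{1,X}(u) = u^{-1} \int_0^u F_X^{-1}(1 - s) \, ds$, the average of the quantiles above the level $1 - u$, also known as CVaR; this reading is an assumption, the precise definition being in [Pin14]).

```python
import numpy as np

def quantile(sample, p):
    """Generalized inverse F^{-1}(p) of the empirical distribution function."""
    return np.quantile(np.asarray(sample), p, method="inverted_cdf")

def superquantile(sample, u):
    """Empirical Q_{1,X}(u): average of the top u-fraction of the sample
    (assumed reading of Pinelis's 1-risk)."""
    s = np.sort(np.asarray(sample))
    k = max(1, int(np.ceil(u * s.size)))
    return s[-k:].mean()

# For a large N(0,1) sample, quantile(x, 0.95) is close to 1.645 and
# superquantile(x, 0.05) is close to the normal superquantile, about 2.06.
x = np.random.default_rng(2).normal(size=10**6)
print(quantile(x, 0.95), superquantile(x, 0.05))
```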
The corollary below, which is a consequence of Theorem 2.1 and Proposition A.1, provides estimates of the accuracy in the central limit theorem for $F_{S_n/\sigma_n}^{-1}$ and $Q_{1,S_n/\sigma_n}$; its proof is given in Appendix A. Let $Y$ be a standard normal random variable. Then there exists a constant $C > 0$ such that, for any $n \geqslant 1$ and any $u$ in $(0, 1)$, the quantile and superquantile estimates of Corollary 2.5 hold.

Comment 2.6.- The rate given by Corollary 2.5(a) cannot be deduced from a Berry-Esseen type bound with the rate $n^{-1/2}$: indeed, if $\Delta_n$ is defined as in Corollary 2.4, such a bound only yields a weaker pointwise control of the quantiles. If furthermore the sequence $(X_i)_{i \in \mathbb{Z}}$ is a sequence of martingale differences, then the conditions on the dependence coefficients can be weakened as follows (the proof, being less intricate, is left to the reader).

α-mixing sequences
Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $\mathcal{U}$ and $\mathcal{V}$ be two sub-$\sigma$-algebras of $\mathcal{A}$. The strong mixing coefficient $\alpha(\mathcal{U}, \mathcal{V})$ between these $\sigma$-algebras is defined by
$$\alpha(\mathcal{U}, \mathcal{V}) = \sup \big\{ |P(U \cap V) - P(U) P(V)| : U \in \mathcal{U}, \ V \in \mathcal{V} \big\} .$$
Next, for a stationary sequence $(Y_i)_{i \in \mathbb{Z}}$ of random variables with values in a Polish space $S$, define its strong mixing (or $\alpha$-mixing) coefficients of order 4 by
$$\alpha_{\infty,4}(k) = \sup_{i_4 \geqslant i_3 \geqslant i_2 \geqslant i_1 \geqslant k} \alpha\big(\mathcal{F}_0, \sigma(Y_{i_1}, Y_{i_2}, Y_{i_3}, Y_{i_4})\big), \quad \text{where } \mathcal{F}_0 = \sigma(Y_i, i \leqslant 0) .$$
As indicated in [MPU19, p. 146], these coefficients can be rewritten in the following form. Let $B_1$ be the class of measurable functions from $S^4$ to $\mathbb{R}$ bounded by one. Then
$$2\,\alpha_{\infty,4}(k) = \sup_{i_4 \geqslant i_3 \geqslant i_2 \geqslant i_1 \geqslant k} \; \sup_{f \in B_1} \big\| E\big(f(Y_{i_1}, Y_{i_2}, Y_{i_3}, Y_{i_4}) \mid \mathcal{F}_0\big) - E\big(f(Y_{i_1}, Y_{i_2}, Y_{i_3}, Y_{i_4})\big) \big\|_1 .$$
Hence, an application of Theorem 2.1(b) provides the following result.
Corollary 3.1.- Let $(Y_k)_{k \in \mathbb{Z}}$ be a stationary sequence of random variables with values in a Polish space and such that $\sum_{k \geqslant 1} k^2 \alpha_{\infty,4}(k) < \infty$. Let $f$ be a bounded measurable numerical function and set $X_k = f(Y_k) - E(f(Y_k))$. Then the conclusion of Theorem 2.1(b) holds.

As mentioned in the introduction, this result applies to the class of irreducible, aperiodic and positively recurrent Markov chains $(\xi_n)$ with an atom denoted by $A$, under the condition $E_A(\tau_A^4) < \infty$. Here $\tau_A$ is the first return time to $A$ and $E_A$ stands for the expectation under $P_x$ for $x \in A$.

α-dependent sequences and τ -mixing sequences
We start by recalling the definition of the $\alpha$-dependence coefficients as considered in [DGM10]. From the resulting Corollary 3.3, we can derive rates in the CLT for the partial sums associated with BV observables of the LSV map. More precisely, for $\gamma \in \,]0, 1[$, let $T_\gamma$ be defined from $[0,1]$ to $[0,1]$ by
$$T_\gamma(x) = \begin{cases} x (1 + 2^\gamma x^\gamma) & \text{if } x \in [0, 1/2[ \, , \\ 2x - 1 & \text{if } x \in [1/2, 1] \, . \end{cases}$$
This is the so-called LSV [LSV99] map with parameter $\gamma$. Recall that there exists a unique $T_\gamma$-invariant probability measure $\nu_\gamma$ on $[0,1]$, which is absolutely continuous with respect to the Lebesgue measure with positive density denoted by $h_\gamma$. From Corollary 3.3 above and [DGM10, Prop. 1.17], we derive that $W_2(P_{S_n/\sqrt{n}}, G_{\sigma^2}) \ll n^{-1/2}$ for any $\gamma < 1/4$, where $f$ is a bounded variation function and $S_n = \sum_{k=1}^n \big(f \circ T_\gamma^k - \nu_\gamma(f)\big)$.
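As an illustration of this example, the following sketch (ours, not from the paper; the BV observable, the burn-in length and the Lebesgue initial draws are arbitrary choices, and starting from the Lebesgue measure rather than from $\nu_\gamma$ only affects the discarded transient) simulates the LSV map for $\gamma < 1/4$ and estimates the variance of the normalized Birkhoff sums.

```python
import numpy as np

def lsv(x, gamma):
    """One iteration of the LSV map T_gamma on [0, 1]."""
    return np.where(x < 0.5, x * (1.0 + (2.0 * x) ** gamma), 2.0 * x - 1.0)

rng = np.random.default_rng(3)
gamma, n, reps, burn = 0.2, 2000, 5000, 1000  # gamma < 1/4

x = rng.uniform(size=reps)
for _ in range(burn):        # let the orbits approach the invariant law
    x = lsv(x, gamma)

f = lambda t: np.where(t < 0.5, 1.0, -1.0)  # a bounded variation observable
S = np.zeros(reps)
for _ in range(n):
    S += f(x)
    x = lsv(x, gamma)

s_n = (S - S.mean()) / np.sqrt(n)  # crude centering by the ensemble mean
print(s_n.var())                   # Monte Carlo estimate of sigma^2
```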
We now apply Theorem 2.1 to functions of $\tau$-dependent sequences. Before stating the result, some definitions are needed. Examples of $\tau_\eta$-dependent sequences are given in [DP05]. Let $(Y_k)_{k \in \mathbb{Z}}$ be a stationary sequence of real-valued random variables, let $f$ be a bounded and $\eta$-Hölder function, and set $X_k = f(Y_k) - E(f(Y_k))$. Then, for any positive integers $p$, $q$ and $k$, $\theta_{X,p,q}(k) \leqslant C \tau_{\eta,p,Y}(k)$, where $C$ is a positive constant depending only on $p$, $q$ and $\|f\|_\infty$. Hence the following result holds.
Corollary 3.5.- Let $f$ be a bounded and $\eta$-Hölder function with $\eta \in \,]0, 1]$, and let $X_k = f(Y_k) - E(f(Y_k))$, where $(Y_k)_{k \in \mathbb{Z}}$ is a stationary sequence such that $\sum_{k \geqslant 1} k^2 \tau_{\eta,4,Y}(k) < \infty$. Then the conclusions of Theorem 2.1 hold.

From this result, we can derive rates in the CLT for the partial sums associated with Hölder observables of the LSV map above. Starting from Corollary 3.5 and taking into account [DM15, Prop. 5.3 and Inequality (4.2)], we derive that if $\gamma < 1/4$, then $W_2(P_{S_n/\sqrt{n}}, G_{\sigma^2}) \ll n^{-1/2}$. We now define another class of functions which are well adapted to $\tau$-dependence.
Definition 3.6.- Let $c$ be any concave function from $\mathbb{R}^+$ to $\mathbb{R}^+$ with $c(0) = 0$, and let $L_c$ be the set of functions $g$ from $\mathbb{R}$ to $\mathbb{R}$ such that $|g(x) - g(y)| \leqslant c(|x - y|)$ for all $x, y$.

Let $(Y_k)_{k \in \mathbb{Z}}$ be a stationary sequence of bounded real-valued random variables, let $g \in L_c$, and set $X_k = g(Y_k) - E(g(Y_k))$. Then, for any positive integers $\ell$ and $k$, $\tau_{1,\ell,X}(k) \leqslant K c(\tau_{1,\ell,Y}(k))$. As a consequence of Corollary 3.5, the corresponding rate also holds for such observables of a map $T$ from $[0,1]$ to $[0,1]$ that can be modelled by a Young tower with exponential tails of the return times, $\nu$ being the usual invariant measure (see [DM15, Section 4], adapted to the case of exponential tails of the return times).

Symmetric random walk on the circle
Let $K$ be the Markov kernel defined by $Kf(x) = \frac{1}{2}\big(f(x + a) + f(x - a)\big)$ on the torus $\mathbb{R}/\mathbb{Z}$, with $a$ irrational in $[0, 1]$. The Lebesgue-Haar measure $m$ is the unique probability measure which is invariant by $K$. Let $(\xi_i)_{i \in \mathbb{Z}}$ be the stationary Markov chain with transition kernel $K$ and invariant distribution $m$.
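A minimal simulation of this chain (our sketch, not from the paper; the observable $f$, a mean-zero trigonometric polynomial, and the quadratic irrational $a$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
a = (np.sqrt(5.0) - 1.0) / 2.0  # a quadratic irrational
n, reps = 5000, 2000

f = lambda x: np.cos(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)  # m(f) = 0

xi = rng.uniform(size=reps)  # stationary start: xi_0 ~ m (Lebesgue-Haar)
S = np.zeros(reps)
for _ in range(n):
    S += f(xi)
    xi = (xi + rng.choice([-a, a], size=reps)) % 1.0  # xi_{k+1} = xi_k +- a mod 1

print(S.var() / n)  # estimate of sigma^2 in the CLT with normalization sqrt(n)
```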
This example has been considered by Derriennic and Lin [DL01], who showed that the central limit theorem holds with the normalization $\sqrt{n}$ as soon as
$$\sum_{k \in \mathbb{Z}^*} \frac{|\hat{f}(k)|^2}{d^2(ka, \mathbb{Z})} < \infty ,$$
where the $\hat{f}(k)$ are the Fourier coefficients of $f$ and $d(ka, \mathbb{Z}) = \min_{i \in \mathbb{Z}} |ka - i|$. The aim of this section is to give additional conditions on $f$ and on the properties of the irrational number $a$ ensuring rates of convergence in the CLT. Let us then introduce the following definition: $a$ is said to be badly approximable in the weak sense by rationals if, for any positive $\varepsilon$, the inequality
$$d(ka, \mathbb{Z}) < |k|^{-1-\varepsilon} \tag{3.3}$$
has only finitely many solutions for $k \in \mathbb{Z}^*$.
By Roth's theorem, the algebraic irrational numbers are badly approximable in the weak sense (cf. Schmidt [Sch80]). Note also that the set of badly approximable numbers in $[0, 1]$ has Lebesgue measure 1.
An application of Theorem 2.1, together with [DR08, Lemma 5.2] and their inequality (5.18), gives the following corollary.
Corollary 3.8.- Let $X_k$ be defined by (3.1). Suppose that the irrational number $a$ satisfies (3.3), and assume a summability condition, involving some positive $\varepsilon$, on the Fourier coefficients of $f$.

Proof of Theorem 2.1
The proof is based on the Lindeberg method, which naturally extends to the dependent case. Let us start by giving an overview of the proof in a simplified framework.

Outline of the proof in a simplified framework
Assume in this subsection that $(X_k)_{k \in \mathbb{Z}}$ is a stationary sequence of bounded random variables, adapted to a filtration $(\mathcal{F}_k)_{k \in \mathbb{Z}}$, whose conditional moments up to order 4 are constant: $E(X_k \mid \mathcal{F}_{k-1}) = 0$, $E(X_k^2 \mid \mathcal{F}_{k-1}) = \sigma^2$, $E(X_k^3 \mid \mathcal{F}_{k-1}) = 0$ and $E(X_k^4 \mid \mathcal{F}_{k-1})$ is almost surely constant. Let $(Y_k)_{k \in \mathbb{Z}}$ be a sequence of iid random variables with $N(0, \sigma^2)$ distribution, independent of $(X_k)_{k \in \mathbb{Z}}$, and set $T_n = Y_1 + \cdots + Y_n$. Let also $Z$ be a random variable with $N(0, \sigma^2)$ distribution, independent of everything else. We first note that, by the triangle inequality,
$$W_2(P_{S_n}, G_{n\sigma^2}) \leqslant W_2(P_{S_n + Z}, P_{T_n + Z}) + 2\sigma .$$
It remains to prove that $W_2(P_{S_n + Z}, P_{T_n + Z}) = O(1)$.
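This will be done with the Lindeberg exchange identity (a standard display, reconstructed by us in the notation of this subsection): setting $U_0 = 0$, $U_k = X_1 + \cdots + X_k$, $V_k = Y_k + \cdots + Y_n$ and $V_{n+1} = 0$, for any smooth test function $f$,
$$E f(S_n + Z) - E f(T_n + Z) = \sum_{k=1}^{n} \Big\{ E f\big(U_{k-1} + X_k + V_{k+1} + Z\big) - E f\big(U_{k-1} + Y_k + V_{k+1} + Z\big) \Big\} ,$$
each summand being expanded by a Taylor formula around $U_{k-1} + V_{k+1} + Z$; the prescribed conditional moments make the terms of order at most 3 vanish, leaving fourth-order terms and a remainder.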
By [Rio09, Theorem 3.1], if $\mu$ and $\nu$ are two probability laws on the real line, then $W_2^2(\mu, \nu) \ll \zeta_2(\mu, \nu)$, where $\zeta_2$ denotes the Zolotarev ideal distance of order 2. To control $\zeta_2(P_{S_n+Z}, P_{T_n+Z})$ we apply the Lindeberg method. By independence between the sequences, the successive derivatives of the smoothed functions can be bounded for $\ell \geqslant 2$ (see Item (1) of the next Lemma 4.4). By Taylor's expansion at order 5, the exchange of $X_k$ for $Y_k$ produces moment terms of order at most 4, plus a remainder term. On the other hand, for any positive integer $\ell \leqslant 3$, we deduce from the assumptions on the conditional moments that the terms of order $\ell$ cancel. Clearly, if we only use the crude uniform bound on the fourth-order terms, we will get a bound of order $O(\log n)$ for $\zeta_2(P_{S_n+Z}, P_{T_n+Z})$. To get the bound $O(1)$, an additional trick is needed; this trick is the content of Item (2) of the next Lemma 4.4. Indeed, by the assumptions on the conditional moments, we get that $\theta_{X,3,4}(k) = 0$ for $k \geqslant 1$. Therefore Item (2) of Lemma 4.4 entails that there exists a constant $c > 0$ bounding the sum of the fourth-order terms. This ends the proof of the theorem in this simplified framework of constant conditional moments up to order 4, with $E(X_k^3 \mid \mathcal{F}_{k-1}) = 0$. Clearly, in the more general framework of Theorem 2.1, much work remains to be done. Indeed, $(X_k)_{k \in \mathbb{Z}}$ does not necessarily form a sequence of martingale differences, and we do not assume that $E(X_k^3) = 0$. To solve the latter problem, we will introduce another sequence of random variables which, in the context of independent random variables, would have the same first three moments as the initial random variables.

4.1.2. Detailed proof in the general setting of Theorem 2.1. Assume first that $\sigma^2 = 0$. In this case $G_{\sigma^2} = \delta_0$ and $E\big(W_2^2(P_{S_n/\sqrt{n} \mid \mathcal{F}_0}, \delta_0)\big) = n^{-1} E(S_n^2)$, which, combined with (2.1), shows that the upper bounds (a) and (b) hold.
We turn now to the case $\sigma^2 > 0$. Let $\delta$ be a random variable with uniform distribution over $[0, 1]$, independent of everything else. In what follows, $(Y_k)_{k \geqslant 1}$ will be a sequence of iid random variables independent of $\mathcal{G}_\infty$. In the case of Item (a), their common law will be the normal law $N(0, \sigma^2)$, whereas in the case of Item (b) we will also have to prescribe their third moment, as described below.
Let $\beta_3$ be a fixed real number and let $Z$ be a r.v. with distribution $N(0, \sigma^2/2)$. There exists a random variable $B$, independent of $Z$, taking only two values and such that $Z + B$ is centered with variance $\sigma^2$ and third moment $\beta_3$; we refer to [DR08, Lemma 4.1] for more details. For the proof of Item (b), $\beta_3$ will be chosen as the limit of $n^{-1} E(S_n^3)$ as $n \to \infty$, which exists under the conditions of Theorem 2.1(b). Let $(Z_k)_{k \geqslant 1}$ be a sequence of independent r.v.'s distributed as $Z$, and let $(B_k)_{k \geqslant 1}$ be a sequence of independent r.v.'s distributed as $B$ and independent of $(Z_k)_{k \geqslant 1}$. Next, in the case of both items, we enlarge the underlying $\sigma$-fields into $(\mathcal{G}_k)$ in such a way that $P_{S_n/\sqrt{n} \mid \mathcal{F}_0} = P_{S_n/\sqrt{n} \mid \mathcal{G}_0}$; the theorem will then follow if one can prove that the upper bounds (a) and (b) still hold with $P_{S_n/\sqrt{n} \mid \mathcal{G}_0}$ replacing $P_{S_n/\sqrt{n} \mid \mathcal{F}_0}$. With this aim, we shall apply [MR12, Lemma A.1]. We start by introducing some notation. Let $\Lambda_2$ be the set of continuously differentiable functions $f : \mathbb{R} \to \mathbb{R}$ such that $f'$ is 1-Lipschitz, and let $\Lambda_2(E)$ be the set of functions $f : \mathbb{R} \times E \to \mathbb{R}$, measurable with respect to the $\sigma$-fields $\mathcal{L}(\mathbb{R} \times E)$ and $\mathcal{B}(\mathbb{R})$, such that $f(\cdot, w) \in \Lambda_2$ and $f(0, w) = f'(0, w) = 0$ for any $w \in E$. According to [MR12, Lemma A.1], and denoting by $N$ a $N(0, \sigma^2)$-distributed random variable independent of all the above sequences (hence independent of $(X_k, Y_k)_k$), the upper bound (a) will follow if one can prove the corresponding estimate (4.3), whereas the upper bound (b) will follow from the corresponding estimate (4.4). In what follows, to shorten the notation, we omit the subscripts in the coefficients $\theta(k)$.

Proof of Theorem 2.1 (a).- We shall apply the Lindeberg method. Let us first introduce some notation.
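Before carrying out the proof, let us record how such a two-valued $B$ can be written down explicitly. The sketch below (our reconstruction of the classical two-point construction, in the spirit of [DR08, Lemma 4.1] but not a quotation of it) builds $B$ with $E(B) = 0$, $E(B^2) = \sigma^2/2$ and $E(B^3) = \beta_3$, so that $Z + B$ is centered with variance $\sigma^2$ and third moment $\beta_3$, and checks the moments by simulation.

```python
import numpy as np

def two_point_B(sigma2, beta3, size, rng):
    """Two-valued centered B with E B^2 = sigma2/2 and E B^3 = beta3.
    B takes the values alpha > 0 and -beta < 0, with P(B = alpha) chosen
    so that E B = 0; then E B^2 = alpha*beta and E B^3 = alpha*beta*(alpha - beta)."""
    v = sigma2 / 2.0                 # target variance of B
    d = beta3 / v                    # target value of alpha - beta
    alpha = 0.5 * (d + np.sqrt(d * d + 4.0 * v))
    beta = alpha - d
    p = beta / (alpha + beta)        # P(B = alpha)
    return np.where(rng.uniform(size=size) < p, alpha, -beta)

rng = np.random.default_rng(5)
sigma2, beta3 = 2.0, 0.7
B = two_point_B(sigma2, beta3, 10**6, rng)
Z = rng.normal(0.0, np.sqrt(sigma2 / 2.0), size=B.size)
W = Z + B
print(W.mean(), W.var(), (W ** 3).mean())  # approx 0, sigma2, beta3
```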
Here and below, $\varphi_{t^2}$ denotes the density of a $N(0, t^2)$; the functions $f_{n-k}$ are obtained by smoothing $f$ with such Gaussian densities. Hence, according to [DMR09, Lemma 6.1], the bound (4.5) on the derivatives of $f_{n-k}$ holds. By the Taylor formula at order 3 and using (4.5), we get (4.6). Now we control the second-order term. By stationarity, we also get (4.8), and starting from (4.8) it follows that (4.10) holds. Starting from (4.6) and taking into account (4.7) and (4.10), we derive (4.11). To estimate $E(f'_{n-k}(S_{k-1}) X_k)$, we introduce a suitable decomposition, from which (4.13) follows. We now estimate the second-order covariance terms. From now on, we assume that $i < [\sqrt{k}]$. In order to estimate the term $E(f''_{n-k}(S_{k-i-1}) X_{k-i} X_k)$, we introduce the decomposition below, where by convention we set $S_p = 0$ if $p \leqslant 0$. For any $\ell \in \{1, \ldots, i-1\}$, by using the notation (4.9) and the stationarity, we get (4.16). As a second step, we bound the covariance terms, whence (4.17) and (4.18). Taking into account the inequalities (4.13)-(4.18) and using that $\sum_{k \geqslant 1} \theta(k) < \infty$, we get (4.19). We now handle the remaining quantity. We first note that, by stationarity, (4.20) holds. On the other hand, a direct decomposition yields (4.21). Hence (4.20) and (4.21) entail (4.22). The estimates (4.19) and (4.22) lead to (4.23). Taking into account the estimates (4.11) and (4.23), Theorem 2.1(a) follows. □

Proof of Theorem 2.1 (b).- Recall that in this case the iid random variables $(Y_k)_{k \geqslant 1}$ have their first three moments prescribed by (4.1) and (4.2).
Note that the quantity considered below is well defined, since we assume that $\sum_{j \geqslant 1} j \theta(j) < \infty$. Therefore, using that $f'(0, W) = 0$ and that $|f'(x, W) - f'(y, W)| \leqslant |x - y|$, we infer that, to prove (4.4), it is enough to show the corresponding estimate for any $f \in \Lambda_2(E)$ and any positive $n$. This will be done by using again the Lindeberg method. Let us introduce some additional notation.
All along the proof, the following lemma will be used (its proof is postponed to Appendix A and is based on the fact that the common distribution of the random variables $(Y_k)_{k \geqslant 1}$ has a Gaussian component).

Lemma 4.4.- (1) For any $i \geqslant 2$, there exists a positive constant $\kappa_1$, depending on $\sigma^2$ and $i$, such that the derivatives of order $i$ of the functions $f_{n-k}$ satisfy the stated uniform bound with constant $\kappa_1$. (2) For any $i \geqslant 2$, there exists a constant $\kappa_2 > 0$, depending on $\sigma^2$ and $i$, such that, for any integer $\ell > 0$, the stated refined bound holds.

Since the sequence $(N, (Y_k)_{k \geqslant 1})$ is independent of $(X_k)_{k \in \mathbb{Z}}$, the computations below can be carried out conditionally on it.
Next, the functions $f_{n-k}$ are $C^\infty$. Consequently, from the Taylor integral formula at order 5, and taking into account the fact that $\|X_k\|_\infty \leqslant M$ together with Item (1) of Lemma 4.4, we derive (4.27). Using Lemma 4.4(1), we first notice that (4.29) holds. Next, we develop the first four terms in the right-hand side of the decomposition (4.28) with the help of the Lindeberg method. From now on, to lighten the notation, we shall omit most of the time the index $n$ in all the quantities $\Delta$. Next, we write a further decomposition; by Lemma 4.4(1), we get (4.31).
Notation 4.7.- In the sequel, we use the quantities $m_k$ and $m_{k,i}$ defined in (4.56). Since, by Lemma 4.4(1), the corresponding terms are uniformly bounded, simple algebra together with the condition $\sum_{i \geqslant 1} i \theta(i) < \infty$ yields (4.65). Next, we shall first center the random variables $X_{k-j} (X_{k-i} X_k)^{(0)}$ appearing in the quantity $\Delta^{(1,3)}$.

Now, for any integer $j$, we introduce an additional decomposition.
Starting from (4.64) and taking into account (4.65), (4.72) and (4.73), we then obtain (4.74). In what follows, we continue the estimation of each term in the right-hand side of (4.74) and show that the sum over $k$ from 1 to $n$ of their absolute values is bounded by a constant not depending on $n$. Let us start by dealing with the quantities $\Delta^{(1,3,0)}_{k,i,2}$. With this aim, note first that, for $m_{k,j}$ defined in (4.56), a direct bound holds. Hence, by Lemma 4.4(1) and the bound on $m_k$, we get (4.75). On the other hand, by the Taylor integral formula and according to Lemma 4.4(1), we obtain (4.76). Moreover, for $m_{k,u}$ defined in (4.56), using Lemma 4.4(1), we get (4.77) and (4.78). Therefore, by Lemma 4.4(1), (4.79) holds. Taking into account (4.75), (4.76), (4.77), (4.78) and (4.79), it follows that (4.80) holds. With similar (but even simpler) arguments, we infer that the sum over $k$ from 1 to $n$ of the second and third terms in the right-hand side of (4.74) is uniformly bounded. Hence, taking into account (4.82), (4.83), (4.84) and (4.85), we derive (4.86). Next, recalling the definition (4.56) of $m_{k,u}$ and using Lemma 4.4(1), note that (4.87) holds; on the other hand, (4.88) follows directly. Taking into account (4.87) and (4.88) together with Lemma 4.4(1), it follows that (4.89) holds. Starting from (4.86) and taking into account the upper bound (4.89), we get that the sum over $k$ from 1 to $n$ of the fourth term in the right-hand side of (4.74) is uniformly bounded as a function of $n$; more precisely, (4.90) holds. Similar computations (even simpler, since we deal with the fourth derivative rather than the third one) give the corresponding upper bound for the quantities involved in the fifth and sixth terms in the right-hand side of (4.74). We deal now with the last terms in the decomposition (4.74) and show that their contribution is also uniformly bounded. Next, let $W_{k,i,j} = (X^2_{k-j} (X_{k-i} X_k)^{(0)})^{(0)}$. We start by noticing that $W_{k,i,j}$ is centered and bounded.