Convergence of the empirical spectral measure of unitary Brownian motion

Let $\{U^N_t\}_{t\ge 0}$ be a standard Brownian motion on $\mathbb{U}(N)$. For fixed $N\in\mathbb{N}$ and $t>0$, we give explicit bounds on the $L_1$-Wasserstein distance of the empirical spectral measure of $U^N_t$ to both the ensemble-averaged spectral measure and to the large-$N$ limiting measure identified by Biane. We are then able to use these bounds to control the rate of convergence of paths of the measures on compact time intervals. The proofs use tools developed by the first author to study convergence rates of the classical random matrix ensembles, as well as recent estimates for the convergence of the moments of the ensemble-average spectral distribution.


Introduction
This paper studies the convergence of the empirical spectral measure of Brownian motion on the unitary group U(N) to its large-$N$ limit. Brownian motion on large unitary groups has generated significant interest in recent years, due in part to its relationships with two-dimensional Yang–Mills theory and with free unitary Brownian motion, an object from free probability theory. As is natural in the context of random matrices, there has been particular focus on the asymptotic behavior (as $N$ tends to infinity) of the spectral measure of unitary Brownian motion; see for example [16,19,2,3,11,12,5,9,4] and the references therein.
Of course, many tools have been developed to study the spectral distributions of random matrices in high dimension in a variety of contexts. Among them is an approach developed by the first author with M. Meckes (see [15] for a survey) which allows for quantitative estimates on rates of convergence of the empirical spectral measure in a wide assortment of random matrix ensembles. This approach is based on concentration of measure and bounds for suprema of stochastic processes, in combination with more classical tools from matrix analysis, approximation theory, and Fourier analysis. In the present paper, we combine some of these techniques with recent estimates on the rates of convergence of the moments for the empirical spectral distribution of unitary Brownian motion [4] to prove asymptotically almost sure rates of convergence. We then use these bounds to control the rate of convergence of paths of the measures on compact time intervals.
Statement of results. Let U(N) denote the unitary group and $\mathfrak{u}(N)$ its Lie algebra of skew-Hermitian matrices, equipped with the scaled (real) inner product
\[ \langle U, V \rangle_N := N \operatorname{tr}(UV^*). \]
This is the unique scaling that gives meaningful limiting behavior as $N \to \infty$; see for example Remark 3.4 of [5]. The inner product on $\mathfrak{u}(N)$ induces a left-invariant Riemannian metric on U(N), and we may define Brownian motion on U(N) as the Markov diffusion $\{U^N_t\}_{t\ge 0}$ issued from the identity with generator $\frac12\Delta_N$, that is, one half the left-invariant Laplacian on U(N) with respect to this metric. One may equivalently describe $U^N_t$ as the solution to the Itô stochastic differential equation
\[ dU^N_t = U^N_t\, dW^N_t - \tfrac12 U^N_t\, dt, \qquad U^N_0 = I_N, \]
where $W^N_t$ is a standard Brownian motion on $\mathfrak{u}(N)$ (for example, take $\{\xi_k\}_{k=0}^{N^2-1}$ an orthonormal basis of $\mathfrak{u}(N)$ with respect to the given inner product and $W^N_t = \sum_{k=0}^{N^2-1} b^k_t \xi_k$, where the $b^k_t$ are independent standard Brownian motions on $\mathbb{R}$). This realization of unitary Brownian motion is computationally more useful and is mainly what will be used in the sequel. It should be noted that another standard description of unitary Brownian motion is via a stochastic differential equation with respect to a Hermitian Brownian motion, which results in a difference of a factor of $i$ in the diffusion coefficient. For $t > 0$, let $\rho^N_t = \mathrm{Law}(U^N_t)$ denote the endpoint distribution of Brownian motion; $\rho^N_t$ is called the heat kernel measure on U(N).
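As a numerical aside for the reader who wants to experiment with the description above, unitary Brownian motion can be approximated by a geometric Euler scheme: repeatedly multiply by the exponential of an independent Gaussian increment in $\mathfrak{u}(N)$, normalized to match the scaled inner product. The sketch below is ours (the function name and the choice of scheme are illustrative, not from the paper); the exponential-map update keeps the simulated matrix exactly unitary up to rounding.

```python
import numpy as np

def unitary_brownian_motion(N, t, steps, rng):
    """Simulate U^N_t by the geometric Euler scheme U_{k+1} = U_k exp(dW_k),
    where dW_k is a Gaussian increment in u(N) normalized to match the
    scaled inner product <X, Y>_N on skew-Hermitian matrices."""
    dt = t / steps
    U = np.eye(N, dtype=complex)
    for _ in range(steps):
        # GUE-normalized Hermitian matrix: entries of unit variance.
        A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
        H = (A + A.conj().T) / 2
        # exp(i * sqrt(dt/N) * H), computed via the spectral theorem.
        w, V = np.linalg.eigh(H)
        U = U @ (V * np.exp(1j * np.sqrt(dt / N) * w)) @ V.conj().T
    return U

rng = np.random.default_rng(1)
U = unitary_brownian_motion(N=20, t=1.0, steps=100, rng=rng)
unitarity_err = np.linalg.norm(U @ U.conj().T - np.eye(20))
eigenangles = np.angle(np.linalg.eigvals(U))  # support of the empirical spectral measure
```

The eigenvalue angles of the returned matrix are a sample from the empirical spectral measure $\mu^N_t$; for large $N$ their histogram should resemble the density of Biane's limiting measure $\nu_t$ discussed below.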
Our primary object of interest is the empirical spectral measure of unitary Brownian motion. A matrix $U \in \mathrm{U}(N)$ has $N$ complex eigenvalues of modulus one, which we denote by $e^{i\theta_1}, \dots, e^{i\theta_N}$ (repeated according to multiplicity), and the spectral measure of $U$ is defined to be the probability measure on the unit circle $S^1$ given by
\[ \mu_U := \frac1N \sum_{j=1}^N \delta_{e^{i\theta_j}}. \]
In particular, for $f \in C(S^1)$,
\[ \int_{S^1} f\, d\mu_U = \frac1N \sum_{j=1}^N f(e^{i\theta_j}) = \frac1N \operatorname{tr} f(U). \]
For each fixed $t > 0$, $U^N_t$ is a random unitary matrix, and we denote its empirical spectral measure by $\mu^N_t := \mu_{U^N_t}$. In [2], Biane showed that the random probability measure $\mu^N_t$ converges weakly almost surely to a deterministic probability measure, which we denote by $\nu_t$: that is, for all $f \in C(S^1)$,
\[ \lim_{N\to\infty} \int_{S^1} f\, d\mu^N_t = \int_{S^1} f\, d\nu_t \qquad \text{almost surely.} \]
The measure $\nu_t$ represents in some sense the spectral distribution of a "free unitary Brownian motion". For $t > 0$, $\nu_t$ possesses a continuous density that is symmetric about $1 \in S^1$. When $0 < t < 4$, $\nu_t$ is supported on an arc strictly contained in the circle; for $t \ge 4$, $\operatorname{supp}(\nu_t) = S^1$. The paper [4] presents a nice brief summary of these and other properties of $\nu_t$ and of the construction of free unitary Brownian motion.
In the present paper, we give estimates on the $L_1$-Wasserstein distance between the empirical spectral distribution $\mu^N_t$ and its limiting spectral measure $\nu_t$, where for probability measures $\mu$ and $\nu$ on $\mathbb{C}$, the $L_1$-Wasserstein distance is defined by
\[ W_1(\mu, \nu) := \inf\left\{ \int |x - y|\, d\pi(x, y) : \pi \text{ is a coupling of } \mu \text{ and } \nu \right\}. \]
We will also make use of the equivalent dual representation of $W_1$ due to Kantorovich and Rubinstein:
\[ W_1(\mu, \nu) = \sup\left\{ \int f\, d\mu - \int f\, d\nu : |f|_L \le 1 \right\}, \]
where $|f|_L$ denotes the Lipschitz constant of $f$. The main results of this paper are the following.
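As a concrete complement to these definitions, $W_1$ between measures on the circle can be computed exactly from cumulative distribution functions: with respect to arc-length cost (which differs from the chordal cost $|x-y|$ used above by at most a factor of $\pi/2$), one has $W_1 = \min_c \int |F - G - c|$, with the minimizing shift $c$ a median of $F - G$. A minimal sketch, with our own naming, for measures supported on a uniform angular grid:

```python
import numpy as np

def w1_circle(p, q):
    """L1-Wasserstein distance (arc-length cost) between two probability
    vectors p and q on the uniform grid theta_k = 2*pi*k/n of the circle.

    Uses W1 = min_c integral |F - G - c|, where F, G are the CDFs of p, q;
    the minimizing shift c is a median of F - G."""
    n = len(p)
    d = np.cumsum(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))
    c = np.median(d)
    return (2 * np.pi / n) * float(np.sum(np.abs(d - c)))

# Two point masses a quarter-turn apart are at distance pi/2.
delta0, delta2 = np.eye(8)[0], np.eye(8)[2]
```

For the empirical spectral measures studied in this paper, `p` and `q` would be histograms of eigenvalue angles binned to a common grid; the shift by a median is what distinguishes circular transport from transport on an interval.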
Theorem 1. Let $\mu^N_t$ denote the empirical spectral measure of $U^N_t$ as above, and let $\overline{\mu}^N_t$ denote the ensemble-averaged spectral measure of $U^N_t$, defined by
\[ \int_{S^1} f\, d\overline{\mu}^N_t := \mathbb{E} \int_{S^1} f\, d\mu^N_t \qquad \text{for } f \in C(S^1). \]
Then there is a constant $C \in (0,\infty)$ such that with probability one, for all $N \in \mathbb{N}$ sufficiently large and $t > 0$,
\[ W_1(\mu^N_t, \overline{\mu}^N_t) \le \frac{C\, t^{1/3}}{N^{2/3}}, \]
and, for all $N \in \mathbb{N}$ sufficiently large and $t \ge 8(\log N)^2$,
\[ W_1(\mu^N_t, \overline{\mu}^N_t) \le \frac{C}{N^{2/3}}. \]

Theorem 2.
Let $\nu_t$ be the limiting spectral measure for unitary Brownian motion described above. There are constants $c, C \in (0,\infty)$ such that for all $N \in \mathbb{N}$ and $t > 0$,
\[ W_1(\overline{\mu}^N_t, \nu_t) \le C \min\left\{ \frac{t^{2/5}\log N}{N^{2/5}},\; e^{-ct} + \frac1N \right\}. \]

One may infer from these bounds, together with Theorem 1, direct (a.s.) estimates on the rate of convergence of the empirical spectral distribution to its limiting distribution for all sufficiently large $N$. To the authors' knowledge, these results constitute the first known rates of convergence for $\mu^N_t$ itself; previously, the only known convergence rates were for the moments of the ensemble-averaged spectral measure $\overline{\mu}^N_t$ [4].

A key advantage of such rates is that they may be applied to obtain almost sure convergence of paths of spectral measures. The following theorem gives uniform bounds on the Wasserstein distance between the empirical spectral measures and the deterministic limiting measures on compact time intervals.

Theorem 3. Let $T \ge 0$. There are constants $c, C \in (0,\infty)$ such that for all $x \ge c\, T^{2/5}\log(N)/N^{2/5}$,
\[ \mathbb{P}\Big( \sup_{0\le t\le T} W_1(\mu^N_t, \nu_t) > x \Big) \le \frac{CT}{x^2}\, e^{-cN^2x^2/T}. \]
In particular, with probability one, for $N$ sufficiently large,
\[ \sup_{0\le t\le T} W_1(\mu^N_t, \nu_t) \le \frac{C\, T^{2/5}\log N}{N^{2/5}}. \]

As a technical tool, we also determine rates for the convergence in time of Biane's measure to the uniform distribution on $S^1$.

Proposition 4. Let $\nu_t$ denote the limiting spectral measure and $\nu$ the uniform measure on $S^1$. Then there is a constant $C \in (0,\infty)$ so that for all $t \ge 1$,
\[ W_1(\nu_t, \nu) \le C\, t^{3/2} e^{-t/4}. \]

The organization of the paper is as follows. In Section 2, we establish improved concentration estimates for heat kernel measure on U(N) via a coupling of Brownian motions on $S^1$ and SU(N). These estimates are then used in Section 3 to prove Theorem 1. In Section 4, we use Fourier and classical approximation methods, as well as the previously mentioned coupling argument, to give bounds on the rate of convergence of the ensemble-averaged spectral measure to the limiting measure $\nu_t$, as in Theorem 2. In this section, we also give the proof of Proposition 4 using similar methods.
Finally, in Section 5, we prove a tail bound on the metric radius of the unitary Brownian motion and a continuity result for the family of measures {ν t } t>0 , which are then both used to give the proof of Theorem 3.

A concentration inequality for heat kernel measure
In this section, we consider concentration of measure results for Lipschitz functions of the following form. Let $(X, d)$ be a metric space equipped with a Borel probability measure $\rho$. Then, under some conditions, there exists $C > 0$ such that, for all $r > 0$ and $F : X \to \mathbb{R}$ Lipschitz with Lipschitz constant $L$ and $\mathbb{E}|F| < \infty$,
\[ \rho\left( F \ge \int F\, d\rho + r \right) \le e^{-r^2/(2CL^2)}. \tag{1} \]
Concentration estimates of this type are standard for heat kernel measure on a Riemannian manifold with curvature bounded below. We recall here the necessary results. Let $(M, g)$ be a complete Riemannian manifold, and let $\Delta$ denote the Laplace–Beltrami operator acting on $C^\infty(M)$. We write $P_t = e^{t\Delta/2}$ to denote the heat semigroup; that is, for $t > 0$ and any sufficiently nice function $f : M \to \mathbb{R}$,
\[ P_t f(x) = \int_M f(y)\, \rho^x_t(dy), \]
where $\rho^x_t$ is the heat kernel measure based at $x$. If Ric denotes the Ricci curvature tensor on $M$, then $\mathrm{Ric} \ge 2k$ for $k \in \mathbb{R}$ implies that for all $t > 0$ the estimate (1) holds for $\rho_t$ with coefficient $C = C(t) = 2(1 - e^{-kt/2})/k$, where when $k = 0$, we interpret this to be $C(t) = t$. (A typical proof is via log-Sobolev estimates.) See for example Corollary 2.6 and Lemma 6.3 of [10] (stated in the case that $k \ge 0$, which is the only relevant case here).
For small $t$, the general machinery described above leads to a sharp concentration estimate for heat kernel measure $\rho^N_t$ on U(N). For large $t$, the estimates are no longer sharp, but we can improve them using a coupling approach inspired by one in [14]. The following lemma gives the key idea.
Lemma 5. Let $b^0$ be a standard real-valued Brownian motion and $z_t := e^{ib^0_t/N}$, and let $V_t$ be a Brownian motion on SU(N) issued from the identity, independent of $b^0$. Then $z_tV_t$ is a Brownian motion on U(N).
Proof. Set $Z_t := z_t I_N$, and note that $z_t$ and $Z_t$ satisfy the stochastic differential equations
\[ dz_t = \frac{i}{N} z_t\, db^0_t - \frac{1}{2N^2} z_t\, dt \qquad\text{and}\qquad dZ_t = Z_t\, db_t - \frac{1}{2N^2} Z_t\, dt, \]
where $b_t := \frac{ib^0_t}{N} I_N$ is a standard Brownian motion on the span of $iI_N$ in $\mathfrak{u}(N)$. Let $\{\xi_k\}_{k=1}^{N^2-1}$ be an orthonormal basis of $\mathfrak{su}(N)$ with respect to the given inner product, and let $\{b^k\}_{k=1}^{N^2-1}$ be independent real-valued Brownian motions. Then $\widetilde{W}_t := \sum_{k=1}^{N^2-1} b^k_t \xi_k$ is a standard Brownian motion on $\mathfrak{su}(N)$, and $V_t$ satisfies the stochastic differential equation
\[ dV_t = V_t \circ d\widetilde{W}_t = V_t\, d\widetilde{W}_t - \frac{N^2-1}{2N^2} V_t\, dt. \]
(Here $\circ$ denotes a Stratonovich integral, which is then expressed as an Itô integral via the usual calculus.) Since $W_t := b_t + \widetilde{W}_t$ is a standard Brownian motion on $\mathfrak{u}(N)$, and since the drift terms combine as $\frac{1}{2N^2} + \frac{N^2-1}{2N^2} = \frac12$, Itô's product rule gives
\[ d(Z_tV_t) = Z_tV_t\, dW_t - \tfrac12 Z_tV_t\, dt, \]
which implies that $z_tV_t$ is a Brownian motion on U(N). $\square$
We use this realization of the Brownian motion on U(N), along with concentration properties of the laws of $z_t$ and $V_t$, to obtain sub-Gaussian concentration on U(N) that is independent of $t$ for large $t$.
Proposition 6. Let $U_t$ be distributed according to heat kernel measure on U(N), and let $F : \mathrm{U}(N) \to \mathbb{R}$ be $L$-Lipschitz with respect to the metric induced by $\langle\cdot,\cdot\rangle_N$. Then for all $r, t > 0$,
\[ \mathbb{P}\big( F(U_t) \ge \mathbb{E}F(U_t) + r \big) \le e^{-r^2/(2tL^2)}, \]
and there is a constant $C \in (0,\infty)$ such that for all $r > 0$ and $t \ge 8(\log N)^2$,
\[ \mathbb{P}\big( |F(U_t) - \mathbb{E}F(U_t)| \ge r \big) \le C e^{-r^2/(4L^2)}. \]

Proof. To prove the first statement, observe that since the Ricci curvature on U(N) is nonnegative, the comments preceding Lemma 5 imply that the desired concentration estimate holds for $\rho^N_t$ with coefficient $C(t) = t$.

To prove the second statement, observe that the representation of $U_t$ in Lemma 5 implies that
\[ \mathbb{P}\big( |F(U_t) - \mathbb{E}F(U_t)| \ge r \big) \le \mathbb{P}\Big( |F(z_tV_t) - \mathbb{E}_{V_t}F(z_tV_t)| \ge \tfrac{r}{2} \Big) + \mathbb{P}\Big( |\mathbb{E}_{V_t}F(z_tV_t) - \mathbb{E}F(z_tV_t)| \ge \tfrac{r}{2} \Big), \tag{2} \]
where $\mathbb{E}_{V_t}$ denotes integration over $V_t$ only. For the first term, measure concentration for $V_t$ follows again from curvature considerations: following for example Proposition E.15 and Lemma F.27 of [1], one may compute that the Ricci curvature on SU(N) with respect to the given inner product is bounded below by a positive constant, uniformly in $N$. Thus, by the discussion preceding Lemma 5, Law($V_t$) on SU(N) satisfies a concentration estimate of the form (1) with coefficient bounded independently of $N$ and $t$; since for fixed $z \in S^1$ the function $F(z\,\cdot)$ is $L$-Lipschitz on SU(N), the first term of (2) is bounded by $2e^{-r^2/(4L^2)}$.

For the second term of (2), write $z_t = \omega_t e^{2\pi iK/N}$, where $K \in \{0, \dots, N-1\}$ and $\omega_t$ lies in the arc from $1$ to $e^{2\pi i/N}$, and split once more:
\[ \mathbb{P}\Big( |\mathbb{E}_{V_t}F(z_tV_t) - \mathbb{E}F(z_tV_t)| \ge \tfrac{r}{2} \Big) \le \mathbb{P}\Big( |\mathbb{E}_{V_t}F(z_tV_t) - \mathbb{E}_{z_t|K}\mathbb{E}_{V_t}F(z_tV_t)| \ge \tfrac{r}{4} \Big) + \mathbb{P}\Big( |\mathbb{E}_{z_t|K}\mathbb{E}_{V_t}F(z_tV_t) - \mathbb{E}F(z_tV_t)| \ge \tfrac{r}{4} \Big), \tag{3} \]
where $\mathbb{E}_{z_t}$ denotes integration over $z_t$ only and $\mathbb{E}_{z_t|K=k}$ denotes integration over $z_t$ conditional on $K = k$. To deal with the first term in (3), note that for $V$ fixed, $F(\cdot V)$ is an $NL$-Lipschitz function on $S^1$. So, conditioned on $K = k$, $F(z_tV)$ can only fluctuate by as much as $2\pi L$. Thus if $\frac{r}{4} > 2\pi L$, the first term is zero. For $\frac{r}{4} \le 2\pi L$, we may just use the trivial bound of 1 and choose $C$ in the statement of the proposition so that $C \ge e^{(8\pi)^2/4}$.

To deal with the second term in (3), note that we can replace $V_t$ with a Haar-distributed random matrix $V$ for $t$ sufficiently large. Indeed, assuming without loss of generality that $F(I_N) = 0$ (so that $\sup|F| \le \pi NL$, since the diameter of U(N) is of order $N$), for any fixed $z \in S^1$,
\[ |\mathbb{E}_{V_t}F(zV_t) - \mathbb{E}_V F(zV)| \le 2\pi NL\; d_{TV}\big(\mathrm{Law}(V_t), \mathrm{Haar}_{SU(N)}\big), \]
where $\mathbb{E}_V$ denotes integration with respect to Haar measure on SU(N). A sharp estimate of the time to equilibrium of $V_t$ was proved in Theorem 1.2 of [18], from which it follows (see the discussion preceding the theorem in [18], and note that the normalization here differs by a factor of 2 from the one used there) that this total variation distance decays exponentially once $t$ is a sufficiently large multiple of $\log N$. Thus if $t \ge 8(\log N)^2$, replacing $V_t$ by $V$ will only affect the constants. Consider therefore the corresponding quantity with $V$ in place of $V_t$, and write $z_t = \omega_t e^{2\pi iK/N}$ as above and similarly $z = \omega e^{2\pi iK'/N}$; we have used again that for fixed $V$, $F(\omega V)$ is an $NL$-Lipschitz function of $\omega$, and here $\omega$ lies within an arc of length $\frac{2\pi}{N}$. The estimate now follows as in the first term. $\square$

Concentration of $\mu^N_t$
Armed with the concentration inequality for heat kernel measure, the proof of Theorem 1 is an application of the program laid out in [15] for estimating the Wasserstein distance between the empirical spectral measure of a random matrix and the ensemble average, in the presence of measure concentration. Since it is relatively brief, we include the detailed argument here for completeness.
The first step is to bound the "average distance to average" $\mathbb{E}\, W_1(\mu^N_t, \overline{\mu}^N_t)$ as follows.

Proposition 7.
There is a constant $c \in (0,\infty)$ such that for all $N \in \mathbb{N}$ and $t > 0$,
\[ \mathbb{E}\, W_1(\mu^N_t, \overline{\mu}^N_t) \le \frac{c\, t^{1/3}}{N^{2/3}}, \]
and for all $N \in \mathbb{N}$ and $t \ge 8(\log N)^2$,
\[ \mathbb{E}\, W_1(\mu^N_t, \overline{\mu}^N_t) \le \frac{c}{N^{2/3}}. \]
Proof. We will give the proof of the first statement only, which applies the first half of Proposition 6; the proof of the second statement is identical, using instead the second half of Proposition 6.
Recall that
\[ W_1(\mu^N_t, \overline{\mu}^N_t) = \sup\Big\{ \int f\, d\mu^N_t - \int f\, d\overline{\mu}^N_t : |f|_L \le 1 \Big\}; \]
that is, our task is to estimate the expected supremum of the centered stochastic process $\{X_f\}_{|f|_L\le 1}$, with
\[ X_f := \int f\, d\mu^N_t - \mathbb{E}\int f\, d\mu^N_t. \]
Note that without loss of generality we may choose the indexing set to be the 1-Lipschitz functions on the circle with $f(1) = 0$; write $\mathrm{Lip}_0(1)$ for the set of all such functions. Now, if $f$ is a fixed Lipschitz function and $\mu_U$ denotes the spectral measure of $U$, then $U \mapsto \int f\, d\mu_U$ is a $\frac{|f|_L}{N}$-Lipschitz function of $U$ (see Lemma 2.3 of [13], and note the different normalization of the metric on matrices), and so by Proposition 6,
\[ \mathbb{P}\big( |X_f - X_g| \ge x \big) = \mathbb{P}\big( |X_{f-g}| \ge x \big) \le 2 e^{-N^2x^2/(2t|f-g|_L^2)}. \]
That is, the stochastic process $\{X_f\}_{f\in\mathrm{Lip}_0(1)}$ satisfies a sub-Gaussian increment condition. Now, if $\{X_v\}_{\|v\|\le 1}$ is a centered stochastic process indexed by the unit ball of a $d$-dimensional normed space $V$, and $\{X_v\}$ satisfies the increment condition
\[ \mathbb{P}\big( |X_u - X_v| \ge x \big) \le 2 e^{-x^2/(K^2\|u-v\|^2)} \]
for each $x > 0$, then it is a consequence of Dudley's entropy bound (see [15] for a detailed proof) that
\[ \mathbb{E} \sup_{\|v\|\le 1} X_v \le CK\sqrt{d}. \tag{6} \]
The index set $\mathrm{Lip}_0(1)$ is the unit ball of an infinite-dimensional normed space, but Lipschitz test functions may be approximated by piecewise linear functions coming from a finite-dimensional space. Specifically, for $m \in \mathbb{N}$, let $A^{(m)}_0$ denote the set of functions in $\mathrm{Lip}_0(1)$ which are piecewise linear (as functions of the angle) on each of the $m$ arcs determined by the $m$-th roots of unity. For any $f \in \mathrm{Lip}_0(1)$, the piecewise linear interpolation of $f$ at the $m$-th roots of unity lies in $A^{(m)}_0$ and is uniformly within $\frac{2\pi}{m}$ of $f$, and so
\[ \mathbb{E}\sup_{f\in\mathrm{Lip}_0(1)} X_f \le \mathbb{E}\sup_{g\in A^{(m)}_0} X_g + \frac{4\pi}{m}. \]
The space of functions for which $A^{(m)}_0$ is the unit ball is $(m-1)$-dimensional, and so it follows from (6) that
\[ \mathbb{E}\, W_1(\mu^N_t, \overline{\mu}^N_t) \le C\left( \frac{\sqrt{tm}}{N} + \frac1m \right); \]
choosing $m = \big\lceil (N^2/t)^{1/3} \big\rceil$ completes the proof. $\square$
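The piecewise-linear approximation step above is elementary to check numerically: a 1-Lipschitz function on the circle is uniformly within $O(1/m)$ of its piecewise linear interpolant on $m$ equally spaced nodes. A small sketch (grid size, naming, and test function are our choices, and equally spaced nodes stand in for the roots of unity):

```python
import numpy as np

def pw_linear_error(f, m, grid=20000):
    """Sup-norm distance between f on [0, 2*pi] and its piecewise linear
    interpolant on m equal subintervals (m+1 nodes, endpoints included)."""
    theta = np.linspace(0.0, 2 * np.pi, grid)
    nodes = np.linspace(0.0, 2 * np.pi, m + 1)
    interp = np.interp(theta, nodes, f(nodes))
    return float(np.max(np.abs(interp - f(theta))))

f = lambda th: np.abs(np.sin(th))  # 1-Lipschitz and periodic
errs = [pw_linear_error(f, m) for m in (8, 16, 32)]
```

For a 1-Lipschitz function the interpolation error on a subinterval of length $h = 2\pi/m$ is at most $h$; here the observed errors decay roughly like $1/m^2$, since $|\sin|$ is smooth away from its two kinks, which fall on nodes for even $m$.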
The proof of Theorem 1 is completed via the concentration of $W_1(\mu^N_t, \overline{\mu}^N_t)$ about its mean, as follows.
Proposition 8. For all $t > 0$, $N \in \mathbb{N}$, and $x > 0$,
\[ \mathbb{P}\big( W_1(\mu^N_t, \overline{\mu}^N_t) \ge \mathbb{E}W_1(\mu^N_t, \overline{\mu}^N_t) + x \big) \le e^{-N^2x^2/(2t)}, \]
and there exists $C \in (0,\infty)$ such that for all $t \ge 8(\log N)^2$, $N \in \mathbb{N}$, and $x > 0$,
\[ \mathbb{P}\big( W_1(\mu^N_t, \overline{\mu}^N_t) \ge \mathbb{E}W_1(\mu^N_t, \overline{\mu}^N_t) + x \big) \le C e^{-N^2x^2/4}. \]
Again, we prove only the first statement; the proof of the second is analogous.
Consider the mapping $F : \mathrm{U}(N) \to \mathbb{R}$ given by
\[ F(U) := W_1(\mu_U, \overline{\mu}^N_t), \]
where $\mu_U$ is the spectral measure of $U$ and $\overline{\mu}^N_t$ is the ensemble-averaged empirical spectral measure of $U^N_t$ as before. The function $F$ is a $\frac1N$-Lipschitz function of $U$ (again, see Lemma 2.3 of [13]), and so by Proposition 6, for all $t > 0$ and all $x > 0$,
\[ \mathbb{P}\big( F(U^N_t) \ge \mathbb{E}F(U^N_t) + x \big) \le e^{-N^2x^2/(2t)}. \qquad\square \]

From the tail estimate of Proposition 8 together with Proposition 7, it follows that for any $t, x > 0$,
\[ \mathbb{P}\Big( W_1(\mu^N_t, \overline{\mu}^N_t) \ge \frac{c\, t^{1/3}}{N^{2/3}} + x \Big) \le e^{-N^2x^2/(2t)}. \]
In particular, an application of the Borel–Cantelli lemma with $x_N = c\,(t/N^2)^{1/3}$ completes the proof of the first statement of Theorem 1. The second statement follows in the same way.

Convergence to $\nu_t$
The previous section established a bound on the distance between the (random) spectral measure $\mu^N_t$ and the ensemble average $\overline{\mu}^N_t$. The picture is completed by obtaining a rate of convergence of $\overline{\mu}^N_t$ to the limiting measure $\nu_t$. The following is relevant for moderate $t$.

Theorem 9. There is a constant $C \in (0,\infty)$ such that for all $N \in \mathbb{N}$ and $t > 0$,
\[ W_1(\overline{\mu}^N_t, \nu_t) \le \frac{C\, t^{2/5} \log N}{N^{2/5}}. \]

The proof is via Fourier analysis and classical approximation theory, following the approach of Theorem 2.1 in [13]. The key ingredient of this proof is the moment estimate (7) below, which was proved in [4]. Let
\[ \widehat{\nu}_{t,N}(k) := \frac1N\, \mathbb{E}\operatorname{tr}\big[(U^N_t)^k\big] = \int_{S^1} z^k\, d\overline{\mu}^N_t(z), \]
where $U^N_t$ is a Brownian motion on U(N), and write $\widehat{\nu}_t(k) := \int_{S^1} z^k\, d\nu_t(z)$; then for $f$ with Fourier expansion $\sum_k \widehat{f}(k)z^k$,
\[ \int f\, d\overline{\mu}^N_t - \int f\, d\nu_t = \sum_{k \ne 0} \widehat{f}(k)\big( \widehat{\nu}_{t,N}(k) - \widehat{\nu}_t(k) \big). \]
Given $f : S^1 \to \mathbb{R}$ a 1-Lipschitz function, it is known that $|\widehat{f}(k)| \le \frac{C}{k}$ for $k \ge 1$ (see, for example, Theorem 4.6 of [8]). Now, by Theorem 1.3 of [4], for $t$ and $k$ fixed,
\[ \big| \widehat{\nu}_{t,N}(k) - \widehat{\nu}_t(k) \big| \le \frac{t^2k^4}{N^2}. \tag{7} \]
Thus, writing $S_mf := \sum_{|k|<m} \widehat{f}(k)z^k$ for the $m$-th partial sum of the Fourier series of $f$,
\[ \Big| \int S_mf\, d\overline{\mu}^N_t - \int S_mf\, d\nu_t \Big| \le \sum_{0<|k|<m} |\widehat{f}(k)|\, \frac{t^2k^4}{N^2} \le \frac{C\, t^2m^4}{N^2}. \]
The proof now proceeds exactly as in Theorem 2.1 of [13]. A theorem of Lebesgue implies that
\[ \|f - S_mf\|_\infty \le C (\log m) \inf_g \|f - g\|_\infty, \]
where the infimum is over all trigonometric polynomials $g(z) = \sum_{|k|<m} a_k z^k$; see for example Theorem 2.2 of [17]. Combining this with Jackson's theorem (Theorem 1.4 of the same reference) implies that $\|f - S_mf\|_\infty \le C' \frac{\log m}{m}$, and thus
\[ W_1(\overline{\mu}^N_t, \nu_t) \le C\left( \frac{t^2m^4}{N^2} + \frac{\log m}{m} \right). \]
Choosing $m = \big\lceil (N/t)^{2/5} \big\rceil$ then gives the stated bound. $\square$
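The rate in the Lebesgue–Jackson step above is easy to observe numerically: for a Lipschitz function on the circle, the sup-norm error of the $m$-th partial Fourier sum decays like $(\log m)/m$, and faster for functions with extra smoothness. A short FFT-based sketch, with our own choices of grid and test function:

```python
import numpy as np

def partial_sum_error(f_vals, m):
    """Sup-norm distance between a function sampled on a uniform grid of
    the circle and its partial Fourier sum S_m (frequencies |k| <= m)."""
    n = len(f_vals)
    coeffs = np.fft.fft(f_vals)
    mask = np.zeros(n)
    mask[: m + 1] = 1.0   # keep frequencies 0, 1, ..., m
    mask[-m:] = 1.0       # keep frequencies -m, ..., -1
    s_m = np.fft.ifft(coeffs * mask).real
    return float(np.max(np.abs(s_m - f_vals)))

theta = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
f_vals = np.abs(np.sin(theta))  # 1-Lipschitz test function
errs = [partial_sum_error(f_vals, m) for m in (4, 16, 64)]
```

Here $|\sin|$ has Fourier coefficients of order $1/k^2$, so the tail sums, and hence the sup-norm errors, shrink roughly like $1/m$ as the cutoff grows.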
The bound above decays if and only if $t = o(N/(\log N)^{5/2})$. But for sufficiently large $t$, both $\overline{\mu}^N_t$ and $\nu_t$ are close to the uniform measure on the circle; this is not reflected in the bound above, which worsens as $t$ grows. The following propositions treat the large-$t$ case by appealing to convergence to stationarity.

Proposition 10. Let $\overline{\mu}^N_t$ denote the ensemble-averaged spectral measure of a random matrix $U_t$ distributed according to heat kernel measure on U(N), and let $\nu$ denote the uniform probability measure on $S^1$. There are constants $C, c \in (0,\infty)$ so that for all $N \in \mathbb{N}$ and $t > 0$,
\[ W_1(\overline{\mu}^N_t, \nu) \le C\Big( e^{-ct} + \frac1N \Big). \]

Proof. First recall, as in the proof of Proposition 7, that if $\mu_U$ denotes the spectral measure of $U$, then for fixed $f : S^1 \to \mathbb{R}$ with $|f|_L \le 1$, the function $U \mapsto \int f\, d\mu_U$ is $\frac1N$-Lipschitz. Since $\nu$ is the ensemble-averaged spectral measure of a Haar-distributed random unitary matrix $U$ on U(N), this means that
\[ \Big| \int f\, d\mu_{U_t} - \int f\, d\mu_U \Big| \le \frac1N \|U_t - U\|_N, \]
where $\|\cdot\|_N$ is the norm induced by the scaled inner product $\langle\cdot,\cdot\rangle_N$, and this holds for any coupling $(U_t, U)$ of heat kernel measure and Haar measure. Taking expectations gives
\[ \Big| \int f\, d\overline{\mu}^N_t - \int f\, d\nu \Big| \le \frac1N\, \mathbb{E}\|U_t - U\|_N. \]
Taking the supremum over $f$, and then the infimum over couplings, we have
\[ W_1(\overline{\mu}^N_t, \nu) \le \frac1N \inf \mathbb{E}\|U_t - U\|_N. \tag{8} \]
Now consider the coupling $U_t \overset{d}{=} z_tV_t$ from Lemma 5, where $z_t = e^{ib^0_t/N}$ for $b^0_t$ a standard Brownian motion on $\mathbb{R}$ and $V_t$ an independent Brownian motion on SU(N) with $V_0 = I_N$. One can similarly obtain Haar measure on the unitary group from uniform measure on an interval and Haar measure on SU(N): if $z = e^{i\theta/N}$ with $\theta$ uniform in $[0, 2\pi)$ and $V$ is independent of $\theta$ and distributed according to Haar measure on SU(N), then $zV$ is distributed according to Haar measure on U(N); see for example Lemma 16 of [14]. Moreover, by the translation invariance of Haar measure, $\theta$ could also be distributed uniformly on $[2\pi k, 2\pi(k+1))$ for any $k \in \mathbb{Z}$, or indeed be distributed according to any mixture of uniform measures on such intervals, as long as the mixing measure is independent of $V$.
Given any such $z_t$, $z$, $V_t$, and $V$, for any 1-Lipschitz function $F : \mathrm{U}(N) \to \mathbb{R}$ we have that
\[ |\mathbb{E}F(z_tV_t) - \mathbb{E}F(zV)| \le |\mathbb{E}F(z_tV_t) - \mathbb{E}F(z_tV)| + |\mathbb{E}F(z_tV) - \mathbb{E}F(zV)|. \tag{9} \]
The first term of (9) was already bounded in the course of the proof of Proposition 6:
\[ |\mathbb{E}F(z_tV_t) - \mathbb{E}F(z_tV)| \le 2\pi N\, d_{TV}\big(\mathrm{Law}(V_t), \mathrm{Haar}_{SU(N)}\big) \le CN e^{-ct}. \]
To treat the second term, we may as in the proof of Proposition 6 write $z_t = \omega_t e^{2\pi iK/N}$, with $\omega_t$ in the arc from 1 to $e^{2\pi i/N}$ and $K \in \{0, \dots, N-1\}$, and similarly $z = \omega e^{2\pi iK'/N}$. Choosing the mixture defining $\theta$ so that $K' = K$, the second term of (9) can be bounded as
\[ |\mathbb{E}F(z_tV) - \mathbb{E}F(zV)| = \big| \mathbb{E}_V \mathbb{E}_{(z_t,z)}\big[ F(\omega_t e^{2\pi iK/N}V) - F(\omega e^{2\pi iK/N}V) \big] \big| \le N\, \mathbb{E}|\omega_t - \omega| \le 2\pi, \]
where the second equality follows from the independence of $V$ from $(z, z_t)$ and Fubini's theorem, and the inequality uses the fact that, for $V$ fixed, $F(\omega V)$ is $N$-Lipschitz as a function of $\omega$, with $\omega, \omega_t$ lying in an arc of length $\frac{2\pi}{N}$. Combining this last estimate with (8), (9), (4), and (5) implies that
\[ W_1(\overline{\mu}^N_t, \nu) \le C\Big( e^{-ct} + \frac1N \Big). \qquad\square \]
Finally, we compare the limiting (large-$N$) measure $\nu_t$ to the uniform measure $\nu$. We restate and prove here Proposition 4.
Proposition 4. For $\nu_t$ and $\nu$ defined as above, there is a constant $C \in (0,\infty)$ so that for all $t \ge 1$,
\[ W_1(\nu_t, \nu) \le C\, t^{3/2} e^{-t/4}. \]
Observe in particular that $t^{3/2}e^{-t/4} \le e^{-t/8}$ for $t$ sufficiently large, and $e^{-t/8} \le \frac1N$ once $t \ge 8\log N$; Theorem 2 thus follows from Propositions 10 and 4 together with the triangle inequality.
Proof of Proposition 4. The measure $\nu_t$ is symmetric, and for $k \ge 1$ the moments of $\nu_t$ are given by
\[ \widehat{\nu}_t(k) = \int_{S^1} z^k\, d\nu_t(z) = e^{-kt/2} \sum_{j=0}^{k-1} \frac{(-t)^j}{j!}\, k^{j-1} \binom{k}{j+1}; \]
see [2]. As in the proof of Theorem 9, for a fixed 1-Lipschitz test function $f : S^1 \to \mathbb{R}$,
\[ \Big| \int f\, d\nu_t - \int f\, d\nu \Big| \le \Big| \int S_mf\, d(\nu_t - \nu) \Big| + 2\|f - S_mf\|_\infty, \]
and we have that $|\widehat{f}(k)| \le \frac{C}{k}$ for all $k \ge 1$. Then, since both $\nu_t$ and $\nu$ are probability measures on $S^1$ and $\int_{S^1} z^j\, d\nu(z) = 0$ if $j \ne 0$,
\[ \Big| \int S_mf\, d(\nu_t - \nu) \Big| \le \sum_{0<|k|<m} |\widehat{f}(k)|\, |\widehat{\nu}_t(k)|. \tag{10} \]
Estimating the terms of the sum in the moment formula, which are decreasing in absolute value for $t, k \ge 1$, one obtains by induction that
\[ |\widehat{\nu}_t(k)| \le e^{-t/2}\big( tk^2 e^{-t/2} \big)^{k-1}. \]
It now follows from (10) that
\[ \Big| \int S_mf\, d(\nu_t - \nu) \Big| \le C e^{-t/2} \sum_{k=1}^{m-1} \big( tm^2 e^{-t/2} \big)^{k-1}. \]
Choose $m = \big\lfloor \frac{1}{\sqrt{2t}} e^{t/4} \big\rfloor$, so that $tm^2e^{-t/2} \le \frac12$; then the geometric sum above is at most 2. As in the proof of Theorem 9, we have that $\|S_mf - f\|_\infty \le C' \frac{\log m}{m}$, which for the chosen value of $m$ yields
\[ \|S_mf - f\|_\infty \le C'' t^{3/2} e^{-t/4}. \]
Combining these estimates completes the proof. $\square$
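Biane's moment formula lends itself to a quick numerical sanity check; the sketch below (function name ours) confirms the standard special cases $\widehat{\nu}_t(1) = e^{-t/2}$ and $\widehat{\nu}_t(2) = e^{-t}(1-t)$, and illustrates the rapid decay of the moments in $t$ exploited above.

```python
from math import comb, exp, factorial

def biane_moment(k, t):
    """k-th moment of Biane's measure nu_t, via the explicit formula
    e^{-kt/2} * sum_{j=0}^{k-1} (-t)^j / j! * k^(j-1) * C(k, j+1)."""
    total = sum(
        (-t) ** j / factorial(j) * k ** (j - 1) * comb(k, j + 1)
        for j in range(k)
    )
    return exp(-k * t / 2) * total

m1 = biane_moment(1, 0.7)  # should equal exp(-0.35)
m2 = biane_moment(2, 0.7)  # should equal exp(-0.7) * (1 - 0.7)
```

Evaluating `biane_moment` for large `t` shows the exponential decay toward the moments of the uniform measure (all of which vanish for $k \ne 0$), in line with Proposition 4.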

Convergence of paths
This section is devoted to the proof of Theorem 3. The idea is to first discretize the interval [0, T ] and apply the bound from Proposition 8 at the discretization points, then move from approximation at this discrete set of points to approximation along an entire path via a continuity property of the family of measures {ν t } t>0 .
The following tail bound is used in both parts of the argument.
Proposition 11. Let $\{U_t\}_{t\ge 0}$ denote Brownian motion on U(N) with $U_0 = I_N$, and let $d_g$ denote the geodesic distance on U(N) induced by $\langle\cdot,\cdot\rangle_N$. Then for all $\delta, r, s > 0$,
\[ \mathbb{P}\Big( \sup_{0\le u\le \delta} d_g(U_u, I_N) \ge r + s \Big) \le N \Big( \frac{r+s}{s} \Big)^{N^2} e^{-s^2/(2\delta)}. \]
Proof. If $d_g(U, I_N) < s$, then by left invariance of the metric and the triangle inequality, $d_g(U'U, I_N) \ge d_g(U', I_N) - s$ for any $U' \in \mathrm{U}(N)$. Thus, applying the strong Markov property at the first time the path reaches distance $r + s$ from the identity,
\[ \mathbb{P}\Big( \sup_{0\le u\le\delta} d_g(U_u, I_N) \ge r + s \Big) \le \frac{\mathbb{P}\big( d_g(U_\delta, I_N) \ge r \big)}{\inf_{0\le u\le\delta} \mathbb{P}\big( d_g(U_u, I_N) \le s \big)}. \]
Applying the bound in Equation (1) controls the numerator. Then, recalling again that $\mathrm{Ric} \ge 0$ on U(N), the Bishop–Gromov comparison theorem allows us to control the volume of balls in U(N) by the volume of balls in $\mathbb{R}^{N^2}$ (see for example Theorem 3.16 of [7]); in particular,
\[ \frac{\mathrm{Vol}\big( B(I_N, r+s) \big)}{\mathrm{Vol}\big( B(I_N, s) \big)} \le \Big( \frac{r+s}{s} \Big)^{N^2}, \]
which completes the proof. $\square$
The following lemma gives the required continuity for the family of measures {ν t }.

Lemma 12.
There is a constant $c$ such that for all $0 < s < t$,
\[ W_1(\nu_s, \nu_t) \le c\sqrt{t-s}. \]
Proof. The triangle inequality for $W_1$ and Theorem 9 imply that for any $N$,
\[ W_1(\nu_s, \nu_t) \le W_1(\nu_s, \overline{\mu}^N_s) + W_1(\overline{\mu}^N_s, \overline{\mu}^N_t) + W_1(\overline{\mu}^N_t, \nu_t) \le \frac{C(t^{2/5}+s^{2/5})\log N}{N^{2/5}} + W_1(\overline{\mu}^N_s, \overline{\mu}^N_t). \]
Moreover, recall that $U \mapsto \int f\, d\mu_U$ is $\frac{|f|_L}{N}$-Lipschitz, so that
\[ W_1(\overline{\mu}^N_s, \overline{\mu}^N_t) \le \frac1N\, \mathbb{E}\|U_t - U_s\|_N. \]
Trivially, for any $U, V \in \mathrm{U}(N)$, $\|U - V\|_N \le d_g(U, V)$. So, using the stationarity of increments together with Proposition 11, applied with $\delta = t - s$ and radii $r = 2s' = c_2N\sqrt{t-s}$,
\[ \mathbb{E}\|U_t - U_s\|_N \le \mathbb{E}\, d_g(U_{t-s}, I_N) \le \frac{3c_2}{2} N\sqrt{t-s} + \pi N \cdot N\, 3^{N^2} e^{-c_2^2N^2/8}. \]
Choosing $c_2$ large enough that $\log 3 + \frac{\log N}{N^2} - \frac{c_2^2}{8} < 0$ for all $N$ (so that the second term is at most 1), this gives $\mathbb{E}\|U_t - U_s\|_N \le cN\sqrt{t-s} + 1$, and thus
\[ W_1(\nu_t, \nu_s) \le \frac{C(t^{2/5}+s^{2/5})\log N}{N^{2/5}} + c\sqrt{t-s} + \frac1N. \]
Since this holds for any N , the result follows.
Proof of Theorem 3. Let $m \in \mathbb{N}$ be such that $T/m \le 1$, and for $j = 1, \dots, m$, let $t_j := \frac{jT}{m}$. By Lemma 12, for $t \in [t_{j-1}, t_j]$,
\[ W_1(\nu_t, \nu_{t_j}) \le c\sqrt{t_j - t} \le c\sqrt{T/m}, \tag{11} \]
and for the empirical measures,
\[ W_1(\mu^N_t, \mu^N_{t_j}) \le \frac1N \|U_t - U_{t_j}\|_N = \frac1N \|U_{t_j}^*U_t - I_N\|_N \overset{d}{=} \frac1N \|\widetilde{U}_{t_j - t} - I_N\|_N, \]
where $\widetilde{U}$ is again a Brownian motion on U(N) issued from the identity; the first equality is because $U_{t_j} \in \mathrm{U}(N)$, and the second is by the stationarity of the increments of Brownian motion. It follows from this and (11) that, provided $c\sqrt{T/m} \le \frac{x}{12}$,
\[ \mathbb{P}\Big( \sup_{0\le t\le T} W_1(\mu^N_t, \nu_t) > x \Big) \le \sum_{j=1}^m \Big[ \mathbb{P}\Big( \sup_{t\in[t_{j-1},t_j]} d_g(U_t, U_{t_{j-1}}) \ge \frac{Nx}{4} \Big) + \mathbb{P}\Big( W_1(\mu^N_{t_j}, \nu_{t_j}) > \frac{x}{3} \Big) \Big]. \]
Applying Proposition 11 to the first term with $\delta = T/m$ and $2s = r = \frac{Nx}{6}$ (so that $r + s = \frac{Nx}{4}$) gives that
\[ \mathbb{P}\Big( \sup_{t\in[t_{j-1},t_j]} d_g(U_t, U_{t_{j-1}}) \ge \frac{Nx}{4} \Big) \le N\, 3^{N^2} e^{-N^2x^2m/(288T)}. \]
For the second term, applying the estimate following Proposition 8 together with Theorem 9, if $x \ge \frac{3C\, T^{2/5}\log(N)}{N^{2/5}}$, then
\[ \mathbb{P}\Big( W_1(\mu^N_{t_j}, \nu_{t_j}) > \frac{x}{3} \Big) \le e^{-cN^2x^2/T}. \]
We thus have that, for any $m \in \mathbb{N}$ such that $T/m \le 1$ and $x \ge \frac{3C\, T^{2/5}\log(N)}{N^{2/5}}$,
\[ \mathbb{P}\Big( \sup_{0\le t\le T} W_1(\mu^N_t, \nu_t) > x \Big) \le m\Big[ N\, 3^{N^2} e^{-N^2x^2m/(288T)} + e^{-cN^2x^2/T} \Big]. \]
Choosing $m$ to be a sufficiently large multiple of $\frac{T\log 3}{x^2}$, say $m = \big\lceil \frac{72\, T\log 3}{x^2} \big\rceil$ up to adjusting the constants above, the factor $3^{N^2}$ is absorbed by the first exponential, yielding the claimed tail bound; the almost sure statement then follows from the Borel–Cantelli lemma. $\square$