Kesten-McKay law for the Markoff surface mod p

For each prime $p$, we study the eigenvalues of a 3-regular graph on roughly $p^2$ vertices constructed from the Markoff surface. We show they asymptotically follow the Kesten-McKay law, which also describes the eigenvalues of a random regular graph. The proof is based on the method of moments and takes advantage of a natural group action on the Markoff surface.


Introduction
The Kesten-McKay law governs the eigenvalue distribution of a random $d$-regular graph in the limit of a growing number of vertices [Kes59, McK81]. The limiting probability density function is

$$\rho_d(x) = \frac{d\,\sqrt{4(d-1)-x^2}}{2\pi\,(d^2-x^2)}, \qquad |x| \le 2\sqrt{d-1}.$$

This spectral density comes from the Plancherel measure on the infinite $d$-regular tree, and one might expect a similar eigenvalue distribution for non-random $d$-regular graphs provided they resemble their universal cover closely enough in the sense of having few short cycles. The purpose of this article is to establish such a result for a family of 3-regular graphs constructed from the Markoff equation $x^2+y^2+z^2=xyz$ modulo large prime numbers $p\to\infty$. The vertices, roughly $p^2$ in number, are simply the solutions $(x,y,z)$ in $\mathbb{F}_p^3$ excluding $(0,0,0)$. The edges connect $(x,y,z)$ to $(x,y,xy-z)$, $(x,xz-y,z)$, and $(yz-x,y,z)$, the Markoff equation being preserved by these operations. If an edge connects a vertex to itself, then it must be counted just once in order for the graph to be 3-regular. We will write $M(\mathbb{F}_p)$ for the vertex set and $M_p$ for the graph. The eigenvalues $\{\lambda_j\}$ of the resulting graph can naturally be thought of as a measure on $[-3,3]$, namely

$$\mu_p = \frac{1}{|M(\mathbb{F}_p)|}\sum_j \delta_{\lambda_j},$$

and our main result is that the moments of this measure converge as $p\to\infty$ to those of the Kesten-McKay measure.
Theorem 1.1.- There are absolute constants $c>0$ and $C>1$ such that for $L\le c\log p$, and with an implicit constant independent of both $p$ and $L$,

$$\int x^L\,d\mu_p(x) = \int x^L\rho_3(x)\,dx + O\!\left(\frac{C^L}{p}\right).$$

Thus one can take $L$ to be a small multiple of $\log p$ and the error term $C^L/p$ will remain negligible. More precisely, we require $L<\frac{1}{16\log 2}\log p-7\approx 0.090168\log p$. Our proof of Theorem 1.1 permits $C=3\times2^{16}=196608$, which we have not optimized, but an exponential dependence on $L$ is inevitable. As we will explain heuristically at the end of the paper, we do not expect the moments to agree if $L/\log p$ is too large.
Taking linear combinations and applying Theorem 1.1 for $L$ fixed as $p\to\infty$, we obtain

Theorem 1.2.- For any fixed polynomial $f$, the eigenvalues $\lambda_j$ of the Markoff graph mod $p$ satisfy

$$\lim_{p\to\infty}\frac{1}{|M(\mathbb{F}_p)|}\sum_j f(\lambda_j) = \int f(x)\rho_3(x)\,dx.$$

Taking $L$ growing simultaneously with $p$ gives much more information than one could achieve from any fixed $L$. In particular, we deduce the following bound for the discrepancy between $\mu_p$ and $\rho_3$.

It is plausible that one could replace $1/\log p$ by $1/p$ in both of these corollaries. To use our estimates for moments, we approximate the discontinuous indicator function by polynomials, and this entails some loss.
Figure 1.1 shows the histogram of eigenvalues for the Markoff graphs constructed from $p=83$ and $p=89$, illustrating the fit to the Kesten-McKay law. For 3-regular graphs, the support is $[-2\sqrt2,2\sqrt2]$ and the distribution is bimodal, with maxima at $\pm\sqrt7$.

We begin in Section 2 with the overall strategy of comparing the Kesten-McKay moments with those of the graphs we construct. This reduces the problem to counting the fixed points of a natural group action. In Section 3, we compute the fixed points in several examples and outline a heuristic that would give a better dependence on $L$ in Theorem 1.1. In Section 4, we review the connection between the Markoff surface and $\mathrm{GL}_2(\mathbb{Z})$, which is the basis for the actual proof. In Theorem 5.1, we prove that an element has $O(p)$ fixed points with an implicit constant depending on its entries as a matrix in $\mathrm{GL}_2(\mathbb{Z})$. In Section 6, we complete the proof of Theorem 1.1 by noting that the matrix entries are exponential in the length $L$ of the word. In Section 7, we turn to the proof of Corollaries 1.3 and 1.4. In Section 8, we conclude by comparing the Kesten-McKay law to other (more difficult) questions about the graphs $M_p$, in particular their connectedness and spectral gap. We use the rest of this Introduction to summarize some of the recent interest in the Markoff equation and its solutions modulo $p$.
The original Markoff surface is defined by the cubic equation

$$x^2+y^2+z^2=3xyz \tag{1.4}$$

and its solutions in nonnegative integers $(x,y,z)$ are called Markoff triples. It differs from our normalization $x^2+y^2+z^2=xyz$ by a scaling $(x,y,z)\mapsto3(x,y,z)$, which is invertible over $\mathbb{F}_p$ for $p\ge5$. The Markoff equation is a very special case which offers a great simplification compared to other cubic surfaces. The only cubic term in equation (1.4) is $3xyz$, so upon fixing two variables, it is only a quadratic equation for the third. Exchanging the two roots of this quadratic allows us to move from one triple to another. By Vieta's rule, the two roots of a monic quadratic add up to the negative of its linear coefficient, so one such move sends $(x,y,z)$ to another Markoff triple $(x,y,3xy-z)$. There is another move for each of the variables. Markoff proved in 1880 [Mar80] that any Markoff triple except $(0,0,0)$ can be reached starting from the solution $(1,1,1)$ by a sequence of Vieta operations and transpositions. In contrast, for a general cubic surface, there is no known method for deciding whether there are integer solutions, let alone finding all of them. For instance, it remains out of reach to determine whether a given number is a sum of three (possibly negative) cubes.

The Markoff triples can be displayed as a 3-regular tree, with $(1,1,1)$ as the root and edges giving the action of the Vieta moves. Reducing this Markoff tree modulo a prime $p$ yields a finite graph with cycles, which is one connected component of the graph we study below. In principle, there may be additional solutions over $\mathbb{F}_p$ that do not come from reducing integer solutions mod $p$.
Hence it is no longer guaranteed that all solutions can be found by the Vieta moves, although in practice it seems that they can. If every solution mod $p$ lifts to a solution over the integers, then the same sequence of Vieta moves used to reach the lift will reach its image mod $p$ because the moves are polynomial operations in $(x,y,z)$. Thus the graph of solutions over $\mathbb{F}_p$ will be connected. The connectedness of these graphs for all $p$ is the question of whether strong approximation holds for equation (1.4), that is, whether solutions mod $p$ can always be lifted to integer solutions. Baragar was the first to conjecture that this connectedness does hold for all $p$ and he verified it for $p\le179$ (see [Bar91, p. 124]).
Bourgain-Gamburd-Sarnak [BGS16] proved that, for most primes $p$, there is only a single component of nonzero solutions $(x,y,z)\neq(0,0,0)$. Their method fails in case $p^2-1$ has many prime factors, which happens only for rare values of $p$. Even for these exceptional primes, the Bourgain-Gamburd-Sarnak argument shows that there is a giant component containing, for any given $\varepsilon>0$, all but $p^{\varepsilon}$ of the vertices, while any putative extra components would have size at least a power of $\log(p)$. On the quantitative level, some improvements have been made by Konyagin-Makarychev-Shparlinski-Vyugin [KMSV20, Theorems 1.3 and 1.4]. Meiri-Puder [MP18] prove that the Markoff action on the largest component is highly transitive: up to grouping solutions by sign changes as in (2.1) below, it is either the full symmetric group or its alternating subgroup. Cerbu-Gunther-Magee-Peilen [CGMP20] had proposed earlier that the alternating group arises when $p\equiv3\bmod16$, and the full symmetric group otherwise.

Method of moments
Let us define the Markoff graph over $\mathbb{F}_p$ more precisely. The vertices are the triples $(x,y,z)$ solving $x^2+y^2+z^2=xyz$, except $(0,0,0)$. The most natural graph for our purposes is defined by taking an edge between $(x,y,z)$ and each of its images $(x,y,xy-z)$, $(x,xz-y,z)$, and $(yz-x,y,z)$. We denote the graph by $M_p$ and its vertex set by $M(\mathbb{F}_p)$, with edges given by the Markoff moves $m_1(x,y,z)=(yz-x,y,z)$, $m_2(x,y,z)=(x,xz-y,z)$, and $m_3(x,y,z)=(x,y,xy-z)$. It has $p^2+3p$ vertices if $p\equiv1\bmod4$ and $p^2-3p$ vertices if $p\equiv3\bmod4$. The total number of solutions to $x^2+y^2+z^2=xyz$ mod $p$ is $p^2+3\left(\frac{-1}{p}\right)p+1$, but we consider $(0,0,0)$ separately from the other solutions because it is in an orbit of its own under the Markoff moves. See Carlitz's note [Car57, equation (2)] for this count.
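For small primes the definition above can be checked by brute force. The following Python sketch is illustrative only and is not taken from the article: the function names `legendre`, `markoff_vertices`, and `markoff_moves` are hypothetical, and enumerating all of $\mathbb{F}_p^3$ is practical only for small $p$.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def markoff_vertices(p):
    """Nonzero solutions of x^2 + y^2 + z^2 = xyz over F_p (brute force)."""
    return [
        (x, y, z)
        for x in range(p) for y in range(p) for z in range(p)
        if (x * x + y * y + z * z - x * y * z) % p == 0 and (x, y, z) != (0, 0, 0)
    ]


def markoff_moves(v, p):
    """Images of the vertex v under m_1, m_2, m_3, in that order."""
    x, y, z = v
    return [((y * z - x) % p, y, z), (x, (x * z - y) % p, z), (x, y, (x * y - z) % p)]


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        verts = set(markoff_vertices(p))
        # The three moves should map the vertex set to itself.
        closed = all(w in verts for v in verts for w in markoff_moves(v, p))
        print(p, len(verts), p * p + 3 * legendre(-1, p) * p, closed)
```

The printed vertex count can be compared with $p^2+3\left(\frac{-1}{p}\right)p$, the count quoted above with $(0,0,0)$ removed.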
At present, we have no guarantee that this graph is connected. Baragar [Bar91] conjectured that $M_p$ is connected for any prime $p$, and Bourgain-Gamburd-Sarnak proved connectedness unless $p^2-1$ has many small factors in a quantified way [BGS16]. They also prove that, even in a possibly disconnected case, there is a giant component containing at least $p^2\pm3p-O(p^{\varepsilon})$ vertices for any $\varepsilon>0$. Our Theorem 1.1 applies both to the whole graph, possibly disconnected, and also to its giant component.
The graphs we study are not simple: although $M_p$ does not contain multiple edges, there are loops at a small fraction of the vertices. On the order of $p$ vertices out of $p^2$ have loops. We discuss this further in Proposition 3.1, and the presence of loops appears again in Lemma 5.4. It has some importance for our main proofs.
The graph $M_p$ is obtained directly from the underlying symmetry of the equation $x^2+y^2+z^2=xyz$ under the Markoff moves $m_1,m_2,m_3$. Sometimes it may be preferable to take other edges reflecting further symmetries of the Markoff surface. The Markoff equation is preserved by all permutations of $(x,y,z)$ as well as the four double sign changes leaving $xyz$ invariant, namely

$$(x,y,z)\mapsto(\sigma_1x,\sigma_2y,\sigma_3z), \tag{2.1}$$

where the signs $\sigma_i=\pm1$ obey $\sigma_1\sigma_2\sigma_3=1$. One could add edges corresponding to any of these.
Or one could streamline the graph by first taking the quotient by sign changes, or using alternative generators that combine the Markoff moves with permutations.In this way, one could obtain graphs with fewer loops and a closer fit to the Kesten-McKay law.Nevertheless, the Markoff moves themselves seemed the most natural choice to us.
Let $A$ be the adjacency matrix for the Markoff graph mod $p$, that is, the matrix indexed by vertices with $A_{ij}=1$ when there is an edge between $i$ and $j$ and $A_{ij}=0$ otherwise. Note that the diagonal entries $A_{jj}$ are typically 0, but may be 1 when there is a loop connecting $j$ to $j$. Permuting the vertices changes the adjacency matrix to $\sigma A\sigma^{-1}$, where $\sigma$ is the corresponding permutation matrix. Thus the eigenvalues of $A$ do not depend on any choice of ordering. The connectedness of a graph is closely related to its eigenvalues. Indeed, for a $d$-regular graph, the number of connected components is the multiplicity of $d$ as an eigenvalue. The Kesten-McKay law is a general theorem about the distribution of eigenvalues for graphs with few short cycles, either random or deterministic. We quote the following theorem of McKay [McK81, Theorem 1.1] to emphasize the generality of the Kesten-McKay law, although we will not be able to use this version to deduce the rate of convergence in Theorem 1.1.
The combinatorial significance of the Kesten-McKay measure is that its moments count walks in a $d$-regular tree where the walks must start and return at a designated root of the tree. See [McK81] or [Kes59, p. 14] for more on the origins of $\rho_d$.
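This interpretation can be tested numerically. The sketch below, which is not part of the original article, counts closed walks at the root of the $d$-regular tree by tracking only the distance from the root, and compares the count with a midpoint-rule approximation of the moment of the density $d\sqrt{4(d-1)-x^2}/(2\pi(d^2-x^2))$. The function names and the number of quadrature steps are arbitrary choices.

```python
import math


def tree_return_walks(L, d=3):
    """Number of walks of length L in the d-regular tree that start and end at the root."""
    counts = {0: 1}  # counts[k] = number of walks ending at distance k from the root
    for _ in range(L):
        nxt = {}
        for dist, ways in counts.items():
            if dist == 0:
                # From the root, all d neighbours are at distance 1.
                nxt[1] = nxt.get(1, 0) + d * ways
            else:
                # One neighbour is closer to the root, d - 1 are farther away.
                nxt[dist - 1] = nxt.get(dist - 1, 0) + ways
                nxt[dist + 1] = nxt.get(dist + 1, 0) + (d - 1) * ways
        counts = nxt
    return counts.get(0, 0)


def kesten_mckay_moment(L, d=3, steps=100_000):
    """Midpoint-rule approximation of the L-th moment of the Kesten-McKay density."""
    r = 2 * math.sqrt(d - 1)
    h = 2 * r / steps
    total = 0.0
    for k in range(steps):
        x = -r + (k + 0.5) * h
        rho = d * math.sqrt(4 * (d - 1) - x * x) / (2 * math.pi * (d * d - x * x))
        total += x ** L * rho * h
    return total


if __name__ == "__main__":
    for L in range(0, 8, 2):
        print(L, tree_return_walks(L), round(kesten_mckay_moment(L), 4))
```

For $d=3$ the walk counts for $L=0,2,4,6$ are $1,3,15,87$, and odd moments vanish by symmetry.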
For the case of the Markoff graph mod $p$, the number of vertices is $p^2\pm3p$. Thus all we would have to show to deduce a qualitative result along the same lines as Theorem 1.1, with no explicit error term, is that the number of $k$-cycles is $o(p^2)$ for each fixed $k$. For intuition, imagine proving McKay's theorem by the method of moments. The moments are given by $\operatorname{tr}(A^L)=\sum_j\lambda_j^L$ up to normalization by $p^2\pm3p$. On the other hand, there is a combinatorial interpretation. For $L\ge1$, the trace $\operatorname{tr}(A^L)$ counts closed paths of length $L$ in the graph:

$$\operatorname{tr}(A^L)=\sum_{x\in M(\mathbb{F}_p)}\ \sum_{\text{paths } x\to x}1,$$

where the inner sum runs over paths of length $L$ from $x$ to $x$, and the outer sum runs over all vertices $x$. Changing the order of summation, we can rewrite this as

$$\operatorname{tr}(A^L)=\sum_{w}\#\{x\in M(\mathbb{F}_p) : g_wx=x\}.$$

In the summation, $w$ is a (not necessarily reduced) word of length $L$ in the Markoff moves $m_1,m_2,m_3$ and $g_w$ is the corresponding element of the free product $\mathbb{Z}/2*\mathbb{Z}/2*\mathbb{Z}/2$ with generators $m_1,m_2,m_3$. Note that if $g_w=I$ is the identity, then all of the $p^2\pm3p$ vertices are fixed points. These words therefore make a contribution of (#length $L$ paths beginning and ending at a root in a 3-regular tree)$(p^2\pm3p)$.
We divide by $p^2\pm3p$ for normalization, and the remaining path-count is exactly the corresponding Kesten-McKay moment. Our task is to show that the remaining contribution, made by words of length $L$ that do not evaluate to the identity, is of a lower order of magnitude as $p\to\infty$.
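The decomposition of the trace into a main term and a remainder can be made concrete for a small prime. The following is a rough sketch, assuming brute-force enumeration of all $3^L$ words; the helper names are hypothetical and nothing here is needed for the proof.

```python
from itertools import product


def markoff_vertices(p):
    """Nonzero solutions of x^2 + y^2 + z^2 = xyz over F_p (brute force)."""
    return [
        (x, y, z)
        for x in range(p) for y in range(p) for z in range(p)
        if (x * x + y * y + z * z - x * y * z) % p == 0 and (x, y, z) != (0, 0, 0)
    ]


def apply_word(word, v, p):
    """Apply the moves indexed by word (0, 1, 2 for m_1, m_2, m_3), first letter first."""
    x, y, z = v
    for i in word:
        if i == 0:
            x = (y * z - x) % p
        elif i == 1:
            y = (x * z - y) % p
        else:
            z = (x * y - z) % p
    return (x, y, z)


def reduces_to_identity(word):
    """True if the word is trivial in Z/2 * Z/2 * Z/2 (cancel adjacent equal letters)."""
    stack = []
    for i in word:
        if stack and stack[-1] == i:
            stack.pop()
        else:
            stack.append(i)
    return not stack


def trace_decomposition(p, L):
    """Return (main term, remainder), whose sum is the number of (word, fixed vertex) pairs."""
    verts = markoff_vertices(p)
    main = remainder = 0
    for word in product(range(3), repeat=L):
        fixed = sum(1 for v in verts if apply_word(word, v, p) == v)
        if reduces_to_identity(word):
            main += fixed
        else:
            remainder += fixed
    return main, remainder


if __name__ == "__main__":
    for L in range(1, 6):
        print(L, trace_decomposition(7, L))
```

Since each closed path corresponds to exactly one word fixing its starting vertex, the two returned numbers add up to $\operatorname{tr}(A^L)$.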

Some examples and heuristics
To argue that the identity contributes the main term, we must study the fixed points of other words in the Markoff moves. Let $w=g_1\cdots g_L$ be a reduced word of length $L$ where each $g_i$ is one of the Markoff moves $m_1,m_2,m_3$. Write the fixed point equation as

$$f(x,y,z)=x,\qquad g(x,y,z)=y,\qquad h(x,y,z)=z,\qquad x^2+y^2+z^2=xyz, \tag{3.1}$$

where $f$, $g$, and $h$ are polynomials that can be computed by successively applying the moves that make up the word $w$. One might expect this system of four equations in only three unknowns to have no solutions, but there may be redundancy. Indeed, the system always has $(0,0,0)$ as a trivial solution. The extreme case is $w=1$, for which the first three equations amount to $(x,y,z)=(x,y,z)$ and every point on the Markoff surface is fixed. For nontrivial words, we will use the special structure of the Markoff surface to show that there is at least one nontrivial constraint in addition to the equation $x^2+y^2+z^2=xyz$. First, we consider a few examples of short words.
Proposition 3.1.- (Fixed points of short words) (1) The number of fixed points of a single Markoff move $m_i$ is $p-4-\left(\frac{-1}{p}\right)$, and in particular is at most $p$.
(2) A reduced word of length 2 has no fixed points.
(3) A reduced word of length 3 either has no fixed points or else is conjugate to a single Markoff move.
Part (1) shows that the graph $M_p$ contains loops, but only at a small fraction of the vertices. This example also shows that it is possible for (3.1) to reduce to just one nontrivial constraint in addition to the Markoff equation. The fact that the words of length 1 together have only on the order of $p$ fixed points has some importance for our main proofs and we will revisit it in Lemma 5.4. Part (2) shows that there are never multiple edges joining the same pair of vertices. Part (3) shows that the graph contains no triangles.
Proof of (1).- This count is given in [CGMP20, Lemma 2.3], noting that $p-4-\left(\frac{-1}{p}\right)$ is $p-5$ when $p\equiv1\bmod4$ and $p-3$ when $p\equiv3\bmod4$. For the reader's convenience, we sketch a similar argument here. If $(x,y,z)=(x,y,xy-z)$, then the Markoff move $m_3$ connects the vertex $(x,y,z)$ to itself. Substituting $z=xy-z$, that is $z=xy/2$, into the Markoff equation gives

$$x^2+y^2=\frac{x^2y^2}{4}.$$

For each $y\in\mathbb{F}_p$, this is a quadratic equation for $x$, namely

$$x^2\left(\frac{y^2}{4}-1\right)=y^2,$$

which has no solutions if $y^2=4$, a unique solution $x=0$ in case $y=0$, and otherwise has $1+\left(\frac{y^2-4}{p}\right)$ solutions. If $y=0$, then the fixed point must be $(0,0,0)$, which is not part of our graph. Thus we remove it from the count and find that the number of solutions is

$$\sum_{y\neq0,\pm2}\left(1+\left(\frac{y^2-4}{p}\right)\right)=p-3+\sum_{y\neq0,\pm2}\left(\frac{y^2-4}{p}\right).$$

The character sum can be evaluated by factoring $y^2-4$ as $(y-2)(y+2)$, changing variables to $u=y-2$, and using $\left(\frac{u^{-1}}{p}\right)=\left(\frac{u}{p}\right)$. Note that $v=1+4/u$ assumes all values except 1 and 0 when $u$ is restricted to $u\neq-4,0$, so that

$$\sum_{y\neq0,\pm2}\left(\frac{y^2-4}{p}\right)=\sum_{u\neq0,-2,-4}\left(\frac{1+4/u}{p}\right)=\sum_{v\neq0,\pm1}\left(\frac{v}{p}\right)=-1-\left(\frac{-1}{p}\right).$$

Our count becomes $p-4-\left(\frac{-1}{p}\right)$ and the result follows. For any $(x,y)$ solving $y^2=x^2(y^2/4-1)$, taking $z=xy/2$ gives a point $(x,y,z)$ connected to itself by $m_3$. In the same way, taking $x=yz/2$ or $y=xz/2$ gives points fixed by $m_1$ or $m_2$. All told, there are $3\left(p-4-\left(\frac{-1}{p}\right)\right)$ vertices fixed by one of the generators. At each such vertex, there is a single loop. Note, as a special case of part (2), that only $(0,0,0)$ is fixed by multiple generators at once.
Proof of (2).- We stated before that the Markoff graph does not contain bigons, that is, multiple edges between the same pair of vertices, and a fixed point $x$ of $m_im_j$ with $i\neq j$ is equivalent to a bigon between $x$ and $m_jx$. It is easy to see why this does not occur. For example, if the moves $m_3$ and $m_2$ define the same edge starting from $(x,y,z)$, then

$$(x,y,xy-z)=(x,xz-y,z).$$

Equivalently, $2y=xz$ and $2z=xy$. Thus $2y=x^2y/2$, which implies that either $y=0$ or $x=\pm2$. If $y=0$, then $2z=xy=0$ forces $z=0$, and then the Markoff equation implies that $x$ is also 0. Thus this case arises only for $(x,y,z)=(0,0,0)$, which is not part of our graph. On the other hand, the cases $x=\pm2$ do not arise at all. Indeed, $2z=xy=\pm2y$ implies $z=\pm y$. Substituting this into the Markoff equation gives $4+2y^2=2y^2$, which cannot be.
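Proof of (3).- The reduced words of length 3 are either $m_2m_3m_2$ or $m_2m_3m_1$, up to permuting the variables $x,y,z$. Note that $m_2m_3m_2$ is conjugate to $m_3$ since $m_2^{-1}=m_2$, and so it has the same number of fixed points as $m_3$. For the word $m_2m_3m_1$, all four equations impose nontrivial constraints and we will see that there are no solutions. Composing from left to right, we arrive at

$$x=(xz-y)\bigl(x(xz-y)-z\bigr)-x,\qquad y=xz-y,\qquad z=x(xz-y)-z.$$

The second equation implies $y=xz/2$, and substituting this in the third gives $2z=x^2z/2$. Hence either $z=0$ or $x=\pm2$. If $z=0$, then $y=0$ and the Markoff equation forces $x=0$, which is excluded. If $x=\pm2$, then $y=\pm z$ and the Markoff equation becomes $4+2z^2=2z^2$, which cannot be.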
As an example involving a word of length 4, consider $m_2m_3m_2m_3$. The equation $f=x$ becomes vacuous because the word does not involve $m_1$. The remaining equations $g=y$ and $h=z$ can both be solved by taking $x=0$. Taking $x=0$ in the Markoff equation, we see that every solution of $y^2+z^2=0$ leads to a fixed point. If $p\equiv1\bmod4$, then $-1$ is a square mod $p$ and any point $(0,y,\sqrt{-1}\,y)$ is fixed on the Markoff surface. Thus the system (3.1) can have on the order of $p$ solutions even for a word that is not conjugate to any of the Markoff moves.
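The examples of this section are small enough to confirm by exhaustive search. The sketch below is an illustration under the assumption that brute force over $\mathbb{F}_p^3$ is affordable; the helper names are hypothetical, and letters $0,1,2$ stand for $m_1,m_2,m_3$.

```python
from itertools import permutations


def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    a %= p
    if a == 0:
        return 0
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def markoff_vertices(p):
    """Nonzero solutions of x^2 + y^2 + z^2 = xyz over F_p (brute force)."""
    return [
        (x, y, z)
        for x in range(p) for y in range(p) for z in range(p)
        if (x * x + y * y + z * z - x * y * z) % p == 0 and (x, y, z) != (0, 0, 0)
    ]


def apply_word(word, v, p):
    """Apply the moves indexed by word, first letter first."""
    x, y, z = v
    for i in word:
        if i == 0:
            x = (y * z - x) % p
        elif i == 1:
            y = (x * z - y) % p
        else:
            z = (x * y - z) % p
    return (x, y, z)


def fixed_point_count(word, p):
    """Number of vertices of the Markoff graph mod p fixed by the given word."""
    return sum(1 for v in markoff_vertices(p) if apply_word(word, v, p) == v)


if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        single = [fixed_point_count((i,), p) for i in range(3)]
        pairs = [fixed_point_count((i, j), p) for i in range(3) for j in range(3) if i != j]
        triples = [fixed_point_count(w, p) for w in permutations(range(3))]
        length4 = fixed_point_count((1, 2, 1, 2), p)  # the word m_2 m_3 m_2 m_3
        print(p, single, pairs, triples, length4)
```

For $m_2m_3m_2m_3$ the search finds exactly the points $(0,y,z)$ with $y^2+z^2=0$, so the count is $2(p-1)$ when $p\equiv1\bmod4$ and $0$ when $p\equiv3\bmod4$.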
In all of these examples, there are on the order of $p$ fixed points at the most. Now we present a heuristic suggesting why this trend should continue for longer words, so that the system (3.1) has only $O(p)$ solutions. Note first that applying a move such as $h\mapsto fg-h$ at most doubles the overall degree of the polynomials in the sense that, with respect to any of the variables $x$, $y$, or $z$,

$$\deg(fg-h)\le\max\bigl(\deg(f)+\deg(g),\ \deg(h)\bigr).$$

It could conceivably leave the degree the same if $\deg(h)\ge\deg(f)+\deg(g)$. In any case, for a word of length $L$, the final $f$, $g$, and $h$ have degree at most $2^L$ in any of the variables $x,y,z$.
Fix $z\in\mathbb{F}_p$. We expect that (3.1) has only $O(1)$ solutions for $x,y$. It might happen that two of the equations are redundant, say $g=y$ and $h=z$, as for a single Markoff move. Thus we consider only $f=x$ together with the Markoff equation itself. The latter is quadratic in $x$ and $y$, while the equation $f=x$ has degree at most $2^L$. By Bézout's theorem, there are at most $2^{L+1}$ common solutions in an algebraic closure, and perhaps even fewer in the ground field $\mathbb{F}_p$ itself. However, it might happen for some values of $z$ that the locus $f=x$ is contained entirely within $x^2+y^2+z^2=xyz$. If there were no such $z$, we could conclude that the number of fixed points is at most $2^{L+1}p$. Instead, our bound will lead to $C^Lp$ for some constant $C>2$. In particular, the number of fixed points is at most $2^{17L+10}p$.

The Fricke-Klein trace identity and its consequences
Let $P_M\in\mathbb{Z}[x,y,z]$ denote the polynomial that defines the Markoff surface:

$$P_M(x,y,z)=x^2+y^2+z^2-xyz.$$

Write $F_2$ for the free group on two generators $X,Y$. We first explain that the outer automorphism group $\mathrm{Out}(F_2)$ has a natural action on $\mathbb{C}^3$ by polynomial maps, defined over $\mathbb{Z}$, and moreover preserves the polynomial $P_M$. Although we are ultimately interested in solutions over $\mathbb{F}_p$, we use the complex numbers in this section in order to explain this action of $\mathrm{Out}(F_2)$.
By work of Fricke [Fri96] and Fricke-Klein [FK65], paraphrased in modern language, the trace map $\Phi$ sending the class of $\theta\in\mathrm{Hom}(F_2,\mathrm{SL}_2(\mathbb{C}))$ to $(\operatorname{tr}\theta(X),\operatorname{tr}\theta(Y),\operatorname{tr}\theta(XY))\in\mathbb{C}^3$ is an isomorphism of schemes, provided that the quotient $\mathrm{Hom}(F_2,\mathrm{SL}_2(\mathbb{C}))/\mathrm{SL}_2(\mathbb{C})$ is understood in the sense of geometric invariant theory. The group of automorphisms $\mathrm{Aut}(F_2)$ acts on $\mathrm{Hom}(F_2,\mathrm{SL}_2(\mathbb{C}))$ by precomposition, sending $\theta$ to $\theta\circ\alpha^{-1}$ for $\alpha\in\mathrm{Aut}(F_2)$. This gives a well-defined action of outer automorphisms $\mathrm{Out}(F_2)$ on the quotient of $\mathrm{Hom}(F_2,\mathrm{SL}_2)$ by conjugation and hence, via $\Phi$, an action on $\mathbb{C}^3$ by polynomial maps. These polynomial maps are defined over $\mathbb{Z}$, as one verifies on generators of $\mathrm{Out}(F_2)$. We give examples below, and in the process see how the Markoff moves act in this representation.
We also note that the action of $\mathrm{Out}(F_2)$ on $\mathrm{Hom}(F_2,\mathrm{SL}_2(\mathbb{C}))/\mathrm{SL}_2(\mathbb{C})$ preserves the function $\theta\mapsto\operatorname{tr}([\theta(X),\theta(Y)])$. We rely here on an important fact about $F_2$ which has no counterpart for free groups of higher rank: given a basis $X,Y$ for $F_2$, every outer automorphism preserves the conjugacy class of $XYX^{-1}Y^{-1}$ up to inversion [Nie17]. This, together with the fact that $\operatorname{tr}(A)=\operatorname{tr}(A^{-1})$ for $A\in\mathrm{SL}_2(\mathbb{C})$, implies that $\operatorname{tr}([\theta(X),\theta(Y)])$ is an invariant function for $\mathrm{Out}(F_2)$. Putting this fact together with the Fricke-Klein identity $\operatorname{tr}([\theta(X),\theta(Y)])=P_M(\Phi(\theta))-2$ implies that the polynomial action of $\mathrm{Out}(F_2)$ on $\mathbb{C}^3$ preserves the polynomial $P_M$.
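The trace identity behind this invariance, $\operatorname{tr}(ABA^{-1}B^{-1})=x^2+y^2+z^2-xyz-2$ with $x=\operatorname{tr}A$, $y=\operatorname{tr}B$, $z=\operatorname{tr}AB$, is easy to test on integer matrices of determinant 1. The following sketch uses exact integer arithmetic; the helper names and the sample matrices are arbitrary choices made for illustration.

```python
def mul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inv(A):
    """Inverse of a 2x2 matrix of determinant 1."""
    (a, b), (c, d) = A
    assert a * d - b * c == 1
    return ((d, -b), (-c, a))


def tr(A):
    """Trace of a 2x2 matrix."""
    return A[0][0] + A[1][1]


def fricke_defect(A, B):
    """tr([A, B]) - (x^2 + y^2 + z^2 - xyz - 2); zero when the identity holds."""
    x, y, z = tr(A), tr(B), tr(mul(A, B))
    commutator = mul(mul(A, B), mul(inv(A), inv(B)))
    return tr(commutator) - (x * x + y * y + z * z - x * y * z - 2)


if __name__ == "__main__":
    samples = [((2, 1), (1, 1)), ((1, 1), (0, 1)), ((0, -1), (1, 0)), ((3, 4), (-1, -1))]
    print([fricke_defect(A, B) for A in samples for B in samples])
```

In particular, the Markoff surface $x^2+y^2+z^2=xyz$ corresponds to pairs with $\operatorname{tr}([A,B])=-2$.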
Since $\mathrm{Out}(F_2)$ acts on $\mathbb{C}^3$ by polynomial maps defined over $\mathbb{Z}$, and preserves the polynomial $P_M$, also defined over $\mathbb{Z}$, we obtain by base change an action of $\mathrm{Out}(F_2)$ by polynomial maps on the Markoff surface $M(\mathbb{F}_p)$ for any prime $p$.
We now explain the relationship between $\mathrm{Out}(F_2)$ and $\mathrm{GL}_2(\mathbb{Z})$. Any automorphism of $F_2$ preserves the commutator subgroup, and in particular $\mathrm{Out}(F_2)$ acts on the abelianization $F_2/[F_2,F_2]\cong\mathbb{Z}^2$, which is a free abelian group of rank 2. This action induces a map

$$\mathrm{Out}(F_2)\to\mathrm{GL}_2(\mathbb{Z})$$

and it is a theorem of Nielsen that this map is an isomorphism (see, for instance, [Aig13, Theorem 6.24] or [Nie17] for the original article). Thus $\mathrm{GL}_2(\mathbb{Z})$ acts on the Markoff surface via the action of $\mathrm{Out}(F_2)$.
To show that the Markoff generators are induced by the $\mathrm{Out}(F_2)$ action, and find specific matrix representatives for them, we argue as follows. Given an element $\theta\in\mathrm{Hom}(F_2,\mathrm{SL}_2(\mathbb{C}))$, write $A=\theta(X)$ and $B=\theta(Y)$ in $\mathrm{SL}_2(\mathbb{C})$.
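By the Cayley-Hamilton theorem, $A$ solves its own characteristic polynomial, so

$$A^2-\operatorname{tr}(A)A+I=0.$$

Multiplying by $BA^{-1}$ and taking traces, we obtain

$$\operatorname{tr}(BA^{-1})=\operatorname{tr}(A)\operatorname{tr}(B)-\operatorname{tr}(AB).$$

Thus the third move $m_3$ arises from the element of $\mathrm{Aut}(F_2)$ that sends $X$ to $X$ and $Y$ to $Y^{-1}$, which corresponds to the matrix $\begin{pmatrix}1&0\\0&-1\end{pmatrix}$. Equally well, since we work in $\mathrm{PGL}_2$, $m_3$ could be represented by $\begin{pmatrix}-1&0\\0&1\end{pmatrix}$, which would correspond to writing the trace vector as $(\operatorname{tr}(A),\operatorname{tr}(B),\operatorname{tr}(BA^{-1}))=(\operatorname{tr}(A^{-1}),\operatorname{tr}(B),\operatorname{tr}(A^{-1}B))$ by cyclicity of trace. In the same way, we find that the first move $m_1$ arises from the element of $\mathrm{Aut}(F_2)$ that sends $(X,Y)$ to $(XY^2,Y^{-1})$. The second move arises from the element of $\mathrm{Aut}(F_2)$ that sends $X$ to $X^{-1}$ and $Y$ to $X^2Y$. In terms of $\mathrm{GL}_2(\mathbb{Z})$, the Markoff moves therefore correspond to the matrices of these automorphisms acting on the abelianization, which in additive notation send $(X,Y)$ to $(X+2Y,-Y)$ for $m_1$, to $(-X,2X+Y)$ for $m_2$, and to $(X,-Y)$ for $m_3$. In particular, the group generated by these matrices acts on the Markoff surface.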
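That the three automorphisms named above induce the Markoff moves on trace triples can be checked on explicit matrices. The sketch below is a sanity check rather than a proof, with hypothetical helper names and arbitrarily chosen sample matrices of determinant 1.

```python
def mul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inv(A):
    """Inverse of a 2x2 matrix of determinant 1."""
    (a, b), (c, d) = A
    assert a * d - b * c == 1
    return ((d, -b), (-c, a))


def trace_triple(A, B):
    """The triple (tr A, tr B, tr AB)."""
    AB = mul(A, B)
    return (A[0][0] + A[1][1], B[0][0] + B[1][1], AB[0][0] + AB[1][1])


def moves_match(A, B):
    """Check that the three automorphisms act on traces as m_1, m_2, m_3."""
    x, y, z = trace_triple(A, B)
    ok1 = trace_triple(mul(A, mul(B, B)), inv(B)) == (y * z - x, y, z)  # (X, Y) -> (X Y^2, Y^-1)
    ok2 = trace_triple(inv(A), mul(mul(A, A), B)) == (x, x * z - y, z)  # (X, Y) -> (X^-1, X^2 Y)
    ok3 = trace_triple(A, inv(B)) == (x, y, x * y - z)                  # (X, Y) -> (X, Y^-1)
    return ok1 and ok2 and ok3


if __name__ == "__main__":
    samples = [((2, 1), (1, 1)), ((1, 1), (0, 1)), ((0, -1), (1, 0)), ((3, 4), (-1, -1))]
    print(all(moves_match(A, B) for A in samples for B in samples))
```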
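One also has permutations of the three coordinates. For instance, the transposition $\tau_{23}$ acts by

$$(\operatorname{tr}(A),\operatorname{tr}(B),\operatorname{tr}(AB))\mapsto(\operatorname{tr}(A),\operatorname{tr}(AB),\operatorname{tr}(B))=(\operatorname{tr}(A),\operatorname{tr}(A^{-1}B^{-1}),\operatorname{tr}(B^{-1})),$$

so that $\tau_{23}$ is induced by the automorphism sending $X$ to $X$ and $Y$ to $X^{-1}Y^{-1}$. As before, these correspond to the Markoff moves (4.2). As an abstract group, the Markoff group $G$ generated by $m_1,m_2,m_3$ satisfies $G\cong\mathbb{Z}/2\mathbb{Z}*\mathbb{Z}/2\mathbb{Z}*\mathbb{Z}/2\mathbb{Z}$ with the $m_i$ the generators of the factors in the free product [ÈH74, Theorem 1], [CL09, Theorem 3.1]. To conclude this section, we note a property of $G$ that will be used in the sequel: every nontrivial element of finite order in $G$ is conjugate to one of the generators $m_1,m_2,m_3$.

Proof.- This follows from the fact that finite-order elements of a free product are conjugate into one of the factors, which in turn follows for example from Kurosh's theorem (see [MKS04, Corollary 4.9.1]).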

Bounds for the number of fixed points of words
The goal of this section is to prove the following Theorem 5.1. The exponent 8 on $\max(|a|,|b|,|c|,|d|)$ can likely be replaced by 1. Similarly, the assumption $\max(|a|,|b|,|c|,|d|)\le(p/128)^{1/8}$ can most likely be loosened. To keep the arguments to their simplest and most readable, and since the bound above is enough for our qualitative result, we chose not to pursue the optimal constants here.
After raising g to a small power, three natural cases arise, and we will give a different bound in each case.
Lemma 5.2.- For any element $g\in\mathrm{GL}_2(\mathbb{Z})$, there is a power $1\le K\le8$ of $g$ such that one of the following holds.
(1) All the entries of $g^K$ have absolute value at least 2.
(2) $g^K$ is a torsion element of $\mathrm{GL}_2(\mathbb{Z})$. In this case, $g$ is already torsion.
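(3) $g^K$ is one of the following types of matrices:

$$\pm\begin{pmatrix}1&n\\0&1\end{pmatrix},\qquad\pm\begin{pmatrix}1&0\\n&1\end{pmatrix},\qquad n\in\mathbb{Z}. \tag{5.1}$$

Proof.- We first show that one may take $K\le4$ in the case $\det(g)=1$. To avoid considering the case $\det(g)=-1$ separately, we replace $g$ by $g^2$ and double $K$ if necessary. Assuming $\det(g)=1$, the Cayley-Hamilton theorem implies $g^2-\operatorname{tr}(g)g+I=0$.
If $\operatorname{tr}(g)=0$, we then have $g^2=-I$ and hence $g^4=I$, so that $g$ is torsion. If $\operatorname{tr}(g)=\pm1$, then multiplying by $g$ gives $g^3=\pm g^2-g=\mp I$, so $g$ is again torsion. Suppose now that $|\operatorname{tr}(g)|\ge2$ and write $g=\begin{pmatrix}a&b\\c&d\end{pmatrix}$. If $bc=0$, then $ad=1$ and $g$ must be of the form (5.1). Otherwise, we have $|b|\ge1$, $|c|\ge1$, and $(a+d)^2\ge4$. It follows that all entries of $g^4$ are at least 2 in absolute value (moreover, at least 3). The entries of $g^2$ might not be, for instance if $a=0$.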
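The trichotomy of Lemma 5.2 can be explored on examples with a short search over $K=1,\dots,8$. This is a sketch under two assumptions that are not spelled out in the text: torsion is tested via $h^{12}=I$ (finite orders in $\mathrm{GL}_2(\mathbb{Z})$ divide 12), and the matrices of type (3) are taken to be $\pm$ upper or lower unitriangular. The labels `"large"`, `"torsion"`, and `"unipotent"` are illustrative names only.

```python
IDENTITY = ((1, 0), (0, 1))


def mat_mul(A, B):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def mat_pow(A, k):
    """k-th power of a 2x2 matrix for k >= 0."""
    result = IDENTITY
    for _ in range(k):
        result = mat_mul(result, A)
    return result


def classify(g):
    """Return (K, label) for the first K in 1..8 where g^K falls in one of the three cases."""
    for K in range(1, 9):
        h = mat_pow(g, K)
        (a, b), (c, d) = h
        if min(abs(a), abs(b), abs(c), abs(d)) >= 2:
            return K, "large"       # case (1): all entries at least 2 in absolute value
        if mat_pow(h, 12) == IDENTITY:
            return K, "torsion"     # case (2): assumed test for finite order
        if a == d and abs(a) == 1 and (b == 0 or c == 0):
            return K, "unipotent"   # case (3): assumed shape of the matrices (5.1)
    return None


if __name__ == "__main__":
    for g in [((1, 1), (0, 1)), ((0, -1), (1, 0)), ((2, 1), (1, 1)), ((0, 1), (1, 1))]:
        print(g, classify(g))
```

The last sample has determinant $-1$ and needs $K=4$, illustrating why powers beyond $g^2$ are allowed.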

Fixed points of generic elements of G
The "generic" case is when all the entries of $h$ have absolute value at least 2. In this case, we use the following bound of Cerbu-Gunther-Magee-Peilen ([CGMP20, Lemma 3.9]). We refer to [CGMP20] for the proof. The assumption that all the entries have absolute value at least 2 makes it possible to implement a rigorous version of the heuristic in Section 3.

which is either empty if $p\equiv3\bmod4$ or a pair of lines if $p\equiv1\bmod4$. Hence For the remaining values of $x$, we have $\#C(x)=p-\left(\frac{x^2-4}{p}\right)$ as we will see by an explicit parametrization. It can also be shown by direct manipulations with the Legendre symbol.
One can think of the conic sections $C(x)$ either as ellipses or hyperbolas modulo $p$ according to whether $x^2-4$ is a square. Following [BGS16], we say $x\in\mathbb{F}_p$ is hyperbolic if $x^2-4$ is a nonzero square in $\mathbb{F}_p$. We say $x\in\mathbb{F}_p$ is elliptic if $x^2-4$ is nonzero and not a square. We say $x\in\mathbb{F}_p$ is parabolic if $x^2-4=0$, i.e. $x=\pm2$. Note that the parabolic case only arises for $p\equiv1\bmod4$, and that the conic section in such a case is not a parabola but something degenerate. The behaviour of rot on $C(x)$ was described by Bourgain, Gamburd, and Sarnak in [BGS16] using this classification of values of $x$. They state their results for the surface $X^2+Y^2+Z^2=3XYZ$, although in many of the proofs they use the same normalization $x^2+y^2+z^2=xyz$ as in the present article. The two surfaces are equivalent over $\mathbb{F}_p$ for $p\ge5$ by a scaling $(X,Y,Z)=(x,y,z)/3$, and we review the corresponding parts of [BGS16] for the reader's convenience.
A convenient change of variable toward parametrizing $C(x)$ is

$$x=\xi+\xi^{-1}, \tag{5.5}$$

where $\xi\neq0$ lies in $\mathbb{F}_p$ if $x^2-4$ is a square, and otherwise in a quadratic extension. With $\kappa=x^2/(x^2-4)$ for $x^2\neq4$, the point $\left(x,\ t+\frac{\kappa}{t},\ t\xi+\frac{\kappa}{t\xi}\right)$ solves the Markoff equation for any $t\neq0$. Note that multiplying (5.5) by $\xi$ gives $\xi^2-x\xi+1=0$, and this equation simplifies the verification that $(x,\ t+\kappa t^{-1},\ t\xi+\kappa t^{-1}\xi^{-1})$ solves the Markoff equation. The action of rot is to multiply the parameter $t$ by $\xi^{-1}$. Indeed, from the definition $x=\xi+\xi^{-1}$ and $\mathrm{rot}(y,z)=(xy-z,y)$, we calculate that

$$xy-z=(\xi+\xi^{-1})(t+\kappa t^{-1})-t\xi-\kappa t^{-1}\xi^{-1}=t\xi^{-1}+\kappa(t\xi^{-1})^{-1},$$

that is, $t$ has been multiplied by $\xi^{-1}$. These considerations can be summarized in the following Lemma 5.6, due to Bourgain-Gamburd-Sarnak [BGS16].
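For hyperbolic $x$ the parametrization lives entirely in $\mathbb{F}_p$ and can be verified directly. The sketch below assumes $\kappa=x^2/(x^2-4)$, which is the value forced by substituting the parametrized point into the Markoff equation; the function name is hypothetical, and `pow(a, -1, p)` for modular inverses requires Python 3.8 or later.

```python
def check_parametrization(p):
    """Check the parametrization of C(x) and the action of rot for all xi in F_p, xi != 0, 1, -1."""
    for xi in range(2, p - 1):
        xi_inv = pow(xi, -1, p)
        x = (xi + xi_inv) % p
        # x^2 - 4 = (xi - 1/xi)^2 is nonzero because xi is not 1 or -1.
        kappa = x * x * pow(x * x - 4, -1, p) % p
        for t in range(1, p):
            y = (t + kappa * pow(t, -1, p)) % p
            z = (t * xi + kappa * pow(t * xi, -1, p)) % p
            if (x * x + y * y + z * z - x * y * z) % p != 0:
                return False
            # rot(y, z) = (xy - z, y) should correspond to replacing t by t / xi.
            t_new = t * xi_inv % p
            y_new = (t_new + kappa * pow(t_new, -1, p)) % p
            z_new = (t_new * xi + kappa * pow(t_new * xi, -1, p)) % p
            if ((x * y - z) % p, y) != (y_new, z_new):
                return False
    return True


if __name__ == "__main__":
    print([check_parametrization(p) for p in (5, 7, 11, 13)])
```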
Lemma 5.6.- (Bourgain-Gamburd-Sarnak)
• ([BGS16, Lemma 3]) Let $x$ be parabolic. If $p\equiv3\bmod4$ then $C(x)$ is empty. If $p\equiv1\bmod4$ then $C(x)$ consists of two lines. Letting $i$ be such that $i^2\equiv-1\bmod p$, the conic sections are parametrized by The action of rot is given by
• Let $x$ be hyperbolic, with $x=w+w^{-1}$ for some $w\in\mathbb{F}_p^{*}$. As a consequence, $|C(x)|=p-1$. After this identification, rot acts on $C(x)$ by multiplication by $w^{-1}$.
• Let $x$ be elliptic. As a consequence, $|C(x)|=p+1$. After this identification, rot acts on $C(x)\cong E(x)$ by multiplication by $v^{-1}$.
Proof of Proposition 5.5. By multiplying by $-I$, taking inverses, or conjugating by $\begin{pmatrix}0&1\\1&0\end{pmatrix}$, all the matrices of the proposition can be brought into the form $\begin{pmatrix}1&n\\0&1\end{pmatrix}$ where $n>0$. None of these operations change the number of fixed points of $g$ on $M(\mathbb{F}_p)$, or the bound for the number of fixed points claimed in the proposition. So it suffices to prove the proposition for $\begin{pmatrix}1&n\\0&1\end{pmatrix}$. This matrix acts on $M(\mathbb{F}_p)$ by $\mathrm{rot}^n$. We split up the fixed points of $\mathrm{rot}^n$ depending on whether they belong to $C(x)$ with $x$ parabolic, hyperbolic, or elliptic. If $x$ is parabolic, Lemma 5.6 implies that for $n<p$, $\mathrm{rot}^n$ has no fixed points on $C(x)$. For fixed hyperbolic $x$, let $x=w+w^{-1}$ as in Lemma 5.6. Lemma 5.6 implies that $\mathrm{rot}^n$ has a fixed point in $C(x)$ if and only if $w^n=1$, and this happens if and only if every element of $C(x)$ is fixed by $\mathrm{rot}^n$. The number of fixed points of $\mathrm{rot}^n$ contained in $C(x)$ with $x$ hyperbolic is therefore bounded by $n(p-1)$. When $x$ is elliptic, a similar argument using Lemma 5.6 shows that the number of fixed points of $\mathrm{rot}^n$ contained in $C(x)$ is bounded by $n(p+1)$. Therefore, when $n<p$, adding our previous bounds together, $\mathrm{rot}^n$ has at most $2pn$ fixed points on $M(\mathbb{F}_p)$. This concludes the proof.
We have used the bound that there are at most $n$ solutions to $w^n=1$, as a polynomial cannot have more roots than its degree. For many values of $n$, the only solution is $w=1$. Extra solutions arise only if $n$ and $p-1$ have a common factor. This is related to the difficulties encountered in [BGS16] when $p^2-1$ has many factors.
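The bound $2pn$ for $\mathrm{rot}^n$ is far from sharp for most $n$, as a direct count shows. The following sketch assumes that rot acts on the full surface by $(x,y,z)\mapsto(x,xy-z,y)$, extending the formula $\mathrm{rot}(y,z)=(xy-z,y)$ on each conic; the function names are hypothetical.

```python
def markoff_vertices(p):
    """Nonzero solutions of x^2 + y^2 + z^2 = xyz over F_p (brute force)."""
    return [
        (x, y, z)
        for x in range(p) for y in range(p) for z in range(p)
        if (x * x + y * y + z * z - x * y * z) % p == 0 and (x, y, z) != (0, 0, 0)
    ]


def rot_power_fixed_points(p, n):
    """Number of vertices fixed by rot^n, where rot(x, y, z) = (x, xy - z, y)."""
    count = 0
    for x, y, z in markoff_vertices(p):
        yy, zz = y, z
        for _ in range(n):
            yy, zz = (x * yy - zz) % p, yy
        if (yy, zz) == (y, z):
            count += 1
    return count


if __name__ == "__main__":
    p = 11
    for n in range(1, p):
        print(n, rot_power_fixed_points(p, n), 2 * p * n)
```

Fixed points appear only for those $n$ sharing a factor with $p-1$ or $p+1$, in line with the remark above.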

Proof of Theorem 5.1.
Consider any $g\neq1$ in the Markoff group $G\le\mathrm{PGL}_2(\mathbb{Z})$. Let $h=g^K$ where $K\le8$ is the power from Lemma 5.2. Any fixed point of $g$ is also a fixed point of its powers, so it suffices to bound the number of fixed points of $h$ on $M(\mathbb{F}_p)$. Note that if $g$ is represented by [ fixed points, and therefore so does $g$.

Proof of the Kesten-McKay Law
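In this section, we prove Theorem 1.1. Let $A$ be the adjacency matrix of the Markoff graph and $\lambda_j$ its eigenvalues. By definition of the empirical measure $\mu_p=\frac{1}{|M(\mathbb{F}_p)|}\sum_j\delta_{\lambda_j}$, we have

$$\int x^L\,d\mu_p(x)=\frac{1}{|M(\mathbb{F}_p)|}\sum_j\lambda_j^L=\frac{\operatorname{tr}(A^L)}{|M(\mathbb{F}_p)|}.$$

On the other hand, expanding the trace as in Section 2 gives

$$\operatorname{tr}(A^L)=\sum_{j_1,\dots,j_L}a_{j_1,j_2}a_{j_2,j_3}\cdots a_{j_L,j_1}.$$

The product $a_{j_1,j_2}a_{j_2,j_3}\cdots a_{j_L,j_1}$ is 0 unless there is a cycle

$$j_1\to j_2\to\cdots\to j_L\to j_1$$

where each arrow represents a Markoff move $m_1$, $m_2$, or $m_3$. In such a case the product is 1 and the vertex labeled $j_1$ is fixed by some word of length $L$. The trace is obtained by summing over all words

$$\operatorname{tr}(A^L)=\sum_{i_1,\dots,i_L}\operatorname{Fix}(m_{i_1}\cdots m_{i_L}),$$

where $\operatorname{Fix}(w)$ denotes the number of fixed points of $w$ acting on $M(\mathbb{F}_p)$, and the indices $i_1,\dots,i_L$ take the values 1, 2, 3. The words that reduce to the identity fix all of $M(\mathbb{F}_p)$ and contribute the main term:

$$|M(\mathbb{F}_p)|\cdot\#\{\text{words of length } L \text{ that reduce to the identity}\}.$$

From the combinatorial interpretation noted in Section 2, the Kesten-McKay moment $\int x^L\,d\rho_3$ is exactly this count of paths in a tree returning to the starting point. We will use Theorem 5.1 to show that the remaining words make a negligible contribution, together with the following preparations.

Proposition 6.1.- If $g\in\mathrm{GL}_2(\mathbb{Z})$ is a word of length $L$ in the matrices representing the Markoff moves $m_1,m_2,m_3$, then the entries of $g$ are at most $4^L$ in absolute value.

Proof.- The generators themselves have entries of absolute value at most 2. This confirms the base case $L=1$ (and would even allow a better exponential rate than $4^L$). For the induction step, consider a word $g=g'm$ where $g'$ is a word of length $L-1$ and $m$ is a generator. Each entry of $g$ is a sum of two products of an entry of $g'$ with an entry of $m$, and hence is at most $2\cdot2\cdot4^{L-1}=4^L$ in absolute value.

Corollary 6.2.- There is an absolute exponent $\alpha>0$ such that if $g\in\mathrm{GL}_2(\mathbb{Z})$ with $|\operatorname{tr}(g)|>2$ is a word of length $L$ in the matrices representing the Markoff moves $m_1,m_2,m_3$, then $g$ has at most $e^{\alpha L}p$ fixed points.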
Proof.- By the previous Proposition 6.1, the entries of $g$ are at most $4^L$ in absolute value. Combining this with Theorem 5.1, provided $L$ is small enough that $4^L<(p/128)^{1/8}$, we obtain a bound on the number of fixed points of $g$ for which we can take $\alpha=26\log2=18.0218\ldots$ and have the result for all $L\ge1$ obeying $4^L<(p/128)^{1/8}$. For larger $L$, note that the conclusion holds trivially once $e^{\alpha L}p>p^2+3p$, there being at most $p^2+3p$ points on the Markoff surface. If $L$ is so large that $4^L\ge(p/128)^{1/8}$, then $e^{\alpha L}p\ge(p/128)^{\alpha/(16\log2)}p$. This can be made to exceed $p^2+3p$ for all $p\ge5$ by taking $\alpha$ large enough.
There are at most $3^L$ words $m_{j_1}\cdots m_{j_L}$ of length $L$ since each index is either 1, 2, or 3. Using the previous corollary over each of these terms bounds the sum of $\operatorname{Fix}(w)$ over the remaining words, that is, those that do not reduce to the identity. Combining this with the main term from the words that do reduce to 1, we have

$$\operatorname{tr}(A^L)=|M(\mathbb{F}_p)|\int x^L\,d\rho_3+O(C^Lp),$$

where $C=3\times2^{16}=196608$ and the implicit constant could be taken as $2^{10}=1024$ independent of both $p$ and $L$. We have $|M(\mathbb{F}_p)|=p^2\pm3p$, and normalizing by $p^2$ gives

$$\int x^L\,d\mu_p=\int x^L\,d\rho_3+O\!\left(\frac{C^L}{p}\right).$$

The error term is negligible provided that $C^L/p\to0$. This allows for $L\sim c\log p$ for a sufficiently small $c>0$, namely $c<\frac{1}{\log C}=0.082041\ldots$, and one could also take, for instance, $L\sim\frac{1}{\log C}\log p-\sqrt{\log p}$.
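The numerical constants quoted in this section and in the Introduction follow from elementary arithmetic, which the short sketch below reproduces. The dictionary keys are illustrative labels, not notation from the article.

```python
import math


def paper_constants():
    """Recompute the numerical constants quoted in the text."""
    C = 3 * 2 ** 16
    return {
        "C": C,                                         # 196608
        "moment_range": 1 / math.log(C),                # admissible c in L ~ c log p
        "entry_condition": 1 / (16 * math.log(2)),      # coefficient of log p from 4^L < (p/128)^(1/8)
        "alpha": 26 * math.log(2),                      # exponent in Corollary 6.2
    }


if __name__ == "__main__":
    for name, value in paper_constants().items():
        print(name, value)
```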

Proof of Corollary 1.3
Corollary 1.3 compares the measure of an interval under the empirical distribution of eigenvalues with its measure under the limiting Kesten-McKay law, whereas Theorem 1.1 gives information about moments. A natural bridge between these is to approximate the given interval's indicator function by polynomials. If we had estimates for the Fourier transform $\sum_j e^{i\xi\lambda_j}$, then we could try to bound the discrepancy using the Erdős-Turán inequality (see [Mon94, Corollary 1.1] or the original articles [ET48a, ET48b]). However, Theorem 1.1 only allows us to take moments of the form $\sum_j \lambda_j^L$ with $L$ on the order of $\log p$. There are standard arguments to pass from moments to discrepancy, and in particular Gamburd-Jakobson-Sarnak faced the same problem in a setting very close to ours [GJS99]. What we state below as Lemma 7.1 is a summary of facts given in equations (55

Proof of Corollary 1.3. The Markoff eigenvalues lie in $[-3, 3]$, so we first rescale so that Lemma 7.1 applies. Given any subinterval $J$ of $[-3, 3]$, let
$$I = \frac{1}{K} J = \{x/K : x \in J\},$$
where $K = 3\sqrt{2}$. Let $f_m^{\pm}$ be the polynomials from Lemma 7.1 applied to $I$, where $m$ will be a small multiple of $\log p$, and let $g_m^{\pm}(x) = f_m^{\pm}(x/K)$ be the rescaled polynomials on $[-3, 3]$. Then we can write
$$g_m^{\pm}(x) = \sum_i a_{m,i}^{\pm}\, x^i,$$
where $|a_{m,i}^{\pm}| \le B^m K^{-i} \le B^m$, noting that $K > 1$. We write $\mu_\infty$ for the measure with density $\rho_3(x)$ and $\mu_p$ for the eigenvalue counting measure (normalized to have total mass 1).
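By Theorem 1.1, for each $i \le m$,
$$\int x^i \, d\mu_p = \int x^i \, d\mu_\infty + O\!\left(\frac{C^i}{p}\right).$$
Since the coefficients $a_{m,i}^{\pm}$ are at most $B^m$ in absolute value, we have
$$\int g_m^{\pm} \, d\mu_p = \int g_m^{\pm} \, d\mu_\infty + O\!\left((BC)^m p^{-1}\right).$$
Therefore we can replace $\mu_p$ by $\mu_\infty$ in (7.1):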

Conclusion
We have argued that nontrivial words of length $L$ have at most $p\,e^{O(L)}$ fixed points, while the identity has $p^2 + O(p)$. Thus, for any fixed $L$, or even for $L$ up to a small multiple of $\log p$, the path count approximately matches what one would get in computing a Kesten-McKay moment. The error term $O(p)$ cannot be improved because some words, such as the Markoff moves themselves, do have on the order of $p$ fixed points. There is room for improvement in taking longer words, namely allowing $L$ to be a larger multiple of $\log p$. This would lead to a more refined scale at which the Kesten-McKay law holds.

Beyond the scale $\log p$, the Markoff graph no longer resembles a tree in the same statistical sense that we have proved for smaller $L$. To see this, start from the 3-regular tree of integer solutions and reduce mod $p$. There are only $p^2 \pm 3p$ nonzero solutions mod $p$ (and it is not even known whether all of them appear from integer solutions reduced mod $p$). On the other hand, the first $n$ layers in a 3-regular tree comprise $3 \times 2^n - 2$ nodes. Once $3 \times 2^n - 2 > p^2 + 3p$, there must be distinct Markoff triples over $\mathbb{Z}$ that coincide mod $p$. This gives a cycle in $M(\mathbb{F}_p)$ of length at most $2n$ (to the root and back). The same argument produces a closed path starting from any solution mod $p$ that lifts to $\mathbb{Z}$, which Bourgain-Gamburd-Sarnak [BGS16] prove is the vast majority of them. Thus many cycles of length $4 \log_2 p$ or shorter form as the tree collapses on itself mod $p$. We would not expect it to be possible to take $L > \frac{4}{\log 2} \log p = (5.77078\ldots) \log p$ and still have agreement with the Kesten-McKay moments. At that scale, if not sooner, cycles appear at a positive proportion of the vertices.
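The pigeonhole threshold in this heuristic is easy to compute. As a small illustration (the function name `collision_depth` is our own, not from the paper), the sketch below finds the least $n$ with $3 \times 2^n - 2 > p^2 + 3p$ and compares the resulting cycle length bound $2n$ with $4 \log_2 p$.

```python
import math

def collision_depth(p):
    """Smallest n such that the first n layers of the 3-regular tree,
    3 * 2**n - 2 nodes in all, outnumber the at most p**2 + 3*p
    nonzero solutions mod p."""
    n = 0
    while 3 * 2**n - 2 <= p * p + 3 * p:
        n += 1
    return n

if __name__ == "__main__":
    for p in (83, 89, 10007):
        n = collision_depth(p)
        # 2n bounds the length of the cycle forced by the collision.
        print(p, n, 2 * n, 4 * math.log2(p))
    # The constant in front of log p when L is measured against log p.
    print(4 / math.log(2))
```

For $p = 5$ the bound $p^2 + 3p = 40$ is first exceeded at $n = 4$, where the tree has $3 \times 16 - 2 = 46$ nodes.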
The Kesten-McKay law leaves open the question of whether the Markoff graphs are connected for each prime $p$, and the even harder question of whether they form an expander family. The number of connected components of a 3-regular graph is the multiplicity of $\lambda = 3$ as an eigenvalue. Corollary 1.4 implies that the number of eigenvalues in an interval $[3 - \varepsilon, 3]$ is $O(p^2 / \log p)$, which falls well short of proving that the number of components is $O(1)$ independent of $p$, let alone exactly 1. To prove a spectral gap, even if the interval contained a bounded number of eigenvalues, one would need a further argument to rule out some eigenvalues being $3 + o(1)$ as $p \to \infty$. The bulk distribution of eigenvalues we have proved here is a coarser property.
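For small primes the number of connected components, and hence the multiplicity of the eigenvalue 3, can be found directly by a graph search. The sketch below is an illustration only: the helper names are ours, and it says nothing about large $p$, which is the actual open question discussed above.

```python
from collections import deque

def markoff_vertices(p):
    """Nonzero solutions of x^2 + y^2 + z^2 = xyz over F_p."""
    return [(x, y, z)
            for x in range(p) for y in range(p) for z in range(p)
            if (x, y, z) != (0, 0, 0)
            and (x * x + y * y + z * z - x * y * z) % p == 0]

def neighbours(v, p):
    """Images of v under the three Markoff moves."""
    x, y, z = v
    return [(x, y, (x * y - z) % p),
            (x, (x * z - y) % p, z),
            ((y * z - x) % p, y, z)]

def component_sizes(p):
    """Sizes of the connected components, found by breadth-first search."""
    unseen = set(markoff_vertices(p))
    sizes = []
    while unseen:
        start = unseen.pop()
        queue = deque([start])
        size = 1
        while queue:
            v = queue.popleft()
            for w in neighbours(v, p):
                if w in unseen:
                    unseen.remove(w)
                    queue.append(w)
                    size += 1
        sizes.append(size)
    return sorted(sizes, reverse=True)

if __name__ == "__main__":
    for p in (5, 7, 11, 13):
        sizes = component_sizes(p)
        # len(sizes) is the multiplicity of the eigenvalue 3.
        print(p, len(sizes), sizes)
```

The length of the returned list is the number of components; a search of this kind costs on the order of $p^3$ operations because of the naive enumeration of solutions, so it is only practical for small $p$.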
Figure 1.1. Histogram of eigenvalues for $p = 83$ and $89$ with the density $\rho_3(x)$ shown in red.
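Taking the trace of both sides gives $\operatorname{tr}(BA) = \operatorname{tr}(A)\operatorname{tr}(B) - \operatorname{tr}(BA^{-1})$. This has the same form as a Markoff move on the vector of traces $(\operatorname{tr}(A), \operatorname{tr}(B), \operatorname{tr}(BA))$, with the other solution for the third coordinate being $\operatorname{tr}(BA^{-1})$. To keep the third matrix equal to the product of the first two, we use $\operatorname{tr}(A^{-1}) = \operatorname{tr}(A)$ to rewrite the vector of traces as
$$\left(\operatorname{tr}(A), \operatorname{tr}(B), \operatorname{tr}(BA^{-1})\right) = \left(\operatorname{tr}(A), \operatorname{tr}(B^{-1}), \operatorname{tr}(AB^{-1})\right).$$

Lemma 4.1.-The only torsion elements in the Markoff group are the Markoff moves themselves and their conjugates.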
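The trace identity $\operatorname{tr}(BA) = \operatorname{tr}(A)\operatorname{tr}(B) - \operatorname{tr}(BA^{-1})$ holds for matrices of determinant 1 and can be confirmed numerically. The sketch below assumes $A, B \in SL_2(\mathbb{Z})$, built as products of elementary matrices; the helper names and the random construction are illustrative and not taken from the paper.

```python
import random

def mat_mul(A, B):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )

def inverse_sl2(A):
    """Inverse of a 2x2 matrix of determinant 1."""
    (a, b), (c, d) = A
    return ((d, -b), (-c, a))

def trace(A):
    return A[0][0] + A[1][1]

def random_sl2(rng, steps=6):
    """Random element of SL_2(Z) as a product of elementary matrices."""
    gens = [((1, 1), (0, 1)), ((1, 0), (1, 1)),
            ((1, -1), (0, 1)), ((1, 0), (-1, 1))]
    M = ((1, 0), (0, 1))
    for _ in range(steps):
        M = mat_mul(M, rng.choice(gens))
    return M

def check_identity(A, B):
    """Check tr(BA) = tr(A) tr(B) - tr(B A^{-1}) and tr(BA^{-1}) = tr(AB^{-1})."""
    lhs = trace(mat_mul(B, A))
    other = trace(mat_mul(B, inverse_sl2(A)))
    return (lhs == trace(A) * trace(B) - other
            and other == trace(mat_mul(A, inverse_sl2(B))))

if __name__ == "__main__":
    rng = random.Random(0)
    print(all(check_identity(random_sl2(rng), random_sl2(rng))
              for _ in range(1000)))
```

For example, with $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ one has $\operatorname{tr}(BA) = 6$, $\operatorname{tr}(A)\operatorname{tr}(B) = 9$, and $\operatorname{tr}(BA^{-1}) = 3$.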
of length $L$ in the generators $m_1, m_2, m_3$, then the entries $a, b, c, d$ are at most exponential in $L$. As an explicit upper bound, we have

Proposition 6.1.-The entries of a word of length $L$ in the Markoff moves $m_1, m_2, m_3$ are at most $4^L$ in absolute value.