0$ such that if $g \in \GL_2(\Z)$ with $|\tr(g)|>2$ is a word of length $L$ in the matrices representing the Markoff moves $m_1,m_2,m_3$, then $g$ has at most $e^{\alpha L}p$ fixed points. \end{coro} \begin{proof} By the previous Proposition~\ref{prop:bound-entries}, the entries of $g$ are at most $4^L$ in absolute value. Combining this with Theorem~\ref{thm:fixed-points}, provided $L$ is small enough that $4^L < (p/128)^{1/8}$ we find that the number of fixed points of $g$ is at most \[ 1024 p \left(4^L\right)^8 = p 2^{16L+10}. \] Thus we can take $\alpha = 26\log{2} = 18.0218\,\ldots\,$ and have the result for all $L \geq 1$ obeying $4^L < (p/128)^{1/8}$. For larger $L$, note that the conclusion holds trivially once $e^{\alpha L}p > p^2+3p$, there being at most $p^2+3p$ points on the Markoff surface. If $L$ is so large that $4^L \geq (p/128)^{1/8}$, then $e^{\alpha L}p \geq (p/128)^{\alpha/16\log{2}} p$. This can be made to exceed $p^2+3p$ for all $p \geq 5$ by taking $\alpha$ large enough. \end{proof} There are at most $3^L$ words $m_{j_1}\,\cdots\, m_{j_L}$ of length $L$ since each index is either 1, 2, or 3. Using the previous corollary over each of these terms leads to \[ \sum_{i_1}\,\cdots\,\sum_{i_L} \Fix(m_{i_1}\,\cdots\, m_{i_L}) \leq 3^L \times \left(p \times 2^{16L+10}\right) \] where the sum is over the remaining words, that is, those that do not reduce to the identity. Combining this with the main term from the words that do reduce to 1, we have \[ \tr \left(A^L\right) = |M(\F_p)| \int x^L \rho_3(x) dx + O\big(C^L p \big) \] where $C=3 \times 2^{16} = 196608$ and the implicit constant could be taken as $2^{10}=1024$ independent of both $p$ and $L$. We have $|M(\F_p)| = p^2 \pm 3p$, and normalizing by $p^2$ gives \[ \int x^L d\mu_p(x) = \int x^L \rho_3(x) dx + O\left(\frac{C^L }{p} \right). \] The error term is negligible provided that \[ L - \frac{\log{p}}{\log{C} } \rightarrow - \infty. \] This allows for $L \sim c\log{p}$ for a sufficiently small $c > 0$, namely \[ c < \frac{1}{\log{C} } = 0.082041\ldots \] and one could also take, for instance, $L \sim \frac{1}{\log{C}} \log{p} - \sqrt{\log{p}}$. \section{Proof of Corollary~\ref{cor:discrepancy}} \label{sec:discrepancy} Corollary~\ref{cor:discrepancy} compares the measure of an interval under the empirical distribution of eigenvalues as against the limiting Kesten--McKay law, whereas Theorem~\ref{thm:logp} gives information about moments. A natural bridge between these is to approximate the given interval's indicator function by polynomials. If we had estimates for the Fourier transform $\sum_j e^{i \xi \lambda_j}$, then we could try to bound the discrepancy using the Erd\H{o}s\linebreak--Turán inequality (see~\cite[Corollary~1.1]{Mon} or the original articles~\cite{ET1, ET2}). However, Theorem~\ref{thm:logp} only allows us to take moments of the form $\sum_j \lambda_j^L$ with $L$ on the order of $\log{p}$. There are standard arguments to pass from moments to discrepancy, and in particular Gamburd--Jakobson--Sarnak faced the same problem in a setting very close to ours~\cite{GJS}. What we state below as Lemma~\ref{lem:gjs} is a summary of facts given in equations (55) and (57) of~\cite[Proof of Theorem~1.3]{GJS}. It is based on Selberg polynomials after Selberg~\cite[p.~213--219]{S} and Vaaler~\cite{V}. We also recommend Montgomery's treatment~\cite[p.~5--15]{Mon}. \begin{lemm}\label{lem:gjs} (Gamburd--Jakobson--Sarnak, after Selberg and Vaaler) For any interval $I \subseteq [-1,1]$ and $m \in \N$, there exist polynomials $f_{m}^{\pm}$ of degree $m$ such that \begin{itemize} \item $f_{m}^{-} \leq \chi_{I} \leq f_{m}^{+}$ on $[-1,1]$, \item There is an absolute constant $B>0$, independent of $I$, such that the coefficients of $f_{m}^{\pm}$ have absolute value $\leq B^{m}$. \item \[ \int_{-\frac{1}{\sqrt{2}}}^{\frac{1}{\sqrt{2}}}(f_{m}^{+}-f_{m}^{-})(y)dy=O\left(\frac{1}{m}\right). \] \end{itemize} \end{lemm} \begin{proof}[{Proof of Corollary~\ref{cor:discrepancy}.}] The Markoff eigenvalues lie in $[-3,3]$, so we first rescale so that Lemma~\ref{lem:gjs} applies. Given any subinterval $J$ of $[-3,3]$, let \[ I = K^{-1} J \subseteq \left[-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right] \] where $K = 3\sqrt{2}$. Let $f_m^{\pm}$ be the polynomials from Lemma~\ref{lem:gjs} applied to $I$, where $m$ will be a small multiple of $\log{p}$, and let $g_m^\pm(x) = f_m^{\pm}(x/K)$ be the rescaled polynomials on $[-3,3]$. Then we can write \[ g_m^{\pm}(y) = \sum_{i=0}^m a_{m,\,i}^{\pm} y^i, \] where $|a_{m,i}^{\pm}| \leq B^m K^{-i} \leq B^m$, noting that $K > 1$. We write $\mu_{\infty}$ for the measure with density $\rho_3(x)$ and $\mu_p$ for the eigenvalue counting measure (normalized to have total mass 1). From Lemma~\ref{lem:gjs}, we have \begin{equation}\label{eqn:above-below} \int g_m^- d\mu_p \leq \mu_p(J) \leq \int g_m^+ d\mu_p. \end{equation} By Theorem~\ref{thm:logp}, \[ \int x^i d\mu_p = \int x^i d\mu_{\infty} + O\big(p^{-1} C^i \big) \] Therefore \[ \int g_m^{\pm} d\mu_p = \int g_m^{\pm} d\mu_{\infty} + O\left(\sum_{i=0}^m \left|a_{m,i}^{\pm}\right| C^i p^{-1} \right) \] Since the coefficients $a_{m,i}^{\pm}$ are at most $B^m$, we have \[ \int g_m^{\pm} d\mu_p = \int g_m^{\pm} d\mu_{\infty} + O\left((BC)^m p^{-1} \right) \] Therefore we can replace $\mu_p$ by $\mu_{\infty}$ in~\eqref{eqn:above-below}: \[ \int g_m^- d\mu_{\infty} + O\left((BC)^m p^{-1} \right) \leq \mu_p(J) \leq \int g_m^+ d\mu_{\infty} + O\left((BC)^m p^{-1} \right). \] Using $g_m^- \leq \chi_J \leq g_m^+$ gives \[ \int g_m^- d\mu_{\infty} \leq \mu_{\infty}(J) \leq \int g_m^+ d\mu_{\infty} \] and since the Kesten--McKay density $\rho_3$ is bounded, Lemma~\ref{lem:gjs} also implies that \[ \int (g_m^+ - g_m^-) d\mu_{\infty} = O\left(\frac{1}{m} \right). \] It follows that \[ |\mu_p(J) - \mu_{\infty}(J) | \lesssim \frac{1}{m} + \frac{(BC)^m}{p}. \] If we choose $m = \lfloor c \log{p} \rfloor$ for a small constant $c > 0$, we obtain \[ |\mu_p(J) - \mu_{\infty}(J) | \lesssim \frac{1}{c \log{p}} + p^{-1+c\log(B C)} \] By choosing $c>0$ such that $-1 + c\log(B C)<0$, the $ \frac{1}{c \log{p}}$ is eventually the larger term above. Thus \[ \mu_{p}(J) =\int_{J} \rho_{3}(x) dx+O\left(\frac{1}{\log p} \right) \] as required. \end{proof} \section{Conclusion} \label{sec:conc} We have argued that nontrivial words of length $L$ have at most $p e^{O(L)}$ fixed points, while the identity has $p^2 + O(p)$. Thus, for any fixed $L$ or even up to a small multiple of $\log{p}$, the path-count will approximately match what one would get in the process of computing a Kesten--McKay moment. The error term $O(p)$ cannot be improved because some words, such as the Markoff moves themselves, do have on the order of $p$ fixed points. There is room for improvement in taking longer words, namely allowing $L$ to be a larger multiple of $\log{p}$. This would lead to a more refined scale at which the Kesten--McKay holds. Beyond the scale $\log{p}$, the Markoff graph no longer resembles a tree in the same statistical sense that we have proved for smaller $L$. To see this, start from the 3-regular tree of integer solutions and reduce mod $p$. There are only $p^2 \pm 3p$ nonzero solutions mod $p$ (and it is not even known whether all of them appear from integer solutions reduced mod $p$). On the other hand, the first $n$ layers in a 3-regular tree comprise $3 \times 2^n - 2$ nodes. Once $3 \times 2^n - 2 > p^2+3p$, there must be distinct Markoff triples over $\Z$ that coincide mod $p$. This gives a cycle in $M(\F_p)$ of length at most $2n$ (to the root and back). The same argument produces a closed path starting from any solution mod $p$ that lifts to $\Z$, which Bourgain--Gamburd--Sarnak~\cite{BGS} prove is the vast majority of them. Thus many cycles of length $4\log_2(p)$ or shorter form as the tree collapses on itself mod $p$. We would not expect it to be possible to take $L > \frac{4}{\log{2}} \log{p} = (5.77078\ldots) \log{p}$ and still have agreement with the Kesten--McKay moments. At that scale, if not sooner, cycles appear at a positive proportion of the vertices. The Kesten--McKay law leaves open the question of whether the Markoff graphs are connected for each prime $p$, and the even harder question of whether they form an expander family. The number of connected components of a 3-regular graph is the multiplicity of $\lambda=3$ as an eigenvalue. Corollary~\ref{cor:exceptional-eigs} implies that the number of eigenvalues in an interval $[3-\varepsilon,3]$ is $O(p^2/\log{p})$, which is well short of proving even that the number of components is exactly 1 or even $O(1)$ independent of $p$. To prove a spectral gap, even if the interval contained a bounded number of eigenvalues, one would need a further argument to rule out some eigenvalues being $3+o(1)$ as $p \rightarrow \infty$. The bulk distribution of eigenvalues we have proved here is a coarser property. \section*{Acknowledgments} We thank Seungjae Lee for Figure~\ref{fig:histogram}, the first numerical evidence in favour of the Kesten--McKay law for graphs constructed from the Markoff equation, and many helpful conversations. We thank Peter Sarnak for his encouragement in this project and Nick Katz for helpful discussions. We thank the anonymous referee for a careful reading and many helpful suggestions. \bibliography{AHL_dci} \end{document}