Hamilton-Jacobi equations for mean-field disordered systems

We argue that Hamilton-Jacobi equations provide a convenient and intuitive approach for studying the large-scale behavior of mean-field disordered systems. This point of view is illustrated on the problem of inference of a rank-one matrix. We compute the large-scale limit of the free energy by showing that it satisfies an approximate Hamilton-Jacobi equation with asymptotically vanishing viscosity parameter and error term.


Motivation
The goal of this paper is to propose a new approach to the computation of the large-scale limit of the free energy of mean-field disordered systems. The new method is based on showing that the finite-volume free energy satisfies an approximate Hamilton-Jacobi equation, with a viscosity parameter and an error term that vanish in the large-scale limit.
The paper grew out of my attempt to build some intuition for the celebrated Parisi formula for such systems, see [17,13,19,21,18]. The classical variational formulation of the free energy allows one to write this quantity as the supremum of the sum of an energy term and an entropy term. For the Sherrington-Kirkpatrick model, the Parisi formula identifies the limit as an infimum instead, defying intuition. The plot thickens further when one considers more complicated systems such as the perceptron and the Hopfield models, which are expected to have limit free energies given by saddle-point variational problems [20].
In this paper, I propose a change of viewpoint that puts the main emphasis on the fact that the free energy satisfies a Hamilton-Jacobi equation, up to a small error. In this new point of view, this is the fundamental property that should be the center of attention and should receive an explanation. As is well-known, if the nonlinearity in the Hamilton-Jacobi equation is convex, then the solution can be expressed as an inf-sup variational problem. This suggests a Hamilton-Jacobi interpretation with convex nonlinearity for the Sherrington-Kirkpatrick model. However, it is unclear why one should expect the nonlinearity to always be convex. In fact, in the model that will be the focus of our attention here, the nonlinearity is concave, not convex. This still allows for a variational representation, but as a sup-inf instead of an inf-sup. More importantly, this suggests that the Hamilton-Jacobi point of view may be more robust and transparent than the variational representations.
The observation that finite-volume free energies satisfy approximate Hamilton-Jacobi equations already appeared in the physics literature [5,6]. As was explained there, this idea can easily be made rigorous in the case of the Curie-Weiss model. Although the interactions in this model are not disordered, it is illustrative to explain the main ideas in this simple case.
For the Curie-Weiss problem, we would like to compute, for each t ⩾ 0, the large-N limit of the free energy
F°_N(t) ∶= (1/N) log E_σ exp( (t/N) ∑_{i,j=1}^N σ_i σ_j ),
where E_σ denotes the average over σ = (σ_1, . . ., σ_N) sampled uniformly in {−1, 1}^N.
We aim to do so by identifying a PDE satisfied by F°_N(t), possibly up to error terms that vanish in the large-N limit. However, at this stage we can only calculate derivatives with respect to t and infer information about the distribution of ∑_{i,j=1}^N σ_i σ_j under the associated Gibbs measure. (For instance, the first and second derivatives are related to the mean and variance of this variable.) In order to find a closed set of equations, we need to "enrich" our free energy by introducing another quantity into the problem. This additional quantity should hopefully be simpler than ∑_{i,j} σ_i σ_j, and display some nontrivial correlations with the latter. In the present case, this quantity is very easy to guess: it is simply the magnetization ∑_{i=1}^N σ_i. (In more complicated settings, our intuition can be guided e.g. by cavity calculations.) Although we may a priori only care about calculating F°_N(t), it is thus natural to introduce, for each t ⩾ 0 and h ∈ R, the enriched free energy
F_N(t, h) ∶= (1/N) log E_σ exp( (t/N) ∑_{i,j=1}^N σ_i σ_j + h ∑_{i=1}^N σ_i ).
Denoting by ⟨⋅⟩ the associated Gibbs measure, we then observe that
∂_t F_N − (∂_h F_N)² = (1/N²) ( ⟨( ∑_i σ_i )²⟩ − ⟨∑_i σ_i⟩² ).
Since the right side in the identity above is a variance, we should expect it to be small. Moreover, since F_N(t, h) encodes complete information on the law of ∑_i σ_i, it should be possible to find an expression for this variance in terms of F_N. We find indeed that
∂_t F_N − (∂_h F_N)² = (1/N) ∂_h² F_N.
On this simple example, the free energy thus solves an exact Hamilton-Jacobi equation with viscosity term equal to N^{−1}. After observing that the value of F_N(0, h) does not depend on N, we have completely identified the limit F_∞ of F_N as the viscosity solution to
∂_t F_∞ − (∂_h F_∞)² = 0 on (0, ∞) × R, with F_∞(0, h) = log cosh(h).
In a nutshell, due to the mean-field character of the model, we expect to be able to identify a handful of quantities whose statistics are related to one another. These relations will produce non-trivial identities between the first derivatives of the free energy: a Hamilton-Jacobi equation. There will be error terms, which one may expect to control by second-order derivatives, since these second-order derivatives are equal to the variances of the quantities of interest. We aim to carry out an argument with a similar structure for disordered mean-field models. However, for the Sherrington-Kirkpatrick and similar models, an important difficulty arises: the number of informative quantities one needs to add to the "enriched" free energy is infinite. In physicists' language, the system has a functional order parameter. As is well-known, this is bound to create very important technical difficulties. We will thus focus on the simpler setting provided by an inference problem. In this context, an additional symmetry forces the system to be replica-symmetric for every choice of parameters, and thus a simpler argument based on the addition of a single quantity suffices to "close the equation". We define the model on which we will focus and state our main results in the next section.
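The Curie-Weiss identity ∂_t F_N − (∂_h F_N)² = N^{−1} ∂_h² F_N can be checked numerically at finite N: all three derivatives are Gibbs averages of the total spin, computable by exact enumeration for small N. The sketch below (function names are ours) verifies that the residual of the equation vanishes up to rounding error.

```python
import itertools
import math

def gibbs_stats(N, t, h):
    # Exact enumeration of the Curie-Weiss Gibbs measure for small N.
    # Weight of a configuration: exp( (t/N) S^2 + h S ), with S = sum_i sigma_i.
    Z = m1 = m2 = 0.0
    for sigma in itertools.product([-1, 1], repeat=N):
        S = sum(sigma)
        w = math.exp((t / N) * S * S + h * S)
        Z += w
        m1 += w * S       # accumulates Z * <S>
        m2 += w * S * S   # accumulates Z * <S^2>
    return m1 / Z, m2 / Z

def hj_residual(N, t, h):
    # dF/dt = <S^2>/N^2,  dF/dh = <S>/N,  d^2F/dh^2 = Var(S)/N,
    # so the exact identity reads  dF/dt - (dF/dh)^2 - (1/N) d^2F/dh^2 = 0.
    m1, m2 = gibbs_stats(N, t, h)
    return m2 / N**2 - (m1 / N) ** 2 - (m2 - m1**2) / N**2

print(abs(hj_residual(8, 0.5, 0.3)))  # vanishes up to floating-point rounding
```

Here the enrichment by the magnetization is exactly what closes the equation: the t-derivative involves ⟨S²⟩, which the h-derivatives recover through the mean and variance of S.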

Rank-one estimation, main results
We consider the problem of estimating a vector x̄ = (x̄_1, . . ., x̄_N) ∈ R^N of independent entries distributed according to a bounded measure P, given the observation of
Y ∶= √(t/N) x̄ x̄^⊺ + W,
where W = (W_ij)_{1⩽i,j⩽N} are independent standard Gaussian random variables, independent of the vector x̄. We denote the joint law of x̄ and W by P, with associated expectation E. Note that we seek to recover N parameters from N² observations, each with a signal-to-noise ratio of the order of N^{−1}; this should therefore be the critical scaling for the inference of x̄. By Bayes' rule, the posterior distribution of x̄ given the observation of Y is the probability measure
(2.1) exp(H_N(t, x)) dP_N(x) / ∫_{R^N} exp(H_N(t, x′)) dP_N(x′),
where we use the shorthand notation P_N for the product measure P^{⊗N}, and where H_N(t, x) is defined by
(2.2) H_N(t, x) ∶= ∑_{i,j=1}^N ( √(t/N) Y_ij x_i x_j − (t/2N) x_i² x_j² ).
This will be explained in more detail and in a slightly more general context in Appendix A. Note that although we suppress it from the notation, the quantity H_N(t, x) is random in that it depends on the realization of x̄ and W. Throughout the paper, we write ∥x∥ to denote the ℓ² norm of the vector x ∈ R^N.
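The critical scaling can be seen directly in simulation. The sketch below (helper name `observe` is ours, and we write x̄ for the signal vector) forms the observation √(t/N) x̄ x̄^⊺ + W for a Rademacher prior, an example of a bounded measure P; each signal entry then carries squared size exactly t/N against unit-variance noise, giving a per-entry signal-to-noise ratio of order N^{−1} across the N² observations.

```python
import numpy as np

def observe(xbar, t, rng):
    # Y = sqrt(t/N) * xbar xbar^T + W, with W an N x N matrix of iid N(0,1).
    N = len(xbar)
    W = rng.standard_normal((N, N))
    return np.sqrt(t / N) * np.outer(xbar, xbar) + W

rng = np.random.default_rng(0)
N, t = 400, 3.0
xbar = rng.choice([-1.0, 1.0], size=N)          # Rademacher prior (bounded P)
Y = observe(xbar, t, rng)

signal = np.sqrt(t / N) * np.outer(xbar, xbar)
per_entry_snr = float(np.mean(signal**2))        # = t/N exactly here
print(per_entry_snr, t / N)
```

For Rademacher entries every squared signal entry equals t/N, so the mean above matches t/N exactly; for a general bounded P it would be of the same order.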
Our goal is to understand the large-N behavior of the normalizing constant in (2.1). The asymptotic behavior of this quantity has already been obtained multiple times in the literature; we refer to [10,15,2,14,3,4,11] for references. As was explained above, the point of the present paper is to devise yet another proof of this result, which centers on the identification of an appropriate Hamilton-Jacobi equation. Naturally, several elements of the proof presented here can also be found in these previous works; the main difference is the global structure of the argument.
In the spirit of the previous section, we start by introducing an "enriched" system. Let z = (z_i)_{1⩽i⩽N} be a vector of independent standard Gaussian random variables, independent of x̄ and W under P. For every t, h ⩾ 0 and x ∈ R^N, we define
H_N(t, h, x) ∶= H_N(t, x) + ∑_{i=1}^N ( √h z_i x_i + h x̄_i x_i − (h/2) x_i² ).
The difference between the quantity above and that in (2.2), namely
∑_{i=1}^N ( √h z_i x_i + h x̄_i x_i − (h/2) x_i² ),
is the energy associated with the much simpler inference problem in which we try to recover x̄ ∈ R^N from the observation of √h x̄ + z ∈ R^N. We define the free energy
(2.3) F_N(t, h) ∶= (1/N) log ∫_{R^N} exp(H_N(t, h, x)) dP_N(x),
as well as its expectation (with respect to the variables x̄, W and z)
(2.4) F̄_N(t, h) ∶= E[F_N(t, h)].
For every h ⩾ 0, we set
(2.5) ψ(h) ∶= E log ∫_R exp( √h z x + h x̄ x − (h/2) x² ) dP(x).
In this expression, all the variables are scalar. Observe that F̄_N(0, h) = ψ(h) does not depend on N. Our main goal is to prove the following result.
Theorem 2.1 (Convergence to HJ). For every M ⩾ 1, we have
lim_{N→∞} sup_{(t,h)∈[0,M]²} E[ |F_N(t, h) − f(t, h)| ] = 0,
where f(t, h) is the viscosity solution of the Hamilton-Jacobi equation
(2.6) ∂_t f − 2 (∂_h f)² = 0 on (0, ∞) × [0, ∞), with f(0, ·) = ψ.
The next proposition is the main ingredient for the proof of Theorem 2.1. It states that the averaged free energy satisfies an approximate Hamilton-Jacobi equation with asymptotically vanishing viscosity parameter.

Proposition 2.2 (Approximate HJ in finite volume). There exists C < ∞ such that for every N ⩾ 1 and uniformly over [0, ∞)², the quantity ∂_t F̄_N − 2 (∂_h F̄_N)² is non-negative and bounded above by an error term, involving C, ∂_h² F̄_N, N^{−1} and h^{−1}, that vanishes in the large-N limit; and moreover,
(2.7) ∂_h F̄_N ⩾ 0.
In Proposition 2.2, we kept the variables (t, h) implicit for notational convenience. A more precise statement would be that the bounds above hold for every (t, h) ∈ [0, ∞)², with the right side of the upper bound interpreted as +∞ when h = 0. The next section is devoted to the proof of Proposition 2.2. We will also give some basic estimates on the derivatives of F̄_N and show that F_N is concentrated around its expectation F̄_N. Section 4 starts with the definitions relevant to the notion of viscosity solutions. We then prove Theorem 2.1 using the results of Section 3. The argument is similar to more standard situations for vanishing viscosity limits, although some additional difficulties appear. We close the section by discussing a variational representation for f given by the Hopf-Lax formula. A generalization to tensors of arbitrary order is then obtained in Section 5. In order to make the paper fully self-contained, two appendices are included. In Appendix A, we recall the proof of the Nishimori identity, which is a property of inference problems and is the main technical mechanism that allows one to "close the equation" and remain in the replica-symmetric phase. In Appendix B, we prove the comparison principle and the Hopf-Lax formula for viscosity solutions of (2.6).

Approximate Hamilton-Jacobi equation and basic estimates
The main purpose of this section is to prove Proposition 2.2. We will also record basic estimates on the derivatives of the free energy and its concentration properties that will be useful in the next section.
We denote by ⟨⋅⟩ the Gibbs measure associated with the energy H_N(t, h, x), that is, for every bounded measurable function f,
⟨f(x)⟩ ∶= ∫_{R^N} f(x) exp(H_N(t, h, x)) dP_N(x) / ∫_{R^N} exp(H_N(t, h, x)) dP_N(x).
Note that although the notation does not display it, this random probability measure depends on t, h, as well as on the realization of the random variables x̄, W and z. We will also consider "replicated" (or tensorized) versions of this measure, and write x, x′, x″, etc. for the canonical "replicated" random variables.
Conditionally on x̄, W and z, these random variables are independent and each is distributed according to the Gibbs measure ⟨⋅⟩. Abusing notation slightly, we still denote this tensorized measure by ⟨⋅⟩. An important ingredient for the proof of Proposition 2.2 is the Nishimori identity, which is a feature of inference problems whose proof is recalled in Appendix A below. For simplicity of notation, we only state this identity in the case of two or three replicas, since this will be sufficient for our purpose: for every bounded measurable function f,
(3.2) E ⟨f(x, x′)⟩ = E ⟨f(x, x̄)⟩,
and for every bounded measurable function g,
(3.3) E ⟨g(x, x′, x″)⟩ = E ⟨g(x, x′, x̄)⟩.
We decompose the proof into three steps.
Step 1. In this step, we compute the first derivatives of F̄_N. Starting with the derivative with respect to t, we have
(3.4) ∂_t F_N = (1/N) ∑_{i,j=1}^N ⟨ (1/(2√(tN))) W_ij x_i x_j + (1/N) x̄_i x̄_j x_i x_j − (1/(2N)) x_i² x_j² ⟩.
By Gaussian integration by parts, we have for every i, j ∈ {1, . . ., N} that
(3.5) E [ W_ij ⟨x_i x_j⟩ ] = √(t/N) E [ ⟨x_i² x_j²⟩ − ⟨x_i x_j⟩² ],
and thus, taking the expectation in (3.4), we get
∂_t F̄_N = (1/N²) ∑_{i,j=1}^N E ⟨ x̄_i x̄_j x_i x_j ⟩ − (1/(2N²)) ∑_{i,j=1}^N E [ ⟨x_i x_j⟩² ].
Using also the Nishimori identity (3.2), we conclude that we have
(3.6) ∂_t F̄_N = (1/(2N²)) E ⟨ (x ⋅ x̄)² ⟩.
Turning to the derivative with respect to h, we write H′_N(h, x) for the derivative of the energy with respect to h, so that
(3.7) ∂_h F_N = (1/N) ⟨ H′_N(h, x) ⟩,
and a similar computation, based on Gaussian integration by parts over z and the Nishimori identity, gives
(3.8) ∂_h F̄_N = (1/(2N)) E ⟨ x ⋅ x̄ ⟩.
We thus deduce that
(3.9) ∂_t F̄_N − 2 (∂_h F̄_N)² = (1/(2N²)) ( E ⟨ (x ⋅ x̄)² ⟩ − ( E ⟨ x ⋅ x̄ ⟩ )² ).
In particular, this quantity is non-negative. Note also that, by the Nishimori identity,
∂_h F̄_N = (1/(2N)) E ⟨ x ⋅ x′ ⟩ = (1/(2N)) E [ ∥⟨x⟩∥² ] ⩾ 0,
so that property (2.7) holds.
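The Gaussian integration by parts identity underlying (3.5), namely E[W g(W)] = E[g′(W)] for W standard Gaussian and g smooth with moderate growth, is easy to verify numerically. The sketch below checks it for g = tanh using Gauss-Hermite quadrature (the helper name is ours).

```python
import numpy as np

# Gauss-Hermite quadrature: for W ~ N(0,1),
# E[f(W)] = (1/sqrt(pi)) * sum_i w_i f(sqrt(2) * x_i).
nodes, weights = np.polynomial.hermite.hermgauss(60)

def gauss_expect(f):
    return float(np.sum(weights * f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi))

# Integration by parts: E[W tanh(W)] = E[tanh'(W)] = E[1 - tanh(W)^2].
lhs = gauss_expect(lambda w: w * np.tanh(w))
rhs = gauss_expect(lambda w: 1.0 - np.tanh(w) ** 2)
print(lhs, rhs)
```

In the proof above the same mechanism is applied coordinate-wise to the Gaussian variables W_ij (and, for the h-derivative, to the z_i), with g the corresponding Gibbs average.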
Step 2. In the remaining two steps, we will control the right side of (3.9) in terms of the quantities appearing in the statement of Proposition 2.2. In this step, we show that these quantities allow for a control of the fluctuations of H′_N(h, x). More precisely, we show that the bound (3.11) holds. Our starting point is the decomposition of the variance of H′_N(h, x) into a thermal part, measuring the fluctuations of H′_N(h, x) under the Gibbs measure, and a disorder part, measuring the fluctuations of ⟨H′_N(h, x)⟩. We readily have that the thermal part is controlled by second derivatives of the free energy. For the disorder part, taking expectations and using (3.6), we obtain a similar control. Recall that we assume that the measure P has bounded support. This implies that the last remaining term is bounded by Ch^{−1}, and thus yields (3.11).
Step 3. In order to conclude, it remains to show that the variance of x ⋅ x̄ is controlled by that of H′_N(h, x); this is the content of (3.13). In view of (3.7) and (3.8), it suffices to show (3.14). For every i ≠ j, we have, using Gaussian integration by parts and the Nishimori identity, an expression for the correlation E⟨x_i x̄_i x_j x̄_j⟩ in terms of Gibbs averages involving H′_N(h, x); similarly for the diagonal terms i = j. We therefore obtain an identity for the variance of x ⋅ x̄. By the Cauchy-Schwarz inequality and the Nishimori identity, we can bound the resulting expression by the variance of H′_N(h, x), and thus (3.14) is proved.
Before turning to the proof of Theorem 2.1, we record simple derivative and concentration estimates in the next two lemmas. We use the notation ∥W∥ for the operator norm of the matrix W. Of course, this quantity depends on N, and as we will see in the proof of Lemma 3.2, it grows like √N. The notation may be slightly misleading, in that it does not display the dependency on N. A similar convention is already in place when we write ∥x∥ to denote the ℓ² norm of the vector x ∈ R^N, a quantity which is typically of the order of √N.
We now turn to a concentration estimate. Since this is sufficient for our purposes, we simply state an L² bound in the probability space, and prove it using the elementary Efron-Stein inequality. The statement could be strengthened to a Gaussian-type integrability using concentration results such as [7, Theorem 5.5 and Theorem 2.8] (and this would also allow one to improve the rate of decay to 0 as N tends to infinity).

Lemma 3.2 (Concentration of free energy). For every M ⩾ 1, there exists C(M) < ∞ such that for every N ⩾ 1,
sup_{(t,h)∈[0,M]²} E [ (F_N − F̄_N)²(t, h) ] ⩽ C(M) N^{−1/3}.
Proof. We recall that F̄_N is the expectation of F_N with respect to the variables x̄, W and z. The Efron-Stein inequality allows us to bound the variance of F_N by the sum of the contributions of each source of randomness. By the Gaussian Poincaré inequality (see e.g. [8, (2.5)] or [1]), we can bound the contributions of the Gaussian variables W and z. Moreover, since the relevant derivative is bounded by C(t + h)/N, and the support of the law of x̄_i is bounded, we also obtain a bound for the contribution of the variables (x̄_i)_{1⩽i⩽N}. We have thus shown that there exists C < ∞ such that for every fixed (t, h), the variance of F_N(t, h) is bounded in terms of N^{−1}, t, h, E[∥W∥²] and E[∥z∥²]. In order to complete the proof, it remains to use a regularity estimate for F_N − F̄_N. By Lemma 3.1, for every t, t′, h, h′ ⩾ 0 sufficiently close to one another, we can compare the values of F_N at (t, h) and (t′, h′). On the other hand, it is clear from Lemma 3.1 that F̄_N is uniformly Lipschitz continuous, so in particular the estimate above also holds if F_N is replaced by F̄_N. Hence, for any ε ∈ (0, 1], if we set up a grid of mesh size ε over [0, M]², we can pass from an estimate on the grid points to a uniform estimate. Moreover, for every M ∈ [1, ∞), we have by (3.19) a variance bound at each grid point. Combining the two previous displays yields the desired estimate, and we clearly have E[∥z∥²] = N. In order to conclude, it remains to verify that E[∥W∥²] ⩽ CN (and then to choose ε appropriately in terms of M). For every fixed x ∈ R^N satisfying ∥x∥ ⩽ 1 and every i ∈ {1, . . ., N}, we have that (W x)_i is a centered Gaussian random variable with variance ∥x∥² ⩽ 1, and moreover, the random variables ((W x)_i)_{1⩽i⩽N} are independent. We deduce that there exists C < ∞ such that for every ∥x∥ ⩽ 1, the norm ∥W x∥ has Gaussian-type tails, and thus by the Chebyshev inequality, after enlarging C < ∞ if necessary, we have that for every a ⩾ C, the probability that ∥W x∥ ⩾ a √N is exponentially small in a² N. Now, let A ⊆ R^N be a finite set such that any two points in A are at distance at least 1/2 from one another, and no point of {∥x∥ ⩽ 1} can be added to A without violating this property. By this property of maximality, it must be that for every x satisfying ∥x∥ ⩽ 1, there exists y ∈ A such that ∥x − y∥ ⩽ 1/2. Since, for any x, y ∈ R^N, we have ∥W x∥ ⩽ ∥W y∥ + ∥W∥ ∥x − y∥, we deduce that
∥W∥ ⩽ 2 max_{y∈A} ∥W y∥.
In order to construct such a set A, we simply pick points in {∥x∥ ⩽ 1} in some arbitrary manner, until the maximality property is reached. Note that the balls centered at each of the points in A and of radius 1/4 are disjoint; they are also contained in the ball of radius 5/4.
Computing the volume of these sets, we infer that |A| ⩽ 5^N, and thus, by a union bound, we have for every a ⩾ C that the probability that max_{y∈A} ∥W y∥ exceeds a √N is exponentially small. This implies in particular that E[∥W∥²] ⩽ CN, as desired.
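The conclusion E[∥W∥²] ⩽ CN of the covering argument can be observed directly: the operator norm of an N × N matrix of independent standard Gaussians in fact concentrates around 2√N (the sharp constant 2 is not needed above, where any finite C suffices).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 300
# Spectral (operator) norm of N x N Gaussian matrices, averaged over a few draws.
norms = [np.linalg.norm(rng.standard_normal((N, N)), ord=2) for _ in range(5)]
ratio = float(np.mean(norms)) / np.sqrt(N)
print(ratio)  # close to 2
```

The square-root scaling is exactly what makes the normalization √(t/N) in the observation model critical.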

Convergence to viscosity solution
The main goal of this section is to show that Proposition 2.2 implies Theorem 2.1. We also comment on variational representations for solutions of Hamilton-Jacobi equations at the end of the section. To start with, we recall the definition of viscosity solutions.
Definition 4.1. We say that a function f ∈ C([0, ∞)²) is a viscosity subsolution of (2.6) if for every (t, h) ∈ (0, ∞) × [0, ∞) and every φ ∈ C^∞((0, ∞) × [0, ∞)) such that f − φ has a local maximum at (t, h), we have
(4.1) ( ∂_t φ − 2 (∂_h φ)² )(t, h) ⩽ 0 if h > 0, and min( −∂_h φ, ∂_t φ − 2 (∂_h φ)² )(t, h) ⩽ 0 if h = 0.
We say that a function f ∈ C([0, ∞)²) is a viscosity supersolution of (2.6) if for every (t, h) ∈ (0, ∞) × [0, ∞) and every φ ∈ C^∞((0, ∞) × [0, ∞)) such that f − φ has a local minimum at (t, h), the inequalities in (4.1) hold reversed, with the minimum replaced by a maximum. We say that a function f ∈ C([0, ∞)²) is a viscosity solution of (2.6) if it is a viscosity sub- and supersolution. We may also say that a function is a viscosity solution of (4.1) if it is a viscosity subsolution of (2.6). Similarly, we may say that a function is a viscosity solution of (4.1) with the inequalities reversed if it is a viscosity supersolution of (2.6).
The mechanism allowing one to identify uniquely the viscosity solution to (2.6) subject to an appropriate initial condition relies on the following classical comparison principle.
Proposition 4.2 (Comparison principle). Let u be a subsolution and v be a supersolution of (2.6) such that both u and v are uniformly Lipschitz continuous in the variable h. We have
sup_{[0,∞)²} (u − v) = sup_{{0}×[0,∞)} (u − v).
The proof of Proposition 4.2 is given in Appendix B. (Besides the inconvenience that the domain under consideration is unbounded, the proof is classical.) In the statement of Proposition 4.2, we assume a certain uniform Lipschitz continuity property in the variable h. As will be clear from the proof, this assumption can be weakened, and possibly removed. This assumption is meant to allow for a simpler proof, and does not cause additional difficulties elsewhere, since it is very easy to check that our candidate solutions satisfy it.
We are now ready to prove Theorem 2.1.
Proof of Theorem 2.1. By Lemma 3.2, it suffices to study the convergence of F̄_N as N tends to infinity. Recall that F̄_N(0, h) = ψ(h) does not depend on N. Moreover, it is clear from (3.4) and (3.8) that F̄_N is uniformly Lipschitz in both variables. Hence, by the Arzelà-Ascoli theorem, the sequence (F̄_N) is precompact for the topology of local uniform convergence. Let f be such that F̄_N converges to f locally uniformly as N tends to infinity along a subsequence.
For notational convenience, we will not keep track of the particular subsequence along which this convergence holds. Our goal is to show that f is a viscosity solution of (2.6). By the comparison principle (Proposition 4.2), this identifies f uniquely, and thus proves the theorem. We decompose the rest of the proof into six steps.
Step 1. We show that f is a viscosity supersolution of (2.6). It is easy to show that in the definition of viscosity supersolution, replacing the phrase "local minimum" by "strict local minimum" yields an equivalent definition. Let (t, h) ∈ (0, ∞) × [0, ∞) and φ ∈ C^∞((0, ∞) × [0, ∞)) be such that f − φ has a strict local minimum at the point (t, h). Since F̄_N converges to f locally uniformly, there exists a sequence (t_N, h_N) ∈ (0, ∞) × [0, ∞) converging to (t, h) as N tends to infinity and such that F̄_N − φ has a local minimum at (t_N, h_N). If h_N > 0 infinitely often, then along a subsequence on which this property holds, we have that the first derivatives of F̄_N and φ at (t_N, h_N) coincide, and thus by Proposition 2.2 that
( ∂_t φ − 2 (∂_h φ)² )(t_N, h_N) ⩾ 0.
By continuity, this implies that
( ∂_t φ − 2 (∂_h φ)² )(t, h) ⩾ 0,
as desired. It remains to consider the case when h_N = 0 infinitely often. In this case, we must have h = 0. We can also assert that
(4.2) ∂_h φ(t_N, 0) ⩽ ∂_h F̄_N(t_N, 0) and ∂_t φ(t_N, 0) = ∂_t F̄_N(t_N, 0),
and we recall that, by Proposition 2.2,
(4.3) ∂_t F̄_N − 2 (∂_h F̄_N)² ⩾ 0.
If −∂_h φ(t, h) ⩾ 0, then there is nothing to show. Otherwise, using the first statement in (4.2), we find that
0 < ∂_h φ(t_N, 0) ⩽ ∂_h F̄_N(t_N, 0),
and thus, using also the second statement in (4.2) and (4.3),
∂_t φ(t_N, 0) = ∂_t F̄_N(t_N, 0) ⩾ 2 ( ∂_h F̄_N(t_N, 0) )² ⩾ 2 ( ∂_h φ(t_N, 0) )².
Letting N tend to infinity, we obtain ( ∂_t φ − 2 (∂_h φ)² )(t, 0) ⩾ 0 in this case as well. This completes the proof of the fact that f is a supersolution.
Step 2. We next show that f is a subsolution of (2.6). In this step, we focus on contact points of the form (t, 0); that is, we give ourselves t > 0 and φ ∈ C^∞((0, ∞) × [0, ∞)) such that f − φ has a strict local maximum at the point (t, 0). In this case, there exists a sequence (t_N, h_N) ∈ (0, ∞) × [0, ∞) converging to (t, 0) and such that F̄_N − φ has a local maximum at (t_N, h_N). If h_N = 0, then we must have that
∂_h F̄_N(t_N, h_N) ⩽ ∂_h φ(t_N, h_N).
This inequality still holds, and is in fact an equality, if h_N > 0. In view of (2.7), we thus deduce that −∂_h φ(t_N, h_N) ⩽ 0. Letting N tend to infinity, we obtain that −∂_h φ(t, 0) ⩽ 0, as desired.
Step 3. We now consider the remaining possible contact points. Let t, h > 0 and φ ∈ C^∞((0, ∞) × [0, ∞)) be such that f − φ has a local maximum at the point (t, h). For the remainder of this proof, we allow the value of the constant C < ∞ to change from place to place, and to depend on t, h, f and φ, without further notice. For convenience, we introduce the modified test function
φ̃(t′, h′) ∶= φ(t′, h′) + (t′ − t)² + (h′ − h)².
We clearly have that f − φ̃ has a strict local maximum at (t, h), and that the first derivatives of φ and φ̃ coincide at (t, h). Since f − φ̃ has a strict local maximum at (t, h), we infer that for N sufficiently large, the function F̄_N − φ̃ has a local maximum at a point (t_N, h_N), with (t_N, h_N) converging to (t, h) as N tends to infinity. The point of replacing φ by φ̃ was precisely to obtain such an explicit quadratic behavior near the contact point. We next wish to use Proposition 2.2 to conclude. However, since the concentration result in Lemma 3.2 applies to F_N − F̄_N rather than its derivatives in h, we will want to take a small local average in the h variable to control the term involving ∂_h(F_N − F̄_N). In preparation for this, we show in this step that there exists a constant C < ∞ such that for every h′ ⩾ 0 in a neighborhood of h_N, the bound (4.7) holds, comparing F̄_N(t_N, h′) with its first-order expansion around h_N up to an error of C(h′ − h_N)². We start by writing Taylor's formula for F̄_N(t_N, ·) around h_N, with the second-order remainder in integral form. The same identity also holds with F̄_N replaced by φ̃. Since F̄_N − φ̃ has a local maximum at (t_N, h_N), and in view of (4.6), we can compare the two expansions for h′ close to h_N. Moreover, the integral remainder for φ̃ is bounded by C(h′ − h_N)², since φ is assumed to be smooth. By (3.18), we also have that ∂_h² F̄_N ⩾ −C, and thus the remainder for F̄_N is bounded below by −C(h′ − h_N)². Inequality (4.7) then follows using (4.8) once more.
Step 4. We fix a sequence (4.4) of parameters δ_N > 0 tending to 0, chosen in terms of the rate of the local uniform convergence of F̄_N to f, and set
G_N(t, h) ∶= δ_N^{−1} ∫_h^{h+δ_N} F_N(t, h′) dh′.
It is clear that the function G_N converges to f locally uniformly as N tends to infinity. Hence, there exists a sequence t′_N, h′_N > 0 such that for every N sufficiently large, the function G_N − φ̃ has a local maximum at (t′_N, h′_N). Repeating the argument of the previous step, we also obtain the analogue of (4.7) at this point; this is (4.9). In the next two steps, we will show the estimates (4.13) and (4.14), which control local averages of the derivatives of F_N − F̄_N. For now, we assume that these estimates hold and show how to conclude. Using the fact that ∂_h F̄_N is bounded, Jensen's inequality, and (4.13), we obtain (4.15), comparing the nonlinear term evaluated at the local average with the local average of the nonlinear term. Combining with (4.12), using the estimate above and (4.14), we get an approximate differential inequality at (t′_N, h′_N). Appealing to (4.10)-(4.11), passing to the limit N → ∞ and recalling that the first derivatives of φ and φ̃ coincide at (t, h), we conclude that
(4.16) ( ∂_t φ − 2 (∂_h φ)² )(t, h) ⩽ 0,
as desired.
Step 5. In order to complete the proof, it remains to show (4.13) and (4.14). In this step, we prove (4.14). The argument relies on the fact that, by integration by parts, we have for any smooth function g an identity, recorded as (4.17), expressing the local average of the derivative of g over an interval of length δ_N in terms of boundary values and a weighted average of g. Applying (4.17) with g = F_N − F̄_N, using that ∂_h F̄_N is bounded and the Cauchy-Schwarz inequality, we get that the left side of (4.14) is bounded by a product of two terms. By Lemma 3.2, the first term in this product is bounded by CN^{−1/6}. For the second term, we use (3.18) to observe that, for the constant C = C_0 identified there, the quantity ∂_h² F̄_N + C_0 is non-negative, and thus, using again that ∂_h F̄_N is bounded, we conclude that the second term is bounded as well. Since δ_N does not tend to 0 too rapidly with N, this completes the proof of (4.14).
Step 6. We now prove (4.13). Observe that the quantity to be estimated can be rewritten in terms of local averages in the h variable. We use (4.17) with g replaced by a suitable function built from F_N − F̄_N. Using also (3.18) and (3.15), we obtain a control of the corresponding spatial averages. Since F_N is Lipschitz continuous in the variable t, we also have a matching control of the time increments. The estimate (4.13) then follows using (4.5), (4.9) and (4.7).
We conclude this section with some remarks on variational representations for the function f appearing in Theorem 2.1. Solutions to Hamilton-Jacobi equations of the form ∂_t f − H(∂_h f) = 0 with convex (resp. concave) H have a variational representation given by the Hopf-Lax formula, in which the convex (resp. concave) dual of H appears (see e.g. [12, Theorem 10.3.4.3]). It is usually under this variational representation that the limit free energy of mean-field statistical mechanics models is identified. In our case, the function H is simply p ↦ 2p², whose convex dual is q ↦ q²/8.
Proposition 4.3 (Hopf-Lax formula). For every t ⩾ 0 and h ⩾ 0, we set
f(t, h) ∶= sup_{h′⩾0} ( ψ(h + h′) − h′²/(8t) ),
with the understanding that f(0, h) = ψ(h). The function f is the unique viscosity solution of (2.6) that satisfies f(0, h) = ψ(h) and is globally Lipschitz continuous in the variable h.
For completeness, we provide a proof of this classical result in Appendix B. Denoting H(p) ∶= 2p² and H*(q) ∶= q²/8, we can rewrite the formula of Proposition 4.3 in terms of H and H*, which yields equivalent expressions for f that may be of interest. We stress that the proof of Theorem 2.1 does not require that f be identified by such a variational representation. Moreover, the analysis of f itself does not necessarily require explicit usage of this formula. For instance, if one wants to observe that ∂_h f(t, 0) = 0 for small values of t ⩾ 0, which at least on a heuristic level corresponds to a regime where there is no correlation between x and x̄, see (3.8), then we may proceed as follows. First, we check that there exists a constant C < ∞ such that for every h ⩾ 0, we have ψ(h) ⩽ Ch². (See (B.17) for a first step.) We next observe that the function
(t, h) ↦ C h² / (1 − 8Ct)
is a supersolution of (2.6) on (0, (8C)^{−1}) × [0, ∞), and thus, by the comparison principle, the solution f to (2.6) remains below this supersolution. Since the null function is a subsolution, we deduce that ∂_h f(t, 0) = 0 for every t < (8C)^{−1}.
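The Hopf-Lax formula can be explored numerically. The sketch below evaluates f(t, h) = sup_{h′} ( ψ₀(h′) − (h′ − h)²/(8t) ), an equivalent form of the formula in Proposition 4.3 for nondecreasing initial conditions, with a hypothetical Lipschitz stand-in ψ₀(h) = log cosh(h) in place of the model's ψ, and checks the semigroup property f(t + s, ·) = (Hopf-Lax at time t applied to f(s, ·)) that characterizes the solution operator.

```python
import numpy as np

def hopf_lax(init_vals, grid, t):
    # f(t, h) = max over grid points h' of [ init(h') - (h' - h)^2 / (8 t) ].
    diff = grid[:, None] - grid[None, :]          # h' - h
    return np.max(init_vals[:, None] - diff**2 / (8.0 * t), axis=0)

grid = np.linspace(0.0, 12.0, 1201)
psi0 = np.log(np.cosh(grid))                      # hypothetical stand-in for psi

f_one_step = hopf_lax(psi0, grid, 1.0)            # evolve to time 1 directly
f_two_steps = hopf_lax(hopf_lax(psi0, grid, 0.5), grid, 0.5)

# Compare away from the right edge of the grid: since psi0 is nondecreasing and
# 1-Lipschitz, the maximizer satisfies h <= h' <= h + 4t, so truncating the h'
# range does not affect the sup for h <= 6.
err = float(np.max(np.abs(f_one_step - f_two_steps)[:601]))
print(err)
```

The two evolutions agree up to grid-discretization error, illustrating that the Hopf-Lax variational formula composes like a flow, as the viscosity solution must.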

Extension to tensors
We now explain how to adapt the method to tensors of arbitrary order. In this setting, the result was obtained in [16,3]. One motivation for exploring this generalization is that some methods, such as that used in [11], do not seem to generalize well to tensors of odd order.
We fix an integer p ⩾ 1. Generalizing the previous setting, we consider the problem of estimating the vector x̄ = (x̄_1, . . ., x̄_N) ∈ R^N given the observation of
√t N^{−(p−1)/2} x̄^{⊗p} + W,
where W = (W_{i_1 ⋯ i_p})_{1⩽i_1,...,i_p⩽N} is now a tensor of order p made of independent standard Gaussian random variables, independent of the vector x̄, and where for any x ∈ R^N, we denote by x^{⊗p} the tensor of order p such that, for every i_1, . . ., i_p ∈ {1, . . ., N},
(x^{⊗p})_{i_1 ⋯ i_p} ∶= x_{i_1} ⋯ x_{i_p}.
We redefine H_N(t, h, x) accordingly, replacing the rank-one observation by the one above, and set F_N and F̄_N to be as in (2.3) and (2.4). The analogue of Theorem 2.1 in the context of tensors reads as follows.
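For concreteness, the tensor power and the order-p observation can be formed with a few lines of numpy (the helper names are ours; p and N are kept small for illustration).

```python
import numpy as np

def tensor_power(x, p):
    # x^{(tensor)p}: entries x_{i1} * ... * x_{ip}.
    out = x
    for _ in range(p - 1):
        out = np.multiply.outer(out, x)
    return out

def observe_tensor(xbar, t, p, rng):
    # sqrt(t) * N^{-(p-1)/2} * xbar^{(tensor)p} + W,
    # with W an order-p tensor of iid standard Gaussians.
    N = len(xbar)
    W = rng.standard_normal((N,) * p)
    return np.sqrt(t) * N ** (-(p - 1) / 2) * tensor_power(xbar, p) + W

rng = np.random.default_rng(0)
x = np.array([1.0, -1.0, 2.0])
T3 = tensor_power(x, 3)
print(T3.shape, T3[0, 1, 2])  # (3, 3, 3) and x[0]*x[1]*x[2] = -2.0
```

With this normalization, each entry of the signal part has squared size of order N^{−(p−1)}, against N^p observations of unit noise variance, which is again the critical scaling.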
Theorem 5.1 (Convergence to HJ). For every M ⩾ 1, we have
lim_{N→∞} sup_{(t,h)∈[0,M]²} E[ |F_N(t, h) − f(t, h)| ] = 0,
where f(t, h) is the viscosity solution of the Hamilton-Jacobi equation
(5.1) ∂_t f − 2^{p−1} (∂_h f)^p = 0 on (0, ∞) × [0, ∞), with f(0, ·) = ψ.
The next proposition is our replacement for Proposition 2.2.

Proposition 5.2 (Approximate HJ in finite volume). There exists C < ∞ such that for every N ⩾ 1 and uniformly over [0, ∞)², the quantity ∂_t F̄_N − 2^{p−1} (∂_h F̄_N)^p is non-negative and bounded above by an error term that vanishes in the large-N limit; and moreover, ∂_h F̄_N ⩾ 0.
Proof of Proposition 5.2. Observe that the derivative ∂_t F̄_N can be computed as in Step 1 of the proof of Proposition 2.2. By Gaussian integration by parts and the Nishimori identity, we deduce that
∂_t F̄_N = (1/2) E ⟨ ( (x ⋅ x̄)/N )^p ⟩.
The expressions (3.7)-(3.8) are still valid (as well as (3.10)). We deduce that
∂_t F̄_N − 2^{p−1} (∂_h F̄_N)^p = (1/2) ( E ⟨ ((x ⋅ x̄)/N)^p ⟩ − ( E ⟨ (x ⋅ x̄)/N ⟩ )^p ).
Using that a^p − b^p = (a − b)(a^{p−1} + ⋯ + b^{p−1}) and the fact that the support of the measure P is bounded, we get that this difference is bounded by a constant times the first absolute moment of (x ⋅ x̄)/N − E⟨(x ⋅ x̄)/N⟩. The arguments in the proof of Proposition 2.2 apply without any modification to show that this fluctuation term vanishes in the large-N limit. Combining the two previous displays yields Proposition 5.2.
Proof of Theorem 5.1. As in the proof of Theorem 2.1, it suffices to show that if f is such that F̄_N converges locally uniformly to f along a subsequence, then f is a viscosity solution of (5.1). Abusing notation, we do not write explicitly the subsequence along which the convergence holds.
Step 1. We show that f is a viscosity subsolution of (5.1). The proof follows Steps 2-6 of the proof of Theorem 2.1 very closely. The first difference is that we use Proposition 5.2 in place of Proposition 2.2 to replace (4.12) by its tensor analogue. (Implicit in this expression is the fact that the quantity under the square root on the right side is non-negative.) The estimates (4.13) and (4.14) still hold and the proofs given there apply without any modification. We deduce as in (4.15) a comparison between the nonlinear term evaluated at the local average and the local average of the nonlinear term. We average the inequality (5.5) over h′ ∈ [h_N, h_N + δ_N], use Jensen's inequality, the estimate above and (4.14) to obtain the desired differential inequality, and then conclude as before that (4.16) holds.
Step 2. We now show that f is a viscosity supersolution of (5.1). Let (t, h) ∈ (0, ∞) × [0, ∞) and φ ∈ C^∞((0, ∞) × [0, ∞)) be such that f − φ has a strict local minimum at the point (t, h). We keep the definition of δ_N as in (4.4) for consistency of notation (although here a simpler choice not depending on the rate of convergence of F̄_N to f would also do), and redefine G_N to be
G_N(t, h) ∶= δ_N^{−1} ∫_{h+δ_N}^{h+2δ_N} F_N(t, h′) dh′.
In this new definition of G_N, we have shifted the interval over which the integral is taken by δ_N to the right, in order to avoid the singularity of the error term in Proposition 5.2 near h = 0. Since G_N converges to f locally uniformly, there exists a sequence (t_N, h_N) ∈ (0, ∞) × [0, ∞) converging to (t, h) as N tends to infinity such that G_N − φ has a local minimum at (t_N, h_N). By Proposition 5.2, for every h′ > 0, we have the pointwise lower bound (5.6) on the approximate Hamilton-Jacobi expression at (t_N, h′).
We also observe that the estimate (4.14) still holds in the present context. Averaging the inequality (5.6) over h′ ∈ [h_N + δ_N, h_N + 2δ_N], using Jensen's inequality and (4.14), we obtain a corresponding lower bound in terms of the derivatives of G_N at (t_N, h_N).
Using Jensen's inequality for the left side of (5.6) is justified since ∂_h F̄_N ⩾ 0 and the map q ↦ 2^{p−1} q^p is convex on [0, ∞). We thus obtain that
(5.7) lim inf_{N→∞} ( ∂_t G_N − 2^{p−1} (∂_h G_N)^p )(t_N, h_N) ⩾ 0.
If h_N > 0 for infinitely many values of N, then the first derivatives of G_N and φ coincide at (t_N, h_N) for these values of N, and we thus deduce from (5.7) that
( ∂_t φ − 2^{p−1} (∂_h φ)^p )(t, h) ⩾ 0.
On the other hand, if h_N = 0 for infinitely many values of N, then we can reproduce the argument of Step 1 of the proof of Theorem 2.1 to conclude. Indeed, in this case, we must have h = 0, and the analogues of (4.2) and (4.3) hold; this is (5.8). If −∂_h φ(t, h) ⩾ 0, then there is nothing to show. Otherwise, using the first statement in (5.8), we have that 0 < ∂_h φ(t_N, 0) ⩽ ∂_h G_N(t_N, 0), and thus, using the second statement in (5.8) and then (5.7), we deduce that
( ∂_t φ − 2^{p−1} (∂_h φ)^p )(t, 0) ⩾ 0,
thereby completing the proof.

Appendix A. Nishimori identity
We verify the Nishimori identity stated in (3.2) and (3.3). We redefine the variable Y to be the pair
Y = (Y^{(1)}, Y^{(2)}) ∶= ( √(t/N) x̄ x̄^⊺ + W, √h x̄ + z ).
For all bounded measurable functions f and g, we can write the quantity E[f(x̄) g(Y)], up to a normalization constant that depends neither on f nor on g, as an integral over the variables x̄, W and z against the product of the prior P_N and Gaussian densities, with the shorthand notation dW ∶= ∏_{i,j} dW_ij and dz ∶= ∏_i dz_i. A change of variables, in which we integrate over the observations Y^{(1)}, Y^{(2)} instead of over the noises W and z, leads to an expression of the same quantity as an integral of f(x) g(Y^{(1)}, Y^{(2)}) against an explicit exponential density. Denoting the exponential factor above by E(x, Y), we thus obtain that the law of Y is the law with density given, up to a normalization constant, by ∫ E(x, Y) dP_N(x), and that, conditionally on Y, the law of the signal is the Gibbs measure associated with E(·, Y); the Nishimori identities follow.
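The two-replica identity (3.2) can be checked by direct simulation. The sketch below (helper names are ours) uses a Rademacher prior and small N, computes the posterior by exact enumeration over the 2^N configurations, and compares E⟨x ⋅ x′⟩ = E[∥⟨x⟩∥²] with E⟨x ⋅ x̄⟩; up to Monte Carlo error over the realizations of (x̄, W), the two averages coincide.

```python
import itertools
import numpy as np

def posterior_mean(Y, t, configs):
    # Posterior over x in {-1,1}^N given Y = sqrt(t/N) xbar xbar^T + W:
    # weights proportional to exp( sqrt(t/N) x^T Y x ); the term
    # (t/2N) x_i^2 x_j^2 is constant for Rademacher x and drops out.
    N = configs.shape[1]
    energies = np.sqrt(t / N) * np.einsum('ci,ij,cj->c', configs, Y, configs)
    w = np.exp(energies - energies.max())
    w /= w.sum()
    return w @ configs          # the posterior mean vector <x>

rng = np.random.default_rng(0)
N, t, samples = 4, 1.5, 4000
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
lhs = rhs = 0.0
for _ in range(samples):
    xbar = rng.choice([-1.0, 1.0], size=N)
    Y = np.sqrt(t / N) * np.outer(xbar, xbar) + rng.standard_normal((N, N))
    m = posterior_mean(Y, t, configs)
    lhs += m @ m       # <x . x'> = |<x>|^2 for independent replicas
    rhs += m @ xbar    # <x . xbar>
print(lhs / samples, rhs / samples)  # approximately equal
```

The identity holds because the posterior used in the Gibbs average is the true conditional law of the signal given Y; it would fail for a mismatched prior or temperature, which is precisely what confines inference problems to the replica-symmetric phase.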

Appendix B. Classical results on viscosity solutions
In this appendix, for the reader's convenience, we prove the comparison principle (Proposition 4.2) and the Hopf-Lax formula (Proposition 4.3) for solutions of the Hamilton-Jacobi equation (2.6).Classical references for such results include [12,9].
Proof of Proposition 4.2. We argue by contradiction, assuming instead that
(B.1) sup_{[0,∞)²} (u − v) > sup_{{0}×[0,∞)} (u − v).
The argument rests on the idea of "doubling the variables" and considering the maximization of functions of the form
(B.2) (t, h, t′, h′) ↦ u(t, h) − v(t′, h′) − α^{−1} ( (t − t′)² + (h − h′)² ),
where α > 0 is a parameter that is ultimately sent to 0. We will decompose this argument into four steps. In Step 1, we modify the functions u and v slightly so that they become strict sub- and supersolutions respectively. In Step 2, we modify the function u further to ensure that the maximum of the function in (B.2) is achieved at a point that remains in a bounded set as α → 0. In a preliminary Step 0, we build a convenient special function for this purpose. The conclusion is then derived in Step 3.
Step 1. We show that, without loss of generality, we can assume that there exists ε > 0 such that u is a viscosity solution of
(B.7) ∂_t u − 2 (∂_h u)² ⩽ −ε in (0, ∞) × (0, ∞), min( −∂_h u, ∂_t u − 2 (∂_h u)² ) ⩽ −ε on (0, ∞) × {0}.
Indeed, since we assume that u is a subsolution, it suffices to subtract from u a suitable function increasing in t, and then to select ε > 0 sufficiently small that the property (B.1) still holds.
Similarly, we can assume that the function v is a viscosity solution of
(B.8) ∂_t v − 2 (∂_h v)² ⩾ ε in (0, ∞) × (0, ∞), max( −∂_h v, ∂_t v − 2 (∂_h v)² ) ⩾ ε on (0, ∞) × {0}.
Note that these modifications preserve the fact that u and v are uniformly Lipschitz continuous in h.
Step 2. We now "localize" the function u, in the sense that we make sure that the function becomes very negative as we move away from a bounded set.
To start with, we can replace u by u − ε/(T − t) for some ε > 0 and T ∈ (0, ∞). This preserves the fact that u solves (B.7) on (0, T) × [0, ∞), and for ε > 0 sufficiently small and T sufficiently large, it also preserves the property (B.1) in the sense that
(B.9) sup_{[0,T)×[0,∞)} (u − v) > sup_{{0}×[0,∞)} (u − v).
This modification of u also ensures that for every H > 0, the function u(t, h) tends to −∞ as t tends to T, uniformly over h ⩽ H; this is (B.10). Note that we still have that u is bounded above and uniformly Lipschitz continuous in h, and thus there exists a constant C < ∞ such that
(B.11) u(t, h) ⩽ C (1 + h).
Our final modification is to replace u by u_δ ∶= u − Φ_δ for the function Φ_δ defined in Step 0. It is clear from (B.3) that for δ > 0 sufficiently small, the properties (B.9) and (B.10) still hold with u replaced by u_δ. It is also clear from (B.4) and (B.11) that for every δ > 0 sufficiently small, the function u_δ becomes very negative away from a bounded set. It remains to verify that u_δ is still a solution of (B.7), possibly after replacing ε by ε/2. Formally, the verification of the boundary condition on [0, T) × {0} is immediate from (B.3), while in the interior we have
∂_t u_δ − 2 (∂_h u_δ)² = ∂_t u − 2 (∂_h u)² + 4 ∂_h u ∂_h Φ_δ − 2 (∂_h Φ_δ)²,
and since ∂_h u is bounded, it follows from (B.5) that for every δ > 0 sufficiently small, this quantity is at most −ε/2. This formal calculation is easily made rigorous using test functions.
Finally, we observe that as a consequence of (B.12) and the Lipschitz continuity of v in the h variable, there exists a constant C < ∞ such that for every t ∈ [0, T), h, t′, h′ ⩾ 0, we have (B.13). We select δ > 0 sufficiently small that (B.14) holds for t, h, t′, h′ as above.
Step 3. Summarizing the result of the previous steps, we have shown that, without loss of generality, we can assume that v solves (B.8) on (0, ∞) × [0, ∞), that u solves (B.7) on (0, T) × [0, ∞) and satisfies (B.10), that (B.13) and (B.14) hold for every t ∈ [0, T) and h, t′, h′ ⩾ 0, and that (B.9) holds.
We define, for every α ∈ (0, 1] and t, h, t′, h′ ⩾ 0, the function Ψ_α as in (B.2). We claim that the maximum of Ψ_α is achieved at some point (t_α, h_α, t′_α, h′_α), and that this point remains in a bounded set as α is sent to 0. For fixed α > 0, consider a sequence of approximate maximizers of Ψ_α, denoted by (t_{n,α}, h_{n,α}, t′_{n,α}, h′_{n,α}). We deduce from (B.13) that, for some C < ∞, the values of Ψ_α along this sequence are bounded by C, and in particular the penalization term is bounded as well. If h_{n,α} < 2δ⁻², then we can obtain a uniform upper bound on h′_{n,α}, and thus also on t_{n,α} − t′_{n,α}. Otherwise, we can first obtain a uniform upper bound on h_{n,α} − h′_{n,α}, and then deduce an upper bound on h_{n,α}, h′_{n,α} and t_{n,α} − t′_{n,α}. Finally, we can use (B.10) to conclude that the maximizer of Ψ_α exists and remains in a bounded set as α tends to 0.
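The boundedness of the penalization term follows the usual pattern; as a sketch, and assuming the quadratic penalization form for (B.2), comparing the value of Ψ_α at an approximate maximizer with its value at a diagonal point gives:

```latex
% Sketch: the penalization is bounded uniformly in n and \alpha, using the
% upper bound on u - v from (B.13) together with a lower bound on \sup \Psi_\alpha
% obtained from any diagonal point (t,h,t,h). The form of (B.2) is assumed.
\frac{1}{\alpha}\Big( (t_{n,\alpha}-t'_{n,\alpha})^2 + (h_{n,\alpha}-h'_{n,\alpha})^2 \Big)
\;=\; u(t_{n,\alpha},h_{n,\alpha}) - v(t'_{n,\alpha},h'_{n,\alpha})
- \Psi_\alpha(t_{n,\alpha},h_{n,\alpha},t'_{n,\alpha},h'_{n,\alpha})
\;\le\; C .
```

In particular, |t_{n,α} − t′_{n,α}| + |h_{n,α} − h′_{n,α}| ⩽ C√α, so the doubled variables coalesce as α → 0.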
Since u − v remains bounded from above over this bounded set (see (B.13)), there exists a constant C < ∞ such that for every α > 0 sufficiently small, we have (B.15). After extracting a subsequence if necessary, we may assume that t_α, t′_α → t_0 and h_α, h′_α → h_0 as α → 0. Using again (B.10), it is clear that t_0 < T. Since we also have that the penalization term tends to 0, and the functions u and v are continuous at (t_0, h_0), we deduce, using (B.14) twice, that (B.16) holds. In view of (B.1) and the second equality in (B.15), we have that t_0 > 0, and thus, for every α > 0 sufficiently small, we have that t_α > 0 and t′_α > 0. Note that by definition, the function (t, h) ↦ Ψ_α(t, h, t′_α, h′_α) has a local maximum at (t, h) = (t_α, h_α). If h_α = 0, then the definition of viscosity solutions implies that, for every α > 0 sufficiently small, the boundary condition in (B.7), in its min form, holds at (t_α, h_α). Since in this case (h_α = 0) the first term in the minimum is nonnegative, we deduce that the interior inequality in (B.7) holds at (t_α, h_α). As can be seen directly, this conclusion also holds when h_α > 0. Similarly, we infer from the fact that the function (t′, h′) ↦ Ψ_α(t_α, h_α, t′, h′) has a local maximum at (t′, h′) = (t′_α, h′_α) the corresponding supersolution inequality for v. This is in contradiction with (B.16), and thus the proof is complete.
We now turn to the proof of Proposition 4.3.
Proof of Proposition 4.3. We decompose the proof into five steps.
Step 1. We show that the function ψ is uniformly Lipschitz. This function is clearly differentiable at every h > 0, and its derivative can be expressed as a Gibbs average, where here the notation ⟨⋅⟩ takes a simplified form.
Since we assume that the support of the measure P is bounded, this completes the proof that ψ ′ is uniformly bounded.
Step 2. For convenience, we extend ψ to be constant, equal to ψ(0), on (−∞, 0], so that for every t, h ⩾ 0, (B.17) holds. For every h, h_1 ⩾ 0, we have ψ(h + h′) − (h′)²/(8t) ⩽ ψ(h_1 + h′) − (h′)²/(8t) + C|h − h_1|, and thus, taking the supremum over h′ on both sides, f(t, h) ⩽ f(t, h_1) + C|h − h_1|. Since the roles of h and h_1 are symmetric, this shows that f is uniformly Lipschitz in the h variable.
Step 3. As a preparation for the proof that f is a viscosity solution of (2.6), we prove the dynamic programming principle, namely, that for every t, s, h ⩾ 0,
(B.18) f(t + s, h) = sup_{h′} ( f(t, h + h′) − (h′)²/(8s) ).
By convexity of the square function, we have, for every t, s > 0 and a, b ∈ R, (a + b)²/(t + s) ⩽ a²/t + b²/s, with equality when a/t = b/s, and (B.18) follows by splitting the increment accordingly.
Step 4. We show that f is a viscosity supersolution of (2.6). Let (t, h) ∈ (0, ∞) × [0, ∞) and φ ∈ C∞((0, ∞) × [0, ∞)) be such that (t, h) is a local minimum of f − φ. We start by assuming that h > 0. By (B.18), we have that for every p ∈ R and s > 0 sufficiently small, f(t, h) ⩾ f(t − s, h − sp) − sp²/8.
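The quadratic penalties appearing in Steps 3–5 are consistent with the Hopf–Lax formula for the Hamiltonian H(p) = 2p²; the following computation, under the assumption that (2.6) reads ∂_t f − 2(∂_h f)² = 0, records the Legendre transform behind the factor 1/8:

```latex
% Legendre transform of H(p) = 2p^2:
L(q) \;=\; \sup_{p\in\mathbb R}\big( pq - 2p^2 \big) \;=\; \frac{q^2}{8},
\qquad \text{attained at } p = \frac{q}{4},
% so the Hopf-Lax representation takes the form
f(t,h) \;=\; \sup_{h'}\Big( \psi(h+h') - t\,L\big(h'/t\big) \Big)
\;=\; \sup_{h'}\Big( \psi(h+h') - \frac{(h')^2}{8t} \Big).
```

This matches the penalty sp²/8 used in Step 4, corresponding to the choice h′ = −sp over a time increment s.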
Since f − φ has a local minimum at (t, h), we have that for every p ∈ R and s > 0 sufficiently small, f (t − s, h − sp) − φ(t − s, h − sp) ⩾ f (t, h) − φ(t, h).
Combining these two inequalities and passing to the limit s → 0, we deduce that for every p ∈ R,
∂_t φ(t, h) + p ∂_h φ(t, h) + p²/8 ⩾ 0.
If −∂_h φ(t, h) ⩾ 0, then there is nothing to show. Otherwise, we can choose p = −4∂_h φ(t, h) and conclude that (B.20) holds.
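The choice p = −4∂_h φ(t, h) can be checked by direct substitution; a short computation, under the assumption that the supersolution inequality (B.20) reads ∂_t φ − 2(∂_h φ)² ⩾ 0 at (t, h):

```latex
% Substituting p = -4 \partial_h \varphi into \partial_t\varphi + p\,\partial_h\varphi + p^2/8 \ge 0:
\partial_t\varphi + \big({-4\,\partial_h\varphi}\big)\,\partial_h\varphi + \frac{(-4\,\partial_h\varphi)^2}{8}
\;=\; \partial_t\varphi - 4(\partial_h\varphi)^2 + 2(\partial_h\varphi)^2
\;=\; \partial_t\varphi - 2(\partial_h\varphi)^2 \;\ge\; 0 .
```

This p is exactly the minimizer of p ↦ p ∂_h φ + p²/8, so the inequality obtained for every p yields the sharpest possible conclusion.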
Step 5. We show that f is a viscosity subsolution of (2.6). Let (t, h) ∈ (0, ∞) × [0, ∞) and φ ∈ C∞((0, ∞) × [0, ∞)) be such that (t, h) is a local maximum of f − φ. In view of (B.18) and of the fact that f is uniformly Lipschitz in the h variable, it is clear that for each s > 0, there exists h′_s ∈ R achieving the supremum in (B.18), and moreover, we have that h′_s → 0 as s → 0. Since f − φ has a local maximum at (t, h), we have that (B.23) holds for every s > 0 sufficiently small. For s > 0 sufficiently small, we can choose a = h′_s in (B.23), and reach a contradiction with (B.21). This shows (B.22) in the case h > 0.
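For orientation, here is a sketch of how the contradiction arises, assuming that (B.21) is the negation of the subsolution inequality, ∂_t φ(t, h) − 2(∂_h φ(t, h))² > 0, and that h′_s achieves the supremum in the dynamic programming principle with penalty (h′_s)²/(8s):

```latex
% From f(t,h) = f(t-s, h+h'_s) - (h'_s)^2/(8s) and the local maximality of f - \varphi:
\varphi(t,h) - \varphi(t-s,\,h+h'_s) \;\le\; f(t,h) - f(t-s,\,h+h'_s) \;=\; -\,\frac{(h'_s)^2}{8s} .
% Dividing by s and writing q = h'_s / s, a Taylor expansion gives, up to o(1):
\partial_t\varphi - q\,\partial_h\varphi + \frac{q^2}{8} \;\le\; 0 ,
% while for every q,
q\,\partial_h\varphi - \frac{q^2}{8}
\;\le\; \sup_{q'\in\mathbb R}\Big( q'\,\partial_h\varphi - \frac{(q')^2}{8} \Big)
\;=\; 2(\partial_h\varphi)^2 ,
% so that \partial_t\varphi \le 2(\partial_h\varphi)^2, contradicting (B.21).
```

The supremum over q′ is the same Legendre-transform computation as in Step 4, now used in the opposite direction.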
When h = 0, our starting point has to be modified from (B.23) to the statement that the analogous inequality holds for every s ∈ [−δ, δ] and a ∈ [−δ, 0]. We can then reproduce the argument above and arrive at a contradiction.