Disorder and denaturation transition in the generalized Poland-Scheraga model

We investigate the generalized Poland-Scheraga model, which is used in the biophysical literature to model the DNA denaturation transition, in the case where the two strands are allowed to be non-complementary (and to have different lengths). The homogeneous model was recently studied from a mathematical point of view in Giacomin, Khatib (Stoch. Proc. Appl., 2017), via a $2$-dimensional renewal approach, with a loop exponent $2+\alpha$ (${\alpha>0}$): it was found to undergo a localization/delocalization phase transition of order $\nu = \min(1,\alpha)^{-1}$, together with -- in general -- other phase transitions. In this paper, we turn to the disordered model, and we address the question of the influence of disorder on the denaturation phase transition, that is, whether adding an arbitrarily small amount of disorder (i.e. inhomogeneities) affects the critical properties of this transition. Our results are consistent with Harris' predictions for $d$-dimensional disordered systems (here $d=2$). First, we prove that when $\alpha<1$ (i.e. $\nu>d/2$), disorder is irrelevant: the quenched and annealed critical points are equal, and the disordered denaturation phase transition is also of order $\nu=\alpha^{-1}$. On the other hand, when $\alpha>1$, disorder is relevant: we prove that the quenched and annealed critical points differ. Moreover, we discuss a number of open problems, in particular the smoothing phenomenon that is expected to enter the game when disorder is relevant.


Introduction of the model and results
The analysis of the DNA denaturation phenomenon, i.e. the unbinding at high temperature of two strands of DNA, has led to the proposal of a very elementary model, the Poland-Scheraga (PS) model [52], which turns out to be relevant not only at a conceptual and qualitative level [29,32], but also at a quantitative level [17,18]. This model can naturally embody the inhomogeneous character of the DNA polymer, which is a sequence of monomers of four different types (A, T, G and C). The binding energy for A-T pairs is different from the binding energy for G-C pairs. The quantitative analysis is then based on finite length chains with a given sequence of pairs, but in order to analyse general properties of inhomogeneous chains bio-physicists have focused on the case in which the base sequence is the realization of a sequence of random variables, which is often referred to as disorder in statistical mechanics. The PS model is limited to the case in which the two strands are of equal length and the n-th base of one strand can only bind with the n-th base of the other strand: it does not allow mismatches or, more generally, asymmetric loops, see Fig. 1a. A less elementary model, the generalized Poland-Scheraga model (gPS) [35], allows asymmetric loops, and strands of different lengths are allowed too, see Fig. 1b.
A remarkable feature of the non-disordered PS model (this corresponds to the case in which all the bases are the same: for example a strand AAA... and a second strand TTT...) is its solvable character. Notably, one can show that the model has a denaturation transition in the limit of infinite strand length, and one can identify the critical point (the critical temperature) and the critical behavior, i.e. the nature of the singularity of the free energy at the critical value.

Figure 1: In the PS model (Fig. 1a) the two strands have the same number of bases and loops are symmetric (there are 5 loops of length 1, 1 loop of length 3 and 1 loop of length 5). Fig. 1b represents the generalized PS model: the two strands may have a different number of bases (22 for the 'top' one, and 16 for the 'bottom' one), and loops are allowed to be asymmetric; a loop can be encoded by two numbers (n, m), where n is its length in the 'top' strand and m its length in the 'bottom' strand (the loops in Fig. 1b are, from left to right, (1, 1), (1, 1), (13, 5), (1, 1), (1, 1), (3, 5), (1, 1)).

Somewhat surprisingly, the gPS model is also exactly solvable, in spite of the fact that it is considerably more complex than the PS model. This was first pointed out in [30,31,49], and a mathematical treatment can be found in [35]. Let us stress that the higher complexity of the gPS model is however reflected in a richer behavior. Notably, in the gPS model other phase transitions exist, beyond the denaturation transition. Another relevant remark is that the PS and gPS models contain a parameter (the loop exponent) which, from a mathematical or theoretical physics perspective, can be chosen arbitrarily and on which the critical behavior depends. In fact, in this class of models the critical exponent depends on this parameter, and arbitrary critical exponents can be observed by tuning the loop exponent.
Stepping to the disordered model is not (at all) straightforward. One way to attack the problem is to look at it as a stability issue: is the transition (we will focus on the denaturation one) still present in the model if we introduce some disorder, even a small amount? And, if it is, what is the new critical value, and is the critical behavior the same as without disorder? We refer to [32, Ch. 5] for an outline of this very important general issue in statistical mechanics and of the renormalization group ideas that lead to the so-called Harris criterion of disorder irrelevance. We speak of disorder relevance when the disorder, irrespective of its strength, makes the critical behavior of the model different from that of the non-disordered model. Disorder is instead irrelevant if the two critical behaviors coincide for small disorder strength. In the relevant (resp. irrelevant) case one can argue that applying a coarse graining procedure makes the disorder stronger (resp. weaker). Harris' idea is that disorder (ir)relevance can be read off the critical exponent of the non-disordered model.
More precisely, the Harris criterion says that, if ν denotes the correlation length exponent of the non-disordered system and d the dimension, then ν > 2/d implies disorder irrelevance, at least if the disorder is not too strong. One also expects disorder relevance if ν < 2/d. The case ν = 2/d is dubbed marginal, and deciding whether disorder is relevant or not is usually a delicate issue, even leaving aside mathematical rigor. The PS and gPS models, with their wide spectrum of critical behaviors, therefore become an ideal framework for testing the validity of the physical predictions. In fact, the mathematical activity on the PS model (which is one-dimensional) has been very successful. Results include:
• A very complete understanding of the PS model when disorder is irrelevant [1,34,47,55];
• Precise estimates on the disorder-induced shift of the critical point (with respect to the annealed model) in the relevant disorder case [3,26], and a proof of the fact that disorder does change the critical exponent [39,24] (without determining the new one: this is an open problem also in the physical literature, even if consensus is starting to emerge that pinning models in the relevant disorder regime should display a very smooth localization transition, see [27,6] and references therein);
• The determination of whether or not there is a disorder-induced critical point shift in the marginal case, together with precise estimates of this shift: this issue was controversial in the physical literature [10,37]. In the absence of a critical point shift, the critical exponent has also been shown to be unchanged by the noise. Showing that disorder does change the critical behavior when there is a critical point shift at marginality is an open issue, and determining the critical behavior in the presence of disorder does not appear to be easier than attacking the same issue in the relevant case [27].
Our aim is to analyze the disordered gPS model and to understand the effect of disorder on the denaturation transition for this generalized, 2-dimensional, model.
The gPS model is built on a bivariate renewal process τ = {τ_n}_{n≥0} = {(τ_n^{(1)}, τ_n^{(2)})}_{n≥0}: the τ_n^{(1)}-th monomer of the first strand is attached to the τ_n^{(2)}-th monomer of the second strand. Put differently, the n-th loop in the double strand is encoded by (τ_n − τ_{n−1}), see Figure 1a and its caption. We refer to [35] for further details.
Let ω := {ω_{n,m}}_{n,m∈N} be a sequence of IID centered random variables (the disorder), taking values in R, with law denoted by P. We assume that the variables ω_{n,m} are centered, have unit variance and exponential moments of all orders, and we set Q(β) := E[e^{βω_{1,1}}] for β > 0. This choice of disorder is discussed in detail in Section 1.3. Given β > 0, h ∈ R (the pinning parameter) and N, M ∈ N, we define P^{β,h,ω}_{N,M} as the measure whose Radon-Nikodym derivative w.r.t. P is given by

dP^{β,h,ω}_{N,M}/dP (τ) = (1/Z^{β,h}_{N,M,ω}) exp( Σ_{(n,m)∈τ∩((0,N]×(0,M])} (βω_{n,m} + h) ) 1_{(N,M)∈τ},

where Z^{β,h}_{N,M,ω} is the constrained partition function (the normalization constant)

Z^{β,h}_{N,M,ω} = E[ exp( Σ_{(n,m)∈τ∩((0,N]×(0,M])} (βω_{n,m} + h) ) 1_{(N,M)∈τ} ].

This corresponds to giving a reward βω_{n,m} + h (or a penalty if it is negative) if the n-th monomer of the first strand and the m-th monomer of the second strand meet. Note that the presence of 1_{(N,M)∈τ} in the right-hand side means that we are considering trajectories that are pinned at the endpoint of the system (at a technical level it is more practical to work with the system pinned at the endpoint, see the proof of Theorem 1.1).
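As a concrete illustration (ours, not the paper's), the constrained partition function can be computed for small sizes by the renewal recursion over the last loop. This minimal sketch assumes the standard gPS Gibbs weight with (unnormalized) loop weight K(ℓ) = ℓ^{-(2+α)} and Gaussian disorder; all names are ours.

```python
import numpy as np

def partition_function(N, M, alpha=1.5, beta=0.5, h=0.0, seed=0):
    """Constrained gPS partition function Z^{beta,h}_{N,M,omega} via the
    renewal recursion over the last loop:
        Z[i,j] = exp(beta*w[i,j] + h) * sum_{n<=i, m<=j} K(n+m) Z[i-n, j-m],
    with loop weight K(l) = l^{-(2+alpha)} (unnormalized, illustrative)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((N + 1, M + 1))  # IID disorder omega_{n,m}
    K = lambda l: l ** (-(2.0 + alpha))
    Z = np.zeros((N + 1, M + 1))
    Z[0, 0] = 1.0  # empty system
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            s = sum(K(n + m) * Z[i - n, j - m]
                    for n in range(1, i + 1) for m in range(1, j + 1))
            Z[i, j] = np.exp(beta * w[i, j] + h) * s
    return Z[N, M]
```

For β = 0 this reduces to the homogeneous model; for instance Z_{1,1} is just K(2), which gives a quick sanity check of the recursion.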
We also define the free partition function Z^{β,h,f}_{N,M,ω}, where the endpoints are left free; it can be compared to the constrained partition function (1.4), see Lemma 2.2. For notational convenience, we will sometimes drop β, h from the notation for the partition function.
One then defines the quenched free energy of the system. We prove the following theorem in Section 2.

Theorem 1.1. For every γ > 0, β ≥ 0 and h ∈ R, and for every sequence {M(N)}_N with M(N)/N → γ, the quenched free energy

f_γ(β, h) := lim_{N→∞} (1/N) log Z^{β,h}_{N,M(N),ω} = lim_{N→∞} (1/N) E log Z^{β,h}_{N,M(N),ω}

exists, where the first limit exists P(dω)-almost surely and in L¹(P). The same result holds for the free model, that is, with Z^{β,h,f}_{N,M,ω} in place of Z^{β,h}_{N,M,ω}.

The homogeneous model corresponds to the case β = 0: let us drop the β and ω dependence in the partition function, which will simply be denoted by Z^h_{N,M}. The homogeneous model is exactly solvable, and sharp estimates of f_γ(0, h) near criticality are given in [35].
On the other hand, we define the annealed free energy as

f^a_γ(β, h) := lim_{N→∞} (1/N) log E Z^{β,h}_{N,M(N),ω}.

This link with the homogeneous model and the fact that h_c(0) = 0 immediately allow us to identify the annealed critical point: h^a_c(β) = −log Q(β). Now observe that by Jensen's inequality we have E log Z_{N,M,ω} ≤ log E Z_{N,M,ω}, and hence f^q_γ(β, h) ≤ f^a_γ(β, h). Moreover, since β ↦ f_γ(β, h) is non-decreasing, we have f_γ(0, h) ≤ f^q_γ(β, h). Therefore for every β we have

h^a_c(β) ≤ h_c(β) ≤ h_c(0) = 0.   (1.11)

One can show, by adapting the argument of proof of [34, Th. 5.2], that the second inequality is strict for every β ≠ 0. The first inequality may or may not be strict, and this is an important issue, directly linked to disorder relevance and irrelevance.
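The identification of the annealed model with the homogeneous one can be checked in one line: averaging the Gibbs weight over the disorder factorizes over the contacts (a standard computation for pinning models, written here with Q(β) = E[e^{βω_{1,1}}]):

```latex
\mathbb{E}\, Z^{\beta,h}_{N,M,\omega}
 = \mathbf{E}\Big[\prod_{(n,m)\in\tau\cap\,(0,N]\times(0,M]}
     \mathbb{E}\big(e^{\beta\omega_{n,m}+h}\big)\;
     \mathbf{1}_{(N,M)\in\tau}\Big]
 = \mathbf{E}\Big[e^{(h+\log Q(\beta))\,|\tau\cap\,(0,N]\times(0,M]|}\,
     \mathbf{1}_{(N,M)\in\tau}\Big]
 = Z^{\,h+\log Q(\beta)}_{N,M},
```

so that $f^a_\gamma(\beta,h) = f_\gamma(0, h+\log Q(\beta))$ and, since $h_c(0)=0$, the annealed critical point is $h^a_c(\beta) = -\log Q(\beta)$.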
Harris' criterion predicts that disorder is irrelevant if ν > 2/d. Here, Theorem 1.2 suggests that ν = 1/min(1, α), if we admit that the correlation length of the non-disordered system is given by the reciprocal of the free energy, as is the case for the PS model, see [33]. Since the model is 2-dimensional (contrary to the PS model, which is 1-dimensional), this would mean that disorder is irrelevant when ν > 1, that is, when α < 1.
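In symbols, taking the correlation length to be the reciprocal of the free energy, Harris' prediction specializes here as:

```latex
\nu = \frac{1}{\min(1,\alpha)}, \qquad d=2, \qquad
\nu > \tfrac{2}{d} = 1 \;\Longleftrightarrow\; \alpha < 1 \quad(\text{irrelevance}),
\qquad
\nu < 1 \;\Longleftrightarrow\; \alpha > 1 \quad(\text{relevance}).
```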
And in fact our first result states that the first inequality in (1.11) is an equality if α < 1 and β is not too large.For the same values of β we can also show that the critical behavior is the same as for the β = 0 case (disorder irrelevance).Our second result asserts that the inequality is strict for α > 1.We interpret this critical point shift, with a certain abuse, as disorder relevance.We however refer to the discussion in Section 1.3 (in particular Conjecture 1.5) regarding the change in the critical behavior.We therefore prove that disorder is irrelevant if α < 1, and relevant (in terms of critical points) if α > 1, confirming Harris' prediction.
Theorem 1.3. Assume that σ is terminating (this includes α < 1 and excludes α > 1). Then there exists β₁ > 0 (see (3.5)) such that for every β ∈ (0, β₁) we have h_c(β) = h^a_c(β), and moreover

lim_{h↓0} log f_γ(β, h_c(β) + h) / log h = 1/α.   (1.12)

Hence, the order of the phase transition is unchanged when σ is terminating (which is the case if α < 1), at least when β is small enough. We prove Theorem 1.3 in Section 3. We mention that when the disorder distribution is infinitely divisible (for instance Gaussian), one can get sharper bounds on the critical behavior of f_γ(β, h) via a replica-coupling method, as done in [55] or [56]. For a statement and a detailed proof, we refer to [44].
On the other hand, when α > 1, we show that the quenched and annealed critical points differ, and we give a lower bound on the critical point shift.
Theorem 1.4. For α > 1 we have h_c(β) > h^a_c(β) for every β > 0. Moreover, for every ε > 0 there exists β_ε > 0 such that the lower bound (1.13) on the critical point shift holds for any β ≤ β_ε; moreover, there is a slowly varying function L(•) such that the upper bound (1.14) holds.

We add that β ↦ h_c(β) − h^a_c(β) is a non-decreasing function of β: this can be proven by the exact same procedure as the one used to prove Proposition 6.1 in [38]. It is to be interpreted as saying that disorder relevance is non-decreasing in β.
1.3. On the results, perspectives and related work.
On the main theorems. A two-replica computation plays a central role in the proof of Theorem 1.3 and in the proof of (1.14) in Theorem 1.4: the intersection renewal σ therefore emerges naturally, like in the PS model. In the PS context, we now know that disorder is irrelevant (for small values of β) if and only if the intersection renewal is terminating [10]. For the gPS model our results go in the same direction, but they are not sharp in the marginal case α = 1: we only show disorder irrelevance when the intersection renewal σ is terminating. We refer to Remark A.7 for further discussion of the case α = 1, where more technicalities arise.
The proof of (1.13) is based on coarse graining techniques and the fractional moment method: we have chosen to adapt the method proposed in [26], and the difficulties in its generalization come from dealing with the richness of a multidimensional path, compared with the one-dimensional structure of the PS model. A keyword for these difficulties is off-diagonal estimates. The bound can certainly be improved, in the direction of getting rid of the ε in the exponent for α ∈ (1, 2] and of the logarithmic term in the case α > 2, by using more sophisticated coarse graining techniques (see [34, Ch. 6] and references therein). One could probably also aim for sharp estimates, like in [10], but the estimates are technically rather demanding already to obtain (1.13). We have chosen to stick to these simplified, non-optimal (but almost optimal) bounds because sharper results would have required a substantially heavier proof. The techniques developed in [10,11] should transfer to this model: at the expense of a high level of technicality, we expect that, in analogy with the PS model, the necessary and sufficient condition for a critical point shift is the persistence of the intersection renewal σ = τ ∩ τ′.
Discussion on the presence of a smoothing phenomenon. Of course, a fully satisfactory result on disorder relevance would include showing that the critical exponent is modified by the disorder. We do not have such a result, but let us make one observation and formulate a conjecture.
The observation is that Theorem 1.3 may appear surprising at first in view of the smoothing inequality [39,24] for PS models, which ensures that the free energy exponent cannot be smaller than 2 in the presence of disorder: for the gPS model the free energy exponent can go down to 1, since in (1.12) we can choose α arbitrarily close to 1. The reason for the difference is that the PS model is 1-dimensional whereas the gPS model is 2-dimensional: Harris' criterion says that disorder should be irrelevant if ν > 2 for the PS model, and if ν > 1 for the gPS model. In the gPS model, the irrelevant disorder regime therefore persists even if ν (= min(1, α)^{-1}) is arbitrarily close to 1: hence one should not hope for a general smoothing inequality valid for every α.
It is however worthwhile to sketch the argument of [39] in the simplified set-up of Gaussian charges [32, Ch. 5, Sec. 4]. This is useful both to understand where the argument fails and because a suitable generalization of the argument naturally leads to the conjecture that we state just below.
The argument of [39] is based on introducing a coarse graining scale ℓ ∈ N and considering the environment in terms of ℓ-boxes, see Figure 2: the environment is divided into blocks of side length ℓ, called ℓ-boxes. We argue for the case γ = 1 (M = N) and we consider the system at criticality, that is h = h_c(β). An ℓ-box is good (shaded in the figure) if the partition function in this block grows at an exponential rate that is larger than the free energy of the system. The good ℓ-boxes will be rare, but we can choose n_ℓ such that in a system of linear size n_ℓ ℓ, with positive probability, there will be at least one good ℓ-box. A lower bound on the partition function follows by restricting to trajectories that visit only a given good ℓ-box (say, the closest).
(1) A good ℓ-box is a box for which the pinned partition function (i.e. pinned at the south-west and north-east corners of the ℓ-box) is larger than exp((ℓ/2) f₁(β, h+δ)), with δ > 0. For ℓ → ∞ this is a rare event. The probability of such an event can be estimated from below by shifting the environment by δ/β, that is, ω_{i,j} is replaced with ω_{i,j} + δ/β, and by performing a relative entropy estimate [34, Ch. 5]. This shows that the probability of such a rare event is at least exp(−δ²ℓ²/(2β²)): note the ℓ² term, compared with ℓ in the PS case [39].
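In the Gaussian set-up this change-of-measure step can be made explicit with a standard entropy inequality (a sketch; $\widetilde{\mathbb{P}}$ denotes the law of the environment shifted by δ/β inside the ℓ-box):

```latex
\mathbb{P}(A) \;\ge\; \widetilde{\mathbb{P}}(A)\,
  \exp\Big(-\frac{H(\widetilde{\mathbb{P}}\,\|\,\mathbb{P})+e^{-1}}
                 {\widetilde{\mathbb{P}}(A)}\Big),
\qquad
H(\widetilde{\mathbb{P}}\,\|\,\mathbb{P})
  = \sum_{(i,j)\in\text{box}} \frac{(\delta/\beta)^2}{2}
  = \frac{\delta^2 \ell^2}{2\beta^2},
```

so that if the shifted environment makes the box good with probability of order one, the probability under the original law is at least of order $\exp(-\delta^2\ell^2/(2\beta^2))$.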
(2) We then obtain a lower bound on the partition function of the system by discarding renewal trajectories that visit ℓ-boxes that are not good, and by keeping only trajectories that enter good ℓ-boxes through the south-west corner and exit through the north-east corner. The trajectories therefore alternate jumps to a good box, a visit of the box, and then a new jump to another good box. Jumps are long because good boxes are rare. The analysis in [39] ultimately reduces to seeing what happens in one jump and one visit: by exploiting super-additivity one can even just choose N = n_ℓ ℓ such that there is (say, with probability at least 1/2) at least one good box in the system (as is done in [9]). We therefore see that we need n_ℓ² exp(−δ²ℓ²/(2β²)) ≈ 1, so that n_ℓ ≈ exp(δ²ℓ²/(2β²)): with this level of precision, jumping to such a box costs K(n_ℓ ℓ) ≈ (n_ℓ ℓ)^{−(2+α)} (let us consider the case in which L(•) is a constant, but the computation goes through in the same way in the general case). In the box there will be a contribution exp((ℓ/2) f₁(β, h+δ)). The net contribution to the logarithm of the partition function, divided by the size n_ℓ ℓ of the system, is then

(1/(n_ℓ ℓ)) [ (ℓ/2) f₁(β, h+δ) − c log(n_ℓ ℓ) ],

with c a positive constant that we have left implicit (it depends on more accurate computations, and can in principle be reduced to 2+α). Now let us choose h = h_c(β). So the argument we just outlined goes in the direction of saying that

(ℓ/2) f₁(β, h_c(β)+δ) ≤ c log n_ℓ ≈ c δ²ℓ²/(2β²),

so that

f₁(β, h_c(β)+δ) ≤ c δ²ℓ/β².   (1.17)

At this stage, choosing ℓ arbitrarily large is of no help. The steps we have performed up to now require that δℓ is large (so that the good boxes we have chosen are really sparse). On the other hand, we need to have chosen the size of the boxes so that Z^{β,h}_{ℓ,ℓ,ω} ≥ exp(ℓ f₁(β, h_c(β)+δ)/2). This is a delicate issue, but it definitely appears that, for this to hold, ℓ f₁(β, h_c(β)+δ) needs to be sufficiently large (say, larger than a suitable constant): see for example the discussion on the notion of correlation length given in [34, Ch. 2] and references therein, notably [40], where the correlation length is identified with the reciprocal of the free energy. But if ℓ is (a constant times) 1/f₁(β, h_c(β)+δ), then from (1.17) we obtain

f₁(β, h_c(β)+δ) ≤ C δ   (1.18)

for some C > 0. But such a bound is trivial: it holds with C = 1 just because the contact density cannot exceed one! On the other hand, as we have already pointed out, we could not have hoped for a better bound valid for every α > 0.
In spite of the fact that it leads to a trivial result, we insist that the argument we have just outlined can be made rigorous: the delicate step is the last one, where one has to use arguments developed in [40]. It can therefore be taken as a starting point to push things further. Indeed, it appears useless to modify the environment in the whole ℓ-box, at least if α > 1. In fact, if α > 1 one can show that for q > 1/min(α, 2) the renewal trajectories crossing an ℓ-box stay, with probability tending to one as ℓ → ∞, within distance ℓ^q of the diagonal. We can then consider modifying only the environment that is close to the diagonal, that is, in the subset of the ℓ-box with |i − j| ≤ ℓ^q. This would improve the lower bound on the probability of a good ℓ-box to exp(−c δ²ℓ^{q+1}), and (1.17) would become

f₁(β, h_c(β)+δ) ≤ c δ²ℓ^q/β².

Taking ℓ a constant times 1/f₁(β, h_c(β)+δ), as in the argument leading to (1.18), and then taking q arbitrarily close to 1/min(α, 2), supports the following:

Conjecture 1.5. For every α > 0 and every β > 0,

lim sup_{δ↓0} log f₁(β, h_c(β)+δ) / log δ ≥ 2 min(α, 2) / (1 + min(α, 2)).

We stress that a natural concern arises from performing the change of measure only in a subset of the environment, close to the diagonal. One indeed needs to be sure that the trajectories contributing to (a fraction of) (1/ℓ) log Z_{ℓ,ℓ,ω} ≈ f₁(β, h_c(β)+δ) can be constrained to stay in the region {(i, j) ∈ Z² : |i−j| ≤ ℓ^q}: if this is the case, one can "force" trajectories to visit sites where the environment has indeed been shifted.
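The exponent appearing in Conjecture 1.5 comes out of the sketch above via a two-line computation (constants not tracked; $f_1$ stands for $f_1(\beta, h_c(\beta)+\delta)$):

```latex
f_1 \le c\,\frac{\delta^2 \ell^{q}}{\beta^2},
\qquad \ell \asymp \frac{1}{f_1}
\;\Longrightarrow\; f_1^{\,1+q} \le c\,\frac{\delta^2}{\beta^2}
\;\Longrightarrow\; f_1 \le C\,\delta^{2/(1+q)},
```

and letting $q \downarrow 1/\min(\alpha,2)$ yields the exponent $2\min(\alpha,2)/(1+\min(\alpha,2))$.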
An important modeling issue: the choice of the disorder. There is no doubt that the first disorder that comes to mind when thinking of DNA modeling is not the one we have used. One would rather choose ω_{i,j} = f(ω_i, ω_j) for a suitable choice of a function f and a sequence {ω_j}_{j=1,2,...} of random variables (let us say IID for simplicity, but if we want to stick to DNA problems very closely it appears that some sort of strongly correlated sequence may be more appropriate [50]). For example, we could choose ω_j taking only two values e_AT and e_GC, and then make a choice for f that reflects the fact that AT bonds are weaker than GC bonds, and that all other possible bonds are even weaker. Even restricting to {ω_j}_{j=1,2,...} IID, this model is highly non-trivial (the gPS model with this type of disorder has been considered at a numerical level in [30,31], see also [28,54] for related work). But one could also choose to consider the binding of two sequences that are not complementary (the case considered in [49] goes in this direction, even if only heuristics and numerics are presented): choose for example two independent sequences {ω^{(1)}_j}_{j=1,2,...} and {ω^{(2)}_j}_{j=1,2,...} and use ω_{i,j} = f(ω^{(1)}_i, ω^{(2)}_j). This is somewhat closer to what we are using (though it can be considered as a one-dimensional disorder), but it is still very difficult to deal with. The problem is in any case due to correlations in the disorder field ω_{i,j}, which can be dealt with in some cases, see e.g. [8,12] or [2,21]. Our choice is in a sense a toy choice, but we stress that it is conceptually similar to the simplification made for example in [20] in the RNA context. Moreover, it recovers importance once we leave the DNA context somewhat aside and focus rather on moving toward a mathematical understanding of Harris' theory of disorder (ir)relevance, in particular for 2-dimensional systems, as compared to the PS model, which is 1-dimensional.
We also point out that this disordered version of the gPS model provides a bridge between pinning models and directed polymers in random environment [22,46], in particular the long-range directed polymer [22,57]. Moreover, a different class of two-strand polymer problems (the random walk pinning model) is treated in [13,15,16].
Open questions and perspectives.Several natural issues remain open: let us list some of them.
(1) Prove a smoothing inequality, thus showing disorder relevance in the original sense of Harris, for α > 1 (see Conjecture 1.5).
(2) What is the effect of disorder on the other phase transitions? Here we have addressed only the denaturation transition, but in [35] other transitions are shown to exist. Do they withstand the introduction of disorder? If so, does the corresponding critical behavior differ from the homogeneous case? This is the question (quickly) addressed in [49], where a rather bold conjecture is set forth.
(3) We have dealt only with free energy estimates, but, as for the standard PS model, obtaining precise estimates on the gPS process (i.e. establishing properties of trajectories) is very challenging, see [34, Ch. 8] and references therein. The problem comes of course from the inhomogeneous nature of the disorder and the fact that atypical disorder behaviors appear in rare regions (this is ultimately also the problem we face at the free energy level, but it becomes particularly explicit when one analyses the trajectories). A precise analysis of the trajectories of the non-disordered gPS model can be found in [7]: this analysis is substantially more demanding than the corresponding one for the PS model.
(4) Dealing with the marginal case α = 1 is open, mostly because of the additional technical difficulties (a more complicated coarse-graining procedure, more technical estimates for bivariate renewals, etc.). This appears to be a problem within reach, but a very substantial amount of technical work is certainly needed.
Organization of the rest of the work. The existence and self-averaging of the free energy, i.e. the proof of Theorem 1.1, are treated in Section 2. In Section 3 we prove Theorem 1.3, as well as the upper bound (1.14) of Theorem 1.4. The rest of Theorem 1.4 is proven in Section 4. We collect in Appendix A a number of statements and proofs about bivariate renewals.
1.4. Some further notations. We stress that τ is symmetric and in the domain of attraction of a min(α, 2)-stable distribution: we denote by (b_n)_{n≥1} the recentering sequence and by (a_n)_{n≥1} the renormalizing sequence for τ_n, that is, sequences such that (1/a_n)(τ_n − (b_n, b_n)) converges to a min(α, 2)-stable distribution; moreover b_n = 0 if α ∈ (0, 1). The asymptotic behavior of a_n is characterized in Appendix A; in any case, there exists some slowly varying function ψ(•) such that

a_n = ψ(n) n^{1/min(α,2)}.   (1.22)

We provide some useful results on bivariate renewals in Appendix A, in particular on the renewal mass function P((n, m) ∈ τ).
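As a concrete illustration of the bivariate renewals just introduced, here is a minimal Monte Carlo sampler (our own sketch, not from the paper): we assume an inter-arrival law P(τ₁ = (n, m)) ∝ (n+m)^{-(2+α)} with a uniform split of each total loop length, and we truncate at a maximal loop length Lmax.

```python
import numpy as np

def sample_bivariate_renewal(alpha, n_steps, Lmax=10_000, seed=0):
    """Sample a bivariate renewal trajectory with increments (n, m), n, m >= 1,
    distributed proportionally to (n+m)^{-(2+alpha)}, truncated at n+m <= Lmax."""
    rng = np.random.default_rng(seed)
    # Enumerate total loop lengths l = n + m; each l admits (l - 1) pairs (n, m).
    ls = np.arange(2, Lmax + 1)
    wl = (ls - 1) * ls ** (-(2.0 + alpha))  # total weight of each length l
    pl = wl / wl.sum()
    traj = [(0, 0)]
    for _ in range(n_steps):
        l = rng.choice(ls, p=pl)      # sample the total loop length
        n = rng.integers(1, l)        # uniform split: n in {1, ..., l-1}
        x, y = traj[-1]
        traj.append((x + n, y + (l - n)))
    return np.array(traj)
```

For α > 1 the sampled trajectory concentrates around the diagonal, which is the phenomenon exploited in the change-of-measure discussion of Section 1.3.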

Free Energy: existence and properties
In this section we often assume γ ∈ Q: in this case we write γ = p/q with p and q relatively prime positive integers.

Proposition 2.1. For every γ > 0 and every {M(N)}_{N=1,2,...} such that lim_{N→∞} M(N)/N = γ, the limit

f_γ(β, h) := lim_{N→∞} (1/N) log Z^{β,h}_{N,M(N),ω} = lim_{N→∞} (1/N) E log Z^{β,h}_{N,M(N),ω}   (2.1)

exists, where the first limit is meant P(dω)-a.s. and in L¹(P). The function f_γ(•, •) is convex, f_γ(β, •) is non-decreasing, and f_γ(•, h) is non-decreasing on the positive semi-axis and non-increasing on the negative one. Finally, we have the comparison bound (2.3) for every γ₂ ≥ γ₁ > 0.

Proof. The proof is divided into several steps:
(1) We first show that for γ ∈ Q, along the subsequence with N/q ∈ N, log Z_{N,γN,ω} is super-additive in an ergodic sense, which implies the existence of the free energy limit (2.1) along this subsequence.
(2) The restriction γN ∈ N is then removed by a direct estimate, for what concerns the existence of the free energy limit, still with γ ∈ Q.
(3) We then prove a comparison estimate between Z_{N,γ₁N,ω} and Z_{N,γ₂N,ω} and use it to establish the existence of the free energy limit for Z_{N,γN,ω}, for every γ > 0.
(4) The same comparison estimate also yields (2.3), and the fact that one can take the limit along an arbitrary sequence satisfying M(N) ∼ γN as N → ∞.
(5) Finally, we prove the convexity and monotonicity statements.
Step 3. We now establish (2.1) for M(N) = γN for an arbitrary γ > 0, by proving the announced comparison bounds, upper and lower. The upper bound is more general: if M₂ > M₁ and if there exists c > 0 such that M₂ ≤ cN, then a comparison between Z_{N,M₁,ω} and Z_{N,M₂,ω} holds, where in the first inequality we have used that K(•) is regularly varying and that M₂ ≤ cN, so that the corresponding bound holds for every N with some c_K > 0. For the second inequality we have relaxed the constraint m < M₁ to m < M₂.
On the other hand, we prove a comparison lower bound only for M of the form γN. Let us choose γ₂ > γ₁ > 0; then, possibly changing the value of c_K > 0, the analogous lower bound holds.
Step 4. The generalization to a sequence M(N) ∼ γN follows by observing that, given arbitrary γ₁ < γ₂ with γ ∈ (γ₁, γ₂), we have γ₁N < M(N) < γ₂N for every N ≥ N₀, with N₀ sufficiently large. At this point we can apply the comparison bounds as in the previous step and conclude by an approximation procedure.
Step 5. The function (β, h) ↦ f_γ(β, h) is convex because it is the limit of a sequence of convex functions. Monotonicity in h for fixed β is also evident from the finite-N expression. The fact that β ↦ f_γ(β, h) is non-increasing for β ≤ 0 and non-decreasing for β ≥ 0 follows from convexity and the fact that ∂_β E log Z^{β,h}_{N,M,ω} = 0 at β = 0 (by direct computation, since the ω variables are centered), so that ∂_β f_γ(β, h)|_{β=0} = 0. This completes the proof of Proposition 2.1.
We now compare the constrained and the free partition functions.

Lemma 2.2. For any α⁺ > α there exists C such that, for every N, M ∈ N, the constrained and free partition functions satisfy the comparison (2.13), which involves the boundary terms e^{βω_{n,M}}, e^{βω_{N,m}}.

Proof. The lower bound is trivial. For the upper bound, for N, M ≥ 1, one decomposes according to the last renewal point. Observe that the corresponding estimate on K(•) holds for any α⁺ > α. For n < N and m = M there exists C₂ such that the claimed estimate holds, and we obtain the desired bound. The analogous bound holds for the last term in (2.14), and the proof is therefore complete.
We now introduce some notation that is used later in the paper: for positive integers a₁ < a₂ and b₁ < b₂, we define the partition function of the system on [a₁, a₂] × [b₁, b₂].

Upper bound on the critical point shift
The arguments in this section follow the line of proof of H. Lacoin in [47], and are mainly based on a second moment computation. We start with some preliminary results.
Proposition 3.1. If {Z^{β,h^a_c(β)}_{N,M(N),ω}}_N is uniformly integrable, then there exists ζ > 0 such that, for every sequence of events {A_N}_{N=1,2,...} satisfying lim_N P(A_N) = 0, there is N₀ ∈ N such that (3.1) holds for every N ≥ N₀.

Proof. We set h = h^a_c(β). It is sufficient to prove that there exists ζ > 0 such that (3.2) and (3.3) hold. For (3.3) we observe that the Fubini-Tonelli theorem implies lim_N E P^{β,h}_{N,M,ω}(A_N) = 0, and (3.3) follows.

We now prove that {Z^{β,h^a_c(β)}_{N,M(N),ω}}_N is uniformly integrable (and this holds for an arbitrary choice of M(N)) provided that the intersection renewal σ = τ ∩ τ′ is terminating (τ and τ′ are two independent copies of τ) and β is small enough. Let us point out that, since σ is a terminating renewal, the total number |σ| of renewal points (excluding the origin), that is |σ| = Σ_{(n,m)∈N²} δ_{n,m} with δ_{n,m} = 1_{(n,m)∈σ}, is a geometric random variable of parameter P^{⊗2}(σ₁ < ∞), where σ₁ < ∞ simply means that both components of σ₁ are finite. This in particular implies that for every β ∈ (0, β₁) the sequence {Z^{β,h^a_c(β)}_{N,M,ω}}_N is bounded in L²(P), and is therefore uniformly integrable.
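The second moment computation behind this uniform integrability is the standard two-replica identity (a sketch, with $|\tau|$ denoting the number of contacts in $(0,N]\times(0,M]$ and $Q(\beta)=\mathbb{E}[e^{\beta\omega_{1,1}}]$):

```latex
\mathbb{E}\big[Z_{N,M,\omega}^2\big]
 = \mathbf{E}^{\otimes 2}\Big[\big(e^{h}Q(\beta)\big)^{|\tau|+|\tau'|}\,
     e^{\lambda_2(\beta)\,|\sigma|}\,
     \mathbf{1}_{(N,M)\in\tau}\,\mathbf{1}_{(N,M)\in\tau'}\Big],
\qquad
\lambda_2(\beta) := \log Q(2\beta) - 2\log Q(\beta),
```

so that at $h = h^a_c(\beta) = -\log Q(\beta)$ one gets $\mathbb{E}[Z^2] \le \mathbf{E}^{\otimes 2}[e^{\lambda_2(\beta)|\sigma|}]$; since $|\sigma|$ is geometric with parameter $\mathbf{P}^{\otimes 2}(\sigma_1<\infty)<1$ and $\lambda_2(\beta)\downarrow 0$ as $\beta\downarrow 0$, the right-hand side is finite as soon as $e^{\lambda_2(\beta)}\,\mathbf{P}^{\otimes 2}(\sigma_1<\infty) < 1$.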
Proof. We write M = M(N) and compute the second moment of the partition function. Since |σ| is a geometric random variable of parameter P^{⊗2}(σ₁ < ∞), the sequence {Z_{N,M,ω}}_N is uniformly integrable for β < β₁. Now, for all 0 < η < α (recall that if σ is terminating, then α ≤ 1), we define the events A_N. From Lemma A.2, we have lim_N P(A_N) = 0. Let us call E_N the event whose probability is estimated from below in (3.1). Then on E_N, whose probability is at least ζ > 0, we obtain a lower bound on the partition function. Our aim is to prove that f_γ(β, h + h^a_c(β)) > 0, or more precisely to give a lower bound on f_γ(β, h + h^a_c(β)). We aim at using (2.2), and this is why we have chosen γ ∈ Q; we now choose also N such that γN ∈ N, so N = jq, j ∈ N (γ = p/q). Since the first part of the proof exploits the free partition function, and not the constrained one for which (2.2) holds, we use Lemma 2.2, which guarantees a comparison between the two. Since there exists c₂ > 1 such that β|ω_{N,M}| < c₂ log(N + M) with probability at least 1 − ζ/2, and recalling that M ∼ γN, we get that there exists c₃ > 0 such that the corresponding bound holds. Combining this with (3.12), we get the claimed estimate for suitably chosen c₄, c₅ > 0.
At this point the choice γ = p/q and M = γN ∈ N enters the game. By (2.2), the free energy is bounded below by (1/(jq)) E log Z_{jq,jp,ω}, and the fact that j has to be chosen larger than a certain j₀ just reflects the fact that the estimates in this proof have been performed for N larger than a suitable N₀. We now estimate from below the right-hand side of (3.16) by choosing N = h^{−(1+ε)/η} (for some ε > 0 fixed): this means that we have chosen h = (jq)^{−η/(1+ε)}. With this choice, the last inequality holds provided that h is small enough. This is the estimate we were after, since we can choose η arbitrarily close to α and ε close to 0; we have established it only for h of the form (jq)^{−η/(1+ε)}, j = j₀, j₀+1, ..., but it extends to every sufficiently small h (this can be verified by using the monotonicity of f_γ(β, •)). This completes the proof of Theorem 1.3.
The technique used to prove Theorem 1.3 can be adapted, for $\alpha > 1$, to deduce an upper bound on the difference between the quenched and annealed critical points.

Proposition 3.3. Let $\alpha > 1$. There exists a slowly varying function $L(\cdot)$ such that the following bound holds for $\beta \le 1$.
Proof. As in the previous proof, it suffices to work with the case $\gamma = p/q \in \mathbb{Q}$ and $M = \gamma N$. We set. Using the Paley-Zygmund inequality, we therefore get that $P\big(Z_{N,M,\omega} \ge \tfrac12 E[Z_{N,M,\omega}]\big) \ge 1/8$ for any $N \le N_\beta$, and we can then adapt the proof of Proposition 3.1. Let us take $A_N := \{|\tau \cap ((0,N] \times (0,M])| \le N/2\mu\}$. Since $\lim_{N \to \infty} P(A_N) = 0$, we find, exactly as in the proof of Proposition 3.1, that there exists $N_0 \in \mathbb{N}$ such that the bound holds for every $N_0 \le N \le N_\beta$. Following the proof of Theorem 1.3 (see (3.16)), provided $N_\beta \ge N_0$, and since $N_\beta \in q\mathbb{N}$, we get the corresponding estimate. It therefore boils down to estimating $N_\beta$, namely obtaining a lower bound on it. Recall from (3.6) that. Note that for $\beta \le 1$ there exists $c_8$ such that $\log Q(2\beta) - 2\log Q(\beta) \le c_8 \beta^2$. We have. In order to obtain an upper bound, we use the following fact. Then we get. Let $\sigma := \sigma^{(1)} + \sigma^{(2)}$. An elementary observation is that $P^{\otimes 2}(\sigma_1 \notin (0,M]^2)$ is of the same order as $P^{\otimes 2}(\sigma_1 > M)$: indeed, for every $M \in \mathbb{N}$ we have. Therefore, using that $\log(1-x) \le -x$ for $x \in [0,1)$, we get the bound, where we used Lemma A.6 for the estimate. Since $M \sim \gamma N$ and $U_{N,N}$ is regularly varying (see Proposition A.3), we get that. We therefore choose $N$ such that $c_{10}/U_{N,N} \ge 3t = 3 c_8 \beta^2$. By Proposition A.3, for $\alpha > 1$, we can choose $N$ accordingly, for some slowly varying function $\psi(\cdot)$. For this choice of $N$, we therefore get a quantity which is smaller than $2$ provided that $\beta$ is small enough. It therefore implies that there exists some $\beta_1 > 0$ such that the bound holds. The proof is then complete by plugging (3.32) into (3.22).
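The Paley-Zygmund inequality invoked above states that $P(Z \ge \theta E[Z]) \ge (1-\theta)^2 E[Z]^2 / E[Z^2]$ for $\theta \in (0,1)$. As an illustration only (our own toy example, not the partition function of the model), the sketch below checks it on a lognormal variable $Z = e^{\beta G}$, $G \sim \mathcal{N}(0,1)$, whose first two moments are explicit, so the comparison is fully deterministic.

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

beta, theta = 0.5, 0.5
EZ  = math.exp(beta ** 2 / 2)      # E[e^{beta G}]
EZ2 = math.exp(2 * beta ** 2)      # E[e^{2 beta G}]
pz_bound = (1 - theta) ** 2 * EZ ** 2 / EZ2

# Exact probability: Z >= theta*E[Z]  <=>  G >= (log(theta) + beta^2/2)/beta
threshold = (math.log(theta) + beta ** 2 / 2) / beta
prob = 1.0 - normal_cdf(threshold)
assert prob >= pz_bound   # Paley-Zygmund lower bound holds
```

In the proof, a second-moment bound of the type $E[Z^2] \le 2\, E[Z]^2$ is what turns this inequality into the constant lower bound $1/8$ (with $\theta = 1/2$).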

Lower bound on the critical point shift
From now on, $L_i(\cdot)$ will denote slowly varying functions and $C_i$ positive constants, $i = 1, 2, \dots$ Also, we sometimes treat certain large quantities as if they were integers, simply to avoid the integer-part notation; in all cases the arguments are unaffected by this.
Our proof is based on combining the fractional moment method with a change of measure argument, following the strategy adopted in [26]. Let $z_{n,m} := \exp(\beta \omega_{n,m} + h)$. (4.1) Choose $k \le N$ and $M$ such that $M \sim \gamma N$, and decompose the partition function (1.4) as follows, see Figure 3, with (recall the notation (2.20)). Note that $Z_{(N-i,M-j),(N,M),\omega}$ has the same law as $Z_{i,j,\omega}$, and that $Z_{N-n,M-m,\omega}$, $z_{N-i,M-j}$ and $Z_{(N-i,M-j),(N,M),\omega}$ are independent for $i < n$ and $j < m$.
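The two-dimensional renewal structure behind this decomposition can be made concrete on a small instance. The sketch below uses a toy parametrization of our own (loop weight $K(\ell) = \ell^{-(2+\alpha)}$ without normalization, Gaussian disorder, tiny $N, M$), not the paper's exact conventions: it computes the constrained partition function both by dynamic programming over the last contact and by brute-force enumeration of all renewal paths, and checks that the two agree.

```python
import itertools, math, random

alpha, beta, h = 1.5, 0.3, 0.1
N, M = 5, 4
rng = random.Random(1)
omega = {(n, m): rng.gauss(0.0, 1.0)
         for n in range(1, N + 1) for m in range(1, M + 1)}

def K(l):
    # Toy loop weight for a loop of total length l (exponent 2 + alpha).
    return l ** -(2 + alpha)

def z(n, m):
    # Boltzmann weight of a contact at (n, m).
    return math.exp(beta * omega[(n, m)] + h)

# Dynamic programming: sum over the last contact (i, j) before (n, m).
Z = [[0.0] * (M + 1) for _ in range(N + 1)]
Z[0][0] = 1.0
for n in range(1, N + 1):
    for m in range(1, M + 1):
        Z[n][m] = z(n, m) * sum(K((n - i) + (m - j)) * Z[i][j]
                                for i in range(n) for j in range(m))

# Brute force: enumerate all strictly increasing contact paths ending at (N, M).
total = 0.0
for k in range(1, min(N, M) + 1):
    for cn in itertools.combinations(range(1, N), k - 1):
        for cm in itertools.combinations(range(1, M), k - 1):
            pts, w, prev = list(zip(cn + (N,), cm + (M,))), 1.0, (0, 0)
            for (n, m) in pts:
                w *= K((n - prev[0]) + (m - prev[1])) * z(n, m)
                prev = (n, m)
            total += w

assert abs(Z[N][M] - total) <= 1e-9 * total
```

The dynamic program is exactly the renewal equation underlying (4.2): conditioning on the last contact splits the path into an independent bulk and a final loop.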
Let $\delta \in (0,1)$ (it will be chosen close to $1$ later in the proof), and define, with $A_{0,0} = 1$ and $A_{i,0} = A_{0,i} = 0$ for every $i \ge 1$. We apply the inequality $(\sum_i a_i)^\delta \le \sum_i a_i^\delta$ (which holds for any finite or countable collection of positive real numbers) to the decomposition (4.2). The key idea of the proof is contained in the following proposition.
Proposition 4.1. For fixed $\beta$ and $h$, suppose that there exists $k \in \mathbb{N}$ such that $\rho_1 + \rho_2 + \rho_3 \le 1$, with the $\rho$'s defined below.

Note that by Jensen's inequality we have $A_{i,j} \le E[Z_{i,j,\omega}]^\delta \le \exp(\delta h \min\{i,j\})$, since there are at most $\min\{i,j\}$ renewals in the region $\{1, \dots, i\} \times \{1, \dots, j\}$: we get that $A_{i,j} \le e^{hk}$ for $i, j \le k$. Then from (4.4) and the fact that $\rho_1 + \rho_2 + \rho_3 \le 1$, we deduce (by induction) that $A_{N,M} \le e^{hk}$ for all $N, M$. Then we apply Jensen's inequality. Our aim is therefore to prove that for $h = h_c^a(\beta) + \Delta_\beta^\varepsilon$ (where $\Delta_\beta^\varepsilon$ is defined in Theorem 1.4) we have $f^q_1(\beta, h) = 0$ (provided that $\beta$ is small enough), by showing that $\rho_1, \rho_2, \rho_3$ are smaller than $1/3$ for such $h$ and for some wisely chosen $k = k_\beta$. For the choice of $k$, we pick $k$ proportional to the correlation length of the annealed system, that is $k \propto \mathrm{f}(0, \Delta_\beta^\varepsilon)^{-1}$; in view of Theorem 1.2 (here $\alpha > 1$), we can take $k$ accordingly. Note that, in view of (4.6) and (1.1), provided that $\delta$ is close to $1$ so that $(2+\alpha)\delta > 2$, we obtain the two bounds below. The $\rho_3$ case being symmetric to $\rho_2$, we can focus on $\rho_1$ and $\rho_2$.
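The subadditivity inequality $(\sum_i a_i)^\delta \le \sum_i a_i^\delta$, $\delta \in (0,1)$, used in the decomposition above is elementary; the following sketch is a quick randomized sanity check of it (purely illustrative, not part of the argument).

```python
import random

rng = random.Random(0)
delta = 0.66   # fractional exponent in (0, 1)
for _ in range(1000):
    a = [rng.expovariate(1.0) for _ in range(rng.randint(1, 10))]
    # (sum a_i)^delta <= sum a_i^delta for positive a_i and delta in (0, 1)
    assert sum(a) ** delta <= sum(x ** delta for x in a) + 1e-12
```

Taking a fractional moment $E[Z^\delta]$ rather than $E[Z]$ is what allows the bound to survive heavy-tailed fluctuations of the partition function; the inequality above is the price-free way to push the exponent $\delta$ inside the renewal decomposition.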
4.1. Finite-volume fractional moment estimates. To estimate (4.9) and (4.10), we need a good control on the fractional moments $A_{i,j}$ for $i, j \le k$; we provide such estimates in this section.
However, this estimate is rather rough, especially when $(i,j)$ is close to the diagonal (that is, for example, $i \le j \le i + a_i$, where $(a_n)_{n \ge 0}$ is the scaling sequence for $\tau_n$ defined in Section 1.4). We therefore prove the following proposition. Then, define also $\ell_i$, so that in any case $\ell_i \le a_i$. There exists some $k_0$ such that, provided that $k \ge k_0$, for all $\sqrt{k} \le i \le k$ and $i \le j \le i + \ell_i$ the bound (4.14) holds. This result is the core of the proof, and is based on a change of measure argument. With this result in hand, we are able to show that $\rho_1$ and $\rho_2$ are small: for $\alpha > 2$ in Section 4.2 and for $\alpha \in (1,2]$ in Section 4.3. Let us apply this proposition to get bounds on $A_{i,j}$ in the different cases.
Case $\alpha > 2$. We get the bound, uniformly for $k/2 \le i \le k$ and $i \le j \le i + \ell_i$, where we used the choice (4.8) of $k$. For the last inequality, we observe that the first term dominates.

Case $\alpha \in (1,2]$. We use again the choice (4.8) to get the bound, uniformly for $k^{1-\varepsilon^2} \le i \le k$, provided that $\varepsilon$ is small enough. Therefore, using also a comparison between $\beta^{-2}$, $\ell_i$ and powers of $k$, valid for $i \le k$ if $\varepsilon$ has been fixed small enough, we get the bound, uniformly for $k^{1-\varepsilon^2} \le i \le k$ and $i \le j \le i + \ell_i$, where again, for the second-to-last inequality, we observe that the first term dominates, and $\varepsilon$ can be fixed arbitrarily small.
Proof of Proposition 4.2. The idea is to use a change of measure argument. We define a strip $J_{i,j}$ in which we tilt the environment by some quantity $\lambda$ (to be chosen wisely), and hence $\# J_{i,j} \le 2 i \ell_i$. The width $2\ell_i$ of the strip is chosen because of the scaling of the bivariate renewal: it is very unlikely that the renewal deviates from the diagonal by more than $\ell_i$, see Theorem A.5. Now, for $\lambda \in \mathbb{R}$ and $i, j \in \mathbb{N}$, we define a new probability measure $P_{i,j,\lambda}$, under which the $\omega_{n,m}$ are still independent variables, but tilted by $\lambda$ in the strip $J_{i,j}$, where $Q(\cdot)$ is defined in (1.2). Observe now that Hölder's inequality applies. The second term in the right-hand side of (4.19) can be computed explicitly. Observe that there exists $c_{11} > 0$ such that $0 \le \log Q(x) \le c_{11} x^2$ for $|x| \le 1$. Therefore, for $|\lambda| \le \min(1, (1-\delta)/\delta)$, by (4.19) and (4.20) we get the bound. Now, we choose $\lambda := (i \ell_i)^{-1/2}$, so that $\lambda^2 \# J_{i,j} \le 2$, and we are left with estimating $E_{i,j,\lambda}[Z_{i,j,\omega}]$ for this choice of $\lambda$.
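The entropic cost of such a tilt can be made fully explicit in the Gaussian case, where $Q(\lambda) = e^{\lambda^2/2}$. The sketch below (our own single-site illustration, assuming standard Gaussian disorder) checks numerically the closed form $E_\lambda\big[(\mathrm{d}P/\mathrm{d}P_\lambda)^p\big] = e^{p(p-1)\lambda^2/2}$ with $p = 1/(1-\delta)$ the Hölder conjugate exponent; over the whole strip the cost is this quantity raised to the number of tilted sites, which stays bounded precisely because $\lambda$ is chosen so that $\lambda^2$ times the strip size is at most $2$.

```python
import math

delta = 0.8
p = 1.0 / (1.0 - delta)      # Hoelder conjugate exponent, as in (4.19)
lam = 0.1                    # tilt of a single Gaussian site

def integrand(w):
    # Under P_lam the site is N(lam, 1); the Radon-Nikodym derivative back
    # to P is dP/dP_lam(w) = exp(-lam*w + lam^2/2).
    density = math.exp(-0.5 * (w - lam) ** 2) / math.sqrt(2 * math.pi)
    return math.exp(p * (-lam * w + lam ** 2 / 2)) * density

step = 1e-3   # Riemann sum over [-12, 12]; Gaussian tails are negligible
numeric = sum(integrand(i * step) for i in range(-12000, 12001)) * step
closed_form = math.exp(p * (p - 1) * lam ** 2 / 2)
assert abs(numeric - closed_form) < 1e-5
```

With $\lambda^2 \#J \le 2$ the total cost factor is at most $e^{p(p-1)}$, a constant depending only on $\delta$: the tilt is cheap, while (as the next lemma shows) it depresses the tilted expectation of the partition function.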
Lemma 4.3. There exists a slowly varying function $L_4$ such that, for every $1 \le i \le j \le i + \ell_i$, the stated bound holds.

Proof. Let us first observe that, by symmetry, we get that

From Theorem A.5, we see that the last term in the double sum of (4.27) is bounded above by $c_7/a_i$ (since $i - a - k \ge i/2$). We get that (4.27) is bounded above by the displayed quantity. Then we observe that $\{S_k\}$ is a centred random walk in the domain of attraction of a stable law of index $\alpha > 1$. From the Lemma in [53] for the case $\alpha \in (1,2]$ (and infinite variance), and from [51, Corollary 1] or equation (12) in [19] for the case $\alpha > 2$ (or $\alpha = 2$ and finite variance), we get the required estimate. Therefore, by (4.28), (4.30) and (4.31), we obtain (4.26).
In particular, we always have the following.

Proof. The last inequality comes from the fact that for $i \le j$ we have $K(i+j) \le c L(i) i^{-(2+\alpha)}$, and from the fact that Theorem A.5 gives $P((i,j) \in \tau) \le c_{14}/a_i$ with $a_i = \psi(i)\, i^{1/(\alpha \wedge 2)}$. We write the sum as $\sum_k e^{-k u_i}\, P(\tau_k = (i,j))$.
For the first sum, we use Theorem A.1 to get that $P(\tau_k = (i,j)) \le c_{15}\, k\, K(i+j)$ for $k \le i/2\mu$, so $\sum_{k=1}^{i/2\mu} e^{-k u_i} P(\tau_k = (i,j)) \le c_{15}\, K(i+j) \sum_{k \ge 1} k\, e^{-k u_i}$, where for the last inequality we bounded the sum by a constant times $\int_{\mathbb{R}_+} x e^{-x}\, \mathrm{d}x$ (thanks to a Riemann-sum approximation, valid since $u_i \to 0$).
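The Riemann-sum approximation invoked here is easy to check numerically: for small $u$, one has $\sum_{k \ge 1} k\, e^{-ku} \approx u^{-2} \int_0^\infty x e^{-x}\, \mathrm{d}x = u^{-2}$. A quick sketch (illustrative only):

```python
import math

u = 1e-3
# sum_{k>=1} k e^{-ku} ~ u^{-2} * int_0^infty x e^{-x} dx = u^{-2} as u -> 0;
# the truncation at 200000 terms is harmless since e^{-200} is negligible.
s = sum(k * math.exp(-k * u) for k in range(1, 200_000))
assert abs(s * u ** 2 - 1.0) < 5e-3
```

The exact value is $e^{-u}/(1-e^{-u})^2 = u^{-2}(1 + O(u^2))$, so the approximation is already accurate to order $u^2$.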
For the second sum, we simply bound $k$ by $i/2\mu$. Finally, this concludes the proof of Proposition 4.2 thanks to (4.22), using that $(a+b+c)^\delta \le a^\delta + b^\delta + c^\delta$ for $\delta \in (0,1)$.

We start by estimating $\rho_1$. Let $R$ be a large constant and split the sum in (4.9) as

4.2. Conclusion of the proof of Theorem 1.4 in the case $\alpha > 2$
Using the fact that $A_{i,j} \le e^{\delta}$ from (4.11), we get that $S_2$ is arbitrarily small for $k$ large. Since $S_3$ and $S_4$ are the same quantity, we focus on $S_3$. Using again $A_{i,j} \le e^{\delta}$ from (4.11), we obtain a bound which can also be made small, in view of the condition (4.35) and because $R$ is large.
Hence $\rho_1$ can be made arbitrarily small by choosing $R$ large and $k$ large (i.e. $\beta$ small).
Let us now look at $\rho_2$ in (4.10). We split the sum accordingly. Let us first study $S_5$: using that $A_{i,j} \le e^{\delta}$ from (4.11), we get the first bound. For $S_{5b}$, we use (4.11) and Theorem A.5, which give that $A_{i,j} \le \mathrm{cst.}\ j^{-\delta/2}$ in the regime of $(i,j)$ considered, to get

Then $S_5$ can be made small for $k$ large, using condition (4.35). Now we split $S_6$ as in (4.45). Using (4.11) and Theorem A.5, we see that
Hence, both $S_{6a}$ and $S_{6b}$ are arbitrarily small for $k$ large, by condition (4.35).
By (4.11), and since $i + \sqrt{i} \log i \le 3k/4$ for $i \le k/2$ provided that $k$ is large enough, we obtain a bound which is arbitrarily small for $k$ large.
For the term $S_{6d}$: since for every $j \in \{k/2+2, \dots, k-1\}$ there are at most $C_{10} \sqrt{k} \log k$ corresponding terms in the sum over $i$, we get the first bound. Then we use Proposition 4.2, and more precisely (4.14), to conclude. In view of condition (4.35), $S_{6d}$ can be made arbitrarily small for $k$ large. This completes the proof of (1.13) in the case $\alpha > 2$.

4.3. Conclusion of the proof of Theorem 1.4 in the case $\alpha \in (1,2]$
which implies in particular that $\delta(2+\alpha) > 3$. We also make the assumption below. Let us start by showing that $\rho_1$ is small: we split the sum in (4.9) into several terms. For $\alpha \le 2$, we know that there exists a slowly varying function $\psi(\cdot)$ such that $a_i = \psi(i)\, i^{1/\alpha}$. For $T_1$, using (4.11) and Theorem A.5, we get a bound which, thanks to condition (4.51), can be made small for $k$ large. For $T_2$, since $(2+\alpha)\delta - 2 \in (1,2)$, we get a bound where for the last inequality we used (4.11) and Theorem A.5; then $T_2$ is small for $k$ large thanks to (4.51). For $T_3$ (which is equal to $T_4$), since in the range of $i, j$ considered we have $2k - i - j \ge k/2$, we get, using (4.11) and Theorem A.5, a bound which can be made small by taking $k$ large, thanks to (4.51). In the end, $\rho_1$ is bounded above by a small constant for $k$ large.
As far as $\rho_2$ is concerned, we split the right-hand side of (4.10) accordingly. Recall the definition of $\ell_i$ in Proposition 4.2, and define $\bar\ell_i = i^{(1+\varepsilon^4)/\alpha} \ge \ell_i$. We split $T_5$ as in (4.59) and (4.60). From (4.11) and Theorem A.5, we get the first bound (using that $(2+\alpha)\delta - 2 > 1$ for the second line), and also the second one (using here as well that $(2+\alpha)\delta - 2 > 1$ for the third line). Finally, it remains to bound $T_{5d}$. As for (4.49), there are at most $\bar\ell_k$ terms in the sum over $i$ (and $(2+\alpha)\delta - 2 > 1$), so that $T_{5d}$ can be made arbitrarily small by choosing $k$ large, because of condition (4.52). For $T_6$, we have (4.66). By following the same procedure as for $T_5$, $T_6$ is bounded above by a small term when $k$ is large. The proof of (1.13) in the case $\alpha \in (1,2]$ is therefore complete and, with it, also the proof of the lower bound part of Theorem 1.4.

To prove (A.20), we use Theorem A.5 to get that, uniformly for $0 \le r \le \varepsilon^{-1} a_n$, we have $P((n, n+r) \in \tau)^2 \sim (a_n)^{-2}\, c_\alpha(r/a_n)^2$ as $n \to \infty$. Hence, provided that $n$ is large enough, we get the bound, where the last inequality also comes from a Riemann-sum approximation. Finally, note that $c_\alpha - \int_0^{1/\varepsilon} c_\alpha(t)^2\, \mathrm{d}t$ is positive and, thanks to (A.19), smaller than $\int_{1/\varepsilon}^{\infty} c\, t^{-3}\, \mathrm{d}t \le c\, \varepsilon^2$. In the end, we get that, provided that $n$ is large enough, (A.20) holds, provided that $\varepsilon$ has been fixed small enough.
We are now ready to estimate $U_{N,N}$. The second term is negligible compared to $\varphi(1/\lambda)\, \lambda^{-\rho}$ as $\lambda \downarrow 0$: indeed, $\sum_{n=0}^N P((n,n) \in \tau)^2$ is negligible compared to $\varphi(N)\, N^\rho$, by standard properties of Laplace transforms. We therefore focus on the first term.
First of all, we have an upper bound. Now, we use that there is some $n_\varepsilon$ such that the estimate holds for $n \ge n_\varepsilon$, where we used again Corollary 1.7.3 in [14] for the last asymptotics. By letting $\varepsilon \downarrow 0$, we obtain matching upper and lower bounds, so that (A.15) is proved.
We now use Proposition A.3, and in particular the estimate of the Laplace transform $\hat U(\lambda)$, to obtain estimates on the tail probability of the intersection renewal $\sigma = \tau \cap \tau'$.

(a) Standard PS model. (b) Generalized PS model.

Figure 1. Standard vs. generalized Poland-Scheraga models. The figure on the left represents the standard PS model: the two strands of DNA have the same length (there are 14 base pairs in Fig. 1a), and loops are symmetric (there are 5 loops of length 1, 1 loop of length 3 and 1 loop of length 5). The figure on the right represents the generalized PS model: the two strands may have a different number of bases (22 for the 'top' one, and 16 for the 'bottom' one), and loops are allowed to be asymmetric; a loop can be encoded by two numbers $(n,m)$, where $n$ is the length of the 'top' strand and $m$ that of the 'bottom' strand in the loop (the loops in Fig. 1b are, from left to right, $(1,1)$, $(1,1)$, $(13,5)$, $(1,1)$, $(1,1)$, $(3,5)$, $(1,1)$).

Figure 2. Schematic view of the coarse-graining procedure proposed for a smoothing inequality. The environment is divided into blocks of size $\ell$, called $\ell$-boxes. An $\ell$-box is good (shaded in the figure) if the partition function in this block grows at an exponential rate that is larger than the free energy of the system. Good $\ell$-boxes are rare, but we can choose $n$ such that in a system of linear size $n\ell$, with positive probability, there is at least one good $\ell$-box. A lower bound on the partition function follows by restricting to trajectories that visit only a given good $\ell$-box (say, the closest one).

Figure 3. Fixing a value $k$, the partition function is decomposed by summing over the values of the last renewal epoch outside the corner block $(N-k, N] \times (M-k, M]$, and of the first one inside that block. We distinguish three cases: either the last renewal epoch is in $[0, N-k] \times [0, M-k]$ (which is the case represented in the figure, giving $Z^1_{N,M,\omega}$), or it is in $(N-k, N] \times [0, M-k]$ (giving $Z^2_{N,M,\omega}$), or in $[0, N-k] \times (M-k, M]$ (giving $Z^3_{N,M,\omega}$).
and the right-hand side of (4.38) can be made small by (4.35) and because $R$ is large. For $S_2$, there exists $C_4$ such that $S_2 \le C_4 \max_{k-R \le i,j < k} A_{i,j}$, and from (4.11), combined with Theorem A.5, there exists $C_5$ such that