Monge-Ampère gravitation as a Γ-limit of good rate functions

Monge-Ampère gravitation is a modification of the classical Newtonian gravitation where the linear Poisson equation is replaced by the nonlinear Monge-Ampère equation. This paper is concerned with the rigorous derivation of Monge-Ampère gravitation for a finite number of particles from the stochastic model of a Brownian point cloud, in the spirit of the formal paper [6]. The main step in this derivation is the Γ-convergence of the good rate functions corresponding to a one-parameter family of large deviation principles. Surprisingly, the derived model includes dissipative phenomena. As an illustration, we show that it leads to sticky collisions in one space dimension.


Introduction
On a periodic domain such as $\mathbb{T}^d = (\mathbb{R}/\mathbb{Z})^d$, Newtonian gravitation is commonly described in terms of the probability density $f(t, x, \xi)$ of finding gravitating matter at time $t$, position $x \in \mathbb{T}^d$ and velocity $\xi \in \mathbb{R}^d$, subject to the Vlasov-Poisson system

$$\partial_t f(t, x, \xi) + \operatorname{div}_x\big(\xi f(t, x, \xi)\big) - \operatorname{div}_\xi\big(\nabla\varphi(t, x) f(t, x, \xi)\big) = 0, \qquad \Delta\varphi(t, x) = \int_{\mathbb{R}^d} f(t, x, \xi)\,d\xi - 1,$$

where $\varphi$ is the gravitational potential. Notice that the averaged density, say 1, has been subtracted from the right-hand side of the Poisson equation, due to the periodicity of the spatial domain. This is a common feature of computational cosmology, and it lets the uniform density be a stationary solution. The Vlasov-Poisson system can be seen as an "approximation" to the more nonlinear Vlasov-Monge-Ampère (VMA) system

$$\partial_t f(t, x, \xi) + \operatorname{div}_x\big(\xi f(t, x, \xi)\big) - \operatorname{div}_\xi\big(\nabla\varphi(t, x) f(t, x, \xi)\big) = 0, \tag{1}$$

$$\det\big(I + D^2\varphi(t, x)\big) = \int_{\mathbb{R}^d} f(t, x, \xi)\,d\xi, \tag{2}$$

where the fully nonlinear Monge-Ampère equation substitutes for the linear Poisson equation of Newtonian gravitation. Indeed, for a "weak" gravitational potential, by expanding the determinant about the identity matrix $I$, we get $\det(I + D^2\varphi(t, x)) \sim 1 + \operatorname{tr}(D^2\varphi(t, x)) = 1 + \Delta\varphi(t, x)$ and recover the Newtonian model approximately (and exactly when $d = 1$). In this paper, we will speak of "Monge-Ampère gravitation" ("MAG" for short). The Vlasov-Monge-Ampère system has been introduced and related to the Vlasov-Poisson system in [8], and studied as an ODE on the Wasserstein space in [1]. It can also be solved numerically thanks to efficient Monge-Ampère solvers recently designed by Mérigot [13].
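The linearization of the determinant can be checked numerically. The following is a minimal sketch in dimension 2, where $\det(I + H) = 1 + \operatorname{tr} H + \det H$, so the gap between the Monge-Ampère and Poisson left-hand sides is quadratic in the size of the Hessian. The matrix `H` is a hypothetical example chosen only for illustration, not taken from the paper.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def monge_ampere_lhs(hessian):
    """det(I + D^2 phi) for a 2x2 Hessian D^2 phi."""
    return det2([[1.0 + hessian[0][0], hessian[0][1]],
                 [hessian[1][0], 1.0 + hessian[1][1]]])


def poisson_lhs(hessian):
    """1 + tr(D^2 phi) = 1 + Laplacian(phi)."""
    return 1.0 + hessian[0][0] + hessian[1][1]


def linearization_gap(hessian, scale):
    """|det(I + s H) - (1 + s tr H)| for the rescaled Hessian s H."""
    scaled = [[scale * entry for entry in row] for row in hessian]
    return abs(monge_ampere_lhs(scaled) - poisson_lhs(scaled))


# Hypothetical symmetric Hessian, used only as an illustration.
H = [[0.3, 0.1], [0.1, -0.2]]
for s in (1.0, 0.1, 0.01):
    # The gap equals s^2 * |det H|, so it shrinks quadratically with s.
    print(s, linearization_gap(H, s))
```

In dimension 1 the determinant is the scalar $1 + \varphi''$ itself, so the gap vanishes identically, which is the sense in which the two models coincide when $d = 1$.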
It has been argued in [5] that the MAG may also be seen as an approximation of Newtonian gravitation for which the "Zeldovich approximation" [16] (see [12,7]), popular in computational cosmology, becomes exact.
In this paper we will not be directly interested in this system, but rather in its discrete version, i.e. when the number of particles is finite. As is well known in optimal transport theory [3,4,15], the Monge-Ampère equation (2) is solved by the unique function $\varphi$ such that the map $\mathrm{Id} + \nabla\varphi$ realizes the optimal transport with quadratic cost from the density $\int f\,d\xi$ to the Lebesgue measure. Then, the kinetic equation (1) is known to be the continuous version of Newton's equations of classical mechanics in a potential given by $\varphi$.
In the discrete setting, the stationary Lebesgue measure is replaced by a family $(a_1, \dots, a_N) \in (\mathbb{R}^d)^N$ of $N \ge 1$ points in $\mathbb{R}^d$ (here we make the presentation in $\mathbb{R}^d$ instead of $\mathbb{T}^d$ for the sake of simplicity). One can for instance think of a regular lattice approximating a constant density in some region, even though in the sequel the particular choice of $(a_1, \dots, a_N)$ will play no role. We will consider the evolution of a cloud of $N$ particles $(x_1, \dots, x_N)$ in $\mathbb{R}^d$ whose dynamic is ruled by the discrete optimal transport problem:

$$\sigma_{\mathrm{opt}} \in \operatorname*{arg\,min}_{\sigma \in S_N} \sum_{i=1}^N |x_i - a_{\sigma(i)}|^2. \tag{3}$$

More precisely, the analogue of (1)(2) in this framework is easily seen to be, formally:

$$\frac{d^2 x_i}{dt^2}(t) = x_i(t) - a_{\sigma_{\mathrm{opt}}(i)}, \qquad i = 1, \dots, N. \tag{4}$$

Following the idea of the recent paper [6], we will derive this discrete dynamic from the very elementary stochastic model of a Brownian point cloud. However, in [6], the derivation was obtained through a double application of the large deviation principle (LDP), through a purely formal use of the Freidlin-Wentzell theory [11]. The main purpose of the present paper is to explain how such a derivation can be made rigorous by substituting for one of the applications of the LDP a PDE method inspired by the famous concept of "onde pilote" introduced by Louis de Broglie at the early stage of Quantum Mechanics [9].
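For a small cloud, the discrete optimal transport problem can be solved by brute force. The sketch below assumes that the problem is the assignment problem minimizing $\sum_i |x_i - a_{\sigma(i)}|^2$ over permutations, which is equivalent to maximizing $\sum_i x_i \cdot a_{\sigma(i)}$ because $\sum_i |a_{\sigma(i)}|^2$ does not depend on $\sigma$. The function names and the sample points are hypothetical and serve only as an illustration.

```python
from itertools import permutations


def squared_cost(points, lattice, sigma):
    """Sum over i of |x_i - a_sigma(i)|^2 for points of R^d given as tuples."""
    return sum(
        sum((xc - ac) ** 2 for xc, ac in zip(points[i], lattice[sigma[i]]))
        for i in range(len(points))
    )


def correlation(points, lattice, sigma):
    """Sum over i of the inner products x_i . a_sigma(i)."""
    return sum(
        sum(xc * ac for xc, ac in zip(points[i], lattice[sigma[i]]))
        for i in range(len(points))
    )


def optimal_assignments(points, lattice):
    """All permutations minimizing the quadratic cost (brute force, small N)."""
    perms = list(permutations(range(len(points))))
    costs = [squared_cost(points, lattice, s) for s in perms]
    best = min(costs)
    return [s for s, c in zip(perms, costs) if abs(c - best) < 1e-12]


def correlation_maximizers(points, lattice):
    """All permutations maximizing the correlation (brute force, small N)."""
    perms = list(permutations(range(len(points))))
    scores = [correlation(points, lattice, s) for s in perms]
    best = max(scores)
    return [s for s, c in zip(perms, scores) if abs(c - best) < 1e-12]


# Hypothetical lattice and cloud in the plane.
lattice = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cloud = [(0.9, 0.1), (0.1, 0.8), (0.2, -0.1)]
print(optimal_assignments(cloud, lattice))

# Two particles at the midpoint of two lattice points: the optimum is not unique.
print(optimal_assignments([(0.5, 0.0), (0.5, 0.0)], [(0.0, 0.0), (1.0, 0.0)]))
```

The second call illustrates the degenerate configurations where the optimal permutation is not unique, which are exactly the points where the dissipative effects discussed in the paper appear.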
The outline of the paper is the following. In Section 2 we show how to derive MAG starting from a finite number of Brownian particles. This will be done in several steps and we do not want to enter into the details now, but a key argument will be the Γ-convergence of the good rate functions associated with a family of SDEs towards an "effective" functional related to MAG. This is stated in Theorem 5, which is our main result. Section 3 is dedicated to the proof of Theorem 5. The effective functional that we obtain does not lead exactly to MAG as stated in (4) (which, as we already saw in footnote 1, is not well-posed in general), but also includes dissipative phenomena at those points $X$ where the solution of the discrete optimal transport problem (3) is not unique. Even if we do not know for the moment how to treat these dissipative effects in general, the purpose of Section 4 is to show that in one space dimension, they lead to sticky collisions.
Notations. We will work with $N$ particles in $\mathbb{R}^d$, and hence in $(\mathbb{R}^d)^N$. Points of $(\mathbb{R}^d)^N$ will be denoted by capital letters, mainly $X$, $Y$ or $Z$. Curves with values in $(\mathbb{R}^d)^N$ will be denoted by calligraphic letters $\mathcal{X}$, $\mathcal{Y}$ or $\mathcal{Z}$. The position of $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ at time $t \in \mathbb{R}$ will be denoted by $X_t$, $Y_t$ and $Z_t$ respectively.
2 Derivation of the discrete model

The stochastic model of a lattice with Brownian agitation
Take $A = (a_1, \dots, a_N) \in (\mathbb{R}^d)^N$ a family of $N \ge 1$ points in $\mathbb{R}^d$. We assume each point of this lattice to be subject to Brownian agitation for times $t \ge 0$. At time $t$, the position of point $i$ is $a_i$ displaced by a rescaled Brownian curve $B_i$, where $(B_i)_{i=1,\dots,N}$ is a family of $N$ independent normalized Brownian curves and $\varepsilon$ monitors the (common) level of noise. As a consequence, at time $t > 0$, the density of probability $\rho_\varepsilon(t, X)$ for the point cloud, up to a permutation $\sigma \in S_N$ of the labels, is easy to compute. We find a sum over $\sigma \in S_N$ of Gaussian densities in $X - A_\sigma$, where $|\cdot|$ denotes the Euclidean norm in $\mathbb{R}^d$ or $(\mathbb{R}^d)^N$ depending on the context, and where for all $X = (x_1, \dots, x_N) \in (\mathbb{R}^d)^N$, $X_\sigma$ stands for $X_\sigma := (x_{\sigma(1)}, \dots, x_{\sigma(N)})$. This was the starting point of the discussion made in [6], using a double large deviation principle.
In the present paper, we rather turn to a PDE viewpoint, where ρ ε is the solution of the heat equation in with, as initial condition, the delta measure located at A = (a 1 , . . ., a N ) ∈ (R d ) N and symmetrized with respect to σ ∈ S N , namely: In some sense, we have solved the heat equation in the space of "point clouds" (R d ) N /S N , with initial position A, defined up to a permutation σ ∈ S N of the labels i = 1, . . ., N .

"Surfing" the "heat wave"
After solving the heat equation in the space of "clouds" (R d ) N /S N (5)(6), we introduce the companion ODE in the space (R d ) N : or, more explicitly , where if U and V are in (R d ) N , U • V denotes the inner product between U and V .This velocity is chosen so that i.e. for the density ρ ε to be transported by the velocity field v ε .We may solve this ODE for arbitrarily chosen position X t0 ∈ (R d ) N and initial time t 0 > 0. In other words, we let the set of N "particles" X t = (x 1 (t), . . .x N (t)) ∈ (R d ) N "surf" the "heat wave" generated by the lattice subject to Brownian agitation!By doing that, we just mimic the idea of quantum particles driven by the "onde pilote", as imagined by Louis de Broglie [9] at the early stage of Quantum Mechanics.
Remark 1.In that case, we would use the same ODE with v = ε∇Im log ψ, ψ solving the Schrödinger equation.For instance, we could consider the free Schrödinger equation instead of the heat equation: with initial condition chosen according to "bosonic statistics".However, in the quantum case, the analysis gets substantially more difficult, due to the possible vanishing of the wave function ψ during the evolution.

Large deviations of the "heat wave" ODE
Let us go back to the "heat wave" ODE and add a noise of the following type: where η is a positive number and α is a smooth function from R * + to R * + .In other words, our "surfers" are now subject to some additional agitation, while surfing on the heat wave generated by the lattice already under Brownian agitation!
We will see that when η and ε are small, then the trajectories charged by the solution of this SDE that are in P ∈ (R d ) N at time t 0 > 0 and in Q ∈ (R d ) N at time t 1 > t 0 (up to ordering) are very close to the dynamic of MAG.Notice that the level of noise depends on time through the function α(t).It will be crucial in Subsection 2.5 since we will only recover MAG after suitable change of time.
Since, for fixed ε > 0 and t > 0, v ε is a smooth velocity field, existence of a strong solution and pathwise uniqueness for (7) is standard once fixed a law for the initial position X ε,η t0 , t 0 > 0. Furthermore, we may pass to the limit η → 0, while ε > 0 is kept fixed, in the sense of large deviation: A direct application of classical Freidlin-Wentzell theory [11,10] leads to: Theorem 2. Let us fix P, Q ∈ (R d ) N the endpoints of our trajectories, up to ordering, and 0 < t 0 < t 1 two positive times.For fixed ε and as η ↓ 0, the law of the solution of (7) between times t 0 and t 1 starting from P and conditioned to arrive in Q (up to ordering) satisfies the large deviation principle on C 0 ([t 0 , t 1 ]; (R d ) N ) of good rate function L ε defined for all X = (X t ) t∈[t0,t1] by: where here and in the rest of the article, we denote by {P σ } and {Q σ } the sets {P σ , σ ∈ S N } and {Q σ , σ ∈ S N } respectively.
In the rest of the article, we will call $L_\varepsilon$ the Freidlin-Wentzell action instead of the usual terminology "good rate function". Also, the endpoints $P$ and $Q$ are fixed once and for all, so we do not write explicitly the dependence of $L_\varepsilon$ on them. Theorem 2 asserts in particular that when $\varepsilon$ is fixed and $\eta$ is small, if $X^{\varepsilon,\eta}$ solves (7), given $X^{\varepsilon,\eta}_{t_0} \in \{P_\sigma\}$ and $X^{\varepsilon,\eta}_{t_1} \in \{Q_\sigma\}$, $X^{\varepsilon,\eta}$ is with very high probability close to the minimizers of $L_\varepsilon$. We will now see that these functionals converge as $\varepsilon \downarrow 0$, in the sense of Γ-convergence (and hence in the sense of convergence of minimizers as well), to a functional whose minimizers follow the dynamic of MAG. As a consequence, when both $\eta$ and $\varepsilon$ are small, given $X^{\varepsilon,\eta}_{t_0} \in \{P_\sigma\}$ and $X^{\varepsilon,\eta}_{t_1} \in \{Q_\sigma\}$, $X^{\varepsilon,\eta}$ is close with high probability to the dynamic of MAG.
Remark 3. Here, we chose to present the result for point clouds, i.e. when the particles are still indistinguishable. However, the theorem could also be stated replacing the conditioning on $X_{t_0} \in \{P_\sigma\}$ and $X_{t_1} \in \{Q_\sigma\}$ by $X_{t_0} = P$ and $X_{t_1} = Q$. In other words, reintroducing distinguishable particles at this stage would not affect the results of this section (neither Theorem 2 nor Theorem 5 below). We decided to keep working with clouds in order to avoid crossings of particles in Section 4.

The convergence result
Define the following smooth convex function (see Lemma 9 below): It has the property that for all ε > 0, t > 0, and As a consequence, denoting by β the smooth function 1/α 2 , we can rewrite L ε for all ε > 0 as: When ε tends to zero, by virtue of the so-called Laplace's principle, we have the pointwise convergence: The function f no longer depends on the time variable, and it is a convex function with finite values.As a consequence, for each X ∈ (R d ) N , the subdifferential ∂f (X) of f at X is non-empty.We will consider the extended gradient ∇f (X) of f at X defined as: Definition 4 (Extended gradient).We call extended gradient of a real valued convex function h at X, denoted by ∇h(X), the element of ∂h(X) with minimal Euclidean norm.
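The Laplace principle invoked above can be illustrated on a toy example in dimension $d = 1$. The sketch below assumes that the smooth function has a log-sum-exp form, $\varepsilon \log\big(\tfrac{1}{N!}\sum_\sigma e^{X \cdot A_\sigma/\varepsilon}\big)$, with the time variable frozen, and that the limit is $f(X) = \max_\sigma X \cdot A_\sigma$. Both the normalization and the function names are assumptions of this illustration, not the paper's exact definition.

```python
import math
from itertools import permutations


def max_correlation(x, a):
    """f(X) = max over permutations sigma of sum_i x_i * a_sigma(i) (d = 1)."""
    return max(
        sum(xi * a[s[i]] for i, xi in enumerate(x))
        for s in permutations(range(len(x)))
    )


def soft_max_correlation(x, a, eps):
    """eps * log((1/N!) * sum_sigma exp(X . A_sigma / eps)), computed stably."""
    values = [
        sum(xi * a[s[i]] for i, xi in enumerate(x)) / eps
        for s in permutations(range(len(x)))
    ]
    top = max(values)
    # Subtract the largest exponent before exponentiating to avoid overflow.
    log_mean = top + math.log(sum(math.exp(v - top) for v in values) / len(values))
    return eps * log_mean


# Hypothetical data, used only as an illustration.
x = [0.3, -1.2, 0.7]
a = [0.0, 1.0, 2.0]
for eps in (1.0, 0.1, 0.01):
    print(eps, soft_max_correlation(x, a, eps), max_correlation(x, a))
```

The smoothed value always lies between $f(X) - \varepsilon \log N!$ and $f(X)$, which gives the pointwise convergence with an explicit rate in this toy setting.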
Here is our Γ-convergence result: Theorem 5.As ε tends to 0, the family of actions for the topology of uniform convergence of C 0 ([t 0 , t 1 ]; (R d ) N ).
Theorem 5 can be seen as the main result of this article. In particular, it implies that any limit point as $\varepsilon \downarrow 0$ of a sequence of minimizers of $L_\varepsilon$ is a minimizer of $L$. So, one can rigorously obtain an effective action to describe the double limit $\lim_{\varepsilon \downarrow 0} \lim_{\eta \downarrow 0}$ for the solution of the SDE (7). Notice that the lower semi-continuity of $L$ is a direct corollary of the Γ-convergence. In addition, the fact that $L$ has compact sublevels will be clear from the proof. Hence, the existence of global minimizers for $L$ (and hence for all the forthcoming functionals) follows from the direct method of calculus of variations.
We will prove Theorem 5 in Section 3 below, but before doing so, let us show that for a specific choice of β, we recover MAG.

A regime where Monge-Ampère gravitation arises
Let us take $\beta(t) := t$, which corresponds to $\alpha(t) := 1/\sqrt{t}$. Through the change of variable: we observe that for all $\mathcal{X} \in C^0([t_0, t_1]; (\mathbb{R}^d)^N)$, $L(\mathcal{X}) = \Lambda(\mathcal{Z})$ with: (Recall the definition (9) of $f$.) Unexpectedly, this action is exactly the one previously suggested by the third author in [5] to include dissipative phenomena (such as sticky collisions in one space dimension) in the Monge-Ampère gravitational model! It turns out to be equivalent to the following one: (By expanding the square and remarking that the mixed product is an exact time derivative, so that its integral only involves the endpoints $P$ and $Q$.)

Application of the least action principle
We observe that the points Z where f is differentiable are those for which the maximum in the definition (9) of f is reached by a unique permutation σ opt so that ∇f (Z) is nothing but A σopt .For such points Z, we get (by definition of f and using that |A σ | = |A| for any σ ∈ S N ), while, on the set N of non-differentiability of f , we rather have So the action we have obtained in the previous section, namely Λ , bounds from below The second action is definitely strictly larger than the first one for those curves θ → Z θ which take values in N (where f is not differentiable) on a set of times θ ∈ [θ 0 , θ 1 ] which is not negligible for the Lebesgue measure.So, the least action principle may provide different optimal curves, depending on the action we choose.However, if a curve is optimal for Λ and almost surely takes value outside of N , then it must also be optimal for Λ + .Clearly, it is much easier to get the optimality equation for such a curve, by working with Λ + rather than with Λ .By varying action Λ + , we get, as optimality equation, which is the discrete dynamic announced in the introduction.
Of course, these equations have to be suitably modified for those curves which are optimal for the action $\Lambda$ but not for $\Lambda_+$ because they take values in $\mathcal{N}$ for a non-negligible amount of time. At this stage, we do not know how to do it. However, at least in the one-dimensional case $d = 1$, such modifications are tractable and correspond to sticky collisions, as $x_i(t) = x_j(t)$ occurs for different "particles" of labels $i \neq j$ during time intervals of strictly positive Lebesgue measure; see Section 4.
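The gap between the two potentials can be made concrete in dimension 1. The sketch below relies on the description of the extended gradient given in Lemma 16 of Section 4 (for ordered $X$, the projection of $A$ onto vectors that are constant on each group of coinciding coordinates), and assumes that the two potentials are $|X - \nabla f(X)|^2$ and $|X|^2 - 2f(X) + |A|^2$ with $f(X) = X \cdot A$ for ordered $X$ and $A$. Function names and sample data are hypothetical.

```python
def extended_gradient_1d(x, a):
    """Average of a over each cluster of equal entries of x.

    Assumes x is nondecreasing and a is strictly increasing (d = 1).
    """
    grad = [0.0] * len(x)
    start = 0
    while start < len(x):
        end = start
        while end + 1 < len(x) and x[end + 1] == x[start]:
            end += 1
        mean = sum(a[start:end + 1]) / (end - start + 1)
        for i in range(start, end + 1):
            grad[i] = mean
        start = end + 1
    return grad


def potential_lambda(x, a):
    """|X - grad f(X)|^2, computed with the extended gradient."""
    g = extended_gradient_1d(x, a)
    return sum((xi - gi) ** 2 for xi, gi in zip(x, g))


def potential_lambda_plus(x, a):
    """|X|^2 - 2 f(X) + |A|^2, with f(X) = X . A for ordered x and a."""
    f = sum(xi * ai for xi, ai in zip(x, a))
    return sum(xi * xi for xi in x) - 2.0 * f + sum(ai * ai for ai in a)


a = [0.0, 1.0, 2.0]
separated = [0.2, 0.5, 0.9]   # all coordinates distinct: f is differentiable
stuck = [0.5, 0.5, 0.9]       # two particles at the same place
print(potential_lambda(separated, a), potential_lambda_plus(separated, a))
print(potential_lambda(stuck, a), potential_lambda_plus(stuck, a))
```

For the separated configuration the two values agree, while for the stuck one the first potential is strictly smaller, the difference being $|A|^2 - |\nabla f(X)|^2$.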

Proof of the Γ−convergence
The purpose of this section is to prove Theorem 5.

The proof as a consequence of three lemmas
As we will see, Theorem 5 will be a consequence of three lemmas that we state below.Lemmas 7 and 8 both involve a family of smooth functions (g ε ) ε>0 on [θ 0 , θ 1 ] × R p for some θ 0 < θ 1 and p ∈ N, pointwise converging to a function g.On these functions, we will assume the following: (H4) The map ∇g ε is uniformly bounded, that is: (H5) The map ∂ θ ∇g ε is uniformly bounded, that is: In order to keep the proofs simple, we did not try to optimize these assumptions for Lemmas 7 and 8, which are probably true in a far more general context.However, as we will see in the proof of Theorem 5, it suffices to check these assumptions for the family (f ε ) ε>0 after suitable change of temporal and spatial scale.This is done in Lemma 9.
In the next subsections, we will prove these three lemmas one by one.The most involved one is undoubtedly Lemma 8, which can be seen as the main step in the proof of Theorem 5. Let us start by proving Theorem 5 using Lemmas 7, 8 and 9.
Proof of Theorem 5.In this proof, the notation X = X t will stand for a generic curve from we define in the same way the family of corresponding curves ) and (g ε ) ε>0 , g as defined in Lemma 9, we have: and: (Note that due to Lemma 9, g is convex with respect to the space variable, and so ∇g is well defined.)Proof of the Γ − lim inf.Let X ε → ε→0 X for the topology of uniform convergence.Of course, we also have Without loss of generality, we can suppose Indeed, if the lim inf of this quantity is infinite, there is nothing to prove, and if the lim inf is finite, up to an extraction, we can reduce ourselves to the case where the sup is finite.
As ∇g ε (θ, Y ) is bounded uniformly in ε, θ, Y (this is (H4)), we easily deduce with ( 14) that this assumption implies In particular, by lower semi-continuity of this H 1 seminorm with respect to uniform convergence, all the curves Y ε , ε > 0 as well as In particular, applying Lemma 7 thanks to Lemma 9, we have: On the other hand, it is clear that under (17), for ε > 0 sufficiently small, the endpoints of X ε are stationary, that is X ε t0 = P σ0 and X ε t1 = Q σ1 with σ 0 , σ 1 independent of ε.So for such ε, Y ε satisfies the endpoint constraint for K ε with R := P σ0 / √ t 0 and S := Q σ1 / √ t 1 .Hence, applying Lemma 8 thanks to Lemma 9, we have: The result follows easily by gathering ( 15), ( 18), ( 19) and (16).Proof of the Γ − lim sup.Let X ∈ C 0 ([t 0 , t 1 ]; (R d ) N ).Without loss of generality, we can suppose that ) and that it satisfies the endpoint constraint for L. In particular, Y belongs to ) and satisfies the endpoint constraint for K with R := X t0 / √ t 0 and S := X t1 / √ t 1 .Lemmas 8 and 9 let us find a family (Y ε ) ε>0 converging to the corresponding Y such that: In particular Y ε is in H 1 for sufficiently small ε, and by Lemmas 7 and 9, The result follows easily from ( 15), ( 20), ( 21) and ( 16), by noticing that because of (20), Y ε satisfies the endpoint constraint for K ε .Hence for such ε, X ε satisfies the endpoint constraint for L ε .

Proof of Lemma 7
The proof of Lemma 7 just consists in integrating by parts and using the convergence properties of (g ε ) ε>0 .
Proof of Lemma 7. Integration by parts.First, notice that as soon as Y ∈ H 1 ([θ 0 , θ 1 ]; R p ) and ε > 0, then θ → g ε (θ, Y θ ) and θ → g(θ, Y θ ) are also in H 1 , with for almost every θ: It is clear in the case of g ε because g ε is smooth, and it is the assumption (H3) in the case of g.As a consequence, by an integration by parts, it suffices to prove that whenever (Y ε ) ε>0 converges to Y as ε → 0 for the topology of uniform convergence, Convergence term by term.The convergence is an easy consequence of the pointwise convergence and of the uniform Lipschitz bound (H4).
For the same reason, we have for all θ But on the other hand, because of (H1) and (H4), g ε is locally bounded, uniformly in ε.Hence, is a consequence of the dominated convergence theorem.

Proof of Lemma 8
Before entering the proof of Lemma 8, we need to state a few standard results concerning the extended gradient ∇ as defined in Definition 4, and its links with the so-called resolvent map.These tools could even be set in the infinite dimensional setting, that is in Hilbert spaces [14], or in metric spaces [2], and we refer to these works for the proofs.Consider h : R p → R a convex function.It is easily shown that for all The following proposition, that we state without a proof, is an easy consequence of this formula and of the elementary fact that in finite dimension, pointwise convergence of convex functions to a finite valued convex functions implies Γ−convergence.
Proposition 10.Let (h ε ) ε>0 be a family of convex functions on R p pointwise converging to h, and let (X ε ) ε>0 be a family of points in R p converging to X. Then For τ > 0 and X ∈ R p , define the resolvent operator by: Once again, the following proposition is standard, and we state it here without a proof.
1. We have for all X ∈ R p and τ > 0: 2. If h is everywhere differentiable and X ∈ R p , then the following first order condition holds: 3. If (h ε ) ε>0 is a family of convex functions on R p pointwise converging to h, then for all τ > 0 and We are now ready for the proof of Lemma 8.
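As a concrete instance of the resolvent map, one may take $h(x) = |x|$ on $\mathbb{R}$. The sketch below assumes the standard definition $J_\tau(x) = \operatorname{arg\,min}_y \big\{ h(y) + |y - x|^2/(2\tau) \big\}$ (the defining formula is not reproduced here, so this is an assumption), for which the resolvent is the soft-thresholding map. It checks three standard facts in this example: $(x - J_\tau(x))/\tau$ is no larger than the extended gradient in absolute value, it recovers the extended gradient as $\tau \to 0$, and $J_\tau$ is 1-Lipschitz.

```python
def resolvent_abs(x, tau):
    """J_tau(x) = argmin_y |y| + (y - x)^2 / (2 tau): soft thresholding."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def yosida_slope(x, tau):
    """(x - J_tau(x)) / tau, an element of the subdifferential of |.| at J_tau(x)."""
    return (x - resolvent_abs(x, tau)) / tau


def extended_gradient_abs(x):
    """Minimal-norm element of the subdifferential of |.| at x (0 at the kink)."""
    return float((x > 0) - (x < 0))


for tau in (0.5, 0.1, 0.01):
    # At x = 0.05 the slope is x / tau while tau >= 0.05, then equals sign(x) = 1.
    print(tau, yosida_slope(0.05, tau), extended_gradient_abs(0.05))
```

At the kink $x = 0$ the slope is $0$ for every $\tau$, which is the minimal-norm subgradient, in line with Definition 4.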

Proof of
and correspondingly: First, we prove: We will then choose τ as a function of ε and show how to fix the endpoints.Proof of (24).By the second point of Proposition 11, for all ε, τ, θ, we have: Using the smoothness and convexity of g ε , and Y ∈ H 1 , we easily deduce that Y τ,ε is in H 1 and that for almost all θ, . By convexity of g ε , we have I ≤ I + τ D 2 g ε in the sense of symmetric matrices, and hence: Recall that M was defined in the uniform integrability assumption (12) on ∂ θ ∇g ε .(In the case when ∂ θ ∇g ε = 0, we recover the known fact that for h independent of time, J τ,h is contractive.)Then, we deduce: Formula (24) follows.

Proof of Lemma 9
The proof is straightforward, and relies on explicit computations.
Proof of (H2).By (30), it suffices to check that h is convex.Differentiating twice (29), we get for all X ∈ (R d ) N : where if a is a function of σ, a(σ) X stands for: .
It follows that D 2 h(X) is a nonnegative symmetric matrix.Proof of (H3).By the definitions (9) of f and (13) of g, we have for all θ ∈ [θ 0 , θ 1 ] and Y ∈ (R d ) N : The convexity is obvious, let us check (10).Let us consider Y ∈ H 1 ([θ 0 , θ 1 ]; (R d ) N ).The function g is clearly locally Lipschitz in both θ and Y .As a consequence, the map G : θ → g(θ, Y θ ) is also H 1 .Let us take θ ∈ (θ 0 , θ 1 ) a point where both Y and G are differentiable (this happens for almost every θ).We have: to get the second line.In the same way, we have: The result follows from gathering these two inequalities.Proof of (H4).In view of (30) and as θ 0 > −∞, it suffices to check that ∇h is bounded.Differentiating (29) at X ∈ (R d ) N leads to: which is clearly bounded by |A|.Proof of (H5).Using (30), we get for all ε > 0, θ ∈ [θ 0 , θ 1 ] and Y ∈ (R d ) N : .
As we already saw in (H4) that ∇h is bounded, it suffices to prove that X → D 2 h(X) • X is bounded.Let us expand everything in (31) and apply X to the right.We get: .
As a consequence, it suffices to show that for each σ, η ∈ S N , is bounded, uniformly in X.First, if η = σ, then T (σ, σ, X) = 0. Else, let us use the bound: obtained by only keeping the terms corresponding to σ = η = σ and σ = η = η in the sum.This leads to: , which is clearly bounded uniformly in X.The result follows.

The case of dimension 1: sticky collisions
In this section, we will study the global minimizers of the functional $\Lambda$ obtained in Subsection 2.5, in dimension $d = 1$. If we call $t$ the time variable and if we replace $\theta_0$ and $\theta_1$ by $0$ and $T$ respectively, which is possible due to the invariance of the functional under translation in time, $\Lambda$ reads: where: Here, we chose a strictly ordered $A = (a_1, \dots, a_N)$, that is, such that $a_1 < \dots < a_N$. Once again, when $X = (x_1, \dots, x_N) \in \mathbb{R}^N$ and $\sigma \in S_N$, $X_\sigma := (x_{\sigma(1)}, \dots, x_{\sigma(N)})$, and $\{P_\sigma\}$ and $\{Q_\sigma\}$ refer to $\{P_\sigma, \sigma \in S_N\}$ and $\{Q_\sigma, \sigma \in S_N\}$ respectively. Of course, $P = (p_1, \dots, p_N)$ and $Q = (q_1, \dots, q_N)$ can be supposed to be ordered, that is, $p_1 \le \dots \le p_N$ and $q_1 \le \dots \le q_N$. We recall that we defined the extended gradient $\nabla f$ in Definition 4. As already noticed in Subsection 2.4, the existence of global minimizers for $\Lambda$ follows from the direct method of calculus of variations. Uniqueness does not hold in general, even up to permutations. The purpose of the section is twofold. On the one hand, we will show that the model has nice regularity properties: any global minimizer of $\Lambda$ is smooth except on a finite number of "sticking" or "separation" times. On the other hand, we will justify, as claimed in Section 2, that $\Lambda$ describes a model with sticky collisions in the sense that a minimizer $Z = (z_1(t), \dots, z_N(t))$ of $\Lambda$ will typically exhibit some sticking effects, as $z_i(t) = z_j(t)$ for $i \neq j$ on non-trivial intervals.
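To describe the sticking effect, it is convenient to introduce the following definition: Definition 12 (Partition of $\llbracket 1, N \rrbracket$). Let $X = (x_1, \dots, x_N) \in \mathbb{R}^N$. We say that $X$ is divided according to $\pi(X)$ when $\pi(X)$ is the partition of $\llbracket 1, N \rrbracket$ induced by the relation: $i \sim j$ if and only if $x_i = x_j$. We call $C(X, i)$ the class of $i \in \llbracket 1, N \rrbracket$ in $\pi(X)$.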
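The partition of Definition 12 is straightforward to compute. The following sketch assumes that two labels are in the same class exactly when the corresponding coordinates coincide, and uses labels starting at 1 as in the text. The function names are hypothetical, and exact floating-point equality is used, which is adequate for an illustration but would need a tolerance in a numerical simulation.

```python
def partition_of_indices(x):
    """Classes of labels i, j with x_i == x_j, sorted by smallest label."""
    classes = {}
    for label, value in enumerate(x, start=1):
        classes.setdefault(value, []).append(label)
    return sorted(classes.values())


def class_of(x, i):
    """The class C(X, i) of label i (labels start at 1)."""
    return [j for j, value in enumerate(x, start=1) if value == x[i - 1]]


# Particles 1 and 3 are stuck together; particles 2 and 4 are isolated.
X = [0.0, 1.0, 0.0, 2.0]
print(partition_of_indices(X))
print(class_of(X, 3))
```

A configuration has no stuck particles exactly when every class is a singleton, that is, when the partition has $N$ elements.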
The main result of the section is the following result: Theorem 13 (Regularity of the optimal trajectories).For given A, P, Q ∈ R N and T > 0 as before, let Z be a global minimizer of Λ defined in (32).Then Z is continuous and there exist: a family of times such that for each i = 1, . . ., p, Z is smooth on [t i−1 , t i ], and π(Z) is constant on (t i−1 , t i ).
It will be quite clear from the proof that sticking effects do occur. This exactly means that there exist trajectories $Z$ for which, with the notations of Subsection 2.6, $\Lambda(Z) < \Lambda_+(Z)$. For such trajectories, $Z_t$ is located on the set where $f$ is not differentiable for a set of times of positive Lebesgue measure. But in dimension 1, this set is exactly the set where at least two particles are located at the same place. In other words, the set of times when $\pi(Z) \neq \{\{1\}, \dots, \{N\}\}$ is typically of positive Lebesgue measure. As a consequence of Theorem 13, it is even a finite union of intervals.
Still it might be convenient to illustrate the sticking effects included in the model by the following easy proposition.It asserts that the set of times when all the particles are stuck is an interval: if all the particles are stuck at two different times, the cheapest behaviour between these two times is to remain stuck.It also shows that this phenomenon occurs: if all the particles are sufficiently close at the initial and final time, then they necessarily stick together during a non-trivial interval along the evolution.

Proposition 14 (Intervals of full degeneration).
1.For given A, P, Q ∈ R N and T > 0 as before, let Z = (z 1 (t), . . ., z N (t)) be a global minimizer of Λ .Suppose there exist two times 0 ≤ t 1 < t 2 ≤ T such that: Then for all t 2. For given A ∈ R N and T > 0 as before, the set U of endpoints P, Q ∈ R N with the property that for all minimizer Z = (z 1 (t), . . ., z N (t)) of Λ , the set of times: The proof of Proposition 14 uses almost nothing and is given in Subsection 4.2.Except for that, the whole section is dedicated to the proof of Theorem 13.For this we take once for all A, P, Q ∈ R N and T > 0, A being strictly ordered and P, Q being ordered.
Even if all the arguments are elementary, we will need a certain number of steps, including the explicit computation of the potential $|X - \nabla f(X)|^2$ (Subsections 4.1 and 4.4) and the justification of a priori knowledge on the optimal trajectories: they can be supposed to be ordered at all times (Subsection 4.3), and the conservation of energy and momentum holds during shocks (Subsection 4.5). The main ingredient in the proof of Theorem 13 is an estimate given in Subsection 4.6: during a non-pathological shock (pathological shocks are excluded a posteriori), at least one particle has a jump in its velocity that is bounded from below (Proposition 23). We finally provide the proof of Theorem 13 in Subsection 4.7.
Throughout the section, we will work with several types of finite sets: the partitions of type $\pi(X)$ and the classes of particles of type $C(X, i)$. Some of the arguments or computations will deal with their cardinality. Thus, if $F$ is a finite set, we will denote by $\#F$ its cardinality.

Properties of the extended gradient
The extended gradient of f can be computed explicitly In Lemma 16, we gather easy properties of ∇f that will be needed in the following.Before doing so, let us introduce some notations.Definition 15.Let π be a partition of 1, N .We call E π the linear subspace of R N of all X s.t.π is a refinement of π(X), that is: Here is the lemma: Lemma 16 (Properties of ∇f ).
1.The extended gradient ∇f has the following symmetry: 2. The function X → |X − ∇f (X)| is symmetric: 3. If X is ordered, then ∇f (X) is the orthogonal projection of A on E π(X) .
Proof.Point 1.Let σ ∈ S N .By the definition (33) of f , for all X ∈ R N , f (X σ ) = f (X).Calling I σ : X → X σ , we easily deduce that at the level of subdifferentials: . We conclude by the fact that I σ is orthogonal.Point 2. It is a direct consequence of Point 1. Point 3. Let X = (x 1 , . . .x N ) ∈ R N be an ordered vector.Considering the definition (33) of f and noticing that the maximum is achieved exactly for those σ such that X σ = X, it appears that ∇f (X) belongs to the convex hull: For a given i ∈ {1, . . ., N }, we call V i ∈ R N the vector whose j-th coordinate is 1 if j ∈ C(X, i) and 0 otherwise.On the one hand, we have E π(X) = Span{V i | i = 1, . . ., N }, and on the other hand, for all i, the scalar product V i • Y is constant on the above-mentioned convex hull.So we deduce: Hence, we just have to prove that ∇f (X) ∈ E π(X) .If i, j ∈ {1, . . ., N } are such that x i = x j , let us apply formula (34) to the permutation σ := (i, j): The result follows.Point 4. Let X be ordered and i ∈ {1, . . ., N }.As ∇f (X) ∈ E π(X) , with the notations of the proof of Point 3: where we used A − ∇f (X) ⊥ V i to get the first identity in the second line.
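Points 1 and 3 of Lemma 16 can be checked by brute force on small examples. The sketch below assumes $f(X) = \max_\sigma X \cdot A_\sigma$ with $A_\sigma = (a_{\sigma(1)}, \dots, a_{\sigma(N)})$, and uses the barycenter of the maximizing rearrangements of $A$ as a stand-in for the minimal-norm element of their convex hull. That the two coincide follows from a symmetry argument which is an assumption of this illustration and is not stated in the text. Function names are hypothetical.

```python
from itertools import permutations


def maximizing_rearrangements(x, a):
    """All rearrangements A_sigma of a attaining max_sigma X . A_sigma."""
    candidates = [tuple(a[s[i]] for i in range(len(a)))
                  for s in permutations(range(len(a)))]
    scores = [sum(xi * ci for xi, ci in zip(x, c)) for c in candidates]
    best = max(scores)
    return [c for c, sc in zip(candidates, scores) if abs(sc - best) < 1e-12]


def barycenter(vectors):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


a = [0.0, 1.0, 2.0]

# Ordered X with particles 1 and 2 stuck: a_1 and a_2 get averaged (Point 3).
print(barycenter(maximizing_rearrangements([0.5, 0.5, 0.9], a)))

# Permuting the coordinates of X permutes the result in the same way (Point 1).
print(barycenter(maximizing_rearrangements([0.9, 0.5, 0.5], a)))
```

In the first call the result is constant on the class $\{1, 2\}$ and equal to the mean of $a_1$ and $a_2$ there, which is the orthogonal projection of $A$ onto $E_{\pi(X)}$ described in Point 3.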
The next three subsections will be dedicated to consequences of this lemma:
• A proof of Proposition 14;
• When proving Theorem 13, it is enough to consider ordered trajectories (Proposition 18);
• For ordered trajectories, the potential in $\Lambda$ can be decomposed as the sum of a smooth "external" potential and an "internal" energy only depending on $\pi(X)$ (Proposition 19).

Proof of Proposition 14
With the help of Lemma 16, we are ready to prove Proposition 14.
Proof of Proposition 14. Point 1.Without loss of generality, we can suppose t 1 = 0 and t 2 = T , that is P = (p 1 , . . ., p N ) and Q = (q 1 , . . ., q N ) are such that It suffices to prove that when Z is a continuous trajectory joining P to Q, then Λ (Ψ(Z)) ≤ Λ (Z), and with equality if and only if Z = Ψ(Z).As Ψ is 1-Lipschitz, it reduces the kinetic part of Λ .For the potential part, we remark that for all . As a consequence, by Point 3. of Lemma 16, we have as soon as X is ordered ∇f (Ψ(X)) = Ψ(∇f (X)).Hence: with equality if and only if X ∈ E 1,N , i.e. if and only if Ψ(X) = X.This property is extended to non-ordered X using (35), and the result follows.Point 2. The function Λ = Λ (P, Q) defined for all P, Q ∈ R N as the minimal value of Λ is continuous.Indeed, if P, P , Q, Q ∈ R N are chosen so that |P − P | + |Q − Q| 1 and if Z is a trajectory joining P to Q, we can find a trajectory Z joining P to Q with:5 To do so, it suffices to choose τ ∼ |P − P | + |Q − Q|, and to define Z as the trajectory joining P to P in straight line between times 0 and τ , joining P to Q between times τ and T − τ by following Z with a proper affine change of time, and finally joining Q to Q in straight line between times T − τ and T .This shows that Λ is lower semi-continuous, but the continuity is obtained by noticing that the o in (37) is locally uniform on P, Q ∈ R N .The argument is easily adapted to show that Λ = Λ (P, Q) defined for P, Q ∈ R N by: is also continuous.Besides, the set U defined in the statement clearly satisfies: By continuity of Λ and Λ , V is an open set.Hence it remains to prove that: To do so, we take P, Q ∈ E 1,N , Z a curve joining P to Q such that {t | Z t ∈ E 1,N } is negligible, we still call Ψ the orthogonal projection on E 1,N , and we prove that where a > 0 does not depend on Z.Let us call Φ := Id − Ψ the orthogonal projection on the orthogonal of E 1,N .As in the proof of the first point, ∇f • Ψ = Ψ • ∇f .As a consequence: where Z ⊥ = Z ⊥ t := 
Φ(Z t ) is a curve joining 0 to 0. But for almost all t, Z t / ∈ E 1,N , so as we saw in the proof of the first point, ∇f (Z t ) / ∈ E 1,N .As ∇f only takes a finite number of values (see Lemma 16), for almost all t, Φ(∇f (Z t )) belongs to some finite set, say G, which does not contain 0. Hence, where dist(Z, G) denotes the distance from Z to G.Because Z ⊥ joins 0 to 0 and G does not contain 0, this last integral is easily seen to be below bounded away from 0 independently of Z, and the result follows.

Ordering of the particles
The purpose of this subsection is to show that when proving Theorem 13, we can restrict ourselves to trajectories that remain ordered (see Figure 1). This is due to the following proposition.
Proposition 18. Let $Z = Z_t$ be a global minimizer of $\Lambda$. We call $\tilde Z = \tilde Z_t$ the trajectory obtained by reordering the coordinates of $Z$ in increasing order. Then $\tilde Z$ is also a global minimizer of $\Lambda$. Moreover, $\tilde Z$ has the regularity stated in Theorem 13 if and only if $Z$ does.
In particular, $\Lambda$ always admits an ordered minimizer, and it is enough to prove Theorem 13 for such minimizers.
Thanks to this proposition, from now on, we only work with ordered minimizers of $\Lambda$. These minimizers $Z = Z_t$ satisfy in particular $Z_0 = P$ and $Z_T = Q$ (as we chose them to be ordered in the first place).
Figure 1: These two trajectories share their initial and final position up to ordering and their actions.But to the right, the order is preserved while to the left, this is not the case.
Proof. Let $Z$ and $\tilde Z$ be as in the statement of the proposition. Point 2 of Lemma 16 implies: We call $\Psi : \mathbb{R}^N \to \mathbb{R}^N$ the operator that reorders the coordinates of a vector in increasing order, so that in particular for all $t$, $\tilde Z_t = \Psi(Z_t)$. A simple application of the rearrangement inequality shows that $\Psi$ is 1-Lipschitz. In particular, it does not increase the action of curves: By adding the last two formulas, and by noticing that the endpoint constraint is fulfilled, we get $\Lambda(\tilde Z) \le \Lambda(Z)$. As $Z$ is a minimizer, this inequality is in fact an equality, and $\tilde Z$ is also a minimizer.
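The 1-Lipschitz property of the reordering operator used in this proof can be checked numerically. The following is only a minimal sketch: the function names `reorder`, `dist` and `max_lipschitz_ratio` are illustrative and do not come from the paper, and random sampling illustrates the inequality without proving it.

```python
import math
import random


def reorder(x):
    """Return the coordinates of x sorted in increasing order (the map Psi)."""
    return sorted(x)


def dist(x, y):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def max_lipschitz_ratio(n=6, trials=2000, seed=0):
    """Largest observed |Psi(x) - Psi(y)| / |x - y| over random pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        worst = max(worst, dist(reorder(x), reorder(y)) / dist(x, y))
    return worst


if __name__ == "__main__":
    # The rearrangement inequality predicts a value not exceeding 1.
    print(max_lipschitz_ratio())
```

Applied pointwise in time to the difference quotients of a trajectory, the same inequality is what makes the kinetic part of the reordered curve no larger than that of the original one.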
Note that both $Z$ and $\tilde Z$ are continuous because they have finite action. Hence, the second claim of the proposition is a consequence of the two following facts:
• For all $t \in [0, T]$, $\#\pi(\tilde Z_t) = \#\pi(Z_t)$.
• For any continuous trajectory $t \in I \mapsto X_t \in \mathbb{R}^N$ where $I$ is an interval, $t \mapsto \pi(X_t)$ is constant if and only if $t \mapsto \#\pi(X_t)$ is constant.
Indeed, in that case, $t \mapsto \pi(Z_t)$ and $t \mapsto \pi(\tilde Z_t)$ are constant on the same intervals, and the result follows. The first point and the "only if" part of the second point are trivial.
For the "if" part of the second one, we reason by contraposition: we suppose that $s \mapsto \pi(X_s)$ has a discontinuity at time $t$ and we prove that $s \mapsto \#\pi(X_s)$ also does. If $s \mapsto \pi(X_s)$ has a discontinuity at time $t$, we can find two distinct accumulation points $\pi_1$ and $\pi_2$ of $s \mapsto \pi(X_s)$ at time $t$. As for all $\pi$, the set $E_\pi$ is closed, $X_t$ belongs to $E_{\pi_1} \cap E_{\pi_2}$. But this set is nothing but $E_{\bar\pi}$ where $\bar\pi$ is the finest partition of which $\pi_1$ and $\pi_2$ are refinements, that is, the partition corresponding to the relation: In particular, $\bar\pi$ is a refinement of $\pi(X_t)$ and as $\pi_1 \neq \pi_2$, we easily get: So $s \mapsto \#\pi(X_s)$ has a discontinuity at time $t$, and the result follows.
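To make the partition argument concrete, here is a small sketch that computes $\pi(X)$ and the finest partition of which two given partitions are both refinements. It assumes that $\pi(X)$ groups the indices carrying equal coordinates; indices are 0-based here, unlike in the text, and the helper names `partition_of` and `common_coarsening` are illustrative.

```python
def partition_of(x):
    """Partition pi(x) of the indices {0, ..., N-1}: i ~ j iff x[i] == x[j]."""
    classes = {}
    for i, value in enumerate(x):
        classes.setdefault(value, set()).add(i)
    return {frozenset(c) for c in classes.values()}


def common_coarsening(pi1, pi2):
    """Finest partition of which both pi1 and pi2 are refinements.

    Two indices end up in the same class as soon as they are linked by a
    chain of classes of pi1 or pi2 (union-find on the indices).
    """
    parent = {i: i for c in pi1 for i in c}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for c in list(pi1) + list(pi2):
        members = sorted(c)
        for j in members[1:]:
            parent[find(j)] = find(members[0])

    merged = {}
    for i in parent:
        merged.setdefault(find(i), set()).add(i)
    return {frozenset(c) for c in merged.values()}


if __name__ == "__main__":
    pi1 = partition_of([0, 0, 1, 2])   # classes {0,1}, {2}, {3}
    pi2 = partition_of([0, 1, 1, 2])   # classes {0}, {1,2}, {3}
    joined = common_coarsening(pi1, pi2)
    # Two distinct partitions with the same number of classes merge into a
    # partition with strictly fewer classes.
    print(len(pi1), len(pi2), len(joined))
```

In this example both accumulation points have three classes while the merged partition has two, which is the drop in the number of classes that the proof exploits.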

Decomposition of the potential
Here, we compute explicitly the values of the potential $X \mapsto |X - \nabla f(X)|^2$ on ordered vectors $X \in \mathbb{R}^N$. Notice that for such vectors $X$, $\pi(X)$ has an additional structure: if $C \in \pi(X)$, then $C$ is an interval of integers. We say that such partitions are ordered. We prove the following: where $h$ is defined on a partition $\pi$ of $\llbracket 1, N \rrbracket$ by: In particular, $h$ has the following monotonicity property: if $\pi$ and $\pi'$ are two ordered partitions and if $\pi'$ is a strict refinement of $\pi$, then $h(\pi) < h(\pi')$.
The more particles are stuck together, the lower $h$. This is why $\Lambda$ favours the sticking of particles. The function $-h$ can be understood as the internal energy of the system.
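The monotonicity of $h$ can be illustrated on a small example. The sketch below uses the characterization given in the proof of Proposition 19, namely that $h(\pi)$ is the squared norm of the orthogonal projection of $A$ on $E_\pi$, and it assumes that $E_\pi$ is the space of vectors that are constant on each class of $\pi$, so that the projection averages $A$ over each class. The vector `a` and the partitions are illustrative choices, with 0-based indices.

```python
from fractions import Fraction


def h(partition, a):
    """Squared norm of the projection of a onto vectors constant on each class.

    The projection replaces a[i] by the mean of a over the class of i, so its
    squared norm is the sum over classes of (sum of a over the class)^2 / size.
    """
    total = Fraction(0)
    for block in partition:
        s = sum(Fraction(a[i]) for i in block)
        total += s * s / len(block)
    return total


if __name__ == "__main__":
    a = [-3, -1, 1, 3]                       # strictly ordered, as A is
    chain = [
        [[0], [1], [2], [3]],                # no particles stuck
        [[0, 1], [2], [3]],                  # two particles stuck
        [[0, 1, 2], [3]],                    # three particles stuck
        [[0, 1, 2, 3]],                      # all particles stuck
    ]
    # Each partition is a strict refinement of the next one, so the values
    # of h should strictly decrease along the chain.
    print([h(p, a) for p in chain])
```

Exact rational arithmetic is used so that the strict inequalities are not blurred by rounding.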
Dropping the constant term |A| 2 /2 in (38) and defining Λ on a trajectory Z by: it is clear that Λ and Λ have the same minimizers in the class of ordered trajectories.Hence, as a consequence of Proposition 18, it suffices to prove the conclusion of Theorem 13 for the minimizers of Λ in the class of ordered trajectories.
Proof of Proposition 19. Let $X \in \mathbb{R}^N$ be an ordered vector. By Point 3 of Lemma 16, we have $A - \nabla f(X) \in E_{\pi(X)}^\perp$ and both $X$ and $\nabla f(X)$ belong to $E_{\pi(X)}$. So using the Pythagorean theorem twice, we get: The identities (38) and (39) are obtained by computing $|\nabla f(X)|^2$ using (36). To recap, $h(\pi)$ is the squared norm of the orthogonal projection of $A$ on $E_\pi$. But if $\pi'$ is a refinement of $\pi$, $E_\pi \subset E_{\pi'}$, and hence $h(\pi) \le h(\pi')$. The strict inequality is obtained by noticing, with the help of (36) and using the strict ordering of $A$, that if in addition $\pi$ and $\pi'$ are ordered and $\pi' \neq \pi$, then the projection of $A$ on $E_{\pi'}$ does not belong to $E_\pi$.
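The double use of the Pythagorean theorem can be checked on an example. The sketch below assumes, as the two orthogonality relations of the proof suggest, that for an ordered $X$ the vector $\nabla f(X)$ is the orthogonal projection of $A$ on $E_{\pi(X)}$, computed here by averaging $A$ over each class. It then verifies the identity $|X - \nabla f(X)|^2 = |X - A|^2 - |A|^2 + |\nabla f(X)|^2$, which follows from these two relations; whether this matches the exact normalization of (38) is not checked here. All function names are illustrative.

```python
from fractions import Fraction


def classes_of(x):
    """Classes of pi(x): maximal runs of equal coordinates of an ordered x."""
    blocks, start = [], 0
    for i in range(1, len(x) + 1):
        if i == len(x) or x[i] != x[start]:
            blocks.append(range(start, i))
            start = i
    return blocks


def project(a, blocks):
    """Orthogonal projection of a onto E_pi: average a over each class."""
    out = [Fraction(0)] * len(a)
    for block in blocks:
        mean = Fraction(sum(a[i] for i in block), len(block))
        for i in block:
            out[i] = mean
    return out


def sq_norm(v):
    """Squared Euclidean norm, in exact rational arithmetic."""
    return sum(Fraction(c) ** 2 for c in v)


def check(x, a):
    """Return both sides of |x - g|^2 = |x - a|^2 - |a|^2 + |g|^2."""
    g = project(a, classes_of(x))  # plays the role of grad f(x) (assumption)
    lhs = sq_norm([xi - gi for xi, gi in zip(x, g)])
    rhs = sq_norm([xi - ai for xi, ai in zip(x, a)]) - sq_norm(a) + sq_norm(g)
    return lhs, rhs


if __name__ == "__main__":
    print(check([0, 0, 1, 1], [-3, -1, 1, 3]))
```

The term $|\nabla f(X)|^2$ on the right-hand side is exactly the quantity that the proof identifies with $h(\pi(X))$.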

Conserved quantities
In this subsection, we discuss two simple and yet structural properties of the dynamics prescribed by the functionals Λ , Λ : the Hamiltonian of the system is conserved (Proposition 20), and its center of mass is smooth (Proposition 21). In particular, the momentum of the system is conserved during shocks.
Proposition 20. Let $Z$ be an ordered minimizer of $\Lambda$. Then: is constant in the sense of distributions.
Proof. The proof is completely standard and consists in comparing the value of $\Lambda$ on $Z$ and $t \mapsto Z_{t + \varepsilon\varphi(t)}$ for small $\varepsilon$ and functions $\varphi$ that are smooth and compactly supported in $(0, T)$.
1. If particle $i$ is not involved in a shock at time $t$, then for $s$ in a neighbourhood of $t$, $C := C(Z_s, i)$ is constant and $z_i$ is a smooth solution of: In particular, if $i$ is involved in an isolated shock at time $t$, then $z_i$ admits left and right derivatives at time $t$, denoted by $\dot z_i(t-)$ and $\dot z_i(t+)$ respectively.
2. There is $\alpha = \alpha(N, A) > 0$ such that for any isolated shock $(t, q, C)$, calling $i := \min C$:
Proof. Point 1. If particle $i$ is not involved in a shock at time $t$, by definition of a shock, it means that $C := C(Z_t, i) \in \pi(Z_s)$ for all $s$ in a neighbourhood of $t$. In particular, for all $j \in C$ and $s$ sufficiently close to $t$, by (36): On the other hand, it is easy to find a neighbourhood $U$ of $(t, z_i(t))$ in $[0, T] \times \mathbb{R}$ such that for all $j \in \{1, \dots, N\}$ and all $s \in [0, T]$, $(s, z_j(s)) \in U$ implies $j \in C$. As a consequence, if $\xi : [0, T] \to \mathbb{R}$ is smooth and compactly supported in a sufficiently small neighbourhood of $t$, and if $\varepsilon$ is sufficiently small, by defining $\tilde Z = (\tilde z_1(s), \dots, \tilde z_N(s))$ for any $j \in \{1, \dots, N\}$ and $s \in [0, T]$ by: then $\pi(Z)$ and $\pi(\tilde Z)$ (and hence $\nabla f(Z)$ and $\nabla f(\tilde Z)$) coincide at all times. The ODE follows from comparing the values of $\Lambda$ on $Z$ and on trajectories of type $\tilde Z$.
In particular, by boundedness of $Z$, if particle $i$ is not involved in a shock at time $t$, $|\ddot z_i|$ is bounded by a constant not depending on $t$. The existence of $\dot z_i(t-)$ and $\dot z_i(t+)$ at the times of isolated shocks follows easily.
Point 2. This is the heart of our study of the dynamical system, and perhaps the least standard part of Section 4. Still, the idea is very simple: with the notations of the statement, if $\dot z_i(t-) - \dot z_i(t+)$ is too small, then it is cheaper to stick particle $i$ with other particles, as shown in Figure 3. The proof goes as follows.
Step 1: Definition of a competitor.
Let us consider an isolated shock $(t, q, C)$. Because it is isolated, we can find $\tau > 0$ such that the particles of $C$ are not involved in another shock between times $t - \tau$ and $t + \tau$. By definition of a shock, we cannot have $C \in \pi(Z_s)$ for all $s \in (t - \tau, t + \tau)$, so either for all $s \in (t - \tau, t)$, $C \notin \pi(Z_s)$, or for all $s \in (t, t + \tau)$, $C \notin \pi(Z_s)$. Without loss of generality, we suppose that the second one holds: the particles of $C$ are not all stuck right after the shock. Moreover, by our choice of $\tau$, for all $C' \subset C$, the assertion $C' \in \pi(Z_s)$ is either true or false independently of $s \in (t, t + \tau)$. Then, for $s \in (t, t + \tau)$, the following definitions of $C_1, C_2 \in \pi(Z_s)$ do not depend on $s$: $k_j := \#C_j$, $v_j := \dot z_i(t+)$ for $i \in C_j$, and $p :=$ For $0 \le \sigma < \tau$ and $\lambda \in [0, 1)$, we define a competitor $Z^{\sigma,\lambda} = (z^{\sigma,\lambda}_1(s), \dots, z^{\sigma,\lambda}_N(s))$ by setting for all $i \in \{1, \dots, N\}$ and $s \in [0, T]$: (See Figure 3 for an illustration of this competitor.) We will get a lower bound on $v_2 - v_1$ by comparing the value of $\Lambda$ on $Z$ and $Z^{\sigma,\lambda}$, and by differentiating the corresponding inequality first with respect to $\sigma$ at $\sigma = 0$ (we zoom in so that the particles of $Z$ only travel along straight lines), and then with respect to $\lambda$ at $\lambda = 0$ (we compute the first variation of the action when we let the particles stick together).
Step 2: A lower bound on $v_2 - v_1$.
As $Z^{\sigma,\lambda}$ coincides with $Z$ for times outside $(t, t + \sigma)$ and for coordinates that are not in $C_1 \cup C_2$, by definition (40) of $\Lambda$, we have: where to obtain the second line, we used (45) and the fact that between times $t$ and $t + \sigma$, both $z_i$ and $z^{\sigma,\lambda}_i$ remain at a distance of order $\sigma$ from $q$.
Let us consider $i \in C_j$ for $j = 1, 2$. On the one hand, as $z_i$ admits $v_j$ as a right derivative at time $t$, we have: By plugging (47) and (48) in (46) and by using the definition (44) of $k_1$, $k_2$ and $p$, we get: By minimality of $\Lambda(Z)$, this quantity must be nonnegative. If we divide it by $\lambda\sigma$, and if we let $\sigma$ and then $\lambda$ go to zero, we end up with:
Step 3: Conservation of momentum during an isolated shock and conclusion. Because $(t, q, C)$ is isolated, it is easy to justify that we can replace $V$ by the vector $V_C$ whose $j$-th coordinate is $1$ if $j \in C$ and $0$ otherwise in the proof of Proposition 21. Doing so, we obtain the "local" conservation of momentum: by ordering of the particles, we have for $i = \min C$: (Indeed, $j \in C \mapsto \dot z_j(t-)$ and $j \in C \mapsto \dot z_j(t+)$ are clearly non-increasing and non-decreasing respectively.) By recalling that $v_1 = \dot z_i(t+)$ and using (49), we get: The minimal value of the right-hand side is $\delta/(\#C^2 - \#C)$, obtained for $k_1 = \#C - 1$ and $k_2 = 1$. Hence, we get the result by choosing $\alpha = \delta/(N^2 - N)$.

Conclusion: proof of Theorem 13
We are now ready to give the proof of Theorem 13. We give ourselves a global minimizer $Z$ of $\Lambda$. Thanks to Proposition 18, we can suppose that $Z$ is ordered, and thanks to Proposition 19, we can consider $\Lambda$ instead of $\Lambda$. Because of Proposition 23, it suffices to prove that there is a finite number of shocks. Indeed, in that case one can take for $0 = t_0 < t_1 < \dots < t_p = T$ the moments of these shocks (and the endpoints of $[0, T]$).

Lemma 8.
Proof of the $\Gamma$-lim inf. It is straightforward using Fatou's lemma, Proposition 10 and the lower semi-continuity of $Y \mapsto \int_{\theta_0}^{\theta_1} |\dot Y_\theta|^2 \, d\theta$ with respect to the topology of uniform convergence.
Proof of the $\Gamma$-lim sup. Let us consider a curve $Y \in H^1([\theta_0, \theta_1]; \mathbb{R}^p)$ with $Y_{\theta_0} = R$ and $Y_{\theta_1} = S$ (else there is nothing to prove). For all $\varepsilon > 0$ and $\tau > 0$, we define:

$C_1 := C(Z_s, i)$ for $i = \min C$ and $C_2 := C(Z_s, i)$ for $i = \min C \setminus C_1$. (The classes $C_1$ and $C_2$ are the two leftmost packs of particles of $C$ right after the shock.) Let us define for $j = 1, 2$:

Figure 3: To the left, a piece of the trajectory Z, and to the right, the competitor Z σ,λ that we describe in the proof.