The Landau equation as a Gradient Flow

We propose a gradient flow perspective on the spatially homogeneous Landau equation for soft potentials. We construct a tailored metric on the space of probability measures based on the entropy dissipation of the Landau equation. Under this metric, the Landau equation can be characterized as the gradient flow of the Boltzmann entropy. In particular, we characterize the dynamics of the PDE through a functional inequality which is usually referred to as the Energy Dissipation Inequality. Furthermore, analogous to the optimal transportation setting, we show that this interpretation can be used in a minimizing movement scheme to construct solutions to a regularized Landau equation.


Introduction
The Landau equation is an important partial differential equation in kinetic theory. It gives a description of colliding particles in plasma physics [Lifshitz and Pitaevskiȋ 1981], and it can be formally derived as a limit of the Boltzmann equation where grazing collisions are dominant [Degond and Lucquin-Desreux 1992; Villani 1998a]. Similar to the Boltzmann equation (see [Bobylev et al. 2013] for a consistency result and related derivation issues), the rigorous derivation of the Landau equation from particle dynamics is still a huge challenge. For a spatially homogeneous density of particles f = f_t(v) for t ∈ (0, ∞), v ∈ ℝ^d, the homogeneous Landau equation reads
\[ \partial_t f = \nabla_v \cdot \Big( f \int_{\mathbb{R}^d} |v - v_*|^{2+\gamma}\, \Pi[v - v_*]\, (\nabla_v \log f - \nabla_{v_*} \log f_*)\, f_*\, dv_* \Big). \tag{1} \]
For notational convenience, we sometimes abbreviate f = f_t(v) and f_* = f_t(v_*). We also denote the differentiations by ∇ = ∇_v and ∇_* = ∇_{v_*}. The physically relevant parameters are usually d = 2, 3 and γ ≥ −d − 1, with Π[z] = I − (z ⊗ z)/|z|² being the projection matrix onto {z}^⊥. In this paper, for simplicity we will focus on the case d = 3 and vary the weight parameter γ, although most of our results are valid in arbitrary dimension. The regime 0 < γ < 1 corresponds to the so-called hard potentials, while γ < 0 corresponds to the soft potentials, with a further classification of −2 ≤ γ < 0 as the moderately soft potentials and −4 ≤ γ < −2 as the very soft potentials. The particular instances of γ = 0 and γ = −d are known as the Maxwellian and Coulomb cases respectively.
The purpose of this work is to propose a new perspective inspired by gradient flows for weak solutions to (1), in analogy with the relationship between the heat equation and the 2-Wasserstein metric; see [Jordan et al. 1998; Ambrosio et al.
2008]. Our main result is inspired by and extends [Erbar 2023]. There, Erbar establishes the gradient flow perspective for the closely related spatially homogeneous Boltzmann equation with bounded collision kernels (γ = 0), which we carry out for the Landau equation with γ ∈ (−3, 0] (see Theorem 12). One of the fundamental steps is to symmetrize the right-hand side of (1). More specifically, if we consider a test function φ ∈ C_c^∞(ℝ^d) and write φ_* = φ(v_*), we can formally characterize the equation by
\[ \frac{d}{dt} \int_{\mathbb{R}^d} \varphi f \, dv = -\frac{1}{2} \iint_{\mathbb{R}^{2d}} f f_* \, |v - v_*|^{2+\gamma} \, (\nabla \varphi - \nabla_* \varphi_*) \cdot \Pi[v - v_*] (\nabla \log f - \nabla_* \log f_*) \, dv \, dv_*, \tag{2} \]
where the change of variables v ↔ v_* has been exploited. Building our analogy with the heat equation and the 2-Wasserstein distance, we define an appropriate gradient
\[ \tilde{\nabla} \varphi := |v - v_*|^{1+\gamma/2} \, \Pi[v - v_*] (\nabla \varphi - \nabla_* \varphi_*), \]
noting that Π² = Π. To highlight the use of this interpretation, we notice that ∇̃φ = 0 when we choose as test functions φ = 1, v_i, |v|² for i = 1, ..., d, which immediately shows that formally the equation conserves mass, momentum and energy.
The action functional defining the Landau metric mimics the Benamou-Brenier formula [2000] for the 2-Wasserstein distance; see [Dolbeault et al.
2009; Erbar 2014; Erbar and Maas 2014] for other distances defined analogously for nonlinear and nonlocal mobilities. In fact, the Landau metric is built by considering a minimizing action principle over curves that are solutions to the appropriate continuity equation, that is,
\[ \partial_t f + \tfrac{1}{2} \tilde{\nabla} \cdot (V f f_*) = 0, \tag{3} \]
where ∇̃· is the appropriate divergence, the formal adjoint to the appropriate gradient (see Section 2.1). Also, we notice that analogously to the heat equation, written as the continuity equation ∂_t f = ∇ · (f ∇ log f), the Landau equation can be formally rewritten as
\[ \partial_t f = \tfrac{1}{2} \tilde{\nabla} \cdot ( f f_* \tilde{\nabla} \log f ), \tag{4} \]
equivalent to the continuity equation with nonlocal velocity field given by
\[ -\int_{\mathbb{R}^d} |v - v_*|^{2+\gamma} \, \Pi[v - v_*] (\nabla \log f - \nabla_* \log f_*) \, f_* \, dv_*. \]
This is a direct way to write (1) in the form of a continuity equation. Considering the evolution of the Boltzmann entropy we formally obtain
\[ \frac{d}{dt} \int_{\mathbb{R}^d} f \log f \, dv = -\frac{1}{2} \iint_{\mathbb{R}^{2d}} f f_* \, |\tilde{\nabla} \log f|^2 \, dv \, dv_* =: -D_H(f_t) \le 0. \tag{5} \]
In physical terms this is referred to as the entropy dissipation (or as entropy production in the physics literature, where H is defined with a minus sign), since it formally shows that the entropy functional H[f] = ∫ f log f dv is nonincreasing along the dynamics of the Landau equation. Moreover, by integrating (5) in time one formally obtains
\[ H[f_t] + \int_0^t D_H(f_s) \, ds = H[f_0]. \tag{6} \]
Villani [1998a] introduced the notion of H-solution, which captures this formal property. Motivated by the physical considerations of certain conserved quantities and entropy dissipation, H-solutions provided a step towards well-posedness of the Landau equation in the soft potential case. One advantage of this approach is that it avoids assuming that the solutions belong to L^p(ℝ³) for p > 1. For moderately soft potentials, the propagation of L^p norms is proven and this is enough to make sense of classical weak solutions [Wu 2014].
In the very soft potential case, there is no longer a guarantee of L^p propagation due to the singularity of the weight. We refer to [Desvillettes 2015, Section 1.2] for a heuristic description of this difficulty.
Similar to H-solutions, our approach will also be based on the entropy dissipation (6). Following De Giorgi's minimizing movement ideas [Ambrosio 1995; Ambrosio et al. 2008], we characterize the Landau equation by its associated energy dissipation inequality. More specifically, we show that weak solutions to (1) with initial data f_0 are completely determined by the functional inequality
\[ H[f_t] + \frac{1}{2} \int_0^t |\dot{f}|_{d_L}^2(s) \, ds + \frac{1}{2} \int_0^t D_H(f_s) \, ds \le H[f_0], \]
where |ḟ|²_{d_L}(s) stands for the metric derivative associated to the Landau metric defined above. Our analysis is also largely inspired by Erbar's approach [2023] in viewing the Boltzmann equation as a gradient flow and by recent numerical simulations of the homogeneous Landau equation in [Carrillo et al. 2020] based on a regularized version of (4).
In contrast with the classical 2-Wasserstein metric, one of the main features of the Landau equation (1) and metric (3) is that they are nonlocal. To be precise, gradient flow theory has been successfully applied to the study of many nonlocal PDEs [Carrillo et al. 2010; 2012; Blanchet et al. 2008] by viewing them as gradient flows of appropriate energy functionals with respect to the 2-Wasserstein metric. The novelty in this work is the construction of the nonlocal metric d_L with respect to which (1) can be viewed as the gradient flow of H. Hence, the convergence analysis usually relying on convexity and lower semicontinuity needs to be adapted to deal with the nonlocality of this equation. In particular, our characterization Theorem 12 is based on using (expected) a priori estimates to deal with the nonlocality through appropriate bounds.
On the other hand, the state of the art related to the uniqueness for the Landau equation depends on the range of values γ may take. In the cases of hard potentials or Maxwellian, the uniqueness theory is very well understood due to Villani and the third author [Desvillettes and Villani 2000a; 2000b; Villani 1998b]. In the soft potential case, one of the first major contributions to the general theory of the spatially inhomogeneous Landau equation (γ ≥ −3) was the global existence and uniqueness result of [Guo 2002]. This result was achieved in a perturbative framework with high regularity assumptions on the initial data. Through probabilistic arguments, the next major improvement to uniqueness for γ ∈ (−3, 0) came from [Fournier and Guérin 2009]. Their result established uniqueness in a class of solutions that shrinks as γ decreases towards −3, as more L^p and moment assumptions are needed. In their proof, uniqueness is shown by proving stability with respect to the 2-Wasserstein metric.
Many open questions remain for the soft potential case. In particular, a fundamental question like uniqueness for the Coulomb case is unresolved. To tackle this and other problems, an array of novel methods has been employed. Here is an incomplete sample of the contributions made in this direction which highlight the difficulties of the soft potential case: [Desvillettes and Villani 2000a; 2000b; Alexandre et al. 2015; Carrapatoso and Mischler 2017; Carrapatoso et al. 2017; Wu 2014; Gualdani and Zamponi 2017; 2018a; 2018b; Gualdani and Guillen 2016; Strain and Wang 2020; Golse et al. 2019a; 2019b; Silvestre 2017]. A brief glance at some of these references illustrates the breadth of techniques that have found partial success at answering the open questions: probability-based arguments, kinetic and parabolic theory, and many more.
The purpose of this paper is to bring in another set of techniques to help answer some of these fundamental questions. The gradient flow theory applied to PDEs has flourished in the last decades. In their seminal paper, Jordan, Kinderlehrer, and Otto [Jordan et al. 1998] proposed a variational approach (the JKO scheme), later extended to a wide class of PDEs, using the optimal transportation distance between probability measures. These results and many more achievements from their contemporaries allowed for novel approaches to questions of existence, uniqueness, convergence to equilibrium, and other aspects of a large class of PDEs; we mention [Ambrosio et al. 2008; Santambrogio 2017] for a coherent exposition of these techniques and the relevant literature, even as more advances have been made since then.
The advantage of our variational characterization of the Landau equation is that it unveils new possible routes to convergence results for this equation. First of all, it allows for natural regularizations of the Landau equation by taking the steepest descent of regularized entropy functionals instead of the Boltzmann entropy as in [Carrillo et al. 2019]. This idea was recently developed in [Carrillo et al. 2020], leading to structure-preserving particle schemes with good accuracy. We can also consider the framework of convergence of gradient flows based on Γ-convergence introduced in [Sandier and Serfaty 2004; Serfaty 2011] to attack the convergence of these numerical methods [Carrillo et al. 2020]. Moreover, this approach is flexible enough to also study the rigorous convergence of the grazing collision limit of the Boltzmann equation to the Landau equation. The grazing collision limit was recently revisited in the gradient flow framework by three of the authors [Carrillo et al. 2022]. There, ideas from Γ-convergence were used to pass from Erbar's gradient flow description [2023] for the Boltzmann equation to the present work's description of the Landau equation. Finally, deriving uniqueness from the variational structure is classically done through convexity properties of the entropy functional with respect to the geodesics of the Landau metric. This is another important avenue of research that our work opens. Moreover, gradient flows of convex entropies typically enjoy instantaneous smoothing [Ambrosio et al. 2008]; even if the entropy at t = 0 is infinite, the entropy becomes finite for t > 0. In the case of Landau, we do not know whether this property holds for H.
We mention briefly the connection between (1) and the Fokker-Planck equation. For γ = 0, one can formally compute the evolution of ∫ v_i v_j f(v) dv through (1). This a priori information allows one to reduce (1) to a linear Fokker-Planck equation for γ = 0. The present work proposes the alternative viewpoint that the resultant Fokker-Planck equation can be viewed as the d_L-gradient flow of H for γ = 0. Since many variants of the linear Fokker-Planck equation have been well studied, this case serves as a nice benchmark to test the gradient flow theory developed here.
The plan of this paper is as follows. Section 2 introduces the prerequisites and contains the statements of the main results. We first construct and analyze in Section 3 the Landau metric based on (3). For a regularized problem, Section 4 shows the equivalence between weak solutions and gradient flows, while Section 5 shows the existence of gradient flow solutions via a minimizing movement scheme. Finally, we show in Section 6 that a gradient flow solution is equivalent to H-solutions of the Landau equation (1) under some integrability assumptions. The Appendix is devoted to some technical lemmas needed in the proof of the main theorems regarding the chain rule identity behind the definition of weak solutions for the regularized Landau equation.

Preliminaries and the main results
We start by introducing the necessary notation and definitions together with a quick overview of gradient flow concepts to make our main results fully self-contained.
2.1. Notation and definitions. We define We adopt the Japanese angle bracket notation for a smooth alternative to absolute value, ⟨v⟩ := (1 + |v|²)^{1/2}. For ϵ > 0, we define our regularization kernel to be an exponential distribution: .
Our results work for some general tail behavior in the kernels given by for s > 0; we point out some of the limitations and restrictions on s > 0 in the later estimates. We shall refer to G^{2,ϵ} as the Maxwellian regularization.
We denote the space of probability measures over ℝ^d by P(ℝ^d), endowed with the weak topology against bounded continuous functions. We will mostly be dealing with the Lebesgue measure on ℝ^d as our reference measure, which we denote by L. The subset P_a(ℝ^d) ⊂ P(ℝ^d) denotes the set of absolutely continuous probability measures with respect to Lebesgue measure. For p > 0, we also define the probability measures with finite p-moments P_p(ℝ^d) by Finally, for E > 0, we consider the subset P_{p,E}(ℝ^d) ⊂ P_p(ℝ^d) of probability measures with p-moments uniformly bounded by E:
We denote by M the space of signed Radon measures on ℝ^d × ℝ^d with the standard weak* topology against the continuous and compactly supported functions of ℝ^d × ℝ^d. The space M^d is the space of signed d-length Radon measures. For T > 0, we will add the time contribution of the measures by defining M_T to be the space of signed Radon measures on ℝ^d × ℝ^d × [0, T] with the usual weak* topology. Similarly, M_T^d will be the space of signed d-length Radon measures on which we shall see is well-defined provided µ has a finite moment in Lemma 30.
Formally, one can calculate the first variation of this functional in P_2 as This can be formally obtained by calculating Fréchet derivatives in the sense of identifying the limit To be precise, the first variation (in an L² setting) would actually be δH We drop the constant term since our functional space is P and the first variation typically appears with derivatives applied to it.
For a functional F : P_a(ℝ^d) → ℝ with first variation δF/δf, we refer to the F Landau equation as To clarify the meaning of ∇̃·, for a given test function φ = φ(v) ∈ ℝ^d and vector-valued test function In this
way, the F Landau equation (7) can be concisely written as Note, by formally testing (7) with φ = δF/δf, one obtains an analogy of Boltzmann's H-theorem with the functional F: We will refer to D_F as the F dissipation. This notation induces our notion of weak solutions to the F Landau equation (7), closely following Villani's H-solutions [1998a].
Definition 1 (weak F solutions). For T > 0, we say that a curve f ∈ C([0, T]; L^1(ℝ^d)) is a weak solution to the F Landau equation (7) if the following hold:
(1) f L is a probability measure with uniformly bounded second moment so that
(2) The functional F evaluated along the curve is bounded by its initial value: F[f_t] ≤ F[f_0] for all t ∈ [0, T].
(3) The F dissipation is time integrable: ∫_0^T D_F(f_t) dt < ∞.
(4) For every test function φ ∈ C_c^∞((0, T) × ℝ^d), equation (7) is satisfied in weak form:
For ϵ > 0, we will refer to the weak H_ϵ solutions as ϵ-solutions and, recalling that H is the Boltzmann entropy, we will refer to weak H solutions as just weak solutions or H-solutions. We deliberately use the terminology of H-solutions since the time integrability of D_H(f_t), as for [Villani 1998a], is essential in our analysis.
2.2. Quick review of gradient flow theory. We recall the basic definitions of gradient flow theory that can be found in more generality in [Ambrosio et al. 2008, Chapter 1]. Throughout, (X, d) denotes a complete (pseudo-)metric space X with (pseudo-)metric d. Points a < b ∈ ℝ will refer to endpoints of some interval. F : X → (−∞, ∞] will denote a proper function.
Definition 2 (absolutely continuous curve). A function µ : t ∈ (a, b) → µ_t ∈ X is said to be an absolutely continuous curve if there exists m ∈ L²(a, b) such that for every s ≤ t ∈ (a, b)
\[ d(\mu_s, \mu_t) \le \int_s^t m(r) \, dr. \]
Among all possible functions m in Definition 2, one can make the following minimal selection.
Definition 3 (metric derivative). For an absolutely continuous curve µ : (a, b) → X, we define its metric derivative at every t ∈ (a, b) by
\[ |\dot{\mu}|(t) := \lim_{h \to 0} \frac{d(\mu_{t+h}, \mu_t)}{|h|}. \]
Further properties of the metric derivative can be found in [Ambrosio et al. 2008, Theorem 1.1.2].
Definition 4 (strong upper gradient). The function g : X → [0, ∞] is a strong upper gradient with respect to F if for every absolutely continuous curve µ : t ∈ (a, b) → µ_t ∈ X we have that g ∘ µ : (a, b) → [0, ∞] is Borel and the following inequality holds:
\[ |F(\mu_t) - F(\mu_s)| \le \int_s^t g(\mu_r) \, |\dot{\mu}|(r) \, dr \quad \text{for all } a < s \le t < b. \]
Using Young's inequality and moving everything to one side, the inequality in Definition 4 implies
\[ F(\mu_t) - F(\mu_s) + \frac{1}{2} \int_s^t g(\mu_r)^2 \, dr + \frac{1}{2} \int_s^t |\dot{\mu}|^2(r) \, dr \ge 0. \]
If the reverse inequality also holds, one obtains the stronger energy dissipation equality. This leads to our notion of gradient flows.
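As a sanity check on the energy dissipation equality in the simplest possible setting, the following sketch takes X = ℝ with the Euclidean distance, the toy energy F(x) = x²/2 and the upper gradient g = |F′|. None of these choices come from the Landau setting; they are assumptions made purely for illustration, and the helper names are hypothetical. The curve x(t) = e^{−t} solves x′ = −F′(x), so the quantity F(x(T)) − F(x(0)) + ½∫(|x′|² + g(x)²) dt should vanish up to quadrature error, whereas the curve x(t) = e^{−2t}, which is not the gradient flow, leaves a strictly positive defect, as Young's inequality predicts.

```python
import math

def dissipation_defect(curve, speed, slope, energy, T, n=20000):
    """Return F(x(T)) - F(x(0)) + 1/2 * int_0^T (|x'|^2 + g(x)^2) dt (midpoint rule)."""
    h = T / n
    integral = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        integral += 0.5 * (speed(t) ** 2 + slope(curve(t)) ** 2) * h
    return energy(curve(T)) - energy(curve(0.0)) + integral

energy = lambda x: 0.5 * x * x   # toy energy F(x) = x^2 / 2 (illustrative assumption)
slope = lambda x: abs(x)         # g = |F'|, an upper gradient for this smooth F

def exponential_curve(rate):
    """x(t) = exp(-rate * t) together with its metric derivative |x'(t)|."""
    curve = lambda t: math.exp(-rate * t)
    speed = lambda t: rate * math.exp(-rate * t)
    return curve, speed

# rate = 1 is the gradient flow of F; rate = 2 is a different curve with the same start.
flow_defect = dissipation_defect(*exponential_curve(1.0), slope, energy, T=1.0)
other_defect = dissipation_defect(*exponential_curve(2.0), slope, energy, T=1.0)
```

In this toy case the defect can be computed by hand: for x(t) = e^{−ct} it equals ½(1 − e^{−2cT})((c + 1/c)/2 − 1), which is zero exactly when c = 1.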
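Definition 5 (curve of maximal slope). An absolutely continuous curve µ : (a, b) → X is said to be a curve of maximal slope for F with respect to its strong upper gradient g : X → [0, ∞] if t ↦ F(µ_t) is nonincreasing and the following inequality holds:
\[ F(\mu_t) - F(\mu_s) + \frac{1}{2} \int_s^t g(\mu_r)^2 \, dr + \frac{1}{2} \int_s^t |\dot{\mu}|^2(r) \, dr \le 0 \quad \text{for all } a < s \le t < b. \]
F has the following natural candidates for upper gradient.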
Definition 6 (slopes). We define the local slope of F by
\[ |\partial F|(\mu) := \limsup_{\nu \to \mu} \frac{(F(\mu) - F(\nu))^+}{d(\mu, \nu)}. \]
The superscript "+" refers to the positive part. The relaxed slope of F is given by
2.3. Main results. In order to understand the Landau equation as a gradient flow, we need to clarify what type of object the corresponding metric is.
• d_L-bounded sets are weakly compact.
• For any τ ∈ P_2(ℝ^d) the subset
The content of this theorem is essentially that our new proposed distance actually provides a meaningful topological structure on P_{2,E}(ℝ^d). Furthermore, the connection to ϵ-solutions of Landau is established when considering the previous notions of slope and upper gradient with respect to d_L. General conditions which guarantee d_L(µ_0, µ_1) < +∞ are presently unknown. In Lemma 15, we will see that a necessary condition is that µ_0 and µ_1 have the same mean velocity. Moreover, for γ ∈ [−4, −2], Lemma 15 asserts that they should have the same second moment. In the construction of d_L detailed in Section 3, if µ = µ(t) for t ∈ [0, T] is an H-solution of Landau, then it is certainly true that d_L(µ(t), µ(s)) < +∞ for all 0 ≤ t, s ≤ T.
Theorem 8 (epsilon equivalence). Fix any ϵ, E > 0, γ ∈ [−4, 0]. Assume that a curve µ : [0, T] → P_{2,E}(ℝ^d) has a density µ_t = f_t L. Then µ is a curve of maximal slope for H_ϵ with respect to its upper gradient D_{H_ϵ} if and only if its density f is an ϵ-solution to the Landau equation.
From the numerical perspective, we can also construct ϵ-solutions using the JKO scheme (see Section 5), which is the content of the following theorem.
Theorem 9 (existence of curves of maximal slope). For any ϵ, E > 0, γ ∈ [−4, 0], and initial data µ_0 ∈ P_{2,E}(ℝ^d), there exists a curve of maximal slope in P_{2,E}(ℝ^d) for H_ϵ with respect to its upper gradient D_{H_ϵ}.
Remark 10. The curves constructed in Theorem 9 do not necessarily have a density with respect to Lebesgue measure; the regularization allows H_ϵ[µ] < +∞ without µ being absolutely continuous with respect to Lebesgue measure. Moreover, uniqueness of such curves is beyond the scope of the present work, although it would be interesting to see what convexity properties are available for H_ϵ with respect to d_L. This could also shed some light on the available convexity of H with respect to d_L.
Remark 11. The choice of an exponential convolution kernel G^ϵ for the regularized entropy H_ϵ is perhaps unnatural compared to the Maxwellian regularization G^{2,ϵ}. We discuss in more detail the estimates that fail using G^{2,ϵ} in Remark 33 as it pertains to Theorem 8. With respect to Theorem 9, the general construction of some curve can be done even with the Maxwellian regularization. However, due to the same lack of estimates, this curve might not be a curve of maximal slope with respect to D_{H_ϵ}. This is discussed in Remark 37.
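The minimizing movement (JKO) idea behind Theorem 9 can be illustrated in a Euclidean toy setting. The sketch below is not the scheme of Section 5: it replaces the Landau metric by the distance on ℝ and the regularized entropy by the assumed energy F(x) = x²/2, and the helper names, the ternary-search minimizer and the search radius are all hypothetical choices. Each step solves x_{k+1} = argmin_y F(y) + |y − x_k|²/(2τ), which for this F is the implicit Euler step x_{k+1} = x_k/(1 + τ), so the discrete path should approach the gradient flow x(t) = x_0 e^{−t} as τ → 0.

```python
import math

def argmin_convex(objective, lo, hi, iterations=100):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iterations):
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if objective(left) <= objective(right):
            hi = right   # a minimizer lies in [lo, right]
        else:
            lo = left    # a minimizer lies in [left, hi]
    return 0.5 * (lo + hi)

def minimizing_movement(energy, x0, tau, steps, radius=1.0):
    """Iterate x_{k+1} = argmin_y energy(y) + |y - x_k|^2 / (2 * tau).

    The search is restricted to [x_k - radius, x_k + radius]; this assumes
    the minimizer of each step stays within that window.
    """
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        penalized = lambda y, x=x: energy(y) + (y - x) ** 2 / (2.0 * tau)
        path.append(argmin_convex(penalized, x - radius, x + radius))
    return path

energy = lambda x: 0.5 * x * x   # toy energy, an illustrative assumption
path = minimizing_movement(energy, x0=1.0, tau=1e-3, steps=1000)   # final time T = 1
```

The energy is nonincreasing along the discrete path by construction, since the previous point x_k is always an admissible competitor in the minimization defining x_{k+1}.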
Motivated by recent numerical experiments [Carrillo et al. 2020], Theorems 8 and 9 provide the theoretical basis for this ϵ-approximated Landau equation. In the limit ϵ → 0, more assumptions are required.
Theorem 12 (full equivalence). We fix d = 3 and γ ∈ (−3, 0]. Suppose that, for some T > 0, a curve µ : [0, T] → P(ℝ³) has a density µ_t = f_t L that satisfies the following set of assumptions:
(A1) (moments and L^p) Assume that, for some 0 < η ≤ γ + 3, we have
(A2) (finite entropy) We assume that the initial entropy is finite
(A3) (finite entropy-dissipation) We assume that the entropy-dissipation of f is integrable in time:
Then µ is a curve of maximal slope for H with respect to its upper gradient √D_H if and only if its density f is a weak solution of the Landau equation.
Remark 13. When γ ∈ [−2, 0] and the initial data are suitable (lying in weighted L^p spaces for p large enough and for a sufficient power-like weight), weak solutions of the Landau equation satisfying (A1)-(A3) are known to exist (and to be strong and unique under extra conditions). We refer to [Wu 2014], and Appendix B of [Desvillettes 2022] when γ > −2, for details.
The focus on the Maxwellian and soft potential regime γ ≤ 0 here is motivated by building a gradient flow framework to address the open questions for Landau. The hard potential case γ ∈ (0, 1) has already been studied in detail in [Desvillettes and Villani 2000b; 2000a]. We believe that our results also carry over to the hard potentials. In particular, the exponents in (A1) should be modified to We emphasize that these conditions are guaranteed since the required moments and L^p integrability are propagated from appropriate initial data when γ > 0 [Desvillettes and Villani 2000a; 2000b]. This condition appears in [Desvillettes 2016, Corollary 2.7]. It is the hard potential version of Theorem 41, which is crucial to the proof of Theorem 12. Much of our analysis remains the same; however, the space P_2 should be changed to P_{2+γ}, cohering with the moment condition above and trivializing Lemma 43, for example.
It is an open problem to find the range of values γ under which we can show the existence of curves of maximal slope for the original Landau equation (1), or equivalently, to construct solutions of the original Landau equation by passing ϵ → 0 in Theorem 9. Some of the difficulties in achieving this result are the propagation of moments for the regularized Landau equation uniformly in ϵ and the compactness of sequences whose regularized entropy dissipation D_{H_ϵ} is bounded uniformly in ϵ. The rest of this work is devoted to showing the four main theorems in the next four sections.

The Landau metric d_L
Our approach to defining the distance d_L mentioned in Theorem 7 closely follows the dynamic formulation of transport distances originally due to Benamou and Brenier [2000] and further extended by Dolbeault, Nazaret, and Savaré [Dolbeault et al. 2009]. We also refer the reader to [Erbar 2023] for a similar approach.
3.1. Grazing continuity equation. We consider for γ ∈ [−4, 0] the grazing continuity equation which is interpreted in the sense of distributions. For every The curves (µ_t)_{t∈[0,T]}, (M_t)_{t∈[0,T]} are Borel families of measures belonging to M^+ and M^d respectively.
We will refer to µ from the pair as a curve and M as a grazing rate. For some regularity properties, we will also need to assume the moment condition We first establish some a priori properties of solutions to the grazing continuity equation.
Lemma 14 (continuous representative). For families (µ_t), (M_t) satisfying the grazing continuity equation and the finite moment condition (10), there exists a unique weakly* continuous representative curve ) and any t_0, t_1 ∈ [0, T], we have the formula
Proof. This proof is nearly identical to [Ambrosio et al. 2008, Lemma 8.1.2]. There, it was crucial to estimate the distributional time derivative of t → µ_t. We perform the analogous estimate here to highlight the difference in our context. Fix ζ ∈ C_c^∞(ℝ^d) and consider the map According to (9), the distributional time derivative is Depending on the values of γ above or below −2, the integrand can be estimated: Consequently, using the moment condition (10), we have the following estimates depending on γ ∈ [−4, 0]: The rest of the proof proceeds as in [Ambrosio et al. 2008, Lemma 8.1.2]. □
Lemma 15. Let (µ_t)_{t∈[0,T]}, (M_t)_{t∈[0,T]} be Borel families of measures in M^+, M^d respectively satisfying (8) and the moment condition (10). Assume further that (µ_t)_{t∈[0,T]} is weakly* continuous with respect to t. We have that mass and momentum are conserved: In the case γ ∈ [−4, −2] we have that the energy is conserved:
Proof. To minimize clutter, we introduce w = |v − v_*|^{1+γ/2}. We show the proof of the conservation of energy for γ ∈ [−4, −2]. Using the grazing continuity equation, we have where we have controlled the difference with a mean-value-type estimate. From the previous bounds, we can use hypothesis (10) to take R → ∞ in (11) and obtain the conservation of energy lim The proofs for conservation of mass and momentum involve testing the grazing continuity equation against φ_R and v_i φ_R respectively, where v_i is the i-th component of v. For these statements, the case γ ∈ [−4, −2] follows in the same way. For γ ∈ [−2, 0], the estimates can be more blunt since the weight is no longer singular. □
Remark 16. Note that as γ increases into the range (−2, 0], the weight function w starts adding growth so the mean-value-type argument in Lemma 15 no longer helps unless
more moments of M are assumed than (10). Due to the conservation of mass, the unique weakly* continuous representative (µ̃_t) of Lemma 14 has the additional property of being weakly continuous in the context of P(ℝ^d).
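The formal reason behind the conservation laws in Lemma 15 is algebraic: the weighted projection |v − v_*|^{1+γ/2} Π[v − v_*](∇φ(v) − ∇φ(v_*)), with Π[z] = I − (z ⊗ z)/|z|², vanishes for φ = 1, v_i and |v|². The following sketch checks this numerically in d = 3 for the Coulomb exponent γ = −3. It is only an illustration of that pointwise identity, not of the integrability issues handled in the lemma; the function names and the Gaussian sample of velocity pairs are arbitrary assumptions. A quartic test function is included to show that the projection does not vanish in general.

```python
import random

def landau_gradient(grad_phi, v, v_star, gamma):
    """Return |z|^(1 + gamma/2) * Pi[z] (grad_phi(v) - grad_phi(v_star)), z = v - v_star."""
    z = [a - b for a, b in zip(v, v_star)]
    z_norm_sq = sum(c * c for c in z)
    diff = [a - b for a, b in zip(grad_phi(v), grad_phi(v_star))]
    # Pi[z] x = x - (z . x) z / |z|^2 removes the component of x along z.
    along = sum(c * d for c, d in zip(z, diff)) / z_norm_sq
    projected = [d - along * c for c, d in zip(z, diff)]
    weight = z_norm_sq ** (0.5 * (1.0 + gamma / 2.0))
    return [weight * p for p in projected]

def max_norm(grad_phi, gamma, samples):
    """Largest Euclidean norm of the weighted projection over the sample pairs."""
    return max(
        sum(c * c for c in landau_gradient(grad_phi, v, w, gamma)) ** 0.5
        for v, w in samples
    )

random.seed(0)
samples = [
    ([random.gauss(0, 1) for _ in range(3)], [random.gauss(0, 1) for _ in range(3)])
    for _ in range(50)
]
gamma = -3.0  # Coulomb case in d = 3

grad_mass = lambda v: [0.0, 0.0, 0.0]          # phi = 1
grad_momentum = lambda v: [1.0, 0.0, 0.0]      # phi = v_1
grad_energy = lambda v: [2.0 * c for c in v]   # phi = |v|^2
grad_quartic = lambda v: [4.0 * sum(c * c for c in v) * c for c in v]  # phi = |v|^4

conserved = [max_norm(g, gamma, samples) for g in (grad_mass, grad_momentum, grad_energy)]
not_conserved = max_norm(grad_quartic, gamma, samples)
```

For φ = |v|² the difference of gradients is 2(v − v_*), which is parallel to z and therefore removed by Π[z]; this is the only place where the projection, rather than the symmetrization alone, is needed.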
Based on the previous results, we propose the following definition.
Definition 17 (grazing continuity equation). For some terminal time T > 0, we define GCE_T to be the set of pairs of measures (µ_t, M_t)_{t∈[0,T]} satisfying the following:
(1)
(2) We have the moment bound
(3) The grazing continuity equation (8) is satisfied in the distributional sense. That is, for every φ ∈
For fixed probability measures λ, ν, we may also specify the subset GCE(λ, ν) as those pairs (µ, M) ∈ GCE_T such that µ_0 = λ, µ_T = ν. For E > 0, we will speak of curves (µ, M) ∈ GCE_T^{2,E} such that
3.2. Action of a curve. In this section, we construct the action of a curve under the grazing continuity equation. We introduce the function α :
Remark 18. The function α is lower semicontinuous (lsc), convex, and positively 1-homogeneous.
Lemma 19. Let µ ∈ P(ℝ^d) be absolutely continuous with respect to L and µ = f L. Let M ∈ M^d be given such that A(µ, M) < ∞. Then, M is absolutely continuous with respect to f f_* dv dv_* given by some density U :
Proof. The proof is identical to [Erbar 2023, Lemma 3.6] up to appropriate modifications. Define τ ∈ M by τ = µ ⊗ µ + |M| and label the corresponding densities (which may be infinite) µ ⊗ µ = gτ and M = Nτ. It suffices to show that M is absolutely continuous with respect to µ ⊗ µ, which is the goal of this proof. Suppose S ⊂ ℝ^{2d} is a measurable set such that µ ⊗ µ(S) = 0. This is equivalent to saying g = 0 τ-almost everywhere in S. Since α is positive, the assumption A(µ, M) < +∞ certainly implies α(N, g) < +∞ τ-almost everywhere in S. By the definition of α, we must also have N = 0 τ-almost everywhere in S, which is equivalent to saying M(S) = 0. □
Lemma 20 (lower semicontinuity of action functional). The action functional A as defined in (12) is lower semicontinuous in both arguments. Specifically, if µ^n ⇀ µ weakly in P(ℝ^d) and M^n ⇀* M weakly* in M^d, then A(µ, M) ≤ liminf_{n→∞} A(µ^n, M^n).
Proof. This result is an application of the general lsc result in [Buttazzo 1989, Theorem 3.4.3] since α satisfies the required convexity, lsc, and homogeneity assumptions by Remark 18.
□
Another useful property of the action functional is the compactness provided by bounded action. We first state:
Lemma 21. Let F : ℝ^{2d} → [0, ∞] be measurable and fix any µ ∈ P(ℝ^d), M ∈ M^d. We have the following bound:
Proof. This proof follows [Erbar 2023, Lemma 3.8]. We assume A(µ, M) < +∞ or else (13) holds automatically. This implies that whenever A ⊂ ℝ^{2d} is a measurable set, µ ⊗ µ(A) = 0 if and only if Therefore, in the following computations we are implicitly integrating away from sets of zero µ ⊗ µ-measure. We provide the simple argument by Cauchy-Schwarz for completeness. By considering τ = µ ⊗ µ + |M|, we estimate Then for M ∈ M_T^d the previous estimate (13) yields Therefore, if the integral in time of the second moment of µ is bounded, then M satisfies the moment condition (10) and the energy is conserved (Lemma 15). In the sequel, we will be considering curves that have bounded second moment, which guarantees (14).
Proposition 23. Let (µ_t^n, M_t^n)_n be a sequence in GCE_T such that (µ_0^n)_n is tight and we have the uniform bounds Then, there exists (µ_t, M_t) ∈ GCE_T such that, possibly after extracting a subsequence, we have the convergences µ_t^n ⇀ µ_t weakly in P(ℝ^d) for all t ∈ [0, T], Furthermore, along this subsequence we have the lower semicontinuity
Sketch of the proof. This result follows from a proof similar to that of [Dolbeault et al. 2009, Lemma 4.5] and [Erbar 2023, Proposition 3.11], which we sketch. The second moment bound for µ^n in (15) produces a limit µ. Recalling the application of Lemma 21 in Remark 22, the bounded action in (15) and the estimate (14) produce a limit M_t dt for a subsequence of M_t^n dt. The lower semicontinuity follows from Fatou's lemma and Lemma 20. □
3.3. Properties of the Landau metric. We define the distance d_L induced by the action functional on P_{2,E}(ℝ^d). Throughout, we will be working in the grazing continuity equation space GCE_T^{2,E} defined earlier, for T > 0 some terminal time and E > 0 any second moment bound.
Definition 24. For λ, ν ∈ P_{2,E}(ℝ^d) we define the (square of the) Landau distance by Notice this definition is independent of T > 0 considering the scaling of the grazing continuity equation and the 1-homogeneity of A. We have an equivalent characterization of d_L which can be seen in other PDE contexts such as [Erbar 2023; Dolbeault et al. 2009].
Proof. This proof uses the same reparametrization technique as in [Dolbeault et al. 2009, Theorem 5.4]. □
Proposition 26 (minimizing curve). Suppose that µ_0, µ_1 ∈ P_{2,E}(ℝ^d) are probability measures such that d_L(µ_0, µ_1) < ∞. Then there exists a curve (µ, M) ∈ GCE_1^{2,E}(µ_0, µ_1) attaining the infimum of (16) (equivalently, also (17)) and A(µ_t, M_t) = d_L²(µ_0, µ_1) for almost every t ∈ [0, 1].
Proof. This result follows from the direct method of calculus of variations where the lower semicontinuity comes from Proposition 23. □
Proof of Theorem 7. We prove the statements in exactly the order they are presented in the theorem, starting with the properties of the proposed Landau distance as a metric. The positivity of d_L follows from the positivity of α. We now check that d_L satisfies the properties of a metric.
, which is a minimizing curve and moreover 0 = d_L(µ_0, µ_1) = A(µ_t, M_t) implies M = 0. The grazing continuity equation reduces to ∂_t µ_t = 0, which implies µ_t is constant in time.
The converse statement follows similarly by pairing the constant curve µ : t → µ_0 = µ_1 with the zero measure so that (µ, 0) ∈ GCE_1^{2,E}(µ_0, µ_1).
Symmetry: Symmetry follows because time can be reversed for every curve. For instance, if (µ, M) ∈ GCE_T^{2,E}(µ_0, µ_1), then one can check that the pair belongs to GCE_T^{2,E}(µ_1, µ_0) with the same action.
Triangle inequality: We sketch the argument using a gluing lemma as in [Dolbeault et al. 2009, Lemma 4.4] holds trivially. By Proposition 26, we can find minimizing curves connecting these probability measures .
Their concatenation from time 0 to 1 belongs to GCE^{2,E}_1(µ_0, µ_2), so it is an admissible competitor in the computation of d_L(µ_0, µ_2). By looking at the action on the different time pieces, we obtain

⇀ M up to a subsequence. Moreover, the lower semicontinuity in Proposition 23 gives = lim_{n→∞} µ^n, which establishes the weak convergence.
(P_τ, d_L) is a complete geodesic space: We start with the geodesic property, which follows from arguments completely analogous to [Erbar 2023]; the remaining statement that P_τ equipped with d_L is a complete geodesic space follows. Fix τ ∈ P_{2,E}(ℝ^d) with µ_0, µ_1 ∈ P_τ. The triangle inequality ensures d_L(µ_0, µ_1) < ∞, so Proposition 26 guarantees the existence of a minimizing curve (µ, M) ∈ GCE^{2,E}_1(µ_0, µ_1). One easily sees that this also induces a minimizing curve for intermediate times. More precisely, for every 0 ≤ r ≤ s ≤ 1, we have that (t ↦ µ_{t+r}, t ↦ M_{t+r}) ∈ GCE^{2,E}_{s−r}(µ_r, µ_s) also minimizes d_L(µ_r, µ_s).

To show completeness, let (µ^n)_{n∈ℕ} be a Cauchy sequence in P_τ. The sequence is certainly d_L-bounded so by Proposition 23, we can find, up to extraction of a weakly convergent subsequence,

Lower semicontinuity of d_L and the Cauchy property of the subsequence give

For any n ∈ ℕ the triangle inequality gives

) is absolutely continuous with respect to d_L if and only if there exists a Borel family

In this equivalence, we have a bound on the metric derivative

Furthermore, there exists a unique Borel family (M_t)_{t∈[0,T]} belonging to M d which is characterized by

where we have equality

Proof. The argument follows exactly as in [Dolbeault et al. 2009, Theorem 5.17]. □
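To make the role of minimizing curves concrete, the following is a minimal Euclidean sketch, not the Landau action itself. It assumes the toy analogue in which curves x(t) in ℝ join a to b on [0, T] and the quantity T ∫_0^T |x′(t)|^2 dt plays the role of the scaled action; the function names are hypothetical. In this toy setting the constant-speed curve attains the value |b − a|^2 for every T, while a perturbed competitor with the same endpoints has strictly larger action.

```python
import math

def scaled_action(path, T):
    """Riemann-sum approximation of T * int_0^T |x'(t)|^2 dt for a sampled path."""
    n = len(path) - 1              # number of time steps
    dt = T / n
    return T * sum(((path[k + 1] - path[k]) / dt) ** 2 * dt for k in range(n))

def straight_line(a, b, n):
    """Constant-speed path from a to b sampled at n + 1 times."""
    return [a + (b - a) * k / n for k in range(n + 1)]

def wiggly(a, b, n, amp):
    """Same endpoints, plus a sine perturbation that vanishes at both ends."""
    return [a + (b - a) * k / n + amp * math.sin(math.pi * k / n)
            for k in range(n + 1)]

a, b, n = 0.0, 3.0, 200
for T in (0.5, 1.0, 4.0):
    # First value stays at |b - a|^2 = 9 for every T; the second is larger.
    print(T, scaled_action(straight_line(a, b, n), T),
          scaled_action(wiggly(a, b, n, 0.3), T))
```

The independence of the first printed value from T mirrors, in this toy setting only, the remark that the definition of the distance does not depend on the terminal time.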

4. Energy dissipation equality
The goal in this section is to prove Theorem 8, which states that the notions of gradient flow solutions coincide with ϵ-solutions to the Landau equation. To fix ideas, we recall the regularized entropy functionals acting on probability measures

The crucial ingredient to prove Theorem 8 is the following:

Then, sup_{t∈[0,T]} H_ϵ[µ_t] < ∞ and the "chain rule" holds:

Remark 29. Recall the expression for the dissipation

Using a time integrated version of Lemma 21, we have the estimate

Therefore, under the hypothesis of Proposition 28, we have that

Taking Proposition 28 for granted, we can prove Theorem 8.
Proof of Theorem 8. Throughout, µ = f L is a curve of probability measures with uniformly bounded second moment.
Weak ϵ-solution ⟹ curve of maximal slope: Consider f an ϵ-solution to the Landau equation. Define m = −f f_* ∇(δH_ϵ/δf) so that the pair of measures (µ = f L, M = m L ⊗ L) belongs to GCE^E_T. Indeed, the distributional grazing continuity equation from Definition 17 is precisely the weak ϵ-Landau equation. Based on the definition of M and the finite H_ϵ dissipation, we have the bound

which implies the weak continuity of µ. By Proposition 27, we have

Using Proposition 28, we have, for any 0 ≤ s ≤ r ≤ T,

According to Definition 5, this is the curve of maximal slope property.
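As a sanity check of what an energy dissipation (in)equality expresses, here is a minimal sketch in the simplest Euclidean setting rather than for H_ϵ and d_L. It assumes the toy functional F(x) = x^2/2 on ℝ, whose gradient flow is x(t) = x_0 e^{−t}; all names are hypothetical. Along this curve the drop in energy equals the time integral of half the squared speed plus half the squared slope.

```python
import math

def energy(x):
    """Toy functional F(x) = x^2 / 2 on the real line."""
    return 0.5 * x * x

def dissipation_integral(x0, T, n=20000):
    """Midpoint rule for int_0^T (|x'(t)|^2 / 2 + |F'(x(t))|^2 / 2) dt
    along the exact gradient flow x(t) = x0 * exp(-t) of F."""
    dt = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x = x0 * math.exp(-t)
        speed = -x                 # x'(t) = -F'(x(t)) = -x(t)
        slope = x                  # F'(x) = x
        total += (0.5 * speed ** 2 + 0.5 * slope ** 2) * dt
    return total

x0, T = 2.0, 1.5
lhs = energy(x0) - energy(x0 * math.exp(-T))   # energy lost on [0, T]
rhs = dissipation_integral(x0, T)              # integrated dissipation
print(lhs, rhs)                                # the two values agree closely
```

For a curve that is not the gradient flow, Young's inequality only gives one direction of this comparison, which is why equality characterizes the flow.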
Curve of maximal slope ⟹ weak ϵ-solution: Assume that µ = f L is a curve of maximal slope for H_ϵ with respect to the upper gradient

Since µ is absolutely continuous with respect to d_L, Proposition 27 guarantees existence of a unique curve

According to Lemma 19, let M = m L ⊗ L for some measurable function m. We apply the chain rule (18) with Cauchy-Schwarz and Young's inequalities with minus signs in the following computations:

All the inequalities in the calculations above are actually equalities owing to the fact that µ is a curve of maximal slope. In particular, since we have equality in Young's inequality, this implies

As in the previous direction, the weak ϵ-Landau equation coincides with the grazing continuity equation when m is equal to

The rest of this section is devoted to proving Proposition 28. We need some lemmas to establish crucial estimates. The following result is a variation of [Carlen and Carvalho 1992, Lemma 2.6].
Lemma 30 [Carlen and Carvalho 1992]. Let µ be a probability measure on ℝ^d with finite second moment/energy, m_2(µ) ≤ E for E > 0. Then, for every ϵ > 0, there exists a constant C = C(ϵ, E) > 0 such that

Proof. Starting with an upper bound, we easily see

Turning to the lower bound, we cut off the integration domain to |v′| ≤ R for some R > 0 to be chosen later. We estimate, for ϵ > 0 small enough,

At this point, we appeal to Chebyshev's inequality to see

We can now choose, for example, R large enough that 1 − E/R^2 ≥ 1/2 to uniformly lower bound the integral ∫_{|v′|≤R} dµ(v′) away from 0 and then conclude the result after applying logarithms. □

Lemma 31 (log-derivative estimates). For fixed ϵ > 0 we have the formula

Proof. Equation (19) is a direct computation after noticing

The first-order log-derivative estimate of (20) is calculated using formula (19) to obtain

For the second order, we first look at ∂_{ij} µ * G_ϵ, which can be computed with the help of (19):

Combining this estimate with the previous first-order one, we have

for some E > 0.
We have

(1) Moderately soft case γ ∈ [−2, 0]:

(2) Very soft case γ ∈ [−4, −2]:

In particular, it holds

Proof. We develop the expression for ∇(δH_ϵ/δµ) in integral form to be used throughout this proof:

(1) Moderately soft case γ ∈ [−2, 0]: We use (a concave version of) the triangle inequality (valid since 1 + γ/2 ≥ 0) and the first estimate of (20) to bound the last line of (21):

( ≤ 1; hence we can crudely estimate (21) using again the first estimate of (20) to obtain, similar to the moderately soft case, the estimate

|v − v_*| ≤ 1: We can remove the singularity from the weight with a mean-value estimate and the second estimate of (20):

Inserting this into (21), we have

Remark 33. Originally, we considered the general family of convolution kernels G_{s,ϵ} described in Section 2.1. Besides the context of the Landau equation, Lemma 31 (excluding the second-order log-derivative estimate) can be generalized to this family of s-order tailed exponential distributions with additional moment assumptions on µ. In particular, (19) and (20) (for s ≥ 1) become

Since Maxwellians are known to be stationary solutions for the Landau equation, we wanted to perform the regularization with s = 2. However, the analogous estimates of Lemma 31 for s = 2 are not sufficient for Lemma 32 in the P_2 framework. For example, in the moderately soft potential case, the estimate reads

However, there is one value, γ = −2, for which the estimates hold when using a Maxwellian regularization kernel G_{2,ϵ}. A restriction to P_4 resolves the issue mentioned above for the moderately soft potential case, but then a fourth moment propagation is needed, which we did not pursue. A similar issue is present in the very soft potential case.
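The first-order log-derivative bound behind Lemma 31 can be illustrated numerically. The sketch below is an assumption-laden toy: it works in one dimension, takes the kernel to be proportional to exp(−⟨v/ϵ⟩) with ⟨z⟩ = (1 + z^2)^{1/2} (our reading of the s = 1 tailed exponential family, whose precise definition is in Section 2.1 and not reproduced here), and uses a hypothetical discrete measure µ. Since |G_ϵ′| ≤ G_ϵ/ϵ pointwise for such a kernel, the quantity |∂_v log(µ * G_ϵ)| should stay below 1/ϵ regardless of µ.

```python
import math

def kernel(v, eps):
    """Unnormalized 1D kernel exp(-<v/eps>), with <z> = sqrt(1 + z^2).
    Normalizing constants cancel in the log-derivative."""
    return math.exp(-math.sqrt(1.0 + (v / eps) ** 2))

def kernel_prime(v, eps):
    """Exact derivative of `kernel` with respect to v."""
    z = v / eps
    return -(z / math.sqrt(1.0 + z * z)) / eps * kernel(v, eps)

def log_derivative(v, atoms, weights, eps):
    """d/dv log(mu * G_eps)(v) for the discrete measure mu = sum_i w_i delta_{a_i}."""
    num = sum(w * kernel_prime(v - a, eps) for a, w in zip(atoms, weights))
    den = sum(w * kernel(v - a, eps) for a, w in zip(atoms, weights))
    return num / den

# Hypothetical test measure: five atoms with weights summing to one.
atoms = [-2.0, -0.3, 0.1, 1.7, 5.0]
weights = [0.1, 0.3, 0.2, 0.3, 0.1]
eps = 0.25
worst = max(abs(log_derivative(-10.0 + 0.01 * k, atoms, weights, eps))
            for k in range(2001))
print(worst, 1.0 / eps)   # the sampled maximum should not exceed 1 / eps
```

A Maxwellian kernel (s = 2) does not satisfy the pointwise bound |G′| ≤ G/ϵ, which is consistent with the difficulty described in Remark 33.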
Proof of Proposition 28. To prove (18), our strategy is to regularize the pair (µ, M) in time with parameter δ > 0 and differentiate the regularization. Then we obtain uniform bounds in δ needed to take the limit δ → 0.
Finite regularized entropy: We have the following chain of inequalities:

The first inequality comes from Lemma 30 because log(µ_t * G_ϵ) has linear growth (uniform in time) while in the second inequality, one realizes that µ_t * G_ϵ has as many moments as µ_t with computable constants.
Time regularization with δ > 0: Without loss of generality, let µ be the weakly time continuous representative (Lemma 14) and M be the optimal grazing rate (Proposition 27) achieving the finite distance d L .
We first regularize the pair (µ, M) in time for a fixed parameter δ > 0 as follows. Take η ∈ C^∞_0(ℝ) with the following properties:

We define the following measures for t ∈ [0, T], by taking convex combinations:

Here, we constantly extend the measures in time. That is, if t − δt′ ∈ [−δ, 0], we treat µ_{t−δt′} = µ_0, M_{t−δt′} = 0. For the other end point, if t − δt′ ∈ [T, T + δ], we set µ_{t−δt′} = µ_T, M_{t−δt′} = 0. This transformation is stable so that (µ^δ, M^δ) ∈ GCE_T and in particular, the distributional grazing continuity equation holds:

We derive (18) using this regularized grazing continuity equation. Consider

which we differentiate with respect to t by appealing to the dominated convergence theorem. Firstly, due to the time regularization, we have

The L^1_v bound is obtained on the following difference quotient for a fixed time step h > 0:

where we have used the mean value theorem with the chain rule. Applying Lemma 30, we obtain

We apply the mean value theorem on the difference quotient again to get

Since µ has finite second-order moments, this last expression belongs to L^1_v. By the dominated convergence theorem,

The last line is achieved by the self-adjointness of convolution with G_ϵ and eliminating the constant term due to the conserved mass of µ^δ. Integrating in t, we obtain

We now turn to establishing estimates independent of δ > 0 to pass to the limit.
Estimates on the right-hand side of (22): According to Lemma 32, we have the estimate

where p ≤ 1. By the first moment assumption of M_t, we have

This estimate also extends to

Note that these estimates are independent of δ > 0.
Convergence δ → 0: Firstly, we establish the following identity which will be useful later. For fixed functions f_1, f_2 we have

Using the weak-in-time continuity of µ, we can consider

The "•" stands for the convolved variable. Since t belongs to a compact set, the function t ↦ ⟨µ_t, G_ϵ(v′ − •)⟩ is uniformly continuous from the weak continuity of µ. In particular, using the continuity in v′ and the lower bound from Lemma 30 we conclude that for any R > 0

Therefore by Lemma 30, defining w = |v − v_*|^{1+γ/2}, and using (23) with

For a fixed (v, v_*), we obtain the convergence to zero by taking δ → 0 and R_0 → ∞ in the previous estimate. This holds for all γ ∈ [−4, 0] by taking advantage of the regularity of G_ϵ. Using continuity, we obtain that for any R > 0

We turn to the limit estimate for the right-hand side of (22). For any R > 0, we have

The last term is o(1) as δ → 0 due to similar estimates from the previous step. By sending δ → 0 (the first term vanishes due to (25)) and then sending R → ∞ (the second term vanishes again due to the estimate from the previous step), we obtain the convergence

Convergence of the left-hand side of (22): By (24), Lemma 30 and the uniform bound on the second moment, we have

→ 0 as δ → 0. Therefore, by the previous equation and (26) we can take δ → 0 in (22) to obtain

which is the desired result. □

5. JKO scheme for ϵ-Landau equation
This section is devoted to the proof of Theorem 9 after a series of preliminary lemmas. Our construction of curves of maximal slope in Theorem 9 uses the basic minimizing movement/variational approximation scheme of [Jordan et al. 1998]. Fix a small time step τ > 0 and initial datum µ_0 ∈ P_{2,E}(ℝ^d) and consider the recursive minimization procedure for n ∈ ℕ

Then, we concatenate these minimizers into a curve by setting

The scheme given by (27) and (28) satisfies the abstract formulation in [Ambrosio et al. 2008] giving:

Proposition 34 (Landau JKO scheme). For any τ > 0 and µ_0 ∈ P_{2,E}(ℝ^d), there exists ν^τ_n ∈ P_{2,E}(ℝ^d) for every n ∈ ℕ as described in (27). Furthermore, up to a subsequence of µ^τ_t described in (28) as τ → 0, there exists a locally absolutely continuous curve (µ_t)_{t≥0} such that µ^τ_t ⇀ µ_t for all t ∈ [0, ∞).
Proof. Our metric setting is (P_{µ_0}, d_L) (see Theorem 7) with the weak topology σ. This space is essentially P_{2,E}(ℝ^d) except we need to make sure that d_L is a proper metric; hence we remove the probability measures with infinite Landau distance. We follow the proof of

(1) H_ϵ is sequentially σ-lsc on d_L-bounded sets: Suppose

It is known that

, we achieve the first property.
(2) H_ϵ is lower bounded: By Lemma 30 for fixed ϵ > 0, log(µ * G_ϵ) is uniformly lower bounded by a linearly growing term. For fixed µ ∈ P_{2,E}(ℝ^d), we have, with Cauchy-Schwarz,

(3) d_L-bounded sets are relatively sequentially σ-compact: This is one of the consequences of Theorem 7.
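The recursive minimization behind the scheme can be sketched in a toy setting. The code below is not the Landau JKO scheme: it assumes the state space ℝ with the Euclidean distance in place of (P_{µ_0}, d_L) and the functional F(x) = x^2/2 in place of H_ϵ, and the helper names are hypothetical. Each step minimizes F(x) + |x − x_{n−1}|^2/(2τ), which for this F reproduces the implicit Euler iterates x_n = x_{n−1}/(1 + τ) and approaches the gradient flow x_0 e^{−t} as τ → 0.

```python
import math

def energy(x):
    """Toy functional F(x) = x^2 / 2 standing in for the entropy."""
    return 0.5 * x * x

def argmin_convex(phi, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2                # a minimizer lies in [lo, m2]
        else:
            lo = m1                # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)

def minimizing_movement(x0, tau, n_steps, lo=-10.0, hi=10.0):
    """Iterate x_n = argmin_x { F(x) + |x - x_{n-1}|^2 / (2 tau) }."""
    xs = [x0]
    for _ in range(n_steps):
        prev = xs[-1]
        xs.append(argmin_convex(
            lambda x: energy(x) + (x - prev) ** 2 / (2.0 * tau), lo, hi))
    return xs

xs = minimizing_movement(1.0, 0.01, 100)   # reaches time t = 1
print(xs[-1], math.exp(-1.0))              # discrete iterate vs exact flow
```

The energy is nonincreasing along the iterates by construction, since the previous point is always an admissible competitor in each minimization.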
Proof. For fixed ϵ, R_1, R_2 > 0 and γ ∈ ℝ, take T > 0 from Theorem 48 in the Appendix and the unique weak solution µ ∈ C([0, T]; P_2(ℝ^d)) to

The functions 0 ≤ φ_{R_1}, ψ_{R_2} ≤ 1 are smooth cut-off functions with the following properties:

The notation J^ϵ_0 from the Appendix means

For this proof alone, we define the reduced ϵ-entropy-dissipation

On the other hand, as the ϵ-entropy dissipation comes from the negative time derivative of entropy, we have

In the first inequality, we estimated d_L(µ_0, µ_t) by considering the PDE in this lemma as the grazing collision equation with M = −(µ ⊗ µ) ∇ log µ 0. In the last inequality, we have used the Lebesgue differentiation theorem with strong-weak convergence since µ is continuous in time as well as the fact that

We are left with the inequality

Owing to the many regularizations applied, the ϵ-entropy-dissipation µ ↦ D^{R_1,R_2}_ϵ(µ) is continuous with respect to weak convergence of probability measures. By considering weakly convergent sequences and passing to the limit inferior, we deduce the same inequality with the relaxed slope

Thus, an application of the monotone convergence theorem in the limit R_1, R_2 → ∞ on the above inequality completes the proof. □

Lemma 36. |∂^− H_ϵ| is a strong upper gradient for H_ϵ in P_{µ_0}(ℝ^d), where µ_0 ∈ P_{2,E}(ℝ^d).
Proof. Fix λ, ν ∈ P_{µ_0}(ℝ^d) so that by the triangle inequality of Theorem 7, we have d_L(λ, ν) < ∞. Now by Proposition 26, there exists a pair of curves (µ, M) ∈ GCE^E_1 connecting λ, ν and A(µ_t, M_t) = d_L^2(λ, ν) for almost every t ∈ [0, 1]. Using Remark 29 and Lemma 35, we have

□

We now have all the ingredients to prove Theorem 9 so that we can relate curves of maximal slope to weak solutions of the ϵ-Landau equation.
Proof of Theorem 9. Take a limit curve µ_t constructed in Proposition 34. By Lemma 36, the assumptions of [Ambrosio et al. 2008, Theorem 2.3.3] are fulfilled so the curve is of maximal slope with respect to |∂^− H_ϵ| and satisfies the associated energy dissipation inequality

The inequality of Lemma 35 gives

which is precisely the statement that the limit curve µ_t is a curve of maximal slope with respect to

Remark 37. The results of Proposition 34 and Lemma 35 can be generalized to other regularization kernels G_{s,ϵ}, in particular, the Maxwellian regularization. However, this is not the case for Lemma 36 since the proof relies on Proposition 28; see Remark 33.
6. Recovering the full Landau equation as ϵ → 0

Theorems 8 and 9 provide the basic existence theory for the ϵ > 0 approximation of the Landau equation.
In this section, we prove the ϵ ↓ 0 analogue of Theorem 8, which is Theorem 12. By definition, both H-solutions and curves of maximal slope to the full Landau equation dissipate the entropy. Therefore, the assumption of finite initial entropy (A2) automatically ensures sup

In the sequel, every quotation of (A2) will refer to this bound.
Sketch of the proof of Theorem 12. By repeating the proof of Theorem 8, we see that the crucial ingredient is the chain rule (18) in Proposition 28. For now assume the following:

Claim 38. Assume (A1), (A2), (A3) and let M be any grazing rate such that (µ, M) ∈ GCE^E_T and

Then we have the chain rule

By following the steps of the proof of Theorem 8 and using (29) instead of (18), one completes the proof of Theorem 12. We dedicate this section to proving Claim 38.
Equation (29) is clearly the ϵ ↓ 0 limit of (18). The left-hand side of (29) can be obtained from the left-hand side of (18) using the finite entropy (A2) and the fact that ϵ ↦ H_ϵ[µ_t] is nonincreasing for every t. We refer to [Erbar 2023, Proof of Proposition 4.2, Step 4(d)] for more details on a similar argument.
The difficulty remains in deducing that the right-hand side of (18) converges to the right-hand side of (29) as ϵ ↓ 0 given by

under the additional assumptions (A1), (A2), (A3) on f. The key result which we will use repeatedly in this section is the following theorem, which is a specific case of the result in [Royden 1963, Chapter 4, Theorem 17].
(2) H_ϵ and I_ϵ converge pointwise a.e. to H and I, respectively.
Then, we have the convergence

Setting M = m L ⊗ L (valid by Lemma 19) and using Young's inequality on the right-hand side of (18), we obtain the majorants

Notice that the first term is precisely the integrand of D_ϵ, while the second term is the integrand of the action functional A(µ_t, M_t), which has no dependence on ϵ and is henceforth ignored. We can apply the EDCT (Theorem 39) with X = (0, T) × ℝ^6 to prove (30) once we show

The pointwise a.e. convergence hypothesis of Theorem 39 is straightforward based on the regularization of H_ϵ through G_ϵ. Focusing on (31), we will use a standard dominated convergence theorem (DCT) for the integration in the t-variable, by proving

where C > 0 is a constant independent of ϵ > 0. The estimate of (32) guarantees the L^1_t majorization due to the finite entropy-dissipation (A3). □

Our estimates in this section accomplish both the convergence and the estimate of (32) by nested application of Theorem 39. The significance of all three assumptions (A1), (A2), and (A3) will be apparent in proving the convergence in (32).
Remark 40. In this section, the only properties of G_ϵ we use are that it is a nonnegative radial approximate identity with sufficiently many moments. As in the construction of minimizing movement curves in Section 5, the results of this section can be achieved with other radial approximate identities.
6.1. Outline of technical strategy to prove (32). We need Theorem 39 instead of the more classical Lebesgue DCT because we are unable to prove pointwise estimates in v for the function v

Instead, our estimates in this section rely on the self-adjointness of convolution against radial exponentials (SACRE) to construct a convergent majorant in ϵ.
Step 1: finding majorants and appealing to Theorem 39. We seek to find pointwise a.e. majorants in the v-variable:

where I^1_ϵ(v) satisfies the hypothesis for the majorant in Theorem 39. We show that I^1_ϵ converges pointwise to some I^1, since I^1_ϵ depends on ϵ only through convolutions against G_ϵ, which is an approximation of the identity. Hence, we are left with showing the integral convergence of Theorem 39(3)

Step 2: use SACRE with G_ϵ. To show the integral convergence for I^1_ϵ, we find functions A_1 and B_1 such that

and apply Theorem 39. As in the previous step, the pointwise convergence is easily proved. Hence, we are left to show the integral convergence

The key observation is applying SACRE to obtain

Therefore, we have reduced the problem to showing integral convergence of Theorem 39(3) for I^2_ϵ (as the pointwise convergence is easily proved).
Step 3: repeat Step 2. We repeat the process outlined in Step 2 by finding functions A_2 and B_2 such that we have the pointwise bound

Again the pointwise convergence for the majorant follows easily; hence we only need to check the integral convergence of Theorem 39(3) given by

Using SACRE, we study instead the integral convergence of

Eventually, after a finite number of times of finding majorants and applying SACRE, we will obtain a majorant I^i_ϵ for which the estimates and the convergence as ϵ → 0 follow from the standard Lebesgue DCT, using the bound of the weighted Fisher information in terms of the entropy-dissipation (see Theorem 41) and (A3).
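The self-adjointness property that SACRE relies on, namely ∫ (f * G) g = ∫ f (g * G) for an even kernel G, can be checked on a grid. The following is a minimal discrete sketch under stated assumptions: it works in one dimension on a finite grid, uses an unnormalized kernel of the hypothetical form exp(−⟨v/ϵ⟩) truncated to finitely many grid points, and picks arbitrary test functions f and g. Only the evenness of the kernel matters for the identity.

```python
import math

def convolve_even(f, half_kernel):
    """Discrete convolution (f * G)[i] = sum_j f[j] G[i - j] on indices 0..len(f)-1,
    where G[k] = G[-k] = half_kernel[|k|] and G vanishes for |k| >= len(half_kernel)."""
    n, K = len(f), len(half_kernel)
    out = []
    for i in range(n):
        s = 0.0
        for j in range(max(0, i - K + 1), min(n, i + K)):
            s += f[j] * half_kernel[abs(i - j)]
        out.append(s)
    return out

def pairing(f, g):
    """Discrete analogue of the integral of f times g (grid spacing omitted)."""
    return sum(a * b for a, b in zip(f, g))

eps, h = 0.5, 0.1   # regularization parameter and grid spacing (illustrative values)
half_kernel = [math.exp(-math.sqrt(1.0 + (k * h / eps) ** 2)) for k in range(40)]
f = [math.exp(-((i - 30) * h) ** 2) for i in range(100)]        # Gaussian bump
g = [1.0 / (1.0 + ((i - 60) * h) ** 2) for i in range(100)]     # slowly decaying weight
lhs = pairing(convolve_even(f, half_kernel), g)   # <f * G, g>
rhs = pairing(f, convolve_even(g, half_kernel))   # <f, g * G>
print(lhs, rhs)                                   # equal up to rounding
```

In the strategy above, this is what allows the convolution to be moved from the singular or poorly controlled factor onto the factor where Lemma 45-type estimates are available.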

6.2. Preparatory results.
As mentioned in the previous section, for the final step of the proof we need a bound on the weighted Fisher information and a closely related variant in terms of the entropy-dissipation originally discovered by the third author in [Desvillettes and Fellner 2006].
Theorem 41. Suppose γ ∈ (−4, 0] and let f ≥ 0 be a probability density belonging to L^1_{2−γ} ∩ L log L(ℝ^3). We have

where C > 0 is a constant depending only on the bounds of m_{2−γ}(f) and the Boltzmann entropy,

The estimate in this precise form can be found in [Desvillettes 2022, Proposition 4, p. 10]. We will refer to the second term on the left-hand side as a "cross Fisher information". We mention here that (A2) enters in the sequel since the constant C > 0 in Theorem 41 depends on bounds for H[f].
To decompose the entropy-dissipation in a manageable way that makes the cross Fisher term more apparent, we have the following linear algebra fact.
Lemma 42. For x, y ∈ ℝ^3, we have

Proof. Without loss of generality, we assume x, y ≠ 0, since otherwise the statement holds trivially. Let θ be an oriented angle between x and y. We expand the definition of Π[x] and observe

The following lemma shows how we use (A1) to control the singularity of the weight.
Lemma 43. Given γ ∈ (−3, 0], assume that f satisfies (A1) for some 0 < η ≤ γ + 3. Then we have for a.e. t

where

Proof. We will only prove the first inequality of (33) since the second inequality uses the same procedure. We split the estimation for local |v| ≤ 1 and far-field |v| ≥ 1.
Case 1: |v| ≤ 1. We split the integral over v_* into two regions

where we have used that ∫_{ℝ^3} f = 1 and γ ≤ 0. For the integral with the singularity, we apply Young's convolution inequality with conjugate exponents

Here, ω_2 is the volume of the unit sphere in ℝ^3.
Case 2: |v| ≥ 1. Once again, we split the integral into two parts

The first term and second term come from the following inequalities based on their respective integration regions:

We estimate the first integral using the unit mass of f, while the second integral is more delicate but again uses the splitting of the previous step to obtain

In the large brackets, the first integral can be estimated by m_{−γ}(f). Now we use the same Young's inequality argument for the remaining integral to obtain

The proof is complete by combining the estimates for |v| ≤ 1 and |v| ≥ 1. □

Lemma 44 (Peetre). For any p ∈ ℝ and x, y ∈ ℝ^d, we have

Proof. Our proof follows [Barros-Neto 1973]. Starting with the case p = 2, for fixed vectors a, b ∈ ℝ^d we have, with the help of Young's inequality,

Dividing by ⟨b⟩^2 and setting a = x − y, b = −y, we obtain the inequality for p = 2

By taking nonnegative powers, this proves the inequality for p ≥ 0. On the other hand, when we divided by ⟨b⟩^2 we could have also set a = x − y, b = x to obtain ⟨y⟩^2/⟨x⟩^2 ≤ 2⟨x − y⟩^2. Taking strictly positive powers here proves the inequality for p < 0. □

Next, we prove an estimate for algebraic functions (growing or decaying) convolved with G_ϵ in terms of the original function.
Lemma 45. For any p ∈ ℝ, we have

where C > 0 is a constant depending only on |p| and m_{|p|}(G).
Proof. We use Peetre's inequality in Lemma 44 to introduce v − w into the angle brackets

We stress that Peetre's inequality in Lemma 44 is necessary for the estimate of Lemma 45 with nonpositive powers p, which we apply in the sequel. Finally, the last result we will need is an integration by parts formula for the differential operator associated to the cross Fisher information.
Lemma 46 (twisted integration by parts). Let f, g be smooth scalar functions on ℝ^3 which are sufficiently integrable. Then, we have the formula

Here, the meaning of

6.3. Proof of (32) using Theorem 39. We start by decomposing and estimating the integrand of D_ϵ. With the help of Lemma 42, we expand the square term of the integrand to see

where we use the shorthand notation

By using that G_ϵ is an approximation of the identity, we know that the integrand of D_ϵ converges pointwise a.e. to the integrand of D as ϵ ↓ 0. As well, each term ⓘ for i = 1, 2, 3, 4 converges pointwise a.e. to 1

By Theorem 39, to show the integral convergence in (32), it suffices to show, for example,

and similarly for each term ⓘ for i = 2, 3, 4. By symmetry considerations when swapping the variables v ↔ v_*, the convergence for the terms ① and ④ controls the convergence for ② and ③, respectively. Hence we will focus on the term ④ first and then on term ①.
6.3.1. Term ④. We seek to show in the limit ϵ ↓ 0

By the reordering of integrations written above, we now think of the double integral over v, v_* of

To be precise, we wish to apply Theorem 39 with X = ℝ^3 with

We can use Cauchy-Schwarz on the convolution integral to absorb the power term as follows:

where the last inequality comes from Lemma 45. Continuing with Lemma 43, we have

By Theorem 39, we reduce the problem to showing in the limit ϵ ↓ 0

This is where we use SACRE, Step 2 of our general strategy in Section 6.1. Application of SACRE and further simplification using the specific forms of a_ϵ and b_ϵ (see (34)) yields

We work with this simplified expression and note that pointwise convergence is still valid

Next, we notice that the function β : (F, f) ↦ |F|^2/f is jointly convex in F ∈ ℝ^3 and f > 0, so we can use Jensen's inequality with b_ϵ = G_ϵ as the reference probability measure to obtain a further pointwise majorant for the integrand of (36)

Using Theorem 39 again, we reduce the problem to showing in the limit ϵ ↓ 0

We use SACRE once more and place the convolution onto the weight term

Now, we are in a position to apply the classical dominated convergence theorem. We notice that we have the pointwise convergence

Furthermore, using Lemma 45, we can estimate b_ϵ * ⟨•⟩^γ uniformly in ϵ to find the domination

Using Theorem 41, the finite entropy-dissipation (A3), and uniformly bounded entropy (A2) (remember the constant in Theorem 41 depends also on bounds for the entropy), we know that the right-hand side belongs to L^1_v for a.e. t ∈ (0, T). Therefore, for a.e. t ∈ (0, T) the conditions of the dominated convergence theorem are satisfied so we have the integral convergence

We have closed the argument for the convergence of (35) after retracing the previous estimates with Theorem 39.
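The Jensen step used for term ④ rests on the joint convexity of β(F, f) = |F|^2/f. As a sanity check, the sketch below tests the resulting inequality β(Σ w_i F_i, Σ w_i f_i) ≤ Σ w_i β(F_i, f_i) for randomly drawn probability weights w_i, vectors F_i ∈ ℝ^3 and scalars f_i > 0. The discrete weights stand in for the reference probability measure G_ϵ; the sampling ranges and function names are arbitrary choices made for illustration.

```python
import random

def beta(F, f):
    """beta(F, f) = |F|^2 / f for a vector F and a scalar f > 0."""
    return sum(c * c for c in F) / f

def jensen_gap(weights, Fs, fs):
    """sum_i w_i beta(F_i, f_i) - beta(sum_i w_i F_i, sum_i w_i f_i)."""
    mean_F = [sum(w * F[k] for w, F in zip(weights, Fs)) for k in range(3)]
    mean_f = sum(w * f for w, f in zip(weights, fs))
    averaged = sum(w * beta(F, f) for w, F, f in zip(weights, Fs, fs))
    return averaged - beta(mean_F, mean_f)

rng = random.Random(0)          # fixed seed so the run is reproducible
gaps = []
for _ in range(1000):
    raw = [rng.random() + 1e-3 for _ in range(5)]
    total = sum(raw)
    weights = [r / total for r in raw]                  # probability weights
    Fs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]
    fs = [rng.uniform(0.1, 2.0) for _ in range(5)]      # strictly positive
    gaps.append(jensen_gap(weights, Fs, fs))
print(min(gaps))                # nonnegative up to rounding
```

The same inequality is a direct consequence of Cauchy-Schwarz, which is one way to see why no regularity of F or f is needed beyond positivity of f.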
6.3.2. Term ①. We seek to show in the limit ϵ ↓ 0

using the same strategy of nested applications of Theorem 39 as in Section 6.3.1. We will encounter difficulty when trying to use Jensen's inequality due to the cross Fisher information term. As in Section 6.3.1, we have written this double integral over v, v_* as a single integral over v. By Theorem 39 and Lemma 43, it suffices to show the integral convergence of

to obtain the integral convergence of (37). Pointwise, we can make the following manipulations:

where we have used the radial symmetry of G_ϵ to get the cancellation (v − w) × ∇G_ϵ(v − w) = 0 and the twisted integration by parts of Lemma 46 (we note that we do not pick up any signs in the integration by parts, as the variable w appears with a minus sign in the argument of G_ϵ). We apply Cauchy-Schwarz, multiply and divide by ⟨w⟩^γ, and use Lemma 45 to obtain

Remembering that this majorant holds pointwise on the integrand of (38), we multiply by ⟨v⟩^γ f(v) and obtain

Now, we recognize a convolution inside the brackets. Hence, using SACRE we can rewrite

Using Theorem 39, we need to show the convergence of the right-hand side. Here, it is now possible to use Jensen's inequality after some more manipulations.
Proof of Claim 47. We start by repeating an argument similar to (39). Using that G_ϵ is radially symmetric and the twisted integration by parts of Lemma 46, we obtain

Therefore, since β : (F, f) ↦ |F|^2/f is jointly convex in F ∈ ℝ^3 and f > 0, we apply Jensen's inequality with G_ϵ as the reference probability measure to the left-hand side of (40) to see

which proves the claim. □

Continuing, by Theorem 39, we seek to establish the integral convergence of

Finally, the integrand of the right-hand side has a majorant due to Lemma 45

Once again, using Theorem 41 and Assumptions (A3) and (A2), we obtain that for a.e. t ∈ (0, T) the right-hand side belongs to L^1_v(ℝ^3). Using the dominated convergence theorem, we see that the integral converges. Tracing back the estimates, this takes care of the convergence of the term ① and establishes the convergence in (38).
We note that the estimates in the previous subsections not only establish the a.e. pointwise convergence of (32), but also the majorization by Lemma 43. Hence, using (A3) and (32) we can apply the Lebesgue DCT to pass to the limit in the time integral and show the desired chain rule of Claim 38.
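Two elementary ingredients of this section lend themselves to a quick numerical sanity check. The sketch below assumes that the linear algebra fact of Lemma 42 is the Lagrange-type identity |x|^2 (y · Π[x] y) = |x × y|^2, which is what expanding Π[x] = I − (x ⊗ x)/|x|^2 with the angle θ between x and y produces; the displayed statement is not reproduced here, so this reading is an assumption. It also tests Peetre's inequality in the form (⟨x⟩/⟨y⟩)^p ≤ 2^{|p|/2} ⟨x − y⟩^{|p|} with ⟨x⟩ = (1 + |x|^2)^{1/2}, which is the standard statement consistent with the proof of Lemma 44. All helper names are hypothetical.

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cross(x, y):
    return [x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]]

def bracket(x):
    """Japanese bracket <x> = sqrt(1 + |x|^2)."""
    return math.sqrt(1.0 + dot(x, x))

def projected_norm_sq(x, y):
    """y . Pi[x] y, where Pi[x] = I - (x (x) x) / |x|^2 projects onto {x}^perp."""
    return dot(y, y) - dot(x, y) ** 2 / dot(x, x)

rng = random.Random(1)          # fixed seed so the run is reproducible
max_identity_error = 0.0
peetre_ok = True
for _ in range(2000):
    x = [rng.uniform(-3.0, 3.0) for _ in range(3)]
    y = [rng.uniform(-3.0, 3.0) for _ in range(3)]
    # Assumed form of Lemma 42: |x|^2 (y . Pi[x] y) = |x cross y|^2.
    c = cross(x, y)
    err = abs(dot(x, x) * projected_norm_sq(x, y) - dot(c, c))
    max_identity_error = max(max_identity_error, err)
    # Peetre: (<x> / <y>)^p <= 2^{|p|/2} <x - y>^{|p|}, for either sign of p.
    diff = [a - b for a, b in zip(x, y)]
    for p in (-3.0, -1.0, 0.5, 2.0):
        lhs = (bracket(x) / bracket(y)) ** p
        rhs = 2.0 ** (abs(p) / 2.0) * bracket(diff) ** abs(p)
        peetre_ok = peetre_ok and lhs <= rhs * (1.0 + 1e-12)
print(max_identity_error, peetre_ok)
```

The negative exponents in the Peetre test correspond to the case stressed after Lemma 45, where decaying weights ⟨v⟩^γ with γ ≤ 0 are convolved with G_ϵ.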
Appendix: An auxiliary PDE for Lemma 35

In this section, we fix ϵ > 0 throughout and study weak solutions to the PDE

We assume the initial data µ_0 belongs to P_2(ℝ^d). For R_1, R_2 > 0, the functions 0 ≤ φ_{R_1}, ψ_{R_2} ≤ 1 are smooth cut-off functions used to approximate the identity function in different ways:

For ϵ > 0, J^ϵ_0 is the gradient of the first variation of H_ϵ applied to µ_0, meaning

The main result of this section is:

Then, there is a unique global weak solution µ ∈ C([0, +∞); P_2(ℝ^d)) to (41).
By Lemma 31, we know that J^ϵ_0 is uniformly bounded (with constant depending on ϵ and µ_0 only through bounds on its second moment). The purpose of φ_{R_1}, φ_{R_1 *} is to cut off the growth of J^ϵ_0, J^ϵ_{0 *} to ensure that the "velocity field" in the right-hand side of (41) is globally Lipschitz (it is, in fact, smooth and compactly supported). The ψ_{R_2}(v − v_*)-term avoids the possible singularities coming from the weight |v − v_*|^{γ+2} for soft potentials γ < 0.
The construction of the solution in Theorem 48 is given in two steps. First, a local well-posedness theory is established up to some finite time T > 0 which depends on ϵ, γ, R_1, R_2 and µ_0. Second, the time of existence (and uniqueness) is extended to +∞ since T depends on µ_0 only through its second moment, which is conserved by the evolution of (41).
Remark 49. Since we are cutting off the "velocity" field at radius R_1, R_2, the growth of J^ϵ_0 is inconsequential. Hence the results of this section can be applied when replacing the convolution kernel of J^ϵ_0 with general tailed exponential distributions G_{s,ϵ}(v) for s > 0.
For µ ∈ P_2(ℝ^d), we will denote by U[µ](v) the function

so that the PDE in (41) can be written as a nonlinear transport/continuity equation:

To fix ideas, the weak formulation of (41) is such that the following equality holds for all test functions τ ∈ C^∞_c(ℝ^d) and times t ∈ [0, T]

Thanks to all the smooth cutoffs from φ_{R_1}, φ_{R_1 *}, and ψ_{R_2} and µ_0 ∈ P_2(ℝ^d), we can enlarge the class of test functions to smooth functions with quadratic growth. In particular, by choosing τ(v) = |v|^2 and symmetrizing the right-hand side by swapping v ↔ v_*, we see that the second moment of µ_0 is conserved along the evolution of (41). Our first step is to look at the level of the characteristic equation associated to (41).
Lemma 50 (characteristic equation).

Proof. U[µ(t)](•) is smooth and compactly supported uniformly in t, so classical Cauchy-Lipschitz theory gives existence and uniqueness of a solution v with the promised regularity.
For the estimate on the growth rate, note that U[µ] has support contained in B_{R_1+1}. Points outside this ball do not change in time according to this ODE. □

We will denote by t µ the flow map associated to this ODE, so that

It is known that, given ν ∈ C([0, T]; P_2(ℝ^d)), the curve of probability measures µ(t) = t ν #µ_0 is a weak solution to ∂_t µ(t) = −∇ · {µ(t)U[ν(t)]}, µ(0) = µ_0.
Here, t ν #µ_0 is the push-forward measure of µ_0 defined in duality with τ ∈ C_b(ℝ^d) by

We seek to find a fixed point of the map µ ↦ t µ #µ_0, as it would weakly solve (41). To better understand the properties of this map, we need to establish estimates on the flow map through U as a function of time and measures.
Lemma 51 (L^∞ estimate for velocity field). There exists a constant C = C(ϵ, γ, R_1, R_2, µ_0) > 0 such that, for every T > 0 and ν ∈ C([0, T]; P_2(ℝ^d)), we have

Proof. Estimate for γ ≥ −2: We have the three inequalities

due to the range of γ, boundedness of , and Lemma 31, respectively. These three inequalities provide the estimate

where we have dropped ψ_{R_2} altogether. For the integral term, we apply Hölder's inequality taking advantage of the compact support of φ_{R_1} and the unit mass of ν_t to further obtain

Again, since φ_{R_1} has compact support, we can crudely estimate the polynomial to conclude.
Estimate for $\gamma < -2$: Unlike the previous case, we change one of the inequalities due to the unavailability of a triangle inequality and use From these inequalities and the compact support of $\phi_{R_1}$, we have
The next result follows exactly as in [Cañizo et al. 2011].
Lemma 52 (time continuity of flow map). Let $C = C(\epsilon, \gamma, R_1, R_2, \mu_0) > 0$ be the same constant as in Lemma 51. Then, for any $T > 0$ and $\nu \in C([0, T]; \mathcal{P}_2(\mathbb{R}^d))$, we have
Our next objective is to establish the regularity of the flow map with respect to the measures in the subscript. To simplify the subsequent lemmas, let us use the notation in the following:
The function $F$ is smooth and compactly supported. In particular, for every $k, l \in \mathbb{N}$, there is a constant $C = C(\epsilon, \gamma, R_1, R_2, \mu_0, k, l) > 0$ such that More precisely, the constant $C$ depends on $\mu_0$ only through bounds on its second moment as in Lemma 31.
The first inequality uses a mean-value-type estimate (in the second variable of $F$), and the second inequality uses Cauchy-Schwarz or, equivalently, the fact that $W_2$ is stronger than $W_1$.
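The comparison $W_1 \le W_2$ used in the last step can be checked numerically in a simple setting. This is a minimal sketch, assuming one-dimensional uniform empirical measures with equally many atoms, for which the monotone (sorted) pairing is an optimal coupling for every $p \ge 1$. The helper name `wasserstein_1d` and the Gaussian samples are illustrative choices, not objects from the text.

```python
import random


def wasserstein_1d(xs, ys, p):
    """W_p between two uniform empirical measures on the line (equal sample counts).

    In one dimension the sorted pairing is optimal for p >= 1, so the distance
    is the p-th root of the mean of |x_(i) - y_(i)|^p over the sorted samples.
    """
    assert len(xs) == len(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / len(xs)
    return cost ** (1.0 / p)


random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical samples of nu^1
ys = [random.gauss(0.5, 2.0) for _ in range(200)]  # hypothetical samples of nu^2

w1 = wasserstein_1d(xs, ys, 1)
w2 = wasserstein_1d(xs, ys, 2)

# A pure translation moves every atom by the same amount, so W_1 and W_2 agree.
shift1 = wasserstein_1d([0.0, 1.0], [2.0, 3.0], 1)
shift2 = wasserstein_1d([0.0, 1.0], [2.0, 3.0], 2)
```

Because both distances are computed from the same sorted coupling, the inequality `w1 <= w2` here is exactly the Cauchy-Schwarz (Jensen) inequality applied to the pairing costs, with equality in the translation case.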
(2) As with item (1), we estimate the difference using $F$ to find Once more, a mean-value-type estimate is applied (in the first variable of $F$), and we recall that $\nu_t$ is a probability measure. □
The next result combines both items of Corollary 54 to estimate the regularity of the flow map with respect to measures and follows exactly as in [Cañizo et al. 2011].
Lemma 56 (continuity of flow map with respect to measures). For $T > 0$, fix any $\nu^1, \nu^2 \in C([0, T]; \mathcal{P}_2(\mathbb{R}^d))$ and $t \in [0, T]$. With $C := C_1 = C_2$ the same constants as in Corollary 54, we have the estimate recalling that $d(\nu^1, \nu^2) = \sup_{t \in [0, T]} W_2(\nu^1_t, \nu^2_t)$.
It is by now classical how to obtain Theorem 48 from Corollary 54 and Lemma 56; see [Cañizo et al. 2011; Carrillo et al. 2014; Golse 2016] for instance. The time of existence can be taken to be any $0 < T < (1/C) \log 2$, where $C > 0$ is chosen as in Lemma 56, and the result follows by a fixed-point argument. The extension to all times is owed to the fact that $C > 0$ depends on the initial data $\mu_0$ only through its second moment. This quantity is conserved by the evolution of (41), and so the maximal time of existence is $+\infty$.
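The threshold $(1/C)\log 2$ can be traced back to a contraction estimate. The following worked computation is a sketch under an assumption: it supposes that the estimate of Lemma 56, together with the push-forward, yields a bound of the form shown in the first line, as in the analogous argument of [Cañizo et al. 2011]. The symbols $\Phi^t_\nu$ for the flow map and $\mathcal{S}$ for the fixed-point map are notation chosen here for illustration.

```latex
% Assumed form of the stability estimate (hypothetical; modelled on Cañizo et al. 2011):
\[
  W_2\bigl(\Phi^t_{\nu^1}\#\mu_0,\ \Phi^t_{\nu^2}\#\mu_0\bigr)
  \le \bigl(e^{Ct}-1\bigr)\, d(\nu^1,\nu^2),
  \qquad t \in [0,T].
\]
% Writing S(nu)(t) := Phi^t_nu # mu_0 and taking the supremum over t in [0,T]:
\[
  d\bigl(\mathcal{S}(\nu^1),\ \mathcal{S}(\nu^2)\bigr)
  \le \bigl(e^{CT}-1\bigr)\, d(\nu^1,\nu^2).
\]
% The map S is a strict contraction exactly when the prefactor is below one:
\[
  e^{CT}-1 < 1
  \iff e^{CT} < 2
  \iff T < \frac{1}{C}\log 2 .
\]
```

Under this assumption, the Banach fixed-point theorem applies on $[0, T]$ for any such $T$. Since $C$ depends on $\mu_0$ only through its second moment, which is conserved, the same $T$ can be reused on each successive time interval.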