Exponential integrability for log-concave measures

Talagrand observed that finiteness of $\mathbb{E}\, e^{\frac{1}{2}|\nabla f(X)|^{2}}$ implies finiteness of $\mathbb{E}\, e^{\, f(X)}$, where $X$ is the standard Gaussian vector in $\mathbb{R}^{n}$ and $f$ is a smooth function with zero average. In this paper we show that finiteness of the smaller quantity $\mathbb{E}\, e^{\frac{1}{2}|\nabla f|^{2}} (1+|\nabla f|)^{-1}$ already implies finiteness of $\mathbb{E}\, e^{\, f(X)}$, and we obtain the quantitative bound \begin{align*} \log\, \mathbb{E}\, e^{\, f} \leq 10\, \mathbb{E}\, e^{\frac{1}{2}|\nabla f|^{2}} (1+|\nabla f|)^{-1}. \end{align*} Moreover, the extra factor $(1+|\nabla f|)^{-1}$ is best possible in the sense that there is a smooth $f$ with $\mathbb{E}\, e^{\,f} =\infty$ but $\mathbb{E}\, e^{\frac{1}{2}|\nabla f|^{2}} (1+|\nabla f|)^{-c}<\infty$ for all $c>1$. As an application we prove corresponding dual inequalities for discrete-time dyadic martingales and their quadratic variations.

Theorem 1.1. For any $n \geq 1$ and all $f \in C_0^{\infty}(\mathbb{R}^{n})$ we have
\begin{align*}
\log\, \mathbb{E}\, e^{\, f(X) - \mathbb{E} f(X)} \leq 10\, \mathbb{E}\, e^{\frac{1}{2}|\nabla f(X)|^{2}} (1+|\nabla f(X)|)^{-1}, \tag{1-2}
\end{align*}
where $X \sim \mathcal{N}(0, I_{n\times n})$.

To see the sharpness of the factor $(1+|\nabla f|)^{-1}$ in (1-2), let $n = 1$ and $f(x) = \frac{1}{2}x^{2}$. Then $\mathbb{E}\, e^{f(X)} = \infty$. On the other hand, $\mathbb{E}\, e^{\frac{1}{2}|f'(X)|^{2}}(1+|f'(X)|)^{-c} < \infty$ for all $c > 1$. Since this $f$ is not compactly supported, it remains to multiply $f$ by a smooth cut-off function approximating $\mathbb{1}_{|x|\leq R}$ and take the limit $R \to \infty$.
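For completeness, the two computations behind the sharpness example can be written out against the standard Gaussian density:

```latex
% Sharpness of the factor (1+|\nabla f|)^{-1}: take n = 1, f(x) = x^2/2, so f'(x) = x.
% The exponential moment blows up:
\mathbb{E}\, e^{f(X)}
  = \int_{\mathbb{R}} e^{x^{2}/2}\,\frac{e^{-x^{2}/2}}{\sqrt{2\pi}}\,dx
  = \int_{\mathbb{R}} \frac{dx}{\sqrt{2\pi}} = \infty,
% while for every c > 1 the weighted quantity is finite:
\mathbb{E}\, e^{\frac{1}{2}|f'(X)|^{2}}\,(1+|f'(X)|)^{-c}
  = \int_{\mathbb{R}} \frac{(1+|x|)^{-c}}{\sqrt{2\pi}}\,dx < \infty .
```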
Using standard mass transportation arguments, the exponential integrability bound (1-2) may be extended to random vectors $X$ having uniformly log-concave densities.
Corollary 1.2. Let $X$ be an arbitrary random vector in $\mathbb{R}^{n}$ with density $e^{-u(x)}\,dx$ such that $\operatorname{Hess} u \geq R\, I_{n\times n}$ for some $R > 0$. Then
\begin{align*}
\log\, \mathbb{E}\, e^{\, f(X) - \mathbb{E} f(X)} \leq 10\, \mathbb{E}\, e^{\frac{1}{2R}|\nabla f(X)|^{2}} \Big(1+\frac{|\nabla f(X)|}{\sqrt{R}}\Big)^{-1} \tag{1-3}
\end{align*}
for all $f \in C_0^{\infty}(\mathbb{R}^{n})$.

Exponential integrability has been studied for other random vectors $X$ as well. Let us briefly record some known results, where we assume $f$ to be real-valued with $\mathbb{E}\, f(Y) = 0$; in all examples $Y$ is uniformly distributed on the indicated set.
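The mass transportation argument can be sketched as follows; reading "standard mass transportation arguments" as an application of Caffarelli's contraction theorem is our interpretation:

```latex
% Sketch (assuming Caffarelli's contraction theorem). Let \gamma_n be the standard
% Gaussian measure and T the Brenier map with T_{\#}\gamma_n = e^{-u}\,dx. If
% \mathrm{Hess}\, u \ge R\, I_{n\times n}, then T is R^{-1/2}-Lipschitz. Hence, with
% G \sim \mathcal{N}(0, I_{n\times n}) and X = T(G), the function h := f \circ T satisfies
|\nabla h| \;\le\; R^{-1/2}\, |\nabla f| \circ T,
\qquad
\mathbb{E}\, e^{\,h(G)} \;=\; \mathbb{E}\, e^{\,f(X)},
% so applying Theorem 1.1 to h (through the increasing function F of (1-8))
% transfers the Gaussian bound to X, with \nabla f scaled by R^{-1/2}.
```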
where in (1-6) by the symbol $D(f)^{2}$ we denote the discrete gradient; see [Bobkov and Götze 1999]. The estimate (1-4), also known as the Moser–Trudinger inequality (with the best constants due to Onofri), has been critical for geometric applications [Moser 1971; Onofri 1982]. A slightly weaker version of (1-6) was obtained by Efraim and Lust-Piquard [2008].

The proof of the main theorem follows from heat flow arguments. We construct a certain quantity $A(s)$, increasing in a parameter $s \in [0, 1]$. To describe the expression for $A(s)$, let $\Phi(t) = \mathbb{P}(X_1 \leq t)$ be the Gaussian cumulative distribution function, and set $k(t) = -\log(\Phi'(t)/\Phi(t))$. Our main object will be the function $F : [0, \infty) \to [0, \infty)$,
\begin{align*}
F(x) = \int_0^{x} e^{k((k')^{-1}(s))}\, ds, \tag{1-8}
\end{align*}
where $(k')^{-1}$ is the inverse function of $k'$ (it will be explained in the next section why $F$ is well defined). For $g : \mathbb{R}^{n} \to (0, \infty)$, we consider its heat flow $U_s g(y) := \mathbb{E}\, g(y + \sqrt{s}\, X)$. The quantity
\begin{align*}
A(s) := U_s M\big(U_{1-s}\, g,\ \sqrt{s}\, |\nabla U_{1-s}\, g|\big)(0), \qquad M(x, y) := \log x + F(y/x), \tag{1-11}
\end{align*}
will have the desired properties: $A'(s) \geq 0$, $A(0) = \log \mathbb{E}\, g$, and $A(1) = \mathbb{E} \log g + \mathbb{E}\, F(|\nabla g|/g)$. The argument gives the inequality
\begin{align*}
\log \mathbb{E}\, g \leq \mathbb{E} \log g + \mathbb{E}\, F(|\nabla g|/g). \tag{1-9}
\end{align*}
If we set $g(x) = e^{f(x)}$ with $f : \mathbb{R}^{n} \to \mathbb{R}$ and use the chain rule $|\nabla g|/g = |\nabla f|$, we obtain
\begin{align*}
\log \mathbb{E}\, e^{f} \leq \mathbb{E} f + \mathbb{E}\, F(|\nabla f|). \tag{1-10}
\end{align*}
The last step is to show the pointwise estimate $F(s) \leq 10\, e^{s^{2}/2}(1+s)^{-1}$ for all $s \geq 0$. We remark that the inequality (1-10) is stronger than (1-2), and (1-2) should be considered a corollary of (1-10); however, due to the complicated expression for $F$ we decided to state the main result in the form of (1-2). The computation of $A'(s)$ is technical and is done in Section 2C, where we also explain how the expression for $A(s)$ was "discovered". We should note that the main reason that $A' \geq 0$ is the fact that $k'/k'' > 0$ together with the inequality
\begin{align*}
1 - k'' - k'\, e^{-k} \geq 0,
\end{align*}
which for $k = -\log(\Phi'(t)/\Phi(t))$ serendipitously turns out to be an equality. Sections 2A and 2B are technical and can be skipped when reading the paper for the first time.
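In summary, the endpoint values of the monotone quantity combine as follows:

```latex
% A is nondecreasing on [0, 1], so comparing the endpoints:
\log \mathbb{E}\, g \;=\; A(0) \;\le\; A(1)
  \;=\; \mathbb{E} \log g + \mathbb{E}\, F\!\left(\frac{|\nabla g|}{g}\right),
% and with g = e^{f} (chain rule: |\nabla g|/g = |\nabla f|) this reads
\log \mathbb{E}\, e^{f} \;\le\; \mathbb{E} f + \mathbb{E}\, F(|\nabla f|).
```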
In these sections we show that $F \in C^{2}([0, \infty))$ is an increasing convex function with $F(0) = F'(0) = 0$ and $F''(0) = 1$. Furthermore, a modified Hessian matrix of $M$ is positive semidefinite (1-12). In Section 2C we demonstrate that the condition (1-12) implies the inequality
\begin{align*}
M(\mathbb{E}\, g(X), 0) \leq \mathbb{E}\, M\big(g(X), |\nabla g(X)|\big) \tag{1-13}
\end{align*}
for all smooth bounded $g : \mathbb{R}^{n} \to (0, \infty)$. At the end of Section 2C, we deduce Theorem 1.1 and Corollary 1.2 from (1-13).
Step 1: an implicit function F and its properties. Let $k(t) = -\log(\Phi'(t)/\Phi(t))$. Define a real-valued function $F$ by
\begin{align*}
F(k'(t)) = \int_{-\infty}^{t} k''(s)\, e^{k(s)}\, ds \quad \text{for all } t \in \mathbb{R}. \tag{2-1}
\end{align*}
The following lemma, in particular, shows that $F$ is well defined.
Lemma 2.1. We have $k'(x) \to 0$ as $x \to -\infty$, $k'(x) \sim x$ as $x \to \infty$, and $k'' > 0$ (and hence $k' > 0$).

Proof. Let us investigate the asymptotic behavior of $k$ and its derivatives at $x = -\infty$. Let $x < 0$, and for $m \geq 0$ define the integrals $I_m$. Integration by parts reveals $I_m = -x^{-(m+1)} - (m+1)I_{m+2}$, and we iterate this identity. Next, we show that $k'' > 0$. Since $v(-\infty) = 0$ and $v' > 0$, we obtain $v(x) > 0$ for all $x \in \mathbb{R}$. In particular $u' > 0$, and taking into account that $u(-\infty) = 0$, we conclude $u(x) > 0$ for all $x \in \mathbb{R}$.

To verify the second part of the lemma, notice that $F(0) = 0$ by considering the limit as $t \to -\infty$ in (2-1). Taking the derivative in $t$ of (2-1) and dividing both sides by $k'' > 0$, we obtain $F'(k') = e^{k}$. Considering the limit as $t \to -\infty$, we conclude $F'(0) = 0$. Taking the second derivative gives $F''(k') = k' e^{k}/k''$. After a suitable change of variables in (2-1), we write
\begin{align*}
F(x) = \int_0^{x} e^{k((k')^{-1}(s))}\, ds,
\end{align*}
which coincides with the expression announced in (1-8).
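The identities used repeatedly later follow directly by differentiating (2-1):

```latex
% Differentiate F(k'(t)) = \int_{-\infty}^{t} k''(s)\, e^{k(s)}\, ds in t:
F'(k'(t))\, k''(t) = k''(t)\, e^{k(t)}
  \quad\Longrightarrow\quad F'(k') = e^{k},
% and differentiating once more (using k'' > 0):
F''(k'(t))\, k''(t) = k'(t)\, e^{k(t)}
  \quad\Longrightarrow\quad F''(k') = \frac{k'\, e^{k}}{k''} > 0,
% which gives both the convexity of F and, in the limit t \to -\infty, the
% boundary values F(0) = F'(0) = 0.
```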
Next, we claim a simple chain of inequalities (A) and (B) for $F$. Inequality (A) follows from the fact that it holds at $x = 0$ together with a comparison of derivatives, while inequality (B) is immediate.

Step 2: Monge–Ampère type PDE. Define
\begin{align*}
M(x, y) := \log x + F(y/x). \tag{2-2}
\end{align*}
Clearly $M \in C^{2}$ and $M_y(x, 0) = 0$, where $M_x = \partial M/\partial x$ and $M_y = \partial M/\partial y$. Next, we consider a matrix $A(x, y)$ built from the second-order derivatives of $M$ and claim that it is positive semidefinite.

Proof. Let us calculate the partial derivatives of $M$; set $t := y x^{-1}$. To see that $A(x, y)$ is positive semidefinite, it suffices (due to the inequality $M_{yy} > 0$) to check that $\det(A) = 0$.

Step 3: the heat flow argument. First we would like to explain how the flow is constructed. For simplicity consider $n = 1$. If we succeed in proving the inequality
\begin{align*}
M(\mathbb{E}\, g(\xi), 0) \leq \mathbb{E}\, M\big(g(\xi), |g'(\xi)|\big), \tag{2-4}
\end{align*}
where $\xi \sim \mathcal{N}(0, 1)$ and $M(x, y) = \log x + F(y/x)$, then, since $M(x, 0) = \log x$, for $g = e^{f}$ this coincides with (1-10). So the goal is to prove (2-4). We consider a discrete approximation of $\xi$, namely
\begin{align*}
\xi_m := \frac{\varepsilon_1 + \cdots + \varepsilon_m}{\sqrt{m}},
\end{align*}
where the $\varepsilon_j$ are i.i.d. symmetric Bernoulli $\pm 1$ random variables. By the central limit theorem, $\xi_m$ converges in distribution to $\xi$. We hope to prove the hypercube analog of (2-4), i.e., for all $m \geq 1$,
\begin{align*}
M(\mathbb{E}\, \tilde g(\vec\varepsilon), 0) \leq \mathbb{E}\, M\big(\tilde g(\vec\varepsilon), |D \tilde g(\vec\varepsilon)|\big), \qquad \tilde g(\vec\varepsilon) := g(\xi_m), \tag{2-5}
\end{align*}
where the discrete gradient $|D\tilde g(\vec\varepsilon)| := \big(\sum_{j=1}^{m} |D_j \tilde g(\vec\varepsilon)|^{2}\big)^{1/2}$ is defined via $D_j \tilde g(\vec\varepsilon) := \frac{1}{2}\big(\tilde g(\vec\varepsilon) - \tilde g(\sigma_j \vec\varepsilon)\big)$, with $\sigma_j$ flipping the sign of the $j$-th coordinate. One sees that as $m \to \infty$ the right-hand side of (2-5) converges to the right-hand side of (2-4), at least for bounded smooth functions $g$ with uniformly bounded derivatives; in particular, (2-5) implies (2-4). Next, we take this one step further and consider the inequality (2-5) for all $\tilde g : \{-1, 1\}^{m} \to \mathbb{R}$ instead of the specific functions $\tilde g = g(\xi_m)$; in doing so we are ever so slightly enlarging the class of test functions to include those that are not invariant with respect to permutations of $(\varepsilon_1, \ldots, \varepsilon_m)$.
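To see why the discrete gradient converges to $|g'|$, note that flipping a single $\varepsilon_j$ moves $\xi_m$ by $2/\sqrt{m}$, so heuristically:

```latex
% With \tilde g(\vec\varepsilon) = g(\xi_m), \ \xi_m = m^{-1/2}\sum_{j=1}^{m}\varepsilon_j:
D_j \tilde g(\vec\varepsilon) \approx \frac{g'(\xi_m)\,\varepsilon_j}{\sqrt{m}},
\qquad
|D\tilde g|^{2} = \sum_{j=1}^{m} |D_j \tilde g|^{2}
  \approx m \cdot \frac{g'(\xi_m)^{2}}{m} = g'(\xi_m)^{2},
% so by the central limit theorem
\mathbb{E}\, M\big(\tilde g, |D\tilde g|\big)
  \;\longrightarrow\; \mathbb{E}\, M\big(g(\xi), |g'(\xi)|\big),
  \qquad m \to \infty .
```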
Upon closer inspection, we see that (2-7) follows² from a 4-point inequality (2-8) for $M$, required for all real numbers $x, y, a, b$ such that $x \pm a > 0$. To prove (2-8) for one specific $M$ seems a feasible task; however, once we take into account that $M$ is defined by (2-2), which involves the implicitly defined $F$, the 4-point inequality (2-8) becomes complicated (see [Ivanisvili and Volberg 2020], where one such inequality was proved for $M(x, y) = -\Re(x + iy)^{3/2}$ by tedious computations involving high-degree polynomials with integer coefficients). Expanding (2-8) at the point $(a, b) = (0, 0)$ via Taylor series, one easily obtains a necessary condition: the infinitesimal form of (2-8), i.e., (2-9). Of course, the infinitesimal condition (2-9) does not necessarily imply the global 4-point inequality (2-8) (and in particular (2-6)). Also, it may seem implausible that the positive semidefiniteness condition (2-9) alone implies the inequality (2-4) in Gauss space. Surprisingly, this last guess turns out to be correct, and

²In fact they are equivalent provided that $y \mapsto M(x, y)$ is nondecreasing.
perhaps the reason lies in the fact that one only needs to verify (2-5) as $m \to \infty$ (and only for symmetric functions $\tilde g$). Let us "take the limit" and see how the heat flow arises. Let $\mathbb{E}_{m-k}$ be the average with respect to the variables $\varepsilon_1, \ldots, \varepsilon_{m-k}$, and let $\mathbb{E}_{k}$ be the average with respect to the remaining variables $\varepsilon_{m-k+1}, \ldots, \varepsilon_m$. Then the 4-point inequality (2-7) implies a corresponding inequality in which $X, Y \sim \mathcal{N}(0, 1)$ are independent and $\mathbb{E}_X$ takes the expectation with respect to the random variable $X$. In other words, letting $U_s g(y) := \mathbb{E}\, g(y + \sqrt{s}\, X)$ be the heat flow, we arrive at the map (2-10). Luckily, we may skip all the intermediate steps by starting directly from the map (2-10) and taking its derivative in $s$ to determine when the derivative is nonnegative. Slightly abusing notation, denote $D = \partial/\partial x$ and, for simplicity, let us work with the map $s \mapsto U_s M(U_{1-s}\, g, \sqrt{s}\, U_{1-s}\, g')$, where we omit the absolute value in the second argument of $M$. Let $b = U_{1-s}\, g$; clearly $db/ds = -\frac{1}{2} D^{2} b$. It remains to extend the argument to higher dimensions and put the absolute value back into the second argument of $M$, which yields
\begin{align*}
M(\mathbb{E}\, g(X), 0) \leq \mathbb{E}\, M\big(g(X), |\nabla g(X)|\big)
\end{align*}
for all smooth bounded functions $g : \mathbb{R}^{n} \to (0, \infty)$ with uniformly bounded first and second derivatives.

Applications: the proofs of Theorem 1.3 and estimate (1-15)
Let us recall the definition of dyadic martingales. For each $n \geq 0$ we denote by $\mathcal{D}_n$ the dyadic intervals of $[0, 1)$ of level $n$, i.e.,
\begin{align*}
\mathcal{D}_n = \Big\{ \Big[\frac{k}{2^{n}}, \frac{k+1}{2^{n}}\Big) : k = 0, \ldots, 2^{n}-1 \Big\}.
\end{align*}
Given an integrable $\xi$ on $[0, 1)$, set
\begin{align*}
\xi_n(x) = \frac{1}{|I|} \int_{I} \xi, \qquad x \in I \in \mathcal{D}_n;
\end{align*}
here $|I|$ denotes the Lebesgue length of $I$. If we let $\mathcal{F}_n$ be the $\sigma$-algebra generated by the dyadic intervals in $\mathcal{D}_n$, then $\xi_n = \mathbb{E}(\xi \mid \mathcal{F}_n)$ is a martingale with respect to the increasing filtration $\{\mathcal{F}_k\}_{k \geq 0}$. Next we define the quadratic variation
\begin{align*}
S(\xi)^{2} := \sum_{n \geq 1} d_n^{2},
\end{align*}
where $d_n := \xi_n - \xi_{n-1}$ is the martingale difference sequence. In what follows, to avoid issues with convergence of the infinite series, we will assume that all but finitely many $d_n$ are zero, i.e., $\xi_N = \xi_{N+1} = \cdots = \xi$ for $N$ sufficiently large. We call such martingales simple dyadic martingales; they are also known as Walsh–Paley martingales [Hytönen et al. 2016].
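As a concrete numerical illustration (not from the paper; the function names below are our own), the conditional expectations $\xi_n$ and the quadratic variation $S(\xi)^{2}$ can be computed for a simple function sampled on the dyadic grid of $[0, 1)$:

```python
def dyadic_martingale(xi, n):
    """xi_n = E(xi | F_n): average of xi over each dyadic interval of level n.

    xi is a list of 2**N samples of a simple function on the dyadic grid of [0, 1).
    """
    N = len(xi).bit_length() - 1          # len(xi) == 2**N
    block = 2 ** (N - n)                  # grid points per level-n dyadic interval
    out = []
    for k in range(2 ** n):
        avg = sum(xi[k * block:(k + 1) * block]) / block
        out.extend([avg] * block)
    return out


def quadratic_variation(xi):
    """S(xi)^2 = sum_{n>=1} d_n^2 with d_n = xi_n - xi_{n-1}, pointwise on the grid."""
    N = len(xi).bit_length() - 1
    prev = dyadic_martingale(xi, 0)       # xi_0 is the global average
    S2 = [0.0] * len(xi)
    for n in range(1, N + 1):
        cur = dyadic_martingale(xi, n)
        S2 = [s + (c - p) ** 2 for s, c, p in zip(S2, cur, prev)]
        prev = cur
    return S2
```

For instance, `xi = [1.0, 3.0, 2.0, 6.0]` gives $\xi_1 \equiv (2, 2, 4, 4)$ and $S(\xi)^{2} \equiv (2, 2, 5, 5)$; such finite examples are exactly the simple (Walsh–Paley) martingales above.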
Lemma 3.1. For all real numbers $p, a$ and all $t \geq 0$ such that $p \pm a > 0$, we have
\begin{align*}
N(p, t) \leq \tfrac{1}{2} N(p+a, t+a^{2}) + \tfrac{1}{2} N(p-a, t+a^{2}). \tag{3-1}
\end{align*}
Next, consider the process $X_s = N(p + B_s, t + s)$, where $B_s$ is the standard Brownian motion starting at zero. It follows from Itô's formula and the backwards heat equation (3-2) that $X_s$ is a martingale: indeed,
\begin{align*}
dX_s = N_t\, ds + N_p\, dB_s + \tfrac{1}{2} N_{pp}\, ds = N_p\, dB_s.
\end{align*}
Define the stopping time $\tau = \inf\{s \geq 0 : B_s \notin (-a, a)\}$.
Set $Y_s = X_{\min\{s, \tau\}}$ for $s \geq 0$. Clearly $Y_s$ is a martingale. On the one hand, $Y_0 = N(p, t)$. On the other hand, by concavity of $t \mapsto N(p, t)$ and Jensen's inequality,
\begin{align*}
\mathbb{E}\big( N(p \pm a,\, t + \tau) \,\big|\, B_\tau = \pm a \big) \leq N\big(p \pm a,\, t + \mathbb{E}(\tau \mid B_\tau = \pm a)\big).
\end{align*}
Finally, since $B_s^{2} - s$ is a martingale, we have $0 = \mathbb{E}(B_\tau^{2} - \tau) = a^{2} - \mathbb{E}\tau$, and by symmetry $\mathbb{E}(\tau \mid B_\tau = -a) = \mathbb{E}(\tau \mid B_\tau = a) = a^{2}$. Thus the lemma follows from the optional stopping theorem. □

Before we complete the proof of Theorem 1.3, let us make a remark. If $N(p, t)$ is an arbitrary smooth function satisfying the backwards heat equation (3-2) and the inequality (3-1), then $t \mapsto N(p, t)$ must be concave. In other words, the concavity of $t \mapsto N(p, t)$ is necessary and sufficient for the inequality (3-1) to hold, provided that $N$ solves the backwards heat equation. Indeed, let $r(a) = N(p + a, t + a^{2})$. Then
\begin{align*}
r''(a) &= N_{pp} + 4a N_{pt} + 2N_t + 4a^{2} N_{tt} \overset{(3\text{-}2)}{=} 4a N_{pt} + 4a^{2} N_{tt},\\
r'''(a) &= 4N_{pt} + 4a N_{ppt} + 8a^{2} N_{ptt} + 8a N_{tt} + 4a^{2} N_{ttp} + 8a^{3} N_{ttt} \overset{(3\text{-}2)}{=} 4N_{pt} + 12a^{2} N_{ptt} + 8a^{3} N_{ttt},\\
r''''(0) &= 4 N_{ppt} \overset{(3\text{-}2)}{=} -8 N_{tt}.
\end{align*}
By Taylor's formula, since $r''(0) = 0$, it follows from (3-2) that the validity of (3-1) for small $a$ forces $N_{tt} \leq 0$.

Now we are ready to complete the proof of Theorem 1.3. Let $N \geq 0$ be such that $\xi_N = \xi_{N+1} = \cdots = \xi$. Notice that the random variables $\xi_{N-1}$ and $S_{N-1}$ are $\mathcal{F}_{N-1}$-measurable, while on each atom $Q$ of $\mathcal{F}_{N-1}$ the random variable $\xi_N - \xi_{N-1}$ takes the values $\pm A$ (with $A$ constant on $Q$) with equal probabilities $\frac{1}{2}|Q|$. Then it follows from (3-1) that
\begin{align*}
\mathbb{E}\big( N(\xi_N, S_N^{2}) \,\big|\, \mathcal{F}_{N-1} \big) \geq N(\xi_{N-1}, S_{N-1}^{2}).
\end{align*}
Iterating this inequality and using the boundary value $N(p, 0) = \log p$ for $p > 0$, we obtain
\begin{align*}
\mathbb{E}\, N(\xi, S(\xi)^{2}) \geq N(\xi_0, 0) = \log \xi_0 = \log \mathbb{E}\, \xi.
\end{align*}
This finishes the proof of Theorem 1.3. □

Inequality (1-15) follows from the following lemma.
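The Taylor step in the remark can be written out explicitly:

```latex
% Symmetrize r(a) = N(p+a, t+a^2) around a = 0:
\frac{r(a) + r(-a)}{2} - r(0)
  = \frac{r''(0)}{2}\, a^{2} + \frac{r''''(0)}{24}\, a^{4} + o(a^{4})
  = \frac{r''''(0)}{24}\, a^{4} + o(a^{4}),
% using r''(0) = 0, which follows from the backwards heat equation (3-2).
% Since r''''(0) = 4 N_{ppt} = -8 N_{tt} by (3-2), the left-hand side is
% nonnegative for all small a only if N_{tt} \le 0, i.e., only if
% t \mapsto N(p, t) is concave.
```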
Lemma 3.2. We have that for all y > 0.

Concluding remarks
One may ask how we guessed the function $N(p, t)$ which played an essential role in the proof of Theorem 1.3. There is a general argument [Ivanisvili et al. 2018] which informally says that any estimate in Gauss space (or, more generally, on the Hamming cube) involving $f$ and its gradient has a corresponding dual estimate for a stopped Brownian motion and its quadratic variation (or, more generally, the dyadic square function). There is a certain relation between $M(x, y)$ and $u(p, t)$ which, if true, implies that $M$ satisfies the 4-point inequality (2-8); see [Ivanisvili et al. 2017; 2018] for more details. Such functions $M(x, y)$ and $u(p, t)$ we call dual to each other. One may verify that for our particular $M$ defined by (1-11), the corresponding dual $u(p, t)$ is
\begin{align*}
u(p, t) = 1 + \log(-p) + \int_{-p/\sqrt{t}}^{\infty} \int_{s}^{\infty} r^{-2}\, e^{(-r^{2}+s^{2})/2}\, dr\, ds, \qquad p < 0,\ t \geq 0,
\end{align*}
which coincides with $N(p, t)$ after subtracting 1 and reflecting in the variable $p$.
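Unwinding the last sentence, the reflection $p \mapsto -p$ and subtraction of 1 give, for $p > 0$:

```latex
N(p, t) = u(-p, t) - 1
        = \log p + \int_{p/\sqrt{t}}^{\infty} \int_{s}^{\infty} r^{-2}\, e^{(-r^{2}+s^{2})/2}\, dr\, ds,
\qquad p > 0,\ t \ge 0.
% Consistency check: as t \to 0^{+} the lower limit p/\sqrt{t} \to \infty, the
% double integral vanishes, and we recover the boundary value N(p, 0) = \log p
% used in the proof of Theorem 1.3.
```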
So one may hope to obtain Theorem 1.1 on the Hamming cube after substituting $g = e^{f}$. However, we did not pursue this path, for the unfortunate reason that the chain rule misbehaves on the Hamming cube: the identity $|De^{f}|/e^{f} = |Df|$ does not hold. Therefore, to prove (1-2) on the Hamming cube, different ideas are perhaps needed.