Large deviation and variational approaches to generalized gradient flows

Manh Hong Duong

DOCTORAL THESIS

submitted to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the Rector Magnificus, prof.dr.ir. C.J. van Duijn, to be defended in public before a committee appointed by the College voor Promoties on Thursday 25 September 2014 at 16:00

by

Manh Hong Duong

born in Bac Giang, Vietnam

This thesis has been approved by the promotor:

prof.dr. Mark A. Peletier

Copromotor:

dr. Johannes Zimmer

ISBN: yyy-yy-yyy-yyyy-y

Reproduction: Universiteitsdrukkerij Technische Universiteit Eindhoven

© Copyright 2014, Manh Hong Duong. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission from the copyright owner.

Contents

1 Introduction
    1.1 Overview
        1.1.1 The central equation
        1.1.2 Equations treated in the thesis
    1.2 Aim of the thesis
    Part I: Generalized gradient flows and large-deviation principle
    1.3 Generalized gradient flow structure of the main equation
    1.4 Main questions of Part I
    1.5 Large-deviation principle
    1.6 Approximation schemes from small-noise large deviation
    1.7 Wasserstein gradient flow structure of the Fokker-Planck equation
    1.8 GENERIC structure of the Vlasov-Fokker-Planck equation
    1.9 Generalizations: nonlinear diffusion and the thermo-visco-elasticity
        1.9.1 The porous medium equation
        1.9.2 Thermo-visco-elasticity equation as the hydrodynamic limit of a particle system
    1.10 Summary of Part I
    Part II: Coarse-graining
    1.11 General framework
    1.12 Main questions of Part II
    1.13 Qualitative coarse-graining from large-deviation principle
    1.14 Quantitative rate of convergence to the hydrodynamic limits
    1.15 Summary of Part II

I Generalized gradient flows and large-deviation principle

2 Approximation schemes for a generalized Kramers equation
    2.1 Introduction
        2.1.1 The Kramers equation
        2.1.2 Variational evolution
        2.1.3 A combination of conservative and dissipative effects
        2.1.4 Huang's discrete schemes for the Kramers equation
        2.1.5 Criticism
        2.1.6 The schemes of this chapter
        2.1.7 Organization of the chapter
    2.2 Assumptions and main result of the chapter
    2.3 Properties of the three cost functions
    2.4 The Euler-Lagrange equation for the minimization problem
        2.4.1 Schemes 2a and 2b
        2.4.2 Scheme 2c
    2.5 A priori estimate: Boundedness of the second moment and entropy
    2.6 Proof of Theorem 2.2.3
    2.7 Conclusion and discussion

3 Wasserstein gradient flows from large deviations of many-particle limits
    3.1 Introduction
        3.1.1 Main result of the chapter
    3.2 Preliminaries
        3.2.1 Continuous and absolutely continuous curves
        3.2.2 The tangent space
        3.2.3 Relevant functionals
        3.2.4 Chain rule
        3.2.5 Gamma-convergence
    3.3 Large deviations of trajectories
    3.4 Lower bound
    3.5 Recovery sequence

4 GENERIC structure of the Vlasov-Fokker-Planck equation
    4.1 Introduction
        4.1.1 The Vlasov-Fokker-Planck equation
        4.1.2 Aim of the chapter
    4.2 Large deviations for the VFP equation
    4.3 The VFP equation and the large deviations in GENERIC form
        4.3.1 GENERIC formalism
        4.3.2 Making the VFP equation conserve energy
        4.3.3 The VFP equation as a GENERIC system
        4.3.4 Large deviations for the VFP equation in GENERIC form
    4.4 A variational formulation for GENERIC systems
    4.5 Synthesis
    4.6 Interpretation of the GENERIC properties
    4.7 The generalized VFP equation

5 q-Gaussians and the Wasserstein gradient flow structure of the PME
    5.1 Introduction
    5.2 Properties of q-Gaussian measures
        5.2.1 q-Gaussian measures and solutions of the porous medium equation
        5.2.2 q-Gaussian and the Wasserstein metric
    5.3 Computing the functional Jh
    5.4 Proof of the main theorem
        5.4.1 Discussion

6 Microscopic derivation of the thermo-visco-elasticity system
    6.1 The thermo-visco-elasticity system
    6.2 GENERIC structure of the TVE
    6.3 A microscopic model
    6.4 Hydrodynamic limits (HDL)
        6.4.1 HDL for un
        6.4.2 HDL for pn
        6.4.3 HDL for en
    6.5 Discussion

II Coarse-graining

7 Qualitative coarse-graining from large-deviation principle
    7.1 Introduction
        7.1.1 General framework
        7.1.2 Coarse-graining from large-deviation principle
        7.1.3 Organization of the chapter
    7.2 From a perturbed Hamiltonian system to diffusion on a graph
        7.2.1 The case of one degree of freedom and single-well potential
        7.2.2 Discussion on the case of many degrees of freedom and multi-well potential
    7.3 From the Kramers equation to the Fokker-Planck equation

8 The two-scale approach to hydrodynamic limits for non-reversible dynamics
    8.1 Introduction
    8.2 Framework and main results
        8.2.1 Abstract setting
        8.2.2 Application to spin systems
    8.3 Proof of the abstract results
        8.3.1 Proof of Theorem 8.2.1
        8.3.2 Sketch of proof of Theorem 8.2.5
    8.4 Application part

Summary

Curriculum Vitae

Publications

Acknowledgments

Chapter 1

Introduction

1.1 Overview

1.1.1 The central equation

Complex phenomena in nature and science are often described by partial differential equations. The diffusion equation, which describes how substances spread out in space, is perhaps the most typical example. The present thesis deals with the mathematical analysis of such equations, in particular those of the following form

$$\partial_t \rho(t) = L(\rho(t))^* \rho(t), \quad t > 0, \qquad \rho(0) = \rho_0. \tag{1.1}$$

The initial value $\rho_0$ is a probability density on $\mathbb{R}^N$. The operator $L(\rho(t))^*$ is the (formal) adjoint of the generator of a Markov process. It is in divergence form and, as a consequence, $\rho(t)$ is also a probability density on $\mathbb{R}^N$ for almost every fixed time $t$. Note that in general $L(\rho(t))^*$ depends on $\rho(t)$; therefore, (1.1) is a nonlinear equation. However, in this thesis we are mainly interested in weakly nonlinear systems, in the sense that the nonlinear terms in $L(\rho(t))^*$ (if any) are of convolution form.

The link between Equation (1.1) and stochastic differential equations is central to the analysis of this thesis. Equation (1.1) can be derived from stochastic differential equations in two ways. On the one hand, it is the forward Kolmogorov equation of a stochastic differential equation on $\mathbb{R}^N$ of the form (see, for instance, [ØB03, Chapter 8])

$$dX(t) = B(X(t), \mathrm{Law}(X(t)))\,dt + \sigma(X(t))\,dW(t), \tag{1.2}$$

where $B : \mathbb{R}^N \times \mathcal{P}(\mathbb{R}^N) \to \mathbb{R}^N$ is a drift vector, $\sigma : \mathbb{R}^N \to \mathbb{R}^{N \times N}$ is a diffusion matrix, and $W(t)$ is the standard $N$-dimensional Wiener process. $\mathrm{Law}(X(t))$ stands for the law of $X(t)$. In this derivation, $\rho(t)$ is the density of the law of $X(t)$. The operator $L(\rho)^*$ on the right-hand side of (1.1) is the formal dual of the generator $L(\rho)$ of the stochastic process $X(t)$, which acts on smooth functions on $\mathbb{R}^N$. $L(\rho)$ can be computed explicitly in terms of the drift vector and the diffusion matrix

$$L(\rho)f(x) = \sum_{i,j=1}^{N} A_{ij}(x)\frac{\partial^2}{\partial x_i \partial x_j}f(x) + \sum_{i=1}^{N} B_i(x,\rho)\frac{\partial}{\partial x_i}f(x),$$

where $A(x) = \frac{1}{2}\sigma(x)\sigma^*(x)$.

On the other hand, it is the many-particle (or hydrodynamic) limit of the following system of weakly interacting stochastic particles

$$dX_i(t) = B(X_i(t), \rho_n(t))\,dt + \sigma(X_i(t))\,dW_i(t), \quad i = 1, \dots, n, \tag{1.3}$$

where $W_1(t), \dots, W_n(t)$ are independent Wiener processes and

$$\rho_n(t, dx) = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i(t)}(dx) \tag{1.4}$$

denotes the empirical measure of $(X_1(t), \dots, X_n(t))$. The hydrodynamic limit statement is that, at each $t$, the empirical measure $\rho_n(t)$ converges almost surely in the narrow topology of probability measures (i.e. in duality with continuous and bounded functions) to the unique solution of (1.1) (see, for instance, [Dud89, Theorem 11.4.1], [Oel84]).
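The particle picture can be explored numerically. The following is a minimal sketch, not taken from the thesis, that integrates a scalar version of the particle system with the Euler-Maruyama scheme. The function names, the step size, and the test case (drift $-x$ and $\sigma = \sqrt{2}$, for which the limiting density is known to relax to a standard Gaussian) are assumptions chosen for illustration only.

```python
import math
import random


def simulate_particles(drift, sigma, x0, n, dt, steps, seed=0):
    """Euler-Maruyama for dX_i = drift(X_i, particles) dt + sigma(X_i) dW_i (scalar state)."""
    rng = random.Random(seed)
    xs = [x0] * n
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        # Freeze the configuration so every particle sees the same empirical
        # measure during this time step.
        snapshot = xs
        xs = [
            x + drift(x, snapshot) * dt + sigma(x) * sqrt_dt * rng.gauss(0.0, 1.0)
            for x in snapshot
        ]
    return xs


def empirical_average(f, xs):
    """Integral of f against the empirical measure (1/n) sum_i delta_{x_i}."""
    return sum(f(x) for x in xs) / len(xs)


# Hypothetical test case: non-interacting particles with drift -x and sigma = sqrt(2).
particles = simulate_particles(
    drift=lambda x, snapshot: -x,
    sigma=lambda x: math.sqrt(2.0),
    x0=0.0, n=2000, dt=0.01, steps=500,
)
second_moment = empirical_average(lambda x: x * x, particles)
```

The `drift` callback receives the whole particle configuration, so a mean-field interaction (for instance an average of pairwise forces) can be plugged in at the cost of more work per step.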

1.1.2 Equations treated in the thesis

In this thesis, we focus on two important classes, distinguished by the relation between the drift vector and the diffusion matrix,

(C1) $B(x) = -A(x)\nabla H(x)$,

(C2) $B(x, \mu) = -\frac{1}{\theta}A(x)\nabla H(x) + J[\nabla H(x) + (\nabla\psi * \mu)(x)]$,

where $H, \psi : \mathbb{R}^N \to \mathbb{R}$ are given smooth functions, $J \in \mathbb{R}^{N \times N}$ is an anti-symmetric matrix, and $\theta$ is a constant. More specifically, the following equations are treated in this thesis.

(1) The Vlasov-Fokker-Planck (VFP) equation (case (C2)), with

$$N = 2d, \quad x = (q, p)^T, \quad H(q, p) = V(q) + \frac{p^2}{2m}, \quad \psi(x) = \psi(q),$$

$$A = \gamma\theta \begin{pmatrix} 0 & 0 \\ 0 & I_d \end{pmatrix}, \quad J = \begin{pmatrix} 0 & I_d \\ -I_d & 0 \end{pmatrix}, \quad \theta = kT_a,$$

where $k$ is the Boltzmann constant and $T_a$ is the absolute temperature. It is more convenient to use the explicit form of the VFP equation,

$$\partial_t \rho = -\mathrm{div}_q\Big(\rho\frac{p}{m}\Big) + \mathrm{div}_p\,\rho\Big(\nabla_q V + \nabla_q\psi * \rho + \gamma\frac{p}{m}\Big) + \gamma\theta\Delta_p\rho. \tag{1.5}$$

In this equation, the spatial domain is $\mathbb{R}^{2d}$ with coordinates $(q, p)$, with $q$ and $p$ each in $\mathbb{R}^d$. We use subscripts, as in $\mathrm{div}_q$ and $\Delta_p$, to indicate that the differential operators act only on those variables. The unknown is a time-dependent probability measure $\rho : [0, T] \to \mathcal{P}(\mathbb{R}^{2d})$; the functions $V = V(q)$ and $\psi = \psi(q)$ are given, as are the positive constants $\gamma$, $m$, and $\theta$. The convolution $\psi * \rho$ is defined by $(\psi * \rho)(q) = \int_{\mathbb{R}^{2d}} \psi(q - q')\rho(q', p')\,dq'\,dp'$.

Equation (1.5) arises as the many-particle limit of a collection of interacting Brownian particles with inertia, described by the following stochastic differential equations

$$dQ_i(t) = \frac{P_i(t)}{m}\,dt, \tag{1.6a}$$

$$dP_i(t) = -\nabla V(Q_i(t))\,dt - \sum_{j=1}^{n}\nabla\psi(Q_i(t) - Q_j(t))\,dt - \frac{\gamma}{m}P_i(t)\,dt + \sqrt{2\gamma\theta}\,dW_i(t). \tag{1.6b}$$

Here $Q_i$ and $P_i$ are the position and momentum of particle $i = 1, \dots, n$, each with the same mass $m$. These two equations describe the movement of this particle under a fixed potential $V$, an interaction potential $\psi$, a friction force (the drift term $-\gamma P_i\,dt/m$) and a stochastic forcing described by the $n$ independent $d$-dimensional Wiener processes $W_i$.

Both the friction force and the noise term arise from collisions with the solvent, and the parameter $\gamma$ in both terms characterizes the intensity of these collisions. The parameter $\theta = kT_a$, where $k$ is the Boltzmann constant and $T_a$ is the absolute temperature, measures the mean kinetic energy of the solvent molecules, and therefore characterizes the magnitude of the collision noise.

(2) The Kramers equation¹ (case (C2)), which is a special case of the VFP equation with $\psi \equiv 0$,

$$\partial_t \rho = -\mathrm{div}_q\Big(\rho\frac{p}{m}\Big) + \mathrm{div}_p\,\rho\Big(\nabla_q V + \gamma\frac{p}{m}\Big) + \gamma\theta\Delta_p\rho. \tag{1.7}$$

The particle system (1.6) becomes

$$dQ_i(t) = \frac{P_i(t)}{m}\,dt, \tag{1.8a}$$

$$dP_i(t) = -\nabla V(Q_i(t))\,dt - \frac{\gamma}{m}P_i(t)\,dt + \sqrt{2\gamma\theta}\,dW_i(t). \tag{1.8b}$$

¹ In Chapter 2, we actually study a slightly generalized Kramers equation, where the friction term is not necessarily linear. However, the reader can simply think of the Kramers equation.

(3) The Fokker-Planck equation (case (C1)), with

$$N = d, \quad x \in \mathbb{R}^d, \quad A = I_d.$$

Its explicit form is the following

$$\partial_t \rho = \mathrm{div}(\rho\nabla H) + \Delta\rho. \tag{1.9}$$

The particle system (1.3) in this case is

$$dX_i = -\nabla H(X_i)\,dt + \sqrt{2}\,dW_i, \quad i = 1, \dots, n. \tag{1.10}$$

(4) A general non-reversible dynamics, with (case (C2))

$$\theta = 1, \quad \psi \equiv 0.$$

Equation (1.1) for this case is

$$\partial_t \rho = \mathrm{div}\big[\rho(J + A)\nabla H + A\nabla\rho\big]. \tag{1.11}$$

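In particular, this includes a weakly asymmetric version of the Ginzburg-Landau model endowed with Kawasaki dynamics, where

$$A = N^2 \begin{pmatrix} 2 & -1 & & & -1 \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ -1 & & & -1 & 2 \end{pmatrix}, \quad J = \frac{N}{2} \begin{pmatrix} 0 & 1 & & & -1 \\ -1 & 0 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 0 & 1 \\ 1 & & & -1 & 0 \end{pmatrix}.$$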

All these equations play a central role in statistical physics and/or chemistry. The Kramers equation was originally introduced as a simplified model for chemical reactions [Kra40] and has become an important equation in this field [HTB90]. The Vlasov-Fokker-Planck equation is used as a model of diffusive, confined and interacting stellar or charged matter when $\psi$ derives from the Newton or the Coulomb potential, respectively; see for instance [BD95]. The Fokker-Planck equation appears in a variety of fields of science and engineering [Ris89]. The Ginzburg-Landau model endowed with Kawasaki dynamics is an archetypal model for interacting particle systems and has been studied by many authors; see for instance [GPV88, GOVW09].

We remark that the drift vector $B$ and the diffusion matrix $A$ may depend on parameters, such as the friction coefficient $\gamma$ in the VFP and Kramers equations or the dimension $N$ in the weakly asymmetric Ginzburg-Landau model.

1.2 Aim of the thesis

The present thesis deals with the mathematical analysis of Equation (1.1) from various aspects and is divided into two parts.

As will be explained in the next section, all equations treated in this thesis can be seen as generalized gradient flows. We use this terminology to indicate that there naturally exists a functional that decreases in time along solutions of Equation (1.1), akin to the entropy or free energy for classical gradient flows. However, equations belonging to class (C2) also contain additional conservative effects, and therefore we use the terminology of generalized gradient flows. The first part of this thesis is devoted to understanding this structure using the theory of large-deviation principles.

The macroscopic equation (1.1) often depends on a certain parameter, denoted here by $\varepsilon$. For instance, $\varepsilon$ corresponds to the friction coefficient $\gamma$ in the Kramers equation or to $1/N$ in the Ginzburg-Landau model. In the second part of the thesis, we are interested in deriving the limiting system as $\varepsilon \to 0$. This procedure is often known as coarse-graining.

The two parts are connected. Studying and exploiting the relation between generalized gradient flows and large-deviation principles in the first part provides new understanding and techniques for coarse-graining in the second part.

We now go to a more detailed description of the content of the thesis. The rest of the chapter is divided into two parts. Each part is structured as follows. First, we describe a general framework. Then, we pose the main questions of the part. Next, we show how to answer these questions in more detail by summarizing each chapter. Finally, we give a short overview of the part.

Part I: Generalized gradient flows and large-deviation principle

1.3 Generalized gradient flow structure of the main equation

For simplicity, we discuss in this section the case $\psi \equiv 0$, $\theta = 1$. The discussion also holds for general $\psi$ with some modification. A common feature of the two classes (C1) and (C2) is that they both contain a dissipative (irreversible) effect: the free energy functional

$$\mathcal{F}(\rho) = \int_{\mathbb{R}^N} \rho(x)\big(H(x) + \log\rho(x)\big)\,dx \tag{1.12}$$

decreases in time along solutions of (1.1). Indeed, suppose that $\rho(t)$ is a solution of (1.1); then the time derivative of $t \mapsto \mathcal{F}(\rho(t))$ is

$$\frac{d}{dt}\mathcal{F}(\rho(t)) = \int_{\mathbb{R}^N} \big(H(x) + \log\rho(t, x) + 1\big)\,\partial_t\rho(t, x)\,dx = -\int_{\mathbb{R}^N} \frac{1}{\rho}A(\rho\nabla H + \nabla\rho)\cdot(\rho\nabla H + \nabla\rho)\,dx \le 0.$$

Note that this computation also holds for case (C2), due to the anti-symmetry of $J$.

The distinction between the two classes is that class (C2) contains an additional conservative (reversible) effect. The conservative behavior can be recognized in the anti-symmetric part of the drift vector $B$: Equation (1.1) with $A = 0$ conserves the expectation of the Hamiltonian $\int_{\mathbb{R}^N} H(x)\rho(t, x)\,dx$.

After some modification, all equations treated in this thesis can actually be seen as special cases of a more general class of systems that unifies both conservative and dissipative effects, namely GENERIC² (General Equation for Non-Equilibrium Reversible and Irreversible Coupling [Ott05]). A GENERIC equation for an unknown $z$ in a state space $\mathsf{Z}$ is a mixture of both reversible and dissipative dynamics:

$$\partial_t z = \mathsf{L}(z)\,d\mathsf{E}(z) + \mathsf{M}(z)\,d\mathsf{S}(z). \tag{1.13}$$

Here

• $\mathsf{E}, \mathsf{S} : \mathsf{Z} \to \mathbb{R}$ are interpreted as energy and entropy functionals.

• $d\mathsf{E}$, $d\mathsf{S}$ are appropriate derivatives of $\mathsf{E}$ and $\mathsf{S}$ (such as either the Fréchet derivative or a gradient with respect to some inner product).

• $\mathsf{L} = \mathsf{L}(z)$ is for each $z$ an antisymmetric operator satisfying the Jacobi identity

$$\{\{F_1, F_2\}_{\mathsf{L}}, F_3\}_{\mathsf{L}} + \{\{F_2, F_3\}_{\mathsf{L}}, F_1\}_{\mathsf{L}} + \{\{F_3, F_1\}_{\mathsf{L}}, F_2\}_{\mathsf{L}} = 0, \tag{1.14}$$

for all functions $F_i : \mathsf{Z} \to \mathbb{R}$, $i = 1, 2, 3$, where the Poisson bracket $\{\cdot, \cdot\}_{\mathsf{L}}$ is defined via

$$\{F, G\}_{\mathsf{L}} := dF \cdot \mathsf{L}\,dG. \tag{1.15}$$

• $\mathsf{M} = \mathsf{M}(z)$ is symmetric and positive semidefinite.

Moreover, the building blocks $\mathsf{L}, \mathsf{M}, \mathsf{E}, \mathsf{S}$ are required to fulfill the degeneracy conditions: for all $z \in \mathsf{Z}$,

$$\mathsf{L}\,d\mathsf{S} = 0, \quad \mathsf{M}\,d\mathsf{E} = 0. \tag{1.16}$$

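As a consequence of these properties, energy is conserved along a solution, and entropy is non-decreasing³:

$$\frac{d}{dt}\mathsf{E}(z(t)) = d\mathsf{E}\cdot\frac{dz}{dt} = d\mathsf{E}\cdot(\mathsf{L}\,d\mathsf{E} + \mathsf{M}\,d\mathsf{S}) = 0,$$

$$\frac{d}{dt}\mathsf{S}(z(t)) = d\mathsf{S}\cdot\frac{dz}{dt} = d\mathsf{S}\cdot(\mathsf{L}\,d\mathsf{E} + \mathsf{M}\,d\mathsf{S}) = d\mathsf{S}\cdot\mathsf{M}\,d\mathsf{S} \ge 0.$$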
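These two identities can be checked on a finite-dimensional toy system. The sketch below is an illustration under stated assumptions, not a construction quoted from the text: the state is $z = (q, p, e)$ for one damped particle with an extra internal-energy variable $e$, the energy is $V(q) + p^2/(2m) + e$, the entropy is simply $e$, the antisymmetric operator is the canonical symplectic matrix on $(q, p)$, and the symmetric operator is $\gamma\, w w^T$ with $w = (0, 1, -p/m)$. The potential derivative $V'(q) = q^3$ and all function names are hypothetical choices.

```python
import random


def grad_E(q, p, m, dV):
    """Derivative of E(q, p, e) = V(q) + p^2 / (2m) + e."""
    return [dV(q), p / m, 1.0]


def grad_S():
    """Derivative of S(q, p, e) = e."""
    return [0.0, 0.0, 1.0]


def apply_L(v):
    """Canonical antisymmetric operator on (q, p); the e-component is untouched."""
    return [v[1], -v[0], 0.0]


def apply_M(v, p, m, gamma):
    """M = gamma * w w^T with w = (0, 1, -p/m): symmetric and positive semidefinite."""
    w = [0.0, 1.0, -p / m]
    scale = gamma * sum(wi * vi for wi, vi in zip(w, v))
    return [scale * wi for wi in w]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def generic_rates(q, p, m=1.0, gamma=0.5, dV=lambda q: q ** 3):
    """Evaluate dE/dt, dS/dt and the degeneracy conditions at the state (q, p, e)."""
    dE, dS = grad_E(q, p, m, dV), grad_S()
    velocity = [a + b for a, b in zip(apply_L(dE), apply_M(dS, p, m, gamma))]
    return {
        "dE_dt": dot(dE, velocity),
        "dS_dt": dot(dS, velocity),
        "L_dS": apply_L(dS),
        "M_dE": apply_M(dE, p, m, gamma),
    }


rng = random.Random(1)
samples = [generic_rates(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(100)]
```

In this toy system the resulting dynamics are $\dot q = p/m$, $\dot p = -V'(q) - \gamma p/m$, $\dot e = \gamma p^2/m^2$: the kinetic energy lost to friction is stored in $e$, which is why the total energy stays constant while the entropy grows.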

A GENERIC system is then fully characterized by the quintuple $\{\mathsf{Z}, \mathsf{E}, \mathsf{S}, \mathsf{L}, \mathsf{M}\}$. The preceding discussion on GENERIC is formal since we have not specified the state space and the meaning of the derivatives; see Remark 4.3.1 for a more detailed discussion. However, many rigorous results have been obtained for class (C1). In particular, in the seminal paper [JKO98], the authors proved that the Fokker-Planck equation is a gradient flow of the free energy $\mathcal{F}$ with respect to the Wasserstein metric. Since the result of [JKO98] plays an essential role in the thesis, we recall it here.

² Since the GENERIC framework imposes many conditions on its coefficients, we prefer to use the terminology of generalized gradient flows to describe Equation (1.1). GENERIC will be studied in more detail in Chapters 4 and 6.

³ Note that the physical entropy is the negative of the mathematical entropy.

We first need to introduce the Wasserstein metric. We denote by

$$\mathcal{P}_2(\mathbb{R}^d) := \Big\{\rho \in \mathcal{P}(\mathbb{R}^d) : \int |x|^2\,\rho(dx) < \infty\Big\}$$

the space of probability measures on $\mathbb{R}^d$ with finite second moment. The Wasserstein distance between two probability measures $\rho_0, \rho_1 \in \mathcal{P}_2(\mathbb{R}^d)$ is defined via

$$W_2^2(\rho_0, \rho_1) = \inf_{\gamma \in \Gamma(\rho_0, \rho_1)} \int_{\mathbb{R}^{2d}} |x - y|^2\,d\gamma, \tag{1.17}$$

where $\Gamma(\rho_0, \rho_1)$ is the set of all probability measures on $\mathbb{R}^{2d}$ that have $\rho_0$ and $\rho_1$ as the first and the second marginals, respectively. We note that convergence in the Wasserstein topology is equivalent to narrow convergence together with convergence of second moments [Vil03, Th. 7.12]. The main result in [JKO98] can be summarized as follows.

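Theorem 1.3.1 ([JKO98]). The solution $t \mapsto \rho(t, x)$ of the Fokker-Planck equation can be approximated by the time-discrete sequence $\rho^k$ defined recursively by

$$\rho^k \in \operatorname*{argmin}_{\rho} K_h(\rho, \rho^{k-1}), \qquad K_h(\rho, \rho^{k-1}) := \frac{1}{2h}W_2(\rho, \rho^{k-1})^2 + \mathcal{F}(\rho) - \mathcal{F}(\rho^{k-1}). \tag{1.18}$$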
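As a hedged illustration of the scheme, consider the one-dimensional case with $H(x) = x^2/2$ and restrict the minimization to Gaussian densities $N(m, s^2)$; this restriction is an assumption of the sketch, not part of the theorem. For Gaussians on the line, the squared Wasserstein distance is $(m - m')^2 + (s - s')^2$ and the free energy equals $(m^2 + s^2)/2 - \log s$ up to a constant, so each minimization step has a closed form. The function names below are hypothetical.

```python
import math


def jko_step_gaussian(mean, std, h):
    """One step of the scheme restricted to Gaussians N(m, s^2), for H(x) = x^2 / 2.

    Minimizes ((m - mean)^2 + (s - std)^2) / (2h) + (m^2 + s^2) / 2 - log(s)
    over m and s > 0; both first-order conditions can be solved in closed form.
    """
    new_mean = mean / (1.0 + h)
    new_std = (std + math.sqrt(std * std + 4.0 * h * (1.0 + h))) / (2.0 * (1.0 + h))
    return new_mean, new_std


def jko_flow(mean, std, h, t_final):
    """Iterate the step until time t_final (assumed to be a multiple of h)."""
    for _ in range(round(t_final / h)):
        mean, std = jko_step_gaussian(mean, std, h)
    return mean, std


def exact_flow(mean, std, t):
    """Gaussian solution of d_t rho = div(rho x) + Laplacian(rho) at time t."""
    variance = 1.0 + (std * std - 1.0) * math.exp(-2.0 * t)
    return mean * math.exp(-t), math.sqrt(variance)


approx_mean, approx_std = jko_flow(2.0, 0.5, 0.001, 1.0)
exact_mean, exact_std = exact_flow(2.0, 0.5, 1.0)
```

Each step is an implicit Euler step for the mean and the standard deviation, so the discrepancy with the exact Gaussian solution shrinks proportionally to the time step $h$.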

Note that at time step $k$, $\rho^{k-1}$ is already known, so $\mathcal{F}(\rho^{k-1})$ is a constant and including it in $K_h$ does not affect the minimization problem. The approximation scheme (1.18) is now known as the JKO-scheme in the literature, although the idea goes back to De Giorgi. The Wasserstein gradient flow theory has developed tremendously in the last two decades. Not only have many partial differential equations been proven to be Wasserstein gradient flows, but the theory also links many branches of mathematics together, such as optimal transport, fluid dynamics, differential geometry, etc. [AGS08, Vil03, Vil09]. In particular, Benamou and Brenier [BB00] provided an alternative formulation of the Wasserstein distance as follows

$$W_2^2(\rho_0, \rho_1) = \inf_{(\rho, v)} \int_0^1 \int_{\mathbb{R}^d} |v(t, x)|^2\rho(t, x)\,dx\,dt, \tag{1.19}$$

where the infimum is taken over all pairs of time-dependent probability densities and velocity fields $(\rho(t, x), v(t, x))$ that satisfy

$$\partial_t\rho + \mathrm{div}(\rho v) = 0, \qquad \rho(0) = \rho_0, \quad \rho(1) = \rho_1.$$

Using this formulation, the Wasserstein distance for empirical measures of particles can be interpreted as the minimal energy dissipated by moving those particles through a viscous fluid [Pel14, Section 5.1].
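For empirical measures on the real line the Wasserstein distance can be computed directly. The sketch below relies on the standard fact that, in one dimension and for the quadratic cost, pairing the sorted samples is an optimal coupling; the function name and the sample values are illustrative assumptions.

```python
import math


def wasserstein2_empirical_1d(xs, ys):
    """W_2 between (1/n) sum_i delta_{x_i} and (1/n) sum_i delta_{y_i} on the real line."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    # In one dimension the monotone (sorted) pairing is an optimal coupling
    # for the quadratic cost |x - y|^2.
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


# Translating every particle by 2 moves the empirical measure by distance 2.
translated = wasserstein2_empirical_1d([0.0, 1.0, 3.0], [2.0, 3.0, 5.0])
# The order in which the particles are listed does not matter.
shuffled = wasserstein2_empirical_1d([0.0, 1.0, 3.0], [2.5, 5.0, 2.0])
```

In higher dimensions sorting is no longer available, and computing the distance between two empirical measures requires solving an assignment problem.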

The Benamou-Brenier formulation induces the following norm on the tangent space of the space of probability measures (see [FK06, Appendix D5] for more detail),

$$\|s\|_{-1,\rho}^2 = \inf\Big\{\int |v|^2\,d\rho : s + \mathrm{div}(\rho v) = 0\Big\}. \tag{1.20}$$

The Fokker-Planck equation can be recast into the GENERIC form as follows

$$\partial_t\rho = \mathsf{M}(\rho)\,d\mathsf{S}(\rho),$$

with $\mathsf{S}(\rho) := -\mathcal{F}(\rho)$, $\mathsf{M}(\rho)\xi := \mathrm{div}(\rho\nabla\xi)$, $d\mathsf{S}(\rho) := \frac{\delta\mathcal{F}}{\delta\rho}(\rho)$. Thanks to the Benamou-Brenier theorem, this formulation coincides with the Wasserstein gradient flow structure of the Fokker-Planck equation.

1.4 Main questions of Part I

The result of Jordan, Kinderlehrer and Otto [JKO98] sparked the research of this thesis. As we have seen in the previous section, many partial differential equations are not gradient flows but contain both conservative and dissipative effects. The JKO-scheme cannot be applied to such systems since it is designed only for gradient flows. It is thus natural to ask the following question.

Question 1: Can we construct approximation schemes for a system that includes both conservative and dissipative effects?

Among all systems that contain both conservative and dissipative effects, the Kramers equation is the first interesting one to work with, for two reasons. First, it describes the evolution of the distribution of a Brownian particle with inertia (i.e., the full Langevin dynamics). The Fokker-Planck equation corresponds to the overdamped regime. Secondly, it is a degenerate diffusion equation in the sense that the diffusion is present only in the momentum variable. This makes the mathematical analysis of the Kramers equation non-trivial.

We recall that the JKO-scheme consists of two terms: a driving functional $\mathcal{F}$ that decreases along solutions, and a dissipation mechanism $W_2$ that describes how $\mathcal{F}$ dissipates. The schemes under discussion, ideally, should respect the conservative-dissipative split in the following way: the driving functional is substituted by a functional that decreases in the dissipative part, and the dissipation mechanism is replaced by an appropriate optimal transportation functional between two probability measures that takes into account the conservative effect. The major difficulty is to construct such an optimal transportation functional to replace the Wasserstein metric. To motivate the construction, we recall that the cost function associated to the Wasserstein metric, $c(x, y) := |x - y|^2$, can be written as a minimum of the integral of squared velocity,

$$c(x, y) = h \inf\Big\{\int_0^h \big|\dot\xi(t)\big|^2\,dt : \xi \in C^1([0, h], \mathbb{R}^d) \text{ such that } \xi(0) = x,\ \xi(h) = y\Big\}. \tag{1.21}$$

This equality can be verified by solving the Euler-Lagrange equation associated to the minimization problem. The optimal path is the straight line connecting $x$ and $y$, traversed at the constant velocity $(y - x)/h$. As a result, the right-hand side is equal to $|x - y|^2$. Note that the cost function $c(x, y)$ is independent of the time step $h$. We also see that there is no inertial effect in the minimization formulation above. This is compatible with the fact that the Fokker-Planck equation is the overdamped limit of the Kramers equation, where inertia is negligible. This discussion suggests that in order to construct an approximation scheme for the Kramers equation, one needs to incorporate inertia into the minimization problem that defines a cost function between two points in the ambient space. We will show later that the resulting cost function depends on the time step $h$ in a non-trivial way.
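The variational formula for the quadratic cost can be tested on discretized paths. The following sketch, with hypothetical function names and a hypothetical sine perturbation, evaluates $h\int_0^h |\dot\xi|^2\,dt$ for piecewise-linear paths: the straight line gives $|x - y|^2$ for every $h$, and a perturbed path with the same endpoints gives a larger value.

```python
import math


def scaled_action(path, h):
    """h * integral_0^h |xi'(t)|^2 dt for a piecewise-linear path on a uniform grid."""
    dt = h / (len(path) - 1)
    action = 0.0
    for a, b in zip(path, path[1:]):
        speed_sq = sum((bi - ai) ** 2 for ai, bi in zip(a, b)) / dt ** 2
        action += speed_sq * dt
    return h * action


def straight_line(x, y, n):
    """n + 1 equally spaced points on the segment from x to y."""
    return [tuple(xi + (yi - xi) * k / n for xi, yi in zip(x, y)) for k in range(n + 1)]


def bumped_line(x, y, n, amplitude):
    """Straight line plus a sine bump in the first coordinate; same endpoints."""
    line = straight_line(x, y, n)
    return [
        (pt[0] + amplitude * math.sin(math.pi * k / n),) + pt[1:]
        for k, pt in enumerate(line)
    ]


x, y = (0.0, 0.0), (3.0, 4.0)  # |x - y|^2 = 25
cost_small_h = scaled_action(straight_line(x, y, 200), h=0.1)
cost_large_h = scaled_action(straight_line(x, y, 200), h=2.0)
cost_bumped = scaled_action(bumped_line(x, y, 200, amplitude=0.5), h=2.0)
```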

We postpone the construction of the cost function to the next sections. Now we move to the second question.

Question 2: Why the JKO-scheme?

The JKO-scheme is remarkable since it provides a rigorous mathematical proof for the well-accepted notion in physics that the diffusion process is driven by the entropy. Hence it is not surprising that the free energy should decay along solutions of the Fokker-Planck equation. However, it is not intuitively clear why the dissipation of free energy must be described by the Wasserstein metric. The appearance of the Wasserstein metric in this formulation is intriguing since it was originally introduced in a completely different field of mathematics. How are the Wasserstein metric and the diffusion process connected? We recall that the fundamental solution of the diffusion process is given by the Gaussian kernel

$$G(t, x, y) = \frac{1}{(4\pi t)^{d/2}}\,e^{-\frac{|x - y|^2}{4t}}. \tag{1.22}$$

Looking at the two formulas (1.17) and (1.22) we find one common factor: the squared distance $|x - y|^2$. Is this simply a coincidence? Is there a deeper connection behind it?

Understanding the origin of the Wasserstein metric in the JKO-scheme is desirable because of its ubiquity, as mentioned after Theorem 1.3.1.

Of course, one can also ask similar questions for the approximation schemes that come up in Question 1. We pose this in a more general form.

Question 3: How can we derive GENERIC structures?

A systematic approach to this question was developed in the work of Öttinger and coworkers [Ott05, Chapter 6]. The above question aims at obtaining a mathematically rigorous approach. This is non-trivial due to the non-linearity and generality of the GENERIC framework. A reasonable first approach is to study the question case by case. Answering this question would lead to a deeper understanding of a large class of partial differential equations and would certainly provide techniques for many purposes, especially mathematical modelling.

As has been shown in Section 1.3, all partial differential equations treated in this thesis can be derived as the hydrodynamic limits of underlying particle systems. However, the hydrodynamic limit provides only the equation, but not the structure of the equation. To obtain the equation together with its structure, we thus need to look at a stronger result than the hydrodynamic limit. Surprisingly enough, the answers to all these questions can be found from one theory: the large-deviation principle of stochastic processes. We now briefly summarize this theory and apply it to answer the questions above for the equations listed in Section 1.1.2.

1.5 Large-deviation principle

We first recall some general information on large-deviation principles since we will apply them in different contexts. For more detail on the theory, we refer to the books [DZ87, FK06].

The theory of large deviations is concerned with the asymptotic behaviour of sequences of probability measures. Let $X$ be a Polish space (i.e. a complete separable metric space). Let $\{\mu_k\}$ be a sequence of probability measures on $X$. The sequence $\{\mu_k\}$ satisfies a large-deviation principle if there is a lower semicontinuous function $I : X \to [0, \infty]$ such that the following two conditions hold.

(i) For every open set $O \subset X$,

$$\liminf_{k\to\infty} \frac{1}{k}\log\mu_k(O) \ge -\inf_{x \in O} I(x).$$

(ii) For every closed set $G \subset X$,

$$\limsup_{k\to\infty} \frac{1}{k}\log\mu_k(G) \le -\inf_{x \in G} I(x).$$

$I$ is called the rate function. It is good if the level set $K_a = \{x : I(x) \le a\}$ is compact in $X$ for each $a < \infty$.

Morally speaking, the large-deviation principle provides the probability of observing any event $x \in X$,

$$\mu_k(x) \approx \exp[-kI(x)], \quad \text{as } k \to \infty.$$

Since the rate function $I$ is always non-negative, we have

$$\lim_{k\to\infty} \exp[-kI(x)] = \begin{cases} 1, & \text{if } I(x) = 0, \\ 0, & \text{otherwise.} \end{cases}$$
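A standard example, which is not taken from the thesis, makes the exponential scaling concrete. Let $\mu_k$ be the law of the average of $k$ independent standard Gaussian variables, so that $\mu_k = N(0, 1/k)$; the corresponding rate function is $I(x) = x^2/2$. The sketch below compares $-\frac{1}{k}\log\mu_k([a, \infty))$ with $\inf_{x \ge a} I(x) = a^2/2$ for $a > 0$; the function names are hypothetical.

```python
import math


def tail_probability(k, a):
    """mu_k([a, inf)), where mu_k = N(0, 1/k) is the law of the mean of k standard normals."""
    return 0.5 * math.erfc(a * math.sqrt(k / 2.0))


def empirical_rate(k, a):
    """-(1/k) log mu_k([a, inf)), to be compared with the infimum of I over [a, inf)."""
    return -math.log(tail_probability(k, a)) / k


def rate_function(x):
    """Rate function of the Gaussian sample mean."""
    return 0.5 * x * x


rates = [empirical_rate(k, 1.0) for k in (10, 100, 1000)]
limit = rate_function(1.0)
```

The values in `rates` decrease towards `limit` as $k$ grows; the gap is of order $\log(k)/k$, which is invisible at the exponential scale that the large-deviation principle captures.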

We are interested in large-deviation principles in two different settings: large deviations from a small-noise limit and large deviations from the hydrodynamic limit. We will use the former to construct the new cost functions called for in Question 1 and use the latter to derive the generalized gradient flow structures of Equation (1.1), answering Questions 2 and 3.

The small-noise large-deviation principle is given by the Freidlin-Wentzell theorem [DZ87, Theorem 5.6.7]. Consider a stochastic differential equation on $\mathbb{R}^d$ with small noise

$$dX^\varepsilon_t = B(X^\varepsilon_t)\,dt + \sqrt{2\varepsilon}\,\sigma(X^\varepsilon_t)\,dW_t, \quad 0 \le t \le 1, \quad X^\varepsilon_0 = x.$$

The diffusion matrix $\sigma$ is not necessarily non-degenerate. The Freidlin-Wentzell theorem [DZ87, Theorem 5.6.7] states that the process $X^\varepsilon$ satisfies a large-deviation principle in the space $C([0, 1], \mathbb{R}^d)$ with the rate functional given by

$$I(f) = \inf_v \frac{1}{2}\int_0^1 |v|^2\,dt,$$

where the infimum is taken over $v \in L^2([0, 1], \mathbb{R}^d)$ such that $\dot f = B(f) + \sigma(f)v$. If $\sigma$ is non-degenerate, $I$ can be written explicitly in terms of $f$,

$$I(f) = \frac{1}{2}\int_0^1 |\sigma(f)^{-1}(\dot f - B(f))|^2\,dt.$$
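The explicit form of the rate functional is easy to evaluate on a discretized path. The sketch below is a scalar illustration under stated assumptions: the diffusion coefficient is non-degenerate, the path is sampled on a uniform grid of $[0, 1]$, and the drift $-x$ with unit diffusion coefficient is a hypothetical test case. A path that solves the noiseless equation has zero cost, while a path that resists the drift pays a positive cost.

```python
import math


def rate_functional(path, drift, sigma, t_final=1.0):
    """Midpoint-rule approximation of (1/2) * integral |sigma(f)^{-1} (f' - drift(f))|^2 dt."""
    n = len(path) - 1
    dt = t_final / n
    total = 0.0
    for left, right in zip(path, path[1:]):
        mid = 0.5 * (left + right)
        velocity = (right - left) / dt
        control = (velocity - drift(mid)) / sigma(mid)
        total += control * control * dt
    return 0.5 * total


n = 1000
relaxing = [math.exp(-k / n) for k in range(n + 1)]  # solves f' = -f with f(0) = 1
frozen = [1.0] * (n + 1)                             # stays at 1 against the drift

cost_relaxing = rate_functional(relaxing, lambda x: -x, lambda x: 1.0)
cost_frozen = rate_functional(frozen, lambda x: -x, lambda x: 1.0)
```

For the frozen path the integrand is identically 1, so the cost is $1/2$; this is the exponential price, in the sense of the large-deviation principle, of keeping the process at 1 for one unit of time.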

Now we proceed with the large-deviation principle from the hydrodynamic limit of the particle system (1.3). We are concerned with the large-deviation principle of the empirical process (1.4) in the space $C([0,T],\mathcal{P}(\mathbb{R}^N))$. There is a large literature on large-deviation principles for the empirical process (1.4) associated to the weakly interacting particle systems (1.3). We only review the most relevant results. Dawson and Gärtner [DG87] proved the large-deviation principle for the case of non-degenerate weakly interacting diffusions, i.e., for nonsingular mobilities $\sigma$ with range $\mathbb{R}^{2d}$. Cattiaux and Léonard [CL94, CL95a] generalized the method of Dawson and Gärtner to singular mobilities, but for independent particles. Feng and Kurtz [FK06] introduced different techniques, based on convergence of semigroups, to strengthen the result in [DG87] to the Wasserstein topology, but they required super-quadratic assumptions on the potential $H$. Recently, Budhiraja et al. [BDF12] proved the full general case treated in this thesis, but the rate functional they obtain is in implicit form, through optimal control.

In the large-deviation principle from the hydrodynamic limit, a minimizer of the rate functional is a solution of a macroscopic partial differential equation. Variational methods come into play here: we study the rate functional instead of the equation itself. One of the main purposes of the thesis is to derive the geometric structure of Equation (1.1) via the rate functional and use it to study coarse-graining; therefore, we are interested not only in the large-deviation results themselves but also in the representations of the rate functional.

We will use two alternative representations. The first one is an integral over time of the deviation from solutions to (1.1), measured by a norm deduced from the diffusion matrix,

\[ I(\rho) = \frac{1}{2} \int_0^T \big\|\partial_t\rho(t) - \mathcal{L}(\rho(t))^*\rho(t)\big\|^2_{-1,A,\rho}\,dt, \tag{1.23} \]

where $\|\cdot\|_{-1,A,\rho}$ is the (semi-)norm defined using the Benamou-Brenier formula (1.19),

\[ \|s\|^2_{-1,A,\rho} = \inf\Big\{ \int |v|^2\,d\rho \;:\; s + \operatorname{div}(\rho A v) = 0 \Big\}. \]

We can clearly see from this formulation that a curve $\rho \in C([0,T],\mathcal{P}(\mathbb{R}^N))$ is a solution to (1.1) if and only if $I(\rho) = 0$. The second representation of $I$ is derived from the one above using the dual formulation of the norm $\|\cdot\|^2_{-1,A,\rho}$,

\[ I(\rho) = \sup_{f \in C_c^\infty(\mathbb{R}\times\mathbb{R}^N)} G(\rho, f), \tag{1.24} \]

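where

\[ G(\rho, f) = \int_{\mathbb{R}^N} \big[ f_T\,d\rho_T - f_0\,d\rho_0 \big] - \int_0^T\!\!\int_{\mathbb{R}^N} \big[ (\partial_t + \mathcal{L}(\rho_t)) f_t \big]\,d\rho_t\,dt - \frac{1}{2} \int_0^T\!\!\int_{\mathbb{R}^N} A\nabla f_t \cdot \nabla f_t\,d\rho_t\,dt. \]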

Thanks to this supremum formulation, this representation is useful for studying coarse-graining in Part II.

We now discuss in more detail how we use the large-deviation principles to answer the questions posed in Section 1.4.

1.6 Approximation schemes from small-noise large deviation

In Chapter 2, we introduce new approximation schemes for the Kramers equation⁴. We consider a small-noise perturbation of the ordinary differential equation given by the drift vector $B$. For the Kramers equation, it is a stochastically perturbed Hamiltonian system⁵

\[ dQ^\varepsilon(t) = \frac{P^\varepsilon(t)}{m}\,dt, \tag{1.25a} \]
\[ dP^\varepsilon(t) = -\nabla V(Q^\varepsilon)\,dt + \sqrt{2\gamma\theta\varepsilon}\,dW(t), \tag{1.25b} \]

which can formally be written as

\[ m\frac{d^2}{dt^2} Q^\varepsilon(t) + \nabla V(Q^\varepsilon(t)) = \sqrt{2\gamma\theta\varepsilon}\,\frac{dW}{dt}(t). \]

An application of the Freidlin-Wentzell theorem states that $Q^\varepsilon$ satisfies a large-deviation principle in $C([0,h],\mathbb{R}^d)$ as $\varepsilon \to 0$ with rate function

\[ I(\xi) = \frac{1}{4\gamma\theta} \int_0^h \big| m\ddot\xi(t) + \nabla V(\xi(t)) \big|^2\,dt. \]

We define a cost function by

\[ C_h(q,p;q',p') := h \inf\Big\{ \int_0^h \big| m\ddot\xi(t) + \nabla V(\xi(t)) \big|^2\,dt \;:\; \xi \in C^1([0,h],\mathbb{R}^d) \text{ such that } (\xi, m\dot\xi)(0) = (q,p),\ (\xi, m\dot\xi)(h) = (q',p') \Big\}. \tag{1.26} \]

It is worth comparing (1.21) and (1.26). Instead of minimizing the integral of the squared velocity as in the Wasserstein cost, the cost function $C_h(q,p;q',p')$ minimizes the deviation from the Hamiltonian flow that connects two points $(q,p), (q',p') \in \mathbb{R}^{2d}$. The inertial effect is also taken into account explicitly. The optimal curve is now more complicated, and $C_h$ depends on $h$ in a non-trivial way.
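As an illustration of how $C_h$ depends on $h$, consider the special case $V \equiv 0$ in one dimension, which is an assumption made here purely for tractability. The minimization in (1.26) then reduces to minimizing $\int_0^h |m\ddot\xi|^2\,dt$ with prescribed positions and momenta at both ends, whose minimizer is a cubic polynomial. A direct computation gives $C_h = |p'-p|^2 + 12\,m^2\,|(q'-q)/h - (p+p')/(2m)|^2$. The sketch below compares this closed form with a numerical evaluation along the cubic and along a perturbed curve with the same end conditions; all function names are illustrative.

```python
def free_cost(q, p, q_new, p_new, h, m=1.0):
    """Closed form of C_h for V = 0 in one dimension."""
    dv = (p_new - p) / m                            # change in velocity
    gap = (q_new - q) - 0.5 * h * (p + p_new) / m   # mismatch with free flight
    return m * m * (dv * dv + 12.0 * gap * gap / (h * h))


def optimal_acceleration(q, p, q_new, p_new, h, m=1.0):
    """Acceleration of the minimizing cubic: affine in t."""
    dv = (p_new - p) / m
    gap = (q_new - q) - 0.5 * h * (p + p_new) / m
    slope = -12.0 * gap / h**3
    intercept = dv / h - 0.5 * slope * h
    return lambda t: intercept + slope * t


def curve_cost(acceleration, h, m=1.0, steps=2000):
    """h * int_0^h |m * xi''(t)|^2 dt by the midpoint rule."""
    dt = h / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += (m * acceleration(t)) ** 2 * dt
    return h * total


h = 0.5
closed_form = free_cost(0.0, 1.0, 1.0, 0.0, h)
best = optimal_acceleration(0.0, 1.0, 1.0, 0.0, h)


def perturbed(t):
    # Adds the second derivative of 3 * t^2 * (h - t)^2, which keeps the
    # positions and momenta at t = 0 and t = h unchanged.
    return best(t) + 3.0 * (2.0 * h * h - 12.0 * h * t + 12.0 * t * t)


print(closed_form, curve_cost(best, h), curve_cost(perturbed, h))
```

Any admissible perturbation increases the cost, and the $1/h^2$ factor in the position term shows the non-trivial dependence on the time step.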

Having this cost functional, the optimal transport cost functional between two probability measures and the approximation schemes are constructed in the same manner as the Wasserstein metric and the JKO-scheme. The main result of Chapter 2 can be summarized as follows.

⁴ Chapter 2 is joint work with Mark Peletier and Johannes Zimmer [DPZ13a].
⁵ Since we only construct approximation schemes for the Kramers equation, $\psi \equiv 0$.

Theorem 1.6.1. Given a previous state $\rho^h_{k-1}$, define $\rho^h_k$ as the solution of the minimization problem

\[ \min_{\rho}\; \frac{1}{2h}\frac{1}{\gamma} W_h(\rho^h_{k-1}, \rho) + \int_{\mathbb{R}^N} \Big( \frac{p^2}{2m} + \log\rho \Big)\rho, \tag{1.27} \]

where $W_h$ is the optimal-transport cost on $\mathcal{P}(\mathbb{R}^{2d}) \times \mathcal{P}(\mathbb{R}^{2d})$ with cost function $C_h$. Then, up to a piecewise linear interpolation, $\rho^h_k$ converges to the (unique) weak solution of the Kramers equation (1.7).

The precise notion of a weak solution and the proof of this theorem will be given in Chapter 2. Furthermore, we also provide two variants of the cost functional $C_h$ and corresponding schemes, which are more convenient for different purposes.

1.7 Wasserstein gradient flow structure of the Fokker-Planck equation

In Chapter 3, we provide a microscopic interpretation of the Wasserstein gradient flow structure of the Fokker-Planck equation⁶. We consider a conditional large-deviation principle for the empirical process (1.4) after a time step $h$, given the initial distribution $\rho_0$,

\[ \mathrm{Prob}\big( \rho_n(h) \approx \rho \,\big|\, \rho_n(0) \approx \rho_0 \big) \approx \exp\big( -n J_h(\rho|\rho_0) \big) \quad \text{as } n \to \infty, \]

where the "$\approx$" is made precise in Section 1.5.

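Applying the results in [DG87] and [FK06, Theorem 13.37] and using a contraction principle, we can write $J_h(\rho|\rho_0)$ as follows,

\[ J_h(\rho|\rho_0) = \inf_{\rho(\cdot) \in C(\rho_0,\rho)} \Big\{ \frac{1}{4h} \int_0^1 \big\| \partial_t\rho - h\big(\Delta\rho + \operatorname{div}(\rho\nabla H)\big) \big\|^2_{-1,\rho}\,dt \Big\}, \tag{1.28} \]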

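where $C(\rho_0,\rho)$ is the set of narrowly continuous curves $[0,1] \to \mathcal{P}(\mathbb{R}^d)$ starting in $\rho_0$ and ending in $\rho$, and $\|\cdot\|_{-1,\rho}$ is defined in (1.19). Under certain conditions on the potential $H$ and the initial profile $\rho_0$, the integral on the right-hand side of (1.28) can be expanded, so that the free energy difference is explicitly present,

\[ \frac{1}{4h} \int_0^1 \big\| \partial_t\rho - h\big(\Delta\rho + \operatorname{div}(\rho\nabla H)\big) \big\|^2_{-1,\rho}\,dt = \frac{1}{2}\big[ \mathcal{F}(\rho) - \mathcal{F}(\rho_0) \big] + \frac{1}{4h} \int_0^1 \|\partial_t\rho\|^2_{-1,\rho}\,dt + \frac{h}{4} \int_0^1 \big\| \Delta\rho + \operatorname{div}(\rho\nabla H) \big\|^2_{-1,\rho}\,dt. \tag{1.29} \]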

Using this formulation, in Chapter 3 we prove the following.

⁶ Chapter 3 is joint work with Michiel Renger and Vaios Laschos [DLR13].

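Theorem 1.7.1. Let $d = 1$. Under certain conditions on the potential $H$ and the initial profile $\rho_0$, we have

\[ J_h(\rho|\rho_0) = \frac{1}{2}\big[ \mathcal{F}(\rho) - \mathcal{F}(\rho_0) \big] + \frac{1}{4h} W_2^2(\rho,\rho_0) + o(1) \quad \text{as } h \downarrow 0. \tag{1.30} \]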

This theorem will be proved rigorously in the sense of Gamma-convergence in Chapter 3. This result for the diffusion equation was first discovered in [ADPZ11]. The novelty of our approach is the use of the large-deviation principle in the path space followed by a contraction principle, as shown above. The liminf inequality in the Gamma-convergence then formally follows directly from (1.29). In addition, our result holds for a larger class of systems than in [ADPZ11].

This result provides us with a microscopic interpretation of the Wasserstein gradient flow structure of the Fokker-Planck equation. Up to a factor 1/2, which does not affect the minimization problem, $J_h$ is the same as the functional $K_h$ in the JKO-scheme (1.18) in the limit of vanishing time step. Being the rate functional in the large-deviation principle, $J_h$ characterizes the fluctuations of the microscopic behavior of the system. A minimizer of $J_h$ corresponds to an exact solution of the Fokker-Planck equation at time $h$. Thus, the equivalence between $J_h$ and $K_h$ not only gives a microscopic origin of the functional $K_h$ but also explains why the sum of (the difference of) the free energy and the Wasserstein metric should be minimized to obtain the best approximation of the real solution at the time step.

1.8 GENERIC structure of the Vlasov-Fokker-Planck equation

In the previous section, we have shown that the Wasserstein gradient flow structure of the Fokker-Planck equation arises as a characterization of the large-deviation behaviour of an underlying stochastic particle system, thus explaining, amongst other things, the origin of the Wasserstein gradient flows.

In Chapter 4, we generalize this relationship beyond gradient flows to an example from the class of GENERIC systems: the Vlasov-Fokker-Planck (VFP) equation (1.5)⁷. The VFP equation itself is not a GENERIC system, since the total free energy, $\mathcal{H}(\rho_t) = \int \big( H(q,p) + \frac{1}{2}(\psi * \rho)(q) \big)\rho_t\,dq\,dp$, is not conserved. However, by adding an auxiliary variable $e$ that characterizes the exchange of energy with the heat bath,

\[ \partial_t e = -\frac{d}{dt}\mathcal{H}(\rho_t), \]

the VFP equation can be written in GENERIC form for $z = (\rho, e)$ with an explicit construction of $\mathsf{Z}, \mathsf{L}, \mathsf{M}, \mathsf{E}, \mathsf{S}$, see Section 4.3. We then focus on deriving the GENERIC structure from the large-deviation principle of the empirical processes of the microscopic particle system (1.6). Instead of looking at the rate functional at a final time step as in the previous section, we now consider the rate functional on the whole trajectory (path) space. The large-deviation principle of $\rho_n$ in $C([0,T],\mathcal{P}(\mathbb{R}^{2d}))$, as well as a characterization of the rate functional $I(\rho)$ via optimal control, have been proved in [BDF12]. In Chapter 4, we provide alternative representations of the rate functional.

⁷ Chapter 4 is joint work with Mark Peletier and Johannes Zimmer [DPZ13b]. The paper has been selected by the editors of Nonlinearity for inclusion in the exclusive 2013 Highlights Collection.

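Theorem 1.8.1. Assume that the initial data $(Q_i(0), P_i(0))$, $i = 1, \ldots, n$, are deterministic and chosen such that $\rho_n(0) \rightharpoonup \rho_0$ for some $\rho_0 \in \mathcal{P}(\mathbb{R}^{2d})$. Then the empirical process $\rho_n$ satisfies a large-deviation principle in the space $C([0,T],\mathcal{P}(\mathbb{R}^{2d}))$, with good rate functional

\[ I(\rho) = \begin{cases} \dfrac{1}{4\gamma\theta} \displaystyle\int_0^T \big\| \partial_t\rho_t - \mathcal{L}(\rho_t)^*\rho_t \big\|^2_{-1,A,\rho_t}\,dt & \text{if } \rho \in AC([0,T];\mathcal{P}(\mathbb{R}^{2d})) \text{ and } \rho|_{t=0} = \rho_0, \\ +\infty & \text{otherwise,} \end{cases} \tag{1.31} \]

where $AC([0,T];\mathcal{P}(\mathbb{R}^{2d}))$ denotes the set of all curves that are absolutely continuous in the distributional sense (see Definition 4.2.1).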

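The rate functional can be written in terms of the ingredients of the GENERIC structure in the sense that

\[ J(z) = \begin{cases} I(\rho) & \text{provided } t \mapsto \displaystyle\int_{\mathbb{R}^{2d}} \Big( \frac{p^2}{2m} + V(q) + \frac{1}{2}(\psi * \rho)(q) \Big)\rho(t, dq\,dp) + e_t \text{ is constant,} \\ +\infty & \text{otherwise,} \end{cases} \]

where

\[ J(z) = \begin{cases} \displaystyle\int_0^T \frac{1}{4\theta} \big\| \partial_t z_t - \mathsf{L}(z_t)\,\mathrm{grad}\,\mathsf{E}(z_t) - \mathsf{M}(z_t)\,\mathrm{grad}\,\mathsf{S}(z_t) \big\|^2_{\mathsf{M}(z_t)^{-1}}\,dt & \text{if } z = (\rho,e) \in AC([0,T];\mathsf{Z}) \text{ and } \rho_{t=0} = \rho_0, \\ +\infty & \text{otherwise.} \end{cases} \]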

In the expression above, $\|\cdot\|_{\mathsf{M}(z)^{-1}}$ is the (semi-)norm deduced from the positive semidefinite operator $\mathsf{M}$, and $\rho_0$ is the initial profile. This result demonstrates that the GENERIC structure of the (extended) VFP equation can be deduced from the rate functional of the large-deviation principle of the underlying microscopic particle system.

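Using the degeneracy condition of GENERIC, which makes the cross term $\langle \mathsf{L}\,\mathrm{grad}\,\mathsf{E}, \mathrm{grad}\,\mathsf{S} \rangle$ vanish, we can expand the integral in $J(z)$,

\[ 2\theta J(z) = \mathsf{S}(z(0)) - \mathsf{S}(z(T)) + \frac{1}{2} \int_0^T \Big[ \big\| \partial_t z - \mathsf{L}\,\mathrm{grad}\,\mathsf{E} \big\|^2_{\mathsf{M}^{-1}} + \big\| \mathrm{grad}\,\mathsf{S} \big\|^2_{\mathsf{M}} \Big]\,dt. \tag{1.32} \]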

We also formally suggest a variational formulation for an arbitrary GENERIC system.

Variational formulation of a GENERIC system: Given a GENERIC system $\mathsf{Z}, \mathsf{E}, \mathsf{S}, \mathsf{L}, \mathsf{M}$, define $J$ as in (1.32). A function $z : [0,T] \to \mathsf{Z}$ is a solution of the GENERIC equation (1.13) iff $J(z) = 0$.

We expect that this formulation will be helpful for understanding GENERIC, although it is very formal at this stage. For gradient flow systems, this variational formulation is well known and has been put to good use. For instance, Sandier and Serfaty [SS04] (see also e.g. [Ser09, Ste08, Le08, AMP+12]) showed how the variational form can be used to pass to limits in parameters in the equation. In Chapter 7, we will use this variational formulation to derive the Fokker-Planck equation from the Kramers equation in the high friction limit, and a diffusion on a graph from a perturbed Hamiltonian system. It would also be interesting to ask whether one can use this formulation to construct approximation schemes for a GENERIC system that are similar to the minimizing movement schemes developed for gradient flows in the metric space setting [AGS08]. We consider the scheme presented in Chapter 2 as the first step in this direction.

1.9 Generalizations: nonlinear diffusion and the thermo-visco-elasticity

In the preceding sections, we have shown how to derive the Wasserstein gradient flow structure of the Fokker-Planck equation and the GENERIC structure of the Vlasov-Fokker-Planck equation from particle systems, using either a discrete-time or a continuous-time approach. It is quite straightforward to construct the particle systems for these equations since they are either linear or weakly nonlinear. Things become more involved when we deal with more complex systems. Finding the right particle systems and establishing the hydrodynamic limits and large-deviation principles for such systems are non-trivial. In this section, we attempt to generalize the results of the previous sections to two fully nonlinear systems: the porous medium equation and the thermo-visco-elasticity equation. We are not yet able to rigorously prove the large-deviation principle for these systems.

1.9.1 The porous medium equation

In Chapter 5, we provide a first attempt to generalize (1.30) to the porous medium equation⁸,

\[ \partial_t\rho(t,x) = \Delta\rho^{2-q}(t,x) \quad \text{for } (t,x) \in (0,\infty)\times\mathbb{R}^d, \qquad \rho(0,x) = \rho_0(x). \tag{1.33} \]

⁸ The result of Chapter 5 has been submitted for publication [Duo13].

In [Ott01], Otto showed that the porous medium equation is a gradient flow of the internal energy functional $E_q$,

\[ E_q(\rho) = \begin{cases} \dfrac{1}{1-q} \displaystyle\int_{\mathbb{R}^d} \rho(x)\big[ \rho(x)^{1-q} - 1 \big]\,dx & \text{if } q \neq 1, \\ \displaystyle\int_{\mathbb{R}^d} \rho(x)\log\rho(x)\,dx & \text{if } q = 1, \end{cases} \tag{1.34} \]

with respect to the Wasserstein distance $W_2$ defined in (1.17). The JKO-scheme in this case reads

\[ \rho_k \in \operatorname*{argmin}_{\rho} K_h(\rho,\rho_{k-1}), \qquad K_h(\rho,\rho_{k-1}) = \frac{1}{2h} W_2^2(\rho,\rho_{k-1}) + E_q(\rho) - E_q(\rho_{k-1}). \tag{1.35} \]
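For intuition, the scheme (1.35) can be carried out by hand in the linear case $q = 1$, $d = 1$ when the data are Gaussian. The following minimal sketch rests on several assumptions that are not part of the thesis: we restrict the minimization to Gaussians $N(\mu,\sigma^2)$, and we use the standard facts that $W_2^2$ between two one-dimensional Gaussians equals $(\mu-\mu')^2 + (\sigma-\sigma')^2$ and that $E_1(N(\mu,\sigma^2)) = -\log\sigma - \frac{1}{2}\log(2\pi e)$. Each step then keeps the mean and minimizes $(\sigma - \sigma_{k-1})^2/(2h) - \log\sigma$, which gives $\sigma^2 - \sigma_{k-1}\sigma - h = 0$.

```python
import math


def jko_step_sigma(sigma_prev: float, h: float) -> float:
    """Positive root of sigma^2 - sigma_prev * sigma - h = 0 (one JKO step)."""
    return 0.5 * (sigma_prev + math.sqrt(sigma_prev * sigma_prev + 4.0 * h))


def jko_variance(sigma0: float, final_time: float, steps: int) -> float:
    """Variance after `steps` JKO steps of size final_time / steps."""
    h = final_time / steps
    sigma = sigma0
    for _ in range(steps):
        sigma = jko_step_sigma(sigma, h)
    return sigma * sigma


# The heat equation d/dt rho = Laplace(rho) maps N(mu, s^2) to N(mu, s^2 + 2t).
exact = 1.0 + 2.0 * 1.0
for steps in (10, 100, 1000):
    print(steps, jko_variance(1.0, 1.0, steps), exact)
```

The discrete variance approaches the exact value $\sigma_0^2 + 2t$ as the time step decreases, which is the convergence of the JKO-scheme in this toy setting.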

As has been shown in Chapter 3 (also [ADPZ11, DLZ12]) for the case of the linear diffusion equation (i.e., with $q = 1$), the functional $K_h$ in (1.35) is asymptotically equivalent, as $h \to 0$, to a discrete rate functional $J_h$ that comes from the large-deviation principle of the microscopic model. The rate functional $J_h : \mathcal{P}(\mathbb{R}^d) \to [0,+\infty]$ is defined by

\[ J_h(\rho|\rho_0) = \inf_{Q \in \Gamma(\rho_0,\rho)} \mathcal{H}(Q\|Q_{0\to h}), \tag{1.36} \]

where

\[ Q_{0\to h}(dx\,dy) = p_h(x,y)\,\rho_0(dx)\,dy, \qquad p_h(x,y) = \frac{1}{(4\pi h)^{d/2}}\, e^{-\frac{|x-y|^2}{4h}}, \]

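and $\mathcal{H}(Q\|Q_{0\to h})$ is the relative entropy of $Q$ with respect to $Q_{0\to h}$, defined as

\[ \mathcal{H}(Q\|Q_{0\to h}) = \begin{cases} \displaystyle\int_{\mathbb{R}^{2d}} \log\Big( \frac{dQ}{dQ_{0\to h}} \Big)\,dQ & \text{if } \dfrac{dQ}{dQ_{0\to h}} \text{ exists,} \\ +\infty & \text{otherwise.} \end{cases} \]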

The aim of Chapter 5 is to generalize (1.30) to the nonlinear porous medium equation for the class of $q$-Gaussian measures in 1D. As will become clear in Section 5.2, this class plays an important role because it is invariant under the semigroup of the porous medium equation and is isometric to the space of Gaussian measures with respect to the Wasserstein metric. Before stating the main result of Chapter 5, we need to recall some relevant information about the $q$-Gaussian measures.

The $q$-exponential function and its inverse, the $q$-logarithmic function, are defined respectively by

\[ \exp_q(t) = \big[ 1 + (1-q)t \big]_+^{\frac{1}{1-q}}, \tag{1.37} \]

where $[x]_+ = \max\{0, x\}$, and

\[ \log_q(t) = \frac{t^{1-q} - 1}{1-q} \quad \text{for } t > 0. \tag{1.38} \]
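The two definitions (1.37) and (1.38) translate directly into code. The sketch below is illustrative only: the function names and the convention of returning the classical exponential and logarithm at $q = 1$ (the limiting case) are our choices.

```python
import math


def exp_q(t: float, q: float) -> float:
    """q-exponential [1 + (1 - q) t]_+^(1 / (1 - q)); classical exp at q = 1."""
    if q == 1.0:
        return math.exp(t)
    base = max(0.0, 1.0 + (1.0 - q) * t)
    exponent = 1.0 / (1.0 - q)
    if base == 0.0 and exponent < 0.0:
        return math.inf  # for q > 1 the q-exponential blows up at the cut-off
    return base ** exponent


def log_q(t: float, q: float) -> float:
    """q-logarithm (t^(1 - q) - 1) / (1 - q) for t > 0; classical log at q = 1."""
    if t <= 0.0:
        raise ValueError("log_q is only defined for t > 0")
    if q == 1.0:
        return math.log(t)
    return (t ** (1.0 - q) - 1.0) / (1.0 - q)


print(exp_q(log_q(2.5, 0.5), 0.5))                  # inverse pair
print(log_q(2.0, 1.0 + 1e-6), math.log(2.0))        # q -> 1 recovers log
print(exp_q(-3.0, 0.5))                             # compact support for q < 1
```

For $q < 1$ the cut-off $[\,\cdot\,]_+$ makes $\exp_q$ vanish identically below $-1/(1-q)$, which is why $q$-Gaussians with $q < 1$ have compact support.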

Given $m \in \mathbb{R}$, to be specified later on, the $m$-relative entropy between two probability measures $Q(dx) = f(x)\,dx$ and $P(dx) = g(x)\,dx$ that are absolutely continuous with respect to the Lebesgue measure is given by

\[ \begin{aligned} H_m(Q\|P) &= \frac{1}{2-m} \int \big[ f\log_m f - g\log_m g - (2-m)\log_m g\,(f-g) \big]\,dx \\ &= \frac{1}{2-m} \int \big[ f\log_m f + (1-m)\,g\log_m g - (2-m)\,f\log_m g \big]\,dx. \end{aligned} \tag{1.39} \]

For $v \in \mathbb{R}^d$ and $V \in \mathrm{Sym}^+(d,\mathbb{R})$, the set of all symmetric positive definite matrices of size $d$, the $q$-Gaussian measure with mean $v$ and covariance matrix $V$ is given by

\[ N_q(v,V) = C_0(q,d)\,(\det V)^{-\frac{1}{2}} \exp_q\Big[ -\frac{1}{2} C_1(q,d)\,\langle x - v, V^{-1}(x - v) \rangle \Big]\,\mathcal{L}^d, \tag{1.40} \]

where $\mathcal{L}^d$ is the Lebesgue measure on $\mathbb{R}^d$ and $C_0(q,d)$, $C_1(q,d)$ are positive constants depending only on $d$ and $q$, defined explicitly in Section 5.2. From now on, we denote by $n_q(v,V)$ the density of $N_q(v,V)$ with respect to the Lebesgue measure.

In particular, the $q$-Gaussian measure in 1D has density

\[ n_q(\mu,\sigma^2) = \frac{C_0(q,1)}{\sigma} \exp_q\Big( -\frac{1}{2} C_1(q,1)\,\frac{(x-\mu)^2}{\sigma^2} \Big). \tag{1.41} \]

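In 2D, the $q$-bivariate Gaussian $N_q(\mu_1,\sigma_1^2,\mu_2,\sigma_2^2,\theta)$ has density

\[ n_q(v,V) = \frac{C_0(q,2)}{\sigma_1\sigma_2\sqrt{1-\theta^2}} \exp_q\Big\{ -\frac{1}{2} C_1(q,2)\,\frac{1}{1-\theta^2} \Big[ \frac{(x-\mu_1)^2}{\sigma_1^2} + \frac{(y-\mu_2)^2}{\sigma_2^2} - \frac{2\theta(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} \Big] \Big\}, \tag{1.42} \]

which corresponds to the mean vector $v$ and covariance matrix $V$,

\[ v = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad V = \begin{pmatrix} \sigma_1^2 & \theta\sigma_1\sigma_2 \\ \theta\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}. \]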

In Section 5.2, it is shown that if the initial data is a $q$-Gaussian $\rho_0(x) = n_q(\mu_0, C\sigma_0^2)(x)$, then the solution of the porous medium equation at time $t$ is again a $q$-Gaussian, $\rho(t,x) = n_q\big(\mu_0, C(t + \sigma_0^{3-q})^{\frac{2}{3-q}}\big)(x)$, where $C$ is a constant given in (5.21).

The main result of Chapter 5 is the following.

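Theorem 1.9.1. Let $d = 1$, $q \in Q_d \equiv (0,1) \cup \big(1, \frac{d+4}{d+2}\big)$ and $N_q^0 = N_q(\mu_0, C\sigma_0^2)$ be given. We set $m = 3 - 2q$, $\sigma_h^2 = (h + \sigma_0^{3-q})^{\frac{2}{3-q}}$ and

\[ Q_{0\to h} = N_m\Big( \mu_0, C\sigma_0^2, \mu_0, C\sigma_h^2, \frac{\sigma_0}{\sigma_h} \Big). \tag{1.43} \]

For $N_q = N_q(\mu, C\sigma^2)$, we define the functional $J_h(N_q|N_q^0)$ by

\[ J_h(N_q|N_q^0) := \inf_{Q \in \mathcal{Q}} H_m(Q\|Q_{0\to h}), \tag{1.44} \]

where

\[ \mathcal{Q} := \big\{ N_m(\mu_0, C\sigma_0^2, \mu, C\sigma^2, \theta) \;\big|\; \theta \in [-1,1] \big\}. \]

Then there exist explicit constants $a = a(\sigma_0,q)$ and $b = b(\sigma_0,q)$ such that the following statement holds on the sub-manifold of the $q$-Gaussians equipped with the Wasserstein metric,

\[ ab\,(\sigma_h^2 - \sigma_0^2)^{\frac{1-q}{q}}\, J_h(\cdot|N_q^0) = \frac{b}{\sigma_h^2 - \sigma_0^2}\, W_2^2(\cdot, N_q^0) + E_q(\cdot) - E_q(N_q^0) + o(1) \quad \text{as } h \downarrow 0. \]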


Remark 1.9.2. When $q \to 1$, we have $a \to 4$, $b \to \frac{1}{2}$ and $\sigma_h^2 - \sigma_0^2 \to h$, and we recover (1.30) for the diffusion equation. However, for the diffusion equation, $J_h$ is the (conditional) rate functional of the empirical process of many i.i.d. Brownian particles. For the porous medium equation, the functional $J_h$ in (5.13) is defined analogously, but we do not know whether it is the rate functional of some process.

1.9.2 Thermo-visco-elasticity equation as the hydrodynamic limit of a particle system

In Chapter 6, we are interested in the thermo-visco-elasticity system (TVE) that describes the evolution of a visco-elastic body including thermal effects⁹,

\[ \begin{cases} u_{tt} = k u_{xx} + \alpha\theta_x + \mu u_{txx}, \\ \theta_t = \kappa\theta_{xx} + \alpha\theta u_{tx} + \mu u_{tx}^2. \end{cases} \tag{1.45} \]

In the above equation, $u$ and $\theta$ are the displacement and the absolute temperature, respectively. They are functions of $t \in [0,\infty)$ and $x \in \Omega \equiv (0,1)$. In this section and in Chapter 6, subscripts denote derivatives of the unknowns with respect to $t$ and $x$. Finally, $k$, $\alpha$, $\mu$ and $\kappa$ are positive constants. The traditional derivation of the TVE system is based on conservation laws of momentum and energy as well as constitutive assumptions on the free energy. In addition, the second law of thermodynamics is also fulfilled. Equation (1.45) needs to be supplemented with suitable initial and boundary conditions. The existence and uniqueness of global classical solutions to the TVE system (or variants) under appropriate boundary and initial conditions have been studied extensively in the literature, see for instance [CH94]. In this thesis, we are only interested in the structure of the equation.

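The TVE system can be written as a first-order system using the momentum variable $p$ as follows,

\[ \begin{cases} \dot u = p, \\ \dot p = k u_{xx} + \alpha\theta_x + \mu p_{xx}, \\ \dot\theta = \kappa\theta_{xx} + \alpha\theta p_x + \mu p_x^2. \end{cases} \tag{1.46} \]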

Note that the last equation in (1.46) can also be written in terms of the internal energy $e = \theta + \frac{k}{2}u_x^2 + \frac{1}{2}p^2$ as follows,

\[ \dot e = \big( \alpha\theta p + \kappa\theta_x + k p u_x + \mu p p_x \big)_x. \]

In [Mie11], the author showed that the TVE system is a GENERIC system (see Section 1.3) for $z = (u,p,\theta)^T$,

\[ \dot z = \mathsf{L}(z)\,d\mathsf{E}(z) + \mathsf{M}(z)\,d\mathsf{S}(z), \]

with explicit formulas for the building blocks $\mathsf{Z}, \mathsf{L}, \mathsf{M}, \mathsf{E}, \mathsf{S}$, see Section 6.2.

⁹ Chapter 6 is work in progress together with Mark Peletier and Johannes Zimmer.

Our aim is to derive this GENERIC structure of the TVE system from the rate functional of an underlying particle system. Due to the nonlinearity of the TVE system, it is not straightforward to construct a microscopic particle system that gives rise to the TVE system as the hydrodynamic limit. The large-deviation principle from the hydrodynamic limit is even more intricate. In this thesis, we are only able to perform a modest step: guessing the particle model and formally computing the hydrodynamic limit for the case $\alpha = 0$. This is the main content of Chapter 6.

Now we introduce our microscopic particle system.

A microscopic model

We consider a chain of one-dimensional harmonic oscillators located at sites $i \in S_n := \{1, \ldots, n\}$. Here we assume periodicity modulo $n$, i.e., the site $n+1$ is the same as the site 1. The configuration space is denoted by $\Omega_n = (\mathbb{R}\times\mathbb{R})^{S_n}$, and a configuration is $\omega = \{u_i, p_i\}_{i=1}^n$, where $u_i$ and $p_i$ respectively represent the displacement and the momentum of the particle $i$. We consider a process (named TVE-process) whose generator $\mathcal{L}_n$ is given by

\[ \mathcal{L}_n f = \sum_{i=1}^n p_i\,\partial_{u_i} f + \sum_{i=1}^n n^2 k\,(u_{i+1} - 2u_i + u_{i-1})\,\partial_{p_i} f + n^2\mu \sum_{i=1}^n Y_i^2 f \tag{1.47} \]
\[ \phantom{\mathcal{L}_n f} = A_n f + n^2 k\,B_n f + n^2\mu\,C_n f, \tag{1.48} \]

where $f$ is a smooth function, and

\[ Y_i = (p_i - p_{i+1})\,\partial_{p_{i-1}} + (p_{i+1} - p_{i-1})\,\partial_{p_i} + (p_{i-1} - p_i)\,\partial_{p_{i+1}}. \tag{1.49} \]

Remark 1.9.3. The last term in the generator characterizes the random exchange of momentum between three consecutive particles and has been used in [BBO06]. Its main property is to conserve total momentum and total kinetic energy. As a consequence, the TVE-process conserves the deformation field, the total momentum and the energy,

\[ \mathcal{L}_n \sum_{j=1}^{n-1} r_j = \mathcal{L}_n \sum_{j=1}^{n} p_j = \mathcal{L}_n \sum_{j=1}^{n} e_j = 0, \tag{1.50} \]

where $r_j = u_{j+1} - u_j$.
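The conservation property of the exchange term can be verified directly from (1.49): $Y_i$ is a first-order operator acting on $(p_{i-1}, p_i, p_{i+1})$, so it is enough to check that it annihilates the local momentum $p_{i-1} + p_i + p_{i+1}$ and the local kinetic energy $\frac{1}{2}(p_{i-1}^2 + p_i^2 + p_{i+1}^2)$. The following short sketch does this with exact integer arithmetic; the helper names are illustrative.

```python
def exchange_field(p_prev, p_mid, p_next):
    """Coefficients of Y_i in front of d/dp_{i-1}, d/dp_i, d/dp_{i+1}, see (1.49)."""
    return (p_mid - p_next, p_next - p_prev, p_prev - p_mid)


def apply_field(field, gradient):
    """Directional derivative: sum of coefficient times partial derivative."""
    return sum(c * g for c, g in zip(field, gradient))


momenta = (3, -5, 7)                      # arbitrary test values
field = exchange_field(*momenta)

momentum_gradient = (1, 1, 1)             # gradient of p_{i-1} + p_i + p_{i+1}
kinetic_gradient = momenta                # gradient of (1/2) * sum of squares

print(apply_field(field, momentum_gradient))
print(apply_field(field, kinetic_gradient))
```

Since $Y_i$ annihilates both quantities, so does $Y_i^2$, and hence the noise term $C_n$ in (1.48) leaves total momentum and total kinetic energy unchanged.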

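We are interested in the following empirical processes,

\[ u_n(t,dx) = \frac{1}{n} \sum_{i=1}^n u_i(t)\,\delta_{i/n}(dx), \tag{1.51} \]
\[ p_n(t,dx) = \frac{1}{n} \sum_{i=1}^n p_i(t)\,\delta_{i/n}(dx), \tag{1.52} \]
\[ e_n(t,dx) = \frac{1}{n} \sum_{i=1}^n e_i(t)\,\delta_{i/n}(dx), \tag{1.53} \]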


where $\delta_{i/n}(dx)$ is the Dirac measure at site $i/n$ and

\[ e_i(t) = \frac{p_i(t)^2}{2} + \frac{n^2 k}{2}\big[ u_{i+1}(t) - u_i(t) \big]^2. \tag{1.54} \]

For each t, these are probability measures on the flat one-dimensional torus T = R/Z.

Remark 1.9.4. Note how different effects are scaled differently in $\mathcal{L}_n$. The first term is scaled by a factor 1, while the last two terms in (1.47) are sped up by a factor $n^2$. Hence $\mathcal{L}_n$ consists of both hyperbolic and diffusive scalings. The second term in (1.54) is also multiplied by $n^2$ to ensure that $u_i$, $p_i$, $e_i$ and $\frac{1}{n}\sum_{i=1}^n e_i$ are of order $O(1)$.

Let $\pi_n(t)$ be any of the three processes defined in (1.51)-(1.53). Let $\rho_0$ be a given probability measure on $\mathbb{T}$. A typical result on the hydrodynamic limit of $\pi_n$ consists in proving that, if $\pi_n(0)$ converges to $\rho_0$ in an appropriate topology, then for any $t > 0$, $\pi_n(t)$ converges to some probability measure $\rho(t)$. The hydrodynamic limit $\rho(t)$ often satisfies a certain partial differential equation that has $\rho_0$ as its initial profile.

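Conjecture 1.9.5. Let $P_n$ denote the distribution of the TVE process. Assume that for every continuous and bounded function $\varphi : \mathbb{T} \to \mathbb{R}$ and for every $\delta > 0$ we have

\[ \lim_{n\to\infty} P_n\Big( \Big| \langle \varphi, u_n(0) \rangle - \int_{\mathbb{T}} \varphi(x)u_0(x)\,dx \Big| > \delta \Big) = 0, \]
\[ \lim_{n\to\infty} P_n\Big( \Big| \langle \varphi, p_n(0) \rangle - \int_{\mathbb{T}} \varphi(x)p_0(x)\,dx \Big| > \delta \Big) = 0, \]
\[ \lim_{n\to\infty} P_n\Big( \Big| \langle \varphi, e_n(0) \rangle - \int_{\mathbb{T}} \varphi(x)e_0(x)\,dx \Big| > \delta \Big) = 0. \]

Then these limits also hold for any $t > 0$,

\[ \lim_{n\to\infty} P_n\Big( \Big| \langle \varphi, u_n(t) \rangle - \int_{\mathbb{T}} \varphi(x)u(t,x)\,dx \Big| > \delta \Big) = 0, \]
\[ \lim_{n\to\infty} P_n\Big( \Big| \langle \varphi, p_n(t) \rangle - \int_{\mathbb{T}} \varphi(x)p(t,x)\,dx \Big| > \delta \Big) = 0, \]
\[ \lim_{n\to\infty} P_n\Big( \Big| \langle \varphi, e_n(t) \rangle - \int_{\mathbb{T}} \varphi(x)e(t,x)\,dx \Big| > \delta \Big) = 0, \]

where $(u,p,e)$ solve the TVE system (1.46), up to constant scaling, with $\alpha = 0$,

\[ \begin{cases} \dot u = p, \\ \dot p = k u_{xx} + 6\mu p_{xx}, \\ \dot e = k\,\partial_x(p u_x) + \mu\,\partial_{xx}(\theta + 3p^2). \end{cases} \tag{1.55} \]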

In Chapter 6, we provide a formal argument that justifies why this conjecture is expected to be true.

1.10 Summary of Part I

We summarize how we have managed to answer the questions posed in Section 1.4.

Question 1: We construct approximation schemes for a generalized Kramers equation (Section 1.6 and Chapter 2). The cost functionals in the schemes are inspired by the rate functional in a small-noise large-deviation principle for a perturbed Hamiltonian system. We also suggest a variational formulation for a GENERIC system (Section 1.8 and Chapter 4), which will be used in the second part.

Question 2: We provide a microscopic interpretation of the JKO-scheme for the Fokker-Planck equation (Section 1.7 and Chapter 3). We show that the functional $K_h$ is asymptotically equivalent to the conditional rate functional $J_h$ in the large-deviation principle of the empirical process associated to a drift-diffusion particle system.

Question 3: We show that the GENERIC structure of the Vlasov-Fokker-Planck equation can be derived from the (path-wise) rate functional in the large-deviation principle of the empirical process associated to a collection of weakly interacting Brownian particles with inertia (Section 1.8 and Chapter 4).

We also attempt to generalize these results to the porous medium equation (Section 1.9.1 and Chapter 5) and the thermo-visco-elasticity system (Section 1.9.2 and Chapter 6).

Part II: Coarse-graining

1.11 General framework

In the second part of the thesis, we consider (1.1) as an $\varepsilon$-dependent equation for some small parameter $\varepsilon$. For instance, $\varepsilon$ can be the friction coefficient $\gamma$ in the Kramers equation or $1/N$ in the Ginzburg-Landau model. We are interested in deriving the limiting system as $\varepsilon \to 0$.

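We first describe a general framework¹⁰. Suppose that $\rho^\varepsilon : [0,T] \to \mathcal{P}(\mathcal{X})$ (with $\mathcal{X} := \mathbb{R}^N$) solves the $\varepsilon$-dependent problem

\[ (P^\varepsilon): \quad \begin{cases} \partial_t\rho^\varepsilon = \mathcal{L}_\varepsilon^*\,\rho^\varepsilon, \\ \rho^\varepsilon(0) = \rho^\varepsilon_0. \end{cases} \tag{1.56} \]

The aim is to derive an $\varepsilon$-independent problem $(P)$ that can be considered as an approximation (in a suitable sense) of $(P^\varepsilon)$ as $\varepsilon \to 0$,

\[ (P): \quad \begin{cases} \partial_t\rho = \mathcal{L}^*\rho, \\ \rho(0) = \rho_0. \end{cases} \tag{1.57} \]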

¹⁰ Since we will not work with the Vlasov-Fokker-Planck equation, $\psi \equiv 0$.

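Here $\rho : [0,T] \to \mathcal{P}(\mathcal{X}_0)$, where $\mathcal{X}_0$ is some Euclidean space.

Coarse-graining is a technique for this purpose. It consists of two steps. The first one is to transform the problem $(P^\varepsilon)$ into a coarse-grained problem $(\hat P^\varepsilon)$ defined on $\mathcal{P}(\mathcal{Y})$, where $\mathcal{Y}$ is some coarse-grained Euclidean space, via a coarse-graining map $\Pi_\varepsilon : \mathcal{X} \to \mathcal{Y}$. The coarse-grained space is often of lower dimension than the original space, and as a consequence the coarse-graining map is non-injective. The coarse-grained problem $(\hat P^\varepsilon)$ describes the evolution of the coarse-grained profile $\hat\rho^\varepsilon$, which is the push-forward of $\rho^\varepsilon$ under $\Pi_\varepsilon$, $\hat\rho^\varepsilon = (\Pi_\varepsilon)_\#\rho^\varepsilon : [0,T] \to \mathcal{P}(\mathcal{Y})$,

\[ (\hat P^\varepsilon): \quad \begin{cases} \partial_t\hat\rho^\varepsilon = \hat{\mathcal{L}}_\varepsilon^*(\rho^\varepsilon)\,\hat\rho^\varepsilon, \\ \hat\rho^\varepsilon(0) = \hat\rho^\varepsilon_0. \end{cases} \tag{1.58} \]

Note that the coarse-grained generator $\hat{\mathcal{L}}_\varepsilon^*(\rho^\varepsilon)$ depends on $\rho^\varepsilon$; therefore it is not Markovian in general.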

The second step is to derive $(P)$ from the coarse-grained problem $(\hat P^\varepsilon)$. The success of the technique relies on whether one can define an appropriate coarse-grained problem. Usually one also has to rescale the temporal and/or the spatial variables appropriately, depending on the effects that one wishes to observe.

1.12 Main questions of Part II

In the first part, we have shown that generalized gradient flow structures of the main equation (1.1) (for fixed $\varepsilon$) arise as characterizations of the large-deviation behaviour of the empirical process (1.4) associated to the stochastic particle systems (1.3): $\rho^\varepsilon$ is a solution of (1.1) if and only if $I^\varepsilon(\rho^\varepsilon) = 0$. It is then natural to ask whether we can use this characterization to pass to the limit $\varepsilon \to 0$. Since it is known that many partial differential equations can be derived from a large-deviation principle, we prefer to formulate the question in a general form.

Question 4: How do we use the rate functional to pass to the limit ε→ 0?

It turns out that the rate functional is very useful for passing to the limit thanks to its variational representation (as a supremum (1.24)). We will discuss this in Section 1.13.

We have asked whether we can use the large-deviation principle to study coarse-graining. Since deriving the hydrodynamic limit is consistent with the general framework at the beginning of this part, it is also interesting to ask the following question.

Question 5: Can we use coarse-graining to study the hydrodynamic limit?

A typical result of convergence to the hydrodynamic limit consists in proving that, under a suitable time-space scaling and suitable initial conditions, a random system with a large number of particles behaves deterministically, namely as the solution of a partial differential equation. The above question is motivated by [GOVW09], in which the authors introduced a new quantitative method (the two-scale method) to study hydrodynamic limits for reversible dynamics of the class (C1),

(PN) : ∂tρ = div(ρA∇H + A∇ρ) ∈ P(RN).

The main idea of the method is the same as the coarse-graining procedure discussed above. It also consists of two steps: $(P_N) \to (P) \to (P_\infty)$. Coarse-graining is accomplished by dividing the microscopic state into smaller blocks and taking the average of each block. Together with a logarithmic Sobolev inequality argument, the two-scale approach obtains not only the hydrodynamic equation but also an explicit estimate for the deviation from it.

In Chapter 8, we extend the two-scale method to the case of non-reversible dynamics, i.e., to equations in the class (C2). We will discuss this in more detail in Section 1.14.

1.13 Qualitative coarse-graining from large-deviation principle

In this section, we will demonstrate how the rate functional can be used for coarse-graining. We explicitly include the superscript $\varepsilon$ whenever it is necessary to indicate the dependence on $\varepsilon$.

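As has been shown in the first part of the thesis, for fixed $\varepsilon$ the equation $(P^\varepsilon)$ can be derived from the rate functional of the large-deviation principle of the empirical process of an underlying particle system $X_i^\varepsilon$ that satisfies (1.3) (after rescaling). More precisely,

\[ \rho^\varepsilon \text{ is a solution to } (P^\varepsilon) \text{ iff } I^\varepsilon(\rho^\varepsilon) = 0, \]

where the rate functional $I^\varepsilon(\rho^\varepsilon)$ is given by (see (1.24))

\[ I^\varepsilon(\rho^\varepsilon) = \sup_{f \in C_c^\infty(\mathbb{R}\times\mathcal{X})} G^\varepsilon(\rho^\varepsilon, f). \tag{1.59} \]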

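The functional $G^\varepsilon(\rho^\varepsilon, f)$ has the following form,

\[ G^\varepsilon(\rho^\varepsilon, f) = \int_{\mathcal{X}} \big[ f_T\,d\rho^\varepsilon_T - f_0\,d\rho^\varepsilon_0 \big] - \int_0^T\!\!\int_{\mathcal{X}} \big[ (\partial_t + \mathcal{L}_\varepsilon) f_t \big]\,d\rho^\varepsilon_t\,dt - \frac{1}{2} \int_0^T\!\!\int_{\mathcal{X}} A\nabla f_t \cdot \nabla f_t\,d\rho^\varepsilon_t\,dt, \]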

where $A$ is the diffusion matrix. In order to study the asymptotic behavior of $\rho^\varepsilon$, we study Gamma-convergence of the functional $I^\varepsilon$ instead. If one is only interested in convergence of the solutions, one only needs to prove the liminf inequality in the Gamma-convergence, provided that the limiting functional is non-negative. In Chapter 7, we introduce a new method for coarse-graining using the rate functional. The core idea of our method can be summarized in the following four steps.

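Step 1. Choose a special class of test functions. By taking $f = g \circ \Pi_\varepsilon$, where $g \in C_c^\infty(\mathbb{R}\times\mathcal{Y})$, we obtain

\[ I^\varepsilon(\rho^\varepsilon) \ge \sup_{g \in C_c^\infty(\mathbb{R}\times\mathcal{Y})} G^\varepsilon(\rho^\varepsilon, g \circ \Pi_\varepsilon). \tag{1.60} \]

Note that $g \circ \Pi_\varepsilon$ may not have compact support. Therefore, some approximation argument may be required to ensure that $g \circ \Pi_\varepsilon$ is admissible.

Step 2. Compactness property for $\rho^\varepsilon$ and $\hat\rho^\varepsilon$. In this step, one needs to prove that $\rho^\varepsilon$ and $\hat\rho^\varepsilon$ possess appropriate compactness properties. Assume that $\rho^\varepsilon \xrightarrow{\sigma} \rho$ and $\hat\rho^\varepsilon \xrightarrow{\hat\sigma} \hat\rho$, where $\sigma$ and $\hat\sigma$ denote appropriate topologies.

Step 3. Prove that, up to an $o(1)$ term, $G^\varepsilon(\rho^\varepsilon, g \circ \Pi_\varepsilon)$ depends only on $g$ and the coarse-grained variable $\hat\rho^\varepsilon$. We denote by $\hat G^\varepsilon(\hat\rho^\varepsilon, g)$ the dominating term in $G^\varepsilon(\rho^\varepsilon, g \circ \Pi_\varepsilon)$. In addition, suppose that we can pass to the limit, with respect to the topology $\hat\sigma$, in the functional $\hat G^\varepsilon(\hat\rho^\varepsilon, g)$ for any fixed $g$. If these assumptions hold, we may define

\[ G(\hat\rho, g) := \lim_{\varepsilon\to 0} \hat G^\varepsilon(\hat\rho^\varepsilon, g) \quad \text{for fixed } g, \tag{1.61} \]

and also

\[ I(\hat\rho) := \sup_{g} G(\hat\rho, g). \tag{1.62} \]

Step 4. Derive the limiting system as the law $\hat\rho \in C([0,T],\mathcal{P}(\mathcal{X}_0))$ that uniquely satisfies $I(\hat\rho) = 0$.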

We assume that the rate functional is uniformly bounded, $\sup_{\varepsilon>0} I^\varepsilon(\rho^\varepsilon) < \infty$, and that the initial data is well-prepared, for instance $\sup_{\varepsilon>0}\int H\rho^\varepsilon_0 < \infty$. All the steps then follow from these assumptions and the variational formulation of the rate functional. The first step is obvious. A uniform bound on the rate functional often provides estimates on the (relative) entropy and the (relative) Fisher information, which in turn offer appropriate compactness and regularity of the sequences $\rho^\varepsilon$ and $\hat\rho^\varepsilon$. Step 3 essentially requires a proof of local equilibrium: the loss of information in passing from $\rho^\varepsilon$ to $\hat\rho^\varepsilon$ can be deduced from the push-forward, at least approximately in the limit $\varepsilon\to 0$. This property can be verified easily for some parts of $G^\varepsilon$. For instance, the first term of $G^\varepsilon$ gives

\[ \int_X \big[f_T\,d\rho^\varepsilon_T - f_0\,d\rho^\varepsilon_0\big] = \int_X \big[g_T\circ\Pi^\varepsilon\,d\rho^\varepsilon_T - g_0\circ\Pi^\varepsilon\,d\rho^\varepsilon_0\big] = \int_Y \big[g_T\,d\hat\rho^\varepsilon_T - g_0\,d\hat\rho^\varepsilon_0\big]. \]

However, this is not straightforward for the other parts. In the last step, the equation for the law $\hat\rho$ uniquely satisfying $\hat I(\hat\rho) = 0$ can often be shown to be equivalent to a weak-solution concept for a partial differential equation.

In Chapter 7, we apply this method to derive two limiting evolutions: the overdamped (high friction) limit of the Kramers equation and the small-noise limit of a perturbed Hamiltonian system.¹¹ For the former case, we obtain the Fokker-Planck equation as the limiting equation. The latter gives a diffusion on a graph.

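We now describe the high friction limit of the Kramers equation. Consider the Kramers equation in its full form as in (1.7) (here we set θ = 1 and rescale time $t \mapsto t/\gamma$),

\[ (P^\gamma):\quad \partial_t\rho^\gamma = \gamma\Big[-\operatorname{div}_q\Big(\rho^\gamma\frac{p}{m}\Big) + \operatorname{div}_p\big(\rho^\gamma\nabla_q V\big)\Big] + \gamma^2\Big[\operatorname{div}_p\Big(\rho^\gamma\frac{p}{m}\Big) + \Delta_p\rho^\gamma\Big]. \]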

We are interested in the asymptotic behavior of this equation as $\gamma\to+\infty$. This overdamped limit was first derived formally in [Kra40] and has been extensively studied in the literature from different points of view, such as asymptotic expansions or probabilistic methods; see for instance [Nel67, Wil76, GPK12] and references therein. We reprove this result to illustrate our method. The coarse-graining map is given by $\Pi^\gamma(q,p) := q + \frac{1}{\gamma}p$. The local equilibrium statement requires us to prove that the p-marginal of the limiting measure is the Maxwell distribution. The compactness property follows directly from the assumption that $\sup_\varepsilon I^\varepsilon(\rho^\varepsilon) < \infty$ together with certain conditions on the initial data.

Conjecture 1.13.1. The limiting system is the Fokker-Planck equation,

(P) : ∂tρ(t, q) = div(ρ(t, q)∇V (q)) + ∆ρ(t, q).

In Chapter 7, we provide a formal argument to support this conjecture.

We now switch to the small-noise limit. We consider the stochastically perturbed Hamiltonian system (1.25), where θ and m are set to 1, ε := γ, and time is rescaled $t\mapsto t/\varepsilon$,

\[ dQ^\varepsilon = \frac{1}{\varepsilon}P^\varepsilon\,dt, \tag{1.63a} \]
\[ dP^\varepsilon = -\frac{1}{\varepsilon}\nabla V(Q^\varepsilon)\,dt + \sqrt{2}\,dW. \tag{1.63b} \]

Without the noise, this is the classical deterministic Hamiltonian system with the Hamiltonian $H(q,p) = \frac12 p^2 + V(q)$. This dynamics preserves H, and solutions are restricted to the level sets of H. When the noise is present, the Hamiltonian is no longer preserved and the solutions behave stochastically. The probability density $\rho^\varepsilon$ of $(Q^\varepsilon, P^\varepsilon)$ satisfies the following equation,

\[ (P^\varepsilon):\quad \partial_t\rho^\varepsilon = -\frac{1}{\varepsilon}\operatorname{div}\big(\rho^\varepsilon J\nabla H\big) + \Delta_p\rho^\varepsilon. \]

The asymptotic behavior of this equation as ε ↓ 0 was first studied by Freidlin and Wentzell [FW94, FW98b]. The authors showed that the limiting system can be described as a diffusion on a graph: over O(ε) time the solution follows level sets of H, while at the O(1) time scale it performs a biased Brownian motion between level sets.

¹¹ Chapter 7 is work in progress with Agnes Lamacz, Mark Peletier and Upanshu Sharma [DLPS14]. A short announcement has appeared as an Oberwolfach report [DPS13].


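In Chapter 7, we re-prove this result as another illustration of our method. The associated rate functional is as follows (see (1.28)),

\[ I^\varepsilon(\rho^\varepsilon) = \sup_{f\in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d})} \Big\{ \int_{\mathbb{R}^{2d}} \big[f_T\,d\rho^\varepsilon_T - f_0\,d\rho^\varepsilon_0\big] - \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big[\partial_t f_t + \frac{1}{\varepsilon}J\nabla H\cdot\nabla f_t + \Delta_p f_t\Big]\,d\rho^\varepsilon_t\,dt - \frac12\int_0^T\!\!\int_{\mathbb{R}^{2d}} |\nabla_p f_t|^2\,d\rho^\varepsilon_t\,dt \Big\}. \]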

The coarse-graining map Π maps $\mathbb{R}^{2d}$ to the graph Γ consisting of equivalence classes of level sets of H, under the equivalence relation of belonging to the same connected component of the level sets of H. The local equilibrium statement requires us to prove that $\rho^\varepsilon$ is constant on each level set of H. The compactness property also follows directly from the assumption that $\sup_\varepsilon I^\varepsilon(\rho^\varepsilon) < \infty$ together with certain conditions on the initial data.

Conjecture 1.13.2. The limiting system is given by

(P) : ∂tρ(t, h) = ∂h(a(h)∂hρ(t, h))− ∂h(b(h)ρ(t, h)),

where h denotes the height of the level set; a, b are determined in terms of H. This equation is complemented by gluing conditions at interior vertices of the graph.

In Chapter 7, we prove this conjecture rigorously for the case of one degree of freedom (d = 1) and a single-well potential (i.e. V is strictly convex). In addition, we provide a formal argument to support this conjecture in the general case of many degrees of freedom and multi-well potentials.

1.14 Quantitative rate of convergence to the hydrodynamic limits

In the preceding section, we have shown how to derive the limiting equation (P) from the ε-dependent equation (Pε) using coarse-graining. In [GOVW09], the authors introduced a two-scale approach to the hydrodynamic limits for reversible dynamics. The main idea of the two-scale method is based on a coarse-graining argument and a logarithmic Sobolev inequality. The combination of the two enables the authors to obtain a quantitative rate of convergence to the hydrodynamic limits.

In Chapter 8, we extend the result in [GOVW09] to the non-reversible case,¹²

\[ (P_N):\quad \partial_t(f\mu) = \operatorname{div}\big[\mu(A+J)\nabla f\big], \tag{1.64} \]

where $\mu(dx) := Z^{-1}\exp(-H(x))\,dx = Z^{-1}\exp\big(-\sum_{i=1}^N \phi(x_i)\big)\,dx$. Here both $\mu$ and $f\mu$ are probability measures on $X := \mathbb{R}^N$ and the unknown $f(t,x)$ is the microscopic density of $f\mu$ with respect to $\mu$.

¹² Chapter 8 is joint work with Max Fathi and has been submitted for publication [DF14].


The reversible dynamics studied in [GOVW09] correspond to the case J ≡ 0. Note that A and J depend on N. The aim of Chapter 8 is to obtain a quantitative rate of convergence to the hydrodynamic limit for this system as N → ∞. This will be done in two steps: first, we project the system onto a smaller (coarse-grained) space, $Y := \mathbb{R}^M$ for some M; then we send both N and M to infinity appropriately. The coarse-graining map $P: X\to Y$ is a linear operator satisfying

\[ N P P^t = \mathrm{id}_Y, \tag{1.65} \]

where $P^t$ is the adjoint operator of $P$. In applications, the microscopic space is divided into many blocks, each of them of dimension M, and P is the average over each block. It induces a decomposition of the invariant measure into a macroscopic component and a fluctuation component. Let $\bar\mu(dy) = P_\#\mu$ be the push-forward of $\mu$ under the operator P and let $\mu(dx|y)$ be the conditional measure of $\mu$ given $Px = y$. For each y, $\mu(dx|y)$ is a probability measure on X satisfying, for any test function $\varphi$,

\[ \int_X \varphi(x)\,d\mu(x) = \int_Y\Big(\int_{Px=y}\varphi(x)\,\mu(dx|y)\Big)\bar\mu(dy). \tag{1.66} \]

Applying the technique in [GOVW09], we show that under certain conditions, the coarse-grained variable $y = Px$, with law given by $\bar f(t,y) = \int_{Px=y} f(t,x)\,\mu(dx|y)$, is close to the solution of the following differential equation

\[ (P_{cg}):\quad \frac{d\eta}{dt} = -(\bar A + \bar J)\nabla\bar H(\eta(t)). \tag{1.67} \]

In this equation, $\bar A$ is a symmetric, positive definite operator and $\bar J$ is another operator on Y, defined by

\[ \bar A^{-1} = P A^{-1} N P^t, \qquad \bar J = \bar A P A^{-1} N J P^t, \]

and $\bar H: Y\to\mathbb{R}$ is the macroscopic Hamiltonian that satisfies

\[ \bar\mu(dy) = \exp(-N\bar H(y))\,dy. \]

Next, by sending N and M to infinity, we obtain a nonlinear drift-diffusion equation as the hydrodynamic limit. The main result of Chapter 8 can be summarized as follows.

Theorem 1.14.1. Let $\mu(dx) = Z^{-1}\exp(-H(x))\,dx$ be a probability measure on X, and let $P: X\to Y$ satisfy (1.65). Let $A: X\to X$ be a symmetric, positive definite operator, and let $f(t,x)$ and $\eta(t)$ be the solutions of (1.64) and (1.67), with initial data $f(0,\cdot)$ and $\eta_0$ respectively. Define

\[ \Theta(t) := \frac{1}{2N}\int_X \big(x - NP^t\eta(t)\big)\cdot A^{-1}\big(x - NP^t\eta(t)\big)\,f(t,x)\,\mu(dx). \]

Under certain assumptions, for any T > 0, we have

\[ \max\Big\{ \sup_{0\le t\le T}\Theta(t),\ \frac{\lambda}{8}\int_0^T\Big(\int_Y |y-\eta(t)|_Y^2\,\bar f(t,y)\,\bar\mu(dy)\Big)dt \Big\} \le e^{\frac{8c\Lambda^2}{\lambda}T}\big[\Theta(0) + E(T,M,N)\big], \]

where $E(T,M,N)\to 0$ as $N\uparrow\infty$, $M\uparrow\infty$, $N/M\uparrow\infty$.

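In the specific example where A and J are given as in Example 4 in Section 1.1.2, the hydrodynamic equation is a nonlinear drift-diffusion equation

\[ (P):\quad \begin{cases} \dfrac{\partial\zeta}{\partial t} = \dfrac{\partial^2}{\partial\theta^2}\varphi'(\zeta) + \dfrac{\partial}{\partial\theta}\varphi'(\zeta), \\[1ex] \zeta(0,\cdot) = \zeta_0, \end{cases} \tag{1.68} \]

where $\varphi$ is the Cramér transform of $\phi$, i.e.

\[ \varphi(m) = \sup_{\sigma\in\mathbb{R}}\Big\{ \sigma m - \log\int_{\mathbb{R}}\exp\big(\sigma x - \phi(x)\big)\,dx \Big\}. \tag{1.69} \]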
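To make the Cramér transform (1.69) concrete, here is a small numerical sketch. It is an illustration under stated assumptions rather than part of the thesis: the single-site potential is taken to be φ(x) = x²/2, the integral is approximated by the midpoint rule on a truncated interval, and the supremum is located by ternary search (the objective is concave in σ because the log-partition function is convex). The function names `log_partition` and `cramer_transform` are hypothetical.

```python
import math


def log_partition(sigma, phi, lo=-12.0, hi=12.0, n=4000):
    """log of the integral of exp(sigma*x - phi(x)) over [lo, hi] (midpoint rule)."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += math.exp(sigma * x - phi(x))
    return math.log(total * dx)


def cramer_transform(m, phi, s_lo=-5.0, s_hi=5.0, iters=60):
    """sup over sigma in [s_lo, s_hi] of sigma*m - log_partition(sigma)."""
    def objective(s):
        return s * m - log_partition(s, phi)

    # Ternary search for the maximiser of a concave function.
    for _ in range(iters):
        s1 = s_lo + (s_hi - s_lo) / 3.0
        s2 = s_hi - (s_hi - s_lo) / 3.0
        if objective(s1) < objective(s2):
            s_lo = s1
        else:
            s_hi = s2
    return objective(0.5 * (s_lo + s_hi))


def quadratic_phi(x):
    # Assumed single-site potential, used only for this example.
    return 0.5 * x * x


m = 0.7
value = cramer_transform(m, quadratic_phi)
# For the quadratic potential the transform is known in closed form:
# the log-partition function is sigma^2/2 + log(2*pi)/2.
exact = 0.5 * m * m - 0.5 * math.log(2.0 * math.pi)
```

For this quadratic choice the transform is again quadratic up to an additive constant, so φ′ in (1.68) is linear and the hydrodynamic equation reduces to a linear drift-diffusion equation; a non-quadratic φ would give a genuinely nonlinear one.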

This theorem will be proved in Chapter 8. Moreover, we also show extra properties of solutions of (1.64).

1.15 Summary of Part II

We briefly summarize how we have answered the two questions posed in Section 1.12.

Question 4: We introduce a new method for coarse-graining using the rate functional from the large-deviation principle (Section 1.13 and Chapter 7). We illustrate our method for two limiting systems: the overdamped (high friction) limit of the Kramers equation and the small-noise limit of a perturbed Hamiltonian system. We recover known results in the literature.

Question 5: We extend the two-scale approach to the hydrodynamic limits to non-reversible dynamics (Section 1.14 and Chapter 8). We obtain a quantitative rate of convergence to the hydrodynamic limit for non-reversible dynamics.

Part I

Generalized gradient flows and large-deviation principle

Chapter 2

Approximation schemes for a generalized Kramers equation

In this chapter, we propose three new discrete variational schemes that capture the conservative-dissipative structure of a generalized Kramers equation. The first two schemes are single-step minimization schemes while the third one combines a streaming and a minimization step. The cost functionals in the schemes are inspired by the rate functional in the Freidlin-Wentzell theory of large deviations for the underlying stochastic system. We prove that all three schemes converge to the solution of the generalized Kramers equation.¹

2.1 Introduction

2.1.1 The Kramers equation

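In this chapter we discuss the variational structure of a generalized Kramers equation (1.7),

\[ \partial_t\rho = -\operatorname{div}_q\Big(\rho\frac{p}{m}\Big) + \operatorname{div}_p\big(\rho\nabla_q V\big) + \gamma\operatorname{div}_p\big(\rho\nabla_p F\big) + \gamma kT\,\Delta_p\rho, \quad\text{in } \mathbb{R}^{2d}\times\mathbb{R}^+, \tag{2.1} \]

which is the Fokker-Planck or forward Kolmogorov equation of the stochastic differential equation (1.8)

\[ dQ(t) = \frac{P(t)}{m}\,dt, \tag{2.2a} \]
\[ dP(t) = -\nabla V(Q(t))\,dt - \gamma\nabla F(P(t))\,dt + \sqrt{2\gamma kT}\,dW(t). \tag{2.2b} \]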

The theory of such variational structures took off with the introduction of Wasserstein gradient flows by [JKO97, JKO98] and of the energetic approach to rate-independent processes [MTL02, Mie05]. Both have changed the theory of evolution equations in many ways.

¹ This chapter is joint work with Mark A. Peletier and Johannes Zimmer [DPZ13a].


If a given evolution equation has such a variational structure, then this property gives strong restrictions on the type of behaviour of such a system, provides general methods for proving well-posedness [AGS08] and characterizing large-time behaviour (e.g., [CMV03]), gives rise to natural numerical discretizations (e.g., [DMM10]), and creates handles for the analysis of singular limits (e.g., [SS04, Ste08, AMP+12]). Because of this wide range of tools, the study of variational structure has important consequences for the analysis of an evolution equation.

Remark 2.1.1. A brief word about dimensions. We make the unusual choice of preserving the dimensional form of the equations, because the explicit constants help in identifying the modelling origin and roles of the different terms and effects, and these aspects are central to this chapter. Therefore Q and q are expressed in m, P and p in kg m/s, m in kg, V, F, and kT in J, and γ in kg/s. The density ρ has dimensions such that $\int\rho$ is dimensionless. This setup implies that the Wiener process has dimension $\sqrt{\mathrm{s}}$, in accordance with the formal property $dW^2 = dt$.

2.1.2 Variational evolution

To avoid confusion between the Boltzmann constant and the integer k, from now on we define $\beta^{-1} := kT$. The authors of [JKO98] studied an equation that can be seen as a simpler, spatially homogeneous case of (2.1), where $\rho = \rho(t,p)$:

\[ \partial_t\rho = \gamma\beta^{-1}\Delta_p\rho + \gamma\operatorname{div}_p\big(\rho\nabla_p F\big). \tag{2.3} \]

They showed that this equation is a gradient flow of the free energy

\[ \mathcal{A}_p(\rho) := \int_{\mathbb{R}^d}\big[\rho F + \beta^{-1}\rho\log\rho\big]\,dp \]

with respect to the Wasserstein metric $W_2$ defined in (1.17), in the sense that the solution $t\mapsto\rho(t,p)$ can be approximated by the time-discrete sequence $\rho_k$ defined recursively by

\[ \rho_k \in \operatorname*{argmin}_\rho K_h(\rho,\rho_{k-1}), \qquad K_h(\rho,\rho_{k-1}) := \frac{1}{2h}\frac{1}{\gamma}W_2(\rho,\rho_{k-1})^2 + \mathcal{A}_p(\rho). \tag{2.4} \]

A consequence of this gradient-flow structure is that $\mathcal{A}_p$ decreases along solutions of (2.3).
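The recursion (2.4) can be made explicit in a simple special case. The sketch below is an illustration under assumptions that are not in the text: d = 1, a quadratic friction potential F(p) = a p²/2, and the minimization restricted to Gaussian densities with mean μ and standard deviation s. For one-dimensional Gaussians the squared Wasserstein distance is (μ − μ₀)² + (s − s₀)², and the free energy is a(μ² + s²)/2 − β⁻¹ log s up to a constant, so each step reduces to two scalar first-order conditions. The names `jko_step_gaussian` and `run_scheme` are hypothetical.

```python
import math


def jko_step_gaussian(mu0, s0, h, gamma, beta_inv, a):
    """One step of (2.4) restricted to 1D Gaussians, with assumed F(p) = a*p^2/2.

    Minimises ((mu-mu0)^2 + (s-s0)^2) / (2*h*gamma)
              + a*(mu^2 + s^2)/2 - beta_inv*log(s)   over mu and s > 0.
    """
    k = 1.0 + h * gamma * a
    mu = mu0 / k
    # Positive root of k*s^2 - s0*s - h*gamma*beta_inv = 0.
    s = (s0 + math.sqrt(s0 * s0 + 4.0 * k * h * gamma * beta_inv)) / (2.0 * k)
    return mu, s


def run_scheme(mu, s, h, gamma, beta_inv, a, n_steps):
    for _ in range(n_steps):
        mu, s = jko_step_gaussian(mu, s, h, gamma, beta_inv, a)
    return mu, s


mu_inf, s_inf = run_scheme(mu=2.0, s=0.3, h=0.01, gamma=1.0,
                           beta_inv=0.5, a=2.0, n_steps=5000)
```

In this restricted setting the mean update is an implicit Euler step for dμ/dt = −γaμ, and the iteration settles at the variance β⁻¹/a of the stationary Gaussian of (2.3), independently of the step size h.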

Unfortunately, a convincing generalization of this gradient-flow concept and corresponding theory to equations such as the Kramers equation is still lacking. This is related to the mixture of both dissipative and conservative effects in these equations, which we now explain.

2.1.3 A combination of conservative and dissipative effects

The full Kramers equation (2.1) is a mixture of the dissipative behaviour described by (2.3) and a Hamiltonian, conservative behaviour. The conservative behaviour can be recognized by setting γ = 0, thus discarding the last two terms in (2.2); what remains in (2.2) is a deterministic Hamiltonian system with Hamiltonian energy $H(q,p) = p^2/2m + V(q)$. The evolution of this system is reversible and conserves H. Correspondingly, the evolution of (2.1) with γ = 0 also is reversible and conserves the expectation of H,

\[ \mathcal{H}(\rho) := \int_{\mathbb{R}^{2d}}\rho(q,p)H(q,p)\,dqdp. \]

On the other hand, as suggested by the discussion in the previous section, the γ-dependent terms represent dissipative effects. In the variational schemes that we define below, a central role is played by the (q, p)-dependent analogue of $\mathcal{A}_p$,

\[ \mathcal{A}(\rho) := \int_{\mathbb{R}^{2d}}\big[\rho(q,p)F(p) + \beta^{-1}\rho(q,p)\log\rho(q,p)\big]\,dqdp. \]

Because of the special structure of (2.1), the functional $\mathcal{A}$ does not decrease along solutions, but in the particular case $F(p) := p^2/2m$, a 'total free energy' functional does: setting

\[ \mathcal{E}(\rho) := \mathcal{A}(\rho) + \int\rho V\,dqdp = \int\big[H + \beta^{-1}\log\rho\big]\rho\,dqdp, \]

we calculate that

\[ \partial_t\mathcal{E}(\rho(t)) = -\gamma\int_{\mathbb{R}^{2d}}\frac{1}{\rho(t,q,p)}\Big|\rho(t,q,p)\frac{p}{m} + \beta^{-1}\nabla_p\rho(t,q,p)\Big|^2\,dqdp \le 0. \tag{2.5} \]

The choice $F(p) = p^2/2m$ is related to the fluctuation-dissipation theorem, and we comment on this in Section 2.7.

Because of the conservative, Hamiltonian terms, equation (2.1) is not a gradient flow, and an approach such as [JKO98] is not possible. In 2000 Huang [Hua00] proposed a variational scheme that is inspired by [JKO98], but modified to account for the conservative effects, and in this chapter we describe three more variational schemes for the same equation.

2.1.4 Huang’s discrete schemes for the Kramers equation

The time-discrete variational schemes of Huang and of this chapter are best understood through the connection between gradient flows on one hand and large deviations on the other. We have recently shown this connection for a number of systems [ADPZ11, PRV13, DLZ12, DLR13], including (2.3).

The philosophy can be formulated in a number of ways, and here we choose a perspective based on the behaviour of a single particle. We start with the simpler case of equation (2.3) and the discrete approximation (2.4). Let $\{X^\varepsilon\}_{\varepsilon>0}$ be a rescaled d-dimensional Wiener process,

\[ dX^\varepsilon(t) = \sqrt{2\sigma\varepsilon}\,dW(t), \tag{2.6} \]

where σ is a mobility coefficient. If we fix h > 0, then by Schilder's theorem (e.g. [DZ87, Th. 5.2.3]), the process $\{X^\varepsilon(t) : t\in[0,h]\}$ satisfies a large-deviation principle

\[ \operatorname{Prob}\big(X^\varepsilon(\cdot)\approx\xi(\cdot)\big) \sim \exp\Big[-\frac{1}{\varepsilon}I(\xi)\Big], \quad\text{as } \varepsilon\to 0, \]

where the rate functional $I: C([0,h];\mathbb{R}^d)\to\mathbb{R}\cup\{+\infty\}$ is given by

\[ I(\xi) = \frac{1}{4\sigma}\int_0^h\big|\dot\xi(t)\big|^2\,dt. \]

The Wasserstein cost function $|x-y|^2$ can be written in terms of I as

\[ |x-y|^2 = 4h\sigma\,\inf\big\{ I(\xi) : \xi\in C^1([0,h],\mathbb{R}^d) \text{ such that } \xi(0)=x,\ \xi(h)=y \big\}. \tag{2.7} \]

Hence the cost $|x-y|^2$ can be interpreted as the probability that a Brownian particle goes from x to y in time h, in the sense of large deviations, and rescaled so as to be independent of the magnitude of the noise σ.
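The identity (2.7) can be checked on discretized paths. The following sketch is an illustration only (d = 1, piecewise-linear paths, hypothetical helper names): it evaluates the rate functional for the straight line from x to y, which attains the infimum, and for a perturbed path with the same endpoints, which has a strictly larger action.

```python
import math


def rate_functional(path, h, sigma):
    """Discretised I(xi) = (1/(4*sigma)) * integral of |xi'(t)|^2 over [0, h]."""
    n = len(path) - 1
    dt = h / n
    action = sum(((path[i + 1] - path[i]) / dt) ** 2 for i in range(n)) * dt
    return action / (4.0 * sigma)


def straight_path(x, y, n):
    return [x + (y - x) * i / n for i in range(n + 1)]


def bumped_path(x, y, n, amplitude):
    # Same endpoints as the straight path, plus a sine bump in between.
    return [x + (y - x) * i / n + amplitude * math.sin(math.pi * i / n)
            for i in range(n + 1)]


x, y, h, sigma, n = 0.5, 2.0, 0.2, 0.7, 200
cost_straight = 4.0 * h * sigma * rate_functional(straight_path(x, y, n), h, sigma)
cost_bumped = 4.0 * h * sigma * rate_functional(bumped_path(x, y, n, 0.1), h, sigma)
```

The factor 4hσ removes the dependence on both the noise magnitude and the time horizon, which is why the resulting cost depends only on the endpoints.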

The results of [ADPZ11, PRV13, DLR13] concern a similar large-deviation analysis, but now for the empirical measure of a large number n of particles. For this system the limit n → ∞ plays a role similar to ε → 0 in the example above. In [ADPZ11, PRV13, DLR13], it is shown that this rate functional is very similar to the right-hand side of (2.4) in the limit h → 0. This result explains the strong connection between large deviations on one hand and the gradient-flow structure on the other.

However, the core of the argument of [ADPZ11, PRV13, DLR13] is contained in the Schilder example (2.6) and its connection (2.7) to the Wasserstein cost. Hence we use this simpler point of view to generalize the approximation scheme (2.4) to the Kramers equation. There are two different ways of doing this.

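Approach 1 [Hua00]. Instead of the inertia-less Brownian particle given by (2.6), we consider a particle with inertia satisfying

\[ dQ^\varepsilon(t) = \frac{P^\varepsilon(t)}{m}\,dt, \tag{2.8a} \]
\[ dP^\varepsilon(t) = \sqrt{2\varepsilon\gamma\beta^{-1}}\,dW(t), \tag{2.8b} \]

which can formally also be written as

\[ m\frac{d^2}{dt^2}Q^\varepsilon(t) = \sqrt{2\gamma\beta^{-1}\varepsilon}\,\frac{dW}{dt}(t). \]

By the Freidlin-Wentzell theorem (e.g. [DZ87, Th. 5.6.3]), the process $Q^\varepsilon(t)$ satisfies a similar large-deviation principle with rate functional $I: C([0,h],\mathbb{R}^d)\to\mathbb{R}\cup\{+\infty\}$ given by

\[ I(\xi) = \frac{1}{4\gamma\beta^{-1}}\int_0^h\big|m\ddot\xi(t)\big|^2\,dt. \]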


The comparison with (2.7) suggests defining a cost functional $\widetilde C_h(q,p;q',p')$ in a similar way, i.e.,

\[ \widetilde C_h(q,p;q',p') := h\inf\Big\{ \int_0^h\big|m\ddot\xi(t)\big|^2\,dt : \xi\in C^1([0,h],\mathbb{R}^d) \text{ such that } (\xi,m\dot\xi)(0)=(q,p),\ (\xi,m\dot\xi)(h)=(q',p') \Big\} = |p'-p|^2 + 12\Big|\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big|^2. \tag{2.9} \]

The second formula follows from an explicit calculation of the minimizer. As above, the interpretation is that of the probabilistic 'cost', that is, the large-deviations characterization of the probability of a path of (2.8) connecting (q, p) to (q′, p′) over time h. Note that $\widetilde C_h$ is not a metric, since it is not symmetric, and also $\widetilde C_h(q,p;q,p) = 12|p|^2$ generally does not vanish. Therefore the Wasserstein 'distance' $\widetilde W_h$ defined with $\widetilde C_h$ as cost is not a metric, but only an optimal-transport cost (see [Vil03] for an exposition on the theory of optimal transportation).
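The closed form in (2.9) can be verified directly from the minimizing path. The sketch below works in d = 1 and uses the cubic minimizer whose coefficients are stated in (2.37) later in this chapter; the function names are hypothetical and the numerical values are arbitrary test data.

```python
import math


def cubic_coefficients(q, p, q2, p2, m, h):
    """Coefficients a, b, c of xi(t) = q + a*t + b*t^2 + c*t^3 from (2.37)."""
    a = p / m
    b = 3.0 / h ** 2 * (q2 - q - p * h / m) - (p2 - p) / (m * h)
    c = (p2 + p) / (m * h ** 2) - 2.0 / h ** 3 * (q2 - q)
    return a, b, c


def cost_from_path(q, p, q2, p2, m, h):
    """h times the integral of |m * xi''(t)|^2 over [0, h] along the cubic."""
    _, b, c = cubic_coefficients(q, p, q2, p2, m, h)
    # xi''(t) = 2b + 6ct, so the integral of its square is a polynomial in h.
    integral = m * m * (4.0 * b * b * h + 12.0 * b * c * h ** 2
                        + 12.0 * c * c * h ** 3)
    return h * integral


def cost_closed_form(q, p, q2, p2, m, h):
    """Right-hand side of (2.9)."""
    return (p2 - p) ** 2 + 12.0 * (m / h * (q2 - q) - 0.5 * (p2 + p)) ** 2


args = dict(q=0.3, p=-1.2, q2=1.1, p2=0.4, m=2.0, h=0.1)
via_path = cost_from_path(**args)
via_formula = cost_closed_form(**args)
# Cost of staying at the same phase-space point: 12 * p^2, not zero.
diagonal = cost_closed_form(0.3, -1.2, 0.3, -1.2, 2.0, 0.1)
```

The nonzero diagonal value reflects the fact that a particle with momentum p has to be forced by the noise to remain at the same position over time h.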

Huang then defines the approximation scheme as follows.

Scheme 1 [Hua00]. Given a previous state $\rho^h_{k-1}$, define $\rho^h_k$ as the solution of the minimization problem

\[ \min_\rho\ \frac{1}{2h}\frac{1}{\gamma}\widetilde W_h(\rho^h_{k-1},\rho) + \mathcal{A}(\rho) + \frac{2m}{\gamma h}\int_{\mathbb{R}^{2d}}\rho(q,p)V(q)\,dqdp, \tag{2.10} \]

where $\widetilde W_h$ is the optimal-transport cost on $\mathbb{R}^{2d}$ with cost function $\widetilde C_h$.

Huang proves [Hua00] that the approximations generated by this scheme indeed converge to the solution of (2.1) as h → 0.

2.1.5 Criticism

Although Scheme 1 has a form similar to (2.4), there are in fact important issues with this scheme:

1. In (2.1), the dissipative effects are represented by the terms prefixed by γ, and the conservative effects by the Hamiltonian terms $\operatorname{div}_q\rho p/m$ and $\operatorname{div}_p\rho\nabla V$. It would be natural to see these effects play separate roles in the variational formulation. However, in Scheme 1 the effects are mixed, since the final term in (2.10) mixes conservative effects (represented by V and m) with dissipative effects (the prefactor γ, and the role as driving force in a gradient-flow-type minimization).

2. The dependence on h of the final term in (2.10) adds to the confusion; since this parameter is an approximation parameter chosen independently from the actual system, the combination $\mathcal{A} + \frac{2m}{\gamma h}\int\rho V$ cannot be considered a single driving potential.

3. In fact, in the standard case $F(p) = p^2/2m$ the sum of $\mathcal{A}$ and $\int\rho V$ is a natural object, since it represents total free energy and decreases along solutions (see Section 2.1.3). Note how the coefficient in this sum is 1 instead of $2m/\gamma h$.

The way in which V appears in Scheme 1 can be traced back to the fact that of the two conservative terms in (2.1) and (2.2), only P/m is represented in the definition of the cost $\widetilde C_h$, in the right-hand side of (2.8a); the term ∇V is missing in (2.8). Therefore the scheme has to compensate for the other term ∇V in a different manner.

These arguments lead us to pose the following question, which is the central topic of this chapter:

Can we construct an approximation scheme that respects the conservative-dissipative split?

The answer is 'yes', and in the rest of this chapter we explain how; in fact we detail three different schemes, corresponding to different ways of answering this question.

2.1.6 The schemes of this chapter

Approach 2. To set up a new cost functional, we first return to the single-particle point of view, as in (2.6) and (2.8). We now take a particle whose behaviour is a combination of the two Hamiltonian terms in (2.2) and a noise term:

\[ dQ^\varepsilon(t) = \frac{P^\varepsilon(t)}{m}\,dt, \tag{2.11a} \]
\[ dP^\varepsilon(t) = -\nabla V(Q^\varepsilon)\,dt + \sqrt{2\gamma\beta^{-1}\varepsilon}\,dW(t), \tag{2.11b} \]

which again can formally be written as

\[ m\frac{d^2}{dt^2}Q^\varepsilon(t) + \nabla V(Q^\varepsilon(t)) = \sqrt{2\gamma\beta^{-1}\varepsilon}\,\frac{dW}{dt}(t). \]

Note how this system differs from (2.8) by the term involving ∇V in (2.11b).

A very similar application of the Freidlin-Wentzell theorem states that $Q^\varepsilon$ satisfies a large-deviation principle as ε → 0 with rate function

\[ I(\xi) = \frac{1}{4\gamma\beta^{-1}}\int_0^h\big|m\ddot\xi(t) + \nabla V(\xi(t))\big|^2\,dt. \]

This leads to the following scheme.
This leads to the following scheme.


Scheme 2a. We define the cost to be

\[ C_h(q,p;q',p') := h\inf\Big\{ \int_0^h\big|m\ddot\xi(t) + \nabla V(\xi(t))\big|^2\,dt : \xi\in C^1([0,h],\mathbb{R}^d) \text{ such that } (\xi,m\dot\xi)(0)=(q,p),\ (\xi,m\dot\xi)(h)=(q',p') \Big\}. \tag{2.12} \]

Given a previous state $\rho^h_{k-1}$, define $\rho^h_k$ as the solution of the minimization problem

\[ \min_\rho\ \frac{1}{2h}\frac{1}{\gamma}W_h(\rho^h_{k-1},\rho) + \mathcal{A}(\rho), \tag{2.13} \]

where $W_h$ is the optimal-transport cost on $\mathbb{R}^{2d}$ with cost function $C_h$.

Note how now the term involving V has disappeared from the minimization problem (2.13). In Sections 2.4–2.6 we show that this approximation scheme converges to the solution of (2.1) as h → 0.

For practical purposes it is inconvenient that the cost $C_h$ in (2.12) has no explicit expression. It turns out that we may approximate $C_h$ with an explicit expression and obtain the same limiting behaviour.

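Scheme 2b. Define

\[ \overline C_h(q,p;q',p') := h\inf\Big\{ \int_0^h\big|m\ddot\xi(t) + \nabla V(q)\big|^2\,dt : (\xi,m\dot\xi)(0)=(q,p),\ (\xi,m\dot\xi)(h)=(q',p') \Big\} \]
\[ \overset{(2.9)}{=} |p'-p|^2 + 12\Big|\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big|^2 + 2h(p'-p)\cdot\nabla V(q) + h^2|\nabla V(q)|^2 \]
\[ = |p'-p+h\nabla V(q)|^2 + 12\Big|\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big|^2. \tag{2.14} \]

Given a previous state $\rho^h_{k-1}$, define $\rho^h_k$ as the solution of the minimization problem

\[ \min_\rho\ \frac{1}{2h}\frac{1}{\gamma}\overline W_h(\rho^h_{k-1},\rho) + \mathcal{A}(\rho), \tag{2.15} \]

where $\overline W_h$ is the optimal-transport cost on $\mathbb{R}^{2d}$ with cost function $\overline C_h$.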

Note how $\overline C_h$ differs from (2.12) in that ξ(t) is replaced by q in ∇V. This approximation is exact when V is linear. We prove the convergence of solutions of Scheme 2b in Sections 2.4–2.6.
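The two expressions for the Scheme 2b cost in (2.14) can be compared numerically. The sketch below is an illustration in d = 1 with an assumed potential V(q) = q² + ½ sin q (any differentiable V would do here). It uses the observation that freezing ∇V at q only adds boundary terms to the integral, so the minimizing path is the same cubic as in (2.9); Simpson's rule is exact because the integrand is quadratic in t. All function names are hypothetical.

```python
import math


def grad_V(q):
    # Assumed potential V(q) = q^2 + 0.5*sin(q), for illustration only.
    return 2.0 * q + 0.5 * math.cos(q)


def bar_cost_closed_form(q, p, q2, p2, m, h):
    """Last line of (2.14)."""
    g = grad_V(q)
    return (p2 - p + h * g) ** 2 + 12.0 * (m / h * (q2 - q) - 0.5 * (p2 + p)) ** 2


def bar_cost_by_quadrature(q, p, q2, p2, m, h):
    """h * integral of |m*xi''(t) + grad V(q)|^2 along the cubic minimiser."""
    g = grad_V(q)
    b = 3.0 / h ** 2 * (q2 - q - p * h / m) - (p2 - p) / (m * h)
    c = (p2 + p) / (m * h ** 2) - 2.0 / h ** 3 * (q2 - q)

    def integrand(t):
        return (m * (2.0 * b + 6.0 * c * t) + g) ** 2

    # Simpson's rule on [0, h]; exact for this quadratic integrand.
    integral = h / 6.0 * (integrand(0.0) + 4.0 * integrand(0.5 * h) + integrand(h))
    return h * integral


args = dict(q=0.3, p=-1.2, q2=1.1, p2=0.4, m=2.0, h=0.1)
explicit = bar_cost_closed_form(**args)
integrated = bar_cost_by_quadrature(**args)
```

Because the cost is explicit, each evaluation is a handful of arithmetic operations, whereas the cost (2.12) of Scheme 2a would require solving a nonlinear boundary-value problem for every pair of points.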

Neither of the costs $C_h$ and $\overline C_h$ gives rise to a metric, since they are asymmetric and do not vanish when (q′, p′) = (q, p). It is possible to construct a two-step scheme with a symmetric cost and corresponding metric $\widehat W_h$.


Scheme 2c. Define

\[ \widehat C_h(q,p;q',p') := |p'-p|^2 + 12\Big|\frac{m}{h}(q'-q) - \frac{p'-p}{2}\Big|^2 + 2m(q'-q)\cdot\big(\nabla V(q')-\nabla V(q)\big). \tag{2.16} \]

Define the single-step, backwards approximate streaming operator

\[ \sigma_h(q,p) := \Big(q - h\frac{p}{m},\ p + h\nabla V(q)\Big). \tag{2.17} \]

Given a previous state $\rho^h_{k-1}$, define $\rho^h_k$ in two steps.

Hamiltonian step: First determine $\mu^h_k(q,p)$ such that

\[ \mu^h_k := (\sigma_h^{-1})_\#\,\rho^h_{k-1}, \tag{2.18} \]

where # denotes the push-forward operator.

Gradient flow step: Then determine $\rho^h_k$ that minimizes

\[ \min_\rho\ \frac{1}{2h}\frac{1}{\gamma}\widehat W_h(\mu^h_k,\rho) + \mathcal{A}(\rho), \tag{2.19} \]

where $\widehat W_h$ is the metric on $\mathbb{R}^{2d}$ generated by the cost function $\widehat C_h$.
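The structural difference between the Scheme 2c cost and Huang's cost can be seen in a few lines. The following sketch works in d = 1 with an assumed strictly convex potential V(q) = q² + ½ sin q; the function names are hypothetical. It evaluates (2.16), checks that it is symmetric and vanishes on the diagonal, and contrasts this with the diagonal value 12|p|² of the cost in (2.9). The streaming map (2.17) is included for completeness.

```python
import math


def grad_V(q):
    # Assumed potential V(q) = q^2 + 0.5*sin(q); V'' >= 1.5 > 0.
    return 2.0 * q + 0.5 * math.cos(q)


def hat_cost(q, p, q2, p2, m, h):
    """Cost (2.16) of Scheme 2c in one dimension."""
    return ((p2 - p) ** 2
            + 12.0 * (m / h * (q2 - q) - 0.5 * (p2 - p)) ** 2
            + 2.0 * m * (q2 - q) * (grad_V(q2) - grad_V(q)))


def huang_cost(q, p, q2, p2, m, h):
    """Cost (2.9), shown for comparison."""
    return (p2 - p) ** 2 + 12.0 * (m / h * (q2 - q) - 0.5 * (p2 + p)) ** 2


def streaming(q, p, m, h):
    """Backwards approximate streaming operator sigma_h from (2.17)."""
    return q - h * p / m, p + h * grad_V(q)


m, h = 2.0, 0.1
forward = hat_cost(0.3, -1.2, 1.1, 0.4, m, h)
backward = hat_cost(1.1, 0.4, 0.3, -1.2, m, h)
on_diagonal = hat_cost(0.3, -1.2, 0.3, -1.2, m, h)
huang_on_diagonal = huang_cost(0.3, -1.2, 0.3, -1.2, m, h)
```

The transport along the Hamiltonian flow, which makes the costs of Schemes 1, 2a and 2b asymmetric, is handled here by the separate streaming step, so the remaining cost only has to measure the dissipative displacement.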

2.1.7 Organization of the chapter

The rest of the chapter is organized as follows. In Section 2.2, we describe our assumptions and state the main result. Section 2.3 establishes some properties of the three cost functions. The proof of the main theorem is given in Sections 2.4 to 2.6. In Section 2.4, we establish the Euler-Lagrange equations for the minimizers in the three schemes. In Section 2.5, we prove the boundedness of the second moments and the entropy functional. Finally, the convergence result is given in Section 2.6.

2.2 Assumptions and main result of the chapter

Throughout the chapter we make the following assumptions.

\[ V\in C^3(\mathbb{R}^d) \text{ and } F\in C^2(\mathbb{R}^d),\quad F(x)\ge 0 \text{ for all } x\in\mathbb{R}^d. \tag{2.20} \]

There exists a constant C > 0 such that for all $z_1, z_2\in\mathbb{R}^d$

\[ \frac{1}{C}|z_1-z_2|^2 \le (z_1-z_2)\cdot\big(\nabla V(z_1)-\nabla V(z_2)\big), \tag{2.21a} \]
\[ |\nabla V(z_1)-\nabla V(z_2)| \le C|z_1-z_2|, \tag{2.21b} \]
\[ |\nabla F(z_1)-\nabla F(z_2)| \le C|z_1-z_2|, \tag{2.21c} \]
\[ \big|\nabla^2 V(z_1)\big|,\ \big|\nabla^3 V(z_1)\big| \le C. \tag{2.21d} \]

Note that (2.21a) implies that V increases quadratically at infinity, and therefore V achieves its minimum. Without loss of generality we assume that this minimum is at the origin, which implies the estimate

\[ |\nabla V(z)| \le C|z|. \tag{2.22} \]

As we remarked in the Introduction, we work in the dimensional setting, and keep all the physical constants in place, in order to make the physical background of the expressions clear. We make an important exception, however, for inequalities of the type above; here the constants C can have any dimension, and we will group terms on the right-hand side of such estimates without taking their dimensions into account. This can be done without loss of generality, since we do not specify the generic constant C, and this constant will be allowed to vary from one expression to the next.

We only consider probability measures on $\mathbb{R}^{2d}$ which have a Lebesgue density, and we often tacitly identify a probability measure with its density. We denote by $\mathcal{P}_2(\mathbb{R}^{2d})$ the set of all probability measures on $\mathbb{R}^d\times\mathbb{R}^d$ with finite second moment,

\[ \mathcal{P}_2(\mathbb{R}^{2d}) := \Big\{ \rho:\mathbb{R}^d\times\mathbb{R}^d\to[0,\infty) \text{ measurable},\ \int_{\mathbb{R}^{2d}}\rho(q,p)\,dqdp = 1,\ M_2(\rho)<\infty \Big\}, \]

where

\[ M_2(\rho) = \int_{\mathbb{R}^{2d}}\big(\gamma^2|q|^2 + |p|^2\big)\rho(q,p)\,dqdp. \tag{2.23} \]

With these assumptions, the functionals $\mathcal{A}$ and $\mathcal{E}$ introduced in the introduction are well-defined in $\mathcal{P}_2(\mathbb{R}^{2d})$. Moreover, the following two lemmas are now classical (see, e.g., [Vil03, Theorem 1.3], [JKO98, Proposition 4.1], and [Hua00, Lemma 4.2]). Let $C^*_h$ be one of $C_h$, $\overline C_h$, or $\widehat C_h$, defined in (2.12), (2.14), and (2.16), with corresponding optimal-transport cost functional $W^*_h$.

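Lemma 2.2.1. Let $\rho_0, \rho\in\mathcal{P}_2(\mathbb{R}^{2d})$ be given. There exists a unique optimal plan $P^*_{\mathrm{opt}}\in\Gamma(\rho_0,\rho)$ such that

\[ W^*_h(\rho_0,\rho) = \int_{\mathbb{R}^{4d}} C^*_h(q,p;q',p')\,P^*_{\mathrm{opt}}(dq\,dp\,dq'\,dp'). \tag{2.24} \]

Lemma 2.2.2. Let $\rho_0\in\mathcal{P}_2(\mathbb{R}^{2d})$ be given. If h is small enough, then the minimization problem

\[ \min_{\rho\in\mathcal{P}_2(\mathbb{R}^{2d})}\ \frac{1}{2h}\frac{1}{\gamma}W^*_h(\rho_0,\rho) + \mathcal{A}(\rho) \tag{2.25} \]

has a unique solution.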

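These lemmas imply that Schemes 2a–c are well-defined. Next, we make the definition of a weak solution precise. A function $\rho\in L^1(\mathbb{R}^+\times\mathbb{R}^{2d})$ is called a weak solution of equation (2.1) with initial datum $\rho_0\in\mathcal{P}_2(\mathbb{R}^{2d})$ if it satisfies the following weak formulation of (2.1):

\[ \int_0^\infty\!\!\int_{\mathbb{R}^{2d}}\Big[\partial_t\varphi + \frac{p}{m}\cdot\nabla_q\varphi - \big(\nabla_q V(q) + \gamma\nabla_p F(p)\big)\cdot\nabla_p\varphi + \gamma\beta^{-1}\Delta_p\varphi\Big]\rho\,dqdp\,dt = -\int_{\mathbb{R}^{2d}}\varphi(0,q,p)\rho_0(q,p)\,dqdp, \quad\text{for all } \varphi\in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d}). \tag{2.26} \]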

The main result of the chapter is the following.

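Theorem 2.2.3. Let $\rho_0\in\mathcal{P}_2(\mathbb{R}^{2d})$ satisfy $\mathcal{A}(\rho_0)<\infty$. For any h > 0 sufficiently small, let $\rho^h_k$ be the sequence of the solutions of any of the three Schemes 2a–c. For any t ≥ 0, define the piecewise-constant time interpolation

\[ \rho^h(t,q,p) = \rho^h_k(q,p) \quad\text{for } (k-1)h < t\le kh. \tag{2.27} \]

Then for any T > 0,

\[ \rho^h \rightharpoonup \rho \quad\text{weakly in } L^1((0,T)\times\mathbb{R}^{2d}) \text{ as } h\to 0, \tag{2.28} \]

where ρ is the unique weak solution of the Kramers equation with initial value $\rho_0$. Moreover

\[ \rho^h(t)\to\rho(t) \quad\text{weakly in } L^1(\mathbb{R}^{2d}) \text{ as } h\to 0 \text{ for any } t>0, \tag{2.29} \]

and as t → 0,

\[ \rho(t)\to\rho_0 \quad\text{in } L^1(\mathbb{R}^{2d}). \tag{2.30} \]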

Outline of the proof. The proof follows the procedure of [JKO98] (see also [Hua00]) and is divided into three main steps, which are carried out in Sections 2.4, 2.5, and 2.6: establish the Euler-Lagrange equation for the minimizers, then estimate the second moments and entropy functionals, and finally pass to the limit h → 0. We start in Section 2.3 with some properties of the cost functions.

2.3 Properties of the three cost functions

Here we derive and summarize a number of properties of the three cost functions. Define the quadratic form

\[ N(q,p) := |\gamma q|^2 + |p|^2, \]

so that $M_2(\rho) = \int_{\mathbb{R}^{2d}} N(q,p)\,\rho(q,p)\,dqdp$.

Lemma 2.3.1. 1. Let $C^*_h$ be either $C_h$ or $\overline C_h$. There exists C > 0 such that

\[ |q-q'|^2 + |p-p'|^2 \le C\,\widehat C_h(q,p;q',p'), \tag{2.31a} \]
\[ |q-q'|^2 \le Ch^2\big[C^*_h(q,p;q',p') + N(q,p) + N(q',p')\big], \tag{2.31b} \]
\[ |p-p'|^2 \le C\big[C^*_h(q,p;q',p') + h^2N(q,p) + h^2N(q',p')\big]. \tag{2.31c} \]

2. For the cost function $C_h$ of Scheme 2a we have

\[ \nabla_{q'}C_h(q,p;q',p') = \frac{24m}{h}\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big) - 2h\nabla^2V(q')\cdot p' + \sigma_h(q,p;q',p'), \tag{2.32a} \]
\[ \nabla_{p'}C_h(q,p;q',p') = 2(p'-p) - 12\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big) + 2h\nabla V(q') + \tau_h(q,p;q',p'), \tag{2.32b} \]

where there exists C > 0 such that

\[ |\sigma_h(q,p;q',p')|,\ \frac{1}{h}|\tau_h(q,p;q',p')| \le Ch\big\{C_h(q,p;q',p') + N(q,p) + N(q',p') + 1\big\}. \tag{2.33} \]

3. For the cost function $\overline C_h$ of Scheme 2b we have

\[ \nabla_{q'}\overline C_h(q,p;q',p') = \frac{24m}{h}\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big), \tag{2.34a} \]
\[ \nabla_{p'}\overline C_h(q,p;q',p') = 2(p'-p) - 12\Big(\frac{m}{h}(q'-q) - \frac{p'+p}{2}\Big) + 2h\nabla V(q). \tag{2.34b} \]

4. For the cost function $\widehat C_h$ of Scheme 2c we have

\[ \nabla_{q'}\widehat C_h(q,p;q',p') = \frac{24m}{h}\Big(\frac{m}{h}(q'-q) - \frac{p'-p}{2}\Big) + 4m\big(\nabla V(q')-\nabla V(q)\big) + r(q,q'), \tag{2.35a} \]
\[ \nabla_{p'}\widehat C_h(q,p;q',p') = 2(p'-p) - 12\Big(\frac{m}{h}(q'-q) - \frac{p'-p}{2}\Big), \tag{2.35b} \]

where

\[ |r(q,q')| \le Ch^2\big[\widehat C_h(q,p;q',p') + N(q,p) + N(q',p')\big]. \tag{2.36} \]

Proof. For the length of this proof we fix q, p, q′, p′, and h, and we abbreviate

\[ C_h := C_h(q,p;q',p'), \qquad \widetilde C_h := \widetilde C_h(q,p;q',p'), \]
\[ N := N(q,p) + N(q',p') = |\gamma q|^2 + |p|^2 + |\gamma q'|^2 + |p'|^2. \]

Let $\xi(t)$ and $\bar\xi(t)$, respectively, be the optimal curves in the definition of $\widetilde C_h$ in (2.9) and of $C_h$ in (2.12). We will need a number of properties of these two curves. All the statements below are of the following type: there exist C > 0 and $0 < h_0 < 1$ such that the property holds for all $h < h_0$. Here C is always independent of q, p, q′, p′, and h. The norm $\|\cdot\|_p$ is the $L^p$-norm on the interval (0, h).

The curve ξ satisfies $\ddddot\xi = 0$, and hence it is a cubic polynomial

\[ \xi(t) = q + at + bt^2 + ct^3, \tag{2.37} \]

where the coefficients can be calculated from the boundary conditions:

\[ a = \frac{p}{m}, \qquad b = \frac{3}{h^2}\Big(q'-q-\frac{ph}{m}\Big) - \frac{p'-p}{mh}, \qquad c = \frac{p'+p}{mh^2} - \frac{2}{h^3}(q'-q). \]

Explicit calculations give

\[ \|\xi\|_2^2 \le h\|\xi\|_\infty^2 \le ChN, \tag{2.38} \]
\[ \|\ddot\xi\|_2^2 \le h\|\ddot\xi\|_\infty^2 \le C\big\{h^{-3}|q-q'|^2 + h^{-1}|p-p'|^2\big\}, \tag{2.39} \]
\[ \|\ddot\xi\|_1 \le h\|\ddot\xi\|_\infty \le C\big\{h^{-1}|q-q'| + |p-p'|\big\}. \tag{2.40} \]

The curve $\bar\xi(t)$ satisfies the equation

\[ \mathcal{N}(\bar\xi)(t) := m^2\ddddot{\bar\xi}(t) + 2m\nabla^2V(\bar\xi)\cdot\ddot{\bar\xi}(t) + m\nabla^3V(\bar\xi)\cdot\dot{\bar\xi}\cdot\dot{\bar\xi}(t) + \nabla^2V(\bar\xi)\cdot\nabla V(\bar\xi)(t) = 0, \tag{2.41} \]
\[ (\bar\xi, m\dot{\bar\xi})(0) = (q,p), \qquad (\bar\xi, m\dot{\bar\xi})(h) = (q',p'), \]

where $\nabla^3V$ is the third-order tensor of third derivatives of V. This is a relatively benign equation, but non-trivially nonlinear.

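We will need the following four intermediate estimates:

\[ \|\bar\xi\|_2^2 \le ChN, \tag{2.42} \]
\[ \widetilde C_h + h\|\ddot{\bar\xi}\|_2^2 \le C\big\{C_h + h^2N\big\}, \tag{2.43} \]
\[ \|\dot{\bar\xi}\|_2^2 \le Ch\big\{C_h + N\big\}, \tag{2.44} \]
\[ \|\ddddot u\|_1 \le C\big\{C_h + N + 1\big\}. \tag{2.45} \]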

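We first prove (2.42). Since $\bar\xi$ is optimal in $C_h$,

\[ \begin{aligned} m\|\ddot{\bar\xi}\|_2 &\le \|m\ddot{\bar\xi} + \nabla V(\bar\xi)\|_2 + \|\nabla V(\bar\xi)\|_2 \\ &\overset{(2.12)}{\le} \|m\ddot\xi + \nabla V(\xi)\|_2 + \|\nabla V(\bar\xi)\|_2 \\ &\le m\|\ddot\xi\|_2 + \|\nabla V(\xi)\|_2 + \|\nabla V(\bar\xi)\|_2 \\ &\overset{(2.22)}{\le} m\|\ddot\xi\|_2 + C\big(\|\xi\|_2 + \|\bar\xi\|_2\big) \\ &\le m\|\ddot\xi\|_2 + C\big(\|\xi\|_2 + h^{1/2}\|\bar\xi\|_\infty\big). \end{aligned} \tag{2.46} \]

Therefore

\[ \|\bar\xi\|_\infty \le |\bar\xi(0)| + h|\dot{\bar\xi}(0)| + h^{3/2}\|\ddot{\bar\xi}\|_2 \le |q| + \frac{h}{m}|p| + Ch^{3/2}\big\{\|\ddot\xi\|_2 + \|\xi\|_2 + h^{1/2}\|\bar\xi\|_\infty\big\}. \]

If $h_0$ is small enough, then $Ch^2 < 1/2$, so that

\[ \|\bar\xi\|_\infty \overset{(2.38),(2.39)}{\le} 2|q| + \frac{2h}{m}|p| + C\big\{|q-q'| + h|p-p'| + h^2\sqrt{N}\big\}. \]

Therefore

\[ \|\bar\xi\|_2^2 \le h\|\bar\xi\|_\infty^2 \le ChN, \]

which is (2.42).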

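Similar to (2.46) it also follows, since $\bar\xi$ is admissible for $\widetilde C_h$, that

\[ \widetilde C_h = m^2h\|\ddot\xi\|_2^2 \le m^2h\|\ddot{\bar\xi}\|_2^2 \le 2h\|m\ddot{\bar\xi} + \nabla V(\bar\xi)\|_2^2 + 2h\|\nabla V(\bar\xi)\|_2^2 \overset{(2.12),(2.22)}{\le} 2C_h + Ch\|\bar\xi\|_2^2 \overset{(2.42)}{\le} 2C_h + Ch^2N, \]

which implies (2.43).

We now can prove part 1 of the Lemma. (2.31a) is a direct consequence of (2.16) and (2.21a). The estimate for p follows from (2.14) and (2.22) for $\overline C_h$, and from (2.9) and (2.43) for $C_h$:

\[ |p'-p|^2 \le C\big[|p'-p+h\nabla V(q)|^2 + h^2|\nabla V(q)|^2\big] \le C\big[\overline C_h(q,p;q',p') + h^2N\big], \]
\[ |p'-p|^2 \le \widetilde C_h \le C\big(C_h + h^2N\big). \]

Similarly,

\[ \begin{aligned} |q'-q|^2 &= \frac{h^2}{m^2}\Big|\frac{m}{h}(q'-q) - \frac{p+p'}{2} + \frac{p'+p}{2}\Big|^2 \\ &\le \frac{3h^2}{m^2}\Big(\Big|\frac{m}{h}(q'-q) - \frac{p+p'}{2}\Big|^2 + \frac{|p|^2}{4} + \frac{|p'|^2}{4}\Big) \\ &\le Ch^2\big(\widetilde C_h + N\big) \le Ch^2\big(C_h + N\big), \end{aligned} \tag{2.47} \]

and also

\[ |q'-q|^2 \le Ch^2\big(\overline C_h + N\big). \]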

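Using the Poincaré inequality $\|v - ⨍ v\|_2 \le Ch\|v'\|_2$, where $⨍ v$ denotes the average of v over (0, h), the estimate (2.44) then follows by

\[ \|\dot{\bar\xi}\|_2^2 \le 2\Big\|⨍\dot{\bar\xi}\Big\|_2^2 + Ch^2\|\ddot{\bar\xi}\|_2^2 \overset{(2.43)}{\le} \frac{2}{h}|q-q'|^2 + Ch\big\{C_h + h^2N\big\} \overset{(2.31b)}{\le} Ch\big\{C_h + N\big\}. \]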


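To prove the final of the four intermediate estimates, (2.45), we define $u = \bar\xi - \xi$; remark that

\[ m^2\ddddot u = -2m\nabla^2V(\bar\xi)\cdot\ddot{\bar\xi} - m\nabla^3V(\bar\xi)\cdot\dot{\bar\xi}\cdot\dot{\bar\xi} - \nabla^2V(\bar\xi)\cdot\nabla V(\bar\xi). \tag{2.48} \]

Note that $u = \dot u = 0$ at t = 0, h, so that we have $\|u\|_1 \le Ch^4\|\ddddot u\|_1$ and $\|\ddot u\|_1 \le Ch^2\|\ddddot u\|_1$. We then calculate

\[ \begin{aligned} \|\ddddot u\|_1 &\overset{(2.48),(2.21)}{\le} C\big\{\|\ddot{\bar\xi}\|_1 + \|\dot{\bar\xi}\|_2^2 + \|\bar\xi\|_1\big\} \\ &\le C\big\{\|\ddot\xi\|_1 + \|\dot{\bar\xi}\|_2^2 + \|\xi\|_1 + \|\ddot u\|_1 + \|u\|_1\big\} \\ &\le C\big\{\|\ddot\xi\|_1 + \|\dot{\bar\xi}\|_2^2 + \|\xi\|_1 + h^2\|\ddddot u\|_1 + h^4\|\ddddot u\|_1\big\}. \end{aligned} \]

Again, taking $h_0$ sufficiently small, we have $C(h^2+h^4) < 1/2$, and therefore

\[ \begin{aligned} \|\ddddot u\|_1 &\le C\big\{\|\ddot\xi\|_1 + \|\dot{\bar\xi}\|_2^2 + \|\xi\|_1\big\} \\ &\overset{(2.38),(2.40),(2.44)}{\le} C\Big\{\frac{|q-q'|}{h} + |p-p'| + hC_h + hN + h\sqrt{N}\Big\} \\ &\overset{(2.31b)}{\le} C\big\{\sqrt{C_h+N} + hC_h + N + 1\big\} \\ &\le C\big\{C_h + N + 1\big\}. \end{aligned} \]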

We now continue with parts 2, 3, and 4. The derivatives of the cost function (2.14) can be calculated directly using its explicit expression. The derivatives of the cost function (2.12) can be calculated as follows. Let $\eta \in C^2([0,h];\mathbb R^{2d})$ satisfy $\eta(0) = \dot\eta(0) = 0$. Then

4γβ−1hd

dεI(ξ + εη)

∣∣ε=0

= 2h

∫ h

0

(m

¨ξ +∇V (ξ)

)·(mη +∇2V (ξ) · η

)(t) dt

= 2h

∫ h

0

N (ξ) · η(t) dt+ 2h[mη(m

¨ξ +∇V (ξ)

)−mη

(m

...ξ +∇2V (ξ) · ˙

ξ)]

(h).

Note that N (ξ) ≡ 0 by the stationarity (2.41) of ξ. This expression is equal to

∇q′Ch(q, p; q′, p′) · η(h) +∇p′Ch(q, p; q

′, p′) ·mη(h),

which allows us to identify the two derivatives in terms of ξ. Setting u = ξ− ξ, we rewrite


these in terms of u:

∇q′Ch(q, p; q′, p′) = −2hm2

...ξ (h)− 2hm∇2V (ξ(h)) · ˙

ξ(h)

= −2hm2...ξ (h)− 2hm∇2V (ξ(h)) · ˙

ξ(h)− 2hm2(...ξ (h)−

...ξ (h)

)(2.37)=

24m

h

(m

h(q′ − q)− p′ + p

2

)− 2h∇2V (q′) · p′ − 2hm2...

u (h),

∇p′Ch(q, p; q′, p′) = 2hm

¨ξ(h) + 2h∇V (ξ(h))

= 2hmξ(h) + 2h∇V (ξ(h)) + 2hm(¨ξ(h)− ξ(h)

)(2.37)= 2(p′ − p)− 12

(m

h(q′ − q)− p′ + p

2

)+ 2h∇V (q′) + 2hmu(h).

Therefore (2.32) holds with

σh = −2hm2...u (h) and τh = 2hmu(h).

The estimates (2.33) then follow from (2.45) and the inequalities

$$\|\ddot u\|_\infty \le h\|\dddot u\|_\infty \le Ch\|\ddddot u\|_1,$$

which hold since $u = \dot u = 0$ at $t = 0, h$.

The derivatives of the cost function of Scheme 2c are given by (2.35), where

r(q, q′) := 2m[∇2V (q′) · (q′ − q)−∇V (q′) +∇V (q)

].

The estimate (2.36) on r follows from (2.21d), (2.47), and the fact that by (2.21a), Ch ≤Ch.

2.4 The Euler-Lagrange equation for the minimization problem

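Let $C_h^*$ be one of the three cost functions defined in (2.12), (2.14), and (2.16), with corresponding optimal-transport cost functional $W_h^*$. Let a measure in $\mathcal P_2(\mathbb R^{2d})$ be given and let $\rho$ be the unique solution of the minimization problem

$$\min_{\mu\in\mathcal P_2(\mathbb R^{2d})}\ \frac{1}{2\gamma h}W_h^*(\,\cdot\,,\mu) + \mathcal A(\mu),$$

where the first argument of $W_h^*$ is the given measure.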

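We now establish the Euler-Lagrange equation for $\rho$. Following the now well-established route (see e.g. [JKO98, Hua00]), we first define a perturbation of $\rho$ by a push-forward under an appropriate flow. Let $\phi, \eta \in C_0^\infty(\mathbb R^{2d},\mathbb R^d)$. We define the flows $\Phi,\Psi\colon [0,\infty)\times\mathbb R^{2d}\to\mathbb R^d$ such that

$$\frac{\partial\Psi_s}{\partial s} = \phi(\Psi_s,\Phi_s),\qquad \frac{\partial\Phi_s}{\partial s} = \eta(\Psi_s,\Phi_s),\qquad \Psi_0(q,p) = q,\qquad \Phi_0(q,p) = p.$$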


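Let $\rho_s(q,p)$ be the push-forward of $\rho(q,p)$ under the flow $(\Psi_s,\Phi_s)$, i.e., for any $\varphi\in C_0^\infty(\mathbb R^{2d},\mathbb R)$ we have

$$\int_{\mathbb R^{2d}}\varphi(q,p)\,\rho_s(q,p)\,dq\,dp = \int_{\mathbb R^{2d}}\varphi(\Psi_s(q,p),\Phi_s(q,p))\,\rho(q,p)\,dq\,dp. \tag{2.49}$$

Obviously $\rho_0(q,p) = \rho(q,p)$, and an explicit calculation gives

$$\partial_s\rho_s\big|_{s=0} = -\mathrm{div}_q(\rho\phi) - \mathrm{div}_p(\rho\eta)\quad\text{in the sense of distributions.} \tag{2.50}$$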

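By following the calculations in e.g. [Hua00] we then compute the stationarity condition on $\rho$,

$$\begin{aligned}
0 ={}& \frac{1}{2\gamma h}\int_{\mathbb R^{4d}}\bigl[\nabla_{q'}C_h^*(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}C_h^*(q,p;q',p')\cdot\eta(q',p')\bigr]\,P^*_{\mathrm{opt}}(dq\,dp\,dq'\,dp')\\
&+ \int_{\mathbb R^{2d}}\rho(q,p)\,\nabla_pF(p)\cdot\eta(q,p)\,dq\,dp - \beta^{-1}\int_{\mathbb R^{2d}}\rho(q,p)\bigl[\mathrm{div}_q\phi(q,p) + \mathrm{div}_p\eta(q,p)\bigr]\,dq\,dp,
\end{aligned}\tag{2.51}$$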

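where $P^*_{\mathrm{opt}}$ is the optimal plan in $W_h^*$ between the given measure and the minimizer $\rho$. For any $\varphi\in C_0^\infty(\mathbb R^{2d},\mathbb R)$, we choose

$$\begin{aligned}
\phi(q',p') &= -\frac{\gamma h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{\gamma h}{2m}\nabla_{p'}\varphi(q',p'),\\
\eta(q',p') &= -\frac{\gamma h}{2m}\nabla_{q'}\varphi(q',p') + \gamma\nabla_{p'}\varphi(q',p'),
\end{aligned}$$

i.e.,

$$\begin{pmatrix}\phi\\ \eta\end{pmatrix} = \begin{pmatrix} -\frac{\gamma h^2}{6m^2}I & \frac{\gamma h}{2m}I\\[2pt] -\frac{\gamma h}{2m}I & \gamma I\end{pmatrix}\nabla\varphi(q',p'). \tag{2.52}$$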

Now the specific form of the cost functional $C_h^*(q,p;q',p')$ comes into play. We calculate the gradient expression in (2.51) for each scheme in the next subsections.

Remark 2.4.1. The structure of the choice (2.52) can be understood in terms of the conservative-dissipative nature of the Kramers equation. The matrix in front of $\nabla\varphi(q',p')$ in (2.52) is of the form

$$\begin{pmatrix} -\frac{\gamma h^2}{6m^2}I & \frac{\gamma h}{2m}I\\[2pt] -\frac{\gamma h}{2m}I & \gamma I\end{pmatrix} = \underbrace{\begin{pmatrix} -\frac{\gamma h^2}{6m^2}I & 0\\ 0 & \gamma I\end{pmatrix}}_{A} + \frac{\gamma h}{2m}\underbrace{\begin{pmatrix} 0 & I\\ -I & 0\end{pmatrix}}_{B}.$$

Note that $A$ is symmetric and $B$ is antisymmetric: this mirrors the conservative-dissipative structure of the Kramers equation.

The top-left block in $A$, which would correspond to diffusion in the spatial variable $q$, is of order $O(h^2)$, and therefore vanishes when $h\to0$. The other block, which corresponds to diffusion in the momentum variable $p$, is of order $O(1)$ and remains. This explains how in the limit $h\to0$ only diffusion in the momentum variable remains.
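The split of the coefficient matrix in (2.52) into a symmetric and an antisymmetric part can be checked numerically. The following is a minimal sketch for $d = 1$ (so that every identity block is the number 1); the function names `kramers_test_matrix` and `split_sym_antisym` are illustrative and do not come from the thesis.

```python
def kramers_test_matrix(gamma, h, m):
    """2x2 coefficient matrix from (2.52) for d = 1 (identity blocks become 1)."""
    return [
        [-gamma * h**2 / (6 * m**2), gamma * h / (2 * m)],
        [-gamma * h / (2 * m), gamma],
    ]


def split_sym_antisym(M):
    """Return (S, K) with M = S + K, S symmetric and K antisymmetric."""
    n = len(M)
    S = [[(M[i][j] + M[j][i]) / 2 for j in range(n)] for i in range(n)]
    K = [[(M[i][j] - M[j][i]) / 2 for j in range(n)] for i in range(n)]
    return S, K


if __name__ == "__main__":
    # The symmetric part has an O(h^2) entry (diffusion in q) and an O(1)
    # entry (diffusion in p); the antisymmetric part is O(h).
    for h in (0.1, 0.01):
        S, K = split_sym_antisym(kramers_test_matrix(gamma=1.0, h=h, m=1.0))
        print(h, S, K)
```

Shrinking `h` makes the top-left entry of the symmetric part vanish quadratically while the bottom-right entry stays equal to `gamma`, which is the scaling described in the remark.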


2.4.1 Schemes 2a and 2b

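Lemma 2.4.2. Let $h > 0$ and let $\rho_k^h$ be the sequence of the minimizers either for problem (2.13) in Scheme 2a or for problem (2.15) in Scheme 2b. Let $W_h^*$ be the optimal-transport cost functional of the respective scheme, and let $P_k^{h*}$ be optimal in $W_h^*(\rho_{k-1}^h,\rho_k^h)$. Then, for all $\varphi\in C_c^\infty(\mathbb R^{2d})$, there holds

$$\begin{aligned}
0 ={}& \frac1h\int_{\mathbb R^{4d}}\bigl[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p')\bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&- \frac1m\int_{\mathbb R^{2d}}p'\cdot\nabla_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'\,dp' + \int_{\mathbb R^{2d}}\nabla V(q')\cdot\nabla_{p'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'\,dp'\\
&+ \gamma\int_{\mathbb R^{2d}}\bigl[\nabla F(p')\cdot\nabla_{p'}\varphi(q',p') - \beta^{-1}\Delta_{p'}\varphi(q',p')\bigr]\rho_k^h(q',p')\,dq'\,dp' + \omega_k^h,
\end{aligned}\tag{2.53}$$

where

$$|\omega_k^h| \le Ch\bigl[W_h^*(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\bigr].$$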

The second moment M2 is defined in (2.23).

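Proof. For Scheme 2b we combine (2.52) with (2.34) to yield

$$\begin{aligned}
&\nabla_{q'}C_h^*(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}C_h^*(q,p;q',p')\cdot\eta(q',p')\\
&\quad= 2\gamma\Bigl[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p') - \frac hm\,p'\cdot\nabla_{q'}\varphi(q',p')\Bigr]\\
&\qquad+ 2\gamma\nabla V(q)\cdot\Bigl[-\frac{h^2}{2m}\nabla_{q'}\varphi(q',p') + h\nabla_{p'}\varphi(q',p')\Bigr].
\end{aligned}\tag{2.54}$$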

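Substituting (2.52) and (2.54) into the Euler-Lagrange equation (2.51), we obtain

$$\begin{aligned}
0 ={}& \frac1h\int_{\mathbb R^{4d}}\bigl[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p')\bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&- \frac1m\int_{\mathbb R^{2d}}p'\cdot\nabla_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'\,dp' + \int_{\mathbb R^{4d}}\nabla V(q)\cdot\nabla_{p'}\varphi(q',p')\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&+ \gamma\int_{\mathbb R^{2d}}\Bigl[\nabla F(p')\cdot\nabla_{p'}\varphi(q',p') + \beta^{-1}\frac{h^2}{6m^2}\Delta_{q'}\varphi(q',p') - \beta^{-1}\Delta_{p'}\varphi(q',p')\Bigr]\rho_k^h(q',p')\,dq'\,dp'\\
&- \frac{h}{2m}\int_{\mathbb R^{4d}}\bigl[\nabla V(q) + \gamma\nabla F(p')\bigr]\cdot\nabla_{q'}\varphi(q',p')\,P_k^{h*}(dq\,dp\,dq'\,dp').
\end{aligned}\tag{2.55}$$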


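Therefore (2.53) holds with

$$\begin{aligned}
|\omega_k^h| ={}& \biggl|\int_{\mathbb R^{4d}}\bigl(\nabla V(q)-\nabla V(q')\bigr)\cdot\nabla_{p'}\varphi(q',p')\,P_k^{h*}(dq\,dp\,dq'\,dp') + \gamma\beta^{-1}\frac{h^2}{6m^2}\int_{\mathbb R^{2d}}\Delta_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'\,dp'\\
&\quad- \frac{h}{2m}\int_{\mathbb R^{4d}}\bigl[\nabla V(q)+\gamma\nabla F(p')\bigr]\cdot\nabla_{q'}\varphi(q',p')\,P_k^{h*}(dq\,dp\,dq'\,dp')\biggr|\\
\overset{(2.21b),(2.21c)}{\le}{}& C\int_{\mathbb R^{4d}}\bigl[|q-q'| + h(|q|+|p'|+1)\bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
\le{}& C\int_{\mathbb R^{4d}}\Bigl[\frac1h|q-q'|^2 + h\bigl(|q|^2+|p'|^2+1\bigr)\Bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
\overset{(2.31)}{\le}{}& Ch\bigl[W_h^*(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\bigr].
\end{aligned}$$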

This proves Lemma 2.4.2 for Scheme 2b.

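For Scheme 2a we obtain an identity similar to (2.54),

$$\begin{aligned}
&\nabla_{q'}C_h^*(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}C_h^*(q,p;q',p')\cdot\eta(q',p')\\
&\quad= 2\gamma\Bigl[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p') - \frac hm\,p'\cdot\nabla_{q'}\varphi(q',p')\Bigr]\\
&\qquad+ 2\gamma\Bigl[h\nabla V(q') + \frac12\tau_h(q,p;q',p')\Bigr]\cdot\Bigl[-\frac{h}{2m}\nabla_{q'}\varphi(q',p') + \nabla_{p'}\varphi(q',p')\Bigr]\\
&\qquad+ 2\gamma\Bigl[-h\nabla^2V(q')\cdot p' + \frac12\sigma_h(q,p;q',p')\Bigr]\cdot\Bigl[-\frac{h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{h}{2m}\nabla_{p'}\varphi(q',p')\Bigr].
\end{aligned}$$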

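This leads to the same equation as (2.53), but now with error term

$$\begin{aligned}
\omega_k^h ={}& -\frac{h}{2m}\int_{\mathbb R^{4d}}\nabla V(q')\cdot\nabla_{q'}\varphi(q',p')\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&+ \int_{\mathbb R^{4d}}\Bigl[\nabla^2V(q')\cdot p' - \frac{1}{2h}\sigma_h(q,p;q',p')\Bigr]\cdot\Bigl[\frac{h^2}{6m^2}\nabla_{q'}\varphi(q',p') - \frac{h}{2m}\nabla_{p'}\varphi(q',p')\Bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&+ \frac{1}{2h}\int_{\mathbb R^{4d}}\tau_h(q,p;q',p')\cdot\Bigl[-\frac{h}{2m}\nabla_{q'}\varphi(q',p') + \nabla_{p'}\varphi(q',p')\Bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&- \frac{\gamma h}{2m}\int_{\mathbb R^{2d}}\nabla F(p')\cdot\nabla_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'\,dp' + \gamma\beta^{-1}\frac{h^2}{6m^2}\int_{\mathbb R^{2d}}\Delta_{q'}\varphi(q',p')\,\rho_k^h(q',p')\,dq'\,dp'.
\end{aligned}$$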


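We estimate this error as follows, using the notation of the proof of Lemma 2.3.1:

$$\begin{aligned}
|\omega_k^h| &\le C\int_{\mathbb R^{4d}}\Bigl[h(1+|q'|) + h|p'| + |\sigma_h| + \frac1h|\tau_h| + h(1+|p'|) + h^2\Bigr]\,P_k^{h*}\\
&\le C\int_{\mathbb R^{4d}}\Bigl[h\bigl(1+|q'|^2+|p'|^2\bigr) + h\bigl(C_h^*+N+1\bigr)\Bigr]\,P_k^{h*}\\
&\le Ch\int_{\mathbb R^{4d}}\bigl[C_h^*+N+1\bigr]\,P_k^{h*}\\
&\le Ch\bigl[W_h^*(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\bigr].
\end{aligned}$$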

This concludes the proof of Lemma 2.4.2.

2.4.2 Scheme 2c

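Lemma 2.4.3. Let $h > 0$ and let $\mu_k^h$ and $\rho_k^h$ be the sequences constructed in Scheme 2c. Let $P_k^h(dq\,dp\,dq'\,dp')$ be the optimal plan in the definition of $W_h(\mu_k^h,\rho_k^h)$, the cost functional of Scheme 2c. Then, for all $\varphi\in C_c^\infty(\mathbb R^{2d})$, there holds

$$\begin{aligned}
0 ={}& \frac1h\int_{\mathbb R^{4d}}\Bigl[\bigl(q'-q+\tfrac pm h\bigr)\cdot\nabla_{q'}\varphi(q',p') + \bigl(p'-p-h\nabla_qV(q)\bigr)\cdot\nabla_{p'}\varphi(q',p')\Bigr]\,P_k^h(dq\,dp\,dq'\,dp')\\
&- \frac1m\int_{\mathbb R^{2d}}p\cdot\nabla_q\varphi(q,p)\,\rho_k^h(q,p)\,dq\,dp + \int_{\mathbb R^{2d}}\nabla V(q)\cdot\nabla_p\varphi(q,p)\,\rho_k^h(q,p)\,dq\,dp\\
&+ \gamma\int_{\mathbb R^{2d}}\bigl[\nabla F(p)\cdot\nabla_p\varphi(q,p) - \beta^{-1}\Delta_p\varphi(q,p)\bigr]\rho_k^h(q,p)\,dq\,dp + \zeta_k^h,
\end{aligned}\tag{2.56}$$

where

$$|\zeta_k^h| \le Ch\bigl[hW_h(\mu_k^h,\rho_k^h) + M_2(\mu_k^h) + M_2(\rho_k^h) + 1\bigr].$$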

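Proof. From (2.52) and (2.35) we obtain

$$\begin{aligned}
&\nabla_{q'}C_h^*(q,p;q',p')\cdot\phi(q',p') + \nabla_{p'}C_h^*(q,p;q',p')\cdot\eta(q',p')\\
&\quad= 2\gamma\Bigl[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p') - \frac hm(p'-p)\cdot\nabla_{q'}\varphi(q',p')\Bigr]\\
&\qquad+ \gamma\bigl[4m\bigl(\nabla V(q')-\nabla V(q)\bigr) + r(q,q')\bigr]\cdot\Bigl[-\frac{h^2}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{h}{2m}\nabla_{p'}\varphi(q',p')\Bigr].
\end{aligned}\tag{2.57}$$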


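Substituting (2.52) and (2.57) into the Euler-Lagrange equation (2.51), we obtain

$$\begin{aligned}
0 ={}& \frac1h\int_{\mathbb R^{4d}}\bigl[(q'-q)\cdot\nabla_{q'}\varphi(q',p') + (p'-p)\cdot\nabla_{p'}\varphi(q',p')\bigr]\,P_k^h(dq\,dp\,dq'\,dp')\\
&- \frac1m\int_{\mathbb R^{4d}}(p'-p)\cdot\nabla_{q'}\varphi(q',p')\,P_k^h(dq\,dp\,dq'\,dp')\qquad\qquad (2.58)\\
&+ \int_{\mathbb R^{4d}}\bigl(\nabla V(q')-\nabla V(q)\bigr)\cdot\nabla_{p'}\varphi(q',p')\,P_k^h(dq\,dp\,dq'\,dp')\\
&+ \gamma\int_{\mathbb R^{2d}}\bigl[\nabla F(p)\cdot\nabla_p\varphi(q,p) - \beta^{-1}\Delta_p\varphi(q,p)\bigr]\rho_k^h(q,p)\,dq\,dp + \zeta_k^h,\qquad (2.59)
\end{aligned}$$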

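where we estimate the remainder, again using the notation of the proof of Lemma 2.3.1,

$$\begin{aligned}
|\zeta_k^h| ={}& \biggl|-\frac{h}{3m}\int_{\mathbb R^{4d}}\bigl(\nabla V(q')-\nabla V(q)\bigr)\cdot\nabla_{q'}\varphi(q',p')\,P_k^h(dq\,dp\,dq'\,dp')\\
&\quad+ \frac12\int_{\mathbb R^{4d}}r(q,q')\cdot\Bigl[-\frac{h}{6m^2}\nabla_{q'}\varphi(q',p') + \frac{1}{2m}\nabla_{p'}\varphi(q',p')\Bigr]\,P_k^h(dq\,dp\,dq'\,dp')\\
&\quad- \frac{\gamma h}{2m}\int_{\mathbb R^{2d}}\rho_k^h(q,p)\,\nabla F(p)\cdot\nabla_q\varphi(q,p)\,dq\,dp + \beta^{-1}\frac{\gamma h^2}{6m^2}\int_{\mathbb R^{2d}}\rho_k^h(q,p)\,\Delta_q\varphi(q,p)\,dq\,dp\biggr|\\
\overset{(2.21),(2.36)}{\le}{}& C\int_{\mathbb R^{4d}}\bigl[h|q'-q| + h^2\bigl(C_h^*+N\bigr) + h(1+|p'|) + h^2\bigr]\,P_k^h(dq\,dp\,dq'\,dp')\\
\le{}& C\int_{\mathbb R^{4d}}\bigl[h\bigl(|q|^2+|q'|^2\bigr) + h^2\bigl(C_h^*+N\bigr) + h\bigl(1+|p'|^2\bigr)\bigr]\,P_k^h(dq\,dp\,dq'\,dp')\\
\le{}& Ch\bigl[hW_h(\mu_k^h,\rho_k^h) + M_2(\mu_k^h) + M_2(\rho_k^h) + 1\bigr].
\end{aligned}$$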

This concludes the proof of Lemma 2.4.3.

2.5 A priori estimate: Boundedness of the second moment and entropy

This section includes some technical lemmas that are needed in order to prove the convergence result of Section 2.6.

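Lemma 2.5.1. Let $\{\rho_k^h\}_{k\ge1}$ be the sequence of the minimizers of Scheme 2a or Scheme 2b for fixed $h > 0$. Then for any positive integer $n$ and sufficiently small $h$, we have

$$\sum_{k=1}^n W_h^*(\rho_{k-1}^h,\rho_k^h) \le 2\gamma h\bigl(\mathcal A(\rho_0)-\mathcal A(\rho_n^h)\bigr) + Ch^2\sum_{k=0}^n M_2(\rho_k^h) + Cnh^2, \tag{2.60}$$

for some constant $C > 0$ independent of $n$, where $W_h^*$ is the optimal-transport cost functional of the respective scheme. Similarly, if $\mu_k^h$ and $\rho_k^h$ are the sequences constructed in Scheme 2c, then

$$\sum_{k=1}^n W_h(\mu_k^h,\rho_k^h) \le 2\gamma h\bigl(\mathcal A(\rho_0)-\mathcal A(\rho_n^h)\bigr) + Ch^2\sum_{k=0}^n M_2(\rho_k^h) + Cnh^2.$$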


Proof. We give the details for Scheme 2a and then comment on the differences for the other schemes. We first define the operator $s_h\colon\mathbb R^{2d}\to\mathbb R^{2d}$ as the solution operator over time $h$ for the Hamiltonian system

$$Q' = \frac Pm,\qquad P' = -\nabla V(Q), \tag{2.61}$$

that is, $s_h(q,p)$ is the solution at time $h$ given the initial datum $(q,p)$ at time zero. The operator $s_h$ is bijective and volume-preserving.
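The two properties of $s_h$ used here can be illustrated in a case where the flow of (2.61) is known in closed form. The sketch below assumes the quadratic potential $V(q) = kq^2/2$ in dimension $d = 1$ (an illustrative choice, not the general $V$ of the chapter), for which $s_h$ is a linear map; the helper names are hypothetical.

```python
import math


def harmonic_flow_matrix(h, m=1.0, k=1.0):
    """Exact time-h flow of Q' = P/m, P' = -k*Q as a 2x2 matrix acting on (q, p).

    Assumes the quadratic potential V(q) = k*q**2/2 (illustrative choice).
    """
    w = math.sqrt(k / m)
    c, s = math.cos(w * h), math.sin(w * h)
    return [[c, s / (m * w)], [-m * w * s, c]]


def det2(A):
    """Determinant of a 2x2 matrix; equals the Jacobian determinant of the flow."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def apply(A, q, p):
    """Apply the linear flow map A to the phase-space point (q, p)."""
    return A[0][0] * q + A[0][1] * p, A[1][0] * q + A[1][1] * p
```

The determinant equals $\cos^2 + \sin^2 = 1$ (volume preservation), and the flow over time $-h$ inverts the flow over time $h$ (bijectivity).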

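For any fixed $k\ge1$, $\rho_k^h$ minimizes the functional $(2h\gamma)^{-1}W_h(\rho_{k-1}^h,\rho) + \mathcal A(\rho)$ over $\rho\in\mathcal P_2(\mathbb R^{2d})$, i.e.,

$$W_h(\rho_{k-1}^h,\rho_k^h) + 2h\gamma\mathcal A(\rho_k^h) \le W_h(\rho_{k-1}^h,\rho) + 2h\gamma\mathcal A(\rho), \tag{2.62}$$

for every $\rho\in\mathcal P_2(\mathbb R^{2d})$. In particular, by taking $\rho = (s_h^{-1})_\#\rho_{k-1}^h =: \rho_*^h$, for which $W_h(\rho_{k-1}^h,\rho_*^h) = 0$, it follows that

$$W_h(\rho_{k-1}^h,\rho_k^h) \le 2\gamma h\bigl[\mathcal A(\rho_*^h)-\mathcal A(\rho_k^h)\bigr] = 2\gamma h\bigl[\mathcal F(\rho_*^h)-\mathcal F(\rho_k^h)\bigr] + 2\gamma h\bigl[\mathcal S(\rho_*^h)-\mathcal S(\rho_k^h)\bigr]. \tag{2.63}$$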

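We now estimate each term on the right-hand side. Write $(q,p) = s_h(\bar q,\bar p)$. Using equation (2.61), we readily estimate that the solution $(Q(t),P(t))$ starting at $(\bar q,\bar p)$ and ending at $(q,p)$ satisfies $\|Q\|_\infty \le C(|q|+h|p|)$, and therefore

$$\biggl|\int_0^h\nabla V(Q(t))\,dt\biggr| \le h\sup_{t\in[0,h]}|\nabla V(Q(t))| \le Ch\|Q\|_\infty \le Ch\,(|q|+h|p|),$$

so that

$$\begin{aligned}
F(\bar p) = F\Bigl(p+\int_0^h\nabla V(Q(t))\,dt\Bigr) &\overset{(2.20),(2.21c)}{\le} F(p) + C(|p|+1)\biggl|\int_0^h\nabla V(Q(t))\,dt\biggr| + C\biggl|\int_0^h\nabla V(Q(t))\,dt\biggr|^2\\
&\le F(p) + Ch(|p|+1)(|q|+h|p|) + Ch^2(|q|+h|p|)^2\\
&\le F(p) + Ch\bigl[N(q,p)+1\bigr].
\end{aligned}$$

Therefore

$$\begin{aligned}
\mathcal F(\rho_*^h) &= \int_{\mathbb R^{2d}}F(p)\,\rho_*^h(q,p)\,dq\,dp = \int_{\mathbb R^{2d}}F(\bar p)\,\rho_{k-1}^h(q,p)\,dq\,dp\\
&\le \int_{\mathbb R^{2d}}\bigl(F(p)+ChN(q,p)+Ch\bigr)\rho_{k-1}^h(q,p)\,dq\,dp \le \mathcal F(\rho_{k-1}^h) + ChM_2(\rho_{k-1}^h) + Ch.
\end{aligned}\tag{2.64}$$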

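For the entropy term, we have, since $s_h$ is volume-preserving and bijective,

$$\begin{aligned}
\mathcal S(\rho_*^h) &= \beta^{-1}\int_{\mathbb R^{2d}}\rho_*^h(q,p)\log\rho_*^h(q,p)\,dq\,dp\\
&= \beta^{-1}\int_{\mathbb R^{2d}}\rho_{k-1}^h(s_h(q,p))\log\rho_{k-1}^h(s_h(q,p))\,dq\,dp = \mathcal S(\rho_{k-1}^h).
\end{aligned}\tag{2.65}$$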


From (2.63), (2.64), and (2.65), we obtain

$$W_h(\rho_{k-1}^h,\rho_k^h) \le 2\gamma h\bigl(\mathcal A(\rho_{k-1}^h)-\mathcal A(\rho_k^h)\bigr) + Ch^2M_2(\rho_{k-1}^h) + Ch^2.$$

Summing over $k = 1$ to $n$ we obtain (2.60).

For Scheme 2b, equation (2.61) changes only slightly, in that the acceleration becomes constant:

$$Q' = \frac Pm,\qquad P' = -\nabla V(q).$$

Similar estimates lead to the same result.

For Scheme 2c, the proof is again similar, by taking $\rho_*^h := \mu_k^h$ and estimating the difference $\mathcal A(\mu_k^h)-\mathcal A(\rho_{k-1}^h)$ as is done above.

Lemma 2.5.2. There exist positive constants $T_0$, $h_0$, and $C$, independent of the initial data, such that for any $0 < h \le h_0$, the solutions $\{\rho_k^h\}_{k\ge1}$ for Scheme 2a, Scheme 2b, or Scheme 2c satisfy

$$M_2(\rho_k^h) \le C\bigl[M_2(\rho_0)+1\bigr]\quad\text{and}\quad |\mathcal S(\rho_k^h)| \le C\bigl[\mathcal S(\rho_0)+M_2(\rho_0)+1\bigr]\quad\text{for any } k\le K_0, \tag{2.66}$$

where $K_0 = \lceil T_0/h\rceil$.

Proof. We detail the proof for Scheme 2a; the modifications for Schemes 2b and 2c are very minor.

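For a fixed $i$, let $P_i^h\in\Gamma(\rho_{i-1}^h,\rho_i^h)$ be the optimal plan in the definition of $W_h(\rho_{i-1}^h,\rho_i^h)$. We have

$$\begin{aligned}
\biggl(\int_{\mathbb R^{2d}}|p|^2\rho_i^h(q,p)\,dq\,dp\biggr)^{1/2} &= \biggl(\int_{\mathbb R^{4d}}|p'|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2}\\
&\le \biggl(\int_{\mathbb R^{4d}}|p'-p|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2} + \biggl(\int_{\mathbb R^{4d}}|p|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2}.
\end{aligned}$$

By (2.31c), we estimate

$$\biggl(\int_{\mathbb R^{4d}}|p'-p|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2} \le CW_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\bigl[M_2(\rho_i^h)^{1/2} + M_2(\rho_{i-1}^h)^{1/2}\bigr],$$

and hence,

$$\biggl(\int_{\mathbb R^{2d}}|p|^2\rho_i^h(q,p)\,dq\,dp\biggr)^{1/2} \le \biggl(\int_{\mathbb R^{2d}}|p|^2\rho_{i-1}^h(q,p)\,dq\,dp\biggr)^{1/2} + CW_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\bigl[M_2(\rho_i^h)^{1/2} + M_2(\rho_{i-1}^h)^{1/2}\bigr].$$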

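Summing over $i$ from 1 to $k$ we obtain

$$\begin{aligned}
\biggl(\int_{\mathbb R^{2d}}|p|^2\rho_k^h(q,p)\,dq\,dp\biggr)^{1/2} &\le C\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\sum_{i=1}^k M_2(\rho_{i-1}^h)^{1/2} + \biggl(\int_{\mathbb R^{2d}}|p|^2\rho_0(q,p)\,dq\,dp\biggr)^{1/2}\\
&\le C\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\sum_{i=1}^k M_2(\rho_i^h)^{1/2} + CM_2(\rho_0)^{1/2}.
\end{aligned}$$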


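Therefore

$$\begin{aligned}
\int_{\mathbb R^{2d}}|p|^2\rho_k^h(q,p)\,dq\,dp &\le C\biggl(\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h)^{1/2}\biggr)^2 + Ch^2\biggl(\sum_{i=1}^k M_2(\rho_i^h)^{1/2}\biggr)^2 + CM_2(\rho_0)\\
&\le Ck\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0).
\end{aligned}\tag{2.67}$$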

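Similarly, we use (2.47) and the fact that

$$q' = \frac{h}{2m\sqrt3}\,2\sqrt3\Bigl(\frac mh(q'-q) - \frac{p+p'}{2}\Bigr) + \frac{h}{2m}(p'+p) + q$$

to derive that

$$\begin{aligned}
\biggl(\int_{\mathbb R^{2d}}|q|^2\rho_i^h(q,p)\,dq\,dp\biggr)^{1/2} ={}& \biggl(\int_{\mathbb R^{4d}}|q'|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2}\\
\le{}& \frac{h}{2m\sqrt3}\biggl(\int_{\mathbb R^{4d}}12\Bigl|\frac mh(q'-q) - \frac{p'+p}{2}\Bigr|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2} + \frac{h}{2m}\biggl(\int_{\mathbb R^{4d}}|p'|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2}\\
&+ \frac{h}{2m}\biggl(\int_{\mathbb R^{4d}}|p|^2\,P_i^h(dq\,dp\,dq'\,dp')\biggr)^{1/2} + \biggl(\int_{\mathbb R^{2d}}|q|^2\rho_{i-1}^h(q,p)\,dq\,dp\biggr)^{1/2}\\
\le{}& ChW_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\bigl[M_2(\rho_{i-1}^h)^{1/2} + M_2(\rho_i^h)^{1/2}\bigr] + \biggl(\int_{\mathbb R^{2d}}|q|^2\rho_{i-1}^h(q,p)\,dq\,dp\biggr)^{1/2}.
\end{aligned}$$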

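Summing over $i$ from 1 to $k$, we obtain

$$\biggl(\int_{\mathbb R^{2d}}|q|^2\rho_k^h(q,p)\,dq\,dp\biggr)^{1/2} \le Ch\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h)^{1/2} + Ch\sum_{i=1}^k M_2(\rho_i^h)^{1/2} + CM_2(\rho_0)^{1/2}$$

and therefore,

$$\int_{\mathbb R^{2d}}\gamma^2|q|^2\rho_k^h(q,p)\,dq\,dp \le Ckh^2\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0). \tag{2.68}$$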

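From (2.67) and (2.68) it holds that

$$M_2(\rho_k^h) = \int_{\mathbb R^{2d}}\bigl(|\gamma q|^2+|p|^2\bigr)\rho_k^h(q,p)\,dq\,dp \le Ck\sum_{i=1}^k W_h(\rho_{i-1}^h,\rho_i^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0).$$

Applying Lemma 2.5.1 with $n = k$, it follows that

$$\begin{aligned}
M_2(\rho_k^h) &\le Ck\Bigl[h\bigl(\mathcal A(\rho_0)-\mathcal A(\rho_k^h)\bigr) + Ch^2\sum_{i=0}^k M_2(\rho_i^h) + Ckh^2\Bigr] + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0)\\
&\le -Ckh\,\mathcal S(\rho_k^h) + Ckh^2\sum_{i=1}^k M_2(\rho_i^h) + CM_2(\rho_0) + Ckh\,\mathcal A(\rho_0) + Ck^2h^2.
\end{aligned}\tag{2.69}$$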


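By inequality (29) in [JKO98], $\mathcal S(\rho_k^h)$ is bounded from below in terms of $M_2(\rho_k^h)$,

$$\mathcal S(\rho_k^h) \ge -C - CM_2(\rho_k^h). \tag{2.70}$$

Substituting (2.70) into (2.69) we have

$$M_2(\rho_k^h) \le C_1^2kh^2\sum_{i=1}^k M_2(\rho_i^h) + C_1khM_2(\rho_k^h) + C_1(k^2h^2+1) + C_1M_2(\rho_0), \tag{2.71}$$

where we fix the constant $C_1$, and use it to set the time horizon $T_0$:

$$T_0 = \frac{1}{4C_1},\qquad K_0 = \Bigl\lceil\frac{T_0}{h}\Bigr\rceil. \tag{2.72}$$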

We emphasize that $C_1$, and hence $T_0$, is independent of the initial data. We now choose $h_0\le T_0$ so small that for all $h\le h_0$ we have $K_0h\le 2T_0$ and $C_1K_0h\le\frac12$. Then it follows from (2.71) that, for any $h\le h_0$, $k\le K_0$,

$$\frac34M_2(\rho_k^h) \le C_1^2kh^2\sum_{i=1}^k M_2(\rho_i^h) + C_1(4T_0^2+1) + C_1M_2(\rho_0). \tag{2.73}$$

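Hence

$$\begin{aligned}
\frac34\sum_{i=1}^{K_0}M_2(\rho_i^h) &\le C_1^2K_0^2h^2\sum_{i=1}^{K_0}M_2(\rho_i^h) + K_0(T_0+C_1) + C_1M_2(\rho_0)\\
&\le 4C_1^2T_0^2\sum_{i=1}^{K_0}M_2(\rho_i^h) + K_0(T_0+C_1) + C_1M_2(\rho_0)\\
&\le \frac14\sum_{i=1}^{K_0}M_2(\rho_i^h) + K_0(T_0+C_1) + C_1M_2(\rho_0).
\end{aligned}\tag{2.74}$$

Consequently,

$$\sum_{i=1}^{K_0}M_2(\rho_i^h) \le 2K_0(T_0+C_1) + 2C_1M_2(\rho_0). \tag{2.75}$$

Substituting (2.75) into (2.73), we obtain

$$M_2(\rho_k^h) \le \frac23(2+K_0)(T_0+C_1) + C_1M_2(\rho_0). \tag{2.76}$$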

This finishes the proof of the boundedness of $M_2(\rho_k^h)$.

We now show that the entropy $\mathcal S(\rho_k^h)$ is also bounded. From (2.70) and (2.76), it follows that $\mathcal S(\rho_k^h)$ is bounded from below. It remains to find an upper bound. Applying Lemma 2.5.1 for $n = k$, and noting that $\mathcal F(\rho_k^h)\ge0$ and $W_h(\rho_{i-1}^h,\rho_i^h)\ge0$ for all $i$, we have

$$\mathcal S(\rho_k^h) \le \mathcal A(\rho_0) + Ch\sum_{i=0}^k M_2(\rho_i^h) + Ckh \le Ch\sum_{i=1}^k M_2(\rho_i^h) + C\bigl[\mathcal S(\rho_0)+M_2(\rho_0)\bigr] + 2CT_0. \tag{2.77}$$

By combining with (2.75) we obtain the upper bound for the entropy. This completes the proof of the lemma.

The following lemma extends Lemma 2.5.2 to any $T > 0$. The proof is the same as that of Lemma 5.3 in [Hua00], and we omit it.

Lemma 2.5.3. Let $\{\rho_k^h\}_{k\ge1}$ be the sequence of the minimizers of Scheme 2a or Scheme 2b for fixed $h > 0$. For any $T > 0$, there exists a constant $C > 0$ depending on $T$ and on the initial data such that

$$M_2(\rho_k^h) \le C, \tag{2.78}$$

$$\sum_{i=1}^k W_h^*(\rho_{i-1}^h,\rho_i^h) \le Ch, \tag{2.79}$$

$$\int_{\mathbb R^{2d}}\max\{\rho_k^h\log\rho_k^h,\,0\}\,dq\,dp \le C, \tag{2.80}$$

for any $h\le h_0$ and $k\le K_h$, where

$$K_h = \Bigl\lceil\frac Th\Bigr\rceil.$$

For Scheme 2c the same inequalities hold, with (2.79) replaced by

$$\sum_{i=1}^k W_h(\mu_i^h,\rho_i^h) \le Ch.$$

2.6 Proof of Theorem 2.2.3

In this section we bring all the parts together to prove Theorem 2.2.3. The structure of this proof is the same as that of e.g. [JKO98, Hua00], and we refer to those references for the parts that are very similar. The main difference lies in the convergence of the discrete Euler-Lagrange equations for each of the cases to the weak formulation of the Kramers equation as $h\to0$.

Throughout we fix $T > 0$ and for each $h > 0$ we set

$$K_h := \lceil T/h\rceil.$$

The proof of the space-time weak compactness (2.28) is the same for the three schemes. Let $(\rho_k^h)_k$ be the sequence of minimizers constructed by any of the three schemes, and let $t\mapsto\rho^h(t)$ be the piecewise-constant interpolation (2.27). By Lemma 2.5.3 we have

$$M_2(\rho^h(t)) + \int_{\mathbb R^{2d}}\max\{\rho^h(t)\log\rho^h(t),\,0\}\,dq\,dp \le C\quad\text{for all } 0\le t\le T. \tag{2.81}$$

Since the function $z\mapsto\max\{z\log z,\,0\}$ has super-linear growth, (2.81) guarantees that there exists a subsequence, denoted again by $\rho^h$, and a function $\rho\in L^1((0,T)\times\mathbb R^{2d})$ such that

$$\rho^h\to\rho\quad\text{weakly in } L^1((0,T)\times\mathbb R^{2d}). \tag{2.82}$$

This proves (2.28).

The proof of the stronger convergence (2.29) and of the continuity (2.30) at $t = 0$ follows the same lines as in [JKO98, Hua00]. The main estimate is the 'equi-near-continuity' estimate

$$d\bigl(\rho^h(t_1),\rho^h(t_2)\bigr)^2 \le C(|t_2-t_1|+h),$$

where $d(\rho_0,\rho_1)$ is the metric generated by the quadratic cost $|q-q'|^2+|p-p'|^2$. This estimate follows from the inequality (see (2.31))

$$|q-q'|^2+|p-p'|^2 \le C\bigl[C_h^*(q,p;q',p') + h^2N(q,p) + h^2N(q',p')\bigr],$$

and the estimates (2.81) and (2.79); see [Hua00, Theorem 5.2].

The only remaining statement of Theorem 2.2.3 is the characterization of the limit in terms of the solution of the Kramers equation, and we now describe this.

Let $\rho^h$ be generated by one of the three schemes. We now prove that the limit $\rho$ satisfies the weak version of the Kramers equation (2.26). Fix $T > 0$ and $\varphi\in C_c^\infty((-\infty,T)\times\mathbb R^{2d})$; all constants $C$ below depend on the parameters of the problem, on the initial datum $\rho_0$, and on $\varphi$, but are independent of $k$ and of $h$. We first discuss Schemes 2a and 2b.

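Let $P_k^{h*}\in\Gamma(\rho_{k-1}^h,\rho_k^h)$ be the optimal plan for $W_h^*(\rho_{k-1}^h,\rho_k^h)$, where the star indicates the quantities associated with either Scheme 2a or Scheme 2b. For any $0 < t < T$, we have

$$\begin{aligned}
\int_{\mathbb R^{2d}}\bigl[\rho_k^h(q,p)-\rho_{k-1}^h(q,p)\bigr]\varphi(t,q,p)\,dq\,dp
&= \int_{\mathbb R^{2d}}\rho_k^h(q',p')\varphi(t,q',p')\,dq'\,dp' - \int_{\mathbb R^{2d}}\rho_{k-1}^h(q,p)\varphi(t,q,p)\,dq\,dp\\
&= \int_{\mathbb R^{4d}}\bigl[\varphi(t,q',p')-\varphi(t,q,p)\bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&= \int_{\mathbb R^{4d}}\bigl[(q'-q)\cdot\nabla_{q'}\varphi(t,q',p') + (p'-p)\cdot\nabla_{p'}\varphi(t,q',p')\bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp') + \varepsilon_k,
\end{aligned}\tag{2.83}$$

where

$$\begin{aligned}
|\varepsilon_k| &\le C\int_{\mathbb R^{4d}}\bigl[|q'-q|^2+|p'-p|^2\bigr]\,P_k^{h*}(dq\,dp\,dq'\,dp')\\
&\overset{(2.31)}{\le} CW_h^*(\rho_{k-1}^h,\rho_k^h) + Ch^2\bigl[M_2(\rho_{k-1}^h)+M_2(\rho_k^h)\bigr]\\
&\overset{(2.81)}{\le} CW_h^*(\rho_{k-1}^h,\rho_k^h) + Ch^2.
\end{aligned}\tag{2.84}$$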


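By combining (2.83) with (2.53) we find

$$\begin{aligned}
&\int_{\mathbb R^{2d}}\Bigl(\frac{\rho_k^h(q,p)-\rho_{k-1}^h(q,p)}{h}\Bigr)\varphi(t,q,p)\,dq\,dp\\
&\quad= \int_{\mathbb R^{2d}}\Bigl[\frac pm\cdot\nabla_q\varphi(t,q,p) - \bigl(\nabla V(q)+\gamma\nabla F(p)\bigr)\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Bigr]\rho_k^h(q,p)\,dq\,dp + \theta_k(t),
\end{aligned}\tag{2.85}$$

where

$$|\theta_k(t)| \le \frac{|\varepsilon_k|}{h} + Ch\bigl[W_h^*(\rho_{k-1}^h,\rho_k^h) + M_2(\rho_{k-1}^h) + M_2(\rho_k^h) + 1\bigr] \overset{(2.81),(2.84)}{\le} \frac Ch W_h^*(\rho_{k-1}^h,\rho_k^h) + Ch. \tag{2.86}$$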

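Note that $\theta_k$ depends on $t$ through the $t$-dependence of $\varphi$. Next, from (2.85), for $k\ge1$ we have

$$\begin{aligned}
&\int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}}\Bigl(\frac{\rho_k^h(q,p)-\rho_{k-1}^h(q,p)}{h}\Bigr)\varphi(t,q,p)\,dq\,dp\,dt\\
&\quad= \int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}}\Bigl[\frac pm\cdot\nabla_q\varphi(t,q,p) - \bigl(\nabla V(q)+\gamma\nabla F(p)\bigr)\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Bigr]\rho_k^h(q,p)\,dq\,dp\,dt + \int_{(k-1)h}^{kh}\theta_k(t)\,dt\\
&\quad= \int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}}\Bigl[\frac pm\cdot\nabla_q\varphi(t,q,p) - \bigl(\nabla V(q)+\gamma\nabla F(p)\bigr)\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Bigr]\rho^h(t,q,p)\,dq\,dp\,dt + \int_{(k-1)h}^{kh}\theta_k(t)\,dt.
\end{aligned}$$

Summing from $k = 1$ to $K_h$ we obtain

$$\begin{aligned}
&\sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}\int_{\mathbb R^{2d}}\Bigl(\frac{\rho_k^h(q,p)-\rho_{k-1}^h(q,p)}{h}\Bigr)\varphi(t,q,p)\,dq\,dp\,dt\\
&\quad= \int_0^T\int_{\mathbb R^{2d}}\Bigl[\frac pm\cdot\nabla_q\varphi(t,q,p) - \bigl(\nabla V(q)+\gamma\nabla F(p)\bigr)\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Bigr]\rho^h(t,q,p)\,dq\,dp\,dt + R_h,
\end{aligned}\tag{2.87}$$

where

$$R_h = \sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}\theta_k(t)\,dt. \tag{2.88}$$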


By a discrete integration by parts, we can rewrite the left-hand side of (2.87) as

$$-\int_0^h\int_{\mathbb R^{2d}}\rho_0(q,p)\,\frac{\varphi(t,q,p)}{h}\,dq\,dp\,dt + \int_0^T\int_{\mathbb R^{2d}}\rho^h(t,q,p)\Bigl(\frac{\varphi(t,q,p)-\varphi(t+h,q,p)}{h}\Bigr)\,dq\,dp\,dt. \tag{2.89}$$
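The discrete integration by parts behind (2.89) is the summation-by-parts identity $\sum_{k=1}^K (a_k-a_{k-1})b_k = -a_0b_1 + \sum_{k=1}^K a_k(b_k-b_{k+1})$ with $b_{K+1} = 0$. In the present setting $a_k$ plays the role of $\rho_k^h$ and $b_k$ that of the time average of $\varphi$ over $((k-1)h, kh)$; the vanishing of $b_{K+1}$ reflects that $\varphi$ is supported in $(-\infty,T)$. The following sketch, with illustrative helper names `lhs` and `rhs`, checks the identity on scalar sequences.

```python
def lhs(a, b):
    """sum_{k=1}^{K} (a[k] - a[k-1]) * b[k], with a = [a_0..a_K] and b = [unused, b_1..b_K]."""
    K = len(a) - 1
    return sum((a[k] - a[k - 1]) * b[k] for k in range(1, K + 1))


def rhs(a, b):
    """-a[0]*b[1] + sum_{k=1}^{K} a[k] * (b[k] - b[k+1]), with b[K+1] := 0."""
    K = len(a) - 1
    b_ext = list(b) + [0.0]  # append b_{K+1} = 0
    return -a[0] * b[1] + sum(a[k] * (b_ext[k] - b_ext[k + 1]) for k in range(1, K + 1))


if __name__ == "__main__":
    a = [1.0, 2.5, -0.5, 4.0]
    b = [0.0, 3.0, 1.0, 2.0]
    print(lhs(a, b), rhs(a, b))
```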

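From (2.87) and (2.89) we obtain

$$\begin{aligned}
&\int_0^T\int_{\mathbb R^{2d}}\rho^h(t,q,p)\Bigl(\frac{\varphi(t,q,p)-\varphi(t+h,q,p)}{h}\Bigr)\,dq\,dp\,dt\\
&\quad= \int_0^T\int_{\mathbb R^{2d}}\Bigl[\frac pm\cdot\nabla_q\varphi(t,q,p) - \bigl(\nabla V(q)+\gamma\nabla F(p)\bigr)\cdot\nabla_p\varphi(t,q,p) + \gamma\beta^{-1}\Delta_p\varphi(t,q,p)\Bigr]\rho^h(t,q,p)\,dq\,dp\,dt\qquad (2.90)\\
&\qquad+ \int_0^h\int_{\mathbb R^{2d}}\rho_0(q,p)\,\frac{\varphi(t,q,p)}{h}\,dq\,dp\,dt + R_h.\qquad (2.91)
\end{aligned}$$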

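Now $R_h\to0$ as $h\to0$, since

$$\begin{aligned}
|R_h| &\overset{(2.88)}{\le} \sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}|\theta_k(t)|\,dt \overset{(2.86)}{\le} C\sum_{k=1}^{K_h}\int_{(k-1)h}^{kh}\Bigl(\frac1hW_h^*(\rho_{k-1}^h,\rho_k^h) + h\Bigr)\,dt\\
&= C\sum_{k=1}^{K_h}\bigl[W_h^*(\rho_{k-1}^h,\rho_k^h) + Ch^2\bigr] \overset{(2.79)}{\le} Ch.
\end{aligned}$$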

Taking the limit h→ 0 in (2.91) yields equation (2.26).

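For Scheme 2c, only (2.83) is different:

$$\begin{aligned}
\int_{\mathbb R^{2d}}\bigl[\rho_k^h(q,p)-\rho_{k-1}^h(q,p)\bigr]\varphi(t,q,p)\,dq\,dp
&= \int_{\mathbb R^{2d}}\rho_k^h(q',p')\varphi(t,q',p')\,dq'\,dp' - \int_{\mathbb R^{2d}}\rho_{k-1}^h(q,p)\varphi(t,q,p)\,dq\,dp\\
&= \int_{\mathbb R^{2d}}\rho_k^h(q',p')\varphi(t,q',p')\,dq'\,dp' - \int_{\mathbb R^{2d}}\mu_k^h(q,p)\varphi(t,\sigma_h(q,p))\,dq\,dp\\
&= \int_{\mathbb R^{4d}}\Bigl[\varphi(t,q',p') - \varphi\bigl(t,\,q-\tfrac pm h,\,p+\nabla V(q)h\bigr)\Bigr]\,P_k^h(dq\,dp\,dq'\,dp')\\
&= \int_{\mathbb R^{4d}}\Bigl[\bigl(q'-q+\tfrac pm h\bigr)\cdot\nabla_{q'}\varphi(t,q',p') + \bigl(p'-p-\nabla V(q)h\bigr)\cdot\nabla_{p'}\varphi(t,q',p')\Bigr]\,P_k^h(dq\,dp\,dq'\,dp') + \varepsilon_k,
\end{aligned}$$

where

$$|\varepsilon_k| \le C\int_{\mathbb R^{4d}}\Bigl(\gamma^2\bigl|q'-q+\tfrac pm h\bigr|^2 + |p'-p-\nabla V(q)h|^2\Bigr)\,P_k^h(dq\,dp\,dq'\,dp')$$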


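with the constant $C$ depending only on $\varphi$. Since $|p'-p|^2,\ |q'-q|^2 \le C\,C_h^*(q,p;q',p')$ and $|\nabla V(q)|^2\le C|q|^2$,

$$\begin{aligned}
\gamma^2\bigl|q'-q+\tfrac pm h\bigr|^2 + |p'-p-h\nabla V(q)|^2 &\le 2\Bigl(\gamma^2|q-q'|^2 + \frac{\gamma^2h^2}{m^2}|p|^2 + |p-p'|^2 + h^2|\nabla V(q)|^2\Bigr)\\
&\le C\,C_h^*(q,p;q',p') + Ch^2N(q,p).
\end{aligned}$$

Therefore

$$\begin{aligned}
|\varepsilon_k| &\le C\int_{\mathbb R^{4d}}\bigl[C_h^*(q,p;q',p') + h^2N(q,p) + h^2\bigr]\,P_k^h(dq\,dp\,dq'\,dp')\\
&= CW_h(\mu_k^h,\rho_k^h) + CM_2(\mu_k^h)h^2 + Ch^2\\
&\le CW_h(\mu_k^h,\rho_k^h) + Ch^2.
\end{aligned}$$

The rest of the proof is the same.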

The rest of the proof is the same.

2.7 Conclusion and discussion

Relation to GENERIC. The main theorem of this chapter, Theorem 2.2.3, states that the three new Schemes 2a-c are indeed approximation schemes for the Kramers equation (2.1): the discrete-time approximate solutions constructed using each of these three schemes converge, as $h\to0$, to the unique solution of (2.1).

This statement itself is a relatively uninteresting assertion: it states that the schemes are what we claim them to be, approximation schemes. The interest of this chapter lies in the fact that these three schemes suggest a way towards a generalization of the theory of metric-space gradient flows, as developed in [AGS08], to equations like (2.1) that combine dissipative with conservative effects.

Indeed, the full class of equations and systems that combine dissipative and conservative effects is extremely large. It contains the Navier-Stokes-Fourier equations (which include heat generation and transport), systems modelling visco-elasto-plastic materials, relativistic hydrodynamics, many Boltzmann-type equations, and many other equations describing continuum-mechanical systems. In fact, the full class of systems covered by the GENERIC formalism [Ott05] is of this conservative-dissipative type, and indeed when $F(p) = |p|^2/2m$, the Kramers equation (2.1) can be cast in this form. Because of this, the results of this chapter strongly suggest that similar schemes can be constructed for arbitrary GENERIC systems. We discuss this in Chapter 4.

Value of the three schemes. Scheme 2a is in our opinion interesting because 'it is the right thing to do': it stays as close as possible to the underlying physics. However, its non-explicit nature makes it difficult to work with, as the calculations in the proof of Lemma 2.3.1 illustrate. Scheme 2b is therefore useful as an approximation of Scheme 2a. Scheme 2c has the advantage of being formulated in terms of a metric $W_h$, which suggests applicability of metric-space theory.


Whichever scheme is chosen, the split between conservative and dissipative effects may lead to operator-splitting numerical methods that reflect the same division between conservative and dissipative effects [HKLR10]. Since conservative effects are often best treated by explicit or symplectic integration, while dissipative effects are better discretized using implicit schemes, this split allows for better tailoring of the method to the two steps.
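As a rough illustration of such a splitting, the sketch below treats only the deterministic part of the dynamics (the noise is omitted) for the assumed model $V(q) = kq^2/2$ and $F(p) = p^2/2m$ in one dimension. The conservative substep uses symplectic Euler and the dissipative substep uses implicit Euler; this is one possible choice of integrators, not a method taken from the thesis or from [HKLR10], and the function names are hypothetical.

```python
def split_step(q, p, h, m=1.0, k=1.0, gamma=1.0):
    """One deterministic splitting step (noise omitted) for the assumed model
    V(q) = k*q**2/2 and F(p) = p**2/(2*m)."""
    # Conservative substep: symplectic Euler for Q' = P/m, P' = -V'(Q).
    p = p - h * k * q
    q = q + h * p / m
    # Dissipative substep: implicit Euler for P' = -gamma*F'(P) = -gamma*P/m,
    # i.e. solve p_new = p - h*gamma*p_new/m for p_new.
    p = p / (1.0 + h * gamma / m)
    return q, p


def energy(q, p, m=1.0, k=1.0):
    """Hamiltonian p^2/(2m) + V(q) for the assumed quadratic potential."""
    return p * p / (2 * m) + k * q * q / 2


if __name__ == "__main__":
    q, p = 1.0, 0.0
    for _ in range(2000):
        q, p = split_step(q, p, h=0.01)
    print(energy(q, p))  # far below the initial energy 0.5
```

The implicit friction substep is unconditionally contractive in $p$, which is the kind of tailoring the paragraph above alludes to.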

The potential force $\nabla V$. Throughout this chapter, $V$ appears only through its gradient. The results of the chapter should hold true if $\nabla V$ is replaced by a generic field $B\colon\mathbb R^d\to\mathbb R^d$. However, in this case the Hamiltonian structure is lost.

The linear-friction case $F(p) = |p|^2/2m$. The coefficient $\gamma kT$ in (2.1) and the coefficient $\sigma := \sqrt{2\gamma kT}$ in (2.2b) are obviously related by $\sigma^2 = 2\gamma kT$. When $F(p) = |p|^2/2m$, the coefficient $\gamma$ is also the coefficient of linear friction, and this relationship between $\sigma$, $\gamma$, and temperature is the one given by the fluctuation-dissipation theorem. This guarantees that the Boltzmann distribution

$$\rho_\infty(q,p) = Z^{-1}\exp\Bigl(-\frac{1}{kT}H(q,p)\Bigr) \tag{2.92}$$

is the unique stationary solution of (2.1). Moreover, the total free energy $E$ is the relative entropy with respect to $\rho_\infty$, and it is a Lyapunov functional for the system, as is shown in (2.5).

When $F$ does not have this specific form, but does have appropriate growth at infinity, then there still exists a unique stationary solution $\rho_\infty$, which however does not have the convenient characterization (2.92). The relative entropy with respect to $\rho_\infty$ is then again a Lyapunov functional.

Connection to ultra-parabolic equations. If $V$ is linear, $V(q) = c\cdot q$, where $c\in\mathbb R^d$ is a constant vector, then the cost functions of Schemes 2a and 2b coincide. In this case, this common cost function, written $C_h$ below, is closely related to the fundamental solution of the equation

$$\partial_t\rho(t,q,p) = -\frac pm\cdot\nabla_q\rho(t,q,p) + c\cdot\nabla_p\rho(t,q,p) + \frac{\sigma^2}{2}\Delta_p\rho(t,q,p). \tag{2.93}$$

Indeed, the fundamental solution $\Gamma_t(q,p;q',p')$ of (2.93) is given by

$$\Gamma_t(q,p;q',p') = \frac{\alpha_1}{t^{2d}}\exp\Bigl(-\frac{\gamma}{\sigma^2t}\,C_t(q,p;q',p')\Bigr), \tag{2.94}$$

where $\alpha_1$ is a normalization constant depending only on $d$. This fact is true for much more general linear systems and is related to the controllability of the system [DM10]. The appearance of the rate functional from the Freidlin-Wentzell theory in (2.94) consolidates the connection of our approach to the large-deviation principle.

Connection to the isentropic Euler equations. The cost function $C_h$ has been used in [GW09, Wes10] to study the system of isentropic Euler equations,

$$\partial_t\rho + \nabla\cdot(\rho u) = 0,$$

$$\partial_tu + u\cdot\nabla u = -\nabla U'(\rho),$$

where $U\colon[0,\infty)\to\mathbb R$ is an internal energy density. We now formally show the relationship between the two equations. Suppose that $\rho(t,q,p)$ is a solution of the Kramers equation (2.1) with $F(p) = |p|^2/2m$. We define the macroscopic spatial density and the bulk velocity as

$$\rho(t,q) = \int_{\mathbb R^d}\rho(t,q,p)\,dp, \tag{2.95}$$

$$u(t,q) = \frac{1}{\rho(t,q)}\int_{\mathbb R^d}\frac pm\,\rho(t,q,p)\,dp. \tag{2.96}$$

Using the so-called moment method, we find that $(\rho,u)$ satisfies the following damped Euler equations [CSR96, Cha03, CLL04],

$$\partial_t\rho + \nabla\cdot(\rho u) = 0, \tag{2.97}$$

$$\partial_tu + u\cdot\nabla u = -\frac{\beta^{-1}}{m}\frac{\nabla\rho}{\rho} - \frac1m\nabla V - \frac\gamma m u. \tag{2.98}$$

If $\gamma = 0$ and $V\equiv0$, these are the isentropic Euler equations with internal energy $U(\rho) = \beta^{-1}\rho\log\rho$. In [GW09, Wes10], the authors showed that the isentropic Euler equations may be interpreted as a second-order differential equation in the space of probability measures. They introduced a discrete approximation scheme, which is similar to Schemes 2a-b, using the cost functional $C_h$. One future topic of research is to analyse whether one can approximate other second-order differential equations in the space of probability measures (e.g., the Schrödinger equation [vR12]) using the cost function $C_h$.

Connection to Ambrosio-Gangbo [AG08]. The Hamiltonian step in Scheme 2c is a generalization of the implicit Euler method for a finite-dimensional Hamiltonian system to an infinite-dimensional case. It is also compatible with the concept of Hamiltonian flows in the Wasserstein space of probability measures defined by Ambrosio and Gangbo in [AG08]. Let $H\colon\mathcal P_2(\mathbb R^{2d})\to(-\infty,+\infty]$ and $\mu\in\mathcal P_2(\mathbb R^{2d})$ be given. Then $\mu_t\colon[0,\infty)\to\mathcal P_2(\mathbb R^{2d})$ is called a Hamiltonian flow of $H$ with the initial measure $\mu$ if the following equation holds:

$$\frac{d}{dt}\mu_t = \mathrm{div}_{qp}\bigl(\mu_tJ\nabla H(\mu_t)\bigr),\qquad \mu_0 = \mu,\qquad t\in(0,T),$$

where $J$ is a skew-symmetric matrix and $\nabla H(\mu_t)$ is the gradient of the Hamiltonian $H$ at $\mu_t$ (Definition 3.2 in [AG08]). In particular, when $H(\rho) = \int_{\mathbb R^{2d}}\bigl(\frac{p^2}{2m}+V(q)\bigr)\rho(q,p)\,dq\,dp$, then $\nabla H = (\nabla_qV(q),\frac pm)^T$. According to Lemma 6.2 in [AG08], when $\mu$ is regular, a Hamiltonian flow in a small interval $(0,h)$ is constructed by pushing forward the initial measure $\mu$ under the map $\Phi(t,\cdot) = (q(t),p(t))$, which is the solution of the system (2.2) (with $\gamma = 0$). In the Hamiltonian step we approximate this system by the implicit Euler method and define $\mu_k^h$ to be the end point $\mu(h)$.

Chapter 3

Wasserstein gradient flows from large deviations of many-particle limits

In this chapter, we study the Fokker-Planck equation as the many-particle limit of a stochastic particle system on one hand and as a Wasserstein gradient flow on the other. We write the path-space rate functional, which characterises the large deviations from the expected trajectories, in such a way that the free energy appears explicitly. Next we use this formulation via the contraction principle to prove that the discrete-time rate functional is asymptotically equivalent in the Gamma-convergence sense to the functional derived from the Wasserstein gradient discretization scheme.¹

3.1 Introduction

This chapter is concerned with the Wasserstein gradient flow structure of the Fokker-Planck equation (1.9),

$$\partial_t\rho_t = \Delta\rho_t + \mathrm{div}(\rho_t\nabla H)\quad\text{in } \mathbb R^d\times\mathbb R^+,\qquad \rho(0,x) = \rho_0(x), \tag{3.1}$$

where the potential $H\colon\mathbb R^d\to\mathbb R$ is a smooth function, and $\rho_0(x)$ is a probability density in $\mathbb R^d$. As mentioned in Section 1.3, in [JKO98], Jordan, Kinderlehrer and Otto proved that the Fokker-Planck equation is a gradient flow of the free energy $\mathcal F(\rho)$ with respect to the Wasserstein metric. The free energy is the sum of the Boltzmann entropy $\mathcal S(\rho)$ and a potential energy $\mathcal E(\rho)$, with

$$\mathcal E(\rho) = \int_{\mathbb R^d}H(x)\,\rho(dx)\quad\text{and}\quad \mathcal S(\rho) := \begin{cases}\displaystyle\int_{\mathbb R^d}\rho(x)\log\rho(x)\,dx, & \text{for } \rho(dx) = \rho(x)\,dx,\\[4pt] \infty, & \text{otherwise.}\end{cases} \tag{3.2}$$

We recall the JKO-scheme here again.

¹ This chapter is joint work with Vaios Lachos and Michiel Renger [DLR13].


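Theorem 3.1.1 ([JKO98]). The solution $t\mapsto\rho(t,x)$ to the Fokker-Planck equation can be approximated by the time-discrete sequence $\rho_k$ defined recursively by

$$\rho_k\in\operatorname*{argmin}_\rho K_h(\rho,\rho_{k-1}),\qquad K_h(\rho,\rho_{k-1}) := \frac{1}{2h}W_2(\rho,\rho_{k-1})^2 + \mathcal F(\rho) - \mathcal F(\rho_{k-1}). \tag{3.3}$$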
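To make the recursion (3.3) concrete, the following sketch works out one step in a special case. It assumes $d = 1$, the quadratic potential $H(x) = x^2/2$, and (as a simplifying assumption of this sketch) restricts the minimization to Gaussians $N(m, s^2)$. It uses the standard facts that $W_2^2 = (m-m_0)^2 + (s-s_0)^2$ between one-dimensional Gaussians and that $\mathcal F(N(m,s^2)) = -\log s + (m^2+s^2)/2$ up to a constant. The function names are illustrative.

```python
import math


def jko_step_gaussian(mean, std, h):
    """One step of (3.3) restricted to 1D Gaussians N(mean, std^2), for H(x) = x^2/2.

    Minimizes ((m - mean)^2 + (s - std)^2) / (2h) - log(s) + (m^2 + s^2) / 2
    over m and s > 0 via the first-order conditions.
    """
    # d/dm: (m - mean)/h + m = 0.
    new_mean = mean / (1.0 + h)
    # d/ds: (s - std)/h - 1/s + s = 0, i.e. (1 + h) s^2 - std*s - h = 0; take the positive root.
    new_std = (std + math.sqrt(std * std + 4.0 * h * (1.0 + h))) / (2.0 * (1.0 + h))
    return new_mean, new_std


def run_jko(mean, std, h, steps):
    """Iterate the Gaussian JKO step `steps` times."""
    for _ in range(steps):
        mean, std = jko_step_gaussian(mean, std, h)
    return mean, std
```

The first-order conditions are exactly an implicit Euler step for $m' = -m$ and $s' = 1/s - s$, the moment equations of (3.1) for this $H$; iterating drives $(m, s)$ to $(0, 1)$, the standard Gaussian stationary state.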

The aim of this chapter is to provide a microscopic interpretation of this theorem.

3.1.1 Main result of the chapter

We recall the particle system associated to the Fokker-Planck equation. Consider $n$ independent random particles in $\mathbb R^d$ with positions $X_k(t)$ that satisfy the stochastic differential equation

$$dX_k(t) = -\nabla H(X_k(t))\,dt + \sqrt2\,dW_k(t), \tag{3.4}$$

where the $W_k$ are independent Wiener processes. Assume that the initial values are fixed deterministically by some $X_1(0) = x_1, X_2(0) = x_2, \ldots$ in such a way that²

$$\rho_n(0)\to\rho_0\quad\text{narrowly for some given } \rho_0\in\mathcal P(\mathbb R^d). \tag{3.5}$$

The corresponding empirical process is given by

$$\rho_n\colon t\mapsto\frac1n\sum_{k=1}^n\delta_{X_k(t)}.$$
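The particle system (3.4) and its empirical measure can be simulated directly. The sketch below uses the Euler-Maruyama discretization in dimension one; the time step, the seed handling, and the helper names are assumptions of this illustration, and the discretization introduces a small bias that is not part of the continuous-time model.

```python
import math
import random


def simulate_particles(n, t_end, dt, x0, grad_H, seed=0):
    """Euler-Maruyama for dX = -grad_H(X) dt + sqrt(2) dW, n independent particles.

    All particles start at x0; returns the list of positions at time t_end.
    """
    rng = random.Random(seed)
    xs = [x0] * n
    steps = int(round(t_end / dt))
    for _ in range(steps):
        xs = [x - grad_H(x) * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0) for x in xs]
    return xs


def empirical_moments(xs):
    """Mean and variance of the empirical measure (1/n) * sum of Dirac masses at xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var


if __name__ == "__main__":
    # H(x) = x^2/2, so grad_H(x) = x; the Fokker-Planck solution relaxes towards N(0, 1).
    xs = simulate_particles(n=2000, t_end=3.0, dt=0.01, x0=2.0, grad_H=lambda x: x)
    print(empirical_moments(xs))
```

For large `n` the empirical moments are close to those of the Fokker-Planck solution at the same time, in line with the law of large numbers.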

Then, as a consequence of the Law of Large Numbers, at each $h\ge0$ the empirical measure $\rho_n(h)$ converges almost surely in the narrow topology as $n\to\infty$ to the solution of the Fokker-Planck equation (3.1) with initial condition $\rho_0$ [Dud89, Theorem 11.4.1]. The rate of this convergence is characterised by a large deviation principle. Roughly speaking, this means that there exists a unique $J_h\colon\mathcal P(\mathbb R^d)\times\mathcal P(\mathbb R^d)\to[0,\infty]$ such that (see Section 1.5)

$$\mathrm{Prob}\bigl(\rho_n(h)\approx\rho \mid \rho_n(0)\approx\rho_0\bigr) \sim \exp\bigl(-nJ_h(\rho|\rho_0)\bigr)\quad\text{as } n\to\infty.$$

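In [Leo07, Prop. 3.2] and [PRV13, Cor. 13], it was found that

$$J_h(\rho|\rho_0) = \inf\bigl\{\mathcal H(\gamma|\rho_0\,p_h) : \gamma\in\Pi(\rho_0,\rho)\bigr\}, \tag{3.6}$$

where $\mathcal H$ is the relative entropy, $p_h$ is the fundamental solution of the Fokker-Planck equation (3.1) and $\Pi(\rho_0,\rho)$ is the set of all Borel measures in $\mathbb R^d\times\mathbb R^d$ that have first and second marginal $\rho_0$ and $\rho$ respectively. In Section 3.3, we characterise a class of potentials $H$ and initial data $\rho_0$ for which (3.6) is equal to

$$\inf_{\rho(\cdot)\in C_{W_2}(\rho_0,\rho)}\biggl\{\frac{1}{4h}\int_0^1\|\partial_t\rho_t\|^2_{-1,\rho_t}\,dt + \frac h4\int_0^1\|\Delta\rho_t + \mathrm{div}(\rho_t\nabla H)\|^2_{-1,\rho_t}\,dt + \frac12\mathcal F(\rho_1) - \frac12\mathcal F(\rho_0)\biggr\}, \tag{3.7}$$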

² The reason behind this specific initial condition is that we want to somehow condition on ρn = ρ, which is a measure-zero set.


where the ‖·‖−1,ρ norm and the exact meaning of ∂tρt, ∆ρt and div(ρt∇H) will be defined in the sequel. In the main theorem, by using the above equality, we show that the Wasserstein scheme (3.3) has the same asymptotic behavior as Jh for h → 0, in terms of Gamma-convergence (see [Bra02] for an exposition of Gamma-convergence).

The main result of this chapter is the following.

Theorem 3.1.2. Let ρ0 = ρ0(x)dx ∈ P2(R) be absolutely continuous with respect to the Lebesgue measure, with density ρ0(x) bounded from below by a positive constant in every compact set. Assume that $\int_{\mathbb{R}} |\nabla H(x)|^2 \rho_0(dx)$ and the Fisher information I(ρ0) (introduced in Section 3.2) are finite, and that H satisfies either Assumption 3.3.1 or 3.3.4 (introduced in Section 3.3). Then we have

\[
J_h(\,\cdot\,|\rho_0) - \frac{W_2^2(\rho_0,\,\cdot\,)}{4h} \xrightarrow[h\to 0]{\Gamma} \frac12 \mathcal{F}(\,\cdot\,) - \frac12 \mathcal{F}(\rho_0) \quad \text{in } \mathcal{P}_2(\mathbb{R}). \tag{3.8}
\]

Actually, we will prove Mosco convergence of (3.8), i.e. in Theorem 3.4.1 we prove that the Gamma-convergence lower bound holds for any sequence in P2(R), equipped with the narrow topology, and in Theorem 3.5.1 we prove the existence of the recovery sequence in the Wasserstein topology. This is equivalent to having Gamma-convergence in both topologies.

In the Wasserstein topology, the Gamma-convergence (3.8) immediately implies:

\[
h\, J_h(\,\cdot\,|\rho_0) \xrightarrow[h\to 0]{\Gamma} \frac14 W_2^2(\rho_0,\,\cdot\,) \quad \text{in } \mathcal{P}_2(\mathbb{R}). \tag{3.9}
\]

For a system of Brownian particles, i.e. H ≡ 0, statement (3.9) can also be found in [Leo07]. Together, the two statements (3.8) and (3.9) make up an asymptotic development of the rate Jh for small h, i.e.

\[
J_h(\rho|\rho_0) \approx \frac12 \mathcal{F}(\rho) - \frac12 \mathcal{F}(\rho_0) + \frac{1}{4h} W_2^2(\rho_0,\rho).
\]

Apart from the factor 1/2, which does not affect the minimisers, this approximation indeed corresponds to the functional Kh in the JKO-scheme (3.3).

For H = 0, the main statement (3.8) was proven in [ADPZ11] in a subset of P2(R) consisting of measures that are sufficiently close to a uniform distribution on a compact interval. In [PRV13], it was proven that whenever (3.8) holds for H = 0, then it also holds for any $H \in C_b^2(\mathbb{R}^d)$. Both papers make use of the specific form of the fundamental solution of (3.1). In [DLZ12], (3.8) was shown for Gaussian measures on the real line. The novelty of this chapter lies in the use of large deviations in the space of trajectories rather than conditional large deviations of the form (3.6). The conditional large deviations are obtained by a contraction. This provides us with an alternative formulation of the rate functional from which, formally, the Gamma lower bound follows immediately. Moreover, this approach allows us to prove the Gamma-convergence in a much more general context.

All theorems in this chapter are valid in higher dimensions except for the existence of the recovery sequence. There are a number of reasons why, at least by the approach of this chapter, the argument fails in higher dimensions. First of all, in the proof of Lemma 3.5.3 we use an explicit formula for optimal transport maps in terms of cumulative distribution functions. Secondly, the proof of the same lemma in higher dimensions would require regularity and global estimates of derivatives of the transport map, which are still unknown today (see for example [Vil03, p. 141]).
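The one-dimensional formula referred to here is T = G⁻¹ ∘ F, where F and G are the cumulative distribution functions of the source and target measures. The following sketch illustrates it for two Gaussian measures, using the standard library class `statistics.NormalDist`; the choice of Gaussians, the midpoint quadrature and the function names are assumptions made for this example only.

```python
from statistics import NormalDist


def transport_map(mu, nu, x):
    """Monotone transport map T = G^{-1} o F from mu to nu, evaluated at x."""
    return nu.inv_cdf(mu.cdf(x))


def w2_squared_1d(mu, nu, n_grid=20000):
    """W_2^2(mu, nu) as the integral over (0, 1) of |F^{-1}(u) - G^{-1}(u)|^2.

    Uses a midpoint rule so that the endpoints u = 0 and u = 1 are never hit.
    """
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) / n_grid
        diff = mu.inv_cdf(u) - nu.inv_cdf(u)
        total += diff * diff
    return total / n_grid
```

For Gaussians N(m0, s0²) and N(m1, s1²) the map is affine, T(x) = m1 + (s1/s0)(x − m0), and W2² = (m0 − m1)² + (s0 − s1)², which gives a simple consistency check for the quadrature.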


The chapter is organized as follows. The required concepts of this chapter are introduced in Section 3.2. In Section 3.3 we obtain a representation of the functional Jh via the path-wise large deviation principles. In Section 3.4, we prove the Gamma-convergence lower bound, and in Section 3.5 the existence of the recovery sequence.

3.2 Preliminaries

By the nature of this study, we need a combination of techniques from probability theory, mostly from the theory of large deviations, and from functional analysis, mostly from the gradient flow calculus as set out in [AGS08]. Let us introduce these concepts here.

3.2.1 Continuous and absolutely continuous curves

We write C([0, 1],P(Rd)) for the space of narrowly continuous curves [0, 1] → P(Rd), and C(ρ0, ρ) for the space of narrowly continuous curves [0, 1] → P(Rd) starting in ρ0 and ending in ρ. Similarly, for Wasserstein-continuous curves in P2(Rd) we write CW2([0, 1],P2(Rd)) and CW2(ρ0, ρ).

Furthermore, we use two different notions of absolutely continuous curves. The first notion is taken from [DG87, Def. 4.1]. Let $D = C_c^\infty(\mathbb{R}^d)$ be the space of test functions with the corresponding topology (see [Rud73, Sect. 6.2]), let D′ be its dual, consisting of the associated distributions, and let ⟨ , ⟩ be the dual pairing between D′ and D. We will identify a measure ρ ∈ P(Rd) with a distribution by setting $\langle \rho, f\rangle := \int f\,d\rho$. Denote by DK ⊂ D the subspace of all Schwartz functions with compact support K ⊂ Rd. Then a curve ρ(·) : [0, 1] → D′ is said to be absolutely continuous in the distributional sense if for each compact set K ⊂ Rd there is a neighborhood UK of 0 in DK and an absolutely continuous function GK : [0, 1] → R such that

\[
|\langle \rho_{t_2}, f\rangle - \langle \rho_{t_1}, f\rangle| \le |G_K(t_2) - G_K(t_1)|
\]

for all 0 < t1, t2 < 1 and f ∈ UK. Note that if a curve ρ(·) : [0, 1] → D′ is absolutely continuous then the derivative in the distributional sense $\partial_t\rho_t = \lim_{h\to 0} \frac{1}{h}(\rho_{t+h} - \rho_t)$ exists for almost all t ∈ [0, 1].


Secondly, we say a curve ρ(·) : [0, 1] → P2(Rd) is absolutely continuous in the Wasserstein sense if there exists a g ∈ L1(0, 1) such that

\[
W_2(\rho_{t_1}, \rho_{t_2}) \le \int_{t_1}^{t_2} g(t)\,dt
\]

for all 0 < t1 ≤ t2 < 1 (see for example [AGS08, Def. 1.1.1]).

Sufficient conditions for absolute continuity in the Wasserstein sense are given by the following useful lemma:

Lemma 3.2.1. [AGS08, Th. 8.3.1] Let ρ(·) : (0, h) → P(Rd) be a narrowly continuous curve and let v(·) be a vector field such that the continuity equation holds:

\[
\partial_t \rho_t + \operatorname{div}(\rho_t v_t) = 0 \quad \text{in distributional sense.} \tag{3.10}
\]

If

\[
\rho_0 \in \mathcal{P}_2(\mathbb{R}^d) \quad \text{and} \quad \int_0^h \|v_t\|^2_{L^2(\rho_t)}\,dt < \infty, \tag{3.11}
\]

then ρt ∈ P2(Rd) for all 0 < t < h and ρ(·) is absolutely continuous in the Wasserstein sense.

Remark 3.2.2. We point out that the hypothesis in Lemma 3.2.1 requires a priori that the curve ρ(·) lies in P2(Rd), but the proof actually shows that the condition (3.11) implies the whole curve to be in P2(Rd) (and it is absolutely continuous in the Wasserstein sense).

3.2.2 The tangent space

For an absolutely continuous curve in the Wasserstein sense ρ(·) there is a unique Borel field $v_t \in V(\rho_t) := \overline{\{\nabla f : f \in D\}}^{L^2(\rho_t)}$ such that the continuity equation (3.10) holds [AGS08, Th. 8.3.1]. This motivates the identification of the tangent space³ of P2(Rd) at ρ with all s ∈ D′ for which there exists a v ∈ V(ρ) such that

\[
s + \operatorname{div}(\rho v) = 0 \quad \text{in distributional sense.} \tag{3.12}
\]

The following inner product on the tangent space at ρ is the metric tensor corresponding to the Wasserstein metric [Ott01]

\[
(s_1, s_2)_{-1,\rho} := \int_{\mathbb{R}^d} v_1 \cdot v_2 \,d\rho,
\]

where v1 and v2 are associated with s1 and s2 through (3.12). The corresponding norm coincides with the dual operator norm on D′

\[
\|s\|^2_{-1,\rho} := \sup_{f \in D} \left\{ 2\langle s, f\rangle - \int_{\mathbb{R}^d} |\nabla f|^2 \,d\rho \right\}. \tag{3.13}
\]

³ Here we would like to point out that in [AGS08] the tangent space is identified with the set of velocity fields V(ρ).

This norm is closely related to the Wasserstein metric through the Benamou-Brenier formula [BB00]

\[
W_2(\rho_0,\rho_1)^2 = \min\left\{ \int_0^1 \|\partial_t \rho_t\|^2_{-1,\rho_t}\,dt \;:\; \rho_t|_{t=0} = \rho_0 \text{ and } \rho_t|_{t=1} = \rho_1 \right\}. \tag{3.14}
\]
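A small particle computation can illustrate the minimisation in (3.14). The sketch below is a simplification under stated assumptions and not a construction used in the chapter: the measures are empirical measures of finitely many points on the real line, the action of a curve of such measures is taken to be the time integral of the mean squared particle speed, and the optimal coupling is the sorted (monotone) pairing. The function names and the `clock` reparametrisation are hypothetical.

```python
def w2_squared_sorted(xs, ys):
    """W_2^2 between two empirical measures on R via the monotone pairing."""
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def action(xs, ys, clock, n_steps=1000):
    """Discretised action of the curve x_i + clock(t) * (y_i - x_i), t in [0, 1].

    `clock` maps [0, 1] to [0, 1] with clock(0) = 0 and clock(1) = 1;
    velocities are approximated by finite differences in time.
    """
    xs, ys = sorted(xs), sorted(ys)
    dt = 1.0 / n_steps
    total = 0.0
    for k in range(n_steps):
        ds = clock((k + 1) * dt) - clock(k * dt)
        for x, y in zip(xs, ys):
            v = (y - x) * ds / dt
            total += v * v * dt / len(xs)
    return total
```

With the constant-speed clock t ↦ t the action equals W2², while the reparametrised clock t ↦ t² traces the same path at non-constant speed and yields an action close to (4/3) W2², which is strictly larger.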

3.2.3 Relevant functionals

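We sometimes write ∆ρ and div(ρ∇H) for the functionals in D′ defined by

\[
\langle \Delta\rho, f\rangle := \int_{\mathbb{R}^d} \Delta f \,d\rho \quad \text{and} \quad \langle \operatorname{div}(\rho\nabla H), f\rangle := -\int \nabla H \cdot \nabla f \,d\rho.
\]

For ρ ∈ P2(Rd), we define the Fisher information

\[
I(\rho) := \begin{cases} \displaystyle\int_{\mathbb{R}^d} \frac{|\nabla\rho(x)|^2}{\rho(x)}\,dx & \text{if } \rho(dx) = \rho(x)\,dx \text{ and } \sqrt{\rho} \in H^1(\mathbb{R}^d), \\[1ex] \infty & \text{otherwise,} \end{cases} \tag{3.15}
\]

where ∇ρ is the distributional derivative of ρ. By using (3.13), it is straightforward to see that $\|\Delta\rho\|^2_{-1,\rho} \le I(\rho)$, where the inequality becomes an equality when the right-hand side is finite. Similarly we have that $\|\operatorname{div}(\rho\nabla H)\|^2_{-1,\rho} \le \int |\nabla H|^2\,d\rho$. Here equality holds whenever $\int_{\mathbb{R}} |\nabla H|^2\,d\rho < \infty$, which is certainly true if H satisfies Assumption 3.3.1 and ρ ∈ P2(Rd). Observe that in Theorem 3.1.2 we assume finiteness of I(ρ0) and $\int_{\mathbb{R}} |\nabla H|^2\,d\rho_0$. As a consequence of the HWI inequality [Vil09, Cor. 20.13], these conditions, together with Assumptions 3.3.1 and 3.3.4, imply that the free energy S(ρ0) + E(ρ0) is also finite.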

3.2.4 Chain rule

We conclude this section with a chain rule for the free energy (3.2) on absolutely continuous curves.

Lemma 3.2.3. Let H ∈ C2(Rd) be bounded from below and λ-convex for some λ ∈ R (see [Vil03, Sect. 2.1.3]). Assume also that ρ(·) : (0, h) → P2(Rd) is an absolutely continuous curve in the Wasserstein sense that satisfies the conditions E(ρt), S(ρt) < ∞ for all t ∈ [0, h] and

\[
\int_0^h \left( \int_{\mathbb{R}^d} |\nabla H(x)|^2 \rho_t(x)\,dx + I(\rho_t) \right) dt < \infty. \tag{3.16}
\]

Then t ↦ F(ρt) is absolutely continuous and for a.e. t ∈ [0, h] we have

\[
\frac{d}{dt} \mathcal{F}(\rho_t) = -\big(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H),\, \partial_t\rho_t\big)_{-1,\rho_t}. \tag{3.17}
\]

Proof. This lemma is a direct consequence of [AGS08, Th. 10.3.18]. All conditions of this theorem are easily checked; the only non-trivial condition may be

\[
\int_0^h |\partial\mathcal{F}|(\rho_t)\,|\rho_t'|\,dt < \infty,
\]

where |∂F|(ρt) is the metric slope and |ρ′t| is the metric derivative (see [AGS08, Sect. 1.1 and 1.2]). This is true, since by [AGS08, Th. 10.4.13 and Th. 8.3.1] and (3.16):

\begin{align*}
\int_0^h |\partial\mathcal{F}|(\rho_t)\,|\rho_t'|\,dt
&\le \frac12 \int_0^h |\partial\mathcal{F}|^2(\rho_t)\,dt + \frac12 \int_0^h |\rho_t'|^2\,dt \\
&\le \frac12 \int_0^h \int \left| \nabla H(x) + \frac{\nabla\rho_t(x)}{\rho_t(x)} \right|^2 \rho_t(x)\,dx\,dt + \frac12 \int_0^h \|\partial_t\rho_t\|^2_{-1,\rho_t}\,dt \\
&\le \int_0^h \int |\nabla H(x)|^2 \rho_t(x)\,dx\,dt + \int_0^h I(\rho_t)\,dt + \frac12 \int_0^h \|\partial_t\rho_t\|^2_{-1,\rho_t}\,dt < \infty.
\end{align*}

3.2.5 Gamma-convergence

We recall the definition of Gamma convergence for the readers’ convenience.

Definition 3.2.4. [Bra02] Let X be a metric space. We say that a sequence fn : X → R Γ-converges in X to f : X → R, denoted by $f_n \xrightarrow{\Gamma} f$, if for all x ∈ X we have

1. (lower-bound part) For every sequence xn converging to x

\[
\liminf_{n\to\infty} f_n(x_n) \ge f(x), \tag{3.18}
\]

2. (upper-bound part) There exists a sequence xn converging to x such that

\[
\lim_{n\to\infty} f_n(x_n) = f(x). \tag{3.19}
\]

If fn satisfy the lower (or upper, respectively) bound part then we write $f_n \xrightarrow{\Gamma\text{-}\liminf} f$ (or $f_n \xrightarrow{\Gamma\text{-}\limsup} f$ respectively).
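A standard textbook illustration of this definition, not taken from the chapter, is fn(x) = sin(nx) on the real line. The lower bound (3.18) holds with f ≡ −1 because fn ≥ −1 everywhere, and a recovery sequence for (3.19) is obtained by moving x to the nearest point where sin(n ·) equals −1. The sketch below, with hypothetical function names, constructs that recovery point.

```python
import math


def f(n, x):
    """The oscillating sequence f_n(x) = sin(n x)."""
    return math.sin(n * x)


def recovery_point(n, x):
    """Nearest point x_n to x with sin(n x_n) = -1, so that |x_n - x| <= pi / n."""
    k = round((n * x + math.pi / 2) / (2 * math.pi))
    return (2 * math.pi * k - math.pi / 2) / n
```

The example also shows that the Gamma-limit can differ from the pointwise behaviour: fn(0) = 0 for every n, whereas the Gamma-limit at 0 is −1.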

3.3 Large deviations of trajectories

In this section we prove, under suitable assumptions on ρ0 and H, the equivalence of the rate functionals (3.6) and (3.7). The latter form will be used to prove the main Gamma-convergence theorem. First, the large deviations of the empirical process are derived. To this aim we will need to distinguish between two different types of potentials H. Next, we transform these large deviation principles back to the large deviations of the empirical measure ρn(h) by a contraction principle, and finally show that the resulting rate functionals are the same for both cases.

In the first case we consider potentials that satisfy the following

Assumption 3.3.1 (The subquadratic case). Let H ∈ C2(Rd) such that

1. H is bounded from below,

2. there is a C0 > 0 such that |x||∇H(x)| ≤ C0(1 + |x|2) for all x ∈ Rd,

3. H is λ-convex for some λ ∈ R,

4. there exist constants 0 ≤ C1 < 1/4 and C2, C3 ∈ R+ such that |∆H(x)| ≤ C1|∇H(x)|2 + C2H(x) + C3.

Note that the second assumption indeed implies |H(x)| ≤ C0(1 + |x|2). Under Assumption 3.3.1, combined with initial condition (3.5), the empirical process {ρn(t)}0≤t≤h satisfies a large deviation principle in C([0, h],P(Rd)) with good rate functional [DG87, Th. 4.5]

\[
J_h(\rho(\cdot)) = \frac14 \int_0^h \|\partial_t\rho_t - \Delta\rho_t - \operatorname{div}(\rho_t\nabla H)\|^2_{-1,\rho_t}\,dt, \tag{3.20}
\]

if the curve ρ(·) is absolutely continuous in the distributional sense; else we set Jh to ∞. It follows from a contraction principle [DZ87, Th. 4.2.1] and a change of variables t ↦ t/h that the conditional rate functional (3.6) can also be written as

\[
J_h(\rho|\rho_0) = \inf_{\rho(\cdot) \in C(\rho_0,\rho)} \frac{1}{4h} \int_0^1 \|\partial_t\rho_t - h(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H))\|^2_{-1,\rho_t}\,dt. \tag{3.21}
\]

Remark 3.3.2. The first assumption guarantees that the functional E : P(Rd) → (−∞,∞] is well defined. The last two assumptions are not necessary to derive (3.20); however we will need them in the sequel.

Remark 3.3.3. In (3.21) we implicitly set $\frac{1}{4h}\int_0^1 \|\partial_t\rho_t - h(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H))\|^2_{-1,\rho_t}\,dt = \infty$ if the curve is not absolutely continuous in distributional sense. Therefore, from now on, we shall only consider curves in C(ρ0, ρ) or CW2(ρ0, ρ) that are absolutely continuous in distributional sense.

In the second case we require a combination of assumptions on H that were taken from [FK06] and [FN12]:

Assumption 3.3.4 (The superquadratic case). Let H ∈ C4(Rd) such that:

1. H is λH-convex for some λH ∈ R;

2. $\int_{\mathbb{R}^d} H(x)\,e^{-2H(x)}\,dx < \infty$;

3. H has superquadratic growth at infinity, i.e. $\lim_{|x|\to\infty} \frac{H(x)}{|x|^2} = +\infty$;

4. There exists an ω ∈ C(R+ ∪ {0}) with ω(0) = 0 and an α ≥ 1 such that $\frac{\omega(x)}{|x|^{\alpha}} \to 0$ as x → ∞, and for all x, y ∈ Rd

\[
H(y) - H(x) \le \omega(|y-x|)\,(1 + H(x)),
\]
\[
|H(y) - H(x)|^2 \le \omega(|y-x|)\,(1 + |\nabla H(x)|^2 + H(x));
\]

5. 2∆H ≤ A|∇H|2 + B for some 0 < A < 1 and B > 0;

6. ζ := |∇H|2 − 2∆H has superquadratic growth at infinity, i.e. $\lim_{|x|\to\infty} \frac{\zeta(x)}{|x|^2} = +\infty$;

7. ζ is λζ-convex for some λζ ∈ R.

Whenever Assumption 3.3.4 and initial condition (3.5) hold, then by [FK06, Th. 13.37] the process {ρn(t)}0≤t≤h satisfies a large deviation principle in CW2([0, h],P2(Rd)) with good rate functional (3.20).

Remark 3.3.5. Contrary to the subquadratic case, the latter is actually a large deviation principle on the set of all continuous paths in P2(Rd) with respect to the Wasserstein topology. Although we strongly believe that this is also true for the subquadratic case, it is very difficult to prove due to the fact that the functional Jh does not have Wasserstein-compact sub-level sets, and therefore it cannot be a good rate functional in CW2([0, h],P2(Rd)) when H is subquadratic.

Again, by a contraction principle and a simple change of variables, it follows from (3.20) that (3.6) must be equal to:

\[
J_h(\rho|\rho_0) = \inf_{\rho(\cdot) \in C_{W_2}(\rho_0,\rho)} \frac{1}{4h} \int_0^1 \|\partial_t\rho_t - h(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H))\|^2_{-1,\rho_t}\,dt. \tag{3.22}
\]

Observe that in this case the infimum is taken over Wasserstein-continuous curves, while in the subquadratic case (3.21) the infimum was over narrowly continuous curves. However, we will prove that under the extra assumption that ρ0 ∈ P2(Rd) and F(ρ0) is finite, even in the subquadratic case the infimum can be taken over CW2(ρ0, ρ). Actually, we will prove something even stronger, which we will need in the sequel, namely the following:

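Proposition 3.3.6. Let H ∈ C2(Rd) satisfy Assumption 3.3.1. Let ρ0 ∈ P2(Rd) with F(ρ0) < ∞ and assume ρ(·) ∈ C(ρ0, ρ) with Jh(ρ(·)) finite. Then we have that ρt ∈ P2(Rd) for every t ∈ [0, 1] and, furthermore, the curve ρ(·) is absolutely continuous in the Wasserstein sense, and F(ρt) is absolutely continuous with respect to t. Finally, there holds:

\begin{align*}
&\frac{1}{4h} \int_0^1 \|\partial_t\rho_t - h(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H))\|^2_{-1,\rho_t}\,dt \\
&\quad = \frac{1}{4h} \int_0^1 \|\partial_t\rho_t\|^2_{-1,\rho_t}\,dt + \frac{h}{4} \int_0^1 \|\Delta\rho_t + \operatorname{div}(\rho_t\nabla H)\|^2_{-1,\rho_t}\,dt + \frac12 \mathcal{F}(\rho_1) - \frac12 \mathcal{F}(\rho_0).
\end{align*}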
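Formally, this identity is the expansion of a square combined with the chain rule (3.17). The short derivation below is only a heuristic sketch that assumes all terms are finite and that (3.17) applies; the abbreviations $a_t$ and $b_t$ are introduced here for readability and do not appear in the original text.

```latex
% Abbreviations (introduced for this sketch only):
%   a_t := \partial_t \rho_t,   b_t := \Delta\rho_t + \operatorname{div}(\rho_t \nabla H).
\begin{align*}
\frac{1}{4h}\,\|a_t - h\,b_t\|^2_{-1,\rho_t}
  &= \frac{1}{4h}\,\|a_t\|^2_{-1,\rho_t}
   + \frac{h}{4}\,\|b_t\|^2_{-1,\rho_t}
   - \frac12\,(b_t, a_t)_{-1,\rho_t}, \\
-\frac12 \int_0^1 (b_t, a_t)_{-1,\rho_t}\,dt
  &= \frac12 \int_0^1 \frac{d}{dt}\,\mathcal{F}(\rho_t)\,dt
   = \frac12\,\mathcal{F}(\rho_1) - \frac12\,\mathcal{F}(\rho_0)
   \qquad \text{by (3.17).}
\end{align*}
```

The rigorous content of the proposition is that the finiteness of Jh(ρ(·)) alone already justifies each of these steps.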

Before we prove this proposition we prove two auxiliary lemmas.

Lemma 3.3.7. Assume that

1. E(ρ0) < ∞,

2. H ∈ C2(Rd) satisfies Assumption 3.3.1,

3. ρ(·) ∈ C(ρ0, ρ),

4. Jh(ρ(·)) < ∞.

Then

\[
\int_0^h \int_{\mathbb{R}^d} |\nabla H(x)|^2 \rho_t(dx)\,dt < \infty. \tag{3.23}
\]

Proof. For simplicity we take h = 1. We will prove the following statement: there exist 0 < δ ≤ 1 and α, β > 0 that depend only on H such that

\[
\alpha \sup_{t\in[0,\delta]} \int_{\mathbb{R}^d} |H|\,d\rho_t + \beta \int_0^\delta \int_{\mathbb{R}^d} |\nabla H|^2\,d\rho_t\,dt \le 8 J_1(\rho(\cdot)) + 4\,|\inf H| + 2 \int_{\mathbb{R}^d} H\,d\rho_0 + 2\delta C_3. \tag{3.24}
\]

Obviously (3.23) follows from (3.24) by repeating it 1/δ times.

By [DG87, Lem. 4.8], for any 0 ≤ s ≤ 1 we have

\[
4 J_1(\rho(\cdot)) \ge 4 J_s(\rho(\cdot)) = \sup_{f \in C_c^2(\mathbb{R}^d)} \left\{ \int_{\mathbb{R}^d} f\,d\rho_s - \int_{\mathbb{R}^d} f\,d\rho_0 - \int_0^s \int_{\mathbb{R}^d} \Big( \Delta f - \nabla H \cdot \nabla f + \frac12 |\nabla f|^2 \Big)\,d\rho_t\,dt \right\}. \tag{3.25}
\]

It is worth highlighting that in the above equality, the supremum is taken over $C_c^2(\mathbb{R}^d)$ functions instead of D.

The idea is to use two approximations of H so that it can be chosen as a test function f in (3.25). The first approximation is used to show that this inequality still holds if we replace $C_c^2(\mathbb{R}^d)$ by

\[
A := \big\{ f \in C^2(\mathbb{R}^d) : f,\ \nabla f,\ \Delta f,\ x f,\ |\nabla f|\,|x| \text{ are all bounded} \big\}. \tag{3.26}
\]

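Take an arbitrary f ∈ A. Define the bump function

\[
\zeta(x) := \begin{cases} \exp\!\Big( 1 - \dfrac{1}{1-|x|^2} \Big), & |x| < 1, \\[1ex] 0, & |x| \ge 1, \end{cases}
\]

and set ζk(x) := ζ(x/k). Then surely $\zeta_k f \in C_c^2(\mathbb{R}^d)$. It is easy to check that

\[
|\zeta_k(x)| \le 1, \qquad |\nabla\zeta_k(x)| \le \frac{1}{k} \qquad \text{and} \qquad |\Delta\zeta_k(x)| \le \frac{1}{k^2}. \tag{3.27}
\]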


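By the Dominated Convergence Theorem, as k → ∞,

\begin{align*}
\int_{\mathbb{R}^d} \zeta_k f\,d\rho_s &\to \int_{\mathbb{R}^d} f\,d\rho_s, \qquad \int_{\mathbb{R}^d} \zeta_k f\,d\rho_0 \to \int_{\mathbb{R}^d} f\,d\rho_0, \\
\int_0^s \int_{\mathbb{R}^d} \Delta(\zeta_k f)\,d\rho_t\,dt &= \int_0^s \int_{\mathbb{R}^d} (f\Delta\zeta_k + 2\nabla\zeta_k\cdot\nabla f + \zeta_k\Delta f)\,d\rho_t\,dt \to \int_0^s \int_{\mathbb{R}^d} \Delta f\,d\rho_t\,dt, \\
\int_0^s \int_{\mathbb{R}^d} \nabla H\cdot\nabla(\zeta_k f)\,d\rho_t\,dt &= \int_0^s \int_{\mathbb{R}^d} \nabla H\cdot(f\nabla\zeta_k + \zeta_k\nabla f)\,d\rho_t\,dt \to \int_0^s \int_{\mathbb{R}^d} \nabla H\cdot\nabla f\,d\rho_t\,dt, \\
\int_0^s \int_{\mathbb{R}^d} |\nabla(\zeta_k f)|^2\,d\rho_t\,dt &= \int_0^s \int_{\mathbb{R}^d} |f\nabla\zeta_k + \zeta_k\nabla f|^2\,d\rho_t\,dt \to \int_0^s \int_{\mathbb{R}^d} |\nabla f|^2\,d\rho_t\,dt,
\end{align*}

where the absolute finiteness of the right-hand integrals is guaranteed by the properties of the set A. Therefore (3.25) indeed becomes

\[
4 J_1(\rho(\cdot)) \ge \sup_{f\in A} \left\{ \int_{\mathbb{R}^d} f\,d\rho_s - \int_{\mathbb{R}^d} f\,d\rho_0 - \int_0^s \int_{\mathbb{R}^d} \Big( \Delta f - \nabla H\cdot\nabla f + \frac12 |\nabla f|^2 \Big)\,d\rho_t\,dt \right\}. \tag{3.28}
\]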

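For the second approximation we take

\[
\eta(x) := \exp\!\Big( 1 - \sqrt{1 + |x|^2} \Big),
\]

and set ηk(x) := η(x/k). Then the following estimates hold:

\[
|\eta_k(x)| \le 1, \qquad |\nabla\eta_k(x)| \le \frac{1}{k}\,\eta_k(x) \qquad \text{and} \qquad |\Delta\eta_k(x)| \le \frac{1}{k^2}\,\eta_k(x). \tag{3.29}
\]

Since ηkH ∈ A by the subquadratic Assumption 3.3.1, we can substitute ηkH in (3.28):

\begin{align*}
4 J_1(\rho(\cdot)) \ge{}& \int_{\mathbb{R}^d} \eta_k H\,d\rho_s - \int_{\mathbb{R}^d} \eta_k H\,d\rho_0 - \int_0^s \int_{\mathbb{R}^d} \Delta(\eta_k H)\,d\rho_t\,dt \\
&+ \int_0^s \int_{\mathbb{R}^d} \Big( \nabla H\cdot\nabla(\eta_k H) - \frac12 |\nabla(\eta_k H)|^2 \Big)\,d\rho_t\,dt \tag{3.30}
\end{align*}

for any k ∈ N and s ∈ [0, 1].

We now estimate each term in the right-hand side of (3.30). For the first two terms, we have

\[
\int_{\mathbb{R}^d} \eta_k H\,d\rho_s - \int_{\mathbb{R}^d} \eta_k H\,d\rho_0 \ge \int_{\mathbb{R}^d} \eta_k |H|\,d\rho_s - 2\,|\inf H| - \int_{\mathbb{R}^d} \eta_k H\,d\rho_0. \tag{3.31}
\]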


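For the third term of (3.30), we find

\begin{align*}
-\int_0^s \int_{\mathbb{R}^d} \Delta(\eta_k H)\,d\rho_t\,dt
&= -\int_0^s \int_{\mathbb{R}^d} (H\Delta\eta_k + 2\nabla\eta_k\cdot\nabla H + \eta_k\Delta H)\,d\rho_t\,dt \\
&\ge -\int_0^s \int_{\mathbb{R}^d} \big( |\Delta\eta_k|\,|H| + |\nabla\eta_k|\,(|\nabla H|^2 + 1) + \eta_k |\Delta H| \big)\,d\rho_t\,dt \\
&\overset{(3.29)}{\ge} -\int_0^s \int_{\mathbb{R}^d} \Big( \frac{1}{k^2}\,\eta_k |H| + \frac{\eta_k}{k}\,(|\nabla H|^2 + 1) + \eta_k |\Delta H| \Big)\,d\rho_t\,dt \\
&\overset{\text{Ass. 3.3.1(4)}}{\ge} -\int_0^s \int_{\mathbb{R}^d} \Big( \frac{1}{k^2}\,\eta_k |H| + \frac{\eta_k}{k}\,(|\nabla H|^2 + 1) + \eta_k \big( C_1 |\nabla H|^2 + C_2 |H| + C_3 \big) \Big)\,d\rho_t\,dt \\
&\ge -s \Big( \frac{1}{k^2} + C_2 \Big) \sup_{t\in[0,s]} \int_{\mathbb{R}^d} \eta_k |H|\,d\rho_t - \Big( \frac{1}{k} + C_1 \Big) \int_0^s \int_{\mathbb{R}^d} \eta_k |\nabla H|^2\,d\rho_t\,dt - \frac{s}{k} - s C_3. \tag{3.32}
\end{align*}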

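Finally, for the last part of (3.30),

\begin{align*}
&\int_0^s \int_{\mathbb{R}^d} \Big( \nabla H\cdot\nabla(\eta_k H) - \frac12 |\nabla(\eta_k H)|^2 \Big)\,d\rho_t\,dt \\
&\quad = \int_0^s \int_{\mathbb{R}^d} \Big( -\frac12 |\nabla\eta_k|^2 H^2 + (1-\eta_k)\,\nabla\eta_k\cdot H\nabla H + \big(1 - \tfrac12\eta_k\big)\,\eta_k |\nabla H|^2 \Big)\,d\rho_t\,dt \\
&\quad \overset{(3.29)}{\ge} \int_0^s \int_{\mathbb{R}^d} \Big( -\frac{1}{2k^2}\,\eta_k H^2 - 2\eta_k \Big| \frac{1}{k} H \Big|\,\Big| \frac12 \nabla H \Big| + \frac34\,\eta_k |\nabla H|^2 \Big)\,d\rho_t\,dt \\
&\quad \ge \int_0^s \int_{\mathbb{R}^d} \Big( -\frac{3}{2k^2}\,\eta_k H^2 + \Big( \frac34 - \frac14 \Big)\,\eta_k |\nabla H|^2 \Big)\,d\rho_t\,dt \\
&\quad \ge \int_0^s \int_{\mathbb{R}^d} \Big( -\frac{3 C_0 (1+k^2)}{2k^2}\,\eta_k |H| + \frac12\,\eta_k |\nabla H|^2 \Big)\,d\rho_t\,dt \\
&\quad \ge -\frac{3 s C_0 (1+k^2)}{2k^2} \sup_{t\in[0,s]} \int_{\mathbb{R}^d} \eta_k |H|\,d\rho_t + \int_0^s \int_{\mathbb{R}^d} \frac12\,\eta_k |\nabla H|^2\,d\rho_t\,dt, \tag{3.33}
\end{align*}

where the fourth line follows from Young's inequality, and in the fifth line we used the subquadratic Assumption 3.3.1(2). Substituting (3.31), (3.32) and (3.33) into (3.30) we get

\begin{align*}
&\int_{\mathbb{R}^d} \eta_k |H|\,d\rho_s + \int_0^s \int_{\mathbb{R}^d} \frac12\,\eta_k |\nabla H|^2\,d\rho_t\,dt \\
&\quad \le 4 J_1(\rho(\cdot)) + 2\,|\inf H| + \int_{\mathbb{R}^d} \eta_k H\,d\rho_0 + \frac{s}{k} + s C_3 \\
&\qquad + s \Big( \frac{1}{k^2} + C_2 + \frac{3 C_0 (1+k^2)}{2k^2} \Big) \sup_{t\in[0,s]} \int_{\mathbb{R}^d} \eta_k |H|\,d\rho_t + \Big( \frac{1}{k} + C_1 \Big) \int_0^s \int_{\mathbb{R}^d} \eta_k |\nabla H|^2\,d\rho_t\,dt.
\end{align*}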

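If we first discard the first term on the left-hand side and maximise the inequality over s ∈ [0, δ] for some 0 < δ ≤ 1, then discard the second term and maximise, the sum of the inequalities can be written as

\begin{align*}
&\Big( 1 - 2\delta \Big( \frac{1}{k^2} + C_2 + \frac{3 C_0 (1+k^2)}{2k^2} \Big) \Big) \sup_{t\in[0,\delta]} \int_{\mathbb{R}^d} \eta_k |H|\,d\rho_t + \Big( \frac12 - \frac{2}{k} - 2 C_1 \Big) \int_0^\delta \int_{\mathbb{R}^d} \eta_k |\nabla H|^2\,d\rho_t\,dt \\
&\quad \le 8 J_1(\rho(\cdot)) + 4\,|\inf H| + 2 \int_{\mathbb{R}^d} \eta_k H\,d\rho_0 + \frac{2\delta}{k} + 2\delta C_3.
\end{align*}

For δ such that $1 > 2\delta\big( C_2 + \frac{3C_0}{2} \big)$, we get $1 > 2\delta\big( \frac{1}{k^2} + C_2 + \frac{3 C_0 (1+k^2)}{2k^2} \big)$ for sufficiently large k, and therefore from Fatou's Lemma

\[
\alpha \sup_{t\in[0,\delta]} \int_{\mathbb{R}^d} |H|\,d\rho_t + \beta \int_0^\delta \int_{\mathbb{R}^d} |\nabla H|^2\,d\rho_t\,dt \le 8 J_1(\rho(\cdot)) + 4\,|\inf H| + 2 \int_{\mathbb{R}^d} H\,d\rho_0 + 2\delta C_3,
\]

with $\alpha := 1 - 2\delta\big( C_2 + \frac{3C_0}{2} \big) > 0$ and $\beta := \frac12 - 2C_1$. The latter is positive by Assumption 3.3.1(4).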

The second auxiliary lemma is:

Lemma 3.3.8. Let ε > 0 and ρ(x) dx ∈ P(Rd) be given. Let $\theta(x) := \big( \frac{1}{2\pi} \big)^{d/2} e^{-|x|^2/2}$ be the density of the d-dimensional normal distribution. We define $\theta_\varepsilon(x) := \varepsilon^{-d}\,\theta(x/\varepsilon)$ and $\rho_\varepsilon := \rho * \theta_\varepsilon$. Then there exists a constant Cε that depends only on ε such that I(ρε) < Cε.

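Proof. We have

\[
\nabla\rho_\varepsilon(x) = (\rho * \nabla\theta_\varepsilon)(x) = \int_{\mathbb{R}^d} \rho(x-y)\,\nabla\theta_\varepsilon(y)\,dy = -\varepsilon^{-2} \int_{\mathbb{R}^d} \rho(x-y)\,y\,\theta_\varepsilon(y)\,dy.
\]

Furthermore

\[
|\nabla\rho_\varepsilon(x)|^2 \le \varepsilon^{-4} \int_{\mathbb{R}^d} \rho(x-y)\,|y|^2\,\theta_\varepsilon(y)\,dy \int_{\mathbb{R}^d} \rho(x-y)\,\theta_\varepsilon(y)\,dy \le \varepsilon^{-4}\,\rho_\varepsilon(x) \int_{\mathbb{R}^d} \rho(x-y)\,|y|^2\,\theta_\varepsilon(y)\,dy.
\]

Now

\begin{align*}
I(\rho_\varepsilon) = \int_{\mathbb{R}^d} \frac{|\nabla\rho_\varepsilon(x)|^2}{\rho_\varepsilon(x)}\,dx
&\le \varepsilon^{-4} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \rho(x-y)\,|y|^2\,\theta_\varepsilon(y)\,dy\,dx \\
&= \varepsilon^{-4} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \rho(x-y)\,dx\;|y|^2\,\theta_\varepsilon(y)\,dy \\
&\le \varepsilon^{-4} \int_{\mathbb{R}^d} |y|^2\,\theta_\varepsilon(y)\,dy =: C_\varepsilon.
\end{align*}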
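In one dimension the constant in this proof evaluates to Cε = ε⁻⁴ · ε² = ε⁻², since the second moment of θε is ε². The sketch below checks the resulting bound numerically for an equal-weight mixture of two narrow Gaussians smoothed by θε. The mixture, the integration window, the midpoint quadrature and the function names are illustrative assumptions, not objects from the chapter.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def smoothed_mixture(x, centres, std, eps):
    """Density and derivative at x of (equal-weight Gaussian mixture) * theta_eps."""
    s = math.sqrt(std * std + eps * eps)  # convolving Gaussians adds variances
    dens, deriv = 0.0, 0.0
    for c in centres:
        p = gaussian_pdf(x, c, s) / len(centres)
        dens += p
        deriv += -(x - c) / (s * s) * p
    return dens, deriv


def fisher_information(centres, std, eps, lo=-10.0, hi=10.0, n=20000):
    """Midpoint-rule approximation of the integral of |rho'|^2 / rho over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        dens, deriv = smoothed_mixture(x, centres, std, eps)
        if dens > 0.0:
            total += deriv * deriv / dens * dx
    return total
```

For a single standard Gaussian with ε = 0 the routine should return a value close to 1, which serves as a sanity check of the quadrature before the bound I(ρε) < ε⁻² is tested.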

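We are now ready to proceed with the

Proof of Proposition 3.3.6. Let ρ(·) satisfy the assumptions of Proposition 3.3.6. By Lemma 3.3.7 we have

\[
\int_0^1 \int_{\mathbb{R}^d} |\nabla H(x)|^2 \rho_t(dx)\,dt < \infty
\]

and therefore

\[
\frac{1}{4h} \int_0^1 \|\partial_t\rho_t - h\Delta\rho_t\|^2_{-1,\rho_t}\,dt \le \frac{1}{2h} \int_0^1 \|\partial_t\rho_t - h(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H))\|^2_{-1,\rho_t}\,dt + \frac{h}{2} \int_0^1 \int_{\mathbb{R}^d} |\nabla H|^2 \rho_t(dx)\,dt < \infty.
\]

Take a 0 < s ≤ 1. Since

\[
\frac{1}{4h} \int_0^s \|\partial_t\rho_t - h\Delta\rho_t\|^2_{-1,\rho_t}\,dt < \infty \tag{3.34}
\]

we have that $\|\partial_t\rho_t - h\Delta\rho_t\|^2_{-1,\rho_t} < \infty$ for almost every t. By [FK06, Lem. D.34] there is a vt ∈ L2(ρt) such that

\[
\partial_t\rho_t - h\Delta\rho_t = -\operatorname{div}(v_t\,\rho_t)
\]

in distributional sense. Take the Gaussian θε(x) as in Lemma 3.3.8. Then we have

\[
\partial_t\rho_{t,\varepsilon} - h\Delta\rho_{t,\varepsilon} = -\operatorname{div}(v_{t,\varepsilon}\,\rho_{t,\varepsilon}),
\]

where

\[
\rho_{t,\varepsilon} = \rho_t * \theta_\varepsilon, \qquad v_{t,\varepsilon} = \frac{(v_t\,\rho_t) * \theta_\varepsilon}{\rho_{t,\varepsilon}}.
\]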

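By [AGS08, Th. 8.1.9] we have

\begin{align*}
\frac{1}{4h} \int_0^s \|\partial_t\rho_{t,\varepsilon} - h\Delta\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt
&\le \frac{1}{4h} \int_0^s \|v_{t,\varepsilon}\|^2_{L^2(\rho_{t,\varepsilon})}\,dt \le \frac{1}{4h} \int_0^s \|v_t\|^2_{L^2(\rho_t)}\,dt \\
&= \frac{1}{4h} \int_0^s \|\partial_t\rho_t - h\Delta\rho_t\|^2_{-1,\rho_t}\,dt. \tag{3.35}
\end{align*}

Furthermore by Lemma 3.3.8 we have that

\[
\int_0^s \|\Delta\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt = \int_0^s I(\rho_{t,\varepsilon})\,dt \le C_\varepsilon, \tag{3.36}
\]

and therefore

\[
\int_0^s \|\partial_t\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt < \infty. \tag{3.37}
\]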

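From (3.36) and since ρ0 ∈ P2(Rd), by using [FK06, Lem. D.34] and Lemma 3.2.1 we get that the curve ρt,ε is absolutely continuous in P2(Rd). In addition, it is straightforward that S(ρt,ε) is finite for every 0 < t ≤ s. From (3.36), (3.37) and by Lemma 3.2.3, S(ρt,ε) is absolutely continuous with respect to t. Hence we obtain

\begin{align*}
&\frac{1}{4h} \int_0^s \|\partial_t\rho_{t,\varepsilon} - h\Delta\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt \\
&\quad = \frac{1}{4h} \int_0^s \|\partial_t\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt + \frac{h}{4} \int_0^s \|\Delta\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt - \frac12 \int_0^s (\Delta\rho_{t,\varepsilon}, \partial_t\rho_{t,\varepsilon})_{-1,\rho_{t,\varepsilon}}\,dt \\
&\quad = \frac{1}{4h} \int_0^s \|\partial_t\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt + \frac{h}{4} \int_0^s \|\Delta\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt + \frac12 \mathcal{S}(\rho_{s,\varepsilon}) - \frac12 \mathcal{S}(\rho_{0,\varepsilon}).
\end{align*}

It follows from this and (3.35) that

\[
\frac{1}{4h} \int_0^s \|\partial_t\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt + \frac{h}{4} \int_0^s \|\Delta\rho_{t,\varepsilon}\|^2_{-1,\rho_{t,\varepsilon}}\,dt + \frac12 \mathcal{S}(\rho_{s,\varepsilon}) - \frac12 \mathcal{S}(\rho_{0,\varepsilon}) \le \frac{1}{4h} \int_0^1 \|\partial_t\rho_t - h\Delta\rho_t\|^2_{-1,\rho_t}\,dt.
\]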

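Now letting ε go to zero and by the lower semicontinuity of the entropy and the Fisher information functionals we get S(ρs) < ∞ and $\int_0^s \|\Delta\rho_t\|^2_{-1,\rho_t}\,dt < \infty$. Therefore

\[
\int_0^s \|\partial_t\rho_t\|^2_{-1,\rho_t}\,dt \le 2 \left( \int_0^s \|\partial_t\rho_t - h\Delta\rho_t\|^2_{-1,\rho_t}\,dt + h^2 \int_0^s \|\Delta\rho_t\|^2_{-1,\rho_t}\,dt \right) < \infty
\]

and

\[
\int_0^s \|\Delta\rho_t + \operatorname{div}(\rho_t\nabla H)\|^2_{-1,\rho_t}\,dt \le 2 \left( \int_0^s \|\Delta\rho_t\|^2_{-1,\rho_t}\,dt + \int_0^s \int_{\mathbb{R}^d} |\nabla H(x)|^2 \rho_t(x)\,dx\,dt \right) < \infty.
\]

By Lemma 3.2.1, the curve ρt is in $AC_{W_2}([0,1]; \mathcal{P}_2(\mathbb{R}^d))$. Moreover, t ↦ F(ρt) is absolutely continuous and (3.17) holds. Hence we have

\begin{align*}
&\frac{1}{4h} \int_0^1 \|\partial_t\rho_t - h(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H))\|^2_{-1,\rho_t}\,dt \\
&\quad = \frac{1}{4h} \int_0^1 \|\partial_t\rho_t\|^2_{-1,\rho_t}\,dt + \frac{h}{4} \int_0^1 \|\Delta\rho_t + \operatorname{div}(\rho_t\nabla H)\|^2_{-1,\rho_t}\,dt + \frac12 \mathcal{F}(\rho_1) - \frac12 \mathcal{F}(\rho_0).
\end{align*}

This finishes the proof of the proposition.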

Remark 3.3.9. For the superquadratic case, the above proposition was proved by Feng and Nguyen in [FN12] by using probabilistic tools. In addition, they obtain an estimate for the growth of F along the curves.

Now the following is a straightforward result:

Corollary 3.3.10. Let ρ0 ∈ P2(Rd) with F(ρ0) < ∞. If H ∈ C2(Rd) satisfies either Assumption 3.3.1 or 3.3.4, then

\[
J_h(\rho|\rho_0) = \inf_{\rho(\cdot) \in C_{W_2}(\rho_0,\rho)} \frac{1}{4h} \int_0^1 \|\partial_t\rho_t - h(\Delta\rho_t + \operatorname{div}(\rho_t\nabla H))\|^2_{-1,\rho_t}\,dt.
\]


3.4 Lower bound

In this section we prove the lower bound of the Gamma-convergence (3.8) in our main result, Theorem 3.1.2.

Theorem 3.4.1 (Lower bound). Under the assumptions of Theorem 3.1.2, we have for any ρ1 ∈ P2(Rd) and all sequences $\rho_1^h \in \mathcal{P}_2(\mathbb{R}^d)$ narrowly converging to ρ1

\[
\liminf_{h\to 0} \left( J_h(\rho_1^h|\rho_0) - \frac{W_2^2(\rho_0,\rho_1^h)}{4h} \right) \ge \frac12 \mathcal{F}(\rho_1) - \frac12 \mathcal{F}(\rho_0). \tag{3.38}
\]

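Proof. Take any sequence $\rho_1^h \in \mathcal{P}_2(\mathbb{R}^d)$ narrowly converging to a ρ1 ∈ P2(Rd). We only need to consider those $\rho_1^h$ for which $J_h(\rho_1^h|\rho_0) < \infty$. For each such $\rho_1^h$, by the definition of infimum there exists a curve $\rho_t^h \in C(\rho_0,\rho_1^h)$ satisfying

\[
\frac{1}{4h} \int_0^1 \big\| \partial_t\rho_t^h - h(\Delta\rho_t^h + \operatorname{div}(\rho_t^h\nabla H)) \big\|^2_{-1,\rho_t^h}\,dt \le J_h(\rho_1^h|\rho_0) + h < \infty. \tag{3.39}
\]

By Proposition 3.3.6 for the subquadratic case and [FN12, Lem. 2.6] for the superquadratic case, we have

\begin{align*}
J_h(\rho_1^h|\rho_0) + h
&\ge \frac{1}{4h} \int_0^1 \big\| \partial_t\rho_t^h - h(\Delta\rho_t^h + \operatorname{div}(\rho_t^h\nabla H)) \big\|^2_{-1,\rho_t^h}\,dt \\
&= \frac{1}{4h} \left( \int_0^1 \big\| \partial_t\rho_t^h \big\|^2_{-1,\rho_t^h}\,dt + 2h\big( \mathcal{F}(\rho_1^h) - \mathcal{F}(\rho_0) \big) + h^2 \int_0^1 \|\Delta\rho_t^h + \operatorname{div}(\rho_t^h\nabla H)\|^2_{-1,\rho_t^h}\,dt \right) \\
&= \frac12 \big( \mathcal{F}(\rho_1^h) - \mathcal{F}(\rho_0) \big) + \frac{1}{4h} \int_0^1 \big\| \partial_t\rho_t^h \big\|^2_{-1,\rho_t^h}\,dt + \frac{h}{4} \int_0^1 \|\Delta\rho_t^h + \operatorname{div}(\rho_t^h\nabla H)\|^2_{-1,\rho_t^h}\,dt \\
&\ge \frac12 \big( \mathcal{F}(\rho_1^h) - \mathcal{F}(\rho_0) \big) + \frac{1}{4h} \int_0^1 \big\| \partial_t\rho_t^h \big\|^2_{-1,\rho_t^h}\,dt \\
&\ge \frac12 \big( \mathcal{F}(\rho_1^h) - \mathcal{F}(\rho_0) \big) + \frac{1}{4h} W_2^2(\rho_0,\rho_1^h).
\end{align*}

In the last inequality above we have used the Benamou-Brenier formula (3.14) for the Wasserstein distance. Finally, using $\rho_1^h \to \rho_1$ narrowly with the narrow lower semi-continuity of F, we find that

\[
\liminf_{h\to 0} \left( J_h(\rho_1^h|\rho_0) - \frac{W_2^2(\rho_0,\rho_1^h)}{4h} \right) \ge \frac12 \mathcal{F}(\rho_1) - \frac12 \mathcal{F}(\rho_0).
\]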

3.5 Recovery sequence

In this section we prove the upper bound of the Gamma-convergence (3.8). This will conclude the proof of Theorem 3.1.2.


Theorem 3.5.1 (Recovery sequence). Under the assumptions of Theorem 3.1.2, for any ρ1 ∈ P2(R) there exists a sequence $\rho_1^h \in \mathcal{P}_2(\mathbb{R})$ converging to ρ1 in the Wasserstein metric such that

\[
\limsup_{h\to 0} \left( J_h(\rho_1^h|\rho_0) - \frac{W_2^2(\rho_0,\rho_1^h)}{4h} \right) \le \frac12 \mathcal{F}(\rho_1) - \frac12 \mathcal{F}(\rho_0). \tag{3.40}
\]

As mentioned in Section 3.1, our approach for the recovery sequence only works for d = 1. Hence throughout this section, we will consider d = 1.

The existence of the recovery sequence is proven by making use of the following denseness argument, which is also interesting in its own right⁴:

Proposition 3.5.2. Let (X, d) be a metric space and let Q be a dense subset of X. If Kn, n ∈ N, and K∞ are functions from X to R such that:

(a) Kn(q) → K∞(q) for all q ∈ Q,

(b) for every x ∈ X there exists a sequence qn ∈ Q with qn → x and K∞(qn) → K∞(x),

then for every x ∈ X there exists a sequence rn ∈ Q, with rn → x, such that Kn(rn) → K∞(x).

Proof. The proof is by a diagonal argument. Take any x ∈ X and take the corresponding sequence qn → x such that K∞(qn) → K∞(x). By assumption, for any q ∈ Q and L > 0 there exists an nL,q such that for any n ≥ nL,q there holds d(Kn(q), K∞(q)) < 1/L. Define

\[
\rho_n := \begin{cases} 1, & 1 \le n < n_{2,q_2}, \\ 2, & n_{2,q_2} \le n < \max\{ n_{2,q_2}, n_{3,q_3} \}, \\ \ \vdots \end{cases}
\]

Take the subsequence $r_n := q_{\rho_n}$. Observe that ρn → ∞ as n → ∞, so that indeed $q_{\rho_n} \to x$, and:

\[
d\big( K_n(q_{\rho_n}), K_\infty(x) \big) \le \underbrace{d\big( K_n(q_{\rho_n}), K_\infty(q_{\rho_n}) \big)}_{\le\, 1/\rho_n} + d\big( K_\infty(q_{\rho_n}), K_\infty(x) \big) \to 0.
\]

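For a fixed ρ0 satisfying the assumptions of Theorem 3.1.2, we want to apply Proposition 3.5.2 to the situation where

\begin{align*}
X &= \mathcal{P}_2(\mathbb{R}), \\
Q &= Q(\rho_0) = \Big\{ \rho(x)\,dx \in \mathcal{P}_2(\mathbb{R}) : \rho(x) \text{ is bounded from below by a positive constant in every compact set,} \\
&\qquad\qquad I(\rho),\ \int_{\mathbb{R}} |H'(x)|^2 \rho(x)\,dx < \infty, \text{ and there exists an } M > 0 \text{ such that } \rho_0(x) = \rho(x) \text{ for all } |x| > M \Big\}, \\
K_n(\rho) &= J_{h_n}(\rho\,|\rho_0) - \frac{W_2^2(\rho_0,\rho)}{4h_n}, \quad \text{for an arbitrary sequence } h_n \text{ converging to zero,} \\
K_\infty(\rho) &= \frac12 \mathcal{F}(\rho) - \frac12 \mathcal{F}(\rho_0).
\end{align*}

⁴ A more or less similar idea can be found in [Bra02, Remark 1.29]; Proposition 3.5.2 is slightly stronger.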

Assumption (a) of Proposition 3.5.2, i.e. pointwise convergence for every ρ1 ∈ Q(ρ0), can be proven as follows. Take ρ1 ∈ Q(ρ0) and let ρt be the geodesic that connects ρ0 and ρ1. In the following Lemma 3.5.3, we will prove that I(ρt) and $\int_{\mathbb{R}} |H'(x)|^2 \rho_t(x)\,dx$ are uniformly bounded, so that we have

\begin{align*}
&\int_0^1 \|\partial_t\rho_t - h(\partial_{xx}\rho_t + \partial_x(\rho_t H'))\|^2_{-1,\rho_t}\,dt \\
&\quad \le 3 \int_0^1 \|\partial_t\rho_t\|^2_{-1,\rho_t}\,dt + 3h^2 \int_0^1 \|\partial_{xx}\rho_t\|^2_{-1,\rho_t}\,dt + 3h^2 \int_0^1 \|\partial_x(\rho_t H')\|^2_{-1,\rho_t}\,dt < \infty.
\end{align*}

By Proposition 3.3.6 for the subquadratic case or [FN12, Lem. 2.6] for the superquadratic case, together with Young's inequality:

\begin{align*}
\lim_{h\to 0} \left( J_h(\rho_1|\rho_0) - \frac{W_2^2(\rho_0,\rho_1)}{4h} \right)
&\le \lim_{h\to 0} \left[ \frac{h}{2} \int_0^1 \left( \int_{\mathbb{R}} \Big( \frac{(\rho_t'(x))^2}{\rho_t(x)} + |H'(x)|^2 \rho_t(x) \Big)\,dx \right) dt + \frac12 \mathcal{F}(\rho_1) - \frac12 \mathcal{F}(\rho_0) \right] \\
&= \frac12 \mathcal{F}(\rho_1) - \frac12 \mathcal{F}(\rho_0).
\end{align*}

The pointwise convergence then follows from this together with the lower bound (3.38). It remains to prove the uniform bounds:

Lemma 3.5.3. Let H ∈ C2(R) with H(x) > −A − B|x|2 for some positive constants A and B (this includes both our cases). Let ρ0 = ρ0(x)dx ∈ P2(R) be absolutely continuous with respect to the Lebesgue measure, where ρ0(x) is bounded from below by a positive constant in every compact set. Let ρ1 ∈ Q(ρ0) and ρt be the geodesic that connects ρ0 and ρ1. Assume that E(ρ0), I(ρ0) and $\int_{\mathbb{R}} |H'(x)|^2 \rho_0(x)\,dx$ are all finite. Then F(ρt), I(ρt) and $\int_{\mathbb{R}} |H'(x)|^2 \rho_t(x)\,dx$ are uniformly bounded with respect to t.


Proof. Let T(x) be the optimal map that transports ρ0(dx) to ρ1(dx). The geodesic that connects ρ0 and ρ1 is defined by

\[
\rho_t(x) = \big( (1-t)x + tT(x) \big)_{\#}\,\rho_0(x).
\]

First we prove that I(ρt) is uniformly bounded with respect to t. On the real line, the map T(x) can be determined via the cumulative distribution functions as follows [Vil03, Sect. 2.2]. Let F(x) and G(x) be respectively the cumulative distribution functions of ρ0(dx) and ρ1(dx), i.e.

\[
F(x) = \int_{-\infty}^{x} \rho_0(x)\,dx; \qquad G(x) = \int_{-\infty}^{x} \rho_1(x)\,dx.
\]

Then $T = G^{-1} \circ F$. We have

\[
F(M) + \int_M^{+\infty} \rho_0(x)\,dx = G(M) + \int_M^{+\infty} \rho_1(x)\,dx = 1. \tag{3.41}
\]

From (3.41) and by the assumption that ρ0(x) = ρ1(x) for all |x| > M we find that F(M) = G(M). Hence for all x such that |x| > M we have

\[
F(x) = F(M) + \int_M^x \rho_0(x)\,dx = G(M) + \int_M^x \rho_1(x)\,dx = G(x).
\]

Consequently, for all x with |x| > M we have $T(x) = (G^{-1} \circ F)(x) = x$. Therefore T′(x) = 1 for all |x| > M. Also, since the densities ρ0, ρ1 are absolutely continuous (by assumption) we get that F(x), G(x) are differentiable everywhere with G′(x) = ρ1(x) > 0. We deduce that T(x) has a classical derivative everywhere and moreover, since G(T(x)) = F(x), by differentiating we get that T(x) satisfies the Monge-Ampère equation

\[
\rho_0(x) = \rho_1(T(x))\,T'(x),
\]

or equivalently (since ρ1(x) > 0),

\[
T'(x) = \frac{\rho_0(x)}{\rho_1(T(x))}. \tag{3.42}
\]

Because of (3.42) we have that T′(x) is absolutely continuous and strictly positive. Therefore the derivative of T′ exists almost everywhere. Now for the derivative of T′ we have

\begin{align*}
\frac{T''(x)}{T'(x)} &= \big( \log(T'(x)) \big)' \\
&= \big( \log(\rho_0(x)) - \log(\rho_1(T(x))) \big)' \\
&= \frac{\rho_0'(x)}{\rho_0(x)} - \frac{\rho_1'(T(x))\,T'(x)}{\rho_1(T(x))}.
\end{align*}

Set Tt(x) = tx + (1 − t)T(x). For 0 ≤ t ≤ 1 we have

\[
\rho_t(x) = \rho_1(T_t(x))\,T_t'(x). \tag{3.43}
\]

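Since ρ1(Tt(x)) and T′t(x) are both absolutely continuous, so is ρt(x). Hence the derivative appearing in (3.15) for I(ρt) is the classical derivative. Substituting (3.43) into (3.15) we get

\begin{align*}
\int_{\mathbb{R}} \frac{(\rho_t'(x))^2}{\rho_t(x)}\,dx
&= \int_{\mathbb{R}} \frac{\big[ (\rho_1(T_t(x))\,T_t'(x))' \big]^2}{\rho_1(T_t(x))\,T_t'(x)}\,dx \\
&= \int_{\mathbb{R}} \frac{\big[ \rho_1'(T_t(x))\,T_t'(x)^2 + \rho_1(T_t(x))\,T_t''(x) \big]^2}{\rho_1(T_t(x))\,T_t'(x)}\,dx \\
&\le 2 \int_{\mathbb{R}} \frac{(\rho_1'(T_t(x)))^2\,(T_t'(x))^4}{\rho_1(T_t(x))\,T_t'(x)}\,dx + 2 \int_{\mathbb{R}} \frac{(\rho_1(T_t(x))\,T_t''(x))^2}{\rho_1(T_t(x))\,T_t'(x)}\,dx \\
&= 2 \int_{\mathbb{R}} \frac{(\rho_1'(T_t(x)))^2}{\rho_1(T_t(x))}\,(T_t'(x))^3\,dx + 2 \int_{\mathbb{R}} \frac{\rho_1(T_t(x))\,(T_t''(x))^2}{T_t'(x)}\,dx. \tag{3.44}
\end{align*}

Note that in the inequality above we have used (a + b)² ≤ 2(a² + b²). To proceed we will estimate each term in the right-hand side of (3.44) using the fact that |T′(x)| is bounded and I(ρ0), I(ρ1) < ∞. For the first part we have

\begin{align*}
\int_{\mathbb{R}} \frac{(\rho_1'(T_t(x)))^2}{\rho_1(T_t(x))}\,(T_t'(x))^3\,dx
&= \int_{\mathbb{R}} \frac{(\rho_1'(T_t(x)))^2}{\rho_1(T_t(x))}\,(T_t'(x))\,(T_t'(x))^2\,dx \\
&\le C^2 \int_{\mathbb{R}} \frac{(\rho_1'(T_t(x)))^2}{\rho_1(T_t(x))}\,(T_t'(x))\,dx \\
&= C^2 \int_{\mathbb{R}} \frac{(\rho_1'(x))^2}{\rho_1(x)}\,dx \\
&= C^2 I(\rho_1). \tag{3.45}
\end{align*}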

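Let B be the ball of radius M centered at the origin. Since T′′(x) = 0 for all |x| > M we can restrict our calculation for the second part to the ball B:

\begin{align*}
\int_{\mathbb{R}} \frac{\rho_1(T_t(x))\,(T_t''(x))^2}{T_t'(x)}\,dx
&= \int_B \frac{\rho_1(T_t(x))\,(T_t''(x))^2}{T_t'(x)}\,dx \\
&= \int_B \frac{\rho_1(T_t(x))\,((1-t)\,T''(x))^2}{T_t'(x)}\,dx \\
&= \int_B \rho_1(T_t(x))\,T_t'(x) \left( \frac{T'(x)\,(1-t)}{T_t'(x)} \right)^2 \left( \frac{T''(x)}{T'(x)} \right)^2 dx \\
&= \int_B \rho_1(T_t(x))\,T_t'(x) \left( \frac{T'(x)\,(1-t)}{t + (1-t)\,T'(x)} \right)^2 \left( \frac{\rho_0'(x)}{\rho_0(x)} - \frac{\rho_1'(T(x))\,T'(x)}{\rho_1(T(x))} \right)^2 dx \\
&\le 2 \int_B \rho_1(T_t(x))\,T_t'(x) \left( \frac{\rho_0'(x)}{\rho_0(x)} \right)^2 dx + 2 \int_B \rho_1(T_t(x))\,T_t'(x) \left( \frac{\rho_1'(T(x))\,T'(x)}{\rho_1(T(x))} \right)^2 dx \\
&= 2 \int_B \frac{\rho_1(T_t(x))\,T_t'(x)}{\rho_0(x)} \left( \frac{(\rho_0'(x))^2}{\rho_0(x)} \right) dx + 2 \int_B \frac{\rho_1(T_t(x))\,T_t'(x)\,T'(x)}{\rho_1(T(x))} \left( \frac{(\rho_1'(T(x)))^2}{\rho_1(T(x))}\,T'(x) \right) dx \\
&\le C \left( \int_B \frac{(\rho_0'(x))^2}{\rho_0(x)}\,dx + \int_B \frac{(\rho_1'(T(x)))^2}{\rho_1(T(x))}\,T'(x)\,dx \right) \le C\,\big( I(\rho_0) + I(\rho_1) \big). \tag{3.46}
\end{align*}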

From (3.44), (3.45) and (3.46) we find that

I(ρt) =

∫R

(ρ′t(x))2

ρt(x)dx ≤ C(I(ρ0) + I(ρ1)).

Next we are going to prove the boundedness of the functional∫R|H ′(x)|2ρt(x)dx.

Since T (x) = x for |x| > M we have ρt(x) = ρ1(x) for |x| > M . Hence∫R

|H ′(x)|2ρt(x) dx =

∫B

|H ′(x)|2ρt(x) dx+

∫|x|>M

|H ′(x)|2ρt(x) dx

=

∫B

|H ′(x)|2ρt(x) dx+

∫|x|>M

|H ′(x)|2ρ1(x) dx

≤ C

∫B

ρt(x) dx+

∫|x|>M

|H ′(x)|2ρ1(x) dx

≤ C +

∫|H ′(x)|2ρ1(x) dx <∞.

Now we repeat the same argument for $E(\rho_t)$. Finally, by [Vil09, Cor. 20.13] we get that $S(\rho_0)$ and $S(\rho_1)$ are finite, and the result for $S(\rho_t)$ follows from the fact that $S$ is geodesically convex.

Finally we prove that for $\rho_0$ satisfying the assumptions in the main Theorem 3.1.2, the set $Q(\rho_0)$ is dense in $\mathcal P_2(\mathbb R)$, thus satisfying assumption (b) of Proposition 3.5.2. The idea behind the lemma is a simple modification of a cut-and-glue argument (see Figure 3.1). For a given measure $\rho_1\in\mathcal P_2(\mathbb R)$, we construct a measure that is in some sense nice and close to $\rho_1$ in a compact set, and equal to $\rho_0$ outside of it. To do so, we first find an interval such that the contribution of both measures $\rho_0,\rho_1$ to the functionals $S$ and $E$ is small outside that interval. We cut out the part of $\rho_1$ that lies outside the interval, mollify it to ensure both positivity and smoothness, and then add a quadratic decay to get finiteness of the Fisher information functional (footnote 5). For $\rho_0$ we just keep the tails and add a quadratic decay. The approximating probability measure is then produced by a linear combination of the above constructed measures.

Footnote 5: It is easy to check that a linear decay, which would have been a simpler choice, is not enough to keep the Fisher information functional finite.
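The remark on linear versus quadratic decay can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes a tail of width `a` starting at height `c` (both values hypothetical) and approximates the Fisher information $\int (f')^2/f$ of the tail on $[\varepsilon,a]$, where $s$ is the distance to the point at which the tail reaches zero. For the quadratic profile the integrand is the constant $4c/a^2$; for the linear profile it behaves like $c/(a s)$ and the integral grows like $\log(1/\varepsilon)$.

```python
import math

def fisher_tail(f, df, a, eps, n=2000):
    """Approximate int_eps^a (f'(s))^2 / f(s) ds on a geometric grid.

    s is the distance to the point where the tail vanishes; the geometric
    grid resolves a possible 1/s singularity of the integrand near s = 0.
    """
    ratio = (a / eps) ** (1.0 / n)
    total, left = 0.0, eps
    for _ in range(n):
        right = left * ratio
        mid = math.sqrt(left * right)          # geometric midpoint of the cell
        total += df(mid) ** 2 / f(mid) * (right - left)
        left = right
    return total

a, c = 0.5, 0.1   # hypothetical tail width and boundary height

quad = lambda s: c * (s / a) ** 2      # quadratic decay
dquad = lambda s: 2.0 * c * s / a ** 2
lin = lambda s: c * s / a              # linear decay
dlin = lambda s: c / a

for eps in (1e-2, 1e-4, 1e-6):
    print(eps, fisher_tail(quad, dquad, a, eps), fisher_tail(lin, dlin, a, eps))
```

As `eps` decreases, the first column of values stays near $4c/a$ while the second keeps growing, which is the divergence mentioned in the footnote.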


Lemma 3.5.4. Assume that $\rho_0\in\mathcal P_2(\mathbb R)$ is such that its density is bounded from below by a positive constant on every compact set, and that $F(\rho_0)$, $\int|H'|^{2}\,d\rho_0$ and $I(\rho_0)$ are all finite. Let $H\in C^{2}(\mathbb R)$ satisfy either Assumption 3.3.1 or 3.3.4. Then for any $\rho_1\in\mathcal P_2(\mathbb R)$ there exists a sequence $\rho^h$ in $Q(\rho_0)$ such that $\rho^h\to\rho_1$ in the Wasserstein topology, and $F(\rho^h)\to F(\rho_1)$.

Proof. Take a $\rho_1\in\mathcal P_2(\mathbb R)$ with $E(\rho_1)<\infty$ (otherwise the construction is trivial). First observe that, because $\int x^{2}\rho_1(dx)$, $\int x^{2}\rho_0(dx)$ and $S(\rho_0)$, $S(\rho_1)$ are all finite, $\int|\rho_1\log\rho_1|$ and $\int|\rho_0\log\rho_0|$ are also finite [JKO98, Eq. (15)]. Secondly, $\int|H|\,d\rho_1$ and $\int|H|\,d\rho_0$ are also finite since $H$ is bounded from below in both Assumptions 3.3.1 and 3.3.4. Therefore, for any $h>0$ there exist Lebesgue points $M_h^-<-1$ and $M_h^+>1$ of $\rho_1$ such that (to ease notation we assume that $-M_h^-=M_h^+=:M_h$)
\[
\rho_0(-M_h),\ \rho_1(-M_h)<\min\left\{\frac{h}{|H(-M_h)|},\frac{h}{M_h^{2}}\right\}
\quad\text{and}\quad
\rho_0(M_h),\ \rho_1(M_h)<\min\left\{\frac{h}{|H(M_h)|},\frac{h}{M_h^{2}}\right\},
\tag{3.47a}
\]
\[
\int_{|x|>M_h}\left(\rho_0+|\rho_0\log\rho_0|+x^{2}\rho_0+|H|\rho_0\right)<h,
\tag{3.47b}
\]
\[
\int_{|x|>M_h}\left(\rho_1+|\rho_1\log\rho_1|+x^{2}\rho_1+|H|\rho_1\right)<h.
\tag{3.47c}
\]

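Define a new density by cutting the tails of $\rho_1$, and mollifying it by the Gaussian $\theta_t$ from Lemma 3.3.8:
\[
\mu^h:=\big(\rho_1\,\mathbf 1_{[-M_h,M_h]}\big)*\theta_{t_h},
\tag{3.48}
\]
where $t_h$ is chosen sufficiently small such that
\[
\int_{-M_h}^{M_h}|\mu^h-\rho_1|<h,
\tag{3.49a}
\]
\[
\int_{-M_h}^{M_h}|H\mu^h-H\rho_1|<h
\quad\text{and}\quad
\int_{-M_h}^{M_h}|x^{2}\mu^h-x^{2}\rho_1|<h,
\tag{3.49b}
\]
\[
\left|\int_{-M_h}^{M_h}\left(\mu^h\log\mu^h-\rho_1\log\rho_1\right)\right|<h,
\tag{3.49c}
\]
\[
\mu^h(-M_h)<\min\left\{\frac{h}{|H(-M_h)|},\frac{h}{M_h^{2}}\right\}
\quad\text{and}\quad
\mu^h(M_h)<\min\left\{\frac{h}{|H(M_h)|},\frac{h}{M_h^{2}}\right\},
\tag{3.49d}
\]
\[
\mu^h(x)>0\quad\text{whenever } |x|\le M_h.
\tag{3.49e}
\]
Observe that property (3.49d) is feasible, because $-M_h$ and $M_h$ are Lebesgue points of $\rho_1$ and
\[
\mu^h(M_h)\le(\rho_1*\theta_{t_h})(M_h)
\quad\text{and}\quad
\mu^h(-M_h)\le(\rho_1*\theta_{t_h})(-M_h).
\]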

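In order to construct a suitable approximating sequence for $\rho_1$, small intervals around $-M_h$ and $M_h$ are needed where bounds of the type (3.47a) and (3.49d) still hold. Indeed, because of (3.49d) and the continuity of $H$, there exists $0<a_h<1$ such that for all $x\in[-M_h-a_h,-M_h+a_h]$:
\[
\rho_0(x)<h,\quad\text{and}\quad \rho_0(-M_h)<\min\left\{\frac{h}{|H(x)|},\frac{h}{x^{2}}\right\},
\tag{3.50a}
\]
\[
\mu^h(x)<h,\quad\text{and}\quad \mu^h(-M_h)<\min\left\{\frac{h}{|H(x)|},\frac{h}{x^{2}}\right\},
\tag{3.50b}
\]
and for all $x\in[M_h-a_h,M_h+a_h]$:
\[
\rho_0(x)<h,\quad\text{and}\quad \rho_0(M_h)<\min\left\{\frac{h}{|H(x)|},\frac{h}{x^{2}}\right\},
\tag{3.50c}
\]
\[
\mu^h(x)<h,\quad\text{and}\quad \mu^h(M_h)<\min\left\{\frac{h}{|H(x)|},\frac{h}{x^{2}}\right\}.
\tag{3.50d}
\]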

Note that by assumption $M_h>1$, so that the two intervals cannot overlap.

Now, using these intervals, replace the tails of $\mu^h$, which were introduced by the mollification, by quadratically decaying tails (see Figure 3.1):
\[
\nu^h(x)=
\begin{cases}
\mu^h(x), & |x|\le M_h,\\[2pt]
\left(\dfrac{x-M_h-a_h}{a_h}\right)^{2}\mu^h(M_h), & M_h<x<M_h+a_h,\\[6pt]
\left(\dfrac{x+M_h+a_h}{a_h}\right)^{2}\mu^h(-M_h), & -M_h-a_h<x<-M_h,\\[6pt]
0, & |x|\ge M_h+a_h.
\end{cases}
\]
On the other hand, the approximation sequence for $\rho_1$ requires the same tails as $\rho_0$; these tails are captured by (see Figure 3.1)
\[
\nu_0^h(x)=
\begin{cases}
0, & |x|\le M_h-a_h,\\[2pt]
\left(\dfrac{x-M_h+a_h}{a_h}\right)^{2}\rho_0(M_h), & M_h-a_h<x<M_h,\\[6pt]
\left(\dfrac{x-a_h+M_h}{a_h}\right)^{2}\rho_0(-M_h), & -M_h<x<-M_h+a_h,\\[6pt]
\rho_0(x), & |x|\ge M_h.
\end{cases}
\]
Finally, the approximating sequence is defined as a normalised sum of $\nu^h$ and $\nu_0^h$:
\[
\rho^h(x):=\alpha_h\,\nu^h(x)+\nu_0^h(x),
\tag{3.51}
\]
where $\|\cdot\|_1$ abbreviates the $L^{1}(\mathbb R)$ norm, and
\[
\alpha_h:=\frac{1-\|\nu_0^h\|_1}{\|\nu^h\|_1}.
\]
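The cut-and-glue construction can be illustrated on a grid. The sketch below is only an illustration under stated assumptions: the Gaussian densities standing in for $\rho_0$ and $\rho_1$, the fixed values `M` and `a` (playing the roles of $M_h$ and $a_h$), and the omission of the mollification step (the chosen $\rho_1$ is already smooth and positive) are all hypothetical choices, and the smallness conditions (3.47)-(3.50) are not enforced.

```python
import math

def gauss(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

rho0 = lambda x: gauss(x, 0.0, 1.0)   # hypothetical reference density
rho1 = lambda x: gauss(x, 0.5, 0.7)   # hypothetical target density
M, a = 3.0, 0.5                        # stand-ins for M_h and a_h
mu = rho1                              # assumption: mollification skipped

def nu(x):
    """rho1 on [-M, M] with quadratically decaying tails of width a."""
    if abs(x) <= M:
        return mu(x)
    if M < x < M + a:
        return ((x - M - a) / a) ** 2 * mu(M)
    if -M - a < x < -M:
        return ((x + M + a) / a) ** 2 * mu(-M)
    return 0.0

def nu0(x):
    """Tails of rho0 outside [-M, M], glued to zero by quadratic decay."""
    if abs(x) >= M:
        return rho0(x)
    if M - a < x < M:
        return ((x - M + a) / a) ** 2 * rho0(M)
    if -M < x < -M + a:
        return ((x - a + M) / a) ** 2 * rho0(-M)
    return 0.0

def integrate(f, lo=-12.0, hi=12.0, n=48_000):
    """Midpoint rule; the window [-12, 12] truncates negligible Gaussian mass."""
    w = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * w) for k in range(n)) * w

alpha = (1.0 - integrate(nu0)) / integrate(nu)        # normalisation constant
rho_h = lambda x: alpha * nu(x) + nu0(x)              # analogue of (3.51)
mass = integrate(rho_h)
l1_error = integrate(lambda x: abs(rho_h(x) - rho1(x)))
print(alpha, mass, l1_error)
```

By the choice of `alpha` the glued density has unit mass on the grid, and for these parameter values it stays close to the target in $L^1$.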

[Figure 3.1: The construction of $\nu^h$ and $\nu_0^h$. (a) Cut the tails, add quadratic decay (curves $\mu^h$ and $\nu^h$, with marks at $-M_h$ and $M_h$). (b) Crop the tails, add quadratic decay (curves $\rho_0$ and $\nu_0^h$, with marks at $-M_h$ and $M_h$).]

Now we check that the sequence $\rho^h$ indeed lies in $Q(\rho_0)$. By construction, $\rho^h$ has the same tails as $\rho_0$, and it is bounded from below by a positive constant on compact sets. Moreover, it is straightforward that $\int x^{2}\,d\nu_0^h$, $\int x^{2}\,d\nu^h$, $\int|H'|^{2}\,d\nu_0^h$, $\int|H'|^{2}\,d\nu^h$ and $I(\nu_0^h)$ are all finite; $I(\nu^h)$ is finite by Lemma 3.3.8. Then the functionals $\int x^{2}\,d\rho^h$ and $\int|H'|^{2}\,d\rho^h$ are also finite. To check that the Fisher information remains finite:

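\[
\begin{aligned}
I(\rho^h)&=\int_{\mathbb R}\frac{\big(\alpha_h(\nu^h)'+(\nu_0^h)'\big)^{2}}{\alpha_h\nu^h+\nu_0^h}\,dx\\
&\le 2\int_{\mathbb R}\frac{\big(\alpha_h(\nu^h)'\big)^{2}}{\alpha_h\nu^h+\nu_0^h}\,dx
 +2\int_{\mathbb R}\frac{\big((\nu_0^h)'\big)^{2}}{\alpha_h\nu^h+\nu_0^h}\,dx\\
&\le 2\int_{\mathbb R}\frac{\big(\alpha_h(\nu^h)'\big)^{2}}{\alpha_h\nu^h}\,dx
 +2\int_{\mathbb R}\frac{\big((\nu_0^h)'\big)^{2}}{\nu_0^h}\,dx\\
&=2\alpha_h I(\nu^h)+2I(\nu_0^h)<\infty,
\end{aligned}
\]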

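so that indeed $\rho^h\in Q(\rho_0)$.

Next, the convergence properties of the sequence $\rho^h$ are checked. First we show that $\rho^h\to\rho_1$ in $L^{1}(\mathbb R)$. Since $\|\nu_0^h\|_1\to0$ and $\|\nu^h\|_1\to1$, the normalisation constant also converges: $\alpha_h\to1$. Therefore,
\[
\begin{aligned}
\int_{\mathbb R}|\rho^h-\rho_1|
&=\int_{\mathbb R}\big|\alpha_h\nu^h+\nu_0^h-\rho_1\big|\\
&\le\int_{-M_h}^{M_h}\big|\alpha_h\mu^h-\mu^h\big|+\int_{-M_h}^{M_h}|\mu^h-\rho_1|
 +\int_{|x|>M_h}\alpha_h\nu^h+\int_{|x|>M_h}\rho_1+\int_{\mathbb R}\nu_0^h\\
&\le\int_{-M_h}^{M_h}\big|\alpha_h\mu^h-\mu^h\big|+\int_{-M_h}^{M_h}|\mu^h-\rho_1|
 +\alpha_h\mu^h(-M_h)\int_{-M_h-a_h}^{-M_h}\left(\frac{x+M_h+a_h}{a_h}\right)^{2}\\
&\qquad+\alpha_h\mu^h(M_h)\int_{M_h}^{M_h+a_h}\left(\frac{x-M_h-a_h}{a_h}\right)^{2}
 +\int_{|x|>M_h}\rho_1+\|\nu_0^h\|_1\\
&\le|\alpha_h-1|\int_{-M_h}^{M_h}\mu^h+\int_{-M_h}^{M_h}|\mu^h-\rho_1|
 +\alpha_h a_h\,\mu^h(-M_h)+\alpha_h a_h\,\mu^h(M_h)+\int_{|x|>M_h}\rho_1+\|\nu_0^h\|_1\\
&\le|\alpha_h-1|+h+\alpha_h\mu^h(-M_h)+\alpha_h\mu^h(M_h)+h+\|\nu_0^h\|_1
 \xrightarrow[h\to0]{}0,
\end{aligned}
\tag{3.52}
\]
where the last line follows from $a_h<1$ together with (3.49a) and (3.47c).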

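Secondly, we check the convergence of the second moments $\int_{\mathbb R}x^{2}\rho^h\to\int_{\mathbb R}x^{2}\rho_1$. Observe that there is a uniform bound on
\[
\int_{-M_h}^{M_h}x^{2}\mu^h
\le\int_{-M_h}^{M_h}\big|x^{2}\mu^h-x^{2}\rho_1\big|+\int_{-M_h}^{M_h}x^{2}\rho_1
\overset{(3.49b)}{<}h+\int_{\mathbb R}x^{2}\rho_1
\le1+\int_{\mathbb R}x^{2}\rho_1
\tag{3.53}
\]
for $h\le1$. Moreover, for the right-side quadratic tail of $\nu^h$:
\[
\int_{M_h}^{M_h+a_h}x^{2}\nu^h
=\int_{M_h}^{M_h+a_h}x^{2}\left(\frac{x-M_h-a_h}{a_h}\right)^{2}\mu^h(M_h)\,dx
\le\int_{M_h}^{M_h+a_h}x^{2}\mu^h(M_h)\,dx
\overset{(3.50d)}{<}h\,a_h\le h,
\tag{3.54}
\]
and similarly for the other quadratically decaying parts of $\nu^h$ and $\nu_0^h$. Therefore
\[
\begin{aligned}
\int_{\mathbb R}|x^{2}\rho^h-x^{2}\rho_1|
&\le\int_{\mathbb R}|\alpha_h x^{2}\nu^h-x^{2}\nu^h|+\int_{\mathbb R}|x^{2}\nu^h-x^{2}\rho_1|+\int_{\mathbb R}x^{2}\nu_0^h\\
&\le|\alpha_h-1|\int_{-M_h}^{M_h}x^{2}\mu^h+|\alpha_h-1|\int_{|x|>M_h}x^{2}\nu^h
 +\int_{-M_h}^{M_h}|x^{2}\mu^h-x^{2}\rho_1|+\int_{|x|>M_h}x^{2}\nu^h\\
&\qquad+\int_{|x|>M_h}x^{2}\rho_1+\int_{-M_h}^{M_h}x^{2}\nu_0^h+\int_{|x|>M_h}x^{2}\rho_0\\
&\le|\alpha_h-1|\left(1+\int_{\mathbb R}x^{2}\rho_1+2h\right)+7h\to0
\end{aligned}
\]
as $h\to0$, where the last line follows from (3.53), (3.54), (3.47b), (3.47c) and (3.49b). Since the sequence $\rho^h$ converges strongly in $L^{1}(\mathbb R)$ to $\rho_1$ by (3.52), it also converges narrowly. Together with the convergence of the second moments, this implies convergence in the Wasserstein distance [Vil03, Th. 7.12], which was to be shown.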

Thirdly, we need to check that $E(\rho^h)\to E(\rho_1)$; this is proven in the same way as the convergence of the second moments above, where $x^{2}$ is replaced by the potential $H(x)$.

Finally, we prove the convergence of the entropies $S(\rho^h)\to S(\rho_1)$. Because of
\[
|S(\rho^h)-S(\rho_1)|\le|S(\nu^h)-S(\rho_1)|+|S(\rho^h)-S(\nu^h)|,
\tag{3.55}
\]
it suffices to show that both differences on the right-hand side vanish. For the first difference:
\[
S(\nu^h)=\int_{-M_h}^{M_h}\left(\nu^h\log\nu^h-\rho_1\log\rho_1\right)
+\int_{-M_h}^{M_h}\rho_1\log\rho_1
+\int_{M_h<|x|<M_h+a_h}\nu^h\log\nu^h\to S(\rho_1).
\]
Here, the first term vanishes by (3.49c), and the third term, containing the quadratically decaying tails, vanishes because $\mu^h(-M_h)$ and $\mu^h(M_h)$ vanish. For the second difference

Chapter 4

GENERIC structure of the Vlasov-Fokker-Planck equation

In this chapter we discuss the connections between a Vlasov-Fokker-Planck equation and an underlying microscopic particle system, and we interpret those connections in the context of the GENERIC framework (Öttinger 2005). This interpretation provides (a) a variational formulation for GENERIC systems, (b) insight into the origin of this variational formulation, and (c) an explanation of the origins of the conditions that GENERIC places on its constitutive elements, notably the so-called degeneracy or non-interaction conditions. This work shows how the general connection between large-deviation principles on one hand and gradient-flow structures on the other hand extends to non-reversible particle systems (footnote 1).

4.1 Introduction

4.1.1 The Vlasov-Fokker-Planck equation

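This chapter deals with the Vlasov-Fokker-Planck (VFP) equation (1.5),
\[
\partial_t\rho=-\operatorname{div}_q\left(\rho\,\frac{p}{m}\right)
+\operatorname{div}_p\rho\left(\nabla_qV+\nabla_q\psi*\rho+\gamma\frac{p}{m}\right)
+\gamma\theta\,\Delta_p\rho.
\tag{4.1}
\]
The VFP equation is the hydrodynamic limit of a collection of interacting Brownian particles with inertia, given by the following stochastic differential equation (1.6):
\[
dQ_i(t)=\frac{P_i(t)}{m}\,dt,
\tag{4.2a}
\]
\[
dP_i(t)=-\nabla V(Q_i(t))\,dt-\sum_{j=1}^{n}\nabla\psi\big(Q_i(t)-Q_j(t)\big)\,dt-\frac{\gamma}{m}P_i(t)\,dt+\sqrt{2\gamma\theta}\,dW_i(t).
\tag{4.2b}
\]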
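To make the particle system concrete, here is a minimal Euler-Maruyama sketch in one space dimension. It is an illustration only, under several assumptions that are not taken from the text: the potentials $V(q)=q^2/2$ and $\psi(r)=\sqrt{1+r^2}-1$ are hypothetical choices, the interaction is scaled by $1/n$ (the mean-field form $\nabla\psi*\rho_n$), the initial data are sampled from a seeded generator, and the function name `simulate` and all parameter values are invented for this example.

```python
import math
import random

def grad_V(q):
    # V(q) = q^2 / 2: hypothetical confining potential, V >= 0, bounded V''
    return q

def grad_psi(r):
    # psi(r) = sqrt(1 + r^2) - 1: hypothetical interaction, bounded psi', psi''
    return r / math.sqrt(1.0 + r * r)

def simulate(n=30, steps=1000, dt=1e-3, m=1.0, gamma=1.0, theta=0.5, seed=0):
    """Explicit Euler-Maruyama steps for the (Q, P) system with d = 1."""
    rng = random.Random(seed)
    q = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise = math.sqrt(2.0 * gamma * theta * dt)   # std of sqrt(2 gamma theta) dW
    for _ in range(steps):
        # Assumption: mean-field scaling (1/n) * sum_j grad psi(Q_i - Q_j).
        force = [-grad_V(qi) - sum(grad_psi(qi - qj) for qj in q) / n for qi in q]
        q_new = [qi + pi / m * dt for qi, pi in zip(q, p)]
        p = [pi + (fi - gamma / m * pi) * dt + noise * rng.gauss(0.0, 1.0)
             for pi, fi in zip(p, force)]
        q = q_new
    return q, p

q, p = simulate()
# Empirical second moment of the momenta; for long runs one would expect
# it to approach m * theta, the variance of the stationary momentum law.
print(sum(pi * pi for pi in p) / len(p))
```

The empirical measure of the returned pairs `(q[i], p[i])` is the discrete object whose limit as $n\to\infty$ is described by the VFP equation.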

Footnote 1: This chapter is joint work with Mark A. Peletier and Johannes Zimmer [DPZ13b]. The paper has been selected by the editors of Nonlinearity for inclusion in the exclusive 2013 Highlights Collection.


4.1.2 Aim of the chapter

The GENERIC framework described in Section 1.3 provides a systematic method to derive thermodynamically consistent evolution equations. It was originally introduced in the context of complex fluids [OG97a, OG97b], and more recently has been applied to anisotropic inelastic solids [HT08a], to viscoplastic solids [HT08b], and to thermoelastic dissipative materials [Mie11]. The key ingredients of GENERIC are its building blocks: a Poisson operator $\mathsf L$, a dissipative operator $\mathsf M$, an energy functional $\mathsf E$, and an entropy functional $\mathsf S$, which are required to satisfy certain properties, see Section 1.3. Although many equations have been shown to have a GENERIC structure, two important aspects have not been addressed.

The first one is the relationship between the GENERIC framework on one hand and large deviations of underlying microscopic particle systems on the other. It is well known that many deterministic evolution equations can be derived as hydrodynamic limits of a stochastic particle system. More recently it has become clear that the connection between particle systems and their upscaled limits runs deeper: gradient-flow structures of the limit equations arise as characterizations of the large-deviation behaviour of the stochastic particle systems, thus explaining amongst other things the origin of the Wasserstein gradient flows [ADPZ11, ADPZ13, DLZ12, PRV13, Ren13]. In this chapter we generalize this relationship beyond gradient flows to an example from the class of GENERIC systems.

The second aspect is a variational structure for GENERIC systems. The study of variational structure has important consequences for the analysis of an evolution equation. It provides general methods for proving well-posedness [AGS08] and characterizing large-time behaviour (e.g., [CMV03]), gives rise to natural numerical discretizations (e.g., [DMM10]), and creates handles for the analysis of singular limits (e.g., [SS04, Ste08, AMP+12]). The appearance of the concepts of energy and entropy in the formulation of GENERIC suggests a strong variational connection, but to date this has not been made explicit. In this chapter we exhibit such a variational structure, and as in the case of the gradient flows, this structure is intimately tied to the large-deviation behaviour of an underlying system.

In this chapter we treat some of these questions in full generality, that is, for a general, abstract GENERIC system. Because of this generality the treatment is necessarily formal. We illustrate the abstract features with a specific system, that of the Vlasov-Fokker-Planck equation (4.1), for which the large-deviation behaviour has been proved rigorously. This gives a specific case in which the impact of the abstract arguments can be recognized. We also discuss some generalizations at the end of the chapter.


In Section 4.2, we construct a large-deviation principle for the SDE (4.2) associated with the VFP equation. Next, in Section 4.3 we construct a GENERIC structure for the VFP equation and reformulate the large-deviation rate function in this context. Finally, in Section 4.4 we deduce from the large-deviation result a variational formulation for the VFP equation and more generally for any GENERIC system.

Having connected the GENERIC structure with particle systems and large deviations, in Section 4.6 we use this connection to understand the origin and interpretation of the various properties of GENERIC listed in Section 1.3. We conclude with some generalizations.

4.2 Large deviations for the VFP equation

For many gradient-flow systems it is now understood that the gradient-flow structure itself arises from the fluctuation behaviour of an underlying stochastic process [DG89, ADPZ11, ADPZ13, DLZ12, PRV13, DLR13, Ren13]. The theory of large deviations allows one to make this statement precise. We now apply the same ideas to the VFP equation.

We first specify our conditions on the functions $\psi$ and $V$. Since we are interested in presenting ideas rather than obtaining the most general results, we choose fairly restrictive conditions on $V$ and $\psi$ to eliminate technical complications:
\[
V\in C^{2}(\mathbb R^{d})\ \text{with globally bounded second derivatives, and } V\ge0;
\tag{4.3a}
\]
\[
\psi\in C^{2}(\mathbb R^{d})\ \text{with globally bounded first and second derivatives, and } \psi\ge0.
\tag{4.3b}
\]
In addition, we assume that the initial datum $\rho_0$ satisfies
\[
\rho_0\in\mathcal P(\mathbb R^{2d})\ \text{with } \mathcal H(\rho_0)<\infty,
\tag{4.3c}
\]
where $\mathcal H$ is defined in (4.28a). With these assumptions,

• Given a deterministic starting position, the stochastic differential equation (4.2) has strong solutions that are weakly unique (see e.g. [KS91, Chapter 3]) and non-explosive (e.g. [Wu01]);

• The VFP equation (4.1) is well-defined in the distributional sense and has a unique distributional solution with initial datum $\rho_0$ [Mel96].

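Given a realization $(Q_i,P_i)_{i=1}^{n}$ of the particle system (4.2), we define the empirical measure
\[
\rho_n:[0,\infty)\to\mathcal P(\mathbb R^{2d}),\qquad
\rho_n(t):=\frac1n\sum_{i=1}^{n}\delta_{(Q_i(t),P_i(t))}.
\]
Theorem 4.2.4 below states that the random variable $\rho_n$ satisfies a large-deviation principle as $n\to\infty$.

For the theorem below we equip $\mathcal P(\mathbb R^{2d})$ with the weak or narrow topology, generated by the duality with $C_b(\mathbb R^{2d})$, so that the space $C([0,T];\mathcal P(\mathbb R^{2d}))$ consists of narrowly continuous curves in $\mathcal P(\mathbb R^{2d})$.

Define for $\nu\in\mathcal P(\mathbb R^{2d})$ the parametrized generator
\[
A_\nu:D(A_\nu)\subset C_b(\mathbb R^{2d})\to C_b(\mathbb R^{2d}),\qquad
A_\nu f:=\frac{p}{m}\cdot\nabla_qf-\left[\nabla_qV+\nabla_q\psi*\nu+\gamma\frac{p}{m}\right]\cdot\nabla_pf+\gamma\theta\,\Delta_pf.
\]
Note that equation (4.1) can be written in terms of the transpose $A^{\tau}$ as
\[
\partial_t\rho_t=A^{\tau}_{\rho_t}\rho_t.
\]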

For the formulation of the rate function we will also need the concept of absolute continuity in distributional sense. For a compact set $K\subset\mathbb R^{2d}$, the space $\mathcal D_K$ is the set of all $f\in C_c^{\infty}(\mathbb R^{2d})$ with $\operatorname{supp}f\subset K$; the set $\mathcal D$ is the union of all $\mathcal D_K$, with the usual test-function topology.

Definition 4.2.1. A curve $[0,T]\ni t\mapsto\rho_t\in\mathcal P(\mathbb R^{2d})$ is called absolutely continuous in distributional sense if it has the following property: for each compact $K\subset\mathbb R^{2d}$ there exists a neighbourhood $U_K$ of $0$ in $\mathcal D_K$ and an absolutely continuous function $G_K:[0,T]\to\mathbb R$ such that
\[
\forall\,0\le t_1\le t_2\le T,\ \forall f\in U_K:\qquad
|\langle\rho_{t_1},f\rangle-\langle\rho_{t_2},f\rangle|\le|G_K(t_1)-G_K(t_2)|.
\]
The set of all such curves is denoted $AC([0,T];\mathcal P(\mathbb R^{2d}))$. If $\rho$ is absolutely continuous, then for almost all $t\in[0,T]$ the time derivative $\partial_t\rho_t$ exists in $\mathcal D'(\mathbb R^{2d})$. The proof of this and other properties of this concept can be found in [DG87, Section 4].

Finally, we define the norm that will measure the magnitude of fluctuations:

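Definition 4.2.2. Fix $\rho\in\mathcal P(\mathbb R^{2d})$. For any distribution $\mathcal T\in\mathcal D'(\mathbb R^{2d})$ define
\[
\|\mathcal T\|_{-1,\rho}^{2}:=\sup_{f\in C_c^{\infty}(\mathbb R^{2d})}\left\{2\langle\mathcal T,f\rangle-\int_{\mathbb R^{2d}}|\nabla_pf|^{2}\,d\rho\right\}.
\tag{4.4}
\]
Define $L^{2}_{\nabla}(\rho)$ as the completion of $\{\nabla_pf:f\in C_c^{\infty}(\mathbb R^{2d})\}$ with respect to the norm
\[
\|\cdot\|_{\rho}^{2}:=\int_{\mathbb R^{2d}}|\cdot|^{2}\,d\rho.
\]
Note that, depending on $\rho$, $\|\cdot\|_{\rho}$ may be only a seminorm and not a norm; but since the completion identifies elements that have zero distance in this seminorm, $L^{2}_{\nabla}(\rho)$ is a well-defined Hilbert space. Its elements are equivalence classes of measurable functions that are $\rho$-a.e. equal. Also note that whenever $\mathcal H(\rho)<\infty$, the function $(q,p)\mapsto p$ belongs to $L^{2}_{\nabla}(\rho)$.

The dual norm $\|\cdot\|_{-1,\rho}$ has an explicit representation:

Lemma 4.2.3. It holds that
\[
\|\mathcal T\|_{-1,\rho}^{2}=
\begin{cases}
\displaystyle\int_{\mathbb R^{2d}}|h|^{2}\,d\rho & \text{if } \mathcal T=-\operatorname{div}_p(\rho h)\ \text{with } h\in L^{2}_{\nabla}(\rho),\\[6pt]
+\infty & \text{otherwise.}
\end{cases}
\]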


Proof. Results of this type are common; this argument is adapted from [DG87]. Since there is a one-to-one correspondence between $f\in C_c^{\infty}(\mathbb R^{2d})$ and $\nabla_pf\in\mathcal L:=\{\nabla_pf:f\in C_c^{\infty}(\mathbb R^{2d})\}$, $\mathcal T$ can be considered to be a linear functional on $\mathcal L$. If $\|\mathcal T\|_{-1,\rho}<\infty$, we can replace $f$ by $\lambda f$ and optimize with respect to $\lambda\in\mathbb R$ in (4.4). We then find that
\[
|\langle\mathcal T,f\rangle|\le\|\mathcal T\|_{-1,\rho}\,\|\nabla_pf\|_{\rho}.
\]
Therefore $\mathcal T$ is bounded with respect to the $L^{2}_{\nabla}(\rho)$-norm; it can be uniquely extended to a bounded linear functional on the whole of $L^{2}_{\nabla}(\rho)$, and Riesz' representation theorem implies the assertion of the Lemma.
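The mechanism behind the lemma is a completion of the square, which can be seen in a finite-dimensional analogue. The sketch below is an assumption-laden toy, not the setting of the lemma: $\rho$ is replaced by positive edge weights `w` on a path graph, $\nabla_p$ by the forward difference, and $\mathcal T=-\operatorname{div}_p(\rho h)$ by the pairing $\langle\mathcal T,f\rangle=\sum_i w_ih_i(Df)_i$. The objective then equals $\sum_iw_ih_i^2-\sum_iw_i(h_i-(Df)_i)^2$, so its supremum is $\sum_iw_ih_i^2$.

```python
import random

def objective(f, h, w):
    """Discrete analogue of 2<T, f> - int |grad f|^2 d(rho), with T = D^T (w h)."""
    df = [f[i + 1] - f[i] for i in range(len(h))]
    return sum(2.0 * wi * hi * di - wi * di * di for wi, hi, di in zip(w, h, df))

rng = random.Random(1)
edges = 20
w = [rng.uniform(0.1, 1.0) for _ in range(edges)]    # hypothetical stand-in for rho
h = [rng.uniform(-1.0, 1.0) for _ in range(edges)]   # hypothetical stand-in for h

# On a path graph every edge field is a discrete gradient:
# f_opt has increments h, so that D f_opt = h.
f_opt = [0.0]
for hi in h:
    f_opt.append(f_opt[-1] + hi)

dual_norm_sq = sum(wi * hi * hi for wi, hi in zip(w, h))   # analogue of int |h|^2 d(rho)
best = objective(f_opt, h, w)
trial = max(objective([x + rng.gauss(0.0, 0.3) for x in f_opt], h, w)
            for _ in range(200))
print(dual_norm_sq, best, trial)
```

Random perturbations of the optimiser never exceed `best`, which coincides with the weighted squared norm of `h` up to rounding.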

We can now state the large-deviation principle.

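Theorem 4.2.4. Assume that the initial data $(Q_i(0),P_i(0))$, $i=1,\dots,n$, are deterministic and chosen such that $\rho_n(0)\rightharpoonup\rho_0$ for some $\rho_0\in\mathcal P(\mathbb R^{2d})$. Then the empirical process $\rho_n$ satisfies a large-deviation principle in the space $C([0,T],\mathcal P(\mathbb R^{2d}))$, with good rate function
\[
I(\rho)=
\begin{cases}
\displaystyle\frac{1}{4\gamma\theta}\int_0^{T}\big\|\partial_t\rho_t-A^{\tau}_{\rho_t}\rho_t\big\|_{-1,\rho_t}^{2}\,dt
& \text{if } \rho\in AC([0,T];\mathcal P(\mathbb R^{2d}))\ \text{and } \rho|_{t=0}=\rho_0,\\[8pt]
+\infty & \text{otherwise.}
\end{cases}
\tag{4.5}
\]
The rate function $I$ can also be written as
\[
I(\rho)=
\begin{cases}
\displaystyle\frac{1}{4\gamma\theta}\int_0^{T}\!\!\int_{\mathbb R^{2d}}|h_t|^{2}\,d\rho_t\,dt
& \text{if } \partial_t\rho_t=A^{\tau}_{\rho_t}\rho_t-\operatorname{div}_p(\rho_th_t)\ \text{for } h\in L^{2}(0,T;L^{2}_{\nabla}(\rho_t))\ \text{and } \rho|_{t=0}=\rho_0,\\[8pt]
+\infty & \text{otherwise.}
\end{cases}
\tag{4.6}
\]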

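Proof. We set $x=(q,p)$ and $b(x,\nu)=\big(p/m,\,-\nabla V(q)-(\nabla\psi*\nu)(q)-\gamma p/m\big)$ for $\nu\in\mathcal P(\mathbb R^{2d})$. Then $b:\mathbb R^{2d}\times\mathcal P(\mathbb R^{2d})\to\mathbb R^{2d}$ is continuous and, by the assumptions (4.3), satisfies the estimate
\[
|b(x,\nu)\cdot x|\le C(1+|x|^{2})\qquad\text{for all } x\in\mathbb R^{2d}\ \text{and } \nu\in\mathcal P(\mathbb R^{2d}).
\]
The system (4.2) can be written as a system of weakly interacting diffusions
\[
dX_i(t)=b(X_i(t),\rho_n(t))\,dt+\sigma\,dW_i(t),
\tag{4.7}
\]
where the $W_i$ are $d$-dimensional standard Wiener processes and, for the length of this proof, $\sigma$ is the $2d\times d$ matrix
\[
\sigma=\sqrt{2\gamma\theta}\begin{pmatrix}0\\ I_d\end{pmatrix}.
\]
Theorem 3.1 and Remark 3.2 of [BDF12] imply that $\rho_n$ satisfies a large-deviation principle with rate function
\[
\tilde I(\rho):=\inf\,\mathbb E\left[\frac12\int_0^{T}|U_t|^{2}\,dt\right],
\]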

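where the infimum is taken over all processes $(\bar X,U,W)$ taking values in $\mathbb R^{2d}\times\mathbb R^{d}\times\mathbb R^{d}$ that solve
\[
d\bar X_t=b(\bar X_t,\rho_t)\,dt+\sigma U_t\,dt+\sigma\,dW_t,
\tag{4.8a}
\]
\[
W\ \text{is a standard } d\text{-dimensional Wiener process},
\tag{4.8b}
\]
\[
\operatorname{law}\bar X_t=\rho_t\quad\text{for all } t.
\tag{4.8c}
\]
For each such triple, for any $f\in C_c^{\infty}(\mathbb R\times\mathbb R^{2d})$ the process
\[
M_t:=f_t(\bar X_t)-f_0(\bar X_0)-\int_0^{t}\big[(\partial_s+A_{\rho_s}+(\sigma U_s)\cdot\nabla)f_s\big](\bar X_s)\,ds
\]
is a martingale, and therefore $\mathbb EM_t=\mathbb EM_0=0$ for every $t>0$.

We now show (4.5) by showing that $I=\tilde I$. Define for any $\rho\in C([0,T];\mathcal P(\mathbb R^{2d}))$ and $f\in C_c^{\infty}(\mathbb R\times\mathbb R^{2d})$,
\[
J(\rho,f):=\int_{\mathbb R^{2d}}f_T\,d\rho_T-\int_{\mathbb R^{2d}}f_0\,d\rho_0
-\int_0^{T}\!\!\int_{\mathbb R^{2d}}\big[(\partial_s+A_{\rho_s})f_s\big]\,d\rho_s\,ds
-\gamma\theta\int_0^{T}\!\!\int_{\mathbb R^{2d}}|\nabla_pf_t|^{2}\,d\rho_t\,dt.
\]
It is well known (see e.g. [DG87, Lemma 4.8]) that
\[
I(\rho)=\sup_{f\in C_c^{\infty}(\mathbb R\times\mathbb R^{2d})}J(\rho,f).
\]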

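We have for any $f\in C_c^{\infty}(\mathbb R\times\mathbb R^{2d})$ and for any solution $(\bar X,U,W)$ of (4.8),
\[
\mathbb E\left[\frac12\int_0^{T}|U_t|^{2}\,dt\right]
=\mathbb E\left[\int_0^{T}\Big(U_t\cdot\nabla_pf_t(\bar X_t)-\frac12|\nabla_pf_t(\bar X_t)|^{2}\Big)dt\right]
+\mathbb E\left[\frac12\int_0^{T}|U_t-\nabla_pf_t(\bar X_t)|^{2}\,dt\right].
\]
Using $\mathbb EM_T=0$ we rewrite this as
\[
\begin{aligned}
&\mathbb E\left[f_T(\bar X_T)-f_0(\bar X_0)-\int_0^{T}\big[(\partial_s+A_{\rho_s})f_s\big](\bar X_s)\,ds-\frac12\int_0^{T}|\nabla_pf_s(\bar X_s)|^{2}\,ds\right]
+\mathbb E\left[\frac12\int_0^{T}|U_t-\nabla_pf_t(\bar X_t)|^{2}\,dt\right]\\
&\qquad=J\Big(\rho,\frac{f}{\sqrt{2\gamma\theta}}\Big)+\mathbb E\left[\frac12\int_0^{T}|U_t-\nabla_pf_t(\bar X_t)|^{2}\,dt\right].
\end{aligned}
\tag{4.9}
\]
Therefore
\[
\tilde I(\rho)=\inf\,\mathbb E\left[\frac12\int_0^{T}|U_t|^{2}\,dt\right]\ge\sup_fJ(\rho,f)=I(\rho).
\]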

To prove the converse inequality, assume without loss of generality that $I(\rho)<\infty$. Using a reasoning similar to the proof of Lemma 4.2.3 we find that there exists an $h\in L^{2}(0,T;L^{2}_{\nabla}(\rho_t))$ such that
\[
\partial_t\rho_t-A^{\tau}_{\rho_t}\rho_t=-\sqrt{2\gamma\theta}\,\operatorname{div}_p\rho_th_t\qquad\text{in the sense of distributions.}
\tag{4.10}
\]
Here the space $L^{2}(0,T;L^{2}_{\nabla}(\rho_t))$ is the Hilbert space obtained by closing $C_c^{\infty}(\mathbb R\times\mathbb R^{2d})$ with respect to the (semi-)norm
\[
\|f\|_{\rho,T}^{2}:=\int_0^{T}\!\!\int_{\mathbb R^{2d}}|f(x,t)|^{2}\,\rho_t(dx)\,dt.
\tag{4.11}
\]

We now construct a specific solution of (4.8). Let (X,W ) be a solution of (4.8a) with

U = 0 and law X0 = ρ0; let P be the law of (X,W ) on C([0, T ]; R2d)×C([0, T ]; Rd). Since‖h‖ρ,T <∞, the process

Nt := σ

∫ t

0

hs(Xs) dWs

is a P -square integrable continuous martingale with quadratic variation 〈N〉t = 2γθt.Define Ph as the modified law on C([0, T ]; R2d)× C([0, T ]; Rd) given by

Ph := exp[NT − 1

2〈N〉T

]P.

By the Girsanov theorem (e.g. [IW81, Section IV.4]) Ph is the law of the unique solution(X,W ) of equation (4.8a) with Ut = ht(Xt), and since equation (4.10) is the correspondingFokker-Planck equation, it follows that the law of Xt is equal to ρt. Therefore (X, hX,W )is a solution of (4.8). Using (4.9) for this solution, we find for all f that

I(ρ) ≤ J(ρ,

f

2γθ

)+

1

4γθE[∫ T

0

|ht(Xt)−∇pft(Xt)|2 dt]

≤ I(ρ) +1

4γθE[∫ T

0

|ht(Xt)−∇pft(Xt)|2 dt]

= I(ρ) +1

4γθ

∫ T

0

∫R2d

|ht(ξ)−∇pft(ξ)|2 ρt(dξ)dt.

Since L2(0, T ;L2∇(ρt)) is the closure of C∞c under the norm (4.11),

inff∈C∞c (R×R2d)

∫ T

0

∫R2d

|ht(ξ)−∇pft(ξ)|2 ρt(dξ)dt = 0.

Hence I(ρ) ≤ I(ρ) and this concludes the proof of (4.5). The form as in (4.6) of I thenfollows from (4.5) and Lemma 4.2.3.

Remark 4.2.5. The structure of the large-deviation result of Theorem 4.2.4 reflects a number of properties of the stochastic particle system (4.2). To start with, the rate function is only finite if $\partial_t\rho-A^{\tau}_{\rho}\rho$ has a perturbation only in the $p$-direction, not in the $q$-direction; this reflects the fact that in (4.2) the noise is confined to the $P$-equation. In addition, the perturbation can only be in divergence form; this reflects the deterministic conservation of particles. Finally, the flux is of the form $\rho h$ where $h$ is in the closure $L^{2}_{\nabla}(\rho)$ of $p$-gradients; this property is also seen in the characterization of absolutely continuous curves in the Wasserstein metric [AGS08, Theorem 8.3.2].

Remark 4.2.6. There is a large literature on large-deviation principles for stochastic particle systems; here we just mention a few results. Dawson and Gärtner [DG87] prove a large-deviations result for systems of interacting particles with non-degenerate diffusion, i.e., for nonsingular mobilities $\sigma$ with range $\mathbb R^{2d}$. Cattiaux and Léonard [CL94, CL95a] generalize the method of Dawson and Gärtner to singular mobilities, but for independent particles. In a separate paper [CL95b], Cattiaux and Léonard also discuss the identification question treated in the proof of Theorem 4.2.4 in more generality. Fischer [Fis12] also proves identification results on related systems.

In the proof above we used the large-deviation result by Budhiraja et al. [BDF12] to obtain the large-deviation principle itself and a first characterization of the rate functional. The methods by which we identified $\tilde I$ with $I$ are standard, but we did not find a theorem that suited our needs, and therefore we gave a separate proof.

Remark 4.2.7. If the initial datum for the particle system is random, rather than deterministic as in Theorem 4.2.4, then we expect that the sequence $\rho_n$ satisfies a large-deviation principle with rate function $I(\rho)+I_0(\rho|_{t=0})$, where $I_0$ is the rate function of the initial data $\rho_n|_{t=0}$.

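For the sequel it will be useful to have a regularity result on the Hamiltonian $\mathcal H$ (see (4.28a)) associated with those curves $\rho$ for which $I(\rho)$ is finite:

Lemma 4.2.8. If $I(\rho)<\infty$ and $\mathcal H(\rho_0)<\infty$, then the function $t\mapsto\mathcal H(\rho_t)$ is an element of $W^{1,2}(0,T)$, and $\int_{\mathbb R^{2d}}p^{2}\,d\rho_t\in L^{\infty}(0,T)$.

Proof. By (4.3), $\mathcal H(\rho)$ bounds the integral $\int p^{2}/m^{2}\,d\rho$ from above. Using the characterization of $I$ in (4.6), we formally calculate that
\[
\begin{aligned}
\partial_t\mathcal H(\rho_t)
&=\frac{\gamma\theta d}{m}-\gamma\int\frac{p^{2}}{m^{2}}\,d\rho_t-\int\frac{p}{m}\cdot h_t\,d\rho_t\\
&\le\frac{\gamma\theta d}{m}-\gamma\int\frac{p^{2}}{m^{2}}\,d\rho_t+\gamma\int\frac{p^{2}}{m^{2}}\,d\rho_t+\frac{1}{4\gamma}\int|h_t|^{2}\,d\rho_t\\
&=\frac{\gamma\theta d}{m}+\frac{1}{4\gamma}\int|h_t|^{2}\,d\rho_t.
\end{aligned}
\tag{4.12}
\]
This calculation can be made rigorous in its time-integrated form by approximating $p^{2}/m^{2}+V(q)$ by a sequence of smooth functions $f_n\in C_c^{\infty}(\mathbb R^{2d})$, and using $f_n$ in the distributional form of the equation $\partial_t\rho_t=A^{\tau}_{\rho_t}\rho_t-\operatorname{div}_p(\rho_th_t)$. Continuing with the proof, it follows that
\[
\sup_{t\in[0,T]}\mathcal H(\rho_t)\le\mathcal H(\rho_0)+\frac{\gamma\theta d}{m}T+\frac{1}{4\gamma}\int_0^{T}\!\!\int|h_t|^{2}\,d\rho_t\,dt
=\mathcal H(\rho_0)+\frac{\gamma\theta d}{m}T+\theta\,I(\rho)<\infty,
\]
and consequently $\int p^{2}\,d\rho_t$ is also uniformly bounded. We conclude by remarking that the right-hand side of (4.12), as a function of time $t$, is an element of $L^{2}(0,T)$.

Remark 4.2.9. Note that a solution $\rho$ of (4.1) satisfies $I(\rho)=0$, and therefore Lemma 4.2.8 also applies to solutions of (4.1).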

4.3 The VFP equation and the large deviations in GENERIC form

In this section we reformulate both the VFP equation and the large-deviation rate functional of the previous section in terms of the GENERIC structure. It will become apparent that the large-deviation behaviour respects the GENERIC structure, in the sense that the rate function for this system can be formulated in an abstract form, using only the GENERIC building blocks. This will suggest in Section 4.4 a variational formulation for a very general GENERIC system.

4.3.1 GENERIC formalism

In this section, we recall the GENERIC framework presented in the introductory chapter.

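A GENERIC equation for an unknown $z$ in a state space $\mathsf Z$ is a mixture of both reversible and dissipative dynamics:
\[
\partial_tz=\mathsf L\,d\mathsf E+\mathsf M\,d\mathsf S.
\tag{4.13}
\]
Here

• $\mathsf E,\mathsf S:\mathsf Z\to\mathbb R$ are interpreted as energy and entropy functionals,

• $d\mathsf E$, $d\mathsf S$ are appropriate derivatives of $\mathsf E$ and $\mathsf S$ (such as either the Fréchet derivative or a gradient with respect to some inner product);

• $\mathsf L=\mathsf L(z)$ is for each $z$ an antisymmetric operator satisfying the Jacobi identity
\[
\{\{F_1,F_2\}_{\mathsf L},F_3\}_{\mathsf L}+\{\{F_2,F_3\}_{\mathsf L},F_1\}_{\mathsf L}+\{\{F_3,F_1\}_{\mathsf L},F_2\}_{\mathsf L}=0,
\tag{4.14}
\]
for all functions $F_i:\mathsf Z\to\mathbb R$, $i=1,2,3$, where the Poisson bracket $\{\cdot,\cdot\}_{\mathsf L}$ is defined via
\[
\{F,G\}_{\mathsf L}:=dF\cdot\mathsf L\,dG
\tag{4.15}
\]
(see Remark 4.3.1 for a discussion of the meaning of the 'dot' here).

• $\mathsf M=\mathsf M(z)$ is symmetric and positive semidefinite.

Moreover, the building blocks $\{\mathsf L,\mathsf M,\mathsf E,\mathsf S\}$ are required to fulfill the degeneracy conditions: for all $z\in\mathsf Z$,
\[
\mathsf L\,d\mathsf S=0,\qquad\mathsf M\,d\mathsf E=0.
\tag{4.16}
\]
A GENERIC system is then fully characterized by the quintuple $\{\mathsf Z,\mathsf E,\mathsf S,\mathsf L,\mathsf M\}$.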
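A finite-dimensional toy example may help to see how the building blocks fit together. The sketch below is an assumption, not an example taken from this chapter: it uses a damped harmonic oscillator with state $z=(q,p,e)$, where $e$ is a hypothetical heat-bath energy, with $\mathsf E=p^2/(2m)+q^2/2+e$, the simplest entropy $\mathsf S=e$, and hand-picked matrices $\mathsf L$ and $\mathsf M$. It checks the degeneracy conditions (4.16) at one point and evaluates the right-hand side of (4.13).

```python
def generic_blocks(q, p, m=1.0, gamma=0.3):
    """Toy GENERIC system z = (q, p, e); all choices here are illustrative."""
    dE = [q, p / m, 1.0]            # gradient of E = p^2/(2m) + q^2/2 + e
    dS = [0.0, 0.0, 1.0]            # gradient of S = e
    L = [[0.0, 1.0, 0.0],           # canonical symplectic block, zero on e
         [-1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
    v = p / m
    M = [[0.0, 0.0, 0.0],           # symmetric, positive semidefinite
         [0.0, gamma, -gamma * v],
         [0.0, -gamma * v, gamma * v * v]]
    return dE, dS, L, M

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

dE, dS, L, M = generic_blocks(q=0.7, p=-1.2)
L_dS = matvec(L, dS)                                            # should vanish
M_dE = matvec(M, dE)                                            # should vanish
rhs = [a + b for a, b in zip(matvec(L, dE), matvec(M, dS))]     # dz/dt = L dE + M dS
energy_rate = sum(a * b for a, b in zip(dE, rhs))               # dE/dt along the flow
entropy_rate = sum(a * b for a, b in zip(dS, rhs))              # dS/dt along the flow
print(L_dS, M_dE, rhs, energy_rate, entropy_rate)
```

In this toy system the right-hand side reads $\dot q=p/m$, $\dot p=-q-\gamma p/m$, $\dot e=\gamma p^2/m^2$: the energy lost by the oscillator through friction is collected in $e$, so $\mathsf E$ is conserved while $\mathsf S$ does not decrease.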

Remark 4.3.1. In equation (4.13) we implicitly have assumed that $\mathsf Z$ is a space with a differentiable structure, in which time derivatives $\partial_tz$ and state-space derivatives $d\mathsf S$ and $d\mathsf E$ exist. In many cases of importance, including the main example of this paper, this is not true, and then generalizations are necessary; the book by Ambrosio, Gigli and Savaré [AGS08] is an example of such generalizations in the case of gradient flows. Nonetheless, we feel that the formal differentiable way of writing provides the right intuition, and therefore in this formal part of the paper we maintain this way of writing the system.

Even in the smooth setting, we have not made specific exactly which derivative $d\mathsf E$ and $d\mathsf S$ should be, and let us briefly make the situation concrete. Derivatives of the functionals $\mathsf E$ and $\mathsf S$ are naturally defined as covectors, i.e. elements of the cotangent space (they are then called differentials) or dual space (called Fréchet derivatives). Since $\partial_tz$ is an element of the tangent or primal space, $\mathsf L$ and $\mathsf M$ should be duality maps, mapping cotangent to tangent spaces, or equivalently dual to primal spaces. In this case the meaning of the dot in (4.15) is that of the duality pairing.

In practice, however, it often is more convenient to use gradients rather than differentials: then the covectorial derivative is mapped to a tangent vector by some fixed duality mapping, associated with an inner product, often only formally. In all of the explicit calculations in this paper this will be the case; for instance, we use the $L^{2}(\mathbb R^{2d})$ structure as a formal inner product on the space of measures $\mathcal P(\mathbb R^{2d})$ to define '$\operatorname{grad}\mathcal H$' in equation (4.26). In this situation $\mathsf L$ and $\mathsf M$ map vectors to vectors, and the dot in (4.15) is that of the formal inner product.

4.3.2 Making the VFP equation conserve energy

As it stands, the VFP equation (4.1) does not satisfy the conditions of GENERIC, since there is no conserved functional $\mathsf E$. The reason for this is physical: the SDE (4.2) models a system of particles in interaction with a heat bath, and this interaction causes fluctuations of the natural energy (the Hamiltonian) of the particle system,
\[
H_n(Q_1,\dots,Q_n,P_1,\dots,P_n):=\frac1n\sum_{i=1}^{n}\left[\frac{P_i^{2}}{2m}+V(Q_i)\right]+\frac{1}{2n^{2}}\sum_{i,j=1}^{n}\psi(Q_i-Q_j).
\tag{4.17}
\]
Indeed, combining (4.2) with Itô's lemma, the derivative of the expression above is
\[
-\frac1n\sum_{i=1}^{n}\left[\frac{\gamma}{m^{2}}P_i^{2}\,dt-\frac{\gamma\theta d}{m}\,dt+\frac{\sqrt{2\gamma\theta}}{m}P_i\,dW_i\right],
\]
which has no reason to vanish. There is a simple remedy for this: we add a single scalar unknown $e_n$ and define its evolution by the negative of the above, leading to the extended particle system
\[
dQ_i=\frac{P_i}{m}\,dt,
\tag{4.18a}
\]
\[
dP_i=-\nabla V(Q_i)\,dt-\sum_{j=1}^{n}\nabla\psi(Q_i-Q_j)\,dt-\frac{\gamma}{m}P_i\,dt+\sqrt{2\gamma\theta}\,dW_i,
\tag{4.18b}
\]
\[
de_n=\frac1n\sum_{i=1}^{n}\left[\frac{\gamma}{m^{2}}P_i^{2}\,dt-\frac{\gamma\theta d}{m}\,dt+\frac{\sqrt{2\gamma\theta}}{m}P_i\,dW_i\right],
\tag{4.18c}
\]


with which $H_n+e_n$ becomes deterministically constant. Note that $e_n$ can be interpreted as the energy of the heat bath; the flow of energy between the particle system and the heat bath is described by the flow of energy between $H_n$ and $e_n$.

Exactly the same arguments apply to the VFP equation (4.1). At this level the analogue of the Hamiltonian $H_n$ is the functional $\mathcal H$ defined in (4.28a), and indeed $\mathcal H$ is not constant along a solution, as can be directly verified. We mirror the arguments above and add a new variable $e$, depending only on time, so that the solution space becomes $(\rho,e)\in\mathcal P(\mathbb R^{2d})\times\mathbb R$. The full system is now defined by the VFP equation (4.1) plus the equation $de/dt=-(d/dt)\mathcal H(\rho)$, which guarantees that $\mathcal H(\rho)+e$ is conserved. When writing this equation in full, it becomes
\[
\partial_t\rho=-\operatorname{div}_q\left(\rho\,\frac{p}{m}\right)+\operatorname{div}_p\rho\left(\nabla_qV+\nabla_q\psi*\rho+\gamma\frac{p}{m}\right)+\gamma\theta\,\Delta_p\rho,
\tag{4.19a}
\]
\[
\frac{d}{dt}e=\gamma\int_{\mathbb R^{2d}}\frac{p^{2}}{m^{2}}\,\rho(dq\,dp)-\frac{\gamma\theta d}{m}.
\tag{4.19b}
\]
We stress that this system is coupled only in one direction: the second equation is slaved to the first one. Note that equation (4.19b) is well-defined: if $\mathcal H(\rho_0)<\infty$, then by Lemma 4.2.8 and Remark 4.2.9 $\mathcal H(\rho_t)$ is bounded for all $t$; therefore $\int p^{2}\,d\rho_t$ is finite for all $t$.

By this simple mechanism a non-conserving system can be made conserving. Although mathematically this is no more than a trick, for this system it has physical meaning, as we argued above: the additional variable keeps track of the movement of energy between the particle system and the heat bath. We next show that the remaining conditions of GENERIC can also be verified.

4.3.3 The VFP equation as a GENERIC system

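With the extension of the previous section, the VFP equation is formally a GENERIC system with the following building blocks:
\[
\begin{aligned}
\mathsf Z&=\mathcal P_2(\mathbb R^{2d})\times\mathbb R, &\quad \mathsf E(\rho,e)&=\mathcal H(\rho)+e, &\quad \mathsf L=\mathsf L(\rho,e)&=\begin{pmatrix}L_{\rho\rho}&0\\0&0\end{pmatrix},\\
z&=(\rho,e), &\quad \mathsf S(\rho,e)&=\mathcal S(\rho)+e, &\quad \mathsf M=\mathsf M(\rho,e)&=\gamma\begin{pmatrix}M_{\rho\rho}&M_{\rho e}\\M_{e\rho}&M_{ee}\end{pmatrix},
\end{aligned}
\tag{4.20}
\]
where the operators defining $\mathsf L$ and $\mathsf M$ are given, upon applying them to a vector $(\xi,r)$ at $(\rho,e)$, by
\[
\begin{aligned}
L_{\rho\rho}\xi&=\operatorname{div}\rho J\nabla\xi, &\quad M_{\rho\rho}\xi&=-\operatorname{div}_p\rho\nabla_p\xi, &\quad M_{\rho e}r&=r\,\operatorname{div}_p\left(\rho\,\frac{p}{m}\right),\\
M_{e\rho}\xi&=-\int_{\mathbb R^{2d}}\frac{p}{m}\cdot\nabla_p\xi\,\rho(dq\,dp), &\quad M_{ee}r&=r\int_{\mathbb R^{2d}}\frac{p^{2}}{m^{2}}\,\rho(dq\,dp).
\end{aligned}
\]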

The space $\mathcal P_2(\mathbb R^{2d})$ is the subset of $\mathcal P(\mathbb R^{2d})$ with bounded second $p$-moments:
\[
\mathcal P_2(\mathbb R^{2d}):=\left\{\rho\in\mathcal P(\mathbb R^{2d}):\int_{\mathbb R^{2d}}p^{2}\,\rho(dp\,dq)<\infty\right\}.
\]
We equip $\mathcal P_2(\mathbb R^{2d})$ with the same weak topology as $\mathcal P(\mathbb R^{2d})$. Finally, the entropy $\mathcal S$ is defined as
\[
\mathcal S(\rho):=-\theta\int_{\mathbb R^{2d}}f(x)\log f(x)\,dx\qquad\text{whenever } \rho\ \text{has Lebesgue density } f.
\]
With these definitions, equation (4.1) can be written as
\[
\partial_tz_t=\mathsf L(z_t)\operatorname{grad}\mathsf E(z_t)+\mathsf M(z_t)\operatorname{grad}\mathsf S(z_t),
\tag{4.21}
\]
where the gradient operators are to be interpreted as $L^{2}$-gradients. At this stage, however, this equation is formal, since the sense in which this equation holds has not been specified. Rather than going into detail here, we defer this discussion to after the introduction of the variational structure in Section 4.4.

The operators $\mathsf L$ and $\mathsf M$ can readily be seen to be antisymmetric and symmetric (with respect to the $L^{2}$-inner-product, since we use $L^{2}$-gradients as derivatives); for instance, in the case of $\mathsf L$, we have for any vectors $(\xi_1,r_1)$ and $(\xi_2,r_2)$ at $(\rho,e)$ by partial integration that
\[
\langle(\xi_1,r_1),\mathsf L(\rho,e)(\xi_2,r_2)\rangle=\langle\xi_1,L_{\rho\rho}(\rho)\xi_2\rangle
=\int_{\mathbb R^{2d}}\xi_1\operatorname{div}\rho J\nabla\xi_2
=-\int_{\mathbb R^{2d}}\nabla\xi_2\cdot J^{T}\nabla\xi_1\,\rho,
\]
which is antisymmetric since $J$ is antisymmetric (see (4.27)). The verification of the symmetry of $\mathsf M$ is similar; the verification of the Jacobi identity (1.14) is a tedious but elementary calculation, which hinges on the fact that $J$ is constant and antisymmetric. Finally, the verification of the degeneracy conditions (1.16) is again straightforward.

4.3.4 Large deviations for the VFP equation in GENERIC form

We now reformulate the large-deviations rate functional of Theorem 4.2.4 in terms of the GENERIC building blocks above, and therefore in terms of the extended unknown $z=(\rho,e)\in\mathsf Z$. To do this, we also generalize the concepts of absolute continuity and introduce the appropriate norms.

Definition 4.3.2. The function $[0,T]\ni t\mapsto z(t)=(\rho(t),e(t))\in\mathsf Z$ is absolutely continuous if $\rho\in AC([0,T];\mathcal P_2(\mathbb R^{2d}))$ and $e\in AC([0,T];\mathbb R)$.

Again, if $z$ is absolutely continuous, then $\partial_tz$ exists for almost all $t$ as an element of $\mathcal D'(\mathbb R^{2d})\times\mathbb R$.

The 'matrix' $\mathsf M$ generates a natural pair of semi-inner-products and seminorms.


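Definition 4.3.3. Fix $z = (\rho, e) \in \mathsf Z$. The seminorms $\|\cdot\|_{\mathsf M(z)}$ and $\|\cdot\|_{\mathsf M(z)^{-1}}$ are defined as follows. For $(\xi, r) \in C_c^\infty(\mathbb R^{2d}) \times \mathbb R$,

$$\|(\xi, r)\|^2_{\mathsf M(z)} := \gamma \int_{\mathbb R^{2d}} \big[\xi \mathsf M_{\rho\rho}\xi + \xi \mathsf M_{\rho e} r + r \mathsf M_{e\rho}\xi + r \mathsf M_{ee} r\big]\,dx = \gamma \int_{\mathbb R^{2d}} \Big|\nabla_p\xi - r\frac{p}{m}\Big|^2 d\rho = \gamma \Big\|\nabla_p\xi - r\frac{p}{m}\Big\|^2_{\rho}.$$

For $(\mathcal T, s) \in \mathcal D'(\mathbb R^{2d}) \times \mathbb R$,

$$\|(\mathcal T, s)\|^2_{\mathsf M(z)^{-1}} = \sup_{\xi \in C_c^\infty(\mathbb R^{2d}),\ r \in \mathbb R} \; 2\langle \mathcal T, \xi\rangle + 2sr - \|(\xi, r)\|^2_{\mathsf M(z)}. \tag{4.22}$$

The inner products $(\cdot,\cdot)_{\mathsf M}$ and $(\cdot,\cdot)_{\mathsf M^{-1}}$ are then defined through the polarization identity $4(a, b) = \|a + b\|^2 - \|a - b\|^2$.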

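As in the case of $L^2_\nabla(\rho)$, the $\mathsf M$-seminorm is degenerate: there exist $\rho$, $\xi$, and $r$ for which it vanishes. Let $\widetilde H_{\mathsf M}$ be the set of equivalence classes of elements of $C_c^\infty(\mathbb R^{2d}) \times \mathbb R$ with zero distance in this seminorm. On $\widetilde H_{\mathsf M}$, the $\mathsf M$-seminorm is a norm, and we define $H_{\mathsf M}$ as the completion of $\widetilde H_{\mathsf M}$ with respect to this norm. Note that $H_{\mathsf M}$ can be identified with the space $L^2_\nabla(\rho)$, as follows. On the one hand, if $(\eta_n, s_n)$ is a Cauchy sequence in $H_{\mathsf M}$, then

$$\big\|(\eta_n, s_n) - (\eta_{n'}, s_{n'})\big\|_{\mathsf M} = \sqrt{\gamma}\,\Big\|\nabla_p(\eta_n - \eta_{n'}) - (s_n - s_{n'})\frac{p}{m}\Big\|_{\rho} \longrightarrow 0 \quad \text{as } n, n' \to \infty,$$

so that $\nabla_p\eta_n - s_n p/m$ is a Cauchy sequence in $L^2_\nabla(\rho)$ and thus converges to some $h \in L^2_\nabla(\rho)$. Conversely, for each $h \in L^2_\nabla(\rho)$ there exists by definition a sequence $\eta_n \in C_c^\infty$ such that $\nabla_p\eta_n \to h$ in $L^2_\nabla(\rho)$, and therefore $(\eta_n, 0)$ is a Cauchy sequence in $H_{\mathsf M}$ corresponding to $h$.

Since the $\mathsf M$-seminorm is degenerate, the $\mathsf M^{-1}$-seminorm is singular. Indeed, Lemma 4.2.3 implies the following: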

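Lemma 4.3.4. Assume that $\int_{\mathbb R^{2d}} p^2\, d\rho < \infty$. Then

$$\|(\mathcal T, s)\|^2_{\mathsf M(z)^{-1}} = \begin{cases} \dfrac{1}{\gamma}\displaystyle\int_{\mathbb R^{2d}} |h|^2\, d\rho & \text{if } \mathcal T = -\operatorname{div}_p(\rho h) \text{ with } h \in L^2_\nabla(\rho) \text{ and } s = -\displaystyle\int_{\mathbb R^{2d}} \frac{p}{m}\cdot h\, d\rho, \\[2mm] +\infty & \text{otherwise.} \end{cases}$$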

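Proof. As in the case of Lemma 4.2.3, $\|(\mathcal T, s)\|_{\mathsf M(z)^{-1}} < \infty$ implies that $(\mathcal T, s)$ is a linear functional on $C_c^\infty \times \mathbb R$, and by the assumption $\int p^2\, d\rho < \infty$ it is bounded with respect to the $\mathsf M$-seminorm. Because of the identification with $L^2_\nabla(\rho)$ we can consider it as a bounded linear functional on $L^2_\nabla(\rho)$. By the Riesz representation theorem there exists an element $h \in L^2_\nabla(\rho)$ such that for all $\xi$ and $r$

$$\langle \mathcal T, \xi\rangle + rs = \int_{\mathbb R^{2d}} h\cdot\Big(\nabla_p\xi - r\frac{p}{m}\Big)\, d\rho = \int_{\mathbb R^{2d}} h\cdot\nabla_p\xi\, d\rho - r\int_{\mathbb R^{2d}} h\cdot\frac{p}{m}\, d\rho.$$

From this identity the claim follows.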


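The rate function of Theorem 4.2.4 now has a reformulation in terms of the objects that we have just defined.

Lemma 4.3.5. The rate function $I$ of Theorem 4.2.4 can be written in terms of $z$ as

$$J(z) = \begin{cases} \displaystyle\int_0^T \frac{1}{4\theta}\,\big\|\partial_t z_t - \mathsf L(z_t)\operatorname{grad}\mathsf E(z_t) - \mathsf M(z_t)\operatorname{grad}\mathsf S(z_t)\big\|^2_{\mathsf M(z_t)^{-1}}\, dt & \text{if } z = (\rho, e) \in AC([0, T]; \mathsf Z) \text{ and } \rho_{t=0} = \rho_0, \\[2mm] +\infty & \text{otherwise,} \end{cases} \tag{4.23}$$

in the sense that

$$J\big((\rho, e)\big) = \begin{cases} I(\rho) & \text{provided } t \mapsto \mathcal H(\rho_t) + e_t \text{ is constant,} \\ +\infty & \text{otherwise.} \end{cases}$$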

Proof. First assume that I(ρ) <∞. By (4.5) and Lemma 4.2.3 we have

∂tρt − Aτρtρt = − div ρtht,

where $h \in L^2(0, T; L^2_\nabla(\rho_t))$. Define $e$ by

$$e_0 := 0 \quad\text{and}\quad \partial_t e_t = \gamma\int_{\mathbb R^{2d}} \frac{p^2}{m^2}\,\rho_t(dq\,dp) - \frac{\gamma\theta d}{m} + \int_{\mathbb R^{2d}} \frac{p}{m}\cdot h_t\,\rho_t(dq\,dp).$$

By Lemma 4.2.8 the function $t \mapsto \int p^2\, d\rho_t$ is in $L^\infty(0, T)$, and since $h \in L^2(0, T; L^2_\nabla(\rho_t))$ the last term is in $L^1(0, T)$; therefore $e$ is well defined and an element of $AC([0, T]; \mathbb R)$. By construction the function $t \mapsto \mathcal H(\rho_t) + e_t$ is constant. Upon setting $z := (\rho, e)$, an explicit calculation shows that $I(\rho)$ and $J(z)$ are both equal to $(4\gamma\theta)^{-1}\int_0^T\int_{\mathbb R^{2d}} |h_t|^2\, d\rho_t\, dt$.

A similar argument starts by assuming $J(z) < \infty$ for $z = (\rho, e)$ and shows that $I(\rho)$ and $J(z)$ are again equal.

Remark 4.3.6. Note how the condition of constant energy $\mathcal H + e$ is contained in (4.23) through the definition of the seminorm $\|\cdot\|_{\mathsf M^{-1}}$.

4.4 A variational formulation for GENERIC systems

The functional $J$ in (4.23) has the interesting property that it depends only on the GENERIC building blocks, and therefore makes sense, at least formally, for an arbitrary GENERIC system. We now explore the consequences of this observation for general GENERIC systems. The discussion in this section is therefore necessarily formal.

First, we note that the functional $J$ can be written in a different way by using one of the degeneracy conditions (1.16). As above, we associate a formal inner product with $\mathsf M$ and $\mathsf M^{-1}$ by

$$(a, b)_{\mathsf M} := a\cdot\mathsf M\, b \quad\text{and}\quad (a, b)_{\mathsf M^{-1}} := a\cdot\mathsf M^{-1} b.$$

Then the antisymmetry of $\mathsf L$ and the first degeneracy condition in (1.16) imply that

$$\big(\mathsf L\operatorname{grad}\mathsf E, \mathsf M\operatorname{grad}\mathsf S\big)_{\mathsf M^{-1}} = \mathsf L\operatorname{grad}\mathsf E\cdot\operatorname{grad}\mathsf S = -\operatorname{grad}\mathsf E\cdot\mathsf L\operatorname{grad}\mathsf S = 0.$$

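Therefore

$$\begin{aligned} \big\|\partial_t z - \mathsf L\operatorname{grad}\mathsf E - \mathsf M\operatorname{grad}\mathsf S\big\|^2_{\mathsf M^{-1}} &= \big\|\partial_t z - \mathsf L\operatorname{grad}\mathsf E\big\|^2_{\mathsf M^{-1}} + \big\|\mathsf M\operatorname{grad}\mathsf S\big\|^2_{\mathsf M^{-1}} + 2\big(\partial_t z, \mathsf M\operatorname{grad}\mathsf S\big)_{\mathsf M^{-1}} \\ &= \big\|\partial_t z - \mathsf L\operatorname{grad}\mathsf E\big\|^2_{\mathsf M^{-1}} + \big\|\operatorname{grad}\mathsf S\big\|^2_{\mathsf M} + 2\,\partial_t z\cdot\operatorname{grad}\mathsf S, \end{aligned}$$

so that

$$2\theta J(z) = \mathsf S(z(T)) - \mathsf S(z(0)) + \frac{1}{2}\int_0^T \Big[\big\|\partial_t z - \mathsf L\operatorname{grad}\mathsf E\big\|^2_{\mathsf M^{-1}} + \big\|\operatorname{grad}\mathsf S\big\|^2_{\mathsf M}\Big]\, dt. \tag{4.24}$$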

This discussion suggests a general variational formulation for any GENERIC system, as follows:

Variational formulation of a GENERIC system: Given a GENERIC system $\{\mathsf Z, \mathsf E, \mathsf S, \mathsf L, \mathsf M\}$, define $J$ as in (4.24). A function $z : [0, T] \to \mathsf Z$ is a solution of the GENERIC equation (1.13) iff $J(z) = 0$.

In full generality, this characterization is formal; no details about the functional setting are stated. In the example of the VFP equation, however, this formulation is exact, as described by Lemma 4.3.5.

Indeed, let us now come back to the question in which sense the VFP equation satisfies the GENERIC equation (4.21). The discussion above suggests that this variational formulation could be a natural solution concept. Indeed, for any $z = (\rho, e) \in AC([0, T]; \mathsf Z)$ with finite $\mathsf S(z(0))$ each of the terms in (4.24) makes sense as an element of $(-\infty, \infty]$:

• S(z(T )) ∈ (−∞,∞] by definition;

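• The assumption that $z \in AC([0, T]; \mathsf Z)$ implies that for almost all $t$, $\partial_t\rho$ is a distribution on $\mathbb R^{2d}$ and $\partial_t e$ exists in $\mathbb R$;

• Under reasonable assumptions on $V$ and $\psi$, $\mathsf L\operatorname{grad}\mathsf E = -\operatorname{div}_q(\rho\, p/m) + \operatorname{div}_p\big(\rho[\nabla_q V + \nabla_q\psi * \rho]\big)$ is well defined in the sense of distributions;

• Therefore the seminorm $\|\partial_t z - \mathsf L\operatorname{grad}\mathsf E\|^2_{\mathsf M^{-1}}$ is well defined in $[0, \infty]$;

• The seminorm $\|\cdot\|^2_{\mathsf M}$ can be assumed well defined in $[0, \infty]$ for any argument, by extending it by $+\infty$ outside of $H_{\mathsf M}$.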

For the VFP equation there are several other solution concepts that are natural for different reasons and have various advantages; examples are distributional solutions and solutions in the sense of semigroups (since the first and last terms on the right-hand side of (4.1) form a hypoelliptic operator with a smooth and strictly positive fundamental solution). The relevance of this discussion therefore lies not so much in the specific case of the VFP equation, but more in the potential application to general GENERIC systems.

Remark 4.4.1. Gradient flows are GENERIC systems with $\mathsf E = 0$. For this class of systems, this variational formulation is well known and has been put to good use. For instance, Sandier and Serfaty [SS04] (see also e.g. [Ser09, Ste08, Le08, AMP+12]) showed how the variational form can be used to pass to limits in parameters in the equation. We expect that something similar might be possible for these GENERIC variational formulations, and will return to this in Chapter 7.

4.5 Synthesis

Let us recapitulate what we have just seen.

• The VFP equation has a variational formulation of the type '$J(z) \ge 0$, and $J(z) = 0$ iff $z$ is a solution';

• This variational formulation, the functional $J$, is identical to the large-deviation rate functional for the stochastic particle system (4.2) for the case of fixed energy;

• The equation and the variational formulation can both be written in terms of only the GENERIC building blocks;

• This suggests a variational formulation for an arbitrary GENERIC system.

In the remainder of this chapter we discuss a number of consequences. In Section 4.6 we use the connection between the VFP equation, large deviations, and the GENERIC structure to shed some light on the properties of GENERIC as formulated in Section 1.3. Section 4.7 is devoted to the generalization mentioned in Section 1.3.

4.6 Interpretation of the GENERIC properties

The GENERIC structure of the VFP equation, introduced in Section 4.3.3, does raise some questions. Why are these building blocks the 'right' ones, from a philosophical or modelling point of view? Is it clear why $\mathsf E$ and $\mathsf S$ should be what they are defined to be in (4.20)? Is it clear why $\mathsf L$ and $\mathsf M$ are what they are? Why do they indeed satisfy the various conditions described above?

In addition, the origin of the GENERIC properties themselves, as described in Section 1.3, is somewhat obscure. Why should 'every' thermodynamic system satisfy these properties? We now show how the connection with large deviations of the underlying particle system gives us some answers to these questions.

The reversible operator $\mathsf L$ and the Hamiltonian $\mathcal H$. First consider the simpler case when $\psi = 0$. Then the only non-zero component of the operator $\mathsf L$, which is $\mathsf L_{\rho\rho} = \operatorname{div}\rho J\nabla$, is the Liouville operator for the Hamiltonian flow on $\mathbb R^{2d}$ generated by the symplectic matrix $J$ and the Hamiltonian $H(q, p) = p^2/2m + V(q)$. Indeed, $x(t) = (q(t), p(t))$ solves the Hamiltonian equation

$$\frac{d}{dt}x = -J\nabla H(x)$$

if and only if $\rho(t) := \delta_{x(t)}$ solves

$$\partial_t\rho - \operatorname{div}(\rho J\nabla H) = 0.$$

Therefore $\mathsf L$ is the natural embedding of the symplectic geometry of $J$ in $\mathbb R^{2d}$ into the space of measures $\mathcal P(\mathbb R^{2d})$; and when $\psi = 0$, $\mathcal H(\delta_x) = H(x)$, and therefore $\mathcal H$ similarly is the natural embedding of the $\mathbb R^{2d}$-space Hamiltonian $H$ into the space of measures. The antisymmetry and Jacobi identity properties of $\mathsf L$ follow directly from those of the matrix $J$.
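As a small numerical illustration of the Hamiltonian equation $\dot x = -J\nabla H(x)$ with $J$ as in (4.27), the following sketch integrates a single particle in one dimension. The quadratic potential $V(q) = q^2/2$, the leapfrog integrator, and all function names are assumptions made for this illustration only; they are not taken from the thesis.

```python
import math


def hamiltonian(q, p, m=1.0):
    """H(q, p) = p^2 / (2m) + V(q), with the illustrative choice V(q) = q^2 / 2."""
    return p * p / (2.0 * m) + 0.5 * q * q


def leapfrog(q, p, dt, steps, m=1.0):
    """Integrate dq/dt = p/m, dp/dt = -V'(q), i.e. dx/dt = -J grad H(x)."""
    for _ in range(steps):
        p -= 0.5 * dt * q      # half kick, V'(q) = q for the assumed potential
        q += dt * p / m        # drift
        p -= 0.5 * dt * q      # half kick
    return q, p


q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, dt=1e-3, steps=10_000)   # integrate up to t = 10
energy_drift = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))
print(q1, math.cos(10.0), energy_drift)
```

For this potential the exact trajectory is $q(t) = \cos t$, so the printed position can be compared with $\cos 10$; the energy $H$ should be conserved up to the discretization error of the integrator.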

When $\psi$ is non-zero, a similar interpretation of $\mathcal H$ is possible, since with the notation of (4.17) we have

$$\mathcal H\big(\eta_n(x_1, \dots, x_n)\big) = H_n(x_1, \dots, x_n), \quad\text{where}\quad \eta_n(x_1, \dots, x_n) := \frac{1}{n}\sum_{i=1}^n \delta_{x_i}.$$

Similarly, $\mathsf L$ can be interpreted as the embedding into $\mathcal P(\mathbb R^{2d})$ of the Hamiltonian flow on $\mathbb R^{2nd}$ generated by a symplectic matrix $J_n$ consisting of $n$ copies of $J$.

The entropy functional $\mathsf S$. The functional $\mathsf S$ in (4.20) is defined as $e + \mathcal S(\rho) = e - \theta\int\rho\log\rho\, dx$. The second term in this sum is the usual entropy of $\rho$, multiplied by the temperature $\theta$. Its form arises from the loss of information in the mapping $\eta_n$ defined above. We explain it now for the case of a finite state space $S = \{1, \dots, r\}$; the general case can be handled using the characterization of the relative entropy as a supremum over finite partitions [DE97, Lemma 1.4.3]. Let $X_1, \dots, X_n$ be independent identically distributed $S$-valued random variables with common law $\mu$ on a probability space $(\Omega, \Sigma, \mathbb P)$. Define the (random) empirical measure

$$L_n := \frac{1}{n}\sum_{i=1}^n \delta_{X_i}.$$

There is a loss of information in going from $X_1, \dots, X_n$ to the empirical measure $L_n$: $L_n(\omega)$ characterizes the observed frequencies of $1, \dots, r$ among $X_1(\omega), \dots, X_n(\omega)$, but does not tell us exactly what values they take. The degree of degeneracy, that is, the number of possible ways that $X_1(\omega), \dots, X_n(\omega)$ can be such that $L_n(\omega)$ is equal to a given $\rho = (\rho_i)_{i=1}^r = \big(\frac{k_1}{n}, \dots, \frac{k_r}{n}\big)$, where $(k_1, \dots, k_r) \in \mathbb N^r$ and $\sum_{i=1}^r k_i = n$, is the multinomial coefficient $\frac{n!}{k_1!\cdots k_r!}$. We have

$$\operatorname{Prob}(L_n = \rho) = \frac{n!}{k_1!\cdots k_r!}\prod_{i=1}^r \mu_i^{k_i},$$

where $\mu_i = \mu(i)$ for $i = 1, \dots, r$. Hence

$$\frac{1}{n}\log\operatorname{Prob}(L_n = \rho) = \frac{1}{n}\Big(\log n! - \sum_{i=1}^r \log k_i! + \sum_{i=1}^r k_i\log\mu_i\Big).$$

Using Stirling's formula in the form

$$\log m! = m\log m - m + o(m) \quad\text{as } m \to \infty,$$

we find

$$\begin{aligned} \frac{1}{n}\log\operatorname{Prob}(L_n = \rho) &\approx \frac{1}{n}\Big[n\log n - n - \sum_{i=1}^r (k_i\log k_i - k_i) + \sum_{i=1}^r k_i\log\mu_i\Big] \\ &= \log n - \sum_{i=1}^r \frac{k_i}{n}\log k_i + \sum_{i=1}^r \frac{k_i}{n}\log\mu_i && \Big(\text{since } \sum_{i=1}^r k_i = n\Big) \\ &= \sum_{i=1}^r \rho_i\big(\log n - \log k_i + \log\mu_i\big) && \Big(\text{since } \rho_i = \frac{k_i}{n} \text{ and } \sum_{i=1}^r \rho_i = 1\Big) \\ &= \sum_{i=1}^r \rho_i\big(-\log\rho_i + \log\mu_i\big) = -\sum_{i=1}^r \rho_i\log\frac{\rho_i}{\mu_i}. \end{aligned}$$

Retracing the steps in this computation we see that the term $\sum_{i=1}^r \rho_i\log\rho_i$ originates from the degree of degeneracy $\frac{n!}{k_1!\cdots k_r!}$.
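The Stirling computation above can be checked numerically. The following minimal sketch compares $-\frac{1}{n}\log\operatorname{Prob}(L_n = \rho)$, computed exactly from the multinomial formula, with the relative entropy $\sum_i \rho_i\log(\rho_i/\mu_i)$. The particular values of $\mu$ and $\rho$ and the function names are illustrative assumptions, not taken from the thesis.

```python
import math


def log_multinomial_prob(counts, mu):
    """log Prob(L_n = rho) for counts k_i = n * rho_i and cell probabilities mu_i."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
    return log_coeff + sum(k * math.log(m) for k, m in zip(counts, mu))


def relative_entropy(rho, mu):
    """sum_i rho_i log(rho_i / mu_i), with the convention 0 log 0 = 0."""
    return sum(r * math.log(r / m) for r, m in zip(rho, mu) if r > 0)


def empirical_rate(scale, base_counts=(2, 3, 5), mu=(0.5, 0.3, 0.2)):
    """-(1/n) log Prob(L_n = rho) with counts scale * base_counts (illustrative values)."""
    counts = [scale * k for k in base_counts]
    return -log_multinomial_prob(counts, mu) / sum(counts)


rho = (0.2, 0.3, 0.5)
mu = (0.5, 0.3, 0.2)
for scale in (1, 10, 1000):
    print(10 * scale, empirical_rate(scale), relative_entropy(rho, mu))
```

As $n$ grows, the first printed column of values should approach the second; the remaining gap is the $o(m)$ correction in Stirling's formula divided by $n$.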

The degeneracy condition $\mathsf L\operatorname{grad}\mathsf S = 0$. In the case of the VFP equation, this property holds true for any functional which depends locally on $\rho$, i.e., any functional of the form

$$F(\rho, e) = e + \int f(\rho)\, dx.$$

The functional $\mathsf S$ indeed has this form with $f(\rho) = -\theta\rho\log\rho$. Therefore the degeneracy $\mathsf L\operatorname{grad}\mathsf S = 0$ holds exactly because the entropy is a local functional, and this locality is closely connected to the fact that the entropy characterizes the loss of information encountered when taking a limit and representing the system in terms of (limits of) empirical measures, as described above.

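The irreversible operator $\mathsf M$ and its properties. To understand the operator $\mathsf M$ we use an argument that we learned from Alexander Mielke. We transform the coordinates $z = (\rho, e)$ to $\tilde z = (\tilde\rho, \tilde e)$, where

$$\tilde\rho := \rho, \qquad \tilde e := e + \int \operatorname{grad}\mathcal H\, d\rho.$$

Then the new variable $\tilde z$ again solves a GENERIC equation, with new building blocks $\tilde{\mathsf L}$, $\tilde{\mathsf M}$, $\tilde{\mathsf E}$, and $\tilde{\mathsf S}$. Using the change-of-variable formula [OG97a], the operator $\tilde{\mathsf M}$ is given by

$$\tilde{\mathsf M} = \frac{\partial(\tilde z)}{\partial(z)}\,\mathsf M\,\Big[\frac{\partial(\tilde z)}{\partial(z)}\Big]^T, \tag{4.25}$$

where

$$\frac{\partial(\tilde z)}{\partial(z)} = \begin{pmatrix} \dfrac{\partial\tilde\rho}{\partial\rho} & \dfrac{\partial\tilde\rho}{\partial e} \\[2mm] \dfrac{\partial\tilde e}{\partial\rho} & \dfrac{\partial\tilde e}{\partial e} \end{pmatrix} = \begin{pmatrix} \mathrm{id} & 0 \\ \int\operatorname{grad}\mathcal H & \mathrm{id} \end{pmatrix}$$

is the transformation matrix. This formula should be read as operator composition; we write $\mathrm{id}$ for the identity operator, both for functions on $\mathbb R^{2d}$ and for elements of $\mathbb R$, and we use the notation $\int\operatorname{grad}\mathcal H$ for the operator $\xi \mapsto \int\xi\operatorname{grad}\mathcal H$.

Hence

$$\begin{aligned} \tilde{\mathsf M}(\tilde z) &= \begin{pmatrix} \mathrm{id} & 0 \\ \int\operatorname{grad}\mathcal H & \mathrm{id} \end{pmatrix} \begin{pmatrix} -\operatorname{div}_p(\rho\nabla_p) & \operatorname{div}_p(\rho\nabla_p\operatorname{grad}\mathcal H) \\ -\int\nabla_p\operatorname{grad}\mathcal H\cdot\nabla_p\, d\rho & \int|\nabla_p\operatorname{grad}\mathcal H|^2\, d\rho \end{pmatrix} \begin{pmatrix} \mathrm{id} & \operatorname{grad}\mathcal H \\ 0 & \mathrm{id} \end{pmatrix} \\ &= \begin{pmatrix} \mathrm{id} & 0 \\ \int\operatorname{grad}\mathcal H & \mathrm{id} \end{pmatrix} \begin{pmatrix} -\operatorname{div}_p(\rho\nabla_p) & 0 \\ -\int\nabla_p\operatorname{grad}\mathcal H\cdot\nabla_p\, d\rho & 0 \end{pmatrix} = \begin{pmatrix} -\operatorname{div}_p(\rho\nabla_p) & 0 \\ 0 & 0 \end{pmatrix}. \end{aligned}$$

These remarks now enable us to comment on the form of $\mathsf M$. First, the transformation to a different set of variables has the effect of 'cleaning up' the operator $\mathsf M$: in the new variables $\tilde z$, the operator only acts on the $\rho$ variable. Also, the operator $\tilde{\mathsf M}$ is clearly symmetric and positive semi-definite. The same properties for $\mathsf M$ then follow as a consequence of (4.25).

The operator $-\operatorname{div}_p(\rho\nabla_p)$ that appears in $\tilde{\mathsf M}$ is a familiar figure. It also appears in the characterization of Wasserstein gradient flows [ADPZ13], and originates in the fluctuation behaviour of the Brownian noise in those systems, as is the case in Theorem 4.2.4. In the SDE (4.2), however, the noise only appears in the $P$-variable, and as a consequence the operator $-\operatorname{div}_p(\rho\nabla_p)$ also only operates on the $p$-variables. The symmetry of this operator is a consequence of Itô's formula: in this formula for the stochastic evolution of functions $f(X_t)$ of a stochastic variable $X_t$, the second derivative $d^2 f$ appears, and this second derivative gives rise to the second-order derivative in $-\operatorname{div}_p(\rho\nabla_p)$. The symmetry of this expression therefore has the same origin as the symmetry of second-derivative matrices of functions.

In the new variables, the degeneracy condition $\tilde{\mathsf M}\operatorname{grad}\tilde{\mathsf E} = 0$ is natural; indeed, $\tilde{\mathsf E}(\tilde z) = \tilde{\mathsf E}\big((\tilde\rho, \tilde e)\big) = \tilde e$. Therefore $\operatorname{grad}\tilde{\mathsf E} = (0, 1)$, and the degeneracy condition coincides with the property that only $\tilde{\mathsf M}_{\rho\rho}$ is non-zero.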

To conclude, the connection between large deviations and the GENERIC structure in the case of the VFP equation allows us to understand and explain where the various properties of the GENERIC formalism come from:

• The antisymmetry and the Jacobi identity of $\mathsf L$ follow from the same properties of the underlying Hamiltonian system;

• The symmetry of $\mathsf M$ follows from the symmetry of second derivatives, as they appear in Itô's formula;

• The energy $\mathsf E$ is (an extended version of) the Hamiltonian of the underlying system, after embedding into the space of measures;

• The entropy $\mathsf S$ characterizes the loss of information upon passing to empirical measures, in the sense of large deviations;

• The degeneracy condition $\mathsf L\operatorname{grad}\mathsf S = 0$ arises from the fact that $\mathsf S$ is a local functional;

• The degeneracy condition $\mathsf M\operatorname{grad}\mathsf E = 0$ arises as a consequence of energy conservation.

4.7 The generalized VFP equation

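Let $\mathcal H, \mathcal S : \mathcal P(\mathbb R^{2d}) \to \mathbb R$ be two functionals on $\mathcal P(\mathbb R^{2d})$. Denote by $\operatorname{grad}\mathcal H$ and $\operatorname{grad}\mathcal S$ the $L^2$-gradients of $\mathcal H$ and $\mathcal S$, otherwise known as variational derivatives.

We call the following equation a generalized Vlasov-Fokker-Planck equation:

$$\partial_t\rho = \operatorname{div}(\rho J\nabla\operatorname{grad}\mathcal H) + \operatorname{div}\big(\rho\sigma\sigma^T\nabla\operatorname{grad}(\mathcal H + \mathcal S)\big), \tag{4.26}$$

where $\nabla$ and $\operatorname{div}$ are the gradient and divergence operators with respect to the full spatial variable $x = (q, p) \in \mathbb R^{2d}$, and $J$ is the $2d \times 2d$ skew-symmetric block matrix

$$J = \begin{pmatrix} 0 & -I_d \\ I_d & 0 \end{pmatrix}, \tag{4.27}$$

where $I_d$ is the $d \times d$ identity matrix.

The Vlasov-Fokker-Planck equation (4.1) is an example of this abstract equation, in which

$$x = (q, p)^T \in \mathbb R^{2d}, \qquad \mathcal H(\rho) = \int_{\mathbb R^{2d}} \Big(\frac{p^2}{2m} + V(q) + \frac{1}{2}(\psi * \rho)(q)\Big)\rho(dq\,dp), \tag{4.28a}$$

$$\sigma = \sqrt{\gamma}\begin{pmatrix} 0 & 0 \\ 0 & I_d \end{pmatrix}, \qquad \mathcal S(\rho) = \theta\int_{\mathbb R^{2d}} \rho\log\rho\, dq\,dp. \tag{4.28b}$$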

Other well-known equations are of the same form; the Kramers equation [Kra40] is equation (4.1) with $\psi \equiv 0$, and Wasserstein gradient flows [Ott01] are of the form (4.26) with $\sigma = I_{2d}$ and $\mathcal H = 0$. As a final example, when

$$\sigma = I_{2d}, \qquad \mathcal H(\rho) = \frac{1}{2}\int_{\mathbb R^{2d}} \rho\,(\psi * \rho), \qquad \mathcal S(\rho) = \theta\int_{\mathbb R^{2d}} \rho\log\rho,$$

equation (4.26) becomes

$$\partial_t\rho = \operatorname{div}(\rho J\nabla\psi * \rho) + \theta\Delta\rho + \operatorname{div}(\rho\nabla\psi * \rho).$$

This equation describes the relaxation of a point vortex towards statistical equilibrium, which arises in the kinetic theory of point vortices. It is closely related to the two-dimensional Navier-Stokes equation [Cha01, CPR08, CPR09, FS13].

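In this section, we show that the generalized VFP equation, after extension, also is a GENERIC system for arbitrary $\mathcal S$ and $\mathcal H$, and we compute the corresponding functional $J$ explicitly. This section is necessarily formal.

By computing the derivative $\partial_t\mathcal H(\rho_t)$ for a solution $\rho$ of (4.26) we construct the extended version of (4.26):

$$\partial_t\rho = \operatorname{div}(\rho J\nabla\operatorname{grad}\mathcal H) + \operatorname{div}\big(D(\rho)\nabla\operatorname{grad}(\mathcal H + \mathcal S)\big), \tag{4.29a}$$

$$\frac{d}{dt}e = \int_{\mathbb R^{2d}} \nabla\operatorname{grad}\mathcal H\cdot D(\rho)\cdot\nabla\operatorname{grad}(\mathcal H + \mathcal S). \tag{4.29b}$$

Here $D(\rho) := \rho\sigma\sigma^T$. The corresponding GENERIC building blocks are

$$\begin{aligned} \mathsf Z &= \mathcal P_2(\mathbb R^{2d}) \times \mathbb R, & \mathsf E(\rho, e) &= \mathcal H(\rho) + e, & \mathsf L = \mathsf L(\rho, e) &= \begin{pmatrix} \mathsf L_{\rho\rho} & 0 \\ 0 & 0 \end{pmatrix}, \\ z &= (\rho, e), & \mathsf S(\rho, e) &= \mathcal S(\rho) + e, & \mathsf M = \mathsf M(\rho, e) &= \gamma\begin{pmatrix} \mathsf M_{\rho\rho} & \mathsf M_{\rho e} \\ \mathsf M_{e\rho} & \mathsf M_{ee} \end{pmatrix}, \end{aligned} \tag{4.30}$$

where the components of $\mathsf L$ and $\mathsf M$ are given by

$$\mathsf L_{\rho\rho}\xi = \operatorname{div}\rho J\nabla\xi, \qquad \mathsf M_{\rho\rho}\xi = -\operatorname{div}\big(D(\rho)\nabla\xi\big), \qquad \mathsf M_{\rho e}r = r\operatorname{div}\big(D(\rho)\nabla\operatorname{grad}\mathcal H\big),$$

$$\mathsf M_{e\rho}\xi = -\int_{\mathbb R^{2d}} \nabla\xi^T\cdot D(\rho)\cdot\nabla\operatorname{grad}\mathcal H, \qquad \mathsf M_{ee}r = r\int_{\mathbb R^{2d}} (\nabla\operatorname{grad}\mathcal H)^T\cdot D(\rho)\cdot\nabla\operatorname{grad}\mathcal H.$$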

Most of the GENERIC properties of Section 1.3 follow immediately from this setup, such as the antisymmetry and symmetry of $\mathsf L$ and $\mathsf M$, the Jacobi identity, and the positive semidefiniteness of $\mathsf M$. The degeneracy condition $\mathsf M\operatorname{grad}\mathsf E = 0$ can be checked explicitly, but it can also be understood in the same way as in Section 4.6, by first transforming the system to a new set of variables.

Finally, the degeneracy condition $\mathsf L\operatorname{grad}\mathsf S = 0$ requires a specific assumption, as we already encountered above:

Lemma 4.7.1. If $\mathcal S(\rho) = \int f(\rho)$ for some function $f$, then the system (4.29) is a GENERIC system with the building blocks (4.30).

The proof consists of simple verification.

By following the same arguments as in Section 4.2, we find a variational formulation of exactly the same type: a curve $z \in AC([0, T]; \mathsf Z)$ is a variational solution if $J(z) = 0$, where $J$ is defined by (4.24) with building blocks (4.30). We have the following characterization:

Lemma 4.7.2. For equation (4.29) the functional $J$, defined in (4.24), can be characterized as follows: If

$$\frac{d}{dt}\begin{pmatrix} \rho \\ e \end{pmatrix} = \mathrm{VFP}_g(\rho, e) + \begin{pmatrix} \operatorname{div}(D(\rho)\nabla\eta) \\ \int D(\rho)\nabla\eta\cdot\nabla\operatorname{grad}\mathcal H \end{pmatrix},$$

then

$$J(\rho, e) = \frac{1}{2}\int_0^T\int_{\mathbb R^{2d}} \nabla\eta^T\cdot D(\rho)\cdot\nabla\eta\, dx\, dt.$$

Here $\mathrm{VFP}_g(\rho, e)$ is the right-hand side of (4.29).

The proof follows the same lines as those of Lemmas 4.3.4 and 4.3.5.

Chapter 5

q-Gaussians and the Wasserstein gradient flow structure of the porous medium equation

In this chapter, we prove that, for the case of q-Gaussians on the real line, the functional derived from the JKO-discretization scheme of the porous medium equation is asymptotically equivalent to a large-deviation-like rate functional.¹

5.1 Introduction

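In this chapter, we provide a first attempt to generalize (1.30) to the porous medium equation,

$$\partial_t\rho(t, x) = \Delta\rho^{2-q}(t, x) \quad\text{for } (t, x) \in (0, \infty) \times \mathbb R^d, \qquad \rho(0, x) = \rho_0(x). \tag{5.1}$$

In [Ott01], the author showed that the porous medium equation is a gradient flow of the internal energy functional $E_q$,

$$E_q(\rho) = \begin{cases} \dfrac{1}{1-q}\displaystyle\int_{\mathbb R^d} \rho(x)\big[\rho(x)^{1-q} - 1\big]\, dx & \text{if } q \neq 1, \\[2mm] \displaystyle\int_{\mathbb R^d} \rho(x)\log\rho(x)\, dx & \text{if } q = 1, \end{cases} \tag{5.2}$$

with respect to the Wasserstein distance $W_2$ defined in (1.17). The JKO-scheme now becomes

$$\rho_k \in \operatorname*{argmin}_{\rho} K_h(\rho, \rho_{k-1}), \qquad K_h(\rho, \rho_{k-1}) = \frac{1}{2h}W_2^2(\rho, \rho_{k-1}) + E_q(\rho) - E_q(\rho_{k-1}). \tag{5.3}$$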

As has been shown in Chapter 3 (see also [ADPZ11, DLZ12]) for the case of the linear diffusion equation (i.e., with $q = 1$), the functional $K_h$ in (5.3) is asymptotically equivalent, as $h \to 0$, to a discrete rate functional $J_h$ that comes from the large deviation principle of the microscopic model. The rate functional $J_h : \mathcal P(\mathbb R^d) \to [0, +\infty]$ is defined by

$$J_h(\rho\,|\,\rho_0) = \inf_{Q \in \Gamma(\rho_0, \rho)} H(Q\|Q_{0\to h}), \tag{5.4}$$

where

$$Q_{0\to h}(dx\,dy) = p_h(x, y)\rho_0(dx)\,dy; \qquad p_h(x, y) = \frac{1}{(4\pi h)^{d/2}}\, e^{-\frac{|x-y|^2}{4h}},$$

and $H(Q\|Q_{0\to h})$ is the relative entropy of $Q$ with respect to $Q_{0\to h}$,

$$H(Q\|Q_{0\to h}) = \begin{cases} \displaystyle\int_{\mathbb R^{2d}} \log\Big(\frac{dQ}{dQ_{0\to h}}\Big)\, dQ & \text{if } \dfrac{dQ}{dQ_{0\to h}} \text{ exists,} \\[2mm] +\infty & \text{otherwise.} \end{cases}$$

¹ The result of this chapter has been submitted for publication [Duo13].

The aim of this chapter is to generalize (1.30) to the nonlinear porous medium equation for the class of q-Gaussian measures in 1D. As will become clear in Section 5.2, this class plays an important role because it is invariant under the semigroup of the porous medium equation and is isometric to the space of Gaussian measures with respect to the Wasserstein metric. Thanks to this property, we can compute all relevant objects explicitly. The proof of the main theorem parallels that of [DLZ12], although some technical modifications are needed because of the nonlinearity. Before stating the main result of this chapter, we recall some relevant information about q-Gaussians.

The q-exponential function and its inverse, the q-logarithmic function, are defined respectively by

$$\exp_q(t) = \big[1 + (1-q)t\big]_+^{\frac{1}{1-q}}, \tag{5.5}$$

where $[x]_+ = \max\{0, x\}$, and

$$\log_q(t) = \frac{t^{1-q} - 1}{1-q} \quad\text{for } t > 0. \tag{5.6}$$

Given $m \in \mathbb R$, to be specified later on, the m-relative entropy between two probability measures $Q(dx) = f(x)dx$ and $P(dx) = g(x)dx$ that are absolutely continuous with respect to the Lebesgue measure is given by

$$H_m(Q\|P) = \frac{1}{2-m}\int \big[f\log_m f - g\log_m g - (2-m)\log_m g\,(f - g)\big]\, dx \tag{5.7}$$

$$\phantom{H_m(Q\|P)} = \frac{1}{2-m}\int \big[f\log_m f + (1-m)g\log_m g - (2-m)f\log_m g\big]\, dx. \tag{5.8}$$

For $v \in \mathbb R^d$ and $V \in \mathrm{Sym}^+(d, \mathbb R)$, the set of all symmetric positive definite matrices of size $d$, the q-Gaussian measure with mean $v$ and covariance matrix $V$ is given by

$$N_q(v, V) = C_0(q, d)(\det V)^{-\frac{1}{2}}\exp_q\Big[-\frac{1}{2}C_1(q, d)\big\langle x - v, V^{-1}(x - v)\big\rangle\Big]\mathcal L^d, \tag{5.9}$$

where $\mathcal L^d$ is the Lebesgue measure on $\mathbb R^d$ and $C_0(q, d)$, $C_1(q, d)$ are positive constants depending only on $d$ and $q$, defined explicitly in Section 5.2. From now on, we denote by $n_q(v, V)$ the density of $N_q(v, V)$ with respect to the Lebesgue measure.

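In particular, the q-Gaussian measure in 1D has density

$$n_q(\mu, \sigma^2)(x) = \frac{C_0(q, 1)}{\sigma}\exp_q\Big(-\frac{1}{2}C_1(q, 1)\frac{(x - \mu)^2}{\sigma^2}\Big). \tag{5.10}$$

In 2D, the q-bivariate Gaussian $N_q(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, \theta)$ has density

$$n_q(v, V)(x, y) = \frac{C_0(q, 2)}{\sigma_1\sigma_2\sqrt{1 - \theta^2}}\exp_q\Big\{-\frac{1}{2}C_1(q, 2)\frac{1}{1 - \theta^2}\Big[\frac{(x - \mu_1)^2}{\sigma_1^2} + \frac{(y - \mu_2)^2}{\sigma_2^2} - \frac{2\theta(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2}\Big]\Big\}, \tag{5.11}$$

which corresponds to the mean vector $v$ and covariance matrix $V$,

$$v = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad V = \begin{pmatrix} \sigma_1^2 & \theta\sigma_1\sigma_2 \\ \theta\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$

In Section 5.2, it is shown that, if the initial datum is a q-Gaussian $\rho_0(x) = n_q(\mu_0, C\sigma_0^2)(x)$, then the solution to the porous medium equation at time $t$ is again a q-Gaussian $\rho(t, x) = n_q\big(\mu_0, C(t + \sigma_0^{3-q})^{\frac{2}{3-q}}\big)(x)$, where $C$ is an explicit constant given in (5.21).

We are now in a position to state the main result of this chapter.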

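Theorem 5.1.1. Let $d = 1$, $q \in Q_d \equiv (0, 1) \cup \big(1, \frac{d+4}{d+2}\big)$ and $N_q^0 = N_q(\mu_0, C\sigma_0^2)$ be given. We set $m = 3 - 2q$, $\sigma_h^2 = (h + \sigma_0^{3-q})^{\frac{2}{3-q}}$ and

$$Q_{0\to h} = N_m\Big(\mu_0, C\sigma_0^2, \mu_0, C\sigma_h^2, \frac{\sigma_0}{\sigma_h}\Big). \tag{5.12}$$

For $N_q = N_q(\mu, C\sigma^2)$, we define the functional $J_h(N_q\,|\,N_q^0)$ by

$$J_h(N_q\,|\,N_q^0) := \inf_{Q \in \mathcal Q} H_m(Q\|Q_{0\to h}), \tag{5.13}$$

where

$$\mathcal Q := \big\{N_m(\mu_0, C\sigma_0^2, \mu, C\sigma^2, \theta)\ \big|\ \theta \in [-1, 1]\big\}.$$

Then there exist explicit constants $a = a(\sigma_0, q)$ and $b = b(\sigma_0, q)$, given respectively in (5.59) and (5.60), such that the following statements hold on the submanifold of q-Gaussians equipped with the Wasserstein metric:

1. $a(\sigma_h^2 - \sigma_0^2)^{\frac{1}{q}} J_h(\cdot\,|\,N_q^0) \xrightarrow{\ \Gamma\ } W_2^2(\cdot, N_q^0)$ as $h \to 0$.

2. $ab(\sigma_h^2 - \sigma_0^2)^{\frac{1-q}{q}} J_h(\cdot\,|\,N_q^0) - \dfrac{b}{\sigma_h^2 - \sigma_0^2} W_2^2(\cdot, N_q^0) \xrightarrow{\ \Gamma\ } E_q(\cdot) - E_q(N_q^0)$ as $h \to 0$.

3. If $0 < q < 1$, then $ab(\sigma_h^2 - \sigma_0^2)^{\frac{1-q}{q}} J_h(\cdot\,|\,N_q^0) - \dfrac{1}{2h} W_2^2(\cdot, N_q^0) \xrightarrow{\ \Gamma\text{-}\liminf\ } E_q(\cdot) - E_q(N_q^0)$ as $h \to 0$.

Remark 5.1.2. When $q \to 1$, we have $a \to 4$, $b \to \frac{1}{2}$, and $\sigma_h^2 - \sigma_0^2 \to h$, and we recover (1.30) for the diffusion equation.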

The rest of the chapter is organized as follows. In Section 5.2, we first recall relevant properties of the q-Gaussians. Next we compute the functional $J_h$ in Section 5.3. Finally, the proof of the main theorem is given in Section 5.4.

5.2 Properties of q-Gaussian measures

The q-exponential function and q-logarithmic function satisfy the following properties:

$$\exp_q(\log_q x) = \log_q(\exp_q x) = x, \tag{5.14}$$

and

$$\begin{aligned} \log_q(xy) &= \log_q x + \log_q y + (1-q)\log_q x\,\log_q y \\ &= \log_q x + x^{1-q}\log_q y \\ &= \log_q y + y^{1-q}\log_q x. \end{aligned} \tag{5.15}$$
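The definitions (5.5) and (5.6) and the identities (5.14) and (5.15) are easy to check numerically. The sketch below is a minimal illustration; the function names `exp_q` and `log_q`, the sample values, and the convention of returning infinity when the bracket in (5.5) is non-positive for $q > 1$ are assumptions of this sketch.

```python
import math


def exp_q(t, q):
    """q-exponential (5.5): [1 + (1 - q) t]_+^(1 / (1 - q)); ordinary exp for q = 1."""
    if q == 1.0:
        return math.exp(t)
    base = 1.0 + (1.0 - q) * t
    if base <= 0.0:
        # the positive part vanishes; for q > 1 the exponent is negative (assumed convention: +inf)
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


def log_q(t, q):
    """q-logarithm (5.6): (t^(1 - q) - 1) / (1 - q) for t > 0; ordinary log for q = 1."""
    if q == 1.0:
        return math.log(t)
    return (t ** (1.0 - q) - 1.0) / (1.0 - q)


q, x, y = 0.5, 2.0, 3.0
lhs = log_q(x * y, q)
form_a = log_q(x, q) + log_q(y, q) + (1.0 - q) * log_q(x, q) * log_q(y, q)
form_b = log_q(x, q) + x ** (1.0 - q) * log_q(y, q)
form_c = log_q(y, q) + y ** (1.0 - q) * log_q(x, q)
print(lhs, form_a, form_b, form_c)
print(exp_q(log_q(x, q), q), log_q(exp_q(x, q), q))
```

All four values in the first printed line should agree, which is the content of (5.15), and the second line should reproduce $x$ twice, which is (5.14).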

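The constants $C_0(q, d)$ and $C_1(q, d)$ in (5.9) are given by

$$C_1(q, d) = \frac{2}{2 + (d + 2)(1 - q)}, \tag{5.16}$$

and

$$C_0(q, d) = \begin{cases} \dfrac{\Gamma\big(\frac{2-q}{1-q} + \frac{d}{2}\big)}{\Gamma\big(\frac{2-q}{1-q}\big)}\Big(\dfrac{(1-q)C_1(q, d)}{2\pi}\Big)^{\frac{d}{2}} & \text{if } 0 < q < 1, \\[4mm] \dfrac{\Gamma\big(\frac{1}{q-1}\big)}{\Gamma\big(\frac{1}{q-1} - \frac{d}{2}\big)}\Big(\dfrac{(q-1)C_1(q, d)}{2\pi}\Big)^{\frac{d}{2}} & \text{if } 1 < q < \frac{d+4}{d+2}. \end{cases} \tag{5.17}$$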
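The role of the constants can be illustrated numerically in one dimension: with $C_1$ from (5.16) and $C_0$ from (5.17), the density (5.10) should have total mass one and variance $\sigma^2$. The following sketch checks this with a midpoint rule for the branch $0 < q < 1$ only, where the density has compact support. The function names, the quadrature, and the sample parameters are assumptions made for illustration.

```python
import math


def c1(q, d):
    """C_1(q, d) from (5.16)."""
    return 2.0 / (2.0 + (d + 2) * (1.0 - q))


def c0(q, d):
    """C_0(q, d) from (5.17); only the branch 0 < q < 1 is implemented in this sketch."""
    k = (2.0 - q) / (1.0 - q)
    ratio = math.gamma(k + d / 2.0) / math.gamma(k)
    return ratio * ((1.0 - q) * c1(q, d) / (2.0 * math.pi)) ** (d / 2.0)


def density_1d(x, mu, sigma, q):
    """1D q-Gaussian density (5.10) for 0 < q < 1."""
    base = 1.0 - (1.0 - q) * 0.5 * c1(q, 1) * ((x - mu) / sigma) ** 2
    return c0(q, 1) / sigma * max(base, 0.0) ** (1.0 / (1.0 - q))


def mass_and_variance(mu, sigma, q, n=20_000):
    """Midpoint-rule approximation of the total mass and the variance."""
    half_width = sigma * math.sqrt(2.0 / ((1.0 - q) * c1(q, 1)))  # edge of the support
    h = 2.0 * half_width / n
    mass = var = 0.0
    for i in range(n):
        x = mu - half_width + (i + 0.5) * h
        w = density_1d(x, mu, sigma, q) * h
        mass += w
        var += w * (x - mu) ** 2
    return mass, var


mass, var = mass_and_variance(mu=0.3, sigma=1.7, q=0.5)
print(mass, var, 1.7 ** 2)
```

The branch $1 < q < \frac{d+4}{d+2}$ has heavy tails and unbounded support, so it would need a different quadrature; it is deliberately left out here.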

5.2.1 q-Gaussian measures and solutions of the porous medium equation

The porous medium equation (5.1) has a self-similar solution, called the Barenblatt-Pattle solution, of the form

$$\rho_q(x, t) := \big[At^{-d\alpha(1-q)} - B|x|^2 t^{-1}\big]_+^{\frac{1}{1-q}} = \big[A - B|x|^2 t^{-2\alpha}\big]_+^{\frac{1}{1-q}}\, t^{-d\alpha}, \tag{5.18}$$

where

$$\alpha = \alpha(q, d) := \frac{1}{d(1-q) + 2}, \qquad B = B(q, d) := \frac{(1-q)\alpha}{2(2-q)}, \tag{5.19}$$

and $A$ is a normalization constant chosen such that

$$\int_{\mathbb R^d} \rho_q(x, t)\,\mathcal L^d(dx) = 1.$$

More precisely,

$$A := C_0^{2\alpha(1-q)}\Big[\frac{\alpha}{(2-q)C_1}\Big]^{d\alpha(1-q)}. \tag{5.20}$$

It is straightforward to see that

$$\rho_q(x, t) = N_q(0, Ct^{2\alpha}I_d)(x),$$

where

$$C := \frac{(2-q)C_1}{\alpha}A. \tag{5.21}$$

It is well known that a solution to the diffusion equation is obtained by convolving the initial datum with the diffusion kernel. Hence if the initial datum is a Gaussian measure $N(v, V)$, then the solution at time $t$ is again a Gaussian $N(v, V_t)$, which is given by

$$N(v, V_t) = N(v, V) * N(0, 2tI_d) = N(v, V + 2tI_d).$$

In [Tak12, OW10], the authors show that a similar statement holds for the porous medium equation on the space of q-Gaussian measures. That is, a solution to the porous medium equation whose initial datum is a q-Gaussian measure remains a q-Gaussian for all time. Moreover, finding the solution at time $t > 0$ reduces to solving an ordinary differential equation for the covariance matrix. Let $\Theta$ be the map on $\mathrm{Sym}^+(d, \mathbb R)$ defined by

$$\Theta(V) := (\det V)^{-\alpha(1-q)}V. \tag{5.22}$$

We note that $\rho_q(x, t) = N_q(0, C\Theta(tI_d))(x)$.

Proposition 5.2.1 ([Tak12]). For any $q \in Q_d$ and $V \in \mathrm{Sym}^+(d, \mathbb R)$, we define the time-dependent matrix $V_t$ by

$$\Theta(V_t) = \Theta(V) + \sigma(t)I_d, \qquad \frac{d}{dt}\sigma(t) = 2\alpha\big(\det\Theta(V_t)\big)^{-\frac{1-q}{2}}. \tag{5.23}$$

Then $N_q(v, C\Theta(V_t))$ is a solution to the porous medium equation.

Remark 5.2.2. The assertion also holds true for $q = 1$.

We now work out this proposition in 1D in more detail. The relevant quantities are

$$\alpha = \frac{1}{3-q}; \qquad \Theta(V) = (\det V)^{-\alpha(1-q)}V = V^{1-\alpha(1-q)} = V^{\frac{2}{3-q}}; \qquad \det\Theta(V_t) = \Theta(V_t). \tag{5.24}$$

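The ODE becomes

$$\Theta(V_t) = \Theta(V) + \sigma(t), \qquad \frac{d}{dt}\sigma(t) = 2\alpha\big(\Theta(V_t)\big)^{-\frac{1-q}{2}}.$$

Solving this ODE we get

$$\Theta(V_t) = \Big[\alpha(3-q)t + \Theta(V)^{\frac{3-q}{2}}\Big]^{\frac{2}{3-q}} = (t + V)^{\frac{2}{3-q}}.$$

So if $\rho_0(x) = N_q\big(v, CV^{\frac{2}{3-q}}\big)(x)$ then $\rho(t, x) = N_q\big(v, C(t + V)^{\frac{2}{3-q}}\big)(x)$ for all $t > 0$. In other words, if $\rho_0(x) = N_q(\mu_0, C\sigma_0^2)(x)$ then $\rho(t, x) = N_q\big(\mu_0, C(t + \sigma_0^{3-q})^{\frac{2}{3-q}}\big)(x)$ for all $t > 0$.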
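The closed-form solution of the one-dimensional ODE can be cross-checked by integrating $\frac{d}{dt}\Theta(V_t) = 2\alpha\,\Theta(V_t)^{-\frac{1-q}{2}}$ numerically. The sketch below uses a classical fourth-order Runge-Kutta scheme; the integrator, the function names, and the sample values of $q$, $V$ and $t$ are assumptions made for illustration.

```python
import math


def theta_closed_form(t, v, q):
    """Closed-form 1D solution Theta(V_t) = (t + V)^(2 / (3 - q))."""
    return (t + v) ** (2.0 / (3.0 - q))


def theta_rk4(t_end, v, q, steps=1000):
    """Integrate d/dt Theta = 2 alpha Theta^(-(1 - q) / 2) from Theta(V) = V^(2 / (3 - q))."""
    alpha = 1.0 / (3.0 - q)

    def rhs(y):
        return 2.0 * alpha * y ** (-(1.0 - q) / 2.0)

    y = v ** (2.0 / (3.0 - q))
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


numeric = theta_rk4(t_end=2.0, v=1.3, q=0.5)
exact = theta_closed_form(2.0, 1.3, 0.5)
print(numeric, exact)
```

With $V = \sigma_0^{3-q}$ and $t = h$, the closed form is exactly the quantity $\sigma_h^2 = (h + \sigma_0^{3-q})^{\frac{2}{3-q}}$ used in Theorem 5.1.1.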

5.2.2 q-Gaussian and the Wasserstein metric

The q-Gaussian measures have another important property, stated in the following proposition.

Proposition 5.2.3 ([Tak12]). For any $q \in Q_d$, the space of q-Gaussian measures is convex and isometric to the space of Gaussian measures with respect to the Wasserstein metric. As a consequence, we have

$$W_2\big(N_q(\mu, \Sigma), N_q(\nu, V)\big)^2 = W_2\big(N(\mu, \Sigma), N(\nu, V)\big)^2 = |\mu - \nu|^2 + \operatorname{tr}\Sigma + \operatorname{tr}V - 2\operatorname{tr}\big(\Sigma^{\frac{1}{2}}V\Sigma^{\frac{1}{2}}\big)^{\frac{1}{2}}.$$

In particular,

$$W_2\big(N_q(\mu_1, \sigma_1^2), N_q(\mu_2, \sigma_2^2)\big)^2 = (\mu_1 - \mu_2)^2 + (\sigma_1 - \sigma_2)^2. \tag{5.25}$$
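The Gaussian Wasserstein formula is simple to evaluate. The sketch below implements the 1D case (5.25) and, as an illustration, the $2 \times 2$ case, using two elementary facts: $\Sigma^{1/2}V\Sigma^{1/2}$ has the same trace and determinant as $\Sigma V$, and for a $2 \times 2$ symmetric positive definite matrix $A$ one has $\operatorname{tr}A^{1/2} = \sqrt{\operatorname{tr}A + 2\sqrt{\det A}}$. The function names and the test matrices are assumptions of this sketch.

```python
import math


def w2_squared_1d(mu1, sigma1, mu2, sigma2):
    """Squared W2 distance (5.25) between 1D (q-)Gaussians with standard deviations sigma1, sigma2."""
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2


def w2_squared_2d(mu, sigma, nu, v):
    """Squared W2 distance between N(mu, sigma) and N(nu, v) for 2x2 SPD covariance matrices."""
    mean_part = sum((a - b) ** 2 for a, b in zip(mu, nu))
    tr_sigma = sigma[0][0] + sigma[1][1]
    tr_v = v[0][0] + v[1][1]
    # trace and determinant of sigma^(1/2) v sigma^(1/2) coincide with those of sigma v
    tr_prod = sum(sigma[i][j] * v[j][i] for i in range(2) for j in range(2))
    det_prod = ((sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0])
                * (v[0][0] * v[1][1] - v[0][1] * v[1][0]))
    # for a 2x2 SPD matrix A: tr(sqrt(A)) = sqrt(tr(A) + 2 sqrt(det(A)))
    tr_sqrt = math.sqrt(tr_prod + 2.0 * math.sqrt(det_prod))
    return mean_part + tr_sigma + tr_v - 2.0 * tr_sqrt


diag_case = w2_squared_2d((0.0, 0.0), [[4.0, 0.0], [0.0, 9.0]],
                          (1.0, 2.0), [[1.0, 0.0], [0.0, 16.0]])
same_case = w2_squared_2d((0.0, 0.0), [[2.0, 0.5], [0.5, 1.0]],
                          (0.0, 0.0), [[2.0, 0.5], [0.5, 1.0]])
print(diag_case, same_case)
```

For diagonal covariances the 2D formula splits into a sum of 1D contributions of the form (5.25), which gives a convenient consistency check.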

5.3 Computing the functional Jh

In this section, we compute the functional Jh explicitly.

Proposition 5.3.1. Let $P(dx) = g(x)dx$ be absolutely continuous with respect to the Lebesgue measure in $\mathbb R^d$. Let $\mathcal Q$ be the set of all Borel measures $Q = f(x)dx$ in $\mathbb R^d$ satisfying

$$\int_{\mathbb R^d} r_i(x)f(x)\, dx = a_i, \qquad i \in \{1, 2, \dots, N\}, \tag{5.26}$$

where $r_i : \mathbb R^d \to \mathbb R$ are given functions and $a_i \in \mathbb R$. Assume that there is a measure $Q^* \in \mathcal Q$ that has a density $f^*$ satisfying the equation

$$\log_m f^*(x) = \log_m g(x) + \sum_{i=1}^N \lambda_i r_i(x), \tag{5.27}$$

for some $\lambda_i \in \mathbb R$. Then

$$H_m(Q\|P) = H_m(Q^*\|P) + H_m(Q\|Q^*) \quad\text{for all } Q \in \mathcal Q, \tag{5.28}$$

and as a consequence, $Q^*$ is the unique minimiser of $H_m(Q\|P)$ over all $Q \in \mathcal Q$.

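Proof. For notational convenience, all integrals in this proof are over $\mathbb R^d$. We have

$$\begin{aligned} (2-m)H_m(Q\|P) &\overset{(5.7)}{=} \int f\log_m f + (1-m)g\log_m g - (2-m)f\log_m g \\ &= \int f\log_m f + (1-m)f^*\log_m f^* - (2-m)f\log_m f^* \\ &\quad + \int (2-m)f(\log_m f^* - \log_m g) - (1-m)(f^*\log_m f^* - g\log_m g) \\ &\overset{(5.7)}{=} (2-m)H_m(Q\|Q^*) + \int (2-m)f(\log_m f^* - \log_m g) - (1-m)(f^*\log_m f^* - g\log_m g). \end{aligned} \tag{5.30}$$

We rewrite the last integral in (5.30) using (5.26) and (5.27):

$$\begin{aligned} &\int (2-m)f(\log_m f^* - \log_m g) - (1-m)(f^*\log_m f^* - g\log_m g) \\ &\quad\overset{(5.27)}{=} (2-m)\int f\sum_{i=1}^N \lambda_i r_i - (1-m)\int (f^*\log_m f^* - g\log_m g) \\ &\quad= (2-m)\sum_{i=1}^N \lambda_i\int f r_i - (1-m)\int (f^*\log_m f^* - g\log_m g) \\ &\quad\overset{(5.26)}{=} (2-m)\sum_{i=1}^N \lambda_i\int f^* r_i - (1-m)\int (f^*\log_m f^* - g\log_m g) \\ &\quad= (2-m)\int f^*\sum_{i=1}^N \lambda_i r_i - (1-m)\int (f^*\log_m f^* - g\log_m g) \\ &\quad\overset{(5.27)}{=} (2-m)\int f^*(\log_m f^* - \log_m g) - (1-m)\int (f^*\log_m f^* - g\log_m g) \\ &\quad= \int f^*\log_m f^* - (2-m)f^*\log_m g + (1-m)g\log_m g \\ &\quad\overset{(5.7)}{=} (2-m)H_m(Q^*\|P). \end{aligned} \tag{5.31}$$

Here the step labelled (5.26) uses that both $Q$ and $Q^*$ belong to $\mathcal Q$, so that $\int f r_i = a_i = \int f^* r_i$. From (5.30) and (5.31) we obtain (5.28).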

Remark 5.3.2. The property that the relative entropy and the m-relative entropy satisfy a generalized Pythagorean relation is well known in the literature; see for instance [Csi75] and [OW10] for similar relationships.

We now apply this Proposition to the q-bivariate measures.

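Proposition 5.3.3. Let $P = N_m(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2, \theta)$ be given. We define

$$\mathcal Q := \big\{N_m(\nu_1, \xi_1^2, \nu_2, \xi_2^2, \theta)\ \big|\ \theta \in [-1, 1]\big\}. \tag{5.32}$$

The minimizer of the minimization problem

$$\min_{Q \in \mathcal Q} H_m(Q\|P) \tag{5.33}$$

is given by

$$Q^* = N_m(\nu_1, \xi_1^2, \nu_2, \xi_2^2, \eta), \tag{5.34}$$

where

$$\frac{\eta}{(1 - \eta^2)^{\frac{3-m}{2}}} = \frac{\theta}{(1 - \theta^2)^{\frac{3-m}{2}}}\Big(\frac{\xi_1\xi_2}{\sigma_1\sigma_2}\Big)^{2-m}. \tag{5.35}$$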

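Proof. For notational convenience, all integrals in this proof are over $\mathbb R^2$. We will prove a slightly stronger statement, namely that $Q^*$ is the minimiser of $H_m(\cdot\|P)$ over $\widetilde{\mathcal Q}$, which is defined as the set of all $Q(dx\,dy) = f(x, y)\,dx\,dy$ satisfying

$$\int f(x, y)\,dx\,dy = 1, \qquad \int x f(x, y)\,dx\,dy = \nu_1, \qquad \int y f(x, y)\,dx\,dy = \nu_2,$$

$$\int x^2 f(x, y)\,dx\,dy = \xi_1^2 + \nu_1^2, \qquad \int y^2 f(x, y)\,dx\,dy = \xi_2^2 + \nu_2^2.$$

Since $Q^* \in \mathcal Q \subset \widetilde{\mathcal Q}$, $Q^*$ is then also the unique minimizer on $\mathcal Q$. Let $g$ and $f^*$ respectively denote the densities of $P$ and $Q^*$. By Proposition 5.3.1, it suffices that $f^*$ satisfies the equation

$$\log_m f^*(x, y) = \log_m g(x, y) + \sum_{i=1}^5 \lambda_i r_i(x, y), \tag{5.36}$$

where $r_1 = 1$, $r_2 = x$, $r_3 = y$, $r_4 = x^2$, $r_5 = y^2$. Since none of the $r_i$ contains the mixed term $xy$, the coefficients of $xy$ on the two sides of (5.36) must coincide. Equating them we obtain

$$\frac{\theta}{\sigma_1\sigma_2(1 - \theta^2)}\Big[\frac{C_0(m, d)}{\sigma_1\sigma_2\sqrt{1 - \theta^2}}\Big]^{1-m} = \frac{\eta}{\xi_1\xi_2(1 - \eta^2)}\Big[\frac{C_0(m, d)}{\xi_1\xi_2\sqrt{1 - \eta^2}}\Big]^{1-m},$$

which is equivalent to

$$\frac{\eta}{(1 - \eta^2)^{\frac{3-m}{2}}} = \frac{\theta}{(1 - \theta^2)^{\frac{3-m}{2}}}\Big(\frac{\xi_1\xi_2}{\sigma_1\sigma_2}\Big)^{2-m}.$$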
We now compute the m-entropy of an m-Gaussian and the m-relative entropy between two m-Gaussians.


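Proposition 5.3.4. We have

$$E_m(N_m(\mu, \Sigma)) - E_m(N_m(\mu, V)) = (2-m)C_1(m, d)\Big(\frac{C_0(m, d)}{(\det V)^{\frac{1}{2}}}\Big)^{1-m}\log_m\frac{(\det V)^{\frac{1}{2}}}{(\det\Sigma)^{\frac{1}{2}}}, \tag{5.37}$$

and

$$H_m\big(N_m(\mu, \Sigma)\,\|\,N_m(\nu, V)\big) = \frac{1}{2}C_1(m, d)\Big(\frac{C_0(m, d)}{(\det V)^{\frac{1}{2}}}\Big)^{1-m}\Big[\operatorname{tr}(V^{-1}\Sigma) + \big\langle\mu - \nu, V^{-1}(\mu - \nu)\big\rangle + 2\log_m\frac{(\det V)^{\frac{1}{2}}}{(\det\Sigma)^{\frac{1}{2}}} - d\Big]. \tag{5.38}$$

Remark 5.3.5. When $m = 1$, we recover the corresponding formulas for the Gaussian measures.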

Proof. For notational convenience, all integration in this proof will be over Rd. We denoteby f(x) and g(x) respectively the densities of Nm(µ,Σ) and Nm(ν, V ). Using the explicitformula of f and g as in (5.9) and by a straightforward calculation we get

∫f logm fdx = logm

[C0(m, d)

(det Σ)12

expm

(−d

2C1(m, d)

)](5.15)=

(−d

2C1(m, d)

)+ logm

C0(m, d)

(det Σ)12

+ (1−m)

(−d

2C1(m, d)

)logm

C0(m, d)

(det Σ)12

,

(5.39)∫g logm gdx = logm

[C0(m, d)

(detV )12

expm

(−d

2C1(m, d)

)](5.15)=

(−d

2C1(m, d)

)+ logm

C0(m, d)

(detV )12

+ (1−m)

(−d

2C1(m, d)

)logm

C0(m, d)

(detV )12

,

(5.40)∫f logm gdx = logm

[C0(m, d)

(detV )12

expm

(−1

2C1(m, d)(tr(V −1Σ) + 〈µ− ν, V −1(µ− ν))〉

)].

(5.41)

(5.15)=

(−1

2C1(m, d)(tr(V −1Σ) + 〈µ− ν, V −1(µ− ν)〉

)+ logm

C0(m, d)

(detV )12

(5.42)

+ (1−m)

(−1

2C1(m, d)(tr(V −1Σ) + 〈µ− ν, V −1(µ− ν)〉)

)logm

C0(m, d)

(detV )12

.

(5.43)


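Hence

\begin{align}
\int (f\log_m f - g\log_m g)\,dx &= \left(1 - (1-m)\frac d2 C_1(m,d)\right)\left[\log_m\frac{C_0(m,d)}{(\det\Sigma)^{\frac12}} - \log_m\frac{C_0(m,d)}{(\det V)^{\frac12}}\right] \notag\\
&= \left(1 - (1-m)\frac d2 C_1(m,d)\right)\left(\frac{C_0(m,d)}{(\det V)^{\frac12}}\right)^{1-m}\log_m\frac{(\det V)^{\frac12}}{(\det\Sigma)^{\frac12}}, \tag{5.44}
\end{align}

and

\begin{align}
\int (g-f)\log_m g\,dx &= \frac12 C_1(m,d)\left(\operatorname{tr}(V^{-1}\Sigma) + \langle\mu-\nu, V^{-1}(\mu-\nu)\rangle - d\right)\left[1 + (1-m)\log_m\frac{C_0(m,d)}{(\det V)^{\frac12}}\right] \tag{5.45}\\
&= \frac12 C_1(m,d)\left(\operatorname{tr}(V^{-1}\Sigma) + \langle\mu-\nu, V^{-1}(\mu-\nu)\rangle - d\right)\left[\frac{C_0(m,d)}{(\det V)^{\frac12}}\right]^{1-m}. \tag{5.46}
\end{align}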

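Since

\[
1 - (1-m)\frac d2 C_1(m,d) = (2-m)C_1(m,d),
\]

we get

\begin{align}
E_m(N_m(\mu,\Sigma)) - E_m(N_m(\mu,V)) &= \int (f\log_m f - g\log_m g)\,dx \notag\\
&= (2-m)\,C_1(m,d)\left(\frac{C_0(m,d)}{(\det V)^{\frac12}}\right)^{1-m}\log_m\frac{(\det V)^{\frac12}}{(\det\Sigma)^{\frac12}}, \tag{5.47}
\end{align}

and

\begin{align}
H_m(N_m(\mu,\Sigma), N_m(\nu,V)) &\overset{(5.15)}{=} \frac{1}{2-m}\int\left[f\log_m f - g\log_m g - (2-m)\log_m g\,(f-g)\right]dx \notag\\
&= \frac12 C_1(m,d)\left[\frac{C_0(m,d)}{(\det V)^{\frac12}}\right]^{1-m}\left[\operatorname{tr}(V^{-1}\Sigma) + \langle\mu-\nu, V^{-1}(\mu-\nu)\rangle + 2\log_m\frac{(\det V)^{\frac12}}{(\det\Sigma)^{\frac12}} - d\right]. \tag{5.48}
\end{align}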

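Proposition 5.3.6. Let $N_q(\mu_0, C\sigma_0^2)$ and $N_q(\mu, C\sigma^2)$ be given. Set $\sigma_h^2 = (h + \sigma_0^{3-q})^{\frac{2}{3-q}}$ and let $N_q(\mu_0, C\sigma_h^2)$ be the solution at time $h$. Let

\[
Q_{0\to h} = N_m(\mu_0, C\sigma_0^2, \mu_0, C\sigma_h^2, \theta_h), \quad\text{where } \theta_h = \frac{\sigma_0}{\sigma_h}, \tag{5.49}
\]

be the $m$-bivariate with mean vector and covariance matrix

\[
\mu_{0\to h} = \begin{pmatrix}\mu_0\\ \mu_0\end{pmatrix}, \qquad
\Sigma_{0\to h} = \begin{pmatrix} C\sigma_0^2 & C\theta_h\sigma_0\sigma_h\\ C\theta_h\sigma_0\sigma_h & C\sigma_h^2\end{pmatrix}
= \begin{pmatrix} C\sigma_0^2 & C\sigma_0^2\\ C\sigma_0^2 & C\sigma_h^2\end{pmatrix}. \tag{5.50}
\]

Define

\[
\mathcal{Q} = \left\{N_m(\mu_0, C\sigma_0^2, \mu, C\sigma^2, \theta)\ \big|\ \theta\in[-1,1]\right\}, \tag{5.51}
\]

and

\[
Q^* = \operatorname*{argmin}_{Q\in\mathcal{Q}} H_m(Q\,\|\,Q_{0\to h}). \tag{5.52}
\]

Then

\[
Q^* = N_m(\mu_0, C\sigma_0^2, \mu, C\sigma^2, \eta_h), \tag{5.53}
\]

which is an $m$-bivariate with mean vector and covariance matrix

\[
\mu^* = \begin{pmatrix}\mu_0\\ \mu\end{pmatrix}, \qquad
\Sigma^* = \begin{pmatrix} C\sigma_0^2 & C\eta_h\sigma_0\sigma\\ C\eta_h\sigma_0\sigma & C\sigma^2\end{pmatrix}, \tag{5.54}
\]

where $\eta_h$ satisfies the equation

\[
\frac{\eta_h}{(1-\eta_h^2)^{\frac{3-m}{2}}} = \frac{\theta_h}{(1-\theta_h^2)^{\frac{3-m}{2}}}\left(\frac{\sigma}{\sigma_h}\right)^{2-m}. \tag{5.55}
\]

Moreover, the $m$-relative entropy $H_m(Q^*\|Q_{0\to h})$ is

\[
H_m(Q^*\|Q_{0\to h}) = \frac12 C_1(m,2)\left(\frac{C_0(m,2)}{C\sigma_0(\sigma_h^2-\sigma_0^2)^{\frac12}}\right)^{1-m}\left[\frac{(\sigma-\sigma_0)^2}{\sigma_h^2-\sigma_0^2} + \frac{2\sigma_0\sigma(1-\eta_h)}{\sigma_h^2-\sigma_0^2} + \frac{(\mu-\mu_0)^2}{C(\sigma_h^2-\sigma_0^2)} + 2\log_m\left(\frac{\sigma_0}{\sigma\eta_h}\right)^{\frac{1}{3-m}} - 1\right]. \tag{5.56}
\]

The difference $E_q(N_q(\mu, C\sigma^2)) - E_q(N_q(\mu_0, C\sigma_0^2))$ is

\[
E_q(N_q(\mu, C\sigma^2)) - E_q(N_q(\mu_0, C\sigma_0^2)) = (2-q)\,C_1(q,1)\left(\frac{C_0(q,1)}{\sigma_0\sqrt{C}}\right)^{1-q}\log_q\left(\frac{\sigma_0}{\sigma}\right), \tag{5.57}
\]

and the squared Wasserstein distance $W_2^2(N_q(\mu, C\sigma^2), N_q(\mu_0, C\sigma_0^2))$ is

\[
W_2(N_q(\mu, C\sigma^2), N_q(\mu_0, C\sigma_0^2))^2 = C(\sigma-\sigma_0)^2 + (\mu-\mu_0)^2. \tag{5.58}
\]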

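Proof. By Proposition 5.3.4,

\[
H_m(Q^*\|Q_{0\to h}) = \frac12 C_1(m,2)\left[\frac{C_0(m,2)}{(\det\Sigma_{0\to h})^{\frac12}}\right]^{1-m}
\times\left[\operatorname{tr}(\Sigma_{0\to h}^{-1}\Sigma^*) + \langle\mu^*-\mu_{0\to h}, \Sigma_{0\to h}^{-1}(\mu^*-\mu_{0\to h})\rangle + 2\log_m\frac{(\det\Sigma_{0\to h})^{\frac12}}{(\det\Sigma^*)^{\frac12}} - 2\right].
\]

We now calculate each term in the above formula explicitly:

\[
\det\Sigma_{0\to h} = C^2\sigma_0^2(\sigma_h^2-\sigma_0^2) = C^2\sigma_0^2\sigma_h^2(1-\theta_h^2), \qquad
\Sigma_{0\to h}^{-1} = \frac{1}{C(\sigma_h^2-\sigma_0^2)}\begin{pmatrix}\frac{1}{\theta_h^2} & -1\\ -1 & 1\end{pmatrix},
\]

\[
\det\Sigma^* = C^2\sigma_0^2\sigma^2(1-\eta_h^2),
\]

\[
\operatorname{tr}(\Sigma_{0\to h}^{-1}\Sigma^*) = \frac{\sigma_h^2 + \sigma^2 - 2\eta_h\sigma_0\sigma}{\sigma_h^2-\sigma_0^2} = \frac{(\sigma-\sigma_0)^2}{\sigma_h^2-\sigma_0^2} + \frac{2\sigma_0\sigma(1-\eta_h)}{\sigma_h^2-\sigma_0^2} + 1,
\]

\[
(\mu^*-\mu_{0\to h})^T\Sigma_{0\to h}^{-1}(\mu^*-\mu_{0\to h}) = \frac{(\mu-\mu_0)^2}{C(\sigma_h^2-\sigma_0^2)},
\]

\[
\frac{\det\Sigma_{0\to h}}{\det\Sigma^*} = \frac{\sigma_h^2}{\sigma^2}\cdot\frac{1-\theta_h^2}{1-\eta_h^2}.
\]

By (5.55),

\[
\frac{\eta_h}{(1-\eta_h^2)^{\frac{3-m}{2}}} = \frac{\theta_h}{(1-\theta_h^2)^{\frac{3-m}{2}}}\left(\frac{\sigma}{\sigma_h}\right)^{2-m}.
\]

Hence

\[
\frac{1-\theta_h^2}{1-\eta_h^2} = \left(\frac{\theta_h}{\eta_h}\right)^{\frac{2}{3-m}}\left(\frac{\sigma}{\sigma_h}\right)^{\frac{2(2-m)}{3-m}},
\]

and

\[
\frac{\sigma_h^2}{\sigma^2}\cdot\frac{1-\theta_h^2}{1-\eta_h^2} = \left(\frac{\sigma_h\theta_h}{\sigma\eta_h}\right)^{\frac{2}{3-m}} = \left(\frac{\sigma_0}{\sigma\eta_h}\right)^{\frac{2}{3-m}}.
\]

Therefore

\[
\frac{(\det\Sigma_{0\to h})^{\frac12}}{(\det\Sigma^*)^{\frac12}} = \left(\frac{\sigma_0}{\sigma\eta_h}\right)^{\frac{1}{3-m}},
\]

and (5.56) follows. Equation (5.57) is a direct consequence of the first equality in Proposition 5.3.4, and (5.58) has been shown in (5.25).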
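The last step of the proof can be checked numerically. The sketch below solves (5.55) for $\eta_h$ by bisection and then compares the determinant ratio computed from the explicit determinants with the closed form $(\sigma_0/(\sigma\eta_h))^{1/(3-m)}$. The parameter values are arbitrary illustrations, and the function names are not from the thesis. The bisection assumes $m < 3$ and $\sigma_h > \sigma_0 > 0$, so that the left-hand side of (5.55) is increasing on $(0,1)$.

```python
import math

def solve_eta(sigma0, sigma, sigma_h, m, tol=1e-14):
    """Solve (5.55) for eta_h in (0, 1) by bisection.
    Assumes m < 3 and sigma_h > sigma0 > 0, so that
    eta / (1 - eta^2)^((3-m)/2) increases from 0 to +infinity on (0, 1)."""
    p = 0.5 * (3.0 - m)
    theta = sigma0 / sigma_h
    rhs = theta / (1.0 - theta ** 2) ** p * (sigma / sigma_h) ** (2.0 - m)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - mid ** 2) ** p < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def det_ratio_sqrt(sigma0, sigma, sigma_h, eta, C=1.0):
    """(det Sigma_{0->h})^(1/2) / (det Sigma^*)^(1/2) from the explicit determinants."""
    det_0h = C ** 2 * sigma0 ** 2 * (sigma_h ** 2 - sigma0 ** 2)
    det_star = C ** 2 * sigma0 ** 2 * sigma ** 2 * (1.0 - eta ** 2)
    return math.sqrt(det_0h / det_star)

# Illustrative values (assumptions, not taken from the text).
sigma0, sigma, sigma_h, m = 1.0, 1.3, 1.1, 0.5
eta = solve_eta(sigma0, sigma, sigma_h, m)
lhs = det_ratio_sqrt(sigma0, sigma, sigma_h, eta)
rhs = (sigma0 / (sigma * eta)) ** (1.0 / (3.0 - m))
```

Here `lhs` and `rhs` should agree up to the bisection tolerance.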

5.4 Proof of the main theorem

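In this section, we bring all ingredients together to prove the main theorem. Suppose that the assumption of the main theorem holds. We set

\[
a(q,\sigma_0) = \frac{2C}{C_1(m,2)}\left(\frac{C_0(m,2)}{C\sigma_0}\right)^{m-1} = \frac{2C^{2-m}}{C_1(m,2)}\left(\frac{C_0(m,2)}{\sigma_0}\right)^{m-1}, \tag{5.59}
\]

\[
b(\sigma_0,q) = \frac{(2-q)C_1(q,1)}{C}\left(\frac{C_0(q,1)}{\sigma_0\sqrt{C}}\right)^{1-q} = \frac{(2-q)C_1(q,1)}{C^{\frac{3-q}{2}}}\left(\frac{C_0(q,1)}{\sigma_0}\right)^{1-q}. \tag{5.60}
\]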


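Let $Q^*$ be the minimizer in (5.13). By Proposition 5.3.6, we have

\[
Q^* = N_m(\mu_0, C\sigma_0^2, \mu, C\sigma^2, \eta_h),
\]

where $\eta_h = \eta(h,\sigma)$ satisfies the following equation

\[
\frac{\eta_h}{(1-\eta_h^2)^{\frac{3-m}{2}}} = \frac{\theta_h}{(1-\theta_h^2)^{\frac{3-m}{2}}}\left(\frac{\sigma}{\sigma_h}\right)^{2-m}
= \frac{\frac{\sigma_0}{\sigma_h}}{\left(1-\frac{\sigma_0^2}{\sigma_h^2}\right)^{\frac{3-m}{2}}}\left(\frac{\sigma}{\sigma_h}\right)^{2-m}
= \frac{\sigma_0}{(\sigma_h^2-\sigma_0^2)^{\frac{3-m}{2}}}\,\sigma^{2-m}. \tag{5.61}
\]

Since $|\eta_h|\le 1$ and the right-hand side of (5.61) is positive, it holds that $0 < \eta_h < 1$. Using the relationship $m = 3 - \frac{2}{q}$, we can rewrite the above equation as follows:

\[
\frac{\eta_h^q}{1-\eta_h^2} = \frac{\sigma_0^q\,\sigma^{2-q}}{\sigma_h^2-\sigma_0^2}. \tag{5.62}
\]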

We now use the following statement, whose proof is straightforward. Assume that $x_0 > 0$ is given. For all $\varepsilon > 0$ and for all $h > 0$ sufficiently small, there exists a constant $C = C(\varepsilon, x_0)$ such that

\[
(h + x_0)^{\varepsilon} - x_0^{\varepsilon} \le Ch. \tag{5.63}
\]

In particular, if $\varepsilon < 1$ then $C = \varepsilon x_0^{\varepsilon-1}$.
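For $\varepsilon < 1$ the bound (5.63) is the tangent-line inequality for the concave map $t \mapsto t^{\varepsilon}$, and it then holds for every $h > 0$. A minimal numerical illustration follows, using the choice $\varepsilon = \frac{2}{3-q}$, $x_0 = \sigma_0^{3-q}$ with $0 < q < 1$; the specific numbers are arbitrary.

```python
def increment(h, x0, eps):
    """Left-hand side of (5.63): (h + x0)**eps - x0**eps."""
    return (h + x0) ** eps - x0 ** eps

def tangent_constant(x0, eps):
    """C = eps * x0**(eps - 1), the slope of t -> t**eps at t = x0 (used for eps < 1)."""
    return eps * x0 ** (eps - 1.0)

# Illustrative values with 0 < q < 1, so that eps = 2/(3-q) lies in (0, 1).
q, sigma0 = 0.5, 1.2
eps, x0 = 2.0 / (3.0 - q), sigma0 ** (3.0 - q)

# Gap C*h - [(h + x0)^eps - x0^eps]; concavity predicts it is non-negative.
gaps = [tangent_constant(x0, eps) * h - increment(h, x0, eps)
        for h in (1.0, 0.1, 0.01, 0.001)]
```

With these choices the constant simplifies to $\frac{2}{(3-q)\sigma_0^{1-q}}$.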

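Using (5.63) with $\varepsilon = \frac{2}{3-q}$ and $x_0 = \sigma_0^{3-q}$, together with (5.62), we get

\[
1-\eta_h = \frac{\eta_h^q}{1+\eta_h}\,\frac{\sigma_h^2-\sigma_0^2}{\sigma_0^q\,\sigma^{2-q}}
= \frac{\eta_h^q}{1+\eta_h}\,\frac{(h+\sigma_0^{3-q})^{\frac{2}{3-q}} - \sigma_0^2}{\sigma_0^q\,\sigma^{2-q}} \le \frac{Ch}{\sigma^{2-q}}, \tag{5.64}
\]

where $C > 0$ is a constant depending only on $\sigma_0$ and $q$. This implies that for fixed $\sigma$, $\lim_{h\to0}\eta_h = \lim_{h\to0}\eta(h,\sigma) = 1$, and that, as a sequence of functions, $\eta(h,\cdot)\to 1$ locally uniformly.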

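1. We now prove the first statement of the main theorem. We need to prove

\[
a(\sigma_h^2-\sigma_0^2)^{\frac1q}J_h(\cdot\,|N_q^0) \xrightarrow{\Gamma} W_2^2(\cdot, N_q^0) \quad\text{as } h\to0. \tag{5.65}
\]

Let $N_q(\mu, C\sigma^2)$ be given; we denote it by $N_q$ for short. By Proposition 5.3.6, we have

\[
a(\sigma_h^2-\sigma_0^2)^{\frac1q}J_h(N_q|N_q^0) = C(\sigma-\sigma_0)^2 + (\mu-\mu_0)^2 + 2C\sigma_0\sigma(1-\eta_h) + C(\sigma_h^2-\sigma_0^2)\left[2\log_m\left(\frac{\sigma_0}{\sigma\eta_h}\right)^{\frac{1}{3-m}} - 1\right],
\]

and

\[
W_2^2(N_q, N_q^0) = C(\sigma-\sigma_0)^2 + (\mu-\mu_0)^2. \tag{5.66}
\]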


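For the lower bound part: assume that $N_q^h = N_q(\nu_h, C\xi_h^2)\to N_q$. This means that $(\nu_h-\mu)^2 + (\xi_h-\sigma)^2\to0$. Hence we can assume that $0 < \frac{\sigma}{2}\le\sup_h\xi_h\le\frac{3}{2}\sigma$.

Let $\eta_h = \eta(h,\xi_h)$ be the solution of (5.61) with $\sigma$ replaced by $\xi_h$. By (5.64) we have $\lim_{h\to0}\eta_h = 1$, so we can assume that $\xi_h\eta_h$ is uniformly bounded above and away from 0. We now have

\begin{align*}
a(\sigma_h^2-\sigma_0^2)^{\frac1q}J_h(N_q^h|N_q^0)
&= C(\xi_h-\sigma_0)^2 + (\nu_h-\mu_0)^2 + 2C\sigma_0\xi_h(1-\eta_h) + C(\sigma_h^2-\sigma_0^2)\left[2\log_m\left(\frac{\sigma_0}{\xi_h\eta_h}\right)^{\frac{1}{3-m}} - 1\right]\\
&\ge C(\xi_h-\sigma_0)^2 + (\nu_h-\mu_0)^2 + C(\sigma_h^2-\sigma_0^2)\left[2\log_m\left(\frac{\sigma_0}{\xi_h\eta_h}\right)^{\frac{1}{3-m}} - 1\right].
\end{align*}

Hence

\begin{align*}
\liminf_{h\to0}\, a(\sigma_h^2-\sigma_0^2)^{\frac1q}J_h(N_q^h|N_q^0)
&\ge \liminf_{h\to0}\left\{C(\xi_h-\sigma_0)^2 + (\nu_h-\mu_0)^2 + C(\sigma_h^2-\sigma_0^2)\left[2\log_m\left(\frac{\sigma_0}{\xi_h\eta_h}\right)^{\frac{1}{3-m}} - 1\right]\right\}\\
&= C(\sigma-\sigma_0)^2 + (\mu-\mu_0)^2 = W_2^2(N_q, N_q^0).
\end{align*}

For the upper bound part: as a recovery sequence, we simply take the constant sequence $N_q^h = N_q$.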

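2. We now prove the second statement of the main theorem. We need to prove

\[
ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(\cdot\,|N_q^0) - \frac{b}{\sigma_h^2-\sigma_0^2}W_2^2(\cdot, N_q^0) \xrightarrow{\Gamma} E_q(\cdot) - E_q(N_q^0) \quad\text{as } h\to0. \tag{5.67}
\]

Let $N_q = N_q(\mu, C\sigma^2)$ be given, and let $\eta_h$ be the solution of (5.61). We have

\begin{align}
ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(N_q|N_q^0)
&= bC\left[\frac{(\sigma-\sigma_0)^2}{\sigma_h^2-\sigma_0^2} + \frac{(\mu-\mu_0)^2}{C(\sigma_h^2-\sigma_0^2)} + \frac{2\sigma_0\sigma(1-\eta_h)}{\sigma_h^2-\sigma_0^2} + 2\log_m\left(\frac{\sigma_0}{\sigma\eta_h}\right)^{\frac{1}{3-m}} - 1\right], \tag{5.68}\\
\frac{b}{\sigma_h^2-\sigma_0^2}W_2^2(N_q, N_q^0)
&= bC\left[\frac{(\sigma-\sigma_0)^2}{\sigma_h^2-\sigma_0^2} + \frac{(\mu-\mu_0)^2}{C(\sigma_h^2-\sigma_0^2)}\right], \tag{5.69}\\
E_q(N_q) - E_q(N_q^0) &= bC\log_q\left(\frac{\sigma_0}{\sigma}\right). \tag{5.70}
\end{align}

Define

\begin{align*}
F_h(N_q; N_q^0) &:= \frac{2\sigma_0\sigma(1-\eta_h)}{\sigma_h^2-\sigma_0^2} + 2\log_m\left(\frac{\sigma_0}{\sigma\eta_h}\right)^{\frac{1}{3-m}} - 1,\\
F(N_q; N_q^0) &:= \log_q\left(\frac{\sigma_0}{\sigma}\right).
\end{align*}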


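We need to prove

\[
F_h(\cdot\,; N_q^0) \xrightarrow{\Gamma} F(\cdot\,; N_q^0). \tag{5.71}
\]

We first prove that $F_h(\cdot\,; N_q^0)\to F(\cdot\,; N_q^0)$ locally uniformly. To this end, we rewrite the right-hand side of $F_h$ using the relationship between $m$ and $q$.

For any $t > 0$,

\[
\log_m t^{\frac{1}{3-m}} = \frac{t^{\frac{1-m}{3-m}} - 1}{1-m} = \frac{t^{1-q} - 1}{\frac{2(1-q)}{q}} = \frac{q}{2}\log_q t.
\]

Hence

\[
2\log_m\left(\frac{\sigma_0}{\sigma\eta_h}\right)^{\frac{1}{3-m}} = q\log_q\left(\frac{\sigma_0}{\sigma\eta_h}\right). \tag{5.72}
\]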
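The change of logarithm behind (5.72) is easy to confirm numerically. The sketch below evaluates both sides of $\log_m t^{1/(3-m)} = \frac q2\log_q t$ under the relation $m = 3 - \frac2q$; the helper names are illustrative.

```python
import math

def log_gen(t, a):
    """Generalized logarithm log_a(t) = (t**(1-a) - 1)/(1-a); natural log at a = 1."""
    if abs(a - 1.0) < 1e-12:
        return math.log(t)
    return (t ** (1.0 - a) - 1.0) / (1.0 - a)

def both_sides(t, q):
    """Return (log_m(t**(1/(3-m))), (q/2) * log_q(t)) with m = 3 - 2/q."""
    m = 3.0 - 2.0 / q
    left = log_gen(t ** (1.0 / (3.0 - m)), m)
    right = 0.5 * q * log_gen(t, q)
    return left, right
```

For instance, `both_sides(2.5, 0.5)` returns two values that agree up to rounding.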

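From (5.61), we get

\[
\frac{\eta_h}{(1+\eta_h)^{\frac{3-m}{2}}} = \left(\frac{\sigma_0\sigma(1-\eta_h)}{\sigma_h^2-\sigma_0^2}\right)^{\frac{3-m}{2}}\left(\frac{\sigma}{\sigma_0}\right)^{\frac{1-m}{2}}. \tag{5.73}
\]

This implies that

\[
\frac{\sigma_0\sigma(1-\eta_h)}{\sigma_h^2-\sigma_0^2} = \frac{\eta_h^q}{1+\eta_h}\left(\frac{\sigma_0}{\sigma}\right)^{1-q} = \frac{\eta_h^q}{1+\eta_h}\left((1-q)\log_q\left(\frac{\sigma_0}{\sigma}\right) + 1\right). \tag{5.74}
\]

From (5.72) and (5.74) we obtain

\[
F_h(N_q; N_q^0) = \frac{2\eta_h^q}{1+\eta_h}\left(\frac{\sigma_0}{\sigma}\right)^{1-q} + q\log_q\left(\frac{\sigma_0}{\sigma\eta_h}\right) - 1.
\]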

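Now we have the following estimate:

\begin{align}
|F_h - F| &= \left|\frac{2\eta_h^q}{1+\eta_h}\left(\frac{\sigma_0}{\sigma}\right)^{1-q} + q\log_q\left(\frac{\sigma_0}{\sigma\eta_h}\right) - \log_q\left(\frac{\sigma_0}{\sigma}\right) - 1\right| \notag\\
&= \left|\frac{2\eta_h^q}{1+\eta_h}\left(\frac{\sigma_0}{\sigma}\right)^{1-q} - 1 - (1-q)\log_q\left(\frac{\sigma_0}{\sigma}\right) + q\log_q\left(\frac{\sigma_0}{\sigma\eta_h}\right) - q\log_q\left(\frac{\sigma_0}{\sigma}\right)\right| \notag\\
&= \left(\frac{\sigma_0}{\sigma}\right)^{1-q}\left|\frac{2\eta_h^q}{1+\eta_h} - 1 + \frac{q}{1-q}\left(\eta_h^{q-1} - 1\right)\right| \notag\\
&\le \left(\frac{\sigma_0}{\sigma}\right)^{1-q}\left[\left|\frac{2\eta_h^q}{1+\eta_h} - 1\right| + \left|\frac{q}{1-q}\left(\eta_h^{q-1} - 1\right)\right|\right] \notag\\
&\le \frac{1}{\sigma^{1-q}}\,C(1-\eta_h) \overset{(5.64)}{\le} \frac{Ch}{\sigma^{3-2q}}, \tag{5.75}
\end{align}

where $C$ is a constant depending only on $\sigma_0$ and $q$.

The locally uniform convergence of $F_h$ to $F$ thus follows from the estimate (5.75).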
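To see the rate in (5.75) at work, one can solve (5.62) numerically for $\eta_h$ and evaluate $F_h$ in the form obtained from (5.72) and (5.74). The following sketch does this for a few values of $h$; the parameter values ($\sigma_0 = 1$, $\sigma = 1.4$, $q = 0.5$) are arbitrary, the function names are illustrative, and $q \ne 1$ is assumed so that $\log_q$ has its power form.

```python
def solve_eta_q(sigma0, sigma, h, q, tol=1e-14):
    """Solve (5.62) for eta_h in (0, 1) by bisection, with
    sigma_h^2 = (h + sigma0^(3-q))^(2/(3-q)); the left-hand side is increasing in eta."""
    delta = (h + sigma0 ** (3.0 - q)) ** (2.0 / (3.0 - q)) - sigma0 ** 2
    rhs = sigma0 ** q * sigma ** (2.0 - q) / delta
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid ** q / (1.0 - mid ** 2) < rhs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_q(t, q):
    """q-logarithm (t**(1-q) - 1)/(1-q), assuming q != 1."""
    return (t ** (1.0 - q) - 1.0) / (1.0 - q)

def F_h(sigma0, sigma, h, q):
    """F_h written in terms of q, as obtained from (5.72) and (5.74)."""
    eta = solve_eta_q(sigma0, sigma, h, q)
    return (2.0 * eta ** q / (1.0 + eta) * (sigma0 / sigma) ** (1.0 - q)
            + q * log_q(sigma0 / (sigma * eta), q) - 1.0)

def F(sigma0, sigma, q):
    """Limit functional F = log_q(sigma0 / sigma)."""
    return log_q(sigma0 / sigma, q)

# |F_h - F| for decreasing h; (5.75) predicts a bound of order h.
errors = [abs(F_h(1.0, 1.4, h, 0.5) - F(1.0, 1.4, 0.5)) for h in (1e-1, 1e-2, 1e-3)]
```

The entries of `errors` should shrink roughly in proportion to $h$.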

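For the $\Gamma$-convergence, we get the lower bound part from the locally uniform convergence and the continuity of the entropy. Indeed, let $N_q := N_q(\mu, C\sigma^2)$ be given and assume that $N_q^h := N_q(\mu_h, C\xi_h^2)\to N_q$. Then $\mu_h\to\mu$ and $\xi_h\to\sigma$. Hence we can assume without loss of generality that $\frac{\sigma}{2}\le\sup_h\xi_h\le\frac{3\sigma}{2}$. We have the following estimate:

\begin{align*}
\left|F_h(N_q^h; N_q^0) - F(N_q; N_q^0)\right|
&\le \left|F_h(N_q^h; N_q^0) - F(N_q^h; N_q^0)\right| + \left|F(N_q^h; N_q^0) - F(N_q; N_q^0)\right|\\
&\overset{(5.75)}{\le} \frac{Ch}{\xi_h^{3-2q}} + \left|\log_q\left(\frac{\sigma_0}{\xi_h}\right) - \log_q\left(\frac{\sigma_0}{\sigma}\right)\right|\\
&\le \frac{Ch}{\sigma^{3-2q}} + \left|\log_q\left(\frac{\sigma_0}{\xi_h}\right) - \log_q\left(\frac{\sigma_0}{\sigma}\right)\right| \to 0.
\end{align*}

Therefore

\[
\lim_{h\to0}F_h(N_q^h; N_q^0) = F(N_q; N_q^0).
\]

For the upper bound part, as a recovery sequence we can choose the constant sequence $N_q^h = N_q$.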

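3. We now prove the third statement of the main theorem. Assume that $0 < q < 1$. We need to prove

\[
ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(\cdot\,|N_q^0) - \frac{1}{2h}W_2^2(\cdot, N_q^0) \xrightarrow{\Gamma\text{-}\liminf} E_q(\cdot) - E_q(N_q^0) \quad\text{as } h\to0.
\]

We have

\begin{align*}
&ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(\cdot\,|N_q^0) - \frac{1}{2h}W_2^2(\cdot, N_q^0)\\
&\qquad= ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(\cdot\,|N_q^0) - \frac{b}{\sigma_h^2-\sigma_0^2}W_2^2(\cdot, N_q^0) + \left(\frac{b}{\sigma_h^2-\sigma_0^2} - \frac{1}{2h}\right)W_2^2(\cdot, N_q^0).
\end{align*}

Since $0 < q < 1$, we have $0 < \frac{2}{3-q} < 1$. Using (5.63) with $\varepsilon = \frac{2}{3-q}$ and $x_0 = \sigma_0^{3-q}$, we obtain

\[
\sigma_h^2-\sigma_0^2 = (h+\sigma_0^{3-q})^{\frac{2}{3-q}} - \sigma_0^2 \le \frac{2}{(3-q)\sigma_0^{1-q}}\,h.
\]

Therefore

\[
\frac{b}{\sigma_h^2-\sigma_0^2} \ge \frac{(3-q)\,b\,\sigma_0^{1-q}}{2h} = \frac{1}{2h}.
\]

It implies that

\[
ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(\cdot\,|N_q^0) - \frac{1}{2h}W_2^2(\cdot, N_q^0) \ge ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(\cdot\,|N_q^0) - \frac{b}{\sigma_h^2-\sigma_0^2}W_2^2(\cdot, N_q^0),
\]

and the third statement thus follows from the second one.

This completes the proof of the main theorem.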


5.4.1 Discussion

The Gamma-convergence results of the main theorem suggest the following asymptotic equivalences as $h\downarrow0$:

\begin{align}
a(\sigma_h^2-\sigma_0^2)^{\frac1q}J_h(\cdot\,|N_q^0) &\approx W_2^2(\cdot, N_q^0), \tag{5.76}\\
ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(N_q|N_q^0) &\approx \frac{b}{\sigma_h^2-\sigma_0^2}W_2^2(N_q, N_q^0) + E_q(N_q) - E_q(N_q^0). \tag{5.77}
\end{align}

In the case $0 < q < 1$, we have

\[
ab(\sigma_h^2-\sigma_0^2)^{\frac{1-q}{q}}J_h(N_q|N_q^0) \approx \frac{1}{2h}W_2^2(N_q, N_q^0) + E_q(N_q) - E_q(N_q^0) \quad\text{as } h\downarrow0. \tag{5.78}
\]

At each time step $h > 0$, and given $N_q$ and $N_q^0$, the functional $J_h(N_q|N_q^0)$ is always non-negative and is equal to 0 if and only if $N_q = N_q^0$; the same holds for the functionals on the left-hand sides of (5.76), (5.77) and (5.78). By its definition, $J_h(N_q|N_q^0)$ measures the deviation of $N_q$ from the solution of the porous medium equation at time $h$ given the initial data $N_q^0$. Hence minimizing $J_h$ means finding the best approximation of the solution at time $h$. As a consequence of the Gamma-convergence results, minimizers of the functionals on the left-hand sides of (5.76), (5.77) and (5.78) converge, as $h\to0$, to minimizers of the functionals on the respective right-hand sides. Thus the main theorem explains why the Wasserstein distance is involved and why we should minimize the combination of it with the internal energy functional $E_q$.

For the linear diffusion equation, by Sanov's theorem the relative entropy is the rate functional of the empirical process of many i.i.d. particles. The functional $J_h$ in (5.4) is the rate functional after time $h$ of the empirical process of many Brownian motions. Hence it has a clear microscopic interpretation. The $m$-relative entropy $H_m$ and the functional $J_h$ of this chapter are defined analogously to the relative entropy $H$ and the functional $J_h$ in (5.4), and have similar properties. However, it is unclear whether or not the $J_h$ of this chapter is the rate functional of some microscopic stochastic process. We leave this as an open problem.

Chapter 6

Microscopic derivation of the thermo-visco-elasticity system

In this chapter, we formally derive the thermo-visco-elasticity (TVE) system as the hydrodynamic limit of an underlying particle model¹.

6.1 The thermo-visco-elasticity system

This chapter is concerned with the thermo-visco-elasticity system

\[
\begin{cases}
\dot u = p,\\
\dot p = k u_{xx} + \alpha\theta_x + \mu p_{xx},\\
\dot\theta = \alpha\theta p_x + \kappa\theta_{xx} + \mu p_x^2.
\end{cases} \tag{6.1}
\]

In this system, $u$ is the displacement, $p$ is the momentum and $\theta$ is the absolute temperature. They are functions of $t\in[0,\infty)$ and $x\in\Omega\equiv(0,1)$. Here $\dot u$ is the time derivative of $u$, and similarly for the other unknowns. Subscripts on the right-hand side denote derivatives with respect to the spatial variable $x$. Finally, $k$, $\alpha$, $\kappa$ and $\mu$ are positive constant parameters. Equation (6.1) needs to be supplemented with suitable initial and boundary conditions. Note that the last equation in (6.1) can also be written in terms of the internal energy $e = \theta + \frac k2 u_x^2 + \frac12 p^2$ as follows:

\[
\dot e = (\alpha\theta p + \kappa\theta_x + k p u_x + \mu p p_x)_x.
\]
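For completeness, the energy form can be obtained in a few lines from (6.1) and the definition of $e$, using $\dot u_x = p_x$ (a consequence of the first equation):

```latex
\begin{align*}
\dot e &= \dot\theta + k\,u_x\,\dot u_x + p\,\dot p \\
       &= \bigl(\alpha\theta p_x + \kappa\theta_{xx} + \mu p_x^2\bigr) + k\,u_x p_x
          + p\,\bigl(k u_{xx} + \alpha\theta_x + \mu p_{xx}\bigr) \\
       &= \alpha(\theta p)_x + \kappa\theta_{xx} + k\,(p\,u_x)_x + \mu\,(p\,p_x)_x \\
       &= \bigl(\alpha\theta p + \kappa\theta_x + k p u_x + \mu p p_x\bigr)_x .
\end{align*}
```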

6.2 GENERIC structure of the TVE

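In [Mie11], the author showed that the TVE is a GENERIC system (see Section 1.3),

\[
\dot z = L(z)\,dE(z) + M(z)\,dS(z),
\]

with the following building blocks:

\[
z = (u, p, \theta)^T, \qquad S(z) = \int_\Omega (\log\theta - \alpha u_x)\,dx, \qquad E(z) = \int_\Omega\left(\theta + \frac k2 u_x^2 + \frac12 p^2\right)dx,
\]

\[
L(z) = \begin{pmatrix} 0 & 1 & 0\\ -1 & 0 & \alpha\partial_x(\theta\,\Box)\\ 0 & \alpha\theta\partial_x\Box & 0\end{pmatrix}, \qquad
M(z) = \begin{pmatrix} 0 & 0 & 0\\ 0 & -\mu\partial_x(\theta\partial_x\Box) & \mu\partial_x(\theta p_x\Box)\\ 0 & -\mu\theta p_x\partial_x\Box & -\partial_x(\kappa\theta^2\partial_x\Box) + \mu\theta p_x^2\,\Box\end{pmatrix}.
\]

We recall that $L(z)$ and $M(z)$ are operators; in their formulas, $\Box$ represents the argument.

¹ This chapter is work in progress together with Mark A. Peletier and Johannes Zimmer.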

In Chapters 3 and 4 we interpreted the Wasserstein gradient flow structure of the Fokker-Planck equation and the GENERIC structure of the Vlasov-Fokker-Planck equation as characterizations of the large-deviation principles of underlying particle systems. In this chapter, we are interested in generalizing these results to the TVE system. Unlike for the Fokker-Planck or the Vlasov-Fokker-Planck equation, it is not straightforward to construct a microscopic particle system that gives rise to the TVE system as its hydrodynamic limit. The large-deviation principle around the hydrodynamic limit is even more intricate. The difficulty arises from the nonlinearity present in the TVE system.

In the next section, we formally derive the TVE system as the hydrodynamic limit of a chain of one-dimensional harmonic oscillators. To obtain the hydrodynamic limit for the internal energy, we make two assumptions: a local equilibrium assumption (6.20) and a replacement assumption (6.21). Roughly speaking, these two assumptions allow us to close an equation by substituting certain quantities by functions of the unknowns. To make the formal derivation rigorous, we need to prove these two assumptions, which is non-trivial. This work is in progress.

6.3 A microscopic model

We consider a chain of one-dimensional harmonic oscillators located at sites $i\in S_n := \{1,\dots,n\}$. Here we assume periodicity modulo $n$, i.e., the site $n+1$ is the same as the site $1$. The configuration space is denoted by $\Omega_n = (\mathbb{R}\times\mathbb{R})^{S_n}$, and a configuration is $\omega = \{u_i, p_i\}_{i=1}^n$, where $u_i$ and $p_i$ respectively represent the displacement and the momentum of particle $i$.

We consider the TVE-process introduced in the introductory chapter. Its generator $L_n$ acts on smooth functions $f:\mathbb{R}^{2n}\to\mathbb{R}$ and is given by

\begin{align}
L_n f &= \sum_{i=1}^n p_i\,\partial_{u_i}f + \sum_{i=1}^n n^2 k\,(u_{i+1} - 2u_i + u_{i-1})\,\partial_{p_i}f + n^2\mu\sum_{i=1}^n Y_i^2 f \tag{6.2}\\
&= A_n f + n^2 k\,B_n f + n^2\mu\,C_n f, \tag{6.3}
\end{align}

where

\[
Y_i = (p_i - p_{i+1})\,\partial_{p_{i-1}} + (p_{i+1} - p_{i-1})\,\partial_{p_i} + (p_{i-1} - p_i)\,\partial_{p_{i+1}}. \tag{6.4}
\]


Remark 6.3.1. The last term in the generator describes the random exchange of momentum between three consecutive particles and has been used in [BBO06]. Its main property is that it conserves the total momentum and the total kinetic energy. As a consequence, the TVE-process conserves the deformation field, the total momentum and the energy:

\[
L_n\sum_{j=1}^{n-1} r_j = L_n\sum_{j=1}^{n} p_j = L_n\sum_{j=1}^{n} e_j = 0, \tag{6.5}
\]

where $r_j = u_{j+1} - u_j$.
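The conservation property of the noise term can be checked directly: since the last term of the generator is $n^2\mu\sum_i Y_i^2$, it suffices that each vector field $Y_i$ annihilates $\sum_j p_j$ and $\sum_j p_j^2$. The Python sketch below applies $Y_i$ from (6.4) to these two functions at an arbitrary momentum configuration. It uses 0-based periodic indices and central finite differences for the partial derivatives (exact up to rounding for polynomials of degree at most two); both are implementation choices for this illustration, not part of the model.

```python
def grad(f, p, eps=1e-3):
    """Central-difference gradient of f at p; exact up to rounding
    for polynomials of degree <= 2."""
    g = []
    for k in range(len(p)):
        up = list(p); up[k] += eps
        dn = list(p); dn[k] -= eps
        g.append((f(up) - f(dn)) / (2.0 * eps))
    return g

def Y(i, f):
    """Return the function Y_i f, with Y_i as in (6.4) and periodic indices."""
    def Yf(p):
        n = len(p)
        im, ip = (i - 1) % n, (i + 1) % n
        g = grad(f, p)
        return ((p[i] - p[ip]) * g[im]
                + (p[ip] - p[im]) * g[i]
                + (p[im] - p[i]) * g[ip])
    return Yf

def total_momentum(p):
    return sum(p)

def kinetic_energy(p):
    return 0.5 * sum(x * x for x in p)

# Arbitrary momentum configuration on n = 6 sites.
p = [0.7, -1.2, 0.4, 2.1, -0.3, 1.5]
residuals = [abs(Y(i, F)(p))
             for i in range(len(p))
             for F in (total_momentum, kinetic_energy)]
```

All entries of `residuals` should vanish up to rounding error.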

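We are interested in the following empirical processes:

\begin{align}
u_n(t, dx) &= \frac1n\sum_{i=1}^n u_i(t)\,\delta_{i/n}(dx), \tag{6.6}\\
p_n(t, dx) &= \frac1n\sum_{i=1}^n p_i(t)\,\delta_{i/n}(dx), \tag{6.7}\\
e_n(t, dx) &= \frac1n\sum_{i=1}^n e_i(t)\,\delta_{i/n}(dx), \tag{6.8}
\end{align}

where $\delta_{i/n}(dx)$ is the Dirac delta measure at site $i/n$ and

\[
e_i(t) = \frac{p_i(t)^2}{2} + \frac{n^2 k}{2}\left[u_{i+1}(t) - u_i(t)\right]^2. \tag{6.9}
\]

For each $t$, these are probability measures on the flat one-dimensional torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$.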

Remark 6.3.2. Note how different effects are scaled differently in $L_n$. The first term is scaled by a factor 1, while the last two terms in (1.47) are sped up by a factor $n^2$. Hence $L_n$ contains both hyperbolic and diffusive scalings. The second term in (1.53) is also multiplied by $n^2$ to ensure that $u_i$, $p_i$, $e_i$ as well as $\frac1n\sum_{i=1}^n e_i$ are of order $O(1)$.

Let $\pi_n(t)$ be any of the three processes defined in (1.51)-(1.53). Let $\rho_0$ be a given probability measure on $\mathbb{T}$. A typical result on convergence to the hydrodynamic limit of $\pi_n$ consists in proving that, under suitable initial conditions on $\pi_n(0)$, if $\pi_n(0)$ converges to $\rho_0$ in an appropriate topology, then for any $t > 0$, $\pi_n(t)$ converges to some probability measure $\rho(t)$ which satisfies some partial differential equation that has $\rho_0$ as its initial profile.

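Conjecture 6.3.3. Let $P_n$ denote the distribution of the TVE process. Assume that for every continuous and bounded function $\varphi:\mathbb{T}\to\mathbb{R}$ and for every $\delta > 0$ we have

\begin{align*}
\lim_{n\to\infty}P_n\left(\left|\langle\varphi, u_n(0)\rangle - \int_{\mathbb{T}}\varphi(x)u_0(x)\,dx\right| > \delta\right) &= 0,\\
\lim_{n\to\infty}P_n\left(\left|\langle\varphi, p_n(0)\rangle - \int_{\mathbb{T}}\varphi(x)p_0(x)\,dx\right| > \delta\right) &= 0,\\
\lim_{n\to\infty}P_n\left(\left|\langle\varphi, e_n(0)\rangle - \int_{\mathbb{T}}\varphi(x)e_0(x)\,dx\right| > \delta\right) &= 0.
\end{align*}

Then these limits also hold for any $t > 0$,

\begin{align*}
\lim_{n\to\infty}P_n\left(\left|\langle\varphi, u_n(t)\rangle - \int_{\mathbb{T}}\varphi(x)u(t,x)\,dx\right| > \delta\right) &= 0,\\
\lim_{n\to\infty}P_n\left(\left|\langle\varphi, p_n(t)\rangle - \int_{\mathbb{T}}\varphi(x)p(t,x)\,dx\right| > \delta\right) &= 0,\\
\lim_{n\to\infty}P_n\left(\left|\langle\varphi, e_n(t)\rangle - \int_{\mathbb{T}}\varphi(x)e(t,x)\,dx\right| > \delta\right) &= 0,
\end{align*}

where $(u, p, e)$ satisfy the following system:

\[
\begin{cases}
\dot u = p,\\
\dot p = k u_{xx} + 6\mu p_{xx},\\
\dot e = k\,\partial_x(p u_x) + \mu\,\partial_{xx}(6\theta + 3p^2).
\end{cases} \tag{6.10}
\]

In the next section, we present formal computations that support this conjecture.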

6.4 Hydrodynamic limits (HDL)

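Let $\pi_n(t)$ be any of the three processes defined in (6.6)-(6.8). Denote by $P_n$ the distribution of the TVE-process and by $E_{P_n}$ the expectation with respect to $P_n$. To derive the hydrodynamic limit (HDL) for $\pi_n$ we use the following property of the generator of a Markov process: for any smooth function $\varphi$ on $\mathbb{T}$, we have

\[
\frac{d}{dt}E_{P_n}\left[\langle\varphi, \pi_n(t)\rangle\right] = E_{P_n}\left[L_n\langle\varphi, \pi_n(t)\rangle\right]. \tag{6.11}
\]

We first compute $L_n u_j$, $L_n p_j$, and $L_n e_j$ for $j = 1,\dots,n$:

\begin{align}
L_n u_j &= A_n u_j = p_j, \tag{6.12}\\
L_n p_j &= n^2 k\,B_n p_j + n^2\mu\,C_n p_j = n^2 k\,(u_{j+1} - 2u_j + u_{j-1}) + n^2\mu\sum_{i=1}^n Y_i^2 p_j, \tag{6.13}\\
Y_i p_j &= (p_i - p_{i+1})\,\delta_{i-1,j} + (p_{i+1} - p_{i-1})\,\delta_{i,j} + (p_{i-1} - p_i)\,\delta_{i+1,j}, \tag{6.14}
\end{align}

where $\delta_{a,b}$ is the Kronecker delta,

\[
\delta_{a,b} = \begin{cases} 1, & \text{if } a = b,\\ 0, & \text{if } a\ne b.\end{cases}
\]

From (6.14), we get

\[
Y_i^2 p_j = (p_i - p_{i+1})(-\delta_{i,j} + \delta_{i+1,j}) + (p_{i+1} - p_{i-1})(\delta_{i-1,j} - \delta_{i+1,j}) + (p_{i-1} - p_i)(-\delta_{i-1,j} + \delta_{i,j}).
\]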


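Hence

\begin{align*}
\sum_{i=1}^n Y_i^2 p_j &= \sum_{i=1}^n\Big[(p_i - p_{i+1})(-\delta_{i,j} + \delta_{i+1,j}) + (p_{i+1} - p_{i-1})(\delta_{i-1,j} - \delta_{i+1,j}) + (p_{i-1} - p_i)(-\delta_{i-1,j} + \delta_{i,j})\Big]\\
&= p_{j+2} + 2p_{j+1} - 6p_j + 2p_{j-1} + p_{j-2}.
\end{align*}

It follows that

\begin{align}
L_n p_j &= n^2 k\,(u_{j+1} - 2u_j + u_{j-1}) + n^2\mu\left[p_{j+2} + 2p_{j+1} - 6p_j + 2p_{j-1} + p_{j-2}\right] \notag\\
&= k\,\Delta_n u_j + \mu\,\Delta_n(p_{j+1} + 4p_j + p_{j-1}), \tag{6.15}
\end{align}

where $\Delta_n$ denotes the discrete Laplacian operator

\[
\Delta_n s_j = n^2(s_{j+1} - 2s_j + s_{j-1}).
\]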
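The five-point stencil for $\sum_i Y_i^2 p_j$ can be confirmed numerically. The sketch below applies $Y_i$ twice, using central differences for the partial derivatives and 0-based periodic indices (implementation choices for this illustration), and compares the result with $p_{j+2} + 2p_{j+1} - 6p_j + 2p_{j-1} + p_{j-2}$. It also compares $\sum_i Y_i^2 p_j^2$ with twice the quadratic expression that enters the computation of $L_n e_j$.

```python
def grad(f, p, eps=1e-2):
    """Central-difference gradient; exact up to rounding for polynomials of degree <= 2."""
    g = []
    for k in range(len(p)):
        up = list(p); up[k] += eps
        dn = list(p); dn[k] -= eps
        g.append((f(up) - f(dn)) / (2.0 * eps))
    return g

def Y(i, f):
    """Return the function Y_i f, with Y_i as in (6.4) and periodic indices."""
    def Yf(p):
        n = len(p)
        im, ip = (i - 1) % n, (i + 1) % n
        g = grad(f, p)
        return ((p[i] - p[ip]) * g[im] + (p[ip] - p[im]) * g[i] + (p[im] - p[i]) * g[ip])
    return Yf

def sum_Y2(f, p):
    """Evaluate sum_i Y_i^2 f at the configuration p."""
    return sum(Y(i, Y(i, f))(p) for i in range(len(p)))

n, j = 7, 3
p = [0.7, -1.2, 0.4, 2.1, -0.3, 1.5, -0.8]
P = lambda k: p[(j + k) % n]          # shorthand for p_{j+k}

lhs_linear = sum_Y2(lambda x: x[j], p)
rhs_linear = P(2) + 2 * P(1) - 6 * P(0) + 2 * P(-1) + P(-2)

lhs_square = sum_Y2(lambda x: x[j] ** 2, p)
Dp_j = (P(2) ** 2 + 2 * P(1) ** 2 - 6 * P(0) ** 2 + 2 * P(-1) ** 2 + P(-2) ** 2
        - 2 * P(1) * P(2) + P(2) * P(0) + 2 * P(1) * P(0) - 2 * P(1) * P(-1)
        + 2 * P(0) * P(-1) + P(0) * P(-2) - 2 * P(-1) * P(-2))
rhs_square = 2 * Dp_j
```

Both pairs (`lhs_linear`, `rhs_linear`) and (`lhs_square`, `rhs_square`) should agree up to finite-difference rounding.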

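We now compute $L_n e_j$. By definition of $e_j$,

\[
L_n e_j = \frac12 L_n(p_j^2) + \frac{n^2 k}{2}L_n(u_{j+1} - u_j)^2. \tag{6.16}
\]

We compute each term. First,

\[
L_n(u_{j+1} - u_j)^2 = A_n(u_{j+1} - u_j)^2 = 2\left(p_{j+1}(u_{j+1} - u_j) - p_j(u_{j+1} - u_j)\right).
\]

Next,

\begin{align*}
L_n p_j^2 &= n^2 k\,B_n p_j^2 + n^2\mu\sum_{i=1}^n Y_i^2 p_j^2\\
&= 2n^2 k\,(u_{j+1} - 2u_j + u_{j-1})\,p_j + n^2\mu\sum_{i=1}^n Y_i^2 p_j^2.
\end{align*}

Since

\begin{align*}
Y_i p_j^2 &= 2\left[(p_i - p_{i+1})\,p_{i-1}\delta_{i-1,j} + (p_{i+1} - p_{i-1})\,p_i\delta_{i,j} + (p_{i-1} - p_i)\,p_{i+1}\delta_{i+1,j}\right],\\
Y_i^2 p_j^2 &= 2\Big\{(p_i - p_{i+1})\left[(p_i - p_{i+1})\delta_{i-1,j} - p_i\delta_{i,j} + p_{i+1}\delta_{i+1,j}\right]\\
&\qquad + (p_{i+1} - p_{i-1})\left[p_{i-1}\delta_{i-1,j} + (p_{i+1} - p_{i-1})\delta_{i,j} - p_{i+1}\delta_{i+1,j}\right]\\
&\qquad + (p_{i-1} - p_i)\left[-p_{i-1}\delta_{i-1,j} + p_i\delta_{i,j} + (p_{i-1} - p_i)\delta_{i+1,j}\right]\Big\},\\
\sum_{i=1}^n Y_i^2 p_j^2 &= 2\Big[p_{j+2}^2 + 2p_{j+1}^2 - 6p_j^2 + 2p_{j-1}^2 + p_{j-2}^2 - 2p_{j+1}p_{j+2} + p_{j+2}p_j + 2p_{j+1}p_j - 2p_{j+1}p_{j-1}\\
&\qquad + 2p_jp_{j-1} + p_jp_{j-2} - 2p_{j-1}p_{j-2}\Big] =: 2(Dp)_j,
\end{align*}

we obtain

\[
L_n e_j = n^2 k\left[p_{j+1}(u_{j+1} - u_j) - p_j(u_j - u_{j-1})\right] + n^2\mu\,(Dp)_j. \tag{6.17}
\]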

6.4.1 HDL for un

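By definition of $u_n(t)$, we have

\[
L_n\langle\varphi, u_n(t)\rangle = \frac1n\sum_{i=1}^n\varphi(i/n)\,L_n u_i(t) \overset{(6.12)}{=} \frac1n\sum_{i=1}^n\varphi(i/n)\,p_i(t) = \langle\varphi, p_n(t)\rangle.
\]

Using (6.11), we get

\[
\frac{d}{dt}E_{P_n}(\langle\varphi, u_n(t)\rangle) = E_{P_n}(\langle\varphi, p_n(t)\rangle). \tag{6.18}
\]

By the law of large numbers, for each $t\in[0,T]$, the empirical processes $u_n(t)$, $p_n(t)$ and $e_n(t)$ respectively converge to deterministic profiles $u(t)$, $p(t)$ and $e(t)$ as $n\to\infty$, in the sense that for any $\delta > 0$ and $\varphi\in C_b(\mathbb{T})$,

\begin{align*}
\lim_{n\to\infty}P_n\left(|\langle\varphi, u_n(t)\rangle - \langle\varphi, u(t)\rangle| > \delta\right) &= 0,\\
\lim_{n\to\infty}P_n\left(|\langle\varphi, p_n(t)\rangle - \langle\varphi, p(t)\rangle| > \delta\right) &= 0,\\
\lim_{n\to\infty}P_n\left(|\langle\varphi, e_n(t)\rangle - \langle\varphi, e(t)\rangle| > \delta\right) &= 0.
\end{align*}

Combining this with (6.18), we obtain, as $n\to\infty$,

\[
\frac{d}{dt}\langle\varphi, u(t)\rangle = \langle\varphi, p(t)\rangle,
\]

which is a weak formulation of the equation $\partial_t u(t) = p(t)$.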

6.4.2 HDL for pn

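By definition of $p_n(t)$, we have

\begin{align*}
L_n\langle\varphi, p_n(t)\rangle &= \frac1n\sum_{i=1}^n\varphi(i/n)\,L_n p_i(t)\\
&\overset{(6.15)}{=} \frac1n\sum_{i=1}^n\varphi(i/n)\left[k\,\Delta_n u_i(t) + \mu\,\Delta_n\big(4p_i(t) + p_{i+1}(t) + p_{i-1}(t)\big)\right]\\
&= \frac1n\sum_{i=1}^n\left[k\,u_i(t)\,\Delta_n\varphi(i/n) + 6\mu\,p_i(t)\,\Delta_n\varphi(i/n)\right] + o(1)\\
&= \langle\Delta\varphi, k\,u_n(t) + 6\mu\,p_n(t)\rangle + o(1),
\end{align*}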


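where the third line is obtained by discrete integration by parts, and

\begin{align*}
(\Delta_n G)(i/n) &= n^2\left[G\left(\frac{i+1}{n}\right) - 2G\left(\frac in\right) + G\left(\frac{i-1}{n}\right)\right] + o(n^{-1}),\\
(\nabla_n G)(i/n) &= n\left[G\left(\frac{i+1}{n}\right) - G\left(\frac in\right)\right] + o(n^{-1}).
\end{align*}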
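The discrete integration by parts used here is the statement that the periodic discrete Laplacian is symmetric: $\frac1n\sum_i\varphi(i/n)\,\Delta_n s_i = \frac1n\sum_i s_i\,\Delta_n\varphi(i/n)$. A minimal Python check on a periodic lattice follows, with an arbitrary lattice function `s` and a smooth periodic test function `phi` (both chosen only for illustration).

```python
import math

def lap(s, n):
    """Discrete periodic Laplacian: n^2 (s_{i+1} - 2 s_i + s_{i-1})."""
    return [n * n * (s[(i + 1) % n] - 2.0 * s[i] + s[(i - 1) % n]) for i in range(n)]

def pairing(a, b):
    """(1/n) * sum_i a_i b_i, i.e. a test function paired with an empirical measure."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

n = 50
phi = [math.sin(2.0 * math.pi * i / n) for i in range(n)]                        # smooth periodic test function
s = [math.cos(2.0 * math.pi * i / n) + 0.3 * ((7 * i) % 11) for i in range(n)]   # arbitrary lattice data

lhs = pairing(phi, lap(s, n))   # (1/n) sum phi(i/n) * Delta_n s_i
rhs = pairing(lap(phi, n), s)   # (1/n) sum s_i * Delta_n phi(i/n)
```

The two pairings coincide up to rounding, which is what moves $\Delta_n$ from $u_i$, $p_i$ onto $\varphi$ in the third line of the computation above.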

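Now using (6.11),

\[
\frac{d}{dt}E_{P_n}\langle\varphi, p_n(t)\rangle = E_{P_n}\left(\langle\Delta\varphi, k\,u_n(t) + 6\mu\,p_n(t)\rangle\right) + o(1),
\]

and as $n\to\infty$ we obtain

\[
\frac{d}{dt}\langle\varphi, p(t)\rangle = \langle\Delta\varphi, k\,u(t) + 6\mu\,p(t)\rangle,
\]

which is the weak formulation of the equation

\[
\partial_t p(t) = k\,\Delta u(t) + 6\mu\,\Delta p(t).
\]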

6.4.3 HDL for en

By definition of en(t), we have

Ln〈ϕ, en(t)〉 =1

n

n∑i=1

ϕ(i/n)Lnei(t)

=1

n

n∑i=1

ϕ(i/n)n2k[pi+1(ui+1 − ui)− pi(ui − ui−1)] + n2µ(Dp)i

(t)

=1

n

n∑i=1

ϕ(i/n)[k∇n(pr)i(t) + µ∆nhi(t)]

= − 1

n

n∑i=1

k(pr)i(t)∇nϕ(i/n) + µ1

n

n∑i=1

hi(t)∆nϕ(i/n)

= −k〈∇ϕ, pn(t)rn(t)〉+ µ1

n

n∑i=1

hi(t)∆nϕ(i/n),

where ri = n[ui − ui−1], ∇n is the discrete gradient operator

∇nsi = N(si+1 − si),

and

hi(t) = p2i+1(t) + 4p2

i (t) + p2i−1(t)− 2pi+1(t)pi(t) + pi+1(t)pi−1(t)− 2pi(t)pi−1(t).


Using (6.11), we have

\[
\frac{d}{dt}E_{P_n}\langle\varphi, e_n(t)\rangle = E_{P_n}\left[-k\,\langle\nabla\varphi, p_n(t)r_n(t)\rangle + \mu\,\frac1n\sum_{i=1}^n h_i(t)\,\Delta_n\varphi(i/n)\right]. \tag{6.19}
\]

In contrast to the HDL for $u_n$ and $p_n$, the right-hand side of (6.19) is not closed, in the sense that it depends on $h_i$, which is not a function of $u_n$, $p_n$, $e_n$. In order to get a closed system we need the so-called Local Equilibrium Assumption, as follows.

Assume that $r_n(t,dx)\to r(t,x)\,dx$, $p_n(t,dx)\to p(t,x)\,dx$ and $e_n(t,dx)\to e(t,x)\,dx$. Set

\[
\theta(t,x) = e(t,x) - \frac12 p(t,x)^2 - \frac k2 r(t,x)^2.
\]

The Local Equilibrium Assumption means that we can substitute $h_i(t)$ by its expectation with respect to $\nu^n_{\beta,r,p} := \prod_{i=1}^n\nu_{\beta(t,i/n),\,r(t,i/n),\,p(t,i/n)}(dr_i\,dp_i)$, where for fixed parameters $\beta$, $\bar r$, $\bar p$,

\[
\nu_{\beta,\bar r,\bar p}(dr\,dp) = \frac{\beta\sqrt k}{2\pi}\exp\left[-\beta\left(\frac{(p-\bar p)^2}{2} + \frac{k(r-\bar r)^2}{2}\right)\right]dr\,dp.
\]

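Setting $\theta(t,x) = \frac{2}{\beta(t,x)}$, straightforward calculations give

\begin{align*}
E_{\nu^n_{\beta,r,p}}\left(p_{i+1}(t)^2\right) &= \theta(t,(i+1)/n) + p(t,(i+1)/n)^2,\\
E_{\nu^n_{\beta,r,p}}\left(p_i(t)^2\right) &= \theta(t,i/n) + p(t,i/n)^2,\\
E_{\nu^n_{\beta,r,p}}\left(p_{i-1}(t)^2\right) &= \theta(t,(i-1)/n) + p(t,(i-1)/n)^2,\\
E_{\nu^n_{\beta,r,p}}\left(p_{i+1}(t)p_i(t)\right) &= p(t,(i+1)/n)\,p(t,i/n),\\
E_{\nu^n_{\beta,r,p}}\left(p_{i+1}(t)p_{i-1}(t)\right) &= p(t,(i+1)/n)\,p(t,(i-1)/n),\\
E_{\nu^n_{\beta,r,p}}\left(p_i(t)p_{i-1}(t)\right) &= p(t,i/n)\,p(t,(i-1)/n).
\end{align*}

So we can substitute $\frac1n\sum_{i=1}^n h_i(t)\,\Delta_n\varphi(i/n)$ by

\begin{align}
\frac1n\sum_{i=1}^n\Delta_n\varphi(i/n)\Big(&\theta(t,(i+1)/n) + p(t,(i+1)/n)^2 + 4\theta(t,i/n) + 4p(t,i/n)^2 + \theta(t,(i-1)/n) \notag\\
&+ p(t,(i-1)/n)^2 - 2p(t,(i+1)/n)\,p(t,i/n) + p(t,(i+1)/n)\,p(t,(i-1)/n) \notag\\
&- 2p(t,i/n)\,p(t,(i-1)/n)\Big). \tag{6.20}
\end{align}

In turn, this expression can be substituted by

\[
\frac1n\sum_{i=1}^n\Delta_n\varphi(i/n)\left(6\theta(t,i/n) + 3p(t,i/n)^2\right) = \langle\Delta\varphi, 6\theta_n(t) + 3p^n_2(t)\rangle,
\]

where

\[
\theta_n(t,dx) = \frac1n\sum_{i=1}^n\theta(t,i/n)\,\delta_{i/n}(dx), \qquad p^n_2(t,dx) = \frac1n\sum_{i=1}^n p(t,i/n)^2\,\delta_{i/n}(dx).
\]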
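The passage from (6.20) to $6\theta + 3p^2$ is a matter of collecting the local moments. The small Python helper below, a sketch with illustrative names, assembles the expectation of $h_i$ from the moment formulas listed above and shows that, when the profiles take the same values at the three neighbouring sites, the result is $6\theta + 3p^2$.

```python
def expected_h(theta, pbar):
    """Expectation of h_i under the product measure, given the local profile values
    theta = (theta_{i-1}, theta_i, theta_{i+1}) and pbar = (p_{i-1}, p_i, p_{i+1}).
    Uses E[p_a^2] = theta_a + pbar_a^2 and E[p_a p_b] = pbar_a * pbar_b for a != b."""
    (tm, t0, tp), (pm, p0, pp) = theta, pbar
    return ((tp + pp ** 2) + 4.0 * (t0 + p0 ** 2) + (tm + pm ** 2)
            - 2.0 * pp * p0 + pp * pm - 2.0 * p0 * pm)

# Locally constant profiles: theta = 0.7 and p = 1.3 at all three sites.
value = expected_h((0.7, 0.7, 0.7), (1.3, 1.3, 1.3))
target = 6.0 * 0.7 + 3.0 * 1.3 ** 2
```

For smooth profiles the three sites differ by $O(1/n)$, which is why the replacement is expected to be harmless in the limit.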


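We also need a replacement assumption, which states that we can substitute $p^n_2(t,dx)$ by $p(t,x)^2\,dx$ in the limit. It follows that

\[
\langle\Delta\varphi, 6\theta_n(t) + 3p^n_2(t)\rangle \to \langle\Delta\varphi, 6\theta(t) + 3p^2(t)\rangle, \tag{6.21}
\]

and taking the limit $n\to\infty$ in (6.19), we obtain

\[
\frac{d}{dt}\langle\varphi, e(t)\rangle = -k\,\langle\nabla\varphi, p(t)r(t)\rangle + \mu\,\langle\Delta\varphi, 6\theta(t) + 3p^2(t)\rangle,
\]

which is the weak formulation of the equation

\[
\partial_t e(t) = k\,\partial_x(p u_x) + \mu\,\partial_{xx}(6\theta + 3p^2).
\]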

6.5 Discussion

In Chapter 4, we have shown that the GENERIC structure of the (extended) Vlasov-Fokker-Planck equation can be derived from the large-deviation principle of a collection of inertial Brownian particles. It is interesting to ask whether the GENERIC structure of the TVE system given in Section 6.2 can be derived from the large-deviation principle of the particle system in Section 6.3. Deriving the large-deviation principle is more intricate than deriving the hydrodynamic limit because we need to prove that the error in the replacement assumption is exponentially small [KL99, Chapter 10]. One of the interesting features of the GENERIC structure of the TVE system in Section 6.2 is that it provides another gradient flow structure for the heat equation

\[
\dot\theta = \kappa\theta_{xx}.
\]

More precisely (see the first term in the last entry of $M(z)$),

\[
\dot\theta = \kappa\theta_{xx} = \partial_x\left[D(\theta)\,\partial_x\frac{\delta S_\theta}{\delta\theta}(\theta)\right], \tag{6.22}
\]

where

\[
S_\theta(\theta) = -\int_\Omega\log\theta\,dx, \qquad D(\theta) = \kappa\theta^2.
\]

It is worth comparing this gradient flow structure with the Wasserstein gradient flow structure of the diffusion equation,

\[
\dot\rho = \rho_{xx} = \partial_x\left[D(\rho)\,\partial_x\frac{\delta S_\rho}{\delta\rho}(\rho)\right],
\]

where

\[
S_\rho(\rho) = \int_\Omega\rho\log\rho\,dx, \qquad D(\rho) = \rho.
\]
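Both identities follow from a one-line computation of the variational derivative; the following worked lines spell this out for the two structures, using only the definitions of $S_\theta$, $S_\rho$ and the mobilities $D$ given above.

```latex
\begin{align*}
\frac{\delta S_\theta}{\delta\theta} &= -\frac{1}{\theta},
&\partial_x\!\Bigl[\kappa\theta^{2}\,\partial_x\Bigl(-\frac1\theta\Bigr)\Bigr]
  &= \partial_x\!\Bigl[\kappa\theta^{2}\,\frac{\theta_x}{\theta^{2}}\Bigr] = \kappa\,\theta_{xx},\\[4pt]
\frac{\delta S_\rho}{\delta\rho} &= \log\rho + 1,
&\partial_x\!\bigl[\rho\,\partial_x(\log\rho + 1)\bigr]
  &= \partial_x\!\Bigl[\rho\,\frac{\rho_x}{\rho}\Bigr] = \rho_{xx}.
\end{align*}
```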

The driving force and the diffusion mobility of the two structures are different. Mathematically, at the macroscopic level, the heat equation and the diffusion equation are of the same form. However, as indicated by the names, they describe two different physical phenomena. The microscopic derivation provides more insight in the sense that it reveals the underlying physics. Recently, in [PRV14], the authors formally derived the structure (6.22) from the large-deviation principle for the Brownian Energy Process (BEP). The generator of the BEP is similar to the last term in $L_n$ but involves only two consecutive particles instead of three, see Remark 6.3.1. The authors also make the replacement assumption. The large-deviation principle for the TVE process is certainly more involved than that of the BEP. Therefore, it is wise to try to prove it rigorously for the BEP first. The method in [KL99, Chapter 10] does not seem to be applicable because the BEP has unbounded states. It is interesting to see whether or not the method based on convergence of semigroups in [FK06] works for the BEP. This work is in progress.

Part II

Coarse-graining


Chapter 7

Qualitative coarse-graining from large-deviation principle

In this chapter, we introduce a new technique for coarse-graining and illustrate it by two examples. The technique is based on the connection between a macroscopic equation and the large-deviation principle of an underlying particle system via the rate functional exploited in the first part of this thesis. We analyze two examples: the overdamped (high friction) limit of the Kramers equation and the small-noise limit of a perturbed Hamiltonian system. We formally derive the limiting systems¹.

7.1 Introduction

7.1.1 General framework

We first recall the general framework presented in Section 1.11². Suppose that $\rho^\varepsilon:[0,T]\to\mathcal{P}(\mathcal{X})$ (with $\mathcal{X} := \mathbb{R}^N$) solves the $\varepsilon$-dependent problem

\[
(P^\varepsilon):\quad
\begin{cases}
\partial_t\rho^\varepsilon = \mathcal{L}_\varepsilon^*\rho^\varepsilon,\\
\rho^\varepsilon(0) = \rho_0^\varepsilon.
\end{cases} \tag{7.1}
\]

The aim is to derive an $\varepsilon$-independent problem $(P)$ that can be considered as an approximation (in a suitable sense) of $(P^\varepsilon)$ as $\varepsilon\to0$,

\[
(P):\quad
\begin{cases}
\partial_t\rho = \mathcal{L}^*\rho,\\
\rho(0) = \rho_0.
\end{cases} \tag{7.2}
\]

Here $\rho:[0,T]\to\mathcal{P}(\mathcal{X}_0)$, where $\mathcal{X}_0$ is some Euclidean space.

Coarse-graining is a technique for this purpose. It consists of two steps. The first one is to transform the problem $(P^\varepsilon)$ into a coarse-grained problem $(\hat P^\varepsilon)$ defined on $\mathcal{P}(\mathcal{Y})$, where $\mathcal{Y}$ is some coarse-grained Euclidean space, via a coarse-graining map $\Pi^\varepsilon:\mathcal{X}\to\mathcal{Y}$. The coarse-grained space is often of lower dimension than the original space, and as a consequence the coarse-graining map is non-injective. The coarse-grained problem $(\hat P^\varepsilon)$ describes the evolution of the coarse-grained profile $\hat\rho^\varepsilon$, which is the push-forward of $\rho^\varepsilon$ under $\Pi^\varepsilon$, $\hat\rho^\varepsilon = \Pi^\varepsilon_\#\rho^\varepsilon:[0,T]\to\mathcal{P}(\mathcal{Y})$,

\[
(\hat P^\varepsilon):\quad
\begin{cases}
\partial_t\hat\rho^\varepsilon = \hat{\mathcal{L}}_\varepsilon^*(\rho^\varepsilon)\,\hat\rho^\varepsilon,\\
\hat\rho^\varepsilon(0) = \hat\rho_0^\varepsilon.
\end{cases} \tag{7.3}
\]

Note that the coarse-grained generator $\hat{\mathcal{L}}_\varepsilon^*(\rho^\varepsilon)$ depends on $\rho^\varepsilon$; therefore it is not Markovian in general.

The second step is to derive $(P)$ from $(\hat P^\varepsilon)$. The success of the technique relies on whether one can define an appropriate coarse-grained problem. Usually one also has to rescale the temporal and/or the spatial variables appropriately, depending on the effects that one wishes to observe.

¹ This chapter is work in progress with Agnes Lamacz, Mark Peletier and Upanshu Sharma [DLPS14]. A short announcement has appeared as an Oberwolfach report [DPS13].

² Since we will not work with the Vlasov-Fokker-Planck equation, $\psi\equiv0$.

7.1.2 Coarse-graining from large-deviation principle

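As has been shown in the first part of the thesis, for fixed $\varepsilon$ the equation $(P^\varepsilon)$ can be derived from the rate functional of the large-deviation principle of the empirical process of an underlying particle system $X_i^\varepsilon$ that satisfies (1.3) (after rescaling). More precisely,

\[
\rho^\varepsilon \text{ is a solution to } (P^\varepsilon) \quad\text{iff}\quad I^\varepsilon(\rho^\varepsilon) = 0,
\]

where the rate functional $I^\varepsilon(\rho^\varepsilon)$ is given by (see (1.28))

\[
I^\varepsilon(\rho^\varepsilon) = \sup_{f\in C_c^\infty(\mathbb{R}\times\mathcal{X})}G^\varepsilon(\rho^\varepsilon, f). \tag{7.4}
\]

The functional $G^\varepsilon(\rho^\varepsilon, f)$ has the following form:

\[
G^\varepsilon(\rho^\varepsilon, f) = \int_{\mathcal{X}}\left[f_T\,d\rho_T^\varepsilon - f_0\,d\rho_0^\varepsilon\right] - \int_0^T\!\!\int_{\mathcal{X}}\left[(\partial_t + \mathcal{L}_\varepsilon)f_t\right]d\rho_t^\varepsilon\,dt - \frac12\int_0^T\!\!\int_{\mathcal{X}}A\nabla f_t\cdot\nabla f_t\,d\rho_t^\varepsilon\,dt,
\]

where $A$ is the diffusion matrix. In order to study the asymptotic behavior of $\rho^\varepsilon$, we study the Gamma-convergence of the functional $I^\varepsilon$ instead. If one is only interested in convergence of the solutions, one only needs to prove the liminf inequality in the Gamma-convergence, provided that the limiting functional is non-negative. In this chapter, we introduce a new method for coarse-graining using the rate functional. The core idea of our method can be summarized in the following four steps.

Step 1. Choose a special class of test functions: by taking $f = g\circ\Pi^\varepsilon$, where $g\in C_c^\infty(\mathbb{R}\times\mathcal{Y})$, we obtain

\[
I^\varepsilon(\rho^\varepsilon) \ge \sup_{g\in C_c^\infty(\mathbb{R}\times\mathcal{Y})}G^\varepsilon(\rho^\varepsilon, g\circ\Pi^\varepsilon). \tag{7.5}
\]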


Note that $g\circ\Pi^\varepsilon$ may not have compact support. Therefore, some approximation argument may be required to ensure that $g\circ\Pi^\varepsilon$ is admissible.

Step 2. Compactness property for $\rho^\varepsilon$ and $\hat\rho^\varepsilon$. In this step, one needs to prove that $\rho^\varepsilon$ and $\hat\rho^\varepsilon$ possess appropriate compactness properties. Assume that $\rho^\varepsilon\xrightarrow{\sigma}\rho$ and $\hat\rho^\varepsilon\xrightarrow{\hat\sigma}\hat\rho$, where $\sigma$ and $\hat\sigma$ denote appropriate topologies.

Step 3. Prove that, up to an $o(1)$ term, $G^\varepsilon(\rho^\varepsilon, g\circ\Pi^\varepsilon)$ depends only on $g$ and the coarse-grained variable $\hat\rho^\varepsilon$. We denote by $\hat G^\varepsilon(\hat\rho^\varepsilon, g)$ the dominating term in $G^\varepsilon(\rho^\varepsilon, g\circ\Pi^\varepsilon)$. In addition, suppose that we can pass to the limit, with respect to the topology $\hat\sigma$, in the functional $\hat G^\varepsilon(\hat\rho^\varepsilon, g)$ for any fixed $g$. If these assumptions hold, we may define

\[
G(\hat\rho, g) := \lim_{\varepsilon\to0}\hat G^\varepsilon(\hat\rho^\varepsilon, g) \quad\text{for fixed } g, \tag{7.6}
\]

and also

\[
I(\hat\rho) := \sup_g G(\hat\rho, g). \tag{7.7}
\]

Step 4. Derive the limiting system as the law $\hat\rho\in C([0,T],\mathcal{P}(\mathcal{X}_0))$ that uniquely satisfies $I(\hat\rho) = 0$.

We now apply this method to derive two limiting systems: the overdamped (high friction) limit of the Kramers equation and the small-noise limit of a perturbed Hamiltonian system.

The rest of the chapter is organized as follows. In Section 7.2, we study the small-noise limit. Section 7.3 is devoted to the overdamped (high friction) limit.

7.2 From a perturbed Hamiltonian system to diffusion on a graph

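We now describe the small-noise limit. We consider the following stochastically perturbed Hamiltonian system (see (1.25) with $\theta = m = 1$, $\varepsilon := \gamma$ and time rescaled as $t\mapsto t/\varepsilon$),

\begin{align}
dQ^\varepsilon &= \frac1\varepsilon P^\varepsilon\,dt, \tag{7.8a}\\
dP^\varepsilon &= -\frac1\varepsilon\nabla V(Q^\varepsilon)\,dt + \sqrt2\,dW. \tag{7.8b}
\end{align}

The probability density $\rho^\varepsilon$ of $(Q^\varepsilon, P^\varepsilon)$ satisfies the following equation,

\[
(P^\varepsilon)\qquad \partial_t\rho^\varepsilon = -\frac1\varepsilon\operatorname{div}(\rho^\varepsilon J\nabla H) + \Delta_p\rho^\varepsilon,
\]

where $H(q,p) = \frac{p^2}{2} + V(q)$. The asymptotic behavior of this equation as $\varepsilon\downarrow0$ was first studied by Freidlin and Wentzell [FW94]. They showed that the limiting system can be described as a diffusion on a graph: over $O(\varepsilon)$ time scales the solution follows level sets of $H$, while at the $O(1)$ time scale it performs a biased Brownian motion between level sets.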

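In this section, we re-prove this result as an illustration of our method. The associated rate functional is as follows (see (1.28)),

\begin{align}
I^\varepsilon(\rho^\varepsilon) = \sup_{f\in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d})}\Bigg\{&\int_{\mathbb{R}^{2d}}\left[f_T\,d\rho_T^\varepsilon - f_0\,d\rho_0^\varepsilon\right] - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\left[\partial_t f_t + \frac1\varepsilon J\nabla H\cdot\nabla f_t + \Delta_p f_t\right]d\rho_t^\varepsilon\,dt \notag\\
&- \frac12\int_0^T\!\!\int_{\mathbb{R}^{2d}}|\nabla_p f_t|^2\,d\rho_t^\varepsilon\,dt\Bigg\}. \tag{7.9}
\end{align}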

7.2.1 The case of one degree of freedom and single-well potential

We now discuss the simplest case: $d = 1$ and $V$ is a single-well potential (i.e. strictly convex). In this case, the coarse-graining map is the Hamiltonian.

Theorem 7.2.1. Assume that

(S1) The rate functional and the initial data are uniformly bounded,

supε>0

[∫Hρε0 + Iε(ρε)

]< C. (7.10)

(S2) V is strictly convex, bounded from below and satisfies lim|q|→∞ V (q) = ∞. Withoutloss of generality, we assume that V ≥ 0.

(S3) (Growth conditions on H) There exist constant C such that

max|∇H|, |∆H| ≤ C(1 +H). (7.11)

Then the following hold

(1) (compactness properties) ρεt and the push-forward ρε := H#ρε satisfy,

supt∈[0,T ]

supε>0

∫R2

Hρεt < C, for some C > 0, (7.12)

andρε −→ ρ in C([0, T ],P(R)) for some ρ. (7.13)

(2) (local equilibrium property) ρt(dx) is “constant on level sets” in the sense that,

ρt(dx) = ρt(H(x))1

T (H(x))dx, (7.14)

where T is defined in (7.16).


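(3) (liminf inequality) $I^\varepsilon$ satisfies the following liminf inequality

\[
\liminf_{\varepsilon\to0} I^\varepsilon(\rho^\varepsilon) \ge I(\hat\rho), \tag{7.15}
\]

where $\hat\rho$ denotes the limit of the push-forwards $H_\#\rho^\varepsilon$ and

\[
I(\hat\rho) = \sup_{g\in C_c^\infty(\mathbb{R}\times\mathbb{R})}\bigg[\int_{\mathbb{R}}g_T\,d\hat\rho_T - \int_{\mathbb{R}}g_0\,d\hat\rho_0 - \int_0^T\!\!\int_{\mathbb{R}}\Big(\partial_t g(h) + b(h)g'(h) + a(h)g''(h) + \frac{1}{2}a(h)(g'(h))^2\Big)\hat\rho_t(dh)\,dt\bigg],
\]

with

\[
T(h) = \int_{H^{-1}(h)}\frac{1}{|\nabla H(x)|}\,\mathcal{H}^1(dx)\qquad(\mathcal{H}^1\text{ is the one-dimensional Hausdorff measure}), \tag{7.16}
\]

\[
a(h) = \frac{1}{T(h)}\int_{H^{-1}(h)}\frac{|\nabla_p H(x)|^2}{|\nabla H(x)|}\,\mathcal{H}^1(dx), \tag{7.17}
\]

\[
b(h) = \frac{1}{T(h)}\int_{H^{-1}(h)}\frac{\Delta_p H(x)}{|\nabla H(x)|}\,\mathcal{H}^1(dx). \tag{7.18}
\]

(4) (the limiting system) The limiting system can be written as

\[
\partial_t\hat\rho = \partial_h\big(a(h)\partial_h\hat\rho\big) - \partial_h\big(b(h)\hat\rho\big). \tag{7.19}
\]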
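To make the coefficients (7.16)-(7.18) concrete, the following sketch evaluates them by quadrature for the harmonic oscillator H(q, p) = (q² + p²)/2. This Hamiltonian is an assumption chosen for illustration (it corresponds to V(q) = q²/2); the function name and the midpoint rule are likewise hypothetical choices. For this H the level sets are circles, and a short computation by hand suggests T(h) = 2π, a(h) = h and b(h) = 1, which the code reproduces.

```python
import math

def level_set_averages(h, n=2000):
    """Approximate T(h), a(h), b(h) of (7.16)-(7.18) for H(q, p) = (q^2 + p^2)/2.

    The level set H^{-1}(h) is the circle of radius r = sqrt(2h); it is
    parametrised by the angle theta, so that H^1(dx) = r dtheta.
    """
    r = math.sqrt(2.0 * h)
    dtheta = 2.0 * math.pi / n
    T = a_num = b_num = 0.0
    for k in range(n):
        theta = (k + 0.5) * dtheta            # midpoint rule on the circle
        q, p = r * math.cos(theta), r * math.sin(theta)
        grad_norm = math.hypot(q, p)          # |grad H| = |(q, p)|
        weight = r * dtheta / grad_norm       # H^1(dx) / |grad H|
        T += weight
        a_num += p * p * weight               # |grad_p H|^2 = p^2
        b_num += 1.0 * weight                 # Delta_p H = 1
    return T, a_num / T, b_num / T

T, a, b = level_set_averages(h=0.8)
print(T, a, b)
```

With these coefficients, (7.19) would read ∂_t ρ̂ = ∂_h(h ∂_h ρ̂) − ∂_h ρ̂ in this special case.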

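Proof. Since $I^\varepsilon(\rho^\varepsilon) < \infty$, according to Section 4.2 there exists $h^\varepsilon_t \in L^2(0,T;L^2_\nabla(\rho^\varepsilon_t))$ such that

\[
\partial_t\rho^\varepsilon_t = -\frac{1}{\varepsilon}\operatorname{div}\big(\rho^\varepsilon_t J\nabla H\big) + \Delta_p\rho^\varepsilon_t - \operatorname{div}_p\big(h^\varepsilon_t\rho^\varepsilon_t\big).
\]

The rate functional $I^\varepsilon(\rho^\varepsilon)$ can be expressed in terms of $h^\varepsilon$ as

\[
I^\varepsilon(\rho^\varepsilon) = \frac{1}{2}\int_0^T\!\!\int_{\mathbb{R}^2}|h^\varepsilon_t|^2\rho^\varepsilon_t\,dt. \tag{7.20}
\]

Therefore, for t ∈ [0, T] and $f\in C_c^2(\mathbb{R}^{2d})$, we have

\[
\begin{aligned}
\frac{d}{dt}\int_{\mathbb{R}^2}f(x)\rho^\varepsilon_t(x)\,dx &= \int_{\mathbb{R}^2}f(x)\,\partial_t\rho^\varepsilon_t(x)\,dx &&\text{(7.21)}\\
&= \int_{\mathbb{R}^2}\Big(\frac{1}{\varepsilon}J\nabla H\cdot\nabla f + \Delta_p f + \nabla_p f\cdot h^\varepsilon_t\Big)\rho^\varepsilon_t. &&\text{(7.22)}
\end{aligned}
\]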


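Substituting f = H in (7.22) we have the following formal calculation,

\[
\begin{aligned}
\frac{d}{dt}\int_{\mathbb{R}^{2d}}H\rho^\varepsilon_t &= \int_{\mathbb{R}^{2d}}\Big(\frac{1}{\varepsilon}J\nabla H\cdot\nabla H + \Delta_p H + \nabla_p H\cdot h^\varepsilon_t\Big)\rho^\varepsilon_t\\
&\le \int_{\mathbb{R}^{2d}}\Big(\Delta_p H + \frac{1}{2}\big[|\nabla_p H|^2 + |h^\varepsilon_t|^2\big]\Big)\rho^\varepsilon_t\\
&\overset{(7.11),(7.20)}{\le} C\int_{\mathbb{R}^{2d}}(1+H)\rho^\varepsilon_t + I^\varepsilon(\rho^\varepsilon).
\end{aligned}
\]

Here the first term vanishes because J is anti-symmetric, and the last term of the first line is estimated by Young's inequality. Using (7.10) and a Gronwall-type estimate, we obtain

\[
\int_{\mathbb{R}^{2d}}H\rho^\varepsilon_t < C.
\]

To make these calculations rigorous we define, for each m ∈ N, $\psi_m\in C_c^\infty(\mathbb{R})$ with $0\le\psi_m\le1$ such that $|\psi_m'|\le\psi_m/m$ and $|\psi_m''|\le\psi_m/m^2$. We choose $f_m(x) = H(x)\psi_m(H(x))$, and note that $f_m\in C_c^2(\mathbb{R}^{2d})$. Proceeding as in the calculation above and using Gronwall-type estimates we arrive at

\[
\int_{\mathbb{R}^{2d}}f_m\rho^\varepsilon_t \le C.
\]

Using the monotone convergence theorem we obtain (7.12).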

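To prove (7.13), we will use [CL12, Theorem 3], which is an extension of the classical compactness results of Simon [Sim86] to the case of semi-normed spaces. The spatial compactness of the push-forward $\hat\rho^\varepsilon = H_\#\rho^\varepsilon$ is a direct consequence of (7.12) and the coercivity of V. To prove the time compactness, we define three spaces

\[
X_1 = (\mathcal{M}_+(\mathbb{R}), \|\cdot\|_{1BL}),\qquad X_2 = (\mathcal{M}(\mathbb{R}), \|\cdot\|_{BL}),\qquad X_3 = (C_0^2(\mathbb{R}))^*,
\]

where

\[
\|\mu\|_{1BL} = \|\mu\|_{BL} + \int|x|\,d\mu,\qquad \|\mu\|_{BL} := \sup_{f\in BL(\mathbb{R}),\ \|f\|_{BL}\le1}\Big|\int f\,d\mu\Big|.
\]

Here BL(R) denotes the space of bounded Lipschitz functions on R. Note that $\|\cdot\|_{BL}$ metrizes the narrow topology. Then $X_1$ is a seminormed nonnegative cone in $X_2$. Moreover, $X_1\hookrightarrow\hookrightarrow X_2\hookrightarrow X_3$ (the first embedding being compact). Taking $\varphi\in C_0^2(\mathbb{R})$, we have

\[
\begin{aligned}
\int_{\mathbb{R}}\varphi\,\hat\rho^\varepsilon_\tau\,\Big|_{\tau=t}^{\tau=t+s} &= \int_{\mathbb{R}^{2d}}\varphi(H)\,\rho^\varepsilon_\tau\,\Big|_{\tau=t}^{\tau=t+s}\\
&= \int_t^{t+s}\!\!\int_{\mathbb{R}^{2d}}\Big(\frac{1}{\varepsilon}J\nabla H(x)\cdot\nabla\varphi(H(x)) + \Delta_p\varphi(H(x)) + \nabla_p\varphi(H(x))\cdot h^\varepsilon_\tau\Big)\rho^\varepsilon_\tau\,d\tau.
\end{aligned}
\]

The first term inside the integral above equals 0. Arguing as in the proof of (7.12), we find that

\[
\Big|\int_{\mathbb{R}}\varphi\,\hat\rho^\varepsilon_{t+s} - \int_{\mathbb{R}}\varphi\,\hat\rho^\varepsilon_t\Big| \le Cs.
\]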

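By [CL12, Theorem 3], $\hat\rho^\varepsilon$ is relatively compact in $C([0,T],\mathcal{P}(\mathbb{R}))$.

Now we prove (7.14). From (7.9), we have for every $f\in C_c^\infty(\mathbb{R}\times\mathbb{R}^2)$

\[
-\int_0^T\!\!\int_{\mathbb{R}^2}J\nabla H\cdot\nabla f\,d\rho^\varepsilon_t\,dt \le \varepsilon\bigg[\int_{\mathbb{R}^2}\big[f_0\rho^\varepsilon_0 - f_T\rho^\varepsilon_T\big] + \int_0^T\!\!\int_{\mathbb{R}^2}\Big(\partial_t f_t + \Delta_p f_t + \frac{1}{2}|\nabla_p f_t|^2\Big)d\rho^\varepsilon_t\,dt + I^\varepsilon(\rho^\varepsilon)\bigg] \le C\varepsilon.
\]

Replacing f by −f, we obtain the opposite inequality. This, together with (7.12), gives

\[
\int_0^T\!\!\int_{\mathbb{R}^2}J\nabla H\cdot\nabla f\,\rho_t(dx)\,dt = 0\quad\text{for all } f\in C_c^\infty(\mathbb{R}\times\mathbb{R}^2). \tag{7.23}
\]

In particular, for each fixed t ∈ [0, T] and $f\in C_c^2(\mathbb{R}^2)$,

\[
\int_{\mathbb{R}^2}J\nabla H(x)\cdot\nabla f(x)\,\rho_t(dx) = 0. \tag{7.24}
\]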

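Choosing f(x) = ζ(H(x))ψ(x), where $\psi\in C_c^2(\mathbb{R}^2)$ is a spatial cutoff function, and applying the Disintegration Theorem [AGS08, Theorem 5.3.1] with respect to the push-forward $\hat\rho_t = H_\#\rho_t$, we get

\[
\begin{aligned}
0 &= \int_{\mathbb{R}^2}J\nabla H(x)\cdot\big(\zeta(H(x))\nabla\psi(x)\big)\,\rho_t(dx)\\
&= \int_{\mathbb{R}}\zeta(h)\,\hat\rho_t(dh)\int_{H^{-1}(h)}\nabla\psi(x)\cdot\frac{J\nabla H(x)}{|\nabla H(x)|}\,|\nabla H(x)|\,\rho_t(dx|h).
\end{aligned}
\]

We denote $\tau := \frac{J\nabla H}{|\nabla H|}$. Since τ ⊥ ∇H and |τ| = 1, τ is the unit tangent vector of the level set $H^{-1}(h)$. Since the choice of ζ is arbitrary, we conclude that

\[
\text{for } \hat\rho_t\text{-a.e. } h\in\mathbb{R},\qquad \int_{H^{-1}(h)}|\nabla H(x)|\,\partial_\tau\psi(x)\,\rho_t(dx|h) = 0. \tag{7.25}
\]

Since ψ is arbitrary, the above equality implies that $|\nabla H|\rho_t(dx|h)$ is constant on $H^{-1}(h)$. This means that $\rho_t(dx|h) = \frac{c(h)}{|\nabla H|}\mathcal{H}^1(dx)$, where c(h) depends only on h and not on x. Since $\rho_t(dx|h)$ is a probability measure on $H^{-1}(h)$, the function c(h) can be found from

\[
1 = c(h)\int_{H^{-1}(h)}\frac{1}{|\nabla H(x)|}\,\mathcal{H}^1(dx),
\]

or equivalently,

\[
c(h) = \frac{1}{T(h)},
\]

where T(h) is defined in (7.16).

As a consequence, we get

\[
\rho_t(dx|h) = \frac{\mathcal{H}^1(dx)}{T(h)\,|\nabla H(x)|}\quad\text{for } \hat\rho_t\text{-a.e. } h\in\mathbb{R}. \tag{7.26}
\]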

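To obtain (7.14) we use the following co-area formula. The proof can be found in [MSZ03].

Lemma 7.2.2 (co-area formula for Sobolev mappings). Let $H\in W^{1,p}_{\mathrm{loc}}(\Omega,\mathbb{R})$, where $\Omega\subset\mathbb{R}^{2d}$ is an open subset such that ∇H(x) ≠ 0 a.e., and let $g\in L^1(\mathbb{R}^{2d})$. Then

\[
\int_\Omega g(x)\,dx = \int_{\mathbb{R}}dh\,\bigg(\int_{H^{-1}(h)\cap\Omega}\frac{g(x)}{|\nabla H(x)|}\,\mathcal{H}^{2d-1}(dx)\bigg). \tag{7.27}
\]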
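As a sanity check of (7.27), the sketch below evaluates its right-hand side for a case where both sides can be computed by hand. The choices H(x) = |x|²/2 on R² and g(x) = exp(−|x|²) are assumptions made only for this example; the left-hand side is then the Gaussian integral π, and on each circle H⁻¹(h) the integrand is constant.

```python
import math

def coarea_rhs(n_h=20000, h_max=20.0):
    """Right-hand side of (7.27) for H(x) = |x|^2/2 on R^2 and g(x) = exp(-|x|^2)."""
    dh = h_max / n_h
    total = 0.0
    for k in range(n_h):
        h = (k + 0.5) * dh                      # midpoint rule in the level h
        r = math.sqrt(2.0 * h)                  # H^{-1}(h) is the circle of radius r
        # On the circle: g = exp(-2h), |grad H| = r, and its length is 2*pi*r.
        inner = math.exp(-2.0 * h) / r * (2.0 * math.pi * r)
        total += inner * dh
    return total

print(coarea_rhs(), math.pi)
```

The level h is truncated at `h_max`, which only discards an exponentially small tail.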

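Applying this lemma, on the one hand, we have

\[
\int_{\mathbb{R}^2}f(x)\rho_t(x)\,dx = \int_{\mathbb{R}}dh\int_{H^{-1}(h)}\frac{f(x)\rho_t(x)}{|\nabla H(x)|}\,\mathcal{H}^1(dx). \tag{7.28}
\]

On the other hand, from (7.26), we have for any $f\in C_c^2(\mathbb{R}^2)$,

\[
\int_{\mathbb{R}^2}f(x)\,\rho_t(dx) = \int_{\mathbb{R}}\hat\rho_t(dh)\int_{H^{-1}(h)}f(x)\,\rho_t(dx|h) = \int_{\mathbb{R}}\frac{\hat\rho_t(dh)}{T(h)}\int_{H^{-1}(h)}\frac{f(x)}{|\nabla H(x)|}\,\mathcal{H}^1(dx), \tag{7.29}
\]

where $\hat\rho_t = H_\#\rho_t$ is the push-forward of $\rho_t$ under H.

Comparing (7.28) and (7.29) gives (7.14).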

Next, we prove (7.15).

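We take f(t, x) = g(t, H(x)) for $g\in C_c^\infty(\mathbb{R}\times\mathbb{R})$ and pass to the limit in the rate functional (7.9). We compute the derivatives of f,

\[
\partial_t f = \partial_t g(H),\qquad \nabla f = g'(H)\nabla H,\qquad \nabla_p f = g'(H)\nabla_p H,\qquad \Delta_p f = g''(H)|\nabla_p H|^2 + g'(H)\Delta_p H.
\]

The first three terms are straightforward. Writing $\hat\rho^\varepsilon_t = H_\#\rho^\varepsilon_t$ for the push-forward,

\[
\int_{\mathbb{R}^2}\big[f_T\,d\rho^\varepsilon_T - f_0\,d\rho^\varepsilon_0\big] = \int_{\mathbb{R}^2}\big[g_T\circ H\,d\rho^\varepsilon_T - g_0\circ H\,d\rho^\varepsilon_0\big] = \int_{\mathbb{R}}\big[g_T\,d\hat\rho^\varepsilon_T - g_0\,d\hat\rho^\varepsilon_0\big],
\]

\[
\int_0^T\!\!\int_{\mathbb{R}^2}\partial_t f\,d\rho^\varepsilon_t\,dt = \int_0^T\!\!\int_{\mathbb{R}}\partial_t g\,d\hat\rho^\varepsilon_t\,dt.
\]

The fourth term vanishes since J is anti-symmetric,

\[
\int_0^T\!\!\int_{\mathbb{R}^2}J\nabla H\cdot\nabla f_t\,d\rho^\varepsilon_t\,dt = \int_0^T\!\!\int_{\mathbb{R}^2}g'(H)\,J\nabla H\cdot\nabla H\,d\rho^\varepsilon_t\,dt = 0.
\]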


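To transform the last two terms we need to use (7.14) and (7.29). With $\hat\rho_t = H_\#\rho_t$, we have

\[
\begin{aligned}
\liminf_{\varepsilon\to0}\int_0^T\!\!\int_{\mathbb{R}^2}\Delta_p f\,d\rho^\varepsilon_t\,dt &= \liminf_{\varepsilon\to0}\int_0^T\!\!\int_{\mathbb{R}^2}\big[g''(H(x))|\nabla_p H(x)|^2 + g'(H(x))\Delta_p H(x)\big]\rho^\varepsilon_t(x)\,dx\,dt\\
&= \int_0^T\!\!\int_{\mathbb{R}^2}\big[g''(H(x))|\nabla_p H(x)|^2 + g'(H(x))\Delta_p H(x)\big]\rho_t(x)\,dx\,dt\\
&\overset{(7.29)}{=} \int_0^T\!\!\int_{\mathbb{R}}\big(a(h)g''(h) + b(h)g'(h)\big)\hat\rho_t(dh)\,dt,
\end{aligned}
\tag{7.30}
\]

and

\[
\begin{aligned}
\liminf_{\varepsilon\to0}\int_0^T\!\!\int_{\mathbb{R}^2}|\nabla_p f|^2\,d\rho^\varepsilon_t\,dt &= \liminf_{\varepsilon\to0}\int_0^T\!\!\int_{\mathbb{R}^2}(g'(H(x)))^2|\nabla_p H|^2\,d\rho^\varepsilon_t\,dt\\
&= \int_0^T\!\!\int_{\mathbb{R}^2}(g'(H(x)))^2|\nabla_p H|^2\,d\rho_t\,dt\\
&\overset{(7.29)}{=} \int_0^T\!\!\int_{\mathbb{R}}a(h)(g'(h))^2\,\hat\rho_t(dh)\,dt,
\end{aligned}
\tag{7.31}
\]

where a(h), b(h) are defined in (7.17)-(7.18). Combining all these terms we have

\[
\liminf_{\varepsilon\to0}I^\varepsilon(\rho^\varepsilon) \ge \sup_{g\in C_c^\infty(\mathbb{R}\times\mathbb{R})}\bigg[\int_{\mathbb{R}}g_T\,d\hat\rho_T - \int_{\mathbb{R}}g_0\,d\hat\rho_0 - \int_0^T\!\!\int_{\mathbb{R}}\Big(\partial_t g(h) + b(h)g'(h) + a(h)g''(h) + \frac{1}{2}a(h)(g'(h))^2\Big)\hat\rho_t(dh)\,dt\bigg] =: I(\hat\rho).
\]

Note that by choosing g = 0, we always have $I(\hat\rho)\ge0$. The limiting system (7.19) then follows from the form of the rate functional $I(\hat\rho)$.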

7.2.2 Discussion on the case of many degrees of freedom and multi-well potential

We expect that Theorem 7.2.1 still holds in the multidimensional and/or multi-well case, though the argument will be more involved.

We first need to introduce a notion of graph deduced from the level sets of the Hamiltonian. Intuitively, the graph Γ consists of equivalence classes of level sets of H, under the equivalence relation of belonging to the same connected component of the level sets of H. To make this rigorous, we follow the ideas and notation of [FW94]. Note that the set of all connected components of the level sets of H is homeomorphic to a graph Γ (see Fig. 7.1). Each periodic trajectory corresponds to an interior point of one of the edges. The equilibrium points which are maxima or minima correspond to vertices connected to just one edge and are called exterior vertices. Each saddle point $O_i$ corresponds to a vertex connected to three edges, called an interior vertex.

Figure 7.1: Left: Hamiltonian $\mathbb{R}^2\ni(q,p)\mapsto H(q,p)$. Right: Graph Γ.

To introduce a coordinate system on Γ, we first index the edges of the graph with the numbers 1, 2, . . . , n. Then the value of H(q, p) on the level set component corresponding to a point P ∈ Γ, together with the index i = i(P) of the edge containing P, forms a coordinate system on Γ. We write $O\sim I_k$ if the vertex O is at the end of the edge $I_k$. Note that if $O\sim I_{k_1}$, $O\sim I_{k_2}$ and $O\sim I_{k_3}$, and $h_0$ is the value of H(q, p) at the point corresponding to O, then the coordinates $(h_0,k_1)$, $(h_0,k_2)$ and $(h_0,k_3)$ correspond to the same point O. If a point (h, k) is not a vertex of Γ, it corresponds to a periodic trajectory $C_k(h)$. Each level set $C(h) = \{(q,p)\in\mathbb{R}^{2d} : H(q,p) = h\}$ is the union of a finite number of connected components (as we have assumed that H has finitely many saddle points). We define $k:\mathbb{R}^{2d}\to\{1,\dots,n\}$ as the index of the edge $I_k\subset\Gamma$ containing the point corresponding to the periodic trajectory starting at (q, p).

Once the graph is defined, we define the coarse-graining map $\Pi:\mathbb{R}^{2d}\to\Gamma$ by Π(q, p) = (H(q, p), k(q, p)) ∈ Γ.

The limiting equation will be a diffusion on the graph. It consists of a system of drift-diffusion equations on the edges and gluing conditions at the interior vertices $O_i$.

There are three difficulties in making this intuitive idea rigorous. First, the topology on the graph needs to be defined rigorously. Secondly, in the multidimensional case, the local equilibrium statement can no longer be derived from (7.25). An extra noise needs to be introduced appropriately as in [FW01]. Finally, the behavior of the limiting system at the interior vertices (like $O_1$ and $O_2$ in Figure 7.1) is more involved and must be defined properly. This would lead to some gluing conditions as in [FW98a, FW01]. To get these conditions, the class of test functions in the variational formulation of the rate functional needs to be enlarged. This work is in progress.


7.3 From the Kramers equation to the Fokker-Planck equation

In this section, we derive the Fokker-Planck equation as the overdamped (high-friction) limit of the Kramers equation. The overdamped limit was first derived formally in [Kra40] and has been extensively studied in the literature from different points of view, such as asymptotic expansions or probabilistic methods; see for instance [Nel67, Wil76, GPK12] and references therein. We reprove this result to illustrate our method.

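We recall the Kramers equation

\[
\partial_t\rho = -\operatorname{div}_q\Big(\frac{p}{m}\rho\Big) + \operatorname{div}_p\big(\nabla_q V(q)\rho\big) + \gamma\Big[\operatorname{div}_p\Big(\frac{p}{m}\rho\Big) + \theta\Delta_p\rho\Big], \tag{7.32}
\]

where m, γ, θ are positive constants. For simplicity we set θ = 1. The overdamped limit corresponds to the limit γ → ∞ in (7.32).

Rescaling time appropriately (speeding up by 1/γ) we arrive at

\[
\partial_t\rho = -\gamma\operatorname{div}_q\Big(\frac{p}{m}\rho\Big) + \gamma\operatorname{div}_p\big(\nabla_q V(q)\rho\big) + \gamma^2\Big[\operatorname{div}_p\Big(\frac{p}{m}\rho\Big) + \Delta_p\rho\Big].
\]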

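The large-deviation rate functional associated to this equation is (see (7.4))

\[
\begin{aligned}
I_\gamma(\rho) = \sup_{f\in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d})}\bigg[&\int_{\mathbb{R}^{2d}}(f_T\,d\rho_T - f_0\,d\rho_0) - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\Big(\partial_t f + \gamma\frac{p}{m}\cdot\nabla_q f - \gamma\nabla_q V\cdot\nabla_p f - \gamma^2\frac{p}{m}\cdot\nabla_p f + \gamma^2\Delta_p f\Big)d\rho_t\,dt\\
&- \frac{\gamma^2}{2}\int_0^T\!\!\int_{\mathbb{R}^{2d}}|\nabla_p f|^2\,d\rho_t\,dt\bigg].
\end{aligned}
\tag{7.33}
\]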

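The rate functional can be written in a more general form in terms of the generator L as

\[
\begin{aligned}
I_\gamma(\rho) &= \sup_{f\in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d})}\bigg\{\int_{\mathbb{R}^{2d}}\big[f_T\rho_T - f_0\rho_0\big] - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\Big(\partial_t f_t + (J-A)\nabla H\cdot\nabla f_t + \Delta_p f_t + \frac{1}{2}|\nabla_p f_t|^2\Big)d\rho_t\,dt\bigg\}\\
&= \sup_{f\in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d})}\bigg\{\int_{\mathbb{R}^{2d}}\big[f_T\rho_T - f_0\rho_0\big] - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\Big(\partial_t f_t + Lf_t + \frac{1}{2}(\nabla f_t)^TA\nabla f_t\Big)d\rho_t\,dt\bigg\},
\end{aligned}
\tag{7.34}
\]

where

\[
Lf = (J-A)\nabla H\cdot\nabla f + \operatorname{div}(A\nabla f),\qquad J = \gamma\begin{pmatrix}0 & I\\ -I & 0\end{pmatrix},\qquad A = \gamma^2\begin{pmatrix}0 & 0\\ 0 & I\end{pmatrix}.
\]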


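We recall the definitions of the relative entropy and the relative Fisher information. Let $\mu(dx) = Z^{-1}\exp(-H(x))\,dx$ be the invariant measure. The relative entropy $\mathcal{H}(\nu|\mu)$ and the relative Fisher information $RF(\nu|\mu)$ of a measure ν with respect to µ are respectively given by

\[
\mathcal{H}(\nu|\mu) = \begin{cases}\displaystyle\int_{\mathbb{R}^{2d}}\frac{d\nu}{d\mu}\log\frac{d\nu}{d\mu}\,d\mu & \text{if } \nu\ll\mu,\\[2mm] \infty & \text{otherwise,}\end{cases} \tag{7.35}
\]

\[
RF(\nu|\mu) = \begin{cases}\displaystyle\int_{\mathbb{R}^{2d}}\frac{A\nabla\frac{d\nu}{d\mu}\cdot\nabla\frac{d\nu}{d\mu}}{\frac{d\nu}{d\mu}}\,d\mu & \text{if } d\nu = \nu(x)\,dx,\ \nabla\frac{d\nu}{d\mu}\in L^1_{\mathrm{loc}}(\mathbb{R}^{2d}),\\[2mm] \infty & \text{otherwise.}\end{cases} \tag{7.36}
\]

We define the coarse-graining map as follows,

\[
\Pi_\gamma:\mathbb{R}^{2d}\to\mathbb{R}^d,\qquad (q,p)\mapsto\Pi_\gamma(q,p) = q + \frac{p}{\gamma}.
\]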

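Conjecture 7.3.1. Assume that

(B1) The rate functional and the initial data are uniformly bounded,

\[
\sup_{\gamma>0}\Big[I_\gamma(\rho^\gamma) + \int_{\mathbb{R}^{2d}}\big(\rho^\gamma_0\log\rho^\gamma_0 + H\rho^\gamma_0\big)\Big] < \infty.
\]

(B2) V is bounded from below and satisfies $\lim_{|q|\to\infty}V(q) = \infty$ and $\|\nabla^2V\|_\infty < \infty$. Without loss of generality, we assume V ≥ 0.

Then the following hold:

1. (compactness properties) $\rho^\gamma$ and the push-forward $\hat\rho^\gamma := (\Pi_\gamma)_\#\rho^\gamma$ satisfy

\[
\sup_{t\in[0,T]}\sup_{\gamma>0}\int_{\mathbb{R}^{2d}}H(q,p)\,\rho^\gamma_t(dq\,dp) < \infty, \tag{7.37}
\]

and

\[
\hat\rho^\gamma\to\sigma\quad\text{in } C([0,T],\mathcal{P}(\mathbb{R}^d))\ \text{for some } \sigma. \tag{7.38}
\]

2. (local equilibrium statement)

\[
\rho^\gamma\rightharpoonup Z^{-1}\exp\Big(-\frac{p^2}{2m}\Big)\sigma\quad\text{in } \mathcal{P}([0,T]\times\mathbb{R}^{2d}). \tag{7.39}
\]

3. (liminf inequality) $I_\gamma(\rho^\gamma)$ satisfies the following liminf inequality

\[
\liminf_{\gamma\to\infty}I_\gamma(\rho^\gamma)\ge I(\sigma), \tag{7.40}
\]

where

\[
I(\sigma) := \sup_{g\in C_c^\infty(\mathbb{R}\times\mathbb{R}^d)}\bigg\{\int_{\mathbb{R}^d}g_T\,d\sigma_T - \int_{\mathbb{R}^d}g_0\,d\sigma_0 - \int_0^T\!\!\int_{\mathbb{R}^d}(\partial_t g - \nabla V\cdot\nabla g + \Delta g)\,d\sigma_t\,dt - \frac{1}{2}\int_0^T\!\!\int_{\mathbb{R}^d}|\nabla g|^2\,d\sigma_t\,dt\bigg\}.
\]

4. (the limiting system) σ satisfies the Fokker-Planck equation

\[
\partial_t\sigma = \operatorname{div}(\nabla V\sigma) + \Delta\sigma. \tag{7.41}
\]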

We now give a formal argument to support this conjecture. A crucial step is to establish an a priori estimate on the relative entropy and the relative Fisher information.

A priori estimate (upper bound for the relative entropy and the relative Fisher information).

Claim 1: It holds that

\[
\mathcal{H}(\rho^\gamma_T|\mu) + \frac{1}{2}\int_0^T RF(\rho^\gamma_t|\mu)\,dt \le I_\gamma(\rho^\gamma) + \mathcal{H}(\rho^\gamma_0|\mu). \tag{7.42}
\]

As a consequence,

\[
\sup_{t\in[0,T]}\sup_{\gamma>0}\int_{\mathbb{R}^{2d}}H(q,p)\,\rho^\gamma_t(dq\,dp) < \infty. \tag{7.43}
\]

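We use the following variational formulations of the relative entropy and the Fisher information [FK06, Chapter 9 and Appendix D6],

\[
\mathcal{H}(\rho|\mu) = \sup_{\psi\in C_c^\infty(\mathbb{R}^{2d})}\Big\{\int\psi\rho - \log\int e^\psi\,d\mu\Big\},
\]

\[
\frac{1}{2}RF(\rho|\mu) = \sup_{\varphi\in C_c^\infty(\mathbb{R}^{2d})}\int\Big(-\operatorname{div}(A\nabla\varphi) + A\nabla\varphi\cdot\nabla H - \frac{1}{2}(\nabla\varphi)^TA\nabla\varphi\Big)\rho.
\]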
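The variational formula for the relative entropy can be illustrated on a finite state space, where the supremum is attained at ψ = log(dρ/dµ). The sketch below is a discrete analogue only: the three-point distributions `rho` and `mu` are hypothetical, and sums replace the integrals over R^{2d}.

```python
import math

def relative_entropy(rho, mu):
    """Discrete relative entropy: sum of rho * log(rho / mu)."""
    return sum(r * math.log(r / m) for r, m in zip(rho, mu) if r > 0)

def dv_functional(psi, rho, mu):
    """Value of  sum(psi * rho) - log(sum(exp(psi) * mu))  for a test function psi."""
    pairing = sum(p * r for p, r in zip(psi, rho))
    log_mgf = math.log(sum(math.exp(p) * m for p, m in zip(psi, mu)))
    return pairing - log_mgf

rho = [0.5, 0.3, 0.2]          # hypothetical probability vector
mu = [0.2, 0.3, 0.5]           # hypothetical reference measure
psi_opt = [math.log(r / m) for r, m in zip(rho, mu)]   # maximiser log(d rho / d mu)

print(relative_entropy(rho, mu), dv_functional(psi_opt, rho, mu))
```

Any other choice of ψ gives a value no larger than the relative entropy, which is the inequality used in the estimates of this section.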

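Given φ and ψ, we take f such that

\[
\partial_t f_t + Lf_t + \frac{1}{2}(\nabla f_t)^TA\nabla f_t = \operatorname{div}(A\nabla\varphi) - A\nabla\varphi\cdot\nabla H + \frac{1}{2}(\nabla\varphi)^TA\nabla\varphi,\qquad f_T = \psi. \tag{7.44}
\]

For the Kramers equation, $Lf = -\gamma\frac{p}{m}\cdot\nabla_q f + \nabla V(q)\cdot\nabla_p f - \gamma^2\frac{p}{m}\cdot\nabla_p f + \gamma^2\Delta_p f$ and the equation above becomes

\[
\partial_t f - \gamma\frac{p}{m}\cdot\nabla_q f + \nabla V(q)\cdot\nabla_p f - \gamma^2\frac{p}{m}\cdot\nabla_p f + \gamma^2\Delta_p f + \frac{\gamma^2}{2}|\nabla_p f|^2 = \gamma^2\Delta_p\varphi - \gamma^2\nabla_p\varphi\cdot\frac{p}{m} + \frac{\gamma^2}{2}|\nabla_p\varphi|^2. \tag{7.45}
\]

Set F = exp(f/2), so that f = 2 log F. Then F satisfies the following equation,

\[
\begin{cases}
\partial_t F - \gamma\dfrac{p}{m}\cdot\nabla_q F + \nabla V(q)\cdot\nabla_p F - \gamma^2\dfrac{p}{m}\cdot\nabla_p F + \gamma^2\Delta_p F = \dfrac{\gamma^2F}{4}\big[|\nabla_p\varphi|^2 + 2\Delta_p\varphi\big],\\[2mm]
F_T = \exp(\psi/2).
\end{cases}
\tag{7.46}
\]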

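Assumption 7.3.2. Assume that we can take f as a test function in the variational formulation of the rate functional (7.4).

Then we have

\[
\begin{aligned}
&\mathcal{H}(\rho_T|\mu) + \frac{1}{2}\int_0^T RF(\rho_t|\mu)\,dt\\
&\quad= \sup_{\psi,\varphi}\bigg\{\int\rho_T\psi - \log\int e^\psi\,d\mu - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\Big[\operatorname{div}(A\nabla\varphi) - A\nabla\varphi\cdot\nabla H + \frac{1}{2}(\nabla\varphi)^TA\nabla\varphi\Big]\rho_t\,dt\bigg\}\\
&\quad\overset{(7.44)}{=} \sup_{\psi,\varphi}\bigg\{\int\rho_Tf_T - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\Big[\partial_t f_t + Lf_t + \frac{1}{2}(\nabla f_t)^TA\nabla f_t\Big]\rho_t\,dt - \log\int e^\psi\,d\mu\bigg\}\\
&\quad\le I(\rho) + \sup_{\psi,\varphi}\bigg\{\int_{\mathbb{R}^{2d}}f_0\rho_0 - \log\int_{\mathbb{R}^{2d}}e^{f_T}\,d\mu\bigg\}\\
&\quad\le I(\rho) + \mathcal{H}(\rho_0|\mu) + \sup_{\psi,\varphi}\log\frac{\int_{\mathbb{R}^{2d}}e^{f_0}\,d\mu}{\int_{\mathbb{R}^{2d}}e^{f_T}\,d\mu}.
\end{aligned}
\]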

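Now we prove that $\int_{\mathbb{R}^{2d}}e^{f_0}\,d\mu \le \int_{\mathbb{R}^{2d}}e^{f_T}\,d\mu$. This will follow if we show that $t\mapsto\int_{\mathbb{R}^{2d}}e^{f_t}\,d\mu$ is an increasing function. Indeed, we compute its derivative with respect to time,

\[
\frac{d}{dt}\int_{\mathbb{R}^{2d}}e^{f_t}\,d\mu \overset{(7.44)}{=} \int_{\mathbb{R}^{2d}}\Big(-Lf_t - \frac{1}{2}(\nabla f_t)^TA\nabla f_t + \operatorname{div}(A\nabla\varphi) - A\nabla\varphi\cdot\nabla H + \frac{1}{2}(\nabla\varphi)^TA\nabla\varphi\Big)e^{f_t}\,d\mu.
\]

Since, with $b := (J-A)\nabla H$ denoting the drift in L,

\[
\begin{aligned}
-\int_{\mathbb{R}^{2d}}e^{f_t-H}Lf_t &= \int_{\mathbb{R}^{2d}}\big[-b(x)\cdot\nabla f_t - \operatorname{div}(A\nabla f_t)\big]e^{f_t-H}\\
&= \int_{\mathbb{R}^{2d}}(-J+A)\nabla H\cdot\nabla f_t\,e^{f_t-H} + \int_{\mathbb{R}^{2d}}A\nabla f_t\cdot\nabla(f_t-H)\,e^{f_t-H}\\
&= -\int_{\mathbb{R}^{2d}}e^{-H}J\nabla H\cdot\nabla(e^{f_t}) + \int_{\mathbb{R}^{2d}}(\nabla f_t)^TA\nabla f_t\,e^{f_t-H}\\
&= \int_{\mathbb{R}^{2d}}e^{f_t}\operatorname{div}\big[e^{-H}J\nabla H\big] + \int_{\mathbb{R}^{2d}}(\nabla f_t)^TA\nabla f_t\,e^{f_t-H}\\
&= \int_{\mathbb{R}^{2d}}A\nabla f_t\cdot\nabla f_t\,e^{f_t-H}\qquad\text{(since J is anti-symmetric)},
\end{aligned}
\]

and

\[
\int_{\mathbb{R}^{2d}}\operatorname{div}(A\nabla\varphi)\,e^{f_t-H} = -\int_{\mathbb{R}^{2d}}A\nabla\varphi\cdot\nabla(f_t-H)\,e^{f_t-H},
\]

it follows that

\[
\begin{aligned}
\frac{d}{dt}\int_{\mathbb{R}^{2d}}e^{f_t}\,d\mu &= \int_{\mathbb{R}^{2d}}\Big[\frac{1}{2}A\nabla f_t\cdot\nabla f_t + \frac{1}{2}A\nabla\varphi\cdot\nabla\varphi - A\nabla\varphi\cdot\nabla f_t\Big]e^{f_t-H}\\
&= \frac{1}{2}\int_{\mathbb{R}^{2d}}A\nabla(f_t-\varphi)\cdot\nabla(f_t-\varphi)\,e^{f_t-H} \ge 0.
\end{aligned}
\]

Therefore $t\mapsto\int e^{f_t}\,d\mu$ is an increasing function, and we obtain

\[
\int e^{f_0}\,d\mu \le \int e^{f_T}\,d\mu.
\]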

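The assertion (7.42) then follows. It is more helpful to use its explicit form,

\[
\sup_{\gamma>0}\sup_{t\in[0,T]}\mathcal{H}(\rho^\gamma_t|\mu) + \frac{\gamma^2}{2}\int_0^T\!\!\int_{\mathbb{R}^{2d}}\frac{1}{\rho^\gamma_t}\Big|\frac{p}{m}\rho^\gamma_t + \nabla_p\rho^\gamma_t\Big|^2dq\,dp\,dt \le \sup_{\gamma>0}I_\gamma(\rho^\gamma) + \mathcal{H}(\rho_0|\mu) < C. \tag{7.47}
\]

Now we prove (7.43). It follows from the above estimate that

\[
\sup_{t\in[0,T]}\sup_{\gamma>0}\bigg\{\int_{\mathbb{R}^{2d}}\rho^\gamma_t\log\rho^\gamma_t\,dq\,dp + \int_{\mathbb{R}^{2d}}H(q,p)\rho^\gamma_t\,dq\,dp\bigg\} < \infty. \tag{7.48}
\]

Let 0 < α < 1. We have

\[
0 \le \mathcal{H}\big(\rho^\gamma_t\,\big|\,Z_\alpha^{-1}\exp(-\alpha H)\big) = \int_{\mathbb{R}^{2d}}\rho^\gamma_t\log\rho^\gamma_t\,dq\,dp + \alpha\int_{\mathbb{R}^{2d}}H(q,p)\rho^\gamma_t\,dq\,dp + \log Z_\alpha.
\]

It implies that

\[
\int_{\mathbb{R}^{2d}}\rho^\gamma_t\log\rho^\gamma_t\,dq\,dp \ge -\alpha\int_{\mathbb{R}^{2d}}H(q,p)\rho^\gamma_t\,dq\,dp - \log Z_\alpha.
\]

Substituting the above inequality into (7.48), we get

\[
\sup_{t\in[0,T]}\sup_{\gamma>0}\int_{\mathbb{R}^{2d}}H(q,p)\rho^\gamma_t\,dq\,dp < \infty.
\]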

Verifying the conjecture: We now show how the conjecture can be deduced from (7.42) and (7.43).

1. Estimate (7.37) has already been proved in (7.43). As in the proof of part (1) of Theorem 7.2.1, the compactness properties of $\rho^\gamma$ and of its push-forward $\hat\rho^\gamma = (\Pi_\gamma)_\#\rho^\gamma$ follow directly from (7.43).

2. The local equilibrium statement is a consequence of the vanishing of the relative Fisher information obtained from (7.47).

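3. Now we prove the liminf inequality (7.40). Taking $f(t,q,p) = g(t,\Pi_\gamma(q,p))$ in (7.33), and using

\[
\partial_t f = (\partial_t g)\circ\Pi_\gamma,\qquad \nabla_q f = (\nabla g)\circ\Pi_\gamma,\qquad \nabla_p f = \frac{1}{\gamma}(\nabla g)\circ\Pi_\gamma,\qquad \Delta_p f = \frac{1}{\gamma^2}(\Delta g)\circ\Pi_\gamma,
\]

\[
\nabla V(q) = (\nabla V)\circ\Pi_\gamma + \nabla V(q) - \nabla V\Big(q + \frac{1}{\gamma}p\Big),
\]

we get, with $\hat\rho^\gamma_t = (\Pi_\gamma)_\#\rho^\gamma_t$,

\[
\begin{aligned}
I_\gamma(\rho^\gamma) \ge{}& \int_{\mathbb{R}^d}g_T\,d\hat\rho^\gamma_T - \int_{\mathbb{R}^d}g_0\,d\hat\rho^\gamma_0 - \int_0^T\!\!\int_{\mathbb{R}^d}(\partial_t g - \nabla V\cdot\nabla g + \Delta g)\,d\hat\rho^\gamma_t\,dt\\
&- \frac{1}{2}\int_0^T\!\!\int_{\mathbb{R}^d}|\nabla g|^2\,d\hat\rho^\gamma_t\,dt + \int_0^T\!\!\int_{\mathbb{R}^{2d}}\Big[\nabla V(q) - \nabla V\Big(q + \frac{1}{\gamma}p\Big)\Big]\cdot\nabla g\Big(t, q + \frac{1}{\gamma}p\Big)\,d\rho^\gamma_t\,dt.
\end{aligned}
\tag{7.49}
\]

In order to pass to the limit, we need to control the last term in (7.49). Since

\[
\Big|\nabla V(q) - \nabla V\Big(q + \frac{1}{\gamma}p\Big)\Big| \le \frac{1}{\gamma}\|\nabla^2V\|_\infty|p|,\qquad \Big|\nabla g\Big(t, q + \frac{1}{\gamma}p\Big)\Big| \le \|\nabla g\|_\infty,
\]

we have

\[
\bigg|\int_0^T\!\!\int_{\mathbb{R}^{2d}}\Big[\nabla V(q) - \nabla V\Big(q + \frac{1}{\gamma}p\Big)\Big]\cdot\nabla g\Big(t, q + \frac{1}{\gamma}p\Big)\,d\rho^\gamma_t\,dt\bigg| \le \frac{1}{\gamma}\int_0^T\!\!\int_{\mathbb{R}^{2d}}\|\nabla^2V\|_\infty\|\nabla g\|_\infty|p|\,d\rho^\gamma_t\,dt. \tag{7.50}
\]

Due to (7.43), the right-hand side of (7.50) vanishes as γ → ∞. Therefore

\[
\liminf_{\gamma\to\infty}I_\gamma(\rho^\gamma) \ge I(\sigma),
\]

where

\[
I(\sigma) := \sup_{g\in C_c^\infty(\mathbb{R}\times\mathbb{R}^d)}\bigg\{\int_{\mathbb{R}^d}g_T\,d\sigma_T - \int_{\mathbb{R}^d}g_0\,d\sigma_0 - \int_0^T\!\!\int_{\mathbb{R}^d}(\partial_t g - \nabla V\cdot\nabla g + \Delta g)\,d\sigma_t\,dt - \frac{1}{2}\int_0^T\!\!\int_{\mathbb{R}^d}|\nabla g|^2\,d\sigma_t\,dt\bigg\}.
\]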


4. It follows from the structure of I that the limiting system is the Fokker-Planck equation. In addition, according to [DG87] (see also Chapter 3), I is the rate functional of the large-deviation principle for the empirical process

\[
\sigma_n(t,dx) := \frac{1}{n}\sum_{i=1}^n\delta_{X_i(t)},
\]

where $dX_i(t) = -\nabla V(X_i(t))\,dt + \sqrt{2}\,dW_i(t)$ and $W_i$, i = 1, . . . , n, are independent Wiener processes.
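The chain-rule identities used in Step 3 for f = g ∘ Π_γ can be checked numerically in dimension d = 1. The sketch below uses central finite differences; the test function g = sin, the evaluation point and the step size are assumptions made for illustration. It shows that the two O(γ) transport terms of (7.33) cancel, while γ∇_p f and γ²Δ_p f reproduce g′ and g″ at Π_γ(q, p) = q + p/γ.

```python
import math

def kramers_terms(g, q, p, gamma, m=1.0, h=1e-3):
    """Finite-difference terms of (7.33) for f(q, p) = g(q + p / gamma), d = 1."""
    f = lambda q_, p_: g(q_ + p_ / gamma)          # f = g o Pi_gamma
    df_dq = (f(q + h, p) - f(q - h, p)) / (2 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2 * h)
    d2f_dp2 = (f(q, p + h) - 2 * f(q, p) + f(q, p - h)) / h ** 2
    # gamma (p/m) d_q f - gamma^2 (p/m) d_p f: expected to cancel up to O(h^2).
    transport = gamma * (p / m) * df_dq - gamma ** 2 * (p / m) * df_dp
    return transport, gamma * df_dp, gamma ** 2 * d2f_dp2

transport, first, second = kramers_terms(math.sin, q=0.4, p=0.7, gamma=10.0)
print(transport, first, second)
```

Only the term involving ∇V(q) − ∇V(q + p/γ) then remains to be controlled, which is what (7.50) does.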

To make the argument rigorous, we need to justify that the functions we used in Assumption 7.3.2 and in Step 3 are indeed admissible. In Assumption 7.3.2, it is not straightforward to see whether or not f = 2 log F, where F is a solution of (7.46), is bounded and has sufficient regularity. We expect that this difficulty can be overcome by using the fact that (7.44) is a hypoelliptic equation. In Step 3, the function $f = g\circ\Pi_\gamma$ does not have compact support. Hence, we need to approximate these two functions by sequences of smooth functions with compact support. Some modification of the argument in the proof of Lemma 4.11 in [DG87] might be required. This work is in progress.

Chapter 8

The two-scale approach to hydrodynamic limits for non-reversible dynamics

In [GOVW09], a new method to study hydrodynamic limits was developed for reversible dynamics. In this chapter, we generalize this method to a family of non-reversible dynamics. As an application, we obtain quantitative rates of convergence to the hydrodynamic limit for a weakly asymmetric version of the Ginzburg-Landau model endowed with Kawasaki dynamics. These results also imply local Gibbs behavior, following a method of [Fat13].¹

8.1 Introduction

In this chapter, we are interested in generalizing the results of [GOVW09] on hydrodynamic limits to the case of weakly asymmetric interacting spin systems. We obtain quantitative rates of convergence to the hydrodynamic limit for such dynamics. Our main contribution is a method of controlling the effects of the antisymmetric component of the dynamic.

A typical result of convergence to the hydrodynamic limit consists in proving that, under a suitable time-space scaling and for nice initial conditions, a random system with a large number of particles behaves like a deterministic object, given as the solution of a partial differential equation.

In [GOVW09], a new method to study such problems was developed. It consists in establishing estimates in Wasserstein distance between the distribution of the system and a well-chosen macroscopic state, given as the solution of a differential equation. The main elements are a coarse-graining argument and a logarithmic Sobolev inequality. It was applied to dynamics of the form

\[
dX_t = -A\nabla H(X_t)\,dt + \sqrt{2A}\,dW_t,
\]

on some Euclidean space, where A is a positive definite matrix, H is the Hamiltonian and W is a Wiener process. In the case where A and H correspond to the Ginzburg-Landau model endowed with Kawasaki dynamics, they obtained scaling limits of the form

\[
\frac{\partial\rho}{\partial t} = \frac{\partial^2}{\partial\theta^2}\varphi'(\rho).
\]

¹ This chapter is joint work with Max Fathi [DF14].

In this chapter, we add an extra term to the previous dynamic, and study

\[
dX_t = -A\nabla H(X_t)\,dt - J\nabla H(X_t)\,dt + \sqrt{2A}\,dW_t,
\]

where J is an antisymmetric matrix. This extra term makes the dynamic non-reversible, but does not modify the invariant measure. For a particular choice of J, we obtain a scaling limit of the form

\[
\frac{\partial\rho}{\partial t} = \frac{\partial^2}{\partial\theta^2}\varphi'(\rho) + \frac{\partial}{\partial\theta}\varphi'(\rho).
\]

Our method is restricted to the case where the square of the antisymmetric part, $-J^2$, is controlled by A (in the sense of symmetric matrices). This is because if the antisymmetric component becomes dominant in the scaling limit, we would expect the limiting PDE to be hyperbolic (rather than parabolic), and estimates in Wasserstein distances would not be adapted.

These estimates in Wasserstein distance also allow us to study local Gibbs behavior (which is a stronger form of convergence) by using an interpolation inequality, following a method developed in [Fat13]. We also obtain quantitative rates of convergence for the microscopic free energy to its scaling limit.

The plan of the chapter is as follows: in Section 2, we present the framework and our main results. Section 3 contains the proofs of our results in the abstract setting. In Section 4, we give the proofs of convergence to the hydrodynamic limit for the Ginzburg-Landau model endowed with a weakly asymmetric version of Kawasaki dynamics.

Throughout this chapter, we will use the following notations.

Notations

• C denotes a positive constant, which may vary from line to line, or even within a line;

• ∇ is the gradient, Hess stands for the Hessian, | · | is the norm and ⟨·, ·⟩ is an inner product. If necessary, a subscript will indicate the space on which these are taken.

• $A^t$ is the adjoint of the operator A.

• Φ is the function defined by Φ(x) := x log x on $\mathbb{R}_+$.

• $\mathrm{Ent}_\mu(f) = \int f\log f\,d\mu - \big(\int f\,d\mu\big)\log\big(\int f\,d\mu\big)$ is the entropy of the positive function f with respect to the measure µ.

• Z is a constant enforcing unit mass for a probability measure.


8.2 Framework and main results

8.2.1 Abstract setting

Let X, Y be two Euclidean spaces with $X\subset\mathbb{R}^N$, $Y\subset\mathbb{R}^M$. We think of X as the microscopic space and Y as the macroscopic space. N and M can then be thought of as the sizes of the microscopic and macroscopic data respectively. Let A and J be respectively a positive definite symmetric and an anti-symmetric linear operator on X. Let H : X → R be a given function. We consider the stochastic dynamic on X that is given by the following stochastic differential equation

\[
dX_t = -A\nabla H(X_t)\,dt - J\nabla H(X_t)\,dt + \sqrt{2A}\,dW_t, \tag{8.1}
\]

where $W_t$ is a Wiener process, and $\sqrt{A}$ is the square root of the matrix A. When J ≠ 0, this is a non-reversible process, and the Fokker-Planck equation associated to this SDE is

\[
\partial_t(f\mu) = \operatorname{div}\big[\mu(A+J)\nabla f\big], \tag{8.2}
\]

where µ is the invariant measure of the dynamic, which is

\[
\mu(dx) := \frac{1}{Z}\exp(-H(x))\,dx.
\]
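The reason the antisymmetric drift does not change the invariant measure is that the vector field e^{−H}J∇H is divergence-free whenever J is a constant antisymmetric matrix. The sketch below checks this by central finite differences in two dimensions. The Hamiltonian H(x, y) = x²/2 + y⁴/4 + 0.3xy and the rotation matrix J = [[0, 1], [−1, 0]] are hypothetical choices made for this example only.

```python
import math

def H(x, y):
    # Hypothetical smooth Hamiltonian on R^2.
    return 0.5 * x * x + 0.25 * y ** 4 + 0.3 * x * y

def grad_H(x, y):
    # Analytic gradient of the Hamiltonian above.
    return (x + 0.3 * y, y ** 3 + 0.3 * x)

def flux(x, y):
    """Vector field exp(-H) * J grad H with J = [[0, 1], [-1, 0]]."""
    gx, gy = grad_H(x, y)
    w = math.exp(-H(x, y))
    return (w * gy, -w * gx)

def divergence(x, y, h=1e-3):
    """Central finite-difference divergence of the flux."""
    d_dx = flux(x + h, y)[0] - flux(x - h, y)[0]
    d_dy = flux(x, y + h)[1] - flux(x, y - h)[1]
    return (d_dx + d_dy) / (2 * h)

for point in [(0.3, -0.7), (1.0, 0.5), (-1.2, 0.8)]:
    print(point, divergence(*point))
```

The printed values are zero up to discretization error, consistent with µ ∝ exp(−H) remaining stationary when the term −J∇H is added to the drift.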

In the application we have in mind, which we shall present in the next section, A will be the discrete Laplacian, and J the discrete derivative.

We now introduce an abstract framework for the notion of coarse-graining operator. Let P : X → Y be a linear operator such that

\[
NPP^t = \mathrm{id}_Y, \tag{8.3}
\]

where $P^t$ is the adjoint operator of P. We think of y = Px as the macroscopic state associated to the microscopic state x. This operator induces a decomposition of the invariant measure into a macroscopic component and a fluctuation component. Let $\bar\mu(dy) = P_\#\mu$ be the push-forward of µ under the operator P and µ(dx|y) be the conditional measure of µ given Px = y, i.e., for each y, µ(dx|y) is a probability measure on X such that, for any test function φ,

\[
\int_X\varphi(x)\,d\mu(x) = \int_Y\bigg(\int_{Px=y}\varphi(x)\,\mu(dx|y)\bigg)\bar\mu(dy). \tag{8.4}
\]

Applying the technique in [GOVW09], we show that under certain conditions, the macroscopic profile y = Px, with law given by $\bar f(t,y) = \int_{Px=y}f(t,x)\,\mu(dx|y)$, is close to the solution of the following differential equation

\[
\frac{d\eta}{dt} = -(\bar A + \bar J)\nabla\bar H(\eta(t)). \tag{8.5}
\]

In this equation, $\bar A$ is a symmetric, positive definite operator and $\bar J$ is another operator on Y, defined by

\[
\bar A^{-1} = PA^{-1}NP^t,\qquad \bar J = \bar APA^{-1}NJP^t, \tag{8.6}
\]

and $\bar H : Y\to\mathbb{R}$ is the macroscopic Hamiltonian that satisfies

\[
\bar\mu(dy) = \exp(-N\bar H(y))\,dy. \tag{8.7}
\]

In order to state the assumptions, we need to recall the definition of the logarithmic Sobolev inequality. A probability measure ν ∈ P(X) is said to satisfy an LSI with constant ρ > 0 (abbreviated LSI(ρ)) if, for any locally Lipschitz, nonnegative function $f\in L^1(\nu)$,

\[
\int\Phi(f)\,d\nu - \Phi\Big(\int f\,d\nu\Big) \le \frac{1}{2\rho}\int\frac{|\nabla f|^2}{f}\,d\nu.
\]
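For orientation, the standard Gaussian measure on R satisfies LSI(1) in this normalization, and exponential densities f(x) = exp(ax − a²/2) turn the inequality into an equality. The sketch below evaluates both sides by the trapezoidal rule. It is an illustration under these assumptions (Gaussian ν, exponential f, truncation of the real line to [−L, L]) and is not part of the argument in the text.

```python
import math

def lsi_sides(a, n=20001, L=12.0):
    """Both sides of LSI(1) for nu = N(0, 1) and f(x) = exp(a*x - a^2/2)."""
    dx = 2 * L / (n - 1)
    mass = ent = fisher = 0.0
    for k in range(n):
        x = -L + k * dx
        w = dx * (0.5 if k in (0, n - 1) else 1.0)             # trapezoidal weights
        nu = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)   # density of N(0, 1)
        log_f = a * x - 0.5 * a * a
        f = math.exp(log_f)
        mass += f * nu * w                     # integral of f d(nu), equal to 1
        ent += f * log_f * nu * w              # integral of Phi(f) d(nu)
        fisher += (a * f) ** 2 / f * nu * w    # integral of |f'|^2 / f d(nu)
    lhs = ent - mass * math.log(mass)          # Phi(f) integral minus Phi(integral of f)
    rhs = 0.5 * fisher                         # 1 / (2 rho) with rho = 1
    return lhs, rhs

print(lsi_sides(1.5))
```

Both returned values are close to a²/2, so the constant ρ = 1 cannot be improved for the standard Gaussian.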

Assumptions: Throughout the chapter, we assume that

(i) $\kappa := \max\{\langle\mathrm{Hess}\,H(x)\cdot u, v\rangle : x\in X,\ u\in\mathrm{Ran}(NP^tP),\ v\in\mathrm{Ran}(\mathrm{id}_X - NP^tP),\ |u| = |v| = 1\} < \infty$;

(ii) There is ρ > 0 such that µ(dx|y) satisfies LSI(ρ) for all y;

(iii) There exist λ, Λ > 0 such that $\lambda\,\mathrm{Id}\le\mathrm{Hess}\,\bar H\le\Lambda\,\mathrm{Id}$;

(iv) There is α > 0 such that $\int_X|x|^2f\mu(dx)\le\alpha N$;

(v) There is β > 0 such that $\inf_{y\in Y}\bar H(y)\ge-\beta$;

(vi) There is γ > 0 such that for all x ∈ X,

\[
|(\mathrm{id}_X - NP^tP)x|^2 \le \gamma M^{-2}\langle x, Ax\rangle_X;
\]

(vii) There are constants $C_1$ and $C_2$ such that the initial data satisfy $\int\Phi(f(0,x))\,\mu(dx)\le C_1N$ and $\bar H(\eta_0)\le C_2$;

(viii) There is a τ > 0 such that A ≥ τ Id;

(ix) $-J^2\le cA$;

(x) J and A commute.

Under these assumptions, we have the following bound on the Wasserstein distances between fµ and $\delta_\eta$.


Theorem 8.2.1. Let µ(dx) = exp(−H(x)) dx be a probability measure on X, and let P : X → Y satisfy (8.3). Let A : X → X be a symmetric, positive definite operator, and let f(t, x) and η(t) be the solutions of (8.2) and (8.5), with initial data f(0, ·) and $\eta_0$ respectively. Suppose that the assumptions above hold. Define

\[
\Theta(t) := \frac{1}{2N}\int_X(x - NP^t\eta(t))\cdot A^{-1}(x - NP^t\eta(t))\,f(t,x)\,\mu(dx).
\]

Then for any T > 0, we have

\[
\max\bigg\{\sup_{0\le t\le T}\Theta(t),\ \frac{\lambda}{8}\int_0^T\bigg(\int_Y|y - \eta(t)|_Y^2\,\bar f(t,y)\,\bar\mu(dy)\bigg)dt\bigg\} \le e^{\frac{8c\Lambda^2}{\lambda}T}\big[\Theta(0) + E(T,M,N)\big],
\]

where $\bar f$ and $\bar\mu$ denote the macroscopic density and the push-forward of µ under P, and E(T, M, N) → 0 as N ↑ ∞, M ↑ ∞, N/M ↑ ∞. More precisely,

E(T,M,N) = T

(M

N

)+

4cγΛ2T

λ

(α +

2C1

ρ

)1

M+ C1

(γκ2

2λρ2+

2cγκ2

τλρ2+

4γc

λτ

)1

M2

+√

2Tγ

(α +

2C1

ρ

) 12(

1 +

√c

τ+

√2cγ

M

)√C1

+√

2

(1 +

√c

τ

)(H(η0)−H(ηT )) + CT (1 + eCTH(η0))

12

1

M,

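where

\[
\hat\rho := \frac{1}{2}\Bigg(\rho + \lambda + \frac{\kappa^2}{\rho} - \sqrt{\Big(\rho + \lambda + \frac{\kappa^2}{\rho}\Big)^2 - 4\rho\lambda}\,\Bigg).
\]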

Remark 8.2.2 (Remarks on the assumptions). Assumptions (i) to (viii) are collected from [GOVW09] and [Fat13]. Assumption (ix) means that the asymmetric effect is controlled by the symmetric one. Its main use is to rule out situations where the scaling limit is a hyperbolic equation (this would be the case for a continuous analog of the asymmetric exclusion process), which the two-scale approach does not seem to handle. Assumption (x) is natural if we think of J and A as finite approximations of first- and second-derivative operators, which is the application we have in mind. It could be replaced by an appropriate bound on the symmetric part of $PA^{-1}JNP^t$ (which is the macroscopic component of the commutator between $A^{-1}$ and J), and an additional bound of the form $|\mathrm{Tr}(PJA^{-1}NP^t)|\le CM$. But since our proof is already fairly technical, and we do not have an application in mind that would warrant the greater generality, we decided to just assume that A and J commute, and simplify the proof. All these assumptions will be used in Lemma 3.4 to estimate the time derivative of Θ(t). In particular, (ii) and (vi) are used to handle the covariance and fluctuation terms respectively.

The hydrodynamic limit is obtained as a consequence.


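Corollary 8.2.3. Consider a sequence $\{X_\ell, Y_\ell, P_\ell, A_\ell, J_\ell, \mu_\ell, f_{0,\ell}, \eta_{0,\ell}\}_\ell$ satisfying the assumptions (i) to (x) with uniform constants κ, ρ, λ, Λ, α, β, γ, $C_1$, $C_2$ and c. Suppose that

\[
N_\ell\xrightarrow[\ell\uparrow\infty]{}\infty;\qquad M_\ell\xrightarrow[\ell\uparrow\infty]{}\infty;\qquad \frac{M_\ell}{N_\ell}\xrightarrow[\ell\uparrow\infty]{}0.
\]

Further assume that

\[
\lim_{\ell\uparrow\infty}\frac{1}{N_\ell}\int(x - N_\ell P^t\eta_{0,\ell})\cdot A_\ell^{-1}(x - N_\ell P^t\eta_{0,\ell})\,f_{0,\ell}(x)\,\mu_\ell(dx) = 0.
\]

Then, for any T > 0:

(a) The microscopic variables are close to the solution of (8.5) in the penalized norm induced by $A_\ell^{-1}$, uniformly in t ∈ [0, T],

\[
\lim_{\ell\uparrow\infty}\sup_{0\le t\le T}\frac{1}{N_\ell}\int(x - N_\ell P^t\eta_\ell)\cdot A_\ell^{-1}(x - N_\ell P^t\eta_\ell)\,f_\ell(t,x)\,\mu_\ell(dx) = 0;
\]

(b) The macroscopic variables are close to the solution of (8.5) in the strong $L^2(Y)$ norm, in a time-integrated sense,

\[
\lim_{\ell\uparrow\infty}\int_0^T\!\!\int|y - \eta_\ell|_Y^2\,\bar f(t,y)\,\bar\mu(dy)\,dt = 0,
\]

where $\bar f$ and $\bar\mu$ denote the macroscopic density and the push-forward of µ under P.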

Another topic of interest is whether the data behaves like a local Gibbs state.

Definition 8.2.4. The local Gibbs state with macroscopic profile η ∈ Y is the probability measure on X whose density with respect to µ is given by

\[
G(x)\,\mu(dx) := \frac{1}{Z}\exp\big(NP^t\nabla\bar H(\eta)\cdot x\big)\,\mu(dx),
\]

where $\bar H$ is the macroscopic Hamiltonian.

Such a probability measure is close (in Wasserstein distance) to the associated macroscopic profile η.

In [Yau91], it is shown that, if the initial data is close (in the sense of relative entropy) to a local Gibbs state, then this also holds at any positive time, for a time-dependent local Gibbs state. Since closeness in relative entropy is stronger (in the current setting) than closeness in Wasserstein distance, the kind of results obtained with Yau's method are stronger than those of the previous Corollary, but require a stronger assumption on the initial data.

In [Kos01], it was shown that convergence in relative entropy actually holds at positive times, even if the initial data converges only in a weaker sense. In [Fat13], the second author obtained a new proof of this fact in the reversible setting, using the two-scale approach. This method also yields quantitative rates of convergence in relative entropy. Now that we have generalized the two-scale approach to the non-reversible setting, the extension of the results of [Fat13] follows.


Theorem 8.2.5. Let G(t, x) be the time-dependent local Gibbs state associated to the solution η of (8.5). Under our assumptions, the following holds:

(a) The relative entropy with respect to the local Gibbs state is controlled as follows∫ T

0

1

N

∫Φ

(f(t, x)

G(t, x)

)G(t, x)µ(dx)dt = O

(√Θ(0) +

M

N+

1

M

), (8.8)

where the actual constants in the bound (which can be made explicit) depend on T, λ, Λ, α, γ, ρ, κ, τ, c, $C_1$ and $C_2$, but not on M and N;

(b) The difference between the microscopic free energy and the free energy associatedwith the macroscopic profile η is bounded as follows∫ T

0

∣∣∣∣ 1

N

∫Φ(f(t, x))µ(dx)− H(η(t))

∣∣∣∣ dt= O

(√Θ(0) +

M

N+

1

M

)

+ O(M

N

)×max

(∣∣∣∣log

(Γ(Y, | · |Y )2/(M−1)

ΛN

)∣∣∣∣ , ∣∣∣∣log

(Γ(Y, | · |Y )2/(M−1)

λN

)∣∣∣∣) , (8.9)

where Γ(Y, | · |Y ) is the Gaussian integral on the space Y with respect to the norm | · |Y .

8.2.2 Application to spin systems

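We now give an application of Theorem 8.2.1 to a system of interacting continuous spins. The application we have in mind is when the matrices A and J are given by

\[
A = N^2\begin{pmatrix}
2 & -1 & & & -1\\
-1 & 2 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 2 & -1\\
-1 & & & -1 & 2
\end{pmatrix}, \tag{8.10}
\]

and

\[
J = \frac{N}{2}\begin{pmatrix}
0 & 1 & & & -1\\
-1 & 0 & 1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -1 & 0 & 1\\
1 & & & -1 & 0
\end{pmatrix}. \tag{8.11}
\]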
As in [GOVW09], let

H(x) :=N∑i=1

ψ(xi), (8.12)


where ψ : R −→ R satisfies the following assumptions

ψ(x) =1

2x2 + δψ(x); ||δψ||C2 <∞. (8.13)

We consider the dynamic where A and J are given by (8.10) and (8.11) respectively.This corresponds to the system of stochastic differential equations

dXi(t) = −N2(2ψ(Xi)−ψ(Xi+1)−ψ(Xi−1))dt−N2

(ψ(Xi+1)−ψ(Xi−1)dt+N√

2(dBi+1t −dBi

t).

This is the dynamic studied in [GPV88] and [GOVW09], to which we have added aweak asymmetric perturbation. This model is to the symmetric dynamic what the weaklyasymmetric exclusion process is to the simple symmetric exclusion process, i.e., we haveadded an extra asymmetric term which has a scaling of lower order in N .

Since this dynamic conserves the mean spin $m = N^{-1}\sum X_i$, the natural space on which to work is

\[
X_{N,m} := \bigg\{x\in\mathbb{R}^N;\ \frac{1}{N}\sum_{i=1}^Nx_i = m\bigg\},
\]

which we endow with the usual $\ell^2$ scalar product. Following [GOVW09], the macroscopic space is

\[
Y_{M,m} := \bigg\{y\in\mathbb{R}^M;\ \frac{1}{M}\sum_{i=1}^My_i = m\bigg\},
\]

which we endow with the $L^2$ scalar product

\[
\langle y,\tilde y\rangle_Y := \frac{1}{M}\sum_{i=1}^My_i\tilde y_i.
\]

The coarse-graining operator P is defined as

\[
(Px)_i := \frac{1}{K}\sum_{j=(i-1)K+1}^{iK}x_j,
\]

where K is an integer such that N = KM. We can think of this coarse-graining operator as taking local averages of the microscopic profile over boxes of size K. This operator does satisfy the relation $PNP^t = \mathrm{id}_Y$.
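The relation PNP^t = id_Y can be verified on a small example. In the sketch below the adjoint P^t is computed for the ℓ² product on X and the weighted product ⟨y, ỹ⟩_Y = (1/M) Σ y_i ỹ_i on Y, which gives (P^t y)_j = y_{i(j)}/N, where i(j) is the block containing j. The sizes N = 6, K = 2 and the test vectors are hypothetical; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def P(x, K):
    """Block average: (Px)_i = (1/K) * sum of x_j over the i-th block of length K."""
    M = len(x) // K
    return [sum(x[i * K:(i + 1) * K], Fraction(0)) / K for i in range(M)]

def P_t(y, K):
    """Adjoint of P for the l^2 product on X and <y, z>_Y = (1/M) sum y_i z_i on Y."""
    N = len(y) * K
    return [Fraction(y[j // K]) / N for j in range(N)]

K = 2
x = [1, 2, 3, 4, 5, 6]            # microscopic profile, N = 6
y = [7, -1, 2]                    # macroscopic profile, M = 3
N, M = len(x), len(y)

# Adjoint relation <Px, y>_Y = <x, P^t y>_X.
left = sum(a * b for a, b in zip(P(x, K), y)) / M
right = sum(a * b for a, b in zip(x, P_t(y, K)))
print(left == right)

# N P P^t = id_Y.
print([N * v for v in P(P_t(y, K), K)] == y)
```

The same computation shows that N P^t P replaces each entry of x by its block average, i.e. it is the projection onto profiles that are constant on boxes.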

When K is large enough, it has been shown that the coarse-grained Hamiltonian $\bar H$ is uniformly convex, so we will be able to apply the previous abstract theorem.

Without loss of generality, we shall assume in the sequel that m = 0, since it does not play a role in our estimates.

To study the scaling limit, we need to embed our spaces $X_{N,m}$ into a single functional space. To a microscopic profile $x\in X_{N,0}$, we associate the step function $\bar x$ on the torus, defined by

\[
\bar x(\theta) := x_i\qquad\forall\theta\in\Big[\frac{i-1}{N},\frac{i}{N}\Big).
\]

We endow the space $L^2(\mathbb{T})$ with the $H^{-1}$ norm, defined by

\[
\|w\|_{H^{-1}}^2 = \int g^2\,d\theta,\qquad g' = w,\qquad \int g\,d\theta = 0.
\]

The closure of the spaces $X_{N,0}$ for this norm is the usual $H^{-1}$ space of functions of average 0, which is the dual of the Sobolev space $H^1$ for the $L^2$ norm.

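We can now state the hydrodynamic limit result we obtain for this model:

Theorem 8.2.6. Let $A_\ell$ and $J_\ell$ be given by (8.10) and (8.11) respectively. Assume that $\psi$ satisfies (8.13). Let $f(t,x)$ be a time-dependent probability density on $(X_{N,0}, \mu_{N,0})$ solving (8.2), with $f(0,\cdot) = f_0$ such that

$$\int f_0 \log f_0\,d\mu_{N,0} \le CN,$$

for some $C > 0$ and

$$\lim_{N\uparrow\infty} \int \|\bar x - \zeta_0\|^2_{H^{-1}}\, f_0(x)\,\mu_{N,0}(dx) = 0,$$

for some initial macroscopic profile $\zeta_0 \in L^2(\mathbb{T})$. Then, for any $T > 0$, we have

$$\lim_{N\uparrow\infty}\ \sup_{0\le t\le T} \int \|\bar x - \zeta(t,\cdot)\|^2_{H^{-1}}\, f(t,x)\,\mu_{N,0}(dx) = 0,$$

where $\zeta$ is the unique solution of

$$\frac{\partial\zeta}{\partial t} = \frac{\partial^2}{\partial\theta^2}\varphi'(\zeta) + \frac{\partial}{\partial\theta}\varphi'(\zeta), \qquad \zeta(0,\cdot) = \zeta_0, \tag{8.14}$$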

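where $\varphi$ is the Cramér transform of $\psi$, i.e.

$$\varphi(m) = \sup_{\sigma\in\mathbb{R}} \Big\{ \sigma m - \log\int_{\mathbb{R}} \exp\big(\sigma x - \psi(x)\big)\,dx \Big\}. \tag{8.15}$$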
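As an illustration of (8.15), the Cramér transform can be evaluated numerically on a grid. The sketch below is an addition, not part of the original text; it treats the purely quadratic case $\psi(x) = x^2/2$ (that is, $\delta\psi = 0$), for which a direct Gaussian computation gives $\varphi(m) = \frac{1}{2}m^2 - \frac{1}{2}\log(2\pi)$. The function name `cramer_transform` and the grid sizes are illustrative choices.

```python
import numpy as np

def cramer_transform(psi, m_values, x=None, sigma=None):
    """Grid approximation of phi(m) = sup_sigma [sigma*m - log int exp(sigma*x - psi(x)) dx]."""
    x = np.linspace(-12.0, 12.0, 2401) if x is None else x
    sigma = np.linspace(-5.0, 5.0, 2001) if sigma is None else sigma
    dx = x[1] - x[0]
    exponent = sigma[:, None] * x[None, :] - psi(x)[None, :]
    shift = exponent.max(axis=1)                       # log-sum-exp shift for numerical stability
    log_z = shift + np.log(np.exp(exponent - shift[:, None]).sum(axis=1) * dx)
    return np.array([np.max(sigma * m - log_z) for m in m_values])

m_values = np.array([-1.0, 0.0, 0.5])
phi_numeric = cramer_transform(lambda x: 0.5 * x**2, m_values)
phi_exact = 0.5 * m_values**2 - 0.5 * np.log(2.0 * np.pi)
print(np.max(np.abs(phi_numeric - phi_exact)))   # discrepancy between grid value and closed form
```

The supremum is only taken over the chosen range of $\sigma$, so the values of $m$ have to stay well inside the image of that range; for a non-quadratic $\psi$ satisfying (8.13) the grids would need to be adapted.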

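We can also use [Fat13] to study local Gibbs behavior, and convergence of the relative entropy.

Theorem 8.2.7. Under the same assumptions as in Theorem 8.2.6, the following holds:

$$\int_0^T \int_{X_N} \Phi\Big(\frac{f_N(t,x)}{G_N(t,x)}\Big)\, G_N(t,x)\,\mu_N(dx)\,dt \longrightarrow 0, \tag{8.16}$$

where $G_N(t,\cdot)$ is the local Gibbs state given by $\eta_N(t)$. As a consequence, we have convergence of the microscopic entropy to the hydrodynamic entropy, in a time-integrated sense:

$$\int_0^T \bigg| \frac{1}{N}\int \Phi(f_N(t,x))\,\mu_N(dx) - \bigg( \int_{\mathbb{T}} \varphi(\zeta(t,\theta))\,d\theta - \varphi\Big(\int_{\mathbb{T}} \zeta(t,\theta)\,d\theta\Big) \bigg) \bigg|\,dt \xrightarrow[N\to\infty]{} 0. \tag{8.17}$$

Moreover, convergence of $\frac{1}{N}\int \Phi(f_N(t,x))\,\mu_N(dx)$ to $\int_{\mathbb{T}} \varphi(\zeta(t,\theta))\,d\theta - \varphi\big(\int_{\mathbb{T}} \zeta(t,\theta)\,d\theta\big)$ holds uniformly on any time-interval $[\varepsilon, T]$, for any $0 < \varepsilon < T$.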

Since deducing this result from Theorem 8.2.5 is nearly the same as in [Fat13], we omit the proof. The only significant difference is proving that the solution $\zeta$ of the hydrodynamic equation is smooth on $[\varepsilon, T]$. This is a known result, which can be proven by a straightforward adaptation of the proof of Proposition 3.22 in [Fat13].

8.3 Proof of the abstract results

In this section, we prove Theorem 8.2.1 and provide a sketch of proof of Theorem 8.2.5.

8.3.1 Proof of Theorem 8.2.1

Following the approach of [GOVW09], we prove Theorem 8.2.1 in three steps: first we differentiate with respect to time the Wasserstein distance between $f(t)\mu$ and the macroscopic profile $\eta(t)$, then we derive an upper bound for the quantity we obtain, before integrating in time and applying Gronwall's Lemma to obtain the result.


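Lemma 8.3.1. The time-derivative of $\Theta(t)$ is given by the following formula:

$$
\begin{aligned}
\frac{d}{dt}\,\frac{1}{2N}&\int_X (x - NP^t\eta(t))\cdot A^{-1}(x - NP^t\eta(t))\,f(t,x)\,\mu(dx)\\
={}& \frac{M}{N} - \int_Y (y-\eta)\cdot\big(\nabla_Y\bar H(y) - \nabla_Y\bar H(\eta)\big)\,\bar f(t,y)\,\bar\mu(dy)\\
&+ \int_Y PJA^{-1}NP^t(y-\eta)\cdot\big(\nabla_Y\bar H(y) - \nabla_Y\bar H(\eta)\big)\,\bar f(t,y)\,\bar\mu(dy)\\
&- \int_Y (y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy)\\
&- \frac{1}{N}\int_X (\mathrm{id}_X - NP^tP)x\cdot\nabla f(t,x)\,\mu(dx)\\
&+ \int_X \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx)\\
&+ \int_Y PJA^{-1}NP^t(y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy)\\
&+ \int_X PJA^{-1}(\mathrm{id}_X - NP^tP)x\cdot P\nabla f(t,x)\,\mu(dx)\\
&+ \frac{1}{N}\int_X (\mathrm{id}_X - NP^tP)JA^{-1}(x - NP^t\eta)\cdot\nabla f(t,x)\,\mu(dx)\\
&+ \int_X PA^{-1}(\mathrm{id}_X - NP^tP)x\cdot J\nabla_Y\bar H(\eta)\,f(t,x)\,\mu(dx).
\end{aligned}
\tag{8.18}
$$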

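Proof. We have

$$
\begin{aligned}
&\frac{d}{dt}\,\frac{1}{2N}\int_X (x - NP^t\eta(t))\cdot A^{-1}(x - NP^t\eta(t))\,f(t,x)\,\mu(dx)\\
&\overset{(8.2)}{=} -\frac{1}{N}\int_X A^{-1}(x - NP^t\eta)\cdot(A+J)\nabla f\,\mu(dx) - \int P^t\frac{d\eta}{dt}\cdot A^{-1}(x - NP^t\eta)\,f\,\mu(dx)\\
&\overset{(8.5)}{=} -\frac{1}{N}\int_X A^{-1}(x - NP^t\eta)\cdot A\nabla f\,\mu(dx) + \int \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}(x - NP^t\eta)\,f\,\mu(dx)\\
&\qquad - \frac{1}{N}\int A^{-1}(x - NP^t\eta)\cdot J\nabla f\,\mu(dx) + \int A^{-1}(x - NP^t\eta)\cdot P^tJ\nabla_Y\bar H(\eta)\,f\,\mu(dx)\\
&= (I) + (II) + (III) + (IV).
\end{aligned}
\tag{8.19}
$$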

We now use the decomposition $x = NP^tPx + (\mathrm{id}_X - NP^tP)x$ to transform each term on the right hand side of (8.19). We need the following definition of the $\mu$-covariance of two functions $f, g \in L^2(\mu)$:

$$\operatorname{cov}_\mu(f,g) = \int fg\,d\mu - \Big(\int f\,d\mu\Big)\Big(\int g\,d\mu\Big). \tag{8.20}$$


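The first two terms, (I) and (II), are already treated in [GOVW09]. We repeat the computation here for the sake of completeness.

$$
\begin{aligned}
(I) &= -\frac{1}{N}\int_X (x - NP^t\eta)\cdot\nabla f\,\mu(dx)\\
&= -\int_X P^t(Px - \eta)\cdot\nabla f\,\mu(dx) - \frac{1}{N}\int (\mathrm{id}_X - NP^tP)x\cdot\nabla f\,\mu(dx).
\end{aligned}
\tag{8.21}
$$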

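We now transform the first term in (8.21) using (8.4) and Lemma 21 in [GOVW09].

$$
\begin{aligned}
-\int_X P^t(Px - \eta)\cdot\nabla f\,\mu(dx) &= -\int (Px - \eta)\cdot P\nabla f\,\mu(dx)\\
&\overset{(8.4)}{=} -\int_Y (y-\eta)\cdot P\int_{Px=y}\nabla f\,\mu(dx|y)\,\bar\mu(dy)\\
&\overset{\text{[GOVW09,(36)]}}{=} -\frac{1}{N}\int (y-\eta)\cdot\nabla_Y\bar f\,\bar\mu(dy) - \int (y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy)\\
&\overset{(8.7)}{=} \frac{1}{N}\int \nabla_Y\cdot y\,\bar f\,\bar\mu(dy) - \int (y-\eta)\cdot\nabla_Y\bar H(y)\,\bar f\,\bar\mu(dy) - \int (y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy)\\
&= \frac{\dim Y}{N} - \int (y-\eta)\cdot\nabla_Y\bar H(y)\,\bar f\,\bar\mu(dy) - \int (y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy).
\end{aligned}
$$

We obtain

$$
\begin{aligned}
(I) ={}& \frac{\dim Y}{N} - \int (y-\eta)\cdot\nabla_Y\bar H(y)\,\bar f\,\bar\mu(dy) - \int (y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy)\\
&- \frac{1}{N}\int (\mathrm{id}_X - NP^tP)x\cdot\nabla f\,\mu(dx).
\end{aligned}
\tag{8.22}
$$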

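Now we proceed with (II).

$$
\begin{aligned}
(II) &= \int \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}NP^t(Px - \eta)\,f\,\mu(dx) + \int \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx)\\
&\overset{(8.6)}{=} \int \nabla_Y\bar H(\eta)\cdot(Px - \eta)\,f\,\mu(dx) + \int \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx)\\
&\overset{(8.4)}{=} \int_Y (y-\eta)\cdot\nabla_Y\bar H(\eta)\,\bar f\,\bar\mu(dy) + \int \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx).
\end{aligned}
\tag{8.23}
$$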

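Next, we continue with (III).

$$
\begin{aligned}
(III) ={}& \frac{1}{N}\int JA^{-1}(x - NP^t\eta)\cdot\nabla f\,\mu(dx)\\
={}& \frac{1}{N}\int PJA^{-1}(x - NP^t\eta)\cdot NP\nabla f\,\mu(dx) + \frac{1}{N}\int (\mathrm{id}_X - NP^tP)JA^{-1}(x - NP^t\eta)\cdot\nabla f\,\mu(dx)\\
={}& \frac{1}{N}\int PJA^{-1}NP^t(Px - \eta)\cdot NP\nabla f\,\mu(dx) + \frac{1}{N}\int PJA^{-1}(\mathrm{id}_X - NP^tP)x\cdot NP\nabla f\,\mu(dx)\\
&+ \frac{1}{N}\int (\mathrm{id}_X - NP^tP)JA^{-1}(x - NP^t\eta)\cdot\nabla f\,\mu(dx).
\end{aligned}
$$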


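The first term on the right hand side of the expression above can be transformed further using Lemma 21 in [GOVW09], as done for (I).

$$
\begin{aligned}
&\frac{1}{N}\int PJA^{-1}NP^t(Px - \eta)\cdot NP\nabla f\,\mu(dx)\\
&= \frac{1}{N}\int_Y \Big( \int_{Px=y} PJA^{-1}NP^t(y-\eta)\cdot NP\nabla f\,\mu(dx|y) \Big)\,\bar\mu(dy)\\
&= \frac{1}{N}\int_Y PJA^{-1}NP^t(y-\eta)\cdot\big[ \nabla_Y\bar f(y) + NP\operatorname{cov}_{\mu(dx|y)}(f,\nabla H) \big]\,\bar\mu(dy)\\
&= \frac{1}{N}\int_Y PJA^{-1}NP^t(y-\eta)\cdot\nabla_Y\bar f(y)\,\bar\mu(dy) + \int_Y PJA^{-1}NP^t(y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy)\\
&= -\frac{\operatorname{Tr}(PJA^{-1}NP^t)}{N} + \int_Y PJA^{-1}NP^t(y-\eta)\cdot\nabla_Y\bar H(y)\,\bar f\,\bar\mu(dy)\\
&\qquad + \int_Y PJA^{-1}NP^t(y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy).
\end{aligned}
$$

Since $PJA^{-1}NP^t$ is anti-symmetric, $\operatorname{Tr}(PJA^{-1}NP^t) = 0$, and we obtain

$$
\begin{aligned}
(III) ={}& \int_Y PJA^{-1}NP^t(y-\eta)\cdot\nabla_Y\bar H(y)\,\bar f\,\bar\mu(dy)\\
&+ \int_Y PJA^{-1}NP^t(y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy)\\
&+ \frac{1}{N}\int PJA^{-1}(\mathrm{id}_X - NP^tP)x\cdot NP\nabla f\,\mu(dx)\\
&+ \frac{1}{N}\int (\mathrm{id}_X - NP^tP)JA^{-1}(x - NP^t\eta)\cdot\nabla f\,\mu(dx).
\end{aligned}
\tag{8.24}
$$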

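Finally, we transform (IV).

$$
\begin{aligned}
(IV) &= \int PA^{-1}NP^t(Px - \eta)\cdot J\nabla_Y\bar H(\eta)\,f\,\mu(dx) + \int PA^{-1}(\mathrm{id}_X - NP^tP)x\cdot J\nabla_Y\bar H(\eta)\,f\,\mu(dx)\\
&\overset{(8.4)}{=} \int_Y PA^{-1}NP^t(y-\eta)\cdot J\nabla_Y\bar H(\eta)\,\bar f\,\bar\mu(dy) + \int PA^{-1}(\mathrm{id}_X - NP^tP)x\cdot J\nabla_Y\bar H(\eta)\,f\,\mu(dx)\\
&\overset{(8.6)}{=} -\int PJA^{-1}NP^t(y-\eta)\cdot\nabla_Y\bar H(\eta)\,\bar f\,\bar\mu(dy) + \int PA^{-1}(\mathrm{id}_X - NP^tP)x\cdot J\nabla_Y\bar H(\eta)\,f\,\mu(dx).
\end{aligned}
\tag{8.25}
$$

Substituting (8.22)-(8.25) into (8.19), we obtain (8.18) and the lemma is proven.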

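The following auxiliary lemma will be helpful in the sequel. The second and the third parts are respectively (54) and (52) in [GOVW09]; we put them here for the reader's convenience.

Lemma 8.3.2. We have the following estimates.

1. For every $y \in Y$,

$$|PJA^{-1}NP^t y|^2 \le c\,\langle \bar A^{-1}y, y\rangle \le \frac{c}{\tau}|y|^2_Y, \tag{8.26}$$

$$\langle \bar A\,PJA^{-1}NP^t y,\ PJA^{-1}NP^t y\rangle \le c\,|y|^2. \tag{8.27}$$

2. For every $x \in X$,

$$(\mathrm{id}_X - NP^tP)x\cdot A^{-1}(\mathrm{id}_X - NP^tP)x \le \frac{\gamma}{M^2}|x|^2. \tag{8.28}$$

3. It holds that

$$|NP^tP\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)|^2 \le \frac{\gamma\kappa^2}{\rho^2}\,\frac{1}{M^2}\,\bar f\int \frac{1}{f}\,\nabla f\cdot A\nabla f\,\mu(dx|y). \tag{8.29}$$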

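Proof. We only need to prove the first part. We start with (8.26). The first inequality is obtained using the assumption (2) and the fact that $NP^tP$ is an orthogonal projection as follows.

$$
\begin{aligned}
\langle PJA^{-1}NP^ty,\ PJA^{-1}NP^ty\rangle &= \frac{1}{N}\langle NP^tPJA^{-1}NP^ty,\ JA^{-1}NP^ty\rangle\\
&\le \frac{1}{N}\langle JA^{-1}NP^ty,\ JA^{-1}NP^ty\rangle\\
&= -\frac{1}{N}\langle J^2A^{-1}NP^ty,\ A^{-1}NP^ty\rangle\\
&\le \frac{c}{N}\langle A^{-1}NP^ty,\ NP^ty\rangle \qquad \text{(used assumption (ix) here)}\\
&= c\,\langle PA^{-1}NP^ty,\ y\rangle\\
&= c\,\langle \bar A^{-1}y,\ y\rangle.
\end{aligned}
$$

Now we prove the second one. Since $\tau$ is a lower bound on the spectral value of $A$, $\frac{1}{\tau}$ is an upper bound on that of $A^{-1}$. Hence

$$\langle \bar A^{-1}y, y\rangle = \langle PA^{-1}NP^ty, y\rangle = \frac{1}{N}\langle A^{-1}NP^ty, NP^ty\rangle \le \frac{1}{N\tau}\langle NP^ty, NP^ty\rangle = \frac{1}{\tau}|y|^2_Y.$$

Next, we prove (8.27). By duality, we have

$$
\begin{aligned}
\langle \bar A\,PJA^{-1}NP^ty,\ PJA^{-1}NP^ty\rangle &= \sup_z\ 2\langle PJA^{-1}NP^ty, z\rangle - \langle \bar A^{-1}z, z\rangle\\
&\overset{(8.26)}{\le} \sup_z\ 2\langle y, PJA^{-1}NP^tz\rangle - c^{-1}|PJA^{-1}NP^tz|^2\\
&\le \sup_z\ 2\langle y, z\rangle - c^{-1}|z|^2\\
&\le c\,|y|^2.
\end{aligned}
$$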


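Lemma 8.3.3. If $f(t,x)$ and $\eta(t)$ satisfy the assumptions of Theorem 8.2.1, then for any $T < \infty$ we have

$$\int_0^T\!\!\int \frac{1}{f}\,\nabla f\cdot A\nabla f(t,x)\,\mu(dx)\,dt = \int \Phi(f(0,x))\,\mu(dx) - \int \Phi(f(T,x))\,\mu(dx); \tag{8.30}$$

$$\int_0^T \langle \bar A\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle\,dt \le 2\big(\bar H(\eta_0) - \bar H(\eta_T)\big) + CT\big(1 + e^{CT}\bar H(\eta_0)\big), \tag{8.31}$$

where $C > 0$ is a constant;

$$\Big(\int |x|^2 f(t,x)\,\mu(dx)\Big)^{\frac12} \le \Big(\int |x|^2\,\mu(dx)\Big)^{\frac12} + \Big(\frac{2}{\rho}\int \Phi(f(0,x))\,\mu(dx)\Big)^{\frac12}. \tag{8.32}$$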

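Proof. The proof of this lemma is similar to that of Proposition 24 in [GOVW09]. We prove (8.30) first. We have

$$
\begin{aligned}
\frac{d}{dt}\int \Phi(f(t,x))\,\mu(dx) &= \int (\log f + 1)\,\partial_t(f\mu)\\
&= \int (\log f + 1)\,\operatorname{div}\big(\mu(A+J)\nabla f\big)\\
&= -\int \frac{(A+J)\nabla f\cdot\nabla f}{f}\,\mu(dx)\\
&= -\int \frac{1}{f}\,A\nabla f\cdot\nabla f\,\mu(dx) \qquad \text{(since $J$ is anti-symmetric)}.
\end{aligned}
\tag{8.33}
$$

Thus (8.30) follows. Next we prove (8.31). We have

$$
\begin{aligned}
\frac{d}{dt}\bar H(\eta(t)) &= \langle \dot\eta(t), \nabla_Y\bar H(\eta)\rangle\\
&\overset{(8.5)}{=} -\langle \bar A\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle - \langle J\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle\\
&= -\langle \bar A\nabla\bar H(\eta), \nabla\bar H(\eta)\rangle - \langle \bar A\,PJA^{-1}NP^t\nabla\bar H(\eta), \nabla\bar H(\eta)\rangle\\
&\le \frac12\langle \bar A\,PJA^{-1}NP^t\nabla\bar H(\eta),\ PJA^{-1}NP^t\nabla\bar H(\eta)\rangle - \frac12\langle \bar A\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle\\
&\overset{(8.27)}{\le} \frac{c}{2}|\nabla\bar H(\eta)|^2 - \frac12\langle \bar A\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle,
\end{aligned}
$$

and therefore

$$
\begin{aligned}
\frac{d}{dt}\bar H(\eta(t)) + \frac12\langle \bar A\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle &\le \frac{c}{2}|\nabla\bar H(\eta)|^2\\
&\le C(|\eta|^2 + 1)\\
&\le C(\bar H(\eta) + 1).
\end{aligned}
$$

In the above estimate, $C > 0$ is a general constant. Note that we have used the assumption (iii). The above Gronwall-type inequality implies that for every $t \in [0,T]$, we have $\bar H(\eta(t)) \le e^{C(T+1)}\bar H(\eta_0)$, and

$$\int_0^T \langle \bar A\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle\,dt \le 2\big(\bar H(\eta_0) - \bar H(\eta_T)\big) + CT\big(1 + e^{CT}\bar H(\eta_0)\big).$$

By (8.33), $\int \Phi(f(t,x))\,\mu(dx)$ is non-increasing in $t$; hence the proof of (8.32) is the same as that of (46) in [GOVW09].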

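Lemma 8.3.4. We have the following estimate:

\begin{align}
\frac{d}{dt}\Theta(t) &- \frac{8c\Lambda^2}{\lambda}\Theta(t) + \frac{\lambda}{8}\int |y-\eta|^2\,\bar f\,\bar\mu(dy) \notag\\
\le{}& \frac{M}{N} + \frac{4c\gamma\Lambda^2}{2\lambda NM^2}\int |x|^2 f\,\mu(dx) \notag\\
&+ \Big( \frac{\gamma\kappa^2}{2\lambda\rho^2M^2} + \frac{2c\gamma\kappa^2}{\tau\lambda\rho^2M^2} + \frac{4\gamma c}{\lambda\tau M^2} \Big)\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) \notag\\
&+ \frac{\sqrt\gamma}{M}\Big(\int \frac{1}{N}|x|^2 f\,\mu(dx)\Big)^{\frac12}\bigg[ \Big(1 + \sqrt{\frac{c}{\tau}} + \frac{\sqrt{2c\gamma}}{M}\Big)\Big(\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\Big)^{\frac12} \tag{8.34}\\
&\qquad + \Big(1 + \sqrt{\frac{c}{\tau}}\Big)\big(\bar A\nabla_Y\bar H(\eta)\cdot\nabla_Y\bar H(\eta)\big)^{\frac12} \bigg]. \tag{8.35}
\end{align}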

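Proof. We estimate each term in (8.18). The 2nd, 4th and 5th terms are already treated in [GOVW09]. We get

$$-\int_Y (y-\eta)\cdot\big(\nabla_Y\bar H(y) - \nabla_Y\bar H(\eta)\big)\,\bar f\,\bar\mu(dy) \le -\lambda\int |y-\eta|^2_Y\,\bar f\,\bar\mu(dy), \tag{8.36}$$

$$\bigg| \int (y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy) \bigg| \le \frac{\gamma\kappa^2}{2\lambda\rho^2M^2}\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) + \frac{\lambda}{2}\int |y-\eta|^2_Y\,\bar f\,\bar\mu(dy), \tag{8.37}$$

$$\bigg| \frac{1}{N}\int (\mathrm{id}_X - NP^tP)x\cdot\nabla f\,\mu(dx) \bigg| \le \bigg( \frac{\gamma}{M^2}\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\cdot\int \frac{1}{N}|x|^2 f\,\mu(dx) \bigg)^{\frac12}. \tag{8.38}$$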

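We estimate the 3rd term. Since

$$
\begin{aligned}
|PJA^{-1}NP^t(y-\eta)|\cdot|\nabla_Y\bar H(y) - \nabla_Y\bar H(\eta)| &\le \Lambda|y-\eta|\cdot|PJA^{-1}NP^t(y-\eta)|\\
&\overset{(8.26)}{\le} \Lambda|y-\eta|\sqrt{c\,\langle \bar A^{-1}(y-\eta), y-\eta\rangle}\\
&\le \frac{\lambda}{8}|y-\eta|^2 + \frac{2c\Lambda^2}{\lambda}\langle \bar A^{-1}(y-\eta), y-\eta\rangle,
\end{aligned}
$$

we have

$$
\begin{aligned}
&\bigg| \int_Y PJA^{-1}NP^t(y-\eta)\cdot\big(\nabla_Y\bar H(y) - \nabla_Y\bar H(\eta)\big)\,\bar f\,\bar\mu(dy) \bigg|\\
&\le \int_Y |PJA^{-1}NP^t(y-\eta)|\,|\nabla_Y\bar H(y) - \nabla_Y\bar H(\eta)|\,\bar f\,\bar\mu(dy)\\
&\le \frac{\lambda}{8}\int_Y |y-\eta|^2\,\bar f\,\bar\mu(dy) + \frac{2c\Lambda^2}{\lambda}\int_Y \langle \bar A^{-1}(y-\eta), y-\eta\rangle\,\bar f\,\bar\mu(dy)\\
&= \frac{\lambda}{8}\int_Y |y-\eta|^2\,\bar f\,\bar\mu(dy) + \frac{2c\Lambda^2}{\lambda}\,\frac{1}{N}\int_X \langle A^{-1}NP^t(Px-\eta), NP^t(Px-\eta)\rangle\,f\,\mu(dx)\\
&\le \frac{\lambda}{8}\int_Y |y-\eta|^2\,\bar f\,\bar\mu(dy) + \frac{2c\Lambda^2}{\lambda}\,\frac{2}{N}\int_X \langle A^{-1}(x - NP^t\eta), (x - NP^t\eta)\rangle\,f\,\mu(dx)\\
&\qquad + \frac{2c\Lambda^2}{\lambda}\,\frac{2}{N}\int_X \langle A^{-1}(\mathrm{id}_X - NP^tP)x, (\mathrm{id}_X - NP^tP)x\rangle\,f\,\mu(dx)\\
&\le \frac{\lambda}{8}\int_Y |y-\eta|^2\,\bar f\,\bar\mu(dy) + \frac{8c\Lambda^2}{\lambda}\Theta(t) + \frac{4c\gamma\Lambda^2}{\lambda NM^2}\int |x|^2 f\,\mu(dx).
\end{aligned}
\tag{8.39}
$$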

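Next we estimate the 6th term.

$$
\begin{aligned}
\int \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx) &= \int P^t\bar A\nabla_Y\bar H(\eta)\cdot A^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx)\\
&\le \Big( \int P^t\bar A\nabla_Y\bar H(\eta)\cdot A^{-1}NP^t\bar A\nabla_Y\bar H(\eta)\,f\,\mu(dx) \Big)^{\frac12}\\
&\qquad\times \Big( \frac{1}{N}\int (\mathrm{id}_X - NP^tP)x\cdot A^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx) \Big)^{\frac12}.
\end{aligned}
$$

Since

$$P^t\bar A\nabla_Y\bar H(\eta)\cdot A^{-1}NP^t\bar A\nabla_Y\bar H(\eta) = \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}NP^t\bar A\nabla_Y\bar H(\eta) = \bar A\nabla_Y\bar H(\eta)\cdot\nabla_Y\bar H(\eta),$$

and from (8.28), we have

$$\int \bar A\nabla_Y\bar H(\eta)\cdot PA^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx) \le \big(\bar A\nabla_Y\bar H(\eta)\cdot\nabla_Y\bar H(\eta)\big)^{\frac12}\Big( \frac{\gamma}{NM^2}\int |x|^2 f\,\mu(dx) \Big)^{\frac12}. \tag{8.40}$$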


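Next, we estimate the 7th term.

$$
\begin{aligned}
&\bigg| \int PJA^{-1}NP^t(y-\eta)\cdot P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\,\bar\mu(dy) \bigg|\\
&\le \bigg( \int |PJA^{-1}NP^t(y-\eta)|^2\,\bar f\,\bar\mu(dy)\cdot\int \frac{1}{\bar f}\,\big|P\operatorname{cov}_{\mu(dx|y)}(f,\nabla H)\big|^2_Y\,\bar\mu(dy) \bigg)^{\frac12}\\
&\overset{(8.26),(8.29)}{\le} \bigg( \frac{2c}{\tau}\,\frac{\gamma\kappa^2}{\rho^2}\,\frac{1}{M^2}\int |y-\eta|^2\,\bar f\,\bar\mu(dy)\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) \bigg)^{\frac12}\\
&\le \frac{2c\gamma\kappa^2}{\tau\lambda\rho^2M^2}\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) + \frac{\lambda}{8}\int |y-\eta|^2\,\bar f\,\bar\mu(dy).
\end{aligned}
\tag{8.41}
$$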

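For the 8th term, we have

$$
\begin{aligned}
&\int PJA^{-1}(\mathrm{id}_X - NP^tP)x\cdot P\nabla f\,\mu(dx) = \frac{1}{N}\int NP^tPJA^{-1}(\mathrm{id}_X - NP^tP)x\cdot\nabla f\,\mu(dx)\\
&\le \Big( \int \frac{1}{N}\,NP^tPJA^{-1}(\mathrm{id}_X - NP^tP)x\cdot A^{-1}NP^tPJA^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx) \Big)^{\frac12}\\
&\qquad\times \Big( \int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) \Big)^{\frac12}.
\end{aligned}
$$

Since

$$
\begin{aligned}
&\langle NP^tPJA^{-1}(\mathrm{id}_X - NP^tP)x,\ A^{-1}NP^tPJA^{-1}(\mathrm{id}_X - NP^tP)x\rangle\\
&\le \frac{1}{\tau}\langle NP^tPJA^{-1}(\mathrm{id}_X - NP^tP)x,\ NP^tPJA^{-1}(\mathrm{id}_X - NP^tP)x\rangle\\
&\le \frac{1}{\tau}\langle JA^{-1}(\mathrm{id}_X - NP^tP)x,\ JA^{-1}(\mathrm{id}_X - NP^tP)x\rangle\\
&= \frac{1}{\tau}\langle -J^2A^{-1}(\mathrm{id}_X - NP^tP)x,\ A^{-1}(\mathrm{id}_X - NP^tP)x\rangle\\
&\le \frac{c}{\tau}\langle (\mathrm{id}_X - NP^tP)x,\ A^{-1}(\mathrm{id}_X - NP^tP)x\rangle\\
&\overset{(8.28)}{\le} \frac{c\gamma}{\tau M^2}|x|^2,
\end{aligned}
$$

we obtain

$$\int PJA^{-1}(\mathrm{id}_X - NP^tP)x\cdot P\nabla f\,\mu(dx) \le \bigg( \frac{c\gamma}{\tau M^2}\int \frac{1}{N}|x|^2 f\,\mu(dx)\cdot\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) \bigg)^{\frac12}. \tag{8.42}$$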


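Next we estimate the 9th term. Set $z = JA^{-1}(x - NP^t\eta)$. We have

$$
\begin{aligned}
&\frac{1}{N}\int (\mathrm{id}_X - NP^tP)JA^{-1}(x - NP^t\eta)\cdot\nabla f\,\mu(dx) = \frac{1}{N}\int (\mathrm{id}_X - NP^tP)z\cdot\nabla f\,\mu(dx)\\
&\le \bigg( \int \frac{1}{N}(\mathrm{id}_X - NP^tP)z\cdot A^{-1}(\mathrm{id}_X - NP^tP)z\,f\,\mu(dx)\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) \bigg)^{\frac12}\\
&\overset{(8.28)}{\le} \bigg( \frac{\gamma}{M^2}\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\int \frac{1}{N}|z|^2 f\,\mu(dx) \bigg)^{\frac12}.
\end{aligned}
$$

We estimate the second integral inside the parentheses. It holds that

$$
\begin{aligned}
|z|^2 &= \langle JA^{-1}(x - NP^t\eta),\ JA^{-1}(x - NP^t\eta)\rangle\\
&= \langle -J^2A^{-1}(x - NP^t\eta),\ A^{-1}(x - NP^t\eta)\rangle\\
&\overset{(ix)}{\le} c\,\langle A^{-1}(x - NP^t\eta),\ x - NP^t\eta\rangle\\
&\le 2c\,\Big( \langle A^{-1}NP^t(Px-\eta), NP^t(Px-\eta)\rangle + \langle A^{-1}(\mathrm{id}_X - NP^tP)x, (\mathrm{id}_X - NP^tP)x\rangle \Big)\\
&\overset{(8.28)}{\le} 2c\,\Big( \frac{1}{\tau}|NP^t(Px-\eta)|^2 + \frac{\gamma}{M^2}|x|^2 \Big)\\
&= 2c\,\Big( \frac{N}{\tau}|Px-\eta|^2 + \frac{\gamma}{M^2}|x|^2 \Big).
\end{aligned}
$$

Therefore,

$$
\begin{aligned}
&\frac{1}{N}\int (\mathrm{id}_X - NP^tP)JA^{-1}(x - NP^t\eta)\cdot\nabla f\,\mu(dx)\\
&\le \bigg( \frac{\gamma}{M^2}\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) \bigg)^{\frac12}\bigg( \frac{2c}{\tau}\int |y-\eta|^2\,\bar f\,\bar\mu(dy) + \frac{2c\gamma}{M^2N}\int |x|^2 f\,\mu(dx) \bigg)^{\frac12}\\
&\le \frac{4\gamma c}{M^2\lambda\tau}\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) + \frac{\lambda}{8}\int |y-\eta|^2\,\bar f\,\bar\mu(dy)\\
&\qquad + \bigg( \frac{\gamma}{M^2}\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx) \bigg)^{\frac12}\bigg( \frac{2c\gamma}{M^2N}\int |x|^2 f\,\mu(dx) \bigg)^{\frac12}.
\end{aligned}
\tag{8.43}
$$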

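Finally, we estimate the 10th term. Since

$$
\begin{aligned}
\langle PA^{-1}NP^tJ\nabla\bar H(\eta),\ J\nabla\bar H(\eta)\rangle &= \langle PJA^{-1}NP^t\nabla\bar H(\eta),\ \bar A\,PJA^{-1}NP^t\nabla\bar H(\eta)\rangle\\
&\le c\,|\nabla\bar H(\eta)|^2\\
&\le \frac{c}{\tau}\langle \bar A\nabla_Y\bar H(\eta), \nabla_Y\bar H(\eta)\rangle,
\end{aligned}
$$

we have

\begin{align}
&\bigg| \int A^{-1}(\mathrm{id}_X - NP^tP)x\cdot P^tJ\nabla_Y\bar H(\eta)\,f\,\mu(dx) \bigg| \notag\\
&\le \Big( \int P^tJ\nabla_Y\bar H(\eta)\cdot A^{-1}NP^tJ\nabla_Y\bar H(\eta)\,f\,\mu(dx) \Big)^{\frac12}\times\Big( \int \frac{1}{N}(\mathrm{id}_X - NP^tP)x\cdot A^{-1}(\mathrm{id}_X - NP^tP)x\,f\,\mu(dx) \Big)^{\frac12} \tag{8.44}\\
&\le \Big( \frac{c}{\tau}\,\bar A\nabla_Y\bar H(\eta)\cdot\nabla_Y\bar H(\eta) \Big)^{\frac12}\Big( \frac{\gamma}{NM^2}\int |x|^2 f\,\mu(dx) \Big)^{\frac12}. \tag{8.45}
\end{align}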

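Summing up from (8.36) to (8.45), we obtain

$$
\begin{aligned}
\frac{d}{dt}\Theta(t) &- \frac{8c\Lambda^2}{\lambda}\Theta(t) + \frac{\lambda}{8}\int |y-\eta|^2\,\bar f\,\bar\mu(dy)\\
\le{}& \frac{M}{N} + \frac{4c\gamma\Lambda^2}{2\lambda NM^2}\int |x|^2 f\,\mu(dx)\\
&+ \Big( \frac{\gamma\kappa^2}{2\lambda\rho^2M^2} + \frac{2c\gamma\kappa^2}{\tau\lambda\rho^2M^2} + \frac{4\gamma c}{\lambda\tau M^2} \Big)\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\\
&+ \frac{\sqrt\gamma}{M}\Big(\int \frac{1}{N}|x|^2 f\,\mu(dx)\Big)^{\frac12}\bigg[ \Big(1 + \sqrt{\frac{c}{\tau}} + \frac{\sqrt{2c\gamma}}{M}\Big)\Big(\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\Big)^{\frac12}\\
&\qquad + \Big(1 + \sqrt{\frac{c}{\tau}}\Big)\big(\bar A\nabla_Y\bar H(\eta)\cdot\nabla_Y\bar H(\eta)\big)^{\frac12} \bigg].
\end{aligned}
$$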

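Proof of Theorem 8.2.1. Denote by $S(t)$ the right hand side of (8.35). Set $D = \frac{8c\Lambda^2}{\lambda}$. For any $0 < t \le T$, we have

$$
\begin{aligned}
\frac{d}{dt}\big(e^{-Dt}\Theta(t)\big) + e^{-DT}\frac{\lambda}{8}\int |y-\eta|^2\,\bar f\,\bar\mu(dy) &\le \frac{d}{dt}\big(e^{-Dt}\Theta(t)\big) + e^{-Dt}\frac{\lambda}{8}\int |y-\eta|^2\,\bar f\,\bar\mu(dy)\\
&\le e^{-Dt}S(t) \le S(t).
\end{aligned}
\tag{8.46}
$$

Integrating (8.46) with respect to time, for any $0 < t \le T$, we have

$$
\begin{aligned}
e^{-DT}\Theta(t) + \frac{\lambda}{8}e^{-DT}\int_0^T\!\!\int_Y |y-\eta|^2\,\bar f\,\bar\mu(dy)\,dt &\le e^{-Dt}\Theta(t) + \frac{\lambda}{8}e^{-DT}\int_0^T\!\!\int_Y |y-\eta|^2\,\bar f\,\bar\mu(dy)\,dt\\
&\le \Theta(0) + \int_0^T S(t)\,dt.
\end{aligned}
\tag{8.47}
$$

It follows that for any $T > 0$

$$\max\bigg\{ \sup_{t\in(0,T)}\Theta(t),\ \frac{\lambda}{8}\int_0^T\!\!\int_Y |y-\eta|^2\,\bar f\,\bar\mu(dy)\,dt \bigg\} \le e^{DT}\Big( \Theta(0) + \int_0^T S(t)\,dt \Big). \tag{8.48}$$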

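It remains to take care of each term in the right hand side of (8.48). Let $a, b > 0$ be two constants. Then

$$\int_0^T\!\!\int \frac{1}{N}|x|^2 f(t,x)\,\mu(dx)\,dt \overset{(8.32)}{\le} 2\Big(\alpha + \frac{2C_1}{\rho}\Big)T;$$

$$\int_0^T\!\!\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\,dt \overset{(8.30)}{\le} C_1;$$

$$
\begin{aligned}
&\int_0^T \Big(\int \frac{1}{N}|x|^2 f\,\mu(dx)\Big)^{\frac12}\bigg( a\Big(\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\Big)^{\frac12} + b\big(\bar A\nabla_Y\bar H(\eta)\cdot\nabla_Y\bar H(\eta)\big)^{\frac12} \bigg)\,dt\\
&\le \Big(\int_0^T\!\!\int \frac{1}{N}|x|^2 f\,\mu(dx)\,dt\Big)^{\frac12}\times\bigg( a\Big(\int_0^T\!\!\int \frac{1}{Nf}\,\nabla f\cdot A\nabla f\,\mu(dx)\,dt\Big)^{\frac12} + b\Big(\int_0^T \bar A\nabla_Y\bar H(\eta)\cdot\nabla_Y\bar H(\eta)\,dt\Big)^{\frac12} \bigg)\\
&\le \sqrt{2T}\Big(\alpha + \frac{2C_1}{\rho}\Big)^{\frac12}\bigg( a\sqrt{C_1} + b\Big( 2\big(\bar H(\eta_0) - \bar H(\eta_T)\big) + CT\big(1 + e^{CT}\bar H(\eta_0)\big) \Big)^{\frac12} \bigg).
\end{aligned}
$$

Substituting these estimates into (8.48) concludes the proof of Theorem 8.2.1.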

8.3.2 Sketch of proof of Theorem 8.2.5

In this section, we give the main arguments of the proof of Theorem 8.2.5, which exactly follows the method of [Fat13].

• First, we decompose the relative entropy with respect to the local Gibbs state into a macroscopic component and a fluctuations component. Since $G(x)$ only depends on the macroscopic profile $y = Px$, we have

$$\operatorname{Ent}_{G\mu}(f\mu) = \operatorname{Ent}_{G\bar\mu}(\bar f\bar\mu) + \int_Y \operatorname{Ent}_{\mu(dx|y)}(f\mu)\,G(y)\,\bar\mu(dy).$$

• The fluctuations component $\int_0^T\!\int_Y \operatorname{Ent}_{\mu(dx|y)}(f\mu)\,G(y)\,\bar\mu(dy)\,dt$ can be bounded using the logarithmic Sobolev inequality for $\mu(dx|y)$, assumption (vi) and the bound on the microscopic entropy production of Lemma 8.3.3.

• For the macroscopic component, since $G\bar\mu$ is log-concave, we can use the HWI inequality of [OV00], which states that

$$\operatorname{Ent}_{G\bar\mu}(\bar f\bar\mu) \le W_2(\bar f\bar\mu, G\bar\mu)\sqrt{I_{G\bar\mu}(\bar f\bar\mu)},$$

where the Wasserstein distance $W_2$ is taken with respect to the norm $|\cdot|_Y$, and $I$ is the Fisher information

$$I_{G\bar\mu}(\bar f\bar\mu) := \int \frac{|\nabla(\bar f/G)|^2}{\bar f/G}\,G\,d\bar\mu.$$

As a consequence, to obtain convergence in relative entropy, we only require convergence in Wasserstein distance and a bound on the Fisher information.

• We already have a bound on $\int_0^T W_2(\bar f\bar\mu, \delta_{\eta(t)})^2\,dt$ from Theorem 8.2.1. Moreover,

$$W_2(G\bar\mu, \delta_\eta)^2 \le \frac{M}{\lambda N}$$

by Proposition 4.1 of [Fat13]. A bound on $\int_0^T W_2(\bar f\bar\mu, G\bar\mu)\,dt$ immediately follows from the triangle inequality.

• Finally, the time-integral of the Fisher information can be bounded using the bounds on the entropy production of Lemma 8.3.3. This concludes the proof of (a).

• (b) can be deduced from (a) using elementary inequalities and the bound

$$\bigg| \frac{1}{N}\int \Phi(G_\eta)\,d\mu - \bar H(\eta) \bigg| \le \frac{M-1}{2N}\max\bigg( \bigg|\log\Big(\frac{\Gamma(Y,|\cdot|_Y)^{2/(M-1)}}{\Lambda N}\Big)\bigg|,\ \bigg|\log\Big(\frac{\Gamma(Y,|\cdot|_Y)^{2/(M-1)}}{\lambda N}\Big)\bigg| \bigg) + \sqrt{\frac{M}{\lambda N}}\,|\nabla\bar H(\eta)|_Y,$$

which was proven in Proposition 4.1 of [Fat13].

8.4 Application part

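In this section, we prove Theorem 8.2.6. First, we give a precise definition of the notion of weak solution to the limiting equation (8.14).

Definition 8.4.1. $\zeta = \zeta(t,\theta)$ is called a weak solution of (8.14) on $[0,T]\times\mathbb{T}^1$ if

$$\zeta \in L^\infty(L^2), \qquad \frac{\partial\zeta}{\partial t} \in L^2(H^{-1}), \qquad \varphi'(\zeta) \in L^2(L^2), \tag{8.49}$$

and

$$\Big\langle g, \frac{\partial\zeta}{\partial t} \Big\rangle_{H^{-1}} = -\int_{\mathbb{T}^1} g\,\varphi'(\zeta)\,d\theta + \int_{\mathbb{T}^1} G\,\varphi'(\zeta)\,d\theta, \quad \text{for all } g \in L^2(\mathbb{T}^1), \text{ for almost every } t \in [0,T], \tag{8.50}$$

where $G$ is the (unique up to a set of Lebesgue measure 0) function on the torus such that $\int_{\mathbb{T}^1} G\,d\theta = 0$ and $G' = g$.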

As in Corollary 8.2.3, consider a sequence $\{M_\ell, N_\ell\}_{\ell=1}^\infty$ such that

$$M_\ell \uparrow \infty; \qquad N_\ell \uparrow \infty; \qquad \frac{N_\ell}{M_\ell} \uparrow \infty.$$

Let $\bar\eta^\ell_0$ be a step-function approximation of $\zeta_0$, such that

$$\|\bar\eta^\ell_0 - \zeta_0\|_{L^2} \xrightarrow[\ell\uparrow\infty]{} 0. \tag{8.51}$$

Consider $\eta^\ell$ the solutions to

$$\frac{d\eta^\ell}{dt} = -(\bar A + J)\nabla_Y\bar H(\eta^\ell), \qquad \eta^\ell(0) = \eta^\ell_0.$$

To obtain Theorem 8.2.6 from Corollary 8.2.3, we shall need to study the convergence of the sequence $\bar\eta^\ell$. It is given by the following result.

Proposition 8.4.2. With the notations above, the sequence of step functions $\bar\eta^\ell$ converges strongly in $L^\infty(H^{-1})$ to the unique weak solution of (8.14) with initial condition $\zeta_0$.

The key estimate which will allow us to pass to the limit is the fact that, when $N$ goes to infinity, the Euclidean product associated to $A^{-1}$ behaves like the $H^{-1}$ norm. This is the content of the following lemma:

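Lemma 8.4.3. There exists $C < +\infty$ such that, for any $x \in X$, if $\bar x$ is the associated step function, then

$$\frac{1}{C}\|\bar x\|^2_{H^{-1}} \le \frac{1}{N}\langle A^{-1}x, x\rangle \le C\|\bar x\|^2_{H^{-1}}.$$

Moreover, if $\bar x$ is bounded in $L^2$, then

$$\bigg| \|\bar x\|^2_{H^{-1}} - \frac{1}{N}\langle A^{-1}x, x\rangle \bigg| \le \frac{C}{N}.$$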

These estimates have been proven in section 6.3 of [GOVW09].
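The comparison in Lemma 8.4.3 can be observed numerically on a smooth profile. The following sketch is an illustration only: it assumes that $A$ is $N^2$ times the periodic discrete Laplacian, it inverts $A$ on mean-zero vectors by adding the projection onto constants, and it computes the $H^{-1}$ norm of the step function exactly from its piecewise linear primitive. The function names are hypothetical.

```python
import numpy as np

def discrete_quadratic_form(x):
    """(1/N) <A^{-1} x, x> for a mean-zero vector x, with A the assumed N^2 periodic Laplacian."""
    N = len(x)
    S = np.roll(np.eye(N), 1, axis=1)
    A = N**2 * (2 * np.eye(N) - S - S.T)
    # A vanishes on constants. Adding a multiple of the projection onto constants
    # makes the matrix invertible without changing its action on mean-zero vectors.
    regularised = A + N**2 * np.ones((N, N)) / N
    return x @ np.linalg.solve(regularised, x) / N

def h_minus_one_sq(x):
    """Squared H^{-1} norm of the step function taking the value x_i on [(i-1)/N, i/N)."""
    N = len(x)
    g = np.concatenate(([0.0], np.cumsum(x))) / N        # nodal values of a primitive
    left, right = g[:-1], g[1:]
    integral_g = np.sum(left + right) / (2 * N)          # exact for a piecewise linear function
    integral_g2 = np.sum(left**2 + left * right + right**2) / (3 * N)
    return integral_g2 - integral_g**2                   # integral of (g - mean)^2

N = 200
theta_mid = (np.arange(N) + 0.5) / N
x = np.cos(2 * np.pi * theta_mid)
x -= x.mean()
gap = abs(h_minus_one_sq(x) - discrete_quadratic_form(x))
print(h_minus_one_sq(x), discrete_quadratic_form(x), gap)
```

For this profile both quantities should be close to $1/(8\pi^2)$, the squared $H^{-1}$ norm of $\cos(2\pi\theta)$.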

We delay the proof of Proposition 8.4.2, and first prove Theorem 8.2.6.

Proof of Theorem 8.2.6. Our aim is to apply Corollary 8.2.3. To do this, we need to check that assumptions (i) to (x) hold with uniform constants. Assumptions (i) to (vii) have been checked in [GOVW09], and assumption (viii) in [Fat13]. Assumption (x) can be immediately checked by the direct computation of $JA$ and $AJ$. Finally, it is easy to see that for any $x \in X$, we have

$$
\begin{aligned}
\langle -J^2x, x\rangle = |Jx|^2 &= \frac{N^2}{4}\sum_{i=1}^{N}(x_{i+1} - x_{i-1})^2\\
&\le \frac{N^2}{4}\sum_{i=1}^{N}\Big( 2(x_{i+1} - x_i)^2 + 2(x_i - x_{i-1})^2 \Big)\\
&= N^2\sum_{i=1}^{N}(x_{i+1} - x_i)^2\\
&= \langle Ax, x\rangle,
\end{aligned}
\tag{8.52}
$$

and therefore assumption (ix) holds with $c = 1$. Applying Corollary 8.2.3, we get

$$\lim_{\ell\uparrow\infty}\ \sup_{0\le t\le T}\int \big\langle (x - NP^t\eta^\ell(t)),\ A^{-1}(x - NP^t\eta^\ell(t)) \big\rangle\,f(t,x)\,\mu(dx) = 0.$$

By Lemma 8.4.3, this implies

$$\lim_{\ell\uparrow\infty}\ \sup_{0\le t\le T}\int \|\bar x - \bar\eta^\ell(t)\|^2_{H^{-1}}\,f(t,x)\,\mu(dx) = 0.$$

Applying Proposition 8.4.2 and using the triangle inequality then concludes the proof.
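The inequality (8.52) used in the proof above is easy to test on a random vector. This is a small sketch, assuming as before that $A$ is $N^2$ times the periodic discrete Laplacian and that $J$ has the form (8.11); it also evaluates the commutator of $A$ and $J$, since the proof refers to a direct computation of $JA$ and $AJ$.

```python
import numpy as np

N = 32
S = np.roll(np.eye(N), 1, axis=1)            # (S x)_i = x_{i+1} with periodic indices
A = N**2 * (2 * np.eye(N) - S - S.T)         # assumed form of A
J = 0.5 * N * (S - S.T)                      # the matrix (8.11)

rng = np.random.default_rng(1)
x = rng.standard_normal(N)

lhs = x @ (-(J @ J) @ x)                     # <-J^2 x, x>
norm_Jx_sq = np.sum((J @ x) ** 2)            # |J x|^2
rhs = x @ A @ x                              # <A x, x>
commutator_norm = np.abs(A @ J - J @ A).max()

print(lhs <= rhs, commutator_norm)
```

Both matrices are polynomials in the shift $S$ and its transpose, which is why they commute in this sketch.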

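We now turn to the proof of Proposition 8.4.2. It is based on the following six lemmas, and closely follows the method of [GOVW09], with additional arguments to take into account the extra first-order term.

Lemma 8.4.4. Assume $\bar H$ is convex. Then $\eta$ satisfies (8.5) with initial condition $\eta(0) = \eta_0$ if and only if

$$2\int_0^T \bar H(\eta)\beta(t)\,dt \le \int_0^T \big[ \bar H(\eta + g) + \bar H(\eta - PA^{-1}NJP^tg) \big]\beta(t)\,dt - \int_0^T \langle g, (\bar A)^{-1}\dot\eta\rangle_Y\,\beta(t)\,dt, \tag{8.53}$$

for all $g \in Y$ and smooth $\beta : [0,T] \to [0,\infty)$. Similarly, assume that $\varphi$ is convex. Then $\zeta$ satisfies (8.50) if and only if

$$
\begin{aligned}
&2\int_0^T\!\!\int_{\mathbb{T}^1} \varphi(\zeta(t,\theta))\beta(t)\,d\theta\,dt\\
&\le \int_0^T\!\!\int_{\mathbb{T}^1} \big[ \varphi(\zeta(t,\theta) + g(\theta)) + \varphi(\zeta(t,\theta) - G(\theta)) \big]\beta(t)\,d\theta\,dt - \int_0^T \Big\langle g(\cdot), \frac{\partial\zeta}{\partial t}(t,\cdot) \Big\rangle_{H^{-1}}\beta(t)\,dt,
\end{aligned}
\tag{8.54}
$$

for all $g \in L^2(\mathbb{T}^1)$ and smooth $\beta : [0,T] \to [0,\infty)$, where $G$ is the (unique up to a set of Lebesgue measure 0) function on the torus such that $\int_{\mathbb{T}^1} G\,d\theta = 0$ and $G' = g$.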

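Proof. The proof of this lemma is adapted from that of Lemma 36 in [GOVW09]. We show that (8.5) is equivalent to (8.53). The equivalence of (8.50) and (8.54) follows analogously. The weak form of (8.5) is given by

$$\int_0^T \langle g, (\bar A)^{-1}\dot\eta\rangle_Y\,\beta(t)\,dt = \int_0^T \big[ \langle g, \nabla_Y\bar H(\eta)\rangle_Y - \langle PA^{-1}NJP^tg, \nabla_Y\bar H(\eta)\rangle_Y \big]\beta(t)\,dt, \tag{8.55}$$

for all $g \in Y$ and smooth $\beta : [0,T] \to [0,\infty)$. We now show that (8.55) implies (8.53). Since $\bar H$ is convex, we have

$$
\begin{aligned}
\langle g - PA^{-1}NJP^tg, \nabla_Y\bar H(\eta)\rangle_Y &\le \big(\bar H(\eta + g) - \bar H(\eta)\big) + \big(\bar H(\eta - PA^{-1}NJP^tg) - \bar H(\eta)\big)\\
&= -2\bar H(\eta) + \bar H(\eta + g) + \bar H(\eta - PA^{-1}NJP^tg).
\end{aligned}
\tag{8.56}
$$

Substituting (8.56) into (8.55), we obtain (8.53):

$$\int_0^T \langle g, (\bar A)^{-1}\dot\eta\rangle_Y\,\beta(t)\,dt \le -2\int_0^T \bar H(\eta)\beta(t)\,dt + \int_0^T \big[ \bar H(\eta + g) + \bar H(\eta - PA^{-1}NJP^tg) \big]\beta(t)\,dt. \tag{8.57}$$

Next we show that (8.53) implies (8.55). Taking $g = \varepsilon\tilde g$ in (8.53), for some $\varepsilon > 0$ and $\tilde g \in Y$, we get

$$\int_0^T \langle \tilde g, (\bar A)^{-1}\dot\eta\rangle_Y\,\beta(t)\,dt \le \int_0^T \bigg[ \frac{\bar H(\eta + \varepsilon\tilde g) - \bar H(\eta)}{\varepsilon} + \frac{\bar H(\eta - \varepsilon PA^{-1}NJP^t\tilde g) - \bar H(\eta)}{\varepsilon} \bigg]\beta(t)\,dt.$$

By passing to the limit $\varepsilon \to 0$, we get

$$\int_0^T \langle \tilde g, (\bar A)^{-1}\dot\eta\rangle_Y\,\beta(t)\,dt \le \int_0^T \langle \tilde g - PA^{-1}NJP^t\tilde g, \nabla_Y\bar H(\eta)\rangle_Y\,\beta(t)\,dt.$$

Similarly, by taking $g = -\varepsilon\tilde g$, we obtain the opposite inequality

$$\int_0^T \langle \tilde g, (\bar A)^{-1}\dot\eta\rangle_Y\,\beta(t)\,dt \ge \int_0^T \langle \tilde g - PA^{-1}NJP^t\tilde g, \nabla_Y\bar H(\eta)\rangle_Y\,\beta(t)\,dt.$$

Thus (8.55) is proven.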

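Lemma 8.4.5. Let $\{\eta^\ell\}_{\ell=1}^\infty$ be a sequence of solutions of (8.5) with initial data $\eta^\ell_0$ satisfying $\|\bar\eta^\ell_0\|_{L^2} \le C$. There exists a constant $C$ independent of $\ell$ such that

$$\int_0^T \Big\langle \frac{d\eta^\ell}{dt}(t), (\bar A)^{-1}\frac{d\eta^\ell}{dt}(t) \Big\rangle\,dt \le C, \tag{8.58}$$

$$\sup_{t\in[0,T]} \langle \eta^\ell(t), \eta^\ell(t)\rangle_Y \le C. \tag{8.59}$$

As a consequence, there is a subsequence of the sequence of the associated step functions $\bar\eta^\ell$ and a function $\eta_*$ such that

$$\bar\eta^\ell \rightharpoonup \eta_* \quad \text{weak-* in } L^\infty(L^2) = (L^1(L^2))^*.$$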

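Proof. According to the proof of (8.31), we have

$$\bar H(\eta^\ell(t)) \le e^{C(T+1)}\bar H(\eta^\ell_0) \quad \text{for all } t \in [0,T]. \tag{8.60}$$

Since $\bar H$ is strictly convex, we obtain

$$\langle \eta^\ell(t), \eta^\ell(t)\rangle_Y \le C\big(\bar H(\eta^\ell(t)) + 1\big) \le Ce^{C(T+1)}\bar H(\eta^\ell_0) \le C,$$

which is (8.59). Now we establish (8.58). From (8.5), we have

$$
\begin{aligned}
\langle \dot\eta^\ell(t), (\bar A)^{-1}\dot\eta^\ell(t)\rangle &= \langle \bar A(I + PJA^{-1}NP^t)\nabla_Y\bar H(\eta^\ell(t)),\ (I + PJA^{-1}NP^t)\nabla_Y\bar H(\eta^\ell(t))\rangle\\
&\le 2\Big( \langle \bar A\nabla_Y\bar H(\eta^\ell(t)), \nabla_Y\bar H(\eta^\ell(t))\rangle + \langle \bar A\,PJA^{-1}NP^t\nabla_Y\bar H(\eta^\ell(t)),\ PJA^{-1}NP^t\nabla_Y\bar H(\eta^\ell(t))\rangle \Big)\\
&\overset{(8.27)}{\le} 2\Big( \langle \bar A\nabla_Y\bar H(\eta^\ell(t)), \nabla_Y\bar H(\eta^\ell(t))\rangle + c\,|\nabla_Y\bar H(\eta^\ell(t))|^2 \Big)\\
&\overset{(iii)}{\le} 2\Big( \langle \bar A\nabla_Y\bar H(\eta^\ell(t)), \nabla_Y\bar H(\eta^\ell(t))\rangle + C\big(\bar H(\eta^\ell(t)) + 1\big) \Big).
\end{aligned}
$$

Therefore,

$$\int_0^T \langle \dot\eta^\ell(t), (\bar A)^{-1}\dot\eta^\ell(t)\rangle\,dt \le 2\int_0^T \Big( \langle \bar A\nabla_Y\bar H(\eta^\ell(t)), \nabla_Y\bar H(\eta^\ell(t))\rangle + C\big(\bar H(\eta^\ell(t)) + 1\big) \Big)\,dt \overset{(8.31),(8.60)}{\le} C,$$

which is (8.58).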

Lemma 8.4.6. Let $\{\eta^\ell\}_{\ell=1}^\infty$ be a sequence of solutions of (8.5) satisfying the assumptions of Lemma 8.4.5. Take any subsequence whose associated step functions converge weak-* in $(L^1(L^2))^*$ to a limit $\eta_*$. Then on any bounded time interval, we have

$$\eta_* \in L^\infty(L^2), \qquad \frac{\partial\eta_*}{\partial t} \in L^2(H^{-1}), \qquad \varphi'(\eta_*) \in L^2(L^2). \tag{8.61}$$

Proof. Given the estimates in Lemma 8.4.5, the proof of this lemma is the same as that of Lemma 35 in [GOVW09]; hence we omit it here.

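Lemma 8.4.7. If $\bar g^\ell \to g$ strongly in $H^{-1}(\mathbb{T})$, then $-\overline{PA^{-1}JNP^tg^\ell} \to G$ strongly in $L^2(\mathbb{T})$, where $G$ is the primitive of $g$.

Proof. Set

$$D = N\begin{pmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1\\ -1 & & & & 1 \end{pmatrix}, \tag{8.62}$$

then we can write

$$A = DD^T, \qquad J = \frac12(D^T - D).$$

Hence

$$JA^{-1} = \frac12(D^T - D)(D^TD)^{-1} = \frac12(D^T - D)D^{-1}(D^T)^{-1} = \frac12\big(D^{-1} - (D^T)^{-1}\big). \tag{8.63}$$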

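The inverses of $D$ and $D^T$ can be computed explicitly:

$$D^{-1} = \frac{1}{2N}\begin{pmatrix} 1 & 1 & \cdots & 1\\ -1 & 1 & \cdots & 1\\ \vdots & \ddots & \ddots & \vdots\\ -1 & \cdots & -1 & 1 \end{pmatrix}, \qquad (D^T)^{-1} = (D^{-1})^T.$$

So we obtain

$$D^{-1} - (D^T)^{-1} = \frac{1}{N}\begin{pmatrix} 0 & 1 & \cdots & 1\\ -1 & 0 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 1\\ -1 & \cdots & -1 & 0 \end{pmatrix}. \tag{8.64}$$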

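Let $\xi = (\xi_1, \dots, \xi_M)^T \in Y = \mathbb{R}^M$ be given. We now compute $PA^{-1}JNP^t\xi$ explicitly in three steps.

First, by definition of $P^t$, we have

$$NP^t\xi = (\underbrace{\xi_1, \dots, \xi_1}_{K}, \underbrace{\xi_2, \dots, \xi_2}_{K}, \dots, \underbrace{\xi_M, \dots, \xi_M}_{K})^T \in \mathbb{R}^N = \mathbb{R}^{KM}. \tag{8.65}$$

Second, from (8.63), (8.64) and (8.65), we have

$$A^{-1}JNP^t\xi = \frac{1}{2N}\begin{pmatrix} K(\xi_1 + \cdots + \xi_M)\\ (K-1)\xi_1 + K(\xi_2 + \cdots + \xi_M)\\ \vdots\\ K(\xi_2 + \cdots + \xi_M)\\ (K-1)\xi_2 + K(\xi_3 + \cdots + \xi_M)\\ \vdots\\ K(\xi_3 + \cdots + \xi_M)\\ \vdots\\ \xi_M \end{pmatrix} - \frac{1}{2N}\begin{pmatrix} \xi_1\\ 2\xi_1\\ \vdots\\ K\xi_1\\ K\xi_1 + \xi_2\\ \vdots\\ K(\xi_1 + \cdots + \xi_M) \end{pmatrix}.$$

Therefore, by definition of $P$,

$$PA^{-1}JNP^t\xi = \frac{1}{2M}\begin{pmatrix} \xi_1 + \xi_2 + \cdots + \xi_M\\ \xi_2 + \cdots + \xi_M\\ \vdots\\ \xi_M \end{pmatrix} - \frac{1}{2M}\begin{pmatrix} \xi_1\\ \xi_1 + \xi_2\\ \vdots\\ \xi_1 + \xi_2 + \cdots + \xi_M \end{pmatrix} = -\frac{1}{2M}\begin{pmatrix} \xi_1\\ 2\xi_1 + \xi_2\\ \vdots\\ 2(\xi_1 + \cdots + \xi_{M-1}) + \xi_M \end{pmatrix}.$$

This implies that $-PA^{-1}JNP^t\xi = \Upsilon\xi$, where $\Upsilon\xi$ is the primitive of $\xi$. The assertion then follows since

$$\big( \bar g^\ell \to g \text{ strongly in } H^{-1}(\mathbb{T}) \big) \iff \big( \Upsilon g^\ell \to \Upsilon g \equiv G \text{ strongly in } L^2(\mathbb{T}) \big).$$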
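The action of $-PA^{-1}JNP^t$ as a discrete primitive can be checked numerically. The sketch below is an illustration with several assumptions: $A$ is taken to be $N^2$ times the periodic discrete Laplacian, the inverse of $A$ is understood on mean-zero vectors (implemented by adding the projection onto constants before solving), and the test profile is a cosine sampled at box centres. With this convention the result is normalised to have mean zero. The function name `discrete_primitive` is hypothetical.

```python
import numpy as np

def discrete_primitive(xi, K):
    """Return u = -A^{-1} J N P^t xi on the fine grid and its block averages P u."""
    M = len(xi)
    N = K * M
    S = np.roll(np.eye(N), 1, axis=1)                 # (S x)_i = x_{i+1} with periodic indices
    A = N**2 * (2 * np.eye(N) - S - S.T)              # assumed form of A
    J = 0.5 * N * (S - S.T)                           # the matrix (8.11)
    P = np.kron(np.eye(M), np.ones((1, K))) / K       # block averages over boxes of size K
    v = np.repeat(xi, K)                              # N P^t xi, cf. (8.65)
    regularised = A + N**2 * np.ones((N, N)) / N      # invertible, equals A on mean-zero vectors
    u = -np.linalg.solve(regularised, J @ v)
    return u, P @ u

M, K = 20, 10
centres = (np.arange(M) + 0.5) / M
xi = np.cos(2 * np.pi * centres)                      # mean-zero macroscopic profile
u, macro = discrete_primitive(xi, K)
target = np.sin(2 * np.pi * centres) / (2 * np.pi)    # mean-zero primitive of cos(2*pi*theta)
print(np.max(np.abs(macro - target)))
```

On the fine grid, `u` coincides with the primitive of the step function evaluated at cell midpoints, shifted to have mean zero, which is the discrete counterpart of the map $\xi \mapsto \Upsilon\xi$.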

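Lemma 8.4.8. Suppose that the sequence $\eta^\ell$ satisfies (8.58), (8.59) and (8.53), and consider a subsequence such that

$$\bar\eta^\ell \rightharpoonup \eta_* \quad \text{weak-* in } L^\infty(L^2) = (L^1(L^2))^*$$

holds. Let $\xi^\ell = \pi_\ell(\xi + \bar\eta^\ell) - \eta^\ell$, where $\xi$ is an arbitrary $L^2$ function and $\pi_\ell$ is the $L^2$-projection onto elements of $Y$. Let $\Xi$ be the primitive with average 0 of $\xi$. Then we have

(i)
$$\liminf_{\ell} \int_0^T \bar H(\eta^\ell(t))\beta(t)\,dt \ge \int_0^T\!\!\int_{\mathbb{T}} \varphi(\eta_*(t,\theta))\beta(t)\,d\theta\,dt;$$

(ii)
$$\lim_{\ell} \int_0^T \bar H(\eta^\ell(t) + \xi^\ell(t))\beta(t)\,dt = \int_0^T\!\!\int_{\mathbb{T}} \varphi(\eta_*(t,\theta) + \xi(\theta))\beta(t)\,d\theta\,dt;$$

(iii)
$$\lim_{\ell} \int_0^T \bar H(\eta^\ell(t) - PA^{-1}JNP^t\xi^\ell(t))\beta(t)\,dt = \int_0^T\!\!\int_{\mathbb{T}} \varphi(\eta_*(t,\theta) - \Xi(\theta))\beta(t)\,d\theta\,dt;$$

(iv)
$$\lim_{\ell} \int_0^T \langle \xi^\ell(t), \bar A^{-1}\dot\eta^\ell(t)\rangle_Y\,\beta(t)\,dt = \int_0^T \Big\langle \xi(\theta), \frac{\partial\eta_*}{\partial t}(t,\theta) \Big\rangle_{H^{-1}}\beta(t)\,dt.$$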

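Proof. (i), (ii) and (iv) have already been proven in Lemma 37 of [GOVW09], so we only have to prove (iii).

Since $\bar\eta^\ell$ converges to $\eta_*$ and $PA^{-1}JNP^t\xi^\ell(t)$ converges to $\Xi$, by weak lower semicontinuity and the uniform convergence of $\psi_K$ to $\varphi$ we immediately get

$$\liminf_{\ell} \int_0^T \bar H(\eta^\ell(t) - PA^{-1}JNP^t\xi^\ell(t))\beta(t)\,dt \ge \int_0^T\!\!\int_{\mathbb{T}} \varphi(\eta_*(t,\theta) - \Xi(\theta))\beta(t)\,d\theta\,dt,$$

so we only need to prove the associated upper bound. Let $g^\ell(t)$ be a sequence of elements of $Y$ such that $\bar g^\ell$ strongly converges in $L^\infty(L^2)$ to $\eta_* - \Xi$. Since we then have

$$\int_0^T \bar H(g^\ell(t))\beta(t)\,dt \longrightarrow \int_0^T\!\!\int_{\mathbb{T}} \varphi(\eta_*(t,\theta) - \Xi(\theta))\beta(t)\,d\theta\,dt,$$

we only need to show that

$$\limsup_{\ell} \int_0^T \bar H(\eta^\ell(t) - PA^{-1}JNP^t\xi^\ell(t))\beta(t)\,dt - \int_0^T \bar H(g^\ell(t))\beta(t)\,dt \le 0.$$

Let $A_M$ be the discrete Laplacian with scaling factor $M^2$ on $Y$. Since $\psi_K$ is convex, we have

$$
\begin{aligned}
&\bar H(\eta^\ell(t) - PA^{-1}JNP^t\xi^\ell(t)) - \bar H(g^\ell(t)) = \frac{1}{M}\sum_{i=1}^{M} \Big( \psi_K\big(\eta^\ell_i - (PA^{-1}JNP^t\xi^\ell)_i\big) - \psi_K(g^\ell_i) \Big)\\
&\le \frac{1}{M}\sum_{i=1}^{M} \psi'_K\big(\eta^\ell_i - (PA^{-1}JNP^t\xi^\ell)_i\big)\big(\eta^\ell_i - (PA^{-1}JNP^t\xi^\ell)_i - g^\ell_i\big)\\
&= \big\langle \nabla\bar H(\eta^\ell - PA^{-1}JNP^t\xi^\ell),\ (\eta^\ell - PA^{-1}JNP^t\xi^\ell - g^\ell) \big\rangle_Y\\
&\le \big\langle A_M\nabla\bar H(\eta^\ell - PA^{-1}JNP^t\xi^\ell),\ \nabla\bar H(\eta^\ell - PA^{-1}JNP^t\xi^\ell) \big\rangle_Y^{1/2}\\
&\qquad\times \big\langle A_M^{-1}(\eta^\ell - PA^{-1}JNP^t\xi^\ell - g^\ell),\ (\eta^\ell - PA^{-1}JNP^t\xi^\ell - g^\ell) \big\rangle_Y^{1/2}.
\end{aligned}
$$

Since $\langle A_M^{-1}\cdot, \cdot\rangle_Y$ behaves like the $H^{-1}$ norm, the fact that $\eta^\ell - PA^{-1}JNP^t\xi^\ell$ and $g^\ell$ converge to the same limit in $L^\infty(H^{-1})$ implies that

$$\big\langle A_M^{-1}(\eta^\ell - PA^{-1}JNP^t\xi^\ell - g^\ell),\ (\eta^\ell - PA^{-1}JNP^t\xi^\ell - g^\ell) \big\rangle_Y \longrightarrow 0,$$

and therefore it will be enough to show that

$$\int_0^T \big\langle A_M\nabla\bar H(\eta^\ell - PA^{-1}JNP^t\xi^\ell),\ \nabla\bar H(\eta^\ell - PA^{-1}JNP^t\xi^\ell) \big\rangle_Y\,dt < C.$$

Since under our assumptions $\psi'_K$ is bi-Lipschitz, we have

$$
\begin{aligned}
&\big\langle A_M\nabla\bar H(\eta^\ell - PA^{-1}JNP^t\xi^\ell),\ \nabla\bar H(\eta^\ell - PA^{-1}JNP^t\xi^\ell) \big\rangle_Y\\
&= M\sum_{i=1}^{M} \Big( \psi'_K\big(\eta^\ell_{i+1} - (PA^{-1}JNP^t\xi^\ell)_{i+1}\big) - \psi'_K\big(\eta^\ell_i - (PA^{-1}JNP^t\xi^\ell)_i\big) \Big)^2\\
&\le CM\sum_{i=1}^{M} \Big( \eta^\ell_{i+1} - (PA^{-1}JNP^t\xi^\ell)_{i+1} - \big(\eta^\ell_i - (PA^{-1}JNP^t\xi^\ell)_i\big) \Big)^2\\
&\le CM\sum_{i=1}^{M} \Big( (\eta^\ell_{i+1} - \eta^\ell_i)^2 + \big((PA^{-1}JNP^t\xi^\ell)_{i+1} - (PA^{-1}JNP^t\xi^\ell)_i\big)^2 \Big)\\
&\le CM\sum_{i=1}^{M} \big(\psi'_K(\eta^\ell_{i+1}) - \psi'_K(\eta^\ell_i)\big)^2 + \frac{C}{M}\sum_{i=1}^{M} (\xi^\ell_{i+1} - \xi^\ell_i)^2\\
&\le C\langle A_M\nabla\bar H(\eta^\ell), \nabla\bar H(\eta^\ell)\rangle + C\|\xi^\ell\|^2_{L^2}.
\end{aligned}
$$

Since $\xi^\ell$ converges in $L^2$, $\|\xi^\ell\|^2_{L^2}$ is bounded. To conclude, we then only require (8.58) and the fact that

$$\langle A_My, y\rangle \le C\langle \bar Ay, y\rangle \qquad \forall y \in Y. \tag{8.66}$$

This statement is equivalent to bounding $A_M^{-1}$ from below by $\bar A^{-1}$. This does hold, since we have

$$
\begin{aligned}
\langle \bar A^{-1}y, y\rangle_Y &= \frac{1}{N}\langle A^{-1}NP^ty, NP^ty\rangle_X\\
&\le C\|\overline{NP^ty}\|^2_{H^{-1}}\\
&\le C\|\bar y\|^2_{H^{-1}}\\
&\le C\langle A_M^{-1}y, y\rangle_Y,
\end{aligned}
$$

which concludes the proof.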

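Finally, we need to prove uniqueness of solutions to the limiting PDE:

Lemma 8.4.9. Given an initial condition $\zeta_0$, there is at most one solution to (8.14).

Proof. Let $\zeta_1$ and $\zeta_2$ be two solutions of (8.14) with the same initial condition. Let $F(t) := 2^{-1}\|\zeta_1(t,\cdot) - \zeta_2(t,\cdot)\|^2_{H^{-1}}$, and let $g_1$ and $g_2$ be mean-zero primitives (in space) of $\zeta_1$ and $\zeta_2$. Then, for any $\lambda > 0$,

$$
\begin{aligned}
F'(t) &= -\int_{\mathbb{T}} (\varphi'(\zeta_1) - \varphi'(\zeta_2))(\zeta_1 - \zeta_2)\,d\theta + \int_{\mathbb{T}} (\varphi'(\zeta_1) - \varphi'(\zeta_2))(g_1 - g_2)\,d\theta\\
&\le -\frac{\inf\varphi''}{2}\int_{\mathbb{T}} (\zeta_1 - \zeta_2)^2\,d\theta + \frac{\lambda}{2}\int_{\mathbb{T}} (\varphi'(\zeta_1) - \varphi'(\zeta_2))^2\,d\theta + \frac{1}{2\lambda}\int (g_1 - g_2)^2\,d\theta\\
&\le -\frac{\inf\varphi''}{2}\int_{\mathbb{T}} (\zeta_1 - \zeta_2)^2\,d\theta + \frac{\lambda\sup\varphi''}{2}\int_{\mathbb{T}} (\zeta_1 - \zeta_2)^2\,d\theta + \frac{1}{\lambda}F(t).
\end{aligned}
$$

Taking $\lambda = \frac{\inf\varphi''}{\sup\varphi''}$, we obtain a differential inequality which, by Gronwall's lemma, implies that $\zeta_1 = \zeta_2$.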

We can now prove Proposition 8.4.2:

Proof of Proposition 8.4.2. According to Lemma 8.4.5, we can consider a subsequence such that

$$\bar\eta^\ell \rightharpoonup \eta_* \quad \text{weak-* in } L^\infty(L^2) = (L^1(L^2))^*$$

and strongly in $L^\infty(H^{-1})$. By Lemma 8.4.6, $\eta_*$ satisfies (8.49). According to Lemma 8.4.4, $\eta^\ell$ satisfies (8.53). Passing to the limit using Lemma 8.4.8, we see that $\eta_*$ satisfies (8.54), and therefore is a weak solution of (8.14).

Since Lemma 8.4.9 guarantees uniqueness of the weak solution, the full sequence $(\bar\eta^\ell)_\ell$ converges to the unique weak solution of (8.14).

Summary

Large deviation and variational approaches to generalized gradient flows

The present thesis deals with the mathematical analysis of partial differential equations that are generalized gradient flows. We use this terminology to indicate that there naturally exists a functional that decreases in time along solutions of these equations, akin to the entropy/free energy for classical gradient flows. Some equations also contain additional conservative effects. Moreover, these equations often depend on a certain small parameter, denoted here by ε.

The thesis consists of two parts. The first part is devoted to understanding the generalized gradient flow structures using the theory of large-deviation principles of underlying microscopic stochastic processes. In the second part of the thesis, we are interested in deriving the limiting systems as ε → 0. This procedure is often known as coarse-graining.

The two parts are connected. Studying and exploiting the relation between generalized gradient flows and large-deviation principles in the first part provides new understanding and techniques for coarse-graining in the second part.

In Chapter 2, we construct approximation schemes for a generalized Kramers equation. The cost functionals in the schemes are inspired by the rate functional in the Freidlin-Wentzell theory of large deviations for a small-noise limit of a perturbed Hamiltonian system.

Chapter 3 provides a microscopic interpretation of the JKO-scheme for the Fokker-Planck equation. We show that the functional Kh is asymptotically equivalent to the conditional rate functional Jh from the large-deviation principle of a drift-diffusion stochastic particle system.

In Chapter 4, we show that the GENERIC structure of the (extended) Vlasov-Fokker-Planck equation can be derived from the (path-wise) rate functional of the large-deviation principle of a collection of weakly interacting inertial Brownian motions. We also suggest a variational formulation for a GENERIC system. This formulation will be used to study coarse-graining in Chapter 7.

Chapter 5 generalizes the result in Chapter 3 to the porous medium equation. We prove that, for the case of q-Gaussians on the real line, the functional derived by the JKO-discretization scheme is asymptotically equivalent to a rate-large-deviation-like functional.



In Chapter 6, we formally derive the thermo-visco-elasticity system as the hydrodynamic limit of a chain of one-dimensional harmonic oscillators.

Chapter 7 focuses on qualitative coarse-graining. We introduce a new method for coarse-graining using the rate functional. We illustrate the method by two examples: the high friction limit of the Kramers equation and the small-noise limit of a perturbed Hamiltonian system.

Finally, Chapter 8 extends the two-scale approach to the hydrodynamic limits for non-reversible dynamics.

Curriculum Vitae

Manh Hong Duong was born on 31/12/1982 in Bac Giang, Viet Nam. He obtained his Bachelor Degree (with distinction) in Informatics and Applied Mathematics in 2006 at the Hanoi University of Science and Technology. From August 2006 to August 2008, he was a researcher at the department of mathematical foundation of computer science of the Hanoi Institute of Mathematics. From August 2008 to August 2010, he studied Industrial and Applied Mathematics in the Erasmus Mundus Master Program at the Eindhoven University of Technology and Kaiserslautern University of Technology. He obtained his MSc. Degree (Cum Laude) from this program in August 2010. He started his PhD project on a Marie-Curie Fellowship funded by the European project Fronts and Interfaces in Science and Technology at the University of Bath from October 2010 till September 2012. He continued in the same program from October 2012 till September 2013, and in a NWO-project from September 2013 to September 2014 at the Eindhoven University of Technology. The results of his PhD research are presented in this dissertation.


List of publications

Journal articles:

1. Manh Hong Duong, Mark A. Peletier, Johannes Zimmer. GENERIC formalism of aVlasov-Fokker-Planck equationand connection to Large Deviation Principle. Nonlin-earity 26, 2951-2971, 2013. 2

2. Manh Hong Duong, Mark A. Peletier, Johannes Zimmer. Conservative-dissipative ap-proximation schemes for a generalized Kramers equation. To appear in MathematicalMethods in the Applied Sciences, 2013.

3. Manh Hong Duong, Vaios Laschos, Michiel Renger. Wassersteingradient flows fromlarge deviations of thermodynamic limits. ESAIM: Control, Optimisation and Calculusof Variations, 19(4), 1166-1188, 2013.

Preprints/Work in progress:

1. Manh Hong Duong. Asymptotic equivalence of the discrete variational functional and a rate-large-deviation-like functional in the Wasserstein gradient flow of the porous medium equation. Submitted, 2013. http://arxiv.org/abs/1307.5184.

2. Manh Hong Duong, Max Fathi. The two-scale approach to hydrodynamic limits for non-reversible dynamics. Submitted, 2014. http://arxiv.org/abs/1404.1971.

3. Manh Hong Duong, Agnes Lamacz, Mark A. Peletier, Upanshu Sharma. Passing to the limits in the Kramers equation using variational methods. In preparation, 2014.

4. Manh Hong Duong, Mark A. Peletier, Johannes Zimmer. Thermo-visco-elasticity equation: microscopic model, large deviation principle and passing to the limits. Work in progress, 2014.

Proceedings:

1. Mark Peletier, Manh Hong Duong, Upanshu Sharma. Coarse-graining and fluctuations: Two birds with one stone. Oberwolfach report of the workshop Material Theories, OWR 2013/59, 2013.

² This paper has been selected by the journal for its 2013 Highlights Collection.


2. Ted van der Aalst, Dee Denteneer, Hanna Doering, Manh Hong Duong, Ross J. Kang, Mike Keane, Janne Kool, Ivan Kryven, Thomas Meyfroyt, Tobias Mueller, Guus Regts, Jakub Tomczyk. Random disc thrower problem. Report in the Study Group Mathematics with Industry, Leiden University, 2013.

3. Vaios Laschos, Manh Hong Duong, Michiel Renger. Wasserstein gradient flows from large deviations of thermodynamic limits. Oberwolfach report of the workshop Interplay of Analysis and Probability in Physics, OWR 2012/06, 2012.

Acknowledgments

During these four years as a PhD student at Bath and Eindhoven, I have met many colleagues and friends. I would like to take this opportunity to thank all the people who contributed to my PhD life.

First of all, I express my sincere gratitude to my supervisors Prof. Mark Peletier and Dr. Johannes Zimmer for their constant help and guidance throughout the course of my PhD. Without their help, this thesis would not have taken its present shape. As a father, I have also learned a lot from them.

I would like to thank my other co-authors: Max Fathi, Dr. Agnes Lamacz, Dr. Vaios Laschos, Dr. Michiel Renger, and Upanshu Sharma. It has been a great opportunity for me to collaborate with all of you. I hope the collaboration will continue.

I thank all committee members for taking the time to judge my thesis.

I am grateful for the many discussions with other colleagues and fellow PhD students at Bath and Eindhoven. Thank you all for making my PhD life enjoyable.

I would like to take this chance to thank the FIRST program and the NWO for supporting my PhD projects. I also thank the many staff members in Human Resources and the secretaries at both the University of Bath and Eindhoven University of Technology for assisting me with administrative tasks.

Last but not least, I would like to thank my beloved family, my wife and my son. They are always a great source of inspiration for me. I dedicate this thesis to you.

Bibliography

[ADPZ11] S. Adams, N. Dirr, M. A. Peletier, and J. Zimmer. From a large-deviations principle to the Wasserstein gradient flow: a new micro-macro passage. Communications in Mathematical Physics, 307:791–815, 2011.

[ADPZ13] S. Adams, N. Dirr, M. A. Peletier, and J. Zimmer. Large deviations and gradient flows. Philosophical Transactions of the Royal Society A, 371:20120341, 2013.

[AG08] L. Ambrosio and W. Gangbo. Hamiltonian ODEs in the Wasserstein space of probability measures. Communications on Pure and Applied Mathematics, 61(1):18–53, 2008.

[AGS08] L. Ambrosio, N. Gigli, and G. Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics. ETH Zürich. Birkhäuser, Basel, 2nd edition, 2008.

[AMP+12] S. Arnrich, A. Mielke, M. A. Peletier, G. Savaré, and M. Veneroni. Passing to the limit in a Wasserstein gradient flow: From diffusion to reaction. Calculus of Variations and Partial Differential Equations, 44:419–454, 2012.

[BB00] J. D. Benamou and Y. Brenier. A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numer. Math., 84(3):375–393, 2000.

[BBO06] G. Basile, C. Bernardin, and S. Olla. Momentum conserving model with anomalous thermal conductivity in low dimensional systems. Phys. Rev. Lett., 96:204303, 2006.

[BD95] F. Bouchut and J. Dolbeault. On long time asymptotics of the Vlasov-Fokker-Planck equation and of the Vlasov-Poisson-Fokker-Planck system with Coulombic and Newtonian potentials. Differential Integral Equations, 8(3):487–514, 1995.

[BDF12] A. Budhiraja, P. Dupuis, and M. Fischer. Large deviation properties of weakly interacting processes via weak convergence methods. The Annals of Probability, 40(1):74–102, 2012.

[Bra02] A. Braides. Γ-convergence for beginners. Oxford University Press, 2002.

[CH94] Z. M. Chen and K. H. Hoffmann. On a one-dimensional nonlinear thermo-viscoelastic model for structural phase transitions in shape memory alloys. J. Differential Equations, 112(2):325–350, 1994.

[Cha01] P. H. Chavanis. Kinetic theory of point vortices: diffusion coefficient and systematic drift. Phys. Rev. E, 64:026309, Jul 2001.

[Cha03] P. H. Chavanis. Generalized thermodynamics and Fokker-Planck equations: Applications to stellar dynamics and two-dimensional turbulence. Physical Review E, 68:036108, 2003.

[CL94] P. Cattiaux and C. Léonard. Minimization of the Kullback information of diffusion processes. Annales de l’Institut Henri Poincaré. Probabilités et Statistiques, 30(1):83–132, 1994.

[CL95a] P. Cattiaux and C. Léonard. Correction to: “Minimization of the Kullback information of diffusion processes” [Ann. Inst. H. Poincaré Probab. Statist. 30 (1994), no. 1, 83–132; MR1262893 (95d:60056)]. Ann. Inst. H. Poincaré Probab. Statist., 31(4):705–707, 1995.

[CL95b] P. Cattiaux and C. Léonard. Large deviations and Nelson processes. Forum Mathematicum, 7(1):95–116, 1995.

[CL12] X. Chen and J. G. Liu. Two nonlinear compactness theorems in L^p(0, T; B). Applied Mathematics Letters, 25(12):2252–2257, 2012.

[CLL04] P. H. Chavanis, P. Laurençot, and M. Lemou. Chapman-Enskog derivation of the generalized Smoluchowski equation. Physica A: Statistical Mechanics and its Applications, 341:145–164, 2004.

[CMV03] J. A. Carrillo, R. J. McCann, and C. Villani. Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Revista Matemática Iberoamericana, 19(3):971–1018, 2003.

[CPR08] E. Caglioti, M. Pulvirenti, and F. Rousset. The 2D constrained Navier-Stokes equation and intermediate asymptotics. J. Phys. A, 41(34):344001, 9, 2008.

[CPR09] E. Caglioti, M. Pulvirenti, and F. Rousset. On a constrained 2-D Navier-Stokes equation. Comm. Math. Phys., 290(2):651–677, 2009.

[Csi75] I. Csiszár. I-divergence geometry of probability distributions and minimization problems. Ann. Probability, 3:146–158, 1975.

[CSR96] P. H. Chavanis, J. Sommeria, and R. Robert. Statistical mechanics of two dimensional vortices and collisionless stellar systems. The Astrophysical Journal, 471:385–399, 1996.

[DE97] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations, volume 902. Wiley-Interscience, 1997.

[DF14] M. H. Duong and M. Fathi. The two-scale approach to hydrodynamic limits for non-reversible dynamics. http://arxiv.org/abs/1404.1971, 2014.

[DG87] D. A. Dawson and J. Gärtner. Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics, 20(4):247–308, 1987.

[DG89] D. A. Dawson and J. Gärtner. Large deviations, free energy functional and quasi-potential for a mean field model of interacting diffusions. Mem. Amer. Math. Soc., 78(398):iv+94, 1989.

[DLPS14] M. H. Duong, A. Lamacz, M. A. Peletier, and U. Sharma. Passing to the limits in the Kramers equation using variational methods. In preparation, 2014.

[DLR13] M. H. Duong, V. Laschos, and D. R. M. Renger. Wasserstein gradient flows from large deviations of many-particle limits. ESAIM Control Optim. Calc. Var., 19(4):1166–1188, 2013.

[DLZ12] N. Dirr, V. Laschos, and J. Zimmer. Upscaling from particle models to entropic gradient flows. J. Math. Phys., 53(6):063704, 9, 2012.

[DM10] F. Delarue and S. Menozzi. Density estimates for a random noise propagating through a chain of differential equations. Journal of Functional Analysis, 259(6):1577–1630, 2010.

[DMM10] B. Düring, D. Matthes, and J. Milišić. A gradient flow scheme for nonlinear fourth order equations. Discrete and Continuous Dynamical Systems. Series B. A Journal Bridging Mathematics and Sciences, 14(3):935–959, 2010.

[DPS13] M. H. Duong, M. A. Peletier, and U. Sharma. Coarse-graining and fluctuations: Two birds with one stone. Oberwolfach report of the workshop Material Theories, OWR 2013/59, 2013.

[DPZ13a] M. H. Duong, M. A. Peletier, and J. Zimmer. Conservative-dissipative approximation schemes for a generalized Kramers equation. Mathematical Methods in the Applied Sciences, To appear, 2013.

[DPZ13b] M. H. Duong, M. A. Peletier, and J. Zimmer. GENERIC formalism of a Vlasov-Fokker-Planck equation and connection to large-deviation principles. Nonlinearity, 26(11):2951–2971, 2013.

[Dud89] R. M. Dudley. Real analysis and probability. Wadsworth & Brooks/Cole, Pacific Grove, CA, USA, 1989.

[Duo13] M. H. Duong. Asymptotic equivalence of the discrete variational functional and a rate-large-deviation-like functional in the Wasserstein gradient flow of the porous medium equation. http://arxiv.org/abs/1307.5184, 2013.

[DZ87] A. Dembo and O. Zeitouni. Large deviations techniques and applications, volume 38 of Stochastic modelling and applied probability. Springer, New York, NY, USA, 2nd edition, 1987.

[Fat13] M. Fathi. A two-scale approach to the hydrodynamic limit part II: local Gibbs behavior. ALEA Lat. Am. J. Probab. Math. Stat., 10(2):625–651, 2013.

[Fis12] M. Fischer. On the form of the large deviation rate function for the empirical measures of weakly interacting systems. http://arxiv.org/abs/1208.0472, 2012.

[FK06] J. Feng and T. G. Kurtz. Large deviations for stochastic processes, volume 131 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2006.

[FN12] J. Feng and T. Nguyen. Hamilton-Jacobi equations in space of measures associated with a system of conservation laws. J. Math. Pures Appl. (9), 97(4):318–390, 2012.

[FS13] J. Feng and A. Święch. Optimal control for a mixed flow of Hamiltonian and gradient type in space of probability measures. Trans. Amer. Math. Soc., 365(8):3987–4039, 2013. With an appendix by Atanas Stefanov.

[FW94] M. I. Freidlin and A. D. Wentzell. Random perturbations of Hamiltonian systems. Mem. Amer. Math. Soc., 109(523), 1994.

[FW98a] M. Freidlin and M. Weber. Random perturbations of nonlinear oscillators. The Annals of Probability, 26(3):925–967, 1998.

[FW98b] M. I. Freidlin and A. D. Wentzell. Random perturbations of dynamical systems, volume 260 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, New York, second edition, 1998. Translated from the 1979 Russian original by Joseph Szücs.

[FW01] M. Freidlin and M. Weber. On random perturbations of Hamiltonian systems with many degrees of freedom. Stochastic processes and their applications, 94(2):199–239, 2001.

[GOVW09] N. Grunewald, F. Otto, C. Villani, and M. G. Westdickenberg. A two-scale approach to logarithmic Sobolev inequalities and the hydrodynamic limit. Ann. Inst. Henri Poincaré Probab. Stat., 45(2):302–351, 2009.

[GPK12] B. D. Goddard, G. A. Pavliotis, and S. Kalliadasis. The overdamped limit of dynamic density functional theory: rigorous results. Multiscale Model. Simul., 10(2):633–663, 2012.

[GPV88] M. Z. Guo, G. C. Papanicolaou, and S. R. S. Varadhan. Nonlinear diffusion limit for a system with nearest neighbor interactions. Comm. Math. Phys., 118(1):31–59, 1988.

[GW09] W. Gangbo and M. Westdickenberg. Optimal transport for the system of isentropic Euler equations. Communications in Partial Differential Equations, 34(7-9):1041–1073, 2009.

[HKLR10] H. Holden, K. H. Karlsen, K.-A. Lie, and N. H. Risebro. Splitting Methods for Partial Differential Equations with Rough Solutions: Analysis and MATLAB Programs. European Mathematical Society, 2010.

[HT08a] M. Hutter and T. A. Tervoort. Finite anisotropic elasticity and material frame indifference from a nonequilibrium thermodynamics perspective. Journal of Non-Newtonian Fluid Mechanics, 152:45–52, 2008.

[HT08b] M. Hutter and T. A. Tervoort. Thermodynamic considerations on non-isothermal finite anisotropic elasto-viscoplasticity. Journal of Non-Newtonian Fluid Mechanics, 152:53–65, 2008.

[HTB90] P. Hänggi, P. Talkner, and M. Borkovec. Reaction-rate theory: fifty years after Kramers. Rev. Mod. Phys., 62:251–341, Apr 1990.

[Hua00] C. Huang. A variational principle for the Kramers equation with unbounded external forces. Journal of Mathematical Analysis and Applications, 250(1):333–367, 2000.

[IW81] N. Ikeda and S. Watanabe. Stochastic differential equations and diffusion processes. North-Holland, 1981.

[JKO97] R. Jordan, D. Kinderlehrer, and F. Otto. Free energy and the Fokker-Planck equation. Physica D. Nonlinear Phenomena, 107(2-4):265–271, 1997. Landscape paradigms in physics and biology (Los Alamos, NM, 1996).

[JKO98] R. Jordan, D. Kinderlehrer, and F. Otto. The variational formulation of the Fokker-Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, 1998.

[KL99] C. Kipnis and C. Landim. Scaling limits of interacting particle systems, volume 320 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1999.

[Kos01] E. Kosygina. The behaviour of the specific entropy in the hydrodynamic scaling limit for Ginzburg-Landau model. Markov Process. Related Fields, 7(3):383–417, 2001.

[Kra40] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7:284–304, 1940.

[KS91] I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, volume 113. Springer Verlag, 1991.

[Le08] N. Q. Le. A gamma-convergence approach to the Cahn-Hilliard equation. Calc. Var. Partial Differential Equations, 32(4):499–522, 2008.

[Leo07] C. Léonard. A large deviation approach to optimal transport. arxiv.org/abs/0710.1461v1, 2007.

[Mel96] S. Méléard. Asymptotic behaviour of some interacting particle systems; McKean-Vlasov and Boltzmann models. In Probabilistic models for nonlinear partial differential equations (Montecatini Terme, 1995), volume 1627 of Lecture Notes in Math., pages 42–95. Springer, Berlin, 1996.

[Mie05] A. Mielke. Evolution of rate-independent systems. In Evolutionary equations. Vol. II, Handb. Differ. Equ., pages 461–559. Elsevier/North-Holland, Amsterdam, 2005.

[Mie11] A. Mielke. Formulation of thermoelastic dissipative material behavior using GENERIC. Continuum Mechanics and Thermodynamics, 23:233–256, 2011.

[MSZ03] J. Malý, D. Swanson, and W. P. Ziemer. The co-area formula for Sobolev mappings. Trans. Amer. Math. Soc., 355(2):477–492 (electronic), 2003.

[MTL02] A. Mielke, F. Theil, and V. I. Levitas. A variational formulation of rate-independent phase transformations using an extremum principle. Archive for Rational Mechanics and Analysis, 162(2):137–177, 2002.

[Nel67] E. Nelson. Dynamical theories of Brownian motion. Princeton University Press, Princeton, N.J., 1967.

[ØB03] B. K. Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer, Berlin, 2003. ISBN 3-540-04758-1.

[Oel84] K. Oelschläger. A martingale approach to the law of large numbers for weakly interacting stochastic processes. Ann. Probab., 12(2):458–479, 1984.

[OG97a] H. C. Öttinger and M. Grmela. Dynamics and thermodynamics of complex fluids. I. Development of a general formalism. Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, 56(6):6620–6632, 1997.

[OG97b] H. C. Öttinger and M. Grmela. Dynamics and thermodynamics of complex fluids. II. Illustrations of a general formalism. Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, 56(6):6633–6655, 1997.

[Ott01] F. Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.

[Ott05] H. C. Öttinger. Beyond equilibrium thermodynamics. Wiley-Interscience, 1st edition, 2005.

[OV00] F. Otto and C. Villani. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal., 173(2):361–400, 2000.

[OW10] A. Ohara and T. Wada. Information geometry of q-Gaussian densities and behaviors of solutions to related diffusion equations. J. Phys. A, 43(3):035002, 18, 2010.

[Pel14] M. A. Peletier. Variational modelling: Energies, gradient flows, and large deviations. http://arxiv.org/abs/1402.1990, 2014.

[PRV13] M. A. Peletier, D. R. M. Renger, and M. Veneroni. Variational formulation of the Fokker-Planck equation with decay: a particle approach. Commun. Contemp. Math., 15(5):1350017, 43, 2013.

[PRV14] M. A. Peletier, F. Redig, and K. Vafayi. Large deviations in stochastic heat-conduction processes provide a gradient-flow structure for heat conduction. http://arxiv.org/abs/1403.4994, 2014.

[Ren13] D. R. M. Renger. Microscopic interpretation of Wasserstein gradient flows. PhD thesis, TU Eindhoven, 2013.

[Ris89] H. Risken. The Fokker-Planck equation, volume 18 of Springer Series in Synergetics. Springer-Verlag, Berlin, second edition, 1989. Methods of solution and applications.

[Rud73] W. Rudin. Functional Analysis. McGraw-Hill, New York, NY, USA, 1973.

[Ser09] S. Serfaty. Gamma-convergence of gradient flows on Hilbert and metric spaces and applications. http://www.math.nyu.edu/faculty/serfaty/gcv-erice2.pdf, 2009.

[Sim86] J. Simon. Compact sets in the space L^p(0, T; B). Annali di Matematica Pura ed Applicata, 146(1):65–96, 1986.

[SS04] E. Sandier and S. Serfaty. Gamma-convergence of gradient flows with applications to Ginzburg-Landau. Communications on Pure and Applied Mathematics, 57(12):1627–1672, 2004.

[Ste08] U. Stefanelli. The Brezis–Ekeland principle for doubly nonlinear equations. SIAM Journal on Control and Optimization, 47:1615, 2008.

[Tak12] A. Takatsu. Wasserstein geometry of porous medium equation. Ann. Inst. H. Poincaré Anal. Non Linéaire, 29(2):217–232, 2012.

[Vil03] C. Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.

[Vil09] C. Villani. Optimal transport: old and new. Springer Verlag, 2009.

[vR12] M. K. von Renesse. An optimal transport view of Schrödinger’s equation. Canad. Math. Bull., 55(4):858–869, 2012.

[Wes10] M. Westdickenberg. Projections onto the cone of optimal transport maps and compressible fluid flows. Journal of Hyperbolic Differential Equations, 7(4):605–649, 2010.

[Wil76] G. Wilemski. On the derivation of Smoluchowski equations with corrections in the classical theory of Brownian motion. Journal of Statistical Physics, 14(2):153–169, 1976.

[Wu01] L. Wu. Large and moderate deviations and exponential convergence for stochastic damping Hamiltonian systems. Stochastic processes and their applications, 91(2):205–238, 2001.

[Yau91] H. T. Yau. Relative entropy and hydrodynamics of Ginzburg-Landau models. Lett. Math. Phys., 22(1):63–80, 1991.
