
L. Vandenberghe ECE236C (Spring 2020)

14. Newton’s method

• Kantorovich theorem

• inexact Newton method

14.1


Newton’s method for nonlinear equations

Newton iteration for solving a nonlinear equation f(x) = 0:

x^{(k+1)} = x^{(k)} − f′(x^{(k)})^{-1} f(x^{(k)})

• f : R^n → R^n is a vector-valued function f(x) = (f_1(x), . . . , f_n(x))

• f′(x) is the n × n Jacobian matrix at x:

(f′(x))_{ij} = ∂f_i/∂x_j (x),   i, j = 1, . . . , n

• x^{(k+1)} is the solution of the linearized equation at x^{(k)}:

f(x^{(k)}) + f′(x^{(k)})(x − x^{(k)}) = 0

we denote the iterates by the simpler notation x_k = x^{(k)} if the meaning is clear
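
A minimal sketch of this iteration in Python (the 2×2 example system and the helper name newton are illustrative assumptions, not from the lecture):

```python
import numpy as np

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton iteration x_{k+1} = x_k - f'(x_k)^{-1} f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        # x_{k+1} solves the linearized equation f(x_k) + f'(x_k)(x - x_k) = 0
        x = x + np.linalg.solve(fprime(x), -fx)
    return x

# illustrative system: f(x) = (x1^2 + x2^2 - 1, x1 - x2), solution (1/sqrt(2), 1/sqrt(2))
f = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])
fprime = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
print(newton(f, fprime, x0=[1.0, 0.5]))
```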

Newton’s method 14.2


Matrix norm

in this lecture, operator norms are used for square matrices:

‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖

the same (arbitrary) vector norm is used for ‖Ax‖ and ‖x‖

Properties (A, B are n × n matrices and x is an n-vector)

• identity matrix: ‖I ‖ = 1

• matrix-vector product: ‖Ax‖ ≤ ‖A‖‖x‖

• submultiplicative property: ‖AB‖ ≤ ‖A‖‖B‖

• perturbation lemma: if A is invertible and ‖A^{-1}B‖ < 1, then A + B is invertible and

‖(A + B)^{-1}‖ ≤ ‖A^{-1}‖ / (1 − ‖A^{-1}B‖)
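
A quick numerical check of the perturbation lemma in the spectral norm (the random 3 × 3 matrices are an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # invertible perturbation of I
B = 0.05 * rng.standard_normal((3, 3))

Ainv = np.linalg.inv(A)
rho = np.linalg.norm(Ainv @ B, 2)                   # lemma requires ||A^{-1}B|| < 1
lhs = np.linalg.norm(np.linalg.inv(A + B), 2)
rhs = np.linalg.norm(Ainv, 2) / (1 - rho)
print(rho < 1, lhs <= rhs)                          # expect: True True
```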

Newton’s method 14.3


Proof of perturbation lemma

• A + B is invertible: if (A + B)x = 0, then x = −A^{-1}Bx, so

‖x‖ = ‖A^{-1}Bx‖ ≤ ‖A^{-1}B‖‖x‖

if ‖A^{-1}B‖ < 1 this is only possible if x = 0

• Y = (A + B)^{-1} satisfies (I + A^{-1}B)Y = A^{-1}; therefore

‖Y‖ = ‖A^{-1} − A^{-1}BY‖ ≤ ‖A^{-1}‖ + ‖A^{-1}BY‖ ≤ ‖A^{-1}‖ + ‖A^{-1}B‖‖Y‖

from which the inequality in the lemma follows

Newton’s method 14.4


Outline

• Kantorovich theorem

• inexact Newton method


Assumptions in Kantorovich theorem

• f : R^n → R^n is continuously differentiable on an open convex set D

• the Jacobian matrix f′(x_0) at the starting point x_0 ∈ D is invertible

• the Jacobian is Lipschitz continuous on D: there exists a positive γ such that

‖f′(x_0)^{-1}(f′(x) − f′(y))‖ ≤ γ‖x − y‖ for all x, y ∈ D

• the norm η := ‖f′(x_0)^{-1} f(x_0)‖ of the first Newton step is bounded by

γη ≤ 1/2

• D contains the ball B(x_0, r) = {x | ‖x − x_0‖ ≤ r}, where

r = (1 − √(1 − 2γη)) / γ
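
A small numeric illustration of this radius (the helper name kantorovich_radius is an assumption; γ = 1, η = 3/8 are the example values used later on the iteration-map slide):

```python
import math

def kantorovich_radius(gamma, eta):
    """r = (1 - sqrt(1 - 2*gamma*eta)) / gamma, defined when gamma*eta <= 1/2."""
    h = gamma * eta
    assert h <= 0.5, "Kantorovich condition gamma*eta <= 1/2 violated"
    return (1.0 - math.sqrt(1.0 - 2.0 * h)) / gamma

print(kantorovich_radius(gamma=1.0, eta=0.375))   # prints 0.5
```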

Newton’s method 14.5


Kantorovich theorem

x_{k+1} = x_k − f′(x_k)^{-1} f(x_k),   k = 0, 1, . . .

under the assumptions on the previous page:

• the iteration is well defined, i.e., the Jacobian matrices f ′(xk) are invertible

• the iterates remain in B(x_0, r)

• the iterates converge to a solution x⋆ of f(x) = 0

• the following error bound holds:

‖x_k − x⋆‖ ≤ ((2γη)^(2^k − 1) / 2^(k−1)) η

Newton’s method 14.6


Comments

• this is the affine-invariant version of the theorem: invariant under the transformation

f(x) ↦ A f(x), A nonsingular

• existence of a solution is not assumed but follows from the theorem

• complete theorem includes uniqueness of solution in a larger region

• theorem explains very fast local convergence; for example, if γη = 0.4

k   (2γη)^(2^k − 1) / 2^(k−1)
0   2.0000000000000000
1   0.8000000000000000
2   0.2560000000000000
3   0.0524288000000000
4   0.0043980465111040
5   0.0000618970019643
6   0.0000000245199287
7   0.0000000000000077
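
This table can be reproduced with a few lines of Python (only the value γη = 0.4 comes from the slide):

```python
gamma_eta = 0.4
for k in range(8):
    bound = (2 * gamma_eta) ** (2 ** k - 1) / 2 ** (k - 1)
    print(f"{k}   {bound:.16f}")
```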

Newton’s method 14.7


Newton method for quadratic scalar equation

we first examine the convergence of Newton’s method applied to g(t) = 0, where

g(t) = (γ/2) t^2 − t + η   with γ > 0, η > 0, h := γη ≤ 1/2

• the roots will be denoted by t⋆ and t⋆⋆

t⋆ = (1 − √(1 − 2h)) / γ,   t⋆⋆ = (1 + √(1 − 2h)) / γ

• the Newton iteration started at t_0 = 0 is

t_{k+1} = ((γ/2) t_k^2 − η) / (γ t_k − 1)
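
A short numerical sketch of this scalar iteration (γ = 1, η = 3/8 are the example values from the next slide), showing t_k increasing to the smaller root t⋆ = 0.5:

```python
import math

gamma, eta = 1.0, 0.375                                  # h = gamma*eta = 3/8 <= 1/2
t_star = (1 - math.sqrt(1 - 2 * gamma * eta)) / gamma    # smaller root, here 0.5

t = 0.0
for k in range(6):
    t = ((gamma / 2) * t**2 - eta) / (gamma * t - 1)     # Newton step for g
    print(k + 1, t, t_star - t)                          # error t_star - t decreases to 0
```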

Newton’s method 14.8


Iteration map

Newton iteration can be written as t_{k+1} = ψ(t_k) where

ψ(t) = ((γ/2) t^2 − η) / (γ t − 1)

[Figure: graphs of g(t) and ψ(t) versus t for γ = 1, η = h = 3/8 (two distinct roots t⋆ < t⋆⋆) and for γ = 1, η = h = 1/2 (double root t⋆ = t⋆⋆)]

Newton’s method 14.9


Recursions

to derive simple error bounds, we define g_k(τ) as g(t) scaled and centered at t_k:

g_k(τ) = g(τ + t_k) / (−g′(t_k)) = (γ_k/2) τ^2 − τ + η_k,   k = 0, 1, . . .

• coefficients γ_k, η_k, and h_k = γ_kη_k satisfy the recursions (see next page; checked numerically after this list)

γ_{k+1} = γ_k / (1 − h_k),   η_{k+1} = h_kη_k / (2(1 − h_k)),   h_{k+1} = h_k^2 / (2(1 − h_k)^2)

with γ_0 = γ, η_0 = η, h_0 = γη

• we denote the smallest root of g_k by r_k:

r_k = t⋆ − t_k = (1 − √(1 − 2h_k)) / γ_k = 2η_k / (1 + √(1 − 2h_k))

• Newton step for g_k at τ = 0 is equal to Newton step for g at t = t_k:

η_k = t_{k+1} − t_k
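
A hedged numerical check (γ = 1, η = 3/8 again chosen only for illustration) that η_k from the recursions equals the Newton step t_{k+1} − t_k of the scalar iteration:

```python
gamma, eta = 1.0, 0.375

# scalar Newton iterates t_k for g(t) = (gamma/2) t^2 - t + eta, starting at t_0 = 0
t = [0.0]
for _ in range(5):
    t.append(((gamma / 2) * t[-1]**2 - eta) / (gamma * t[-1] - 1))

# recursions for gamma_k and eta_k
g_k, e_k = gamma, eta
for k in range(5):
    h_k = g_k * e_k
    print(k, e_k, t[k + 1] - t[k])     # the two columns agree
    g_k, e_k = g_k / (1 - h_k), h_k * e_k / (2 * (1 - h_k))
```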

Newton’s method 14.10


Proof of recursions

• since g is quadratic,

(γ_k/2) τ^2 − τ + η_k = g(t_k + τ) / (−g′(t_k)) = ((g″(t_k)/2) τ^2 + g′(t_k) τ + g(t_k)) / (−g′(t_k))

• recursion for γ_k:

γ_{k+1} = g″(t_{k+1}) / (−g′(t_k + η_k)) = g″(t_k) / (−g′(t_k) − g″(t_k)η_k) = γ_k / (1 − γ_kη_k) = γ_k / (1 − h_k)

• recursion for η_k:

η_{k+1} = g(t_k + η_k) / (−g′(t_k + η_k)) = ((g″(t_k)/2) η_k^2) / (−g′(t_k) − g″(t_k)η_k) = γ_kη_k^2 / (2(1 − γ_kη_k)) = h_kη_k / (2(1 − h_k))

• recursion for h_k follows from h_{k+1} = γ_{k+1}η_{k+1}

Newton’s method 14.11


Error bounds

• Newton step η_k = t_{k+1} − t_k:

η_k ≤ ((2h)^(2^k − 1) / 2^k) η

(see next page)

• error r_k = t⋆ − t_k:

r_k = 2η_k / (1 + √(1 − 2h_k)) ≤ 2η_k ≤ ((2h)^(2^k − 1) / 2^(k−1)) η

Newton’s method 14.12


Proof of bound on η_k

• since h_0 ≤ 1/2 the recursion for h_k shows that

2h_k = h_{k−1}^2 / (1 − h_{k−1})^2 ≤ (2h_{k−1})^2

• applying this recursively we obtain 2h_k ≤ (2h_0)^(2^k)

• from the recursion for η_k (and h_k ≤ 1/2):

η_k = h_{k−1}η_{k−1} / (2(1 − h_{k−1})) ≤ h_{k−1}η_{k−1}

• applying this recursively and using the bound on h_k we obtain the bound on η_k:

η_k ≤ h_{k−1} · · · h_1 h_0 η_0
    = 2^{−k} (2h_{k−1}) · · · (2h_1)(2h_0) η_0
    ≤ 2^{−k} (2h_0)^(2^(k−1)) · · · (2h_0)^2 (2h_0) η_0
    = 2^{−k} (2h_0)^(2^k − 1) η_0

Newton’s method 14.13


Summary of proof of Kantorovich theorem

to prove the Kantorovich theorem (pp. 14.5–14.6), we show that

‖x_{k+1} − x_k‖ ≤ t_{k+1} − t_k

where t_k are the iterates in Newton’s method, started at t_0 = 0, for

(γ/2) t^2 − t + η = 0

• t_k is called a majorizing sequence for the sequence x_k

• the bounds for t_k on page 14.12 provide bounds and convergence results for x_k

Newton’s method 14.14


Consequences of majorization

t_0 = 0,   ‖x_{k+1} − x_k‖ ≤ t_{k+1} − t_k for k ≥ 0

• by the triangle inequality, if k ≥ j,

‖x_k − x_j‖ ≤ ∑_{i=j}^{k−1} ‖x_{i+1} − x_i‖ ≤ ∑_{i=j}^{k−1} (t_{i+1} − t_i) = t_k − t_j

• the inequality shows that x_k is a Cauchy sequence, so it converges

• taking j = 0 shows that x_k remains in the set B(x_0, r) (defined on page 14.5):

‖x_k − x_0‖ ≤ t_k − t_0 ≤ t⋆ = r

• taking limits for k → ∞ shows the error bound on page 14.6:

‖x⋆ − x_j‖ ≤ t⋆ − t_j = r_j ≤ ((2h)^(2^j − 1) / 2^(j−1)) η

Newton’s method 14.15


Details of proof of Kantorovich theorem

we prove that the following inequalities hold for k = 0,1, . . .

‖f′(x_{k+1})^{-1} f′(x_k)‖ ≤ 1 / (1 − h_k)   (1)

‖f′(x_k)^{-1}(f′(x) − f′(y))‖ ≤ γ_k ‖x − y‖ for all x, y ∈ D   (2)

‖f′(x_k)^{-1} f(x_k)‖ ≤ η_k   (3)

B(x_{k+1}, r_{k+1}) ⊆ B(x_k, r_k)   (4)

• γ_k, η_k, h_k, r_k are the sequences defined on page 14.10

• for k = 0, inequalities (2) and (3) hold by assumption, since γ_0 = γ, η_0 = η

• (3) is the majorization inequality ‖x_{k+1} − x_k‖ ≤ η_k = t_{k+1} − t_k

Newton’s method 14.16


Proof by induction: suppose (2) and (3) hold at k = i, and (4) holds for k < i

• x_{i+1} ∈ D because B(x_i, r_i) ⊆ · · · ⊆ B(x_0, r_0) ⊆ D and ‖x_{i+1} − x_i‖ ≤ η_i ≤ r_i

• the inequality (2) at k = i implies that

‖f′(x_i)^{-1} f′(x_{i+1}) − I‖ = ‖f′(x_i)^{-1}(f′(x_{i+1}) − f′(x_i))‖ ≤ γ_i‖x_{i+1} − x_i‖ ≤ γ_iη_i = h_i

invertibility of f′(x_{i+1}) and (1) at k = i follow from the perturbation lemma

• inequality (2) at k = i + 1 follows from (1) and (2) at k = i:

‖f′(x_{i+1})^{-1}(f′(x) − f′(y))‖ ≤ ‖f′(x_{i+1})^{-1} f′(x_i)‖ ‖f′(x_i)^{-1}(f′(x) − f′(y))‖ ≤ (γ_i / (1 − h_i)) ‖x − y‖ = γ_{i+1}‖x − y‖

Newton’s method 14.17


• inequality (3) at k = i + 1 follows from (2) at k = i + 1: define v = x_{i+1} − x_i; then, using the Newton equation f(x_i) + f′(x_i)v = 0,

‖f′(x_{i+1})^{-1} f(x_{i+1})‖ = ‖f′(x_{i+1})^{-1} (∫_0^1 f′(x_i + tv) v dt + f(x_i))‖
                             = ‖f′(x_{i+1})^{-1} ∫_0^1 (f′(x_i + tv) − f′(x_i)) v dt‖
                             ≤ ‖v‖ ∫_0^1 ‖f′(x_{i+1})^{-1}(f′(x_i + tv) − f′(x_i))‖ dt
                             ≤ (γ_{i+1}/2) ‖v‖^2
                             ≤ γ_{i+1}η_i^2 / 2 = η_{i+1}

• inequality (4) at k = i follows from (3) at k = i and r_i = r_{i+1} + η_i:

‖x − x_{i+1}‖ ≤ r_{i+1}  =⇒  ‖x − x_i‖ ≤ ‖x − x_{i+1}‖ + ‖x_{i+1} − x_i‖ ≤ r_{i+1} + η_i = r_i

Newton’s method 14.18


Limit

it remains to show that the limit x⋆ solves the equation

• by the assumptions on page 14.5,

‖f′(x_0)^{-1} f(x_k)‖ = ‖f′(x_0)^{-1} f′(x_k)(x_{k+1} − x_k)‖
                      = ‖(f′(x_0)^{-1}(f′(x_k) − f′(x_0)) + I)(x_{k+1} − x_k)‖
                      ≤ (‖f′(x_0)^{-1}(f′(x_k) − f′(x_0))‖ + 1) ‖x_{k+1} − x_k‖
                      ≤ (γr + 1)‖x_{k+1} − x_k‖

• since ‖x_{k+1} − x_k‖ → 0 and f is continuous,

‖f′(x_0)^{-1} f(x⋆)‖ = lim_{k→∞} ‖f′(x_0)^{-1} f(x_k)‖ = 0

Newton’s method 14.19


Outline

• Kantorovich theorem

• inexact Newton method


Inexact Newton method

inexact Newton method for solving nonlinear equation f(x) = 0:

x_{k+1} = x_k + s_k where s_k ≈ −f′(x_k)^{-1} f(x_k)

• s_k is an approximate solution of the Newton equation

f′(x_k)s = −f(x_k)

• goal is to reduce cost per iteration while retaining fast convergence

• in Newton-iterative methods, Newton equation is solved by iterative method

• an example is the Newton-CG method if f′(x) is symmetric positive definite
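
A hedged Newton-CG sketch in Python: the inner conjugate-gradient solve is stopped by the forcing condition of the next slide; the monotone example problem, the constant forcing term 0.1, and all helper names are illustrative assumptions, not from the lecture:

```python
import numpy as np

def cg(A, b, tol):
    """Plain conjugate gradient for symmetric positive definite A,
       stopped once the residual satisfies ||A s - b|| <= tol."""
    s = np.zeros_like(b)
    r = b.copy()                       # residual b - A s
    p = r.copy()
    while np.linalg.norm(r) > tol:
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        s = s + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return s

def inexact_newton(f, fprime, x0, forcing=0.1, tol=1e-10, max_iter=50):
    """Inexact Newton: solve f'(x_k) s = -f(x_k) only up to
       ||f'(x_k) s + f(x_k)|| <= forcing * ||f(x_k)||."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        x = x + cg(fprime(x), -fx, forcing * np.linalg.norm(fx))
    return x

# illustrative monotone system: f(x) = A x + x^3 (componentwise); Jacobian A + diag(3 x^2) is SPD
A = np.array([[2.0, 0.5], [0.5, 1.0]])
f = lambda x: A @ x + x**3
fprime = lambda x: A + np.diag(3 * x**2)
print(inexact_newton(f, fprime, x0=[1.0, -2.0]))   # converges to the solution x = 0
```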

Newton’s method 14.20


Forcing condition

accept the inexact Newton step s_k if

‖f′(x_k)s_k + f(x_k)‖ ≤ α_k ‖f(x_k)‖

• coefficient α_k < 1 is called the forcing term

• α_k limits relative error in the Newton equation

• provides a stopping condition in iterative method for solving Newton equation

• α_k is constant or adjusted adaptively
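
One simple adaptive choice (an illustration, not prescribed by the lecture; the name forcing_term is hypothetical) caps α_k at some α_max < 1 and lets it shrink with the residual, so that α_k ↘ 0 near the solution and, by the rate table on the next slide, convergence becomes superlinear:

```python
def forcing_term(fx_norm, alpha_max=0.5):
    # shrink the forcing term with the residual norm so that alpha_k -> 0 near the solution
    return min(alpha_max, fx_norm)
```

In the inexact Newton sketch above, this would replace the constant forcing argument, with fx_norm = ‖f(x_k)‖ recomputed at every outer iteration.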

Newton’s method 14.21


Local convergence

Assumptions

• the equation has a solution x⋆ and f′(x⋆) is invertible

• f′ is Lipschitz continuous in a neighborhood of x⋆

Local convergence result

• the iterates x_k converge to x⋆ if x_0 is sufficiently close to x⋆

• an error bound of the following type holds (for some κ with κα < 1 if α_k ≤ α < 1):

‖x_{k+1} − x⋆‖ ≤ κ (‖x_k − x⋆‖ + α_k) ‖x_k − x⋆‖

• this shows how the forcing term determines the rate of convergence

                α_k constant    α_k = 0      α_k ↘ 0
convergence:    linear          quadratic    superlinear

Newton’s method 14.22


References

Newton method

• J. E. Dennis, Jr., and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (1996).

• P. Deuflhard, Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms (2011).

• C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations (1995).

• J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (2000).

Kantorovich theorem: the statement and proof of the theorem in the lecture follow

• P. Deuflhard, Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms (2011), theorem 2.1.

• T. Yamamoto, A unified derivation of several error bounds for Newton’s process, Journal of Computational and Applied Mathematics (1985).

Inexact Newton method

• C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations (1995), chapter 6.

• J. Nocedal and S. J. Wright, Numerical Optimization (2006), chapter 7.

Newton’s method 14.23