Jordan Algebraic Approach to Symmetric Optimization


Jordan Algebraic Approach to Symmetric Optimization

Manuel V. C. Vieira


Jordan Algebraic Approach to Symmetric Optimization

DISSERTATION

for the degree of doctor at Delft University of Technology,

on the authority of the Rector Magnificus, Prof. dr. ir. J.T. Fokkema, chairman of the Board for Doctorates,

to be defended in public on Tuesday, 13 November 2007 at 12:30

by

Manuel Valdemar Cabral VIEIRA

Master's degree in Operational Research, Universidade de Lisboa,

born in Vila Nova de Foz Côa, Portugal.


This dissertation has been approved by the promotor:
Prof. dr. ir. C. Roos

Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. ir. C. Roos, Technische Universiteit Delft, promotor
Prof. dr. F. Glineur, Université Catholique de Louvain
Prof. dr. A. Yoshise, Tsukuba University, Japan
Prof. dr. J.M.A.M. van Neerven, Technische Universiteit Delft
Prof. dr. Y. Bai, Shanghai University, China
Dr. J. Brinkhuis, Erasmus Universiteit Rotterdam
Dr. M. Laurent, Centrum voor Wiskunde en Informatica
Prof. dr. ir. C. Vuik, Technische Universiteit Delft, reserve member

Financial support was provided by the Faculty of Science and Technology of the New University of Lisbon and by the Portuguese Foundation for Science and Technology.

This dissertation was produced under the auspices of:

THOMAS STIELTJES INSTITUTE FOR MATHEMATICS

Copyright © 2007 by M. V. C. Vieira

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without the prior permission of the author.

ISBN 978-90-6464-189-3

Author email: [email protected]


To my wife


Acknowledgements

Time has run its course. Even if at times it seemed impossible to accomplish, the result has now come out and I am completing my PhD.

I would first like to thank my supervisor, Kees Roos, for his availability, support and enriching comments during the writing of the thesis. I am thankful for his extreme patience while reading my thesis.

I would like to thank my sponsors, the Faculty of Science and Technology of the New University of Lisbon and the Portuguese Foundation for Science and Technology. Without them, this life experience of studying in the Netherlands would not have been possible. I also would like to thank my host institution, the Delft University of Technology, where I always felt welcome.

Thanks to Etienne de Klerk, who kindly hosted me on my first visit to Delft. I would like to thank François Glineur for his careful reading of my thesis and his useful remarks to improve it.

I also would like to thank all my colleagues from the optimization group. I thank my colleagues Ivan and Hossein for their help in handling some affairs with the university after I left the Netherlands.

Now, I want to thank my family for their support, which was crucial for pursuing the goal of doing a PhD. Thanks to all my friends. But most of all I want to apologize to my family and friends for being apart for so long. I am sorry.

Of course, I especially thank my wife Élia for her great support and love, especially during the hard moments.

Delft, November 2007

Manuel Vieira


Contents

Acknowledgements

List of notations

1 Introduction
  1.1 Kernel functions
  1.2 Symmetric optimization
  1.3 Why Jordan algebras?
  1.4 Jordan algebras and optimization
  1.5 Subject of the thesis
    1.5.1 Our approach to symmetric optimization
    1.5.2 Contents of the thesis

2 Euclidean Jordan algebras and symmetric cones
  2.1 Power associative algebras
  2.2 Jordan algebras
  2.3 Quadratic representation
  2.4 Euclidean Jordan algebras
  2.5 Symmetric cones
  2.6 The natural barrier function
  2.7 Simple Jordan algebras
  2.8 Automorphisms
  2.9 The Peirce decomposition in a Jordan algebra
  2.10 Conclusion

3 Eigenvalues, spectral functions and their derivatives
  3.1 Introduction
  3.2 Similarity
  3.3 An important eigenvalue inequality
  3.4 Derivatives of eigenvalues
  3.5 Spectral functions
  3.6 Derivatives of spectral functions
  3.7 Conclusion

4 Barrier functions
  4.1 Introduction
  4.2 Kernel functions
  4.3 Barrier functions based on kernel functions
  4.4 Derivatives of the barrier function
  4.5 Conclusion

5 Interior-point methods based on kernel functions
  5.1 Introduction
  5.2 Symmetric optimization problem
  5.3 Duality
  5.4 Scaling
  5.5 The central path
  5.6 The Nesterov-Todd direction
  5.7 A new search direction for symmetric optimization
  5.8 The algorithm
  5.9 Analysis of the algorithm
    5.9.1 Growth behavior
    5.9.2 Decrease of the barrier function during an inner iteration
    5.9.3 Iteration bounds
  5.10 Recipe to calculate a complexity bound
  5.11 Examples
  5.12 The accuracy of the solution produced by the algorithm
  5.13 Conclusions

6 Conclusions
  6.1 Final notes
  6.2 Directions for further research

A Topological notions

B Some matrix properties

C Matrices of quaternions and octonions
  C.1 Quaternions
  C.2 Matrices of quaternions
  C.3 Octonions
  C.4 The Albert Algebra

D Technical properties

Bibliography

Index


List of notations

Sets

R - the field of real numbers;
C - the field of complex numbers;
H - the set of quaternions;
O - the set of octonions;
K* - the dual of the cone K;
S^n - the vector space of real symmetric matrices;
S^n_+ - the cone of positive semidefinite matrices;
R[x] - the subalgebra generated by e and x (see page 8);
GL(V) - the set of invertible linear mappings from V into itself;
End(V) - the set of endomorphisms of V;
Aut(K) - the automorphism group {g ∈ GL(V) : g(K) = K} (see page 38);
OAut(K) - the set of orthogonal automorphisms that leave K invariant (see page 38);
K(V) - the cone of squares, {x^2 : x ∈ V} (see page 28);
I - the set of invertible elements of V (see page 13);
U^0 - the interior of the set U;
cl(U) - the closure of the set U;
Aut(V) - the set of automorphisms of V (see page 38);
V_ij - the Peirce spaces (see page 43).


Operators

∘ - Jordan product (see page 13);
L(x) - operator of multiplication by x (see page 9);
P(x) - quadratic operator (see page 16);
⟨·, ·⟩ - inner product;
⊕ - direct sum;
P_ii := P(c_i) - the orthogonal projection onto V_ii;
P_ij := 4 L(c_i) L(c_j) - the orthogonal projection onto V_ij;
[S, T] := ST − TS - the commutator of the endomorphisms S and T;
x # y - the geometric mean of x and y (see page 80);
D_x f(x) - the first derivative of f at x (see page 18);
D^u_x f(x) - the derivative of f in the direction u at x (see page 19);
∇f(x) - the gradient of f (see page 19);
D^2_x f(x) - the second derivative of f at x (see page 32).

Functions

λ_i(x) - an eigenvalue of x;
λ_min(x) - the smallest eigenvalue of x;
λ_max(x) - the largest eigenvalue of x;
det(x) - the determinant of x, ∏_{i=1}^r λ_i(x);
tr(x) - the trace of x, ∑_{i=1}^r λ_i(x);
ψ(t) - kernel function (see page 71);
Ψ(x) - barrier function (see page 73);
B(x) - the natural barrier function (see page 32).

Special elements

e - the identity element;
x^{−1} - the inverse element of x;
x^{1/2} - the square root of x.


Chapter 1

Introduction

To introduce the topic of this thesis we have to go back to the first paper on interior-point methods by Karmarkar [27] in 1984. He introduced a polynomial-time projective algorithm for linear optimization (LO), the first polynomial-time algorithm performing well in practice. Since then, LO has revived as an active area of research. Today the resulting interior-point methods are among the most effective methods for solving LO problems. Many researchers have proposed and analyzed various interior-point methods for LO and a large number of results have been reported. For a survey we refer to recent books on the subject [45, 51, 52]. An interesting fact is that almost all known polynomial-time variants of interior-point methods use the so-called central path [52] as a guideline to the optimal set and some variant of Newton's method to follow the central path approximately. Therefore, analyzing the behavior of Newton's method has been a crucial issue in the theoretical investigation of interior-point methods. It is generally agreed that primal-dual path-following methods are the most efficient methods from the computational point of view (see e.g., Andersen et al. [3]). These methods use the Newton direction as a search direction; this direction is closely related to the well-known primal-dual logarithmic barrier function.

The interior-point methods developed for LO could be naturally extended to obtain polynomial-time methods for conic optimization. In conic optimization, a linear function is minimized over the intersection of an affine space and a closed convex cone. The foundation for solving these problems by interior-point methods was laid by Nesterov and Nemirovskii [37]. These authors considered primal (and dual) interior-point methods based on so-called self-concordant barrier functions. Later, Nesterov and Todd [38, 39] introduced symmetric primal-dual interior-point methods on a special class of cones called self-scaled cones, which allowed a symmetric treatment of the primal and the dual problem. Conic optimization includes solving problems such as linear optimization, semi-definite optimization and second-order cone optimization problems (see e.g. [2, 12]).

During the last two decades interior-point methods have proved to be a powerful tool for solving convex optimization problems (see for example [37]), provided that we have a self-concordant, computationally tractable barrier function for the underlying cone. Until recently all the barrier functions considered were so-called logarithmic barrier functions. However, there is a gap between the practical behavior of the algorithms and the theoretical performance results: the practical behavior is better than the worst-case complexity analysis suggests. This is especially true for the so-called large-update methods. If n denotes the number of variables in the problem, then the theoretical complexity analysis of large-update methods yielded an O(n log(n/ε)) iteration bound, where ε represents the desired accuracy of the solution. In practice, however, large-update methods are much more efficient than the so-called small-update methods, for which the theoretical iteration bound is only O(√n log(n/ε)). So the current theoretical bounds differ by a factor √n, in favor of the small-update methods. This gap is significant.
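To get a feeling for the size of this gap, the following small computation (ours, purely illustrative; the constants hidden in the O(·) notation are ignored) evaluates both bounds for a few values of n, with ε = 10^{−6}.

```python
import math

# Illustrative only: compare the worst-case iteration bounds for
# large-update methods, O(n log(n/eps)), and small-update methods,
# O(sqrt(n) log(n/eps)), ignoring the hidden constants.
eps = 1e-6
for n in (10**2, 10**4, 10**6):
    log_term = math.log(n / eps)
    print(f"n = {n:>7}: large-update ~ {n * log_term:14.0f}, "
          f"small-update ~ {math.sqrt(n) * log_term:14.0f}")
```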

1.1 Kernel functions

Recently, the gap could be narrowed by deviating from the usual approach. Peng et al. [40-43] replaced the primal-dual logarithmic barrier by a so-called self-regular barrier function, which is determined by a simple univariate self-regular function, called its kernel function. The search direction was modified accordingly, and a large-update method was obtained for which the theoretical iteration bound is O(√n log n log(n/ε)). Thus the gap between the theoretical iteration bounds for small- and large-update methods has been narrowed. They naturally extended their work to semi-definite optimization and second-order cone optimization.

Later, a new class of barrier functions was introduced whose members are not necessarily self-regular [9]. Some new analytic tools were developed for the analysis of interior-point methods based on such kernel functions. As a result the analysis is much simpler than in [40-43], whereas the iteration bounds are at least as good. In addition, the analysis also applies to some (self-regular) functions; see Bai et al. [9].

This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems, which is the aim of this thesis.

1.2 Symmetric optimization

As we mentioned before, several interior-point methods for LO were extended to semi-definite optimization and second-order cone optimization. In fact, these optimization problems can be defined as minimizing a linear function over the intersection of an affine space and a closed convex cone. If the cone is the linear cone, the second-order cone or the cone of real positive semi-definite symmetric matrices, then we have respectively a linear optimization problem, a second-order cone optimization problem or a semi-definite optimization problem. These three cones are the most relevant for the optimization field, and they were classified as belonging to the set of self-dual and homogeneous cones (Section 2.5), also called symmetric cones. Thus, many authors developed interior-point methods for symmetric cones (e.g. [16-19, 46, 47]) by generalizing existing interior-point methods for LO.

1.3 Why Jordan algebras?

Jordan algebras were created to illuminate a particular aspect of physics: quantum-mechanical observables. However, Jordan algebras also illuminated connections with many other areas of mathematics. A surprising observation was their relation to symmetric cones. This relation is as follows: any symmetric cone can be realized as a cone of squares of some Euclidean Jordan algebra. It turns out that Euclidean Jordan algebras provide the tools to treat optimization problems involving symmetric cones. In short, Jordan algebras provide us with a simple structure for analyzing, at once, all symmetric optimization problems.

1.4 Jordan algebras and optimization

The first work connecting Jordan algebras and optimization is due to Güler [22]. He observed that the family of self-scaled cones ([38]) is identical to the set of symmetric cones, for which there exists a complete classification theory. It is worth mentioning that Nesterov and Todd [38], who provided a theoretical foundation for the study of interior-point methods for symmetric optimization problems, did not use a Jordan algebraic approach.

Faybusovich analyzed several interior-point methods for symmetric optimization using the Jordan algebra framework: the primal-dual interior-point method analyzed by Alizadeh for semi-definite optimization [1] and a short-step path-following algorithm [17, 18]. With Arana [19], he derived complexity estimates for a long-step primal-dual interior-point algorithm. Later on, he described a primal-dual potential-reduction algorithm and its complexity estimates [16]. The complexity estimates were always obtained in terms of the rank of the Euclidean Jordan algebra.

Schmieta and Alizadeh [46] presented a general framework in which the analysis of interior-point algorithms for semi-definite optimization can be extended verbatim to optimization problems over symmetric cones derivable from associative algebras. In particular, their analysis is extendable to the cone of positive semi-definite Hermitian matrices with complex and quaternion entries, and to the second-order cone. They dealt with the case of the second-order cone by embedding its associated Jordan algebra into the Clifford algebra. Later, Schmieta and Alizadeh [47] showed that the so-called commutative class of primal-dual interior-point algorithms which were designed by Monteiro and Zhang for semi-definite optimization [35] extends word-by-word to all symmetric cones. They also proved polynomial-time worst-case bounds for variants of the short-, semi-long-, and long-step path-following algorithms using the Nesterov-Todd, XS, or SX directions.

Rangarajan [44] established polynomial-time convergence of infeasible interior-point methods for symmetric optimization.


The well-known software SeDuMi [49] was a major contribution of Jos Sturm. This software solves symmetric optimization problems; SeDuMi also solves optimization problems with complex variables. With Luo and Zhang he also analyzed the so-called self-dual embedding for symmetric optimization [32].

Other contributions to the application of Jordan algebraic techniques in optimization were given by Baes, Hauser and Lim [4, 23, 24, 31].

1.5 Subject of the thesis

In this thesis we deal with the symmetric optimization problem. As may be clear by now, the object of symmetric optimization is to minimize a linear function over the intersection of an affine subspace and a symmetric cone. Symmetric optimization problems offer a unified framework for linear optimization, second-order cone optimization, semi-definite optimization and combinations of these.

1.5.1 Our approach to symmetric optimization

As said before, our aim is to generalize the kernel function based approach of Peng et al. [40-43] and Bai et al. [9] for LO to symmetric optimization. We thus obtain an interior-point method for solving symmetric optimization problems. It relies on a real univariate function, called kernel function, which generates an associated barrier function. The way to generate the barrier function is via the eigenvalues of its argument, for which the theory of Euclidean Jordan algebras provides the necessary framework. The barrier function is used to define the search direction and to define a proximity measure of the current iterate to the central path.

Many things generalize word-by-word, but others do not. In fact, we encountered some difficulties: while the analysis of the method in LO deals with the coordinates of the vectors, in symmetric optimization we have to deal with the eigenvalues (or spectral decomposition) and sometimes even with the so-called Peirce decomposition in a Euclidean Jordan algebra (see Section 2.9). This fact produced the main differences between LO and symmetric optimization. In particular, we had to establish some similarity properties, which replaced some equalities in LO. This is important because if two elements are similar, the value of the barrier function is the same for both elements.

By definition a kernel function is e-convex. A main issue in the kernel function based approach is to prove that the associated barrier function is also e-convex, because only then are we able to perform the analysis of the algorithm. The proof of e-convexity of the barrier function turned out to be the most demanding part of our work.

During our work we developed formulas for some derivatives of eigenvalues. They are interesting in themselves, despite the fact that we did not use them (see Section 3.4).


1.5.2 Contents of the thesis

We next summarize the contents of each chapter, giving a global picture of this thesis.

Chapter 2: We give a not-so-short but as simple as possible introduction to the theory of Euclidean Jordan algebras and symmetric cones, also explaining the connection between Euclidean Jordan algebras and symmetric cones. This introduction is sufficient for the purpose of this thesis and does not require a priori knowledge of Jordan algebras, especially since we have included a large number of proofs. The results presented are not new, but we found it useful to give our own exposition.

Chapter 3: We present some properties of eigenvalues, especially similarity relations and inequalities. We also give formulas for the derivatives of eigenvalues and recall derivative formulas for separable spectral functions, in terms of Euclidean Jordan algebras.

Chapter 4: We recall the notion of a kernel function and discuss how a kernel function can be used to define a barrier function for symmetric cones. Moreover, we establish a crucial inequality for the barrier function based on the so-called exponential convexity of the kernel function.

Chapter 5: We first define the symmetric optimization problem and the new search direction based on kernel functions, using the framework of Euclidean Jordan algebras. Moreover, we prove that we can adapt the algorithm for LO presented in [9] to general symmetric optimization problems, using properties that we developed before.

Chapter 6: This chapter contains some concluding remarks and suggestions for further research.

Appendix: The appendix consists of four sections. One of these contains a quick introduction to quaternions and octonions, including some results; we also prove three properties concerning matrices of quaternions. The remaining sections contain some notions from topology, some matrix properties and a few technical properties that are relevant for the thesis.


Chapter 2

Euclidean Jordan algebras and symmetric cones

This chapter offers an introduction to the theory of Euclidean Jordan algebras as needed for the optimization techniques that we present later. In order to make the thesis as self-supporting as possible, we decided to write down the proofs of quite a large number of properties. We omit proofs in cases that would require the introduction of concepts that are far beyond the purpose of this work. The approach we present closely follows Faraut and Korányi [15].

2.1 Power associative algebras

It will turn out later on that Jordan algebras are power associative. Therefore, to start with, we introduce the notion of a power associative algebra and some of its properties.

Let V be a finite-dimensional vector space over R. A map h : V × V → V is called bilinear if:

(i) h(αu + βv, w) = α h(u, w) + β h(v, w) for all u, v, w ∈ V and α, β ∈ R;

(ii) h(w, αu + βv) = α h(w, u) + β h(w, v) for all u, v, w ∈ V and α, β ∈ R.

If there exists a bilinear map (x, y) → x ∘ y from V × V into V, then (V, ∘) is called an algebra over R (also called an R-algebra). We call ∘ the product of (V, ∘). If the product ∘ is associative, that is, for all x, y, z ∈ V,

x ∘ (y ∘ z) = (x ∘ y) ∘ z,

then (V, ∘) is called an associative algebra. Moreover, an R-algebra (V, ∘) is commutative if we have, for all x, y ∈ V,

x ∘ y = y ∘ x.


If for some e ∈ V,

x ∘ e = e ∘ x = x

for every x ∈ V, then e is called an identity element of V. V can have at most one identity element, because if e_1 and e_2 are identity elements then e_1 = e_1 ∘ e_2 = e_2 ∘ e_1 = e_2. So, if it exists, the identity element e is unique.

In an algebra (V, ∘) with identity element e, we recursively define powers of elements as follows:

x^0 := e,  x^n := x ∘ x^{n−1} for n ∈ N.

Definition 2.1.1. An R-algebra (V, ∘) is power associative if it has an identity element and for any x ∈ V and nonnegative integers p, q one has x^p ∘ x^q = x^{p+q}.

From now on, we assume that (V, ∘) is a power associative R-algebra with identity element e. We let R[X] denote the algebra over R of all polynomials in the variable X with coefficients in R and with the usual product of polynomials.

For an element x in V we define

R[x] := {p(x) : p ∈ R[X]}.

Let U be a subspace of V. We say that U is a subalgebra of V if x ∘ y ∈ U for all x, y ∈ U. The subalgebra of V generated by x and e consists of all linear combinations, with coefficients in R, of e and powers of x. Obviously, the algebra (V, ∘) is power associative if for each x ∈ V the subalgebra generated by x and e, i.e. R[x], is associative.

Since V is a finite-dimensional vector space, for each x ∈ V there exists a positive integer k such that the set {e, x, x^2, ..., x^k} is linearly dependent. This implies the existence of a polynomial p ≠ 0 such that p(x) = 0. Recall that a monic polynomial is a polynomial with leading coefficient equal to 1. We define the minimal polynomial of x ∈ V as the monic polynomial p ∈ R[X] of minimal degree such that p(x) = 0. The minimal polynomial is unique, because if p_1 and p_2 are two distinct minimal polynomials of x, then we have p_1(x) − p_2(x) = 0. Since p_1 and p_2 are monic polynomials, the degree of p_1 − p_2 is less than the degree of p_1, which contradicts the minimality of p_1.

We define the degree of x, denoted as degree(x), as the degree of the minimal polynomial of x. Obviously degree(x) ≤ dim(V), where dim(V) denotes the dimension of the vector space V over R.

We define the rank of V as

rank(V) := max{degree(x) : x ∈ V}.

An element x ∈ V is called regular if degree(x) = rank(V).

The next proposition uses the notions of dense and open sets. The meaning of these well-known concepts is presented in Appendix A.
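As a small aside (our illustration, not part of the original text), the notion of degree is easy to experiment with numerically. In the algebra S^n of real symmetric matrices with the product X ∘ Y = (XY + YX)/2 (introduced below in Example 2.1.5), the Jordan powers of X coincide with the ordinary matrix powers, so degree(X) can be found by testing when the powers of X become linearly dependent.

```python
import numpy as np

# Numerical sketch: degree(X) is the smallest k such that
# {e, X, ..., X^k} is linearly dependent.  In S^n the Jordan powers of a
# symmetric matrix coincide with its ordinary matrix powers.
def degree(X, tol=1e-9):
    n = X.shape[0]
    powers = [np.eye(n).ravel()]
    for k in range(1, n + 1):
        powers.append(np.linalg.matrix_power(X, k).ravel())
        if np.linalg.matrix_rank(np.vstack(powers), tol=tol) < len(powers):
            return k
    return n

X = np.diag([1.0, 1.0, 2.0])   # two distinct eigenvalues
print(degree(X))               # 2, while rank(S^3) = 3
```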


Proposition 2.1.2 (Proposition II.2.1 in [15]). Let (V, ∘) be a power associative R-algebra with rank r. The set of regular elements is open and dense in V. There exist polynomials a_1, a_2, ..., a_r on V, with a_i(x) ∈ R for each x ∈ V, such that the minimal polynomial of every regular element x ∈ V, in the variable λ, is given by

f(λ; x) = λ^r − a_1(x) λ^{r−1} + a_2(x) λ^{r−2} − · · · + (−1)^r a_r(x).

The polynomials a_1, ..., a_r are unique, and a_j is homogeneous of degree j, for 1 ≤ j ≤ r.

The polynomial f(λ; x) in the above proposition is called the characteristic polynomial of x. Furthermore, the proposition immediately implies that f(x; x) = 0. Its proof is beyond the scope of our work, but it is not hard to obtain the functions a_j(x). Let y be a regular element. Then the elements e, y, y^2, ..., y^{r−1} ∈ V are independent and there exist elements b_1, ..., b_{n−r} ∈ V such that

B = {e, y, y^2, ..., y^{r−1}, b_1, ..., b_{n−r}}

is a basis of V. Note that we want to find polynomials a_j such that

x^r − a_1(x) x^{r−1} + a_2(x) x^{r−2} − · · · + (−1)^r a_r(x) e = 0.

This equation can be thought of as a system of n linear equations in the r unknowns a_j(x), with respect to the basis B. By Cramer's rule we get

a_j(x) = (−1)^{j−1} Det(e, x, ..., x^{j−1}, x^r, x^{j+1}, ..., x^{r−1}, b_1, ..., b_{n−r}) / Det(e, x, ..., x^{r−1}, b_1, ..., b_{n−r}),

where Det denotes the usual matrix determinant. It can be proved that the a_j, j = 1, ..., r, are indeed polynomials; see [15].

Since the regular elements are dense in V, we can extend the polynomials a_i(x) by continuity to all elements of V, and consequently also the characteristic polynomial. Moreover, the minimal polynomial is equal to the characteristic polynomial for regular elements (as stated in Proposition 2.1.2), but it divides the characteristic polynomial of non-regular elements.

For an element x ∈ V, let L(x) : V → V be the linear map defined by

L(x)y := x ∘ y, for all y ∈ V.  (2.1)

By its definition, L(x) is linear in x, and L(e) is the identity operator, which we denote by I.

Let x ∈ V be a regular element and let L_0(x) be the restriction of L(x) to R[x]. The set B := {e, x, ..., x^{r−1}} is a basis of R[x], since x is regular. Then it is clear that the matrix of L_0(x) with respect to the basis B is given by the r × r matrix

L_0(x) = \begin{bmatrix}
0 & 0 & \cdots & 0 & (−1)^{r−1} a_r(x) \\
1 & 0 & \cdots & 0 & (−1)^{r−2} a_{r−1}(x) \\
0 & 1 & \cdots & 0 & (−1)^{r−3} a_{r−2}(x) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & a_1(x)
\end{bmatrix}.  (2.2)

To build the matrix we used that

L_0(x) x^{r−1} = x^r = a_1(x) x^{r−1} − a_2(x) x^{r−2} + · · · + (−1)^{r−1} a_r(x) e,

by Proposition 2.1.2. Therefore, with Tr and Det denoting the usual trace and determinant of endomorphisms, we have

Tr L_0(x) = a_1(x),  Det L_0(x) = a_r(x).

By the density of the regular elements in V, these equalities extend to non-regular elements.

Definition 2.1.3. The coefficient a_1(x) is called the trace of x, denoted tr(x). The coefficient a_r(x) is called the determinant of x, denoted det(x). The roots of the characteristic polynomial are called the eigenvalues of x.

According to this definition we have

tr(x) = Tr L_0(x),  det(x) = Det L_0(x).

We define the polynomial F(λ; L_0(x)) := Det(λI − L_0(x)), for regular x ∈ V, in the algebra of matrices. The large number of zeros in the matrix of L_0(x), as given by (2.2), makes it easy to compute the determinant of λI − L_0(x), which turns out to be exactly the characteristic polynomial of x: since

Det(λI − L_0(x)) = Det(L_0(λe − x)) = det(λe − x),

where the last equality is due to Definition 2.1.3, it follows that the characteristic polynomial of x is given by

f(λ; x) = det(λe − x).

Hence the eigenvalues of x are precisely the eigenvalues of L_0(x).

Example 2.1.4. Let e be the identity element of V. Since L_0(e) is the identity map, its matrix is the identity matrix. So we have tr(e) = r and det(e) = 1. Clearly its minimal polynomial is λ − 1 and its characteristic polynomial is given by

f(λ; e) = det(λe − e) = det((λ − 1)e) = a_r((λ − 1)e).

Since a_r is homogeneous of degree r, it follows that

f(λ; e) = (λ − 1)^r a_r(e) = (λ − 1)^r det(e) = (λ − 1)^r.

Note that λ − 1 divides (λ − 1)^r.

Since the trace is a homogeneous polynomial of degree 1, we have

tr(x + y) = tr(x) + tr(y).

In general it is not true that det(x ∘ y) = det(x) det(y). This is illustrated by a simple example.

Example 2.1.5. Let S^n be the Euclidean space of n × n real symmetric matrices, and let ∘ be the binary operation defined on S^n by

X ∘ Y := (XY + YX)/2,

where XY denotes the usual matrix product. Since the usual product of matrices is associative, (S^n, ∘) is a power associative R-algebra. Since for X ∈ S^n the characteristic polynomial is Det(λI − X), the determinant det(X) is the usual determinant of a matrix. Now, let

X := \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}  and  Y := \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}.

It easily follows that

Det(X ∘ Y) = 4 ≠ 5 = Det(X) Det(Y).

Hence, in general, det(x ∘ y) ≠ det(x) det(y).
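The counterexample can be reproduced numerically; the following numpy sketch (ours, purely illustrative) computes both sides.

```python
import numpy as np

# Numerical check of Example 2.1.5: in (S^n, o) with
# X o Y := (XY + YX)/2, the determinant is not multiplicative.
def jordan(X, Y):
    return (X @ Y + Y @ X) / 2

X = np.array([[1.0, 1.0], [1.0, 2.0]])
Y = np.array([[3.0, 1.0], [1.0, 2.0]])
print(np.linalg.det(jordan(X, Y)))           # 4.0
print(np.linalg.det(X) * np.linalg.det(Y))   # 5.0
```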

After this example, the following result is interesting.

Proposition 2.1.6 (Proposition II.2.2 in [15]). For all u in V, and x, y in R[u], we have

det(x ∘ y) = det(x) det(y).

Proof. We assume first that u is regular. For x in R[u], let L_0(x) denote the restriction of L(x) to R[u]. The algebra R[u] is associative. Therefore, for any x and y in R[u],

L_0(x ∘ y) = L_0(x) L_0(y),

and hence we have

det(x ∘ y) = Det L_0(x ∘ y) = Det L_0(x) Det L_0(y) = det(x) det(y),

i.e., for any polynomials p and q in R[u] we have

det(p(u) ∘ q(u)) = det p(u) det q(u).

Finally, the set of regular elements in V is dense in V, and by continuity in u it follows that the last equality holds for any u in V. □


An element x is said to be an invertible element if there exists an element y in R[x] such that x ∘ y = e. Since R[x] is associative, y is unique. It is called the inverse of x and is denoted by y := x^{−1}. If y ∈ V and x ∘ y = e, it is not necessarily true that y is the inverse of x. Before giving an example we first deal with the following result.

Proposition 2.1.7 (Proposition II.2.3 in [15]). If L(x) is invertible, then x is invertible and x^{−1} = L(x)^{−1}e.

Proof. If L(x) is invertible, the restriction L_0(x) of L(x) to R[x] is one-to-one and onto. Hence y = L(x)^{−1}e belongs to R[x] and x ∘ y = x ∘ L(x)^{−1}e = L(x)L(x)^{−1}e = e. □

The following example also shows that the converse of Proposition 2.1.7 is not true.

Example 2.1.8. Let V be the vector space of 2 × 2 symmetric matrices with the product ∘ defined in Example 2.1.5. Now consider

X := \begin{bmatrix} 1 & 0 \\ 0 & −1 \end{bmatrix},  Y := \begin{bmatrix} 1 & a \\ a & −1 \end{bmatrix},  Z := \begin{bmatrix} 0 & a \\ a & 0 \end{bmatrix},  a ∈ R.

Then X is invertible and X^{−1} = X. We have X ∘ Y = I, but Y does not belong to R[X] for a ≠ 0. Also, L(X) is not invertible, since L(X)Z = 0.

Proposition 2.1.9 (Proposition II.2.4 in [15]). An element x is invertible if and only if det(x) ≠ 0, and then

x^{−1} = q(x)/det(x),  (2.3)

where

q(x) := (−1)^{r−1} (x^{r−1} − a_1(x) x^{r−2} + · · · + (−1)^{r−1} a_{r−1}(x) e).

Proof. We have

x ∘ q(x) = x ∘ ((−1)^{r−1} (x^{r−1} − a_1(x) x^{r−2} + · · · + (−1)^{r−1} a_{r−1}(x) e))
         = (−1)^{r−1} (x^r − a_1(x) x^{r−1} + · · · + (−1)^{r−1} a_{r−1}(x) x)
         = (−1)^{r−1} ((−1)(−1)^r a_r(x) e)   [by Proposition 2.1.2]
         = (−1)^{2r} a_r(x) e = det(x) e.

Therefore, if det(x) ≠ 0, then x is invertible and x^{−1} is given by (2.3). Conversely, if x is invertible, there exists a polynomial p such that x ∘ p(x) = e, and, by Proposition 2.1.6,

det(x) det(p(x)) = 1,

therefore det(x) ≠ 0. □
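For the rank-2 algebra S^2 of 2 × 2 symmetric matrices, Proposition 2.1.9 gives q(X) = tr(X)I − X, so X^{−1} = (tr(X)I − X)/det(X). The following numpy sketch (ours, purely illustrative) verifies this adjugate-type formula.

```python
import numpy as np

# Sketch of Proposition 2.1.9 for r = 2: there
# q(X) = tr(X) I - X, so X^{-1} = (tr(X) I - X) / det(X).
X = np.array([[2.0, 1.0], [1.0, 3.0]])
X_inv = (np.trace(X) * np.eye(2) - X) / np.linalg.det(X)
print(np.allclose(X @ X_inv, np.eye(2)))   # True
```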

The above proposition establishes that the set of invertible elements is given by

I := {x ∈ V : det(x) ≠ 0}.  (2.4)

By Proposition 2.1.7,

L := {x ∈ V : L(x) is invertible}

is a subset of I. Moreover, we have the following result.

Proposition 2.1.10. The set L = {x ∈ V : Det L(x) ≠ 0} is dense in the set of invertible elements. In other words, cl(L) = I.

Proof. Since L is a subset of I, it is enough to prove that I is a subset of the closure of L. Let y ∈ I. If Det L(y) ≠ 0 then y ∈ L. Suppose now that Det L(y) = 0. Then y belongs to cl(L) if y − εe ∈ L for all small enough ε > 0. Let

p(ε) := Det(L(y − εe)) = Det(L(y) − εI),

where I denotes the identity operator. Since the roots of p are the eigenvalues of L(y), there exists

α* := min{|α| : p(α) = 0 and α ≠ 0}.

Hence, for all ε with 0 < ε < α*, we have p(ε) ≠ 0. Thus Det(L(y − εe)) ≠ 0, which is equivalent to y − εe ∈ L, and this implies that y ∈ cl(L). The result follows. □

2.2 Jordan algebras

Definition 2.2.1. Let (V, ∘) be a finite-dimensional R-algebra. Then (V, ∘) is a Jordan R-algebra if

(J1) x ∘ y = y ∘ x for all x, y ∈ V,

(J2) x ∘ (x^2 ∘ y) = x^2 ∘ (x ∘ y) for all x, y ∈ V.

Using the definition (2.1) of L(x), it is clear that (J1) and (J2) are equivalent to

(J1*) L(x)y = L(y)x for all x, y ∈ V,

(J2*) L(x)L(x^2) = L(x^2)L(x) for all x ∈ V.

Property (J2*) means that the operators L(x) and L(x^2) commute. For any two endomorphisms S and T of the vector space V, the operator [S, T] := ST − TS is called the commutator of S and T. Hence property (J2*) can also be written as

[L(x), L(x^2)] = 0 for all x ∈ V.  (2.5)

We sometimes abbreviate the notation (V, ∘) to V when there is no possible confusion about the product.
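As a quick numerical sanity check (ours, not part of the original text), the following numpy sketch verifies (J1) and (J2) for the product X ∘ Y = (XY + YX)/2 on random symmetric matrices; this algebra is treated formally in Example 2.2.7 below.

```python
import numpy as np

# Verify the Jordan axioms (J1) and (J2) for X o Y := (XY + YX)/2.
rng = np.random.default_rng(0)

def sym(n):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

def jordan(X, Y):
    return (X @ Y + Y @ X) / 2

X, Y = sym(4), sym(4)
X2 = jordan(X, X)
print(np.allclose(jordan(X, Y), jordan(Y, X)))                         # (J1)
print(np.allclose(jordan(X, jordan(X2, Y)), jordan(X2, jordan(X, Y)))) # (J2)
```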


Proposition 2.2.2. Let (V, ∘) be a Jordan R-algebra. Then the following identities hold for all x, y, z ∈ V:

(i) [L(y), L(x^2)] + 2[L(x), L(x ∘ y)] = 0,

(ii) [L(x), L(y ∘ z)] + [L(y), L(z ∘ x)] + [L(z), L(x ∘ y)] = 0,

(iii) L(x^2 ∘ y) − L(x^2)L(y) = 2(L(x ∘ y) − L(x)L(y))L(x).

Proof. By (2.5) we have, for each t ∈ R,

0 = [L(x + ty), L((x + ty)^2)]
  = [L(x) + tL(y), L(x^2) + 2tL(x ∘ y) + t^2 L(y^2)]
  = [L(x), L(x^2)] + t(2[L(x), L(x ∘ y)] + [L(y), L(x^2)]) + t^2(2[L(y), L(x ∘ y)] + [L(x), L(y^2)]) + t^3 [L(y), L(y^2)].

The first and last terms vanish by (2.5). So we have, for all t ∈ R,

t(2[L(x), L(x ∘ y)] + [L(y), L(x^2)]) + t^2(2[L(y), L(x ∘ y)] + [L(x), L(y^2)]) = 0,

therefore

2[L(x), L(x ∘ y)] + [L(y), L(x^2)] = 0,

and (i) follows. If we replace x by x + tz and y by y + tz, with t ∈ R, in (i) and proceed as before, one obtains (ii). Applying both sides of (i) to an element z ∈ V, the resulting identity can be rewritten as

L(y)(x^2 ∘ z) − L(x^2)(y ∘ z) = 2L(x ∘ y)(x ∘ z) − 2L(x)((x ∘ y) ∘ z),

which is valid for all x, y, z ∈ V. Using (J1*), this is the same as

L(x^2 ∘ z)y − L(x^2)L(z)y = 2L(x ∘ z)L(x)y − 2L(x)L(z)L(x)y,

and this means that

L(x^2 ∘ z) − L(x^2)L(z) = 2L(x ∘ z)L(x) − 2L(x)L(z)L(x),

which is just the identity (iii), with z replaced by y. □

Jordan algebras are not necessarily associative, but they are power associative, as becomes clear in the next result.

Proposition 2.2.3 (Proposition II.1.2 in [15]).

(i) Let V be a Jordan R-algebra. Then, for any x in V and any positive integers p and q,

[L(x^p), L(x^q)] = 0.

(ii) Any Jordan algebra is power associative.

Proof. (i) Let End(V) be the set of endomorphisms of V. We use the identity (iii) of Proposition 2.2.2 with y := x^{n−1} and n ∈ N, which gives

L(x^{n+1}) = L(x^2)L(x^{n−1}) + 2L(x^n)L(x) − 2L(x)L(x^{n−1})L(x), n ∈ N.

This implies, by induction on n, that for every n, L(x^n) belongs to the subalgebra of End(V) generated by L(x) and L(x^2), which is commutative by (J2*). In other words, L(x^n) is a polynomial in L(x) and L(x^2), and (i) follows.

(ii) We need to show that x^p ∘ x^q = x^{p+q} for all positive integers p and q. We start with q = 2 and use induction on p. By definition it is obviously true if p = 1. Using (J2) we may write

x^2 ∘ x^{p+1} = x^2 ∘ (x ∘ x^p) = x ∘ (x^2 ∘ x^p) = x ∘ x^{2+p} = x^{3+p}.

Now using induction on q we obtain

x^{p+q+1} = x^{p+q} ∘ x = (x^p ∘ x^q) ∘ x = L(x)L(x^p)x^q,

and using (i):

x^{p+q+1} = L(x^p)L(x)x^q = x^p ∘ x^{q+1}.

The proposition is proved. □

Corollary 2.2.4. For all u in V, and x, y in R[u], we have

L(x)L(y) = L(y)L(x).

Proof. The proof immediately follows from Proposition 2.2.3, since x and y are polynomials in u. □

The following three examples illustrate some of the properties presented before.

Example 2.2.5. Defining for all x, y ∈ R^n the operation

x ∘ y := (x_1 y_1; ...; x_n y_n),

one easily verifies that (R^n, ∘) is a Jordan R-algebra. The characteristic polynomial of x ∈ R^n is

f(λ; x) = (λ − x_1)(λ − x_2) · · · (λ − x_n),

and rank(R^n) = n. Consequently, the trace of x is the sum of all components of the vector x and the determinant is their product. The identity element is e = (1; ...; 1) and the inverse element of x is

x^{−1} = (x_1^{−1}; x_2^{−1}; ...; x_n^{−1}),

if it exists.


Example 2.2.6. We denote (x_0; x_1; ...; x_n) ∈ R^{n+1} as x = (x_0; x̄) with x̄ := (x_1; ...; x_n), and define the product as

x ∘ y := (x^T y; x_0 ȳ + y_0 x̄).

Then (R^{n+1}, ∘) is a Jordan R-algebra. One easily sees that the identity element is e = (1; 0; ...; 0). Suppose that x and e are linearly independent. We want to show that rank(R^{n+1}) = 2, so we need to verify whether there exist α, β ∈ R such that

x^2 = αe + βx.

Since x^2 = (x^T x; 2x_0 x̄), we may rewrite this equation as a system of equations,

x^T x = α + βx_0,  (2.6)
2x_0 x_i = βx_i, i = 1, ..., n.  (2.7)

Since x and e are linearly independent, x̄ ≠ 0, so there exists some i such that x_i ≠ 0. Therefore we get β = 2x_0 and consequently α = x^T x − 2x_0^2 = ‖x̄‖^2 − x_0^2. Hence the characteristic polynomial of x is

λ^2 − 2x_0 λ + x_0^2 − ‖x̄‖^2,

and rank(R^{n+1}) = 2. Obviously, the trace of x is 2x_0 and the determinant is x_0^2 − ‖x̄‖^2. Moreover, we can obtain the inverse element using Proposition 2.1.9:

x^{−1} = (−1)(x − a_1(x)e)/det(x) = (−1)(x − 2x_0 e)/det(x) = (x_0; −x̄)/(x_0^2 − ‖x̄‖^2),

if it exists, i.e., if x_0^2 − ‖x̄‖^2 ≠ 0.
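This algebra is easy to experiment with. The following numpy sketch (ours, purely illustrative) implements the product, determinant and inverse for n = 2 and checks the identity x ∘ x^{−1} = e as well as the characteristic polynomial relation x^2 − 2x_0 x + det(x)e = 0.

```python
import numpy as np

# Sketch of Example 2.2.6: x = (x0; x_bar) in R^{n+1},
# x o y = (x^T y; x0*y_bar + y0*x_bar), det(x) = x0^2 - ||x_bar||^2.
def jordan(x, y):
    return np.concatenate(([x @ y], x[0] * y[1:] + y[0] * x[1:]))

def det(x):
    return x[0]**2 - x[1:] @ x[1:]

def inv(x):
    return np.concatenate(([x[0]], -x[1:])) / det(x)

x = np.array([3.0, 1.0, 2.0])
e = np.array([1.0, 0.0, 0.0])
print(np.allclose(jordan(x, inv(x)), e))                       # x o x^{-1} = e
print(np.allclose(jordan(x, x) - 2*x[0]*x + det(x)*e, 0.0))    # char. polynomial
```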

Example 2.2.7. Let S^n be the Euclidean space of real symmetric matrices, with ∘ defined as in Example 2.1.5. Then (S^n, ∘) is a Jordan R-algebra. Remark that this Jordan algebra is commutative but not associative, contrary to the usual matrix product, which is associative but not commutative. Since X ∘ X = XX, we can easily conclude that the powers of X are the same whether we consider the Jordan algebra (S^n, ∘) or the algebra of symmetric matrices with the usual product. Thus, the characteristic polynomial of X is Det(λI − X) and rank(S^n) = n. Consequently, the trace and the determinant are the usual ones, and the identity and inverse elements are the identity matrix and the inverse matrix.

2.3 Quadratic representation

In this section we give the notion of the quadratic representation of a Jordan algebra. Let (V, ∘) be a finite-dimensional Jordan algebra over R with identity element e. Given x ∈ V, we define

P(x) := 2L(x)^2 − L(x^2).

It will turn out to be much easier to work with the operator P(x) than with L(x). The endomorphisms L(x) and P(x) commute, because we may write

P(x)L(x) = 2L(x)^2 L(x) − L(x^2)L(x) = 2L(x)L(x)^2 − L(x)L(x^2),

where we used Proposition 2.2.3. Therefore,

P(x)L(x) = L(x)P(x).  (2.8)

Example 2.3.1. Let (R^n, ∘) be the Jordan algebra with x ∘ y := (x_1 y_1; ...; x_n y_n), as defined in Example 2.2.5. Then L(x) = Diag(x) and P(x) = Diag(x^2), where Diag(x) denotes the diagonal matrix whose entries are the x_i's in their natural order.

Example 2.3.2. Let (R^{n+1}, ∘) be the Jordan algebra defined in Example 2.2.6. With respect to the canonical basis of R^{n+1}, it is quite standard to obtain the matrix of L(x):

L(x) = \begin{bmatrix} x_0 & x̄^T \\ x̄ & x_0 I \end{bmatrix},

where we identify L(x) with its matrix. By definition, after some elementary calculations, we get

P(x) = \begin{bmatrix} x^T x & 2x_0 x̄^T \\ 2x_0 x̄ & det(x) I + 2x̄ x̄^T \end{bmatrix}.

Example 2.3.3. Let (S^n, ∘) be the Jordan algebra defined in Example 2.2.7, and regard X ∈ S^n as an element of the vector space of n × n real matrices, denoted M_n(R). Choosing the basis {B_11, B_12, ..., B_1n, ..., B_nn} of M_n(R), where B_ij (i, j = 1, ..., n) is the matrix whose entry in row i and column j equals 1 and whose other entries equal 0, after tedious calculations we get

L(X) = (X ⊗ I + I ⊗ X)/2,

where ⊗ denotes the Kronecker product. Using the properties of the Kronecker product, it easily follows that

P(X) = X ⊗ X.

By definition, we obtain P(X)Y = XYX.
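The Kronecker formulas can be verified numerically. The following numpy sketch (ours; it uses the row-major vec convention, which for symmetric X agrees with the column-major one) checks that 2L(X)^2 − L(X^2) = X ⊗ X and that P(X) acts on vec(Y) as Y → XYX.

```python
import numpy as np

# Numerical sketch of Example 2.3.3: L(X) = (X (x) I + I (x) X)/2 acting
# on vec(Y); then P(X) = 2 L(X)^2 - L(X^2) = X (x) X and P(X)vec(Y) = vec(XYX).
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)); X = (A + A.T) / 2
B = rng.standard_normal((3, 3)); Y = (B + B.T) / 2

I = np.eye(3)
L = lambda M: (np.kron(M, I) + np.kron(I, M)) / 2
P = 2 * L(X) @ L(X) - L(X @ X)

print(np.allclose(P, np.kron(X, X)))                    # P(X) = X (x) X
print(np.allclose(P @ Y.ravel(), (X @ Y @ X).ravel()))  # P(X)Y = XYX
```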

Proposition 2.3.4 (Proposition II.3.1 in [15]). An element x ∈ V is invertible if and only if the linear operator P(x) : V → V is invertible. In this case:

P(x)x^{−1} = x,  (2.9)
P(x)^{−1} = P(x^{−1}).  (2.10)


Proof. If P(x) is invertible, then the restriction of P(x) to R[x] is a bijection and y = P(x)^{−1}x belongs to R[x]. Since

P(x)e = 2L(x)^2 e − L(x^2)e = 2L(x)(x ∘ e) − x^2 ∘ e = x^2

and P(x) and L(x) commute, we have

(P(x)^{−1}x) ∘ x = L(x)P(x)^{−1}x = P(x)^{−1}L(x)x = P(x)^{−1}x^2 = e,

proving (2.9). Suppose now that x is invertible. Since R[x] is associative, we have

P(x)x^{−1} = 2L(x)^2 x^{−1} − L(x^2)x^{−1} = 2L(x)(x ∘ x^{−1}) − x^2 ∘ x^{−1} = x.

Since x^{−1} is a polynomial in x, by Proposition 2.2.3, L(x) and L(x^{−1}) commute. Hence, if we replace y by x^{−1} (respectively y = x^{−2}) in Proposition 2.2.2-(iii), we obtain

P(x)L(x^{−1}) = L(x)

and

2L(x)L(x^{−1}) − P(x)L(x^{−2}) = I,

respectively, where I represents the identity operator. Substituting the first equation into the second one, we conclude that

P(x)P(x^{−1}) = I,

proving (2.10). □

Corollary 2.3.5 (Corollary II.3.2 in [15]). The set I of invertible elements is given by

I = {x : Det P(x) ≠ 0}.

Proof. The result follows from Propositions 2.1.9 and 2.3.4. □

In the remainder of this section we prove some more properties of the quadratic representation. For their proofs we need the well-known concepts of derivative and gradient.

From now on, we assume that V is endowed with an inner product, denoted ⟨·, ·⟩, and with the induced norm ‖·‖.

Definition 2.3.6. We say a function h is o(u), denoted h = o(u), if

lim_{u→0} ‖h(u)‖/‖u‖ = 0.

Definition 2.3.7. Let f : U → V, with U an open subset of V. The function f is differentiable at x ∈ U if there exists a linear map g(x) : V → V such that

f(x + u) − f(x) − g(x)u = o(u), u ∈ V.  (2.11)

We denote g(x) by D_x f(x) or f′(x). The function f is differentiable on U if f is differentiable at all points of U. In addition, if the map x → g(x) is continuous at each x ∈ U, then f is said to be continuously differentiable.


If x is a point of the interior of the domain of f and u ∈ V, we say that the function f is differentiable in the direction u at the point x if the limit

D^u_x f(x) := lim_{t→0} (f(x + tu) − f(x))/t

exists. If f is differentiable at x, then D^u_x f(x) exists and is equal to D_x f(x)u. The linear map D_x f(x) is called the derivative of f at x. It is often called the gradient of f and denoted by ∇f(x).

Let v : V → V and w : V → V be differentiable functions. The bilinearity of the operation ∘ and the definition of D^u_x lead to

D^u_x(v ∘ w) = v ∘ D^u_x w + (D^u_x v) ∘ w = L(v)D^u_x w + L(w)D^u_x v.

This is obtained just as for the product of univariate real functions. Consequently,

∇(v ∘ w) = L(v)∇w + L(w)∇v.

The definition of the powers of x ∈ V implies

∇x^{k+1} = L(x)∇x^k + L(x^k).

In particular, we have ∇x = I and ∇x^2 = 2L(x).
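As a numerical illustration (ours, not part of the original text), a finite-difference quotient confirms that the derivative of x → x^2 in the direction u is 2L(x)u; in the algebra S^n this reads 2L(X)U = XU + UX.

```python
import numpy as np

# Finite-difference sketch: the derivative of X -> X^2 in direction U
# is XU + UX, matching grad x^2 = 2 L(x).
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)); X = (A + A.T) / 2
B = rng.standard_normal((3, 3)); U = (B + B.T) / 2

t = 1e-6
fd = ((X + t * U) @ (X + t * U) - X @ X) / t
print(np.allclose(fd, X @ U + U @ X, atol=1e-4))   # True
```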

Proposition 2.3.8. One has:

(i) The gradient of the map x → x^{−1} is −P(x)^{−1}, i.e.,

∇x^{−1} = −P(x)^{−1}.

(ii) If x, y ∈ V are both invertible, then P(x)y is also invertible and, moreover,

(P(x)y)^{−1} = P(x^{−1})y^{−1}.

(iii) For any x, y ∈ V:

P(P(y)x) = P(y)P(x)P(y).

Proof. By differentiating the relations

x^{−1} ∘ x = e and x^2 ∘ x^{−1} = x

in the direction u, we obtain

L(x)D^u_x(x^{−1}) + L(x^{−1})u = 0  (2.12)

and

(2L(x)u) ∘ x^{−1} + L(x^2)D^u_x(x^{−1}) = u,  (2.13)


respectively. Multiplying equation (2.12) by 2L(x) and subtracting equation (2.13) yields (recalling that L(x) and L(x^{−1}) commute)

P(x)D^u_x(x^{−1}) = −u,

and (i) follows. The second and third items will be proved at the same time. Since, by the proof of Proposition 2.3.4,

L(x^{−1})P(x) = P(x)L(x^{−1}) = L(x),

we have

x^{−1} ∘ (P(x)y) = L(x^{−1})P(x)y = L(x)y = x ∘ y.  (2.14)

Since P(x)y = 2x ∘ (x ∘ y) − x^2 ∘ y, we get

D^u_x(P(x)y) = 2u ∘ (x ∘ y) + 2x ∘ (u ∘ y) − 2(x ∘ u) ∘ y = 2(L(u)L(y) − L(y)L(u) + L(u ∘ y))x = 2Q(u, y)x,

where we define Q(u, y) := L(u)L(y) − L(y)L(u) + L(u ∘ y). Regarding the left- and right-hand sides of (2.14) as functions of x and applying D^u_x, we obtain

(−P(x^{−1})u) ∘ (P(x)y) + 2x^{−1} ∘ (Q(u, y)x) = u ∘ y.

Setting u = y^{−1}, and noting that L(y) and L(y^{−1}) commute (so that Q(y^{−1}, y) = L(y^{−1} ∘ y) = I), it follows that

(−P(x^{−1})y^{−1}) ∘ (P(x)y) + 2x^{−1} ∘ x = e,

or

(P(x^{−1})y^{−1}) ∘ (P(x)y) = e.

If L(P(x)y) is invertible, then P(x)y is invertible and

(P(x)y)^{−1} = P(x^{−1})y^{−1}.  (2.15)

Since, by Proposition 2.1.10, the set {(x, y) : Det L(P(x)y) ≠ 0} is dense in {(x, y) : det(P(x)y) ≠ 0}, the identity (2.15) remains valid for all x, y ∈ V such that det(P(x)y) ≠ 0. Now, using (i) to compute the gradients with respect to x of both sides of (2.15) written with the roles of x and y interchanged, i.e. of (P(y)x)^{−1} = P(y^{−1})x^{−1}, we obtain

−P(P(y)x)^{−1} P(y) = −P(y)^{−1} P(x)^{−1},

which can be written as follows:

P(y)P(x)P(y) = P(P(y)x).  (2.16)

Using equation (2.16) and Proposition 2.3.4, it follows that

{(x, y) : det(P(x)y) ≠ 0} = {(x, y) : det(x) det(y) ≠ 0}.

Thus we conclude that formula (2.15) is still valid for all invertible x and y, and the third item is proved for invertible x and y. Since both sides of (2.16) are polynomials in x and y, the result holds in general by continuity. Everything is proved. □


The identity

P(P(y)x) = P(y)P(x)P(y)  (2.17)

is known as the fundamental formula.
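The fundamental formula is easy to check numerically in S^n, where P(X) acts as Y → XYX; the following sketch (ours, purely illustrative) does so for random symmetric matrices.

```python
import numpy as np

# Numerical sketch of P(P(y)x) = P(y)P(x)P(y) in S^n, with P(X): W -> XWX.
rng = np.random.default_rng(3)
def sym(n):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

X, Y = sym(3), sym(3)
Z = Y @ X @ Y                        # Z = P(Y)X
W = sym(3)                           # test element
lhs = Z @ W @ Z                      # P(P(Y)X) W
rhs = Y @ (X @ (Y @ W @ Y) @ X) @ Y  # P(Y) P(X) P(Y) W
print(np.allclose(lhs, rhs))         # True
```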

Corollary 2.3.9. Let x ∈ V and k a positive integer. One has

P(x^k) = P(x)^k.

Proof. If k = 1 the claim is trivially true. We proceed by induction on k. Suppose that P(x^k) = P(x)^k. By the fundamental formula we have

P(P(x^k)x) = P(x^k)P(x)P(x^k).

By the induction hypothesis we have

P(x^k)P(x)P(x^k) = P(x)^k P(x) P(x)^k = P(x)^{2k+1}.

Since P(x^k)x = x^{2k+1}, we obtain

P(x^{2k+1}) = P(x)^{2k+1}.

On the other hand, using the same arguments, we have

P(P(x^k)e) = P(x^k)P(e)P(x^k),

which, by the fundamental formula and the induction hypothesis, implies

P(x^{2k}) = P(x)^{2k}.

It follows that we have proved the property for all positive integers. □

2.4 Euclidean Jordan algebras

A Jordan algebra V over R is said to be Euclidean if there exists an inner product which is associative; in other words, if there exists an inner product, denoted ⟨u, v⟩, such that

⟨x ∘ u, v⟩ = ⟨x, u ∘ v⟩

for all x, u, v in V.

We will assume that (V, ∘, ⟨·, ·⟩) is a finite-dimensional Euclidean Jordan algebra over R with rank r.
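The standard example is S^n with the trace inner product ⟨X, Y⟩ = Tr(XY), which is associative with respect to the Jordan product; the following numpy sketch (ours, purely illustrative) verifies this numerically.

```python
import numpy as np

# Sketch: <X o U, V> = <X, U o V> for the trace inner product on S^n.
rng = np.random.default_rng(4)
def sym(n):
    A = rng.standard_normal((n, n))
    return (A + A.T) / 2

def jordan(X, Y):
    return (X @ Y + Y @ X) / 2

X, U, V = sym(3), sym(3), sym(3)
lhs = np.trace(jordan(X, U) @ V)
rhs = np.trace(X @ jordan(U, V))
print(np.isclose(lhs, rhs))   # True
```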

An element c ∈ V is said to be idempotent if c^2 = c. Two idempotents c and d are said to be orthogonal if c ∘ d = 0. Since then

⟨c, d⟩ = ⟨c^2, d⟩ = ⟨c, c ∘ d⟩ = ⟨c, 0⟩ = 0,

orthogonal idempotents are orthogonal with respect to the inner product. One says that {c_1, ..., c_k}, with k ≤ r, is a complete system of orthogonal idempotents if

c_i^2 = c_i for all i,
c_i ∘ c_j = 0 if i ≠ j,
c_1 + c_2 + · · · + c_k = e.

Theorem 2.4.1 (Theorem III.1.1 in [15]). (Spectral theorem, type I.) For x in V there exist unique real numbers λ_1, ..., λ_k, all distinct, and a unique complete system of orthogonal idempotents {c_1, ..., c_k} such that

x = λ_1 c_1 + · · · + λ_k c_k,

with k = degree(x). For each j = 1, ..., k, we have c_j ∈ R[x]. The numbers λ_j are the eigenvalues of x, and ∑_{j=1}^k λ_j c_j is called the spectral decomposition of x.

Sketch of the proof. Essentially, the proof of the theorem uses the spectral decomposition of an endomorphism; for details, we refer to [36]. For y in R[x], let M(y) be the restriction of L(y) to R[x]. Then M(x) is a symmetric endomorphism of the Euclidean space R[x]. Therefore, applying the spectral decomposition to M(x), there exist non-zero orthogonal projections P₁, . . . , P_k in R[x] such that P₁ + · · · + P_k = I, and real numbers λ_i such that

M(x) = λ₁P₁ + · · · + λ_k P_k.

Furthermore, there exist polynomials p_j such that P_j = p_j(M(x)). We set c_j := p_j(x). Using the associativity of the algebra R[x] we obtain

M(c_j) = M(p_j(x)) = p_j(M(x)) = P_j,

where the second equality follows from the fact that M(u ∘ v) = M(u)M(v) for u, v ∈ R[x] (see Corollary 2.2.4). Similarly,

M(c_i ∘ c_j) = M(c_i)M(c_j) = P_i P_j,

M(∑_j c_j) = ∑_j P_j = I,

M(∑_j λ_j c_j) = ∑_j λ_j P_j = M(x).

Since M is injective (if M(y) = 0 then y = M(y)e = 0) it follows that

c_i² = c_i, c_i ∘ c_j = 0 if i ≠ j,

∑_j c_j = e, ∑_j λ_j c_j = x.

To prove uniqueness, suppose x = ∑_j α_j d_j, where the α_j's are all distinct and d₁, . . . , d_k form a complete system of orthogonal idempotents. We have p(x) = ∑_j p(α_j) d_j for every polynomial p. Defining, for fixed j,

p^{(j)}(X) = ∏_{i≠j} (X − α_i),

it follows that

p^{(j)}(x) = ∏_{i≠j} (α_j − α_i) d_j,

or equivalently

d_j = p^{(j)}(x) / ∏_{i≠j} (α_j − α_i),

since the α_j's are all different. Therefore d_j belongs to R[x]. Since d₁, . . . , d_k is a complete system of idempotents and M is injective, the M(d_j)'s are mutually orthogonal projections, and so the α_j's are necessarily just eigenvalues of M(x). Each d_j is then the orthogonal projection, d_j = M(d_j)e, of e onto the eigenspace of M(x) corresponding to α_j. This makes everything unique. □

We say that an idempotent is primitive if it is non-zero and cannot be written as the sum of two (necessarily orthogonal) non-zero idempotents. We say that c₁, . . . , c_m is a complete system of orthogonal primitive idempotents, or a Jordan frame, if each c_j is a primitive idempotent and if

c_i ∘ c_j = 0, i ≠ j, and ∑_{j=1}^m c_j = e.

Every Jordan frame has exactly r = rank(V) elements, as we prove in the following result.

Theorem 2.4.2 (Theorem III.1.2 in [15]). (Spectral theorem, type II.) Suppose V has rank r. Then for every x in V there exist a Jordan frame c₁, . . . , c_r and real numbers λ₁, . . . , λ_r such that

x = ∑_{j=1}^r λ_j c_j.

The numbers λ_j (with their multiplicities) are uniquely determined by x. Furthermore,

det(x) = ∏_{j=1}^r λ_j, tr(x) = ∑_{j=1}^r λ_j.

More generally,

a_k(x) = ∑_{1≤i₁<···<i_k≤r} λ_{i₁} · · · λ_{i_k},

where a_k (1 ≤ k ≤ r) is the polynomial defined in Proposition 2.1.2.

Proof. If x = ∑_{i=1}^k λ_i c_i is the spectral decomposition given by Theorem 2.4.1, then, clearly, p(x) = ∑_{i=1}^k p(λ_i) c_i for every polynomial p. Therefore, the minimal polynomial of x is

f(X, x) = ∏_{i=1}^k (X − λ_i).

From this we see that k ≤ r, and k = r exactly if x is regular. In the latter case each c_i is primitive; otherwise it would be possible to have a Jordan frame of more than r elements, and one could construct elements in V whose minimal polynomial had degree higher than r, contradicting the definition of rank.

The formulas for det(x), tr(x) and, more generally, a_k(x) are now obvious when x is regular. When x is not regular, since the set of regular elements is dense in V, x is still the limit of a sequence (x^{(n)})_{n∈N} of regular elements. Then x^{(n)} = ∑_{i=1}^r λ_i^{(n)} c_i^{(n)} by the above, where c_1^{(n)}, c_2^{(n)}, . . . , c_r^{(n)} is a Jordan frame for every n ∈ N. Since the converging sequence is contained in a compact set in V, there exists a subsequence along which the limits λ_i = lim_k λ_i^{(n_k)} and c_i = lim_k c_i^{(n_k)} exist. Our statements follow from this. The uniqueness is immediate from Theorem 2.4.1. □

Whenever we refer to the spectral decomposition of an element we mean the spectral decomposition of type II given by Theorem 2.4.2, unless stated otherwise.

The decomposition of type I can be obtained from the decomposition of type II as follows. Let x = ∑_{i=1}^r λ_i c_i be the spectral decomposition of type II of x. Let us define the integers s, q₁, . . . , q_s such that q_s := r and

λ₁(x) = · · · = λ_{q₁}(x) > λ_{q₁+1}(x) = · · · = λ_{q₂}(x) > · · · > λ_{q_s}(x).

Define J_i := {q_{i−1} + 1, . . . , q_i} (with q₀ = 0) and put e_i := ∑_{j∈J_i} c_j. Then e₁, . . . , e_s is a complete system of idempotents and

x = ∑_{i=1}^s λ_{q_i}(x) e_i (2.18)

is the spectral decomposition of type I. The spectral decomposition of type II includes all the eigenvalues of x, including multiplicities, while the type I decomposition includes only the distinct eigenvalues. A direct consequence of Theorems 2.4.1 and 2.4.2 is that the eigenvalues of elements in a Euclidean Jordan algebra are always real. Since e = c₁ + · · · + c_r, by Theorem 2.4.1 the eigenvalues of e are all equal to one, and it immediately follows that tr(e) = r and det(e) = 1. This agrees with Example 2.1.4.
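The grouping (2.18) is easy to carry out numerically. Below is a minimal sketch in the algebra of real symmetric matrices (Example 2.4.5 below), assuming NumPy; the helper name and the tolerance used to decide when two eigenvalues are "equal" are our own choices.

```python
import numpy as np

def type_one_decomposition(X, tol=1e-9):
    """Group the type II spectral decomposition of a symmetric matrix X
    into the type I form (2.18): distinct eigenvalues paired with
    idempotents e_i that sum the rank-one projectors q_j q_j^T."""
    lam, Q = np.linalg.eigh(X)          # type II data: eigenvalues and frame
    lam, Q = lam[::-1], Q[:, ::-1]      # non-increasing order, as in the text
    values, idempotents = [], []
    i = 0
    while i < len(lam):
        j = i
        while j < len(lam) and abs(lam[j] - lam[i]) <= tol:
            j += 1                      # indices J_i of one distinct eigenvalue
        values.append(lam[i])
        idempotents.append(Q[:, i:j] @ Q[:, i:j].T)  # e_i = sum of c_j, j in J_i
        i = j
    return values, idempotents

# sanity check: x equals the recombination of its type I decomposition
X = np.diag([2.0, 2.0, 1.0])
vals, es = type_one_decomposition(X)
assert np.allclose(sum(v * e for v, e in zip(vals, es)), X)
```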

We give the following examples of spectral decompositions.

Example 2.4.3. The Jordan algebra (Rⁿ, ∘) defined in Example 2.2.5 is Euclidean with the associative inner product defined as 〈x, y〉 := tr(x ∘ y) = xᵀy, x, y ∈ Rⁿ. The canonical basis in Rⁿ is a Jordan frame, and we can write any x ∈ Rⁿ as (x₁; x₂; . . . ; x_n) = x₁c₁ + · · · + x_n c_n. In this case, the Jordan frame is unique. We just need to observe that x_i² = x_i is equivalent to x_i = 0 or x_i = 1, for i = 1, . . . , n. □

Example 2.4.4. Consider the Jordan algebra (R^{n+1}, ∘) defined in Example 2.2.6. We have that (R^{n+1}, ∘) is Euclidean with the associative inner product defined as 〈x, y〉 := tr(x ∘ y) = 2xᵀy, x, y ∈ R^{n+1}. Moreover, any x = (x₀; x̄) ∈ R^{n+1}, with x̄ ≠ 0, has the spectral decomposition

x = λ₁(x)c₁(x) + λ₂(x)c₂(x),

where

λ₁(x) = x₀ − ‖x̄‖, λ₂(x) = x₀ + ‖x̄‖

and

c₁(x) = ½ (1; −x̄/‖x̄‖), c₂(x) = ½ (1; x̄/‖x̄‖),

where ‖x̄‖ is the Euclidean norm in Rⁿ. When x̄ = 0 the spectral decomposition is trivial. □
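For concreteness, here is a small numerical sketch of this decomposition, assuming NumPy; soc_spectral is a hypothetical helper name.

```python
import numpy as np

def soc_spectral(x):
    """Spectral decomposition in the algebra of Example 2.4.4:
    x = (x0; xbar) = lam1*c1(x) + lam2*c2(x), lam_{1,2} = x0 -+ ||xbar||."""
    x0, xbar = x[0], x[1:]
    nrm = np.linalg.norm(xbar)
    u = xbar / nrm if nrm > 0 else np.eye(len(xbar))[0]  # any unit vector in the trivial case
    lam1, lam2 = x0 - nrm, x0 + nrm
    c1 = 0.5 * np.concatenate(([1.0], -u))
    c2 = 0.5 * np.concatenate(([1.0], u))
    return lam1, lam2, c1, c2

x = np.array([3.0, 1.0, 2.0])
lam1, lam2, c1, c2 = soc_spectral(x)
assert np.allclose(lam1 * c1 + lam2 * c2, x)   # x = lam1 c1 + lam2 c2
assert np.isclose(2.0 * c1 @ c2, 0.0)          # <c1, c2> = 2 c1^T c2 = 0
```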

Example 2.4.5. Let (Sⁿ, ∘) be the Jordan algebra defined in Example 2.2.7. With the associative inner product defined as 〈X, Y〉 = tr(X ∘ Y) = tr(XY) for all X, Y ∈ Sⁿ, (Sⁿ, ∘) is a Euclidean Jordan algebra. Any X ∈ Sⁿ has the spectral decomposition

X = ∑_{i=1}^n λ_i q_i q_iᵀ,

where λ_i, i = 1, . . . , n, are the eigenvalues of X and q_i, i = 1, . . . , n, are unit eigenvectors of X. Since X is symmetric, all eigenvalues are real numbers. The symmetric matrix q_i q_iᵀ is an idempotent, since

q_i q_iᵀ ∘ q_i q_iᵀ = q_i q_iᵀ q_i q_iᵀ = q_i q_iᵀ.

Moreover, it is easy to see that q₁q₁ᵀ, . . . , q_n q_nᵀ is a Jordan frame. □

We can extend the definition of any real-valued, univariate and continuous function f(·) to elements of a Euclidean Jordan algebra, using the eigenvalues. This can be done as follows. Starting from the (unique) spectral decomposition of type I, x = ∑_{i=1}^s λ_{q_i}(x) e_i (cf. (2.18)), we (uniquely) define the function F : V → V by

F(x) := ∑_{i=1}^s f(λ_{q_i}(x)) e_i. (2.19)

Since

∑_{i=1}^s f(λ_{q_i}(x)) e_i = ∑_{i=1}^r f(λ_i(x)) c_i,

we can also use a spectral decomposition of type II to obtain the value of F at x. Of particular interest are the following examples:


1. The square root: x^{1/2} := λ₁^{1/2} c₁ + · · · + λ_r^{1/2} c_r, whenever all λ_i ≥ 0, and undefined otherwise;

2. The inverse: x⁻¹ := λ₁⁻¹ c₁ + · · · + λ_r⁻¹ c_r, whenever all λ_i ≠ 0, and undefined otherwise.

Note that x⁻¹ given as a linear combination of c₁, . . . , c_r also belongs to R[x], because c_i ∈ R[x] for every i = 1, . . . , r. Moreover, (λ₁⁻¹ c₁ + · · · + λ_r⁻¹ c_r) ∘ x = e. Thus, the expression of x⁻¹ using a spectral decomposition of x is the same as the algebraic inverse element considered before.
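In Sⁿ (Example 2.4.5), extending a function to the algebra via (2.19) amounts to applying f to the eigenvalues and recombining with the frame q_i q_iᵀ. A minimal sketch, assuming NumPy; spectral_apply is a hypothetical helper.

```python
import numpy as np

def spectral_apply(f, X):
    """Extend a univariate f to S^n via (2.19): apply f to the eigenvalues
    of X and recombine with the Jordan frame q_i q_i^T."""
    lam, Q = np.linalg.eigh(X)
    return (Q * f(lam)) @ Q.T

X = np.array([[2.0, 1.0], [1.0, 2.0]])        # eigenvalues 1 and 3
sqrtX = spectral_apply(np.sqrt, X)
invX = spectral_apply(lambda t: 1.0 / t, X)
assert np.allclose(sqrtX @ sqrtX, X)          # (x^{1/2})^2 = x
assert np.allclose(invX @ X, np.eye(2))       # x^{-1} is the matrix inverse
```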

In Chapter 4 we will deal with more examples that are extensively used in this thesis.

Proposition 2.4.6 (Proposition III.1.3 in [15]). Let c be an idempotent in a Jordan algebra. The only possible eigenvalues of L(c) are 0, 1/2, and 1.

Proof. Using identity (iii) of Proposition 2.2.2, with x = y, we obtain

L(x³) = 3L(x²)L(x) − 2L(x)³,

and for x = c:

2L(c)³ − 3L(c)² + L(c) = 0.

Therefore, an eigenvalue λ of L(c) is a solution of

2λ³ − 3λ² + λ = 0,

whose roots are 0, 1/2 and 1. □

Proposition 2.4.7 (Proposition III.1.5 in [15]). Let V be a Jordan algebra over R. The following properties are equivalent.

(i) V is Euclidean.

(ii) The symmetric bilinear form tr(x ∘ y) is positive definite.

Sketch of the proof. (i) ⇒ (ii). By Theorem 2.4.2, any x in V has a spectral decomposition x = ∑_{i=1}^r λ_i c_i, where the λ_i's are real numbers, and x² = ∑_{i=1}^r λ_i² c_i. If x ≠ 0, then

tr(x²) = ∑_i λ_i² > 0.

(ii) ⇒ (i). Let 〈x, y〉 := tr(x ∘ y). Then 〈x, y〉 is a positive definite symmetric bilinear form. It remains to show that 〈·, ·〉 is associative. The proof is beyond the scope of the thesis. We refer to [15] or [4]. □

Proposition 2.4.7 also means that tr(x ∘ y) is an inner product. In the sequel, 〈x, y〉 will always denote this inner product. So

〈x, y〉 := tr(x ∘ y).


The norm induced by this inner product is given by

‖x‖ := √〈x, x〉.

The norm of x ∈ V can be obtained from its eigenvalues. So, let x = ∑_{i=1}^r λ_i(x) c_i be a spectral decomposition of x. It is straightforward to see that

x² = ∑_{i=1}^r λ_i(x)² c_i.

Thus,

‖x‖ := √〈x, x〉 = √tr(x²) = √(∑_{i=1}^r λ_i(x)²).

The operators L(x) and P(x) are self-adjoint with respect to this inner product. To explain this we need some definitions. If g is a linear map from V to V, then we call the linear map g* the adjoint of g, with respect to the inner product 〈·, ·〉, if

〈gx, y〉 = 〈x, g*y〉, for all x, y ∈ V.

If g = g* then g is said to be self-adjoint. Now we can easily prove that L(x) is self-adjoint:

〈L(x)y, z〉 = 〈x ∘ y, z〉 = 〈y ∘ x, z〉 = 〈y, x ∘ z〉 = 〈y, L(x)z〉, ∀x, y, z ∈ V,

where the third equality follows from the associativity of the inner product. As we now show, this implies that the quadratic representation P(x) is also self-adjoint. For x, y, z ∈ V we have

〈P(x)y, z〉 = 〈2L²(x)y − L(x²)y, z〉
           = 〈2L²(x)y, z〉 − 〈L(x²)y, z〉
           = 〈y, 2L²(x)z〉 − 〈y, L(x²)z〉
           = 〈y, P(x)z〉,

where the equalities follow from the definition of the quadratic representation and the fact that L(x) and L(x²) are self-adjoint.

2.5 Symmetric cones

In this section the relevance of the theory of Euclidean Jordan algebras to Symmetric Optimization will be illuminated. It may be useful to recall the definition of a convex cone. Let V be a finite-dimensional real Euclidean space.

Definition 2.5.1. A non-empty subset K of V is a cone if x ∈ K and λ ≥ 0 imply that λx ∈ K. □


A subset S of V is said to be convex if x, y ∈ S and 0 ≤ λ ≤ 1 imply that λx + (1 − λ)y ∈ S. It is clear that K ⊂ V is a convex cone if and only if K is a cone and a convex set. One easily verifies that a cone K is convex if and only if x, y ∈ K imply that x + y ∈ K. The dual cone of K is defined as

K* := {y ∈ V : 〈x, y〉 ≥ 0, ∀x ∈ K}.

It is easy to see that K* is a closed convex cone. If K = K* we say that K is self-dual. The cone K is said to be pointed if K ∩ −K = {0}. In what follows, K will denote a non-empty pointed convex cone.

For a convex cone K we define the automorphism group Aut(K) ⊂ GL(V) of K by

Aut(K) := {g ∈ GL(V) : g(K) = K}, (2.20)

where GL(V) is the set of invertible linear maps g from V into itself. The convex cone K is said to be homogeneous if Aut(K) acts transitively on K⁰, i.e., if for all x, y ∈ K⁰ there exists g ∈ Aut(K) such that gx = y.

Definition 2.5.2. The convex cone K is said to be symmetric if it is homogeneous and self-dual. □

Let V be a Euclidean Jordan R-algebra and let K(V) be the set of all squares, i.e.,

K(V) := {x² : x ∈ V}.

If x² ∈ K(V) and α ≥ 0, then αx² = (√α x)² ∈ K(V). It follows that K(V) is a cone. We call K(V) the cone of squares of the Euclidean Jordan algebra V. To prove convexity we use the dual cone, which is given by

K*(V) = {y ∈ V : 〈y, x²〉 ≥ 0 ∀x ∈ V}.

Since

〈y, x²〉 = 〈y ∘ x, x〉 = 〈L(y)x, x〉,

we have

K*(V) = {y ∈ V : L(y) is positive semidefinite}.

Proposition 2.5.3. The cone K(V) is self-dual.

Proof. Let x² ∈ K(V) with x ∈ V and x = ∑_{i=1}^r λ_i c_i its spectral decomposition. Then

x² = ∑_{i=1}^r λ_i² c_i,

and

L(x²) = ∑_{i=1}^r λ_i² L(c_i).

By Proposition 2.4.6, the operators L(c_i) are positive semidefinite, therefore L(x²) is positive semidefinite and hence x² ∈ K*(V). Conversely, let x belong to K*(V), with spectral decomposition x = ∑_{i=1}^r λ_i c_i; then L(x) is positive semidefinite and in particular

〈L(x)c_i, c_i〉 ≥ 0.

Since the idempotents c₁, . . . , c_r are orthogonal we have

〈L(x)c_i, c_i〉 = λ_i〈c_i, c_i〉 = λ_i‖c_i‖² ≥ 0, ∀i.

Hence the eigenvalues of x are non-negative, so we can write x = y² with

y = ∑_{i=1}^r √λ_i c_i.

Therefore, it follows that K*(V) = K(V). □

Corollary 2.5.4. K(V) is a closed convex cone.

Proof. Since K*(V) is a closed convex cone, it immediately follows from Proposition 2.5.3 that K(V) is closed and convex. □

In the sequel, unless stated otherwise, K will always denote the cone K(V).

Proposition 2.5.5. As before (see (2.4)), we denote by I the set of invertible elements in V. Then

K⁰ = {x² : x ∈ I},

where K⁰ denotes the interior of K.

Proof. We will prove that

Ω := {x² : x ∈ I}

is equal to

Γ := {x ∈ V : L(x) is positive definite}.

Let y = w² ∈ Ω with w ∈ I. Then L(y) is positive definite if

0 < 〈z, L(y)z〉 = 〈z², w²〉 for all z ∈ V \ {0}.

If w = ∑_{i=1}^r λ_i c_i then w² = ∑_{i=1}^r λ_i² c_i. Hence

〈L(w²)z, z〉 = 〈∑_{i=1}^r λ_i² L(c_i)z, z〉.

We claim that for each z ∈ V \ {0} there is an i such that 〈L(c_i)z, z〉 > 0. Suppose, to the contrary, that there exists z ∈ V \ {0} such that 〈L(c_i)z, z〉 = 0 for all i. Remark that L(c_i) is positive semidefinite. Therefore

0 = ∑_{i=1}^r 〈L(c_i)z, z〉 = ∑_{i=1}^r 〈c_i ∘ z, z〉 = 〈e ∘ z, z〉 = 〈z, z〉.

This implies that z = 0, which contradicts the hypothesis z ∈ V \ {0}. Since w is invertible, the eigenvalues of w² are greater than zero. Combining the above, we can now say that L(w²) is positive definite. It follows that Ω ⊂ Γ. Let y ∈ Γ and y = ∑_{i=1}^r λ_i c_i its spectral decomposition. Since, by definition, 〈z, L(y)z〉 > 0 for all z ∈ V \ {0}, we have

〈L(y)c_i, c_i〉 = λ_i〈c_i, c_i〉 = λ_i‖c_i‖² > 0,

which implies that

λ_i = 〈L(y)c_i, c_i〉 / ‖c_i‖² > 0.

Therefore y is invertible and y = w² with

w = ∑_{i=1}^r √λ_i c_i.

It follows that Γ ⊂ Ω, and the result is proved because Γ = (K*)⁰ = K⁰. □

Remark 2.5.6. From Proposition 2.5.5, it immediately follows that K is the closure of {x² : x ∈ I}.

Before proving the following proposition, we give some definitions. Let g : V → V be a self-adjoint linear map. We say g is positive semidefinite (definite) if 〈gx, x〉 ≥ 0 (> 0) for all x ∈ V (x ∈ V \ {0}).

Proposition 2.5.7. We have the following properties:

(i) P(x)K⁰ = K⁰, x ∈ I;

(ii) P(x), with x ∈ K⁰, is positive definite.

Proof. (i) Since K⁰ is convex, P(x)K⁰ is also convex. Note that by Proposition 2.3.8 any z = P(x)y ∈ P(x)K⁰ is invertible, for y ∈ K⁰ and x ∈ I. This means that P(x)K⁰ does not cross the boundary of K⁰. Thus, it is enough to prove that there is a common element of both sets: for invertible x, the element x² = P(x)e belongs to K⁰ and to P(x)K⁰. Hence P(x)K⁰ ⊆ K⁰. Let y ∈ K⁰. Applying what we just proved to x⁻¹ ∈ I, we get P(x)⁻¹y = P(x⁻¹)y ∈ K⁰, and therefore y = P(x)P(x)⁻¹y ∈ P(x)K⁰. It follows that K⁰ ⊆ P(x)K⁰ for x invertible. Thus, P(x)K⁰ = K⁰.

(ii) If x ∈ K⁰ then x = u², with u ∈ I. Therefore

P(x) = P(u²) = P(u)².

Since u is invertible, P(u) is invertible (Proposition 2.3.4) and self-adjoint, and we conclude that P(x) is positive definite.

The proposition is proved. □


Proposition 2.5.8. K is a symmetric cone.

Proof. By Proposition 2.5.3, K is self-dual. For x, y ∈ K⁰ we define

g := P(y^{1/2})P(x^{−1/2}).

By Proposition 2.5.7 and Remark 2.5.6, P(x) ∈ Aut(K) for invertible x. Therefore, g ∈ Aut(K) and

gx = P(y^{1/2})P(x^{−1/2})x = P(y^{1/2})e = y.

We conclude that for any x and y in K⁰ there exists g ∈ Aut(K) such that gx = y. Hence K is homogeneous, and therefore symmetric. □

We have proved that the cone of squares K is a symmetric cone. In fact, the converse also holds. However, its proof is beyond the scope of this thesis. We state the result below without proof.

Theorem 2.5.9 (Theorem III.3.1 in [15]). (Jordan algebraic characterization of symmetric cones.) A cone is symmetric if and only if it is the cone of squares of some Euclidean Jordan algebra.

Proposition 2.5.10. Let V be a Euclidean Jordan R-algebra with rank r and K its cone of squares. If x ∈ V is such that x = ∑_{i=1}^r λ_i c_i, where c₁, . . . , c_r is a Jordan frame, then λ_i ≥ 0 (resp. > 0) for i = 1, . . . , r if and only if x ∈ K (resp. K⁰).

Proof. If λ_i ≥ 0 then we can write x = y², with y = ∑_{i=1}^r √λ_i c_i, which means that x ∈ K. In case λ_i > 0 for i = 1, . . . , r, we know that x is invertible, and it follows that x ∈ K⁰. Conversely, if x ∈ K we have x = y², with y ∈ V. If we denote by α_i the eigenvalues of y, we can write x = ∑_{i=1}^r α_i² c_i, where the α_i²'s are the eigenvalues of x, which are greater than or equal to zero. If x ∈ K⁰ then y is invertible, therefore the α_i²'s are greater than zero. The proposition is proved. □

We give two more properties of the quadratic representation.

Proposition 2.5.11. P(x)^{1/2} = P(x^{1/2}) if x ∈ K.

Proof. We have P(x^{1/2})e = 2L²(x^{1/2})e − L(x)e = x. Therefore, by the fundamental formula,

P(x) = P(P(x^{1/2})e) = P(x^{1/2})P(e)P(x^{1/2}) = P(x^{1/2})².

Since L(x^{1/2}) is self-adjoint, P(x^{1/2}) is self-adjoint and positive semidefinite, and thus P(x)^{1/2} = P(x^{1/2}). □
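This identity can be observed numerically in Sⁿ, where the matrix of P(x) acting on vec(y) is the Kronecker product x ⊗ x. A small sketch, assuming NumPy and SciPy.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
a = rng.standard_normal((3, 3))
x = a @ a.T + 3 * np.eye(3)          # a point in the interior of the PSD cone

Px = np.kron(x, x)                   # matrix of P(x): vec(x y x) = (x kron x) vec(y)
x_half = sqrtm(x).real               # x^{1/2} via the spectral decomposition

# P(x)^{1/2} = P(x^{1/2}), as matrices acting on vec(y):
assert np.allclose(sqrtm(Px).real, np.kron(x_half, x_half))
```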

Proposition 2.5.12 (Proposition III.4.2 in [15]). Let V be a Euclidean Jordan algebra. Then

det(P(y)x) = det(y)² det(x), x, y ∈ V. (2.21)

Sketch of the proof. By the fundamental formula we have

Det P(P(y)x) = Det(P(y)P(x)P(y)) = (Det P(y))² Det P(x).

Since, by Proposition III.4.2 in [15],

Det(P(x)) = (det x)^{2n/r},

with dim V = n and rank(V) = r, the identity (2.21) follows. □

2.6 The natural barrier function

The function B : K⁰ → R is defined as

B(x) := −log det(x).

As the eigenvalues are positive for any x ∈ K⁰ (Proposition 2.5.10), det(x) is positive for any x ∈ K⁰. We conclude that B is well defined.

Since, by Proposition 2.5.10, at least one eigenvalue of an element belonging to the boundary of K is zero, it follows that

B(x) → ∞ as x → ∂K⁰.

We call B the natural barrier function of the cone K.

Below we compute the gradient and the Hessian of B at x ∈ K⁰. Before doing this we recall their definitions. The function f : U ⊂ V → R is said to be twice differentiable if f is continuously differentiable and there exists a linear operator H(x) : V → V such that

g(x + u) − g(x) − H(x)u = o(‖u‖),

where g is the gradient of f at x. If it exists, H(x) is said to be the Hessian of f at x. If the function x ↦ H(x) is continuous at x then H(x) is self-adjoint. In this case, H(x) is the self-adjoint operator such that

D_x^u D_x^v f(x) = 〈H(x)u, v〉.

We denote H(x) by D_x² f(x).

Proposition 2.6.1. One has:

(i) ∇B(x) = −x⁻¹;

(ii) D_x² B(x) = P(x)⁻¹.

Proof. If x = ∑_{i=1}^r λ_i c_i, then det(x) = ∏_{i=1}^r λ_i. Therefore, we can write

log det x = ∑_{i=1}^r log λ_i.

Using tools that will be presented in Chapter 3, the gradient of log det x turns out to be given by

∇ log det x = ∑_{i=1}^r (D_t log t)|_{t=λ_i} c_i = ∑_{i=1}^r λ_i⁻¹ c_i = x⁻¹.

Since, by Proposition 2.3.8(i),

D_x^u x⁻¹ = −P(x)⁻¹u

and

D_x^v log det x = 〈x⁻¹, v〉,

it follows that

D_x^u D_x^v B(x) = D_x^u 〈−x⁻¹, v〉 = 〈P(x)⁻¹u, v〉,

and (ii) follows. □

Proposition 2.6.1(ii) relates the quadratic representation P(x) to the Hessian of the natural barrier function B(x).

Example 2.6.2. Consider the Euclidean Jordan algebra of Example 2.4.3. Then one can easily see that:

(i) K = Rⁿ₊;

(ii) B(x) = −∑_{i=1}^n log(x_i), x = (x₁; . . . ; x_n) ∈ K⁰;

(iii) x is invertible if and only if x_i ≠ 0 for every i.

In this case,

∇B(x) = −(1/x₁; . . . ; 1/x_n).

Further,

D_x^u D_x^v B(x) = ∑_{i=1}^n u_i v_i / x_i², u, v ∈ Rⁿ.

Hence

P(x) = diag(x₁²; . . . ; x_n²),

in agreement with Example 2.3.1. □

Example 2.6.3. Let (R^{n+1}, ∘) be the Euclidean Jordan algebra defined in Example 2.4.4. Then:

(i) K = {(x₀; x̄) ∈ R^{n+1} : x₀ ≥ ‖x̄‖};

(ii) B(x) = −log(x₀² − ‖x̄‖²), for x ∈ K⁰;

(iii) x is invertible if and only if x₀² ≠ ‖x̄‖², and

∇B(x) = −(1/det x)(x₀; −x̄).

Since H(x) = P(x)⁻¹ = P(x⁻¹), and using the matrix representation of P(x) in Example 2.3.2, we obtain

H(x) = (1/det(x)²) [ xᵀx  −2x₀x̄ᵀ ; −2x₀x̄  det(x)I + 2x̄x̄ᵀ ].

This provides an easy way to obtain the Hessian. □
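As a numerical sketch of this example (assuming NumPy; soc_det and soc_P are our own helper names), one can form P(x) from its matrix representation, obtain the Hessian as P(x)⁻¹, and verify the gradient against finite differences; recall that the gradient here is taken with respect to the inner product 〈u, v〉 = 2uᵀv.

```python
import numpy as np

def soc_det(x):
    return x[0]**2 - x[1:] @ x[1:]

def soc_P(x):
    """Matrix of P(x) for Example 2.4.4: P(x) = 2 x x^T - det(x) R."""
    R = np.diag([1.0] + [-1.0] * (len(x) - 1))
    return 2.0 * np.outer(x, x) - soc_det(x) * R

x = np.array([2.0, 0.5, 1.0])                    # x0 > ||xbar||, so x is in K^0
d = soc_det(x)
x_inv = np.concatenate(([x[0]], -x[1:])) / d     # x^{-1} = (x0; -xbar)/det(x)
H = np.linalg.inv(soc_P(x))                      # Hessian H(x) = P(x)^{-1}
assert np.allclose(H, soc_P(x_inv))              # P(x)^{-1} = P(x^{-1})

B = lambda y: -np.log(soc_det(y))
eps = 1e-6
fd = np.array([(B(x + eps * e) - B(x - eps * e)) / (2 * eps) for e in np.eye(3)])
# Euclidean gradient equals 2*grad B since <u, v> = 2 u^T v, and grad B = -x^{-1}:
assert np.allclose(fd, 2.0 * (-x_inv), atol=1e-5)
```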

Example 2.6.4. Let (Sⁿ, ∘) be the Euclidean Jordan algebra defined in Example 2.4.5. We have:

(i) K is the cone of positive semidefinite matrices;

(ii) B(X) = −log Det X is the barrier function of the cone of positive semidefinite matrices, where Det(X) has the usual meaning;

(iii) the matrix X is invertible in (Sⁿ, ∘) if and only if X is invertible with respect to the usual matrix product.

We have (cf. [12])

∇B(X) = −X⁻¹

and

D_X^U D_X^V B(X) = 〈X⁻¹UX⁻¹, V〉, U, V ∈ Sⁿ, X ∈ K⁰.

Further,

P(X)⁻¹U = X⁻¹UX⁻¹

for U ∈ Sⁿ, X ∈ K⁰. □

In the preceding three examples we have encountered the most common symmetric cones and their natural barrier functions in optimization:

(i) The so-called linear cone is the non-negative orthant:

Rⁿ₊ := {x ∈ Rⁿ : x_i ≥ 0, i = 1, . . . , n}.

(ii) The second order cone:

Lⁿ := {(x₀; x̄) ∈ R^{n+1} : x₀ ≥ ‖x̄‖}.

(iii) The positive semidefinite cone:

Sⁿ₊ := {X ∈ Sⁿ : X ⪰ 0},

where X ⪰ 0 means that X is positive semidefinite.


2.7 Simple Jordan algebras

A vector space V is said to be the direct sum of the subspaces V₁, . . . , V_k if V = V₁ + · · · + V_k and V_i ∩ V_j = {0} for i ≠ j. We then write

V = V₁ ⊕ · · · ⊕ V_k.

A Jordan algebra V is said to be the direct sum of the subalgebras V₁, . . . , V_k, symbolically V = V₁ ⊕ · · · ⊕ V_k, if V is the direct sum of the vector spaces V₁, . . . , V_k and if V_i ∘ V_j = {0} holds for i ≠ j.

Definition 2.7.1. Let V be an R-algebra. We say that D is an ideal of V if it is a vector subspace of V and for every a of D and every x of V the elements x ∘ a and a ∘ x belong to D. □

Definition 2.7.2. We say that a (Euclidean Jordan) R-algebra V is simple if its only ideals are {0} and V. □

The ideals {0} and V are called the trivial ideals of V. A non-trivial ideal D is called minimal if there exists no non-zero ideal E such that E is a proper subset of D.

In the next proposition we use the notation

V₁ ∘ V₂ := {u₁ ∘ u₂ : u₁ ∈ V₁, u₂ ∈ V₂},

for any two subsets V₁, V₂ of V.

Proposition 2.7.3 (Proposition III.4.4 in [15]). If V is a Euclidean Jordan algebra, then it is, in a unique way, a direct sum of simple ideals.

Proof. Let D be a minimal ideal in V. We first show that the orthogonal complement

D⊥ := {x ∈ V : 〈x, y〉 = 0, ∀y ∈ D}

is an ideal. Let x be in V and y in D⊥; then for any z in D

〈x ∘ y, z〉 = 〈y, x ∘ z〉 = 0,

because the inner product is associative and x ∘ z belongs to D. Hence x ∘ y belongs to D⊥. Note that, if P_D is the orthogonal projection onto D, then, for every y ∈ V,

y = P_D y + (I − P_D)y.

Since (I − P_D)y ∈ D⊥, this means that we can write V = D ⊕ D⊥. We have

D ∘ D⊥ ⊂ D ∩ D⊥ = {0},

because D and D⊥ are ideals. Therefore V is the direct sum of the ideals D and D⊥. Now we apply the same process to D⊥. Since V is a finite-dimensional vector space, after a finite number of steps we obtain the desired decomposition. To prove the uniqueness we consider two decompositions of V as a direct sum of simple ideals:

V = D₁ ⊕ D₂ ⊕ · · · ⊕ D_k
  = E₁ ⊕ E₂ ⊕ · · · ⊕ E_ℓ.

The intersection D_i ∩ E₁ is an ideal, therefore by minimality either it is {0} or E₁ = D_i. It cannot be {0} for every i, since this would imply

D_i ∘ E₁ ⊂ D_i ∩ E₁ = {0}, ∀i,

hence V ∘ E₁ = {0}, and E₁ = e ∘ E₁ = {0}, a contradiction. So E₁ = D_i for some i. In the same way one shows that every E_j equals some D_i. The proposition is proved. □

The previous proposition immediately implies that any Euclidean Jordan algebra is, in a unique way, a direct sum of simple Euclidean Jordan algebras.

A symmetric cone K in a Euclidean space V is said to be primitive if there do not exist non-trivial subspaces V₁, V₂ and symmetric cones K₁ ⊂ V₁, K₂ ⊂ V₂ such that V is the direct sum of V₁ and V₂ and K = K₁ ⊕ K₂.

Proposition 2.7.4. Any symmetric cone K is, in a unique way, the direct sum of primitive symmetric cones.

Proof. Let V be a Euclidean Jordan algebra and K its associated symmetric cone. By Proposition 2.7.3 we can write

V = V₁ ⊕ · · · ⊕ V_m.

Let K_i be the symmetric cone associated to V_i, for i = 1, . . . , m. Since x_i ∘ x_j = 0 for any x_i ∈ V_i and x_j ∈ V_j with i ≠ j, we have

x₁² + · · · + x_m² = (x₁ + · · · + x_m)².

Hence

K = K₁ ⊕ · · · ⊕ K_m.

The proposition is proved. □

We are now ready to give a fundamental result: it states that, up to isomorphism, there are five families of simple Euclidean Jordan algebras and consequently five families of primitive symmetric cones. The proof is quite extensive and beyond the scope of this thesis. We refer to Chapter V in [15].

Theorem 2.7.5. If V is a simple Euclidean Jordan algebra then it is isomorphic to one of the following:


(i) The space R^{n+1} with Jordan multiplication defined as follows: let x := (x₀; x̄) and y := (y₀; ȳ) with x̄, ȳ ∈ Rⁿ and x₀, y₀ ∈ R; then x ∘ y := (xᵀy; x₀ȳ + y₀x̄).

(ii) The set of real symmetric n × n matrices with ∘ defined by

X ∘ Y := (XY + YX)/2 (2.22)

for symmetric matrices X and Y.

(iii) The set of complex Hermitian n × n matrices with ∘ defined by (2.22).

(iv) The set of Hermitian n × n matrices with quaternion entries and with ∘ defined by (2.22).

(v) The set of 3 × 3 Hermitian matrices with octonion entries and with ∘ defined by (2.22).

As we mentioned before, the cone associated to the simple Euclidean Jordan algebra defined in (i) of Theorem 2.7.5 is the second order cone. For A = R, C, H, we denote by Herm(r, A) the vector space of r × r Hermitian matrices with entries in A (a Jordan algebra of rank r), where H denotes the set of quaternions (for details see Section C.1). The primitive symmetric cones associated to these are the cones of positive semidefinite matrices with entries in A. The cone associated to Herm(3, O), where O denotes the set of the octonions (see Section C.3), is the cone of 3 × 3 Hermitian positive semidefinite matrices with octonion entries. This cone is often referred to as the exceptional 27-dimensional cone.

Note that the non-negative orthant Rⁿ₊ that appeared in several examples before may be written as the direct sum of n copies of K(Herm(1, R)), i.e.,

Rⁿ₊ = K(Herm(1, R)) ⊕ · · · ⊕ K(Herm(1, R)).

Remark 2.7.6. Suppose that we can write V as a direct sum of two simple Euclidean Jordan algebras, i.e., V = V₁ ⊕ V₂. Let x = x₁ + x₂ ∈ V and y = y₁ + y₂ ∈ V, with x₁, y₁ ∈ V₁ and x₂, y₂ ∈ V₂. Consider the spectral decompositions x₁ = ∑_{i=1}^q λ_i(x₁)c_i and x₂ = ∑_{i=1}^s λ_i(x₂)d_i, where c₁, . . . , c_q and d₁, . . . , d_s are Jordan frames in V₁ and V₂, respectively. Note that (c₁, . . . , c_q, d₁, . . . , d_s) is a Jordan frame in V₁ ⊕ V₂. Then

x = ∑_{i=1}^q λ_i(x₁)c_i + ∑_{i=1}^s λ_i(x₂)d_i

is a spectral decomposition of x. By Theorem 2.4.2 it follows that

det(x) = det(x₁) det(x₂)

and

tr(x) = tr(x₁) + tr(x₂).


Moreover,

P(x)y = P(x₁ + x₂)(y₁ + y₂)
      = 2(x₁ + x₂) ∘ ((x₁ + x₂) ∘ y₁) − (x₁ + x₂)² ∘ y₁
        + 2(x₁ + x₂) ∘ ((x₁ + x₂) ∘ y₂) − (x₁ + x₂)² ∘ y₂
      = 2x₁ ∘ (x₁ ∘ y₁) − x₁² ∘ y₁ + 2x₂ ∘ (x₂ ∘ y₂) − x₂² ∘ y₂
      = P(x₁)y₁ + P(x₂)y₂,

where the third equality follows by Proposition 2.7.3. □

2.8 Automorphisms

Let (V, ∘) be a Euclidean Jordan algebra and K its associated symmetric cone. This section introduces some important properties of automorphisms of V and K.

Recall that we denote the set of invertible linear maps from V to V by GL(V).

Definition 2.8.1. A map g ∈ GL(V) is called an automorphism of V if for every x and y in V we have g(x ∘ y) = gx ∘ gy or, equivalently, gL(x)g⁻¹ = L(gx). The set of automorphisms of V is denoted as Aut(V). □

In Section 2.5, we called g ∈ GL(V) an automorphism of the symmetric cone K if gK = K. We denote the automorphism group of a symmetric cone as Aut(K). So we have

Aut(K) = {g ∈ GL(V) : gK = K}.

We say that a linear map g is orthogonal if g* = g⁻¹. The set of orthogonal automorphisms that leave K invariant is denoted as OAut(K). So we have

OAut(K) = {g ∈ GL(V) : g* = g⁻¹ and gK = K}.

We would like to stress that Aut(V) ≠ Aut(K). For example, an element g of Aut(K) may not satisfy g(x ∘ y) = gx ∘ gy, since K is not closed under the Jordan product operation.

Proposition 2.8.2. If g ∈ Aut(K) then g* ∈ Aut(K).

Proof. Let g ∈ Aut(K) and y ∈ K; then for all x ∈ K,

〈x, g*y〉 = 〈gx, y〉 ≥ 0,

because gx, y ∈ K and K is self-dual. Therefore g*y ∈ K* = K, so g*K ⊆ K. Applying the same argument to g⁻¹ ∈ Aut(K) yields equality, which implies g* ∈ Aut(K). □

Proposition 2.8.3 (Proposition II.4.2 in [15]). The trace and the determinant are invariant under Aut(V).

Proof. If g is an automorphism of V and p is a polynomial in R[X], then p(gx) = gp(x), and p(gx) = 0 if and only if p(x) = 0. Therefore, the minimal polynomials of x and gx are the same. It follows that x and gx have the same trace and determinant. □

The next theorem establishes a connection between the automorphism group of a symmetric cone and the automorphism group of a Jordan algebra.

Theorem 2.8.4. We have

Aut(V) = OAut(K).

Proof. For the proof that OAut(K) is a subset of Aut(V) we refer to [15, Theorem III.5.1], because it needs details that are far beyond this introduction to Euclidean Jordan algebras and symmetric cones. Here we prove that Aut(V) ⊂ OAut(K). Let g ∈ Aut(V); then for any x ∈ V, gx² = (gx)² ∈ K, which implies that gK = K. Furthermore,

〈gx, y〉 = tr(gx ∘ y) = tr(g(x ∘ g⁻¹y)) = tr(x ∘ g⁻¹y) = 〈x, g⁻¹y〉.

Note that the third equality follows from Proposition 2.8.3. Thus, g* = g⁻¹. □

Theorem 2.8.5 (Theorem IV.2.5 in [15]). If {c₁, . . . , c_r} and {d₁, . . . , d_r} are two Jordan frames, then there exists g ∈ Aut(V) such that

g c_i = d_i (1 ≤ i ≤ r).

2.9 The Peirce decomposition in a Jordan algebra

Let c be an idempotent element in a Jordan algebra V. Note that we do not assume that V is Euclidean. By Proposition 2.4.6 the only possible eigenvalues of L(c) are 1, 1/2 and 0. One introduces the subspaces

V(c, λ) := {x ∈ V : c ∘ x = λx}, λ ∈ {0, 1/2, 1}.

Each of these subspaces is the eigenspace corresponding to an eigenvalue of L(c), because if x ∈ V(c, λ) then L(c)x = c ∘ x = λx. Hence V is the direct sum of the corresponding subspaces V(c, 1), V(c, 1/2) and V(c, 0). The decomposition

V = V(c, 1) ⊕ V(c, 1/2) ⊕ V(c, 0) (2.23)

is called the Peirce decomposition of V with respect to the idempotent c. If x ∈ V(c, 1) and y ∈ V(c, 1/2) then L(c)x = x and L(c)y = ½y. Hence

〈x, y〉 = 〈L(c)x, y〉 = 〈x, L(c)y〉 = 〈x, ½y〉,

and it follows that 〈x, y〉 = 0. Therefore the decomposition (2.23) is orthogonal with respect to the inner product.


Example 2.9.1. Let (V, ∘) be the Jordan algebra (Sⁿ, ∘) defined in Example 2.2.7. Let n = p + q; then

c := [ I_p  0 ; 0  0 ]

is an orthogonal projection from Rⁿ to Rⁿ and hence an idempotent. We now have

V(c, 1) = { [ A  0 ; 0  0 ] : A is a p × p symmetric matrix },

V(c, 0) = { [ 0  0 ; 0  B ] : B is a q × q symmetric matrix },

V(c, 1/2) = { [ 0  D ; Dᵀ  0 ] : D is a p × q matrix },

with p + q = n. □
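Numerically, the three Peirce components of a symmetric matrix are obtained from the projections of Proposition 2.9.5 below, which in Sⁿ read P(c)X = cXc and P(e − c)X = (I − c)X(I − c). A minimal sketch, assuming NumPy; the helper names are our own.

```python
import numpy as np

def jordan(a, b):
    """Jordan product in S^n (Example 2.2.7)."""
    return 0.5 * (a @ b + b @ a)

p, q = 2, 3
n = p + q
c = np.zeros((n, n)); c[:p, :p] = np.eye(p)     # the idempotent of Example 2.9.1

rng = np.random.default_rng(2)
X = rng.standard_normal((n, n)); X = X + X.T    # a generic element of S^n

X1 = c @ X @ c                                   # P(c)X, the V(c,1) block
X0 = (np.eye(n) - c) @ X @ (np.eye(n) - c)       # P(e-c)X, the V(c,0) block
Xhalf = X - X1 - X0                              # remainder: the V(c,1/2) block

assert np.allclose(jordan(c, X1), X1)            # eigenvalue 1
assert np.allclose(jordan(c, X0), 0.0)           # eigenvalue 0
assert np.allclose(jordan(c, Xhalf), 0.5 * Xhalf)  # eigenvalue 1/2
```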

Proposition 2.9.2. If x ∈ V(c, 0) or x ∈ V(c, 1) then [L(x), L(c)] = 0.

Proof. If we replace both y and z by c in identity (ii) of Proposition 2.2.2, we get

[L(x), L(c)] + [L(c), L(c ∘ x)] + [L(c), L(x ∘ c)] = 0.

Since x ∈ V(c, 0) or x ∈ V(c, 1) means that x ∘ c = 0 or x ∘ c = x, the result follows. □

Proposition 2.9.3 (Proposition IV.1.1 in [15]). Let V be a Jordan algebra and c an idempotent. The subspaces V(c, 1) and V(c, 0) are subalgebras of V. They are orthogonal in the sense that

V(c, 1) ∘ V(c, 0) = {0}.

Furthermore,

(V(c, 1) + V(c, 0)) ∘ V(c, 1/2) ⊂ V(c, 1/2),

V(c, 1/2) ∘ V(c, 1/2) ⊂ V(c, 1) + V(c, 0).

Proof. Using identity (iii) of Proposition 2.2.2, with c instead of x and x instead of y, we obtain, for x ∈ V(c, λ),

(L(c) − λI)L(x)(2L(c) − I) = 0,

and applying it to y ∈ V(c, µ) (λ, µ ∈ {0, 1/2, 1}), it follows that

(2µ − 1)(L(c) − λI)(x ∘ y) = 0.

If µ = 0 or 1, then

L(c)(x ∘ y) = λ(x ∘ y),

which means that x ∘ y belongs to V(c, λ). Taking λ = 0 or 1 and µ = λ, this proves that V(c, 1) and V(c, 0) are subalgebras of V. Taking again λ = 0 or 1 and µ = 0 or 1 but µ ≠ λ, and using Proposition 2.9.2, this proves that V(c, 1) and V(c, 0) are orthogonal. Again, for µ = 0 or 1 and taking λ = 1/2, we have

L(c)(x ∘ y) = ½(x ∘ y),

which means that x ∘ y ∈ V(c, 1/2). Therefore

V(c, 1) ∘ V(c, 1/2) ⊂ V(c, 1/2),

V(c, 0) ∘ V(c, 1/2) ⊂ V(c, 1/2),

and it follows that

(V(c, 1) + V(c, 0)) ∘ V(c, 1/2) ⊂ V(c, 1/2).

For the last assertion it is enough to show that if x belongs to V(c, 1/2), then x² belongs to V(c, 1) + V(c, 0), that is,

x² = a₀ + a₁,

with a₀ ∈ V(c, 0) and a₁ ∈ V(c, 1). In fact, with

a₀ = x² − c ∘ x², a₁ = c ∘ x²,

we have

L(c)a₀ = L(c)x² − L(c)L(c)x²
       = (I − L(c))a₁
       = (L(x²)L(c) − L(x² ∘ c))c,

and using identity (iii) of Proposition 2.2.2:

L(c)a₀ = 2(L(x)L(c) − L(x ∘ c))L(x)c
       = (L(x)L(c) − ½L(x))x
       = L(x)(L(c) − ½I)x = 0,

because x ∈ V(c, 1/2). Hence a₀ ∈ V(c, 0). From

0 = L(c)a₀ = (I − L(c))a₁

it follows that L(c)a₁ = a₁. To finish, note that for x, y ∈ V(c, 1/2) we have

x², y², (x + y)² ∈ V(c, 0) + V(c, 1).

Since V(c, 0) + V(c, 1) is a vector space, it follows that

x ∘ y = ½((x + y)² − x² − y²) ∈ V(c, 0) + V(c, 1).

Everything is proved. □


Remark 2.9.4. From Proposition 2.9.3 and its proof we get that V(c, 0) and V(c, 1) are Jordan algebras. □

We say that a linear map Q : V → V is an orthogonal projection if Q² = Q and Q is self-adjoint, i.e., Q* = Q. We say that two orthogonal projections Q₁ and Q₂ are orthogonal with respect to each other if Q₁Q₂ = 0.

Proposition 2.9.5. The orthogonal projections onto V(c, 1), V(c, 0) and V(c, 1/2) are, respectively, P(c), P(e − c) and I − P(c) − P(e − c). Moreover, they are orthogonal with respect to each other.

Sketch of the proof. We will just show that P(c) is an orthogonal projection onto V(c, 1); the other cases are similar. Clearly P(c)² = P(c²) = P(c), and P(c) is self-adjoint. Let y ∈ V and y = y₁ + y_{1/2} + y₀ its Peirce decomposition with respect to c, where y₁ ∈ V(c, 1), y_{1/2} ∈ V(c, 1/2) and y₀ ∈ V(c, 0). Since P(c) = 2L²(c) − L(c), we have

P(c)y = P(c)y₁ + P(c)y_{1/2} + P(c)y₀ = (2 − 1)y₁ + (½ − ½)y_{1/2} + 0 = y₁.

This proves that P(c)y belongs to V(c, 1). Clearly, for y ∈ V(c, 1) we have P(c)y = y. For the orthogonality of the projections with respect to each other, it is enough to verify that

P(c)P(e − c) = 0.

We can easily see that

P(c) = 2L²(c) − L(c²) = L(c)(2L(c) − I)

and

P(e − c) = (L(c) − I)(2L(c) − I).

Therefore,

P(c)P(e − c) = (L(c)(2L(c) − I)(L(c) − I))(2L(c) − I)
            = (2L³(c) − 3L²(c) + L(c))(2L(c) − I).

Since

2L³(c) − 3L²(c) + L(c) = 0

by the proof of Proposition 2.4.6, it follows that the projections are orthogonal with respect to each other. For the other cases we just have to proceed analogously. □

Lemma 2.9.6 (Lemma IV.1.3 in [15]). If a and b are orthogonal idempotents, then L(a) and L(b) commute.

Proof. This follows from identity (i) in Proposition 2.2.2. □

We just gave a decomposition of the vector space V with respect to an idempotent. Now we are able to get a finer decomposition, using a Jordan frame {c₁, . . . , c_r} of the Euclidean Jordan algebra (V, ∘, 〈·, ·〉).

For each c_i we have three Peirce spaces, V(c_i, 0), V(c_i, 1), V(c_i, 1/2). We consider the following subspaces of V:

V_{ii} := V(c_i, 1) = Rc_i,

V_{ij} := V(c_i, 1/2) ∩ V(c_j, 1/2), i ≠ j,

where Rc_i := {αc_i : α ∈ R}. The equality V_{ii} = Rc_i follows immediately: for all y ∈ V_{ii} we have y ∘ c_i = y, which means that c_i is the identity element in V_{ii}. But since c_i is primitive, i.e., it cannot be written as the sum of two non-zero orthogonal idempotents, c_i is the only element of the only Jordan frame in V_{ii}. We conclude that any element of V_{ii} can be written as αc_i, with α ∈ R.

We define

P_{ii} := P(c_i), (2.24)

P_{ij} := 4L(c_i)L(c_j), i ≠ j. (2.25)

Proposition 2.9.7. The endomorphisms P_{ij} are orthogonal projections onto V_{ij}, and they are orthogonal with respect to each other.

Proof. We have seen in Proposition 2.9.5 that the P_{ii} are orthogonal projections onto V_{ii}. Suppose now that i ≠ j. If we replace x and y by c_i and c_j, respectively, in the third item of Proposition 2.2.2, we get

2L(c_i)L(c_j)L(c_i) = L(c_i)L(c_j). (2.26)

Then we have

P_{ij}² = 16L(c_i)L(c_j)L(c_i)L(c_j) = 8L(c_i)L(c_j)L(c_j) = 4L(c_i)L(c_j) = P_{ij}.

Since L(c_i) and L(c_j) are self-adjoint and commute (Lemma 2.9.6), P_{ij} is self-adjoint. Let

W_{ij} := P_{ij}V = {P_{ij}x : x ∈ V}.

We want to prove that W_{ij} = V_{ij}. Let y ∈ W_{ij}; then there exists x ∈ V such that y = P_{ij}x. We have

L(c_i)y = 4L(c_i)L(c_i)L(c_j)x.

Therefore,

L(c_i)y = 2L(c_i)L(c_j)x = ½y,

which means that y ∈ V(c_i, 1/2). Analogously, we can prove that y ∈ V(c_j, 1/2). Hence y ∈ V_{ij}. Now suppose that y ∈ V_{ij}; then easily

L(c_i)L(c_j)y = ¼y,

implying that y ∈ W_{ij}. Proving that the projections are orthogonal with respect to each other reduces to proving what follows. For i ≠ j, we have

P(c_i)P(c_j) = (2L²(c_i) − L(c_i))(2L²(c_j) − L(c_j))
            = 4L²(c_i)L²(c_j) − 2L²(c_i)L(c_j) − 2L²(c_j)L(c_i) + L(c_j)L(c_i)
            = 0,

where the last equality follows by (2.26). Now, for i ≠ j, using (2.26) it follows that

P_{kk}P_{ij} = P(c_k)4L(c_i)L(c_j) = (2L²(c_k) − L(c_k))4L(c_i)L(c_j) = 0.

The last different case, for i, j and k all distinct, is

P_{ki}P_{ij} = 4L(c_k)L(c_i)4L(c_i)L(c_j) = 8L(c_k)L(c_i)L(c_j).

By the identity (2.26) we have

2L²(c_i + c_j)L(c_k) = L(c_i + c_j)L(c_k),

which is equivalent to

4L(c_i)L(c_j)L(c_k) + 2L²(c_i)L(c_k) + 2L²(c_j)L(c_k) = L(c_i)L(c_k) + L(c_j)L(c_k).

From this, using (2.26), we conclude that L(c_i)L(c_j)L(c_k) = 0 and consequently P_{ki}P_{ij} = 0. Finally, the remaining cases follow from the ones we have proved. □

The orthogonal projections defined in Proposition 2.9.5 may be written, after simple calculations, as follows:

P(c) = L(c)(2L(c) − I),

P(e − c) = (L(c) − I)(2L(c) − I),

I − P(c) − P(e − c) = 4L(c)(I − L(c)),

where c ∈ V is an idempotent. Thus, with respect to a Jordan frame, one can give a finer decomposition.

Theorem 2.9.8. The vector space V has the following orthogonal direct sum decomposition:

V = ⊕_{i≤j} V_{ij}.

Proof. Let us fix a positive integer k such that 1 ≤ k ≤ r. Using the fact that e − c_k is an idempotent and e − c_k = ∑_{i≠k} c_i, we can decompose the orthogonal projection P(e − c_k). Hence, by simple calculations:

P(e − c_k) = (L(c_k) − I)(2L(c_k) − I)
          = (∑_{i≠k} L(c_i))(2∑_{i≠k} L(c_i) − I)
          = 2∑_{i≠k}∑_{j≠k} L(c_i)L(c_j) − ∑_{i≠k} L(c_i)
          = ∑_{i<j, i,j≠k} 4L(c_i)L(c_j) + ∑_{i≠k} (2L²(c_i) − L(c_i))
          = ∑_{i<j, i,j≠k} P_{ij} + ∑_{i≠k} P(c_i).

This means, by Propositions 2.9.5 and 2.9.7, that we can decompose

V(c_k, 0) = ∑_{i<j, i,j≠k} V_{ij} + ∑_{i≠k} V_{ii}.

Analogously,

I − P(c_k) − P(e − c_k) = 4L(c_k)(I − L(c_k)) (2.27)
                       = 4L(c_k) ∑_{i≠k} L(c_i) (2.28)
                       = ∑_{i≠k} 4L(c_k)L(c_i). (2.29)

Again, we have

V(c_k, 1/2) = ∑_{i≠k} V_{ki}.

Since we have seen that

V = V(c_k, 1) + V(c_k, 1/2) + V(c_k, 0),

we can now conclude that

V = ∑_{i≤j} V_{ij}.

Proposition 2.9.7 implies that the sum is direct. □

Proposition 2.9.9 (Theorem IV.2.1 in [15]). One has

V_{ij} ∘ V_{ij} ⊂ V_{ii} + V_{jj},

V_{ij} ∘ V_{jk} ⊂ V_{ik}, if i ≠ k,

V_{ij} ∘ V_{kl} = {0}, if {i, j} ∩ {k, l} = ∅.


Sketch of the proof. By Proposition 2.9.3:

V_{ij} ∘ V_{ij} ⊂ V_{ii} + V_{jj}.

For i, j, k distinct:

V_{ij} ⊂ V(c_i + c_j, 1) ⊂ V(c_k, 0),

therefore it follows from Proposition 2.9.3 that

V_{ij} ∘ V_{jk} ⊂ V(c_k, 0) ∘ V(c_k, 1/2) ⊂ V(c_k, 1/2),

and, interchanging i and k, we obtain

V_{ij} ∘ V_{jk} ⊂ V_{ik}.

Finally, if {i, j} ∩ {k, ℓ} = ∅, then

V_{ij} ⊂ V(c_i + c_j, 1),

V_{kℓ} ⊂ V(c_k + c_ℓ, 1) ⊂ V(c_i + c_j, 0),

and from Proposition 2.9.3 it follows that

V_{ij} ∘ V_{kℓ} = {0}.

Everything is proved. □

Proposition 2.9.10. For x ∈ V_{ij} with i ≠ j we have tr(x) = 0.

Proof. If x ∈ V_{ij} then x = 4L(c_i)L(c_j)x. Hence, by the associativity of the trace form,

tr(x) = 4 tr(c_i ∘ (c_j ∘ x)) = 4 tr((c_i ∘ c_j) ∘ x) = 0,

and the property is proved. □

In view of Theorem 2.9.8 we can decompose

x = ∑_{i=1}^r x_{ii} + ∑_{i<j} x_{ij},

with x_{ii} ∈ V_{ii} and x_{ij} ∈ V_{ij}, or, equivalently,

x = ∑_{i=1}^r x_i c_i + ∑_{i<j} x_{ij}, (2.30)

with x_i ∈ R, i = 1, . . . , r, and x_{ij} ∈ V_{ij}, i < j. We call it the Peirce decomposition of x with respect to the Jordan frame {c₁, . . . , c_r}.

Recall from Section 2.1 that the eigenvalues of x are the same as those of L₀(x). But what can be said about the eigenvalues of L(x) and of P(x)? A complete answer to this question is given by the following result.


Proposition 2.9.11. For x ∈ V with spectral decomposition x = ∑_{k=1}^r λ_k c_k we have

L(x) = ∑_{i=1}^r λ_i P_{ii} + ∑_{i<j} ½(λ_i + λ_j) P_{ij},

P(x) = ∑_{i=1}^r λ_i² P_{ii} + ∑_{i<j} λ_i λ_j P_{ij}.

Proof. Since, for y ∈ V, we can write y = ∑_{i≤j} y_{ij} with y_{ij} ∈ V_{ij}, we also have L(c_i)y_{ii} = y_{ii}, L(c_k)y_{ii} = 0 for k ≠ i, L(c_i)y_{ij} = ½y_{ij}, L(c_j)y_{ij} = ½y_{ij} for i ≠ j, and L(c_k)y_{ij} = 0 for k ∉ {i, j}. Hence

L(x)y = ∑_{k=1}^r λ_k L(c_k)y
      = ∑_{i=1}^r ∑_{k=1}^r λ_k L(c_k)y_{ii} + ∑_{i<j} ∑_{k=1}^r λ_k L(c_k)y_{ij}
      = ∑_{i=1}^r λ_i y_{ii} + ∑_{i<j} (½λ_i + ½λ_j) y_{ij}
      = ∑_{i=1}^r λ_i P_{ii}y + ∑_{i<j} ½(λ_i + λ_j) P_{ij}y.

It follows that

L(x) = ∑_{i=1}^r λ_i P_{ii} + ∑_{i<j} ½(λ_i + λ_j) P_{ij}.

The second expression of the proposition follows easily:

P(x) = 2L(x)² − L(x²)
     = 2(∑_i λ_i L(c_i))² − ∑_i λ_i² L(c_i)
     = ∑_i λ_i² (2L(c_i)² − L(c_i)) + 4∑_{i<j} λ_i λ_j L(c_i)L(c_j)
     = ∑_i λ_i² P_{ii} + ∑_{i<j} λ_i λ_j P_{ij}.

Everything is proved. □

Corollary 2.9.12. Let x ∈ V with spectral decomposition x = ∑_{i=1}^r λ_i c_i. Then the following statements hold.

1. The eigenvalues of L(x) have the form

(λ_i + λ_j)/2, 1 ≤ i ≤ j ≤ r.

2. The eigenvalues of P(x) have the form

λ_i λ_j, 1 ≤ i ≤ j ≤ r.
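In Sⁿ this corollary can be checked directly: on vectorized matrices, L(x) has matrix ½(I ⊗ x + x ⊗ I) and P(x) has matrix x ⊗ x, whose eigenvalues are (λ_i + λ_j)/2 and λ_i λ_j over all pairs. A sketch assuming NumPy (on the full space R^{n×n}, each pair i ≠ j simply appears twice).

```python
import numpy as np

n = 3
rng = np.random.default_rng(3)
a = rng.standard_normal((n, n))
x = a + a.T
lam = np.linalg.eigvalsh(x)

I = np.eye(n)
L = 0.5 * (np.kron(I, x) + np.kron(x, I))   # L(x)y = (xy + yx)/2, vectorized
P = np.kron(x, x)                           # P(x)y = x y x, vectorized

pairs = [(i, j) for i in range(n) for j in range(n)]
assert np.allclose(np.sort(np.linalg.eigvalsh(L)),
                   np.sort([(lam[i] + lam[j]) / 2 for i, j in pairs]))
assert np.allclose(np.sort(np.linalg.eigvalsh(P)),
                   np.sort([lam[i] * lam[j] for i, j in pairs]))
```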

2.10 Conclusion

The material presented in this chapter has been developed by algebraists and rediscovered by some researchers in optimization. However, most members of the optimization community are not familiar with it. We therefore intended to give a simple and not too deep exposition of Euclidean Jordan algebras and the associated symmetric cones.

The results stated without a reference were composed by the author. In the other cases, when we included a proof, it is the proof provided by the indicated reference.

In the next chapter we will frequently refer to the results of this chapter.


Chapter 3

Eigenvalues, spectral functions and their derivatives

3.1 Introduction

Eigenvalues are the key parameters when we generalize interior-point methods for linear optimization to symmetric optimization. The reason is that the barrier functions can be written as functions that only depend on the eigenvalues of their argument. We will explore this characterization of the barrier functions in later chapters. Therefore, in this chapter we present some similarity properties and inequalities for eigenvalues.

The barrier functions used in interior-point methods are a subclass of spectral functions: functions which only depend on the eigenvalues of their argument. Thus we present some aspects of spectral functions, including their derivatives.

As before, we consider an n-dimensional Euclidean Jordan R-algebra (V, ∘, 〈·, ·〉) with rank r; K will denote the associated symmetric cone, and the inner product is defined by 〈x, y〉 = tr(x ∘ y). For any x in V, let λ(x) ∈ R^r be the vector of the eigenvalues of x. We assume that the eigenvalues are always in non-increasing order, i.e., λ₁(x) ≥ λ₂(x) ≥ · · · ≥ λ_r(x), and we denote this as λ(x) ∈ R^r_↓.

3.2 Similarity

One says that two elements x and y in V are similar, denoted as x ∼ y, if and only if

x = gy for some g ∈ OAut(K), (3.1)

where OAut(K) stands for the set of orthogonal automorphisms that leave K invariant (cf. [48]). Recall from Theorem 2.8.4 and Proposition 2.8.3 that x and gx have the same characteristic polynomial, so we can say that x and y are similar if and only if they share the same eigenvalues, including their multiplicities.

The following proposition establishes another characterization of similarity, using the linear map L(x), x ∈ V.

Proposition 3.2.1 (Proposition 19 in [47]). Two elements x and y of a Euclidean Jordan algebra are similar if and only if L(x) and L(y) are similar.

Proof. As the eigenvalues of L(x) are determined by the eigenvalues of x (see Corollary 2.9.12), the "only if" part is true.

Now assume x and y are not similar. Let λ(x), λ(y) ∈ R^r_↓ and let k be the smallest index such that λ_k(x) ≠ λ_k(y). W.l.o.g. we can assume λ_k(x) > λ_k(y). If k = 1, then λ_k(x) is larger than all eigenvalues of L(y), but it is an eigenvalue of L(x). So L(x) and L(y) are not similar. In the other case, we have k > 1 and λ_i(x) = λ_i(y) for i < k, so the multiplicities of eigenvalues larger than λ_k(x) are the same in λ(x) and λ(y). If λ_k(x) is not an eigenvalue of y, then (λ_k(x) + λ₁(x))/2 is not an eigenvalue of L(y). If λ_k(x) is an eigenvalue of y, then its multiplicity is larger as an eigenvalue of x than as an eigenvalue of y. Hence the multiplicity of (λ_k(x) + λ₁(x))/2 is larger in the set of eigenvalues of L(x) than in that of L(y). Therefore L(x) and L(y) are not similar. This completes the proof. □

In the following proposition we use the same arguments as in the proof of Proposition 3.2.1, as long as the eigenvalues of x and y are positive. If the eigenvalues are not positive, P(x) and P(y) may be similar while x and y are not. As an example, let λ₁, λ₂ and −λ₁, −λ₂ be the eigenvalues of x and y, respectively. By Corollary 2.9.12, the eigenvalues of both P(x) and P(y) are λ₁², λ₁λ₂, λ₂². Thus P(x) and P(y) are similar, in contrast to x and y.

Proposition 3.2.2 (Corollary 20 in [47]). Let x and y be two elements in the interior of K. Then x and y are similar if and only if their quadratic representations P(x) and P(y) are similar.

In the following we present more technical properties concerning similarity.

Proposition 3.2.3 (Proposition 21 in [47]). Let V be a Euclidean Jordan algebra and x, s, z ∈ K⁰. Define x̃ := P(z)x and s̃ := P(z⁻¹)s. Then

(i) P(x^{1/2})s ∼ P(s^{1/2})x;

(ii) P(x̃^{1/2})s̃ ∼ P(x^{1/2})s.

Proof. First note that, by Proposition 2.5.7, P(s^{1/2})x, P(x^{1/2})s ∈ K⁰. By the fundamental formula we have

P(P(x^{1/2})s) = P(x)^{1/2}P(s)P(x)^{1/2},

which by Proposition B.1.1 is similar to

P(s)^{1/2}P(x)P(s)^{1/2} = P(P(s^{1/2})x),

and (i) follows from Proposition 3.2.2. Again, using the fundamental formula, we have

P(P(x̃^{1/2})s̃) = P(x̃^{1/2})P(s̃)P(x̃^{1/2}),

which is similar to

P(x̃)P(s̃).

Replacing x̃ and s̃ according to their definitions, we obtain

P(x̃)P(s̃) = P(P(z)x)P(P(z⁻¹)s)
         = P(z)P(x)P(z)P(z⁻¹)P(s)P(z⁻¹)
         = P(z)P(x)P(s)P(z)⁻¹.

Hence, P(x̃)P(s̃) is similar to P(x)P(s). Since P(P(x̃^{1/2})s̃) is similar to P(x̃)P(s̃) and P(P(x^{1/2})s) is similar to P(x)P(s), statement (ii) follows from Proposition 3.2.2. □

The following proposition is a new result. It plays an important role in the analysis of the algorithm that will be presented in Chapter 5.

Proposition 3.2.4. Let x, s ∈ K⁰ and w := P(x^{1/2})(P(x^{1/2})s)^{−1/2}. Then

(P(x^{1/2})s)^{1/2} ∼ P(w)^{1/2}s.

Proof. By Proposition 3.2.2, the statement is equivalent to

P(P(x^{1/2})s)^{1/2} ∼ P(P(w)^{1/2}s).

Therefore, by the fundamental formula,

P(P(x^{1/2})s)^{1/2} = (P(x)^{1/2}P(s)P(x)^{1/2})^{1/2} ∼ (P(x)P(s))^{1/2}.

On the other hand,

P(P(w^{1/2})s) = P(w)^{1/2}P(s)P(w)^{1/2} ∼ P(w)P(s).

Thus, we obtain

P(w)P(s) = P(x)^{1/2}P(P(x^{1/2})s)^{−1/2}P(x)^{1/2}P(s)
         = P(x)^{1/2}(P(x)^{1/2}P(s)P(x)^{1/2})^{−1/2}P(x)^{1/2}P(s)
         ∼ (P(x)^{1/2}P(s)P(x)^{1/2})^{−1/2}P(x)^{1/2}P(s)P(x)^{1/2}
         = (P(x)^{1/2}P(s)P(x)^{1/2})^{1/2}
         ∼ (P(x)P(s))^{1/2},

and the proposition follows. □


The point w in the previous proposition is known as the scaling point corresponding to x and s (see [16]), i.e., the point w ∈ K⁰ such that

P(w)^{−1/2}x = P(w)^{1/2}s. (3.2)

We return to this topic in Section 5.4.

Corollary 3.2.5. We have

P(x^{1/2})s ∼ P(w)^{1/2}s ∘ P(w)^{−1/2}x.

Proof. This follows from Proposition 3.2.4 and (3.2). □
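In Sⁿ the scaling point has the familiar closed form w = x^{1/2}(x^{1/2} s x^{1/2})^{−1/2} x^{1/2}. The following numerical sketch, assuming NumPy and SciPy, checks the defining property (3.2) and the similarity of Proposition 3.2.4.

```python
import numpy as np
from scipy.linalg import sqrtm

def msqrt(m):
    return sqrtm(m).real

rng = np.random.default_rng(4)
def rand_pd(n):
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

x, s = rand_pd(3), rand_pd(3)
xh = msqrt(x)
w = xh @ np.linalg.inv(msqrt(xh @ s @ xh)) @ xh  # w = P(x^{1/2})(P(x^{1/2})s)^{-1/2}

wh = msqrt(w)
whi = np.linalg.inv(wh)
assert np.allclose(whi @ x @ whi, wh @ s @ wh)   # the defining property (3.2)

# Proposition 3.2.4: (P(x^{1/2})s)^{1/2} and P(w)^{1/2}s share their eigenvalues
lhs = np.linalg.eigvalsh(msqrt(xh @ s @ xh))
rhs = np.linalg.eigvalsh(wh @ s @ wh)
assert np.allclose(np.sort(lhs), np.sort(rhs))
```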

3.3 An important eigenvalue inequality

In this section, we prove the following inequality: for x, s ∈ K and u := P(x)^{1/2}s,

∏_{i=1}^k λ_i(u) ≤ ∏_{i=1}^k λ_i(x)λ_i(s), k = 1, . . . , r − 1. (3.3)

The property (3.3) is a generalization of Theorem B.1.2 to Euclidean Jordan algebras. This inequality is of crucial importance to establish e-convexity of the barrier functions.

We first relate the minimum and the maximum eigenvalue of u with the minimum and the maximum eigenvalue, respectively, of x and s. It means that the eigenvalues of u are less dispersed than the component-wise product λ(x)λ(s), where λ(x) ∈ R^r_↓ is the vector of eigenvalues of x. We will need this to prove the inequality (3.3).

Proposition 3.3.1. Let x, s ∈ K and u := P(x)^{1/2}s. Then

λ_max(u) ≤ λ_max(x)λ_max(s)

and

λ_min(u) ≥ λ_min(x)λ_min(s).

Proof. By the fundamental formula,

P(u) = P(x)^{1/2}P(s)P(x)^{1/2},

which is similar to P(x)P(s). Therefore, by Theorem B.1.2,

λ_max(P(u)) ≤ λ_max(P(x))λ_max(P(s)),

which implies, by Corollary 2.9.12, that

λ_max(u)² ≤ λ_max(x)²λ_max(s)².

Thus, the first inequality follows. The second one follows analogously. □


We will prove the inequality (3.3) first for each of the five simple Euclidean Jordan algebras characterized by Theorem 2.7.5. In fact, the inequality is known to be true for Herm(n, R) and for Herm(n, C) and, to our knowledge, had not previously been established for V = Herm(n, H) or V = Herm(3, O).

Theorem 3.3.2. Let V be a simple Euclidean Jordan algebra and K its cone of squares. Then for x, s ∈ K and u := P(x)^{1/2}s,

∏_{i=1}^k λ_i(u) ≤ ∏_{i=1}^k λ_i(x)λ_i(s), k = 1, . . . , r − 1,

∏_{i=1}^r λ_i(u) = ∏_{i=1}^r λ_i(x)λ_i(s).

Proof. First we focus on the inequality; the equality is handled at the end. Suppose that (V, ∘) is the simple Euclidean Jordan algebra described in Theorem 2.7.5(i). In Example 2.4.4 we have seen that the elements of this Jordan algebra have only two eigenvalues. Hence, the result follows from Proposition 3.3.1. If V = Herm(n, R), Herm(n, C) or Herm(n, H), and

x ∘ s = (xs + sx)/2,

then, by simple calculations,

u = P(x)^{1/2}s = x^{1/2}sx^{1/2}.

For V = Herm(n, R) or Herm(n, C), the result follows by Theorem B.1.2. In the case of V = Herm(n, H) the result follows by Theorem C.2.5.

If V = Herm(3, O) and x ∘ s = (xs + sx)/2, any x ∈ V has only three eigenvalues (see Appendix C.4). Since λ₃(u) ≥ λ₃(x)λ₃(s) (Proposition 3.3.1) and det(u) = det(x) det(s) (Proposition 2.5.12), we have

λ₁(u)λ₂(u) ≤ λ₁(x)λ₂(x)λ₁(s)λ₂(s).

Since by Theorem 2.7.5 any simple Euclidean Jordan algebra is isomorphic to one of these five, the inequality holds for any simple Euclidean Jordan algebra and any x, s ∈ K. The equality is a direct consequence of

det(P(x)^{1/2}s) = det(x) det(s),

according to Proposition 2.5.12. □
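For Herm(n, R) = Sⁿ, where u = x^{1/2} s x^{1/2}, the statement of Theorem 3.3.2 can be verified numerically on random data; a sketch assuming NumPy and SciPy.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(5)
def rand_psd(n):
    a = rng.standard_normal((n, n))
    return a @ a.T

n = 4
x, s = rand_psd(n), rand_psd(n)
xh = sqrtm(x).real
u = xh @ s @ xh                                   # u = P(x)^{1/2} s in S^n

lu = np.sort(np.linalg.eigvalsh(u))[::-1]         # eigenvalues, non-increasing
lx = np.sort(np.linalg.eigvalsh(x))[::-1]
ls = np.sort(np.linalg.eigvalsh(s))[::-1]

for k in range(1, n):                             # the inequalities (3.3)
    assert np.prod(lu[:k]) <= np.prod(lx[:k] * ls[:k]) + 1e-9
assert np.isclose(np.prod(lu), np.prod(lx * ls))  # equality for k = r
```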

Theorem 3.3.3. The inequality (3.3) is valid for any Euclidean Jordan algebra.

Proof. For simplicity of notation we will prove the inequality for P(x) instead of P(x)^{1/2}, but the result follows in the same way. Suppose, for the sake of simplicity, that the Euclidean Jordan algebra V is just the direct sum of two simple Euclidean Jordan algebras V₁ and V₂, i.e., V = V₁ ⊕ V₂. Let K, K₁ and K₂ be the symmetric cones associated with V, V₁ and V₂, respectively, where obviously K₁ and K₂ are primitive. Let x = x₁ + x₂ ∈ K and s = s₁ + s₂ ∈ K with x₁, s₁ ∈ K₁ and x₂, s₂ ∈ K₂. By Remark 2.7.6 we have

u := P(x)s = P(x₁)s₁ + P(x₂)s₂.

Let u₁ := P(x₁)s₁ and u₂ := P(x₂)s₂. Again, as is clear from Remark 2.7.6, the eigenvalues of u are all the eigenvalues of u₁ and u₂. Let the k largest eigenvalues of u be given by the k₁ largest eigenvalues of u₁ and the k₂ largest eigenvalues of u₂, where k = k₁ + k₂. Hence, by Theorem 3.3.2, we have

∏_{i=1}^k λ_i(u) = ∏_{i=1}^{k₁} λ_i(u₁) ∏_{i=1}^{k₂} λ_i(u₂) ≤ ∏_{i=1}^{k₁} λ_i(x₁)²λ_i(s₁) ∏_{i=1}^{k₂} λ_i(x₂)²λ_i(s₂).

Note that λ₁(s₁) ≥ · · · ≥ λ_{k₁}(s₁) are the k₁ largest eigenvalues of s₁ and λ₁(s₂) ≥ · · · ≥ λ_{k₂}(s₂) are the k₂ largest eigenvalues of s₂, but together they need not be the k largest eigenvalues of s. Since they are eigenvalues of s, we certainly have that

∏_{i=1}^{k₁} λ_i(s₁) ∏_{i=1}^{k₂} λ_i(s₂) ≤ ∏_{i=1}^k λ_i(s).

The same is valid for the eigenvalues of x. Thus

∏_{i=1}^k λ_i(u) ≤ ∏_{i=1}^{k₁} λ_i(x₁)²λ_i(s₁) ∏_{i=1}^{k₂} λ_i(x₂)²λ_i(s₂) ≤ ∏_{i=1}^k λ_i(x)²λ_i(s). □

We proceed this section giving some more technical lemmas.

Lemma 3.3.4. Let x, s ∈ K. Then

λmin(s)λmax(x) ≤ λmin(s)tr (x) ≤ tr(P (x)1/2s

)≤ λmax(s)tr (x) ≤ rλmax(s)λmax(x).

Proof. Since P (x) is self-adjoint and P (x)e = x2, we have

tr(P (x)1/2s

)= tr

(P (x)1/2s e

)= tr

(s P (x)1/2e

)= tr (x s) .

Let s =∑r

i=1 λi(s)ci be the spectral decomposition of s. So,

tr (x s) = tr

(r∑

i=1

λi(s)ci x

)=

r∑

i=1

λi(s)tr (ci x) .

Page 67: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.3 An important eigenvalue inequality 55

Since ci = c2i ∈ K, by Lemma 2.5.7, P (x)1/2ci ∈ K. This means that

tr (x ci) = tr(P (x)1/2ci

)≥ 0.

We can now conclude that

tr (x s) ≥ λmin(s)r∑

i=1

tr (ci x) = λmin(s)tr (x) ≥ λmin(s)λmin(x).

The other inequality follows in the same way. ¥

The following lemma gives an explicit formula for the minimum and maximumeigenvalues of an element in V . In fact, it is a particular case of the Courant-Fischer'sTheorem for Jordan algebras (Theorem 3.4.1 in [4]).Lemma 3.3.5. Let a ∈ V , then we obtain the smallest and the largest eigenvalue as

λmin(a) = miny

〈y, a y〉〈y, y〉 , λmax(a) = max

y

〈y, a y〉〈y, y〉 .

Lemma 3.3.6 (Lemma 14 in [47]). Let a, b ∈ V , then we can bound the eigenvaluesof a + b as follows

λmin(a + b) ≥ λmin(a)− ‖b‖ (3.4)λmax(a + b) ≤ λmax(a) + ‖b‖ (3.5)

Proof. Recall that ‖·‖ denotes the norm induced by the inner product. Using Lemma3.3.5 we may write

λmin(a + b) = miny

〈y, (a + b) y〉〈y, y〉

= minu

〈y, a y〉+ 〈y, b y〉〈y, y〉

≥ miny

〈y, a y〉〈y, y〉 + min

y

〈y, b y〉〈y, y〉

= λmin(a) + λmin(b)≥ λmin(a)− ‖b‖.

We have that

‖b‖ =√

tr (b2) =√

λ21(b) + · · ·+ λ2

r(b) ≥ |λmin(b)|.From here we obtain that

−‖b‖ ≤ λmin(b) ≤ ‖b‖.Thus

λmin(a + b) ≥ λmin(a) + λmin(b) ≥ λmin(a)− ‖b‖,and (3.4) follows. The proof of (3.5) is similar. ¥

Page 68: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

56 Chapter 3. Eigenvalues, spectral functions and their derivatives

Many inequalities of eigenvalues for symmetric matrices can be extended to Eu-clidean Jordan algebras. We dealt with the ones that are useful in this thesis. Formatter of curiosity we nish the section with a generalization of the Hadamard in-equality. Let c1, . . . , cr be a Jordan frame and x =

∑ri=1 xici +

∑i<j xij be the Peirce

decomposition of x with respect to the Jordan frame c1, . . . , cr.

Proposition 3.3.7. For x in K we have

det(x) ≤r∏

i=1

〈x, ci〉,

where c1, . . . , cr is a Jordan frame.

Proof. Let x =∑r

j=1 λj(x)dj be a spectral decomposition of x. Therefore

〈x, ci〉 =r∑

j=1

λj(x)〈ci, dj〉.

Let B = [bij ] be a matrix such that bij = 〈ci, dj〉. Note that bij ≥ 0 for all i andj, since ci, dj are in K, and the sum of all elements in any column or row of B is 1.Hence

r∏

i=1

〈x, ci〉 =r∏

i=1

r∑

j=1

λj(x)bij

≥r∏

i=1

r∏

j=1

λj(x)bij

=r∏

j=1

λj(x)Pr

i=1 bij

=r∏

j=1

λj(x) = det(x),

where the inequality follows from the weighted arithmetic-geometric mean inequality(Lemma D.1.2). Note that, in case some bij = 0 we can remove it from the inequalityto apply Lemma D.1.2. ¥

3.4 Derivatives of eigenvalues

In this section we deduce formulas for some derivatives of eigenvalues. The formulasobtained in this section are, to our knowledge, new. To obtain these formulas wefollowed [29], where similar formulas for matrices were deduced. A thorough studyabout the dierentiability of eigenvalues can be found in [4].

Page 69: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.4 Derivatives of eigenvalues 57

As before (see denition 2.3.7) we denoted Dxf(x) as the derivative of f at x,where f : V 7→ V is a function whose domain has a non-empty interior and x is apoint of the interior of the domain of f . The second derivative is denoted as D2

xf(x).For simplicity of notation and in case V = R we will sometimes use f ′(t) and f ′′(t)as the rst and second derivative of f with respect to t, respectively, for t ∈ R.

Let b ∈ Rr↓. Since the coordinates of b are non-increasing, we can write

b1 = · · · = bk1 > bk1+1 = · · · = bk2 > bk2+1 · · · bkd, (kd := r).

Thus d is the number of distinct values in the coordinates of the vector b. We denethe corresponding partition I1, . . . , Ir of the index set 1, . . . , n such that

I1 := 1, 2, . . . , k1, I2 := k1 + 1, . . . , k2, . . . , Id := kd−1 + 1, . . . , kd, (3.6)

and we call these sets blocks of b. Below we will deal with the blocks of the vectorλ(x) for x ∈ V .

We say that an eigenvalue of x is simple if the correspondent block has size 1.If for x ∈ V , its spectral decomposition is x =

∑ri=1 λi(x)ci then xci = λici for all

i. We get this by multiplying, in Jordan product sense, both sides of x =∑r

i=1 λi(x)ci

by ci.

Proposition 3.4.1 (Corollary 34 in [5]). Let 1 ≤ k ≤ r, x =∑r

i=1 λi(x)ci ∈ V andλ(x) ∈ Rr

↓. Denote Sk(x) =∑k

i=1 λi(x). If k = kj for some j ∈ 1, . . . , d then Sk isdierentiable with respect to x and DxSk(x) =

∑ki=1 ci.

Proposition 3.4.2. Under the assumptions of Proposition 3.4.1, we have

Dx

(∑

i∈I`

λi(x)

)=

i∈I`

ci

for ` = 1, . . . , d.

Proof. Just notice that by Proposition 3.4.1 we have

Dx

i∈Ikj+1

λi(x)

= DxSkj+1(x)−DxSkj (x) =

i∈Ikj+1

ci,

thus proving the lemma. ¥

In fact, we can say that the sum of equal eigenvalues at x, i.e.∑

i∈I`λi(x), is

dierentiable at x. Indeed, as we show below by an example, that if an eigenvalue isnot simple then it might not be dierentiable.

Example 3.4.3. Let V be the Euclidean Jordan algebra dened in Example 2.4.4.We have already seen that for x ∈ V ,

λ1(x) = x0 − ‖x‖ and λ2(x) = x0 + ‖x‖.

Page 70: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

58 Chapter 3. Eigenvalues, spectral functions and their derivatives

Obviouslyλ1(x) = λ2(x)

holds if and only if ‖x‖ = 0, i.e, if and only if x = 0. For that case we compute nowthe directional derivative of λ2(x) at x in the direction u = (u0; u):

Duxλ2(x) = lim

t→0

λ2(x + tu)− λ2(x)t

= limt→0

x0 + tu0 + ‖x + tu‖ − x0

t

= limt→0

tu0 + ‖tu‖t

= limt→0

u0 +|t|‖u‖

t.

Since we havelim

t→0−

|t|‖u‖t

= −‖u‖

andlim

t→0+

|t|‖u‖t

= ‖u‖,

we conclude that λ2(x) is not dierentiable at x if u 6= 0. 2

Corollary 3.4.4. If λi(x), with 1 ≤ i ≤ r, is a simple eigenvalue then

Dxλi(x) = ci, i = 1, . . . , r.

Proof. This follows from Proposition 3.4.2, with |I`| = 1. ¥

The properties proved in the remaining section are not used in thesis. However,we present them here hoping that they will be useful in the future.

From now on we assume that x depends linearly in parameter t as follows:

x(t) = x0 + tu, x0, u ∈ V, t ∈ R,

but we sometimes write x instead of x(t) for simplicity of notation. Let

x(t) =r∑

i=1

λi(x(t))ci

be the spectral decomposition of x(t) and

u =r∑

i=1

uici +∑

i<j

uij

the Peirce decomposition of u with respect to the Jordan frame c1, . . . , cr (see (2.30)).

Page 71: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.4 Derivatives of eigenvalues 59

Proposition 3.4.5. Let λi(x(t)) be a simple eigenvalue of x(t). Then

Dtλi(x(t)) = ui.

Proof. Applying the chain rule we have Dtλi(x(t)) = 〈Dxλi(x(t)), x′(t)〉. Thereforeby Corollary 3.4.4,

Dtλi(x(t)) = 〈ci, u〉 = ui.

The proposition is proved. ¥

Proposition 3.4.6. Under the assumptions of Proposition 3.4.1, we have

Dt

(∑

i∈I`

λi(x(t))

)=

i∈I`

ui,

for ` = 1, . . . , d.

Proof. This follows from Proposition 3.4.2. ¥

Lemma 3.4.7. Let c(t) ∈ V be an idempotent dependent on a real parameter t. Ifc(t) is dierentiable with respect to t then

c(t) c′(t) = 12c′(t).

Proof. Since c(t)2 = c(t), we have 2c(t) c′(t) = c′(t). The result follows. ¥

Lemma 3.4.8. Let ci(t) and cj(t) be two orthogonal idempotents. If ci(t) and cj(t)are dierentiable with respect to t then

c′i(t) cj(t) + ci(t) c′j(t) = 0.

Proof. The proof follows from ci(t) cj(t) = 0. ¥

Let x =∑r

i=1 λi(x)ci be a spectral decomposition of x. If all the eigenvalues aresimple, by Theorem 2.4.1 the Jordan frame c1, . . . , cr is unique with respect to x andevery ci is a polynomial in x. Hence, every ci is dierentiable with respect to x. Thefollowing theorem deduces a formula for Dtci(x(t)).

Theorem 3.4.9. Let x(t) := x0 + tu and x =∑r

i=1 λi(x(t))ci be the spectral decom-position of x. Suppose that the eigenvalues of x are simple. Then

Dtci(x(t)) = 4∑

j 6=i

(ci u) cj

λi − λj=

j 6=i

uij

λi − λj.

Page 72: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

60 Chapter 3. Eigenvalues, spectral functions and their derivatives

Proof. During the proof we use cj or cj(t) instead of cj(x(t)). Clearly x ci = λici.Dierentiating this expression to t we get

x′(t) ci + x c′i(t) = λ′i(t)ci + λic′i(t),

which is equivalent to

λ′i(t)ci = u ci + (x− λie) c′i(t).

Pre-multiplying by cj we obtain

0 = cj (u ci) + cj ((x− λie) c′i(t)) for j 6= i. (3.7)

From Lemma 3.4.8 we get that

(c′i(t) cj(t)) x + (ci(t) c′j(t)) x = 0,

for i 6= j, which, commutating x with ci (because ci ∈ R[x]), is equivalent to

(c′i(t) x) cj + (x c′j(t)) ci = 0, i 6= j. (3.8)

On the other hand, we have (x cj) ci = (x ci) cj . Dierentiating this expressionwith respect to t we get

(u cj) ci + (x c′j) ci + (x cj) c′i = (u ci) cj + (x c′i) cj + (x ci) c′j ,

where we used c′i instead of c′i(t). Rearranging the terms in the last expression we get

(x c′j(t)) ci + (x cj) c′i(t) = (x c′i(t)) cj + (x ci) c′j(t). (3.9)

From expressions (3.8) and (3.9) we obtain

−(x c′i(t)) cj + (x cj) c′i(t) = (x c′i(t)) cj + (x ci) c′j(t).

From here it follows

2(x c′i(t)) cj = (x cj) c′i(t)− (x ci) c′j(t)= λjcj c′i(t)− λici c′j(t)= λjcj c′i(t) + λic

′i(t) cj (Lemma 3.4.8)

= (λj + λi)c′i(t) cj

Using the last expression in equation (3.7), we get

0 = cj (u ci) + 12 (λj + λi)c′i(t) cj − (cj λie) c′i(t) for j 6= i,

hence(λi − λj)c′i(t) cj = 2cj (u ci) for j 6= i

Page 73: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.4 Derivatives of eigenvalues 61

which is equivalent to

c′i(t) cj = 2cj (u ci)

λi − λjfor j 6= i.

Taking the sum over j 6= i, it follows that∑

j 6=i

cj c′i(t) = 2∑

j 6=i

cj (u ci)λi − λj

.

Note that∑

j 6=i cj = e− ci. Therefore, by Lemma 3.4.7,∑

j 6=i

cj c′i(t) = (e− ci) c′i(t) = c′i(t)− ci c′i(t) =12c′i(t).

We can now conclude that

c′i(t) = 4∑

j 6=i

cj (u ci)λi − λj

.

Since Pij = 4L(ci)L(cj) (see Proposition 2.9.7) and Piju = uij the last equality ofthe theorem follows. ¥

Theorem 3.4.9 provides an easy way to compute the derivative of ci(x(t)) withrespect to t. See the example below.Example 3.4.10. Let (Rn+1, ) be the Euclidean Jordan algebra with the Jordanproduct dened by

x y = (xT y; x0y + y0x),

and denoting (x0; x1; . . . ; xn) ∈ Rn+1 as (x0; x) with x = (x1; . . . ; xn). As we knowfrom Example 2.6.3, the symmetric cone of this Jordan algebra is the second-ordercone. Recall from Example 2.4.4 that, the spectral decomposition of x ∈ Rn+1 is

x = λ1(x)c1(x) + λ2(x)c2(x),

where the eigenvalues are

λ(x) =[

λ1(x)λ2(x)

]=

[x0 − ‖x‖x0 + ‖x‖

]

and the Jordan frame is

c1(x) =12

[1

− x‖x‖

], c2(x) =

12

[1x‖x‖

].

Setting x := x0 + tu, we dierentiate c1(x(t)) with respect to t. This gives

c′1(t) =12

(0;Dt

(− x

‖x‖))

=12

0;

−u‖x‖+ x xT u‖x‖

‖x‖2

. (3.10)

Page 74: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

62 Chapter 3. Eigenvalues, spectral functions and their derivatives

On the other hand, if we apply directly Theorem 3.4.9, we get

c′1(t) = 4c2 (u c1)

λ1 − λ2

= 4c2 1

2 (u0 − uT x‖x‖ ; u− u0

x‖x‖ )

−2‖x‖

= 414 (u0 − uT x

‖x‖ + uT x‖x‖ − u0

xtx‖x‖ ; u− u0

x‖x‖ + (u0 − uT x

‖x‖ ) x‖x‖ )

−2‖x‖

=(0; u− uT x

‖x‖x‖x‖ )

−2‖x‖

=12(0;− u

‖x‖ +uT x

‖x‖3 x),

in agreement with (3.10). 2

In the following property we use the twice dierentiability of the eigenvalues. Infact, if the eigenvalues of x ∈ V are simple then they are twice dierentiable at x. Thisfollows from Corollary 3.4.4 and because in this case the Jordan frame is dierentiableat x.Corollary 3.4.11. Let x(t) =

∑ri=1 λi(x(t))ci(x(t)). If the eigenvalues are simple

thenD2

t λi(x(t)) = 4∑

j 6=i

tr ((u (cj u)) ci)λi − λj

=∑

j 6=i

tr(u2

ij

)

λi − λj.

Proof. From Proposition 3.4.5 we getD2

t λi(x(t)) = Dt(ui) = 〈c′i(t), u〉. (3.11)Now, the result follows from Theorem 3.4.9. ¥

We prove similar properties for eigenvalues which are not simple. However, itsproof requires special attention because, we cannot guarantee dierentiability for theeigenvalues at x. Aggregating all the idempotents in the same block, for λ(x) ∈R↓, we have, by Theorem 2.4.1, that e` :=

∑i∈I`

ci with ` = 1 . . . , d is a uniquecomplete system of idempotents and e` ∈ R[x]. Consequently e`, with ` = 1, . . . , dare dierentiable with respect to x. The Propositions 3.4.1 and 3.4.2 provide us thetools for the proof of the following result.Theorem 3.4.12. Let x(t) =

∑ri=1 λi(x(t))ci(x(t)). If the eigenvalues are not simple

and λ(x) ∈ R↓, then

Dt

(∑

i∈I`

ci(x(t))

)= 4

i∈I`

j /∈I`

(cj u) ci

λi − λj=

i∈I`

j /∈I`

uij

λi − λj,

for ` = 1, . . . , d.

Page 75: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.4 Derivatives of eigenvalues 63

Proof. Let e` :=∑

i∈I`ci and em :=

∑i∈Im

ci, with m 6= `. Clearly

(x e`) em = 0. (3.12)

Dierentiating both sides of (3.12) to t, we get

(x′(t) e`) em + (x e′`(t)) em + (x e`) e′m(t) = 0,

which is equivalent to

(u e`) em + (x e′`(t)) em +∑

i∈I`

λici e′m(t) = 0. (3.13)

From Lemma 3.4.8 we get that

(e′`(t) em(t)) x + (e`(t) e′m(t)) x = 0,

for m 6= `, which commutating x with e` and em, is equivalent to

(e′`(t) x) em + (x e′m(t)) e` = 0, m 6= `. (3.14)

On the other hand, we have (xem)e` = (xe`)em. Dierentiating this expressionto t we get

(u em) e` + (x e′m) e` + (x em) e′` = (u e`) em + (x e′`) em + (x e`) e′m,

where we used e′` instead of e′`(t). Rearrange the last expression we get

(x e′m(t)) e` + (x em) e′`(t) = (x e′`(t)) em + (x e`) e′m(t). (3.15)

Using the identity (3.14) in the identity (3.15) we obtain

−(x e′`(t)) em + (x em) e′`(t) = (x e′`(t)) em + (x e`) e′m(t).

From here, it follows that

2(x e′`(t)) em = (x em) e′`(t)− (x e`) e′m(t)= αmem e′`(t)− α`e` e′m(t)= αmem e′`(t) + α`em e′`(t) (Lemma 3.4.8)= (αm + α`)e′`(t) em,

where we denoted αm := λj for j ∈ Im, because λi = λj for all i, j ∈ Im. Using thelast expression in equation (3.13), we get

0 = (u e`) em +12(αm + α`)e′`(t) em − α`em e′`(t) for ` 6= m.

Hence(α` − αm)e′`(t) em = 2em (u e`) for m 6= `

Page 76: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

64 Chapter 3. Eigenvalues, spectral functions and their derivatives

which is equivalent to

e′`(t) em = 2em (u e`)

α` − αmfor m 6= `.

Taking the sum over m 6= `, it follows that∑

m 6=`

em e′`(t) = 2∑

m 6=`

em (u e`)αm − α`

.

Regard that∑

m 6=` em = e− e`. Therefore, by Lemma 3.4.7,∑

m 6=`

em e′`(t) = (e− e`) e′`(t) = e′`(t)− e` e′`(t) =12e′`(t).

We can now conclude that

e′`(t) = 4∑

m6=`

em (u e`)α` − αm

.

Replacing em by∑

j∈Imcj and e` by

∑i∈I`

ci we obtain

Dt

i∈I`

ci(t) = 4∑

m 6=`

i∈I`

j∈Im

cj (u ci)α` − αm

= 4∑

m 6=`

i∈I`

j∈Im

cj (u ci)λi − λj

= 4∑

i∈I`

j /∈I`

cj (u ci)λi − λj

.

Since Pij = 4L(ci)L(cj) (see Proposition 2.9.7) and Piju = uij the last equality ofthe theorem follows. ¥

Corollary 3.4.13. Under the assumptions of the Theorem 3.4.12 we have

D2t

(∑

i∈I`

λi(x(t))

)= 4

i∈I`

j /∈I`

tr ((cj (u ci)) u)λi − λj

=∑

i∈I`

j /∈I`

tr(u2

ij

)

λi − λj,

for ` = 1, . . . , d.Proof. From Theorem 3.4.6 we get

D2t

i∈I`

λi(x(t)) = 〈Dt

(∑

i∈I`

ci(t)

), u〉, (3.16)

for l = 1, . . . , d. Sincetr ((cj (u ci)) u) = tr (Piju u) = tr

(u2

ij

),

the result follows by Theorem 3.4.12. ¥

Page 77: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.5 Spectral functions 65

In the properties presented in this section, i.e., for the derivatives of eigenvaluesand Jordan frames, we had two versions, one supposing that the eigenvalues are simpleand another supposing that they are not. These both versions appear naturally whenwe are deducing the formulas, which is coherent with the dierentiability of the Jordanframe.

3.5 Spectral functions

We dene spectral functions using the notion of similarity, as dened in the beginningof this chapter. Let F be a real-valued function on V such that

F (gx) = F (x),

for every x in its domain and every g ∈ OAut(K).Given a permutation π of n elements, π : 1, . . . , r 7→ π(1), . . . , π(r). Its

permutation matrix P is the r × r matrix whose entries are 0 except the entries(i, π(i)) which are equal to 1.

Let f : Rr 7→ R and assume that for x ∈ domf we have Px ∈ domf for anypermutation matrix P . If f has the property

f(x) = f(Px)

for any permutation matrix P and any x ∈ domf we call the function f symmetric.One may consider the eigenvalues of x in V as functions of x and dene the

eigenvalue map,λ : V 7→ Rr

↓.

It satises λ(gx) = λ(x) since x and gx are similar, with g ∈ OAut(K).Now, we have the following proposition.

Proposition 3.5.1. The following two properties are equivalent:

(i) For any element x ∈ V and g ∈ OAut(K) we have F (gx) = F (x).

(ii) F = fλ for some symmetric function f : Rr 7→ R. Here fλ denotes f composedby λ.

Proof. (ii) =⇒ (i): by statement (ii) we have F (gx) = f(λ(gx)). Since gx ∼ x wemay write

F (gx) = fλ(gx) = f(λ(gx)) = f(λ(x)) = F (x).

Therefore, statement (i) follows.(i) =⇒ (ii): for a ∈ Rr dene

f(a) := F (a1c1 + · · ·+ arcr)

Page 78: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

66 Chapter 3. Eigenvalues, spectral functions and their derivatives

for some xed Jordan frame c1, . . . , cr. We have that cπ(1), . . . , cπ(r) is a Jordan frame.By Theorem 2.8.5 there exists g such that gci = cπ(i) and by (i)

F (g(a1c1 + · · ·+ arcr)) = F (a1c1 + · · ·+ arcr).

Therefore f is symmetric. Hence

f(λ(x)) = F (λ1(x)c1 + · · ·+ λr(x)cr) = F (x).

This completes the proof. ¥

So every function that is invariant under OAut(K), can be decomposed as F (x) =f(λ(x)). We say that F is generated by f and we call F a spectral function. Hence, itis quite clear that F depends only on the eigenvalues of its argument. An importantsubclass of spectral functions is obtained when f(a) = g(a1)+ · · ·+g(an), with a ∈ Rr

for some univariate real function g. We call such symmetric functions separable andthe respective spectral functions are called separable spectral functions.

3.6 Derivatives of spectral functions

In this section we present derivatives of spectral functions. The following result isfundamental to prove the remaining results of this section.

Proposition 3.6.1 (Corollary 24 in [5]). For every x, y ∈ V , we have

‖λ(x)− λ(y)‖Rr ≤ ‖x− y‖,

where λ(x) ∈ Rr and ‖ · ‖Rr denotes the Euclidean norm in Rr.

Let x =∑r

i=1 λi(x)ci be a spectral decomposition of x. We may write x =∑d`=1 α`(x)e` where e` =

∑i∈I`

ci and α` = λi for i ∈ I` and I`, ` = 1, . . . , d arethe blocks as in (3.6) dened for the vector λ(x) ∈ Rr

↓. It follows that the completesystem of orthogonal idempotents e1 . . . , e` is unique with respect to x (see Section2.4).

The following theorem gives explicitly the derivative of a separable spectral func-tion. This result is adapted from a more general case (for spectral functions) obtainedin [5].

Theorem 3.6.2. Let D ⊂ R be an open set and f : D 7→ R. Let U = x ∈ V : λ(x) ∈Dr and F : U 7→ R dened as F (x) =

∑ri=1 f(λi(x)), with x =

∑ri=1 λi(x)ci ∈ V .

If f is dierentiable in D then F is dierentiable in U and

DxF (x) =r∑

i=1

f ′(λi(x))ci.

Page 79: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.6 Derivatives of spectral functions 67

Proof. We want prove that

F (x + u)− F (x)− 〈r∑

i=1

f ′(λi(x))ci, u〉 = o(u).

We have

|F (x + u)− F (x)− 〈r∑

i=1

f ′(λi(x))ci, u〉| =∣∣∣∣∣

d∑

`=1

i∈I`

(f(λi(x + u))− f(λi(x))− f ′(λi(x))〈ci, u〉)∣∣∣∣∣ , (3.17)

where I` are the blocks as dened in (3.6). The right-hand side of (3.17) can bewritten as

∣∣∣∣∣d∑

`=1

i∈I`

(f(λi(x + u))− f(λi(x))− f ′(λi(x))(λi(x + u)− λi(x))+

f ′(λi(x))(λi(x + u)− λi(x)− 〈ci, u〉))∣∣∣∣∣,

which is less than or equal to

d∑

`=1

(∑

i∈I`

|f(λi(x + u))− f(λi(x))− f ′(λi(x))(λi(x + u)− λi(x))|+∣∣∣∣∣∑

i∈I`

f ′(λi(x))(λi(x + u)− λi(x)− 〈ci, u〉)∣∣∣∣∣

). (3.18)

Since f is dierentiable,d∑

`=1

i∈I`

|f(λi(x + u))− f(λi(x))− f ′(λi(x))(λi(x + u)− λi(x))| =

r∑

i=1

o(λi(x + u)− λi(x)).

For simplicity of notation, let ε := λ(x + u)− λ(x) and

h(εi) := f(λi(x) + εi)− f(λi(x))− f ′(λi(x))εi.

Sincelim‖ε‖→0

|h(εi)|‖ε‖ ≤ lim

‖ε‖→0

|h(εi)||εi| = 0,

Page 80: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

68 Chapter 3. Eigenvalues, spectral functions and their derivatives

we havelim‖ε‖→0

∑ri=1 |h(εi)|‖ε‖ = 0.

Thus,r∑

i=1

|h(εi)| = o(ε).

The left-hand side of the last expression is also o(u) because, by Proposition 3.6.1,‖ε‖ ≤ ‖u‖. The second summand of (3.18) can be rewritten:

∣∣∣∣∣∑

i∈I`

f ′(λi(x))(λi(x + u)− λi(x)− 〈ci, u〉)∣∣∣∣∣ =

|f ′(α`(x))|∣∣∣∣∣∑

i∈I`

(λi(x + u)− λi(x)− 〈ci, u〉)∣∣∣∣∣ ,

with α`(x) = λi(x), for i ∈ I`. Therefore, by Proposition 3.4.2,∣∣∣∣∣∑

i∈I`

f ′(λi(x))(λi(x + u)− λi(x)− 〈ci, u〉)∣∣∣∣∣ = o(u).

Thus, the theorem is proved. ¥

Let x =∑r

i=1 λi(x)ci be a spectral decomposition of x. Let f be a real valuedfunction on an open subset D of R and assume that the eigenvalues of x are in D.We denote

[λj , λk]f :=f(λj)− f(λk)

λj − λk.

When λj = λk the quotient is understood as a derivative: f ′(λj).The following lemma is also a direct consequence of Theorem 5.6.1 in [4].

Lemma 3.6.3 (Lemma 1 in [28]). If f is a continuous dierentiable function in a suit-able domain that contains all the eigenvalues of x ∈ V , then G(x) =

∑ri=1 f(λi(x))ci

is continuous dierentiable at x and

DxG(x) =r∑

i=1

[λi, λi]fPii +∑

j<k

[λj , λk]fPjk, (3.19)

with Pjk, 1 ≤ j ≤ k ≤ r as dened in (2.24) and (2.25).

Sketch of the proof. First we consider the special case where f(λi) = f ′(λi) = 0 foreach i. The right hand side of (3.19) as well as G(x) is then zero. So, for proving that

‖G(x + u)−G(x)−DxG(x)u‖ = o(‖u‖)

Page 81: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

3.6 Derivatives of spectral functions 69

it suces to show that‖G(x + u)‖ = o(‖u‖). (3.20)

We have that x + u =∑r

i=1(λi + εi)c′i for some Jordan frame c′1, . . . , c′r. Applying

Proposition 3.6.1 we getr∑

i=1

ε2i =

r∑

i=1

(λi + εi − λi)2 ≤ ‖(x + u)− x‖2 = ‖u‖2.

The mean value theorem now gives, with some ε′i between 0 and εi, for each i,

‖G(x + u)‖2 =∥∥∥

i

f(λi + εi)c′i −∑

i

f(λi)c′i∥∥∥

2

=∑

i

ε2i f′(λi + ε′i)

2 ≤ ‖u‖2∑

i

f ′(λi + ε′i)2

and (3.20) follows. In the case of a general continuous dierentiable function f , wecan nd a polynomial p such that p and p′ coincide with f and f ′ at λi. Then, bywhat we just proved, the lemma holds for f − p; it only remains to prove that italso holds for p. For this, it will clearly suce to prove the lemma for the functionsxm(m ∈ N). Explicitly, what remains to prove is that

(x + u)m − xm =∑

i

mλm−1i uici +

j<k

λmj − λm

k

λj − λkujk + o(‖u‖) (3.21)

for all m ∈ N. We prove this by induction on m. For m = 1 we have

u =∑

i

uici +∑

j<k

ujk + o(‖u‖) = u + o(‖u‖).

By power-associativity we have

(x + u)m+1 − xm+1 = (x + u) ((x + u)m − xm) + hxm.

Using the induction hypothesis (3.21) this equals

(x + u)

i

mλm−1i uici +

j<k

λmj − λm

k

λj − λkujk

+ hxm + o(‖u‖).

In this expression we write x and u in terms of the Peirce decomposition and use therules cicj = δijci, cihjk = 1

2 (δij + δik)ujk. After simple computations it turns out tobe equal to

i

(m + 1)λihici +∑

j<k

λm+1j − λm+1

k

λj − λkujk + o(‖u‖),

nishing the proof. ¥

Page 82: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

70 Chapter 3. Eigenvalues, spectral functions and their derivatives

3.7 Conclusion

In this chapter we proved an inequality concerning the product of eigenvalues (The-orem 3.3.2), which will be crucial for the proof of an inequality for barrier functionsin Chapter 4. The similarity property given by Proposition 3.2.4 enters to our con-sideration during the analysis of the algorithm in Chapter 5.

We deduced derivative formulas for eigenvalues presented in Section 3.4.Finally, we presented spectral functions and proved (see Proposition 3.5.1) that

they only depend on the eigenvalues of its argument.

Page 83: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

Chapter 4Barrier functions

4.1 Introduction

A new class of barrier functions was introduced in [69]. Each barrier function inthis class is generated by a univariate real function, and provides an interior-pointmethod, whose iteration complexity for so-called large-update methods is better thanfor the primal-dual logarithmic barrier function. In this chapter we introduce thisclass of barrier functions in the framework of Euclidean Jordan algebras.

We derive the rst and second derivatives of the aforementioned barrier functionswhen their argument depends linearly on a real parameter. We use the results onderivatives presented in the previous chapter.

4.2 Kernel functions

In this section we give the rst steps in order to dene the barrier functions, generatedby kernel functions, for symmetric cones. Following [9], we call

ψ(t) : (0,+∞) 7→ [0, +∞)

a kernel function if ψ is twice dierentiable and the following conditions are satised.

(i) ψ′(1) = ψ(1) = 0;

(ii) ψ′′(t) > 0, for all t > 0;

(iii) ψ(t) is e-convex, (i.e. ψ(et) is convex).

We say that ψ is coercive if

limt→0

ψ(t) = limt→+∞

ψ(t) = +∞. (4.1)

71

Page 84: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

72 Chapter 4. Barrier functions

Clearly, this denition implies that ψ(t) is nonnegative and strictly convex whereasψ(1) = 0. As can be easily veried, any kernel function ψ is completely determinedby its second derivative:

ψ(t) =∫ t

1

∫ ξ

1

ψ′′(ζ)dζdξ.

For the purpose of this work, and as in [9], we consider more conditions on thekernel functions. We require that ψ is three times continuous dierentiable and

ψ′′′(t) < 0, (4.2)2ψ′′(t)2 − ψ′(t)ψ′′′(t) > 0, t < 1, (4.3)

ψ′′(t)ψ′(βt)− βψ′(t)ψ′′(βt) > 0, t > 1, β > 1. (4.4)

The e-convexity property of the kernel function admits dierent interpretations,which are described below. As observed by Glineur ([21]), condition 4.3 is also con-sequence of the self-concordant barrier conditions with complexity parameter 1.

Lemma 4.2.1 (Lemma 2.1.2 in [43]). The following three properties are equivalent:

(i) ψ(t) is e-convex;

(ii) ψ(√

t1t2) ≤ 12 (ψ(t1) + ψ(t2)) for t1, t2 > 0;

(iii) ψ′(t) + tψ′′(t) ≥ 0, t > 0.

Following [9], we say that ψ(t) exponential convex, or shortly e-convex if and onlyif ψ(et) is convex. This property has been proven to be very useful in the analysis ofprimal-dual algorithms based on kernel functions (see for example [9, 43]). Of course,Lemma 4.2.1 gives dierent interpretations of e-convexity.

In what follows we present some technical properties of the kernel functions.

Denition 4.2.2. Any kernel function that is coercive and satises the conditions(4.2)-(4.4) is called an eligible kernel function. 2

In the sequel we assume that ψ is an eligible kernel function and present sometechnical lemmas for ψ.

We frequently use that if K > 0 then there are precisely two values of t for whichψ(t) = K. This follows since ψ(t) is strictly convex and minimal at t = 1, withψ(1) = 0. If these values are t1 and t2, and t1 ≤ t2, then t1 < 1 < t2.

Lemma 4.2.3 (Lemma 3.1 in [9]). Suppose that ψ(t1) = ψ(t2), with t1 ≤ 1 ≤ t2 andβ ≥ 1. Then

ψ(βt1) ≤ ψ(βt2).

Equality holds if and only if β = 1 or t1 = t2 = 1.

Page 85: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

4.3 Barrier functions based on kernel functions 73

Lemma 4.2.4 (Lemma 4.8 in [9]). Suppose that ψ(t1) = ψ(t2), with t1 ≤ 1 ≤ t2.Then ψ′(t1) ≤ 0 and ψ′(t2) ≥ 0, whereas

−ψ′(t1) ≥ ψ′(t2).

Lemma 4.2.5 (Lemma 2.6 in [9]). We have

12ψ′′(t)(t− 1)2 < ψ(t) <

12ψ′′(1)(t− 1)2, t > 1,

12ψ′′(1)(t− 1)2 < ψ(t) <

12ψ′′(t)(t− 1)2, t < 1.

4.3 Barrier functions based on kernel functions

Let the triple (V, , 〈, 〉) be an n-dimensional Euclidean Jordan algebra with rankr, where stands for the Jordan product and 〈, 〉 the inner product given by〈x, y〉 = tr (x y) . Let x be an element in V and x =

∑ri=1 λi(x)ci be its spec-

tral decomposition, where λi(x) with i = 1 . . . , r are the eigenvalues of x, such thatλr(x) ∈ Rr

↓ and c1, . . . , cr is a Jordan frame for x. As before, K is the symmetric coneassociated to V . Moreover, we assume that x ∈ K0, which means, by Proposition2.5.10, that λi(x) > 0 for i = 1, . . . , r.

We can now extend the kernel functions to Euclidean Jordan algebras. Let ψbe a kernel function as dened in the previous section. Since the eigenvalues of anyelement of a Euclidean Jordan algebra are real (see Theorem 2.4.1), we can extendthese real functions to Euclidean Jordan algebras. We dene the function φ : K0 7→ Vas

φ(x) :=r∑

i=1

ψ(λi(x))ci.

The barrier function induced by the kernel function is now dened by,

Ψ(x) := tr (φ(x)) =r∑

i=1

ψ(λi(x)). (4.5)

So, if we dene, for a ∈ Rr,

Ψ(a) :=r∑

i=1

ψ(ai)

thenΨ(x) = (Ψλ)(x),

thus making clear that Ψ is a separable spectral function (cf. Section 3.5).The next lemma establishes that e-convex functions preserve systems of inequali-

ties such as (3.3).

Page 86: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

74 Chapter 4. Barrier functions

Lemma 4.3.1 (Corollary 3.3.10 in [25]). Let f be a real function and α1, . . . , αn,β1, . . . , βn be 2n given nonnegative real numbers such that α1 ≥ · · · ≥ αn ≥ 0 andβ1 ≥ · · · ≥ βn ≥ 0. If

k∏

i=1

αi ≤k∏

i=1

βi, k = 1, . . . , n− 1, (4.6)

n∏

i=1

αi =n∏

i=1

βi (4.7)

and f(t) is e-convex on the interval [βn, β1], thenn∑

i=1

f(αi) ≤n∑

i=1

f(βi).

Suppose now that ψ is e-convex. Our next theorem presents an appealing propertyof barrier functions induced by e-convex kernel functions. It is a crucial result of thethesis.Theorem 4.3.2. Let Ψ(v) be the function dened in (4.5) and x, s ∈ K0. Then

Ψ((P (x)1/2s)1/2) ≤ 12(Ψ(x) + Ψ(s)). (4.8)

Proof. Let u = P (x)1/2s, then, using λi(u1/2) = λ1/2i (u),

Ψ(u1/2) =r∑

i=1

ψ(λ1/2i (u)).

Since by Theorem 3.3.2 the real numbers

αi := λ1/2i (u) > 0, i = 1, . . . , r

andβi := λ

1/2i (x)λ1/2

i (s) > 0, i = 1, . . . , r

satisfy inequality (4.6) and equality (4.7), and ψ(t) is e-convex, we haver∑

i=1

ψ(λ1/2i (u)) ≤

r∑

i=1

ψ(λ1/2i (x)λ1/2

i (s)).

By Lemma 4.2.1, it follows thatr∑

i=1

ψ(λ1/2i (x)λ1/2

i (s)) ≤r∑

i=1

(12ψ(λi(x)) +

12ψ(λi(s))

).

ThereforeΨ(u1/2) ≤ 1

2(Ψ(x) + Ψ(s)).

The result is proved. ¥

Page 87: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

4.4 Derivatives of the barrier function 75

4.4 Derivatives of the barrier function

Let x(t) := x0 + tu with t ∈ R and u ∈ V and assume that x(t) ∈ K0. Let x(t) =∑ri=1 λi(x(t))ci be the spectral decomposition of x(t).Our following aim is to obtain expressions for DtΨ(x(t)) and D2

t Ψ(x(t)). For thepurpose we use Theorem 3.6.2 and Lemma 3.6.3, respectively. By Theorem 3.6.2 wehave

DtΨ(x(t)) = tr (DxΨ(x(t)) x′(t)) = tr

(r∑

i=1

ψ(λi(x(t)))ci u

). (4.9)

In order to obtain D2t Ψ(x), we rst rewrite (3.19):

DxG(x) =r∑

i=1

f ′(λi)Pii +∑

j<kλj=λk

f ′(λj)Pjk +∑

j<kλj 6=λk

f(λj)− f(λk)λj − λk

Pjk.

By Theorem 3.6.2, DxΨ(x) =∑r

i=1 ψ′(λi(x))ci. Hence, applying (3.19) to DxΨ(x)we get

D2xΨ(x) =

r∑

i=1

ψ′′(λi)Pii +∑

j<kλj=λk

ψ′′(λj)Pjk +∑

j<kλj 6=λk

ψ′(λj)− ψ′(λk)λj − λk

Pjk.

Since Dtx(t) = u, we have DtΨ(x(t)) = 〈DxΨ(x(t)), u〉. From here

D2t Ψ(x(t)) = 〈D2

xΨ(x(t))u, u〉,i.e.,

D2t Ψ(x(t))=

r∑

i=1

ψ′′(λi)〈Piiu, u〉+∑

j<kλj=λk

ψ′′(λj)〈Pjku, u〉+∑

j<kλi 6=λj

ψ′(λj)− ψ′(λk)λj − λk

〈Pjku, u〉.

We can easily verify that

〈Piiu, u〉 = 〈Piiu, Piiu〉 = 〈uici, uici〉 = u2i ,

and〈Piju, u〉 = tr

(u2

ij

),

because Pii and Pjk are orthogonal projections onto the Peirce spaces V (ci, 1) andVjk, respectively. Therefore,

D2t Ψ(x(t)) =

r∑

i=1

ψ′′(λi)u2i +

j<kλj=λk

ψ′′(λj)tr(u2

jk

)+

j<kλj 6=λk

ψ′(λj)− ψ′(λk)λj − λk

tr(u2

jk

).

(4.10)

Page 88: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

76 Chapter 4. Barrier functions

We can get a upper bound for D2t Ψ(x(t)) that will allows us to work with a simpler

expression.

Proposition 4.4.1. One has

D2t Ψ(x(t)) ≤

r∑

i=1

ψ′′(λi)u2i +

j<k

ψ′′(λk)tr(u2

jk

).

Proof. Note that by the well known mean value theorem, there exists β ∈ (λk, λj)such that

ψ′(λj)− ψ′(λk)λj − λk

= ψ′′(β).

But this can be bounded by ψ′′(λk) because Condition 4.2 implies that ψ′′(t) ismonotonically decreasing, and we have assumed that for j < k such that λj 6= λk wehave λj > λk. Since the second term of the right-hand side in equation (4.10) is forλk = λj , the result immediately follows. ¥

4.5 Conclusion

In this chapter we introduced a barrier function that is generated by an eligible kernelfunction. The inequality (4.8) is crucial in the analysis of the algorithm presentedin the following chapter. Up till now this inequality was known only for the cone ofpositive semidenite matrices and for the second-order cone.

Page 89: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

Chapter 5Interior-point methods based on

kernel functions

5.1 Introduction

This chapter introduces interior-point methods for symmetric optimization based onthe kernel functions introduced in the previous chapter. We rst recall the denitionof the symmetric optimization problem and the necessary and sucient conditionsfor optimality. Then we present the so-called Nesterov-Todd (NT) direction adaptedto the case of symmetric optimization. After dening the search direction that useskernel functions, we present and analyze the algorithm. Later, the kernel functionsdiscovered so far are listed. We conclude the chapter with some notes.

5.2 Symmetric optimization problem

As before, we consider a n-dimensional Euclidean Jordan R-algebra (V, , 〈·, ·〉) withrank r and K will denote the associated symmetric cone, and the inner product isdened by 〈x, y〉 = tr (x y) . The function ψ will denote an eligible kernel function,and Ψ the associated barrier function, as dened in Section 4.3.

We consider the following primal-dual pair of optimization problems,

min〈c, x〉 : 〈ai, x〉 = bi, i = 1, . . . ,m, x ∈ K (5.1)

maxbT y :m∑

i=1

yiai + s = c, s ∈ K, y ∈ Rm, (5.2)

where c, ai ∈ V , for i = 1, . . . , m, and b ∈ Rm. We call x ∈ K primal feasible if〈ai, x〉 = bi for i = 1, . . . ,m. (y, s) ∈ Rm×K is called dual feasible if

∑mi=1 yiai+s = c.

77

Page 90: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

78 Chapter 5. Interior-point methods based on kernel functions

Let A ∈ Rm×n be the matrix corresponding to the linear transformation thatmaps x to the m-vector whose ith component is 〈ai, x〉. Then the sets of primal anddual interior feasible solutions are given by

Fp := x ∈ V : Ax = b, x ∈ K0, (5.3)

Fd := (y, s) ∈ Rm × V : AT y + s = c, s ∈ K0, y ∈ Rm.We say that the optimization problems (5.1) and (5.2) are strictly feasible if Fp

and Fd are nonempty, respectively.

5.3 Duality

In this section we recall the conditions for optimality.We start to dene duality gap: the dierence between the primal and dual objective

values at feasible solutions of (5.1) and (5.2).

Theorem 5.3.1 (Weak duality). Let x and (y, s) be primal and dual feasible, re-spectively. One has

〈c, x〉 − bT y ≥ 0,

i.e the duality gap is nonnegative at feasible solutions.

Proof. Replacing c by AT y + s and b by Ax, we have

〈c, x〉 − bT y = 〈AT y + s, x〉 − (Ax)T y = 〈s, x〉+ 〈AT y, x〉 − (Ax)T y.

Since Ax, y ∈ Rm, 〈Ax, y〉 = (Ax)T y. Hence,

〈AT y, x〉 = 〈y, Ax〉 = (Ax)T y.

It follows that〈c, x〉 − bT y = 〈s, x〉 ≥ 0,

where the inequality follows from the self-duality of K. ¥

In the following we state the strong duality theorem.

Theorem 5.3.2 (Strong duality). Let

p∗ := inf〈c, x〉 : Ax = b, x ∈ K

andd∗ := supbT y : AT y + s = c, s ∈ K.

If there exists a strictly feasible solution (y, s) for (5.2) and d∗ is nite, then p∗ = d∗

and p∗ is attained for some x such that Ax = b and x ∈ K.

Page 91: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

5.3 Duality 79

We omit the proof because it is exactly the same as the proof of the conic dualitytheorem as given in [11]. We only mention that the proof does not use that V is aEuclidean Jordan algebra.

We use that V is a Euclidean Jordan algebra in the next result.

Lemma 5.3.3 (Lemma 2.2 in [18]). Let (x, s) ∈ K ×K and 〈x, s〉 = 0. Then

x s = 0. (5.4)

Proof. Since s ∈ K, s = z2 for some z ∈ V . So we may write, using the associativityof the inner product,

0 = 〈x, z2〉 = 〈z, x z〉 = 〈z, L(x)z〉.

We will derive from this that L(x)z = 0. As in the proof of Theorem 2.4.1, we canwrite L(x) =

∑ki=1 λiPi, where λi, i = 1, . . . , k are the eigenvalues of L(x) and Pi,

i = 1, . . . , k are non-zero orthogonal projections. So,

0 = 〈z, L(x)z〉 = 〈z,

k∑

i=1

λiPiz〉 =k∑

i=1

λi〈z, Piz〉 =k∑

i=1

λi〈Piz, Piz〉 =k∑

i=1

λi‖Piz‖2.

Since x ∈ K, by Proposition 2.5.10, the eigenvalues of x are nonnegative. Thisimplies, by Corollary 2.9.12, that the eigenvalues of L(x) are also nonnegative. Thus,λi‖Piz‖2 = 0 for all i = 1, . . . , k. Hence, λi = 0 or Piz = 0, for all i = 1, . . . , k.Therefore we obtain L(x)z = 0. Furthermore,

〈s, L(x)s〉 = 〈z2, x z2〉 = 〈z4, x〉 = 〈z3, x z〉 = 〈z3, L(x)z〉 = 0,

which using the same arguments as before, implies that L(x)s = x s = 0. ¥

Note that if x s = 0 then obviously 〈x, s〉 = 0. Thus, we have that for (x, s) ∈K ×K, 〈x, s〉 = 0 if and only if x s = 0.

The following assumptions will be made through this manuscript.

Assumption 1 The vectors ai are linearly independent.

Assumption 2 The problems (5.1) and (5.2) are strictly feasible.

Assumption 1 and 2 are not much restrictive. If the vectors ai are not linearly inde-pendent, we can reduce the problem to a subset of ai's which are linearly independent.For the case that the primal and/or dual problems are not strictly feasible we can em-bed both problems in a self-dual problem that is strictly feasible, as has been shownin e.g., [32, 49]. As a consequence of these assumptions there exist optimal solutionsfor (5.1) and (5.2) with duality gap 0.

Page 92: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

80 Chapter 5. Interior-point methods based on kernel functions

5.4 Scaling

In this section we show the existence and the uniqueness of a scaling point w corre-sponding to any points x, s ∈ K0, such that P (w) takes s into x. This was done forself-scaled cones in [38]. Faybusovich [16] also showed it using the framework of theEuclidean Jordan algebras.

Let the geometric mean a#b of a, b ∈ K0 be dened as

a#b := P (a12 )(P (a−

12 )b)

12 .

We have a#b ∈ K0, because, by Proposition 2.5.7, P (a−12 )b ∈ K0 and also

P (a12 )(P (a−

12 )b)

12 ∈ K0.

Remark 5.4.1. The name geometric mean can be explained in the following way:let V be the Euclidean Jordan algebra given in Example 2.4.3. Since for a, b ∈ V ,

P (a) = Diag(a21; . . . ; a

2r),

we have in that case

a#b = Diag(a1; . . . ; ar)(a−11 b1; . . . ; a−1

r br)1/2 = (a1/21 b

1/21 ; . . . ; a1/2

r b1/2r ).

So, the components of the last vector (aibi)1/2, with i = 1 . . . , r, are the geometricmean of the components ai and bi. 2

Proposition 5.4.2 (Proposition 2.4 in [31]). Let a, b ∈ K0. Then a#b is the uniquesolution which belongs to K0 of the following equation in x:

P (x)a−1 = b. (5.5)

Proof. With x = a#b using the fundamental formula (2.17) of the quadratic repre-sentation, we may write

P (x)a−1 = P (P (a12 )(P (a−

12 )b)

12 )a−1

= P (a12 )P (P (a−

12 )b)

12 )P (a

12 )a−1

= P (a12 )P (P (a−

12 )b)

12 )e

= P (a12 )P (a−

12 )b = b.

Now suppose that P (x)a−1 = P (y)a−1 for some y ∈ K0. Then

P (P (x)a−1) = P (P (y)a−1),

which by (2.17) implies that

P (x)P (a−1)P (x) = P (y)P (a−1)P (y).

Page 93: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

5.4 Scaling 81

By Proposition 2.5.7, P (x), P (y), P (a−1) are positive denite. Therefore, usingLemma B.1.3, P (x) = P (y). It follows that

x2 = P (x)e = P (y)e = y2.

By Theorem 2.4.2 we can write x2 =∑r

i=1 λici = y2. Since x, y ∈ K0, the eigenvaluesof x and y are positive. Thus,

x =r∑

i=1

√λici = y.

Everything is proved. ¥

The geometric mean is commutative: a#b = b#a. This can be checked by verify-ing that b#a is also a solution of the quadratic equation (5.5).

We conclude that for given (x, s) ∈ K0 × K0, there exists a unique w ∈ K0 suchthat

P (w)s = x,

namelyw = x#s−1.

The point w is called the scaling point of x and s. It coincides with the uniquescaling point introduced by Nesterov and Todd [38] for self-scaled cones. Below wepresent a few examples.

Example 5.4.3. Let V = Rn and K be the linear cone. For any x, s ∈ K0, we sawin Remark 5.4.1 that

x#s−1 =

(x

1/21

s1/21

, . . . ,x

1/2r

s1/2r

). (5.6)

Thus, the unique scaling point w is given by (5.6). 2

Example 5.4.4. Let V = Sn and K be the cone of real positive semidenite matrices.For any X, S ∈ K0 we have P (X)S = XSX (cf. Example 2.3.3). Thus,

W = X#S−1

= P (X1/2)(P (X−1/2)S−1)1/2

= P (X1/2)(X−1/2S−1X−1/2)1/2

= X1/2(X1/2SX1/2)−1/2X1/2.

The point W coincides with scaling point given in [38]. 2

Page 94: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

82 Chapter 5. Interior-point methods based on kernel functions

Example 5.4.5. Let V = Rn+1 and K be the second-order cone. Using the matrixrepresentation L(x) as given in Example 2.3.2, we can easily get that L2(x) = L(x2).Thus

P (x)s =(2L2(x)− L(x2)

)s = L(x2)s = x2 s,

with x, s ∈ V . So, for any x, s ∈ K0 we have

w = x#s−1

= P (x1/2)(P (x−1/2)s−1)1/2

= x (P (x−1/2)s−1)1/2

= x P (x1/2)s)−1/2

= x (x s)−1/2,

where the fourth equality follows from Proposition 2.3.8-(ii). 2

5.5 The central path

In this section, we introduce the concept of central path for the primal and dualproblems (5.1) and (5.2). We will prove the existence and uniqueness of the centralpath.

Recall from Section 5.2 the denition of the symmetric optimization problem. Aswe have seen in Section 5.3, under assumptions 1 and 2, the conditions

〈ai, x〉 = bi, i = 1, . . . ,m, x ∈ K∑mi=1 yiai + s = c s ∈ K

x s = 0.

are necessary and sucient for optimality. We perturb the optimality conditions,more precisely, the so-called complementary condition, x s = 0, by introducing aparameter µ > 0, as follows

〈ai, x〉 = bi, i = 1, . . . , m, x ∈ K∑mi=1 yiai + s = c s ∈ K

x s = µe.(5.7)

Now we will prove that (5.7) has a unique solution for each µ > 0. The function

fµp (x) :=

〈c, x〉µ

− log det x, x ∈ K0

is the so-called primal logarithmic barrier function.This function is strictly convex. Just note that the Hessian of − log det x is P (x)−1

(Proposition 2.6.1), which is positive denite for x ∈ K0. This implies that − log det xis strictly convex. Since 〈c, x〉 is linear in x the strictly convexity of fµ

p follows.The existence of the minimizer of fµ

p is proved below.

Page 95: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

5.5 The central path 83

Theorem 5.5.1. If Fp and Fd are nonempty then fµp has a unique minimizer, on

the strictly feasible region Fp.

Proof. By hypothesis, there exist vectors x0 ∈ Fp and (y0, s0) ∈ Fd. Taking K =fµ

p (x0) and dening the level set of fµp (x) by

LK = x ∈ Fp : fµp (x) ≤ K,

we have that x0 ∈ LK , so LK is not empty. Since fµp is strictly convex, if fµ

p has aminimizer then it is unique. The existence of the minimizer can be proved by showingthat LK is compact. By continuity of fµ

p , LK is closed. It remains to prove that LK

is bounded. Let x ∈ LK . Using Proposition 5.3.1 we have

〈c, x〉 − bT y0 = 〈x, s0〉,so, in the denition of fµ

p (x) we may replace 〈c, x〉 by bT y0 + 〈x, s0〉:

fµp (x) =

〈c, x〉µ

− log det x =1µ

bT y0 +1µ〈x, s0〉 − log det x.

Since〈x, s0〉 = tr

(x s0

)= tr

(P (x)1/2s0

)= 〈P (x)1/2s0, e〉,

tr (e) = r and det(P (x)1/2s0) = det(x) det(s0) (cf. Proposition 2.5.12), it follows that

fµp (x) = 〈e, P (x)1/2s0

µ− e〉 − log det

P (x)1/2s0

µ+ r − r log µ +

bT y0 + log det s0,

or equivalently,

〈e, P (x)1/2s0

µ− e〉 − log det

P (x)1/2s0

µ= fµ

p (x)− r + r log µ− 1µ

bT y0 − log det s0.

Hence, using fµp (x) ≤ K and dening K by

K := K − r + r log µ− 1µ

bT y0 − log det s0,

we obtain〈e, P (x)1/2s0

µ− e〉 − log det

P (x)1/2s0

µ≤ K. (5.8)

Note that K does not depend on x. Now let the function ψ : (0, +∞) 7→ R be thefunction dened

φ(t) := t− 1− log t.

If we dene Ψ(u) :=∑r

i=1 ψ(λi(u)), then we may rewrite (5.8) as follows

Ψ(

P (x)1/2s0

µ

)≤ K.

Page 96: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

84 Chapter 5. Interior-point methods based on kernel functions

In fact, ψ is an eligible kernel function (cf. [20]). Thus ψ is nonnegative in its domainwhich implies that

ψ

(λi

(P (x)1/2s0

µ

))≤ K.

also that there exist t1 < 1 and t2 > 1, such that

ψ(t1) = ψ(t2) = K.

We conclude that

t1 ≤ λi

(P (x)1/2s0

µ

)≤ t2 i = 1, . . . , r.

Adding these expressions, for all i, we obtain

rt1 ≤ tr(

P (x)1/2s0

µ

)≤ rt2.

Using Lemma 3.3.4 we obtain

0 < tr (x) ≤ rt2µ

λmin(s0),

where λmin(s0) denotes the minimum eigenvalue of s0. Since x ∈ K0, the eigenvaluesof x are positive (cf. Proposition 2.5.10), which implies,

‖x‖ =

√√√√r∑

i=1

λ2i (x) ≤

√√√√(

r∑

i=1

λi(x)

)2

= tr (x) .

Therefore the level set LK is bounded. We conclude that LK is compact. ¥

The existence and uniqueness of solution of the system (5.7) is guaranteed by thefollowing proposition.

Proposition 5.5.2. For each µ > 0 there exists a unique solution of the system (5.7),denoted as (x(µ), y(µ), s(µ)).

Proof. Consider the following problem

minx∈K0

p (x) :=〈c, x〉

µ− log det x : 〈ai, x〉 = bi, i = 1, . . . , r

,

i.e. the minimization of the primal logarithmic barrier function over Fp. Since thefunction fµ

p is strictly convex and we assumed that the problem (5.1) is strictly fea-sible, the constraint qualications hold for this problem. So, the KKT (rst order)optimality conditions for this problem, with feasible x, are

Page 97: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

5.6 The Nesterov-Todd direction 85

∇fµp (x)−

m∑

i=1

yi∇(〈ai, x〉 − bi) = 0,

which is equivalent to the system

c− x−1 −m∑

i=1

yiai = 0

〈ai, x〉 − bi = 0 i = 1, . . . , m.

See Proposition 2.6.1 for the gradient of log det x. Dening s = c−∑mi=1 yiai where

yi = µyi, this system is the same as (5.7), provided that x s = µe. So it remainsto show that s = µx−1 if and only if x s = µe. It is straightforward to see thats = µx−1 implies x s = µe. To show the converse, note that µx−1 is a solution ofthe equation xs = µe. On the other hand, since x ∈ K0, by Corollary 2.9.12, L(x) isa positive denite operator. Thus implying that L(x)s = µe has a unique solution. Itfollows that KKT conditions are equivalent to the conditions (5.7). Since by Theorem5.5.1 fµ

p has a unique minimizer, this minimizer is given by the system (5.7). ¥

The solution of the perturbed system (5.7) denes a curve parameterized by µ,through the feasible region, which leads to the optimal set as µ → 0. This curveis called the central path and most interior-point methods approximately follow thecentral path to reach the optimal set.

5.6 The Nesterov-Todd direction

Nesterov and Todd in [38] introduced the so-called Nesterov-Todd (shortly NT) direc-tion. It is dened as a triple of vectors (∆x, ∆y∆s) ∈ V × V that is uniquely denedby the conditions:

〈ai,∆x〉 = 0, i = 1, . . . , m,∑mi=1 ∆yiai + ∆s = 0

∆x + (F ′′(w))−1∆s = −x + µs−1,

where F ′′(w) is the Hessian of the barrier function F (x) := − log detx for K atw = x#s−1. Their theory applies to optimization over self-scaled cones, but as wehave mentioned earlier, it was shown in [22] that a cone is self-scaled if and only if itis symmetric. Thus, using Proposition 2.6.1, i.e., the fact that F ′′(x) = P (x)−1 forx ∈ K0, the NT direction in terms of Euclidean Jordan algebra can be dened by thesystem

〈ai,∆x〉 = 0, i = 1, . . . ,m,∑mi=1 ∆yiai + ∆s = 0∆x + P (w)∆s = −x + µs−1,

(5.9)

Page 98: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

86 Chapter 5. Interior-point methods based on kernel functions

This system for dening the NT direction in the framework of Euclidean Jordanalgebras was obtained rst in [16]. Note that the rst and the second equations implythat

〈∆x, ∆s〉 = 0.

We will use the scaling point w to scale the NT direction. We set

v :=1õ

P (w)−12 x =

1õ

P (w)12 s. (5.10)

Then the last equation from (5.9) can be written as,

∆x + P (w)∆s = −√µP (w)12 v +

µ√µ

P (w)12 v−1.

Applying P (w)−12 to both sides of this equation, we can rewrite it in the form:

dx + ds = v−1 − v,

wheredx =

1õ

P (w)−12 ∆x (5.11)

andds =

1õ

P (w)12 ∆s. (5.12)

Thus, we can rewrite the system that denes the NT direction for symmetric cones(or self-scaled cones) as

〈aj , dx〉 = 0 j = 1, . . . ,m,−∑m

j=1 ∆yj aj = ds

dx + ds = v−1 − v(5.13)

where aj = 1õP (w)

12 aj .

In the next section we will use the characterization (5.13) of the NT direction toderive a new direction.

5.7 A new search direction for symmetric optimization

We now are ready to generalize the search direction introduced in [41] for LO tosymmetric optimization, using the framework of Euclidean Jordan algebras.

The so-called dual logarithmic barrier function is given by

fµd (y, s) :=

bT y + log det s, (y, s) ∈ Fd.

Page 99: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

5.7 A new search direction for symmetric optimization 87

Combining the primal and dual logarithmic barrier functions, we dene the primal-dual logarithmic barrier function as

fµ(x, s) := fµp (x)− fµ

d (y, s)− r + r log µ

=1µ〈c, x〉 − 1

µbT y − log det x− log det s− r + r log µ

=1µ〈x, s〉 − log det

P (x)1/2s

µ− r.

This can be written in terms of v as,

fµ(x, s) = 〈P (w)12 v, P (w)−

12 v〉 − log det((P (w)

12 v) det((P (w)−

12 v)− r

= 〈v, v〉 − log(det(w) det(v) det(w)−1 det(v))− r

= 〈v2, e〉 − 2 log det v − r

= 〈v2 − e, e〉 − 2 log det v,

which is called the v-scaled primal-dual logarithmic barrier function and will be de-noted as Φ(v).Remark 5.7.1. The function Φ(v) can be generated by a kernel function. If v =∑r

i=1 λi(v)ci is a spectral decomposition of v, then

〈v2 − e, e〉 = tr(v2 − e

)= tr

(r∑

i=1

λ2i (v)ci −

r∑

i=1

ci

)=

r∑

i=1

(λ2i (v)− 1)

andlog det v = log

r∏

i=1

λi(v) =r∑

i=1

log λi(v).

Thus we obtainΦ(v) =

i=1

ψ(λi(v)),

with ψ(t) = t2 − 1− 2 log t. The funcion 12ψ(t) is an eligible kernel function (cf. [9]).

2

In the case of LO it was observed that the gradient of Φ(v) and the right-handside of the last equation in the system (5.13) coincide (cf. [9]). It is clear that herethe same holds:

12∇Φ(v) = v − v−1,

using Theorem 3.6.2. This coincidence motivated a new search direction, which isdened by the following system

〈aj , dx〉 = 0 j = 1, . . . ,m,∑∆yj aj + ds = 0

dx + ds = −∇Ψ(v),(5.14)

Page 100: Jordan Algebraic Approach to Symmetric Optimization · 2012-12-10 · This simpler analysis in [9] motivated us to extend the analysis to symmetric optimization problems,whichistheaimofthisthesis.

88 Chapter 5. Interior-point methods based on kernel functions

where Ψ(v) is the barrier function induced by a kernel function as dened in Section4.3 and ∇Ψ(v) its gradient. At this point, it is worth to verify what happens to vwhen x and s are in the central path.

Proposition 5.7.2. For primal and dual strictly feasible points x and s we have

x s = µe ⇐⇒ v = e.

Proof. By the proof of Proposition 5.5.2 we know that x ∘ s = µe is equivalent to s = µx^{−1}. By the definition (5.10) of v it follows that s = µx^{−1} is equivalent to

√µ P(w)^{−1/2} v = (µ/√µ) (P(w)^{1/2} v)^{−1}.

It follows that

P(w)^{−1/2} v = P(w)^{−1/2} v^{−1},

and we obtain v² = e. If v = ∑_{i=1}^r λi(v)ci then v² = ∑_{i=1}^r λi²(v)ci = ∑_{i=1}^r ci. Hence λi²(v) = 1, which implies that λi(v) = 1, since v ∈ K. Therefore v = e. The other implication follows analogously. □

The system (5.14) defines the search direction uniquely. We prove this below.

Proposition 5.7.3. Under Assumption 1 and Assumption 2, there is a unique solution to (5.14).

Proof. It suffices to prove that the linear operator defined by the left-hand side of (5.14) is injective. Let

〈āj, dx〉 = 0, j = 1, . . . , m,
∑ ∆yj āj + ds = 0,
dx + ds = 0.

Taking the inner product of the last equation with dx, and using 〈dx, ds〉 = 0, it follows that

‖dx‖² = 0,

implying dx = 0. By the same reasoning it follows that ds = 0. By Assumption 1, a1, . . . , am are linearly independent, which implies that ∆y = 0. Note that the system (5.14) is only well defined under Assumption 2. The result follows. □

A solution of the system (5.14) returns dx, ∆y and ds. The original search directions ∆x and ∆s are obtained using (5.11) and (5.12), i.e.,

∆x = √µ P(w)^{1/2} dx and ∆s = √µ P(w)^{−1/2} ds.
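As an illustration, the sketch below solves (5.14) and recovers ∆x and ∆s in the LO special case, where the algebra is R^n with the componentwise product, P(w)^{1/2} acts as componentwise multiplication by w, and the NT scaling point reduces to w = √(x/s); the function names are ours and merely illustrative:

    import numpy as np

    def nt_search_direction(A, x, s, mu, grad_Psi):
        # A is the m-by-n matrix whose rows are the vectors a_j; x, s > 0 are
        # strictly feasible; grad_Psi maps v to the gradient of Psi.
        w = np.sqrt(x / s)                   # NT scaling point (LO case)
        v = x / (np.sqrt(mu) * w)            # (5.10): v = P(w)^{-1/2} x / sqrt(mu)
        Abar = A * (w / np.sqrt(mu))         # scaled rows: P(w)^{1/2} a_j / sqrt(mu)
        g = grad_Psi(v)
        # (5.14): Abar dx = 0,  Abar^T dy + ds = 0,  dx + ds = -g.
        # Eliminating dx = -g + Abar^T dy gives normal equations for dy:
        dy = np.linalg.solve(Abar @ Abar.T, Abar @ g)
        ds = -Abar.T @ dy
        dx = -g - ds
        # Recover the original directions via (5.11) and (5.12):
        return np.sqrt(mu) * w * dx, dy, np.sqrt(mu) * ds / w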


5.8 The algorithm

The algorithm we propose in this section is adapted from [43]. We start by remarking that since dx and ds are orthogonal, we have

dx = ds = 0 ⇐⇒ ∇Ψ(v) = 0 ⇐⇒ v = e ⇐⇒ Ψ(v) = 0,

i.e., if and only if x and s belong to the central path. Hence, if

(x, y, s) ≠ (x(µ), y(µ), s(µ)),

then (∆x, ∆y, ∆s) is nonzero. By taking a step along the search direction, with a step size α defined by some line search rules, one constructs a new triple (x, y, s) according to

x+ := x + α∆x, y+ := y + α∆y, s+ := s + α∆s. (5.15)

If necessary, this procedure is repeated until we find iterates (x, y, s) that are close enough to (x(µ), y(µ), s(µ)). Then µ is reduced by the factor 1 − θ and we apply the above method targeting the new µ-center, and so on. This procedure is repeated until µ is small enough, say, until rµ ≤ ε.

The closeness of (x, y, s) to (x(µ), y(µ), s(µ)) is measured by the value of Ψ(v), with τ as threshold value: if Ψ(v) ≤ τ, then we start a new outer iteration, performing an update of the barrier parameter; otherwise we enter an inner iteration by computing the search directions at the current iterates with respect to the current value of µ and apply (5.15) to get new iterates; see Figure 5.1.

When it stops, the algorithm returns a solution that is τ-close (in the barrier function sense) to a point on the central path with rµ ≤ ε. However, this does not directly imply that the accuracy of the solution (measured by the duality gap) is bounded by ε. We will discuss the accuracy of the solution in Section 5.12.

We can choose the starting points (x0, s0) without loss of generality (cf. [32]).

The parameters τ, θ and the step size α should be chosen in such a way that the algorithm is optimized, in the sense that the number of iterations required by the algorithm is as small as possible. The choice of the barrier update parameter θ plays an important role in both theory and practice of IPMs. Usually, if θ is a constant independent of the dimension r of the problem, for instance θ = 1/2, then we call the algorithm a large-update (or long-step) method. If θ depends on the dimension of the problem, such as θ = 1/√r, then the algorithm is called a small-update (or short-step) method.

The choice of the step size α (α > 0) is another crucial issue in the analysis of the algorithm. It has to be taken such that the closeness of the iterates to the current µ-center improves by a sufficient amount. In the theoretical analysis the step size α is usually given a value that depends on the closeness of the current iterates to the µ-center.

It is generally agreed that the total number of inner iterations required by the algorithm is an appropriate measure for its efficiency. This number will be referred to as the iteration complexity of the algorithm; it will be described as a function of the rank r of V and the accuracy parameter ε.


Generic Primal-Dual Algorithm for Symmetric Optimization

Input:
  a threshold parameter τ > 0;
  an accuracy parameter ε > 0;
  a fixed barrier update parameter θ, 0 < θ < 1;
  (x0, s0) and µ0 = 1 such that Ψ(v0) ≤ τ.

begin
  x := x0; s := s0; µ := µ0;
  while rµ ≥ ε do    % outer iteration
  begin
    µ := (1 − θ)µ;
    while Ψ(v) > τ do    % inner iteration
    begin
      x := x + α∆x;
      s := s + α∆s;
      y := y + α∆y;
      v := (1/√µ) P(w)^{−1/2} x ( = (1/√µ) P(w)^{1/2} s );
    end
  end
end

Figure 5.1: Generic algorithm
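In the LO case the loop of Figure 5.1 can be sketched as follows, using nt_search_direction from the earlier sketch, the logarithmic kernel ψ(t) = (t² − 1)/2 − log t (number 1 in Table 5.1), and, purely for simplicity, a fixed damped step size instead of the default step size derived later in (5.26):

    import numpy as np

    def Psi_log(v):
        # Barrier induced by the logarithmic kernel, summed over the eigenvalues.
        return float(np.sum((v * v - 1.0) / 2.0 - np.log(v)))

    def generic_ipm(A, x, y, s, tau=3.0, eps=1e-8, theta=0.5, alpha=0.05):
        r = x.size                            # rank of the algebra (= n for LO)
        mu = 1.0
        while r * mu >= eps:                  # outer iteration
            mu *= 1.0 - theta
            v = np.sqrt(x * s / mu)           # LO case: v = sqrt(x * s / mu)
            while Psi_log(v) > tau:           # inner iteration
                dX, dy, dS = nt_search_direction(A, x, s, mu, lambda u: u - 1.0 / u)
                x, y, s = x + alpha * dX, y + alpha * dy, s + alpha * dS
                v = np.sqrt(x * s / mu)
        return x, y, s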


As already pointed out, the barrier function Ψ(v) not only serves to define the search direction, but also acts as a measure of closeness of the current iterates to the µ-center. In the analysis of the algorithm we also use the norm-based proximity measure δ(v) defined by

δ(v) := (1/2)‖∇Ψ(v)‖ = (1/2)‖dx + ds‖. (5.16)

Note that since Ψ is strictly convex and minimal at v = e we have

Ψ(v) = 0 ⇐⇒ δ(v) = 0 ⇐⇒ v = e.

Thus, both measures are naturally determined by the kernel function.


5.9 Analysis of the algorithm

5.9.1 Growth behavior

At the start of each outer iteration of the algorithm, just before the µ-update, we have Ψ(v) ≤ τ. Due to the update of µ, the vector v is divided by the factor √(1 − θ), with 0 < θ < 1, which in general leads to an increase in the value of Ψ(v). Then, during the subsequent inner iterations, Ψ(v) decreases until it passes the threshold τ again. Hence, during the course of the algorithm the largest values of Ψ(v) occur just after the updates of µ. That is why we derive an estimate for the effect of a µ-update on the value of Ψ(v). In other words, with β = 1/√(1 − θ) ≥ 1 we want to find an upper bound for Ψ(βv) in terms of Ψ(v). We have the following result.

Theorem 5.9.1. Let ϱ : [0,+∞) → [1,+∞) be the inverse function of ψ(t) for t ≥ 1. Then we have for any vector v ∈ K⁰ and any β ≥ 1:

Ψ(βv) ≤ rψ(β ϱ(Ψ(v)/r)).

Proof. First we consider the case where β > 1. We consider the following maximization problem:

max_v {Ψ(βv) : Ψ(v) = z},

where z is any nonnegative number. Let v = ∑_{i=1}^r λi(v)ci be the spectral decomposition of v. By Theorem 3.6.2 the first order optimality conditions for this maximization problem are

∑_{i=1}^r βψ′(βλi(v))ci = ν ∑_{i=1}^r ψ′(λi(v))ci,

where ν denotes a Lagrange multiplier. This implies that

βψ′(βλi(v)) = νψ′(λi(v)), i = 1, . . . , r. (5.17)

The remainder of the proof is the same as in the proof of Theorem 3.2 in [9], but we include it here for the sake of completeness. Since ψ′(1) = 0 and βψ′(β) > 0, we must have λi(v) ≠ 1 for all i. We may even assume that λi(v) > 1 for all i. To see this, let zi be such that ψ(λi(v)) = zi. Given zi, this equation has two solutions: λi(v) = a_i^{(1)} < 1 and λi(v) = a_i^{(2)} > 1. As a consequence of Lemma 4.2.3 we have ψ(βa_i^{(1)}) ≤ ψ(βa_i^{(2)}). Since we are maximizing Ψ(βv), it follows that we may assume λi(v) = a_i^{(2)} > 1. Thus we have shown that without loss of generality we may assume that λi(v) > 1 for all i. Note that then (5.17) implies βψ′(βλi(v)) > 0 and ψ′(λi(v)) > 0, whence also ν > 0. Now define g(t) as

g(t) := ψ′(t)/ψ′(βt), t ≥ 1.


We deduce from (5.17) that g(λi(v)) = β/ν for all i. However, by (4.4) we have g′(t) > 0 for t > 1. So g(t) is strictly monotonically increasing. Hence it follows that all λi(v)'s are mutually equal. Putting λi(v) = t > 1 for all i, we deduce from Ψ(v) = z that rψ(t) = z. This implies t = ϱ(z/r). Hence the maximal value that Ψ(βv) can attain is given by

Ψ(βte) = rψ(βt) = rψ(βϱ(z/r)) = rψ(βϱ(Ψ(v)/r)).

This proves the theorem if β > 1. For the case β = 1 it suffices to observe that both sides of the inequality in the theorem are continuous in β. □

This proves the theorem if β > 1. For the case β = 1 it suces to observe that bothsides of the inequality in the theorem are continuous in β. ¥

Remark 5.9.2. The bound of Theorem 5.9.1 is sharp: one may easily verify that if v = βe, with β ≥ 1, then the bound holds with equality. □

As a result, if Ψ(v) ≤ τ and β = 1/√(1 − θ), then

LΨ(r, θ, τ) := rψ(ϱ(τ/r)/√(1 − θ)) (5.18)

is an upper bound for Ψ(βv), the value of Ψ(v) after the µ-update. This upper bound depends on the parameters θ and τ, on the rank of the Euclidean Jordan algebra and on the kernel function. Note that there is no relevant difference between the upper bound obtained in [9] for LO and the one for symmetric optimization.

5.9.2 Decrease of the barrier function during an inner iteration

We are going to compute a default value for the step size α in order to yield a new triple (x+, y+, s+) as defined in (5.15). This results in a sufficient decrease of the barrier function during an inner iteration.

The analysis is sometimes different from the linear case and sometimes quite similar. If the analysis is different from the linear case we go into details, otherwise we avoid it. We use the quadratic representation and the similarity properties (as discussed in Section 3.2). Theorem 4.3.2 plays its crucial role here.

After a damped step we have

x+ := x + α∆x
= x + √µ α P(w)^{1/2} dx
= P(w)^{1/2}(P(w)^{−1/2} x + √µ α dx)
= P(w)^{1/2}(√µ v + √µ α dx),

and we get

x+ = √µ P(w)^{1/2}(v + α dx). (5.19)


Analogously,

s+ := s + α∆s
= P(w)^{−1/2}(P(w)^{1/2} s + √µ α ds)
= P(w)^{−1/2}(√µ v + √µ α ds),

and we can write

s+ = √µ P(w)^{−1/2}(v + α ds). (5.20)

Hence, defining v+ as

v+ := (1/√µ) P(w+)^{−1/2} x+ = (1/√µ) P(w+)^{1/2} s+,

we have

v+ = P(w+)^{1/2} P(w)^{−1/2}(v + α ds) = P(w+)^{−1/2} P(w)^{1/2}(v + α dx),

where, according to Proposition 5.4.2,

w+ := P(x+)^{1/2}((P(x+)^{1/2} s+)^{−1/2}).

The following step is a key element in our analysis. We replace v+ by an element which is similar to the new iterate v+, thus simplifying the analysis.

Proposition 5.9.3. One has

v+ ∼ (P(v + α dx)^{1/2}(v + α ds))^{1/2}.

Proof. By Proposition 3.2.4 we have

√µ v+ = P(w+)^{1/2} s+ ∼ (P(x+)^{1/2} s+)^{1/2}.

Using (5.19) and (5.20), we get

(P(x+)^{1/2} s+)^{1/2} = (µ P(P(w)^{1/2}(v + α dx))^{1/2} P(w)^{−1/2}(v + α ds))^{1/2}. (5.21)

Since by Proposition 3.2.3-(ii) (with z = w^{1/2}) the second term of (5.21) is similar to

(µ P(v + α dx)^{1/2}(v + α ds))^{1/2},

the result follows. Note that we used P(w^{1/2}) = P(w)^{1/2} (Proposition 2.5.11). □


In the case of LO, the two elements of Proposition 5.9.3 are equal. However, for our purpose the similarity property is enough, because it implies that

Ψ(v+) = Ψ((P(v + α dx)^{1/2}(v + α ds))^{1/2}).

Therefore, by Theorem 4.3.2,

Ψ(v+) ≤ (1/2)(Ψ(v + α dx) + Ψ(v + α ds)).

Defining

f(α) := Ψ(v+) − Ψ(v),

we thus have f(α) ≤ f1(α), where

f1(α) := (1/2)(Ψ(v + α dx) + Ψ(v + α ds)) − Ψ(v).

Note that f1(α) gives an upper bound for the decrease of the barrier function. Working with f1 instead of f has two advantages: f1 is convex, whereas f is in general not convex, and the derivatives of f1 are easier to compute than those of f.

Obviously, f(0) = f1(0) = 0. Using (4.9), the derivative of f1(α) with respect to α is given by

f1′(α) = (1/2)(tr(Ψ′(v + α dx) ∘ dx) + tr(Ψ′(v + α ds) ∘ ds)).

This gives

f1′(0) = (1/2) tr(Ψ′(v) ∘ (dx + ds)) = −(1/2) tr(∇Ψ(v) ∘ ∇Ψ(v)) = −(1/2)‖∇Ψ(v)‖² = −2δ²(v).

Following Theorem 2.4.2 we can write v + α dx = ∑_{i=1}^r λi(v + α dx)ci and v + α ds = ∑_{i=1}^r λi(v + α ds)bi. Let dx = ∑_{i=1}^r dx_i ci + ∑_{i<j} dx_{ij} be the Peirce decomposition of dx with respect to the Jordan frame c1, . . . , cr, and let ds = ∑_{i=1}^r ds_i bi + ∑_{i<j} ds_{ij} be the Peirce decomposition of ds with respect to the Jordan frame b1, . . . , br. For simplicity of notation, below we use ηi := λi(v + α dx) and γi := λi(v + α ds), for i = 1, . . . , r. Thus, if we differentiate f1′ with respect to α, using (4.10), we obtain

f1″(α) = g1(α) + g2(α), (5.22)

where

g1(α) = ∑_{i=1}^r ψ″(ηi) dx_i² + ∑_{i<j, ηi=ηj} ψ″(ηi) tr(dx_{ij}²) + ∑_{i<j, ηi≠ηj} ((ψ′(ηi) − ψ′(ηj))/(ηi − ηj)) tr(dx_{ij}²)


and

g2(α) = ∑_{i=1}^r ψ″(γi) ds_i² + ∑_{i<j, γi=γj} ψ″(γi) tr(ds_{ij}²) + ∑_{i<j, γi≠γj} ((ψ′(γi) − ψ′(γj))/(γi − γj)) tr(ds_{ij}²).

Proposition 5.9.4. We have

f1″(α) ≤ (1/2)(∑_{i=1}^r ψ″(ηi) dx_i² + ∑_{i<j} ψ″(ηj) tr(dx_{ij}²)) + (1/2)(∑_{i=1}^r ψ″(γi) ds_i² + ∑_{i<j} ψ″(γj) tr(ds_{ij}²)). (5.23)

Proof. Using (5.22) and the above expressions for g1(α) and g2(α), this proposition follows from Proposition 4.4.1. □

In the following we want to find the step size α that minimizes f1(α). We already know that f1′(0) = −2δ(v)² < 0, which means that f1(α) is monotonically decreasing in a neighborhood of α = 0.

Using Proposition 5.9.4 we will obtain a simpler upper bound for f1″(α) than the one provided previously. Below we write δ instead of δ(v), with δ(v) as defined in (5.16).

Lemma 5.9.5. One has f1″(α) ≤ 2δ²ψ″(λmin(v) − 2αδ).

Proof. Since dx and ds are orthogonal, (5.16) implies that

4δ² = ‖dx + ds‖² = ‖dx‖² + ‖ds‖².

Hence we have ‖dx‖ ≤ 2δ and ‖ds‖ ≤ 2δ. Therefore, by Theorem 3.3.6,

ηi = λi(v + α dx) ≥ λmin(v + α dx) ≥ λmin(v) − ‖α dx‖ ≥ λmin(v) − 2αδ.

In the same way one can prove that

γi = λi(v + α ds) ≥ λmin(v) − 2αδ.

Using the Peirce decomposition of dx with respect to c1, . . . , cr, we may write

‖dx‖² = 〈dx, dx〉 = ∑_{i=1}^r dx_i² + ∑_{i<j} tr(dx_{ij}²).

Since ψ″(t) is monotonically decreasing, and because of (5.23), it follows that

f1″(α) ≤ (1/2)ψ″(λmin(v) − 2αδ)(∑_{i=1}^r (dx_i² + ds_i²) + ∑_{i<j}(tr(ds_{ij}²) + tr(dx_{ij}²)))
= (1/2)ψ″(λmin(v) − 2αδ)(‖dx‖² + ‖ds‖²)
= 2δ²ψ″(λmin(v) − 2αδ).

This proves the lemma. □


Integrating the inequality in Lemma 5.9.5 twice with respect to α, we obtain an upper bound for f1(α). In fact, Lemma 5.9.5 is the same as Lemma 4.1 in [9]. As a consequence, from here on the analysis is similar to the case of LO. By integrating the inequality in Lemma 5.9.5 we get the next result.

Lemma 5.9.6. f1′(α) ≤ 0 certainly holds if α satisfies the inequality

−ψ′(λmin(v) − 2αδ) + ψ′(λmin(v)) ≤ 2δ. (5.24)

Lemma 5.9.6 means that for α satisfying the inequality (5.24), f1(α) is monotonically decreasing. We will obtain the largest step size that satisfies this inequality.

The proof of the next lemma depends on the condition (4.2). We can see that the lemma is valid for symmetric optimization by just applying Lemma 4.3 in [9] to the vector λ(v) ∈ R^r.

Lemma 5.9.7 (Lemma 4.3 in [9]). Let ρ : [0,+∞) → (0, 1] denote the inverse function of the restriction of −(1/2)ψ′(t) to the interval (0, 1]. Then the largest step size ᾱ that satisfies (5.24) is given by

ᾱ := (1/(2δ))(ρ(δ) − ρ(2δ)). (5.25)

The next lemma follows from Lemma 5.9.7. We deduce a lower bound for ᾱ which is easier to work with than ᾱ itself.

Lemma 5.9.8 (Lemma 4.4 in [9]). Let ρ and ᾱ be as defined in Lemma 5.9.7. Then

ᾱ ≥ 1/ψ″(ρ(2δ)).

Note that an upper bound for f1(α) is also an upper bound for f(α). Analogously to LO, we can obtain an upper bound for f1(α) and simplify it using the technical Lemma D.1.1. This upper bound is given below.

Lemma 5.9.9 (Lemma 4.5 in [9]). If the step size α is such that α ≤ ᾱ, then

f(α) ≤ −αδ².

In the following we define

α̃ := 1/ψ″(ρ(2δ)) (5.26)

and we will use α̃ as the default step size. By Lemma 5.9.8 we have ᾱ ≥ α̃. It turns out that the default step size is exactly the same as for the LO case.

The next theorem is an immediate consequence of Lemmas 5.9.8 and 5.9.9. It provides an estimate of the decrease of the barrier function when taking a damped step with size α̃.


Theorem 5.9.10 (Theorem 4.6 in [9]). With α̃ being the default step size, as given by (5.26), one has

f(α̃) ≤ −δ²/ψ″(ρ(2δ)). (5.27)
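For the logarithmic kernel the default step size (5.26) can even be written in closed form, since ρ is the root of a quadratic; a small sketch (our names, assuming ψ(t) = (t² − 1)/2 − log t):

    import numpy as np

    def rho(s):
        # Inverse of -psi'(t)/2 = (1/t - t)/2 on (0, 1]:
        # 1/t - t = 2s  <=>  t^2 + 2st - 1 = 0,  so  t = sqrt(s^2 + 1) - s.
        return np.sqrt(s * s + 1.0) - s

    def default_step_size(delta):
        # alpha_tilde = 1 / psi''(rho(2 delta)), with psi''(t) = 1 + 1/t^2.
        t = rho(2.0 * delta)
        return 1.0 / (1.0 + 1.0 / (t * t))

    print(default_step_size(1.0))   # the default step shrinks as delta grows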

The following lemma depends on Condition (4.3). For its proof we refer to [9].

Lemma 5.9.11 (Lemma 4.7 in [9]). The right-hand side expression in (5.27) is monotonically decreasing in δ.

We want to express the decrease of the barrier function during an inner iteration as a function of Ψ(v). To this end we need a lower bound on δ(v) in terms of Ψ(v). Such a bound is provided by the following theorem. The statement is exactly as in [9], but its proof is slightly different.

Theorem 5.9.12. One has

δ(v) ≥ (1/2)ψ′(ϱ(Ψ(v))).

Proof. The statement in the theorem is obvious if v = e, since then δ(v) = Ψ(v) = 0. Otherwise we have δ(v) > 0 and Ψ(v) > 0. Let

v = ∑_{i=1}^r λi(v)ci

be a spectral decomposition of v. To deal with the nontrivial case we consider, for ω ≥ 0, the problem

z_ω = min_v {δ(v)² = (1/4)∑_{i=1}^r ψ′(λi(v))² : Ψ(v) = ω}. (5.28)

Therefore the first order optimality conditions are

(1/2)∑_{i=1}^r ψ′(λi(v))ψ″(λi(v))ci = γ ∑_{i=1}^r ψ′(λi(v))ci,

where γ ∈ R. This implies that

(1/2)ψ′(λi(v))ψ″(λi(v)) = γψ′(λi(v)), i = 1, . . . , r.

From this we conclude that we have either ψ′(λi(v)) = 0 or ψ″(λi(v)) = 2γ for each i. Since ψ″(t) is monotonically decreasing, this implies that all λi(v)'s for which ψ″(λi(v)) = 2γ have the same value. Denoting this value as t, and observing that all other eigenvalues have value 1 (since ψ′(λi(v)) = 0 for these eigenvalues), we conclude that

v = tc1 + · · · + tck + ck+1 + · · · + cr,


where we supposed there are k eigenvalues with value t. Now Ψ(v) = ω implies kψ(t) = ω. From here on we proceed as in the proof of Theorem 4.9 in [9]. Given k, this uniquely determines ψ(t), whence we have

4δ(v)² = k(ψ′(t))², ψ(t) = ω/k.

Note that the equation ψ(t) = ω/k has two solutions, one smaller than 1 and one larger than 1. By Lemma 4.2.4 the larger value gives the smallest value of (ψ′(t))². Since we are minimizing δ(v)², we conclude that t > 1 (since ω > 0). Hence we may write

t = ϱ(ω/k),

where, as before, ϱ denotes the inverse function of ψ(t) for t ≥ 1. Thus we obtain that

4δ(v)² = k(ψ′(t))², t = ϱ(ω/k). (5.29)

The question now is which value of k minimizes δ(v)². To investigate this, we take the derivative with respect to k of (5.29), extended to k ∈ R. This gives

d(4δ(v)²)/dk = ψ′(t)² + 2kψ′(t)ψ″(t)(dt/dk). (5.30)

From ψ(t) = ω/k we derive that

ψ′(t)(dt/dk) = −ω/k² = −ψ(t)/k,

which gives

dt/dk = −ψ(t)/(kψ′(t)).

Substitution into (5.30) gives

d(4δ(v)²)/dk = ψ′(t)² − 2ψ(t)ψ″(t).

Defining f(t) = ψ′(t)² − 2ψ(t)ψ″(t), we have f(1) = 0 and

f′(t) = 2ψ′(t)ψ″(t) − 2ψ′(t)ψ″(t) − 2ψ(t)ψ‴(t) = −2ψ(t)ψ‴(t) > 0.

We conclude that f(t) > 0 for t > 1. Hence d(δ(v)²)/dk > 0, so δ(v)² increases when k increases. Since we are minimizing δ(v)², at optimality we have k = 1. Also using that ψ(t) ≥ 0, we obtain from (5.29) that

min_v {δ(v) : Ψ(v) = ω} = (1/2)ψ′(t) = (1/2)ψ′(ϱ(ω)) = (1/2)ψ′(ϱ(Ψ(v))).

This completes the proof of the theorem. □

This completes the proof of the theorem. ¥


Remark 5.9.13. The bound of Theorem 5.9.12 is sharp. One may easily verify that if v is such that all eigenvalues are equal to 1 except one eigenvalue, which is greater than or equal to 1, then the bound holds with equality. □

Combining the results of Theorems 5.9.10 and 5.9.12 we obtain

f(α̃) ≤ −(ψ′(ϱ(Ψ(v))))² / (4ψ″(ρ(ψ′(ϱ(Ψ(v)))))). (5.31)

This expresses the decrease of Ψ(v) during an inner iteration completely in terms of Ψ(v), the first and second derivatives of ψ, and the inverse functions ρ and ϱ of −(1/2)ψ′ restricted to (0, 1] and of ψ restricted to [1,+∞), respectively.
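Reusing varrho and rho from the sketches above, the guaranteed decrease (5.31) can be evaluated numerically for the logarithmic kernel (again an illustration only):

    def guaranteed_decrease(Psi_v):
        t = varrho(Psi_v)              # t = varrho(Psi(v))
        dpsi = t - 1.0 / t             # psi'(t) for the logarithmic kernel
        u = rho(dpsi)                  # u = rho(psi'(varrho(Psi(v))))
        return -dpsi * dpsi / (4.0 * (1.0 + 1.0 / (u * u)))   # (5.31)

    print(guaranteed_decrease(10.0))   # more negative for larger Psi(v)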

5.9.3 Iteration bounds

After the update of µ to (1 − θ)µ, we have, by Theorem 5.9.1 and (5.18),

Ψ(v) ≤ LΨ(r, θ, τ) = rψ(ϱ(τ/r)/√(1 − θ)). (5.32)

We need to count how many inner iterations are required to return to the situation where Ψ(v) ≤ τ. We denote the value of Ψ(v) after the µ-update as Ψ0, and the subsequent values are denoted as Ψk, k = 1, 2, . . . . The decrease during each inner iteration is given by (5.31). In the sequel we assume that the right-hand side expression of (5.31) satisfies

(ψ′(ϱ(Ψ(v))))² / (4ψ″(ρ(ψ′(ϱ(Ψ(v)))))) ≥ κΨ(v)^{1−γ} (5.33)

for some positive constants κ and γ, with γ ∈ (0, 1].

Let us establish that such constants γ and κ do exist. At the start of each inner iteration we have Ψ(v) > τ > 0. Since

ϱ : [0,+∞) → [1,+∞)

is increasing (as the inverse of an increasing function), ϱ(Ψ(v)) > ϱ(τ) > 1. From here, we obtain ψ′(ϱ(Ψ(v))) > ψ′(ϱ(τ)) > 0. Let z := ψ′(ϱ(τ)). It follows, by the definition of ρ, that ρ(z) > 0. Using that ψ″(t) > 0, we conclude that the left-hand side expression of (5.33) is positive. By Lemma 5.9.11,

z²/(4ψ″(ρ(z))) (5.34)

is monotonically increasing in z, which implies that the left-hand side in (5.33) is greater than (5.34). Hence (5.33) certainly holds if γ = 1 and

κ = z²/(4ψ″(ρ(z))).


For each kernel function we should find constants γ and κ satisfying (5.33). This process is not straightforward and may vary, depending on the kernel function. For different cases and appropriate choices of γ and κ, see [9]. The next lemma makes clear that we want γ ∈ (0, 1] as small as possible.

Lemma 5.9.14 (Lemma 5.1 in [9]). If K denotes the number of inner iterations, we have

K ≤ Ψ0^γ/(κγ). (5.35)

The last lemma provides an estimate for the number of inner iterations in terms of Ψ0 and the constants κ and γ. Recall that Ψ0 is bounded above according to (5.32).

An upper bound for the total number of iterations is obtained by multiplying the upper bound in (5.35) for the number K by the number of barrier parameter updates, which is bounded above by (see Lemma II.17 in [45])

(1/θ) log(r/ε).

So the total number of iterations is bounded above by

(Ψ0^γ/(θκγ)) log(r/ε).
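Putting the pieces together for the logarithmic kernel, with γ = 1 and κ = z²/(4ψ″(ρ(z))) as derived above, the total bound can be evaluated numerically; this sketch reuses varrho, rho and L_Psi from the earlier sketches:

    import numpy as np

    def iteration_bound(r, theta, tau, eps):
        # Psi_0^gamma / (kappa * gamma * theta) * log(r / eps), with gamma = 1.
        t = varrho(tau)
        z = t - 1.0 / t                       # z = psi'(varrho(tau))
        u = rho(z)
        kappa = z * z / (4.0 * (1.0 + 1.0 / (u * u)))
        Psi0 = L_Psi(r, theta, tau)           # bound (5.32) after the mu-update
        return Psi0 / (kappa * theta) * np.log(r / eps)

    print(iteration_bound(r=100, theta=0.5, tau=100.0, eps=1e-6))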

5.10 Recipe to calculate a complexity bound

In this section we summarize the results of the previous sections by presenting a simple scheme to obtain iteration bounds for both large- and small-update methods. This recipe was introduced in [9] for LO. The remarkable feature of this recipe is that, for a given ψ, τ, θ and ε, it returns an iteration bound for the IPM based on them.

We first state some notational conventions that we use in this section. Let f, g : R+ → R+. We write f(t) = O(g(t)) if f(t) ≤ νg(t) for some positive constant ν, and f(t) = Θ(g(t)) if ν1 g(t) ≤ f(t) ≤ ν2 g(t) for positive constants ν1 and ν2.

Step 1 Input a kernel function ψ; an update parameter θ, 0 < θ < 1; a threshold parameter τ; and an accuracy parameter ε.

Step 2 Solve the equation −(1/2)ψ′(t) = s to get ρ(s), the inverse function of −(1/2)ψ′(t), t ∈ (0, 1]. If the equation is hard to solve, derive a lower bound for ρ(s).

Step 3 Calculate the decrease of Ψ(v) in terms of δ for the default step size α̃ from

f(α̃) ≤ −δ²/ψ″(ρ(2δ)).

Step 4 Solve the equation ψ(t) = s to get ϱ(s), the inverse function of ψ(t), t ≥ 1. If the equation is hard to solve, derive a lower bound and an upper bound for ϱ(s).


Step 5 Derive a lower bound for δ(v) in terms of Ψ(v) by using

δ(v) ≥ (1/2)ψ′(ϱ(Ψ(v))).

Step 6 Using the results of Step 3 and Step 4, find a valid inequality of the form

f(α̃) ≤ −κΨ(v)^{1−γ}

for some positive constants κ and γ, with γ ∈ (0, 1] and γ as small as possible.

Step 7 Calculate an upper bound for Ψ0 from

Ψ0 ≤ LΨ(r, θ, τ) = rψ(ϱ(τ/r)/√(1 − θ)).

Step 8 Derive an upper bound for the total number of iterations using that this number is bounded above by

(Ψ0^γ/(θκγ)) log(r/ε).

Step 9 Set τ = O(r) and θ = Θ(1) so as to calculate a complexity bound for large-update methods, and set τ = O(1) and θ = Θ(1/√r) to obtain a complexity bound for small-update methods.

5.11 Examples

In Table 5.1 we present the kernel functions that up to now have been proposed and analyzed in the literature.

The table also shows the current iteration bounds for the corresponding algorithms. Most of these examples have been studied for LO. However, our work allows us to conclude that the iteration complexity associated with each function also applies to symmetric optimization. The best iteration complexity for large-update methods is obtained for

ψ(t) = (t² − 1)/2 + (t^{1−q} − 1)/(q − 1),

with q = log r. In this case the iteration complexity is O(√r log r log(r/ε)).


      kernel function                                                   iteration complexity               references
  1   (t² − 1)/2 − log t                                                O(r log(r/ε))                      [3, 13, 50]
  2   (t² − 1)/2 + (t^{1−q} − 1)/(q(q − 1)) − ((q − 1)/q)(t − 1), q > 1  O(q r^{(q+1)/(2q)} log(r/ε))       [42, 43]
  3   (t² − 1)/2 + ((e − 1)²/e)·(1/(e^t − 1)) − (e − 1)/e               O(r^{3/4} log(r/ε))                [8]
  4   (1/2)(t − 1/t)²                                                   O(r^{2/3} log(r/ε))                [41]
  5   (t² − 1)/2 + e^{1/t − 1} − 1                                      O(√r log² r log(r/ε))              [9]
  6   (t² − 1)/2 − ∫₁ᵗ e^{1/ξ − 1} dξ                                   O(√r log² r log(r/ε))              [9]
  7   (t² − 1)/2 + (t^{1−q} − 1)/(q − 1), q > 1                         O(q r^{(q+1)/(2q)} log(r/ε))       [40]
  8   (t^{1+p} − 1)/(1 + p) − log t, p ∈ [0, 1]                         O(r log(r/ε))                      [20]
  9   (t^{1+p} − 1)/(1 + p) + (t^{1−q} − 1)/(q − 1), q > 1, p ∈ [0, 1]  O(q r^{(p+q)/(q(1+p))} log(r/ε))   [20]
 10   t + 1/t − 2                                                       O(r log(r/ε))                      [6]

Table 5.1: Kernel functions
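For experimentation, a few of the kernels in Table 5.1 can be written directly as Python callables (the keys refer to the numbering of the table; this listing is ours and purely illustrative):

    import numpy as np

    kernels = {
        1:  lambda t: (t * t - 1.0) / 2.0 - np.log(t),
        4:  lambda t: 0.5 * (t - 1.0 / t) ** 2,
        5:  lambda t: (t * t - 1.0) / 2.0 + np.exp(1.0 / t - 1.0) - 1.0,
        10: lambda t: t + 1.0 / t - 2.0,
    }

    for number, psi_k in kernels.items():
        assert abs(psi_k(1.0)) < 1e-12    # every kernel vanishes at t = 1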

5.12 The accuracy of the solution produced by the algorithm

As we mentioned in Section 5.8, when the algorithm stops it returns a solution that is τ-close (in the barrier function sense) to a point (x(µ), y(µ), s(µ)) such that 〈x(µ), s(µ)〉 = µr ≤ ε. But this does not directly imply that we obtained a solution (x, y, s) such that 〈x, s〉 ≤ ε. As far as we know, this question is analyzed here for the first time.

In the following we deduce an upper bound for the duality gap. We can assume without loss of generality that, for v ∈ K, r1 < r eigenvalues are bigger than one and the remaining eigenvalues are less than one. This means that λ1 = λmax > 1. Using Lemma 4.2.5 we obtain

Ψ(v) ≥ ∑_{i=1}^{r1} (1/2)ψ″(1)(λi(v) − 1)² + ∑_{i=r1+1}^r (1/2)ψ″(λi(v))(λi(v) − 1)².

Since ψ″ is monotonically decreasing,

Ψ(v) ≥ ∑_{i=1}^r (1/2)ψ″(λ1(v))(λi(v) − 1)².

Let ν := ψ″(λ1(v)). Hence

Ψ(v) ≥ (1/2)ν‖v − e‖²,

which is equivalent to

Ψ(v)/ν ≥ (1/2)‖v‖² − tr(v) + r/2.


Using the fact that tr(v) ≤ √r ‖v‖ we obtain

Ψ(v)/ν ≥ (1/2)‖v‖² − √r‖v‖ + r/2.

This implies that

‖v‖ ≤ √r + √(2Ψ(v)/ν).

Since the duality gap can be written in terms of v, i.e.,

〈x, s〉 = µ‖v‖²,

and for the solution produced by the algorithm Ψ(v) < τ, we have

〈x, s〉 = µ‖v‖² ≤ µr + 2µτ/ν + 2µ√(2rτ/ν).
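The bound is easy to evaluate; a one-function sketch (illustrative only):

    import numpy as np

    def duality_gap_bound(mu, r, tau, nu):
        # Upper bound on <x, s> derived above:
        # mu*r + 2*mu*tau/nu + 2*mu*sqrt(2*r*tau/nu).
        return mu * r + 2.0 * mu * tau / nu + 2.0 * mu * np.sqrt(2.0 * r * tau / nu)

    # With tau = r and nu = 1 the bound equals (3 + 2*sqrt(2)) * mu * r < 7 * mu * r:
    r = 100
    print(duality_gap_bound(mu=1e-8, r=r, tau=r, nu=1.0) / (1e-8 * r))   # ~5.83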

Note that for large-update methods we take τ = O(r). If we assume that ν ≥ 1, then the algorithm finally reports a feasible solution such that 〈x, s〉 = O(ε), using µr ≤ ε. For instance, if τ = r then

〈x, s〉 ≤ 7ε.

To obtain this result we have assumed that ν ≥ 1. In fact, this is the case for the kernel functions in Table 5.1 numbered from 1 to 7. However, for the kernel functions number 8, 9 (with p < 1) and 10 we may not have a good lower bound for ν, i.e., a lower bound for ν such that we certainly have 〈x, s〉 = O(ε). For instance, if ψ(t) = t + 1/t − 2 (number 10), then ψ″(t) = 2/t³. Obviously, if t increases then ν decreases, and in this case (up to now) we could not guarantee the quality of the solution.

Note that if the solution produced by the algorithm is such that λ1(v) < 1, then we certainly have a solution such that 〈x, s〉 = O(ε), because we can take ν := ψ″(1).

5.13 Conclusions

The interior-point methods based on kernel functions for LO can be extended, in many places word by word, to symmetric optimization. Our main conclusion is that the recipe presented in Section 5.10 for symmetric optimization is exactly the same as for LO. As a consequence, we proved that computing the iteration complexity for symmetric optimization is as hard (or as easy) as for LO. The recipe uses only the kernel function ψ(t) as input.


Chapter 6

Conclusions

6.1 Final notes

The aim of this work was to generalize interior-point methods for LO based on kernel functions to symmetric optimization. To achieve this we needed to study Euclidean Jordan algebras and their relation with symmetric cones, as is done in Chapter 2.

The barrier functions based on kernel functions are, in fact, separable spectral functions, whose fundamental properties were studied in Chapter 4.

Since spectral functions are functions depending on the eigenvalues of their argument, we had to establish properties of eigenvalues, and these were presented in Chapter 3. We further derived expressions for the derivatives of eigenvalues.

In Chapter 5 it was proved that interior-point methods based on kernel functions are generalizable to symmetric optimization and that getting complexity bounds for symmetric optimization is as hard (or as easy) as for LO. In fact, it turns out that the same scheme for computing an iteration bound for LO can also be used for symmetric optimization.

The e-convexity property is shared by the so-called self-regular functions, as introduced in [43], and also by the class of eligible kernel functions, as introduced in [9]. In both cases, this property needs to be extended to the induced barrier function. Since we were able to do this, we can conclude that the interior-point methods based on self-regular functions are also extendable to symmetric optimization.

6.2 Directions for further research

We list below some possible directions for further research:

• Does there exist a kernel function for which the complexity of large-update methods is the same as for small-update methods?


• Can we develop a different analysis of the algorithm of Section 5.8, in such a way that we can accept kernel functions that do not satisfy Conditions (4.2)–(4.4)?

• The set of homogeneous cones is a considerably larger set than the set of symmetric cones. Can we extend the interior-point methods based on kernel functions to homogeneous cones?

• Theorem 3.3.2 was proved for each primitive symmetric cone separately and then for the direct sum of the primitive cones. Can this theorem be proved in a unified way for any symmetric cone?


Appendix A

Topological notions

When V is an n-dimensional vector space over R, any vector in V has real coordinates with respect to a certain fixed basis b1, . . . , bn of V. We can define a norm on V by

‖x‖ := √(x1² + · · · + xn²), x ∈ V,

where x1, . . . , xn are the coordinates of x with respect to the basis. This norm induces a metric d : V × V → R+ defined as

d(x, y) := ‖y − x‖, x, y ∈ V.

Using this metric, the ball with radius ε centered at x ∈ V is given by

Bε(x) := {y ∈ V : ‖y − x‖ < ε}.

Let U be a nonempty subset of V. A point x ∈ U is said to be interior in U if there exists ε > 0 such that Bε(x) ⊂ U. The set of all interior points of U is denoted U⁰. We say that U is open if U⁰ = U. We say that U is closed if V \ U is open. The point x ∈ V is called a boundary point of U if for each ε > 0, Bε(x) contains a point in U and a point not in U. The set of all boundary points is called the boundary of U and is denoted by ∂U.

Alternatively, we say that U is closed if it contains its boundary. The closure of U, denoted as cl(U), is the set U ∪ ∂U. We say that U is dense in V if cl(U) = V. It is well known that U is dense in V if and only if every element x ∈ V is the limit of a sequence of elements in U.

We say that U is bounded if there exists L > 0 such that

‖x‖ ≤ L, ∀x ∈ U.

A set U is said to be compact if U is closed and bounded.


Appendix B

Some matrix properties

Let Herm(C, n) be the set of complex n × n Hermitian matrices. Let x∗ := x̄ᵀ, where x̄ is the complex conjugate of x. We say that A ∈ Herm(C, n) is positive semidefinite (definite) if x∗Ax ≥ 0 for all x ∈ Cⁿ (x∗Ax > 0 for all x ∈ Cⁿ \ {0}). Another characterization of positive semidefiniteness of the Hermitian matrix A can be given in terms of its eigenvalues: A is positive semidefinite (definite) if λi(A) ≥ 0 (λi(A) > 0), i = 1, . . . , n.

A matrix A ∈ Herm(C, n) is said to be similar to a matrix B ∈ Herm(C, n) if there exists an invertible matrix S such that

A = S⁻¹BS.

We denote the similarity relation as A ∼ B. We have the following properties:

Proposition B.1.1 (Section 5.2 in [33]). Let A, B ∈ Herm(C, n). One has

(i) AB ∼ BA;

(ii) if A, B are positive semidefinite, then AB ∼ A^{1/2}BA^{1/2} ∼ B^{1/2}AB^{1/2};

(iii) if B is invertible, then A ∼ B⁻¹AB.

Theorem B.1.2 (Theorem 9.H.1.a in [34]). If A and B are n × n positive semidefinite Hermitian matrices, then

∏_{i=1}^k λi(AB) ≤ ∏_{i=1}^k λi(A)λi(B), k = 1, . . . , n − 1,

∏_{i=1}^n λi(AB) = ∏_{i=1}^n λi(A)λi(B).
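The inequalities of Theorem B.1.2 are easy to check numerically; a quick sketch (illustrative, with eigenvalues sorted in decreasing order):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    G, H = rng.standard_normal((n, n)), rng.standard_normal((n, n))
    A, B = G @ G.T, H @ H.T                    # random positive semidefinite matrices
    # AB is similar to A^{1/2} B A^{1/2}, so its eigenvalues are real and nonnegative.
    lam_AB = np.sort(np.linalg.eigvals(A @ B).real)[::-1]
    lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]
    lam_B = np.sort(np.linalg.eigvalsh(B))[::-1]
    for k in range(1, n + 1):
        assert np.prod(lam_AB[:k]) <= np.prod(lam_A[:k] * lam_B[:k]) * (1 + 1e-8)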


Lemma B.1.3. Let A, X, Y be positive definite symmetric matrices. If XAX = Y AY then X = Y.

Proof. The equation XAX = Y AY implies

A^{1/2}XAXA^{1/2} = A^{1/2}Y AY A^{1/2}.

This can be written as follows:

(A^{1/2}XA^{1/2})(A^{1/2}XA^{1/2}) = (A^{1/2}Y A^{1/2})(A^{1/2}Y A^{1/2}),

or equivalently,

(A^{1/2}XA^{1/2})² = (A^{1/2}Y A^{1/2})².

Since X and Y are positive definite, the matrices A^{1/2}XA^{1/2} and A^{1/2}Y A^{1/2} are also positive definite. Therefore,

A^{1/2}XA^{1/2} = A^{1/2}Y A^{1/2},

implying that X = Y. □


Appendix C

Matrices of quaternions and octonions

C.1 Quaternions

As usual, let C and R denote the fields of the complex and real numbers, respectively. Let H be a four-dimensional vector space over R with an ordered basis, denoted by 1, i, j and k. A real quaternion, or simply quaternion, is a vector

x = x0 + x1 i + x2 j + x3 k ∈ H,

with real coefficients x0, x1, x2, x3. The elements of H are called (real) quaternions. The product of any two quaternions is determined by the following product rules for the basis vectors:

i² = j² = k² = −1,
ij = −ji = k, jk = −kj = i, ki = −ik = j.

For any x = x0 + x1 i + x2 j + x3 k ∈ H, we define ℜx = x0, the real part of x; ℑx = x1 i + x2 j + x3 k, the imaginary part; and x̄ = x0 − x1 i − x2 j − x3 k, the conjugate of x.

We may write

H = R + Ri + Rj + Rk = (R + Ri) + (R + Ri)j = C + Cj. (C.1)

The algebra H, with the product rules defined above, is associative and noncommutative.

C.2 Matrices of quaternions

Let U be a matrix with entries in H. We call U a quaternion matrix. Following (C.1) we can represent U = A + Bj, with A = A1 + A2 i, B = B1 + B2 i and A1, A2, B1, B2 ∈ R^{n×n}. Let Ū := [ūij] be the conjugate matrix of U. We define

U^f := [ A   B
         −B̄  Ā ] ∈ C^{2n×2n};

the complex matrix U^f is known as the complex representation of the matrix U. The matrix U∗ := Ūᵀ is called the adjoint of U. We say that U is Hermitian if U∗ = U. Let U ∈ Herm(H, n) be an n × n Hermitian matrix with quaternion entries.

U∗ = U . Let U ∈ Herm(H, n) an n-Hermitian matrix with quaternion entries.

Proposition C.2.1 (Theorem 3 in [30]). Let U, V ∈ H^{n×n}. Then

(i) U^f is Hermitian if and only if U is Hermitian;

(ii) (UV)^f = U^f V^f.

By Proposition C.2.1-(i) we can say that a Hermitian quaternion matrix is diagonalizable, since any Hermitian complex matrix is diagonalizable.

Theorem C.2.2 (Theorem 3.1 in [26]). Let A ∈ H^{n×n}. Then A is a diagonalizable quaternion matrix if and only if A^f is a diagonalizable complex matrix. Assume that A^f is diagonalizable. Let all eigenvalues of A^f be λ1, λ̄1, λ2, λ̄2, . . . , λn, λ̄n, in which ℑλi ≥ 0, i = 1, . . . , n, and let T be a nonsingular matrix such that

T⁻¹ A^f T = [ J  0
              0  J̄ ] = J^f,

where J = Diag(λ1, λ2, . . . , λn). Let

S = (1/4) [ In  −jIn ] (T + Qn⁻¹ T̄ Qn) [ In
                                          jIn ],

where

Qn = [ 0   −In
       In  0 ].

Then S is a nonsingular quaternion matrix and S⁻¹AS = J.

We say that A ∈ Herm(H, n) is a positive semidefinite (definite) matrix if all its eigenvalues are nonnegative (positive). Theorem 2.4.2 implies that the eigenvalues of any matrix A ∈ Herm(H, n) are real numbers.

Proposition C.2.3. A ∈ Herm(H, n) is positive semidefinite if and only if A^f is positive semidefinite.

Proof. Direct consequence of Theorem C.2.2 and of the fact that the eigenvalues are real numbers. □

Proposition C.2.4. Let U := A^{1/2}BA^{1/2}, with A and B n × n quaternion positive semidefinite matrices. Then U is similar to AB.


Proof. By Proposition C.2.1,

U^f = (A^{1/2}BA^{1/2})^f = (A^{1/2})^f B^f (A^{1/2})^f.

Since A^f and B^f are matrices with complex entries, by Proposition B.1.1, U^f is similar to A^f B^f = (AB)^f. Thus, by Theorem C.2.2, AB is similar to U. □

We extend Theorem B.1.2 to the set of quaternion positive semidefinite Hermitian matrices. As far as we know, this has not been done before.

Theorem C.2.5. If A and B are n × n quaternion positive semidefinite Hermitian matrices and U := A^{1/2}BA^{1/2}, then

∏_{i=1}^k λi(U) ≤ ∏_{i=1}^k λi(A)λi(B), k = 1, . . . , n − 1,

∏_{i=1}^n λi(U) = ∏_{i=1}^n λi(A)λi(B).

Proof. By Theorem C.2.2, the eigenvalues of U^f are of the form

λ1(U^f), λ̄1(U^f), λ2(U^f), λ̄2(U^f), . . . , λn(U^f), λ̄n(U^f).

Analogously, the eigenvalues of A^f and B^f are

λ1(A^f), λ̄1(A^f), λ2(A^f), λ̄2(A^f), . . . , λn(A^f), λ̄n(A^f)

and

λ1(B^f), λ̄1(B^f), λ2(B^f), λ̄2(B^f), . . . , λn(B^f), λ̄n(B^f),

respectively. Since U^f is Hermitian, its eigenvalues are real numbers (Theorem 2.4.2). This implies that

λi(U^f) = λ̄i(U^f), λi(A^f) = λ̄i(A^f) and λi(B^f) = λ̄i(B^f),

for i = 1, . . . , n. Therefore, Theorem C.2.2 implies that the eigenvalues of U, A, and B are

λ1(U^f), λ2(U^f), . . . , λn(U^f),

λ1(A^f), λ2(A^f), . . . , λn(A^f)

and

λ1(B^f), λ2(B^f), . . . , λn(B^f),

respectively. Since, by Proposition B.1.1, U^f = (A^{1/2})^f B^f (A^{1/2})^f is similar to A^f B^f = (AB)^f, applying Theorem B.1.2 to U^f, A^f and B^f we have

∏_{i=1}^k λi²(U^f) ≤ ∏_{i=1}^k λi²(A^f) λi²(B^f).

From here we get that

∏_{i=1}^k λi(U) ≤ ∏_{i=1}^k λi(A)λi(B).

The equality follows analogously. □


C.3 Octonions

The octonions, denoted as O, form an eight-dimensional vector space over the reals, spanned by the identity element 1 and seven imaginary units, which we label as i, j, k, kℓ, jℓ, iℓ, ℓ. Each imaginary unit squares to −1:

i² = j² = k² = · · · = ℓ² = −1.

We omit here the remaining rules of multiplication between the imaginary units, because they are quite extensive and we do not need them. The octonions endowed with the mentioned multiplication rules are nonassociative and noncommutative.

C.4 The Albert Algebra

The Albert algebra, denoted as Herm(O, 3), is an algebra consisting of the 3 × 3 octonion Hermitian matrices. Recall Theorem 2.7.5, which states that the Albert algebra is a Euclidean Jordan algebra when defining, for A, B ∈ Herm(O, 3), the Jordan product

A ∘ B = (AB + BA)/2.

The Albert algebra is well studied by algebraists, and for A ∈ Herm(O, 3) its characteristic polynomial is given by

λ³ − (tr(A))λ² + σ(A)λ − det A = 0,

with σ(A) = (1/2)((tr(A))² − tr(A²)). For a deduction of the characteristic polynomial we refer to [14]. From everything that was exposed we can conclude that A has only three real eigenvalues.


Appendix D

Technical properties

Lemma D.1.1 (Lemma 3.12 in [42]). Let h(t) be a twice differentiable convex function with h(0) = 0, h′(0) < 0, and let h(t) attain its (global) minimum at t∗ > 0. If h″(t) is increasing for t ∈ [0, t∗], then

h(t) ≤ t h′(0)/2, 0 ≤ t ≤ t∗.

Lemma D.1.2 (Section 14 in [10]). For xi ≥ 0 and αi > 0 such that α1 + · · · + αn = 1, we have

∑_{i=1}^n αi xi ≥ ∏_{i=1}^n xi^{αi}.

The last inequality is known as the weighted arithmetic-geometric mean inequality.


Bibliography

[1] F. Alizadeh. Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM J. Optim., 5(1):13–51, 1995.

[2] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., 95(1, Ser. B):3–51, 2003.

[3] E. D. Andersen, J. Gondzio, C. Mészáros, and X. Xu. Implementation of interior-point methods for large scale linear programs. In Interior point methods of mathematical programming, volume 5 of Appl. Optim., pages 189–252. Kluwer Acad. Publ., Dordrecht, 1996.

[4] M. Baes. Spectral functions and smoothing techniques on Jordan algebras. PhD thesis, Université Catholique de Louvain, 2006.

[5] M. Baes. Convexity and differentiability properties of spectral functions and spectral mappings on Euclidean Jordan algebras. Linear Algebra Appl., 422:664–700, 2007.

[6] Y. Q. Bai and C. Roos. A polynomial-time algorithm for linear optimization based on a new simple kernel function. Optim. Methods Softw., 18(6):631–646, 2003.

[7] Y. Q. Bai, M. El Ghami, and C. Roos. A new efficient large-update primal-dual interior-point method based on a finite barrier. SIAM J. Optim., 13(3):766–782, 2002.

[8] Y. Q. Bai, C. Roos, and M. El Ghami. A primal-dual interior-point method for linear optimization based on a new proximity function. Optim. Methods Softw., 17(6):985–1008, 2002.

[9] Y. Q. Bai, M. El Ghami, and C. Roos. A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. SIAM J. Optim., 15(1):101–128, 2004.

[10] E. F. Beckenbach and R. Bellman. Inequalities. Springer-Verlag, Berlin, 1961.

[11] A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.

[12] E. de Klerk. Aspects of semidefinite programming, volume 65 of Applied Optimization. Kluwer Academic Publishers, Dordrecht, 2002.

[13] D. den Hertog, C. Roos, and J.-Ph. Vial. A complexity reduction for the long-step path-following algorithm for linear programming. SIAM J. Optim., 2(1):71–87, 1992.

[14] T. Dray and C. A. Manogue. The exceptional Jordan eigenvalue problem. Internat. J. Theoret. Phys., 38(11):2901–2916, 1999.

[15] J. Faraut and A. Korányi. Analysis on symmetric cones. Oxford Mathematical Monographs. The Clarendon Press, Oxford University Press, New York, 1994.

[16] L. Faybusovich. A Jordan-algebraic approach to potential-reduction algorithms. Math. Z., 239(1):117–129, 2002.

[17] L. Faybusovich. Euclidean Jordan algebras and interior-point algorithms. Positivity, 1(4):331–357, 1997.

[18] L. Faybusovich. Linear systems in Jordan algebras and primal-dual interior-point algorithms. J. Comput. Appl. Math., 86(1):149–175, 1997.

[19] L. Faybusovich and R. Arana. A long-step primal-dual algorithm for the symmetric programming problem. Systems Control Lett., 43(1):3–7, 2001.

[20] M. El Ghami. New primal-dual interior-point methods based on kernel functions. PhD thesis, TU Delft, 2005.

[21] F. Glineur. Improving complexity of structured convex optimization problems using self-concordant barriers. European J. Oper. Res., 143(2):291–310, 2002.

[22] O. Güler. Barrier functions in interior point methods. Math. Oper. Res., 21(4):860–885, 1996.

[23] R. A. Hauser and O. Güler. Self-scaled barrier functions on symmetric cones and their classification. Found. Comput. Math., 2(2):121–143, 2002.

[24] R. A. Hauser and Y. Lim. Self-scaled barriers for irreducible symmetric cones. SIAM J. Optim., 12(3):715–723, 2002.

[25] R. A. Horn and C. R. Johnson. Topics in matrix analysis. Cambridge University Press, Cambridge, 1991.

[26] T. Jiang. Algebraic methods for diagonalization of a quaternion matrix in quaternionic quantum theory. Journal of Mathematical Physics, 46(5):052106, 2005.

[27] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984.

[28] A. Korányi. Monotone functions on formally real Jordan algebras. Math. Ann., 269(1):73–76, 1984.

[29] P. Lancaster. On eigenvalues of matrices dependent on a parameter. Numer. Math., 6:377–387, 1964.

[30] H. C. Lee. Eigenvalues and canonical forms of matrices with quaternion coefficients. Proc. Roy. Irish Acad. Sect. A, 52:253–260, 1949.

[31] Y. Lim. Geometric means on symmetric cones. Arch. Math. (Basel), 75(1):39–45, 2000.

[32] Z.-Q. Luo, J. F. Sturm, and S. Zhang. Conic convex programming and self-dual embedding. Optim. Methods Softw., 14(3):169–218, 2000.

[33] H. Lütkepohl. Handbook of matrices. John Wiley & Sons Ltd., Chichester, 1996.

[34] A. W. Marshall and I. Olkin. Inequalities: theory of majorization and its applications, volume 143 of Mathematics in Science and Engineering. Academic Press Inc., New York, 1979.

[35] R. D. C. Monteiro and Y. Zhang. A unified analysis for a class of long-step primal-dual path-following interior-point algorithms for semidefinite programming. Math. Programming, 81(3, Ser. A):281–299, 1998.

[36] E. D. Nering. Linear algebra and matrix theory. Wiley International Editions. John Wiley & Sons, Inc., second edition, 1970.

[37] Y. Nesterov and A. Nemirovskii. Interior-point polynomial algorithms in convex programming, volume 13 of SIAM Studies in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994.

[38] Y. E. Nesterov and M. J. Todd. Self-scaled barriers and interior-point methods for convex programming. Math. Oper. Res., 22(1):1–42, 1997.

[39] Y. E. Nesterov and M. J. Todd. Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim., 8(2):324–364, 1998.

[40] J. Peng, C. Roos, and T. Terlaky. A new and efficient large-update interior-point method for linear optimization. Vychisl. Tekhnol., 6(4):61–80, 2001.

[41] J. Peng, C. Roos, and T. Terlaky. A new class of polynomial primal-dual methods for linear and semidefinite optimization. European J. Oper. Res., 143(2):234–256, 2002.

[42] J. Peng, C. Roos, and T. Terlaky. Self-regular functions and new search directions for linear and semidefinite optimization. Math. Program., 93(1, Ser. A):129–171, 2002.

[43] J. Peng, C. Roos, and T. Terlaky. Self-regularity: a new paradigm for primal-dual interior-point algorithms. Princeton Series in Applied Mathematics. Princeton University Press, Princeton, NJ, 2002.

[44] B. K. Rangarajan. Polynomial convergence of infeasible interior point methods over symmetric cones. SIAM J. Optim., 16(4):1211–1229, 2006.

[45] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and algorithms for linear optimization. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons Ltd., Chichester, 1997. (Second edition: Springer, 2005).

[46] S. H. Schmieta and F. Alizadeh. Associative and Jordan algebras, and polynomial time interior-point algorithms for symmetric cones. Math. Oper. Res., 26(3):543–564, 2001.

[47] S. H. Schmieta and F. Alizadeh. Extension of primal-dual interior point algorithms to symmetric cones. Math. Program., 96(3, Ser. A):409–438, 2003.

[48] J. F. Sturm. Similarity and other spectral relations for symmetric cones. Linear Algebra Appl., 312(1-3):135–154, 2000.

[49] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw., 11/12(1-4):625–653, 1999.

[50] M. J. Todd. Recent developments and new directions in linear programming. In Mathematical programming (Tokyo, 1988), volume 6 of Math. Appl. (Japanese Ser.), pages 109–157.

[51] S. J. Wright. Primal-dual interior-point methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.

[52] Y. Ye. Interior point algorithms. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons Inc., New York, 1997.



Summary

Jordan algebraic approach to symmetric optimization

In this thesis we present a generalization of interior-point methods for linear optimization based on kernel functions to symmetric optimization. It covers the three standard cases of conic optimization: linear optimization, second-order cone optimization and semidefinite optimization.

We give an introduction to Euclidean Jordan algebras and explain the connection between such algebras and symmetric cones.

We establish some properties of eigenvalues in Jordan algebras and prove that the barrier functions based on kernel functions are separable spectral functions that only depend on the eigenvalues of their arguments.

We propose an interior-point algorithm for symmetric optimization and derive its complexity bound.


Samenvatting

Jordan algebraic approach to symmetric optimization

The subject of this thesis is a generalization of kernel-function-based interior-point methods for linear optimization to symmetric optimization. It covers the three standard cases of symmetric optimization: linear optimization, second-order cone optimization and semidefinite optimization.

We give an introduction to Euclidean Jordan algebras and clarify the relation between such algebras and symmetric cones.

We derive some properties of the eigenvalues in Jordan algebras and prove that barrier functions based on kernel functions are separable spectral functions that depend only on the eigenvalues of their arguments.

Finally, we introduce an interior-point method for symmetric optimization and derive a complexity bound for it.


Curriculum Vitae

Manuel Vieira was born in Vila Nova de Foz Côa, Portugal, on November 28, 1972.

He studied Applied Mathematics at the New University of Lisbon and graduated in 1996. For one year he worked as a trainee at the National Electric Network. In 1998, he started working as an assistant teacher at the New University of Lisbon.

He obtained a master's degree in Operational Research at the University of Lisbon in 2002.

During the HPOPT 2002 Workshop in Tilburg, he met Prof. dr. ir. Roos and they agreed that he would start his PhD at TU Delft the following year.

He started his PhD in October 2003, in the Optimization Group, Department of Software Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, TU Delft, under the supervision of Prof. dr. ir. Roos. During this period he obtained an LNMB (Dutch Network for the Mathematics of Operational Research) Diploma.

This PhD position at the Delft University of Technology was financially supported by the Portuguese Foundation for Science and Technology and by the New University of Lisbon.
