
    Linear Algebra

    Murali K. Srinivasan

    Jugal K. Verma

    February 13, 2014


    Contents

1 Matrices, Linear Equations and Determinants
  1.1 Matrix Operations
  1.2 Gauss Elimination
  1.3 Determinants

2 Vector spaces and Linear Transformations
  2.1 Vector Spaces
  2.2 Linear transformations

3 Inner product spaces
  3.1 Length, Projection, and Angle
  3.2 Projections and Least Squares Approximations
  3.3 Determinant and Volume

4 Eigenvalues and eigenvectors
  4.1 Algebraic and Geometric multiplicities
  4.2 Spectral Theorem


Chapter 1

Matrices, Linear Equations and Determinants

    1.1 Matrix Operations

Convention 1.1.1. We shall write $F$ to mean either the real numbers $\mathbb{R}$ or the complex numbers $\mathbb{C}$. Elements of $F$ will be called scalars.

Let $m, n$ be positive integers. An $m \times n$ matrix $A$ over $F$ is a collection of $mn$ scalars $a_{ij} \in F$ arranged in a rectangular array of $m$ rows and $n$ columns:
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]

The entry in row $i$ and column $j$ is $a_{ij}$. We also write $A = (a_{ij})$ to denote the entries. When all the entries are in $\mathbb{R}$ we say that $A$ is a real matrix. Similarly, we define complex matrices. For example,
\[ \begin{pmatrix} 1 & 1 & 3/2 \\ 5/2 & 6 & 11.2 \end{pmatrix} \]
is a $2 \times 3$ real matrix. A $1 \times n$ matrix $[a_1\ a_2\ \cdots\ a_n]$ is called a row vector, and an $n \times 1$ matrix
\[ \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \]
is called a column vector. An $n \times n$ matrix is called a square matrix.

    Matrix Addition


Let $M$, $N$ be $m \times n$ matrices. Then $M + N$ is the $m \times n$ matrix whose $(i, j)$ entry is the sum of the $(i, j)$ entries of $M$ and $N$. For example,
\[ \begin{pmatrix} 2 & 1 & 0 \\ 1 & -3 & 5 \end{pmatrix} + \begin{pmatrix} 1 & 0 & 3 \\ 4 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 1 & 3 \\ 5 & 0 & 6 \end{pmatrix}. \]
Note that addition is defined only when both matrices have the same size.

    Scalar multiplication

    Let F and letM be am nmatrix. ThenM is a m nmatrix whose (i, j) entry is (i, j) entry ofM. For example

    2

    0 12 32 1

    =

    0 24 64 2

    .

    Matrix multiplication

First we define the product of a row vector $a = [a_1 \ \dots \ a_n]$ and a column vector $b = (b_1, \dots, b_n)^t$, both with $n$ components. Define $ab$ to be the scalar $\sum_{i=1}^n a_i b_i$.

The product of two matrices $A = (a_{ij})$ and $B = (b_{ij})$, denoted $AB$, is defined only when the number of columns of $A$ is equal to the number of rows of $B$. So let $A$ be an $m \times n$ matrix and let $B$ be an $n \times p$ matrix. Let the row vectors of $A$ be $A_1, A_2, \dots, A_m$ and let the column vectors of $B$ be $B_1, B_2, \dots, B_p$. We write
\[ A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_m \end{pmatrix}, \qquad B = [B_1\ B_2\ \cdots\ B_p]. \]
Then $M = AB$ is the $m \times p$ matrix whose $(i, j)$ entry $m_{ij}$, for $1 \le i \le m$ and $1 \le j \le p$, is given by
\[ m_{ij} = A_i B_j = \sum_{k=1}^n a_{ik} b_{kj}. \]
For example,
\[ \begin{pmatrix} 1 & 3 & 1 \\ 2 & 4 & 2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 5 & 4 \\ 8 & 6 \end{pmatrix}. \]

The usefulness and meaning of this definition will emerge as this course progresses. Meanwhile, let us note other ways of thinking about matrix multiplication. First, a definition. By a linear combination of $n \times 1$ column vectors $v_1, v_2, \dots, v_r$ we mean a column vector $v$ of the form $v = \alpha_1 v_1 + \cdots + \alpha_r v_r$, where the $\alpha_i \in F$ are called the coefficients. Similarly we define a linear combination of row vectors.
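These operations are easy to experiment with numerically. Below is a minimal sketch in Python with numpy (our own illustration; the notes themselves assume no software):

```python
import numpy as np

# Matrix addition, scalar multiplication, and the product AB,
# checked against the examples in the text.
M = np.array([[2, 1, 0], [1, -3, 5]])
N = np.array([[1, 0, 3], [4, 3, 1]])
print(M + N)                                   # [[3 1 3], [5 0 6]]
print(2 * np.array([[0, 1], [2, 3], [2, 1]]))  # [[0 2], [4 6], [4 2]]

A = np.array([[1, 3, 1], [2, 4, 2]])
B = np.array([[2, 0], [1, 1], [0, 1]])
print(A @ B)               # [[5 4], [8 6]]
print(A[0, :] @ B[:, 1])   # (1,2) entry as (row 1 of A) . (column 2 of B): 4
```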

    Matrix times a column vector


Lemma 1.1.2. Let $B = [B_1\ B_2\ \cdots\ B_p]$ be an $n \times p$ matrix with columns $B_1, \dots, B_p$. Let $x = (x_1, \dots, x_p)^t$ be a column vector with $p$ components. Then
\[ Bx = x_1 B_1 + x_2 B_2 + \cdots + x_p B_p. \]

Proof. Both sides are $n \times 1$. By definition,
\[ Bx = \begin{pmatrix} \sum_{j=1}^p b_{1j} x_j \\ \sum_{j=1}^p b_{2j} x_j \\ \vdots \\ \sum_{j=1}^p b_{nj} x_j \end{pmatrix} = \sum_{j=1}^p \begin{pmatrix} b_{1j} x_j \\ b_{2j} x_j \\ \vdots \\ b_{nj} x_j \end{pmatrix} = \sum_{j=1}^p x_j B_j, \]
as desired.

So $Bx$ can be thought of as a linear combination of the columns of $B$, with column $l$ having coefficient $x_l$. This way of thinking about $Bx$ is very important.

Example 1.1.3. Let $e_1, e_2, \dots, e_p$ denote the standard column vectors with $p$ components, i.e., $e_i$ denotes the $p \times 1$ column vector with 1 in component $i$ and all other components 0. Then $Be_i = B_i$, column $i$ of $B$.

    Row vector times a matrix

Let $A$ be an $m \times n$ matrix with rows $A_1, \dots, A_m$. Let $y = [y_1 \ \cdots \ y_m]$ be a row vector with $m$ components. Then (why?)
\[ yA = y_1 A_1 + y_2 A_2 + \cdots + y_m A_m. \]
So $yA$ can be thought of as a linear combination of the rows of $A$, with row $i$ having coefficient $y_i$.

    Columns and rows of product

Let $A$ and $B$ be as above. Then (why?)
\[ AB = [AB_1\ AB_2\ \cdots\ AB_p] = \begin{pmatrix} A_1 B \\ A_2 B \\ \vdots \\ A_m B \end{pmatrix}. \]
So the $j$th column of $AB$ is a linear combination of the columns of $A$, with the coefficients coming from the $j$th column $B_j$ of $B$. For example,
\[ \begin{pmatrix} 1 & 3 & 1 \\ 2 & 4 & 2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 5 & 4 \\ 8 & 6 \end{pmatrix}. \]
The second column of the product can be written as
\[ 0 \begin{pmatrix} 1 \\ 2 \end{pmatrix} + 1 \begin{pmatrix} 3 \\ 4 \end{pmatrix} + 1 \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}. \]


Similarly, the $i$th row $A_i B$ of $AB$ is a linear combination of the rows of $B$, with the coefficients coming from the $i$th row $A_i$ of $A$.

Properties of Matrix Operations

Theorem 1.1.4. The following identities hold for matrix sums and products, whenever the sizes of the matrices involved are compatible (for the stated operations).

(i) $A(B + C) = AB + AC$.

(ii) $(P + Q)R = PR + QR$.

(iii) $A(BC) = (AB)C$.

(iv) $c(AB) = (cA)B = A(cB)$.

Proof. We prove item (iii), leaving the others as exercises. Let $A = (a_{ij})$ have $p$ columns, $B = (b_{kl})$ have $p$ rows and $q$ columns, and $C = (c_{rs})$ have $q$ rows. Then the entry in row $i$ and column $s$ of $A(BC)$ is
\begin{align*} \sum_{m=1}^p a(i, m)\,\{\text{entry in row } m, \text{ column } s \text{ of } BC\} &= \sum_{m=1}^p a(i, m) \sum_{n=1}^q b(m, n)\, c(n, s) \\ &= \sum_{n=1}^q \Big( \sum_{m=1}^p a(i, m)\, b(m, n) \Big) c(n, s), \end{align*}
which is the entry in row $i$ and column $s$ of $(AB)C$.

Matrix multiplication is not commutative. For example,
\[ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad \text{but} \quad \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \]

Definition 1.1.5. A matrix all of whose entries are zero is called the zero matrix. The entries $a_{ii}$ of a square matrix $A = (a_{ij})$ are called the diagonal entries. If the only nonzero entries of a square matrix $A$ are the diagonal entries, then $A$ is called a diagonal matrix. An $n \times n$ diagonal matrix whose diagonal entries are all 1 is called the $n \times n$ identity matrix. It is denoted by $I_n$. A square matrix $A = (a_{ij})$ is called upper triangular if all the entries below the diagonal are zero, i.e., $a_{ij} = 0$ for $i > j$. Similarly we define lower triangular matrices.

A square matrix $A$ is called nilpotent if $A^r = 0$ for some $r \geq 1$.

Example 1.1.6. Let $A = (a_{ij})$ be an upper triangular $n \times n$ matrix with diagonal entries zero. Then $A$ is nilpotent. In fact $A^n = 0$.

Since column $j$ of $A^n$ is $A^n e_j$, it is enough to show that $A^n e_j = 0$ for $j = 1, \dots, n$. Denote column $j$ of $A$ by $A_j$.

We have $Ae_1 = A_1 = 0$. Now
\[ A^2 e_2 = A(Ae_2) = AA_2 = A(a_{12}e_1) = a_{12}Ae_1 = 0. \]
Similarly,
\[ A^3 e_3 = A^2(Ae_3) = A^2 A_3 = A^2(a_{13}e_1 + a_{23}e_2) = 0. \]
Continuing in this fashion we see that all columns of $A^n$ are zero.
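A quick numerical check of this example (a sketch of ours in numpy; the particular matrix is an arbitrary choice):

```python
import numpy as np

# A strictly upper triangular matrix (zero diagonal) is nilpotent: A^n = 0.
n = 4
A = np.triu(np.arange(1.0, n * n + 1).reshape(n, n), k=1)  # zero on and below diagonal
print(np.linalg.matrix_power(A, n))                        # the 4 x 4 zero matrix
```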

    Inverse of a Matrix

Definition 1.1.7. Let $A$ be an $n \times n$ matrix. If there is an $n \times n$ matrix $B$ such that $AB = I_n = BA$ then we say $A$ is invertible and $B$ is the inverse of $A$. The inverse of $A$ is denoted by $A^{-1}$.

Remark 1.1.8. (1) The inverse of a matrix is uniquely determined. Indeed, if $B$ and $C$ are inverses of $A$ then
\[ B = BI = B(AC) = (BA)C = IC = C. \]

(2) If $A$ and $B$ are invertible $n \times n$ matrices, then $AB$ is also invertible. Indeed,
\[ (B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}B = I. \]
Similarly $(AB)(B^{-1}A^{-1}) = I$. Thus $AB$ is invertible and $(AB)^{-1} = B^{-1}A^{-1}$.

(3) We will see later (in Chapter 3) that if there exists an $n \times n$ matrix $B$ for an $n \times n$ matrix $A$ such that $AB = I$ or $BA = I$, then $A$ is invertible. This fact fails for non-square matrices. For example,
\[ [1\ 2] \begin{pmatrix} 1 \\ 0 \end{pmatrix} = [1] = I_1, \quad \text{but} \quad \begin{pmatrix} 1 \\ 0 \end{pmatrix} [1\ 2] = \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix} \neq I_2. \]

(4) The inverse of a square matrix need not exist. For example, let $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. If $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is any $2 \times 2$ matrix, then
\[ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ 0 & 0 \end{pmatrix} \neq I_2 \]
for any $a, b, c, d$.

    Transpose of a Matrix

Definition 1.1.9. Let $A = (a_{ij})$ be an $m \times n$ matrix. Then the transpose of $A$, denoted by $A^t$, is the $n \times m$ matrix $(b_{ij})$ such that $b_{ij} = a_{ji}$ for all $i, j$. Thus the rows of $A$ become the columns of $A^t$ and the columns of $A$ become the rows of $A^t$. For example, if
\[ A = \begin{pmatrix} 2 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix} \quad \text{then} \quad A^t = \begin{pmatrix} 2 & 1 \\ 0 & 0 \\ 1 & 1 \end{pmatrix}. \]

Lemma 1.1.10. (i) For matrices $A$ and $B$ of suitable sizes, $(AB)^t = B^t A^t$.

(ii) For any invertible square matrix $A$, $(A^{-1})^t = (A^t)^{-1}$.


Proof. For any matrix $C$, let $C_{ij}$ denote its $(i, j)$th entry.

(i) Let $A = (a_{ij})$, $B = (b_{ij})$. Then, for all $i, j$,
\[ ((AB)^t)_{ij} = (AB)_{ji} = \sum_k a_{jk} b_{ki} = \sum_k (A^t)_{kj} (B^t)_{ik} = \sum_k (B^t)_{ik} (A^t)_{kj} = (B^t A^t)_{ij}. \]

(ii) Since $AA^{-1} = I = A^{-1}A$, we have $(AA^{-1})^t = I = (A^{-1}A)^t$. By (i), $(A^{-1})^t A^t = I = A^t (A^{-1})^t$. Thus $(A^t)^{-1} = (A^{-1})^t$.

Definition 1.1.11. A square matrix $A$ is called symmetric if $A = A^t$. It is called skew-symmetric if $A^t = -A$.

Lemma 1.1.12. (i) If $A$ is an invertible symmetric matrix then so is $A^{-1}$. (ii) Every square matrix $A$ is a sum of a symmetric and a skew-symmetric matrix in a unique way.

Proof. (i) is clear from part (ii) of the lemma above.

(ii) Since
\[ A = \frac{1}{2}(A + A^t) + \frac{1}{2}(A - A^t), \]
every matrix is a sum of a symmetric and a skew-symmetric matrix. To see the uniqueness, suppose that $P$ is a symmetric matrix and $Q$ is a skew-symmetric matrix such that
\[ A = P + Q. \]
Then $A^t = P^t + Q^t = P - Q$. Hence $P = \frac{1}{2}(A + A^t)$ and $Q = \frac{1}{2}(A - A^t)$.
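The decomposition is immediate to verify numerically; a sketch (ours, with an arbitrary test matrix):

```python
import numpy as np

# Split A into its symmetric part P and skew-symmetric part Q, as in the proof.
A = np.array([[1.0, 2.0, 0.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
P = (A + A.T) / 2
Q = (A - A.T) / 2
assert np.allclose(P, P.T)      # P is symmetric
assert np.allclose(Q, -Q.T)     # Q is skew-symmetric
assert np.allclose(A, P + Q)    # A = P + Q
```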

    1.2 Gauss Elimination

We discuss a widely used method, called Gauss elimination, to solve a system of $m$ linear equations in $n$ unknowns $x_1, \dots, x_n$:
\[ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i, \quad i = 1, 2, \dots, m, \]
where the $a_{ij}$'s and the $b_i$'s are known scalars in $F$. If each $b_i = 0$ then the system above is called a homogeneous system. Otherwise, we say it is inhomogeneous.

Set $A = (a_{ij})$, $b = (b_1, \dots, b_m)^t$, and $x = (x_1, \dots, x_n)^t$. We can write the system above in the matrix form
\[ Ax = b. \]
The matrix $A$ is called the coefficient matrix. By a solution, we mean any choice of the unknowns $x_1, \dots, x_n$ which satisfies all the equations.


Lemma 1.2.1. Let $A$ be an $m \times n$ matrix over $F$, $b \in F^m$, and $E$ an invertible $m \times m$ matrix over $F$. Set $U = EA$ and $c = Eb$. Then $Ax = b$ has the same solutions as $Ux = c$.

Proof. $Ax = b$ implies $EAx = Eb$. Similarly, $EAx = Eb$ implies $E^{-1}(EAx) = E^{-1}(Eb)$, or $Ax = b$.

    The idea of Gauss elimination is the following:

(i) Find a suitable invertible $E$ so that $U$ is in row echelon form or row canonical form (defined below).

(ii) All solutions to $Ux = c$, when $U$ is in row echelon form or row canonical form, can be written down easily.

    We first describe step (ii) and then step (i).

Definition 1.2.2. An $m \times n$ matrix $M$ is said to be in row echelon form (ref) if it satisfies the following conditions:

(a) By a zero row of $M$ we mean a row with all entries zero. Suppose $M$ has $k$ nonzero rows and $m - k$ zero rows. Then the last $m - k$ rows of $M$ are the zero rows.

(b) The first nonzero entry in a nonzero row is called a pivot. For $i = 1, 2, \dots, k$, suppose that the pivot in row $i$ occurs in column $j_i$. Then we have $j_1 < j_2 < \cdots < j_k$. The columns $j_1, \dots, j_k$ are called the pivotal columns; the remaining columns are called nonpivotal.

$M$ is said to be in row canonical form (rcf) if, in addition, every pivot equals 1 and every pivotal column has all entries zero except its pivot.

Example 1.2.3. Let $U$ be the matrix
\[ U = \begin{pmatrix} 0 & 1 & a_{13} & a_{14} & 0 & a_{16} & 0 & a_{18} \\ 0 & 0 & 0 & 0 & 1 & a_{26} & 0 & a_{28} \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & a_{38} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \]
where the $a_{ij}$'s are arbitrary scalars. Then $U$ is in rcf with pivotal columns 2, 5, 7 and nonpivotal columns 1, 3, 4, 6, 8.

Example 1.2.4. Now let $R$ be the matrix
\[ R = \begin{pmatrix} 0 & a & a_{13} & a_{14} & a_{15} & a_{16} & a_{17} & a_{18} \\ 0 & 0 & 0 & 0 & b & a_{26} & a_{27} & a_{28} \\ 0 & 0 & 0 & 0 & 0 & 0 & c & a_{38} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \]
where $a, b, c$ are nonzero scalars and the $a_{ij}$'s are arbitrary scalars. It may be checked that $R$ is in ref with pivotal columns 2, 5, 7 and nonpivotal columns 1, 3, 4, 6, 8.

Example 1.2.5. Let $U$ be the matrix from Example 1.2.3. Let $c = (c_1, c_2, c_3, c_4)^t$. We want to write down all solutions to the system $Ux = c$.

(i) If $c_4 \neq 0$ then clearly there is no solution.

(ii) Now assume that $c_4 = 0$. Call the variables $x_2, x_5, x_7$ pivotal and the variables $x_1, x_3, x_4, x_6, x_8$ nonpivotal or free.

Give arbitrary values $x_1 = s$, $x_3 = t$, $x_4 = u$, $x_6 = v$, $x_8 = w$ to the free variables. These values can be extended to values of the pivotal variables in one and only one way to get a solution to the system $Ux = c$:
\begin{align*} x_7 &= c_3 - a_{38}w \\ x_5 &= c_2 - a_{26}v - a_{28}w \\ x_2 &= c_1 - a_{13}t - a_{14}u - a_{16}v - a_{18}w \end{align*}
Thus (why?) the set of all solutions to $Ux = c$ can be written as
\[ \begin{pmatrix} s \\ c_1 - a_{13}t - a_{14}u - a_{16}v - a_{18}w \\ t \\ u \\ c_2 - a_{26}v - a_{28}w \\ v \\ c_3 - a_{38}w \\ w \end{pmatrix}, \]
where $s, t, u, v, w$ are arbitrary scalars.

(iii) The column vector above can be written as
\[ \begin{pmatrix} 0 \\ c_1 \\ 0 \\ 0 \\ c_2 \\ 0 \\ c_3 \\ 0 \end{pmatrix} + s \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ -a_{13} \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + u \begin{pmatrix} 0 \\ -a_{14} \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + v \begin{pmatrix} 0 \\ -a_{16} \\ 0 \\ 0 \\ -a_{26} \\ 1 \\ 0 \\ 0 \end{pmatrix} + w \begin{pmatrix} 0 \\ -a_{18} \\ 0 \\ 0 \\ -a_{28} \\ 0 \\ -a_{38} \\ 1 \end{pmatrix}. \]
Thus every solution to $Ux = c$ is of the form above, for arbitrary scalars $s, t, u, v, w$. Note that the first vector in the expression above is the unique solution to $Ux = c$ that has all free variables zero, and that the other vectors (without the coefficients) are the unique solutions to $Ux = 0$ that have one free variable equal to 1 and the other free variables equal to zero.


Example 1.2.6. Let $R$ be the matrix from Example 1.2.4. Let $c = (c_1, c_2, c_3, c_4)^t$. We want to write down all solutions to the system $Rx = c$.

(i) If $c_4 \neq 0$ then clearly there is no solution.

(ii) Now assume that $c_4 = 0$. Call the variables $x_2, x_5, x_7$ pivotal and the variables $x_1, x_3, x_4, x_6, x_8$ nonpivotal or free.

Give arbitrary values $x_1 = s$, $x_3 = t$, $x_4 = u$, $x_6 = v$, $x_8 = w$ to the free variables. These values can be extended to values of the pivotal variables in one and only one way to get a solution to the system $Rx = c$:
\begin{align*} x_7 &= (c_3 - a_{38}w)/c \\ x_5 &= (c_2 - a_{26}v - a_{27}x_7 - a_{28}w)/b \\ x_2 &= (c_1 - a_{13}t - a_{14}u - a_{15}x_5 - a_{16}v - a_{17}x_7 - a_{18}w)/a \end{align*}
The process above is called back substitution. Given arbitrary values for the free variables, we first solve for the value of the largest pivotal variable; then, using this value (and the values of the free variables), we get the value of the second largest pivotal variable, and so on.
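Back substitution is mechanical enough to code directly. The helper below is our own sketch (the function name and tolerance are our choices), assuming the system is in ref and consistent:

```python
import numpy as np

def back_substitute(U, c, free_values):
    """Solve Ux = c for U in ref, given values for the free variables."""
    m, n = U.shape
    pivots = []                              # (row, column) of each pivot
    for i in range(m):
        nz = np.flatnonzero(U[i])
        if nz.size:
            pivots.append((i, nz[0]))
    pivot_cols = {j for _, j in pivots}
    x = np.zeros(n)
    free = [j for j in range(n) if j not in pivot_cols]
    for j, v in zip(free, free_values):
        x[j] = v                             # assign the free variables
    for i, j in reversed(pivots):            # largest pivotal variable first
        x[j] = (c[i] - U[i, j + 1:] @ x[j + 1:]) / U[i, j]
    return x

U = np.array([[1.0, 2.0, 1.0],
              [0.0, 0.0, 3.0]])
print(back_substitute(U, np.array([4.0, 6.0]), free_values=[5.0]))  # [-8.  5.  2.]
```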

We extract the following lemma from the examples above; its proof is left as an exercise.

Lemma 1.2.7. Let $U$ be an $m \times n$ matrix in ref. Then the only solution to the homogeneous system $Ux = 0$ which is zero in all free variables is the zero solution.

Note that a matrix in rcf is also in ref, so the lemma above also applies to such matrices.

Theorem 1.2.8. Let $Ax = b$, with $A$ an $m \times n$ matrix. Let $c$ be a solution of $Ax = b$ and $S$ the set of all solutions of the associated homogeneous system $Ax = 0$. Then the set of all solutions to $Ax = b$ is
\[ c + S = \{c + v : v \in S\}. \]

Proof. Let $Au = b$. Then $A(u - c) = Au - Ac = b - b = 0$. So $u - c \in S$ and $u = c + (u - c) \in c + S$. Conversely, let $v \in S$. Then $A(c + v) = Ac + Av = b + 0 = b$. Hence $c + v$ is a solution to $Ax = b$.

    The proof of the following important result is almost obvious and is left as an exercise.

Theorem 1.2.9. Let $U$ be an $m \times n$ matrix in ref with $k$ pivotal columns $P = \{j_1 < j_2 < \cdots < j_k\}$ and set of nonpivotal (free) columns $F$, so $|F| = n - k$. Let $c$ be such that the system $Ux = c$ is consistent. Then:

(i) Given arbitrary scalars $x_i$, $i \in F$, there are unique scalars $x_i$, $i \in P$, such that $x = (x_1, \dots, x_n)^t$ is a solution of $Ux = c$.

(ii) For $i \in F$, let $s_i$ denote the unique solution of $Ux = 0$ with $x_i = 1$ and $x_j = 0$ for all $j \in F \setminus \{i\}$.

(iii) Every solution of $Ux = 0$ is of the form $\sum_{i \in F} a_i s_i$, where the $a_i$'s are arbitrary scalars.

(iv) Let $p$ be the unique solution of $Ux = c$ having all free variables zero. Then every solution of $Ux = c$ is of the form
\[ p + \sum_{i \in F} a_i s_i, \]
where the $a_i$'s are arbitrary scalars.

Example 1.2.10. In our previous two examples $P = \{2, 5, 7\}$ and $F = \{1, 3, 4, 6, 8\}$. To make sure the notation of the theorem is understood, write down $p$ and $s_i$, $i = 1, 3, 4, 6, 8$.

We now discuss the first step in Gauss elimination, namely, how to reduce a matrix to ref or rcf. We define a set of elementary row operations to be performed on the equations of a system. These operations transform a system of equations into another system with the same solution set.

Performing an elementary row operation on $Ax = b$ is equivalent to replacing this system by the system $EAx = Eb$, where $E$ is an invertible elementary matrix.

    Elementary row operations and elementary matrices

Let $e_{ij}$ denote the $m \times n$ matrix with 1 in the $i$th row and $j$th column and zeros elsewhere. Any matrix $A = (a_{ij})$ of size $m \times n$ can be written as
\[ A = \sum_{i=1}^m \sum_{j=1}^n a_{ij} e_{ij}. \]

For this reason the $e_{ij}$'s are called the matrix units. Let us see the effect of multiplying $e_{13}$ with a matrix $A$ written in terms of row vectors:
\[ e_{13}A = \begin{pmatrix} 0 & 0 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{pmatrix}_{m \times m} \begin{pmatrix} R_1 \\ R_2 \\ R_3 \\ \vdots \\ R_m \end{pmatrix}_{m \times n} = \begin{pmatrix} R_3 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]
In general, if $e_{ij}$ is an $m \times m$ matrix unit and $A$ is an $m \times n$ matrix, then $e_{ij}A$ is the $m \times n$ matrix whose $i$th row is $R_j$ and whose other rows are zero:
\[ e_{ij}A = \begin{pmatrix} 0 \\ \vdots \\ R_j \\ \vdots \\ 0 \end{pmatrix} \quad (R_j \text{ in the } i\text{th row}). \]

We now define three kinds of elementary row operations and elementary matrices. Consider the system $Ax = b$, where $A$ is $m \times n$, $b$ is $m \times 1$, and $x$ is an $n \times 1$ unknown vector.


(i) Elementary row operation of type I: For $i \neq j$ and a scalar $a$, add $a$ times equation $j$ to equation $i$ in the system $Ax = b$.

What effect does this operation have on $A$ and $b$? Consider the matrix
\[ E = I + ae_{ij}, \quad i \neq j. \]
This matrix has 1's on the diagonal and the scalar $a$ as its single off-diagonal entry, in row $i$ and column $j$ (above the diagonal if $i < j$, below it if $i > j$). By the above observation,
\[ (I + ae_{ij}) \begin{pmatrix} R_1 \\ \vdots \\ R_i \\ \vdots \\ R_m \end{pmatrix} = \begin{pmatrix} R_1 \\ \vdots \\ R_i \\ \vdots \\ R_m \end{pmatrix} + a \begin{pmatrix} 0 \\ \vdots \\ R_j \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} R_1 \\ \vdots \\ R_i + aR_j \\ \vdots \\ R_m \end{pmatrix}, \]
with $R_j$ appearing in the $i$th row of the middle term. It is now clear that performing an elementary row operation of type I on the system $Ax = b$ we get the new system $EAx = Eb$.

Suppose we perform an elementary row operation of type I as above. Then perform the same elementary row operation of type I but with the scalar $a$ replaced by the scalar $-a$. It is clear that we get back the original system $Ax = b$. It follows (why?) that $E^{-1} = I - ae_{ij}$.

(ii) Elementary row operation of type II: For $i \neq j$, interchange equations $i$ and $j$ in the system $Ax = b$.

What effect does this operation have on $A$ and $b$? Consider the matrix
\[ F = I + e_{ij} + e_{ji} - e_{ii} - e_{jj}. \]
Premultiplication by this matrix has the effect of interchanging the $i$th and $j$th rows. Performing this operation twice in succession gives back the original system. Thus $F^2 = I$.


(iii) Elementary row operation of type III: Multiply equation $i$ in the system $Ax = b$ by a nonzero scalar $c$.

What effect does this operation have on $A$ and $b$? Consider the matrix
\[ G = I + (c - 1)e_{ii}, \quad c \neq 0. \]
Premultiplication by $G$ has the effect of multiplying the $i$th row by $c$. Doing this operation twice in succession, the first time with the scalar $c$ and the second time with the scalar $1/c$, yields the original system back. It follows that $G^{-1} = I + (c^{-1} - 1)e_{ii}$.

The matrices $E$, $F$, $G$ above are called elementary matrices of types I, II, III respectively. We summarize the above discussion in the following result.

Theorem 1.2.11. Performing an elementary row operation (of a certain type) on the system $Ax = b$ is equivalent to premultiplying $A$ and $b$ by an elementary matrix $E$ (of the same type), yielding the system $EAx = Eb$.

Elementary matrices are invertible, and the inverse of an elementary matrix is an elementary matrix of the same type.
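The three types are simple to construct and test; a sketch of ours in numpy:

```python
import numpy as np

def E_type1(m, i, j, a):     # I + a*e_ij (i != j): add a*(row j) to row i
    E = np.eye(m); E[i, j] += a; return E

def E_type2(m, i, j):        # I + e_ij + e_ji - e_ii - e_jj: swap rows i and j
    E = np.eye(m); E[[i, j]] = E[[j, i]]; return E

def E_type3(m, i, c):        # I + (c-1)*e_ii (c != 0): multiply row i by c
    E = np.eye(m); E[i, i] = c; return E

A = np.arange(12.0).reshape(4, 3)
B = E_type1(4, 2, 0, 5.0) @ A
assert np.allclose(B[2], A[2] + 5.0 * A[0])    # the row operation happened
# The inverse of a type I elementary matrix is the type I matrix with -a:
assert np.allclose(E_type1(4, 2, 0, 5.0) @ E_type1(4, 2, 0, -5.0), np.eye(4))
```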

Since elementary matrices are invertible, it follows that performing elementary row operations does not change the solution set of the system. We now show how to reduce a matrix to row canonical form using a sequence of elementary row operations.

Theorem 1.2.12. Every matrix can be reduced to a matrix in rcf by a sequence of elementary row operations.

Proof. We apply induction on the number of rows. If the matrix $A$ is a row vector, the conclusion is obvious. Now suppose that $A$ is $m \times n$, where $m \geq 2$. If $A = 0$ then we are done. If $A$ is not the zero matrix then there is a nonzero column in $A$. Find the first nonzero column, say column $j_1$, from the left. Interchange rows to move the first nonzero entry in column $j_1$ to the top row. Now multiply by a nonzero scalar to make this entry (in row 1 and column $j_1$) equal to 1. Now add suitable multiples of the first row to the remaining rows so that all entries in column $j_1$, except the entry in row 1, become zero. The resulting matrix looks like
\[ A_1 = \begin{pmatrix} 0 & \cdots & 0 & 1 & * & \cdots & * \\ 0 & \cdots & 0 & 0 & * & \cdots & * \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & * & \cdots & * \end{pmatrix}. \]
By induction, the submatrix of $A_1$ consisting of rows $2, 3, \dots, m$ can be reduced to rcf. So now the resulting matrix looks like
\[ A_2 = \begin{pmatrix} 0 \cdots 0 & 1 & v \\ & & D \end{pmatrix}, \]
where the blank space consists of 0's, $v$ is a row vector with $n - j_1$ components, and $D$ is an $(m - 1) \times (n - j_1)$ matrix in rcf. Let the pivotal columns of $D$ be $j_2 < j_3 < \cdots < j_k$ (as columns of $A_2$). Adding suitable multiples of rows $2, \dots, k$ of $A_2$ to the first row, we can make the entries of the first row in columns $j_2, \dots, j_k$ zero. The resulting matrix is in rcf.
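The proof translates into a short algorithm. The function below is our own iterative sketch of it (numpy-based, with a tolerance of our choosing for "nonzero"):

```python
import numpy as np

def rcf(A, tol=1e-12):
    """Reduce A to row canonical form by elementary row operations."""
    A = A.astype(float).copy()
    m, n = A.shape
    r = 0                                    # row where the next pivot goes
    for j in range(n):                       # scan columns left to right
        nonzero = np.flatnonzero(np.abs(A[r:, j]) > tol)
        if nonzero.size == 0:
            continue                         # nonpivotal column
        p = r + nonzero[0]
        A[[r, p]] = A[[p, r]]                # type II: bring the pivot up
        A[r] /= A[r, j]                      # type III: make the pivot 1
        for i in range(m):                   # type I: clear the rest of column j
            if i != r:
                A[i] -= A[i, j] * A[r]
        r += 1
        if r == m:
            break
    return A

print(rcf(np.array([[0, 2, 4], [1, 1, 1], [2, 2, 2]])))
# [[ 1.  0. -1.]
#  [ 0.  1.  2.]
#  [ 0.  0.  0.]]
```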


Example 1.2.15. Consider the system
\[ Ax = \begin{pmatrix} 1 & 3 & 2 & 0 & 2 & 0 \\ 2 & 6 & 5 & 2 & 4 & 3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ -2 & -6 & 0 & 8 & -4 & 18 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 5 \\ 6 \end{pmatrix} = b. \]

Applying the indicated elementary row operations to $A$ and $b$ we get
\[ \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 2 & 6 & 5 & 2 & 4 & 3 & 1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 5 \\ -2 & -6 & 0 & 8 & -4 & 18 & 6 \end{array}\right) \xrightarrow[R_4 + 2R_1]{R_2 - 2R_1} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 5 \\ 0 & 0 & 4 & 8 & 0 & 18 & 6 \end{array}\right) \]
\[ \xrightarrow[R_4 - 4R_2]{R_3 - 5R_2} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 6 & 2 \end{array}\right) \xrightarrow{R_3 \leftrightarrow R_4} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 6 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \]
\[ \xrightarrow{(1/6)R_3} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \xrightarrow{R_2 - 3R_3} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \]
\[ \xrightarrow{R_1 - 2R_2} \left(\begin{array}{cccccc|c} 1 & 3 & 0 & -4 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right) \]
It may be checked that every solution to $Ax = b$ is of the form
\[ \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1/3 \end{pmatrix} + s \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + r \begin{pmatrix} 4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \]
for some scalars $s, t, r$.
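A numerical check of this example (ours; the signs in $A$ are as printed above):

```python
import numpy as np

A = np.array([[ 1,  3, 2,  0,  2,  0],
              [ 2,  6, 5,  2,  4,  3],
              [ 0,  0, 5, 10,  0, 15],
              [-2, -6, 0,  8, -4, 18]], dtype=float)
b = np.array([0.0, 1.0, 5.0, 6.0])
p = np.array([0, 0, 0, 0, 0, 1/3])               # free variables all zero
homogeneous = [np.array([-3, 1,  0, 0, 0, 0]),   # x2 free
               np.array([ 4, 0, -2, 1, 0, 0]),   # x4 free
               np.array([-2, 0,  0, 0, 1, 0])]   # x5 free
assert np.allclose(A @ p, b)                     # particular solution
for s in homogeneous:
    assert np.allclose(A @ s, 0)                 # solutions of Ax = 0
```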

Example 1.2.16. Consider the system
\[ Ax = \begin{pmatrix} 1 & 3 & 2 & 0 & 2 & 0 \\ 2 & 6 & 5 & 2 & 4 & 3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ -2 & -6 & 0 & 8 & -4 & 18 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 6 \\ 6 \end{pmatrix} = b. \]


    Applying the indicated elementary row operations to A and b we get

\[ \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 2 & 6 & 5 & 2 & 4 & 3 & 1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 6 \\ -2 & -6 & 0 & 8 & -4 & 18 & 6 \end{array}\right) \xrightarrow[R_4 + 2R_1]{R_2 - 2R_1} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 5 & 10 & 0 & 15 & 6 \\ 0 & 0 & 4 & 8 & 0 & 18 & 6 \end{array}\right) \]
\[ \xrightarrow[R_4 - 4R_2]{R_3 - 5R_2} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 6 & 2 \end{array}\right) \xrightarrow{R_3 \leftrightarrow R_4} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 6 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right) \]
\[ \xrightarrow{(1/6)R_3} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right) \xrightarrow{R_2 - 3R_3} \left(\begin{array}{cccccc|c} 1 & 3 & 2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right) \]
\[ \xrightarrow{R_1 - 2R_2} \left(\begin{array}{cccccc|c} 1 & 3 & 0 & -4 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1/3 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{array}\right) \]
The last row reads $0 = 1$. It follows that the system has no solution.

Calculation of $A^{-1}$ by Gauss elimination

Lemma 1.2.17. Let $A$ be a square matrix. Then the following are equivalent:

(a) $A$ can be reduced to $I$ by a sequence of elementary row operations.

(b) $A$ is a product of elementary matrices.

(c) $A$ is invertible.

(d) The system $Ax = 0$ has only the trivial solution $x = 0$.

Proof. (a) $\Rightarrow$ (b). Let $E_1, \dots, E_k$ be elementary matrices so that $E_k \cdots E_1 A = I$. Thus $A = E_1^{-1} \cdots E_k^{-1}$, a product of elementary matrices.

(b) $\Rightarrow$ (c). Elementary matrices are invertible, and a product of invertible matrices is invertible.

(c) $\Rightarrow$ (d). Suppose $A$ is invertible and $Ax = 0$. Then $A^{-1}(Ax) = x = 0$.

(d) $\Rightarrow$ (a). First observe that a square matrix in rcf is either the identity matrix or its bottom row is zero. If $A$ can't be reduced to $I$ by elementary row operations, then $U$, the rcf of $A$, has a zero row at the bottom. Hence $Ux = 0$ has at most $n - 1$ nontrivial equations, which have a nontrivial solution. This contradicts (d).

This lemma provides us with an algorithm to calculate the inverse of a matrix if it exists. If $A$ is invertible then there exist elementary matrices $E_1, E_2, \dots, E_k$ such that $E_k \cdots E_1 A = I$. Multiplying both sides by $A^{-1}$ on the right, we get $E_k \cdots E_1 I = A^{-1}$.


Lemma 1.2.18 (Gauss-Jordan Algorithm). Let $A$ be an invertible matrix. To compute $A^{-1}$, apply elementary row operations to $A$ to reduce it to an identity matrix. The same operations, when applied to $I$, produce $A^{-1}$.

Example 1.2.19. We find the inverse of the matrix
\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \]
by forming the $3 \times 6$ matrix
\[ [A \mid I] = \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right). \]
Now perform row operations to reduce the matrix $A$ to $I$. In this process the identity matrix will reduce to $A^{-1}$:
\[ [A \mid I] \xrightarrow[R_3 - R_1]{R_2 - R_1} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & 1 & -1 & 0 & 1 \end{array}\right) \xrightarrow{R_3 - R_2} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 & 1 \end{array}\right). \]
Hence
\[ A^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{pmatrix}. \]

    1.3 Determinants

In this section we study determinants of matrices. Recall the formula for determinants of $k \times k$ matrices, for $k = 1, 2, 3$:
\[ \det[a] = a, \qquad \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc, \]
and
\[ \det\begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} = aei - ahf - bdi + bgf + cdh - ceg. \]
Our approach to determinants of $n \times n$ matrices is via their properties (rather than via an explicit formula as above). It makes their study more elegant. Later, we will give a geometric interpretation of the determinant in terms of volume.

Let $d$ be a function that associates a scalar $d(A) \in F$ with every $n \times n$ matrix $A$ over $F$. We use the following notation. If the columns of $A$ are $A_1, A_2, \dots, A_n$, we write $d(A) = d(A_1, A_2, \dots, A_n)$.

Definition 1.3.1. (i) $d$ is called multilinear if for each $k = 1, 2, \dots, n$, scalars $\alpha, \beta$, and $n \times 1$ column vectors $A_1, \dots, A_{k-1}, A_{k+1}, \dots, A_n, B, C$,
\begin{align*} d(A_1, \dots, A_{k-1}, \alpha B + \beta C, A_{k+1}, \dots, A_n) = {}& \alpha\, d(A_1, \dots, A_{k-1}, B, A_{k+1}, \dots, A_n) \\ &+ \beta\, d(A_1, \dots, A_{k-1}, C, A_{k+1}, \dots, A_n). \end{align*}


(ii) $d$ is called alternating if $d(A_1, A_2, \dots, A_n) = 0$ whenever $A_i = A_j$ for some $i \neq j$.

(iii) $d$ is called normalized if $d(I) = d(e_1, e_2, \dots, e_n) = 1$, where $e_i$ is the $i$th standard column vector with 1 in the $i$th coordinate and 0's elsewhere.

(iv) A normalized, alternating, and multilinear function $d$ on $n \times n$ matrices is called a determinant function of order $n$.

Our immediate objective is to show that there is only one determinant function of order $n$. This fact is very useful in proving that certain formulas yield the determinant: we simply show that the formula defines an alternating, multilinear, and normalized function on the columns of $n \times n$ matrices.

Lemma 1.3.2. Suppose that $d(A_1, A_2, \dots, A_n)$ is a multilinear alternating function on columns of $n \times n$ matrices. Then

(a) If some $A_k = 0$ then $d(A_1, A_2, \dots, A_n) = 0$.

(b) $d(A_1, \dots, A_k, A_{k+1}, \dots, A_n) = -d(A_1, \dots, A_{k+1}, A_k, \dots, A_n)$.

(c) $d(A_1, \dots, A_i, \dots, A_j, \dots, A_n) = -d(A_1, \dots, A_j, \dots, A_i, \dots, A_n)$.

Proof. (a) If $A_k = 0$ then by multilinearity
\[ d(A_1, \dots, 0 \cdot A_k, \dots, A_n) = 0 \cdot d(A_1, \dots, A_k, \dots, A_n) = 0. \]

(b) Put $A_k = B$, $A_{k+1} = C$. Then by the alternating property of $d$,
\begin{align*} 0 &= d(A_1, \dots, B + C, B + C, \dots, A_n) \\ &= d(A_1, \dots, B, B + C, \dots, A_n) + d(A_1, \dots, C, B + C, \dots, A_n) \\ &= d(A_1, \dots, B, C, \dots, A_n) + d(A_1, \dots, C, B, \dots, A_n). \end{align*}
Hence $d(A_1, \dots, B, C, \dots, A_n) = -d(A_1, \dots, C, B, \dots, A_n)$.

(c) Follows from (b), since swapping positions $i < j$ can be achieved by an odd number of adjacent swaps.

Remark 1.3.3. Note that the properties (a), (b), (c) have been derived from the properties of determinant functions without having any formula at our disposal yet.

    Computation of determinants

Example 1.3.4. We now derive the familiar formula for the determinant of $2 \times 2$ matrices. Suppose $d(A_1, A_2)$ is an alternating multilinear normalized function on $2 \times 2$ matrices $A = (A_1, A_2)$. Then
\[ d\begin{pmatrix} x & y \\ z & u \end{pmatrix} = xu - yz. \]
To derive this formula, write the first column as $A_1 = xe_1 + ze_2$ and the second column as $A_2 = ye_1 + ue_2$. Then
\begin{align*} d(A_1, A_2) &= d(xe_1 + ze_2, ye_1 + ue_2) \\ &= d(xe_1 + ze_2, ye_1) + d(xe_1 + ze_2, ue_2) \\ &= d(xe_1, ye_1) + d(ze_2, ye_1) + d(xe_1, ue_2) + d(ze_2, ue_2) \\ &= yz\, d(e_2, e_1) + xu\, d(e_1, e_2) \\ &= (xu - yz)\, d(e_1, e_2) = xu - yz. \end{align*}
Similarly, the formula for $3 \times 3$ determinants can also be derived as above. We leave this as an exercise.

Lemma 1.3.5. Suppose $f$ is a multilinear alternating function on $n \times n$ matrices and $f(e_1, e_2, \dots, e_n) = 0$. Then $f$ is identically zero.

Proof. Let $A = (a_{ij})$ be an $n \times n$ matrix with columns $A_1, \dots, A_n$. Write $A_j$ as
\[ A_j = a_{1j}e_1 + a_{2j}e_2 + \cdots + a_{nj}e_n. \]
Since $f$ is multilinear we have (why?)
\[ f(A_1, \dots, A_n) = \sum_h a_{h(1)1} a_{h(2)2} \cdots a_{h(n)n}\, f(e_{h(1)}, e_{h(2)}, \dots, e_{h(n)}), \]
where the sum is over all functions $h : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$. Since $f$ is alternating we have (why?)
\[ f(A_1, \dots, A_n) = \sum_h a_{h(1)1} a_{h(2)2} \cdots a_{h(n)n}\, f(e_{h(1)}, e_{h(2)}, \dots, e_{h(n)}), \]
where the sum is now over all 1-1, onto functions $h : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$. By using part (c) of the lemma above we see that we can write
\[ f(A_1, \dots, A_n) = \sum_h \pm\, a_{h(1)1} a_{h(2)2} \cdots a_{h(n)n}\, f(e_1, e_2, \dots, e_n), \]
where the sum is over all 1-1, onto functions $h$. Thus $f(A) = 0$.

    Existence and uniqueness of determinant function

Theorem 1.3.6 (Uniqueness of the determinant function). Let $f$ be an alternating multilinear function of order $n$ and $d$ a determinant function of order $n$. Then for all $n \times n$ matrices $A = (A_1, A_2, \dots, A_n)$,
\[ f(A_1, A_2, \dots, A_n) = d(A_1, A_2, \dots, A_n)\, f(e_1, e_2, \dots, e_n). \]
In particular, if $f$ is also a determinant function then $f(A_1, A_2, \dots, A_n) = d(A_1, A_2, \dots, A_n)$.


Proof. Consider the function
\[ g(A_1, A_2, \dots, A_n) = f(A_1, A_2, \dots, A_n) - d(A_1, A_2, \dots, A_n)\, f(e_1, e_2, \dots, e_n). \]
Since $f$ and $d$ are alternating and multilinear, so is $g$. Since
\[ g(e_1, e_2, \dots, e_n) = 0, \]
the result follows from the previous lemma.

We have proved the uniqueness of determinant functions of order $n$. It remains to show their existence.

Convention 1.3.7. We shall denote the determinant of $A$ by $\det A$ or $|A|$.

Setting $\det[a] = a$ shows existence for $n = 1$.

Assume that we have shown the existence of determinant functions of order $n - 1$. The determinant of an $n \times n$ matrix $A$ can be computed in terms of certain $(n-1) \times (n-1)$ determinants. Let $A_{ij}$ denote the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the $i$th row and $j$th column of $A$.

Theorem 1.3.8. Let $A = (a_{ij})$ be an $n \times n$ matrix. Then the function
\[ a_{11}\det A_{11} - a_{12}\det A_{12} + \cdots + (-1)^{n+1} a_{1n}\det A_{1n} \]
is multilinear, alternating, and normalized on $n \times n$ matrices, hence is the determinant function.

Proof. Denote the function by $f(A_1, A_2, \dots, A_n)$.

Suppose that the columns $A_j$ and $A_{j+1}$ of $A$ are equal. Then the matrices $A_{1i}$ have two equal columns except when $i = j$ or $i = j + 1$. By induction $f(A_{1i}) = 0$ for $i \neq j, j+1$. Thus
\[ f(A) = a_{1j}(-1)^{j+1} f(A_{1j}) + a_{1,j+1}(-1)^{j+2} f(A_{1,j+1}). \]
Since $A_j = A_{j+1}$, we have $a_{1j} = a_{1,j+1}$ and $A_{1j} = A_{1,j+1}$. Thus $f(A) = 0$. Therefore $f(A_1, A_2, \dots, A_n)$ is alternating.

If $A = (e_1, e_2, \dots, e_n)$ then by induction
\[ f(A) = 1 \cdot f(A_{11}) = f(e_1, e_2, \dots, e_{n-1}) = 1. \]
We leave the multilinear property of $f(A_1, \dots, A_n)$ as an exercise for the reader.

The formula in the theorem above is called expansion by the first row. Just as in the theorem above, we can also prove the following formula for expansion by row $k$. We leave its proof as an exercise.

Theorem 1.3.9. Let $A = (a_{ij})$ be an $n \times n$ matrix and let $1 \le k \le n$. Then
\[ \det A = \sum_{j=1}^n (-1)^{k+j} a_{kj} \det A_{kj}. \]
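Expansion by row $k$ is directly codable. This recursive sketch (ours) does $O(n!)$ work, so it is for illustration only; note the 0-based indices, which leave the sign $(-1)^{k+j}$ unchanged:

```python
import numpy as np

def det_expand(A, k=0):
    """det A by expansion along row k (0-based), as in Theorem 1.3.9."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        A_kj = np.delete(np.delete(A, k, axis=0), j, axis=1)  # delete row k, column j
        total += (-1) ** (k + j) * A[k, j] * det_expand(A_kj)
    return total

A = np.array([[1.0, 3.0, 1.0],
              [2.0, 4.0, 2.0],
              [0.0, 1.0, 5.0]])
print(det_expand(A), np.linalg.det(A))   # both -10 (up to rounding)
```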


Theorem 1.3.10. (i) Let $U$ be an upper triangular or a lower triangular matrix. Then $\det U$ equals the product of the diagonal entries of $U$.

(ii) Let $E$ be an elementary matrix of the type $I + ae_{ij}$, for some $i \neq j$. Then $\det E = 1$.

(iii) Let $E$ be an elementary matrix of the type $I + e_{ij} + e_{ji} - e_{ii} - e_{jj}$, for some $i \neq j$. Then $\det E = -1$.

(iv) Let $E$ be an elementary matrix of the type $I + (a - 1)e_{ii}$, $a \neq 0$. Then $\det E = a$.

Proof. (i) Let $U = (u_{ij})$ be upper triangular. Arguing as in Lemma 1.3.5 we see that
\[ \det U = \sum_h \pm\, u_{h(1)1} u_{h(2)2} \cdots u_{h(n)n}, \]
where the sum is over all 1-1, onto functions $h : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}$. Since $U$ is upper triangular, the only choice of $h$ yielding a nonzero term is the identity function (and this gives a plus sign).

The proof for a lower triangular matrix is similar.

(ii) Follows from part (i).

(iii) $E$ is obtained from the identity matrix by exchanging columns $i$ and $j$. The result follows since the determinant is an alternating function.

(iv) Follows from part (i).

    Determinant and Invertibility

Theorem 1.3.11. Let $A, B$ be two $n \times n$ matrices. Then $\det(AB) = \det A \det B$.

Proof. Let $D_i$ denote the $i$th column of a matrix $D$. Then
\[ (AB)_i = AB_i. \]
Therefore we need to prove that
\[ \det(AB_1, AB_2, \dots, AB_n) = \det(A_1, A_2, \dots, A_n) \det(B_1, \dots, B_n). \]
Keep $A$ fixed and define
\[ f(B_1, B_2, \dots, B_n) = \det(AB_1, AB_2, \dots, AB_n). \]
We show that $f$ is alternating and multilinear. Let $C$ be an $n \times 1$ column vector. Then
\[ f(B_1, \dots, B_i, \dots, B_i, \dots, B_n) = \det(AB_1, \dots, AB_i, \dots, AB_i, \dots, AB_n) = 0, \]
and
\begin{align*} f(B_1, \dots, \alpha B_k + \beta C, \dots, B_n) &= \det(AB_1, \dots, A(\alpha B_k + \beta C), \dots, AB_n) \\ &= \det(AB_1, \dots, \alpha AB_k + \beta AC, \dots, AB_n) \\ &= \alpha \det(AB_1, \dots, AB_k, \dots, AB_n) + \beta \det(AB_1, \dots, AC, \dots, AB_n) \\ &= \alpha f(B_1, \dots, B_n) + \beta f(B_1, \dots, C, \dots, B_n). \end{align*}


Therefore, by Theorem 1.3.6,
\[ f(B_1, B_2, \dots, B_n) = \det(B_1, \dots, B_n)\, f(e_1, e_2, \dots, e_n). \]
Now note that
\[ f(e_1, e_2, \dots, e_n) = \det(Ae_1, \dots, Ae_n) = \det(A_1, \dots, A_n) = \det A. \]
Hence $\det(AB) = \det A \det B$.

Lemma 1.3.12. (i) If $A$ is an invertible matrix then $\det A \neq 0$ and
\[ \det A^{-1} = \frac{1}{\det A}. \]
(ii) $\det A \neq 0$ implies $A$ is invertible.

(iii) Suppose $A, B$ are square matrices with $AB = I$. Then $A$ is invertible and $B = A^{-1}$.

Proof. (i) Since $AA^{-1} = I$, $\det A^{-1} \det A = \det I = 1$.

(ii) Suppose $A$ is not invertible. Then, by Chapter 2, there is a nontrivial column vector $x$ such that $Ax = 0$. So some column of $A$ is a linear combination of the other columns (i.e., excluding itself) of $A$. It now follows from the multilinearity and alternating properties that $\det A = 0$.

(iii) Taking determinants we have $\det A \det B = 1$. So $\det A \neq 0$ and $A$ is invertible. Now $B = (A^{-1}A)B = A^{-1}(AB) = A^{-1}$.

Theorem 1.3.13. For any $n \times n$ matrix $A$, $\det A = \det A^t$.

Proof. Let $B$ be the rcf of $A$. Then $EA = B$, where $E$ is a product of elementary matrices. Since inverses of elementary matrices are elementary matrices (of the same type) we can write
\[ A = E_1 \cdots E_k B, \qquad A^t = B^t E_k^t \cdots E_1^t, \]
where the $E_i$ are elementary matrices.

Now the transpose of an elementary matrix is also an elementary matrix (of the same type) and has the same determinant (by Theorem 1.3.10). Thus, by multiplicativity of the determinant, we need only show that $\det(B) = \det(B^t)$.

Case (i): $A$ is not invertible (i.e., $\det(A) = 0$). In this case $\det(B) = 0$ and the last row of $B$, hence the last column of $B^t$, is 0. Thus $\det(B^t) = 0$.

Case (ii): $A$ is invertible. In this case $B$ (and $B^t$) are both equal to the identity matrix.

The theorem above shows that the determinant is also a normalized, alternating, and multilinear function of the rows of a square matrix, and we have the following formula for the determinant, called expansion by column $k$.


Theorem 1.3.14. Let $A = (a_{ij})$ be an $n \times n$ matrix and let $1 \le k \le n$. Then
\[ \det A = \sum_{i=1}^n (-1)^{k+i} a_{ik} \det A_{ik}. \]

Example 1.3.15 (Computation by the Gauss elimination method). This is one of the most efficient ways to calculate determinants. Let $A$ be an $n \times n$ matrix. Suppose

$E$ = the $n \times n$ elementary matrix for the row operation $A_i \to A_i + cA_j$,
$F$ = the $n \times n$ elementary matrix for the row operation $A_i \leftrightarrow A_j$,
$G$ = the $n \times n$ elementary matrix for the row operation $A_i \to cA_i$.

Suppose that $U$ is the rcf of $A$. If $c_1, c_2, \dots, c_p$ are the multipliers used for the row operations $A_i \to cA_i$ and $r$ row exchanges have been used to get $U$ from $A$, then for any alternating multilinear function $d$,
\[ d(A) = (-1)^r (c_1 c_2 \cdots c_p)^{-1}\, d(U). \]
To see this we simply note that
\[ d(FA) = -d(A), \qquad d(EA) = d(A), \qquad d(GA) = c\, d(A). \]
Suppose that $u_{11}, u_{22}, \dots, u_{nn}$ are the diagonal entries of $U$; then
\[ d(A) = (-1)^r (c_1 c_2 \cdots c_p)^{-1}\, u_{11} u_{22} \cdots u_{nn}\, d(e_1, e_2, \dots, e_n). \]
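That recipe, coded as a sketch of ours: eliminate below each pivot with type I operations (which leave $d$ unchanged), track row exchanges, and multiply the diagonal of the resulting triangular matrix:

```python
import numpy as np

def det_by_elimination(A):
    U = A.astype(float).copy()
    n = U.shape[0]
    sign = 1.0
    for j in range(n):
        p = j + np.argmax(np.abs(U[j:, j]))
        if np.isclose(U[p, j], 0.0):
            return 0.0                        # no pivot in this column
        if p != j:
            U[[j, p]] = U[[p, j]]             # a row exchange flips the sign
            sign = -sign
        for i in range(j + 1, n):             # type I operations: d unchanged
            U[i] -= (U[i, j] / U[j, j]) * U[j]
    return sign * np.prod(np.diag(U))

A = np.array([[2.0, 1.0, 3.0], [4.0, 1.0, 7.0], [0.0, 2.0, 1.0]])
print(det_by_elimination(A), np.linalg.det(A))   # both -6, up to rounding
```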

    The cofactor matrix

Definition 1.3.16. Let $A = (a_{ij})$ be an $n \times n$ matrix. The cofactor of $a_{ij}$, denoted by $\operatorname{cof} a_{ij}$, is defined as
\[ \operatorname{cof} a_{ij} = (-1)^{i+j} \det A_{ij}. \]
The cofactor matrix of $A$, denoted by $\operatorname{cof} A$, is the matrix
\[ \operatorname{cof} A = (\operatorname{cof} a_{ij}). \]
When $n = 1$, $A_{11}$ is the empty matrix and its determinant is taken to be 1.

Theorem 1.3.17. For any $n \times n$ matrix $A$,
\[ A(\operatorname{cof} A)^t = (\det A) I = (\operatorname{cof} A)^t A. \]
In particular, if $\det A$ is nonzero then $A^{-1} = \frac{1}{\det A}(\operatorname{cof} A)^t$, hence $A$ is invertible.

Proof. The $(i, j)$ entry of $(\operatorname{cof} A)^t A$ is
\[ a_{1j} \operatorname{cof} a_{1i} + a_{2j} \operatorname{cof} a_{2i} + \cdots + a_{nj} \operatorname{cof} a_{ni}. \]
If $i = j$, it is easy to see that this is $\det A$ (expansion by column $i$). When $i \neq j$, consider the matrix $B$ obtained by replacing the $i$th column of $A$ by the $j$th column of $A$. So $B$ has a repeated column, and the sum above is the expansion-by-minors formula for $\det B$, which shows it equals $\det B = 0$. The other equation $A(\operatorname{cof} A)^t = (\det A)I$ is proved similarly.
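The formula $A^{-1} = \frac{1}{\det A}(\operatorname{cof} A)^t$ coded directly (a sketch of ours):

```python
import numpy as np

def cofactor_matrix(A):
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            A_ij = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(A_ij)   # cof a_ij
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
adj = cofactor_matrix(A).T                    # (cof A)^t
assert np.allclose(A @ adj, np.linalg.det(A) * np.eye(2))
print(adj / np.linalg.det(A))                 # A^{-1} = [[-2, 1], [1.5, -0.5]]
```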


Theorem 1.3.18 (Cramer's Rule). Suppose
\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \]
is a system of $n$ linear equations in $n$ unknowns $x_1, x_2, \dots, x_n$. Suppose the coefficient matrix $A = (a_{ij})$ is invertible. Let $C_j$ be the matrix obtained from $A$ by replacing the $j$th column of $A$ by $b = (b_1, b_2, \dots, b_n)^t$. Then for $j = 1, 2, \dots, n$,
\[ x_j = \frac{\det C_j}{\det A}. \]

Proof. Let $A_1, \dots, A_n$ be the columns of $A$. Write $b = x_1 A_1 + x_2 A_2 + \cdots + x_n A_n$. Then $\det(b, A_2, A_3, \dots, A_n) = x_1 \det A$ (why?). So $x_1 = \frac{\det C_1}{\det A}$. Similarly for $x_2, \dots, x_n$.

Chapter 2

Vector Spaces and Linear Transformations

2.1 Vector Spaces

Definition 2.1.1. A vector space $V$ over $F$ is a nonempty set $V$, together with two operations, vector addition $(x, y) \mapsto x + y$ and scalar multiplication $(\alpha, x) \mapsto \alpha x$, satisfying the following axioms.

1. (closure under addition) For all $x, y \in V$, $x + y \in V$.

2. (commutativity of addition) For all $x, y \in V$, $x + y = y + x$.

3. (associativity of addition) For all $x, y, z \in V$, $(x + y) + z = x + (y + z)$.

4. (existence of zero) There is an element $0 \in V$ such that $x + 0 = x$ for all $x \in V$.

5. (existence of negatives) For each $x \in V$ there is an element $-x \in V$ such that $x + (-x) = 0$.

6. (closure under scalar multiplication) For all $\alpha \in F$ and $x \in V$, $\alpha x \in V$.

7. (distributivity over vector addition) For all $\alpha \in F$ and $x, y \in V$, $\alpha(x + y) = \alpha x + \alpha y$.

8. (distributivity over scalar addition) For all $\alpha, \beta \in F$ and $x \in V$, $(\alpha + \beta)x = \alpha x + \beta x$.

9. (associativity of scalar multiplication) For all $\alpha, \beta \in F$ and $x \in V$, $\alpha(\beta x) = (\alpha \beta)x$.

10. (existence of identity for multiplication) For all $x \in V$,
\[ 1x = x. \]

Remark 2.1.2. When $F = \mathbb{R}$ we say that $V$ is a real vector space. If we replace real numbers in the above definition by complex numbers then we get the definition of a complex vector space.

    Examples of Vector Spaces

    In the examples below we leave the verification of the vector addition and scalar multiplicationaxioms as exercises.

Example 2.1.3. 1. $V = \mathbb{R}$, $F = \mathbb{R}$ with ordinary addition and multiplication as vector addition and scalar multiplication. This gives a real vector space.

2. $V = \mathbb{C}$, $F = \mathbb{C}$ with ordinary addition and multiplication as vector addition and scalar multiplication. This gives a complex vector space.

3. $V = \mathbb{C}$, $F = \mathbb{R}$ with ordinary addition and multiplication as vector addition and scalar multiplication. This gives a real vector space.

4. $V = \mathbb{R}^n = \{(a_1, a_2, \dots, a_n) \mid a_1, \dots, a_n \in \mathbb{R}\}$, $F = \mathbb{R}$ with addition of row vectors as vector addition and multiplication of a row vector by a real number as scalar multiplication. This gives a real vector space. We can similarly define a real vector space of column vectors with $n$ real components. Depending on the context, $\mathbb{R}^n$ could refer to either row vectors or column vectors with $n$ real components.

5. $V = \mathbb{C}^n = \{(a_1, a_2, \dots, a_n) \mid a_1, \dots, a_n \in \mathbb{C}\}$, $F = \mathbb{C}$ with addition of row vectors as vector addition and multiplication of a row vector by a complex number as scalar multiplication. This gives a complex vector space. We can similarly define a complex vector space of column vectors with $n$ complex components. Depending on the context, $\mathbb{C}^n$ could refer to either row vectors or column vectors with $n$ complex components.

6. Let $a < b$ be real numbers and set $V = \{f : [a, b] \to \mathbb{R}\}$, $F = \mathbb{R}$. If $f, g \in V$ then we set $(f + g)(x) = f(x) + g(x)$ for all $x \in [a, b]$. If $\alpha \in \mathbb{R}$ and $f \in V$ then $(\alpha f)(x) = \alpha f(x)$ for all $x \in [a, b]$. This gives a real vector space. Here $V$ is also denoted by $\mathbb{R}^{[a,b]}$.

7. Let $t$ be an indeterminate. The set $P_n(\mathbb{R}) = \{a_0 + a_1 t + \cdots + a_n t^n \mid a_0, a_1, \dots, a_n \in \mathbb{R}\}$ is a real vector space under the usual addition of polynomials and multiplication of polynomials by real numbers.

8. $C[a, b] = \{f : [a, b] \to \mathbb{R} \mid f \text{ is continuous on } [a, b]\}$ is a real vector space under the addition and scalar multiplication defined in item 6 above.

9. $V = \{f : [a, b] \to \mathbb{R} \mid f \text{ is differentiable at } x_0\}$, with $x_0 \in [a, b]$ fixed, is a real vector space under the operations described in item 6 above.

10. The set of all solutions to the differential equation $y'' + ay' + by = 0$, where $a, b \in \mathbb{R}$, forms a real vector space. More generally, in this example we can take $a = a(x)$, $b = b(x)$ suitable functions of $x$.


11. Let $V = M_{m \times n}(\mathbb{R})$ denote the set of all $m \times n$ matrices with real entries. Then $V$ is a real vector space under usual matrix addition and multiplication of a matrix by a real number.

The above examples indicate that the notion of a vector space is quite general. A result proved for vector spaces simultaneously applies to all the different examples above.

    Subspace of a Vector Space

Definition 2.1.4. Let $V$ be a vector space over $F$. A nonempty subset $W$ of $V$ is called a subspace of $V$ if

(i) $0 \in W$;

(ii) $u, v \in W$ implies $u + v \in W$;

(iii) $u \in W$, $\alpha \in F$ implies $\alpha u \in W$.

Before giving examples we discuss an important notion.

    Linear span

Let $V$ be a vector space over $F$. Let $x_1, \dots, x_n$ be vectors in $V$ and let $c_1, \dots, c_n \in F$. The vector $\sum_{i=1}^n c_i x_i \in V$ is called a linear combination of the $x_i$'s, and $c_i$ is called the coefficient of $x_i$ in this linear combination.

Definition 2.1.5. Let $S$ be a subset of a vector space $V$ over $F$. The linear span of $S$ is the subset of all vectors in $V$ expressible as linear combinations of finite subsets of $S$, i.e.,
\[ L(S) = \left\{ \sum_{i=1}^n c_i x_i \;\middle|\; n \geq 0,\ x_1, x_2, \dots, x_n \in S \text{ and } c_1, c_2, \dots, c_n \in F \right\}. \]
The empty sum of vectors is the zero vector. Thus $L(\emptyset) = \{0\}$. We say that $L(S)$ is spanned by $S$.

The linear span $L(S)$ is actually a subspace of $V$. In fact, we have

Lemma 2.1.6. The smallest subspace of $V$ containing $S$ is $L(S)$.

Proof. Note that $L(S)$ is a subspace (why?). Now, if $S \subseteq W \subseteq V$ and $W$ is a subspace of $V$, then $L(S) \subseteq W$ (why?). The result follows.

Example 2.1.7. 1. Let $A$ be an $m \times n$ matrix over $F$, with rows $R_1, \dots, R_m$ and columns $C_1, \dots, C_n$. The row space of $A$, denoted $R(A)$, is the subspace of $F^n$ spanned by the rows of $A$. The column space of $A$, denoted $C(A)$, is the subspace of $F^m$ spanned by the columns of $A$. The null space of $A$, denoted $N(A)$, is defined by
\[ N(A) = \{x \in F^n : Ax = 0\}. \]

Check that $N(A)$ is a subspace of $F^n$.

2. Different sets may span the same subspace. For example, $L(\{e_1, e_2\}) = L(\{e_1, e_2, e_1 + e_2\}) = \mathbb{R}^2$. The vector space $P_n(\mathbb{R})$ is spanned by $\{1, t, t^2, \dots, t^n\}$ and also by $\{1, (1+t), \dots, (1+t)^n\}$ (why?).


    Bases and dimension of vector spaces

We have introduced the notion of the linear span of a subset $S$ of a vector space. This raises some natural questions:

(i) Which vector spaces can be spanned by a finite number of elements?

(ii) If $V = L(S)$ for a finite subset $S$ of $V$, then what is the size of a smallest such $S$?

To answer these questions we introduce the notions of linear dependence and independence, and basis and dimension of a vector space.

    Linear independence

Definition 2.1.8. Let $V$ be a vector space. A subset $S \subseteq V$ is called linearly dependent (L.D.) if there exist distinct elements $v_1, v_2, \dots, v_n \in S$ (for some $n \geq 1$) and scalars $\alpha_1, \alpha_2, \dots, \alpha_n$, not all zero, such that
\[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0. \]
A set $S$ is called linearly independent (L.I.) if it is not linearly dependent, i.e., for all $n \geq 1$ and for all distinct $v_1, v_2, \dots, v_n \in S$ and scalars $\alpha_1, \alpha_2, \dots, \alpha_n$,
\[ \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0 \ \text{ implies } \ \alpha_i = 0 \text{ for all } i. \]
Elements of a linearly independent set are called linearly independent. Note that the empty set is linearly independent.

Remark 2.1.9. (i) Any subset of $V$ containing a linearly dependent set is linearly dependent. (ii) Any subset of a linearly independent set in $V$ is linearly independent.

Example 2.1.10. (i) If a set $S$ contains the zero vector 0 then $S$ is dependent, since $1 \cdot 0 = 0$.

(ii) Consider the vector space $\mathbb{R}^n$ and let $S = \{e_1, e_2, \dots, e_n\}$. Then $S$ is linearly independent. Indeed, if $\alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n = 0$ for some scalars $\alpha_1, \alpha_2, \dots, \alpha_n$, then $(\alpha_1, \alpha_2, \dots, \alpha_n) = 0$. Thus each $\alpha_i = 0$. Hence $S$ is linearly independent.

(iii) Let $V$ be the vector space of all continuous functions from $\mathbb{R}$ to $\mathbb{R}$. Let $S = \{1, \cos^2 t, \sin^2 t\}$. Then the relation $\cos^2 t + \sin^2 t - 1 = 0$ shows that $S$ is linearly dependent.

(iv) Let $\lambda_1 < \lambda_2 < \cdots < \lambda_n$ be real numbers. Let $V = \{f : \mathbb{R} \to \mathbb{R} \mid f \text{ is continuous}\}$. Consider the set $S = \{e^{\lambda_1 x}, e^{\lambda_2 x}, \dots, e^{\lambda_n x}\}$. We show that $S$ is linearly independent by induction on $n$. Let $n = 1$ and $\alpha e^{\lambda_1 x} = 0$. Since $e^{\lambda_1 x} \neq 0$ for any $x$, we get $\alpha = 0$. Now assume that the assertion is true for $n - 1$ and
\[ \alpha_1 e^{\lambda_1 x} + \cdots + \alpha_n e^{\lambda_n x} = 0. \]
Then
\[ \alpha_1 e^{(\lambda_1 - \lambda_n)x} + \cdots + \alpha_n e^{(\lambda_n - \lambda_n)x} = 0. \]

Let $x \to \infty$ to get $\alpha_n = 0$. Now apply the induction hypothesis to get $\alpha_1 = \cdots = \alpha_{n-1} = 0$.

(v) Let $\mathcal{P}$ denote the vector space of all polynomials $p(t)$ with real coefficients. Then the set $S = \{1, t, t^2, \dots\}$ is linearly independent. Suppose that $0 \leq n_1 < n_2 < \cdots < n_r$ and
\[ \alpha_1 t^{n_1} + \alpha_2 t^{n_2} + \cdots + \alpha_r t^{n_r} = 0 \]
for certain real numbers $\alpha_1, \alpha_2, \dots, \alpha_r$. Differentiate $n_1$ times and set $t = 0$ to get $\alpha_1 = 0$. Continuing this way we see that all of $\alpha_1, \alpha_2, \dots, \alpha_r$ are zero.


    Bases and Dimension

Bases and dimension are two important notions in the study of vector spaces. A vector space may be realized as the linear span of several sets of different sizes. We study properties of the smallest sets whose linear span is a given vector space.

Definition 2.1.11. A subset $S$ of a vector space $V$ is called a basis of $V$ if the elements of $S$ are independent and $V = L(S)$. A vector space $V$ possessing a finite basis is called finite dimensional. Otherwise $V$ is called infinite dimensional.

Exercise 2.1.12. Let $\{v_1, \dots, v_n\}$ be a basis of a finite dimensional vector space $V$. Show that every $v \in V$ can be uniquely expressed as $v = a_1 v_1 + \cdots + a_n v_n$, for scalars $a_1, \dots, a_n$.

We show that all bases of a finite dimensional vector space have the same cardinality (i.e., they contain the same number of elements). For this we prove the following result.

Lemma 2.1.13. Let $S = \{v_1, v_2, \dots, v_k\}$ be a subset of a vector space $V$. Then any $k + 1$ elements in $L(S)$ are linearly dependent.

Proof. We shall give two proofs.

(First proof.) Suppose $T = \{w_1, \dots, w_n\}$ is a set of linearly independent vectors in $L(S)$. We shall show that $n \leq k$. This will prove the result.

We shall construct a sequence of sets
\[ S = S_0, S_1, \dots, S_n \]
such that

(i) each $S_i$ spans $L(S)$, $i = 0, 1, \dots, n$;

(ii) $|S_i| = k$, $i = 0, 1, \dots, n$;

(iii) $\{w_1, \dots, w_i\} \subseteq S_i$, $i = 0, 1, \dots, n$.

We shall produce this sequence of sets inductively, the base case $i = 0$ being clear. Now suppose we have sets $S_0, \dots, S_j$ satisfying (i), (ii), (iii) above, for some $j < n$.

Since $S_j$ spans $L(S)$ we can write
\[ w_{j+1} = \sum_{s \in S_j} c_s s, \]
for some scalars $c_s$, $s \in S_j$. Since $w_1, \dots, w_{j+1}$ are linearly independent, there exists $t \in S_j \setminus \{w_1, \dots, w_j\}$ with $c_t \neq 0$ (why?). It follows that
\[ t = \frac{1}{c_t}\Big( w_{j+1} - \sum_{s \in S_j \setminus \{t\}} c_s s \Big), \]
and hence the set $(S_j \setminus \{t\}) \cup \{w_{j+1}\}$ satisfies conditions (i), (ii), and (iii) above for $i = j + 1$. That completes the proof.

(Second proof.) Let $T = \{u_1, \dots, u_{k+1}\} \subseteq L(S)$. Write
\[ u_i = \sum_{j=1}^k a_{ij} v_j, \quad i = 1, \dots, k+1. \]
Consider the $(k+1) \times k$ matrix $A = (a_{ij})$. Since $A$ has more rows than columns, there exists (why?) a nonzero row vector $c = [c_1, \dots, c_{k+1}]$ such that $cA = 0$, i.e., for $j = 1, \dots, k$,
\[ \sum_{i=1}^{k+1} c_i a_{ij} = 0. \]
We now have
\[ \sum_{i=1}^{k+1} c_i u_i = \sum_{i=1}^{k+1} c_i \Big( \sum_{j=1}^k a_{ij} v_j \Big) = \sum_{j=1}^k \Big( \sum_{i=1}^{k+1} c_i a_{ij} \Big) v_j = 0, \]
completing the proof.

Theorem 2.1.14. Any two bases of a finite dimensional vector space have the same number of elements.

Proof. Suppose $S$ and $T$ are bases of a finite dimensional vector space $V$. Suppose $|S| < |T|$. Since $T \subseteq V = L(S)$, any $|S| + 1$ elements of $T$ are linearly dependent by Lemma 2.1.13, contradicting the linear independence of $T$. Similarly $|T| < |S|$ is impossible. Hence $|S| = |T|$.

Definition 2.1.15. The number of elements in a basis of a finite dimensional vector space $V$ is called the dimension of $V$, denoted $\dim V$.

Theorem 2.1.16. Let $V$ be a finite dimensional vector space and let $S$ be a linearly independent subset of $V$. Then $S$ can be enlarged to a basis of $V$.

Proof. Suppose that $\dim V = n$ and $S$ has fewer than $n$ elements. Then $L(S) \neq V$ (otherwise Lemma 2.1.13 would make the $n$ elements of a basis linearly dependent); let $v \in V \setminus L(S)$. Then $S \cup \{v\}$ is a linearly independent subset of $V$ (why?). Continuing this way we can enlarge $S$ to a basis of $V$.

    Gauss elimination, row space, and column space

Lemma 2.1.19. Let $A$ be an $m \times n$ matrix over $F$ and $E$ a nonsingular $m \times m$ matrix over $F$. Then

(a) $R(A) = R(EA)$. Hence $\dim R(A) = \dim R(EA)$.

(b) Let $1 \leq i_1 < i_2 < \cdots < i_r \leq n$. Then columns $i_1, \dots, i_r$ of $A$ are linearly independent if and only if columns $i_1, \dots, i_r$ of $EA$ are linearly independent.


In the example above, columns 1, 4, 6 of $A$ form a basis of $C(A)$, and the first 3 rows of $U$ form a basis of $R(A)$.

Definition 2.1.22. The rank of an $m \times n$ matrix $A$, denoted by $r(A)$ or $\operatorname{rank}(A)$, is $\dim R(A) = \dim C(A)$. The nullity of $A$ is the dimension of the nullspace $N(A)$ of $A$.

    The rank-nullity Theorem

Theorem 2.1.23. Let $A$ be an $m \times n$ matrix. Then $\operatorname{rank} A + \operatorname{nullity} A = n$.

Proof. Let $k = r(A)$. Reduce $A$ to rcf (or even ref) $U$ using elementary row operations. Then $U$ has $k$ nonzero rows and $k$ pivotal columns. We need to show that $\dim N(A) = \dim N(U) = n - k$.

Let $j_1, \dots, j_k$ be the indices of the pivotal columns of $U$. Set $P = \{j_1, \dots, j_k\}$ and $F = \{1, 2, \dots, n\} \setminus P$, so $|F| = n - k$. Recall from Theorem 1.2.9 the following:

(i) Given arbitrary scalars $x_i$ for $i \in F$, there are unique scalars $x_i$ for $i \in P$ such that $x = (x_1, \dots, x_n)^t$ satisfies $Ux = 0$.

(ii) Given $i \in F$, there is a unique $s_i = (x_1, \dots, x_n)^t$ satisfying $Us_i = 0$, $x_i = 1$, and $x_j = 0$ for all $j \in F \setminus \{i\}$.

Then $\{s_i : i \in F\}$ forms a basis of $N(A)$ (why?).
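A numerical illustration (ours) of rank $+$ nullity $= n$; we take a null space basis from the SVD rather than from the rcf, but the dimensions are the same:

```python
import numpy as np

A = np.array([[1.0, 3.0, 2.0, 0.0],
              [2.0, 6.0, 5.0, 2.0],
              [3.0, 9.0, 7.0, 2.0]])          # row 3 = row 1 + row 2, so rank 2
r = np.linalg.matrix_rank(A)
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[r:].T                          # columns span N(A)
print(r, null_basis.shape[1], A.shape[1])      # rank 2, nullity 2, n = 4
assert np.allclose(A @ null_basis, 0)
```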

Fundamental Theorem for systems of linear equations

Theorem 2.1.24. Consider the following system of $m$ linear equations in $n$ unknowns $x_1, x_2, \dots, x_n$:
\[ \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}, \quad \text{or } Ax = b. \]

(1) The system has a solution iff $r(A) = r([A \mid b])$.

(2) If $r(A) = r([A \mid b]) = n$ then $Ax = b$ has a unique solution.

(3) If $r(A) = r([A \mid b]) = r < n$ then $Ax = b$ has infinitely many solutions.

    (1) The system has a solution iffr(A) =r([A | b]).(2) Ifr(A) =r([A | b]) =n thenAx= b has a unique solution.(3) Ifr(A) =r([A | b]) =r < n thenAx= b has infinitely many solutions.

    Proof. (1) Let C1, C2, . . . , C n be the column ofA. Suppose Ax= b has a solution x1 =a1, x2 =a2, . . . , xn= an. Then

    b= a1C1+ a2C2+ + anCn.Henceb C(A) soA and [A | b] have same column space. Thus they have equal rank. Conversely ifr(A) =r([A | b]), then b C(A). Hence b = d1C1+ + dnCn for some scalars d1, d2, . . . , dn. Then

    d1C1+ + dnCn= A

    d1d2...

    dn

    =b.

    Hence x1= d1, . . . , xn= dn is a solution.


(2) Let $r(A) = r([A \mid b]) = n$. Then by the rank-nullity theorem, $\operatorname{nullity}(A) = 0$. Hence $Ax = 0$ has a unique solution, namely $x_1 = \cdots = x_n = 0$. If $Ax = b = Ay$ then $A(x - y) = 0$. Hence $x - y = 0$, thus $x = y$.

(3) Suppose $r(A) = r([A \mid b]) = r < n$. Then $n - r = \dim N(A) > 0$. Thus $Ax = 0$ has infinitely many solutions. Let $c \in F^n$ with $Ac = b$. Then, as we have seen before, all the solutions of $Ax = b$ are in the set $c + N(A) = \{c + x \mid Ax = 0\}$. Hence $Ax = b$ has infinitely many solutions.

Rank in terms of determinants

We characterize rank in terms of minors of $A$. Recall that a minor of order $r$ of $A$ is a submatrix of $A$ consisting of $r$ columns and $r$ rows of $A$.

Theorem 2.1.25. An $m \times n$ matrix $A$ has rank $r \geq 1$ iff $\det M \neq 0$ for some order-$r$ minor $M$ of $A$ and $\det N = 0$ for all order-$(r+1)$ minors $N$ of $A$.

Proof. Let the rank of $A$ be $r \geq 1$. Then some $r$ columns of $A$ are linearly independent. Let $B$ be the $m \times r$ matrix consisting of these $r$ columns of $A$. Then $\operatorname{rank}(B) = r$, and thus some $r$ rows of $B$ are linearly independent. Let $C$ be the $r \times r$ matrix consisting of these $r$ rows of $B$. Then $\det(C) \neq 0$ (why?).

Let $N$ be an $(r+1) \times (r+1)$ minor of $A$. Without loss of generality we may take $N$ to consist of the first $r + 1$ rows and columns of $A$. Suppose $\det(N) \neq 0$. Then the $r + 1$ rows of $N$, and hence the first $r + 1$ rows of $A$, are linearly independent, a contradiction.

The converse is left as an exercise.

    2.2 Linear transformations

Let $A$ be an $m \times n$ matrix with real entries. Then $A$ acts on the $n$-dimensional space $\mathbb{R}^n$ by left multiplication: if $v \in \mathbb{R}^n$ then $Av \in \mathbb{R}^m$. In other words, $A$ defines a function
\[ T_A : \mathbb{R}^n \to \mathbb{R}^m, \quad T_A(v) = Av. \]
By the properties of matrix multiplication, $T_A$ satisfies the following conditions:

(i) $T_A(v + w) = T_A(v) + T_A(w)$,

(ii) $T_A(cv) = cT_A(v)$,

where $c \in \mathbb{R}$ and $v, w \in \mathbb{R}^n$. We say that $T_A$ respects the two operations in the vector space $\mathbb{R}^n$. In this section we study such maps between vector spaces.

Definition 2.2.1. Let $V, W$ be vector spaces over $F$. A linear transformation $T : V \to W$ is a function satisfying
\[ T(v + w) = T(v) + T(w) \quad \text{and} \quad T(cv) = cT(v), \]
where $v, w \in V$ and $c \in F$.

Exercise 2.2.2. Let $T : V \to W$ be a linear map. Show that $T(0) = 0$.


Example 2.2.3.

(1) Let $c \in \mathbb{R}$, $V = W = \mathbb{R}^2$. Define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[ T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} cx \\ cy \end{pmatrix}. \]
$T$ stretches each vector $v$ in $\mathbb{R}^2$ to $cv$. Hence
\[ T(v + w) = c(v + w) = cv + cw = T(v) + T(w), \qquad T(dv) = c(dv) = d(cv) = dT(v). \]
Hence $T$ is a linear transformation.

(2) Rotation. Fix $\theta$ and define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by
\[ T\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}. \]
Then $T(e_1) = (\cos\theta, \sin\theta)^t$ and $T(e_2) = (-\sin\theta, \cos\theta)^t$. Thus $T$ rotates the whole space by $\theta$. (Draw a picture to convince yourself of this. Another way is to identify the vector $(x, y)^t$ with the complex number $z = x + iy$; then we can write $T(z) = ze^{i\theta}$.)
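A quick numerical check of the rotation map (ours):

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
e1, e2 = np.eye(2)
print(R @ e1)        # (cos t, sin t)^t = (0, 1) up to rounding: e1 rotates to e2
print(R @ e2)        # (-sin t, cos t)^t = (-1, 0)
v, w = np.array([1.0, 2.0]), np.array([-3.0, 0.5])
assert np.allclose(R @ (v + w), R @ v + R @ w)    # linearity
```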

(3) Let $\mathcal{D}$ be the vector space of differentiable functions $f : \mathbb{R} \to \mathbb{R}$ such that $f^{(n)}$ exists for all $n$. Define $D : \mathcal{D} \to \mathcal{D}$ by
\[ D(f) = f'. \]
Then $D(af + bg) = af' + bg' = aD(f) + bD(g)$. Hence $D$ is a linear transformation.

(4) Define $I : \mathcal{D} \to \mathcal{D}$ by
\[ I(f)(x) = \int_0^x f(t)\, dt. \]
By the properties of integration, $I$ is a linear transformation.

(5) Consider the differential equation
\[ y'' - 3y' + 2y = 0. \]
Let $D : \mathcal{D} \to \mathcal{D}$ be the linear transformation defined as above. Then $D \circ D(y) = y''$. Let $I$ be the identity map, $I(y) = y$. Then the differential equation can be written as
\[ (D^2 - 3D + 2I)(y) = 0. \]
It can be shown that $e^x$ and $e^{2x}$ are solutions of the differential equation. Let $T = D^2 - 3D + 2I$. Then for any $\alpha, \beta \in \mathbb{R}$,
\[ T(\alpha e^x + \beta e^{2x}) = \alpha T(e^x) + \beta T(e^{2x}) = 0. \]

(6) The map $T : \mathbb{R} \to \mathbb{R}$ given by $T(x) = x^2$ is not linear (why?).

(7) Let $V = M_{n \times n}(F)$ be the vector space of all $n \times n$ matrices over $F$. Fix $A \in V$. The map $T : V \to V$ given by $T(N) = AN$ is linear (why?).


    Rank and Nullity

Let T : V → W be a linear transformation of vector spaces. There are two important subspaces associated with T.

Nullspace of T: N(T) = {v ∈ V | T(v) = 0}.  Image of T: Im(T) = {T(v) | v ∈ V}.

Let V be a finite dimensional vector space. Suppose that α, β are scalars. If v, w ∈ N(T) then T(αv + βw) = αT(v) + βT(w) = 0. Hence αv + βw ∈ N(T). Thus N(T) is a subspace of V. The dimension of N(T) is called the nullity of T and is denoted by nullity(T). Suppose that v, w ∈ V. Then αT(v) + βT(w) = T(αv + βw). Thus Im(T) is a subspace of W. The dimension of Im T, denoted by rank(T), is called the rank of T.

Lemma 2.2.4. Let T : V → W be a linear map of vector spaces. Then T is 1-1 if and only if N(T) = {0}.

Proof. (if) T(u) = T(v) implies T(u − v) = 0, so u − v ∈ N(T) = {0}, which implies u = v.
(only if) T(v) = 0 = T(0) implies v = 0.

Lemma 2.2.5. Let V, W be vector spaces. Assume V is finite dimensional with (v_1, ..., v_n) as an ordered basis. Let (w_1, ..., w_n) be an arbitrary sequence of vectors in W. Then there is a unique linear map T : V → W with T(v_i) = w_i, for all i = 1, ..., n.

Proof. (uniqueness) Given v ∈ V we can write (uniquely) v = a_1v_1 + ··· + a_nv_n, for scalars a_i. Then T(v) = a_1T(v_1) + ··· + a_nT(v_n) = a_1w_1 + ··· + a_nw_n. So T is determined by (w_1, ..., w_n).
(existence) Define T as follows. Given v ∈ V write (uniquely) v = a_1v_1 + ··· + a_nv_n, for scalars a_i. Define T(v) = a_1w_1 + ··· + a_nw_n. Show that T is linear (exercise).

Theorem 2.2.6 (The Rank-Nullity Theorem). Let T : V → W be a linear transformation of vector spaces where V is finite dimensional. Then

    rank(T) + nullity(T) = dim V.

Proof. Suppose dim V = n. Let B = {v_1, v_2, ..., v_l} be a basis of N(T). We can extend B to a basis C = {v_1, v_2, ..., v_l, w_1, w_2, ..., w_{n−l}} of V. We show that

    D = {T(w_1), T(w_2), ..., T(w_{n−l})}

is a basis of Im(T). Any v ∈ V can be expressed uniquely as

    v = α_1v_1 + α_2v_2 + ··· + α_lv_l + β_1w_1 + ··· + β_{n−l}w_{n−l}.

Hence

    T(v) = α_1T(v_1) + ··· + α_lT(v_l) + β_1T(w_1) + ··· + β_{n−l}T(w_{n−l})
         = β_1T(w_1) + ··· + β_{n−l}T(w_{n−l}).


Hence D spans Im T. Suppose

    β_1T(w_1) + ··· + β_{n−l}T(w_{n−l}) = 0.

Then

    T(β_1w_1 + ··· + β_{n−l}w_{n−l}) = 0.

Hence β_1w_1 + ··· + β_{n−l}w_{n−l} ∈ N(T). Hence there are scalars α_1, α_2, ..., α_l such that

    α_1v_1 + α_2v_2 + ··· + α_lv_l = β_1w_1 + β_2w_2 + ··· + β_{n−l}w_{n−l}.

By linear independence of {v_1, v_2, ..., v_l, w_1, w_2, ..., w_{n−l}} we conclude that β_1 = β_2 = ··· = β_{n−l} = 0. Hence D is a basis of Im T. Thus

    rank(T) = n − l = dim V − dim N(T).

In a later exercise in this section we ask you to derive the rank-nullity theorem for matrices from the result above.
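For the matrix map T_A, the theorem says that dim C(A) + dim N(A) equals the number of columns of A. A small exact check with sympy (the matrix is an arbitrary example):

    import sympy as sp

    A = sp.Matrix([[1, 2, 3],
                   [2, 4, 6],
                   [0, 1, 1]])
    rank = A.rank()                  # rank(T_A) = dim of the column space
    nullity = len(A.nullspace())     # nullity(T_A) = dim N(A)
    print(rank, nullity, rank + nullity == A.cols)   # 2 1 True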

    Coordinate vectors

Let V be a finite dimensional vector space (fdvs) over F. By an ordered basis of V we mean a sequence (v_1, v_2, ..., v_n) of distinct vectors of V such that the set {v_1, ..., v_n} is a basis. Let u ∈ V. Write uniquely (why?)

    u = a_1v_1 + a_2v_2 + ··· + a_nv_n,  a_i ∈ F.

Define the coordinate vector of u with respect to (wrt) the ordered basis B = (v_1, ..., v_n) by

    [u]_B = (a_1, a_2, ..., a_n)^t.

Note that (why?) for vectors u, v ∈ V and scalar a ∈ F we have

    [u + v]_B = [u]_B + [v]_B,  [av]_B = a[v]_B.

Suppose C = (u_1, ..., u_n) is another ordered basis of V. Given u ∈ V, what is the relation between [u]_B and [u]_C?

Define M^C_B, the transition matrix from C to B, to be the n × n matrix whose jth column is [u_j]_B:

    M^C_B = [ [u_1]_B  [u_2]_B  ···  [u_n]_B ].

Lemma 2.2.7. Set M = M^C_B. Then, for all u ∈ V, we have

    [u]_B = M[u]_C.


Proof. Let

    [u]_C = (a_1, a_2, ..., a_n)^t.

Then u = a_1u_1 + a_2u_2 + ··· + a_nu_n and we have

    [u]_B = [a_1u_1 + ··· + a_nu_n]_B
          = a_1[u_1]_B + ··· + a_n[u_n]_B
          = [ [u_1]_B  [u_2]_B  ···  [u_n]_B ] (a_1, a_2, ..., a_n)^t
          = M[u]_C.

Example 2.2.8. Let V = R^3 and let

    v_1 = (1, 1, 1)^t,  v_2 = (0, 1, 1)^t,  v_3 = (0, 0, 1)^t,
    u_1 = (1, 0, 0)^t,  u_2 = (0, 1, 0)^t,  u_3 = (0, 0, 1)^t.

Consider the ordered bases B = (v_1, v_2, v_3) and C = (u_1, u_2, u_3). We have (why?)

    M = M^C_B =
        [  1   0   0 ]
        [ −1   1   0 ]
        [  0  −1   1 ].

Let u = (2, 3, 4)^t. So (why?) [u]_C = (2, 3, 4)^t. Then

    [u]_B = M[u]_C = (2, 1, 1)^t.

Check that

    (2, 3, 4)^t = 2(1, 1, 1)^t + (0, 1, 1)^t + (0, 0, 1)^t.
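This computation is easy to check numerically. In the numpy sketch below, the columns of P are v_1, v_2, v_3 written in the standard basis C, so P = M^B_C and M^C_B is its inverse (cf. Lemma 2.2.9 below):

    import numpy as np

    P = np.array([[1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 1.0, 1.0]])   # columns are v1, v2, v3

    u_C = np.array([2.0, 3.0, 4.0])   # [u]_C
    M = np.linalg.inv(P)              # M^C_B = (M^B_C)^{-1}
    u_B = M @ u_C
    print(u_B)                        # [2. 1. 1.]  =  [u]_B
    print(P @ u_B)                    # [2. 3. 4.]  recovers u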

Lemma 2.2.9. Let V be a fdvs and B and C be two ordered bases. Then

    M^C_B = (M^B_C)^{−1}.


Proof. Put M = M^B_C and N = M^C_B. We need to show that MN = NM = I.

We have, for all u ∈ V,

    [u]_B = N[u]_C,  [u]_C = M[u]_B.

It follows that, for all u ∈ V,

    [u]_B = N[u]_C = NM[u]_B,
    [u]_C = M[u]_B = MN[u]_C.

Thus (why?) MN = NM = I.

Example 2.2.10. Let M be the (n+1) × (n+1) matrix, with rows and columns indexed by {0, 1, ..., n}, and with entry in row i and column j, 0 ≤ i, j ≤ n, given by \binom{j}{i}. We show that M is invertible and find the inverse explicitly.

Consider the vector space P_n(R) of real polynomials of degree ≤ n. Then B = (1, x, x², ..., x^n) and C = (1, x−1, (x−1)², ..., (x−1)^n) are both ordered bases (why?).

We claim that M = M^B_C. To see this note the following computation. For 0 ≤ j ≤ n we have

    x^j = (1 + (x−1))^j = Σ_{i=0}^{j} \binom{j}{i} (x−1)^i = Σ_{i=0}^{n} \binom{j}{i} (x−1)^i,

where in the last step we have used the fact that \binom{j}{i} = 0 for i > j.

Thus M^{−1} = M^C_B and its entries are given by the following computation. For 0 ≤ j ≤ n we have

    (x−1)^j = Σ_{i=0}^{j} (−1)^{j−i} \binom{j}{i} x^i = Σ_{i=0}^{n} (−1)^{j−i} \binom{j}{i} x^i.

Thus, the entry in row i and column j of M^{−1} is (−1)^{j−i} \binom{j}{i}.
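The claimed inverse is easy to verify symbolically for a small n (a sympy sketch):

    import sympy as sp

    n = 4
    M = sp.Matrix(n + 1, n + 1, lambda i, j: sp.binomial(j, i))
    # Claimed inverse: entry (-1)^(j-i) binom(j, i); sp.Integer keeps arithmetic exact.
    Minv = sp.Matrix(n + 1, n + 1,
                     lambda i, j: sp.Integer(-1) ** (j - i) * sp.binomial(j, i))
    print(M * Minv == sp.eye(n + 1))   # True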

    Matrices and linear transformations

Let V and W be finite dimensional vector spaces with dim V = n and dim W = m. Suppose E = (e_1, e_2, ..., e_n) is an ordered basis for V and F = (f_1, f_2, ..., f_m) is an ordered basis for W.

Let T : V → W be a linear transformation. We define M^E_F(T), the matrix of T with respect to the ordered bases E and F, to be the m × n matrix whose jth column is [T(e_j)]_F:

    M^E_F(T) = [ [T(e_1)]_F  [T(e_2)]_F  ···  [T(e_n)]_F ].

    Please do the following important exercise.


Exercise 2.2.11. Let A be an m × n matrix over F and consider the linear map T_A : F^n → F^m given by T_A(v) = Av, for v ∈ F^n (we are considering column vectors here). Consider the ordered bases E = (e_1, ..., e_n) and F = (e_1, ..., e_m) of F^n and F^m respectively. Show that M^E_F(T_A) = A.

Let L(V, W) denote the set of all linear transformations from V to W. Suppose S, T ∈ L(V, W) and c is a scalar. Define S + T and cS as follows:

    (S + T)(x) = S(x) + T(x),
    (cS)(x) = cS(x),

for all x ∈ V. It is easy to show that L(V, W) is a vector space under these operations.

Lemma 2.2.12. Fix ordered bases E and F of V and W respectively. For all S, T ∈ L(V, W) and scalar c we have

(i) M^E_F(S + T) = M^E_F(S) + M^E_F(T),

(ii) M^E_F(cS) = cM^E_F(S),

(iii) M^E_F(S) = M^E_F(T) ⟺ S = T.

    Proof. Exercise.

Lemma 2.2.13. Suppose V, W are vector spaces of dimensions n, m respectively. Suppose T : V → W is a linear transformation. Suppose E = (e_1, ..., e_n), F = (f_1, ..., f_m) are ordered bases of V, W respectively. Then

    [T(v)]_F = M^E_F(T)[v]_E,  v ∈ V.

Proof. Let

    [v]_E = (a_1, a_2, ..., a_n)^t.

Then v = a_1e_1 + a_2e_2 + ··· + a_ne_n and hence T(v) = a_1T(e_1) + a_2T(e_2) + ··· + a_nT(e_n). We have

    [T(v)]_F = [a_1T(e_1) + ··· + a_nT(e_n)]_F
             = a_1[T(e_1)]_F + ··· + a_n[T(e_n)]_F
             = [ [T(e_1)]_F  [T(e_2)]_F  ···  [T(e_n)]_F ] (a_1, a_2, ..., a_n)^t
             = M^E_F(T)[v]_E.


Lemma 2.2.14. Suppose U, V, W are vector spaces of dimension n, p, m respectively. Suppose T : U → V and S : V → W are linear transformations. Suppose E = (e_1, ..., e_n), F, G are ordered bases of U, V, W respectively. Then

    M^E_G(S ∘ T) = M^F_G(S) M^E_F(T).

Proof. The jth column of M^E_G(S ∘ T) is

    [(S ∘ T)(e_j)]_G = [S(T(e_j))]_G.

Now the jth column of M^F_G(S)M^E_F(T) is

    M^F_G(S) (jth column of M^E_F(T)) = M^F_G(S)[T(e_j)]_F = [S(T(e_j))]_G,

where the last equality is Lemma 2.2.13 applied to S.
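In the standard bases, Exercise 2.2.11 identifies M^E_F(T_A) with A itself, so for matrix maps the lemma reduces to the statement that composition corresponds to the matrix product. A quick numerical check (a numpy sketch with arbitrary random matrices):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))   # matrix of S : R^4 -> R^3
    B = rng.standard_normal((4, 5))   # matrix of T : R^5 -> R^4
    v = rng.standard_normal(5)
    # Applying T then S agrees with multiplying by AB.
    print(np.allclose(A @ (B @ v), (A @ B) @ v))   # True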

Let V be a fdvs. A linear map T : V → V is said to be a linear operator on V. Let B, C be ordered bases of V. The square matrix M^B_B(T) is said to be the matrix of T with respect to the ordered basis B. Note that the transition matrix M^C_B from C to B is the matrix M^C_B(I) of the identity map wrt C and B. Thus it follows that (why?) M^C_B(I) = M^B_C(I)^{−1}.

    Exercise 2.2.15. Prove Lemma 2.2.9 using Lemma 2.2.14.

An easy induction gives the following generalization of the lemma above. Its proof is left as an exercise.

Lemma 2.2.16. Suppose V_i, 1 ≤ i ≤ m+1, are finite dimensional vector spaces and T_i : V_i → V_{i+1}, 1 ≤ i ≤ m, are linear maps. Suppose E_i is an ordered basis of V_i, for 1 ≤ i ≤ m+1. Then

    M^{E_1}_{E_{m+1}}(T_m ∘ T_{m−1} ∘ ··· ∘ T_2 ∘ T_1) = M^{E_m}_{E_{m+1}}(T_m) ··· M^{E_2}_{E_3}(T_2) M^{E_1}_{E_2}(T_1).

Lemma 2.2.17. We have

    M^B_B(T) = (M^B_C)^{−1} M^C_C(T) M^B_C.

Proof. Applying Lemma 2.2.16 with V_1 = V_2 = V_3 = V_4 = V and T_1 = T_3 = I, T_2 = T, and E_1 = E_4 = B, E_2 = E_3 = C we get

    M^B_B(T) = M^C_B M^C_C(T) M^B_C.

Since M^C_B = (M^B_C)^{−1} the proof is complete.

Example 2.2.18. Consider the linear transformation

    T : R^2 → R^2,  T(e_1) = e_1,  T(e_2) = e_1 + e_2.

Let C = (e_1, e_2) and B = (e_1 + e_2, e_1 − e_2). Then

    M^C_C(T) = [ 1 1 ; 0 1 ],  M^B_C = [ 1 1 ; 1 −1 ],  M^C_B = [ 1/2 1/2 ; 1/2 −1/2 ].


Hence

    M^B_B(T) = [ 1/2 1/2 ; 1/2 −1/2 ] [ 1 1 ; 0 1 ] [ 1 1 ; 1 −1 ] = (1/2) [ 3 −1 ; 1 1 ].

We can also find this directly:

    T(e_1 + e_2) = 2e_1 + e_2 = (3/2)(e_1 + e_2) + (1/2)(e_1 − e_2),
    T(e_1 − e_2) = −e_2 = −(1/2)(e_1 + e_2) + (1/2)(e_1 − e_2).
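The change-of-basis formula of Lemma 2.2.17 is easy to verify numerically for this example (a numpy sketch):

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])       # M^C_C(T), C the standard basis
    P = np.array([[1.0, 1.0],
                  [1.0, -1.0]])      # M^B_C: columns are e1+e2 and e1-e2
    M_BB = np.linalg.inv(P) @ A @ P  # (M^B_C)^{-1} M^C_C(T) M^B_C
    print(M_BB)                      # [[ 1.5 -0.5]
                                     #  [ 0.5  0.5]]  =  (1/2)[3 -1; 1 1]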

Exercise 2.2.19. (i) Deduce the rank-nullity theorem for matrices from the rank-nullity theorem for linear maps.

(ii) Let T : V → W be a linear map of rank r between fdvs V and W. Show that there are ordered bases E of V and F of W such that

    M^E_F(T) = [ I 0 ; 0 0 ],

where I is the r × r identity matrix and 0 stands for a matrix of zeros of appropriate size.

Given subspaces V, W of a vector space U, define the sum of V and W, denoted V + W, by

    V + W = L(V ∪ W).

Lemma 2.2.20. Let V, W be subspaces of a fdvs U. Then

    dim V + dim W = dim(V ∩ W) + dim(V + W).

Proof. We shall give a sketch of a proof, leaving the reader to fill in the details.

Consider the set V × W = {(v, w) : v ∈ V, w ∈ W}. This set is a vector space with componentwise addition and scalar multiplication. Check that the dimension of this space is dim V + dim W.

Define a linear map T : V × W → V + W by T((v, w)) = v − w. Check that T is onto and that the nullspace of T is {(v, v) : v ∈ V ∩ W}. The result now follows from the rank-nullity theorem for linear maps.

Example 2.2.21. Let V, W be finite dimensional vector spaces over F with dimensions n, m respectively. Fix ordered bases E, F for V, W respectively.

Consider the map f : L(V, W) → M_{m×n}(F) given by f(T) = M^E_F(T), for T ∈ L(V, W). Lemma 2.2.12 shows that f is linear and 1-1, and Lemma 2.2.5 shows that f is onto. It follows (why?) that dim L(V, W) = mn.

Example 2.2.22. Often we see statements like "If every vector in a vector space V is uniquely determined by t parameters then the dimension of V is t." In this example we show one possible way of making this precise.


Let V be a vector space over F. A linear functional is a linear map f : V → F. We shall refer to a linear functional as a parameter. Suppose we have t parameters f_i : V → F, i = 1, 2, ..., t. Suppose every vector in V is uniquely determined by these t parameters, i.e., given arbitrary scalars a_1, a_2, ..., a_t in F, there is a unique vector v ∈ V with f_i(v) = a_i, i = 1, ..., t. Then dim V = t. We show this as follows.

For i = 1, ..., t, let v_i ∈ V be the (unique) vector with f_i(v_i) = 1 and f_j(v_i) = 0, for j ≠ i. We claim that v_1, ..., v_t is a basis of V.

Let v ∈ V. Put a_i = f_i(v), i = 1, ..., t. Consider the vector v − (a_1v_1 + ··· + a_tv_t). Check that f_i(v − (a_1v_1 + ··· + a_tv_t)) = 0, for i = 1, ..., t. Since each of the f_i is linear and the parameters f_1, ..., f_t uniquely determine the vectors in V, it follows that the only vector with all parameters 0 is the 0 vector. Thus v − (a_1v_1 + ··· + a_tv_t) = 0 and v_1, ..., v_t span V.

Now suppose a_1v_1 + ··· + a_tv_t = 0. Then, for all i, f_i(a_1v_1 + ··· + a_tv_t) = a_i = 0, and thus linear independence follows.

Example 2.2.23. Given an n × n matrix, by r_i we mean the sum of the entries in row i. Similarly, by c_j we mean the sum of the entries in column j.

A real magic square of order n is a real n × n matrix satisfying

    r_1 = r_2 = ··· = r_n = c_1 = c_2 = ··· = c_n.

Let RMS(n) denote the set of all real magic squares of order n. It is easy to see that RMS(n) is a subspace of M_{n×n}(R), the vector space of all n × n real matrices. The dimension of M_{n×n}(R) is n². What is the dimension of RMS(n)?

We show that dim RMS(n) = (n−1)² + 1 using the previous example.

For 1 ≤ i, j ≤ n−1, define a linear map f_ij : RMS(n) → R by

    f_ij(M) = entry in row i and column j of M,  M ∈ RMS(n).

Define a linear map f : RMS(n) → R by

    f(M) = sum of the entries in row 1 of M,  M ∈ RMS(n).

Check that the (n−1)² + 1 parameters f, f_ij satisfy the hypothesis of the previous example.
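Alternatively, dim RMS(n) can be computed directly as n² minus the rank of the linear system defining RMS(n). A sympy sketch (it encodes the constraints r_1 = r_k for k = 2, ..., n and r_1 = c_k for k = 1, ..., n):

    import sympy as sp

    def rms_dimension(n):
        """dim RMS(n) = n^2 minus the rank of the defining constraints."""
        def row_sum(i):   # coefficient vector of r_i in the n^2 matrix entries
            return sp.Matrix(1, n * n, lambda _, k: 1 if k // n == i else 0)
        def col_sum(j):   # coefficient vector of c_j
            return sp.Matrix(1, n * n, lambda _, k: 1 if k % n == j else 0)
        r1 = row_sum(0)
        constraints = [r1 - row_sum(k) for k in range(1, n)]
        constraints += [r1 - col_sum(k) for k in range(n)]
        return n * n - sp.Matrix.vstack(*constraints).rank()

    for n in range(2, 6):
        print(n, rms_dimension(n), (n - 1) ** 2 + 1)   # last two columns agree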

Let V be a finite dimensional vector space and let V = R ⊕ N be a direct sum decomposition. It is easy to see that there is a unique linear map p_R : V → V satisfying

    p_R(v) = v, for v ∈ R, and p_R(v) = 0, for v ∈ N.

We say that p_R is the projection onto R along N. Note that p_R² = p_R.

Lemma 2.2.24. Let P : V → V satisfy P² = P. Then P is the projection onto Im(P) along N(P).

Proof. Let v ∈ V. Write v = P(v) + (v − P(v)). Now P(v − P(v)) = P(v) − P²(v) = 0. It follows that V = Im(P) + N(P).


Now let P(u) ∈ Im(P) ∩ N(P). Then P(u) = P²(u) = P(P(u)) = 0. It follows that V = Im(P) ⊕ N(P).

Let P(u) ∈ Im(P). Then P(P(u)) = P²(u) = P(u). Clearly P(v) = 0 for v ∈ N(P). That completes the proof.


    Chapter 3

    Inner product spaces

    3.1 Length, Projection, and Angle

The concept of a (real) vector space abstracts the operations of adding directed line segments and multiplying a directed line segment by a real number. In plane geometry we also speak of other geometric concepts such as length, angle, perpendicularity, projection of a point on a line, etc. Remarkably, we need to put only a single additional algebraic structure, that of an inner product, on a vector space in order to have these geometric concepts available in the abstract setting.

We shall use the following notation. Recall that F = R or C. Given a ∈ F, ā will denote a if F = R and the complex conjugate of a if F = C. Given a matrix A over F we denote by A* the conjugate transpose of A, i.e., if A = (a_ij) then A* = (ā_ji). We call A* the adjoint of A.

Definition 3.1.1. Let V be a vector space over F. An inner product on V is a rule which to any ordered pair of elements (u, v) of V associates a scalar, denoted ⟨u, v⟩, satisfying the following axioms: for all u, v, w in V and c any scalar we have

(1) ⟨u, v⟩ = \overline{⟨v, u⟩} (Hermitian property or conjugate symmetry),
(2) ⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩ (additivity),
(3) ⟨cu, v⟩ = c̄⟨u, v⟩ (homogeneity),
(4) ⟨v, v⟩ ≥ 0, with ⟨v, v⟩ = 0 iff v = 0 (positive definite).

    A vector space with an inner product is called an inner product space.

Example 3.1.2. (1) Let v = (x_1, x_2, ..., x_n)^t and w = (y_1, y_2, ..., y_n)^t ∈ R^n. Define ⟨v, w⟩ = Σ_{i=1}^{n} x_iy_i = v^tw. This is an inner product on the real vector space R^n. It is often called the standard inner product.

(2) Let v = (x_1, x_2, ..., x_n)^t and w = (y_1, y_2, ..., y_n)^t ∈ C^n. Define ⟨v, w⟩ = Σ_{i=1}^{n} x̄_iy_i = v*w. This is an inner product on the complex vector space C^n. It is often called the standard inner product.

(3) Let V be the vector space of all real valued continuous functions on the unit interval [0, 1]. For f, g ∈ V, put


    ⟨f, g⟩ = ∫_0^1 f(t)g(t) dt.

Simple properties of the integral show that ⟨f, g⟩ is an inner product.

(4) Let B be a nonsingular n × n complex matrix. Set A = B*B. Given x, y ∈ C^n define ⟨x, y⟩ = x*Ay. Denote the standard inner product on C^n by the dot product (i.e., the inner product of u and v is u · v). We have

    ⟨x, y⟩ = x*Ay = x*B*By = (Bx)*By = Bx · By.

Check that ⟨ , ⟩ is an inner product.

Definition 3.1.3. Given an inner product space V and an element v ∈ V we define its length or norm by

    ‖v‖ = √⟨v, v⟩.

We say v is a unit vector if ‖v‖ = 1. Elements v, w of V are said to be orthogonal or perpendicular if ⟨v, w⟩ = 0. We write this as v ⊥ w.

Note that if c ∈ F and v ∈ V then

    ‖cv‖ = √⟨cv, cv⟩ = √(c̄c⟨v, v⟩) = |c|‖v‖.

Theorem 3.1.4 (Pythagoras). If v ⊥ w then ‖v + w‖² = ‖v‖² + ‖w‖².

Proof. We have

    ‖v + w‖² = ⟨v + w, v + w⟩ = ⟨v, v⟩ + ⟨w, w⟩ + ⟨v, w⟩ + ⟨w, v⟩ = ‖v‖² + ‖w‖².

Exercise 3.1.5. Prove the parallelogram law: for v, w ∈ V we have

    ‖v + w‖² + ‖v − w‖² = 2‖v‖² + 2‖w‖².

Definition 3.1.6. Let v, w ∈ V with w ≠ 0. We define

    p_w(v) = (⟨w, v⟩/⟨w, w⟩) w

to be the projection of v on w.

Note that the map p_w : V → V given by v ↦ p_w(v) is linear. This is the reason for using ⟨w, v⟩ instead of ⟨v, w⟩ in the definition of p_w(v).

    The next lemma, whose geometric content is clear, explains the use of the term projection.

Lemma 3.1.7. Let v, w ∈ V with w ≠ 0. Then

(i) p_w(v) = p_{w/‖w‖}(v), i.e., the projection of v on w is the same as the projection of v on the unit vector in the direction of w.

(ii) p_w(v) and v − p_w(v) are orthogonal.

(iii) ‖p_w(v)‖ ≤ ‖v‖ with equality iff {v, w} are linearly dependent.


Proof. (i) We have

    p_w(v) = (⟨w, v⟩/⟨w, w⟩) w = (⟨w, v⟩/‖w‖²) w = ⟨w/‖w‖, v⟩ (w/‖w‖) = p_{w/‖w‖}(v),

where in the last step we have used the fact that ⟨w/‖w‖, w/‖w‖⟩ = 1.

(ii) In view of part (i) we may assume that w is a unit vector. We have

    ⟨p_w(v), v − p_w(v)⟩ = ⟨p_w(v), v⟩ − ⟨p_w(v), p_w(v)⟩
                         = ⟨⟨w, v⟩w, v⟩ − ⟨⟨w, v⟩w, ⟨w, v⟩w⟩
                         = |⟨w, v⟩|² − |⟨w, v⟩|²⟨w, w⟩
                         = 0,

since ⟨w, w⟩ = 1.

(iii) We have (in the third step below we have used part (ii) and Pythagoras)

    ‖v‖² = ⟨v, v⟩ = ⟨p_w(v) + (v − p_w(v)), p_w(v) + (v − p_w(v))⟩ = ‖p_w(v)‖² + ‖v − p_w(v)‖² ≥ ‖p_w(v)‖².

Clearly, there is equality in the last step iff v = p_w(v), i.e., iff {v, w} are dependent.

Theorem 3.1.8 (Cauchy-Schwarz inequality). For v, w ∈ V,

    |⟨w, v⟩| ≤ ‖v‖‖w‖,

with equality iff {v, w} are linearly dependent.

Proof. The result is clear if w = 0. So we may assume that w ≠ 0.

Case (i): w is a unit vector. In this case the lhs of the C-S inequality is ‖p_w(v)‖ and the result follows from part (iii) of the previous lemma.

Case (ii): w is not a unit vector. Set u = w/‖w‖. We have

    |⟨w, v⟩| = ‖w‖ |⟨u, v⟩| and ‖v‖‖w‖ = ‖w‖ (‖v‖‖u‖).

The result follows from Case (i) above.

Theorem 3.1.9 (Triangle Inequality). For v, w ∈ V,

    ‖v + w‖ ≤ ‖v‖ + ‖w‖.

Proof. We have, using the C-S inequality,

    ‖v + w‖² = ⟨v + w, v + w⟩ = ⟨v, v⟩ + ⟨v, w⟩ + ⟨w, v⟩ + ⟨w, w⟩
             = ‖v‖² + 2 Re⟨v, w⟩ + ‖w‖²
             ≤ ‖v‖² + ‖w‖² + 2‖v‖‖w‖
             = (‖v‖ + ‖w‖)².


Definition 3.1.10. Let V be a real inner product space. Given v, w ∈ V with v, w ≠ 0, by the C-S inequality

    −1 ≤ ⟨v, w⟩/(‖v‖‖w‖) ≤ 1.

So there is a unique 0 ≤ θ ≤ π satisfying cos θ = ⟨v, w⟩/(‖v‖‖w‖). This is the angle between v and w. The distance between u and v in V is defined as d(u, v) = ‖u − v‖.

Lemma 3.1.11. Let u, v, w ∈ V. Then

(i) d(u, v) ≥ 0 with equality iff u = v.

(ii) d(u, v) = d(v, u).

(iii) d(u, v) ≤ d(u, w) + d(w, v).

    Proof. Exercise.

Definition 3.1.12. Let V be an n-dimensional inner product space. A basis {v_1, v_2, ..., v_n} of V is called orthogonal if its elements are mutually perpendicular, i.e., if ⟨v_i, v_j⟩ = 0 for i ≠ j. If, in addition, ‖v_i‖ = 1 for all i, we say that the basis is orthonormal.

Example 3.1.13. In F^n, with the standard inner product, the basis {e_1, ..., e_n} is orthonormal.

Lemma 3.1.14. Let U = {u_1, u_2, ..., u_n} be a set of nonzero vectors in an inner product space V. If ⟨u_i, u_j⟩ = 0 for i ≠ j, 1 ≤ i, j ≤ n, then U is linearly independent.

Proof. Suppose c_1, c_2, ..., c_n are scalars with

    c_1u_1 + c_2u_2 + ··· + c_nu_n = 0.

Take the inner product with u_i on both sides to get

    c_i⟨u_i, u_i⟩ = 0.

Since u_i ≠ 0, we get c_i = 0. Thus U is linearly independent.

Example 3.1.15. (1) Consider R² with the standard inner product. Then v_1 = (1, 1)^t and v_2 = (1, −1)^t are orthogonal. Dividing v_1 and v_2 by their lengths we get an orthonormal basis.

(2) Let V denote the real inner product space of all continuous real functions on [0, 2π] with inner product given by

    ⟨f, g⟩ = ∫_0^{2π} f(x)g(x) dx.

Define g_n(x) = cos(nx), for n ≥ 0. Then

    ‖g_n‖² = ∫_0^{2π} cos²(nx) dx = 2π for n = 0, and π for n ≥ 1,

and

    ⟨g_m, g_n⟩ = ∫_0^{2π} cos(mx) cos(nx) dx = 0, m ≠ n.

So {g_0, ..., g_n} are orthogonal.


Theorem 3.1.16. Let V be a finite dimensional inner product space. Let W ⊆ V be a subspace and let {w_1, ..., w_m} be an orthogonal basis of W. If W ≠ V, then there exist elements {w_{m+1}, ..., w_n} of V such that {w_1, ..., w_n} is an orthogonal basis of V.

Taking W = {0}, the zero subspace, we see that V has an orthogonal, and hence orthonormal, basis.

Proof. The method of proof is as important as the theorem and is called the Gram-Schmidt orthogonalization process.

Since W ≠ V, we can find a vector v_{m+1} such that {w_1, ..., w_m, v_{m+1}} is linearly independent. The idea is to take v_{m+1} and subtract from it its projections along w_1, ..., w_m. Define

    w_{m+1} = v_{m+1} − p_{w_1}(v_{m+1}) − p_{w_2}(v_{m+1}) − ··· − p_{w_m}(v_{m+1}).

(Recall that p_w(v) = (⟨w, v⟩/⟨w, w⟩) w.)

Clearly, w_{m+1} ≠ 0, as otherwise {w_1, ..., w_m, v_{m+1}} would be linearly dependent. We now check that {w_1, ..., w_{m+1}} is orthogonal. For this, it is enough to check that w_{m+1} is orthogonal to each of w_i, 1 ≤ i ≤ m.

For i = 1, 2, ..., m we have

    ⟨w_i, w_{m+1}⟩ = ⟨w_i, v_{m+1} − Σ_{j=1}^{m} p_{w_j}(v_{m+1})⟩
                   = ⟨w_i, v_{m+1}⟩ − ⟨w_i, Σ_{j=1}^{m} p_{w_j}(v_{m+1})⟩
                   = ⟨w_i, v_{m+1}⟩ − ⟨w_i, p_{w_i}(v_{m+1})⟩  (since ⟨w_i, w_j⟩ = 0 for i ≠ j)
                   = ⟨w_i, v_{m+1} − p_{w_i}(v_{m+1})⟩
                   = 0 (by part (ii) of Lemma 3.1.7).
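The process translates directly into code. The following is a minimal Python sketch (using numpy) that orthogonalizes a list of vectors in R^n, dropping any vector dependent on the earlier ones:

    import numpy as np

    def gram_schmidt(vectors, tol=1e-12):
        """Return an orthogonal list spanning the same subspace as `vectors`."""
        ortho = []
        for v in vectors:
            w = np.array(v, dtype=float)
            for u in ortho:
                w = w - (u @ w) / (u @ u) * u   # subtract the projection p_u(v)
            if np.linalg.norm(w) > tol:         # w = 0 means v was dependent
                ortho.append(w)
        return ortho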

Example 3.1.17. Find an orthonormal basis for the subspace of R^4 (under the standard inner product) spanned by

    (1, 1, 0, 1)^t,  (−1, 2, 0, 0)^t,  and  (1, 0, −1, 2)^t.

Denote these vectors by a, b, c respectively. Set

    b′ = b − ((b · a)/(a · a)) a = (1/3)(−4, 5, 0, −1)^t.


Now subtract from c its projections along a and b′:

    c′ = c − ((c · a)/(a · a)) a − ((c · b′)/(b′ · b′)) b′ = (1/7)(−4, −2, −7, 6)^t.

Now a, b′, c′ are orthogonal and generate the same subspace as a, b, c. Dividing by the lengths we get the orthonormal basis a/‖a‖, b′/‖b′‖, c′/‖c′‖.
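Running the gram_schmidt sketch given after Theorem 3.1.16 on these vectors reproduces b′ and c′, up to the scalar factors displayed above:

    import numpy as np
    # gram_schmidt as defined in the sketch after Theorem 3.1.16

    a = np.array([1.0, 1.0, 0.0, 1.0])
    b = np.array([-1.0, 2.0, 0.0, 0.0])
    c = np.array([1.0, 0.0, -1.0, 2.0])
    a1, b1, c1 = gram_schmidt([a, b, c])
    print(3 * b1)   # [-4.  5.  0. -1.]
    print(7 * c1)   # [-4. -2. -7.  6.]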

Example 3.1.18. Let V = P_3[−1, 1] denote the real vector space of polynomials of degree at most 3 defined on [−1, 1]. V is an inner product space under the inner product

    ⟨f, g⟩ = ∫_{−1}^{1} f(t)g(t) dt.

To find an orthonormal basis, we begin with the basis {1, x, x², x³}. Set v_1 = 1. Then

    v_2 = x − (⟨x, 1⟩/2)·1 = x − (1/2) ∫_{−1}^{1} t dt = x,

    v_3 = x² − (⟨x², 1⟩/2)·1 − (⟨x², x⟩/(2/3)) x
        = x² − (1/2) ∫_{−1}^{1} t² dt − (3/2) x ∫_{−1}^{1} t³ dt
        = x² − 1/3,

    v_4 = x³ − (⟨x³, 1⟩/2)·1 − (⟨x³, x⟩/(2/3)) x − (⟨x³, x² − 1/3⟩/(8/45)) (x² − 1/3)
        = x³ − (3/5)x.

Thus {1, x, x² − 1/3, x³ − (3/5)x} is an orthogonal basis. We divide these by the respective norms to get an orthonormal basis:

    1/√2,  x √(3/2),  (x² − 1/3) (3/2)√(5/2),  (x³ − (3/5)x) (5/2)√(7/2).

You will meet these polynomials later when you learn about differential equations.
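The same orthogonalization can be carried out exactly with sympy (a small sketch; ip below is the inner product defined above):

    import sympy as sp

    x = sp.symbols('x')

    def ip(f, g):
        """Inner product <f, g> = integral of f*g over [-1, 1]."""
        return sp.integrate(f * g, (x, -1, 1))

    ortho = []
    for p in [1, x, x**2, x**3]:
        q = p - sum(ip(w, p) / ip(w, w) * w for w in ortho)
        ortho.append(sp.expand(q))
    print(ortho)   # [1, x, x**2 - 1/3, x**3 - 3*x/5]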

    3.2 Projections and Least Squares Approximations

Let V be a finite dimensional inner product space. We have seen how to project a vector onto a nonzero vector. We now discuss the (orthogonal) projection of a vector onto a subspace.


Let W be a subspace of V. Define the orthogonal complement W^⊥ of W:

    W^⊥ = {u ∈ V | u ⊥ w for all w ∈ W}.

Check that W^⊥ is a subspace of V.

Theorem 3.2.1. Every v ∈ V can be written uniquely as

    v = x + y,

where x ∈ W and y ∈ W^⊥.

Proof. (Existence) Let {v_1, v_2, ..., v_k} be an orthonormal basis of W. Set

    x = ⟨v_1, v⟩v_1 + ⟨v_2, v⟩v_2 + ··· + ⟨v_k, v⟩v_k

and put y = v − x. Clearly v = x + y and x ∈ W. We now check that y ∈ W^⊥. For i = 1, 2, ..., k we have

    ⟨v_i, y⟩ = ⟨v_i, v − x⟩ = ⟨v_i, v⟩ − ⟨v_i, x⟩
             = ⟨v_i, v⟩ − Σ_{j=1}^{k} ⟨v_j, v⟩⟨v_i, v_j⟩
             = ⟨v_i, v⟩ − ⟨v_i, v⟩  (by orthonormality)
             = 0.

It follows that (why?) y ∈ W^⊥.

(Uniqueness) Let v = x + y = x′ + y′, where x, x′ ∈ W and y, y′ ∈ W^⊥. Then x − x′ = y′ − y ∈ W ∩ W^⊥. But W ∩ W^⊥ = {0} (why?). Hence x = x′ and y = y′.

Lemma 3.2.2. We have dim W + dim W^⊥ = dim V.

    Proof. Exercise.

Exercise 3.2.3. Consider R^n with the standard inner product. Given a nonzero vector v ∈ R^n, by H_v we mean the hyperplane (i.e., a subspace of dimension n − 1) orthogonal to v:

    H_v = {u ∈ R^n : u · v = 0}.

By a reflection we mean a linear operator T_v : R^n → R^n which, for some nonzero v, sends v to −v and fixes every vector in H_v, i.e., T_v(v) = −v and T_v(u) = u, for u ∈ H_v. Show that, for all w ∈ R^n,

    T_v(w) = w − 2 ((w · v)/(v · v)) v.
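The operator in this exercise is the familiar Householder reflection. A small numerical check of the formula (a numpy sketch):

    import numpy as np

    def reflect(w, v):
        """Reflection of w across the hyperplane H_v orthogonal to v."""
        return w - 2 * (w @ v) / (v @ v) * v

    v = np.array([1.0, 2.0, 2.0])
    u = np.array([2.0, -1.0, 0.0])   # u . v = 0, so u lies in H_v
    print(reflect(v, v))             # [-1. -2. -2.]  =  -v
    print(reflect(u, v))             # [ 2. -1.  0.]  =  u, fixed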


Definition 3.2.4. For a subspace W, we define a function p_W : V → W as follows: given v ∈ V, express v (uniquely) as v = x + y, where x ∈ W and y ∈ W^⊥. Define p_W(v) = x. We call p_W(v) the orthogonal projection of v onto W. Note that v − p_W(v) ∈ W^⊥. Note also that the map p_W is linear.

The diligent reader should observe that, in the language of the previous chapter, p_W is the projection onto W along W^⊥.

Example 3.2.5. Consider V = F^n with the standard inner product. Let P be an n × n matrix over F with associated linear map T_P : F^n → F^n. Assume that P² = P and P* = P. Then we claim that T_P = p_{Im(T_P)} = p_{C(P)}. To see this proceed as follows.

We have already seen that P² = P implies that T_P is the projection onto C(P) along N(P). It is enough to show that C(P) and N(P) are orthogonal. Let v ∈ C(P) and u ∈ N(P). Then

    ⟨v, u⟩ = ⟨T_P(v), u⟩ = ⟨Pv, u⟩ = (Pv)*u = v*P*u = v*Pu = 0,

completing the proof.

This example is our first hint of the connection between adjointness and orthogonality. We shall come back to this theme when we discuss the spectral theorem.

Definition 3.2.6. Let W be a subspace of V and let v ∈ V. A best approximation to v by vectors in W is a vector w in W such that

    ‖v − w‖ ≤ ‖v − u‖, for all u ∈ W.

    The next result shows that orthogonal projection gives the unique best approximation.

Theorem 3.2.7. Let v ∈ V and let W be a subspace of V. Let w ∈ W. Then the following are equivalent:

(i) w is a best approximation to v by vectors in W.

(ii) w = p_W(v).

(iii) v − w ∈ W^⊥.

Proof. We have

    ‖v − w‖² = ‖(v − p_W(v)) + (p_W(v) − w)‖² = ‖v − p_W(v)‖² + ‖p_W(v) − w‖²,

where the second equality follows from Pythagoras' theorem on noting that p_W(v) − w ∈ W and v − p_W(v) ∈ W^⊥. It follows that (i) and (ii) are equivalent. To see the equivalence of (ii) and (iii) write v = w + (v − w) and apply Theorem 3.2.1.

Consider R^n with the standard inner product (we think of R^n as column vectors). Let A be an n × m (m ≤ n) matrix and let b ∈ R^n. We want to project b onto the column space of A. Here is a method for doing this:

(i) Use Gauss elimination to find a basis of C(A).

(ii) Now use the Gram-Schmidt process to find an orthogonal basis B of C(A).

(iii) We have

    p_{C(A)}(b) = Σ_{w ∈ B} p_w(b).

We now discuss another method for doing this. The projection of b onto the column space of A will be a vector of the form p = Ax for some x ∈ R^m. From Theorem 3.2.7, p is the projection iff b − Ax is orthogonal to every column of A. In other words, x should satisfy the equations

    A^t(b − Ax) = 0, or A^tAx = A^tb.
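For instance, with numpy (a small sketch; it assumes A has linearly independent columns, so that A^tA is invertible):

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([6.0, 0.0, 0.0])

    x = np.linalg.solve(A.T @ A, A.T @ b)   # solve A^t A x = A^t b
    p = A @ x                               # p = Ax, the projection of b onto C(A)
    print(p)                                # [ 5.  2. -1.]
    print(A.T @ (b - p))                    # [0. 0.]: b - p is orthogonal to C(A)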

The equations A^tAx = A^tb are called the normal equations in the Gauss-Markov theory in statistics. Thus, if x is any solution of the normal equation