Linear Algebra Glossary


This file defines common terms from linear algebra.

TABLE OF CONTENTS

A-Orthogonal Vectors

Two vectors u and v are said to be A-orthogonal if

( u, A * v ) = 0.

Here A should be a positive definite symmetric matrix, which in turn guarantees that the expression ( u, A * v ) may be regarded as an inner product of the vectors u and v, with the usual properties.

This concept is useful in the analysis of the conjugate gradient method.

Back to TABLE OF CONTENTS.

Adjacency Matrix

An adjacency matrix of an (undirected) graph is a matrix whose order is the number of nodes, and whose entries record which nodes are connected to each other by a link or edge of the graph.

If two nodes I and J are connected by an edge, then Ai,j=1. All other entries of the matrix are 0. Thus, an adjacency matrix is a zero-one matrix. The usual convention is that a node is not connected to itself, and hence the diagonal of the matrix is zero.

The product A2=A*A is a matrix which records the number of paths between nodes I and J. If it is possible to reach one node from another, it must be possible in a path of no more than n-1 links. Hence, the reachability matrix, which records whether it is possible to get from node I to node J in one or more steps, can be determined by taking the logical sum of the matrices I, A, A2, ..., An-1.

Back to TABLE OF CONTENTS.

Adjoint Matrix

The adjoint matrix of a square matrix A has the property that:

A * adjoint ( A ) = adjoint ( A ) * A = det(A) * I.

Thus, the adjoint of A is "almost" the inverse. If A is invertible, then the inverse of A, denoted inverse ( A ) can be written explicitly as:

A-1 = ( 1 / det(A) ) * adjoint ( A ).

The adjoint matrix, in turn, is defined in terms of the cofactor matrix of A:

adjoint ( A ) = ( cofactor ( A ) )T.

Back to TABLE OF CONTENTS.

Alternating Sign Matrix

An alternating sign matrix is an integer matrix with the properties that

Example:

     0   1   0   0   0
     1  -1   0   1   0
     0   1   0  -1   1
     0   0   0   1   0
     0   0   1   0   0
  

Obviously, alternating sign matrices include the identity matrix and any permutation matrix. From the definitions, you should see that the first row must contain a single entry which is 1, with the other values being 0. The nonzeroes in the second row can be a single 1, or the values, 1, -1, 1, in that order, with some intervening zeroes possible. In the third row, the value -1 may occur up to as many times as there were 1's in preceding rows, which means the most interesting row could be 1, -1, 1, -1, 1, -1, 1. Thus the number of possible nonzeroes grows until the central row of the matrix is reached. Since the same restrictions apply from the bottom reading up, the number of possible nonzeroes must now decrease. Similar reasoning controls the nonzero population of the columns.

If we let An denote the number of distinct alternating sign matrices of order n, then it has only recently been proved that

An = Product ( 0 <= I <= N-1 ) (3*I+1)! / (N+I)!
giving the sequence 1, 2, 7, 42, 429, 7436, 218348, 10850216, ...

Reference:

David Robbins,
The Story of 1, 2, 7, 42, 429, 7436, ...,
Mathematical Intelligencer,
Volume 13, Number 2, pages 12-19.
David Bressoud,
Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture
Cambridge University Press, 1999.

Back to TABLE OF CONTENTS.

Anticirculant Matrix

An anticirculant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the left, with the first value "wrapping around" to the end.

Here is an example of an anticirculant matrix:

    1 2 3 4
    2 3 4 1
    3 4 1 2
    4 1 2 3
  
and here is an example of a rectangular anticirculant matrix:
    1 2 3 4 5
    2 3 4 5 1
    3 4 5 1 2
  

Simple facts about a anticirculant matrix A:

Back to TABLE OF CONTENTS.

Antisymmetric Matrix

A square matrix A is antisymmetric if it is equal to the negative of its transpose:

A = - AT.

Every matrix A can be decomposed into the sum of an antisymmetric and a symmetric matrix:

A = B + C = (1/2) * ( ( A - AT ) + ( A + AT ) )

Simple facts about an antisymmetric matrix A:

An antisymmetric matrix is also called skew symmetric.

In complex arithmetic, the corresponding object is a skew Hermitian matrix.

Back to TABLE OF CONTENTS.

Band Matrix

A band matrix is a matrix whose entries are all zero except for the diagonal and a few of the immediately adjacent diagonals.

If the band matrix is large enough, then many significant efficiencies can be achieved in storage and matrix operations. Because so many elements are zero, the band matrix can be stored more compactly using band matrix storage. And because so many elements are zero, many algorithms can be speeded up to execute more quickly, including matrix multiplication and Gauss elimination.

Special cases include band matrices which are symmetric, positive definite, or tridiagonal.

Here is an example of a band matrix:

    11 12  0  0  0  0
    21 22 23  0  0  0
    31 32 33 34  0  0
     0 42 43 44 45  0
     0  0 53 54 55 56
     0  0  0 64 65 66
  
This matrix has an upper bandwidth of 1, and lower bandwidth of 2, and an overall bandwidth of 4.

LAPACK and LINPACK include special routines for a variety of band matrices. These routines can compute the LU factorization, determinant, inverse, or solution of a linear system.

LAPACK and EISPACK have routines for computing the eigenvalues of a symmetric banded matrix.

Back to TABLE OF CONTENTS.

Band Matrix Storage

Band matrix storage is a matrix storage format suitable for efficiently storing the nonzero entries of a band matrix.

Most band storage schemes are column oriented. The nonzero entries of the matrix "slide" downwards, while remaining in their original column. If the original matrix looked like this:

    11 12  0  0  0  0
    21 22 23  0  0  0
    31 32 33 34  0  0
     0 42 43 44 45  0
     0  0 53 54 55 56
     0  0  0 64 65 66
  
the matrix would be saved in column band matrix storage as:
 
     0 12 23 34  0  0
    11 22 33 44 45  0
    21 32 43 54 55 56
    31 42 53 64 65 66
  
Note that the zeroes in the above array are there just as padding. They don't correspond to any entries of the original array, and are simply necessary to make the array rectangular.

If the matrix is to be handled by a Gauss elimination routine that uses pivoting, then there is a possibility of fill in; that is, nonzero entries may need to be stored in places where zeroes had been. Band matrix storage can still be used, but we need to include in the compressed matrix some extra entries representing the diagonals along which the fill in entries may occur. It turns out that the number of extra diagonals required is simply the number of nonzero subdiagonals in the original matrix. For our example, this would mean the matrix would be stored as:

 
     0  0  0  0  0  0
     0  0  0  0  0  0
     0 12 23 34  0  0
    11 22 33 44 45  0
    21 32 43 54 55 56
    31 42 53 64 65 66
  

Back to TABLE OF CONTENTS.

Bandwidth

The bandwidth of a band matrix is, roughly speaking, the number of diagonals that contain nonzero entries.

More precisely, define ML, the lower bandwidth of a matrix A to be the maximum value of ( I - J ), and MU to be the maximum value of ( I - J ), for all nonzero matrix entries Ai,j. Then the bandwidth M is defined by:

M = ML + 1 + MU.

This definition always treats the (main) diagonal as nonzero, and is not misled by a matrix which has only two nonzero diagonals, which are actually widely separated. All the territory between the diagonals must be included when measuring bandwidth.

Back to TABLE OF CONTENTS.

Basis

A basis for a linear space X of dimension N is a set of N vectors, {v(i) | 1 <= i <= N } from which all the elements of X can be constructed by linear combinations.

Naturally, we require that each of the vectors v(i) be an element of the space X. Moreover, it is not enough that these vectors span the space; we also require that they be linearly independent, that is, there should be no redundant vectors in the set.

The columns of the identity matrix form a basis for the linear space of vectors of dimension N. A square matrix of order N is not defective exactly when its eigenvectors form a basis for the linear space of vectors of dimension N.

Given a particular basis, the representation of a vector x is the unique set of coefficients c(i) so that

x = Sum ( 1 <= I <= N ) c(i) * v(i)
The coefficients must be unique, otherwise you can prove that the basis is not linearly independent!

If the basis vectors are pairwise orthogonal, then the basis is called an orthogonal basis. If the basis vectors have unit length in the Euclidean norm, the basis is a normal basis. If both properties apply, it is an orthonormal basis. The columns of an orthogonal matrix are an orthonormal basis for the linear space of vectors of dimension N.

Back to TABLE OF CONTENTS.

Bidiagonal Matrix

A bidiagonal matrix has only two nonzero diagonals. The matrix is called upper bidiagonal if these are the main diagonal and the immediate upper diagonal. The matrix is called lower bidiagonal if these are the main diagonal and the immediate lower diagonal.

A simple example of an upper bidiagonal matrix is:

    1 2 0 0 0
    0 3 4 0 0
    0 0 5 6 0
    0 0 0 7 8
    0 0 0 0 9
  

The Jordan Canonical Form is an example of an upper bidiagonal matrix.

A bidiagonal matrix is automatically a:

with all the rights and privileges appertaining thereunto.

Back to TABLE OF CONTENTS.

Basic Linear Algebra Subprograms (BLAS)

The BLAS or Basic Linear Algebra Subprograms, are a set of routines offering vector and matrix utilities. They are extensively used as part of LINPACK and LAPACK, to simplify algorithms, and to make them run more quickly.

The Level 1 BLAS provide basic vector operations such as the dot product, vector norm, and scaling. Level 2 BLAS provide operations involving a matrix and a vector, and Level 3 BLAS provide matrix-matrix operations.

There are also sets of sparse BLAS and parallel BLAS available.

Here are the Level 1 BLAS routines for real, single precision vectors:

Back to TABLE OF CONTENTS.

Block Matrix

A block matrix is a matrix which is described as being built up of smaller matrices.

For example, a tridiagonal block matrix might look like this:

    2 4 | 3 9 | 0 0
    4 6 | 0 3 | 0 0
    ---------------
    1 0 | 2 4 | 3 9
    5 5 | 4 6 | 0 3
    ---------------
    0 0 | 1 0 | 2 4
    0 0 | 5 5 | 4 6
  
but for certain purposes, it might help us to see this matrix as "really" being a tridiagonal matrix, whose elements are themselves little matrices:
    a | b | 0
    ---------
    c | a | b
    ---------
    0 | c | a
  

An algorithm suitable for a tridiagonal matrix can often be extended, in a natural manner, to handle a block tridiagonal matrix. Similar extensions can be made in some cases for other types of block matrices. A block banded matrix can be factored by a variant of banded Gauss elimination, for instance.

Back to TABLE OF CONTENTS.

Border Banded Matrix

A border banded matrix is a 2 by 2 block matrix comprising a (large) leading block which is a square banded matrix, two dense rectangular side strips, and a (small) trailing block which is a square dense matrix.

For example, a "toy" border banded matrix might look like this:

     2 -1  0  0  0  0  0 | 1 2
    -1  2 -1  0  0  0  0 | 2 5
     0 -1  2 -1  0  0  0 | 7 8
     0  0 -1  2 -1  0  0 | 3 3
     0  0  0 -1  2 -1  0 | 4 2
     0  0  0  0 -1  2 -1 | 3 1
     0  0  0  0  0 -1  2 | 7 8
     -------------------------
     3  7  8  3  2  3  1 | 5 2
     1  2  4  7  9  2  4 | 3 6
  
which we can regard as being made up of the blocks:
A11 | A12
---------
A21 | A22
where, as we specified, A11 is a square banded matrix, A22 is a square dense matrix, and A21 and A12 are rectangular strips.

It is desirable to take advantage of the banded structure of A11. We can specify an algorithm for solving a linear system A * x = b that can be written in terms of operations involving the sub-matrices A11, A12, A21 and A22, which will achieve this goal, at the expense of a little extra work. One problem with this technique is that it will fail if certain combinations of the matrices A11 and A22 are singular, which can happen even when A is not singular.

The algorithm for solving A * x = b rewrites the system as:

A11 * X1 + A12 * X2 = B1
A21 * X1 + A22 * X2 = B2
The first equation can be solved for X1 in terms of X2:
X1 = - A11-1 * A12 * X2 + A11-1 * B1
allowing us to rewrite the second equation for X2:
( A22 - A21 * A11-1 * A12 ) X2 = B2 - A21 * A11-1 * B1
which can be solved as:
X2 = ( A22 - A21 * A11-1 * A12 )-1 * ( B2 - A21 * A11-1 * B1 )

The actual algorithm doesn't compute the inverse, of course, but rather factors the matrices A11 and A22 - A21 * A11-1 * A12.

Back to TABLE OF CONTENTS.

Cartesian Basis Vectors

The Cartesian basis vectors are simply the N columns of the identity matrix, regarded as individual column vectors.

These vectors form the standard basis for the set of vectors RN. The vector corresponding to the I-th column of the identity matrix is often symbolized by ei.

Thus, if we are working in a space with dimension N of 4, the basis vectors e1 through e4 would be:

    1     0     0    0
    0     1     0    0
    0     0     1    0
    0     0     0    1
  

Facts about the Cartesian basis vectors:

Back to TABLE OF CONTENTS.

The Cauchy-Schwarz Inequality

The Cauchy-Schwarz Inequality is a relationship between a vector inner product and a vector norm derived from that inner product. In particular, if the norm ||*|| is defined by an inner product (*,*) as follows:

|| x || = sqrt ( x, x ),
then the Cauchy-Schwarz inequality guarantees that for any vectors x and y it is the case that:
| ( x, y ) | <= || x || * || y ||.

Back to TABLE OF CONTENTS.

The Cayley-Hamilton Theorem

The Cayley-Hamilton Theorem guarantees that every (square) matrix satisfies its own characteristic equation.

For example, if A is the matrix:

    2 3
    1 4
  
then the characteristic equation is
lambda2 - 6 * lambda + 5 = 0.
which is not true for all values lambda, but just a few special values known as eigenvalues. The Cayley-Hamilton theorem guarantees that the matrix version of the characteristic equation, with A taking the place of lambda, is guaranteed to be true:
A2 - 6 * A + 5 * I = 0.

Back to TABLE OF CONTENTS.

CentroSymmetric Matrix

A centrosymmetric matrix is one which is symmetric about its center; that is,

Ai,j = Am+1-i,n+1-j

Example:

     1 10  8 11  5
    13  2  9  4 12
     6  7  3  7  6
    12  4  9  2 13
     5 11  8 10  1
  

A centrosymmetric matrix A satisfies the following equation involving the Exchange matrix J:

J*A*J = A
.

Back to TABLE OF CONTENTS.

Characteristic Equation

The characteristic equation of a (square) matrix A is the polynomial equation:

det ( A - lambda * I ) = 0
where lambda is an unknown scalar value.

The left hand side of the equation is known as the characteristic polynomial of the matrix. If A is of order N, then there are N roots of the characteristic equation, possibly repeated, and possibly complex.

For example, if A is the matrix:

    2 3
    1 4
  
then the characteristic equation is
        (    2 - lambda     3           )
    det (    1              4 - lambda  )  = 0
  
or
lambda2 - 6 * lambda + 5 = 0.
This equation has roots lambda = 1 or 5.

Values of the scalar lambda which satisfy the characteristic equation are known as eigenvalues of the matrix.

Some facts about the characteristic equation of A:

The Cayley-Hamilton Theorem guarantees that the matrix itself also satisfies the matrix version of its characteristic equation.

Back to TABLE OF CONTENTS.

Cholesky Factorization

The Cholesky factorization of a positive semidefinite symmetric matrix A has the form:

A = L * LT
where L is a lower triangular matrix, or equivalently:
A = RT * R
where R is an upper triangular matrix.

If a matrix is symmetric, then it is possible to determine whether or not the matrix is positive definite simply by trying to compute its Cholesky factorization: if the matrix has a zero eigenvalue, then it is positive semidefinite, and the algorithm should theoretically spot this by computing a zero diagonal element; if the matrix actually has a negative eigenvalue, then at a particular point in the algorithm, the square root of a negative number will be computed.

Software to compute the Cholesky factorization often saves space by using symmetric matrix storage, and overwriting the original matrix A by its Cholesky factor L.

As long as the matrix A is positive definite, the Cholesky factorization can be computed from an LU factorization.

The Cholesky factorization can be used to compute the square root of the matrix.

The LINPACK routines SCHDC, SCHUD, SCHDD, SCHEX, SPOCO, SPOFA, SPODI, and SPOSL compute and use the Cholesky factorization.

Back to TABLE OF CONTENTS.

Circulant Matrix

A circulant matrix is a matrix whose first row of values is repeated in each successive row, shifted one position to the right, with the end value "wrapping around".

Here is a square circulant matrix:

    1 2 3 4
    4 1 2 3
    3 4 1 2
    2 3 4 1
  
a "wide" rectangular circulant matrix:
    1 2 3 4 5
    5 1 2 3 4
    4 5 1 2 3
  
a "tall" rectangular circulant matrix:
    1 2 3
    5 1 2 
    4 5 1
    3 4 5
    2 3 4
  

Simple facts about a (rectangular) circulant matrix A:

Simple facts about a square circulant matrix A:

Compare the concept of an anticirculant matrix.

Back to TABLE OF CONTENTS.

Cofactor Matrix

The cofactor matrix of a square matrix A is generally used to define the adjoint matrix, or to represent the determinant.

For a given matrix A, the cofactor matrix is the transpose of the adjoint matrix:

cofactor ( A ) = ( adjoint ( A ) )T

The determinant det(A) can be represented as the product of each of the entries of any given row or column times their corresponding cofactor entries. In particular, consider the first row:

det(A) = A(1,1) * cofactor(A)(1,1) + A(1,2) * cofactor(A)(1,2) + ... + A(1,N) * cofactor(A)(1,N)

The formula for the (I,J) entry of the cofactor matrix of A is:

cofactor(A)(I,J) = (-1)(I+J) * det ( M(A,I,J) )
where M(A,I,J) is the minor matrix of A, constructed by deleting row I and column J.

Back to TABLE OF CONTENTS.

Column Echelon Form

Column echelon form is a special matrix structure which is usually arrived at by Gauss elimination.

Any matrix can be transformed into this form, using a series of elementary column operations. Once the form is computed, it is easy to compute the determinant, inverse, the solution of linear systems (even for underdetermined or overdetermined systems), the rank, and solutions to linear programming problems.

A matrix (whether square or rectangular) is in column echelon form if:

A matrix is in column reduced echelon form if it is in column echelon form, and it is also true that:

Column echelon form is primarily of use for teaching, and analysis of small problems, using exact arithmetic. It is of little interest numerically, because very slight errors in numeric representation or arithmetic can result in completely erroneous results.

Back to TABLE OF CONTENTS.

Commuting Matrices

Two square matrices, A and B, are said to commute if

A * B = B * A

Facts about commuting matrices:

Back to TABLE OF CONTENTS.

Companion Matrix

The companion matrix for a monic polynomial P(X) of degree N is a matrix of order N whose characteristic polynomial is P(X).

If the polynomial P(X) is represented as:

P(X) = XN + C(1) * X(N-1) + ... + C(N-1) * X + C(N).
then the companion matrix for this polynomial has the form:
    0 0 0 ... 0 0 -C(N)
    1 0 0 ... 0 0 -C(N-1)
    0 1 0 ... 0 0 -C(N-2)
    ...................
    0 0 0 ... 1 0 -C(2)
    0 0 0 ... 0 1 -C(1)
  

Note that the determinant of the companion matrix is -C(N) or +C(N), so it is never zero unless the polynomial itself is degenerate (has a zero leading coefficient).

The characteristic polynomial of a companion matrix is, in fact, the polynomial that was used to define the matrix originally. Thus it is always possible to construct a matrix with any desired set of eigenvalues, by constructing the corresponding characteristic polynomial, and then the companion matrix.

The companion matrix can also be used to perform a decomposition of a matrix A. If x is a vector, and K the Krylov matrix

K = Krylov ( A, x, n )
whose columns are the successive products x, A*x, A2*x, and so on, and if K is nonsingular, then
A = K * C * K-1
where the matrix C is the companion matrix of A.

(There are several equivalent forms of the companion matrix, with the coefficients running along the top, the bottom, or the first column of the matrix.)

Back to TABLE OF CONTENTS.

Compatible Norms

A matrix norm and a vector norm are compatible if it is true, for all vectors x and matrices A that

||A*x|| <= ||A|| * ||x||
In some texts, the word consistent is used in this sense, instead of compatible.

In particular, if you have not verified that a pair of norms are compatible, then the above inequality is not guaranteed to hold. For any vector norm, it is possible to define at least one compatible matrix norm, namely, the matrix norm defined by:

||A|| = supremum ||A*x|| / ||x||
where the supremum (roughly, the "maximum") is taken over all nonzero vectors x. If a matrix norm can be derived from a vector norm in this way, it is termed a vector-bound matrix norm. Such a relationship is stronger than is required by compatibility.

If a matrix norm is compatible with some vector norm, then it is also true that

||A*B|| <= ||A|| * ||B||
where both A and B are matrices.

Back to TABLE OF CONTENTS.

Complex Number Representation

Complex numbers have the form a+bi, where i is a special quantity with the property that i2=-1.

It is possible to devise real matrices that behave like complex numbers. Let the value "1" be represented by the identity matrix of order 2, and the value "i" be represented by

    0  1
   -1  0
  
Then it is easy to show that these matrices obey the rules of complex numbers. In particular, "i" * "i" = - "1". In general, the complex number a+bi is represented by
    a  b
   -b  a
  
and multiplication and inversion have the correct properties.

Back to TABLE OF CONTENTS.

Condition Number

The condition number of the coefficient matrix A of a linear system is a (nonnegative) number used to estimate the amount by which small errors in the right hand side b, or in A itself, can change the solution x.

This analysis ignores arithmetic roundoff, which is hard to analyze, and focusses on easily measurable quantities known beforehand, and how they will amplify or diminish the roundoff errors.

Small values of the condition number suggest that the algorithm will not be sensitive to errors, but large values indicate that small data or arithmetic errors may explode into enormous errors in the answer.

The condition number is defined in terms of a particular matrix norm. Many different matrix norms may be chosen, and the actual value of the condition number will vary depending on the norm chosen. However, the general rule that large condition numbers indicate sensitivity will hold true no matter what norm is chosen.

The condition number for a matrix A is usually defined as

condition ( A ) = || A || * || A-1 ||.
If A is not invertible, the condition number is infinite.

Simple facts about the condition number:

LINPACK routines such as SGECO return RCOND, an estimate of the reciprocal of the condition number in the L1 matrix norm.

Turing's M condition number, M(A), for a matrix of order N, is defined as

M(A) = N * max | Ai,j | * max | A-1i,j |.

Turing's N condition number, N(A) is

N(A) = Frob ( A ) * Frob ( A-1 ) / N
where Frob(A) is the Frobenius matrix norm.

The Von Neumann and Goldstine P condition number is

P(A) = | lambda_Max / lambda_Min |
where lambda_Max and lambda_Min are the eigenvalues of largest and smallest magnitude, which is equivalent to using the spectral radius of A and A-1.

There is also a condition number defined for the eigenvalue problem, which attempts to estimate the amount of error to be expected when finding the eigenvalues of a matrix A.

Back to TABLE OF CONTENTS.

Congruent Matrix

Congruent matrices A and B are related by a nonsingular matrix P such that

A = PT * B * P.

Congruent matrices have the same inertia.

Congruence is of little interest by itself, but the case where P is also an orthogonal matrix is much more important.

Back to TABLE OF CONTENTS.

Conjugate Gradient Method

The conjugate gradient method is designed to solve linear systems

A * x = b
when the matrix A is symmetric, and positive definite.

The method is not an iterative method, but rather a direct method, which produces an approximation to the solution after N steps. Because of numerical inaccuracies and instabilities, many implementations of the method repeat the computation several times, until the residual error is deemed small enough.

The method is ideally suited for use with large sparse systems, because the matrix A is only accessed to compute a single matrix-vector product on each step. This involves no fill in or overwriting of the data structure that describes A. However, if A is dense, the conjugate gradient method costs roughly 3 times the number of operations for direct Gauss elimination.

The conjugate gradient method can be considered as a minimization of the functional f(x), defined by

f(x) = xT * ( 0.5 * A * x - b )
which achieves its minimum value when x solves the linear system.

Here are the formulas for the basic conjugate gradient method. Brackets indicate the value of an iterative quantity. X[0] is the initial value of the vector X, X[1] the value after one iteration, and so on.

    X[0] = 0
    For K = 1 to N
  
Compute the residual error:
    R[K-1] = B - A * X[K-1]
  
Compute the direction vector:
    If K = 1 then
      P[K] = R[0]
    else
      BETA = - PT[K-1] * A * R[K-1] 
           / ( PT[K-1] * A * P[K-1] )
      P[K] = R[K-1] + BETA * P[K-1]
    end if
  
Compute the location of the next iterate:
    ALPHA = ( RT[K-1] ) * R[K-1] / ( PT[K] * A * P[K] )
    X[K] = X[K-1] + ALPHA * P[K]
  end for
  

Conjugate gradient algorithms are available in IMSL, ITPACK and NSPCG.

Back to TABLE OF CONTENTS.

Conjugate Matrix

The conjugate matrix of a complex matrix A, denoted by A* or conjugate ( A ), is the matrix obtained by replacing each entry of A by its complex conjugate.

(In this document, the form conjugate ( A ) is preferred, because the A* is easily confused with multiplication.

The complex conjugate transpose, sometimes called the Hermitian or tranjugate of A, is derived from A by complex conjugation, followed by transposition, and is denoted by AH.

Back to TABLE OF CONTENTS.

Conjugate of a Complex Number

The conjugate of a complex number z = a + b * i is the complex number

conjugate ( z ) = a - b * i.

The conjugate is frequently represented by placing a bar over the quantity, or occasionally a star after it, as in z*.

The complex conjugate can be used in a formula for the norm or magnitude of a complex number, which must always be a real nonnegative value:

norm ( z ) = sqrt ( z * conjugate ( z ) ) = sqrt ( a2 + b2 ).

For complex vectors, an inner product with the correct properties may be defined as:

V dot W = ( V, W ) = sum ( I = 1 to N ) conjugate ( V(I) ) * W(I).
This inner product is computed in the BLAS function CDOTC, for example, and yields another relationship with the Euclidean vector norm:
|| V || = sqrt ( V dot V )

Back to TABLE OF CONTENTS.

Conjunctive Matrix

Two (complex) matrices A and B are said to be conjunctive if there is some nonsingular matrix P so that

A = ( conjugate ( P ) )T * B * P,

This is the extension to complex matrices of the concept of a congruent real matrix.

Back to TABLE OF CONTENTS.

Convergent Matrix

A convergent matrix A is a square matrix for which the limit as n goes to infinity of An is zero.

A matrix is convergent if and only if the spectral radius rho(A) satisfies

rho(A) < 1

A semiconvergent matrix A is a square matrix A for which the limit as n goes to infinity of An exists. If a matrix is semiconvergent, then either it is convergent, (rho(A) < 1) or else rho(A) = 1. In the second case, it must be further true that the only eigenvalue of norm 1 is 1, and that this eigenvalue is semisimple.

The simplest example of a semiconvergent matrix is the identity matrix.

Back to TABLE OF CONTENTS.

Cross Product

The cross product of two vectors u and v, denoted u x v, is a vector w which is perpendicular to u and v, pointing in the direction so that (u,v,w) forms a right handed coordinate system, and whose length is equal to the area of the parallelogram two of whose sides are u and v.

Algebraically,

w(1) = u(2) * v(3) - u(3) * v(2)
w(2) = u(3) * v(1) - u(1) * v(3)
w(3) = u(1) * v(2) - u(2) * v(1)

If the unit vectors in the coordinate directions are denoted by i, j and k, then the cross product vector can also be regarded as the (vector) value of the following "determinant":

                    |   i    j    k  |
    w = u x v = det | u(1) u(2) u(3) |
                    | v(1) v(2) v(3) |
  

Back to TABLE OF CONTENTS.

Cyclic Reduction

Cyclic reduction is a method for solving a linear system A*x=b in the special case where A is a tridiagonal matrix.

On a parallel computer, this method solves the system in LOG(N) "steps" where N is the order of A. A standard Gauss elimination method for a tridiagonal system would require roughly N "steps" instead.

A tridiagonal system has some very special properties that will allow us to carry this operation out. Consider this system of 7 equations:

    A11 x1 + A12 x2                                              = y1
    A21 x1 + A22 x2 + A23 x3                                     = y2
             A32 x2 + A33 x3 + A34 x4                            = y3
                      A43 x3 + A44 x4 + A43 x5                   = y4
                               A54 x4 + A55 x5 + A56 x6          = y5
                                        A65 x5 + A66 x6 + A67 x7 = y6
                                                 A76 x6 + A77 x7 = y7
  

The first equation can be used to eliminate the coefficient A21 in the second equation, and the third equation to eliminate the coefficient A23 in the second equation. This knocks out variables x1 and x3 in the second equation, but adds x4 into that equation.

By the same method, x3 and x5 can be eliminated from the equation for x4, and so on. By eliminating the odd variables from the even equations, a smaller tridiagonal system system is derived, with half the equations and variables.

If elimination is applied to this set, the number of equations is again reduced by half; this reduction may be repeated until a single equation in one variable is reached. Backsubstitution then produces the values of all the variables.

The reason this method might have an advantage over Gauss elimination is that, at each step of the elimination phase, the parts of the step are independent. If many computer processors are available, then each can be working on a separate portion of the elimination. If the number of processors is large enough, the system can really be solved in LOG(N) time.

Cyclic reduction routines are available in the NCAR software library, the SLATEC library, and the Cray SCILIB library.

Back to TABLE OF CONTENTS.

Cyclic Tridiagonal Matrix

A cyclic tridiagonal matrix is a generalization of a tridiagonal matrix which includes an extra last entry in the first row, and an extra first entry in the last row.

An example of a cyclic tridiagonal matrix:

    -2  1  0  0  1
     1 -2  1  0  0
     0  1 -2  1  0
     0  0  1 -2  1
     1  0  0  1 -2
  

A cyclic tridiagonal matrix is not a tridiagonal matrix. If the matrix is constant along the three generalized diagonals, a cyclic tridiagonal matrix is a circulant matrix. A cyclic tridiagonal matrix can arise in situations where a periodic boundary condition is applied.

It is very disappointing that a cyclic tridiagonal matrix is not a tridiagonal matrix, since there are so many good methods for solving tridiagonal linear systems. One way to solve a cyclic tridiagonal system is to use the Sherman Morrison Formula and view the matrix as a rank one perturbation of a tridiagonal matrix. Another approach is to view it as a border banded matrix.

A cyclic tridiagonal matrix may also be called a periodic tridiagonal matrix.

Back to TABLE OF CONTENTS.

Defective Matrix

A defective matrix is a (square) matrix that does not have a full set of N linearly independent eigenvectors.

For every eigenvalue, there is always at least one eigenvector, and eigenvectors corresponding to distinct eigenvalues are linearly independent. If the N eigenvalues of a matrix are distinct, then it surely has N linearly independent eigenvectors, and so cannot be defective.

Conversely, if a matrix is defective, then it must have at least one repeated eigenvalue, that is, an eigenvalue of algebraic multiplicity greater than 1. A matrix is defective if and only if its Jordan Canonical Form has at least one nonzero entry on the superdiagonal.

Thus, a simple example of a defective matrix is:

    1 1
    0 1
  
which has the single eigenvalue of 1, with algebraic multiplicity 2, but geometric multiplicity 1. The only eigenvector is ( 0, 1 ).

If a matrix is not defective, then its eigenvectors form a basis for the entire linear space. In other words, any vector y can be written as

y = X * c
where X is the array of eigenvectors of A.

If a matrix A is not defective, then it is similar to its diagonal eigenvalue matrix:

A = X * LAMBDA * X-1
and the similarity transformation matrix X is actually the eigenvector matrix.

This in turn allows us to make interesting statements about the inverse, tranpose, and powers of A. For instance, we see that

A2 = ( X * LAMBDA * X-1 ) * ( X * LAMBDA * X-1 )
= X * LAMBDA2 * X-1
leading us to the statement that for a nondefective matrix, the square has the same eigenvectors, and the square of the eigenvalues.

Back to TABLE OF CONTENTS.

Deflation

Deflation is a technique for "removing" a known eigenvalue from a matrix, in order to facilitate the determination of other eigenvalues.

For example, the power method is able to estimate the eigenvalue of largest modulus of a matrix A. Once this is computed, it might be desired to find the next largest eigenvalue. Deflation can be used to essentially create a new matrix, A', which has the same eigenvalues as A, except that the largest eigenvalue has been dropped (and the order of A' reduced by 1) or the largest eigenvalue is replaced by 0. In either case, the power method applied to A' will produce the next largest eigenvalue of A.

To eliminate a known eigenvalue, lambda, it is necessary to know its eigenvector x, which we will assume has been scaled to have unit Euclidean norm. By the properties of eigenvectors, we know that

A * x = lambda * x.
Now define the matrix A' so that:
A' = A - lambda * x * xT.
Now x is an eigenvector of A' with eigenvalue 0, because:
A' * x = ( A - lambda * x * xT ) * x
= A * x - lambda * x * xT * x
= lambda * x - lambda * x
= 0

If the power method is being employed, then the new iteration should try to "factor out" any component of the eigenvector x; otherwise, small errors in the computation of the first eigenvalue and eigenvector will interfere with the next results.

Theoretically, this process may be repeated as often as desired, eliminating each eigenvalue as it is discovered. Practically, however, accumulated errors in the eigenvalues and eigenvectors make the computation more and more unreliable with each step of deflation. Thus, if more than a few eigenvalues are desired, it is more appropriate to use a standard technique.

Back to TABLE OF CONTENTS.

Derogatory Matrix

A derogatory matrix is a matrix whose minimal polynomial is of lower degree than its characteristic polynomial

For this to happen, of course, the matrix must have at least one eigenvector with algebraic multiplicity greater than one.

Surprisingly, the identity matrix is derogatory, (ignoring the case N=1). A matrix can be guaranteed to be nonderogatory if the geometric multiplicity of every eigenvalue is 1.

Perhaps the only reason that the term is worth knowing is this fact: every nonderogatory matrix is similar to the companion matrix of its characteristic polynomial.

Back to TABLE OF CONTENTS.

Determinant of a Matrix

The determinant of a square matrix is a scalar value which is zero exactly when the matrix is singular.

If a matrix is singular, then it doesn't have an inverse and linear systems cannot be reliably solved. In numerical work, the determinant is not a reliable indicator of singularity, and other data, such as the size of the matrix elements encountered during pivoting, are preferred.

The determinant also occurs in the definition of the eigenvalue problem.

An explicit formula for the determinant of a matrix A is:

det ( A ) = sum [ over all P ] sign(P) * A(1,P(1)) * A(2,P(2) * ... * A(N,P(N)).
where the sum ranges over all possible permutations P of the numbers 1 through N, and sign(P) is +1 for an even permutation, and -1 for an odd permutation. (Any permutation may be accomplished by a sequence of switching pairs of objects. The permutation is called even or odd, depending on whether the number of switches is even or odd).

A numerical method for finding the determinant comes as a byproduct of the LU factorization used in Gaussian elimination. Typically, this factorization has the form

A = P * L * U,
and the value of the determinant is simply
det ( A ) = det ( P ) * product ( 1 <= I <= N ) U(I,I).
where det ( P ) is +1 or -1, again determined by the sign of the permutation.

Simple facts about the determinant:

A single elementary row operation has the following effect on the determinant:

For small matrices, the exact determinant is simple to compute by hand. The determinant of a 2 by 2 matrix

    a  b
    c  d
  
is a*d-b*c, while the determinant of a 3 by 3 matrix:
    a  b  c
    e  f  g
    h  i  j
  
is a * (f*j-g*i) - b * (e*j-g*h) + c * (e*i-f*h).

If absolutely necessary, the determinant of a matrix of order N can be computed recursively in terms of determinants of minor matrices. Let M(A,I,J) stand for the (I,J) minor matrix of A. Then the determinant of A is

det ( A ) = sum ( 1 <= J <= N ) (-1)(J+1) * Ai,j * det ( M(A,I,J) ).
Of course, now we need to compute the determinants of the N minor matrices, but the order of these matrices has been reduced by 1. Theoretically, we can represent the determinant of any of these matrices of order N-1 by a similar sum involving minor matrices of order N-2, and this process can be repeated until we reach matrices of order 1 or 2, whose determinants are easy to compute. In practice, this method is never used except in simple classroom exercises.

There is a geometric interpretation of the determinant. If the rows or columns of A are regarded as vectors in N dimensional space, then the determinant is the volume of a the parallelepiped, or "slanted cube" whose one corner is defined by these vectors.

LAPACK and LINPACK provide routines for computing the determinant of a matrix, after the matrix has been decomposed into LU factors.

Back to TABLE OF CONTENTS.

Diagonal Dominance

A matrix is (column) diagonally dominant if, for every column, the sum of the absolute values of the offdiagonal elements is never greater than the absolute value of the diagonal element.

The matrix is strictly diagonally dominant if the offdiagonal sum is always strictly less than the absolute value of the diagonal element.

The same definitions can be used to consider rows instead of columns. The terms column diagonally dominant and row diagonally dominant may be used, if necessary, to specify which case is being considered.

A strictly diagonally dominant matrix cannot be singular, by Gershgorin's Theorem.

A diagonally dominant matrix which is also irreducible cannot be singular.

Here is a diagonally dominant matrix which is not strictly diagonally dominant:

    -2  1  0  0  0
     1 -2  1  0  0
     0  1 -2  1  0
     0  0  1 -2  1
     0  0  0  1 -2
  

For a linear system A * x = b, if the matrix A is strictly diagonally dominant, then both Jacobi iteration and Gauss Seidel iteration are guaranteed to converge to the solution.

Back to TABLE OF CONTENTS.

Diagonal Matrix

A diagonal matrix is one whose only nonzero entries are along the main diagonal. For example:

    3 0 0
    0 4 0
    0 0 7
  

Simple facts about a diagonal matrix A:

Back to TABLE OF CONTENTS.

Diagonalizable Matrix

A diagonalizable matrix is any (square) matrix A which is similar to a diagonal matrix:

A = P * D * P-1.

This concept is important in the study of eigenvectors To see the relationship, post-multiply the equation by P:

A * P = P * D.
Looking at the columns of P as eigenvectors, and the diagonal entries of D as eigenvalues, this shows that a matrix is diagonalizable exactly when it has N linearly independent eigenvectors.

In certain cases, not only is a matrix diagonalizable, but the matrix P has a special form. The most interesting case is that of any (real) symmetric matrix A; not only can such a matrix be diagonalized, but the similarity matrix is orthogonal:

A = Q * D * Q-1 = Q * D * QT,
This fact can be interpreted to show that not only does every symmetric matrix have a complete set of eigenvectors, but the eigenvectors and eigenvalues are real, and the eigenvectors are pairwise orthogonal.

Similarly, a complex matrix that is defective.

  • If a matrix is normal, then it is not only diagonalizable, but the transformation matrix is unitary.
  • Back to TABLE OF CONTENTS.

    Downshift Matrix

    The downshift matrix A circularly shifts all vector entries or matrix rows down 1 position.

    Example:

        0 0 0 1
        1 0 0 0
        0 1 0 0
        0 0 1 0
      

    Facts about the downshift matrix A:

    Back to TABLE OF CONTENTS.

    Eigenvalues

    Eigenvalues are special values associated with a (square) matrix, which can be used to analyze its behavior in multiplying any vector.

    The formal definition of an eigenvalue of a matrix A is that it is any value lambda which is a root of the characteristic equation of the matrix,

    det ( A - lambda * I ) = 0.

    lambda is an eigenvalue of A if and only if there is a nonzero vector x, known as an eigenvector (or sometimes a "right" eigenvector), with the property that

    A * x = lambda * x.
    Note that there must also be a "left" eigenvector y, with the property
    y * A = AT * y = lambda * y.

    The characteristic equation has exactly N roots, so a matrix has N eigenvalues. An important consideration is whether any eigenvalue is a repeated root, which determines how hard the eigenvector computation will be.

    If a matrix has the maximum possible number of linearly independent eigenvectors (namely N, the order of the matrix), then the eigenvalues and eigenvectors can be used to diagonalize the matrix. This only happens when the matrix is normal.

    Simple facts about eigenvalues of A:

    Simple algorithms for computing eigenvalues include the power method and the inverse power method. The QR method is a more powerful method that can handle complex and multiple eigenvalues.

    LAPACK and EISPACK include algorithms for computing the eigenvalues and eigenvectors of a variety of types of matrix, as well as methods that can be applied to more general eigenvalue problems.

    Back to TABLE OF CONTENTS.

    Eigenvectors

    A nonzero vector x is an eigenvector of the square matrix A if

    A * x = lambda * x
    for some scalar value lambda, called the associated eigenvalue.

    Sometimes this eigenvector is more particularly described as a right eigenvector, so that we may also consider left eigenvectors, that is, vectors y for which it is true that

    y * A = AT * y = mu * y
    for some scalar mu.

    For every eigenvalue of a matrix, there is at least one eigenvector. Every nonzero multiple of this eigenvector is also an eigenvector, but in an uninteresting way. If, and only if, an eigenvalue is a repeated root, then there may be more than one linearly independent eigenvector associated with that eigenvalue. In particular, if an eigenvalue is repeated 3 times, then there will be 1, 2 or 3 linearly independent eigenvectors corresponding to that eigenvalue.

    Facts about eigenvectors:

    Back to TABLE OF CONTENTS.

    EISPACK

    EISPACK is a package of routines for handling the standard and generalized eigenvalue problems.

    The beginning user who is not interested in trying to learn the details of EISPACK and simply wants the answer to an eigenvalue problem quickly should call one of the main driver routines. Each of these is tailored to handle a given problem completely with a single subroutine call. For more advanced work, it may be worth investigating some of the underlying routines.

    Driver routines to solve A*x=lambda*x include:

    For the generalized eigenvalue problem:

    Back to TABLE OF CONTENTS.

    EISPACK Matrix Norm

    The EISPACK matrix norm is used in the EISPACK eigenvalue package.

    The definition of the norm for an M by N matrix is:

    ||A|| = sum ( I = 1 to M, J = 1 to N ) | Ai,j |

    It's a simple exercise to verify that this quantity satisifes the requirements for a matrix norm.

    This norm is easy to calculate, and was used in EISPACK in order to have a standard against which to compare the size of matrix elements that were being driven to zero. I haven't seen it used anywhere else in practice.

    Back to TABLE OF CONTENTS.

    Elementary Column Operations

    Elementary column operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to column echelon form.

    Restricting the operations to a simple set makes it easy to:

    The three elementary column operations include:

    Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as postmultiplication by a concatenation of elementary matrices:

    B = A * E(1) * E(2) * ... * E(k)
    which may be abbreviated as:
    B = A * C
    Since C will be guaranteed to be invertible, we also know that,
    B * C-1 = A
    which yields a factorization of A.

    Back to TABLE OF CONTENTS.

    Elementary Matrix

    An elementary matrix E is one which, when pre-multiplying another matrix A, produces a product matrix E * A which has exactly one of the following properties:

    The matrix E which interchanges rows R1 and R2 of matrix A has the form E(I,J)=:

    The inverse of this matrix is simply its transpose.

    The matrix E which multiplies row R1 of A by the constant s has the form E(I,J)=:

    The inverse of this matrix is constructed by negating the value of s.

    The matrix E which adds s * row R2 to row R1 of A has the form E(I,J) =:

    The inverse of this matrix is constructed in the same way, using 1/s.

    If a matrix F can be represented as the product of elementary matrices,

    F = E1 * E2 * ... * EM,
    then its inverse is:
    F-1 = EM-1 * EM-1-1 * ... * E1-1.

    An elementary similarity transformation uses a matrix F which is the product of elementary matrices, and transforms the matrix A into the similar matrix B by the formula

    B = F-1 * A * F.

    Back to TABLE OF CONTENTS.

    Elementary Row Operations

    Elementary row operations are a simple set of matrix operations that can be used to carry out Gauss elimination, Gauss Jordan elimination, or the reduction of a matrix to row echelon form.

    Restricting the operations to a simple set makes it easy to:

    The three elementary row operations include:

    Each of these operations may be represented by an elementary matrix, and the transformation of the original matrix A to the reduced matrix B can be expressed as premultiplication by a concatenation of elementary matrices:

    B = E(k) * E(k-1) * ... * E(2) * E(1) * A
    which may be abbreviated as:
    B = C * A.
    Since C will be guaranteed to be invertible, we also know that,
    C-1 * B = A
    which yields a factorization of A.

    Back to TABLE OF CONTENTS.

    Ellipsoids

    An ellipsoid is an N dimensional generalization of an ellipse. The formula for an ellipsoid may be written as:

    sum ( I = 1 to N, J = 1 to N ) Ai,j * Xi * Xj = 1.
    where A is a positive definite symmetric matrix.

    A principal axis of an ellipsoid is any N dimensional point X on the ellipsoid such that the vector from the origin to X is normal to the ellipsoid.

    In the general case, there are exactly N principal axes (plus their negatives). In degenerate cases, there may be an entire plane of vectors that satisfy the requirement, but it is always possible to choose a set of N principal axes which are linearly independent.

    Moreover, in the general case, the principal axes are pairwise orthogonal, and in the degenerate case, may be chosen pairwise orthogonal.

    Moreover, it is always true that the principal axes are eigenvectors of the matrix A of ellipsoid coefficients. The length of the principal axis vector associated with an eigenvalue lambda(I) is 1 / Sqrt ( lambda(I) ).

    These facts have a strong relationship to the formulation of the conjugate gradient method.

    Back to TABLE OF CONTENTS.

    Equilibration

    Equilibration is the technique of balancing the rows or columns of a matrix by rescaling them.

    Consider, for instance, the fact that the following two equations are equivalent:

    0.0001 * x + 0.0001 * y = 0.0001
    and
    1000 * x + 1000 * y = 1000
    However, the large coefficients in the second equation will bias a Gauss elimination routine to choose that equation as its pivot. Actually, it's more important in this case that the chosen row be as "linearly independent as possible" from the other rows, and this is more likely to occur if we ensure that all the rows start out with an equal norm. This can be done very simply, by finding the element of maximum absolute value in each row and dividing that row (and its right hand side) by that value. Such a technique is called row equilibration. It is not necessary that the rows have precisely the same norm; it is desirable that the norms of the rows be maintained within some controlled range.

    Equilibration is useful in many areas of linear algebra, including eigenvalue calculations. In some cases, column equilibration is preferred, and in other cases, the norms of both the rows and columns are to be controlled.

    Back to TABLE OF CONTENTS.

    Equivalent Matrix

    Matrices A and B are said to be equivalent if there are nonsingular matrices P and Q so that

    A = P * B * Q.

    Simple facts about equivalence:

    Equivalence is a very loose concept of relatedness. A stronger and more useful concept is similarity.

    Back to TABLE OF CONTENTS.

    Exchange Matrix

    The exchange matrix J is constructed from an identity matrix by reversing the order of the columns.

    For example, the matrix J of order 4:

        0 0 0 1
        0 0 1 0
        0 1 0 0
        1 0 0 0
      

    Facts about the exchange matrix J:

    The exchange matrix is also called the anti-identity or counter-identity matrix.

    Back to TABLE OF CONTENTS.

    External Storage Algorithms

    An external storage algorithm is a method of solving a problem that is too large to be loaded into computer memory as a whole.

    Instead, the problem is solved incrementally, with most of the problem data residing, at any one time, in computer files, also called "disk storage" or "external storage". It is a considerable difficulty just to rewrite an algorithm that can handle a situation where, say, part of a matrix is in one place, and part is in another, remote place.

    However, such algorithms must also be aware that data transfers between memory and disk are very slow. Hence, if the algorithm is to be of any use, it must do as much processing as possible on the portion of the data that resides in memory, and read the external problem data into memory as rarely as possible, and in large contiguous "chunks".

    Back to TABLE OF CONTENTS.

    Fourier Matrix

    The Fourier matrix represents the linear operator that transforms a vector of data into a vector of Fourier coefficients.

    Let w indicate an N-th root of unity. Then, if we choose N=4, the matrix F will be:

        1  1    1    1
        1  w    w^2  w^3
        1  w^2  w^4  w^6
        1  w^3  w^6  w^9
      
    which simplifies to
        1  1    1    1
        1  w    w^2  w^3
        1  w^2  1    w^2
        1  w^3  w^2  w^1
      
    However, we will choose to scale F by 1/sqrt(N).

    Facts about the Fourier matrix F:

    Back to TABLE OF CONTENTS.

    Frobenius Matrix Norm

    The Frobenius matrix norm is a matrix norm that has the simple formula: ||A|| = the square root of the sum of the squares of all the entries of the matrix.

    The Frobenius matrix norm is not a vector-bound matrix norm, although it is compatible with the L2 vector norm, and much easier to compute that the L2 matrix norm.

    The Frobenius matrix norm is sometimes called the Schur matrix norm or Euclidean matrix norm

    Back to TABLE OF CONTENTS.

    Gauss Elimination

    Gauss elimination has the goal of producing a solution x to the system of linear equations A*x=b, where A is matrix of order N, and b a vector of length N. The standard version of Gauss elimination used in most algorithms employs partial pivoting.

    Gauss elimination accomplishes its goal by decomposing the original matrix A into three factors, a permutation matrix P, a unit lower triangular matrix L, and an upper triangular matrix U. The factors are related to the original matrix by the formula

    A = P * L * U.

    Once the matrix is factored, it is a simple matter to solve A*x=b, by solving instead P * ( L * ( U * x ) ) ) = b, because each of the three factors is easy to invert.

    Moreover, once the factors are known, the user may solve several linear systems involving A, with different right hand sides.

    The determinant of A is equal to the product of the determinants of the factors, and hence is easily computed: the determinant of P is plus or minus 1, and that of L is 1, and that of U is simply the product of its diagonal elements.

    The inverse matrix could be solved for, if necessary, by solving the N linear systems A * X(I) = E(I), where E(I) is the I-th Cartesian basis vector The vectors X(1), X(2), ..., X(N) then are the columns of the inverse of A.

    As an example of the Gauss elimination of a matrix, suppose we start with with the the matrix:

            A
    
          1 2 3
          4 5 6
          7 8 0
      
    We can write an imperfect PLU factorization as:
            P          L          U
    
          1 0 0      1 0 0      1 2 3
          0 1 0      0 1 0      4 5 6
          0 0 1      0 0 1      7 8 0
      
    The factorization is imperfect because, although A = P*L*U, the matrix U is not upper triangular. We will now modify the matrix U, and update the factors P and L, so that it is always true that A=P*L*U, while the matrix U gradually is transformed into the correct upper triangular form.

    Step 1.1: Choose a pivot row in U, namely, row 3. We want to interchange rows 3 and 1 of the matrix U. The elementary permutation matrix P(1,3) does this. We are allowed to insert the inverse of this matrix times itself between L and U in the factorization. We also insert the inverse of this matrix times itself between P and L. If we use primes to denote the updated quantities, these operations are:

        
          A = P * L * U
            = [ P * P-1(1,3) ] * [ P(1,3) * L * P-1(1,3) ] * [ P(1,3) * U ]
            = P' * L' * U'
        
      
    The resulting factorization is now:
            P          L          U
    
          0 0 1      1 0 0      7 8 0
          0 1 0      0 1 0      4 5 6
          1 0 0      0 0 1      1 2 3
      
    Step 1.2: Eliminate U2,1 by subtracting 4/7 of row 1 from row 2. To do this, we construct the elementary matrix L(1,2,4/7), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.
        
          A = P * L * U
            = P * [ L * L-1(1,2,4/7) ] * [ L(1,2,4/7) * U ]
            = P * L' * U'
        
      
    The resulting factorization is now:
            P            L           U
    
          0 0 1       1  0 0      7  8  0
          0 1 0      4/7 1 0      0 3/7 6
          1 0 0       0  0 1      1  2  3
      
    Step 1.3: Eliminate U3,1 by subtracting 1/7 of row 1 from row 3. To do this, we construct the elementary matrix L(1,3,1/7), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.
        
          A = P * L * U
            = P * [ L * L-1(1,3,1/7) ] * [ L(1,3,1/7) * U ]
            = P * L' * U'
        
      
    The resulting factorization is now:
            P           L           U
    
          0 0 1      1  0 0      7  8  0
          0 1 0     4/7 1 0      0 3/7 6
          1 0 0     1/7 0 1      0 6/7 3
      
    Step 2.2: Choose a pivot row in U, namely, row 3. We want to interchange rows 3 and 2 of the U. The elementary permutation matrix P(2,3) does this. We are allowed to insert the inverse of this matrix times itself between L and U in the factorization. We also insert the inverse of this matrix times itself between P and L. If we use primes to denote the updated quantities, these operations are:
        
          A = P * L * U
            = [ P * P-1(2,3) ] * [ P(2,3) * L * P-1(2,3) ] * [ P(2,3) * U ]
            = P' * L' * U'
        
      
    The resulting factorization is now:
            P            L           U
    
          0 1 0       1  0 0      7  8  0
          0 0 1      1/7 1 0      0 6/7 3
          1 0 0      4/7 0 1      0 3/7 6
      
    Step 2.3: Eliminate U3,2 by subtracting 1/2 of row 2 from row 3. To do this, we construct the elementary matrix L(2,3,1/2), and insert the product of its inverse and itself into the factorization. Then we absorb the inverse into L, and the matrix into U.
        
          A = P * L * U
            = P * [ L * L-1(2,3,1/2) ] * [ L(2,3,1/2) * U ]
            = P * L' * U'
        
      
    The resulting factorization is now:
            P            L            U
    
          0 1 0      1   0  0      7  8   0
          0 0 1     1/7  1  0      0 6/7  3
          1 0 0     4/7 1/2 1      0  0  9/2
      
    The PLU factorization is now correct.

    You should be able to see a formula for the final factors:

    P = I * P-1(1,3) * P-1(2,3)
    and
    L = P(2,3) * P(1,3) * I * P-1(1,3) * L-1(1,2,4/7) * L-1(1,3,1/7) * P-1(2,3) * L-1(2,3,1/2)
    and
    U = L(2,3,1/2) * P(2,3) * L(1,3,1/7) * L(1,2,4/7) * P(1,3) * A
    You should see that the form of these matrices guarantees that A=P*L*U. The way we carried out the steps guarantees that P stays a permutation matrix, L stays a lower triangular matrix, and U becomes an upper triangular matrix.

    Back to TABLE OF CONTENTS.

    Gauss Jordan Elimination

    Gauss Jordan elimination is a method for solving a system of linear equations A * x = b for x, or for computing the inverse matrix of A.

    Gauss elimination and Gauss Jordan elimination are very closely related. Gauss elimination reduces A to an upper triangular matrix, and saves the elimination factors in a lower triangular matrix. Gauss Jordan elimination proceeds relentlessly until A has been converted into the identity matrix.

    Thus, unlike the Gauss elimination procedure, the Gauss Jordan elimination does not produce a factorization of the matrix, but only a solution to the linear system. This means that if a second linear system has to be solved, the matrix has to be set up and eliminated all over again.

    The simplest way to describe Gauss Jordan is to note that to solve, say, the linear system A * x = b, the right hand side is appended as an extra column of the matrix. Then, on step I of the elimination, we choose a pivot row, move it to row I, divide it through by the pivot value, and then eliminate the matrix entries in column I from all other rows, rather than simply from rows I+1 through N. When the process is completed, the solution x has overwritten the right hand side b that was stored in column N+1.

    Several right hand sides can be handled at once, by appending all of them to the coefficient matrix; the inverse can be computed by appending a copy of the identity matrix to the coefficient matrix before beginning elimination.

    Gauss Jordan elimination is primarily used as a teaching tool, and for small linear systems. In practical computation, standard Gauss elimination is universally preferred.

    Back to TABLE OF CONTENTS.

    Gauss Seidel Iteration For Linear Equations

    The Gauss Seidel iteration for linear equations is an iterative method for solving linear systems of equations A*x=b. It is similar to the the Jacobi algorithm and the successive overrelaxation method (SOR).

    The Gauss Seidel iteration should only be used for matrices which are symmetric and positive definite, or for a matrix which is strictly diagonally dominant.

    Each step of the Gauss Seidel iteration begins with an approximate answer x, and produces a new approximation y. Each component of y is computed in order, using the formula:

        y(i) = [ b(i) 
          - a(i,1)*y(1) 
          - a(i,2)*y(2) 
            ... 
          - a(i,i-1)*y(i-1)
          - a(i,i+1)*x(i+1) 
            ... 
          - a(i,n)*x(n) ] / a(i,i)
      
    The calculation of each entry of y is dependent on the calculation of the entries of lower index. Thus, the value of y(1) is calculated first, and then y(2) is calculated based on the values of y(1) as well as x(3) through x(n), and so on.

    The process is to be repeated until the residual error is small, or the change in the approximate solution is negligible.

    The Gauss Seidel iteration can be considered in terms of its matrix splitting. That is, if we decompose the matrix A into its strictly lower triangular, diagonal, and strictly upper triangular parts:

    A = L + D + U
    then the method is equivalent to the iteration
    ( L + D ) * xnew = b - U * x.
    which means that the convergence of the algorithm can be understood in terms of the behavior of powers of the iteration matrix:
    - ( L + D )-1 * U,
    which in turn may best be understood by looking at the eigenvalues.

    If the original coefficient matrix A is symmetric, then it may be preferred to use the symmetric Gauss Seidel iteration or SGS. In this case, the iteration consists of pairs of Gauss Seidel steps. The odd steps are the same as the usual iteration. But in the even steps, the variables are solved for in reverse order. Each pair of such steps is a single step of the SGS iteration, which has the property that its iteration matrix is similar to a symmetric matrix (though not necessarily symmetric itself). Among other things, this means that SGS can be used as a preconditioner for certain other problems.

    Back to TABLE OF CONTENTS.

    General Matrix Storage

    A general matrix is one which has no special matrix structure or matrix symmetry.

    In such a case, there are no space advantages to be gained by using a special matrix storage format, and so the matrix entries are stored using the standard two dimensional array format provided by the programming language.

    For general matrices, the only remaining issue concerns the problem that occurs when the matrix storage must be set aside before the size of the matrix is known. In FORTRAN, for example, it is common to specify a maximum matrix size of, say, 100 by 100. If the actual problem to be solved is of size 25 by 25, then it may be necessary to describe the data with both the matrix order of 25, and the leading dimension of the storage array, which is 100. In LINPACK and LAPACK, variables containing leading dimension information have names like LDA, LDB and so on.

    LAPACK and LINPACK provide routines, with the prefix SGE, which apply to matrices in general storage.

    Back to TABLE OF CONTENTS.

    Generalized Permutation Matrix

    A generalized permutation matrix is a square matrix A with at most one nonzero entry in each row, and in each column.

    A standard permutation matrix, of course, has exactly one nonzero entry in each row and column, and that entry has value 1.

    An interesting fact: if A is a nonsingular nonnegative matrix, then the inverse of A is also nonnegative if and only if A is a generalized permutation matrix. In other words, it's very hard for both A and A-1 to be nonnegative, and essentially can only happen if A is diagonal.

    So suppose A >= 0 but A is not a generalized permutation matrix. For an arbitrary vector b >= 0 , can we say that the solution x of A * x = b is nonnegative? No, because we know that the inverse of A is not nonnegative. Therefore A-1 contains at least one negative entry, say entry (i,j). Choose b = E(J), where E(J) is the J-th Cartesian basis vector. Then x = A-1 * b and it is easy to see that x(i) is negative.

    Back to TABLE OF CONTENTS.

    Gershgorin Disks

    The method of Gershgorin disks provides an estimate of the size of the eigenvalues of a matrix. The accuracy of the estimate varies wildly, depending on the size of the elements of the matrix. It is most useful for matrices that are diagonally dominant or sparse.

    Gershgorin's theorem states that the eigenvalues of any matrix A lie in the space covered by the disks Di:

    Di = ( x: sqrt ( x - A(I,I) )2 <= Ri )
    where Ri is the sum of the absolute values of the off-diagonal elements of row I:
    Ri = sum ( J =/= I ) | Ai,j |.

    The theorem may also be applied using columns instead of rows.

    Back to TABLE OF CONTENTS.

    Givens Rotation Matrix

    A Givens rotation is a linear transformation applied to two vectors, or two rows or columns of a matrix, which can be interpreted as a coordinate axis rotation. The intent of the rotation is to zero out an entry of the vector or matrix using an orthogonal transformation.

    A Givens rotation is similar to the elementary row operation that adds a multiple of one row to another, but because a Givens rotation is an orthogonal similarity transformation, it offers greater stability and easy invertibility.

    A Givens rotation matrix G has the form:

        1  0  0  0  0  0
        0  c  0  0  s  0  <-- row i
        0  0  1  0  0  0
        0  0  0  1  0  0
        0 -s  0  0  c  0  <-- row j
        0  0  0  0  0  1  
     
           ^        ^
         col i     col j
      
    where c = cosine(theta) and s = sin(theta) for some angle theta.

    Premultiplying A by G has the following effect:

        row i  of A is replaced by   c*row i + s*row j  in G*A;
        row j  of A is replaced by  -s*row i + c*row j  in G*A.
      
    while postmultiplication, A*G, would carry out a similar operation on the columns of A.

    As an example, to zero out entry Ai,j of a matrix requires a Givens rotation with values of cosine and sine so that:

    - s * Ai,i + c * Ai,j = 0.
    It's not actually necessary to compute the underlying rotation angle theta, since c and s can be computed directly:
    s = Ai,j / sqrt ( Ai,j2 + Ai,i2 )
    c = Ai,i / sqrt ( Ai,j2 + Ai,i2 )

    For instance, to zero out the 3,1 entry of this matrix:

        4 2 0
        0 4 5
        3 8 1
      
    the sine and cosine are 3/5, 4/5, yielding a Givens matrix G of:
        0.8  0  0.6
         0   1   0
       -0.6  0  0.8
      
    and the product G * A:
        5.0 6.4 6.0
         0   4   5
         0  5.2 8.0
      

    It is possible to zero out entries of a matrix, one by one, using Givens rotations, similar to the way that Householder matrices are used, to reduce a matrix to a simpler form. The process can be used to zero out the entire lower triangle of a matrix, but further operations on the upper triangle would reintroduce zeroes in the lower triangle. Nonetheless, zeroing out the lower triangle means that Givens rotations can be used to produce the QR factorization of the matrix.

    Back to TABLE OF CONTENTS.

    Gram Matrix

    Given a set of M vectors Vi, each of dimension N, the M by M Gram matrix is formed by all the pairwise dot products of the vectors:

    
        Gi,j = Vi dot Vj
      

    The Gram matrix is a positive semidefinite symmetric. If any vector is linearly dependent on the others, the matrix is singular. If the vectors are linearly independent, the Gram matrix has full rank M. If the vectors form a basis for the space, the Gram matrix has rank N.

    The Gram matrix can be used to construct a projection matrix into the space spanned by the vectors.

    Back to TABLE OF CONTENTS.

    Gram Schmidt Orthogonalization

    Gram Schmidt orthogonalization is a process which starts with a set of N vectors X(I), each with M components, and produces a set of N2 orthonormal vectors Y which span the linear space of the original vectors.

    N2 is less than N if the vectors X(I) are linearly dependent, and equal to N if they are linearly independent. Thus, one use for the Gram Schmidt process is simply to determine if a set of vectors are linearly dependent; a second is to determine the dimension of the space spanned by a linearly dependent set.

    The Gram Schmidt process may be defined iteratively:

        for I = 1 to N
    
          N2 = I
    
          Y(I) = X(I)
    
          for J = 1 to I-1
            C(J) = dot_product ( X(I), Y(J) )
            Y(I) = Y(I) - C(J) * Y(J)
          end for
    
          Norm = sqrt ( dot_product ( Y(I), Y(I) ) )
    
          if ( Norm = 0 ) exit
     
          Y(I) = Y(I) / Norm
    
        end for
      

    Another way of looking at the process is to use the vectors to form the columns of a matrix A. The Gram Schmidt process can then be used to construct one version of the QR factorization of the matrix A:

    A = Q * R
    where Q is orthogonal and R is upper triangular.

    Back to TABLE OF CONTENTS.

    Hadamard Product