I have been reading a wonderful book about mathematical programming and I have decided to document my learnings in this blog.
This post is going to explain one of the basic building blocks for solving linear algebraic equations. Consider a set of equations:
This is a system of M unknowns and N equations. Each variable can be thought of as a degree of freedom and each equation can be thought of as a constraint. Think about a three-variable situation, like the position of a person in 3-D coordinates. Without any constraints, he has three degrees of freedom, in the x, y and z directions. If we are given three independent equations describing his position (each linear equation in x, y and z represents a plane in 3-D), we can pinpoint his coordinates in 3-D space.
- If M > N, the number of unknowns is greater than the number of equations. The system is said to be underdetermined and, if it is consistent, has infinitely many solutions. The solution space can be narrowed down with extra assumptions such as sparsity, which is the idea behind Compressed Sensing.
- If M < N, the number of equations is greater than the number of unknowns and the system is said to be overdetermined. Here the general approach is to find the best-fit solution (i.e. the one for which the R.M.S. error over all equations is a minimum).
- If M = N, the system has a unique solution provided the following caveats are satisfied:
  - No row should be a linear combination of the other rows; if one is, we have row degeneracy.
  - If all the equations contain certain variables only in exactly the same linear combination, the system is afflicted by column degeneracy.
- Both of these conditions effectively result in the removal of a constraint, and thus the system becomes singular and cannot be solved uniquely.
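For a square system, both kinds of degeneracy show up as a zero determinant of the coefficient matrix. Here is a minimal Python sketch of that check. The helper `det3` and the three sample matrices are my own illustrative choices, not taken from the book.

```python
def det3(m):
    """Determinant of a 3x3 matrix (list of rows) by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Row degeneracy: the third row is the sum of the first two rows.
row_degenerate = [[1, 2, 3],
                  [4, 5, 6],
                  [5, 7, 9]]

# Column degeneracy: the third column is twice the first column,
# so x and z only ever appear in the combination (x + 2z).
col_degenerate = [[1, 2, 2],
                  [3, 1, 6],
                  [2, 5, 4]]

# A well-behaved system with three independent constraints.
independent = [[2, 0, 0],
               [0, 3, 0],
               [0, 0, 4]]

print(det3(row_degenerate), det3(col_degenerate), det3(independent))
```

With floating-point data an exact comparison with zero is rarely meaningful, so in practice one would test against a small tolerance instead.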
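In order to obtain more accurate results and reduce round-off errors, a technique called Pivoting is used. Pivoting is applied while a matrix is being reduced to its row echelon form: at each stage we choose which element to divide by (the pivot) and rearrange the matrix so that it sits in the right place.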
What is row echelon form?
A matrix is said to be in row echelon form if:
- All non-zero rows are above the zero rows.
- The first non-zero number from the left in a row, called the leading coefficient or pivot, should be strictly to the right of the leading coefficient of the row above it.
- All entries in a column below a leading coefficient must be zero.
- Here is an example of a matrix in row echelon form:
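The conditions above can also be checked in code. Below is a minimal Python sketch. The function names and both sample matrices are illustrative assumptions of mine. Note that the third condition follows automatically from the first two, so the code only tests those.

```python
def leading_index(row):
    """Column index of the first non-zero entry, or None for a zero row."""
    for j, value in enumerate(row):
        if value != 0:
            return j
    return None

def is_row_echelon(matrix):
    """Check the row echelon conditions on a matrix given as a list of rows."""
    previous = -1            # pivot column of the row above
    seen_zero_row = False
    for row in matrix:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:    # a non-zero row sits below a zero row
            return False
        if lead <= previous: # pivot is not strictly to the right
            return False
        previous = lead
    return True

# Illustrative matrix in row echelon form (pivots in columns 0, 1 and 3).
echelon = [[1, 2, 3, 4],
           [0, 5, 6, 7],
           [0, 0, 0, 8],
           [0, 0, 0, 0]]

# Not in row echelon form: the second pivot is directly below the first.
not_echelon = [[1, 2],
               [3, 4]]

print(is_row_echelon(echelon), is_row_echelon(not_echelon))
```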
Pivoting can be done in two ways:
- Partial Pivoting: The algorithm selects the element with the largest absolute value in the current column and swaps rows so that it lies on the diagonal.
- Complete Pivoting: The algorithm scans the whole remaining matrix for the element with the largest absolute value and swaps both rows and columns to place the pivot on the diagonal.
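The difference between the two searches can be sketched in a few lines of Python. This assumes the common textbook arrangement in which stage `k` only looks at rows and columns from `k` onwards. The names `partial_pivot` and `complete_pivot` and the sample matrix are hypothetical, chosen only for illustration.

```python
def partial_pivot(a, k):
    """Row index of the largest |entry| in column k, searching rows k..n-1."""
    return max(range(k, len(a)), key=lambda i: abs(a[i][k]))

def complete_pivot(a, k):
    """(row, col) of the largest |entry| in the sub-matrix of rows/cols k..n-1."""
    n = len(a)
    candidates = [(i, j) for i in range(k, n) for j in range(k, n)]
    return max(candidates, key=lambda ij: abs(a[ij[0]][ij[1]]))

a = [[1.0, 2.0, -9.0],
     [4.0, 5.0, 6.0],
     [-7.0, 3.0, 2.0]]

# Partial pivoting only looks down column 0 and picks -7 in row 2.
print(partial_pivot(a, 0))
# Complete pivoting looks everywhere and picks -9 at row 0, column 2.
print(complete_pivot(a, 0))
```

Partial pivoting needs only row swaps, which leave the order of the unknowns untouched. Complete pivoting may also need column swaps, which reorder the unknowns and therefore require some book-keeping.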
We will be using an example matrix to illustrate this algorithm, which is given in the textbook Numerical Recipes in C++.
The equations we aim to solve are:
The algorithm takes two inputs: the matrix A (the coefficient matrix) and B (the right-hand-side vector). On return, the inverse of the matrix is stored in A and the solution vector is stored in B.
Step 1: Finding the Pivot Element
In the first step the algorithm iterates through the matrix and finds the element with the largest absolute value. In the first iteration the pivot element is the largest element of the last row. In our case it comes out to be five and is in the first column, so there is no need for a column swap; the last row only needs to be swapped with the first row. This swap is recorded in two book-keeping arrays storing the actual position of the pivot, so that the original ordering can be restored at the end.
The next time the algorithm searches for a pivot element, it excludes the rows and columns that already hold a pivot from the search.
Step 2: Normalizing the row
Before we go through this step, we need to understand why it actually works. Using our row transformations we are basically converting the matrix A into the identity matrix I. Therefore,
Where is the transformed solution vector
As we are using the equation to determine the inverse of the matrix we store the result back in A.
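Written out, the argument can be sketched as follows. The symbols $x$ (unknowns), $b$ (right-hand side) and $T$ (the combined effect of all the row operations) are my own notation, assumed here only for illustration.

```latex
A\,x = b
\;\Longrightarrow\;
(T A)\,x = T\,b
\;\Longrightarrow\;
I\,x = T\,b \quad \text{once } T A = I .
% T A = I means T = A^{-1}, so the transformed right-hand side is the solution:
x = A^{-1} b ,
\qquad
T\,I = A^{-1} .
```

The last identity says that applying the same row operations to the identity matrix produces the inverse, which is why the inverse can be accumulated alongside the solution.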
This step can be further subdivided into two sub-steps:
- The first sub-step is to normalize the pivot row by dividing it by the pivot element. Our matrix equation now looks like:
For the Inverse:
The solution vector also gets transformed as:
- The next sub-step is to reduce each element below the pivot to zero by subtracting the appropriate multiple of the first row from each of the other rows:
and similar transforms on the solution vector.
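The two sub-steps can be sketched in Python as below. This is a simplified illustration that only updates the coefficient matrix and the right-hand side; it does not build the inverse in place. The function name `gauss_jordan_step` and the 2x2 system are my own assumptions. It clears the pivot column in every other row, which for the first pivot means exactly the rows below it.

```python
def gauss_jordan_step(a, b, p):
    """Normalize row p by its diagonal pivot, then clear column p elsewhere.

    a is a list of rows, b a list of right-hand-side values; both are
    modified in place. Assumes the pivot a[p][p] is already non-zero.
    """
    pivot = a[p][p]
    # Sub-step 1: divide the pivot row (and its right-hand side) by the pivot.
    a[p] = [value / pivot for value in a[p]]
    b[p] = b[p] / pivot
    # Sub-step 2: subtract the right multiple of the pivot row from the others.
    for r in range(len(a)):
        if r == p:
            continue
        factor = a[r][p]
        a[r] = [value - factor * pv for value, pv in zip(a[r], a[p])]
        b[r] -= factor * b[p]

# Illustrative system: 2x + 4y = 6 and x + 5y = 9.
a = [[2.0, 4.0],
     [1.0, 5.0]]
b = [6.0, 9.0]

gauss_jordan_step(a, b, 0)   # a is now [[1, 2], [0, 3]], b is [3, 6]
gauss_jordan_step(a, b, 1)   # a is now the identity, b holds the solution
print(a, b)
```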
We will discuss certain parts of the second iteration, as they are slightly different from the first:
Now, while iterating for the second column, the largest element is found at
Here there is no need for swapping as the pivot is found along the diagonal itself.
At the end we have done pivoting for all columns and have reduced our matrix, but we need to account for the shuffling that we have done. Let us say that our book-keeping arrays are:
Let us take the first case:
As the row and column numbers were not the same, there was an initial swap that needs to be undone. So, we swap the corresponding columns. A row operation on the input appears as a column operation on its inverse (which explains the shuffling of columns instead of rows).
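Putting the steps together, here is a minimal Python sketch of Gauss-Jordan elimination with complete pivoting and the final column unscrambling. It follows the steps described in this post but is my own illustration, not the book's code. The names `gauss_jordan`, `used`, `row_of` and `col_of` and the 3x3 test system are assumptions.

```python
def gauss_jordan(a, b):
    """Solve a x = b in place: a becomes its inverse and b becomes x."""
    n = len(a)
    used = [False] * n        # rows/columns that already hold a pivot
    row_of = [0] * n          # book-keeping: row where each pivot was found
    col_of = [0] * n          # book-keeping: column where each pivot was found

    for step in range(n):
        # Step 1: find the largest remaining element (complete pivoting).
        big, irow, icol = 0.0, -1, -1
        for j in range(n):
            if used[j]:
                continue
            for k in range(n):
                if not used[k] and abs(a[j][k]) >= big:
                    big, irow, icol = abs(a[j][k]), j, k
        if big == 0.0:
            raise ValueError("singular matrix")
        used[icol] = True
        # A row swap moves the pivot onto the diagonal at (icol, icol).
        if irow != icol:
            a[irow], a[icol] = a[icol], a[irow]
            b[irow], b[icol] = b[icol], b[irow]
        row_of[step], col_of[step] = irow, icol

        # Step 2a: normalize the pivot row. Setting the pivot to 1 first
        # is what builds the inverse in the same storage.
        pivinv = 1.0 / a[icol][icol]
        a[icol][icol] = 1.0
        a[icol] = [value * pivinv for value in a[icol]]
        b[icol] *= pivinv

        # Step 2b: clear the pivot column in every other row.
        for r in range(n):
            if r != icol:
                factor = a[r][icol]
                a[r][icol] = 0.0
                a[r] = [v - factor * pv for v, pv in zip(a[r], a[icol])]
                b[r] -= factor * b[icol]

    # Undo the swaps in reverse order: each row swap on the input
    # becomes a column swap on the inverse.
    for step in reversed(range(n)):
        r, c = row_of[step], col_of[step]
        if r != c:
            for row in a:
                row[r], row[c] = row[c], row[r]
    return a, b

# Illustrative system (not the example from the book).
a0 = [[2.0, 1.0, 1.0],
      [1.0, 3.0, 2.0],
      [1.0, 0.0, 0.0]]
b0 = [4.0, 5.0, 6.0]
inverse, x = gauss_jordan([row[:] for row in a0], b0[:])
print(x)
```

A quick sanity check is to multiply the original matrix by the returned solution and by the returned inverse; the results should match the original right-hand side and the identity matrix up to round-off.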