Key Concept
The least squares (LS) method solves for a vector of unknown modeling parameters x given a vector of observations z, assuming the measurement model
z = Ax + v
where A is a matrix relating the observations to the solve-for parameters and v is a vector of zero-mean observation errors. The LS best fit is determined by minimizing a cost function such that
||z - Ax|| = minimum
For a critically determined scenario, where the number of observations is equal to the number of solve-for parameters, the minimum is equal to zero via
z = Ax
Under these conditions, A is a square matrix and x may be solved for through Gauss-Jordan elimination, matrix decomposition, or even inversion of A when nothing else is available in your math library (or you simply don't know any better).
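To make the critically determined case concrete, here is a minimal NumPy sketch (NumPy and the small 2x2 example are illustrative choices, not code from this repository) contrasting a direct linear solve with the explicit-inversion route:

```python
import numpy as np

# Critically determined case: A is square, so z = Ax has an exact solution.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x_true = np.array([1.0, -2.0])
z = A @ x_true

# Preferred: solve the linear system directly (LU decomposition under the hood).
x_solve = np.linalg.solve(A, z)

# The "nothing else available" route: explicit inversion of A.
x_inv = np.linalg.inv(A) @ z

print(x_solve, x_inv)  # both recover x_true to machine precision
```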
The system is typically overdetermined, with more measurements than solve-for parameters. In this case, the classic least squares best fit is found by solving the normal equations
(A'A)x = A'z
The information matrix, (A'A), is a square matrix. Decomposition methods or the previously mentioned redneck approach leads to the least squares fit (statistics are employed to describe the conditions under which such a fit is considered to be a maximum likelihood and/or a minimum variance estimate).
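As a hedged illustration of the normal-equations route (again a NumPy sketch with made-up dimensions, not this repository's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined system: 20 observations, 3 solve-for parameters.
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, -0.5, 2.0])
z = A @ x_true + 0.01 * rng.standard_normal(20)  # zero-mean observation errors v

# Form and solve the normal equations (A'A) x = A'z.
info = A.T @ A                      # information matrix
x_ls = np.linalg.solve(info, A.T @ z)

print(x_ls)  # close to x_true
```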
A key element of Bierman's book is recognizing that minimization of the cost function is unaffected by an orthogonal transformation T:
||z - Ax|| = ||T(z - Ax)|| = ||Tz - TAx||
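The invariance is easy to verify numerically; here is a small sketch (NumPy assumed, with an orthogonal T built from the QR factorization of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))
x = rng.standard_normal(2)
z = rng.standard_normal(5)

# Build an arbitrary orthogonal matrix T (Q factor of a random square matrix).
T, _ = np.linalg.qr(rng.standard_normal((5, 5)))

# The residual norm is unchanged by the orthogonal transformation.
print(np.linalg.norm(z - A @ x))
print(np.linalg.norm(T @ z - T @ A @ x))  # identical to machine precision
```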
Orthogonal transformations built from Householder reflections are used to recursively annihilate portions of A and z, allowing one to:
- Transform (compress) A into an upper triangular square matrix R
- Perform that transformation such that R can be used to easily solve for x (upper triangular form allows for back substitution); a minimal sketch follows this list
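The sketch below assumes NumPy; np.linalg.qr applies Householder reflections internally, and the hand-rolled back substitution loop is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((20, 3))
x_true = np.array([1.0, -0.5, 2.0])
z = A @ x_true + 0.01 * rng.standard_normal(20)

# Householder-based QR: R is the 3x3 upper triangular compression of A,
# and Q'z is the correspondingly transformed observation vector.
Q, R = np.linalg.qr(A, mode="reduced")
zt = Q.T @ z

# Back substitution on the upper triangular system R x = Q'z.
n = R.shape[0]
x_hat = np.zeros(n)
for i in range(n - 1, -1, -1):
    x_hat[i] = (zt[i] - R[i, i + 1:] @ x_hat[i + 1:]) / R[i, i]

print(x_hat)  # matches the normal-equations solution, without ever forming A'A
```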
This approach leads to greater numerical stability by eliminating the need to "square up" the Jacobian A. The linear independence of the columns of A dictates the observability of the solve-for parameters x with respect to the observations z; the closer these columns are to being parallel, the worse the observability. Forming the normal equations squares the condition number of the problem, amplifying numerical instability, whereas applying Householder transformations directly to the observation model avoids squaring up the Jacobian altogether.
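The conditioning argument can be seen directly: cond(A'A) is the square of cond(A), so nearly parallel columns hurt twice as badly once the normal equations are formed. A small NumPy sketch (the 1e-6 perturbation is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two nearly parallel columns: poor observability, large condition number.
c = rng.standard_normal(20)
A = np.column_stack([c, c + 1e-6 * rng.standard_normal(20)])

print(np.linalg.cond(A))        # already large
print(np.linalg.cond(A.T @ A))  # roughly the square of the number above
```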
Working directly with the Jacobian A gives rise to the name Square Root Information Filter (SRIF), because A is a square root of the information matrix (A'A).
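This can be checked in one line: the upper triangular factor R produced by the Householder triangularization also satisfies R'R = A'A, i.e. it is a (triangular) square root of the information matrix (NumPy sketch, assumed dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 3))

# R from the Householder-based QR of A is a square root of the information matrix.
_, R = np.linalg.qr(A, mode="reduced")
print(np.allclose(R.T @ R, A.T @ A))  # True
```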
Consult Gerald Bierman's "Factorization Methods for Discrete Sequential Estimation" for a detailed description, or what I believe is the original work on this method, Gene Golub's short paper "Numerical Methods for Solving Linear Least Squares Problems".