Friday, 11 October 2013

Linear algebra problem.

It's not immediately obvious, but solvers, fitting routines, and minimizers all basically boil down to linear algebra.  This week's work-related math problem was in the polynomial fitting code.  It only arises when the x-data is closely spaced (like 1e-3 apart).  How is this linear algebra?

Here's how you fit a polynomial.  First, define the Vandermonde matrix for a vector x as
VM[x] = [x.^n x.^(n-1) ... x.^2 x.^1 x.^0]
using matlab/octave style .-operators for element-by-element operations.  Basically, this expands the vector x into an array that has the various power terms in its columns.  The product VM[x] p then evaluates a polynomial at every point in x, where the vector p contains the polynomial coefficients.  In the fitting problem, this vector is what you're trying to find.  To do this, you do basically the same thing as regular least-squares fitting:
A p = b
(VM[x]^T VM[x]) p = (VM[x]^T y)
using some prior knowledge of how least squares fitting usually goes.  This is basically a matrix inversion (of A), but that's hard to do, so usually this kind of thing is solved with an LU decomposition or the Gauss-Jordan method.  However, as mentioned above, if the x-data is closely spaced, this fails, as our GJ implementation has some logic to prevent the construction of row operations that have large scaling factors.
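
To make that concrete, here's a minimal Octave-style sketch of the fit as described above.  The polynomial order, the closely spaced x-data, and the y-data are made up for illustration, and the backslash solve stands in for whatever LU/GJ routine the real code uses.

n = 3;                          % polynomial order (assumed for the example)
x = 1 + (0:19)' * 1e-3;         % closely spaced x-data, 1e-3 apart
y = 0.5*x.^3 - 2*x.^2 + x + 4;  % something to fit against

% build VM[x] column by column: [x.^n x.^(n-1) ... x.^0]
VM = zeros(length(x), n + 1);
for k = 0:n
  VM(:, n + 1 - k) = x.^k;
end

A = VM' * VM;                   % normal-equation matrix
b = VM' * y;
p = A \ b;                      % stands in for the LU / Gauss-Jordan solve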

This leads to the idea of pre-scaling the data in x to prevent this from being a problem.  Define a new vector
x' = f x
where f is some scalar (in what follows, VM[f] means the Vandermonde matrix of a constant vector of f's the same length as x, which is where the length(x) factors come from).  This means that
VM[x'] = VM[x] .* VM[f]
so
A' = (VM[x]^T VM[x]) .* (VM[f]^T VM[f])/length(x)
b' = (VM[x] .* VM[f])^T y
which has the solution
p = p' .* sqrt(diag(VM[f]^T VM[f])/length(x))
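
Continuing the sketch from above, the scaling and un-scaling looks something like the following.  The particular choice of f here is only a placeholder (picking f well is exactly the open question below), and it assumes f > 0.

f = 1 / max(abs(x));            % placeholder scale factor
xs = f * x;                     % x' = f x

VMs = zeros(length(x), n + 1);  % VM[x'] = VM[x] .* VM[f]
for k = 0:n
  VMs(:, n + 1 - k) = xs.^k;
end

ps = (VMs' * VMs) \ (VMs' * y); % p', the coefficients in the scaled variable
d = f.^(n:-1:0)';               % same as sqrt(diag(VM[f]' * VM[f]) / length(x)) for f > 0
p = ps .* d;                    % back to the coefficients of the original x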

Great.  What's f?  This is where things start to get complicated.  Why does the GJ solver fail?  It basically boils down to the matrix A being ill-conditioned.  The degree of ill-conditioning of a matrix can be quantified by its condition number (with the most ill-conditioned matrices being non-invertible, where c_N = infinity).  The fastest way to estimate this is to calculate
c_N' = max(diag(A))/min(diag(A))
If this is large relative to some threshold (like 1e7), the matrix is ill-conditioned and should get some scaling.  Unfortunately, this is only a decent lower bound on the true condition number, so to do this correctly you'd need to do something like
c_N = abs(max(eig(A))/min(eig(A)))
or, even better:
[U S V] = svd(A)
c_N = max(diag(S))/min(diag(S))
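
As a sketch, on the A matrix from the example above, those three estimates are just the following (the variable names and the 1e7 threshold are mine):

cN_fast = max(diag(A)) / min(diag(A));    % quick estimate from the diagonal
cN_eig  = abs(max(eig(A)) / min(eig(A))); % eigenvalue ratio
[U, S, V] = svd(A);
cN_svd  = max(diag(S)) / min(diag(S));    % same as cond(A)
needs_scaling = cN_fast > 1e7;            % the kind of one-sided threshold test described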

In any case, I have the suspicion that the "optimum" f value is something like
f = sqrt(max(max(diag(S)), 1 / min(diag(S))))
The idea being that you want to reduce the dynamic range of A in such a way that you no longer break the threshold.  One complication: checking the actual matrix that's breaking things suggests that the "scaled" form that works is actually even more ill-conditioned than the unscaled one, but since the test in the current code is uni-directional, it doesn't throw an error.
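
For what it's worth, here's that guess written out against the same sketch, with a before/after conditioning check; this is just the suspicion above in code form, not a validated recipe.

[U, S, V] = svd(A);
s = diag(S);
f = sqrt(max(max(s), 1 / min(s)));  % the suspected "optimum" scale factor

VMs = zeros(length(x), n + 1);      % rebuild VM with the scaled x
for k = 0:n
  VMs(:, n + 1 - k) = (f * x).^k;
end

[cond(A) cond(VMs' * VMs)]          % conditioning before and after the scaling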

That was kind of hard to go through, so here's a Jigglypuff mocking Clefairies.