How to use the backslash operator in Julia? - operators

I am currently trying to invert huge matrices of order 1 million by 1 million, and I figured that the backslash operator would be helpful in doing this. Any idea as to how it's implemented? I did not find any concrete examples, so any help is much appreciated.

Any idea as to how it's implemented?
It's a multialgorithm. This shows how to use it:
julia> A = rand(10,10)
10×10 Array{Float64,2}:
0.330453 0.294142 0.682869 0.991427 … 0.533443 0.876566 0.157157
0.666233 0.47974 0.172657 0.427015 0.501511 0.0978822 0.634164
0.829653 0.380123 0.589555 0.480963 0.606704 0.642441 0.159564
0.709197 0.570496 0.484826 0.17325 0.699379 0.0281233 0.66744
0.478663 0.87298 0.488389 0.188844 0.38193 0.641309 0.448757
0.471705 0.804767 0.420039 0.0528729 … 0.658368 0.911007 0.705696
0.679734 0.542958 0.22658 0.977581 0.197043 0.717683 0.21933
0.771544 0.326557 0.863982 0.641557 0.969889 0.382148 0.508773
0.932684 0.531116 0.838293 0.031451 0.242338 0.663352 0.784813
0.283031 0.754613 0.938358 0.0408097 0.609105 0.325545 0.671151
julia> b = rand(10)
10-element Array{Float64,1}:
0.0795157
0.219318
0.965155
0.896807
0.701626
0.741823
0.954437
0.573683
0.493615
0.0821557
julia> A\b
10-element Array{Float64,1}:
1.47909
2.39816
-0.15789
0.144003
-1.10083
-0.273698
-0.775122
0.590762
-0.0266894
-2.36216
You can use @which to see how it's defined:
julia> @which A\b
\(A::AbstractArray{T,2} where T, B::Union{AbstractArray{T,1}, AbstractArray{T,2}} where T) in Base.LinAlg at linalg\generic.jl:805
Which leads us here: https://github.com/JuliaLang/julia/blob/master/base/linalg/generic.jl#L827 (line numbers change slightly between versions). As you can see, it makes a few quick function calls to determine what type of matrix it is. For example, istril checks whether it is lower triangular: https://github.com/JuliaLang/julia/blob/master/base/linalg/generic.jl#L987 , etc. Once it determines the matrix type, it specializes the matrix as much as possible so the solve can be efficient, and then calls \ on that specialized type. These specialized matrix types either perform a factorization, after which \ does the back-substitution (which is a nice way to use \ on your own, by the way, to re-use the factorization), or they "directly know" the answer, as for triangular or diagonal matrices.
Can't get more concrete than the source.
Note that \ is slightly different from just inverting. You usually do not want to invert a matrix, let alone a large matrix; these factorizations are much more numerically stable. However, inv will do an inversion, which is a lot like an LU factorization (which in Julia is lufact). You may also want to look into pinv for the pseudo-inverse in cases where the matrix is singular or close to singular, but again you should really avoid this and instead factorize and solve the system rather than using the inverse.
For very large sparse matrices, you'll want to use iterative solvers. You'll find a lot of implementations in IterativeSolvers.jl

Related

Faster than numpy.matmul for multiplying matrix by its transpose

I have a large 2D NumPy array M in python, and I want to compute numpy.matmul(M, M.T), or equivalently, numpy.dot(M, M.T).
However, numpy.matmul and numpy.dot won't exploit the symmetry involved in multiplication with the transpose, so I believe I am doing twice the work that I really need to do.
Is there an easy way to make this faster by exploiting the symmetry and only doing half the work? Perhaps there is a NumPy/SciPy function or some other python library I'm not aware of that accomplishes this?
Someone informed me that numpy actually already accounts for the symmetry:
https://github.com/numpy/numpy/blob/9a1229f86ca4d4041c9aa48027a21c7ad97da748/numpy/core/src/umath/matmul.c.src#L157
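If you do want to call the symmetric rank-k update yourself, one option is the BLAS routine dsyrk exposed through scipy.linalg.blas; it computes M @ M.T while writing only one triangle of the result. This is just a sketch assuming SciPy is available, with a made-up matrix M for illustration:

import numpy as np
from scipy.linalg import blas

M = np.random.default_rng(0).standard_normal((1000, 200))

# dsyrk computes alpha * M @ M.T, writing only the upper triangle of the result.
C_upper = blas.dsyrk(1.0, M)                   # shape (1000, 1000); lower triangle not filled in
C = np.triu(C_upper) + np.triu(C_upper, 1).T   # mirror the upper triangle to get the full matrix

assert np.allclose(C, M @ M.T)

Whether this beats numpy.matmul in practice depends on your BLAS build, so it is worth benchmarking on your own sizes.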

Errors to fit parameters of scipy.optimize

I use the scipy.optimize.minimize ( https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html ) function with method='L-BFGS-B'.
An example of what it returns is shown below:
fun: 32.372210618549758
hess_inv: <6x6 LbfgsInvHessProduct with dtype=float64>
jac: array([ -2.14583906e-04, 4.09272616e-04, -2.55795385e-05,
3.76587650e-05, 1.49213975e-04, -8.38440428e-05])
message: 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH'
nfev: 420
nit: 51
status: 0
success: True
x: array([ 0.75739412, -0.0927572 , 0.11986434, 1.19911266, 0.27866406,
-0.03825225])
The x value correctly contains the fitted parameters. How do I compute the errors associated to those parameters?
TL;DR: You can actually place an upper bound on how precisely the minimization routine has found the optimal values of your parameters. See the snippet at the end of this answer that shows how to do it directly, without resorting to calling additional minimization routines.
The documentation for this method says
The iteration stops when (f^k - f^{k+1})/max{|f^k|,|f^{k+1}|,1} <= ftol.
Roughly speaking, the minimization stops when the value of the function f that you're minimizing is minimized to within ftol of the optimum. (This is a relative error if f is greater than 1, and absolute otherwise; for simplicity I'll assume it's an absolute error.) In more standard language, you'll probably think of your function f as a chi-squared value. So this roughly suggests that you would expect
f_found - f_min <= ftol * max{|f|, 1}
Of course, just the fact that you're applying a minimization routine like this assumes that your function is well behaved, in the sense that it's reasonably smooth and that near the optimum it is well approximated by a quadratic function of the parameters x_i:
f(x) ≈ f_min + (1/2) * sum_jk H_jk Δx_j Δx_k
where Δx_i is the difference between the found value of parameter x_i and its optimal value, and H_jk is the Hessian matrix. A little (surprisingly nontrivial) linear algebra gets you to a pretty standard result for an estimate of the uncertainty in any quantity X that's a function of your parameters x_i:
(ΔX)^2 ≈ (f_found - f_min) * sum_jk (∂X/∂x_j) (H^{-1})_jk (∂X/∂x_k)
which lets us write
ΔX ≈ sqrt( ftol * max{|f|, 1} * sum_jk (∂X/∂x_j) (H^{-1})_jk (∂X/∂x_k) )
That's the most useful formula in general, but for the specific question here, we just have X = x_i, so this simplifies to
Δx_i ≈ sqrt( ftol * max{|f|, 1} * (H^{-1})_ii )
Finally, to be totally explicit, let's say you've stored the optimization result in a variable called res. The inverse Hessian is available as res.hess_inv, which is a function that takes a vector and returns the product of the inverse Hessian with that vector. So, for example, we can display the optimized parameters along with the uncertainty estimates with a snippet like this:
import numpy as np

ftol = 2.220446049250313e-09
tmp_i = np.zeros(len(res.x))
for i in range(len(res.x)):
    tmp_i[i] = 1.0
    hess_inv_i = res.hess_inv(tmp_i)[i]
    uncertainty_i = np.sqrt(max(1, abs(res.fun)) * ftol * hess_inv_i)
    tmp_i[i] = 0.0
    print('x^{0} = {1:12.4e} ± {2:.1e}'.format(i, res.x[i], uncertainty_i))
Note that I've incorporated the max behavior from the documentation, assuming that f^k and f^{k+1} are basically just the same as the final output value, res.fun, which really ought to be a good approximation. Also, for small problems, you can just use np.diag(res.hess_inv.todense()) to get the full inverse and extract the diagonal all at once. But for large numbers of variables, I've found that to be a much slower option. Finally, I've added the default value of ftol, but if you change it in an argument to minimize, you would obviously need to change it here.
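For small problems, the vectorized variant mentioned above (materializing the dense inverse Hessian) would look roughly like this, under the same assumptions and with the same ftol and res as in the snippet:

# todense() materializes the full inverse Hessian; np.diag pulls out its diagonal.
uncertainties = np.sqrt(max(1, abs(res.fun)) * ftol * np.diag(res.hess_inv.todense()))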
One approach to this common problem is to run scipy.optimize.leastsq after the 'L-BFGS-B' fit, starting from the solution found with 'L-BFGS-B'. That is, leastsq will (normally) include an estimate of the 1-sigma errors as well as the solution.
Of course, that approach makes several assumptions, including that leastsq can be used and is appropriate for solving the problem. From a practical point of view, this requires that the objective function return an array of residual values with at least as many elements as variables, rather than a scalar cost function.
You may find lmfit (https://lmfit.github.io/lmfit-py/) useful here: It supports both 'L-BFGS-B' and 'leastsq' and gives a uniform wrapper around these and other minimization methods, so that you can use the same objective function for both methods (and specify how to convert the residual array into the cost function). In addition, parameter bounds can be used for both methods. This makes it very easy to first do a fit with 'L-BFGS-B' and then with 'leastsq', using the values from 'L-BFGS-B' as starting values.
Lmfit also provides methods to explore confidence limits on parameter values in more detail, in case you suspect the simple but fast approach used by leastsq might be insufficient.
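To make that workflow concrete, here is a hedged sketch of the L-BFGS-B-then-leastsq sequence with lmfit; the model, data, and parameter names are invented for illustration:

import numpy as np
import lmfit

def residual(params, x, data):
    # Toy exponential-decay model; replace with your own.
    return data - params['a'].value * np.exp(-params['b'].value * x)

x = np.linspace(0, 10, 101)
data = 3.0 * np.exp(-0.5 * x) + 0.05 * np.random.default_rng(0).standard_normal(x.size)

params = lmfit.Parameters()
params.add('a', value=1.0, min=0)
params.add('b', value=1.0, min=0)

# First fit with L-BFGS-B (bounds supported), then polish with leastsq,
# which also estimates 1-sigma uncertainties (stderr) for each parameter.
out1 = lmfit.minimize(residual, params, args=(x, data), method='lbfgsb')
out2 = lmfit.minimize(residual, out1.params, args=(x, data), method='leastsq')
print(lmfit.fit_report(out2))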
It really depends what you mean by "errors". There is no general answer to your question, because it depends on what you're fitting and what assumptions you're making.
The easiest case is one of the most common: when the function you are minimizing is a negative log-likelihood. In that case the inverse of the Hessian matrix returned by the fit (hess_inv) is the covariance matrix describing the Gaussian approximation to the maximum likelihood. The parameter errors are the square roots of the diagonal elements of the covariance matrix.
Beware that if you are fitting a different kind of function or are making different assumptions, then that doesn't apply.
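In the negative log-likelihood case, and assuming res is the L-BFGS-B result object shown above (where hess_inv is a linear operator), that amounts to:

cov = res.hess_inv.todense()   # covariance matrix when f is a negative log-likelihood
perr = np.sqrt(np.diag(cov))   # 1-sigma parameter errors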

Update submatrix in Tensorflow

Quite simply, what I want to do is the following
A = np.ones((3,3)) #arbitrary matrix
B = np.ones((2,2)) #arbitrary matrix
A[1:,1:] = A[1:,1:] + B
except in Tensorflow (where the matrices can be arbitrarily complicated tensor expressions). Neither A nor B is a Tensorflow Variable, but just a run-of-the-mill tensor.
What I have gathered so far: tensors are immutable, so I cannot assign to a submatrix. tf.scatter_nd is the current option for sub-assignment, but does not appear to support sub-matrices, only slices.
Methods that should work, but are perhaps not ideal:
I could pad B with zeros, but I'm sure this leads to instantiation of an unnecessarily large B - can it be made sparse, maybe? (See the sketch at the end of this question.)
I could use the padding idea, but write it as a low-rank update, e.g. in NumPy: A + U.dot(B).dot(U.T), where U is a stacked zero and identity matrix. I'm not sure this is actually advantageous.
I could split A into submatrices, and stack them back together. Might be the most efficient, but sounds like the code would be convoluted.
Ideally, I want to do this operation N times for progressively smaller matrices, resulting in one large final result, but this is tangential.
I'll use one of the hacks for now, but I'm hoping someone can tell me what the idiomatic version is!
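For concreteness, here is what the padding workaround from the list above looks like as a short sketch (assuming a reasonably recent TensorFlow; the shapes match the NumPy example):

import tensorflow as tf

A = tf.ones((3, 3))   # arbitrary tensor expression
B = tf.ones((2, 2))   # arbitrary tensor expression

# Pad B with one row of zeros on top and one column of zeros on the left
# so it lines up with the A[1:, 1:] block, then add element-wise.
B_padded = tf.pad(B, paddings=[[1, 0], [1, 0]])
result = A + B_padded   # equals A with B added to its bottom-right 2x2 block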

Numpy/Scipy pinv and pinv2 behave differently

I am working with bidimensional arrays on Numpy for Extreme Learning Machines. One of my arrays, H, is random, and I want to compute its pseudoinverse.
If I use scipy.linalg.pinv2 everything runs smoothly. However, if I use scipy.linalg.pinv, sometimes (30-40% of the times) problems arise.
The reason why I am using pinv2 is because I read (here: http://vene.ro/blog/inverses-pseudoinverses-numerical-issues-speed-symmetry.html ) that pinv2 performs better on "tall" and on "wide" arrays.
The problem is that, if H has a column j of all 1, pinv(H) has huge coefficients at row j.
This is in turn a problem because, in such cases, np.dot(pinv(H), Y) contains some nan values (Y is an array of small integers).
Now, I am not into linear algebra and numeric computation enough to understand if this is a bug or some precision related property of the two functions. I would like you to answer this question so that, if it's the case, I can file a bug report (honestly, at the moment I would not even know what to write).
I saved the arrays with np.savetxt(fn, a, '%.2e', ';'): please, see https://dl.dropboxusercontent.com/u/48242012/example.tar.gz to find them.
Any help is appreciated. In the provided file, you can see in pinv(H).csv that rows 14, 33, 55, 56 and 99 have huge values, while in pinv2(H) the same rows have more decent values.
Your help is appreciated.
In short, the two functions implement two different ways to calculate the pseudoinverse matrix:
scipy.linalg.pinv uses least squares, which may be quite compute intensive and take up a lot of memory.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.pinv.html#scipy.linalg.pinv
scipy.linalg.pinv2 uses SVD (singular value decomposition), which should run with a smaller memory footprint in most cases.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.pinv2.html#scipy.linalg.pinv2
numpy.linalg.pinv also implements this method.
As these are two different evaluation methods, the resulting matrices will not be the same. Each method has its own advantages and disadvantages, and it is not always easy to determine which one should be used without deeply understanding the data and what the pseudoinverse will be used for. I'd simply suggest some trial-and-error and use the one which gives you the best results for your classifier.
Note that in some cases these functions cannot converge to a solution and will then raise a LinAlgError (from numpy.linalg / scipy.linalg). In that case you may try the second pinv implementation (pinv2), which will greatly reduce the number of errors you receive.
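For intuition, the SVD route that pinv2 (and numpy.linalg.pinv) takes is roughly equivalent to the following sketch; the cutoff on small singular values is one reason the SVD route tends to behave better on ill-conditioned input (the rcond default here is just an illustrative choice):

import numpy as np

def pinv_via_svd(H, rcond=1e-15):
    # Pseudoinverse through the SVD: H = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # Reciprocate only the singular values above the cutoff; the rest are set to zero,
    # which keeps near-singular matrices from producing huge coefficients.
    cutoff = rcond * s.max()
    s_inv = np.where(s > cutoff, 1.0 / s, 0.0)
    return Vt.T @ (s_inv[:, None] * U.T)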
Starting from SciPy 1.7.0, pinv2 is deprecated, and pinv has also moved to an SVD-based solution.
DeprecationWarning: scipy.linalg.pinv2 is deprecated since SciPy 1.7.0, use scipy.linalg.pinv instead
That means numpy.linalg.pinv, scipy.linalg.pinv and scipy.linalg.pinv2 now all compute equivalent solutions. They are also comparably fast, with SciPy being slightly faster.
import numpy as np
import scipy.linalg
arr = np.random.rand(1000, 2000)
res1 = np.linalg.pinv(arr)
res2 = scipy.linalg.pinv(arr)
res3 = scipy.linalg.pinv2(arr)
np.testing.assert_array_almost_equal(res1, res2, decimal=10)
np.testing.assert_array_almost_equal(res1, res3, decimal=10)

Optimize Blas-like operation - A`*B*A

Given two matrices, A and B, where B is symmetric (and positive semi-definite), what is the best (fastest) way to calculate A`*B*A?
Currently, using BLAS, I first compute C=B*A using dsymm (introducing a temporary matrix C) and then A`*C using dgemm.
Is there a better (faster, no temporaries) way to do this using BLAS and mkl?
Thanks.
I'll offer some kind of answer: compared to the general case A*B*C, you know that the end result is a symmetric matrix. After computing C=B*A with the BLAS subroutine dsymm, you want to compute A'C, but you only need to compute the upper triangular part of the matrix and then copy the strictly upper triangular part to the lower triangular part.
Unfortunately there doesn't seem to be a BLAS routine where you can declare beforehand that, given two general matrices, the output matrix will be symmetric. I'm not sure if it would be beneficial to write your own function for this; it probably depends on the size of your matrices and the implementation.
EDIT:
This idea seems to be addressed recently here: A Matrix Multiplication Routine that Updates Only the Upper or Lower Triangular Part of the Result Matrix
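For reference, the two-step dsymm + dgemm approach described in the question (with the temporary C) can be sketched through SciPy's BLAS wrappers; the sizes and matrices below are made up, and a C/MKL implementation would follow the same call sequence:

import numpy as np
from scipy.linalg import blas

n, m = 500, 300
rng = np.random.default_rng(0)
A = rng.standard_normal((n, m))
B = rng.standard_normal((n, n))
B = 0.5 * (B + B.T)   # make B symmetric

# Step 1: C = B @ A via dsymm (B is only referenced as a symmetric matrix).
C = blas.dsymm(1.0, B, A)
# Step 2: result = A' @ C via dgemm with the transpose flag on A.
result = blas.dgemm(1.0, A, C, trans_a=1)

assert np.allclose(result, A.T @ B @ A)
# In exact arithmetic the result is symmetric, so roughly half of the dgemm
# work is redundant, which is the point the answer above makes.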