Efficient solving of generalised eigenvalue problems in python - numpy

Given an eigenvalue problem Ax = λBx what is the more efficient way to solve it out of the two shown here:
import scipy as sp
import numpy as np
def geneivprob(A,B):
# Use scipy
lamda, eigvec = sp.linalg.eig(A, B)
return lamda, eigvec
def geneivprob2(A,B):
# Reduce the problem to a standard symmetric eigenvalue problem
Linv = np.linalg.inv(np.linalg.cholesky(B))
C = Linv # A # Linv.transpose()
#C = np.asmatrix((C + C.transpose())*0.5,np.float32)
lamda,V = np.linalg.eig(C)
return lamda, Linv.transpose() # V
I saw the second version in a codebase and was wondering if it was better than simply using scipy.

Well there is no obvious advantage in using the second approach, maybe for some class of matrices it will be better, I would suggest you to test with the problems you want to solve. Since you are transforming the eigenvectors, this will also transform how the errors affect the solution, and maybe that is the reason for using this second method, not efficiency, but numerical accuracy, or convergence.
Another thing is that the second method will only work for symmetric B.

Related

How to setup a batched matrix multiplication in Numba with np.dot() using contiguous arrays

I am trying to speed up a batched matrix multiplication problem with numba, but it keeps telling me that it's faster with contiguous code.
Note: I'm using numba version 0.55.1, and numpy version 1.21.5
Here's the problem:
import numpy as np
import numba as nb
def numbaFastMatMult(mat,vec):
result = np.zeros_like(vec)
for n in nb.prange(vec.shape[0]):
result[n,:] = np.dot(vec[n,:], mat[n,:,:])
return result
D,N = 10,1000
mat = np.random.normal(0,1,(N,D,D))
vec = np.random.normal(0,1,(N,D))
result = numbaFastMatMult(mat,vec)
print(mat.data.contiguous)
print(vec.data.contiguous)
print(mat[n,:,:].data.contiguous)
print(vec[n,:].data.contiguous)
clearly all the relevant data is contiguous (run the above code snippet and see the results of print()...
But, when I run this code, I get the following warning:
NumbaPerformanceWarning: np.dot() is faster on contiguous arrays, called on (array(float64, 1d, C), array(float64, 2d, A))
result[n,:] = np.dot(vec[n,:], mat[n,:,:])
2 Extra comments:
This is just a toy problem for replication. I'm actually using something with many more data points, so hoping this will speed up.
I think the "right" way to solve this is with np.tensordot. However, I want to understand what's going on for future reference. For example, this discussion addresses a similar issue, but as far as I can tell, doesn't address why the warning shows up directly.
I've tried adding a decorator:
nb.float64[:,::1](nb.float64[:,:,::1],nb.float64[:,::1]),
I've tried reordering the arrays so the batch index is first (n in the above code)
I've tried printing whether the "mat" variable is contiguous from inside the function
I'll leave this up, but I figured it out:
Outside of a numba function:
mat[n,:,:].data.contiguous==True
but inside numba, mat[n,:,:] is no longer continous.
Changing my code above to np.dot(vec[n], mat[n]) removed the warning.
I'm making this the "correct" answer since it solved my problem. However, according to max9111's response, this behavior may be a bug!

Equation constraints with scipy least_squares

I'm trying to use least square to minimize a loss function by changing x,y,z. My problem is nonlinear hence why i chose scipy least_squares. The general structure is:
from scipy.optimize import least_squares
def loss_func(x, *arguments):
# plug x's and args into an arbitrary equation and return loss
return loss # loss here is an array
# x_arr contains x,y,z
res = least_squares(loss_func, x_arr, args=arguments)
I am trying to constraint x,y,z by: x-y = some value, z-y = some value. How do I go about doing so? The scipy least_squares documentation only provided bounds. I understand I can create bounds like 0<x<5. However, my constraints is an equation and not a constant bound. Thank you in advance!
If anyone ever stumble on this question, I've figured out how to overcome this issue. Since least_squares does not have constraints, it is best to just use linear programming with scipy.optimize.minimize. Since the loss_func returns an array of residuals, we can use L1 norm (as we want to minimize the absolute difference of this array of residuals).
from scipy.optimize import minimize
import numpy as np
def loss_func(x, *arguments):
# plug x's and args into an arbitrary equation and return loss (loss is an array)
return np.linalg.norm(loss, 1)
The bounds can be added to scipy.optimize.minimize fairly easily:)

In OpenMDAO's ExecComp, is shape_by_conn compatible with has_diag_partials?

I have an om.ExecComp that performs a simple operation:
"d_sq = x**2 + y**2"
where x, y, and d_sq are always 1D np.arrays. I'd like to be able to use this with large arrays without allocating a large dense matrix. I'd also like the length of the array to be configured based on the shape of the connections.
However, if I specify x={"shape_by_conn": True} rather than x={"shape":100000}, even if I also have has_diag_partials=True, it attempts to allocate a 100000^2 array. Is there a way to make these two options compatible?
First, I'll note that you're using ExecComp a bit outside its design intended purpose. Thats not to say that you're something totally invalid, but generally speaking ExecComp was designed for small, cheap calculations. Passing it giant arrays is not something we test for.
That being said, I think what you want will work. When you use shape_by_conn in this you need to be sure to size both your inputs and outputs. I've provided an example, along with a manually defined component that does the same thing. Since your equations are pretty simple, this would be a little faster overall.
import numpy as np
import openmdao.api as om
class SparseCalc(om.ExplicitComponent):
def setup(self):
self.add_input('x', shape_by_conn=True)
self.add_input('y', shape_by_conn=True)
self.add_output('d_sq', shape_by_conn=True, copy_shape='x')
def setup_partials(self):
# when using shape_by_conn, you need to delcare partials
# in this secondary method
md = self.get_io_metadata(iotypes='input')
# everything should be the same shape, so just need this one
x_shape = md['x']['shape']
row_col = np.arange(x_shape[0])
self.declare_partials('d_sq', 'x', rows=row_col, cols=row_col)
self.declare_partials('d_sq', 'y', rows=row_col, cols=row_col)
def compute(self, inputs, outputs):
outputs['d_sq'] = inputs['x']**2 + inputs['y']**2
def compute_partials(self, inputs, J):
J['d_sq', 'x'] = 2*inputs['x']
J['d_sq', 'y'] = 2*inputs['y']
if __name__ == "__main__":
p = om.Problem()
# use IVC here, because you have to have something connected to
# in order to use shape_by_conn. Normally IVC is not needed
ivc = p.model.add_subsystem('ivc', om.IndepVarComp(), promotes=['*'])
ivc.add_output('x', 3*np.ones(10))
ivc.add_output('y', 2*np.ones(10))
# p.model.add_subsystem('sparse_calc', SparseCalc(), promotes=['*'])
p.model.add_subsystem('sparse_exec_calc',
om.ExecComp('d_sq = x**2 + y**2',
x={'shape_by_conn':True},
y={'shape_by_conn':True},
d_sq={'shape_by_conn':True,
'copy_shape':'x'},
has_diag_partials=True),
promotes=['*'])
p.setup(force_alloc_complex=True)
p.run_model()
If you still find this isn't working as expected, please feel free to submit a bug report with a test case that shows the problem clearly (i.e. will raise the error you're seeing). In this case, the provided manual component can serve as a workaround.

NumPy vectorization with integration

I have a vector and wish to make another vector of the same length whose k-th component is
The question is: how can we vectorize this for speed? NumPy vectorize() is actually a for loop, so it doesn't count.
Veedrac pointed out that "There is no way to apply a pure Python function to every element of a NumPy array without calling it that many times". Since I'm using NumPy functions rather than "pure Python" ones, I suppose it's possible to vectorize, but I don't know how.
import numpy as np
from scipy.integrate import quad
ws = 2 * np.random.random(10) - 1
n = len(ws)
integrals = np.empty(n)
def f(x, w):
if w < 0: return np.abs(x * w)
else: return np.exp(x) * w
def temp(x): return np.array([f(x, w) for w in ws]).sum()
def integrand(x, w): return f(x, w) * np.log(temp(x))
## Python for loop
for k in range(n):
integrals[k] = quad(integrand, -1, 1, args = ws[k])[0]
## NumPy vectorize
integrals = np.vectorize(quad)(integrand, -1, 1, args = ws)[0]
On a side note, is a Cython for loop always faster than NumPy vectorization?
The function quad executes an adaptive algorithm, which means the computations it performs depend on the specific thing being integrated. This cannot be vectorized in principle.
In your case, a for loop of length 10 is a non-issue. If the program takes long, it's because integration takes long, not because you have a for loop.
When you absolutely need to vectorize integration (not in the example above), use a non-adaptive method, with the understanding that precision may suffer. These can be directly applied to a 2D NumPy array obtained by evaluating all of your functions on some regularly spaced 1D array (a linspace). You'll have to choose the linspace yourself since the methods aren't adaptive.
numpy.trapz is the simplest and least precise
scipy.integrate.simps is equally easy to use and more precise (Simpson's rule requires an odd number of samples, but the method works around having an even number, too).
scipy.integrate.romb is in principle of higher accuracy than Simpson (for smooth data) but it requires the number of samples to be 2**n+1 for some integer n.
#zaq's answer focusing on quad is spot on. So I'll look at some other aspects of the problem.
In recent https://stackoverflow.com/a/41205930/901925 I argue that vectorize is of most value when you need to apply the full broadcasting mechanism to a function that only takes scalar values. Your quad qualifies as taking scalar inputs. But you are only iterating on one array, ws. The x that is passed on to your functions is generated by quad itself. quad and integrand are still Python functions, even if they use numpy operations.
cython improves low level iteration, stuff that it can convert to C code. Your primary iteration is at a high level, calling an imported function, quad. Cython can't touch or rewrite that.
You might be able to speed up integrand (and on down) with cython, but first focus on getting the most speed from that with regular numpy code.
def f(x, w):
if w < 0: return np.abs(x * w)
else: return np.exp(x) * w
With if w<0 w must be scalar. Can it be written so it works with an array w? If so, then
np.array([f(x, w) for w in ws]).sum()
could be rewritten as
fn(x, ws).sum()
Alternatively, since both x and w are scalar, you might get a bit of speed improvement by using math.exp etc instead of np.exp. Same for log and abs.
I'd try to write f(x,w) so it takes arrays for both x and w, returning a 2d result. If so, then temp and integrand would also work with arrays. Since quad feeds a scalar x, that may not help here, but with other integrators it could make a big difference.
If f(x,w) can be evaluated on a regular nx10 grid of x=np.linspace(-1,1,n) and ws, then an integral (of sorts) just requires a couple of summations over that space.
You can use quadpy for fully vectorized computation. You'll have to adapt your function to allow for vector inputs first, but that is done rather easily:
import numpy as np
import quadpy
np.random.seed(0)
ws = 2 * np.random.random(10) - 1
def f(x):
out = np.empty((len(ws), *x.shape))
out0 = np.abs(np.multiply.outer(ws, x))
out1 = np.multiply.outer(ws, np.exp(x))
out[ws < 0] = out0[ws < 0]
out[ws >= 0] = out1[ws >= 0]
return out
def integrand(x):
return f(x) * np.log(np.sum(f(x), axis=0))
val, err = quadpy.quad(integrand, -1, +1, epsabs=1.0e-10)
print(val)
[0.3266534 1.44001826 0.68767868 0.30035222 0.18011948 0.97630376
0.14724906 2.62169217 3.10276876 0.27499376]

Exponential decay curve fitting in numpy and scipy

I'm having a bit of trouble with fitting a curve to some data, but can't work out where I am going wrong.
In the past I have done this with numpy.linalg.lstsq for exponential functions and scipy.optimize.curve_fit for sigmoid functions. This time I wished to create a script that would let me specify various functions, determine parameters and test their fit against the data. While doing this I noticed that Scipy leastsq and Numpy lstsq seem to provide different answers for the same set of data and the same function. The function is simply y = e^(l*x) and is constrained such that y=1 at x=0.
Excel trend line agrees with the Numpy lstsq result, but as Scipy leastsq is able to take any function, it would be good to work out what the problem is.
import scipy.optimize as optimize
import numpy as np
import matplotlib.pyplot as plt
## Sampled data
x = np.array([0, 14, 37, 975, 2013, 2095, 2147])
y = np.array([1.0, 0.764317544, 0.647136491, 0.070803763, 0.003630962, 0.001485394, 0.000495131])
# function
fp = lambda p, x: np.exp(p*x)
# error function
e = lambda p, x, y: (fp(p, x) - y)
# using scipy least squares
l1, s = optimize.leastsq(e, -0.004, args=(x,y))
print l1
# [-0.0132281]
# using numpy least squares
l2 = np.linalg.lstsq(np.vstack([x, np.zeros(len(x))]).T,np.log(y))[0][0]
print l2
# -0.00313461628963 (same answer as Excel trend line)
# smooth x for plotting
x_ = np.arange(0, x[-1], 0.2)
plt.figure()
plt.plot(x, y, 'rx', x_, fp(l1, x_), 'b-', x_, fp(l2, x_), 'g-')
plt.show()
Edit - additional information
The MWE above includes a small sample of the dataset. When fitting the actual data the scipy.optimize.curve_fit curve presents an R^2 of 0.82, while the numpy.linalg.lstsq curve, which is the same as that calculated by Excel, has an R^2 of 0.41.
You are minimizing different error functions.
When you use numpy.linalg.lstsq, the error function being minimized is
np.sum((np.log(y) - p * x)**2)
while scipy.optimize.leastsq minimizes the function
np.sum((y - np.exp(p * x))**2)
The first case requires a linear dependency between the dependent and independent variables, but the solution is known analitically, while the second can handle any dependency, but relies on an iterative method.
On a separate note, I cannot test it right now, but when using numpy.linalg.lstsq, I you don't need to vstack a row of zeros, the following works as well:
l2 = np.linalg.lstsq(x[:, None], np.log(y))[0][0]
To expound a bit on Jaime's point, any non-linear transformation of the data will lead to a different error function and hence to different solutions. These will lead to different confidence intervals for the fitting parameters. So you have three possible criteria to use to make a decision: which error you want to minimize, which parameters you want more confidence in, and finally, if you are using the fitting to predict some value, which method yields less error in the interesting predicted value. Playing around a bit analytically and in Excel suggests that different kinds of noise in the data (e.g. if the noise function scales the amplitude, affects the time-constant or is additive) leads to different choices of solution.
I'll also add that while this trick "works" for exponential decay to 0, it can't be used in the more general (and common) case of damped exponentials (rising or falling) to values that cannot be assumed to be 0.