How can I make scipy.odeint faster?

I am currently solving a system of 559 non-linear differential equations. I have to fit the solutions obtained to some experimental data by varying the constants c1, c2, b and g.
I am using scipy.odeint and I would like to know if there is a way to make my program faster as it takes ages to run.
The code is this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import random as rd
from numba import jit
L=np.loadtxt('C:/Users/Pablo/Desktop/TFG/Probas/matriz_L_Pablo.txt')
I=np.loadtxt('C:/Users/Pablo/Desktop/TFG/Probas/vector_I_Pablo.txt')
k=np.diag(L)
n=len(k)  # count the number of nodes
u=np.zeros(n)
for i in range(n):
    u[i]=rd.random()
M=np.zeros((n,n))
derivs=np.zeros(n)
c1=100 ; c2=10000 ; b=0.01 ; g=1

#jit
def f(y,t,params):
    suma=0
    c1,c2,b,g=params
    for i in range(n):
        for j in range(n):
            if i==j:
                M[i,i]=(1-y[i]/b)+g*(1-y[i])+c2*I[i]*(1/n-1)
            if i!=j:
                M[i,j]=(1/n)*(c1*L[i,j]+c2*I[i])
            out=(M[i,j]*y[j])
            suma=suma+out
        derivs[i]=suma
        suma=0
    return derivs

# initial conditions
y0=u
# list with the parameters
params=[c1,c2,b,g]
# integration times
tf=1
deltat=0.001
t=np.arange(0,tf,deltat)
# solution
sol=odeint(f, y0, t, args=(params,))
(Sorry if it is not very clear; it's my first time here.)

You can try vectorizing your code. The function f does two things: first it builds the matrix M, and then it performs the multiplication $$My$$. The multiplication $$My$$ is easy to vectorize because all we have to do is use NumPy's matmul function.
def f(y,t,params):
    c1,c2,b,g=params
    for i in range(n):
        for j in range(n):
            if i==j:
                M[i,i]=(1-y[i]/b)+g*(1-y[i])+c2*I[i]*(1/n-1)
            if i!=j:
                M[i,j]=(1/n)*(c1*L[i,j]+c2*I[i])
    return np.matmul(M,y)
That should help with runtime a bit, but the most time-consuming part is that the entire matrix M is rebuilt every time f is called, and that it is built one element at a time.
The only parts of M that need to be modified when f is called are the parts that depend on y, so all of the off-diagonal entries of M can be filled in before the ODE solver is called. If M is 559x559, instead of having to recalculate all ~312,000 elements of M every time f is called, you would only have to recalculate the 559 elements on the diagonal. The remaining entries of M don't depend on y and are specified before calling the ODE solver. Making this modification should result in a huge speedup, as this seems to be the main bottleneck in your code.
Lastly, you could also vectorize how the diagonal of M is filled by using something like numpy.diag_indices; a rough sketch combining both ideas follows.
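A minimal sketch of what that might look like (not a drop-in replacement: it assumes L, I and n are defined as in the question, and that c1 and c2 stay fixed while odeint runs, since they enter the precomputed off-diagonal part):
M = (1/n)*(c1*L + c2*I[:, None])   # off-diagonal part, built once before calling odeint
di = np.diag_indices(n)

def f(y, t, params):
    c1, c2, b, g = params
    # only the diagonal depends on y, so only it is refilled on each call
    M[di] = (1 - y/b) + g*(1 - y) + c2*I*(1/n - 1)
    return np.matmul(M, y)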

Related

numpy - find all pixels near a set of pixels

I have a PIL.Image object input of mode '1' (a black & white bitmap) and I would like to determine, for every pixel in the image, whether it's within n pixels (Euclidean distance - n may be around 100 or so) of any of the white pixels.
The motivation is: input represents every pixel that is different between two other images, and I would like to create a highlight region around all those differences to show clearly where the differences occur.
So far I haven't been able to find a fast algorithm for this - the following code works, but the convolution is very slow because the kernel argument is larger than the convolution can apparently handle efficiently:
from scipy import ndimage
import numpy as np
from PIL import Image
n = 100
y, x = np.ogrid[:2*n, :2*n]
kernel = (x-n)**2 + (y-n)**2 <= n**2
img = Image.open('input.png')
result = ndimage.convolve(np.array(img), kernel) != 0
Image.fromarray(result).save('result.png')
Example input input.png:
Desired output result.png (there are also some undesired artifacts here that I assume come from over/underflow):
Even with these small images, the computation takes 30 seconds or so.
Can someone recommend a better procedure to compute this? Thanks.
ndimage.convolve uses a very inefficient algorithm to perform the convolution, certainly running in O(n m k_n k_m) time, where (n, m) is the shape of the image and (k_n, k_m) is the shape of the kernel. You can use an FFT to do this much more efficiently, in O(n m log(n m)) time. Fortunately, SciPy provides such a function. Here is an example of usage:
import scipy.signal
import numpy as np
from PIL import Image
n = 100
y, x = np.ogrid[:2*n, :2*n]
kernel = (x-n)**2 + (y-n)**2 <= n**2
img = Image.open('input.png')
result = scipy.signal.fftconvolve(img, kernel, mode='same') >= 1.0
Image.fromarray(result).save('result.png')
This is >500 times faster on my machine and it also fixes the artifacts. Here is the result:

Fastest way to find nearest nonzero value in array from columns in pandas dataframe

I am looking for the nearest nonzero cell in a numpy 3d array based on the i,j,k coordinates stored in a pandas dataframe. My solution below works, but it is slower than I would like. I know my optimization skills are lacking, so I am hoping someone can help me find a faster option.
It takes 2 seconds to find the nearest non-zero for a 100 x 100 x 100 binary array, and I have hundreds of files, so any speed enhancements would be much appreciated!
import numpy as np
import pandas as pd

a=np.random.randint(0,2,size=(100,100,100))
# df with i,j,k of interest
df=pd.DataFrame(np.random.randint(100,size=(100,3)).tolist(),
                columns=['i','j','k'])

def find_nearest(a,df):
    import numpy as np
    import pandas as pd
    import time
    t0=time.time()
    nzi = np.nonzero(a)
    for i,r in df.iterrows():
        dist = ((r['k'] - nzi[0])**2 + \
                (r['i'] - nzi[1])**2 + \
                (r['j'] - nzi[2])**2)
        nidx = dist.argmin()
        df.loc[i,['nk','ni','nj']]=(nzi[0][nidx],
                                    nzi[1][nidx],
                                    nzi[2][nidx])
    print(time.time()-t0)
    return(df)
The problem that you are trying to solve looks like a nearest-neighbor search.
The worst-case complexity of the current code is O(n m), with n the number of points to search and m the number of neighbour candidates. With n = 100 and m = 100**3 = 1,000,000, this means up to about a hundred million iterations. To solve this efficiently, one can use a better algorithm.
The common way to solve this kind of problem is to put all the candidate points in a tree data structure (such as a quadtree or an octree). Such a data structure lets you locate the nearest elements to a given location in O(log(m)) time. As a result, the overall complexity of this method is O(n log(m))! SciPy already implements KD-trees.
Vectorization generally also helps to speed up the computation.
def find_nearest_fast(a,df):
    from scipy.spatial import KDTree
    import numpy as np
    import pandas as pd
    import time
    t0=time.time()
    candidates = np.array(np.nonzero(a)).transpose().copy()
    tree = KDTree(candidates, leafsize=1024, compact_nodes=False)
    searched = np.array([df['k'], df['i'], df['j']]).transpose()
    distances, indices = tree.query(searched)
    nearestPoints = candidates[indices,:]
    df[['nk', 'ni', 'nj']] = nearestPoints
    print(time.time()-t0)
    return df
This implementation is 16 times faster on my machine. Note that the results may differ a bit, since a given input point can have several nearest candidates at the same distance.

Can't minimize function

I just want to minimize a simple function; every example I've watched didn't get me anywhere.
import math
import numpy as np
import sympy as sp
from scipy.optimize import minimize
import scipy.optimize as optimize
R=1.5
k_1=2
a=1
n=a
alpha=0.25
beta=0.5
delta=0.9
def f_gob(x, y, z):
    c_1=((1/x-y/x)+R*k_1)/(1+delta*(1+alpha))
    c_2=delta*x*(((1/x-y/x)+R*k_1)/(1+delta*(1+alpha)))
    l=n-(alpha*(delta*x*(((1/x-y/x)+R*k_1)/((1+delta*(1+alpha))))))/(1-y)
    return -1*(math.log(c_1)+delta*(math.log(c_2)+alpha*math.log(n-l)+beta*math.log(z)))

f_gob(0.9996,0.332,0.7765)
x0 = [0.8,0.2,0.6]
res = minimize(f_gob, x0)
Thank you very much.
Better is:
def f_gob(a):
    x = a[0]
    y = a[1]
    z = a[2]
    c_1=((1/x-y/x)+R*k_1)/(1+delta*(1+alpha))
    c_2=delta*x*c_1
    l=n-(alpha*c_2)/(1-y)
    return -1*(math.log(c_1)+delta*(math.log(c_2)+alpha*math.log(n-l)+beta*math.log(z)))

f_gob([0.9996,0.332,0.7765])
The main issue is that the current values of the three decision variables x, y, z are passed in as a single array, which I call a. I just unpack the individual members to keep things close to what you had. Passing things in as an array makes sense, especially if you want to allow for large numbers of variables (say, hundreds).
For further information see the scipy.optimize.minimize documentation: the third sentence explains the format of the function to be called. Also check the examples. A minimal call with the array-based f_gob is sketched below.
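For completeness, a sketch of the call with the array-based f_gob (Nelder-Mead is just one reasonable choice here; if the optimizer wanders into a region where a log argument becomes non-positive, you may need bounds or a different method):
from scipy.optimize import minimize

x0 = [0.8, 0.2, 0.6]                      # starting point from the question
res = minimize(f_gob, x0, method='Nelder-Mead')
print(res.x, res.fun)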

Locally weighted smoothing for binary valued random variable

I have a random variable as follows:
f(x) = 1 with probability g(x)
f(x) = 0 with probability 1-g(x)
where 0 < g(x) < 1.
Assume g(x) = x. Let's say I am observing this variable without knowing the function g and obtained 200 samples as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic
list = np.ndarray(shape=(200,2))
g = np.random.rand(200)
for i in range(len(g)):
    list[i] = (g[i], np.random.choice([0, 1], p=[1-g[i], g[i]]))
print(list)
plt.plot(list[:,0], list[:,1], 'o')
Plot of 0s and 1s
Now, I would like to retrieve the function g from these points. The best I could think of is to draw a histogram and use the mean statistic:
bin_means, bin_edges, bin_number = binned_statistic(list[:,0], list[:,1], statistic='mean', bins=10)
plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], lw=2)
Histogram mean statistics
Instead, I would like to have a continuous estimation of the generating function.
I guess it is about kernel density estimation but I could not find the appropriate pointer.
This is straightforward without explicitly fitting an estimator:
import seaborn as sns
g = sns.lmplot(x= , y= , y_jitter=.02 , logistic=True)
Plug in x = your exogenous variable and, analogously, y = your dependent variable. y_jitter jitters the points for better visibility if you have a lot of data points; logistic=True is the main point here, as it will give you the logistic regression line of the data.
Seaborn is built around matplotlib and works great with pandas, in case you want to put your data into a DataFrame; a concrete sketch with the data from the question follows.
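For instance, a sketch using the samples generated in the question, wrapped in a DataFrame (the column names 'x' and 'y' are arbitrary choices, and logistic=True needs statsmodels installed):
import numpy as np
import pandas as pd
import seaborn as sns

samples = np.ndarray(shape=(200, 2))
g = np.random.rand(200)
for i in range(len(g)):
    samples[i] = (g[i], np.random.choice([0, 1], p=[1 - g[i], g[i]]))

df = pd.DataFrame(samples, columns=['x', 'y'])
sns.lmplot(x='x', y='y', data=df, y_jitter=.02, logistic=True)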

NumPy vectorization with integration

I have a vector ws and wish to make another vector of the same length whose k-th component is
$$\int_{-1}^{1} f(x, w_k)\,\log\Big(\sum_{j} f(x, w_j)\Big)\,dx,$$
which is exactly what the loop in the code below computes.
The question is: how can we vectorize this for speed? NumPy vectorize() is actually a for loop, so it doesn't count.
Veedrac pointed out that "There is no way to apply a pure Python function to every element of a NumPy array without calling it that many times". Since I'm using NumPy functions rather than "pure Python" ones, I suppose it's possible to vectorize, but I don't know how.
import numpy as np
from scipy.integrate import quad
ws = 2 * np.random.random(10) - 1
n = len(ws)
integrals = np.empty(n)
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w

def temp(x): return np.array([f(x, w) for w in ws]).sum()
def integrand(x, w): return f(x, w) * np.log(temp(x))

## Python for loop
for k in range(n):
    integrals[k] = quad(integrand, -1, 1, args = ws[k])[0]
## NumPy vectorize
integrals = np.vectorize(quad)(integrand, -1, 1, args = ws)[0]
On a side note, is a Cython for loop always faster than NumPy vectorization?
The function quad executes an adaptive algorithm, which means the computations it performs depend on the specific thing being integrated. This cannot be vectorized in principle.
In your case, a for loop of length 10 is a non-issue. If the program takes long, it's because integration takes long, not because you have a for loop.
When you absolutely need to vectorize integration (not in the example above), use a non-adaptive method, with the understanding that precision may suffer. These methods can be applied directly to a 2D NumPy array obtained by evaluating all of your functions on some regularly spaced 1D array (a linspace); a sketch follows the list below. You'll have to choose the linspace yourself since the methods aren't adaptive.
numpy.trapz is the simplest and least precise
scipy.integrate.simps is equally easy to use and more precise (Simpson's rule requires an odd number of samples, but the method works around having an even number, too).
scipy.integrate.romb is in principle of higher accuracy than Simpson (for smooth data) but it requires the number of samples to be 2**n+1 for some integer n.
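For illustration, a minimal sketch of the fixed-sample approach with numpy.trapz, reusing f and ws from the question (the grid size 1001 is an arbitrary choice, and the log assumes the summed curve stays positive on the grid, as in the original problem):
import numpy as np

ws = 2 * np.random.random(10) - 1
xs = np.linspace(-1, 1, 1001)                   # fixed sample grid, chosen by hand

def f(x, w):
    # same piecewise definition as in the question; x may be an array here
    return np.abs(x * w) if w < 0 else np.exp(x) * w

fx = np.array([f(xs, w) for w in ws])           # shape (10, 1001): one row per weight
log_sum = np.log(fx.sum(axis=0))                # log(temp(x)) evaluated on the grid
integrals = np.trapz(fx * log_sum, xs, axis=1)  # one integral per weight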
@zaq's answer focusing on quad is spot on, so I'll look at some other aspects of the problem.
In a recent answer (https://stackoverflow.com/a/41205930/901925) I argue that vectorize is of most value when you need to apply the full broadcasting mechanism to a function that only takes scalar values. Your quad qualifies as taking scalar inputs, but you are only iterating over one array, ws. The x that is passed to your functions is generated by quad itself. quad and integrand are still Python functions, even if they use numpy operations.
Cython improves low-level iteration, stuff that it can convert to C code. Your primary iteration is at a high level, calling an imported function, quad. Cython can't touch or rewrite that.
You might be able to speed up integrand (and on down) with Cython, but first focus on getting the most speed out of it with regular numpy code.
def f(x, w):
    if w < 0: return np.abs(x * w)
    else: return np.exp(x) * w
With the test if w < 0, w must be a scalar. Can it be written so it works with an array w? If so, then
np.array([f(x, w) for w in ws]).sum()
could be rewritten as
fn(x, ws).sum()
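For example, one possible array-capable version using np.where (just a sketch; the name fn is arbitrary):
import numpy as np

ws = 2 * np.random.random(10) - 1

def fn(x, w):
    # w may now be an array; x is still the scalar that quad passes in
    return np.where(w < 0, np.abs(x * w), np.exp(x) * w)

def temp(x):
    return fn(x, ws).sum()   # replaces the list comprehension over ws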
Alternatively, since both x and w are scalar, you might get a bit of speed improvement by using math.exp etc instead of np.exp. Same for log and abs.
I'd try to write f(x,w) so it takes arrays for both x and w, returning a 2d result. If so, then temp and integrand would also work with arrays. Since quad feeds a scalar x, that may not help here, but with other integrators it could make a big difference.
If f(x,w) can be evaluated on a regular nx10 grid of x=np.linspace(-1,1,n) and ws, then an integral (of sorts) just requires a couple of summations over that space.
You can use quadpy for fully vectorized computation. You'll have to adapt your function to allow for vector inputs first, but that is done rather easily:
import numpy as np
import quadpy
np.random.seed(0)
ws = 2 * np.random.random(10) - 1
def f(x):
    out = np.empty((len(ws), *x.shape))
    out0 = np.abs(np.multiply.outer(ws, x))
    out1 = np.multiply.outer(ws, np.exp(x))
    out[ws < 0] = out0[ws < 0]
    out[ws >= 0] = out1[ws >= 0]
    return out

def integrand(x):
    return f(x) * np.log(np.sum(f(x), axis=0))
val, err = quadpy.quad(integrand, -1, +1, epsabs=1.0e-10)
print(val)
[0.3266534 1.44001826 0.68767868 0.30035222 0.18011948 0.97630376
0.14724906 2.62169217 3.10276876 0.27499376]