I am trying to optimize a double loop for an N-body integrator, and I found that the bottleneck in my code is the massive overhead incurred when I write stored variables into the memoryview locations.
I originally had this code vectorized in numpy, but it was called inside another for loop to update the particle positions, and the overhead was brutal. I have an Nx2 np.ndarray of positions (X) and I want to return an Nx2 array of momenta (XOut). The current code listed below returns a memoryview instead, but that's OK because I'd eventually like to embed this function in another Cython function once I've debugged this bottleneck.
I tried the cython -a "name.pyx" command and found that more or less everything is already a C type. However, towards the bottom of the loop, the memoryview write XOut[ii,0] -= valuex incurs most of the run time. If I change it to a constant, XOut[ii,0] -= 5, the code is ~40X faster. I think this means some sort of copy operation on that line is slowing me down. My Cython/C++ background is rudimentary, but I suspect I need to change the syntax so that I'm writing into the memoryview through a pointer. Any advice would be greatly appreciated; thanks!
import numpy as np
cimport numpy as np
from cython.view cimport array as cvarray
cimport cython
from libc.math cimport sinh, cosh, sin, cos, acos, exp, sqrt, fabs, M_PI

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef DTYPE_t pi = 3.141592653589793

@cython.cdivision(True)
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def intTerms(const DTYPE_t[:,:] X, DTYPE_t epsilon, DTYPE_t[:,:] XOut):
    cdef Py_ssize_t ii,jj,N
    N = X.shape[0]
    cdef DTYPE_t valuex,valuey,r2,xvec,yvec
    for ii in range(0,N):
        for jj in range(ii+1,N):
            xvec = X[ii,0]-X[jj,0]
            yvec = X[ii,1]-X[jj,1]
            r2 = max(xvec**2+yvec**2,epsilon)
            valuex = xvec/r2**2
            valuey = yvec/r2**2
            XOut[ii,0] -= valuex
            XOut[ii,1] -= 5 #valuey
            XOut[jj,0] += 5 #valuex
            XOut[jj,1] += 5 #valuey
        # by the end of iteration ii, XOut[ii] has received all pair
        # contributions, so it can be normalized here
        XOut[ii,0] /= 2*pi
        XOut[ii,1] /= 2*pi
    return XOut
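For completeness, a minimal build script along these lines should compile the module (the file name integrator.pyx is a placeholder for whatever the .pyx is actually called):

# setup.py -- minimal Cython build sketch; "integrator.pyx" is a placeholder name
from setuptools import setup
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=cythonize("integrator.pyx"),
    include_dirs=[numpy.get_include()],  # needed for cimport numpy
)

Built with python setup.py build_ext --inplace.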
OK, so the issue was the mathematical operations: Cython doesn't optimize the ** operator, so I modified the code:
import numpy as np
cimport numpy as np
from cython.view cimport array as cvarray
cimport cython
from libc.math cimport sinh, cosh, sin, cos, acos, exp, sqrt, fabs, M_PI

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef DTYPE_t pi = 3.141592653589793

@cython.cdivision(True)
@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function
def intTerms(const DTYPE_t[:,:] X, DTYPE_t epsilon, DTYPE_t[:,:] XOut):
    cdef Py_ssize_t ii,jj,N
    N = X.shape[0]
    cdef DTYPE_t valuex,valuey,r2,xvec,yvec
    for ii in range(0,N-1):
        for jj in range(ii+1,N):
            xvec = X[ii,0]-X[jj,0]
            yvec = X[ii,1]-X[jj,1]
            r2 = max(xvec*xvec+yvec*yvec,epsilon)
            valuex = xvec/r2/r2
            valuey = yvec/r2/r2
            XOut[ii,0] -= valuex
            XOut[ii,1] -= valuey
            XOut[jj,0] += valuex
            XOut[jj,1] += valuey
        XOut[ii,0] /= 2*pi
        XOut[ii,1] /= 2*pi
    # with range(0,N-1) the outer loop never visits ii = N-1,
    # so the last particle still needs its division by 2*pi
    XOut[N-1,0] /= 2*pi
    XOut[N-1,1] /= 2*pi
    return XOut
Changing valuex from xvec/r2**2 to xvec/r2/r2 and removing all instances of the ** operator sped up the loop from 200ms to 9ms for an 1800x2 array. I am still hopeful that 4ms is possible, but I'll settle for 9ms for now.
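For anyone reproducing the timing, a minimal driver sketch (the module name integrator is hypothetical; note that intTerms accumulates into XOut, so it must be zeroed before each call):

import numpy as np
from integrator import intTerms  # hypothetical module name

N = 1800
X = np.random.rand(N, 2)
XOut = np.zeros((N, 2), dtype=np.float64)  # must be zeroed: intTerms accumulates into it
intTerms(X, 1e-6, XOut)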
I've created a dtype for my np.ndarrays:
particle_t = np.dtype([
    ('position', float, 2),
    ('momentum', float, 2),
    ('velocity', float, 2),
    ('force', float, 2),
    ('charge', int, 1),
])
According to the official examples one can call:
def my_func(np.ndarray[dtype, dim] particles):
but when I try to compile:
def tester(np.ndarray[particle_t, ndim = 1] particles):
I get the "Invalid type" error. Another possible usage I've seen is the memoryview syntax, like int[:]. Trying def tester(particle_t[:] particles): results in:
'particle_t' is not a type identifier.
How can I fix this?
Obviously, particle_t is not a type but a Python object as far as Cython is concerned.
It is similar to np.int32 being a Python object, and thus
def tester(np.ndarray[np.int32] particles):   #doesn't work!
    pass
won't work, you need to use the corresponding type, i.e. np.int32_t:
def tester(np.ndarray[np.int32_t] particles): #works!
    pass
But what is the corresponding type for particle_t? You need to create a packed struct that mirrors your numpy dtype. Here is a simplified version:
#Python code:
particle_t = np.dtype([
    ('position', np.float64, 2),  #It is better to specify the number of bytes exactly!
    ('charge', np.int32, 1),      #otherwise you might be surprised...
])
and corresponding Cython code:
%%cython
import numpy as np
cimport numpy as np

cdef packed struct cy_particle_t:
    np.float64_t position_x[2]
    np.int32_t charge

def tester(np.ndarray[cy_particle_t, ndim = 1] particles):
    print(particles[0])
Not only does it compile and load, it also works as advertised:
>>> t=np.zeros(2, dtype=particle_t)
>>> t[:]=42
>>> tester(t)
{'charge': 42, 'position_x': [42.0, 42.0]}
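The memoryview syntax from the question also works once the struct type exists -- a sketch for the same %%cython cell:

def tester_mv(cy_particle_t[:] particles):
    # inside Cython, particles[0] is typed as cy_particle_t,
    # so its fields can be accessed directly
    print(particles[0].charge)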
I have a memoryview on a numpy array and want to copy the content of another numpy array into it by using this memoryview:
import numpy as np
cimport numpy as np
cdef double[:,::1] test = np.array([[0,1],[2,3]], dtype=np.double)
test[...] = np.array([[4,5],[6,7]], dtype=np.double)
But why is this not possible? It keeps telling me:
TypeError: only length-1 arrays can be converted to Python scalars
It works fine if I copy from a memoryview to a memoryview, or from a numpy array to a numpy array, but how to copy from a numpy array to a memoryview?
These assignments work:
cdef double[:,::1] test2d = np.array([[0,1],[2,3],[4,5]], dtype=np.double)
cdef double[:,::1] temp = np.array([[4,5],[6,7]], dtype=np.double)
test2d[...] = 4
test2d[:,1] = np.array([5],dtype=np.double)
test2d[1:,:] = temp
print np.asarray(test2d)
displaying
[[ 4. 5.]
[ 4. 5.]
[ 6. 7.]]
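So the most direct fix for the snippet in the question is to bind the right-hand side to a memoryview first, turning the assignment into a supported memoryview-to-memoryview copy (a sketch):

cdef double[:,::1] test = np.array([[0,1],[2,3]], dtype=np.double)
cdef double[:,::1] src = np.array([[4,5],[6,7]], dtype=np.double)
test[...] = src  # memoryview-to-memoryview copy works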
I've added an answer at https://stackoverflow.com/a/30418422/901925 that uses this memoryview 'buffer' approach in an indented context.
import numpy as np
cimport numpy as np
import scipy.linalg as la   # assumed import for la.inv below

cpdef int testfunc1c(np.ndarray[np.float_t, ndim=2] A,
                     double [:,:] BView) except -1:
    cdef double[:,:] CView
    if np.isnan(A).any():
        return -1
    else:
        CView = la.inv(A)
        BView[...] = CView
        return 1
It doesn't perform the copy-less buffer assignment that the other poster wanted, but it is still an efficient memoryview copy.
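A usage sketch from the Python side (testfunc1c is cpdef, so it is callable directly; B is preallocated and receives the inverse):

A = np.random.rand(3, 3)
B = np.empty_like(A)
if testfunc1c(A, B) == 1:
    print np.asarray(B)  # B now holds la.inv(A)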
Suppose I have
a = np.zeros(2, dtype=[('a', np.int), ('b', np.float, 2)])
a[0] = (2,[3,4])
a[1] = (6,[7,8])
then I define the same Cython structure
import numpy as np
cimport numpy as np

cdef packed struct mystruct:
    np.int_t a
    np.float_t b[2]

def test_mystruct(mystruct[:] x):
    cdef:
        int k
        mystruct y
    for k in range(2):
        y = x[k]
        print y.a
        print y.b[0]
        print y.b[1]
after this, I run
test_mystruct(a)
and I get this error:
ValueError Traceback (most recent call last)
<ipython-input-231-df126299aef1> in <module>()
----> 1 test_mystruct(a)
_cython_magic_5119cecbaf7ff37e311b745d2b39dc32.pyx in _cython_magic_5119cecbaf7ff37e311b745d2b39dc32.test_mystruct (/auto/users/pwang/.cache/ipython/cython/_cython_magic_5119cecbaf7ff37e311b745d2b39dc32.c:1364)()
ValueError: Expected 1 dimension(s), got 1
My question is how to fix it? Thank you.
This pyx compiles and imports ok:
import numpy as np
cimport numpy as np

cdef packed struct mystruct:
    int a[2]   # change from plain int
    float b[2]
    int c

def test_mystruct(mystruct[:] x):
    cdef:
        int k
        mystruct y
    for k in range(2):
        y = x[k]
        print y.a
        print y.b[0]
        print y.b[1]

dt='2i,2f,i'
b=np.zeros((3,),dtype=dt)
test_mystruct(b)
I started with the test example mentioned in my comment, and played with your case. I think the key change was to define the first element of the packed structure to be int a[2]. So if any element is an array, the first must be an array too for the structure to be set up properly.
Clearly an error that the test file isn't catching.
Defining the element as int a[1] doesn't work, possibly because the dtype removes such a dimension:
In [47]: np.dtype([('a', np.int, 1), ('b', np.float, 2)])
Out[47]: dtype([('a', '<i4'), ('b', '<f8', (2,))])
Defining the dtype to get around this shouldn't be hard until the issue is raised and patched.
The struct could have a[1], but the array dtype would have to specify the size with a tuple: ('a','i',(1,)). ('a','i',1) would make the size ().
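A quick check showing that the tuple form preserves the length-1 dimension while the plain integer collapses it:

In [48]: np.dtype([('a', 'i', (1,)), ('b', np.float, 2)])
Out[48]: dtype([('a', '<i4', (1,)), ('b', '<f8', (2,))])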
If one of the struct arrays is 2d, it looks like all of them have to be:
cdef packed struct mystruct:
    int a[1][1]
    float b[2][1]
    int c[2][2]
https://github.com/cython/cython/blob/c4c2e3d8bd760386b26dbd6cffbd4e30ba0a7d13/tests/memoryview/numpy_memoryview.pyx
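Untested, but following the pattern above, a matching dtype for that 2d case should spell every subfield shape as a tuple:

dt2 = np.dtype([('a', 'i', (1, 1)),
                ('b', 'f', (2, 1)),
                ('c', 'i', (2, 2))])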
Stepping back a bit, I wonder what the point is of processing a complex structured array in cython. For some operations, wouldn't it work just as well to pass the fields as separate variables? For example, myfunc(a['a'], a['b']) instead of myfunc(a).
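A sketch of that simpler alternative, with typed field views instead of a struct (the signature is illustrative):

def myfunc(np.int_t[:] a, np.float_t[:, :] b):
    # work on plain typed views of the fields; on older numpy you may
    # need np.ascontiguousarray(a['a']) before passing the field in
    pass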
There is a general method to get the dtype for a c struct, but it involves a temporary variable:
cdef mystruct _tmp
dt = np.asarray(<mystruct[:1]>(&_tmp)).dtype
This requires at least numpy 1.5. See discussion here: https://github.com/scikit-learn/scikit-learn/pull/2298
I have a masked array which is used by matplotlib's plt.contourf to project a temperature contour on a global map. I was trying to smooth the contour, but unfortunately none of the proposed solutions seems to be able to handle masked arrays. I tested these solutions:

- scipy.ndimage.gaussian_filter
- moving averages
- scipy.ndimage.zoom

None of them works (they count in the masked values as well). Is there any way I can smooth my contour on a masked array?
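A minimal sketch of the failure mode: scipy.ndimage works on the underlying data and ignores the mask, so the fill values bleed into their unmasked neighbours:

import numpy as np
import numpy.ma as ma
import scipy.ndimage

a = ma.masked_equal(np.array([[1., 1e20], [1., 1.]]), 1e20)
print scipy.ndimage.gaussian_filter(a, 1)  # the huge fill value contaminates the result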
I have added this part after trying the proposed 'inpaint' solution, and the results were unchanged. Here is the code (in case it helps):
import Scientific.IO.NetCDF as S
import mpl_toolkits.basemap as bm
import numpy.ma as MA
import numpy as np
import matplotlib.pyplot as plt
import inpaint

def main():
    fileobj = S.NetCDFFile('Bias.ANN.tas_A1_1.nc', mode='r')

    # take the values
    set1 = {'time', 'lat', 'lon'}
    set2 = set(fileobj.variables.keys())
    set3 = set2 - set1
    datadim = set3.pop()
    print "******************datadim: "+datadim
    data = fileobj.variables[datadim].getValue()[0,:,:]
    lon = fileobj.variables['lon'].getValue()
    lat = fileobj.variables['lat'].getValue()
    fileobj.close()

    data, lon = bm.shiftgrid(180., data, lon, start=False)
    data = MA.masked_equal(data, 1.0e20)
    #data2 = inpaint.replace_nans(data, 10, 0.25, 2, 'idw')

    #- Make 2-D longitude and latitude arrays:
    [lon2d, lat2d] = np.meshgrid(lon, lat)

    #- Set up map:
    mapproj = bm.Basemap(projection='cyl',
                         llcrnrlat=-90.0, llcrnrlon=-180.00,
                         urcrnrlat=90.0, urcrnrlon=180.0)
    mapproj.drawcoastlines(linewidth=.5)
    mapproj.drawmapboundary(fill_color='.8')
    #mapproj.drawparallels(N.array([-90, -45, 0, 45, 90]), labels=[1,0,0,0])
    #mapproj.drawmeridians(N.array([0, 90, 180, 270, 360]), labels=[0,0,0,1])
    lonall, latall = mapproj(lon2d, lat2d)
    cmap = plt.cm.Spectral

    #- Make a contour plot of the temperature:
    mymapf = plt.contourf(lonall, latall, data, 20, cmap=cmap)
    #plt.clabel(mymapf, fontsize=12)
    plt.title(cmap.name)
    plt.colorbar(mymapf, orientation='horizontal')
    plt.savefig('sample2.png', dpi=150, edgecolor='red', format='png',
                bbox_inches='tight', pad_inches=.2)
    plt.close()

if __name__ == "__main__":
    main()
I am comparing the output from this code (the first figure) with the output for the same data file from Panoply. Zooming in and looking more closely, it seems it is not a smoothness problem: the pyplot plot is one stripe slimmer, or its contours are cut off earlier (the outer boundaries show this clearly, and the inner contours differ because of it). This makes the pyplot plot look less smooth than the Panoply one. How can I get (nearly) the same plot? Am I interpreting this correctly?
I had a similar problem and Google pointed me to this blog post. Basically he is using an inpainting algorithm to interpolate missing values and produce a valid array for filtering.
The code is at the end of the post, and you can save it to site-packages (or elsewhere) and load it as a module (i.e. inpaint.py):
import inpaint
filled = inpaint.replace_nans(NANMask, 5, 0.5, 2, 'idw')
I'm happy with the result, and I guess it will suit missing temperature values just fine. There is also a newer version here: github, but the code will need some cleaning for general usage, as it's part of a project.
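Note that replace_nans operates on NaNs, not on masked arrays, so a masked array has to be converted first -- a sketch, where data is the masked array from the question:

import numpy as np
import inpaint

data_nan = data.filled(np.nan)  # turn masked points into NaNs
filled = inpaint.replace_nans(data_nan, 5, 0.5, 2, 'idw')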
For reference, ease of use, and preservation's sake, I'll post the code (of the initial version) here:
# -*- coding: utf-8 -*-
"""A module for various utilities and helper functions"""

import numpy as np
#cimport numpy as np
#cimport cython

DTYPEf = np.float64
#ctypedef np.float64_t DTYPEf_t

DTYPEi = np.int32
#ctypedef np.int32_t DTYPEi_t

#@cython.boundscheck(False) # turn off bounds-checking for entire function
#@cython.wraparound(False)  # turn off negative index wrapping for entire function
def replace_nans(array, max_iter, tol, kernel_size=1, method='localmean'):
    """Replace NaN elements in an array using an iterative image inpainting algorithm.

    The algorithm is the following:

    1) For each element in the input array, replace it by a weighted average
       of the neighbouring elements which are not NaN themselves. The weights
       depend on the method type. If ``method=localmean``, weights are equal
       to 1/((2*kernel_size+1)**2 - 1).

    2) Several iterations are needed if there are adjacent NaN elements.
       If this is the case, information is "spread" from the edges of the
       missing regions iteratively, until the variation is below a certain
       threshold.

    Parameters
    ----------
    array : 2d np.ndarray
        an array containing NaN elements that have to be replaced
    max_iter : int
        the number of iterations
    tol : float
        the convergence tolerance on the mean square difference between
        successively replaced values
    kernel_size : int
        the size of the kernel, default is 1
    method : str
        the method used to replace invalid values. Valid options are
        ``localmean``, ``idw``.

    Returns
    -------
    filled : 2d np.ndarray
        a copy of the input array, where NaN elements have been replaced.
    """
    # cdef int i, j, I, J, it, n, k, l
    # cdef int n_invalids

    filled = np.empty([array.shape[0], array.shape[1]], dtype=DTYPEf)
    kernel = np.empty((2*kernel_size+1, 2*kernel_size+1), dtype=DTYPEf)

    # cdef np.ndarray[np.int_t, ndim=1] inans
    # cdef np.ndarray[np.int_t, ndim=1] jnans

    # indices where array is NaN
    inans, jnans = np.nonzero(np.isnan(array))

    # number of NaN elements
    n_nans = len(inans)

    # arrays which contain replaced values to check for convergence
    replaced_new = np.zeros(n_nans, dtype=DTYPEf)
    replaced_old = np.zeros(n_nans, dtype=DTYPEf)

    # depending on kernel type, fill kernel array
    if method == 'localmean':
        print 'kernel_size', kernel_size
        for i in range(2*kernel_size+1):
            for j in range(2*kernel_size+1):
                kernel[i, j] = 1
        print kernel, 'kernel'
    elif method == 'idw':
        # note: the fourth row read [0.5,0.75,0.75,0.5,1] in the original
        # post, which breaks the kernel's symmetry and looks like a typo
        kernel = np.array([[0,   0.5,  0.5,  0.5,  0],
                           [0.5, 0.75, 0.75, 0.75, 0.5],
                           [0.5, 0.75, 1,    0.75, 0.5],
                           [0.5, 0.75, 0.75, 0.75, 0.5],
                           [0,   0.5,  0.5,  0.5,  0]])
        print kernel, 'kernel'
    else:
        raise ValueError("method not valid. Should be one of 'localmean', 'idw'.")

    # fill new array with input elements
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            filled[i, j] = array[i, j]

    # make several passes
    # until we reach convergence
    for it in range(max_iter):
        print 'iteration', it
        # for each NaN element
        for k in range(n_nans):
            i = inans[k]
            j = jnans[k]

            # initialize to zero
            filled[i, j] = 0.0
            n = 0

            # loop over the kernel
            for I in range(2*kernel_size+1):
                for J in range(2*kernel_size+1):
                    # if we are not out of the boundaries
                    if i+I-kernel_size < array.shape[0] and i+I-kernel_size >= 0:
                        if j+J-kernel_size < array.shape[1] and j+J-kernel_size >= 0:
                            # if the neighbour element is not NaN itself
                            # (NaN != NaN, so this comparison detects NaNs)
                            if filled[i+I-kernel_size, j+J-kernel_size] == filled[i+I-kernel_size, j+J-kernel_size]:
                                # do not sum itself: skip only the centre element
                                # (the original used `and`, which wrongly skips
                                # the whole centre row and column)
                                if I-kernel_size != 0 or J-kernel_size != 0:
                                    # convolve kernel with original array
                                    filled[i, j] = filled[i, j] + filled[i+I-kernel_size, j+J-kernel_size]*kernel[I, J]
                                    n = n + 1*kernel[I, J]

            # divide value by effective number of added elements
            if n != 0:
                filled[i, j] = filled[i, j] / n
                replaced_new[k] = filled[i, j]
            else:
                filled[i, j] = np.nan

        # check if mean square difference between values of replaced
        # elements is below a certain tolerance
        print 'tolerance', np.mean((replaced_new-replaced_old)**2)
        if np.mean((replaced_new-replaced_old)**2) < tol:
            break
        else:
            for l in range(n_nans):
                replaced_old[l] = replaced_new[l]

    return filled
def sincinterp(image, x, y, kernel_size=3):
    """Re-sample an image at intermediate positions between pixels.

    This function uses a cardinal interpolation formula which limits
    the loss of information in the resampling process. It uses a limited
    number of neighbouring pixels.

    The new image :math:`im^+` at fractional locations :math:`x` and
    :math:`y` is computed as:

    .. math::

        im^+(x,y) = \sum_{i=-\mathtt{kernel\_size}}^{i=\mathtt{kernel\_size}}
                    \sum_{j=-\mathtt{kernel\_size}}^{j=\mathtt{kernel\_size}}
                    \mathtt{image}(i,j)
                    \frac{\sin[\pi(i-\mathtt{x})]}{\pi(i-\mathtt{x})}
                    \frac{\sin[\pi(j-\mathtt{y})]}{\pi(j-\mathtt{y})}

    Parameters
    ----------
    image : np.ndarray, dtype np.int32
        the image array.
    x : two dimensions np.ndarray of floats
        an array containing fractional pixel row
        positions at which to interpolate the image
    y : two dimensions np.ndarray of floats
        an array containing fractional pixel column
        positions at which to interpolate the image
    kernel_size : int
        interpolation is performed over a ``(2*kernel_size+1)*(2*kernel_size+1)``
        submatrix in the neighbourhood of each interpolation point.

    Returns
    -------
    im : np.ndarray, dtype np.float64
        the interpolated value of ``image`` at the points specified
        by ``x`` and ``y``
    """
    # indices
    # cdef int i, j, I, J

    # the output array
    r = np.zeros([x.shape[0], x.shape[1]], dtype=DTYPEf)

    # pi (the original read 3.1419, labelled "fast pi", which is simply wrong)
    pi = 3.141592653589793

    # for each point of the output array
    for I in range(x.shape[0]):
        for J in range(x.shape[1]):
            # loop over all neighbouring grid points
            for i in range(int(x[I, J])-kernel_size, int(x[I, J])+kernel_size+1):
                for j in range(int(y[I, J])-kernel_size, int(y[I, J])+kernel_size+1):
                    # check that we are within the boundaries
                    # (the original used <=, which runs one pixel past the edge)
                    if i >= 0 and i < image.shape[0] and j >= 0 and j < image.shape[1]:
                        if (i-x[I, J]) == 0.0 and (j-y[I, J]) == 0.0:
                            r[I, J] = r[I, J] + image[i, j]
                        elif (i-x[I, J]) == 0.0:
                            r[I, J] = r[I, J] + image[i, j] * np.sin(pi*(j-y[I, J]))/(pi*(j-y[I, J]))
                        elif (j-y[I, J]) == 0.0:
                            r[I, J] = r[I, J] + image[i, j] * np.sin(pi*(i-x[I, J]))/(pi*(i-x[I, J]))
                        else:
                            r[I, J] = r[I, J] + image[i, j] * np.sin(pi*(i-x[I, J]))*np.sin(pi*(j-y[I, J]))/(pi*pi*(i-x[I, J])*(j-y[I, J]))
    return r
#cdef extern from "math.h":
# double sin(double)
A simple smoothing function that works with masked data will solve this. One can then avoid the approaches that involve making up data (i.e., interpolating, inpainting, etc.); making up data should always be avoided.
The main issue that arises when smoothing masked data is that, for each point, smoothing uses the neighboring values to calculate a new value at the center point, but when those neighbors are masked, the new value for the center point will also become masked due to the rules of masked arrays. Therefore, one needs to do the calculation with unmasked data while explicitly accounting for the mask. That's easy to do, and is done in the function smooth below.
from numpy import *
import pylab as plt

# make a grid and a striped mask as test data
N = 100
x = linspace(0, 5, N, endpoint=True)
grid = 2. + 1.*(sin(2*pi*x)[:,newaxis]*sin(2*pi*x)>0.)
m = resize((sin(pi*x)>0), (N,N))

plt.imshow(grid.copy(), cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('original data')

def smooth(u, mask):
    m = ~mask
    r = u*m  # set all 'masked' points to 0. so they aren't used in the smoothing
    a = 4*r[1:-1,1:-1] + r[2:,1:-1] + r[:-2,1:-1] + r[1:-1,2:] + r[1:-1,:-2]
    b = 4*m[1:-1,1:-1] + m[2:,1:-1] + m[:-2,1:-1] + m[1:-1,2:] + m[1:-1,:-2]  # a divisor that accounts for masked points
    b[b==0] = 1.  # avoid divide-by-0 error (region is masked so the value doesn't matter)
    u[1:-1,1:-1] = a/b

# run the data through the smoothing filter a few times
for i in range(10):
    smooth(grid, m)

mg = ma.array(grid, mask=m)  # put together the mask and the data

plt.figure()
plt.imshow(mg, cmap='jet', interpolation='nearest')
plt.colorbar()
plt.title('smoothed with mask')
plt.show()
The main point is that at the boundary of the mask, the masked values are not used in the smoothing. (This is also where the grid squares switch values, so it would be clear in the figure if the masked neighboring values were being used.)
We also just had this problem and the astropy package has us covered: astropy.convolution.convolve treats masked (or NaN) points as missing data and fills them by interpolating from their neighbours with the kernel, so the mask does not contaminate the smoothed result:
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
# importing the convolve function directly is clearer than the original
# "import astropy.convolution.convolve"; the behaviour is the same
from astropy.convolution import convolve, Gaussian2DKernel

# Some axes
x = np.arange(100)
y = np.arange(100)

# Some interesting shape
z = np.array(np.outer(np.sin((x+y)/10), np.sin(y/3)), dtype=float)

# Some mask
mask = np.outer(np.sin((x+y)/20), np.sin(y/5))**2 > .9

# masked data represent noise, so let's put some trash into the masked points
z[mask] = (np.random.random(size=(100, 100))*10)[mask]

# masked data
z_masked = np.ma.masked_array(z, mask)

# "Conventional" filter
filter_kernelsize = 2
z_filtered_bad = scipy.ndimage.gaussian_filter(z_masked, filter_kernelsize)

# Let's filter it with astropy instead
k = Gaussian2DKernel(1.5)
z_filtered = convolve(z_masked, k, boundary='extend')

### Plots:
fig, axes = plt.subplots(2, 2)
plt.sca(axes[0, 0])
plt.title('Raw Data')
plt.imshow(z)
plt.colorbar()

plt.sca(axes[0, 1])
plt.title('Raw Data Masked')
plt.imshow(z_masked)
plt.colorbar()

plt.sca(axes[1, 0])
plt.title('ndimage filter (ignores mask)')
plt.imshow(z_filtered_bad)
plt.colorbar()

plt.sca(axes[1, 1])
plt.title('astropy filter (uses mask)')
plt.imshow(z_filtered)
plt.colorbar()

plt.tight_layout()
plt.show()
[Output plot of the code]