Copy Numpy array to a memoryview - numpy

I have a memoryview on a numpy array and want to copy the content of another numpy array into it by using this memoryview:
import numpy as np
cimport numpy as np
cdef double[:,::1] test = np.array([[0,1],[2,3]], dtype=np.double)
test[...] = np.array([[4,5],[6,7]], dtype=np.double)
But why is this not possible? It keeps me telling
TypeError: only length-1 arrays can be converted to Python scalars
Blockquote
It works fine if I copy from a memoryview to a memoryview, or from a numpy array to a numpy array, but how to copy from a numpy array to a memoryview?

These assignments work:
cdef double[:,::1] test2d = np.array([[0,1],[2,3],[4,5]], dtype=np.double)
cdef double[:,::1] temp = np.array([[4,5],[6,7]], dtype=np.double)
test2d[...] = 4
test2d[:,1] = np.array([5],dtype=np.double)
test2d[1:,:] = temp
print np.asarray(test2d)
displaying
[[ 4. 5.]
[ 4. 5.]
[ 6. 7.]]
I've added an answer at https://stackoverflow.com/a/30418422/901925 that uses this memoryview 'buffer' approach in a indented context.
cpdef int testfunc1c(np.ndarray[np.float_t, ndim=2] A,
double [:,:] BView) except -1:
cdef double[:,:] CView
if np.isnan(A).any():
return -1
else:
CView = la.inv(A)
BView[...] = CView
return 1
It doesn't perform the copy-less buffer assignment that the other poster wanted, but it is still an efficient memoryview copy.

Related

Unable to assign values to numpy array in cython

I am fairly new to parallel programming in cython and I was trying to create a 1D array of size 3 from numpy, however I am unable to assign values to this array unless I specify it element by element.
import numpy as np
cimport numpy as cnp
cdef int num = 3
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight = np.ones((num), dtype = "int")
Weight[2] = 6
print(Weight)
Output -> [1 1 6]
Weight = cnp.ndarray([1,2,3])
Output -> ValueError: Buffer has wrong number of dimensions (expected 1, got 3)
In the comments I suggested changing:
Weight = cnp.ndarray([1,2,3])
to
Weight = np.array([1,2,3])
Just to clarify my comment a bit more:
The line
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight = np.ones((num), dtype = "int")
is effectively two parts:
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight
a declaration only. This allocated no memory, it merely creates a variable that can be a reference to a numpy array, and which allows quick indexing.
Weight = np.ones((num), dtype = "int")
This is a normal Python call to np.ones which allocates memory for the array. It is largely un-accelerated by Cython. From this point on Weight is a reference to that allocated array and can be used to change it. Note that Weight = ... in following lines will change what array Weight references.
I therefore suggest you skip the np.ones step and just do
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight = np.ones([1,2,3], dtype = "int")
Be aware that the only bit of Numpy that using these declarations accelerates is indexing into the array. Almost all other Numpy calls happen through the normal Python mechanism and will require the GIL and will happen at normal Python speed.

How do I type my variables in cython so they pass to the memoryview array faster?

I am trying to optimize a double loop for an N-body integrator and I found that the problem with my code is that I'm incurring a massive overhead when I write stored variables into the memory view locations.
I originally had this code vectorized in numpy, but it was called inside another for loop to update the particle positions and the overhead was brutal. I have an np.ndarray Nx2 vector of positions (X) and I want to return an Nx2 vector of momentums (XOut) -- the current code listed below returns a memory view, but that's OK because I'd like to eventually embed this function in other Cython function once I've debugged this bottleneck.
I had tried the cython -a "name.pyx" command and found that I more or less have everything as a C-type. However, I found that towards the bottom of the loop, writing into the memoryview of XOut[ii,0] -= valuex is incurring most of the run time. If I change that into a constant so that XOut[ii,0] -= 5, the code is ~40X faster. I think this means I'm doing some sort of copy operation on that line which is slowing me down. My Cython/C++ backgrounds are rudimentary, but I think I need to change the syntax so that I'm writing into the memoryview from a pointer. Any advice would be greatly appreciative; Thanks!
import numpy as np
cimport numpy as np
from cython.view cimport array as cvarray
cimport cython
from libc.math cimport sinh, cosh, sin, cos, acos, exp, sqrt, fabs, M_PI
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef DTYPE_t pi = 3.141592653589793
#cython.cdivision(True)
#cython.boundscheck(False) # turn off bounds-checking for entire function
#cython.wraparound(False) # turn off negative index wrapping for entire function
def intTerms(const DTYPE_t[:,:] X, DTYPE_t epsilon, DTYPE_t[:,:] XOut):
cdef Py_ssize_t ii,jj,N
N = X.shape[0]
cdef DTYPE_t valuex,valuey,r2,xvec,yvec
for ii in range(0,N):
for jj in range(ii+1,N):
xvec = X[ii,0]-X[jj,0]
yvec = X[ii,1]-X[jj,1]
r2 = max(xvec**2+yvec**2,epsilon)
valuex = xvec/r2**2
valuey = yvec/r2**2
XOut[ii,0] -= valuex
XOut[ii,1] -= 5 #valuey
XOut[jj,0] += 5 #valuex
XOut[jj,1] += 5 #valuey
XOut[ii,0] /= 2*pi
XOut[ii,1] /= 2*pi
return XOut
OK, so the issue was the mathematical operations. Cython doesn't optimize the ** operator so I modified the code:
import numpy as np
cimport numpy as np
from cython.view cimport array as cvarray
cimport cython
from libc.math cimport sinh, cosh, sin, cos, acos, exp, sqrt, fabs, M_PI
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef DTYPE_t pi = 3.141592653589793
#cython.cdivision(True)
#cython.boundscheck(False) # turn off bounds-checking for entire function
#cython.wraparound(False) # turn off negative index wrapping for entire function
def intTerms(const DTYPE_t[:,:] X, DTYPE_t epsilon, DTYPE_t[:,:] XOut):
cdef Py_ssize_t ii,jj,N
N = X.shape[0]
cdef DTYPE_t valuex,valuey,r2,xvec,yvec
for ii in range(0,N-1):
for jj in range(ii+1,N):
xvec = X[ii,0]-X[jj,0]
yvec = X[ii,1]-X[jj,1]
r2 = max(xvec*xvec+yvec*yvec,epsilon)
valuex = xvec/r2/r2
valuey = yvec/r2/r2
XOut[ii,0] -= valuex
XOut[ii,1] -= valuey
XOut[jj,0] += valuex
XOut[jj,1] += valuey
XOut[ii,0] /= 2*pi
XOut[ii,1] /= 2*pi
return XOut
Changing valuex from xvec/r2**2 to xvec/r2/r2 and removing all instances of the ** operator sped up the loop to 9ms from 200ms for an 1800x2 array. I am still hopeful that a 4ms speed is possible, but I'll settle for 9ms for now.

Passing structured array to Cython, failed (I think it is a Cython bug)

Suppose I have
a = np.zeros(2, dtype=[('a', np.int), ('b', np.float, 2)])
a[0] = (2,[3,4])
a[1] = (6,[7,8])
then I define the same Cython structure
import numpy as np
cimport numpy as np
cdef packed struct mystruct:
np.int_t a
np.float_t b[2]
def test_mystruct(mystruct[:] x):
cdef:
int k
mystruct y
for k in range(2):
y = x[k]
print y.a
print y.b[0]
print y.b[1]
after this, I run
test_mystruct(a)
and I got error:
ValueError Traceback (most recent call last)
<ipython-input-231-df126299aef1> in <module>()
----> 1 test_mystruct(a)
_cython_magic_5119cecbaf7ff37e311b745d2b39dc32.pyx in _cython_magic_5119cecbaf7ff37e311b745d2b39dc32.test_mystruct (/auto/users/pwang/.cache/ipython/cython/_cython_magic_5119cecbaf7ff37e311b745d2b39dc32.c:1364)()
ValueError: Expected 1 dimension(s), got 1
My question is how to fix it? Thank you.
This pyx compiles and imports ok:
import numpy as np
cimport numpy as np
cdef packed struct mystruct:
int a[2] # change from plain int
float b[2]
int c
def test_mystruct(mystruct[:] x):
cdef:
int k
mystruct y
for k in range(2):
y = x[k]
print y.a
print y.b[0]
print y.b[1]
dt='2i,2f,i'
b=np.zeros((3,),dtype=dt)
test_mystruct(b)
I started with the test example mentioned in my comment, and played with your case. I think the key change was to define the first element of the packed structure to be int a[2]. So if any element is an array, the first must an array to properly set up the structure.
Clearly an error that the test file isn't catching.
Defining the element as int a[1] doesn't work, possibly because the dtype removes such a dimension:
In [47]: np.dtype([('a', np.int, 1), ('b', np.float, 2)])
Out[47]: dtype([('a', '<i4'), ('b', '<f8', (2,))])
Defining the dtype to get around this shouldn't be hard until the issue is raised and patched.
The struct could have a[1], but the array dtype would have to specify the size with a tuple: ('a','i',(1,)). ('a','i',1) would make the size ().
If one of the struct arrays is 2d, it looks like all of them have to be:
cdef packed struct mystruct:
int a[1][1]
float b[2][1]
int c[2][2]
https://github.com/cython/cython/blob/c4c2e3d8bd760386b26dbd6cffbd4e30ba0a7d13/tests/memoryview/numpy_memoryview.pyx
Stepping back a bit, I wonder what's the point to processing a complex structured array in cython. For some operations wouldn't it work just as well to pass the fields as separate variables. For example myfunc(a['a'],a['b']) instead of myfunc(a).
There is a general method to get the dtype for a c struct, but it involves a temporary variable:
cdef mystruct _tmp
dt = np.asarray(<mystruct[:1]>(&_tmp)).dtype
This requires at least numpy 1.5. See discussion here: https://github.com/scikit-learn/scikit-learn/pull/2298

Cython error: Undeclared name not built in:array

I am compiling this Cython code in Sage Cell Server and I get the following error.
undeclared name not builtin: array
It displays the same error in Sage Notebook. I think it is not recognizing numpy array but it
is strange cause I have imported numpy already.
cython('''
cimport numpy as np
ctypedef np.int DTYPE
def computeDetCy(np.ndarray[DTYPE, ndim=2] matrix):
return determ(matrix,len(matrix))
cdef inline int determ(np.ndarray[DTYPE, ndim=2] matrix, int n):
cdef int det = 0
cdef int p=0
cdef int h
cdef int k
cdef int i=0
cdef int j=0
cdef np.ndarray[DTYPE, ndim=2] temp=np.zeros(4,4)
if n == 1:
return matrix[0][0]
elif n == 2:
return matrix[0][0]*matrix[1][1] - matrix[0][1]*matrix[1][0]
else:
for p in range(0, n):
h = 0
k = 0
for i in range(1, n):
for j in range(0, n):
if j==p:
continue
temp[h][k] = matrix[i][j]
k+=1
if k ==(n-1):
h+=1
k=0
det= det + matrix[0][p] * (-1)**p * determ(temp, n-1)
return det
computeDetCy(array([[13,42,43,22],[12,67,45,98],[23,91,18,54],[34,56,82,76]]))
''')
Yeah, but you imported it as np, not importing * (which would be a bad idea anyway) and didn't do a regular Python import. (Sometimes you have to do both a cimport and import, see this SO question for an example.)
However, even after
import numpy as np
and using np.array, I still get some errors
ValueError: Buffer dtype mismatch, expected 'DTYPE' but got 'long'
So this solves your question, but isn't the whole story, and the things I tried didn't work to fix this new issue.

Why doesn't the shape of my numpy array change?

I have made a numpy array out of data from an image. I want to convert the numpy array into a one-dimensional one.
import numpy as np
import matplotlib.image as img
if __name__ == '__main__':
my_image = img.imread("zebra.jpg")[:,:,0]
width, height = my_image.shape
my_image = np.array(my_image)
img_buffer = my_image.copy()
img_buffer = img_buffer.reshape(width * height)
print str(img_buffer.shape)
The 128x128 image is here.
However, this program prints out (128, 128). I want img_buffer to be a one-dimensional array though. How do I reshape this array? Why won't numpy actually reshape the array into a one-dimensional array?
.reshape returns a new array, rather than reshaping in place.
By the way, you appear to be trying to get a bytestring of the image - you probably want to use my_image.tostring() instead.
reshape doesn't work in place. Your code isn't working because you aren't assigning the value returned by reshape back to img_buffer.
If you want to flatten the array to one dimension, ravel or flatten might be easier options.
>>> img_buffer = img_buffer.ravel()
>>> img_buffer.shape
(16384,)
Otherwise, you'd want to do:
>>> img_buffer = img_buffer.reshape(np.product(img_buffer.shape))
>>> img_buffer.shape
(16384,)
Or, more succinctly:
>>> img_buffer = img_buffer.reshape(-1)
>>> img_buffer.shape
(16384,)