Passing numpy integer array to c code - numpy

I'm trying to write Cython code that dumps a dense feature matrix and target vector pair to libsvm format faster than sklearn's built-in code. I get a compilation error complaining about a type issue when passing the target vector (a NumPy array of ints) to the relevant C function.
Here's the code:
import numpy as np
cimport numpy as np
cimport cython
cdef extern from "cdump.h":
    int filedump( double features[], int numexemplars, int numfeats, int target[], char* outfname)
#cython.boundscheck(False)
#cython.wraparound(False)
def fastdumpdense_libsvmformat(np.ndarray[np.double_t,ndim=2] X, y, outfname):
    if X.shape[0] != len(y):
        raise ValueError("X and y need to have the same number of points")
    cdef int numexemplars = X.shape[0]
    cdef int numfeats = X.shape[1]
    cdef bytes py_bytes = outfname.encode()
    cdef char* outfnamestr = py_bytes
    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
    return retval
When I attempt to compile this code using distutils, I get the error
cythoning fastdump_svm.pyx to fastdump_svm.cpp
Error compiling Cython file:
------------------------------------------------------------
...
    cdef np.ndarray[np.double_t, ndim=2, mode="c"] X_c
    cdef np.ndarray[np.int_t, ndim=1, mode="c"] y_c
    X_c = np.ascontiguousarray(X, dtype=np.double)
    y_c = np.ascontiguousarray(y, dtype=np.int)
    retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
                                                          ^
------------------------------------------------------------
fastdump_svm.pyx:24:58: Cannot assign type 'int_t *' to 'int *'
Any idea how to fix this error? I originally was following the paradigm of passing y_c.data, which works, but this is apparently not the recommended way.

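You can also use dtype=np.dtype("i") when creating the NumPy array; the "i" type code matches the C int on your machine. Note that the converted array has to be assigned to y_c itself:
cdef int[::1] y_c
y_c = np.ascontiguousarray(y, dtype=np.dtype("i"))
retval = filedump(&X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)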
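As a quick sanity check outside Cython, the following sketch uses only NumPy and the standard-library ctypes module to confirm that the "i" type code has the same size as the platform's C int, and that np.ascontiguousarray returns a C-contiguous array of that type. The sample target vector is made up for illustration:

```python
import ctypes

import numpy as np

# Hypothetical target vector; NumPy's default integer type may be wider than a C int.
y = [1, -1, 1, 1, -1]

c_int_dtype = np.dtype("i")  # "i" is the type code for the C int
y_c = np.ascontiguousarray(y, dtype=c_int_dtype)

# These two sizes should agree on any platform.
print(c_int_dtype.itemsize, ctypes.sizeof(ctypes.c_int))
# The converted array is NumPy's C-int type and is C-contiguous.
print(y_c.dtype == np.intc, y_c.flags.c_contiguous)
```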

The problem is that numpy.int_t is not the same as int; you can easily check this by having your program print sizeof(numpy.int_t) and sizeof(int).
int is a C int, which the C standard requires to be at least 16 bits; it's 32 bits on my machine. numpy.int_t is usually 32 or 64 bits depending on whether you're using a 32- or 64-bit build of NumPy, but there are exceptions. The usual one is 64-bit Windows, where it has historically been 32 bits because it follows the C long type. If you want to know which NumPy dtype matches your C int, you can use np.dtype(ctypes.c_int).
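The size comparison described above can also be done from Python. This is a sketch that prints the item size of a few NumPy integer types next to ctypes.c_int and ctypes.c_long; the exact numbers depend on the platform and the NumPy version, which is the point. Treating np.int_ as the Python-level counterpart of Cython's np.int_t is an assumption here, since its definition has changed across NumPy releases:

```python
import ctypes

import numpy as np

sizes = {
    # Assumed Python-level counterpart of np.int_t (definition varies by NumPy version).
    "np.int_": np.dtype(np.int_).itemsize,
    # NumPy's name for the C int.
    "np.intc": np.dtype(np.intc).itemsize,
    "ctypes.c_int": ctypes.sizeof(ctypes.c_int),
    "ctypes.c_long": ctypes.sizeof(ctypes.c_long),
}
for name, size in sizes.items():
    print(f"{name:>14}: {size * 8} bits")

# The dtype that matches the C int, as suggested in the answer.
print(np.dtype(ctypes.c_int))
```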
So, to pass your NumPy array to C code, you can do:
import ctypes
cdef np.ndarray[int, ndim=1, mode="c"] y_c
y_c = np.ascontiguousarray(y, dtype=ctypes.c_int)
retval = filedump( &X_c[0,0], numexemplars, numfeats, &y_c[0], outfnamestr)
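The same idea (convert to a dtype built from ctypes.c_int, then take a pointer to the first element) can be tried without compiling anything by using NumPy's ctypes interface. This is a pure-Python sketch, not the Cython code above; the helper name as_c_int_pointer is hypothetical, and reading values back through the pointer stands in for what the C function would see:

```python
import ctypes

import numpy as np


def as_c_int_pointer(y):
    """Hypothetical helper: return (array, int* pointer) for a 1-D integer sequence.

    The returned array must be kept alive for as long as the pointer is used,
    because the pointer does not own the memory.
    """
    y_c = np.ascontiguousarray(y, dtype=ctypes.c_int)
    ptr = y_c.ctypes.data_as(ctypes.POINTER(ctypes.c_int))
    return y_c, ptr


y = np.array([3, 1, 4, 1, 5], dtype=np.int64)  # deliberately wider than a C int
y_c, ptr = as_c_int_pointer(y)

# Indexing the pointer reads C ints, the way an `int *` parameter would.
print([ptr[i] for i in range(len(y_c))])  # [3, 1, 4, 1, 5]
```

Returning the array together with the pointer is a deliberate choice: it makes the lifetime dependency explicit, which is the same reason the Cython version keeps y_c in a local variable.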

Related

bsearch in Cython

I'm learning to use bsearch from Cython's libc.stdlib to get an index in a sorted array. The example is from this question, with modifications:
## test_bsearch.pyx
cimport cython
from libc.stdlib cimport bsearch
cdef int comp_fun(const void *a, const void *b) nogil:
    cdef int a_v = (<int*>a)[0]
    cdef int b_v = (<int*>b)[0]
    if a_v < b_v:
        return -1
    elif a_v > b_v:
        return 1
    else:
        return 0
def bsearch_c(int[::1] t, int v):
    cdef int *p = <int*> bsearch(&v, &t[0], t.shape[0], sizeof(int), &comp_fun)
    cdef int j = <int> p
    if p != NULL:
        return j
    else:
        return -1
I then created a setup.py:
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np
setup(
    ext_modules=cythonize(
        ["test_bsearch.pyx"],
        compiler_directives={'language_level': "3"},
    ),
    include_dirs=[np.get_include()],
)
I compiled the code on Windows 10 from the command prompt with python setup.py build_ext -i, but running it as follows gives a strange result:
>>> from test_bsearch import bsearch_c
>>> import numpy as np
>>> x = np.arange(20, dtype=np.int32)
>>> bsearch_c(x, 5) # got 610183044
I know nothing about C++, so I can't figure out what's wrong with the above implementation. How can I correct it?
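cdef int j = <int> p
This casts the pointer itself to an int, so you get (part of) a memory address rather than anything stored in the array. To read the element it points to, you want
cdef int j = p[0]
Note that p[0] is the matched value, not its position, and it must only be read after checking p != NULL. Since the question asks for an index, return p - &t[0] (pointer subtraction counts elements) inside the p != NULL branch.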
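To see why the value and the index differ, here is a pure-Python reference for what bsearch_c is meant to return. It uses the standard-library bisect module instead of libc's bsearch (a swapped-in technique, for illustration only). The address arithmetic at the end mirrors p - &t[0]: C measures pointer differences in elements, so with raw byte addresses you divide by the item size:

```python
import bisect

import numpy as np


def bsearch_index(t, v):
    """Index of v in the sorted array t, or -1 if v is absent."""
    i = bisect.bisect_left(t, v)
    if i < len(t) and t[i] == v:
        return i
    return -1


t = np.array([2, 4, 8, 16], dtype=np.int32)
i = bsearch_index(t, 8)
# Index (what p - &t[0] gives) vs. value (what p[0] gives).
print(i, t[i])  # 2 8

base = t.ctypes.data            # address of t[0]
match = base + i * t.itemsize   # address of the matching element
print((match - base) // t.itemsize)  # 2
```

With np.arange, as in the question, value and index coincide, which can hide this mistake; the array above is chosen so they differ.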

Cython passing int numpy array to C++

First, I know this question appears similar to this one, but they are different. I'm struggling to pass an int (int32) NumPy array to C++ via Cython without copying. The files:
doit.cpp:
#include "doit.h"
void run(int *x) {}
doit.h:
#ifndef _DOIT_H_
#define _DOIT_H_
void run(int *);
#endif
q.pyx:
cimport numpy as np
import numpy as np
cdef extern from "doit.h":
    void run(int* X)
def pyrun(np.ndarray[np.int_t, ndim=1] X):
    X = np.ascontiguousarray(X)
    run(&X[0])
I compile with Cython. The error is:
Error compiling Cython file:
------------------------------------------------------------
...
cdef extern from "doit.h":
    void run(int* X)
def pyrun(np.ndarray[np.int_t, ndim=1] X):
    X = np.ascontiguousarray(X)
    run(&X[0])
        ^
------------------------------------------------------------
py_cpp/q.pyx:9:8: Cannot assign type 'int_t *' to 'int *'
However, if I replace all occurrences of int with double (e.g. int *x with double *x, int_t with double_t), then all errors go away.
How to solve the problem? Thanks in advance.
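The double version compiles because Cython's np.double_t is defined as the C double, whereas np.int_t need not be the same type as C int (see the answers to the first question above). Since the goal here is to avoid copying, it may also help to check when np.ascontiguousarray actually copies. A small sketch, assuming the C++ side expects a C int (NumPy's np.intc); the helper needs_copy is hypothetical:

```python
import numpy as np


def needs_copy(x, dtype=np.intc):
    """True if converting x with np.ascontiguousarray(x, dtype) produced a new buffer."""
    converted = np.ascontiguousarray(x, dtype=dtype)
    return not np.shares_memory(converted, x)


a32 = np.arange(5, dtype=np.intc)             # already C int and contiguous
a64 = np.arange(5, dtype=np.int64)            # elements wider than a typical C int
strided = np.arange(10, dtype=np.intc)[::2]   # right dtype, but not contiguous

print(needs_copy(a32), needs_copy(a64), needs_copy(strided))
```

On platforms where a C int is 32 bits this prints False True True: no copy is made only when the input already has the C int dtype and is C-contiguous.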

A Good Way to Expose CUPY MemoryPointer in C/C++?

NumPy provides well-defined C APIs so that one can easily handle NumPy arrays in C/C++ space. For example, if I have a C function that takes C arrays (pointers) as arguments, I can just #include <numpy/arrayobject.h> and pass a NumPy array to it by accessing its data member (or by using the C API PyArray_DATA).
Recently I wanted to achieve the same for CuPy, but I couldn't find a header file that I can include. To be specific, my goal is as follows:
I have some CUDA kernels and their callers written in C/C++. The callers run on the host but take handles of memory buffers on the device as arguments. The computed results of the callers are also stored on the device.
I want to wrap the callers into Python functions so that I can control when to transfer data from device to host in Python. That means I have to wrap the resulting device memory pointers in Python objects. CuPy's ndarray is the best choice I can think of.
I can't use CuPy's user-defined kernel mechanism because the functions I want to wrap are not directly CUDA kernels. They must contain host code.
Currently, I've found a workaround. I write the Python functions in Cython; they take CuPy arrays as inputs and return CuPy arrays. I then cast the .data.ptr attribute to C's size_t type, and then further cast it to whatever pointer type I need. Example code follows.
Example Code
//kernel.cu
#include <math.h>
__global__ void vecSumKernel(float *A, float *B, float *C, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        C[i] = A[i] + B[i];
}
// This is the C function I want to wrap into Python.
// Notice it does not allocate any memory on device. I want that to be done by cupy.
extern "C" void vecSum(float *A_d, float *B_d, float *C_d, int n) {
    int threadsPerBlock = 512;
    if (threadsPerBlock > n) threadsPerBlock = n;
    int nBlocks = (int)ceilf((float)n / (float)threadsPerBlock);
    vecSumKernel<<<nBlocks, threadsPerBlock>>>(A_d, B_d, C_d, n);
}
//kernel.h
#ifndef KERNEL_H_
#define KERNEL_H_
void vecSum(float *A_d, float *B_d, float *C_d, int n);
#endif
# test_module.pyx
import cupy as cp
import numpy as np
cdef extern from "kernel.h":
    void vecSum(float *A_d, float *B_d, float *C_d, int n)
cdef vecSum_wrapper(size_t aPtr, size_t bPtr, size_t cPtr, int n):
    # here the Python int -- cp.ndarray.data.ptr -- is first cast to size_t,
    # and then cast to (float *).
    vecSum(<float*>aPtr, <float*>bPtr, <float*>cPtr, n)
# This is the Python function I want to use
# a, b are cupy arrays
def vec_sum(a, b):
    a_ptr = a.data.ptr
    b_ptr = b.data.ptr
    n = a.shape[0]
    output = cp.empty(shape=(n,), dtype=a.dtype)
    c_ptr = output.data.ptr
    vecSum_wrapper(a_ptr, b_ptr, c_ptr, n)
    return output
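For readers unfamiliar with the pattern, the <float*>aPtr cast in vecSum_wrapper reinterprets a plain integer address as a typed pointer. The same reinterpretation can be tried on the host with NumPy and ctypes. This sketch is not CuPy code: a NumPy array's .ctypes.data (an int) plays the role of a CuPy array's .data.ptr:

```python
import ctypes

import numpy as np

a = np.arange(5, dtype=np.float32)

addr = a.ctypes.data  # plain Python int, analogous to a CuPy array's .data.ptr
float_ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_float))  # like <float*>aPtr

# The pointer carries no dtype, length or ownership information: reading past
# a.size, or using it after `a` has been freed, is undefined behaviour.
print([float_ptr[i] for i in range(a.size)])  # [0.0, 1.0, 2.0, 3.0, 4.0]
```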
Compile and Run
To compile, one can first compile kernel.cu into a static library, say, libVecSum. Then use Cython to compile test_module.pyx into test_module.c, and build the Python extension as usual.
# setup.py
from setuptools import Extension, setup
ext_module = Extension(
    "cupyExt.test_module",
    sources=["cupyExt/test_module.c"],
    library_dirs=["cupyExt/"],
    libraries=['libVecSum', 'cudart'])
setup(
    name="cupyExt",
    version="0.0.0",
    ext_modules = [ext_module],
)
It seems to work:
>>> import cupy as cp
>>> from cupyExt import test_module
>>> a = cp.ones(5, dtype=cp.float32) * 3
>>> b = cp.arange(5, dtype=cp.float32)
>>> c = test_module.vec_sum(a, b)
>>> print(c.device)
<CUDA Device 0>
>>> print(c)
[3. 4. 5. 6. 7.]
Any better ways?
I am not sure if this way is memory safe. I also feel the casting from .data.ptr to C pointers is not good. I want to know people's thoughts and comments on this.
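On the memory-safety question, the cast itself is arguably not the main risk: vec_sum trusts that a and b are float32, contiguous and the same length, and nothing checks that. The sketch below shows checks vec_sum could run before taking .data.ptr. check_vec_sum_args is a hypothetical name; it only uses .dtype, .ndim, .shape and .flags.c_contiguous, which both NumPy and CuPy arrays provide, and it is exercised with NumPy here because no GPU is assumed:

```python
import numpy as np


def check_vec_sum_args(a, b):
    """Hypothetical pre-flight checks before passing raw pointers to vecSum.

    vecSum reads n floats from A and B and writes n floats to C, so both
    inputs must be 1-D, float32, C-contiguous, non-empty and of equal length.
    """
    for name, arr in (("a", a), ("b", b)):
        if arr.dtype != np.float32:
            raise TypeError(f"{name} must be float32, got {arr.dtype}")
        if arr.ndim != 1 or not arr.flags.c_contiguous:
            raise ValueError(f"{name} must be a 1-D C-contiguous array")
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    if a.shape[0] == 0:
        # vecSum would set threadsPerBlock to 0 for n == 0.
        raise ValueError("vecSum cannot be launched with n == 0")
    return a.shape[0]


n = check_vec_sum_args(np.ones(5, dtype=np.float32), np.arange(5, dtype=np.float32))
print(n)  # 5
```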

passing 1-D or 2-D numpy array to C through Cython

I am writing an extension to my Python code in C and Cython, following this guide.
My C function signature is:
void c_disloc(double *pEOutput, double *pNOutput, double *pZOutput, double *pModel, double *pECoords, double *pNCoords, double nu, int NumStat, int NumDisl)
and my Cython function is:
cdef extern void c_disloc(double *pEOutput, double *pNOutput, double *pZOutput, double *pModel, double *pECoords, double *pNCoords, double nu, int NumStat, int NumDisl)
#cython.boundscheck(False)
#cython.wraparound(False)
def disloc(np.ndarray[double, ndim=2, mode="c"] pEOutput not None,
           np.ndarray[double, ndim=2, mode="c"] pNOutput not None,
           np.ndarray[double, ndim=2, mode="c"] pZOutput not None,
           np.ndarray[double, ndim=1, mode="c"] pModel not None,
           np.ndarray[double, ndim=2, mode="c"] pECoords not None,
           np.ndarray[double, ndim=2, mode="c"] pNCoords not None,
           double nu, int NumStat, int NumDisl):
    c_disloc(&pEOutput[0,0], &pNOutput[0,0], &pZOutput[0,0], &pModel[0], &pECoords[0,0], &pNCoords[0,0], nu, NumStat, NumDisl)
    return None
My C function behaves the same whether the arrays it gets are 1-D or 2-D, but I haven't managed to make the Cython function accept both 1-D and 2-D NumPy arrays.
Of course, I could write two Cython functions, one for the 1-D case and one for the 2-D case, but it would be cleaner to do it with one function.
Does anyone know how to do it?
I'd accept an untyped argument, check that it's a C-contiguous array and then use np.ravel to get a flat array (this returns a view, not a copy, when passed a C-contiguous array). It's easy to create that as a cdef function:
cdef double* get_array_pointer(arr) except NULL:
    assert(arr.flags.c_contiguous) # if this isn't true, ravel will make a copy
    cdef double[::1] mview = arr.ravel()
    return &mview[0]
Then you'd do
def disloc(pEOutput,
           pNOutput,
           # etc...
           double nu, int NumStat, int NumDisl):
    c_disloc(get_array_pointer(pEOutput), get_array_pointer(pNOutput),
             # etc
             nu, NumStat, NumDisl)
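The claim that ravel returns a view for C-contiguous input (and otherwise silently copies, which is why the assert matters) is easy to check with NumPy alone. In this sketch flat_view is a hypothetical pure-Python analogue of get_array_pointer that raises an exception instead of asserting:

```python
import numpy as np


def flat_view(arr):
    """Return a 1-D view of arr, refusing inputs that ravel() would have to copy."""
    if not arr.flags.c_contiguous:
        raise ValueError("array must be C-contiguous")
    return arr.ravel()


m2 = np.zeros((3, 4))  # 2-D, C-contiguous
v1 = np.zeros(12)      # 1-D, C-contiguous
print(np.shares_memory(flat_view(m2), m2), np.shares_memory(flat_view(v1), v1))  # True True

# A transposed array is not C-contiguous, so ravel() has to copy it.
t = m2.T
print(t.flags.c_contiguous, np.shares_memory(t.ravel(), t))  # False False
```

Because the 1-D and 2-D inputs both come back as views on the original buffer, a single function (and a single pointer type) covers both cases, which is what the question asked for.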
I've removed the
#cython.boundscheck(False)
#cython.wraparound(False)
since they gain you close to nothing here: each array is indexed exactly once, to take the address of its first element. Using them without thinking about whether they do anything seems like cargo cult programming to me.

Making my cython code more efficient

I've written a Python program that I'm trying to cythonize.
Are there any suggestions on how to make the for-loop more efficient? It takes 99% of the time.
This is the for-loop:
for i in range(l):
    b1[i] = np.nanargmin(locator[i,:]) # Closer point
    locator[i, b1[i]] = NAN # Do not consider Closer point
    b2[i] = np.nanargmin(locator[i,:]) # 2nd Closer point
    Adjacents[i,0] = np.array((Existed_Pips[b1[i]]), dtype=np.double)
    Adjacents[i,1] = np.array((Existed_Pips[b2[i]]), dtype=np.double)
This is the rest of the code:
import numpy as np
cimport numpy as np
from libc.math cimport NAN #, isnan
def PIPs(np.ndarray[np.double_t, ndim=1, mode='c'] ys, unsigned int nofPIPs, unsigned int typeofdist):
    cdef:
        unsigned int currentstate, j, i
        np.ndarray[np.double_t, ndim=1, mode="c"] D
        np.ndarray[np.int64_t, ndim=1, mode="c"] Existed_Pips
        np.ndarray[np.int_t, ndim=1, mode="c"] xs
        np.ndarray[np.double_t, ndim=2] Adjacents, locator, Adjy, Adjx, Raw_Fire_PIPs, Raw_Fem_PIPs
        np.ndarray[np.int_t, ndim=2, mode="c"] PIP_points, b1, b2
    cdef unsigned int l = len(ys)
    xs = np.arange(0,l, dtype=np.int) # Column vector with xs
    PIP_points = np.zeros((l,1), dtype=np.int) # Binary indexation
    PIP_points[0] = 1 # One indicate the PIP points.The first two PIPs are the first and the last observation.
    PIP_points[-1] = 1
    Adjacents = np.zeros((l,2), dtype=np.double)
    currentstate = 2 # Initial PIPs
    while currentstate <= nofPIPs: # for eachPIPs in range(nofPIPs)
        Existed_Pips = np.flatnonzero(PIP_points)
        currentstate = len(Existed_Pips)
        locator = np.full((l,currentstate), NAN, dtype=np.double) #np.int*
        for j in range(currentstate):
            locator[:,j] = np.absolute(xs-Existed_Pips[j])
        b1 = np.zeros((l,1), dtype=np.int)
        b2 = np.zeros((l,1), dtype=np.int)
        for i in range(l):
            b1[i] = np.nanargmin(locator[i,:]) # Closer point
            locator[i, b1[i]] = NAN # Do not consider Closer point
            b2[i] = np.nanargmin(locator[i,:]) # 2nd Closer point
            Adjacents[i,0] = np.array((Existed_Pips[b1[i]]), dtype=np.double)
            Adjacents[i,1] = np.array((Existed_Pips[b2[i]]), dtype=np.double)
        ##Calculate Distance
        Adjx = Adjacents
        Adjy = np.array([ys[np.array(Adjacents[:,0], dtype=np.int)], ys[np.array(Adjacents[:,1], dtype=np.int)]]).transpose()
        Adjx[Existed_Pips,:] = NAN # Existed PIPs are not candidates for new PIP.
        Adjy[Existed_Pips,:] = NAN
        if typeofdist == 1: #Euclidean Distance
            ##[D] = EDist(ys,xs,Adjx,Adjy)
            ED = np.power(np.power((Adjx[:,1]-xs),2) + np.power((Adjy[:,1]-ys),2),(0.5)) + np.power(np.power((Adjx[:,0]-xs),2) + np.power((Adjy[:,0]-ys),2),(0.5))
        EDmax = np.nanargmax(ED)
        PIP_points[EDmax]=1
        currentstate=currentstate+1
    return np.array([Existed_Pips, ys[Existed_Pips]]).transpose()
A couple of suggestions:
Take the calls to np.nanargmin out of the loop (use the axis parameter to operate on the whole array at once). This reduces the number of Python function calls you have to make:
b1 = np.nanargmin(locator,axis=1)
locator[np.arange(locator.shape[0]),b1] = np.nan
b2 = np.nanargmin(locator,axis=1)
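To convince yourself that the vectorised calls give the same indices as the row-by-row loop, here is a small check on made-up data (the locator values below are hypothetical, not taken from the question):

```python
import numpy as np

# Made-up distance matrix: 4 rows (points) x 3 columns (existing PIPs).
locator = np.array([[0.0, 5.0, 9.0],
                    [1.0, 4.0, 8.0],
                    [3.0, 2.0, 6.0],
                    [7.0, 2.0, 1.0]])

# Row-by-row version, as in the question's loop.
loop = locator.copy()
b1_loop, b2_loop = [], []
for i in range(loop.shape[0]):
    j = np.nanargmin(loop[i, :])
    b1_loop.append(j)
    loop[i, j] = np.nan
    b2_loop.append(np.nanargmin(loop[i, :]))

# Vectorised version from the answer: one call per step instead of one per row.
vec = locator.copy()
b1 = np.nanargmin(vec, axis=1)
vec[np.arange(vec.shape[0]), b1] = np.nan  # blank out exactly one element per row
b2 = np.nanargmin(vec, axis=1)

print(b1.tolist(), b2.tolist())  # [0, 0, 1, 2] [1, 1, 0, 1]
```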
Your assignment to Adjacents is odd - you seem to be creating a length-1 array for the right-hand side first. Instead just do
Adjacents[i,0] = Existed_Pips[b1[i]]
# ...
However, in this case, you can also take both lines outside the loop, eliminating the entire loop:
Adjacents = np.vstack((Existed_Pips[b1], Existed_Pips[b2])).T.astype(np.double)  # Adjacents is typed as double_t
All of this is relying on numpy, rather than Cython, for the speed-up, but it probably beats your version.
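Continuing that idea, here is a sketch checking that the vstack expression reproduces the loop's Adjacents. Existed_Pips, b1 and b2 are small hypothetical arrays; b1 and b2 are 1-D, as returned by nanargmin with axis=1:

```python
import numpy as np

Existed_Pips = np.array([0, 4, 9])  # hypothetical indices of existing PIPs
b1 = np.array([0, 0, 1, 2, 1])      # nearest existing PIP per point (column index)
b2 = np.array([1, 1, 0, 1, 2])      # second nearest

# Loop version, without the extra np.array(...) wrapper.
Adjacents_loop = np.zeros((len(b1), 2), dtype=np.double)
for i in range(len(b1)):
    Adjacents_loop[i, 0] = Existed_Pips[b1[i]]
    Adjacents_loop[i, 1] = Existed_Pips[b2[i]]

# Vectorised version: fancy indexing gathers every row at once.
Adjacents = np.vstack((Existed_Pips[b1], Existed_Pips[b2])).T.astype(np.double)

print(np.array_equal(Adjacents, Adjacents_loop))  # True
```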