What is a ".pxi.in" file and how to use it? - pandas

I need to understand the source code of pandas. I met several ".pxi.in" files in the source code. For example:
https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/hashtable_class_helper.pxi.in
The files say
DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
I want to know how to use .pxi.in files to generate .pxi files. Are there any tutorials or docs?

Cython comes with Tempita. The idea behind it is more or less to have a kind of preprocessor language for Cython, because otherwise Cython somewhat lacks this functionality compared to C or C++ templates.
The *.pxi.in files in pandas use Tempita to create e.g. classes for int8, int16 etc. from a template, so one doesn't have to repeat the code by hand. The *.pxi.in files are converted to *.pxi files in setup.py.
As example
{{py:

# name
complex_types = ['complex64',
                 'complex128']
}}

{{for name in complex_types}}

cdef kh{{name}}_t to_kh{{name}}_t({{name}}_t val) nogil:
    cdef kh{{name}}_t res
    res.real = val.real
    res.imag = val.imag
    return res

{{endfor}}
will be converted, once the Tempita step has been performed, to
cdef khcomplex64_t to_khcomplex64_t(complex64_t val) nogil:
    cdef khcomplex64_t res
    res.real = val.real
    res.imag = val.imag
    return res

cdef khcomplex128_t to_khcomplex128_t(complex128_t val) nogil:
    cdef khcomplex128_t res
    res.real = val.real
    res.imag = val.imag
    return res
and the resulting .pxi file will be used when cythonizing the .pyx files.
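If you want to run the conversion yourself, the Tempita bundled with Cython can be called directly. Here is a minimal sketch (the file names are just examples; in pandas the conversion is driven by the build scripts rather than done by hand):

# minimal sketch: render a .pxi.in template into a .pxi using Cython's bundled Tempita
from Cython import Tempita as tempita

with open("hashtable_class_helper.pxi.in") as f:
    template = f.read()

# substitute the {{...}} blocks; the result is plain Cython code
rendered = tempita.sub(template)

with open("hashtable_class_helper.pxi", "w") as f:
    f.write(rendered)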

Related

A Good Way to Expose CUPY MemoryPointer in C/C++?

NumPy provides well-defined C APIs so that one can easily handle NumPy array in C/C++ space. For example, if I have a C function that takes C arrays (pointers) as arguments, I can just #include <numpy/arrayobject.h>, and pass a NumPy array to it by accessing its data member (or use the C API PyArray_DATA).
Recently I have wanted to achieve the same for CuPy, but I cannot find a header file that I can include. To be specific, my goal is as follows:
I have some CUDA kernels and their callers written in C/C++. The callers run on host but take handles of memory buffers on device as arguments. The computed results of the callers are also stored on device.
I want to wrap the callers into Python functions so that I can control when to transfer data from device to host in Python. That means I have to wrap the resulting device memory pointers in Python objects. CuPy's ndarray is the best choice I can think of.
I can't use CuPy's user-defined kernel mechanism because the functions I want to wrap are not directly CUDA kernels. They must contain host code.
Currently, I've found a workaround. I write the Python functions in Cython, taking CuPy arrays as inputs and returning CuPy arrays. I then cast the .data.ptr attribute to C's size_t type, and further cast it to whatever pointer type I need. Example code follows.
Example Code
//kernel.cu
#include <math.h>

__global__ void vecSumKernel(float *A, float *B, float *C, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        C[i] = A[i] + B[i];
}

// This is the C function I want to wrap into Python.
// Notice it does not allocate any memory on device. I want that to be done by cupy.
extern "C" void vecSum(float *A_d, float *B_d, float *C_d, int n) {
    int threadsPerBlock = 512;
    if (threadsPerBlock > n) threadsPerBlock = n;
    int nBlocks = (int)ceilf((float)n / (float)threadsPerBlock);
    vecSumKernel<<<nBlocks, threadsPerBlock>>>(A_d, B_d, C_d, n);
}

//kernel.h
#ifndef KERNEL_H_
#define KERNEL_H_

void vecSum(float *A_d, float *B_d, float *C_d, int n);

#endif
# test_module.pyx
import cupy as cp
import numpy as np

cdef extern from "kernel.h":
    void vecSum(float *A_d, float *B_d, float *C_d, int n)

cdef vecSum_wrapper(size_t aPtr, size_t bPtr, size_t cPtr, int n):
    # here the Python int -- cp.ndarray.data.ptr -- is first cast to size_t,
    # and then cast to (float *).
    vecSum(<float*>aPtr, <float*>bPtr, <float*>cPtr, n)

# This is the Python function I want to use
# a, b are cupy arrays
def vec_sum(a, b):
    a_ptr = a.data.ptr
    b_ptr = b.data.ptr
    n = a.shape[0]
    output = cp.empty(shape=(n,), dtype=a.dtype)
    c_ptr = output.data.ptr
    vecSum_wrapper(a_ptr, b_ptr, c_ptr, n)
    return output
Compile and Run
To compile, one can first compile kernel.cu into a static library, say, libVecSum. Then use Cython to compile test_module.pyx into test_module.c, and build the Python extension as usual.
# setup.py
from setuptools import Extension, setup

ext_module = Extension(
    "cupyExt.test_module",
    sources=["cupyExt/test_module.c"],
    library_dirs=["cupyExt/"],
    libraries=['libVecSum', 'cudart'])

setup(
    name="cupyExt",
    version="0.0.0",
    ext_modules=[ext_module],
)
It seems to work.
>>> import cupy as cp
>>> from cupyExt import test_module
>>> a = cp.ones(5, dtype=cp.float32) * 3
>>> b = cp.arange(5, dtype=cp.float32)
>>> c = test_module.vec_sum(a, b)
>>> print(c.device)
<CUDA Device 0>
>>> print(c)
[3. 4. 5. 6. 7.]
Any better ways?
I am not sure whether this way is memory safe, and I also feel that casting .data.ptr to C pointers is not ideal. I would like to know people's thoughts and comments on this.

calling a fortran dll from python using cffi with multidimensional arrays

I use a dll that contains differential equation solvers, among other useful mathematical tools. Unfortunately, this dll is written in Fortran. My program is written in Python 3.7, and I use Spyder as an IDE.
I have successfully called simple functions from the dll. However, I can't seem to get functions to work that require multidimensional arrays.
This is the online documentation to the function I am trying to call:
https://www.nag.co.uk/numeric/fl/nagdoc_fl26/html/f01/f01adf.html
The kernel dies without an error message if I execute the following code:
import numpy as np
import cffi as cf
ffi=cf.FFI()
lib=ffi.dlopen("C:\Windows\SysWOW64\DLL20DDS")
ffi.cdef("""void F01ADF (const int *n, double** a, const int *lda, int *ifail);""")
#Integer
nx = 4
n = ffi.new('const int*', nx)
lda = nx + 1
lda = ffi.new('const int*', lda)
ifail = 0
ifail = ffi.new('int*', ifail)
#matrix to be inversed
ax1 = np.array([5,7,6,5],dtype = float, order = 'F')
ax2 = np.array([7,10,8,7],dtype = float, order = 'F')
ax3 = np.array([6,8,10,9],dtype = float, order = 'F')
ax4 = np.array([5,7,9,10], dtype = float, order = 'F')
ax5 = np.array([0,0,0,0], dtype = float, order = 'F')
ax = (ax1,ax2,ax3,ax4,ax5)
#Array
zx = np.zeros(nx, dtype = float, order = 'F')
a = ffi.cast("double** ", zx.__array_interface__['data'][0])
for i in range(lda[0]):
    a[i] = ffi.cast("double* ", ax[i].__array_interface__['data'][0])
lib.F01ADF(n, a, lda, ifail)
Since functions with 1D arrays work, I assume that the multidimensional array is the issue.
Any kind of help is greatly appreciated,
Thilo
Not having access to the dll you refer to complicates giving a definitive answer; however, the documentation of the dll and the provided Python script may be enough to diagnose the problem. There are at least two issues in your example:
The C header interface:
Your documentation link clearly states what the function's C header interface should look like. I'm not very well versed in C, Python's cffi or cdef, but the parameter declaration for a in your function interface seems wrong. The double** a (pointer to pointer to double) should most likely be double a[] or double* a (pointer to double), as stated in the documentation.
Defining a 2d Numpy array with Fortran ordering:
Note that your Numpy arrays ax1..5 are one-dimensional arrays. Since the arrays only have one dimension, order='F' and order='C' are equivalent in terms of memory layout and access, so specifying order='F' here probably does not have the intended effect (Fortran uses column-major ordering for multi-dimensional arrays).
The variable ax is a tuple of Numpy arrays, not a 2d Numpy array, and will therefore have a very different representation in memory than a 2d array (which is of utmost importance when passing data to the Fortran dll).
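To illustrate the difference, here is a small sketch (not part of the original answer):

# a tuple of 1-D arrays is several separate buffers; a 2-D order='F' array is one
# contiguous, column-major buffer, which is the layout a Fortran routine expects
import numpy as np

ax_tuple = (np.zeros(4), np.zeros(4))   # separate buffers, no single block of memory
ax_2d = np.zeros((5, 4), order='F')     # one contiguous, column-major buffer

print(ax_2d.flags['F_CONTIGUOUS'])      # True
print(ax_2d.strides)                    # (8, 40): moving down a column steps 8 bytes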
Towards a solution
My first step would be to correct the C header interface. Next, I would declare ax as a proper Numpy array with two dimensions, using Fortran ordering, and then cast it to the appropriate data type, as in this example:
#file: test.py
import numpy as np
import cffi as cf
ffi=cf.FFI()
lib=ffi.dlopen("./f01adf.dll")
ffi.cdef("""void f01adf_ (const int *n, double a[], const int *lda, int *ifail);""")
# integers
nx = 4
n = ffi.new('const int*', nx)
lda = nx + 1
lda = ffi.new('const int*', lda)
ifail = 0
ifail = ffi.new('int*', ifail)
# matrix to be inversed
ax = np.array([[5,  7,  6,  5],
               [7, 10,  8,  7],
               [6,  8, 10,  9],
               [5,  7,  9, 10],
               [0,  0,  0,  0]], dtype=float, order='F')
# operation on matrix using dll
print("BEFORE:")
print(ax.astype(int))
a = ffi.cast("double* ", ax.__array_interface__['data'][0])
lib.f01adf_(n, a, lda, ifail)
print("\nAFTER:")
print(ax.astype(int))
For testing purposes, consider the following Fortran subroutine, which has the same interface as your actual dll, as a substitute. It will simply add 10**(i-1) to the i'th column of the input array a. This allows checking that the interface between Python and Fortran works as intended, and that the intended elements of array a are operated on:
! file: f01adf.f90
Subroutine f01adf(n, a, lda, ifail)
    Integer, Intent (In)             :: n, lda
    Integer, Intent (Inout)          :: ifail
    Real(Kind(1.d0)), Intent (Inout) :: a(lda,*)
    Integer :: i

    print *, "Fortran DLL says: Hello world!"

    If ((n < 1) .or. (lda < n+1)) Then
        ! Input variables not conforming to requirements
        ifail = 2
    Else
        ! Input variables acceptable
        ifail = 0
        ! add 10**(i-1) to the i'th column of 2d array 'a'
        Do i = 1, n
            a(:, i) = a(:, i) + 10**(i-1)
        End Do
    End If
End Subroutine
Compiling the Fortran code, and then running the suggested Python script, gives me the following output:
> gfortran -O3 -shared -fPIC -fcheck=all -Wall -Wextra -std=f2008 -o f01adf.dll f01adf.f90
> python test.py
BEFORE:
[[ 5  7  6  5]
 [ 7 10  8  7]
 [ 6  8 10  9]
 [ 5  7  9 10]
 [ 0  0  0  0]]

Fortran DLL says: Hello world!

AFTER:
[[   6   17  106 1005]
 [   8   20  108 1007]
 [   7   18  110 1009]
 [   6   17  109 1010]
 [   1   10  100 1000]]

Link Cython-wrapped C functions against BLAS from NumPy

I want to use, inside a Cython extension, some C functions defined in .c files that use BLAS subroutines, e.g.
cfile.c
double ddot(int *N, double *DX, int *INCX, double *DY, int *INCY);

double call_ddot(double* a, double* b, int n){
    int one = 1;
    return ddot(&n, a, &one, b, &one);
}
(Let’s say the functions do more than just call one BLAS subroutine)
pyfile.pyx
cimport numpy as np
import numpy as np

cdef extern from "cfile.c":
    double call_ddot(double* a, double* b, int n)

def pyfun(np.ndarray[double, ndim=1] a):
    return call_ddot(&a[0], &a[0], <int> a.shape[0])
setup.py:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext
import numpy

setup(
    name = "wrapped_cfun",
    packages = ["wrapped_cfun"],
    cmdclass = {'build_ext': build_ext},
    ext_modules = [Extension("wrapped_cfun.cython_part", sources=["pyfile.pyx"], include_dirs=[numpy.get_include()])]
)
I want this package to link against the same BLAS library that the installed NumPy or SciPy are using, and would like it to be installable from PIP under different operating systems using numpy or scipy as dependencies, without any additional BLAS-related dependency.
Is there any hack for setup.py that would allow me to accomplish this, in a way that it could work with any BLAS implementation?
Update:
With MKL, I can make it work by modifying the Extension object to point to libmkl_rt, which can be extracted from numpy if MKL is installed, e.g.:
Extension("wrapped_cfun.cython_part", sources=["pyfile.pyx"], include_dirs=[numpy.get_include()], extra_link_args=["-L{path to python's lib dir}", "-l:libmkl_rt.{so, dll, dylib}"])
However, the same trick does not work for OpenBLAS (e.g. -l:libopenblasp-r0.2.20.so). Pointing to libblas.{so,dll,dylib} will not work if that file is a link to libopenblas, but works fine if it's a link to libmkl_rt.
Update 2:
It seems OpenBLAS names its C functions with an underscore at the end, e.g. not ddot but ddot_. The code above with -l:libopenblas will work if I change ddot to ddot_ in the .c file. I'm still wondering if there is some (ideally run-time) mechanism to detect which name should be used in the .c file.
An alternative to depending on the linker/loader to provide the right blas functionality would be to emulate resolution of the necessary blas symbols (e.g. ddot) and to use the wrapped blas function provided by scipy at runtime.
I'm not sure this approach is superior to the "normal way" of building, but I wanted to bring it to your attention, even if only because I find this approach interesting.
The idea in a nutshell:
Define an explicit function-pointer to ddot-functionality, called my_ddot in the snippet below.
Use my_ddot-pointer where you would use ddot-otherwise.
Initialize my_ddot-pointer when the cython-module is loaded with the functionality provided by scipy.
Here is a working prototype (I use verbatim C code to make the snippet standalone and easily testable in a Jupyter notebook; I trust you to transform it to the format you need/like):
%%cython
# h-file:
cdef extern from *:
    """
    // blas-functionality,
    // will be initialized by cython when module is loaded:
    typedef double (*ddot_t)(int *N, double *DX, int *INCX, double *DY, int *INCY);
    extern ddot_t my_ddot;
    double call_ddot(double* a, double* b, int n);
    """
    ctypedef double (*ddot_t)(int *N, double *DX, int *INCX, double *DY, int *INCY)
    ddot_t my_ddot
    double call_ddot(double* a, double* b, int n)

# init the functions of the c-library
# with blas-function provided by scipy
from scipy.linalg.cython_blas cimport ddot
my_ddot = ddot

# a simple function to demonstrate that it works
def ddot_mult(double[:] a, double[:] b):
    cdef int n = len(a)
    return call_ddot(&a[0], &b[0], n)

#-------------------------------------------------
# c-file, added so the example is complete
cdef extern from *:
    """
    ddot_t my_ddot;
    double call_ddot(double* a, double* b, int n){
        int one = 1;
        return my_ddot(&n, a, &one, b, &one);
    }
    """
    pass
And now ddot_mult can be used:
import numpy as np
a=np.arange(4, dtype=float)
ddot_mult(a,a) # 14.0 as expected!
An advantage of this approach is that there is no hassle with distutils, and you have a guarantee that you use the same blas functionality as scipy.
Another perk: one could switch the engine used (mkl, open_blas or even your own implementation) at runtime, without the need to recompile/relink.
On the other hand, there is some additional amount of boilerplate code, and also the danger that the initialization of some symbols will be forgotten.
I've finally figured out an ugly hack for this. I'm not sure if it will always work, but at least it works for combinations of Windows (mingw and visual studio), Linux, MKL and OpenBLAS. I'd still like to know if there are better alternatives, but if not, this will do it:
Edit: Corrected for visual studio now.
Modify the C files to account for the names with underscores (do it for each BLAS function that is called) - you need to declare each underscored function and add a #define for each one:
double ddot_(int *N, double *DX, int *INCX, double *DY, int *INCY);
#define ddot(N, DX, INCX, DY, INCY) ddot_(N, DX, INCX, DY, INCY)

void daxpy_(int *N, double *DA, double *DX, int *INCX, double *DY, int *INCY);
#define daxpy(N, DA, DX, INCX, DY, INCY) daxpy_(N, DA, DX, INCX, DY, INCY)

... etc
Extract library path from NumPy or SciPy and add it to the link arguments.
Detect if the compiler to be used is visual studio, in which case the linking arguments are quite different.
setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext
import numpy
from sys import platform
import os

try:
    blas_path = numpy.distutils.system_info.get_info('blas')['library_dirs'][0]
except:
    if "library_dirs" in numpy.__config__.blas_mkl_info:
        blas_path = numpy.__config__.blas_mkl_info["library_dirs"][0]
    elif "library_dirs" in numpy.__config__.blas_opt_info:
        blas_path = numpy.__config__.blas_opt_info["library_dirs"][0]
    else:
        raise ValueError("Could not locate BLAS library.")

if platform[:3] == "win":
    if os.path.exists(os.path.join(blas_path, "mkl_rt.lib")):
        blas_file = "mkl_rt.lib"
    elif os.path.exists(os.path.join(blas_path, "mkl_rt.dll")):
        blas_file = "mkl_rt.dll"
    else:
        import re
        blas_file = [f for f in os.listdir(blas_path) if bool(re.search("blas", f))]
        if len(blas_file) == 0:
            raise ValueError("Could not locate BLAS library.")
        blas_file = blas_file[0]
elif platform[:3] == "dar":
    blas_file = "libblas.dylib"
else:
    blas_file = "libblas.so"

## https://stackoverflow.com/questions/724664/python-distutils-how-to-get-a-compiler-that-is-going-to-be-used
class build_ext_subclass( build_ext ):
    def build_extensions(self):
        compiler = self.compiler.compiler_type
        if compiler == 'msvc': # visual studio
            for e in self.extensions:
                e.extra_link_args += [os.path.join(blas_path, blas_file)]
        else: # gcc
            for e in self.extensions:
                e.extra_link_args += ["-L"+blas_path, "-l:"+blas_file]
        build_ext.build_extensions(self)

setup(
    name = "wrapped_cfun",
    packages = ["wrapped_cfun"],
    cmdclass = {'build_ext': build_ext_subclass},
    ext_modules = [Extension("wrapped_cfun.cython_part", sources=["pyfile.pyx"], include_dirs=[numpy.get_include()], extra_link_args=[])]
)
As yet another alternative with more recent Cython versions, one can create a "public" Cython function (which will be made available to C code and auto-generate a public header) that would simply call the corresponding BLAS function:
from scipy.linalg.cython_blas cimport ddot

cdef public double ddot_(int *n, double *x, int *ldx, double *y, int *ldy):
    return ddot(n, x, ldx, y, ldy)
Then one simply declares it in the C code or includes the header, and the rest of the Cython extension builder will take care of linkage:
extern double ddot_(int *n, double *x, int *ldx, double *y, int *ldy);

Scipy.optimize - curve fitting with fixed parameters

I'm performing curve fitting with scipy.optimize.leastsq. E.g. for a gaussian:
def fitGaussian(x, y, init=[1.0,0.0,4.0,0.1]):
    fitfunc = lambda p, x: p[0]*np.exp(-(x-p[1])**2/(2*p[2]**2))+p[3]  # Target function
    errfunc = lambda p, x, y: fitfunc(p, x) - y  # Distance to the target function
    final, success = scipy.optimize.leastsq(errfunc, init[:], args=(x, y))
    return fitfunc, final
Now, I want to optionally fix the values of some of the parameters in the fit. The suggestions I have found either are to use a different package, lmfit, which I want to avoid, or are very general, like here.
Since I need a solution which
works with numpy/scipy (no further packages etc.)
is independent of the parameters themselves,
is flexible, in which parameters are fixed or not,
I came up with the following, using a condition on each of the parameters:
def fitGaussian2(x, y, init=[1.0,0.0,4.0,0.1], fix = [False, False, False, False]):
    fitfunc = lambda p, x: (p[0] if not fix[0] else init[0])*np.exp(-(x-(p[1] if not fix[1] else init[1]))**2/(2*(p[2] if not fix[2] else init[2])**2))+(p[3] if not fix[3] else init[3])
    errfunc = lambda p, x, y: fitfunc(p, x) - y  # Distance to the target function
    final, success = scipy.optimize.leastsq(errfunc, init[:], args=(x, y))
    return fitfunc, final
While this works fine, it's neither practical, nor beautiful.
So my question is: Are there better ways of performing curve fitting in scipy for fixed parameters? Or are there wrappers, which already include such parameter fixing?
Using scipy, there are no builtin options that I am aware of. You will always have to do a work-around like the one you already did.
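For reference, a minimal scipy-only sketch of such a work-around (this is not from the answer, and it is still a work-around rather than a built-in option): freeze c at its initial value by optimizing only over the remaining free parameters:

# illustrative sketch: fit the gaussian from the question with c held fixed,
# by closing over the fixed value and passing only the free parameters to leastsq
import numpy as np
import scipy.optimize

def fitGaussianFixedC(x, y, init=[1.0, 0.0, 4.0, 0.1]):
    a0, b0, c_fixed, d0 = init
    fitfunc = lambda q, x: q[0]*np.exp(-(x - q[1])**2/(2*c_fixed**2)) + q[2]  # q = (a, b, d)
    errfunc = lambda q, x, y: fitfunc(q, x) - y
    final, success = scipy.optimize.leastsq(errfunc, [a0, b0, d0], args=(x, y))
    return fitfunc, final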
If you are willing to use a wrapper package however, may I recommend my own symfit? This is a wrapper to scipy with readability and less boilerplate code as its core principles. In symfit, your problem would be solved as:
from symfit import parameters, variables, exp, Fit, Parameter
a, b, c, d = parameters('a, b, c, d')
x, y = variables('x, y')
model_dict = {y: a * exp(-(x - b)**2 / (2 * c**2)) + d}
fit = Fit(model_dict, x=xdata, y=ydata)
fit_result = fit.execute()
The line a, b, c, d = parameters('a, b, c, d') makes four Parameter objects. To fix e.g. the parameter c to its initial value, do the following anywhere before calling fit.execute():
c.value = 4.0
c.fixed = True
So a possible end result might be:
from symfit import parameters, variables, exp, Fit, Parameter
a, b, c, d = parameters('a, b, c, d')
x, y = variables('x, y')
c.value = 4.0
c.fixed = True
model_dict = {y: a * exp(-(x - b)**2 / (2 * c**2)) + d}
fit = Fit(model_dict, x=xdata, y=ydata)
fit_result = fit.execute()
If you want to be more dynamic in your code, you could make the Parameter objects straight away using:
c = Parameter(4.0, fixed=True)
For more info, check the docs: http://symfit.readthedocs.io/en/latest/tutorial.html#simple-example
The above example using symfit would surely simplify the syntax of the fitting approach; however, does the example given really constrain the variable c?
If you look at the fit_result.param you get the following:
OrderedDict([('a', 16.374368575343127),
             ('b', 0.49201249437123556),
             ('c', 0.5337962977235504),
             ('d', -9.55593614465743)])
The parameter c is not 4.0.

Itertools for containers

Consider the following interactive example
>>> l=imap(str,xrange(1,4))
>>> list(l)
['1', '2', '3']
>>> list(l)
[]
Does anyone know if there is already an implementation somewhere out there with a version of imap (and the other itertools functions) such that the second time list(l) is executed you get the same result as the first? And I don't want the regular map, because building the entire output in memory can be a waste of memory if you use larger ranges.
I want something that basically does something like
class cmap:
    def __init__(self, function, *iterators):
        self._function = function
        self._iterators = iterators
    def __iter__(self):
        return itertools.imap(self._function, *self._iterators)
    def __len__(self):
        return min(map(len, self._iterators))
But it would be a waste of time to do this manually for all itertools if someone already did this.
ps.
Do you think containers are more zen than iterators, since for an iterator something like

for i in iterator:
    do something

implicitly empties the iterator, while with a container you explicitly need to remove elements?
You do not have to build such an object for each type of container. Basically, you have the following:
mkimap = lambda: imap(str,xrange(1,4))
list(mkimap())
list(mkimap())
Now you only need a nice wrapping object to prevent the "ugly" function calls. This could work this way:
class MultiIter(object):
    def __init__(self, f, *a, **k):
        if a or k:
            self.create = lambda: f(*a, **k)
        else: # optimize
            self.create = f
    def __iter__(self):
        return self.create()

l = MultiIter(lambda: imap(str, xrange(1,4)))
# or
l = MultiIter(imap, str, xrange(1,4))
# or even
@MultiIter
def l():
    return imap(str, xrange(1,4))

# and then
print list(l)
print list(l)
(untested, hope it works, but you should get the idea)
For your 2nd question: Iterators and containers both have their uses. You should take whatever best fits your needs.
You may be looking for itertools.tee()
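For example, a minimal sketch of that idea (Python 2 syntax, to match the question):

from itertools import imap, tee

# tee gives two independent iterators over the same underlying sequence;
# values consumed by one are buffered internally until the other catches up
l1, l2 = tee(imap(str, xrange(1, 4)))
print list(l1)   # ['1', '2', '3']
print list(l2)   # ['1', '2', '3']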
Iterators are my favorite topic ;)
from itertools import imap

class imap2(object):
    def __init__(self, f, *args):
        self.g = imap(f, *args)
        self.lst = []
        self.done = False
    def __iter__(self):
        while True:
            try:  # try to get something from g
                x = next(self.g)
            except StopIteration:
                if self.done:
                    # give the old values
                    for x in self.lst:
                        yield x
                else:
                    # g was consumed for the first time
                    self.done = True
                return
            else:
                self.lst.append(x)
                yield x

l = imap2(str, xrange(1,4))
print list(l)
print list(l)