calling a fortran dll from python using cffi with multidimensional arrays - dll

I use a dll that contains differential equation solvers among other useful mathematical tools. Unfortunately, this dll is written in Fortran. My program is written in python 3.7 and I use spyder as an IDE.
I successfully called easy functions from the dll. However, I can't seem to get functions to work that require multidimensional arrays.
This is the online documentation to the function I am trying to call:
https://www.nag.co.uk/numeric/fl/nagdoc_fl26/html/f01/f01adf.html
The kernel dies without an error message if I execute the following code:
import numpy as np
import cffi as cf
ffi=cf.FFI()
lib=ffi.dlopen("C:\Windows\SysWOW64\DLL20DDS")
ffi.cdef("""void F01ADF (const int *n, double** a, const int *lda, int *ifail);""")
#Integer
nx = 4
n = ffi.new('const int*', nx)
lda = nx + 1
lda = ffi.new('const int*', lda)
ifail = 0
ifail = ffi.new('int*', ifail)
#matrix to be inversed
ax1 = np.array([5,7,6,5],dtype = float, order = 'F')
ax2 = np.array([7,10,8,7],dtype = float, order = 'F')
ax3 = np.array([6,8,10,9],dtype = float, order = 'F')
ax4 = np.array([5,7,9,10], dtype = float, order = 'F')
ax5 = np.array([0,0,0,0], dtype = float, order = 'F')
ax = (ax1,ax2,ax3,ax4,ax5)
#Array
zx = np.zeros(nx, dtype = float, order = 'F')
a = ffi.cast("double** ", zx.__array_interface__['data'][0])
for i in range(lda[0]):
a[i] = ffi.cast("double* ", ax[i].__array_interface__['data'][0])
lib.F01ADF(n, a, lda, ifail)
Since function with 1D arrays work I assume that the multidimensional array is the issues.
Any kind of help is greatly appreciated,
Thilo

Not having access to the dll you refer to complicates giving a definitive answer, however, the documentation of the dll and the provided Python script may be enough to diagnose the problem. There are at least two issues in your example:
The C header interface:
Your documentation link clearly states what the function's C header interface should look like. I'm not very well versed in C, Python's cffi or cdef, but the parameter declaration for a in your function interface seems wrong. The double** a (pointer to pointer to double) in your function interface should most likely be double a[] or double* a (pointer to double) as stated in the documentation.
Defining a 2d Numpy array with Fortran ordering:
Note that your Numpy arrays ax1..5 are one dimensional arrays, since the arrays only have one dimension order='F' and order='C' are equivalent in terms of memory layout and access. Thus, specifying order='F' here, probably does not have the intended effect (Fortran using column-major ordering for multi-dimensional arrays).
The variable ax is a tuple of Numpy arrays, not a 2d Numpy array, and will therefore have a very different representation in memory (which is of utmost importance when passing data to the Fortran dll) than a 2d array.
Towards a solution
My first step would be to correct the C header interface. Next, I would declare ax as a proper Numpy array with two dimensions, using Fortran ordering, and then cast it to the appropriate data type, as in this example:
#file: test.py
import numpy as np
import cffi as cf
ffi=cf.FFI()
lib=ffi.dlopen("./f01adf.dll")
ffi.cdef("""void f01adf_ (const int *n, double a[], const int *lda, int *ifail);""")
# integers
nx = 4
n = ffi.new('const int*', nx)
lda = nx + 1
lda = ffi.new('const int*', lda)
ifail = 0
ifail = ffi.new('int*', ifail)
# matrix to be inversed
ax = np.array([[5, 7, 6, 5],
[7, 10, 8, 7],
[6, 8, 10, 9],
[5, 7, 9, 10],
[0, 0, 0, 0]], dtype=float, order='F')
# operation on matrix using dll
print("BEFORE:")
print(ax.astype(int))
a = ffi.cast("double* ", ax.__array_interface__['data'][0])
lib.f01adf_(n, a, lda, ifail)
print("\nAFTER:")
print(ax.astype(int))
For testing purposes, consider the following Fortran subroutine that has the same interface as your actual dll as a substitute for your dll. It will simply add 10**(i-1) to the i'th column of input array a. This will allow checking that the interface between Python and Fortran works as intended, and that the intended elements of array a are operated on:
!file: f01adf.f90
Subroutine f01adf(n, a, lda, ifail)
Integer, Intent (In) :: n, lda
Integer, Intent (Inout) :: ifail
Real(Kind(1.d0)), Intent (Inout) :: a(lda,*)
Integer :: i
print *, "Fortran DLL says: Hello world!"
If ((n < 1) .or. (lda < n+1)) Then
! Input variables not conforming to requirements
ifail = 2
Else
! Input variables acceptable
ifail = 0
! add 10**(i-1) to the i'th column of 2d array 'a'
Do i = 1, n
a(:, i) = a(:, i) + 10**(i-1)
End Do
End If
End Subroutine
Compiling the Fortran code, and then running the suggested Python script, gives me the following output:
> gfortran -O3 -shared -fPIC -fcheck=all -Wall -Wextra -std=f2008 -o f01adf.dll f01adf.f90
> python test.py
BEFORE:
[[ 5 7 6 5]
[ 7 10 8 7]
[ 6 8 10 9]
[ 5 7 9 10]
[ 0 0 0 0]]
Fortran DLL says: Hello world!
AFTER:
[[ 6 17 106 1005]
[ 8 20 108 1007]
[ 7 18 110 1009]
[ 6 17 109 1010]
[ 1 10 100 1000]]

Related

pycuda - memcpy_dtoh, not giving what appears to have been set

I have a very simple function where I'm passing in a char array and doing a simple character match. I want to return an array of 1/0 depending on which characters are matched.
Problem: although I can see the value has been set in the data structure (as I print it in the function after it's assigned) when the int array is copied back from the device the values aren't as expected.
I'm sure it's something silly.
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np
mod = SourceModule("""
__global__ void test(const char *q, const int chrSize, int *d, const int intSize) {
int v = 0;
if( q[threadIdx.x * chrSize] == 'a' || q[threadIdx.x * chrSize] == 'c' ) {
v = 1;
}
d[threadIdx.x * intSize] = v;
printf("x=%d, y=%d, val=%c ret=%d\\n", threadIdx.x, threadIdx.y, q[threadIdx.x * chrSize], d[threadIdx.x * intSize]);
}
""")
func = mod.get_function("test")
# input data
a = np.asarray(['a','b','c','d'], dtype=np.str_)
# allocate/copy to device
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)
# destination array
d = np.zeros((4), dtype=np.int16)
# allocate/copy to device
d_gpu = cuda.mem_alloc(d.nbytes)
cuda.memcpy_htod(d_gpu, d)
# run the function
func(a_gpu, np.int8(a.dtype.itemsize), d_gpu, np.int8(d.dtype.itemsize), block=(4,1,1))
# copy data back and priint
cuda.memcpy_dtoh(d, d_gpu)
print(d)
Output:
x=0, y=0, val=a ret=1
x=1, y=0, val=b ret=0
x=2, y=0, val=c ret=1
x=3, y=0, val=d ret=0
[1 0 0 0]
Expected output:
x=0, y=0, val=a ret=1
x=1, y=0, val=b ret=0
x=2, y=0, val=c ret=1
x=3, y=0, val=d ret=0
[1 0 1 0]
You have two main problems, neither of which have anything to do with memcpy_dtoh:
You have declared d and d_gpu as dtype np.int16, but the kernel is expecting C++ int, leading to a type mistmatch. You should use the np.int32 type to define the arrays.
The indexing of d within the kernel is incorrect. If you have declared the array to the compiler as a 32 bit type, indexing the array as d[threadIdx.x] will automatically include the correct alignment for the type. Passing and using intSize to the kernel for indexing d is not required and it is incorrect to do so.
If you fix those two issues, I suspect the code will work as intended.

What is wrong with my cython implementation of erosion operation of mathematical morphology

I have produced a naive implementation of "erosion". The performance is not relevant since I just trying to understand the algorithm. However, the output of my implementation does not match the one I get from scipy.ndimage. What is wrong with my implementation ?
Here is my implementation with a small test case:
import numpy as np
from PIL import Image
# a small image to play with a cross structuring element
imgmat = np.array([
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,1,0,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,1,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0],
])
imgmat2 = np.where(imgmat == 0, 0, 255).astype(np.uint8)
imarr = Image.fromarray(imgmat2).resize((100, 200))
imarr = np.array(imgrrr)
imarr = np.where(imarr == 0, 0, 1)
se_mat3 = np.array([
[0,1,0],
[1,1,1],
[0,1,0]
])
se_mat31 = np.where(se_mat3 == 1, 0, 1)
The imarr is .
My implementation of erosion:
%%cython -a
import numpy as np
cimport numpy as cnp
cdef erosionC(cnp.ndarray[cnp.int_t, ndim=2] img,
cnp.ndarray[cnp.int_t, ndim=2] B, cnp.ndarray[cnp.int_t, ndim=2] X):
"""
X: image coordinates
struct_element_mat: black and white image, black region is considered as the shape
of structuring element
This operation checks whether (B *includes* X) = $B \subset X$
as per defined in
Serra (Jean), « Introduction to mathematical morphology »,
Computer Vision, Graphics, and Image Processing,
vol. 35, nᵒ 3 (septembre 1986).
URL : https://linkinghub.elsevier.com/retrieve/pii/0734189X86900022..
doi: 10.1016/0734-189X(86)90002-2
Consulted le 6 août 2020, p. 283‑305.
"""
cdef cnp.ndarray[cnp.int_t, ndim=1] a, x, bx
cdef cnp.ndarray[cnp.int_t, ndim=2] Bx, B_frame, Xcp, b
cdef bint check
a = B[0] # get an anchor point from the structuring element coordinates
B_frame = B - a # express the se element coordinates in with respect to anchor point
Xcp = X.copy()
b = img.copy()
for x in X: # X contains the foreground coordinates in the image
Bx = B_frame + x # translate relative coordinates with respect to foreground coordinates considering it as the anchor point
check = True # this is erosion so if any of the se coordinates is not in foreground coordinates we consider it a miss
for bx in Bx: # Bx contains all the translated coordinates of se
if bx not in Xcp:
check = False
if check:
b[x[0], x[1]] = 1 # if there is a hit
else:
b[x[0], x[1]] = 0 # if there is no hit
return b
def erosion(img: np.ndarray, struct_el_mat: np.ndarray, foregroundValue = 0):
B = np.argwhere(struct_el_mat == 0)
X = np.argwhere(img == foregroundValue)
nimg = erosionC(img, B, X)
return np.where(nimg == 1, 255, 0)
The calling code for both is:
from scipy import ndimage as nd
err = nd.binary_erosion(imarr, se_mat3)
imerrCustom = erosion(imarr, se_mat31, foregroundValue=1)
err produces
imerrCustom produces
In the end, I am still not sure about it, but after having read several papers more, I assume that my interpretation of X as foreground coordinates was an error. It should have probably been the entire image that is being iterated.
As I have stated I am not sure if this interpretation is correct as well. But I made a new implementation which iterates over the image, and it gives a more plausible result. I am sharing it in here, hoping that it might help someone:
%%cython -a
import numpy as np
cimport numpy as cnp
cdef dilation_c(cnp.ndarray[cnp.uint8_t, ndim=2] X,
cnp.ndarray[cnp.uint8_t, ndim=2] SE):
"""
X: boolean image
SE: structuring element matrix
origin: coordinate of the origin of the structuring element
This operation checks whether (B *hits* X) = $B \cap X \not = \emptyset$
as per defined in
Serra (Jean), « Introduction to mathematical morphology »,
Computer Vision, Graphics, and Image Processing,
vol. 35, nᵒ 3 (septembre 1986).
URL : https://linkinghub.elsevier.com/retrieve/pii/0734189X86900022..
doi: 10.1016/0734-189X(86)90002-2
Consulted le 6 août 2020, p. 283‑305.
The algorithm adapts DILDIRECT of
Najman (Laurent) et Talbot (Hugues),
Mathematical morphology: from theory to applications,
2013. ISBN : 9781118600788, p. 329
to the formula given in
Jähne (Bernd),
Digital image processing,
6th rev. and ext. ed, Berlin ; New York,
2005. TA1637 .J34 2005.
ISBN : 978-3-540-24035-8.
"""
cdef cnp.ndarray[cnp.uint8_t, ndim=2] O
cdef list elst
cdef int r, c, X_rows, X_cols, SE_rows, SE_cols, se_r, se_c
cdef cnp.ndarray[cnp.int_t, ndim=1] bp
cdef list conds
cdef bint check, b, p, cond
O = np.zeros_like(X)
X_rows, X_cols = X.shape[:2]
SE_rows, SE_cols = SE.shape[:2]
# a boolean convolution
for r in range(0, X_rows-SE_rows):
for c in range(0, X_cols - SE_cols):
conds = []
for se_r in range(SE_rows):
for se_c in range(SE_cols):
b = <bint>SE[se_r, se_c]
p = <bint>X[se_r+r, se_c+c]
conds.append(b and p)
O[r,c] = <cnp.uint8_t>any(conds)
return O
def dilation_erosion(
img: np.ndarray,
struct_el_mat: np.ndarray,
foregroundValue: int = 1,
isErosion: bool = False):
"""
img: image matrix
struct_el: NxN mesh grid of the structuring element whose center is SE's origin
structuring element is encoded as 1
foregroundValue: value to be considered as foreground in the image
"""
B = struct_el_mat.astype(np.uint8)
if isErosion:
X = np.where(img == foregroundValue, 0, 1).astype(np.uint8)
else:
X = np.where(img == foregroundValue, 1, 0).astype(np.uint8)
nimg = dilation_c(X, B)
foreground, background = (255, 0) if foregroundValue == 1 else (0, 1)
if isErosion:
return np.where(nimg == 1, background, foreground).astype(np.uint8)
else:
return np.where(nimg == 1, foreground, background).astype(np.uint8)
# return nimg

Initializing variables in gekko using the array model function

Defining an array of Gekko variables does not allow any arguments to initialize the variables. For example, I am unable to make an array of integer variables using the m.Array function.
I can make an array of variables using this syntax: m.Array(m.Var, (42, 42)). However, I don't know how to make this array an array of integer variables because the m.Var passed in to the m.Array function does not take any arguments.
I have a single variable as an integer variable:
my_var_is_an_integer_var = m.Var(0, lb=0, ub=1, integer=True)
I have an array of variables that are not integer variables:
my_array_vars_are_not_integer_vars = m.Array(m.Var, (42, 42))
I want an array of integer variables: my_array_vars_are_integer_vars = m.Array(m.Var(0, lb=0, ub=1, integer=True), (42,42)) (Throws error)
HOW DO I INITIALIZE THE VARIABLES IN THE ARRAY TO BE INTEGER VARIABLES???
Error when trying to initialize array as integer variables:
Traceback (most recent call last):
File "integer_array.py", line 7, in <module>
my_array_vars_are_not_integer_vars = m.Array(m.Var(0, lb=0, ub=1,
integer=True), (42,42))
File "C:\Users\wills\Anaconda3\lib\site-packages\gekko\gekko.py", line
1831, in Array
i[...] = f(**args)
TypeError: 'GKVariable' object is not callable
If you need to pass additional arguments when creating a variable array, you can use one of the following options. Option 1 creates a Numpy array while Options 2 and 3 create a Python list.
Option 1 (Preferred)
Create a numpy array with the m.Array function with additional argument integer=True:
y = m.Array(m.Var,(42,42),lb=0,ub=1,integer=True)
Option 2
Create a 2D list of variables with a list comprehension:
y = [[m.Var(lb=0,ub=1,integer=True) for i in range(42)] for j in range(42)]
Option 3
Alternatively, you can create an empty list (y) and append binary values to that list.
y = [[None]*42]*42
for i in range(42):
for j in range(42):
y[i][j] = m.Var(lb=0,ub=1,integer=True)
The UPPER and LOWER bounds can be changed after the variable creation but the integer option is only available at initialization. Don't forget to switch to the APOPT MINLP solver for integer variable solutions with m.options.SOLVER = 1. Below is a complete example that uses all three options but with a 3x4 array for x, y, and z.
from gekko import GEKKO
import numpy as np
m = GEKKO()
# option 1
x = m.Array(m.Var,(3,4),lb=0,ub=1,integer=True)
# option 2
y = [[m.Var(lb=0,ub=1,integer=True) for i in range(4)] for j in range(3)]
# option 3
z = [[None]*4]*3
for i in range(3):
for j in range(4):
z[i][j] = m.Var(lb=0,ub=1,integer=True)
# switch to APOPT
m.options.SOLVER = 1
# define objective function
m.Minimize(m.sum(m.sum(x)))
m.Minimize(m.sum(m.sum(np.array(y))))
m.Minimize(m.sum(m.sum(np.array(z))))
# define equation
m.Equation(x[1,2]==0)
m.Equation(m.sum(x[:,0])==2)
m.Equation(m.sum(x[:,1])==3)
m.Equation(m.sum(x[2,:])==1)
m.solve(disp=True)
print(x)
The objective is to minimize sum of all the elements in x, y, and z but there are certain constraints on an element, row, and columns of x. The solution is:
[[[1.0] [1.0] [0.0] [0.0]]
[[1.0] [1.0] [0.0] [0.0]]
[[0.0] [1.0] [0.0] [0.0]]]

How to optimize the linear coefficients for numpy arrays in a maximization function?

I have to optimize the coefficients for three numpy arrays which maximizes my evaluation function.
I have a target array called train['target'] and three predictions arrays named array1, array2 and array3.
I want to put the best linear coefficients i.e., x,y,z for these three arrays which will maximize the function
roc_aoc_curve(train['target'], xarray1 + yarray2 +z*array3)
the above function would be maximum when prediction is closer to the target.
i.e, xarray1 + yarray2 + z*array3 should be closer to train['target'].
The range of x,y,z >=0 and x,y,z <= 1
Basically I am trying to put the weights x,y,z for each of the three arrays which would make the function
xarray1 + yarray2 +z*array3 closer to the train['target']
Any help in getting this would be appreciated.
I used pulp.LpProblem('Giapetto', pulp.LpMaximize) to do the maximization. It works for normal numbers, integers etc, however failing while trying to do with arrays.
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
prob += score
coef = x+y+z
prob += (coef==1)
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Getting error at the line
score = roc_auc_score(train['target'],x*array1+ y*array2 + z*array3)
TypeError: unsupported operand type(s) for /: 'int' and 'LpVariable'
Can't progress beyond this line when using arrays. Not sure if my approach is correct. Any help in optimizing the function would be appreciated.
When you add sums of array elements to a PuLP model, you have to use built-in PuLP constructs like lpSum to do it -- you can't just add arrays together (as you discovered).
So your score definition should look something like this:
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
A few notes about this:
[+] You didn't provide the definition of roc_auc_score so I just pretended that it equals the sum of the element-wise difference between the target array and the weighted sum of the other 3 arrays.
[+] I suspect your actual calculation for roc_auc_score is nonlinear; more on this below.
[+] arr_ind is a list of the indices of the arrays, which I created like this:
# build array index
arr_ind = range(len(array1))
[+] You also didn't include the arrays, so I created them like this:
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
train = {}
train['target'] = np.ones((10, 1))
Here is my complete code, which compiles and executes, though I'm sure it doesn't give you the result you are hoping for, since I just guessed about target and roc_auc_score:
import numpy as np
import pulp
# create the LP object, set up as a maximization problem
prob = pulp.LpProblem('Giapetto', pulp.LpMaximize)
# dummy arrays since arrays weren't in OP code
array1 = np.random.rand(10, 1)
array2 = np.random.rand(10, 1)
array3 = np.random.rand(10, 1)
# build array index
arr_ind = range(len(array1))
# set up decision variables
x = pulp.LpVariable('x', lowBound=0)
y = pulp.LpVariable('y', lowBound=0)
z = pulp.LpVariable('z', lowBound=0)
# dummy roc_auc_score since roc_auc_score wasn't in OP code
train = {}
train['target'] = np.ones((10, 1))
score = pulp.lpSum([train['target'][i] - (x * array1[i] + y * array2[i] + z * array3[i]) for i in arr_ind])
prob += score
coef = x + y + z
prob += coef == 1
# solve the LP using the default solver
optimization_result = prob.solve()
# make sure we got an optimal solution
assert optimization_result == pulp.LpStatusOptimal
# display the results
for var in (x, y,z):
print('Optimal weekly number of {} to produce: {:1.0f}'.format(var.name, var.value()))
Output:
Optimal weekly number of x to produce: 0
Optimal weekly number of y to produce: 0
Optimal weekly number of z to produce: 1
Process finished with exit code 0
Now, if your roc_auc_score function is nonlinear, you will have additional troubles. I would encourage you to try to formulate the score in a way that is linear, possibly using additional variables (for example, if you want the score to be an absolute value).

Why does cythons in-place division of numpy arrays use conversion to python floats?

I tried to normalize a vector stored as numpy array, but cython -a shows unexpected conversions to Python values in this code.
Minimal example:
import numpy as np
cimport cython
cimport numpy as np
#cython.wraparound(False)
#cython.boundscheck(False)
cdef vec_diff(np.ndarray[double, ndim=1] vec1, double m):
vec1/=m
return vec1
Cython 0.29.6 run with the -a option generates the following code for the line vec1/=m:
__pyx_t_1 = PyFloat_FromDouble(__pyx_v_m); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_2 = __Pyx_PyNumber_InPlaceDivide(((PyObject *)__pyx_v_vec1), __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
if (!(likely(((__pyx_t_2) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_2, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 8, __pyx_L1_error)
__pyx_t_3 = ((PyArrayObject *)__pyx_t_2);
{
__Pyx_BufFmt_StackElem __pyx_stack[1];
__Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_vec1.rcbuffer->pybuffer);
__pyx_t_4 = __Pyx_GetBufferAndValidate(&__pyx_pybuffernd_vec1.rcbuffer->pybuffer, (PyObject*)__pyx_t_3, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack);
if (unlikely(__pyx_t_4 < 0)) {
PyErr_Fetch(&__pyx_t_5, &__pyx_t_6, &__pyx_t_7);
if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_vec1.rcbuffer->pybuffer, (PyObject*)__pyx_v_vec1, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {
Py_XDECREF(__pyx_t_5); Py_XDECREF(__pyx_t_6); Py_XDECREF(__pyx_t_7);
__Pyx_RaiseBufferFallbackError();
} else {
PyErr_Restore(__pyx_t_5, __pyx_t_6, __pyx_t_7);
}
__pyx_t_5 = __pyx_t_6 = __pyx_t_7 = 0;
}
__pyx_pybuffernd_vec1.diminfo[0].strides = __pyx_pybuffernd_vec1.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_vec1.diminfo[0].shape = __pyx_pybuffernd_vec1.rcbuffer->pybuffer.shape[0];
if (unlikely(__pyx_t_4 < 0)) __PYX_ERR(0, 8, __pyx_L1_error)
}
__pyx_t_3 = 0;
__Pyx_DECREF_SET(__pyx_v_vec1, ((PyArrayObject *)__pyx_t_2));
__pyx_t_2 = 0;
where the first line __pyx_t_1 = PyFloat_FromDouble(__pyx_v_m); has PyFloat_FromDouble highlighted in dark red.
Given that I have told cython that the array contains double values, why does it have to convert to a python float?
Note: Memoryviews do not support the /= operation (would require a loop)
Because this isn't something that Cython does anything special for or optimises at all. All it's doing is calling __Pyx_PyNumber_InPlaceDivide on the Numpy array, which calls the Numpy array's __idiv__ operator.
Since it's calling a Python operator it needs to pass a Python object as the second argument, and hence it needs to convert your double to a Python float.
The Numpy __idiv__ operator is almost certainly written in C so likely to be pretty fast (although there is a little overhead calling it) so there's not a lot of value in Cython doing anything except delegating to Numpy's code.
Memoryviews don't define the whole-array operators (they're just ways to access memory so don't make any claims about meaningful mathematical operations) and hence the fact that it doesn't work is consistent with how Cython deals with these operators.