Cython self defined data types ndarray - numpy

I've created a dtype for my np.ndarrays:
particle_t = np.dtype([
('position', float, 2),
('momentum', float, 2),
('velocity', float, 2),
('force', float, 2),
('charge', int, 1),
])
According to the official examples one can call:
def my_func(np.ndarray[dtype, dim] particles):
but when I try to compile:
def tester(np.ndarray[particle_t, ndim = 1] particles):
I get the Invalid type error. Another possibility of usage I've seen is with the memory view like int[:]. Trying def tester(particle_t[:] particles): results in:
'particle_t' is not a type identifier.
How can I fix this?

Obviously, particle_t is not a type but a Python-object as far as Cython is concerned.
It is similar to np.int32 being a Python-object and thus
def tester(np.ndarray[np.int32] particles): #doesn't work!
    pass
won't work, you need to use the corresponding type, i.e. np.int32_t:
def tester(np.ndarray[np.int32_t] particles): #works!
    pass
But what is the corresponding type for particle_t? You need to create a packed struct, which would mirror your numpy-type. Here is a simplified version:
#Python code:
particle_t = np.dtype([
('position', np.float64, 2), #It is better to specify the number of bytes exactly!
('charge', np.int32, 1), #otherwise you might be surprised...
])
and corresponding Cython code:
%%cython
import numpy as np
cimport numpy as np
cdef packed struct cy_particle_t:
    np.float64_t position_x[2]
    np.int32_t charge
def tester(np.ndarray[cy_particle_t, ndim = 1] particles):
    print(particles[0])
Not only does it compile and load, it also works as advertised:
>>> t=np.zeros(2, dtype=particle_t)
>>> t[:]=42
>>> tester(t)
{'charge': 42, 'position_x': [42.0, 42.0]}
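One way to see why the struct has to be packed is to inspect the byte layout of the dtype from Python. The sketch below is plain NumPy, not Cython. As an assumption for illustration, it declares the charge field without the trailing 1 and compares the default (unpadded) layout with an aligned one.

```python
import numpy as np

# Simplified particle dtype; 'charge' is declared without the trailing 1 (assumption).
particle_t = np.dtype([
    ('position', np.float64, 2),
    ('charge', np.int32),
])

# By default a structured dtype has no padding: 2*8 + 4 bytes per record.
packed_size = particle_t.itemsize
offsets = {name: particle_t.fields[name][1] for name in particle_t.names}

# align=True pads the record like an ordinary C struct; a packed Cython
# struct would no longer describe this layout.
aligned_t = np.dtype([
    ('position', np.float64, 2),
    ('charge', np.int32),
], align=True)
aligned_size = aligned_t.itemsize

print(packed_size, offsets, aligned_size)
```

If the dtype were created with align=True, a plain (non-packed) struct would presumably be the better match on the Cython side.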

Related

Unable to assign values to numpy array in cython

I am fairly new to parallel programming in cython and I was trying to create a 1D array of size 3 from numpy, however I am unable to assign values to this array unless I specify it element by element.
import numpy as np
cimport numpy as cnp
cdef int num = 3
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight = np.ones((num), dtype = "int")
Weight[2] = 6
print(Weight)
Output -> [1 1 6]
Weight = cnp.ndarray([1,2,3])
Output -> ValueError: Buffer has wrong number of dimensions (expected 1, got 3)
In the comments I suggested changing:
Weight = cnp.ndarray([1,2,3])
to
Weight = np.array([1,2,3])
Just to clarify my comment a bit more:
The line
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight = np.ones((num), dtype = "int")
is effectively two parts:
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight
a declaration only. This allocates no memory; it merely creates a variable that can reference a numpy array and that allows quick indexing.
Weight = np.ones((num), dtype = "int")
This is a normal Python call to np.ones which allocates memory for the array. It is largely un-accelerated by Cython. From this point on Weight is a reference to that allocated array and can be used to change it. Note that Weight = ... in following lines will change what array Weight references.
I therefore suggest you skip the np.ones step and just do
cdef cnp.ndarray[cnp.int_t, ndim = 1] Weight = np.array([1,2,3], dtype = "int")
Be aware that the only bit of Numpy that using these declarations accelerates is indexing into the array. Almost all other Numpy calls happen through the normal Python mechanism and will require the GIL and will happen at normal Python speed.
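The "expected 1, got 3" error can be reproduced in plain Python, without Cython. A minimal sketch: the low-level constructor np.ndarray reads its argument as a shape, while np.array reads it as values. The variable names are only illustrative.

```python
import numpy as np

# np.ndarray(shape) is the low-level constructor: the list is read as a shape,
# so this is an uninitialized 3-dimensional array.
as_shape = np.ndarray([1, 2, 3])

# np.array(values) builds a 1-dimensional array holding the values themselves.
as_values = np.array([1, 2, 3])

print(as_shape.shape)   # (1, 2, 3)
print(as_values.shape)  # (3,)

# Rebinding a name points it at a new array; the old array is left untouched.
weight = np.ones(3, dtype=int)
first = weight
weight = np.array([1, 2, 3])
print(first, weight)
```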

numpy ndarray error in lmfit when model is passed using sympy

I got the following error:
<lambdifygenerated-1>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return numpy.array((A1*exp(-1/2*(x - xc1)**2/sigma1**2), 0, 0))
Here I have just one model, but this code is written for combining several models when fitting with lmfit. Could you tell me what causes this?
import matplotlib.pyplot as plt
import numpy as np
import sympy
from sympy.parsing import sympy_parser
import lmfit
gauss_peak1 = sympy_parser.parse_expr('A1*exp(-(x-xc1)**2/(2*sigma1**2))')
gauss_peak2 = 0
exp_back = 0
model_list = sympy.Array((gauss_peak1, gauss_peak2, exp_back))
model = sum(model_list)
print(model)
model_list_func = sympy.lambdify(list(model_list.free_symbols), model_list)
model_func = sympy.lambdify(list(model.free_symbols), model)
np.random.seed(1)
x = np.linspace(0, 10, 40)
param_values = dict(x=x, A1=2, sigma1=1, xc1=2)
y = model_func(**param_values)
yi = model_list_func(**param_values)
yn = y + np.random.randn(y.size)*0.4
plt.plot(x, yn, 'o')
plt.plot(x, y)
lm_mod = lmfit.Model(model_func, independent_vars=('x'))
res = lm_mod.fit(data=yn, **param_values)
res.plot_fit()
plt.plot(x, y, label='true')
plt.legend()
plt.show()
lmfit.Model takes a model function that is a Python function. It parses the function arguments and expects those to be the Parameters for the model.
I don't think using sympy-created functions will do that. Do you need to use sympy here? I don't see why. The usage here seems designed to make the code more complex than it needs to be. It seems you want to make a model with a Gaussian-like peak, and a constant(?) background. If so, why not do
from lmfit.models import GaussianModel, ConstantModel
model = GaussianModel(prefix='p1_') + ConstantModel()
params = model.make_params(p1_amplitude=2, p1_center=2, p1_sigma=1, c=0)
That just seems way easier to me, and it is very easy to add a second Gaussian peak to that model.
But even if you have your own preferred mathematical expression, don't use that as a sympy string, use it as Python code:
import numpy as np
def myfunction(x, A1, xc1, sigma1):
    return A1*np.exp(-(x-xc1)**2/(2*sigma1**2))
and then
from lmfit import Model
mymodel = Model(myfunction)
params = mymodel.make_params(A1=2, xc1=2, sigma1=1)
In short: sympy is an amazing tool, but lmfit does not use it.
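To illustrate what "parses the function arguments" means, here is a rough sketch using only the standard library and NumPy. The helper argument_names is hypothetical and not part of lmfit, whose actual implementation may differ. It only shows that an ordinary Python function exposes its parameter names by introspection.

```python
import inspect
import numpy as np

def myfunction(x, A1, xc1, sigma1):
    """Gaussian peak written as an ordinary Python function."""
    return A1 * np.exp(-(x - xc1)**2 / (2 * sigma1**2))

# Hypothetical helper (not an lmfit API): list the argument names of a function.
def argument_names(func):
    return list(inspect.signature(func).parameters)

names = argument_names(myfunction)
independent = names[0]   # assumed convention: the first argument is the independent variable
parameters = names[1:]   # the remaining names would become the fit parameters

x = np.linspace(0, 10, 40)
y = myfunction(x, A1=2, xc1=2, sigma1=1)
print(independent, parameters, y.shape)
```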

passing numpy array as parameter in theano function

As a beginner, I was trying to compute the dot product of two matrices using Theano.
My code is very simple.
import theano
import theano.tensor as T
import numpy as np
from theano import function
def covarience(array):
    input_array=T.matrix('input_array')
    deviation_matrix = T.matrix('deviation_matrix')
    matrix_filled_with_1s=T.matrix('matrix_filled_with_1s')
    z = T.dot(input_array, matrix_filled_with_1s)
    identity=np.ones((len(array),len(array)))
    f=function([array,identity],z)
    # print(f)
covarience(np.array([[2,4],[6,8]]))
But each time I run this code, I get an error message like "TypeError: Unknown parameter type: "
Can anyone tell me what's wrong with my code?
You cannot pass a numpy array as an input when defining a Theano function; the inputs must be theano.tensor (symbolic) variables. You define the computation in terms of symbolic variables, compile it with theano.function, and only then call the compiled function on real data such as numpy arrays. Defining the Theano function itself with a numpy array makes no sense.
This should work:
import theano
import theano.tensor as T
import numpy as np
a = T.matrix('a')
b = T.matrix('b')
z = T.dot(a, b)
f = theano.function([a, b], z)
a_d = np.asarray([[2, 4], [6, 8]], dtype=theano.config.floatX)
b_d = np.ones(a_d.shape, dtype=theano.config.floatX)
print(f(a_d, b_d))
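As a sanity check that does not depend on Theano, the same product can be computed with plain NumPy. This sketch assumes float64 in place of theano.config.floatX; the compiled function above should print the same values.

```python
import numpy as np

# Same input data as above, with float64 assumed instead of theano.config.floatX.
a_d = np.asarray([[2, 4], [6, 8]], dtype=np.float64)
b_d = np.ones(a_d.shape, dtype=np.float64)

# Multiplying by a matrix of ones puts each row sum into every column.
expected = np.dot(a_d, b_d)
print(expected)
```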

Passing structured array to Cython, failed (I think it is a Cython bug)

Suppose I have
a = np.zeros(2, dtype=[('a', np.int), ('b', np.float, 2)])
a[0] = (2,[3,4])
a[1] = (6,[7,8])
then I define the same Cython structure
import numpy as np
cimport numpy as np
cdef packed struct mystruct:
    np.int_t a
    np.float_t b[2]
def test_mystruct(mystruct[:] x):
    cdef:
        int k
        mystruct y
    for k in range(2):
        y = x[k]
        print y.a
        print y.b[0]
        print y.b[1]
after this, I run
test_mystruct(a)
and I got error:
ValueError Traceback (most recent call last)
<ipython-input-231-df126299aef1> in <module>()
----> 1 test_mystruct(a)
_cython_magic_5119cecbaf7ff37e311b745d2b39dc32.pyx in _cython_magic_5119cecbaf7ff37e311b745d2b39dc32.test_mystruct (/auto/users/pwang/.cache/ipython/cython/_cython_magic_5119cecbaf7ff37e311b745d2b39dc32.c:1364)()
ValueError: Expected 1 dimension(s), got 1
My question is how to fix it? Thank you.
This pyx compiles and imports ok:
import numpy as np
cimport numpy as np
cdef packed struct mystruct:
    int a[2] # change from plain int
    float b[2]
    int c
def test_mystruct(mystruct[:] x):
    cdef:
        int k
        mystruct y
    for k in range(2):
        y = x[k]
        print y.a
        print y.b[0]
        print y.b[1]
dt='2i,2f,i'
b=np.zeros((3,),dtype=dt)
test_mystruct(b)
I started with the test example mentioned in my comment, and played with your case. I think the key change was to define the first element of the packed structure to be int a[2]. So if any element is an array, the first must be an array to properly set up the structure.
Clearly an error that the test file isn't catching.
Defining the element as int a[1] doesn't work, possibly because the dtype removes such a dimension:
In [47]: np.dtype([('a', np.int, 1), ('b', np.float, 2)])
Out[47]: dtype([('a', '<i4'), ('b', '<f8', (2,))])
Defining the dtype to get around this shouldn't be hard until the issue is raised and patched.
The struct could have a[1], but the array dtype would have to specify the size with a tuple: ('a','i',(1,)). ('a','i',1) would make the size ().
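The effect of the shape argument on a field can be checked from Python. A small sketch, in which dt_tuple, dt_plain and dt_string are illustrative names, compares a field declared with the tuple (1,), a field declared with no shape at all, and the string form used above.

```python
import numpy as np

# A tuple shape keeps the length-1 dimension on field 'a'.
dt_tuple = np.dtype([('a', 'i', (1,)), ('b', 'f', 2)])
print(dt_tuple['a'].shape)  # (1,)
print(dt_tuple['b'].shape)  # (2,)

# Without a shape the field is a scalar, i.e. its shape is ().
dt_plain = np.dtype([('a', 'i'), ('b', 'f', 2)])
print(dt_plain['a'].shape)  # ()

# The comma-separated string form names the fields f0, f1, f2.
dt_string = np.dtype('2i,2f,i')
print(dt_string.names, dt_string.itemsize)
```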
If one of the struct arrays is 2d, it looks like all of them have to be:
cdef packed struct mystruct:
    int a[1][1]
    float b[2][1]
    int c[2][2]
https://github.com/cython/cython/blob/c4c2e3d8bd760386b26dbd6cffbd4e30ba0a7d13/tests/memoryview/numpy_memoryview.pyx
Stepping back a bit, I wonder what the point is of processing a complex structured array in cython. For some operations, wouldn't it work just as well to pass the fields as separate variables? For example myfunc(a['a'],a['b']) instead of myfunc(a).
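A short sketch of what passing the fields separately looks like on the Python side. The explicit np.int64 and np.float64 are assumptions standing in for the np.int and np.float of the question. Each field comes out as an ordinary array with a plain dtype, which a typed Cython function could presumably accept directly.

```python
import numpy as np

a = np.zeros(2, dtype=[('a', np.int64), ('b', np.float64, 2)])
a[0] = (2, [3, 4])
a[1] = (6, [7, 8])

ints = a['a']     # shape (2,), dtype int64
floats = a['b']   # shape (2, 2), dtype float64

# Field access returns views, so writes go through to the structured array.
floats[0, 0] = 30.0
print(ints, floats, a[0])
```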
There is a general method to get the dtype for a c struct, but it involves a temporary variable:
cdef mystruct _tmp
dt = np.asarray(<mystruct[:1]>(&_tmp)).dtype
This requires at least numpy 1.5. See discussion here: https://github.com/scikit-learn/scikit-learn/pull/2298

Turn 2D NumPy array into 1D array for plotting a histogram

I'm trying to plot a histogram with matplotlib.
I need to convert my one-line 2D Array
[[1,2,3,4]] # shape is (1,4)
into a 1D Array
[1,2,3,4] # shape is (4,)
How can I do this?
Adding ravel as another alternative for future searchers. From the docs,
It is equivalent to reshape(-1, order=order).
Since the array is 1xN, all of the following are equivalent:
arr1d = np.ravel(arr2d)
arr1d = arr2d.ravel()
arr1d = arr2d.flatten()
arr1d = np.reshape(arr2d, -1)
arr1d = arr2d.reshape(-1)
arr1d = arr2d[0, :]
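The options give the same values but differ in whether they copy. A quick sketch checking both points for the 1xN case; the dictionary keys are only labels.

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4]])

candidates = {
    'np.ravel': np.ravel(arr2d),
    'ravel': arr2d.ravel(),
    'flatten': arr2d.flatten(),
    'np.reshape': np.reshape(arr2d, -1),
    'reshape': arr2d.reshape(-1),
    'index': arr2d[0, :],
}

# Every variant has shape (4,) and the same values.
for label, arr1d in candidates.items():
    print(label, arr1d.shape, arr1d.tolist())

# ravel returns a view of a contiguous array; flatten always returns a copy.
ravel_is_view = np.shares_memory(arr2d, candidates['ravel'])
flatten_is_view = np.shares_memory(arr2d, candidates['flatten'])
print(ravel_is_view, flatten_is_view)
```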
You can directly index the row:
>>> import numpy as np
>>> x2 = np.array([[1,2,3,4]])
>>> x2.shape
(1, 4)
>>> x1 = x2[0,:]
>>> x1
array([1, 2, 3, 4])
>>> x1.shape
(4,)
Or you can use squeeze:
>>> xs = np.squeeze(x2)
>>> xs
array([1, 2, 3, 4])
>>> xs.shape
(4,)
reshape will do the trick.
There's also a more specific function, flatten, that appears to do exactly what you want.
The answer provided by mtrw does the trick for an array that actually has only one row, like this one. However, if you have a 2D array with values in both dimensions, you can convert it as follows:
a = np.array([[1,2,3],[4,5,6]])
From here you can find the shape of the array with np.shape and take the product of its entries with np.prod, which gives the total number of elements. If you then use np.reshape() to reshape the array to a single dimension of that length, you have a solution that always works.
>>> np.reshape(a, np.prod(a.shape))
array([1, 2, 3, 4, 5, 6])
Use ndarray.flat
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[1,0,0,1],
[2,0,1,0]])
plt.hist(a.flat, [0,1,2,3])
The flat property returns a 1D iterator over your 2D array. This method generalizes to any number of rows (or dimensions). For large arrays it can be much more efficient than making a flattened copy.
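To inspect the bin counts without opening a plot window, the same data can be passed to np.histogram. This is a sketch that swaps in np.histogram and ravel() for plt.hist and flat; the bin edges are the ones used above.

```python
import numpy as np

a = np.array([[1, 0, 0, 1],
              [2, 0, 1, 0]])

# a.flat iterates over the elements in row-major order, like ravel().
flat_values = list(a.flat)

# Bins are [0, 1), [1, 2) and [2, 3]; the last bin includes its right edge.
counts, edges = np.histogram(a.ravel(), bins=[0, 1, 2, 3])
print(flat_values)
print(counts)  # [4 3 1]
```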