I am trying to pass an array of datetime64[ns] into C++ using pybind11. For arrays of integers or floating-point values, one can simply use the wrapper py::array_t<int64_t> or py::array_t<double>.
Is there a dedicated type on the C++ side in pybind11 for datetime64[ns], so that I can capture the array as py::array_t<DateTime>?
As a sub-optimal solution, it would already be a great improvement if I could instead pass the underlying storage of the datetime64[ns] array, which is expected to be an array of int64. Is there a lightweight way (i.e., no copy) to pass it as an array of int64?
It's not the most ergonomic, but you could probably use py::array_t<int64_t> as the pybind11 interface's type and convert the array (without copying) like this:
In [1]: a = np.array([np.datetime64(x, 'ns') for x in range(10)])
In [2]: v = a.view(dtype=np.int64)
In [3]: v
Out[3]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)
In [4]: some_pybind_method(v)
There's a way to write a custom type conversion that does this transparently, but I'm not familiar with the Python C-API machinery it would require.
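Until someone writes such a caster, a thin Python-side shim can hide the view; a minimal sketch, reusing some_pybind_method from the example above:

import numpy as np

def call_with_datetimes(arr):
    # Reinterpret the datetime64[ns] storage as int64 without copying,
    # then hand it to the pybind11 function.
    assert arr.dtype == np.dtype('datetime64[ns]')
    return some_pybind_method(arr.view(np.int64))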
The official NumPy documentation specifies the signature of reshape as follows:
numpy.reshape(a, newshape, order='C')
where newshape can be an int or tuple of ints.
The documentation does not say newshape can be a list, but my testing indicates that it can. For example:
>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> b = a.reshape([3, 2])
>>> b
array([[1, 2],
       [3, 4],
       [5, 6]])
Is providing newshape as a list a nonstandard extension, and is that why the documentation does not mention it?
Tuples and lists have similar properties, and you can often use array-like objects (or iterables) such as lists, tuples, sets, or NumPy arrays interchangeably. The documentation doesn't spell out every possibility. I'd guess they use tuple in this case because an array's shape is itself a tuple of ints.
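For example, the function form and the method accept all of these interchangeably:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
a.reshape((3, 2))       # tuple, as documented
a.reshape([3, 2])       # a list works too
a.reshape(3, 2)         # the method also accepts the ints directly
np.reshape(a, [3, 2])   # same result from the function form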
I am having a problem in my code with non-contiguous arrays.
In particular, I get the following warning message:
C:\Program Files\Anaconda2\lib\site-packages\skimage\util\shape.py:247: RuntimeWarning: Cannot provide views on a non-contiguous input array without copying.
warn(RuntimeWarning("Cannot provide views on a non-contiguous input "
I am using np.column_stack:
import numpy as np
x = np.array([1,2,3,4])
y = np.array([5,6,7,8])
stack = np.column_stack((x,y))
stack.flags.f_contiguous
Out[2]: False
but I get a non-contiguous array.
How can I get a contiguous array? Should I always use ascontiguousarray after column_stack?
np.stack([x, y]) is not F-contiguous (it is C-contiguous, and the transpose of the shape you want). However, np.stack([x, y]).T is F-contiguous.
np.stack([x, y]) # transpose of what you want; C-contiguous, not F-contiguous
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
Instead:
stack = np.stack([x, y]).T
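A quick check of the flags; a small sketch:

import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5, 6, 7, 8])
stack = np.stack([x, y]).T    # shape (4, 2)
stack.flags.f_contiguous      # True: a view of the C-contiguous (2, 4) stack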
In [276]: xy=np.column_stack((x,y))
In [277]: np.info(xy)
class: ndarray
shape: (4, 2)
strides: (8, 4)
itemsize: 4
aligned: True
contiguous: True
fortran: False
data pointer: 0xa836ec0
byteorder: little
byteswap: False
type: int32
The skimage code in question, from https://github.com/scikit-image/scikit-image/blob/master/skimage/util/shape.py:
# -- build rolling window view
if not arr_in.flags.contiguous:
    warn(RuntimeWarning("Cannot provide views on a non-contiguous input "
                        "array without copying."))
    arr_in = np.ascontiguousarray(arr_in)
That test passes on the column_stack result:
In [278]: xy.flags.contiguous
Out[278]: True
In [279]: xy.T.flags.contiguous
Out[279]: False
Normally constructed 2d arrays are C-contiguous, but their transpose is F-contiguous. The warning means that np.ascontiguousarray will produce a copy, which for very large arrays could be a problem.
If this warning comes up often you could either suppress it, or routinely use ascontiguousarray before calling this function.
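Either option might look like this; a sketch, with a deliberately F-contiguous arr standing in for your input:

import warnings
import numpy as np

arr = np.column_stack(([1, 2, 3, 4], [5, 6, 7, 8])).T  # F-contiguous, not C-contiguous

# Option 1: make the input C-contiguous up front (copies only when needed)
arr = np.ascontiguousarray(arr)

# Option 2: silence just this warning
warnings.filterwarnings(
    'ignore',
    message='Cannot provide views on a non-contiguous input',
    category=RuntimeWarning,
)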
In NumPy, it appears that a matrix can simply be a nested list of anything, not limited to numbers. For example:
import numpy as np
a = [[1,2,5],[3,'r']]
b = np.matrix(a)
generates no complaints.
What is the purpose of this tolerance, when a plain list can already hold objects that aren't a matrix in the strict mathematical sense?
What you've created is an object dtype array:
In [302]: b=np.array([[1,2,5],[3,'r']])
In [303]: b
Out[303]: array([[1, 2, 5], [3, 'r']], dtype=object)
In [304]: b.shape
Out[304]: (2,)
In [305]: b[0]
Out[305]: [1, 2, 5]
In [306]: b[1]=None
In [307]: b
Out[307]: array([[1, 2, 5], None], dtype=object)
The elements of this array are pointers - pointers to objects elsewhere in memory. It has a data buffer just like other arrays. In this case it holds 2 pointers, 2 * 4 bytes:
In [308]: b.__array_interface__
Out[308]:
{'data': (169809984, False),
'descr': [('', '|O')],
'shape': (2,),
'strides': None,
'typestr': '|O',
'version': 3}
In [309]: b.nbytes
Out[309]: 8
In [310]: b.itemsize
Out[310]: 4
It is very much like a list, which also stores object pointers in a buffer. It differs in that it doesn't have an append method, but it does have all the array ones like .reshape.
And for many operations, numpy treats such an array like a list - iterating over the pointers, etc. Many of the math operations that work with numeric values fail with object dtypes.
Why allow this? Partly it's just a generalization, expanding the concept of element values/dtypes beyond the simple numeric and string ones. numpy also allows compound dtypes (structured arrays). MATLAB expanded their matrix class to include cells, which are similar.
I see a lot of questions on SO about object arrays. Sometimes they are produced in error, as in Creating numpy array from list gives wrong shape.
Sometimes they are created intentionally. pandas readily changes a data series to object dtype to accommodate a mix of values (string, nan, int).
np.array() tries to create as high-dimensional an array as it can, resorting to object dtype only when it can't, for example when the sublists differ in length. In fact you have to resort to special construction methods to create an object array when the sublists are all the same length.
This is still an object array, but the dimension is higher:
In [316]: np.array([[1,2,5],[3,'r',None]])
Out[316]:
array([[1, 2, 5],
[3, 'r', None]], dtype=object)
In [317]: _.shape
Out[317]: (2, 3)
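As for the special construction methods mentioned above - when the sublists all have the same length, one approach is to preallocate and fill; a minimal sketch:

import numpy as np

arr = np.empty(2, dtype=object)  # preallocate a (2,) object array
arr[0] = [1, 2, 5]               # fill element by element, bypassing
arr[1] = [3, 4, 6]               # np.array's dimension inference
arr
# -> array([list([1, 2, 5]), list([3, 4, 6])], dtype=object)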
NumPy slicing, e.g. S = np.s_[1:-1]; V = A[1:-1], produces a view of the underlying array. I can find this underlying array via V.base. If I pass such a view to a function, e.g.
def f(x):
    return x.base
then f(V) == A. But how can I find the slice information S? I am looking for an attribute, something like base, that contains information on the slice that created this view. I would like to write a function to which I can pass a view of an array and get back another view of the same array, calculated from the first - for example, to shift the view of a one-dimensional array to the right or left.
As far as I know the slicing information is not stored anywhere, but you might be able to deduce it from attributes of the view and base.
For example:
In [156]: x=np.arange(10)
In [157]: y=x[3:]
In [159]: y.base
Out[159]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [160]: y.data
Out[160]: <memory at 0xb1a16b8c>
In [161]: y.base.data
Out[161]: <memory at 0xb1a16bf4>
I like the __array_interface__ value better:
In [162]: y.__array_interface__['data']
Out[162]: (163056924, False)
In [163]: y.base.__array_interface__['data']
Out[163]: (163056912, False)
So the y data buffer starts 12 bytes beyond that of x. And since y.itemsize is 4, this means that the slicing start is 3.
In [164]: y.shape
Out[164]: (7,)
In [165]: x.shape
Out[165]: (10,)
And comparing the shapes, I deduce that the slice stop is None (the end).
For 2d arrays, or stepped slicing you'd have to look at the strides as well.
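Putting those pieces together for the 1d case; a sketch that recovers the slice from the data pointers and strides (edge cases, such as a reversed view that reaches index 0, need extra care):

import numpy as np

def deduce_slice_1d(view):
    # Recover slice(start, stop, step) for a 1d view of a 1d base,
    # using the data pointers and strides discussed above.
    base = view.base
    offset = view.__array_interface__['data'][0] - base.__array_interface__['data'][0]
    step = view.strides[0] // base.strides[0]
    start = offset // base.strides[0]
    return slice(start, start + step * view.shape[0], step)

x = np.arange(10)
deduce_slice_1d(x[3:])
# -> slice(3, 10, 1)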
But in practice it is probably easier, and safer, to pass the slicing object (tuple, slice, etc) to your function, rather than deduce it from the results.
In [173]: S=np.s_[1:-1]
In [174]: S
Out[174]: slice(1, -1, None)
In [175]: x[S]
Out[175]: array([1, 2, 3, 4, 5, 6, 7, 8])
That is, pass S itself rather than deducing it; I've never seen the deduction done before.
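For the shifting use case from the question, passing S along might look like this; a sketch assuming explicit, non-negative start and stop:

import numpy as np

def shifted_view(a, s, k):
    # Return the view of `a` produced by slice `s`, shifted right by k.
    return a[slice(s.start + k, s.stop + k, s.step)]

x = np.arange(10)
S = np.s_[1:4]
shifted_view(x, S, 2)
# -> array([3, 4, 5])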
I have an ndarray A that stores objects of the same type, in particular various LinearNDInterpolator objects. For example's sake, assume there are just two:
>>> A
array([ <scipy.interpolate.interpnd.LinearNDInterpolator object at 0x7fe122adc750>,
<scipy.interpolate.interpnd.LinearNDInterpolator object at 0x7fe11daee590>], dtype=object)
I want to be able to do two things. First, I'd like to evaluate all objects in A at a certain point and get back an ndarray of A.shape with all the values in it. Something like
>>> A[[0,1]](1,1)
array([1, 2])
However, I get
TypeError: 'numpy.ndarray' object is not callable
Is it possible to do that?
Second, I would like to change the interpolation values without constructing new LinearNDInterpolator objects (since the nodes stay the same). I.e., something like
A[[0,1]].values = B
where B is an ndarray containing the new values for every element of A.
Thank you for your suggestions.
The same issue, but with simpler functions:
In [221]: A=np.array([add,multiply])
In [222]: A[0](1,2) # individual elements can be called
Out[222]: 3
In [223]: A(1,2) # but not the array as a whole
---------------------------------------------------------------------------
TypeError: 'numpy.ndarray' object is not callable
We can iterate over a list of functions, or over that array, calling each element on the parameters. Done right, we can even zip a list of functions with a list of parameters (see the sketch after the examples below).
In [224]: ll=[add,multiply]
In [225]: [x(1,2) for x in ll]
Out[225]: [3, 2]
In [226]: [x(1,2) for x in A]
Out[226]: [3, 2]
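And zipping functions with per-call arguments, as mentioned above; a sketch:

from numpy import add, multiply

[f(*args) for f, args in zip([add, multiply], [(1, 2), (3, 4)])]
# -> [3, 12]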
Another test, with the callable() builtin:
In [229]: callable(A)
Out[229]: False
In [230]: callable(A[0])
Out[230]: True
Can you change the interpolation values for individual Interpolators? If so, just iterate through the list and do that.
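A sketch of that - assuming LinearNDInterpolator's values attribute is writable in your scipy version, and with B holding the new values as in the question:

for interp, new_vals in zip(A, B):
    interp.values = new_vals  # update in place instead of rebuilding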
In general, object dtype arrays function like lists. They contain the same kind of object pointers, and most operations require the same sort of iteration. Unless you need to organize the elements in multiple dimensions, object dtype arrays have few, if any, advantages over lists.
Another thought - the normal array dtypes are numeric or fixed-length strings. Those elements are not callable, so there's no need to implement a .__call__ method for these arrays. Someone could write something like that for object dtype arrays, but the core action would be a Python call, so such a function would just hide the kind of iteration that I outlined.
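For what it's worth, np.frompyfunc can hide that iteration (it is still a Python-level call per element); a minimal sketch with the add/multiply array from above:

import numpy as np
from numpy import add, multiply

A = np.array([add, multiply])
call_at = np.frompyfunc(lambda f: f(1, 2), 1, 1)  # 1 input, 1 output
call_at(A)
# -> array([3, 2], dtype=object)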
In another recent question I showed how to use np.char.upper to apply a string method to every element of an S dtype array. But my time tests showed that this did not speed anything up.