I am new to Numba and GPU programming with Python.
I understand basic signatures used with vectorize decorator.
For example :- For a method that accepts two float variables and return a float , this is the signature #vectorize([float64(float64, float64)])
I have 3 questions
1.What is the signature to accept and return Numpy arrays?
2.What is the signature to accept and return Pandas Dataframe?
3.What are the possible signatures available ?
Related
Why are numpy arrays called homogeneous when you can have elements of different type in the same numpy array like this?
np.array([1,2,3,4,"a"])
I understand that I cannot perform some types of broadcasting operations like I cannot perform
np1*4 here and it results in an error.
but my question really is when it can have elements of different types, why it is called homogeneous?
Numpy automatically converts them to most applicable datatype.
e.g.,
>>> np.array([1,2,3,4,"a"]).dtype.type
numpy.str_
In short this means all elements are of string.
>>> np.array([1,2,3,4]).dtype.type
numpy.int64
I have a C++ class which is interchangeable with Numpy arrays through the buffer protocol, and already I can return objects from C++ to Python which are convertible to Numpy via the numpy.asarray() call.
I would like to make my class even easier to use, so I would like to return Numpy arrays which wrap my class directly from C++.
Is it possible to construct a numpy array from the C++ side using PyBind11 and return it?
I found out how to do this, you can just call the numpy "asarray" function directly on my buffer type.
py::buffer pybuf_to_numpy(py::buffer& b)
{
py::object np = py::module::import("numpy");
py::object asarray = np.attr("asarray");
return asarray(b);
}
I would like to #numba.njit this simple function that returns an array with a shape, in particular a rank, that depends on the input i:
E.g. for i = 4 the shape should be shape=(2, 2, 2, 2, 4)
import numpy as np
from numba import njit
#njit
def make_array_numba(i):
shape = np.array([2] * i + [i], dtype=np.int64)
return np.empty(shape, dtype=np.int64)
make_array_numba(4).shape
I tried many different ways, but always fail at the fact that I can't generate the shape tuple that numba wants to see in np.empty / np.reshape / np.zeros /...
In normal numpy one can pass lists / np.arrays as the shape, or I can generate a tuple on the fly such as (2,) * i + (i,).
Output:
>>> empty(array(int64, 1d, C), dtype=class(int64))
There are 4 candidate implementations:
- Of which 4 did not match due to:
Overload in function '_OverloadWrapper._build.<locals>.ol_generated': File: numba/core/overload_glue.py: Line 131.
With argument(s): '(array(int64, 1d, C), dtype=class(int64))':
Rejected as the implementation raised a specific error:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<intrinsic stub>) found for signature:
>>> stub(array(int64, 1d, C), class(int64))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Intrinsic of function 'stub': File: numba/core/overload_glue.py: Line 35.
With argument(s): '(array(int64, 1d, C), class(int64))':
No match.
This is not possible only with #njit. The reason is that Numba needs to set a type for the array independently of variable values so to compile the function and only then execute it. The thing is the dimension of an array is part of its type. Thus, here, Numba cannot find the type of the array since it is dependent of a value that is not a compile-time constant.
The only way to solve this problem (assuming you do not want to linearize your array) is to recompile the function for each possible i which is certainly overkill and completely defeat the benefit of using Numba (at least in your example). Note that #generated_jit can be used in such a case when you really want to recompile the function for different values or input types. I strongly advise you not to use it for your current use-case. If you try, then you will have other similar issues due to the array not being indexable using a runtime-defined variables and the resulting code will quickly be insane.
A more general and cleaner solution is simply to linearize the array. This means flattening it and perform some fancy indexing computation like (((... + z) * stride_z) + y) * stride_y + x. The size and the index can be computed at runtime independently of the typing system. Note that indexing can be quite slow but Numpy will not use a faster code in this case.
I recently switched from numpy compiled with open blas to numpy compiled with mkl. In pure numeric operations there was a clear speed up for matrix multiplication. However when I ran some code I have been using which multiplies matrices containing sympy variables, I now get the error
'Object arrays are not currently supported'
Does anyone have information on why this is the case for mkl and not for open blas?
Release notes for 1.17.0
Support of object arrays in matmul
It is now possible to use matmul (or the # operator) with object arrays. For instance, it is now possible to do:
from fractions import Fraction
a = np.array([[Fraction(1, 2), Fraction(1, 3)], [Fraction(1, 3), Fraction(1, 2)]])
b = a # a
Are you using # (matmul or dot)? A numpy array containing sympy objects will be object dtype. Math on object arrays depends on delegating the action to the object's own methods. It cannot be performed by the fast compiled libraries, which only work with c types such as float and double.
As a general rule you should not be trying to mix numpy and sympy. Math is hit-or-miss, and never fast. Use sympy's own Matrix module, or lambdify the sympy expressions for numeric work.
What's the mkl version? You may have to explore this with creator of that compilation.
After having read a lot of documentation on numpy / cython I am still unable to create a numpy array from a pointer in cython. The situation is as follows. I have a cython (*.pyx) file containing a callback function:
cimport numpy
cdef void func_eval(double* values,
int values_len,
void* func_data):
func = (<object> func_data)
# values.: contiguous array of length=values_len
array = ???
# array should be a (modifiable) numpy array containing the
# values as its data. No copying, no freeing the data by numpy.
func.eval(array)
Most tutorials and guides consider the problem of turning an array to a pointer, but I am interested in the opposite.
I have seen one solution here based on pure python using the ctypes library (not what I am interested in). Cython itself talks about typed memoryviews a great deal. This is also not what I am looking for exactly, since I want all the numpy goodness to work on the array.
Edit: A (slightly) modified standalone MWE (save as test.cyx, compile via cython test.cyx):
cimport numpy
cdef extern from *:
"""
/* This is C code which will be put
* in the .c file output by Cython */
typedef void (*callback)(double* values, int values_length);
void execute(callback cb)
{
double values[] = {0., 1.};
cb(values, 2);
}
"""
ctypedef void (*callback)(double* values, int values_length);
void execute(callback cb);
def my_python_callback(array):
print(array.shape)
print(array)
cdef void my_callback(double* values, int values_length):
# turn values / values_length into numpy array
# and call my_pytohn_callback
pass
cpdef my_execute():
execute(my_callback)
2nd Edit:
Regarding the possible duplicate: While the questions are related, the first answer given is, as was pointed out rather fragile, relying on memory data flags, which are arguably an implementation detail. What is more, the question and answers are rather outdated and the cython API has been expanded since 2014. Fortunately however, I was able to solve the problem myself.
Firstly, you can cast a raw pointer to a typed MemoryView operating on the same underlying memory without taking ownership of it via
cdef double[:] values_view = <double[:values_length]> values
This is not quire enough however, as I stated I want a numpy array. But it is possible to convert a MemoryView to a numpy array provided that it has a numpy data type. Thus, the goal can be achieved in one line via
array = np.asarray(<np.float64_t[:values_length]> values)
It can be easily checked that the array operates on the correct memory segment without owning it.