Object arrays not supported on numpy with mkl? - numpy

I recently switched from NumPy compiled with OpenBLAS to NumPy compiled with MKL. In pure numeric operations there was a clear speed-up for matrix multiplication. However, when I ran some code I have been using which multiplies matrices containing sympy variables, I now get the error
'Object arrays are not currently supported'
Does anyone have information on why this is the case for MKL and not for OpenBLAS?

Release notes for 1.17.0
Support of object arrays in matmul
It is now possible to use matmul (or the @ operator) with object arrays. For instance, it is now possible to do:
import numpy as np
from fractions import Fraction
a = np.array([[Fraction(1, 2), Fraction(1, 3)], [Fraction(1, 3), Fraction(1, 2)]])
b = a @ a
Are you using @ (matmul) or dot? A NumPy array containing sympy objects will have object dtype. Math on object arrays works by delegating each operation to the objects' own methods. It cannot be performed by the fast compiled libraries, which only work with C types such as float and double.
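A small sketch of that delegation, reusing the Fraction example from the release notes. As far as I know, np.dot has handled object arrays for longer than matmul has, so it may work as a fallback on a build older than 1.17.0; treat that as an assumption to check on your own build. No BLAS library (OpenBLAS or MKL) is involved in either product.

```python
from fractions import Fraction

import numpy as np

# dtype=object: every element is a Python Fraction, not a C double,
# so this product cannot be handed to OpenBLAS or MKL.
a = np.array([[Fraction(1, 2), Fraction(1, 3)],
              [Fraction(1, 3), Fraction(1, 2)]], dtype=object)

# np.dot on object arrays calls each element's own __mul__ and __add__.
b = np.dot(a, a)

# From NumPy 1.17.0 on, matmul (the @ operator) also accepts object arrays.
c = a @ a

print(b)
```

The results stay exact Fractions (for example, the top-left entry is 1/4 + 1/9 = 13/36), which also shows why this is slow: every multiply and add is a Python-level call.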
As a general rule you should not be trying to mix numpy and sympy. Math is hit-or-miss, and never fast. Use sympy's own Matrix module, or lambdify the sympy expressions for numeric work.
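A minimal sketch of the two alternatives mentioned: keep the symbolic product in sympy's Matrix, then use sympy.lambdify with the "numpy" module to get a fast numeric function. The matrix M and the symbols x, y are illustrative only.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")

# Symbolic work stays in sympy: the product is exact and symbolic.
M = sp.Matrix([[x, 1], [0, y]])
M2 = M * M  # Matrix([[x**2, x + y], [0, y**2]])

# For numeric work, compile the symbolic result into a NumPy-backed function.
f = sp.lambdify((x, y), M2, modules="numpy")
values = f(2.0, 3.0)  # ordinary float ndarray, no object dtype involved
print(values)
```

The numeric array returned by f is a plain float array, so any later matrix products on it can go through MKL or OpenBLAS as usual.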
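What's the version of the MKL build (check np.__version__)? Object-array support in matmul only arrived in 1.17.0, so an older NumPy in the MKL build would explain the error. Otherwise you may have to explore this with the creator of that build.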

Related

How to make an np.array in numba with input-dependent rank?

I would like to @numba.njit this simple function that returns an array whose shape, in particular whose rank, depends on the input i.
E.g. for i = 4 the shape should be (2, 2, 2, 2, 4):
import numpy as np
from numba import njit
@njit
def make_array_numba(i):
    shape = np.array([2] * i + [i], dtype=np.int64)
    return np.empty(shape, dtype=np.int64)
make_array_numba(4).shape
I tried many different ways, but I always fail because I can't generate the shape tuple that numba expects in np.empty / np.reshape / np.zeros / ...
In plain NumPy one can pass lists / np.arrays as the shape, or generate a tuple on the fly such as (2,) * i + (i,).
Output:
>>> empty(array(int64, 1d, C), dtype=class(int64))
There are 4 candidate implementations:
- Of which 4 did not match due to:
Overload in function '_OverloadWrapper._build.<locals>.ol_generated': File: numba/core/overload_glue.py: Line 131.
With argument(s): '(array(int64, 1d, C), dtype=class(int64))':
Rejected as the implementation raised a specific error:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<intrinsic stub>) found for signature:
>>> stub(array(int64, 1d, C), class(int64))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Intrinsic of function 'stub': File: numba/core/overload_glue.py: Line 35.
With argument(s): '(array(int64, 1d, C), class(int64))':
No match.
This is not possible with @njit alone. The reason is that Numba has to fix the type of the array independently of the variables' values, so that it can compile the function first and only then execute it. The catch is that the number of dimensions of an array is part of its type. Thus, here, Numba cannot determine the type of the array, since it depends on a value that is not a compile-time constant.
The only way to solve this problem (assuming you do not want to linearize your array) is to recompile the function for each possible i, which is certainly overkill and completely defeats the benefit of using Numba (at least in your example). Note that @generated_jit can be used in such a case when you really want to recompile the function for different values or input types. I strongly advise you not to use it for your current use-case. If you try, you will run into other, similar issues because the array cannot be indexed using runtime-defined variables, and the resulting code will quickly become unmanageable.
A more general and cleaner solution is simply to linearize the array. This means flattening it and doing the index computation yourself, e.g. ((... * n_z + z) * n_y + y) * n_x + x for a row-major layout, where n_z, n_y and n_x are the lengths of the corresponding axes. Both the size and the index can be computed at runtime, independently of the typing system. Note that this indexing can be quite slow, but NumPy would not use faster code in this case either.
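A minimal sketch of the linearized approach, assuming row-major (C-order) layout. The helper names make_linear_array and flat_index are hypothetical. The data lives in a 1-D array whose type is the same for every i, and the logical shape is carried around as an int64 array. The try/except around the import is only there so the sketch still runs (uncompiled) if numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without numba (plain Python, no compilation).
    def njit(func):
        return func


@njit
def make_linear_array(i):
    # The rank (i + 1) is a runtime value, so keep the data 1-D and store
    # the logical shape separately as an int64 array.
    shape = np.empty(i + 1, dtype=np.int64)
    shape[:] = 2
    shape[i] = i
    size = 1
    for k in range(shape.shape[0]):
        size *= shape[k]
    return np.zeros(size, dtype=np.int64), shape


@njit
def flat_index(idx, shape):
    # Row-major offset in Horner form: ((i0 * n1 + i1) * n2 + i2) * n3 + ...
    offset = 0
    for k in range(shape.shape[0]):
        offset = offset * shape[k] + idx[k]
    return offset


flat, shape = make_linear_array(4)
flat[flat_index(np.array([1, 0, 1, 1, 3]), shape)] = 7

# Outside the jitted code a tuple shape is allowed again, so the flat
# buffer can be viewed with its logical rank (no copy).
view = flat.reshape(tuple(shape))
print(view.shape)
print(view[1, 0, 1, 1, 3])
```

The offset computed by flat_index matches np.ravel_multi_index for C order, so the reshaped view sees the element at the expected position.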

Creating a numpy array from a pointer in cython

After having read a lot of documentation on numpy / cython I am still unable to create a numpy array from a pointer in cython. The situation is as follows. I have a cython (*.pyx) file containing a callback function:
cimport numpy
cdef void func_eval(double* values,
                    int values_len,
                    void* func_data):
    func = (<object> func_data)
    # values: contiguous array of length=values_len
    array = ???
    # array should be a (modifiable) numpy array containing the
    # values as its data. No copying, no freeing the data by numpy.
    func.eval(array)
Most tutorials and guides consider the problem of turning an array into a pointer, but I am interested in the opposite.
I have seen one solution here based on pure python using the ctypes library (not what I am interested in). Cython itself talks about typed memoryviews a great deal. This is also not what I am looking for exactly, since I want all the numpy goodness to work on the array.
Edit: A (slightly) modified standalone MWE (save as test.pyx, compile via cython test.pyx):
cimport numpy
cdef extern from *:
    """
    /* This is C code which will be put
     * in the .c file output by Cython */
    typedef void (*callback)(double* values, int values_length);
    void execute(callback cb)
    {
        double values[] = {0., 1.};
        cb(values, 2);
    }
    """
    ctypedef void (*callback)(double* values, int values_length);
    void execute(callback cb);
def my_python_callback(array):
    print(array.shape)
    print(array)
cdef void my_callback(double* values, int values_length):
    # turn values / values_length into numpy array
    # and call my_python_callback
    pass
cpdef my_execute():
    execute(my_callback)
2nd Edit:
Regarding the possible duplicate: while the questions are related, the first answer given is, as was pointed out, rather fragile, relying on memory data flags, which are arguably an implementation detail. What is more, the question and answers are rather outdated, and the Cython API has been expanded since 2014. Fortunately, however, I was able to solve the problem myself.
Firstly, you can cast a raw pointer to a typed MemoryView operating on the same underlying memory without taking ownership of it via
cdef double[:] values_view = <double[:values_length]> values
This is not quite enough, however, as I stated I want a NumPy array. But it is possible to convert a MemoryView to a NumPy array provided that it has a NumPy data type. Thus, with cimport numpy as np and import numpy as np at module level, the goal can be achieved in one line via
array = np.asarray(<np.float64_t[:values_length]> values)
It can be easily checked that the array operates on the correct memory segment without owning it.
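The check can be sketched in plain Python, assuming an array.array as a stand-in for the C buffer (the Cython typed memoryview in the answer exposes the same buffer protocol, which is what lets np.asarray wrap it without copying). The names buf and arr are illustrative only.

```python
import array

import numpy as np

# Stand-in for the C buffer `values` (two doubles). In the Cython answer the
# memory comes from C; here a Python array.array plays that role.
buf = array.array("d", [0.0, 1.0])

# np.asarray on an object exposing the buffer protocol wraps it without copying.
arr = np.asarray(memoryview(buf))

print(arr.dtype)           # float64, taken from the buffer's "d" format
print(arr.flags.owndata)   # False: NumPy neither owns nor frees this memory

arr[0] = 42.0              # writes go straight to the original buffer
print(buf[0])
```

Because the array does not own the memory, it is only valid while the underlying buffer is alive. In the MWE, values sits on the C stack of execute, so such an array should not be kept around after the callback returns.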

Does PyTorch have a RandomState-like object for random number generation?

In NumPy I can
import numpy as np
rs = np.random.RandomState(seed=0)
and then pass that object around, e.g. for dependency injection.
Does PyTorch have a similar interface? I can't find anything in the docs, but maybe I'm missing something.
The closest thing would be torch.manual_seed, which sets the seed for generating random numbers and returns a torch.Generator. This thread has more information; apparently there may be some inconsistencies depending on whether you are using a GPU or a CPU.
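For the dependency-injection use case, a sketch assuming a CPU generator: you can create your own torch.Generator, seed it, and pass it through the generator= argument that sampling functions such as torch.randn accept. The helper sample_noise is hypothetical.

```python
import torch


def sample_noise(n, generator):
    # The caller injects the generator, much like passing a RandomState.
    return torch.randn(n, generator=generator)


# Generator.manual_seed returns the generator itself, so this chains.
g1 = torch.Generator().manual_seed(0)
g2 = torch.Generator().manual_seed(0)

a = sample_noise(3, g1)
b = sample_noise(3, g2)

# Same seed on two independent generator objects gives identical draws,
# and the global RNG (the one torch.manual_seed controls) is not touched.
print(torch.equal(a, b))
```

Each generator keeps its own state, so two components holding different generators do not disturb each other's sequences.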

Only size 1 arrays can be converted to python scalars

I created a 3-dimensional array using the numpy.random module:
import numpy as np
b = np.random.randn(4,4,3)
Why can't I cast b to float? Calling float(b) raises a TypeError.
You can't float(b) because b isn't a number; it's a multidimensional array. If you're trying to convert every element to a Python float, that's usually pointless: b already has dtype float64, which has the same precision as a Python float, and NumPy operations on it are much faster than on Python objects. If you really want to do that for whatever reason, b.tolist() returns nested Python lists of floats. You could also store Python floats in an object-dtype array, but that gives up NumPy's speed and rarely makes sense.
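A short sketch of the options, using the same b as in the question: float() only works on a single element, astype(float) keeps an array (Python's float maps to np.float64), tolist() gives nested Python lists, and .item() converts a size-1 array to a Python scalar.

```python
import numpy as np

b = np.random.randn(4, 4, 3)  # dtype float64, 48 elements

raised = False
try:
    float(b)  # 48 elements cannot become one Python scalar
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

x = float(b[0, 0, 0])          # a single element -> Python float
same = b.astype(float)         # still a float64 array (a copy)
nested = b.tolist()            # 4 x 4 x 3 nested lists of Python floats
one = np.array([2.5]).item()   # size-1 array -> Python scalar
```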

Convert Breeze Matrix to Numpy Array

Is it possible to convert a Breeze dense matrix to a NumPy array using Spark?
I have a Breeze dense matrix that I want to convert to a NumPy array.
Here is a way that works correctly but is slow and inefficient (it creates multiple copies). I used the Zeppelin spark and pyspark interpreters (I guess Toree should also be possible):
In Spark:
%spark
import breeze.linalg._
import breeze.numerics._
z.put("matrix", DenseMatrix.eye[Double](4));
z.get("matrix")
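Then in Python:
%pyspark
import numpy as np
def breeze2numpy(breeze_matrix):
    # copy the matrix data through py4j into a Python list (slow for big matrices)
    data = list(breeze_matrix.copy().data())
    # Breeze stores DenseMatrix data column-major, hence order='F'
    return np.array(data).reshape(breeze_matrix.rows(), breeze_matrix.cols(), order='F')
breeze2numpy(z.z.get("matrix"))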
This works but will be impractical for big datasets because of the multiple copies made via a Python list. It would be nice to have a zero-copy method using Python's buffer protocol, like there is for C++ Eigen matrix --> NumPy array.
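A partial improvement, sketched under assumptions: if the Scala side can hand over the matrix data as one raw byte array of little-endian doubles (producing it, e.g. with a java.nio.ByteBuffer, is not shown and is an assumption), np.frombuffer can interpret it without building a list of Python floats. This still copies once across the JVM boundary, so it is not zero-copy. The helper column_major_to_numpy is hypothetical, and the payload below is simulated with struct.pack rather than coming from Spark.

```python
import struct

import numpy as np


def column_major_to_numpy(raw, rows, cols):
    """View `raw` (bytes of little-endian float64, column-major) as rows x cols."""
    flat = np.frombuffer(raw, dtype="<f8")       # no per-element Python objects
    return flat.reshape(rows, cols, order="F")   # F = column-major, like Breeze


# Simulated payload: a 2 x 3 matrix stored column by column.
raw = struct.pack("<6d", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
m = column_major_to_numpy(raw, 2, 3)
print(m)
```

The order="F" argument is what puts each run of `rows` consecutive values into a column, matching the reshape in breeze2numpy.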