Using scalar ndarrays as keys - numpy

This works:
range(50)[np.asarray(10)]
This also works:
{}.get(50)
This doesn't, failing with TypeError: unhashable type: 'numpy.ndarray':
{}.get(np.asarray(50))
Is there a reason why __hash__ isn't implemented for this case?

Python dictionaries require their keys to implement both __eq__ and __hash__ methods, and Python's data model requires that:
The hash of an object does not change during its lifetime
If x == y then hash(x) == hash(y)
Numpy's ndarray class overrides __eq__ to support elementwise comparison and broadcasting. This means that for numpy arrays x and y, x == y is not a boolean but another array. This in itself probably rules out ndarrays functioning correctly as dictionary keys.
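A quick illustration:
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 0, 3])
x == y   # array([ True, False,  True]) -- an array, not a bool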
Even ignoring this quirk of ndarray.__eq__, it would be tricky to come up with a (useful) implementation of ndarray.__hash__. Since the data in a numpy array is mutable, we could not use that data to calculate __hash__ without violating the requirement that the hash of an object does not change during its lifetime.
There is nothing wrong with defining __hash__ for mutable objects provided that the hash itself does not change during the object's lifetime. Similarly, dictionary keys can be mutable provided they implement __hash__ and the hash is immutable. E.g. simple user-defined classes are mutable but can still be used as dictionary keys.
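A minimal illustration (the class is made up for demonstration):
class Point:
    # Mutable, yet hashable: the default __hash__ is based on the object's
    # identity, which never changes during its lifetime.
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
d = {p: "a point"}   # works fine as a dictionary key
p.x = 100            # mutating p does not invalidate the dict entry
assert d[p] == "a point"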

This scalar array is a regular array with a 0d shape; otherwise there's nothing unique about it.
In [46]: x=np.array(10)
In [47]: x
Out[47]: array(10)
In [48]: x[...]=100
In [49]: x
Out[49]: array(100)
You have to extract the number from the array:
In [53]: {}.get(x)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-53-19202767b220> in <module>()
----> 1 {}.get(x)
TypeError: unhashable type: 'numpy.ndarray'
In [54]: {}.get(x.item())
In [58]: {}.get(x[()])
Looking at the hash methods
In [65]: x.__hash__ # None
In [66]: x.item().__hash__
Out[66]: <method-wrapper '__hash__' of int object at 0x84f2270>
In [67]: x[()].__hash__
Out[67]: <method-wrapper '__hash__' of numpy.int32 object at 0xaaab42b0>
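In short, extract a hashable Python (or numpy) scalar before keying a dict. A small sketch:
import numpy as np

x = np.array(10)
d = {x.item(): "ten"}   # x.item() is a plain Python int, which is hashable
print(d[10])            # ten
# x[()] also works: it returns a numpy scalar (e.g. np.int32), which defines __hash__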

Related

How to make an np.array in numba with input-dependent rank?

I would like to @numba.njit this simple function that returns an array whose shape, and in particular whose rank, depends on the input i:
E.g. for i = 4 the shape should be shape=(2, 2, 2, 2, 4)
import numpy as np
from numba import njit

@njit
def make_array_numba(i):
    shape = np.array([2] * i + [i], dtype=np.int64)
    return np.empty(shape, dtype=np.int64)

make_array_numba(4).shape
I tried many different ways, but I always fail at the fact that I can't generate the shape tuple that numba wants to see in np.empty / np.reshape / np.zeros / etc. In normal numpy one can pass lists / np.arrays as the shape, or generate a tuple on the fly such as (2,) * i + (i,).
Output:
>>> empty(array(int64, 1d, C), dtype=class(int64))
There are 4 candidate implementations:
- Of which 4 did not match due to:
Overload in function '_OverloadWrapper._build.<locals>.ol_generated': File: numba/core/overload_glue.py: Line 131.
With argument(s): '(array(int64, 1d, C), dtype=class(int64))':
Rejected as the implementation raised a specific error:
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<intrinsic stub>) found for signature:
>>> stub(array(int64, 1d, C), class(int64))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Intrinsic of function 'stub': File: numba/core/overload_glue.py: Line 35.
With argument(s): '(array(int64, 1d, C), class(int64))':
No match.
This is not possible with @njit alone. The reason is that Numba needs to fix a type for the array independently of variable values, so that it can compile the function and only then execute it. The catch is that the number of dimensions of an array is part of its type; here Numba cannot determine the array's type, since it depends on a value that is not a compile-time constant.
The only way to solve this problem (assuming you do not want to linearize your array) is to recompile the function for each possible i, which is certainly overkill and completely defeats the benefit of using Numba (at least in your example). Note that @generated_jit can be used when you really want to recompile the function for different values or input types. I strongly advise you not to use it for your current use-case. If you try, you will run into other, similar issues because the array cannot be indexed with a runtime-defined number of indices, and the resulting code will quickly become insane.
A more general and cleaner solution is simply to linearize the array. This means flattening it and performing the fancy index arithmetic, like (((... + z) * stride_z) + y) * stride_y + x, yourself. The size and the index can then be computed at runtime, independently of the typing system. Note that this kind of indexing can be quite slow, but Numpy would not use faster code in this case anyway.
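Here is a minimal sketch of that linearization approach (the helper names make_flat_array and linear_index are mine, purely illustrative):
import numpy as np
from numba import njit

@njit
def make_flat_array(i):
    # A shape of (2,)*i + (i,) has 2**i * i elements in total.
    return np.empty(2**i * i, dtype=np.int64)

@njit
def linear_index(coords, i):
    # coords: int64 array of length i + 1; the first i axes have extent 2,
    # the last axis has extent i (row-major order).
    idx = 0
    for k in range(i):
        idx = idx * 2 + coords[k]
    return idx * i + coords[i]

a = make_flat_array(4)
a[linear_index(np.array([1, 0, 1, 1, 2], dtype=np.int64), 4)] = 42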

How to compare numpy arrays of tuples?

Here's an MWE that illustrates the issue I have:
import numpy as np
arr = np.full((3, 3), -1, dtype="i,i")
doesnt_work = arr == (-1, -1)
n_arr = np.full((3, 3), -1, dtype=int)
works = n_arr == 10
arr is supposed to be an array of tuples, but it doesn't behave as expected.
works is an array of booleans, as expected, but doesnt_work is False. Is there a way to get numpy to do elementwise comparisons on more complex types, or do I have to resort to list comprehension, flatten and reshape?
There's a second problem:
f = arr[(0, 0)] == (-1, -1)
f is False, because arr[(0,0)] is of type numpy.void rather than a tuple. So even if the componentwise comparison worked, it would give the wrong result. Is there a clever numpy way to do this or should I just resort to list comprehension?
Both problems are actually the same problem, and both are related to the custom data type you created when you specified dtype="i,i".
If you run arr.dtype you will get dtype([('f0', '<i4'), ('f1', '<i4')]): that is, two signed integers placed in one contiguous block of memory. This is not a Python tuple, so it is clear why the naive comparison fails: (-1, -1) is a Python tuple and is not represented in memory the same way as the numpy data type.
However if you compare with a_comp = np.array((-1,-1), dtype="i,i") you get the exact behavior you are expecting!
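For the MWE above, that looks like this (a small sketch):
import numpy as np

arr = np.full((3, 3), -1, dtype="i,i")
a_comp = np.array((-1, -1), dtype="i,i")
works_now = arr == a_comp   # elementwise boolean array, as expected
arr[0, 0] == a_comp         # also True: both sides share the custom dtype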
You can read more about how the custom dtype stuff works on the numpy docs:
https://numpy.org/doc/stable/reference/arrays.dtypes.html
Oh, and to address what np.void is: the name comes from the idea of a void C pointer, which is essentially an address to a contiguous block of memory of unspecified type. Provided you (the programmer) know what is going to be stored in that memory (in this case two back-to-back integers), it's fine, as long as you are careful to compare with the same custom data type.

numpy argmax not getting all values from generator expression

The output of the following
import numpy as np
print(np.argmax([i for i in range(0, 10)]))
print(np.argmax(i for i in range(0, 10)))
is
9
0
Why does argmax not evaluate the full generator expression?
Compare these two expressions:
In [682]: np.asarray([i for i in range(3)])
Out[682]: array([0, 1, 2])
In [683]: np.asarray(i for i in range(3))
Out[683]: array(<generator object <genexpr> at 0xb367bb9c>, dtype=object)
asarray (or array) applied to a list produces an array with numbers. The same thing applied to the generator produces a dtype=object array with 1 item, the generator itself. In fact its shape is () (0d). You can recover this generator with np.array(i for i in range(3))[()]
fromiter can iterate a generator, but array only iterates on things like lists and tuples.
In [688]: np.fromiter((i for i in range(3)),int)
Out[688]: array([0, 1, 2])
And argmax depends on its input being an array.
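So the fix is to build a real array from the generator first, e.g. with fromiter:
import numpy as np

print(np.argmax(np.fromiter((i for i in range(0, 10)), dtype=int)))  # 9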
As I don't have the required reputation to comment, I am adding this as an answer.
As suggested by @hpaulj, np.argmax calls the asarray function in numeric.py, whose docstring states:
def asarray(a, dtype=None, order=None):
    """Convert the input to an array.

    Parameters
    ----------
    a : array_like
        Input data, in any form that can be converted to an array. This
        includes lists, lists of tuples, tuples, tuples of tuples, tuples
        of lists and ndarrays.
    ...
Hence your a doesn't match the requirements. As for why zero is returned for any input: the return value comes from
result = getattr(asarray(obj), method)(*args, **kwds)
in fromnumeric.py, which is called first. Since asarray wraps the generator into a single-element object array (as explained above), argmax has only one index to choose from and returns 0.
This was my research regarding the question.

Evaluate several elements of numpy object array

I have an ndarray A that stores objects of the same type, in particular various LinearNDInterpolator objects. For example's sake assume it's just 2:
>>> A
array([ <scipy.interpolate.interpnd.LinearNDInterpolator object at 0x7fe122adc750>,
<scipy.interpolate.interpnd.LinearNDInterpolator object at 0x7fe11daee590>], dtype=object)
I want to be able to do two things. First, I'd like to evaluate all objects in A at a certain point and get back an ndarray of A.shape with all the values in it. Something like
>> A[[0,1]](1,1) =
array([ 1, 2])
However, I get
TypeError: 'numpy.ndarray' object is not callable
Is it possible to do that?
Second, I would like to change the interpolation values without constructing new LinearNDInterpolator objects (since the nodes stay the same). I.e., something like
A[[0,1]].values = B
where B is an ndarray containing the new values for every element of A.
Thank you for your suggestions.
The same issue, but with simpler functions:
In [221]: A=np.array([add,multiply])
In [222]: A[0](1,2) # individual elements can be called
Out[222]: 3
In [223]: A(1,2) # but not the array as a whole
---------------------------------------------------------------------------
TypeError: 'numpy.ndarray' object is not callable
We can iterate over a list of functions, or that array as well, calling each element on the parameters. Done right we can even zip a list of functions and a list of parameters.
In [224]: ll=[add,multiply]
In [225]: [x(1,2) for x in ll]
Out[225]: [3, 2]
In [226]: [x(1,2) for x in A]
Out[226]: [3, 2]
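Applied to the interpolators from the question, the same pattern might look like this (a sketch; A is the object array of LinearNDInterpolator objects):
import numpy as np

# Evaluate every interpolator at the point (1, 1), keeping A's shape.
values = np.array([f(1, 1) for f in A.flat]).reshape(A.shape)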
Another test, the callable function:
In [229]: callable(A)
Out[229]: False
In [230]: callable(A[0])
Out[230]: True
Can you change the interpolation values for individual Interpolators? If so, just iterate through the list and do that.
In general, dtype=object arrays function like lists: they contain the same kind of object pointers, and most operations require the same sort of iteration. Unless you need to organize the elements in multiple dimensions, object arrays have few, if any, advantages over lists.
Another thought: the normal array dtypes are numeric or fixed-length strings. Those elements are not callable, so there's no need to implement a .__call__ method on such arrays. One could write something like that to operate on object-dtype arrays, but the core action would still be a Python call, so such a function would just hide the kind of iteration that I outlined.
In another recent question I showed how to use np.char.upper to apply a string method to every element of an S dtype array. But my time tests showed that this did not speed anything up.
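For reference, that looks like:
import numpy as np

s = np.array([b'one', b'two'], dtype='S3')
np.char.upper(s)   # array([b'ONE', b'TWO'], dtype='|S3')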

subclass ndarray in python numpy: change size and value of array

Someone asks the question here: Subclassing numpy ndarray problem but it is basically unanswered.
Here is my version of the question. Suppose you subclass numpy.ndarray to something that automatically expands when you try to set an element beyond the current shape. You would need to override __setitem__ and use some numpy.concatenate calls to construct a new array, then assign it to "self" somehow. How do you assign the array to "self"?
class myArray(numpy.ndarray):
    def __new__(cls, input_array):
        obj = numpy.asarray(input_array).view(cls)
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return

    def __setitem__(self, coords, value):
        try:
            super(myArray, self).__setitem__(coords, value)
        except IndexError as e:
            logging.error("Adjusting array")
            ...
            self = new_array  # THIS IS WRONG
Why subclass? Why not just give your wrapper object its own data member that is an ndarray, and use __getitem__ and __setitem__ to operate on the wrapped member? This is basically what ndarray itself does, wrapping an underlying block of memory. Also have a look at pandas, which already does a lot of what you're talking about, built on top of ndarray.
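A minimal sketch of that delegation approach (the class name GrowingArray and the zero-fill growth policy are illustrative, not from the answer; integer indices only):
import numpy as np

class GrowingArray:
    # Wraps an ndarray as a data member instead of subclassing,
    # growing it when an assignment lands outside the current shape.
    def __init__(self, shape, dtype=float):
        self._data = np.zeros(shape, dtype=dtype)

    def __getitem__(self, idx):
        return self._data[idx]

    def __setitem__(self, idx, value):
        idx = idx if isinstance(idx, tuple) else (idx,)
        if any(i >= s for i, s in zip(idx, self._data.shape)):
            # Grow each dimension just enough to fit the requested index.
            new_shape = tuple(max(s, i + 1)
                              for s, i in zip(self._data.shape, idx))
            grown = np.zeros(new_shape, dtype=self._data.dtype)
            grown[tuple(slice(0, s) for s in self._data.shape)] = self._data
            self._data = grown
        self._data[idx] = value

a = GrowingArray((2, 2))
a[5, 5] = 1.0   # the wrapped array silently grows to (6, 6)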