Numpy: Functional assignment?

Suppose I want to create an array b which is a version of array a with the i'th row set to zero.
Currently, I have to do:
b = a.copy()
b[i, :] = 0
Which is a bit annoying, because you can't do that in lambdas, and everything else in numpy is functional. I'd like a function similar to theano's set_subtensor, where you could go
b = a.set_subtensor((i, slice(None)), 0)
or
b = np.set_subtensor(a, (i, slice(None)), 0)
As far as I can tell, there's nothing like that in numpy. Or is there?
Edit
The answer appears to be no, there is no such function, you need to define one yourself. See hpaulj's response.

Do you mean a simple function like this:
def subtensor(a, ind, val):
    b = a.copy()
    b[ind] = val
    return b
In [192]: a = np.arange(12).reshape(3, 4)
In [194]: subtensor(a, (1, slice(None)), 0)
Out[194]:
array([[ 0,  1,  2,  3],
       [ 0,  0,  0,  0],
       [ 8,  9, 10, 11]])
Indexing takes a tuple like (1, slice(None)).
There are some alternative assignment functions like put, place, and copyto, but none of them appear to do this task.
These are equivalent:
b[0,:] = 1
b.__setitem__((0,slice(None)),1)
That is, the Python interpreter converts [] syntax into a method call.
This is an in-place operation. I don't know of anything that first makes a copy.
Functions like choose and where return copies, but they (normally) work with boolean masks, not indexing tuples.
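Since the original complaint was about lambdas, here is a hedged sketch of how the helper composes with them, plus a mask-based functional alternative via np.where (the names zero_row and zero_row_where are just illustrative):
import numpy as np

def subtensor(a, ind, val):  # same helper as above, repeated so this runs standalone
    b = a.copy()
    b[ind] = val
    return b

# Usable inside a lambda, since it is a single expression:
zero_row = lambda a, i: subtensor(a, (i, slice(None)), 0)

# Functional alternative with a boolean mask and np.where:
zero_row_where = lambda a, i: np.where(np.arange(a.shape[0])[:, None] == i, 0, a)

a = np.arange(12).reshape(3, 4)
print(zero_row(a, 1))        # row 1 zeroed, a untouched
print(zero_row_where(a, 1))  # same result via a mask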

Computing quick convex hull using Numba

I came across this nice implementation of computing the convex hull of 2D points using NumPy. I would like to be able to @njit this function to use it inside my other Numba-jitted code. However, I'm not able to modify it to run, as it uses recursion and unsupported Numba features. Can anybody help me rewrite this?
import numpy as np
from numba import njit

def process(S, P, a, b):
    signed_dist = np.cross(S[P] - S[a], S[b] - S[a])
    K = [i for s, i in zip(signed_dist, P) if s > 0 and i != a and i != b]
    if len(K) == 0:
        return (a, b)
    c = max(zip(signed_dist, P))[1]
    return process(S, K, a, c)[:-1] + process(S, K, c, b)

def quickhull_2d(S: np.ndarray) -> np.ndarray:
    a, b = np.argmin(S[:, 0]), np.argmax(S[:, 0])
    max_index = np.argmax(S[:, 0])
    max_element = S[max_index]
    return process(S, np.arange(S.shape[0]), a, max_index)[:-1] + process(S, np.arange(S.shape[0]), max_index, a)[:-1]
Example data input and output
points = np.array([[0, 0], [1, 1], [0.5, 0.5], [0, 1], [1, 0]])
ch = quickhull_2d(points)
print(ch)
[0, 4, 1, 3]
print(points[ch])
[[0. 0.]
 [1. 0.]
 [1. 1.]
 [0. 1.]]
There are many issues in this code that prevent Numba from compiling it.
First of all, returning variable-sized tuples is not possible in Numba because the type of a tuple implicitly includes its size. A tuple is basically a structured type and not a list. See this post and this one for more information about this issue. The solution is basically to return a list (slow) or an array (fast).
Moreover, the type of the parameters changes from one call to another. Indeed, process is called in quickhull_2d with P defined as a NumPy array, and then called from process itself with P defined as a list. Lists and arrays are completely different things. It is better to use arrays when possible in Numba, unless you need a list to append an unknown (neither small nor bounded) number of items.
Additionally, max(zip(signed_dist, P))[1] is apparently unsupported by Numba, and it is not very efficient anyway (nor idiomatic NumPy code). P[np.argmax(signed_dist)] should be used instead.
Furthermore, np.cross also does not seem to be supported for the general case, so you currently need to use cross2d instead (from numba.np.extensions).
Finally, when you use a recursive function like this, it is better to specify the input types of the parameters so as to avoid weird errors. This can be done with a signature string.
The resulting code is:
import numpy as np
from numba import njit
from numba.np.extensions import cross2d

@njit('(float64[:,:], int64[:], int64, int64)')
def process(S, P, a, b):
    signed_dist = cross2d(S[P] - S[a], S[b] - S[a])
    K = np.array([i for s, i in zip(signed_dist, P) if s > 0 and i != a and i != b], dtype=np.int64)
    if len(K) == 0:
        return [a, b]
    c = P[np.argmax(signed_dist)]
    return process(S, K, a, c)[:-1] + process(S, K, c, b)

@njit('(float64[:,:],)')
def quickhull_2d(S: np.ndarray) -> np.ndarray:
    a, b = np.argmin(S[:, 0]), np.argmax(S[:, 0])
    max_index = np.argmax(S[:, 0])
    max_element = S[max_index]
    return process(S, np.arange(S.shape[0]), a, max_index)[:-1] + process(S, np.arange(S.shape[0]), max_index, a)[:-1]

points = np.array([[0, 0], [1, 1], [0.5, 0.5], [0, 1], [1, 0]])
ch = quickhull_2d(points)
print(ch)  # prints [0, 4, 1, 3]
Note that the compilation time is slow and the execution time should not be great either. This is due to the lists (and the temporary arrays they require at runtime). The next step is simply to use arrays. The bad news is that np.concatenate is not supported by Numba (because the general case is not easy to implement, though specific cases are trivial). You can create a new array and copy each part (or even better: you can preallocate an array and slice it during the recursive calls).
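For reference, a minimal sketch of that copy-each-part idea (the helper name concat_indices is illustrative, not part of Numba):
import numpy as np
from numba import njit

@njit('int64[:](int64[:], int64[:])')
def concat_indices(left, right):
    # Preallocate the output, then copy each part with slice assignment,
    # which Numba does support (unlike np.concatenate).
    out = np.empty(left.size + right.size, dtype=np.int64)
    out[:left.size] = left
    out[left.size:] = right
    return out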
Also note that any recursive function can be transformed into a non-recursive function using a manual stack. That being said, it may be slower and makes the code more verbose. There are some benefits to this approach though: it avoids stack overflow when the recursion is deep, and it may be faster if the function is rewritten so that one of the recursive calls is not stacked, akin to tail-call optimization.
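For illustration, a hedged sketch of the same quickhull traversal with an explicit stack instead of recursion (plain NumPy for brevity; the same structure could then be compiled with Numba). Pushing the two sub-segments in reverse order preserves the output order of the recursive version:
import numpy as np

def quickhull_2d_iter(S):
    start, end = int(np.argmin(S[:, 0])), int(np.argmax(S[:, 0]))
    hull = []
    all_idx = np.arange(S.shape[0])
    # Each stack entry is one pending process(S, P, a, b) call.
    stack = [(end, start, all_idx), (start, end, all_idx)]
    while stack:
        a, b, P = stack.pop()
        signed_dist = np.cross(S[P] - S[a], S[b] - S[a])
        K = P[(signed_dist > 0) & (P != a) & (P != b)]
        if K.size == 0:
            hull.append(a)           # base case: the segment contributes its start point
        else:
            c = P[np.argmax(signed_dist)]
            stack.append((c, b, K))  # processed second (LIFO)
            stack.append((a, c, K))  # processed first, keeping the output order
    return hull

points = np.array([[0, 0], [1, 1], [0.5, 0.5], [0, 1], [1, 0]], dtype=np.float64)
print(quickhull_2d_iter(points))  # [0, 4, 1, 3]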

Rearranging numpy arrays

I was not able to find a duplicate of my question, unfortunately, although I am sure that this is a problem which has been solved before.
I have a numpy array with a certain set of indices, eg.
ind1 = np.array([1, 3, 5, 7])
With these indices, I can filter some values from another array. Lets call this other array rows. As an example, I can retrieve
rows[ind1] = [1, 10, 20, 15]
The order of rows[ind1] must not be changed in the following.
I have another index array, ind2
ind2 = np.array([4, 5, 6, 7])
I also have an array cols, where I can filter values from using ind2. I know that cols[ind2] results in an array which has the size of rows[ind1] and the entries are the same, but the order is different. An example:
cols[ind2] = [15, 20, 10, 1]
I would like to rearrange the order of cols[ind2], so that it corresponds to rows[ind1]. I am interested in the corresponding order of ind2.
In the example, the result should be
cols[ind2] = [1, 10, 20, 15]
ind2 = [7, 6, 5, 4]
Using numpy, I did not find a way to do this. Any ideas would be helpful. Thanks in advance.
There may be a better way, but you can do this using argsorts.
Let's call your "reordered ind2" ind3.
If you are sure that rows[ind1] and cols[ind2] have the same length and contain the same elements, then their sorted versions will be identical, i.e. np.sort(rows[ind1]) == np.sort(cols[ind2]).
If this is the case, and you don't run into any problems with repeated elements (unsure of your exact use case), you can find the indices that put cols[ind2] in sorted order, and from there find the indices that put np.sort(cols[ind2]) into the order of rows[ind1].
So, if
p1 = np.argsort(rows[ind1])
and
p2 = np.argsort(cols[ind2])
and
p3 = np.argsort(p1)
Then
ind3 = ind2[p2][p3]. The reason this works is that an argsort of an argsort gives you the indices needed to reverse the first sort. p2 sorts cols[ind2] (that's the definition of argsort), and p3 un-sorts the result of that back into the order of rows[ind1].
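A quick check with the numbers from the question (the filler values in rows and cols outside the indexed positions are made up):
import numpy as np

ind1 = np.array([1, 3, 5, 7])
ind2 = np.array([4, 5, 6, 7])
rows = np.array([0, 1, 0, 10, 0, 20, 0, 15])  # rows[ind1] -> [1, 10, 20, 15]
cols = np.array([0, 0, 0, 0, 15, 20, 10, 1])  # cols[ind2] -> [15, 20, 10, 1]

p1 = np.argsort(rows[ind1])
p2 = np.argsort(cols[ind2])
p3 = np.argsort(p1)
ind3 = ind2[p2][p3]
print(ind3)        # [7 6 5 4]
print(cols[ind3])  # [ 1 10 20 15], matching rows[ind1]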

What does the [1] do when using .where()?

I'm practicing on a Data Cleaning Kaggle exercise.
In the parsing-dates example I can't figure out what the [1] does at the end of the indices object.
Thanks.
# Finding indices corresponding to rows in different date format
indices = np.where([date_lengths == 24])[1]
print('Indices with corrupted data:', indices)
earthquakes.loc[indices]
As described in the documentation, numpy.where called with a single argument is equivalent to calling np.asarray([date_lengths == 24]).nonzero().
numpy.nonzero returns a tuple with as many items as the input array has dimensions, each item holding the indices of the non-zero values along that dimension.
>>> np.nonzero([1,0,2,0])
(array([0, 2]),)
Indexing with [1] gets the second element (i.e., the second dimension), but since the input was wrapped in an extra [...], this is equivalent to doing:
np.where(date_lengths == 24)[0]
>>> np.nonzero([1,0,2,0])[0]
array([0, 2])
It is an artefact of the extra [] around the condition. For example:
a = np.arange(10)
To find, for example, indices where a>3 can be done like this:
np.where(a > 3)
gives as output a tuple with one array
(array([4, 5, 6, 7, 8, 9]),)
So the indices can be obtained as
indices = np.where(a > 3)[0]
In your case, the condition is between [], which is unnecessary, but still works.
np.where([a > 3])
returns a tuple of which the first element is an array of zeros, and the second is the array of indices you want
(array([0, 0, 0, 0, 0, 0]), array([4, 5, 6, 7, 8, 9]))
so the indices are obtained as
indices = np.where([a > 3])[1]

Can einsum be used to reshape operand?

I am trying to modify the following code snippet to not use reshape:
a = np.random.randn(1, 2, 3, 5)
b = np.random.randn(2, 5, 10)
np.einsum("ijkl,mjl->kim", a, b.reshape(10,2,5))
At first I thought that the reshape is just transposing the operand, but it seems more complicated than that. Is it possible to do this operation without reshaping?
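One way to see why einsum alone cannot absorb this particular reshape: (2, 5, 10) -> (10, 2, 5) here is not an axis permutation; it merges the first two axes into 10 and splits the last axis into (2, 5), while einsum can only relabel (permute, multiply, sum) whole axes. A quick check of both points:
import numpy as np

b = np.random.randn(2, 5, 10)
r = b.reshape(10, 2, 5)
# Not a transpose: the only axis permutation with this shape does not match.
print(np.allclose(r, b.transpose(2, 0, 1)))  # False
# What the reshape actually does: merge axes (2, 5) -> 10 and split 10 -> (2, 5).
i, j, k = 7, 1, 3
print(r[i, j, k] == b[i // 5, i % 5, j * 5 + k])  # True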

Get indices for values of one array in another array

I have two 1D-arrays containing the same set of values, but in a different (random) order. I want to find the list of indices, which reorders one array according to the other one. For example, my 2 arrays are:
ref = numpy.array([5,3,1,2,3,4])
new = numpy.array([3,2,4,5,3,1])
and I want the list order for which new[order] == ref.
My current idea is:
def find(val):
    return numpy.argmin(numpy.absolute(ref - val))

order = sorted(range(new.size), key=lambda x: find(new[x]))
However, this only works as long as no values are repeated. In my example 3 appears twice, and I get new[order] = [5 3 3 1 2 4]. The second 3 is placed directly after the first one, because my function find() does not track which 3 I am currently looking for.
So I could add something to deal with this, but I have a feeling there might be a better solution out there. Maybe in some library (NumPy or SciPy)?
Edit about the duplicate: the linked solution assumes that the arrays are ordered, or, for the "unordered" solution, returns duplicate indices. I need each index to appear only once in order. Which one comes first, however, is not important (nor even determinable from the data provided).
What I get with sort_idx = A.argsort(); order = sort_idx[np.searchsorted(A,B,sorter = sort_idx)] is: [3, 0, 5, 1, 0, 2]. But what I am looking for is [3, 0, 5, 1, 4, 2].
Given ref and new, which are shuffled versions of each other, we can get the unique indices that map ref to new using the sorted version of both arrays and the invertibility of np.argsort.
Start with:
i = np.argsort(ref)
j = np.argsort(new)
Now ref[i] and new[j] both give the sorted version of the arrays, which is the same for both. You can invert the first sort by doing:
k = np.argsort(i)
Now ref is just new[j][k], or new[j[k]]. Since all the operations are shuffles using unique indices, the final index j[k] is unique as well. j[k] can be computed in one step with
order = np.argsort(new)[np.argsort(np.argsort(ref))]
From your original example:
>>> ref = np.array([5, 3, 1, 2, 3, 4])
>>> new = np.array([3, 2, 4, 5, 3, 1])
>>> order = np.argsort(new)[np.argsort(np.argsort(ref))]
>>> order
array([3, 0, 5, 1, 4, 2])
>>> new[order] # Should give ref
array([5, 3, 1, 2, 3, 4])
This is probably not any faster than the more general solutions to the similar question on SO, but it does guarantee unique indices as you requested. A further optimization would be to replace np.argsort(i) with something like the argsort_unique function in this answer. I would go one step further and just compute the inverse of the sort:
def inverse_argsort(a):
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv
order = np.argsort(new)[inverse_argsort(ref)]
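A quick sanity check, using the inverse_argsort helper above and the arrays from the question:
import numpy as np

ref = np.array([5, 3, 1, 2, 3, 4])
new = np.array([3, 2, 4, 5, 3, 1])
order = np.argsort(new)[inverse_argsort(ref)]
print(order)       # [3 0 5 1 4 2]
print(new[order])  # [5 3 1 2 3 4] == ref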