Numpy index of the maximum with reduction - numpy.argmax.reduceat

I have a flat array a:
a = numpy.array([0, 1, 1, 2, 3, 1, 2])
And an array b of indices marking the start of each "chunk":
b = numpy.array([0, 4])
I know I can find the maximum in each "chunk" using a reduction:
m = numpy.maximum.reduceat(a,b)
>>> m
array([2, 3], dtype=int32)
But... Is there a way to find the index of the maximum within each chunk (like numpy.argmax), with vectorized operations (no lists, no loops)?

Borrowing the idea from this post.
Steps involved:
Offset all elements of each group by a limit-offset so that groups occupy disjoint value ranges, then argsort the whole array: each group stays in its position, but the elements within each group get sorted.
In that sorted order, the last element of each group is the group max; its index in the original array, minus the group's start offset b, gives the per-group argmax.
Thus, a vectorized implementation would be -
import numpy as np

def numpy_argmax_reduceat(a, b):
    n = a.max() + 1  # limit-offset
    grp_count = np.append(b[1:] - b[:-1], a.size - b[-1])
    shift = n * np.repeat(np.arange(grp_count.size), grp_count)
    sortidx = (a + shift).argsort()
    grp_shifted_argmax = np.append(b[1:], a.size) - 1
    return sortidx[grp_shifted_argmax] - b
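A quick check on the question's arrays (the output is the argmax of each chunk, relative to the chunk's start):
a = np.array([0, 1, 1, 2, 3, 1, 2])
b = np.array([0, 4])
print(numpy_argmax_reduceat(a, b))  # [3 0] -> a[0+3] = 2 and a[4+0] = 3 are the chunk maxima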
As a minor tweak, and possibly a faster one, we could instead create shift with cumsum, giving a variation of the earlier approach, like so -
def numpy_argmax_reduceat_v2(a, b):
    n = a.max() + 1  # limit-offset
    id_arr = np.zeros(a.size, dtype=int)
    id_arr[b[1:]] = 1
    shift = n * id_arr.cumsum()
    sortidx = (a + shift).argsort()
    grp_shifted_argmax = np.append(b[1:], a.size) - 1
    return sortidx[grp_shifted_argmax] - b
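Both variants agree on the sample arrays from the check above, which we can assert directly:
np.testing.assert_array_equal(numpy_argmax_reduceat(a, b),
                              numpy_argmax_reduceat_v2(a, b))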

Related

How to find missing numbers between minimum and maximum

I want to make a NumPy array with the following properties:
Random number: 0~9 (0 <= value <= 9); random 1D size: 5~9 (5 <= size <= 9).
And I want to find the missing numbers between the min and max, so I wrote this code:
import numpy as np
min_val = 0
max_val = 10
min_val_len = 5
max_val_len = 10
arr1 = [4, 3, 2, 7, 8, 2, 3]
a = list(arr1)
print(a)
diff = np.setdiff1d(range(min_val, max_val), arr1)
arr = np.arange(min_val_len, max_val_len)
if diff in arr:
    print(diff)
else:
    print("no missing")
For this input, the output I want is [5, 6].
And if the input is [1, 2, 3, 4, 5], the result should be 'no missing'.
But the code doesn't work as I expect.
I think you expect in to work in a way it does not: you want to check every single element. Try:
b = [d in arr for d in diff]
Now b contains a boolean value for each value d of diff. If you want to find the actual numbers that are missing, you can do it with:
b = np.intersect1d(np.setdiff1d(range(min_val, max_val), arr1), arr)
Now b contains all the numbers of diff that are also in arr. Also note that Python has built-in set types, so you do not actually need numpy. But since you are already using the notion of sets, you can do it in an even simpler way:
print(np.setdiff1d(range(min_val, max_val), arr1))
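For the stated goal (numbers missing strictly between the array's own min and max), here is a minimal sketch that derives the bounds from the data rather than from fixed constants:

import numpy as np

arr1 = np.array([4, 3, 2, 7, 8, 2, 3])
# candidate values strictly between min and max, minus the values actually present
missing = np.setdiff1d(np.arange(arr1.min(), arr1.max()), arr1)
print(missing if missing.size else "no missing")  # [5 6]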

How to find common members of matrices in Numpy

I have a 2D matrix A and a vector B. I want to find all row indices of elements in A that are also contained in B.
A = np.array([[1,9,5], [8,4,9], [4,9,3], [6,7,5]], dtype=int)
B = np.array([2, 4, 8, 10, 12, 18], dtype=int)
My current solution compares A to one element of B at a time, but that is horribly slow:
res = np.array([], dtype=int)
for i in range(B.shape[0]):
    cres, _ = (B[i] == A).nonzero()
    res = np.append(res, cres)
res = np.unique(res)
The following Matlab statement would solve my issue:
find(any(reshape(any(reshape(A, prod(size(A)), 1) == B, 2),size(A, 1),size(A, 2)), 2))
However, comparing a row and a column vector in Numpy does not create a Boolean intersection matrix as it does in Matlab.
Is there a proper way to do this in Numpy?
We can use np.isin masking.
To get all the row numbers, it would be -
np.where(np.isin(A,B).T)[1]
If you need them split based on each element's occurrence -
[np.flatnonzero(i) for i in np.isin(A,B).T if i.any()]
The posted MATLAB code seems to be doing broadcasting. So, an equivalent would be -
np.where(B[:,None,None]==A)[1]
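To illustrate on the posted A and B (rows 1 and 2 of A contain members of B):

import numpy as np

A = np.array([[1, 9, 5], [8, 4, 9], [4, 9, 3], [6, 7, 5]], dtype=int)
B = np.array([2, 4, 8, 10, 12, 18], dtype=int)
mask = np.isin(A, B)
print(np.where(mask.T)[1])           # [1 2 1] - a row index per matching element
print(np.unique(np.where(mask)[0]))  # [1 2]   - unique row indices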

What kind of x fits into argsort(x) == argsort(argsort(x))?

For a 1-d array, what kind of x gives you argsort(x) == argsort(argsort(x))? A sorted array would be a trivial solution,
but you can also have unsorted arrays like [1, 0, 2] or [1, 0, 2, 3].
I'm really curious.
sorted_array = np.arange(10)
np.testing.assert_array_equal(np.argsort(sorted_array), np.argsort(np.argsort(sorted_array)))
# or
semi_sorted = [1, 0, 2]
np.testing.assert_array_equal(np.argsort(semi_sorted), np.argsort(np.argsort(semi_sorted)))
# or
semi_sorted = [1, 0, 2, 3]
np.testing.assert_array_equal(np.argsort(semi_sorted), np.argsort(np.argsort(semi_sorted)))
# or
semi_sorted = [2, 1, 3, 4, 5]
np.testing.assert_array_equal(np.argsort(semi_sorted), np.argsort(np.argsort(semi_sorted)))
What type of arrays fit the criteria?
To formalize @Alex Riley's intuition:
For any (zero-based) permutation p we have argsort(p) = p^-1, because by definition of argsort, p[argsort(p)] = [0,1,2,...], and [0,1,2,...] viewed as a permutation is the identity.
Now, no matter what x is, argsort(x) is a permutation, so writing p for it the condition becomes p = p^-1 or, equivalently, p^2 = id.
What do self-inverse permutations p look like? If p is applied twice nothing changes, so if the first application of p moves x to y, the second application must move y back to x. As y may equal x, p must therefore consist of swaps of pairs of elements and of elements that stay put. That is also sufficient.
We now know what argsort(x) looks like. What about x itself? Let us for simplicity assume x has only unique elements; otherwise the details of the sort algorithm used would have to be considered. Write s for the sorted x; then s = x[p]. Permuting both sides with p gives s[p] = x[p^2] = x. So x may be any sequence obtained from an ordered sequence by swapping the positions of some (possibly zero) non-overlapping pairs.
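A quick illustration of this characterization: start from a sorted array, swap a few non-overlapping pairs, and the identity holds (the particular pairs below are arbitrary):

import numpy as np

x = np.arange(8)
x[[0, 3]] = x[[3, 0]]   # swap positions 0 and 3
x[[5, 6]] = x[[6, 5]]   # swap positions 5 and 6; the rest stay put
print(x)                # [3 1 2 0 4 6 5 7]
np.testing.assert_array_equal(np.argsort(x), np.argsort(np.argsort(x)))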

Get indices for values of one array in another array

I have two 1D-arrays containing the same set of values, but in a different (random) order. I want to find the list of indices, which reorders one array according to the other one. For example, my 2 arrays are:
ref = numpy.array([5,3,1,2,3,4])
new = numpy.array([3,2,4,5,3,1])
and I want the list order for which new[order] == ref.
My current idea is:
def find(val):
    return numpy.argmin(numpy.absolute(ref - val))

order = sorted(range(new.size), key=lambda x: find(new[x]))
However, this only works as long as no values are repeated. In my example 3 appears twice, and I get new[order] = [5 3 3 1 2 4]. The second 3 is placed directly after the first one, because my function find() does not track which 3 I am currently looking for.
So I could add something to deal with this, but I have a feeling there might be a better solution out there. Maybe in some library (NumPy or SciPy)?
Edit about the duplicate: the linked solution assumes that the arrays are ordered, or, for the "unordered" solution, it returns duplicate indices. I need each index to appear only once in order. Which one comes first, however, is not important (nor determinable from the data provided).
What I get with sort_idx = A.argsort(); order = sort_idx[np.searchsorted(A, B, sorter=sort_idx)] is [3, 0, 5, 1, 0, 2], but what I am looking for is [3, 0, 5, 1, 4, 2].
Given ref, new which are shuffled versions of each other, we can get the unique indices that map ref to new using the sorted version of both arrays and the invertibility of np.argsort.
Start with:
i = np.argsort(ref)
j = np.argsort(new)
Now ref[i] and new[j] both give the sorted version of the arrays, which is the same for both. You can invert the first sort by doing:
k = np.argsort(i)
Now ref is just new[j][k], or new[j[k]]. Since all the operations are shuffles using unique indices, the final index j[k] is unique as well. j[k] can be computed in one step with
order = np.argsort(new)[np.argsort(np.argsort(ref))]
From your original example:
>>> ref = np.array([5, 3, 1, 2, 3, 4])
>>> new = np.array([3, 2, 4, 5, 3, 1])
>>> order = np.argsort(new)[np.argsort(np.argsort(ref))]
>>> order
array([3, 0, 5, 1, 4, 2])
>>> new[order] # Should give ref
array([5, 3, 1, 2, 3, 4])
This is probably not any faster than the more general solutions to the similar question on SO, but it does guarantee unique indices, as you requested. A further optimization would be to replace np.argsort(i) with something like the argsort_unique function in this answer. I would go one step further and just compute the inverse of the sort:
def inverse_argsort(a):
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv
order = np.argsort(new)[inverse_argsort(ref)]
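A quick check on the question's arrays, assuming the inverse_argsort defined above:

import numpy as np

ref = np.array([5, 3, 1, 2, 3, 4])
new = np.array([3, 2, 4, 5, 3, 1])
order = np.argsort(new)[inverse_argsort(ref)]
print(order)       # [3 0 5 1 4 2]
print(new[order])  # [5 3 1 2 3 4] == ref, with each index used exactly once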

Vectorized gather operation in numpy

Given this (sample) data
target_slots = np.array([1, 3, 1, 0, 8, 5, 8, 1, 1, 2])
dummy_elements = np.arange(10*D).reshape(10, D)
is there any way to express in a vectorized numpy expression the operation
gathered_results = np.zeros((num_slots, D))
for i, target in enumerate(target_slots):
    gathered_results[target] += dummy_elements[i]
This operation looks like a bincount, but instead of counting occurrences we sum the rows of another array.
(It is implied that np.max(target_slots) < num_slots, np.min(target_slots) >= 0, and target_slots.shape[0] == dummy_elements.shape[0].)
Approach #1
You are performing grouped summation: selecting rows off dummy_elements and adding them in at specific rows of the output array. So, one obvious choice for a vectorized solution would be np.add.reduceat, like so -
sidx = target_slots.argsort()
out = np.zeros((num_slots, D))
unq, shift_idx = np.unique(target_slots[sidx],return_index=True)
out[unq] = np.add.reduceat(dummy_elements[sidx],shift_idx)
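A quick check against the question's loop, with assumed concrete values D = 3 and num_slots = 9 (any values satisfying the stated constraints would do):

import numpy as np

D, num_slots = 3, 9
target_slots = np.array([1, 3, 1, 0, 8, 5, 8, 1, 1, 2])
dummy_elements = np.arange(10*D).reshape(10, D)

# reference result from the question's loop
expected = np.zeros((num_slots, D))
for i, target in enumerate(target_slots):
    expected[target] += dummy_elements[i]

sidx = target_slots.argsort()
out = np.zeros((num_slots, D))
unq, shift_idx = np.unique(target_slots[sidx], return_index=True)
out[unq] = np.add.reduceat(dummy_elements[sidx], shift_idx)
np.testing.assert_array_equal(out, expected)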
Approach #2
Alternatively, we can use np.bincount to perform these ID-based summing operations. One way would be a loop that iterates along the columns of dummy_elements, which I think would be beneficial when the number of such columns is comparatively small. The implementation would look like this -
out = np.zeros((num_slots, D))
for i in range(D):
    out[:, i] = np.bincount(target_slots, dummy_elements[:, i], minlength=num_slots)
(minlength needs to be num_slots so that each bincount result matches the column length of out.)
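For reference, the bincount-with-weights semantics this relies on, in a tiny standalone example:

import numpy as np

# weights are summed per integer id: id 0 gets 1+3, id 2 gets 2+4
print(np.bincount([0, 2, 0, 2], weights=[1., 2., 3., 4.]))  # [4. 0. 6.]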
Approach #3
A fully vectorized version of the same would be like this (note the per-column offset must be num_slots, so the shifted ids of different columns cannot collide) -
ids = (target_slots[:, None] + np.arange(D)*num_slots).ravel('F')
out = np.bincount(ids, dummy_elements.ravel('F'), minlength=num_slots*D).reshape(D, -1).T
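Continuing the assumed demo from Approach #1's check (D = 3, num_slots = 9, and the expected array from the loop), running Approach #3 with those variables confirms the equivalence:

np.testing.assert_array_equal(out, expected)  # out from Approach #3, expected from the question's loop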