what kind of x fits into argsort(x) == argsort(argsort(x))? - numpy

For a 1-d array, what kind of x gives you argsort(x) == argsort(argsort(x))? A sorted array is a trivial solution,
but unsorted arrays such as [1, 0, 2] or [1, 0, 2, 3] also work.
I'm really interested.
sorted_array = np.arange(10)
np.testing.assert_array_equal(np.argsort(sorted_array), np.argsort(np.argsort(sorted_array)))
# or
semi_sorted = [1, 0, 2]
np.testing.assert_array_equal(np.argsort(semi_sorted), np.argsort(np.argsort(semi_sorted)))
# or
semi_sorted = [1, 0, 2, 3]
np.testing.assert_array_equal(np.argsort(semi_sorted), np.argsort(np.argsort(semi_sorted)))
# or
semi_sorted = [2, 1, 3, 4, 5]
np.testing.assert_array_equal(np.argsort(semi_sorted), np.argsort(np.argsort(semi_sorted)))
What kind of arrays satisfy this criterion?

To formalize @Alex Riley's intuition:
For any (zero based) permutation p we have argsort(p) = p^-1 because by definition of argsort p[argsort(p)] = [0,1,2,...] and [0,1,2,...] viewed as a permutation is the identity.
Now, no matter what x, argsort(x) is a permutation, so writing p for that we get p = p^-1 or, equivalently, p^2 = id.
What do permutations p that are self-inverse look like? If p is applied twice nothing changes, so if the first application of p moves x to y, the second application of p must move y to x. As y may equal x, p must therefore consist of flips of two elements and of elements that stay put. This condition is also sufficient.
We now know what argsort(x) looks like. What about x itself? Let us for simplicity assume x has only unique elements, otherwise the details of the sort algorithm used have to be considered. Let us write s for the sorted x. Then s = x[p]. Permuting both sides with p we get s[p] = x[p^2] = x. So x may be any sequence that is obtained from an ordered sequence by flipping the positions of some (possibly zero) nonoverlapping pairs.
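The argument above can be checked numerically. The following is a minimal sketch, assuming unique elements; the helper names is_fixed_point and is_involution are made up for illustration. It tests every permutation of five elements and then builds one example by swapping two disjoint pairs of a sorted array.

```python
import itertools
import numpy as np

def is_fixed_point(x):
    """True if argsort(x) equals argsort(argsort(x))."""
    p = np.argsort(x, kind="stable")
    return np.array_equal(p, np.argsort(p, kind="stable"))

def is_involution(p):
    """True if applying the permutation p twice gives the identity."""
    p = np.asarray(p)
    return np.array_equal(p[p], np.arange(p.size))

# Exhaustive check over all permutations of 0..4 (unique elements only).
for perm in itertools.permutations(range(5)):
    assert is_fixed_point(perm) == is_involution(np.argsort(perm))

# Build an example: start sorted, then swap the disjoint pairs (0, 3) and (1, 4).
x = np.array([10, 20, 30, 40, 50])
x[[0, 3, 1, 4]] = x[[3, 0, 4, 1]]
print(x, is_fixed_point(x))
```

With repeated values the outcome depends on how the sort breaks ties, which is why the sketch sticks to unique elements.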

Related

Explicit slicing across a particular dimension

I've got a 3D tensor x (e.g. 4x4x100). I want to obtain a subset of it by explicitly choosing elements along the last dimension. This would be easy if I were choosing the same elements along the last dimension (e.g. x[:,:,30:50]), but I want to target different elements along that dimension using a 2D tensor indices, which specifies the starting index along the third dimension. Is there an easy way to do this in numpy?
A simpler 2D example:
x = [[1,2,3,4,5,6],[10,20,30,40,50,60]]
indices = [1,3]
Let's say I want to grab two elements along the last dimension of x, starting from the points specified by indices. So my desired output is:
[[2,3],[40,50]]
Update: I think I could use a combination of take() and ravel_multi_index(), but some of the platforms that are inspired by numpy (like PyTorch) don't seem to have ravel_multi_index, so I'm looking for alternative solutions.
Iterating over the idx and collecting the slices is not a bad option if the number of 'rows' isn't too large (and the size of the slices is relatively big).
In [55]: x = np.array([[1,2,3,4,5,6],[10,20,30,40,50,60]])
In [56]: idx = [1,3]
In [57]: np.array([x[j,i:i+2] for j,i in enumerate(idx)])
Out[57]:
array([[ 2,  3],
       [40, 50]])
Joining the slices like this only works if they all are the same size.
An alternative is to collect the indices into an array, and do one indexing.
For example with a similar iteration:
idxs = np.array([np.arange(i,i+2) for i in idx])
But broadcasted addition may be better:
In [58]: idxs = np.array(idx)[:,None]+np.arange(2)
In [59]: idxs
Out[59]:
array([[1, 2],
       [3, 4]])
In [60]: x[np.arange(2)[:,None], idxs]
Out[60]:
array([[ 2,  3],
       [40, 50]])
ravel_multi_index is not hard to replicate (if you don't need clipping etc):
In [65]: np.ravel_multi_index((np.arange(2)[:,None],idxs),x.shape)
Out[65]:
array([[ 1,  2],
       [ 9, 10]])
In [66]: x.flat[_]
Out[66]:
array([[ 2,  3],
       [40, 50]])
In [67]: np.arange(2)[:,None]*x.shape[1]+idxs
Out[67]:
array([[ 1,  2],
       [ 9, 10]])
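The broadcasted index array can also be handed to np.take_along_axis (available since NumPy 1.15), which avoids building the row index by hand and extends directly to the 3D case. This is a sketch only; the 3D start indices start3 are made up for illustration. PyTorch's torch.gather has similar semantics, though that is not tested here.

```python
import numpy as np

# 2D example from the question
x = np.array([[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]])
start = np.array([1, 3])
n = 2                                         # slice length
idxs = start[:, None] + np.arange(n)          # shape (2, n)
out2d = np.take_along_axis(x, idxs, axis=-1)

# 3D case: one start index per (row, column) pair
x3 = np.arange(4 * 4 * 100).reshape(4, 4, 100)
start3 = np.arange(16).reshape(4, 4)          # hypothetical 2D start indices
idxs3 = start3[..., None] + np.arange(n)      # shape (4, 4, n)
out3d = np.take_along_axis(x3, idxs3, axis=-1)

print(out2d)
print(out3d.shape)
```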
along the 3D axis:
x = [x[:,i].narrow(2,index,2) for i,index in enumerate(indices)]
x = torch.stack(x,dim=1)
By enumerating you get both the index along the axis and the index from which you want to start slicing.
narrow gives you a zero-copy slice of a given length, starting from index start along a certain axis.
You said you wanted:
dim = 2
start = index
length = 2
Then you simply have to stack these tensors back into a single 3D tensor.
This is the least work-intensive thing I can think of for pytorch.
EDIT
If you just want different indices along different axes and indices is a 2D tensor, you can do:
x = [x[:,i,index] for i,index in enumerate(indices)]
x = torch.stack(x,dim=1)
You really should have given a proper working example; the lack of one makes this unnecessarily confusing.
Here is how to do it in numpy; no clue about torch, though.
The following picks a slice of length n along the third dimension starting from points idx depending on the other two dimensions:
# example
a = np.arange(60).reshape(2, 3, 10)
idx = [(1,2,3),(4,3,2)]
n = 4
# build auxiliary 4D array where the last two dimensions represent
# a sliding n-window of the original last dimension
j,k,l = a.shape
s,t,u = a.strides
aux = np.lib.stride_tricks.as_strided(a, (j,k,l-n+1,n), (s,t,u,u))
# pick desired offsets from sliding windows
aux[(*np.ogrid[:j, :k], idx)]
# array([[[ 1,  2,  3,  4],
#         [12, 13, 14, 15],
#         [23, 24, 25, 26]],
#
#        [[34, 35, 36, 37],
#         [43, 44, 45, 46],
#         [52, 53, 54, 55]]])
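On NumPy 1.20 or later, the same sliding-window view can be built with sliding_window_view instead of setting the strides by hand. This is a sketch of that substitution, not part of the original answer; it returns a read-only view, so the shape and strides cannot be mistyped.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(60).reshape(2, 3, 10)
idx = np.array([(1, 2, 3), (4, 3, 2)])
n = 4

# shape (2, 3, 7, 4): for each (j, k), every length-n window of the last axis
windows = sliding_window_view(a, n, axis=-1)

j, k = a.shape[:2]
# pick, for each (j, k), the window that starts at idx[j, k]
out = windows[np.arange(j)[:, None], np.arange(k), idx]
print(out.shape)
```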
I came up with below using broadcasting:
x = np.array([[1,2,3,4,5,6,7,8,9,10],[10,20,30,40,50,60,70,80,90,100]])
i = np.array([1,5])
N = 2 # number of elements I want to extract along each dimension. Starting points specified in i
r = np.arange(x.shape[-1])
r = np.broadcast_to(r, x.shape)
ii = i[:, np.newaxis]
ii = np.broadcast_to(ii, x.shape)
mask = np.logical_and(r-ii>=0, r-ii<N)
output = x[mask].reshape(x.shape[0], N)
Does this look alright?

Get indices for values of one array in another array

I have two 1D-arrays containing the same set of values, but in a different (random) order. I want to find the list of indices, which reorders one array according to the other one. For example, my 2 arrays are:
ref = numpy.array([5,3,1,2,3,4])
new = numpy.array([3,2,4,5,3,1])
and I want the list order for which new[order] == ref.
My current idea is:
def find(val):
    return numpy.argmin(numpy.absolute(ref-val))
order = sorted(range(new.size), key=lambda x:find(new[x]))
However, this only works as long as no values are repeated. In my example 3 appears twice, and I get new[order] = [5 3 3 1 2 4]. The second 3 is placed directly after the first one, because my function find() does not track which 3 I am currently looking for.
So I could add something to deal with this, but I have a feeling there might be a better solution out there. Maybe in some library (NumPy or SciPy)?
Edit about the duplicate: The linked solution assumes that the arrays are ordered, or, for the "unordered" solution, returns duplicate indices. I need each index to appear only once in order. Which one comes first, however, is not important (nor can it be determined from the data provided).
What I get with sort_idx = A.argsort(); order = sort_idx[np.searchsorted(A,B,sorter = sort_idx)] is: [3, 0, 5, 1, 0, 2]. But what I am looking for is [3, 0, 5, 1, 4, 2].
Given ref, new which are shuffled versions of each other, we can get the unique indices that map ref to new using the sorted version of both arrays and the invertibility of np.argsort.
Start with:
i = np.argsort(ref)
j = np.argsort(new)
Now ref[i] and new[j] both give the sorted version of the arrays, which is the same for both. You can invert the first sort by doing:
k = np.argsort(i)
Now ref is just new[j][k], or new[j[k]]. Since all the operations are shuffles using unique indices, the final index j[k] is unique as well. j[k] can be computed in one step with
order = np.argsort(new)[np.argsort(np.argsort(ref))]
From your original example:
>>> ref = np.array([5, 3, 1, 2, 3, 4])
>>> new = np.array([3, 2, 4, 5, 3, 1])
>>> order = np.argsort(new)[np.argsort(np.argsort(ref))]
>>> order
array([3, 0, 5, 1, 4, 2])
>>> new[order] # Should give ref
array([5, 3, 1, 2, 3, 4])
This is probably not any faster than the more general solutions to the similar question on SO, but it does guarantee unique indices as you requested. A further optimization would be to replace np.argsort(i) with something like the argsort_unique function in this answer. I would go one step further and just compute the inverse of the sort:
def inverse_argsort(a):
    fwd = np.argsort(a)
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)
    return inv
order = np.argsort(new)[inverse_argsort(ref)]
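As a sanity check, the pieces above can be folded into one helper and tested on random data with many duplicates. This is only a sketch: the name order_for is hypothetical, and kind="stable" is an added assumption to make tie-breaking deterministic (the result is a valid permutation either way).

```python
import numpy as np

def order_for(ref, new):
    """Return indices such that new[order] == ref, using each index once."""
    fwd = np.argsort(ref, kind="stable")
    inv = np.empty_like(fwd)
    inv[fwd] = np.arange(fwd.size)           # inverse of the sort permutation
    return np.argsort(new, kind="stable")[inv]

rng = np.random.default_rng(0)
ref = rng.integers(0, 5, size=20)            # plenty of repeated values
new = rng.permutation(ref)                   # shuffled copy of ref

order = order_for(ref, new)
print(np.array_equal(new[order], ref))
```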

Find indexes of nonuniform sample with np.random.choice

Let's say I have position information in the form of two large 1D arrays X and Y. I want to sample positions non-uniformly from these arrays.
I thought I could do this with np.random.choice, but it only accepts 1D arrays, so I cannot simply do:
Xsample = np.random.choice(X, n, p=p)
Ysample = np.random.choice(Y, n, p=p)
where n is the number of points in the sample and p is a probability array, because this would sample different points for Xsample and Ysample. So I am left with finding a way to obtain the indexes of a single sampling. The problem is that there is no guarantee that the numbers in the arrays are unique, so I cannot quite use np.where.
Any thoughts?
Doh, I can just sample from the indexes.
Here's a working example:
X = np.array([1, 2, 3, 4, 5])
Y = np.array([11, 12, 13, 14, 15])
p = [0.25, 0., 0.5, 0., 0.25]
sample_idxs = np.random.choice(np.arange(len(X)), 2, p=p)
# can also be
# sample_idxs = np.random.choice(len(X), 2, p=p)
sample_idxs
> array([2, 4])
X[sample_idxs]
> array([3, 5])
Y[sample_idxs]
> array([13, 15])
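The same idea with NumPy's newer Generator API might look like the sketch below. The seed and the probability vector are assumptions chosen for illustration; p needs one entry per position and must sum to 1, and it is passed by keyword because the third positional argument of choice is replace, not p.

```python
import numpy as np

rng = np.random.default_rng(seed=0)          # seed chosen only for reproducibility
X = np.array([1, 2, 3, 4, 5])
Y = np.array([11, 12, 13, 14, 15])
p = np.array([0.25, 0.0, 0.5, 0.0, 0.25])    # one weight per position, sums to 1

idx = rng.choice(len(X), size=3, p=p)        # sample positions, with replacement
pairs = np.column_stack((X[idx], Y[idx]))    # each row is one sampled (x, y) pair
print(pairs)
```

Because both arrays are indexed with the same idx, every sampled x stays paired with its own y, even when values repeat.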

Numpy index of the maximum with reduction - numpy.argmax.reduceat

I have a flat array a:
a = numpy.array([0, 1, 1, 2, 3, 1, 2])
And an array b of indices marking the start of each "chunk":
b = numpy.array([0, 4])
I know I can find the maximum in each "chunk" using a reduction:
m = numpy.maximum.reduceat(a,b)
>>> array([2, 3], dtype=int32)
But... Is there a way to find the index of the maximum within a chunk (like numpy.argmax), with vectorized operations (no lists, loops)?
Borrowing the idea from this post.
Steps involved :
Offset all elements in a group by a limit-offset. Sort them globally, thus limiting each group to stay at their positions, but sorting the elements within each group.
In the sorted array, we would look for the last element, which would be the group max. Their indices would be the argmax after offsetting down for the group lengths.
Thus, a vectorized implementation would be -
def numpy_argmax_reduceat(a, b):
    n = a.max()+1 # limit-offset
    grp_count = np.append(b[1:] - b[:-1], a.size - b[-1])
    shift = n*np.repeat(np.arange(grp_count.size), grp_count)
    sortidx = (a+shift).argsort()
    grp_shifted_argmax = np.append(b[1:],a.size)-1
    return sortidx[grp_shifted_argmax] - b
As a minor tweak and possibly faster one, we could alternatively create shift with cumsum and thus have a variation of the earlier approach, like so -
def numpy_argmax_reduceat_v2(a, b):
    n = a.max()+1 # limit-offset
    id_arr = np.zeros(a.size,dtype=int)
    id_arr[b[1:]] = 1
    shift = n*id_arr.cumsum()
    sortidx = (a+shift).argsort()
    grp_shifted_argmax = np.append(b[1:],a.size)-1
    return sortidx[grp_shifted_argmax] - b

Numpy loop using an index

I'm a newbie and was trying something in python 2.7.2 with Numpy which wasn't working as expected so wanted to check if there was something basic I was misunderstanding.
I was calculating a value for a triangle (trinormals) and then updating a value per point of the triangle (vertnormals) using an array of triangle indexes (trivertidx). As a loop I was calculating:
for itri in range(ntriangles) :
    vertnormals[(trivertidx[itri,0]),:] += trinormals[itri,:]
    vertnormals[(trivertidx[itri,1]),:] += trinormals[itri,:]
    vertnormals[(trivertidx[itri,2]),:] += trinormals[itri,:]
As this was a little slow I thought it could be modified to :
vertnormals[(trivertidx[:,0]),:] += trinormals[:,:]
vertnormals[(trivertidx[:,1]),:] += trinormals[:,:]
vertnormals[(trivertidx[:,2]),:] += trinormals[:,:]
However this doesn't give the same results. Is there another simpler way to write the loop? Any pointers appreciated. Note the intent here was to get a single value for each entry in vertnormals and then normalise the result.
Numpy has a function bincount that can be very helpful in situations like this. The two lines below are the same when the elements of index are unique, but different when index has repeated values:
A[index] += W
A += np.bincount(index, W, minlength=len(A))
I believe you want the behavior of the second, but your code is a little more complex because A, index, and W are not 1d. You can try something like this:
import numpy as np
N = len(vertnormals)
for j in range(vertnormals.shape[-1]):
    vertnormals[:, j] += np.bincount(trivertidx[:, 0], trinormals[:, j], minlength=N)
    vertnormals[:, j] += np.bincount(trivertidx[:, 1], trinormals[:, j], minlength=N)
    vertnormals[:, j] += np.bincount(trivertidx[:, 2], trinormals[:, j], minlength=N)
Hope that helps.
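Another option, not mentioned above, is np.add.at, which performs an unbuffered in-place add so that repeated indices accumulate. The sketch below uses made-up sizes and random data, and compares the result against the explicit loop from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 8                                   # hypothetical sizes: m vertices, n triangles
trivertidx = rng.integers(0, m, size=(n, 3))  # vertex indices of each triangle
trinormals = rng.normal(size=(n, 3))          # one normal vector per triangle

vertnormals = np.zeros((m, 3))
# Unbuffered add: each triangle normal is added to all three of its vertices,
# and vertices shared by several triangles accumulate every contribution.
np.add.at(vertnormals, trivertidx, trinormals[:, None, :])

# Reference result from the explicit loop in the question.
expected = np.zeros((m, 3))
for itri in range(n):
    for corner in range(3):
        expected[trivertidx[itri, corner]] += trinormals[itri]

print(np.allclose(vertnormals, expected))
```

np.add.at is often slower than bincount on large inputs, so it is worth timing both on real data.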
If I am understanding your question well, you have m points from which you have formed n triangles, and trivertidx is an array of shape (n, 3) holding values in the range [0, m), where trivertidx[j] is the list of the 3 points making up the j-th triangle.
trinormals then is an array of shape (n,) holding a value assigned to each triangle, and you want vertnormals to be an array of shape (m,) holding, for each point, the sum of the values assigned to each triangle that point is a vertex of.
If the above is right, the following example should show why your second code is not working properly:
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,2,0,2]] += 1
>>> a
array([1, 2, 3, 3, 4])
Even though the element in position 2 shows up twice in the left hand side, what happens is that two copies of the same value have 1 added, and then the incremented value is copied twice to the same position.
To vectorize this summation you would need an array of shape (n, m) where the value at position [j, k] is True if vertex k is part of triangle j, False if not. You could build that array like this:
trivert = np.zeros((n, m), dtype='bool')
trivert[np.arange(n).reshape(n, 1), trivertidx] = 1
Once you have this array, you can get your sums for each vertex as
vertnormals = np.sum(trivert * trinormals.reshape(-1, 1), axis=0)
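If trinormals holds a vector per triangle, i.e. has shape (n, 3) rather than (n,), the same membership array can be used in a matrix product. This is a sketch under that assumption, with a tiny made-up mesh. The (n, m) array grows quickly, so this only suits small meshes.

```python
import numpy as np

m, n = 5, 3                                   # hypothetical: 5 vertices, 3 triangles
trivertidx = np.array([[0, 1, 2],
                       [1, 2, 3],
                       [2, 3, 4]])
trinormals = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# trivert[j, k] is True when vertex k belongs to triangle j
trivert = np.zeros((n, m), dtype=bool)
trivert[np.arange(n)[:, None], trivertidx] = True

# (m, n) @ (n, 3): sums the normals of all triangles touching each vertex
vertnormals = trivert.T.astype(float) @ trinormals
print(vertnormals)
```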