numpy get top k elements from last dimension of ndarray

numpy get top k elements from last dimension of ndarray - numpy

I have a multidimensional array, and I need to get the top k elements from each row of the last dimension.
>>> x = np.random.random_integers(0, 100, size=(2,1,1,5))
>>> x
array([[[[99, 39, 10, 18, 68]]],
[[[22, 3, 13, 56, 2]]]])
I'm trying to get:
array([[[[ 99., 68.]]],
[[[ 18., 99.]]]])
I can get the indices using the following, but I'm not sure how to slice out the values.
>>> k = 2
>>> parts = np.flip(-1 - np.arange(k), 0)
>>> indices = np.flip(
... np.argpartition(x, parts, axis=-1)[..., -k:],
... axis=-1)
>>> indices
array([[[[0, 4]]],
[[[3, 0]]]])

This could solve your problem.
np.sort(x, axis=len(x.shape)-1)[...,-2:]

np.partition(x, 2)[..., -2:]
returns 2 largest elements from each row. E.g.,
x = np.random.random_integers(0, 100, size=(2,1,1,5))
print(x)
print(np.partition(x, 2)[..., -2:])
prints something like
[[[[79 34 90 80 56]]]
[[[78 11 24 20 42]]]]
[[[[80 90]]]
[[[78 42]]]]

Related

find the array index which its element is most near greater than a value

I have a sorted array.
x = [1, 10, 12, 16, 19, 20, 21, ....]
for any given number y which is between [x[0], x[-1]], I want to find the index of the element which is the most near greater than y, for example, if y = 0, it returns 0, if y = 18, it returns 4
Is there a function available?

Without any external library, you can use bisect
i = bisect.bisect_right(x, y)
i will be the index of the element you wanted.

Given the sorted nature, we can use np.searchsorted -
idx = np.searchsorted(x,y,'right')

You can use numpy.argmin on the absolute value of the difference:
import numpy as np
x = np.array([1, 10, 12, 16, 19, 20, 21])
def find_closest(x,y):
return (np.abs(x-y)).argmin()
for y in [0,18]:
print(find_closest(x,y))
0
4

Finding those elements in an array which are "close"

I have an 1 dimensional sorted array and would like to find all pairs of elements whose difference is no larger than 5.
A naive approach would to be to make N^2 comparisons doing something like
diffs = np.tile(x, (x.size,1) ) - x[:, np.newaxis]
D = np.logical_and(diffs>0, diffs<5)
indicies = np.argwhere(D)
Note here that the output of my example are indices of x. If I wanted the values of x which satisfy the criteria, I could do x[indicies].
This works for smaller arrays, but not arrays of the size with which I work.
An idea I had was to find where there are gaps larger than 5 between consecutive elements. I would split the array into two pieces, and compare all the elements in each piece.
Is this a more efficient way of finding elements which satisfy my criteria? How could I go about writing this?
Here is a small example:
x = np.array([ 9, 12,
21,
36, 39, 44, 46, 47,
58,
64, 65,])
the result should look like
array([[ 0, 1],
[ 3, 4],
[ 5, 6],
[ 5, 7],
[ 6, 7],
[ 9, 10]], dtype=int64)

Here is a solution that iterates over offsets while shrinking the set of candidates until there are none left:
import numpy as np
def f_pp(A, maxgap):
d0 = np.diff(A)
d = d0.copy()
IDX = []
k = 1
idx, = np.where(d <= maxgap)
vidx = idx[d[idx] > 0]
while vidx.size:
IDX.append(vidx[:, None] + (0, k))
if idx[-1] + k + 1 == A.size:
idx = idx[:-1]
d[idx] = d[idx] + d0[idx+k]
k += 1
idx = idx[d[idx] <= maxgap]
vidx = idx[d[idx] > 0]
return np.concatenate(IDX, axis=0)
data = np.cumsum(np.random.exponential(size=10000)).repeat(np.random.randint(1, 20, (10000,)))
pairs = f_pp(data, 1)
#pairs = set(map(tuple, pairs))
from timeit import timeit
kwds = dict(globals=globals(), number=100)
print(data.size, 'points', pairs.shape[0], 'close pairs')
print('pp', timeit("f_pp(data, 1)", **kwds)*10, 'ms')
Sample run:
99963 points 1020651 close pairs
pp 43.00256529124454 ms

Your idea of slicing the array is a very efficient approach. Since your data are sorted you can just calculate the difference and split it:
d=np.diff(x)
ind=np.where(d>5)[0]
pieces=np.split(x,ind)
Here pieces is a list, where you can then use in a loop with your own code on every element.
The best algorithm is highly dependent on the nature of your data which I'm unaware. For example another possibility is to write a nested loop:
pairs=[]
for i in range(x.size):
j=i+1
while x[j]-x[i]<=5 and j<x.size:
pairs.append([i,j])
j+=1
If you want it to be more clever, you can edit the outer loop in a way to jump when j hits a gap.

Group numpy into multiple sub-arrays using an array of values

I have an array of points along a line:
a = np.array([18, 56, 32, 75, 55, 55])
I have another array that corresponds to the indices I want to use to access the information in a (they will always have equal lengths). Neither array a nor array b are sorted.
b = np.array([0, 2, 3, 2, 2, 2])
I want to group a into multiple sub-arrays such that the following would be possible:
c[0] -> array([18])
c[2] -> array([56, 75, 55, 55])
c[3] -> array([32])
Although the above example is simple, I will be dealing with millions of points, so efficient methods are preferred. It is also essential later that any sub-array of points can be accessed in this fashion later in the program by automated methods.

Here's one approach -
def groupby(a, b):
# Get argsort indices, to be used to sort a and b in the next steps
sidx = b.argsort(kind='mergesort')
a_sorted = a[sidx]
b_sorted = b[sidx]
# Get the group limit indices (start, stop of groups)
cut_idx = np.flatnonzero(np.r_[True,b_sorted[1:] != b_sorted[:-1],True])
# Split input array with those start, stop ones
out = [a_sorted[i:j] for i,j in zip(cut_idx[:-1],cut_idx[1:])]
return out
A simpler, but lesser efficient approach would be to use np.split to replace the last few lines and get the output, like so -
out = np.split(a_sorted, np.flatnonzero(b_sorted[1:] != b_sorted[:-1])+1 )
Sample run -
In [38]: a
Out[38]: array([18, 56, 32, 75, 55, 55])
In [39]: b
Out[39]: array([0, 2, 3, 2, 2, 2])
In [40]: groupby(a, b)
Out[40]: [array([18]), array([56, 75, 55, 55]), array([32])]
To get sub-arrays covering the entire range of IDs in b -
def groupby_perID(a, b):
# Get argsort indices, to be used to sort a and b in the next steps
sidx = b.argsort(kind='mergesort')
a_sorted = a[sidx]
b_sorted = b[sidx]
# Get the group limit indices (start, stop of groups)
cut_idx = np.flatnonzero(np.r_[True,b_sorted[1:] != b_sorted[:-1],True])
# Create cut indices for all unique IDs in b
n = b_sorted[-1]+2
cut_idxe = np.full(n, cut_idx[-1], dtype=int)
insert_idx = b_sorted[cut_idx[:-1]]
cut_idxe[insert_idx] = cut_idx[:-1]
cut_idxe = np.minimum.accumulate(cut_idxe[::-1])[::-1]
# Split input array with those start, stop ones
out = [a_sorted[i:j] for i,j in zip(cut_idxe[:-1],cut_idxe[1:])]
return out
Sample run -
In [241]: a
Out[241]: array([18, 56, 32, 75, 55, 55])
In [242]: b
Out[242]: array([0, 2, 3, 2, 2, 2])
In [243]: groupby_perID(a, b)
Out[243]: [array([18]), array([], dtype=int64),
array([56, 75, 55, 55]), array([32])]

why does tf.argmax not getting the right values in my array

I seem to have a problem of argmax getting the right index for my array. It suppose to return a value 0 but I got a value 18. Here is an example:
>>> a = tf.constant([-0.00000000e+00, 1.31838050e-07, 7.86561927e-11,1.95077332e-09, 4.71118966e-09, 2.67971922e-10,3.62677839e-11 ,9.57063651e-10, 3.25077543e-09, 6.84045816e-08, 2.71129057e-08, 4.34358327e-10, 3.01831915e-09, 6.50069998e-09,1.40559550e-10, 4.57989238e-08, 1.42130885e-08, 9.68442881e-10, 8.28957923e-07,6.10620265e-09, 2.63989475e-09])
>>> a.eval()
array([ -0.00000000e+00, 1.31838050e-07, 7.86561927e-11,
1.95077332e-09, 4.71118966e-09, 2.67971922e-10,
3.62677839e-11, 9.57063651e-10, 3.25077543e-09,
6.84045816e-08, 2.71129057e-08, 4.34358327e-10,
3.01831915e-09, 6.50069998e-09, 1.40559550e-10,
4.57989238e-08, 1.42130885e-08, 9.68442881e-10,
8.28957923e-07, 6.10620265e-09, 2.63989475e-09], dtype=float32)
>>> b = tf.argmax(a,0)
>>> b.eval()
>>> 18

a[18]=8.2895792e-07 > a[0]=0
There is no problem, a[18] is the max value in your array, all your numbers are positive...

Numpy : resize array

I have two Numpy array whose size is 994 and 1000. As such I when I am doing the below operation:
X * Y
I get error that "ValueError: operands could not be broadcast together with shapes (994) (1000)"
Hence as per fix I am trying to pad extras / trailing zeros to the array which great size by below method:
padzero = 0
if(bw.size > w.size):
padzero = bw.size - w.size
w = np.pad(w,padzero, 'constant', constant_values=0)
if(bw.size < w.size):
padzero = w.size - bw.size
bw = np.pad(bw,padzero, 'constant', constant_values=0)
But now the issue comes that if the size difference is 6 then 12 0's are getting padded in the array - which exactly should be six in my case.
I tried many ways to achieve this but its not resulting to resolve the issue. If I try he below way:
bw = np.pad(bw,padzero/2, 'constant', constant_values=0)
ValueError: Unable to create correctly shaped tuple from 3.0
How can I fix the issue?

a = np.array([1, 2, 3])
To insert zeros front:
np.pad(a,(2,0),'constant', constant_values=0)
array([0, 0, 1, 2, 3])
To insert zeros back:
np.pad(a,(0,2),'constant', constant_values=0)
array([1, 2, 3, 0, 0])
Front and back:
np.pad(a,(1,1),'constant', constant_values=0)
array([0, 1, 2, 3, 0])

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

numpy get top k elements from last dimension of ndarray - numpy

This could solve your problem. np.sort(x, axis=len(x.shape)-1)[...,-2:]

np.partition(x, 2)[..., -2:] returns 2 largest elements from each row. E.g., x = np.random.random_integers(0, 100, size=(2,1,1,5)) print(x) print(np.partition(x, 2)[..., -2:]) prints something like [[[[79 34 90 80 56]]] [[[78 11 24 20 42]]]] [[[[80 90]]] [[[78 42]]]]

Related

find the array index which its element is most near greater than a value

Finding those elements in an array which are "close"

Group numpy into multiple sub-arrays using an array of values

why does tf.argmax not getting the right values in my array

Numpy : resize array

Categories

Resources