TensorFlow indicator matrix for top n values

Does anyone know how to extract the top n largest values per row of a rank 2 tensor?
For instance, if I wanted the top 2 values of a tensor of shape [2,4] with values:
[[40, 30, 20, 10], [10, 20, 30, 40]]
The desired condition matrix would look like:
[[True, True, False, False],[False, False, True, True]]
Once I have the condition matrix, I can use tf.select to choose actual values.
Thank you for assistance!

You can do it using built-in tf.nn.top_k function:
import tensorflow as tf  # TensorFlow 1.x (graph mode)
sess = tf.Session()
a = tf.convert_to_tensor([[40, 30, 20, 10], [10, 20, 30, 40]])
b = tf.nn.top_k(a, 2)
print(sess.run(b))
TopKV2(values=array([[40, 30],
[40, 30]], dtype=int32), indices=array([[0, 1],
[3, 2]], dtype=int32))
print(sess.run(b).values)
array([[40, 30],
[40, 30]], dtype=int32)
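To get boolean True/False values, you can first get the k-th largest value of each row and then use tf.greater_equal. The minimum has to be taken per row (axis=1); a global minimum would mark too many entries whenever the rows have different ranges:
kth = tf.reduce_min(b.values, axis=1, keepdims=True)  # shape [2, 1]; the argument is keep_dims in older TF 1.x releases
top2 = tf.greater_equal(a, kth)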
print(sess.run(top2))
array([[ True, True, False, False],
[False, False, True, True]], dtype=bool)

You can also use tf.contrib.framework.argsort:
a = [[40, 30, 20, 10], [10, 20, 30, 40]]
idx = tf.contrib.framework.argsort(a, direction='DESCENDING') # sorted indices
ranks = tf.contrib.framework.argsort(idx, direction='ASCENDING') # ranks
b = ranks < 2
# [[ True True False False] [False False True True]]
Moreover, you can replace 2 with a 1d tensor so that each row/column can have different n values.
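The same rank trick can be sketched in NumPy, which makes the per-row n idea easy to check without a session. This is a NumPy analogue, not the TensorFlow code above; the array n_per_row is a made-up example, and ties are broken by position because a stable sort is used.

```python
import numpy as np

a = np.array([[40, 30, 20, 10], [10, 20, 30, 40]])
n_per_row = np.array([1, 3])  # hypothetical: top 1 in row 0, top 3 in row 1

# Indices that sort each row in descending order (negate the values).
idx = np.argsort(-a, axis=1, kind="stable")
# Argsort of the sort order gives each element's rank (0 = largest).
ranks = np.argsort(idx, axis=1, kind="stable")
# Broadcast the per-row n against the ranks to get the indicator matrix.
mask = ranks < n_per_row[:, None]
print(mask)
```

With a scalar in place of n_per_row[:, None] this reduces to the ranks < 2 comparison shown above.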

Related

Descending sorting in numpy by several columns [duplicate]

I have a NumPy array and need to sort it by two columns (first by column 0, then breaking ties by column 1), both in descending order. When I sort sequentially by column 1 and then by column 0, rows that are equal in column 0 end up in ascending order of column 1.
My array:
arr = np.array([
[150, 8],
[105, 20],
[90, 100],
[101, 12],
[110, 80],
[105, 100],
])
When I sort twice (by column 1 and column 0):
arr = arr[arr[:,1].argsort(kind='stable')[::-1]]
arr = arr[arr[:,0].argsort(kind='stable')[::-1]]
I have this result (where rows 2 and 3 are swapped):
array([[150, 8],
[110, 80],
[105, 20],
[105, 100],
[101, 12],
[ 90, 100]])
As far as I understand, it happens because stable mode preserves the original order for equal values, but when we flip the indices to make the order descend, the original order changes too.
The results I'd like to have:
array([[150, 8],
[110, 80],
[105, 100],
[105, 20],
[101, 12],
[ 90, 100]])
Use numpy.lexsort to sort on multiple columns at the same time.
arr = np.array([
[150, 8],
[105, 20],
[90, 100],
[101, 12],
[110, 80],
[105, 100],
])
order = np.lexsort([arr[:, 1], arr[:, 0]])[::-1]
arr[order]
yields:
array([[150, 8],
[110, 80],
[105, 100],
[105, 20],
[101, 12],
[ 90, 100]])
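If the columns are numeric, another option is to negate the keys instead of reversing the result, which also allows mixing ascending and descending columns. This is only a sketch and assumes signed integer or float data; with unsigned integers the negation would wrap around.

```python
import numpy as np

arr = np.array([
    [150, 8],
    [105, 20],
    [90, 100],
    [101, 12],
    [110, 80],
    [105, 100],
])

# lexsort treats the LAST key as the primary key, so column 0 goes last.
# Negating a key turns the ascending sort into a descending one.
order = np.lexsort((-arr[:, 1], -arr[:, 0]))
result = arr[order]
print(result)
```

Dropping the minus sign on one key would sort that column in ascending order while keeping the other descending.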

numpy unique over multiple arrays

Numpy.unique expects a 1-D array. If the input is not a 1-D array, it flattens it by default.
Is there a way for it to accept multiple arrays? To keep it simple, let's just say a pair of arrays, and we are unique-ing the pair of elements across the 2 arrays.
For example, say I have 2 numpy array as inputs
a = [1, 2, 3, 3]
b = [10, 20, 30, 31]
I'm unique-ing against both of these arrays, so against these 4 pairs (1,10), (2,20) (3, 30), and (3,31). These 4 are all unique, so I want my result to say
[True, True, True, True]
If instead the inputs are as follows
a = [1, 2, 3, 3]
b = [10, 20, 30, 30]
Then the last 2 elements are not unique. So the output should be
[True, True, True, False]
You could use the first-occurrence indices that numpy.unique() returns when called with return_index=True:
In [243]: def is_unique(*lsts):
     ...:     arr = np.vstack(lsts)
     ...:     _, ind = np.unique(arr, axis=1, return_index=True)
     ...:     out = np.zeros(shape=arr.shape[1], dtype=bool)
     ...:     out[ind] = True
     ...:     return out
In [244]: a = [1, 2, 2, 3, 3]
In [245]: b = [1, 2, 2, 3, 3]
In [246]: c = [1, 2, 0, 3, 3]
In [247]: is_unique(a, b)
Out[247]: array([ True, True, False, True, False])
In [248]: is_unique(a, b, c)
Out[248]: array([ True, True, True, True, False])

Can I create a view from a boolean selection of a numpy array?

If I create a numpy array, and another to serve as a selective index into it:
>>> x
array([[ 2, 3, 4],
[ 5, 6, 7],
[ 6, 7, 8],
[11, 12, 13]])
>>> nz
array([ True, True, False, True], dtype=bool)
then direct use of nz returns a view of the original array:
>>> x[nz,:]
array([[ 2, 3, 4],
[ 5, 6, 7],
[11, 12, 13]])
>>> x[nz,:] += 2
>>> x
array([[ 4, 5, 6],
[ 7, 8, 9],
[ 6, 7, 8],
[13, 14, 15]])
however, naturally, an assignment makes a copy:
>>> v = x[nz,:]
Any operation on v is on the copy, and has no effect on the original array.
Is there any way to create a named view, from x[nz,:], simply to abbreviate code, or which I can pass around, so operations on the named view will affect only the selected elements of x?
Numpy has masked_array, which might be what you are looking for:
import numpy as np
x = np.asarray([[ 2, 3, 4],[ 5, 6, 7],[ 6, 7, 8],[11, 12, 13]])
nz = np.asarray([ True, True, False, True], dtype=bool)
mx = np.ma.masked_array(x, ~nz.repeat(3)) # True means masked, so "~" is needed
mx += 2
# x changed as well because it is the base of mx
print(x)
print(x is mx.base)
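For context, boolean indexing itself always produces a copy; the in-place form x[nz,:] += 2 only appears to act on a view because NumPy writes the result back through item assignment. A small sketch of the difference, reusing the arrays from the question:

```python
import numpy as np

x = np.array([[2, 3, 4], [5, 6, 7], [6, 7, 8], [11, 12, 13]])
nz = np.array([True, True, False, True])

v = x[nz, :]                     # boolean (advanced) indexing returns a copy
shares = np.shares_memory(v, x)  # False: v does not alias x
v += 100                         # changes only the copy

x[nz, :] += 2                    # read, add, then write back into x
print(shares)
print(x)
```

So without something like a masked array, the usual pattern is to pass x and nz around together and apply the operation as x[nz, :] op= value.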

How to use the 'where' option in numpy.multiply?

I need to multiply an array (NIR) with a scalar (f) but leaving some values that meet a certain condition intact.
I tried the following:
NIR_f = np.multiply(NIR,f,where=NIR!=-28672.0)
To check I made:
i,j=1119,753
NIR[i][j],NIR_f[i][j]
and I got this:
(-28672.0, 10058.0)
It is assumed that both results should be the same! In that position the condition is not met, therefore the value should remain intact.
Am I using the "where" option wrongly?
Without your array, or a smaller substitute, I can't exactly replicate your problem. But there are potentially two issues:
Float comparison is not exact, so the test might match one -28672.0 and not another.
The "remain intact" assumption is tricky. where leaves the output value alone wherever the condition is False, but what was that value originally: zeros, the NIR values, or something else?
Using an integer array to avoid the float issue:
In [20]: arr = np.arange(12).reshape(3,4)
In [21]: arr
Out[21]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [22]: np.multiply(arr, 10, where=arr!=10)
Out[22]:
array([[ 0, 10, 20, 30],
[ 40, 50, 60, 70],
[ 80, 90, 481036337249, 110]])
In [24]: np.multiply(arr, 10, where=arr!=10)
Out[24]:
array([[ 0, 10, 20, 30],
[ 40, 50, 60, 70],
[ 80, 90, 0, 110]])
The result at [2,2] is arbitrary. In effect the ufunc started with an np.empty array of the right shape and dtype, and filled every value but that one with the product. To use where correctly we need to specify an out parameter as well.
In [25]: out = np.full(arr.shape,-1)
In [26]: out
Out[26]:
array([[-1, -1, -1, -1],
[-1, -1, -1, -1],
[-1, -1, -1, -1]])
In [27]: np.multiply(arr, 10, where=arr!=10, out=out)
Out[27]:
array([[ 0, 10, 20, 30],
[ 40, 50, 60, 70],
[ 80, 90, -1, 110]])
The issue of inexact floats comes up often enough that I won't try to illustrate that.
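Applied to the original question, one way to make the skipped positions keep their input values is to pass a copy of the input as out. The small NIR array and the factor f below are made-up stand-ins for the real data:

```python
import numpy as np

# Hypothetical stand-in for the real NIR raster; -28672.0 is the fill value.
NIR = np.array([[100.0, -28672.0], [250.0, 400.0]])
f = 0.5

# out starts as a copy of NIR, so positions where the condition is False
# keep their original value instead of uninitialized memory.
NIR_f = np.multiply(NIR, f, where=NIR != -28672.0, out=NIR.copy())
print(NIR_f)
```

Passing out=NIR instead of a copy would do the same thing in place.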

Use an ufunc analogous to numpy.where

For example, if I want to add conditionally, I can use:
y = numpy.where(condition, a+b, b)
Is there a way to directly combine an ufunc and where? Something like:
y = numpy.add.where(condition, a, b)
Something along that line is add.at.
In [21]: b = np.arange(10)
In [22]: cond = b%3==0
Your where:
In [24]: np.where(cond, 10+b, b)
Out[24]: array([10, 1, 2, 13, 4, 5, 16, 7, 8, 19])
Use the other where (or np.nonzero) to turn the boolean mask into an index tuple:
In [25]: cond
Out[25]: array([ True, False, False, True, False, False, True, False, False, True], dtype=bool)
In [26]: idx = np.where(cond)
In [27]: idx
Out[27]: (array([0, 3, 6, 9], dtype=int32),)
add.at does inplace, unbuffered addition:
In [28]: np.add.at(b,idx[0],10)
In [29]: b
Out[29]: array([10, 1, 2, 13, 4, 5, 16, 7, 8, 19])
add.at is intended as a way of getting around buffering problems with the more direct index +=:
In [30]: b = np.arange(10)
In [31]: b[idx[0]] += 10
In [32]: b
Out[32]: array([10, 1, 2, 13, 4, 5, 16, 7, 8, 19])
Here the action is the same (add.at is slower). But if there were duplicates in idx the results will be different.
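A short sketch of the duplicate case, using a made-up index array in which 0 appears twice:

```python
import numpy as np

idx = np.array([0, 0, 3])  # hypothetical indices; 0 is repeated

b1 = np.zeros(5, dtype=int)
b1[idx] += 10              # buffered: the repeated index is applied once

b2 = np.zeros(5, dtype=int)
np.add.at(b2, idx, 10)     # unbuffered: every occurrence is accumulated

print(b1)
print(b2)
```

With += the element at index 0 ends up as 10, while add.at accumulates both occurrences and gives 20.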
+= also works with the boolean mask:
In [33]: b[cond] -= 10
In [34]: b
Out[34]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
There's got to be a ufunc equivalent to the += operator, but I don't use ufunc enough to know off hand.
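One candidate for that equivalent is the where parameter that ufuncs accept, combined with out. The sketch below assumes a NumPy version that supports where on ufunc calls; the variable names follow the session above.

```python
import numpy as np

b = np.arange(10)
cond = b % 3 == 0

# New array: start from a copy of b and add 10 only where cond is True.
y = np.add(b, 10, where=cond, out=b.copy())

# In place: the ufunc form of b[cond] += 10.
np.add(b, 10, out=b, where=cond)

print(y)
print(b)
```

As with np.multiply, leaving out the out argument would leave the positions where cond is False uninitialized.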