How to Reverse One Hot Encoded Values of my model predictions [duplicate] - pandas

I have a list of label names which I enumerated to create a dictionary:
my_list = [b'airplane',
b'automobile',
b'bird',
b'cat',
b'deer',
b'dog',
b'frog',
b'horse',
b'ship',
b'truck']
label_dict = dict(enumerate(my_list))
{0: b'airplane',
1: b'automobile',
2: b'bird',
3: b'cat',
4: b'deer',
5: b'dog',
6: b'frog',
7: b'horse',
8: b'ship',
9: b'truck'}
Now I'm trying to cleanly map/apply the dict values to my target, which is in one-hot-encoded form.
y_test[0]
array([ 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.])
y_test[0].map(label_dict) should return:
'cat'
I was playing around with
(lambda key,value: value for y_test[0] == 1)
but couldn't come up with anything concrete.
Thank you.

Since we are working with a one-hot encoded array, argmax can be used to get the index of the single 1 in each row. Thus, using the list as input -
[my_list[i] for i in y_test.argmax(1)]
Or with np.take to have array output -
np.take(my_list,y_test.argmax(1))
To work with the dict, assuming sequential keys 0, 1, ..., we could use the following (on Python 3 the dict views must be wrapped in list() first) -
np.take(list(label_dict.values()), y_test.argmax(1))
If the keys are not necessarily sequential but are sorted -
np.take(list(label_dict.values()), np.searchsorted(list(label_dict.keys()), y_test.argmax(1)))
Sample run -
In [79]: my_list
Out[79]:
['airplane',
'automobile',
'bird',
'cat',
'deer',
'dog',
'frog',
'horse',
'ship',
'truck']
In [80]: y_test
Out[80]:
array([[ 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])
In [81]: [my_list[i] for i in y_test.argmax(1)]
Out[81]: ['cat', 'automobile', 'ship']
In [82]: np.take(my_list,y_test.argmax(1))
Out[82]:
array(['cat', 'automobile', 'ship'],
dtype='|S10')
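The labels in the question are bytes (b'cat'), while the desired result is the string 'cat', so a decoding step may also be needed. Here is a minimal sketch, assuming the labels are ASCII/UTF-8 bytes and every row of y_test contains exactly one 1. The names idx and decoded are introduced here for illustration only.

```python
import numpy as np

my_list = [b'airplane', b'automobile', b'bird', b'cat', b'deer',
           b'dog', b'frog', b'horse', b'ship', b'truck']
label_dict = dict(enumerate(my_list))

y_test = np.array([[0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
                   [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
                   [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.]])

# argmax along axis 1 gives the column index of the 1 in each row
idx = y_test.argmax(axis=1)

# Look each index up in the dict and decode bytes -> str
# (assumption: the labels are UTF-8 encoded)
decoded = [label_dict[int(i)].decode('utf-8') for i in idx]
print(decoded)  # ['cat', 'automobile', 'ship']
```

For a single row, label_dict[int(y_test[0].argmax())].decode('utf-8') gives 'cat'.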

We can use a dot product to reverse one-hot encoding, if it really is ONE-hot.
Let's start by factorizing your list
f, u = pd.factorize(my_list)
Now, if you have an array that you'd like to map back to your strings
a = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
Then use dot
a.dot(u)
'cat'
Now assume
y_test = np.array([
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
])
Then
y_test.dot(u)
array(['cat', 'automobile', 'ship'], dtype=object)
If it isn't one-hot but instead multi-hot, you could join with commas
y_test = np.array([
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0, 0, 0, 1, 0]
])
[', '.join(u[y.astype(bool)]) for y in y_test]
['cat', 'automobile, truck', 'bird, ship']
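Because the dot-product and argmax approaches both rely on each row being truly one-hot, it can help to check that first. The following is a rough sketch of such a check. The helper name is_one_hot is hypothetical and not part of NumPy or pandas.

```python
import numpy as np

def is_one_hot(y):
    """Return True if every row of y holds exactly one 1 and zeros elsewhere."""
    y = np.asarray(y)
    only_zeros_and_ones = np.isin(y, (0, 1)).all()
    one_per_row = (y.sum(axis=1) == 1).all()
    return bool(only_zeros_and_ones and one_per_row)

one_hot = np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
                    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])

multi_hot = np.array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
                      [0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
                      [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]])

print(is_one_hot(one_hot))    # True
print(is_one_hot(multi_hot))  # False
```

If the check fails, the comma-joined multi-hot variant above is probably the safer choice.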

Related

One line to replace values with their row indices?

I have a numpy array
a = np.array([[1,0,0,1,0],
[0,1,0,0,0],
[0,0,1,0,1]])
I would like to replace every positive element of this array with its row index + 1. So the final result would be:
a = np.array([[1,0,0,1,0],
[0,2,0,0,0],
[0,0,3,0,3]])
Can I do this with a simple numpy command (without looping)?
Use numpy.arange
(a != 0) * np.reshape(np.arange(a.shape[0])+1, (-1, 1))
Output:
array([[1, 0, 0, 1, 0],
[0, 2, 0, 0, 0],
[0, 0, 3, 0, 3]])
Works on any array:
a2 = np.array([[1,0,0,-1,0],
[0,20,0,0,0],
[0,0,-300,0,30]])
(a2 != 0) * np.reshape(np.arange(a2.shape[0])+1, (-1, 1))
Output:
array([[1, 0, 0, 1, 0],
[0, 2, 0, 0, 0],
[0, 0, 3, 0, 3]])
Not sure if this is the proper numpy way, but you could use enumerate and multiply the sub-arrays by their indices:
>>> np.array([x * i for i, x in enumerate(a, start=1)])
array([[1, 0, 0, 1, 0],
[0, 2, 0, 0, 0],
[0, 0, 3, 0, 3]])
Note that this only works properly if "every positive element" is actually 1, as in your example, and if there are no negative elements. Alternatively, you can use a > 0 to first get an array with True (i.e. 1) in every place where a is > 0 and False (i.e. 0) otherwise.
>>> a = np.array([[ 1, 0, 0, 2, 0],
... [ 0, 3, 0, 0,-8],
... [-3, 0, 4, 0, 5]])
>>> np.array([x * i for i, x in enumerate(a > 0, start=1)])
array([[1, 0, 0, 1, 0],
[0, 2, 0, 0, 0],
[0, 0, 3, 0, 3]])
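The a > 0 mask can also be combined with broadcasting, so that no Python-level loop is needed. The sketch below reuses the example array above. The names row_numbers, masked and kept are illustrative only. It also shows an np.where variant for the case where non-positive entries should stay as they are; that case is an assumption, not something the question asks for.

```python
import numpy as np

a = np.array([[ 1, 0, 0, 2,  0],
              [ 0, 3, 0, 0, -8],
              [-3, 0, 4, 0,  5]])

# Column vector [[1], [2], [3]]: the 1-based row number of each row
row_numbers = np.arange(1, a.shape[0] + 1)[:, None]

# Variant 1: positive entries become the row number, everything else 0
masked = (a > 0) * row_numbers
print(masked)
# [[1 0 0 1 0]
#  [0 2 0 0 0]
#  [0 0 3 0 3]]

# Variant 2 (assumption: non-positive entries should be left unchanged)
kept = np.where(a > 0, row_numbers, a)
print(kept)
# [[ 1  0  0  1  0]
#  [ 0  2  0  0 -8]
#  [-3  0  3  0  3]]
```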

Sparse Matrix One 1 for Every Row

I would like to generate a random M x N matrix where every row has just a single one in a random position.
For example, I would like a matrix like this:
Out[3]:
array([[1, 0, 0],
[0, 1, 0],
[0, 1, 0],
[1, 0, 0],
[0, 0, 1]])
I tried with
M = 5
N = 3
arr = np.array([1] + [0] * (N-1))
arr = np.tile(arr,(M,1))
np.random.shuffle(arr)
But it gives:
Out[75]:
array([[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0]])
There may be a more elegant way to do this, but this works:
def randOne():
    M = 5
    N = 3
    arr = np.zeros((M, N))
    for row in range(M):
        arr[row, np.random.randint(N)] = 1
    return arr
>>> randOne()
array([[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 1., 0., 0.]])
OR
Yup, there is a more elegant way to do this ;)
def randOne2(M=5, N=3):
    arr = np.zeros((M, N), dtype=np.int8)
    arr[np.arange(M), np.random.randint(0, N, M)] = 1
    return arr
>>> randOne2()
array([[0, 0, 1],
[1, 0, 0],
[1, 0, 0],
[0, 1, 0],
[1, 0, 0]], dtype=int8)
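The attempt in the question returns identical rows because np.random.shuffle only reorders along the first axis, that is, it permutes whole rows, and all rows produced by np.tile were the same. Another option is to pick random rows of an identity matrix. This is a minimal sketch of that idea; the function name rand_one_hot and its seed parameter are made up for this example.

```python
import numpy as np

def rand_one_hot(M=5, N=3, seed=None):
    """Return an M x N int8 matrix with a single 1 per row at a random column."""
    rng = np.random.default_rng(seed)
    cols = rng.integers(0, N, size=M)    # one random column index per row
    return np.eye(N, dtype=np.int8)[cols]  # row i is the cols[i]-th unit vector

arr = rand_one_hot(5, 3, seed=0)
print(arr.shape)        # (5, 3)
print(arr.sum(axis=1))  # [1 1 1 1 1]
```

Indexing np.eye(N) with an array of column indices selects the matching unit vectors, so each output row has exactly one 1 by construction.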

Replace for loop with numpy operations for diagonal band

Is it possible to generate the following array without explicit loop?
nrows, ncols = 5, 3
d = np.zeros((nrows, nrows * ncols), dtype=np.uint8)
for i in range(nrows):
    d[i][i * ncols:(i + 1) * ncols] = 1
print(d)
[[1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 1 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1 1 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 1 1 1]]
Using np.eye + np.repeat:
np.repeat(np.eye(nrows), ncols, axis=1)
array([[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.]])
You can use np.einsum:
nrows, ncols = 5, 3
out = np.zeros((nrows, nrows*ncols), 'u1')
np.einsum('iik->ik', out.reshape(nrows, nrows, ncols))[...] = 1
out
# array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]], dtype=uint8)
How about:
import numpy as np
ix = np.indices((5,15))
d = (ix[1] //3 == ix[0]).astype(int)
d
>> array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]])
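A Kronecker product is one more way to express the same block pattern: each 1 on the diagonal of the identity matrix is expanded into a 1 x ncols block of ones. The sketch below is offered as a possible alternative and compares the result against the loop from the question. The names d_loop and d_kron are illustrative.

```python
import numpy as np

nrows, ncols = 5, 3

# Reference result from the explicit loop in the question
d_loop = np.zeros((nrows, nrows * ncols), dtype=np.uint8)
for i in range(nrows):
    d_loop[i, i * ncols:(i + 1) * ncols] = 1

# Kronecker product: every entry of eye(nrows) is scaled into a 1 x ncols block
d_kron = np.kron(np.eye(nrows, dtype=np.uint8),
                 np.ones((1, ncols), dtype=np.uint8))

print(d_kron.shape)                    # (5, 15)
print(np.array_equal(d_loop, d_kron))  # True
```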

How to manipulate indices in tensorflow?

I am trying to use tf.gather_nd to convert
R = tf.eye(3, batch_shape=[4])
to :
array([[[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.]],
[[0., 0., 1.],
[0., 1., 0.],
[1., 0., 0.]],
[[0., 1., 0.],
[0., 0., 1.],
[1., 0., 0.]],
[[1., 0., 0.],
[0., 0., 1.],
[0., 1., 0.]]], dtype=float32)
With the index:
ind = array([[0, 2, 1],
[2, 1, 0],
[1, 2, 0],
[0, 2, 1]], dtype=int32)
I found that if I can convert the index matrix to something like:
ind_c = np.array([[[0, 0], [0, 2], [0, 1]],
[[1, 2], [1, 1], [1, 0]],
[[2, 1], [2, 2], [2, 0]],
[[3, 0], [3, 2], [3, 1]]])
gather_nd will do the job. So my questions are:
Is there a better way than converting the index ind to ind_c?
If this is the only way, how can I convert ind to ind_c with TensorFlow? (For now I have done this manually.)
Thanks
You can try the following:
ind = tf.constant([[0, 2, 1],[2, 1, 0],[1, 2, 0],[0, 2, 1]], dtype=tf.int32)
# Creates the row indices matrix
row = tf.tile(tf.expand_dims(tf.range(tf.shape(ind)[0]), 1), [1, tf.shape(ind)[1]])
# Concat to the ind to form the index matrix
ind_c = tf.concat([tf.expand_dims(row,-1), tf.expand_dims(ind, -1)], axis=2)
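To sanity-check the index construction without a TensorFlow session, the same steps can be mirrored in NumPy. This is only an analogue under the assumption that NumPy fancy indexing with the (batch, row) pairs behaves like gather_nd on those pairs; it is not the TensorFlow code itself. TensorFlow's tf.gather also accepts a batch_dims argument that may make ind_c unnecessary, but that route is not shown here.

```python
import numpy as np

# NumPy stand-in for R = tf.eye(3, batch_shape=[4])
R = np.tile(np.eye(3, dtype=np.float32), (4, 1, 1))

ind = np.array([[0, 2, 1],
                [2, 1, 0],
                [1, 2, 0],
                [0, 2, 1]], dtype=np.int32)

# Batch index of every entry, same shape as ind (mirrors the tf.tile step)
row = np.broadcast_to(np.arange(ind.shape[0])[:, None], ind.shape)

# Pair each batch index with the row to pick (mirrors the tf.concat step)
ind_c = np.stack([row, ind], axis=-1)   # shape (4, 3, 2)

# Equivalent of gathering R[batch, row_index] for every pair in ind_c
out = R[ind_c[..., 0], ind_c[..., 1]]   # shape (4, 3, 3)
print(out[0])
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```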

how to understand the output of tf.nn.top_k() from tensorflow

I used the tf.nn.top_k() function from TensorFlow on my model's softmax probabilities, with 5 new images and k=5, to visualize the certainty of its predictions. I got the following output, which I am not sure how to interpret. Could anyone explain it, please?
TopKV2(values=array([[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.],
[ 1., 0., 0., 0., 0.]], dtype=float32), indices=array([[13, 0, 1, 2, 3],
[13, 0, 1, 2, 3],
[13, 0, 1, 2, 3],
[26, 0, 1, 2, 3],
[13, 0, 1, 2, 3]], dtype=int32))
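From the documentation, it returns two tensors: the first holds the top k values and the second holds the indices of those values in the original tensor.
So for your data, what I see is that the original tensor is always one-hot (i.e. has a single 1.0 entry per row and is 0 everywhere else). The 1.0 sits at index 13 (26 for the fourth image), and the remaining four slots are zeros taken from indices 0 to 3, because ties are resolved in favor of the lower index.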
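The shape of that output can be reproduced with a small NumPy imitation of top-k selection. This is a sketch only, not TensorFlow's implementation. The helper top_k and the class count of 30 are hypothetical, and a stable sort is used so that tied values keep the lower index first.

```python
import numpy as np

def top_k(probs, k):
    """Return the k largest values per row and their column indices."""
    probs = np.asarray(probs)
    # Stable sort on negated values: descending order, ties keep lower index first
    order = np.argsort(-probs, axis=1, kind='stable')[:, :k]
    values = np.take_along_axis(probs, order, axis=1)
    return values, order

n_classes = 30  # hypothetical number of classes
probs = np.zeros((2, n_classes), dtype=np.float32)
probs[0, 13] = 1.0  # the model is fully certain of class 13
probs[1, 26] = 1.0  # the model is fully certain of class 26

values, indices = top_k(probs, k=5)
print(values)
# [[1. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0.]]
print(indices)
# [[13  0  1  2  3]
#  [26  0  1  2  3]]
```

Read this way, each row says that the full probability mass went to one class, and the other four entries are just the first zero-probability classes filling out k=5.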