How to encode multi-label representation using index? - pandas

I want to encode [[1, 2], [4]] as
[[0, 1, 1, 0, 0],
 [0, 0, 0, 0, 1]]
but sklearn.preprocessing.MultiLabelBinarizer only gives
[[1, 1, 0],
 [0, 0, 1]]
Does anyone know how to do this with a NumPy, pandas, or sklearn built-in function?

MultiLabelBinarizer only knows about the labels you pass to it. When it sees only 3 distinct classes, it creates only 3 columns.
Set the classes parameter to the full list of classes you expect in your dataset, in the order you want the columns to appear:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=[0,1,2,3,4])
mlb.fit_transform([[1, 2], [4]])
#Output
array([[0, 1, 1, 0, 0],
[0, 0, 0, 0, 1]])
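Since the question also asks about plain NumPy, here is a minimal sketch that does the same thing with fancy indexing. It assumes every label is a non-negative int smaller than n_classes, so the label value itself is the column index; multi_hot is a hypothetical helper name, not a library function.

```python
import numpy as np

def multi_hot(label_lists, n_classes):
    """Encode lists of integer class indices as an (n_rows, n_classes) 0/1 array.

    Assumes every label is an int in range(n_classes): the label value
    is used directly as the column index.
    """
    out = np.zeros((len(label_lists), n_classes), dtype=int)
    for i, labels in enumerate(label_lists):
        out[i, list(labels)] = 1  # fancy indexing sets all listed columns of row i
    return out

print(multi_hot([[1, 2], [4]], n_classes=5))
```

Unlike MultiLabelBinarizer, this never learns a class list, so there is no fit step; the trade-off is that labels must already be integer positions.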

Related

find numpy rows that are the same

I have a NumPy array. How can I find which of its rows are the same, and how many times each one appears in the matrix?
Dummy example:
A=np.array([[0, 1, 0, 1],[0, 0, 0, 0],[0, 1, 1, 1],[0, 0, 0, 0]])
You can use numpy.unique with axis=0 and return_counts=True:
np.unique(A, axis=0, return_counts=True)
Output:
(array([[0, 0, 0, 0],
[0, 1, 0, 1],
[0, 1, 1, 1]]),
array([2, 1, 1]))
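The counts say how often each distinct row occurs, but not where. A sketch, assuming you also want the row indices of each repeated row, adds return_inverse=True, which gives the position of every input row within the unique rows; duplicate_rows is a hypothetical helper.

```python
import numpy as np

A = np.array([[0, 1, 0, 1], [0, 0, 0, 0], [0, 1, 1, 1], [0, 0, 0, 0]])

def duplicate_rows(a):
    """Map each row value that occurs more than once to the indices where it occurs."""
    rows, inverse, counts = np.unique(a, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # make sure there is exactly one group id per row of a
    return {tuple(rows[g].tolist()): np.flatnonzero(inverse == g).tolist()
            for g in np.flatnonzero(counts > 1)}

print(duplicate_rows(A))  # {(0, 0, 0, 0): [1, 3]}
```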

What is the purpose of rotating filters while building convolutions with scipy signal?

I recently came across a bit of Python code (shown below) that performs a 2D convolution with scipy.signal.
import numpy as np
from scipy import signal
x = np.array([[1, 1, 1, 0, 0],
[0, 1, 1, 1, 0],
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 0],
[0, 1, 1, 0, 0]],
dtype='float')
w_k = np.array([[1, 0, 1],
[0, 1, 0],
[1, 0, 1],],
dtype='float')
w_k = np.rot90(w_k, 2)
f = signal.convolve2d(x, w_k, 'valid')
Right before the convolve2d call, the filter is rotated by 180 degrees with np.rot90(w_k, 2). What is the purpose of that?
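One way to see what the rotation does: scipy.signal.convolve2d computes a true convolution, which flips the kernel by 180 degrees internally. Rotating the kernel first cancels that flip, so the result equals sliding the kernel over the image unflipped (cross-correlation, which is what many deep-learning texts call "convolution"). The sketch below checks this numerically. The kernel k is a hypothetical asymmetric example, because the w_k in the question is unchanged by a 180-degree rotation, so for that particular filter the rot90 call makes no difference; sliding_dot is a hypothetical helper.

```python
import numpy as np
from scipy import signal

x = np.array([[1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]], dtype=float)
w_k = np.array([[1, 0, 1],
                [0, 1, 0],
                [1, 0, 1]], dtype=float)

# The question's filter is symmetric under a 180-degree rotation.
print(np.array_equal(np.rot90(w_k, 2), w_k))  # True

# Hypothetical asymmetric kernel, used only to make the effect visible.
k = np.array([[1, 2, 0],
              [0, 1, 0],
              [0, 0, 3]], dtype=float)

def sliding_dot(img, ker):
    """Slide ker over img WITHOUT flipping it ('valid' positions only)."""
    kh, kw = ker.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

plain = signal.convolve2d(x, k, 'valid')                  # kernel flipped internally
rotated = signal.convolve2d(x, np.rot90(k, 2), 'valid')   # pre-rotation undoes the flip

print(np.allclose(rotated, sliding_dot(x, k)))                  # True
print(np.allclose(rotated, signal.correlate2d(x, k, 'valid')))  # True
print(np.allclose(plain, sliding_dot(x, k)))                    # False
```

In other words, convolve2d with a pre-rotated kernel is equivalent to calling signal.correlate2d with the original kernel.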

NumPy: check if a matrix can be transformed into another matrix by swapping columns

Say we have matrices A and B as follows:
>>> A
matrix([[0, 0, 0, 1],
[1, 0, 0, 0],
[1, 0, 0, 0]])
>>> B
matrix([[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 1, 0]])
Clearly we can "transform" matrix A to B by column swapping. Is there an efficient algorithm to check whether two (potentially large) matrices can be transformed to each other in this way?
Here is a simple function. For very large matrices, (A==B).all() may be slower than np.array_equal(A, B).
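import numpy as np
A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0]])
B = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 1, 0]])
def isSwaping(a, b):
    if a.shape != b.shape:
        return False
    used = set()  # indices of columns of b that have already been matched
    for c in a.T:  # columns of a (rows of its transpose)
        for j, d in enumerate(b.T):
            if j not in used and (c == d).all():
                used.add(j)  # each column of b may be matched only once
                break
        else:  # no unmatched column of b equals c, so it is unnecessary to continue
            return False
    return True
print(isSwaping(A, B))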
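The nested loop compares pairs of columns, so its cost grows quadratically with the number of columns. For large matrices, a sketch of an alternative (not from the original answer) is to put the columns of both matrices into a canonical order and compare once: if B is a column permutation of A, sorting the columns lexicographically makes them identical. The helper names below are hypothetical.

```python
import numpy as np

def sort_columns(m):
    # np.lexsort treats the LAST key as the primary one, so pass the rows
    # in reverse order to sort the columns lexicographically by row 0 first.
    return m[:, np.lexsort(m[::-1])]

def is_column_permutation(a, b):
    """True if b equals a with its columns reordered (sort-and-compare sketch)."""
    a, b = np.asarray(a), np.asarray(b)  # also turns np.matrix into a plain ndarray
    if a.shape != b.shape:
        return False
    return np.array_equal(sort_columns(a), sort_columns(b))

A = np.array([[0, 0, 0, 1], [1, 0, 0, 0], [1, 0, 0, 0]])
B = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]])
print(is_column_permutation(A, B))  # True
```

Sorting handles repeated columns correctly, because equal columns end up side by side in both matrices and must appear the same number of times for the comparison to succeed.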

TensorFlow: manipulate a labels vector into a "multiple hot" encoding

Is it possible (in a nice way, that is) to achieve the following in TensorFlow?
Assume we have a dense vector of tags:
labels = [0,3,1,2,0]
I need to make a "multiple hot" encoding of it, meaning that row i contains labels[i] ones, filled in from the right, with the remaining entries set to 0.
So the required result is
[[0, 0, 0],
[1, 1, 1],
[0, 0, 1],
[0, 1, 1],
[0, 0, 0]]
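You could do this using tf.nn.embedding_lookup. Row k of embeddings is the encoding for label k, so looking up each label picks its row:
import tensorflow as tf
embeddings = tf.constant([[0,0,0], [0,0,1], [0,1,1], [1,1,1]])
labels = [0,3,1,2,0]
encode_tensors = tf.nn.embedding_lookup(embeddings, labels)
Output of sess.run(encode_tensors) (TensorFlow 1.x session API):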
array([[0, 0, 0],
[1, 1, 1],
[0, 0, 1],
[0, 1, 1],
[0, 0, 0]], dtype=int32)
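The embeddings table above is hard-coded for width 3. As a NumPy analogue (not TensorFlow code), a sketch that builds the same table for any width n and then does the lookup with integer-array indexing might look like this; n is an assumption you pick, and it must be at least the largest label.

```python
import numpy as np

labels = np.array([0, 3, 1, 2, 0])
n = 3  # width of each encoded row; assumed to be >= max(labels)

# tril with k=-1 gives row k ones in its first k columns; flipping left-right
# moves them to the right, reproducing the hard-coded table for n = 3.
table = np.fliplr(np.tril(np.ones((n + 1, n), dtype=np.int32), -1))

# Integer-array indexing picks one table row per label, like embedding_lookup.
encoded = table[labels]
print(encoded)
```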
For completeness, it is also possible to use tf.sequence_mask:
In [397]: labels = np.array([1, 2, 0, 3, 0])
In [398]: sess.run(tf.sequence_mask(labels, 3, dtype=tf.int8))
Out[398]:
array([[1, 0, 0],
[1, 1, 0],
[0, 0, 0],
[1, 1, 1],
[0, 0, 0]], dtype=int8)
The ones in the resulting matrix are left-aligned, though, so it is mirrored left-to-right relative to what I asked for.
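The mirroring can be undone by reversing the column order. A NumPy sketch of the same idea, assuming maxlen = 3 as in the question: build the left-aligned mask that tf.sequence_mask produces by comparing a column range against the labels, then flip it. In TensorFlow the analogous flip would presumably be tf.reverse(mask, axis=[1]), which is not exercised here.

```python
import numpy as np

labels = np.array([0, 3, 1, 2, 0])
maxlen = 3

# Left-aligned mask (the tf.sequence_mask layout): row i has ones in
# columns 0 .. labels[i] - 1.
left = (np.arange(maxlen) < labels[:, None]).astype(np.int8)

# Reversing the column order moves the ones to the right, giving the
# layout requested in the question.
right = left[:, ::-1]
print(right)
```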

Numpy `Broadcast` array

I would like to do a transformation like Theano's dimshuffle using NumPy.
Example input:
np.array([[1, 0, 0], [1, 0, 0]])
Example output:
np.array([
[[1, 0, 0], [1, 0, 0]],
[[1, 0, 0], [1, 0, 0]],
[[1, 0, 0], [1, 0, 0]]
])
I don't know what dimshuffle does, but the output can be produced with np.repeat:
In [319]: np.repeat(np.array([[1, 0, 0], [1, 0, 0]])[None,:,:],3,axis=0)
Out[319]:
array([[[1, 0, 0],
[1, 0, 0]],
[[1, 0, 0],
[1, 0, 0]],
[[1, 0, 0],
[1, 0, 0]]])
The input is 2D with shape (2, 3), so I first add a leading axis with None; the output has shape (3, 2, 3). np.tile would also work, as would indexing, or even:
A=np.array([[1, 0, 0], [1, 0, 0]])
np.array([A,A,A])
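The alternatives mentioned above can be written out as follows, together with np.broadcast_to, which matches the "broadcast" in the title most directly. This is a sketch; the variable names are illustrative. Note that broadcast_to returns a read-only view that copies no data, so call .copy() on it if you need to write to the result.

```python
import numpy as np

A = np.array([[1, 0, 0], [1, 0, 0]])  # shape (2, 3)

tiled = np.tile(A, (3, 1, 1))               # reps longer than A.ndim: A is treated as (1, 2, 3)
indexed = A[None, :, :][[0, 0, 0]]          # add a leading axis, then pick index 0 three times
stacked = np.stack([A] * 3)                 # same result as np.array([A, A, A])
view = np.broadcast_to(A, (3,) + A.shape)   # read-only view, no data copied

for result in (tiled, indexed, stacked, view):
    print(result.shape, np.array_equal(result, tiled))  # (3, 2, 3) True for each
```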