tensorflow manipulate labels vector into "multiple hot encoder" - tensorflow

Is it possible (in a nice way, that is) to achieve the following functionality in TensorFlow?
Assume we have a dense vector of labels:
labels = [0,3,1,2,0]
I need to make a "multiple hot encoder" out of it: for each row, the last label entries should be 1 (the ones are right-aligned).
So the required result will be
[[0, 0, 0],
[1, 1, 1],
[0, 0, 1],
[0, 1, 1],
[0, 0, 0]]
thanks

You could do this using tf.nn.embedding_lookup, as shown here:
embeddings = tf.constant([[0,0,0], [0,0,1], [0,1,1], [1,1,1]])
labels = [0,3,1,2,0]
encode_tensors = tf.nn.embedding_lookup(embeddings,labels)
Output of sess.run(encode_tensors) :
array([[0, 0, 0],
[1, 1, 1],
[0, 0, 1],
[0, 1, 1],
[0, 0, 0]], dtype=int32)
Hope this helps!

For completeness, it's also possible to use tf.sequence_mask:
In [397]: labels = np.array([1, 2, 0, 3, 0])
In [398]: sess.run(tf.sequence_mask(labels, 3, dtype=tf.int8))
Out[398]:
array([[1, 0, 0],
[1, 1, 0],
[0, 0, 0],
[1, 1, 1],
[0, 0, 0]], dtype=int8)
Note that the ones come out left-aligned, though, so each row is reversed relative to what I asked for.
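The reversal is easy to undo by flipping each row along the last axis (in TensorFlow, tf.reverse(mask, axis=[1]) would do the same). A NumPy sketch of the whole idea, using the original labels:

```python
import numpy as np

labels = np.array([0, 3, 1, 2, 0])

# equivalent of tf.sequence_mask(labels, 3): True where column index < label
mask = (np.arange(3) < labels[:, None]).astype(np.int8)

# flip each row so the ones are right-aligned, as originally requested
flipped = mask[:, ::-1]
```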

Related

How to pad a list of NumPy arrays in a vectorized way?

I am trying to find a vectorized way (or at least, better than using a loop) to create a three-dimensional NumPy array from a list of 2D NumPy arrays. Right now, I have a list L that looks something like:
L = [ np.array([[1,2,3], [4,5,6]]), np.array([[8,9,10]]), ...]
Each NumPy array has the same size for the second dimension (in the above case, it is 3). But the first dimension has different sizes.
My goal is to create a 3D NumPy array M that incorporates the above data. I've been trying to use the np.pad() function, since I have a maximum size for the first dimension of each of my arrays, but it looks like it would only operate on the individual elements of my list. I could then do what I wanted using the function and looping over every array. However, I'd like to do this without a loop if possible, using a vectorized approach. Are there any techniques to do this?
This question is related to this one, though I'm hoping to do this over my whole list at once.
First, let's look at the common task of padding 1d arrays to a common size.
In [441]: alist = [np.ones((2,),int),np.zeros((1,),int)+2, np.zeros((3,),int)+3]
In [442]: alist
Out[442]: [array([1, 1]), array([2]), array([3, 3, 3])]
The obvious iterative approach:
In [443]: [np.hstack((arr, np.zeros((3-arr.shape[0]),int))) for arr in alist]
Out[443]: [array([1, 1, 0]), array([2, 0, 0]), array([3, 3, 3])]
In [444]: np.stack(_)
Out[444]:
array([[1, 1, 0],
[2, 0, 0],
[3, 3, 3]])
A clever alternative. It still requires an iteration to determine sizes, but the rest is whole-array "vectorization":
In [445]: sizes = [arr.shape[0] for arr in alist]
In [446]: sizes
Out[446]: [2, 1, 3]
Make the output array with the pad values:
In [448]: res = np.zeros((3,3),int)
Make a clever mask (@Divakar first proposed this):
In [449]: np.array(sizes)[:,None]>np.arange(3)
Out[449]:
array([[ True, True, False],
[ True, False, False],
[ True, True, True]])
then map the 'flattened' inputs to res:
In [450]: res[_]=np.hstack(alist)
In [451]: res
Out[451]:
array([[1, 1, 0],
[2, 0, 0],
[3, 3, 3]])
I think this process can be extended to your 2d=>3d case, but it will take a bit of work. I tried doing it directly and found myself getting lost in applying the mask, which is why I decided to lay out the 1d=>2d case first. There's enough thinking-outside-the-box here that I have to work out the details fresh each time.
2d=>3d
In [457]: a2list = [np.ones((2,3),int),np.zeros((1,3),int)+2, np.zeros((3,3),int)+3]
In [458]: [np.vstack((arr, np.zeros((3-arr.shape[0],arr.shape[1]),int))) for arr in a2list]
Out[458]:
[array([[1, 1, 1],
[1, 1, 1],
[0, 0, 0]]),
array([[2, 2, 2],
[0, 0, 0],
[0, 0, 0]]),
array([[3, 3, 3],
[3, 3, 3],
[3, 3, 3]])]
In [459]: np.stack(_)
Out[459]:
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[0, 0, 0],
[0, 0, 0]],
[[3, 3, 3],
[3, 3, 3],
[3, 3, 3]]])
Now for the 'vectorized' approach:
In [460]: sizes = [arr.shape[0] for arr in a2list]
In [461]: sizes
Out[461]: [2, 1, 3]
In [462]: np.array(sizes)[:,None]>np.arange(3)
Out[462]:
array([[ True, True, False],
[ True, False, False],
[ True, True, True]])
In [463]: res = np.zeros((3,3,3),int)
and the corresponding indices from the mask:
In [464]: I,J=np.nonzero(Out[462])
In [465]: I
Out[465]: array([0, 0, 1, 2, 2, 2])
In [466]: J
Out[466]: array([0, 1, 0, 0, 1, 2])
In [467]: res[I,J,:] = np.vstack(a2list)
In [468]: res
Out[468]:
array([[[1, 1, 1],
[1, 1, 1],
[0, 0, 0]],
[[2, 2, 2],
[0, 0, 0],
[0, 0, 0]],
[[3, 3, 3],
[3, 3, 3],
[3, 3, 3]]])
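The steps above can be wrapped into a self-contained helper (the function name is mine, not from the answer):

```python
import numpy as np

def pad_stack(arrs, pad_value=0):
    """Stack 2D arrays sharing the same second dimension into a 3D array,
    padding the first dimension of each out to the longest with pad_value."""
    sizes = np.array([a.shape[0] for a in arrs])
    n_max = sizes.max()
    out = np.full((len(arrs), n_max, arrs[0].shape[1]), pad_value,
                  dtype=arrs[0].dtype)
    # mask entry (i, j) is True when array i has a row j to contribute
    mask = sizes[:, None] > np.arange(n_max)
    # boolean indexing fills the selected rows in C order, which matches
    # the row order produced by np.vstack
    out[mask] = np.vstack(arrs)
    return out

a2list = [np.ones((2, 3), int), np.zeros((1, 3), int) + 2,
          np.zeros((3, 3), int) + 3]
res = pad_stack(a2list)
```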

What is the purpose of rotating filters while building convolutions with scipy signal?

I recently came across a bit of Python code (shown below) which does 2D convolution with scipy.signal.
x = np.array([[1, 1, 1, 0, 0],
[0, 1, 1, 1, 0],
[0, 0, 1, 1, 1],
[0, 0, 1, 1, 0],
[0, 1, 1, 0, 0]],
dtype='float')
w_k = np.array([[1, 0, 1],
[0, 1, 0],
[1, 0, 1]],
dtype='float')
w_k = np.rot90(w_k, 2)
f = signal.convolve2d(x, w_k, 'valid')
Right before the convolve2d operation, the filter was rotated. What is the purpose of that?
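A quick way to see what the rotation accomplishes: scipy's convolve2d performs true convolution, which flips the kernel internally, so pre-rotating the kernel by 180° cancels that flip and the result equals plain cross-correlation (which is what most deep-learning "convolutions" actually compute). Note that the kernel in this question happens to be 180°-symmetric, so the rotation is a no-op there; the sketch below uses an asymmetric kernel to make the effect visible:

```python
import numpy as np
from scipy import signal

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])  # asymmetric, unlike the kernel above

# convolve2d flips the kernel; rotating it 180 degrees first cancels
# the flip, leaving cross-correlation
conv_rot = signal.convolve2d(x, np.rot90(w, 2), mode='valid')
corr = signal.correlate2d(x, w, mode='valid')
```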

Tensorflow.js: tf.pad results in TypeError: t.map is not a function

This code is from the TF API docs:
let t = tf.tensor([[1, 2, 3], [4, 5, 6]])
let padding = tf.tensor([[1, 1,], [2, 2]])
When I execute it:
tf.pad(t, padding, "CONSTANT")
I get:
TypeError: t.map is not a function
I'm using the latest version of tfjs.
padding must be a plain JS array of arrays, not a tensor.
As of version 1.3.1, only the CONSTANT mode is supported. Here is the way to go:
let t = tf.tensor([[1, 2, 3], [4, 5, 6]])
let padding = [[2, 2,], [1, 1]]
tf.pad(t, padding).print()
// [[0, 0, 0, 0, 0],
// [0, 0, 0, 0, 0],
// [0, 1, 2, 3, 0],
// [0, 4, 5, 6, 0],
// [0, 0, 0, 0, 0],
// [0, 0, 0, 0, 0]]

How to encode multi-label representation using index?

I want to encode [[1, 2], [4]] to
[[0, 1, 1, 0, 0],
[0, 0, 0, 0, 1]]
while sklearn.preprocessing.MultiLabelBinarizer only gives
[[1, 1, 0],
[0, 0, 1]]
Does anyone know how to do it using NumPy, Pandas, or an sklearn built-in function?
MultiLabelBinarizer only knows about the classes you pass to it. When it sees only 3 distinct classes, it assigns only 3 columns.
You need to set the classes param to the full set of classes you expect in your dataset (in the order you want for the columns):
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=[0,1,2,3,4])
mlb.fit_transform([[1, 2], [4]])
#Output
array([[0, 1, 1, 0, 0],
[0, 0, 0, 0, 1]])
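If you'd rather skip sklearn, the same encoding is a few lines of plain NumPy (multi_hot is a hypothetical helper name, not a library function):

```python
import numpy as np

def multi_hot(index_lists, n_classes):
    # hypothetical helper: one row per sample, with a 1 at every listed index
    out = np.zeros((len(index_lists), n_classes), dtype=int)
    for i, idxs in enumerate(index_lists):
        out[i, idxs] = 1
    return out

encoded = multi_hot([[1, 2], [4]], 5)
```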

Numpy `Broadcast` array

I would like to do a transformation like Theano's dimshuffle using NumPy.
Example input:
np.array([[1, 0, 0], [1, 0, 0]])
Example output:
np.array([
[[1, 0, 0], [1, 0, 0]],
[[1, 0, 0], [1, 0, 0]],
[[1, 0, 0], [1, 0, 0]]
])
I don't know what dimshuffle does, but the output can be produced with np.repeat:
In [319]: np.repeat(np.array([[1, 0, 0], [1, 0, 0]])[None,:,:],3,axis=0)
Out[319]:
array([[[1, 0, 0],
[1, 0, 0]],
[[1, 0, 0],
[1, 0, 0]],
[[1, 0, 0],
[1, 0, 0]]])
The input is 2d (2,3), so I have to add an axis - output is (3,2,3). tile would work, so would indexing, or even:
A=np.array([[1, 0, 0], [1, 0, 0]])
np.array([A,A,A])
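Since the title mentions broadcasting: np.broadcast_to gives the same (3, 2, 3) result as a read-only, zero-copy view (call .copy() on it if you need to write to it):

```python
import numpy as np

A = np.array([[1, 0, 0], [1, 0, 0]])
# a leading axis of length 3 is prepended and A is virtually repeated,
# without allocating any new data
B = np.broadcast_to(A, (3, 2, 3))
```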