How to design the label for tensorflow's ctc loss layer - tensorflow

I just started using the CTC loss layer in TensorFlow (r1.0) and got a little confused by the "labels" input.
The TensorFlow API documentation says:
labels: An int32 SparseTensor. labels.indices[i, :] == [b, t] means labels.values[i] stores the id for (batch b, time t). labels.values[i] must take on values in [0, num_labels)
Do [b, t] and values[i] mean that there is a label values[i] at position t of sequence b in the batch?
It says the values must be in [0, num_labels), but a sparse tensor is 0 almost everywhere except at the specified positions, so I don't really know what the sparse tensor for CTC should look like.
For example, if I have a short video of a hand gesture with the label "1", should I label the output of every timestep as "1", or label only the last timestep as "1" and treat the others as "blank"?
Thanks!

To address your questions:
1. The notation in the documentation seems a bit misleading here: the output label index t need not be the same as the input time slice, it is simply the index into the output sequence. A different letter could have been used, because the input and output sequences are not explicitly aligned. Otherwise, your assertion seems correct. I give an example below.
2. Zero is a valid class in your sequence output label. The so-called blank label in TensorFlow's CTC implementation is the last (largest) class, which should probably not appear in your ground truth labels anyhow. So if you were writing a binary sequence classifier, you'd have three classes: 0 (say "off"), 1 ("on") and 2 (the "blank" output of CTC).
3. CTC loss is for labeling sequence input with sequence output. If you only have a single class label for the whole input sequence, you're probably better off using a softmax cross entropy loss on the output of the last time step of the RNN cell.
If you do end up using CTC loss, you can see how I've constructed the training sequence through a reader in this question: "How to generate/read sparse sequence labels for CTC loss within Tensorflow?"
As an example, after I batch two examples that have label sequences [44, 45, 26, 45, 46, 44, 30, 44] and [5, 8, 17, 4, 18, 19, 14, 17, 12], respectively, I get the following result from evaluating the (batched) SparseTensor:
SparseTensorValue(indices=array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[0, 7],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[1, 4],
[1, 5],
[1, 6],
[1, 7],
[1, 8]]), values=array([44, 45, 26, 45, 46, 44, 30, 44, 5, 8, 17, 4, 18, 19, 14, 17, 12], dtype=int32), dense_shape=array([2, 9]))
Notice how, in each row of indices, the first entry is the batch number and the second entry is the position within that example's label sequence. The values themselves are the sequence label classes. The rank is 2 and the size of the last dimension (nine in this case) is the length of the longest sequence.
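To make the layout concrete, here is a minimal NumPy sketch that builds the same three arrays from the two label sequences above. The helper name sparse_components is hypothetical (it is not part of TensorFlow); the arrays it returns are meant to be the kind of indices, values and dense_shape that a SparseTensor is constructed from.

```python
import numpy as np

def sparse_components(label_seqs):
    """Build (indices, values, dense_shape) for a batch of label sequences.

    Hypothetical helper for illustration only, not a TensorFlow function.
    """
    # One [batch, position] pair per label, in row-major order.
    indices = [(b, t) for b, seq in enumerate(label_seqs) for t in range(len(seq))]
    # The labels themselves, flattened in the same order.
    values = [label for seq in label_seqs for label in seq]
    # Batch size and the length of the longest sequence.
    max_len = max(len(seq) for seq in label_seqs)
    return (np.asarray(indices, dtype=np.int64),
            np.asarray(values, dtype=np.int32),
            np.asarray([len(label_seqs), max_len], dtype=np.int64))

label_seqs = [[44, 45, 26, 45, 46, 44, 30, 44],
              [5, 8, 17, 4, 18, 19, 14, 17, 12]]
indices, values, dense_shape = sparse_components(label_seqs)
```

Note that no padding value is stored anywhere: positions that are absent from indices are simply missing, which is why a real label of 0 does not clash with the "empty" parts of the sparse tensor.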

Related

What is the difference between np.array([val1, val2]) and np.array([[val1, val2]])?

What is the difference between np.array([1, 2]) and np.array([[1, 2]])?
Which one of them is a matrix?
I also do not understand the output for shape of the above tensors. The former returns (2,) and the latter returns (1,2).
np.array([1, 2]) builds an array starting from a list, thus giving you a 1D array with the shape (2, ) since it only contains a single list of two elements.
When using the double [ you are actually passing a list of lists, thus this gets you a multidimensional array, or matrix, with the shape (1, 2).
With the latter you are able to build more complex matrices like:
np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
rendering a 3x3 matrix:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
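A short sketch of the difference, using only the two arrays from the question (the variable names are just for illustration):

```python
import numpy as np

a = np.array([1, 2])      # a single list gives a 1-D array
b = np.array([[1, 2]])    # a list of lists gives a 2-D array with one row

print(a.shape, a.ndim)    # (2,) 1
print(b.shape, b.ndim)    # (1, 2) 2

# Indexing differs accordingly: a[0] is a scalar, b[0] is the whole first row.
first_of_a = a[0]
first_row_of_b = b[0]
```

So only the second form has separate row and column axes; the first is a flat vector with a single axis.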

Argmax indexing in pytorch with 2 tensors of equal shape

Summarize the problem
I am working with high-dimensional tensors in PyTorch and I need to index one tensor with the argmax values from another tensor. Specifically, I need to index tensor y of shape [3, 4] with the result of the argmax of tensor x, also of shape [3, 4]. If the tensors are:
import torch as T
# Tensor to get argmax from
# expected argmax: [2, 0, 1]
x = T.tensor([[1, 2, 8, 3],
[6, 3, 3, 5],
[2, 8, 1, 7]])
# Tensor to index with argmax from previous
# expected tensor to retrieve [2, 4, 9]
y = T.tensor([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
# argmax
x_max, x_argmax = T.max(x, dim=1)
I would like an operation that, given the argmax indices of x (x_argmax), retrieves the values of tensor y at those same indices.
Describe what you’ve tried
This is what I have tried:
# What I have tried
print(y[x_argmax])
print(y[:, x_argmax])
print(y[..., x_argmax])
print(y[x_argmax.unsqueeze(1)])
I have been reading a lot about numpy indexing, basic indexing, advanced indexing and combined indexing. I have been trying to use combined indexing (since I want a slice in first dimension of the tensor and the indexes values on the second one). But I have not been able to come up with a solution for this use case.
You are looking for torch.gather:
# torch is imported as T in the question
idx = T.argmax(x, dim=1, keepdim=True)  # get argmax directly, w/o max
out = T.gather(y, 1, idx)
Resulting with
tensor([[2],
[4],
[9]])
How about y[T.arange(3), x_argmax]?
That does the job for me...
Explanation: You take dimensional information away when you invoke T.max(x, dim=1), so this information needs to be restored explicitly.
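For comparison, the two answers can be sketched side by side in NumPy, which is used here only as a stand-in: np.take_along_axis plays the role of the gather call, and integer-array indexing with np.arange plays the role of y[T.arange(3), x_argmax]. The variable names gathered and paired are assumptions for this illustration.

```python
import numpy as np

x = np.array([[1, 2, 8, 3],
              [6, 3, 3, 5],
              [2, 8, 1, 7]])
y = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

# Column of the maximum in each row of x, shape (3,).
idx = x.argmax(axis=1)

# Gather-style: indices keep a trailing axis, so the result has shape (3, 1).
gathered = np.take_along_axis(y, idx[:, None], axis=1)

# Paired indexing: row i is matched with column idx[i], result has shape (3,).
paired = y[np.arange(y.shape[0]), idx]
```

Both pick the same three values; they differ only in whether the reduced dimension is kept.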

Max tensor (not element) in an n-dimensional tensor

I am finding it impossible to get the max tensor in an n-dimensional array, even by summing the tensors and using gather or gather_nd.
By max tensor I mean the set of weights with the highest sum.
I have a tensor of shape (-1, 4, 30, 256) where 256 is the weights.
I need to get the maximum set of weights for each (-1, 0, 30), (-1, 1, 30), (-1, 2, 30) and (-1, 3, 30), so under each tensor in the 2nd dimension.
This would ideally result in a (-1, 4, 256) tensor.
reduce_max and any other max function will only return the maximum element values within the last dimension, not the maximum tensor (which is the set of weights with the highest sum) in the dimension itself. I have tried:
p1 = tf.reduce_sum(tensor, axis=3) # (-1, 4, 30)
p2 = tf.argmax(p1, 2) # (-1, 4)
Which gives the appropriate index values for the 3rd dimension:
[[0, 2, 2, 0],
[0, 1, 3, 0],
...
But running tf.gather or tf.gather_nd on the above does not work, even when splitting my data beforehand and using different axes.
Further, I can get the appropriate values if I build the indices for gather_nd by hand, e.g.:
tf.gather_nd(out5, [[0,0,0], [0,1,2], [0,2,2], [0,3,0], [1,0,0], [1,1,2], [1,2,2], [1,3,1]])
But as we are using a tensorflow variable of an unknown first dimension, I cannot build these indexes.
I have searched through related workarounds and found nothing applicable.
Can anyone tell me how to accomplish this? Thanks!
edit for clarification:
The maximum tensor of weights would be the set of weights with the highest sum:
[[ 1, 2, 3], [0, 0, 2], [1, 0, 2]] would be [1, 2, 3]
I figured it out using map_fn.
I reshaped my tensor to (-1, 120, 256):
tfr = tf.reshape(sometensor, (-1, 120, 256))
def func(chunk):
    f1 = tf.reduce_sum(chunk, axis=1)  # (120,): sum of each weight vector
    f2 = tf.argmax(f1)                 # index of the vector with the largest sum
    return chunk[f2]                   # (256,)
bla = tf.map_fn(func, tfr)
This returns a (-1, 256) tensor holding the vector with the greatest sum (the highest set of weights) for each entry.
Basically, map_fn iterates over the first axis, so it passes one (120, 256) chunk to func at a time (however many entries there are on the first axis). func returns the appropriate (256,) vector for each chunk, and map_fn stacks those into the answer.
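The question originally asked for a (-1, 4, 256) result, i.e. one winning vector per entry of the second dimension. A minimal NumPy sketch of that variant follows, assuming a batch size of 5 and random data as stand-ins; it is an illustration of the indexing idea, not the poster's TensorFlow code.

```python
import numpy as np

rng = np.random.default_rng(0)
tensor = rng.normal(size=(5, 4, 30, 256))   # stand-in for the (-1, 4, 30, 256) input

sums = tensor.sum(axis=3)                   # (5, 4, 30): one sum per weight vector
best = sums.argmax(axis=2)                  # (5, 4): position of the largest sum

# Broadcast the winning position over the weight axis and pick along axis 2.
picked = np.take_along_axis(tensor, best[:, :, None, None], axis=2)  # (5, 4, 1, 256)
result = picked.squeeze(axis=2)             # (5, 4, 256)
```

In TensorFlow, later versions of tf.gather accept a batch_dims argument that may serve the same purpose; that route is not tested here.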

tensorflow transform a (structured) dense matrix to sparse, when number of rows unknown

My task is to transform a specially structured dense matrix tensor into a sparse one, e.g. an input matrix M as follows (in each row, a dense sequence of positive integers followed by 0 as padding):
[[3 5 7 0]
[2 2 0 0]
[1 3 9 0]]
Additionally, given the non-padding length for each row, e.g. given by tensor L =
[3, 2, 3].
The desired output would be sparse tensor S.
SparseTensorValue(indices=array([[0, 0],[0, 1],[0, 2],[1, 0],[1, 1],[2, 0],[2, 1], [2, 2]]), values=array([3, 5, 7, 2, 2, 1, 3, 9], dtype=int32), shape=array([3, 4]))
This is useful in models where objects are described by variable-sized descriptors (S is then used in embedding_lookup_sparse to connect the embeddings of the descriptors).
I am able to do it when the number of rows of M is known (with a Python loop and ops like slice and concat). However, the number of rows here is determined by the mini-batch size and could change (say, in the testing phase). Is there a good way to implement that? I have been trying some control_flow_ops but haven't succeeded.
Thanks!!
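One loop-free way to think about this is to build a boolean mask from L and read the indices and values off the mask, which does not depend on knowing the number of rows in advance. The sketch below does this in NumPy with the M and L from the question; it is an assumption that the same steps carry over to TensorFlow via tf.sequence_mask, tf.where and tf.gather_nd, and that translation is not tested here.

```python
import numpy as np

M = np.array([[3, 5, 7, 0],
              [2, 2, 0, 0],
              [1, 3, 9, 0]], dtype=np.int32)
L = np.array([3, 2, 3])

# True where the column index is below the row's length
# (the role tf.sequence_mask would play).
mask = np.arange(M.shape[1])[None, :] < L[:, None]

# Row-major [row, col] pairs of the True entries (the role of tf.where).
indices = np.argwhere(mask)

# Non-padding entries in the same order (the role of tf.gather_nd).
values = M[mask]

dense_shape = np.array(M.shape)
```

Because the mask is derived from L rather than from the zeros in M, a genuine 0 inside the non-padding part of a row would still be kept.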

How do I swap tensor's axes in TensorFlow?

I have a tensor of shape (30, 116, 10), and I want to swap the first two dimensions, so that I have a tensor of shape (116, 30, 10)
I saw that numpy has such a function (np.swapaxes) and I searched for something similar in TensorFlow, but I found nothing.
Do you have any idea?
tf.transpose provides the same functionality as np.swapaxes, although in a more generalized form. In your case, you can do tf.transpose(orig_tensor, [1, 0, 2]) which would be equivalent to np.swapaxes(orig_np_array, 0, 1).
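The equivalence between a two-axis swap and a full permutation can be checked on the NumPy side with a small sketch (the array contents are arbitrary placeholders):

```python
import numpy as np

a = np.arange(30 * 116 * 10).reshape(30, 116, 10)

swapped = np.swapaxes(a, 0, 1)             # swap the first two axes
transposed = np.transpose(a, (1, 0, 2))    # the same permutation written out in full
```

The permutation list names, for each output axis, which input axis it comes from, so [1, 0, 2] leaves the last axis in place and exchanges the first two.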
It is possible to use tf.einsum to swap axes if the number of input dimensions is unknown. For example:
tf.einsum("ij...->ji...", input) will swap the first two dimensions of input;
tf.einsum("...ij->...ji", input) will swap the last two dimensions;
tf.einsum("aij...->aji...", input) will swap the second and the third dimension;
tf.einsum("ijk...->kij...", input) will permute the first three dimensions;
and so on.
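The effect of these subscript strings on the shape can be sketched with np.einsum, which accepts the same ellipsis notation; NumPy is used here as a stand-in, and the 4-D array of zeros is only a placeholder.

```python
import numpy as np

a = np.zeros((2, 3, 4, 5))

first_two = np.einsum("ij...->ji...", a)    # swap the first two axes
last_two = np.einsum("...ij->...ji", a)     # swap the last two axes
middle = np.einsum("aij...->aji...", a)     # swap the second and third axes
rotated = np.einsum("ijk...->kij...", a)    # move the third axis to the front

shapes = [first_two.shape, last_two.shape, middle.shape, rotated.shape]
```

The ellipsis stands for however many remaining axes there are, which is what lets the same string work for inputs of different rank.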
You can transpose just the last two axes with tf.linalg.matrix_transpose, or more generally, you can swap any number of trailing axes by working out what the leading indices are dynamically, and using relative indices for the axes you want to transpose:
x = tf.ones([5, 3, 7, 11])
trailing_axes = [-1, -2]
leading = tf.range(tf.rank(x) - len(trailing_axes)) # [0, 1]
trailing = trailing_axes + tf.rank(x) # [3, 2]
new_order = tf.concat([leading, trailing], axis=0) # [0, 1, 3, 2]
res = tf.transpose(x, new_order)
res.shape # [5, 3, 11, 7]