I am trying to embed the positional information (the index) into a vector and use it in Keras, for instance
inputs = Input(shape=(23,))
where 23 is the number of features. I want to embed the position of each feature as a one-dimensional vector, from position 0 to position 22.
But I don't know how to get the position index of the features (I want something like an 'enumerate' function for a Keras layer), so I added
pos = K.constant([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]])
embedding_pos = Embedding(23, 1)(pos)
I actually want this embedding_pos to be multiplied with inputs, with the rest of the algorithm going on from there.
But I get this message
AttributeError: 'NoneType' object has no attribute '_inbound_nodes'
If I get rid of that embedding layer and multiply layer, the algorithm works fine. How am I supposed to get the embedding vectors using the position index of the features of inputs?
======
Adding more information: I moved the layers around and looked at model.summary(); it seems that embedding_pos has shape [None, 1], which is missing the batch size.
I don't think using a constant is the right approach. I'd like to know if there is some kind of 'enumerate' function for Keras layers.
======
By request, an example input looks like this:
batch_size x number_of_features = 1 x 10
[[1.0, 4719.0, 0.0001, 472818.44, 958, 6402818., 1.828, 24.321, 55.0, 127.44]]
and so on...
I want to get the index of the features
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
to use this value as the input for Embedding.
But if I build it with a constant, it doesn't know the batch size.
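For what it's worth, here is a minimal sketch of one common workaround, assuming a TensorFlow-backed Keras: generate the indices inside a Lambda layer, so the resulting tensor is tracked by the Keras graph (and thus has _inbound_nodes), and tile it to the dynamic batch size. The helper name position_indices is illustrative, not an established API:

from keras.layers import Input, Embedding, Lambda
from keras import backend as K

inputs = Input(shape=(23,))

# Build [0..22] inside a Lambda layer and tile it to the (dynamic) batch
# size taken from the input tensor itself.
def position_indices(x):
    pos = K.expand_dims(K.arange(0, 23), axis=0)  # shape (1, 23)
    return K.tile(pos, [K.shape(x)[0], 1])        # shape (batch, 23)

pos = Lambda(position_indices)(inputs)
embedding_pos = Embedding(23, 1)(pos)             # shape (batch, 23, 1)

Note that embedding_pos then has shape (batch, 23, 1) while inputs has shape (batch, 23), so one of the two still needs a reshape (e.g. squeezing the last axis of the embedding) before the element-wise multiply.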
Related
I have a 2D tensor of shape [32, 768] and a 3D tensor of shape [32, 512, 768]. I want to stack them to get an output of shape [32, 512, 1536].
If I expand dimensions at axis=1 for the 2D tensor and concat, I get [32, 513, 768]. So how do I get [32, 512, 1536] as my output tensor shape?
Short answer: you will have to repeat the 2D tensor along axis 1 512 times to get a 3D tensor of shape [32, 512, 768]. This 3D tensor, when concatenated with the other 3D tensor along the last dimension, gives a tensor of shape [32, 512, 1536]. You need to make sure this repetition is desired.
Longer extension:
Let's take a much simpler case:
Take a 1D tensor (1, 2, 3, 4, 5). Say you need to concatenate this to a 2D tensor of shape [2, 5], say ((6, 7, 8, 9, 10), (11, 12, 13, 14, 15)). Note that this is a simplified version of your problem, with smaller tensors and no batch dimension.
One way to combine these tensors is to get a tensor of shape [3, 5]. Here, you would expand the 1D tensor to a 2D tensor having shape [1, 5], and concatenate along axis 0. This will give the result ((1, 2, 3, 4, 5), (6, 7, 8, 9, 10), (11, 12, 13, 14, 15)). When applied to your problem, this gives the resulting [32, 513, 768] tensor you have.
The second way would give the tensor ((1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (1, 2, 3, 4, 5, 11, 12, 13, 14, 15)), having shape [2, 10]. As you can see, this requires (1, 2, 3, 4, 5) to be repeated twice. So, you'll have to expand the 1D tensor to get the shape [1, 5], and repeat it to get a tensor of shape [2, 5]. This tensor can then be concatenated with the other 2D tensor. In your case, you will expand the 2D tensor to shape [32, 1, 768], then repeat it 512 times along axis 1 to get a tensor of shape [32, 512, 768], which will be concatenated with the other 3D tensor.
When going for the second method, ensure that you really want to repeat the smaller tensor across all entries of the second tensor.
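As a concrete illustration, here is a minimal sketch of both ways for the simplified case above, written in PyTorch to match the snippet below:

import torch

small = torch.tensor([1, 2, 3, 4, 5])                         # shape (5,)
big = torch.tensor([[6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])  # shape (2, 5)

# First way: add a leading axis and concatenate along axis 0 -> shape (3, 5)
first = torch.cat([small[None, :], big], dim=0)

# Second way: repeat along the new axis, then concatenate along axis 1 -> shape (2, 10)
second = torch.cat([small[None, :].repeat(2, 1), big], dim=1)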
You can try this:
import torch

a = torch.ones([32, 768])
b = torch.ones([32, 512, 768])

# Add a singleton axis to `a`, repeat it 512 times along that axis, then
# concatenate with `b` along the last dimension -> shape [32, 512, 1536]
result = torch.cat([a[:, None, :].repeat(1, 512, 1), b], dim=2)
I just started using the CTC loss layer in TensorFlow (r1.0) and got a little bit confused by the "labels" input.
In TensorFlow's API documentation, it says
labels: An int32 SparseTensor. labels.indices[i, :] == [b, t] means labels.values[i] stores the id for (batch b, time t). labels.values[i] must take on values in [0, num_labels)
Do [b, t] and values[i] mean there is a label values[i] at time t of sequence b in the batch?
It says the value must be in [0, num_labels), but for a sparse tensor almost everything is 0 except for some specified places, so I don't really know what the sparse tensor for CTC should look like.
And for example, if I have a short video of a hand gesture and it has the label "1", should I label the output of all timesteps as "1", or only label the last timestep as "1" and treat the others as "blank"?
thanks!
To address your questions:
1. The notation in the documentation here seems a bit misleading, as the output label index t need not be the same as the input time slice; it's simply the index into the output sequence. A different letter could be used because the input and output sequences are not explicitly aligned. Otherwise, your assertion seems correct. I give an example below.
2. Zero is a valid class in your sequence output label. The so-called blank label in TensorFlow's CTC implementation is the last (largest) class, which should probably not be in your ground truth labels anyhow. So if you were writing a binary sequence classifier, you'd have three classes, 0 (say "off"), 1 ("on") and 2 ("blank" output of CTC).
3. CTC loss is for labeling sequence input with sequence output. If you only have a single class label output for the sequence input, you're probably better off using a softmax cross-entropy loss on the output of the last time step of the RNN cell.
If you do end up using CTC loss, you can see how I've constructed the training sequence through a reader here: How to generate/read sparse sequence labels for CTC loss within Tensorflow?.
As an example, after I batch two examples that have label sequences [44, 45, 26, 45, 46, 44, 30, 44] and [5, 8, 17, 4, 18, 19, 14, 17, 12], respectively, I get the following result from evaluating the (batched) SparseTensor:
SparseTensorValue(indices=array([[0, 0],
[0, 1],
[0, 2],
[0, 3],
[0, 4],
[0, 5],
[0, 6],
[0, 7],
[1, 0],
[1, 1],
[1, 2],
[1, 3],
[1, 4],
[1, 5],
[1, 6],
[1, 7],
[1, 8]]), values=array([44, 45, 26, 45, 46, 44, 30, 44, 5, 8, 17, 4, 18, 19, 14, 17, 12], dtype=int32), dense_shape=array([2, 9]))
Notice how the rows of the indices in the sparse tensor value correspond to the batch number and the columns correspond to the sequence index for that particular label. The values themselves are the sequence label classes. The rank is 2 and the size of the last dimension (nine in this case) is the length of the longest sequence.
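For reference, a minimal sketch of how such a batched SparseTensor could be built by hand from the two label sequences above (the variable names here are illustrative):

import tensorflow as tf

label_seqs = [[44, 45, 26, 45, 46, 44, 30, 44],
              [5, 8, 17, 4, 18, 19, 14, 17, 12]]

# indices[i] = [batch index b, sequence position t]; values[i] = the label class there
indices = [[b, t] for b, seq in enumerate(label_seqs) for t in range(len(seq))]
values = [v for seq in label_seqs for v in seq]
dense_shape = [len(label_seqs), max(len(seq) for seq in label_seqs)]

labels = tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)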
In the expert MNIST tutorial on the TensorFlow website, there is something like this:
x_image = tf.reshape(x, [-1,28,28,1])
I know that the reshape works like
tf.reshape(input,[batch_size,width,height,channel])
Q1: Why does batch_size equal -1? What does the -1 mean?
And when I go down the code there's one more thing I cannot understand:
W_fc1 = weight_variable([7 * 7 * 64, 1024])
Q2: What does the image_size * 64 mean?
Q1: Why does batch_size equal -1? What does the -1 mean?
-1 means "figure this part out for me". For example, if I run:
np.reshape([1, 2, 3, 4, 5, 6, 7, 8], [-1, 2])
It creates two columns, and whatever number of rows it needs to get everything to fit:
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
Q2: What does the image_size * 64 mean?
64 is the number of filters in that particular convolutional layer. Shapes of filters in conv layers follow the format [height, width, # of input channels (number of filters in the previous layer), # of filters].
When you pass -1 as a dimension in tf.reshape, that dimension is inferred from the total size and the remaining dimensions. From the docs:
If one component of shape is the special value -1, the size of that
dimension is computed so that the total size remains constant. In
particular, a shape of [-1] flattens into 1-D. At most one component
of shape can be -1.
The reference to 7 x 7 x 64 is because the convolution and pooling layers applied prior to this point have reduced the image to a shape of [7, 7, 64], and the input to the next fully connected layer needs to be a single dimension. So in the next line of the example, the tensor is reshaped from [7, 7, 64] to [7*7*64] so it can connect to the FC layer.
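For context, the lines that follow in the tutorial look roughly like this (b_fc1 being the bias variable created alongside W_fc1):

# h_pool2 has shape [batch, 7, 7, 64]; flatten it so it can be
# matrix-multiplied with the [7*7*64, 1024] weight matrix W_fc1.
h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)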
For more info on how convolutions and max pooling work, the Wikipedia page has some helpful graphics, e.g. of the network architecture and of pooling.
How does the implicit broadcasting in tensorflow using + and * work?
If I have two tensors, such that
a.get_shape() = [64, 10, 1, 100]
b.get_shape() = [64, 100]
(a+b).get_shape() = [64, 10, 64, 100]
(a*b).get_shape() = [64, 10, 64, 100]
How does that become [64, 10, 64, 100]??
According to the documentation, operations like add are broadcasting operations.
Quoting the glossary:
Broadcasting operation
An operation that uses numpy-style broadcasting to make the shapes of its tensor arguments compatible.
NumPy-style broadcasting is well documented in the NumPy documentation.
In brief:
[...] the smaller array is “broadcast” across the larger array so that they have compatible shapes.
Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python.
I think the broadcasting isn't doing what you intended. It's actually broadcasting in both directions. Let me show you what I mean by modifying your example:
a = tf.ones([64, 10, 1, 100])
b = tf.ones([128, 100])
print((a+b).shape) # prints "(64, 10, 128, 100)"
From this we see that it broadcasts by matching the last dimensions first. It's implicitly tiling a across its third dimension to match the size of b's first dimension, then implicitly adding singleton dimensions and tiling b across a's first two dimensions.
What I think you expected to do was to implicitly tile b across a's second dimension. To do that, you need b to be a different shape:
a = tf.ones([64, 10, 1, 100])
b = tf.ones([64, 1, 1, 100])
print((a+b).shape) # prints "(64, 10, 1, 100)"
You can use tf.expand_dims() twice on your b to add the two singleton dimensions to match this shape.
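For instance, a quick sketch of that reshaping:

import tensorflow as tf

a = tf.ones([64, 10, 1, 100])
b = tf.ones([64, 100])
b = tf.expand_dims(tf.expand_dims(b, 1), 1)  # shape (64, 1, 1, 100)
print((a + b).shape)                         # prints "(64, 10, 1, 100)"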
NumPy-style broadcasting is well documented, but to give a short explanation: the two tensors' shapes are compared from the last dimension backward, and any dimension that one tensor lacks (or has as 1) is replicated to match the other.
For example, with
a.get_shape() = [64, 10, 1, 100]
b.get_shape() = [64, 100]
(a*b).get_shape() = [64, 10, 64, 100]
a and b have the same last dimension (100); the next-to-last dimension of a, which is 1, is then replicated to match b's first dimension (64); and b lacks the first two dimensions of a, so they are created for it.
Note that any mismatched dimension must be 1 or absent, because the entire lower-dimensional block is what gets replicated.
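A quick NumPy check of the same shapes:

import numpy as np

a = np.ones([64, 10, 1, 100])
b = np.ones([64, 100])
print((a * b).shape)  # (64, 10, 64, 100)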
I am looking for a fast way to do numerical binning of a 2D numpy array. By binning I mean calculating submatrix averages or cumulative values. For example, x = numpy.arange(16).reshape(4, 4) would be split into 4 submatrices of 2x2 each, giving numpy.array([[2.5, 4.5], [10.5, 12.5]]), where 2.5 = numpy.average([0, 1, 4, 5]), etc.
How can I perform such an operation in an efficient way? I don't really have any idea how to do this...
Many thanks...
You can use a higher dimensional view of your array and take the average along the extra dimensions:
In [12]: a = np.arange(36).reshape(6, 6)
In [13]: a
Out[13]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
In [14]: a_view = a.reshape(3, 2, 3, 2)
In [15]: a_view.mean(axis=3).mean(axis=1)
Out[15]:
array([[ 3.5, 5.5, 7.5],
[ 15.5, 17.5, 19.5],
[ 27.5, 29.5, 31.5]])
In general, if you want bins of shape (a, b) for an array of shape (rows, cols), you should reshape it with .reshape(rows // a, a, cols // b, b). Note also that the order of the .mean calls matters: e.g. a_view.mean(axis=1).mean(axis=3) will raise an error, because a_view.mean(axis=1) only has three dimensions; a_view.mean(axis=1).mean(axis=2) would work fine, but it makes it harder to understand what is going on.
As is, the above code only works if you can fit an integer number of bins inside your array, i.e. if a divides rows and b divides cols. There are ways to deal with other cases, but you will have to define the behavior you want then.
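Putting the recipe above into a small helper, as a sketch (the name bin2d is just illustrative):

import numpy as np

def bin2d(arr, a, b):
    # Average over non-overlapping (a, b) blocks; assumes a divides rows
    # and b divides cols.
    rows, cols = arr.shape
    return arr.reshape(rows // a, a, cols // b, b).mean(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(bin2d(x, 2, 2))  # [[ 2.5  4.5]
                       #  [10.5 12.5]]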
See the SciPy Cookbook on rebinning, which provides this snippet:
from numpy import asarray

def rebin(a, *args):
    '''rebin ndarray data into a smaller ndarray of the same rank whose dimensions
    are factors of the original dimensions. eg. An array with 6 columns and 4 rows
    can be reduced to have 6,3,2 or 1 columns and 4,2 or 1 rows.
    example usages:
    >>> a=rand(6,4); b=rebin(a,3,2)
    >>> a=rand(6); b=rebin(a,2)
    '''
    shape = a.shape
    lenShape = len(shape)
    factor = asarray(shape) // asarray(args)  # integer bin size along each axis
    # Build an expression such as
    # "a.reshape(args[0],factor[0],args[1],factor[1],).sum(1).sum(2)/factor[0]/factor[1]"
    evList = ['a.reshape('] + \
             ['args[%d],factor[%d],' % (i, i) for i in range(lenShape)] + \
             [')'] + ['.sum(%d)' % (i + 1) for i in range(lenShape)] + \
             ['/factor[%d]' % i for i in range(lenShape)]
    print(''.join(evList))
    return eval(''.join(evList))
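For example, a hypothetical call rebinning the 4x4 array from the question into a 2x2 result:

import numpy

x = numpy.arange(16).reshape(4, 4)
result = rebin(x, 2, 2)
# prints: a.reshape(args[0],factor[0],args[1],factor[1],).sum(1).sum(2)/factor[0]/factor[1]
# result: array([[ 2.5,  4.5], [10.5, 12.5]])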
I assume that you want to know how to generally build a function that performs well and operates on arrays, just like numpy.reshape in your example. So if performance really matters and you're already using numpy, you can write your own C code for that, like numpy does. For example, the implementation of arange is written entirely in C. Almost everything in numpy that matters in terms of performance is implemented in C.
However, before doing so you should try to implement the code in Python and see if the performance is good enough. Try to make the Python code as efficient as possible. If it still doesn't suit your performance needs, go the C way.
You may read about that in the docs.