How to change the tensor shape in middle layers? - tensorflow

Say I have a 2000x100 matrix and I put it into a 10-dimensional embedding layer, which gives me a 2000x100x10 tensor: 2000 examples, each a 100x10 matrix. Then I pass it to a Conv1D and KMaxPooling to get a 2000x24 matrix: 2000 examples, each a 24-dimensional vector. Now I would like to recombine those examples before I apply another layer. I would like to combine the first 10 examples together, then the next 10, and so on, so that I get a tuple, and then pass that tuple to the next layer.
My question is: can I do that with Keras? And any ideas on how to do it?

The idea of using "samples" is that these samples should be unique and not relate to each other.
This is something Keras will demand from your model: if it starts with 2000 samples, it must end with 2000 samples. Ideally, these samples do not talk to each other, but you can hack around this with custom layers, though only in the middle of the model. You will need to end with 2000 samples anyway.
I believe you're going to end your model with 200 groups, so maybe you should already start with shape (200,10,100) and use TimeDistributed wrappers:
from keras.layers import Input, TimeDistributed, Embedding, Conv1D
inputs = Input((10,100)) #shape (200,10,100)
out = TimeDistributed(Embedding(....))(inputs) #shape (200,10,100,10)
out = TimeDistributed(Conv1D(...))(out) #shape (200,10,len,filters)
#here, you use your layer that will work on the groups without TimeDistributed.
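Below is a runnable sketch of the TimeDistributed approach, assuming TensorFlow 2.x with tf.keras. The vocabulary size, filter count and kernel size are hypothetical values chosen for illustration, GlobalMaxPooling1D stands in for the KMaxPooling layer from the question (KMaxPooling is not a built-in Keras layer), and the final Dense layer is only a placeholder for whatever group-level layer you actually need.

```python
import numpy as np
from tensorflow.keras import layers, Model

VOCAB = 5000   # hypothetical vocabulary size
EMB_DIM = 10   # 10-dimensional embedding, as in the question
FILTERS = 24   # hypothetical: 24 filters, so each example ends as a 24-dim vector

data = np.random.randint(0, VOCAB, size=(2000, 100)).astype("int32")  # 2000 examples of 100 token ids
groups = data.reshape(200, 10, 100)  # rows 0-9 -> group 0, rows 10-19 -> group 1, ...

inputs = layers.Input((10, 100), dtype="int32")                       # (batch, 10, 100)
x = layers.TimeDistributed(layers.Embedding(VOCAB, EMB_DIM))(inputs)  # (batch, 10, 100, 10)
x = layers.TimeDistributed(layers.Conv1D(FILTERS, 3))(x)              # (batch, 10, 98, 24)
# GlobalMaxPooling1D stands in for KMaxPooling, which is not a built-in Keras layer
x = layers.TimeDistributed(layers.GlobalMaxPooling1D())(x)            # (batch, 10, 24)
# From here on, layers see a whole group at once (no TimeDistributed wrapper)
x = layers.Flatten()(x)                                               # (batch, 240)
outputs = layers.Dense(1)(x)                                          # placeholder: one output per group
model = Model(inputs, outputs)

preds = model.predict(groups, verbose=0)
print(preds.shape)  # (200, 1)
```

Because reshape keeps row order, `data.reshape(200, 10, 100)` puts the first 10 examples into group 0, the next 10 into group 1, and so on, which is the grouping asked for in the question.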
To reshape a tensor without changing the batch size, use the Reshape(newShape) layer, where newShape does not include the first dimension (batch size).
To reshape a tensor including the batch size, use a Lambda(lambda x: K.reshape(x,newShape)) layer, where newShape includes the first dimension (batch size). Here you must remember the warning above: somewhere you will need to undo this change so that you end up with the same batch size as the input.
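A minimal sketch of the Lambda approach, assuming TensorFlow 2.x with tf.keras and assuming every batch the model sees has a size that is a multiple of 10 (otherwise the first reshape fails). It folds each run of 10 consecutive samples into one group, lets a Dense layer mix the samples inside each group, and then unfolds back so the output batch size equals the input batch size. It uses tf.reshape in place of K.reshape, uses Reshape for the steps that leave the batch axis alone, and the layer sizes are hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

GROUP = 10  # assumption: every batch size fed to the model is a multiple of 10
FEATS = 24  # per-example vector size, e.g. the output of KMaxPooling

inputs = layers.Input((FEATS,))                                          # (batch, 24)
# Lambda + tf.reshape changes the batch axis: (batch, 24) -> (batch // 10, 10, 24)
grouped = layers.Lambda(lambda t: tf.reshape(t, (-1, GROUP, FEATS)))(inputs)
# Reshape never touches the batch axis: (batch // 10, 10, 24) -> (batch // 10, 240)
flat = layers.Reshape((GROUP * FEATS,))(grouped)
mixed = layers.Dense(GROUP * FEATS)(flat)          # the 10 examples of a group interact here
back = layers.Reshape((GROUP, FEATS))(mixed)       # (batch // 10, 10, 24)
# Undo the batch change so the output batch size equals the input batch size
outputs = layers.Lambda(lambda t: tf.reshape(t, (-1, FEATS)))(back)      # (batch, 24)
model = Model(inputs, outputs)

x = np.random.rand(2000, FEATS).astype("float32")
y = model.predict(x, batch_size=200, verbose=0)   # 200 is a multiple of GROUP
print(y.shape)  # (2000, 24)
```

Groups are formed from samples in the order they arrive in each batch, so if you train with `fit`, which shuffles samples by default, the groups will mix unrelated examples; passing `shuffle=False` or pre-grouping the data as in the TimeDistributed approach avoids that.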

Related

Siamese Twin Network: Merging of data streams with a custom function

Since I am not very experienced, I am struggling with a Siamese twin network.
I have 2 images which run through the same CNN and each generate a distinct feature vector. I would like to train a further network that interprets these two image vectors (each with 32 elements). In an intermediate step I would like to use these vectors as input to a function NCC, which sits as a layer between the CNN and the NN and is defined in the following snippet (i.e. its output should be used as input to the next NN):
import tensorflow as tf
from tensorflow.keras.layers import Flatten
def NCC(a, b):
    l=a.shape[1]
    av_a=tf.math.reduce_mean(a)
    av_b=tf.math.reduce_mean(b)
    a=a-av_a
    b=b-av_b
    norm_a=tf.math.sqrt(tf.math.reduce_sum(a*a))
    norm_b=tf.math.sqrt(tf.math.reduce_sum(b*b))
    a=a/norm_a
    b=b/norm_b
    A=tf.reshape(tf.repeat(a, axis=0, repeats=l),(l,l))
    B=tf.reshape(tf.repeat(b, axis=0, repeats=l),(l,l))
    ncc=Flatten()(A*tf.transpose(B))
    return ncc
The output vector (for a batch size of 1) should have 32x32=1024 elements. It seems to work for a batch size of 1. If I increase the batch size, I run into trouble because the input vectors are now tensors with shape=(batch_size,32). I think this is a very stupid question, but how can I get around this issue? (Note that I also want an output tensor with shape=(batch_size,1024).)
Thanks in advance
Mike
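A sketch of a batch-aware version of the NCC function, assuming both inputs have shape (batch_size, 32) and TensorFlow 2.x. The two changes are that the mean and norm are computed per sample (axis=1) instead of over the whole batch, and the outer product is taken per sample with tf.einsum instead of tf.repeat and tf.reshape, so the output has shape (batch_size, 1024). The function name ncc_batched is hypothetical.

```python
import numpy as np
import tensorflow as tf

def ncc_batched(a, b):
    """Sketch of a batch-aware NCC: a, b have shape (batch, l); returns (batch, l*l)."""
    # Centre each sample on its own mean (axis=1), not on the mean of the whole batch
    a = a - tf.reduce_mean(a, axis=1, keepdims=True)
    b = b - tf.reduce_mean(b, axis=1, keepdims=True)
    # Normalise each sample by its own L2 norm
    a = a / tf.norm(a, axis=1, keepdims=True)
    b = b / tf.norm(b, axis=1, keepdims=True)
    # Per-sample outer product: out[n, i, j] = b[n, i] * a[n, j], the same
    # element order as A * tf.transpose(B) in the single-sample code above
    outer = tf.einsum("ni,nj->nij", b, a)
    l = a.shape[-1]  # assumes the feature size (32) is known statically
    return tf.reshape(outer, (-1, l * l))

feat_a = tf.random.normal((8, 32))
feat_b = tf.random.normal((8, 32))
print(ncc_batched(feat_a, feat_b).shape)  # (8, 1024)
```

Inside a Keras model it could be wrapped as `Lambda(lambda t: ncc_batched(t[0], t[1]))` and called on the list of the two CNN outputs.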

What are the effects of padding a tensor?

I'm working on a problem using Keras that has been presenting me with issues:
My X data is all of shape (num_samples, 8192, 8), but my Y data is of shape (num_samples, 4), where 4 is a one-hot encoded vector.
Both X and Y data will be run through LSTM layers, but the layers are rejecting the Y data because it doesn't match the shape of the X data.
Is padding the Y data with 0s so that it matches the dimensions of the X data unreasonable? What kind of effects would that have? Is there a better solution?
Edited for clarification:
As requested, here is more information:
My Y data represents the expected output of passing the X data through my model. This is my first time working with LSTMs, so I don't have an architecture in mind, but I'd like to use an architecture that works well with classifying long (8192-length) sequences of words into one of several categories. Additionally, the dataset that I have is of an immense size when fed through an LSTM, so I'm currently using batch-training.
Technologies being used:
Keras (Tensorflow Backend)
TL;DR Is padding one tensor with zeroes in all dimensions to match another tensor's shape a bad idea? What could be a better approach?
First of all, let's make sure your representation is actually what you think it is: the input to an LSTM (or any recurrent layer, for that matter) must have dimensionality (timesteps, features), not counting the batch dimension. For example, if you have 1000 training samples, each consisting of 100 timesteps, with each timestep having 10 values, your input shape will be (100,10,). Therefore I assume from your question that each input sample in your X set has 8192 steps and 8 values per step. Great; a single LSTM layer can iterate over these and produce 4-dimensional representations with absolutely no problem, like so:
from keras.layers import Input, LSTM
myLongInput = Input(shape=(8192,8,))
myRecurrentFunction = LSTM(4)
myShortOutput = myRecurrentFunction(myLongInput)
myShortOutput.shape
TensorShape([Dimension(None), Dimension(4)])
I assume your problem stems from trying to apply yet another LSTM on top of the first one; the next LSTM expects a tensor that has a time dimension, but your output has none. If that is the case, you'll need to let your first LSTM also output the intermediate representations at each time step, like so:
myNewRecurrentFunction=LSTM(4, return_sequences=True)
myLongOutput = myNewRecurrentFunction(myLongInput)
myLongOutput.shape
TensorShape([Dimension(None), Dimension(None), Dimension(4)])
As you can see, the new output is now a third-order tensor, with the second dimension being the (as yet unassigned) timesteps. You can repeat this process up to your final output, where you usually don't need the intermediate representations, only the last one. (Side note: make sure to set the activation of your last layer to softmax if your output is in one-hot format.)
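A minimal sketch of such a stack for the X shape in the question, assuming TensorFlow 2.x with tf.keras: the first LSTM keeps the time axis with return_sequences=True, the second returns only its last state, and a Dense softmax layer produces the 4-way one-hot prediction, so Y keeps its (num_samples, 4) shape. The 32 units per LSTM are a hypothetical choice.

```python
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(8192, 8))                 # (timesteps, features per step)
seq = layers.LSTM(32, return_sequences=True)(inputs)   # (None, 8192, 32): keeps the time axis
last = layers.LSTM(32)(seq)                            # (None, 32): last time step only
outputs = layers.Dense(4, activation="softmax")(last)  # (None, 4): same shape as one-hot Y
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

print(seq.shape)               # (None, 8192, 32)
print(model.outputs[0].shape)  # (None, 4)
```

Training would then use Y as-is, e.g. `model.fit(X, Y, batch_size=...)`, with no padding of Y.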
On to your original question, zero-padding has very little negative impact on your network. The network will strain itself a bit in the beginning trying to figure out the concept of the additional values you have just thrown at it, but will very soon be able to learn they're meaningless. This comes at a cost of a larger parameter space (therefore more time and memory complexity), but doesn't really affect predictive power most of the time.
I hope that was helpful.

Shape of tensor for 2D image in Keras

I am a newbie to Keras (and somewhat to TF), and I have found the shape definition for the input layer very confusing.
So in the examples, when we have a 1D vector of length 20 as input, the shape gets defined as
...Input(shape=(20,)...)
And when a 2D tensor for greyscale images needs to be defined for MNIST, it is defined as:
...Input(shape=(28, 28, 1)...)
So my question is: why isn't the tensor defined as (20) and (28, 28)? Why, in the first case, is a second dimension added and left empty? And why, in the second case, does the number of channels have to be defined?
I understand that it depends on the layer, so Conv1D, Dense or Conv2D take different shapes, but it seems the first dimension is implicit?
According to the docs, Dense needs its input to be (batch_size, ..., input_dim), but how is this related to the example:
Dense(32, input_shape=(784,))
Thanks
Tuples vs numbers
input_shape must be a tuple, so only (20,) can satisfy it. In Python, (20) is just the number 20 in parentheses, not a tuple; the trailing comma is what makes (20,) a one-element tuple. There is also the parameter input_dim, to make your life easier if you have only one dimension; this parameter can take 20. (But really, I find it just confusing; I always work with input_shape and use tuples, to keep a consistent understanding.)
Dense(32, input_shape=(784,)) is the same as Dense(32, input_dim=784).
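The tuple point is plain Python, independent of Keras: parentheses alone only group an expression, and it is the trailing comma that creates a tuple.

```python
as_int = (20)     # parentheses only group: this is the integer 20
as_tuple = (20,)  # the trailing comma makes a one-element tuple

print(type(as_int).__name__, type(as_tuple).__name__)  # int tuple
print(len(as_tuple))  # 1 -> one dimension of size 20
```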
Images
Images don't only have pixels; they also have channels (red, green, blue).
A black-and-white (greyscale) image has only one channel.
So, (28pixels, 28pixels, 1channel)
But notice that there isn't any obligation to follow this shape for images everywhere. You can shape them the way you like. But some kinds of layers do demand a certain shape, otherwise they couldn't work.
Some layers demand specific shapes
It's the case of the 2D convolutional layers, which need (size1,size2,channels). They need this shape because they must apply the convolutional filters accordingly.
It's also the case of recurrent layers, which need (timeSteps,featuresPerStep) to perform their recurrent calculations.
MNIST models
Again, there isn't any obligation to shape your image in a specific way. You must do it according to which first layer you choose and what you intend to achieve. It's your choice.
Many examples simply don't care about an image being a 2D structured thing, and they just use models that take 784 pixels. That's enough. They probably start with Dense layers, which demand shapes like (size,).
Other examples may care, and use a shape (28,28), but then these models will have to reshape the input to fit the needs of the next layer.
2D convolutional layers will demand (28,28,1).
The main idea is: input arrays must match input_shape or input_dim.
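A short NumPy sketch of matching the array to the first layer, using a hypothetical random array in place of the real MNIST data: the same images can be flattened for a Dense-first model, kept as (28, 28), or given an explicit channel axis for Conv2D.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 60 greyscale images of 28x28 pixels
images = np.random.rand(60, 28, 28).astype("float32")

flat = images.reshape(len(images), -1)  # (60, 784): Dense first, input_shape=(784,)
as_is = images                          # (60, 28, 28): first layer expects input_shape=(28, 28)
with_channel = images[..., np.newaxis]  # (60, 28, 28, 1): Conv2D first, input_shape=(28, 28, 1)

print(flat.shape, as_is.shape, with_channel.shape)
```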
Tensor shapes
Be careful, though, when reading Keras error messages or working with custom / lambda layers.
All these shapes we defined before omit an important dimension: the batch size or the number of samples.
Internally all tensors will have this additional dimension as the first dimension. Keras will report it as None (a dimension that will adapt to any batch size you have).
So, input_shape=(784,) will be reported as (None,784).
And input_shape=(28,28,1) will be reported as (None,28,28,1).
And your actual input data must have a shape that matches that reported shape.
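A sketch that shows the reported None batch dimension, plus a model that takes (28, 28) images and reshapes them for Conv2D itself, assuming TensorFlow 2.x with tf.keras. The filter count and the 10-way output are hypothetical.

```python
import numpy as np
from tensorflow.keras import layers, Model

dense_in = layers.Input(shape=(784,))
conv_in = layers.Input(shape=(28, 28, 1))
print(dense_in.shape)  # (None, 784)
print(conv_in.shape)   # (None, 28, 28, 1)

# A model that accepts (28, 28) images and adds the channel axis itself
inputs = layers.Input(shape=(28, 28))          # reported as (None, 28, 28)
x = layers.Reshape((28, 28, 1))(inputs)        # Reshape's target shape excludes the batch axis
x = layers.Conv2D(8, 3, activation="relu")(x)  # (None, 26, 26, 8)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = Model(inputs, outputs)

batch = np.random.rand(5, 28, 28).astype("float32")  # actual data: 5 samples, matches (None, 28, 28)
preds = model.predict(batch, verbose=0)
print(preds.shape)  # (5, 10)
```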

Why do we flatten the data before we feed it into tensorflow?

I'm following the Udacity MNIST tutorial, and the MNIST data is originally a 28*28 matrix. However, right before feeding the data in, they flatten it into a 1D array with 784 columns (784 = 28 * 28).
For example,
original training set shape was (200000, 28, 28).
200000 rows (data). Each example is a 28*28 matrix.
They converted this into the training set whose shape is (200000, 784)
Can someone explain why they flatten the data out before feeding to tensorflow?
Because when you're adding a fully connected layer, you always want your data to be a (1 or) 2 dimensional matrix, where each row is the vector representing your data. That way, the fully connected layer is just a matrix multiplication between your input (of size (batch_size, n_features)) and the weights (of shape (n_features, n_outputs)) (plus the bias and the activation function), and you get an output of shape (batch_size, n_outputs). Plus, you really don't need the original shape information in a fully connected layer, so it's OK to lose it.
It would be more complicated and less efficient to get the same result without reshaping first; that's why we always do it before a fully connected layer. For a convolutional layer, by contrast, you'll want to keep the data in its original format (width, height).
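The shape bookkeeping described above can be checked with plain NumPy. This is a sketch with random hypothetical data and weights and ReLU as an arbitrarily chosen activation; with examples stored as rows, the product is written x @ W rather than Wx.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_outputs = 32, 10
images = rng.random((batch_size, 28, 28))          # hypothetical batch of 28x28 images

x = images.reshape(batch_size, -1)                 # flatten: (batch_size, n_features) = (32, 784)
W = rng.standard_normal((x.shape[1], n_outputs))   # weights: (n_features, n_outputs) = (784, 10)
b = np.zeros(n_outputs)                            # bias: (n_outputs,)
y = np.maximum(0.0, x @ W + b)                     # activation(xW + b), ReLU here: (32, 10)
print(x.shape, W.shape, y.shape)
```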
That is a convention with fully connected layers. Fully connected layers connect every node in the previous layer with every node in the successive layer so locality is not an issue for this type of layer.
Additionally, by defining the layer like this we can efficiently calculate the next step with the formula f(Wx + b) = y. This would not be as easy with multidimensional input, and reshaping the input is cheap and easy to do.

understanding tensorflow sequence_loss parameters

The sequence_loss module's source code has three required parameters, listed as outputs, targets, and weights.
Outputs and targets are self-explanatory, but what I'm looking to understand better is the weight parameter.
The other thing I find confusing is that it states that the targets should be the same length as the outputs. What exactly do they mean by the length of a tensor, especially if it's a 3-dimensional tensor?
Think of the weights as a mask applied to the input tensor. In some NLP applications, the sentences we handle often have different lengths. In order to batch multiple sentences into a minibatch to feed into a neural net, people use a mask matrix to denote which elements of the input tensor are actually valid inputs. For instance, the weights can be np.ones([batch, max_length]), which means all of the input elements are legit.
We can also use a matrix of the same shape as the labels, such as np.asarray([[1,1,1,0],[1,1,0,0],[1,1,1,1]]) (assuming the labels have shape 3x4); then the cross-entropy of the last column of the first row (and of the last two columns of the second row) will be masked out to 0.
You can also use the weights to compute a weighted sum of the cross-entropy.
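A minimal NumPy sketch of the masking idea, using random hypothetical logits and targets and the 3x4 mask from the example above. Dividing by the total weight, so the loss is an average over the real positions only, is an assumption made here for illustration, not a copy of TensorFlow's exact averaging.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, max_len, vocab = 3, 4, 5
logits = rng.standard_normal((batch, max_len, vocab))    # hypothetical model outputs
targets = rng.integers(0, vocab, size=(batch, max_len))  # hypothetical label ids
weights = np.asarray([[1, 1, 1, 0],
                      [1, 1, 0, 0],
                      [1, 1, 1, 1]], dtype=float)        # 0 marks a padded position

# Numerically stable log-softmax, then cross-entropy = -log p(target) per position
shifted = logits - logits.max(axis=-1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
crossent = -np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]  # (3, 4)

masked = crossent * weights          # padded positions now contribute exactly 0
loss = masked.sum() / weights.sum()  # average over the 9 real (unmasked) positions
print(masked.round(3))
print(loss)
```

Setting a weight above 1 instead of 0 or 1 would make that position count more in the same sum.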
We used this in a class and our professor said we could just pass it ones of the right shape (the comment says "list of 1D batch-sized float-Tensors of the same length as logits"). That doesn't help with what they mean, but maybe it will help you get your code to run. Worked for me.
This code should do the trick: [tf.ones(batch_size, tf.float32) for _ in logits].
Edit: from TF code:
for logit, target, weight in zip(logits, targets, weights):
  if softmax_loss_function is None:
    # TODO(irving,ebrevdo): This reshape is needed because
    # sequence_loss_by_example is called with scalars sometimes, which
    # violates our general scalar strictness policy.
    target = array_ops.reshape(target, [-1])
    crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
        logit, target)
  else:
    crossent = softmax_loss_function(logit, target)
  log_perp_list.append(crossent * weight)
The weights that are passed are multiplied by the loss for that particular logit. So I guess if you want to take a particular prediction extra-seriously you can increase the weight above 1.