What does the keras ConvLSTM2D layer do? - tensorflow

I would like to understand the ConvLSTM2D Keras layer a bit better.
Does it execute a 2D convolution on a 2D input (image) and then average/flatten its output and feed that into an LSTM module?
But I guess it is basically an LSTM cell, where the matrix multiplications are replaced with convolution operations. Is that correct?
Thank you

Yes, your understanding of ConvLSTM2D is correct.
The ConvLSTM2D architecture combines the gating of an LSTM with 2D convolutions.
As you mentioned, a ConvLSTM layer does a similar job to an LSTM, but the matrix multiplications are replaced by convolution operations, so the input, the hidden state and the output keep their 2D spatial layout.
The other approach you describe is a different architecture: each image passes through convolution layers, the result is flattened to a 1D array, and the sequence of these feature vectors over time is the input to ordinary LSTM layers.
Input of the Keras ConvLSTM2D layer: a 5D tensor with shape
(samples, time, channels, rows, cols) if data_format is channels_first.
(samples, time, rows, cols, channels) if data_format is channels_last.
Output of a ConvLSTM2D layer (shown for channels_first; with channels_last the filters axis comes last, and rows and cols can differ from the input depending on padding and strides):
If return_sequences=True, it is a 5D tensor with shape
(samples, time, filters, rows, cols)
If return_sequences=False, it is a 4D tensor with shape
(samples, filters, rows, cols)
You can refer to this paper, on which the ConvLSTM implementation is based.
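The shapes above can be checked with a minimal sketch. The tensor sizes, the filter count and the kernel size are made up for illustration, and the input uses the channels_last layout, assuming the Keras image data format has not been changed from its usual setting:

```python
import numpy as np
import tensorflow as tf

# Hypothetical batch: 2 samples, 5 time steps, 16x16 frames, 3 channels
# (channels_last layout: samples, time, rows, cols, channels).
x = np.zeros((2, 5, 16, 16, 3), dtype="float32")

# return_sequences=True keeps the time axis in the output.
seq_layer = tf.keras.layers.ConvLSTM2D(
    filters=8, kernel_size=(3, 3), padding="same", return_sequences=True)
# return_sequences=False returns only the last time step.
last_layer = tf.keras.layers.ConvLSTM2D(
    filters=8, kernel_size=(3, 3), padding="same", return_sequences=False)

seq_out = seq_layer(x)    # 5D: (samples, time, rows, cols, filters)
last_out = last_layer(x)  # 4D: (samples, rows, cols, filters)

print(tuple(seq_out.shape), tuple(last_out.shape))
```

With padding="same" the rows and cols are unchanged, so only the last axis changes from 3 channels to 8 filters.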

Related

Tensorflow Keras output layer shape weird error

I am fairly new to TF, Keras and ML in general.
I am trying to implement a very simple MLP with an input shape of (batch_size,3,2) and an output shape of (batch_size,3), that is (if I got it right): for every 3x2 feature, there is a corresponding 3 value array label.
Here is how I create the model:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, tf.keras.activations.relu, input_shape=(3, 2)),
    tf.keras.layers.Dense(3)
])
and these are the X and y shapes:
X_train.shape,y_train.shape
TensorShape([64,3,2]),TensorShape([64,3])
On model.fit I am facing a weird error I cannot understand:
ValueError: Dimensions must be equal, but are 3 and 32 for ... with input shapes: [32,3,3] and [32,3]
I have no clue what's going on. I understand the batch size is 32, but where does that [32,3,3] come from?
Moreover, if from the original 64, I lower the number (shapes) of X_train and y_train, say, to: (19,3,2) and (19,3), I get the following error instead:
InvalidArgumentError: required broadcastable shapes at loc(unknown)
What's even more weird for me is that if I specify a single unit for the output (last) layer, instead of 3 like this:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, tf.keras.activations.relu, input_shape=(3, 2)),
    tf.keras.layers.Dense(1)
])
model.fit works, but the predictions have shape (1,3,1) instead of my expected (3,)
I am very confused.
Whenever you are unsure how the data flows through your model, use model.summary() to see the details and what happens to the shape of the data in each layer.
In this case each sample is a 2D array of shape (3, 2) and each label is a 1D array of shape (3,), but the model uses only Dense layers. A Dense layer does not flatten its input: it is applied along the last axis only, so the 3 rows of each sample are carried through as an extra dimension and the model output has shape (batch, 3, 3) instead of (batch, 3). That is where the [32,3,3] in the error comes from. The same holds for an image as input: you would not feed it directly to a Dense layer. Instead you should use other layers such as Conv2D, or Flatten your input (make it 1D) before feeding it to the Dense layer.
In general, if the input and output of your model differ in their number of dimensions, the shape needs to be changed somewhere in the model. The most common ways to do so are a Flatten layer or a global pooling layer such as GlobalAveragePooling.
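A minimal sketch of the shape problem, assuming the same layer sizes as in the question. It uses tf.keras.Input instead of the input_shape argument and a dummy all-zeros batch, both of which are choices made here for illustration:

```python
import tensorflow as tf

# Same layers as in the question, without any Flatten layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3, 2)),
    tf.keras.layers.Dense(50, activation="relu"),  # acts on the last axis: (3, 2) -> (3, 50)
    tf.keras.layers.Dense(3),                      # acts on the last axis: (3, 50) -> (3, 3)
])

# Dummy batch of 32 samples, matching the batch size in the error message.
out = model(tf.zeros((32, 3, 2)))
out_shape = tuple(out.shape)
print(out_shape)
```

The output keeps the middle axis of size 3, which cannot be compared element-wise with labels of shape (32, 3).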
To get one 3-value output per sample, the input should be flattened before it reaches the Dense layers. There are 2 ways to deal with this:
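Way 1: Add a Flatten layer as the first layer of your model:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
model = Sequential()
model.add(Flatten(input_shape=(3,2)))  # each (3, 2) sample becomes a vector of 6 features
model.add(Dense(50, 'relu'))
model.add(Dense(3))  # output shape: (batch, 3)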
Way 2: Flatten each sample to 1D before passing the inputs to your model, keeping the batch axis:
X_train = tf.reshape(X_train, shape=(-1, 6))  # (64, 3, 2) -> (64, 6)
Then change the input shape of the first layer to:
model.add(Dense(50, 'relu', input_shape=(6,)))
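As a rough check of the Flatten approach, here is a sketch that trains for one epoch on random stand-in data with the shapes from the question. The optimizer, the loss and the use of tf.keras.Input are assumptions, since the question does not show them:

```python
import numpy as np
import tensorflow as tf

# Random stand-in data with the shapes from the question.
X_train = np.random.rand(64, 3, 2).astype("float32")
y_train = np.random.rand(64, 3).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3, 2)),
    tf.keras.layers.Flatten(),                     # (3, 2) -> (6,)
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(3),                      # one 3-value vector per sample
])
model.compile(optimizer="adam", loss="mse")        # assumed settings, not from the question

model.fit(X_train, y_train, batch_size=32, epochs=1, verbose=0)
pred = model.predict(X_train[:1], verbose=0)
print(pred.shape)
```

The prediction for a single sample then has shape (1, 3); indexing pred[0] gives the (3,) array the question expected.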

Convert 2D Convolutional Neural Networks to 1D Convolutional Neural Networks in Tensorflow

Say I have extracted some features and the result is 10x10 data (maybe an image or a cepstrogram).
Usually I would feed this into my 2D convolution and be on my way.
My question is: if I had to convert this into a 1D input of 100 values, what disadvantages would I get, besides the obvious one that my filter would no longer detect patterns across all surrounding neighbours but only across the previous and next values, which might lead to worse performance?
And if I had to do this anyway, would I just reshape the data, use a Reshape layer, or use a Permute layer?
Thanks
Yes, you are correct regarding the GNA: our Intel GNA hardware natively supports only 1D convolutions, and 2D convolution support is experimental.
This article (GNA Plugin - OpenVINO™ Toolkit) specifies the steps to add Permute layers before or after convolutions.
You could try both methods and see which one works for you.
Generally, a 1D convolution in TensorFlow is implemented as a 2D convolution wrapped in reshape layers, which add an H dimension before the 2D convolution and remove it after that.
At the same time, MO (Model Optimizer) inserts permutes before and after the reshape layers, since they change the interpretation of the data.
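The relationship between the two operations can be illustrated with a small sketch, which compares tf.nn.conv1d with a tf.nn.conv2d call on the same data after an H axis of size 1 has been added. The input length of 100 stands for the flattened 10x10 data from the question; the filter width and filter count are arbitrary:

```python
import tensorflow as tf

x = tf.random.normal([2, 100, 1])   # [batch, width, channels]
w = tf.random.normal([3, 1, 4])     # [filter_width, in_channels, out_channels]

# Direct 1D convolution.
y_1d = tf.nn.conv1d(x, w, stride=1, padding="SAME")         # (2, 100, 4)

# The same computation as a 2D convolution over an image of height 1.
x_4d = tf.expand_dims(x, axis=1)    # (2, 1, 100, 1): add the H axis
w_4d = tf.expand_dims(w, axis=0)    # (1, 3, 1, 4): filter height of 1
y_2d = tf.nn.conv2d(x_4d, w_4d, strides=1, padding="SAME")  # (2, 1, 100, 4)
y_2d = tf.squeeze(y_2d, axis=1)     # remove the H axis again

max_diff = float(tf.reduce_max(tf.abs(y_1d - y_2d)))
print(tuple(y_1d.shape), max_diff)
```

Both paths should agree up to floating-point rounding, which shows that the reshape only changes how the data is laid out, not what is computed.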
For the advantages and disadvantages of 2D versus 1D CNNs, you may refer to this detailed thread.
In TensorFlow, these are the steps to build a CNN architecture:
1. Reshape the input if necessary using tf.reshape() to match the convolutional layer you intend to build (for example, for a 2D convolution, reshape each sample into a three-dimensional [height, width, channels] format).
2. Create a convolutional layer using tf.nn.conv1d(), tf.nn.conv2d(), or tf.nn.conv3d(), depending on the dimensionality of the input.
3. Create a pooling layer using tf.nn.max_pool().
4. Repeat steps 2 and 3 for additional convolution and pooling layers.
5. Reshape the output of the convolution and pooling layers, flattening it to prepare for the fully connected layer.
6. Create a fully connected layer using the tf.matmul() function, add an activation using, for example, tf.nn.relu(), and apply dropout using tf.nn.dropout().
7. Create a final layer for class prediction, again using tf.matmul().
8. Store the weights and biases in TensorFlow variables.
These are just the basic steps to create the CNN model; there are additional steps to define training and evaluation, execute the model and tune it.
In step 2 of CNN development you create a 2D convolutional layer using tf.nn.conv2d(). This function computes a 2-D convolution given 4-D input and filters tensors.
So if each sample is a 1D vector, as in the MNIST dataset examples with 784 features, you can convert it to the 4D input required by conv2d() using the TensorFlow reshape method. The reshape gives each sample the picture format [Height x Width x Channel], so the input tensor becomes 4-D: [Batch Size, Height, Width, Channel]:
x = tf.reshape(x, shape=[-1, 28, 28, 1])
where x is a placeholder vector (this is the TensorFlow 1.x API; in TensorFlow 2 it is available as tf.compat.v1.placeholder):
x = tf.placeholder(tf.float32, [None, num_input])
You may refer to the official TensorFlow documentation.
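The build steps listed above can be sketched end to end with the low-level tf.nn functions in TensorFlow 2 eager mode. This is only an illustration under assumptions: the batch size, filter sizes, layer widths and random inputs are made up, a single convolution and pooling stage is used, and no training loop is shown:

```python
import tensorflow as tf

num_input, num_classes = 784, 10            # MNIST-like sizes, as in the text

x = tf.random.normal([4, num_input])        # dummy batch of 4 flat vectors
x_img = tf.reshape(x, [-1, 28, 28, 1])      # [batch, height, width, channels]

# Weights and biases stored in TensorFlow variables.
w_conv = tf.Variable(tf.random.normal([5, 5, 1, 8], stddev=0.1))   # [kh, kw, in, out]
b_conv = tf.Variable(tf.zeros([8]))
w_fc = tf.Variable(tf.random.normal([14 * 14 * 8, 32], stddev=0.1))
b_fc = tf.Variable(tf.zeros([32]))
w_out = tf.Variable(tf.random.normal([32, num_classes], stddev=0.1))
b_out = tf.Variable(tf.zeros([num_classes]))

# Convolution followed by pooling.
conv = tf.nn.relu(tf.nn.conv2d(x_img, w_conv, strides=1, padding="SAME") + b_conv)
pool = tf.nn.max_pool2d(conv, ksize=2, strides=2, padding="SAME")  # (4, 14, 14, 8)

# Flatten, fully connected layer with ReLU and dropout, then the class scores.
flat = tf.reshape(pool, [-1, 14 * 14 * 8])
fc = tf.nn.dropout(tf.nn.relu(tf.matmul(flat, w_fc) + b_fc), rate=0.5)
logits = tf.matmul(fc, w_out) + b_out       # (4, num_classes)

print(tuple(pool.shape), tuple(logits.shape))
```

In practice the tf.keras layers wrap the same operations and manage the variables for you.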

How to input 5D tensor to keras model.fit

I am using TensorFlow 2 with tensorflow.keras.
The model I made is a sequence of tf.keras.layers.Conv2D (which requires a 4D input tensor (samples, rows, cols, channels))
followed by tf.keras.layers.ConvLSTM2D (which requires a 5D input tensor (samples, time, rows, cols, channels)).
For this reason, I made the input a 5D tensor (samples, time, rows, cols, channels), but it can't be fed into the Conv2D layer at the beginning when I call model.fit(train_data, train_data... )
Is there any way to make model.fit take a 5D tensor?
You need to wrap Conv2D in a TimeDistributed layer, as in:
x_conv = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(filters=filters,
                           kernel_size=kernel_size,
                           strides=strides,
                           padding='same',
                           kernel_initializer='he_normal'))(x)
This way the Conv2D layer is applied to every timestep separately, so it receives the 4D input it expects while the model as a whole takes the 5D tensor.
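A minimal functional-API sketch of such a model, with hypothetical sizes (5 timesteps of 32x32 RGB frames) and arbitrary filter counts, shows the 5D input passing through both layers:

```python
import tensorflow as tf

# Per-sample shape: (time, rows, cols, channels); the batch axis is implicit.
inputs = tf.keras.Input(shape=(5, 32, 32, 3))

# Conv2D is applied to each of the 5 frames independently.
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding="same")
)(inputs)                                   # (batch, 5, 32, 32, 8)

# ConvLSTM2D consumes the 5D tensor directly.
outputs = tf.keras.layers.ConvLSTM2D(
    filters=4, kernel_size=3, padding="same", return_sequences=False
)(x)                                        # (batch, 32, 32, 4)

model = tf.keras.Model(inputs, outputs)
out = model(tf.zeros((2, 5, 32, 32, 3)))    # dummy batch of 2 sequences
print(tuple(out.shape))
```

A model built this way can be passed 5D arrays in model.fit without any extra reshaping.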

Does TensorFlow's tf.layers.dense flatten input dimensions?

I'm searching for a data leak in my model. I'm using tf.layers.dense before a masking operation and am concerned that the model could just learn to switch positions in the middle dimension of my input tensor.
When I have an input tensor x = tf.ones((2,3,4)) would tf.layers.dense(x,8) flatten x to a fully connected layer with 2*3*4=24 input neurons and 2*3*8=48 output neurons then reshape it again to [2,3,8], or would it create 2*3=6 fully connected layers with 4 input and 8 output neurons then concatenate them?
As for the Keras Dense layer, it has already been mentioned in another answer that its input is not flattened; instead, the layer is applied on the last axis of its input.
As for the TensorFlow Dense layer, it actually inherits from the Keras Dense layer and, as a result, it is likewise applied on the last axis of its input. In your example this means a single weight matrix with 4 inputs and 8 outputs is shared across all 2*3 positions, so values cannot move between positions of the middle dimension.
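This behaviour can be checked with a short sketch using tf.keras.layers.Dense on the tensor from the question. The perturbation test is an illustration added here: it changes only position 0 of the middle dimension and checks that the outputs at the other positions stay the same:

```python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dense(8)

x = np.ones((2, 3, 4), dtype="float32")
y = layer(tf.constant(x)).numpy()           # output shape (2, 3, 8)
kernel_shape = tuple(layer.kernel.shape)    # one shared weight matrix: 4 inputs, 8 outputs

# Change only position 0 of the middle dimension.
x_changed = x.copy()
x_changed[:, 0, :] = 5.0
y_changed = layer(tf.constant(x_changed)).numpy()

# Outputs at positions 1 and 2 are unaffected by the change at position 0.
positions_independent = np.allclose(y[:, 1:, :], y_changed[:, 1:, :])
print(y.shape, kernel_shape, positions_independent)
```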

tensorflow reshaping convolutional filters for visualization

I have a 4D tensor of filter/kernel weights (of convolutional layer).
They're being passed to the subsequent operation with shape [5,5,3,32], 32 RGB 5x5 filters.
To collect their values for monitoring/analysis/storage using tf.summary.image, I need to rearrange this tensor into the shape [32,5,5,3], to then view/store each of the 32 filters as an individual image of shape [5,5,3].
Is this possible purely using tf.reshape()? Or do I need to do multiple tensor transformations?
You need transpose instead of reshape: tf.transpose(t, (3,0,1,2)) should do what you need (where t is your tensor). It moves the last axis to the front, whereas reshape would keep the elements in their original order and mix values from different filters.
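The difference between the two operations can be seen in a small sketch. The tensor t is filled with consecutive integers purely for illustration, and filter index 7 is an arbitrary choice:

```python
import tensorflow as tf

# Stand-in for the filter weights: shape [5, 5, 3, 32], filled with 0..2399.
t = tf.reshape(tf.range(5 * 5 * 3 * 32), [5, 5, 3, 32])

transposed = tf.transpose(t, (3, 0, 1, 2))  # shape (32, 5, 5, 3)
reshaped = tf.reshape(t, [32, 5, 5, 3])     # same shape, but elements are not regrouped

k = 7  # arbitrary filter index
# After the transpose, entry k is exactly filter k of the original tensor.
transpose_ok = bool(tf.reduce_all(transposed[k] == t[..., k]))
# After the plain reshape, entry k contains values from many different filters.
reshape_ok = bool(tf.reduce_all(reshaped[k] == t[..., k]))

print(tuple(transposed.shape), transpose_ok, reshape_ok)
```

Only the transposed tensor holds one complete [5,5,3] filter per entry along the first axis.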