How to input 5D tensor to keras model.fit - tensorflow

I am using TensorFlow 2 with tensorflow.keras.
My model is a sequence of tf.keras.layers.Conv2D layers (which require a 4D input tensor (samples, rows, cols, channels))
followed by tf.keras.layers.ConvLSTM2D (which requires a 5D input tensor (samples, time, rows, cols, channels)).
For this reason I built my input as a 5D tensor (samples, time, rows, cols, channels), but it can't be fed into the tf.keras.layers.Conv2D at the beginning when I call model.fit(train_data, train_data... ).
Is there any way to make model.fit accept a 5D tensor?

You need to wrap the Conv2D in a TimeDistributed layer, as in:

x_conv = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(filters=filters,
                           kernel_size=kernel_size,
                           strides=strides,
                           padding='same',
                           kernel_initializer='he_normal'))(x)

This way the layer understands that you are giving it 4D inputs over timesteps.
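
For completeness, here is a minimal runnable sketch of the idea (the layer sizes and data shapes are illustrative assumptions, not the asker's actual values): TimeDistributed applies the same Conv2D to every timestep of a 5D batch, so model.fit can be given the 5D tensor directly.

import numpy as np
import tensorflow as tf

# 5D input: (samples, time, rows, cols, channels)
inputs = tf.keras.Input(shape=(8, 32, 32, 1))
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu'))(inputs)
x = tf.keras.layers.ConvLSTM2D(8, 3, padding='same', return_sequences=True)(x)
outputs = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(1, 3, padding='same'))(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')

train_data = np.random.rand(4, 8, 32, 32, 1).astype('float32')
model.fit(train_data, train_data, epochs=1)  # autoencoder-style, as in the question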

Related

Understanding shapes in keras layers

I am learning TensorFlow and Keras to implement an LSTM many-to-many model, where the length of the input sequence is equal to the length of the output sequence.
Sample Code:
Inputs:
voc_size = 10000
embed_dim = 64
lstm_units = 75
size_batch = 30
count_classes = 5
Model:
from tensorflow.keras.layers import (Bidirectional, LSTM,
                                     Dense, Embedding, TimeDistributed)
from tensorflow.keras import Sequential

def sample_build(embed_dim, voc_size, batch_size, lstm_units, count_classes):
    model = Sequential()
    model.add(Embedding(input_dim=voc_size,
                        output_dim=embed_dim, input_length=50))
    model.add(Bidirectional(LSTM(units=lstm_units, return_sequences=True),
                            merge_mode="ave"))
    model.add(Dense(200))
    model.add(TimeDistributed(Dense(count_classes + 1)))
    # Compile model
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    model.summary()
    return model

sample_model = sample_build(embed_dim, voc_size,
                            size_batch, lstm_units,
                            count_classes)
I am having trouble understanding the shapes of the input and output of each layer. For example, the output shape of the Embedding layer is (BATCH_SIZE, time_steps, length_of_input), which in this case is (30, 50, 64).
Similarly, the output shape of the Bidirectional LSTM layer is (30, 50, 75). This will be the input for the next Dense layer with 200 units. But the shape of the weight matrix of the Dense layer is (number of units in the current layer, number of units in the previous layer), which is (200, 75) in this case. So how does the matrix calculation happen between the 2D weights of the Dense layer and the 3D output of the Bidirectional layer? Any explanation of these shapes would be helpful.
Dense can operate on 3D input: conceptually it flattens the input to shape (batch_size * time_steps, features), applies the dense layer, and then reshapes the result back to the original (batch_size, time_steps, units). In Keras's documentation of the Dense layer, it says:
Note: If the input to the layer has a rank greater than 2, then Dense computes the dot product between the inputs and the kernel along the last axis of the inputs and axis 1 of the kernel (using tf.tensordot). For example, if input has dimensions (batch_size, d0, d1), then we create a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
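A quick way to convince yourself of this (a minimal sketch with sizes borrowed from the question): apply a Dense layer to a 3D tensor and compare against the equivalent tf.tensordot along the last axis.

import tensorflow as tf

x = tf.random.normal([30, 50, 75])         # (batch_size, time_steps, features)
dense = tf.keras.layers.Dense(200)
y = dense(x)
print(y.shape)                             # (30, 50, 200)

# Same result computed by hand with the layer's kernel and bias
y_manual = tf.tensordot(x, dense.kernel, axes=[[2], [0]]) + dense.bias
print(tf.reduce_max(tf.abs(y - y_manual)).numpy())  # ~0.0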
Another point regarding the output of the Embedding layer: as you said, it is indeed a 3D output, but the shape actually corresponds to (batch_size, input_length, embedding_dim), i.e. (30, 50, 64) here, not (BATCH_SIZE, time_steps, length_of_input).
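
To verify (a tiny sketch using the question's numbers):

import tensorflow as tf

emb = tf.keras.layers.Embedding(input_dim=10000, output_dim=64)
tokens = tf.zeros([30, 50], dtype=tf.int32)   # (batch_size, input_length)
print(emb(tokens).shape)                      # (30, 50, 64)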

What does the keras ConvLSTM2D layer do?

I would like to understand the ConvLSTM2D Keras layer a bit better.
Does it execute a 2D convolution on a 2D input (image) and then average/flatten its output and feed that into an LSTM module?
Or is it basically an LSTM cell where the matrix multiplications are replaced with convolution operations? Is that correct?
Thank you
Yes, you are right about the concept of ConvLSTM2D.
The ConvLSTM2D architecture combines the gating of an LSTM with 2D convolutions.
As you mentioned, ConvLSTM layers do a similar task to an LSTM, but instead of matrix multiplications they perform convolution operations, retaining the spatial dimensions of the input.
A different approach would be to pass the images through a convolution layer, flatten the result to a 1D array, and feed that to the LSTM layers as a set of features over time.
The input of Keras's ConvLSTM layer is a 5D tensor with shape
(samples, time, channels, rows, cols) if data_format='channels_first', or
(samples, time, rows, cols, channels) if data_format='channels_last'.
The output of a ConvLSTM layer:
If return_sequences=True, it is a 5D tensor with shape
(samples, time, filters, rows, cols) for channels_first, or (samples, time, rows, cols, filters) for channels_last.
If return_sequences=False, it is a 4D tensor with shape
(samples, filters, rows, cols) for channels_first, or (samples, rows, cols, filters) for channels_last.
You can refer to the paper on which the ConvLSTM implementation is based.
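
A small shape check illustrating both cases (sizes here are illustrative assumptions; the default data_format is channels_last):

import tensorflow as tf

x = tf.random.normal([2, 10, 32, 32, 3])   # (samples, time, rows, cols, channels)

seq = tf.keras.layers.ConvLSTM2D(16, 3, padding='same', return_sequences=True)
print(seq(x).shape)    # (2, 10, 32, 32, 16)

last = tf.keras.layers.ConvLSTM2D(16, 3, padding='same', return_sequences=False)
print(last(x).shape)   # (2, 32, 32, 16)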

1D convolution on flat, one-dimensional data (i.e. no timeseries)

I am training on a dataset in which (some of) the neighboring features exhibit very strong correlations. To help the neural network, I am thinking of adding some 1D convolutions as the first layers. Even though 1D convolutions are mostly used for time-series/NLP data, I see no theoretical reason why they cannot be applied vector-wise to any type of data.
But I am not able to make keras.layers.Conv1D work, since it is apparently designed for time-series data. A minimal reproducible example is the following:
model = keras.Sequential([
    keras.layers.Input(10,),
    keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1"),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss=losses.categorical_crossentropy, metrics=['accuracy'])
ValueError: Input 0 of layer conv_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 10]
Here, I believe the "found ndim=2" corresponds to a tensor of shape [batch_size, 10], while the layer expects a tensor of shape [series_length, batch_size, 10] (or some permutation thereof).
My question is: Is there a way to make 1D convolutions work in this situation in keras?
Note 1: this SO question has the same problem, though without elaborating, and its accepted answer does not solve the problem.
Note 2: I suppose I could convert each datapoint of my dataset to a 2D tensor with two rows, where the second row is all 0's, and use Conv2D, but I would like to avoid that.
All Conv layers in Keras have one dimension reserved for the number of channels. For example, an image has 2 dimensions, but Conv2D needs 3 (excluding the batch dimension), simply because the image can have one channel (grayscale) or, say, 3 (colored). The same is true for a 1D signal, which can have any number of channels. So you can simply add one dimension to your data. If you have a NumPy array:

data = data[:, np.newaxis, :]

and set channels first:

keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1", data_format="channels_first")

You can do the same by adding the extra dimension at the end and setting data_format="channels_last".
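
Here is a minimal runnable sketch of the channels_last variant (the data is randomly generated for illustration):

import numpy as np
from tensorflow import keras

data = np.random.rand(100, 10).astype('float32')   # 100 samples, 10 features
data = data[:, :, np.newaxis]                      # add a channel dim: (100, 10, 1)

model = keras.Sequential([
    keras.layers.Input(shape=(10, 1)),
    keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1"),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])
model.summary()   # conv_1 output: (None, 8, 64)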

CNN features (dimensions) fed to LSTM - Tensorflow

Recently I have been working on a project where I am supposed to take images as input to a CNN, extract the features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction, taking the features from the fully connected layer and trying to feed them to the LSTM. The problem is that when I want to feed the FC layer to the LSTM as input, I get an error about wrong dimensions. My FC layer is a tensor with (128, 1024) dimensions. I tried to reshape it like this, tf.reshape(fc, [-1]), which gives me a tensor of (131072,)
dimensions, and it still won't work. Could anyone give me ideas on how I am supposed to feed the FC layer to the LSTM? Here is part of my code and the error I get:
# Convolution layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get the reshape to work correctly).
If you are taking a sequence of images as input, your input tensor should be 5D: (batch, sequence_index, x, y, channel), or some permutation of that. conv2d should complain about the extra dimension, but you are probably missing one of them. You should fix that first.
Next, use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the pooling if that's not the case).
The RNN part expects 3D tensors (batch, sequence_index, feature_index). You can use tf.squeeze to remove the size-1 dimensions from your 5D tensor and get this 3D tensor; you shouldn't have to reshape anything. See the sketch below.
If you don't use batches, that's OK, but the operations will still expect the dimension to be there (it will just be 1 for you). A missing dimension will cause shape problems down the line.
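
A minimal tf.keras sketch of this approach (the sizes are illustrative assumptions; a Conv3D/MaxPool3D with a depth window of 1 acts per-frame, just like applying a Conv2D to each frame; here the remaining spatial dims are flattened with Reshape rather than pooled to 1x1 and squeezed, but the idea is the same):

import tensorflow as tf

# 5D input: (batch, sequence_index, x, y, channel)
inputs = tf.keras.Input(shape=(10, 28, 28, 1))
x = tf.keras.layers.Conv3D(32, (1, 5, 5), activation='relu')(inputs)  # (10, 24, 24, 32)
x = tf.keras.layers.MaxPool3D((1, 2, 2))(x)                           # (10, 12, 12, 32)
x = tf.keras.layers.Conv3D(64, (1, 3, 3), activation='relu')(x)       # (10, 10, 10, 64)
x = tf.keras.layers.MaxPool3D((1, 2, 2))(x)                           # (10, 5, 5, 64)
# Collapse the spatial dims so each timestep becomes a feature vector,
# giving the 3D (batch, sequence_index, feature_index) tensor the RNN expects.
x = tf.keras.layers.Reshape((10, 5 * 5 * 64))(x)
x = tf.keras.layers.LSTM(128)(x)
outputs = tf.keras.layers.Dense(10)(x)

model = tf.keras.Model(inputs, outputs)
model.summary()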

conv1d and conv2d in tensorflow

For conv2d, assume an input 2D matrix with shape (W, H) and a conv kernel of size (Wk, H), which means the height of the kernel is the same as the height of the input matrix. In this case, can we say that conv1d with kernel size Wk carries out the same computation as conv2d?
For example:
tf.nn.conv2d(
    input=...,
    filter=tf.Variable(tf.truncated_normal([Wk, H, 1, out_dim], stddev=0.1))
)

equals:

tf.layers.conv1d(inputs=..., filters=out_dim, kernel_size=Wk)
They're not the same; the conv2d kernel has many more weights and is going to train differently because of that. Also, depending on what padding is set to, the output size of the conv2d operation may not be 1D either.
tf.nn.conv1d just calls tf.nn.conv2d internally.
This is the description of tf.nn.conv1d:
Internally, this op reshapes the input tensors and invokes tf.nn.conv2d. For example, if data_format does not start with "NC", a tensor of shape [batch, in_width, in_channels] is reshaped to [batch, 1, in_width, in_channels], and the filter is reshaped to [1, filter_width, in_channels, out_channels]. The result is then reshaped back to [batch, out_width, out_channels] (where out_width is a function of the stride and padding as in conv2d) and returned to the caller.
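
To see this concretely, here is a small TF2-style sketch (shapes are made up for illustration) that performs the reshape described above by hand and checks that the results match:

import numpy as np
import tensorflow as tf

batch, width, in_ch, out_ch, k = 2, 10, 3, 4, 3
x = tf.random.normal([batch, width, in_ch])   # [batch, in_width, in_channels]
w = tf.random.normal([k, in_ch, out_ch])      # [filter_width, in_channels, out_channels]

y1 = tf.nn.conv1d(x, w, stride=1, padding='VALID')

# The reshape described in the docs, done manually:
x2 = tf.reshape(x, [batch, 1, width, in_ch])  # [batch, 1, in_width, in_channels]
w2 = tf.reshape(w, [1, k, in_ch, out_ch])     # [1, filter_width, in, out]
y2 = tf.nn.conv2d(x2, w2, strides=1, padding='VALID')
y2 = tf.reshape(y2, [batch, -1, out_ch])      # back to [batch, out_width, out_channels]

print(np.allclose(y1.numpy(), y2.numpy()))    # True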