Time distributed layer keras - tensorflow

I am trying to understand the TimeDistributed layer in Keras/TensorFlow.
As far as I understand, it is a kind of wrapper that makes it possible, for example, to process a sequence of images.
Now I am wondering how I would design a time-distributed network without using the TimeDistributed layer.
For example, suppose I have a sequence of 3 images, each with 1 channel and a size of 256x256 px, that should first be processed by a CNN and then by LSTM cells.
My input to the TimeDistributed layer would then be (N,3,256,256,1), where N is the batch size.
The CNN would then have 3 outputs, which are fed to the LSTM cell.
Now, without using the TimeDistributed layer, would it be possible to accomplish the same thing by setting up a network with 3 different inputs and 3 similar CNNs? The outputs of the 3 CNNs could then be flattened and concatenated.
Is that any different from the TimeDistributed approach?
Thanks in advance.

I created a prototype for you. I used the least number of layers and arbitrary units/kernels/filters, change them as you like. It creates a cnn model first that takes inputs of size (256,256,1). It uses the same cnn model 3 times (for your three images in the sequence) to extract features. It stacks all the features using Lambda layer to put it back in a sequence. The sequence then goes through LSTM layer. I have chosen for the LSTM to return a single feature vector per example, but if you want the output to be a sequence as well, you could change it to say return_sequences=True. You could also add final additional layers to adapt it to your needs.
from tensorflow.keras.layers import Input, LSTM, Conv2D, Flatten, Lambda
from tensorflow.keras import Model
import tensorflow.keras.backend as K

def create_cnn_model():
    inp = Input(shape=(256,256,1))
    x = Conv2D(filters=16, kernel_size=5, strides=2)(inp)
    x = Flatten()(x)
    model = Model(inputs=inp, outputs=x, name='cnn_Model')
    return model

def combined_model():
    cnn_model = create_cnn_model()
    inp_1 = Input(shape=(256,256,1))
    inp_2 = Input(shape=(256,256,1))
    inp_3 = Input(shape=(256,256,1))
    # The same cnn_model instance is called three times, so its weights are shared
    out_1 = cnn_model(inp_1)
    out_2 = cnn_model(inp_2)
    out_3 = cnn_model(inp_3)
    lstm_inp = [out_1, out_2, out_3]
    # Stack the three feature vectors into a (batch, 3, features) sequence
    lstm_inp = Lambda(lambda x: K.stack(x, axis=-2))(lstm_inp)
    x = LSTM(units=32, return_sequences=False)(lstm_inp)
    model = Model(inputs=[inp_1, inp_2, inp_3], outputs=x)
    return model
Now create the model as such:
model = combined_model()
Check the summary:
model.summary()
which will print:
Model: "model_14"
Layer (type) Output Shape Param # Connected to
input_53 (InputLayer) [(None, 256, 256, 1) 0
input_54 (InputLayer) [(None, 256, 256, 1) 0
input_55 (InputLayer) [(None, 256, 256, 1) 0
cnn_Model (Model) (None, 254016) 416 input_53[0][0]
lambda_3 (Lambda) (None, 3, 254016) 0 cnn_Model[1][0]
lstm_13 (LSTM) (None, 32) 32518272 lambda_3[0][0]
Total params: 32,518,688
Trainable params: 32,518,688
Non-trainable params: 0
The inner cnn model summary could be printed:
which currently prints:
Model: "cnn_Model"
Layer (type) Output Shape Param #
input_52 (InputLayer) [(None, 256, 256, 1)] 0
conv2d_10 (Conv2D) (None, 126, 126, 16) 416
flatten_6 (Flatten) (None, 254016) 0
Total params: 416
Trainable params: 416
Non-trainable params: 0
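The numbers in the two summaries above can be reproduced by hand. The sketch below is only arithmetic, not Keras code; it assumes 'valid' padding for the Conv2D layer (the Keras default) and the usual LSTM parameter count with bias, 4 * ((input_dim + units) * units + units).

```python
def conv_out_size(size, kernel, stride):
    # Spatial size after a 'valid' convolution: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1

side = conv_out_size(256, kernel=5, stride=2)   # height and width after Conv2D
conv_params = 5 * 5 * 1 * 16 + 16               # kernel weights + one bias per filter
features = side * side * 16                     # length of the flattened vector

units = 32
# Four gates, each with an input kernel, a recurrent kernel and a bias
lstm_params = 4 * ((features + units) * units + units)

print(side, conv_params, features, lstm_params, conv_params + lstm_params)
```

The very large LSTM parameter count comes almost entirely from the 254016-long flattened feature vector, which is why pooling or more strided convolutions before Flatten are common in practice.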
Your model expects a list as input. The list should have a length of 3 (since there are 3 images in a sequence). Each element of the list should be a numpy array of shape (batch_size, 256, 256, 1). I have worked a dummy example below with a batch size of 1:
import numpy as np
a = np.zeros((256,256,1)) # first image filled with zeros
b = np.zeros((256,256,1)) # second image filled with zeros
c = np.zeros((256,256,1)) # third image filled with zeros
a = np.expand_dims(a, 0) # adding batch dimension to make it (1, 256, 256, 1)
b = np.expand_dims(b, 0) # same here
c = np.expand_dims(c, 0) # same here
model.compile(loss='mse', optimizer='adam')
# train your model with model.fit(....)
e = model.predict([a,b,c]) # a,b and c have shape of (1, 256, 256, 1) where the first 1 is the batch size
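On the original question of whether this differs from TimeDistributed: because the same cnn_model instance (same weights) is applied to every image, the result matches applying one network to each time step, which is what TimeDistributed does conceptually. Three CNNs with separate weights would be a different model. The NumPy sketch below illustrates the equivalence with a toy stand-in for the CNN (toy_cnn and its weight matrix W are made up for illustration and the images are shrunk to 16x16); it also shows how a (N, 3, ...) array can be split into the three-element list the model above expects.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16 * 16, 4))  # one shared weight matrix for the toy "CNN"

def toy_cnn(images):
    # images: (batch, 16, 16, 1) -> features: (batch, 4)
    flat = images.reshape(images.shape[0], -1)
    return np.tanh(flat @ W)

seq = rng.normal(size=(2, 3, 16, 16, 1))  # (N, time, height, width, channels)

# Three-input style: one array per time step, same function applied to each,
# results stacked back onto a time axis.
frames = [seq[:, t] for t in range(seq.shape[1])]
stacked = np.stack([toy_cnn(f) for f in frames], axis=1)

# TimeDistributed style (conceptually): fold time into the batch axis,
# apply the function once, then unfold.
n, t = seq.shape[:2]
folded = toy_cnn(seq.reshape(n * t, *seq.shape[2:])).reshape(n, t, -1)

print(stacked.shape, np.allclose(stacked, folded))
```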


Mobilenet: Transfer learning with Gradcam

I am a newbie to all this so please be kind to this question :)
What I am trying to do is train a Mobilenet classifier using the transfer learning technique and then implement the Gradcam technique to understand what my model is looking into.
I created a model
input_layer = tf.keras.layers.Input(shape=IMG_SHAPE)
x = preprocess_input(input_layer)
y = base_model(x)
y = tf.keras.layers.GlobalAveragePooling2D()(y)
y = tf.keras.layers.Dropout(0.2)(y)
outputs = tf.keras.layers.Dense(5)(y)
model = tf.keras.Model(inputs=input_layer, outputs=outputs)
model summary:
Model: "functional_2"
Layer (type) Output Shape Param #
input_3 (InputLayer) [(None, 224, 224, 3)] 0
tf_op_layer_RealDiv_1 (Tenso [(None, 224, 224, 3)] 0
tf_op_layer_Sub_1 (TensorFlo [(None, 224, 224, 3)] 0
mobilenetv2_1.00_224 (Functi (None, 7, 7, 1280) 2257984
global_average_pooling2d_1 ( (None, 1280) 0
dropout_1 (Dropout) (None, 1280) 0
dense_1 (Dense) (None, 5) 6405
Total params: 2,264,389
Trainable params: 6,405
Non-trainable params: 2,257,984
I passed it to the Grad-CAM algorithm, but the algorithm is not able to find the last convolutional layer.
Plausible solution:
If, instead of the encapsulated 'mobilenetv2_1.00_224' layer, the model contained the unwrapped MobileNet layers, the Grad-CAM algorithm would be able to find that last layer.
However, I am not able to create a model in which the data augmentation and preprocessing layers are added in front of the unwrapped MobileNet layers.
Thanks in advance
@skruff see if this helps
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    # First, we create a model that maps the input image to the activations
    # of the last conv layer as well as the output predictions
    grad_model = tf.keras.models.Model(
        [model.inputs], [model.get_layer(last_conv_layer_name).output, model.output]
    )
    # Then, we compute the gradient of the top predicted class for our input image
    # with respect to the activations of the last conv layer
    with tf.GradientTape() as tape:
        last_conv_layer_output, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])
        class_channel = preds[:, pred_index]
    # This is the gradient of the output neuron (top predicted or chosen)
    # with regard to the output feature map of the last conv layer
    grads = tape.gradient(class_channel, last_conv_layer_output)
    # This is a vector where each entry is the mean intensity of the gradient
    # over a specific feature map channel
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    # We multiply each channel in the feature map array
    # by "how important this channel is" with regard to the top predicted class
    # then sum all the channels to obtain the heatmap class activation
    last_conv_layer_output = last_conv_layer_output[0]
    heatmap = last_conv_layer_output @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)
    # For visualization purpose, we will also normalize the heatmap between 0 & 1
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()
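The step that turns activations and gradients into a heatmap is easy to check in isolation. The NumPy sketch below uses random arrays in place of real activations and gradients, with the (7, 7, 1280) feature-map shape taken from the model summary in the question; it only illustrates that the matrix product with the pooled gradients is a per-channel weighted sum, and is not a replacement for the function above.

```python
import numpy as np

rng = np.random.default_rng(0)
conv_out = rng.normal(size=(7, 7, 1280))   # activations for one image (H, W, C)
grads = rng.normal(size=(1, 7, 7, 1280))   # gradient of the class score w.r.t. them

# One importance weight per channel: mean gradient over batch and spatial axes
pooled_grads = grads.mean(axis=(0, 1, 2))                  # shape (C,)

# Matrix product form, as in make_gradcam_heatmap
raw = (conv_out @ pooled_grads[..., np.newaxis]).squeeze()  # shape (H, W)

# The same thing written as an explicit weighted sum over channels
explicit = (conv_out * pooled_grads).sum(axis=-1)

# Keep positive evidence only and scale so the largest value is 1
heatmap = np.maximum(raw, 0) / raw.max()

print(raw.shape, np.allclose(raw, explicit), heatmap.min(), heatmap.max())
```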

Dimension of output in Dense layer Keras

I have the following sample model
from tensorflow.keras import models
from tensorflow.keras import layers
sample_model = models.Sequential()
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
optimizer="adam", metrics = ["accuracy"])
Input for the model:
sam_x = np.random.rand(10,4)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
My confusion is that fit should have thrown a shape-mismatch error, because the expected input shape for the 2nd Dense layer is given as (None,44), but the output of the 1st Dense layer (which is the input of the 2nd Dense layer) has shape (None,32). Yet it ran successfully.
I don't understand why there was no error. Any clarification would be helpful.
The input_shape keyword argument has an effect only on the first layer of a Sequential. The shape of the input of the other layers will be derived from their previous layer.
That behaviour is hinted at in the doc of tf.keras.layers.InputLayer:
When using InputLayer with Keras Sequential model, it can be skipped by moving the input_shape parameter to the first layer after the InputLayer.
And in the Sequential Model guide.
The behaviour can be confirmed by looking at the source of the Sequential.add method:
if not self._layers:
  if isinstance(layer, input_layer.InputLayer):
    # Case where the user passes an Input or InputLayer layer via `add`.
    set_inputs = True
  else:
    batch_shape, dtype = training_utils.get_input_shape_and_dtype(layer)
    if batch_shape:
      # Instantiate an input layer.
      x = input_layer.Input(
          batch_shape=batch_shape, dtype=dtype, name=layer.name + '_input')
      # This will build the current layer
      # and create the node connecting the current layer
      # to the input layer we just created.
      layer(x)
      set_inputs = True
If there are no layers in the model yet, an Input is added to the model with the shape derived from the first layer. This is done only when the model does not yet contain any layer.
That shape is either fully known (if input_shape has been passed to the first layer of the model) or will be fully known once the model is built (for example, with a call to model.build(input_shape)).
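The effect on the model from the question can be mimicked without Keras. The NumPy sketch below is a toy imitation of the shape inference described here, not the Keras implementation: only the first declared input size is used, and every later layer takes its input size from the previous layer's output. The helper dense_weights and the declared list are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_weights(in_dim, units):
    # Kernel of shape (in_dim, units) and bias of shape (units,)
    return rng.normal(size=(in_dim, units)), np.zeros(units)

declared = [(32, 4), (16, 44)]   # (units, declared input size) per layer
in_dim = declared[0][1]          # only the first declaration is used

weights = []
for units, _ignored_input_size in declared:
    weights.append(dense_weights(in_dim, units))
    in_dim = units               # next layer's input is this layer's output

sam_x = rng.random((10, 4))
out = sam_x
for kernel, bias in weights:
    out = out @ kernel + bias

param_counts = [k.size + b.size for k, b in weights]
print([k.shape for k, _ in weights], out.shape, param_counts)
```

The second kernel ends up with shape (32, 16), not (44, 16), which is why no shape mismatch is raised for the (10, 4) input.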
The thing is, after taking the model's input shape from the first layer, Keras won't check or otherwise use any other input shape declared inside that same model. For example, if you write your model the following way:
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.add(layers.Dense(8, input_shape = (32,)))
The program always uses the first declared input shape and discards the rest. So, if your first layer starts with input_shape = (44,), you need to pass inputs with exactly that number of features, such as:
sam_x = np.random.rand(10,44)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
Additionally, if you look at the Functional API, unlike the Sequential model, you must create and define a standalone Input layer that specifies the shape of input data. It's not learnable but simply a spec layer. It's a kind of gateway of the input data for the model. That means even if we define input_shape inside the other layers, they all will be discarded. For example:
inputs = keras.Input(shape=(4,))
dense = layers.Dense(64, input_shape=(8,)) # input_shape is discarded
x = dense(inputs)
x = layers.Dense(64, input_shape=(16,))(x) # input_shape is discarded
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
Here is a more complex example with Conv2D and MNIST.
encoder_input = keras.Input(shape=(28, 28, 1),)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[32,32,3])(encoder_input)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[64,64,3])(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[224,321,3])(x)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[420,32,3])(x)
x = layers.GlobalMaxPooling2D()(x)
out = layers.Dense(10, activation='softmax')(x)
encoder = keras.Model(encoder_input, out, name="encoder")
Model: "encoder"
Layer (type) Output Shape Param #
input_15 (InputLayer) [(None, 28, 28, 1)] 0
conv2d_8 (Conv2D) (None, 26, 26, 16) 160
conv2d_9 (Conv2D) (None, 24, 24, 32) 4640
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32) 0
conv2d_10 (Conv2D) (None, 6, 6, 32) 9248
conv2d_11 (Conv2D) (None, 4, 4, 16) 4624
global_max_pooling2d_2 (Glob (None, 16) 0
dense_56 (Dense) (None, 10) 170
Total params: 18,842
Trainable params: 18,842
Non-trainable params: 0
def pre_process(image, label):
    # Scale pixels to [0, 1), add a channel axis, and one-hot encode the labels
    return ((image / 256)[...,None].astype('float32'),
            tf.keras.utils.to_categorical(label, num_classes=10))
(x, y), (_, _) = tf.keras.datasets.mnist.load_data('mnist')
x, y = pre_process(x, y)
encoder.compile(
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = tf.keras.metrics.CategoricalAccuracy(),
    optimizer = tf.keras.optimizers.Adam())
encoder.fit(x, y, batch_size=256)
4s 14ms/step - loss: 1.4303 - categorical_accuracy: 0.5279
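To see what the preprocessing step hands to the model, here is a small NumPy sketch. It uses random uint8 arrays instead of the real MNIST data, and np.eye(10)[labels] as a stand-in for tf.keras.utils.to_categorical; both are assumptions made so the snippet runs without TensorFlow.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)  # fake MNIST batch
labels = np.array([3, 0, 9, 1, 3])

# Scale to [0, 1) and add the trailing channel axis expected by Conv2D
x = (images / 256)[..., None].astype("float32")

# One-hot encode the labels (stand-in for to_categorical)
y = np.eye(10, dtype="float32")[labels]

print(x.shape, x.dtype, y.shape)
```

The resulting x matches the (28, 28, 1) shape declared by keras.Input, which is the only input shape the model actually uses.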
I think Keras will create (or at least prepare to create) an additional Input layer, but because the second Dense layer is added with model.add(), it is automatically connected to the previous layer; the extra input layer therefore stays unconnected and is not part of the model.
(I agree that it would be nice of Keras to hint at unconnected layers, I sometimes created unconnected layers when using the functional API and changed the inputs. Keras doesn't remind me that I had jumped several layers, I just wondered why the summary() was so short...)

Variable Number of channels

I need a convolutional layer which outputs a variable number of channels, depending on the input.
conv2d(filters = variable_number)
A model cannot have a varying number of filters depending on the input. The following properties of the model need to be fixed for it to be trained:
Name and type of all layers in the model.
Output shape for each layer.
Number of weight parameters of each layer.
The inputs each layer receives.
The total number of trainable and non-trainable parameters of the model.
If the number of channels varies, the model architecture changes for different inputs, and all of the points listed above are affected.
You can build a model with all parameters fixed and later apply dropout to the layer based on the input. But dropout is a regularization technique: simply put, it means ignoring a randomly chosen set of units (i.e. neurons) during the training phase. By "ignoring", I mean these units are not considered during a particular forward or backward pass.
The most appropriate solution would be:
Build multiple input layers for the different inputs.
Concatenate all these layers, but make sure their output shapes are the same in the case of convolution layers, otherwise concatenate throws an error.
Add the remaining layers of the model.
Below is an example of this:
from keras.models import Model
from keras.layers import Input, concatenate, Conv2D, ZeroPadding2D, Dense
from keras.optimizers import Adagrad
import tensorflow.keras.backend as K
import tensorflow as tf
input_img1 = Input(shape=(44,44,3))
x1 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img1)
input_img2 = Input(shape=(34,34,3))
x2 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img2)
# Zero Padding of 5 at the top, bottom, left and right side of an image tensor
x3 = ZeroPadding2D(padding = (5,5))(x2)
# Concatenate works as layers have same size output
x4 = concatenate([x1,x3])
output = Dense(18, activation='relu')(x4)
model = Model(inputs=[input_img1,input_img2], outputs=output)
Output -
Model: "model_22"
Layer (type) Output Shape Param # Connected to
input_91 (InputLayer) (None, 34, 34, 3) 0
input_90 (InputLayer) (None, 44, 44, 3) 0
conv2d_73 (Conv2D) (None, 34, 34, 3) 84 input_91[0][0]
conv2d_72 (Conv2D) (None, 44, 44, 3) 84 input_90[0][0]
zero_padding2d_14 (ZeroPadding2 (None, 44, 44, 3) 0 conv2d_73[0][0]
concatenate_30 (Concatenate) (None, 44, 44, 6) 0 conv2d_72[0][0]
dense_47 (Dense) (None, 44, 44, 18) 126 concatenate_30[0][0]
Total params: 294
Trainable params: 294
Non-trainable params: 0
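The shapes and parameter counts in this summary can be checked with plain NumPy. In the sketch below, arrays of ones stand in for the two convolution outputs and np.pad plays the role of ZeroPadding2D; it is a shape check only, with no trainable layers involved.

```python
import numpy as np

x1 = np.ones((2, 44, 44, 3))   # stands in for the first conv output
x2 = np.ones((2, 34, 34, 3))   # stands in for the second conv output

# Pad 5 zeros on each spatial side; batch and channel axes are left alone
x3 = np.pad(x2, ((0, 0), (5, 5), (5, 5), (0, 0)))

# Spatial sizes now agree, so the channel axes can be joined
x4 = np.concatenate([x1, x3], axis=-1)

conv_params = 3 * 3 * 3 * 3 + 3    # 3x3 kernel, 3 input channels, 3 filters, plus biases
dense_params = 6 * 18 + 18         # Dense acts on the last axis (6 channels)

print(x3.shape, x4.shape, 2 * conv_params + dense_params)
```

Without the padding step, np.concatenate raises an error for the mismatched 34 and 44 spatial sizes, which mirrors the error mentioned in the steps above.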
Hope this answers your question. Happy Learning.

how to save, restore, make predictions with siamese network (with triplet loss)

I am trying to develop a siamese network for simple face verification (and recognition in the second stage). I have a network in place that I managed to train, but I am a bit puzzled about how to save and restore the model and how to make predictions with the trained model. I am hoping that someone experienced in the domain can help me make progress.
Here is how I create my siamese network, to begin with...
model = ResNet50(weights='imagenet') # get the original ResNet50 model
model.layers.pop() # Remove the last layer
for layer in model.layers:
    layer.trainable = False # do not train any of original layers
x = model.get_layer('flatten_1').output
model_out = Dense(128, activation='relu', name='model_out')(x)
model_out = Lambda(lambda x: K.l2_normalize(x,axis=-1))(model_out)
new_model = Model(inputs=model.input, outputs=model_out)
# At this point, a new layer (with 128 units) added and normalization applied.
# Now create siamese network on top of this
anchor_in = Input(shape=(224, 224, 3))
positive_in = Input(shape=(224, 224, 3))
negative_in = Input(shape=(224, 224, 3))
anchor_out = new_model(anchor_in)
positive_out = new_model(positive_in)
negative_out = new_model(negative_in)
merged_vector = concatenate([anchor_out, positive_out, negative_out], axis=-1)
# Define the trainable model
siamese_model = Model(inputs=[anchor_in, positive_in, negative_in],
                      outputs=merged_vector)
And I train the siamese_model. When I train it, if I interpret results right, it is not really training the underlying model, it just trains the new siamese network (essentially, just the last layer is trained).
But this model has 3 input streams. After the training, I need to save this model in a way so that it just takes 1 or 2 inputs so that I can perform predictions by calculating the distance between 2 given images. How do I save this model and reuse it now?
Thank you in advance!
In case you wonder, here is the summary of siamese model.
Layer (type) Output Shape Param # Connected to
input_2 (InputLayer) (None, 224, 224, 3) 0
input_3 (InputLayer) (None, 224, 224, 3) 0
input_4 (InputLayer) (None, 224, 224, 3) 0
model_1 (Model) (None, 128) 23849984 input_2[0][0]
concatenate_1 (Concatenate) (None, 384) 0 model_1[1][0]
Total params: 23,849,984
Trainable params: 262,272
Non-trainable params: 23,587,712
You can use below code to save your model
And then to load your model you need to use
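For the prediction part of the question, note that the three branches share a single sub-model (new_model in the question), which by itself takes one image and returns one 128-dimensional embedding. Comparing two images then comes down to a distance between two embeddings. The NumPy sketch below works on random stand-in embeddings rather than real model output; the helper names and the threshold value are hypothetical and would need tuning on validation data.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Same idea as K.l2_normalize along the last axis
    norm = np.sqrt(np.maximum((v ** 2).sum(axis=-1, keepdims=True), eps))
    return v / norm

def distance(a, b):
    # Euclidean distance between corresponding rows of a and b
    return np.sqrt(((a - b) ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
# Toy stand-in for the siamese output: 4 triplets, 3 x 128 values per row
merged = l2_normalize(rng.normal(size=(4, 3, 128))).reshape(4, 384)

# Undo the concatenation: anchor, positive and negative embeddings
anchor, positive, negative = np.split(merged, 3, axis=-1)

threshold = 1.0  # hypothetical decision threshold
is_same = distance(anchor, positive) < threshold

print(anchor.shape, is_same.shape)
```

Because the embeddings are unit length, the distance always lies between 0 and 2, which makes a fixed threshold meaningful.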

Understanding Keras model architecture (tensor index)

This script, which defines a dummy model using the functional API,
from keras.layers import Input, Dense
from keras.models import Model
import keras
inputs = Input(shape=(100,), name='A_input')
x = Dense(20, activation='relu', name='B_dense')(inputs)
shared_l = Dense(20, activation='relu', name='C_dense_shared')
x = keras.layers.concatenate([shared_l(x), shared_l(x)], name='D_concat')
model = Model(inputs=inputs, outputs=x)
yields the following output
Layer (type) Output Shape Param # Connected to
A_input (InputLayer) (None, 100) 0
B_dense (Dense) (None, 20) 2020 A_input[0][0]
C_dense_shared (Dense) (None, 20) 420 B_dense[0][0]
D_concat (Concatenate) (None, 40) 0 C_dense_shared[0][0]
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
In this case C_dense_shared has two nodes, and D_concat is connected to both of them (C_dense_shared[0][0] and C_dense_shared[1][0]). So the first index (the node_index) is clear to me. But what does the second index mean? From the source code I read that this is the tensor_index:
But what does the tensor_index mean? And in what situations can it have a value different from 0?
I think the docstring of the Node class makes it quite clear:
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
tensor_index will be nonzero if a layer has multiple output tensors. It's different from the situation of multiple "datastreams" (e.g. layer sharing), where layers have multiple outbound nodes. For example, LSTM layer will return 3 tensors if given return_state=True:
Hidden state of the last time step, or all hidden states if return_sequences=True
Hidden state of the last time step
Memory cell of the last time step
As another example, feature transformation can be implemented as a Lambda layer:
def generate_powers(x):
    return [x, K.sqrt(x), K.square(x)]
model_input = Input(shape=(10,))
powers = Lambda(generate_powers)(model_input)
x = Concatenate()(powers)
x = Dense(10, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(model_input, x)
From model.summary(), you can see that concatenate_5 is connected to lambda_7[0][0], lambda_7[0][1] and lambda_7[0][2]:
Layer (type) Output Shape Param # Connected to
input_7 (InputLayer) (None, 10) 0
lambda_7 (Lambda) [(None, 10), (None, 1 0 input_7[0][0]
concatenate_5 (Concatenate) (None, 30) 0 lambda_7[0][0]
dense_8 (Dense) (None, 10) 310 concatenate_5[0][0]
dense_9 (Dense) (None, 1) 11 dense_8[0][0]
Total params: 321
Trainable params: 321
Non-trainable params: 0
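The two indices can be pictured with a small toy model of the bookkeeping. ToyLayer below is a made-up class for illustration and is not how Keras implements layers or nodes; it only assumes that every call to a layer creates a new node and that each node can expose several output tensors.

```python
class ToyLayer:
    """Toy stand-in for a layer; not the Keras implementation."""

    def __init__(self, name, n_outputs=1):
        self.name = name
        self.n_outputs = n_outputs
        self.calls = 0  # number of nodes created so far

    def __call__(self, *inputs):
        node_index = self.calls   # one node per call of the layer
        self.calls += 1
        # One label per output tensor produced by this call
        return [f"{self.name}[{node_index}][{t}]" for t in range(self.n_outputs)]

# Layer sharing: two calls, so two nodes, each with a single output tensor
shared = ToyLayer("C_dense_shared")
first = shared("B_dense[0][0]")
second = shared("B_dense[0][0]")

# Multiple outputs: one call, so one node, with three output tensors
powers = ToyLayer("lambda_7", n_outputs=3)("input_7[0][0]")

print(first + second)
print(powers)
```

In this picture, sharing a layer increases the first index, while returning several tensors from one call increases the second.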