How to make input layer explicit in tf.keras - tensorflow

This question makes use of a pre-trained VGG network, whose summary shows an InputLayer being used. I like the clarity of the explicit input layer... even if functionally it does nothing (true?). But when I try to mimic this with something like:
model = Sequential()
model.add(Input(shape=(128, 128, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
the result displayed using print(model.summary()) is no different from:
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)))
... and both show the first layer as being a Conv2D layer. Where did my Input layer go? And is it worth the hassle of getting it back?

Your example uses a Sequential, which does not list the input layer in its summary. Try building the model with keras.models.Model instead:
inp = keras.layers.Input((128, 128, 3))
op = keras.layers.Conv2D(32, (3, 3), activation='relu')(inp)
model = keras.models.Model(inputs=[inp], outputs=[op])
model.summary()
Model: "model_1"
Layer (type) Output Shape Param #
input_2 (InputLayer) [(None, 128, 128, 3)] 0
conv2d_4 (Conv2D) (None, 126, 126, 32) 896
Total params: 896
Trainable params: 896
Non-trainable params: 0
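To see the difference side by side, here is a minimal sketch, assuming TensorFlow 2.x with tf.keras (the names seq and func are only illustrative). It builds the same network both ways and lists the layer types each model exposes. The input layer is functionally a placeholder for the shape in both cases, so the output shapes should match.

```python
from tensorflow import keras

# Sequential: the shape is declared with keras.Input, but `layers` only lists Conv2D.
seq = keras.Sequential([
    keras.Input(shape=(128, 128, 3)),
    keras.layers.Conv2D(32, (3, 3), activation='relu'),
])

# Functional API: the InputLayer is an explicit first entry in `layers`.
inp = keras.Input(shape=(128, 128, 3))
out = keras.layers.Conv2D(32, (3, 3), activation='relu')(inp)
func = keras.Model(inputs=inp, outputs=out)

seq_layer_types = [type(layer).__name__ for layer in seq.layers]
func_layer_types = [type(layer).__name__ for layer in func.layers]
print(seq_layer_types)
print(func_layer_types)
print(seq.output_shape, func.output_shape)
```

If the explicit InputLayer row in the summary matters to you, the functional form is the one that shows it. Otherwise the two models behave the same.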

No, you can keep them separate, it does not make any difference.
As for the input_shape, that argument can be specified for each and every layer, yet Keras is smart enough to deduce on its own the shape of the next layers so we do not mention it explicitly.


Dimension of output in Dense layer Keras

I have the following sample model:
from tensorflow.keras import models
from tensorflow.keras import layers
sample_model = models.Sequential()
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.compile(..., optimizer="adam", metrics=["accuracy"])
Input for the model:
sam_x = np.random.rand(10,4)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x, sam_y)
The confusion is the fit should have thrown an error of shape mismatch as the expected_input_shape for the 2nd Dense Layer is given as (None,44) but the output for the 1st Dense Layer (which is the input of the 2nd Dense Layer) will be of shape (None,32). But it ran successfully.
I don't understand why there was no error. Any clarifications will be helpful
The input_shape keyword argument has an effect only on the first layer of a Sequential. The shape of the input of the other layers will be derived from their previous layer.
That behaviour is hinted at in the doc of tf.keras.layers.InputLayer:
When using InputLayer with Keras Sequential model, it can be skipped by moving the input_shape parameter to the first layer after the InputLayer.
And in the Sequential Model guide.
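A quick way to check this yourself is to inspect the weights of the second layer. The sketch below assumes TensorFlow 2.x with tf.keras; the variable names are illustrative, and newer Keras releases may print a warning about passing input_shape to a layer. If the (44,) declaration were honoured, the second kernel would have 44 rows; instead it is sized from the 32 outputs of the first layer.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Dense(32, input_shape=(4,)))    # first layer: input_shape is used
model.add(layers.Dense(16, input_shape=(44,)))   # later layer: input_shape is ignored

# The second layer's kernel is sized from the previous layer's 32 outputs, not from 44.
second_kernel_shape = tuple(model.layers[1].kernel.shape)
print(model.input_shape, second_kernel_shape)

# A batch with 4 features passes through without any shape error.
pred_shape = model.predict(np.random.rand(10, 4), verbose=0).shape
print(pred_shape)
```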
The behaviour can be confirmed by looking at the source of the Sequential.add method:
if not self._layers:
  if isinstance(layer, input_layer.InputLayer):
    # Case where the user passes an Input or InputLayer layer via `add`.
    set_inputs = True
  else:
    batch_shape, dtype = training_utils.get_input_shape_and_dtype(layer)
    if batch_shape:
      # Instantiate an input layer.
      x = input_layer.Input(
          batch_shape=batch_shape, dtype=dtype, name=layer.name + '_input')
      # This will build the current layer
      # and create the node connecting the current layer
      # to the input layer we just created.
      layer(x)
      set_inputs = True
If there are no layers in the model yet, an Input is added to the model with the shape derived from the first layer. This happens only for the first layer added to the model.
That shape is either fully known (if input_shape has been passed to the first layer of the model) or will be fully known once the model is built (for example, with a call to
After reading the input shape from the first layer, Keras does not check or use any other input_shape declared inside the same model. For example, if you write your model the following way:
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.add(layers.Dense(8, input_shape = (32,)))
The program will always check the first declared input shape layer and discard the rest. So, if you start your first layer with input_shape = (44,), you need to pass exact feature numbers to your model as input such as:
sam_x = np.random.rand(10,44)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x, sam_y)
Additionally, if you look at the Functional API, unlike the Sequential model, you must create and define a standalone Input layer that specifies the shape of input data. It's not learnable but simply a spec layer. It's a kind of gateway of the input data for the model. That means even if we define input_shape inside the other layers, they all will be discarded. For example:
inputs = keras.Input(shape=(4,))
dense = layers.Dense(64, input_shape=(8,)) # input_shape is discarded
x = dense(inputs)
x = layers.Dense(64, input_shape=(16,))(x) # input_shape is discarded
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
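The same check works for the functional model. This is a small sketch assuming TensorFlow 2.x with tf.keras (kernel_shapes is an illustrative name): the kernel of each Dense layer follows the tensor it is actually called on, so the stray input_shape values never show up in the weights.

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(4,))
x = layers.Dense(64, input_shape=(8,))(inputs)   # input_shape has no effect here
x = layers.Dense(64, input_shape=(16,))(x)       # nor here
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# Skip the InputLayer (index 0) and read the kernel shape of each Dense layer.
kernel_shapes = [tuple(layer.kernel.shape) for layer in model.layers[1:]]
print(model.input_shape, kernel_shapes)
```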
Here is a more complex example with Conv2D and MNIST.
encoder_input = keras.Input(shape=(28, 28, 1),)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[32,32,3])(encoder_input)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[64,64,3])(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[224,321,3])(x)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[420,32,3])(x)
x = layers.GlobalMaxPooling2D()(x)
out = layers.Dense(10, activation='softmax')(x)
encoder = keras.Model(encoder_input, out, name="encoder")
Model: "encoder"
Layer (type) Output Shape Param #
input_15 (InputLayer) [(None, 28, 28, 1)] 0
conv2d_8 (Conv2D) (None, 26, 26, 16) 160
conv2d_9 (Conv2D) (None, 24, 24, 32) 4640
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32) 0
conv2d_10 (Conv2D) (None, 6, 6, 32) 9248
conv2d_11 (Conv2D) (None, 4, 4, 16) 4624
global_max_pooling2d_2 (Glob (None, 16) 0
dense_56 (Dense) (None, 10) 170
Total params: 18,842
Trainable params: 18,842
Non-trainable params: 0
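The shapes and parameter counts in this summary follow entirely from the (28, 28, 1) input, which is another way to see that the declared input_shape values play no role. The plain-Python sketch below redoes the arithmetic; conv_out and conv_params are hypothetical helper names, and it assumes 'valid' padding, square kernels, and a pooling stride equal to the pool size (the Keras defaults used above).

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' convolution or pooling window along one axis."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D."""
    return kernel * kernel * in_channels * filters + filters

size = 28
sizes = []
# conv, conv, max-pool (window 3, stride 3), conv, conv
for kernel, stride in [(3, 1), (3, 1), (3, 3), (3, 1), (3, 1)]:
    size = conv_out(size, kernel, stride)
    sizes.append(size)

params = [
    conv_params(3, 1, 16),
    conv_params(3, 16, 32),
    conv_params(3, 32, 32),
    conv_params(3, 32, 16),
    16 * 10 + 10,  # final Dense layer: 16 inputs, 10 units, 10 biases
]
print(sizes, params, sum(params))
```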
def pre_process(image, label):
    # scale pixels, add a channel axis, and one-hot encode the labels
    return ((image / 256)[..., None].astype('float32'),
            tf.keras.utils.to_categorical(label, num_classes=10))
(x, y), (_, _) = tf.keras.datasets.mnist.load_data('mnist')
x, y = pre_process(x, y)
encoder.compile(loss=tf.keras.losses.CategoricalCrossentropy(),
                metrics=tf.keras.metrics.CategoricalAccuracy(),
                optimizer=tf.keras.optimizers.Adam())
encoder.fit(x, y, batch_size=256)
4s 14ms/step - loss: 1.4303 - categorical_accuracy: 0.5279
I think Keras creates (or reserves the option to create) an additional input layer for the second Dense layer. But because that layer is added using model.add(), it is automatically connected to the layer before it, so the extra input layer stays unconnected and is not part of the model.
(I agree that it would be nice if Keras hinted at unconnected layers. I have sometimes created unconnected layers when using the functional API and changing the inputs. Keras did not remind me that I had skipped several layers; I just wondered why the summary() was so short...)

What is the correct way to upsample a [32x32x6] layer in a CNN

I have a CNN that produces a [32x32] image with 6 channels, but I need to upsample it to 256x256. I'm doing:
def upsample(filters, size):
initializer = tf.random_normal_initializer(0., 0.02)
result = tf.keras.Sequential()
result.add(tf.keras.layers.Conv2DTranspose(filters, size, strides=2,
return result
Then I pass the layer like this:
up_stack = [
    upsample(6, 3),  # x2
    upsample(6, 3),  # x2
    upsample(6, 3),  # x2
]
for up in up_stack:
    finalLayer = up(finalLayer)
But this setup produces inaccurate results. Is there anything I'm doing wrong?
Your other option would be to use tf.keras.layers.UpSampling2D, but that does not learn a kernel to upsample (it uses fixed interpolation: nearest-neighbour by default, or bilinear).
So, your approach is correct. But you have used a 3x3 kernel_size.
It should be 2x2, and if you are not satisfied with the results, you should increase the number of filters to somewhere in the range [32, 256].
If you wish to use the up-convolution, I suggest the following to achieve what you want. The following code works; just change the number of filters based on your needs.
import tensorflow as tf
from tensorflow.keras import layers
# in = 32x32 out 256x256
inputs = layers.Input(shape=(32, 32, 6))
deconc01 = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(inputs)
deconc02 = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(deconc01)
outputs = layers.Conv2DTranspose(256, kernel_size=2, strides=(2, 2), activation='relu')(deconc02)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="up-conv")
Model: "up-conv"
Layer (type) Output Shape Param #
input_1 (InputLayer) [(None, 32, 32, 6)] 0
conv2d_transpose (Conv2DTran (None, 64, 64, 256) 6400
conv2d_transpose_1 (Conv2DTr (None, 128, 128, 256) 262400
conv2d_transpose_2 (Conv2DTr (None, 256, 256, 256) 262400
Total params: 531,200
Trainable params: 531,200
Non-trainable params: 0
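To see why the kernel size matters for hitting exactly 256x256, the output length of a transposed convolution can be worked out by hand. The sketch below is plain Python with hypothetical helper names. It uses the length rule that, to my understanding, Keras applies when no output_padding or dilation is set: size * stride for 'same' padding, and size * stride + max(kernel - stride, 0) for 'valid' padding.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output length of Conv2DTranspose along one axis (no output_padding, dilation 1)."""
    if padding == 'same':
        return size * stride
    return size * stride + max(kernel - stride, 0)  # 'valid'

def upsample_three_times(size, kernel, stride, padding):
    """Apply three identical transposed convolutions in a row."""
    for _ in range(3):
        size = conv_transpose_out(size, kernel, stride, padding)
    return size

k3_valid = upsample_three_times(32, 3, 2, 'valid')
k3_same = upsample_three_times(32, 3, 2, 'same')
k2_valid = upsample_three_times(32, 2, 2, 'valid')
print(k3_valid, k3_same, k2_valid)
```

Under these assumptions, a 3x3 kernel with stride 2 only doubles the size exactly when padding='same' is used; with the default 'valid' padding each step adds one extra pixel. A 2x2 kernel with stride 2 doubles exactly either way, which matches the shapes in the summary above.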

Trouble understanding parts of concepts in creating custom callbacks in keras

import keras
import numpy as np

class ActivationLogger(keras.callbacks.Callback):
    def set_model(self, model):
        self.model = model  # inform the callback of what model we will be calling
        layer_outputs = [layer.output for layer in model.layers]
        # model that returns the activations of every layer
        self.activations_model = keras.models.Model(model.input, layer_outputs)

    def on_epoch_end(self, epoch, logs=None):
        if self.validation_data is None:
            raise RuntimeError("Requires validation_data")
        # first sample of the validation inputs
        validation_sample = self.validation_data[0][0:1]
        # compute the activations of every layer at the end of this epoch
        activations = self.activations_model.predict(validation_sample)
        # binary mode is required for np.savez; one array is saved per layer
        with open('activations_at_epoch_' + str(epoch) + '.npz', 'wb') as f:
            np.savez(f, *activations)
While reading this code for creating custom callbacks, I couldn't understand a few lines. I know what callbacks are. What I understood from the above code is that we inherit from the superclass keras.callbacks.Callback, and in the set_model function we inform the callback of what model it will be calling. I am not able to understand the line below: why does keras.models.Model take model.input?
self.activations_model = keras.models.Model(model.input, layer_outputs)
and the line activations = self.activations_model.predict(validation_sample)
The remaining lines just save the NumPy arrays to the drive. Also, is the callback created and called on every epoch?
Let's say I have a simple model:
model = Sequential()
model.add(Dense(32, input_shape=(784,), activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(4, activation='softmax'))
cb = ActivationLogger()
Now let me go through line by line of function set_model:
self.model = model
self.model.summary() = Model: "sequential"
Layer (type) Output Shape Param #
dense (Dense) (None, 32) 25120
dense_1 (Dense) (None, 16) 528
dropout (Dropout) (None, 16) 0
dense_2 (Dense) (None, 4) 68
Total params: 25,716
Trainable params: 25,716
Non-trainable params: 0
second line:
layer_outputs = [layer.output for layer in model.layers]
print(layer_outputs) = [<tf.Tensor 'dense/Relu:0' shape=(None, 32) dtype=float32>, <tf.Tensor 'dense_1/Relu:0' shape=(None, 16) dtype=float32>, <tf.Tensor 'dropout/cond/Identity:0' shape=(None, 16) dtype=float32>, <tf.Tensor 'dense_2/Softmax:0' shape=(None, 4) dtype=float32>]
layer_outputs contains the output tensor of every layer in the model.
Third line:
self.activations_model = keras.models.Model(model.input, layer_outputs)
This line creates a new model whose input is the input of the original model (model.input gives the input tensor of a model; likewise, model.output gives its output tensor)
and whose outputs are the outputs of every layer. So self.activations_model is a model with one input (of shape (784,) in this case) and one output per layer.
When you feed any input through this model, it gives you a list of outputs, one for each layer.
Normally the output of the main Sequential model would be a single NumPy array of shape (None, 4),
but self.activations_model gives you a list of NumPy arrays. So in the line
activations = self.activations_model.predict(validation_sample)
activations just contains the predictions of self.activations_model, which is nothing but a list of NumPy arrays:
[(None, 32) (output of the first layer), (None, 16) (output of the second), (None, 16) (dropout layer), (None, 4) (final layer)]
I would suggest reading about the Keras functional API, which is used to build models with multiple inputs and outputs.
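Here is a standalone sketch of the same idea outside a callback, assuming TensorFlow 2.x with tf.keras. The small model and the random sample are made up for illustration, and it uses model.inputs (the list form) rather than model.input, which I believe behaves more consistently across Keras versions for Sequential models.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(4, activation='softmax'),
])

# One shared input, one output per layer of the original model.
layer_outputs = [layer.output for layer in model.layers]
activations_model = keras.Model(inputs=model.inputs, outputs=layer_outputs)

# A single random sample stands in for the validation sample.
sample = np.random.rand(1, 784).astype('float32')
activations = activations_model.predict(sample, verbose=0)
activation_shapes = [a.shape for a in activations]
print(activation_shapes)
```

The new model shares its layers (and therefore its weights) with the original one, so calling predict on it after each epoch reflects whatever training has done so far.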

Different output shape of Conv2D between tf.keras and keras?

It might be a dumb question since I'm new to Keras and Tensorflow.
I have this simple model:
classifier.add(Convolution2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=2, activation='softmax'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
When running with tf.keras.* (like from tensorflow.keras.models import Sequential) classes, summary shows the first layer as:
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 21, 21, 32) 896
but when running with keras.* classes (like from keras.models import Sequential), summary shows the first layer as:
Layer (type) Output Shape Param #
conv2d_1 (Conv2D) (None, 62, 62, 32) 896
Why do they give different output shapes?
I'm using tensorflow 2.0.0 and keras 2.3.1
That is actually related to kernel_size. First, let's check:
Example 1:
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Note the small change in Conv2D: your code had classifier.add(Conv2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu')), whereas this version passes kernel_size=(3, 3) as a tuple.
Example 1 gives the output shape as
(None, 62, 62, 32)
Example 2:
Let's change it to your version
classifier.add(Conv2D(32, 3, 3, input_shape=(64, 64, 3), activation='relu'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
results in,
(None, 21, 21, 32) 896
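The conclusion is that the tf.keras API and standalone Keras interpret the positional arguments differently. In tf.keras, Conv2D(32, 3, 3) means kernel_size=3 and strides=3, which gives (64 - 3) // 3 + 1 = 21, while Conv2D(32, (3, 3)) keeps the default stride of 1, which gives 64 - 3 + 1 = 62. Standalone Keras 2.3.1 apparently still accepts the older call style, in which 3, 3 are the kernel rows and columns. Notice that I got both results with the tf.keras API.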
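The two interpretations can be checked directly by inspecting the layer attributes. This is a minimal sketch assuming TensorFlow 2.x with tf.keras; first_layer_shape is a hypothetical helper that wraps a single convolution in a model with a (64, 64, 3) input.

```python
from tensorflow import keras
from tensorflow.keras import layers

def first_layer_shape(conv):
    """Output shape of a single conv layer applied to a 64x64 RGB input."""
    model = keras.Sequential([keras.Input(shape=(64, 64, 3)), conv])
    return tuple(model.output_shape)

strided = layers.Conv2D(32, 3, 3, activation='relu')     # third positional argument is strides
explicit = layers.Conv2D(32, (3, 3), activation='relu')  # kernel_size=(3, 3), default strides

strided_shape = first_layer_shape(strided)
explicit_shape = first_layer_shape(explicit)
print(strided.kernel_size, strided.strides, strided_shape)
print(explicit.kernel_size, explicit.strides, explicit_shape)
```

Passing kernel_size and strides as keyword arguments avoids the ambiguity entirely.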

Keras LSTM with stateful in Reinforcement Learning

I'm doing a simple DQN RL algorithm with Keras, but using an LSTM in the network. The idea is that a stateful LSTM will remember the relevant information from all prior states and thus predict rewards for different actions better. This problem is more of a keras problem than RL. I think the stateful LSTM is not being handled by me correctly.
MODEL CODE - functional api used:
state_input = Input(batch_shape=(batch_size, look_back, 1, resolution[0], resolution[1]))
# Conv2D arguments: filters, kernel_size, strides
conv1 = TimeDistributed(Conv2D(8, 6, strides=3, activation='relu', data_format="channels_first"))(state_input)
conv2 = TimeDistributed(Conv2D(8, 3, strides=2, activation='relu', data_format="channels_first"))(conv1)
flatten = TimeDistributed(Flatten())(conv2)
fc1 = TimeDistributed(Dense(128, activation='relu'))(flatten)
fc2 = TimeDistributed(Dense(64, activation='relu'))(fc1)
lstm_layer = LSTM(4, stateful=True)(fc2)
fc3 = Dense(128, activation='relu')(lstm_layer)
fc4 = Dense(available_actions_count)(fc3)
model = keras.models.Model(inputs=state_input, outputs=fc4)
adam = RMSprop(lr=learning_rate)  # 0.001; this is RMSprop despite the variable name
model.compile(loss="mse", optimizer=adam)
This is the model summary:
Layer (type) Output Shape Param
input_1 (InputLayer) (1, 1, 1, 30, 45) 0
time_distributed_1 (TimeDist (1, 1, 8, 9, 14) 296
time_distributed_2 (TimeDist (1, 1, 8, 4, 6) 584
time_distributed_3 (TimeDist (1, 1, 192) 0
time_distributed_4 (TimeDist (1, 1, 128) 24704
time_distributed_5 (TimeDist (1, 1, 64) 8256
lstm_1 (LSTM) (1, 4) 1104
dense_3 (Dense) (1, 128) 640
dense_4 (Dense) (1, 8) 1032
Total params: 36,616
Trainable params: 36,616
Non-trainable params: 0
I feed in one frame at a time to fit the model. When I need to predict to act, I make sure I save the model state and restore it as follows.
#save the state (lstm memory) for recovery before fitting.
prev_state = get_model_states(model)
target_q = model.predict(s1, batch_size=batch_size)
#lstm predict updates the state of the lstm modules
q_next = model.predict(s2, batch_size=batch_size)
max_q_next = np.max(q_next, axis=1)
target_q[np.arange(target_q.shape[0]), a] = r + discount_factor * (1 - isterminal) * max_q_next
#now recover states for fitting the model correctly
set_model_states(model, prev_state)  # to before s1 prediction
model.fit(s1, target_q, batch_size=batch_size, verbose=0)
#after fitting, the state and weights both get updated !!
#so lstm has already moved forward in the sequence
The model does not seem to be working at all. The variance remains very high across different epochs. I am resetting the model after every episode, as one would expect, so stateful does not affect the training between episodes. Each episode is fed in one frame at a time, which is why I need stateful.
I've tried different discount factors and learning rates. In theory, this should be a superior model to the vanilla DQN (CNN with 4 frames).
What am I doing wrong? Any help would be appreciated.