Upgraded to Tensorflow 2.5 now get a Lambda Layer error when using pretrained Keras Applications Models - tensorflow

I followed this tutorial to build a siamese network for my problem.
I was using TensorFlow 2.4.1 and have now upgraded to 2.5.
This code worked fine before:
base_cnn = resnet.ResNet50(
    weights="imagenet", input_shape=target_shape + (3,), include_top=False
)

flatten = layers.Flatten()(base_cnn.output)
dense1 = layers.Dense(512, activation="relu")(flatten)
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)

embedding = Model(base_cnn.input, output, name="Embedding")

# Freeze every layer before "conv5_block1_out"; that layer and all later ones stay trainable.
trainable = False
for layer in base_cnn.layers:
    if layer.name == "conv5_block1_out":
        trainable = True
    layer.trainable = trainable
Now every layer of ResNet, MobileNet, or EfficientNet (I tried them all) throws errors like this:
The following Variables were used a Lambda layer's call (tf.nn.convolution_620), but
are not present in its tracked objects:
<tf.Variable 'stem_conv/kernel:0' shape=(3, 3, 3, 48) dtype=float32>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
It compiles and seems to fit.
But do we have to initialize the models somewhat differently in 2.5?
Thanks for any pointers!

I'm not sure what the root cause of your issue is, since it isn't generally reproducible, but here are some notes on that warning message. The message shown in your question comes from EfficientNet, not ResNet.
The Lambda layer exists so that arbitrary expressions can be used as a Layer when constructing Sequential and Functional API models. Lambda layers are best suited for simple operations or quick experimentation. While it is possible to use Variables with Lambda layers, this practice is discouraged because it can easily lead to bugs. For example:
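import tensorflow as tf

x_input = tf.range(12.).numpy().reshape(-1, 4)
weights = tf.Variable(tf.random.normal((4, 2)), name='w')
bias = tf.ones((1, 2), name='b')

# Lambda custom layer that closes over a tf.Variable it does not track
mylayer1 = tf.keras.layers.Lambda(lambda x: tf.add(tf.matmul(x, weights),
                                                   bias), name='lambda1')
# Calling the layer triggers the warning and returns the result shown below
mylayer1(x_input)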
The following Variables were used a Lambda layer's call (lambda1), but
are not present in its tracked objects:
<tf.Variable 'w:0' shape=(4, 2) dtype=float32, numpy=
array([[-0.753139 , -1.1668463 ],
[-1.3709341 , 0.8887151 ],
[ 0.3157893 , 0.01245957],
[-1.3878908 , -0.38395467]], dtype=float32)>
It is possible that this is intended behavior, but it is more likely
an omission. This is a strong indication that this layer should be
formulated as a subclassed Layer rather than a Lambda layer.
<tf.Tensor: shape=(3, 2), dtype=float32, numpy=
array([[ -3.903028 , 0.7617702],
[-16.687727 , -1.8367348],
[-29.472424 , -4.43524 ]], dtype=float32)>
This happens because the mylayer1 layer does not track the tf.Variable directly, so that parameter won't appear in mylayer1.trainable_weights.
In general, Lambda layers are convenient for simple stateless computation, but anything more complex should use a subclassed Layer instead. From your traceback, something similar may be happening with the stem_conv layer.
from tensorflow.keras.applications import EfficientNetB0

for layer in EfficientNetB0(weights=None).layers:
    if layer.name == 'stem_conv':
        print(layer)
<tensorflow.python.keras.layers.convolutional.Conv2D object..
A quick survey of the source code of tf.compat.v1.nn.conv2d leads to a lambda expression that might be the cause.
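As a minimal sketch of the subclassed alternative, the Lambda example above could be rewritten so that the variable is created with add_weight() and is therefore tracked by the layer. The class name AffineLayer and the layer name "affine1" are made up for illustration; this is not code from Keras Applications and it does not change how the pretrained models are built.

```python
import tensorflow as tf


class AffineLayer(tf.keras.layers.Layer):
    """Hypothetical stand-in for the Lambda layer: computes x @ w + 1."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # add_weight() registers the variable with the layer, so it shows up
        # in layer.weights and layer.trainable_weights.
        self.w = self.add_weight(
            name="w",
            shape=(int(input_shape[-1]), self.units),
            initializer="random_normal",
            trainable=True,
        )

    def call(self, inputs):
        # Constant bias of ones, as in the Lambda example.
        bias = tf.ones((1, self.units), dtype=inputs.dtype)
        return tf.matmul(inputs, self.w) + bias


x_input = tf.reshape(tf.range(12.0), (-1, 4))
layer = AffineLayer(2, name="affine1")
y = layer(x_input)

print(y.shape)
print(len(layer.trainable_weights))
```

Because the weight belongs to the layer, no "not present in its tracked objects" warning should be expected for it, and an optimizer that uses layer.trainable_weights will see it.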

There is no need to revert to TF 2.4.1. I would always recommend trying the latest version, because it fixes many performance issues and adds new features.
I was able to run the code above without any issues in TF 2.5.
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, Model

img_width, img_height = 224, 224
target_shape = (img_width, img_height, 3)

base_cnn = ResNet50(
    weights="imagenet", input_shape=target_shape, include_top=False
)

flatten = layers.Flatten()(base_cnn.output)
dense1 = layers.Dense(512, activation="relu")(flatten)
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)

embedding = Model(base_cnn.input, output, name="Embedding")

# Freeze every layer before "conv5_block1_out"; that layer and all later ones stay trainable.
trainable = False
for layer in base_cnn.layers:
    if layer.name == "conv5_block1_out":
        trainable = True
    layer.trainable = trainable
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94773248/94765736 [==============================] - 1s 0us/step
As per @Olli, restarting the kernel and clearing the session resolved the problem.

pip install tensorflow==2.3.0 worked for me instead of TF 2.5.
I was facing the same issue with the Lambda layer.


TensorFlow with custom gym environment: Layer "dense_6" expects 1 input(s), but it received 2 input tensors

I am trying to use TF to solve a custom gym environment, all within Google Colab.
The main script is the TF "DQN Tutorial" available here.
In place of env_name = "CartPole-v0" I am using env_name = "gym_examples/GridWorld-v0", where gym_examples/GridWorld-v0 is the sample custom environment described in the gym documentation here. (That example uses gym v0.25.0 but TF requires gym <= v0.23.0, so I also had to tweak the rendering code a bit to make it work in v0.23.0.)
The environment loads fine via env = suite_gym.load(env_name), and subsequent code cells run fine as well, until the following two cells:
fc_layer_params = (100, 50)
action_tensor_spec = tensor_spec.from_spec(env.action_spec())
num_actions = action_tensor_spec.maximum - action_tensor_spec.minimum + 1

# Define a helper function to create Dense layers configured with the right
# activation and kernel initializer.
def dense_layer(num_units):
    return tf.keras.layers.Dense(
        num_units,
        activation=tf.keras.activations.relu,
        kernel_initializer=tf.keras.initializers.VarianceScaling(
            scale=2.0, mode='fan_in', distribution='truncated_normal'))

# QNetwork consists of a sequence of Dense layers followed by a dense layer
# with `num_actions` units to generate one q_value per available action as
# its output.
dense_layers = [dense_layer(num_units) for num_units in fc_layer_params]
q_values_layer = tf.keras.layers.Dense(
    num_actions,
    activation=None,
    kernel_initializer=tf.keras.initializers.RandomUniform(
        minval=-0.03, maxval=0.03),
    bias_initializer=tf.keras.initializers.Constant(-0.2))
q_net = sequential.Sequential(dense_layers + [q_values_layer])
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
train_step_counter = tf.Variable(0)
agent = dqn_agent.DqnAgent(
After that cell, I get an error:
ValueError: Exception encountered when calling layer "sequential_2" (type Sequential).
Layer "dense_6" expects 1 input(s), but it received 2 input tensors. Inputs received: [<tf.Tensor: shape=(1, 2), dtype=int64, numpy=array([[2, 2]])>, <tf.Tensor: shape=(1, 2), dtype=int64, numpy=array([[3, 2]])>]
Call arguments received by layer "sequential_2" (type Sequential):
• inputs={'agent': 'tf.Tensor(shape=(1, 2), dtype=int64)', 'target': 'tf.Tensor(shape=(1, 2), dtype=int64)'}
• network_state=()
• kwargs={'step_type': 'tf.Tensor(shape=(1,), dtype=int32)', 'training': 'None'}
In call to configurable 'DqnAgent' (<class 'tf_agents.agents.dqn.dqn_agent.DqnAgent'>)
I'm too much of a TF novice to understand what's going on here. I suspect it's because the action state changed from 2 states (in CartPole) to 4 (in the custom GridWorld environment). But beyond that I cannot figure it out.
This can be solved by using an Embedding layer as your first layer. In this example (Embedding(16, 4)), 16 is the grid size (4x4), and 4 is the output dimension.
dense_layers = [dense_layer(num_units) for num_units in fc_layer_params]
For example, replacing the above line with the code below will eradicate the error.
dense_layers = [
    # First layer
    tf.keras.layers.Embedding(16, 4),
    # Other layers
    tf.keras.layers.Dense(100, activation=tf.keras.activations.relu)
]

How does Tensorflow or Keras handle model weight initialization and when does it happen?

After reading the answer to this question I am a bit confused as to when exactly TensorFlow initializes the weight and bias variables.
As per the answers, compile() defines the loss function, the optimizer and the metrics. That's all.
Since the compile() method doesn't initialize the weights, that would suggest that initialization happens when fit() runs.
The issue with that, however, is the case of loading a model or loading weights: how would fit() know that the weights it is presented with are actually useful and should not be thrown away and replaced with random values?
We pass the type of initializer in the kernel_initializer argument when declaring the layer. For example:
dense02 = tf.keras.layers.Dense(units=10,
So an obvious question is whether the weights are initialized layer by layer during the forward pass of the first epoch, or whether all layers are initialized before the first epoch.
(What I mean is: if there are, say, 5 Dense layers in the model, does the initialization happen one layer at a time, i.e. the first Dense layer is initialized and its forward pass runs, then the second Dense layer is initialized and its forward pass runs, and so on?)
Another aspect is transfer learning. When stacking custom layers on top of a trained model, the trained model's layers have useful weights, while the layers I added don't. So how would TensorFlow know to initialize only the variables of the layers I added and not mess up the layers of the transferred model (provided I don't set trainable=False)?
How does TensorFlow or Keras handle weight initialization?
The weights are initialized when the model is created (when each layer in the model is initialized), i.e., before compile() and fit():
import tensorflow as tf
from tensorflow.keras import models, layers

inputs = layers.Input((3, ))
outputs = layers.Dense(units=10,
                       kernel_initializer='glorot_uniform',
                       bias_initializer='zeros')(inputs)
model = models.Model(inputs=inputs, outputs=outputs)

# The weights already exist here, before compile() or fit() is ever called.
for layer in model.layers:
    print("Config:\n{}\nWeights:\n{}\n".format(layer.get_config(), layer.get_weights()))
{'batch_input_shape': (None, 3), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_1'}
{'name': 'dense', 'trainable': True, 'dtype': 'float32', 'units': 10, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}
[array([[-0.60352975, 0.08275259, -0.6521113 , -0.5860774 , -0.42276743,
-0.3142944 , -0.28118378, 0.07770532, -0.5644444 , -0.47069687],
[ 0.4611913 , 0.35170448, -0.62191975, 0.5837332 , -0.3390234 ,
-0.4033073 , 0.03493106, -0.06078851, -0.53159714, 0.49872506],
[ 0.43685734, 0.6160207 , 0.01610583, -0.3673877 , -0.14144647,
-0.3792309 , 0.05478126, 0.602067 , -0.47438127, 0.36463356]],
dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
After doing a bit more research: even though Mr. For Example's answer is correct, let's dig a bit deeper into how initialization works in TensorFlow Keras.
As per the tf.keras.layers.Layer documentation, we can create variables in the following two methods:
__init__(self, ...): Defines custom layer attributes and creates layer state variables that do not depend on input shapes, using add_weight().
build(self, input_shape): Creates weights that depend on the shape(s) of the input(s), using add_weight().
The code below shows an example of a basic layer with two variables that computes y = w . x + b:
import tensorflow as tf
from tensorflow.keras.layers import Layer


class SimpleDense(Layer):

    def __init__(self, units=32):
        super(SimpleDense, self).__init__()
        self.units = units

    def build(self, input_shape):  # Create the state of the layer (weights)
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_shape[-1], self.units),
                                 dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(self.units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):  # Defines the computation from inputs to outputs
        return tf.matmul(inputs, self.w) + self.b


# Instantiates the layer.
linear_layer = SimpleDense(4)

# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))
assert len(linear_layer.weights) == 2

# These weights are trainable, so they're listed in `trainable_weights`:
assert len(linear_layer.trainable_weights) == 2
The most interesting thing to note in the code above is when the build() method is called.
build() is called when the layer, after it has been instantiated, is first given some sort of input, whether actual values or just a symbolic TensorFlow placeholder.
When we add a layer to a Keras Sequential model whose input shape is known, the model automatically passes an input placeholder to the layer, thereby building it at the same time.
That is why we can see the weights before calling the compile() or fit() methods of the Keras Model. (Note that __call__() automatically builds the layer, if it has not been built yet, by calling build().)
Regarding transfer learning: when we load the transferred model, we are loading layers that have already been built, so build() is not called again when you add those layers to your own model.
In other words, the layers of the transferred model have already been given their input placeholder, and their build() method was already called when the transferred model was built and trained.
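A small sketch of this behavior, assuming a recent TF 2.x install (the variable names are made up for illustration): a Dense layer has no weights until its first call, and reusing the already-built layer inside a new model leaves its kernel untouched.

```python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Dense(4)
weights_before_call = len(layer.weights)   # build() has not run yet

_ = layer(tf.ones((2, 3)))                 # the first call triggers build()
weights_after_call = len(layer.weights)    # kernel and bias now exist
kernel_after_build = layer.get_weights()[0].copy()

# Stack a new layer on top of the already-built one, as in transfer learning.
model = tf.keras.Sequential([layer, tf.keras.layers.Dense(2)])
_ = model(tf.ones((1, 3)))

# The reused layer is not rebuilt, so its kernel is unchanged.
kernel_in_model = layer.get_weights()[0]
kernel_unchanged = np.array_equal(kernel_after_build, kernel_in_model)

print(weights_before_call, weights_after_call, kernel_unchanged)
```

Only the newly added Dense(2) layer gets freshly initialized weights here; the first layer keeps whatever values it had before it was placed in the model.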
Useful References:
Keras Layer Doc
TF Tutorial: Custom Layers

'Channels first' training accuracy very low compared to 'channels last'

My issue:
I am trying to train a semantic segmentation model in tf.keras. It works very well in channels_last (HWC) mode, reaching 96%+ validation accuracy. I wanted to train it in channels_first (CHW) mode so that the weights are compatible with TensorRT. When I do this, the ~80% training accuracy of the first few epochs drops to around 0.020% and stays there permanently.
It is useful to know that the base of my model is a tf.keras.applications.MobileNet() model with the pre-trained 'imagenet' weights. (Model architecture at the bottom.)
The transformation process:
I followed the guidelines provided and changed only a few things:
Set tf.keras.backend.set_image_data_format() to 'channels_first'.
Changed the channel order of the input tensor from input_tensor=Input(shape=(376, 672, 3)) to input_tensor=Input(shape=(3, 376, 672)).
In my image preprocessing (using tf.data.Dataset), I use tf.transpose(img, perm=[2, 0, 1]) on both the input image and the one-hot encoded mask to change the channel order. I checked this with an equality assertion to make sure it's correct, and it seems to be fine.
With these changes the training starts fine, but as I said the training accuracy drops to almost zero. When I revert them, everything is fine again.
Possible leads:
What am I doing wrong or what could be the problematic part here? My suspicions are around these questions:
Are the pre-trained ImageNet weights also converted to 'channels_first' order when I set the backend? Is this something I should consider at all?
Could it be that the tf.transpose() function messes up the mask's one-hot encoding? (I have 3 classes represented by 3 colors: lane, opposing lane, background)
Maybe I am not seeing something obvious. I can provide further code and answers as needed.
08/17: This is still an ongoing issue. I have tried several things:
I checked with a numpy assertion whether the image and the mask are correct after the transpose; they seem correct.
I suspected that the loss function was computed over the wrong axis, so I customized the loss function to use axis 1 (where the channels are). Here it is:
def ReverseAxisLoss(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred, from_logits=True, axis=1)
My main suspicion is that the 'channels first' backend setting does nothing to transpose the pretrained 'imagenet' weights for the mobilenet part. Is there an updated way for TF2.x / Keras to transpose the pre-trained weights into CHW format?
Here is the architecture that I use (the skipNet() is the head network and the mobilenet is the base, and it is connected in the create_model() function)
import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import (BatchNormalization, Conv2D,
                                     Conv2DTranspose, ZeroPadding2D, add)
from tensorflow.keras.regularizers import l2


def skipNet(encoder_output, feed1, feed2, classes):
    # random initializer and regularizer
    stddev = 0.01
    init = RandomNormal(stddev=stddev)
    weight_decay = 1e-3
    reg = l2(weight_decay)

    # 1x1 convolutions that score the two skip connections
    score_feed2 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed2)
    score_feed2_bn = BatchNormalization()(score_feed2)
    score_feed1 = Conv2D(kernel_size=(1, 1), filters=classes, padding="SAME",
                         kernel_initializer=init, kernel_regularizer=reg)(feed1)
    score_feed1_bn = BatchNormalization()(score_feed1)

    # 2x upsampling of the encoder output, fused with the first skip connection
    upscore2 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(encoder_output)
    height_pad1 = ZeroPadding2D(padding=((1,0),(0,0)))(upscore2)
    upscore2_bn = BatchNormalization()(height_pad1)
    fuse_feed1 = add([score_feed1_bn, upscore2_bn])

    # another 2x upsampling, fused with the second skip connection
    upscore4 = Conv2DTranspose(kernel_size=(4, 4), filters=classes, strides=(2, 2),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg)(fuse_feed1)
    height_pad2 = ZeroPadding2D(padding=((0,1),(0,0)))(upscore4)
    upscore4_bn = BatchNormalization()(height_pad2)
    fuse_feed2 = add([score_feed2_bn, upscore4_bn])

    # final 8x upsampling back to the input resolution
    upscore8 = Conv2DTranspose(kernel_size=(16, 16), filters=classes, strides=(8, 8),
                               padding="SAME", kernel_initializer=init,
                               kernel_regularizer=reg, activation="softmax")(fuse_feed2)
    return upscore8
def create_model(classes):
    base_model = tf.keras.applications.MobileNet(input_tensor=Input(shape=IMG_SHAPE),
                                                 include_top=False,
                                                 weights='imagenet')
    conv4_2_output = base_model.get_layer(index=43).output
    conv3_2_output = base_model.get_layer(index=30).output
    conv_score_output = base_model.output
    head_model = skipNet(conv_score_output, conv4_2_output, conv3_2_output, classes)

    # Freeze the pre-trained MobileNet base; only the head is trained.
    for layer in base_model.layers:
        layer.trainable = False

    model = Model(inputs=base_model.input, outputs=head_model)
    return model
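Two things that may be worth checking here, sketched under the assumption of a recent TF 2.x install; this is not a confirmed diagnosis of the accuracy drop. First, Keras stores Conv2D kernels as (height, width, in_channels, out_channels) regardless of data_format, so the pre-trained kernels themselves should not need transposing. Second, the "softmax" activation on the last Conv2DTranspose normalizes over the last axis by default, which in channels_first is the width axis rather than the class axis. The shapes and variable names below are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# 1) The kernel layout does not depend on data_format (build only, no forward pass).
conv_last = tf.keras.layers.Conv2D(16, 3, data_format="channels_last")
conv_first = tf.keras.layers.Conv2D(16, 3, data_format="channels_first")
conv_last.build((None, 8, 8, 3))    # NHWC input shape
conv_first.build((None, 3, 8, 8))   # NCHW input shape
kernel_shape_last = tuple(conv_last.kernel.shape)
kernel_shape_first = tuple(conv_first.kernel.shape)

# 2) The default softmax axis is -1, which is the width axis in (N, C, H, W).
logits = tf.random.normal((1, 3, 4, 5))               # 3 classes in axis 1
softmax_default = tf.nn.softmax(logits)               # normalizes over W
softmax_channels = tf.nn.softmax(logits, axis=1)      # normalizes over classes

class_sums_default = tf.reduce_sum(softmax_default, axis=1).numpy()
class_sums_channels = tf.reduce_sum(softmax_channels, axis=1).numpy()

print(kernel_shape_last, kernel_shape_first)
print(np.allclose(class_sums_channels, 1.0), np.allclose(class_sums_default, 1.0))
```

If the second point applies, one option would be to drop activation="softmax" from the final layer and apply the softmax over axis 1 explicitly (for example with a separate Softmax(axis=1) layer), or to keep the output as logits when the loss is already called with from_logits=True.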

Tensorflow dense layers worse than keras sequential

I am trying to train an agent on the inverted-pendulum problem (similar to cart-pole), which is a reinforcement learning benchmark. I use the neural-fitted-Q-iteration algorithm, which uses a multi-layer neural network to evaluate the Q function.
I build the neural network with Keras.Sequential and with tf.layers.dense respectively, and leave everything else the same. However, Keras gives me good results and TensorFlow does not. In fact, the TensorFlow version doesn't work at all: its loss keeps increasing and the agent learns nothing from the training.
Here is the code for Keras:
def build_model():
    model = Sequential()
    model.add(Dense(5, input_dim=3))
    adam = Adam(lr=1E-3)
    model.compile(loss='mean_squared_error', optimizer=adam)
    return model
and the tensorflow version is
class NFQ_fit(object):
    """neural network approximator for NFQ iteration"""

    def __init__(self, sess, N_feature, learning_rate=1E-3, batch_size=100):
        self.sess = sess
        self.N_feature = N_feature
        self.learning_rate = learning_rate
        self.batch_size = batch_size
        # DNN structure
        self.inputs = tf.placeholder(tf.float32, [None, N_feature], 'inputs')
        self.labels = tf.placeholder(tf.float32, [None, 1], 'labels')
        self.l1 = tf.layers.dense(inputs=self.inputs,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
        self.l2 = tf.layers.dense(inputs=self.l1,
                                  kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
        self.outputs = tf.layers.dense(inputs=self.l2,
                                       kernel_initializer=tf.truncated_normal_initializer(0.0, 1E-2),
        # optimization
        # self.mean_loss = tf.losses.mean_squared_error(self.labels, self.outputs)
        self.mean_loss = tf.reduce_mean(tf.square(self.labels-self.outputs))
        self.regularization_loss = tf.losses.get_regularization_loss()
        self.loss = self.mean_loss  # + self.regularization_loss
        self.train_op = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
The two models are the same: both have two hidden layers with the same dimensions. I suspect the problem comes from the kernel initialization, but I don't know how to fix it.
Using Keras is great. If you want better TensorFlow integration check out tf.keras. There's no particular reason to use tf.layers if the Keras (or tf.keras) defaults work better.
In this case glorot_uniform looks like the default initializer. This is also the global TensorFlow default, so consider removing the kernel_initializer argument instead of the explicit truncated normal initialization in your question (or passing Glorot explicitly).
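To get a rough feel for how different the two initializations are, here is a small sketch, assuming a 3-input, 5-unit first layer as in the Keras snippet of the question and a recent TF 2.x install. It checks that a Keras Dense layer defaults to GlorotUniform and compares the Glorot sampling limit, sqrt(6 / (fan_in + fan_out)), with the 1E-2 standard deviation of the truncated normal initializer.

```python
import math

import numpy as np
import tensorflow as tf

fan_in, fan_out = 3, 5

# Glorot (Xavier) uniform samples from [-limit, limit].
glorot_limit = math.sqrt(6.0 / (fan_in + fan_out))
truncated_normal_stddev = 1e-2   # value used in the tf.layers.dense version

layer = tf.keras.layers.Dense(fan_out)
layer.build((None, fan_in))
default_initializer = layer.get_config()["kernel_initializer"]["class_name"]
kernel = layer.get_weights()[0]

print(default_initializer)
print(glorot_limit, truncated_normal_stddev)
print(np.abs(kernel).max())
```

With these sizes the Glorot limit is roughly 0.87, so the default Keras weights start out much larger in magnitude than weights drawn with a standard deviation of 0.01, which may be part of why the two versions behave differently.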

keras trainable attribute not compatible with tensorflow?

It seems that keras trainable attribute is ignored by tensorflow, which makes it very inconvenient to use keras as a syntactical shortcut in tensorflow.
For example:
import keras
import tensorflow as tf
import numpy as np
import keras.backend as K
A = tf.placeholder(shape=(1, 16, 16, 3), dtype=tf.float32)
Conv2 = keras.layers.Conv2D(filters=16, kernel_size=3, padding='same')
Conv2.trainable = False  # This layer has been set to not trainable.
B = Conv2(A)

x = np.random.randn(1, 16, 16, 3)
y = np.random.randn(1, 16, 16, 16)
True_y = tf.placeholder(shape=(1, 16, 16, 16), dtype=tf.float32)

loss = tf.reduce_sum((B - True_y) ** 2)
opt_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
# [<tf.Variable 'conv2d_1/kernel:0' shape=(3, 3, 3, 16) dtype=float32_ref>, <tf.Variable 'conv2d_1/bias:0' shape=(16,) dtype=float32_ref>]

sess = K.get_session()
for _ in range(10):
    out = sess.run([opt_op, loss], feed_dict={A: x, True_y: y})
The loss decreases over these iterations, which means the weights are still being trained.
I read the blog ''Keras as a simplified interface to TensorFlow'', but it mentioned nothing about the trainable problem.
Any suggestion is appreciated.
Your conclusion is basically correct. Keras is a wrapper around TensorFlow, but not all Keras functionality transfers directly into TensorFlow, so you need to be careful when you mix Keras and raw TF.
Specifically, in this case, if you want to call the minimize function yourself, you need to specify which variables you want to train on using the var_list argument of minimize.
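A minimal sketch of the var_list approach, written against tf.compat.v1 so it can run on TF 2.x; the two variables frozen_w and head_w are hypothetical stand-ins for a frozen layer and a trainable one. With Keras layers, one would pass the trainable_weights of only the layers that should be trained.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1

graph = tf.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, shape=(None, 3))
    y_true = tf1.placeholder(tf.float32, shape=(None, 2))

    # Hypothetical weights: frozen_w should stay fixed, head_w should train.
    init = tf1.random_normal_initializer(stddev=0.1)
    frozen_w = tf1.get_variable("frozen_w", shape=(3, 4), initializer=init)
    head_w = tf1.get_variable("head_w", shape=(4, 2), initializer=init)

    y_pred = tf.matmul(tf.matmul(x, frozen_w), head_w)
    loss = tf.reduce_sum((y_pred - y_true) ** 2)

    # Only the variables listed in var_list are updated by the optimizer.
    train_op = tf1.train.AdamOptimizer(learning_rate=0.01).minimize(
        loss, var_list=[head_w])
    init_op = tf1.global_variables_initializer()

rng = np.random.RandomState(0)
feed = {x: rng.randn(8, 3), y_true: rng.randn(8, 2)}

with tf1.Session(graph=graph) as sess:
    sess.run(init_op)
    frozen_before, head_before = sess.run([frozen_w, head_w])
    for _ in range(10):
        sess.run(train_op, feed_dict=feed)
    frozen_after, head_after = sess.run([frozen_w, head_w])

print(np.array_equal(frozen_before, frozen_after))
print(np.array_equal(head_before, head_after))
```

Without var_list, minimize falls back to the graph's trainable-variables collection, which is the list shown in the comment of the question and which includes the kernel and bias of the layer marked trainable = False.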