How to parallelize a custom LSTM (4D input) - TensorFlow

After the Permute layer, the dimensions become (None, None, 12, 16).
I want to summarize the last two dimensions with an LSTM (48 units) with input_shape (12, 16),
so that the overall shape becomes (None, None, 48).
Currently I have a workaround with a custom LSTM and LSTM cell; however, it is very slow, since I use another LSTM inside the cell.
What I would like to have is this:
(None, None, 12, 16)
(None, None, 48)
(None, None, 60)
The last two steps are currently done inside the custom LSTM. Is there a way to separate them?
What is the proper way of doing this?
Can we create different (or more than one) LSTMs for the cells, which share the same weights but have different cell states?
Could you give me some direction?
Layer (type)                     Output Shape            Param #   Connected to
inputs (InputLayer)              (None, 36, None, 1)     0
convlayer (Conv2D)               (None, 36, None, 16)    160       inputs[0][0]
mp (MaxPooling2D)                (None, 12, None, 16)    0         convlayer[0][0]
permute_1 (Permute)              (None, None, 12, 16)    0         mp[0][0]
reshape_1 (Reshape)              (None, None, 192)       0         permute_1[0][0]
custom_lstm_extended_1 (CustomL  (None, None, 60)        26160     reshape_1[0][0]
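As a sanity check, the parameter counts in this summary can be reproduced by hand. The sketch below assumes a 3x3 convolution kernel (not stated above, but consistent with 160 parameters) and the standard LSTM layout of four gates with one bias vector each. The helper names are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel per filter plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


def lstm_params(units, input_dim):
    # Four gates, each with an input kernel, a recurrent kernel and a bias vector.
    return 4 * (units * (input_dim + units) + units)


conv_count = conv2d_params(3, 3, 1, 16)   # assumed 3x3 kernel on 1 input channel
cell_on_48 = lstm_params(60, 48)          # 60-unit cell fed the 48-dim summary
cell_on_192 = lstm_params(60, 192)        # 60-unit cell fed the raw 192-dim input

print(conv_count, cell_on_48, cell_on_192)
```

Under these assumptions the reported 26160 matches a 60-unit cell with a 48-dimensional input, which suggests that the summary counts only the outer cell and not the weights of the inner summarizing LSTM.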
Custom LSTM is called like this:
CustomLSTMExtended(units=60, summarizeUnits=48, return_sequences=True, return_state=False, input_shape=(None, 192))(inner)
LSTM class:
self.summarizeUnits = summarizeUnits
self.summarizeLSTM = CuDNNLSTM(summarizeUnits, input_shape=(None, 16), return_sequences=False, return_state=True)
cell = SummarizeLSTMCellExtended(self.summarizeLSTM, units,
RNN.__init__(self, cell,
Cell class:
def call(self, inputs, states, training=None):
    reshaped = Reshape([12, 16])(inputs)
    state_h = self.summarizeLayer(reshaped)
    inputsx = state_h[0]
    return super(SummarizeLSTMCellExtended, self).call(inputsx, states, training)

I solved this by using tf.reshape rather than the Keras Reshape layer.
The Keras Reshape layer does not let you interfere with the "batch_size" dimension.
def customreshape(inputs):
    inner = inputs[0]
    shape = inputs[1]
    import tensorflow as tf2
    # rebuild the (batch, time, 48) layout from the dynamic shape
    reshaped = tf2.reshape(inner, [shape[0], shape[1], 48])
    return reshaped
# dynamic shape of the 4D tensor
shape = Lambda(lambda x: tf.shape(x), output_shape=(4,))(inner)
inner = Lambda(lambda x: customreshape(x), output_shape=(None, 48))([inner, shape])
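The idea behind this reshape can be shown without Keras. The NumPy sketch below folds the batch and time axes together, applies a summarizer to every (12, 16) slice at once, and unfolds the result. The summarizer is a hypothetical stand-in (a mean followed by a random projection), not a real LSTM, and the function names are invented for this example.

```python
import numpy as np


def summarize_steps(x, summarize_fn, out_dim):
    """Apply summarize_fn to every (12, 16) slice of a (batch, time, 12, 16) array."""
    batch, time = x.shape[0], x.shape[1]
    # Fold batch and time into one axis so the summarizer sees an ordinary 3D batch.
    folded = x.reshape(batch * time, *x.shape[2:])
    summarized = summarize_fn(folded)          # shape: (batch * time, out_dim)
    # Unfold back to (batch, time, out_dim) for the outer sequence model.
    return summarized.reshape(batch, time, out_dim)


rng = np.random.default_rng(0)
projection = rng.standard_normal((16, 48))


def toy_summarizer(folded):
    # Stand-in for the 48-unit LSTM: average the 12 steps, then project 16 -> 48.
    return folded.mean(axis=1) @ projection


x = rng.standard_normal((2, 5, 12, 16))
out = summarize_steps(x, toy_summarizer, out_dim=48)
print(out.shape)
```

Because all slices go through the summarizer in one call, they share the same weights while each slice is processed independently. Keras's TimeDistributed wrapper applies a layer to every temporal slice in a similar spirit, so it may be worth trying around the summarizing LSTM.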


Error on custom dataset dimensions feeding transfer model in TensorFlow

Can someone explain this TensorFlow error to me? I'm having trouble understanding what I am doing wrong.
I have a dataset in TensorFlow constructed with a generator. When I test the output of the generator, the output dimensions look correct (224 x 224 x 1). But when I try to train the model, I get an error:
WARNING:tensorflow:Model was constructed with shape (None, 224, 224, 1) for input
KerasTensor(type_spec=TensorSpec(shape=(None, 224, 224, 1), dtype=tf.float32,
name='input_2'), name='input_2', description="created by layer 'input_2'"),
but it was called on an input with incompatible shape (224, 224, 1, 1).
I'm unsure why the dimension of this output has an extra 1 at the end.
Here is the code to create the generator and model. df is a dataframe with file-paths to data and labels. The data are 2D matrices of variable dimensions. I'm using cv2.resize to make them 224x224 and then np.reshape to transform dimensions to (224x224x1). Then I yield the result.
def datagen_row():
    # ======================== #
    # Import data
    # ======================== #
    df = get_data()
    rowsize = 224
    colsize = 224
    for row in range(len(df)):
        data = get_data_from_filepath(df.iloc[row].file_path)
        # cv2.resize takes dsize as (width, height); both are 224 here
        data = cv2.resize(data, dsize=(rowsize, colsize), interpolation=cv2.INTER_CUBIC)
        labels = df.iloc[row].label
        data = data.reshape(224, 224, 1)
        yield data, labels
dataset = tf.data.Dataset.from_generator(
    datagen_row,
    output_signature=(
        tf.TensorSpec(shape = (int(os.getenv('rowsize')), int(os.getenv('colsize')), 1), dtype=tf.float32, name=None),
        tf.TensorSpec(shape=(), dtype=tf.int64, name=None)))
Testing the following I get what I expected:
iterator = iter(dataset.batch(8))
x = iterator.get_next()
x[0].shape # TensorShape([8, 224, 224, 1])
x[1].shape # TensorShape([8])
x[0] # <tf.Tensor: shape=(8, 224, 224, 1), dtype=float32, numpy=array(...
x[1] # <tf.Tensor: shape=(8,), dtype=int64, numpy=array([1, 1, 1, 1, 1, 1, 1, 1], dtype=int64)>
I'm trying to plug this into the InceptionV3 model to do classification:
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.layers import Input
from tensorflow.keras import layers
origModel = InceptionV3(weights = 'imagenet', include_top = False)
inputs = layers.Input(shape = (224, 224, 1))
modified_inputs = layers.Conv2D(3, 3, padding = 'same', activation='relu')(inputs)
x = origModel(modified_inputs)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(1024, activation = 'relu')(x)
x = layers.Dense(512, activation = 'relu')(x)
x = layers.Dense(256, activation = 'relu')(x)
x = layers.Dense(128, activation = 'relu')(x)
x = layers.Dense(64, activation = 'relu')(x)
x = layers.Dense(32, activation = 'relu')(x)
outputs = layers.Dense(2)(x)
model = tf.keras.Model(inputs, outputs)
model.summary() # 24.6 M trainable params
for layer in origModel.layers:
    layer.trainable = False
model.summary() # now shows 2.8 M trainable params
model.compile(
    optimizer = 'adam',
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits = True),
    metrics = ['accuracy']
)
model.fit(dataset, epochs = 1, verbose = True, batch_size = 32)
Here is the output of model.summary
Model: "model"
Layer (type) Output Shape Param #
input_2 (InputLayer) [(None, 224, 224, 1)] 0
conv2d_94 (Conv2D) (None, 224, 224, 3) 30
inception_v3 (Functional) (None, None, None, 2048) 21802784
global_average_pooling2d (G (None, 2048) 0
dense (Dense) (None, 1024) 2098176
dense_1 (Dense) (None, 512) 524800
dense_2 (Dense) (None, 256) 131328
dense_3 (Dense) (None, 128) 32896
dense_4 (Dense) (None, 64) 8256
dense_5 (Dense) (None, 32) 2080
dense_6 (Dense) (None, 2) 66
Total params: 24,600,416
Trainable params: 2,797,632
Non-trainable params: 21,802,784
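The totals in this summary can be checked with a few lines of arithmetic. This is only a sketch: the helper names are invented, and the frozen InceptionV3 count is copied from the summary rather than computed.

```python
def dense_params(in_features, units):
    # Weight matrix plus one bias per unit.
    return in_features * units + units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel per filter plus one bias per filter.
    return kernel_h * kernel_w * in_channels * filters + filters


# Widths of the classification head: GlobalAveragePooling2D output, then the Dense layers.
head_widths = [2048, 1024, 512, 256, 128, 64, 32, 2]
dense_counts = [dense_params(a, b) for a, b in zip(head_widths, head_widths[1:])]

# Conv2D(3, 3) applied to the single-channel input.
trainable = conv2d_params(3, 3, 1, 3) + sum(dense_counts)
frozen = 21_802_784  # InceptionV3 base, taken from the summary above

print(dense_counts)
print(trainable, trainable + frozen)
```

The trainable figure is the input convolution plus the Dense head, which is what remains once every layer of origModel is frozen.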
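This code worked after I batched the dataset before passing it to model.fit, i.e. I passed dataset.batch(...) instead of dataset and left the remaining arguments (epochs = 1, verbose = True, batch_size = 32) unchanged.
So I will have to look into using dataset.batch versus batch_size in model.fit.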
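To see why batching matters, here is a small NumPy sketch of what batching a stream of samples does to the shapes. It does not use tf.data. sample_generator is a hypothetical stand-in for datagen_row, and batched only mimics the grouping that dataset.batch performs.

```python
import numpy as np


def sample_generator(n_samples):
    # Hypothetical stand-in for datagen_row: yields one (image, label) pair at a time.
    for i in range(n_samples):
        yield np.zeros((224, 224, 1), dtype=np.float32), i % 2


def batched(pairs, batch_size):
    # Group consecutive samples and stack them along a new leading batch axis.
    images, labels = [], []
    for image, label in pairs:
        images.append(image)
        labels.append(label)
        if len(images) == batch_size:
            yield np.stack(images), np.array(labels, dtype=np.int64)
            images, labels = [], []
    if images:  # final, smaller batch
        yield np.stack(images), np.array(labels, dtype=np.int64)


single_image, _ = next(sample_generator(20))
batches = list(batched(sample_generator(20), batch_size=8))
batch_sizes = [images.shape[0] for images, _ in batches]

print(single_image.shape)      # rank 3: no batch axis
print(batches[0][0].shape)     # rank 4: what an input of (None, 224, 224, 1) expects
print(batch_sizes)
```

A model built with input shape (None, 224, 224, 1) needs the rank-4 form. When a dataset is passed to model.fit, its elements are consumed as they are, so the batch axis has to come from the dataset itself.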

Making Class Activation Map (CAM) for EfficientNetB3 architecture

I would like to draw a class activation map for a model built upon EfficientNet B3. But when I follow tutorials and code from different sources, it simply fails.
#load images
img = tf.keras.preprocessing.image.load_img(
base, target_size=(img_height, img_width))
img_array = tf.keras.preprocessing.image.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch
predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])
last_conv = model.layers[2].layers[-3]
grad_model = tf.keras.models.Model(
[model.inputs], [last_conv.output, model.output])
I can't build grad_model; it fails with:
ValueError: Graph disconnected: cannot obtain value for tensor
KerasTensor(type_spec=TensorSpec(shape=(None, 300, 300, 3),
dtype=tf.float32, name='input_1'), name='input_1',
description="created by layer 'input_1'") at layer "stem_conv". The
following previous layers were accessed without issue: []
This is the model:
Model: "sequential_1"
Layer (type) Output Shape Param #
sequential (Sequential) (None, 300, 300, 3) 0
rescaling (Rescaling) (None, 300, 300, 3) 0
efficientnet-b3 (Functional) (None, 10, 10, 1536) 10783528
global_average_pooling2d (Gl (None, 1536) 0
dropout (Dropout) (None, 1536) 0
dense (Dense) (None, 128) 196736
dense_1 (Dense) (None, 5) 645
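The graph disconnected error occurs because the EfficientNet base is nested inside your Sequential model as a single layer, so the outputs of its inner layers (such as the last conv layer) are not connected to the outer model's input. To fix this, build the model so that the base model's layers are part of the same graph. Here is one way to build a model for Grad-CAM.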
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
inputs = tf.keras.Input(shape=(300, 300, 3))
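x = keras.applications.EfficientNetB3(
    input_tensor=inputs,  # pass input to input_tensor
    include_top=False,    # no classifier head; implied by the layer listing below
)
# use x.output so the base model's layers become part of this model's graph
x = layers.GlobalAveragePooling2D()(x.output)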
# others
x = layers.Dense(128)(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(5)(x)
model = keras.Model(inputs, x)
for i, layer in enumerate(model.layers[-10:]):
    print(i, layer.name, layer.output_shape, layer.trainable)
0 block7b_project_bn (None, 10, 10, 384) True
1 block7b_drop (None, 10, 10, 384) True
2 block7b_add (None, 10, 10, 384) True
3 top_conv (None, 10, 10, 1536) True
4 top_bn (None, 10, 10, 1536) True
5 top_activation (None, 10, 10, 1536) True # <- we will pick these 2D maps
6 global_average_pooling2d_2 (None, 1536) True
7 dense_2 (None, 128) True
8 dropout (None, 128) True
9 dense_3 (None, 5) True
Build the Grad-CAM model:
grad_model = keras.models.Model(
    [model.inputs], [model.get_layer('top_activation').output, model.output])
With your setup, the following code would raise the graph disconnected error. With the model built as above, it does not.
import numpy as np
image = np.random.rand(1, 300, 300, 3).astype(np.float32)
with tf.GradientTape() as tape:
    convOutputs, predictions = grad_model(tf.cast(image, tf.float32))
    loss = predictions[:, tf.argmax(predictions[0])]
grads = tape.gradient(loss, convOutputs)
print(grads.shape)  # (1, 10, 10, 1536), no disconnected error
To get the heatmaps from your grad-cam model, check the following answers and sources as references.
Grad-CAM mult-output Model
Swin-Transformer : GradCAM
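For orientation, the usual Grad-CAM heatmap step can be sketched in NumPy once the conv outputs and gradients are available. This is a minimal illustration with random stand-ins for convOutputs[0] and grads[0]. The function name is invented, and the referenced answers remain the place to look for a full pipeline.

```python
import numpy as np


def grad_cam_heatmap(conv_outputs, grads):
    """conv_outputs and grads are (H, W, C) arrays for a single image."""
    # Channel importance: average the gradients over the spatial axes.
    weights = grads.mean(axis=(0, 1))                                # (C,)
    # Weighted sum of the feature maps, then ReLU to keep positive evidence only.
    cam = np.maximum((conv_outputs * weights).sum(axis=-1), 0.0)     # (H, W)
    # Normalize to [0, 1]; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


rng = np.random.default_rng(0)
conv = rng.random((10, 10, 1536)).astype(np.float32)            # stand-in for convOutputs[0]
grad = rng.standard_normal((10, 10, 1536)).astype(np.float32)   # stand-in for grads[0]
heatmap = grad_cam_heatmap(conv, grad)
print(heatmap.shape)
```

The resulting 10 x 10 map would then be resized to the input resolution (300 x 300 here) and overlaid on the image.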

Sentence classification: Why does my embedding not reduce the shape of the subsequent layer?

I want to embed sentences that all contain 5 words, and my training set has a total vocabulary of 10000 words. I use this code:
import tensorflow as tf
vocab_size = 10000
inputs = tf.keras.layers.Input(shape=(5,vocab_size), name="input", )
embedding = tf.keras.layers.Embedding(10000, 64)(inputs)
conv2d_1 = tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3),
                                  strides=1, padding='SAME')(embedding)
model = tf.keras.models.Model(inputs=inputs, outputs=conv2d_1)
After running I get:
Layer (type) Output Shape Param #
input (InputLayer) [(None, 5, 10000)] 0
embedding_105 (Embedding) (None, 5, 10000, 64) 640000
conv2d_102 (Conv2D) (None, 5, 10000, 32) 18464
I want the embedding to convert the sparse 10000x5 tensor into a dense 64x5 tensor. Apparently that doesn't work as intended, so my question is: why is the shape of the next layer (None, 5, 10000, 32) rather than (None, 5, 64, 32)? How can I achieve this compression?
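One way to look at the shapes: an embedding layer behaves like a lookup table indexed by integer word ids, so it appends the embedding dimension to whatever input shape it receives. The NumPy sketch below imitates that lookup with a random table. It is an illustration of the indexing only, not the Keras layer itself.

```python
import numpy as np

vocab_size, embed_dim = 10000, 64
rng = np.random.default_rng(0)
table = rng.standard_normal((vocab_size, embed_dim)).astype(np.float32)

# One integer id per word: input shape (batch, 5).
ids = rng.integers(0, vocab_size, size=(2, 5))
dense = table[ids]
print(dense.shape)       # the embedding axis replaces nothing, it is appended

# One-hot style input of shape (batch, 5, vocab_size): every 0/1 entry is
# treated as an index of its own, so the vocabulary axis survives.
one_hot = np.zeros((2, 5, vocab_size), dtype=np.int64)
one_hot[np.arange(2)[:, None], np.arange(5), ids] = 1
looked_up = table[one_hot]
print(looked_up.shape)

# Converting the one-hot rows back to ids recovers the compact result.
recovered = table[one_hot.argmax(axis=-1)]
print(recovered.shape)
```

Under this reading, declaring the input as shape (5,) with integer ids, rather than (5, vocab_size), should give an embedding output of (None, 5, 64).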

In Keras, the ResNet50 has a strange pattern

As you know, in a CNN only the Convolution and BatchNormalization layers have weights, and they are usually arranged like this: Conv - BN - ReLU - Conv - BN - ReLU
However, as you can see below, the actual structure looks unusual.
You can reproduce this result with:
model = tf.keras.applications.ResNet50()
# The unusual pattern begins at index 18.
I recommend using the debugging mode of your IDE; it makes this easier to see.
In the lines below, ResNet50 uses the stack_fn function to create its layers:
def ResNet50():
    def stack_fn(x):
        x = stack1(x, 64, 3, stride1=1, name='conv2')
        x = stack1(x, 128, 4, name='conv3')
        x = stack1(x, 256, 6, name='conv4')
        return stack1(x, 512, 3, name='conv5')
In the code below, stack1 simplifies the repeated residual blocks.
def stack1(x, filters, blocks, stride1=2, name=None):
    x = block1(x, filters, stride=stride1, name=name + '_block1')
    for i in range(2, blocks + 1):
        x = block1(x, filters, conv_shortcut=False, name=name + '_block' + str(i))
    return x
In the code below, block1 is the residual block used in ResNet50.
def block1(x, filters, kernel_size=3, stride=1, conv_shortcut=True, name=None):
    bn_axis = 3 if backend.image_data_format() == 'channels_last' else 1
    if conv_shortcut:
        # projection shortcut: 1x1 conv + BN so the shapes match at the Add
        shortcut = layers.Conv2D(
            4 * filters, 1, strides=stride, name=name + '_0_conv')(x)
        shortcut = layers.BatchNormalization(
            axis=bn_axis, epsilon=1.001e-5, name=name + '_0_bn')(shortcut)
    else:
        # identity shortcut
        shortcut = x
    x = layers.Conv2D(filters, 1, strides=stride, name=name + '_1_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_1_bn')(x)
    x = layers.Activation('relu', name=name + '_1_relu')(x)
    x = layers.Conv2D(
        filters, kernel_size, padding='SAME', name=name + '_2_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_2_bn')(x)
    x = layers.Activation('relu', name=name + '_2_relu')(x)
    x = layers.Conv2D(4 * filters, 1, name=name + '_3_conv')(x)
    x = layers.BatchNormalization(
        axis=bn_axis, epsilon=1.001e-5, name=name + '_3_bn')(x)
    # merge the shortcut path and the residual path
    x = layers.Add(name=name + '_add')([shortcut, x])
    x = layers.Activation('relu', name=name + '_out')(x)
    return x
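The naming scheme used by stack_fn, stack1 and block1 can be enumerated without building the network. The sketch below only replays the loop structure and the arguments shown above. describe_blocks is an invented helper, not part of Keras.

```python
# (stack name, filters, number of blocks) as passed to stack1 inside stack_fn
stacks = [("conv2", 64, 3), ("conv3", 128, 4), ("conv4", 256, 6), ("conv5", 512, 3)]


def describe_blocks(stacks):
    rows = []
    for stack_name, filters, blocks in stacks:
        for i in range(1, blocks + 1):
            rows.append({
                "name": f"{stack_name}_block{i}",
                # width of the last 1x1 convolution in block1
                "out_channels": 4 * filters,
                # stack1 passes conv_shortcut=False for every block after the first
                "conv_shortcut": i == 1,
            })
    return rows


rows = describe_blocks(stacks)
with_projection = [r["name"] for r in rows if r["conv_shortcut"]]

print(len(rows))
print(rows[0])
print(with_projection)
```

Only the first block of each stack gets the extra `_0_conv` and `_0_bn` layers. That is where a second Conv/BN pair shows up in the layer list next to the main path.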
My question is: why does the model instance differ from the structure defined in this code?
Sorry, I might have misunderstood your question previously.
In the printed summary there seem to be two contiguous conv layers, and I assume this is what you meant. However, they are in fact not contiguous.
ResNet has a branching (residual) structure, which means it is not sequential. TensorFlow's summary nevertheless prints the layers one after another, so look at the last column: it shows which layer each layer is connected to. This is how TensorFlow represents parallel structures.
For example, conv2_block1_0_conv is connected to pool1_pool, while
conv2_block1_3_conv is connected to conv2_block1_2_relu.
This means that although they are printed next to each other, they are not contiguous; they are parallel structures!
conv2_block1_0_conv and conv2_block1_0_bn are on the shortcut path,
while conv2_block1_3_conv and conv2_block1_3_bn are on the residual path.
Please feel free to comment if you have more questions on this part, or open a new post if you have other questions.
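The two paths can be made explicit with a small pure-Python sketch. It rebuilds the "connected to" relation for one block from the layer names used in block1 and then follows the links backwards. Both block1_edges and path_to are illustrative helpers, not Keras functions.

```python
def block1_edges(name, inbound, conv_shortcut=True):
    """Return {layer: [inputs]} for one residual block, following block1's naming."""
    edges = {}
    shortcut = inbound
    if conv_shortcut:
        edges[f"{name}_0_conv"] = [inbound]
        edges[f"{name}_0_bn"] = [f"{name}_0_conv"]
        shortcut = f"{name}_0_bn"
    prev = inbound
    for step, kinds in ((1, ("conv", "bn", "relu")),
                        (2, ("conv", "bn", "relu")),
                        (3, ("conv", "bn"))):
        for kind in kinds:
            layer = f"{name}_{step}_{kind}"
            edges[layer] = [prev]
            prev = layer
    edges[f"{name}_add"] = [shortcut, prev]   # the two paths meet here
    edges[f"{name}_out"] = [f"{name}_add"]
    return edges


def path_to(layer, edges, stop):
    """Follow inbound links from layer back to stop and return the path in forward order."""
    path = [layer]
    while path[-1] != stop:
        path.append(edges[path[-1]][0])
    return path[::-1]


edges = block1_edges("conv2_block1", "pool1_pool")
shortcut_path = path_to("conv2_block1_0_bn", edges, "pool1_pool")
residual_path = path_to("conv2_block1_3_bn", edges, "pool1_pool")

print(shortcut_path)
print(residual_path)
print(edges["conv2_block1_add"])
```

Both paths start at pool1_pool and only meet again at conv2_block1_add, which is why layers from different paths can appear next to each other in a flat listing without feeding into one another.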
model.weights return weights of a model (which is self-explanatory by name).
Conv - BN - ReLU - Conv - BN - ReLU are layers.
Conv stands for Convolutional layer, BN stands for Batch Normalization, ReLU is activation.
To get a list of layers, you can use model.layers (which returns a list of Layer objects). If you simply want to see the summary of model structure, use model.summary() to print the structure
For example, ResNet50().summary() gives (partial output)
Model: "resnet50"
__________________________________________________________________________________________________
Layer (type)                      Output Shape             Param #   Connected to
==================================================================================================
input_1 (InputLayer)              [(None, 224, 224, 3)]    0
conv1_pad (ZeroPadding2D)         (None, 230, 230, 3)      0
conv1_conv (Conv2D)               (None, 112, 112, 64)     9472
conv1_bn (BatchNormalization)     (None, 112, 112, 64)     256
conv1_relu (Activation)           (None, 112, 112, 64)     0
pool1_pad (ZeroPadding2D)         (None, 114, 114, 64)     0
pool1_pool (MaxPooling2D)         (None, 56, 56, 64)       0
conv2_block1_1_conv (Conv2D)      (None, 56, 56, 64)       4160
conv2_block1_1_bn (BatchNormali   (None, 56, 56, 64)       256
conv2_block1_1_relu (Activation   (None, 56, 56, 64)       0

"Tap" a specific layer in existing Keras Model and make a branch to a new output?

I'm using tf.keras (TensorFlow 1.14) on Google Colab, and my model architecture is MobileNet V2 1.00 224.
I am trying (and failing) to attach a new layer, and create a new output, at an existing layer that is not the normal output of my model, i.e. to make a branch earlier in MobileNet V2.
I want this new branch to produce a regression output, but I don't want that output to be connected serially to the final embedding layer of MobileNet; it should branch off at a much earlier stage (which one, I'm not sure; I'm experimenting). Basically, I want a branch with its own output, plus the normal pre-trained ImageNet embedding output.
Grab MobileNet V2 as base_model:
base_model = tf.keras.applications.MobileNetV2(input_shape=(IMG_SIZE, IMG_SIZE, 3),
base_model.trainable = False
Make my layers from base_model and make my new outputs.
# get layers from mobilenet base layer
mobilenet_input = base_model.get_layer('input_1')
mobilenet_output = base_model.get_layer('out_relu')
# add our average pooling layer to our MobileNetV2 output like all of our other classifiers so we split our graph on the same nodes
out_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name='embedding_pooling')(mobilenet_output.output)
out_global_pooling.trainable = False
# Our new branch and outputs for the branch
expanded_conv_depthwise_BN = base_model.get_layer('expanded_conv_depthwise_BN')
regression_dropout = tf.keras.layers.Dropout(0.5) (expanded_conv_depthwise_BN.output)
regression_global_pooling = tf.keras.layers.GlobalAveragePooling2D(name="regression_pooling")(regression_dropout)
new_regression_output = tf.keras.layers.Dense(num_labels, activation = 'sigmoid', name = "cinemanet_output") (regression_global_pooling)
This appears to be fine, and I can even make my model via the functional API:
model = tf.keras.Model(inputs=mobilenet_input.input, outputs=[out_global_pooling, new_regression_output])
My Training Code
My data set is a set of 30 floats (10 RGB triplets) that I want to predict from an input image. My data set works when training a 'sequence' model, but fails when I try to train this model.
tf.keras.backend.set_learning_phase(1) # 0 testing, 1 training mode
# preview contents of CSV to verify things are sane
import csv
import math
def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
def csvheaderrow(filename):
    with open(filename) as f:
        reader = csv.reader(f)
        return next(reader, None)
# !head {label_file}
NUM_IMAGES = ( lenopenreadlines(label_file) - 1) # remove header
COLUMN_NAMES = csvheaderrow(label_file)
# make our data set
FILE_PATH = ["filepath"]
print("Label contains: " + str(NUM_IMAGES) + " images")
print("Label Are: " + LABELS_TO_PRINT)
print("Creating Data Set From " + label_file)
csv_dataset = get_dataset(label_file, BATCH_SIZE, NUM_EPOCHS, COLUMN_NAMES)
#make a new data set from our csv by mapping every value to the above function
split_dataset =
# make a new datas set that loads our images from the first path
image_and_labels_ds =, num_parallel_calls=AUTOTUNE)
# update our image floating point range to match -1, 1
ds =
model = build_model(LABEL_NAMES, use_masked_loss)
#split the final data set into train / validation splits to use for our model.
ds = ds.repeat()
steps_per_epoch = int(math.floor(DATASET_SIZE/BATCH_SIZE))
history =, epochs=NUM_EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[TensorBoardColabCallback(tbc)])
# results = model.evaluate(test_dataset)
# print('test loss, test acc:', results)
export_model(model, model_name, LABEL_NAMES, date)
ValueError: Error when checking model target:
the list of Numpy arrays that you are passing to your model is not the size the model expected.
Expected to see 2 array(s), but instead got the following list of 1 arrays:
[<tf.Tensor 'IteratorGetNext:1' shape=(?, 30) dtype=float32>]
If I instead use a Sequence and naively train my regression task against the final output of MobileNet (rather than the branch), training works fine (although I get poor results).
My model summary appears to tell me things are wired as I expect: my dropout is connected to expanded_conv_depthwise_BN, my regression pooling is connected to my dropout, and my output layer appears in the summary connected to my regression pooling.
Model: "model"
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) [(None, 224, 224, 3) 0
Conv1_pad (ZeroPadding2D) (None, 225, 225, 3) 0 input_1[0][0]
Conv1 (Conv2D) (None, 112, 112, 32) 864 Conv1_pad[0][0]
bn_Conv1 (BatchNormalization) (None, 112, 112, 32) 128 Conv1[0][0]
Conv1_relu (ReLU) (None, 112, 112, 32) 0 bn_Conv1[0][0]
expanded_conv_depthwise (Depthw (None, 112, 112, 32) 288 Conv1_relu[0][0]
expanded_conv_depthwise_BN (Bat (None, 112, 112, 32) 128 expanded_conv_depthwise[0][0]
expanded_conv_depthwise_relu (R (None, 112, 112, 32) 0 expanded_conv_depthwise_BN[0][0]
expanded_conv_project (Conv2D) (None, 112, 112, 16) 512 expanded_conv_depthwise_relu[0][0
< snip for brevity >
block_16_project (Conv2D) (None, 7, 7, 320) 307200 block_16_depthwise_relu[0][0]
block_16_project_BN (BatchNorma (None, 7, 7, 320) 1280 block_16_project[0][0]
Conv_1 (Conv2D) (None, 7, 7, 1280) 409600 block_16_project_BN[0][0]
Conv_1_bn (BatchNormalization) (None, 7, 7, 1280) 5120 Conv_1[0][0]
dropout (Dropout) (None, 112, 112, 32) 0 expanded_conv_depthwise_BN[0][0]
out_relu (ReLU) (None, 7, 7, 1280) 0 Conv_1_bn[0][0]
regression_pooling (GlobalAvera (None, 32) 0 dropout[0][0]
embedding_pooling (GlobalAverag (None, 1280) 0 out_relu[0][0]
cinemanet_output (Dense) (None, 30) 990 regression_pooling[0][0]
Total params: 2,258,974
Trainable params: 990
Non-trainable params: 2,257,984
It looks like you are setting things up correctly, but your training dataset doesn't include target tensors for both outputs: the model has two outputs, while the dataset provides only the (?, 30) regression labels. If you only want to train the new output, you can provide dummy tensors (or even real training data) for the other one while using a loss weight of 0 to prevent the parameters from updating. That should also prevent any parameters that are not directly "upstream" of the new output layer from updating during training.
When compiling your model, use the loss_weights argument to pass the weights as either a list (e.g., loss_weights=[0, 1]) or a dictionary keyed by the names of the output layers (e.g., loss_weights={'embedding_pooling': 0, 'cinemanet_output': 1}).
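The effect of a zero loss weight, and the shape of the extra dummy target, can be sketched in NumPy. The per-output loss values below are made-up numbers, the helper names are invented, and the output names (embedding_pooling, cinemanet_output) are taken from the model summary in the question.

```python
import numpy as np


def add_dummy_target(labels, embedding_dim=1280):
    # Pair the real regression labels with a zero placeholder for the embedding output,
    # in the same order as the model outputs: (embedding, regression).
    dummy = np.zeros((labels.shape[0], embedding_dim), dtype=np.float32)
    return dummy, labels


def weighted_total_loss(losses, loss_weights):
    # With several outputs, the total loss is the weighted sum of the per-output losses.
    return sum(loss_weights[name] * value for name, value in losses.items())


labels = np.random.default_rng(0).random((4, 30)).astype(np.float32)
targets = add_dummy_target(labels)

losses = {"embedding_pooling": 0.9, "cinemanet_output": 0.25}    # illustrative values
loss_weights = {"embedding_pooling": 0.0, "cinemanet_output": 1.0}
total = weighted_total_loss(losses, loss_weights)

print([t.shape for t in targets])
print(total)
```

With weight 0 the embedding output contributes nothing to the total, so its gradient is zero and only the regression branch drives the update. In a tf.data pipeline the same pairing would be done in a map step that returns the image together with both targets.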