Difference of calling the Keras pretrained model without including top layers - tensorflow

What is the difference between calling the VGG16 model with and without the top layers included? I wonder why the input dimensions of the layers are not shown in the model summary when the model is called without the top layers. I used the VGG16 model in the following two ways:
from keras.applications import vgg16
model = vgg16.VGG16(weights='imagenet', include_top=False)
model.summary()
The shapes of the layers in this model do not show any input dimensions, i.e. (None, None, None, 64); please see below:
Layer (type) Output Shape Param
===================================================================
block1_conv1 (Conv2D) (None, None, None, 64) 1792
block1_conv2 (Conv2D) (None, None, None, 64) 36928
block1_pool (MaxPooling2D) (None, None, None, 64) 0
However, the following code does show the input dimensions:
from keras.applications import vgg16
model = vgg16.VGG16()
model.summary()
The layer shapes, in this case, include the input dimensions:
Layer (type) Output Shape Param
==================================================================
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
I would like to understand why this is the case. Please comment.

The top layers of VGG are fully-connected layers which are connected to the output of the convolutional base. These contain a fixed number of nodes, with the option to instantiate them with weights pretrained on ImageNet. When instantiating a VGG model with the top layers included, the size of the architecture is therefore fixed, and the model will only accept images with a fixed input size of (224, 224, 3). Feeding the network images of other sizes would change the number of weights in the dense classification layers.
When you leave out the top classifier, however, you'll be able to feed images of varying size to the network, and the output of the convolutional stack will change accordingly. In this way, you can apply the VGG architecture to images of your size of choice and paste your own densely connected classifier on top of it. In contrast with the dense layers, the number of weights in the convolutional layers stays the same; only the shape of their output changes.
You will notice all this when you instantiate a VGG model without the top layer, but with a specific input shape:
from keras.applications import vgg16
model = vgg16.VGG16(include_top=False, input_shape=(100,100,3))
model.summary()
This will produce:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) (None, 100, 100, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 100, 100, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 100, 100, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 50, 50, 64) 0
_____________________________________________________________
etc.
It's interesting to see how the output shape of the convolutional layers changes as you call the architecture with different input shapes. For the above example, we get:
block5_conv3 (Conv2D) (None, 6, 6, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 3, 3, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
If you instead instantiate the architecture with images of shape (400, 400, 3), you get this output:
_________________________________________________________________
block5_conv3 (Conv2D) (None, 25, 25, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 12, 12, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
Note how the number of weights remains the same in both cases.
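To illustrate the last point, here is a minimal sketch of how a custom classifier could be attached to the headless convolutional base. The (100, 100, 3) input matches the example above; the 10-class output and the dense layer sizes are assumptions for illustration only, not part of the original question:
from keras.applications import vgg16
from keras import layers, models

# Convolutional base without the top classifier, for 100x100 RGB inputs.
base = vgg16.VGG16(weights='imagenet', include_top=False, input_shape=(100, 100, 3))
base.trainable = False  # freeze the pretrained convolutional weights

# Paste a new densely connected classifier on top of the (3, 3, 512) feature maps.
model = models.Sequential()
model.add(base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))  # assumed 10-class problem
model.summary()
Only the new Dense layers add trainable weights; the 14,714,688 convolutional parameters stay fixed regardless of the chosen input size.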

Related

Keras model shape incompatible / ValueError: Shapes (None, 3) and (None, 3, 3) are incompatible

I'm trying to train my Keras model, but the shapes are incompatible.
The error says
ValueError: Shapes (None, 3) and (None, 3, 3) are incompatible
My training set's shape is (2000, 3, 768) and the labels' shape is (2000, 3).
Where is the problem?
Model define & fit code
input_shape = x_train.shape[1:]
model = my_dnn(input_shape, 3)
model.fit(x_train, y_train, epochs=25, verbose=1)
Model code
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout

def my_dnn(input, num_classes):
    model = Sequential()
    model.add(tf.keras.Input(input))
    model.add(Dense(1024))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dense(225))
    model.add(Activation('relu'))
    model.add(Dense(100))
    model.add(Activation('relu'))
    model.add(Dense(num_classes))
    model.add(Activation('sigmoid'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
    return model
In addition to what's been said, it seems you are carrying the second dimension of the input data through to the end of the model, so your model summary looks something like this:
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 3, 1024) 787456
_________________________________________________________________
activation_1 (Activation) (None, 3, 1024) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 3, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 3, 512) 524800
_________________________________________________________________
activation_2 (Activation) (None, 3, 512) 0
_________________________________________________________________
dense_3 (Dense) (None, 3, 225) 115425
_________________________________________________________________
activation_3 (Activation) (None, 3, 225) 0
_________________________________________________________________
dense_4 (Dense) (None, 3, 100) 22600
_________________________________________________________________
activation_4 (Activation) (None, 3, 100) 0
_________________________________________________________________
dense_5 (Dense) (None, 3, 3) 303
_________________________________________________________________
activation_5 (Activation) (None, 3, 3) 0
=================================================================
Total params: 1,450,584
Trainable params: 1,450,584
Non-trainable params: 0
As you can see, the output shape of the model, (None, 3, 3), is not compatible with the labels' shape, (None, 3), and at some point you need to use a Flatten layer.
There are two possible reasons:
Your problem is multi-class classification, hence you need softmax instead of sigmoid, plus accuracy or CategoricalAccuracy() as the metric.
Your problem is multi-label classification, hence you need binary_crossentropy and tf.keras.metrics.BinaryAccuracy().
Depending on how your dataset is built and the task you are trying to solve, you need to opt for one of those.
For case 1, ensure your data is one-hot encoded (OHE).
Also, Marco Cerliani and Amir (in the comment below) point out that the data output needs to be in a 2D format rather than 3D: you should either preprocess the data accordingly before feeding it to the network or use, as suggested in the comment below, a Flatten() at some point (probably before the final Dense()).
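As a minimal sketch of how the model could be adjusted for case 1 (multi-class with one-hot labels): the hidden layer sizes and the exact Flatten placement are illustrative assumptions, not the original poster's code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten

def my_dnn_fixed(input_shape, num_classes):
    model = Sequential()
    model.add(tf.keras.Input(input_shape))           # input_shape = (3, 768)
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(512, activation='relu'))
    model.add(Flatten())                              # collapse (3, 512) into one vector so the output is 2D
    model.add(Dense(num_classes, activation='softmax'))  # multi-class head with one-hot labels
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=[tf.keras.metrics.CategoricalAccuracy()])
    return model
With the Flatten in place, the model's output shape becomes (None, 3), which matches the (2000, 3) labels.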

Shape of the LSTM layers in multilayer LSTM model

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(tokenizer.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
The second layer has 64 hidden units and, since return_sequences=True, it will output sequences of 64-dimensional vectors as well. But how can this be fed to an LSTM with 32 hidden units? Won't that cause a shape mismatch error?
Actually no, it won't. First of all, the second layer won't have an output dimension of 64, but of 128. This is because you are using a Bidirectional layer: the outputs of the forward and backward passes are concatenated, so the output will be (None, None, 64+64=128). You can refer to the link.
RNN data is shaped as (batch_size, time_steps, number_of_features). This means that when you connect two layers with different numbers of units, the feature dimension simply increases or decreases according to the number of units. You can follow the particular link for more details.
For your particular code, this is how the model summary will look. So, to answer in short, there won't be a mismatch.
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, None, 64) 32000
_________________________________________________________________
bidirectional (Bidirectional (None, None, 128) 66048
_________________________________________________________________
bidirectional_1 (Bidirection (None, 64) 41216
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dense_3 (Dense) (None, 1) 65
=================================================================
Total params: 143,489
Trainable params: 143,489
Non-trainable params: 0
_________________________________________________________________
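As a quick sanity check, here is a minimal sketch that passes a dummy batch through the first two layers and inspects the Bidirectional output shape; the vocabulary size, batch size, and sequence length are arbitrary assumptions for illustration:
import tensorflow as tf

vocab_size = 500  # assumed vocabulary size, stands in for tokenizer.vocab_size

probe = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
])

dummy = tf.zeros((8, 10), dtype=tf.int32)  # batch of 8 sequences, 10 timesteps each
print(probe(dummy).shape)                  # (8, 10, 128): forward and backward states concatenated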

Select size for output vector with 1000s of labels

Most of the examples on the Internet regarding multi-label image classification are based on just a few labels. For example, with 6 classes we get:
model = models.Sequential()
model.add(layer=base)
model.add(layer=layers.Flatten())
model.add(layer=layers.Dense(units=256, activation="relu"))
model.add(layer=layers.Dense(units=6, activation="sigmoid"))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Model) (None, 7, 7, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 6422784
_________________________________________________________________
dense_2 (Dense) (None, 6) 1542
=================================================================
Total params: 21,139,014
Trainable params: 13,503,750
Non-trainable params: 7,635,264
However, for datasets with significantly more labels, the number of training parameters explodes and eventually the training process fails with a ResourceExhaustedError. For example, with 3047 labels we get:
model = models.Sequential()
model.add(layer=base)
model.add(layer=layers.Flatten())
model.add(layer=layers.Dense(units=256, activation="relu"))
model.add(layer=layers.Dense(units=3047, activation="sigmoid"))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Model) (None, 7, 7, 512) 14714688
_________________________________________________________________
flatten_1 (Flatten) (None, 25088) 0
_________________________________________________________________
dense_1 (Dense) (None, 256) 6422784
_________________________________________________________________
dense_2 (Dense) (None, 3047) 783079
=================================================================
Total params: 21,920,551
Trainable params: 14,285,287
Non-trainable params: 7,635,264
_________________________________________________________________
Obviously, there is something wrong with my network, but I'm not sure how to overcome this issue...
A ResourceExhaustedError is related to memory issues: either you don't have enough memory in your system, or some other part of the code is causing memory pressure.
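As a minimal sketch (the batch size, epochs, and the x_train/y_train names are assumptions, not from the original post), one common way to reduce memory pressure is to lower the batch size passed to fit():
# Smaller batches mean fewer activations are held in memory at once during training.
model.fit(x_train, y_train, batch_size=8, epochs=10)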

LSTM Network not learning from sequences. Underfiting or Overfitting using Keras, TF backend

Thanks in advance for your help.
I am working on a problem with sequences of 4 characters. I have around 18,000 sequences in the training set, and I am working with Keras with the TensorFlow backend. The total number of possible characters to predict is 52.
When I use a network like the one shown below in "Network A", with around 490K parameters to learn, the network tremendously overfits and the validation loss increases dramatically within 300 epochs. Either way, the validation accuracy does not go above 20%.
When I use "Network B" below, with around 8K parameters to learn, the network does not seems to learn. Accuracy does not go over 40% even in 3000 epochs for the training data and around 10% for validation set..
I have tried lots of configurations in the middle without any real success.
Do you have any recommendation?
Both cases using the following config:
rms = keras.optimizers.RMSprop(lr=0.01, rho=0.9, epsilon=None, decay=0.0)
model.compile(loss='categorical_crossentropy', optimizer=rms, metrics=['accuracy'])
Network A
Shape of input matrix:
4 1
Shape of Output:
57
Layer (type) Output Shape Param #
=================================================================
lstm_3 (LSTM) (None, 4, 256) 264192
_________________________________________________________________
dropout_2 (Dropout) (None, 4, 256) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 4, 128) 197120
_________________________________________________________________
dropout_3 (Dropout) (None, 4, 128) 0
_________________________________________________________________
lstm_5 (LSTM) (None, 32) 20608
_________________________________________________________________
dense_1 (Dense) (None, 128) 4224
_________________________________________________________________
dropout_4 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 57) 7353
_________________________________________________________________
activation_1 (Activation) (None, 57) 0
=================================================================
Total params: 493,497
Trainable params: 493,497
Non-trainable params: 0
"Network B"
Shape of input matrix:
4 1
Shape of Output:
57
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_6 (LSTM) (None, 4, 32) 4352
_________________________________________________________________
dropout_5 (Dropout) (None, 4, 32) 0
_________________________________________________________________
lstm_7 (LSTM) (None, 16) 3136
_________________________________________________________________
dropout_6 (Dropout) (None, 16) 0
_________________________________________________________________
dense_3 (Dense) (None, 57) 969
_________________________________________________________________
activation_2 (Activation) (None, 57) 0
=================================================================
Total params: 8,457
Trainable params: 8,457
Non-trainable params: 0
I can see that your input shape is "4x1" and that you feed it directly to your LSTM. What is the format of your input? Because here it seems that at each timestep (for each character) you have a dimension of 1 (so maybe you just passed an int?).
As you said, you are dealing with sequences of 4 characters, so you have to treat them as categorical variables and encode them in a proper way.
You could, for example, one-hot encode them, or embed them to a certain dimension using an Embedding layer.
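As a minimal sketch of the one-hot option (the alphabet and the example sequence are made up for illustration), each 4-character sequence becomes a (4, 52) matrix, so every timestep has 52 features instead of 1:
import numpy as np

# Hypothetical alphabet of 52 possible characters (lowercase + uppercase letters).
alphabet = [chr(c) for c in range(ord('a'), ord('z') + 1)] + [chr(c) for c in range(ord('A'), ord('Z') + 1)]
char_to_idx = {ch: i for i, ch in enumerate(alphabet)}

def one_hot_sequence(seq, num_chars=52):
    # Encode a string of characters as a (timesteps, num_chars) one-hot matrix.
    encoded = np.zeros((len(seq), num_chars), dtype=np.float32)
    for t, ch in enumerate(seq):
        encoded[t, char_to_idx[ch]] = 1.0
    return encoded

x = one_hot_sequence("abcd")
print(x.shape)  # (4, 52): 4 timesteps, 52 features per timestep
The LSTM input shape would then be (4, 52) per sample rather than (4, 1).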

How to optimize keras model with batchnorm layers with Intel inference engine (OpenVINO)?

I failed to optimize a Keras model with the Intel inference engine (OpenVINO toolkit R.5).
I froze my model just as the tutorial suggests. The Keras model is trained and tested; I need to optimize it for inference.
However, I get an error while running the model optimizer (the mo.py script) on my custom model:
[ ERROR ] shapes (128,9) and (0,) not aligned: 9 (dim 1) != 0 (dim 0)
The last few layers of my model (9 is the number of output classes) are:
conv2d_4 (Conv2D) (None, 4, 4, 128) 204928 batch_normalization_3[0][0]
__________________________________________________________________________________________________
activation_4 (Activation) (None, 4, 4, 128) 0 conv2d_4[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 4, 4, 128) 512 activation_4[0][0]
__________________________________________________________________________________________________
average_pooling2d_2 (AveragePoo (None, 1, 1, 128) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 1, 1, 128) 0 average_pooling2d_2[0][0]
__________________________________________________________________________________________________
flatten (Flatten) (None, 128) 0 dropout_2[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 128) 16512 flatten[0][0]
__________________________________________________________________________________________________
activation_5 (Activation) (None, 128) 0 dense[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 128) 512 activation_5[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 128) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 9) 1161 dropout_3[0][0]
__________________________________________________________________________________________________
color_prediction (Activation) (None, 9) 0 dense_1[0][0]
__________________________________________________________________________________________________
The model optimizer fails due to the presence of BatchNormalization layers. When I remove them, it runs successfully. However, I freeze the graph with
tf.keras.backend.set_learning_phase(0)
so nodes like BatchNormalization and Dropout should be removed in the frozen graph; I can't figure out why they are not removed.
Thanks a lot!
I managed to run the OpenVINO model optimizer on a Keras model with BatchNormalization layers. The model also seemed to converge a little faster, though the test classification rate was lower by about 5-7% (and the gap between the classification rates on the testing and training datasets was bigger) than that of the model without BN. I am not sure whether BatchNormalization is properly removed from the model in my solution (but the OpenVINO model file doesn't include one, so it is removed).
Remove the BN and Dropout layers:
import tensorflow as tf

# Clear any previous session.
tf.keras.backend.clear_session()
# This line must be executed before loading the Keras model.
tf.keras.backend.set_learning_phase(0)

model = tf.keras.models.load_model(weights_path)
for layer in model.layers:
    layer.training = False
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer._per_input_updates = {}
    elif isinstance(layer, tf.keras.layers.Dropout):
        layer._per_input_updates = {}
And then freeze the session:
def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    """
    Freezes the state of a session into a pruned computation graph.

    Creates a new computation graph where variable nodes are replaced by
    constants taking their current value in the session. The new graph will be
    pruned so subgraphs that are not necessary to compute the requested
    outputs are removed.
    @param session The TensorFlow session to be frozen.
    @param keep_var_names A list of variable names that should not be frozen,
                          or None to freeze all the variables in the graph.
    @param output_names Names of the relevant graph outputs.
    @param clear_devices Remove the device directives from the graph for better portability.
    @return The frozen graph definition.
    """
    from tensorflow.python.framework.graph_util import convert_variables_to_constants
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        # Graph -> GraphDef ProtoBuf
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = convert_variables_to_constants(session, input_graph_def,
                                                      output_names, freeze_var_names)
        return frozen_graph
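For completeness, a minimal usage sketch in the same TF 1.x / Keras backend style as above; the output directory and file name are assumptions, and the resulting .pb file is what mo.py would then consume:
import tensorflow as tf
from tensorflow.keras import backend as K

# Freeze the current session, keeping only the subgraph needed to compute the model outputs.
frozen_graph = freeze_session(K.get_session(),
                              output_names=[out.op.name for out in model.outputs])

# Write the frozen GraphDef to disk (assumed path "model/frozen_model.pb").
tf.train.write_graph(frozen_graph, "model", "frozen_model.pb", as_text=False)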