How to read Keras's model structure? - tensorflow

For example:
BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(train_dataset))
test_dataset = test_dataset.padded_batch(BATCH_SIZE, tf.compat.v1.data.get_output_shapes(test_dataset))
def pad_to_size(vec, size):
zeros = [0] * (size - len(vec))
vec.extend(zeros)
return vec
...
model = tf.keras.Sequential([
tf.keras.layers.Embedding(encoder.vocab_size, 64),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=False)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
print(model.summary())
The print reads as:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, None, 64) 523840
_________________________________________________________________
bidirectional (Bidirectional (None, 128) 66048
_________________________________________________________________
dense (Dense) (None, 64) 8256
_________________________________________________________________
dense_1 (Dense) (None, 1) 65
=================================================================
Total params: 598,209
Trainable params: 598,209
Non-trainable params: 0
I have the following question:
1) For the embedding layer, why is the ouput shape is (None, None, 64). I understand '64' is the vector length. Why are the other two None?
2) How is the output shape of bidirectional layer is (None, 128)? Why is it 128?

For the embedding layer, why is the ouput shape is (None, None, 64). I understand '64' is the vector length. Why are the other two None?
You can see this function produces (None,None) (including the batch dimension) (in other words it does input_shape=(None,) as default) if you don't define the input_shape to the first layer of the Sequential model.
If you pass in an input tensor of size (None, None) to an embedding layer, it produces a (None, None, 64) tensor assuming embedding dimension is 64. The first None is the batch dimension and the second is the time dimension (refers to the input_length parameter). So that's why you get a (None, None, 64) sized output.
How is the output shape of bidirectional layer is (None, 128)? Why is it 128?
Here, you have a Bidirectional LSTM. Your LSTM layer produces a (None, 64) sized output (when return_sequences=False). When you have a Bidirectional layer it is like having two LSTM layers (one going forward, other going backwards). And you have a default merge_mode of concat meaning that the two output states from forward and backward layers will be concatenated. This gives you a (None, 128) sized output.

Related

ValueError: Input 0 of layer sequential_40 is incompatible with the layer: expected min_ndim=3, found ndim=2. Full shape received: (None, 58)

I am working on a dataset about student performance in a course, and I want to predict student level (low, mid, high) according to their previous year's marks. I'm using a CNN for this purpose, but when I build and fit the model I get this error:
ValueError: Input 0 of layer sequential_40 is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (None, 58)
This is the code:
#reshaping data
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1]))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1]))
#checking the shape after reshaping
print(X_train.shape)
print(X_test.shape)
#normalizing the pixel values
X_train=X_train/255
X_test=X_test/255
#defining model
model=Sequential()
#adding convolution layer
model.add(Conv1D(32,3, activation='relu',input_shape=(28,1)))
#adding pooling layer
model.add(MaxPool1D(pool_size=2))
#adding fully connected layer
model.add(Flatten())
model.add(Dense(100,activation='relu'))
#adding output layer
model.add(Dense(10,activation='softmax'))
#compiling the model
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
model.summary()
#fitting the model
model.fit(X_train,y_train,epochs=10)
This is the output:
Model: "sequential_40"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv1d_23 (Conv1D) (None, 9, 32) 128
_________________________________________________________________
max_pooling1d_19 (MaxPooling (None, 4, 32) 0
_________________________________________________________________
flatten_15 (Flatten) (None, 128) 0
_________________________________________________________________
dense_30 (Dense) (None, 100) 12900
_________________________________________________________________
dense_31 (Dense) (None, 10) 1010
=================================================================
Total params: 14,038
Trainable params: 14,038
Non-trainable params: 0

How does model.weights in tensorflow/keras work?

I have a model trained.
summary is as follows
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 256) 2560
dense_1 (Dense) (None, 128) 32896
dropout (Dropout) (None, 128) 0
dense_2 (Dense) (None, 1) 129
=================================================================
Total params: 35,585
Trainable params: 35,585
Non-trainable params: 0
_________________________________________________________________
And have weights
for i,weight in enumerate(Model.weights):
exec('w{}=np.array(weight)'.format(i))
have test data for predict
x=test_data.iloc[0]
then I predict with model
Model.predict(np.array(x).reshape(1,9))
get array([[226241.66]], dtype=float32)
then I predict with weights
((x#w0+w1)#w2+w3)#w4+w5
get array([98039.99664026])
Can someone explain how the weights in model works?
And how to get the model-predict result with weights?
Try Model.layers which will return a list of all layers in your model, each layer has a function get_weights() which will return the weights as numpy arrays. I was able to reproduce the output of a simple 3 layer feed-forward model with this approach.
for i,layer in enumerate(model.layers):
exec('w{}=np.array(layer.get_weights()[0])'.format(i)) # weight
exec('b{}=np.array(layer.get_weights()[1])'.format(i)) # bias
X = np.random.randn(1,9)
np.allclose(((X#w1[0] + b1[1])#w2[0] + b2[1])#w4[0] + b4[1], model.predict(X)) # True
Note: In my examle layer 0 was a input layer (no weights) and layer 3 a dropout layer (no weights). When calling model.predict(), dropout is not applied, therefore you can ignore it in this case.

Grad-CAM in keras, ValueError: Graph disconnected: cannot obtain value for tensor Tensor "input_11_6:0", shape=(None, 150, 150, 3)

How to perform Grad-CAM on pretrained custom model.
How to select last_conv_layer_name and classifier_layer_names?
What is its significances and how to select layers' names?
Should I consider Densenet121 sublayers or densenet as one functional layer?
How to perform Grad-CAM for this trained network?
These are the steps I tried,
#load model and custom metrics
dependencies = {'recall_m': recall_m, 'precision_m' : precision_m, 'f1_m' : f1_m }
model = keras.models.load_model("model_val_acc-73.33.h5", custom_objects = dependencies)
model.summary()
Model: "sequential_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
densenet121 (Functional) (None, 4, 4, 1024) 7037504
_________________________________________________________________
flatten (Flatten) (None, 16384) 0
_________________________________________________________________
dense_encoder (Dense) (None, 1024) 16778240
_________________________________________________________________
dropout_51 (Dropout) (None, 1024) 0
_________________________________________________________________
dense_2 (Dense) (None, 256) 262400
_________________________________________________________________
dropout_52 (Dropout) (None, 256) 0
_________________________________________________________________
dense_3 (Dense) (None, 128) 32896
_________________________________________________________________
dropout_53 (Dropout) (None, 128) 0
_________________________________________________________________
dense_4 (Dense) (None, 64) 8256
_________________________________________________________________
dropout_54 (Dropout) (None, 64) 0
_________________________________________________________________
dense_5 (Dense) (None, 32) 2080
_________________________________________________________________
dropout_55 (Dropout) (None, 32) 0
_________________________________________________________________
Final (Dense) (None, 2) 66
=================================================================
Total params: 24,121,442
Trainable params: 17,083,938
Non-trainable params: 7,037,504
This is the heat map function:-
###defining heat map
def make_gradcam_heatmap(img_array, model, last_conv_layer_name, classifier_layer_names):
# First, we create a model that maps the input image to the activations
# of the last conv layer
last_conv_layer = model.get_layer(last_conv_layer_name)
last_conv_layer_model = keras.Model(model.inputs, last_conv_layer.output)
# Second, we create a model that maps the activations of the last conv
# layer to the final class predictions
classifier_input = keras.Input(shape=last_conv_layer.output.shape[1:])
x = classifier_input
for layer_name in classifier_layer_names:
x = model.get_layer(layer_name)(x)
classifier_model = keras.Model(classifier_input, x)
# Then, we compute the gradient of the top predicted class for our input image
# with respect to the activations of the last conv layer
with tf.GradientTape() as tape:
# Compute activations of the last conv layer and make the tape watch it
last_conv_layer_output = last_conv_layer_model(img_array)
tape.watch(last_conv_layer_output)
# Compute class predictions
preds = classifier_model(last_conv_layer_output)
top_pred_index = tf.argmax(preds[0])
top_class_channel = preds[:, top_pred_index]
# This is the gradient of the top predicted class with regard to
# the output feature map of the last conv layer
grads = tape.gradient(top_class_channel, last_conv_layer_output)
# This is a vector where each entry is the mean intensity of the gradient
# over a specific feature map channel
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
# We multiply each channel in the feature map array
# by "how important this channel is" with regard to the top predicted class
last_conv_layer_output = last_conv_layer_output.numpy()[0]
pooled_grads = pooled_grads.numpy()
for i in range(pooled_grads.shape[-1]):
last_conv_layer_output[:, :, i] *= pooled_grads[i]
# The channel-wise mean of the resulting feature map
# is our heatmap of class activation
heatmap = np.mean(last_conv_layer_output, axis=-1)
# For visualization purpose, we will also normalize the heatmap between 0 & 1
heatmap = np.maximum(heatmap, 0) / np.max(heatmap)
return heatmap
this is an image input
img_array = X_test[10] # 10th image sample
X_test[10].shape
#(150, 150, 3)
last_conv_layer_name = "densenet121"
classifier_layer_names = [ "dense_2", "dense_3", "dense_4", "dense_5", "Final" ]
# Generate class activation heatmap
heatmap = make_gradcam_heatmap(
img_array, model, last_conv_layer_name, classifier_layer_names
) ####===> (I'm getting error here, in this line)
So what is wrong with last_conv_layer_name and classifier_layer_names.
Can anyone please explain this?

Shape of the LSTM layers in multilayer LSTM model

model = tf.keras.Sequential([tf.keras.layers.Embedding(tokenizer.vocab_size, 64),tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64,return_sequences=True))
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1, activation='sigmoid')
])
The second layer has 64 hidden units and since the return_sequences=True, it will output 64 sequences as well. But how can it be fed to a 32 hidden units LSTM. Won't it cause shape mismatch error?
Actually no, it won't cause it. First of all the second layer won't have the output shape of 64, but instead of 128. This is because you are using Bidirectional layer, it will be concatenated by a forward and backward pass and so you output will be (None, None, 64+64=128). You can refer to the link.
The RNN data is shaped in the following was (Batch_size, time_steps, number_of_features). This means when you try to connect two layer with different neurons the features increased or decreased based on the number of neurons.You can follow the particular link for more details.
And for your particular code this is how the model summary will look like. So to answer in short their won't be a mismatch.
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (None, None, 64) 32000
_________________________________________________________________
bidirectional (Bidirectional (None, None, 128) 66048
_________________________________________________________________
bidirectional_1 (Bidirection (None, 64) 41216
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dense_3 (Dense) (None, 1) 65
=================================================================
Total params: 143,489
Trainable params: 143,489
Non-trainable params: 0
_________________________________________________________________

Understanding Keras model architecture (tensor index)

This script defining a dummy using the functional API
from keras.layers import Input, Dense
from keras.models import Model
import keras
inputs = Input(shape=(100,), name='A_input')
x = Dense(20, activation='relu', name='B_dense')(inputs)
shared_l = Dense(20, activation='relu', name='C_dense_shared')
x = keras.layers.concatenate([shared_l(x), shared_l(x)], name='D_concat')
model = Model(inputs=inputs, outputs=x)
print(model.summary())
yields the following output
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
A_input (InputLayer) (None, 100) 0
____________________________________________________________________________________________________
B_dense (Dense) (None, 20) 2020 A_input[0][0]
____________________________________________________________________________________________________
C_dense_shared (Dense) (None, 20) 420 B_dense[0][0]
B_dense[0][0]
____________________________________________________________________________________________________
D_concat (Concatenate) (None, 40) 0 C_dense_shared[0][0]
C_dense_shared[1][0]
====================================================================================================
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
In this case C_dense_shared has two nodes, and D_concat is connected to both of them (C_dense_shared[0][0] and C_dense_shared[1][0]). So the first index (the node_index) is clear to me. But what does the second index mean? From the source code I read that this is the tensor_index:
layer_name[node_index][tensor_index]
But what does the tensor_index mean? And in what situations can it have a value different from 0?
I think the docstring of the Node class makes it quite clear:
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
tensor_index will be nonzero if a layer has multiple output tensors. It's different from the situation of multiple "datastreams" (e.g. layer sharing), where layers have multiple outbound nodes. For example, LSTM layer will return 3 tensors if given return_state=True:
Hidden state of the last time step, or all hidden states if return_sequences=True
Hidden state of the last time step
Memory cell of the last time step
As another example, feature transformation can be implemented as a Lambda layer:
def generate_powers(x):
return [x, K.sqrt(x), K.square(x)]
model_input = Input(shape=(10,))
powers = Lambda(generate_powers)(model_input)
x = Concatenate()(powers)
x = Dense(10, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(model_input, x)
From model.summary(), you can see that concatenate_5 is connected to lambda_7[0][0], lambda_7[0][1] and lambda_7[0][2]:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_7 (InputLayer) (None, 10) 0
____________________________________________________________________________________________________
lambda_7 (Lambda) [(None, 10), (None, 1 0 input_7[0][0]
____________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 30) 0 lambda_7[0][0]
lambda_7[0][1]
lambda_7[0][2]
____________________________________________________________________________________________________
dense_8 (Dense) (None, 10) 310 concatenate_5[0][0]
____________________________________________________________________________________________________
dense_9 (Dense) (None, 1) 11 dense_8[0][0]
====================================================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
____________________________________________________________________________________________________