Understanding the values of summary() (Output Shape, Param#)? - tensorflow

I'm checking the output of the summary() function and don't understand all the printed values.
For example, look at this simple code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
x = [1, 2, 3, 4, 5]
y = [1.2, 1.8, 3.5, 3.7, 5.3]
model = Sequential()
model.add(Dense(10, input_dim=1, activation='relu'))
model.add(Dense(30, input_dim=1, activation='relu'))
model.add(Dense(10, input_dim=1, activation='relu'))
model.add(Dense(1))
model.summary()
The output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10) 20
_________________________________________________________________
dense_1 (Dense) (None, 30) 330
_________________________________________________________________
dense_2 (Dense) (None, 10) 310
_________________________________________________________________
dense_3 (Dense) (None, 1) 11
=================================================================
Total params: 671
Trainable params: 671
Non-trainable params: 0
_________________________________________________________________
What is the meaning of the None value under the Output Shape column? What does None mean here?
Which networks will not show None in the summary?
What is the meaning of the Param # column? How is this value calculated?

The None is just a placeholder saying that the network can take more than one sample at a time. None means this dimension is variable. The first dimension in a Keras model is always the batch size. ... That's why this dimension is often ignored when you define your model. For instance, when you define input_shape=(100, 200), you're actually ignoring the batch size and defining the shape of each sample.
None won't show up if you set a fixed batch size. As an example, if you sent in a batch of 10 images your shape would be (10, 64, 64, 3), and if you changed it to 25 you would have (25, 64, 64, 3).
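As a minimal sketch (assuming tf.keras, where layers accept batch_input_shape), fixing the batch size makes the batch dimension explicit, so the summary no longer shows None:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
# batch_input_shape fixes the batch dimension to 25 samples per batch
model.add(Dense(10, batch_input_shape=(25, 1), activation='relu'))
model.add(Dense(1))
model.summary()
# dense (Dense)    (25, 10)   20
# dense_1 (Dense)  (25, 1)    11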
For the first Dense layer, the number of params is 20. This is obtained as: 1 (input value) * 10 (neurons in the first layer) + 10 (bias values).
For the second Dense layer, the number of params is 330. This is obtained as: 10 (inputs) * 30 (neurons in the second layer) + 30 (bias values for the neurons in the second layer).
For the third Dense layer, the number of params is 310. This is obtained as: 30 (inputs) * 10 (neurons in the third layer) + 10 (bias values for the neurons in the third layer).
For the final layer, the number of params is 11. This is obtained as: 10 (inputs) * 1 (neuron in the final layer) + 1 (bias value for the neuron in the final layer).
Total params = 20 + 330 + 310 + 11 = 671
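The same rule can be written as a short sketch (assuming only Dense layers, as in the question; the helper name is mine, not a Keras function):
def dense_params(num_inputs, num_units):
    # one weight per input per neuron, plus one bias per neuron
    return num_inputs * num_units + num_units
layers = [(1, 10), (10, 30), (30, 10), (10, 1)]
print(sum(dense_params(i, u) for i, u in layers))  # 671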

Related

LSTM output unexpected predict shape

I want to build an LSTM model to predict a category label, based on 60 days of data.
Basically:
Input - 60-day time window, 1 feature
- train data: x (2571, 60, 1), y (2571, 1)
- test data: x (60, 1), y (1)
Output - 1 label, either 0 or 1
One thing I am not sure about: should I shape the train/test x as (60, 1) or (1, 60)?
I made an LSTM network like:
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_15 (LSTM) (None, 60, 128) 66560
dropout_10 (Dropout) (None, 60, 128) 0
lstm_16 (LSTM) (None, 60, 64) 49408
dropout_11 (Dropout) (None, 60, 64) 0
lstm_17 (LSTM) (None, 16) 5184
dense_5 (Dense) (None, 1) 17
=================================================================
Total params: 121,169
Trainable params: 121,169
Non-trainable params: 0
_________________________________________________________________
here is my code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
lookback_time_win = 60
num_features = 1
model = Sequential()
model.add(LSTM(128, input_shape=(lookback_time_win, num_features), return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.1))
# no need return sequences from 'the last layer'
model.add(LSTM(units=16))
# adding the output layer
model.add(Dense(units=1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
but after training, I call model.predict like:
y = model.predict(x_test)
instead of my expected 0 or 1, I get y with shape (60, 1).
After some debugging, I suspect the root cause is that my x shape was wrong.
Originally, my test x shape was (60, 1); after I reshape it to (1, 60), I get a single output y every time, with shape (1). If I shape my test x as (60, 1), I get a predicted y of shape (60, 1).
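For reference, a minimal sketch of the batch layout I believe the model above expects, i.e. (batch, timesteps, features), using random data as a stand-in:
import numpy as np
x_one_window = np.random.rand(60, 1)      # one 60-day window, 1 feature
x_batch = x_one_window.reshape(1, 60, 1)  # (batch, timesteps, features)
# y = model.predict(x_batch)              # -> shape (1, 1)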
But I get a new problem...
If I plot it together with my y_test, the y_predict is just in the middle.
My y_predict makes no sense; the values fall in a very narrow range, from 0.447 to 0.45.
If I take @Frightera's advice and use np.where(y_predicted_result > 0.454, 1, 0) to convert them into 0 or 1, it does not look right when compared with the ground truth; no idea why it is like this.

Why does Tensor Flow add a dimension to my input & output?

Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras
TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
print(model.summary())
I am feeding it a 2-dimensional matrix, but when I look at the model summary, I see:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10, 1) 11
_________________________________________________________________
dense_1 (Dense) (None, 10, 2) 4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3-dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones(None, 10, 1). I cannot reshape the array with -1 in the first dimension.
In your first layer, the code input_shape=(10, 10) adds the extra dimension to account for the batch size of the data: Keras treats each sample as a (10, 10) matrix and prepends the (None) batch dimension. Note you only need input_shape for the FIRST layer in your model, so remove input_shape=(10, 10) from your second layer.
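If the intent is for each of the 10 rows of XTrain to be one sample with 10 features (my assumption), a minimal sketch that avoids the extra dimension is to declare the per-sample shape as a flat vector:
from tensorflow.keras import layers
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, input_shape=(10,)))  # each sample is a 10-value vector
model.add(layers.Dense(1))
model.summary()
# dense (Dense)    (None, 1)  11
# dense_1 (Dense)  (None, 1)  2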

How to compute number of weights of CNN?

How can we compute the number of weights of a convolutional neural network that is used to classify images into two classes:
INPUT: 100x100 gray-scale images.
LAYER 1: Convolutional layer with 60 7x7 convolutional filters (stride=1, valid
padding).
LAYER 2: Convolutional layer with 100 5x5 convolutional filters (stride=1, valid
padding).
LAYER 3: A max pooling layer that down-samples Layer 2 by a factor of 4 (e.g., from 500x500 to 250x250)
LAYER 4: Dense layer with 250 units
LAYER 5: Dense layer with 200 units
LAYER 6: Single output unit
Assume the existence of biases in each layer. Moreover, the pooling layer has a weight (similar to AlexNet).
How many weights does this network have?
Some Keras code
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D
model = Sequential()
# Layer 1 (valid padding, as stated above)
model.add(Conv2D(60, (7, 7), input_shape = (100, 100, 1), padding="valid", activation="relu"))
# Layer 2
model.add(Conv2D(100, (5, 5), padding="valid", activation="relu"))
# Layer 3
model.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten before the dense layers
model.add(Flatten())
# Layer 4
model.add(Dense(250))
# Layer 5
model.add(Dense(200))
# Layer 6 - single output unit
model.add(Dense(1))
model.summary()
TL;DR - For TensorFlow + Keras
Use Sequential.summary - Link to documentation.
Example usage:
from tensorflow.keras.models import Sequential
model = Sequential([
    # Your architecture here
])
model.summary()
The output for your architecture is:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 94, 94, 60) 3000
_________________________________________________________________
conv2d_1 (Conv2D) (None, 90, 90, 100) 150100
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 45, 45, 100) 0
_________________________________________________________________
flatten (Flatten) (None, 202500) 0
_________________________________________________________________
dense (Dense) (None, 250) 50625250
_________________________________________________________________
dense_1 (Dense) (None, 200) 50200
_________________________________________________________________
dense_2 (Dense) (None, 1) 201
=================================================================
Total params: 50,828,751
Trainable params: 50,828,751
Non-trainable params: 0
_________________________________________________________________
That's 50,828,751 parameters.
Explanation
Number of weights in a 2D Convolutional layer
For a 2D Convolutional layer having
num_filters filters,
a filter size of filter_size * filter_size * num_channels,
and a bias parameter per filter
The number of weights is: (num_filters * filter_size * filter_size * num_channels) + num_filters
E.g.: LAYER 1 in your neural network has
60 filters
and a filter size of 7 * 7 * 1. (Notice that the number of channels (1) comes from the input image.)
The number of weights in it is: (60 * 7 * 7 * 1) + 60, which is 3000.
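As a quick sketch of that formula (the helper name is mine, not a Keras function):
def conv2d_params(num_filters, filter_size, num_channels):
    # one weight per filter element, plus one bias per filter
    return num_filters * filter_size * filter_size * num_channels + num_filters
print(conv2d_params(60, 7, 1))    # LAYER 1 -> 3000
print(conv2d_params(100, 5, 60))  # LAYER 2 -> 150100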
Number of weights in a Dense layer
For a Dense layer having
num_units neurons,
num_inputs neurons in the layer prior to it,
and a bias parameter per neuron
The number of weights is: (num_units * num_inputs) + num_units
E.g. LAYER 5 in your neural network has
200 neurons
and the layer prior to it - LAYER 4 - has 250 neurons.
The number of weights in it is (200 * 250) + 200, which is 50,200.
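And the dense formula as a sketch, which also reproduces the grand total (the input to LAYER 4 is the flattened 45 * 45 * 100 tensor from the pooling layer):
def dense_params(num_units, num_inputs):
    # one weight per input per unit, plus one bias per unit
    return num_units * num_inputs + num_units
print(dense_params(250, 45 * 45 * 100))  # LAYER 4 -> 50625250
print(dense_params(200, 250))            # LAYER 5 -> 50200
print(dense_params(1, 200))              # LAYER 6 -> 201
# 3000 + 150100 + 0 (pooling) + 50625250 + 50200 + 201 = 50,828,751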

Variable Number of channels

I need a convolutional layer which outputs a variable number of channels, depending on the input.
conv2d(filters = variable_number)
A model cannot have a varying number of filters depending on the input. The following must be fixed for the model to be trained:
Name and type of all layers in the model.
Output shape for each layer.
Number of weight parameters of each layer.
The inputs each layer receives.
The total number of trainable and non-trainable parameters of the model.
If you have a varying number of channels, then the model architecture changes for different inputs, and all the points listed above are affected.
You can build a model with all parameters fixed and later apply dropout to a layer based on the input. But again, dropout is a regularization technique. Simply put, dropout refers to ignoring a randomly chosen set of units (i.e. neurons) during the training phase. By "ignoring", I mean these units are not considered during a particular forward or backward pass.
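A minimal sketch of that first option (the 0.5 rate and layer sizes are illustrative values, not from the question):
from keras.models import Sequential
from keras.layers import Dense, Dropout
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(100,)))
# Dropout randomly ignores 50% of the incoming units on each training step
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))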
OR
The most appropriate solution would be:
Build multiple input layers for the different inputs.
Concatenate all these layers, but make sure the output shapes of the layers being concatenated match (in the case of Convolution layers), otherwise concatenate throws an error.
Add the remaining layers of the model.
Below is an example for this -
from keras.models import Model
from keras.layers import Input, concatenate, Conv2D, ZeroPadding2D, Dense
from keras.optimizers import Adagrad
import tensorflow.keras.backend as K
import tensorflow as tf
input_img1 = Input(shape=(44,44,3))
x1 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img1)
input_img2 = Input(shape=(34,34,3))
x2 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img2)
# Zero Padding of 5 at the top, bottom, left and right side of an image tensor
x3 = ZeroPadding2D(padding = (5,5))(x2)
# Concatenate works as layers have same size output
x4 = concatenate([x1,x3])
output = Dense(18, activation='relu')(x4)
model = Model(inputs=[input_img1,input_img2], outputs=output)
model.summary()
Output -
Model: "model_22"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_91 (InputLayer) (None, 34, 34, 3) 0
__________________________________________________________________________________________________
input_90 (InputLayer) (None, 44, 44, 3) 0
__________________________________________________________________________________________________
conv2d_73 (Conv2D) (None, 34, 34, 3) 84 input_91[0][0]
__________________________________________________________________________________________________
conv2d_72 (Conv2D) (None, 44, 44, 3) 84 input_90[0][0]
__________________________________________________________________________________________________
zero_padding2d_14 (ZeroPadding2 (None, 44, 44, 3) 0 conv2d_73[0][0]
__________________________________________________________________________________________________
concatenate_30 (Concatenate) (None, 44, 44, 6) 0 conv2d_72[0][0]
zero_padding2d_14[0][0]
__________________________________________________________________________________________________
dense_47 (Dense) (None, 44, 44, 18) 126 concatenate_30[0][0]
==================================================================================================
Total params: 294
Trainable params: 294
Non-trainable params: 0
__________________________________________________________________________________________________
Hope this answers your question. Happy Learning.

Understanding Keras model architecture (tensor index)

This script, defining a dummy model using the functional API,
from keras.layers import Input, Dense
from keras.models import Model
import keras
inputs = Input(shape=(100,), name='A_input')
x = Dense(20, activation='relu', name='B_dense')(inputs)
shared_l = Dense(20, activation='relu', name='C_dense_shared')
x = keras.layers.concatenate([shared_l(x), shared_l(x)], name='D_concat')
model = Model(inputs=inputs, outputs=x)
print(model.summary())
yields the following output
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
A_input (InputLayer) (None, 100) 0
____________________________________________________________________________________________________
B_dense (Dense) (None, 20) 2020 A_input[0][0]
____________________________________________________________________________________________________
C_dense_shared (Dense) (None, 20) 420 B_dense[0][0]
B_dense[0][0]
____________________________________________________________________________________________________
D_concat (Concatenate) (None, 40) 0 C_dense_shared[0][0]
C_dense_shared[1][0]
====================================================================================================
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
In this case C_dense_shared has two nodes, and D_concat is connected to both of them (C_dense_shared[0][0] and C_dense_shared[1][0]). So the first index (the node_index) is clear to me. But what does the second index mean? From the source code I read that this is the tensor_index:
layer_name[node_index][tensor_index]
But what does the tensor_index mean? And in what situations can it have a value different from 0?
I think the docstring of the Node class makes it quite clear:
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
tensor_index will be nonzero if a layer has multiple output tensors. This is different from the situation of multiple "datastreams" (e.g. layer sharing), where layers have multiple outbound nodes. For example, an LSTM layer will return 3 tensors if given return_state=True:
Hidden state of the last time step, or all hidden states if return_sequences=True
Hidden state of the last time step
Memory cell of the last time step
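A minimal sketch (shapes chosen arbitrarily) showing the three returned tensors:
from keras.layers import Input, LSTM
inp = Input(shape=(60, 1))
seq_out, state_h, state_c = LSTM(16, return_sequences=True, return_state=True)(inp)
# seq_out: all hidden states, shape (None, 60, 16)
# state_h: hidden state of the last time step, shape (None, 16)
# state_c: memory cell of the last time step, shape (None, 16)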
As another example, feature transformation can be implemented as a Lambda layer:
from keras.layers import Input, Lambda, Concatenate, Dense
from keras.models import Model
import keras.backend as K
def generate_powers(x):
    return [x, K.sqrt(x), K.square(x)]
model_input = Input(shape=(10,))
powers = Lambda(generate_powers)(model_input)
x = Concatenate()(powers)
x = Dense(10, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(model_input, x)
From model.summary(), you can see that concatenate_5 is connected to lambda_7[0][0], lambda_7[0][1] and lambda_7[0][2]:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_7 (InputLayer) (None, 10) 0
____________________________________________________________________________________________________
lambda_7 (Lambda) [(None, 10), (None, 1 0 input_7[0][0]
____________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 30) 0 lambda_7[0][0]
lambda_7[0][1]
lambda_7[0][2]
____________________________________________________________________________________________________
dense_8 (Dense) (None, 10) 310 concatenate_5[0][0]
____________________________________________________________________________________________________
dense_9 (Dense) (None, 1) 11 dense_8[0][0]
====================================================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
____________________________________________________________________________________________________