How to compute number of weights of CNN? - tensorflow

How can we compute number of weights considering a convolutional neural network that is used to classify images into two classes :
INPUT: 100x100 gray-scale images.
LAYER 1: Convolutional layer with 60 7x7 convolutional filters (stride=1, valid
padding).
LAYER 2: Convolutional layer with 100 5x5 convolutional filters (stride=1, valid
padding).
LAYER 3: A max pooling layer that down-samples Layer 2 by a factor of 4 (e.g., from 500x500 to 250x250)
LAYER 4: Dense layer with 250 units
LAYER 5: Dense layer with 200 units
LAYER 6: Single output unit
Assume the existence of biases in each layer. Moreover, pooling layer has a weight (similar to AlexNet)
How many weights does this network have?
Some Keras code
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Conv2D, MaxPooling2D
model = Sequential()
# Layer 1
model.add(Conv2D(60, (7, 7), input_shape = (100, 100, 1), padding="same", activation="relu"))
# Layer 2
model.add(Conv2D(100, (5, 5), padding="same", activation="relu"))
# Layer 3
model.add(MaxPooling2D(pool_size=(2, 2)))
# Layer 4
model.add(Dense(250))
# Layer 5
model.add(Dense(200))
model.summary()

TL;DR - For TensorFlow + Keras
Use Sequential.summary - Link to documentation.
Example usage:
from tensorflow.keras.models import *
model = Sequential([
# Your architecture here
]);
model.summary()
The output for your architecture is:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 94, 94, 60) 3000
_________________________________________________________________
conv2d_1 (Conv2D) (None, 90, 90, 100) 150100
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 45, 45, 100) 0
_________________________________________________________________
flatten (Flatten) (None, 202500) 0
_________________________________________________________________
dense (Dense) (None, 250) 50625250
_________________________________________________________________
dense_1 (Dense) (None, 200) 50200
_________________________________________________________________
dense_2 (Dense) (None, 1) 201
=================================================================
Total params: 50,828,751
Trainable params: 50,828,751
Non-trainable params: 0
_________________________________________________________________
That's 50,828,751 parameters.
Explanation
Number of weights in a 2D Convolutional layer
For a 2D Convolutional layer having
num_filters filters,
a filter size of filter_size * filter_size * num_channels,
and a bias parameter per filter
The number of weights is: (num_filters * filter_size * filter_size * num_channels) + num_filters
E.g.: LAYER 1 in your neural network has
60 filters
and a filter size of 7 * 7 * 1. (Notice that the number of channels (1) comes from the input image.)
The number of weights in it is: (60 * 7 * 7 * 1) + 60, which is 3000.
Number of weights in a Dense layer
For a Dense layer having
num_units neurons,
num_inputs neurons in the layer prior to it,
and a bias parameter per neuron
The number of weights is: (num_units * num_inputs) + num_units
E.g. LAYER 5 in your neural network has
200 neurons
and the layer prior to it - LAYER 4 - has 250 neurons.
The number of weights in it is 200 * 250, which is 50200.

Related

Why does Tensor Flow add a dimension to my input & output?

Here is my code:
from tensorflow.keras import layers
import tensorflow as tf
from tensorflow import keras
TFDataType = tf.float16
XTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
YTrain = tf.cast(tf.ones((10,10)), dtype=TFDataType)
model = tf.keras.models.Sequential()
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
model.add(layers.Dense(1, dtype=TFDataType, input_shape=(10, 10)))
print(model.summary())
I am feeding it a 2 dimensional matrix. But when I see the model summary, I see:
Model: "sequential"
_________________________________________________________________
2021-08-23 13:32:18.716788: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-TLG9US3
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 10, 1) 11
_________________________________________________________________
dense_1 (Dense) (None, 10, 2) 4
=================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________
Why is the model asking for a 3 Dimensional (None, 10, 1) array?
How do I pass an array that meets the dimensionality of (None, 10, 1)?
I cannot call numpy.ones(None, 10, 1). I cannot reshape the array with -1 in the first dimension.
In your first layer the code input_shape=(10, 10) adds the extra dimension to account for the batch size of the data. Note you only need this code for the FIRST layer in your model so remove input_shape=(10, 10) in your second layer.

Dimension of output in Dense layer Keras

I have the sample following model
from tensorflow.keras import models
from tensorflow.keras import layers
sample_model = models.Sequential()
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.compile(loss="binary_crossentropy",
optimizer="adam", metrics = ["accuracy"])
IP for the model:
sam_x = np.random.rand(10,4)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x,sam_y)
The confusion is the fit should have thrown an error of shape mismatch as the expected_input_shape for the 2nd Dense Layer is given as (None,44) but the output for the 1st Dense Layer (which is the input of the 2nd Dense Layer) will be of shape (None,32). But it ran successfully.
I don't understand why there was no error. Any clarifications will be helpful
The input_shape keyword argument has an effect only on the first layer of a Sequential. The shape of the input of the other layers will be derived from their previous layer.
That behaviour is hinted in the doc of tf.keras.layers.InputShape:
When using InputLayer with Keras Sequential model, it can be skipped by moving the input_shape parameter to the first layer after the InputLayer.
And in the Sequential Model guide.
The behaviour can be confirmed by looking at the source of the Sequential.add method:
if not self._layers:
if isinstance(layer, input_layer.InputLayer):
# Case where the user passes an Input or InputLayer layer via `add`.
set_inputs = True
else:
batch_shape, dtype = training_utils.get_input_shape_and_dtype(layer)
if batch_shape:
# Instantiate an input layer.
x = input_layer.Input(
batch_shape=batch_shape, dtype=dtype, name=layer.name + '_input')
# This will build the current layer
# and create the node connecting the current layer
# to the input layer we just created.
layer(x)
set_inputs = True
If there is no layers yet in the model, then an Input will be added to the model with the shape derived from the first layer of the model. This is done only if no layer is present yet in the model.
That shape is either fully known (if input_shape has been passed to the first layer of the model) or will be fully known once the model is built (for example, with a call to model.build(input_shape)).
The thing is after checking the input shape of the model from the first layer, it won't check or deal with other declared input shape inside that same model. For example, if you write your model the following way
sample_model.add(layers.Dense(32, input_shape=(4,)))
sample_model.add(layers.Dense(16, input_shape = (44,)))
sample_model.add(layers.Dense(8, input_shape = (32,)))
The program will always check the first declared input shape layer and discard the rest. So, if you start your first layer with input_shape = (44,), you need to pass exact feature numbers to your model as input such as:
sam_x = np.random.rand(10,44)
sam_y = np.array([0,1,1,0,1,0,0,1,0,1,])
sample_model.fit(sam_x,sam_y)
Additionally, if you look at the Functional API, unlike the Sequential model, you must create and define a standalone Input layer that specifies the shape of input data. It's not learnable but simply a spec layer. It's a kind of gateway of the input data for the model. That means even if we define input_shape inside the other layers, they all will be discarded. For example:
nputs = keras.Input(shape=(4,))
dense = layers.Dense(64, input_shape=(8,)) # dicard input_shape
x = dense(inputs)
x = layers.Dense(64, input_shape=(16,))(x) # dicard input_shape
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")
Here is a more complex example with Conv2D and MNIST.
encoder_input = keras.Input(shape=(28, 28, 1),)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[32,32,3])(encoder_input)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[64,64,3])(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu", input_shape=[224,321,3])(x)
x = layers.Conv2D(16, 3, activation="relu", input_shape=[420,32,3])(x)
x = layers.GlobalMaxPooling2D()(x)
out = layers.Dense(10, activation='softmax')(x)
encoder = keras.Model(encoder_input, out, name="encoder")
encoder.summary()
Model: "encoder"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_15 (InputLayer) [(None, 28, 28, 1)] 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 26, 26, 16) 160
_________________________________________________________________
conv2d_9 (Conv2D) (None, 24, 24, 32) 4640
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 32) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 6, 6, 32) 9248
_________________________________________________________________
conv2d_11 (Conv2D) (None, 4, 4, 16) 4624
_________________________________________________________________
global_max_pooling2d_2 (Glob (None, 16) 0
_________________________________________________________________
dense_56 (Dense) (None, 10) 170
=================================================================
Total params: 18,842
Trainable params: 18,842
Non-trainable params: 0
def pre_process(image, label):
return (image / 256)[...,None].astype('float32'),
tf.keras.utils.to_categorical(label, num_classes=10)
(x, y), (_, _) = tf.keras.datasets.mnist.load_data('mnist')
encoder.compile(
loss = tf.keras.losses.CategoricalCrossentropy(),
metrics = tf.keras.metrics.CategoricalAccuracy(),
optimizer = tf.keras.optimizers.Adam())
encoder.fit(x, y, batch_size=256)
4s 14ms/step - loss: 1.4303 - categorical_accuracy: 0.5279
I think Keras will create (or preserves to create) an additional Input Layer - but as the second dense layer is added using model.add() it will automatically be connected to the layer before, and thus the extra input layer stays unconnected and is not part of the model.
(I agree that it would be nice of Keras to hint at unconnected layers, I sometimes created unconnected layers when using the functional API and changed the inputs. Keras doesn't remind me that I had jumped several layers, I just wondered why the summary() was so short...)

Using dilated convolution in Keras

In WaveNet, dilated convolution is used to increase receptive field of the layers above.
From the illustration, you can see that layers of dilated convolution with kernel size 2 and dilation rate of powers of 2 create a tree like structure of receptive fields. I tried to (very simply) replicate the above in Keras.
import tensorflow.keras as keras
nn = input_layer = keras.layers.Input(shape=(200, 2))
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=2)(nn)
nn = keras.layers.Conv1D(5, 5, padding='causal', dilation_rate=4)(nn)
nn = keras.layers.Dense(1)(nn)
model = keras.Model(input_layer, nn)
opt = keras.optimizers.Adam(lr=0.001)
model.compile(loss='mse', optimizer=opt)
model.summary()
And the output:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 200, 2)] 0
_________________________________________________________________
conv1d_5 (Conv1D) (None, 200, 5) 55
_________________________________________________________________
conv1d_6 (Conv1D) (None, 200, 5) 130
_________________________________________________________________
dense_2 (Dense) (None, 200, 1) 6
=================================================================
Total params: 191
Trainable params: 191
Non-trainable params: 0
_________________________________________________________________
I was expecting axis=1 to shrink after each conv1d layer, similar to the gif. Why is this not the case?

Variable Number of channels

I need a convolutional layer which outputs a variable number of channels, depending on the input.
conv2d(filters = variable_number)
A model cannot have varying number of filters depending on the input. The model need to have below arguments to be fixed for it to be trained.
Name and type of all layers in the model.
Output shape for each layer.
Number of weight parameters of each layer.
The inputs each layer receives.
The total number of trainable and non-trainable parameters of the model.
If you have varying number of channels, then the model architecture is changing for different input and thus all the above listed points get impacted.
You can build a model with all the fixed parameters and later use dropout for the layer based on the input. But again the dropout is a regularization technique, Simply put, dropout refers to ignoring units (i.e. neurons) during the training phase of certain set of neurons which is chosen at random. By “ignoring”, I mean these units are not considered during a particular forward or backward pass.
OR
The most appropriate solution would be -
Build multiple input layer for different inputs.
Concatenate all these layers, but make sure the output shape of all these layers are same in case of Convolution layers else concatenate throws error.
Add the remaining layers of the model.
Below is an example for this -
from keras.models import Model
from keras.layers import Input, concatenate, Conv2D, ZeroPadding2D
from keras.optimizers import Adagrad
import tensorflow.keras.backend as K
import tensorflow as tf
input_img1 = Input(shape=(44,44,3))
x1 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img1)
input_img2 = Input(shape=(34,34,3))
x2 = Conv2D(3, (3, 3), activation='relu', padding='same')(input_img2)
# Zero Padding of 5 at the top, bottom, left and right side of an image tensor
x3 = ZeroPadding2D(padding = (5,5))(x2)
# Concatenate works as layers have same size output
x4 = concatenate([x1,x3])
output = Dense(18, activation='relu')(x4)
model = Model(inputs=[input_img1,input_img2], outputs=output)
model.summary()
Output -
Model: "model_22"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_91 (InputLayer) (None, 34, 34, 3) 0
__________________________________________________________________________________________________
input_90 (InputLayer) (None, 44, 44, 3) 0
__________________________________________________________________________________________________
conv2d_73 (Conv2D) (None, 34, 34, 3) 84 input_91[0][0]
__________________________________________________________________________________________________
conv2d_72 (Conv2D) (None, 44, 44, 3) 84 input_90[0][0]
__________________________________________________________________________________________________
zero_padding2d_14 (ZeroPadding2 (None, 44, 44, 3) 0 conv2d_73[0][0]
__________________________________________________________________________________________________
concatenate_30 (Concatenate) (None, 44, 44, 6) 0 conv2d_72[0][0]
zero_padding2d_14[0][0]
__________________________________________________________________________________________________
dense_47 (Dense) (None, 44, 44, 18) 126 concatenate_30[0][0]
==================================================================================================
Total params: 294
Trainable params: 294
Non-trainable params: 0
__________________________________________________________________________________________________
Hope this answers you question. Happy Learning.

Understanding Keras model architecture (tensor index)

This script defining a dummy using the functional API
from keras.layers import Input, Dense
from keras.models import Model
import keras
inputs = Input(shape=(100,), name='A_input')
x = Dense(20, activation='relu', name='B_dense')(inputs)
shared_l = Dense(20, activation='relu', name='C_dense_shared')
x = keras.layers.concatenate([shared_l(x), shared_l(x)], name='D_concat')
model = Model(inputs=inputs, outputs=x)
print(model.summary())
yields the following output
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
A_input (InputLayer) (None, 100) 0
____________________________________________________________________________________________________
B_dense (Dense) (None, 20) 2020 A_input[0][0]
____________________________________________________________________________________________________
C_dense_shared (Dense) (None, 20) 420 B_dense[0][0]
B_dense[0][0]
____________________________________________________________________________________________________
D_concat (Concatenate) (None, 40) 0 C_dense_shared[0][0]
C_dense_shared[1][0]
====================================================================================================
My question concerns the content of the Connected to column.
I understand that a layer can have multiple nodes.
In this case C_dense_shared has two nodes, and D_concat is connected to both of them (C_dense_shared[0][0] and C_dense_shared[1][0]). So the first index (the node_index) is clear to me. But what does the second index mean? From the source code I read that this is the tensor_index:
layer_name[node_index][tensor_index]
But what does the tensor_index mean? And in what situations can it have a value different from 0?
I think the docstring of the Node class makes it quite clear:
tensor_indices: a list of integers,
the same length as `inbound_layers`.
`tensor_indices[i]` is the index of `input_tensors[i]` within the
output of the inbound layer
(necessary since each inbound layer might
have multiple tensor outputs, with each one being
independently manipulable).
tensor_index will be nonzero if a layer has multiple output tensors. It's different from the situation of multiple "datastreams" (e.g. layer sharing), where layers have multiple outbound nodes. For example, LSTM layer will return 3 tensors if given return_state=True:
Hidden state of the last time step, or all hidden states if return_sequences=True
Hidden state of the last time step
Memory cell of the last time step
As another example, feature transformation can be implemented as a Lambda layer:
def generate_powers(x):
return [x, K.sqrt(x), K.square(x)]
model_input = Input(shape=(10,))
powers = Lambda(generate_powers)(model_input)
x = Concatenate()(powers)
x = Dense(10, activation='relu')(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(model_input, x)
From model.summary(), you can see that concatenate_5 is connected to lambda_7[0][0], lambda_7[0][1] and lambda_7[0][2]:
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_7 (InputLayer) (None, 10) 0
____________________________________________________________________________________________________
lambda_7 (Lambda) [(None, 10), (None, 1 0 input_7[0][0]
____________________________________________________________________________________________________
concatenate_5 (Concatenate) (None, 30) 0 lambda_7[0][0]
lambda_7[0][1]
lambda_7[0][2]
____________________________________________________________________________________________________
dense_8 (Dense) (None, 10) 310 concatenate_5[0][0]
____________________________________________________________________________________________________
dense_9 (Dense) (None, 1) 11 dense_8[0][0]
====================================================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
____________________________________________________________________________________________________