How can Keras calculate the number of parameters at an early stage when there are still None dimensions? - tensorflow

Sorry for the very basic question (I'm new to Keras). I was wondering how Keras can calculate the number of parameters for each layer at an early stage (before fit) even though model.summary shows dimensions that are still None at this stage. Are these values already determined in some way and, if so, why not show them in the summary?
I ask because I'm having a hard time figuring out my "tensor shape bug" (I'm trying to determine the output dimensions of the C5 block of my ResNet50 model, but I cannot see them in model.summary even though I can see the number of parameters).
Below is an example based on the C5_reduced layer in RetinaNet, which is fed by the C5 layer of ResNet50. C5_reduced is
Conv2D(256,kernel_size=1,strides=1,pad=1)
Based on model.summary for this particular layer:
C5_reduced (Conv2D) (None, None, None, 256) 524544
I guessed that C5 is (None,1,1,2048) because 2048*256+256 = 524544 (I don't know how to confirm or refute that hypothesis). So if it's already known, why not show it in the summary? If dimensions 2 and 3 had been different, the number of parameters would have been different too, right?

If you pass an exact input shape to the first layer (or input layer) of your network, the summary will show the output shapes you want. For instance, I used an input layer here:
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
The input was passed as (224,224,3), where 3 is the depth (number of channels). Note that the parameter calculation for convolutional layers differs from that for Dense layers.
If you do the following:
tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3))
You will see:
conv2d (Conv2D) ---> (None, 148, 148, 16)
The dimensions are reduced to 148x148 because in Keras the padding is 'valid' by default and strides is 1, so a 3x3 kernel gives 150 - 3 + 1 = 148 along each axis. (You can search for the general formula.)
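The output-size rule can be sketched in plain Python. This is only an illustration, not Keras code: the helper name conv_output_size is made up here, and it assumes the usual 'valid' and 'same' padding rules with no dilation.

```python
import math

def conv_output_size(size, kernel_size, stride=1, padding="valid"):
    """Output length of a convolution along one spatial axis (no dilation).

    Hypothetical helper for illustration only; it is not part of Keras.
    """
    if size is None:
        # An unknown input size stays unknown, which is why summary() prints None.
        return None
    if padding == "valid":
        return (size - kernel_size) // stride + 1
    if padding == "same":
        return math.ceil(size / stride)
    raise ValueError("padding must be 'valid' or 'same'")

print(conv_output_size(150, 3))                  # 148
print(conv_output_size(224, 3, padding="same"))  # 224
print(conv_output_size(None, 3))                 # None
```

The last call shows why a None height or width simply propagates through the summary: the spatial size is needed for the output shape, not for creating the weights.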
So then what are None values?
The first None value is the batch size. In Keras the first dimension is the batch size. You can fix it in advance, or leave it to be determined when fitting the model or predicting.
In 2D convolution the expected input is (batch_size, height, width, channels). You can also have shapes such as (None, None, None, 3), which means varying image sizes are allowed.
Edit:
tf.keras.layers.Input(shape = (None, None, 3)),
tf.keras.layers.Conv2D(16, (3,3), activation='relu')
Produces:
conv2d_21 (Conv2D) (None, None, None, 16) 448
Regarding your question: how are the parameters calculated even though we passed the image height and width as None?
Convolution parameters are calculated according to:
(filter_height * filter_width * input_image_channels + 1) * number_of_filters
When we put the values into the formula,
filter_height = 3
filter_width = 3
input_image_channel = 3
number_of_filters = 16
Parameters = (3 x 3 x 3 + 1) * 16 = 28 * 16 = 448
Notice that we only needed the input image's channel count, which is 3 for an RGB image. The height and width never enter the formula.
If you want to calculate the params for later convolutions, keep in mind that the number of filters of the previous layer becomes the number of input channels of the current layer.
That's how you can end up with None in dimensions other than the batch size: Keras only needs to know the number of channels (e.g. whether your image is RGB) to create the weights. The spatial dimensions can be left unspecified when creating the model and are determined by the data you pass when fitting.
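The formula can be checked against the summaries quoted in this thread with a few lines of Python. This is a sketch under assumptions: conv2d_param_count is a made-up helper name, and the 3x3 kernels for block1_conv1 and block1_conv2 are assumed (they are not stated above, but they reproduce the listed counts).

```python
def conv2d_param_count(kernel_height, kernel_width, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2D convolution.

    Hypothetical helper for illustration. Note that the input height and
    width are not arguments: they do not affect the number of weights.
    """
    weights = kernel_height * kernel_width * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

print(conv2d_param_count(3, 3, 3, 16))      # 448     (conv2d_21 above)
print(conv2d_param_count(3, 3, 3, 64))      # 1792    (block1_conv1, assuming a 3x3 kernel)
print(conv2d_param_count(3, 3, 64, 64))     # 36928   (block1_conv2, assuming a 3x3 kernel)
print(conv2d_param_count(1, 1, 2048, 256))  # 524544  (C5_reduced from the question)
```

For the C5_reduced layer this means the count 524544 is consistent with 2048 input channels, but it says nothing about the height and width of C5, which is why those can still be None in the summary.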

You need to define an input layer for your model. If the model has no defined input shape, the total number of trainable parameters is unknown until you either a) feed it data (for example by fitting or calling it on a batch), at which point the model builds its weights based on the dimensions of the input, or b) define an input layer with the input dimensions stated, after which model.summary() can report the number of params.
The point is that the model cannot know the number of parameters between the input and the first hidden layer until the input is defined, or until you run inference and give it the shape of the input.

Related

how to calculate the confidence of a softmax layer

I am working on a multi-class computer vision classification task, using a CNN with FC layers stacked on top and a softmax activation. The problem is this: say I'm classifying animal categories. If I predict on an image of a rock, the model returns a high probability for the most similar animal category, because softmax returns a probability distribution whose values lie between 0 and 1 and sum to 1. What can I use to determine the confidence of my model's probability output, so that I can tell whether I can rely on these probabilities or not?
PS: I don't want to add a no_label class.
Is it possible, using the Keras functional API, to have two outputs of the model, the pre-softmax output and the softmax output, without updating the weights according to the linear activation of the pre-softmax layer, since that would affect the training?
Yes. You can do it like this
input = tf.keras.layers.Input((128,128,3))
x = tf.keras.layers.Conv2D(32,3)(input)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128)(x)
non_softmax_output = tf.keras.layers.Dense(10)(x)
softmax_output = tf.keras.layers.Softmax()(non_softmax_output)
model = tf.keras.models.Model(inputs=input,outputs=[non_softmax_output,softmax_output])
model.summary()
>>>
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 128, 128, 3)] 0
conv2d_1 (Conv2D) (None, 126, 126, 32) 896
max_pooling2d_1 (MaxPooling (None, 63, 63, 32) 0
2D)
flatten_1 (Flatten) (None, 127008) 0
dense_23 (Dense) (None, 128) 16257152
dense_24 (Dense) (None, 10) 1290
softmax (Softmax) (None, 10) 0
=================================================================
Total params: 16,259,338
Trainable params: 16,259,338
Non-trainable params: 0
_________________________________________________________________
The easier alternative is to work only with the predictions from the softmax layer. You don't gain much from the linear layer without the activation; those raw outputs by themselves do not mean much. You could instead define a function outside the model that changes the predictions based on some threshold value.
Assume you define only one output in the above model, the softmax layer. You can define a function like this to get predictions based on a threshold value you choose:
import numpy as np

def modify_predict(test_images, threshold):
    # Class probabilities from the softmax output, shape (num_images, num_classes)
    predictions = model.predict(test_images)
    max_values = np.max(predictions, axis=1)
    labels = np.argmax(predictions, axis=1)
    # Keep the label only if its probability exceeds the threshold; 999 marks "no label" (any indicator works)
    new_predictions = np.where(max_values > threshold, labels, 999)
    return new_predictions
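To see the thresholding in isolation, here is a dependency-free sketch that applies the same rule to probabilities that have already been computed. The name apply_threshold, the NO_LABEL constant and the sample probabilities are all made up for illustration.

```python
NO_LABEL = 999  # arbitrary indicator, playing the same role as 999 in the function above

def apply_threshold(probabilities, threshold):
    """Return the most likely class per row, or NO_LABEL if its probability is too low.

    `probabilities` is a list of rows, each row being softmax outputs for one image.
    """
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)  # index of the largest probability
        results.append(best if row[best] > threshold else NO_LABEL)
    return results

probs = [
    [0.05, 0.90, 0.05],  # confident prediction for class 1
    [0.40, 0.35, 0.25],  # no class stands out
]
print(apply_threshold(probs, 0.6))  # [1, 999]
```

The threshold itself is a choice you would have to tune on held-out data; a high softmax value alone does not guarantee that the input belongs to any known class.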
On the first part of your question: the only way you can know how your model will behave on non-animal pictures is by having non-animal pictures in your data.
There are two options.
The first is to include non-animal pictures in the training set (and dev and test sets), and to train the model to distinguish between animal and non-animal. You could either build a separate binary classification model to distinguish animal/non-animal (as already suggested in comments), or you could integrate it into one model by having a 'non-animal' class (although I recognise you indicate this last option is not something you want to do).
The second is to include non-animal pictures in the dev and test sets, but not in the training set. You can't then train the model to distinguish between animal and non-animal, but you can at least measure how it behaves on non-animal pictures, and perhaps create some sort of heuristic for selecting only some of your model's predictions. This seems like a worse option to me, even though it's generally accepted that dev and test sets can come from a different distribution to the training set. It's something one might do if there were only a small number of non-animal pictures available, but that surely can't be the case here.
There is, for example, a large labelled image database available at https://www.image-net.org/index.php

Convolutional Neural Network (CNN) input shape

I am new to CNN and I have a question regarding CNN. I am a bit confused about the input shape of CNN (specifically with Keras).
My data is 2D (let's say 10x10) in different time slots. Therefore, I have 3D data.
I am going to feed this data to my model to predict the coming time slot. So, I will have a certain number of time slots for prediction (let's say 10 slots, so I may have 10x10x10 data).
Now, my question is whether I have to treat this data as a 2D image with 10 channels (like ordinary CNN data such as RGB images) or as 3D data (Conv2D or Conv3D in Keras).
Thank you in advance for your help.
In your case, Conv2D will be useful. Please refer to the description below to understand the input shape of a Convolutional Neural Network (CNN) using Conv2D.
Let’s see what the input shape looks like. The input data to the CNN will look like the following picture. We are assuming that our data is a collection of images.
The input shape is (batch_size, height, width, channels). An RGB image has 3 channels and a greyscale image has 1 channel.
Let’s look at the following code:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D
model=tf.keras.models.Sequential()
model.add(Conv2D(filters=64, kernel_size=1, input_shape=(10,10,3)))
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 10, 10, 64) 256
=================================================================
Though it looks like our input shape is 3D, you have to pass a 4D array when fitting the data, shaped like (batch_size, 10, 10, 3). Since there is no batch size value in the input_shape argument, we can use any batch size while fitting the data.
The output shape is (None, 10, 10, 64). The first dimension represents the batch size, which is None at the moment because the network does not know the batch size in advance.
Note: When you fit the data, the None dimension takes the batch size you give at that time.
Let’s look at another example, this time with a fixed batch size:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D
model=tf.keras.models.Sequential()
model.add(Conv2D(filters=64, kernel_size=1, batch_input_shape=(16,10,10,3)))
model.summary()
Output:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (16, 10, 10, 64) 256
=================================================================
Here I have replaced the input_shape argument with batch_input_shape. As the name suggests, this argument requires the batch size in advance, and you cannot provide any other batch size when fitting the data.

Want to check Intermediate Operations inside Keras Layer

I am facing loss of floating-point resolution during the convolution operation while porting code to my embedded processor, which supports only half precision. So I want to test the intermediate operations that are performed layer by layer in my Keras-based model, which performs well in full precision on my desktop.
In the following snippet I want to compute the 1D convolution on input data of shape 1500x3. The kernel size is 10 and the kernel shape is (10x3x16).
To compute the 1D convolution, Keras expands the dimensions of the input, adding one more dimension so that it becomes suitable for a 2D convolution operation.
Then a series of operations is called, e.g. Conv2D followed by Squeeze and finally BiasAdd.
Finally the output of the Conv1D layer is passed to the conv1d_20/Elu operation.
Please find the picture attached for full description of operations involved.
Now, I want to test the output well before the actual output of the layer is produced.
Please see the below code:
Input_sequence = keras.layers.Input(shape=(1500,3))
encoder_conv1 = keras.layers.Conv1D(filters=16, kernel_size=10, padding='same', activation=tf.nn.elu)(Input_sequence)
The Model summary shows:
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 1500, 3)] 0
_________________________________________________________________
conv1d_20 (Conv1D) (None, 1500, 16) 496
I want to define the model output at conv1d_20/Conv2D, but it gives me an error. The code below, however, is accepted:
encoder = keras.Model(inputs=autoencoder.input, outputs=autoencoder.get_layer('conv1d_20').output)
encoder.get_output_at(0)
It outputs
<tf.Tensor 'conv1d_20/Elu:0' shape=(?, 1500, 16) dtype=float32>
I want to test the output of the Conv2D operation, but it produces the output of conv1d_20/Elu.
How can I do this test? Please help me.
Conv1D operation
You can disable the bias (use_bias=False) and the activation function (activation=None) when defining the Conv1D layer. The layer's output is then the result of the convolution alone, without BiasAdd or Elu.
Input_sequence = keras.layers.Input(shape=(1500,3))
encoder_conv1 = keras.layers.Conv1D(filters=16, kernel_size=10,
                                    padding='same', use_bias=False,
                                    activation=None)(Input_sequence)
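For comparing against the half-precision result on the embedded target, it can help to have a small reference implementation of the bias-free, activation-free convolution. The following is a plain-Python sketch, not what Keras runs internally: the function name conv1d_same is made up, the kernel layout (kernel_size, in_channels, filters) is assumed, and the 'same' padding is assumed to follow the TensorFlow convention of putting the extra zero at the end when the total padding is odd.

```python
def conv1d_same(x, kernel):
    """Stride-1 1D convolution (cross-correlation) with 'same' padding, no bias, no activation.

    x:      list of timesteps, each a list of in_channels values
    kernel: nested list indexed as kernel[tap][in_channel][filter]
    Hypothetical reference helper, written for clarity rather than speed.
    """
    kernel_size = len(kernel)
    in_channels = len(kernel[0])
    filters = len(kernel[0][0])
    pad_total = kernel_size - 1
    pad_left = pad_total // 2            # any odd leftover zero goes on the right
    pad_right = pad_total - pad_left
    zero_step = [0.0] * in_channels
    padded = [zero_step] * pad_left + list(x) + [zero_step] * pad_right
    output = []
    for t in range(len(x)):
        row = [0.0] * filters
        for tap in range(kernel_size):
            for c in range(in_channels):
                value = padded[t + tap][c]
                for f in range(filters):
                    row[f] += value * kernel[tap][c][f]
        output.append(row)
    return output

# Tiny example: 5 timesteps, 1 channel, one filter with three taps of weight 1.
x = [[1.0]] * 5
kernel = [[[1.0]] for _ in range(3)]
print(conv1d_same(x, kernel))  # [[2.0], [3.0], [3.0], [3.0], [2.0]]
```

With the real layer, the kernel values would come from the trained weights, and the same loop could be repeated with reduced-precision arithmetic to see where the results start to diverge.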

Error when running LSTM model, Loss: NaN values

My LSTM model using Keras and Tensorflow is giving loss: nan values.
I have tried reducing the learning rate but still get nan and decreasing overall accuracy. I have also used np.any(np.isnan(x_train)) to check for nan values that I may be introducing myself (no nans were found). I also read about exploding gradients and can't seem to find anything to help with my specific issue.
I think I have an idea of where the issue may be but not quite sure. This is the process I implemented to build x_train
For example:
a = [[1,0,..0], [0,1,..0], [0,0,..1]]
a.shape() # (3, 20)
b = [[0,0,..1], [0,1,..0], [1,0,..0], [0,1,..0]]
b.shape() # (4, 20)
To ensure that the shapes are the same, I append a vector [0,0,..0] (all zeros) to a, so its shape is now (4,20).
a and b are then stacked to give a 3D array of shape (2,4,20), and this forms x_train. But I think appending the empty vectors of zeros is for some reason giving me a loss: nan while training my model. Is this where I could be going wrong?
n.b. a+b is a numpy array and my actual x_train.shape is (1228, 1452, 20)
•Edit• model.summary() added below:
x_train shape: (1228, 1452, 20)
y_train shape: (1228, 1452, 8)
x_val shape: (223, 1452, 20)
y_val shape: (223, 1452, 8)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
unified_lstm (UnifiedLSTM) (None, 1452, 128) 76288
_________________________________________________________________
batch_normalization_v2 (Batc (None, 1452, 128) 512
_________________________________________________________________
unified_lstm_1 (UnifiedLSTM) (None, 1452, 128) 131584
_________________________________________________________________
batch_normalization_v2_1 (Ba (None, 1452, 128) 512
_________________________________________________________________
dense (Dense) (None, 1452, 32) 4128
_________________________________________________________________
dense_1 (Dense) (None, 1452, 8) 264
=================================================================
Total params: 213,288
Trainable params: 212,776
Non-trainable params: 512
Screenshot of nan is below:
The solution is to use the Masking() layer available in Keras with mask_value=0. When all-zero padding vectors are used, they are included in the loss calculation; with Masking(), as outlined in the Keras documentation, the padding vectors are skipped and not included.
As per the Keras documentation:
'If all features for a given sample timestep are equal to mask_value, then the sample timestep will be masked (skipped) in all downstream layers (as long as they support masking)'
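The rule in that quote can be illustrated without Keras. The sketch below only shows which timesteps would be treated as padding; timestep_mask is a made-up name and this is not how the Masking layer is implemented.

```python
def timestep_mask(sequence, mask_value=0):
    """True for timesteps to keep, False where every feature equals mask_value."""
    return [any(feature != mask_value for feature in step) for step in sequence]

# `a` from the question after padding it to four timesteps (shortened to 3 features here).
a_padded = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],  # the appended all-zero vector
]
print(timestep_mask(a_padded))  # [True, True, True, False]
```

Only the last timestep is masked: a one-hot vector always contains a 1, so real timesteps are never mistaken for padding when mask_value is 0.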
I would advise you to check the following:
The output of your batch normalisation layer. I once encountered a similar problem where the loss came out as "nan". When I checked the normalisation output, it was all zeros. Maybe that is what made the loss "nan".
A possible reason for NaNs is too high a learning rate. Try reducing it a bit and check the output.
If you are using RMSProp, try Adam instead.
As your dense_1 layer has an output shape of (None, 1452, 8), I am assuming you are working on some sort of classification problem. Because we use log loss here, precision errors sometimes come into play. If you are using float16, change the precision to float32.
Instead of padding with an all-zeros vector, you should use a dummy feature. That is, your one-hot feature vector grows to size (21,), e.g. [0, 0, 0, ..., 1] of size 21, with the last dimension reserved for dummy padding.
I also advise you to use index-based input instead of explicit one-hot vectors, where each one-hot vector is replaced by the index of its 1, e.g. [0, 0, 1, ..., 0] becomes 2. Keras supports this index-based input style with its Embedding layer. This is easier to use and more computationally efficient.
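A small sketch of that conversion, assuming all-zero rows are the padding added earlier: each one-hot row becomes the index of its 1, and padding rows get a dedicated extra index. The function name one_hot_to_indices and the sample data are invented for illustration.

```python
def one_hot_to_indices(sequence, padding_index):
    """Convert one-hot timesteps to integer indices; all-zero rows map to padding_index."""
    indices = []
    for step in sequence:
        if 1 in step:
            indices.append(step.index(1))
        else:
            indices.append(padding_index)  # the dummy "padding" category
    return indices

# Three real features, so index 3 is free to act as the dummy padding feature.
sequence = [[1, 0, 0], [0, 0, 1], [0, 0, 0]]
print(one_hot_to_indices(sequence, padding_index=3))  # [0, 2, 3]
```

With the 20 features from the question, the padding index would be 20, giving 21 categories in total, which matches the size-21 vector described above.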

output dimension of reshape layer

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((7, 7, 256)))
The dense layer takes input of dimension 1*100. It uses 7*7*256 nodes in its layer. The Reshape layer takes 1*(7*7*256) as input, but what is its output? I mean, what does (7, 7, 256) mean?
Is it an image of dimension 7*7 if we give an image of 1*100 as input? What is it?
I am sorry, I know that I have understood it completely wrong, so I want to understand it properly.
Here your model takes an input of shape (*, 100), the first dense layer outputs a shape of (*, 7*7*256), and finally the Reshape layer reshapes that output to an array of shape (*, 7, 7, 256).
With * being your batch_size.
So basically, your 'image' of shape (*, 100) will be reshaped to an array of shape (*, 7, 7, 256).
Hope this helps.
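The only requirement for the reshape is that the number of values per sample stays the same. A minimal sketch of that check in plain Python follows; reshape_output_shape is a hypothetical helper, not a Keras function, and None stands for the batch size.

```python
import math

def reshape_output_shape(flat_units, target_shape):
    """Output shape of reshaping (batch, flat_units) to (batch, *target_shape)."""
    if math.prod(target_shape) != flat_units:
        raise ValueError("target shape must contain exactly flat_units values")
    return (None,) + tuple(target_shape)

print(7 * 7 * 256)                                   # 12544
print(reshape_output_shape(7 * 7 * 256, (7, 7, 256)))  # (None, 7, 7, 256)
```

So each sample's 12544 dense outputs are simply rearranged into a 7x7 grid with 256 channels; no values are added or removed.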
This refers to Google's TensorFlow MNIST DCGAN tutorial.
The first dense layer at the input is configured with 7 * 7 * 256 units, and we could not find an explanation for this in the tutorial.
My initial impression about this is as follows:
Remember we want a 28x28 greyscale image as output. That means the required output shape is (None, 28, 28, 1), where the first entry is the batch size, which is None because it is left unspecified until data is passed.
Now note that a Conv2DTranspose layer with strides=(2,2) essentially upsamples its input by a factor of 2, i.e. it doubles the height and width. Secondly, the number of filters of a Conv2DTranspose layer becomes the number of output channels, so if I want the output to be greyscale, the number of filters should be one. Thus, if I want (None, 28, 28, 1) at the output of a Conv2DTranspose layer, the shape of its input should be (None, 14, 14, x). (The number of output channels is decided by the current layer, so x can be any value at the input.)
Suppose I put one more Conv2DTranspose layer with strides=(2,2) before this layer; the input to that layer should then be (None, 7, 7, x), where x is the number of channels coming from the layer before it.
In general, if a batch of images of size (h, w) is input to a Conv2DTranspose layer with strides=(2,2) and padding='same', its output will have shape (batch_size, 2 * h, 2 * w, no_of_filters).
The Google tutorial additionally places one more Conv2DTranspose layer [with strides=(1,1), so it has no upsampling effect] and a Dense layer ahead of these. Those layers do no upsampling, so the spatial shape remains 7x7; 7x7 is the image shape here. The first dense layer's output is flattened, so if it has 7 * 7 * x units, we can always reshape it to get a (7, 7, x) image.
This is the theory behind the 7 * 7 * x units of the first dense layer. The value 256 is an arbitrary choice, which they might have derived empirically or intuitively, I guess.
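The shape reasoning can be traced with a few lines of Python. This is a sketch under assumptions: it models Conv2DTranspose with padding='same', where each spatial dimension is multiplied by the stride, the helper name is made up, and the filter counts 128 and 64 for the intermediate layers are illustrative values rather than something stated above.

```python
def conv2d_transpose_same_shape(input_shape, filters, stride):
    """Output (height, width, channels) of a transposed convolution with 'same' padding."""
    height, width, _ = input_shape
    return (height * stride, width * stride, filters)

shape = (7, 7, 256)  # output of the Reshape layer, per sample
# (filters, stride) per layer; 128 and 64 are assumed values for illustration.
for filters, stride in [(128, 1), (64, 2), (1, 2)]:
    shape = conv2d_transpose_same_shape(shape, filters, stride)
    print(shape)
# (7, 7, 128)
# (14, 14, 64)
# (28, 28, 1)
```

Working backwards from 28x28 through two stride-2 layers gives 14x14 and then 7x7, which is where the 7 * 7 in the dense layer comes from.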