Convolutional Neural Network (CNN) input shape - tensorflow

I am new to CNNs and have a question about the input shape of a CNN (specifically with Keras).
My data is 2D (say 10x10) captured over different time slots, so overall I have 3D data.
I am going to feed this data to my model to predict the coming time slot, using a certain number of past time slots for each prediction (say 10 slots, so I have 10x10x10 data).
Now, my question is: should I treat this data as a 2D image with 10 channels (like ordinary CNN inputs such as RGB images) or as 3D data (Conv2D or Conv3D in Keras)?
Thank you in advance for your help.

In your case, Conv2D will be useful. The description below should clarify the input shape of a convolutional neural network (CNN) using Conv2D.
Let's see what the input shape looks like, assuming the data is a collection of images.
The input shape is (batch_size, height, width, channels). An RGB image has 3 channels, while a greyscale image has 1.
Let's look at the following code:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D
model=tf.keras.models.Sequential()
model.add(Conv2D(filters=64, kernel_size=1, input_shape=(10,10,3)))
model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 10, 10, 64) 256
=================================================================
Though it looks like our input shape is 3D, you have to pass a 4D array when fitting the data, shaped (batch_size, 10, 10, 3). Since there is no batch size value in the input_shape argument, any batch size can be chosen when fitting the data.
The output shape is (None, 10, 10, 64). The first dimension represents the batch size, which is None for now because the network does not know it in advance.
Note: once you fit the data, None is replaced by the batch size you provide.
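For example (a minimal sketch with random data; the shapes match the model above), any batch size works at fit time:
import numpy as np
# 100 samples of shape (10, 10, 3); the batch size is chosen freely at fit time.
x = np.random.rand(100, 10, 10, 3)
y = np.random.rand(100, 10, 10, 64)
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, batch_size=20)  # None becomes 20 during this fit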
Let's look at another example, this time with a fixed batch size:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D
model=tf.keras.models.Sequential()
model.add(Conv2D(filters=64, kernel_size=1, batch_input_shape=(16,10,10,3)))
model.summary()
Output:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (16, 10, 10, 64) 256
=================================================================
Here I have replaced the input_shape argument with batch_input_shape. As the name suggests, this argument fixes the batch size in advance, and you cannot use any other batch size when fitting the data.
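Back to your data: treating the 10 time slots as 10 channels of a 10x10 image is a reasonable starting point. A minimal sketch (random data standing in for your time slots):
import numpy as np
import tensorflow as tf

# 10x10 grids over 10 time slots, treated as a 10-channel image.
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(10, 10, 10)),
    tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same'),
])
samples = np.random.rand(32, 10, 10, 10).astype('float32')  # 32 hypothetical examples
print(model(samples).shape)  # (32, 10, 10, 64)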

Related

How to calculate the confidence of a softmax layer

I am working on a multi-class computer vision classification task, using a CNN with FC layers stacked on top and a softmax activation. The problem: say I'm classifying animal categories; if I feed in an image of a rock, the model will still return a high probability for the most similar animal category, because softmax always produces a probability distribution over the known classes. What can I use to measure the confidence of my model's probability output, i.e. to decide whether I can rely on these probabilities or not?
PS: I don't want to add a no_label class.
Is it possible, using the Keras functional API, to have two outputs from the model, the pre-softmax (linear) output and the softmax output, without the pre-softmax output affecting the weight updates during training?
Yes, you can do it like this:
input = tf.keras.layers.Input((128,128,3))
x = tf.keras.layers.Conv2D(32,3)(input)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(128)(x)
non_softmax_output = tf.keras.layers.Dense(10)(x)
softmax_output = tf.keras.layers.Softmax()(non_softmax_output)
model = tf.keras.models.Model(inputs=input,outputs=[non_softmax_output,softmax_output])
model.summary()
>>>
Model: "model_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 128, 128, 3)] 0
conv2d_1 (Conv2D) (None, 126, 126, 32) 896
max_pooling2d_1 (MaxPooling2D) (None, 63, 63, 32) 0
flatten_1 (Flatten) (None, 127008) 0
dense_23 (Dense) (None, 128) 16257152
dense_24 (Dense) (None, 10) 1290
softmax (Softmax) (None, 10) 0
=================================================================
Total params: 16,259,338
Trainable params: 16,259,338
Non-trainable params: 0
_________________________________________________________________
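Because the model has two outputs, model.predict returns a list with both arrays; for example (with random stand-in images):
import numpy as np
x = np.random.rand(4, 128, 128, 3).astype('float32')
logits, probs = model.predict(x)  # pre-softmax and softmax outputs, in that order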
The easier alternative is to just work with the predictions from the softmax layer. You don't gain much from the linear layer without the activation; those raw logits by themselves do not mean much. You could instead define a function outside the model that changes the predictions based on some threshold value.
Assume you define only one output in the above model, the softmax layer. You can then define a function like this to get predictions based on a threshold value of your choosing:
import numpy as np

def modify_predict(test_images, threshold):
    predictions = model.predict(test_images)
    max_values = np.max(predictions, axis=1)  # highest class probability per sample
    labels = np.argmax(predictions, axis=1)   # predicted class per sample
    # You can use any indicator instead of 999 for your no_label class
    new_predictions = np.where(max_values > threshold, labels, 999)
    return new_predictions
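For example (threshold chosen arbitrarily; test_images stands in for your data):
preds = modify_predict(test_images, threshold=0.8)  # low-confidence samples map to 999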
On the first part of your question: the only way you can know how your model will behave on non-animal pictures is by having non-animal pictures in your data.
There are two options.
The first is to include non-animal pictures in the training set (and dev and test sets), and to train the model to distinguish between animal and non-animal. You could either build a separate binary classification model to distinguish animal/non-animal (as already suggested in the comments), or integrate it into one model by having a 'non-animal' class (although I recognise this last option is not something you want to do).
The second is to include non-animal pictures in the dev and test sets, but not in the training set. You can't then train the model to distinguish between animal and non-animal, but you can at least measure how it behaves on non-animal pictures, and perhaps create some sort of heuristic for selecting only some of your model's predictions. This seems like the worse option to me, even though it's generally accepted that dev and test sets can come from a different distribution than the training set. It's something one might do if there were only a small number of non-animal pictures available, but that surely can't be the case here.
There is, for example, a large labelled image database available at https://www.image-net.org/index.php

How to feed new vectors into recurrent and convolutional keras model for real-time/streaming/live inference?

I have successfully trained a Keras/TensorFlow model consisting of the layers SimpleRNN→Conv1D→GRU→Dense. The model is meant to run on an Apple Watch for real-time inference, which means I want to feed it a new feature vector and predict a new output at each time step. My problem is that I don't know how to feed data into it such that the convolutional layer receives the latest k outputs from the RNN layer.
I can see three options:
Feed it with one feature vector at a time, i.e. (1,1,6). In this case I assume that the convolutional layer will receive only one time step and hence zero pad for all the previous samples.
Feed it with the last k feature vectors for each time step, i.e. (1,9,6), where k = 9 is the CNN kernel length. In this case I assume that the state flow in the recurrent layers will not work.
Feed it with the last k feature vectors every k-th time step, again where k = 9 is the CNN kernel length. I assume this would work, but it introduces unnecessary latency that I wish to avoid.
What I want is a model that I can feed with a new single feature vector for each time step, and it will automatically feed the last k outputs of the SimpleRNN layer into the following Conv1D layer. Is this possible with my current model? If not, can I work with the layer arguments, or can I introduce some kind of FIFO buffer layer between the SimpleRNN and Conv1D layer?
Here is my current model:
import tensorflow as tf
from tensorflow.keras.layers import Input, SimpleRNN, Conv1D, GRU, Dropout, Dense

feature_vector_size = 6
model = tf.keras.models.Sequential([
    Input(shape=(None, feature_vector_size)),
    SimpleRNN(16, return_sequences=True, name="rnn"),
    Conv1D(16, 9, padding="causal", activation="relu"),
    GRU(12, return_sequences=True, name="gru"),
    Dropout(0.2),
    Dense(1, activation=tf.nn.sigmoid, name="dense")
])
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rnn (SimpleRNN) (None, None, 16) 368
_________________________________________________________________
conv1d (Conv1D) (None, None, 16) 2320
_________________________________________________________________
gru (GRU) (None, None, 12) 1080
_________________________________________________________________
dropout (Dropout) (None, None, 12) 0
_________________________________________________________________
dense (Dense) (None, None, 1) 13
=================================================================
Edit:
After having researched the problem a bit, I have realized:
The Conv1D layer will zero pad in all three cases that I described, so option 3 won't work either. Setting padding="valid" solves this particular problem.
The SimpleRNN and GRU layers must have stateful=True. I found this description of how to make a model stateful after it has been trained stateless: How to implement a forward pass in a Keras RNN in real-time? (a conversion sketched just after this list).
Keras sequence models seem to be made for complete, finite sequences only. The infinite streaming use case with one time step at a time isn't really supported.
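For reference, a minimal sketch of what I assume that stateless-to-stateful conversion looks like for this model: rebuild the same architecture with stateful=True and a fixed batch size, then copy the trained weights across:
import tensorflow as tf
from tensorflow.keras.layers import Input, SimpleRNN, Conv1D, GRU, Dropout, Dense

stateful_model = tf.keras.models.Sequential([
    Input(batch_shape=(1, None, feature_vector_size)),  # stateful RNNs need a fixed batch size
    SimpleRNN(16, return_sequences=True, stateful=True, name="rnn"),
    Conv1D(16, 9, padding="causal", activation="relu"),
    GRU(12, return_sequences=True, stateful=True, name="gru"),
    Dropout(0.2),
    Dense(1, activation=tf.nn.sigmoid, name="dense")
])
stateful_model.set_weights(model.get_weights())  # reuse the trained weights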
However, the original question remains open: How can I build and/or feed new feature vectors into the model such that the convolutional layer receives the latest k outputs from the RNN layer?
For anyone else with the same problem: I couldn't solve the SimpleRNN to Conv1D data flow easily, so I ended up replacing the SimpleRNN layer with another Conv1D layer and setting padding="valid" on both Conv1D layers. The resulting model outputs exactly one time step when fed with a sequence of c * (k - 1) + 1 time steps, where c is the number of Conv1D layers and k is the convolutional kernel length (c = 2 and k = 9 in my case, giving 17):
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv1D, GRU, Dropout, Dense

feature_vector_size = 6
model = tf.keras.models.Sequential([
    Input(shape=(None, feature_vector_size)),
    Conv1D(16, 9, padding="valid", name="conv1d1"),
    Conv1D(16, 9, padding="valid", name="conv1d2"),
    GRU(12, return_sequences=True, name="gru"),
    Dropout(0.2),
    Dense(1, activation=tf.nn.sigmoid, name="dense")
])
After training, I make the GRU layer stateful according to How to implement a forward pass in a Keras RNN in real-time?. For real-time inference I keep a FIFO queue of the 17 latest feature vectors and feed all 17 vectors into the model as an input sequence at each new time step.
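A minimal sketch of that inference loop (a hypothetical helper, glossing over the stateful-GRU details):
from collections import deque
import numpy as np

WINDOW = 17  # c * (k - 1) + 1 with c = 2 Conv1D layers and kernel length k = 9
buffer = deque(maxlen=WINDOW)

def predict_step(feature_vector):
    buffer.append(feature_vector)
    if len(buffer) < WINDOW:
        return None  # buffer still filling up
    seq = np.asarray(buffer, dtype='float32')[np.newaxis]  # shape (1, 17, 6)
    return float(model.predict(seq)[0, -1, 0])  # exactly one new time step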
I don't know if this is the best possible solution, but at least it works.

How can Keras calculate the number of parameters at an early stage when there are still None dimensions?

Sorry for the very basic question (I'm new to Keras). I was wondering how Keras can calculate the number of parameters for each layer at an early stage (before fit), given that model.summary shows dimensions that still have None values at this point. Are these values already determined in some way, and if so, why not show them in the summary?
I ask because I'm having a hard time figuring out my "tensor shape bug" (I'm trying to determine the output dimensions of the C5 block of my ResNet50 model, but I cannot see them in model.summary even though I can see the number of parameters).
Below is an example based on the C5_reduced layer in RetinaNet, which is fed by the C5 layer of ResNet50. C5_reduced is
Conv2D(256, kernel_size=1, strides=1, padding='same')
Based on model.summary for this particular layer:
C5_reduced (Conv2D) (None, None, None, 256) 524544
I've made the guess that C5 is (None, 1, 1, 2048), because 2048*256 + 256 = 524544 (I don't know how to confirm or refute that hypothesis). So if it's already known, why not show it in the summary? If dimensions 2 and 3 had been different, the number of parameters would have been different too, right?
If you pass the exact input shape to your very first layer, or use an input layer in your network, you will get the fully specified output shapes you want. For instance, I used an input layer here:
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
The input was passed as (224, 224, 3), where 3 represents the depth (number of channels). Note that the parameter calculation for convolutional layers differs from that for Dense layers.
If you do the following:
tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3))
You will see:
conv2d (Conv2D) ---> (None, 148, 148, 16)
The dimensions are reduced to 148x148 because in Keras padding is 'valid' by default and strides is 1: output size = (input size - kernel size) / strides + 1 = (150 - 3) / 1 + 1 = 148.
So then, what are the None values?
The first None value is the batch size; in Keras, the first dimension is always the batch size. You can fix it in advance, or leave it to be determined when fitting or predicting.
In 2D convolution the expected input is (batch_size, height, width, channels), but you can also have shapes such as (None, None, None, 3), which means varying image sizes are allowed.
Edit:
tf.keras.layers.Input(shape = (None, None, 3)),
tf.keras.layers.Conv2D(16, (3,3), activation='relu')
Produces:
conv2d_21 (Conv2D) (None, None, None, 16) 448
Regarding your question: how are the parameters calculated even though we passed the image height and width as None?
Convolution parameters calculated according to:
(filter_height * filter_width * input_image_channels + 1) * number_of_filters
Putting our values into the formula:
filter_height = 3
filter_width = 3
input_image_channel = 3
number_of_filters = 16
Parameters = (3 x 3 x 3 + 1) * 16 = 28 * 16 = 448
Notice that we only needed the input image's channel count, which is 3, meaning an RGB image.
If you want to calculate the params for later convolutions, remember that the number of filters in the previous layer becomes the number of input channels for the current layer.
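For example (a minimal sketch), stacking a second convolution after the 16-filter one:
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(None, None, 3)),
    tf.keras.layers.Conv2D(16, (3, 3)),  # (3*3*3 + 1) * 16 = 448 params
    tf.keras.layers.Conv2D(32, (3, 3)),  # (3*3*16 + 1) * 32 = 4640 params
])
model.summary()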
That's how you can end up with None dimensions (other than the batch size) and still get a parameter count: Keras only needs to know the channel count, e.g. whether your image is RGB or not. Alternatively, you can leave the dimensions unspecified while creating the model and pass them while fitting it with the dataset.
You need to define an input layer for your model. The total number of trainable parameters is unknown until you either (a) compile the model and feed it data, at which point the model builds its graph based on the input dimensions and the number of params can be determined, or (b) define an input layer with the input dimensions stated, after which you can find the number of params with model.summary().
The point is that the model cannot know the number of parameters between the input and the first hidden layer until that input is defined, or until you run inference and give it the shape of the input.

Change fully convolutional network input shape in TF 2.3 and tf.keras

I'm working with tensorflow 2.3 and tf.keras
I've trained a network on images with input shape (None, 120, 120, 12). Actually, due to a coding error, I've also been able to train the model while declaring the input as (None, 128, 128, 12) but feeding it (None, 120, 120, 12) batches: TF just printed a warning and didn't care. This wasn't the behaviour in previous versions. Since my network has only convolutional layers, if the input size has enough powers of 2 (considering the depth), it produces an output image of the same shape as the input.
I've now fully trained this model and I'd like to apply it to images of a different shape. Is there a proper way to change the input shape of my trained model? Should I define a new model and copy the weights layer by layer? Or should I just accept the warnings and forget about them, since it works anyway?
Ah, you again. I think your problem is basically simple: once you train your model with one input size, the input must have the same shape when you run the model. However, if you want to take advantage of the trained model and believe the learnt features are not much different, you can apply transfer learning and, of course, retrain. You don't have to copy weights; just freeze the model and train only the input and output ends. You can check this basic example with your VGG:
import tensorflow as tf

base_model = tf.keras.applications.VGG19(
    weights='imagenet',        # Load weights pre-trained on ImageNet.
    input_shape=(224, 224, 3),
    include_top=False)
base_model.trainable = False   # Freeze the pre-trained convolutional base.

inputs = tf.keras.layers.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 150, 150, 3)] 0
_________________________________________________________________
vgg19 (Model) multiple 20024384
_________________________________________________________________
global_average_pooling2d (Gl (None, 512) 0
_________________________________________________________________
dense (Dense) (None, 1) 513
=================================================================
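Since your network is fully convolutional, the weight-copying option you mention can also be sketched roughly like this (a hedged sketch, assuming every layer tolerates variable spatial dimensions): rebuild the architecture with a flexible input and reuse the trained layer objects, which carry their weights with them.
import tensorflow as tf

def rebuild_with_flexible_input(trained_model, channels=12):
    # Hypothetical helper: reusing the layer objects shares their trained weights.
    inputs = tf.keras.layers.Input(shape=(None, None, channels))
    x = inputs
    for layer in trained_model.layers:
        if isinstance(layer, tf.keras.layers.InputLayer):
            continue  # skip the original fixed-shape input
        x = layer(x)
    return tf.keras.Model(inputs, x)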

Want to check Intermediate Operations inside Keras Layer

I am facing floating point resolution loss during the convolution operation while porting code to my embedded processor, which supports only half precision, so I want to test the intermediate operations performed layer by layer in my Keras model, which performs well at full precision on my desktop.
In the following snippet I want to compute a 1D convolution on input data of shape 1500x3. The kernel size is 10 and the kernel shape is (10x3x16).
To compute the 1D convolution, Keras expands the dimensions of the input, adding one more dimension so that it becomes suitable for a 2D convolution operation.
Then a series of operations is called: Conv2D, followed by Squeeze, and finally BiasAdd. The output of the Conv1D layer is then pushed into the conv1d_20/Elu layer.
Now, I want to test the output well before the final output of the layer is produced.
Please see the code below:
import tensorflow as tf
from tensorflow import keras

Input_sequence = keras.layers.Input(shape=(1500, 3))
encoder_conv1 = keras.layers.Conv1D(filters=16, kernel_size=10,
                                    padding='same',
                                    activation=tf.nn.elu)(Input_sequence)
The Model summary shows:
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_5 (InputLayer) [(None, 1500, 3)] 0
_________________________________________________________________
conv1d_20 (Conv1D) (None, 1500, 16) 496
I want to define the model output at conv1d_20/Conv2D, but that gives me an error. The following, however, is accepted at compilation:
encoder = keras.Model(inputs=autoencoder.input, outputs=autoencoder.get_layer('conv1d_20').output)
encoder.get_output_at(0)
It outputs
<tf.Tensor 'conv1d_20/Elu:0' shape=(?, 1500, 16) dtype=float32>
I want to test the output of the Conv2D operation, but it produces the output of conv1d_20/Elu. How can I run this test? Please help me.
You can disable the bias (use_bias=False) and the activation function (activation=None) when defining the Conv1D layer, so that its output is the raw convolution result:
Input_sequence = keras.layers.Input(shape=(1500, 3))
encoder_conv1 = keras.layers.Conv1D(filters=16, kernel_size=10,
                                    padding='same', use_bias=False,
                                    activation=None)(Input_sequence)
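With bias and activation disabled, the layer's output is the raw Conv2D/Squeeze result, which you can probe directly. A minimal sketch (random stand-in input; reapplying the bias and ELU by hand would reproduce the full Conv1D output):
import numpy as np

probe = keras.Model(inputs=Input_sequence, outputs=encoder_conv1)
x = np.random.rand(1, 1500, 3).astype('float32')
raw_conv = probe.predict(x)  # output before BiasAdd and Elu
# The full Conv1D output would be tf.nn.elu(raw_conv + bias).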