I was reading this tutorial on Keras convolutional autoencoders, and I realized that I don't get the dimension (8, 4, 4) after these layers in my own calculation: the image dimension should already drop to 3 after the second convolutional layer, since the stride is that large. So how does it arrive at this dimension? Can anyone explain the calculation?
I am also confused about how "same" padding is executed in this situation, as people always say "when stride=1, same padding will keep the same dimension". I totally get that. But what happens when the stride isn't 1? How many zeros do I get on each side? I know the equation for the output dimension, floor((h + 2p - k)/s) + 1, but what is p in this case?
Thanks
input_img = Input(shape=(1, 28, 28))
x = Convolution2D(16, 3, 3, activation='relu', border_mode='same')(input_img)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
x = MaxPooling2D((2, 2), border_mode='same')(x)
x = Convolution2D(8, 3, 3, activation='relu', border_mode='same')(x)
encoded = MaxPooling2D((2, 2), border_mode='same')(x)
# at this point the representation is (8, 4, 4) i.e. 128-dimensional
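As an aside on the padding sub-question: assuming a TensorFlow-style backend, "same" padding with a general stride chooses the padding so that the output size is ceil(h / s); the total number of zeros is max((out - 1) * s + k - h, 0), split as evenly as possible between the two sides, with any extra zero going on the right. A small sketch of that convention:

import math

def same_pad_total(h, k, s):
    # TensorFlow-style "same" padding: output size is ceil(h / s),
    # and the total zero padding needed to achieve it is
    # max((out - 1) * s + k - h, 0), split left/right (extra zero on the right).
    out = math.ceil(h / s)
    return max((out - 1) * s + k - h, 0)

print(same_pad_total(28, 3, 1))  # 2 -> one zero on each side, output stays 28
print(same_pad_total(28, 3, 3))  # 2 -> output would be ceil(28 / 3) = 10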
Oh no, I think I know what happened: the code in the tutorial is wrong. I found this question, which cites the same tutorial with the correct code. They dropped the parentheses in all the Convolution2D layers (it's a translated version); it should actually be 16, (3, 3), which means the kernel is 3x3 and the stride is 1, not 3. That explains it: with stride 3 we couldn't get this dimension.
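For completeness, here is a sketch of the encoder in Keras 2 syntax with the parentheses restored (assuming the channels-first layout implied by the (1, 28, 28) input). With stride-1 convolutions and 2x2 "same" pooling, the spatial size goes 28 -> 14 -> 7 -> 4, which gives the (8, 4, 4) representation:

from keras.layers import Input, Conv2D, MaxPooling2D

input_img = Input(shape=(1, 28, 28))  # channels-first: 1 x 28 x 28

# Conv2D(16, (3, 3)): 16 filters of size 3x3, default stride 1
x = Conv2D(16, (3, 3), activation='relu', padding='same',
           data_format='channels_first')(input_img)      # (16, 28, 28)
x = MaxPooling2D((2, 2), padding='same',
                 data_format='channels_first')(x)        # (16, 14, 14)
x = Conv2D(8, (3, 3), activation='relu', padding='same',
           data_format='channels_first')(x)              # (8, 14, 14)
x = MaxPooling2D((2, 2), padding='same',
                 data_format='channels_first')(x)        # (8, 7, 7)
x = Conv2D(8, (3, 3), activation='relu', padding='same',
           data_format='channels_first')(x)              # (8, 7, 7)
encoded = MaxPooling2D((2, 2), padding='same',
                       data_format='channels_first')(x)  # (8, 4, 4): ceil(7/2) = 4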
This is my signal data
Each sample has a length of 64.
The total number of training samples is 49572.
length = len(x_train)
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv1D(32, 3, activation='relu', input_shape=(length, 64)),
    tf.keras.layers.MaxPooling1D(3),
    tf.keras.layers.Conv1D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling1D(3),
    tf.keras.layers.Conv1D(128, 3, activation='relu'),
    tf.keras.layers.MaxPooling1D(3),
    tf.keras.layers.Conv1D(128, 3, activation='relu'),
    tf.keras.layers.MaxPooling1D(3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(29, activation='softmax')
])
I want to make a CNN model for signal data, so I use Conv1D.
How do I know the input_shape from my data?
From the keras Conv1D documentation:
When using this layer as the first layer in a model, provide an
input_shape argument (tuple of integers or None, e.g. (10, 128) for
sequences of 10 vectors of 128-dimensional vectors, or (None, 128) for
variable-length sequences of 128-dimensional vectors).
From your image, it seems like your data is a simple 1-dimensional signal; that means the last dimension should equal 1 in your case.
Think of that last dimension as the color channel of an image in the 2D convolution case. Black-and-white images have only a single color channel, therefore width x height x 1, whereas RGB images have 3 color channels, hence width x height x 3.
Similarly, if you work with time series and 1D convolutions, you may have more than one signal, e.g. temperature + atmospheric pressure + humidity measured throughout the day for each minute. Then your signal would be of shape 1440 x 3.
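For the data described above (49572 samples, each a single-channel signal of length 64), a minimal sketch of the fix is to reshape to (samples, 64, 1) and pass input_shape=(64, 1); len(x_train) must not appear in input_shape, since the sample dimension is never part of it. Note that with only 64 time steps, the original four Conv1D/MaxPooling1D pairs would shrink the sequence below the kernel size, so this sketch uses a shallower stack (random data stands in for the real recordings):

import numpy as np
import tensorflow as tf

# hypothetical stand-in for the real recordings: 49572 signals of length 64
x_train = np.random.rand(49572, 64).astype('float32')
x_train = x_train.reshape(-1, 64, 1)  # (samples, steps, channels)

model = tf.keras.models.Sequential([
    # input_shape is (steps, channels), without the number of samples
    tf.keras.layers.Conv1D(32, 3, activation='relu', input_shape=(64, 1)),
    tf.keras.layers.MaxPooling1D(3),
    tf.keras.layers.Conv1D(64, 3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(29, activation='softmax'),
])
model.summary()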
When using a CNN with TensorFlow, what does the convolution matrix look like (what are the kernel values)?
Look at this basic example of a CNN:
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
What does the convolution matrix look like?
What are the values of the 3x3 matrix?
In the example above, we use 3 Conv2D layers (each layer uses a 3x3 convolution matrix).
Are those 3 matrices the same, or will they have different values?
Each convolution layer has weights and biases, which can be inspected using:
# For layer 1 <conv> (weights)
model.layers[0].get_weights()[0]
# For layer 1 <conv> (bias)
model.layers[0].get_weights()[1]
# For layer 2 <pool> (no weight and bias terms, so an empty list is returned)
model.layers[1].get_weights()
# and so on...
The conv matrix (kernel) is a 4D tensor of shape (filter_height × filter_width × in_channels × out_channels); for your first layer that is (3, 3, 3, 32).
Each filter will have different values; nothing is shared between them.
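A quick way to convince yourself, once the model above is built, is to print the kernel shapes and compare two filters (a sketch; the layer indices follow the example above, where even indices are Conv2D and odd ones are MaxPooling2D):

w1 = model.layers[0].get_weights()[0]  # first Conv2D kernel
w2 = model.layers[2].get_weights()[0]  # second Conv2D kernel

print(w1.shape)  # (3, 3, 3, 32): 3x3 kernel, 3 input channels, 32 filters
print(w2.shape)  # (3, 3, 32, 64)

# kernels are randomly initialized, so any two filters almost surely differ
print((w1[:, :, :, 0] == w1[:, :, :, 1]).all())  # False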
I am trying to implement a dense layer in Keras. The input is an EEG recording using 2 channels; each of them consists of a vector of 8 points, and the total number of training samples is 17. The y is also 17 points.
I used
x = x.reshape(17, 2, 8, 1)
y = y.reshape(17, 1, 1, 1)
model.add(Dense(1, input_shape=(2, 8, 1), activation='relu'))
print(model.summary())
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
print(model.compile)
model.fit(x, y, batch_size=17, epochs=500, verbose=1)
but I get the following error:
Error when checking target: expected dense_57 to have shape (2, 8, 1) but got array with shape (17, 1, 1)
Dense layers act only on the last axis, so with input shape (2, 8, 1) a Dense layer with output dimension 1 produces output shape (2, 8, 1), and that is the shape it expects y to have. An easy fix would be to flatten each sample instead:
x = x.reshape(17, 16)
y = y.reshape(17, 1)
model.add(Dense(1, input_shape=(16,), activation='relu'))
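Putting it together, a minimal runnable sketch could look like this (random arrays stand in for the real EEG data, and the loss is swapped to mean squared error, since sparse_categorical_crossentropy does not fit a single relu output):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# hypothetical stand-in for the EEG data: 17 samples of 2 channels x 8 points
x = np.random.rand(17, 2, 8, 1).reshape(17, 16)
y = np.random.rand(17).reshape(17, 1)

model = Sequential()
model.add(Dense(1, input_shape=(16,), activation='relu'))
model.compile(loss='mse', optimizer='adam')  # assumption: mse instead of the original loss
model.fit(x, y, batch_size=17, epochs=500, verbose=1)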
For my project, which deals with the recognition of emotions, I have a dataset consisting of multiple videos ranging from 0.5 s to 10 s. I have an application which goes through each video and creates a .csv file containing the features it has extracted from each frame in the video, i.e., each row represents one frame of the video (so the number of rows is variable) and the columns represent the different features the application has extracted from that frame (so the number of columns is fixed). Each .csv filename also contains a code representing the emotion being expressed in the video.
Initially, my plan was to extract each frame from the video and pass each frame as input to the following CNN-LSTM (CNN for the spatial features and LSTM for the temporal features) model I was planning on using.
model = Sequential()
model.add(Input(input_shape))
model.add(Conv3D(6, (1, 5, 5), (1, 1, 1), activation='relu', name='conv-1'))
model.add(AveragePooling3D((1, 2, 2), strides=(1, 2, 2), name='avgpool-1'))
model.add(Conv3D(16, (1, 5, 5), (1, 1, 1), activation='relu', name='conv-2'))
model.add(AveragePooling3D((1, 2, 2), strides=(1, 2, 2), name='avgpool-2'))
model.add(Conv3D(32, (1, 5, 5), (1, 1, 1), activation='relu', name='conv-3'))
model.add(AveragePooling3D((1, 2, 2), strides=(1, 2, 2), name='avgpool-3'))
model.add(Conv3D(64, (1, 4, 4), (1, 1, 1), activation='relu', name='conv-4'))
model.add(Reshape((30, 64), name='reshape'))
model.add(CuDNNLSTM(64, return_sequences=True, name='lstm-1'))
model.add(CuDNNLSTM(64, name='lstm-2'))
model.add(Dense(6, activation=tf.nn.softmax, name='result'))
I still plan on using a CNN-LSTM model but I don't know how to structure my dataset now. I thought of labelling each frame in each .csv file with the corresponding emotion label and then combining all the .csv files into a single .csv file. This combined .csv file would then be passed to the above model, after changing the input shape and other necessary parameters, but I don't know if the model would be able to differentiate between the videos if done in that way.
So to conclude, I need help structuring my dataset and how this dataset should be passed to a CNN-LSTM model.
By looking at your problem statement, I don't think there is a need to differentiate between the videos.
You can go ahead with your approach of labeling each frame in the video and combining everything into a single CSV file.
You can then use the code below to convert the CSV file into NumPy arrays and prepare your data for training:
import numpy as np
import pandas as pd

data = pd.read_csv('input.csv')
width, height = 48, 48

datapoints = data['pixels'].tolist()

# getting features for training
X = []
for xseq in datapoints:
    xx = [int(xp) for xp in xseq.split(' ')]
    xx = np.asarray(xx).reshape(width, height)
    X.append(xx.astype('float32'))

X = np.asarray(X)
X = np.expand_dims(X, -1)

# getting labels for training
y = pd.get_dummies(data['emotion']).to_numpy()  # .as_matrix() was removed in pandas 1.0

# storing them using numpy
np.save('fdataX', X)
np.save('flabels', y)
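Afterwards, the saved arrays can be loaded back and split for training, for example (a sketch; scikit-learn's train_test_split is just one convenient option):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.load('fdataX.npy')
y = np.load('flabels.npy')

# hold out 20% of the frames for validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)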
While a convolution layer in TensorFlow has a complete description (https://www.tensorflow.org/api_guides/python/nn#Convolution), transposed convolution does not have one.
Although tf.keras.layers.Conv2DTranspose has a reference to https://arxiv.org/pdf/1603.07285.pdf, it is not complete.
Is there any documentation that describes how tf.keras.layers.Conv2DTranspose behaves?
Conv2DTranspose is often used for upsampling an image/feature map. The code below uses a 1x1 filter kernel to show how the input is padded with zeros. The code is for TensorFlow 2.0; with TensorFlow 1.x, add tf.enable_eager_execution().
import tensorflow as tf
from tensorflow.keras import layers

data = tf.ones([2, 2], tf.float32, "input_data")
input_layer = tf.reshape(data, [-1, 2, 2, 1])  # NHWC: batch x 2 x 2 x 1
transpose2d = layers.Conv2DTranspose(1, (1, 1), kernel_initializer='ones',
                                     strides=(2, 2), padding='valid',
                                     use_bias=False)
x = transpose2d(input_layer)
print(x)
The input is
1,1
1,1
The x is
1,0,1,0
0,0,0,0
1,0,1,0
0,0,0,0
You can change the stride value to see the difference.
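For comparison, the same layer with strides=(1, 1) inserts no zeros at all, so the 1x1 kernel of ones simply reproduces the input (reusing input_layer and layers from the snippet above):

transpose2d_s1 = layers.Conv2DTranspose(1, (1, 1), kernel_initializer='ones',
                                        strides=(1, 1), padding='valid',
                                        use_bias=False)
print(transpose2d_s1(input_layer))
# [[1, 1],
#  [1, 1]]  -> a 2x2 output identical to the input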