CNN features (dimensions) fed to an LSTM in TensorFlow

I am working on a project in which I take images as input to a CNN, extract features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction and taking the features from the fully connected layer to feed to the LSTM. The problem is that when I feed the FC layer to the LSTM as input, I get an error about wrong dimensions. My FC layer is a tensor of shape (128, 1024). I tried reshaping it with tf.reshape(fc, [-1]), which gives me a tensor of shape (131072,), and it still won't work. Could anyone give me any idea of how I am supposed to feed the FC layer to the LSTM? Here is part of my code and the error I get.
# Convolution layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])  # <- this reshape is the line that raises the ValueError below
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
Here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].

You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to run without errors).
If you are taking a sequence of images as input, your input tensor should be 5D: (batch, sequence_index, x, y, channel), or some permutation of that. conv2d should complain about the extra dimension, so you are probably missing one of them. You should fix that first.
Next, use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the operations if that's not the case).
The RNN part expects 3D tensors: (batch, sequence_index, feature_index). You can use tf.squeeze to remove the size-1 dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, that's OK, but the operations will still expect the dimension to be there (it will just be 1 for you). Dropping the dimension will cause shape problems down the line.
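For concreteness, here is a minimal TF1-style sketch of the pipeline described above. The frame count (20), image size (28x28), filter sizes, and hidden size are illustrative assumptions, not values taken from the question:

import tensorflow as tf

# Input: (batch, sequence_index, x, y, channel); 20 frames of 28x28 grayscale images assumed
x = tf.placeholder(tf.float32, [None, 20, 28, 28, 1])

# conv3d/max_pool3d with a window of 1 in the time dimension, so frames don't interact
conv1 = tf.layers.conv3d(x, 32, (1, 5, 5), activation=tf.nn.relu)      # -> (?, 20, 24, 24, 32)
pool1 = tf.layers.max_pooling3d(conv1, (1, 2, 2), (1, 2, 2))           # -> (?, 20, 12, 12, 32)
conv2 = tf.layers.conv3d(pool1, 64, (1, 3, 3), activation=tf.nn.relu)  # -> (?, 20, 10, 10, 64)
pool2 = tf.layers.max_pooling3d(conv2, (1, 2, 2), (1, 2, 2))           # -> (?, 20, 5, 5, 64)
pool3 = tf.layers.max_pooling3d(pool2, (1, 5, 5), (1, 5, 5))           # -> (?, 20, 1, 1, 64)

# x and y are now 1; squeeze them to get the 3D (batch, sequence_index, feature_index) tensor
features = tf.squeeze(pool3, axis=[2, 3])                              # -> (?, 20, 64)

cell = tf.nn.rnn_cell.BasicLSTMCell(128, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(cell, features, dtype=tf.float32)

dynamic_rnn is used here rather than static_rnn because it accepts the 3D tensor directly instead of a Python list of per-step tensors.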

Related

Tensorflow Keras output layer shape weird error

I am fairly new to TF, Keras and ML in general.
I am trying to implement a very simple MLP with an input shape of (batch_size, 3, 2) and an output shape of (batch_size, 3); that is (if I got it right): for every 3x2 feature matrix, there is a corresponding 3-value label array.
Here is how I create the model:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, tf.keras.activations.relu, input_shape=(3, 2)),
    tf.keras.layers.Dense(3)
])
and these are the X and y shapes:
X_train.shape,y_train.shape
TensorShape([64,3,2]),TensorShape([64,3])
On model.fit I am facing a weird error I cannot understand:
ValueError: Dimensions must be equal, but are 3 and 32 for ... with input shapes: [32,3,3] and [32,3]
I have no clue what's going on. I understand the batch size is 32, but where does that [32,3,3] come from?
Moreover, if I lower the number of samples in X_train and y_train from the original 64 to, say, shapes (19,3,2) and (19,3), I get the following error instead:
InvalidArgumentError: required broadcastable shapes at loc(unknown)
What's even weirder for me is that if I specify a single unit for the output (last) layer instead of 3, like this:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, tf.keras.activations.relu, input_shape=(3, 2)),
    tf.keras.layers.Dense(1)
])
then model.fit works, but the predictions have shape (1, 3, 1) instead of my expected (3,).
I am very confused.
Whenever you are unsure how data travels through your model, use model.summary() to see the details of what happens to the shape of the data in each layer.
In this case, the input is a 2D array (per sample) and the output is a 1D array, yet you only used Dense layers. Dense layers cannot flatten 2D features by themselves: they only transform the last axis. For example, you cannot feed an image directly to a Dense layer and get a flat output; you should use other layers such as Conv2D, or Flatten your input (make it 1D) before feeding it to the Dense layer. Otherwise the extra dimension carries through to the output.
Inference: if your input dimension and output dimension differ, somewhere in your model the shape needs to change. The most common ways to do this are a Flatten layer, GlobalAveragePooling, and so on.
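To see where the [32,3,3] comes from, note that a Dense layer only transforms the last axis. A quick summary of the question's model (a sketch, with the missing parenthesis fixed) makes this visible:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(50, activation='relu', input_shape=(3, 2)),
    tf.keras.layers.Dense(3)
])
model.summary()
# dense (Dense)    output shape: (None, 3, 50)
# dense_1 (Dense)  output shape: (None, 3, 3)  <- 3D predictions

With the default batch size of 32, the loss is then asked to match predictions of shape [32, 3, 3] against labels of shape [32, 3], which is exactly the reported error.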
When you want a flat output from Dense layers, the input should be flattened first. There are two ways to deal with this:
Way 1: add a Flatten layer as the first layer of your model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten(input_shape=(3, 2)))
model.add(Dense(50, activation='relu'))
model.add(Dense(3))
Way 2: Converting the 2D array to 1D before passing the inputs to your model:
X_train = tf.reshape(X_train, shape=(-1, 6))  # keep the batch dimension, flatten each 3x2 sample to 6 values
Then change the input shape of the first layer accordingly:
model.add(Dense(50, activation='relu', input_shape=(6,)))
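With either fix in place, the model's output shape becomes (batch_size, 3), matching y_train, so model.fit should run and predictions should come out with shape (batch_size, 3) as expected.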

1D convolution on flat, one-dimensional data (i.e. no time series)

I am training on a dataset in which (some of) the neighboring features exhibit very strong correlations. To help the neural network, I am thinking of adding some 1D convolutions as the first layers. Even though 1D convolutions are mostly used for time-series/NLP data, I see no theoretical reason why they cannot be applied feature-wise to any type of data.
But I am not able to make keras.layers.Conv1D work, since it is apparently designed for time-series data. A minimal reproducible example is the following:
model = keras.Sequential([
keras.layers.Input(10,),
keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1"),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(128, activation='relu'),
keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss=losses.categorical_crossentropy, metrics=['accuracy'])
ValueError: Input 0 of layer conv_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 10]
Here, I believe the "found ndim=2" corresponds to a tensor of shape [batch_size, 10], while it expects a tensor of shape [series_length, batch_size, 10] (or some other ordering).
My question is: Is there a way to make 1D convolutions work in this situation in keras?
Note 1: this SO question has the same problem, though without much elaboration, and the accepted answer does not solve it.
Note 2: I suppose I could convert each datapoint of my dataset into a 2D tensor of two rows, where the second is just 0's, and use Conv2D, but I would like to avoid that.
All Conv layers in Keras have one dimension reserved for the number of channels. For example, an image has 2 dimensions, but Conv2D needs 3 (not counting the batch dimension), simply because the image can have one channel (grayscale) or, say, 3 (color). The same is true for a 1D signal, which can have any number of channels. So you can simply add one dimension to your data. If you have a NumPy array:
data = data[:, np.newaxis, :]
and set channels_first:
keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1", data_format="channels_first")
You can do the same by adding the extra dimension at the end and setting data_format="channels_last".
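Putting the channels_last variant together, a minimal working version of the question's model might look like this sketch (the random data is a stand-in for the real dataset; note the added Flatten before the Dense head, which the original model would also need, since Conv1D outputs a 3D tensor):

import numpy as np
from tensorflow import keras

# Hypothetical flat data: 100 samples of 10 features
data = np.random.rand(100, 10).astype("float32")
data = data[:, :, np.newaxis]  # add a trailing channel dimension -> (100, 10, 1)

model = keras.Sequential([
    keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu',
                        input_shape=(10, 1), name="conv_1"),  # channels_last is the default
    keras.layers.Flatten(),  # Conv1D output is (batch, 8, 64); flatten it for the Dense head
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])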

output dimension of reshape layer

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential()
model.add(layers.Dense(7*7*256, use_bias=False, input_shape=(100,)))
model.add(layers.BatchNormalization())
model.add(layers.LeakyReLU())
model.add(layers.Reshape((7, 7, 256)))
The Dense layer takes input of dimension 1*100 and has 7*7*256 units. The Reshape layer then takes that 1*(7*7*256) output as input, and what is its output? I mean, what does (7, 7, 256) mean?
Is it an image of dimension 7*7, if we give an image of 1*100 as input? What is it?
I am sorry, I know I may have understood this completely wrong, so I want to understand it properly.
Here your model will take an input of shape (*, 100); the first Dense layer will output a shape of (*, 7*7*256), and finally the Reshape layer will reshape that output to an array of shape (*, 7, 7, 256), with * being your batch_size.
So basically, your 'image' of shape (*, 100) will be reshaped to an array of shape (*, 7, 7, 256).
Hope this helps.
This refers to Google's TensorFlow MNIST DCGAN tutorial.
The first Dense layer at the input is configured to have 7*7*256 units, and the tutorial gives no explanation for this value.
My initial impression about this is as follows:
Remember that we want a 28x28 grayscale image as output. That means the required output shape is (None, 28, 28, 1), where the first entity is the batch size (None if a single image is required).
Now note that a Conv2DTranspose layer with strides=(2,2) essentially upsamples the input shape by a factor of 2: it doubles it. Secondly, the number of filters of the Conv2DTranspose layer becomes the number of channels; if I want the output to be grayscale, the number of filters should be one. Thus, if I want (None, 28, 28, 1) at the output of a Conv2DTranspose layer, the shape of its input should be (None, 14, 14, x). (x can be any value at the input, since the number of output channels is decided by the current layer.)
Suppose I put one more Conv2DTranspose layer with strides=(2,2) before this one; obviously the input to that layer should be (None, 7, 7, x), where x is its number of input channels.
In general, if a batch of images of size (h, w) is input to a Conv2DTranspose layer with strides=(2,2), its output will have shape (batch_size, 2*h, 2*w, no_of_filters).
The tutorial further puts one more Conv2DTranspose layer [but with strides=(1,1), so it has no upsampling effect] between these and the Dense layer at the bottom. These layers do no upsampling, so the shape remains 7x7; 7x7 is the image shape here. The first Dense layer's output is flat, so if it has 7*7*x units, we can always reshape it to get a (7, 7, x) image.
This is the theory behind the 7*7*x units of the first Dense layer. The value 256 they used is arbitrary; they might have arrived at it empirically or intuitively, I guess.
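The whole chain of shapes is easy to verify by building the front of the generator and printing the summary. This is a sketch following the tutorial's layer sizes (the BatchNormalization/LeakyReLU layers between the transposed convolutions are omitted for brevity):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(7*7*256, use_bias=False, input_shape=(100,)),  # (None, 12544)
    layers.Reshape((7, 7, 256)),                                # (None, 7, 7, 256)
    layers.Conv2DTranspose(128, 5, strides=1, padding='same'),  # (None, 7, 7, 128)
    layers.Conv2DTranspose(64, 5, strides=2, padding='same'),   # (None, 14, 14, 64)
    layers.Conv2DTranspose(1, 5, strides=2, padding='same',
                           activation='tanh')                   # (None, 28, 28, 1)
])
model.summary()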

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization)
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization)
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
tf.layers.batch_normalization should be added as a layer, not as a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network; for example, you might want to penalize networks that produce high activations. You can see how activity_regularizer is called on the outputs and its result added to the loss here.
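A sketch of the fix: call tf.layers.batch_normalization on each layer's output instead of passing it as activity_regularizer. The input shape and the training flag below are hypothetical; the ones_initializer bias from the question is dropped, since with batch normalization a bias is usually redundant:

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 16, 16, 128])  # hypothetical input shape
training = tf.placeholder(tf.bool)  # True during training, False at inference

h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer)
h1 = tf.nn.relu(tf.layers.batch_normalization(h1, training=training))
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer)
h2 = tf.layers.batch_normalization(h2, training=training)

Remember that tf.layers.batch_normalization creates update ops for its moving statistics (collected in tf.GraphKeys.UPDATE_OPS), which you must run alongside your train op.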

How to use tf.nn.ctc_loss in cnn+ctc network

Recently, I tried to use TensorFlow to implement a CNN+CTC network based on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks.
I feed batches of spectrogram data (shape (10, 120, 155, 3); batch_size is 10) into 10 convolution layers and 3 fully connected layers, so the output before the CTC layer is 2D (shape (10, 1024)).
Here is my problem: I want to use the tf.nn.ctc_loss function from the TensorFlow library, but it generates: ValueError: Dimension must be 2 but is 3 for 'transpose' (op: 'Transpose') with input shapes: [?,1024],[3].
I guess the error is related to the dimensionality of my 2D input data. The description of ctc_loss on the official TensorFlow site says it requires a 3D input with shape (batch_size x max_time x num_classes).
So what is the extra 'num_classes' dimension? What should I change the shape of my CNN+FC output to?
The fully connected layer should be applied per time step, just like applying the same Dense layer at every time step of a recurrent neural network.
For the output of the convolution layers, the time dimension is the width.
So, for example, the shapes would be:
convolution: (10, 120, 155, 3) = (batch, height, width, channels)
flatten: (10, 155, 120*3) = (batch, max_time, features)
fully connected: (10, 155, 1024) (the same dense layer applied per time step)
output: (10, 155, num_classes)
This is the shape expected by ctc_loss in TensorFlow.
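A sketch with the question's shapes (num_classes is a hypothetical alphabet size; tf.layers.dense applied to a 3D tensor uses the same weights at every time step, which is exactly the "same dense layer per time step" described above):

import tensorflow as tf

height, width, channels = 120, 155, 3
num_classes = 29  # hypothetical: alphabet size + CTC blank

conv_out = tf.placeholder(tf.float32, [None, height, width, channels])

# Treat width as time: (batch, height, width, channels) -> (batch, max_time, features)
x = tf.transpose(conv_out, [0, 2, 1, 3])              # (batch, 155, 120, 3)
x = tf.reshape(x, [-1, width, height * channels])     # (batch, 155, 360)

fc = tf.layers.dense(x, 1024, activation=tf.nn.relu)  # (batch, 155, 1024)
logits = tf.layers.dense(fc, num_classes)             # (batch, 155, num_classes)

# Note: tf.nn.ctc_loss defaults to time-major inputs, so either transpose
# logits to (max_time, batch, num_classes) or pass time_major=False.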