I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
    kernel_initializer=tf.variance_scaling_initializer,
    bias_initializer=tf.ones_initializer,
    activity_regularizer=tf.layers.batch_normalization,
)
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
    kernel_initializer=tf.variance_scaling_initializer,
    bias_initializer=tf.ones_initializer,
    activity_regularizer=tf.layers.batch_normalization,
)
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
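tf.layers.batch_normalization should be added as a layer, not passed as a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network. For example, you might want to penalize networks that produce high activations. Batch normalization instead returns a tensor with the same shape as its input, so when the "regularization losses" of the two layers are summed in the AddN op, TensorFlow tries to add tensors of shape [?,32,32,64] and [?,64,64,3], which is exactly the error you see. Drop the activity_regularizer argument and call tf.layers.batch_normalization on the output of each conv2d_transpose instead. The tf.layers source shows activity_regularizer being called on the layer's outputs and its result being added to the losses.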
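To make the shape problem concrete, here is a minimal NumPy sketch. It is only a stand-in for what happens to regularization terms, not TensorFlow's actual implementation; the arrays `h1` and `h2` and the helper `l2_activity_penalty` are hypothetical names chosen for illustration.

```python
import numpy as np

# Stand-ins for the two layer outputs from the question (batch size 2 assumed).
h1 = np.ones((2, 32, 32, 64), dtype=np.float32)
h2 = np.ones((2, 64, 64, 3), dtype=np.float32)


def l2_activity_penalty(activations, scale=1e-4):
    """A typical activity regularizer: maps an activation tensor to a scalar."""
    return scale * np.sum(activations ** 2)


# Scalar penalties from different layers can always be summed.
total_penalty = l2_activity_penalty(h1) + l2_activity_penalty(h2)

# A "regularizer" that returns a full tensor (as batch normalization does)
# produces terms whose shapes differ per layer, so they cannot be added.
try:
    _ = h1 + h2
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

print(np.ndim(total_penalty), shapes_compatible)
```

The scalar penalties add up without trouble, while adding the two full-size tensors fails, which mirrors the AddN error in the question.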
I'm trying to migrate TensorFlow checkpoint weights to PyTorch.
When I extract some weights with cp.load_variable(<CKPT>, <FIELD_NAME>), I get a 4D list ordered as HWCN, for example [1, 1, 512, 1024] which is clearly HWCN.
However, data_format is set to NHWC in all convolution blocks.
So, the question is: why is there a mismatch?
Which should I believe? Is the 4D list from cp.load_variable correct, so that all that is left to do is permute the dimensions?
Thanks!
The weights are not given as HWCN, as the weights do not have any batch dimension (N), otherwise that would apply a different weight for each sample in the batch. The shape is [kernel_height, kernel_width, in_channels, out_channels]. There is no mismatch, because data_format specifies which format the input and output use.
In PyTorch the weight of convolutions is given as [out_channels, in_channels, kernel_height, kernel_width], therefore you only need to permute the dimensions.
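The permutation described above can be sketched with NumPy alone. The array `tf_kernel` below is a hypothetical stand-in for whatever cp.load_variable returns; only its shape matches the example in the question.

```python
import numpy as np

# Hypothetical stand-in for a kernel read from the checkpoint.
# TensorFlow layout: [kernel_height, kernel_width, in_channels, out_channels].
rng = np.random.default_rng(0)
tf_kernel = rng.standard_normal((1, 1, 512, 1024)).astype(np.float32)

# PyTorch layout: [out_channels, in_channels, kernel_height, kernel_width].
torch_kernel = np.transpose(tf_kernel, (3, 2, 0, 1))

print(torch_kernel.shape)  # (1024, 512, 1, 1)
```

The resulting array could then be wrapped in a tensor and copied into the corresponding PyTorch convolution weight. Note that this covers only the kernel layout; the activations still follow NHWC in TensorFlow and NCHW in PyTorch.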
I am training on a dataset in which (some of) the neighboring features exhibit very strong correlations. To help the neural network, I am thinking of adding some 1D convolutions as the first layers. Even though 1D convolutions are mostly used for time series/NLP data, I see no theoretical reason why they cannot be used vector-wise on any type of data.
But I am not able to make keras.layers.Conv1D work, since it's apparently designed for time-series data. A minimal reproducible example is the following:
model = keras.Sequential([
    keras.layers.Input(10,),
    keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1"),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss=losses.categorical_crossentropy, metrics=['accuracy'])
ValueError: Input 0 of layer conv_1 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 10]
In that, I believe the "found ndim=2" corresponds to a tensor of [batch_size, 10] while it expects a tensor of shape [series_length, batch_size, 10] (or some other way around).
My question is: Is there a way to make 1D convolutions work in this situation in keras?
Note 1: this SO question has the same problem, though without elaborating and the accepted answer does not solve the problem.
Note 2: I suppose I can convert each datapoint of my dataset to a 2D tensor of two rows where the second would be just 0's and use Conv2D's, but I would like to avoid that.
Every Conv layer in Keras reserves one dimension for the number of channels. For example, an image has 2 spatial dimensions, but Conv2D needs 3 dimensions (not counting the batch), because the image can have one channel (grayscale) or three (color), for example. The same is true for a 1D signal, which can have any number of channels. You can simply add one dimension to your data. If you have a NumPy array:
data = data[:, np.newaxis, :]
and then set channels_first in the layer:
keras.layers.Conv1D(filters=64, kernel_size=3, activation='relu', name="conv_1", data_format="channels_first")
The Input shape must then include the channel dimension as well, i.e. (1, 10). You can do the same by adding the extra dimension at the end and setting data_format="channels_last".
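As a small illustration of the two layouts, here is a NumPy-only sketch. The dataset size (32 samples of 10 features) is an assumption made up for the example.

```python
import numpy as np

# Hypothetical tabular dataset: 32 samples with 10 features each.
data = np.random.rand(32, 10).astype(np.float32)

# channels_last (the Keras default): (batch, steps, channels)
data_last = data[:, :, np.newaxis]

# channels_first: (batch, channels, steps)
data_first = data[:, np.newaxis, :]

print(data_last.shape, data_first.shape)  # (32, 10, 1) (32, 1, 10)
```

With the channels_last variant, the matching Input shape would be (10, 1). Since Dense acts on the last axis only, the model in the question would probably also need a Flatten (or pooling) layer between the convolution and the dense layers to end up with one prediction per sample.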
I'm trying keras.layers.LSTM.
The following code works.
#!/usr/bin/python3
import tensorflow as tf
import numpy as np
from tensorflow import keras
data = np.array([1, 2, 3]).reshape((1, 3, 1))
x = keras.layers.Input(shape=(3, 1))
y = keras.layers.LSTM(10)(x)
model = keras.Model(inputs=x, outputs=y)
print (model.predict(data))
As shown above, the input data shape is (1, 3, 1), and the actual input shape in the Input layer is (3, 1). I'm a little bit confused about this inconsistency of the dimension.
If I use the following shape in the Input layer, it doesn't work:
x = keras.layers.Input(shape=(1, 3, 1))
The error message is as follows:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 1, 3, 1]
It seems that the rank of the input must be 3, but why should we use a rank-2 shape in the Input layer?
Keras works with "batches" of "samples". Since most models use a variable batch size that you define only when fitting, for convenience you don't need to specify the batch dimension, only the sample dimensions.
That said, when you use shape = (3,1), this is the same as defining batch_shape = (None, 3, 1) or batch_input_shape = (None, 3, 1).
The three options mean:
A variable batch size: None
With samples of shape (3, 1).
It's important to know this distinction especially when you are going to create custom layers, losses or metrics. The actual tensors all have the batch dimension and you should take that into account when making operations with tensors.
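The relation between the data shape and the Input shape can be spelled out with plain NumPy, using the same array as in the question. The variable names are only illustrative.

```python
import numpy as np

data = np.array([1, 2, 3]).reshape((1, 3, 1))

batch_size = data.shape[0]     # 1 sample in this batch
sample_shape = data.shape[1:]  # (3, 1): what goes into Input(shape=...)

# Equivalent full shape with a variable batch size.
batch_shape = (None,) + sample_shape

print(batch_size, sample_shape, batch_shape)  # 1 (3, 1) (None, 3, 1)
```

In other words, the first axis of the data is the batch, and everything after it is the per-sample shape that the Input layer asks for.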
Check out the documentation for tf.keras.Input. The signature is:
tf.keras.Input(
    shape=None,
    batch_size=None,
    name=None,
    dtype=None,
    sparse=False,
    tensor=None,
    **kwargs
)
shape: defines the shape of a single sample, with a variable batch size.
Note that shape does not include the batch dimension. If you need a fixed batch size, pass batch_size explicitly as a parameter.
I am working on a project in which I take images as input to a CNN, extract features, and feed them to an LSTM for training. I am using a 2-layer CNN for feature extraction, taking the features from the fully connected layer and trying to feed them to the LSTM. The problem is that when I feed the FC layer to the LSTM as input, I get an error about wrong dimensions. My FC layer is a tensor with dimensions (128, 1024). I tried to reshape it with tf.reshape(fc, [-1]), which gives me a tensor of dimension (131072,), and it still won't work. Could anyone give me any ideas on how I am supposed to feed the FC output to the LSTM? Below is part of my code and the error I get.
# Convolution Layer with 32 filters and a kernel size of 5
conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
# Convolution Layer with 64 filters and a kernel size of 3
conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
# Max Pooling (down-sampling) with strides of 2 and kernel size of 2
conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
# Flatten the data to a 1-D vector for the fully connected layer
fc1 = tf.contrib.layers.flatten(conv2)
# Fully connected layer (in contrib folder for now)
fc1 = tf.layers.dense(fc1, 1024)
# Apply Dropout (if is_training is False, dropout is not applied)
fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)
s = tf.reshape(fc1, [1])
rnn_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
outputs, states = rnn.static_rnn(rnn_cell, s, dtype=tf.float32)
return tf.matmul(outputs[-1], rnn_weights['out']) + rnn_biases['out']
Here is the error:
ValueError: Cannot reshape a tensor with 131072 elements to shape [1] (1 elements) for 'ConvNet/Reshape' (op: 'Reshape') with input shapes: [128,1024], [1] and with input tensors computed as partial shapes: input[1] = [1].
You have a logical error in how you approach the problem. Collapsing the data to a 1D tensor is not going to solve anything (even if you get it to work correctly).
If you are taking a sequence of images as input, your input tensor should be 5D (batch, sequence_index, x, y, channel) or some permutation of that. conv2d should complain about the extra dimension, so you are probably missing one of them. You should fix that first.
Next, use conv3d and max_pool3d with a window of 1 for the depth (since you don't want the different frames to interact at this stage).
When you are done you should still have a 5D tensor, but the x and y dimensions should be 1 (you should check this, and fix the operation if that's not the case).
The RNN part expects 3D tensors (batch, sequence_index, feature_index). You can use tf.squeeze to remove the size-1 dimensions from your 5D tensor and get this 3D tensor. You shouldn't have to reshape anything.
If you don't use batches, it's OK, but the operations will still expect the dimension to be there (but for you it will be 1). Missing the dimension will cause problems with shapes down the line.
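The squeeze step can be illustrated with NumPy, whose np.squeeze behaves like tf.squeeze for this purpose. The shapes below (batch of 128, 8 frames per sequence, 1024 features) are assumptions made up for the sketch.

```python
import numpy as np

# Hypothetical output of the per-frame CNN:
# (batch, sequence_index, x, y, channel), with x and y already reduced to 1.
features_5d = np.zeros((128, 8, 1, 1, 1024), dtype=np.float32)

# Remove only the two spatial axes, leaving (batch, sequence_index, feature_index).
features_3d = np.squeeze(features_5d, axis=(2, 3))

print(features_3d.shape)  # (128, 8, 1024)
```

Naming the axes explicitly is a deliberate choice here: a bare squeeze would also drop the batch or sequence axis whenever it happens to have size 1, which is the kind of missing dimension the answer warns about.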
Recently, I have been trying to use TensorFlow to implement a CNN+CTC network based on the article Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks.
I feed batches of spectrogram data (shape: (10,120,155,3), batch_size is 10) into 10 convolution layers and 3 fully connected layers. So the output before the CTC layer is 2D data (shape: (10,1024)).
Here is my problem: I want to use the tf.nn.ctc_loss function from the TensorFlow library, but it raises ValueError: Dimension must be 2 but is 3 for 'transpose'(op:'Transpose') with input shapes:[?,1024],[3].
I guess the error is related to the dimensions of my 2D input data. The description of the ctc_loss function on the official TensorFlow site says it requires a 3D input with the shape (batch_size x max_time x num_classes).
So, what is the extra 'num_classes' dimension? And how should I change the shape of my CNN+FC output data?
The fully connected layer should be applied per time step.
It's like applying the same dense layer at every time step in a recurrent neural network.
For the output of a convolution layer, the time step is the width.
So, for example, the output shapes would be:
convolution: (10,120,155,3) = (batch, height, width, channels)
flatten: (10, 155, 120*3) = (batch, max_time, features)
fully connected: (10, 155, 1024) (same dense layer applied per time step)
output layer: (10, 155, num_classes)
This is the shape that ctc_loss expects in TensorFlow when time_major=False.
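The shape bookkeeping above can be checked with a NumPy sketch. The random weights and num_classes = 29 are hypothetical placeholders; a real model would use trained dense layers. The point to note is that the width axis has to be moved next to the batch axis before flattening, since a plain reshape would mix values from different time steps.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 29  # hypothetical label count, including the CTC blank

# Convolution output: (batch, height, width, channels)
conv_out = rng.standard_normal((10, 120, 155, 3)).astype(np.float32)

# Make width the time axis: (batch, width, height, channels)
time_first = np.transpose(conv_out, (0, 2, 1, 3))

# Flatten height and channels into one feature vector per time step.
flat = time_first.reshape(10, 155, 120 * 3)

# The same dense weights are applied at every time step via broadcasting.
w_fc = rng.standard_normal((120 * 3, 1024)).astype(np.float32)
w_out = rng.standard_normal((1024, num_classes)).astype(np.float32)
fc = flat @ w_fc       # (10, 155, 1024)
logits = fc @ w_out    # (10, 155, num_classes)

# For comparison: reshaping without the transpose gives the same shape
# but scrambles which values belong to which time step.
naive = conv_out.reshape(10, 155, 120 * 3)

print(flat.shape, fc.shape, logits.shape)
```

The `logits` array has the batch-major layout (batch_size, max_time, num_classes). If the loss is called with its time-major default instead, the first two axes would need to be swapped.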