I'm currently building a GAN with Tensorflow 2 and Keras and noticed a lot of the existing Neural Networks for the generator and discriminator use Conv2D and Conv2DTranspose in Keras.
I'm struggling to find something that functionally explains the difference between the two. Can anyone explain what these two different options for making a NN in Keras mean?
Conv2D applies a convolution to the input. Conv2DTranspose, by contrast, applies a transposed convolution (often loosely called a "deconvolution") to the input.
For example:
import tensorflow as tf

x = tf.random.uniform((1, 3, 3, 1))          # a batch of one 3x3 single-channel image
conv2d = tf.keras.layers.Conv2D(1, 2)(x)     # 1 filter, 2x2 kernel, stride 1, 'valid' padding
print(conv2d.shape)
# (1, 2, 2, 1)
conv2d_transpose = tf.keras.layers.Conv2DTranspose(1, 2)(x)
print(conv2d_transpose.shape)
# (1, 4, 4, 1)
Conv2D is mainly used when you want to detect features, e.g., in the encoder part of an autoencoder model, and it may shrink your input's spatial dimensions.
Conversely, Conv2DTranspose is used for creating features, for example, in the decoder part of an autoencoder model for constructing an image. As you can see in the code above, it makes the input's spatial dimensions larger.
For example:
import numpy as np

kernel = tf.constant_initializer(1.)   # every kernel weight is 1, bias stays 0
x = tf.ones((1, 3, 3, 1))
conv = tf.keras.layers.Conv2D(1, 2, kernel_initializer=kernel)
y = tf.ones((1, 2, 2, 1))
de_conv = tf.keras.layers.Conv2DTranspose(1, 2, kernel_initializer=kernel)
conv_output = conv(x)
print("Convolution\n---------")
print("input shape:",x.shape)
print("output shape:",conv_output.shape)
print("input tensor:",np.squeeze(x.numpy()).tolist())
print("output tensor:",np.around(np.squeeze(conv_output.numpy())).tolist())
'''
Convolution
---------
input shape: (1, 3, 3, 1)
output shape: (1, 2, 2, 1)
input tensor: [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
output tensor: [[4.0, 4.0], [4.0, 4.0]]
'''
de_conv_output = de_conv(y)
print("De-Convolution\n------------")
print("input shape:",y.shape)
print("output shape:",de_conv_output.shape)
print("input tensor:",np.squeeze(y.numpy()).tolist())
print("output tensor:",np.around(np.squeeze(de_conv_output.numpy())).tolist())
'''
De-Convolution
------------
input shape: (1, 2, 2, 1)
output shape: (1, 3, 3, 1)
input tensor: [[1.0, 1.0], [1.0, 1.0]]
output tensor: [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]
'''
To sum up:

Conv2D:
- May shrink your input
- For detecting features

Conv2DTranspose:
- Enlarges your input
- For constructing features
And if you want to know how Conv2DTranspose enlarges the input: each input element is multiplied by the whole kernel and written ("stamped") into the larger output, and overlapping contributions are summed.
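Here is a minimal NumPy-only sketch of that mechanism (my own illustration, not the Keras implementation), for stride 1 and the default 'valid' padding; it reproduces the 3x3 output of the all-ones example above:
import numpy as np

# Each input pixel "stamps" a copy of the kernel onto the output, scaled by
# that pixel's value; overlapping stamps are summed.
def conv2d_transpose_manual(x, kernel, stride=1):
    in_h, in_w = x.shape
    k_h, k_w = kernel.shape
    out = np.zeros((stride * (in_h - 1) + k_h, stride * (in_w - 1) + k_w))
    for i in range(in_h):
        for j in range(in_w):
            out[i * stride:i * stride + k_h, j * stride:j * stride + k_w] += x[i, j] * kernel
    return out

x = np.ones((2, 2))
kernel = np.ones((2, 2))
print(conv2d_transpose_manual(x, kernel))
'''
[[1. 2. 1.]
 [2. 4. 2.]
 [1. 2. 1.]]
'''
Increasing the stride spaces the stamps further apart, which is why Conv2DTranspose with strides=2 roughly doubles the spatial size.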
Related
I'm trying to get the mean IoU of my binary semantic segmentation model using tensorflow.keras.metrics.MeanIoU. However, the output shows that the MeanIoU is 1.0, which should not be correct because the loss (binary crossentropy) is decreasing during training. Does anyone have an idea how I can get the right value?
Here is what I have tried so far.
import numpy as np
from tensorflow.keras.metrics import MeanIoU
#Test generator using validation data.
test_image_batch, test_mask_batch = val_img_gen.__next__()
print(test_mask_batch.shape) #(32, 256, 256, 3)
#Convert categorical to integer for visualization and IoU calculation
test_mask_batch_argmax = np.argmax(test_mask_batch, axis=3)
test_pred_batch = (model.predict(test_image_batch)> 0.5).astype(np.uint8)
print(test_pred_batch.shape) # (32, 256, 256, 1)
test_pred_batch_argmax = np.argmax(test_pred_batch, axis=3)
print(test_mask_batch.shape) #(32, 256, 256, 3)
n_classes = 2
IOU_keras = MeanIoU(num_classes=n_classes)
IOU_keras.update_state(test_pred_batch_argmax, test_mask_batch_argmax)
print("Mean IoU =", IOU_keras.result().numpy())
#output -- Mean IoU = 1.0
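For reference, here is a tiny self-contained sketch (synthetic data, not the questioner's generator) of how MeanIoU is usually fed for a binary, single-channel model: integer ground-truth labels and thresholded predictions, with no argmax on the 1-channel output (argmax over an axis of size 1 always returns 0):
import numpy as np
from tensorflow.keras.metrics import MeanIoU

# Synthetic stand-ins for one image: an integer ground-truth mask and a
# sigmoid probability map of the same shape.
y_true = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 0]], dtype=np.uint8)
y_prob = np.array([[0.1, 0.8, 0.9, 0.6],
                   [0.2, 0.4, 0.7, 0.3]], dtype=np.float32)
y_pred = (y_prob > 0.5).astype(np.uint8)   # threshold the single-channel output
iou = MeanIoU(num_classes=2)
iou.update_state(y_true, y_pred)           # (ground truth, prediction)
print("Mean IoU =", iou.result().numpy())  # 0.6 for this toy example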
I am wondering whether it is possible to end up with the same tensor after propagating it through a convolutional and then deconvolutional filter. For example:
import numpy as np
import tensorflow as tf

random_image = np.random.rand(1, 6, 6, 3)
input_image = tf.placeholder(shape=[1, 6, 6, 3], dtype=tf.float32)
conv = tf.layers.conv2d(input_image, filters=6, kernel_size=[3, 3], strides=(1, 1), data_format="channels_last")
deconv = tf.layers.conv2d_transpose(conv, filters=3, kernel_size=[3, 3], strides=(1, 1), data_format="channels_last")
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(random_image)
# Get an output which will be same as:
print(sess.run(deconv, feed_dict={input_image: random_image}))
In other words, if the generated random_image vector is, for example, [1,2,3,4,5], I want the deconv output after convolution and deconvolution to be [1,2,3,4,5] as well.
However, I am not able to get it to work.
Looking forward to your answers!
It's possible to get some degree of visual similarity, for example by using VarianceScaling initialization, or even a completely custom initializer. But a transposed convolution isn't mathematically a deconvolution (the inverse of a convolution), so you can't get exact equality with conv2d_transpose.
Take a look at Why isn't this Conv2d_Transpose / deconv2d returning the original input in tensorflow?
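To make that point concrete, here is a small 1-D NumPy sketch (my own, and it even reuses the same weights for both directions, which is already the most favourable case; in your code the two layers have independent weights): with stride 1 and no padding, a convolution is multiplication by a matrix W, and the transposed convolution is multiplication by W.T. Recovering the input would require W.T @ W to be the identity, which an arbitrary kernel does not give you.
import numpy as np

x = np.array([1., 2., 3., 4., 5.])
k = np.array([1., 0.5])                     # arbitrary kernel of length 2

# Build the 4x5 matrix W of the 'valid' 1-D convolution with kernel k.
W = np.zeros((len(x) - len(k) + 1, len(x)))
for i in range(W.shape[0]):
    W[i, i:i + len(k)] = k

conv = W @ x                                # forward convolution
deconv = W.T @ conv                         # transposed convolution of the result

print(conv)     # [2.  3.5 5.  6.5]
print(deconv)   # [2. 4.5 6.75 9. 3.25] -- not [1. 2. 3. 4. 5.]
print(W.T @ W)  # not the identity matrix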
import numpy as np
import tensorflow as tf

sequence_size = [4, 2, 3]  ### batch_size: 4, num_steps: 2, embedding_size: 3
num_units = 2
dummy_sequences = np.array([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                            [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]])
fw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
bw_cell = tf.nn.rnn_cell.BasicLSTMCell(num_units)
inputs = tf.placeholder(dtype=tf.float64, shape=sequence_size)
encoder_outputs, encoder_state = tf.nn.bidirectional_dynamic_rnn(cell_fw=fw_cell, cell_bw=bw_cell,
                                                                 inputs=inputs, sequence_length=sequence_size,
                                                                 dtype=tf.float64)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    output, state = sess.run([encoder_outputs, encoder_state], feed_dict={inputs: dummy_sequences})
    print(output, state)
I coded an example to test the usage of RNNs in TensorFlow, and I encountered a problem with the sequence_length parameter. If I remove sequence_length, the code runs correctly. So what is the correct way to set sequence_length? It confuses me a lot because I have already set sequence_length in the order of batch_size, num_steps and embedding_size. Thanks a lot for your answers.
And the error is as follows:
ValueError: Dimension 0 in both shapes must be equal, but are 3 and 4 for 'bidirectional_rnn/fw/fw/while/Select' (op: 'Select') with input shapes: [3], [?,2], [4,2].
The sequence_length parameter expects the number of timesteps each sample should be processed for, i.e. num_steps in your example above. sequence_length must be a vector of length batch_size.
It is not the shape of the input. Suppose your input is as in dummy_sequences, i.e. of shape [4,2,3]: you have 4 samples, each 2 timesteps long, with each timestep represented by 3 values.
Hence your sequence_length is [2,2,2,2]. In case all samples are of the same length, you can omit this parameter. Otherwise, the network outputs a zero vector for every timestep of a sample beyond that sample's given length.
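For completeness, a minimal sketch of the corrected call, reusing fw_cell, bw_cell and inputs from your code; only the sequence_length argument changes:
seq_lengths = [2, 2, 2, 2]   # one length per sample in the batch
encoder_outputs, encoder_state = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=fw_cell, cell_bw=bw_cell,
    inputs=inputs,
    sequence_length=seq_lengths,   # vector of length batch_size, not the input shape
    dtype=tf.float64)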
Predictions are only successful when providing a single instance in instance.json.
Test 1: Contents of instance.json:
{"serving_input": [20.0, 0.0, 1.0 ... 0.16474569041197143, 0.04138248072194471], "prediction_id": 0, "keep_prob": 1.0}
Prediction (same output for local and online prediction)
gcloud ml-engine local predict --model-dir=./model_dir --json-instances=instances.json
Output:
SERVING_OUTPUT ARGMAX PREDICTION_ID SCORES TOP_K
[-340.6920166015625, -1153.0877685546875] 0 0 [1.0, 0.0] [1.0, 0.0]
Test 2: Contents of instance.json:
{"serving_input": [20.0, 0.0, 1.0 ... 0.16474569041197143, 0.04138248072194471], "prediction_id": 0, "keep_prob": 1.0}
{"serving_input": [21.0, 2.0, 3.0 ... 3.14159265359, 0.04138248072194471], "prediction_id": 1, "keep_prob": 1.0}
Output:
.. Incompatible shapes: [2] vs. [2,108] .. (_arg_keep_prob_0_1, Model/dropout/random_uniform)
Whereas 108 is the size of the first hidden layer (net_dim=[2015,108,2]). (The dropout layer is built with tf.nn.dropout, hence the keep_prob=1.0.)
Exporting code:
probabilities = tf.nn.softmax(self.out_layer)
top_k, _ = tf.nn.top_k(probabilities, self.network_dim[-1])
prediction_signature = (
    tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'serving_input': self.x, 'keep_prob': self.keep_prob,
                'prediction_id': self.prediction_id_in},
        outputs={'serving_output': self.out_layer, 'argmax': tf.argmax(self.out_layer, 1),
                 'prediction_id': self.prediction_id_out, 'scores': probabilities, 'top_k': top_k}))

builder.add_meta_graph_and_variables(
    sess,
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            prediction_signature
    },
    main_op=tf.saved_model.main_op.main_op())

builder.save()
How can I format the instance.json to perform a batched prediction (a prediction with multiple input instances)?
The problem is not in the JSON. Check how you are using self.x.
I think your code is assuming that it's a 1-D array, when you should treat it as a tensor of shape [?, 108].
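A minimal sketch of the general idea (my own, with a hypothetical input_dim; use whatever per-instance feature size self.x actually carries): declare the serving input with an explicit, unspecified batch dimension so that each line of instances.json becomes one row of a batch.
import tensorflow as tf

input_dim = 2015   # hypothetical; the per-instance feature length of serving_input
# The leading None is the batch dimension: two JSON instances then arrive as a
# [2, input_dim] tensor instead of a flat 1-D array.
x = tf.placeholder(tf.float32, shape=[None, input_dim], name='serving_input')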
I'm trying to set up CIFAR10's tf-slim model to take inputs with a dynamic batch size, height and width and a single channel, i.e. monochromatic images of different sizes. Given that all dimensions except the channel size are dynamic, the output shape of slim.flatten is (?, ?). Is there any way to circumvent this? I'm trying to adapt CIFAR10 to TF's DeepDream tutorial, which uses InceptionV3 with an unspecified input shape.
I'm assuming this happens because CIFAR10 is not fully convolutional.
import tensorflow as tf
slim = tf.contrib.slim
images = tf.placeholder(tf.float32, shape=(None, None, None, 1))
NUM_CLASSES = 18
scope = 'CifarNet'
with tf.variable_scope(scope, 'CifarNet', [images, NUM_CLASSES]):
    net = slim.conv2d(images, 64, [5, 5], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], 2, scope='pool1')
    net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm1')
    net = slim.conv2d(net, 64, [5, 5], scope='conv2')
    net = tf.nn.lrn(net, 4, bias=1.0, alpha=0.001/9.0, beta=0.75, name='norm2')
    net = slim.max_pool2d(net, [2, 2], 2, scope='pool2')
    net = slim.flatten(net)
    net = slim.fully_connected(net, 384, scope='fc3')
ValueError: The last dimension of the inputs to Dense should be defined. Found None.
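One common workaround (my suggestion, not from the question) is to make the head fully convolutional, e.g. by replacing slim.flatten with global average pooling, so the last dimension stays defined (64 channels after pool2) regardless of the input height and width. Note that this changes the architecture, so any pretrained CIFAR-10 weights for fc3 would no longer match. A sketch of the last lines inside the variable_scope block:
net = tf.reduce_mean(net, axis=[1, 2], name='global_pool')   # shape (?, 64): spatial dims averaged away
net = slim.fully_connected(net, 384, scope='fc3')
net = slim.fully_connected(net, NUM_CLASSES, activation_fn=None, scope='logits')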