Deconvolutions/Transpose Convolutions with TensorFlow

I am attempting to use tf.nn.conv3d_transpose, however, I am getting an error indicating that my filter and output shape is not compatible.
I have a tensor of size [1,16,16,4,192]
I am attempting to use a filter of [1,1,1,192,192]
I believe that the output shape would be [1,16,16,4,192]
I am using "same" padding and a stride of 1.
Eventually, I want to have an output shape of [1,32,32,7,"does not matter"], but I am attempting to get a simple case to work first.
Since these tensors are compatible in a regular convolution, I believed that the opposite, a deconvolution, would also be possible.
Why is it not possible to perform a deconvolution on these tensors? Could I get an example of a valid filter size and output shape for a deconvolution on a tensor of shape [1,16,16,4,192]?
Thank you.

I have a tensor of size [1,16,16,4,192]
I am attempting to use a filter of [1,1,1,192,192]
I believe that the output shape would be [1,16,16,4,192]
I am using "same" padding and a stride of 1.
Yes, the output shape will be [1,16,16,4,192].
Here is a simple example showing that the dimensions are compatible:
import tensorflow as tf
# Input: [batch, depth, height, width, in_channels]
i = tf.Variable(tf.constant(1., shape=[1, 16, 16, 4, 192]))
# Filter: [depth, height, width, out_channels, in_channels]
w = tf.Variable(tf.constant(1., shape=[1, 1, 1, 192, 192]))
# padding defaults to 'SAME'
o = tf.nn.conv3d_transpose(i, w, [1, 16, 16, 4, 192], strides=[1, 1, 1, 1, 1])
print(o.get_shape())  # (1, 16, 16, 4, 192)
The problem must lie somewhere else in your implementation, not in the dimensions.
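Toward the eventual [1, 32, 32, 7, ...] goal, here is a hedged sketch (the kernel sizes and the 96 output channels are arbitrary choices, not taken from the question). With 'VALID' padding, each transposed dimension grows as (in - 1) * stride + kernel, so a [2, 2, 1] kernel at stride 2 maps (16, 16, 4) to (32, 32, 7):
w_up = tf.Variable(tf.constant(1., shape=[2, 2, 1, 96, 192]))  # 96 channels: arbitrary
o_up = tf.nn.conv3d_transpose(i, w_up, [1, 32, 32, 7, 96],
                              strides=[1, 2, 2, 2, 1], padding='VALID')
print(o_up.get_shape())  # (1, 32, 32, 7, 96)
(With 'SAME' padding the output depth would be 4 * 2 = 8, so a depth of exactly 7 would require slicing the result.)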

Related

Different behavior of sequential API and functional API for tensorflow embedding

When I tried applying the same simple embedding function using the Sequential API and the Functional API in TensorFlow, I saw different results.
The result is as follows:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
inputs = np.random.randint(0, 99, [32, 100, 1])
myLayer = layers.Embedding(input_dim=100, output_dim=8)
# Sequential API
sm = keras.Sequential()
sm.add(myLayer)
sm_out = sm(inputs)
sm_out.shape # Shape of sm_out is: TensorShape([32, 100, 8])
# Functional API
fm_out = myLayer(inputs)
fm_out.shape # Shape of fm_out is: TensorShape([32, 100, 1, 8])
Is it intended or a bug?
First of all, your second call is not a functional API call. You need to wrap your layer output (with a tf.keras.layers.Input) in a tf.keras.models.Model for this to be a functional API call.
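For illustration, a minimal sketch of a genuine functional-API version (it declares a trailing-1-free per-sample shape, in line with the advice further down, and squeezes the input to match):
inp = keras.Input(shape=(100,), dtype="int32")
out = myLayer(inp)
fm = keras.models.Model(inputs=inp, outputs=out)
fm_out = fm(inputs.squeeze(-1))
fm_out.shape  # TensorShape([32, 100, 8])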
Secondly, when you're calling the sequential model, it is smart enough to detect that the last dimension is 1 and ignore it when looking up embeddings (I'm not sure where exactly this is handled; maybe someone else can point to it). So when you pass in a tensor of [32, 100, 1], what the embedding layer really sees is a [32, 100] sized array. This, after the lookup, gets converted to a [32, 100, 8] sized tensor.
In your second call, when calling the layer directly, it doesn't do this. So it simply converts the [32, 100, 1] sized input to a [32, 100, 1, 8] sized output.
You can get the same result from both these methods if you set your input shape to [32, 100] or [32, 100, 2] (any last dimension != 1).
I guess the lesson here is to always use the input_shape argument (to the first layer of the Sequential model) to prevent such unexpected behaviors, as in the sketch below.
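A minimal sketch of that advice, reusing the names from the question (declaring the per-sample shape up front):
sm2 = keras.Sequential()
sm2.add(layers.Embedding(input_dim=100, output_dim=8, input_shape=(100,)))
sm2(inputs.squeeze(-1)).shape  # TensorShape([32, 100, 8])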

The input dimension of the LSTM layer in Keras

I'm trying keras.layers.LSTM.
The following code works.
#!/usr/bin/python3
import tensorflow as tf
import numpy as np
from tensorflow import keras
data = np.array([1, 2, 3]).reshape((1, 3, 1))
x = keras.layers.Input(shape=(3, 1))
y = keras.layers.LSTM(10)(x)
model = keras.Model(inputs=x, outputs=y)
print(model.predict(data))
As shown above, the input data shape is (1, 3, 1), and the actual input shape in the Input layer is (3, 1). I'm a little bit confused about this inconsistency of the dimension.
If I use the following shape in the Input layer, it doesn't work:
x = keras.layers.Input(shape=(1, 3, 1))
The error message is as follows:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 1, 3, 1]
It seems that the rank of the input must be 3, but why should we use a rank-2 shape in the Input layer?
Keras works with "batches" of "samples". Since most models use variable batch sizes that you define only when fitting, for convenience you don't need to care about the batch dimension, only about the sample dimension.
That said, when you use shape = (3,1), this is the same as defining batch_shape = (None, 3, 1) or batch_input_shape = (None, 3, 1).
The three options mean:
A variable batch size: None
With samples of shape (3, 1).
It's important to know this distinction especially when you are going to create custom layers, losses or metrics. The actual tensors all have the batch dimension and you should take that into account when making operations with tensors.
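A quick way to see the batch dimension being prepended (a minimal check, assuming tf.keras):
import tensorflow as tf
from tensorflow import keras
x = keras.Input(shape=(3, 1))
print(x.shape)  # (None, 3, 1): the batch dimension is added for you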
Check out the documentation for tf.keras.Input. The signature is as follows:
tf.keras.Input(
    shape=None,
    batch_size=None,
    name=None,
    dtype=None,
    sparse=False,
    tensor=None,
    **kwargs
)
shape: defines the shape of a single sample, with variable batch size.
Notice that shape describes a single sample and does not include the batch dimension; if you need a fixed batch size, pass batch_size as a separate argument.

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization)
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization)
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
tf.layers.batch_normalization should be added as a layer, not a regularizer. activity_regularizer is a function that takes activity (layer's output) and produces an extra loss term that is added to the overall loss term of the whole network. For example, you might want to penalize networks that produce high activation. You can see how activity_regularizer is called on the outputs and its result added to the loss here.
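In other words, apply batch normalization as its own layer on each output. A minimal sketch of that pattern in the same tf.layers style (is_training is a placeholder or Python bool you would supply):
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME')
h1 = tf.layers.batch_normalization(h1, training=is_training)
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME')
h2 = tf.layers.batch_normalization(h2, training=is_training)
# Remember to run the ops in tf.GraphKeys.UPDATE_OPS alongside the train op
# so the batch-norm moving statistics get updated.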

How is the input tensor for TensorFlow's tf.nn.dynamic_rnn operator structured?

I am trying to write a language model using word embeddings and recurrent neural networks in TensorFlow 0.9.0 using the tf.nn.dynamic_rnn graph operation, but I don't understand how the input tensor is structured.
Let's say I have a corpus of n words. I embed each word in a vector of length e, and I want my RNN to unroll to t time steps. Assuming I use the default time_major = False parameter, what shape would my input tensor [batch_size, max_time, input_size] have?
Maybe a specific tiny example will make this question clearer. Say I have a corpus consisting of n=8 words that looks like this.
1, 2, 3, 3, 2, 1, 1, 2
Say I embed it in a vector of size e=3 with the embeddings 1 -> [10, 10, 10], 2 -> [20, 20, 20], and 3 -> [30, 30, 30], what would my input tensor look like?
I've read the TensorFlow Recurrent Neural Network tutorial, but that doesn't use tf.nn.dynamic_rnn. I've also read the documentation for tf.nn.dynamic_rnn, but find it confusing. In particular I'm not sure what "max_time" and "input_size" mean here.
Can anyone give the shape of the input tensor in terms of n, t, and e, and/or an example of what that tensor would look like initialized with data from the small corpus I describe?
TensorFlow 0.9.0, Python 3.5.1, OS X 10.11.5
In your case, it looks like batch_size = 1, since you're looking at a single example. So max_time is n=8 and input_size is the input depth, in your case e=3. You would therefore want to construct an input tensor shaped [1, 8, 3]. It's batch-major, so the first dimension (the batch dimension) is 1. If, say, you had another input at the same time with n=6 words, you would pad this second example to 8 words (with zeros for the last 2 word embeddings), combine the two, and have an input of size [2, 8, 3].
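Concretely, a sketch of building that [1, 8, 3] tensor from the tiny corpus and embeddings above:
import numpy as np
embeddings = {1: [10, 10, 10], 2: [20, 20, 20], 3: [30, 30, 30]}
corpus = [1, 2, 3, 3, 2, 1, 1, 2]
# batch_size=1 example, max_time=8 steps, input_size=3 values per step.
inputs = np.array([[embeddings[w] for w in corpus]], dtype=np.float32)
print(inputs.shape)  # (1, 8, 3)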

Tensorflow reshape tensor gives None dimension

I have used the model described here on the 0.6.0 branch. The code can be found here. I have done some minor changes to the linked code.
In my code I create two models, one for training and one for validation, very similar as it is done in the Tensorflow Tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
    m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
    mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second, used for validation, does not. The output gets a None dimension! The row I'm referring to is row 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that with the 'validation model' I use a batch_size=1 instead of batch_size=2 that I use for the training model)
From what I understand, using -1 as input to the reshape function, will figure the output shape out automagically! But then why do I get None? Nothing in my config fed to the model has a None value.
Thank you for all the help and tips!
TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
Shape inference applies local rules, based on a "shape function" that takes the shapes of the inputs to an operation and computes (possibly incomplete) shapes for the outputs of an operation. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs, and work backwards:
The shape argument to tf.reshape() includes a [-1], which means "figure the output shape automagically" based on the shape of the tensor input.
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
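This is visible directly in static shape inference; a small sketch (tf.mul was renamed tf.multiply in later TensorFlow versions):
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[1, 10])
y = tf.placeholder(tf.float32, shape=[None, 10])
z = tf.mul(x, y)
print(z.get_shape())  # (?, 10): statically unknown whether broadcasting happens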
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.
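For example, right after the reshape in your own code you could assert the shape you already know (a sketch; batch_size, num_steps and size are the values from your config):
output = tf.reshape(tf.concat(1, outputs), [-1, size])
# Shape inference only knows (?, size); we know the true row count.
output.set_shape([batch_size * num_steps, size])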