Different behavior of Sequential API and Functional API for TensorFlow embedding

When I use the Sequential API and the Functional API in TensorFlow to apply the same simple embedding layer, I see different results.
The code and output are as follows:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from keras import layers
inputs = np.random.randint(0, 99, [32, 100, 1])
myLayer = layers.Embedding(input_dim=100, output_dim=8)
# Sequential API
sm = keras.Sequential()
sm.add(myLayer)
sm_out = sm(inputs)
sm_out.shape # Shape of sm_out is: TensorShape([32, 100, 8])
# Functional API
fm_out = myLayer(inputs)
fm_out.shape # Shape of fm_out is: TensorShape([32, 100, 1, 8])
Is this intended behavior or a bug?

First of all, your second call is not a functional API call. You need to build the layer's output from a tf.keras.layers.Input and wrap it in a tf.keras.models.Model for it to be a functional API call.
Secondly, when you call the Sequential model, it is smart enough to detect that the last dimension is 1 and ignores it when looking up embeddings (I'm not sure where exactly this is handled; maybe someone else can point to it). So when you pass in a tensor of shape [32, 100, 1], what the embedding layer really sees is a [32, 100] array, which after the lookup becomes a [32, 100, 8] tensor.
In your second call, where you call the layer directly, this doesn't happen, so the [32, 100, 1] input simply becomes a [32, 100, 1, 8] output.
You can get the same result from both methods if you set your input shape to [32, 100] or [32, 100, 2] (i.e. a last dimension != 1).
I guess the lesson here is to always declare the input shape (e.g. via the input_shape argument on the first layer of the Sequential model) to prevent such unexpected behavior.
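For completeness, here is a minimal sketch (assuming tf.keras in TF 2.x; not part of the original post) of a true functional API call, with the input shape declared explicitly so that both APIs produce the same [32, 100, 8] output. Here keras.Input plays the role of the input_shape argument mentioned above:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = np.random.randint(0, 99, [32, 100])             # no trailing 1-sized dimension
myLayer = layers.Embedding(input_dim=100, output_dim=8)

# Functional API: wrap the layer's output in a Model built on an explicit Input.
x = keras.Input(shape=(100,))
fm = keras.Model(inputs=x, outputs=myLayer(x))
print(fm(inputs).shape)                                   # (32, 100, 8)

# Sequential API with the input shape declared up front.
sm = keras.Sequential([keras.Input(shape=(100,)), myLayer])
print(sm(inputs).shape)                                   # (32, 100, 8)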

Related

The input dimension of the LSTM layer in Keras

I'm trying keras.layers.LSTM.
The following code works.
#!/usr/bin/python3
import tensorflow as tf
import numpy as np
from tensorflow import keras
data = np.array([1, 2, 3]).reshape((1, 3, 1))
x = keras.layers.Input(shape=(3, 1))
y = keras.layers.LSTM(10)(x)
model = keras.Model(inputs=x, outputs=y)
print(model.predict(data))
As shown above, the input data shape is (1, 3, 1), while the shape passed to the Input layer is (3, 1). I'm a little confused by this inconsistency in the dimensions.
If I use the following shape in the Input layer, it doesn't work:
x = keras.layers.Input(shape=(1, 3, 1))
The error message is as follows:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 1, 3, 1]
It seems that the rank of the input must be 3, but why should we use a rank-2 shape in the Input layer?
Keras works with "batches" of "samples". Since most models use a variable batch size that you define only when fitting, for convenience you don't need to care about the batch dimension, only about the sample shape.
That said, when you use shape = (3,1), this is the same as defining batch_shape = (None, 3, 1) or batch_input_shape = (None, 3, 1).
The three options mean:
A variable batch size: None
With samples of shape (3, 1).
It's important to know this distinction especially when you are going to create custom layers, losses or metrics. The actual tensors all have the batch dimension and you should take that into account when making operations with tensors.
Check out the documentation for tf.keras.Input. The signature is as follows:
tf.keras.Input(
    shape=None,
    batch_size=None,
    name=None,
    dtype=None,
    sparse=False,
    tensor=None,
    **kwargs
)
shape: defines the shape of a single sample, with a variable batch size.
Note that shape does not include the batch dimension; if you need a fixed batch size, pass batch_size as a parameter explicitly.
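As a small illustration (a sketch, not from the original answer) of how shape and batch_size interact, and of the fact that the data you feed must still carry the leading batch dimension:
import numpy as np
from tensorflow import keras

x1 = keras.Input(shape=(3, 1))                  # variable batch size: (None, 3, 1)
x2 = keras.Input(shape=(3, 1), batch_size=5)    # fixed batch size:    (5, 3, 1)
print(x1.shape, x2.shape)

# The data itself still needs the batch dimension, as in the (1, 3, 1) array above.
model = keras.Model(inputs=x1, outputs=keras.layers.LSTM(10)(x1))
data = np.arange(6, dtype="float32").reshape((2, 3, 1))   # a batch of 2 samples of shape (3, 1)
print(model.predict(data).shape)                # (2, 10)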

How to drop last row and last col in a tensor using Keras Tensorflow

Let's say I have a tensor of shape (None, 2, 56, 56, 256). Now I want my tensor to have shape (None, 2, 55, 55, 256) by dropping the last column and last row. How can I achieve this using Keras/TensorFlow?
In TensorFlow we can slice tensors using Python slice notation. So, given a tensor X with shape (20, 2, 56, 56, 256), say (as you have described, but with a batch size of 20), we can easily slice it, taking all but the last 'row' in each of the two 56-sized dimensions, as follows:
X[:,:,:-1,:-1,:]
Note the use of :-1 to denote "everything before the last 'row'".
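A quick sanity check of that slice (a small sketch, not part of the original answer):
import tensorflow as tf

X = tf.zeros([20, 2, 56, 56, 256])
print(X[:, :, :-1, :-1, :].shape)   # (20, 2, 55, 55, 256)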
Given this know-how about slicing tensors in TensorFlow, we just need to adapt it for Keras. We could, of course, write a full-blown custom layer implementing this (or possibly even find one out there that someone else has written - I've not looked, but slicing is pretty common, so I suspect someone has written something somewhere!).
However, for something as simple as this, I'd advocate just using a Lambda layer, which we can define as follows:
my_slicing_layer = Lambda(lambda x: x[:,:,:-1,:-1,:], name='slice')
And can use in our keras models as normal:
my_model = Sequential([
    Activation('relu', input_shape=(2, 56, 56, 256)),
    my_slicing_layer
])

Deconvolutions/Transpose_Convolutions with tensorflow

I am attempting to use tf.nn.conv3d_transpose; however, I am getting an error indicating that my filter and output shapes are not compatible.
I have a tensor of size [1,16,16,4,192]
I am attempting to use a filter of [1,1,1,192,192]
I believe that the output shape would be [1,16,16,4,192]
I am using "same" padding and a stride of 1.
Eventually, I want to have an output shape of [1,32,32,7,"does not matter"], but I am attempting to get a simple case to work first.
Since these tensors are compatible in a regular convolution, I believed that the opposite, a deconvolution, would also be possible.
Why is it not possible to perform a deconvolution on these tensors? Could I get an example of a valid filter size and output shape for a deconvolution on a tensor of shape [1,16,16,4,192]?
Thank you.
I have a tensor of size [1,16,16,4,192]
I am attempting to use a filter of [1,1,1,192,192]
I believe that the output shape would be [1,16,16,4,192]
I am using "same" padding and a stride of 1.
Yes, the output shape will be [1,16,16,4,192].
Here is a simple example showing that the dimensions are compatible:
import tensorflow as tf
i = tf.Variable(tf.constant(1., shape=[1, 16, 16, 4, 192]))
w = tf.Variable(tf.constant(1., shape=[1, 1, 1, 192, 192]))
o = tf.nn.conv3d_transpose(i, w, [1, 16, 16, 4, 192], strides=[1, 1, 1, 1, 1])
print(o.get_shape())
There must be some other problem in your implementation than the dimensions.
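For the upsampled shape you eventually want, here is a hedged sketch with stride 2 in the spatial dimensions (the 3x3x3 filter and the 128 output channels are illustrative choices; recall that conv3d_transpose expects the filter as [depth, height, width, output_channels, input_channels]):
import tensorflow as tf

i = tf.ones([1, 16, 16, 4, 192])
w = tf.ones([3, 3, 3, 128, 192])     # [d, h, w, output_channels, input_channels]
o = tf.nn.conv3d_transpose(i, w, output_shape=[1, 32, 32, 7, 128],
                           strides=[1, 2, 2, 2, 1], padding='SAME')
print(o.shape)                       # (1, 32, 32, 7, 128)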

Reusing layer weights in Tensorflow

I am using tf.slim to implement an autoencoder. It's fully convolutional with the following architecture:
[conv, outputs = 1] => [conv, outputs = 15] => [conv, outputs = 25] =>
[conv_transpose, outputs = 25] => [conv_transpose, outputs = 15] =>
[conv_transpose, outputs = 1]
It has to be fully convolutional and I cannot do pooling (limitations of the larger problem). I want to use tied weights, so
encoder_W_3 = decoder_W_1_Transposed
(so the weights of the first decoder layer are the ones of the last encoder layer, transposed).
If I reuse the weights the regular way tf.slim lets you reuse them, i.e. with reuse = True and then just providing the scope name of the layer you want to reuse, I get a size issue:
ValueError: Trying to share variable cnn_block_3/weights, but specified shape (21, 11, 25, 25) and found shape (21, 11, 15, 25).
This makes sense if you do not transpose the weights of the previous model. Does anyone have an idea of how I can transpose those weights?
PS: I know this is very abstract and hand-waving, but I am working with a custom api, on top of tfslim, so I can't post code examples here.
Does anyone have an idea on how I can transpose those weights?
Transposition is simple:
new_weights = tf.transpose(weights, perm=[0, 1, 3, 2])
will swap the last two axes.
However, as @Seven mentioned, that wouldn't be enough to address the error, since the total number of weights differs.
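For illustration, a small sketch of the transposition itself, using the shapes from the error message (the variable names are mine):
import tensorflow as tf

weights = tf.ones([21, 11, 15, 25])                      # [height, width, in_channels, out_channels]
new_weights = tf.transpose(weights, perm=[0, 1, 3, 2])   # swap the channel axes
print(new_weights.shape)   # (21, 11, 25, 15) -- still not (21, 11, 25, 25), so transposing alone can't fix the error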

Oversampling images during inference

It is a common practice in convolutional neural networks to oversample a given image during inference, i.e. to create a batch from different transformations of the same image (most commonly different crops and mirroring), pass the entire batch through the network, and average (or otherwise reduce) the results to get a single prediction (caffe example).
How can this approach be implemented in tensorflow?
You can take a look at the TF CNN tutorial. In particular, the function distorted_inputs does the image preprocessing step.
In short, there are a couple of TF functions in the tf.image package that help with distorting the images. You can use either those or regular numpy functions to add an extra dimension for the distorted versions, over which you can average the results:
Before:
input_place = tf.placeholder(tf.float32, [None, 256, 256, 3])
prediction = some_model(input_place) # size: [None]
sess.run(prediction, feed_dict={input_place: batch_of_images})
After:
input_place = tf.placeholder(tf.float32, [None, NUM_OF_DISTORTIONS, 256, 256, 3])
prediction = some_model(input_place)  # make sure it is of size [None, NUM_OF_DISTORTIONS]
new_prediction = tf.reduce_mean(prediction, axis=1)

new_batch = np.zeros((batch_size, NUM_OF_DISTORTIONS, 256, 256, 3))
for i in range(len(batch_of_images)):
    for f in range(len(distortion_functions)):
        new_batch[i, f, :, :, :] = distortion_functions[f](batch_of_images[i])

sess.run(new_prediction, feed_dict={input_place: new_batch})
Take a look at TF's image-related functions. You could apply those transformations at test time to some input image, and stack all of them together to make a batch.
I imagine you could also do this using OpenCV or some other image-processing tool. I don't see a need to do it in the computation graph: you could create the batches beforehand and pass them through in feed_dict.
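For reference, a hedged sketch of that idea in current eager TensorFlow (the particular distortions and the model callable are illustrative assumptions): apply a few tf.image transforms to one image, stack the views into a batch, run the model once, and average the predictions.
import tensorflow as tf

def tta_predict(model, image):
    """image: a single [H, W, 3] float tensor; model maps an image batch to per-image scores."""
    distortions = [
        lambda x: x,                              # the original image
        tf.image.flip_left_right,                 # mirrored view
        lambda x: tf.image.central_crop(x, 0.9),  # a centre crop
    ]
    size = tf.shape(image)[:2]                    # resize every view back to the original H, W
    views = [tf.image.resize(d(image), size) for d in distortions]
    batch = tf.stack(views, axis=0)               # [num_distortions, H, W, 3]
    preds = model(batch)                          # [num_distortions, num_classes]
    return tf.reduce_mean(preds, axis=0)          # average over the distorted views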