How to crop keras image tensor dependent on model input? - tensorflow

I want to create a keras network that gets a masked image (image with a rectangular region filled by zeros) as one input and information about the position of this region as a second input:
Input 1: masked_image, shape=(128, 128, 3)
Input 2: mask_pos, shape=(2,) or (4,), depending on whether the region size is fixed or also variable.
I want to let my network fill in the "zeroed" region and extract this reconstructed region from the model output afterwards.
The main problems I encountered today are:
slicing a tensor in general
confusion about whether I need to think in terms of batches, or whether keras handles that for me
how to use values from the second input in the slicing/cropping procedure
Here is the code snippet of my current attempt. The last line shows where I am stuck: I need to fill in the values BOX_X, BOX_Y and BATCH_SIZE in a way that they dynamically adapt to the inputs while the network is training.
I would really appreciate any help; I have been trying to figure this out for a long time now:
# get masked image and bounding box information as inputs
masked_img = Input(shape=self.input_shape)
mask_pos = Input(shape=(2,), dtype="int32", name="input-mask-position")
# fill in the masked region and extract the fill-in region
filled_img = self.generator(masked_img)
fill_in = Lambda(lambda img: K.slice(img, (0, BOX_X, BOX_Y, 0), (BATCH_SIZE, 32, 32, 3)))(filled_img)
I tried a little bit further, but I get an annoying error message :/
def extract_mask_region(img, box_x, box_y):
    return K.slice(img, (0, box_x, box_y, 0), (32, 32, 32, 3))

fill_in = Lambda(lambda args: extract_mask_region(args[0], args[1], args[2]))([filled_img, mask_pos[1], mask_pos[0]])
Error Message:
ValueError: Tried to convert 'begin' to a tensor and failed. Error: Shapes must be equal rank, but are 1 and 0
From merging shape 2 with other shapes. for 'lambda_28/Slice/packed' (op: 'Pack') with input shapes: [], [2], [2], [].
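For what it's worth, the rank mismatch happens because mask_pos has shape (batch, 2), so mask_pos[0] and mask_pos[1] index into the batch dimension and are still rank-1 tensors of shape (2,), while the 0s in begin are scalars; K.slice cannot pack these into a single begin tensor. One possible workaround (a sketch only, not tested against the original model, and assuming a fixed 32x32 box size) is to crop each sample with its own position via tf.map_fn inside the Lambda:

import tensorflow as tf
from keras.layers import Lambda

def extract_mask_regions(args):
    imgs, positions = args                    # imgs: (batch, 128, 128, 3), positions: (batch, 2)
    positions = tf.cast(positions, tf.int32)  # already int32 here; cast kept for safety

    def crop_single(elems):
        img, pos = elems                      # img: (128, 128, 3), pos: (2,)
        return tf.slice(img, [pos[0], pos[1], 0], [32, 32, 3])

    # map over the batch dimension; dtype is needed because the output
    # structure (a single tensor) differs from the input structure (a tuple)
    return tf.map_fn(crop_single, (imgs, positions), dtype=tf.float32)

fill_in = Lambda(extract_mask_regions)([filled_img, mask_pos])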

Related

The input dimension of the LSTM layer in Keras

I'm trying keras.layers.LSTM.
The following code works.
#!/usr/bin/python3
import tensorflow as tf
import numpy as np
from tensorflow import keras
data = np.array([1, 2, 3]).reshape((1, 3, 1))
x = keras.layers.Input(shape=(3, 1))
y = keras.layers.LSTM(10)(x)
model = keras.Model(inputs=x, outputs=y)
print (model.predict(data))
As shown above, the input data shape is (1, 3, 1), while the shape given to the Input layer is (3, 1). I'm a little confused by this apparent inconsistency in the dimensions.
If I use the following shape in the Input layer, it doesn't work:
x = keras.layers.Input(shape=(1, 3, 1))
The error message is as follows:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 1, 3, 1]
It seems that the rank of the input must be 3, but why should we use a rank-2 shape in the Input layer?
Keras works with "batches" of "samples". Since most models use a variable batch size that you define only when fitting, for convenience you don't need to care about the batch dimension, only about the sample shape.
That said, when you use shape = (3, 1), this is the same as defining batch_shape = (None, 3, 1) or batch_input_shape = (None, 3, 1).
All three options mean:
A variable batch size: None
With samples of shape (3, 1).
It's important to know this distinction especially when you are going to create custom layers, losses or metrics. The actual tensors all have the batch dimension and you should take that into account when making operations with tensors.
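A minimal sketch of that distinction (using tf.keras; the Lambda and reduce_mean are only illustrative):

import tensorflow as tf
from tensorflow import keras

# shape=(3, 1) describes ONE sample; the tensor that actually flows
# through the graph has an extra, variable batch dimension in front.
x = keras.layers.Input(shape=(3, 1))
print(x.shape)  # (None, 3, 1) -- None is the batch size

# Inside a Lambda (or any custom layer/loss/metric) you see the full
# batched tensor, so operations must account for that leading axis.
y = keras.layers.Lambda(lambda t: tf.reduce_mean(t, axis=1))(x)
print(y.shape)  # (None, 1)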
Check out the documentation for tf.keras.Input. The signature is:
tf.keras.Input(
    shape=None,
    batch_size=None,
    name=None,
    dtype=None,
    sparse=False,
    tensor=None,
    **kwargs
)
shape: defines the shape of a single sample, with variable batch size.
Note that shape describes a single sample and does not include the batch size; if you need a fixed batch size, pass it explicitly via the batch_size parameter.
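For example (a small sketch; the shapes are only illustrative):

from tensorflow import keras

# Variable batch size (the usual case): samples of shape (3, 1)
x_var = keras.Input(shape=(3, 1))                   # tensor shape: (None, 3, 1)

# Fixed batch size of 8, set explicitly via batch_size
x_fixed = keras.Input(shape=(3, 1), batch_size=8)   # tensor shape: (8, 3, 1)

print(x_var.shape, x_fixed.shape)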

Keras image captioning model not compiling because of concatenate layer when mask_zero=True in a previous layer

I am new to Keras and I am trying to implement a model for an image captioning project.
I am trying to reproduce the model from the image captioning pre-inject architecture (the picture is taken from this paper: Where to put the image in an image captioning generator), but with a minor difference: generating a word at each time step instead of only generating a single word at the end. In this architecture, the inputs for the LSTM at the first time step are the embedded CNN features. The LSTM should support variable input lengths, and in order to do this I padded all the sequences with zeros so that all of them have maxlen time steps.
The code for the model I have right now is the following:
def get_model(model_name, batch_size, maxlen, voc_size, embed_size,
              cnn_feats_size, dropout_rate):
    # create input layer for the cnn features
    cnn_feats_input = Input(shape=(cnn_feats_size,))

    # normalize CNN features
    normalized_cnn_feats = BatchNormalization(axis=-1)(cnn_feats_input)

    # embed CNN features to have same dimension with word embeddings
    embedded_cnn_feats = Dense(embed_size)(normalized_cnn_feats)

    # add time dimension so that this layer output shape is (None, 1, embed_size)
    final_cnn_feats = RepeatVector(1)(embedded_cnn_feats)

    # create input layer for the captions (each caption has max maxlen words)
    caption_input = Input(shape=(maxlen,))

    # embed the captions
    embedded_caption = Embedding(input_dim=voc_size,
                                 output_dim=embed_size,
                                 input_length=maxlen)(caption_input)

    # concatenate CNN features and the captions.
    # Output shape should be (None, maxlen + 1, embed_size)
    img_caption_concat = concatenate([final_cnn_feats, embedded_caption], axis=1)

    # now feed the concatenation into an LSTM layer (many-to-many)
    lstm_layer = LSTM(units=embed_size,
                      input_shape=(maxlen + 1, embed_size),  # one additional time step for the image features
                      return_sequences=True,
                      dropout=dropout_rate)(img_caption_concat)

    # create a fully connected layer to make the predictions
    pred_layer = TimeDistributed(Dense(units=voc_size))(lstm_layer)

    # build the model with CNN features and captions as input and
    # predictions as output
    model = Model(inputs=[cnn_feats_input, caption_input],
                  outputs=pred_layer)

    optimizer = Adam(lr=0.0001,
                     beta_1=0.9,
                     beta_2=0.999,
                     epsilon=1e-8)

    model.compile(loss='categorical_crossentropy', optimizer=optimizer)
    model.summary()

    return model
The model (as it is above) compiles without any errors (see: model summary) and I managed to train it using my data. However, it doesn't take into account the fact that my sequences are zero-padded and the results won't be accurate because of this. When I try to change the Embedding layer in order to support masking (also making sure that I use voc_size + 1 instead of voc_size, as it's mentioned in the documentation) like this:
embedded_caption = Embedding(input_dim=voc_size + 1,
                             output_dim=embed_size,
                             input_length=maxlen,
                             mask_zero=True)(caption_input)
I get the following error:
Traceback (most recent call last):
File "/export/home/.../py3_env/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1567, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 200 and 1. Shapes are [200] and [1]. for 'concatenate_1/concat_1' (op: 'ConcatV2') with input shapes: [?,1,200], [?,25,1], [] and with computed input tensors: input[2] = <1>
I don't know why it says the shape of the second array is [?, 25, 1], as I am printing its shape before the concatenation and it's [?, 25, 200] (as it should be).
I don't understand why there'd be an issue with a model that compiles and works fine without that parameter, but I assume there's something I am missing.
I have also been thinking about using a Masking layer instead of mask_zero=True, but it should be before the Embedding and the documentation says that the Embedding layer should be the first layer in a model (after the input).
Is there anything I could change in order to fix this, or is there a workaround?
The non-equal shape error refers to the masks rather than the tensors/inputs. Since concatenate supports masking, it needs to handle mask propagation. Your final_cnn_feats doesn't have a mask (it is None), while your embedded_caption has a mask of shape (?, 25). You can find this out by doing:
print(embedded_caption._keras_history[0].compute_mask(caption_input))
Since final_cnn_feats has no mask, concatenate will give it an all-non-zero mask for proper mask propagation. While this is correct in principle, that mask has the same shape as final_cnn_feats, which is (?, 1, 200) rather than (?, 1), i.e. it masks all features at all time steps rather than just all time steps. This is where the non-equal shape error comes from ((?, 1, 200) vs (?, 25)).
To fix it, you need to give final_cnn_feats a correct/matching mask. I'm not familiar with your project, but one option is to apply a Masking layer to final_cnn_feats, since it is designed to mask time steps:
final_cnn_feats = Masking()(RepeatVector(1)(embedded_cnn_feats))
This is correct only as long as not all 200 features in final_cnn_feats are zero, i.e. there is always at least one non-zero value in final_cnn_feats. Under that condition, the Masking layer will produce a (?, 1) mask and will not mask out the single time step in final_cnn_feats.
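A quick sanity check (a sketch that reuses the variables from get_model above and the same _keras_history trick):

from keras.layers import Masking  # assuming the same keras import style as the model above

# Wrap the repeated CNN features in Masking() so they carry a rank-2 mask
# that concatenate can combine with the caption mask.
rep = RepeatVector(1)(embedded_cnn_feats)
final_cnn_feats = Masking()(rep)

# The CNN-feature mask should now be (?, 1) and the caption mask (?, maxlen):
print(final_cnn_feats._keras_history[0].compute_mask(rep))
print(embedded_caption._keras_history[0].compute_mask(caption_input))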

Organizing tensor into batches of dynamically shaped tensors

I have the following situation:
I want to deploy a face detector model using Tensorflow Serving: https://www.tensorflow.org/serving/.
In Tensorflow Serving, there is a command line option called --enable_batching. This causes the model server to automatically batch the requests to maximize throughput. I want this to be enabled.
My model takes in a set of images (called images), which is a tensor of shape (batch_size, 640, 480, 3).
The model has two outputs, of shapes (number_of_faces, 4) and (number_of_faces,). The first output will be called faces. The last output, which we can call partitions, is the index in the original batch of the corresponding face. For example, if I pass in a batch of 4 images and get 7 faces, I might get the tensor [0, 0, 1, 2, 2, 2, 3]: the first two faces belong to the first image, the third face to the second image, the third image has three faces, and the last face belongs to the fourth image.
My issue is this:
In order for the --enable_batching flag to work, the output from my model needs to have the 0th dimension the same as the input. That is, I need a tensor with the following shape: (batch_size, ...). I suppose this is so that the model server can know which grpc connection to send each output in the batch towards.
What I want to do is to convert my output tensor from the face detector from this shape (number_of_faces, 4) to this shape (batch_size, None, 4). That is, an array of batches, where each batch can have a variable number of faces (e.g. one image in the batch may have no faces, and another might have 3).
What I tried:
tf.dynamic_partition. On the surface, this function looks perfect. However, I ran into difficulties after realizing that the num_partitions parameter cannot be a tensor, only an integer:
tensorflow_serving_output = tf.dynamic_partition(faces, partitions, batch_size)
If the tf.dynamic_partition function were to accept tensor values for num_partitions, then it seems that my problem would be solved. However, I am back to square one since this is not the case.
Thank you all for your help! Let me know if anything is unclear
P.S. Here is a visual representation of the intended process:
I ended up finding a solution to this using TensorArray and tf.while_loop:
def batch_reconstructor(tensor, partitions, batch_size):
    """
    Take a tensor of shape (number_of_faces, 4) and a 1-D partitions tensor, as well as the scalar batch_size,
    and reconstruct a TensorArray that preserves the original batching.

    From the partitions, we can get the maximum number of tensors within a batch. This will inform the padding we need to use.

    Params:
    - tensor: The tensor to convert to a batch
    - partitions: A list of batch indices. The tensor at position i corresponds to batch # partitions[i]
    """
    # dtype must match the dtype of `tensor` (float32 bounding boxes here)
    tfarr = tf.TensorArray(tf.float32, size=batch_size, infer_shape=False)

    _, _, count = tf.unique_with_counts(partitions)
    maximum_tensor_size = tf.cast(tf.reduce_max(count), tf.int32)

    # index of the padding row appended to the end of `tensor`
    padding_tensor_index = tf.cast(tf.gather(tf.shape(tensor), 0), tf.int32)

    padding_tensor = tf.expand_dims(tf.cast(tf.fill([4], -1), tf.float32), axis=0)  # fill with [-1, -1, -1, -1]
    tensor = tf.concat([tensor, padding_tensor], axis=0)

    def cond(i, acc):
        return tf.less(i, batch_size)

    def body(i, acc):
        partition_indices = tf.reshape(tf.cast(tf.where(tf.equal(partitions, i)), tf.int32), [-1])
        partition_size = tf.gather(tf.shape(partition_indices), 0)

        # pad partition_indices with padding_size copies of padding_tensor_index
        padding_size = tf.subtract(maximum_tensor_size, partition_size)
        padding_indices = tf.reshape(tf.fill([padding_size], padding_tensor_index), [-1])

        partition_indices = tf.concat([partition_indices, padding_indices], axis=0)

        return (tf.add(i, 1), acc.write(i, tf.gather(tensor, partition_indices)))

    _, reconstructed = tf.while_loop(
        cond,
        body,
        (tf.constant(0), tfarr),
        name='batch_reconstructor'
    )

    reconstructed = reconstructed.stack()
    return reconstructed
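A hypothetical usage sketch (the variable names follow the question; they are not taken from a real serving model):

# faces:      float32 tensor of shape (number_of_faces, 4)
# partitions: int32 tensor of shape (number_of_faces,), values in [0, batch_size)
# batch_size: the batch size of the original `images` input
padded_faces = batch_reconstructor(faces, partitions, batch_size)
# padded_faces has shape (batch_size, max_faces_per_image, 4), with
# [-1, -1, -1, -1] rows marking the padding, so its 0th dimension matches
# the input batch as required by --enable_batching.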

Learning: Vector input in Tensorflow

Many of the examples I have learned to code use scalar input numbers. I want to try vector input, using the example of https://github.com/tencia/stocks_rnn
I tried to change the code to input [x, x^2] instead of x, with the following two changes, but I get an error.
In STOCKLSTM:
self._input_data = tf.placeholder(tf.float32, [2, batch_size, num_steps])
In main/Epoch
cost, state, _ = session.run([m.cost, m.final_state, eval_op],
                             {m.input_data: (x, x**2), m.targets: y, m.initial_state: state})
ERROR:
ValueError: Cannot feed value of shape (2, 30, 10) for Tensor u'model/Placeholder:0', which has shape '(30, 10)'
Any ideas whether this line of thought is correct? I feel severely punished for bunking the tensor classes in grad school :(
Karma
The problem here is that you are hard-coding the batch size into the placeholder. Make that dimension None instead, i.e. shape [2, None, num_steps], so the placeholder can accept any amount of batch data. When using placeholders you don't have to pin down every dimension, because they are flexible structures.
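A minimal sketch of that suggestion (variable names are illustrative, not taken from the stocks_rnn code):

import numpy as np
import tensorflow as tf

num_steps = 10

# Leave the batch dimension as None so any batch size can be fed at run time.
input_data = tf.placeholder(tf.float32, [2, None, num_steps])

x = np.random.rand(30, num_steps).astype(np.float32)
feed = np.stack([x, x ** 2])   # shape (2, 30, 10)

with tf.Session() as sess:
    # A (2, 30, 10) array now matches the (2, ?, 10) placeholder.
    print(sess.run(tf.shape(input_data), {input_data: feed}))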

Tensorflow reshape tensor gives None dimension

I have used the model described here on the 0.6.0 branch. The code can be found here. I have made some minor changes to the linked code.
In my code I create two models, one for training and one for validation, very similar as it is done in the Tensorflow Tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second one, used for validation, is not: the output gets a None dimension! The line I'm referring to is line 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that with the 'validation model' I use a batch_size=1 instead of batch_size=2 that I use for the training model)
From what I understand, using -1 as input to the reshape function will figure the output shape out automagically. But then why do I get None? Nothing in my config fed to the model has a None value.
Thank you for all the help and tips!
TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
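To make the static/run-time distinction concrete, a small sketch (with a hypothetical placeholder, not the PTB model's tensor):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 650])
print(x.get_shape())  # (?, 650) -- static shape, computed at graph-building time
print(tf.shape(x))    # a Tensor -- the run-time shape, only available when the graph is run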
Shape inference applies local rules, based on a "shape function" that takes the shapes of the inputs to an operation and computes (possibly incomplete) shapes for the outputs of an operation. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs, and work backwards:
The shape argument to tf.reshape() includes a [-1], which means "figure the output shape automagically" based on the shape of the tensor input.
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.
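For illustration, a small sketch of the broadcasting ambiguity and of Tensor.set_shape() (hypothetical placeholders, not the actual LSTM tensors):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[1, 10])
y = tf.placeholder(tf.float32, shape=[None, 10])

z = tf.multiply(x, y)
print(z.get_shape())  # (?, 10) -- shape inference cannot rule out broadcasting

# If we know from outside information that y always has batch size 1:
z.set_shape([1, 10])
print(z.get_shape())  # (1, 10)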