Tensorflow reshape tensor gives None dimension

I have used the model described here on the 0.6.0 branch. The code can be found here. I have made some minor changes to the linked code.
In my code I create two models, one for training and one for validation, very similar to how it is done in the TensorFlow tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
    m = PTBModel_User(is_training=True, config=config, name='Training model')

with tf.variable_scope("model", reuse=True, initializer=initializer):
    mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second, used for validation, does not. The output gets a None dimension! The line I'm referring to is line 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get the expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that for the 'validation model' I use batch_size=1 instead of the batch_size=2 that I use for the training model.)
From what I understand, using -1 as an input to the reshape function will figure out the output shape automagically based on the other dimensions. But then why do I get None? Nothing in the config fed to the model has a None value.
Thank you for all the help and tips!

TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
Shape inference applies local rules, based on a "shape function" that takes the shapes of an operation's inputs and computes (possibly incomplete) shapes for its outputs. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs and work backwards:
The shape argument to tf.reshape() includes a -1, which means "figure this dimension of the output shape out automagically" based on the shape of the tensor input.
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
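To make this concrete, here is a minimal sketch of what shape inference reports in each of the three cases (written with TF 1.x-style placeholders; tf.multiply is the later name for tf.mul):
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[1, 10])
y_full = tf.placeholder(tf.float32, shape=[5, 10])        # case 1: broadcasting
y_same = tf.placeholder(tf.float32, shape=[1, 10])        # case 2: no broadcasting
y_unknown = tf.placeholder(tf.float32, shape=[None, 10])  # case 3: unknown batch size

print(tf.multiply(x, y_full).get_shape())     # (5, 10)
print(tf.multiply(x, y_same).get_shape())     # (1, 10)
print(tf.multiply(x, y_unknown).get_shape())  # (?, 10) -- broadcasting can't be ruled out statically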
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.
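For example, if you know the run-time shape when building the graph, a sketch along these lines (using the variable names from the question) would restore the static shape right after the reshape:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
# We know more than shape inference can prove: assert the true shape.
output.set_shape([batch_size * num_steps, size])  # e.g. [1 * 400, 650] for the validation model
print(output.get_shape())  # (400, 650) instead of (?, 650)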

Related

Does Keras masking impact weight updates and loss calculations?

I'm working with time series, and understand that keras.layers.Masking and keras.layers.Embedding are useful to create a mask value in the network which indicates timesteps to 'skip'. The mask value is propagated throughout the network to be used by any layers that support it.
The Keras documentation doesn't specify any further impacts of the mask value. My expectation is that the mask would be applied through all functions in model training and evaluation, but I don't see any evidence in support of this.
Does the mask value impact back-propagation?
Does the mask value impact the loss function or the metrics?
Would it be wise or foolish to use the sample_weight parameter in model.compile() to tell Keras to 'ignore' the masked timesteps in the loss function?
I've performed some experiments to answer these questions.
Here's my sample code:
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
# Fix the random seed for repeatable results
np.random.seed(5)
tf.random.set_seed(5)
x = np.array([[[3, 0], [1, 4], [3, 2], [4, 0], [4, 5]],
              [[1, 2], [3, 1], [1, 3], [5, 1], [3, 5]]], dtype='float64')
# Choose some values to be masked out
mask = np.array([[False, False, True, True, True],
                 [ True, True, False, False, True]])  # True: keep. False: ignore
samples, timesteps, features_in = x.shape
features_out = 1
y_true = np.random.rand(samples, timesteps, features_out)
# y_true[~mask] = 1e6 # TEST MODIFICATION
# Apply the mask to x
mask_value = 0 # Set to any value
x[~mask] = [mask_value] * features_in
input_tensor = keras.Input(shape=(timesteps, features_in))
this_layer = input_tensor
this_layer = keras.layers.Masking(mask_value=mask_value)(this_layer)
this_layer = keras.layers.Dense(10)(this_layer)
this_layer = keras.layers.Dense(features_out)(this_layer)
model = keras.Model(input_tensor, this_layer)
model.compile(loss='mae', optimizer='adam')
model.fit(x=x, y=y_true, epochs=100, verbose=0)
y_pred = model.predict(x)
print("y_pred = ")
print(y_pred)
print("model weights = ")
print(model.get_weights()[1])
print(f"{'model.evaluate':>14s} = {model.evaluate(x, y_true, verbose=0):.5f}")
# See if the loss computed by model.evaluate() is equal to the masked loss
error = y_true - y_pred
masked_loss = np.abs(error[mask]).mean()
unmasked_loss = np.abs(error).mean()
print(f"{'masked loss':>14s} = {masked_loss:.5f}")
print(f"{'unmasked loss':>14s} = {unmasked_loss:.5f}")
Which outputs
y_pred =
[[[-0.28896046]
  [-0.28896046]
  [ 0.1546848 ]
  [-1.1596009 ]
  [ 1.5819632 ]]

 [[ 0.59000516]
  [-0.39362794]
  [-0.28896046]
  [-0.28896046]
  [ 1.7996234 ]]]
model weights =
[-0.06686568  0.06484845 -0.06918766  0.06470951  0.06396528  0.06470013
  0.06247645 -0.06492618 -0.06262784 -0.06445726]
model.evaluate = 0.60170
masked loss = 1.00283
unmasked loss = 0.90808
mask and loss calculation
Surprisingly, the 'mae' (mean absolute error) loss calculation does NOT exclude the masked timesteps from the calculation. Instead, it assumes that these timesteps have zero loss - a perfect prediction. Therefore, every masked timestep actually reduces the calculated loss!
To explain in more detail: the above sample code input x has 10 timesteps. 4 of them are removed by the mask, so 6 valid timesteps remain. The 'mean absolute error' loss calculation sums the losses for the 6 valid timesteps, then divides by 10 instead of dividing by 6. This looks like a bug to me.
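A quick sanity check with the numbers printed above confirms this interpretation (the exact values are from my run; the relationship is the point):
masked_loss = 1.00283                 # mean |error| over the 6 kept timesteps
evaluate_loss = masked_loss * 6 / 10  # sum over 6 valid timesteps, divided by all 10
print(evaluate_loss)                  # 0.6017 -- matches model.evaluate = 0.60170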
output values are masked
Output values of masked timesteps do not impact the model training or evaluation (as it should be).
This can be easily tested by setting:
y_true[~mask] = 1e6
The model weights, predictions and losses remain exactly the same.
input values are masked
Input values of masked timesteps do not impact the model training or evaluation (as it should be).
Similarly, I can change mask_value from 0 to any other number, and the resulting model weights, predictions, and losses remain exactly the same.
In summary:
Q1: Effectively yes - the mask impacts the loss function, which is used through backpropagation to update the weights.
Q2: Yes, but the mask impacts the loss in an unexpected way.
Q3: Initially foolish - the mask should already be applied to the loss calculation. However, perhaps sample_weights could be valuable to correct the unexpected method of the loss calculation...
Note that I'm using Tensorflow 2.7.0.
I have been struggling with this on a related issue, namely applying a mask to a multi-output model where some samples are missing labels for different outputs. Here, I construct features, labels, and sample_weights from a dataset, where labels and sample_weights are dictionaries with equivalent keys. The weights are 0 or 1 for each sample, indicating whether it should contribute to the calculation for the relevant loss.
I had hoped that sample_weights would contribute to the loss as they do when I pass the metric equivalents of the losses via weighted_metrics in model.compile.
I've found that sample_weight does not seem to address this problem. I can tell from the training metrics that the task_loss values are different from task_metric values when sample weights are used.
I've given up on this and decided to go ahead and use masking. The masked loss values are low in your case (and in mine) because TensorFlow sees the modeled output as perfection - I hope this means it does not see a gradient for these points, and so parameters aren't tuned in response.

tensorflow 1.x reshape placeholder

In TF 1.x, as shown below, x is an input with shape [None, 784] used to train my example model.
It shows up as [?, 784] in TensorBoard.
For some reason, I have to reshape x to [1, 784] to predict; that is, x needs to look like [1, 784] instead of [?, 784] to predict after training the model.
Any suggestions?
with tf.name_scope('Input_Layer'):
    x = tf.placeholder("float", shape=[None, 784],
                       name="x")
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    ...
The "?" in tensorflow indicates that this dimension is not fixed. So it can vary from call to call. The function predict expects a shaped tensor [n_examples,784] where n_examples is the number of examples.
In your case, since you only need to predict one example, you need to reshape it to [1,784] , i.e., n_examples=1
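For instance, here is a hedged sketch of preparing a single flattened image for prediction (single_image is a hypothetical 784-element vector, and prediction_op stands in for your model's output tensor; x is the placeholder defined above):
import numpy as np

single_image = np.random.rand(784).astype('float32')  # hypothetical flat image
batch_of_one = single_image.reshape(1, 784)           # n_examples = 1
# sess.run(prediction_op, feed_dict={x: batch_of_one}) now matches the [None, 784] placeholder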

Organizing tensor into batches of dynamically shaped tensors

I have the following situation:
I want to deploy a face detector model using Tensorflow Serving: https://www.tensorflow.org/serving/.
In Tensorflow Serving, there is a command line option called --enable_batching. This causes the model server to automatically batch the requests to maximize throughput. I want this to be enabled.
My model takes in a set of images (called images), which is a tensor of shape (batch_size, 640, 480, 3).
The model has two outputs: (number_of_faces, 4) and (number_of_faces,). The first output will be called faces. The last output, which we can call partitions, is the index in the original batch for the corresponding face. For example, if I pass in a batch of 4 images and get 7 faces, then I might have this tensor as [0, 0, 1, 2, 2, 2, 3]. The first two faces correspond to the first image, the third face to the second image, the third image has three faces, and the fourth image has one.
My issue is this:
In order for the --enable_batching flag to work, the output from my model needs to have the 0th dimension the same as the input. That is, I need a tensor with the following shape: (batch_size, ...). I suppose this is so that the model server can know which grpc connection to send each output in the batch towards.
What I want to do is to convert my output tensor from the face detector from this shape (number_of_faces, 4) to this shape (batch_size, None, 4). That is, an array of batches, where each batch can have a variable number of faces (e.g. one image in the batch may have no faces, and another might have 3).
What I tried:
tf.dynamic_partition. On the surface, this function looks perfect. However, I ran into difficulties after realizing that the num_partitions parameter cannot be a tensor, only an integer:
tensorflow_serving_output = tf.dynamic_partition(faces, partitions, batch_size)
If the tf.dynamic_partition function were to accept tensor values for num_partition, then it seems that my problem would be solved. However, I am back to square one since this is not the case.
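For reference, this is how tf.dynamic_partition behaves when num_partitions is a static Python integer (a sketch using faces and partitions as described above):
# Works only because num_partitions is a plain int, not a tensor:
parts = tf.dynamic_partition(faces, partitions, num_partitions=4)
# parts is a Python list of 4 tensors; parts[i] has shape (num_faces_in_image_i, 4)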
Thank you all for your help! Let me know if anything is unclear
P.S. Here is a visual representation of the intended process:
I ended up finding a solution to this using TensorArray and tf.while_loop:
def batch_reconstructor(tensor, partitions, batch_size):
    """
    Take a tensor of shape (number_of_faces, 4) and a 1-D partitions tensor, as well as the scalar batch_size,
    and reconstruct a TensorArray that preserves the original batching.
    From the partitions, we can get the maximum number of tensors within a batch. This informs the padding we need to use.
    Params:
    - tensor: The tensor to convert to a batch
    - partitions: A list of batch indices. The tensor at position i corresponds to batch # partitions[i]
    """
    # The dtype must match `tensor` (float32 here; tf.int32 would fail on write)
    tfarr = tf.TensorArray(tf.float32, size=batch_size, infer_shape=False)

    _, _, count = tf.unique_with_counts(partitions)
    maximum_tensor_size = tf.cast(tf.reduce_max(count), tf.int32)

    # Index of the padding row, which is appended after the last real row below
    padding_tensor_index = tf.cast(tf.gather(tf.shape(tensor), 0), tf.int32)

    padding_tensor = tf.expand_dims(tf.cast(tf.fill([4], -1), tf.float32), axis=0)  # fill with [-1, -1, -1, -1]
    tensor = tf.concat([tensor, padding_tensor], axis=0)

    def cond(i, acc):
        return tf.less(i, batch_size)

    def body(i, acc):
        partition_indices = tf.reshape(tf.cast(tf.where(tf.equal(partitions, i)), tf.int32), [-1])
        partition_size = tf.gather(tf.shape(partition_indices), 0)

        # Concat the partition_indices with padding_size copies of padding_tensor_index
        padding_size = tf.subtract(maximum_tensor_size, partition_size)
        padding_indices = tf.reshape(tf.fill([padding_size], padding_tensor_index), [-1])
        partition_indices = tf.concat([partition_indices, padding_indices], axis=0)

        return (tf.add(i, 1), acc.write(i, tf.gather(tensor, partition_indices)))

    _, reconstructed = tf.while_loop(
        cond,
        body,
        (tf.constant(0), tfarr),
        name='batch_reconstructor'
    )

    reconstructed = reconstructed.stack()
    return reconstructed
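A hypothetical usage with the shapes from the question (7 faces detected across a batch of 4 images):
# faces: float32 tensor of shape (7, 4)
partitions = tf.constant([0, 0, 1, 2, 2, 2, 3])
batched = batch_reconstructor(faces, partitions, batch_size=4)
# batched has shape (4, 3, 4): each image is padded to the maximum of 3 faces
# with [-1, -1, -1, -1] rows, so the 0th dimension matches batch_size again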

TensorFlow post-LSTM fully connected layer outputs return the same values as each other

I was trying to train a sequence-to-sequence LSTM model on a dataset with three labels: [1, 0] for detection of class 1, [0, 1] for detection of class 2, and [0, 0] for detection of nothing. After getting the outputs from the LSTM network, I applied a fully connected layer to each cell's output in the following way:
outputs, state = tf.nn.dynamic_rnn(cell, input)
# Shape of outputs is [batch_size, n_time_steps, n_hidden]
# As matmul works only on matrices, reshape to get the
# time dimension into the batch dimension
outputs = tf.reshape(outputs, [-1, n_hidden])
# Shape is [batch_size * n_time_steps, n_hidden]
w = tf.Variable(tf.truncated_normal(shape=[n_hidden, 2], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[2]))
logit = tf.add(tf.matmul(outputs, w), b, name='logit')
# Reshape back to [batch_size, n_time_steps, 2]
logit = tf.reshape(logit, [batch_size, -1, 2])
On the output, I apply tf.nn.sigmoid_cross_entropy_with_logits and reduce the mean. The model seems to work just fine achieving high accuracy and recall, except for the fact that in almost all the cases it outputs either [0, 0], or [1, 1]. The two logit outputs from the fully connected layer always have very similar values (but not the same). This effectively puts a hard-cap on precision of 50%, which the model converges to (but not a fraction of a percent above).
Now, my intuition would tell me that something must be wrong with the training step and both fully connected outputs are trained on the same data, but curiously enough when I replace my own implementation with the prepackaged one from tf.contrib:
outputs, state = tf.nn.dynamic_rnn(cell, input)
logit = tf.contrib.layers.fully_connected(outputs, 2, activation_fn=None)
without changing a single other thing, the model starts training properly. Now, the obvious solution would be to just use that implementation, but why doesn't the first one work?

regarding reshape a multi-dimensional tensor into [-1, n]

While reading a TensorFlow segmentation implementation, I am trying to figure out what the following code is aiming to do.
A tensor x is defined as follows: self.x = tf.placeholder("float", shape=[None, None, None, n_label]).
Later, one function invokes a transformed tensor x1, which is defined as x1 = tf.reshape(self.x, [-1, n_label]).
My understanding is that tf.reshape(self.x, [-1, n_label]) should try to reshape the x tensor into a 1-D vector.
But I am kind of confused about x being defined with shape=[None, None, None, n_label] and x1 transformed in this way. What should x1 really look like, and why do it this way?
None means we don't want to specify a dimension when creating the graph, but rather want to determine it at runtime. For instance, it can be useful when you want to use different minibatch sizes during training and for inference.
Reshaping with -1 for some dimension just means 'preserve the total size of the tensor'. For example, tf.reshape(x, [-1, 2]) for x of shape [3, 4, 2] would produce a new tensor of shape [12, 2].
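A minimal sketch of the reshape in question (n_label=2 is assumed purely for illustration): every pixel of the 4-D tensor becomes one row of n_label values.
import numpy as np
import tensorflow as tf

n_label = 2  # assumed for illustration
x = np.arange(2 * 3 * 4 * n_label, dtype=np.float32).reshape(2, 3, 4, n_label)
x1 = tf.reshape(x, [-1, n_label])  # -1 is inferred as 2*3*4 = 24
print(x1.shape)  # (24, 2): one row per pixel, one column per label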