Tensorflow 2 timeseries_dataset_from_array input vs target batch shapes difference - tensorflow

The new tf.keras.preprocessing.timeseries_dataset_from_array function is used to create sliding minibatch windows over sequential data, for example for tasks involving RNNs.
According to the docs it returns a minibatch of inputs and targets. However, the target minibatch this function returns does not have a sequence_length (timesteps) dimension. For example:
data = timeseries_dataset_from_array(
    data=tokens,
    targets=targets,
    sequence_length=25,
    batch_size=32,
)

for minibatch in data:
    inputs, targets = minibatch
    assert inputs.shape[1] == targets.shape[1]  # AssertionError
The inputs have shape [32, 25, 1] (in the case where each timestep is just a word index), while the targets, confusingly, have shape [32, 1].
So my question is: how am I supposed to map an input tensor with a window of 25 timesteps to a target tensor with no timestep dimension at all?
The way I always train sequence models is to feed an input tensor of shape [32, 25, 1], which is projected to [32, 25, 100], and then pass a target tensor of shape [32, 25, 1] to the loss function, or, for a multi-class problem, a target tensor of shape [32, 25, num_of_classes].
That is why I am confused by the shape of the target tensor from timeseries_dataset_from_array and the intuition behind it.
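For reference, one way to obtain per-timestep targets is to not use the targets argument at all and instead build two windowed datasets and zip them. This is only a minimal sketch, assuming tokens is a 1-D array and the target window is simply the input window shifted by one step; the variable names are illustrative.
import tensorflow as tf
from tensorflow.keras.preprocessing import timeseries_dataset_from_array

seq_len = 25

# Inputs: windows over tokens[:-1]; targets: the same windows shifted by one step.
input_ds = timeseries_dataset_from_array(
    tokens[:-1], targets=None, sequence_length=seq_len, batch_size=32)
target_ds = timeseries_dataset_from_array(
    tokens[1:], targets=None, sequence_length=seq_len, batch_size=32)

dataset = tf.data.Dataset.zip((input_ds, target_ds))

for inputs, targets in dataset:
    assert inputs.shape[1] == targets.shape[1]  # both keep the timestep axis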

Related

Significant increase in memory usage when passing through tensorflow layers

I am using TensorFlow to build a CNN model that consists of 4 layers.
Each layer contains one Conv2D, one batch normalization and one activation, in that order.
In detail, each layer is applied as:
def __call__(self, hidden):
    hidden = self._layers(hidden)
    hidden = self._norm(hidden)
    hidden = self._act(hidden)
    return hidden
The Conv2D layer is built with
tf.keras.layers.Conv2D(depth, kernel, stride, pad, **kwargs)
For example, in the first layer depth = 64, kernel = [4, 4], stride = 2 and pad = 'valid'.
The depth doubles at each subsequent layer while the other arguments stay the same.
The batch normalization is implemented by
self.scale = self.add_weight(
    'scale', input_shape[-1], tf.float32, 'Ones')
self.offset = self.add_weight(
    'offset', input_shape[-1], tf.float32, 'Zeros')

mean, var = tf.nn.moments(x, 0, keepdims=True)
tf.nn.batch_normalization(x, mean, var, self.offset, self.scale, 1e-3)
where x is the input, i.e. the output of the previous Conv2D layer.
The activation is tf.nn.elu.
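Putting the pieces above together, each layer presumably looks roughly like the following sketch. The class name ConvBlock and the exact wiring are mine, not the asker's; the batch-norm statistics are taken over axis 0 only, as described.
import tensorflow as tf

class ConvBlock(tf.keras.layers.Layer):
    """Conv2D -> custom batch norm -> ELU, as described above (illustrative)."""

    def __init__(self, depth, kernel=(4, 4), stride=2, pad='valid'):
        super().__init__()
        self._conv = tf.keras.layers.Conv2D(depth, kernel, stride, pad)

    def build(self, input_shape):
        depth = self._conv.filters
        self.scale = self.add_weight('scale', (depth,), tf.float32, 'Ones')
        self.offset = self.add_weight('offset', (depth,), tf.float32, 'Zeros')

    def call(self, hidden):
        hidden = self._conv(hidden)
        # Statistics over the batch axis only, so mean/var have shape (1, H, W, C).
        mean, var = tf.nn.moments(hidden, 0, keepdims=True)
        hidden = tf.nn.batch_normalization(hidden, mean, var,
                                           self.offset, self.scale, 1e-3)
        return tf.nn.elu(hidden)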
I checked GPU memory at each step.
Before the Conv2D layer, used GPU memory is 4 GB. The input data has shape (6490, 60, 80, 4) and dtype float16.
After the Conv2D layer and before BN, memory is 6 GB. The features have shape (6490, 29, 39, 64) and dtype float16.
After BN and before the activation, memory is 10 GB. The features have shape (6490, 29, 39, 64) and dtype float16.
After the activation, memory is 18 GB. The output has shape (6490, 29, 39, 64) and dtype float16.
TensorFlow is configured with tf.config.experimental.set_memory_growth(True).
After the 4 CNN layers complete and the model forwards to the next stage, the memory used still stays at 18 GB, so the next stage runs out of memory.
I am not sure whether there is a bug or this is simply due to the size of the input data. I really wonder why so much memory is used when passing through these layers during training, especially the BN and activation layers.
Any help will be appreciated!
System Info
OS : Ubuntu 20.04
Python : 3.8.10
Tensorflow : 2.11.0
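As an aside, the per-step memory numbers above can be obtained with something like the following sketch. The device string 'GPU:0' is an assumption on my part, and note that set_memory_growth takes a device object plus a boolean rather than just True.
import tensorflow as tf

# Enable memory growth for every visible GPU (must run before the GPUs are used).
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

def report(stage):
    # 'current' and 'peak' are reported in bytes; 'GPU:0' is assumed here.
    info = tf.config.experimental.get_memory_info('GPU:0')
    print(f"{stage}: current={info['current'] / 2**30:.1f} GB, "
          f"peak={info['peak'] / 2**30:.1f} GB")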

Tensorflow - No gradients provided for any variable while executing the mean_squared_error loss function

I am doing transfer learning with a pre-trained Inception-ResNet-v2 model. From one of the conv layers I am extracting the best activation (best quality) to calculate the predicted landmarks using OpenCV and NumPy operations. The loss function I am applying is mean_squared_error. Unfortunately, when I execute this function I get an error message saying that no gradients are available for any of the variables. I have been struggling with this problem for two weeks and I don't know how to proceed. While debugging I could see that the problem occurs when the apply_gradients function gets executed internally. I have searched and used some solutions from here, like these ones:
ValueError: No gradients provided for any variable in Tensorflow
selecting trainable variables to compute gradient "No variables to optimize"
Tensorflow: How to replace or modify gradient?
...
In addition, I have tried to write my own operation with gradient support, using this awesome tutorial: https://code-examples.net/en/q/253d718, because this solution wraps my Python and OpenCV code in TensorFlow. Unfortunately, the issue still remains. Tracing the path from the output of the network to the mean_squared_error function using TensorBoard, I could see that the path is available and continuous, too.
# Extracts the best predicted images from a specific activation layer
# PYTHON function: get_best_images(...) -> uses numpy and opencv
# PYTHON function: extract_landmarks(...) -> uses numpy

# end_points is the conv layer that gets extracted
best_predicted = tf.py_func(get_best_images,
                            [input, end_points['Conv2d_1a_3x3']],
                            tf.uint8)  # Gets best activation
best_predicted.set_shape(input.shape)

# Gets the predicted landmarks and processes both target and
# predicted for further calculation
proc_landmarks = tf.py_func(get_landmarks,
                            [best_predicted, target_landmarks],
                            [tf.int32, tf.int32])
proc_landmarks[0].set_shape(target_landmarks.shape)  # target landmarks
proc_landmarks[1].set_shape(target_landmarks.shape)  # predicted landmarks

# --> HERE COMES THE COMPUTATION TO PROCESS THE TARGET AND PREDICTED LANDMARKS

# Flattens and reshapes the tensors to (68, 1)
target_flatten = tf.reshape(target_result[0], [-1])
target_flatten = tf.reshape(target_flatten, [68, 1])
predicted_flatten = tf.reshape(predicted_result[1], [-1])
predicted_flatten = tf.reshape(predicted_flatten, [68, 1])
edit_target_landmarks = tf.cast(target_flatten, dtype=tf.float32)
edit_predicted_landmarks = tf.cast(predicted_flatten, dtype=tf.float32)

# Calculating the loss
mse_loss = tf.losses.mean_squared_error(labels=edit_target_landmarks,
                                        predictions=edit_predicted_landmarks)

optimizer = tf.train.AdamOptimizer(learning_rate=0.001,
                                   name='ADAM_OPT').minimize(mse_loss)  # <-- the error occurs here
The error message is this one (for brevity only some of the variables are listed):
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables ["'InceptionResnetV2/Conv2d_1a_3x3/weights:0' shape=(3, 3, 3, 32) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_1a_3x3/BatchNorm/beta:0' shape=(32,) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_2a_3x3/weights:0' shape=(3, 3, 32, 32) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_2a_3x3/BatchNorm/beta:0' shape=(32,) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_2b_3x3/weights:0' shape=(3, 3, 32, 64) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_2b_3x3/BatchNorm/beta:0' shape=(64,) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_3b_1x1/weights:0' shape=(1, 1, 64, 80) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_3b_1x1/BatchNorm/beta:0' shape=(80,) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_4a_3x3/weights:0' shape=(3, 3, 80, 192) dtype=float32_ref>", "'InceptionResnetV2/Conv2d_4a_3x3/BatchNorm/beta:0' shape=(192,) dtype=float32_ref>", "'InceptionResnetV2/Mixed_5b/Branch_0/Conv2d_1x1/weights:0' shape=(1, 1, 192, 96) dtype=float32_ref>", ...
EDIT:
I have managed to compute the gradients for the first two variables of the train list using this guide: Override Tensorflow Backward-Propagation. Based on that, I had forgotten the third parameter (mentioned as the d parameter in the guide) in the forward and backward propagation functions, which in my case is the conv layer output of the net. Nevertheless, only the first two gradients get computed and all the others are missing. Do I have to compute and return the gradient for every trainable variable in the backpropagation function? If I understand correctly, in the backpropagation function we compute the derivatives with respect to the op's inputs, which in my case are two variables (target and predicted) and one for the conv layer output (i.e. return grad * op.inputs[0], grad * op.inputs[1], grad * op.inputs[2]). I thought that the overall computation for all trainable variables happens after defining the custom gradient computation, when opt.compute_gradients is called with the variable list as its second parameter. Am I right or wrong?
I have posted the part of the TensorBoard output for the mean_squared_error op. The image shows an additional loss function which I had left out to simplify my problem; that loss function works fine. The arrow from the mean_squared_error function to the gradient computation is missing because of the issue. I hope this gives a better overview.
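For context, the pattern from guides like the one linked above usually looks roughly like the sketch below (TF1-style API, as in the question; all names are illustrative). The key point for the question is that the registered gradient function must return one gradient per input of the op, and it only restores gradient flow through the py_func itself; upstream variables still only receive gradients if every op between them and the loss is differentiable.
import numpy as np
import tensorflow.compat.v1 as tf  # the question uses the TF1 API

def py_func_with_grad(func, inputs, Tout, stateful=True, name=None, grad=None):
    """Wrap tf.py_func so that TensorFlow knows how to backpropagate through it."""
    # Register the Python gradient function under a unique name and tell the
    # graph to use it in place of the (missing) default PyFunc gradient.
    grad_name = 'PyFuncGrad' + str(np.random.randint(0, 10**8))
    tf.RegisterGradient(grad_name)(grad)
    g = tf.get_default_graph()
    with g.gradient_override_map({'PyFunc': grad_name}):
        return tf.py_func(func, inputs, Tout, stateful=stateful, name=name)

def _my_grad(op, grad):
    # One gradient must be returned for each of the op's inputs; the actual
    # expressions depend on what the wrapped Python function computes.
    return [grad * inp for inp in op.inputs]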

TensorFlow - predicting next word - loss function logit and target shape

I'm trying to create a language model. I have logits and targets of size [32, 312, 512], where:
.shape[0] is batch_size
.shape[1] is sequence_max_len
.shape[2] is vocabulary size
The question is: when I pass the logits and targets to the loss function as follows:
self.loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=self.logit, labels=self.y))
Does it compute the appropriate loss for the current batch, or should I reshape the logits and targets into shape [32, 312*512]?
Thanks in advance for your help!
The API documentation says about labels:
labels: Each row labels[i] must be a valid probability distribution.
If you are predicting one character at a time, you have a probability distribution (the probabilities of all characters sum to 1) over your vocabulary of size 512. Given your labels and unscaled logits of shape [32, 312, 512], you should reshape them into [32*312, 512] before calling the function. That way each row of your labels holds a valid probability distribution, the function itself converts your unscaled logits into a probability distribution, and then the loss is calculated.
The answer is: it's irrelevant, since tf.nn.softmax_cross_entropy_with_logits() has a dim argument:
dim: The class dimension. Defaulted to -1 which is the last dimension.
name: A name for the operation (optional).
Also inside tf.nn.softmax_cross_entropy_with_logits() you have this code:
# Make precise_logits and labels into matrices.
precise_logits = _flatten_outer_dims(precise_logits)
labels = _flatten_outer_dims(labels)
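A small sketch, with random data standing in for a real model, illustrating that both answers agree: passing the 3-D tensors directly and reshaping to [32*312, 512] give the same value, because the class dimension is the last axis either way. The tensor names here are illustrative.
import tensorflow as tf

batch, steps, vocab = 32, 312, 512
logits = tf.random.normal([batch, steps, vocab])
labels = tf.one_hot(
    tf.random.uniform([batch, steps], maxval=vocab, dtype=tf.int32), vocab)

# Loss on the 3-D tensors: the class dimension defaults to the last axis.
loss_3d = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Same loss after flattening the batch and time dimensions into rows.
loss_2d = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        labels=tf.reshape(labels, [-1, vocab]),
        logits=tf.reshape(logits, [-1, vocab])))

# loss_3d and loss_2d are identical up to floating-point rounding.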

TensorFlow Batch Normalization Dimension

I'm trying to use batch normalization in a conv2d_transpose as follows:
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME',
                                kernel_initializer=tf.variance_scaling_initializer,
                                bias_initializer=tf.ones_initializer,
                                activity_regularizer=tf.layers.batch_normalization,
                                )
And I am receiving the following error:
ValueError: Dimension 1 in both shapes must be equal, but are 32 and 64
From merging shape 2 with other shapes. for 'tower0/AddN' (op: 'AddN') with input shapes: [?,32,32,64], [?,64,64,3].
I've seen that other people have had this error in Keras because of the difference in dimension ordering between TensorFlow and Theano. However, I'm using pure TensorFlow, all of my variables are in TensorFlow dimension format (batch_size, height, width, channels), and the data_format of the conv2d_transpose layer should be the default 'channels_last'. What am I missing here?
tf.layers.batch_normalization should be added as a layer, not a regularizer. activity_regularizer is a function that takes the activity (the layer's output) and produces an extra loss term that is added to the overall loss of the whole network. For example, you might want to penalize networks that produce high activations. You can see how activity_regularizer is called on the outputs and its result added to the loss here.
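A minimal sketch of the fix implied by the answer, using the same TF1-style layers as the question; is_training is an assumed boolean placeholder/flag, not from the original code, and the initializers are omitted for brevity.
# Apply batch norm as its own layer on the outputs, instead of passing it
# as activity_regularizer.
h1 = tf.layers.conv2d_transpose(inputs, 64, 4, 2, padding='SAME')
h1 = tf.layers.batch_normalization(h1, training=is_training)

h2 = tf.layers.conv2d_transpose(h1, 3, 4, 2, padding='SAME')
h2 = tf.layers.batch_normalization(h2, training=is_training)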

Tensorflow reshape tensor gives None dimension

I have used the model described here on the 0.6.0 branch. The code can be found here. I have made some minor changes to the linked code.
In my code I create two models, one for training and one for validation, very similar as it is done in the Tensorflow Tutorial.
with tf.variable_scope("model", reuse=None, initializer=initializer):
m = PTBModel_User(is_training=True, config=config, name='Training model')
with tf.variable_scope("model", reuse=True, initializer=initializer):
mtest = PTBModel_User(is_training=False, config=config_valid, name='Validation model')
The first model, the one for training, seems to be created just fine, but the second one, used for validation, is not: the output gets a None dimension! The row I'm referring to is row 134 in the linked code:
output = tf.reshape(tf.concat(1, outputs), [-1, size])
I've added these lines right after the reshape of the output:
output_shape = output.get_shape()
print("Model num_steps:", num_steps)
print("Model batch_size:", batch_size)
print("Output dims", output_shape[0], output_shape[1])
and that gives me this:
Model num_steps: 400
Model batch_size: 1
Output dims Dimension(None) Dimension(650)
This problem only happens with the 'validation model', not with the 'training model'. For the 'training model' I get the expected output:
Model num_steps: 400
Model batch_size: 2
Output dims Dimension(800) Dimension(650)
(Note that with the 'validation model' I use batch_size=1 instead of the batch_size=2 that I use for the training model.)
From what I understand, using -1 as input to the reshape function will figure out the output shape automagically! But then why do I get None? Nothing in the config fed to the model has a None value.
Thank you for all the help and tips!
TL;DR: A dimension being None simply means that shape inference could not determine an exact shape for the output tensor, at graph-building time. When you run the graph, the tensor will have the appropriate run-time shape.
If you're not interested in how shape inference works, you can stop reading now.
Shape inference applies local rules, based on a "shape function" that takes the shapes of the inputs to an operation and computes (possibly incomplete) shapes for the outputs of an operation. To figure out why tf.reshape() gives an incomplete shape, we have to look at its inputs, and work backwards:
The shape argument to tf.reshape() includes a [-1], which means "figure the output shape automagically" based on the shape of the tensor input.
The tensor input is the output of tf.concat() on the same line.
The inputs to tf.concat() are computed by a tf.mul() in BasicLSTMCell.__call__(). The tf.mul() op multiplies the result of a tf.tanh() and a tf.sigmoid() op.
The tf.tanh() op produces an output of size [?, hidden_size], and the tf.sigmoid() op produces an output of size [batch_size, hidden_size].
The tf.mul() op performs NumPy-style broadcasting. A dimension will only be broadcast if it has size 1. Consider three cases where we compute tf.mul(x, y):
If x has shape [1, 10], and y has shape [5, 10], then broadcasting will happen, and the output shape will be [5, 10].
If x has shape [1, 10], and y has shape [1, 10], then there will be no broadcasting, and the output shape will be [1, 10].
However, if x has shape [1, 10], and y has shape [?, 10], there is insufficient static information to tell whether broadcasting will happen (even though we happen to know that case 2 applies at runtime).
Therefore, when batch_size is 1, the tf.mul() op produces an output with the shape [?, hidden_size]; but when batch_size is greater than 1, the output shape is [batch_size, hidden_size].
Where shape inference breaks down, it can be appropriate to use the Tensor.set_shape() method to add information. This would potentially be useful in the BasicLSTMCell implementation, where we know more than it is possible to infer about the shapes of the outputs.
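Following that last suggestion, here is a minimal sketch of how the question's reshape could be given the missing static shape; batch_size, num_steps and size are the Python ints already used in the model, and tf.concat keeps the old argument order from the question's TensorFlow version.
# After the reshape on row 134, add the static shape that inference could
# not derive on its own (this only fills in metadata, it does not change
# the runtime shape).
output = tf.reshape(tf.concat(1, outputs), [-1, size])
output.set_shape([batch_size * num_steps, size])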