Parameters in tf.contrib.seq2seq.sequence_loss - tensorflow

I'm trying to use the tf.contrib.seq2seq.sequence_loss function in an RNN model to calculate the loss.
According to the API documentation, this function requires at least three parameters: logits, targets and weights:
sequence_loss(
    logits,
    targets,
    weights,
    average_across_timesteps=True,
    average_across_batch=True,
    softmax_loss_function=None,
    name=None
)
logits: A Tensor of shape [batch_size, sequence_length, num_decoder_symbols] and dtype float. The logits correspond to the prediction across all classes at each timestep.
targets: A Tensor of shape [batch_size, sequence_length] and dtype int. The target represents the true class at each timestep.
weights: A Tensor of shape [batch_size, sequence_length] and dtype float. weights constitutes the weighting of each prediction in the sequence. When using weights as masking, set all valid timesteps to 1 and all padded timesteps to 0, e.g. a mask returned by tf.sequence_mask.
average_across_timesteps: If set, sum the cost across the sequence dimension and divide the cost by the total label weight across timesteps.
average_across_batch: If set, sum the cost across the batch dimension and divide the returned cost by the batch size.
softmax_loss_function: Function (labels, logits) -> loss-batch to be used instead of the standard softmax (the default if this is None). Note that to avoid confusion, it is required for the function to accept named arguments.
name: Optional name for this operation, defaults to "sequence_loss".
My understanding is that the logits are my predictions after applying Xw + b, so their shape should be [batch_size, sequence_length, output_size]. The targets should then be my labels, but the required shape is [batch_size, sequence_length]. I had assumed my labels should have the same shape as the logits.
So how do I convert the 3-D labels to 2-D? Thanks in advance.

Your targets (labels) don't need to have the same shape as the logits.
If we ignore batch_size (which is not relevant to your question) for a moment, this API simply calculates the loss between two sequences as a weighted sum of the per-word losses. Suppose vocab_size is 5 and we get a target word 3; the logits provide a prediction for this target with a vector like [0.2, 0.1, 0.15, 0.4, 0.15].
To calculate the loss between the target and the prediction, the target does not need to be one-hot encoded as [0, 0, 0, 1, 0] to match the prediction's shape; TensorFlow does this internally.
You may refer to the distinction between the two APIs: softmax_cross_entropy_with_logits and sparse_softmax_cross_entropy_with_logits.
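For instance, here is a minimal sketch (TF 1.x, with made-up numbers) showing that the sparse API with an integer label gives the same value as the dense API with the equivalent one-hot label:
import tensorflow as tf

# Logits for a single prediction over a vocabulary of 5 symbols.
logits = tf.constant([[2.0, 1.0, 1.5, 4.0, 1.5]])

# Sparse variant: the label is just the class index 3.
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.constant([3]), logits=logits)

# Dense variant: the label is the equivalent one-hot vector.
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=tf.constant([[0.0, 0.0, 0.0, 1.0, 0.0]]), logits=logits)

with tf.Session() as sess:
    print(sess.run([sparse_loss, dense_loss]))  # two equal values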

Your labels should be a 2-D matrix of shape [batch_size, sequence_length], and your logits should be a 3-D tensor of shape [batch_size, sequence_length, output_size]. Therefore you don't need to extend your labels' dimensions if they are already of shape [batch_size, sequence_length].
In case you do want to extend the dimensions, you can do it like this: expanded_variable = tf.expand_dims(the_variable_you_want_to_expand, axis=-1).
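Putting the shapes together, here is a minimal sketch of a sequence_loss call (TF 1.x; the sizes and sequence lengths are made up for illustration):
import tensorflow as tf

batch_size, seq_len, vocab_size = 4, 10, 5000

logits = tf.random_normal([batch_size, seq_len, vocab_size])  # 3-D predictions
targets = tf.zeros([batch_size, seq_len], dtype=tf.int32)     # 2-D class indices

# Mask the padded timesteps, assuming the true sequence lengths are known.
weights = tf.sequence_mask([10, 7, 8, 10], maxlen=seq_len, dtype=tf.float32)

loss = tf.contrib.seq2seq.sequence_loss(logits, targets, weights)  # scalar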

tf.contrib.seq2seq.sequence_loss is deprecated; use tfa.seq2seq.sequence_loss from TensorFlow Addons instead:
import tensorflow as tf
import tensorflow_addons as tfa

tfa.seq2seq.sequence_loss(
    logits: tfa.types.TensorLike,
    targets: tfa.types.TensorLike,
    weights: tfa.types.TensorLike,
    average_across_timesteps: bool = True,
    average_across_batch: bool = True,
    sum_over_timesteps: bool = False,
    sum_over_batch: bool = False,
    softmax_loss_function: Optional[Callable] = None,
    name: Optional[str] = None
) -> tf.Tensor
https://www.tensorflow.org/addons/api_docs/python/tfa/seq2seq/sequence_loss
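The call itself is essentially unchanged; a minimal usage sketch under TF 2.x (sizes made up for illustration):
import tensorflow as tf
import tensorflow_addons as tfa

logits = tf.random.normal([4, 10, 5000])     # [batch, time, vocab]
targets = tf.zeros([4, 10], dtype=tf.int64)  # [batch, time]
weights = tf.sequence_mask([10, 7, 8, 10], maxlen=10, dtype=tf.float32)

loss = tfa.seq2seq.sequence_loss(logits, targets, weights)
print(loss)  # scalar tensor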

Related

Logits representation in TensorFlow’s sparse_softmax_cross_entropy

I have a question regarding the sparse_softmax_cross_entropy cost function in TensorFlow.
I want to use it in a semantic segmentation context, where I use an autoencoder architecture with typical convolution operations to downsample images into a feature vector. This vector is then upsampled (using conv2d_transpose and one-by-one convolutions) to create an output image.
Hence, my input consists of single-channel images of shape (1, 128, 128, 1), where the first index represents the batch size and the last one the number of channels. The pixels of the image are currently either 0 or 1, so each pixel is mapped to a class. The output image of the autoencoder follows the same rules. Hence, I can't use any predefined cost function other than MSE or the previously mentioned one.
The network works fine with MSE, but I can't get it working with sparse_softmax_cross_entropy. It seems like this is the correct cost function in this context, but I'm a bit confused about the representation of the logits. The official docs say that the logits should have the shape (d_i, ..., d_n, num_classes). I tried to ignore the num_classes part, but this causes an error which says that only the interval [0, 1) is allowed. Of course, I need to specify the number of classes, which would turn the allowed interval into [0, 2), because the exclusive upper bound is obviously num_classes.
Could someone please explain how to turn my output image into the required logits?
The current code for the cost function is:
self._loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss"))
The squeeze removes the last dimension of the label input, giving the labels a shape of [1, 128, 128]. This causes the following exception:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1).
Edit:
As requested, here's a minimal example to verify the behavior of the cost function in the context of fully convolutional nets:
Constructor snippet:
def __init__(self, img_channels=1, img_width=128, img_height=128):
    ...
    self._loss_op = None
    self._learning_rate_placeholder = tf.placeholder(tf.float32, [], 'lr')
    self._input_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'x')
    self._target_placeholder = tf.placeholder(tf.float32, [None, img_width, img_height, img_channels], 'y')
    self._model = self.build_model()
    self.init_optimizer()
build_model() snippet:
def build_model(self):
    with tf.variable_scope('conv1', reuse=tf.AUTO_REUSE):
        # not necessary
        x = tf.reshape(self._input_placeholder, [-1, self._img_width, self._img_height, self._img_channels])
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
    with tf.variable_scope('conv2', reuse=tf.AUTO_REUSE):
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)
    with tf.variable_scope('conv3_red', reuse=tf.AUTO_REUSE):
        conv3 = tf.layers.conv2d(conv2, 1024, 30, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv4_red', reuse=tf.AUTO_REUSE):
        conv4 = tf.layers.conv2d(conv3, 64, 1, strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv5_up', reuse=tf.AUTO_REUSE):
        conv5 = tf.layers.conv2d_transpose(conv4, 32, (128, 128), strides=1, activation=tf.nn.relu)
    with tf.variable_scope('conv6_1x1', reuse=tf.AUTO_REUSE):
        conv6 = tf.layers.conv2d(conv5, 1, 1, strides=1, activation=tf.nn.relu)
    return conv6
init_optimizer() snippet:
def init_optimizer(self):
    self._loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tf.squeeze(self._target_placeholder, [3]), logits=self._model, name="Loss"))
    optimizer = tf.train.AdamOptimizer(learning_rate=self._learning_rate_placeholder)
    self._train_op = optimizer.minimize(self._loss_op)
By definition, a logit is an unscaled probability (strictly speaking, log-odds), or, simply put, any real number. A sequence of logits of length num_classes can be interpreted as an unscaled probability distribution. For example, in your case num_classes=2, so logits=[125.0, -10.0] is an unscaled probability distribution for one pixel (which clearly favors class 0 over class 1). This array can be squashed into a valid distribution by a softmax, and this is what tf.sparse_softmax_cross_entropy does internally. For [125.0, -10.0] the squashed distribution will be very close to [1.0, 0.0].
Once again, this length-2 array is for a single pixel.
If you want to compute the cross-entropy over the entire image, the network has to output the binary distribution for all pixels and all images in a batch, i.e. output a [batch_size, 128, 128, 2] tensor. The term sparse in the name of the loss refers to the fact that the labels are not one-hot encoded (more details here). It's most useful when the number of classes is large, i.e. when one-hot encoding becomes too inefficient in terms of memory, but in your case it's insignificant. If you decide to use the tf.sparse_softmax_cross_entropy loss, the labels must be [batch_size, 128, 128], must be tf.int32 or tf.int64, and must contain the correct class indices, zero or one. That's it: TensorFlow can compute the cross-entropy between these two arrays.
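A hedged sketch of the two changes this implies in the snippets above (not a complete drop-in fix): the final 1x1 convolution outputs two channels with no activation, since logits are unscaled scores and may be negative, and the float labels are cast to an integer type:
# Final 1x1 convolution: one logit per class, no ReLU.
conv6 = tf.layers.conv2d(conv5, 2, 1, strides=1, activation=None)
...
# Labels: per-pixel class indices of shape [batch, 128, 128], int type.
labels = tf.cast(tf.squeeze(self._target_placeholder, [3]), tf.int32)
self._loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=self._model, name="Loss"))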

How to reset the state of a GRU in tensorflow after every epoch

I am using the TensorFlow GRU cell to implement an RNN. I am using it with videos that last up to 5 minutes. Since the next state is fed automatically into the GRU, how can I manually reset the state of the RNN after each epoch? In other words, I want the initial state at the beginning of training to always be 0. Here is a snippet of my code:
with tf.variable_scope('GRU'):
    latent_var = tf.reshape(latent_var, shape=[batch_size, time_steps, latent_dim])
    cell = tf.nn.rnn_cell.GRUCell(cell_size)
    H, C = tf.nn.dynamic_rnn(cell, latent_var, dtype=tf.float32)
    H = tf.reshape(H, [batch_size, cell_size])
....
Any help is much appreciated!
Use the initial_state argument of tf.nn.dynamic_rnn:
initial_state: (optional) An initial state for the RNN. If cell.state_size is an integer, this must be a Tensor of appropriate type and shape [batch_size, cell.state_size]. If cell.state_size is a tuple, this should be a tuple of tensors having shapes [batch_size, s] for s in cell.state_size.
An adapted example from the documentation:
# create a GRUCell
cell = tf.nn.rnn_cell.GRUCell(cell_size)

# define the initial (zero) state
initial_state = cell.zero_state(batch_size, dtype=tf.float32)

# 'outputs' is a tensor of shape [batch_size, max_time, cell_state_size]
# 'state' is a tensor of shape [batch_size, cell_state_size]
outputs, state = tf.nn.dynamic_rnn(cell, input_data,
                                   initial_state=initial_state,
                                   dtype=tf.float32)
Also note that, despite initial_state not being a placeholder, you can still feed a value to it. So if you wish to preserve the state within an epoch but start with a zero state at the beginning of each epoch, you can do it like this:
# Compute the zero-state array of the right shape once
zero_state = sess.run(initial_state)

# Start with a zero vector and update it after every batch
cur_state = zero_state
for batch in get_batches():
    cur_state, _ = sess.run([state, ...], feed_dict={initial_state: cur_state, ...})

tensorflow cross entropy loss for sequence with different lengths

I'm building a seq2seq model with LSTMs using TensorFlow. The loss function I'm using is the softmax cross-entropy loss. The problem is that my input sequences have different lengths, so I padded them. The output of the model has the shape [max_length, batch_size, vocab_size]. How can I calculate the loss so that the zero-padded values don't affect it? tf.nn.softmax_cross_entropy_with_logits provides an axis parameter, so we can calculate the loss over 3 dimensions, but it doesn't provide weights. tf.losses.softmax_cross_entropy provides a weights parameter, but it receives input of shape [batch_size, nclass (vocab_size)]. Please help!
I think you'd have to write your own loss function. Check out https://danijar.com/variable-sequence-lengths-in-tensorflow/.
In this case you need to pad the logits and labels so that they have the same length. So, if you have a logits tensor of size (batch_size, length, vocab_size) and a labels tensor of size (batch_size, length), where length is the size of your sequence, you first have to pad them to the same length:
def _pad_tensors_to_same_length(logits, labels):
    """Pad logits and labels so that the results have the same length (second dimension)."""
    with tf.name_scope("pad_to_same_length"):
        logits_length = tf.shape(logits)[1]
        labels_length = tf.shape(labels)[1]
        max_length = tf.maximum(logits_length, labels_length)
        # Pad along the time dimension only, leaving batch (and vocab) alone.
        logits = tf.pad(logits, [[0, 0], [0, max_length - logits_length], [0, 0]])
        labels = tf.pad(labels, [[0, 0], [0, max_length - labels_length]])
        return logits, labels
Then you can compute the padded cross-entropy:
def padded_cross_entropy_loss(logits, labels, vocab_size):
    """Calculate cross entropy loss while ignoring padding.

    Args:
        logits: Tensor of size [batch_size, length_logits, vocab_size]
        labels: Tensor of size [batch_size, length_labels]
        vocab_size: int size of the vocabulary

    Returns:
        The per-token cross entropy loss, with padding positions zeroed out.
    """
    with tf.name_scope("loss", values=[logits, labels]):
        logits, labels = _pad_tensors_to_same_length(logits, labels)
        # Calculate cross entropy; the dense variant expects one-hot targets,
        # so expand the integer labels over the vocabulary.
        with tf.name_scope("cross_entropy", values=[logits, labels]):
            targets = tf.one_hot(tf.cast(labels, tf.int32), depth=vocab_size)
            xentropy = tf.nn.softmax_cross_entropy_with_logits_v2(
                logits=logits, labels=targets)
        # Zero out the loss wherever the label is the padding id 0.
        weights = tf.to_float(tf.not_equal(labels, 0))
        return xentropy * weights
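The function returns the per-token loss, so to train on it you would typically reduce it to a scalar, normalizing by the number of real (non-padding, i.e. non-zero-labelled) tokens. A sketch with made-up shapes:
# Hypothetical tensors: 4 sequences, logits padded to 12 steps, labels to 10.
logits = tf.random_normal([4, 12, 5000])
labels = tf.random_uniform([4, 10], maxval=5000, dtype=tf.int32)

per_token_loss = padded_cross_entropy_loss(logits, labels, vocab_size=5000)

# Average only over the real tokens (label id 0 is treated as padding).
num_real_tokens = tf.reduce_sum(tf.to_float(tf.not_equal(labels, 0)))
loss = tf.reduce_sum(per_token_loss) / num_real_tokens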
The function below takes two tensors of shape (batch_size, time_steps, vocab_len) and computes a mask that zeroes out the time steps corresponding to padding; the mask removes the padding loss from the categorical cross-entropy.
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

# padding labels are one-hot vectors with the 1 at index 0;
# vocab_len is assumed to be defined in the enclosing scope
def mask_loss(y_true, y_pred):
    mask_value = np.zeros((vocab_len))
    mask_value[0] = 1
    # find out which timesteps in `y_true` are not the padding character
    mask = K.equal(y_true, mask_value)
    mask = 1 - K.cast(mask, K.floatx())
    mask = K.sum(mask, axis=2) / 2
    # multiply the loss by the mask, so the loss for padding timesteps is zero
    loss = tf.keras.layers.multiply([K.categorical_crossentropy(y_true, y_pred), mask])
    return K.sum(loss) / K.sum(mask)
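A usage sketch, assuming a Keras model whose predictions and targets are both of shape (batch_size, time_steps, vocab_len) with index 0 reserved for the padding symbol (the model and data names here are hypothetical):
# Compile with the custom masked loss instead of plain categorical_crossentropy.
model.compile(optimizer='adam', loss=mask_loss)
model.fit(inputs, one_hot_targets, epochs=10)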

Understanding Tensor Inputs & Transformations for use in an LSTM (dynamic RNN)

I am building an LSTM-style neural network in TensorFlow and am having some difficulty understanding exactly what input is needed, and the subsequent transformations made by tf.nn.dynamic_rnn, before it is passed to the sparse_softmax_cross_entropy_with_logits layer.
https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn
Understanding the input
The input function is sending a feature tensor in the form
[batch_size, max_time]
However the manual states that input tensors must be in the form
[batch_size, max_time, ...]
I have therefore expanded the input with an extra dimension of size 1, giving the form
[batch_size, max_time, 1]
At this point the input does not break upon running, but I don't understand exactly what we have done here, and I suspect it may be causing problems when calculating the loss (see below).
Understanding the Transformations
This expanded tensor is then the 'features' tensor used in the code below:
LSTM_SIZE = 3
lstm_cell = rnn.BasicLSTMCell(LSTM_SIZE, forget_bias=1.0)
outputs, _ = tf.nn.dynamic_rnn(lstm_cell, features, dtype=tf.float64)

# slice to keep only the last cell of the RNN
outputs = outputs[-1]

# softmax layer
with tf.variable_scope('softmax'):
    W = tf.get_variable('W', [LSTM_SIZE, n_classes], dtype=tf.float64)
    b = tf.get_variable('b', [n_classes], initializer=tf.constant_initializer(0.0), dtype=tf.float64)

logits = tf.matmul(outputs, W) + b
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels))
This throws a value error at loss
dimensions must be equal, but are [max_time, num_classes] and [batch_size]
from https://www.tensorflow.org/versions/r0.12/api_docs/python/nn/classification -
A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size]. But higher dimensions are supported.
At some point in the process, max_time and batch_size have been mixed up, and I'm uncertain whether it happens at the input or during the LSTM. I'm grateful for any advice!
That is because of the shape of the output of the tf.nn.dynamic_rnn. From its documentation https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn:
outputs: The RNN output Tensor.
If time_major == False (default), this will be a Tensor shaped: [batch_size, max_time, cell.output_size].
If time_major == True, this will be a Tensor shaped: [max_time, batch_size, cell.output_size].
You are in the default case, so your outputs has shape [batch_size, max_time, output_size], and when performing outputs[-1] you obtain a tensor of shape [max_time, output_size]. Slicing with outputs[:, -1] instead should fix it.
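A quick way to see the difference between the two slices (made-up sizes):
import tensorflow as tf

outputs = tf.zeros([4, 7, 3])  # [batch_size, max_time, output_size]

print(outputs[-1].shape)     # (7, 3): the last *batch element*, all timesteps
print(outputs[:, -1].shape)  # (4, 3): the last timestep of every batch element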

Initialize a variable with placeholder as shape

I want to initialize the weights variable to include the batch-size dimension, which will differ between the training and prediction stages. I tried using a placeholder for that, but it doesn't seem to work:
batchsize = tf.placeholder(tf.int32, name='batchsize', shape=[])
...
output, state = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32, initial_state=inState)
weights = tf.Variable(tf.truncated_normal([batchsize, CELL_SIZE, 1], 0.0, 1.0), name='weights')
bias = tf.Variable(tf.zeros(1), name='bias')
preds = tf.add(tf.matmul(output, weights), bias, name='preds')
loss = tf.reduce_mean(tf.squared_difference(preds, Y_))
train_step = tf.train.AdamOptimizer(LR).minimize(loss)
I can get it to work by specifying batchsize as a constant for the weights variable's dimension, as opposed to a placeholder, but this way I get an error when I try to restore the session for the prediction stage, because there the batch size is 1. If I specify the placeholder, I get the error:
ValueError: initial_value must have a shape specified: Tensor("truncated_normal:0", shape=(?, 32, 1), dtype=float32)
Even though I do pass the value for the batchsize placeholder into the feed_dict when running this part of the graph.
If I specify the option validate_shape=False while creating the weights variable, that stage of the graph works, but later I get this error in AdamOptimizer:
ValueError: as_list() is not defined on an unknown TensorShape.
How can I get this to work? My ultimate goal is to reduce the cell-size dimension of the dynamic_rnn output down to 1, to predict the output at each time step of the RNN.
Create the variable at its full (maximum) size, then take the slice of it corresponding to the actual batch size (using tf.gather):
self.model_X = tf.placeholder(dtype=tf.float32, shape=[None, 100], name='X')
real_batch_size = tf.cast(tf.shape(self.model_X)[0], tf.int32)
self.y_dk = tf.get_variable(
    name="y_dk", dtype=tf.float32,
    initializer=tf.truncated_normal(shape=[self.num_doc, self.num_topic],
                                    mean=0, stddev=tf.truediv(1.0, self.lambda_y)))
# gather only the rows of the full variable that belong to the current batch
batch_y_dk = tf.reshape(tf.gather(self.y_dk, self.model_batch_data_idx),
                        [real_batch_size, self.num_topic])
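Applied to the weights variable from the question, the same pattern might look like this (a sketch; MAX_BATCH_SIZE is an assumed upper bound, and CELL_SIZE and X come from the question's code):
# Create the variable at a fixed maximum batch size once...
MAX_BATCH_SIZE = 128  # assumed upper bound on the batch size
all_weights = tf.Variable(
    tf.truncated_normal([MAX_BATCH_SIZE, CELL_SIZE, 1], 0.0, 1.0), name='weights')

# ...then slice out the rows for the actual batch at graph-execution time.
real_batch_size = tf.shape(X)[0]
weights = tf.gather(all_weights, tf.range(real_batch_size))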