TensorFlow - predicting next word - loss function logit and target shape

I'm trying to create a language model. I have logits and targets of size [32, 312, 512], where:
.shape[0] is batch_size
.shape[1] is sequence_max_len
.shape[2] is vocabulary size
The question is - when I pass logit and target to the loss function as follows:
self.loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=self.logit, labels=self.y))
Does it compute the appropriate loss for the current batch? Or should I reshape the logits and targets to the following shape: [32, 312*512]?
Thanks in advance for your help!

The API documentation says this about labels:
labels: Each row labels[i] must be a valid probability distribution
If you are predicting one character at a time, you have a probability distribution over your vocabulary of size 512 (the probabilities of being each character sum to 1). Given that your labels and unscaled logits have shape [32, 312, 512], you should reshape both into [32*312, 512] before calling the function. That way each row of your labels is a valid probability distribution; the function itself converts your unscaled logits into a probability distribution, and the loss is then calculated.
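A minimal sketch of that reshape (dummy tensors stand in for self.logit and self.y from the question):
import tensorflow as tf

logit = tf.random.normal([32, 312, 512])  # stand-in for self.logit
y = tf.one_hot(tf.random.uniform([32, 312], maxval=512, dtype=tf.int32),
               depth=512)                 # stand-in for self.y

# Collapse batch and time so each row is one 512-way distribution.
logits_2d = tf.reshape(logit, [-1, 512])  # [32*312, 512]
labels_2d = tf.reshape(y, [-1, 512])      # [32*312, 512]
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
        logits=logits_2d, labels=labels_2d))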

The answer is: it's irrelevant, since tf.nn.softmax_cross_entropy_with_logits() has a dim argument:
dim: The class dimension. Defaulted to -1 which is the last dimension.
Also, inside tf.nn.softmax_cross_entropy_with_logits() there is this code:
# Make precise_logits and labels into matrices.
precise_logits = _flatten_outer_dims(precise_logits)
labels = _flatten_outer_dims(labels)
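A quick sketch to convince yourself (small dummy shapes, not the question's model): the per-position losses from the rank-3 call match those from manually flattening the outer dimensions.
import tensorflow as tf

logits = tf.random.normal([2, 3, 5])  # batch=2, time=3, vocab=5
labels = tf.one_hot(tf.random.uniform([2, 3], maxval=5, dtype=tf.int32),
                    depth=5)

# Rank-3 call: the op flattens the outer dims internally.
loss_3d = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)

# Manual flattening gives the same values, just reshaped.
loss_2d = tf.nn.softmax_cross_entropy_with_logits(
    logits=tf.reshape(logits, [-1, 5]),
    labels=tf.reshape(labels, [-1, 5]))

# tf.reduce_mean of either tensor yields the same scalar loss.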

Related

Problem with tensor shape when implementing a custom loss function for my model in Tensorflow

I picked up the idea of triplet loss and global orthogonal regularization from this paper: http://cs230.stanford.edu/projects_fall_2019/reports/26251543.pdf. However, I keep running into a tensor shape error.
After I define modelv1 as the base model (modelv1 takes an input of shape (None, 224, 224, 3) and returns a tensor of shape (None, 64)), the complete model is defined as follows:
input_shape = (3, 224, 224, 3)
input_all = Input(shape=input_shape)
input_anchor = input_all[:, 0, :]
input_pos = input_all[:, 1, :]
input_neg = input_all[:, 2, :]
output_anchor = modelv1(input_anchor)
output_pos = modelv1(input_pos)
output_neg = modelv1(input_neg)
model = Model(inputs=input_all, outputs=[output_anchor, output_pos, output_neg])
The formula for the triplet loss with global orthogonal regularization, as provided in the paper mentioned above, is:
[Image: formula (3) for the loss function]
I implemented this formula as follows:
def triplet_loss_with_margin(margin=0.4, d=64, alpha=1.1):
    def triplet_loss(y_true, y_pred):
        """
        Implementation of the triplet loss as defined by formula (3)

        Arguments:
        y_true -- true labels, required when you define a loss in Keras; you don't need it in this function.
        y_pred -- python list containing three objects:
            anchor -- the encodings for the anchor images, of shape (None, 64)
            positive -- the encodings for the positive images, of shape (None, 64)
            negative -- the encodings for the negative images, of shape (None, 64)

        Returns:
        loss -- real number, value of the loss
        """
        anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
        # Step 1: Compute the (encoding) distance between the anchor and the positive
        pos_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, positive)), axis=-1)
        # Step 2: Compute the (encoding) distance between the anchor and the negative
        neg_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, negative)), axis=-1)
        # Step 3: subtract the two previous distances and add the margin.
        basic_loss = tf.math.add(tf.math.subtract(pos_dist, neg_dist), margin)
        # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
        loss = tf.math.reduce_sum(tf.math.maximum(basic_loss, 0.0))
        # Add the global orthogonal regularization term.
        dot_product = tf.matmul(anchor, tf.transpose(negative))
        multiply_2_vectors_value = tf.linalg.diag_part(dot_product)
        M1 = tf.math.reduce_sum(multiply_2_vectors_value, axis=-1)
        M2 = tf.math.square(multiply_2_vectors_value)
        M2 = tf.math.maximum(tf.math.subtract(M2, 1 / d), 0.0)
        M2 = tf.math.reduce_sum(M2, axis=-1)
        loss += alpha * (tf.math.square(M1) + M2)
        return loss
    return triplet_loss
I assumed that since anchor and negative both have shape (None, 64), this approach should work. However, when I trained the model, I encountered the error below:
ValueError: in user code:
/opt/conda/lib/python3.7/site-packages/keras/engine/training.py:853 train_function *
return step_function(self, iterator)
/tmp/ipykernel_24/1319124991.py:34 triplet_loss *
dot_product=tf.matmul(anchor,tf.transpose(negative))
/opt/conda/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206 wrapper **
return target(*args, **kwargs)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3655 matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py:5714 mat_mul
name=name)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
attrs=attr_protos, op_def=op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:601 _create_op_internal
compute_device)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3569 _create_op_internal
op_def=op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2042 __init__
control_input_ops, op_def)
/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1883 _create_c_op
raise ValueError(str(e))
ValueError: Shape must be rank 2 but is rank 1 for '{{node triplet_loss/MatMul}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false](triplet_loss/strided_slice, triplet_loss/transpose)' with input shapes: [64], [64].
From what I understand, the error is caused because when executing dot_product=tf.matmul(anchor,tf.transpose(negative)), anchor and negative only have shape (64,), which causes the error. But shouldn't anchor and negative be of shape (batch_size, 64)? I really cannot understand what I did wrong. Could you please enlighten me about this? Thank you.
I tried to debug by implementing an independent function to test:
def triplet_loss(y_pred):
    """
    Implementation of the triplet loss as defined by formula (3)

    Arguments:
    y_pred -- python list containing three objects:
        anchor -- the encodings for the anchor images, of shape (None, 64)
        positive -- the encodings for the positive images, of shape (None, 64)
        negative -- the encodings for the negative images, of shape (None, 64)

    Returns:
    loss -- real number, value of the loss
    """
    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]
    # Step 1: Compute the (encoding) distance between the anchor and the positive
    pos_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, positive)), axis=-1)
    # Step 2: Compute the (encoding) distance between the anchor and the negative
    neg_dist = tf.math.reduce_sum(tf.math.square(tf.math.subtract(anchor, negative)), axis=-1)
    # Step 3: subtract the two previous distances and add the margin.
    basic_loss = tf.math.add(tf.math.subtract(pos_dist, neg_dist), 0.4)
    # Step 4: Take the maximum of basic_loss and 0.0. Sum over the training examples.
    loss = tf.math.reduce_sum(tf.math.maximum(basic_loss, 0.0))
    # Add the regularization term.
    print("anchor shape: ", anchor.shape)
    print("neg shape: ", negative.shape)
    dot_product = tf.matmul(anchor, tf.transpose(negative))
    multiply_2_vectors_value = tf.linalg.diag_part(dot_product)
    M1 = tf.math.reduce_sum(multiply_2_vectors_value, axis=-1)
    M2 = tf.math.square(multiply_2_vectors_value)
    mask = tf.math.maximum(tf.math.subtract(M2, 1 / 64), 0.0)
    M2 = tf.math.reduce_sum(M2, axis=-1)
    loss += 1.1 * (tf.math.square(M1) + M2)
    return loss
And it works fine with a dummy tensor I passed to it:
dummy = tf.random.uniform((1, 3, 224, 224, 3))
re_dum = model.predict(dummy)
test = triplet_loss(re_dum)
re_dum is a list of 3 elements, each a tensor of shape (1, 64), and test is a number. So this little test shows that there is no problem with my implementation. But why does the error keep showing up?
Besides, when I replace
dot_product = tf.matmul(anchor, tf.transpose(negative))
with
dot_product = tf.matmul(tf.expand_dims(anchor, axis=0), tf.transpose(tf.expand_dims(negative, axis=0)))
the error disappears, but why it works is very perplexing to me.
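One plausible explanation (an assumption about the setup, not verified against this exact code): when a Keras Model has several outputs and is compiled with a single loss function, Keras applies that loss to each output separately, so inside triplet_loss, y_pred is one (batch_size, 64) tensor rather than a list of three. y_pred[0] then slices off the batch axis, leaving a rank-1 (64,) tensor that tf.matmul rejects; tf.expand_dims restores the rank, which is why the workaround silences the error. A sketch of the slicing behaviour:
import tensorflow as tf

y_pred = tf.random.normal((8, 64))  # what Keras may actually pass in
anchor = y_pred[0]                  # shape (64,) -- the batch axis is gone
# tf.matmul(anchor, tf.transpose(anchor)) fails: rank 1, not rank 2.
restored = tf.expand_dims(anchor, axis=0)                  # shape (1, 64)
print(tf.matmul(restored, tf.transpose(restored)).shape)   # (1, 1)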

How can I multiply a tensor with an unknown dimension to a tensorflow variable?

I'm working in Keras (TensorFlow 2). I'd like to multiply each element of a tensor by its own trainable weight. Let's say my input tensor is 1D, with 10 elements; so I try to define the input as a Keras input tensor, the weights as a tf.Variable, and to use the Keras Multiply layer, thus:
import tensorflow as tf
inputs = tf.keras.layers.Input(shape=(10), name='inputs')
weights = tf.Variable(tf.random.normal([10]), name='weights')
outputs = tf.keras.layers.Multiply()([inputs, weights])
Now when I inspect the dimensions they are:
inputs: shape=(None, 10)
weights: shape=(10,)
outputs: shape=(10, 10)
The input has a None dimension for the batch size, which is what I expect and want. However, I expected outputs to have shape=(None, 10). Instead, the leading dimension seems to have taken a fixed size of 10. How should I correct this?
You need to broadcast weights along dimension 0; the shape of the dimension you want to fix must be constant.
That is, weights must have the shape (1, 10), not (10,).
This can be done using:
weights = tf.Variable(tf.random.normal([1, 10]), name='weights')
or
weights = tf.Variable(tf.random.normal([10]), name='weights')
...
weights = tf.expand_dims(weights, axis=0)
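Putting it together, a minimal sketch with the shapes from the question:
import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(10,), name='inputs')
# The leading axis of size 1 broadcasts over the batch dimension.
weights = tf.Variable(tf.random.normal([1, 10]), name='weights')
outputs = tf.keras.layers.Multiply()([inputs, weights])
print(outputs.shape)  # (None, 10)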

Tensorflow 2 timeseries_dataset_from_array input vs target batch shapes difference

The new tf.keras.preprocessing.timeseries_dataset_from_array function is used to create sliding minibatch windows over sequential data, for example for tasks involving RNN networks.
According to the docs it returns a minibatch of inputs and targets. However, the target minibatch this function returns does not have a sequence_length (timesteps) dimension. For example:
data = timeseries_dataset_from_array(
    data=tokens,
    targets=targets,
    sequence_length=25,
    batch_size=32,
)

for minibatch in data:
    inputs, targets = minibatch
    assert inputs.shape[1] == targets.shape[1]  # error
The inputs have shape [32, 25, 1] (in the case where you just have word indices there), and the targets confusingly have shape [32, 1].
So, my question is: how am I supposed to map a tensor of inputs with a window of 25 timesteps to a target tensor with a window of 0 timesteps?
The way I always train sequence models is by feeding an input tensor of shape [32, 25, 1], which is then projected into [32, 25, 100], and then feeding a target tensor of shape [32, 25, 1] to the loss function, or, for a multi-class problem, a target tensor of shape [32, 25, num_of_classes].
That is why I am confused by the shape of the target tensor returned by timeseries_dataset_from_array and the intuition behind it.
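For what it's worth, one workaround sketch (an assumption about the intended usage, not part of the question's code): window the inputs and the shifted targets separately by passing targets=None, then zip the two datasets so both carry the 25-step dimension.
import numpy as np
import tensorflow as tf

tokens = np.arange(1000).reshape(-1, 1)      # dummy word indices
targets = np.arange(1, 1001).reshape(-1, 1)  # dummy next-step targets

# targets=None disables the built-in single-step target extraction.
input_ds = tf.keras.preprocessing.timeseries_dataset_from_array(
    tokens, None, sequence_length=25, batch_size=32)
target_ds = tf.keras.preprocessing.timeseries_dataset_from_array(
    targets, None, sequence_length=25, batch_size=32)
data = tf.data.Dataset.zip((input_ds, target_ds))

for inputs, tgts in data.take(1):
    print(inputs.shape, tgts.shape)  # (32, 25, 1) (32, 25, 1)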

What is the right way to compute correct predictions in TensorFlow?

I'm feeding a simple ConvNet in Tensorflow using a tfrecords file containing grayscale images as inputs and integer class labels.
my loss is defined as loss = tf.nn.sparse_softmax_cross_entropy_with_logits(y_conv, label_batch)
where y_conv=tf.matmul(h_fc1_drop,W_fc2) + b_fc2
and label_batch is a tensor of size [batch_size].
I'm trying to compute the accuracy by using
correct_prediction = tf.equal(tf.argmax(label_batch,1),tf.argmax(y_conv, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
This correct_prediction statement is giving an error:
InvalidArgumentError (see above for traceback): Minimum tensor rank: 2 but got: 1
I'm a bit confused as to how exactly one computes correct predictions in TF.
You probably want to use 0 for the dimension argument to tf.argmax since label_batch and y_conv are vectors. Using dimension=1 implies a tensor rank of at least 2. See the documentation for the dimension parameter of argmax here.
I hope that helps!
For your y_conv you do everything right -- it is a matrix of shape (batch_size, n_classes), where for each sample and each class you have the probability that this is the class the image belongs to. So to get the actual predicted class you need to call argmax.
However, your labels are integers and have a shape of just (batch_size,): because the class of an image is known, there's no reason to supply n_classes probabilities; a single integer can hold the actual class just as well. So you don't need to call argmax on it to convert probabilities to a class -- it already holds the class. To fix it, just do:
correct_prediction = tf.equal(label_batch, tf.argmax(y_conv, 1))
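One caveat worth hedging: tf.argmax returns int64, so if label_batch happens to be int32 (the question doesn't show its dtype), a cast is needed before tf.equal:
# Hypothetical dtype handling; drop the cast if label_batch is already int64.
correct_prediction = tf.equal(tf.cast(label_batch, tf.int64),
                              tf.argmax(y_conv, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))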

TensorFlow convolution tutorial: scale of logits

I am trying to edit my own model by adding some code to cifar10.py, and here is the question.
In cifar10.py, the tutorial says:
EXERCISE: The output of inference are un-normalized logits. Try editing the network architecture to return normalized predictions using tf.nn.softmax().
So I directly fed the output from "local4" into tf.nn.softmax(). This gives me scaled logits, meaning the sum of all outputs is 1.
But in the loss function, the cifar10.py code uses:
tf.nn.sparse_softmax_cross_entropy_with_logits()
and the description of this function says:
WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
Also, according to the description, the logits passed to the above function must have shape [batch_size, num_classes], which means the logits should be unscaled, as in the sample code below that computes the unnormalized logits:
# softmax, i.e. softmax(WX + b)
with tf.variable_scope('softmax_linear') as scope:
    weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
                                          stddev=1/192.0, wd=0.0)
    biases = _variable_on_cpu('biases', [NUM_CLASSES],
                              tf.constant_initializer(0.0))
    softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
    _activation_summary(softmax_linear)
Does this mean I don't have to use tf.nn.softmax in the code?
You can use tf.nn.softmax in the code if you want, but then you will have to compute the loss yourself, as the cross-entropy of the softmax probabilities against the one-hot labels:
softmax_logits = tf.nn.softmax(logits)
loss = tf.reduce_mean(-tf.reduce_sum(labels * tf.log(softmax_logits), axis=1))
Note that this is numerically less stable than letting the op apply the softmax internally.
In practice, you don't use tf.nn.softmax for computing the loss. However, you do need tf.nn.softmax if, for instance, you want to compute the predictions of your algorithm and compare them to the true labels (to compute accuracy).
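A minimal sketch of that prediction path, in the TF 1.x style of the tutorial (labels is assumed to hold integer class ids; the names are placeholders, not from cifar10.py):
probabilities = tf.nn.softmax(softmax_linear)  # normalized predictions
predictions = tf.argmax(probabilities, 1)      # predicted class per sample
correct = tf.equal(predictions, tf.cast(labels, tf.int64))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))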