Properly using tf.GradientTape().gradient - tensorflow

This is not so much a code question as a "how it works" one. I have a model whose inputs are four tf.Tensors of the same shape, (60, 200, 15000), and whose output is a tf.Tensor of shape (60, 200). My custom loss reshapes all four tensors to the output's shape, so there is no problem there. Inside my custom loss I then compute the loss value. My question is about what comes after, when I do loss = tf.GradientTape().gradient(loss_fn, model.trainable_variables) and optimizer.apply_gradients(zip(loss, model.trainable_variables)).
How does the gradient "know" which variables to apply itself to? And how can I tell whether the gradients are being computed properly?
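For context, here is a minimal sketch of the usual training step (with placeholder model, loss_fn, x and y objects, not the exact code from the question). While the tape is open it records every operation applied to the model's trainable tf.Variables; tape.gradient then returns one gradient tensor per variable, in the same order as model.trainable_variables, which is why the zip pairs them up correctly for apply_gradients. A None entry in that list means the corresponding variable did not influence the loss, which is one way to check that the gradients are being computed properly.

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()

def train_step(model, loss_fn, x, y):
    with tf.GradientTape() as tape:
        # Forward pass: every op touching the trainable variables is recorded.
        y_pred = model(x, training=True)
        loss = loss_fn(y, y_pred)
    # One gradient per trainable variable, same order as the variable list.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss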

Related

Having trouble getting the gradient on with a given model on Tensorflow 2.0

For a particular purpose, I'm trying to do gradient descent on a randomly initialized vector (a NumPy array) used as the input to a given simple model.
This is the summary of my model:
And this is the algorithm I'm trying to realize (paper linked below):
It simply performs gradient descent on a random input (of shape (512,) in this case), minimizing a custom loss (the square of a certain output neuron).
The idea is quite simple, but I'm having a hard time implementing it.
The following are functions I looked up and tried, but the output turns out to be a list of (512, 512) arrays instead of a single (512,) array. Also, tf.reduce_mean() is not what I intend to use, but that's a relatively small problem here.
def loss_fn(model, inputs, targets):
    error = model(inputs) - targets
    return tf.reduce_mean(tf.square(error))

def gradients(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = loss_fn(model, inputs, targets)
    return (tape.gradient(loss_value, model.trainable_variables), loss_value)

for e in range(epochs):
    gradient, loss = gradients(model, x, y)
    x = x - learning_rate * gradient
    print('epoch', e, 'loss', loss)
Can anyone point out which part I'm doing wrong?
I assume the shapes of tensors are all messed up here, but I really have no clue where and how to start fixing it.
Sorry for this naive question, I hope I described it well though. Thanks in advance.
Paper: Trojaning Attack on Neural Networks
Edit: Apparently I did not explain well enough.
The problem is here:
gradient, loss = gradients(model, x, y)
gradients() isn't giving the expected results.
Expected: for the parameters (model, np.array of shape (512,), np.array of shape (10,)), a return of a scalar loss and an np.array of shape (512,).
What I got:
ValueError: Input 0 of layer dense_5 is incompatible with the layer: : expected min_ndim=2, found ndim=1. Full shape received: (512,)
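A sketch of the intended loop under a couple of assumptions (a hypothetical Keras model mapping (None, 512) inputs to (None, 10) outputs, and target_idx as the neuron whose squared activation is minimized). The two key differences from the code above are that the gradient is taken with respect to the input itself rather than model.trainable_variables, and that the input keeps a leading batch dimension so Dense layers see ndim=2:

import numpy as np
import tensorflow as tf

target_idx = 0          # hypothetical: the output neuron to minimize
learning_rate = 0.1
epochs = 100

# Keep a batch dimension: shape (1, 512), not (512,)
x = tf.Variable(np.random.randn(1, 512).astype('float32'))

for e in range(epochs):
    with tf.GradientTape() as tape:
        out = model(x)                         # shape (1, 10)
        loss = tf.square(out[0, target_idx])   # scalar custom loss
    grad = tape.gradient(loss, x)              # shape (1, 512), same as x
    x.assign_sub(learning_rate * grad)         # gradient descent on the input
    print('epoch', e, 'loss', float(loss))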

What shape does my loss tensor need to be in tensorflow 2 using Keras API?

I have been playing around with custom loss functions for a while with some success, but I'm struggling with a new loss function, and I wonder if it might be due to the loss result tensor's shape.
My y_true and y_pred tensors have shape == (100, 216, 563). Due to the nature of the data and the calculations I'm performing in my loss function, it makes perfect sense to output a loss tensor of shape == (100, 563) because the second dimension gets reduced away with a reduce_prod() operation.
However, if I use this loss function alone, the loss value steadily increases instead of decreasing... I've not seen this before. If it was all over the place I'd think it was just a bad idea for a loss function or my maths was wrong somewhere, but as far as I can tell the maths is right.
Will this weird shape with a missing middle dimension throw off the gradient calculations? I've already tried using keepdims=True in my reduce_foo() calls, but this makes no difference to the increasing loss value (and the results still have a different shape, shape == (100, 1, 563)).
Looking through tensorflow docs, I can find examples of both a loss with matching shape to y_pred and y_true, and another loss with a single scalar value. Are there any specific rules stated anywhere as to what shape the output loss should be or can anyone give me insights that might help me understand why the loss should be a specific shape (if that is even my problem)?
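One way to see what Keras does with a non-scalar loss (a sketch with random tensors and a hypothetical loss that collapses the middle axis, mirroring the shapes described above): whatever per-sample values a custom loss returns are averaged down to a single scalar before gradients are taken, so a (100, 563) loss tensor is not in itself what breaks the optimization.

import tensorflow as tf

def my_loss(y_true, y_pred):
    # Hypothetical: collapse the middle axis with a product, as described above.
    return tf.reduce_prod(tf.square(y_true - y_pred), axis=1)   # (batch, 563)

y_true = tf.random.uniform((100, 216, 563))
y_pred = tf.random.uniform((100, 216, 563))

per_sample = my_loss(y_true, y_pred)
print(per_sample.shape)                  # (100, 563)
# Keras reduces whatever the loss returns to a scalar (a mean by default),
# and that scalar is what the optimizer actually differentiates:
print(tf.reduce_mean(per_sample).shape)  # ()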

Tensorflow weighted vs sigmoid cross-entropy loss

I am trying to implement multi-label classification using TensorFlow (i.e., each output pattern can have many active units). The problem has imbalanced classes (i.e., many more zeros than ones in the label distribution, which makes the label patterns very sparse).
The best way to tackle the problem should be to use the tf.nn.weighted_cross_entropy_with_logits function. However, I get this runtime error:
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32
I can't understand what is wrong here. As input to the loss function, I pass the labels tensor, the logits tensor, and the positive class weight, which is a constant:
positive_class_weight = 10
loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
Any hints about how to solve this? If I just pass the same labels and logits tensors to the tf.losses.sigmoid_cross_entropy loss function, everything works well (in the sense that TensorFlow runs properly, but of course the predictions after training are then always zero).
See related problem here.
The error is likely to be thrown after the loss function, because the only significant difference between tf.losses.sigmoid_cross_entropy and tf.nn.weighted_cross_entropy_with_logits is the shape of the returned tensor.
Take a look at this example:
logits = tf.linspace(-3., 5., 10)
labels = tf.fill([10,], 1.)
positive_class_weight = 10
weighted_loss = tf.nn.weighted_cross_entropy_with_logits(targets=labels, logits=logits, pos_weight=positive_class_weight)
print(weighted_loss.shape)
sigmoid_loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits)
print(sigmoid_loss.shape)
The logits and labels tensors are somewhat artificial and both have shape (10,), but what matters is that weighted_loss and sigmoid_loss end up with different shapes. Here's the output:
(10,)
()
This is because tf.losses.sigmoid_cross_entropy performs a reduction to a scalar (by default the sum divided by the number of non-zero weights, which with unit weights amounts to a mean). So in order to replicate it, you have to wrap the weighted loss in a reduction yourself, e.g. tf.reduce_mean(...) or tf.reduce_sum(...).
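For instance, reusing the artificial tensors above (tf.reduce_mean matches the default unit-weight behaviour; tf.reduce_sum gives a plain sum):

weighted_loss = tf.nn.weighted_cross_entropy_with_logits(
    targets=labels, logits=logits, pos_weight=positive_class_weight)
scalar_loss = tf.reduce_mean(weighted_loss)   # reduced to a scalar, like the tf.losses version
print(scalar_loss.shape)                      # ()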
If this doesn't help, make sure that the labels tensor has type float32. This mistake is very easy to make; e.g., the following declaration won't work:
labels = tf.fill([10,], 1) # the type is not float!
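The fix is simply to use a float fill value:

labels = tf.fill([10,], 1.0)  # now a float32 tensor, matching the logits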
You might be also interested to read this question.

Why are my TRAINABLE_VARIABLES in Tensorflow so weird?

I've just started my first TF project.
I trained a 4 layer vanilla NN on MNIST.
Then I wanted to display the learned weights,
but weirdly I got way more output than I expected.
I used
sess.run(tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "my_w1"))
where I had previously defined
tf.Variable(tf.random_normal([layer_sizes[i-1], layer_sizes[i]]), name = "my_w1").
The problem is that I expected a 2D array of shape (784, 500), but I got a 3D one of shape (15, 784, 500).
What does the first dimension mean?
This is your batch size: the number of images you use in each iteration. It comes from this part of the code: epoch_x, epoch_y = mnist.train.next_batch(batch_size)
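One way to double-check what sess.run is returning here (a sketch in the same TF1-style graph code): inspect the collection itself, since sess.run on a list of variables returns one array per entry, and the stacked result then picks up that extra leading dimension.

matches = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'my_w1')
print(len(matches))          # how many variables matched the name "my_w1"
for v in matches:
    print(v.name, v.shape)   # each entry's own static shape, e.g. (784, 500)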

Regarding setting up the target tensor shape for sparse_categorical_crossentropy

I am trying to experiment with a multi-layer encoder-decoder type of network. The screenshot of the last several layers of the network architecture is as follows. This is how I set up the model compilation and training process:
optimizer = SGD(lr=0.001, momentum=0.9, decay=0.0005, nesterov=False)
autoencoder.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])
model.fit(imgs_train, imgs_mask_train, batch_size=batch_size, nb_epoch=nb_epoch, verbose=1, callbacks=[model_checkpoint])
imgs_train and imgs_mask_train are of shape (2000, 1, 128, 128). imgs_train contains the raw images and imgs_mask_train the corresponding masks. I am trying to solve a semantic segmentation problem. However, running the program generates the following error message (I only keep the main related part):
tensorflow.python.pywrap_tensorflow.StatusNotOK: Invalid argument: logits first dimension must match labels size. logits shape=[4096,128] labels shape=[524288]
[[Node: SparseSoftmaxCrossEntropyWithLogits = SparseSoftmaxCrossEntropyWithLogits[T=DT_FLOAT, Tlabels=DT_INT64, _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_364, Cast_158)]]
It seems to me that the sparse_categorical_crossentropy loss function causes the problem for the current shapes of (imgs_train, imgs_mask_train). The Keras API does not go into detail about how to set up the target tensor. Any suggestions are highly appreciated!
I am currently trying to figure out the same problem and, as far as I can tell, it takes a sparse representation of the target categories. That means integers as the target labels instead of the one-hot encoded binary class matrix.
Concerning your problem: do you have categories in your masking, or do you just have information about the outline of an object? With outline information it becomes a pixel-wise binary loss instead of a categorical one. If you have categories, the output of your decoder should have dimensionality (None, number_of_classes, 128, 128). On that you should be able to use a sparse target mask, but I haven't tried this myself...
Hope that helps
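As a concrete illustration of the sparse-target idea (a sketch in channels-last layout with a hypothetical number of classes, not the exact channels-first shapes from the question): the mask holds an integer class id per pixel, the decoder outputs one score per class per pixel, and the loss comes back pixel-wise.

import numpy as np
import tensorflow as tf

num_classes = 5                # hypothetical
batch, h, w = 4, 128, 128

# Sparse targets: integer class ids per pixel, shape (batch, 128, 128)
masks = np.random.randint(0, num_classes, size=(batch, h, w))

# Decoder output: one probability per class per pixel, channels last
preds = tf.nn.softmax(tf.random.uniform((batch, h, w, num_classes)), axis=-1)

loss = tf.keras.losses.sparse_categorical_crossentropy(masks, preds)
print(loss.shape)              # (4, 128, 128): one loss value per pixel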