Declaring Variables inside the Tensorflow GradientTape - tensorflow

I have a model with a complex loss, computed per class of the model output.
As you can see below, I compute the loss per class with a custom loss function and assign each value into a tf.Variable, since tensors are immutable in TensorFlow.
def calc_loss(y_true, y_pred):
    num_classes = 10
    pos_loss_class = tf.Variable(tf.zeros((1, num_classes), dtype=tf.dtypes.float32))
    for idx in range(num_classes):
        pos_loss = SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx])
        pos_loss_class[:, idx].assign(pos_loss)
    return tf.reduce_mean(pos_loss_class)
My code is simple:
with tf.GradientTape() as tape:
    output = model(input, training=True)
    loss = calc_loss(targets, output)
grads = tape.gradient(loss, model.trainable_weights)
However, I receive None for all of the model's variables. From my understanding, this is because taking the gradient through a stateful object (the variable) blocks the gradient flow, as described here: https://www.tensorflow.org/guide/autodiff#4_took_gradients_through_a_stateful_object
Any suggestions?
Here is a reproducible toy example that demonstrates the issue:
y_true = tf.Variable(tf.random.normal((1, 2)), name='targets')
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape() as tape:
    y_pred = layer(x)
    loss_class = tf.Variable(tf.zeros((1, 2)), dtype=tf.float32)
    for idx in range(2):
        loss = tf.abs(y_true[:, idx] - y_pred[:, idx])
        loss_class[:, idx].assign(loss)
    final_loss = tf.reduce_mean(loss_class)
grads = tape.gradient(final_loss, layer.trainable_weights)

My guess is that the assign method blocks the gradient, as explained in the TensorFlow page you linked. Instead, try using a plain Python list:
def calc_loss(y_true, y_pred):
    num_classes = 10
    pos_loss_class = []
    for idx in range(num_classes):
        pos_loss = SOME_LOSS_FUNC(y_true[:, idx], y_pred[:, idx])
        pos_loss_class.append(pos_loss)
    return tf.reduce_mean(pos_loss_class)
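Applied to the toy example above, the list-based version would look roughly like this (a minimal sketch, not from the original post); tape.gradient now returns actual tensors instead of None:

import tensorflow as tf

y_true = tf.Variable(tf.random.normal((1, 2)), name='targets')
layer = tf.keras.layers.Dense(2, activation='relu')
x = tf.constant([[1., 2., 3.]])
with tf.GradientTape() as tape:
    y_pred = layer(x)
    loss_class = []  # plain Python list instead of a tf.Variable
    for idx in range(2):
        loss_class.append(tf.abs(y_true[:, idx] - y_pred[:, idx]))
    final_loss = tf.reduce_mean(tf.stack(loss_class))
grads = tape.gradient(final_loss, layer.trainable_weights)
print(grads)  # tensors rather than [None, None]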

Related

In tensorflow 1, when the loss function is defined with operations on Tensors, is the model really trained?

First, I'm sorry, but it's not possible to reproduce this problem in a few lines, as the model involved is a very complex network.
But here is an idea of the code:
def return_iterator(data, nb_epochs, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(data)
    dataset = dataset.repeat(nb_epochs).batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    yy = iterator.get_next()
    return tf.cast(yy, tf.float32)
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    y_pred = complex_model.autoencode(train)
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)
    nb_epochs = 10
    batch_size = 64
    y_real = return_iterator(train, nb_epochs, batch_size)
    y_pred = return_iterator(y_pred, nb_epochs, batch_size)
    res_equal = 1. - tf.reduce_mean(tf.abs(y_pred - y_real), [1, 2, 3])
    loss = 1 - tf.reduce_sum(res_equal, axis=0)
    opt = tf.train.AdamOptimizer().minimize(loss)
    tf.global_variables_initializer().run()
    for epoch in range(0, nb_epochs):
        _, d_loss = sess.run([opt, loss])
To define the loss, I must use operations like tf.reduce_mean and tf.reduce_sum, and these operations only accept tensors as input.
My question is: with this code, will the complex_model autoencoder actually be trained during training? (Even though here it is only used to output the predictions from which the loss is computed.)
Thank you
P.S.: I am using TF 1.15 (and I cannot use another version).
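Not part of the original question, but one way to check this in TF1 is to ask the graph which variables actually receive a gradient from the loss: minimize() only updates variables whose gradient is not None. A minimal sketch of such a check, assuming the graph above has already been built:

# Hypothetical check: tf.gradients returns None for every trainable
# variable that is not connected to `loss` in the graph, and only the
# connected ones are updated by tf.train.AdamOptimizer().minimize(loss).
model_vars = tf.trainable_variables()
grads = tf.gradients(loss, model_vars)
for var, grad in zip(model_vars, grads):
    status = "receives a gradient" if grad is not None else "NO gradient (will not be trained)"
    print(var.name, status)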

ValueError: No gradients provided for any variable in my custom loss - Why?

Here is my code (you can copy and paste to execute it)
import tensorflow as tf
import numpy as np
from sklearn.preprocessing import MinMaxScaler
x = np.array([[1, 2], [3, 4], [5, 6], [7, 8]]).astype(np.float32)
y = np.array([[-1], [3], [7], [-2]]).astype(np.float32)
# scale x and y
x_scaler = MinMaxScaler()
x_scaler.fit(x)
x_sc = x_scaler.transform(x)
y_scaler = MinMaxScaler()
y_scaler.fit(y)
y_sc = y_scaler.transform(y)
batch_size = 2
ds = tf.data.Dataset.from_tensor_slices((x_sc, y_sc)).batch(batch_size=batch_size)
# create the model
model = tf.keras.Sequential(
    [
        tf.keras.layers.Input(shape=(2,)),
        tf.keras.layers.Dense(units=3, activation='relu'),
        tf.keras.layers.Dense(units=1)
    ]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
def standard_loss(y_batch, y_pred, y_min_max):
    batches = y_pred.shape[0]
    loss = 0.0
    y_true_unsc = tf.convert_to_tensor(y_min_max.inverse_transform(y_batch), tf.float32)
    y_pred_unsc = tf.convert_to_tensor(y_min_max.inverse_transform(y_pred), tf.float32)
    for batch in range(batches):
        loss += tf.math.reduce_mean(tf.math.square(y_true_unsc[batch] - y_pred_unsc[batch]))
    return loss / batches
# training loop
epochs = 1
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    for step, (x_batch, y_batch) in enumerate(ds):
        with tf.GradientTape() as tape:
            y_pred = model(x_batch, training=True)
            loss_value = standard_loss(y_batch, y_pred, y_scaler)
        grads = tape.gradient(loss_value, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
The problem is located in my cost function (standard_loss). When I don't unscale my data, everything works fine, as below:
def standard_loss(y_batch, y_pred, y_min_max):
    batches = y_pred.shape[0]
    loss = 0.0
    for batch in range(batches):
        loss += tf.math.reduce_mean(tf.math.square(y_batch[batch] - y_pred[batch]))
    return loss / batches
But when I leave it as above, I get this error:
ValueError: No gradients provided for any variable: ['dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].
I need to unscale my data to use it for others computations.
Could someone help me understand why this happens?
EDIT 1:
The problem is due to the tape (in tf.GradientTape() as tape), which records all the operations and then walks back through that series of operations when computing the gradient. My goal now is to figure out how to unscale my y_pred variable without breaking the chain of operations the tape needs when calculating the gradient. Ideas?
EDIT 2:
In my custom loss, the unscale operation is a NumPy operation, and this operation is not recorded by the tape since it leaves the TensorFlow graph. This is why the error appears. So I'm going to look for a way to scale my data with TensorFlow operations so that I can also unscale it with TensorFlow operations.
SOLUTION:
EDIT 2 is the solution. Now, everything works perfectly.
In my custom loss, the unscale operation is a NumPy operation, and it is not recorded by the tape since it leaves the TensorFlow graph. This is why the error appears. One solution is to use TensorFlow operations to scale and unscale the data so that the tape can record the whole path. See the code below.
import tensorflow as tf
import numpy as np
x = tf.convert_to_tensor([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=tf.float32)
y = tf.convert_to_tensor([[-1], [3], [7], [-2]], dtype=tf.float32)
# retrieve x and y min max
xmin, xmax = tf.reduce_min(x, axis=0), tf.reduce_max(x, axis=0)
ymin, ymax = tf.reduce_min(y, axis=0), tf.reduce_max(y, axis=0)
batch_size = 2
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)
# create the model
model = tf.keras.Sequential(
    [
        tf.keras.layers.Input(shape=(2,)),
        tf.keras.layers.Dense(units=3, activation='relu'),
        tf.keras.layers.Dense(units=1)
    ]
)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
def standard_loss(y_batch, y_pred):
    # unscale y_pred (note that y_batch has never been scaled)
    y_pred_unsc = y_pred * (ymax - ymin) + ymin
    return tf.reduce_mean(tf.square(y_batch - y_pred_unsc))
# training loop
epochs = 1
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    for step, (x_batch, y_batch) in enumerate(ds):
        with tf.GradientTape() as tape:
            # scale the data (note that we never leave TensorFlow operations)
            x_scale = (x_batch - xmin) / (xmax - xmin)
            y_pred = model(x_scale, training=True)
            loss_value = standard_loss(y_batch, y_pred)
        grads = tape.gradient(loss_value, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

Can't apply gradients on tf.Variable

I am trying to learn a similarity matrix (M) between two image embeddings. A single training instance is a pair of images (anchor, positive), so ideally the model should return a distance of 0 for embeddings of similar images.
The problem is, when I declare the distance matrix (M) as a tf.Variable, I get an error
on this line
self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
TypeError: 'Variable' object is not iterable.
I think I should use a TensorFlow datatype for M that is iterable.
Please tell me how I can fix this issue.
import tensorflow as tf
from tensorflow import keras

# metric learning model
class MetricLearningModel:
    def __init__(self, lr):
        self.optimizer = keras.optimizers.Adam(lr=lr)
        self.lr = lr
        self.loss_object = keras.losses.MeanSquaredError()
        self.trainable_variables = tf.Variable(
            (tf.ones((2048, 2048), dtype=tf.float32)),
            trainable=True
        )

    def similarity_function(self, anchor_embeddings, positive_embeddings):
        M = self.trainable_variables
        X_i = anchor_embeddings
        X_j = positive_embeddings
        similarity_value = tf.matmul(X_j, M, name='Tensor')
        similarity_value = tf.matmul(similarity_value, tf.transpose(X_i), name='Tensor')
        # distance(x,y) = sqrt( (x-y)#M#(x-y).T )
        return similarity_value

    def train_step(self, anchor, positive):
        anchor_embeddings, positive_embeddings = anchor, positive
        # Calculate gradients
        with tf.GradientTape() as tape:
            # Calculate similarity between anchors and positives.
            similarities = self.similarity_function(anchor_embeddings, positive_embeddings)
            y_pred = similarities
            y_true = tf.zeros(1)
            print(y_true, y_pred)
            loss_value = self.loss_object(
                y_pred=y_true,
                y_true=y_pred,
            )
        gradients = tape.gradient(loss_value, self.trainable_variables)
        # Apply gradients via optimizer
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

metric_model = MetricLearningModel(lr=1e-3)
anchor, positive = tf.ones((1, 2048), dtype=tf.float32), tf.ones((1, 2048), dtype=tf.float32)
metric_model.train_step(anchor, positive)
The Python zip function expects iterable objects, such as a list or a tuple.
In your calls to tape.gradient and optimizer.apply_gradients, you can put your Variable in a list to solve the issue:
with tf.GradientTape() as tape:
    ...
gradients = tape.gradient(loss_value, [self.trainable_variables])
# Apply gradients via optimizer
self.optimizer.apply_gradients(zip(gradients, [self.trainable_variables]))
tape.gradient respects the structure of the sources object you pass in, so if you feed it a list, you will get a list back. This is stated in the documentation:
Returns
a list or nested structure of Tensors (or IndexedSlices, or None), one for each element in sources. Returned structure is the same as the structure of sources.
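As a quick standalone illustration (a minimal sketch, not from the original answer), passing a single Variable to tape.gradient yields a single tensor, while passing a list yields a list, which is exactly what zip and apply_gradients expect:

import tensorflow as tf

v = tf.Variable(tf.ones((2, 2)))
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(v * v)
grad = tape.gradient(loss, v)      # a single tensor: zip(grad, v) would fail

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(v * v)
grads = tape.gradient(loss, [v])   # a list with one tensor
optimizer.apply_gradients(zip(grads, [v]))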

Weighted Absolute Error implementation doesn't work in tensorflow (keras)

I have created a custom loss (weighted absolute error) in Keras, but the implementation doesn't work: I get the error ValueError: No gradients provided for any variable: ['my_model/conv2d/kernel:0', 'my_model/conv2d/bias:0'].
I want to apply a different weight to each pixel.
class WeightedMeanAbsoluteError(tf.keras.metrics.Metric):
    def __init__(self, name='weighted_mean_absolute_error'):
        super(WeightedMeanAbsoluteError, self).__init__(name=name)
        self.wmae = self.add_weight(name='wmae', initializer='zeros')

    def update_state(self, y_true, y_pred, loss_weights):
        values = tf.math.abs(y_true - y_pred) * loss_weights
        return self.wmae.assign_add(tf.reduce_sum(values))

    def result(self):
        return self.wmae

    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.wmae.assign(0.)

loss_object = WeightedMeanAbsoluteError()
train_loss = WeightedMeanAbsoluteError()
I use the following code to implement a training step:
@tf.function
def train_step(input_images, output_images):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        result_images = model(input_images, training=True)
        loss = loss_object(output_images, result_images)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Also my code works just fine if I use
loss_object = tf.keras.losses.MeanAbsoluteError()
train_loss = tf.keras.metrics.MeanAbsoluteError()
The best and simplest way to minimize a weighted standard loss (such as MAE) is to use the sample_weight parameter of the fit method, where we pass an array with the desired weight for each sample:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

X = np.random.uniform(0, 1, (1000, 50))
y = np.random.uniform(0, 1, 1000)
W = np.random.randint(1, 10, 1000)

inp = Input((50,))
x = Dense(64, activation='relu')(inp)
out = Dense(10)(x)

model = Model(inp, out)
model.compile('adam', 'mae')
model.fit(X, y, epochs=100, sample_weight=W)
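If a custom training loop with per-pixel weights is still required, as in the original question, a plain function that returns a tensor also keeps the gradient path intact. This is a hedged sketch (not from the original answer) that assumes the weight tensor is broadcastable to the image shape:

import tensorflow as tf

# Minimal sketch of a per-pixel weighted MAE usable inside a GradientTape
# training step; `loss_weights` is assumed to broadcast against y_true/y_pred.
def weighted_mae(y_true, y_pred, loss_weights):
    return tf.reduce_mean(tf.abs(y_true - y_pred) * loss_weights)

Used in place of loss_object(...) inside the train_step above, the loss stays a differentiable tensor rather than the result of a stateful metric update, which should let tape.gradient provide gradients.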

Reuse the weight matrix from embedding layer with @tf.function

Without using @tf.function, the script works perfectly.
I want to use it to speed up training, but it gives me an error where I reuse the weight matrix from the embedding layer.
I think the error is caused by get_weights(), because it converts the tensor back to NumPy.
I tried to use a tf.keras.layers.Dense instead of reusing the weights from the embedding, and it worked perfectly.
class Example(tf.keras.Model):
    def __init__(self):
        super(Example, self).__init__()
        self.embed_dim = embed_dim
        self.vocab_size = vocab_size
        self.embed = tf.keras.layers.Embedding(self.vocab_size, self.embed_dim)
        ...

    def call(self, inputs, training):
        ...
        embed_matrix = self.embed.get_weights()
        # a dense layer
        Vhid = tf.matmul(self.kernel, tf.transpose(embed_matrix[0]))
        pred_w = tf.matmul(pred, Vhid) + self.bias
In my training script, I did:
@tf.function
def train_step(x, y, training=None):
    with tf.GradientTape() as tape:
        pred = model(x, y, training)
        losses = compute_loss(y, pred)
    grads = tape.gradient(losses, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return losses
/home/thomas/projects/tf_convsent/models/.py:195 call *
embed_matrix = self.embed.get_weights() # [vocab_size, 300]
/home/thomas/.conda/envs/tf2_p37/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:1177 get_weights
return backend.batch_get_value(params)
/home/thomas/.conda/envs/tf2_p37/lib/python3.7/site-packages/tensorflow/python/keras/backend.py:3011 batch_get_value
raise RuntimeError('Cannot get value inside Tensorflow graph function.')
RuntimeError: Cannot get value inside Tensorflow graph function.
Found the easiest solution, which improved training speed by ~50% (from 122 hrs to ~65 hrs):
just change
embed_matrix = self.embed.get_weights()
to
embed_matrix = self.embed.weights
and it will do the trick.
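In context, the call method would then look roughly like this (a sketch based on the snippet above; self.kernel, self.bias, and pred are assumed to be defined elsewhere in the model). Unlike get_weights(), which returns NumPy arrays, self.embed.weights returns the layer's tf.Variable objects, so the value stays usable inside a graph function traced by @tf.function:

def call(self, inputs, training=None):
    ...
    # self.embed.weights is a list of tf.Variables; element 0 is the
    # (vocab_size, embed_dim) embedding matrix, kept as a graph tensor.
    embed_matrix = self.embed.weights
    Vhid = tf.matmul(self.kernel, tf.transpose(embed_matrix[0]))
    pred_w = tf.matmul(pred, Vhid) + self.bias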