I am adding a custom loss to a VAE, as suggested here: https://www.linkedin.com/pulse/supervised-variational-autoencoder-code-included-ibrahim-sobh-phd/
Instead of defining a loss function, it uses a dense network and takes its output as the loss (if I understand correctly).
# New: add a classifier
clf_latent_inputs = Input(shape=(latent_dim,), name='z_sampling_clf')
clf_outputs = Dense(10, activation='softmax', name='class_output')(clf_latent_inputs)
clf_supervised = Model(clf_latent_inputs, clf_outputs, name='clf')
clf_supervised.summary()
# instantiate VAE model
# New: Add another output
outputs = [decoder(encoder(inputs)[2]), clf_supervised(encoder(inputs)[2])]
vae = Model(inputs, outputs, name='vae_mlp')
vae.summary()
reconstruction_loss = binary_crossentropy(inputs, outputs[0])
reconstruction_loss *= original_dim
kl_loss = 1 + z_log_var - K.square(z_mean) - K.exp(z_log_var)
kl_loss = K.sum(kl_loss, axis=-1)
kl_loss *= -0.5
vae_loss = K.mean((reconstruction_loss + kl_loss) /100.0)
vae.add_loss(vae_loss)
# New: add the clf loss
vae.compile(optimizer='adam', loss={'clf': 'categorical_crossentropy'}) ===> this line <===
vae.summary()
# reconstruction_loss = binary_crossentropy(inputs, outputs)
svae_history = vae.fit(x_train, {'clf': y_train},
                       epochs=epochs,
                       batch_size=batch_size)
I got stuck at the compilation step (annotated as ===> this line <===), where I hit a type error:
TypeError: Expected float32, got <function
BaseProtVAE.init..vae_loss at 0x7ff53051dd08> of type
'function' instead.
I need your help if you've got any suggestions.
There are several ways to implement a VAE in TensorFlow. I propose an alternative implementation that can be found in the custom_layers_and_models page of the TensorFlow guide:
Let's put all of these things together into an end-to-end example: we're going to implement a Variational AutoEncoder (VAE). We'll train it on MNIST digits.
It uses custom Model classes and the gradient tape. In this way, it is quite easy to add the classifier into the VAE model and add the categorical cross-entropy to the total loss during the optimization.
All you need to do is modify the model class:
class VariationalAutoEncoder(Model):
    """Combines the encoder and decoder into an end-to-end model for training."""

    def __init__(
        self,
        original_dim,
        intermediate_dim=64,
        latent_dim=32,
        name="autoencoder",
        **kwargs
    ):
        super(VariationalAutoEncoder, self).__init__(name=name, **kwargs)
        self.original_dim = original_dim
        self.encoder = Encoder(latent_dim=latent_dim, intermediate_dim=intermediate_dim)
        self.decoder = Decoder(original_dim, intermediate_dim=intermediate_dim)
        self.clf_supervised = Dense(10, activation='softmax', name='class_output')

    def call(self, inputs):
        z_mean, z_log_var, z = self.encoder(inputs)
        reconstructed = self.decoder(z)
        # Add KL divergence regularization loss.
        kl_loss = -0.5 * tf.reduce_mean(
            z_log_var - tf.square(z_mean) - tf.exp(z_log_var) + 1
        )
        self.add_loss(kl_loss)
        # classifier
        y_pred = self.clf_supervised(z)
        return reconstructed, y_pred
The additions are the lines self.clf_supervised = Dense(10, activation='softmax', name='class_output') and y_pred = self.clf_supervised(z).
The optimization is done this way:
vae = VariationalAutoEncoder(original_dim, intermediate_dim, latent_dim)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
mse_loss_fn = tf.keras.losses.MeanSquaredError()
loss_metric = tf.keras.metrics.Mean()

epochs = 2

train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=500).batch(4)

# Iterate over epochs.
for epoch in range(epochs):
    print("Start of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            reconstructed, y_pred = vae(x_batch_train)
            clf_loss = tf.keras.losses.SparseCategoricalCrossentropy()(y_batch_train, y_pred)
            # Compute reconstruction loss
            loss = mse_loss_fn(x_batch_train, reconstructed)
            loss += sum(vae.losses)  # Add KLD regularization loss
            loss += clf_loss

        grads = tape.gradient(loss, vae.trainable_weights)
        optimizer.apply_gradients(zip(grads, vae.trainable_weights))

        loss_metric(loss)

        if step % 100 == 0:
            print("step %d: mean loss = %.4f" % (step, loss_metric.result()))
The rest of the code is in the link above. The main change is the optimization done with tf.GradientTape(). It's a bit more complicated than the fit method but it's still quite simple and very powerful.
Related
I'm trying to use the GradientTape mechanism for the first time. I've looked at some examples, but I'm getting the "No gradients provided for any variable" error and was wondering how to overcome it.
I want to define some complex loss functions, so I tried using GradientTape to produce their gradients for the CNN training. What am I doing wrong, and how can I fix it?
Attached is a runnable code example that demonstrates my problem:
# imports
import numpy as np
import tensorflow as tf
import sklearn
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

tf.config.run_functions_eagerly(True)

# my loss function
def my_loss_fn(y_true, y_pred):
    # train SVM classifier
    VarC = 1E6
    VarGamma = 'scale'
    clf = SVC(kernel='rbf', C=VarC, gamma=VarGamma, probability=True)
    clf.fit(y_pred, y_true)
    y_pred = clf.predict_proba(y_pred)
    scce = tf.keras.losses.SparseCategoricalCrossentropy()
    return scce(y_true, y_pred)
# creating inputs for the demonstration
X0=0.5*np.ones((12,12))
X0[2:12:4,:]=0
X0[3:12:4,:]=0
X1=0.5*np.ones((12,12))
X1[1:12:4,:]=0
X1[2:12:4,:]=0
X1=np.transpose(X1)
X=np.zeros((2000,12,12))
for i in range(0,1000):
    X[i]=X0+np.random.rand(12,12)
for i in range(1000,2000):
    X[i]=X1+np.random.rand(12,12)
y=np.zeros(2000, dtype=int)
y[1000:2000]=1
x_train, x_val, y_train, y_val = train_test_split(X, y, train_size=0.5)
x_val, x_test, y_val, y_test = train_test_split(x_val, y_val, train_size=0.5)
x_train = tf.convert_to_tensor(x_train)
x_val = tf.convert_to_tensor(x_val)
x_test = tf.convert_to_tensor(x_test)
y_train = tf.convert_to_tensor(y_train)
y_val = tf.convert_to_tensor(y_val)
y_test = tf.convert_to_tensor(y_test)
inputs = keras.Input((12,12,1), name='images')
x0 = tf.keras.layers.Conv2D(8,4,strides=4)(inputs)
x0 = tf.keras.layers.AveragePooling2D(pool_size=(3, 3), name='pooling')(x0)
outputs = tf.keras.layers.Flatten(name='predictions')(x0)
model = keras.Model(inputs=inputs, outputs=outputs)
optimizer=tf.keras.optimizers.Adam(learning_rate=0.001)
# Instantiate a loss function.
loss_fn = my_loss_fn
# Prepare the training dataset.
batch_size = 256
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=1024).batch(batch_size)
epochs = 100
for epoch in range(epochs):
    print('Start of epoch %d' % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:
            tape.watch(model.trainable_weights)

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print('Training loss (for one batch) at step %s: %s' % (step, float(loss_value)))
            print('Seen so far: %s samples' % ((step + 1) * 64))
And when running, I get:
ValueError: No gradients provided for any variable: (['conv2d_2/kernel:0', 'conv2d_2/bias:0'],). Provided grads_and_vars is ((None, <tf.Variable 'conv2d_2/kernel:0' shape=(4, 4, 1, 8) dtype=float32, nump
If I use a standard loss function instead, for example the following model and loss function:
inputs = keras.Input((12,12,1), name='images')
x0 = tf.keras.layers.Conv2D(8,4,strides=4)(inputs)
x0 = tf.keras.layers.AveragePooling2D(pool_size=(3, 3), name='pooling')(x0)
x0 = tf.keras.layers.Flatten(name='features')(x0)
x0 = layers.Dense(16, name='meta_features')(x0)
outputs = layers.Dense(2, name='predictions')(x0)
model = keras.Model(inputs=inputs, outputs=outputs)
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
Everything works fine and converges well.
What am I doing wrong and how can I fix it?
How should f1-score be evaluated during a custom training and evaluating loop in TensorFlow in a binary classification task?
I have checked some online sources. The solution using tfa simply does not work, and some self-written f1-score functions cannot be integrated into a custom training loop. Specifically, in order to follow the same usage pattern as other evaluation metrics such as keras.metrics.BinaryAccuracy and keras.metrics.AUC, I think I should extend the tf.keras.metrics.Metric class, but I am not able to write such a metric myself.
# Get model
inputs = keras.Input(shape=(784,), name="digits")
x = layers.Dense(64, activation="relu", name="dense_1")(inputs)
x = layers.Dense(64, activation="relu", name="dense_2")(x)
outputs = layers.Dense(10, name="predictions")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
# Instantiate an optimizer to train the model.
optimizer = keras.optimizers.SGD(learning_rate=1e-3)
# Instantiate a loss function.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Prepare the metrics.
train_acc_metric = keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = keras.metrics.SparseCategoricalAccuracy()
import time
epochs = 2
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    start_time = time.time()

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):
        with tf.GradientTape() as tape:
            logits = model(x_batch_train, training=True)
            loss_value = loss_fn(y_batch_train, logits)
        grads = tape.gradient(loss_value, model.trainable_weights)
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Update training metric.
        train_acc_metric.update_state(y_batch_train, logits)

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %d samples" % ((step + 1) * batch_size))

    # Display metrics at the end of each epoch.
    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))

    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    # Run a validation loop at the end of each epoch.
    for x_batch_val, y_batch_val in val_dataset:
        val_logits = model(x_batch_val, training=False)
        # Update val metrics
        val_acc_metric.update_state(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))
    print("Time taken: %.2fs" % (time.time() - start_time))
Specifically, I wonder how I can calculate the f1-score in exactly the same way as train_acc_metric and val_acc_metric in the code segment above (i.e. calling update_state, result, and reset_states at exactly the same locations as train_acc_metric and val_acc_metric).
You can use this code:
f1 = 2*(tf.compat.v1.metrics.recall(labels, predictions) * tf.compat.v1.metrics.precision(labels, predictions)) / ( tf.compat.v1.metrics.recall(labels, predictions) + tf.compat.v1.metrics.precision(labels, predictions))
Or you can try this one:
def f1_m(y_true, y_pred):
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
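The precision_m and recall_m helpers are not defined in the original answer; a common way to write them with Keras backend ops (my assumption here, for 0/1-encoded labels) is:
from tensorflow.keras import backend as K

def recall_m(y_true, y_pred):
    # true positives over all actual positives
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    return true_positives / (possible_positives + K.epsilon())

def precision_m(y_true, y_pred):
    # true positives over all predicted positives
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + K.epsilon())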
Or this one:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.Accuracy(),
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall(),
                       tfa.metrics.F1Score(num_classes=nb_classes,
                                           average='macro',
                                           threshold=0.5)])
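If you want the f1-score to follow exactly the same update_state / result / reset_states pattern as train_acc_metric and val_acc_metric, another option is to subclass tf.keras.metrics.Metric yourself. This is only a minimal sketch for the binary case (the class name BinaryF1Score and its threshold argument are my own, not from the original answer, and it assumes y_pred holds probabilities):
import tensorflow as tf

class BinaryF1Score(tf.keras.metrics.Metric):
    """Streaming F1 for binary classification."""

    def __init__(self, threshold=0.5, name='f1_score', **kwargs):
        super().__init__(name=name, **kwargs)
        self.threshold = threshold
        self.tp = self.add_weight(name='tp', initializer='zeros')
        self.fp = self.add_weight(name='fp', initializer='zeros')
        self.fn = self.add_weight(name='fn', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
        y_pred = tf.cast(tf.reshape(y_pred, [-1]) > self.threshold, tf.float32)
        self.tp.assign_add(tf.reduce_sum(y_true * y_pred))
        self.fp.assign_add(tf.reduce_sum((1.0 - y_true) * y_pred))
        self.fn.assign_add(tf.reduce_sum(y_true * (1.0 - y_pred)))

    def result(self):
        eps = tf.keras.backend.epsilon()
        precision = self.tp / (self.tp + self.fp + eps)
        recall = self.tp / (self.tp + self.fn + eps)
        return 2.0 * precision * recall / (precision + recall + eps)

    def reset_states(self):
        for v in (self.tp, self.fp, self.fn):
            v.assign(0.0)
It can then be used at exactly the same places as the accuracy metrics: update_state(y_batch, predictions) inside the batch loop, result() at the end of each epoch, and reset_states() before the next one.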
First, I'm sorry, but it's not possible to reproduce this problem in a few lines, as the model involved is a very complex network.
But here is an idea of the code:
def return_iterator(data, nb_epochs, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(data)
    dataset = dataset.repeat(nb_epochs).batch(batch_size)
    iterator = dataset.make_one_shot_iterator()
    yy = iterator.get_next()
    return tf.cast(yy, tf.float32)

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:

    y_pred = complex_model.autoencode(train)
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)

    nb_epochs = 10
    batch_size = 64

    y_real = return_iterator(train, nb_epochs, batch_size)
    y_pred = return_iterator(y_pred, nb_epochs, batch_size)

    res_equal = 1. - tf.reduce_mean(tf.abs(y_pred - y_real), [1,2,3])
    loss = 1 - tf.reduce_sum(res_equal, axis=0)

    opt = tf.train.AdamOptimizer().minimize(loss)

    tf.global_variables_initializer().run()

    for epoch in range(0, nb_epochs):
        _, d_loss = sess.run([opt, loss])
To define the loss, I must use operations like tf.reduce_mean and tf.reduce_sum, and these operations only accept Tensors as input.
My question is: with this code, will the complex_model autoencoder be trained during training? (Even though here it is only used to output the predictions used to compute the loss.)
Thank you.
P.S.: I am using TF 1.15 (and I cannot use another version).
Consider the following toy model:
class MyModel(keras.Model):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.square_layer = keras.layers.Dense(2)
        self.cube_layer = keras.layers.Dense(2)
        self.optimizer = tf.keras.optimizers.Adam()

    @tf.function
    def call(self, X):
        return tf.stack([self.square_layer(X), self.cube_layer(X)], axis=-1)

    @tf.function
    def train_step(self, inputs, targets):
        with tf.GradientTape() as tape:
            predictions = self(inputs)
            loss = tf.reduce_mean(tf.square(predictions - targets))
        grads = tape.gradient(loss, self.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))
        return loss
If we train using the following 'train' function, and set 'self.cube_layer.trainable' as True or False, the result is as expected in both cases:
def train(self, inputs, targets, num_epochs=5000):
    self.cube_layer.trainable = False  # True or False
    self.compile(optimizer=self.optimizer)
    for epoch in range(num_epochs):
        loss = self.train_step(inputs, targets)
    print("Loss: " + str(loss))

inputs = tf.constant([[1,2]], dtype=tf.float32)
targets = tf.constant([[[3,6], [9,12]]], dtype=tf.float32)
model = MyModel()
model.train(inputs, targets)
print(model(inputs))
But, if we change the 'trainable' flag during training, the result is not as expected:
def train(self, inputs, targets, num_epochs=5000):
    self.cube_layer.trainable = False
    self.compile(optimizer=self.optimizer)
    for epoch in range(num_epochs):
        loss = self.train_step(inputs, targets)

    self.cube_layer.trainable = True
    self.compile(optimizer=self.optimizer)
    for epoch in range(num_epochs):
        loss = self.train_step(inputs, targets)

    print("Loss: " + str(loss))

inputs = tf.constant([[1,2]], dtype=tf.float32)
targets = tf.constant([[[3,6], [9,12]]], dtype=tf.float32)
model = MyModel()
model.train(inputs, targets)
print(model(inputs))
In the above example, if we remove the '@tf.function' decorators from 'call' and 'train_step', the result is as expected! So I believe it has something to do with tf.function and TensorFlow graph compilation.
Is there a way we can use tf.function and still set the 'trainable' attribute dynamically during training? I am using tensorflow 2.9.1.
This is a very interesting and significant problem. Let's locate it by adding three print lines and doing a little test with num_epochs=5, based on the last train function in your question, i.e.:
...
@tf.function
def train_step(self, inputs, targets):
    with tf.GradientTape() as tape:
        predictions = self(inputs)
        loss = tf.reduce_mean(tf.square(predictions - targets))
    grads = tape.gradient(loss, self.trainable_variables)
    tf.print(len(self.trainable_variables), "in graph")  # add
    self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
    return loss
...
def train(self, inputs, targets, num_epochs=5):
    self.cube_layer.trainable = False
    print(len(self.trainable_variables), "before frozen")  # add
    self.compile(optimizer=self.optimizer)
    for epoch in range(num_epochs):
        loss = self.train_step(inputs, targets)

    self.cube_layer.trainable = True
    print(len(self.trainable_variables), "after frozen")  # add
    self.compile(optimizer=self.optimizer)
    for epoch in range(num_epochs):
        loss = self.train_step(inputs, targets)
The output is:
0 before frozen
2 in graph
2 in graph
2 in graph
2 in graph
2 in graph
4 after frozen
2 in graph
2 in graph
2 in graph
2 in graph
2 in graph
So even though you changed cube_layer's flag, and that did change model.trainable_variables, it did not affect train_step.
That is because train_step has already been converted into a graph and is not converted again here. (This does not mean that once a function has been converted into a computation graph it stays unchanged forever.)
The deeper reason is tf.function's tracing mechanism: if you repeatedly call a graphed function with the same argument types, TensorFlow skips the tracing stage and reuses the previously traced graph, since the generated graph would be identical. Here the inputs of train_step did not change, so no new graphed function is created, and the modification of self.cube_layer.trainable has no effect inside it.
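As a small standalone illustration of this tracing behaviour (my own sketch, not part of the original answer), note that a Python-side print only runs while the function is being traced:
import tensorflow as tf

@tf.function
def double(x):
    print("tracing")        # executes only during tracing, not when a traced graph is reused
    return x * 2

double(tf.constant(1.0))    # prints "tracing": the first call builds the graph
double(tf.constant(2.0))    # silent: same input signature, the traced graph is reused
double(tf.constant([1.0]))  # prints "tracing" again: a new input shape forces a retrace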
So, let's fix it. In fact, it's not a bug; we'd better not mix the high-level (compile/fit) and medium-level (tf.GradientTape) APIs. model.compile only configures model.fit and does nothing here.
A better way to write it here is:
class MyModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super(MyModel, self).__init__(**kwargs)
        self.square_layer = tf.keras.layers.Dense(2)
        self.cube_layer = tf.keras.layers.Dense(2)
        self.optimizer = tf.keras.optimizers.Adam()

    @tf.function
    def call(self, X):
        return tf.stack([self.square_layer(X), self.cube_layer(X)], axis=-1)

    @tf.function
    def train_step1(self, inputs, targets):
        with tf.GradientTape() as tape:
            predictions = self(inputs)
            loss = tf.reduce_mean(tf.square(predictions - targets))
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return loss

    @tf.function
    def train_step2(self, inputs, targets):
        with tf.GradientTape() as tape:
            predictions = self(inputs)
            loss = tf.reduce_mean(tf.square(predictions - targets))
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return loss

    def train(self, inputs, targets, num_epochs=5000):
        self.cube_layer.trainable = False
        self.train_step = self.train_step1
        for epoch in range(num_epochs):
            loss = self.train_step(inputs, targets)

        self.cube_layer.trainable = True
        self.train_step = self.train_step2
        for epoch in range(num_epochs):
            loss = self.train_step(inputs, targets)

        print("Loss: " + str(loss))

inputs = tf.constant([[1,2]], dtype=tf.float32)
targets = tf.constant([[[3,6], [9,12]]], dtype=tf.float32)
model = MyModel()
model.train(inputs, targets)
print(model(inputs))
And everything works as expected:
Loss: tf.Tensor(1.351493e-06, shape=(), dtype=float32)
tf.Tensor(
[[[ 3. 5.9999933]
[ 8.999994 11.997685 ]]], shape=(1, 2, 2), dtype=float32)
I have created a custom loss (weighted absolute error) in Keras, but the implementation doesn't work: I get the error ValueError: No gradients provided for any variable: ['my_model/conv2d/kernel:0', 'my_model/conv2d/bias:0'].
I want to apply a different weight to each pixel.
class WeightedMeanAbsoluteError(tf.keras.metrics.Metric):
    def __init__(self, name='weighted_mean_absolute_error'):
        super(WeightedMeanAbsoluteError, self).__init__(name=name)
        self.wmae = self.add_weight(name='wmae', initializer='zeros')

    def update_state(self, y_true, y_pred, loss_weights):
        values = tf.math.abs(y_true - y_pred) * loss_weights
        return self.wmae.assign_add(tf.reduce_sum(values))

    def result(self):
        return self.wmae

    def reset_states(self):
        # The state of the metric will be reset at the start of each epoch.
        self.wmae.assign(0.)

loss_object = WeightedMeanAbsoluteError()
train_loss = WeightedMeanAbsoluteError()
I use the following code to implement a training step:
@tf.function
def train_step(input_images, output_images):
    with tf.GradientTape() as tape:
        # training=True is only needed if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        result_images = model(input_images, training=True)
        loss = loss_object(output_images, result_images)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Also, my code works just fine if I use:
loss_object = tf.keras.losses.MeanAbsoluteError()
train_loss = tf.keras.metrics.MeanAbsoluteError()
The best and simplest way to minimize a weighted standard loss (such as MAE) is to use the sample_weight parameter of the fit method, where we pass an array with the desired weight for each sample:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

X = np.random.uniform(0, 1, (1000, 50))
y = np.random.uniform(0, 1, 1000)
W = np.random.randint(1, 10, 1000)

inp = Input((50,))
x = Dense(64, activation='relu')(inp)
out = Dense(10)(x)
model = Model(inp, out)
model.compile('adam', 'mae')

model.fit(X, y, epochs=100, sample_weight=W)
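Regarding the original custom training loop: the "No gradients provided" error comes from using a tf.keras.metrics.Metric as the loss, since its assign_add state updates are not differentiable. If the per-pixel weighting is really needed inside a GradientTape loop, a plain tensor function keeps the gradient path intact. This is only a minimal sketch, assuming model, optimizer and a per-pixel loss_weights tensor are defined as in the question:
import tensorflow as tf

def weighted_mae(y_true, y_pred, loss_weights):
    # element-wise absolute error scaled by the per-pixel weights, averaged over the batch
    return tf.reduce_mean(tf.abs(y_true - y_pred) * loss_weights)

@tf.function
def train_step(input_images, output_images, loss_weights):
    with tf.GradientTape() as tape:
        result_images = model(input_images, training=True)
        loss = weighted_mae(output_images, result_images, loss_weights)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss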