PyTorch equivalent of BestExporter? - tensorflow

I am searching for an equivalent of BestExporter in PyTorch to save the most recent N best checkpoints, but I cannot find one.
Here is my implementation:
import torch

class SaveBestModel:
    """
    Class to save the best model while training. If the current epoch's
    validation loss is less than the previous least loss, then save the
    model state.
    """
    def __init__(self, best_valid_loss=float('inf')):
        self.best_valid_loss = best_valid_loss

    def __call__(self, current_valid_loss, epoch, model, optimizer):
        if current_valid_loss < self.best_valid_loss:
            self.best_valid_loss = current_valid_loss
            print(f"\nBest validation loss: {self.best_valid_loss}")
            print(f"\nSaving best model for epoch: {epoch+1}\n")
            torch.save({
                'epoch': epoch + 1,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
            }, f'weights/best_at_epoch_{epoch}_with_loss_{current_valid_loss}.pth')
It simply saves every checkpoint whose loss improves on the previous best. Is there a high-level PyTorch API equivalent to TensorFlow's BestExporter, i.e. one that keeps only the N best checkpoints?
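To my knowledge, core PyTorch has no built-in equivalent, though PyTorch Lightning's ModelCheckpoint callback offers a save_top_k option that does this. In plain PyTorch, a minimal sketch of keeping only the N best checkpoints by hand could look like the following; the class name, its arguments, and the assumption that a weights/ directory exists are all illustrative, not an existing API:

import os
import heapq
import torch

class SaveTopN:
    """Illustrative sketch: keep only the N checkpoints with the lowest validation loss."""
    def __init__(self, n=3, directory='weights'):
        self.n = n
        self.directory = directory
        self.kept = []  # heap of (-loss, path); the worst checkpoint sits at the top

    def __call__(self, current_valid_loss, epoch, model, optimizer):
        path = os.path.join(self.directory,
                            f'epoch_{epoch+1}_loss_{current_valid_loss:.4f}.pth')
        torch.save({'epoch': epoch + 1,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict()}, path)
        heapq.heappush(self.kept, (-current_valid_loss, path))
        if len(self.kept) > self.n:
            _, worst_path = heapq.heappop(self.kept)  # drop the checkpoint with the highest loss
            os.remove(worst_path)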

Related

Is there a way to reset the learning rate on each fold while employing the ReduceLROnPlateau callback of Keras?

As the title says, I'm looking for a way to reset the learning rate (lr) on each fold; the lr itself is managed by Keras' ReduceLROnPlateau callback.
Below is a custom callback that will do the job. At the start of training, the callback prompts the user to enter the value of the initial learning rate.
import tensorflow as tf
from tensorflow import keras

class INIT_LR(keras.callbacks.Callback):
    def __init__(self, model):  # initialization of the callback
        super(INIT_LR, self).__init__()
        self.model = model

    def on_train_begin(self, logs=None):  # this runs at the beginning of training
        print('Enter initial learning rate below')
        lr = input('')
        tf.keras.backend.set_value(self.model.optimizer.lr, float(lr))  # set the learning rate in the optimizer
        lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))  # read it back to ensure it is set
        print('Optimizer learning rate set to ', lr)
In model.fit, set the parameter
callbacks = [INIT_LR(model), rlronp]
Note: model is the name of your compiled model, and rlronp is the name of your ReduceLROnPlateau callback. When you run model.fit you will be
prompted with
Enter initial learning rate below # printed by the callback
.001 # user entered initial learning rate
Optimizer learning rate set to 0.0010000000474974513 # printed by the callback
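For context, a hedged sketch of how the pieces might fit together; the ReduceLROnPlateau settings and the data variable names (x_train, y_train, x_val, y_val) are placeholders rather than anything from the question:

import tensorflow as tf

# Placeholder ReduceLROnPlateau configuration
rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=20,
          callbacks=[INIT_LR(model), rlronp])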
With no reproducible example I can only make a suggestion. If you take a look at the source code of ReduceLROnPlateau you can get some inspiration and create a custom callback to reset the learning rate on the beginning of training:
import tensorflow as tf

class ResetLR(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        default_lr = 0.1
        previous_lr = self.model.optimizer.lr.read_value()
        if previous_lr != default_lr:
            print("Resetting learning rate from {} to {}".format(previous_lr, default_lr))
            self.model.optimizer.lr.assign(default_lr)
So with this callback you train using a for loop:
custom_callback = ResetLR()
for fold in folds:
    model.fit(...., callbacks=[custom_callback])
If this does not work (due to tensorflow versions) you can try assigning the default learning rate using the tf.keras.backend like so:
class ResetLR(tf.keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        default_lr = 0.1
        previous_lr = float(tf.keras.backend.get_value(self.model.optimizer.lr))
        if previous_lr != default_lr:
            print("Resetting learning rate from {} to {}".format(previous_lr, default_lr))
            tf.keras.backend.set_value(self.model.optimizer.lr, default_lr)
I would also suggest taking a look at this post for more references.

Implementing custom loss that makes use of training batch data with Keras

I am trying to implement a GAN called the SimGAN proposed by Apple researchers. The SimGAN is used to refine labelled synthetic images so that they look more like the unlabelled real images.
The link to the paper can be found on arXiv here.
In the paper, the loss function of the combined model, which comprises the generator and the discriminator, has a self-regularization component in the form of an L1 loss that penalizes too great a difference between the synthetic images and the images after refinement. In other words, the refinement should not be too drastic.
I would like to know how I can implement this self-regularization loss in Keras. Here is what I tried:
def self_regularization_loss(refined_images, syn_images):
    def l1loss(y_true, y_pred):
        return keras.metrics.mean_absolute_error(refined_images, syn_images)
    return l1loss
However, I do not think I can compile the model in the way below as the batches of refined and synthetic images change during training time.
model.compile(loss=[self_regularization_loss(current_batch_of_refined, current_batch_of_synthetic),
                    local_adversarial_loss],
              optimizer=opt)
What is the way to implement this loss?
Try using the tf.function decorator and tf.GradientTape():
@tf.function
def train_step(model, optimizer, batch):
    with tf.GradientTape() as tape:
        refined_images, syn_images = batch
        loss = self_regularization_loss(model, refined_images, syn_images)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
Your training loop can then look something like:
for image_batch in dataset:
    train_step(model, optimizer, image_batch)
Here it is assumed that model is of type tf.keras.Model. More details on the Model class can be found here. Note that model is also passed to self_regularization_loss. In this function your model receives both images as inputs and then gives you the respective output. Then you calculate your loss.
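To make the train_step above concrete, here is a hedged sketch of one way self_regularization_loss could be written with that signature; the lambda_reg weight and the decision to recompute the refinement inside the function are assumptions about the use case, not the paper's exact formulation:

import tensorflow as tf

def self_regularization_loss(model, refined_images, syn_images, lambda_reg=1.0):
    # Recompute the refinement inside the loss so gradients flow through the model;
    # the batch's precomputed refined_images are therefore not used directly here.
    model_refined = model(syn_images, training=True)
    return lambda_reg * tf.reduce_mean(tf.abs(model_refined - syn_images))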

didn't assign model.fit, can I plot the history?

Can I plot accuracy, loss, etc. if I didn't assign the return value of model.fit to anything?
I just wrote model.fit and trained the model.
Thanks
There are two ways to do this without a History object in Keras:
1. Take the console output of the Keras training, manually extract the loss value of every epoch, and build the plot by hand by filling two numpy arrays with the values (one for the loss and another for the validation loss). This seems like a long task, but with a text editor like Visual Studio Code it is a matter of seconds.
2. Write a callback that, after every epoch, writes the epoch's results to an external text file, then extract the values in a similar way as in point 1.
Something like this:
import numpy as np
from contextlib import redirect_stdout
from tensorflow.keras.callbacks import Callback

class print_log_Callback(Callback):
    def __init__(self, logpath, steps):
        self.logpath = logpath
        self.losslst = np.zeros(steps)

    def on_epoch_end(self, epoch, logs=None):
        self.losslst[epoch] = logs["loss"]  # record this epoch's loss so the mean below is meaningful
        with open(self.logpath, 'a') as writefile:
            with redirect_stdout(writefile):
                print("The average loss for epoch {} is {:7.2f} and val_loss is {:7.2f}.".format(
                    epoch, logs["loss"], logs['val_loss']))
                writefile.write("\n")
                print("The mean train loss is: ", np.mean(self.losslst))
                writefile.write("\n")
                writefile.write("\n")
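For completeness, a hedged usage sketch for the callback above; the log file name, epoch count, and data variable names are placeholders:

NUM_EPOCHS = 20
log_cb = print_log_Callback('training_log.txt', NUM_EPOCHS)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=NUM_EPOCHS,
          callbacks=[log_cb])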

Keras evaluate the validation data before the epoch ends

I want to train my model with Keras. I'm using a huge dataset where one training epoch has more than 30,000 steps. My problem is that I don't want to wait for a full epoch before checking the model's improvement on the validation dataset. Is there any way to make Keras evaluate the validation data every 1000 steps of the training data? I think one option would be to use a callback, but is there any built-in solution in Keras?
if train:
    log('Start training')
    history = model.fit(train_dataset,
                        steps_per_epoch=train_steps,
                        epochs=50,
                        validation_data=val_dataset,
                        validation_steps=val_steps,
                        callbacks=[
                            keras.callbacks.EarlyStopping(
                                monitor='loss',
                                patience=10,
                                restore_best_weights=True,
                            ),
                            keras.callbacks.ModelCheckpoint(
                                filepath='model.h5',
                                monitor='val_loss',
                                save_best_only=True,
                                save_weights_only=True,
                            ),
                            keras.callbacks.ReduceLROnPlateau(
                                monitor='val_loss',
                                factor=0.5,
                                patience=3,
                                min_lr=0.001,
                            ),
                        ],
                        )
You cannot do that with the built-in callbacks; you need to implement a custom callback.
import datetime
import tensorflow as tf

class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        print('Training: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

    def on_train_batch_end(self, batch, logs=None):
        print('Training: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))

    def on_test_batch_begin(self, batch, logs=None):
        print('Evaluating: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

    def on_test_batch_end(self, batch, logs=None):
        print('Evaluating: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))
This is taken from the TensorFlow documentation.
You can override the on_train_batch_end() function and, since the batch parameter is an integer, check whether batch % 100 == 0 and then run self.model.predict(val_data) (or whatever evaluation you need); see the sketch below.
Please check my answer here: How to get other metrics in Tensorflow 2.0 (not only accuracy)? for a good overview of how to write a custom callback. Please note that in your case it is on_train_batch_end(), not on_epoch_end(), that matters.
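To illustrate the idea, a hedged sketch of such a callback; the validation dataset, the interval of 1000 batches, and the use of model.evaluate instead of predict are assumptions, and depending on the TF version calling evaluate inside fit can interfere with training metrics, so treat it as a starting point:

import tensorflow as tf

class ValidateEveryNBatches(tf.keras.callbacks.Callback):
    # Hypothetical callback: evaluate on a validation dataset every N training batches.
    def __init__(self, val_dataset, every_n_batches=1000):
        super().__init__()
        self.val_dataset = val_dataset
        self.every_n_batches = every_n_batches

    def on_train_batch_end(self, batch, logs=None):
        if batch > 0 and batch % self.every_n_batches == 0:
            results = self.model.evaluate(self.val_dataset, verbose=0)
            print('\nValidation at batch {}: {}'.format(batch, results))

It would then be added to the callbacks list of model.fit alongside the existing ones.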

How can I train a neural network with weights constrained to specific values?

I am trying to train a network with weights that can only have certain values. However, the way that I am doing this takes a very long time, e.g. 5h per epoch for a 3-layered fully connected network on MNIST. Is there a faster way to do this?
I am using tf.keras to build my network. I added a custom tf.keras constraint that does a binary search on the list of possible weight values when updating the weights. I found binary search code here that I adapted for my application. In order to apply the binary search function to all the parameters, I use tf.map_fn.
Here is the Constraint class:
import tensorflow as tf
from tensorflow.python.keras.constraints import Constraint

# binary search function
def find(weights, query, shape):
    vals = tf.map_fn(
        lambda x: weights[tf.argmin(tf.cast(x >= weights, dtype=tf.int32)[1:]
                                    - tf.cast(x >= weights, dtype=tf.int32)[:-1])],
        tf.reshape(query, [-1]))
    return tf.reshape(vals, shape)

class WeightQuantizeClip(Constraint):
    # weights parameter holds the possible weight values
    def __init__(self, weights=[]):
        self.weights = tf.convert_to_tensor(weights)

    def __call__(self, p):
        p = find(self.weights, p, p.shape)
        return p

    def get_config(self):
        return {'name': self.__class__.__name__}
When I train a network with the above constraint, the weights indeed take only the allowed values, but the training time increases enormously. Without the binary search function my GPU is fully utilized, but with it the utilization drops to 2%. Can anyone help me with this?
From your description it seems that some part of the clipping op gets executed on the CPU, which requires RAM-VRAM communication and is extremely slow.
However, if you are trying to do traditional NN quantization, there is actually a whole TF module built for this purpose; you may want to check it out, as it may cover your use case.
https://www.tensorflow.org/api_docs/python/tf/quantization/quantize
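As an aside that goes beyond the original answer: the tf.map_fn in the constraint above launches one small op per weight, which is a likely reason for the 2% GPU utilization. Below is a hedged sketch of a fully vectorized alternative that snaps every weight to the nearest allowed value in a single batched op, assuming nearest-value snapping is acceptable for this quantization scheme:

import tensorflow as tf
from tensorflow.keras.constraints import Constraint

class VectorizedQuantizeClip(Constraint):
    # Illustrative alternative to WeightQuantizeClip: no tf.map_fn, one batched op.
    def __init__(self, allowed_values):
        self.allowed_values = list(allowed_values)

    def __call__(self, p):
        allowed = tf.cast(tf.constant(self.allowed_values), p.dtype)
        flat = tf.reshape(p, [-1])                               # flatten the weight tensor
        dists = tf.abs(flat[:, None] - allowed[None, :])         # distance to every allowed value
        nearest = tf.gather(allowed, tf.argmin(dists, axis=1))   # pick the closest allowed value
        return tf.reshape(nearest, tf.shape(p))

    def get_config(self):
        return {'allowed_values': self.allowed_values}

The intermediate distance matrix is num_weights × num_allowed_values, so this assumes the list of allowed values is small.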