Which policy to use after training RL agent - tensorflow

When running the Tensorflow agents notebook for the Soft Actor-Critic Minitaur, https://www.tensorflow.org/agents/tutorials/7_SAC_minitaur_tutorial, the following directories are created under /tmp:
+tmp
  -eval
  -train
  +policies
    -checkpoints
    -collect_policy
    -greedy_policy
    -policy
I initially assumed that 'collect_policy' is the policy from which the agent learns (since SAC is off-policy), that 'greedy_policy' is the optimal policy, continually updated as training progresses, and that 'checkpoints' are stored in case you want to resume training at a later stage. What 'policy' is, I don't know.
However, I see that 'collect_policy', 'greedy_policy' and 'policy' are sometimes only modified when training starts, specifically when the checkpointing triggers are created:
# Triggers to save the agent's policy checkpoints.
learning_triggers = [
    triggers.PolicySavedModelTrigger(
        saved_model_dir,
        tf_agent,
        train_step,
        interval=policy_save_interval),
    triggers.StepPerSecondLogTrigger(train_step, interval=1000),
]
And other times they are updated continuously. Checkpoints are always updated continuously. I am therefore unsure which policy should be used after training (for inference, so to speak), since the checkpoints only store model variables, which, as far as I can tell, need to be loaded in conjunction with a policy.
To summarize: after training, which policy (or policy + checkpoint) do I use to get the best results, and how do I load it?
Thanks!
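For context, each of those policy directories is a regular TF SavedModel, so whichever one you settle on, loading it for inference can look roughly like the following sketch (the policy_dir path and the eval_env environment are placeholders based on the tutorial's setup, not verified code):
import tensorflow as tf

# Placeholder path; each policy directory under /tmp/policies is a SavedModel.
policy_dir = "/tmp/policies/greedy_policy"
saved_policy = tf.saved_model.load(policy_dir)

# Run the loaded policy on a batched TimeStep from an evaluation environment.
policy_state = saved_policy.get_initial_state(batch_size=1)
time_step = eval_env.reset()  # eval_env: a TFPyEnvironment wrapping the Minitaur env (not shown)
action_step = saved_policy.action(time_step, policy_state)
print(action_step.action)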

Related

Logging state of AdamW optimizer at certain stage in training

I'm using an AdamW optimizer to train a model (TensorFlow 2.0).
I need to log its state at certain steps during training so that I can later reload the optimizer's state and resume training from there.
Do I need to log anything else, apart from the initial config (the output of optimizer.get_config()) and its weights (the output of optimizer.get_weights()) in order to fully reconstruct the optimizer's state?
If you are using a Keras Model, you can simply call model.save and later restore it with tf.keras.models.load_model. This ensures that the optimizer state is restored exactly as it was when saving.
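A minimal, self-contained sketch of that suggestion (a toy model and plain Adam stand in for the AdamW setup; paths are arbitrary):
import numpy as np
import tensorflow as tf

# Toy model just to illustrate; swap in your own model and AdamW optimizer.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

x, y = np.random.rand(32, 4), np.random.rand(32, 1)
model.fit(x, y, epochs=2, verbose=0)

# Saves architecture, weights and optimizer state (slot variables, iteration count).
model.save("ckpt/model_at_step_n")

# Later: restore and keep training; the optimizer resumes exactly where it stopped.
restored = tf.keras.models.load_model("ckpt/model_at_step_n")
restored.fit(x, y, epochs=2, verbose=0)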

tensorflow federated learning checkpoint

I am studying the federated_learning_for_image_classification.ipynb example with the TensorFlow Federated API.
In the example, I can check each simulated client's training accuracy and loss, as well as the total accuracy and total loss.
But there are no checkpoint files.
I want to create a checkpoint file for each client as well as a total (global) checkpoint file,
and then compare the client parameter variables with the total parameter variables.
Can anyone help me create checkpoint files in the federated_learning_for_image_classification.ipynb example?
One question to ask is whether you want to compare the variables within TFF (as part of the federated computation) or post-hoc/outside TFF (analyzing within Python).
Modifying the tff.utils.IterativeProcess construction performed by tff.learning.build_federated_averaging_process may be a good way to go. In fact, I'd recommend forking the simplified implementation on GitHub at tensorflow_federated/python/research/simple_fedavg/simple_fedavg.py rather than digging into tff.learning.
Changing the line that performs a tff.federated_mean on the updates from the clients to a tff.federated_collect will give a list of all the clients' models, which can then be compared to the global model.
Example:
client_deltas = tff.federated_collect(client_outputs.weights_delta)

@tff.tf_computation(server_state.model.type_signature,
                    client_deltas.type_signature)
def compare_deltas_to_global(global_model, deltas):
  for delta in deltas:
    pass  # do something with delta vs global_model

tff.federated_apply(compare_deltas_to_global, (server_state.model, client_deltas))

Tensorflow eager choose checkpoint max to keep

I'm writing a process-based implementation of a3c with tensorflow in eager mode. After every gradient update, my general model writes its parameters as checkpoints to a folder. The workers then update their parameters by loading the last checkpoints from this folder. However, there is a problem.
Often, while the worker is reading the last available checkpoint from the folder, the master network writes new checkpoints to the folder and sometimes erases the checkpoint the worker is reading. A simple solution would be to raise the maximum number of checkpoints to keep. However, tfe.Checkpoint and tfe.Saver don't have a parameter to choose the max to keep.
Is there a way to achieve this?
For the tf.train.Saver you can specify max_to_keep:
tf.train.Saver(max_to_keep=10)
and max_to_keep seems to be present in both tfe.Saver and its underlying tf.train.Saver.
I haven't tried whether it works, though.
It seems the suggested way of doing checkpoint deletion is to use the CheckpointManager.
import tensorflow as tf
checkpoint = tf.train.Checkpoint(optimizer=optimizer, model=model)
manager = tf.contrib.checkpoint.CheckpointManager(
    checkpoint, directory="/tmp/model", max_to_keep=5)
status = checkpoint.restore(manager.latest_checkpoint)
while True:
  # train
  manager.save()

How to smoothly produce Tensorflow auc summaries for training and test sets?

Tensorflow describes writing file summaries to visualize graph execution.
I envision three stages:
training the data (with optimization)
measuring accuracy on the training set (no optimization)
measuring accuracy on the test set (no optimization!)
I'd like all stages in the same script, as in the evaluate function of the wide_and_deep tutorial, but with the low-level API. I'd like three different graphs for stats like loss or AUC, one for each stage.
Suppose I use one session, and in each stage I define an AUC summary op:
# define auc
auc, auc_op = tf.metrics.auc(labels, predictions)
# summary scalar to track it
tf.summary.scalar("auc", auc_op, family=family_name)
# merge all summaries for evaluation and later writing
summary_op = tf.summary.merge_all()
...
summary_writer.add_summary(summary, step_num)
There are three graphs, but the first graph has all three runs on it, and the second graph has the last two runs (see below). What's worse, each stage starts from the previous state. This makes sense, because all the variables from the previous stages are still around.
I could use a different session for each stage, but that would throw away the model as well.
What is the smooth way to handle this?
I'd like to just clear some of the summary variables. I've tried re-initializing some variables, looked at related questions, read about name scope and variable scope and tried not to re-use variables for AUC, read about variables and sharing, looked into pruning nodes (though I don't understand it), etc. I have not made it work yet.
I am using the low-level API. I saw something like this in the high-level API in _eval_metric_ops, but I don't understand how they 'clear' the different stages. With name_scope?
Do I have to save and load the model into a new session just for this, or is there some clean way to graph each summary separately?
The metric ops will be local variables, so you could run tf.local_variables_initializer() in your Session, which will reset all of your metrics. You could also look through the local variables collection for those with "auc" in the name if you wanted to be a bit more discerning. The high-level way to do this would be to use an Estimator, which will manage metrics for you.
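A hedged sketch of the more selective variant, in TF 1.x graph mode (the data fed here is made up purely for illustration):
import tensorflow as tf

labels = tf.placeholder(tf.float32, [None])
predictions = tf.placeholder(tf.float32, [None])

# tf.metrics.auc keeps its running counters (true_positives, ...) in local variables.
auc, auc_update_op = tf.metrics.auc(labels, predictions)

# Reset only the AUC counters by filtering local variables on their name;
# tf.local_variables_initializer() would instead wipe every metric at once.
auc_vars = [v for v in tf.local_variables() if "auc" in v.name]
reset_auc = tf.variables_initializer(auc_vars)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())
    # Stage 1: training-set AUC.
    sess.run(auc_update_op, {labels: [0., 1.], predictions: [0.1, 0.8]})
    print("train AUC:", sess.run(auc))

    sess.run(reset_auc)  # start the next stage from a clean slate
    # Stage 2: test-set AUC.
    sess.run(auc_update_op, {labels: [0., 1.], predictions: [0.4, 0.6]})
    print("test AUC:", sess.run(auc))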

Checkpoint file not found, restoring evaluation graph

I have a model which runs in distributed mode for 4000 steps. Every 120 s the accuracies are calculated (as is done in the provided examples). However, at times the latest checkpoint file is not found.
Error:
Couldn't match files for checkpoint gs://path-on-gcs/train/model.ckpt-1485
The checkpoint file is present at the location. A local run for 2000 steps runs perfectly.
last_checkpoint = tf.train.latest_checkpoint(train_dir(FLAGS.output_path))
I assume that the checkpoint is still in the process of being saved and the files have not actually been written yet. I tried introducing a wait before the accuracies are calculated, as shown below. This seemed to work at first, but the model still failed with a similar issue.
saver.save(session, sv.save_path, global_step)
time.sleep(2) #wait for gcs to be updated
From your comment I think I understand what is going on. I may be wrong.
The cloud_ml distributed sample
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/mnist/hptuning/trainer/task.py#L426
uses a temporary file by default. As a consequence, it works locally on /tmp. Once the training is complete, it copies the result to gs://, but it does not fix up the checkpoint file, which still contains references to local model files on /tmp. Basically, this is a bug.
To avoid this, you should launch the training process with --write_to_tmp 0, or modify the task.py file directly to disable this option. TensorFlow will then work directly on gs://, and the resulting checkpoint will therefore be consistent. At least it worked for me.
One way of checking whether my assumptions are correct is to copy the resulting checkpoint file from gs:// to your local filesystem using gsutil and then inspect its content.
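Alternatively, a Python-side way to inspect the same thing, without downloading anything, might be to read the CheckpointState proto that tf.train.latest_checkpoint relies on (the gs:// path is the one from the error above):
import tensorflow as tf

# Reads the small 'checkpoint' state file in the training directory.
ckpt_state = tf.train.get_checkpoint_state("gs://path-on-gcs/train")

# If these paths point at /tmp instead of gs://, the state file is inconsistent.
print(ckpt_state.model_checkpoint_path)
print(ckpt_state.all_model_checkpoint_paths)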