tensorflow object detection export_inference_graph.py ckpt name

Does export_inference_graph.py need an exact checkpoint number, or is there a way to run it so that it will use the highest numbered checkpoint in a directory?

It needs an exact checkpoint number in the command to find the correct file.
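For example, with the TF1 Object Detection API the checkpoint is identified by its prefix; here is a sketch of the command, where the paths and the step number 20000 are hypothetical:

python export_inference_graph.py \
    --input_type image_tensor \
    --pipeline_config_path training/pipeline.config \
    --trained_checkpoint_prefix training/model.ckpt-20000 \
    --output_directory exported_graph

If you don't want to hard-code the step number, you can look up the newest prefix first, since tf.train.latest_checkpoint() reads the checkpoint state file in the directory and returns it:

python -c "import tensorflow as tf; print(tf.train.latest_checkpoint('training/'))"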

Related

How do I load a non-latest Tensorflow checkpoint?

I made checkpoints every 1000 steps of training, and I have 16 files in my checkpoints directory. However, when I try to retrieve the latest one, the model seems to revert to its pre-trained state. I am assuming it has something to do with the summary logs not documenting that later checkpoints exist.
chkpt.restore(tf.train.latest_checkpoint(chkpt_dir))
# fit(train_ds, test_ds, steps=100000)
for i in range(10):
    ex_input, ex_output = next(iter(test_ds.take(1)))
    generate_images(generator, ex_input, ex_output, i, test=True)
How can I manually ask the checkpoint manager to retrieve a particular checkpoint file, as opposed to .latest_checkpoint()?
Edit: Solved it myself. Open the file named checkpoint in your checkpoint folder and set the suffix number to whichever checkpoint you want to load.
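That checkpoint file is a small text proto; a sketch of its contents, with illustrative checkpoint names:

model_checkpoint_path: "ckpt-12"
all_model_checkpoint_paths: "ckpt-11"
all_model_checkpoint_paths: "ckpt-12"

tf.train.latest_checkpoint() simply returns whatever model_checkpoint_path points to, which is why editing it changes which checkpoint gets restored.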
You can use the checkpoint.restore() method to restore a checkpoint of your preference. For example, if you want to load the checkpoint from iteration 1000, you write:
checkpoint.restore('./test/model.ckpt-1000')
For more details, please refer to this documentation. Thank you.
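If you created the checkpoints with a tf.train.CheckpointManager, you can also pick any retained checkpoint from its checkpoints property instead of using latest_checkpoint(). A sketch, where chkpt and chkpt_dir are the objects from the question above:

import tensorflow as tf

# The manager recovers its list of retained checkpoints from the
# checkpoint state file in chkpt_dir, oldest first.
manager = tf.train.CheckpointManager(chkpt, chkpt_dir, max_to_keep=16)
print(manager.checkpoints)             # all retained checkpoint prefixes
chkpt.restore(manager.checkpoints[3])  # restore a specific one, e.g. the fourth oldest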

How to rewrite TensorFlow checkpoint files?

I want to change the values of tensors in one ckpt file using tensors from several other ckpt files, and then use the modified ckpt file to restart TF training jobs.
I hope you can offer some advice!
Thanks!
There are standalone utilities for reading checkpoint files (search for CheckpointReader or NewCheckpointReader), but not for modifying them. The easiest approach is probably to load the checkpoint into your model, assign a new value to the variable you want to change, and save the new checkpoint.
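A minimal TF1-style sketch of that approach, where the paths and the variable name conv1/weights are hypothetical:

import tensorflow as tf

# Read the replacement value out of another checkpoint.
reader = tf.train.NewCheckpointReader('./other/model.ckpt-500')
new_value = reader.get_tensor('conv1/weights')

with tf.Session() as sess:
    # Rebuild the graph from the meta file so the variables exist,
    # then restore the checkpoint we want to modify.
    saver = tf.train.import_meta_graph('./train/model.ckpt-1000.meta')
    saver.restore(sess, './train/model.ckpt-1000')
    # Overwrite the chosen variable and write a new checkpoint.
    var = [v for v in tf.global_variables() if v.op.name == 'conv1/weights'][0]
    sess.run(var.assign(new_value))
    saver.save(sess, './train/model-modified.ckpt')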

TypeError: names_to_saveables must be a dict mapping string names to Tensor/Variables

I am trying to convert a retrained version of MobileNet 0.50 using freeze_graph.py.
Here is my code.
python -m tensorflow.python.tools.freeze_graph --input_checkpoint=model.ckpt --input_graph=graph.pb --output_graph=frozen.pb --input_binary=TRUE --output_node_names=softmax
The input graph and checkpoint files have been obtained from retrain.py given by Tensorflow.
I've tried using .pbtxt, using the different checkpoint files output by saving the session, and setting input_binary to false. I'm just not sure where to go from here.
Thanks for your time and help!

Tensorflow can't save model

I encountered this weird problem... I use this code to construct a TensorFlow saver:
tf.train.Saver(tf.all_variables(), max_to_keep=FLAGS.keep)
which is supposed to be very standard. However, when I point the saving directory to my custom directory (under my username) instead of "/tmp", all of a sudden, the saved models are files like
translate.ckpt-329.data-00000-of-00001
translate.ckpt-329.index
translate.ckpt-329.meta
I can't find the file "translate.ckpt-329".
The generated checkpoint file is pointing to:
model_checkpoint_path: "/Users/.../train_dir/translate.ckpt-329"
all_model_checkpoint_paths: "/Users/.../train_dir/translate.ckpt-329"
while this file does not exist and creates problems for me when restoring my model.
Can someone shed any light on this?? What could possibly be the problem?
Thanks for the first answer! I guess my bigger problem is the restore method:
The original code uses this way to restore a session:
ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir)
model.saver.restore(session, ckpt.model_checkpoint_path)
Which failed with V2 saving :(
if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
    logging.info("Reading model parameters from %s" % ckpt.model_checkpoint_path)
    model.saver.restore(session, ckpt.model_checkpoint_path)
else:
    logging.info("Created model with fresh parameters.")
    session.run(tf.global_variables_initializer())
TL;DR: In the new checkpoint format, the "filename" that you pass to the saver is actually used as the prefix of several filenames, and no file with that exact name is written. You can use the old checkpoint format by constructing your tf.train.Saver with the optional argument write_version=tf.train.SaverDef.V1.
From the names of the saved files, it appears that you are using the "V2" checkpoint format, which became the default in TensorFlow 0.12. This format stores the checkpoint data in multiple files: one or more data files (e.g. translate.ckpt-329.data-00000-of-00001 in your case) and an index file (translate.ckpt-329.index) that tells TensorFlow where each saved variable is located in the data files. The tf.train.Saver uses the "filename" that you pass as the prefix for these files' names, but doesn't produce a file with that exact name.
Although there is no file with the exact name you gave, you can use the value returned from saver.save() as the argument to a subsequent saver.restore(), and the other checkpoint locating mechanisms should continue to work as before.
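For the restore snippet in the question, it is the tf.gfile.Exists() check that breaks with the V2 format, because no file with the exact prefix name exists on disk. A sketch of one fix, assuming a TF version that provides tf.train.checkpoint_exists(), which understands both formats:

ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir)
if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
    logging.info("Reading model parameters from %s" % ckpt.model_checkpoint_path)
    model.saver.restore(session, ckpt.model_checkpoint_path)
else:
    logging.info("Created model with fresh parameters.")
    session.run(tf.global_variables_initializer())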

What is conv1/weights/Adam in a checkpoint file in TensorFlow?

I printed all the tensor values in a checkpoint file.
I can understand "conv1/weights", but what is "conv1/weights/Adam" in the checkpoint file?
It's an extra variable that was created because you are using an AdamOptimizer() to train your model. Adam keeps per-variable "slot" variables holding running estimates of the first and second moments of the gradients, and these are saved alongside the model variables in the checkpoint. You can read about the algorithm in the original paper: https://arxiv.org/pdf/1412.6980v8.pdf
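To see these entries yourself, here is a minimal sketch that lists everything stored in a checkpoint (the path is hypothetical):

import tensorflow as tf

# Print every tensor name and shape stored in the checkpoint.
reader = tf.train.NewCheckpointReader('./train/model.ckpt-1000')
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)

You should see conv1/weights/Adam and conv1/weights/Adam_1 with the same shape as conv1/weights; these hold Adam's running first- and second-moment estimates for that variable.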