TensorFlow Index File Utility - tensorflow

I've been looking for a clear answer, but couldn't find one until now.
In TensorFlow, after training runs, four files are generated:
.meta,
.data,
.index and
checkpoint
What is the utility of the .index file?
Thanks!

The .index file holds an immutable key-value table that maps each serialized tensor name to where its data can be found in the .data files.
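To make the lookup idea concrete, here is a toy, pure-Python sketch. It is an illustration only, not TensorFlow's on-disk format: tensor bytes are packed into several shard buffers, and a separate index maps each tensor name to a shard id, byte offset and size. The real .index file is a sorted table in TensorFlow's internal table format, whose entries (BundleEntryProto in the TensorFlow source) also record dtype, shape and a checksum. The names write_bundle and read_tensor are hypothetical.

```python
import struct

FLOAT32 = struct.calcsize("<f")  # 4 bytes per little-endian float32 value


def write_bundle(tensors, num_shards=2):
    """Toy writer: pack float32 tensors into shard buffers and build an index.

    `tensors` maps a tensor name to a flat list of floats. The index maps each
    name to (shard_id, offset, size): which shard holds its bytes, and where.
    """
    shards = [bytearray() for _ in range(num_shards)]
    index = {}
    # Visit names in sorted order, as a sorted key-value table would store them.
    for i, name in enumerate(sorted(tensors)):
        values = tensors[name]
        shard_id = i % num_shards  # spread tensors across shards round-robin
        payload = struct.pack(f"<{len(values)}f", *values)
        index[name] = (shard_id, len(shards[shard_id]), len(payload))
        shards[shard_id].extend(payload)
    return index, [bytes(s) for s in shards]


def read_tensor(index, shards, name):
    """Look up `name` in the index, then slice its bytes out of the right shard."""
    shard_id, offset, size = index[name]
    raw = shards[shard_id][offset:offset + size]
    return list(struct.unpack(f"<{size // FLOAT32}f", raw))
```

Without the index, a reader would have no way to know which shard, and which byte range, belongs to a given variable; that is why restoring starts from the .index file.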

Related

How to rewrite TensorFlow checkpoint files?

I want to change the value of a tensor in one ckpt file using tensors from several other ckpt files, and then use the modified ckpt file to restart TF training jobs.
Hoping for some advice!
Thanks!
There are standalone utilities for reading checkpoint files (search for CheckpointReader or NewCheckpointReader) but not for modifying them. The easiest approach is probably to load the checkpoint into your model, assign a new value to the variable you want to change, and save this new checkpoint.
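The combining step itself can be kept separate from TensorFlow I/O. The sketch below assumes each checkpoint has already been read into a plain dict mapping tensor name to a flat list of floats (for example via tf.train.load_checkpoint(path).get_tensor(name) in recent TensorFlow releases), and uses element-wise averaging as one hypothetical way of merging. merge_checkpoints and elementwise_mean are illustrative names, not TensorFlow APIs.

```python
def elementwise_mean(tensors):
    """Average equally sized flat lists position by position."""
    return [sum(vals) / len(vals) for vals in zip(*tensors)]


def merge_checkpoints(target, sources, names, combine=elementwise_mean):
    """Return a copy of `target` where each tensor in `names` is replaced by
    `combine` applied to that tensor's values from every source checkpoint."""
    merged = dict(target)  # leave the caller's target dict untouched
    for name in names:
        merged[name] = combine([src[name] for src in sources])
    return merged
```

The merged values would then be assigned to the model's variables (for instance with variable.assign) and written out with the saver, as the answer describes.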

TensorFlow can't save model

I encountered this weird problem. I use this code to construct a TensorFlow saver:
tf.train.Saver(tf.all_variables(), max_to_keep=FLAGS.keep)
which is supposed to be very standard. However, when I point the saving directory to my custom directory (under my username) instead of "/tmp", all of a sudden, the saved models are files like
translate.ckpt-329.data-00000-of-00001
translate.ckpt-329.index
translate.ckpt-329.meta
I can't find the file "translate.ckpt-329".
The generated checkpoint file is pointing to:
model_checkpoint_path: "/Users/.../train_dir/translate.ckpt-329"
all_model_checkpoint_paths: "/Users/.../train_dir/translate.ckpt-329"
but this file does not exist, which causes problems when I try to restore my model.
Can someone shed any light on this? What could the problem be?
Thanks for the first answer! I guess my bigger problem is the restore method:
The original code restores a session like this:
ckpt = tf.train.get_checkpoint_state(FLAGS.train_dir)
model.saver.restore(session, ckpt.model_checkpoint_path)
Which failed with V2 saving :(
if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
    logging.info("Reading model parameters from %s" % ckpt.model_checkpoint_path)
    model.saver.restore(session, ckpt.model_checkpoint_path)
else:
    logging.info("Created model with fresh parameters.")
    session.run(tf.global_variables_initializer())
TL;DR: In the new checkpoint format, the "filename" that you pass to the saver is actually used as the prefix of several filenames, and no file with that exact name is written. You can use the old checkpoint format by constructing your tf.train.Saver with the optional argument write_version=tf.train.SaverDef.V1.
From the names of the saved files, it appears that you are using the "V2" checkpoint format, which became the default in TensorFlow 0.12. This format stores the checkpoint data in multiple files: one or more data files (e.g. translate.ckpt-329.data-00000-of-00001 in your case) and an index file (translate.ckpt-329.index) that tells TensorFlow where each saved variable is located in the data files. The tf.train.Saver uses the "filename" that you pass as the prefix for these files' names, but doesn't produce a file with that exact name.
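The naming scheme described above can be written down as a small pure-Python helper (hypothetical, for illustration). It assumes five-digit zero-padded shard numbers, as in translate.ckpt-329.data-00000-of-00001, and includes the .meta file that tf.train.Saver writes by default (write_meta_graph=True). Note that the prefix itself never appears in the returned list.

```python
def v2_checkpoint_files(prefix, num_shards=1, with_meta=True):
    """List the filenames a V2 save with this prefix is expected to produce."""
    # Data shards are numbered from 0 and zero-padded to five digits.
    files = [f"{prefix}.data-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]
    files.append(f"{prefix}.index")  # lookup table: tensor name -> location in shards
    if with_meta:
        files.append(f"{prefix}.meta")  # MetaGraph, written by tf.train.Saver by default
    return files
```

For the question's prefix, v2_checkpoint_files("translate.ckpt-329") yields exactly the three files listed there.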
Although there is no file with the exact name you gave, you can use the value returned from saver.save() as the argument to a subsequent saver.restore(), and the other checkpoint locating mechanisms should continue to work as before.
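Because the prefix itself is never a file, the tf.gfile.Exists(ckpt.model_checkpoint_path) condition in the question's restore code is false for V2 checkpoints, so the code falls through to initializing fresh parameters. Below is a minimal local-filesystem sketch of an existence check that accepts both formats; checkpoint_files_exist is a hypothetical name. In TensorFlow 1.x, tf.train.checkpoint_exists(prefix) performs a similar V1-or-V2 check and could take the place of tf.gfile.Exists in that condition.

```python
import os


def checkpoint_files_exist(prefix):
    """Return True if a V1 or V2 checkpoint appears to exist for `prefix`.

    Hypothetical helper, local paths only: V2 never writes a file named exactly
    `prefix`, so look for `prefix + ".index"`; a V1 checkpoint is a single file
    named `prefix`.
    """
    return os.path.exists(prefix + ".index") or os.path.exists(prefix)
```

Unlike tf.gfile, os.path only sees the local filesystem, so this sketch would not work for remote paths such as cloud storage buckets.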

What is conv1/weights/Adam in a TensorFlow checkpoint file?

I printed all tensor values in a checkpoint file.
I can understand "conv1/weights". But what is "conv1/weights/Adam" in the checkpoint file?
It's an extra variable that was created because you are using an AdamOptimizer() to train your model. You can read about the algorithm in the original paper - https://arxiv.org/pdf/1412.6980v8.pdf
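To show what those extra variables hold, here is a scalar Adam update following Algorithm 1 of the linked paper, in plain Python. The m and v values are the per-variable state that TF1's AdamOptimizer keeps as slot variables (conv1/weights/Adam for m and conv1/weights/Adam_1 for v), alongside beta1_power and beta2_power accumulators used for bias correction. TensorFlow folds the bias correction into the step size, so its epsilon enters slightly differently from this sketch.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1).

    Returns the updated (param, m, v); m and v must be carried to the next step.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # correct the bias from starting at 0
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

Because m and v must persist across steps, they are saved in the checkpoint next to the weights; restoring them lets training resume with the same optimizer state.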

CNN using TensorFlow for my own image set - what should the TFRecord format be?

I have an image dataset in which each image is 600 x 400, and I have converted each of the images to TFRecord format. But I can't figure out how to use this data. I looked at the ImageNet dataset and found only a single binary file (when I extracted it from here).
Should an image dataset have only one TFRecord file, or should each individual image have its own TFRecord file?
TensorFlow doesn't require a single TFRecord file, so feel free to point your "data directory" and "train directory" to the location that holds a set of TFRecord files.
Also, keep in mind that the files should be placed in the directories matching their names, e.g. TRAIN-*.tfrecord files in the "train directory".
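As a sketch of the multi-file layout (pure Python; the SPLIT-NNNNN-of-NNNNN.tfrecord naming and the helper names are hypothetical), images can be spread round-robin over a fixed number of shard files, and the shards for one split collected with a glob matching the TRAIN-*.tfrecord pattern above. The resulting list of filenames is the kind of input tf.data.TFRecordDataset accepts, since it takes either one filename or a list of them. This only covers the file bookkeeping; writing the records themselves would use tf.io.TFRecordWriter (tf.python_io.TFRecordWriter in older releases).

```python
import glob
import os


def shard_name(directory, split, index, num_shards):
    """Hypothetical naming scheme, e.g. TRAIN-00003-of-00010.tfrecord."""
    return os.path.join(directory, f"{split}-{index:05d}-of-{num_shards:05d}.tfrecord")


def assign_shards(image_paths, num_shards):
    """Round-robin assignment: image i goes into shard i % num_shards."""
    shards = [[] for _ in range(num_shards)]
    for i, path in enumerate(image_paths):
        shards[i % num_shards].append(path)
    return shards


def find_shards(directory, split):
    """Collect every shard for one split, sorted for a deterministic order."""
    pattern = os.path.join(glob.escape(directory), f"{split}-*.tfrecord")
    return sorted(glob.glob(pattern))
```

A few hundred to a few thousand images per shard is a common compromise between one huge file and one file per image, but the right number depends on your dataset and storage.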
The answer could be more specific if you mentioned which TF model you plan to run on these TFRecord files.
Hope it helps.

Failing to read the new TensorFlow checkpoint format?

I pip installed TensorFlow 0.12. I am able to resume training by loading old checkpoints, which end with .ckpt. However, TensorFlow 0.12 dumps new checkpoints in a different format, consisting of *.index, *.data-00000-of-00001 and *.meta files. After that, I am not able to restore from the new checkpoint.
What is the proper way to load the new format? Also, how can I read the *.index file?
This is mostly a duplicate of "How to restore a model by filename in Tensorflow r12?"
Troubleshooting:
- Pass the common prefix: stop before the first dot after ckpt (e.g. "model.ckpt", not "model.ckpt.index").
- Check the model path, either absolute:
  saver.restore(sess, "/full/path/to/model.ckpt")
  or relative:
  saver.restore(sess, "./model.ckpt")
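The "common prefix" rule above can be expressed as a small hypothetical helper that strips one known V2 checkpoint suffix (.index, .meta or .data-NNNNN-of-NNNNN) from any of the saved files' names, yielding the string to pass to saver.restore. It is pure Python and assumes the standard five-digit shard numbering.

```python
import re

# One of the suffixes a V2 save appends to the prefix, anchored at the end.
_CKPT_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(filename):
    """Strip a known checkpoint suffix; other names are returned unchanged."""
    return _CKPT_SUFFIX.sub("", filename)
```

For example, checkpoint_prefix("./model.ckpt-329.data-00000-of-00001") gives "./model.ckpt-329", which is what saver.restore expects.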
Regarding reading the .index file, as the name suggests, it is the first file to be opened by the restore function. No .index file, no restore (you could still restore without a .meta file).
The .index file needs the data-xxxx-of-xxxx shards, so it would be kind of pointless to read only the .index file without restoring any tensor data. What are you trying to achieve?
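Since the .index file is useless without its data shards, a quick pre-restore check can confirm that every shard implied by the "-of-NNNNN" count is on disk. The sketch below is a hypothetical local-filesystem helper, not a TensorFlow API, and assumes five-digit shard numbers.

```python
import glob
import os
import re

_SHARD = re.compile(r"\.data-\d{5}-of-(\d{5})$")


def missing_data_shards(prefix):
    """Return expected data-shard filenames for `prefix` that are not on disk.

    The shard count is read from whichever shard files are present; if none
    are found (or their counts disagree), there is nothing reliable to check.
    """
    counts = set()
    for path in glob.glob(glob.escape(prefix) + ".data-*-of-*"):
        match = _SHARD.search(path)
        if match:
            counts.add(int(match.group(1)))
    if len(counts) != 1:
        raise ValueError(f"expected one shard count for {prefix!r}, found {sorted(counts)}")
    num_shards = counts.pop()
    expected = [f"{prefix}.data-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)]
    return [path for path in expected if not os.path.exists(path)]
```

An empty list means every shard named by the count is present; it does not validate the tensor data itself, which TensorFlow checks when it actually restores.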