I used a transfer learning approach to develop a detection model based on the faster_rcnn architecture.
To evaluate my model, I used the following command:
!python model_main_tf2.py --model_dir=models/faster_rcnn_inception_resnet_v2 --pipeline_config_path=models/faster_rcnn_inception_resnet_v2/pipeline.config --checkpoint_dir=models/faster_rcnn_inception_resnet_v2
However, I have been getting the following info message:
INFO:tensorflow:Waiting for new checkpoint at models/faster_rcnn_inception_resnet_v2
I0331 23:23:11.699681 140426971481984 checkpoint_utils.py:139] Waiting for new checkpoint at models/faster_rcnn_inception_resnet_v2
I checked that the path passed to checkpoint_dir is correct. What could be the problem, and how can I resolve it?
Thanks in advance.
You need to run a separate training process to generate new checkpoints. model_main_tf2.py does not do both at once, i.e., it won't train the model and evaluate it at the end of each epoch.
One way to get what you want is to modify checkpoint_max_to_keep in https://github.com/tensorflow/models/blob/13ec3c1460b928301d208115aed0c94fb47538b7/research/object_detection/model_lib_v2.py#L445
so that all checkpoints are kept, then evaluate them separately. This does not work exactly the way you want, but it does generate the evaluation curves.
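For reference, here is the two-process setup this implies, using the question's own paths. The only difference between the two runs is the --checkpoint_dir flag, which switches model_main_tf2.py into evaluation mode:

# Training run: writes ckpt-* files into model_dir
!python model_main_tf2.py --model_dir=models/faster_rcnn_inception_resnet_v2 --pipeline_config_path=models/faster_rcnn_inception_resnet_v2/pipeline.config

# Evaluation run (in a second process, or after training finishes):
!python model_main_tf2.py --model_dir=models/faster_rcnn_inception_resnet_v2 --pipeline_config_path=models/faster_rcnn_inception_resnet_v2/pipeline.config --checkpoint_dir=models/faster_rcnn_inception_resnet_v2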
A similar situation happened to me. I don't know if this is the solution or just a workaround, but it did work for me: I simply exported my model and provided the path to that checkpoint folder.
finetune_checkpoint_model_directory
|
\---checkpoint (folder)
    |
    \---checkpoint (file with no extension)
    \---ckpt-1.data-00000-of-00001
    \---ckpt-1.index
and then simply run model_main_tf2.py for evaluation.
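For reference, exporting is done with exporter_main_v2.py from the TF2 Object Detection API; a sketch along these lines (paths follow the question, and the output directory name is just an example). The exported folder contains a checkpoint subfolder like the one shown above:

!python exporter_main_v2.py --input_type=image_tensor --pipeline_config_path=models/faster_rcnn_inception_resnet_v2/pipeline.config --trained_checkpoint_dir=models/faster_rcnn_inception_resnet_v2 --output_directory=exported_model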
If you trained your model for only a small number of steps, that can be the problem: with too few checkpoints, TensorFlow may not be able to generate the evaluation, so try increasing the number of steps.
Related
I'm trying to load a fine_tune_checkpoint to start the training with.
I added the appropriate field in the config file, and set the value to be a checkpoint I have from a model previously trained from scratch.
The way TFOD saves checkpoints is with 3 files with different suffixes (.data, .index, .meta), and I set the value to the name of the checkpoint without the suffix.
fine_tune_checkpoint: "/path/to/my/checkpoint/dir/model.ckpt-190000"
There's no indication whatsoever that what I'm doing is either right or wrong. No logs stating the checkpoint is loaded and no errors, warnings or indications that it is not.
How can I tell if the checkpoint was actually loaded?
I'm happy for any suggestions to verify it either way.
Thanks in advance.
You need to do two things to solve this:
1. Give an absolute path for this parameter, like this:
fine_tune_checkpoint: "c:/user/desktop/path/to/my/checkpoint/dir/model.ckpt-190000"
2. The fine-tune checkpoint files ("model.ckpt-190000") should be in the same folder as the previous training run; otherwise the model will start from step 0.
Regards.
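One quick way to verify it either way, beyond the steps above: list the variables stored in the checkpoint. If the fine_tune_checkpoint path is wrong, this fails immediately; a minimal sketch using the path from the question:

import tensorflow as tf

# Prints (name, shape) for every variable stored in the checkpoint;
# raises an error right away if the path does not point at a checkpoint.
for name, shape in tf.train.list_variables("/path/to/my/checkpoint/dir/model.ckpt-190000"):
    print(name, shape)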
I am trying to quantize MobileFaceNet (code from sirius-ai) according to the suggestion,
and I think I ran into the same issue as this one.
When I add tf.contrib.quantize.create_training_graph() to the training graph
(train_nets.py ln. 187, before train_op = train(...), or in train() in utils/common.py ln. 38, before the gradients),
it does not add quantization-aware ops into the graph to collect the dynamic-range max/min.
I assumed I would see some additional nodes in TensorBoard, but I did not, so I think I failed to add the quantization-aware ops to the training graph.
I also tried tracing through the TensorFlow code and found that _FindLayersToQuantize() returned nothing.
However, when I add tf.contrib.quantize.create_eval_graph() to refine the training graph, I can see some quantization-aware ops such as act_quant....
Since I did not add the ops to the training graph successfully, I have no weights to load into the eval graph.
Thus I get error messages such as
Key MobileFaceNet/Logits/LinearConv1x1/act_quant/max not found in checkpoint
or
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobileFaceNet/Logits/LinearConv1x1/act_quant/max
Does anyone know how to fix this error, or how to get a quantized MobileFaceNet with good accuracy?
Thanks!
Hi,
Unfortunately, the contrib/quantize tool is now deprecated. It won't be able to support newer models, and we are not working on it anymore.
If you are interested in QAT, I would recommend trying the new TF/Keras QAT API. We are actively developing that and providing support for it.
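For what it's worth, a minimal sketch of that Keras QAT API, using the tensorflow_model_optimization package (the toy model below is just a placeholder, not MobileFaceNet):

import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Placeholder model; in practice you would rebuild your network in Keras.
base_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(112, 112, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128),
])

# Inserts fake-quantization ops so training collects the min/max ranges.
qat_model = tfmot.quantization.keras.quantize_model(base_model)
qat_model.compile(optimizer="adam", loss="mse")
# Train as usual, then convert with tf.lite.TFLiteConverter.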
I was trying to follow this page https://www.tensorflow.org/tutorials/sequences/audio_recognition
I successfully executed the following command:
python tensorflow/examples/speech_commands/train.py
I used a virtual environment in Anaconda, with TensorFlow 1.4 and Python 3.6.
It took about 22 hours to train. The log said "/tmp/speech_commands_train/conv.ckpt-100" (and so on) after every 100 iterations
(there were 18,000 in total),
but now when I look for conv.ckpt-18000.meta, or for the speech_commands_train directory at all, I cannot find it.
I am very new to this. This is my first effort in deep learning.
[screenshot: how the terminal looked when training ended]
Firstly, what do you mean by "where it saved"? By "it", do you mean the logs, the trained model, or the weights?
In your case, you are just storing the weights at given checkpoints, so you can access them at the paths stated in the tutorial:
I0730 16:54:41.813438 55030 train.py:252] Saving to "/tmp/speech_commands_train/conv.ckpt-100"
*This is saving out the current trained weights to a checkpoint file. If your training script gets interrupted, you can look for the last saved checkpoint and then restart the script with -*
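For completeness, the speech_commands train.py takes a --start_checkpoint flag for exactly this resume case, e.g.:

python tensorflow/examples/speech_commands/train.py --start_checkpoint=/tmp/speech_commands_train/conv.ckpt-100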
You can also store logs using a file writer, and save the model using save_model or a TensorBoard callback with a logdir.
Don't forget to upvote if you found it useful.
I fine-tuned the im2txt model and obtained the ckpt.data, ckpt.index and ckpt.meta files, plus a graph.pbtxt file, using the procedure from the im2txt GitHub page.
The model seems to work well as it produces almost correct captions.
Now I would like to freeze this model to use it on android.
I used the freeze_graph.py script in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py.
python freeze_graph.py --input_graph=/path/to/graph.pbtxt --input_binary=false --input_checkpoint=/path/to/model.ckpt --output_graph=/path/to/output_graph.pb --output_node_names="softmax,lstm/initial_state,lstm/state"
And I get the following error: AssertionError: softmax is not in graph.
The discussion in https://github.com/tensorflow/models/issues/816 is about the same problem but it did not help me very much.
Indeed, when I look in the graph.pbtxt generated after fine-tuning, I cannot find softmax, lstm/initial_state and lstm/state.
But in the show_and_tell_model.py file of im2txt, the names of the tensors seem to be "softmax", "lstm/initial_state" and "lstm/state". So, I don't know what's happening.
I hope I was clear enough about what I've tried so far. Thanks in advance for any help.
Regards,
Stephane
Found and verified the answer: in inference_wrapper_base.py, just add something like saver.save(sess, "model/ckpt4") after saver.restore(sess, checkpoint_path) in def _restore_fn(sess):. Then rebuild and run run_inference, and you'll get a model that can be frozen, transformed, and optionally memmapped, to be loaded by iOS and Android apps.
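In context, the change looks roughly like this (a sketch of _restore_fn in inference_wrapper_base.py; "model/ckpt4" is just an example output path):

def _restore_fn(sess):
    tf.logging.info("Loading model from checkpoint: %s", checkpoint_path)
    saver.restore(sess, checkpoint_path)
    # Added line: re-save immediately so the inference-mode graph gets its
    # own checkpoint files that freeze_graph.py can consume.
    saver.save(sess, "model/ckpt4")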
For detailed commands to freeze, transform, and convert to memmapped, see my answer at Error using Model after using optimize_for_inference.py on frozen graph.
OK, I think I finally got the solution. In case it is useful for others, here it is:
After training, you obtain ckpt.data, ckpt.index and ckpt.meta files and a graph.pbtxt file.
You then have to load this model in 'inference' mode (see InferenceWrapper in im2txt). That builds a graph with the correct names 'softmax', 'lstm/initial_state' and 'lstm/state'. You save this graph (in the same ckpt format), and you can then apply the freeze_graph script to obtain the frozen model.
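With the re-saved inference checkpoint, the freeze command from the question then works, for example (paths are placeholders):

python freeze_graph.py --input_graph=/path/to/inference_graph.pbtxt --input_binary=false --input_checkpoint=/path/to/model/ckpt4 --output_graph=/path/to/frozen_im2txt.pb --output_node_names="softmax,lstm/initial_state,lstm/state"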
Regards,
Stephane
I'm now facing a problem with Inception v3 and checkpoint data.
I have been working on updating Inception v3's checkpoint data with my own images, following the GitHub page below, and I succeeded in creating new checkpoint data:
https://github.com/tensorflow/models/tree/master/inception
I thought at first that, with a small change to the code, I could use that checkpoint data to recognize new images, as in the URL below:
https://www.tensorflow.org/versions/master/tutorials/image_recognition/index.html
I thought at first that "classify.py" or something similar would read the new checkpoint data, and that just running "python classify.py -image something.png" would make the program recognize the image. But it doesn't....
I really need help.
Thanks.
To obtain the input .pb file, also call tf.train.write_graph(sess.graph.as_graph_def(), 'path_to_folder', 'input_graph.pb', False) during training.
If you have downloaded the Inception v3 source code, add the line above in inception_train.py, right under
saver.save(sess, checkpoint_path, global_step=step) (where the checkpoint(s) are saved).
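In context, the change would look something like this ('path_to_folder' is a placeholder for wherever you want the GraphDef written):

# inception_train.py, inside the training loop where checkpoints are saved:
saver.save(sess, checkpoint_path, global_step=step)
# Added line: also dump the GraphDef so freeze_graph has an input graph.
tf.train.write_graph(sess.graph.as_graph_def(), 'path_to_folder', 'input_graph.pb', False)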
Hope this helps!
To use your checkpoints and model in something like the label_image example, you'll need to run the tensorflow/python/tools/freeze_graph script to convert your variables into constants stored inside the GraphDef. That's how we created the graph file used in that sample code, for example.
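A hedged example of what that looks like for the retrained checkpoint (the paths, the checkpoint number, and the output node name are assumptions; inspect your graph for the actual output node). Note that --input_binary=true matches the binary GraphDef written by tf.train.write_graph above:

python tensorflow/python/tools/freeze_graph.py --input_graph=path_to_folder/input_graph.pb --input_binary=true --input_checkpoint=/path/to/model.ckpt-XXXX --output_graph=/path/to/frozen_inception_v3.pb --output_node_names=softmax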