Can't save/export and load a Keras model that uses eager execution - tensorflow

I'm following the RNN text-generation tutorial with eager execution pretty much line for line. I've trained the model on my own dataset and saved a low-loss checkpoint. I'm able to load the weights and generate text, but I want to export/save the model so that I can learn how to deploy one using Flask. However, I can't figure out how. The version I'm using is 1.14.0-rc1.
The tutorial: https://www.tensorflow.org/tutorials/sequences/text_generation
I have been able to save the model as an HDF5 file, but I cannot load it. I've also tried disabling eager execution, but that causes problems with running the code later on. I have tried the following (and a few more snippets that led nowhere as well):
new_model = keras.models.load_model("/content/gdrive/My Drive/ColabNotebooks/ckpt4/my_model.h5")
However, I get:
RuntimeError: tf.placeholder() is not compatible with eager execution.
Lastly I found this in another post and tried it as well but was met with another error:
tf.saved_model.save(model, "/content/gdrive/My Drive/Colab Notebooks/ckpt4/my_model.h5")
error:
AssertionError: Tried to export a function which references untracked object Tensor("StatefulPartitionedCall/args_2:0", shape=(), dtype=resource). TensorFlow objects (e.g. tf.Variable) captured by functions must be tracked by assigning them to an attribute of a tracked object or assigned to an attribute of the main object directly.
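One route that stays inside the tutorial's own pattern is to avoid serializing the whole eager model: rebuild the architecture in the deployment process (e.g. the Flask app) with batch size 1, as the tutorial does for generation, and load only the checkpoint weights. A minimal sketch, assuming the tutorial's build_model helper and hyperparameters (vocab_size, embedding_dim, rnn_units, checkpoint_dir) are in scope:
import tensorflow as tf

# Rebuild the model for inference with batch size 1, then restore
# only the weights from the latest training checkpoint.
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
Since nothing but the weights is ever serialized, this sidesteps both the tf.placeholder incompatibility in load_model and the untracked-object assertion in tf.saved_model.save.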

Related

Use Tensorflow2 saved model for object detection

I'm quite new to object detection, but I managed to train my first TensorFlow custom model yesterday. I think it worked fine besides some warnings; at least I got my exported_model folder with a checkpoint, saved model and pipeline.config. I built it with exporter_main_v2.py from TensorFlow. I just loaded some images of deer and want to try to detect some in different pictures.
That's what I would like to test now, but I don't know how. I already did an object detection tutorial with pre-trained models and it worked fine. I tried to just replace config_file_path, saved_model_path and image_path with the paths linking to my exported model, but it didn't work:
error: OpenCV(4.6.0) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\tensorflow\tf_io.cpp:42: error: (-2:Unspecified error) FAILED: ReadProtoFromBinaryFile(param_file, param). Failed to parse GraphDef file: D:\VSCode\Machine_Learning_Tests\Tensorflow\workspace\exported_models\first_model\saved_model\saved_model.pb in function 'cv::dnn::ReadTFNetParamsFromBinaryFileOrDie'
There are endless tutorials on how to train a custom detector, but I can't find a good explanation of how to manually test my exported model.
Thanks in advance!
EDIT: I need to know how to build a script into which I can load a model saved with TensorFlow's exporter_main_v2.py along with an image I want to test it on, and get a result, either as text or as rectangles drawn on the picture. I've seen many tutorials, but none of them works with a model saved with exporter_main_v2.py.
From the error it looks like you have a model saved as a .pb. If you want to do inference you can write something like this:
# load the model
model = tf.keras.models.load_model(my_model_dir)
prediction = model.predict(x=x_test, ...)
You'll have to set x, which is the only mandatory argument: it is your test dataset (the images you want to obtain predictions for). predict is useful when you have a large number of images, since it handles prediction in batches and avoids filling up memory. If you only have a few, you can directly use the model's __call__() method, like this:
prediction = model(x_test, training=False)
More about predict can be found in the TensorFlow documentation.
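Note that a detection model exported with exporter_main_v2.py is usually loaded with tf.saved_model.load rather than the Keras loader (and not with OpenCV's dnn module, which is what raised the GraphDef parse error above). A minimal sketch, with placeholder paths:
import tensorflow as tf

# Point this at the exported saved_model directory.
detect_fn = tf.saved_model.load("exported_models/first_model/saved_model")

# Decode a test image and add a batch dimension; the default serving
# signature expects a batched uint8 image tensor.
image = tf.io.decode_image(tf.io.read_file("test_images/deer.jpg"), channels=3)
detections = detect_fn(tf.expand_dims(image, axis=0))

# The result is a dict of tensors: detection_boxes, detection_classes,
# detection_scores, num_detections.
print(detections["detection_scores"][0][:5].numpy())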

create_training_graph() failed when converting MobileFaceNet to a quantize-aware model with TF-Lite

I am trying to quantize MobileFaceNet (code from sirius-ai) according to the suggestion,
and I think I hit the same issue as this one.
When I add tf.contrib.quantize.create_training_graph() into the training graph
(train_nets.py ln. 187: before train_op = train(...), or in train() in utils/common.py ln. 38, before the gradients),
it does not add quantize-aware ops into the graph to collect the dynamic-range max/min.
I expected to see some additional nodes in TensorBoard, but I did not, so I think I did not successfully add the quantize-aware ops to the training graph.
I tried tracing through TensorFlow and found that _FindLayersToQuantize() matched nothing.
However, when I add tf.contrib.quantize.create_eval_graph() to refine the training graph, I can see some quantize-aware ops such as act_quant...
Since I did not add the ops to the training graph successfully, I have no weights to load in the eval graph.
Thus I get error messages such as
Key MobileFaceNet/Logits/LinearConv1x1/act_quant/max not found in checkpoint
or
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value MobileFaceNet/Logits/LinearConv1x1/act_quant/max
Does anyone know how to fix this error, or how to get a quantized MobileFaceNet with good accuracy?
Thanks!
Hi,
Unfortunately, the contrib/quantize tool is now deprecated. It won't be able to support newer models, and we are not working on it anymore.
If you are interested in QAT, I would recommend trying the new TF/Keras QAT API. We are actively developing that and providing support for it.
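For reference, the Keras QAT API lives in the tensorflow_model_optimization package. A minimal sketch, assuming an existing Keras model (the tiny model here is only a placeholder):
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Any Keras model would do; this one is just for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),
])

# Insert fake-quantization ops for quantization-aware training.
q_aware_model = tfmot.quantization.keras.quantize_model(model)
q_aware_model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Train as usual with q_aware_model.fit(...), then convert with the
# TFLite converter to get a quantized model.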

How to use feature_column v2 in Tensorflow (TF-Ranking)

I'm using TF-Ranking to train a recommendation engine. I have encountered a problem that seems to be a version-incompatibility issue concerning the tf.feature_column API.
The short version of my question is: what is a v2 feature column (TF 2.0?) (see this for instance), and how can I ensure that my feature columns are treated as v2 while I'm still using TF 1.14?
Here are the details:
I'm unable to shorten my code sufficiently to provide a reproducible example. But I will try to describe the problem in words.
TF Version: 1.14
OS: Ubuntu 18.04
I initially had two features in my model, user and item, both sparse categorical features, each wrapped in its own tf.feature_column.embedding_column. I was able to use the train_and_evaluate method of the Estimator and export the model for serving.
Then I added a new feature, curr_item, which is only present during prediction (as a context feature). It shares its embeddings with item, so now I have a tf.feature_column.shared_embedding_columns that wraps both item and curr_item.
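In code, the column setup looks roughly like this (a sketch; bucket sizes and dimensions are made up for illustration):
import tensorflow as tf

user = tf.feature_column.categorical_column_with_hash_bucket("user", hash_bucket_size=1000)
item = tf.feature_column.categorical_column_with_hash_bucket("item", hash_bucket_size=5000)
curr_item = tf.feature_column.categorical_column_with_hash_bucket("curr_item", hash_bucket_size=5000)

user_embedding = tf.feature_column.embedding_column(user, dimension=32)
# item and curr_item share a single embedding table
item_embeddings = tf.feature_column.shared_embedding_columns([item, curr_item], dimension=32)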
Now calling train_and_evaluate results in the following error (shortened messages):
ValueError: Could not load all requested variables from checkpoint. Please make sure your model_fn does not expect variables that were not saved in the checkpoint.
Key input_layer/user_embedding/embedding_weights not found in checkpoint
Note that calling the train method alone works fine. My understanding is that once it gets to evaluation, it tries to load the variables from the checkpoint, but the variable doesn't exist there. I did a little debugging and found the reason:
when encode_listwise_features is called during training (which in turn calls encode_features), all features (user and item) are "V2" (I'm not sure exactly what that means), and so the following if statement holds:
https://github.com/tensorflow/ranking/blob/31fc134816cc4974a46a11e7bb2df0066d0a88f0/tensorflow_ranking/python/feature.py#L92
and both variables are named with an encoding_layer prefix (scope name?):
encoding_layer/user_embedding/embedding_weights
encoding_layer/item_embedding/embedding_weights
But when I call the same function for all three features (I'm a little confused whether this happens in eval or predict mode), some of them are not "V2", and we end up in the else branch of the above condition, which calls input_layer directly, so the variables are named with an input_layer prefix. Now TF is trying to restore
input_layer/user_embedding/embedding_weights
from the checkpoint, but that name doesn't exist in the checkpoint, because it was called
encoding_layer/user_embedding/embedding_weights
in training.
So:
1) How can I ensure that all my features are treated as v2 at all stages? I tried using tf.compat.v2.feature_column, but that didn't help. There is already a TODO note above that if statement about this.
2) Can encode_features be modified to avoid this situation, e.g. by raising an exception with a helpful message?

How to properly train TensorFlow on one machine and evaluate on another?

I'm training a TensorFlow (1.2) model on one machine and attempting to evaluate it on another. Everything works fine when I stay local to one machine.
I am not using placeholders and feed_dicts to get data into the model, but rather TF file queues and batch generators. I suspect this would be much easier with placeholders, but I am trying to make the TF batch-generator machinery work.
In my evaluation code I have lines like:
saver = tf.train.Saver()
ckpt = tf.train.get_checkpoint_state(os.path.dirname(ckpt_dir))
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
This produces errors like:
2017-08-16 12:29:06.387435: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Unsuccessful TensorSliceReader constructor: Failed to get matching files on /data/perdue/minerva/tensorflow/models/11/20170816/checkpoints-20: Not found: /data/perdue/minerva/tensorflow/models/11/20170816
The referenced directory (/data/...) exists on my training machine but not on the evaluation machine. I have tried things like
saver = tf.train.import_meta_graph(
    '/local-path/checkpoints-XXX.meta',
    clear_devices=True
)
saver.restore(sess, '/local-path/checkpoints-XXX')
but this produces a different error:
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value train_file_queue/limit_epochs/epochs
or, if I explicitly call the initializer functions immediately after the restore,
AttributeError: 'Tensor' object has no attribute 'initializer'
Here, train_file_queue/limit_epochs/epochs is an element of the training graph that I would like the evaluation function to ignore (I have another, new element test_file_queue that is pointing at a different file queue with the evaluation data files in it).
I think in the second case, when I'm calling the initializers right after the restore, there is something in the local variables that doesn't work quite like a "normal" Tensor, but I'm not sure exactly what the issue is.
If I just use a generic Saver and restore, TF does the right thing on the original machine - it just restores the model parameters and then uses my new file queue for evaluation. But I can't be restricted to that machine; I need to be able to evaluate the model on other machines.
I've also tried freezing a protobuf and a few other options and there are always difficulties associated with the fact that I need to use file queues as the most upstream inputs.
What is the proper way to train using TensorFlow's file queues and batch generators and then deploy the model on a different machine / in a different environment? I suspect this would be fairly simple if I were using feed_dicts to get data into the graph, but it isn't as clear with the built-in file queues and batch generators.
Thanks for any comments or suggestions!
At least part of this dilemma was resolved in TF 1.2 or 1.3. There is a new flag for the Saver() constructor:
saver = tf.train.Saver(save_relative_paths=True)
that makes it so that when you save the checkpoint directory, move it to another machine, and use it to restore() a model, everything works without errors about nonexistent data paths (the paths from the old machine where training was performed).
It isn't clear whether my use of the API is really idiomatic here, but at least the code works, and I can export trained models from one machine to another.
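For completeness, a minimal TF 1.x sketch of the flag in use (the variable, paths, and step number are placeholders):
import tensorflow as tf

# A stand-in variable so the Saver has something to checkpoint.
w = tf.get_variable("w", shape=[2], initializer=tf.zeros_initializer())
saver = tf.train.Saver(save_relative_paths=True)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The 'checkpoint' state file now records relative paths, so the whole
    # 'ckpts' directory can be copied to another machine as-is.
    saver.save(sess, "ckpts/model", global_step=0)

# On the evaluation machine, after copying 'ckpts' over:
# ckpt_path = tf.train.latest_checkpoint("ckpts")
# saver.restore(sess, ckpt_path)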

How to properly freeze a tensorflow graph containing a LookupTable

I am working with a model that uses multiple lookup tables to transform the model input from text to feature ids. I am able to train the model fine, and I am able to load it via the JavaCPP bindings. I am saving periodically with a default Saver object via the TensorFlow Supervisor.
When I try to run the model I get the following error:
Table not initialized.
[[Node: hash_table_Lookup_3 = LookupTableFind[Tin=DT_STRING, Tout=DT_INT64,
_class=["loc:@string_to_index_2/hash_table"], _output_shapes=[[-1]],
_device="/job:localhost/replica:0/task:0/cpu:0"]
(string_to_index_2/hash_table, ParseExample/ParseExample:5, string_to_index_2/hash_table/Const)]]
I prepare the model by using the freeze_graph.py script as follows:
bazel-bin/tensorflow/python/tools/freeze_graph \
    --input_graph=/tmp/tf/graph.pbtxt \
    --input_checkpoint=/tmp/tf/model.ckpt-0 \
    --output_graph=/tmp/ticker_classifier.pb \
    --output_node_names=sigmoid \
    --initializer_nodes=init_all_tables
As far as I can tell, specifying initializer_nodes has no effect on the resulting file. Am I running into something that is not currently supported? If not, is there something else I need to do to prepare the graph to be frozen?
I had the same problem when using C++ to invoke the TF API to run inference. It seems the reason is that I trained a model using tf.feature_column.categorical_column_with_hash_bucket, which needs to be initialized like this:
table_init_op = tf.tables_initializer(name="init_all_tables")
sess.run(table_init_op)
So when you want to freeze the model, you must append the name of table_init_op to the --output_node_names argument:
freeze_graph --input_graph=/tmp/tf/graph.pbtxt \
    --input_checkpoint=/tmp/tf/model.ckpt-0 \
    --output_graph=/tmp/ticker_classifier.pb \
    --output_node_names=sigmoid,init_all_tables \
    --initializer_nodes=init_all_tables
When you load and initialize the model in C++, you should first invoke the TF C++ API like this:
std::vector<Tensor> dummy_outputs;
// Session::Run takes the outputs vector by pointer; run the table
// initializer once as a target node before any inference.
Status st = session->Run({}, {}, {"init_all_tables"}, &dummy_outputs);
Now you have initialized all the tables and can do other things, such as inference. This issue may help.
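The same initialization step applies when loading the frozen graph from Python; a sketch using the node names from the commands above:
import tensorflow as tf

# Load the frozen GraphDef produced by freeze_graph.
graph_def = tf.GraphDef()
with tf.gfile.GFile("/tmp/ticker_classifier.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    # Run the preserved table initializer once before any inference.
    sess.run("init_all_tables")
    # Then fetch the real output, e.g.:
    # result = sess.run("sigmoid:0", feed_dict={...})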