How to save and load using the ModelCheckpoint callback in TensorFlow 2.4.0? - tensorflow

I am using the Keras API in TensorFlow 2.4.0 to save my model at different steps with this line of code:
tf.keras.callbacks.ModelCheckpoint(
    os.path.join(log_dir, "checkpoint_{epoch:02d}.tf"),
    save_freq=train_steps_per_epoch * cfg.log.checkpoint_save_every_epochs,
    save_weights_only=False)
When the model is saved I encounter some warnings:
[2021-03-17 12:11:09,974][absl][WARNING] - Found untraced functions such as nl_0_layer_call_fn, nl_0_layer_call_and_return_conditional_losses, nl_1_layer_call_fn, nl_1_layer_call_and_return_conditional_losses, conv2d_layer_call_fn while saving (showing 5 of 280). These functions will not be directly callable after loading.
And when I load the model using this line of code:
model = tf.keras.models.load_model(os.path.join(train_dir, f'checkpoint_{cfg.training.checkpoint_epoch:02d}.tf'))
I have this error:
Traceback (most recent call last):
File "run/linear_classifier_evaluation.py", line 51, in run
model = tf.keras.models.load_model(os.path.join(train_dir, f'checkpoint_{cfg.training.checkpoint_epoch:02d}.tf'))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/save.py", line 212, in load_model
return saved_model_load.load(filepath, compile, options)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 138, in load
keras_loader.load_layers(compile=compile)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 376, in load_layers
node_metadata.metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 417, in _load_layer
obj, setter = self._revive_from_config(identifier, metadata, node_id)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 434, in _revive_from_config
self._revive_graph_network(metadata, node_id) or
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 471, in _revive_graph_network
inputs=[], outputs=[], name=config['name'])
KeyError: 'name'
I've searched for this error online and seen people suggest implementing get_config and from_config methods in my model class, but I am not using the H5 format, so if I understood the Keras tutorial correctly I shouldn't have to do that; the other solutions I found were for TF 1.x.
I would gladly welcome any help or suggestions on where to look.
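For reference, a minimal sketch of the get_config pattern those suggestions refer to, applied to a hypothetical custom layer (the layer name and its argument are placeholders, not taken from the question). When every constructor argument appears in the config, the inherited from_config is usually sufficient, and the class can be passed to load_model via custom_objects:

import tensorflow as tf

class NonLocalBlock(tf.keras.layers.Layer):
    # Hypothetical stand-in for the nl_* layers mentioned in the warning.
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.conv = tf.keras.layers.Conv2D(filters, 1)

    def call(self, inputs):
        return self.conv(inputs)

    def get_config(self):
        # Record every constructor argument so Keras can revive the layer.
        config = super().get_config()
        config.update({"filters": self.filters})
        return config

# When loading, tell Keras how to resolve the custom class:
# model = tf.keras.models.load_model(
#     "path/to/checkpoint_01.tf",
#     custom_objects={"NonLocalBlock": NonLocalBlock})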

Related

Error while executing eval.py on the TF Object Detection API

I'm trying to evaluate my model using this command:
python eval.py --logtostderr --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config --checkpoint_dir=inference_graph --eval_dir=eval
and I'm getting this error:
Traceback (most recent call last):
  File "eval.py", line 142, in <module>
    tf.app.run()
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\app.py", line 299, in run
    _run_main(main, args)
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "eval.py", line 138, in main
    graph_hook_fn=graph_rewriter_fn)
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\evaluator.py", line 274, in evaluate
    evaluator_list = get_evaluators(eval_config, categories)
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\evaluator.py", line 166, in get_evaluators
    EVAL_METRICS_CLASS_DICT[eval_metric_fn_key](
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 470, in __init__
    use_weighted_mean_ap=False)
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 194, in __init__
    self._build_metric_names()
  File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 213, in _build_metric_names
    category_name = unicode(category_name, 'utf-8')
NameError: name 'unicode' is not defined
Hi there!
Python 3 renamed the unicode type to str; the old str type has been replaced by bytes.
Knowing this, it makes sense that we're getting errors, since parts of the TF Object Detection API are deprecated (written for Python 2.x).
See here for more explanation on how to upgrade the code to be compatible with Python 3.
I hope this helps!
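For concreteness, a minimal sketch of the kind of Python 3 compatible rewrite of the failing line (the value below is a placeholder; this is illustrative, not the exact patch applied upstream):

category_name = b"person"  # placeholder; may arrive as bytes or str
if isinstance(category_name, bytes):
    # Python 3 replacement for unicode(category_name, 'utf-8')
    category_name = category_name.decode("utf-8")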

How to save complete TensorFlow model while using official TensorFlow object detection API on Retinanet

I am trying to save the complete model using model.save (instead of only checkpoints) at the end of the training steps while using the official RetinaNet object detection API. However, I get the error below when model.save is called:
I0414 17:18:52.661234 140283524683584 distributed_executor.py:49] Saving model as TF checkpoint: /home/ubuntu/ankur/models/official/vision/detection/ModelDIr/ctl_step_5.ckpt-1
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0414 17:19:26.865248 140283524683584 deprecation.py:506] From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: /home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model/assets
I0414 17:19:33.264274 140283524683584 builder_impl.py:775] Assets written to: /home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model/assets
Traceback (most recent call last):
File "main.py", line 235, in <module>
app.run(main)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "main.py", line 230, in main
run()
File "main.py", line 224, in run
callbacks=callbacks)
File "main.py", line 117, in run_executor
save_config=True)
File "/home/ubuntu/ankur/models/official/modeling/training/distributed_executor.py", line 480, in train
model.save('/home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model')
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
signatures, options)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/save.py", line 115, in save_model
signatures, options)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/saved_model/save.py", line 78, in save
save_lib.save(model, filepath, signatures, options)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/save.py", line 923, in save
saveable_view, asset_info.asset_index)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/save.py", line 647, in _serialize_object_graph
concrete_function, saveable_view.captured_tensor_node_ids, coder)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/function_serialization.py", line 70, in serialize_concrete_function
coder.encode_structure(structured_outputs))
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 95, in encode_structure
return self._map_structure(nested_structure, self._get_encoders())
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 79, in _map_structure
return do(pyobj, recursion_fn)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 203, in do_encode
encoded_dict.dict_value.fields[key].CopyFrom(encode_fn(value))
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 79, in _map_structure
return do(pyobj, recursion_fn)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 203, in do_encode
encoded_dict.dict_value.fields[key].CopyFrom(encode_fn(value))
TypeError: 3 has type int, but expected one of: bytes, unicode
The only change I have made in order to save the complete model is to add the call model.save('/home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model') at the end of training in distributed_executor.py, found at the path ~/models/official/modeling/training. Kindly suggest what the issue might be.
TensorFlow version: 2.1
GitHub link
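No answer is recorded here, but the TypeError raised in nested_structure_coder points at a structure with integer dictionary keys being serialized (the encoder's protobuf map fields only accept string keys), and the official detection models key their per-level outputs by integer FPN levels. A hedged, illustrative sketch of that failure mode and one possible workaround, not the official fix:

import tensorflow as tf

# Integer-keyed output dicts (e.g. one entry per FPN level) cannot be encoded
# by the SavedModel structure coder and produce a TypeError like the one above:
outputs = {3: tf.zeros([1, 10]), 4: tf.zeros([1, 10])}

# Converting the keys to strings yields a serializable structure:
outputs = {str(level): tensor for level, tensor in outputs.items()}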

Restoring official Tensorflow Resnet-50 Checkpoint gives 'ExperimentalFunctionBufferingResource' Error

When trying to reload the official TensorFlow ResNet-50 checkpoint from here:
http://download.tensorflow.org/models/official/20181001_resnet/checkpoints/resnet_imagenet_v1_fp32_20181001.tar.gz
...using this code:
import os
import tensorflow as tf
print(tf.__version__)
saver = tf.train.import_meta_graph(os.path.join(
    'resnet_imagenet_v1_fp32_20181001',
    'model.ckpt-225207.meta'))
I get this error:
1.13.1
Traceback (most recent call last):
File "chehckpoint_to_savedmodel.py", line 11, in <module>
'model.ckpt-225207.meta'))
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1435, in import_meta_graph
meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/training/saver.py", line 1457, in _import_meta_graph_with_return_elements
**kwargs))
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/importer.py", line 399, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/Users/*user*/Library/Python/3.7/lib/python/site-packages/tensorflow/python/framework/importer.py", line 159, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'ExperimentalFunctionBufferingResource'
Funny that googling "KeyError: 'ExperimentalFunctionBufferingResource'" returns zero hits. That's a first.
Ideas?
Not sure how else to reload this model. I also tried this:
path = os.path.join(
    'resnet_imagenet_v1_fp32_20181001',
    'model.ckpt-225207')
checkpoint = tf.train.Checkpoint()
status = checkpoint.restore(path)
print(status)
status.assert_consumed()
But it fails the assertion with no other information.
Thanks in advance.
This seems to be an issue with TF >= 1.13 versions. Try downgrading to 1.12; it should work.
Issues to track would be these: #29751
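A minimal example of the suggested downgrade, assuming a pip-managed install (use the tensorflow-gpu package instead if that is what is installed):

pip install "tensorflow==1.12.0"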

GCloud MLEngine: Create Version failed. Bad model detected with error: Failed to load model: a bytes-like object is required, not 'str' (Error code: 0)

I am trying to create a version under Google Cloud ML models for a successfully trained TensorFlow Estimator model. I believe that I am providing the correct URI (in Google Storage), which includes saved_model.pb.
Framework: Tensorflow,
Framework Version: 1.13.1,
Runtime Version: 1.13,
Python: 3.5
Here is the traceback of the error:
Traceback (most recent call last):
File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 985, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 795, in Run
resources = command_instance.Run(args)
File "/google/google-cloud-sdk/lib/surface/ml_engine/versions/create.py", line 119, in Run
python_version=args.python_version)
File "/google/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 114, in Create
message='Creating version (this might take a few minutes)...')
File "/google/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 75, in WaitForOpMaybe
return operations_client.WaitForOperation(op, message=message).response
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/ml_engine/operations.py", line 114, in WaitForOperation
sleep_ms=5000)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 264, in WaitFor
sleep_ms, _StatusUpdate)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 326, in PollUntilDone
sleep_ms=sleep_ms)
File "/google/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 229, in RetryOnResult
if not should_retry(result, state):
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 320, in _IsNotDone
return not poller.IsDone(operation)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 122, in IsDone
raise OperationError(operation.error.message)
OperationError: Bad model detected with error: "Failed to load model: a bytes-like object is required, not 'str' (Error code: 0)"
ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error: "Failed to load model: a bytes-like object is required, not 'str' (Error code: 0)"
Do you have any idea what might be the problem?
EDIT
I am using:
tf.estimator.LatestExporter('exporter', model.serving_input_fn)
as an estimator exporter.
serving_input_fn:
def serving_input_fn():
    inputs = {'string1': tf.placeholder(tf.int16, [None, MAX_SEQUENCE_LENGTH]),
              'string2': tf.placeholder(tf.int16, [None, MAX_SEQUENCE_LENGTH])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
PS: my model takes two inputs and returns one binary output.
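One thing worth double-checking (a general deployment check, not a confirmed cause of the bytes/str error): tf.estimator.LatestExporter writes each export under a timestamped subdirectory, and the version's origin should point at the directory that directly contains saved_model.pb. Bucket, paths, and timestamp below are placeholders:

gcloud ml-engine versions create v1 \
    --model my_model \
    --origin gs://my-bucket/job_dir/export/exporter/1557480000/ \
    --runtime-version 1.13 \
    --python-version 3.5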

Error while testing Tensorflow Object Detection

I've tried all the steps of the object_detection model installation mentioned at:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
While testing the installation as mentioned in the last step of the article, I am getting the error below.
ERROR: test_create_ssd_mobilenet_v1_model_from_config (__main__.ModelBuilderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 193, in test_create_ssd_mobilenet_v1_model_from_config
model = self.create_model(model_proto)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 53, in create_model
return model_builder.build(model_config, is_training=False)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 73, in build
return _build_ssd_model(model_config.ssd, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 126, in _build_ssd_model
is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 98, in _build_ssd_feature_extractor
feature_extractor_config.conv_hyperparams, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 70, in build
hyperparams_config.regularizer),
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 119, in _build_regularizer
return slim.l2_regularizer(scale=regularizer.l2_regularizer.weight)
File "/Users/RonakBhavsar/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/regularizers.py", line 92, in l2_regularizer
raise ValueError('scale cannot be an integer: %s' % (scale,))
ValueError: scale cannot be an integer: 1
I get this error for all the models mentioned in the test script. Does anyone have any ideas?
We have a pull request out that should fix this issue. Please give that a try.
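Until that fix lands, one hedged local workaround (inferred from the error above; the actual pull request may differ) is to cast the regularizer weight to a float where _build_regularizer in object_detection/builders/hyperparams_builder.py builds it:

# before:
return slim.l2_regularizer(scale=regularizer.l2_regularizer.weight)
# after: cast to float so slim.l2_regularizer accepts the value
return slim.l2_regularizer(scale=float(regularizer.l2_regularizer.weight))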