GCloud ML Engine: Create Version failed. Bad model detected with error: Failed to load model: a bytes-like object is required, not 'str' (Error code: 0) - tensorflow

I am trying to create a version under Google Cloud ML models for a successfully trained TensorFlow estimator model. I believe I am providing the correct URI (in Google Storage), which includes saved_model.pb.
Framework: TensorFlow
Framework Version: 1.13.1
Runtime Version: 1.13
Python: 3.5
Here is the traceback of the error:
Traceback (most recent call last):
File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/cli.py", line 985, in Execute
resources = calliope_command.Run(cli=self, args=args)
File "/google/google-cloud-sdk/lib/googlecloudsdk/calliope/backend.py", line 795, in Run
resources = command_instance.Run(args)
File "/google/google-cloud-sdk/lib/surface/ml_engine/versions/create.py", line 119, in Run
python_version=args.python_version)
File "/google/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 114, in Create
message='Creating version (this might take a few minutes)...')
File "/google/google-cloud-sdk/lib/googlecloudsdk/command_lib/ml_engine/versions_util.py", line 75, in WaitForOpMaybe
return operations_client.WaitForOperation(op, message=message).response
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/ml_engine/operations.py", line 114, in WaitForOperation
sleep_ms=5000)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 264, in WaitFor
sleep_ms, _StatusUpdate)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 326, in PollUntilDone
sleep_ms=sleep_ms)
File "/google/google-cloud-sdk/lib/googlecloudsdk/core/util/retry.py", line 229, in RetryOnResult
if not should_retry(result, state):
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 320, in _IsNotDone
return not poller.IsDone(operation)
File "/google/google-cloud-sdk/lib/googlecloudsdk/api_lib/util/waiter.py", line 122, in IsDone
raise OperationError(operation.error.message)
OperationError: Bad model detected with error: "Failed to load model: a bytes-like object is required, not 'str' (Error code: 0)"
ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error: "Failed to load model: a bytes-like object is required, not 'str' (Error code: 0)"
Do you have any idea what might be the problem?
EDIT
I am using:
tf.estimator.LatestExporter('exporter', model.serving_input_fn)
as the estimator exporter.
serving_input_fn:
def serving_input_fn():
    inputs = {'string1': tf.placeholder(tf.int16, [None, MAX_SEQUENCE_LENGTH]),
              'string2': tf.placeholder(tf.int16, [None, MAX_SEQUENCE_LENGTH])}
    return tf.estimator.export.ServingInputReceiver(inputs, inputs)
PS: my model takes two inputs and returns one binary output.
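For context, here is roughly how that exporter plugs into training: a minimal sketch assuming a standard tf.estimator.train_and_evaluate setup (model_fn, the input functions, and the paths are placeholders, not code from the question):

import tensorflow as tf

# Minimal sketch (TF 1.13 style); model_fn, train_input_fn, eval_input_fn
# and the GCS path below are placeholders.
exporter = tf.estimator.LatestExporter('exporter', serving_input_fn)

estimator = tf.estimator.Estimator(model_fn=model_fn,
                                   model_dir='gs://my-bucket/model')
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, exporters=[exporter])

# Each evaluation writes a SavedModel (saved_model.pb plus a variables/
# directory) under <model_dir>/export/exporter/<timestamp>/; the deployment
# URI should point at that timestamped directory.
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)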

Related

Tensorflow Object Detection API 2

I'm new to the TF Object Detection API 2.
After training the model, you can run an evaluation process to check its accuracy.
But when I tried to run it, I got the error below. I'm using EfficientDet as the backbone.
I was able to run the evaluation at input resolution 512, but 640 fails with the error below.
This is the Python file I called, which ended up with the error:
/tensorflow/models/research/object_detection/model_main_tf2.py
Call arguments received:
• inputs=tf.Tensor(shape=(1, 480, 640, 3), dtype=float32)
• kwargs={'training': 'False'}
INFO:tensorflow:A replica probably exhausted all examples. Skipping pending examples on other replicas.
I0719 06:49:27.115007 140042699994880 model_lib_v2.py:943] A replica probably exhausted all examples. Skipping pending examples on other replicas.
Traceback (most recent call last):
File "/home/pictcompute/effient_net_ve/tensorflow/models/research/object_detection/model_main_tf2.py", line 115, in <m
odule>
tf.compat.v1.app.run()
File "/home/pictcompute/effient_net_ve/lib/python3.8/site-packages/tensorflow/python/platform/app.py", line 36, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/pictcompute/effient_net_ve/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/pictcompute/effient_net_ve/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/pictcompute/effient_net_ve/tensorflow/models/research/object_detection/model_main_tf2.py", line 82, in mai
n
model_lib_v2.eval_continuously(
File "/home/pictcompute/effient_net_ve/lib/python3.8/site-packages/object_detection/model_lib_v2.py", line 1159, in ev
al_continuously
eager_eval_loop(
File "/home/pictcompute/effient_net_ve/lib/python3.8/site-packages/object_detection/model_lib_v2.py", line 1009, in ea
ger_eval_loop
for evaluator in evaluators:
TypeError: 'NoneType' object is not iterable
Highly appreciate your help.
Thanks
The error occurs when you try to iterate over a None value. For example:
mylist = None
for x in mylist:
    print(x)
TypeError Traceback (most recent call last)
<ipython-input-2-a63d8b17c4a7> in <module>
1 mylist = None
2
----> 3 for x in mylist:
4 print(x)
TypeError: 'NoneType' object is not iterable
The error can be avoided by checking whether a value is None before iterating over it. Thank you.
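A minimal sketch of that guard, using the same toy example:

mylist = None

# Check for None before iterating to avoid the TypeError above.
if mylist is not None:
    for x in mylist:
        print(x)
else:
    print('mylist is None; nothing to iterate over')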

datasets = tfds.load(name="mnist") comes with error "Expected binary or unicode string, got WindowsGPath......"

I ran into some difficulty with tensorflow_datasets when trying to load MNIST.
Python: 3.7
TensorFlow: 2.1.0
tensorflow_datasets has been upgraded to the latest version, 4.6, because the default version that comes with the TensorFlow installation has no attribute 'load'.
But now the problem is that the data cannot be downloaded and extracted successfully with the following command:
datasets = tfds.load(name="mnist")
The error message is:
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to ~\tensorflow_datasets\mnist\3.0.1...
Extraction completed...: 0 file [00:00, ? file/s]██████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 138.37 url/s]
Dl Size...: 100%|██████████████████████████████████████████████████████████████████████████| 11594722/11594722 [00:00<00:00, 373172106.07 MiB/s]
Dl Completed...: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 122.03 url/s]
Traceback (most recent call last):
File "", line 1, in
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\load.py", line 327, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 483, in download_and_prepare
download_config=download_config,
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 1222, in _download_and_prepare
disable_shuffling=self.info.disable_shuffling,
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\split_builder.py", line 310, in submit_split_generation
return self._build_from_generator(**build_kwargs)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\core\split_builder.py", line 376, in _build_from_generator
leave=False,
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tqdm\std.py", line 1195, in iter
for obj in iterable:
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\image_classification\mnist.py", line 151, in _generate_examples
images = _extract_mnist_images(data_path, num_examples)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_datasets\image_classification\mnist.py", line 350, in _extract_mnist_images
f.read(16) # header
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 122, in read
self._preread_check()
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_core\python\lib\io\file_io.py", line 84, in _preread_check
compat.as_bytes(self.__name), 1024 * 512)
File "C:\Users\Wilso\Anaconda3\envs\tfgpu\lib\site-packages\tensorflow_core\python\util\compat.py", line 87, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got WindowsGPath('C:\Users\Wilso\tensorflow_datasets\downloads\extracted\GZIP.cvdf-datasets_mnist_train-images-idx3-ubyteRA_Kv3PMVG-iFHXoHqNwJlYF9WviEKQCTSyo8gNSNgk.gz')
Try:
(ds_train, ds_test), ds_info = tfds.load(
    "mnist",
    split=["train", "test"],
    shuffle_files=True,
    as_supervised=True,  # will return tuple (img, label) otherwise dict
    with_info=True,  # able to get info about dataset
)
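If the load succeeds, a quick sanity check on the result (standard tfds/tf.data usage, building on the snippet above):

# Inspect one (image, label) pair; as_supervised=True yields tuples.
for image, label in ds_train.take(1):
    print(image.shape, label.numpy())  # (28, 28, 1) and a digit 0-9

# with_info=True also returns metadata about the dataset.
print(ds_info.splits["train"].num_examples)  # 60000 for MNIST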

How to save and load using the ModelCheckpoint callback in TensorFlow 2.4.0?

I am using the Keras API in TensorFlow 2.4.0 to save my model at different steps with this line of code:
tf.keras.callbacks.ModelCheckpoint(
    os.path.join(log_dir, "checkpoint_{epoch:02d}.tf"),
    save_freq=train_steps_per_epoch * cfg.log.checkpoint_save_every_epochs,
    save_weights_only=False)
When the model is saved I encounter some warnings:
[2021-03-17 12:11:09,974][absl][WARNING] - Found untraced functions such as nl_0_layer_call_fn, nl_0_layer_call_and_return_conditional_losses, nl_1_layer_call_fn, nl_1_layer_call_and_return_conditional_losses, conv2d_layer_call_fn while saving (showing 5 of 280). These functions will not be directly callable after loading.
And when I load the model using this line of code:
model = tf.keras.models.load_model(os.path.join(train_dir, f'checkpoint_{cfg.training.checkpoint_epoch:02d}.tf'))
I have this error:
Traceback (most recent call last):
File "run/linear_classifier_evaluation.py", line 51, in run
model = tf.keras.models.load_model(os.path.join(train_dir, f'checkpoint_{cfg.training.checkpoint_epoch:02d}.tf'))
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/save.py", line 212, in load_model
return saved_model_load.load(filepath, compile, options)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 138, in load
keras_loader.load_layers(compile=compile)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 376, in load_layers
node_metadata.metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 417, in _load_layer
obj, setter = self._revive_from_config(identifier, metadata, node_id)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 434, in _revive_from_config
self._revive_graph_network(metadata, node_id) or
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/saved_model/load.py", line 471, in _revive_graph_network
inputs=[], outputs=[], name=config['name'])
KeyError: 'name'
I've searched online for this error and read people suggesting to implement the get_config and from_config methods in my model class, but I am not using the H5 format, so I shouldn't have to do that if I understood the Keras tutorial correctly; the other solutions were for TF 1.x.
I would gladly welcome any help or suggestions on where to look.
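For reference, this is the pattern those suggestions refer to, shown as a minimal sketch on a hypothetical custom layer (whether it is actually needed for the TF-format SavedModel here is exactly what is in question):

import tensorflow as tf

class MyLayer(tf.keras.layers.Layer):
    """Hypothetical custom layer illustrating get_config/from_config."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.dense = tf.keras.layers.Dense(units)

    def call(self, inputs):
        return self.dense(inputs)

    def get_config(self):
        # Record constructor arguments so Keras can revive the layer.
        config = super().get_config()
        config.update({'units': self.units})
        return config

    @classmethod
    def from_config(cls, config):
        return cls(**config)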

How to save a complete TensorFlow model while using the official TensorFlow object detection API on RetinaNet

I am trying to save the complete model using model.save (instead of only checkpoints) at the end of the training steps while using the official RetinaNet object detection API. However, I am getting the error below when model.save is called:
I0414 17:18:52.661234 140283524683584 distributed_executor.py:49] Saving model as TF checkpoint: /home/ubuntu/ankur/models/official/vision/detection/ModelDIr/ctl_step_5.ckpt-1
WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0414 17:19:26.865248 140283524683584 deprecation.py:506] From /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1786: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: /home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model/assets
I0414 17:19:33.264274 140283524683584 builder_impl.py:775] Assets written to: /home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model/assets
Traceback (most recent call last):
File "main.py", line 235, in <module>
app.run(main)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "main.py", line 230, in main
run()
File "main.py", line 224, in run
callbacks=callbacks)
File "main.py", line 117, in run_executor
save_config=True)
File "/home/ubuntu/ankur/models/official/modeling/training/distributed_executor.py", line 480, in train
model.save('/home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model')
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 1008, in save
signatures, options)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/save.py", line 115, in save_model
signatures, options)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/saved_model/save.py", line 78, in save
save_lib.save(model, filepath, signatures, options)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/save.py", line 923, in save
saveable_view, asset_info.asset_index)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/save.py", line 647, in _serialize_object_graph
concrete_function, saveable_view.captured_tensor_node_ids, coder)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/function_serialization.py", line 70, in serialize_concrete_function
coder.encode_structure(structured_outputs))
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 95, in encode_structure
return self._map_structure(nested_structure, self._get_encoders())
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 79, in _map_structure
return do(pyobj, recursion_fn)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 203, in do_encode
encoded_dict.dict_value.fields[key].CopyFrom(encode_fn(value))
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 79, in _map_structure
return do(pyobj, recursion_fn)
File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow_core/python/saved_model/nested_structure_coder.py", line 203, in do_encode
encoded_dict.dict_value.fields[key].CopyFrom(encode_fn(value))
TypeError: 3 has type int, but expected one of: bytes, unicode
The only change I have made in order to save the complete model is to add the call model.save('/home/ubuntu/ankur/models/official/vision/detection/saved_model/test_model') at the end of training in distributed_executor.py, found at ~/models/official/modeling/training. Kindly suggest what the issue might be.
TensorFlow version: 2.1
GitHub link
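Reading the traceback, the failure happens while the SavedModel writer encodes the model's structured outputs: encode_fn receives a plain Python int (3) where a string is expected, which is consistent with detection models returning dicts keyed by integer feature-pyramid levels (e.g. {3: ..., 7: ...}). A hedged sketch of one direction to investigate, re-keying such outputs as strings before saving (stringify_keys is a hypothetical helper, not part of the official repo):

def stringify_keys(obj):
    # Recursively convert int dict keys to str so the SavedModel
    # nested-structure coder can encode them.
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    return obj

Where to apply such a conversion depends on how the model builds its outputs, so this is only a direction to investigate, not a confirmed fix.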

Error while testing Tensorflow Object Detection

I've tried all the steps of the object_detection model installation mentioned at:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
While testing the installation as described in the last step of the article, I am getting the error below.
ERROR: test_create_ssd_mobilenet_v1_model_from_config (__main__.ModelBuilderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 193, in test_create_ssd_mobilenet_v1_model_from_config
model = self.create_model(model_proto)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 53, in create_model
return model_builder.build(model_config, is_training=False)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 73, in build
return _build_ssd_model(model_config.ssd, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 126, in _build_ssd_model
is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 98, in _build_ssd_feature_extractor
feature_extractor_config.conv_hyperparams, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 70, in build
hyperparams_config.regularizer),
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 119, in _build_regularizer
return slim.l2_regularizer(scale=regularizer.l2_regularizer.weight)
File "/Users/RonakBhavsar/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/regularizers.py", line 92, in l2_regularizer
raise ValueError('scale cannot be an integer: %s' % (scale,))
ValueError: scale cannot be an integer: 1
I get this error for all the models mentioned in the test script. Does anyone have any ideas?
We have a pull request out that should fix this issue. Please give that a try.
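For illustration, here is the check that raises, reconstructed from the traceback above (a standalone toy, not the library code itself):

import numbers

def check_scale(scale):
    # l2_regularizer rejects integer scales, so a regularizer weight that
    # ends up as the Python int 1 (instead of the float 1.0) triggers
    # exactly this ValueError.
    if isinstance(scale, numbers.Integral):
        raise ValueError('scale cannot be an integer: %s' % (scale,))
    return scale

check_scale(1.0)  # accepted
check_scale(1)    # raises ValueError: scale cannot be an integer: 1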