Warning when training a custom model using tensorflow object detection API - tensorflow2.0

I have been using the TensorFlow Object Detection API tutorial below to build a custom object detector.
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/index.html
I ran it according to the instructions provided, first in Google Colab with GPU support and then on an AWS EC2 instance with GPU support. In both cases I get warnings and model training stops there.
I used the EfficientDet D6 model from the TensorFlow 2 Detection Model Zoo.
Below are the warnings that stop the model training:
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.axis
W0910 14:45:44.534728 140520822372160 util.py:203] Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.axis
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.gamma
W0910 14:45:44.534780 140520822372160 util.py:203] Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.gamma
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.beta
W0910 14:45:44.534832 140520822372160 util.py:203] Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.beta
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.moving_mean
W0910 14:45:44.534884 140520822372160 util.py:203] Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.moving_mean
WARNING:tensorflow:Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.moving_variance
W0910 14:45:44.534937 140520822372160 util.py:203] Unresolved object in checkpoint: (root).model._feature_extractor._bifpn_stage.node_input_blocks.7.0.1.1.moving_variance
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0910 14:45:44.534990 140520822372160 util.py:211] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights
Any help or pointer is appreciated.

Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit.
Official Document
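
A minimal, runnable sketch of expect_partial() in isolation (the small Sequential model here is just a stand-in for the detection model):

import tensorflow as tf

# Stand-in model; in the Object Detection API this would be the built
# detection model.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

ckpt = tf.train.Checkpoint(model=model)
path = ckpt.save('/tmp/demo_ckpt')

# expect_partial() marks the restore as deliberately partial, so checkpointed
# values that are never matched or used no longer produce
# "Unresolved object in checkpoint" warnings.
status = ckpt.restore(path)
status.expect_partial()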

Related

Attribute error while training TensorFlow object detection

AttributeError: module 'tensorflow_estimator.python.estimator.api._v1.estimator' has no attribute 'slim'
I got this while training TensorFlow object detection.

Colab TPU: INTERNAL: {{function_node __inference_train_function_7167}} failed to connect to all addresses

I am currently trying to train a model for my bachelor's thesis.
The training ETA, however, is very long, so I have considered using TPUs. But every time I try to train with a TPU strategy, following this Google notebook, I keep getting the following error:
(0) INTERNAL: {{function_node __inference_train_function_7167}} failed to connect to all addresses
Additional GRPC error information from remote target /job:localhost/replica:0/task:0/device:CPU:0:
:{"created":"#1651692210.674048314","description":"Failed to pick subchannel","file":"third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc","file_line":3124,"referenced_errors":[{"created":"#1651692210.674047476","description":"failed to connect to all addresses","file":"third_party/grpc/src/core/lib/transport/error_utils.cc","file_line":163,"grpc_status":14}]}
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
Executing non-communication op <MultiDeviceIteratorGetNextFromShard> originally returned UnavailableError, and was replaced by InternalError to avoid invoking TF network error handling logic.
[[RemoteCall]]
[[IteratorGetNextAsOptional]]
[[strided_slice_69/_310]]
Error as shown in Colab
You can check my TPU boilerplate code here:

import tensorflow as tf

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
    raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
My dataset is stored on my Google Drive as images.
I am trying to train using tf.keras.Model.fit.
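
For reference, the training call follows the usual TPU pattern of creating the model under the strategy scope before calling fit — a minimal sketch, where build_model() and train_dataset are placeholders for my actual model and input pipeline:

# Build and compile the model under the TPU strategy scope, then train with
# Model.fit. build_model() and train_dataset are hypothetical placeholders.
with tpu_strategy.scope():
    model = build_model()  # hypothetical helper returning a compiled tf.keras.Model
model.fit(train_dataset, epochs=10)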

Runtime Error: tensorflow.python.framework.errors_impl.NotFoundError: Could not find metadata file. [[{{node LoadDataset/_1}}]] [Op:DatasetFromGraph]

As in the tutorial, I am trying to execute tff.learning.build_federated_averaging_process(model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.02)), but on the orchestrator (server), with the data saved on an edge node (client) and loaded using the tf.data.experimental.load() method:
import collections
import tensorflow as tf
import tensorflow_federated as tff

@tff.tf_computation
def make_data():
    element_spec = collections.OrderedDict([
        ('x', tf.TensorSpec(shape=(None, 784), dtype=tf.float32, name=None)),
        ('y', tf.TensorSpec(shape=(None,), dtype=tf.int32, name=None))])
    data = tf.data.experimental.load('./train_data', element_spec=element_spec)
    return data
However, I'm getting the following error:
W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at dataset_ops.cc:175 : Not found: Could not find metadata file.
[[{{node LoadDataset/_1}}]]
The data was saved using the tf.data.experimental.save(train_data[0], './train_data') method. The implementation works when executed locally with tff.backends.native.set_local_execution_context().
python - 3.7
libraries versions:
tensorflow - 2.5.2
tensorflow-estimator - 2.5.0
tensorflow-federated - 0.19.0
Any help would be most appreciated.
When you decorate a Python function with @tff.tf_computation, it will serialize the contents as a tf.Graph to be reused later. Frankly, I do not know how I/O like the experimental tf.data load logic interacts with that serialization.
The recommended pattern would be to avoid that and instead load the data in Python, at the level where you are creating TFF computations, and pass the loaded dataset as an input to your tff.tf_computation or tff.federated_computation with a matching type signature (tff.types.SequenceType).
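
A minimal sketch of that pattern, reusing the element_spec from the question (the loading moves out of the decorated function, and preprocess is just an illustrative identity computation):

import collections
import tensorflow as tf
import tensorflow_federated as tff

element_spec = collections.OrderedDict([
    ('x', tf.TensorSpec(shape=(None, 784), dtype=tf.float32)),
    ('y', tf.TensorSpec(shape=(None,), dtype=tf.int32))])

# Load the dataset eagerly in Python, outside of any tff.tf_computation...
train_data = tf.data.experimental.load('./train_data', element_spec=element_spec)

# ...and pass it into the computation as an argument with a matching
# sequence type signature.
@tff.tf_computation(tff.SequenceType(element_spec))
def preprocess(dataset):
    return dataset  # per-client preprocessing would go here

preprocessed = preprocess(train_data)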

Can't save/export and load a Keras model that uses eager execution

I'm following the RNN text-generation tutorial with eager execution pretty much line for line. I've trained the model with my own data set and have saved a low-loss checkpoint. I'm able to load the weights and generate text, but I want to export/save the model so that I can learn how to deploy one using Flask. However, I can't figure out how. The version I'm using is '1.14.0-rc1'.
The tutorial: https://www.tensorflow.org/tutorials/sequences/text_generation
I have been able to save the model as an HDF5 file, but I cannot load it. I've also disabled eager execution, but that causes problems with running the code later on. I have tried the following, and a few more snippets, but those led to nothing as well:
new_model = keras.models.load_model("/content/gdrive/My Drive/ColabNotebooks/ckpt4/my_model.h5")
However, I get:
RuntimeError: tf.placeholder() is not compatible with eager execution.
Lastly, I found this in another post and tried it as well, but was met with another error:
tf.saved_model.save(model, "/content/gdrive/My Drive/Colab Notebooks/ckpt4/my_model.h5")
error:
AssertionError: Tried to export a function which references untracked object Tensor("StatefulPartitionedCall/args_2:0", shape=(), dtype=resource). TensorFlow objects (e.g. tf.Variable) captured by functions must be tracked by assigning them to an attribute of a tracked object or assigned to an attribute of the main object directly.
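
One pattern that typically works under eager execution in this version is a weights-only round trip: save just the weights, rebuild the architecture from code in the serving process, and restore. A minimal sketch, where build_model, vocab_size, embedding_dim, rnn_units, and checkpoint_dir are the tutorial's own names, not library API:

import tensorflow as tf

# model is the trained model from the tutorial. Save only the weights;
# HDF5 weight files load fine under eager execution.
model.save_weights('/content/gdrive/My Drive/ColabNotebooks/ckpt4/my_weights.h5')

# In the serving process (e.g. the Flask app), rebuild the architecture from
# code (batch_size=1 for generation) and restore the weights.
new_model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
new_model.load_weights('/content/gdrive/My Drive/ColabNotebooks/ckpt4/my_weights.h5')
new_model.build(tf.TensorShape([1, None]))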

TF Object Detection API - Trouble running quantized network after freezing and quantizing my fine-tuned network

TensorFlow Object Detection API
I'm using the TensorFlow Object Detection API to retrain MobileNet on my own dataset. The issue occurs when I try to run my inference graph that has been both frozen and quantized.
System:
Ubuntu 16.04,
TensorFlow 1.2 (from source, CPU only),
Bazel 0.4.5
Issue:
1. Use the provided frozen_graph.pb from the model zoo.
2. Quantize to 8-bit using bazel-bin/tensorflow/tools/graph_transforms/transform_graph.
3. Run inference.
This works. However, the same steps with my own graph:
1. Re-train and produce my own frozen_graph.pb using object_detection/export_inference_graph.py.
2. Quantize to 8-bit using bazel-bin/tensorflow/tools/graph_transforms/transform_graph.
3. Run inference. <-- Produces error
This does NOT work, and the error I'm getting during the attempt to run the graph is:
File "/home/unibap/TensorFlow/tensorflow-python2-sse4.2/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1298, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: The node 'Preprocessor/map/while/ResizeImage/ResizeBilinear/eightbit' has inputs from different frames. The input 'Preprocessor/map/while/ResizeImage/size' is in frame 'Preprocessor/map/while/Preprocessor/map/while/'. The input 'Preprocessor/map/while/ResizeImage/ResizeBilinear_eightbit/Preprocessor/map/while/ResizeImage/ExpandDims/quantize' is in frame ''.
Since I can quantize and run the provided frozen_graph.pb, the issue has to be with the export tool, right? Which export tool was used to create the frozen_graph.pb files that are in the model zoo? Or how was the export tool called?
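
For concreteness, the quantize step above invokes the graph transform tool along these lines — a representative call with the standard Object Detection API input/output node names and a typical 8-bit transform list, not necessarily the exact flags used here:

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
  --in_graph=frozen_graph.pb \
  --out_graph=quantized_graph.pb \
  --inputs='image_tensor' \
  --outputs='detection_boxes,detection_scores,detection_classes,num_detections' \
  --transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,300,300,3") fold_constants(ignore_errors=true) fold_batch_norms quantize_weights quantize_nodes sort_by_execution_order'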
PS:
A quote from the comments in export_inference_graph.py, assuring me that it should produce a frozen graph if a checkpoint is provided:
"Optionally, one can freeze the graph by converting the weights in the provided checkpoint as graph constants thereby eliminating the need to use a checkpoint file during inference."
Best