How can I successfully run the MobileDet model with the pretrained files from the TF1 model zoo of the TensorFlow Object Detection API? - tensorflow

I want to test the MobileDet model provided in the TF1 model zoo of the TensorFlow Object Detection API (tf1 object detection model zoo).
The pretrained archive contains both the pb file and the ckpt files.
So, I have tried two methods to load the pretrained model to do inference.
First, I tried to load tflite_graph.pb directly. I encountered the following problem; I tried changing the TF version, but that did not solve it.
The code is like this:
import os
import tensorflow as tf

MODEL_DIR = '/tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
MODEL_CHECK_FILE = os.path.join(MODEL_DIR, 'tflite_graph.pb')
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.Open(MODEL_CHECK_FILE, 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')
Traceback (most recent call last):
File "/home/zhaoxin/workspace/models-1.12.0/research/inference_demo.py", line 41, in <module>
tf.import_graph_def(graph_def, name='')
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 505, in _import_graph_def_internal
raise ValueError(str(e))
ValueError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node FeatureExtractor/MobileDetCPU/Conv/BatchNorm/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
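One workaround worth trying for this particular error (a hedged sketch, not verified against this checkpoint): the exponential_avg_factor attribute was added to FusedBatchNormV3 by a TF release newer than 1.15, and its default value of 1.0 reproduces the old behavior, so stripping it from the frozen inference graph before importing should be harmless:
import os
import tensorflow as tf

# Hedged sketch: delete the 'exponential_avg_factor' attribute (unknown to
# TF 1.15) from every FusedBatchNormV3 node, then import the cleaned graph.
MODEL_DIR = '/tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
graph_def = tf.GraphDef()
with tf.gfile.Open(os.path.join(MODEL_DIR, 'tflite_graph.pb'), 'rb') as f:
    graph_def.ParseFromString(f.read())
for node in graph_def.node:
    if node.op == 'FusedBatchNormV3' and 'exponential_avg_factor' in node.attr:
        del node.attr['exponential_avg_factor']
tf.import_graph_def(graph_def, name='')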
Then, I tried to load the ckpt files to run the model.
import tensorflow as tf

mobiledet = 'tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
meta_path = mobiledet + 'model.ckpt-400000.meta'
ckpt_path = mobiledet + 'model.ckpt-400000'
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(meta_path)
    saver.restore(sess, ckpt_path)
    graph = tf.get_default_graph()
The error looks like this:
Traceback (most recent call last):
File "/home/zhaoxin/workspace/models-1.12.0/research/tf_load.py", line 15, in <module>
saver=tf.train.import_meta_graph(meta_path)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
**kwargs)[0]
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/home/zhaoxin/tools/miniconda3/envs/tf115/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 'LegacyParallelInterleaveDatasetV2' in binary running on localhost.localdomain. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.
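As the error message itself hints, contrib ops are registered lazily; a hedged sketch (assuming TF 1.15) that touches tf.contrib before the import to force registration (it may or may not cover this particular dataset op):
import tensorflow as tf
dir(tf.contrib)  # forces lazy loading/registration of tf.contrib ops

# Same paths as above; clear_devices avoids preset-device pinning issues.
mobiledet = 'tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/'
meta_path = mobiledet + 'model.ckpt-400000.meta'
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(meta_path, clear_devices=True)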
It seems that the loading errors of the above two methods are caused by a TF version mismatch, but I have tried many TF versions and failed to solve it. Has anyone successfully run the MobileDet model from the TF1 object detection model zoo?
OS: linux
TF version: tf 1.15

@Shane Zhao - are you planning on training with a custom dataset, or are you using the pretrained graph as is? The version of TensorFlow should only matter during training, to the best of my knowledge. Anyway, please refer to this demo from Google in Colab - https://colab.research.google.com/github/luxonis/depthai-ml-training/blob/master/colab-notebooks/Easy_Object_Detection_Demo_Training.ipynb#scrollTo=JDddx2rPfex9
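If you only need inference with the pretrained graph as is, another route (a hedged sketch, not verified against this exact checkpoint) is to skip tf.import_graph_def entirely: convert tflite_graph.pb to a .tflite model and run it with the TFLite interpreter. The tensor names below follow the OD API's export_tflite_ssd_graph.py convention and should be checked against your own graph:
import numpy as np
import tensorflow as tf  # assumed TF 1.15

# Convert the TFLite-ready frozen graph to a .tflite model.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    'tf_ckpts/ssdlite_mobiledet_cpu_320x320_coco_2020_05_19/tflite_graph.pb',
    input_arrays=['normalized_input_image_tensor'],
    output_arrays=['TFLite_Detection_PostProcess',
                   'TFLite_Detection_PostProcess:1',
                   'TFLite_Detection_PostProcess:2',
                   'TFLite_Detection_PostProcess:3'],
    input_shapes={'normalized_input_image_tensor': [1, 320, 320, 3]})
converter.allow_custom_ops = True  # the detection postprocess op is custom
tflite_model = converter.convert()

# Run a dummy image through the interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp['index'], np.zeros(inp['shape'], dtype=np.float32))
interpreter.invoke()
boxes = interpreter.get_tensor(interpreter.get_output_details()[0]['index'])
print(boxes.shape)  # detection boxes for the dummy input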

Related

Tensorflow 2.2.0 and Keras save model / load model problems

After adding a custom loss function as @tf.function to my Keras DQN, Keras models stopped loading (the model seems to save, but cannot be reloaded). Documentation suggests this is really simple, but...
Various SO answers suggest that models trained using one Keras version cannot be loaded into another Keras version. So I uninstalled Keras 2.4.3 (from the Anaconda env) to avoid any confusion, and am trying to model and save/load solely with TensorFlow's Keras.
So, now I am trying to save a TensorFlow-Keras model and then load that model again, but it will not reload; various errors (below). The environment is Anaconda3 Python 3.8 (with Keras 2.4.3, since uninstalled) and TensorFlow 2.2.0 (containing Keras 2.3.0-tf).
Is there some solution to simply save a model and then reload it in TF 2.2.0 (with Keras 2.3.0-tf)?
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense, LSTM, Masking, Input
from tensorflow.keras.optimizers import Adam
Then all tf.keras modelling and save/load should be done by Keras 2.3.0-tf from within TensorFlow. The model save is done with:
agent.model.save(os.path.join(pathOUT, PAIR, 'models' + modelNum, modelFolder),
                 save_format='tf')
But this generates a deprecation warning during save:
2020-11-26 00:19:03.388858: W tensorflow/python/util/util.cc:329] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
WARNING:tensorflow:From C:\..mypath......\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:1813: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Saved an Intermediate model...
Then I attempt to load the model with:
model = load_model(LOAD_MODEL)
but this generates an error during loading:
TypeError: __init__() got an unexpected keyword argument 'reduction'
Again, is there some solution to simply save a model and then reload a model in tf 2.2.0 (with keras 2.3.0-tf)?
Full error:
Traceback (most recent call last):
  File mypath, line 851, in <module>
    agent = DQNAgent()
  File mypath, line 266, in __init__
    self.model = self.create_model()
  File mypath, line 336, in create_model
    model = load_model(LOAD_MODEL)
  File mypath\lib\site-packages\tensorflow\python\keras\saving\save.py", line 190, in load_model
    return saved_model_load.load(filepath, compile)
  File mypath\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 116, in load
    model = tf_load.load_internal(path, loader_cls=KerasObjectLoader)
  File mypath\lib\site-packages\tensorflow\python\saved_model\load.py", line 602, in load_internal
    loader = loader_cls(object_graph_proto,
  File mypath\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 188, in __init__
    super(KerasObjectLoader, self).__init__(*args, **kwargs)
  File mypath\lib\site-packages\tensorflow\python\saved_model\load.py", line 123, in __init__
    self._load_all()
  File mypath\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 209, in _load_all
    self._layer_nodes = self._load_layers()
  File mypath\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 312, in _load_layers
    layers[node_id] = self._load_layer(proto.user_object, node_id)
  File mypath\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 335, in _load_layer
    obj, setter = self._revive_from_config(proto.identifier, metadata, node_id)
  File mypath\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 349, in _revive_from_config
    obj = self._revive_metric_from_config(metadata, node_id)
  File mypath\lib\site-packages\tensorflow\python\keras\saving\saved_model\load.py", line 441, in _revive_metric_from_config
    obj = metrics.deserialize(
  File mypath\lib\site-packages\tensorflow\python\keras\metrics.py", line 3345, in deserialize
    return deserialize_keras_object(
  File mypath\lib\site-packages\tensorflow\python\keras\utils\generic_utils.py", line 361, in deserialize_keras_object
    (cls, cls_config) = class_and_config_for_serialized_keras_object(
  File mypath\lib\site-packages\tensorflow\python\keras\utils\generic_utils.py", line 327, in class_and_config_for_serialized_keras_object
    deserialized_objects[key] = deserialize_keras_object(
  File mypath\lib\site-packages\tensorflow\python\keras\utils\generic_utils.py", line 375, in deserialize_keras_object
    return cls.from_config(cls_config)
  File mypath\lib\site-packages\tensorflow\python\keras\metrics.py", line 628, in from_config
    return super(MeanMetricWrapper, cls).from_config(config)
  File mypath\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 655, in from_config
    return cls(**config)
TypeError: __init__() got an unexpected keyword argument 'reduction'
Additional code:
The custom loss function (trying to implement gradient ascent by 'flipping' the error gradient); I have tried this in various locations (within the same agent class as the model, outside the agent class, etc.):
@tf.function
def positive_mse(y_true, y_pred):
    return -1 * tf.keras.losses.MSE(y_true, y_pred)
The Sequential model I am using is here
I have resolved/bypassed the original keyword argument 'reduction' error by REMOVING the MeanSquaredError() metric during model compile of the original model. Original model:
model.compile(loss=positive_mse,
optimizer=Adam(lr=LEARNING_RATE, decay=DECAY),
metrics=[tf.keras.losses.MeanSquaredError()])
From the Keras docs: "Note that this is an important difference between loss functions like tf.keras.losses.mean_squared_error and default loss class instances like tf.keras.losses.MeanSquaredError: the function version does not perform reduction, but by default the class instance does."
The MeanSquaredError loss class passes a 'reduction' keyword during evaluation of the loss over a minibatch. Removing this metric allows the model to be reloaded without error.
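In other words, the class-instance metric serializes a 'reduction' argument that the deserialization code chokes on. A minimal sketch (assuming TF 2.2, with a hypothetical toy model standing in for the DQN) that keeps an MSE metric by using the function form instead:
import tensorflow as tf
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

def positive_mse(y_true, y_pred):
    return -1 * tf.keras.losses.MSE(y_true, y_pred)

# Toy stand-in model; 'mse' is the function-form metric, so no 'reduction'
# keyword ends up in the saved config.
model = Sequential([Dense(4, input_shape=(8,)), Dense(1)])
model.compile(loss=positive_mse,
              optimizer=tf.keras.optimizers.Adam(lr=0.001),
              metrics=['mse'])
model.save('tmp_model', save_format='tf')

# The custom loss must be supplied again when reloading.
reloaded = load_model('tmp_model',
                      custom_objects={'positive_mse': positive_mse})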

Trying to restore model, but tf.train.import_meta_graph(meta_path) raises error

I downloaded pretrained MobileNetV2 models from tensorflow models, and tried to restore the graph, but got an unexpected error.
The code to reproduce the error is pretty concise:
import tensorflow as tf
meta_path = 'path/to/mobilenet_v2_0.35_224/mobilenet_v2_0.35_224.ckpt.meta'
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
saver = tf.train.import_meta_graph(meta_path)
Then the last line raises an error:
Traceback (most recent call last):
File "/home/CVAR/study/codes/languages/python/pycharm/learn_tensorflow/train_mobileNet_v2/test_of_functions/saver_test.py", line 21, in <module>
saver = tf.train.import_meta_graph(meta_path)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
**kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 391, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 158, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'InfeedEnqueueTuple'
My system information is:
ubuntu 16.04
python 3.5
tensorflow-gpu 1.9
Any idea?
I recently also met such a problem. It seems the reason is that the TensorFlow version you used to train the model is different from the version you are using to read the graph description proto. What you need to do is reinstall TensorFlow at your training version; otherwise, retraining the model would work.
FYI, the TensorFlow version I used to train was 1.12.0; by contrast, the version I used to load the graph was 1.13.1. Reinstallation solved the problem.
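To find out which version produced a checkpoint before reinstalling, a hedged sketch that reads the producer version straight from the .meta file (the meta_info_def.tensorflow_version field of the MetaGraphDef proto), without importing the graph:
import tensorflow as tf
from tensorflow.core.protobuf import meta_graph_pb2

# Parse the MetaGraphDef proto directly; no op registration is needed, so
# this works even when import_meta_graph fails.
meta_path = 'path/to/mobilenet_v2_0.35_224/mobilenet_v2_0.35_224.ckpt.meta'
meta_graph_def = meta_graph_pb2.MetaGraphDef()
with tf.gfile.GFile(meta_path, 'rb') as f:
    meta_graph_def.ParseFromString(f.read())
print(meta_graph_def.meta_info_def.tensorflow_version)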
There are some ops not defined; from conv_blocks import * will fix this bug, but then I got another problem: "ValueError: NodeDef expected inputs 'float, int32' do not match 1 inputs specified". Still debugging, but I hope this tip solves your problem.

No OpKernel was registered to support Op 'ShutdownDistributedTPU' with these attrs. Registered devices

I'm trying to restore a MobileNet V2 model using TensorFlow 1.7.0 from this link, using the following code, but I am getting an error.
import tensorflow as tf
dir(tf.contrib)
tf.reset_default_graph()
v1 = tf.get_variable("v1", shape=[3])
v2 = tf.get_variable("v2", shape=[5])
saver = tf.train.Saver()
with tf.Session() as sess:
    saver = tf.train.import_meta_graph("/mobilenet_v2_1.4_224.ckpt.meta")
    saver.restore(sess, "/mobilenet_v2_1.4_224.ckpt.data-00000-of-00001")
I am facing the following error, which is related to TPU, whereas I only have support up to GPU:
Traceback (most recent call last):
  File "/home/ext_user1/tensorflow_1.2.1_cp34/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/ext_user1/tensorflow_1.2.1_cp34/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1310, in _run_fn
    self._extend_graph()
  File "/home/ext_user1/tensorflow_1.2.1_cp34/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1358, in _extend_graph
    graph_def.SerializeToString(), status)
  File "/home/ext_user1/tensorflow_1.2.1_cp34/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'ShutdownDistributedTPU' with these attrs. Registered devices: [CPU], Registered kernels:
  [[Node: ShutdownDistributedTPU = ShutdownDistributedTPU[_device="/job:tpu_worker/device:TPU_SYSTEM:0"]()]]
Please help me.
The fix for this is to clear the preset devices from the metagraph:
saver = tf.train.import_meta_graph("/mobilenet_v2_1.4_224.ckpt.meta", clear_devices=True)
The metagraph is used for restoring a training session from a checkpoint. For prediction from this checkpoint the metagraph isn't needed. However, if you want to keep training the model, then importing the metagraph and clearing devices is the best way.
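Note also that saver.restore expects the checkpoint prefix, not the .data-00000-of-00001 shard file that was passed in the question. A minimal sketch (TF 1.x) combining both fixes:
import tensorflow as tf

ckpt_prefix = "/mobilenet_v2_1.4_224.ckpt"
with tf.Session() as sess:
    saver = tf.train.import_meta_graph(ckpt_prefix + ".meta",
                                       clear_devices=True)
    # Restore from the prefix, not the .data shard file.
    saver.restore(sess, ckpt_prefix)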

Tensorflow: how to save/restore tf.data.Dataset?

I made a model with tf.data.Dataset() as a data IO function, then exported the graph and tried to restore it with the meta_graph file.
But it failed, and the following error messages occurred.
I think that tf.data.Dataset() creates a C++ object instead of the Python queues used before, and the graph_def only holds a reference to that C++ object's handle, so the graph_def alone, without the real C++ object, can't load the complete graph.
How can I load an executable graph with tf.data.Dataset()? Or is it impossible for now?
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Function
_make_dataset_5150cb86 is not defined.
[[Node: batch_processing/OneShotIterator = OneShotIterator[container="", dataset_factory=_make_dataset_5150cb86[], output_shapes=[[?,1], [?,299,299,3]], output_types=[DT_INT32, DT_FLOAT], shared_name="",
_device="/job:workers/replica:0/task:0/device:CPU:0"]()]]

tensorflow: ValueError: GraphDef cannot be larger than 2GB

This is the error I got:
Traceback (most recent call last):
  File "fully_connected_feed.py", line 387, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "fully_connected_feed.py", line 289, in main
    run_training()
  File "fully_connected_feed.py", line 256, in run_training
    saver.save(sess, checkpoint_file, global_step=step)
  File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1386, in save
    self.export_meta_graph(meta_graph_filename)
  File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1414, in export_meta_graph
    graph_def=ops.get_default_graph().as_graph_def(add_shapes=True),
  File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2257, in as_graph_def
    result, _ = self._as_graph_def(from_version, add_shapes)
  File "/home/-/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2220, in _as_graph_def
    raise ValueError("GraphDef cannot be larger than 2GB.")
ValueError: GraphDef cannot be larger than 2GB.
I believe it is the result of this code:
weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden1")[0]
weights = tf.scatter_nd_update(weights,indices, updates)
weights = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden2")[0]
weights = tf.scatter_nd_update(weights,indices, updates)
I am not sure why my model is getting so big (15k steps and 240 MB). Any thoughts? Thanks!
It's hard to say what is happening without seeing the code, but in general TensorFlow model sizes will not increase with the number of steps - they should stay fixed.
If the model size is increasing with the number of steps, it suggests that the computation graph is being added to on every step. For example, something like:
import tensorflow as tf

with tf.Session() as sess:
    for i in xrange(1000):
        sess.run(tf.add(1, 2))
        # or perhaps sess.run(tf.scatter_nd_update(...)) in your case
will create 3000 nodes in the graph (one for add, one for '1', and one for '2' on every iteration). Instead, you want to define your computational graph once and run it repeatedly with something like:
import tensorflow as tf

x = tf.add(1, 2)
# or perhaps x = tf.scatter_nd_update(...) in your case
with tf.Session() as sess:
    for i in xrange(1000):
        sess.run(x)
which will have a fixed graph of 3 nodes for all 1000 (and any more) iterations. Hope that helps.
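As a safeguard against this class of bug, one option (a hedged addition, not part of the original answer) is to finalize the graph before the loop, so any accidental op creation inside the loop raises immediately instead of silently bloating the GraphDef:
import tensorflow as tf

x = tf.add(1, 2)
# After finalize(), any attempt to add ops to the graph raises an error.
tf.get_default_graph().finalize()
with tf.Session() as sess:
    for i in xrange(1000):
        sess.run(x)                 # fine: runs the fixed graph
        # sess.run(tf.add(1, 2))    # would raise: graph is finalized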