TensorFlow `AssertionError` on `fit()` method

I get an AssertionError when passing my tf.data.Dataset into the tf.keras Model's fit() method.
I am using tensorflow==2.0.0.
I checked that my dataset works with:
for x, y in dataset:
    print(x.shape, y.shape)
which yields the correct shapes for the model's input data.
The full trace is:
Traceback (most recent call last):
File "/anaconda3/envs/ml36/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/anaconda3/envs/ml36/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/me/train.py", line 102, in <module>
start_training(**arguments)
File "/me/train.py", line 66, in start_training
steps_per_epoch=TRAIN_STEPS_PER_EPOCH,
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
use_multiprocessing=use_multiprocessing)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 789, in fit
*args, **kwargs)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 776, in wrapper
mode=dc.CoordinatorMode.INDEPENDENT_WORKER)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_coordinator.py", line 782, in run_distribute_coordinator
rpc_layer)
File "/anaconda3/envs/ml36/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_coordinator.py", line 344, in _run_single_worker
assert strategy
AssertionError

I had the same error when running gcloud ai-platform local train on the final release of tensorflow 2.0.0. However, it was working on earlier releases. Try downgrading to 2.0.0b1:
pip install tensorflow==2.0.0b1
--
I also found that you don't get this error if you run the script directly in Python or if you run it in the cloud.

If you are training locally without using any distributed strategies, you can add the following lines to your code to solve this issue:
import os

# Drop TF_CONFIG so Keras does not take the distributed-training path
TF_CONFIG = os.environ.get('TF_CONFIG')
if TF_CONFIG:
    os.environ.pop('TF_CONFIG')
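A minimal sketch of the same workaround wrapped in a helper, assuming you want to call it once at the top of train.py before fit() is reached. The helper name drop_tf_config is hypothetical. It uses only the standard library and takes the environment mapping as a parameter, so it can be tried on a plain dict without touching the real process environment.

```python
import os


def drop_tf_config(environ=os.environ):
    """Remove TF_CONFIG from the given environment mapping.

    Returns the removed value, or None if TF_CONFIG was not set.
    """
    # pop() with a default does not raise when the key is missing
    return environ.pop('TF_CONFIG', None)


# Example with a plain dict standing in for os.environ (values are made up)
fake_env = {'TF_CONFIG': '{"task": {"type": "worker", "index": 0}}', 'PATH': '/usr/bin'}
removed = drop_tf_config(fake_env)
```

In a real script you would call drop_tf_config() with no arguments, which acts on os.environ.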

Related

Conversion from Tensorflow to Onnx

I want to convert this TF model, ICNET_0.5, to ONNX, and I followed this example: ConvertingSSDMobilenetToONNX
I understood that if I only want to run inference I should use the frozen graph (in my case: frozen_inference_graph.pb), so I renamed it to saved_model.pb (it seems that tf2onnx does not recognize any other name) and ran the following, which gave this error:
C:\Users\esarojp\Desktop\newmodel\0818_icnet_0.5_1025_resnet_v1.tar> python -m tf2onnx.convert --opset 10 --fold_const --saved-model .\0818_icnet_0.5_1025_resnet_v1\saved_model\ --output MODEL.onnx
- WARNING - From C:\Users\esarojp\AppData\Local\Continuum\anaconda3\lib\site-packages\tf2onnx\verbose_logging.py:72: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
Traceback (most recent call last):
File "C:\Users\esarojp\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\esarojp\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\esarojp\AppData\Local\Continuum\anaconda3\lib\site-packages\tf2onnx\convert.py", line 161, in <module>
main()
File "C:\Users\esarojp\AppData\Local\Continuum\anaconda3\lib\site-packages\tf2onnx\convert.py", line 123, in main
args.saved_model, args.inputs, args.outputs, args.signature_def)
File "C:\Users\esarojp\AppData\Local\Continuum\anaconda3\lib\site-packages\tf2onnx\loader.py", line 103, in from_saved_model
meta_graph_def = tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], model_path)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\saved_model\loader_impl.py", line 269, in load
return loader.load(sess, tags, import_scope, **saver_kwargs)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\saved_model\loader_impl.py", line 422, in load
**saver_kwargs)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\saved_model\loader_impl.py", line 349, in load_graph
meta_graph_def = self.get_meta_graph_def_from_tags(tags)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\saved_model\loader_impl.py", line 327, in get_meta_graph_def_from_tags
"\navailable_tags: " + str(available_tags))
RuntimeError: MetaGraphDef associated with tags 'serve' could not be found in SavedModel. To inspect available tag-sets in the SavedModel, please use the SavedModel CLI: `saved_model_cli`
available_tags: [set()]
and when I run:
C:\Users\esarojp\Desktop\newmodel\0818_icnet_0.5_1025_resnet_v1.tar> saved_model_cli show --dir .\0818_icnet_0.5_1025_resnet_v1\saved_model\ --tag_set serve --signature_def serving_default
Traceback (most recent call last):
File "C:\Users\esarojp\AppData\Local\Continuum\anaconda3\Scripts\saved_model_cli-script.py", line 10, in <module>
sys.exit(main())
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\tools\saved_model_cli.py", line 909, in main
args.func(args)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\tools\saved_model_cli.py", line 621, in show
_show_inputs_outputs(args.dir, args.tag_set, args.signature_def)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\tools\saved_model_cli.py", line 133, in _show_inputs_outputs
tag_set)
File "C:\Users\esarojp\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\tools\saved_model_utils.py", line 120, in get_meta_graph_def
' could not be found in SavedModel')
RuntimeError: MetaGraphDef associated with tag-set serve could not be found in SavedModel
I think that something in the other files is pointing to frozen_inference_graph.pb, but it does not exist anymore (although it is said that all the weights are inside the graph).
Any idea of what is wrong?
I did it wrong: I tried to convert a frozen graph as if it were a SavedModel. To convert a frozen graph, you need to add the --graphdef flag and specify the inputs and outputs:
python -m tf2onnx.convert --graphdef .\0818_icnet_0.5_1025_resnet_v1\frozen_inference_graph.pb --output frozen.onnx --fold_const --opset 10 --inputs inputs:0 --outputs predictions:0
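If you want to drive the same conversion from a script, the command can be assembled as an argument list. This is only a sketch: the function name tf2onnx_graphdef_command is hypothetical, the flags are the ones used in the command above, and the tensor names are passed comma-separated, which is how tf2onnx accepts several inputs or outputs.

```python
import sys


def tf2onnx_graphdef_command(graphdef, output, inputs, outputs, opset=10):
    """Build the argument list for converting a frozen graph with tf2onnx."""
    return [
        sys.executable, '-m', 'tf2onnx.convert',
        '--graphdef', graphdef,
        '--output', output,
        '--fold_const',
        '--opset', str(opset),
        # Tensor names such as "inputs:0", joined with commas
        '--inputs', ','.join(inputs),
        '--outputs', ','.join(outputs),
    ]


cmd = tf2onnx_graphdef_command(
    graphdef='frozen_inference_graph.pb',
    output='frozen.onnx',
    inputs=['inputs:0'],
    outputs=['predictions:0'],
)
# To actually run the conversion: subprocess.run(cmd, check=True)
```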

how to solve 'tensorflow' has no attribute 'init_scope'

I'm using TensorFlow 1.8 on Windows. I used the object_detection sample for my project, and when I run train.py I get this error:
(tensorflow_gpu) C:\Users\hewil\Desktop\Tensorflow\models\research>python train.py --logtostderr --train_dir=object_detection/CAPTCHA_training/ --pipeline_config_path=object_detection/CAPTCHA_training/faster_rcnn_inception_v2_coco.config
WARNING:tensorflow:From C:\Users\hewil\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py:126: main (from __main__) is deprecated and will be removed in a future version.
Instructions for updating:
Use object_detection/model_main.py.
WARNING:tensorflow:From C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\legacy\trainer.py:266: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\hewil\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\platform\app.py", line 126, in run
_sys.exit(main(argv))
File "C:\Users\hewil\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 250, in new_func
return func(*args, **kwargs)
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\legacy\trainer.py", line 291, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "C:\Users\hewil\Desktop\Tensorflow\models\research\slim\deployment\model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\legacy\trainer.py", line 204, in _create_losses
prediction_dict = detection_model.predict(images, true_image_shapes)
File "C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py", line 822, in predict
prediction_dict = self._predict_first_stage(preprocessed_inputs)
File "C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py", line 873, in _predict_first_stage
image_shape) = self._extract_rpn_feature_maps(preprocessed_inputs)
File "C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\meta_architectures\faster_rcnn_meta_arch.py", line 1252, in _extract_rpn_feature_maps
feature_map_shape[2])]))
File "C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\core\anchor_generator.py", line 108, in generate
anchors_list = self._generate(feature_map_shape_list, **params)
File "C:\Users\hewil\Desktop\Tensorflow\models\research\object_detection\anchor_generators\grid_anchor_generator.py", line 111, in _generate
with tf.init_scope():
AttributeError: module 'tensorflow' has no attribute 'init_scope'
How should I solve that?
Following this thread, https://github.com/tensorflow/hub/issues/324, it seems that TensorFlow only has the init_scope attribute in versions newer than 1.9. I have TensorFlow 1.12 up and running perfectly for object detection as well.
Give that a shot and let me know how it goes.
P.S. You might want to purge your installation and start from scratch with a specific version. 1.12.0 should do the trick.
Cheers,
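Before launching train.py, it may help to confirm that the installed version is at least the one reported to work here (1.12.0). The sketch below is a plain-Python comparison that takes the version string as an argument. The helper names are hypothetical, and the parsing simply ignores suffixes such as "b1" or "-rc1".

```python
def version_tuple(version):
    """Turn a version string such as '1.12.0' into a tuple of ints (1, 12, 0)."""
    parts = []
    for piece in version.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least(installed, required='1.12.0'):
    """Return True if the installed version is the required one or newer."""
    return version_tuple(installed) >= version_tuple(required)


# In a real environment: is_at_least(tf.__version__)
too_old = not is_at_least('1.8.0')
```

A more direct check in the same environment is hasattr(tf, 'init_scope') after import tensorflow as tf.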

cudnn handle not created, solve is clear, how to implement

Hello, I am using Ubuntu 16.04, ROS Kinetic, and TensorFlow 1.13.1.
My aim is to combine an Ensenso N35 camera and its ROS driver with the Mask R-CNN node created for ROS. I have altered the original code of the Mask R-CNN node so that it takes a grayscale input and stacks it onto itself. I have already verified that this works by using a virtual version of the Ensenso camera. The SDK contains an app that sets this up. It outputs a white image; however, this should not be an issue for testing functionality. The problem arises when I attach the actual camera to the system. This gives the following error:
2019-03-28 13:30:43.113919: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
2019-03-28 13:30:43.872243: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-28 13:30:43.874466: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
None
None
Traceback (most recent call last):
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 182, in <module>
main()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 179, in main
node.run()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 104, in run
results = self._model.detect([np_image], verbose=0)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 2340, in detect
self.keras_model.predict([molded_images, image_metas], verbose=0)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1790, in predict
verbose=verbose, steps=steps)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/training.py", line 1299, in _predict_loop
batch_outs = f(ins_batch)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2357, in __call__
**self.session_kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1156, in _run
feed_dict_tensor, options, run_metadata)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_run
run_metadata)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1354, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:3195) ]]
[[node ROI/strided_slice_20 (defined at /home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/utils.py:687) ]]
Caused by op u'conv1/convolution', defined at:
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 182, in <module>
main()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 178, in main
node = MaskRCNNNode()
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/nodes/mask_rcnn_node", line 65, in __init__
config=config)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 1735, in __init__
self.keras_model = self.build(mode=mode, config=config)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 1791, in build
_, C2, C3, C4, C5 = resnet_graph(input_image, "resnet101", stage5=True)
File "/home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/model.py", line 152, in resnet_graph
x = KL.Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/engine/topology.py", line 603, in __call__
output = self.call(inputs, **kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/layers/convolutional.py", line 164, in call
dilation_rate=self.dilation_rate)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 3195, in conv2d
data_format=tf_data_format)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 851, in convolution
return op(input, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
return self.conv_op(inp, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
return self.call(inp, filter)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
name=self.name)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/riwo-rack-pc/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv1/convolution (defined at /home/riwo-rack-pc/.local/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py:3195) ]]
[[node ROI/strided_slice_20 (defined at /home/riwo-rack-pc/ROS_Mask_rcnn/src/mask_rcnn_ros/src/mask_rcnn_ros/utils.py:687) ]]
I can't, for the life of me, figure out where this goes wrong or why. I was assured that the virtual camera outputs the same data as the actual one would, but the error only occurs when using the actual camera.
What I have found so far is that the following statement should be added somewhere in the code, but I cannot think of, or find, the proper placement for it:
config_pb2.GPUOptions(allow_growth=True)
Help would be much appreciated! Also, if anyone thinks this question is better asked elsewhere, I will move it there.
I see that you are using Python 2.7, but the Mask R-CNN setup requires:
python_requires='>=3.4',
Another thing you should consider:
If you're trying to use your GPU, you should use tensorflow-gpu:
$ pip install tensorflow-gpu
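On the placement question from the original post: with TensorFlow 1.x and standalone Keras, the allow_growth option is usually set on the session config and the session is handed to Keras before the model is built. The sketch below assumes that setup. The helper name enable_gpu_memory_growth is hypothetical, and I can't confirm that it resolves this particular CUDNN_STATUS_INTERNAL_ERROR.

```python
def enable_gpu_memory_growth():
    """Create a TF 1.x session with allow_growth and register it with Keras.

    Assumes TensorFlow 1.x and standalone Keras, as in the traceback above.
    """
    # Imported here so the helper can be defined even where TF is not installed
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving it all up front
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)
    K.set_session(session)
    return session
```

It would need to be called once before the Keras model is constructed, for example at the start of MaskRCNNNode.__init__ in mask_rcnn_node, ahead of the line that builds the model.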

ImageProjectiveTransformV2 error in loading meta graph by import_meta_graph

I am trying to load the meta graph of a trained network, "name.ckpt-1.meta", using tf.train.import_meta_graph("./name.ckpt-1.meta"),
but the following error appears:
Traceback (most recent call last):
File "/home/rapsodo/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3265, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-634d5d15ac05>", line 1, in <module>
saver = tf.train.import_meta_graph(input_checkpoint + '.meta', clear_devices=False)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1960, in import_meta_graph
**kwargs)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 744, in import_scoped_meta_graph
producer_op_list=producer_op_list)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 391, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "/home/rapsodo/workspace_mike3352/anaconda2/envs/mike_tfpy36/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 158, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'ImageProjectiveTransformV2'
I do not understand the reason and have not seen the same thing anywhere else. I'm not sure whether it is because of the TensorFlow version or something else.
I found the solution: it is caused by a version mismatch. Newer versions of TensorFlow are not always compatible with older versions when it comes to saved graphs.
If the checkpoints were saved with an older version, we should use a matching version (preferably the same one) to load the meta graph or frozen graph.
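One way to make such a mismatch easier to spot later is to record the TensorFlow version next to the checkpoint when saving and read it back before loading. This is only a sketch of that idea, not something TensorFlow does for you. The function names and the ".version.json" suffix are hypothetical, and the version string in the example is a placeholder for tf.__version__.

```python
import json
import os
import tempfile


def write_version_stamp(checkpoint_prefix, tf_version):
    """Write the TensorFlow version used for saving next to the checkpoint."""
    path = checkpoint_prefix + '.version.json'
    with open(path, 'w') as f:
        json.dump({'tensorflow': tf_version}, f)
    return path


def saved_with(checkpoint_prefix):
    """Return the recorded version, or None if no stamp file exists."""
    path = checkpoint_prefix + '.version.json'
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)['tensorflow']


# Example in a temporary directory; "name.ckpt-1" mirrors the prefix in the question
prefix = os.path.join(tempfile.mkdtemp(), 'name.ckpt-1')
write_version_stamp(prefix, '1.12.0')  # placeholder; pass tf.__version__ in practice
recorded = saved_with(prefix)
```

Before calling tf.train.import_meta_graph, you could compare saved_with(prefix) with tf.__version__ and warn if they differ.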

Why the tutorial of cnn in tensorflow doesn't work?

I am trying to test this tutorial on Convolutional Neural Networks: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/layers/cnn_mnist.py. The tutorial is explained, but I get this error:
Traceback (most recent call last):
File "Convolution_Neural_Network.py", line 161, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "Convolution_Neural_Network.py", line 129, in main
model_fn=cnn_model_fn, model_dir="/mnist_convnet_model/")
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 171, in __init__
_verify_model_fn_args(model_fn, params)
File "/usr/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 742, in _verify_model_fn_args
raise ValueError('model_fn (%s) must include features argument.' % model_fn)
ValueError: model_fn (<function cnn_model_fn at 0x53790c8>) must include features argument.
The error is from this line:
model_fn=cnn_model_fn, model_dir="/mnist_convnet_model/")
I would be very grateful if you could help me please.
You probably renamed the argument called 'features' in the definition of the function cnn_model_fn. I had a similar problem because I renamed it to 'inputs'.
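The error message suggests that the Estimator looks at the argument names of model_fn, so the name itself matters and not just the position. The sketch below imitates that kind of check with Python 3's inspect module. It is not TensorFlow's own implementation, and has_features_argument and renamed_model_fn are hypothetical names.

```python
import inspect


def has_features_argument(model_fn):
    """Return True if model_fn declares a parameter named 'features'."""
    return 'features' in inspect.signature(model_fn).parameters


def cnn_model_fn(features, labels, mode):
    """Signature with the expected 'features' argument (body omitted)."""


def renamed_model_fn(inputs, labels, mode):
    """Same signature with 'features' renamed to 'inputs'."""


ok = has_features_argument(cnn_model_fn)
broken = has_features_argument(renamed_model_fn)
```

Renaming the first parameter back to features is the fix. The body of the function can keep using whatever local names it likes.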