I've tried all the steps of the object_detection model installation described at
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
While testing the installation as described in the last step of that guide, I get the error below.
ERROR: test_create_ssd_mobilenet_v1_model_from_config (__main__.ModelBuilderTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 193, in test_create_ssd_mobilenet_v1_model_from_config
model = self.create_model(model_proto)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder_test.py", line 53, in create_model
return model_builder.build(model_config, is_training=False)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 73, in build
return _build_ssd_model(model_config.ssd, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 126, in _build_ssd_model
is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/model_builder.py", line 98, in _build_ssd_feature_extractor
feature_extractor_config.conv_hyperparams, is_training)
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 70, in build
hyperparams_config.regularizer),
File "/Users/RonakBhavsar/eDAM/ML/ObjectRecognition/models/object_detection/builders/hyperparams_builder.py", line 119, in _build_regularizer
return slim.l2_regularizer(scale=regularizer.l2_regularizer.weight)
File "/Users/RonakBhavsar/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/regularizers.py", line 92, in l2_regularizer
raise ValueError('scale cannot be an integer: %s' % (scale,))
ValueError: scale cannot be an integer: 1
I get this error for all the models covered by the test script. Does anyone have any ideas?
We have a pull request out that should fix this issue. Please give that a try.
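For anyone hitting this before the fix lands, here is a minimal sketch of what is failing and a possible local workaround; the cast-to-float approach is my assumption and may differ from what the pull request actually does.

import tensorflow as tf
slim = tf.contrib.slim

# Minimal reproduction (TF 1.x): slim.l2_regularizer refuses an integer scale,
# which is what hyperparams_builder passes when the config weight comes through
# as the integer 1.
try:
    slim.l2_regularizer(scale=1)
except ValueError as err:
    print(err)  # scale cannot be an integer: 1

# Hypothetical local workaround: cast the proto weight to float before building
# the regularizer, e.g. in hyperparams_builder._build_regularizer:
#     return slim.l2_regularizer(scale=float(regularizer.l2_regularizer.weight))
slim.l2_regularizer(scale=1.0)  # succeeds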
I am modifying TensorFlow source code, specifically the call() function. Interestingly, the checkpoint restore() is giving a shape-mismatch error.
From my understanding, though, the flow of the code is build() -> restore() -> call().
Although my change to the TensorFlow code is expected to alter shapes in the call phase, is it expected that restoring the checkpoint, which happens before that, gives a shape-mismatch error? Or must there be another reason for this error?
Traceback (most recent call last):
File "run_classifier.py", line 549, in <module>
app.run(main)
File "/home/arpitj/projects/lib/python3.8/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/arpitj/projects/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "run_classifier.py", line 542, in main
custom_main(custom_callbacks=None, custom_metrics=None)
File "run_classifier.py", line 485, in custom_main
checkpoint.restore(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 2354, in restore
status = self.read(save_path, options=options)
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 2229, in read
result = self._saver.restore(save_path=save_path, options=options)
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 1366, in restore
base.CheckpointPosition(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 254, in restore
restore_ops = trackable._restore_from_checkpoint_position(self) # pylint: disable=protected-access
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 983, in _restore_from_checkpoint_position
current_position.checkpoint.restore_saveables(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/tracking/util.py", line 329, in restore_saveables
new_restore_ops = functional_saver.MultiDeviceSaver(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 339, in restore
restore_ops = restore_fn()
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 323, in restore_fn
restore_ops.update(saver.restore(file_prefix, options))
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/functional_saver.py", line 115, in restore
restore_ops[saveable.name] = saveable.restore(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 133, in restore
return resource_variable_ops.shape_safe_assign_variable_handle(
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 309, in shape_safe_assign_variable_handle
shape.assert_is_compatible_with(value_tensor.shape)
File "/home/arpitj/projects/lib/python3.8/site-packages/tensorflow/python/framework/tensor_shape.py", line 1171, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (10, 64, 768) and (12, 64, 768) are incompatible
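For what it's worth, restore() does compare shapes before call() ever runs: tf.train.Checkpoint matches each saved tensor against the variable that already exists in the rebuilt object, so a shape change introduced by your modification will surface at restore time. A minimal sketch (hypothetical variable, shapes chosen to mirror the error above):

import tensorflow as tf

# Save a variable with the original shape...
v = tf.Variable(tf.zeros([12, 64, 768]))
ckpt = tf.train.Checkpoint(v=v)
path = ckpt.save('/tmp/shape_demo/ckpt')

# ...then rebuild the object with a different shape and try to restore.
v_new = tf.Variable(tf.zeros([10, 64, 768]))
tf.train.Checkpoint(v=v_new).restore(path)
# ValueError: Shapes (10, 64, 768) and (12, 64, 768) are incompatible

So a shape mismatch raised during restore() is expected whenever the rebuilt model's variables no longer match the checkpoint; it is not necessarily a separate bug.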
I'm trying to run this deep learning model in the cloud:
https://github.com/razvanmarinescu/brgm#image-reconstruction-with-pre-trained-stylegan2-generators
What I am doing is simply using their Colab notebook: https://colab.research.google.com/drive/1G7_CGPHZVGFWIkHOAke4HFg06-tNHIZ4?usp=sharing#scrollTo=qMgE6QFiHuSL
When I try to execute:
!python recon.py recon-real-images --input=/content/drive/MyDrive/boeing/EDGEconnect/val_imgs --masks=/content/drive/MyDrive/boeing/EDGEconnect/val_masks --tag=brains --network=dropbox:brains.pkl --recontype=inpaint --num-steps=1000 --num-snapshots=1
I receive this error:
args: Namespace(command='recon-real-images', input='/content/drive/MyDrive/boeing/EDGEconnect/val_imgs', masks='/content/drive/MyDrive/boeing/EDGEconnect/val_masks', network_pkl='dropbox:brains.pkl', num_snapshots=1, num_steps=1000, recontype='inpaint', superres_factor=4, tag='brains')
Local submit - run_dir: results/00004-brains-inpaint
dnnlib: Running recon.recon_real_images() on localhost...
Processing image 1/4
Loading networks from "dropbox:brains.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Failed!
Traceback (most recent call last):
File "recon.py", line 270, in <module>
main()
File "recon.py", line 263, in main
dnnlib.submit_run(sc, func_name_map[subcmd], **kwargs)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/internal/local.py", line 22, in submit
return run_wrapper(submit_config)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "/content/drive/MyDrive/boeing/brgm/brgm/recon.py", line 189, in recon_real_images
recon_real_one_img(network_pkl, img_list[image_idx], masks, num_snapshots, recontype, superres_factor, num_steps)
File "/content/drive/MyDrive/boeing/brgm/brgm/recon.py", line 132, in recon_real_one_img
_G, _D, Gs = pretrained_networks.load_networks(network_pkl)
File "/content/drive/MyDrive/boeing/brgm/brgm/pretrained_networks.py", line 83, in load_networks
G, D, Gs = pickle.load(stream, encoding='latin1')
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/network.py", line 297, in __setstate__
self._init_graph()
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/network.py", line 154, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "<string>", line 395, in G_synthesis_stylegan2
File "<string>", line 359, in layer
File "<string>", line 106, in modulated_conv2d_layer
File "<string>", line 75, in apply_bias_act
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
cuda_kernel = _get_plugin().fused_bias_act
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/custom_ops.py", line 156, in get_plugin
plugin = tf.load_op_library(bin_file)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/_cudacache/fused_bias_act_237d55aca3e3c3ec0547da06888d8e66.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I found that the very last part of the error:
tensorflow.python.framework.errors_impl.NotFoundError: /content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/_cudacache/fused_bias_act_237d55aca3e3c3ec0547da06888d8e66.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
can be solved by changing a flag in the CUDA Makefile (https://github.com/mgharbi/hdrnet_legacy/issues/2) or by installing TF 1.14 (Colab runs on 1.15.2, and this change made no positive effect).
My question is: how can I get rid of this error? Is there an option to change something inside Google Colab's CUDA Makefile?
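There is no Makefile to edit on Colab; the plugin is compiled on the fly by dnnlib/tflib/custom_ops.py and cached under _cudacache. The undefined symbol contains __cxx11, which usually indicates a _GLIBCXX_USE_CXX11_ABI mismatch between the compiled plugin and the TensorFlow wheel. A purely illustrative sketch of the experiment (the exact compile command custom_ops.py assembles may differ, and the right ABI value might be 1 instead of 0):

import glob, os, subprocess

# Clear the cached plugin so it gets rebuilt from scratch.
cache_dir = '/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/_cudacache'
for so in glob.glob(os.path.join(cache_dir, '*.so')):
    os.remove(so)

# Illustrative nvcc invocation with the ABI define set explicitly; the same
# -D_GLIBCXX_USE_CXX11_ABI=... flag would need to be added to whatever compile
# command custom_ops.py builds.
subprocess.check_call([
    'nvcc', 'fused_bias_act.cu', '--shared', '-o', 'fused_bias_act.so',
    '--compiler-options', '-fPIC,-D_GLIBCXX_USE_CXX11_ABI=0',
])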
I'm trying to evaluate my model using this command:
python eval.py --logtostderr --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config --checkpoint_dir=inference_graph --eval_dir=eval
and I'm getting this error:
Traceback (most recent call last):
File "eval.py", line 142, in <module>
tf.app.run()
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "eval.py", line 138, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\evaluator.py", line 274, in evaluate
evaluator_list = get_evaluators(eval_config, categories)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\evaluator.py", line 166, in get_evaluators
EVAL_METRICS_CLASS_DICT[eval_metric_fn_key](categories=categories))
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 470, in __init__
use_weighted_mean_ap=False)
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 194, in __init__
self._build_metric_names()
File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 213, in _build_metric_names
category_name = unicode(category_name, 'utf-8')
NameError: name 'unicode' is not defined
Hi there!
Python 3 renamed the unicode type to str; the old str type has been replaced by bytes.
Knowing this, it makes sense that we're getting errors, as parts of the TF Object Detection API were written for Python 2.x and are now deprecated.
See here for more explanation on how to upgrade the code to be compatible with Python 3.
I hope this helps!
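For reference, here is a minimal sketch of the usual Python 3-compatible rewrite of the failing line in object_detection_evaluation.py (the actual upstream fix may look different):

# Python 2 original: category_name = unicode(category_name, 'utf-8')
# Python 3: there is no unicode(); decode bytes explicitly and leave str alone.
if isinstance(category_name, bytes):
    category_name = category_name.decode('utf-8')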
I'm using three tf.contrib.cudnn_rnn.CudnnLSTM(1, 128, direction='bidirectional') layers with a batch size of 32 on an AWS p2.xlarge instance. The exact same configuration works correctly with non-eager (standard) TensorFlow. Here is the error log:
2018-04-27 18:15:59.139739: E tensorflow/stream_executor/cuda/cuda_dnn.cc:1520] Failed to allocate RNN workspace of 74252288 bytes.
2018-04-27 18:15:59.139758: E tensorflow/stream_executor/cuda/cuda_dnn.cc:1697] Unable to create rnn workspace
Traceback (most recent call last):
File "tf_run_eager.py", line 424, in <module>
run_experiments()
File "tf_run_eager.py", line 417, in run_experiments
train_losses.append(model.optimize(bX, bY).numpy())
File "tf_run_eager.py", line 397, in optimize
loss, grads_and_vars = self.loss(phoneme_features, utterances)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 233, in grad_fn
sources)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/imperative_grad.py", line 65, in imperative_grad
tape._tape, vspace, target, sources, output_gradients, status) # pylint: disable=protected-access
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 141, in grad_fn
op_inputs, op_outputs, orig_outputs)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 109, in _magic_gradient_function
return grad_fn(mock_op, *out_grads)
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 1609, in _cudnn_rnn_backward
direction=op.get_attr("direction"))
File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/ops/gen_cudnn_rnn_ops.py", line 320, in cudnn_rnn_backprop
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed to call ThenRnnBackward [Op:CudnnRNNBackprop]
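No definitive answer here, but the "Failed to allocate RNN workspace" lines point to GPU memory pressure rather than an eager-specific bug. Two things worth trying, sketched under that assumption for TF 1.x eager mode: enable memory growth before eager execution starts, and/or reduce the batch size.

import tensorflow as tf

# Let the allocator grow on demand instead of reserving all GPU memory up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.enable_eager_execution(config=config)

lstm = tf.contrib.cudnn_rnn.CudnnLSTM(1, 128, direction='bidirectional')
# ...and consider dropping the batch size from 32 to, say, 16.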
When I try to run mnist_with_summaries.py, I get the following error:
Traceback (most recent call last):
File "/home/rob/tf_from_source/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py", line 110, in
tf.app.run()
File "/home/rob/.virtualenvs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
sys.exit(main(sys.argv))
File "/home/rob/tf_from_source/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py", line 85, in main
writer = tf.train.SummaryWriter(FLAGS.summaries_dir, sess.graph)
File "/home/rob/.virtualenvs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/summary_io.py", line 104, in init
self.add_graph(graph_def)
File "/home/rob/.virtualenvs/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/training/summary_io.py", line 168, in add_graph
graph_bytes = graph_def.SerializeToString()
AttributeError: 'Graph' object has no attribute 'SerializeToString'
I'm also seeing this error in some of my own code when I try to generate a TensorBoard graph. Any ideas about the problem and its solution would be appreciated.
Looks like the answer was provided here. I rolled back to the r0.7 branch and the problem was resolved.
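If rolling back isn't an option, the same mismatch can usually be worked around from the example side: the installed summary_io still expects a GraphDef, while the newer example passes a Graph, so handing it the graph_def keeps the old API happy. A sketch assuming that older API:

# In mnist_with_summaries.py, with the older SummaryWriter that expects a GraphDef:
writer = tf.train.SummaryWriter(FLAGS.summaries_dir, sess.graph_def)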