tensorflow.python.framework.errors_impl.NotFoundError while running deep learning model on Google Colaboratory - tensorflow

I'm trying to run on cloud this deep learning model:
https://github.com/razvanmarinescu/brgm#image-reconstruction-with-pre-trained-stylegan2-generators
What I do is simply utilizing their Colab Notebook: https://colab.research.google.com/drive/1G7_CGPHZVGFWIkHOAke4HFg06-tNHIZ4?usp=sharing#scrollTo=qMgE6QFiHuSL
When I try to exectute:
!python recon.py recon-real-images --input=/content/drive/MyDrive/boeing/EDGEconnect/val_imgs --masks=/content/drive/MyDrive/boeing/EDGEconnect/val_masks --tag=brains --network=dropbox:brains.pkl --recontype=inpaint --num-steps=1000 --num-snapshots=1
I receive this error:
args: Namespace(command='recon-real-images', input='/content/drive/MyDrive/boeing/EDGEconnect/val_imgs', masks='/content/drive/MyDrive/boeing/EDGEconnect/val_masks', network_pkl='dropbox:brains.pkl', num_snapshots=1, num_steps=1000, recontype='inpaint', superres_factor=4, tag='brains')
Local submit - run_dir: results/00004-brains-inpaint
dnnlib: Running recon.recon_real_images() on localhost...
Processing image 1/4
Loading networks from "dropbox:brains.pkl"...
Setting up TensorFlow plugin "fused_bias_act.cu": Preprocessing... Loading... Failed!
Traceback (most recent call last):
File "recon.py", line 270, in <module>
main()
File "recon.py", line 263, in main
dnnlib.submit_run(sc, func_name_map[subcmd], **kwargs)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/internal/local.py", line 22, in submit
return run_wrapper(submit_config)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/submission/submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "/content/drive/MyDrive/boeing/brgm/brgm/recon.py", line 189, in recon_real_images
recon_real_one_img(network_pkl, img_list[image_idx], masks, num_snapshots, recontype, superres_factor, num_steps)
File "/content/drive/MyDrive/boeing/brgm/brgm/recon.py", line 132, in recon_real_one_img
_G, _D, Gs = pretrained_networks.load_networks(network_pkl)
File "/content/drive/MyDrive/boeing/brgm/brgm/pretrained_networks.py", line 83, in load_networks
G, D, Gs = pickle.load(stream, encoding='latin1')
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/network.py", line 297, in __setstate__
self._init_graph()
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/network.py", line 154, in _init_graph
out_expr = self._build_func(*self.input_templates, **build_kwargs)
File "<string>", line 395, in G_synthesis_stylegan2
File "<string>", line 359, in layer
File "<string>", line 106, in modulated_conv2d_layer
File "<string>", line 75, in apply_bias_act
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 68, in fused_bias_act
return impl_dict[impl](x=x, b=b, axis=axis, act=act, alpha=alpha, gain=gain)
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 122, in _fused_bias_act_cuda
cuda_kernel = _get_plugin().fused_bias_act
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/ops/fused_bias_act.py", line 16, in _get_plugin
return custom_ops.get_plugin(os.path.splitext(__file__)[0] + '.cu')
File "/content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/custom_ops.py", line 156, in get_plugin
plugin = tf.load_op_library(bin_file)
File "/tensorflow-1.15.2/python3.7/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/_cudacache/fused_bias_act_237d55aca3e3c3ec0547da06888d8e66.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I found that the very last part of an error:
tensorflow.python.framework.errors_impl.NotFoundError: /content/drive/MyDrive/boeing/brgm/brgm/dnnlib/tflib/_cudacache/fused_bias_act_237d55aca3e3c3ec0547da06888d8e66.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Can be solved by changing a flag in Cuda Makefile: https://github.com/mgharbi/hdrnet_legacy/issues/2 or by installing tf 1.14(colab runs on 1.15.2 and this change made no positive effect).
My question is, how can I get rid of this error, is there an option to change smth inside Google Colab's Cuda Makefile?

Related

Error while exciting the eval.py on TF object detection API

i'm trying to evaluate my model
using this command:
python eval.py --logtostderr --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config --checkpoint_dir=inference_graph --eval_dir=eval
and im getting this error
and I'm getting this error:
Traceback (most recent call last): File "eval.py", line 142, in tf.app.run() File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef) File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\app.py", line 299, in run _run_main(main, args) File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\absl\app.py", line 250, in _run_main sys.exit(main(argv)) File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func return func(*args, **kwargs) File "eval.py", line 138, in main graph_hook_fn=graph_rewriter_fn) File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\evaluator.py", line 274, in evaluate evaluator_list = get_evaluators(eval_config, categories) File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\legacy\evaluator.py", line 166, in get_evaluators EVAL_METRICS_CLASS_DICTeval_metric_fn_key) File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 470, in init use_weighted_mean_ap=False) File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 194, in init self._build_metric_names() File "C:\Users\mosta\Anaconda3\envs\mat\lib\site-packages\object_detection-0.1-py3.5.egg\object_detection\utils\object_detection_evaluation.py", line 213, in _build_metric_names category_name = unicode(category_name, 'utf-8') NameError: name 'unicode' is not defined
Hi there!
Python 3 renamed the unicode type to str, the old str type has been replaced by bytes.
Knowing this it makes sense that we're getting errors as parts of the TF Object Detection API are deprecated (written using Python 2.x)
See here for more explanation on how to upgrade the code to be compatible with Python 3.
I hope this helps!

Apache BEAM pipeline fails when writing TF Records - AttributeError: 'str' object has no attribute 'iteritems'

The issue started appearing over the weekend. For some reason, it feels to be a DataFlow issue.
Previously, I was able to execute the script and write TF records just fine. However, now, I am unable to initialize the computation graph to process the data.
The traceback is:
Traceback (most recent call last):
File "my_script.py", line 1492, in <module>
MyBeamClass()
File "my_script.py", line 402, in __init__
self.run()
File "my_script.py", line 514, in run
transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))
File "/anaconda3/envs/ml27/lib/python2.7/site-packages/apache_beam/pipeline.py", line 426, in __exit__
self.run().wait_until_finish()
File "/anaconda3/envs/ml27/lib/python2.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1238, in wait_until_finish
(self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 176, in execute
op.start()
File "apache_beam/runners/worker/operations.py", line 531, in apache_beam.runners.worker.operations.DoOperation.start
def start(self):
File "apache_beam/runners/worker/operations.py", line 532, in apache_beam.runners.worker.operations.DoOperation.start
with self.scoped_start_state:
File "apache_beam/runners/worker/operations.py", line 533, in apache_beam.runners.worker.operations.DoOperation.start
super(DoOperation, self).start()
File "apache_beam/runners/worker/operations.py", line 202, in apache_beam.runners.worker.operations.Operation.start
def start(self):
File "apache_beam/runners/worker/operations.py", line 206, in apache_beam.runners.worker.operations.Operation.start
self.setup()
File "apache_beam/runners/worker/operations.py", line 480, in apache_beam.runners.worker.operations.DoOperation.setup
with self.scoped_start_state:
File "apache_beam/runners/worker/operations.py", line 485, in apache_beam.runners.worker.operations.DoOperation.setup
pickler.loads(self.spec.serialized_fn))
File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in loads
return dill.loads(s)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 317, in loads
return load(file, ignore)
File "/usr/local/lib/python2.7/dist-packages/dill/_dill.py", line 305, in load
obj = pik.load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1232, in load_build
for k, v in state.iteritems():
AttributeError: 'str' object has no attribute 'iteritems'
I am using tensorflow==1.13.1 and tensorflow-transform==0.9.0 and apache_beam==2.7.0
with beam.Pipeline(options=self.pipe_opt) as p:
with beam_impl.Context(temp_dir=self.google_cloud_options.temp_location):
# rest of the script
_ = (
transform_fn
| 'WriteTransformFn' >>
transform_fn_io.WriteTransformFn(path=self.JOB_DIR + '/transform/'))
I was experiencing the same error.
It seems to be triggered by a mismatch in the tensorflow-transform versions of your local (or master) machine and the workers one (specified in the setup.py file).
In my case I was running tensorflow-transform==0.13 on my local machine whereas the workers were running 0.8.
Downgrading the local version to 0.8 fixed the issue.

Protobuf errors while using Tensorflow Object Detection API locally

I got tensorflow and object detection API on my machine.
Test run shows that everything works.
~ $ cd models/research
research $ protoc object_detection/protos/*.proto --python_out=.
research $ export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
research $ python3 object_detection/builders/model_builder_test.py
...............
----------------------------------------------------------------------
Ran 15 tests in 0.144s
OK
Then I tried to retrain a model and got the protobuf error
research $ cd object_detection
object_detection $ python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=ssdlite_mobilenet_v2_coco_2018_05_09/pipeline.config
WARNING:tensorflow:From /Users/me/models/research/object_detection/trainer.py:257: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
Traceback (most recent call last):
File "/Users/me/models/research/object_detection/utils/label_map_util.py", line 135, in load_labelmap
text_format.Merge(label_map_string, label_map)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 533, in Merge
descriptor_pool=descriptor_pool)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 587, in MergeLines
return parser.MergeLines(lines, message)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 620, in MergeLines
self._ParseOrMerge(lines, message)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 635, in _ParseOrMerge
self._MergeField(tokenizer, message)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 735, in _MergeField
merger(tokenizer, message, field)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 823, in _MergeMessageField
self._MergeField(tokenizer, sub_message)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 722, in _MergeField
tokenizer.Consume(':')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/text_format.py", line 1087, in Consume
raise self.ParseError('Expected "%s".' % token)
google.protobuf.text_format.ParseError: 3:10 : Expected ":".
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 1083, in MergeFromString
if self._InternalParse(serialized, 0, length) != length:
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 1105, in InternalParse
(tag_bytes, new_pos) = local_ReadTag(buffer, pos)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/internal/decoder.py", line 181, in ReadTag
while six.indexbytes(buffer, pos) & 0x80:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "/Users/me/models/research/object_detection/trainer.py", line 264, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "/Users/me/models/research/object_detection/trainer.py", line 59, in create_input_queue
tensor_dict = create_tensor_dict_fn()
File "train.py", line 121, in get_next
dataset_builder.build(config)).get_next()
File "/Users/me/models/research/object_detection/builders/dataset_builder.py", line 155, in build
label_map_proto_file=label_map_proto_file)
File "/Users/me/models/research/object_detection/data_decoders/tf_example_decoder.py", line 245, in __init__
use_display_name)
File "/Users/me/models/research/object_detection/utils/label_map_util.py", line 152, in get_label_map_dict
label_map = load_labelmap(label_map_path)
File "/Users/me/models/research/object_detection/utils/label_map_util.py", line 137, in load_labelmap
label_map.ParseFromString(label_map_string)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/message.py", line 185, in ParseFromString
self.MergeFromString(serialized)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/google/protobuf/internal/python_message.py", line 1089, in MergeFromString
raise message_mod.DecodeError('Truncated message.')
google.protobuf.message.DecodeError: Truncated message.
object_detection $
I tried a bunch of similar problems' solutions, but they didn't work for my case. For example this one suggest to encode pbtxt file with ASCII.
Python 2 gives an error too. Here is it's last line
google.protobuf.message.DecodeError: Unexpected end-group tag.
Context:
macOS 10.13.4
Local run on CPU
Python 3.6.4
protobuf 3.5.1
libprotoc 3.4.0
tensorflow 1.8.0
Google Cloud SDK 200.0.0
bq 2.0.33
core 2018.04.30
gsutil 4.31

Tensorflow TF_records Generate Error

When I try to generate TF record, I'm getting the following error message:
Traceback (most recent call last):
File "generate_tfrecord.py", line 112, in <module>
tf.app.run()
File "/home/harisohmnaathss/anaconda3/envs/tensorflow/lib/python3.5/site-
packages/tensorflow/python/platform/app.
py", line 124, in run
_sys.exit(main(argv))
File "generate_tfrecord.py", line 98, in main
writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
File "/home/harisohmnaathss/anaconda3/envs/tensorflow/lib/python3.5/site-
packages/tensorflow/python/lib/io/tf_rec
ord.py", line 106, in __init__
compat.as_bytes(path), compat.as_bytes(compression_type), status)
File "/home/harisohmnaathss/anaconda3/envs/tensorflow/lib/python3.5/site-
packages/tensorflow/python/framework/err
ors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or
directory
The command that I try to run is:
python generate_tfrecord.py --csv_input=data/Train_labels.csv
--output_path=data/train.records
Any ideas to solve this issue?
You are supplying output path as data/train.records instead of data/train.tfrecord

unorderable types: str() < tuple() when train pet detector by google object detection api

I train pet detector by google object detection api and get error as fellow:Does it mean sorted fun does not support the dict's key type is tuple and the object detection api still does not support python3? 
Traceback (most recent call last):
File "D:\Program Files\JetBrains\PyCharm 2017.1.1\helpers\pydev\pydevd.py", line 1578, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "D:\Program Files\JetBrains\PyCharm 2017.1.1\helpers\pydev\pydevd.py", line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "D:\Program Files\JetBrains\PyCharm 2017.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "E:/Work/Lib/tensorflow/models/object_detection/train.py", line 198, in <module>
tf.app.run()
File "D:\Program Files\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "E:/Work/Lib/tensorflow/models/object_detection/train.py", line 194, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "E:\Work\Lib\tensorflow\models\object_detection\trainer.py", line 184, in train
data_augmentation_options)
File "E:\Work\Lib\tensorflow\models\object_detection\trainer.py", line 77, in _create_input_queue
prefetch_queue_capacity=prefetch_queue_capacity)
File "E:\Work\Lib\tensorflow\models\object_detection\core\batcher.py", line 93, in __init__
num_threads=num_batch_queue_threads)
File "D:\Program Files\Python\Python35\lib\site-packages\tensorflow\python\training\input.py", line 919, in batch
name=name)
File "D:\Program Files\Python\Python35\lib\site-packages\tensorflow\python\training\input.py", line 697, in _batch
tensor_list = _as_tensor_list(tensors)
File "D:\Program Files\Python\Python35\lib\site-packages\tensorflow\python\training\input.py", line 385, in _as_tensor_list
return [tensors[k] for k in sorted(tensors)]
TypeError: unorderable types: str() < tuple()
I ran into the same problem. I traced the issue down to a python 3 compat issue in TensorFlow. I have submitted a fix for it here: https://github.com/tensorflow/tensorflow/pull/11039