While training a model with the TensorFlow Object Detection API, I got the following error. Is there a workaround?
Command executed
python model_main_tf2.py --model_dir=models/my_ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8 --pipeline_config_path=models/my_ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8/pipeline.config
Error message
File "model_main_tf2.py", line 115, in <module>
tf.compat.v1.app.run()
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "model_main_tf2.py", line 106, in main
model_lib_v2.train_loop(
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\object_detection\model_lib_v2.py", line 611, in train_loop
manager = tf.compat.v2.train.CheckpointManager(
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\tensorflow\python\training\checkpoint_management.py", line 640, in __init__
recovered_state = get_checkpoint_state(directory)
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\tensorflow\python\training\checkpoint_management.py", line 278, in get_checkpoint_state
file_content = file_io.read_file_to_string(
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 352, in read_file_to_string
return f.read()
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 117, in read
self._preread_check()
File "C:\Users\rh731\.virtualenvs\Tensorflow\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 79, in _preread_check
self._read_buf = _pywrap_file_io.BufferedInputStream(
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 108: invalid start byte
What I did
- I tried converting the character encoding of pipeline.config.
- I tested the API installation. (It passes, as shown in the attached image.)
- I checked the execution command for mistakes.
Also, when I trained a different network, training ran to completion without this error. This time, as before, I downloaded a pretrained model and ran it.
Reference sites:
- Tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#training-the-model
- List of trained models: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
Sorry for the trouble, and thank you for your help.
Most probably this is because you are trying to run a TPU model on your local machine (I guessed that from your PyCharm screenshot). Try running a GPU-based or CPU-based model instead.
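Separately from the choice of model, the traceback ends inside get_checkpoint_state, which reads the checkpoint state file (named checkpoint) under --model_dir, not pipeline.config. It may be worth checking whether that file exists and decodes as UTF-8. The following is only a diagnostic sketch: the helper names first_non_utf8_offset and check_checkpoint_state are made up for illustration, and it is an assumption that this file is where the 0x83 byte comes from.

```python
import os


def first_non_utf8_offset(data):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_checkpoint_state(model_dir):
    """Report whether <model_dir>/checkpoint exists and decodes as UTF-8."""
    path = os.path.join(model_dir, "checkpoint")
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        offset = first_non_utf8_offset(f.read())
    if offset is None:
        return "ok"
    return "non-UTF-8 byte at offset %d" % offset


if __name__ == "__main__":
    # Directory taken from the command in the question.
    print(check_checkpoint_state(
        "models/my_ssd_mobilenet_v1_fpn_640x640_coco17_tpu-8"))
```

If the file turns out to be missing or clean, the undecodable byte probably comes from somewhere else (for example a path or a system message), and this check will not explain the error.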
Related
Inside the directory output_saved_model_dir shown below, I have a TensorRT file named final_model_gender_classification_gpu0_int8.trt
output_saved_model_dir='/home/cocoslabs/Downloads/age_gender_trt'
saved_model_loaded = tf.saved_model.load(output_saved_model_dir, tags=[tag_constants.SERVING])
When I run the above script, it shows the following error:
File "test.py", line 7, in <module>
saved_model_loaded = tf.saved_model.load(output_saved_model_dir, tags=[tag_constants.SERVING])
File "/home/cocoslabs/deepstream_docker/venv/lib/python3.6/site-packages/tensorflow_core/python/saved_model/load.py", line 528, in load
return load_internal(export_dir, tags)
File "/home/cocoslabs/deepstream_docker/venv/lib/python3.6/site-packages/tensorflow_core/python/saved_model/load.py", line 537, in load_internal
saved_model_proto = loader_impl.parse_saved_model(export_dir)
File "/home/cocoslabs/deepstream_docker/venv/lib/python3.6/site-packages/tensorflow_core/python/saved_model/loader_impl.py", line 83, in parse_saved_model
constants.SAVED_MODEL_FILENAME_PB))
OSError: SavedModel file does not exist at: /home/cocoslabs/Downloads/age_gender_trt/{saved_model.pbtxt|saved_model.pb}
From the above error, my understanding is that tf.saved_model.load() accepts only .pb or .pbtxt files. Is that right? But according to this link, "Load and run test a .trt model", tf.saved_model.load() will accept a .trt file. Please help me fix this error. Thank you.
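The OSError above names the two files the loader looks for inside the directory: saved_model.pb or saved_model.pbtxt. A quick way to see what the directory actually contains before calling tf.saved_model.load() is sketched below; describe_export_dir is a hypothetical helper, not part of TensorFlow.

```python
import os

# File names taken from the OSError message in the question.
SAVED_MODEL_FILES = ("saved_model.pb", "saved_model.pbtxt")


def describe_export_dir(export_dir):
    """Return (has_saved_model, other_files) for a candidate SavedModel directory."""
    names = sorted(os.listdir(export_dir))
    has_saved_model = any(name in SAVED_MODEL_FILES for name in names)
    other_files = [name for name in names if name not in SAVED_MODEL_FILES]
    return has_saved_model, other_files


if __name__ == "__main__":
    # Directory taken from the question.
    print(describe_export_dir("/home/cocoslabs/Downloads/age_gender_trt"))
```

If the first value is False and the list contains only the .trt file, the directory is not in the layout the error message asks for.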
I'm trying to convert a custom dataset to tfrecord for DeepLab v3+, following this tutorial. My directory setup is as follows:
+ datasets
  + pascal_voc_seg/custom_dataset
    + VOCdevkit
      + VOC2012
        + JPEGImages
        + SegmentationClassRaw
        + ImageSets
          + Segmentation
    + tfrecord
I have also downloaded the Pascal VOC dataset, and the two directory structures are now identical. When I run the build_voc2012_data.py script on the Pascal VOC dataset as follows:
#from models/research/deeplab/dataset/pascal_voc_seg
python build_voc2012_data.py \
--image_folder="./VOCdevkit/VOC2012/JPEGImages" \
--semantic_segmentation_folder="./VOCdevkit/VOC2012/SegmentationClassRaw" \
--list_folder="./VOCdevkit/VOC2012/ImageSets/Segmentation" \
--image_format="jpg" \
--output_dir="./tfrecord"
...everything works fine: the dataset gets converted to tfrecord files with a progress bar displayed. However, when I run the same script from my custom dataset directory, the following error occurs:
>> Converting image 1/164 shard 0
Traceback (most recent call last):
File "build_voc2012_data.py", line 146, in <module>
tf.compat.v1.app.run()
File "/home/delanyn/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/delanyn/.local/lib/python2.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/delanyn/.local/lib/python2.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "build_voc2012_data.py", line 142, in main
_convert_dataset(dataset_split)
File "build_voc2012_data.py", line 121, in _convert_dataset
image_data = tf.io.gfile.GFile(image_filename, 'rb').read()
File "/home/delanyn/.local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 122, in read
self._preread_check()
File "/home/delanyn/.local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 84, in _preread_check
compat.as_bytes(self.__name), 1024 * 512)
tensorflow.python.framework.errors_impl.NotFoundError: ./VOCdevkit/VOC2012/JPEGImages/2020_0
.jpg; No such file or directory
What could I be missing here? My images are JPEGs with the same dimensions as the Pascal VOC images. The segmentation masks use the same colormap as well, and I ran the remove-colormap script on them beforehand.
Going by the error message, all I can say is that an entry in the train.txt or val.txt file in the folder pascal_voc_dataset/VOCdevkit/VOC2012/ImageSets/Segmentation does not match any image in the JPEG folder pascal_voc_dataset/VOCdevkit/VOC2012/JPEGImages.
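To find which entries fail to match, the list file can be compared against the image folder directly. The sketch below uses a hypothetical helper, find_unmatched_entries, and reads the list in binary mode so that trailing carriage returns are preserved. The ".jpg; No such file or directory" fragment appearing apart from the rest of the path in the error is consistent with a carriage return at the end of a list entry (for example, a list file saved with Windows line endings), but that is an assumption rather than something the traceback proves.

```python
import os


def find_unmatched_entries(list_path, image_dir, image_format="jpg"):
    """Return (entry, reason) pairs for list entries with no matching image file."""
    problems = []
    # Binary mode keeps any '\r' so it can be detected instead of silently dropped.
    with open(list_path, "rb") as f:
        raw_lines = f.read().split(b"\n")
    for raw in raw_lines:
        entry = raw.decode("utf-8", errors="replace")
        if not entry.strip():
            continue  # skip blank lines
        if os.path.isfile(os.path.join(image_dir, entry + "." + image_format)):
            continue  # entry matches an image exactly
        stripped = entry.strip()
        fixed = os.path.join(image_dir, stripped + "." + image_format)
        if stripped != entry and os.path.isfile(fixed):
            problems.append((entry, "stray whitespace or carriage return"))
        else:
            problems.append((entry, "no such image"))
    return problems


if __name__ == "__main__":
    # Paths taken from the command in the question; train.txt is one of the
    # list files mentioned in the answer.
    for entry, reason in find_unmatched_entries(
            "./VOCdevkit/VOC2012/ImageSets/Segmentation/train.txt",
            "./VOCdevkit/VOC2012/JPEGImages"):
        print(repr(entry), reason)
```

Printing with repr() makes invisible characters such as '\r' show up in the output.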
I'm training a TensorFlow model by submitting the job to ML Engine. I have built a pipeline which reads from BigQuery using tf.contrib.cloud.python.ops.bigquery_reader_ops.BigQueryReader as a reader for the queue.
Everything works fine in DataLab and locally, with the GOOGLE_APPLICATION_CREDENTIALS variable pointing to the JSON file for the credentials key. However, when I submit the training job in the cloud I get these errors (I'm posting just the main two):
Permission denied: Error executing an HTTP request (HTTP response code 403, error code 0, error message '') when reading schema for...
There was an error creating the model. Check the details: Request had insufficient authentication scopes.
I've already checked everything else, such as correctly defining the table schema in the script and the project/dataset/table ids and names.
I'm pasting the whole error from the log below for clarity:
message: "Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 131, in <module>
hparams=hparam.HParams(**args.__dict__)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
return _execute_schedule(experiment, schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
return task()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 495, in train_and_evaluate
self.train(delay_secs=0)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 521, in __exit__
self._close_internal(exception_type)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 556, in _close_internal
self._sess.close()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 791, in close
self._sess.close()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 888, in close
ignore_live_threads=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/queue_runner_impl.py", line 238, in _run
enqueue_callable()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1063, in _single_operation_run
target_list_as_strings, status, None)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
PermissionDeniedError: Error executing an HTTP request (HTTP response code 403, error code 0, error message '')
when reading schema for pasquinelli-bigdata:Transactions.t_11_Hotel_25_w_train#1505224768418
[[Node: GenerateBigQueryReaderPartitions = GenerateBigQueryReaderPartitions[columns=["F_RACC_GEST", "LABEL", "F_RCA", "W24", "ETA", "W22", "W23", "W20", "W21", "F_LEASING", "W2", "W16", "WLABEL", "SEX", "F_PIVA", "F_MUTUO", "Id_client", "F_ASS_VITA", "F_ASS_DANNI", "W19", "W18", "W17", "PROV", "W15", "W14", "W13", "W12", "W11", "W10", "W7", "W6", "W5", "W4", "W3", "F_FIN", "W1", "ImpTot", "F_MULTIB", "W9", "W8"], dataset_id="Transactions", num_partitions=1, project_id="pasquinelli-bigdata", table_id="t_11_Hotel_25_w_train", test_end_point="", timestamp_millis=1505224768418, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Any suggestion would be extremely helpful, since I'm relatively new to Google Cloud.
Thank you all.
Support for reading BigQuery data from Cloud ML Engine is still under development, so what you are doing is currently unsupported. The issue you are hitting is that the machines ML Engine runs on do not have the right scopes to talk to BigQuery. Another issue you may encounter, even when running locally, is poor performance reading from BigQuery. These are two examples of work that needs to be addressed.
In the meantime, I recommend exporting the data to GCS for training. This is much more scalable, so you don't have to worry about poor training performance as your data grows. It is also a good pattern because it lets you preprocess your data once, write the result to GCS in CSV format, and then do multiple training runs to try out different algorithms or hyperparameters.
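One way to do the export is sketched below using the google-cloud-bigquery client library, which is an assumption on my part (the bq command-line tool or the web console would work as well). The bucket name, the prefix, and the helper names gcs_export_uri and export_table_to_gcs are hypothetical; the project, dataset, and table ids are the ones from the error log.

```python
def gcs_export_uri(bucket, prefix, table_id):
    """Build a wildcard GCS URI so BigQuery can shard a large export."""
    return "gs://{}/{}/{}-*.csv".format(bucket, prefix.strip("/"), table_id)


def export_table_to_gcs(project_id, dataset_id, table_id, bucket,
                        prefix="exports"):
    """Start a BigQuery extract job (CSV is the default format) and wait for it."""
    # Imported lazily so the URI helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    source = "{}.{}.{}".format(project_id, dataset_id, table_id)
    destination = gcs_export_uri(bucket, prefix, table_id)
    job = client.extract_table(source, destination)
    job.result()  # blocks until the export finishes; raises on failure
    return destination


if __name__ == "__main__":
    # "my-training-bucket" is a placeholder, not a real bucket.
    print(export_table_to_gcs("pasquinelli-bigdata", "Transactions",
                              "t_11_Hotel_25_w_train", "my-training-bucket"))
```

The training job can then read the resulting CSV shards from GCS instead of calling BigQuery. Depending on where the dataset lives, the extract job may also need an explicit location argument.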
I want to view statistics of my model by saving my graph to a file then running graph_metrics.py.
I have tried a few different ways of writing the file; my best effort is:
tf.train.write_graph( session.graph_def, ".", "my_graph", as_text=True )
But here's what happens:
$ python ./util/graph_metrics.py --noinput_binary --graph my_graph
Traceback (most recent call last):
File "./util/graph_metrics.py", line 137, in <module>
tf.app.run()
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv))
File "./util/graph_metrics.py", line 85, in main
FLAGS.batch_size)
File "./util/graph_metrics.py", line 109, in calculate_graph_metrics
input_tensor = sess.graph.get_tensor_by_name(input_layer)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2531, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2385, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File ".virtualenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2427, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'Mul:0' refers to a Tensor which does not exist. The operation, 'Mul', does not exist in the graph."
Is there a complete working example of saving a graph, then analyzing it with graph_metrics.py?
This process seems to involve a magic incantation that I haven't yet discovered.
The error you're hitting is because you need to specify the name of your own input node with --input_layer= (it just defaults to Mul:0 because that's what we use in one of our Inception models):
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/graph_metrics.py#L51
The graph_metrics script is still very much a work in progress unfortunately, and you may hit problems with shape inference, but hopefully this should get you past the initial hurdle.
I was just trying out the os.dup2() function to redirect output when I typed os.dup2(3,1), which my IPython (Python 2.7) didn't seem to like.
It crashed and now it won't start again, yielding the error:
Traceback (most recent call last):
File "/usr/bin/ipython", line 8, in <module>
launch_new_instance()
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py", line 402, in launch_new_instance
app.initialize()
File "<string>", line 2, in initialize
File "/usr/lib/python2.7/dist-packages/IPython/config/application.py", line 84, in catch_config_error
return method(app, *args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py", line 312, in initialize
self.init_shell()
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/ipapp.py", line 332, in init_shell
ipython_dir=self.ipython_dir)
File "/usr/lib/python2.7/dist-packages/IPython/config/configurable.py", line 318, in instance
inst = cls(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/IPython/frontend/terminal/interactiveshell.py", line 183, in __init__
user_module=user_module, custom_exceptions=custom_exceptions
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 456, in __init__
self.init_readline()
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 1777, in init_readline
self.refill_readline_hist()
File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 1789, in refill_readline_hist
include_latest=True):
File "/usr/lib/python2.7/dist-packages/IPython/core/history.py", line 256, in get_tail
return reversed(list(cur))
DatabaseError: database disk image is malformed
If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at ipython-dev@scipy.org
You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.
Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
c.Application.verbose_crash=True
Can anyone help me with that?
Reposting as an answer:
It looks like fd 3 was your IPython history database, and by redirecting stdout to it you corrupted it.
To get it to start again, remove or rename ~/.ipython/profile_default/history.sqlite (or ~/.config/ipython/profile_default/history.sqlite on certain IPython versions on Linux).
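Since IPython itself will not start, the rename can be done from the shell or from plain Python. A minimal sketch follows, using a hypothetical helper set_aside_history that renames rather than deletes, so the damaged file stays available for inspection; the two candidate paths are the ones mentioned above.

```python
import os
import time

# Locations named in the answer above; which one exists depends on the IPython version.
CANDIDATES = (
    "~/.ipython/profile_default/history.sqlite",
    "~/.config/ipython/profile_default/history.sqlite",
)


def set_aside_history(paths=CANDIDATES):
    """Rename each existing history file to <name>.corrupt-<timestamp>.

    Returns the list of new file names.
    """
    moved = []
    stamp = time.strftime("%Y%m%d-%H%M%S")
    for raw in paths:
        path = os.path.expanduser(raw)
        if os.path.isfile(path):
            backup = "{}.corrupt-{}".format(path, stamp)
            os.rename(path, backup)
            moved.append(backup)
    return moved
```

Run it with the regular python interpreter, for example by calling set_aside_history() and printing the result. An empty list means no history file was found at either location.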