NotFoundError : ; on tensorflow 1.5 object detection API, running smoothly on 1.4 - tensorflow

I recently upgraded one of my small Ubuntu (16.04) servers from tensorflow-gpu 1.4 to tensorflow-gpu 1.5 for working with the object detection API. I have git cloned the latest version of the API, which is supposed to work with tensorflow 1.5.
CUDA/cudNN and other tensorflow programs are up and running after the upgrade, and all test-scripts in the object detection API are running fine.
Despite this, when I attempt to run train.py it fails immediately with the following error:
File "/home/arvid/ownCloud/tensorflow/models/research/object_detection/train.py", line 167, in <module> tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 124, in run _sys.exit(main(argv))
File "/home/arvid/ownCloud/tensorflow/models/research/object_detection/train.py", line 107, in main overwrite=True)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/lib/io/file_io.py", line 385, in copy compat.as_bytes(oldpath), compat.as_bytes(newpath), overwrite, status)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__ c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ; No such file or directory
This error arises when some input file is missing, but the problem here is that no file is named in the error.
Usually the missing file appears between the colon and the semicolon, but in this error that position is empty.
I can reproduce the same error on my working server running tensorflow 1.4 by inserting a space between --train_dir= and the path:
--train_dir= {some_path}
But that is not the case here!
Additional info: when I run train.py, the 'train' directory is created at the location I specify, so tensorflow seems to be able to resolve paths.
Any input on how to debug this would be greatly appreciated!!

(Ok, I'm feeling a bit stupid right now...)
The solution was simple - the name of the flags for train.py changed with the update...
It used to be:
--pipeline_config={some_path}
But now it's:
--pipeline_config_path={some_path}
Still, a more informative error message would be useful...
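One way to get a clearer message is to validate the path flags before they reach the file-copy call. The sketch below is illustrative only: require_path_flag is a hypothetical helper, not part of TensorFlow or the object detection API, and it assumes the flag values arrive as plain strings.

```python
import os


def require_path_flag(name, value, must_exist=True):
    """Return a cleaned path, or raise an error that names the offending flag.

    Hypothetical helper for illustration; not part of TensorFlow.
    """
    cleaned = (value or "").strip()
    if not cleaned:
        raise ValueError(
            "--%s is empty; check the flag name and remove any space after '='" % name
        )
    if must_exist and not os.path.exists(cleaned):
        raise ValueError("--%s points to a missing path: %r" % (name, cleaned))
    return cleaned


if __name__ == "__main__":
    try:
        # A renamed or misspelled flag typically leaves the real flag empty.
        require_path_flag("pipeline_config_path", "")
    except ValueError as err:
        print(err)
```

Calling such a check on every path flag at the top of main() would turn the blank "; No such file or directory" into a message that names the flag at fault.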

Remove the spaces after the equals sign in --train_dir= {some_path} and --pipeline_config_path= {some_path}.
It works for me.
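The space matters because the shell splits arguments on whitespace, so the flag and the path arrive as two separate arguments and the flag's value is empty. A small sketch using only the standard library shows the effect; the parse_flags helper and the /tmp/train path are made up for illustration, and this is not how TensorFlow itself parses flags.

```python
import shlex


def parse_flags(command_line):
    """Split a command line like a POSIX shell and collect --name=value pairs."""
    flags, stray = {}, []
    for token in shlex.split(command_line):
        if token.startswith("--") and "=" in token:
            name, value = token[2:].split("=", 1)
            flags[name] = value
        else:
            stray.append(token)
    return flags, stray


flags, stray = parse_flags("--train_dir= /tmp/train")
print(flags)  # the value stored for train_dir is the empty string
print(stray)  # the path ended up as a separate, unattached argument
```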

Related

Tensorflow TFX pipeline in Windows machine is failing when trying to create a folder with Linux like folder naming structure

I am trying to run the simple TFX pipeline on a Windows 10 machine, using the code given on the TensorFlow website (https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple). When I run the pipeline, it throws the error below. The folder name that TFX tries to create mixes '\' and '/'. I am not sure how to solve this, since it happens inside TensorFlow's internal code.
ERROR:absl:Failed to make stateful working dir: pipelines\penguin-simple\CsvExampleGen.system\stateful_working_dir\2021-06-24T20:11:37.715669
Traceback (most recent call last):
File "G:\Anaconda3\lib\site-packages\tfx\orchestration\portable\outputs_utils.py", line 211, in get_stateful_working_directory
fileio.makedirs(stateful_working_dir)
File "G:\Anaconda3\lib\site-packages\tfx\dsl\io\fileio.py", line 83, in makedirs
_get_filesystem(path).makedirs(path)
File "G:\Anaconda3\lib\site-packages\tfx\dsl\io\plugins\tensorflow_gfile.py", line 76, in makedirs
tf.io.gfile.makedirs(path)
File "G:\Anaconda3\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 483, in recursive_create_dir_v2
_pywrap_file_io.RecursivelyCreateDir(compat.path_to_bytes(path))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Failed to create a directory: pipelines\penguin-simple\CsvExampleGen.system\stateful_working_dir/2021-06-24T20:11:37.715669; Invalid argument
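Besides the mixed separators, it may be worth checking whether every path component is a legal Windows name. The last component in the error is a timestamp containing ':', a character Windows reserves. The sketch below is only a diagnostic aid under that assumption: invalid_windows_components is a hypothetical helper, and it does not establish that this is the cause of the TFX failure.

```python
import re

# Characters that Windows does not allow inside a file or directory name.
WINDOWS_RESERVED = set('<>:"|?*')


def invalid_windows_components(path):
    """Return (component, bad_chars) pairs for parts Windows would reject."""
    problems = []
    for index, part in enumerate(re.split(r"[\\/]", path)):
        # Allow a drive prefix such as "G:" in the first component.
        if index == 0 and re.fullmatch(r"[A-Za-z]:", part):
            continue
        bad = sorted(WINDOWS_RESERVED.intersection(part))
        if bad:
            problems.append((part, bad))
    return problems


# Path copied from the InvalidArgumentError above.
failing = (r"pipelines\penguin-simple\CsvExampleGen.system"
           r"\stateful_working_dir/2021-06-24T20:11:37.715669")
print(invalid_windows_components(failing))
```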

Anaconda 2020.07 with python3.8 lacks support for 'snappy' compressor in blosc?

I'm loading an HDF file that was written with pandas.to_hdf(...,complib="blosc:snappy") under Python 3.7 installed by Anaconda.
After I upgraded Anaconda to Python 3.8, reading it fails with:
HDF5ExtError: HDF5 error back trace
File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 199, in H5Dread
can't read data
File "C:\ci\hdf5_1545244154871\work\src\H5Dio.c", line 601, in H5D__read
can't read data
File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 2229, in H5D__chunk_read
unable to read raw data chunk
File "C:\ci\hdf5_1545244154871\work\src\H5Dchunk.c", line 3609, in H5D__chunk_lock
data pipeline read failed
File "C:\ci\hdf5_1545244154871\work\src\H5Z.c", line 1326, in H5Z_pipeline
filter returned failure during read
File "hdf5-blosc/src/blosc_filter.c", line 188, in blosc_filter
this Blosc library does not have support for the 'snappy' compressor, but only for: blosclz,lz4,lz4hc,zlib,zstd
End of HDF5 error back trace
Problems reading the array data.
Does Blosc 1.19.0 deprecate support for 'snappy', or is it just not included by default? How can I solve this?
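To see what the installed build offers, PyTables can be asked for its Blosc compressor list, and a fallback can be chosen for files written from now on. This is a sketch, not a fix for the existing file: data already compressed with snappy still needs a Blosc build that includes snappy to be read. The choose_complib helper and its preference order are assumptions for illustration.

```python
def available_blosc_compressors():
    """Ask PyTables which compressors its Blosc build supports.

    Returns None when PyTables is not installed.
    """
    try:
        import tables
    except ImportError:
        return None
    return list(tables.blosc_compressor_list())


def choose_complib(available, preferred=("snappy", "zstd", "lz4", "blosclz")):
    """Pick the first preferred Blosc compressor that is available."""
    for name in preferred:
        if name in available:
            return "blosc:" + name
    return "zlib"  # plain zlib as a last resort


# Compressor list copied from the error message above.
from_error = ["blosclz", "lz4", "lz4hc", "zlib", "zstd"]
print(choose_complib(from_error))
print(available_blosc_compressors())
```

The returned string can be passed as the complib argument of pandas.to_hdf when re-saving the data from an environment that can still read it.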

Tensorflow TypeError: expected bytes, Descriptor found

I've been following this tutorial for recognising an object using machine learning:
https://www.youtube.com/watch?v=Rgpfk6eYxJA
I've followed all the instructions on what to install and how, including those in this related tutorial:
https://www.youtube.com/watch?v=RplXYjxgZbw
I tried both with their versions and with the newest available versions of the software. The one exception is that I create the virtual environment like this:
conda create -n tensorflow1 pip python=3.6
Because the tensorflow module isn't yet compatible with python 3.7.
After I install all the packages needed, also described here:
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
Under 2d. Set up new Anaconda virtual environment
and go through the code in the video, I run into an error when I run
python generate_tfrecord.py --csv_input=images\train_labels.csv --image_dir=images\train --output_path=train.record
which is working in the video at 19:35.
The error is
2019-12-11 10:13:43.410540: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
Traceback (most recent call last):
File "generate_tfrecord.py", line 17, in <module>
import tensorflow as tf
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow\__init__.py", line 98, in <module>
from tensorflow_core import *
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\__init__.py", line 40, in <module>
from tensorflow.python.tools import module_util as _module_util
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 947, in _find_and_load_unlocked
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow\__init__.py", line 50, in __getattr__
module = self._load()
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow\__init__.py", line 44, in _load
module = _importlib.import_module(self.__name__)
File "C:\Anaconda\envs\tensorflow1\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\python\__init__.py", line 52, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\graph_pb2.py", line 16, in <module>
from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\node_def_pb2.py", line 16, in <module>
from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\attr_value_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\tensor_pb2.py", line 16, in <module>
from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\resource_handle_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\tensor_shape_pb2.py", line 112, in <module>
'__module__' : 'tensorflow.core.framework.tensor_shape_pb2'
TypeError: expected bytes, Descriptor found
This problem is the same that appears in the jupyter kernel when I run the imports that appear in the video at 14:25
How do I fix the
TypeError: expected bytes, Descriptor found
Error?
And what's with
Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
That also appears?
I can also share this: in the second tutorial, the one that only covers installing the tensorflow-gpu library, after I create an account for cuDNN and download it as instructed, I only get a cudnn64_7.dll file in C:\cuda\bin, which is in my system path environment variable, just as are
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp and
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\CUPTI\lib64,
as instructed in the tutorial. As you can see, I have version 10.1 of Cuda and cuDNN and the paths are a bit different. The GPU Driver is also updated.
P.S. in the tensorflow installing tutorial, the test code doesn't work either.
This is all the information I think I have to offer.
I've been trying to solve this problem for 4-5 days at this point (and this is not the first video I have watched to get a .record file for an image recognition neural network),
and the solutions offered in the question "TypeError: expected bytes, Descriptor found" or anywhere else on Stack Overflow have not helped.
What should I do?
P.S. The tensorflow-gpu version I have is 2.0.0, and it might not be compatible with Cuda and cuDNN. It might be why I only have a cudnn64_7.dll file and not a cudart64_100.dll file. If no one has other solutions, I'll just install tensorflow 1.5 and try the software again.
If someone has another solution however, by all means, post it. I'll post a reply if it works. I'll edit this if it doesn't.
I followed a different tutorial but came across the same errors.
In case anyone is still wondering, I fixed it by upgrading tensorflow from 1.5 to 1.15:
pip install --ignore-installed --upgrade tensorflow-gpu==1.15.0
This is the official issue where I got the idea from.
As for the second part,
Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
This is a CUDA version mismatch rather than a fatal error. cudart64_100.dll is the runtime library of CUDA 10.0, which is the version tensorflow-gpu 2.0.0 is built against, so it cannot be found when only CUDA 10.1 is installed. In most cases, don't worry too much, since tensorflow will fall back to the CPU for training a model. If you really want to use the GPU (for better performance etc.), check that it is supported. You can check a similarly asked question, or an official source.
Alternatively, since you've installed CUDA 10.1, as per the official documentation you'll need to upgrade to tensorflow 2.1.0 or above to make it work.
Personally, I had to opt for tensorflow 1.15 over 2.2.0 and install CUDA 9.0 to make everything run. However, I'm working on a laptop with a mobile 1050 GPU, and I couldn't get it to run any other way.
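A quick way to see whether the runtime DLL that tensorflow wants is reachable is to search the directories on PATH for it. The sketch below assumes the usual pairing of release lines with CUDA runtimes (1.15 and 2.0 with CUDA 10.0, 2.1 with CUDA 10.1) and lists only the versions mentioned here; find_on_path is a hypothetical helper, not a tensorflow function.

```python
import os

# CUDA runtime DLL that each tensorflow release line loads on Windows.
# Only the versions discussed here are listed.
CUDART_DLL = {
    "1.15": "cudart64_100.dll",  # CUDA 10.0
    "2.0": "cudart64_100.dll",   # CUDA 10.0
    "2.1": "cudart64_101.dll",   # CUDA 10.1
}


def find_on_path(filename, path_value=None):
    """Return the directories in a PATH-style string that contain filename."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    wanted = CUDART_DLL["2.0"]
    print(wanted, "found in:", find_on_path(wanted))
```

An empty result for the DLL that matches the installed tensorflow version means either that CUDA release is not installed or its bin directory is missing from PATH.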

Tensorboard error after upgrading to 1.4: trying to access flag before flags were parsed

Since upgrading to TF 1.4 I am getting this error when I try to run tensorboard:
Traceback (most recent call last):
File "/opt/python/3.6.3/bin/tensorboard", line 11, in <module>
sys.exit(main())
File "/opt/python/3.6.3/lib/python3.6/site-packages/tensorboard/main.py", line 39, in main
return program.main(default.get_plugins(),
File "/opt/python/3.6.3/lib/python3.6/site-packages/tensorboard/default.py", line 71, in get_plugins
debugger = debugger_plugin_loader.get_debugger_plugin()
File "/opt/python/3.6.3/lib/python3.6/site-packages/tensorboard/plugins/debugger/debugger_plugin_loader.py", line 46, in get_debugger_plugin
if FLAGS.debugger_data_server_grpc_port is None:
File "/opt/python/3.6.3/lib/python3.6/site-packages/absl/flags/_flagvalues.py", line 509, in __getattr__
raise _exceptions.UnparsedFlagAccessError(error_message)
absl.flags._exceptions.UnparsedFlagAccessError: Trying to access flag --debugger_data_server_grpc_port before flags were parsed.
I am getting this error even when just typing tensorboard with no --logdir specified but also when I do specify a log dir. I notice this has been reported in github as of 5 days ago (https://github.com/tensorflow/nmt/issues/176), but I am surprised not to see more folks reporting this.
I also noticed that I was not able to run a Tensorflow RNN tutorial for the same reason last week, with the error also indicating flags were accessed before being parsed. Has anyone run into this and can you tell me if there's a fix?
As instructed in this Github issue, the quick fix is to upgrade Tensorboard to the nightly build:
pip install --upgrade tb-nightly
As also explained there, this issue will be fixed as soon as TensorFlow nightly 20171122 is released.

Error with PyQtDeploy

I'm using PyQt 5.3.1 and deploying with pyqtdeploy 0.4. When I try to build a project, I get this error message:
Generating code...
Cleaning E:\ProgramasPython3\PythonQT\QTCalculator\build.
Freezing C:\Users\Tobal\AppData\Local\Temp\bootstrap_py3.py
Freezing E:\ProgramasPython3\PythonQT\QTCalculator\qtcalculator.py
Freezing E:/ProgramasPython3/PythonQT\QTCalculator\__init__.py
Freezing E:/ProgramasPython3/PythonQT\QTCalculator\calculator_ui.py
Freezing E:/ProgramasPython3/PythonQT\QTCalculator\img_rc.py
Freezing E:/ProgramasPython3/PythonQT\QTCalculator\qtcalculator.py
Freezing C:\Python34\libs\site-packages\PyQt5\__init__.py
Unable to freeze C:\Python34\libs\site-packages\PyQt5\__init__.py.
Traceback (most recent call last):
File "C:\Users\Tobal\AppData\Local\Temp\freeze.py", line 103, in <module>
freeze_as_data(py_file, options.as_data)
File "C:\Users\Tobal\AppData\Local\Temp\freeze.py", line 36, in freeze_as_data
code = _get_marshalled_code(py_filename)
File "C:\Users\Tobal\AppData\Local\Temp\freeze.py", line 71, in _get_marshalled_code
source_file = open(py_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Python34\\libs\\site-packages\\PyQt5\\__init__.py'
I think this is a bug. Does anyone know how to fix it?
Thanks
Did you build PyQt5 statically? The host and target Python are two different things, and are often in different directories. The target Python has the PyQt5 module built statically; the host Python must also have PyQt5 installed (because pyqtdeploy uses Qt for its GUI), but there it's usually a dynamic library.
In the pyqtdeploy GUI on "Locations" tab, make sure that "Standard library directory" is correct.