Running video processing Python code on Google Colaboratory with a K80 GPU

I am using open-source face detection code from GitHub, uploaded to Google Colaboratory and running on a K80 GPU. I believe I have installed and imported the required libraries, such as mxnet-cu90, but when I run the code I get the error below. I am not sure whether it is a path problem, whether the K80 does not support it, or something else.
OSError: libcudart.so.9.0: cannot open shared object file: No such file or directory
Are imports done differently in Colab, for example import mxnet as mx?
This is my first attempt at Colab, and I run the code in a notebook cell as follows:
!python VideoProcessor.py
Running VideoProcessor.py produces the following traceback:
File "VideoProcessor.py", line 7, in <module>
from retinaface import RetinaFace
File "../thirdparty/insightface/RetinaFace/retinaface.py", line 7, in <module>
import mxnet as mx
File "/usr/local/lib/python3.6/dist-packages/mxnet/__init__.py", line 24, in <module>
from .context import Context, current_context, cpu, gpu, cpu_pinned
File "/usr/local/lib/python3.6/dist-packages/mxnet/context.py", line 24, in <module>
from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
File "/usr/local/lib/python3.6/dist-packages/mxnet/base.py", line 213, in <module>
_LIB = _load_lib()
File "/usr/local/lib/python3.6/dist-packages/mxnet/base.py", line 204, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
File "/usr/lib/python3.6/ctypes/__init__.py", line 348, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.9.0: cannot open shared object file: No such file or directory
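The error says that the dynamic loader cannot find the CUDA 9.0 runtime library that the mxnet-cu90 build is linked against. A minimal sketch for checking which CUDA runtime versions can actually be loaded on the machine follows. The function name probe_cudart and the list of candidate versions are assumptions made for illustration; they are not part of MXNet or Colab.

```python
import ctypes


def probe_cudart(versions=("9.0", "9.2", "10.0", "10.1", "11.0")):
    """Return {version: bool} saying whether libcudart.so.<version> can be loaded.

    The candidate version list is an assumption; extend it as needed.
    """
    found = {}
    for version in versions:
        library_name = f"libcudart.so.{version}"
        try:
            # ctypes.CDLL uses the same dynamic loader search that fails in the traceback.
            ctypes.CDLL(library_name)
            found[version] = True
        except OSError:
            found[version] = False
    return found


if __name__ == "__main__":
    for version, loadable in probe_cudart().items():
        status = "found" if loadable else "missing"
        print(f"libcudart.so.{version}: {status}")
```

If 9.0 is reported as missing but another version is found, one thing to try is installing the MXNet package built for that version (the GPU packages follow the mxnet-cuXX naming pattern) instead of mxnet-cu90.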

Related

Tensorflow TypeError: expected bytes, Descriptor found

I've been following this tutorial for recognising an object using machine learning:
https://www.youtube.com/watch?v=Rgpfk6eYxJA
I've followed all the instructions on what to install and how, including those in this related tutorial:
https://www.youtube.com/watch?v=RplXYjxgZbw
I tried both their versions and the newest available versions of the software, with one exception: I create the virtual environment like this:
conda create -n tensorflow1 pip python=3.6
Because the tensorflow module isn't yet compatible with python 3.7.
After I install all the packages needed, also described here:
https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
Under 2d. Set up new Anaconda virtual environment
and go through the code in the video, I run into an error when I run
python generate_tfrecord.py --csv_input=images\train_labels.csv --image_dir=images\train --output_path=train.record
which is working in the video at 19:35.
The error is
2019-12-11 10:13:43.410540: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
Traceback (most recent call last):
File "generate_tfrecord.py", line 17, in <module>
import tensorflow as tf
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow\__init__.py", line 98, in <module>
from tensorflow_core import *
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\__init__.py", line 40, in <module>
from tensorflow.python.tools import module_util as _module_util
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 947, in _find_and_load_unlocked
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow\__init__.py", line 50, in __getattr__
module = self._load()
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow\__init__.py", line 44, in _load
module = _importlib.import_module(self.__name__)
File "C:\Anaconda\envs\tensorflow1\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\python\__init__.py", line 52, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\graph_pb2.py", line 16, in <module>
from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\node_def_pb2.py", line 16, in <module>
from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\attr_value_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\tensor_pb2.py", line 16, in <module>
from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\resource_handle_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
File "C:\Anaconda\envs\tensorflow1\lib\site-packages\tensorflow_core\core\framework\tensor_shape_pb2.py", line 112, in <module>
'__module__' : 'tensorflow.core.framework.tensor_shape_pb2'
TypeError: expected bytes, Descriptor found
This problem is the same that appears in the jupyter kernel when I run the imports that appear in the video at 14:25
How do I fix the
TypeError: expected bytes, Descriptor found
Error?
And what's with
Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
That also appears?
I can also share this: in the second tutorial, the one just about installing the tensorflow-gpu library, after I created an account for cuDNN and downloaded it as instructed, I only got a cudnn64_7.dll file in C:\cuda\bin, which is in my system PATH environment variable, just as are
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp and
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\extras\CUPTI\lib64,
as instructed in the tutorial. As you can see, I have version 10.1 of Cuda and cuDNN and the paths are a bit different. The GPU Driver is also updated.
P.S. in the tensorflow installing tutorial, the test code doesn't work either.
This is all the information I think I have to offer.
I've been trying to solve this problem for 4-5 days at this point (and this is not the first video I've watched to get a .record file for an image recognition neural network)
and the solutions for this particular problem offered in TypeError: expected bytes, Descriptor found or any other place on stackoverflow are not useful.
What should I do?
P.S. The tensorflow-gpu version I have is 2.0.0, and it might not be compatible with Cuda and cuDNN. It might be why I only have a cudnn64_7.dll file and not a cudart64_100.dll file. If no one has other solutions, I'll just install tensorflow 1.5 and try the software again.
If someone has another solution however, by all means, post it. I'll post a reply if it works. I'll edit this if it doesn't.
I followed a different tutorial but came across the same errors.
In case anyone is still wondering, I fixed it by upgrading TensorFlow from 1.5 (my original version) to 1.15:
pip install --ignore-installed --upgrade tensorflow-gpu==1.15.0
This is the official issue where I got the idea from.
As for the second part,
Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
This message means TensorFlow could not find the CUDA runtime library it was built against (cudart64_100.dll is the CUDA 10.0 runtime). In short, the installed TensorFlow build and the installed CUDA version do not match. In most cases you don't need to worry too much, since TensorFlow will fall back to the CPU for training a model. If you really want to use the GPU (for better performance etc.), check whether it's supported. You can check a similarly asked question or an official source.
Alternatively, since you've installed CUDA 10.1, as per the official documentation you'll need to upgrade to TensorFlow 2.1.0 or above to make it work.
Personally, I had to opt for TensorFlow 1.15 over 2.2.0 and install CUDA 9.0 to make everything run. However, I'm working on a laptop with a mobile 1050 GPU, and no matter what I tried, I couldn't get it to run otherwise.
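To make the version mismatch concrete, here is a minimal sketch that reads the CUDA version out of the DLL name in the warning and looks for that DLL on PATH. It assumes the cudart64_<major><minor>.dll naming pattern seen in the warning; the helper names required_cuda_version and find_on_path are made up for this illustration.

```python
import os
import re


def required_cuda_version(dll_name):
    """Derive a CUDA version string from a name like 'cudart64_100.dll'.

    Assumes the last digit is the minor version and the rest is the major version.
    """
    match = re.fullmatch(r"cudart64_(\d+)(\d)\.dll", dll_name)
    if match is None:
        raise ValueError(f"unexpected CUDA runtime DLL name: {dll_name!r}")
    return f"{match.group(1)}.{match.group(2)}"


def find_on_path(dll_name, path=None):
    """Return every full path where dll_name exists in the PATH directories."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, dll_name)
        if directory and os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    dll = "cudart64_100.dll"  # the name taken from the warning above
    print("TensorFlow is looking for CUDA", required_cuda_version(dll))
    print("Found on PATH:", find_on_path(dll) or "nowhere")
```

With CUDA 10.1 installed, the bin directory would be expected to hold cudart64_101.dll rather than cudart64_100.dll, which is consistent with the warning in the question.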

How to fix "calibration_pb2 from 'object_detection.protos' " error (Windows)

I've tried to run the code below, but it always produces a set of errors. I searched for answers, but none of them work for my code. There are two folders named 'object_detection', one in the research folder and the other in the object_detection-0.1-py3.7.egg folder, which might be causing the error. I tried changing the path, but the errors still persist.
I'm trying to execute this command:
C:\tensorflow1\models\research\object_detection>python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
but have the following errors coming up:
Traceback (most recent call last):
File "train.py", line 51, in <module>
from object_detection.builders import model_builder
File "C:\Users\Swayam\mypython\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\model_builder.py", line 27, in <module>
from object_detection.builders import post_processing_builder
File "C:\Users\Swayam\mypython\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\post_processing_builder.py", line 22, in <module>
from object_detection.protos import post_processing_pb2
File "C:\Users\Swayam\mypython\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\protos\post_processing_pb2.py", line 15, in <module>
from object_detection.protos import calibration_pb2 as object__detection_dot_protos_dot_calibration__pb2
ImportError: cannot import name 'calibration_pb2' from 'object_detection.protos' (C:\Users\Swayam\mypython\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\protos\__init__.py)
I've tried using the
protoc object_detection/protos/*.proto --python_out=.
command but it brings up errors too.
Also, the environment was not created with conda. Could that be the cause of the error? All the necessary packages are installed in the existing virtual environment, though.
Try this solution:
Check whether the file calibration_pb2.py is located in the following path (in your case it may be this one):
C:\Users\Swayam\mypython\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\protos\
If not, copy it from your working path:
C:\tensorflow1\models\research\object_detection\protos\
If that works, I suggest copying all the *pb2.py files into the path mentioned above.
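The copy step can also be scripted. The following is a minimal sketch, assuming the generated files all end in _pb2.py and that the egg is an unpacked directory, as the traceback paths suggest. The function name copy_pb2_files is hypothetical.

```python
import glob
import os
import shutil


def copy_pb2_files(source_dir, target_dir):
    """Copy every *_pb2.py file from source_dir into target_dir.

    Returns the sorted list of copied file names.
    """
    copied = []
    for source_path in sorted(glob.glob(os.path.join(source_dir, "*_pb2.py"))):
        # copy2 keeps timestamps, which makes it easy to see which files were refreshed.
        shutil.copy2(source_path, target_dir)
        copied.append(os.path.basename(source_path))
    return copied
```

For the setup in the question, source_dir would be C:\tensorflow1\models\research\object_detection\protos and target_dir would be the protos folder inside the egg. If nothing is copied, the .proto files have probably not been compiled yet.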
Just compile the .proto files explicitly, from the research folder:
protoc --python_out=. .\object_detection\protos\anchor_generator.proto .\object_detection\protos\argmax_matcher.proto .\object_detection\protos\bipartite_matcher.proto .\object_detection\protos\box_coder.proto .\object_detection\protos\box_predictor.proto .\object_detection\protos\eval.proto .\object_detection\protos\faster_rcnn.proto .\object_detection\protos\faster_rcnn_box_coder.proto .\object_detection\protos\grid_anchor_generator.proto .\object_detection\protos\hyperparams.proto .\object_detection\protos\image_resizer.proto .\object_detection\protos\input_reader.proto .\object_detection\protos\losses.proto .\object_detection\protos\matcher.proto .\object_detection\protos\mean_stddev_box_coder.proto .\object_detection\protos\model.proto .\object_detection\protos\optimizer.proto .\object_detection\protos\pipeline.proto .\object_detection\protos\post_processing.proto .\object_detection\protos\preprocessor.proto .\object_detection\protos\region_similarity_calculator.proto .\object_detection\protos\square_box_coder.proto .\object_detection\protos\ssd.proto .\object_detection\protos\ssd_anchor_generator.proto .\object_detection\protos\string_int_label_map.proto .\object_detection\protos\train.proto .\object_detection\protos\keypoint_box_coder.proto .\object_detection\protos\multiscale_anchor_generator.proto .\object_detection\protos\graph_rewriter.proto .\object_detection\protos\calibration.proto
This will resolve the issue.
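Listing every file by hand is only needed when the shell does not expand *.proto itself. As a rough alternative, the file list can be built with Python's glob module and passed to protoc. This is a sketch under the assumptions that protoc is on PATH and that research_dir points at the models\research folder; build_protoc_command and compile_protos are hypothetical helper names.

```python
import glob
import os
import subprocess


def build_protoc_command(research_dir):
    """Build the protoc argument list for every .proto file under object_detection/protos."""
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    proto_files = sorted(glob.glob(pattern))
    if not proto_files:
        raise FileNotFoundError(f"no .proto files match {pattern}")
    # Paths are made relative to research_dir because protoc is run from there,
    # so the generated modules land next to their .proto sources.
    relative_files = [os.path.relpath(path, research_dir) for path in proto_files]
    return ["protoc", "--python_out=."] + relative_files


def compile_protos(research_dir):
    """Run protoc from research_dir; raises CalledProcessError if protoc fails."""
    subprocess.run(build_protoc_command(research_dir), cwd=research_dir, check=True)
```

Calling compile_protos(r"C:\tensorflow1\models\research") should then be equivalent to the long command above, and it also picks up any .proto files that the hand-written list misses.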

Lxml import issues when using Scrapy

I am trying to use Scrapy with Anaconda/Miniconda on Windows 10. Installation goes fine, but trying to actually run Scrapy gives the following error:
Traceback (most recent call last):
File "C:\ProgramData\Miniconda3\Scripts\scrapy-script.py", line 6, in <module>
from scrapy.cmdline import execute
File "C:\ProgramData\Miniconda3\lib\site-packages\scrapy\__init__.py", line 34, in <module>
from scrapy.spiders import Spider
File "C:\ProgramData\Miniconda3\lib\site-packages\scrapy\spiders\__init__.py", line 10, in <module>
from scrapy.http import Request
File "C:\ProgramData\Miniconda3\lib\site-packages\scrapy\http\__init__.py", line 11, in <module>
from scrapy.http.request.form import FormRequest
File "C:\ProgramData\Miniconda3\lib\site-packages\scrapy\http\request\form.py", line 11, in <module>
import lxml.html
File "C:\ProgramData\Miniconda3\lib\site-packages\lxml\html\__init__.py", line 53, in <module>
from .. import etree
ImportError: DLL load failed: The specified module could not be found.
I have tried reinstalling Scrapy, lxml, and Anaconda itself (this time, I'm using a clean install of Miniconda), as well as downloading an unofficial lxml build from https://www.lfd.uci.edu/~gohlke/pythonlibs/, as suggested in one of the answers on Stack Overflow, but the problem persists. I have also tried this on an Amazon AWS EC2 instance started from scratch, but I'm getting the same issue.
It seems to be something relatively common, but I couldn't find an answer that would work for me. What's an appropriate way to address this? Is it just about lxml, or is there something else causing this problem?

Tensorflow object detection gives No module named 'deployment'

I'm trying to train a custom object detection module using object detection api. I have put everything together and tried to train the module using 'Google Colab'. When I try to train the module it gives this error.
Traceback (most recent call last):
File "train.py", line 49, in <module>
from object_detection import trainer
File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/trainer.py", line 33, in <module>
from deployment import model_deploy
ModuleNotFoundError: No module named 'deployment'
I also executed the code segment below, which is equivalent to export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
import sys
sys.path.append('/content/models/research/slim/')
How do I overcome this error?
Copy the 'deployment' folder from 'slim' and paste it into the 'site-packages' folder of your Python environment.
Hope this helps!
For Google Colab,
import os
os.environ['PYTHONPATH'] += ':/models/research/:/models/research/slim/'
This one works.
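One likely reason the environment variable helps where sys.path.append did not: sys.path only affects the notebook's own interpreter, while a script started as a separate process (for example with !python train.py) reads PYTHONPATH from the environment. Note also that += raises KeyError when PYTHONPATH is not set yet. A slightly more defensive sketch follows; the helper name extend_pythonpath is hypothetical, and the /content/models/research paths are taken from the question, so adjust them to wherever the repository was cloned.

```python
import os


def extend_pythonpath(paths, environ=None):
    """Append paths to PYTHONPATH without duplicates, creating the variable if needed."""
    if environ is None:
        environ = os.environ
    entries = [entry for entry in environ.get("PYTHONPATH", "").split(os.pathsep) if entry]
    for path in paths:
        if path not in entries:
            entries.append(path)
    environ["PYTHONPATH"] = os.pathsep.join(entries)
    return environ["PYTHONPATH"]


if __name__ == "__main__":
    # Paths assumed from the question; child processes started afterwards inherit them.
    print(extend_pythonpath([
        "/content/models/research/",
        "/content/models/research/slim/",
    ]))
```

Run this in a cell before the cell that launches the training script, so the child process sees the updated variable.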

Argparse error with TensorFlow's cifar10.py

I get the following error when I run python cifar10.py:
argparse.ArgumentError: argument --batch_size: conflicting option string(s): --batch_size
Here's the full output of the run including a complete trace:
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcublas.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcudnn.so.6.5 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcufft.so.7.0 locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:101] successfully opened CUDA library libcurand.so.7.0 locally
Traceback (most recent call last):
File "cifar10.py", line 54, in <module>
"""Number of images to process in a batch.""")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_flags.py", line 86, in DEFINE_integer
_define_helper(flag_name, default_value, docstring, int)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_flags.py", line 60, in _define_helper
type=flagtype)
File "/usr/lib/python2.7/argparse.py", line 1297, in add_argument
return self._add_action(action)
File "/usr/lib/python2.7/argparse.py", line 1671, in _add_action
self._optionals._add_action(action)
File "/usr/lib/python2.7/argparse.py", line 1498, in _add_action
action = super(_ArgumentGroup, self)._add_action(action)
File "/usr/lib/python2.7/argparse.py", line 1311, in _add_action
self._check_conflict(action)
File "/usr/lib/python2.7/argparse.py", line 1449, in _check_conflict
conflict_handler(action, confl_optionals)
File "/usr/lib/python2.7/argparse.py", line 1456, in _handle_conflict_error
raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument --batch_size: conflicting option string(s): --batch_size
This error seems to come from the following line in cifar10.py: tf.app.flags.DEFINE_integer('batch_size', 128, """Number of images to process in a batch.""")
It seems like the argparse library thinks that I've already defined the option string --batch_size, but I haven't.
[Stack: Amazon g2.2xlarge spot instance, Python 2.7.6]
In the cifar10.py file:
import tensorflow as tf
from tensorflow.models.image.cifar10 import cifar10_input
FLAGS = tf.app.flags.FLAGS
# Basic model parameters.
tf.app.flags.DEFINE_integer('batch_size', 128,
"""Number of images to process in a batch.""")
....
The error is produced by this last statement, which, in the _flags.py file, defines an argparse argument with that name. Evidently at this point tf.app already has such an argument defined.
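The conflict itself can be reproduced with plain argparse, independently of TensorFlow. This is only an illustration of the mechanism: registering the same option string twice on one parser raises ArgumentError. The function name define_twice is made up for the example.

```python
import argparse


def define_twice():
    """Register --batch_size twice on one parser and return the resulting error text."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=128,
                        help="Number of images to process in a batch.")
    try:
        # The second registration of the same option string triggers the conflict check.
        parser.add_argument("--batch_size", type=int, default=128,
                            help="Number of images to process in a batch.")
    except argparse.ArgumentError as error:
        return str(error)
    return None


if __name__ == "__main__":
    print(define_twice())
```

So anything that causes the DEFINE_integer('batch_size', ...) line to run twice against the same underlying parser, for example the same module being executed once as a script and once as an import, would be enough to produce this traceback.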
So we need to look further back at import tensorflow as tf to see how tf.app was created?
What's the Amazon g2.2xlarge? Could that be defining batch_size as well?
Looks like tf.app comes from
tensorflow/python/platform/app.py
which in turn gets it from something like
from tensorflow.python.platform.google._app import *
So if you are running this on some Google or Amazon platform that itself accepts a batch_size parameter, it could produce this error.
Another question about cifar10 and the batch_size argument:
How to use "FLAGS" (command line switches) in TensorFlow?
Same error here:
Tensorflow ArgumentError Running CIFAR-10 example
The answer says to use cifar10_train.py and cifar10_eval.py, not cifar10.py.