Do I need to add tf.compat.v1.disable_eager_execution() to export_inference_graph.py to convert tf.train.Checkpoint to SavedModel? - tensorflow

I found a question about this error (in a different scenario), along with many GitHub issues and articles, but they always seem to involve people upgrading from TF 1.x to TF 2.x. I'm not doing that.
Here are my versions:
tensorflow 2.5.0
tensorflow-addons 0.13.0
tensorflow-datasets 4.3.0
tensorflow-estimator 2.5.0
tensorflow-gpu 2.5.0
I'm using TF object detection, converting a model trained in TF 2.5 via Python to a tensorflow.js-compatible model, and I asked a question about it. The answer given was to start by running:
python export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/ssd_inception_v2.config \
--trained_checkpoint_prefix path/to/model.ckpt \
--output_directory path/to/exported_model_directory
So my command ended up being:
py Tensorflow\models\research\object_detection\export_inference_graph.py
--input_type image_tensor
--pipeline_config_path Tensorflow\workspace\models\my_ssd_mobnet\pipeline.config
--trained_checkpoint_prefix Tensorflow\workspace\pre-trained-models\ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8\checkpoint\ckpt-0.data-00000-of-00001
--output_directory Tensorflow\workspace\models\my_ssd_mobnet\export
Which resulted in the error:
RuntimeError: tf.placeholder() is not compatible with eager execution
I do see in the logs a common cause of this error, and I know where it's coming from:
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 186, in _image_tensor_input_placeholder
input_tensor = tf.placeholder(
But I don't understand how to deal with this, since I'm not writing any of these TensorFlow modules; I'm just trying to do something basic with existing modules, like converting a tf.train.Checkpoint to a SavedModel.
Normally the answer seems to be to call tf.compat.v1.disable_eager_execution(), but the odd thing here is that it's not my code, and I don't know what else I might break in this conversion script by disabling a feature. Nor am I fluent enough with the TensorFlow API yet to really understand that script.
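If I did try it, I assume the call would have to go near the top of the script, before anything starts building the graph; something like this (untested sketch, not from the actual exporter code):

import tensorflow as tf

# Hypothetical patch near the top of export_inference_graph.py:
# must run before any tf.placeholder() is created.
tf.compat.v1.disable_eager_execution()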
Full logs and trace:
2021-07-15 09:40:24.482953: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-15 09:40:26.835151: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-07-15 09:40:26.856379: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-07-15 09:40:26.856487: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-15 09:40:26.861810: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-07-15 09:40:26.861891: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-07-15 09:40:26.864685: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-07-15 09:40:26.865561: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-07-15 09:40:26.872246: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-07-15 09:40:26.874465: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-07-15 09:40:26.874979: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-07-15 09:40:26.875238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-15 09:40:26.876220: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-15 09:40:26.877353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.845GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-07-15 09:40:26.877556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-15 09:40:27.285985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-15 09:40:27.286153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-15 09:40:27.286917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-15 09:40:27.287164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5957 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2080 SUPER, pci bus id: 0000:01:00.0, compute capability: 7.5)
Traceback (most recent call last):
File "Tensorflow\models\research\object_detection\export_inference_graph.py", line 206, in <module>
tf.app.run()
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "Tensorflow\models\research\object_detection\export_inference_graph.py", line 194, in main
exporter.export_inference_graph(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 611, in export_inference_graph
_export_inference_graph(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 503, in _export_inference_graph
outputs, placeholder_tensor_dict = build_detection_graph(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 457, in build_detection_graph
placeholder_tensor, input_tensors = input_placeholder_fn_map[input_type](
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\object_detection-0.1-py3.8.egg\object_detection\exporter.py", line 186, in _image_tensor_input_placeholder
input_tensor = tf.placeholder(
File "C:\Users\jonat\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3268, in placeholder
raise RuntimeError("tf.placeholder() is not compatible with "
RuntimeError: tf.placeholder() is not compatible with eager execution.
What could I be doing here that would cause this error? Did I install the wrong version of the conversion script? I checked that I have the latest TensorFlow files from the official repo, and that's where export_inference_graph.py is found. Does the conversion script simply not work with TensorFlow 2.x? Do I need to modify it with tf.compat.v1.disable_eager_execution()? Would that cause other problems in the script, since I'd be disabling a feature?
Edit:
I know some models in the object detection model zoo were built for TF 1.x (model zoo) and others for TF 2.x (model zoo). I verified that I have a TF 2.x model, so that's not the cause.

TensorFlow lets you save the model in several different formats (checkpoint or SavedModel). A checkpoint just saves the weights for every layer, so when loading the model you need to first define the network architecture and then load the weights. A SavedModel saves the complete model, i.e. architecture, weights and training configuration (including the optimizer state). This link has more details on the various formats that are available.
https://www.tensorflow.org/tutorials/keras/save_and_load
In your case, since the tfjs converter requires a SavedModel as input, you can save the TensorFlow model directly in SavedModel format rather than saving it first as a checkpoint and then trying to convert it to SavedModel.
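For the TF2 Object Detection API specifically, the repository also ships a TF2 exporter, exporter_main_v2.py, which takes the pipeline config and a checkpoint directory and writes a SavedModel directly, so the TF1-era export_inference_graph.py isn't needed. As a more general illustration of saving straight to SavedModel, a minimal sketch (not OD-API specific; the MobileNetV2 line is just a stand-in for your own trained model):

import tensorflow as tf

# Stand-in for the model you actually trained.
model = tf.keras.applications.MobileNetV2(weights=None)

# Write a SavedModel directory that the tensorflowjs converter can consume.
tf.saved_model.save(model, "exported_model/saved_model")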

Related

Why my TensorFlow got warning for first run but then works?

Excuse the stupid question from a newbie. I just got TensorFlow, both GPU and CPU, installed on Ubuntu 20.04 LTS on WSL2 with CUDA. My GPU is a GeForce 940MX. When I test the TensorFlow installation, the first attempt produces the warnings shown below; however, the exact same statement then goes through smoothly. I wonder what happened here? And is it actually using the CPU or the GPU?
>>> import tensorflow as tf
2021-01-22 12:01:14.284464: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
2021-01-22 12:01:20.034653: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-22 12:01:20.035316: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2021-01-22 12:01:20.035400: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-01-22 12:01:20.035518: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-3E6PSHT): /proc/driver/nvidia/version does not exist
2021-01-22 12:01:20.035984: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-22 12:01:20.036526: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
tf.Tensor(-872.4863, shape=(), dtype=float32)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(590.53516, shape=(), dtype=float32)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(294.59973, shape=(), dtype=float32)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(261.34412, shape=(), dtype=float32)
Any help is appreciated!
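Edit: to check the CPU-or-GPU part myself, I assume I could ask TensorFlow directly which devices it registered and where it places ops; something like this (untested sketch, assuming TF 2.x):

import tensorflow as tf

# An empty list here means TensorFlow only registered the CPU.
print(tf.config.list_physical_devices('GPU'))

# Log the device each subsequent op is placed on.
tf.debugging.set_log_device_placement(True)
print(tf.reduce_sum(tf.random.normal([1000, 1000])))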

GPU errors when running tensorflow AI

I'm following a beginner's TensorFlow tutorial and trying out classification. There are a bunch of GPU errors. I have the CUDA tools installed as well as the latest GPU drivers. Here is the output:
2021-01-13 15:42:24.186914: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-01-13 15:42:24.187065: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
2021-01-13 15:42:26.282013: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-01-13 15:42:26.302224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties:
pciBusID: 0000:0e:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.86GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2021-01-13 15:42:26.302958: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-01-13 15:42:26.303513: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-01-13 15:42:26.304062: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
starting training
2021-01-13 15:42:26.307161: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-01-13 15:42:26.308219: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-01-13 15:42:26.312354: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-01-13 15:42:26.312941: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-01-13 15:42:26.313499: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-01-13 15:42:26.313623: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-01-13 15:42:26.314323: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-13 15:42:26.315481: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-13 15:42:26.315604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306]
WARNING:tensorflow:Using temporary folder as model directory: C:\Users\levig\AppData\Local\Temp\tmpbmbc3as1
WARNING:tensorflow:From C:\Users\levig\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\training\training_util.py:235: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version. Instructions for updating: Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From C:\Users\levig\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\optimizer_v2\adagrad.py:82: calling Constant.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version. Instructions for updating: Call initializer instance with the dtype argument instead of passing it to the constructor
2021-01-13 15:42:27.410575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1760] Found device 0 with properties:
pciBusID: 0000:0e:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.86GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2021-01-13 15:42:27.410786: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform. Skipping registering GPU devices...
2021-01-13 15:42:27.474456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1300] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-01-13 15:42:27.474571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1306] 0
2021-01-13 15:42:27.474637: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1319] 0: N
2021-01-13 15:42:27.482654: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:258] None of the MLIR optimization passes are enabled (registered 0 passes)
Here is my code:
from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import pandas as pd

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")

train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
# Here we use keras (a module inside of TensorFlow) to grab our datasets and read them into a pandas dataframe

train_y = train.pop('Species')
test_y = test.pop('Species')
train.head()  # the species column is now gone

def input_fn(features, labels, training=True, batch_size=256):
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)

# Feature columns describe how to use the input.
my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

# Build a DNN with 2 hidden layers with 30 and 10 hidden nodes each.
classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between 3 classes.
    n_classes=3)

print("starting training")
classifier.train(
    input_fn=lambda: input_fn(train, train_y, training=True),
    steps=5000)
From comments
Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the GPU Support guide for how to download and set up the required libraries for your platform. (paraphrased from Soleil)
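If the goal is just to get the tutorial running on the CPU while the missing DLLs are sorted out, one option (a minimal sketch, assuming TF 2.x) is to hide the GPU before anything is built, which also stops TensorFlow from probing for the CUDA libraries:

import tensorflow as tf

# Hide all GPUs so TensorFlow runs on CPU only.
# Must be called before any tensors, datasets or models are created.
tf.config.set_visible_devices([], 'GPU')
print(tf.config.get_visible_devices())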

keras error when trying to get intermediate layer output: Could not create cudnn handle

I am building a model using keras.
I am using:
anaconda (python 3.7)
tensorflow-gpu (2.1)
keras (2.3.1)
cuda (10.1.2)
cudnn (7.6.5)
nvidia driver (445.7)
nvidia gpu: gtx 1660Ti (6GB)
When I try to run a model, there is a piece of code that raises an error:
def get_gen_output(gan, noise):
    intermediate_model = Model(inputs=gan.input, outputs=gan.layers[24].output)
    layer_output = intermediate_model.predict(noise)
    return layer_output[0]
This model is a CNN GAN. I can run other CNN models fine; only this model causes a problem.
The error I get is:
Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
From other questions that face the same problem, I see there are two common causes:
insufficient GPU memory - but I don't think this is the problem, since the error appears even if I create a very small model that includes the code snippet above, and bigger models without this code work well.
a problem with CUDA and cuDNN compatibility - but based on this link, the versions I listed above should work.
Any idea what the problem could be and how to fix it? I have been trying to solve this for days now.
If any more information is needed (a summary of the model, for example), please let me know in the comments and I will add it.
UPDATE: a comment asked me to post the logs:
(base) C:\Users\Moran>ju[yter notebook
'ju[yter' is not recognized as an internal or external command,
operable program or batch file.
(base) C:\Users\Moran>jupyter notebook
[I 16:42:41.966 NotebookApp] Serving notebooks from local directory: C:\Users\Moran
[I 16:42:41.967 NotebookApp] The Jupyter Notebook is running at:
[I 16:42:41.967 NotebookApp] http://localhost:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
[I 16:42:41.967 NotebookApp] or http://127.0.0.1:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
[I 16:42:41.967 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:42:42.000 NotebookApp]
To access the notebook, open this file in a browser:
file:///C:/Users/Moran/AppData/Roaming/jupyter/runtime/nbserver-15820-open.html
Or copy and paste one of these URLs:
http://localhost:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
or http://127.0.0.1:8888/?token=ec3a664897f7d31597f7f4544609cc8c0d7b4db7450b55b1
[I 16:42:47.284 NotebookApp] Kernel started: ae448b14-33fc-471e-a2ae-991be8321434
[W 16:42:47.740 NotebookApp] 404 GET /api/kernels/4ce83e1e-9aa5-4c93-97d8-55dc16480242/channels?session_id=eaa90dc2c0bb4c448d6a01d66f4fbb21 (127.0.0.1): Kernel does not exist: 4ce83e1e-9aa5-4c93-97d8-55dc16480242
[W 16:42:47.757 NotebookApp] 404 GET /api/kernels/4ce83e1e-9aa5-4c93-97d8-55dc16480242/channels?session_id=eaa90dc2c0bb4c448d6a01d66f4fbb21 (127.0.0.1) 18.94ms referer=None
[W 16:42:49.439 NotebookApp] 404 GET /api/kernels/b9e9b610-9c5b-4565-8b85-deb70837c31f/channels?session_id=34072dd627c74e96b496ef73d99601a9 (::1): Kernel does not exist: b9e9b610-9c5b-4565-8b85-deb70837c31f
[W 16:42:49.440 NotebookApp] 404 GET /api/kernels/b9e9b610-9c5b-4565-8b85-deb70837c31f/channels?session_id=34072dd627c74e96b496ef73d99601a9 (::1) 2.00ms referer=None
2020-04-12 16:43:00.321827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:02.652473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-04-12 16:43:02.685848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-12 16:43:02.693105: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:02.700970: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-12 16:43:02.708335: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-12 16:43:02.713049: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-12 16:43:02.720598: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-12 16:43:02.726428: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-12 16:43:02.738007: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:02.741940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-12 16:43:02.745942: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2020-04-12 16:43:02.754621: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-12 16:43:02.761464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:02.766394: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-12 16:43:02.770257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-12 16:43:02.773975: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-12 16:43:02.777827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-12 16:43:02.782949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-12 16:43:02.786952: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:02.791207: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-12 16:43:03.372450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-12 16:43:03.376375: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-04-12 16:43:03.379436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-04-12 16:43:03.382400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-04-12 16:43:03.966022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1660 Ti computeCapability: 7.5
coreClock: 1.59GHz coreCount: 24 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 268.26GiB/s
2020-04-12 16:43:03.976011: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-04-12 16:43:03.980766: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-04-12 16:43:03.985179: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-04-12 16:43:03.988922: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-04-12 16:43:03.992744: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-04-12 16:43:03.997758: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-04-12 16:43:04.001856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:04.006936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-04-12 16:43:04.009739: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-12 16:43:04.014702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-04-12 16:43:04.017351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-04-12 16:43:04.020371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4625 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
[W 16:43:04.449 NotebookApp] Replacing stale connection: 4ce83e1e-9aa5-4c93-97d8-55dc16480242:eaa90dc2c0bb4c448d6a01d66f4fbb21
2020-04-12 16:43:05.280820: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-04-12 16:43:06.518456: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-04-12 16:43:06.522375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-04-12 16:43:06.525103: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node 1/convolution}}]]
[W 16:43:06.741 NotebookApp] Replacing stale connection: b9e9b610-9c5b-4565-8b85-deb70837c31f:34072dd627c74e96b496ef73d99601a9
[I 16:43:08.454 NotebookApp] Saving file at /generative models/GAN.ipynb
Kindly remove the NVIDIA CUDA toolkit from both the Anaconda environment and the system:
sudo apt-get remove nvidia-cuda-toolkit
conda remove cudatoolkit
Then use the following option when creating the TensorFlow session.
For TensorFlow (on TF 2.x the graph-mode session APIs live under tf.compat.v1):
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # grow GPU memory on demand instead of reserving it all
session = tf.compat.v1.Session(config=config, ...)
For Keras:
from keras.backend.tensorflow_backend import set_session
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
set_session(sess)  # set this TensorFlow session as the default session for Keras
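Since the question is on tensorflow-gpu 2.1, the TF2-native equivalent of allow_growth (a sketch, not part of the original answer) would be:

import tensorflow as tf

# Let the GPU memory pool grow on demand instead of reserving it all up front,
# which often avoids CUDNN_STATUS_ALLOC_FAILED on 6 GB cards.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)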

On tensorflow v.1.15 sess=tf.Session() - Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found

I downgraded TensorFlow from v2.0 to v1.15 and then typed some code in IPython to check it.
But there is a problem with cudnn64_7.dll:
(base) C:\Users\puppy>ipython
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import tensorflow as tf
2019-10-31 00:14:52.841679: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
In [2]: hello=tf.constant('Hello, tensorflow!')
In [3]: sess=tf.Session()
2019-10-31 00:17:45.140209: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-10-31 00:17:45.937511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce 920M major: 3 minor: 5 memoryClockRate(GHz): 0.954
pciBusID: 0000:03:00.0
2019-10-31 00:17:45.945256: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-10-31 00:17:45.981463: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-10-31 00:17:46.039438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_100.dll
2019-10-31 00:17:46.056982: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_100.dll
2019-10-31 00:17:46.131993: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_100.dll
2019-10-31 00:17:46.192560: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_100.dll
2019-10-31 00:17:46.202156: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2019-10-31 00:17:46.209181: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-31 00:17:46.223313: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-31 00:17:46.237318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-10-31 00:17:46.243341: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]
In [4]: print(sess.run(hello))
b'Hello, tensorflow!'
This part is the problem:
2019-10-31 00:17:46.202156: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2019-10-31 00:17:46.209181: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-31 00:17:46.223313: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-31 00:17:46.237318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
Why can't it find cudnn64_7.dll?
I downloaded 'cuDNN v7.6.4 (September 27, 2019), for CUDA 10.0' and added these directories to the system PATH variable:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\dll
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\extras\CUPTI\libx64
How can I fix this problem and use the GPU libraries?
It sometimes happens that cudnn64_7.dll gets removed automatically from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin. Try re-pasting the file from the extracted cuDNN package and it will work.
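To confirm the DLL is actually discoverable after re-pasting it, a quick sanity check from Python (Windows only; just a sketch) is:

import ctypes

# Raises OSError if cudnn64_7.dll still cannot be found on PATH
# after copying it back into the CUDA bin directory.
ctypes.WinDLL("cudnn64_7.dll")
print("cudnn64_7.dll loaded successfully")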

Tensorflow 2.0 can't use GPU, something wrong in cuDNN? :Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

I am trying to understand and debug my code. I try to run prediction with a CNN model developed under TF 2.0/tf.keras on a GPU, but I get the error messages below.
Could someone help me fix it?
Here is my environment configuration:
python 3.6.8
tensorflow-gpu 2.0.0-rc0
nvidia 418.x
CUDA 10.0
cuDNN 7.6+
And here is the log file:
2019-09-28 13:10:59.833892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-09-28 13:11:00.228025: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-28 13:11:00.957534: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963310: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963416: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node mobilenetv2_1.00_192/Conv1/Conv2D}}]]
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0=====>GPU Available: True
=====> 4 Physical GPUs, 1 Logical GPUs
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
File "NSFW_Server.py", line 162, in <module>
model.predict(initial_tensor)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 915, in predict
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 722, in predict
callbacks=callbacks)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 393, in model_iteration
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3625, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1081, in __call__
return self._call_impl(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node mobilenetv2_1.00_192/Conv1/Conv2D (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_keras_scratch_graph_10727]
Function call stack:
keras_scratch_graph
The code
if __name__ == "__main__":
print("=====>GPU Available: ", tf.test.is_gpu_available())
tf.debugging.set_log_device_placement(True)
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
try:
# Currently, memory growth needs to be the same across GPUs
tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print("=====>", len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
except RuntimeError as e:
# Memory growth must be set before GPUs have been initialized
print(e)
paras_path = "./paras/{}".format(int(2011))
model = tf.keras.experimental.load_from_saved_model(paras_path)
initial_tensor = np.zeros((1, INPUT_SHAPE, INPUT_SHAPE, 3))
model.predict(initial_tensor)
You have to check that you have the right versions of CUDA + cuDNN + TensorFlow (and also make sure that all of them are actually installed).
A couple of examples of working configurations are presented below (updated for recent versions of TensorFlow):
Cuda 11.3.1 + CuDNN 8.2.1.32 + TensorFlow 2.7.0
Cuda 11.0 + CuDNN 8.0.4 + TensorFlow 2.4.0
Cuda 10.1 + CuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.2.0/2.3.0 (TF >= 2.1 requires CUDA >= 10.1)
Cuda 10.1 + CuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.1.0 (TF >= 2.1 requires CUDA >= 10.1)
Cuda 10.0 + CuDNN 7.6.3 + TensorFlow 1.13/1.14/2.0
Cuda 9.0 + CuDNN 7.0.5 + TensorFlow 1.10
Usually this error appears when you have an incompatible version of TensorFlow/CuDNN installed. In my case, this appeared when I tried using an older TensorFlow with a newer version of CuDNN.
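A quick way to see which CUDA and cuDNN versions a given TensorFlow wheel was built against (the get_build_info helper only exists on recent TF 2.x releases, so treat this as an optional check) is:

import tensorflow as tf

# Reports the CUDA/cuDNN versions this TensorFlow binary expects.
info = tf.sysconfig.get_build_info()
print(tf.__version__, info.get("cuda_version"), info.get("cudnn_version"))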
If for some reason you get an error message like the following (and nothing happens afterwards):
Relying on the driver to perform ptx compilation
Solution: install the latest NVIDIA driver.
[This seems to be solved in TF >= 2.5.0] (see below):
Only for Windows users: some late combinations of CUDA, cuDNN and TF may not work, due to a bug (a .dll named improperly). To handle that specific case, please consult this link: Tensorflow GPU Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
For those who are facing issues regarding the above error (on Windows), I sorted it out just by installing the cuDNN version compatible with the CUDA version already installed on the system.
This suitable version can be downloaded from the cuDNN download page on NVIDIA's developer portal. You might need an NVIDIA account for it, which is easily created by providing an email address and filling in a questionnaire.
To check the CUDA version, run nvcc --version.
Once the suitable version is downloaded, extract the folder from the zip file.
Go to the bin folder of the extracted folder, copy cudnn64_7.dll and paste it into CUDA's bin folder. In my case, the location where CUDA is installed is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin.
This will most probably solve the problem.
My system details:
Windows 10
CUDA 10.0
TensorFlow 2.0
GPU- Nvidia GTX 1060
I also found the blog post "Installing TensorFlow with CUDA and GPU support on Windows 10" very useful.
Check the instructions on the TensorFlow GPU installation page for your OS. It resolved the issue for me on Ubuntu 16.04.6 LTS and TensorFlow 2.0.