tensorflow-gpu missing libraries dlerror - Import error - tensorflow

I am trying to use tensorflow-gpu for the first time on an HPC cluster. I am getting errors about missing libraries, and they prevent TensorFlow from using the GPU:
2020-11-22 14:19:26.629817: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.5.1/lib64/R/lib:/opt/cluster/lib:/opt/cluster/external/p7zip-16.02/lib/p7zip
2020-11-22 14:19:26.629870: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-11-22 14:19:30.479705: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-22 14:19:31.048853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:82:00.0 name: Tesla P100-PCIE-16GB computeCapability: 6.0
coreClock: 1.3285GHz coreCount: 56 deviceMemorySize: 15.90GiB deviceMemoryBandwidth: 681.88GiB/s
2020-11-22 14:19:31.049038: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.5.1/lib64/R/lib:/opt/cluster/lib:/opt/cluster/external/p7zip-16.02/lib/p7zip
2020-11-22 14:19:31.049540: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.5.1/lib64/R/lib:/opt/cluster/lib:/opt/cluster/external/p7zip-16.02/lib/p7zip
2020-11-22 14:19:31.049988: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.5.1/lib64/R/lib:/opt/cluster/lib:/opt/cluster/external/p7zip-16.02/lib/p7zip
2020-11-22 14:19:31.050412: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.5.1/lib64/R/lib:/opt/cluster/lib:/opt/cluster/external/p7zip-16.02/lib/p7zip
2020-11-22 14:19:31.050833: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.5.1/lib64/R/lib:/opt/cluster/lib:/opt/cluster/external/p7zip-16.02/lib/p7zip
2020-11-22 14:19:31.051262: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcusparse.so.10'; dlerror: libcusparse.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/R/3.5.1/lib64/R/lib:/opt/cluster/lib:/opt/cluster/external/p7zip-16.02/lib/p7zip
2020-11-22 14:19:31.539912: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-22 14:19:31.539974: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Num GPUs Available: 0
Running "nvcc --version" gives:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
cudatoolkit version: 9.0 and cudnn: 7.6.5, tf: 2.3.1
I searched online and found some similar errors, but the solutions did not work in my case. Can you please help me?

As suggested by @Robert Crovella, and as stated on the TensorFlow website, TensorFlow 2.3.1 requires CUDA 10.1, while the cluster provides CUDA 10.0 (nvcc) and cudatoolkit 9.0.
The error also says it could not load the dynamic library 'libcudart.so.10.1'.
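A quick way to confirm which of the libraries named in the log the dynamic loader can actually find is to try opening each one from Python with `ctypes`. This is a diagnostic sketch under stated assumptions: the list of sonames is copied from the log above, `probe_libraries` is a hypothetical helper name, and it only reports what is visible through the current LD_LIBRARY_PATH and the system loader cache. Because the loader reads LD_LIBRARY_PATH when the process starts, a CUDA 10.1 library directory (wherever it lives on your cluster) has to be added in the shell or job script before Python is launched, not from inside the script.

```python
import ctypes
import os

# Library names copied from the TensorFlow 2.3.1 log above (Linux sonames).
TF_GPU_LIBS = [
    "libcudart.so.10.1", "libcublas.so.10", "libcufft.so.10", "libcurand.so.10",
    "libcusolver.so.10", "libcusparse.so.10", "libcudnn.so.7",
]


def probe_libraries(names):
    """Try to dlopen each library the way the dynamic loader would.

    Returns {name: None} for libraries that loaded, {name: error text} otherwise.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    for name, error in probe_libraries(TF_GPU_LIBS).items():
        print(f"{name:22s}", "OK" if error is None else f"MISSING ({error})")
```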

Related

Time consuming Tensorflow's CUDA driver check in AWS Lambda

I've been running an AWS Lambda with a mounted EFS, where I've installed TensorFlow 2.4. When I run the Lambda (or any Lambda that uses TensorFlow 2.4), it spends a lot of time (about 4 minutes, sometimes more) on what look like TensorFlow's setup checks, so I have to set a very long timeout to work around it.
This is the output the Lambda produces:
2022-05-17 06:33:21.917336: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:21.921992: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /var/lang/lib:/lib64:/usr/lib64:/var/runtime:/var/runtime/lib:/var/task:/var/task/lib:/opt/lib
2022-05-17 06:33:21.922025: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-05-17 06:33:21.922048: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (169.254.137.137): /proc/driver/nvidia/version does not exist
2022-05-17 06:33:21.922460: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-05-17 06:33:22.339905: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2022-05-17 06:33:22.340468: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2500010000 Hz
[WARNING] 2022-05-17T06:33:22.436Z c4500036-5b77-4808-a062-f8ae820b0317 AutoGraph could not transform <function Model.make_predict_function.<locals>.predict_function at 0x7f65bfb37280> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output.
Cause: unsupported operand type(s) for -: 'NoneType' and 'int'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
What I need is to eliminate this wasted time and get a clean run.
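The timestamps in the log above span only about half a second (06:33:21.917 to 06:33:22.436), which suggests the CUDA probe itself is not where the minutes go. A minimal sketch for finding out where the time is spent, assuming you can edit the Lambda code: wrap the TensorFlow import, the model loading and the prediction in a small timer and compare the numbers in the CloudWatch logs. `timed` and `TIMINGS` are hypothetical names, and the model path in the usage comment is a placeholder. Setting `TF_CPP_MIN_LOG_LEVEL` only hides the log lines; it does not make anything faster.

```python
import os
import time

# Hide TensorFlow's C++ INFO and WARNING lines. This only tidies the output and
# must happen before TensorFlow is imported; setdefault keeps an existing value.
os.environ.setdefault("TF_CPP_MIN_LOG_LEVEL", "2")

# Hypothetical store of measured durations in seconds, keyed by label.
TIMINGS = {}


def timed(label, fn, *args, **kwargs):
    """Call fn(*args, **kwargs), record and print how long it took, return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    TIMINGS[label] = time.perf_counter() - start
    print(f"{label}: {TIMINGS[label]:.2f} s")
    return result


# Usage inside the Lambda code (not executed here; the path is a placeholder):
# import importlib
# tf = timed("import tensorflow", importlib.import_module, "tensorflow")
# model = timed("load model", tf.keras.models.load_model, "/mnt/efs/<your-model>")
# preds = timed("predict", model.predict, batch)
```

If "import tensorflow" dominates, the cost is in loading the package from EFS rather than in any GPU check.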

tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found

I was installing TensorFlow on my CPU when I got these two errors:
2022-03-13 17:59:56.171741: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-03-13 17:59:56.171872: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Can anybody help me out? I was following a tutorial from a few years ago.
These are just a warning (W) and an informational (I) message saying that the CUDA libraries cannot be found.
The I message on the second line says to ignore the W message above it if you do not have a CUDA GPU set up on your machine.
The only effect is that training will run on the CPU only.
If you are using an NVIDIA GPU, you can refer to the instructions on how to install the missing files.
If you don't use NVIDIA GPU, or simply want to ignore the I and W messages, you can add the 2 lines below at the beginning of your code:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
You can see more about TF_CPP_MIN_LOG_LEVEL at TensorFlow logging.
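Because the variable has to be in place before TensorFlow writes its first log line, the order matters: set it first, then import TensorFlow. A small sketch of that, with a guard that warns if TensorFlow was already imported; the helper name `set_tf_cpp_log_level` is hypothetical, and the level meanings in the comment follow TensorFlow's C++ logging convention.

```python
import os
import sys
import warnings

# TF_CPP_MIN_LOG_LEVEL values for TensorFlow's C++ logging:
#   0 = show all messages (default)
#   1 = hide INFO (I) messages
#   2 = also hide WARNING (W) messages
#   3 = also hide ERROR (E) messages


def set_tf_cpp_log_level(level=2):
    """Set TF_CPP_MIN_LOG_LEVEL; call this before `import tensorflow`."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2 or 3")
    if "tensorflow" in sys.modules:
        # Lines printed while TensorFlow was being imported are already out.
        warnings.warn("tensorflow is already imported; set the log level before importing it")
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)


set_tf_cpp_log_level(2)
# import tensorflow as tf  # import only after the variable is set
```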

Process finished with exit code -1073740791 (0xC0000409) tensorflow-gpu

Here is the code I am using:
import math
import pandas_datareader as web
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
Stock = 'BTC-USD'
#Get the stock quote
df = web.DataReader(Stock, data_source='yahoo', start='2016-01-01', end='2020-12-17')
#Show the Data
#print(df)
#Get the number of rows and columns in the data set
#print(df.shape)
#Visualize the closing price history
plt.figure(figsize=(16,8))
plt.title("Close Price History")
plt.plot(df['Close'])
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.show()
print(1)
#Create a new Dataframe with only the 'close column'
data = df.filter(['Close'])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on
training_data_len = math.ceil(len(dataset) * .8)
print(2)
#print(training_data_len)
#Scale the data
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
print(3)
#print(scaled_data)
#Create the training data set
#Create the scaled training data set
train_data = scaled_data[0:training_data_len, :]
#Split the data into x_train and y_train data sets
x_train = []
y_train = []
print(4)
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i <= 61:
        print(x_train)
        print(y_train)
        print()
print(5)
#Convert the x_train and y_train to numpy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
print(6)
#Reshape the x_train data set
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
#print(x_train.shape)
print(7)
#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))
print(8)
#Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
print(9)
#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=1)
print(10)
#Create the test data set
#Create a new array containing the scaled values from index training_data_len - 60 to the end
test_data = scaled_data[training_data_len - 60:, :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
print(11)
#Convert the data to a numpy array
x_test = np.array(x_test)
#print(x_test.shape)
#Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
#Get the models predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
#Evaluate the model. Get the root mean squared error (RMSE)
rmse = np.sqrt(np.mean((predictions - y_test)**2))
print(rmse)
#Plot the data
train = data[:training_data_len]
valid = data[training_data_len:].copy()
valid['Predictions'] = predictions
#Visualize the data
plt.figure(figsize=(16,8))
plt.title('Model')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Close Price USD ($)', fontsize=18)
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])
plt.legend(['Train', 'Val', 'Predictions'], loc='lower right')
plt.show()
#Show the valid and predicted prices
print(valid)
Judging by my print statements, it crashes between 9 and 10, i.e. during model.fit. This is the output:
C:\Users\gunne\anaconda3\envs\EnvBioWell\python.exe C:/Users/gunne/PycharmProjects/pythonProject/Stocks_Price.py
2021-02-19 09:57:52.322041: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
1
2
3
4
[array([0.00307386, 0.00303452, 0.00288404, 0.00301928, 0.00296962,
0.00284426, 0.00411515, 0.00390359, 0.00365686, 0.00367355,
0.00369274, 0.00313341, 0.00298767, 0.00289699, 0. ,
0.00101894, 0.00078898, 0.00100278, 0.00069457, 0.00245455,
0.00201685, 0.00079746, 0.00101697, 0.0016967 , 0.00120293,
0.00122168, 0.00134546, 0.00070072, 0.00066494, 0.00061141,
0.00019479, 0.00038312, 0.00044424, 0.00024669, 0.00110931,
0.0009756 , 0.00053531, 0.00053962, 0.00040029, 0.00051366,
0.00076044, 0.00067284, 0.00087522, 0.00120881, 0.00188371,
0.00157436, 0.00189504, 0.00228295, 0.00254865, 0.00247892,
0.00319813, 0.00326988, 0.00322377, 0.00247677, 0.00266203,
0.00264398, 0.00297805, 0.00299417, 0.00303742, 0.00322153])]
[0.003108507179665692]
[array([0.00307386, 0.00303452, 0.00288404, 0.00301928, 0.00296962,
0.00284426, 0.00411515, 0.00390359, 0.00365686, 0.00367355,
0.00369274, 0.00313341, 0.00298767, 0.00289699, 0. ,
0.00101894, 0.00078898, 0.00100278, 0.00069457, 0.00245455,
0.00201685, 0.00079746, 0.00101697, 0.0016967 , 0.00120293,
0.00122168, 0.00134546, 0.00070072, 0.00066494, 0.00061141,
0.00019479, 0.00038312, 0.00044424, 0.00024669, 0.00110931,
0.0009756 , 0.00053531, 0.00053962, 0.00040029, 0.00051366,
0.00076044, 0.00067284, 0.00087522, 0.00120881, 0.00188371,
0.00157436, 0.00189504, 0.00228295, 0.00254865, 0.00247892,
0.00319813, 0.00326988, 0.00322377, 0.00247677, 0.00266203,
0.00264398, 0.00297805, 0.00299417, 0.00303742, 0.00322153]), array([0.00303452, 0.00288404, 0.00301928, 0.00296962, 0.00284426,
0.00411515, 0.00390359, 0.00365686, 0.00367355, 0.00369274,
0.00313341, 0.00298767, 0.00289699, 0. , 0.00101894,
0.00078898, 0.00100278, 0.00069457, 0.00245455, 0.00201685,
0.00079746, 0.00101697, 0.0016967 , 0.00120293, 0.00122168,
0.00134546, 0.00070072, 0.00066494, 0.00061141, 0.00019479,
0.00038312, 0.00044424, 0.00024669, 0.00110931, 0.0009756 ,
0.00053531, 0.00053962, 0.00040029, 0.00051366, 0.00076044,
0.00067284, 0.00087522, 0.00120881, 0.00188371, 0.00157436,
0.00189504, 0.00228295, 0.00254865, 0.00247892, 0.00319813,
0.00326988, 0.00322377, 0.00247677, 0.00266203, 0.00264398,
0.00297805, 0.00299417, 0.00303742, 0.00322153, 0.00310851])]
[0.003108507179665692, 0.0026196096172032522]
5
6
7
2021-02-19 09:57:55.479350: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-19 09:57:55.479830: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-02-19 09:57:55.506019: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0e:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-02-19 09:57:55.506184: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-02-19 09:57:55.509645: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-02-19 09:57:55.509728: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-02-19 09:57:55.511906: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-02-19 09:57:55.512542: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-02-19 09:57:55.516144: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-02-19 09:57:55.517468: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-02-19 09:57:55.518014: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-02-19 09:57:55.518133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-19 09:57:55.518454: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-19 09:57:55.519487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:0e:00.0 name: GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.74GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-02-19 09:57:55.519678: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-02-19 09:57:55.519763: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-02-19 09:57:55.519836: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-02-19 09:57:55.519915: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-02-19 09:57:55.519995: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-02-19 09:57:55.520066: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-02-19 09:57:55.520138: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-02-19 09:57:55.520212: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-02-19 09:57:55.520304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-02-19 09:57:56.033929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-19 09:57:56.034021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-02-19 09:57:56.034068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-02-19 09:57:56.034278: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8444 MB memory) -> physical GPU (device: 0, name: GeForce RTX 3080, pci bus id: 0000:0e:00.0, compute capability: 8.6)
2021-02-19 09:57:56.035130: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
8
9
2021-02-19 09:57:56.610845: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-02-19 09:57:57.789418: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-02-19 09:57:58.358210: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-02-19 09:57:58.377765: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
Process finished with exit code -1073740791 (0xC0000409)
I'm using an RTX 3080 GPU with cuDNN 8.0, CUDA Toolkit 11.0 and tensorflow-gpu 2.4.0 on Python 3.7. Please let me know if anyone has any suggestions! I've tried everything I can think of. When I run code to check for the GPU, it shows up, so it should be working, but I just can't get it to train. I had it working a month ago before I messed something up, and now the output shows all of those timestamped lines for the packages it's opening. I'm not sure why; I just need help.
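Separately from the crash, the data preparation in the code above can be written more compactly with NumPy. The sketch below is an alternative to the two `for i in range(60, ...)` loops, assuming NumPy 1.20 or newer for `sliding_window_view`; `make_windows` and `rmse` are hypothetical helper names. RMSE squares each error before averaging, so the square belongs inside `np.mean`. This does not address the 0xC0000409 crash itself.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def make_windows(series, window=60):
    """Build LSTM inputs from a 1-D series.

    x[i] holds series[i : i + window] and y[i] is the value right after it,
    matching the loop `for i in range(window, len(series))` in the question.
    Shapes: x is (len(series) - window, window, 1), y is (len(series) - window,).
    """
    series = np.asarray(series, dtype=float)
    windows = sliding_window_view(series, window)  # (n - window + 1, window)
    x = windows[:-1, :, np.newaxis].copy()  # the last window has no target value
    y = series[window:].copy()
    return x, y


def rmse(predictions, targets):
    """Root mean squared error: square each error, average, then take the root."""
    predictions = np.asarray(predictions, dtype=float).ravel()
    targets = np.asarray(targets, dtype=float).ravel()
    return float(np.sqrt(np.mean((predictions - targets) ** 2)))


# Usage with the question's variables (not executed here):
# x_train, y_train = make_windows(train_data[:, 0], 60)
# print(rmse(predictions, y_test))
```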
Have you tried this link:
https://www.reddit.com/r/tensorflow/comments/jsalkw/rtx_3090_and_tensorflow_for_windows_10_step_by/
Or have you solved it already?
Here is the content of the link:
The NVIDIA 3000 Series GPUs (Ampere) require CUDA v11 and cuDNN v8 to work. The tensorflow versions on anaconda and pip on Windows (currently up to tensorflow 2.3) do not include a tensorflow build with CUDA v11. But you can use pip to install a nightly build of tensorflow (currently tensorflow 2.5), which is built with CUDA v11. Apart from a tensorflow build with CUDA v11, you will also need the actual DLLs for CUDA v11 and cuDNN v8. Normally, you would just install these with anaconda using the packages cudatoolkit and cudnn, but while cudatoolkit is available with v11, cudnn v8 is not available in anaconda, at least for Windows. The workaround is to get these DLLs manually and add them to the system environment path (so that python/tensorflow can find and load them). So let's start:
First, install anaconda if you haven't already. Open the anaconda prompt with admin rights.
Type conda create -n tf2 python=3.8 and hit enter to create a new anaconda environment with python 3.8 (the tensorflow nightly build needs python 3.8 or higher, that's why we are using python 3.8)
Type activate tf2 or conda activate tf2 and hit enter to enter that new environment.
Install the nightly tensorflow build with pip3 install tf-nightly-gpu
Install other packages that you might need. For me, it's conda install jupyter scikit-learn matplotlib pandas
Now, download CUDA v11 from NVIDIA (https://developer.nvidia.com/cuda-downloads or https://developer.nvidia.com/cuda-toolkit-archive ). Yeah, the file is pretty big, about 3 GB.
Additionally, apparently we also need Microsoft Visual Studio with C++ support for the installer to run properly. Download the free Visual Studio Community Edition (https://visualstudio.microsoft.com/downloads/ ) and install the C++ components. For this, select "Desktop development with C++", select the first 6 options and install. This step is taken from the guide I mentioned earlier, so refer to it if you have trouble with this. I already had Visual Studio with C++ set up on my computer, so I could skip this step.
Now, let's first execute the CUDA v11 installer. Execute it. You can do the express installation, but if you already have GeForce Experience installed, you can also choose the Custom option and deselect everything that you already have installed with a higher version. For me, I only needed the very first checkbox with the CUDA options, so that might be enough.
What the CUDA v11 installer basically did was install all the CUDA v11 DLLs, headers, and other files in the directory "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1" (the version may be different for you). What we will do next: add the cuDNN DLLs, headers, etc. to this directory as well, and then add this directory to the system path. OK, let's go.
Download cuDNN from NVIDIA (https://developer.nvidia.com/rdp/cudnn-download ). This file is around 700MB. You need to register as a developer and answer some questions, but don't worry, it's free. When asked for an email, you can type in any email, since in the next page, you will get an option to login using google or facebook as an alternative (which you may or may not prefer). Once you downloaded the file, extract it. Going into the directory, you will see three folders "bin", "include", "lib". Comparing it with the CUDA v11 directory (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1), you'll notice that these directories are present there as well! So just copy the folders from cuDNN to the CUDA v11 directory. Windows will add the files into the existing folders.
Now, let's add those directories to the system path. In Windows, open Start and search for "This PC". Right-click it and select "Properties" to open a window called "System". On the left side at the bottom, select "Advanced system settings". Click "Environment Variables..." at the bottom. In the lower half, under "System variables", find and open "Path". Click "New" to add a new directory to the system path, and repeat this for each of the following directories (as mentioned earlier, the version number may be different for you). Some of the directories may already be listed there, so feel free to skip them (duplicate entries have no negative effect, so don't worry too much):
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\extras\CUPTI\lib64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include
Now, very important: Restart your system!
Now, run your code to see if everything works. For me, it was through a jupyter notebook. A simple first check is to import tensorflow and list the physical devices:
import tensorflow as tf
tf.config.list_physical_devices()
Your GPU may not show up. Take a close look at the output of the console (for me, it was the anaconda prompt with which I started up my jupyter notebook). There, you should see logs like tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll or a similar log stating that a certain DLL could not be loaded! In my case, everything loaded except the DLL "cusolver64_10.dll". So, I went to the CUDA v11 directory (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1), opened the "bin" folder (the DLLs are in there) to check if that DLL was there. Nope, it was not. Instead, there was "cusolver64_11.dll". So what I did was just copy that DLL and renamed the copy to "cusolver64_10.dll". Yeah, sounds dumb, but after that, everything worked.
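The manual check in the last step (look for each DLL named in the log and, if it is missing, see which version of it is actually present) can be sketched in Python. This is a diagnostic sketch only: the DLL list is copied from the logs in this thread and should be replaced with whatever your own log reports, the search directories default to the entries on PATH (where the guide puts the CUDA directories), and `check_dlls` and `split_dll_name` are hypothetical helper names. It does not copy or rename anything; whether the guide's workaround of copying cusolver64_11.dll and renaming the copy to cusolver64_10.dll is appropriate for your setup is your call.

```python
import os
import re
from pathlib import Path

# DLL names copied from the TensorFlow logs in this thread; replace them with
# the names your own log reports as missing or loaded.
EXPECTED_DLLS = ["cudart64_110.dll", "cublas64_11.dll", "cusolver64_10.dll", "cudnn64_8.dll"]


def split_dll_name(name):
    """Split a versioned DLL name such as 'cusolver64_10.dll' into ('cusolver64', '10')."""
    match = re.fullmatch(r"(.+?)_(\d+)\.dll", name, flags=re.IGNORECASE)
    return (match.group(1), match.group(2)) if match else (name, "")


def check_dlls(expected, search_dirs):
    """Report, for each expected DLL, where it was found or which other versions exist.

    Returns {dll_name: {"found": path or None, "alternatives": [paths]}}.
    """
    report = {}
    for name in expected:
        base, _ = split_dll_name(name)
        found, alternatives = None, []
        for directory in search_dirs:
            folder = Path(directory)
            try:
                if not folder.is_dir():
                    continue
                if (folder / name).is_file():
                    found = str(folder / name)
                    break
                # Same library with a different version number, e.g. cusolver64_11.dll
                alternatives += sorted(str(p) for p in folder.glob(f"{base}_*.dll"))
            except OSError:
                continue  # unreadable PATH entry; skip it
        report[name] = {"found": found, "alternatives": alternatives}
    return report


if __name__ == "__main__":
    path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    for dll, info in check_dlls(EXPECTED_DLLS, path_dirs).items():
        if info["found"]:
            print(f"{dll}: found at {info['found']}")
        else:
            print(f"{dll}: missing; other versions on PATH: {info['alternatives'] or 'none'}")
```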

Object Detection Performance Issues Using Tensorflow 2.1.0 and Tensorflow Hub

While working through some of the object detection documentation and examples found online that use the OpenImages V4 model, I am getting poor detection speed. The code I am using is below; it is a stripped-down version of the detection so I can understand the performance metrics. The camera stream plays fine without detection, but once detection is added, it slows the feed down by roughly 20 seconds. I have seen this done in TF 1.14 with the old object detection API and tf.Graph() with near-zero delay on a different model, so my question is: where can I gain more performance for the feed stream, or where are the bottlenecks in this stripped-down version? The GPU is used for processing, but its utilization only spikes to about 6%. My original thought was to run the detection in a separate thread, but I am not sure how to go about that or whether it is necessary.
Software
Tensorflow version (2.1.0)
Cuda 10.1
cudnn 7
Hardware
CPU: Intel i7-4820K
GPU: Geforce GTX 1660 (6GB)
Memory: 16GB
import cv2
import time
import gc
from datetime import datetime
import tensorflow as tf
import tensorflow_hub as hub
low_res_vid_source = "http://192.168.1.85:14238/videostream.cgi?loginuse=####&loginpas=######"
hi_res_vid_source = "rtsp://####:#####192.168.1.85:10554/tcp/av0_0"
cap = cv2.VideoCapture(low_res_vid_source)
#Low Res (640): Hi Res (1280)
width = cap.get(3)
#Low Res (480): Hi Res (720)
height = cap.get(4)
print("Dimensions: Width: ", width, "Height: ", height)
#Remote Loading
#module_handle = "https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1"
#Local Loading
module_handle = "C://Users//Isaiah//tf2//Tutorial Sets//Expert//HubCache//ddd04e3eaa283f2b3ae566e084863074d12b403a"
detector = hub.load(module_handle).signatures['default']
def LoadStream():
    ret, frame = cap.read()
    if not ret:
        # No frame was received from the stream; skip this iteration
        return
    image_resize_val = (1280, 720)
    frame = cv2.resize(frame, image_resize_val)
    ## Average Calculation Time of Conversion Of Pixel Normalization = 0.018950 Seconds
    frame = frame / 255
    ## Average Calculation Time of Conversion Of Image Data Type = 0.001999 Seconds
    converted_img = tf.image.convert_image_dtype(frame, tf.float32)[tf.newaxis, ...]
    ## Average Calculation Time of Loading Results From Detector = 1.7 Seconds
    time_start = time.time()
    results = detector(converted_img)
    time_end = time.time()
    print("Detection Took: ", time_end - time_start)
    cv2.imshow('camera feed', frame)

while True:
    LoadStream()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
The output from the conda environment for this code is as follows; nothing in it really stands out:
(tf2-gpu) C:\Users\Isaiah\tf2\Tutorial Sets\Expert\Camera_Feed>python Camera_Feed_Raw.py
2020-05-03 16:52:36.567941: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Dimensions: Width: 640.0 Height: 360.0
2020-05-03 16:54:52.037826: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-05-03 16:54:52.253465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1660 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s
2020-05-03 16:54:52.260714: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-03 16:54:52.272442: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-03 16:54:52.282134: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-03 16:54:52.287729: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-03 16:54:52.300130: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-03 16:54:52.307647: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-03 16:54:52.326362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-03 16:54:52.331006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-03 16:54:52.334046: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
2020-05-03 16:54:52.626783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:03:00.0 name: GeForce GTX 1660 computeCapability: 7.5
coreClock: 1.815GHz coreCount: 22 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 178.86GiB/s
2020-05-03 16:54:52.633826: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-05-03 16:54:52.638740: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-03 16:54:52.642777: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-05-03 16:54:52.647763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-05-03 16:54:52.651710: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-05-03 16:54:52.656789: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-05-03 16:54:52.660852: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-03 16:54:52.667018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-05-03 16:54:53.626966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-03 16:54:53.630823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-05-03 16:54:53.633295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-05-03 16:54:53.638096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4630 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660, pci bus id: 0000:03:00.0, compute capability: 7.5)
2020-05-03 16:57:25.429470: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-05-03 16:57:26.697611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-05-03 16:57:29.627538: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Internal: Invoking GPU asm compilation is supported on Cuda non-Windows platforms only
Relying on driver to perform ptx compilation. This message will be only logged once.
Detection Took: 58.80091857910156
Detection Took: 1.747373104095459
Detection Took: 1.7253808975219727
Detection Took: 1.736377477645874
Detection Took: 1.7273805141448975
Detection Took: 1.7343783378601074
Detection Took: 1.742375373840332
Detection Took: 1.7413759231567383
Detection Took: 1.7293803691864014
Detection Took: 1.7283804416656494
Detection Took: 1.7403762340545654
Detection Took: 1.7323787212371826
Detection Took: 1.7373778820037842
Detection Took: 1.7323782444000244
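The output shows the first call taking about 59 seconds (most likely one-time warm-up) and every later call about 1.7 seconds, so the detector itself is the bottleneck, not the camera. Moving frame capture onto a background thread, as the question suggests, keeps the displayed stream from lagging behind, because the detector always receives the newest frame instead of a backlog; it will not make a single detection faster. A minimal sketch using only the standard library; `LatestFrameReader` is a hypothetical name, and `read_frame` is assumed to behave like `cv2.VideoCapture.read`, returning `(ok, frame)`.

```python
import queue
import threading


class LatestFrameReader:
    """Read frames on a background thread and keep only the newest one.

    `read_frame` is any callable returning (ok, frame), for example the
    `read` method of a cv2.VideoCapture. Older frames are discarded, so a slow
    consumer such as the detector always works on a recent frame.
    """

    def __init__(self, read_frame):
        self._read_frame = read_frame
        self._frames = queue.Queue(maxsize=1)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        while not self._stop.is_set():
            ok, frame = self._read_frame()
            if not ok:
                break  # stream ended or failed; stop producing frames
            try:
                self._frames.get_nowait()  # drop the stale frame, if any
            except queue.Empty:
                pass
            self._frames.put(frame)  # cannot block: this thread is the only producer

    def read(self, timeout=1.0):
        """Return the newest frame, or None if none arrives within `timeout` seconds."""
        try:
            return self._frames.get(timeout=timeout)
        except queue.Empty:
            return None

    def stop(self):
        self._stop.set()
        self._thread.join(timeout=1.0)


# Usage with the question's objects (not executed here):
# reader = LatestFrameReader(cap.read).start()
# while True:
#     frame = reader.read()
#     if frame is not None:
#         ...  # resize, normalise, run detector(converted_img), cv2.imshow(...)
#     if cv2.waitKey(1) & 0xFF == ord('q'):
#         break
# reader.stop()
```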

While importing tensorflow, I got an "error" saying the following:

2019-11-07 00:41:30.414603: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Is this an error, or is it normal? I saw "Successfully", so I'm thinking it's good, but is it?
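As explained in the answer to the cudart64_110.dll question above, the letter right after the timestamp is the severity: I is informational and W is a warning. The line above is an I message reporting that the CUDA runtime DLL was loaded. A small sketch that splits such lines into their parts, so that only W, E (error) or F (fatal) lines need attention; the regular expression is an assumption based on the log lines shown in this thread, and `parse_tf_log_line` is a hypothetical name.

```python
import re

# TensorFlow's C++ log lines look like:
#   2019-11-07 00:41:30.414603: I tensorflow/.../dso_loader.cc:44] message
# The letter after the timestamp is the severity.
SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<level>[IWEF]) (?P<source>\S+?):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_tf_log_line(line):
    """Return a dict describing a TensorFlow C++ log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["severity"] = SEVERITY[info.pop("level")]
    info["line"] = int(info["line"])
    return info
```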