I am new to deep learning. I have an A100 GPU with CUDA 11.6 installed. Using Conda, I installed tensorflow 1.15, tensorflow-gpu 1.15, cudatoolkit 10.0, and Python 3.7, but the code I am trying to run from GitHub comes with the note below, and it produces errors that I find difficult to interpret; I cannot tell where I went wrong. The error is displayed as:
failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
2022-06-30 09:37:12.049400: I tensorflow/stream_executor/stream.cc:4925] [stream=0x55d668879990,impl=0x55d668878ac0] did not memcpy device-to-host; source: 0x7f2fe2d0d400
2022-06-30 09:37:12.056385: W tensorflow/core/framework/op_kernel.cc:1651] OP_REQUIRES failed at iterator_ops.cc:867 : Cancelled: Operation was cancelled
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25
[[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]]
(1) Internal: Blas GEMM launch failed : a.shape=(25, 25), b.shape=(25, 102400), m=25, n=102400, k=25
[[{{node Hyperprior/HyperAnalysis/layer_Hyperprior_1/MatMul}}]]
[[Hyperprior/truediv_3/_3633]]
NOTE: At the moment, we only support CUDA 10.0, Python 3.6-3.7, TensorFlow 1.15, and Tensorflow Compression 1.3. TensorFlow must be installed via pip, not conda. Unfortunately, newer versions of Tensorflow or Python will not work due to various constraints in the dependencies and in the TF binary API.
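As a quick sanity check, the failing matmul shapes can be reproduced in isolation; this is a minimal sketch, assuming TF 1.15 in graph mode as the repository requires, not part of the original post. If it also fails with CUBLAS_STATUS_EXECUTION_FAILED, the problem lies in the CUDA/cuBLAS installation rather than in the repository code.

# Minimal diagnostic sketch for the failing GEMM (TF 1.x graph mode).
import numpy as np
import tensorflow as tf

a = tf.constant(np.random.rand(25, 25), dtype=tf.float32)
b = tf.constant(np.random.rand(25, 102400), dtype=tf.float32)
c = tf.matmul(a, b)  # same shapes as the failing Hyperprior MatMul node

with tf.Session() as sess:
    print(sess.run(c).shape)  # expected: (25, 102400)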
Related
I'm trying to run the tutorial code from Kaggle on my computer. However, the kernel crashes during model training, at history = ConvNeXt_model.fit().
Here is the Jupyter notebook log:
warn 16:45:23.988: StdErr from Kernel Process 2023-02-13 16:45:23.989108: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neu
warn 16:45:23.988: StdErr from Kernel Process ral Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
warn 16:45:24.253: StdErr from Kernel Process 2023-02-13 16:45:24.253410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/repli
warn 16:45:24.253: StdErr from Kernel Process ca:0/task:0/device:GPU:0 with 21348 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
warn 16:45:44.398: StdErr from Kernel Process 2023-02-13 16:45:44.398973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100
warn 16:45:44.798: StdErr from Kernel Process 2023-02-13 16:45:44.799017: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: ptxas exited with non-zero error code -1, output:
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logge
warn 16:45:44.799: StdErr from Kernel Process d once.
warn 16:45:45.140: StdErr from Kernel Process 2023-02-13 16:45:45.141061: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x1e2b8a88750 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-02-13 16:45:45.141144: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0
warn 16:45:45.141: StdErr from Kernel Process ): NVIDIA GeForce RTX 4090, Compute Capability 8.9
warn 16:45:45.191: StdErr from Kernel Process 2023-02-13 16:45:45.191262: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient
warn 16:45:45.191: StdErr from Kernel Process filesystem space is provided.
error 16:45:45.530: Disposing session as kernel process died ExitCode: 3221226505, Reason: c:\Users\User\anaconda3\envs\tf\lib\site-packages\traitlets\traitlets.py:2548: FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.
warn(
c:\Users\User\anaconda3\envs\tf\lib\site-packages\traitlets\traitlets.py:2499: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use '00cfbd3c-ac34-43be-a838-9653221d1a82' instead of 'b"00cfbd3c-ac34-43be-a838-9653221d1a82"'.
warn(
2023-02-13 16:45:23.989108: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-13 16:45:24.253410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21348 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:01:00.0, compute capability: 8.9
2023-02-13 16:45:44.398973: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8100
2023-02-13 16:45:44.799017: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] INTERNAL: ptxas exited with non-zero error code -1, output:
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2023-02-13 16:45:45.141061: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x1e2b8a88750 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-02-13 16:45:45.141144: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): NVIDIA GeForce RTX 4090, Compute Capability 8.9
2023-02-13 16:45:45.191262: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:453] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: ptxas exited with non-zero error code -1, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
info 16:45:45.530: Dispose Kernel process 24268.
error 16:45:45.530: Raw kernel process exited code: 3221226505
error 16:45:45.531: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:33213)
at c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:52265
at Map.forEach (<anonymous>)
at y._clearKernelState (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:52250)
at y.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:45732)
at c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:17:139244
at Z (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:2:1608939)
at Kp.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:17:139221)
at qp.dispose (c:\Users\User\.vscode\extensions\ms-toolsai.jupyter-2023.1.2010391206\out\extension.node.js:17:146518)
at process.processTicksAndRejections (node:internal/process/task_queues:96:5)]
warn 16:45:45.531: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
It is strange that I can successfully train other models (such as ResNet or EfficientNet) on the GPU, but training fails only with the ConvNeXt model, even though I followed the instructions to install TensorFlow.
I suspect the error happens in the XLA implementation, but I do not know how to fix it (see the diagnostic sketch after the device information below).
All the code is run in VS Code on Windows 10.
Device information:
Nvidia Driver 527.56
CUDA 11.2
cuDNN 8.1.0
Python 3.9.10
TensorFlow 2.10.1
GPU Nvidia RTX 4090
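Since XLA is suspected above, one minimal diagnostic sketch (standard TF 2.x knobs, not a confirmed fix) is to disable XLA JIT compilation before building the model and see whether the ptxas crash disappears:

# Hedged diagnostic sketch: turn off XLA JIT before importing/building the model,
# then retry ConvNeXt_model.fit().
import os
os.environ["TF_XLA_FLAGS"] = "--tf_xla_auto_jit=0"  # disable XLA auto-clustering

import tensorflow as tf
tf.config.optimizer.set_jit(False)  # disable XLA JIT in the graph optimizer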
I have this code to disable GPU usage:
import numpy as np
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf

w = tf.Variable(
    [
        [1.],
        [2.]
    ])
I still get this output, and I'm not sure why:
E:\MyTFProject\venv\Scripts\python.exe E:/MyTFProject/tfvariable.py
2021-11-03 14:09:16.971644: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-11-03 14:09:16.971644: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-03 14:09:19.563793: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-03 14:09:19.566793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: newtonpc
2021-11-03 14:09:19.567793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: mypc
2021-11-03 14:09:19.567793: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
TF Version: '2.6.1'
I'm not able to stop it from loading the CUDA DLLs. I don't want to set up CUDA right now; maybe later.
I am using the latest PyCharm and installed TensorFlow with pip, as described on the TensorFlow site.
You can try reinstalling TensorFlow with the CPU-only package. The links are available here, depending on your OS and Python version:
https://www.tensorflow.org/install/pip?hl=fr#windows_1
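If reinstalling is not convenient right away, a hedged alternative sketch (standard TensorFlow 2.x settings, not part of the linked instructions) keeps the GPU build but hides the GPU and silences the informational CUDA messages. Both environment variables must be set before importing TensorFlow:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs from TensorFlow
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"   # suppress INFO and WARNING messages

import tensorflow as tf
tf.config.set_visible_devices([], "GPU")   # also hide GPUs at the TF level

print("TF Version:", tf.__version__)
print(tf.config.get_visible_devices())     # should list only CPU devices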
Excuse the newbie question. I just got TensorFlow, both GPU and CPU, installed on Ubuntu 20.04 LTS on WSL2 with CUDA. My GPU is a GeForce 940MX. When I test the TensorFlow installation, the first attempt produces the warnings shown below, yet the exact same statement run again afterwards goes through smoothly. I wonder what happened here, and is it actually using the CPU or the GPU?
>>> import tensorflow as tf
2021-01-22 12:01:14.284464: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
2021-01-22 12:01:20.034653: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-22 12:01:20.035316: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2021-01-22 12:01:20.035400: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2021-01-22 12:01:20.035518: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-3E6PSHT): /proc/driver/nvidia/version does not exist
2021-01-22 12:01:20.035984: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-01-22 12:01:20.036526: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
tf.Tensor(-872.4863, shape=(), dtype=float32)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(590.53516, shape=(), dtype=float32)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(294.59973, shape=(), dtype=float32)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(261.34412, shape=(), dtype=float32)
Anything helps!
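A quick way to answer the "CPU or GPU?" part directly is a minimal sketch using standard TF 2.x calls:

import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))  # empty list => running on CPU only
tf.debugging.set_log_device_placement(True)    # log the device chosen for each op
print(tf.reduce_sum(tf.random.normal([1000, 1000])))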
I am trying to run TensorFlow with CPU support.
TensorFlow version: 1.14.0
Keras version: 2.3.1
When I try to run the following piece of code:
from keras.preprocessing.image import ImageDataGenerator  # Keras 2.3.1

def run_test_harness(trainX, trainY, testX, testY):
    # scale pixel values to [0, 1] and wrap the arrays in generators
    datagen = ImageDataGenerator(rescale=1.0/255.0)
    train_it = datagen.flow(trainX, trainY, batch_size=1)
    test_it = datagen.flow(testX, testY, batch_size=1)
    model = define_model()  # defined elsewhere in the script
    history = model.fit_generator(train_it, steps_per_epoch=len(train_it),
                                  validation_data=test_it, validation_steps=len(test_it),
                                  epochs=1, verbose=0)
I get the following error, as shown in the image:
Image shows the error
I tried configuring Bazel for this, but it was of no use. It would be helpful if someone could point me to resources or otherwise help with the problem. Thank you.
EDIT (warning messages):
WARNING:tensorflow:From /home/neha/valiance/kerascpu/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:4070: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
WARNING:tensorflow:From /home/neha/valiance/kerascpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-10-22 12:41:36.023849: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-22 12:41:36.326420: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299965000 Hz
2020-10-22 12:41:36.327496: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5502350 executing computations on platform Host. Devices:
2020-10-22 12:41:36.327602: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-10-22 12:41:36.679930: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2020-10-22 12:41:36.890241: W tensorflow/core/framework/allocator.cc:107] Allocation of 3406823424 exceeds 10% of system memory.
^Z
[1]+ Stopped python3 model.py
You should try running your code on Google Colab. I think there aren't enough resources available on your PC for the task you are trying to run, even though you are using a batch_size of 1.
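If the arrays really are float64 (a guess based on the ~3.4 GB allocation, not something stated in the question), a small hedged sketch that halves that footprint before calling the harness is to cast them to float32, which is Keras's default dtype anyway:

import numpy as np

# trainX/testX are the arrays passed to run_test_harness() in the question
trainX = trainX.astype(np.float32)
testX = testX.astype(np.float32)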
I am trying to understand and debug my code. I try to predict with a CNN model developed under TF 2.0 / tf.keras on a GPU, but I get the error messages below.
Could someone help me fix it?
Here is my environment configuration:
Python 3.6.8
tensorflow-gpu 2.0.0-rc0
NVIDIA driver 418.x
CUDA 10.0
cuDNN 7.6+
And the log file:
2019-09-28 13:10:59.833892: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-09-28 13:11:00.228025: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-09-28 13:11:00.957534: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963310: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 13:11:00.963416: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node mobilenetv2_1.00_192/Conv1/Conv2D}}]]
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
=====>GPU Available: True
=====> 4 Physical GPUs, 1 Logical GPUs
mobilenetv2_1.00_192/block_15_expand_BN/cond/then/_630/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_depthwise_BN/cond/then/_644/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_15_project_BN/cond/then/_658/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_expand_BN/cond/then/_672/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_depthwise_BN/cond/then/_686/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/block_16_project_BN/cond/then/_700/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const: (Const): /job:localhost/replica:0/task:0/device:GPU:0
mobilenetv2_1.00_192/Conv_1_bn/cond/then/_714/Const_1: (Const): /job:localhost/replica:0/task:0/device:GPU:0
Traceback (most recent call last):
File "NSFW_Server.py", line 162, in <module>
model.predict(initial_tensor)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 915, in predict
use_multiprocessing=use_multiprocessing)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 722, in predict
callbacks=callbacks)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 393, in model_iteration
batch_outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/backend.py", line 3625, in __call__
outputs = self._graph_fn(*converted_inputs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1081, in __call__
return self._call_impl(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node mobilenetv2_1.00_192/Conv1/Conv2D (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]] [Op:__inference_keras_scratch_graph_10727]
Function call stack:
keras_scratch_graph
The code:
import numpy as np
import tensorflow as tf

if __name__ == "__main__":
    print("=====>GPU Available: ", tf.test.is_gpu_available())
    tf.debugging.set_log_device_placement(True)

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
            tf.config.experimental.set_memory_growth(gpus[0], True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print("=====>", len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)

    paras_path = "./paras/{}".format(int(2011))
    model = tf.keras.experimental.load_from_saved_model(paras_path)
    initial_tensor = np.zeros((1, INPUT_SHAPE, INPUT_SHAPE, 3))  # INPUT_SHAPE defined elsewhere
    model.predict(initial_tensor)
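One hedged side observation on the snippet above (not a confirmed fix): tf.test.is_gpu_available() already initializes the GPU, so the later set_memory_growth() call cannot take effect any more; the except branch even anticipates that RuntimeError. A sketch that configures growth before anything touches the GPU:

import tensorflow as tf

# Configure memory growth before any op or availability check initializes the GPU
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

print("=====>GPU Available:", bool(gpus))  # query without forcing initialization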
You have to check that you have the right versions of CUDA + cuDNN + TensorFlow (and make sure that all of them are actually installed).
A couple of example working configurations are listed below (updated for recent versions of TensorFlow):
Cuda 11.3.1 + CuDNN 8.2.1.32 + TensorFlow 2.7.0
Cuda 11.0 + CuDNN 8.0.4 + TensorFlow 2.4.0
Cuda 10.1 + CuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.2.0/2.3.0 (TF >= 2.1 requires CUDA >= 10.1)
Cuda 10.1 + CuDNN 7.6.5 (normally > 7.6) + TensorFlow 2.1.0 (TF >= 2.1 requires CUDA >= 10.1)
Cuda 10.0 + CuDNN 7.6.3 + TensorFlow 1.13/1.14/2.0
Cuda 9.0 + CuDNN 7.0.5 + TensorFlow 1.10
Usually this error appears when you have an incompatible version of TensorFlow/CuDNN installed. In my case, this appeared when I tried using an older TensorFlow with a newer version of CuDNN.
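A hedged way to check which CUDA and cuDNN versions a given TensorFlow wheel was actually built against (tf.sysconfig.get_build_info() is available in TF 2.3 and later):

import tensorflow as tf

info = tf.sysconfig.get_build_info()
print("TensorFlow:", tf.version.VERSION)
print("Built for CUDA:", info.get("cuda_version"))
print("Built for cuDNN:", info.get("cudnn_version"))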
If for some reason you get an error message like the following (and nothing happens afterwards):
Relying on the driver to perform ptx compilation
Solution: install the latest NVIDIA driver.
[Seems to be solved in TF >= 2.5.0] (see below):
Only for Windows users: some recent combinations of CUDA, cuDNN and TF may not work due to a bug (a .dll named improperly). To handle that specific case, please consult this link: Tensorflow GPU Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
For those facing the above error on the Windows platform, I sorted it out simply by installing the cuDNN version compatible with the CUDA toolkit already installed on the system.
The suitable version can be downloaded from the cuDNN page on NVIDIA's developer portal. You might need an NVIDIA account for it, which is easily created by providing an email address and filling in a questionnaire.
To check the CUDA version, run nvcc --version.
Once the suitable version is downloaded, extract the folder from the zip file.
Go to the bin folder of the extracted archive, copy cudnn64_7.dll, and paste it into CUDA's bin folder. In my case, CUDA is installed at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin.
This would most probably solve the problem.
My system details:
Windows 10
CUDA 10.0
TensorFlow 2.0
GPU- Nvidia GTX 1060
I also found the blog post Installing TensorFlow with CUDA and GPU support on Windows 10 very useful.
Check the instructions on the TensorFlow GPU installation page for your OS. It resolved the issue for me on Ubuntu 16.04.6 LTS with TensorFlow 2.0.