I've installed tensorflow in a remote machine, then connected it via remote VSCode.
I've also installed tensorflow following the installation guide that I have the conda environment with the name atnwf in the remote machine.
I also learned that jupyter notebook does not keep environment variables so that I set the proper LD_LIBRARY_PATH using %env in my notebook.
%env LD_LIBRARY_PATH=/home/grad/tomandjerry/anaconda3/envs/atnwf/lib/
However still the tensorflow warning message says
2022-12-21 13:46:29.064619: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/grad/tomandjerry/anaconda3/envs/atnwf/lib/
I've already checked that the library was installed in the corresponding path. Moreover, I verified the basic installation.
(atnwf) [lib]$ python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2022-12-21 14:02:34.992430: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-21 14:02:36.382350: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/lib/:/home/grad/tomandjerry/anaconda3/envs/atnwf/lib/:/home/grad/tomandjerry/anaconda3/envs/atnwf/lib/:/home/grad/tomandjerry/anaconda3/envs/atnwf/lib/
2022-12-21 14:02:36.382611: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/lib/:/home/grad/tomandjerry/anaconda3/envs/atnwf/lib/:/home/grad/tomandjerry/anaconda3/envs/atnwf/lib/:/home/grad/tomandjerry/anaconda3/envs/atnwf/lib/
2022-12-21 14:02:36.382633: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]
How can I resolve the "not found" issue correctly?
I tried re-installing all the packages in the conda environment by resetting a new environment with another name. I've also tried the method editing kernelspec stated in How to set env variable in Jupyter notebook.
Related
I know that this question has been asked a lot, but none of the suggestions seem to work, probably since my setup is somewhat different:
Ubuntu 22.04
python 3.10.8
tensorflow 2.11.0
cudatoolkit 11.2.2
cudnn 8.1.0.77
nvidia-tensorrt 8.4.3.1
nvidia-pyindex 1.0.9
Having created a conda environment 'tf', in the directory home/dan/anaconda3/envs/tf/lib/python3.10/site-packages/tensorrt I have
libnvinfer_builder_resource.so.8.4.3
libnvinfer_plugin.so.8
libnvinfer.so.8
libnvonnxparser.so.8
libnvparsers.so.8
tensorrt.so
When running python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))" I get
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7';
dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory;
LD_LIBRARY_PATH: :/home/dan/anaconda3/envs/tf/lib
tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7';
dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory;
LD_LIBRARY_PATH: :/home/dan/anaconda3/envs/tf/lib
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
I'm guessing I should downgrade nvidia-tensorrt, but nothing I've tried seems to work, any advice would be much appreciated.
Solution: follow the steps listed here https://github.com/tensorflow/tensorflow/issues/57679#issuecomment-1249197802.
Add the following to ~/.bashrc (for the conda envs as described in my scenario):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/dan/anaconda3/lib/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/dan/anaconda3/lib/python3.8/site-packages/tensorrt/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/dan/anaconda3/envs/tf/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/dan/anaconda3/envs/tf/lib/python3.8/site-packages/tensorrt/
For me the setting a symbolic link from libnvinfer version 7 to 8 worked:
# the follwoing path will be different for you - depending on your install method
$ cd env/lib/python3.10/site-packages/tensorrt
# create symbolic links
$ ln -s libnvinfer_plugin.so.8 libnvinfer_plugin.so.7
$ ln -s linvinfer.so.8 libnvinfer.so.7
# add tensorrt to library path
$ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/env/lib/python3.10/site-packages/tensorrt/
I installed TensorFlow as described on this page in WSL.
The final test after installing is succesfull and the GPU is detected by TensorFlow:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2022-12-14 15:40:40.633322: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-14 15:40:41.288334: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/miepi/miniconda3/envs/tf/lib/
2022-12-14 15:40:41.288439: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/miepi/miniconda3/envs/tf/lib/
2022-12-14 15:40:41.288485: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2022-12-14 15:40:41.778389: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-12-14 15:40:41.786330: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-12-14 15:40:41.786752: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
But using the same conda environment in vscode the gpu isn't detected:
import tensorflow as tf
Output:
2022-12-14 15:52:40.605584: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-12-14 15:52:40.702419: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-12-14 15:52:40.702451: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-12-14 15:52:41.280631: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-12-14 15:52:41.280716: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-12-14 15:52:41.280726: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
# Check if GPU is available
print(tf.config.list_physical_devices('GPU'))
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Output:
[]
Num GPUs Available: 0
2022-12-14 15:52:43.871361: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:967] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-12-14 15:52:43.871484: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871536: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871577: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871617: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871656: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871694: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871734: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871772: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-12-14 15:52:43.871781: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Does anybody have an idea, what I could try to fix this issue? Or what I could have done wrong?
I already checked if the correct kernel is selected in vscode.
And the correct interpreter is set for the workspace as well.
Jupyter is installed in the environment
Could it be possible that this step from the tutorial is not working for the jupyter kernel in vscode?
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
But I don't know how to set that up in vscode.
This question already has answers here:
Keras: "RuntimeError: Failed to import pydot." after installing graphviz and pydot
(13 answers)
Closed 7 months ago.
2022-07-19 18:41:58.081967: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-19 18:41:58.082145: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-07-19 18:42:01.316963: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-19 18:42:01.317546: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2022-07-19 18:42:01.318073: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2022-07-19 18:42:01.318783: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2022-07-19 18:42:01.319328: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2022-07-19 18:42:01.319857: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusolver64_11.dll'; dlerror: cusolver64_11.dll not found
2022-07-19 18:42:01.320492: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2022-07-19 18:42:01.321261: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2022-07-19 18:42:01.321404: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2022-07-19 18:42:01.321943: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
--------- Training ---------
Weight for class fire : 0.84
Weight for class No_fire : 1.24
Found 8617 files belonging to 2 classes.
Using 6894 files for training.
Found 8617 files belonging to 2 classes.
Using 1723 files for validation.
Traceback (most recent call last):
File "C:\Users\Lenovo\Desktop\Hobiler\PythonDers\Programlar\SaDe\main.py", line 80, in <module>
train_keras()
File "C:\Users\Lenovo\Desktop\Hobiler\PythonDers\Programlar\SaDe\training.py", line 114, in train_keras
keras.utils.plot_model(model, show_shapes=True)
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\utils\vis_utils.py", line 421, in plot_model
dot = model_to_dot(
File "C:\Users\Lenovo\AppData\Local\Programs\Python\Python39\lib\site-packages\keras\utils\vis_utils.py", line 163, in model_to_dot
raise ImportError(message)
ImportError: You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model/model_to_dot to work.
Process finished with exit code 1
I tried using
pip install pydot
pip install pydotplus
pip install graphviz
commands. But still got this error.I'm new to machine learning and I would appreciate it if you could help me a bit.
Here is the GitHub of the code.
Unfortunately, there are two software systems named Graphviz. You need both.
Go here https://graphviz.gitlab.io/download/ and download the appropriate version (pip won't solve your problem)
I have this code to disable GPU usage:
import numpy as np
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf
w = tf.Variable(
[
[1.],
[2.]
])
I get this output still, not sure why :
E:\MyTFProject\venv\Scripts\python.exe E:/MyTFProject/tfvariable.py
2021-11-03 14:09:16.971644: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-11-03 14:09:16.971644: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-11-03 14:09:19.563793: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-11-03 14:09:19.566793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: newtonpc
2021-11-03 14:09:19.567793: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: mypc
2021-11-03 14:09:19.567793: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
TF Version: '2.6.1'
Not able to stop it from loading Cuda DLLs. I dont want to setup cuda just right now. Maybe later.
I am using the latest PyCharm and installed tensorflow as given in the site with pip.
You can try to reinstall tensorflow with CPU-only version. The links are available here depending on your OS and your python version:
https://www.tensorflow.org/install/pip?hl=fr#windows_1
I got the following error when trying to train my tensorflow model on sagemaker ml.p2.xlarge instance. I use tensorflow==2.3.0. I wonder whether this is because of the tensorflow version incompatibility with cuda. sagemaker ml.p2.xlarge seems to use cuda 10.0
GPU error:
2020-08-31 08:46:46.429756: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-08-31 08:47:02.170819: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/openmpi/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-08-31 08:47:02.764874: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
This question is probably old, but it falls back on an open issue found at the beginning of choosing which versions of frameworks to use.
The problem does not depend on the type of instance that you specified (which has NVidia GPU).
From the official documentation "Available Deep Learning Containers Images", to date 20/10/2022, precompiled versions higher than 2.2 do not seem to be usable:
Framework
Job Type
Horovod Options
CPU/GPU
Python Version Options
Example URL
TensorFlow 2.2 (Cuda 10.2)
training
Yes
GPU
3.7 (py37)
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.2.0-gpu-py37-cu102-ubuntu18.04
TensorFlow 2.2
inference
No
GPU
3.7 (py37)
763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.2.0-gpu-py37-cu102-ubuntu18.04
Within the dockerfile that is used to use the container is the instruction to install the libraries that your custom version is missing:
RUN apt-get update && apt-get install -y --no-install-recommends --allow-unauthenticated \
python3-dev \
python3-pip \
python3-setuptools \
ca-certificates \
cuda-command-line-tools-10-1 \
cuda-cudart-dev-10-1 \
cuda-cufft-dev-10-1 \
cuda-curand-dev-10-1 \
cuda-cusolver-dev-10-1 \
cuda-cusparse-dev-10-1 \
curl \
libcudnn7=7.6.2.24-1+cuda10.1 \
# TensorFlow doesn't require libnccl anymore but Open MPI still depends on it
libnccl2=2.4.7-1+cuda10.1 \
libgomp1 \
libnccl-dev=2.4.7-1+cuda10.1 \
....
Then you can install the required libraries from your custom version directly with a requirements.txt file or run the install command directly in the training script.
If there are no special project requirements, I recommend using the precompiled versions of sagemaker. Otherwise, build a docker image from scratch instead of installing libraries this way..