I'm building a project using TensorFlow Object Detection.
On Windows I ran into an error, possibly because of the GPU module version, so I'm now trying again on Linux.
From some posts I've read, it seems the GPU version was not actually being used. Is that right?
I have an AMD graphics card. When I try to import tensorflow, I get the following error.
As I understand it, switching to plain TF instead of TF-GPU might resolve this issue. However, I do want to use the graphics card, because my reinforcement learning algorithms are very slow without it. Is there any workaround?
ImportError: Could not find 'nvcuda.dll'. TensorFlow requires that
this DLL be installed in a directory that is named in your %PATH%
environment variable. Typically it is installed in
'C:\Windows\System32'. If it is not present, ensure that you have a
CUDA-capable GPU with the correct driver installed.
With AMD, there are currently two ways you can go about it.
Either use AMD's latest ROCm stack to install TensorFlow:
official ROCm install
and
official ROCm tensorflow install
Check whether your AMD GPU is supported over here.
Or use an OpenCL implementation of TensorFlow if your video card does not support ROCm:
https://github.com/benoitsteiner/tensorflow-opencl
or
https://github.com/hughperkins/tf-coriander
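Whichever route you take, a quick sanity check (a minimal sketch, assuming the install succeeded) is to ask TensorFlow which devices it registered; a working ROCm or OpenCL build should list a GPU entry alongside the CPU:

from tensorflow.python.client import device_lib

# Prints every device TensorFlow managed to register; look for a GPU entry.
print(device_lib.list_local_devices())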
I started a new machine learning project.
According to this document (https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras),
TF with TensorBoard appears to support GPU profiling. So I ran the same code in my Jupyter Notebook to test it.
The sample code generates a profiling result. However, there is no GPU tracing information in the resulting file (only CPU).
This is my main problem.
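For reference, the profiling setup from that tutorial boils down to the Keras TensorBoard callback with profile_batch set; this is a minimal sketch with a toy model and made-up data and log directory, not the tutorial's exact code:

import numpy as np
import tensorflow as tf

# Toy model and random data, just enough to trigger a trace.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
x = np.random.rand(256, 20).astype('float32')
y = np.random.randint(0, 10, size=(256,))

# profile_batch selects which batch the profiler traces.
tb = tf.keras.callbacks.TensorBoard(log_dir='logs/profile', profile_batch=3)
model.fit(x, y, batch_size=32, epochs=2, callbacks=[tb])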
I am using two RTX 2080 Ti graphics cards, and they were working while the code ran.
The sample code does not use MirroredStrategy, so I could see that one of them was active.
At first I thought TensorBoard was the problem, but I soon realized that TF itself was not generating the GPU tracing information.
The image above shows the resulting file (local.trace). There was no GPU data in it.
This is my system specification:
OS ubuntu 18.04
jupyter-client 5.3.4
jupyter-core 4.6.1
jupyter-tensorboard 0.1.10
tensorflow-gpu 2.0.0
tensorflow-estimator 2.0.1
tensorflow-metadata 0.15.1
tensorboard 2.0.2
NVIDIA driver 410.104
CUDA 10.0
anaconda 4.7.12 (with python 3.6)
It may be irrelevant, but there was also a warning message, shown in the image below.
I have tested this on another PC and got the same result. Could it be that GPU profiling is only supported on Google Colab? (I am still confused.) I have searched Google for a fix but still could not find an answer.
Is there anyone who uses GPU profiling on their own system instead of Google Colab?
Any advice would be appreciated.
I figured out what caused the problem.
It was related to CUPTI (the CUDA Profiling Tools Interface).
In contrast to the Jupyter Notebook, there was a warning message when the code was run in the Ubuntu shell:
CUPTI error: CUPTI could not be loaded or symbol could not be found.
TF could not find the CUPTI libraries; this is the root cause of the problem.
After adding their path to LD_LIBRARY_PATH as described in the link below, the problem was fixed!
https://stackoverflow.com/a/58752904/5553618
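For reference, the fix amounts to exporting the CUPTI directory before launching Python; this is a sketch assuming the default CUDA install location, so adjust the path to your system:

export LD_LIBRARY_PATH=/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH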
Is it possible to run Tensorboard on a machine without CUDA support?
I'm working at a computation center (via ssh) which has two major clusters:
CPU-Cluster which is a general workhorse without CUDA support (no dedicated GPU)
GPU-Cluster with dedicated GPUs e.g. for running neural networks with tensorflow-gpu.
Access to the GPU cluster is limited to training jobs and the like, so I can't afford to run Tensorboard on a machine with CUDA support. Instead, I'd like to run Tensorboard on the CPU cluster.
With the Tensorboard bundled with TF, I get import errors due to missing CUDA support.
It seems reasonable that the official Tensorboard should have a CPU-only mode. Is this true?
I've also found an unofficial standalone Tensorboard (github.com/dmlc/tensorboard); does this work without CUDA support?
Solved my problem: just install tensorflow instead of tensorflow-gpu.
It didn't work for me at first because my conda virtual environment hadn't properly removed tensorflow-gpu.
Tensorboard is not limited by whether a machine has a GPU or not.
As far as I know, all Tensorboard does is parse event pb files and display them on the web. There is no computation involved, so it doesn't need a GPU.
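As a minimal sketch of that setup on the CPU cluster (the log directory is a placeholder): install the CPU-only package and point Tensorboard at the directory containing the event files:

pip install tensorflow
tensorboard --logdir /path/to/logs --port 6006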
I have previously asked whether it is possible to run TensorFlow with GPU support on a CPU. I was told that it is possible, and was shown the basic code for switching which device to use, but not how to get the initial code working on a computer that doesn't have a GPU at all. For example, I would like to train on a computer that has an NVIDIA GPU but program on a laptop that only has a CPU. How would I go about doing this? I have tried just writing the code as normal, but it crashes before I can even switch which device I want to use. I am using Python on Linux.
This thread might be helpful: Tensorflow: ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory
I've tried importing tensorflow with tensorflow-gpu loaded on the uni's HPC login node, which does not have GPUs, and it works fine. I don't have an Nvidia GPU in my laptop, so I never went through the installation process there, but I think the cause of your crash is that TF cannot find the relevant CUDA and cuDNN libraries.
But why don't you just use the CPU version? As @Finbarr Timbers mentioned, you can still run the model on a computer with a GPU.
What errors are you getting? It is very possible to train on a GPU but develop on a CPU; many people do it, including myself. In fact, TensorFlow will automatically place your code on a GPU if possible.
If you add the following code to your model, you can see which devices are being used:
import tensorflow as tf

# Create a session with log_device_placement=True so TensorFlow logs device placement.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
This should change when you run your model on a computer with a GPU.
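As a sketch of that workflow (TF 1.x style, to match the snippet above): with allow_soft_placement enabled, you can pin ops to the GPU in your training script and still run the same script on your CPU-only laptop, where TensorFlow silently falls back to the CPU:

import tensorflow as tf

# allow_soft_placement lets TF fall back to the CPU when the requested
# device (here a GPU) does not exist on the current machine.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.device('/gpu:0'):
    c = tf.constant([1.0, 2.0]) + tf.constant([3.0, 4.0])

sess = tf.Session(config=config)
print(sess.run(c))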
I am currently looking into running TensorFlow on the GPU and was reading about CUDA pinned memory.
I was not able to find any way to set this through the TensorFlow Python library.
Any idea how it can be done?
It's used automatically for any memcpy between CPU and GPU. If you want more sophisticated functionality, you can write a kernel that explicitly uses it.