I am trying to build a model in DIGITS, using the CPU only to do the learning. However, I get a CUDA driver version error even though I am not using a GPU. What could the problem be? I have attached my solver.prototxt below.
The GPU version of TensorBoard is having issues in Colab, although the CPU version works fine. I could not find much in the docs, though. This is the error:
Also, I tried the following for installation:
As you can see, I tried both the GPU and non-GPU versions, and it does not work until I disable the GPU in the runtime settings. Any help would be appreciated.
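As an aside, a quick way to confirm from Python whether the runtime currently exposes a GPU to TensorFlow (a minimal sketch, not from the original post):

import tensorflow as tf
# An empty string means TensorFlow cannot see a GPU in this runtime;
# a name like "/device:GPU:0" means it can.
print(tf.test.gpu_device_name())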
I am using Windows 7. After I tested my GPU in TensorFlow, which was awkwardly slow on a model already tested on the CPU, I switched to the CPU with:
tf.device("/cpu:0")
I was assuming that I could switch back to the GPU with:
tf.device("/gpu:0")
However, I got the following error message from Windows when I tried to rerun with this configuration:
The device "NVIDIA Quadro M2000M" is not exchange device and can not be removed.
With "nvida-smi" i looked for my GPU, but the system said the GPU is not there.
I restarted my laptop, tested if the GPU is there with "nvida-smi" and the GPU was recogniced.
I imported tensorflow again and started my model again, however the same error message pops up and my GPU vanished.
Is there something wrong with the configuration in one of the tensorflow configuration files? Or Keras files? What can i change to get this work again? Do you know why the GPU is so much slower that the 8 CPUs?
Solution: Reinstalling tensorflow-gpu worked for me.
However, there is still the question of why that happened, and how I can switch between the GPU and the CPU. I don't want to use a second virtual environment.
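On the switching question: tf.device is a context manager, so a bare call like tf.device("/gpu:0") has no effect on its own; ops have to be created inside the with block. A minimal TF 1.x sketch (allow_soft_placement is an assumed safety net, not part of the original post):

import tensorflow as tf
# Pin op placement explicitly with device contexts.
with tf.device("/cpu:0"):
    a = tf.constant([1.0, 2.0], name="a")
with tf.device("/gpu:0"):
    b = tf.constant([3.0, 4.0], name="b")
c = a + b
# allow_soft_placement lets TensorFlow fall back to the CPU if the
# requested GPU cannot be used; log_device_placement shows where each
# op actually ran.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))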
I am using Google Cloud (4 CPUs, 15 GB RAM) to host TensorFlow Serving (branch 0.5.1). The model is a pre-trained ResNet which I imported using Keras and converted to .pb format using SavedModelBuilder. I followed the TensorFlow Serving installation and compilation steps described in the installation docs. Did a bazel build using:
bazel build tensorflow_serving/...
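For reference, the Keras-to-SavedModel export mentioned above typically looks something like this (a sketch; the export path and signature names are assumptions, since the original export code was not shown):

import tensorflow as tf
from keras import backend as K
from keras.applications.resnet50 import ResNet50
K.set_learning_phase(0)  # export in inference mode
model = ResNet50(weights="imagenet")
# TF Serving expects a numeric version subdirectory, e.g. /tmp/resnet/1.
builder = tf.saved_model.builder.SavedModelBuilder("/tmp/resnet/1")
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"input": model.input}, outputs={"output": model.output})
builder.add_meta_graph_and_variables(
    sess=K.get_session(),
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
            signature})
builder.save()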
Doing inference on an image from my local machine using a Python client gave me results in approximately 23 secs. This I was able to fine-tune a bit by following the advice here. I replaced the bazel build with the command below to use CPU optimizations, which brought the response time down to 12 secs.
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma \
    --copt=-msse4.2 //tensorflow_serving/model_servers:tensorflow_model_server
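For context, the Python client doing the inference was along these lines (a minimal sketch against the TF Serving 0.5.x gRPC API; the host, port, model name, signature, and input shape are all assumptions):

import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2
# Placeholder input; a real client would load and preprocess an image here.
image = np.zeros((1, 224, 224, 3), dtype=np.float32)
channel = implementations.insecure_channel("localhost", 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
request = predict_pb2.PredictRequest()
request.model_spec.name = "resnet"  # assumed model name
request.inputs["input"].CopyFrom(
    tf.contrib.util.make_tensor_proto(image, shape=image.shape))
# 10-second deadline; this round trip is what the 23s/12s figures measure.
result = stub.Predict(request, 10.0)
print(result)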
Other stuff I tried which resulted in no difference to response times:
1. Increased from a 4-CPU to an 8-CPU machine
2. Tried a GPU (Tesla K80) + 4-CPU machine
I haven't tried batch optimization, as I am currently just testing with a single inference request. The configuration doesn't use Docker or Kubernetes.
I'd appreciate any pointers which can help bring down the inference times. Thanks!
Solved and closing this issue. I am now able to get sub-second prediction times. There were multiple problems.
One was the image upload/download time, which was playing a role.
The second was that when I was running with the GPU, TensorFlow Serving wasn't compiled with GPU support. The GPU issue got resolved using the two approaches outlined in these links - https://github.com/tensorflow/serving/issues/318 and https://github.com/tensorflow/tensorflow/issues/4841
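For what it's worth, whether a TensorFlow build itself was compiled with CUDA can be checked from Python (illustrative only; the serving binary needed the analogous bazel configuration):

import tensorflow as tf
# True only if this TensorFlow build was compiled with CUDA support.
print(tf.test.is_built_with_cuda())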
I'm running a Python script using GPU-enabled TensorFlow. However, the program doesn't seem to recognize any GPU and starts using the CPU straight away. What could be the cause of this?
Just want to add to the discussion that TensorFlow may stop seeing a GPU due to a CUDA initialization failure; in other words, TensorFlow detects the GPU but can't dispatch any op onto it, so it falls back to the CPU. In this case, you should see an error in the log like this:
E tensorflow/stream_executor/cuda/cuda_driver.cc:481] failed call to cuInit: CUDA_ERROR_UNKNOWN
The cause is likely a conflict between different processes using the GPU simultaneously. When that is the case, the most reliable way I have found to get TensorFlow working again is to restart the machine; in the worst case, reinstall TensorFlow and/or the NVIDIA driver.
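A quick way to see which devices TensorFlow can actually dispatch to (a standard TF 1.x check, added here for illustration):

from tensorflow.python.client import device_lib
# If CUDA initialization failed, only CPU devices will be listed here,
# even on a machine with a working GPU driver.
print(device_lib.list_local_devices())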
See also one more case where the GPU suddenly stops working.
From TensorFlow Download and Setup under Docker installation I see:
b.gcr.io/tensorflow/tensorflow latest 4ac133eed955 653.1 MB
b.gcr.io/tensorflow/tensorflow latest-devel 6a90f0a0e005 2.111 GB
b.gcr.io/tensorflow/tensorflow-full latest edc3d721078b 2.284 GB
I know 2. (latest-devel) & 3. (tensorflow-full) come with the source code, and I am using 2. for now.
What is the difference between 2. & 3.?
Which one is recommended for "normal" use?
TLDR:
First of all - thanks for the Docker images! They are the easiest and cleanest way to start with TF.
A few asides about the images:
there is no PIL
there is no nano (but there is vi), and apt-get cannot find it. Yes, I probably could configure the repos for it, but why not have it out of the box?
There are four images:
b.gcr.io/tensorflow/tensorflow: TensorFlow CPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel: CPU binary image plus source code.
b.gcr.io/tensorflow/tensorflow:latest-gpu: TensorFlow GPU binary image.
b.gcr.io/tensorflow/tensorflow:latest-devel-gpu: GPU binary image plus source code.
And the two properties of concern are:
1. CPU or GPU
2. no source or plus source
CPU or GPU: CPU
For a first-time user it is highly recommended to avoid the GPU version, as it can be anywhere from difficult to impossible to get working. The reason is that not all machines have an NVIDIA graphics chip that meets the requirements. You should first get TensorFlow working and understand it, then move on to the GPU version if you want or need it.
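Once the CPU image is running, a quick smoke test inside the container confirms the install (the standard TF hello-world of that era, shown as an illustrative sketch):

import tensorflow as tf
# If this prints the greeting, the install works end to end.
hello = tf.constant("Hello, TensorFlow!")
with tf.Session() as sess:
    print(sess.run(hello))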
From TensorFlow Build Instructions
Optional: Install CUDA (GPUs on Linux)
In order to build or run TensorFlow with GPU support, both Cuda Toolkit 7.0 and cuDNN 6.5 V2 from NVIDIA need to be installed.
TensorFlow GPU support requires having a GPU card with NVidia Compute Capability >= 3.5. Supported cards include but are not limited to:
NVidia Titan
NVidia Titan X
NVidia K20
NVidia K40
no source or plus source: no source
The Docker images will work without the source. You should only want or need the source if you have to rebuild TensorFlow for some reason, such as adding a new Op.
The standard recommendation for someone new to TensorFlow is to start with the CPU version without the source.