How to choose to use GPU with tf.device() in tensorflow - tensorflow

I run tensorflow example from models/cifar10/cifar10_train.py, in the example script, they said it will use one GPU, but I did find they define to use GPU like with tf.device('/gpu:0'):
and they define store some intermediate variable with CPU, so I wander how does tensorflow handle the GPU and CPU? To use GPU is the default choice???
Here is the link for the example:
https://www.tensorflow.org/versions/r0.10/tutorials/deep_cnn/index.html
Any advice will be appreciated,
Thanks in advance
Good day

Related

Tensorflow profiling for non-model computations

I have a computation which has for loops and calls to Tensorflow matrix algorithms such as tf.lstsq and Tensorflow iteration with tf.map_fn. I would like to profile this to see how much parallelism I am getting in the tf.map_fn and matrix algorithms that get called.
This doesn't seem to be the use case at all for the Tensorflow Profiler which is organized around the neural network model training loop.
Is there a way to use Tensorflow Profiler for arbitrary Tensorflow computations, or is the go-to move in this case to use NVidia tools like nvprof?
I figured out that the nvprof and nvvp and nsight tools I was looking for are available as a Conda install of cudatoolkit-dev. Uses are described in this gist.

Can a model trained using a GPU be used for inference on a CPU?

I want to run inference on the CPU; although my machine has a GPU.
I wonder if it's possible to force TensorFlow to use the CPU rather than the GPU?
By default, TensorFlow will automatically use GPU for inference, but since my GPU is not good (OOM'ed), I wonder if there's a setting to force Tensorflow to use the CPU for inference?
For inference, I used:
tf.contrib.predictor.from_saved_model("PATH")
Assuming you're using TensorFlow 2.0, please check out this issue on GitHub:
[TF 2.0] How to globally force CPU?
The solution seems to be to hide the GPU devices from TensorFlow. You can do that using one of the methodologies described below:
TensorFlow 2.0:
my_devices = tf.config.experimental.list_physical_devices(device_type='CPU')
tf.config.experimental.set_visible_devices(devices= my_devices, device_type='CPU')
TensorFlow 2.1:
tf.config.set_visible_devices([], 'GPU')
(Credit to #ymodak and #henrysky, who answered the question on the GitHub issue.)

By default, does TensorFlow use GPU/CPU simultaneously for computing or GPU only?

By default, TensorFlow will use our available GPU devices. That said, does TensorFlow use GPUs and CPUs simultaneously for computing, or GPUs for computing and CPUs for job handling (no matter how, CPUs are always active, as I think)?
Generally it uses both, the CPU and the GPU (assuming you are using a GPU-enabled TensorFlow). What actually gets used depends on the actual operations that your code is using.
For each operation available in TensorFlow, there are several "implementations" of such operation, generally a CPU implementation and a GPU one. Some operations only have CPU implementations as it makes no sense for a GPU implementation, but overall most operations are available for both devices.
If you make custom operations then you need to provide implementations that you want.
TensorFlow operations come packaged with a list of devices they can execute on and a list of associated priorities.
For example, a convolution is very conducive to computation on a GPU; but can still be done on a CPU whereas scalar additions should definitely be done on a CPU. You can override this selection using tf.Device and the key attached to the device of interest.
Someone correct me if I'm wrong.
But from what I'm aware TensorFlow only uses either GPU or CPU depending on what installation you ran. For example if you used pip install TensorFlow for python 2 or python3 -m pip install TensorFlow for python 3 you'll only use the CPU version.
Vise versa for GPU.
If you still have any questions or if this did not correctly answer your question feel free to ask me more.

data generator with tensorflow on the gpu

I am making a neural network using tensorflow and I ran into a problem trying to use a generator to split my data up, basically it's too slow.
My training data consists of 52x52 numpy arrays. I need to split each array into a 52x52x3 array before I input it into my NN. As mentioned I have a generator working that does this, but I noticed that even though my NN is running on the GPU my GPU usage is very low (under 10% usually). I think this might be caused by me doing the generator on the CPU.
Is there any way of running my generator on the GPU?
What I tried:
- I thought of trying to use pyCUDA in order to program the generator on the GPU but found that tensorflow and pyCUDA don't support each other
-I tried using the from_generator function from the Dataset API as mentioned here:
https://www.tensorflow.org/api_docs/python/tf/contrib/data/Dataset
But while having issues with it I ran into this github thread mentioning that this function isn't supported to run on the GPU anyway:
https://github.com/tensorflow/tensorflow/issues/13610
Any help would be greatly appreciated.

Does tensorflow automatically detect GPU or do I have to specify it manually?

I have a code written in tensorflow that I run on CPUs and it runs fine.
I am transferring to a new machine which has GPUs and I run the code on the new machine but the training speed did not improve as expected (takes almost the same time).
I understood that Tensorflow automatically detects GPUs and run the operations on them (https://www.quora.com/How-do-I-automatically-put-all-my-computation-in-a-GPU-in-TensorFlow) & (https://www.tensorflow.org/tutorials/using_gpu).
Do I have to change the code to make it manually runs the operations on GPUs (for now I have a single GPU)? and what would be gained by doing that manually?
Thanks
If the GPU version of TensorFlow is installed and if you don't assign all your tensors to CPU, some of them should be assigned to GPU.
To find out which devices (CPU, GPU) are available to TensorFlow, you can use this:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
Regarding the question of the performance, it's quite a broad subject and it really depends of your model, your data and so on. Here are a few and wide remarks on TensorFlow performance.