Using the GPU for TensorFlow computation on Raspberry Pi

I am developing a robot with computer vision on a Raspberry Pi 3 with TensorFlow. Can I use the GPU for deep learning on the Raspberry Pi 3?

There is an alternative backend for Keras called PlaidML that is not TensorFlow. Its major selling point is a speedup on non-Nvidia graphics cards. It still isn't TensorFlow, but it may be a viable option.
The short answer is no, it isn't possible at this time, since TensorFlow relies on Nvidia drivers to power Nvidia GPUs and the Raspberry Pi does not have Nvidia hardware.
One of two things has to change for you to have access to GPUs on small form-factor hardware: either TensorFlow has to support OpenCL (tracked here), or you have to switch platforms to something that has an Nvidia GPU, like this
Sorry to be the bringer of bad news.
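To check this on a given machine, a minimal sketch like the following reports which GPUs TensorFlow can see. It assumes a TensorFlow 2.x install (`tf.config.list_physical_devices` postdates the original answer), and the helper name `visible_tf_gpus` is hypothetical. On a Raspberry Pi 3 the list should come back empty, since there is no CUDA device for TensorFlow to use.

```python
def visible_tf_gpus():
    """Return the names of GPUs TensorFlow can use; [] if none or if TensorFlow is missing."""
    try:
        import tensorflow as tf  # assumption: TensorFlow 2.x is installed
    except ImportError:
        return []
    # list_physical_devices("GPU") only returns devices TensorFlow has a driver for.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_tf_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

An empty list here means TensorFlow will silently fall back to the CPU for every operation.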


Since TensorflowJS can use the GPU via WebGL, why would I need an nVIDIA GPU?

So TensorFlowJS can use WebGL to do GPU computations and train deep learning models. Why isn't this more popular than using CUDA with an nVIDIA GPU? Most people just trying to prototype machine learning models would love to do so on their personal computer, but many of us resort to expensive cloud services like AWS (although more recently Google Colab helps) for ML training if we don't have a computer with an nVIDIA GPU. I'm sure nVIDIA GPUs are faster than whatever GPU is in my MacBook, but probably any GPU will offer at least an order of magnitude speedup over even a fast CPU and allow for model prototyping, so why aren't we all using WebGL GPGPU? There must be a catch I just don't know about.
The WebGL backend uses the GLSL language to define functions and upload data as shaders. It "works", but you pay a huge cost to compile the GLSL and upload the shaders: warmup time for semi-complex models is immense (we're talking about minutes just to start up). On top of that, the memory overhead is 100-200% of what the model would normally need, and for larger models you're GPU-memory bound, so you don't want to waste that.
By the way, actual inference time is OK with WebGL once the model is warmed up and fits in memory.
On the other hand, the nVidia CUDA libraries provide direct access to the GPU, so TF compiled to use them is always going to be much more efficient.
Unfortunately, not many GPU vendors provide libraries like CUDA, so most ML is done on nVidia GPUs.
Then there is the next level, where you're using a TPU instead of a GPU; there is no WebGL there to start with.
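To make the memory figure above concrete, here is a small arithmetic sketch. It reads "overhead of 100-200%" as extra memory on top of the native footprint, which is an interpretation rather than something the answer spells out. The function name `webgl_memory_estimate` and the 500 MB model size are hypothetical.

```python
def webgl_memory_estimate(model_mb, overhead_low=1.0, overhead_high=2.0):
    """Return (low, high) total GPU memory in MB for a model that needs model_mb natively.

    overhead_low / overhead_high are fractions of the native footprint
    (1.0 == 100%), taken from the 100-200% range quoted in the answer.
    """
    return (model_mb * (1 + overhead_low), model_mb * (1 + overhead_high))


low, high = webgl_memory_estimate(500)  # hypothetical 500 MB model
print(f"Estimated WebGL footprint: {low:.0f}-{high:.0f} MB")
```

Under that reading, a model that fits comfortably in GPU memory with CUDA may need two to three times as much under WebGL.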
If I select WebGPU with the TFJS benchmark, it responds with "WebGPU is not supported. Please use Chrome Canary browser with flag "--enable-unsafe-webgpu" enabled...."
So when that's ready will it be competitive with CUDA? On my laptop it is about 15% faster than WebGL on that benchmark.

On an NVIDIA GPU with multiple graphics cards (K80 for example), why does torch.cuda.device_count() return 1?

I ran the following code on a Tesla K80, which as I understand consists of two GK210 graphics cards, each with 12 GB of on-board RAM, connected by something called a PLX switch. I am confused about how, at the PyTorch level, the fact that there are two graphics cards is hidden from the user:
import torch
torch.cuda.device_count() # 1
(my hunch is that tensorflow provides this same abstraction)
Follow up questions:
If I am training a model with PyTorch, and I run nvidia-smi and see that the GPU is fully utilized, I would assume this means that both GK210s are at 100% utilization. How does PyTorch distribute kernels across the two GK210s, and can I trust that this is being done efficiently (i.e. in a way that minimizes data transfer between the two cards)? Any resources that explain how this works would be much appreciated.
If I were writing a CUDA application, could I pin a CUDA stream to each card, and explicitly manage data transfers between the two cards?
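One way to investigate the question above is to list what the process can actually see. This is a sketch, not an explanation from the original post: it assumes PyTorch is installed, and the helper name `describe_visible_gpus` is hypothetical. In general each GK210 on a K80 is enumerated as its own CUDA device. A count of 1 may therefore mean that only one of them is exposed to the process, for example through `CUDA_VISIBLE_DEVICES` or a virtual machine that is assigned half of the board. That is an assumption about this setup, not something the question confirms.

```python
import os


def describe_visible_gpus():
    """Return (index, name, total_memory_GiB) for each CUDA device PyTorch can see."""
    try:
        import torch  # assumption: PyTorch is installed
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    devices = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        devices.append((index, props.name, props.total_memory / 2**30))
    return devices


# A restrictive CUDA_VISIBLE_DEVICES hides devices from the CUDA runtime entirely.
print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
visible = describe_visible_gpus()
for index, name, memory_gib in visible:
    print(f"cuda:{index}: {name}, {memory_gib:.1f} GiB")
```

Comparing this output with the device list printed by `nvidia-smi` shows whether the second GK210 is present on the host but hidden from the process.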

Train Deep learning Models with AMD

I am currently using a Lenovo Ideapad PC with AMD Radeon graphics. I am trying to run an image classifier model using convolutional neural networks. The dataset contains 50000 images and it takes too long to train the model. Can someone tell me how I can use my AMD GPU to speed up the process? I think AMD graphics do not support CUDA. Is there any way around this?
PS: I am using Ubuntu 17.10
What you're asking for is OpenCL support, or in more grandiose terms: the democratization of accelerated devices. There seems to be tentative support for OpenCL, and I see some people testing it as of early 2018, but it doesn't appear fully baked yet. The issue has been tracked for quite some time here:
You should also be aware of development on XLA, an attempt to virtualize TensorFlow over an LLVM (or LLVM-like) virtualization layer, making it more portable. It's currently cited as being in alpha as of early 2018.
There isn't yet a simple solution, but these are the two efforts to follow along these lines.

XLA support for custom kernel implementation on Raspberry Pi GPU

I am trying to implement TensorFlow OpKernels on the Raspberry Pi 3 GPU (QPU) for operations like Conv2D, pooling, ReLU, etc.
The operations are mainly meant to improve inference performance; training (and hence backpropagation and gradients) is not a concern.
Is XLA the right approach for this, or is there a better way?

Recommended GPUs for Tensorflow

I understand that TensorFlow requires (for GPU computation) a GPU with Nvidia Compute Capability >= 3.0. There are many such GPUs to choose from. The gaming-oriented GPUs, e.g. GeForce models, are much less expensive than the compute-oriented models, e.g. Tesla. My limited understanding is that the compute-oriented models may lack video output (not needed for computation) and that the gaming models may be doing 32-bit math instead of 64-bit. Assuming that TensorFlow uses (or prefers) 64-bit, does this mean that the gaming models will not work or will produce deficient results if used with TensorFlow? What attributes should one look for in choosing a GPU to use with TensorFlow?
The GPU-enabled version of TensorFlow has the following requirements:
64-bit Linux
Python 2.7
NVIDIA CUDA® 7.5 (CUDA 8.0 required for Pascal GPUs)
NVIDIA cuDNN v4.0 (minimum) or v5.1 (recommended)
TensorFlow GPU support requires having a GPU card with NVidia Compute Capability >= 3.0. Supported cards include but are not limited to:
NVidia Titan
NVidia Titan X
NVidia K20
NVidia K40
See the official docs: TensorFlow GPU support
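The compute capability requirement quoted above can be expressed as a simple comparison. The sketch below is illustrative only: the helper name `meets_requirement` is hypothetical, and the example capability values are not taken from the card list above.

```python
MIN_COMPUTE_CAPABILITY = (3, 0)  # from the ">= 3.0" requirement quoted above


def meets_requirement(capability, minimum=MIN_COMPUTE_CAPABILITY):
    """Return True if a (major, minor) compute capability is at least the minimum."""
    return tuple(capability) >= tuple(minimum)


# Illustrative (major, minor) values, not tied to any specific card.
for capability in [(2, 1), (3, 0), (3, 5), (6, 1)]:
    status = "ok" if meets_requirement(capability) else "too old"
    print(f"compute capability {capability[0]}.{capability[1]}: {status}")
```

Comparing `(major, minor)` tuples rather than floats avoids treating a version such as 3.10 as smaller than 3.9.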
Gaming GPUs can work quite well. You want a very recent GPU with lots of memory and CUDA cores. Most people training neural nets on GPUs these days use 32-bit floats.
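One practical reason 32-bit floats are attractive is memory: each value takes half the space of a 64-bit float. A rough sketch of the arithmetic follows. The parameter count is hypothetical, and the estimate covers only the weights, not activations or optimizer state.

```python
BYTES_PER_VALUE = {"float32": 4, "float64": 8}


def parameter_memory_mib(num_parameters, dtype="float32"):
    """Return the memory needed to store num_parameters values of dtype, in MiB."""
    return num_parameters * BYTES_PER_VALUE[dtype] / 2**20


num_parameters = 25_000_000  # hypothetical model size
for dtype in ("float32", "float64"):
    print(f"{dtype}: {parameter_memory_mib(num_parameters, dtype):.1f} MiB")
```

Under these assumptions the same model in 64-bit floats needs exactly twice the GPU memory, which matters more on gaming cards with limited memory.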