How to convert data(tensor) across different deep learning frameworks directly in gpu without copying to cpu? - tensorflow

For example, I have a cuda tensor in pytorch, how to directly convert it to mxnet/tensorflow in gpu without copying it to cpu? Frequently copying data between cpu and gpu is very time-consuming.

The fastest way would be to use CUDA API for that.
Sharing tensors via PyTorch is only supported on CPU.
And if you would try to do some Python exhibition you will be slow because of GIL.

Related

Since TensorflowJS can use the GPU via WebGL, why would I need an nVIDIA GPU?

So TensorFlowJS can use WebGL to do GPU computations and train deep learning models. Why isn't this more popular than using CUDA with an nVIDIA GPU? Most people just trying to prototype machine learning models would love to do so on their personal computer, but many of us resort to using expensive cloud services like AWS (although more recently Google Colab helps) for ML training if we don't have a computer with an nVIDIA GPU. I'm sure nVIDIA GPUs are faster than whatever GPU is in my Macbook, but probably any GPU will offer at least an order of magnitude speedup over even a fast CPU and allow for model prototyping, so why aren't well using WebGL GPGPU? There must be a catch I just don't know about.
WebGL backend uses GLSL language to define functions and upload data as shaders - it "works", but you pay huge cost to compile GSLS and upload shaders: warmup time for semi-complex models is immense (we're talking about minutes just to startup). And then memory overhead is 100-200% of what model would normally need - and for larger models, you're GPU memory bound, you don't want to waste that.
Btw, actual inference time once model is warmed up and it fits in memory is ok using WebGL
On the other hand nVidia CUDA libraries provide direct access to GPU, so TF compiled to use them is always going to be much more efficient.
Unfortunately, not many GPU vendors provide libraries like CUDA, so most ML is done on nVidia GPUs
Then there is a next level when you're using TPU instead of GPU - then there is no WebGL to start with
If I select WebGPU with the TFJS benchmark (https://tensorflow.github.io/tfjs/e2e/benchmarks/local-benchmark/index.html) it responds with "WebGPU is not supported. Please use Chrome Canary browser with flag "--enable-unsafe-webgpu" enabled...."
So when that's ready will it be competitive with CUDA? On my laptop it is about 15% faster than WebGL on that benchmark.

GPU support for TensorFlow & PyTorch

Okay, so I've worked on a bunch of Deep Learning projects and internships now and I've never had to do heavy training. But lately I've been thinking of doing some Transfer Learning for which I'll need to run my code on a GPU. Now I have a system with Windows 10 and a dedicated NVIDIA GeForce 940M GPU. I've been doing a lot of research online, but I'm still a bit confused. I haven't installed the NVIDIA Cuda Toolkit or cuDNN or tensorflow-gpu on my system yet. I currently use tensorflow and pytorch to train my DL models. Here are my queries -
When I define a tensor in tf or pytorch, it is a cpu tensor by default. So, all the training I've been doing so far has been on the CPU. So, if I make sure to install the correct versions of Cuda and cuDNN and tensorflow-gpu (specifically for tensorflow), I can run my models on my GPU using tf-gpu and pytorch and that's it? (I'm aware of the torch.cuda.is_available() in pytorch to ensure pytorch can access my GPU and the device_lib module in tf to check if my gpu is visible to tensorflow)(I'm also aware of the fact that tf doesnt support all Nvidia GPUs)
Why does tf have a separate module for GPU support? PyTorch doesnt seem to have that and all you need to do is cast your tensor from cpu() to cuda() to switch between them.
Why install cuDNN? I know it is a high-level API CUDA built for support to train Deep Neural Nets on the GPU. But do tf-gpu and torch use these in the backend while training on the gpu?
After tf == 1.15, did they combine CPU and GPU support all into one package?
First of all unfortunately 940M is a kinda weak GPU for training. I suggest you use Google colab for faster training but of course, it would be faster than the CPU. So here my answers to your four questions.
1-) Yes if you install the requirements correctly, then you can run on GPU. You can manually place your data to your GPU as well. You can check implementations on TensorFlow. In PyTorch, you should specify the device that you want to use. As you said you should do device = torch.device("cuda" if args.cuda else "cpu") then for models and data you should always call .to(device) Then it will automatically use GPU if available.
2-) PyTorch also needs extra installation (module) for GPU support. However, with recent updates both TF and PyTorch are easy to use for GPU compatible code.
3-) Both Tensorflow and PyTorch is based on cuDNN. You can use them without cuDNN but as far as I know, it hurts the performance but I'm not sure about this topic.
4-) No they are still different packages. tensorflow-gpu==1.15 and tensorflow==1.15 what they did with tf2, was making the tensorflow more like Keras. So it is more simplified then 1.15 or before.
Rest was already answered by regarding 3) cudNN optimizes layer and such operations on hardware level and those implementations are pure black magic. It is incredibly hard to write CUDA code that properly utilizes your GPU (how load data into the GPU, how to actually perform them using matrices etc. )

How to optimize your tensorflow model by using TensorRT?

These are the instruction to solve the assignments?
Convert your TensorFlow model to UFF
Use TensorRT’s C++ API to parse your model to convert it to a CUDA engine.
TensorRT engine would automatically optimize your model and perform steps
like fusing layers, converting the weights to FP16 (or INT8 if you prefer) and
optimize to run on Tensor Cores, and so on.
Can anyone tell me how to proceed with this assignment because I don't have GPU in my laptop and is it possible to do this in google colab or AWS free account.
And what are the things or packages I have to install for running TensorRT in my laptop or google colab?
so I haven't used .uff but I used .onnx but from what I've seen the process is similar.
According to the documentation, with TensorFlow you can do something like:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverter(
input_graph_def=frozen_graph,
nodes_blacklist=['logits', 'classes'])
frozen_graph = converter.convert()
In TensorFlow1.0, so they have it pretty straight forward, TrtGraphConverter has the option to serialized for FP16 like:
converter = trt.TrtGraphConverter(
input_saved_model_dir=input_saved_model_dir,
max_workspace_size_bytes=(11<32),
precision_mode=”FP16”,
maximum_cached_engines=100)
See the preciosion_mode part, once you have serialized you can load the networks easily on TensorRT, some good examples using cpp are here.
Unfortunately, you'll need a nvidia gpu with FP16 support, check this support matrix.
If I'm correct, Google Colab offered a Tesla K80 GPU which does not have FP16 support. I'm not sure about AWS but I'm certain the free tier does not have gpus.
Your cheapest option could be buying a Jetson Nano which is around ~90$, it's a very powerful board and I'm sure you'll use it in the future. Or you could rent some AWS gpu server, but that is a bit expensive and the setup progress is a pain.
Best of luck!
Export and convert your TensorFlow model into .onnx file.
Then, use this onnx-tensorrt tool to do the CUDA engine file conversion.

By default, does TensorFlow use GPU/CPU simultaneously for computing or GPU only?

By default, TensorFlow will use our available GPU devices. That said, does TensorFlow use GPUs and CPUs simultaneously for computing, or GPUs for computing and CPUs for job handling (no matter how, CPUs are always active, as I think)?
Generally it uses both, the CPU and the GPU (assuming you are using a GPU-enabled TensorFlow). What actually gets used depends on the actual operations that your code is using.
For each operation available in TensorFlow, there are several "implementations" of such operation, generally a CPU implementation and a GPU one. Some operations only have CPU implementations as it makes no sense for a GPU implementation, but overall most operations are available for both devices.
If you make custom operations then you need to provide implementations that you want.
TensorFlow operations come packaged with a list of devices they can execute on and a list of associated priorities.
For example, a convolution is very conducive to computation on a GPU; but can still be done on a CPU whereas scalar additions should definitely be done on a CPU. You can override this selection using tf.Device and the key attached to the device of interest.
Someone correct me if I'm wrong.
But from what I'm aware TensorFlow only uses either GPU or CPU depending on what installation you ran. For example if you used pip install TensorFlow for python 2 or python3 -m pip install TensorFlow for python 3 you'll only use the CPU version.
Vise versa for GPU.
If you still have any questions or if this did not correctly answer your question feel free to ask me more.

Does Gensim library support GPU acceleration?

Using Word2vec and Doc2vec methods provided by Gensim, they have a distributed version which uses BLAS, ATLAS, etc to speedup (details here). However, is it supporting GPU mode? Is it possible to get GPU working if using Gensim?
Thank you for your question. Using GPU is on the Gensim roadmap. Will appreciate any input that you have about it.
There is a version of word2vec running on keras by #niitsuma called word2veckeras.
The code that runs on latest Keras version is in this fork and branch https://github.com/SimonPavlik/word2vec-keras-in-gensim/tree/keras106
#SimonPavlik has run performance test on this code. He found that a single gpu is slower than multiple CPUs for word2vec.
Regards
Lev