How much faster (approx) does Tensorflow run with a GPU?

I have a Mac, and consequently have been running Tensorflow without GPU support (because it's not official yet). However, there are some hacked together impls that I'm thinking of installing... that is if the performance gains are worth the trouble. How much faster (approximately) would Tensorflow run on a Macbook Pro with GPU support?
Thanks

As a rule of thumb, somewhere between 10 and 20 times faster; that's what I've found just running the standard examples.

To give you an idea of the speed difference, I ran some language modelling code (similar to the PTB example), with a fairly large data set, on 3 different machines with the following results:
Intel Xeon X5690 (CPU only): 1 day, 19 hours
Nvidia Grid K520 (on Amazon AWS): 17 hours
Nvidia Tesla K80: 4 hours
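
If you do end up installing a GPU-enabled build, a quick sanity check is to time the same operation on the CPU and the GPU. This is a minimal sketch using current TensorFlow 2.x APIs (so it differs from the TF version that was around when this was asked), and the matrix size and repeat count are arbitrary:

import time
import tensorflow as tf

def time_matmul(device, size=4000, repeats=5):
    # Time a square matmul on the given device string, e.g. '/CPU:0' or '/GPU:0'.
    with tf.device(device):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        tf.matmul(a, b)  # warm-up so one-off startup cost isn't counted
        start = time.time()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        _ = c.numpy()  # force the (possibly asynchronous) work to finish
        return (time.time() - start) / repeats

print("CPU:", time_matmul("/CPU:0"))
if tf.config.list_physical_devices("GPU"):
    print("GPU:", time_matmul("/GPU:0"))

The ratio between the two numbers gives a rough feel for the speed-up on your particular hardware.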

Related

Why are the GPU utilization rates different for the same data?

On two PCs, the exact same data is exported from Cinema4D with the Redshift renderer.
Comparing the two, one uses the GPU at 100% while the other uses very little (although it uses about the same amount of GPU memory).
The Cinema4D, Redshift, and GPU driver versions are the same.
GPU: RTX 3060
RAM: 64GB
OS: Windows 10
Storage: M.2 SSD
The only difference is the CPU:
a 12th Gen Intel Core i9-12900K on the PC using the GPU at 100%,
an AMD Ryzen 9 5950X (16 cores) on the other.
Why is the GPU utilization so different?
Also, is it possible to adjust the PC settings to use 100%?

Colab pro and GPU availability

I need a GPU for my project. Until now I had limited use and got by with the free Colab tier. Now I think I may need as much as 3 hours a day, and Colab says a GPU is not available because they are already taken. My question is: what effect does upgrading to Colab Pro have on GPU availability? How many hours should I expect to have a GPU, and are these hours chosen arbitrarily by me or not?
I referred to Here and There, but no good detail about GPU availability is given.
On their website they say that these limits vary and depend on previous usage, and that a precise answer might not even be available, so even an approximate answer is welcome.
Thanks.
Yeah, I had the same experience of the GPU not being available in Colab.
Why not try gpushare.com to run a 3090 or 2080 Ti with free credit?
The platform supports the most popular machine learning frameworks, like TensorFlow and PyTorch, and users can instantiate a VM image quickly.
I think it's a good fit for accelerating your model training.
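
Whichever service you end up on, it's worth checking at the start of a session which GPU you were actually assigned, since allocation varies between sessions. A minimal sketch, assuming a runtime with PyTorch installed:

import torch

if torch.cuda.is_available():
    # Name and memory of the card this session was assigned (on Colab typically a K80, T4, or P100).
    print(torch.cuda.get_device_name(0))
    print(round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1), "GB")
else:
    print("No GPU assigned to this runtime.")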

Hardware for Deep Learning

I have a couple of questions on hardware for a Deep Learning project I'm starting; I intend to use PyTorch for the neural networks.
I am thinking about going for an 8th-gen CPU on a Z390 board (I'll wait a month to see if prices drop after the 9th-gen CPUs are available), so I still get a cheaper CPU that can be upgraded later.
Question 1) Are CPU cores going to be beneficial? Would getting the latest Intel chips be worth the extra cores? And if CPU cores will be helpful, should I just go AMD?
I am also thinking about getting a 1080 Ti and then later on, once I'm more proficient, adding two more 2080 Tis. I would go for more, but it's difficult to find a board that fits 4.
Question 2) Does mixing GPUs affect parallel processing? Should I just get a 2080 Ti now and then buy another two later? And as part (b) of this question: do the lane speeds matter? Should I spend more on a board that doesn't slow down the PCIe slots if you use more than one?
Question 3) More RAM? 32GB seems plenty. So 2x16GB sticks on a board that has 4 slots and supports up to 64GB.
What also matters when running multiple GPUs is the number of available PCIe lanes. If you might go up to 4 GPUs, I'd go for an AMD Threadripper for its 64 PCIe lanes.
For machine learning in general, core and thread count is quite important, so Threadripper is still a good option, depending on the budget of course.
A few people mention that running a separate instance on each GPU may be more interesting; if you do that, mixing GPUs is not a problem (see the sketch below).
32GB of RAM seems good; indeed, there's no need to go for 4 sticks if your CPU does not support quad channel.
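
If you go the one-instance-per-GPU route, each training process can be pinned to its own card with CUDA_VISIBLE_DEVICES. A minimal PyTorch sketch, where the model and input are just placeholders to show device placement:

import os
# Pin this process to a single physical card; launch one copy per GPU,
# e.g. CUDA_VISIBLE_DEVICES=1 python train.py for the second card.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("Training on:", torch.cuda.get_device_name(0))

# Placeholder model and batch, just to show everything living on one device.
model = nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)
out = model(x)

Because each process only ever sees one card, the cards don't need to match, which sidesteps the mixed-GPU concern entirely.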

Gem5 system requirements for decent performance

I have to work with gem5 for my project and was wondering what hardware configuration I should buy. I owned a "good enough" laptop, but sadly it's no longer working reliably, so I would have to stick to some lower-end laptop. What is the minimum-priced processor I should buy? Also, AMD or Intel? I can't afford an Apple laptop either.
Any help is deeply appreciated.
To give you an idea, I have a high-end Lenovo P51 laptop with:
Intel Core i7-7820HQ Processor (8MB Cache, up to 3.90GHz) (4 cores 8 threads)
32GB(16+16) DDR4 2400MHz SODIMM
512GB SSD PCIe TLC OPAL2
Ubuntu 17.10
Then the build time for:
git checkout da79d6c6cde0fbe5473ce868c9be4771160a003b
CC=gcc-6 CXX=g++-6 scons -j"$(nproc)" --ignore-style build/ARM/gem5.opt
is 10 minutes which I consider reasonable.
And a minimalistic ARM Buildroot Linux kernel boot takes:
1 minute 40 seconds on the default simplified AtomicSimpleCPU
10 minutes on the much more realistic --cpu-type=HPI --caches
This laptop is, however, likely more expensive than most Apple laptops at 2500 dollars. But if you are going to be developing professionally, it is a worthy investment.
For hobbyist use, however, a mid-range 1200-dollar laptop should be good enough to get started, I believe, considering that:
you won't build from scratch very often, mostly incrementally with scons
you can boot with a simple and fast CPU, make a checkpoint with m5 checkpoint before your benchmark, then restore the checkpoint with a more realistic and slower CPU model: How to switch CPU models in gem5 after restoring a checkpoint and then observe the difference?

Tensorflow: big difference in GPU utilization when setting CUDA_VISIBLE_DEVICES to different values

Linux: Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-38-generic x86_64)
Tensorflow: compile from source, 1.4
GPU: 4xP100
I am trying the newly released object detection tutorial training program.
I noticed that there is a big difference when I set CUDA_VISIBLE_DEVICES to different values. Specifically, when it is set to "gpu:0", the GPU utilization is quite high, around 80%-90%, but when I set it to other GPU devices, such as gpu:1, gpu:2, etc., the GPU utilization is very low, between 10%-30%.
As for the training speed, it seems to be roughly the same, and much faster than when using the CPU only.
I'm just curious how this happens.
As this answer mentions, GPU-Util is a measure of how busy the computation on each GPU is.
I'm not an expert, but from my experience GPU 0 is generally where most of your processes run by default. CUDA_VISIBLE_DEVICES sets the GPUs visible to the processes you run in that shell. Therefore, by setting CUDA_VISIBLE_DEVICES to gpu:1 or gpu:2 you are making it run on less busy GPUs.
Moreover, you only reported one value, while in theory you should have one per GPU; there is the possibility you were only looking at GPU-Util for GPU 0, which would of course decrease if you are not using it.
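
One way to see what CUDA_VISIBLE_DEVICES does is to set it before the framework initializes and then list the devices the process can see. A minimal sketch using TensorFlow 2.x APIs (the question was about TF 1.4, where you would list devices with device_lib.list_local_devices() instead):

import os
# Must be set before TensorFlow initializes CUDA. "1" means only the second
# physical GPU is visible, and inside the process it shows up as GPU:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf

# With the mask above this prints a single GPU, however many cards the machine has.
print(tf.config.list_physical_devices("GPU"))

Note that nvidia-smi still reports all physical GPUs, so make sure you read the utilization column for the card you actually masked in.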