Google Compute Engine - GPU seems not available

I'm new to Google Compute Engine and I'd just like to have a VM with a GPU, but I simply can't select one.
I have already requested and received these quotas. What am I missing?
NVIDIA P100 GPUs = 1
GPUs (All regions) = 1

In general, GPUs only work with N1 instances, as mentioned here:
Instances with GPUs have specific restrictions that make them behave differently than other instance types.
GPUs are currently only supported with general-purpose N1 machine types.
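For completeness, here is a minimal sketch of creating such a VM programmatically with the google-api-python-client library (the project, zone, VM name and image are placeholders; it assumes Application Default Credentials and the Compute Engine API are already set up). The essential parts are the N1 machine type, the guestAccelerators entry and the TERMINATE maintenance policy that GPU instances require:

import googleapiclient.discovery

project = "my-project"        # placeholder
zone = "us-central1-c"        # a zone that actually offers P100s

compute = googleapiclient.discovery.build("compute", "v1")

config = {
    "name": "gpu-vm",
    # GPUs are only supported on general-purpose N1 machine types
    "machineType": f"zones/{zone}/machineTypes/n1-standard-4",
    "guestAccelerators": [{
        "acceleratorType": f"projects/{project}/zones/{zone}/acceleratorTypes/nvidia-tesla-p100",
        "acceleratorCount": 1,
    }],
    # GPU instances cannot live-migrate, so they must terminate on host maintenance
    "scheduling": {"onHostMaintenance": "TERMINATE", "automaticRestart": True},
    "disks": [{
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            "sourceImage": "projects/debian-cloud/global/images/family/debian-11",
        },
    }],
    "networkInterfaces": [{"network": "global/networks/default"}],
}

compute.instances().insert(project=project, zone=zone, body=config).execute()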

Related

How many CPU cores does Google Colab assign when I set n_jobs=8, and is there any way to check that?

I am running regression tasks in Google Colab with GridSearchCV. In the parameters I set n_jobs=8; when I set it to -1 (to use all possible cores) it uses only 2 cores, so I am assuming there is a limit on the server end when n_jobs=-1. I would like to know how to check how many cores are actually being used.
If you use the code below, you will see that the Google Colab VM exposes 2 CPU cores:
import multiprocessing
cores = multiprocessing.cpu_count() # Count the number of cores in a computer
cores
That is a question I had too. I put n_jobs=100 in Colab and got:
[Parallel(n_jobs=100)]: Using backend LokyBackend with 100 concurrent workers.
This is surprising because Google Colab only gives you 2 processors. However, you can always use your own CPU/GPU with Colab.
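If you want to see what joblib is actually doing rather than inferring it from training speed, a quick check (this sketch assumes joblib is installed, which it is on Colab) is to compare the detected core count with a verbose Parallel run:

import multiprocessing
from joblib import Parallel, delayed

print(multiprocessing.cpu_count())  # typically 2 on a free Colab VM

# verbose=10 makes joblib log the backend and the number of workers it starts;
# n_jobs=-1 asks for one worker per detected core.
results = Parallel(n_jobs=-1, verbose=10)(delayed(pow)(i, 2) for i in range(1000))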

Can I create multiple virtual devices from multiple GPUs in Tensorflow?

I'm currently using local device configuration in TensorFlow 2.3.0 to simulate multi-GPU training, and it is working. If I buy another GPU, will I be able to use the same functionality on each GPU?
Right now I have 4 virtual GPUs and one physical GPU. I want to buy another GPU so that I have 2x4 virtual GPUs. I haven't found any information about this, and because I don't have another GPU right now, I can't test it. Is it supported? I'm afraid it's not.
Yes, you can add another GPU. There is no restriction on the number of GPUs, so you can make use of all the GPU devices you have.
As the documentation says:
A visible tf.config.PhysicalDevice will by default have a single
tf.config.LogicalDevice associated with it once the runtime is
initialized. Specifying a list of tf.config.LogicalDeviceConfiguration
objects allows multiple devices to be created on the same
tf.config.PhysicalDevice
You can follow this documentation for more details on the usage of multiple GPUs.
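As a minimal sketch of what that looks like with two physical GPUs (assuming TensorFlow 2.4 or newer; in 2.3 the same calls live under tf.config.experimental as set_virtual_device_configuration / VirtualDeviceConfiguration, and the 1 GB memory_limit is just a placeholder):

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

# Split every physical GPU into 4 logical (virtual) GPUs of 1 GB each.
# This must run before the runtime is initialized, i.e. before any GPU op.
for gpu in gpus:
    tf.config.set_logical_device_configuration(
        gpu,
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)] * 4,
    )

logical_gpus = tf.config.list_logical_devices("GPU")
print(len(logical_gpus))  # 8 logical devices with two physical GPUs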

What do G and C types mean in nvidia-smi?

I have an open issue because I thought that my CUDA code wasn't running on my GPU (here). I thought that because I get a C in the Type field of my process when I use nvidia-smi, but I see that my GPU-Util grows when I run my code, so now I don't know whether it is running on the CPU or the GPU. Can someone explain the meaning of the C and G types, please? I found this: "Displayed as "C" for Compute Process, "G" for Graphics Process, and "C+G" for the process having both Compute and Graphics contexts." But I don't understand whether it means that C is for CPU and G for GPU, because I don't know what a "compute process" and a "graphics process" are, or what the differences between them are.
They are both for GPU.
C = compute = CUDA or OpenCL
G = graphics = DirectX or OpenGL
According to the Ubuntu man page here: https://manpages.ubuntu.com/manpages/precise/man1/alt-nvidia-current-smi.1.html
C = Compute: processes that use the compute mode of NVIDIA GPUs through CUDA libraries, e.g. deep learning training and inference with TensorFlow-GPU, PyTorch, etc.
G = Graphics: processes that use the graphics mode of NVIDIA GPUs, e.g. professional 3D graphics, gnome-shell (Ubuntu's GUI environment), games, etc., for rendering graphics or video.
C+G = Compute + Graphics: processes that use both of the contexts defined above.
A developer document for nvidia-smi - NVIDIA System Management Interface program
http://developer.download.nvidia.com/compute/DCGM/docs/nvidia-smi-367.38.pdf
If you want to take a deeper dive into the architectural components of an NVIDIA Turing GPU, have a look at the whitepaper:
https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf
As a general rule, everyone working on a software stack that is as expansive as ML should have a good understanding of the hardware components they work on.
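If you want to convince yourself that your code really runs on the GPU, one simple experiment (a sketch that assumes a GPU build of TensorFlow) is to keep a compute-only process alive and watch nvidia-smi in another terminal; the process will be listed with Type C and non-zero GPU memory:

import time
import tensorflow as tf

# A pure CUDA (compute) workload: it opens no graphics context,
# so nvidia-smi reports this process as Type "C".
with tf.device("/GPU:0"):
    a = tf.random.normal([4096, 4096])
    b = tf.matmul(a, a)

print(b.shape)
time.sleep(60)  # keep the process alive while you check nvidia-smi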

Where do Workers and Parameter Servers reside in Distributed TensorFlow?

In this post, it was mentioned that:
Also, there's no built-in distinction between worker and ps devices --
it's just a convention that variables get assigned to ps devices, and
ops are assigned to worker devices.
In this post, it was mentioned that:
TL;DR: TensorFlow doesn't know anything about "parameter servers", but
instead it supports running graphs across multiple devices in
different processes. Some of these processes have devices whose names
start with "/job:ps", and these hold the variables. The workers drive
the training process, and when they run the train_op they will cause
work to happen on the "/job:ps" devices, which will update the shared
variables.
Questions:
Do variables in ps reside on the CPU or GPU? Also, are there any performance gains if "/job:ps" resides on CPU or GPU?
Do the lower level libraries decide where to place a variable or operation?
Do variables in ps reside on the CPU or GPU? Also, are there any performance gains if "/job:ps" resides on CPU or GPU?
You can pin the ps job to either one of those (with exceptions, see below), but pinning it to a GPU is not practical. ps is really a storage of parameters plus the ops to update them. A CPU device can have much more memory (i.e., main RAM) than a GPU and is fast enough to update the parameters as the gradients come in. In most cases, matrix multiplications, convolutions and other expensive ops are done by the workers, hence placing a worker on a GPU makes sense. Placing a ps on a GPU is a waste of resources, unless the ps job is doing something very specific and expensive.
But: TensorFlow does not currently have a GPU kernel for integer variables, so the following code will fail when TensorFlow tries to place the variable i on GPU #0:
import tensorflow as tf

with tf.device("/gpu:0"):
    i = tf.Variable(3)

with tf.Session() as sess:
    sess.run(i.initializer)  # Fails!
with the following message:
Could not satisfy explicit device specification '/device:GPU:0'
because no supported kernel for GPU devices is available.
This is the case when there's no choice of device for a parameter, and thus for a parameter server: only CPU.
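To make the convention concrete, here is a TF1-style sketch (graph mode via tf.compat.v1; the cluster addresses are hypothetical) in which replica_device_setter pins variables to /job:ps and everything else to the worker:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],          # hypothetical addresses
    "worker": ["worker0.example.com:2222"],
})

# Variables are assigned to /job:ps (round-robin over ps tasks, CPU by default);
# all other ops are assigned to the local worker device.
with tf.device(tf.train.replica_device_setter(
        cluster=cluster, worker_device="/job:worker/task:0")):
    w = tf.get_variable("w", shape=[784, 10])    # placed on /job:ps/task:0
    x = tf.placeholder(tf.float32, [None, 784])
    logits = tf.matmul(x, w)                     # runs on the worker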
Do the lower level libraries decide where to place a variable or operation?
If I get this question right, node placement rules are pretty simple:
If a node was already placed on a device in a previous run of the graph, it is left on that device.
Else, if the user pinned a node to a device via tf.device, the placer places it on that device.
Else, it defaults to GPU #0, or the CPU if there is no GPU.
The TensorFlow whitepaper also describes a dynamic placer, which is more sophisticated, but it is not part of the open-source version of TensorFlow right now.
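You can watch these rules in action in TF 2.x eager mode by turning on device placement logging (a small sketch; it assumes at least one visible GPU, otherwise everything simply lands on the CPU):

import tensorflow as tf

tf.debugging.set_log_device_placement(True)   # log the device of every op

a = tf.random.normal([2, 2])        # not pinned: defaults to GPU:0 if one is present
with tf.device("/CPU:0"):
    b = tf.random.normal([2, 2])    # explicitly pinned to the CPU

print(tf.matmul(a, b))              # the matmul's placement is logged as well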

How to enable crossfire on AMD Radeon Pro DUO

I am using an AMD Radeon Pro Duo for my application in OpenCL.
It has dual Fiji GPUs. How can I configure CrossFire to make them work as one device? I am using clGetDeviceInfo in OpenCL to check the device compute units, but it shows 64 for each Fiji GPU.
I have 128 compute units in total across the two GPUs. How can I use all of them with CrossFire?
OpenCL has device fission but not device fusion. Devices can share memory for efficiency but shaders can't be joined.
There are also some functions that can't synchronize between two GPUs yet:
Atomic functions in kernels
Prefetch command (which GPU's global cache?)
clEnqueueAcquireGLObject (which GPU's buffer?)
clCreateBuffer (which device's memory does it choose? We can't choose.)
clEnqueueTask (where does this task go?)
You should partition the encoding work into two pieces and run one on each GPU. This may even require CrossFire to be disabled if the drivers have problems with it. It shouldn't be harder than writing a GPGPU encoder.
But you may need to copy the data to only one of the devices and then copy half of it to the other GPU from that buffer, instead of passing it through PCIe twice. The inter-GPU connection should be faster than PCIe.
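As a rough illustration of the "partition the work yourself" approach, here is a sketch in Python with pyopencl (the trivial kernel and the even 50/50 split are placeholders; real code would size the chunks to the workload and overlap transfers) that creates one command queue per Fiji device and gives each half of the data:

import numpy as np
import pyopencl as cl

platform = cl.get_platforms()[0]
gpus = platform.get_devices(device_type=cl.device_type.GPU)  # expect the two Fiji devices
ctx = cl.Context(gpus)
queues = [cl.CommandQueue(ctx, device=d) for d in gpus]

src = """
__kernel void scale(__global float *buf) {
    int i = get_global_id(0);
    buf[i] *= 2.0f;
}
"""
prog = cl.Program(ctx, src).build()

data = np.arange(1024, dtype=np.float32)
half = len(data) // 2
chunks = [data[:half], data[half:]]
bufs = [cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                  hostbuf=c) for c in chunks]

# Launch one half of the work on each GPU, then read both halves back.
for q, buf, chunk in zip(queues, bufs, chunks):
    prog.scale(q, (half,), None, buf)
    cl.enqueue_copy(q, chunk, buf)
for q in queues:
    q.finish()

print(data[:4], data[-4:])  # every element should now be doubled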