Hi! About data transmission from GPU to DRAM

In a modern computer architecture, can the GPU send data directly to system DRAM? If so, is NVLink (on NVIDIA GPUs) used for that transfer?
Thanks! I hope you have a good day.
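To make the question more concrete, here is roughly the kind of transfer I mean, sketched in Python with PyTorch (this assumes a CUDA-capable NVIDIA GPU is available; the buffer size is an arbitrary example). As far as I understand, pinned (page-locked) host memory is what lets the GPU DMA directly to and from system DRAM:

    # Illustrative sketch only: GPU <-> system DRAM copies through pinned host memory.
    # Assumes PyTorch with a CUDA GPU; the tensor size is an arbitrary example.
    import torch

    assert torch.cuda.is_available(), "needs a CUDA-capable GPU"

    # Page-locked buffer in system DRAM that the GPU can DMA into and out of.
    host_in = torch.randn(1 << 20, pin_memory=True)

    # Host -> device copy (DMA over PCIe, or over NVLink on platforms that
    # connect the CPU and GPU that way).
    dev = host_in.to("cuda", non_blocking=True)
    dev = dev * 2.0  # some work on the GPU

    # Device -> host copy back into pinned system DRAM.
    host_out = torch.empty_like(host_in, pin_memory=True)
    host_out.copy_(dev, non_blocking=True)

    torch.cuda.synchronize()  # wait for the asynchronous copies to finish
    print(host_out[:4])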

Related

Is it possible to use system memory instead of GPU memory for processing Dask tasks

We have been running Dask clusters on Kubernetes for some time. Up to now we have been using CPUs for processing and, of course, system memory for storing our DataFrames of around 1.5 TB (per Dask cluster, split across 960 workers). Now we want to update our algorithm to take advantage of GPUs, but it seems the memory available on GPUs will not be enough for our needs and will be a limiting factor (with our current setup, we are using more than 1 GB of memory per virtual core).
I was wondering if it is possible to use GPUs (thinking of NVIDIA or AMD cards with PCIe connections and their own VRAM, not integrated GPUs that use system memory) for processing while using system memory (not GPU memory/VRAM) for storing Dask DataFrames. I mean, is it technically possible? Have you ever tried something like this? Can I schedule a Kubernetes pod such that it uses GPU cores and system memory together?
Another thing: even if it were possible to allocate system RAM as if it were the GPU's VRAM, is there a limit to how much system RAM can be allocated that way?
Note 1. I know that using system RAM with the GPU (if it were possible) would create unnecessary traffic over the PCIe bus and result in degraded performance, but I would still need to test this configuration with real data.
Note 2. GPUs are fast because they have many simple cores that perform simple tasks at the same time, in parallel. If an individual GPU core is not superior to an individual CPU core, then maybe I am chasing the wrong dream? I am already running Dask workers on Kubernetes with access to hundreds of CPU cores. In the end, having a huge number of workers, each holding only a small part of my data, won't mean better performance (it increases shuffling); there is no point in increasing the number of cores indefinitely.
Note 3. We are mostly manipulating python objects and doing math calculations using calls to .so libraries implemented in C++.
Edit 1: The Dask-CUDA library seems to support spilling from GPU memory to host memory, but spilling is not what I am after.
Edit 2: It is very frustrating that most of the components needed to utilize GPUs on Kubernetes are still experimental/beta:
Dask-CUDA: This library is experimental...
NVIDIA device plugin: The NVIDIA device plugin is still considered beta and...
Kubernetes: Kubernetes includes experimental support for managing AMD and NVIDIA GPUs...
I don't think this is possible directly as of today, but it's useful to mention why and reply to some of the points you've raised:
Yes, dask-cuda is what comes to mind first when I think of your use-case. The docs do say it's experimental, but from what I gather, the team has plans to continue to support and improve it. :)
Next, dask-cuda's spilling mechanism was designed that way for a reason: while doing GPU compute, your biggest bottleneck is data transfer (as you have also noted), so by design we want to keep as much data in GPU memory as possible.
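For reference, here is a minimal sketch of how that spilling is typically configured when starting a cluster (the limits below are arbitrary examples, and parameter names may differ between dask-cuda releases):

    # Minimal sketch: a dask-cuda cluster that spills from GPU memory to host RAM.
    # The memory limits are arbitrary examples, not recommendations.
    from dask_cuda import LocalCUDACluster
    from dask.distributed import Client

    cluster = LocalCUDACluster(
        device_memory_limit="8GB",  # spill device -> host above this per-worker threshold
        memory_limit="64GB",        # spill host -> disk above this per-worker threshold
    )
    client = Client(cluster)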
I'd encourage you to open a topic on Dask's Discourse forum, where we can reach out to some NVIDIA developers who can help confirm. :)
As a side note, there are some ongoing discussions around improving how Dask manages GPU resources. That work is in its early stages, but we may see cool new features in the coming months!
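On the "allocate system RAM as if it were VRAM" point from the question: CUDA managed (unified) memory does let allocations oversubscribe device memory and page out to host RAM, at a performance cost. A rough sketch with CuPy, assuming CuPy and an NVIDIA GPU are available (the array size is arbitrary):

    # Rough sketch: route CuPy allocations through CUDA managed (unified) memory,
    # which can grow beyond the GPU's VRAM by paging to system RAM.
    import cupy as cp

    pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
    cp.cuda.set_allocator(pool.malloc)

    x = cp.random.random((20_000, 20_000))  # ~3.2 GB of float64; may exceed VRAM on small GPUs
    print(float(x.sum()))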

Colab pro and GPU availability

I need a GPU for my project. Until now I had limited use and relied on the free tier of Colab, but now I think I may need as much as 3 hours a day, and Colab says a GPU is not available because they are already taken. My question is: what effect does upgrading to Colab Pro have on GPU availability? How many hours of GPU time should I expect, and can I choose those hours myself or not?
I referred to Here and There, but neither gives good detail about GPU availability.
Their website says these limitations vary and depend on previous usage, so a precise answer might not even be available; an approximate answer is welcome too.
Thanks.
Yeah, I had the same experience of GPUs not being available in Colab.
Why not try gpushare.com to run a 3090 or 2080 Ti with free credit?
The platform supports the most popular machine learning frameworks, like TensorFlow and PyTorch, and users can quickly instantiate a VM image.
I think it's a good fit for accelerating your model training.

Does Swap memory help if my RAM is insufficient?

I'm new to Stack Overflow and don't have enough reputation to post a comment, so I'm opening a new question.
I am running into the same issue as this:
why tensorflow just outputs killed
In this scenario, does swap memory help?
A little more info on the platform:
Raspberry Pi 3 on Ubuntu MATE 16.04
RAM - 1 GB
Storage - 32 GB SD card
Framework: Tensorflow
Network architecture - similar in complexity to AlexNet.
Appreciate any help!
Thanks
SK
While swap may stave off the hard failure seen in the linked question, swapping will generally doom your inference: the difference in throughput and latency between RAM and any other form of storage is simply far too large.
As a rough estimate, RAM throughput is about 100x that of even high quality SD cards. The factor for I/O operations per second is even larger, at somewhere between 100,000 and 5,000,000.
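As a purely illustrative back-of-the-envelope sketch (the bandwidth figures below are assumptions for a Pi-class board and an SD card, not measurements), you can see why streaming weights or activations through swap on an SD card dominates the runtime:

    # Back-of-the-envelope estimate; all figures are illustrative assumptions.
    weights_gb = 0.25   # ~250 MB of weights, roughly AlexNet-sized (assumption)
    ram_gbps   = 10.0   # assumed RAM bandwidth on a Pi-class board, GB/s
    sd_gbps    = 0.02   # assumed ~20 MB/s sustained SD card throughput, GB/s

    print(f"RAM: {weights_gb / ram_gbps * 1000:.1f} ms per full pass over the weights")
    print(f"SD card: {weights_gb / sd_gbps:.1f} s per full pass over the weights")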

Getting the most of the GPU in an Embedded Platform

My platform is Ubuntu running on an Exynos 4412 CPU, which has the Mali-400 GPU. I would like to do some computer vision using OpenCV and OpenGL, and I'm also going to write some fragment shaders. My question is: what is the fastest way to copy data from the GPU to the CPU? Using glReadPixels is really slow on my platform. Is it beneficial to run glReadPixels in its own thread, or to use OpenMP? Suggestions are welcome please :).
The Exynos 4412 doesn't have separate CPU and GPU memory at the hardware level; it's all the same RAM and physically accessible by both. Thus, there is likely to be some way to access the GPU's portion of the memory directly from the CPU.

AMD Fusion how can CPU and GPU share the same memory?

AMD announced its Fusion platform some time ago. Having read a bit about it, I'm both excited and skeptical. For example, it should make it possible for the GPU and CPU to share the same memory (and the GPU and CPU are in the same package). Given that GPUs have much higher memory bandwidth (around 10x that of a CPU) and that CPUs and GPUs use their caches in fundamentally different ways, the question arises: how on earth can they do this? I wonder if any details are known.
I have also searched for detailed information on how this APU works technically, but haven't found anything better than AMD's whitepaper on the subject, which, in a slightly marketing-oriented tone, does present a lot of good information.
By using high-bandwidth dual ported RAM. AMD is happy to explain.
Here is a good paper (from AMD) explaining the different memory access patterns in an APU
http://amddevcentral.com/afds/assets/presentations/1004_final.pdf
CPUs and GPUs have been sharing memory for years. Fusion is no different; in fact, it's far from the first instance of a GPU being integrated alongside a general-purpose CPU.
Like all other such solutions, it'll be fine for casual use, but it'll be far from cutting-edge 3D acceleration.