New to StackOverflow and don't have enough credits to post a comment. So opening a new question.
I am running into the same issue as this:
why tensorflow just outputs killed
In this scenario, does SWAP memory help?
Little more info on the platform:
Ras Pi 3 on Ubuntu Mate 16.04
RAM - 1 GB
Storage - 32 GB SD card
Framework: Tensorflow
Network architecture- similar to complexity of AlexNet.
Appreciate any help!
Thanks
SK
While swap may stave off the hard failure as seen in the linked question, swapping will generally doom your inference. The difference in throughput and latency b/w RAM and any other form of storage is simply far too large.
As a rough estimate, RAM throughput is about 100x that of even high quality SD cards. The factor for I/O operations per second is even larger, at somewhere between 100,000 and 5,000,000.
Related
This is a total newbie question but I've been searching for a couple days and cannot find the answer.
I am using cupy to allocate a large array of doubles (circa 655k rows x 4k columns ) which is about 16Gb in ram. I'm running on p2.8xlarge (the aws instance that claims to have 96GB of GPU ram and 8 GPUs), but when I allocate the array it gives me out of memory error.
Is this happening becaues the 96GB of ram is split into 8x12 GB lots that are only accessible to each GPU? Is there no concept of pooling the GPU ram across the GPUs (like regular ram in multiple CPU situation) ?
From playing around with it a fair bit, I think the answer is no, you cannot pool memory across GPUs. You can move data back and forth between GPUs and CPU but there's no concept of unified GPU ram accessible to all GPUs
I have a couple questions on hardware for a Deep Learning project I'm starting, I intend to use pyTorch for Neural Networks.
I am thinking about going for an 8th Gen CPU on a z390 (I'll wait month to see if prices drop after 9th gen CPU's are available) so I still get a cheaper CPU that can be upgraded later.
Question 1) Are CPU cores going to be beneficial would getting the latest Intel chips be worth the extra cores, and if cores on CPU will be helpful, should I just go AMD?
I am also thinking about getting a 1080ti and then later on, once I'm more proficient adding two more 2080ti's, I would go for more but it's difficult to find a board to fit 4.
Question 2) Does mixing GPU's effect parallel processing, Should I just get a 2080ti now and then buy another 2 later. And a part b to this question do the lane speeds matter, should I spend more on a board that doesn't slow down the PCIe slots if you utilise more than one.
Question 3) More RAM? 32GB seems plenty. So 2x16gb sticks with a board that can has 4 slots up to 64gb.
The matter when running multi GPU is also the number of available PCIe lanes. If you may go for up to 4 GPUs, I'd go for AMD Threadrippers for the 64 PCIe lanes.
For machine learning in a general manner, core & thread count is quite important, so TR is still a good option, depending on the budget of course.
Few poeple mention that running an instance on each GPU may be more interesting, if you do so, mising GPUs is not a problem.
32GB of ram seems good, no need to go for 4 sticks if your CPU does not support quad channel indeed.
I am avaluating Spin using Promela for model checking, but processing time is an issue to me.
I have seen that I can use Multi Core to improve the calculation but what about GPU/Cuda support to speed up the calculations ? Can I do this at all ?
regards
Adrian
GPU support is not included in Spin but is an active area of research. Most SPIN problems that are slow enough to seek a speed up are also large enough to exceed the local memory on a GPU. As a result the CPU memory needs to be used to store the explored state space and then memory bandwidth, CPU <==> GPU, swamp any computational speed increases. If however, your state space is small then the GPU may be amenable to use; yet, Spin does not include such support.
Most of the benchmarks for gpu performance and load testing are graphics related. Is there any benchmark that is computationally intensive but not graphics related ? I am using
DELL XPS 15 laptop,
nvidia GT 525M graphics card,
Ubuntu 11.04 with bumblebee installed.
I want to load test my system to come up with a max load the graphics cards can handle. Are there any non-graphics benchmarks for gpu ?
What exactly do you want to measure?
To measure GFLOPS on the card just write a simple Kernel in Cuda (or OpenCL).
If you have never written anything in CUDA let me know and i can post something for you.
If your application is not computing intensive (take a look at a roofline paper) then I/O will be the bottleneck. Getting data from global (card) memory to the processor takes 100's of cycles.
On the other hand if your application IS compute intensive then just time it and calculate how many bytes you process per second. In order to hit the maximum GFLOPS (your card can do 230) you need many FLOPs per memory access, so that the processors are busy and not stalling for memory and switching threads.
AMD announced it's Fusion platform some time ago. Having read a bit about it I'm both excited and sceptic. For example it should make it possible that GPUs and CPUs share the same memory. (and the GPU and CPU are both in the same package) Now since GPUs have a much higher memory bandwidth (around 10x the bandwidth a CPU has) and that the way CPUs and GPUs use cache is fundamentally different, the question arises how the heck can they do this? I wonder if any details are known.
I have also searched for some detailed info on how this APU technically works, but haven't found anything better than AMD's whitepaper on the subject, which, in a slightly marketing-wise tone, does present a lot of good info.
By using high-bandwidth dual ported RAM. AMD is happy to explain.
Here is a good paper (from AMD) explaining the different memory access patterns in an APU
http://amddevcentral.com/afds/assets/presentations/1004_final.pdf
CPUs and GPUs have been sharing memory for years. Fusion is no different, in fact it's far from the first instance of a GPU being integrated into a general-purpose CPU core.
Like all other such solutions, it'll be fine for casual use, but it'll be far from cutting-edge 3D acceleration.