Limit RAM usage for a C++ application

I have 3 GB of RAM. Is there a way to allocate only 512 MB of RAM to a C++ application?
Alternatively, is there a way to reduce my available RAM to 512 MB temporarily, for testing purposes?
Thanks,
Ashok

Use SetProcessWorkingSetSize().
It sets the minimum and maximum working set sizes for the specified process.
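Note that SetProcessWorkingSetSize() is Windows-specific. For the "testing under low memory" half of the question, POSIX systems offer a comparable trick: setrlimit(2) with RLIMIT_AS caps a process's virtual address space, and a C++ program can call it via <sys/resource.h>. Here is a minimal sketch of the same idea using Python's stdlib resource module (the 512 MB figure and variable names are illustrative; RLIMIT_AS is enforced on Linux but not on macOS):

```python
import resource

# POSIX only (enforced on Linux): cap this process's virtual address space.
limit = 512 * 1024 * 1024                     # 512 MB, for illustration
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

# Allocations that would push the process past the cap now fail,
# which is exactly what you want when testing low-memory behaviour.
try:
    buf = bytearray(1024 * 1024 * 1024)       # try to grab 1 GB
    refused = False
except MemoryError:
    refused = True

print("allocation refused:", refused)
```

The limit applies to the whole address space (code, stacks, mappings), not just the heap, so leave some headroom above the amount you actually want the application to use.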

Related

Large cupy array running out of GPU RAM

This is a total newbie question but I've been searching for a couple days and cannot find the answer.
I am using cupy to allocate a large array of doubles (circa 655k rows x 4k columns), which is about 16 GB in RAM. I'm running on a p2.8xlarge (the AWS instance that claims to have 96 GB of GPU RAM and 8 GPUs), but when I allocate the array it gives me an out-of-memory error.
Is this happening because the 96 GB of RAM is split into 8 x 12 GB lots that are only accessible to each GPU? Is there no concept of pooling the GPU RAM across the GPUs (like regular RAM in a multi-CPU situation)?
From playing around with it a fair bit, I think the answer is no, you cannot pool memory across GPUs. You can move data back and forth between GPUs and the CPU, but there is no concept of unified GPU RAM accessible to all GPUs.
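A quick back-of-envelope check, taking the stated dimensions at face value and assuming float64 (8 bytes per element) and the 12 GB per GPU that a p2.8xlarge provides, shows why the allocation must fail on any single GPU:

```python
rows, cols = 655_000, 4_000      # stated array shape
bytes_per_double = 8             # float64

array_bytes = rows * cols * bytes_per_double
array_gib = array_bytes / 2**30

per_gpu_gib = 12                 # one GPU's memory on a p2.8xlarge

print(f"array needs ~{array_gib:.1f} GiB; a single GPU offers {per_gpu_gib} GiB")
```

At these dimensions the array is closer to 19.5 GiB than 16, and either way it exceeds what one GPU can hold, so the allocation fails regardless of the 96 GB aggregate.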

CPU and GPU memory sharing

If the (discrete) GPU has its own video RAM, I have to copy my data from RAM to VRAM to be able to use them. But if the GPU is integrated with the CPU (e.g. AMD Ryzen) and shares the memory, do I still have to make copies, or can they both alternatively access the same memory block?
It is possible to avoid copying in the case of integrated graphics, but this feature is platform specific, and it may work differently for different vendors.
The article How to Increase Performance by Minimizing Buffer Copies on Intel® Processor Graphics describes how to achieve this for Intel hardware:
To create zero copy buffers, do one of the following:
Use CL_MEM_ALLOC_HOST_PTR and let the runtime handle creating a zero copy allocation buffer for you
If you already have the data and want to load the data into an OpenCL buffer object, then use CL_MEM_USE_HOST_PTR with a buffer allocated at a 4096 byte boundary (aligned to a page and cache line boundary) and a total size that is a multiple of 64 bytes (cache line size).
When reading or writing data to these buffers from the host, use clEnqueueMapBuffer(), operate on the buffer, then call clEnqueueUnmapMemObject().
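The size rule in the second option above (start aligned to 4096 bytes, total size a multiple of 64 bytes) is easy to get wrong. Here is a tiny helper, with hypothetical names, that just computes a compliant buffer size; the 4096-byte start alignment itself would be handled by the allocator (e.g. posix_memalign or _aligned_malloc in a C/C++ host program):

```python
PAGE_ALIGN = 4096   # CL_MEM_USE_HOST_PTR buffers must start on a page boundary
CACHE_LINE = 64     # and their total size must be a multiple of the cache line

def zero_copy_size(nbytes: int) -> int:
    """Round a requested byte count up to the next multiple of 64,
    per the Intel zero-copy guidelines quoted above."""
    return (nbytes + CACHE_LINE - 1) // CACHE_LINE * CACHE_LINE

print(zero_copy_size(1000))   # rounds 1000 up to 1024
```

Passing a buffer that violates either rule does not fail; the runtime silently falls back to copying, so it is worth asserting these conditions in the host code.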
GPU and CPU memory sharing?
A GPU has many cores but no control unit of its own; the CPU drives the GPU through its control unit. A dedicated GPU has its own DRAM (also called VRAM or GRAM), which is faster than shared system RAM. An integrated GPU is placed on the same chip as the CPU, and the CPU and GPU use the same RAM (shared memory).
References to other similar Q&As:
GPU - System memory mapping
Data sharing between CPU and GPU on modern x86 hardware with OpenCL or other GPGPU framework

Does Swap memory help if my RAM is insufficient?

I'm new to Stack Overflow and don't have enough reputation to post a comment, so I'm opening a new question.
I am running into the same issue as this:
why tensorflow just outputs killed
In this scenario, does SWAP memory help?
A little more info on the platform:
Raspberry Pi 3 on Ubuntu MATE 16.04
RAM: 1 GB
Storage: 32 GB SD card
Framework: TensorFlow
Network architecture: similar in complexity to AlexNet
Appreciate any help!
Thanks
SK
While swap may stave off the hard failure seen in the linked question, swapping will generally doom your inference. The difference in throughput and latency between RAM and any other form of storage is simply far too large.
As a rough estimate, RAM throughput is about 100x that of even a high-quality SD card. The factor for I/O operations per second is even larger, somewhere between 100,000 and 5,000,000.
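To make that 100x factor concrete, here is a rough estimate of one pass over an AlexNet-sized weight set from RAM versus from SD-card swap. All three numbers are assumptions for illustration: ~240 MB of float32 weights, ~10 GB/s for RAM, ~100 MB/s for a good SD card:

```python
model_mb = 240            # AlexNet-scale weights in float32 (rough assumption)
ram_mb_per_s = 10_000     # ~10 GB/s RAM bandwidth (rough assumption)
sd_mb_per_s = 100         # ~100 MB/s for a good SD card (rough assumption)

ram_s = model_mb / ram_mb_per_s   # time to stream the weights from RAM
sd_s = model_mb / sd_mb_per_s     # time to stream them back in from swap

print(f"one pass over the weights: RAM ~{ram_s * 1000:.0f} ms, SD swap ~{sd_s:.1f} s")
```

And that is the best case of purely sequential reads; real swap traffic is page-sized and scattered, where the IOPS gap dominates and the slowdown is far worse.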

Is it possible to make CPU work on large persistent storage medium directly without using RAM?

Is it possible to make a CPU work on a large persistent storage medium directly, without using RAM? I am fine if performance is low.
You would need to specify which CPU architecture you are interested in; most standard architectures (x86, Power, ARM) assume the existence of RAM on their data bus. I am afraid only a custom board for these processors would allow something like an SSD to be used instead of RAM.
Some numbers comparing RAM vs SSD latencies: https://gist.github.com/jboner/2841832
Also, RAM is there for a reason: to "smooth" the CPU's access to bigger, slower storage. Have a look at the memory-hierarchy discussion at https://www.reddit.com/r/hardware/comments/2bdnny/dram_vs_pcie_ssd_how_long_before_ram_is_obsolete/
As a side note, it is possible to access persistent storage without involving the CPU (although RAM is still needed); see https://www.techopedia.com/definition/2767/direct-memory-access-dma or https://en.wikipedia.org/wiki/Direct_memory_access

What is the minimum physical RAM requirement if I want to give the JVM 8 GB of virtual RAM?

Recently I have been trying to integrate mateplus, a semantic role labeling tool, into my work. It requires up to 8 GB of JVM virtual RAM for heap memory. Can anybody tell me what the minimum physical RAM requirement is if I want to do that?
It would be good to have at least 8 GB of RAM, but the JVM allocates virtual memory, not physical memory. The OS will assign physical memory to virtual memory as necessary and efficient. Modern operating systems allow virtual memory consumption to significantly exceed physical memory. This may or may not lead to poor performance, depending on the working set size, i.e. roughly how much of the virtual memory is actually accessed.
In simple words: if the OS has less physical memory than the JVM needs, it will swap pages out to disk, which will hurt performance.
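The virtual-vs-physical distinction is easy to observe directly: an anonymous mapping reserves address space without consuming physical RAM until its pages are actually touched, which is exactly how the JVM can reserve an 8 GB heap on a smaller machine. A small sketch on Linux (the 1 GiB size is just for illustration; ru_maxrss units differ on macOS):

```python
import mmap
import resource

def peak_rss_kib() -> int:
    # Peak resident set size of this process (reported in KiB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss_kib()

# Reserve 1 GiB of *virtual* memory: address space only, no physical pages yet.
region = mmap.mmap(-1, 1024**3)
after_reserve = peak_rss_kib()

# Touch a single page: only now does the OS back it with physical memory.
region[:4096] = b"\0" * 4096

print(f"RSS grew by ~{(after_reserve - before) // 1024} MiB after reserving 1 GiB")
```

Resident memory barely moves after the reservation; it grows page by page as the memory is written, which is why a -Xmx well above physical RAM works until the live working set outgrows it.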