How to free allocated GPU memory on an EC2 CUDA GPU instance - gpu

I have a p3.2xlarge instance, I ran a couple of experiments on the instance, and now that I want to run a new experiment (deep learning) I get the following error:
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 15.78 GiB total capacity; 14.70 GiB already allocated; 34.44 MiB free; 14.76 GiB reserved in total by PyTorch)
I wonder if there's any way I can free the allocated memory so that I can run my experiment? Is freeing the memory even the right solution for such an error?
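If the earlier experiments ran in the same Python process (for example, a long-lived notebook kernel), a sketch along these lines may free the memory; 'model' and 'optimizer' are hypothetical names for whatever the previous run left referenced. If the memory is held by a different process instead, nvidia-smi will show its PID so you can terminate that process.

import gc
import torch

del model, optimizer        # hypothetical leftovers from the previous experiment
gc.collect()
torch.cuda.empty_cache()    # return PyTorch's cached blocks to the driver

# Confirm what the current process still holds on GPU 0.
print(torch.cuda.memory_allocated(0), torch.cuda.memory_reserved(0))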

Related

How to use GPU in Paperspace when running Transformers Pipeline?

We are trying to run a HuggingFace Transformers pipeline model in Paperspace (using its GPU).
The problem is that when we set 'device=0' we get this error:
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.90 GiB total capacity; 476.40 MiB already allocated; 7.44 MiB free; 492.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
And if we don't set 'device=0', the GPU isn't used (which is expected, because the default is not to use it).
How can we make the GPU work for these models in Paperspace?
The code is pretty simple so far:
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="Sahajtomar/German_Zeroshot",
    # device=0
)
On Google Colab, with the same model and the same code, we didn't have this problem. We just set 'device=0' and the GPU ran perfectly.
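Since the error above says PyTorch itself has reserved only about 492 MiB yet just 7.44 MiB of the 15.90 GiB card is free, something else on the machine is likely holding the GPU. A quick, generic check along these lines (nothing Paperspace-specific) can confirm that before the pipeline is created:

import subprocess
import torch

# List every process currently holding GPU memory (requires the NVIDIA driver tools).
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

# Free vs. total device memory as PyTorch sees it, in GiB.
free, total = torch.cuda.mem_get_info(0)
print(f"free {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")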

No additional memory from Colab Pro+

I've been running this notebook with the Runtime Type as "high-RAM" "GPU." I was getting the following error:
CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 15.90 GiB total capacity; 14.81 GiB already allocated; 31.75 MiB free; 14.94 GiB reserved in total by PyTorch)
So I upgraded from Pro to Pro+, because that's supposed to give me more memory, but I'm still getting the same error.
I don't think a better GPU was promised with Colab Pro+.
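The extra memory from Pro+ generally means more system RAM, while the OOM above is on the GPU, which is a separate pool. A small sketch like this, run in a Colab cell, shows which of the two actually changed after upgrading:

import psutil
import torch

print(f"system RAM: {psutil.virtual_memory().total / 2**30:.1f} GiB")
if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info(0)
    print(f"GPU memory: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")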

aws gpu oom issue onnx cuda

Doing predictions on an AWS GPU instance, g4dn.4xlarge (16 GB GPU memory, 64 GB CPU memory), deployed with Kubernetes and Docker.
Tested with (CUDA 10.1 + onnxruntime-gpu==1.4.0) and (CUDA 10.2 + onnxruntime-gpu==1.6.0); same error in both. The models are customised for our purpose, so I can't point to the weights.
The problem:
Getting a CUDA OOM (out of memory) error:
Error: onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Conv node. Name:'Conv_16' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:298 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool) Failed to allocate memory for requested buffer of size 33554432
On some backtracking:
Using nvidia-smi and GPU memory profiling, I found that from the first prediction onward a constant minimum of ~1.8 GB of GPU memory (for some models ~3 GB) stays blocked (I think it's blocked for multiprocessing). Releasing that memory doesn't make sense, because the next prediction will block the same amount again.
My understanding:
So at peak we scale up to 22 pods, and in every pod the model load is initialized, hence every pod blocks 1.8-3 GB of memory while pointing to one GPU instance with 16 GB of GPU memory. So with 22 pods, OOM is expected.
What is confusing:
The CUDA message above reports OOM, but GPU profiling shows memory utilisation never exceeds 50%, even though SM (streaming multiprocessor) utilisation is 100% at peak (when pods scale to 22). Image attached for reference.
From my research I understood that SM has nothing to do with OOM and that CUDA handles SM scheduling efficiently. So why am I getting a CUDA OOM error if only 50% of the memory is utilised?
Ruled out:
I ruled out a memory leak in the model, as it runs without OOM errors when the load is low.
Why GPU and not CPU for prediction:
I want faster predictions. It ran on CPU without any error, even under high load.
What I am looking for:
A way to scale AWS GPU instances on the basis of GPU memory. If OOM is the cause, scaling on GPU memory should solve the problem, but I can't find such an option.
An explanation of the CUDA message: why OOM when memory is still available?
Very hypothetically: is there a way, by design or via Kubernetes, to create a singleton object for a particular model load so that scaled-up pods reuse that model object for prediction instead of each creating a new server? But that would defeat the purpose of using Kubernetes for availability and scalability.
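One knob that may help regardless of the scaling story: recent onnxruntime-gpu releases let you cap the CUDA BFC arena per session (older releases expose this differently, if at all), so each pod holds a bounded slice of the 16 GB card. A hedged sketch, where the 2 GB limit and the "model.onnx" path are placeholder assumptions:

import onnxruntime as ort

providers = [
    (
        "CUDAExecutionProvider",
        {
            "device_id": 0,
            "gpu_mem_limit": 2 * 1024 * 1024 * 1024,      # bytes; assumed per-pod budget
            "arena_extend_strategy": "kSameAsRequested",  # grow only by what is requested
        },
    ),
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)  # placeholder model path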

Loading a large set of images kills the process

Loading 1500 images of size (1000, 1000, 3) breaks the code and throws a kill 9 without any further error. Memory used before this line of code is 16% of total system memory. The total size of the images directory is 7.1 GB.
import numpy as np

X = np.asarray(images).astype('float64')
y = np.asarray(labels).astype('float64')
System spec:
OS: macOS Catalina
processor: 2.2 GHz 6-Core Intel Core i7
memory: 16 GB 2400 MHz DDR4
Update:
I am getting the below error while running the code on 32 vCPUs and 120 GB of memory.
MemoryError: Unable to allocate 14.1 GiB for an array with shape (1200, 1024, 1024, 3) and data type float32
You would have to provide some more info/details for an exact answer, but assuming this is a memory error (incredibly likely): the size of the images on disk does not represent the size they will occupy in memory, so that figure is largely irrelevant. In practically every case, the images in memory occupy a lot more space, since they are decompressed into raw arrays and wrapped in the objects needed to hold them. Intuitively I would say that 16 GB of RAM is nowhere near enough to load 7 GB of images. It's impossible to say exactly how much you would need, but from experience I would say you'd need to bump it up to 64 GB. If you are using Keras, I would suggest looking into the DirectoryIterator.
Edit:
As Cris Luengo pointed out, I missed the fact that you stated the size of the images.
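For a rough sense of scale (plain arithmetic, not a measurement), the decoded arrays dwarf the 7.1 GB on disk, and the error in the Update matches the math exactly:

# Back-of-the-envelope size of the arrays being built.
n, h, w, c = 1500, 1000, 1000, 3
print(n * h * w * c * 8 / 2**30)            # float64: ~33.5 GiB, far beyond 16 GB of RAM
print(1200 * 1024 * 1024 * 3 * 4 / 2**30)   # the Update's float32 array: ~14.1 GiB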

Allocating GPU memory for cupy arrays

I have a TensorFlow session running in parallel to this CuPy code. I have allocated 8 GB of my 16 GB of total GPU memory to the TensorFlow session. What I want now is to allocate 2 GB of the remaining 7 GB for executing this CuPy code. The actual code is more involved than the example I provided: in my actual code, cp_arr is the result of a series of array operations, but I want cp_arr to live in the specified 2 GB region of my GPU memory. Remember, freeing GPU resources by closing the TensorFlow session is not an option.
This is the code I am using.
import cupy as cp

memory = cp.cuda.Memory(2048000000)                # raw ~2 GB device allocation
ptr = cp.cuda.MemoryPointer(memory, 0)
cp_arr = cp.ndarray(shape=(30, 1080, 1920, 3), memptr=ptr)
cp_arr = ** Array operations **
In this case, an additional 1.7 GB was allocated while 'cp_arr = ** Array operations **' executed. What I want is for my CuPy array, cp_arr, to use the 2 GB I already allocated. Thanks in advance.
CuPy's memory allocation behavior is similar to NumPy's.
As in NumPy, several functions support an out argument that stores the computation result into a specified, preallocated array; see, e.g., https://docs-cupy.chainer.org/en/stable/reference/generated/cupy.dot.html
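A minimal sketch of that out pattern in CuPy (the shapes and float32 dtype here are assumptions chosen to stay near the 2 GB budget): allocate the result buffer once, then have operations write into it rather than allocating a fresh array each time.

import cupy as cp

# Preallocated result buffer: 30*1080*1920*3 float32 values, roughly 0.7 GB.
out = cp.empty((30, 1080, 1920, 3), dtype=cp.float32)

a = cp.ones((30, 1080, 1920, 3), dtype=cp.float32)
b = cp.ones((30, 1080, 1920, 3), dtype=cp.float32)
cp.add(a, b, out=out)   # the result lands in the existing buffer; no new allocation for it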