Unable to allocate memory for an array - pandas

I have a system with 8 GB RAM and a 0.5 TB HDD, on which I'm trying to load a 1.2 GB CSV file in a Jupyter Notebook. I am getting the following error:
Unable to allocate 64.0 KiB for an array with shape (8192,) and data type int64
Is there any way to load my file without the notebook crashing?
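A common workaround is to read the CSV in chunks and downcast the numeric columns, so the full file never has to sit in memory at the default dtypes. A rough sketch, assuming the file is called data.csv (path, chunk size and dtype handling are placeholders to adapt):

import pandas as pd

# Read the 1.2 GB file in pieces instead of all at once.
chunks = []
for chunk in pd.read_csv("data.csv", chunksize=100_000):
    # Downcast int64 columns to the smallest integer type that fits.
    for col in chunk.select_dtypes("int64").columns:
        chunk[col] = pd.to_numeric(chunk[col], downcast="integer")
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)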

Related

Memory related error when uploading images in Laravel

I am currently using Laravel 8, and I am using AWS Lightsail to run the service.
Among the services, there is a function to upload images of up to 50 MB. When using this function, the following error sometimes occurs.
Currently, PHP's memory limit is set to 512M, and the Lightsail instance has 16 GB of RAM. Is there any workaround for this error?
error:
[2022-11-20 19:02:43] local.ERROR: Allowed memory size of 536870912
bytes exhausted (tried to allocate 36864 bytes)
{"userId":13111,"exception":"[object]
(Symfony\Component\ErrorHandler\Error\FatalError(code: 0): Allowed
memory size of 536870912 bytes exhausted (tried to allocate 36864
bytes) at
/var/www/project/vendor/intervention/image/src/Intervention/Image/Gd/Decoder.php:154)
[stacktrace]
Which method should I use?

How to use GPU in Paperspace when running Transformers Pipeline?

We are trying to run a HuggingFace Transformers pipeline model in Paperspace (using its GPU).
The problem is that when we set 'device=0' we get this error:
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 15.90 GiB total capacity; 476.40 MiB already allocated; 7.44 MiB free; 492.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
And if we don't set 'device=0', the GPU isn't used (which is expected, since the default is not to use it).
How can we make the GPU work for these models in Paperspace?
The code is pretty simple so far:
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="Sahajtomar/German_Zeroshot",
    # device=0
)
On Google Colab, with the same model and the same code, we didn't have this problem: we just set 'device=0' and the GPU ran perfectly.
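The "7.44 MiB free" in the error suggests the GPU is already almost fully occupied before the pipeline is created. A minimal check worth running first, as a sketch (standard PyTorch calls, nothing specific to Paperspace):

import torch
from transformers import pipeline

# Confirm a GPU is visible and see how much memory is actually free on it.
print(torch.cuda.is_available())
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"free: {free_bytes / 1e9:.2f} GB of {total_bytes / 1e9:.2f} GB")

# If enough memory is free, pass the device index to the pipeline as before.
classifier = pipeline(
    "zero-shot-classification",
    model="Sahajtomar/German_Zeroshot",
    device=0,
)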

pandas, problem with sklearn's KNNImputer: MemoryError: Unable to allocate 2.37 PiB for an array with shape (2567655, 130060533) and data type float64

I have data with the shape 130060533 rows × 4 columns, and I am trying to run KNNImputer(n_neighbors=2) from sklearn.impute, but I get this message:
MemoryError: Unable to allocate 2.37 PiB for an array with shape
(2567655, 130060533) and data type float64
I tried reducing the size of the table by using the int8 dtype, but it made no difference.
I am running this in a Jupyter notebook on Azure ML Studio, on a machine with 64 cores and 128 GB RAM.
I also checked that I am using 64-bit Python.
Do you have any suggestions?
thanks
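The (2567655, 130060533) array is the pairwise distance matrix between the rows that contain missing values and every row seen at fit time, so the size of the fit set is what blows up the memory. One possible workaround is to fit the imputer on a random subsample and transform in chunks; a rough sketch, assuming the data is in a DataFrame called df (the sample and chunk sizes are arbitrary illustrations):

import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)
# Fit on a subsample so the distance matrix has far fewer columns.
sample_idx = rng.choice(len(df), size=10_000, replace=False)
imputer = KNNImputer(n_neighbors=2)
imputer.fit(df.iloc[sample_idx])

# Transform in chunks so only a slice of the distance matrix exists at a time.
parts = [imputer.transform(df.iloc[i:i + 100_000])
         for i in range(0, len(df), 100_000)]
imputed = np.vstack(parts)

The imputed values are then based on neighbours drawn from the subsample rather than the full table, which is usually an acceptable approximation; if even that is too heavy, SimpleImputer avoids the distance matrix entirely.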

Dask-RAPIDS data movement and out-of-memory issue

I am using Dask (2021.3.0) and RAPIDS (0.18) in my project. In it, I perform the preprocessing on the CPU, and the preprocessed data is then transferred to the GPU for K-means clustering. In this process I am getting the following problem:
1 of 1 worker jobs failed: std::bad_alloc: CUDA error: ~/envs/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory
(The error is raised before GPU memory is fully used, i.e. it is not exhausting the GPU memory.)
I have a single GPU with 40 GB of memory.
RAM size is 512 GB.
I am using the following snippet of code:
from dask.distributed import Client, LocalCluster
from cuml.dask.cluster import KMeans  # assumed: cuML's Dask K-means, to match .compute()
import cupy as cp

cluster = LocalCluster(n_workers=1, threads_per_worker=1)
cluster.scale(100)
client = Client(cluster)  # connect so the graph actually runs on this cluster
# perform my preprocessing on data and get the output in variable A
# convert the variable A to cupy-backed blocks
x = A.map_blocks(cp.asarray)
km = KMeans(n_clusters=4)
predict = km.fit_predict(x).compute()
I am also looking for a solution so that data larger than GPU memory can be preprocessed and, whenever GPU memory spills, the spilled data is moved to a temp directory or to the CPU (as we do with Dask, where we define a temp directory for spills from RAM).
Any help will be appreciated.
There are several ways to work with datasets larger than GPU memory:
Check out Nick Becker's blog, which has a few methods well documented.
Check out BlazingSQL, which is built on top of RAPIDS and can perform out-of-core processing. You can try it at beta.blazingsql.com.
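Regarding the spilling part of the question: dask-cuda's LocalCUDACluster exposes a device_memory_limit for spilling from GPU to host memory and a local_directory for spilling further to disk. A rough sketch of how that could replace the LocalCluster above (the limits and path are illustrative, not tested values):

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from cuml.dask.cluster import KMeans
import cupy as cp

cluster = LocalCUDACluster(
    n_workers=1,
    device_memory_limit="30GB",   # start spilling before the 40 GB GPU fills up
    memory_limit="400GB",         # host-RAM budget per worker before spilling to disk
    local_directory="/tmp/dask-spill",
)
client = Client(cluster)

x = A.map_blocks(cp.asarray)      # A is the preprocessed dask array from the question
km = KMeans(n_clusters=4)
predict = km.fit_predict(x).compute()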

Loading a large set of images kills the process

Loading 1500 images of size (1000, 1000, 3) breaks the code: the process is killed with signal 9 and no further error. Memory used before this line of code is 16% of total system memory. The total size of the images directory is 7.1 GB.
X = np.asarray(images).astype('float64')
y = np.asarray(labels).astype('float64')
System spec:
OS: macOS Catalina
Processor: 2.2 GHz 6-Core Intel Core i7
Memory: 16 GB 2400 MHz DDR4
Update:
Getting the below error while running the code on a machine with 32 vCPUs and 120 GB of memory.
MemoryError: Unable to allocate 14.1 GiB for an array with shape (1200, 1024, 1024, 3) and data type float32
You would have to provide some more info/details for an exact answer, but assuming this is a memory error (which is very likely): the size of the images on disk does not represent the size they occupy in memory, so that figure is largely irrelevant. In memory, the images will always occupy a lot more space, due to pointers, the objects that are needed, and so on. Intuitively I would say that 16 GB of RAM is nowhere near enough to load 7 GB of images. It's impossible to tell you exactly how much you would need, but from experience I would say you'd need to bump it up to 64 GB. If you are using Keras, I would suggest looking into the DirectoryIterator.
Edit:
As Cris Luengo pointed out, I missed the fact that you stated the size of the images.
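Since the answer mentions the Keras DirectoryIterator, here is a rough sketch of that route plus the cheaper dtype fix (the path, target size and batch size are placeholders): keeping the pixels as uint8 instead of float64 already shrinks 1500 × 1000 × 1000 × 3 pixels from roughly 36 GB to roughly 4.5 GB, and a generator avoids materialising the whole array at all.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Option 1: keep the raw pixels as uint8 and convert to float per batch.
X = np.asarray(images, dtype=np.uint8)   # ~4.5 GB instead of ~36 GB as float64
y = np.asarray(labels, dtype=np.float32)

# Option 2: stream batches from disk; flow_from_directory returns a DirectoryIterator.
gen = ImageDataGenerator(rescale=1.0 / 255)
batches = gen.flow_from_directory(
    "images/",                 # hypothetical layout: one subfolder per class
    target_size=(1000, 1000),
    batch_size=32,
    class_mode="categorical",
)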