Your notebook tried to allocate more memory than is available. It has restarted - pandas

I was getting started with TalkingData AdTracking, my first entry to Kaggle competitions. The first line was pd.read_csv(), and I got this error:
Your notebook tried to allocate more memory than is available. It has restarted
I thought my code ran in the cloud and that I didn't have to worry about memory requirements. Can someone help me with that?

Yes, Kaggle has a memory limit: it's 8 GB per kernel, or 20 minutes of running time. It takes a lot of server juice to host such a thing.
There are various solutions for this problem, notably loading and processing the dataset in chunks.
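A minimal sketch of the chunking idea, assuming you only need an aggregate (here, a count per value of one column) rather than the whole table in memory at once. In pandas the same pattern is `pd.read_csv(path, chunksize=...)`, which returns an iterator of DataFrames; the sketch uses only the standard `csv` module so it runs anywhere. The column name `app`, the helper names and the chunk size are illustrative assumptions.

```python
import csv
import io
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, TextIO


def iter_chunks(rows: Iterable, chunk_size: int) -> Iterator[list]:
    """Yield lists of at most chunk_size rows, so only one chunk is held at a time."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def count_column(f: TextIO, column: str, chunk_size: int = 100_000) -> Counter:
    """Count occurrences of each value in `column`, reading the CSV chunk by chunk."""
    totals = Counter()
    reader = csv.DictReader(f)
    for chunk in iter_chunks(reader, chunk_size):
        # Fold this chunk into the running totals; the chunk list is then discarded.
        totals.update(row[column] for row in chunk)
    return totals


if __name__ == "__main__":
    # Tiny in-memory stand-in for a large file such as train.csv (hypothetical data).
    sample = io.StringIO("ip,app\n1,3\n2,3\n3,12\n4,3\n5,12\n")
    print(count_column(sample, "app", chunk_size=2))
```

With pandas the loop would look like `for chunk in pd.read_csv("train.csv", chunksize=1_000_000): ...`, updating a running result inside the loop instead of keeping every chunk.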
There is this as well, but I have no experience with it.
And of course you can use another cloud platform such as AWS, GCP, Oracle, etc.

Related

Is it possible to increase the RAM in Google Colab another way?

When I run this code in Google Colab:
n = 100000000
i = []
# Deliberately grows the list forever, so memory use climbs until the runtime crashes
while True:
    i.append(n * 10**66)
It happens to me all the time; my data is huge. After it hits 12.72 GB of RAM the session crashes, but I don't get the crash prompt with the option to increase my RAM.
I only get this: Your session crashed after using all available RAM. View runtime logs
What is the solution? Is there another way?
You either need to upgrade to Colab Pro or, if your own computer has more RAM than the Colab VM, connect to a local runtime instead.
Colab Pro will give you about twice as much memory as you have now. If that’s enough, and you’re willing to pay $10 per month, that’s probably the easiest way.
If you instead want to use a local runtime, click the down arrow next to “Connect” in the top right and choose “Connect to local runtime”.
The policy was changed. However, currently, this workaround works for me:
Open this notebook and copy it to your Drive. Check whether you already have 25 GB of RAM by hovering over the RAM indicator in the top right (this was the case for me). If not, follow the instructions in the Colab notebook.
Source: GitHub
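If you would rather check the runtime's RAM from code than by hovering over the indicator, a minimal sketch is to parse `/proc/meminfo`, assuming a Linux VM (Colab runtimes run Linux). `MemTotal`, `MemFree` and `MemAvailable` are standard fields there, reported in kB; the function names are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def ram_gb(path: str = "/proc/meminfo") -> tuple:
    """Return (total, available) RAM in GiB; Linux only."""
    info = parse_meminfo(Path(path).read_text())
    # MemAvailable is missing on very old kernels, so fall back to MemFree.
    available_kb = info.get("MemAvailable", info["MemFree"])
    return info["MemTotal"] / 1024 ** 2, available_kb / 1024 ** 2


if __name__ == "__main__" and Path("/proc/meminfo").exists():
    total, available = ram_gb()
    print(f"RAM: {total:.1f} GiB total, {available:.1f} GiB available")
```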
To double the RAM of Google Colab, use this notebook; it gives you 25 GB of RAM. Note: set the runtime type to "None" to double the RAM, then change it back to GPU or TPU.
https://colab.research.google.com/drive/155S_bb3viIoL0wAwkIyr1r8XQu4ARwA9?usp=sharing
As you said, you have 12 GB, and this needs a lot of RAM.
If you need a small increase, you can use Colab Pro.
If you need a large increase and are using a deep learning framework, my advice is to use one of:
1- a university computer (ACADEMIC & RESEARCH COMPUTING)
2- a platform like AWS, GCP, etc.
3- your own powerful computer with a GPU (I don't recommend this)

Can multiple Colab notebooks share the same Runtime?

In Q1 2019, I ran some experiments and I noticed that Colab notebooks with the same Runtime type (None/GPU/TPU) would always share the same Runtime (i.e., the same VM). For example, I could write a file to disk in one Colab notebook and read it in another Colab notebook, as long as both notebooks had the same Runtime type.
However, I tried again today (October 2019) and it now seems that each Colab notebook gets its own dedicated Runtime.
My questions are:
When did this change happen? Was this change announced anywhere?
Is this always true now? Will Runtimes sometimes be shared and sometimes not?
What is the recommended way to communicate between two Colab notebooks? I'm guessing Google Drive?
Thanks
Distinct notebooks are indeed isolated from one another. Isolation isn't configurable.
For file sharing, I think you're right that Drive is the best bet as described in the docs:
https://colab.research.google.com/notebooks/io.ipynb#scrollTo=u22w3BFiOveA
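A minimal sketch of a Drive-based hand-off between two notebooks, assuming both have mounted Drive with `from google.colab import drive; drive.mount('/content/drive')` and agree on a folder. The folder name `colab_shared`, the `SHARED_DIR` environment variable and the helper names are hypothetical. Writing to a temporary name and then renaming keeps a reader from picking up a half-written file on a local filesystem; whether the rename is equally atomic on the Drive mount is an assumption. Because Drive syncs between separate runtimes with some delay, the reader polls.

```python
import json
import os
import time
from pathlib import Path

# Hypothetical shared folder under the Drive mount point.
SHARED_DIR = Path(os.environ.get("SHARED_DIR", "/content/drive/MyDrive/colab_shared"))


def publish(name: str, payload: dict, shared_dir: Path = SHARED_DIR) -> Path:
    """Write payload as JSON via a temp file + rename, so readers never see a partial file."""
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir / f"{name}.json"
    tmp = shared_dir / f".{name}.json.tmp"
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, target)
    return target


def wait_for(name: str, shared_dir: Path = SHARED_DIR,
             timeout: float = 60.0, poll: float = 2.0) -> dict:
    """Poll until another notebook has published `name`, then load and return it."""
    target = shared_dir / f"{name}.json"
    deadline = time.monotonic() + timeout
    while not target.exists():
        if time.monotonic() > deadline:
            raise TimeoutError(f"{target} did not appear within {timeout} s")
        time.sleep(poll)
    return json.loads(target.read_text())
```

Notebook A would call `publish("stage1", {...})` and notebook B `wait_for("stage1")`. For large arrays or model weights, the same pattern works with a binary file instead of JSON.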
I have found no easy way of running multiple notebooks within the same runtime. That being said, I have no idea how this affects the quota. On my own computer, I'd limit GPU memory per script and run multiple Python threads. Colab doesn't let you do this, and I think that if you do not use the whole amount of RAM, it should not be treated the same as if you had used the entire GPU for 12 or 24 hours. They could pool your tasks with other users'.

Loading 77 GB of data in Google Colab

I have a tar.gz file that contains 77 gigabytes of data, and I am trying to load it into Google Colab. But I get a
"Runtime died"
error, and then it automatically restarts. Can anyone please help?
Google Colab is meant for research or educational purposes, not for prolonged training. It has limitations, the most important being memory.
If you run:
!df -h
you will find that the disk space allocated to the runtime is about 45-50 GB (47 GB, to be precise). You are trying to load 77 GB, so don't you expect the runtime to die?
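Before extracting, it can help to compare the archive's uncompressed size with the free space on the runtime's disk. A minimal sketch using the standard `tarfile` module and `shutil.disk_usage`; the helper names and the default `/content` directory are assumptions. Listing a 77 GB `.tar.gz` still has to decompress the whole stream once, so this check is slow, but it needs almost no memory.

```python
import shutil
import tarfile


def uncompressed_size(archive_path: str) -> int:
    """Sum the sizes of the regular files in a .tar.gz, reading it as a stream."""
    total = 0
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if member.isfile():
                total += member.size
    return total


def fits_on_disk(archive_path: str, target_dir: str = "/content") -> bool:
    """True if the extracted contents would fit in the free space of target_dir's filesystem."""
    free_bytes = shutil.disk_usage(target_dir).free
    return uncompressed_size(archive_path) <= free_bytes
```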
If you still want to use Colab, try splitting your data into small parts: train on one part, delete it, load the next part from Google Drive, and repeat.
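The split-and-repeat idea can also be applied to the archive itself: instead of extracting all 77 GB, read the `.tar.gz` as a forward-only stream and handle one file at a time. A minimal sketch, assuming each file in the archive can be processed independently; `handle` stands in for your own preprocessing or training step and is hypothetical.

```python
import tarfile
from typing import Callable


def process_members(archive_path: str, handle: Callable[[str, bytes], None]) -> int:
    """Call handle(name, data) for each regular file in a .tar.gz, one file at a time.

    Mode "r|gz" reads the archive as a forward-only stream, so nothing is
    extracted to disk and only the current member's bytes are in memory.
    Returns the number of files processed.
    """
    processed = 0
    with tarfile.open(archive_path, mode="r|gz") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, etc.
            fileobj = tar.extractfile(member)
            handle(member.name, fileobj.read())
            processed += 1
    return processed
```

If individual files inside the archive are themselves too large for RAM, read `fileobj` in fixed-size blocks inside the loop instead of calling `read()` once.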
See this answer for more info on runtime hardware
What's the hardware spec for Google Colaboratory?

Shrinking JVM Memory and Swap

Virtual Machine:
4 CPUs
10 GB RAM
10 GB swap
Java 1.7
-Xms6144m -Xmx6144m (initial and maximum heap both 6144 MB)
Tomcat 7
We observed very strange behaviour with the JVM: its resident memory began to shrink, and swap usage shot up to over 50%.
Please see the stats from our monitoring tools below.
http://i44.tinypic.com/206n6sp.jpg
http://i44.tinypic.com/m99hl0.jpg
Any pointers to help understand this would be appreciated.
Thanks!
Or maybe your Java program was idle and didn't need that memory, and you have high swappiness? In that situation your OS would move the unused pages to swap just in case and keep only the part in use in RAM.
In my opinion, that is actually good behaviour: why waste RAM on a process that won't use it?
If this is the only process running on the VM, though, it would be a good idea to set swappiness to 0 or another small number: the memory was given to this single process, so we may as well disable swapping it out.
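A small sketch for checking the current setting from Python on Linux, where it is exposed as `/proc/sys/vm/swappiness`. Changing it needs root, e.g. `sysctl -w vm.swappiness=10`, and adding `vm.swappiness = 10` to `/etc/sysctl.conf` keeps it across reboots. The threshold of 10 used to call a value "low" is an arbitrary assumption, and the helper names are hypothetical.

```python
from pathlib import Path

SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def read_swappiness(path: Path = SWAPPINESS_PATH) -> int:
    """Return the kernel's current vm.swappiness (60 is the usual default)."""
    return int(path.read_text().strip())


def is_low_swappiness(value: int, threshold: int = 10) -> bool:
    """Heuristic for a dedicated single-process VM; the threshold is an assumption."""
    return value <= threshold


if __name__ == "__main__" and SWAPPINESS_PATH.exists():
    value = read_swappiness()
    print(f"vm.swappiness = {value}")
    if not is_low_swappiness(value):
        # Requires root; persist it via /etc/sysctl.conf.
        print("Consider: sudo sysctl -w vm.swappiness=10")
```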
Thanks for the response. Yes, this is closer to system troubleshooting than Java, but I thought this was the right forum to start the topic in case anybody has seen such a phenomenon with the JVM.
Anyway, I had already checked top, and no, there was no process other than Java that was hungry for memory. The second-largest process was using only 72 MB (RSS).
No, swappiness is not set aggressively on this system; it is at the default of 60. One additional piece of information I forgot to share: we have 4 app servers in a cluster, and all of them showed this behaviour at exactly the same time. AFAIK, the JVM does not swap itself out, but the OS would. That is what confuses me.
All these app servers are in production and busy serving requests, so they were not idle. The used heap size averaged 5 GB of the 6 GB.
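To see whether the kernel is moving the JVM's pages out of RAM, you can sample the Java process's resident and swapped memory from `/proc/<pid>/status`, assuming Linux and a kernel recent enough to report the `VmSwap` field. A minimal sketch; finding the Tomcat PID and choosing the sampling interval are left to you, and the function names are hypothetical.

```python
import time
from pathlib import Path


def parse_status(text: str) -> dict:
    """Extract VmRSS and VmSwap (in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmRSS", "VmSwap"):
            fields[key] = int(value.split()[0])
    return fields


def sample(pid: int, interval: float = 5.0, count: int = 3) -> list:
    """Take `count` (VmRSS_kB, VmSwap_kB) samples for a PID, `interval` seconds apart."""
    samples = []
    for i in range(count):
        fields = parse_status(Path(f"/proc/{pid}/status").read_text())
        samples.append((fields.get("VmRSS", 0), fields.get("VmSwap", 0)))
        if i < count - 1:
            time.sleep(interval)
    return samples
```

A falling VmRSS together with a rising VmSwap, while the heap is still about 5 GB used, would point at the OS (or the hypervisor) paging the JVM out rather than the JVM releasing memory.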
The other interesting thing I found was some failure messages in the VMware logs at the same time, which is what I'm investigating.

Redis taking up all server memory. What to do?

I have a Redis server running version 2.4.5 and with a dump.rdb of 11GB loaded into memory.
It is running on EC2 on a high memory 4x extra large instance (70GB total memory).
However, it turns out Redis is already taking up 50GB of memory and keeps growing. My dataset is still going to grow, probably to around 20GB, so clearly 70GB of memory won't be enough. Do you guys have any ideas on how to overcome this limitation or how to make Redis eat less memory?
I've tried 32-bit Redis, but it dies trying to load the data set into memory at startup.
I have also tried maxmemory in the past but got weird results. I haven't tried virtual memory since I read it is (or was) going to be deprecated.
Unlike the discussion in the comments, I think this problem can be solved with programming, not server configuration.
Systems like Redis work well when sharded. Once you have your scheme set up, you can get it to scale pretty easily. It does take some work in the client code to get it set up, though.
For example...
You could shard it across four instances using a modulo/hash scheme.
Basically, if md5sum(key) % 4 == 0, it goes to server 0; if md5sum(key) % 4 == 1, it goes to server 1, etc.
You'll have to add some logic to your client to make sure it accesses the right one. When you get a record, figure out which server it is supposed to be on, then query that one. If you have to set a record, figure out which server it is supposed to be on, then set it there.
The nice thing about this is that it doesn't hurt performance: each request still goes to exactly one server, and computing the hash is cheap.
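The routing described above fits in a few lines. This sketch assumes a list of backend objects exposing `get` and `set`; plain dict wrappers stand in for four Redis connections so it runs without a server. With the redis-py library each entry would instead be something like `redis.Redis(host=...)`, whose `get` and `set` methods have the same names. `DictBackend` and `ShardedClient` are hypothetical names.

```python
import hashlib
from typing import Any, Dict, List, Optional


class DictBackend:
    """In-memory stand-in for one Redis server (illustration only)."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


class ShardedClient:
    """Route each key to exactly one backend using md5(key) % number_of_backends."""

    def __init__(self, backends: List[Any]) -> None:
        self.backends = backends

    def shard_for(self, key: str) -> int:
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.backends)

    def get(self, key: str) -> Optional[Any]:
        return self.backends[self.shard_for(key)].get(key)

    def set(self, key: str, value: Any) -> None:
        self.backends[self.shard_for(key)].set(key, value)


if __name__ == "__main__":
    client = ShardedClient([DictBackend() for _ in range(4)])
    for i in range(1000):
        client.set(f"user:{i}", i)
    print([len(b.data) for b in client.backends])  # keys stored per shard
```

One design caveat: with plain modulo hashing, changing the number of servers remaps most keys, so it pays to pick the shard count up front (or to use consistent hashing).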