77 GB data load in Google Colab

I have a tar.gz file that contains 77 GB of data and I am trying to load it into Google Colab, but I get a
"Runtime died"
error and then the runtime automatically restarts. Can anyone help?

Google Colab is meant for research and educational purposes, not for prolonged training. It has limitations, the most important being memory and disk space.
If you run:
!df
you will find that the runtime is allocated about 45-50 GB of disk (47 GB, to be precise). You are trying to load 77 GB, so it is no surprise that the runtime dies.
If you still want to use Colab, try splitting your data into smaller parts: train on one part, delete it, reload the next part from Google Drive, and repeat. A sketch of that pattern is shown below.
See this answer for more info on the runtime hardware:
What's the hardware spec for Google Colaboratory?
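For illustration, here is a minimal sketch of that split-process-delete loop in Python, streaming one file at a time out of the archive so the full 77 GB never has to sit on the Colab disk at once. The paths and the process_file step are placeholders, not anything from the question:

import os
import tarfile

ARCHIVE = "/content/drive/MyDrive/data.tar.gz"   # hypothetical path to the archive in Drive
WORK_DIR = "/content/chunk"                      # scratch space on the Colab disk

os.makedirs(WORK_DIR, exist_ok=True)

with tarfile.open(ARCHIVE, "r:gz") as tar:
    for member in tar:
        if not member.isfile():
            continue
        # Extract a single file, work on it, then delete it so the
        # ~45-50 GB runtime disk never holds the whole dataset at once.
        tar.extract(member, path=WORK_DIR)
        path = os.path.join(WORK_DIR, member.name)
        # process_file(path)   # placeholder for your training / preprocessing step
        os.remove(path)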

Related

Is it possible to increase the RAM in Google Colab in another way?

When I run this code in Google Colab:
n = 100000000
i = []
while True:
    i.append(n * 10**66)
the session crashes every time. My data is huge. After hitting 12.72 GB of RAM, I don't get the crash prompt with the option to increase my RAM.
I just get this: "Your session crashed after using all available RAM. View runtime logs."
What is the solution? Is there another way?
You either need to upgrade to Colab Pro or if your computer itself has more RAM than the VM for Colab, you can connect to your local runtime instead.
Colab Pro will give you about twice as much memory as you have now. If that’s enough, and you’re willing to pay $10 per month, that’s probably the easiest way.
If instead you want to use a local runtime, you can hit the down arrow next to "Connect" in the top right and choose "Connect to local runtime".
The policy was changed. However, currently, this workaround works for me:
Open and copy this notebook to your Drive. Check whether you already have 25 GB of RAM by hovering over the RAM indicator in the top right (this was the case for me). If not, follow the instructions in the Colab notebook.
Source: GitHub
To double the RAM of Google Colab, use this notebook; it gives 25 GB of RAM! Note: set the runtime type to "None" to double the RAM, then change it back to GPU or TPU.
https://colab.research.google.com/drive/155S_bb3viIoL0wAwkIyr1r8XQu4ARwA9?usp=sharing
As you said, 12 GB is not enough; this needs a lot of RAM.
If you only need a small increase, you can use Colab Pro.
If you need a large increase and you are using a deep learning framework, my advice is to use:
1- your university's computers (academic and research computing)
2- a platform like AWS, GCP, etc.
3- your own very powerful computer with a GPU (I don't recommend this)

Your notebook tried to allocate more memory than is available. It has restarted

I was getting started with TalkingData AdTracking, my first entry into Kaggle competitions. The first line was pd.read_csv() and I got this error:
Your notebook tried to allocate more memory than is available. It has restarted
I thought my code ran in the cloud and that I wouldn't have to worry about memory requirements. Can someone help me with that?
Yes, Kaggle has a memory limit: it's 8 GB per kernel, or 20 minutes of running time. It takes a lot of server juice to host such a thing.
There are various solutions to this problem, notably loading and processing the dataset in chunks; a sketch is shown below.
There is this as well. But I have no experience with it.
And of course you can use another cloud platform such as AWS, GCP, Oracle, etc..
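As a minimal sketch of that chunked approach (the file name, chunk size, and column names here are assumptions for illustration, not taken from the competition kernel):

import pandas as pd

CSV_PATH = "train.csv"   # hypothetical path to the competition file

totals = None
# Read the CSV in 1-million-row chunks so only one chunk is in memory at a time.
for chunk in pd.read_csv(CSV_PATH, chunksize=1_000_000):
    # Example per-chunk work: count attributed clicks per app, then combine.
    counts = chunk.groupby("app")["is_attributed"].sum()
    totals = counts if totals is None else totals.add(counts, fill_value=0)

print(totals.sort_values(ascending=False).head())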

FileSize Limit on Google Colab

I am working on the APTOS Blindness Detection challenge dataset from Kaggle. After uploading the files, when I try to unzip the train images folder, I get a file size limit error saying there is limited space available on RAM and disk. Could anyone please suggest an alternative for working with a large set of image data?
If you get that error while unzipping the archive, it is a disk space problem. Colab gives you about 80 GB by default; try switching the runtime to GPU acceleration. Aside from better performance in certain tasks, such as using TensorFlow, you will get about 350 GB of available space.
In Colab go to Runtime -> Change runtime type, and in the hardware acceleration menu select GPU.
If you need more disk space, Colab now offers a Pro version of the service with double the disk space available in the free version.
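If it helps, here is a rough way to check in Python whether the extracted images will actually fit on the runtime disk before unzipping; the archive path is a placeholder, not the actual competition file name:

import shutil
import zipfile

# Free disk space on the runtime, in bytes.
total, used, free = shutil.disk_usage("/content")
print(f"free: {free / 1e9:.1f} GB")

with zipfile.ZipFile("/content/train_images.zip") as zf:   # hypothetical archive name
    # Sum of uncompressed sizes = space the extracted images will need.
    needed = sum(info.file_size for info in zf.infolist())
    print(f"needed: {needed / 1e9:.1f} GB")
    if needed < free:
        zf.extractall("/content/train_images")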

Google Colab : Local Runtime use

I am currently using Google Colab, and on the Getting Started pages we see:
Local runtime support Colab supports connecting to a Jupyter runtime
on your local machine. For more information, see our documentation.
So, after reading the documentation, I connected my Colab notebook to the local runtime (after the installations, etc.) using the Connect tab.
And when I access the memory info:
!cat /proc/meminfo
The output is as follows:
MemTotal: 3924628 kB
MemFree: 245948 kB
MemAvailable: 1473096 kB
Buffers: 168560 kB
Cached: 1280300 kB
SwapCached: 20736 kB
Active: 2135932 kB
Inactive: 991300 kB
Active(anon): 1397156 kB
Inactive(anon): 560124 kB
Active(file): 738776 kB
Inactive(file): 431176 kB
Unevictable: 528 kB
Mlocked: 528 kB
This is the memory info for my PC, so the notebook is clearly accessing my PC. Then how is it any different from my local Jupyter notebook? Now I can't use the high-memory environment of 13 GB, nor can I get GPU access.
It would be great if someone could explain!
The main advantages to using Colab with a local backend stem from Drive-based notebook storage: Drive commenting, ACLs, and easy link-based sharing of the finished notebook.
When using Jupyter, sharing notebooks requires sharing files. And, accessing your notebooks from a distinct machine requires installing Jupyter rather than loading a website.
The only benefit is keeping your notebooks in Google Drive:
- you can share them easily
- you have automatic history/versioning
- people can comment on your notebooks
You also get headings with a collapsible outline, and probably a cleaner UI (if you prefer Colab styling).
TLDR - the short answer is that it's not any different
But, here's an analogy that might help better explain what the point of that is:
Let's pretend Google Colab was something like a video game streaming service that lets users with low-end equipment play graphically demanding games by hosting the game on its own servers. If we don't have a high-end gaming PC or a very powerful laptop and we want to play a new game with very high system requirements (which our machine barely meets, if at all), it makes sense to use this streaming service, let's call it Stadia for fun, because it lets us play at 30 FPS at 720p, whereas our own computer might manage barely 15 FPS at 480p. Those users represent people like you and me, who want the game to run on someone else's system, just as we want Google Colab to run our iterations on its system. For us, it wouldn't make sense to have Stadia run locally and use our own system resources, because there's no benefit in that, even if our saved games were stored locally.
But then there are others with high-end PCs and graphics cards, with much better components and resources available to them, who want to play the same game. They could use the same streaming service and play at 720p, but since their computer is more powerful and can actually run the game at 60 FPS in 4K, they may want to run the game off their own system resources instead of through a streaming service such as Stadia. Normally that would mean getting a copy of the game and installing it locally. For the sake of the example, let's pretend the game is download-only and requires 2 terabytes to install.
Now suppose Stadia could spare those users from downloading and installing the game while still using their own system's resources to deliver better graphics. That is the case for why connecting Colab to a local runtime is a desirable feature for some people. Sharing Colab notebooks would be like sharing a game in our theoretical version of Stadia: users wouldn't have to download or install anything, so whenever there is an update or change, they can immediately use the new version, because the actual code (or game install, in our metaphor) lives remotely.
Sometimes it's hard to understand things that weren't designed for our use case when they contradict the reasons we chose the tool in the first place. Hopefully this helps someone who stumbles across this understand the purpose of it, at least in principle.

ImageResizer crashing on large images

I'm using the awesome ImageResizer component and am experiencing an "Out of memory" error when trying to upload and read images that are about 100 MB in size. That may seem large, but we're a printing company, so many people do need to provide images of that size.
The line of code that fails is:
ImageResizer.ImageBuilder.Current.Build(Server.MapPath(strImagePath), Server.MapPath(strThumbPath), new ResizeSettings("maxheight=150&maxwidth=238"));
This is probably GDI itself failing, but is there any workaround other than detecting that the error occurred and letting the user know?
Thanks in advance
Al
A 100 MB JPEG generally decompresses to around 8 gigabytes in bitmap form. Your only chance of getting that to work is getting 16 GB of RAM and running the process in 64-bit mode.
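For a rough sense of that arithmetic (assuming 4 bytes per pixel, i.e. 32-bit ARGB, and dimensions chosen only to match the ~8 GB figure above):

# Uncompressed 32-bit ARGB bitmap: 4 bytes per pixel.
width, height = 46_000, 46_000               # example dimensions, ~2.1 gigapixels
bytes_needed = width * height * 4
print(f"{bytes_needed / 1024**3:.1f} GiB")   # about 7.9 GiB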
Alternatively, you could try libvips - it's designed for gigantic image files. There's no .NET wrapper yet, but I really want to make one and get some ImageResizer integration going! Of course, without anyone interested in funding that, it probably won't happen for a while....
As mentioned by Lilith River, libvips is capable of resizing large images with low memory needs. Fortunately, there is now a full libvips binding for .NET available: https://github.com/kleisauke/net-vips/.
It should be able to process 100 MB JPEG files without any problems.
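For illustration only, here is the same idea in Python via the pyvips binding (not net-vips, the .NET binding the answer links to, but it exposes an analogous thumbnail operation); the file names are placeholders:

import pyvips

SRC = "large_scan.jpg"   # hypothetical 100 MB source image
DST = "thumb.jpg"

# thumbnail() shrinks the image while decoding, so the full
# decompressed bitmap never has to fit in memory at once.
thumb = pyvips.Image.thumbnail(SRC, 238, height=150)
thumb.write_to_file(DST)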