Not able to clear Google Colab disk space - google-colaboratory

I have been using Google Colab for quite some time. Recently I noticed that 43 GB of disk space is already occupied. Even if I change the runtime to TPU, 43 GB out of 107 GB remains occupied. I tried a factory reset of the runtime, but it doesn't work. Even if I use Google Colab from another Google account, 43 GB of disk space is still occupied. How can I clear the disk space?

There's some amount of space that's used by the base operating system and libraries. You won't be able to trim this value very much by deleting files since most will be required for normal operation.
If you need a larger disk, consider Colab Pro, which has 2x the disk space.
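If you want to see for yourself where the space is going, a rough sketch like the following (run in a Colab cell) will show the breakdown; /usr/lib is just one example of where the preinstalled libraries live, not a complete accounting:

import os, shutil

# Overall usage of the VM's root filesystem, in GiB
total, used, free = shutil.disk_usage("/")
print(f"used {used / 2**30:.1f} GiB of {total / 2**30:.1f} GiB ({free / 2**30:.1f} GiB free)")

def dir_size(path):
    # Sum file sizes under one directory tree; skip broken symlinks
    return sum(
        os.path.getsize(os.path.join(root, name))
        for root, _, names in os.walk(path)
        for name in names
        if os.path.exists(os.path.join(root, name))
    )

print(f"/usr/lib alone: {dir_size('/usr/lib') / 2**30:.1f} GiB")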

Related

Where to store a massive training dataset on a desktop (C drive, D drive or external USB)?

I have a 2 TB training dataset to use on my desktop PC with 4 Nvidia RTX 2080 Ti graphics cards. The entire 2 TB dataset is going to be read from disk at each epoch, and there are 200 epochs to train (the total training time is estimated at two months).
My desktop storage configuration is as follows:
C drive:
4 TB Samsung 970 EVO NVMe SSD with 320 MB/sec R/W speed. The Windows 10 OS, the Anaconda environment, and all PyTorch program files reside on the C drive (there is still 2.5 TB of free space available).
D drive:
4 TB Western Digital HDD with approx. 30 MB/sec R/W speed.
External:
4 TB portable SSD with USB 3.1 and 240 MB/sec R/W speed.
Given this hardware environment, I am contemplating where to store my training dataset (read-only).
The C drive is the fastest. However, if I store the training dataset on the C drive, then the Windows OS, Anaconda, all program scripts, and the dataset reads will all be competing for the same drive.
The HDD in the D drive is too slow (20 MB/sec), and I am afraid this slow I/O might become a bottleneck for training speed.
The external SSD might be a good choice, but I have seen many warnings on the internet that massive data transfers through a USB port become slower and slower over long periods (eventually dropping to 1 ~ 2 MB/sec) and might also terminate the program without warning due to power and heat issues over time.
I wonder whether anyone has experienced a similar situation and can give a proper suggestion for this environment.
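One way to ground this decision would be to benchmark sustained sequential reads from each candidate drive before copying the dataset anywhere. A minimal sketch, assuming a large sample file already exists on each drive (the paths below are placeholders; use a file larger than system RAM so the OS page cache does not inflate the numbers):

import time

def read_throughput(path, block_size=64 * 2**20):
    # Read the whole file in large blocks and report MB/s
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 2**20 / elapsed

for label, path in [
    ("C (NVMe SSD)", r"C:\data\sample.bin"),   # placeholder path
    ("D (HDD)", r"D:\data\sample.bin"),        # placeholder path
    ("External SSD", r"E:\data\sample.bin"),   # placeholder path
]:
    print(label, f"{read_throughput(path):.0f} MB/s")

Whichever drive sustains a throughput comfortably above what the four GPUs can consume per epoch is unlikely to be the bottleneck, regardless of the warnings above.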
Another option might be splitting your C drive into two partitions and keeping your files on the second partition. This way all the system read/writes will be done on the first partition, and the second partition will remain untouched/undamaged. Also, you'll be able to easily reinstall Windows and be sure that your files on the second partition are untouched.
To split the C drive, do the following:
1. Open Disk Management by hitting the menu button and typing diskmgmt.msc
2. Select the C drive, right-click on it, and hit "Shrink"
3. Enter the amount of storage you'd like to cut off
4. After finishing, you'll see some unallocated empty space
5. Right-click on that area, click on "New Simple Volume", and create a new partition
6. Assign a letter to the partition, and now it's ready for use.
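Once the new partition has a letter, a quick sanity check from Python confirms there is room for the dataset before you start copying; the drive letter E: below is only an assumption:

import shutil

# Check free space on the hypothetical new partition E:
total, used, free = shutil.disk_usage("E:\\")
print(f"free on E: {free / 2**40:.2f} TiB")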

Google Colab Pro not allocating more than 1 GB of GPU memory

I recently upgraded to Colab Pro. I am trying to use GPU resources from Colab Pro to train my Mask R-CNN model. I was allocated around 15 GB of memory when I ran the model right after I signed up for Pro. However, for some reason, I was allocated just 1 GB of memory from the next morning, and since then I haven't been allocated more than 1 GB. I was wondering if I am missing something or if I perturbed the VM's preinstalled packages. I understand that the allocation varies from day to day, but it's been like this for almost 3 days now. The following attempts have already been made, but none seems to work.
I have made sure that the GPU and "High-RAM" options are selected.
I have tried restarting the runtime several times.
I have tried running other scripts (just to make sure that the problem was not with the Mask R-CNN script).
I would appreciate any suggestions on this issue.
GPU info
The High-RAM setting in that screen controls system RAM rather than GPU memory.
The command !nvidia-smi will show GPU memory. In the example output, the GPU memory utilization is 0 of 16 GB.
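If you prefer to confirm programmatically how much GPU memory the runtime exposes, a small sketch using PyTorch (which Colab preinstalls) is one option:

import torch

if torch.cuda.is_available():
    # Total memory of the attached GPU and how much this process has allocated
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 2**30:.1f} GiB total")
    print(f"allocated: {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
else:
    print("No GPU attached to this runtime")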

Is it possible to increase the RAM in Google Colab another way?

When I run this code in Google Colab:
n = 100000000
i = []
while True:
    i.append(n * 10**66)  # keep appending huge integers until RAM runs out
this happens to me all the time. My data is huge. The session crashes after hitting 12.72 GB of RAM, but I no longer get the crash prompt with the option to increase my RAM.
All I get is this: Your session crashed after using all available RAM. View runtime logs
What is the solution? Is there another way?
You either need to upgrade to Colab Pro or if your computer itself has more RAM than the VM for Colab, you can connect to your local runtime instead.
Colab Pro will give you about twice as much memory as you have now. If that’s enough, and you’re willing to pay $10 per month, that’s probably the easiest way.
If instead you want to use a local runtime, you can hit the down arrow next to "Connect" in the top right and choose "Connect to local runtime".
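Before switching, it may be worth confirming that the local machine really does have more RAM than the Colab VM. A minimal check, assuming psutil is available (it is preinstalled on Colab and a pip install away locally):

import psutil

# Total and currently available system RAM on whichever machine this runs on
mem = psutil.virtual_memory()
print(f"total RAM: {mem.total / 2**30:.1f} GiB, available: {mem.available / 2**30:.1f} GiB")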
The policy was changed. However, this workaround currently works for me:
Open and copy this notebook to your Drive. Check whether you already have 25 GB of RAM by hovering over the RAM indicator at the top right (this was the case for me). If not, follow the instructions in the Colab notebook.
Source: GitHub
To double the RAM of Google Colab, use this notebook; it gives 25 GB of RAM! Note: set the runtime type to "None" to double the RAM, then change it back to GPU or TPU.
https://colab.research.google.com/drive/155S_bb3viIoL0wAwkIyr1r8XQu4ARwA9?usp=sharing
As you said, 12 GB is not enough; this needs a lot of RAM.
If you need a small increase, you can use Colab Pro.
If you need a large increase and are using a deep learning framework, my advice is to use:
1. A university computer (academic and research computing)
2. A platform like AWS, GCP, etc.
3. Your own high-end computer with a GPU (I don't recommend this)

File size limit on Google Colab

I am working on the APTOS Blindness Detection challenge dataset from Kaggle. After uploading the files, when I try to unzip the train images folder, I get a file size limit error about the limited space available in RAM and on disk. Could anyone please suggest an alternative for working with a large image dataset?
If you get that error while unzipping the archive, it is a disk space problem. Colab gives you about 80 GB by default; try switching the runtime to GPU acceleration. Aside from better performance during certain tasks, such as using TensorFlow, you will get about 350 GB of available space.
From Colab go to Runtime -> Change runtime type, and in the hardware acceleration menu select GPU.
If you need more disk space, Colab now offers a Pro version of the service with double the disk space available in the free version.
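If the disk still cannot hold the extracted images, one possible alternative (not part of the original answer) is to read the images straight out of the zip archive without extracting it. A sketch, with placeholder file names standing in for the APTOS archive:

import io
import zipfile
from PIL import Image  # Pillow is preinstalled on Colab

with zipfile.ZipFile("train_images.zip") as zf:  # placeholder archive name
    names = [n for n in zf.namelist() if n.endswith(".png")]
    print(len(names), "images in the archive")
    # Decode one image at a time without extracting everything to disk
    with zf.open(names[0]) as member:
        img = Image.open(io.BytesIO(member.read()))
        print(names[0], img.size)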

QEMU using too much disk

I am using CentOS 7 on the host and emulating Windows Server with QEMU. I created a 15 GB disk, and after a week or so the disk grows to 30 GB. Is there a way to stop this? I am using snapshots. Is there some Windows service that may be creating lots of files, or is the disk perhaps being used as swap to improve the system? Growing from 15 GB to 30 GB is a lot, and the host's disk is not much bigger, so it's a big issue for me. I reinstalled everything and turned a lot of things off, but the same thing happens again. A little help here?
You don't mention what kind of disk image you're using or how it is configured. I'm guessing from your description that it's a qcow2 image, since you mention using snapshots. Snapshots stored inside the qcow2 image will increase its size beyond what is visible to the guest OS. For example, if you create a qcow2 image with a virtual disk size of 15 GB, it can consume much more than 15 GB on the host if you have saved several different snapshots in the image and the guest has written a lot of data between each snapshot. Even when snapshots are deleted, the space consumed by qcow2 is not normally released back to the host OS. So while 30 GB of usage sounds large, it is not unreasonable if you use snapshots a lot.
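To confirm this on the host, one possible check is to inspect the image and, once the guest is shut down and the snapshots are no longer needed, re-pack it into a fresh file. A sketch (the image path is a placeholder):

import subprocess

image = "/var/lib/libvirt/images/winserver.qcow2"  # placeholder path

# Show virtual size vs. actual size on disk, plus the internal snapshot list
subprocess.run(["qemu-img", "info", image], check=True)
subprocess.run(["qemu-img", "snapshot", "-l", image], check=True)

# Converting to a fresh qcow2 copies only the live data and leaves internal
# snapshots behind, so the copy is usually much smaller. Only run this with
# the guest shut down, and keep the original until the copy is verified.
subprocess.run(["qemu-img", "convert", "-O", "qcow2", image, image + ".compact"], check=True)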