Trying to download zip file from drive to instance in colab hits quota limit - google-colaboratory

I am trying to download the zip file dataset from drive using
from google.colab import drive
drive.mount('/content/drive')
!cp "/content/drive/Shared Drives/infinity-drive/datasets/coco.zip" ./data
I get an error saying that one of the quota limits has been exceeded; this has happened for three consecutive days, and the file copy ends in an I/O error.
I went over all the suggested solutions. The file size is 18 GB. However, I could not understand the directive "Use drive.google.com to download the file." What does that have to do with triggering limits in Colab? Is it meant for people trying to download a file to the Colab instance and then to their local machine? Anyway:
The file is in archive format.
The folder it is in has no other files in it.
The file is private, although I am not the manager of the shared drive.
I am baffled as to exactly which quota I am hitting. The error does not tell me, but I am 99% sure I should not be hitting any, considering I can download the entire thing to my local machine. I can't keep track of exactly what is happening, because even within the same day it manages to copy about 3-5 GB of the file (and even the size of the copied file keeps changing).

This is a bug in colab/Drive integration, and is tracked in #1607. See this comment for workarounds.

Related

VideoColorizerColab.ipynb doesn't communicate with google drive links

DownloadError: ERROR: unable to download video data: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
Since yesterday I can't use my Google Drive shared file links with VideoColorizerColab.ipynb. I get the error above every time I try to colorize my videos.
Does anyone know what's going on? Thank you, Géza.
You might want to try mounting your Google Drive to your colab and copying the video to the colab rather than using the link to download the video.
The code to mount your google drive to colab is
from google.colab import drive
drive.mount('/content/drive')
After this step, you can use all the content in your Drive as folders in your colab. You can see them in the Files section on the left side of your notebook. You can select a file, right-click and copy path and use the path to do any operation on the file.
This is an example of copying:
!cp -r /content/drive/My\ Drive/headTrainingDatastructure/eval /content/models/research/object_detection/

Cloud9 deploy hitting size limit for numpy, pandas

I'm building in Cloud9 to deploy to Lambda. My function works fine in Cloud9 but when I go to deploy I get the error
Unzipped size must be smaller than 262144000 bytes
Running du -h | sort -h shows that my biggest offenders are:
/debug at 291M
/numpy at 79M
/pandas at 47M
/botocore at 41M
My function is extremely simple: it calls a service, uses pandas to format the response, and sends it on.
What is in debug and how do I slim it down/eliminate it from the deploy package?
How do others use libraries at all if they eat up most of the memory limit?
A brief background to understand the root cause
The problem is not with your function but with the size of the deployment package. Per the AWS documentation, a zipped package uploaded directly to Lambda must not exceed 50 MB, and since a library can pull in many dependencies the package inevitably grows past that, so consider uploading the zipped package to an AWS S3 bucket instead. Note: even when the package comes from S3, its unzipped size must stay below 262144000 bytes (roughly 250 MB). The error message that you posted, Unzipped size must be smaller than 262144000 bytes, is referring to the size of the deployment package, i.e. the libraries.
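A small sketch for checking a build folder's unzipped size against that limit (the folder name package/ is only a placeholder for wherever your deployment contents live):
import pathlib

LIMIT = 262144000  # Lambda's unzipped deployment package limit, in bytes

# sum the sizes of all files under the deployment folder
package_dir = pathlib.Path("package")
total = sum(p.stat().st_size for p in package_dir.rglob("*") if p.is_file())
print(total, "bytes,", "under" if total < LIMIT else "over", "the limit")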
Now, understand some facts when working with AWS:
AWS Lambda containers start out empty.
AWS Lambda containers run a Linux kernel.
AWS Cloud9 is only an IDE, like RStudio or PyCharm, and it uses an S3 bucket to store the installed packages.
This means you'll need to:
know the package and its related dependencies,
extract the Linux-compiled packages from Cloud9 and save them into a folder structure like python/lib/python3.6/site-packages/ (see the sketch below).
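A minimal sketch of building that folder structure, assuming pandas is the library you need (the target path mirrors the layer layout expected for Python 3.6):
import subprocess
import sys

# install the library and its dependencies into the layer-style folder
target = "python/lib/python3.6/site-packages/"
subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "pandas",            # the package you need (pulls in its dependencies too)
    "--target", target,  # install into the layer folder instead of site-packages
])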
Possible/Workable solution to overcome this problem
Overcome this problem by reducing the package size. See below.
Reducing the deployment package size
Manual method: delete the files and folders within each library folder that are named *.dist-info and __pycache__. You'll need to look into each library folder manually to delete them.
Automatic method: I have yet to figure out the exact command (work in progress); one possible sketch follows.
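A possible sketch of such an automatic cleanup, assuming the installed libraries live under python/ (removing *.dist-info metadata can break packages that rely on it, so treat this as a starting point rather than a definitive recipe):
import pathlib
import shutil

root = pathlib.Path("python")  # folder holding the installed libraries

# remove bytecode caches and package metadata folders to shrink the package
for path in list(root.rglob("__pycache__")) + list(root.rglob("*.dist-info")):
    if path.is_dir():
        shutil.rmtree(path)
        print("removed", path)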
Use Layers
In AWS go to Lambda and create a layer
Attach the S3 link to the zipped python package folder, and ensure the Lambda function's IAM role has permission to access the S3 bucket.
Make sure the unzipped folder size is less than 262144000 bytes (about 250 MB). If it is larger, it cannot be attached as an AWS Layer and you'll get the error Failed to create layer version: Unzipped size must be smaller than 262144000 bytes. A sketch of zipping the layer folder follows.
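A minimal sketch of packaging the python/ folder into a layer archive before uploading it to S3 (layer.zip is an arbitrary name; the python/ folder is the structure built above):
import shutil

# create layer.zip with python/ at the top level, as Lambda layers expect
shutil.make_archive("layer", "zip", root_dir=".", base_dir="python")
The resulting layer.zip can then be uploaded to the S3 bucket and referenced when creating the layer.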

File uploading failed while I was uploading the file to a google-compute instance, and now it's showing xyz.zip.csupload. Can I resume it or retry it?

Instance operating system is ubuntu 16.04.
I was uploading using the instance upload file option.
File size was 2.24 GB.
I didn't find anything useful on the internet.
Thanks
The file "xyz.zip.ccsupload" is the file with the partial upload. Once the upload is complete, then the file will have the proper name. You cannot resume the upload from where it left off. If it fails, then you will have to attempt uploading the file again.
It most likely failed because of the file size. Given how large the file is, I would suggest using the "gcloud compute scp" command to upload the file to the VM instance, as documented here.
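For reference, the documented command takes roughly this form (the instance name and zone are placeholders for your own values):
gcloud compute scp xyz.zip INSTANCE_NAME:~/ --zone=ZONE_NAME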

How to unmount drive in Google Colab and remount to another drive?

I mounted to Google drive account A. Now I want to switch to account B but I cannot do that, because there is no way for me to enter a new authentication key when executing drive.mount().
What I have tried and failed:
restart browser, restart computer
use force_remount=True in drive.mount(); it only automatically remounts account A and does not ask me for a new account to mount
change account A password
change run-time type from GPU to None and back to GPU
open everything in incognito mode
sign out all google accounts
How can I:
forget previous authentication key so it will ask me for a new one?
dismount drive and forget previous authentication key?
I have found that 'Restart runtime...' does not work, and changing permissions is too much of a hassle.
Luckily, the drive module is equipped with just the function you need:
from google.colab import drive
drive.flush_and_unmount()
You can reset your Colab backend by selecting the 'Reset all runtimes...' item from the Runtime menu.
Be aware, however, that this will discard your current backend.
Another solution for your problem could be to terminate your session and run your code (drive.mount()) again.
Steps:
1) Press "Additional connection options" button. Is the little sign button next to RAM and DISK
2) Select "Manage sessions"
3) Press the "Terminate" button
4) Run again your code (drive.mount()).
Now you will be asked to put your new key.
To force Colab to ask for a new key without waiting or resetting the runtime you can revoke the previous key.
To do this:
go to https://myaccount.google.com/permissions (or manually navigate to Security → Manage third-party access on your Google account page),
on the top right, select your profile image or initial, and then select the account whose drive you want to disconnect from Colab,
select Google Drive File Stream in the Google apps section, then select Remove access.
Executing drive.mount() will now ask for a new key.
Remounting does not work when you have recently mounted and unmounted using flush_and_unmount(). The correct steps to follow are (they worked for me at the time of posting):
After mounting using:
from google.colab import drive
drive.mount('/content/drive')
Unmount using drive.flush_and_unmount(). Even though you can no longer see the 'drive/' folder, trust me, you should still run !rm -rf /content/drive before remounting the drive using:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
You will then get the authorisation request again for a new Gmail account.
You can terminate the session in Runtime -> Manage sessions. That should do the trick, and you can then remount the drive.
Restarting runtimes and removing access did not help. I discovered that the notebook I was using created directories on the mountpoint:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
I had to remove the subdirectories on the mountpoint first. Before doing that, ensure that your drive is not actually mounted!
!find /content/drive
/content/drive
/content/drive/My Drive
/content/drive/My Drive/Colab Notebooks
/content/drive/My Drive/Colab Notebooks/assignment4
/content/drive/My Drive/Colab Notebooks/assignment4/output_dir
/content/drive/My Drive/Colab Notebooks/assignment4/output_dir/2020-04-05_16:17:15
The files and directories above were accidentally created by the notebook before I had mounted the drive. Once you are sure (are you sure?) your drive is not mounted, delete the subdirectories:
!rm -rf /content/drive
After this, I was able to mount the drive.
Here is an explanation from their FAQs.
Why do Drive operations sometimes fail due to quota?
Google Drive enforces various limits, including per-user and per-file operation count and bandwidth quotas. Exceeding these limits will trigger Input/output error as above, and show a notification in the Colab UI. A typical cause is accessing a popular shared file, or accessing too many distinct files too quickly. Workarounds include:
Copy the file using drive.google.com and don't share it widely so that other users don't use up its limits.
Avoid making many small I/O reads; instead, copy data from Drive to the Colab VM in an archive format (e.g. .zip or .tar.gz files) and unarchive the data locally on the VM rather than in the mounted Drive directory (see the sketch below).
Wait a day for quota limits to reset.
https://research.google.com/colaboratory/faq.html#drive-quota
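A minimal sketch of the second workaround above, using the path from the original question (adjust the paths to your own Drive layout):
import shutil
import zipfile
from google.colab import drive

drive.mount('/content/drive')

# one large sequential copy from Drive to the local VM disk...
src = '/content/drive/Shared Drives/infinity-drive/datasets/coco.zip'
shutil.copy(src, '/content/coco.zip')

# ...then unarchive locally on the VM, not inside the mounted Drive folder
with zipfile.ZipFile('/content/coco.zip') as zf:
    zf.extractall('/content/data')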
Symptom
/content/drive gets auto-mounted without explicitly mounting it, and without being prompted with Enter your authorization code:.
Cached old state of the drive kept showing up.
The actual Google drive content did not show up.
Terminating, restarting, factory resetting, revoking permissions, and clearing the Chrome cache did not work.
Flush and unmount google.colab.drive.flush_and_unmount() did not work.
Solution
Create a dummy file inside the mount point /content/drive.
Take a moment and make sure the content of /content/drive is not the same as that in the Google Drive UI.
Run rm -rf /content/drive.
Run google.colab.drive.flush_and_unmount()
From the menu Runtime -> Factory reset runtime.
Then re-running google.colab.drive.mount('/content/drive', force_remount=True) finally asked for Enter your authorization code.
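A sketch of the scriptable parts of those steps; the factory reset in step 5 still has to be done from the Runtime menu, and the final mount runs afterwards in the fresh session:
import pathlib
import shutil
from google.colab import drive

# step 1: create a dummy file inside the stale mount point
pathlib.Path('/content/drive/dummy.txt').touch()

# step 2 is a manual check: compare /content/drive against the Drive web UI

# steps 3 and 4: wipe the stale mount point and flush the Drive state
shutil.rmtree('/content/drive', ignore_errors=True)
drive.flush_and_unmount()

# step 5: Runtime -> Factory reset runtime (done from the menu)

# step 6: run this in the new session; it should now ask for an authorization code
drive.mount('/content/drive', force_remount=True)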
The current code for the drive.mount() function is found at https://github.com/googlecolab/colabtools/blob/fe964e0e046c12394bae732eaaeda478bc5fa350/google/colab/drive.py
It is a wrapper for the drive executable found at /opt/google/drive/drive. I have found that this executable accepts an authorize_new_user flag, which can be used to force a reauthentication.
Copy and paste the contents of the drive.py file into your notebook. Then modify the call to d.sendline() currently on line 189 to look like this (note the addition of the authorize_new_user flag):
d.sendline(
    ('cat {fifo} | head -1 | ( {d}/drive '
     '--features=max_parallel_push_task_instances:10,'
     'max_operation_batch_size:15,opendir_timeout_ms:{timeout_ms},'
     'virtual_folders:true '
     '--authorize_new_user=True '
     '--inet_family=' + inet_family + ' ' + metadata_auth_arg +
     '--preferences=trusted_root_certs_file_path:'
     '{d}/roots.pem,mount_point_path:{mnt} --console_auth 2>&1 '
     '| grep --line-buffered -E "{oauth_prompt}|{problem_and_stopped}"; '
     'echo "{drive_exited}"; ) &').format(
        d=drive_dir,
        timeout_ms=timeout_ms,
        mnt=mountpoint,
        fifo=fifo,
        oauth_prompt=oauth_prompt,
        problem_and_stopped=problem_and_stopped,
        drive_exited=drive_exited))
Call either the drive module version of flush_and_unmount() or the one you pasted in, and then call your version of mount() to login as a different user!
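For example, after pasting the modified drive.py into a cell and running it (so that its mount() and flush_and_unmount() are defined at the notebook's top level), the call sequence might look like this:
# flush the current user's mount, then remount with the patched flag in effect
flush_and_unmount()  # or google.colab.drive.flush_and_unmount()
mount('/content/drive', force_remount=True)  # now prompts to authorize a new user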

Failed to download .ckpt weights from Google Colab

I've trained a Tensorflow model on Google Colab, and saved that model in ".ckpt" format.
I want to download the model so I tried to do this:
from google.colab import files
files.download('/content/model.ckpt.index')
files.download('/content/model.ckpt.meta')
files.download('/content/model.ckpt.data-00000-of-00001')
I was able to get the meta and index files. However, the data file gives me the following error:
"MessageError: Error: Failed to download: Service Worker Response
Error"
Could anybody tell me how I should solve this problem?
Google Colab doesn't allow downloading files of large sizes (I am not sure about the exact limit). Possible solutions are to either split the file into smaller files, or to push your files to GitHub and then download them to your local machine (a sketch of the splitting approach follows).
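A minimal sketch of the splitting approach, assuming the checkpoint data file is at /content/model.ckpt.data-00000-of-00001 (the 50 MB chunk size and the part naming are arbitrary choices):
from google.colab import files

src = '/content/model.ckpt.data-00000-of-00001'
chunk_size = 50 * 1024 * 1024  # 50 MB per piece

# split the large file into numbered pieces and download each one
with open(src, 'rb') as f:
    part = 0
    while True:
        data = f.read(chunk_size)
        if not data:
            break
        part_name = f'{src}.part{part:03d}'
        with open(part_name, 'wb') as out:
            out.write(data)
        files.download(part_name)
        part += 1
The pieces can then be concatenated back into a single file on your local machine.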
I just tried with a 17 MB graph file using the same command syntax with no error. Perhaps a transient problem on Google's servers?
For me it helped to rename the file before download.
I had a file named
26.9766_0.5779_150-Adam-mean_absolute_error#3#C-8-1-....-RL#training-set-6x6.04.hdf5
and renamed it to
model.hdf5
before download, then it worked. Maybe the '-' in the filename caused the error in my case.
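A minimal sketch of that workaround (the long source filename is a placeholder for your own file):
import shutil
from google.colab import files

# copy the awkwardly named file to a simple name, then download the copy
shutil.copy('/content/your_long_model_name#with-special-chars.hdf5', '/content/model.hdf5')
files.download('/content/model.hdf5')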