File download from Google Drive to Colaboratory - google-colaboratory

I was trying to download a file from my Google Drive to Colaboratory.
file_id = '1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
Doing so, I am getting this error:
NameError: name 'drive_service' is not defined
How do I fix this error?

No need to install or import any library. Just put your file id at the end:
!gdown yourFileIdHere
Note: at the time of writing, the gdown library is preinstalled on Colab.

The easiest method to download a file from Google Drive to a Colab notebook is via the Colab API:
from google.colab import drive
drive.mount('/content/gdrive')
!cp '/content/gdrive/My Drive/<file_path_on_google_drive>' <filename_in_colab>
Remarks:
With drive.mount(), you can access any file on your Google Drive.
'My Drive' is equivalent to 'Google Drive' on your local file system.
The file path is wrapped in single quotes because the standard directory below the mount point ('My Drive') contains a space, and your path may contain spaces elsewhere as well.
The file browser (opened via the arrow on the left) is very useful for locating your file and getting its path: it lets you click through your mounted folder structure and copy the file path.

Here's an easy way to get this done. You may use either the wget command or the requests module in Python.
# find the share link of the file/folder on Google Drive
file_share_link = "https://drive.google.com/open?id=0B_URf9ZWjAW7SC11Xzc4R2d0N2c"
# extract the ID of the file
file_id = file_share_link[file_share_link.find("=") + 1:]
# append the id to this REST command
file_download_link = "https://docs.google.com/uc?export=download&id=" + file_id
The string in file_download_link can be pasted in the browser address bar to get the download dialog box directly.
If you use the wget command:
!wget -O ebook.pdf --no-check-certificate "$file_download_link"
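If you prefer the requests module mentioned above, here is a minimal sketch (it assumes the file is small enough that Google does not interpose its virus-scan confirmation page):
import requests

response = requests.get(file_download_link)
response.raise_for_status()
with open('ebook.pdf', 'wb') as f:
    f.write(response.content)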

Step 1
!pip install -U -q PyDrive
Step 2
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Step 3
file_id = '17Cp4ZxCYGzWNypZo1WPiIz20x06xgPAt' # URL id.
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('shaurya.txt')
Step 4
!ls  # to verify content
OR
import os
print(os.listdir())
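Once downloaded, the file lives in the Colab VM's working directory and can be read like any local file, for example (using the filename from Step 3):
with open('shaurya.txt') as f:
    print(f.read())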

You need to define a Drive API service client to interact with the Google Drive API, for instance:
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
(see the notebook External data: Drive, Sheets, and Cloud Storage/Drive REST API)
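Putting this together, a minimal sketch following the pattern in that notebook (authenticate first, then build the client; the download loop from the question then works as-is):
from google.colab import auth
auth.authenticate_user()

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')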

I recommend using PyDrive to download your file from Google Drive. I downloaded a 500 MB dataset in about 5 seconds.
1. Install PyDrive
!pip install PyDrive
2. OAuth
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)
3. Code to download the file from Google Drive
fileId = drive.CreateFile({'id': 'DRIVE_FILE_ID'})  # DRIVE_FILE_ID is the file id, e.g. 1iytA1n2z4go3uVCwE_vIKouTKyIDjEq
print(fileId['title'])  # UMNIST.zip
fileId.GetContentFile('UMNIST.zip')  # Save the Drive file as a local file

The --id argument has been deprecated, so now you simply have to run:
! gdown 1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz
If your file id is stored in a variable, you can run:
! gdown $my_file_id
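For instance, using the id from the original question as a stand-in:
my_file_id = '1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz'  # example id from the question above
!gdown $my_file_id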

You can also use my implementation based on google.colab and PyDrive at https://github.com/ruelj2/Google_drive, which makes it a lot easier.
!pip install -U -q PyDrive
import os
os.chdir('/content/')
!git clone https://github.com/ruelj2/Google_drive.git
from Google_drive.handle import Google_drive
Gd = Google_drive()
Gd.load_file(local_dir, file_ID)
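Here local_dir and file_ID are your target directory and the Drive file id. For illustration, a sketch with hypothetical placeholder values:
local_dir = '/content/data/'                   # hypothetical destination directory
file_ID = '1uBtlaggVyWshwcyP6kEI-y_W3P8D26sz'  # the file id from the question, as an example
Gd.load_file(local_dir, file_ID)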

You can simply make all your Google Drive files and folders available inside Google Colab and use them directly with this command:
# import drive
from google.colab import drive
drive.mount('/content/drive')
This will ask you for permission; after accepting, you will have your entire Google Drive available inside Colab.
If you want to use any file, just copy its path.
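For example, after mounting you can read a file straight from the copied path (the path below is hypothetical; paste the one you copied):
import pandas as pd

df = pd.read_csv('/content/drive/My Drive/Data/Scores.csv')  # hypothetical path
df.head()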

Related

Upload a file to another Drive from Google Colab

I created a script that generates a PDF file on a daily basis. It is then sent to the user's Drive using a simple drive.mount('/content/drive').
To make the script easier to use, I would like all the created PDF files to be sent to a specific folder stored on a specific Drive.
Is there a way to specify this path and this Drive account, rather than just sending the file to the base Drive?
I found very few topics going in this direction, and none were very clear on whether uploading files to another Drive is possible.
If you have any ideas, I'm all ears!
Good day to you all!
I finally found a solution through the PyDrive library.
Here is an example for the curious little ones ;)
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
gfile = drive.CreateFile({'parents': [{'id': 'FOLDER_ID'}]})  # id of the destination folder
gfile.SetContentFile("file path")  # local path of the file to upload
gfile.Upload()
Once the imports and the creation of the gauth variable have been executed, you will be asked for permission on your account (don't be surprised).
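If you also want to control the name the file gets on Drive, PyDrive lets you set a 'title' in the metadata when creating the file; a small sketch with placeholder values:
gfile = drive.CreateFile({'title': 'report.pdf', 'parents': [{'id': 'FOLDER_ID'}]})  # placeholder name and folder id
gfile.SetContentFile('file path')  # local path of the file to upload
gfile.Upload()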
Hope this helps anyone in need !

TypeError: 'module' object is not callable in Google Colab

I am getting this error when I run the following code in Google Colab:
folders = glob('/content/drive/MyDrive/Data/train/*')
You tried to call the glob module directly instead of the glob.glob function. Take the following code sample, which was tested in the Google Colab environment; it should help fix your issue.
from google.colab import drive
drive.mount('/gdrive')
import glob
path = glob.glob("/gdrive/MyDrive/*")
for file_or_folder in path:
    print(file_or_folder)
For this issue, the path should be defined differently.
For that, first import the glob and drive modules as shown below:
from google.colab import drive
drive.mount('/gdrive')
import glob
Then, using this, you can easily access the files as follows:
folders = glob.glob('/gdrive/MyDrive/Datasets/Tomato/train/*')
Note the difference between the copied path and the path you need to define here:
'/content/drive/MyDrive/Datasets/Tomato/test' is what is copied from the Drive web UI;
'/gdrive/MyDrive/Datasets/Tomato/train/*' is how you need to define the path, because the drive was mounted at '/gdrive' rather than '/content/drive'.
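Alternatively, if you want to keep calling glob(...) directly as in the question, here is a sketch (using the mount point and path from the question) that imports the function rather than the module:
from google.colab import drive
drive.mount('/content/drive')

from glob import glob  # import the function itself, not just the module

folders = glob('/content/drive/MyDrive/Data/train/*')
print(folders)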

Bug (resolved) with the "Mount Drive" web button in Colab. Accessing "shared with me" files from Google Colab (2020; previous solutions seem to fail)

[Newer edit]: The Colab team reported that they corrected the issue on May 27, 2020.
I have checked: it works okay for me now.
Link to issue: https://github.com/googlecolab/colabtools/issues/1205
==================================================================
[New edit:] It became clear that the problem below arises ONLY if you mount Google Drive in Colab via the web-interface button "Mount Drive",
and does NOT appear if you mount it the command-line way.
So the web way seems to be bugged. See details in my own answer below.
This was checked with the Chrome browser.
==================================================================
[Original question:]
How do I access "shared with me" files from Google Colab? (The interface seems to have changed now (2020), and previously described solutions do not seem to work.)
More details:
The question has been asked several times, and
solutions are described e.g. here: https://stackoverflow.com/a/53887376/625396
The problem is that I do not see "Add to My Drive", but instead see "Add shortcut to Drive".
After doing that, we can see via the web interface of Google Drive that the shortcut indeed appears.
BUT that shortcut canNOT be seen via Colab utilities like
os.listdir()!
So the shortcut seems to be invisible to Colab, and it is not clear how to access it.
Below are screenshots showing that Colab does not see the shortcut to the "shared with me" folder "cytotrace_datasets", but the web GUI of Google Drive can see it.
Here is a screenshot of what I see in Colab (the shortcut canNOT be seen):
Here is a screenshot of what I see in the web GUI of Google Drive (the shortcut can be seen):
In brief: do NOT mount Google Drive with the web-interface button "Mount Drive" (it is bugged); do it the "old" command-line way instead, and you will not have problems.
Details:
After getting the excellent answer above and playing with it,
it seems I found something strange that leads to a simpler solution and probably indicates that there is currently a bug in mounting Google Drive via the web-interface button "Mount Drive".
I mean: do NOT mount the drive via the interface button,
but do it the old way:
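The command-line way referred to here is the same mount command used in other answers on this page, for example:
from google.colab import drive
drive.mount('/content/drive')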
and that is all; you will get access to the files you added before with the help of
"Add shortcut to Drive".
Suppose you want to read a shared CSV file from Drive, and you have done "Add shortcut to Drive".
1) In the Colab notebook, connect to your Drive.
# Import PyDrive and associated libraries.
# This only needs to be done once per notebook.
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
2) Get the id of the shared file you want to access.
Open the file
-> go to link sharing [https://drive.google.com/open?id=1JKECh3GNry6xbAK6aBSzQtSntD4GTEl ] -> copy the string after 'id='
3) Back in Colab:
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
file_id = '1JKECh3GNry6xbAK6aBSzQtSntMD4GTEl'
downloaded = drive.CreateFile({'id': file_id}) #important
print(downloaded['title']) # it should print the title of desired file
downloaded.GetContentFile('file.csv')
# Finally, you can read the file as a pandas DataFrame.
import pandas as pd
df= pd.read_csv('file.csv')
Note: this is my first ever answer to a Stack Overflow question.

Trouble accessing files on mounted Google Drive in Colab

I have my Google Drive mounted in Colab using:
from google.colab import drive
drive.mount('/content/gdrive')
Under My Drive, I have a folder called Data whose content I wish to access. The problem is, whenever I try to do something with this folder, either by just checking whether it is there:
!ls /content/gdrive/My\ Drive/
or trying to read a csv in that folder:
datapath = '/content/gdrive/My Drive/Data/'
scores = pd.read_csv(datapath+'Scores.csv')
It creates an empty folder with the same name (Data) under My Drive, and returns an error saying no file was found in Data. When I use !ls, it shows a folder named Data and a file named 'Data (1)' under My Drive.
from google.colab import drive
drive.mount('/content/gdrive')
import pandas as pd
df = pd.read_csv('/content/gdrive/My Drive/Data/Scores.csv')
df.head()

I have pickle files already uploaded to Google Drive. How can I use them in Google Colab?

I already have pickle files worth 300-400 MB in Google Drive's Colab folder.
I want to read and use them in Google Colab, but I am unable to do it.
I tried
from google.colab import files
uploaded = files.upload()
#print(uploaded)
for name, data in uploaded.items():
    with open(name, 'wb') as f:
        # f.write(data)
        print('saved file', name)
But it prompts me to upload a file.
I already gave access to drive using:
from google.colab import auth
auth.authenticate_user()
Do I need to give access permission again?
Why does it show only datalab in the folder?
!ls
> datalab
Do I need to download the file again to the Google Colab notebook?
You will need to use Python to change the current directory. For example,
import os
os.chdir('datalab')
will take you inside the datalab folder. If you run !ls now, you will see the contents of the datalab folder. You can then keep changing directories as needed.
I find it easiest to mount your Google Drive locally.
from google.colab import drive
drive.mount('/content/gdrive')
!ls # will show you can now access the gdrive locally
This mounts your Google Drive to the notebook so you can access documents in your Google Drive as if they were local. To access the "Colab Notebooks" part of your Google Drive, use the following path:
GDRIVE_DIR = "gdrive/My Drive/Colab Notebooks/"
If you have your pickle files in the Colab Notebooks folder then you can load them in with:
import os
import pickle
filename = ...  # The name of the pickle file in your Google Drive
# pickle.load expects a file object, so open the file first.
with open(os.path.join(GDRIVE_DIR, filename), 'rb') as f:
    data = pickle.load(f)
A tutorial on mounting your Google Drive and other methods can be found here
You can use PyDrive for that. First, you need to find the ID of your file.
# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# List the pickle files in the Drive root to find their file IDs.
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
listed = drive.ListFile({'q': "title contains '.pkl' and 'root' in parents"}).GetList()
for file in listed:
    print('title {}, id {}'.format(file['title'], file['id']))
You can then load the file using the following code:
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
import io
import pickle
from googleapiclient.http import MediaIoBaseDownload
file_id = 'laggVyWshwcyP6kEI-y_W3P8D26sz'
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
data = pickle.load(downloaded)  # unpickle directly from the in-memory buffer
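If you prefer to stay within PyDrive rather than building a separate drive_service client, here is a sketch of the same download using PyDrive's GetContentFile (assuming the listing step above found your file):
import pickle

pkl_file = listed[0]                 # first match from the listing step above
pkl_file.GetContentFile('data.pkl')  # save it to the Colab VM's local disk
with open('data.pkl', 'rb') as f:
    data = pickle.load(f)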