Reading files from Drive folders with Colab - google-colaboratory

I have mounted my Drive folders in Colab:
from google.colab import drive
drive.mount('/content/drive')
but I can't get to the images inside those folders.
I want to iterate over the images inside that folder and resize them.
I tried !ls /content/drive/path/to/folder, and it returns the names of the images, but when I run a 'for' loop over that output, each iteration gives me a single string containing several file names, like 'image.png\t pic.jpg\t more_contr499.jpg', i.e. the names of three images in one string, separated by '\t'.
So when I try to open an image on each iteration, I get an error: FileNotFoundError: [Errno 2] No such file or directory: 'image.png\t\t pic.jpg\t more_contr499.jpg'
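Rather than parsing the output of !ls, it is usually easier to iterate over the folder with Python's standard library, which yields one file name per iteration. A minimal sketch, assuming a placeholder folder path and an arbitrary 224x224 target size, using Pillow (bundled with Colab) for the resizing:
import os
from PIL import Image

folder = '/content/drive/MyDrive/path/to/folder'  # placeholder: adjust to your folder

for name in os.listdir(folder):
    # skip anything that is not an image file
    if not name.lower().endswith(('.png', '.jpg', '.jpeg')):
        continue
    path = os.path.join(folder, name)
    img = Image.open(path)
    img = img.resize((224, 224))  # target size is just an example
    img.save(path)                # overwrites in place; save elsewhere to keep originals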

Related

!cp Executes only once when copying a file from Google Colab to Google Drive

I am running a long process that will take hours and hours to finish, so I want to periodically save the results I receive from an API to my Google Drive. The !cp command works, but only once: it copies the file to my Drive, but refuses to overwrite or update it later on.
I tried
Changing the file from private to public.
Deleting the file after !cp has been executed once, to see if it will create a new one.
Playing around with dynamic file names, such as file_name = f"FinishedCheckpoint_{index}", but this did not work either. After creating the file with index 0, it just stops any further updates. The files are still generated under the Colab notebooks directory, but they are not uploaded to Google Drive, which is essential so that progress is not lost.
Code cell below, any ideas?
from google.colab import drive
drive.mount('/content/gdrive')

answers = []
for index, row in df.iterrows():
    answer = prompt_to_an_api(...)
    answers.append(answer)
    pd.DataFrame(answers).to_csv('FinishedCheckpoint.csv')
    !cp FinishedCheckpoint.csv "gdrive/My Drive/Colab Notebooks/Runtime Results"

pd.DataFrame(answers).to_csv('Finished.csv')
!cp Finished.csv "gdrive/My Drive/Colab Notebooks/Runtime Results"
drive.flush_and_unmount()
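One thing that may be worth trying (a sketch, not a confirmed fix): skip !cp entirely and write each checkpoint straight to the mounted Drive path with to_csv, so every iteration overwrites the same file under the mount. The output directory below mirrors the one used in the question and may need adjusting:
import pandas as pd

out_path = '/content/gdrive/My Drive/Colab Notebooks/Runtime Results/FinishedCheckpoint.csv'

answers = []
for index, row in df.iterrows():            # df and prompt_to_an_api are placeholders from the question
    answer = prompt_to_an_api(...)
    answers.append(answer)
    pd.DataFrame(answers).to_csv(out_path)  # overwrite the checkpoint on the mounted Drive each iteration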

How to actually save a csv file to google drive from colab?

So, this problem seems very simple, but apparently it is not.
I need to transform a pandas dataframe to a csv file and save it in google drive.
My drive is mounted, I was able to save a zip file and other kinds of files to my drive.
However, when I do:
df.to_csv("file_path\data.csv")
it seems to save it where I want: it appears in the left panel in Colab, where you can see all your files from all your directories. I can also read this csv file back as a dataframe with pandas in the same Colab notebook.
HOWEVER, when I actually go to my Google Drive, I can never find it! But I need code that saves it to my Drive, because I want the user to be able to just run all cells and find the csv file in the Drive.
I have tried everything I could find online and I am running out of ideas!
Can anyone help please?
I have also tried this, which creates a visible file named data.csv, but it only contains the file path:
import csv
with open('file_path/data.csv', 'w', newline='') as csvfile:
    csvfile.write('file_path/data.csv')
HELP :'(
Edit:
import csv
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data_tmp.csv') as f:
    s = f.read()
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data.csv', 'w', newline='') as csvfile:
    csvfile.write(s)
seems to do the trick.
First export as csv with pandas (named this one data_tmp.csv),
then read it and put that in a variable,
then write the result of this "reading" into another file that I named data.csv,
this data.csv file can be found in my drive :)
HOWEVER, when the csv file I try to open is too big (mine has 100,000 rows), it does nothing.
Has anyone got any idea?
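One option for the large-file case (a sketch, not tested against this exact setup): write the CSV locally with pandas as before, then copy it to the mounted Drive path with shutil.copyfile, which streams the data instead of reading the whole file into memory as a string:
import shutil

# df is assumed to already exist
df.to_csv('/content/data_tmp.csv', index=False)  # write locally first
shutil.copyfile(
    '/content/data_tmp.csv',
    '/content/drive/MyDrive/Datatourisme/tests_automatisation/data.csv',  # Drive path from the question
)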
First of all, mount your Google Drive in Colab:
from google.colab import drive
drive.mount('/content/drive')
Allow the Google Drive permission prompt when it appears.
Then save your dataframe as CSV:
import pandas as pd
filename = 'filename.csv'
df.to_csv('/content/drive/' + filename)
In some cases the directory '/content/drive/' may not work, so try '/content/drive/MyDrive/' instead.
Hope it helps!
Here:
df.to_csv( "/Drive Path/df.csv", index=False, encoding='utf-8-sig')
I recommend using pandas to work with data in Python; it works very well.
Here is a simple tutorial: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
Then, to save your dataframe to Drive (assuming it is already mounted), use to_csv; the following will do the trick:
dataframe.to_csv("/content/drive/MyDrive/filename.csv", index=False)
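If the file shows up under /content/drive in the Colab file browser but not in the Drive web UI, it may simply not have been synced yet. A hedged suggestion: call drive.flush_and_unmount() (the same call that appears in the !cp question above) to push buffered writes before the runtime ends:
from google.colab import drive

drive.mount('/content/drive')

df.to_csv('/content/drive/MyDrive/filename.csv', index=False)  # df is assumed to already exist

# force any buffered writes to be synced to Drive before the runtime ends
drive.flush_and_unmount()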

How to change file name in Google Colab

After uploading a file in Google Colab with the code below
from google.colab import files
uploaded = files.upload()
How can I change its name?
import os
src = os.listdir()[1]  # find the name of the file you want to rename, by its index
dst = 'image.jpg'      # change this to the destination name
os.rename(src, dst)    # rename it
os.listdir()[1]        # access the renamed file
I think you guessed right that 'uploaded' holds your file name. And yes, you can access it for renaming purposes, like this:
import os
dst ='image.jpg'
os.rename(list(uploaded.keys())[0], dst)
Now, if you have uploaded several files, pay attention to which one you pick, since 'uploaded' is a dictionary and its keys are not guaranteed to be in any particular order.
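If several files were uploaded in the same call, the uploaded dict maps each original file name to its contents, so all of them can be renamed in one loop. A small sketch (the new-name scheme image_0, image_1, ... is just an example):
import os

for i, original_name in enumerate(uploaded.keys()):
    ext = os.path.splitext(original_name)[1]      # keep the original extension
    os.rename(original_name, f'image_{i}{ext}')   # e.g. image_0.jpg, image_1.png, ...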

How to load images in Google Colab notebook using Tensorflow from mounted Google drive

In a Google Colab notebook, I have my Google drive mounted and can see my files.
I'm trying to load a zipped directory that has two folders with several picture files in each.
I followed an example from the TensorFlow site on how to load pictures, but it uses a remote location.
Here's the site - https://www.tensorflow.org/tutorials/load_data/images
Here's the code from the example that works:
data_root_orig = tf.keras.utils.get_file(origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
                                         fname='flower_photos', untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root)
Here's the revised code where I tried to reference the zipped directory from the mounted Google drive:
data_root_orig = tf.keras.utils.get_file(origin='/content/gdrive/My Drive/TrainingPictures/',
                                         fname='TrainingPictures_Car', untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root)
I get this error:
ValueError: unknown url type: '/content/gdrive/My Drive/TrainingPictures/'
It's obviously expecting a URL instead of the path as I've provided.
I would like to know how I can load the zipped directory as provided from the Google drive.
In this case there is no need to use tf.keras.utils.get_file(); a path is enough.
Here are two ways to do that.
First: !unzip -q '/content/gdrive/My Drive/TrainingPictures/TrainingPictures_Car.zip'
It will be unzipped into '/content/':
import pathlib
data = pathlib.Path('/content/folders_inside_zip')
count = len(list(data.glob('*/*.jpg')))
count
Second: if the archive is already unzipped in Google Drive:
import pathlib
data = pathlib.Path('/content/gdrive/My Drive/TrainingPictures/')
count = len(list(data.glob('*.jpg')))
count
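Once the image paths are available (locally or on the mounted Drive), a tf.data pipeline can be built from them directly, roughly along the lines of the linked tutorial. A sketch assuming JPEG images and an arbitrary 192x192 target size; the glob pattern may need adjusting to the folder layout:
import pathlib
import tensorflow as tf

data = pathlib.Path('/content/gdrive/My Drive/TrainingPictures/')  # path from the question
all_image_paths = [str(p) for p in data.glob('*/*.jpg')]

def load_and_preprocess(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [192, 192])  # resize to the example target size
    return image / 255.0                        # scale pixel values to [0, 1]

ds = tf.data.Dataset.from_tensor_slices(all_image_paths).map(load_and_preprocess)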
In my case it actually worked by removing all imports and libraries and just setting the path as a string. The file has to be uploaded into Google Colab first.
content_path = "cat.jpg"
For me it worked with file:///content/(filename)

Time out on drive.mount('/content/drive') in google colab

I am using Google Colab, and there is always a timeout when I run the command:
from google.colab import drive
drive.mount('/content/drive')
I have restarted the runtime as well, but nothing changed, although it was working yesterday.
Here is the error:
TIMEOUT: Timeout exceeded.
command: /bin/bash
args: [b'/bin/bash', b'--noediting']
buffer (last 100 chars): 'ZI [91298688] ui.cc:80:DisplayNotification Drive File Stream encountered a problem and has stopped\r\n'
before (last 100 chars): 'ZI [91298688] ui.cc:80:DisplayNotification Drive File Stream encountered a problem and has stopped\r\n'
after:
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 3135
child_fd: 76
closed: False
timeout: 120
delimiter:
logfile: None
logfile_read: None
logfile_send: None
maxread: 1000000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_re:
0: re.compile('google.colab.drive MOUNTED')
1: re.compile('root#32155b861949-0ddc780f6f5b40478d01abff0ab81cc1: ')
2: re.compile('(Go to this URL in a browser: https://.*)\r\n')
A common cause of timeouts is having many thousands of files or folders in your root Drive directory.
If that's the case for you, my recommendation is to move some of these items into folders in your root directory so that the root has fewer items.
Under the covers, the way that Drive works requires listing the entire root directory in order to mount it as a FUSE filesystem; that listing takes time proportional to the number of files and folders you have, which leads to timeouts when there are many of them.
Why does drive.mount() sometimes fail saying "timed out", and why do I/O operations in drive.mount()-mounted folders sometimes fail?
Google Drive operations can time out when the number of files or sub-folders in a folder grows too large. If thousands of items are directly contained in the top-level "My Drive" folder, then mounting the drive will likely time out. Repeated attempts may eventually succeed, as failed attempts cache partial state locally before timing out. If you encounter this problem, try moving files and folders directly contained in "My Drive" into sub-folders.
A similar problem can occur when reading from other folders after a successful drive.mount(). Accessing items in any folder containing many items can cause errors like OSError: [Errno 5] Input/output error (Python 3) or IOError: [Errno 5] Input/output error (Python 2). Again, you can fix this problem by moving directly contained items into sub-folders.
Note that "deleting" files or sub-folders by moving them to the Trash may not be enough; if that doesn't seem to help, make sure to also empty your Trash.
Can you check whether what you are pasting is actually the token that is generated?
I had this issue, and the copy-to-clipboard was copying the link, not the token.
You might want to copy it manually.