How to change file name in Google Colab

After uploading a file in Google Colab with the code below
from google.colab import files
uploaded = files.upload()
How can I change its name?

import os
src = os.listdir()[1]  # find the file you want to rename by indexing; print os.listdir() first to check which entry it is
dst = 'image.jpg'      # change this to the destination name
os.rename(src, dst)    # rename it
os.listdir()           # list the directory again to confirm the renamed file is there

I think you guessed right: 'uploaded' holds your file name. And yes, you can access it for renaming purposes, like this:
import os
dst = 'image.jpg'
os.rename(list(uploaded.keys())[0], dst)
Now, if you have several files uploaded, you should pay attention to which file you choose, since 'uploaded' is a dictionary and its ordering is not guaranteed. A sketch for renaming every upload follows below.
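If you do have several uploads, here is a minimal sketch (the new names are purely illustrative) that renames every uploaded file in one pass:
import os
# 'uploaded' is the dict returned by files.upload(); its keys are the original file names.
for i, name in enumerate(sorted(uploaded.keys())):
    os.rename(name, 'image_{}.jpg'.format(i))  # illustrative destination names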

Related

how to read multiple .xlsx files from multiple subfolders using python

I have one folder that includes 10-12 subfolders, and from each subfolder I need to read a specific .xlsx file. I am stuck: I want to collect all the .xlsx files with os.walk, but I don't know how to proceed further.
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith("abc.xlsx"):
If you would like to use os.walk, this is how:
import os
reqfiles = []
for root, dirs, files in os.walk(path):
    # collect the full path of every matching file in every subfolder
    reqfiles.extend(os.path.join(root, f) for f in files if f.endswith("abc.xlsx"))
If all the files were in a single folder, plain os.listdir would do (note that it does not recurse into subfolders):
reqfiles = [i for i in os.listdir(path) if i.endswith("abc.xlsx")]
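To actually read the matched workbooks, a short sketch (assuming the full paths collected above and pandas installed) using pandas.read_excel:
import pandas as pd
# Read each matched workbook into a dict of dataframes keyed by file path.
frames = {f: pd.read_excel(f) for f in reqfiles}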

How to actually save a csv file to google drive from colab?

So, this problem seems very simple, but apparently it is not.
I need to transform a pandas dataframe to a csv file and save it in google drive.
My drive is mounted, I was able to save a zip file and other kinds of files to my drive.
However, when I do:
df.to_csv("file_path\data.csv")
it seems to save it where I want: the file shows up in the left panel in my Colab, where you can see all your files from all your directories. I can also read this csv file back into a dataframe with pandas in the same Colab.
HOWEVER, when I actually go to my Google Drive, I can never find it! And I need code that saves it to my Drive, because I want the user to be able to just run all cells and find the csv file in the Drive.
I have tried everything I could find online and I am running out of ideas!
Can anyone help please?
I have also tried this, which creates a visible file named data.csv, but it only contains the file path:
import csv
with open('file_path/data.csv', 'w', newline='') as csvfile:
csvfile.write('file_path/data.csv')
HELP :'(
edit :
import csv
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data_tmp.csv') as f:
    s = f.read()
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data.csv', 'w', newline='') as csvfile:
    csvfile.write(s)
seems to do the trick.
First export as csv with pandas (I named this one data_tmp.csv),
then read it and put its contents in a variable,
then write the result of that read into another file that I named data.csv;
this data.csv file can be found in my drive :)
HOWEVER, when the csv file I try to open is too big (mine has 100,000 rows), it does nothing.
Has anyone got any idea?
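As an aside, the read-then-write trick above is essentially a file copy, so a hedged alternative (paths follow the question; adjust to your layout) is to let shutil stream the bytes, which also copes better with large files:
import shutil
# Copy the locally written CSV into the mounted Drive folder.
shutil.copyfile('data_tmp.csv', '/content/drive/MyDrive/Datatourisme/tests_automatisation/data.csv')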
First of all, mount your Google Drive in Colab:
from google.colab import drive
drive.mount('/content/drive')
Allow Google Drive permission
Save your data frame as CSV using this function:
import pandas as pd
filename = 'filename.csv'
df.to_csv('/content/drive/' + filename)
In some cases the directory '/content/drive/' may not work, so try '/content/drive/MyDrive/' instead.
Hope it helps!
Here:
df.to_csv( "/Drive Path/df.csv", index=False, encoding='utf-8-sig')
I recommend using pandas to work with data in Python; it works very well.
Here is a simple tutorial: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
Then, to save your data frame to Drive (with your Drive already mounted), use the to_csv function:
dataframe.to_csv("/content/drive/MyDrive/filename.csv", index=False) will do the trick
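Putting the pieces together, a minimal sketch (the path and dataframe are illustrative); drive.flush_and_unmount() forces pending writes to sync to Drive, which is often why a freshly written file does not show up there:
from google.colab import drive
import pandas as pd

drive.mount('/content/drive')
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})  # illustrative data
df.to_csv('/content/drive/MyDrive/data.csv', index=False)
# Make sure pending writes are synced to Drive before the runtime ends.
drive.flush_and_unmount()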

How to load images in Google Colab notebook using Tensorflow from mounted Google drive

In a Google Colab notebook, I have my Google drive mounted and can see my files.
I'm trying to load a zipped directory that has two folders with several picture files in each.
I followed an example from the Tensorflow site that has an example on how to load pictures, but it's using a remote location.
Here's the site - https://www.tensorflow.org/tutorials/load_data/images
Here's the code from the example that works:
data_root_orig = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    fname='flower_photos', untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root)
Here's the revised code where I tried to reference the zipped directory from the mounted Google drive:
data_root_orig = tf.keras.utils.get_file(
    origin='/content/gdrive/My Drive/TrainingPictures/',
    fname='TrainingPictures_Car', untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root)
I get this error:
ValueError: unknown url type: '/content/gdrive/My Drive/TrainingPictures/'
It's obviously expecting a URL instead of the path as I've provided.
I would like to know how I can load the zipped directory as provided from the Google drive.
In this case there is no need to use tf.keras.utils.get_file(); a path is enough.
Here are 2 ways to do that.
First: !unzip -q '/content/gdrive/My Drive/TrainingPictures/TrainingPictures_Car.zip'
It will be unzipped into '/content/':
import pathlib
data = pathlib.Path('/content/folders_inside_zip')
count = len(list(data.glob('*/*.jpg')))
count
Second: if the archive is already unzipped in Google Drive:
import pathlib
data = pathlib.Path('/content/gdrive/My Drive/TrainingPictures/')
count = len(list(data.glob('*.jpg')))
count
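Once the pictures sit on a local path, a hedged sketch (path, image size and batch size are assumptions) for loading them as a Keras dataset with tf.keras.utils.image_dataset_from_directory:
import tensorflow as tf
# Assumed location of the two class folders after unzipping; adjust to your layout.
ds = tf.keras.utils.image_dataset_from_directory(
    '/content/TrainingPictures_Car',
    image_size=(224, 224),  # resize target; pick what your model expects
    batch_size=32)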
In my case it actually worked by removing all imports and libraries and just setting the path as a string. The file has to be uploaded into Google Colab first.
content_path = "cat.jpg"
For me it worked with file:///content/(filename)

How to mount a Google Shared Drive in Google Colaboratory?

I am working in Google Colaboratory. Until about a week ago, I was able to mount my shared drive using the following:
from google.colab import drive
drive.mount('/content/drive/')
and then read a data file using:
data = pd.read_csv('/content/drive/Team Drives/TestProject/test.csv')
About a week ago, after they updated team drives to shared drives, that stopped working. How do I access my shared drive files now?
All that needed to be done was to update "Team Drives" to "Shared drives".
Changing the code to this works:
data = pd.read_csv('/content/drive/Shared drives/TestProject/test.csv')
As of 2022, it looks like the Shared Drive path is Shareddrives with no space. So:
drive.mount('/content/drive/')
# /content/drive/Shareddrives/Foo now points to my Foo shared drive
This is the only simple solution that I found myself, and it works like a charm:
Step 1: mount google drive normally:
from google.colab import drive
drive.mount("/content/drive")
Step 2: while saving or fetching data from shared drive:
a) Writing data to the shared drive:
!cp "my-video.mp4" "/content/drive/Shareddrives/<shared_drive_name>/my-video.mp4"
b) Reading data from the shared drive:
!cat /content/drive/Shareddrives/<shared_drive_name>/example.txt
# or
pd.read_csv("/content/drive/Shareddrives/<shared_drive_name>/data_example.csv")
Note: if it does not work, try renaming the shared drive so that its name contains no spaces.
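To double-check the exact mount path and drive name, a small sketch (assuming the mount from step 1 succeeded) that lists what Colab actually sees:
import os
# Shows the shared drives visible under the mount point.
print(os.listdir('/content/drive/Shareddrives'))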

Pandas save CSV ZIP with proper internal name

I'm running on Pandas 0.23.4.
I have a DataFrame called df. On it, I invoke:
df.to_csv('name.csv.zip', compression='zip')
This creates a zip file called name.csv.zip. Inside it, however, the CSV file is called name.csv.zip and not name.csv. How can I correct this?
In pandas 0.24 there is a new to_csv keyword, compression='infer', which looks at the suffix of the file being saved. Unfortunately it does not work that well with zip archives, because the name of the file being saved is also used as the name of the member inside the archive, and there is no way to provide the member name separately. So on extraction you get the prompt replace df.csv.zip? [y]es, [n]o, [A]ll, [N]one, [r]ename: and are left to rename the archive members yourself. The same happens when infer is not used and a file name plus compression='zip' are given explicitly:
Saving df.csv with compression 'zip' gives df.csv with df.csv in it: the archive does not get a .zip suffix, which can be annoying to someone trying to use the file.
Saving df.csv.zip with compression 'zip' gives df.csv.zip with df.csv.zip as the archive member name, which can be annoying when extracting because of the archive/member name collision.
Yet a zip archive can be constructed with proper archive member names:
import pandas as pd
import zipfile as zf
from io import StringIO  # pandas.compat.StringIO was removed in later pandas versions

print(pd.__version__)

csvdata = StringIO("""index,id1,id2,timestamp,number
465,255,3644,2019-05-02 08:00:20.137000,123123
62,87,912,2019-05-02 5:00:00,435456
""")

# prep dataframe
df = pd.read_csv(csvdata, sep=",")

# write the CSV text under an explicit archive member name
with zf.ZipFile('archive.zip', 'w') as myziparchive:
    myziparchive.writestr('df.csv', df.to_csv())
$ file archive.zip
archive.zip: Zip archive data, at least v2.0 to extract
$ zip --show-files archive.zip
Archive contains:
  df.csv
Total 1 entries (119 bytes)
And more than one dataframe can be placed inside; see the sketch below.
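A minimal sketch of that (df1 and df2 are illustrative dataframes):
with zf.ZipFile('archive.zip', 'w') as myziparchive:
    myziparchive.writestr('df1.csv', df1.to_csv())
    myziparchive.writestr('df2.csv', df2.to_csv())
On newer pandas (1.0 and later), to_csv itself accepts a dict for compression, e.g. compression={'method': 'zip', 'archive_name': 'df.csv'}, which solves the member-name problem directly.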