How to actually save a csv file to google drive from colab? - pandas

This problem seems very simple, but apparently it is not.
I need to transform a pandas dataframe to a csv file and save it in google drive.
My drive is mounted, I was able to save a zip file and other kinds of files to my drive.
However, when I do:
df.to_csv("file_path\data.csv")
It seems to save it where I want: it appears in the left panel in my Colab, where you can see all the files from all your directories. I can also read this CSV file back as a dataframe with pandas in the same Colab.
However, when I actually go to my Google Drive, I can never find it! I need code that saves it to my Drive, because I want the user to be able to just run all cells and find the CSV file in the Drive.
I have tried everything I could find online and I am running out of ideas!
Can anyone help please?
I have also tried this, which creates a visible file named data.csv, but it only contains the file path:
import csv
with open('file_path/data.csv', 'w', newline='') as csvfile:
    csvfile.write('file_path/data.csv')
HELP :'(
edit :
import csv
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data_tmp.csv') as f:
    s = f.read()
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data.csv', 'w', newline='') as csvfile:
    csvfile.write(s)
seems to do the trick.
First export as csv with pandas (named this one data_tmp.csv),
then read it and put that in a variable,
then write the result of this "reading" into another file that I named data.csv,
this data.csv file can be found in my drive :)
However, when the CSV file I try to open is too big (mine has 100,000 rows), it does nothing.
Has anyone got any idea?
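For the large-file case, one thing that may be worth trying is to copy the file in chunks instead of reading it all into one variable. The sketch below is not a confirmed fix for the behaviour described above; the function name `copy_to_drive` and the chunk size are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def copy_to_drive(src, dst_dir, chunk_size=1024 * 1024):
    """Copy src into dst_dir chunk by chunk and return the destination path."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / Path(src).name
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        # copyfileobj streams the data instead of loading the whole file
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst
```

In Colab this would be called with the temporary CSV and the target folder under /content/drive/MyDrive. Drive syncs in the background, so a large file may take a while to show up in the web interface.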

First of all, mount your Google Drive in Colab:
from google.colab import drive
drive.mount('/content/drive')
Allow the Google Drive permission when prompted.
Then save your data frame as CSV with to_csv:
import pandas as pd
filename = 'filename.csv'
df.to_csv('/content/drive/' + filename)
In some cases the directory '/content/drive/' may not work; if so, try '/content/drive/MyDrive/' (note the leading slash).
Hope it helps!
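A small check before writing can make a missing mount obvious. This is a minimal sketch: the helper name `drive_csv_path` is hypothetical, and the default root assumes the Drive was mounted at /content/drive as in the answer above.

```python
import os


def drive_csv_path(filename, drive_root="/content/drive/MyDrive"):
    """Return the full path for filename under drive_root.

    Raises FileNotFoundError if drive_root is missing, which usually
    means the Drive has not been mounted yet.
    """
    if not os.path.isdir(drive_root):
        raise FileNotFoundError(
            f"{drive_root!r} does not exist; is Google Drive mounted?"
        )
    return os.path.join(drive_root, filename)
```

With that in place, the save becomes df.to_csv(drive_csv_path('filename.csv'), index=False).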

Here:
df.to_csv( "/Drive Path/df.csv", index=False, encoding='utf-8-sig')

I recommend using pandas to work with data in Python; it works very well.
Here is a simple tutorial: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
Then, to save your data frame to Drive, if your Drive is already mounted, use the to_csv function:
dataframe.to_csv("/content/drive/MyDrive/filename.csv", index=False)
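One detail from the question may matter here: the path was written with a backslash ("file_path\data.csv"). Colab runs on Linux, where the backslash is not a path separator, so such a string names a single file rather than a file inside a folder. The sketch below only illustrates that rule with the placeholder path from the question; whether it explains the missing file depends on the real path that was used.

```python
import posixpath

# A raw string keeps the backslash exactly as it was typed in the question.
with_backslash = r"file_path\data.csv"
with_slash = "file_path/data.csv"

# posixpath applies Linux path rules regardless of the host platform.
print(posixpath.split(with_backslash))  # ('', 'file_path\\data.csv')
print(posixpath.split(with_slash))      # ('file_path', 'data.csv')
```

With the backslash, the directory part is empty, so the file lands in the current working directory under the literal name file_path\data.csv.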

Related

!cp Executes only once when copying a file from Google Colab to Google Drive

I am running a long process that will take hours and hours to finish; therefore I want to dynamically save the results I receive from an API to my Google Drive. !cp command works, but only once. It copies the file to my drive, but refuses to overwrite or update it later on.
I tried
Changing the file from private to public.
Deleting the file after !cp has been executed once to see if it will create a new one.
Played around with dynamic file names, such as file_name = f"FinishedCheckpoint_{index}", but this did not work either. After creating the file with index 0, it just stops any further updates. The files are still generated under the Colab notebooks directory, but they are not uploaded to Google Drive, which is essential to avoid losing progress.
Code cell below, any ideas?
from google.colab import drive
drive.mount('/content/gdrive')
answers = []
for index, row in df.iterrows():
    answer = prompt_to_an_api(...)
    answers.append(answer)
    pd.DataFrame(answers).to_csv('FinishedCheckpoint.csv')
    !cp FinishedCheckpoint.csv "gdrive/My Drive/Colab Notebooks/Runtime Results"
pd.DataFrame(answers).to_csv('Finished.csv')
!cp Finished.csv "gdrive/My Drive/Colab Notebooks/Runtime Results"
drive.flush_and_unmount()
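One variation that might be worth trying is to replace the !cp shell call with a plain Python copy, so the checkpoint is rewritten and copied on every iteration. This is only a sketch and not a confirmed fix for the behaviour above: `save_checkpoint` and `run` are hypothetical names, `process` stands in for prompt_to_an_api, and the csv module is used instead of pandas to keep the example self-contained.

```python
import csv
import os
import shutil


def save_checkpoint(rows, local_path, drive_dir):
    """Write rows to local_path as CSV, then copy that file into drive_dir."""
    with open(local_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    os.makedirs(drive_dir, exist_ok=True)
    # shutil.copy overwrites an existing file with the same name
    return shutil.copy(local_path, drive_dir)


def run(items, process, local_path, drive_dir):
    """Process items one by one, refreshing the checkpoint after each answer."""
    answers = []
    for item in items:
        answers.append([item, process(item)])
        save_checkpoint(answers, local_path, drive_dir)
    return answers
```

In Colab, drive_dir would be the absolute path of the target folder under the mount point, for example "/content/gdrive/My Drive/Colab Notebooks/Runtime Results".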

trouble with utf-8 with julia and jupyterlab

I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and I save it as XLSX. When I try to read it in jupyterlab I get the error the file is not UTF-8 encoded and therefore the file is not read.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it with jupyterlab, I get a pop up with the file is not UTF-8 encoded and the file is not opened.
When I try to open the file in Ubuntu (with LibreOffice) I do not see anything suspicious.
As I'm new to Julia I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the dataframe in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab by default doesn't support opening xlsx files (it isn't mentioned in the file formats list here for example).
See also this similar question involving Python pandas (which says pretty much the same thing).

How to change file name in Google Colab

After uploading a file in Google Colab with the code below
from google.colab import files
uploaded = files.upload()
How can I change its name?
import os
src = os.listdir()[1]  # find the name of the file you want to rename, using indexing
dst = 'image.jpg'  # change it to the destination name
os.rename(src, dst)  # rename it
os.listdir()[1]  # access the renamed file
I think you guessed right that 'uploaded' holds your file name. And yes, you can access it for renaming purposes, like this:
import os
dst ='image.jpg'
os.rename(list(uploaded.keys())[0], dst)
Now, if you have several files uploaded, you should pay attention to which file you choose, since 'uploaded' is a dictionary and its keys are not sorted in any particular way.
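To avoid picking the wrong entry, the rename can refuse to run unless exactly one file was uploaded. A minimal sketch, assuming `uploaded` is the dictionary returned by files.upload() (file names as keys) and that the file sits in the current directory; the helper name `rename_uploaded` is hypothetical.

```python
import os


def rename_uploaded(uploaded, new_name, directory="."):
    """Rename the single file listed in `uploaded` to new_name."""
    names = list(uploaded.keys())
    if len(names) != 1:
        raise ValueError(f"expected exactly one uploaded file, got {len(names)}")
    src = os.path.join(directory, names[0])
    dst = os.path.join(directory, new_name)
    os.rename(src, dst)
    return dst
```

Usage would be rename_uploaded(uploaded, 'image.jpg') right after the upload cell.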

pandas.read_csv of a gzip file within a zipped directory

I would like to use pandas.read_csv to open a gzip file (.asc.gz) within a zipped directory (.zip). Is there an easy way to do this?
This code doesn't work:
csv = pd.read_csv(r'C:\folder.zip\file.asc.gz')  # can't find the file
This code does work (however, it requires me to unzip the folder, which I want to avoid because my dataset currently contains thousands of zipped folders):
csv = pd.read_csv(r'C:\folder\file.asc.gz')
Is there an easy way to do this? I have tried using a combination of zipfile.Zipfile and read_csv, but have been unsuccessful (I think partly due to the fact that this is an ascii file as well)
Maybe the following might help.
df = pd.read_csv('filename.gz', compression='gzip')
OR
import gzip
file=gzip.open('filename.gz','rb')
content=file.read()
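Neither snippet above reaches inside the .zip itself. One way to do that without unzipping to disk is to read the compressed member from the archive and decompress it in memory. This is a sketch under assumptions: the helper name `open_gz_in_zip` is hypothetical, the member name must match how the file is stored inside the archive, and ASCII text is assumed because the question mentions an ASCII file.

```python
import gzip
import io
import zipfile


def open_gz_in_zip(zip_path, member):
    """Return a text stream over a gzip-compressed member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        compressed = archive.read(member)  # raw .gz bytes, still compressed
    # Decompress in memory; nothing is extracted to disk.
    binary = gzip.GzipFile(fileobj=io.BytesIO(compressed))
    return io.TextIOWrapper(binary, encoding="ascii")
```

Since pd.read_csv accepts file-like objects, the returned stream can be passed to it directly, e.g. pd.read_csv(open_gz_in_zip(r'C:\folder.zip', 'file.asc.gz')).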

How to load images in Google Colab notebook using Tensorflow from mounted Google drive

In a Google Colab notebook, I have my Google drive mounted and can see my files.
I'm trying to load a zipped directory that has two folders with several picture files in each.
I followed an example from the Tensorflow site that has an example on how to load pictures, but it's using a remote location.
Here's the site - https://www.tensorflow.org/tutorials/load_data/images
Here's the code from the example that works:
data_root_orig = tf.keras.utils.get_file(origin='https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
                                         fname='flower_photos', untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root)
Here's the revised code where I tried to reference the zipped directory from the mounted Google drive:
data_root_orig = tf.keras.utils.get_file(origin='/content/gdrive/My Drive/TrainingPictures/',
                                         fname='TrainingPictures_Car', untar=True)
data_root = pathlib.Path(data_root_orig)
print(data_root)
I get this error:
ValueError: unknown url type: '/content/gdrive/My Drive/TrainingPictures/'
It's obviously expecting a URL instead of the path as I've provided.
I would like to know how I can load the zipped directory as provided from the Google drive.
In this case there is no need to use tf.keras.utils.get_file(); a Path is enough.
Here are two ways to do that.
First: !unzip -q '/content/gdrive/My Drive/TrainingPictures/TrainingPictures_Car.zip'
It will be unzipped into '/content/'.
import pathlib
data = pathlib.Path('/content/folders_inside_zip')
count = len(list(data.glob('*/*.jpg')))
count
Second: if the archive is already unzipped in Google Drive:
import pathlib
data = pathlib.Path('/content/gdrive/My Drive/TrainingPictures/')
count = len(list(data.glob('*.jpg')))
count
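The first approach can also be done in Python rather than with !unzip. A minimal sketch using the standard zipfile module: the function name `extract_and_count` is hypothetical, and the default pattern assumes the archive holds class folders with .jpg files directly inside them, as in the question.

```python
import pathlib
import zipfile


def extract_and_count(zip_path, dest_dir, pattern="*/*.jpg"):
    """Extract zip_path into dest_dir and count the files matching pattern."""
    dest = pathlib.Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return len(list(dest.glob(pattern)))
```

The extracted folder can then be wrapped in pathlib.Path and used the same way as data_root in the TensorFlow tutorial.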
In my case it actually worked by removing all imports and libraries and just setting the path as a string. The file has to be uploaded into the google colab.
content_path = "cat.jpg"
For me it worked with file:///content/(filename)