Problem reading a CSV file into Python - pandas

I have a very elementary CSV-reading program which does not work:
import pandas as pd
# Reading the tips.csv file
data = pd.read_csv('tips.csv')
The error messages are long and end with "tips.csv not found".

Is your csv file in the same folder?
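To make the comment above concrete: pd.read_csv resolves a relative path against the current working directory, which is not necessarily the folder the script lives in. A minimal sketch for checking this (it assumes nothing beyond the tips.csv name from the question):

```python
import os
import pandas as pd

# pd.read_csv resolves a relative path like 'tips.csv' against the
# current working directory, not the script's own folder
print(os.getcwd())                  # where Python is actually running
print(os.path.exists('tips.csv'))   # False means the file is not in that folder

# building the path explicitly makes the assumption visible
path = os.path.join(os.getcwd(), 'tips.csv')
if os.path.exists(path):
    data = pd.read_csv(path)
```

If the second print shows False, either move tips.csv next to the notebook/script or pass an absolute path.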

Problem importing txt file data with pandas

df = pd.read_csv("AVG.txt")
df
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I'm a beginner trying to interpret some data with Python, and I ran into this error while trying to load the file.
This is probably an encoding issue; a 0xff byte at position 0 looks like a UTF-16 byte-order mark. Try:
df = pd.read_csv("AVG.txt", encoding="utf-16")
You may also try using the basic open() function and parse it later on
The file is a .txt file. If you save it as a .csv it should work fine (e.g. copy it into Excel and use Save As). I just tried it and it worked.
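If renaming is not an option, the open()-based route mentioned above can be made concrete: read the raw text with an explicit encoding, then hand it to pandas. A sketch with a small stand-in file (the real AVG.txt contents and its tab separator are assumptions):

```python
import io
import pandas as pd

# write a small UTF-16 sample standing in for AVG.txt (hypothetical contents)
with open("AVG.txt", "w", encoding="utf-16") as f:
    f.write("a\tb\n1\t2\n")

# read the raw text with an explicit encoding, then hand the decoded
# string to pandas via StringIO
with open("AVG.txt", encoding="utf-16") as f:
    text = f.read()

df = pd.read_csv(io.StringIO(text), sep="\t")  # many .txt exports are tab-separated
print(df.shape)  # (1, 2)
```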

Save the file in a different folder using python

I have a pandas dataframe and I would like to save it as a text file in another folder. Here is what I have tried so far:
import pandas as pd
df.to_csv(path = './output/filename.txt')
This does not save the file and gives me an error. How do I save the dataframe (df) into the folder called output?
The first argument of to_csv() is named path_or_buf, not path; either rename the keyword or just pass the path positionally:
df.to_csv('./output/filename.txt')
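If the error persists after fixing the argument name, another common cause is that the output folder does not exist; to_csv will not create missing directories. A minimal sketch (the dataframe below is a stand-in for the question's df):

```python
import os
import pandas as pd

df = pd.DataFrame({'x': [1, 2]})         # stand-in for the question's dataframe
os.makedirs('./output', exist_ok=True)   # to_csv does not create missing folders
df.to_csv('./output/filename.txt', index=False)
```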

How to actually save a csv file to google drive from colab?

This problem seems very simple, but apparently it is not.
I need to transform a pandas dataframe to a csv file and save it in google drive.
My drive is mounted, I was able to save a zip file and other kinds of files to my drive.
However, when I do:
df.to_csv("file_path\data.csv")
it seems to save it where I want, it's on the left panel in my colab, where you can see all your files from all your directories. I can also read this csv file as a dataframe with pandas in the same colab.
HOWEVER, when I actually go to my Google Drive, I can never find it! But I need code that saves it to my Drive, because I want the user to be able to just run all the cells and find the CSV file in the Drive.
I have tried everything I could find online and I am running out of ideas!
Can anyone help please?
I have also tried this, which creates a visible file named data.csv, but it only contains the file path:
import csv
with open('file_path/data.csv', 'w', newline='') as csvfile:
    csvfile.write('file_path/data.csv')
HELP :'(
edit :
import csv
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data_tmp.csv') as f:
    s = f.read()
with open('/content/drive/MyDrive/Datatourisme/tests_automatisation/data.csv', 'w', newline='') as csvfile:
    csvfile.write(s)
seems to do the trick.
First export as CSV with pandas (I named this one data_tmp.csv), then read it into a variable, then write the result of this read into another file that I named data.csv. This data.csv file can be found in my Drive :)
HOWEVER, when the CSV file I try to open is too big (mine has 100,000 rows), it does nothing.
Has anyone got any idea?
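A note on the read-then-write workaround in the edit above: f.read() loads the entire file into memory at once, which may be why it stalls on the 100,000-row file. shutil copies in fixed-size chunks instead. A sketch with stand-in paths (a temp directory replaces the Drive paths from the question):

```python
import os
import shutil
import tempfile

# stand-ins for the Drive paths in the question
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'data_tmp.csv')
dst = os.path.join(tmp, 'data.csv')
with open(src, 'w') as f:
    f.write('a,b\n' + '1,2\n' * 1000)   # stand-in for the large file

# shutil streams the copy in chunks instead of reading it all into memory
shutil.copyfile(src, dst)
```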
First of all, mount your Google Drive in Colab:
from google.colab import drive
drive.mount('/content/drive')
Allow Google Drive permission
Save your data frame as CSV like this:
import pandas as pd
filename = 'filename.csv'
df.to_csv('/content/drive/' + filename)
In some cases the directory '/content/drive/' may not work, so try '/content/drive/MyDrive/' instead.
Hope it helps!
Here:
df.to_csv("/Drive Path/df.csv", index=False, encoding='utf-8-sig')
I recommend using pandas to work with data in Python; it works very well.
Here is a simple tutorial: https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
Then, to save your data frame to Drive (if you have your Drive already mounted), use the to_csv function:
df.to_csv("/content/drive/MyDrive/filename.csv", index=False)
That will do the trick.
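One more thing worth checking when the file shows up in the Colab file browser but not in the Drive web UI: the write may still be sitting in Colab's local cache. Calling drive.flush_and_unmount() forces pending writes to sync before the session ends. A sketch (the Colab-only import is guarded so the snippet also runs outside Colab; the filename is a stand-in):

```python
import pandas as pd

df = pd.DataFrame({'x': [1]})  # stand-in dataframe

try:
    from google.colab import drive  # only available inside Colab
    drive.mount('/content/drive')
    df.to_csv('/content/drive/MyDrive/filename.csv', index=False)
    # force pending writes to sync to Drive before the session ends
    drive.flush_and_unmount()
except ImportError:
    pass  # not running in Colab
```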

Reading csv file from s3 using pyarrow

I want to read a CSV file located in an S3 bucket using pyarrow and convert it to Parquet in another bucket.
I am facing a problem reading the CSV file from S3. I tried the code below but it failed. Does pyarrow support reading CSV from S3?
from pyarrow import csv
s3_input_csv_path='s3://bucket1/0001.csv'
table=csv.read_csv(s3_input_csv_path)
This is throwing error
"errorMessage": "Failed to open local file 's3://bucket1/0001.csv', error: No such file or directory",
I know we can read the CSV file using boto3, then use pandas to convert it into a data frame, and finally convert it to Parquet using pyarrow. But in this approach pandas also has to be added to the package, which pushes the package size beyond the 250 MB limit for Lambda when combined with pyarrow.
Try passing a file handle to pyarrow.csv.read_csv instead of an S3 file path.
Note that future versions of pyarrow will have built-in S3 support, but I am not sure of the timeline (and any answer I provide here will quickly grow out of date given the nature of Stack Overflow).
import pyarrow.parquet as pq
from pyarrow import csv
from s3fs import S3FileSystem

s3 = S3FileSystem()  # or S3FileSystem(key=ACCESS_KEY_ID, secret=SECRET_ACCESS_KEY)

s3_input_csv_path = "s3://bucket1/0001.csv"
# Open the object and pass the file handle to pyarrow's CSV reader
with s3.open(s3_input_csv_path, "rb") as f:
    table = csv.read_csv(f)

# Writing the table to another bucket as Parquet
s3_output_path = "s3://bucket2/0001"
pq.write_to_dataset(table=table,
                    root_path=s3_output_path,
                    filesystem=s3)
AWS has a project (AWS Data Wrangler) that helps with the integration between Pandas/PyArrow and their services.
Example of CSV read:
import awswrangler as wr
df = wr.s3.read_csv(path="s3://...")
Reference
It's not possible as of now. But here is a workaround: we can load the data with pandas and cast it to a pyarrow table.
import pandas as pd
import pyarrow as pa
df = pd.read_csv("s3://your_csv_file.csv", nrows=10)  # reading 10 lines
pa.Table.from_pandas(df)

Cannot open a csv file

I have a CSV file I need to work on in my Jupyter notebook. Even though I am able to view the contents of the file using the code in the picture, when I try to convert the data into a data frame I get a "no columns to parse from file" error.
I have no headers. My CSV file looks like this, and I have saved it in UTF-8 format.
Try using pandas to read the CSV file:
import pandas as pd
df = pd.read_csv("BON3_NC_CUISINES.csv")
print(df)
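Since the question says the file has no header row, passing header=None may also be needed; otherwise pandas treats the first data row as column names. (The "no columns to parse from file" error itself usually means pandas received an empty file or stream, so it is also worth checking that the file was saved with content.) A sketch with stand-in contents and hypothetical column names:

```python
import pandas as pd

# stand-in contents, since BON3_NC_CUISINES.csv is not available here
with open('BON3_NC_CUISINES.csv', 'w', encoding='utf-8') as f:
    f.write('pizza,10\nsushi,12\n')

# header=None stops pandas from using the first row as column names;
# the names below are hypothetical placeholders
df = pd.read_csv('BON3_NC_CUISINES.csv', header=None, names=['cuisine', 'price'])
print(df.shape)  # (2, 2)
```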