Cannot open a CSV file with pandas

I have a CSV file that I need to work with in my Jupyter notebook. Even though I am able to view the contents of the file using the code in the picture,
when I try to convert the data into a DataFrame I get a "No columns to parse from file" error.
The file has no headers. My CSV file looks like this, and I have saved it in UTF-8 format.

Try using pandas to read the CSV file:
import pandas as pd
df = pd.read_csv("BON3_NC_CUISINES.csv")
print(df)
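The "No columns to parse from file" message comes from pandas' EmptyDataError, which is raised when the input contains no data at all (for example an empty file, or a path that points to a different, empty file). The sketch below uses an in-memory string in place of the real file; the sample rows and the column names passed to names= are hypothetical. It shows reading a headerless CSV with header=None, and the error on empty input.

```python
import io

import pandas as pd

# Hypothetical headerless CSV content standing in for BON3_NC_CUISINES.csv
sample = "Italian,12\nFrench,7\nThai,5\n"

# header=None: treat the first row as data, not column names.
# names= supplies hypothetical labels for the columns.
df = pd.read_csv(io.StringIO(sample), header=None, names=["cuisine", "count"])
print(df)

# Empty input reproduces the "No columns to parse from file" error
try:
    pd.read_csv(io.StringIO(""))
except pd.errors.EmptyDataError as exc:
    print("EmptyDataError:", exc)
```

If the real file still raises this error, it is worth checking that the path really points at the file you viewed and that the file is not empty on disk.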

Related

How to read data from multiple folders in ADLS into a Databricks DataFrame

The file path format is data/<year>/<week number>/<day number>/<hour>/data_<hour>.parquet, for example:
data/2022/05/01/00/data_00.parquet
data/2022/05/01/01/data_01.parquet
data/2022/05/01/02/data_02.parquet
data/2022/05/01/03/data_03.parquet
data/2022/05/01/04/data_04.parquet
data/2022/05/01/05/data_05.parquet
data/2022/05/01/06/data_06.parquet
data/2022/05/01/07/data_07.parquet
How can I read all of these files one by one in a Databricks notebook and store them in a DataFrame?
import pandas as pd
# Get all the files under the folder (`file` is assumed to hold the folder path)
data = dbutils.fs.ls(file)
df = pd.DataFrame(data)
# Create the list of file paths
file_list = df.path.tolist()
for path in file_list:
    # Each iteration reassigns df, so only the last file's data is kept
    df = spark.read.load(path=f'{path}*', format='parquet')
I am only able to read the last file; the other files are skipped.
The last line of your code does not load data incrementally. Instead, it overwrites the df variable with the data from the current path on every iteration, so only the last file's data remains.
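To make the overwrite problem concrete, here is a small pandas analogue (not Spark, and not the answerer's code). The per-hour "files" are hypothetical in-memory CSV strings; the point is only the loop pattern: reassigning inside the loop keeps the last file, while collecting the frames and concatenating once keeps them all.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the per-hour files
files = {
    "data/2022/05/01/00/data_00.csv": "hour,value\n0,10\n",
    "data/2022/05/01/01/data_01.csv": "hour,value\n1,20\n",
    "data/2022/05/01/02/data_02.csv": "hour,value\n2,30\n",
}

# Buggy pattern: df is reassigned on every iteration,
# so only the last file's rows survive the loop
for path, text in files.items():
    df = pd.read_csv(io.StringIO(text))
print(len(df))  # 1

# Fix: collect each frame, then concatenate once after the loop
frames = []
for path, text in files.items():
    frames.append(pd.read_csv(io.StringIO(text)))
combined = pd.concat(frames, ignore_index=True)
print(len(combined))  # 3
```

In Spark the same idea applies, but passing a wildcard path (or several paths, since spark.read.parquet accepts more than one) lets the reader combine the files for you without a Python loop.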
Remove the for loop and try the code below to see how wildcard (asterisk) path matching works. Note that the path should be a full path (I'm not sure whether the data folder is your root folder or not).
df = spark.read.load(path='/data/2022/05/*/*/*.parquet',format='parquet')
This is based on the same answer I shared with you in the comments.
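To check which files a wildcard path like '/data/2022/05/*/*/*.parquet' covers before handing it to Spark, you can test the pattern locally. This sketch uses Python's pathlib, where each * matches within a single path component; that is similar to, but not guaranteed identical to, the Hadoop-style globbing Spark uses. It also assumes data/ sits at the filesystem root, as in the answer.

```python
from pathlib import PurePosixPath

# Paths from the question, made absolute (assumes data/ is at the root)
paths = [f"/data/2022/05/01/{h:02d}/data_{h:02d}.parquet" for h in range(8)]

# Each * matches exactly one path component: day folder, hour folder, file name
pattern = "/data/2022/05/*/*/*.parquet"
matched = [p for p in paths if PurePosixPath(p).match(pattern)]
print(len(matched))  # 8
```

Paths from another week, with fewer folder levels, or with another extension do not match, which is a quick way to spot a pattern that is too narrow or too broad.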

Problem importing txt file data with pandas

df = pd.read_csv("AVG.txt")
df
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I'm a beginner trying to analyze some data with Python, and I ran into this error while trying to load the file.
This is the file I am trying to upload:
File
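This is probably an encoding issue. A 0xff byte at position 0 is typically the start of a UTF-16 little-endian byte-order mark (FF FE), so try: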
df = pd.read_csv("AVG.txt",encoding="utf-16")
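If you are not sure which encoding a file uses, you can look at its first bytes for a byte-order mark before calling read_csv. This is a minimal sketch: the sniff_encoding helper is hypothetical, it only recognizes UTF-8 and UTF-16 BOMs (UTF-32 is not handled), and the tab-separated sample data is made up in place of AVG.txt.

```python
import codecs
import io

import pandas as pd

# Hypothetical tab-separated content standing in for AVG.txt, encoded as
# UTF-16 little-endian with a byte-order mark (BOM) in front
raw = codecs.BOM_UTF16_LE + "x\ty\n1\t2\n3\t4\n".encode("utf-16-le")
print(raw[:2])  # b'\xff\xfe': the 0xff byte the UTF-8 decoder rejects


def sniff_encoding(first_bytes):
    """Guess an encoding from a BOM; UTF-32 is deliberately not handled."""
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "utf-8"


enc = sniff_encoding(raw[:4])
df = pd.read_csv(io.BytesIO(raw), encoding=enc, sep="\t")
print(df)
```

With a real file, the first bytes would come from `open("AVG.txt", "rb").read(4)`; the sep argument must match whatever delimiter the file actually uses.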
You could also read the file with the built-in open() function and parse its contents yourself.
The file is a .txt file. If you save it as a .csv (e.g. paste it into Excel and use Save As), it should work fine. I just tried it and it worked.
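A sketch of the open()-then-parse route: open the file as text with an explicit encoding and split the rows with the csv module, then build the DataFrame. The file is created in a temporary folder so the example is self-contained; its name, contents, UTF-16 encoding and tab delimiter are all assumptions standing in for the real AVG.txt.

```python
import csv
import os
import tempfile

import pandas as pd

# Create a hypothetical UTF-16 file standing in for AVG.txt
path = os.path.join(tempfile.mkdtemp(), "AVG.txt")
with open(path, "w", encoding="utf-16", newline="") as fh:
    fh.write("name\tavg\nalpha\t1.5\nbeta\t2.5\n")

# Open the file as text with an explicit encoding, then parse the rows
# with the csv module (tab delimiter assumed for this example)
with open(path, encoding="utf-16", newline="") as fh:
    rows = list(csv.reader(fh, delimiter="\t"))

header, *records = rows
df = pd.DataFrame(records, columns=header)
df["avg"] = df["avg"].astype(float)  # csv yields strings; convert numbers
print(df)
```

This gives you full control over decoding and parsing, at the cost of converting column types yourself.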

Trouble with UTF-8 in Julia and JupyterLab

I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I load it into a DataFrame and save it as XLSX. When I try to open the XLSX file in JupyterLab, I get an error saying the file is not UTF-8 encoded, and the file is not opened.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it in JupyterLab, a pop-up says the file is not UTF-8 encoded, and the file does not open.
When I open the file in Ubuntu (with LibreOffice), I do not see anything suspicious.
As I'm new to Julia, I'm struggling to understand where the problem lies or how to fix it.
I tried to see whether I could encode the DataFrame in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
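Do you have the jupyterlab-spreadsheet plugin installed? By default, JupyterLab doesn't support opening .xlsx files (for example, the format isn't mentioned in its list of supported file formats). An .xlsx file is a binary ZIP archive, so when JupyterLab falls back to opening it as text, it cannot decode it as UTF-8.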
See also this similar question involving Python pandas (which says pretty much the same thing).

Save a DataFrame to a different folder using Python

I have a pandas DataFrame that I would like to save as a text file in another folder. Here is what I have tried so far:
import pandas as pd
df.to_csv(path = './output/filename.txt')
This does not save the file and gives me an error. How do I save the dataframe (df) into the folder called output?
The first parameter of to_csv() is named path_or_buf, not path. Either use that name or just pass the path positionally:
df.to_csv('./output/filename.txt')
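One more thing that commonly causes an error here: to_csv() does not create missing folders, so the output directory has to exist first. This sketch creates it with os.makedirs and writes the file both ways; the sample DataFrame is hypothetical, and a temporary base directory stands in for your working directory so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})  # hypothetical data

# Temporary base directory so the example is self-contained;
# in practice this would be your working directory
base = tempfile.mkdtemp()
out_dir = os.path.join(base, "output")

# to_csv does not create missing folders, so make sure "output" exists first
os.makedirs(out_dir, exist_ok=True)

out_path = os.path.join(out_dir, "filename.txt")
df.to_csv(out_path, index=False)              # path passed positionally
df.to_csv(path_or_buf=out_path, index=False)  # or by its real keyword name

with open(out_path) as fh:
    print(fh.read())
```

index=False is optional; it just keeps the row index out of the text file.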

Importing training data into CloudML with images that do not have a file extension

I created some training data and put the CSV in Google Cloud Storage, but it looks like the import doesn't work when the image files do not have a proper .jpg extension:
Error: INVALID_ROW: Invalid input found at row 1 of gs://weg-li-production/training/test.csv: "Unsupported file extension."
The values look like this:
TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel
Is there a way to work around this issue?
It seems you put the whole "TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel" into a single cell in your CSV file. Each comma should start a new cell. You can open the file in Excel to check it; the correct format should show three columns.
Assuming gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj is the image file and Opel is the label, the row itself looks fine; the only problem is that the image file name does not have a valid extension.
Check https://cloud.google.com/vision/automl/docs/prepare for the file types (extensions) that are valid during training and prediction.
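Both problems mentioned above (a row that is not split into three columns, and an image URI without a valid extension) can be caught locally before importing. This is a hypothetical pre-import checker, not part of any Google tool; in particular ALLOWED_EXTENSIONS is a placeholder that you should replace with the list from the linked documentation.

```python
import csv
import io
import posixpath

# Placeholder set of accepted extensions; confirm the real list
# against the AutoML Vision "prepare" documentation before relying on it
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def check_rows(csv_text):
    """Return (row_number, problem) tuples for an import CSV."""
    problems = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != 3:
            problems.append((i, f"expected 3 columns, got {len(row)}"))
            continue
        _, uri, _ = row
        ext = posixpath.splitext(uri)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append((i, f"unsupported extension {ext!r}"))
    return problems


sample = (
    "TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel\n"
    '"TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel"\n'
    "TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj.jpg,Opel\n"
)
for problem in check_rows(sample):
    print(problem)
```

Row 1 is flagged for its missing extension and row 2 for being a single quoted cell; row 3 passes. One possible workaround, not verified against the import itself, is to copy each object to a name ending in the correct extension (for example with `gsutil cp` or `gsutil mv`) and point the CSV at the new URIs.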