How to import Excel xlsx files into pandas [duplicate] - pandas

This question already has answers here:
Pandas cannot open an Excel (.xlsx) file
(5 answers)
Closed 1 year ago.
I need to import an .XLSX Excel file into pandas, but this is now unsupported and gives the error
XLRDError: Excel xlsx file; not supported
I need an alternative for:
import pandas as pd
df = pd.read_excel("Challenger track/Data Sets/extract-text-1.xlsx", index_col=0)
df.head()

This is a workaround: xlrd 2.0 and later only read legacy .xls files, so pandas can no longer read .xlsx through xlrd. Install openpyxl and specify it as the engine when reading an xlsx file, as below:
xlfile = pd.ExcelFile('sample.xlsx', engine='openpyxl')
df = xlfile.parse('sheet_name')
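The same fix can be written with read_excel itself. A minimal sketch, assuming openpyxl is installed for .xlsx/.xlsm and xlrd for legacy .xls; the helper names excel_engine_for and read_excel_any are hypothetical and only illustrate choosing the engine from the file extension:

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping: openpyxl reads the modern formats, xlrd 2.0+ only reads .xls.
_ENGINES = {".xlsx": "openpyxl", ".xlsm": "openpyxl", ".xls": "xlrd"}


def excel_engine_for(path):
    """Return the reader engine for an Excel file, or None to let pandas decide."""
    return _ENGINES.get(Path(path).suffix.lower())


def read_excel_any(path, **kwargs):
    """Call pd.read_excel with an explicit engine chosen from the file extension."""
    return pd.read_excel(path, engine=excel_engine_for(path), **kwargs)


# Usage with the call from the question:
# df = read_excel_any("Challenger track/Data Sets/extract-text-1.xlsx", index_col=0)
```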


Cannot read .xlsx file with read_excel()

I want to open an .xlsx file with read_excel().
However, I get an error even though both the openpyxl and pandas packages are installed.
The pandas version is 0.24.2 and the openpyxl version is 3.0.10.
The error message is: ValueError: Unknown engine: openpyxl
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx',engine='openpyxl')
print(retail_df.head())
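pandas 0.24.2 has no openpyxl reader for read_excel, which is why engine='openpyxl' raises "Unknown engine: openpyxl". In that version read_excel uses xlrd by default, so simply drop the engine argument. This relies on an xlrd release older than 2.0, because xlrd 2.0 and later no longer read .xlsx files; upgrading pandas is the other option.
So the updated working code for reading Excel files is: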
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx')
print(retail_df.head())
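If the same script has to run on both old and new pandas, one option is to try engine='openpyxl' first and fall back to the default engine only on the "Unknown engine" error quoted in the question. A sketch under that assumption; read_xlsx and its reader hook are hypothetical names used for illustration:

```python
import pandas as pd


def read_xlsx(path, reader=pd.read_excel, **kwargs):
    """Try the openpyxl engine first; fall back to the default engine on old pandas.

    pandas 0.24.x raises ValueError("Unknown engine: openpyxl") for this call.
    `reader` is a hypothetical hook so the fallback can be exercised without a file.
    """
    try:
        return reader(path, engine="openpyxl", **kwargs)
    except ValueError as err:
        # Only handle the unknown-engine error; re-raise anything else unchanged.
        if "Unknown engine" not in str(err):
            raise
        return reader(path, **kwargs)


# Usage with the file from the question:
# retail_df = read_xlsx('./Online_Retail.xlsx')
```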
I tested this code on my side.

How to use pandas.concat instead of append [duplicate]

This question already has answers here:
How to replace pandas append with concat?
(3 answers)
Closed 4 months ago.
I have to import my Excel files and combine them into one file.
I used the code below and it worked, but I got the warning "The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead"
I tried to use concat but it doesn't work. Please help.
import numpy as np
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob(r'path\*.xlsx'):
    df = pd.read_excel(f)
    all_data = all_data.append(df,ignore_index=True)
Use a list comprehension instead of a loop with DataFrame.append:
all_data = pd.concat([pd.read_excel(f) for f in glob.glob(r'path\*.xlsx')],
                     ignore_index=True)
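Calling concat once on a list is also cheaper than appending in a loop, since each append copies everything accumulated so far. One catch: pd.concat raises ValueError ("No objects to concatenate") on an empty list, e.g. when the glob matches no files. A minimal sketch with a hypothetical combine_frames helper and in-memory frames standing in for the Excel files:

```python
import pandas as pd


def combine_frames(frames):
    """Concatenate DataFrames in one pd.concat call, tolerating an empty input."""
    frames = list(frames)
    if not frames:
        # pd.concat([]) would raise ValueError, so return an empty frame instead.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Hypothetical in-memory stand-ins for the frames pd.read_excel would return.
a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3]})
all_data = combine_frames([a, b])
print(all_data["x"].tolist())  # [1, 2, 3]
print(all_data.index.tolist())  # [0, 1, 2]
```

With real files, pass the generator directly: combine_frames(pd.read_excel(f) for f in glob.glob(r'path\*.xlsx')).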

Importing Data with read_csv into DF [duplicate]

This question already has answers here:
How to read a file with a semi colon separator in pandas
(2 answers)
Closed 1 year ago.
I tried to import a CSV file with pandas, but df.head() shows the data in the wrong rows.
import numpy as np
import pandas as pd
df = pd.read_csv(r"C:\Users\micha\OneDrive\Dokumenty\ML\winequality-red.csv")
df.head()
Can you help me?
It seems your data is not comma-separated but semicolon-separated. Try adding the sep parameter:
df = pd.read_csv(r"C:\Users\micha\OneDrive\Dokumenty\ML\winequality-red.csv", sep=';')
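To see why the default fails, compare the parsed shapes on a small semicolon-delimited sample (hypothetical data in the style of winequality-red.csv). Passing sep=None with engine='python' asks pandas to guess the delimiter with Python's csv.Sniffer, which may help when the separator varies between files:

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same semicolon-delimited layout.
raw = '"fixed acidity";"pH";"quality"\n7.4;3.51;5\n7.8;3.2;5\n'

default = pd.read_csv(io.StringIO(raw))  # comma assumed: everything lands in one column
fixed = pd.read_csv(io.StringIO(raw), sep=";")  # explicit separator
sniffed = pd.read_csv(io.StringIO(raw), sep=None, engine="python")  # delimiter guessed

print(default.shape)  # (2, 1)
print(fixed.shape)  # (2, 3)
print(list(sniffed.columns))
```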

Pandas - xls to xlsx converter

I want Python to take ANY .xls file from a given location and save it as .xlsx with the original file name. How can I do that so that any time I paste a file into that location, it is converted to .xlsx with its original file name?
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(??)
Your code seems to be fine. If you are only missing the correct way to write the file under the original name, here you go:
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(f"{os.path.splitext(filename)[0]}.xlsx")
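A pathlib variant of the naming step, as a sketch; xlsx_target is a hypothetical helper. It also matches upper-case .XLS extensions. Reading .xls needs xlrd and writing .xlsx needs openpyxl (or XlsxWriter), and index=False keeps pandas from writing the row index as an extra first column:

```python
from pathlib import Path


def xlsx_target(filename):
    """Return the .xlsx path for an .xls file, or None for any other file."""
    path = Path(filename)
    # suffix.lower() also catches ".XLS"; "x.xlsx" has suffix ".xlsx" and is skipped
    if path.suffix.lower() != ".xls":
        return None
    return path.with_suffix(".xlsx")


# Usage inside the loop from the answer:
# target = xlsx_target(filename)
# if target is not None:
#     pd.read_excel(filename).to_excel(target, index=False)
```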
To convert every file that gets pasted into the folder automatically, you can extend this with an infinite polling loop, for instance:
import pandas as pd
import os
import time
while True:
    files = os.listdir('./')
    for filename in files:
        out_name = f"{os.path.splitext(filename)[0]}.xlsx"
        if filename.endswith('.xls') and out_name not in files:
            df = pd.read_excel(filename)
            df.to_excel(out_name)
    time.sleep(10)
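The loop above is easier to test if one polling pass is a separate function. A sketch, assuming a hypothetical convert(src, dst) callback does the actual pandas read and write; convert_pending and watch are illustrative names, not part of any library:

```python
import os
import time


def convert_pending(folder, convert):
    """One polling pass: convert each .xls whose .xlsx twin does not exist yet.

    `convert(src, dst)` is a hypothetical callback, for example
    lambda src, dst: pd.read_excel(src).to_excel(dst, index=False).
    Returns the output names written during this pass.
    """
    files = set(os.listdir(folder))
    written = []
    for filename in sorted(files):
        stem, ext = os.path.splitext(filename)
        out_name = stem + ".xlsx"
        if ext.lower() == ".xls" and out_name not in files:
            convert(os.path.join(folder, filename), os.path.join(folder, out_name))
            written.append(out_name)
    return written


def watch(folder, convert, interval=10):
    """Run convert_pending forever, sleeping `interval` seconds between passes."""
    while True:
        convert_pending(folder, convert)
        time.sleep(interval)
```

Checking for the .xlsx twin on every pass is what keeps the loop from converting the same file twice; a file that is still being copied when a pass runs could be read incompletely.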

Read json files from tar.gz folders and convert to pandas dataframe [duplicate]

This question already has answers here:
JSON to pandas DataFrame
(14 answers)
Closed 3 years ago.
I have never worked with JSON files. I have several tar.gz archives containing different JSON files. From each archive I need to read only the files matching AA*.json, append them and convert the result to a pandas DataFrame. I tried it this way:
import os, re
import pandas as pd
import pandas as pd
import tarfile
import json
from pandas.io.json import json_normalize
cd = "my_path"
dfList = []
for root, dirs, files in os.walk(cd):
    with tarfile.open("dirs", "r:*") as tar:
        for fname in files:
            if re.match("AA_*.json$", fname):
                data = json.load(fname)
                frame = pd.DataFrame.from_dict(json_normilized(data),
                                               orient='columns')
                dfList.append(frame)
df = pd.concat(dfList)
I get the error:
FileNotFoundError: [Errno 2] No such file or directory: 'dirs'
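The FileNotFoundError comes from tarfile.open("dirs", "r:*"): "dirs" is a string literal, so tarfile looks for a file literally named dirs instead of the archives found by os.walk. For a single, uncompressed JSON file, pandas can read it directly:
import pandas as pd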
data = pd.read_json('filepath/filename')
data
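read_json alone does not open the tar.gz archives from the question. A sketch of the full task, assuming each archive name ends in .tar.gz and each matching member holds a JSON object or a list of records; read_tar_json and the default pattern AA*.json are hypothetical, based on the question's description. It uses tarfile's extractfile so nothing is unpacked to disk, and pd.json_normalize (pandas 1.0+) instead of the deprecated pandas.io.json import:

```python
import fnmatch
import json
import os
import tarfile

import pandas as pd


def read_tar_json(folder, pattern="AA*.json"):
    """Collect JSON members matching `pattern` from every *.tar.gz under `folder`."""
    frames = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if not name.endswith(".tar.gz"):
                continue
            # Open each archive by its real path, not the string "dirs".
            with tarfile.open(os.path.join(root, name), "r:gz") as tar:
                for member in tar.getmembers():
                    base = os.path.basename(member.name)
                    if member.isfile() and fnmatch.fnmatch(base, pattern):
                        with tar.extractfile(member) as fh:
                            # json.load accepts the binary member; json_normalize flattens nesting
                            frames.append(pd.json_normalize(json.load(fh)))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Usage with the folder from the question:
# df = read_tar_json("my_path")
```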