Cannot read .xlsx file with read_excel() - pandas

I want to open an .xlsx file with read_excel().
However, I get an error even though both the openpyxl and pandas packages are installed.
I am using pandas 0.24.2 and openpyxl 3.0.10.
The error message is: ValueError: Unknown engine: openpyxl
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx',engine='openpyxl')
print(retail_df.head())

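pandas 0.24.2 does not recognise engine='openpyxl' in read_excel(); support for that engine was added in pandas 0.25.0, which is why you get "Unknown engine". In 0.24.2, read_excel() uses xlrd by default, so you don't need to set the engine manually when loading the Excel file (xlrd versions before 2.0 can read .xlsx files). Alternatively, upgrade pandas if you want to use openpyxl.
So the updated working code for reading Excel files is: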
import pandas as pd
import math
retail_df = pd.read_excel('./Online_Retail.xlsx')
print(retail_df.head())
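As a rough sketch of the version check behind this advice: the helper below decides whether to pass engine='openpyxl' based on the pandas version string. It assumes that read_excel() recognises that engine from pandas 0.25.0 onward; the names parse_version and excel_engine_for are hypothetical and not part of pandas.

```python
def parse_version(version):
    """Turn '0.24.2' into (0, 24, 2), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def excel_engine_for(pandas_version):
    """Return the engine name to pass, or None to keep pandas' default.

    Assumption: engine='openpyxl' is recognised from pandas 0.25.0 onward.
    """
    if parse_version(pandas_version) >= (0, 25, 0):
        return "openpyxl"
    return None
```

It could then be used as pd.read_excel(path, engine=excel_engine_for(pd.__version__)), since passing engine=None leaves the default in place.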

Related

VCF file is missing mandatory header line ("#CHROM...")

I get an error when I try to read a VCF file with the scikit-allel library inside a Docker image running Ubuntu 18.04. The traceback ends with:
raise RuntimeError('VCF file is missing mandatory header line ("#CHROM...")')
RuntimeError: VCF file is missing mandatory header line ("#CHROM...")
But the VCF file is well-formatted.
Here is the code I used:
import pandas as pd
import os
import numpy as np
import allel
import tkinter as tk
from tkinter import filedialog
import matplotlib.pyplot as plt
from scipy.stats import norm
GenomeVariantsInput = allel.read_vcf('quartet_variants_annotated.vcf', samples=['ISDBM322015'],fields=[ 'variants/CHROM', 'variants/ID', 'variants/REF',
'variants/ALT','calldata/GT'])
Installed versions:
Python 3.6.9
Numpy 1.19.5
pandas 1.1.5
scikit-allel 1.3.5
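You need to add a header line like this one before the first data record:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
This line is not the same for every file, so you have to write a header like the one above that matches your own file. The columns must be separated by tabs, and the names after FORMAT are the sample IDs. (I suggest trying this header first and customizing it if you still get an error.)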
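Before editing the file, it may help to check what the header line actually looks like. The following is a minimal sketch, assuming an uncompressed, UTF-8 text VCF; the helper names find_vcf_header and header_is_valid are hypothetical and not part of scikit-allel. It looks for the "#CHROM" line and checks that the eight fixed columns are tab-separated.

```python
REQUIRED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def find_vcf_header(path):
    """Return the columns of the '#CHROM' line, or None if it is absent."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return line.rstrip("\r\n").split("\t")
            if not line.startswith("#"):
                break  # first data record reached, so no header line exists
    return None


def header_is_valid(path):
    """True if the header exists and its first eight columns are tab-separated."""
    header = find_vcf_header(path)
    return header is not None and header[:8] == REQUIRED
```

A header that was pasted with spaces instead of tabs comes back as a single column, so header_is_valid() returns False for it even though the line looks correct in an editor.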

pandas-read-xml has error on 'json-normalize'

I saw that there is a way to read XML files directly with pandas, so I tried this package. However, I keep getting errors.
https://pypi.org/project/pandas-read-xml/
import pandas as pd
import pandas_read_xml as pdx
from pandas.io.json import json_normalize
The error is raised by the last line:
ImportError: cannot import name 'json_normalize'
I am using a Python 3 kernel. Can anyone tell me what is wrong?
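One possible cause, which the question does not confirm, is that the installed pandas does not expose json_normalize from pandas.io.json. From pandas 1.0 onward the function is available at the top level as pandas.json_normalize. The sketch below tries both locations; the helper name find_json_normalize is hypothetical.

```python
import importlib


def find_json_normalize():
    """Return json_normalize from wherever this pandas exposes it, or None."""
    for module_name in ("pandas", "pandas.io.json"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module (or pandas itself) is not importable
        func = getattr(module, "json_normalize", None)
        if func is not None:
            return func
    return None
```

If this returns None, pandas is either not installed in the kernel or does not provide the function in either location.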

How can I convert my text file into a NetCDF file? I have observation datasets from a single meteorological station between 1980 and 2018

I tried to convert my text file into a NetCDF (.nc) file with the help of the YouTube link I shared here, but I cannot open the resulting file in GrADS. I suspect the reason is that these lines of code do not add any metadata to the .nc file.
I would therefore like to improve my code so that the file can be opened on other platforms. I need to open this NetCDF file in RCMES so that I can carry out quantile-mapping bias correction.
I am also open to suggestions for other ways, programming languages, or platforms to perform this conversion.
Below is the code I used.
import netCDF4 as nc
import numpy as np
import pandas as pd
import xarray
# read the text file into a pandas DataFrame
df = pd.read_csv('C:/Users/Asus/Documents/ArcGIS/ArcGIS Copy/evaporation/Downscaling Files/netcdfye dönecek csvler/Aydin_cnrm_Prec_rcp451.txt')
df
# convert the pandas DataFrame into an xarray Dataset
xr = df.to_xarray()
xr
# finally, write the xarray Dataset to a NetCDF file
xr.to_netcdf('Aydin_cnrm_Prec_rcp451.nc')
Instead of using Python to create a NetCDF file from an ASCII/text file, I used cdo, which I installed on Ubuntu.
The following commands solved the problem:
cdo -f nc input,r1x1 tmp.nc < Aydin_cnrm_Prec_rcp45.txt
cdo -r -chname,var1,prec -settaxis,1980-01-01,00:00:00,1mon tmp.nc prec.nc
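If the same two commands need to be run for several stations or variables, they could be driven from Python. This is only a sketch: it reuses the cdo operators and options shown above without adding any, the function names build_cdo_commands and run_conversion are hypothetical, and running it requires the cdo executable on the PATH.

```python
import subprocess


def build_cdo_commands(tmp_nc, out_nc, var_name, start_date, step):
    """Return the two cdo argument lists used in the answer above."""
    to_netcdf = ["cdo", "-f", "nc", "input,r1x1", tmp_nc]
    add_metadata = [
        "cdo", "-r",
        f"-chname,var1,{var_name}",
        f"-settaxis,{start_date},00:00:00,{step}",
        tmp_nc, out_nc,
    ]
    return to_netcdf, add_metadata


def run_conversion(txt_path, tmp_nc, out_nc, var_name="prec",
                   start_date="1980-01-01", step="1mon"):
    """Run both commands in order; needs the cdo executable installed."""
    to_netcdf, add_metadata = build_cdo_commands(
        tmp_nc, out_nc, var_name, start_date, step)
    with open(txt_path, "rb") as text_input:
        # Feeding the file on stdin mirrors the shell redirect "< file.txt".
        subprocess.run(to_netcdf, stdin=text_input, check=True)
    subprocess.run(add_metadata, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.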

Pandas - xls to xlsx converter

I want Python to take any .xls file from a given location and save it as .xlsx under its original file name. How can I do that so that any file I paste into that location is converted to .xlsx with its original name?
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(??)
Your code looks fine. If all you are missing is how to write the file under the original name, here you go:
import pandas as pd
import os
for filename in os.listdir('./'):
    if filename.endswith('.xls'):
        df = pd.read_excel(filename)
        df.to_excel(f"{os.path.splitext(filename)[0]}.xlsx")
A possible extension to convert any file that gets pasted inside the folder can be implemented with an infinite loop, for instance:
import pandas as pd
import os
import time
while True:
    files = os.listdir('./')
    for filename in files:
        out_name = f"{os.path.splitext(filename)[0]}.xlsx"
        if filename.endswith('.xls') and out_name not in files:
            df = pd.read_excel(filename)
            df.to_excel(out_name)
    time.sleep(10)
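The selection logic in that loop can be separated from the file I/O, which makes it easier to check on its own. A minimal sketch follows; the function name pending_conversions is hypothetical, and like the loop above it matches the '.xls' extension case-sensitively.

```python
import os


def pending_conversions(filenames):
    """Return (source, target) pairs for .xls files that have no .xlsx yet."""
    existing = set(filenames)
    pairs = []
    for name in filenames:
        if not name.endswith(".xls"):
            continue
        target = f"{os.path.splitext(name)[0]}.xlsx"
        if target not in existing:
            pairs.append((name, target))
    return pairs
```

Inside the polling loop, each pair returned by pending_conversions(os.listdir('./')) would then be passed to pd.read_excel() and to_excel().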

Generating a NetCDF from a text file

Using Python, can I open a text file, read it into an array, and then save it as a NetCDF file?
The following script I wrote was not successful.
import os
import pandas as pd
import numpy as np
import PIL.Image as im
path = r'C:\path\to\data'  # raw string so that \t is not read as a tab
grb = [[]]
for fn in os.listdir(path):
    file = os.path.join(path,fn)
    if os.path.isfile(file):
        df = pd.read_table(file,skiprows=6)
        grb.append(df)
df2 = np.array(grb)
#imarray = im.fromarray(df2) ##cannot handle this data type
#imarray.save('Save_Array_as_TIFF.tif')
I once used xarray (formerly called xray) to get a NetCDF file into an ASCII DataFrame. I just googled, and apparently it has a to_netcdf function.
Import xarray; it lets you work with labelled data much as pandas does, and a pandas DataFrame can be converted with to_xarray().
So give this a try:
df.to_xarray().to_netcdf(file_path)
xarray slow to save netCDF