Browse for xlsx workbook and create a drop down list to select a sheet to read - pandas

I am absolutely new to Python and I am trying to write code that loads a data frame from an xlsx file chosen through a file browser, after showing a drop-down list of all sheets in the selected file.
I have found two pieces of code: one for browsing and reading an Excel file, and a second for a drop-down list with all sheets in a selected xlsx file. What I need to do is combine these two pieces of code. First I would like to select an xlsx file, and then I would like to select which sheet to read (based on the drop-down list).
Function to browse and read excel file
import tkinter as tk
from tkinter import filedialog
import pandas as pd
root = tk.Tk()
canvas1 = tk.Canvas(root, width=300, height=300, bg='white')
canvas1.pack()
def getExcel():
    global df
    import_file_path = filedialog.askopenfilename()
    df = pd.read_excel(import_file_path, sheet_name='Loan Tape')
    df.keys()
    df
    root.destroy()
browseButton_Excel = tk.Button(text='Import Excel File', command=getExcel, bg='yellow', fg='black', font=('arial', 12, 'bold'))
canvas1.create_window(150, 150, window=browseButton_Excel)
root.mainloop()
Function to create a drop down list of all sheets in the xlsx file
import tkinter as tk
from tkinter import *
root = Tk()
root.title("Select a sheet")
mainframe = Frame(root)
mainframe.grid(column=0,row=0, sticky=(N,W,E,S) )
mainframe.columnconfigure(0, weight = 1)
mainframe.rowconfigure(0, weight = 1)
mainframe.pack(pady = 100, padx = 100)
tkvar = StringVar(root)
xl = pd.ExcelFile(r'Full file path.xlsx')
choices=xl.sheet_names
tkvar.set('Nothing selected') # set the default option
popupMenu = OptionMenu(mainframe, tkvar, *choices)
Label(mainframe, text="Select a sheet").grid(row = 1, column = 1)
popupMenu.grid(row = 2, column =1)
def change_dropdown(*args):
    print(tkvar.get())
    root.destroy()
tkvar.trace('w', change_dropdown)
root.mainloop()
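One way the two snippets could be combined is sketched below. It is only an illustration, not a tested solution for the asker's workbook: the function names `choose_sheet` and `browse_and_read` are made up for this example, and a confirmation button replaces the variable trace so that the window closes only after a deliberate choice.

```python
def choose_sheet(sheet_names):
    """Show a drop-down of sheet_names; return the confirmed name, or None if the window is closed."""
    if not sheet_names:
        raise ValueError("the workbook has no sheets to choose from")
    import tkinter as tk  # imported lazily so this module loads without a display

    root = tk.Tk()
    root.title("Select a sheet")
    selected = tk.StringVar(root, value=sheet_names[0])
    chosen = []  # filled in by confirm(); stays empty if the window is closed

    def confirm():
        chosen.append(selected.get())
        root.destroy()

    tk.Label(root, text="Select a sheet").pack(padx=100, pady=(50, 5))
    tk.OptionMenu(root, selected, *sheet_names).pack()
    tk.Button(root, text="Read sheet", command=confirm).pack(pady=(5, 50))
    root.mainloop()
    return chosen[0] if chosen else None


def browse_and_read():
    """Ask for an .xlsx file, then for a sheet; return that sheet as a DataFrame (or None)."""
    import tkinter as tk
    from tkinter import filedialog
    import pandas as pd

    root = tk.Tk()
    root.withdraw()  # only the file dialog should be visible
    path = filedialog.askopenfilename(filetypes=[("Excel workbooks", "*.xlsx")])
    root.destroy()
    if not path:  # the dialog returns an empty value when cancelled
        return None

    with pd.ExcelFile(path) as workbook:
        sheet = choose_sheet(workbook.sheet_names)
        if sheet is None:
            return None
        return workbook.parse(sheet)
```

Calling `df = browse_and_read()` would first open the file browser and then the sheet selector; `df` is `None` if either step is cancelled.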

Related

how to solve unsupported format, or corrupt file: expected bof record error

import os
import pandas as pd
file_folder_address = 'C:/Users/Amirreza/Desktop/python homeworks/project files'
df_total = pd.DataFrame()
for file in os.listdir(file_folder_address):  # os.listdir gives a list of Excel file names
    df_men_urb = pd.DataFrame()
    df_women_urb = pd.DataFrame()
    df_men_rural = pd.DataFrame()
    df_women_rural = pd.DataFrame()
    sheet_names = pd.ExcelFile(os.path.join(file_folder_address, file), engine='xlrd').sheet_names  # create a list from sheet names
    for i in sheet_names:
        sht = pd.read_excel(os.path.join(file_folder_address, file), sheet_name=i)
I want to open several Excel files, each with several worksheets. I wrote the code above, and it raises this error:
Excel file format cannot be determined, you must specify an engine manually.
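A possible cause, offered here as an assumption rather than a diagnosis, is that the folder contains files pandas cannot identify (for example the `~$...` lock files Excel creates next to an open workbook), or that `xlrd` is being asked to read an `.xlsx` file, which xlrd 2.0 and later no longer supports. The sketch below picks an engine from the file extension and skips everything else. The names `ENGINE_BY_EXTENSION`, `excel_engine_for` and `read_all_sheets` are hypothetical helpers, not part of pandas.

```python
import os

# Assumed mapping from file extension to a pandas Excel engine name.
# xlrd 2.0+ reads only legacy .xls files; openpyxl reads the zip-based formats.
ENGINE_BY_EXTENSION = {
    ".xls": "xlrd",
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def excel_engine_for(filename):
    """Return an engine name for filename, or None if the file should be skipped."""
    name = os.path.basename(filename)
    # Excel creates "~$..." lock files next to open workbooks; they are not workbooks.
    if name.startswith("~$"):
        return None
    extension = os.path.splitext(name)[1].lower()
    return ENGINE_BY_EXTENSION.get(extension)


def read_all_sheets(folder):
    """Yield (file, sheet_name, DataFrame) for every sheet of every workbook in folder."""
    import pandas as pd  # imported here so the helper above works without pandas

    for file in sorted(os.listdir(folder)):
        engine = excel_engine_for(file)
        if engine is None:
            continue  # not an Excel workbook we know how to read
        path = os.path.join(folder, file)
        with pd.ExcelFile(path, engine=engine) as workbook:
            for sheet_name in workbook.sheet_names:
                yield file, sheet_name, workbook.parse(sheet_name)
```

Iterating with `for file, sheet_name, sht in read_all_sheets(file_folder_address):` would then replace the two nested loops in the question.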

Reading an .xltx file and Writing to .xlsx file with openpyxl

I've been struggling with this problem for a while and couldn't find a solution elsewhere. I have several Excel templates in xltx format that I want to read, fill in some cells, and then write out as a new xlsx file.
Every time I run my code it creates a corrupted Excel file. Using a preview extension in VS Code I can see that the values were changed correctly. When I read an xlsx file instead of an xltx file, it works fine. Does openpyxl just not allow what I am trying to do?
import openpyxl
import win32com.client
report = openpyxl.load_workbook("0100048-A5_R_11.xltx")
sheet = report["A5 form"]
search_arr = ["Test_Date"]
for r in range(2, sheet.max_row + 1):
    for c in range(3, sheet.max_column + 1):
        val = sheet.cell(r, c).value
        if val is not None and "$!" in str(val):
            sheet.cell(r, c).value = 1
report.active = 1
report.save("output.xlsx")
Copied from the docs:
You can specify the attribute template=True, to save a workbook as a template:
wb = load_workbook('document.xlsx')
wb.template = True
wb.save('document_template.xltx')
or set this attribute to False (default), to save as a document:
wb = load_workbook('document_template.xltx')
wb.template = False
wb.save('document.xlsx', as_template=False)
Although the last line is from the latest docs, as_template is not a keyword argument for save!
This works instead:
wb.save('document.xlsx')

Export pandas dataframe to xlsx: dealing with the openpyxl issue on python 3.9

Using the latest package versions: openpyxl 3.0.6 | pandas 1.2.3 | Python 3.9
The function below worked fine before I updated the packages to the versions listed above.
Now it raises the error: "zipfile.BadZipFile: File is not a zip file".
The function is really useful, so it would be great to know whether it can be fixed.
The function below can be run as is; just replace "pathExport" with your export directory for testing.
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
    Returns: None
    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    from openpyxl import load_workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
pathExport = r"F:\PYTHON\NB-Suite_python39\MNE\outputData\df.xlsx"
df1 = pd.DataFrame({'numbers': [1, 2, 3],
'colors': ['red', 'white', 'blue'],
'colorsTwo': ['yellow', 'white', 'blue']
})
append_df_to_excel(pathExport, df1, sheet_name="DF1", index=False, startcol=0, startrow=0)
OK, I was able to replicate the problem. It is pandas related: everything works fine up to pandas 1.1.5.
Some changes were made in pandas 1.2.0.
When you instantiate pd.ExcelWriter with
writer = pd.ExcelWriter(filename, engine='openpyxl')
it creates an empty file of size 0 bytes, overwriting the existing file, and you then get the error when you try to load it. It is not openpyxl related, because the latest version of openpyxl works fine with pandas 1.1.5.
The solution - specify mode='a', change the above line to
writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
Alternatively - look at this or this solution where it loads the file before instantiating the pd.ExcelWriter.
EDIT: I've been advised in the comments that mode='a' raises FileNotFoundError if the file does not exist. Although it is unexpected that the file is not created in this case, the solution is to move the creation of the writer inside the existing try block and to create a writer with mode 'w' in the except part:
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
    Returns: None
    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    from openpyxl import load_workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        writer = pd.ExcelWriter(filename, engine='openpyxl')
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
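For readers on newer pandas releases, the choice between mode 'a' and mode 'w' described in the EDIT can also be sketched as below. This is not part of the answer above: it assumes pandas 1.4 or later, where `pd.ExcelWriter` accepts `if_sheet_exists='overlay'` for the openpyxl engine, and the helper names `writer_options` and `append_df` are invented for the illustration.

```python
import os


def writer_options(filename):
    """Return ExcelWriter keyword arguments suitable for appending to filename."""
    if os.path.exists(filename):
        # Append mode keeps the existing sheets; "overlay" writes into an
        # existing sheet instead of raising or creating a renamed copy.
        return {"engine": "openpyxl", "mode": "a", "if_sheet_exists": "overlay"}
    # Append mode raises FileNotFoundError for a missing file, so create it instead.
    return {"engine": "openpyxl", "mode": "w"}


def append_df(filename, df, sheet_name="Sheet1", **to_excel_kwargs):
    """Write df below the last used row of sheet_name, creating the file if needed."""
    import pandas as pd  # assumption: pandas >= 1.4 with openpyxl installed

    with pd.ExcelWriter(filename, **writer_options(filename)) as writer:
        startrow = 0
        if sheet_name in writer.book.sheetnames:
            # openpyxl's max_row is 1-based, so it equals the next free 0-based row
            startrow = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow, **to_excel_kwargs)
```

The `with` block saves the workbook on exit. Passing `header=False` on later calls avoids repeating the column names under the existing data.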
The solution is the following:
import pandas as pd
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
                       truncate_sheet=False, resizeColumns=True, na_rep='NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      resizeColumns: default = True. It resizes all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''
    Returns: None
    *******************
    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    Features of the new helper function:
    1) Now it works with python 3.9 and latest versions of pandas and openpyxl
    ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now it resizes all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN, if you want that NaN are displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol", you can decide to start to write from specific column, otherwise will start from col = 0
    *******************
    """
    from openpyxl import load_workbook
    from string import ascii_uppercase
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    try:
        # only check that the file can be opened, then release the handle
        f = open(filename)
        f.close()
    except IOError:
        # file not accessible: create an empty workbook so append mode can open it
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)
    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        # startrow = -1
        startrow = 0
    if startcol is None:
        startcol = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)
    if resizeColumns:
        ws = writer.book[sheet_name]
        def auto_format_cell_width(ws):
            # column indices are 1-based; + 1 so the last column is resized too
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2
        auto_format_cell_width(ws)
    # save the workbook
    writer.save()
Example Usage:
# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
'colors': ['red', 'white', 'blue'],
'colorsTwo': ['yellow', 'white', 'blue'],
'NaNcheck': [float('NaN'), 1, float('NaN')],
})
# EDIT YOUR PATH FOR THE EXPORT
filename = r"C:\DataScience\df.xlsx"
# RUN ONE BY ONE IN ROW THE FOLLOWING LINES, TO SEE THE DIFFERENT UPDATES TO THE EXCEL FILE
append_df_to_excel(filename, df, index=False, startrow=0) # Basic Export of df in default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0) # Append the sheet "Cool" where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False) # Append another "df" to the sheet "Cool", just below the other "df" instance
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5) # Append another "df" to the sheet "Cool" starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep = '') # Override (truncate) the "Sheet1", writing the df from row 10, and showing blank cells instead of NaN

How to add data to an existing excel file in python without overwriting the data

I have a master Excel sheet where the data looks like this: https://i.stack.imgur.com/IS4cw.png
I have a script that imports the CSV files, combines them, and saves the result to the master Excel sheet.
import pandas as pd
from openpyxl import load_workbook
import tkinter as tk
from tkinter import filedialog
root = tk.Tk()
root.withdraw()
root.call('wm', 'attributes', '.', '-topmost', True)
files = filedialog.askopenfilename(multiple=True)
%gui tk
var = root.tk.splitlist(files)
filePaths = []
for f in var:
    df = pd.read_csv(f, skiprows=8, index_col=None, header='infer', parse_dates=True, squeeze=True, encoding='ISO-8859-1', names=['Date', 'Time', 'Temperature', 'Humidty'])
    filePaths.append(df)
df = pd.concat(filePaths, axis=0, join='outer', ignore_index=True, sort=True)
book = load_workbook(r'C:\Users\Administrator\Documents\Hebin\Scripts\Temperature Distribution chart/july/12.xlsx')
writer = pd.ExcelWriter(r'C:\Users\Administrator\Documents\Hebin\Scripts\Temperature Distribution chart/july/12.xlsx', engine='openpyxl')
writer.book = book
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df.to_excel(writer, "Sheet1", columns=['Date', 'Time','Temperature', 'Humidty'],index=False)
writer.save()
The problem is that the newly imported data is saved from row 1 instead of starting after the last row of the previously saved data. How can I save the data in order every time without entering the row number?
The ExcelWriter can have its mode set to either write ('w') or append ('a'). The default is write.
writer = pd.ExcelWriter(r'C:\Users\Administrator\Documents\Hebin\Scripts\Temperature Distribution chart/july/12.xlsx', engine='openpyxl', mode='a')
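Append mode alone may not place the new rows below the existing ones, since the write position still has to be chosen. One alternative, sketched here under the assumption that the sheet already exists and already contains the column titles, is to let openpyxl append the rows directly. `append_rows` is a hypothetical helper name; `dataframe_to_rows` and `Worksheet.append` are openpyxl functions.

```python
def append_rows(workbook_path, df, sheet_name="Sheet1"):
    """Append the rows of df below the last used row of sheet_name."""
    from openpyxl import load_workbook
    from openpyxl.utils.dataframe import dataframe_to_rows

    book = load_workbook(workbook_path)
    sheet = book[sheet_name]
    # header=False: the sheet is assumed to already carry the column titles.
    for row in dataframe_to_rows(df, index=False, header=False):
        sheet.append(row)  # append() always writes to the first free row
    book.save(workbook_path)
```

With the script in the question, the final `ExcelWriter` lines could then become a single call such as `append_rows(master_path, df[['Date', 'Time', 'Temperature', 'Humidty']])`, where `master_path` stands for the path to the master workbook.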

Add dataframe and button to same sheet with XlsxWriter

I am able to create an Excel file with the data from a data frame in one sheet and a button that runs a macro in a second sheet.
What I need is to have both the data from the dataframe and the button in the same sheet.
This is the code I found and have tried to modify:
import pandas as pd
import xlsxwriter
df = pd.DataFrame({'Data': [10, 20, 30, 40]})
writer = pd.ExcelWriter('hellot.xlsx', engine='xlsxwriter')
worksheet = workbook.add_worksheet()
#df.to_excel(writer, sheet_name='Sheet1')
workbook = writer.book
workbook.filename = 'test.xlsm'
worksheet = workbook.add_worksheet()
workbook.add_vba_project('./vbaProject.bin')
worksheet.write('A3', 'Press the button to say hello.')
#Add a button tied to a macro in the VBA project.
worksheet.insert_button('A1', {'macro': 'start',
'caption': 'Press Me',
'width': 80,
'height': 30})
df.to_excel(writer, sheet_name ='Sheet2')
writer.save()
workbook.close()
I know that you simply asked how to insert the button in the same sheet, but I decided to check how macros work with xlsxwriter, so I wrote a complete tutorial on how to add a macro.
1) First, we need to manually create a file that contains the macro, so that we can extract it as a bin file and inject it later using xlsxwriter. Create a new Excel file, go to the Developer tab, Visual Basic, Insert Module, and write the following code:
Sub TestMsgBox()
MsgBox "Hello World!"
End Sub
Save the file with xlsm extension to contain the macro, e.g. as Book1.xlsm.
2) Now we need to extract the bin file. Open your cmd and browse to the directory where you have saved the Book1.xlsm. Then browse through file explorer to the folder where you have installed python (or to the virtual environment folder) and search for vba_extract.py. Copy this script into the same folder as the Book1.xlsm. Then type in cmd:
python vba_extract.py Book1.xlsm
This way you will extract the macro and create the vbaProject.bin file in the same folder.
3) Now it's time to create the final file. Delete the Book1.xlsm and vba_extract.py files, as they're not needed anymore, and run the following code:
import pandas as pd
# Create a test dataframe
df = pd.DataFrame({'Data': [10, 20, 30, 40]})
# Import it through the xlsxwriter
writer = pd.ExcelWriter('hello_world.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Create the workbook and the worksheet
workbook = writer.book
workbook.filename = 'hello_world.xlsm' # rename the workbook to xlsm extension
worksheet = writer.sheets['Sheet1']
# Inject the bin file we extracted earlier
workbook.add_vba_project('./vbaProject.bin')
# Insert a description
worksheet.write('B1', 'Press the button to say hello.')
#Add a button tied to a macro in the VBA project.
worksheet.insert_button('B2', {'macro': 'TestMsgBox',
'caption': 'Press Me',
'width': 80, 'height': 30})
# Finally write the file
writer.save()
Now the button is in the same sheet as your data and works.