Using win32com/Python to recalculate an Excel sheet that uses an add-in

Description of problem: I have been manually using a spreadsheet that works with an Excel add-in, and I am trying to automate it with win32com and Python. I can write a value into an input cell, but the sheet does not recalculate. When I try to read the output cell, I get the error described below. Also, after running the code below, when I open the sheet manually, all the calculated cells show #VALUE!. Manually, I can press CTRL+ALT+F9 and the sheet recalculates, leaving no #VALUE! cells. Note: by "calculated cells" I mean cells that rely on functions that are part of a VBA macro/the add-in.
Main question: How can I force a recalculation of the sheet from Python?
Code:
import win32com.client
import win32api

def open_util():
    excel = win32com.client.Dispatch("Excel.Application")
    wb1 = excel.Workbooks.Open(r'C:\...the add-in.xla')
    wb = excel.Workbooks.Open(r'C:\...the file name.xlsm')
    ws = wb.Worksheets('the sheet name')
    # get the values from two cells.
    myInput = ws.Cells(1,1).Value   # this is an input cell, not calculated
    myOutput = ws.Cells(1,2).Value  # this cell is calculated
    print("The original value of myOutput is : ", myOutput)
    # put a new value into the input cell.
    newInput = 42
    ws.Cells(1,1).Value = newInput
    # Now I want the sheet to recalculate and give me a new output.
    # (when manually operating this spreadsheet, I just input a new value,
    # and the whole sheet automatically recalculates)
    # Note: these two lines of code appear not to have any effect.
    ws.EnableCalculation = True
    ws.Calculate()
    # get the new output
    newOutput = ws.Cells(1,2).Value
    wb.Close(True)
    print("newOutput is : ", newOutput)
    return True  # I haven't finished writing the function... no return value yet.
Output:
The original value of myOutput is : 10 # this is fine.
newOutput is : -2146826273 # when I open the spreadsheet, this cell now says #VALUE!
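The number printed for newOutput is not garbage: when a cell holds an Excel error, win32com returns an integer that packs Excel's XlCVError code into the low 16 bits of 0x800A0000. For #VALUE! (xlErrValue = 2015) that is 0x800A07DF, which as a signed 32-bit value is exactly -2146826273. The sketch below (the helper names excel_error_name and force_full_recalc are hypothetical) decodes such values and asks Excel for a full recalculation through Application.CalculateFull, the COM counterpart of CTRL+ALT+F9, or Application.CalculateFullRebuild. It assumes an Excel.Application object obtained from win32com.client.Dispatch and has not been checked against this particular add-in.

```python
# Excel XlCVError codes; through COM they arrive as 0x800A0000 | code
XL_CV_ERRORS = {
    2000: "#NULL!",
    2007: "#DIV/0!",
    2015: "#VALUE!",
    2023: "#REF!",
    2029: "#NAME?",
    2036: "#NUM!",
    2042: "#N/A",
}


def excel_error_name(value):
    """Return e.g. '#VALUE!' if value is an Excel error read via COM, else None."""
    if isinstance(value, bool) or not isinstance(value, int):
        return None  # ordinary cell values arrive as float, str, datetime, ...
    unsigned = value & 0xFFFFFFFF  # reinterpret the signed 32-bit integer
    if (unsigned & 0xFFFF0000) != 0x800A0000:
        return None
    return XL_CV_ERRORS.get(unsigned & 0xFFFF)


def force_full_recalc(excel_app, rebuild=False):
    """Recalculate all open workbooks in a win32com Excel.Application.

    CalculateFull matches CTRL+ALT+F9; CalculateFullRebuild also rebuilds
    the dependency tree (CTRL+SHIFT+ALT+F9).
    """
    if rebuild:
        excel_app.CalculateFullRebuild()
    else:
        excel_app.CalculateFull()
```

In open_util, one would call force_full_recalc(excel) after writing the input, then pass ws.Cells(1,2).Value through excel_error_name to detect whether a #VALUE! remains.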
About the add-in: I'm using chemistry-related software that is basically a complicated spreadsheet plus an Excel add-in. The output cells in my code that show "#VALUE!" use named formulas, which are in a macro I can't view (presumably so as not to expose the code of the chemistry software).
Similar question: I found this similar question. In my case, I'm not sure whether the add-in is involved in calculating the cells. Anyway, I tried adding the code suggested in its answer:
def open_util():
    excel = win32com.client.Dispatch("Excel.Application")
    excel.AddIns.Add("C:\...\add-in.xla").Installed = True  # <-- This is the line I added.
    ...
However, this only generated an error message:
Traceback (most recent call last):
File "C:\...\my_program_name.py", line 246, in <module>
open_util()
File "C:\...\my_program_name.py", line 216, in open_util
excel.AddIns.Add("C:\...\add-in.xla").Installed = True
File "C:\Users\00168070\AppData\Local\Programs\Python\Python36-32\lib\site-packages\win32com\gen_py\00020813-0000-0000-C000-000000000046x0x1x9\AddIns.py", line 35, in Add
, CopyFile)
pywintypes.com_error: (-2147352567, '例外が発生しました。', (0, 'Microsoft Excel', 'Add method of AddIns class failed', 'xlmain11.chm', 0, -2146827284), None)
(Note: the Japanese text '例外が発生しました。' means 'An exception occurred.')
Side question: Is there documentation about which Python/win32com functions are available for Excel, and how to write a program with them? (For example, I would never have known that I needed the line "ws.EnableCalculation = True" until I saw it in a fragment of someone else's code.) I've found only bits of tutorials.
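One thing worth ruling out before blaming the add-in (an assumption, not confirmed as the cause of "Add method of AddIns class failed"): the path in the traceback is an ordinary string literal, so the \a in "\add-in.xla" is read by Python as the BEL control character (\x07), and Excel receives a path that does not exist. The hypothetical helper below flags that; the paths are illustrative, not the real (elided) one. A raw string (r"...") or forward slashes avoid the problem.

```python
import os


def addin_path_problems(path):
    """List obvious problems with a path before passing it to AddIns.Add."""
    problems = []
    if any(ord(ch) < 32 for ch in path):
        # e.g. "\a" -> BEL, "\t" -> TAB, "\n" -> newline in a non-raw literal
        problems.append("control character in path (unescaped backslash?)")
    if not os.path.isfile(path):
        problems.append("file not found")
    return problems


# Hypothetical paths, for illustration only
plain = "C:\addins\add-in.xla"  # each "\a" silently becomes "\x07" (BEL)
raw = r"C:\addins\add-in.xla"   # raw string keeps every backslash
```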

A little late, but I ran into a similar issue. I am also using a third-party add-in, and I believe it was causing the issue.
Test case to see whether
ws.EnableCalculation = True
ws.Calculate()
is working:
import time
import win32com.client

xlApp = win32com.client.Dispatch("Excel.Application")
xlApp.Visible = True  # so we can see what is happening
wb = xlApp.Workbooks.Open(excel_wb)  # excel_wb: path to a saved workbook where A1 = A2 + A3
ws = wb.ActiveSheet
for x in range(9):
    ws.EnableCalculation = False
    ws.Range('A2').Value = x
    ws.Range('A3').Value = 2 * x
    print(ws.Range('A1').Value)  # value in A1 should remain the same
    time.sleep(10)
    ws.EnableCalculation = True
    ws.Calculate()
    # value in A1 should update
    print(ws.Range('A1').Value)  # value in A1 should be updated
This works just fine for me. Since A1 = A2 + A3 = 3x, each iteration first prints the stale value and then the recalculated one, so the output runs 0, 3, 3, 6, 6, 9, etc.
If this code works for you, the issue is likely the add-in. For my add-in, the problem was a "phantom" Excel instance that kept running after the workbook was closed and interfered with the add-in.
To fix this: 2a. use a finally block that quits Excel at the end of the program, and 2b. explicitly kill Excel from the command line at the beginning of the main function.
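2a.
def quit_xl(xlApp, xlWB=None, save=True):
    if xlWB is not None:
        xlWB.Close(save)  # close the workbook, saving it if requested
    xlApp.Quit()  # shut down the Excel application itself
    del xlApp  # drop this local reference to the COM object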
2b.
import subprocess  # be careful - make sure that an outside user can't pass anything in

def kill_xl():
    try:
        # command prompt equivalent; kills all Excel instances
        kill_excel = "taskkill /f /im excel.exe"
        subprocess.run(kill_excel, shell=True)
        return 0
    except Exception:
        return 1
Put together:
import win32com.client

def main():
    kill_xl()
    xlApp = win32com.client.Dispatch("Excel.Application")
    try:
        pass  # main part of function
    except Exception as e:
        pass  # if something goes wrong
    finally:
        quit_xl(xlApp)
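The same try/finally idea can be packaged as a context manager so that Excel is quit even when the body raises. This is a minimal sketch with a hypothetical name (excel_session); it takes a factory so it can be exercised without Excel, and in real use the factory would be lambda: win32com.client.Dispatch("Excel.Application").

```python
from contextlib import contextmanager


@contextmanager
def excel_session(make_app):
    """Yield an Excel.Application and always Quit() it afterwards.

    make_app: zero-argument callable returning the application object,
    e.g. lambda: win32com.client.Dispatch("Excel.Application").
    """
    app = make_app()
    try:
        yield app
    finally:
        app.Quit()  # runs on normal exit and when the with-body raises
```

Usage would look like: with excel_session(lambda: win32com.client.Dispatch("Excel.Application")) as xlApp: followed by the workbook work inside the block.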

Related

How to solve the "Unsupported format, or corrupt file: Expected BOF record" error

import os
import pandas as pd

file_folder_address = 'C:/Users/Amirreza/Desktop/python homeworks/project files'
df_total = pd.DataFrame()
for file in os.listdir(file_folder_address):  # os.listdir gives a list of Excel file names
    df_men_urb = pd.DataFrame()
    df_women_urb = pd.DataFrame()
    df_men_rural = pd.DataFrame()
    df_women_rural = pd.DataFrame()
    sheet_names = pd.ExcelFile(os.path.join(file_folder_address, file), engine='xlrd').sheet_names  # create a list of sheet names
    for i in sheet_names:
        sht = pd.read_excel(os.path.join(file_folder_address, file), sheet_name=i)
I want to open several Excel files, each with several worksheets. I wrote the code above, and it raises this error:
Excel file format cannot be determined, you must specify an engine manually.
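The question has no answer here, but one common cause of this message (an assumption about this case) is that os.listdir() returns every entry in the folder, not just workbooks: for example Excel's hidden "~$name.xlsx" lock files or desktop.ini, whose format pandas cannot identify. The sketch below, with hypothetical helper names, keeps only real workbooks and reads every sheet at once with sheet_name=None, letting pandas pick the engine itself instead of forcing engine='xlrd' (xlrd 2.x reads only legacy .xls files; .xlsx needs openpyxl installed).

```python
import os

EXCEL_EXTENSIONS = (".xls", ".xlsx", ".xlsm")


def list_excel_files(folder):
    """Return full paths of real Excel workbooks in folder, sorted by name.

    Skips sub-folders, non-Excel files (e.g. desktop.ini) and the hidden
    '~$name.xlsx' lock files Excel creates while a workbook is open.
    """
    paths = []
    for name in sorted(os.listdir(folder)):
        full = os.path.join(folder, name)
        if not os.path.isfile(full) or name.startswith("~$"):
            continue
        if name.lower().endswith(EXCEL_EXTENSIONS):
            paths.append(full)
    return paths


def read_all_sheets(folder):
    """Map each workbook path to {sheet_name: DataFrame} for all its sheets."""
    import pandas as pd  # lazy import so list_excel_files works without pandas

    # sheet_name=None makes read_excel return a dict of every sheet
    return {path: pd.read_excel(path, sheet_name=None)
            for path in list_excel_files(folder)}
```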

Reading an .xltx file and Writing to .xlsx file with openpyxl

I've been struggling with this problem for a while and couldn't find a solution elsewhere. I have several Excel templates in .xltx format that I want to read, fill in some cells, and then write out as a new .xlsx file.
Every time I run my code, it creates a corrupted Excel file. Using a preview extension in VS Code, I can see that the values were changed correctly. When I read an .xlsx file instead of an .xltx, it works fine. Does openpyxl just not allow what I am trying to do?
import openpyxl

report = openpyxl.load_workbook("0100048-A5_R_11.xltx")
sheet = report["A5 form"]
search_arr = ["Test_Date"]
for r in range(2, sheet.max_row + 1):
    for c in range(3, sheet.max_column + 1):
        val = sheet.cell(r, c).value
        if val is not None and "$!" in str(val):
            sheet.cell(r, c).value = 1
report.active = 1
report.save("output.xlsx")
Copied from the docs:
You can specify the attribute template=True, to save a workbook as a template:
wb = load_workbook('document.xlsx')
wb.template = True
wb.save('document_template.xltx')
or set this attribute to False (default), to save as a document:
wb = load_workbook('document_template.xltx')
wb.template = False
wb.save('document.xlsx', as_template=False)
Although the last line is taken from the latest docs, as_template is not a keyword argument of save()!
This works instead:
wb.save('document.xlsx')
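Why the output looks corrupted: an .xlsx/.xltx file is a zip archive, and its [Content_Types].xml declares whether the main workbook part is a template or a normal workbook. Saving a workbook loaded from .xltx without template = False keeps the template content type inside a file named .xlsx, which is most likely what Excel objects to. The stdlib helper below (hypothetical names) reports which content type a saved file declares, and xltx_to_xlsx applies the fix from the docs quoted above; openpyxl is imported lazily so the inspection part runs without it.

```python
import zipfile

# Content types of the main workbook part inside the zip package
TEMPLATE_CT = "application/vnd.openxmlformats-officedocument.spreadsheetml.template.main+xml"
WORKBOOK_CT = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"


def workbook_kind(path):
    """Return 'template', 'workbook' or 'unknown' from [Content_Types].xml."""
    with zipfile.ZipFile(path) as zf:
        types_xml = zf.read("[Content_Types].xml").decode("utf-8")
    if TEMPLATE_CT in types_xml:
        return "template"
    if WORKBOOK_CT in types_xml:
        return "workbook"
    return "unknown"  # e.g. macro-enabled .xlsm/.xltm packages


def xltx_to_xlsx(src, dst):
    """Load an .xltx template and save it as a normal .xlsx workbook."""
    from openpyxl import load_workbook  # lazy import: only needed here

    wb = load_workbook(src)
    wb.template = False  # switch the content type back to a normal workbook
    wb.save(dst)
```

Running workbook_kind("output.xlsx") on the file produced by the question's code should report "template" if this is indeed the problem.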

Export pandas dataframe to xlsx: dealing with the openpyxl issue on python 3.9

Using the latest package versions: openpyxl 3.0.6 | pandas 1.2.3 | Python 3.9
The function below worked fine before I updated the packages above to the versions listed.
Now it raises the error: "zipfile.BadZipFile: File is not a zip file".
The function is really useful, so it would be great to know whether it can be fixed.
The function can be run as-is; just replace "pathExport" with your own export path for testing.
import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
    Returns: None
    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    from openpyxl import load_workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    writer = pd.ExcelWriter(filename, engine='openpyxl')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()

pathExport = r"F:\PYTHON\NB-Suite_python39\MNE\outputData\df.xlsx"
df1 = pd.DataFrame({'numbers': [1, 2, 3],
                    'colors': ['red', 'white', 'blue'],
                    'colorsTwo': ['yellow', 'white', 'blue']
                    })
append_df_to_excel(pathExport, df1, sheet_name="DF1", index=False, startcol=0, startrow=0)
OK, I was able to replicate the problem. It is pandas-related: everything works fine up to pandas 1.1.5.
pandas 1.2.0 introduced some changes.
When you instantiate pd.ExcelWriter with
writer = pd.ExcelWriter(filename, engine='openpyxl')
it creates an empty file of size 0 bytes that overwrites the existing file, and then you get the error when you try to load it. It is not openpyxl-related, because the latest version of openpyxl works fine with pandas 1.1.5.
The solution: specify mode='a', i.e. change the line above to
writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
Alternatively, look at this or this solution, which loads the file before instantiating pd.ExcelWriter.
EDIT: I've been advised in the comments that with mode='a' it will raise FileNotFoundError if the file does not exist. Although it's unexpected that it does not create the file in that case, the solution is to move the creation of the writer inside the existing try block and create a writer with mode='w' in the except part:
def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False,
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
    Returns: None
    (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    """
    from openpyxl import load_workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        writer = pd.ExcelWriter(filename, engine='openpyxl')
    if startrow is None:
        startrow = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)
    # save the workbook
    writer.save()
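On newer pandas, where ExcelWriter accepts if_sheet_exists (the "overlay" option needs pandas 1.4 or later) and assigning writer.book is no longer supported, the same idea can be written without replacing the writer's workbook. The sketch below uses hypothetical names and is not the answer's code: it opens in mode='a' only when the target is an existing, valid zip package, which also sidesteps the 0-byte file described above, and otherwise writes a new file with mode='w'.

```python
import os
import zipfile


def choose_writer_kwargs(filename):
    """ExcelWriter kwargs: append to a valid existing workbook, else create one."""
    if os.path.isfile(filename) and zipfile.is_zipfile(filename):
        # "overlay" writes into an existing sheet instead of raising (pandas >= 1.4)
        return {"mode": "a", "if_sheet_exists": "overlay"}
    # missing, empty or non-zip file (e.g. a 0-byte leftover): start fresh
    return {"mode": "w"}


def append_df(filename, df, sheet_name="Sheet1", **to_excel_kwargs):
    """Append df below any existing rows of sheet_name (sketch, pandas >= 1.4)."""
    import pandas as pd  # lazy import so choose_writer_kwargs works on its own

    kwargs = choose_writer_kwargs(filename)
    with pd.ExcelWriter(filename, engine="openpyxl", **kwargs) as writer:
        startrow = 0
        if kwargs["mode"] == "a" and sheet_name in writer.book.sheetnames:
            # max_row counts used rows, so as a 0-based startrow it is the next empty row
            startrow = writer.book[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow, **to_excel_kwargs)
```

The with block saves and closes the file on exit, so no explicit save() call is needed.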
The solution is the following:
import os
import pandas as pd

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None, startcol=None,
                       truncate_sheet=False, resizeColumns=True, na_rep='NA', **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.
    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      resizeColumns: default = True. Resizes all columns based on cell content width
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]
      na_rep: default = 'NA'. If, instead of NaN, you want blank cells, just edit as follows: na_rep=''
    Returns: None
    *******************
    CONTRIBUTION:
    Current helper function generated by [Baggio]: https://stackoverflow.com/users/14302009/baggio?tab=profile
    Contributions to the current helper function: https://stackoverflow.com/users/4046632/buran?tab=profile
    Original helper function: (c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
    Features of the new helper function:
    1) Now it works with Python 3.9 and the latest versions of pandas and openpyxl
       ---> Fixed the error: "zipfile.BadZipFile: File is not a zip file".
    2) Now it resizes all columns based on cell content width AND all variables will be visible (SEE "resizeColumns")
    3) You can handle NaN: choose whether NaN is displayed as NaN or as empty cells (SEE "na_rep")
    4) Added "startcol": you can start writing from a specific column, otherwise writing starts from col = 0
    *******************
    """
    from openpyxl import load_workbook
    from openpyxl.utils import get_column_letter
    from openpyxl import Workbook
    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')
    if not os.path.isfile(filename):
        # the file does not exist yet: create an empty workbook so that mode='a' can open it
        wb = Workbook()
        ws = wb.active
        ws.title = sheet_name
        wb.save(filename)
    writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')
    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError
    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)
        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row
        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)
        # copy existing sheets
        writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass
    if startrow is None:
        startrow = 0
    if startcol is None:
        startcol = 0
    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, startcol=startcol, na_rep=na_rep, **to_excel_kwargs)
    if resizeColumns:
        ws = writer.book[sheet_name]
        def auto_format_cell_width(ws):
            # range end is max_column + 1 so that the last column is resized too
            for letter in range(1, ws.max_column + 1):
                maximum_value = 0
                for cell in ws[get_column_letter(letter)]:
                    val_to_check = len(str(cell.value))
                    if val_to_check > maximum_value:
                        maximum_value = val_to_check
                ws.column_dimensions[get_column_letter(letter)].width = maximum_value + 2
        auto_format_cell_width(ws)
    # save the workbook
    writer.save()
Example Usage:
# Create a sample dataframe
df = pd.DataFrame({'numbers': [1, 2, 3],
                   'colors': ['red', 'white', 'blue'],
                   'colorsTwo': ['yellow', 'white', 'blue'],
                   'NaNcheck': [float('NaN'), 1, float('NaN')],
                   })
# EDIT YOUR PATH FOR THE EXPORT
filename = r"C:\DataScience\df.xlsx"
# RUN THE FOLLOWING LINES ONE AT A TIME TO SEE EACH CHANGE TO THE EXCEL FILE
append_df_to_excel(filename, df, index=False, startrow=0)  # Basic export of df to the default sheet (Sheet1)
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0)  # Add the sheet "Cool", where "df" is written
append_df_to_excel(filename, df, sheet_name="Cool", index=False)  # Append another "df" to the sheet "Cool", just below the previous one
append_df_to_excel(filename, df, sheet_name="Cool", index=False, startrow=0, startcol=5)  # Append another "df" to the sheet "Cool", starting from col 5
append_df_to_excel(filename, df, index=False, truncate_sheet=True, startrow=10, na_rep='')  # Overwrite (truncate) "Sheet1", writing df from row 10 and showing blank cells instead of NaN
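The resizeColumns step boils down to "longest value in each column plus two characters". Separating that arithmetic from openpyxl makes it easy to check; the sketch below uses hypothetical names, treats empty cells as zero characters (the answer's len(str(cell.value)) counts None as four) and covers every column, including the last one.

```python
def column_widths(rows, padding=2):
    """Map 1-based column index -> longest str(value) in that column + padding."""
    longest = {}
    for row in rows:
        for col, value in enumerate(row, start=1):
            length = 0 if value is None else len(str(value))
            longest[col] = max(longest.get(col, 0), length)
    return {col: length + padding for col, length in longest.items()}


def autofit(ws, padding=2):
    """Apply column_widths to an openpyxl worksheet."""
    from openpyxl.utils import get_column_letter  # lazy: only needed here

    widths = column_widths(ws.iter_rows(values_only=True), padding)
    for col, width in widths.items():
        ws.column_dimensions[get_column_letter(col)].width = width
```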

Add dataframe and button to same sheet with XlsxWriter

I am able to create an Excel file with the data from a DataFrame in one sheet and a button that runs a macro in a second sheet.
What I need is to have both the data from the DataFrame and the button in the same sheet.
This is the code I found and have tried to modify:
import pandas as pd
import xlsxwriter
df = pd.DataFrame({'Data': [10, 20, 30, 40]})
writer = pd.ExcelWriter('hellot.xlsx', engine='xlsxwriter')
#df.to_excel(writer, sheet_name='Sheet1')
workbook = writer.book
workbook.filename = 'test.xlsm'
worksheet = workbook.add_worksheet()
workbook.add_vba_project('./vbaProject.bin')
worksheet.write('A3', 'Press the button to say hello.')
#Add a button tied to a macro in the VBA project.
worksheet.insert_button('A1', {'macro': 'start',
'caption': 'Press Me',
'width': 80,
'height': 30})
df.to_excel(writer, sheet_name ='Sheet2')
writer.save()
workbook.close()
I know that you only asked how to insert the button in the same sheet, but I decided to check how macros work with XlsxWriter, so I wrote a complete tutorial on how to add a macro.
1) First, we need to manually create a file containing the macro, so that we can extract it as a .bin file and inject it later using XlsxWriter. Create a new Excel file, go to the Developer tab > Visual Basic > Insert > Module, and write the following code:
Sub TestMsgBox()
MsgBox "Hello World!"
End Sub
Save the file with xlsm extension to contain the macro, e.g. as Book1.xlsm.
2) Now we need to extract the .bin file. Open a command prompt and browse to the directory where you saved Book1.xlsm. Then, in File Explorer, go to the folder where Python is installed (or to your virtual environment folder) and search for vba_extract.py. Copy this script into the same folder as Book1.xlsm. Then run in the command prompt:
python vba_extract.py Book1.xlsm
This way you will extract the macro and create the vbaProject.bin file in the same folder.
3) Now it's time to create the final file. Delete Book1.xlsm and vba_extract.py, as they're not needed anymore, and run the following code:
import pandas as pd
# Create a test dataframe
df = pd.DataFrame({'Data': [10, 20, 30, 40]})
# Import it through the xlsxwriter
writer = pd.ExcelWriter('hello_world.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Create the workbook and the worksheet
workbook = writer.book
workbook.filename = 'hello_world.xlsm' # rename the workbook to xlsm extension
worksheet = writer.sheets['Sheet1']
# Inject the bin file we extracted earlier
workbook.add_vba_project('./vbaProject.bin')
# Insert a description
worksheet.write('B1', 'Press the button to say hello.')
#Add a button tied to a macro in the VBA project.
worksheet.insert_button('B2', {'macro': 'TestMsgBox',
'caption': 'Press Me',
'width': 80, 'height': 30})
# Finally write the file
writer.save()
Now the button is in the same sheet as your data, and it works.
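Step 2 works because an .xlsm file is a zip package and the macros live in the member xl/vbaProject.bin, which is what vba_extract.py copies out. If locating that script is awkward, a standard-library sketch (the function name extract_vba_project is hypothetical) performs the same extraction:

```python
import zipfile


def extract_vba_project(xlsm_path, out_path="vbaProject.bin"):
    """Copy xl/vbaProject.bin out of a macro-enabled workbook; return its size."""
    with zipfile.ZipFile(xlsm_path) as zf:
        data = zf.read("xl/vbaProject.bin")  # KeyError if the workbook has no macros
    with open(out_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

For the tutorial above, extract_vba_project("Book1.xlsm") would produce the vbaProject.bin used in step 3.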

Python refresh of data in Excel with an external connection through an add-in doesn't work

I am using the following code to refresh data in an Excel file that uses an external add-in to receive data.
import sys, os, pandas as pd, numpy as np, time, win32com.client
import win32com.client as w3c
if __name__ == '__main__':
    your_file_path = r'C:\Book11.xlsx'
    for ii in np.arange(1, 10):
        xlapp = w3c.gencache.EnsureDispatch('Excel.Application')
        xlapp.Visible = 0
        xlwb = xlapp.Workbooks.Open(your_file_path, False, True, None)
        books = w3c.Dispatch(xlwb)
        xlwb.RefreshAll()  # Runs with no errors, but doesn't refresh
        xlapp.DisplayAlerts = False
        xlwb.Save()
        xlapp.Quit()
        df = pd.read_excel(your_file_path)  # updates should be applied
        print(df)
        time.sleep(20)
# Another version of the code that I tried is the following:
# xlapp = win32com.client.DispatchEx("Excel.Application")
# xlapp.Visible = True
# wb = xlapp.Workbooks.Open(your_file_path)
# wb.RefreshAll()
# xlapp.CalculateUntilAsyncQueriesDone()
# xlapp.DisplayAlerts = False
# wb.Save()
# xlapp.Quit()
However, the file doesn't refresh.
On the other hand, if I just open the file on the desktop with mouse clicks, I see the data as expected.
Are you running this as a macro?
Is the "refresh in background" property set to False for all connections?
Things to try:
a)
Calculate
ActiveWorkbook.RefreshAll instead of wbRefresh.RefreshAll
b)
Uncheck "Enable background refresh" for each connection (this disables background refresh).
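A hedged sketch of suggestion (b) driven from Python: it switches off background refresh on OLE DB and ODBC connections (WorkbookConnection.Type 1 and 2), then calls RefreshAll, CalculateUntilAsyncQueriesDone and CalculateFull. The function name refresh_synchronously is hypothetical, and data that arrives through the add-in's own functions rather than through workbook connections may not be affected. Note also that Microsoft documents that Excel started through Automation does not load add-ins automatically, so the add-in may need to be opened explicitly in that instance.

```python
XL_CONNECTION_TYPE_OLEDB = 1  # XlConnectionType.xlConnectionTypeOLEDB
XL_CONNECTION_TYPE_ODBC = 2   # XlConnectionType.xlConnectionTypeODBC


def refresh_synchronously(excel_app, workbook):
    """Refresh all connections in the foreground, then recalculate everything."""
    for conn in workbook.Connections:
        if conn.Type == XL_CONNECTION_TYPE_OLEDB:
            conn.OLEDBConnection.BackgroundQuery = False
        elif conn.Type == XL_CONNECTION_TYPE_ODBC:
            conn.ODBCConnection.BackgroundQuery = False
    workbook.RefreshAll()
    excel_app.CalculateUntilAsyncQueriesDone()  # wait for any async queries
    excel_app.CalculateFull()  # same as CTRL+ALT+F9
```

With the question's code, this would replace the bare xlwb.RefreshAll() call, e.g. refresh_synchronously(xlapp, xlwb), before xlwb.Save().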