openpyxl destroys functions on save - openpyxl

I'm trying to save pandas DF into an existing spreadsheet. I found an excellent answer at Writing Pandas DataFrame to Excel: How to auto-adjust column widths, which is really continuation of another question *)
The problem though is that when I use it, on trying to load the spreadsheet I get an error on "damaged content", complaining about a drawing - even though I have none in the spreadsheet, and all functions are gone. Static data are still there.
log is
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
-<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error171360_05.xml</logFileName>
<summary>Errors were detected in file 'test.xlsx'</summary>
-<repairedRecords summary="Following is a list of repairs:">
<repairedRecord>Repaired Records: Drawing from /xl/drawings/drawing1.xml part (Drawing shape)</repairedRecord>
</repairedRecords>
</recoveryLog>
Any ideas?
Edit: I'm pretty sure now it's not caused by pandas, as opening workbook, adding an empty sheet, and saving it removes all the formulas.
workbook = load_workbook(file)
try:
sheet = workbook["Result"]
except KeyError:
sheet = workbook.create_sheet("Result")
# for r in dataframe_to_rows(result, index=False, header=True):
# sheet.append(r)
workbook.save(file)
It doesn't produce the error above though.
Edit2: There's a question from 2013 (Openpyxl: Formulas getting removed when saving file) which says OpenPyxl doesn't support it, with a feature requested to do so. But the link to the feature doesn't work, so I have no idea whether it works or not.
*) there is a small bug in the function in that answer, sheet_name is a param, but it also tries to look it up in **kwargs, which of course fails, so gets replaced by a default value even if passed into the function. I can't comment on the question, so maybe #maxU will read this and edit..

Related

Does openpyxl preserve cell color?

I'm creating an Excel file from the template. So I expect the formatting of the template to be preserved. However it seems saving of the workbook to new file looses some of formatting (at least cell color).
Original file looks like that:
I do then following:
import openpyxl
wb = openpyxl.open('c:\\temp\\test_templ.xlsx')
wb.save('c:\\temp\\test.xlsx')
Resulting file is 9KB smaller than original and looks like this:
Is there any way to save the Excel file with keeping the formatting?
Yes, it does. The problem I have met was abnormally complex Excel file with thousands of styles. So it seems some of them weren't properly read by openpyxl and hence the problem I had. But if you start with the clean slate and add necessary formatting openpyxl does the job just fine.

Sheet not found via codename

I'm running across and error that appears sporadically. Essentially, a master .xlsm file is used by multiple people to populate data for aggregation. I then use another .xlsm file with macros to pull the data and aggregate it into a single source file.
The code generally works quite well, with one exception:
'define range based on count'
Set rngItemRange = SourceWb.Sheets("Quality").Range(Cells(3, i), Cells((intItemCount + 2), i))
'write concatenated range'
targetwb.Sheets("Raw Data").Cells(pintDest_row, pintCol).Value = concatRange(rngItemRange)
It cannot find the tab "Quality" on some work books (but not all). Okay, maybe someone renamed the tab? checked that, thats not true. One thing that works as a work around: on the open source workbook, when debug throws an error, if i manually click the quality tab, then resume the macro, it will continue.
I also said, okay, well maybe there's some weird character recognition issue, so I started to refer to the sheet by the codename as shown in the vba editor. I experience the same behaviors.
This affects maybe 50% of the workbooks, and I cant find any root cause. I have similar code elsewhere, for different sheets, but this is the only one where i define a range to pass into a function using the "set" command. Again, this only happens sometimes on some workbooks, and i can continue to execute when i manually click on the tab i'm pulling data from.
I'll also add, there is only one workbook open with this sheet name, at any given time, so i dont think it's choking trying to figure which sheet is relevant. Plus sourceWb is a set variable.
Does someone have any clue whats going on? anything to try? solutions? help!
(Forgive any typos, I'm working on a broken thumb right now.)

grab and filter from more than 255 columns from a huge closed workbook

i have a huge workbook (0.6 million rows) and 315 columns whose column names i need to grab into an array. due to the huge size, i don't want to open and close the workbook to copy the 1st row of the range. Also, I want to only grab certain columns from the 1st row that begin with the word "Global ".
can anyone help with short code example on how to go about doing this? please note i have tried ADOX, ADO etc but both show the 255 column limitations. I also dont want to open the workbook, but pull the required "Global " columns from the 315 columns into an array.
any help is most appreciated.
You can copy the first row of your target by opening a new workbook, and in A1 use this formula:
='C:\PATH_TO_TARGET\[TARGET_FILE_NAME.xlsx]WORKSHEET_NAME'!A1
Note that PATH+FILENAME+WORKSHEET is enclosed in single quotes, the FILENAME is enclosed in square brackets, and an exclamation separates the cell reference.
Then copy/Paste or fill right to get the next 314 columns. Note: this formula will return zero for empty target cells.
Once you have the column heading you can copy/paste_special_values if you want to destroy the links to the closed workbook.
Hope that helps
You could use the Python programing language.
While it does not actively works with XLSX fiels, you just have to install the openpyxl external module from here: https://pypi.python.org/pypi/openpyxl -
(You will also have to install Python. of course - just download it from www.python.org)
It will make working with your data in an interactive Python session a piece of cake, and the time to open the workbook without having to load the Excel interface should be a fraction of what you are expecting. (I think it will have to fit in your memory, though).
But this is all I had to type, in an interactive Python2 session to open a workbook, and retreive the column names that start with "bl":
import openpyxl
a = openpyxl.load_workbook("bla.xlsx")
[cell.value for cell in a.worksheets[0].rows[0] if cell.value.startswith("bl")]
output:
Out[8]: [u'bla', u'ble', u'bli', u'blo', u'blu']
The last input line requires on to know Python to be understood, so, here is a summary of what happens: Python is a language very fond of working with sequences - and the openpyxl libray gives your workbook as just that:
an object which is a sequence of worksheets - each worksheet having a rows attribute which has a sequence of all rows in the sheet, and each row bein a sequence of cells. Each cell has a value attribute which is the text within it.
The inline for statement is the compact form, but it could be written as a multiple line statement as:
In [10]: for cell in a.worksheets[0].rows[0]:
....: if cell.value.startswith("bl"):
....: print cell.value
....:
bla
ble
bli
blo
blu
Keep in mind that by exploring Python a bit deeper, you can programatically manipulate your data in a way that will be easier than ininteractivelygiven a data-set this size - and you can even use Python itself to drop select contents to an SQL database, (including its bult-in, single-file database, sqlite), where sophisticated indexes and queries can make working with your data a breeze)

vb.NET SaveAs not saving all Excel data

I have a very strange issue that I cannot seem to find an answer to online.
I have a VB.NET application that creates an Excel of data (roughly 42,542 rows in total) and the saves the file to a folder location & opens it on screen for the user.
The onscreen version & folder version is only showing 16,372 rows of data like it is being cut off.
When I go through debug I can see all the rows are being added & if I save manually in debug all the rows save. Some data seems to get lost on the system save.
I am taking data from 4 record sets & writing each set one after the other with specific headers for each block on the Excel sheet.
My save line is:
xlWBook.SaveAs(Filename:=sFileName, FileFormat:=Excel.XlFileFormat.xlExcel7)
Would anyone please have any ideas as to what this might be?
Older version of Excel only support 16,384 rows per worksheet. You are saving as Excel7 (which is Excel 95) and has this limitation:
See here for a summary of sizes per version:
https://superuser.com/questions/366468/what-is-the-maximum-allowed-rows-in-a-microsoft-excel-xls-or-xlsx
Change your code to another format, See here for all the allowed formats: XlFileFormat Enumeration
However the file format is actually an optional argument in the SaveAs method, so you could leave it off altogether: "For an existing file, the default format is the last file format specified; for a new file, the default is the format of the version of Excel being used."
Source: WorkBook.SaveAs Method

Calculate a percentile using EPPlus

I'm trying to use either PERCENTILE.EXC, PERCENTILE.INC or PERCENTILE.
Looking at FormulaParserManager.GetImplementedFunctionNames() these are not implemented functions.
I wondered if I could set the formula and leave it to Excel to calculate. So far I've not got this to work and I get a #NAME? and "The formula contains unrecognized text". Merely clicking in the formula bar causes the formula to be calculated correctly.
Inspecting the internals of the Excel file I am creating (via EPPlus):
_xludf.PERCENTILE.EXC(B14:B113,0.95)
whereas in Excel I get:
_xlfn.PERCENTILE.EXC(A14:A113,0.95)
I think this is user defined function vs function. I've tried prefixing "_xlfn." to my formula string.
This is as far as I've got I think I either need to roll my own percentile calculation in code or manipulate the xml in the Excel file maybe.
Any help appreciated.
Doh! I spend all afternoon stuck, post a question here and then immediately suss the answer...
Anyway in case anyone is interested:
var row95 = percentile95RowLookup[profileKey];
var percentileRange = sheet.Cells[row, i, row+ profileCollection.Profiles.Count-1, i];
sheet.Cells[row95, i].Formula = $"_xlfn.PERCENTILE.INC({percentileRange.Address},0.95)";
With the important proviso that workbook.CalcMode is not Manual.