Does openpyxl preserve cell color? - openpyxl

I'm creating an Excel file from a template, so I expect the template's formatting to be preserved. However, saving the workbook to a new file seems to lose some of the formatting (at least cell color).
The original file looks like this:
I then do the following:
import openpyxl
wb = openpyxl.open('c:\\temp\\test_templ.xlsx')
wb.save('c:\\temp\\test.xlsx')
The resulting file is 9 KB smaller than the original and looks like this:
Is there any way to save the Excel file while keeping the formatting?

Yes, it does. The problem I ran into was caused by an abnormally complex Excel file with thousands of styles. It seems some of them weren't read properly by openpyxl, hence the lost formatting. If you start with a clean slate and add the necessary formatting, openpyxl does the job just fine.
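One way to see where the missing kilobytes went is to compare the two files part by part. An .xlsx file is a ZIP archive and cell styles are stored in the xl/styles.xml part, so a large drop in that part's size would be consistent with styles being discarded on load. The following is a minimal sketch using only the standard library, not anything openpyxl provides; the helper names part_sizes and size_changes are made up for illustration, and the sizes reported are uncompressed, so they will not add up to the on-disk difference exactly.

```python
import zipfile


def part_sizes(xlsx_path):
    """Map each part inside the .xlsx ZIP archive to its uncompressed size."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}


def size_changes(before, after):
    """Return {part: (old_size, new_size)} for parts whose size differs.

    A size of None means the part is missing from that file.
    """
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Example usage with the paths from the question:
# template = part_sizes("c:\\temp\\test_templ.xlsx")
# saved = part_sizes("c:\\temp\\test.xlsx")
# for name, (old, new) in size_changes(template, saved).items():
#     print(f"{name}: {old} -> {new}")
```

If xl/styles.xml accounts for most of the difference, that supports the explanation above that some styles were not read.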

Related

openpyxl destroys functions on save

I'm trying to save a pandas DataFrame into an existing spreadsheet. I found an excellent answer at Writing Pandas DataFrame to Excel: How to auto-adjust column widths, which is really a continuation of another question *)
The problem, though, is that when I use it and then try to load the spreadsheet, I get a "damaged content" error complaining about a drawing, even though I have none in the spreadsheet, and all functions are gone. Static data is still there.
The log is:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <logFileName>error171360_05.xml</logFileName>
  <summary>Errors were detected in file 'test.xlsx'</summary>
  <repairedRecords summary="Following is a list of repairs:">
    <repairedRecord>Repaired Records: Drawing from /xl/drawings/drawing1.xml part (Drawing shape)</repairedRecord>
  </repairedRecords>
</recoveryLog>
Any ideas?
Edit: I'm pretty sure now it's not caused by pandas, as opening workbook, adding an empty sheet, and saving it removes all the formulas.
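from openpyxl import load_workbook

workbook = load_workbook(file)  # `file` is the path to the existing workbook
try:
    sheet = workbook["Result"]
except KeyError:
    # The sheet does not exist yet, so create it
    sheet = workbook.create_sheet("Result")
# for r in dataframe_to_rows(result, index=False, header=True):
#     sheet.append(r)
workbook.save(file)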
It doesn't produce the error above though.
Edit2: There's a question from 2013 (Openpyxl: Formulas getting removed when saving file) which says openpyxl doesn't support it, with a feature requested to do so. But the link to the feature request doesn't work, so I have no idea whether it has been implemented or not.
*) there is a small bug in the function in that answer, sheet_name is a param, but it also tries to look it up in **kwargs, which of course fails, so gets replaced by a default value even if passed into the function. I can't comment on the question, so maybe #maxU will read this and edit..
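To confirm what is actually lost on save, it can help to inspect the file directly rather than rely on Excel's repair dialog. Formulas are stored as f elements inside the worksheet XML parts of the .xlsx ZIP archive, and drawings live under xl/drawings/ (the part named in the recovery log above). The following is a rough sketch with the standard library only; the function names are hypothetical, and it assumes the usual (transitional) SpreadsheetML namespace, the same one shown in the log.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace assumed for worksheet parts (transitional SpreadsheetML).
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def count_formulas(xlsx_path):
    """Return {worksheet part: number of <f> (formula) elements}."""
    counts = {}
    with zipfile.ZipFile(xlsx_path) as archive:
        for name in archive.namelist():
            # Worksheet parts look like xl/worksheets/sheet1.xml;
            # relationship files end in .rels and are skipped.
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                root = ET.fromstring(archive.read(name))
                counts[name] = sum(1 for _ in root.iter(f"{{{MAIN_NS}}}f"))
    return counts


def has_drawings(xlsx_path):
    """True if the archive contains any part under xl/drawings/."""
    with zipfile.ZipFile(xlsx_path) as archive:
        return any(n.startswith("xl/drawings/") for n in archive.namelist())
```

Running count_formulas on a copy of the workbook made before the save and on the file after it shows whether the formulas were dropped from the file itself, and has_drawings shows whether the original contains a drawing part even though no drawing is visible in the sheet.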

Excel VBA: Memory Exceeds Limits on Second Import

I am using 32-bit 2013 Excel with VBA extensively. I have disabled hardware graphics acceleration and COM add-ins, yet I still struggle with the following problem:
I am importing the contents of another large workbook (a ~3 MB Excel file with cell formatting but no formulas) into the problematic Excel workbook. On the first attempt, when the contents have not been imported yet, the import succeeds. I import the content via VBA similar to the following code:
Application.Workbooks(F_Home).Activate
Workbooks(F_Home).Sheets(Sheet1).Visible = xlSheetVisible
...
Application.Workbooks(F_Source).Activate
Workbooks(F_Source).Sheets(S_Source).Cells.Copy Destination:=Workbooks(F_Home).Sheets(Sheet1).Cells
Application.Workbooks(F_Home).Activate
F_Home is the problematic Excel workbook and F_Source is the workbook with the content we are importing. The first time, this works. I then save the file, close it, reopen it, and try the import a second time. On the second attempt (when the contents are already imported), the F_Home workbook crashes with an Out of Memory error (There isn't enough memory to complete this action...) on the line that copies the F_Source contents to F_Home.
Using Process Explorer, I've found that the Excel process usually runs at around 600-700 MB of virtual memory, but when we run the VBA script to import the contents a second time, the virtual memory size suddenly jumps to 4 GB (this does not happen the first time around, when it stays in the 600-700 MB range). How should I fix this? I cannot use a workaround such as saving the file before importing the contents a second time, because the timestamp on the file is used and saving the file through VBA will confuse some users.
Thank you for your help.
Copying and pasting the whole sheet is not a wise approach. Let's assume your data is in "A1:Z99999". You can do the following instead, which is going to be much faster.
Set S_Range = Workbooks(F_Source).Sheets(S_Source).Range("A1:Z99999")
Set H_Range = Workbooks(F_Home).Sheets(Sheet1).Range("A1:Z99999")
H_Range.Value = S_Range.Value
Also read up on why you should avoid selecting and activating in VBA.

How to keep the first space in Cell on a .xlt

I worked on an export of data from an ERP to Excel, but I encountered a problem.
When I receive my data in my Excel template (.xlt, I don't have a choice about the extension...), all leading spaces in the ERP fields have disappeared from my worksheet...
An example (here, the spaces before "Holder"):
And now, in Excel, without the spaces:
One last piece of information: I think the problem only occurs with the .xlt file type (97/03) (the only one I can use, of course...), because when I try an export to .xls, there is no problem.
I have already tried changing the cell type to Text or Standard, but it doesn't work.
Do you have a solution?
Thanks!
Let me outline a typical solution:
You have a "data source" you cannot control - in this case it's an xlt file that is somewhere on your hard drive - call it export1.xlt
You want to add the data from the data source (export1.xlt) to a "database", which could just be another aggregate spreadsheet or whatever. Let's call it database1.xlsx.
Typically you would create a macro inside database1.xlsx that knows how to import data into itself - in this case let's say you give it a path, e.g. C:\temp\export1.xlt, and tell it to copy that data to Sheet1.
When you run that macro it will open export1.xlt, read the data into Sheet1 of database1.xlsx, and perform any necessary post-processing.
In this case the post-processing could simply be looping over every cell looking for a missing space.

SQL developer Import data wizard comes up blank

So I have an Excel worksheet with over 100 entries in it. I want to load those into a SQL database. So I fire up SQL Developer and try to import the data, but it doesn't show up.
The next and finish buttons don't do anything. (the blue underline words aren't links to anything either fyi)
Have you tried converting the original file to text (csv) then importing? That has worked for me in the past.
I had the same problem and the only way to get rid of it was to rename the preferences folder (as described here: https://www.thatjeffsmith.com/archive/2015/08/how-to-reset-your-sql-developer-preferencessettings/) and start the program with factory defaults.
Another possible reason is that the CSV or Excel file is 0 KB in size or empty.

Convert xls File to csv, but extra rows added?

So, I am trying to convert some xls files to csv, and everything works great except for one part. The SaveAs function in the Excel interop seems to export all of the rows (including blank ones). I can see these rows when I look at the file in Notepad (all of the rows I expect, 15 rows with two single quotes, then the rest are just blank). I then have a stored procedure that takes this csv and imports it into the desired table (this works on spreadsheets that have been manually converted to csv, e.g. open, File --> Save As, etc.).
Here is the line of code I am using for SaveAs. I have tried xlCSV, xlCSVWindows, and xlCSVMSDOS as the file format, but they all do the same thing.
wb.SaveAs(aFiles(i).Replace(".xls", "B.csv"), Excel.XlFileFormat.xlCSVMSDOS, , , , False) 'saves a copy of the spreadsheet as a csv
So, is there some additional step or setting I need to keep the extraneous rows from showing up in the csv?
Note that if I open this newly created csv, and then click Save As, and choose csv, my procedure likes it again.
When you create a CSV from a Workbook, the CSV is generated based upon your UsedRange. The UsedRange can be expanded simply by applying formatting to a cell (without any contents), which is why you are getting blank rows. (You can also get blank columns due to this issue.)
When you open the generated CSV all of those no-content cells no longer contribute to the UsedRange due to having no content or formatting (since only values are saved in CSVs).
You can correct this issue by updating your used range before the save. Here's a brief sub I wrote in VBA that would do the trick. This code would make you lose all formatting, but I figured that wasn't important since you're saving to a CSV anyway. I'll leave the conversion to VB.Net up to you.
Sub CorrectUsedRange()
    Dim values
    Dim usedRangeAddress As String
    Dim r As Range
    'Get UsedRange Address prior to deleting Range
    usedRangeAddress = ActiveSheet.UsedRange.Address
    'Store values of cells to array.
    values = ActiveSheet.UsedRange
    'Delete all cells in the sheet
    ActiveSheet.Cells.Delete
    'Restore values to their initial locations
    Range(usedRangeAddress) = values
End Sub
Tested your code with VBA and Excel 2007 - works nicely.
However, I could replicate it somewhat, by formatting an empty cell below my data-cells to bold. Then I would get empty single quotes in the csv. BUT this was also the case, when I used SaveAs.
So, my suggestion would be to clear all non-data cells, then to save your file. This way you can at least exclude this point of error.
I'm afraid that may not be enough. It seems there's an Excel bug that makes even deleting the non-data cells insufficient to prevent them from being written out as empty cells when saving as csv.
http://answers.microsoft.com/en-us/office/forum/office_2010-excel/excel-bug-save-as-csv-saves-previously-deleted/2da9a8b4-50c2-49fd-a998-6b342694681e
Another way, without a script: hit Ctrl+End. If that lands on a row AFTER your real data, select the rows from the first empty one down to at least the row it landed on, right-click, and choose "Clear Contents".
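If fixing the workbook before the export is not practical, another option is to clean the CSV after Excel has written it. This is a different technique from the UsedRange fix above, and it is sketched here in Python rather than VB.NET; the function names are made up for illustration, and it assumes a "blank" row is one whose cells are all empty or whitespace (rows containing literal quote characters would need an extra check).

```python
import csv
import io


def strip_trailing_blank_rows(rows):
    """Drop rows at the end whose cells are all empty or whitespace."""
    rows = list(rows)
    while rows and all(cell.strip() == "" for cell in rows[-1]):
        rows.pop()
    return rows


def clean_csv_text(text):
    """Return CSV text with trailing blank rows removed."""
    reader = csv.reader(io.StringIO(text, newline=""))
    kept = strip_trailing_blank_rows(reader)
    out = io.StringIO(newline="")
    csv.writer(out).writerows(kept)
    return out.getvalue()
```

Only rows at the end of the file are removed, so blank rows in the middle of the data are left alone.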