Openpyxl corrupted my excel file and now it is lost? - openpyxl

I have been using openpyxl to write some data to an already existing Excel file and I was finding it quite user friendly and easy.
However, things started to go wrong yesterday.
The Excel file that my Python code loads has some titles in A2, B2, C2, etc., and the code writes some data below each column title.
During the afternoon I ran the code (I have been running it a lot and doing lots of trial and error) and it worked exactly as hoped except a few columns in various sheets of the Excel file had mysteriously disappeared. This was because some columns (mostly where there was data) had been minimized to 0 pixels wide for some reason, and once I had expanded the column wider again, everything was as it should be. One of the sheets that had had columns being minimized was not being touched by the code. Not sure if this is relevant but found it strange so thought I'd add it anyway.
Then during the evening, I ran the code again trialing a couple of new functions I had added. It looked to be working fine, but when it got to the .save() command at the end it threw out all these errors:
Traceback (most recent call last):
  File "whoscored_scraper.py", line 239, in <module>
    wb2.save("WhoscoredDatabase.xlsx")
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\workbook\workbook.py", line 296, in save
    save_workbook(self, filename)
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\writer\excel.py", line 191, in save_workbook
    writer.save(filename, as_template=as_template)
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\writer\excel.py", line 174, in save
    self.write_data(archive, as_template=as_template)
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\writer\excel.py", line 85, in write_data
    self._write_worksheets(archive)
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\writer\excel.py", line 111, in _write_worksheets
    write_worksheet(sheet, self.workbook.shared_strings,
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\writer\worksheet.py", line 299, in write_worksheet
    xf.write(comments)
  File "C:\Users\SamH\Documents\Betting\Python\lib\contextlib.py", line 24, in __exit__
    self.gen.next()
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\xml\xmlfile.py", line 51, in element
    self._write_element(el)
  File "C:\Users\SamH\Documents\Betting\Python\openpyxl\xml\xmlfile.py", line 78, in _write_element
    xml = tostring(element)
  File "C:\Users\SamH\Documents\Betting\Python\lib\xml\etree\ElementTree.py", line 1126, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "C:\Users\SamH\Documents\Betting\Python\lib\xml\etree\ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "C:\Users\SamH\Documents\Betting\Python\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Users\SamH\Documents\Betting\Python\lib\xml\etree\ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "C:\Users\SamH\Documents\Betting\Python\lib\xml\etree\ElementTree.py", line 932, in _serialize_xml
    v = _escape_attrib(v, encoding)
  File "C:\Users\SamH\Documents\Betting\Python\lib\xml\etree\ElementTree.py", line 1092, in _escape_attrib
    _raise_serialization_error(text)
  File "C:\Users\SamH\Documents\Betting\Python\lib\xml\etree\ElementTree.py", line 1052, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize 7L (type long)
My code is quite long, so below are some of the lines that involve openpyxl:
from openpyxl import workbook
from openpyxl import load_workbook
wb2 = load_workbook('filename.xlsx')
ws = wb2["Shots_and_Goals"]
ws4 = wb2["Assorted"]
ws2 = wb2["On_Target"]
def HomeTeam(string):
    HomeTeam_cell = "c" + str(3)
    ws[str(HomeTeam_cell)] = Home_Team  # Home_Team is some variable
def Home_Shot_Minutes(string):
    for num in range(0, 10):
        cell = column_titles[num + 10] + str(3)  # column_titles is just a list
        ws[str(cell)] = int(shot_count_for_that_minute)  # shot_count_for_that_minute is a variable defined above
HomeTeam(data)
Home_Shot_Minutes(data)
wb2.save("filename.xlsx")
There are many more lines of code that are basically the same as those listed above. Everything had been working correctly up until this point.
When I try to open the newly saved file, Excel says it has found unreadable content.
When I choose to recover the unreadable content, the new Excel file is totally blank apart from the sheet names.
I know it was a stupid decision not to save the file under a different filename, as that would have meant I didn't have to remake my entire Excel file!
Does anyone understand what went wrong here?

Unfortunately, the data is lost because the original file was overwritten.
Please submit a bug report with the original file so that we can investigate. It's clear where the error is coming from but we need the original data to reproduce it.
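For readers wondering what the last frames of the traceback mean: ElementTree refuses to serialize an attribute value that is not a string. The sketch below reproduces that class of error on Python 3 (where the offending type is int rather than long). The element and attribute names are made up for illustration, and this is not a reproduction of the openpyxl bug itself.

```python
import xml.etree.ElementTree as ET


def try_serialize(attribute_value):
    """Serialize a one-attribute element, returning the error text on failure."""
    # "col" and "width" are hypothetical names chosen for this illustration.
    element = ET.Element("col", {"width": attribute_value})
    try:
        return ET.tostring(element)
    except TypeError as error:
        return "TypeError: %s" % error


failed = try_serialize(7)       # a number as an attribute value cannot be escaped
worked = try_serialize(str(7))  # the same value as a string serializes fine

print(failed)
print(worked)
```

Any numeric attribute that reaches the serializer without first being converted to a string will trigger this, which matches the `7L (type long)` in the traceback.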
In the meantime I suspect the problem is unlikely to happen if you install lxml.
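To avoid losing the original when a save fails part-way, one option is to save to a temporary file first and only swap it into place once the save has finished. This is a minimal sketch under stated assumptions: `safe_save` is a hypothetical helper, not part of openpyxl; it assumes Python 3.3+ for `os.replace`; and the demonstration uses plain text files standing in for a workbook.

```python
import os
import shutil
import tempfile


def safe_save(save_func, target_path):
    """Save via save_func to a temporary file, then swap it into place.

    save_func is any callable that takes a filename, such as a workbook's
    save method. The existing file is replaced only if save_func succeeds.
    """
    target_path = os.path.abspath(target_path)
    suffix = os.path.splitext(target_path)[1]
    handle, temp_path = tempfile.mkstemp(suffix=suffix,
                                         dir=os.path.dirname(target_path))
    os.close(handle)
    try:
        save_func(temp_path)  # may raise, as the save did in the question
        if os.path.exists(target_path):
            # Keep the previous version next to the new one.
            shutil.copy2(target_path, target_path + ".bak")
        os.replace(temp_path, target_path)
    finally:
        if os.path.exists(temp_path):
            os.remove(temp_path)  # remove a partial file after a failed save


# Demonstration with text files; both save functions are stand-ins.
def good_save(filename):
    with open(filename, "w") as out:
        out.write("new data")


def failing_save(filename):
    with open(filename, "w") as out:
        out.write("partial")
    raise TypeError("cannot serialize 7 (type int)")


with tempfile.TemporaryDirectory() as folder:
    database = os.path.join(folder, "database.txt")
    with open(database, "w") as out:
        out.write("old data")

    try:
        safe_save(failing_save, database)
    except TypeError:
        pass
    with open(database) as source:
        after_failure = source.read()  # the original is untouched

    safe_save(good_save, database)
    with open(database) as source:
        after_success = source.read()
    with open(database + ".bak") as source:
        backup = source.read()

print(after_failure, after_success, backup, sep=" | ")
```

With a workbook, the call would look like `safe_save(wb2.save, "WhoscoredDatabase.xlsx")`.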

Visual basic write in an opened file (printLine)

So I want to do this:
Open file "this.txt"
Write a line to this file (replacing anything else written to this file)
[Other stuff, irrelevant to the file]
Write a line to this file (replacing anything else written to this file)
Close the file
I thought it would be easy, but I was wrong. I tried many ways, but they all failed. Either they wouldn't let me write in an open file, or they would open the file and immediately close it (WriteAllText).
I ended up using FileOpen(), PrintLine() and FileClose(), which lets me write to an open file, but PrintLine only appends a new line; it doesn't replace everything in the file. Any help would be appreciated, either with PrintLine or with the whole approach.
It is crucial that the file stays open until the very last moment I want it closed (because I have another program checking to see when this file is no longer open or in use).
If this is about VB.NET, then you can use File.Open() with mode = FileMode.Truncate. This clears the file on opening. This assumes that the file exists.
You can also use SetLength() to truncate:
' Requires Imports System.IO at the top of the file
Dim f As FileStream
' FileMode.Truncate also works, but the file needs to exist from before
f = File.Open("test.txt", FileMode.OpenOrCreate, FileAccess.Write)
f.SetLength(0) ' truncate to zero size
Dim line As String = "hello2"
Dim w = New StreamWriter(f)
w.WriteLine(line)
w.Flush()
w.Close() ' closing the writer also closes the underlying stream
f.Close()
If this is about VB6, check out this question. One solution there is to import and use a native Windows function that does truncation.
In VB.Net OpenTextFileWriter does exactly what you need (docs):
Dim file As System.IO.StreamWriter
file = My.Computer.FileSystem.OpenTextFileWriter("c:\test.txt", False)
file.WriteLine("Here is the first string.")
file.Close()
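For comparison, the truncate-then-write idea from the first answer (keep one handle open, cut the file to zero length, write the new line) can be sketched in Python. The helper name `replace_contents` and the temporary file location are assumptions for this illustration only.

```python
import os
import tempfile


def replace_contents(handle, line):
    """Discard everything in the open file and write a single line."""
    handle.seek(0)      # go back to the start of the file
    handle.truncate()   # cut the file off at the current position
    handle.write(line + "\n")
    handle.flush()      # push the text to disk without closing the handle


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "this.txt")
    with open(path, "w") as handle:  # stays open for the whole block
        replace_contents(handle, "first line")
        with open(path) as reader:
            first_snapshot = reader.read()
        # ... other work, unrelated to the file ...
        replace_contents(handle, "second")
        with open(path) as reader:
            second_snapshot = reader.read()

print(repr(first_snapshot), repr(second_snapshot))
```

The file is closed only when the outer `with` block ends, so a watcher that waits for the file to be released would not be triggered between the two writes.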

Proper usage of load_workbook in openpyxl when saving data to worksheet in existing file

I have a dataframe called results and an excel file named as vlpandas.xlsx. I set a default path for working dir as follows:
excel_dir = 'Users/Documents/Pythonfiles'
Based on the example from this post, How to save a new sheet in an existing excel file, using Pandas?, I did the following:
book = load_workbook(excel_dir)
But I got an error after running the command above. To fix it, I followed the usage documented by openpyxl, and the following works:
book = load_workbook(filename = 'vlpandas.xlsx')
But then I get an exception error when I run the command below. Something about workbook.py from the openpyxl directory.
writer = pd.ExcelWriter(excel_dir, engine='openpyxl')
and then I want to complete my task saving the data to the new worksheet in the existing file vlpandas.xlsx with the following lines of code:
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
## Your dataframe to append.
results.to_excel(writer, '3rd_sheet')
writer.save()
So my two questions are:
1) What is the proper usage of load_workbook from openpyxl?
2) Why am I getting an error with the command below? I also changed my working directory using os.chdir:
writer = pd.ExcelWriter(excel_dir, engine='openpyxl')
Regards,
Gus
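One thing worth checking in the code above: excel_dir holds a directory, while the calls that fail are given it where a path to a file is expected. The sketch below only builds the full file path from the two names in the question. Whether this resolves the reported errors is an assumption, since the question does not show the actual error messages.

```python
import os

excel_dir = 'Users/Documents/Pythonfiles'              # directory from the question
excel_path = os.path.join(excel_dir, 'vlpandas.xlsx')  # directory plus file name

print(excel_path)
print(os.path.splitext(excel_path)[1])  # the extension is now part of the path

# The full path, not the directory, is what would then be passed on, e.g.:
#   book = load_workbook(excel_path)
#   writer = pd.ExcelWriter(excel_path, engine='openpyxl')
```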

VBA Reading From a UCS-2 Little Endian Encoded Text File

I have a whole bunch of text files exported from Photoshop that I need to import into an Excel document. I wrote a macro to get the job done, and it seemed to work just fine for my test document, but when I tried loading some of the actual files produced by Photoshop, Excel started putting all the data in a separate column, except for the first line.
My code that reads the text file:
Open currentDocPath For Input As stream
Do Until EOF(stream)
    Input #stream, currentLine
    columnContents = Split(currentLine, vbTab)
    For n = 0 To UBound(columnContents)
        ActiveSheet.Cells(row, Chr(64 + colum + n)).Value = columnContents(n)
    Next n
    row = row + 1
Loop
Close stream
The text files I am reading look like this, only with much more data:
"Name" "Data" "Info" "blah"
"Name1" "Data1" "Info1" "blah1"
"Name2" "Data2" "Info2" "blah2"
The problem seemed pretty trivial, but when I load it into Excel, instead of looking like it does above, it looks like this:
ÿþ"Name" "Data" "Info" "blah"
Name1
Data1
Info1
blah1
Name2
Data2
Info2
blah2
Now I am not sure why this is happening. It seems like the first two characters in the first row are there because those bytes declare the text encoding. Somehow those characters keep the first row formatted correctly while the remaining rows lose their quotation marks and all get moved to new lines.
Could someone who understands UCS-2 Little Endian text encoding explain how I can work around this? When I convert the files to ASCII it works fine.
Cheers!
edit: Okay so I understand now that the encoding is UTF-16 (I don't know a whole lot about character encoding). My main issue is that it's formatting strangely and I don't understand why or how to fix it. Thanks!
As I mentioned in my comment, it appears the file you're trying to import is encoded in UTF-16.
In this vbaexpress.com article, someone suggested that the following should work:
Dim GetOpenFile As String
Dim MyData As String
Dim r As Long
GetOpenFile = Application.GetOpenFilename
r = 1
Open GetOpenFile For Input As #1
Do While Not EOF(1)
    Line Input #1, MyData
    Cells(r, 1).Value = MyData
    r = r + 1
Loop
Close #1
Obviously I can't test it myself, but maybe it'll help you.
Why not just tell Excel to import the file? MS has probably put hundreds of thousands of person-hours into that code. Record the import as a macro to get the code easily.
Remember Excel is a tool for non programmers to do programming things. Use it instead of trying to replace it.
These are the replacement file functions that you use for new code. Add a reference to Microsoft Scripting Runtime.
Opens a specified file and returns a TextStream object that can be used to read from, write to, or append to the file.
object.OpenTextFile(filename[, iomode[, create[, format]]])
Arguments
object: Required. Object is always the name of a FileSystemObject.
filename: Required. String expression that identifies the file to open.
iomode: Optional. Can be one of three constants: ForReading, ForWriting, or ForAppending.
create: Optional. Boolean value that indicates whether a new file can be created if the specified filename doesn't exist. The value is True if a new file is created, False if it isn't created. If omitted, a new file isn't created.
format: Optional. One of three Tristate values used to indicate the format of the opened file. If omitted, the file is opened as ASCII.
The format argument can have any of the following settings:
Constant            Value  Description
TristateUseDefault  -2     Opens the file using the system default.
TristateTrue        -1     Opens the file as Unicode.
TristateFalse       0      Opens the file as ASCII.
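To illustrate the encoding issue outside VBA: the ÿþ at the start of the first row is the UTF-16 little-endian byte-order mark (bytes FF FE) being read as if it were single-byte text. A minimal Python sketch, assuming a tab-separated export shaped like the sample in the question (the file name and helper name are made up):

```python
import csv
import os
import tempfile


def read_utf16_tsv(path):
    """Read a tab-separated UTF-16 file into a list of rows."""
    # The "utf-16" codec consumes the byte-order mark and picks the byte order;
    # newline="" leaves line-ending handling to the csv module.
    with open(path, encoding="utf-16", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


sample = ('"Name"\t"Data"\t"Info"\t"blah"\r\n'
          '"Name1"\t"Data1"\t"Info1"\t"blah1"\r\n')

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "export.txt")
    with open(path, "wb") as handle:
        # FF FE is the little-endian byte-order mark seen as "ÿþ" in the question.
        handle.write(b"\xff\xfe" + sample.encode("utf-16-le"))
    rows = read_utf16_tsv(path)

print(rows)
```

Declaring the encoding when opening the file plays the same role as the format argument of OpenTextFile described above.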

Using ReadLine, where did my text go?

I'm pretty new to visual basic (and coding in general) so if I've made any really simple mistakes let me know.
Right now, I'm getting a pretty weird problem with my vb.net code.
The filestream is able to correctly open the file and read from it, but what's weird is that while the code is able to read a bunch of lines from the beginning of the file, I can't see those lines when I manually open the file in Notepad. Here's the code:
Dim fs, f, s 'filesystemobject, file, stream.
fs = CreateObject("Scripting.FileSystemObject")
f = fs.GetFile(CurrDataPath) ' This change made to ensure the correct file is opened
s = f.OpenAsTextStream(1, 0) ' 1 = ForReading, 0 = as ASCII (which i think is right?)
Dim param(14) As String
Dim line As String
line = s.ReadLine()
While i <= 14
    i += 1
    MessageBox.Show(line)
    line = s.ReadLine()
End While
(I've read that arrays are a bad idea but they've been convenient and haven't caused me any problems so I've been using them anyways.)
What's weird is that when this code is run, it will (in the message boxes) show me the information I want to see - which isn't bad at all. The information that I want looks like this:
BEGINPARAM
parameter1, 0
parameter2, 7.5
ENDPARAM
EDIT:
After using Path.GetFullPath(DFile), I found that there were two files in different directories with the same name DFile. The file I had been opening in Notepad was saved in the directory where I expected it to be saved, while the file the code was reading was saved in the VB project's folder.
Once I changed the code to rely on CurrDataPath which includes the expected path, the code read from the file exactly what I did in notepad.
I do have word wrap on in notepad, so I know that's not the issue, however, I will look into getting notepad++.
The file named DFile is created in a c++ program that I'll be digging through to find out why one part of the file is written to a different folder than the rest.
Obviously I'm missing something important, and if anyone could help, that would be great.
*Note: This is a vb6 migration project so if anyone asks I can provide the old code.
Assuming the most recent version of VB.Net, the modern way to write that is like this:
' Requires Imports System.IO and Imports System.Linq
For Each line As String In File.ReadLines(CurrDataPath).Take(14)
    MessageBox.Show(line)
Next
I'm not 100% clear on what you're saying. There's nothing in this code that outputs to a file, so what you have to be saying is that when you open the file referenced by "DFile" on line 3 above, that file doesn't have the lines containing "parameter1, 0" and "parameter2, 7.5" in it?
Since we know that's not technically possible, do verify the answer to the question above and make sure you're really opening the same file in notepad as the script is opening. The second thing to do is to turn on Word Wrap in Notepad or download Notepad++ (a text editor I think everyone should have anyway) and make sure that the data's actually missing, and not just not showing on your screen because it's not using Windows style line endings.
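The check that settled this thread, resolving the name to a full path to see which file is really being opened, looks like this in Python. It is only a sketch: `describe` is a hypothetical helper, and "DFile" is used as a bare relative name as in the question.

```python
import os


def describe(path):
    """Return the absolute path a relative name resolves to, and whether it exists."""
    full_path = os.path.abspath(path)  # resolved against the current working directory
    return full_path, os.path.exists(full_path)


full_path, exists = describe("DFile")
print(full_path, exists)
```

A bare file name is resolved against the process's working directory, which for a program launched from an IDE is often the project folder rather than the folder where the data was expected.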

How to : streamreader in csv file splits to next if lowercase followed by uppercase in line

I am using an ASP.NET MVC application to upload Excel data, exported as CSV, to a database. While reading the CSV file with a StreamReader, if a line contains a lowercase letter followed by an uppercase letter, it is split into two lines. Example:
Line :"1,This is nothing but the Example to explanationIt results wrong, testing example"
This line splits to :
Line 1: 1,This is nothing but the Example to explanation"
Line 2:""
Line 3:It results wrong, testing example
whereas the CSV file is generated correctly as "1,This is nothing but the Example to explanationIt results wrong, testing example"
code :
Dim csvFileReader As New StreamReader("my csv file Path")
While Not csvFileReader.EndOfStream()
Dim _line = csvFileReader.ReadLine()
End While
Why is this happening, and how can I resolve it?
When a cell in an excel spreadsheet contains multiple lines, and it is saved to a CSV file, excel separates the lines in the cell with a line-feed character (ASCII value 0x0A). Each row in the spreadsheet is separated with the typical carriage-return/line-feed pair (0x0D 0x0A). When you open the CSV file in notepad, it does not show the lone LF character at all, so it looks like it all runs together on one line. So, in the CSV file, even though notepad doesn't show it, it actually looks like this:
' 1,"This is nothing but the Example to explanation{LF}It results wrong",testing example{CR}{LF}
According to the MSDN documentation on the StreamReader.Readline method:
A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n").
Therefore, when you call ReadLine, it will stop reading at the end of the first line in a multi-line cell. To avoid this, you would need to use a different "read" method and then split on CR/LF pairs rather than on either individually.
However, this isn't the only issue you will run into with reading CSV files. For instance, you also need to properly handle the way quotation characters in a cell are escaped in CSV. In such cases, unless it's really necessary to implement it in your own way, it's better to use an existing library to read the file. In this case, Microsoft provides a class in the .NET framework that properly handles reading CSV files (including ones with multi-line cells). The name of the class is TextFieldParser and it's in the Microsoft.VisualBasic.FileIO namespace. Here's the link to a page in the MSDN that explains how to use it to read a CSV file:
http://msdn.microsoft.com/en-us/library/cakac7e6
Here's an example:
' Requires Imports Microsoft.VisualBasic.FileIO
Using reader As New TextFieldParser("my csv file Path")
    reader.TextFieldType = FieldType.Delimited
    reader.SetDelimiters(",")
    While Not reader.EndOfData
        Try
            Dim fields() As String = reader.ReadFields()
            ' Process fields in this row ...
        Catch ex As MalformedLineException
            ' Handle exception ...
        End Try
    End While
End Using
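The same behaviour can be reproduced in a few lines of Python, which may make the explanation above easier to verify. This is a sketch using the sample row from the answer, with the {LF} and {CR}{LF} placeholders replaced by real characters; it contrasts naive line splitting with a CSV-aware reader.

```python
import csv
import io

# One spreadsheet row whose second cell contains an embedded line feed.
raw = ('1,"This is nothing but the Example to explanation\n'
       'It results wrong",testing example\r\n')

# Splitting on any line ending breaks the quoted cell in two.
naive_lines = raw.splitlines()

# A CSV parser keeps reading until the closing quote, so the row stays whole.
rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_lines), "lines from naive splitting")
print(len(rows), "row from the CSV parser:", rows[0])
```

The parser returns a single row of three fields, with the line feed preserved inside the second field, which is the behaviour TextFieldParser provides on the .NET side.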