Failing to open spreadsheet - openpyxl

I am currently parsing multiple spreadsheets and running into problems opening a few of them. The error points to the file "cell\text.py", at the line in the content property that joins the snippets together.
workBook = openpyxl.load_workbook(filepath, True)
File "C:\Python34\lib\site-packages\openpyxl-2.3.1-py3.4.egg\openpyxl\reader\excel.py", line 191, in load_workbook
shared_strings = read_string_table(archive.read(strings_path))
File "C:\Python34\lib\site-packages\openpyxl-2.3.1-py3.4.egg\openpyxl\reader\strings.py", line 21, in read_string_table
text = Text.from_tree(node).content
File "C:\Python34\lib\site-packages\openpyxl-2.3.1-py3.4.egg\openpyxl\cell\text.py", line 182, in content
return "".join(snippets)
TypeError: sequence item 3: expected str instance, NoneType found
If I alter the code to have a check around the appending of the blocks in the formatted section as follows:
for block in self.formatted:
    if block.t is not None:
        snippets.append(block.t)
With that check it works fine. I was just wondering whether there is some obvious problem with the Excel sheet that I don't understand and that someone could shed light on. I haven't dug into the openpyxl code, so I am not sure what determines the contents of "self.formatted", but my guess is that it is caused by the merged cells in that area of the spreadsheet.
EDIT
After reading your comments I dug a little deeper to see if there is any data I could share with you. I changed the content property ("cell/text.py") to print the data it was trying to join, then searched "sharedStrings.xml" for the corresponding XML. The printed data was:
['“Replaced Data”', None]
After taking a look in the sharedStrings file the xml that contains this data is:
<si>
  <r>
    <t>“Replaced Data”</t>
  </r>
  <r>
    <rPr>
      <sz val="11"/>
      <color rgb="FF008080"/>
      <rFont val="Calibri"/>
      <family val="2"/>
      <scheme val="minor"/>
    </rPr>
    <t/>
  </r>
</si>

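This has nothing to do with merged cells; it is caused by the way the text has been stored. The second rich-text run (<r>) contains an empty <t/> element, which is read as None rather than as an empty string. Could you submit a bug report with a test file?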
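The failure and the workaround from the question can be reproduced without openpyxl. The following is a minimal sketch, not openpyxl's actual code: it parses the <si> fragment from the question with the standard library's xml.etree.ElementTree (namespaces and some formatting children are omitted for brevity, and join_runs is a hypothetical helper name) and shows that an empty <t/> element has text None, which is what "".join() rejects.

```python
import xml.etree.ElementTree as ET

# Shared-string fragment from the question (namespace omitted for brevity).
SI_XML = """
<si>
  <r><t>“Replaced Data”</t></r>
  <r>
    <rPr><sz val="11"/><color rgb="FF008080"/></rPr>
    <t/>
  </r>
</si>
"""


def join_runs(si_xml):
    """Concatenate the text of every rich-text run, skipping empty <t/> nodes."""
    root = ET.fromstring(si_xml)
    snippets = [t.text for t in root.iter("t")]
    # The empty <t/> contributes None; joining it directly would raise
    # TypeError, so it is filtered out first.
    return "".join(s for s in snippets if s is not None)


if __name__ == "__main__":
    print(join_runs(SI_XML))
```

Treating None as an empty string, as the check in the question does, loses nothing here because the empty run carries formatting only.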

Related

trouble with utf-8 with julia and jupyterlab

I'm reading the csv file at https://github.com/VinitaSilaparasetty/julia-beginners/blob/master/data/nba/nba19-20.csv
I get a DataFrame and save it as XLSX. When I try to open the file in JupyterLab, I get an error saying the file is not UTF-8 encoded, and the file is not read.
This is my code:
using HTTP, XLSX, CSV, DataFrames
df = CSV.read(HTTP.get("https://raw.githubusercontent.com/VinitaSilaparasetty/julia-beginners/master/data/nba/nba19-20.csv").body)
# first(df,5) # first shows the top five rows ok
XLSX.writetable("data/nba/nba19-20.XLSX", collect(eachcol(df)), names(df), overwrite = true)
The file is saved in my data folder. When I try to open it with JupyterLab, I get a pop-up saying the file is not UTF-8 encoded, and the file is not opened.
When I open the file in Ubuntu (with LibreOffice), I do not see anything suspicious.
As I'm new to Julia I'm struggling to understand where the problem lies or how to fix it.
I tried to see if I could encode the dataframe in UTF-8 (after saving the file to disk) with
data = DataFrame(CSV.File(open(read,"data/nba/nba19-20.csv", enc"utf-8")))
But I did not see any change. Any suggestion is welcome.
Do you have the jupyterlab-spreadsheet plugin installed? JupyterLab by default doesn't support opening xlsx files (it isn't mentioned in the file formats list here for example).
See also this similar question involving Python pandas (which says pretty much the same thing).
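The UTF-8 message is what one would expect from any viewer that tries to read an .xlsx file as text: the format is a ZIP container, so its bytes are generally not valid UTF-8, whatever the encoding of the cell contents. A minimal sketch of how to check this (looks_like_zip, is_utf8 and the in-memory stand-in archive are assumptions made for illustration; a real workbook contains more parts):

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes start with the ZIP local-file-header signature."""
    return data[:4] == b"PK\x03\x04"


def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def make_stand_in_archive() -> bytes:
    """Build a tiny ZIP in memory as a stand-in for a real .xlsx file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/sharedStrings.xml", "<sst/>")
    return buf.getvalue()


if __name__ == "__main__":
    data = make_stand_in_archive()
    print("ZIP container:", looks_like_zip(data))
    print("valid UTF-8:", is_utf8(data))
```

Running the same two checks on the bytes of the saved workbook would show whether the file itself is fine and only the viewer is the problem.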

AHK: How to store a selected file in a variable to execute a script using the FileSelectFile function after selecting it in Windows File Explorer?

Apologies, I am quite new to AHK.
Context: I am trying to build a small program (eventually with a UI) that cleans data in .XLF files so that they can be imported and processed properly by a CAT tool.
By "clean" I mean finding HTML attributes and replacing them with their respective character entities. As a single script, with the file name or path written inside the script, this works perfectly.
Problem: I would like to run my .ahk/.exe so that the user can open the file manager/Explorer and select the file to be processed by the script (find/replace HTML attributes with character entities). Selecting the file is not working: nothing is populated, and the final file is empty. I am trying to solve this with the FileSelectFile function by storing its output variable (the selected file) and using it in the first instruction, "fileread, selectedfile".
But it's not working! If I skip this and just provide a specific .xlf file name in the default directory "A_ScriptDir", it works fine.
This is my code so far, with comments:
SetWorkingDir, %A_ScriptDir%
FileEncoding, UTF-8
;NoEnv
;Open Window File Manager/Explorer and select a file .xlf file
FileSelectFile, SelectedFile, 8, , Open a file, , ,(*.xlf)
;--- > HTML attribute '&' must be replaced by its char entity first/overall otherwise this instruction will overwrite the amp entities from the rest of char entities corrupting the file;
;"SelectedFile" can be any filename such as "example.xlf" but this is not my scope
fileread, text, SelectedFile ;previously "text.xlf" with random html content to do tests
replace := "&amp;"
newtext := strreplace(text, "&", replace, all)
sleep, 200
filedelete, newtest.xlf
fileappend, %newtext%, newtest.xlf
;--------------------------------- <b>
;here "fileread" must read the final appended file so that "strreplace" can be used multiple times (to replace more than one string) in a single run of the script (I lack experience with loops)
fileread, text, newtest.xlf
replace := "&lt;b&gt;"
newtext := strreplace(text, "<b>", replace, all)
filedelete, newtest.xlf
fileappend, %newtext%, newtest.xlf
I've been thinking that another solution could be drag and drop. I am still new to drag-and-drop GUIs, though, and I am unsure how to modify my code so that a file can be dragged and dropped onto the ahk.exe.
Thanks for taking the time to read this! Any tip or help would be much appreciated :)
The FileRead command expects the file name as literal text, not as the name of a variable.
So if you wrap SelectedFile in percent signs like this, it should work:
FileRead, text, %SelectedFile%
If that doesn't work, it means the file does not exist or an error occurred. In that case, you'll want to look at the error handling section of the FileRead documentation.
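The ordering concern raised in the question's comments (replace '&' before anything else) holds in any language. Here is a small Python sketch of the same idea, purely illustrative and not a translation of the AHK script; escape_markup is a hypothetical name:

```python
def escape_markup(text: str) -> str:
    """Replace &, < and > with their character entities, ampersand first."""
    # '&' must go first; otherwise the '&' introduced by '&lt;' and '&gt;'
    # would itself be escaped, producing '&amp;lt;' and '&amp;gt;'.
    for raw, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;")):
        text = text.replace(raw, entity)
    return text


if __name__ == "__main__":
    print(escape_markup("<b>A & B</b>"))
```

Chaining the replacements on one in-memory string also avoids writing and re-reading an intermediate file between steps.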

importing training data to CloudML with images that do not have a file-extension

I created some training data and put the CSV in Google Cloud Storage, but it looks like the import won't work when the files do not have a proper .jpg extension:
Error: INVALID_ROW: Invalid input found at row 1 of gs://weg-li-production/training/test.csv: "Unsupported file extension."
The values look like this:
TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel
Is there a way to work around this issue?
It seems you put the whole "TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel" string into a single field of your CSV file. Each comma should separate one field from the next. You can open the CSV file in Excel to check it; the correct format should show three columns.
Assuming gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj is the image file and Opel is the label, it all looks fine, except that the image file name does not have a valid extension.
See https://cloud.google.com/vision/automl/docs/prepare for the valid file types (extensions) for training and prediction.
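Both suggestions above can be checked locally before uploading. The sketch below is only an illustration under stated assumptions: it expects the three-column layout shown in the question (set, image URI, label), check_rows is a hypothetical helper, and the ALLOWED_EXTENSIONS set is my assumption about what is accepted, so the linked documentation remains the authoritative list.

```python
import csv
import io
import os

# Assumption: extensions taken to be accepted for training images.
# Verify against the linked documentation before relying on this set.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".ico"}


def check_rows(csv_text):
    """Yield (row_number, problem) for rows that look unimportable."""
    for number, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) < 3:
            # A whole line quoted as one field ends up here.
            yield number, "expected at least 3 columns, got %d" % len(row)
            continue
        uri = row[1].strip()
        extension = os.path.splitext(uri)[1].lower()
        if extension not in ALLOWED_EXTENSIONS:
            yield number, "unsupported file extension %r" % extension


if __name__ == "__main__":
    row = "TRAIN,gs://weg-li-production/d7nwcheo8774rvbcgj4lyta3athj,Opel\n"
    for number, problem in check_rows(row):
        print("row %d: %s" % (number, problem))
```

If the objects in the bucket cannot be renamed, copying them to names that end in a supported extension is one possible workaround, though that is outside what this check covers.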

Using a list in a text file as a list in Latex \foreach loop

Sorry if I didn't tag/classify this well. Here is my problem.
I have a list generated by another program, which is written to a text file. When I simply want to include this text in my LaTeX document, I use \input{writing.txt}, and this works fine.
The issue I'm having is figuring out a way to use a text file that contains a list like
picname1 picname2 picname3 ... and have LaTeX recognize it as my list when it is included with \input{list.txt}.
Below is a broken example, which works when I replace \input{list.txt} with the actual list. My attempts so far have gotten me nowhere, so any help would be greatly appreciated.
\documentclass{article}
\usepackage{booktabs}
\usepackage{tabularx}
\usepackage{python}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage[latin1]{inputenc}
\usepackage[top=2cm]{geometry}
\begin{document}
\foreach \picname in {\input{list.txt}}
{
\section{\MakeUppercase\picname}
\begin{figure}[h]
\centering
\includegraphics[width=10cm]{\picname_Main.pdf}
\caption{This depicts the trend in \picname}
\end{figure}
\input{"filepathforsometext"\picname.txt}
\input{"filepathforatable"\picname_GrowthTable}
\input{"filepathforanothertable"\picname_ActualPop}
\clearpage
}
\end{document}

Generated corrupt large ply file - how to find the error

I just wrote a Java class that generates meshes from a list of cylinders and stores them in a PLY file. I tested it with a hand-written list of 3 cylinders; the resulting file opens in both MeshLab and CloudCompare.
When I use the class in my real program, I have to write a mesh for more than 13000 cylinders.
CloudCompare gives me the following error: Reading error (no access right?)
MeshLab gives this one: error details, unexpected eof
I already checked that my PLY file contains exactly the number of vertices and faces defined in the header. I also made sure that it contains no NaN values (I searched for 'n', 'a', etc. in winedit).
I can reproduce the errors with the test file for the 3 hand-made cylinders by deleting its last line. But as mentioned earlier, I already checked that the line counts are correct (there might be an empty line that my eyes did not catch, though, as scrolling through half a million lines is impossible).
So are there any programs that can parse a PLY file for errors? Open-source tools would be appreciated. Or are the files just too large? 436302 lines, to be exact. I use the ASCII version of PLY.
I found a non-open-source tool called nugraf, which reports the line numbers of the corrupted lines.
Java seems to print NaN as '?'. I had not checked for this character, so the problem seems to be solved and I can go back to debugging my Java software.
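For anyone without access to such a tool, a few lines of script can find the same kind of problem. The following is a rough sketch rather than a full PLY parser: check_ascii_ply is a hypothetical helper, it assumes an ASCII file in which every data token is numeric, and it only checks the line count against the header and flags empty lines and tokens that are not finite numbers (such as the '?' mentioned above).

```python
import math


def check_ascii_ply(lines):
    """Return a list of (line_number, message) problems in an ASCII PLY file."""
    problems = []
    counts = []        # element counts in header order, e.g. vertices, faces
    header_end = None  # 0-based index of the end_header line
    for index, line in enumerate(lines):
        tokens = line.split()
        if tokens[:1] == ["element"] and len(tokens) == 3:
            counts.append(int(tokens[2]))
        if tokens == ["end_header"]:
            header_end = index
            break
    if header_end is None:
        return [(len(lines), "missing end_header")]

    body = lines[header_end + 1:]
    expected = sum(counts)
    if len(body) < expected:
        problems.append(
            (len(lines), "expected %d data lines, found %d" % (expected, len(body)))
        )
    for offset, line in enumerate(body[:expected]):
        line_number = header_end + 2 + offset  # 1-based line number in the file
        tokens = line.split()
        if not tokens:
            problems.append((line_number, "empty line"))
        for token in tokens:
            try:
                value = float(token)
            except ValueError:
                problems.append((line_number, "non-numeric token %r" % token))
                continue
            if not math.isfinite(value):
                problems.append((line_number, "non-finite value %r" % token))
    return problems
```

A file could be checked with something like check_ascii_ply(open(path).read().splitlines()), where path is a placeholder for the generated file; half a million lines is a small input for this kind of scan.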