Index was out of range exception when attempting to access Worksheets object - epplus

I have an .XLSX created by a 3rd party. I can read it in Excel just fine, however if I try to read it with EPPlus, I get the following exception:
Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: index
at System.ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument argument, ExceptionResource resource)
at OfficeOpenXml.Style.ExcelStyle..ctor(ExcelStyles styles, ChangedEventHandler ChangedEvent, Int32 positionID, String Address, Int32 xfsId)
at OfficeOpenXml.Style.XmlAccess.ExcelNamedStyleXml..ctor(XmlNamespaceManager NameSpaceManager, XmlNode topNode, ExcelStyles styles)
at OfficeOpenXml.ExcelStyles.LoadFromDocument()
at OfficeOpenXml.ExcelStyles..ctor(XmlNamespaceManager NameSpaceManager, XmlDocument xml, ExcelWorkbook wb)
at OfficeOpenXml.ExcelWorkbook.get_Styles()
at OfficeOpenXml.ExcelWorkbook.get_MaxFontWidth()
at OfficeOpenXml.ExcelWorksheet.get_DefaultColWidth()
at OfficeOpenXml.ExcelWorksheet.LoadColumns(XmlReader xr)
at OfficeOpenXml.ExcelWorksheet.CreateXml()
at OfficeOpenXml.ExcelWorksheet..ctor(XmlNamespaceManager ns, ExcelPackage excelPackage, String relID, Uri uriWorksheet, String sheetName, Int32 sheetID, Int32 positionID, eWorkSheetHidden hide)
at OfficeOpenXml.ExcelWorksheets..ctor(ExcelPackage pck, XmlNamespaceManager nsm, XmlNode topNode)
at OfficeOpenXml.ExcelWorkbook.get_Worksheets()
If I open the file in Excel and then save it as a new file, EPPlus can read the file just fine. Obviously, this is not a solution that is reasonable for the end users of my system.
I can send the file to the developers if they need, or I can provide more diagnostic information if needed and available.
I'm going to check in with the developers of the other app to try to figure out what API they are using to generate the .XLSX.
code:
using (var package = new ExcelPackage(importEquipmentFile.InputStream))
{
foreach (var ws in package.Workbook.Worksheets.Where(ws => ws.Hidden == eWorkSheetHidden.Visible))
...
and this simplified blows up too:
using (var package = new ExcelPackage(importEquipmentFile.InputStream))
{
foreach (var ws in package.Workbook.Worksheets)
...
Here's the problem. In the code below, positionID = -1 and the _styles.CellStyleXfs list is empty, so the "else" branch runs:
internal ExcelStyle(ExcelStyles styles, OfficeOpenXml.XmlHelper.ChangedEventHandler ChangedEvent, int positionID, string Address, int xfsId) :
base(styles, ChangedEvent, positionID, Address)
{
Index = xfsId;
ExcelXfs xfs;
if (positionID > -1)
{
xfs = _styles.CellXfs[xfsId];
}
else
{
xfs = _styles.CellStyleXfs[xfsId];
}
and that throws an exception because the code that's called can't handle an empty list.
public T this[int PositionID]
{
get
{
return _list[PositionID]; // <<<--- this blows up because _list has no members
}
}
I tweaked the code to be this, and it worked OK, but I'm not sure of the best way to deal with that list being empty:
if (positionID > -1 || _styles.CellStyleXfs.Count == 0)

Multiple synonym matching is not working the way I intend or expect, what am I doing wrong?

I am indexing technical documentation and incorporating synonyms at index time, so that users can search with a number of alternative patterns. But only some synonyms seem to be getting into the map.
I have a text file synonyms.list which contains a series of lines like so:
note,notes,notice
subtree,sub-tree,sub tree
My analyzer and synonym map builder (I've removed try and catch wrappers to save space, but they aren't the problem):
public class TechAnalyzer extends Analyzer {
@Override
protected TokenStreamComponents createComponents(String fieldName) {
WhitespaceTokenizer src = new WhitespaceTokenizer();
TokenStream result = new TechTokenFilter(new LowerCaseFilter(src));
result = new SynonymGraphFilter(result, getSynonyms(getSynonymsList()), Boolean.TRUE);
result = new FlattenGraphFilter(result);
return new TokenStreamComponents(src, result);
}
private static SynonymMap getSynonyms(String synlist) {
boolean dedup = Boolean.TRUE;
SynonymMap synMap = null;
SynonymMap.Builder builder = new SynonymMap.Builder(dedup);
int cnt = 0;
BufferedReader br = new BufferedReader(new FileReader(synlist));
String line;
while ((line = br.readLine()) != null) {
processLine(builder,line);
cnt++;
}
br.close();
if (cnt > 0) {
synMap = builder.build();
}
return synMap;
}
private static void processLine(SynonymMap.Builder builder, String line) {
boolean keepOrig = Boolean.TRUE;
String terms[] = line.split(",");
if (terms.length > 1) {
String word = terms[0];
String[] synonymsOfWord = Arrays.copyOfRange(terms, 1, terms.length);
for (String syn : synonymsOfWord) {
addPair(builder, word, syn, keepOrig);
}
}
}
private static void addPair(SynonymMap.Builder builder, String word, String syn, boolean keepOrig) {
CharsRef synp = SynonymMap.Builder.join(syn.split("\\s+"), new CharsRefBuilder());
CharsRef wordp = new CharsRef(word);
builder.add(wordp, synp, keepOrig);
// builder.add(synp, wordp, keepOrig); // ? do I need this??
}
I'm not splitting word in addPair() because (at the moment, anyway) the first term in every line of synonyms.list must be a word not a phrase.
My first question relates to that comment at the bottom of addPair(): if I am adding (word,synonym) to the map, do I also need to add (synonym,word)? Or is the map commutative? I can't tell, because of the problem I'm having which is the basis of the next question.
So... the technical documentation being indexed contains some documents which refer to "release notes", and some which refer to "release notices". There are also points described as a "release note". So I would like a search for any of "release note", "release notes", or "release notice" to match all three alternatives.
My code doesn't seem to enable this. If I index a single file which refers to "release notes" I can inspect the generated index with luke and I can see that the index only ever contains one synonym, not two. The same position in the index might have "note" and "notes", or "notes" and "notice", depending on the order of the words in the synonyms.list text file, but it will never have "note", "notes" and "notice".
Obviously I'm not building the map correctly, but the documentation hasn't helped me see what I am doing wrong.
If you've read this far, and can see the flaw in my code, please help me see it too!
Thanks, etc.
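On the first question: as far as I know, SynonymMap.Builder.add(input, output, includeOrig) records a one-way rule from input to output, so the map is not commutative. With the loop in processLine(), only the first term on each line is ever an input, so a document token such as "notes" has no rule of its own. One thing worth trying is to expand each line into every ordered pair. Below is a minimal sketch of that expansion in plain Java with no Lucene dependency. SynonymPairs and expand are made-up names for illustration, and each pair would still have to be passed to builder.add(...), with multi-word terms going through SynonymMap.Builder.join as in addPair().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
class SynonymPairs {
    // Expands "a,b,c" into every ordered (input, output) pair with input != output.
    static List<String[]> expand(String line) {
        List<String> terms = new ArrayList<>();
        for (String t : line.split(",")) {
            String trimmed = t.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed);
            }
        }
        List<String[]> pairs = new ArrayList<>();
        for (String input : terms) {
            for (String output : terms) {
                if (!input.equals(output)) {
                    pairs.add(new String[] { input, output });
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Lists the six directed rules for one line of synonyms.list.
        for (String[] p : expand("note,notes,notice")) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

With all six rules in the map, whichever of the three forms appears in a document should get the other two stacked at the same position. Whether that fully explains what luke shows is something I have not verified against your index.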

How can I log something in USQL UDO?

I have a custom extractor, and I'm trying to log some messages from it.
I've tried obvious things like Console.WriteLine, but cannot find where the output goes. However, I found some system logs in adl://<my_DLS>.azuredatalakestore.net/system/jobservice/jobs/Usql/.../<my_job_id>/.
How can I log something? Is it possible to specify a log file somewhere on Data Lake Store or a Blob Storage Account?
A recent release of U-SQL has added diagnostic logging for UDOs. See the release notes here.
// Enable the diagnostics preview feature
SET @@FeaturePreviews = "DIAGNOSTICS:ON";
// Extract as one column
@input =
EXTRACT col string
FROM "/input/input42.txt"
USING new Utilities.MyExtractor();
@output =
SELECT *
FROM @input;
// Output the file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
This was my diagnostic line from the UDO:
Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));
This is the whole UDO:
using System.Collections.Generic;
using System.IO;
using System.Text;
using Microsoft.Analytics.Interfaces;
namespace Utilities
{
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class MyExtractor : IExtractor
{
//Contains the row
private readonly Encoding _encoding;
private readonly byte[] _row_delim;
private readonly char _col_delim;
public MyExtractor()
{
_encoding = Encoding.UTF8;
_row_delim = _encoding.GetBytes("\n\n");
_col_delim = '|';
}
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
string s = string.Empty;
string x = string.Empty;
int i = 0;
foreach (var current in input.Split(_row_delim))
{
using (System.IO.StreamReader streamReader = new StreamReader(current, this._encoding))
{
while ((s = streamReader.ReadLine()) != null)
{
//Strip any line feeds
//s = s.Replace("/n", "");
// Concatenate the lines
x += s;
i += 1;
}
Microsoft.Analytics.Diagnostics.DiagnosticStream.WriteLine(System.String.Format("Concatenations done: {0}", i));
//Create the output
output.Set<string>(0, x);
yield return output.AsReadOnly();
// Reset
x = string.Empty;
}
}
}
}
}
And these were my results found in the following directory:
/system/jobservice/jobs/Usql/2017/10/20.../diagnosticstreams
Good question. I have been asking myself the same thing. This is theoretical, but I think it would work (I'll update this if I find otherwise).
One very hacky way is to insert rows into a table with your log messages as a string column. Then you can select those out and filter on some log_producer_id column. You also keep the log when an early part of the script works but a later part fails, assuming the failure does not roll the inserts back. The table can also be dumped to a file at the end.
For the error cases, you can use the Job Manager in ADLA to open the job graph and then view the job output. The errors often have detailed information for data-related errors (e.g. the row number in the file with the error and an octal/hex/ASCII dump of the row, with the issue marked with ###).
Hope this helps,
J
ps. This isn't a comment or an answer really, since I don't have working code. Please provide feedback if the above ideas are wrong.

How to save Excel Table as a Picture using vb.net?

I'm trying to save tables from excel sheets as pictures. Is there a way to just put that table on the clipboard and save it? This is what I've got so far but the library referenced is not there?
Thank you in advance!
-Rueben Ramirez
Public Sub extract_excelTable(ByRef data_file As String, ByRef app1 As excel.Application, ByRef sheet_name As String)
'defining new app to prevent out of scope open applications
Dim temp_app As excel.Application = app1
Dim workbook As excel.Workbook = temp_app.Workbooks.Open(Path.GetFullPath(data_file))
temp_app.Visible = False
For Each temp_table As excel.DataTable In workbook.Worksheets(sheet_name)
temp_table.Select()
'temp_app.Selection.CopyAsPicture?
Next
End Sub
I'm not going to write any code here, but I will outline a solution for you that will work. Note that this will not reproduce the formatting of the excel document, just simply get the data from it, and put it on an image in the same column/row order as the excel file.
STEP 1:
My solution to this problem would be to read the data from the excel file using an OLEDB connection as outlined in the second example of this post: Reading values from an Excel File
Alternatively, you may need to open the document in excel and re-save it as a CSV if it's too large to fit in your computer's memory. I have some code that reads a CSV into a string list in C# that may help you:
static void Main(string[] args)
{
string Path = "C:/File.csv";
System.IO.StreamReader reader = new System.IO.StreamReader(Path);
//Ignore the header line
reader.ReadLine();
string[] vals;
while (!reader.EndOfStream)
{
string ReadText = reader.ReadLine();
vals = SplitLine(ReadText);
//Do some work here
}
}
private static string[] SplitLine(string Line)
{
string[] vals = new string[42];
string Temp = Line;
for (int i = 0; i < 42; i++)
{
if (Temp.Contains(","))
{
if (Temp.Substring(0, Temp.IndexOf(",")).Contains("\""))
{
vals[i] = Temp.Substring(1, Temp.IndexOf("\",", 1) - 1);
Temp = Temp.Substring(Temp.IndexOf("\",", 1) + 2);
}
else {
vals[i] = Temp.Substring(0, Temp.IndexOf(","));
Temp = Temp.Substring(Temp.IndexOf(",") + 1);
}
}
else
{
vals[i] = Temp.Trim();
}
}
return vals;
}
STEP 2:
Create a bitmap object to create an image, then use a for loop to draw all of the data from the excel document onto the image. This post had an example of using the drawstring method to do so: how do i add text to image in c# or vb.net

Partial replace on SQL image data column

This question is related to another one I posted earlier.
To recap, I need to fix an issue with an ancient legacy app where people messed up data storage by re-installing the software the wrong way.
The application stores data by saving a record in an SQL DB. Each record holds a reference to a file on disk of which the filename auto-increments.
Re-installing the app reset the filename auto-increment, so the DB now holds multiple unrelated records that reference the same filename, and I have two directories of files which I obviously cannot merge because of these identical filenames. The files hold no reference to the DB data, so the only course of action that remains is to filter the DB records on date created and try to rename "EXED" to "IXED" or something like that.
The DB is relatively simple with one table containing a column that holds data of type "Image".
An example content of this image data is as follows:
0x3200001000000000000000200B0000000EFF00000300000031340000000070EC0100002C50000004000000C90000005D010000040000007955B63F4D01000004000000F879883E4F01000004000000BC95563E98010000040000009A99993F4A01000004000000000000004B01000004000000000000009101000004000000000000004E01000004000000721C83425101000004000000D841493F5E01000004000000898828414101000004000000F2D2BD3F4201000004000000FCA9B13F40010000040000007574204244010000040000000000204345010000040000007DD950414601000004000000000000004701000004000000000000009201000004000000000000008701000004000000D2DF13426A0100000400000000005C42740100000400000046B68F40500100000400000018E97A3F7901000004000000FB50CF3C7A01000004000000E645703F99010000040000000000E0404C010000040000008716593F8601000004000000000006439A0100000400000000008040700100000400000063D887449E01000004000000493CBA3E9C0100000400000069699D429B01000004000000DD60CA3F9D0100000400000035DE3C44B4010000040000008B5C744433000000040000003D0ABB4134000000040000000AFF7C44350000000400000093CB3942750400000400000054A69F41BA010000040000002635C64173040000040000008367C24100000080690100002B5000003101000032000010000000000000002009000000000000000100000000000000F00000000000000080080100000100000010000000540100000100000021F0AA42270000000200000010000000540100000200000021F0AA42280000000300000010000000540100000300000059C9E6432900000004000000100000005401000004000000637888442A00000005000000100000005401000005000000DFEF87442B00000006000000100000005401000006000000000000002C00000007000000100000005401000007000000000000002D00000008000000100000005401000008000000000000002D000000090000001000000054010000090000002F353D442D0000000A00000010000000540100000A00000035DE3C44340000000B00000010000000540100000B0000008B5C7444240000009D50000010000000CDCCCC3E2C513B41F65D5F3F2C51BB419E50000010000000CCBA2C3FE17C8C411553B13F83F32142000000403700000000FE0000090000004558454434386262002D50000008000000447973706E6F65008E5000000E00000056454C442052414D502033363000000000F000000000
The data is apparently Hex which mostly encodes meaningless crap but also holds the name of physical files (towards the end of the data field) in the filesystem that is linked to the SQL records:
??#7???????????EXED48bb?-P??????Dyspnoe??P??????VELD RAMP 360
I'm interested in the EXED part.
There is no clear regularity in the offset at which the filename appears and the filename is of variable length (so I do not know beforehand how long the substring will be).
I can call up all records with SQL like this:
SELECT COUNT(*) as "Number of EXED Files after critical date"
FROM [ZAN].[dbo].[zanu]
WHERE udata is not null
and SUBSTRING(udata, 1 , 2147483647) like '%EXED%'
and [udatum] > 0
and CONVERT(date,[udatum]) > CONVERT(date,'20100629')
What I would like to know now is how to replace this EXED substring with something else (e.g. IXED).
I'm unfamiliar with SQL and Googling so far has yielded very little information on my options here.
I also have no other info on the original code that generated this data/the data format/encoding/whatever...
It's a mess really.
Any help is welcome!
An update on this:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Data.Linq;
using System.Text;
using System.Data.SqlClient;
using System.Threading;
namespace ZANLinq
{
class Program
{
static void Main(string[] args)
{
try
{
DataContext zanDB = new DataContext(@"Data Source=.\,1433;database=ZAN;Integrated Security=true");
string strSQL = @"SELECT
Idnr,
Udatum,
Uzeit,
Unr,
Uart,
Ubediener,
Uzugriff,
Ugr,
Uflags,
Usize,
Udata
FROM Zanu
WHERE (Udata IS NOT null and SubString(Udata, 1 , 2147483647) LIKE '%EXED%')
AND (Idnr = ' 2')";
var zanQuery = zanDB.ExecuteQuery<Zanu>(strSQL);
List<Zanu> list = zanQuery.ToList<Zanu>();
foreach (Zanu zanTofix in list)
{
string strOriginal = ASCIIEncoding.ASCII.GetString(zanTofix.Udata);
string strFixed = strOriginal.Replace("EXED", "IXED");
zanTofix.Udata = ASCIIEncoding.ASCII.GetBytes(strFixed);
}
zanDB.SubmitChanges();
//Console.WriteLine(zanResults.Count<Zanu>().ToString());
}
catch (SqlException e)
{
Console.WriteLine(e.Message);
}
}
}
}
It finds the records I'm interested in and I can easily manipulate the data, but the commit doesn't work. I'm stumped: there are no exceptions and no indication that the code is wrong.
Anybody have ideas?
UPDATE:
I think the above does not work because my table appears to have a composite PK (I cannot change this).
Since I could not debug this (no info anywhere, no exceptions, just a silent failure of SubmitChanges()), I decided to use another approach and abandon Linq2SQL altogether:
try
{
SqlConnection thisConnection = new SqlConnection(@"Network Library=DBMSSOCN;Data Source=.\,1433;database=ZAN;Integrated Security=SSPI");
DataSet zanDataSet = new DataSet();
SqlDataAdapter zanDa;
SqlCommandBuilder zanCmdBuilder;
thisConnection.Open();
//Initialize the SqlDataAdapter object by specifying a Select command
//that retrieves data from the sample table.
zanDa = new SqlDataAdapter(@"SELECT
Idnr,
Udatum,
Uzeit,
Unr,
Uart,
Ubediener,
Uzugriff,
Ugr,
Uflags,
Usize,
Udata
FROM Zanu
WHERE (Udata IS NOT null and SubString(Udata, 1 , 2147483647) LIKE '%IXED%')
AND (Idnr = ' 2')
AND (Uzeit = '13:21')", thisConnection);
//Initialize the SqlCommandBuilder object to automatically generate and initialize
//the UpdateCommand, InsertCommand, and DeleteCommand properties of the SqlDataAdapter.
zanCmdBuilder = new SqlCommandBuilder(zanDa);
//Populate the DataSet by running the Fill method of the SqlDataAdapter.
zanDa.Fill(zanDataSet, "Zanu");
Console.WriteLine("Records that will be affected: " + zanDataSet.Tables["Zanu"].Rows.Count.ToString());
foreach (DataRow record in zanDataSet.Tables["Zanu"].Rows)
{
string strOriginal = ASCIIEncoding.ASCII.GetString((byte[])record["Udata"]);
string strFixed = strOriginal.Replace("IXED", "EXED");
record["Udata"] = ASCIIEncoding.ASCII.GetBytes(strFixed);
//string strPostMod = ASCIIEncoding.ASCII.GetString((byte[])record["Udata"]);
}
zanDa.Update(zanDataSet, "Zanu");
thisConnection.Close();
Console.ReadLine();
}
catch (SqlException e)
{
Console.WriteLine(e.Message);
}
This seems to work but any input on why the Linq does not work and whether or not my second solution is efficient/optimal or not is still very much appreciated.
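One caveat that applies to both versions: Udata is binary, and as far as I know Encoding.ASCII in .NET decodes every byte above 0x7F to '?'. If so, the GetString/GetBytes round trip can overwrite bytes such as 0xFF or 0xEC elsewhere in the blob with 0x3F. Patching the bytes directly avoids that. Below is a minimal sketch of the idea, written in Java here (the loop should port almost line for line to C#). ByteReplace and replaceSameLength are made-up names, and the sketch assumes the replacement has the same length as the original so that any offsets or length fields stored in the blob stay valid.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class ByteReplace {
    // Returns a copy of data with every occurrence of 'from' overwritten by 'to'.
    // Requires from.length == to.length so that no other byte changes position.
    static byte[] replaceSameLength(byte[] data, byte[] from, byte[] to) {
        if (from.length == 0 || from.length != to.length) {
            throw new IllegalArgumentException("patterns must be non-empty and of equal length");
        }
        byte[] out = data.clone();
        int i = 0;
        while (i <= data.length - from.length) {
            if (matchesAt(data, i, from)) {
                System.arraycopy(to, 0, out, i, to.length);
                i += from.length; // continue after the match
            } else {
                i++;
            }
        }
        return out;
    }

    // True if 'pattern' occurs in 'data' starting at 'offset'.
    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A few bytes shaped like the tail of the sample blob, including non-ASCII values.
        byte[] blob = { (byte) 0xFE, 0x00, 'E', 'X', 'E', 'D', '4', '8', (byte) 0xEC };
        byte[] fixed = replaceSameLength(blob,
                "EXED".getBytes(StandardCharsets.US_ASCII),
                "IXED".getBytes(StandardCharsets.US_ASCII));
        System.out.println(fixed.length == blob.length);
    }
}
```

In the DataRow loop, the equivalent would be to run the byte-level replace on (byte[])record["Udata"] and assign the result back, without ever converting the blob to a string.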

Get value from SPFieldUser with AllowMultipleValues fails only in a Timer Job

This one is weird.
I'm executing this code in a Timer Job in SharePoint 2010 ...
...
// Get the field by its internal name
SPField field = item.Fields.GetFieldByInternalName(fieldInternalName);
if (field != null)
{
SPFieldUser userField = (SPFieldUser)field;
object value = null;
if (userField.AllowMultipleValues)
{
// Bug when getting field value in a timer job? Throws an ArgumentException
users = new SPFieldUserValueCollection(item.ParentList.ParentWeb, item[userField.Id].ToString());
}
else
{
// Get the value from the field, no exception
value = item[userField.Id];
}
}
...
This code works perfectly when run in a simple ConsoleApplication but when run in the context of a Timer Job in SharePoint 2010 it throws an ArgumentException in the line ...
users = new SPFieldUserValueCollection(item.ParentList.ParentWeb, item[userField.Id].ToString());
I've tried many variations to retrieve a value from an SPFieldUser, but all of them fail only when a Timer Job is executing the code and the field has its AllowMultipleValues property set to TRUE.
I have tried debugging with Reflector and it seems that the exception is being thrown here in SPListItem ...
public object this[Guid fieldId]
{
get
{
SPField fld = this.Fields[fieldId];
if (fld == null)
{
throw new ArgumentException();
}
return this.GetValue(fld, -1, false);
}
...
And this here would be the exception stack trace...
System.ArgumentException was caught
Message=Value does not fall within the expected range.
Source=Microsoft.SharePoint
StackTrace:
at Microsoft.SharePoint.SPFieldMap.GetColumnNumber(String strFieldName, Boolean bThrow)
at Microsoft.SharePoint.SPListItemCollection.GetColumnNumber(String groupName, Boolean bThrowException)
at Microsoft.SharePoint.SPListItemCollection.GetRawValue(String fieldname, Int32 iIndex, Boolean bThrow)
at Microsoft.SharePoint.SPListItem.GetValue(SPField fld, Int32 columnNumber, Boolean bRaw, Boolean bThrowException)
at Microsoft.SharePoint.SPListItem.get_Item(Guid fieldId)
at FOCAL.Point.Applications.Audits.AuditUtility.GetPeopleFromField(SPListItem item, String fieldInternalName)
Sighh... any thoughts?
This generally means that you have requested too many lookup fields in a single SPQuery which would cause too many self-joins of the true-lookup-table in the content database unless SharePoint Foundation throttled resources. There is a threshold setting that is at 8 lookups per query for ordinary users. Make sure your query only returns the necessary lookup or person/group fields. If you can't decrease the usage, then consider altering the threshold setting.