Searching Multiple Results using StreamReader

Does anyone know of a way to search a text file for a string using StreamReader that accounts for multiple matches? I am creating a booking application, and each time a customer books a seat, their PrimaryKey, FirstName, LastName and the co-ordinates of the seat on a data grid (which I use to book seats) are generated and saved to a text file.
I want to read every line on which a given PrimaryKey appears, get the seat co-ordinates from each of those lines, and repopulate another, similar DataGridView with them. All of this will be driven by a ComboBox index change.
It may be a bit complicated to follow, but if anyone can help then please let me know.
I just need to know how to search for multiple instances: after the string has been found once, keep looking through the rest of the file for further instances. I can do the rest myself.
I'm coding in Visual Basic .NET.

Yes, it's possible to search through a file multiple times, but you'd have to either reopen the file or rewind the stream (FileStream.Seek).
That is wildly inefficient, though.
If it has to remain an unsorted and unstructured file, build an in-memory index to it.
If your key is an integer, create a Dictionary<int,int> of key and position in the stream.
Then when you want to find key X, you use FileStream.Seek to move to its position and read a line to get the data. If you find yourself grouping by, say, aeroplaneID, build a Dictionary<int, List<KeyValuePair<int,int>>>
where the key is the aeroplane id and the list holds pairs of primary key and position in the file.
You could push all that off to a background thread. You could try to get really clever and build the indexes up as you need them. Personally, though, I'd be trying to move my storage to a more suitable format. You aren't struggling to do this because you've missed a class; you are struggling because you shouldn't be doing it this way.
Something like
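Dictionary<int, long> _fileIndex = new Dictionary<int, long>(); // long: stream positions are 64-bit
using (FileStream fs = new FileStream(DataFileName, FileMode.Open, FileAccess.Read))
using (StreamReader reader = new StreamReader(fs))
{
    long lastPosition = 0; // byte offset at which the current line starts
    string currentLine = null;
    while ((currentLine = reader.ReadLine()) != null)
    {
        string[] data = currentLine.Split(new char[] {','});
        int key = int.Parse(data[0]);
        _fileIndex[key] = lastPosition; // a repeated key keeps only its last line
        // StreamReader reads ahead, so fs.Position is not the end of this line; count bytes instead.
        // Assumes no byte-order mark and Environment.NewLine line endings, one byte per newline character.
        lastPosition += reader.CurrentEncoding.GetByteCount(currentLine) + Environment.NewLine.Length;
    }
}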
NB: I didn't test the above, and there should be a bit more error checking in it. If there's a lot of data in each line, you might be better off not using Split and just pulling out everything up to the first delimiter. Also be careful how many indexes you keep live; it wouldn't be long before they used up more space than just reading the entire file into memory.
Then you could create a class or structure to represent a line in the file, and write a bit of code
that uses FileStream.Seek to get there. If you wanted to load up a bunch of them, it would make sense to get the list of positions of each one in the file and sort them; then you could rip through the file in its own order, picking them out.
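As a rough illustration of the index idea for the original question (several lines per primary key), here is a minimal sketch in Java rather than .NET; the same shape should carry over to FileStream.Seek. The class name BookingIndex, the method names, and the assumption that each line starts with an integer key followed by a comma are all made up for the example. It uses RandomAccessFile because its readLine is unbuffered, which keeps the reported file position exact (at the cost of speed).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: indexes a file whose lines look like "key,first,last,row,col".
final class BookingIndex {

    /** Maps each primary key to the byte offset of every line that starts with it. */
    static Map<Integer, List<Long>> build(String path) throws IOException {
        Map<Integer, List<Long>> index = new HashMap<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long lineStart = file.getFilePointer();
            String line;
            // readLine() here is unbuffered, so getFilePointer() is the true start of the next line.
            while ((line = file.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma > 0) { // skip blank or malformed lines
                    int key = Integer.parseInt(line.substring(0, comma).trim());
                    index.computeIfAbsent(key, k -> new ArrayList<>()).add(lineStart);
                }
                lineStart = file.getFilePointer();
            }
        }
        return index;
    }

    /** Seeks to each recorded offset for the key and returns those lines in file order. */
    static List<String> linesFor(String path, Map<Integer, List<Long>> index, int key)
            throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            for (long offset : index.getOrDefault(key, Collections.emptyList())) {
                file.seek(offset);
                result.add(file.readLine());
            }
        }
        return result;
    }
}
```

Because offsets are appended while scanning from top to bottom, each key's list is already sorted, so the lookups move forward through the file as suggested above.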


Import txt file data and store it in Multidimensional array in

Sorry if the problem is so basic, I'm a bit used to python not
I'm trying to read text file data (numbers) and store it in array/list
# Sample of text
# PS: I can control the output of the numbers to have delimiter = ',' or space between numbers, whatever is easier to import
I wrote the following code to read string data and store it. yet, I don't know how to have a multidimensional array (2D or 3D) instead of 1D string (e.g. for the text above, it would be 2x3 array)
' Import Data
Comp_path = FinalPath & "Components_colors.txt"
reader = New StreamReader(Comp_path)
Dim W As String = ""
Dim wArray(10) As String
Dim i As Integer = 0
Do Until reader.Peek = -1
    W = reader.ReadLine()
    wArray(i) = W
    i += 1
Loop
Moreover, I don't know the length of the text file, so I can't determine the length of the array like I did in the code above for the string wArray
For a file like this, you should turn to NuGet for a dedicated CSV parser. There are parsers built into .NET you could also use, but pulling one off of NuGet will also let you parse the values directly into something other than a string.
But if you really don't want to do that you can start with this (assuming Option Infer):
Public Function ImportData(filePath As String) As IEnumerable(Of Double())
    Dim lines = File.ReadLines(filePath)
    Return lines.Select(Function(line) line.Split(","c).Select(AddressOf Double.Parse).ToArray())
End Function
And use it like this:
Comp_path = FinalPath & "Components_colors.txt"
Dim result = ImportData(Comp_path)
Note that this code doesn't actually do any meaningful work yet. It doesn't even read the file. What it does is give you an object (result) that you can use with a For Each loop or LINQ operations. It will read the file in a just-in-time way, parsing out the data for each line as it goes. If you want an array (or a List, which you should use in .NET more often), you can append a ToList() call to the end:
Comp_path = FinalPath & "Components_colors.txt"
Dim result = ImportData(Comp_path).ToList()
But you should try to avoid doing that. It's much less efficient in terms of memory use. The first sample only ever needs to keep one line of the file in memory at a time. Adding ToArray() or ToList() loads the entire file.
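For readers who want to see the same lazy-versus-materialized trade-off outside .NET, here is a minimal Java sketch. NumberTable and its method names are invented for the example, and it assumes one comma-separated row of numbers per line. Files.lines streams the file line by line, and only the final toArray call pulls everything into a rows-by-columns array, so the number of rows does not need to be known in advance.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical helper for a file with one comma-separated row of numbers per line.
final class NumberTable {

    /** Parses one line such as "1.5, 2, 3" into a double[]. */
    static double[] parseLine(String line) {
        return Arrays.stream(line.split(","))
                .map(String::trim)
                .mapToDouble(Double::parseDouble) // always expects '.' as the decimal separator
                .toArray();
    }

    /** Reads the whole file into a 2D array; blank lines are skipped. */
    static double[][] load(String path) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(path))) {
            return lines.filter(l -> !l.trim().isEmpty())
                    .map(NumberTable::parseLine)
                    .toArray(double[][]::new);
        }
    }
}
```

If the rows are only needed one at a time, processing the stream inside the try block instead of calling toArray keeps memory use at roughly one line.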
Some more notes:
Many newer dynamic platforms like Python don't actually use real arrays in the formal computer science sense (a fixed block of contiguous memory). Rather, they use collections, and just call them arrays. .NET has collections, too, but when you declare an array, you get an array. This has nice benefits for performance, but if you don't know you want that or how to take advantage of it, you're probably better off asking for a generic List most of the time instead.
Thanks to cultural/internationalization issues, converting numeric (or date) values to strings and back again is much slower and more error-prone than you may have believed in the past, especially coming from a dynamic platform. It is slow on these other platforms, too, but they want you to pretend it isn't. The first introduction to a strongly-typed platform like .NET can feel stifling in this area, but once you understand the performance and reliability benefits, you won't want to go back.
In strongly-typed platforms it is very important to understand the data types you are working with at every level of an expression. Otherwise, building and reading statements like the Return line in my answer will be way more difficult and frustrating than it needs to be.

Auto incrementing w/ a chart control

I'm curious to know if the DataVisualization.charting.chart in does any auto counting / plotting for my particular issue.
I have a file with thousands of User Agent Strings which were generated over a period of time. The UA Strings are generated from user logins.
In my program, I am identifying approximately 45 different environments as: Operating System + Browser Type (e.g., "Windows 7 + IE10"). Each login also has a date stamp in the format of YYYY-MM.
My task is to do a line chart where I have Environment (Y-axis) vs Date (X-axis) using the charting control. I would like the control to increment each time I have a particular data set rather than keeping a hideous amount of arrays & counter data for my chart.
Does the charting control auto increment in this way? I am not able to find anything so far.
I'm not sure I understand the question (what is it that you want to auto increment? The axis min/max? The date? Something else?), but if you want the axes to update each time you add a new point, the chart certainly supports that. Just call Chart.ResetAutoValues() after you add the new point(s), and it will figure out new ranges for both axes.
Edit: Arrange your data before adding it to the chart. Something like:
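Dictionary<string, int> values = new Dictionary<string, int>();
string[] uaStrings = ReadFileOfUAStrings();
foreach (string uaString in uaStrings)
    // count each string; a key not seen before starts at 1
    values[uaString] = values.TryGetValue(uaString, out int n) ? n + 1 : 1;
foreach (KeyValuePair<string, int> kvp in values)
    // "chart" is a placeholder name for your Chart control
    chart.Series[0].Points.AddXY(kvp.Key, kvp.Value);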
The above doesn't separate things out by date, but you should get the idea. As written, it's also not very efficient if another UA string is added to the file and the whole thing is re-read, but optimizations can come after it's functional.
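To show what separating the counts by date might look like, here is a small sketch in Java; the counting logic is the point, not the charting API. LoginCounter and its methods are hypothetical names, and it assumes each login has already been reduced to a "YYYY-MM" month and an environment label. Each (month, environment) count would then become one point on that environment's series.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tally of logins per month and environment.
final class LoginCounter {
    // month ("YYYY-MM") -> environment -> number of logins; TreeMap keeps months sorted
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    /** Records one login for the given month and environment. */
    void add(String month, String environment) {
        counts.computeIfAbsent(month, m -> new TreeMap<>())
              .merge(environment, 1, Integer::sum);
    }

    /** Returns how many logins were recorded, or 0 if none. */
    int count(String month, String environment) {
        return counts.getOrDefault(month, Collections.emptyMap())
                     .getOrDefault(environment, 0);
    }

    /** Exposes the nested map, e.g. for feeding one chart series per environment. */
    Map<String, Map<String, Integer>> asMap() {
        return counts;
    }
}
```

Because "YYYY-MM" strings sort chronologically as plain text, iterating the outer map already yields the months in X-axis order.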

Compare 2 datasets with dbunit?

Currently I need to create tests for my application. I used "dbunit" to achieve that and now need to compare 2 datasets:
1) The records from the database I get with QueryDataSet
2) The expected results are written in the appropriate FlatXML in a file which I read in as a dataset as well
Basically 2 datasets can be compared this way.
Now the problem is columns with a timestamp. They will never match the expected dataset. I would really like to ignore them in the comparison, but it doesn't work the way I want.
It does work when I compare each table on its own, adding a column filter and ignoreColumns. However, this approach is very cumbersome: many tables are involved in the comparison, and it forces me to add so much code that the test eventually gets bloated.
The same applies to fields that have null values.
A possible solution would also be to compare only the very first column of every table, addressed not by its column name but only by its column index. But I can't find anything that does this.
Maybe I am missing something, or maybe it just doesn't work any other way than comparing each table for its own?
For the sake of completion some additional information must be posted. Actually my previously posted solution will not work at all as the process reading data from the database got me trapped.
The process using "QueryDataset" did read the data from the database and save it as a dataset, but the data couldn't be accessed from this dataset anymore (although I could see the data in debug mode)!
Instead the whole operation failed with an UnsupportedOperationException at org.dbunit.database.ForwardOnlyResultSetTable.getRowCount(
Example code to produce failure:
QueryDataSet qds = new QueryDataSet(connection);
Even if you try it this way it fails:
IDataSet tmpDataset = connection.createDataSet(tablenames);
In order to make extraction work you need to add this line (the second one):
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
Great, that this was nowhere documented...
But that is not all: now you'd certainly think that one could add this line after doing a "QueryDataSet" as well... but no! This still doesn't work! It will still throw the same Exception! It doesn't make any sense to me and I wasted so much time with it...
It should be noted that extracting data from a dataset which was read in from an xml file does work without any problem. This annoyance just happens when trying to get a dataset directly from the database.
If you have done the above you can then continue as below which compares only the columns you got in the expected xml file:
// put in here some code to read in the dataset from the xml file...
// and name it "expectedDataset"
// then get the tablenames from it...
String[] tablenames = expectedDataset.getTableNames();
// read dataset from database table using the same tables as from the xml
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
for (int i = 0; i < tablenames.length; i++) {
    ITable expectedTable = expectedDataset.getTable(tablenames[i]);
    ITable actualTable = actualDataset.getTable(tablenames[i]);
    // keep only the columns that also appear in the expected table
    ITable filteredActualTable = DefaultColumnFilter.includedColumnsTable(actualTable, expectedTable.getTableMetaData().getColumns());
    // compare the expected table with the filtered actual table
    Assertion.assertEquals(expectedTable, filteredActualTable);
}
You can also use this format:
// Assert actual database table match expected table
String[] columnsToIgnore = {"CONTACT_TITLE","POSTAL_CODE"};
Assertion.assertEqualsIgnoreCols(expectedTable, actualTable, columnsToIgnore);
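The idea behind both variants (compare only the columns the expected data mentions, so timestamps and other volatile columns drop out) can be illustrated without dbUnit. The sketch below uses plain Java collections; ColumnProjection and keepColumns are invented names and are not part of dbUnit's API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, library-free illustration: rows are maps from column name to value.
final class ColumnProjection {

    /** Returns copies of the rows that keep only the named columns. */
    static List<Map<String, Object>> keepColumns(List<Map<String, Object>> rows,
                                                 Set<String> columns) {
        return rows.stream().map(row -> {
            Map<String, Object> kept = new LinkedHashMap<>(row); // copy, leave the input untouched
            kept.keySet().retainAll(columns);                    // drop every other column
            return kept;
        }).collect(Collectors.toList());
    }
}
```

Projecting the actual rows onto the expected rows' column names and then comparing the two lists with equals gives the same effect as filtering the actual table before the assertion.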

Getting the RIGHT word count of a PDF file

The response in this topic helped me understand why sometimes my
PDF fails to find a word and why I keep getting different word counts when using
different PDF word count programs. I decided to use xpdf. I converted it to text
and added the -layout tag and then opened the resulting text file with Word 2003.
I noted the word count. Then I decided, unfortunately, to remove the -layout tag.
This time, though, the word count is different.
Why did that tag affect the word count? Is there an accurate way to find the word count
of a PDF file? I would even pay for such software if I have to so long as it gives me
the right number of words.
(I checked another topic but thought I'd find out if the solution I just offered would solve everything. There was another topic where advancedpdf was recommended.)
I'd like to argue that there is no reliable word counting. One could, for example, just to make your life harder, put each character of this lovely Stack Overflow answer into its own text object and position those objects so that they form a meaningful paragraph for humans only when rendered. Like this:
div {float: left;}
I would suggest an open source solution using Java. First you would have to parse the PDF file and extract all the text using Tika.
Then I believe you can achieve this simply by scanning the extracted text and counting the words.
Sample code would look like this:
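if (f.getName().endsWith(".txt")) {
    BufferedReader in = new BufferedReader(new FileReader(f));
    StringBuilder sb = new StringBuilder();
    String s = null;
    while ((s = in.readLine()) != null) {
        sb.append(s).append(' '); // keep a separator so words on adjacent lines do not merge
    }
    in.close();
    // strip punctuation, then split on non-word runs to get individual terms
    String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").split("\\W+");
}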
In the tokenizedTerms array, you will have all the terms (words) of the document, and you can count them by reading tokenizedTerms.length. Hope this was useful. :-)
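One reason different tools report different totals is that they define "word" differently. The Java sketch below, with the hypothetical names WordCount, byWhitespace and byWordCharacters, counts the same already-extracted text in two common ways; it does not use Tika or any PDF library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: two definitions of "word" applied to plain text.
final class WordCount {
    private static final Pattern WORD_CHARS = Pattern.compile("\\w+");

    /** Counts whitespace-separated tokens, so "well-known" is one word. */
    static int byWhitespace(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Counts runs of letters, digits and underscores, so "well-known" is two words. */
    static int byWordCharacters(String text) {
        Matcher m = WORD_CHARS.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

Hyphens and apostrophes are where the two definitions diverge, which is also where extraction options that change spacing can shift the result.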

Converting a PDF file to a nice table

I have this PDF file which is arranged in 5 columns.
I have looked and looked through Stack Overflow (and Googled crazily) and tried all the solutions (including the last resort of trying Adobe Acrobat itself).
However, for some reason I cannot get those 5 columns in csv/xls format - as I need them arranged. Usually when I export them, the format is horrible and all the entries are arranged line by line with some data loss.
Here is a link to an excerpt of the file above, but I am really getting frustrated and am running out of options.
iText (or iTextSharp) could do this, if you can give it the boundaries of those 5 columns and are willing to deal with some overhead (namely reparsing the page's text for each column):
Rectangle2D columnBoxArray[] = buildColumnBoxes();
ArrayList<String> columnTexts = new ArrayList<String>(columnBoxArray.length);
for (Rectangle2D columnBBox : columnBoxArray) {
    FilteredTextRenderListener textInRectStrategy =
        new FilteredTextRenderListener(new LocationTextExtractionStrategy(),
            new RegionTextRenderFilter( columnBBox ) );
    columnTexts.add(PdfTextExtractor.extractText( reader, pageNum, textInRectStrategy));
}
Each line of text should be separated by \n, so it becomes a simple matter of string parsing.
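As a sketch of that string-parsing step, the Java below turns one extracted string per column into CSV rows. ColumnsToCsv and toCsvRows are hypothetical names, and the code assumes that line N of every column belongs to the same table row; that can fail when a cell is empty and the extractor emits no line for it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: zips per-column text (lines separated by \n) into CSV rows.
final class ColumnsToCsv {

    static List<String> toCsvRows(List<String> columnTexts) {
        List<String[]> columns = new ArrayList<>();
        int rowCount = 0;
        for (String text : columnTexts) {
            String[] cells = text.split("\n");
            columns.add(cells);
            rowCount = Math.max(rowCount, cells.length);
        }
        List<String> rows = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            StringBuilder row = new StringBuilder();
            for (int c = 0; c < columns.size(); c++) {
                String[] cells = columns.get(c);
                // Assumption: a column with fewer lines is padded with empty cells at the end.
                String cell = r < cells.length ? cells[r].trim() : "";
                if (c > 0) {
                    row.append(',');
                }
                // Quote every cell and double any embedded quotes, as CSV expects.
                row.append('"').append(cell.replace("\"", "\"\"")).append('"');
            }
            rows.add(row.toString());
        }
        return rows;
    }
}
```

Joining the returned rows with line breaks and writing them to a .csv file gives something a spreadsheet can open directly.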
If you wanted to not reparse the whole page for each column, you could probably come up with a custom implementation of FilteredTextRenderListener that would take multiple listener/filter pairs. You could then parse the whole thing once rather than once for each column.