Split text file into several parts by character - vb.net

I apologise in advance if there is already an answer to this problem; if so please just link it (I have looked, btw! I just didn't find anything relating to my specific example) :)
I have a text (.txt) file which contains data in the form 1.10.100.0.200 where 1, 10, 100, 0 and 200 are numbers storing the map terrain layout of a game. This file has multiple lines of 1.10.100.0.200 where each line represents an item of terrain in the map.
Here is what I would like to know:
How do I find out how many lines there are, so I know how many items of terrain to create when I read the map file?
What is the method I should use to get each of 1, 10, 100, 0 and 200:
E.g. when I am translating the file into a map terrain at runtime I might use the terrainitem1.Location = New Point(x, y) or terrainitem1.Size = New Size(p, q) commands, where x, y, p and q are integers or doubles relating to the terrain's location or size. Where would I then find x, y etc. out of 1, 10, 100, 0 and 200, if say x is equal to 1, y to 10 and so on?
I am sorry if this isn't clear, please just ask me and I'll try to explain.
N.B. I am using VB.NET WinForms

There is no way to know how many lines a file has without opening the file and reading its contents.
You didn't indicate how far you've got on this. Do you know how to open a file?
Here's some basic code to do what you want. (Sorry, this is C# but the idea is the same in VB.)
string line;
using (TextReader reader = File.OpenText(@"C:\filename.txt"))
{
    // Read each line from the file (until null is returned)
    while ((line = reader.ReadLine()) != null)
    {
        // Get each number in the line (as a string)
        string[] values = line.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
        // Convert each number to an integer
        int id = int.Parse(values[0]);
        int height = int.Parse(values[1]);
        int width = int.Parse(values[2]);
        int x = int.Parse(values[3]);
        int y = int.Parse(values[4]);
    }
}
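The same idea can be sketched in Python for readers who want to try it outside .NET (this is not VB.NET code). It assumes every non-blank line holds exactly five dot-separated integers; the field order follows the guess made in the C# snippet above, and the names TerrainItem and parse_terrain are hypothetical. The number of terrain items is simply the length of the returned list.

```python
from typing import List, NamedTuple


class TerrainItem(NamedTuple):
    # Field order is an assumption copied from the C# snippet above.
    id: int
    height: int
    width: int
    x: int
    y: int


def parse_terrain(text: str) -> List[TerrainItem]:
    """Turn lines such as '1.10.100.0.200' into TerrainItem tuples."""
    items = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = [int(p) for p in line.split(".")]
        if len(parts) != 5:
            raise ValueError(f"expected 5 numbers, got {len(parts)}: {line!r}")
        items.append(TerrainItem(*parts))
    return items


sample = "1.10.100.0.200\n2.20.50.30.40\n"
terrain = parse_terrain(sample)
print(len(terrain))                # how many terrain items to create
print(terrain[0].x, terrain[0].y)  # values for the first item
```

Reading the whole file first and counting afterwards avoids a separate pass just to find the line count.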

How can I use the StreamWriteAsText() to write data of the Number type?

My ultimate goal is to write a file of image data and the time it was taken, for multiple times. This could be used to produce time vs intensity plots.
To do this, I am trying to write a 1D image to a file stream repeatedly in time using the ImageWriteImageDataToStream() function. I go about this by attaching a Listener object to the camera view I am reading out and this listener executes a function that writes the image to a file stream using ImageWriteImageDataToStream() every time the data changes (messagemap = "data_changed:MyFunctiontoExecute") .
My question is, is there a way to also write a time stamp to this same file stream?
All I can find is StreamWriteAsText(), which takes a String data type. Can I convert time which is a Number type to a String type?
Does anyone have a better way to do this?
My solution at the moment is to create a separate file at the same time and record the timing using WriteFile(), so not using a file stream.
//MyFunctiontoExecute, where Img is the 1D image at the current time
My_file_stream.StreamSetPos(2,0)
ImageWriteImageDataToStream(Img, My_file_stream, 0)
//Write the time to the same file
Number tmp_time = GetHighResTickCount() - start_time
My_file_stream.StreamSetPos(2,0)
My_file_stream.StreamWriteAsText(0,tmp_time) //does not work
//instead using a different file
WriteFile(My_extrafileID,tmp_time+"\n")
I think your concept of streaming is wrong. When you stream to a file, the stream position is already at the end once each ...ToStream() command has finished, so there is no need to set the position.
Your script essentially tells the computer to move the stream back to the starting position and then write the text, overwriting the data.
You only need the StreamSetPos() command when you want to jump over some sections during reading, for example when defining import scripts for specific file formats or when extracting only specific subsets from a file.
If all you want to do is "stream-out some raw-data", you do exactly that: Just call the commands after each other:
void WriteDataPlusDateToStream( object fStream, image img, string dateStr )
{
number endian = 0
number encoding = 0
img.ImageWriteImageDataToStream(fStream,endian)
fStream.StreamWriteAsText(encoding,dateStr)
}
Similarly, you "stream-in" by following the same sequence:
void ReadDataPlusDateFromStream( object fStream, image img, string &dateStr )
{
number endian = 0
number encoding = 0
img.ImageReadImageDataFromStream(fStream,endian)
fStream.StreamReadTextLine(encoding,dateStr)
}
Two things are important here:
In ImageReadImageDataFromStream it is the size and data-type of the image img which define how many bytes are read from the stream and how they are interpreted. Therefore img must have been pre-created with a fitting size and data-type.
in StreamReadTextLine the stream will continue to read in as text until it encounters the end-of-line character (\n) or the end of the stream. Therefore make sure to write this end-of-line character when streaming-out. Alternatively, you can make sure that the strings are always of a specific size and then use StreamReadAsText with the appropriate length specified.
Using the two methods above, you can use the following test-script as a starting point:
void WriteDataPlusDateToStream( object fStream, image img, string dateStr )
{
number endian = 0
number encoding = 0
img.ImageWriteImageDataToStream(fStream,endian)
fStream.StreamWriteAsText(encoding,dateStr)
}
void ReadDataPlusDateFromStream( object fStream, image img, string &dateStr )
{
number endian = 0
number encoding = 0
img.ImageReadImageDataFromStream(fStream,endian)
fStream.StreamReadTextLine(encoding,dateStr)
}
void writeTest(string path)
{
Result("\n Writing to :" + path )
image testImg := RealImage("Test",4,100)
string dateStr;
number loop = 5;
number doAutoClose = 1
object fStream = NewStreamFromFileReference( CreateFileForWriting(path), doAutoClose )
for( number i=0; i<loop; i++ )
{
testImg = icol * random()
dateStr = GetDate(1)+"#"+GetTime(1)+"|"+Format(GetHighResTickCount(),"%.f") + "\n"
fStream.WriteDataPlusDateToStream(testImg,dateStr)
sleep(0.33)
}
}
void readTest(string path)
{
Result("\n Reading from :" + path )
image testImg := RealImage("Test",4,100)
string dateStr;
number doAutoClose = 1
object fStream = NewStreamFromFileReference( OpenFileForReading(path), doAutoClose )
while ( fStream.StreamGetPos() < fStream.StreamGetSize() )
{
fStream.ReadDataPlusDateFromStream(testImg,dateStr)
result("\n time:"+dateStr)
testImg.ImageClone().ShowImage()
}
}
string path = "C:/test.dat"
ClearResults()
writeTest(path)
readTest(path)
Note that when streaming "binary data" like this, it is you who defines the file format. You must make sure that the writing and reading code match up.
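To illustrate the "you define the format" point outside DM-script, here is a minimal Python sketch of the same record layout: a fixed-size block of binary values followed by a newline-terminated text stamp. It is not DigitalMicrograph code; the record size N_VALUES, the float32 encoding and the helper names write_record and read_record are all assumptions made for this example.

```python
import io
import struct

N_VALUES = 4  # samples per 1D image (an arbitrary fixed size for this sketch)


def write_record(stream, values, stamp):
    """Append one record: N_VALUES little-endian float32, then a text line."""
    stream.write(struct.pack(f"<{N_VALUES}f", *values))
    stream.write((stamp + "\n").encode("ascii"))


def read_record(stream):
    """Read one record written by write_record; return (values, stamp)."""
    raw = stream.read(4 * N_VALUES)          # the reader must know the size
    values = struct.unpack(f"<{N_VALUES}f", raw)
    stamp = stream.readline().decode("ascii").rstrip("\n")
    return values, stamp


buf = io.BytesIO()
# Records are written back to back; the position is never reset in between.
write_record(buf, [0.0, 1.0, 2.0, 3.0], "t=100")
write_record(buf, [4.0, 5.0, 6.0, 7.0], "t=200")

buf.seek(0)  # rewind once, only because we switch from writing to reading
size = len(buf.getvalue())
records = []
while buf.tell() < size:
    records.append(read_record(buf))
print(records)
```

As in the script above, the reader only works because it repeats the writer's sequence exactly: same value count, same byte order, then one text line.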

Make line chart with values and dates

In my app I use the ios-charts library (a Swift alternative to MPAndroidChart).
All I need is to display a line chart with dates and values.
Right now I use this function to display the chart:
func setChart(dataPoints: [String], values: [Double]) {
var dataEntries: [ChartDataEntry] = []
for i in 0..<dataPoints.count {
let dataEntry = ChartDataEntry(value: values[i], xIndex: i)
dataEntries.append(dataEntry)
}
let lineChartDataSet = LineChartDataSet(yVals: dataEntries, label: "Items count")
let lineChartData = LineChartData(xVals: dataPoints, dataSet: lineChartDataSet)
dateChartView.data = lineChartData
}
And this is my data:
xItems = ["27.05", "03.06", "17.07", "19.09", "20.09"] //String
let unitsSold = [25.0, 30.0, 45.0, 60.0, 20.0] //Double
But as you can see, xItems are dates in "dd.mm" format. Because they are strings, they are spaced evenly along the axis. I want the spacing to reflect the real dates; for example, 19.09 and 20.09 should be very close together. I know that I should map each day to a number to accomplish this, but I don't know what to do next: how can I adjust the margins between the x labels?
UPDATE
After a little research I found that many developers have asked for this feature, but nothing has come of it. For my case I found a very interesting alternative to this library in Swift: PNChart. It is easy to use and it solves my problem.
The easiest solution will be to loop through your data and add a ChartDataEntry with a value of 0 and a corresponding label for each missing date.
In response to the question in the comments here is a screenshot from one of my applications where I am filling in date gaps with 0 values:
In my case I wanted the 0 values rather than an averaged line from data point to data point as it clearly indicates there is no data on the days skipped (8/11 for instance).
From @Philipp Jahoda's comments it sounds like you could skip the 0 value entries and just index the data you have to the correct labels.
I modified the MPAndroidChart example program to skip a few data points and this is the result:
As @Philipp Jahoda mentioned in the comments, the chart handles a missing Entry by just connecting to the next data point. From the code below you can see that I am generating x values (labels) for the entire data set but skipping y values (data points) for index 11 - 29, which is what you want. The only thing remaining would be to handle the x labels, as it sounds like you don't want 15, 20, and 25 in my example to show up.
ArrayList<String> xVals = new ArrayList<String>();
for (int i = 0; i < count; i++) {
xVals.add((i) + "");
}
ArrayList<Entry> yVals = new ArrayList<Entry>();
for (int i = 0; i < count; i++) {
if (i > 10 && i < 30) {
continue;
}
float mult = (range + 1);
float val = (float) (Math.random() * mult) + 3;
yVals.add(new Entry(val, i));
}
What I did was supply the full list of dates as x values, even for dates that have no y data, and simply not add a data entry for those x indices. The chart then draws no y value at those indices, which achieves what you want. This is the easiest way, since you just write a for loop and continue whenever there is no y value.
I don't suggest using 0 or NaN: in a line chart the line will connect through the 0 points, and bad things will happen with NaN. You might want to break the line instead, but ios-charts does not support that yet (I also filed a feature request for it), so you would need to write your own code to break the line, or live with either connecting through the 0 values or connecting to the next valid data point.
The downside is that performance may drop because there are many x indices, but I tried ~1000 and it was acceptable. I already asked for such a feature a long time ago, but it took a lot of time to think about it.
Here's a function I wrote based on Wingzero's answer (I pass NaNs for the entries in the values array that are empty) :
func populateLineChartView(lineChartView: LineChartView, labels: [String], values: [Float]) {
var dataEntries: [ChartDataEntry] = []
for i in 0..<labels.count {
if !values[i].isNaN {
let dataEntry = ChartDataEntry(value: Double(values[i]), xIndex: i)
dataEntries.append(dataEntry)
}
}
let lineChartDataSet = LineChartDataSet(yVals: dataEntries, label: "Label")
let lineChartData = LineChartData(xVals: labels, dataSet: lineChartDataSet)
lineChartView.data = lineChartData
}
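The preparation step these answers rely on (one x label per calendar day, data entries only where values exist) can be sketched in Python. This is not ios-charts or MPAndroidChart code; the helper name spread_over_days is hypothetical, and because the "dd.mm" labels in the question carry no year, the year used here is an assumption.

```python
from datetime import date, timedelta

YEAR = 2016  # assumed; the "dd.mm" labels in the question carry no year


def spread_over_days(labels, values, year=YEAR):
    """Return (all_labels, entries): one label per calendar day between the
    first and last date, and (x_index, value) pairs only for days with data."""
    days = [date(year, int(lab[3:5]), int(lab[0:2])) for lab in labels]
    first, last = min(days), max(days)
    span = (last - first).days
    all_labels = [(first + timedelta(days=n)).strftime("%d.%m")
                  for n in range(span + 1)]
    # The x index of each entry is its distance in days from the first date.
    entries = [((d - first).days, v) for d, v in zip(days, values)]
    return all_labels, entries


x_items = ["27.05", "03.06", "17.07", "19.09", "20.09"]
units_sold = [25.0, 30.0, 45.0, 60.0, 20.0]
all_labels, entries = spread_over_days(x_items, units_sold)
print(len(all_labels))
print(entries)
```

The all_labels list would play the role of xVals and each pair in entries would become one ChartDataEntry, so 19.09 and 20.09 end up on adjacent indices.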
The solution which worked for me is splitting the LineDataSet into two LineDataSets. The first holds the y values up to the empty space and the second holds those after it.
//create 2 LineDataSets. set1- till empty space set2 after empty space
set1 = new LineDataSet(yVals1, "DataSet 1");
set2= new LineDataSet(yVals2,"DataSet 1");
//load datasets into datasets array
ArrayList<ILineDataSet> dataSets = new ArrayList<ILineDataSet>();
dataSets.add(set1);
dataSets.add(set2);
//create a data object with the datasets
LineData data = new LineData(xVals, dataSets);
// set data
mChart.setData(data);

Why is iTextSharp reading pages 1..N instead of N?

Here's my code:
var sb = new StringBuilder();
var st = new SimpleTextExtractionStrategy();
string raw;
using(var r = new iTextSharp.text.pdf.PdfReader(path)) {
for(int pn = 1; pn <= r.NumberOfPages; pn++) {
raw = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(r, pn, st);
sb.Append(raw);
}
}
This works for almost all PDFs I've run across... until today:
http://www7.dleg.state.mi.us/orr/Files/AdminCode/356_10334_AdminCode.pdf
For this PDF (and others like it on the same site), the extracted text for page 1 is correct, but the text for page 2 contains pages 1 and 2, page 3 contains pages 1-3, etc. So my StringBuilder ends up with the text from pages 1, 1, 2, 1, 2, 3, 1, 2, 3, 4, etc.
Using the default Location-based strategy has the same issue (and won't work for these particular PDFs anyway).
I recently upgraded from a much older version of iTextSharp (5.1-ish?) and didn't experience this issue before (I believe I've parsed some of these files before without issue). I poked through the source and didn't see anything obvious.
I thought I could work around this by asking for only the last page, but this doesn't work -- I get only the last page. If I hard-code the loop to get pages 2..4, I get 2, 2, 3, 2, 3, 4. So the issue may be some sort of data that PdfReader is maintaining between calls to GetTextFromPage.
Change your code to something like this:
var sb = new StringBuilder();
string raw;
using(var r = new iTextSharp.text.pdf.PdfReader(path)) {
for(int pn = 1; pn <= r.NumberOfPages; pn++) {
var st = new SimpleTextExtractionStrategy();
raw = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(r, pn, st);
sb.Append(raw);
}
}
Update based on mkl's comment: a strategy remembers all page content it has been confronted with. Thus, you have to use a fresh strategy if you want an extraction with nothing buffered yet.
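The buffering described in the update can be modelled with a few lines of Python. This is a toy stand-in, not iTextSharp code: the class AccumulatingStrategy and the helper get_text_from_page are hypothetical and only imitate a strategy that keeps everything it has been shown.

```python
class AccumulatingStrategy:
    """Toy model of an extraction strategy that buffers all content it sees.
    NOT iTextSharp code, only an imitation of the behaviour described above."""

    def __init__(self):
        self._chunks = []

    def render(self, text):
        self._chunks.append(text)

    def result(self):
        return "".join(self._chunks)


def get_text_from_page(pages, page_number, strategy):
    """Hypothetical helper: feed one page to the strategy, return its buffer."""
    strategy.render(pages[page_number - 1])
    return strategy.result()


pages = ["A", "B", "C"]

# One strategy shared across the loop: each call returns all pages seen so far.
shared = AccumulatingStrategy()
reused = [get_text_from_page(pages, n, shared) for n in (1, 2, 3)]

# A fresh strategy per page: each call returns only that page.
fresh = [get_text_from_page(pages, n, AccumulatingStrategy()) for n in (1, 2, 3)]

print(reused)
print(fresh)
```

Appending the entries of reused to one buffer reproduces the 1, 1, 2, 1, 2, 3 pattern from the question, while fresh gives each page exactly once.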

Struggling with add.Series (CSV dataset)

I’ll try to keep this short. I have a CSV with traffic counts for specific streets. So far I have plotted the street names on the (x) axis, and the total traffic count on the (y) axis. The CSV also contains counts for vehicles that travel for (< 15 min, 15-30 min, 30-45 min, 60 min, etc).
What I am trying to do is “split” the total count for each street in accordance with the (< 15, 15-30, etc) minute counts, kind of like categories. Essentially, I am trying to replicate this example:
http://dimplejs.org/examples_viewer.html?id=bars_vertical_grouped_stacked where the “Owner” category is instead the “Arterial” category from my dataset.
In short:
1. I can semi-successfully split some of the street names, however, some don’t seem to be split at all even though counts exist for the categories.
2. The tooltip is not showing category-specific counts. It seems to be shoving all of the counts into one tooltip regardless of hovering over a category.
3. For the legend, is there a way to ensure that it uses the street names? If I remove the “Commute” values and leave “Arterial” it uses the names correctly, but then I lose the ability to show the categories.
I hope this isn’t too confusing. I’d sincerely appreciate any help.
CODE:
var svg = dimple.newSvg("#chartContainer", 1280, 720);
d3.csv("../HTML/strippedData_v2.csv", function (data) {
var myChart = new dimple.chart(svg, data);
myChart.setBounds(60, 45, 510, 315)
myChart.addCategoryAxis("x", ["Arterial"]);
myChart.addMeasureAxis("y", "Total");
myChart.addSeries(["Arterial", "Commute15", "Commute1530", "Commute3045", "Commute4560", "Commute60"], dimple.plot.bar);
myChart.addLegend(200, 10, 380, 20, "right");
myChart.draw();
});
IMAGES: (Don't have enough rep :/)
(Only the first 3 images of the gallery apply.)
http://imgur.com/a/8P2tN#0
I'm struggling to work out exactly how you would like the chart to look. I suspect the problem may be the CommuteXX fields. It sounds like you are trying to treat them as dimension values, whereas dimple treats columns as dimensions (and their row values as dimension values). Therefore you need to reorganise your data something like this:
Arterial |Commute |Population
Colfax Avenue |Commute15 |1380
Colfax Avenue |Commute1530 |1641
Colfax Avenue |Commute3045 |855
Etc...
This can be done in Javascript once the CSV is loaded. Here is a function to do that:
function unPivot(sourceData, valueFields, newCategoryField, newValueField) {
    var returnData = [],
        newRow,
        key,
        i,
        j;
    for (i = 0; i < sourceData.length; i += 1) {
        // Emit one output row per value field in each source row
        for (j = 0; j < valueFields.length; j += 1) {
            newRow = {};
            // Copy every column that is not one of the pivoted value fields
            for (key in sourceData[i]) {
                if (sourceData[i].hasOwnProperty(key) && valueFields.indexOf(key) === -1) {
                    newRow[key] = sourceData[i][key];
                }
            }
            newRow[newCategoryField] = valueFields[j];
            newRow[newValueField] = sourceData[i][valueFields[j]];
            returnData.push(newRow);
        }
    }
    return returnData;
}
And here it is in a fiddle: http://jsfiddle.net/GeLng/15/
I'm not sure if this is the chart you are looking for, you mention a grouped bar but I'm not sure what you want to group by. Hopefully this will give you enough to create the chart the way you want.
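For readers who prefer to reshape the CSV before it reaches the browser, the same wide-to-long transformation can be sketched in Python. This is not part of dimple; the function name unpivot is hypothetical, and the sample row simply reuses the Colfax Avenue figures from the table above.

```python
def unpivot(rows, value_fields, new_category_field, new_value_field):
    """Turn one wide row per street into one long row per (street, category)."""
    out = []
    for row in rows:
        for field in value_fields:
            # Keep every column that is not one of the pivoted value fields.
            new_row = {k: v for k, v in row.items() if k not in value_fields}
            new_row[new_category_field] = field
            new_row[new_value_field] = row[field]
            out.append(new_row)
    return out


wide = [
    {"Arterial": "Colfax Avenue",
     "Commute15": 1380, "Commute1530": 1641, "Commute3045": 855},
]
long_rows = unpivot(
    wide, ["Commute15", "Commute1530", "Commute3045"], "Commute", "Population"
)
for r in long_rows:
    print(r)
```

Each output row then has a single Commute column to use as a series and a single Population column to use as the measure.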

Lucene Highlighter class: highlight different words in different colors

Probably most people reading the title who know a bit about Lucene won't need much further explanation. NB I use Jython but I think most Java users will understand the Java equivalent...
It's a classic thing to want to do: you have more than one term in your search string... in Lucene terms this returns a BooleanQuery. Then you use something like this code to highlight (NB I am a Lucene newbie, this is all closely tweaked from Net examples):
yellow_highlight = SimpleHTMLFormatter( '<b style="background-color:yellow">', '</b>' )
green_highlight = SimpleHTMLFormatter( '<b style="background-color:green">', '</b>' )
...
stream = FrenchAnalyzer( Version.LUCENE_46 ).tokenStream( "both", StringReader( both ) )
scorer = QueryScorer( fr_query, "both" )
fragmenter = SimpleSpanFragmenter(scorer)
highlighter = Highlighter( yellow_highlight, scorer )
highlighter.setTextFragmenter(fragmenter)
best_fragments = highlighter.getBestTextFragments( stream, both, True, 5 )
if best_fragments:
    for best_frag in best_fragments:
        print "=== best frag: %s, type %s" % ( best_frag, type( best_frag ))
        html_text += "&bull; %s<br>\n" % unicode( best_frag )
... and then the html_text is put in a JTextPane for example.
But how would you make the first word in your query highlight with a yellow background and the second word highlight with a green background? I have tried to understand the various classes in org.apache.lucene.search... to no avail. So my only way of learning was googling. I couldn't find any clues...
I asked this question four years ago... At the time I did manage to implement a solution using javax.swing.text.html.HTMLDocument. There's also the interface org.w3c.dom.html.HTMLDocument in the standard Java library. This way is hard work.
But for anyone interested there's a far simpler solution. Taking advantage of the fact that Lucene's SimpleHTMLFormatter returns about the simplest imaginable "marked up" piece of text: chosen words are highlighted with the HTML B tag. That's it. It's not even a "proper" HTML fragment, just a String with <B>s and </B>s in it.
A multi-word query generates a BooleanQuery... from which you can extract multiple TermQuerys by going booleanQuery.clauses() ... getQuery()
I'm working in Groovy. The colouring I want to apply is console codes, as per BASH (or Cygwin). Other types of colouring can be worked out on this model.
So you set up a map before to hold your "markup details":
def markupDetails = [:]
Then for each TermQuery, you call this, with the same text param each time, stipulating a different colour param for each term. NB I'm using Lucene 6.
def createHighlightAndAnalyseMarkup( TermQuery tq, String text, String colour ) {
def termQueryScorer = new QueryScorer( tq )
def termQueryHighlighter = new Highlighter( formatter, termQueryScorer )
TokenStream stream = TokenSources.getTokenStream( fieldName, null, text, analyser, -1 )
String[] frags = termQueryHighlighter.getBestFragments( stream, text, 999999 )
// not sure under what circs you get > 1 fragment...
assert frags.size() <= 1
// NB you don't always get all terms in all returned LDocuments...
if( frags.size() ) {
String highlightedFrag = frags[ 0 ]
Matcher boldTagMatcher = highlightedFrag =~ /<\/?B>/
def pos = 0
def previousEnd = 0
while( boldTagMatcher.find()) {
pos += boldTagMatcher.start() - previousEnd
previousEnd = boldTagMatcher.end()
markupDetails[ pos ] = boldTagMatcher.group() == '<B>'? colour : ConsoleColors.RESET
}
}
}
As I said, I wanted to colourise console output. The colour parameter in the method here is per the console colour codes as found here, for example. E.g. yellow is \033[033m. ConsoleColors.RESET is \033[0m and marks the place where each coloured bit of text stops.
... after you've finished doing this with all TermQuerys you will have a nice map telling you where individual colours begin and end. You work backwards from the end of the text so as to insert the "markup" at the right position in the String. NB here text is your original unmarked-up String:
markupDetails.sort().reverseEach{ pos, markup ->
String firstPart = text.substring( 0, pos )
String secondPart = text.substring( pos )
text = firstPart + markup + secondPart
}
... at the end of which text contains your marked-up String: print to console. Lovely.
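The two steps above (recording where each <B> or </B> tag falls in the un-tagged text, then inserting the colour codes from the end backwards) can be sketched in Python without Lucene. The helper names tag_positions and apply_markup are hypothetical, and the pre-highlighted strings stand in for what the per-term Highlighter calls would return.

```python
import re

RESET = "\033[0m"
YELLOW = "\033[33m"
GREEN = "\033[32m"


def tag_positions(highlighted, colour):
    """Map positions in the ORIGINAL text to the markup to insert there."""
    positions = {}
    pos = 0
    previous_end = 0
    for m in re.finditer(r"</?B>", highlighted):
        # Advance only by the plain text between tags, so tags take no width.
        pos += m.start() - previous_end
        previous_end = m.end()
        positions[pos] = colour if m.group() == "<B>" else RESET
    return positions


def apply_markup(text, markup_at):
    """Insert markup starting from the end so earlier positions stay valid."""
    for pos in sorted(markup_at, reverse=True):
        text = text[:pos] + markup_at[pos] + text[pos:]
    return text


text = "the quick brown fox"
markup = {}
markup.update(tag_positions("the <B>quick</B> brown fox", YELLOW))
markup.update(tag_positions("the quick brown <B>fox</B>", GREEN))
coloured = apply_markup(text, markup)
print(coloured)
```

One limitation of this sketch: if one highlighted term ends exactly where another begins, both write to the same position and one entry overwrites the other.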