Parse Text File with Variable Fields in VB.NET

A text file that I process has changed in the way its data is formatted, so it's time to update the code that parses it. The old file had a fixed number of lines and fields per record, so parsing it by position was easy. That is no longer the case. In the sample below I added the spaces for readability; ~ indicates a new line and * is the field separator:
~ENT*1*2J*34*111223333
~NM1*IL*1*SMITHJOHNA***N*123456789
~RMRIKH62XX/PAY/1234567/20150103**12345.67
~REFZZMEDPM/M/12345.67
~REF*LU*40/CSWI
~DTM*582****RD8*20150101-20150131
~ENT*2*2J*34*222334444
~NM1*IL*1*DOEJANES***N*234567891 ~RMRIKH62XX/PAY/1234567/345678901**23456.78
~REF*LU*40/CSWI
~DTM*582****RD8*20141211-20141231
~ENT*3*2J*34*333445555
~NM1*IL*1*DOE*JOHN****N*3456789012 ~RMRIKH62XX/PAY/200462975/20150103**45678.90
~REFZZMEDPM/M/3456.78
~REF*LU*40/CSWI
~DTM*582****RD8*20150101-20150131
~ENT*4*2J*34*444556666
~NM1*IL*1*SMITHJANED***N*456789012 ~RMRIKH62XX/PAY/567890123/678901234**6789.01
~REFZZMEDPM/M/6789.01
~REF*LU*40/CSWI
~DTM*582****RD8*20150101-20150131
~ENT*5*2J*34*666778888
~NM1*IL*1*SMITHJONJ***N*8901234
~RMRIKH62XX/PAY/56789012/67890123**5678.90
~REFZZMEDPM/M/5678.90
~REF*LU*40/CSWI
~DTM*582****RD8*20150101-20150131
~ENT*6*2J*34*777889999
~NM1*IL*1*DOEBOBE***N*567890123
~RMRIKH62XX/PAY/34567890/45678901*5678.90
~REF*LU*40/CSWI
~DTM*582****RD8*20141210-20141231 ~RMRIKH62XX/PAY/1234567890/2345678901**6789.01
~REFZZMEDPM/M/6789.01
~REF*LU*40/CSWI
~DTM*582****RD8*20150101-20150131
What is the best way to parse this data? Is there a better way than using StreamReader?

String.Split is your friend.
If the file is not too large, the simplest approach would be to:
Read the file contents into a string variable (File.ReadAllText).
Split the "lines" (lines = allText.Split("~"c)).
Loop through the lines. For each line:
Split the line into fields (fields = line.Split("*"c))
Process the field values. You'll probably want to have a big Select Case statement on fields(0) and then proceed depending on the first field of the line.
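The steps above can be sketched as follows. This is a minimal illustration in Python rather than VB.NET, under two assumptions that the question does not state outright: that every record begins with an ENT segment, and that the other segments belong to the most recent ENT. The sample string and the names `parse_segments` and `group_records` are hypothetical.

```python
# Hypothetical sample in the same shape as the question's data:
# "~" starts each segment and "*" separates the fields.
SAMPLE = (
    "~ENT*1*2J*34*111223333"
    "~NM1*IL*1*DOE*JOHN****N*345678901"
    "~REF*LU*40/CSWI"
    "~ENT*2*2J*34*222334444"
    "~REF*LU*40/CSWI"
)


def parse_segments(text):
    """Split the raw text into segments, each a list of fields."""
    segments = []
    for raw in text.split("~"):
        raw = raw.strip()      # drop stray spaces or line breaks
        if raw:                # skip the empty piece before the first "~"
            segments.append(raw.split("*"))
    return segments


def group_records(segments):
    """Start a new record at each ENT segment (assumed record marker)."""
    records = []
    for fields in segments:
        tag = fields[0]        # plays the role of fields(0) in the Select Case
        if tag == "ENT":
            records.append({"ENT": fields, "other": []})
        elif records:
            records[-1]["other"].append(fields)
    return records


records = group_records(parse_segments(SAMPLE))
for record in records:
    print(record["ENT"][1], [seg[0] for seg in record["other"]])
```

Because each record is keyed on its first field rather than on position, a record with a missing or extra REF segment still parses.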

You can get this into a jagged array (an array of String arrays, used here as a 2-D structure) fairly easily:
' Dynamic structure to hold the data as we go.
Dim data As New List(Of String())
' Split the file contents on the ~ delimiter, one piece per line of data.
Dim lines = System.IO.File.ReadAllText("data.txt").Split("~"c)
' Process each line.
For Each line As String In lines
    ' Skip empty pieces, such as the one before the first ~.
    If line.Trim().Length = 0 Then Continue For
    ' Break the line into its * separated fields.
    data.Add(line.Split("*"c))
Next
' Produce the array. Not really needed, as you can just use data if you want.
Dim dataArray = data.ToArray()
Now just iterate through the structure and process the data accordingly.
If you need to ensure your data always has a specific number of indexes (for example, some lines have 5 fields supplied, but you expect there to always be 8), you can adjust the data.Add command like so:
' Ensure there are always at least 8 indexes for each line.
' Appending 8 separators adds 8 blank (String.Empty) trailing fields, which fill in for any values a line omits.
data.Add((line & New String("*"c, 8)).Split("*"c))
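An alternative way to guarantee a minimum field count is to pad the list after splitting, which adds only as many blanks as are actually missing. The following is a hedged Python sketch of that idea, not the answer's method; `split_padded` and its parameters are hypothetical names.

```python
def split_padded(line, min_fields=8, sep="*"):
    """Split a line and pad with empty strings up to min_fields entries."""
    fields = line.split(sep)
    missing = min_fields - len(fields)
    if missing > 0:
        fields.extend([""] * missing)   # blank placeholders for omitted values
    return fields


print(split_padded("REF*LU*40/CSWI"))
```

Lines that already have 8 or more fields are returned unchanged, so indexing up to position 7 is always safe.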

Related

Read and split line by line in text file

I am trying to read a text file from my application's resources. For each line in this text file I want to split the text before and after the comma.
Each line in the txt file looks like this:
-125.325235,4845636
My issue is that the loop never seems to end: the For Each statement keeps repeating.
For Each Line As String In My.Resources.CompanyBases
    MsgBox(My.Resources.CompanyBases.Split(","c).First)
    MsgBox(My.Resources.CompanyBases.Split(","c).Last)
Next
Firstly, don't ever get a resource over and over like that. Those properties are not "live". Every time you get the property, the resource has to be extracted from your assembly. If you need to use the value multiple times, get the property once and assign it to a variable, then use that variable over and over.
Secondly, you're not getting a file. The whole point of resources is that they are not distinct files but rather data compiled into your assembly. It's just a String like any other. How would you usually split a String on line breaks?
Finally, you have a For Each loop with a loop control variable Line, yet you never use that variable inside the loop. It should be Line that you're splitting inside the loop, not the resource property containing all the lines.
For Each line In My.Resources.CompanyBases.Split({Environment.NewLine}, StringSplitOptions.None)
    Dim fields = line.Split(","c)
    Debug.WriteLine(fields(0))
    Debug.WriteLine(fields(1))
Next
Note that, if you're using .NET Core, Split will accept a String as well as a String array.
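The same three points (fetch the text once, split it into lines, then split each line) can be shown in a short language-neutral sketch. It is written in Python, and `company_bases` is a hypothetical stand-in for the resource string.

```python
# Hypothetical stand-in for the embedded resource; fetched once into a variable.
company_bases = "-125.325235,4845636\r\n-120.5,4700000\r\n"


def parse_pairs(text):
    """Return one (before_comma, after_comma) pair per non-empty line."""
    pairs = []
    for line in text.splitlines():      # handles both \r\n and \n line endings
        if not line.strip():
            continue                    # ignore blank lines
        before, after = line.split(",", 1)
        pairs.append((before, after))
    return pairs


for before, after in parse_pairs(company_bases):
    print(before, after)
```

Note that the loop variable is the thing being split inside the loop, which is the fix the answer describes.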

How to tabulate a numeric text file?

I have a numeric text file whose data needs to be tabulated. Unfortunately, depending on the input data, the numbers are not separated from one another by a fixed number of spaces, and the number of spaces can differ from one row to the next.
I tried this:
For Each line In RichTextBox2.Lines
    If Not line.Trim.ToString = "" Then
        Dim item() As String = line.Trim.Split(" "c)
        DataGridView1.Rows.Add(item)
    End If
Next
and then I deleted the blank columns. But I realised that when the number of spaces differs between rows, values that belong in the same column end up spread across two or three different columns. Let me provide an example:
Here are the numbers coming from the text file and here are the tabulated numbers.
What I would like to do is remove all but one of the spaces between values, so that the numbers are evenly separated. I hope that is clear. Thank you very much for your support.
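One way to get the behaviour described is to split on runs of whitespace instead of on single spaces, so that no empty columns are produced in the first place. The sketch below shows the idea in Python with made-up sample rows; in VB.NET the analogous call would be `line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)`.

```python
# Hypothetical rows with irregular spacing between the numbers.
rows = ["  1.5   2.25 3", "", "10  20      30.75"]


def tabulate(lines):
    """Split each non-blank line on runs of whitespace."""
    table = []
    for line in lines:
        if line.strip():
            table.append(line.split())   # split() with no argument collapses runs
    return table


def normalise(line):
    """Rewrite a line so that exactly one space separates the values."""
    return " ".join(line.split())


print(tabulate(rows))
print(normalise(rows[2]))
```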

Excel VBA - return variable length array from function

I want to emulate the results of the builtin "text-to-fields" in a UDF function.
I need to do this because my original data comes from a web query, and I need to use the results on a separate page and plot those results.
For plotting, I need missing values to parse to empty cells, since that is the only option for excel graphs to show missing values as gaps.
You cannot do that with the builtin, because of three limitations:
1) It cannot target the parsed fields onto another sheet.
2) Trying to copy the data values to the destination sheet to parse them there fails, because text-to-fields parses the referencing expression, instead of the value it references.
3) I cannot parse on the original data sheet, and then copy the parsed fields to the target sheet, because no expression can copy an empty cell, it gets converted into a zero value. (after all, an expression resulting in an empty cell would erase itself!)
So I need a DIY field parser, and in any case using a formula is better for my overall needs than having to macro-ize the builtin function (even if it would work).
My fields look like this:
calm
S 10
S 10 G 20
And I want them to parse just as a text-to-fields would, which would give numerics for numbers, strings for text, and empty for missing fields (i.e. shorter readings.)
So I used this code:
Function Explode(texte As String, Optional ByVal delimiter As String = " ") _
        As String()
    ' mimic the text-to-fields,
    ' but allow inter-sheet references
    Explode = Split(texte, delimiter)
End Function
But to use it, I have to pre-define the function calling cell as part of an array, which is fixed size, and I don't know how to have this return a variable number of parsed fields into a fixed size array. What I want from the sample data above is:
But what I get instead is:
Note that empty cells must be empty - not just look empty (not "" strings).
Edit:
I suspect that I may have to instead create a sub which sets the values of the parsed fields and clears the remainder of cells for missing fields (I always have a maximum of four fields) instead of returning them, but am not very VBA proficient. For example, something that gets two cell references, one for the source reference, another for the target list of parsed fields. Then call that from a function which I can embed in the sheet. Side-effect based programming...
You could make your function return a fixed-size array using ReDim Preserve.
{=Explode($A$1,50,",")}
Function Explode(texte As String, ArraySize As Integer, Optional ByVal delimiter As String = " ") _
        As String()
    Explode = Split(texte, delimiter)
    ReDim Preserve Explode(ArraySize - 1)
End Function
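The parsing rules the question asks for (numbers become numeric, text stays text, missing fields are truly empty rather than empty strings) can be sketched independently of Excel. This Python illustration uses `None` to stand for an empty cell and assumes the maximum of four fields mentioned in the question; the name `explode` mirrors the VBA function but the code is not a translation of it.

```python
def explode(text, size=4, delimiter=" "):
    """Parse text into exactly `size` cells: float, str, or None for missing."""
    cells = []
    for token in text.split(delimiter):
        if token == "":
            continue                     # ignore repeated delimiters
        try:
            cells.append(float(token))   # numeric field
        except ValueError:
            cells.append(token)          # text field (note: "nan"/"inf" would parse as floats)
    cells = cells[:size]                 # never return more than `size` cells
    cells.extend([None] * (size - len(cells)))
    return cells


for reading in ("calm", "S 10", "S 10 G 20"):
    print(explode(reading))
```

Keeping a distinct marker for missing fields, instead of padding with empty strings, is what lets the caller leave those cells genuinely empty.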

Word Macro for separating a comma-separated list to columns

I have a very large set of data that represents cartesian coordinates in the form x0,y0,z0,x1,y1,z1...xn,yn,zn. I need to create a new line at the end of each xyz coordinate. I have been trying to record a macro that moves a certain number of spaces from the beginning of each line, then creates a new line. This, of course, will not work since the number of digits in each xyz coordinate differs.
How can I create a macro to do this in Microsoft Word?
Try this:
Public Sub test()
    Dim s As String
    Dim v As Variant
    Dim t As String
    Dim I As Long
    s = "x0,y0,z0,x1,y1,z1,xn,yn,zn"
    ' Split into a 0-based array of individual values.
    v = Split(s, ",")
    t = ""
    For I = LBound(v) To UBound(v)
        t = t & v(I)
        ' Every third value ends a triplet, so end the line there.
        If I Mod 3 = 2 Then
            t = t & vbCr
        Else
            t = t & ","
        End If
    Next I
    ' Drop the trailing separator.
    t = Left(t, Len(t) - 1)
    Debug.Print t
End Sub
The Split function splits a string along the delimiter you specify (comma in your case), returning the results in a 0-based array. Then in the For loop we stitch the pieces back together, appending a carriage return (vbCr) after every third element and a comma otherwise.
The final (optional) step is to remove the trailing carriage return.
Hope that helps
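For comparison, the split-and-regroup idea from the macro can be written by slicing the value list into groups of three, which avoids the trailing-separator cleanup. This is a Python sketch of the same technique, not a Word macro; `to_lines` is a hypothetical name.

```python
def to_lines(csv_text, group=3):
    """Regroup a flat comma-separated list into lines of `group` values."""
    values = csv_text.split(",")
    lines = [
        ",".join(values[i:i + group])        # one x,y,z triplet per line
        for i in range(0, len(values), group)
    ]
    return "\n".join(lines)


print(to_lines("x0,y0,z0,x1,y1,z1,xn,yn,zn"))
```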
The question placed before us was most clearly asked
“Please produce a macro sufficient to the task
I have Cartesian coordinates, a single line of these
Array them in many lines, triplets if you please!”
Instinctively we start to code, a solution for this quest
Often without asking, “Is this way truly best?”
But then another scheme arises from the mind
That most venerated duo: Word Replace and Find
Provide the two textboxes each an incantation
Check the Wildcard option and prepare for Amazation!
Forgive me!
In Word open Find/Replace
Click the More button and check the Use wildcards box
For Find what enter ([!,]{1,},[!,]{1,},[!,]{1,}),
For Replace with enter \1^p
Use Find Next, Replace and Replace All as usual
How it works
With wildcards, [!,]{1,} finds one or more chars that are NOT commas. This idiom is repeated 3 times with 2 commas separating the 3 instances. This will match 3 comma-delimited coordinates. The whole expression is then wrapped in parentheses to create an auto-numbered group (in this case Group #1). Creating a group allows us to save text that matches the pattern and use it in the Replace box. Outside of the parentheses is one more comma, which separates one triplet of coordinates from the next.
In the Replace box \1 retrieves auto-numbered group 1, which is our coordinate triplet. Following that is ^p which is a new paragraph in Word.
Hope that helps!
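The same pattern can be tried outside Word with an ordinary regular expression. Word's wildcard syntax differs from regex syntax, so this Python sketch is a translation, not the Word expression itself: `[!,]{1,}` becomes `[^,]+`, and `^p` becomes a newline.

```python
import re

# Three runs of non-comma characters separated by commas, captured as group 1,
# followed by the comma that separates one triplet from the next.
TRIPLET = re.compile(r"([^,]+,[^,]+,[^,]+),")


def split_triplets(text):
    """Replace the comma after each triplet with a line break."""
    return TRIPLET.sub(r"\1\n", text)


print(split_triplets("x0,y0,z0,x1,y1,z1,xn,yn,zn"))
```

The last triplet has no comma after it, so it is left as it is, just as in the Word version.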

Count a specific word using a Function and a Sub

In VB.NET, I want to write a counting program using a Function and a Sub.
Form1 has a textbox for entering a date and a button to run the program.
I have a txt file that was exported from MS Excel, with sequential dates and times in column A.
From that txt file, I want to count the occurrences of a date (actually a string) such as "18-Jun-12".
The resulting count should be shown in a MsgBox from the Sub.
I have no idea how to link a Function and a Sub through a variable, because I am just a beginner.
Any help will be gratefully accepted.
If the fields are delimited by commas, you must be careful, since a field itself could contain a comma. Then you cannot distinguish the value from the delimiter. You could enclose the fields in quotes to mask such commas, but then you should use an available CSV parser anyway.
If the values never contain a comma and you want a simple solution, use File.ReadLines or File.ReadAllLines to read the lines and String.Split to get the fields of each line.
Here's a simple approach using a little bit of LINQ to count all lines which contain the searched date (as string):
' Requires Imports System.IO (File) and Imports System.Linq (Count).
Dim linesWithThatDate = From line In File.ReadLines("Path to File")
                        Where line.Split(","c)(0).Trim() = "18-Jun-12"
Dim count = linesWithThatDate.Count()
As an aside, if the user must enter a date you could use a DateTimePicker control instead. Then you should also use Date.Parse(line.Split(","c)(0).Trim()) or Date.TryParse to get a real date.
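To illustrate how a value-returning function and a reporting procedure fit together, here is a small Python sketch of the same count. It assumes the date is the first comma-separated field and is written as day, abbreviated English month name, and two-digit year, as in "18-Jun-12"; `count_date`, `show_count`, and the sample lines are hypothetical.

```python
from datetime import datetime


def count_date(lines, target, fmt="%d-%b-%y"):
    """Function: return how many lines have `target` as their first field."""
    wanted = datetime.strptime(target, fmt).date()
    count = 0
    for line in lines:
        first = line.split(",")[0].strip()
        try:
            if datetime.strptime(first, fmt).date() == wanted:
                count += 1
        except ValueError:
            continue                 # header rows and malformed dates are skipped
    return count


def show_count(lines, target):
    """Procedure: call the function and report its result."""
    print(f"{target} appears {count_date(lines, target)} time(s)")


sample = ["Date,Time", "18-Jun-12,10:00", "19-Jun-12,11:00", " 18-Jun-12 ,12:00"]
show_count(sample, "18-Jun-12")
```

The procedure never touches the file format; it only receives the number the function returns, which is the link between the two that the question asks about.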