Count Number of instances of text in a text file - VB - vb.net

I have a need to parse a large, delimited text file (28 million lines plus) and count the number of instances of a particular piece of text in the text file using VB 2015.
The structure of the lines is as follows:
123|WD7|ELU|SOD|010116
456|WD9|LFT|AST|010116
135|WD7|TFT|THY|010116
154|AED|ELU|SOD|030116
etc, etc....
My exact requirements are to identify each of the entries in delimited field 2 and delimited field 4 and then count the number of instances of each.
So from the lines above, the items in field 2 would be WD7, WD9 and AED and the number of instances would be WD7 x 2, WD9 x 1 and AED x 1.
Similarly, the items in field 4 would be SOD, AST, THY and SOD and the number of instances would be SOD x 2, THY x 1, AST x 1.
The items in field 2 and field 4 will not be known prior to parsing the file and indeed the parsing is to identify what text is contained in these fields and how many times.
Hopefully that is clear and many thanks for any guidance.
Steve

Try this (it needs Imports System.IO at the top of the code file):
Dim textfile As String = "C:/test/test.txt"
Dim stream_reader As New StreamReader(textfile)
Dim line As String
line = stream_reader.ReadLine()
Do While Not (line Is Nothing)
    Dim parts As String() = line.Split("|"c)
    ' parts is zero-based, so field 2 is parts(1) and field 4 is parts(3)
    If parts.Length >= 4 Then
        'display them in msgboxes or do whatever you like with them
        MsgBox(parts(1))
        MsgBox(parts(3))
    End If
    line = stream_reader.ReadLine()
Loop
stream_reader.Close()
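Since the goal is a count per distinct value rather than a display of each field, one possible approach is to keep a Dictionary keyed by the field text and increment it as each line is read. The following is only a minimal sketch under that assumption: the names FieldCounter, CountField and PrintCounts are made up for this example, and File.ReadLines is used so that a 28-million-line file is streamed instead of being loaded into memory at once.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module FieldCounter
    ' Counts how often each distinct value appears in one delimited field.
    ' fieldIndex is zero-based, so field 2 is index 1 and field 4 is index 3.
    Public Function CountField(lines As IEnumerable(Of String), fieldIndex As Integer) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)
        For Each line As String In lines
            Dim parts As String() = line.Split("|"c)
            ' Skip blank or short lines instead of failing on them
            If parts.Length <= fieldIndex Then Continue For
            Dim key As String = parts(fieldIndex)
            Dim current As Integer = 0
            counts.TryGetValue(key, current)   ' leaves current = 0 for a new key
            counts(key) = current + 1
        Next
        Return counts
    End Function

    ' Example driver (hypothetical): prints "value x count" for one field.
    Public Sub PrintCounts(path As String, fieldIndex As Integer)
        For Each pair As KeyValuePair(Of String, Integer) In CountField(File.ReadLines(path), fieldIndex)
            Console.WriteLine(pair.Key & " x " & pair.Value.ToString())
        Next
    End Sub
End Module
```

Calling PrintCounts once for index 1 and once for index 3 reads the file twice. If that is too slow for a file of this size, the same loop can update two dictionaries in a single pass.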

Related

Superscript Formatting Erased when Text is stored in String

Dim ST As String
ST = ActiveDocument.Paragraphs(1).Range.Text
In my document, Paragraphs(1) is actually 2 + 3² (the final 2 is formatted as superscript). However, with Debug.Print ST, the output is the plain text 2 + 32. Is there any way to store the data without losing the superscript and subscript formatting?
The objective behind this is to store 5 lines in ST(1 to 5) and then shuffle the order of the 5 lines.
1 - It is not clear how you want to capture the paragraphs, so I am assuming that you will have them selected. Modify this based on your requirement.
2 - It is also not clear what "shuffle" means, so I will assume that you want the order reversed. You will need to come up with your own logic for shuffling the paragraphs.
The FormattedText property can be used to replace a range with formatted text, so this should work for you:
Private Sub ShuffleSelectedParagraphs()
    ' Add an empty paragraph at the end of the document to receive the copies
    ActiveDocument.Content.InsertParagraphAfter
    Dim i As Long
    ' Walk the selection backwards so the copies come out in reverse order
    For i = Selection.Paragraphs.Count To 1 Step -1
        ActiveDocument.Content.Paragraphs.Last.Range.FormattedText = Selection.Paragraphs(i).Range.FormattedText
    Next
End Sub
You will need to select the paragraphs first and then run the Sub. It duplicates the selected paragraphs at the end of the document, in reverse order.
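If "shuffle" is meant as a random order rather than a reversal, one common choice is a Fisher-Yates shuffle over the paragraph numbers. The sketch below is written in VB.NET rather than Word VBA, and ShuffledOrder is a hypothetical helper name. In the macro, the loop would then copy Selection.Paragraphs(order(k)) for each k instead of counting down from Count to 1, and in VBA the Rnd function would stand in for the Random class.

```vbnet
Imports System

Module ParagraphOrder
    ' Returns the numbers 1..count in random order (Fisher-Yates shuffle).
    ' rng is passed in so callers can seed it for repeatable results.
    Public Function ShuffledOrder(count As Integer, rng As Random) As Integer()
        Dim order(count - 1) As Integer
        For i As Integer = 0 To count - 1
            order(i) = i + 1
        Next
        ' Walk backwards, swapping each slot with a random slot at or before it
        For i As Integer = count - 1 To 1 Step -1
            Dim j As Integer = rng.Next(i + 1)   ' 0 <= j <= i
            Dim tmp As Integer = order(i)
            order(i) = order(j)
            order(j) = tmp
        Next
        Return order
    End Function
End Module
```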

When extracting values from vba word table fields, I get box like special characters. How to remove them from a list?

Below is the snippet of code I used to extract a field value from a Word file with fields. The fields contain data from an XML document. I need to extract the "10 10" data shown in the image below as array(1)=10 and array(2)=10. The text "10 [box] 10" may have more than 2 values.
I would like to have the multiple values in an array. I tried Split with space and newline as the delimiter, but it doesn't split. Then I tried
left(x1,(len(x1)-1)/2) and right(x1,(len(x1)-1)/2)
to extract 2 values.
I am new to VBA in Word and need help extracting multiple values, or splitting on these boxes, which seem to be a table value.
Function calcinput()
    x1 = ActiveDocument.SelectContentControlsByTag("TWCAR_Nominal").Item(1).Range.Text
    x2 = ActiveDocument.SelectContentControlsByTag("TWCAR_NominalUnit").Item(1).Range.Text
    MsgBox x1 + x2
End Function
Above image shows the output.
Above image shows data to be extracted
Assuming that the content control shown in the screenshot is a Plain Text control with multiple paragraphs enabled, the values will be separated by a vertical tab character, Chr(11), which Word uses for a manual line break and which displays as a box. Splitting the text on this character will give you the array.
Dim quantity As Variant
quantity = Split(ActiveDocument.ContentControls(1).Range.Text, Chr(11))
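If it is not certain which control character the box stands for, a more forgiving variant is to split on any control character and drop the empty pieces. This is only a sketch in VB.NET (not Word VBA), and SplitOnControlChars is a hypothetical helper; it assumes that the values themselves never contain control characters.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module ControlText
    ' Splits text on any control character (Chr(11), Chr(13), Chr(7), ...)
    ' and drops empty pieces, so the exact separator need not be known up front.
    Public Function SplitOnControlChars(text As String) As String()
        Dim pieces As New List(Of String)
        Dim current As New StringBuilder()
        For Each ch As Char In text
            If Char.IsControl(ch) Then
                ' A separator ends the current value, if there is one
                If current.Length > 0 Then pieces.Add(current.ToString())
                current.Clear()
            Else
                current.Append(ch)
            End If
        Next
        If current.Length > 0 Then pieces.Add(current.ToString())
        Return pieces.ToArray()
    End Function
End Module
```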

Sort issue with integers [closed]

I am trying to make a leader board that puts highest scores at the top or left with this layout
99 - james,
90 - will,
80 - dan,
70 - phil,
60 - kevin,
570 - jim,
50 - ben,
40 - david,
30 - jose,
220 - peter,
20 - tony,
10 - nick,
The .Sort command doesn't work for numbers of 3 digits and up. I have a list that I am trying to sort, but it is not working.
This is what I am currently working with.
leaderboard.Sort()
leaderboard.Reverse()
It sorts numbers under 100 perfectly well; this is the only issue I have.
Dim leaderboard As New List(Of String)
Using Reader As New StreamReader("C:\Users\1111\OneDrive\Documents\Leaderboard.txt")
While Reader.EndOfStream = False
leaderboard.Add(Reader.ReadLine())
End While
End Using
leaderboard.Sort()
leaderboard.Reverse()
First I made a Structure as a template to hold your data. It has 2 properties. One to hold the Score and one to hold the name.
Private Structure Leader
Public Property Score As Integer
Public Property Name As String
End Structure
The code starts out by creating a new list of Leader (the name of the structure).
I used the File class from System.IO (you will need to add Imports System.IO at the top of the code file). .ReadAllLines returns an array of strings; each element is a single line from the text file.
Then we loop through each line, splitting the line on the hyphen. This gives you an array of strings with 2 elements. Before you convert the first element to an Integer, be sure to trim off any spaces. The second element of the array contains the name and also needs to be trimmed. I also replaced the comma with an empty string.
Finally, a bit of Linq magic orders the list in descending order by score into another list. Function(lead) is a function that takes each item in the original list and returns its Score property, which is used as the sort key. I called .ToList at the end so the ordered list could be displayed in a DataGridView.
Private Sub OPCode()
    Dim leaderboard As New List(Of Leader)
    Dim lines = File.ReadAllLines("leaderboard.txt")
    For Each line In lines
        Dim splitLine = line.Split("-"c)
        Dim sc = CInt(splitLine(0).Trim)
        Dim nm = splitLine(1).Trim.Replace(",", "")
        leaderboard.Add(New Leader With {.Score = sc, .Name = nm})
    Next
    Dim orderedLeaderbord = leaderboard.OrderByDescending(Function(lead) lead.Score).ToList
    DataGridView1.DataSource = orderedLeaderbord
End Sub
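The reason Sort() misplaces 570 and 220 is that a List(Of String) is compared character by character, so "570" lands between "60" and "50". If the lines need to stay as plain strings, a smaller change is to sort them by the parsed number. The sketch below assumes every line has the "score - name," shape from the question; SortByScore is a made-up name, and Integer.Parse will throw on a line whose score is not a whole number.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LeaderboardSort
    ' Sorts "score - name," lines by the numeric score, highest first.
    ' Lines without a hyphen (for example blank lines) are skipped.
    Public Function SortByScore(lines As IEnumerable(Of String)) As List(Of String)
        Return lines.
            Where(Function(l) l.Contains("-")).
            OrderByDescending(Function(l) Integer.Parse(l.Split("-"c)(0).Trim())).
            ToList()
    End Function
End Module
```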

Count lines before specified string of Text File? In VB

Is there a way to count the number of lines before a specific line / string in a text file?
For Example:
1
2
3
4
5
6
7
8
9
Say I want to count the number of lines before '8'...
How would I do that?
thanks!
Hopefully this is what you are looking for.
It reads all lines from the specified file and then finds the IndexOf the particular line (searchText). Because the index is zero-based, it is already the number of lines before the match (7 for the example above); add 1 only if you want the 1-based line number of the match. IndexOf returns -1 when the text is not found.
Dim lines = File.ReadAllLines("f:\sample.txt")
Dim searchText As String = "8"
MsgBox(Array.IndexOf(lines, searchText))
Here's another example using List.FindIndex(), which allows you to pass in a Predicate(T) to define how to make a match:
Dim fileName As String = "C:\Users\mikes\Documents\SomeFile.txt"
Dim lines As New List(Of String)(File.ReadAllLines(fileName))
Dim index As Integer = lines.FindIndex(Function(x) x.Equals("8"))
MessageBox.Show(index.ToString())
In the example above, we're looking for an exact match with "8", but you can make the predicate match whatever you like for more complex scenarios. Just make the function (the predicate) return True for what you want to be a match.
For example, a line containing "magic":
Function(x) x.ToLower().Contains("magic")
or a line that begins with a "FirstStep":
Function(x) x.StartsWith("FirstStep")
The predicate doesn't have to be a simple string function, it can be as complex as you like. Here's one that will find a string that ends with "UnicornFarts", but only on Wednesday and if Notepad is currently open:
Function(x) DateTime.Today.DayOfWeek = DayOfWeek.Wednesday AndAlso Process.GetProcessesByName("notepad").Length > 0 AndAlso x.EndsWith("UnicornFarts")
You get the idea...
Using a List, instead of an Array, is good for situations when you need to delete and/or insert lines into the contents before writing them back out to the file.
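For a large file, it may be preferable to stop reading as soon as the match is found instead of loading every line first. The following is a minimal sketch of that idea; CountLinesBefore is a hypothetical helper name, and it accepts the same kind of predicate as FindIndex.

```vbnet
Imports System
Imports System.Collections.Generic

Module LineSearch
    ' Returns how many lines come before the first line that satisfies match,
    ' or -1 when no line matches.
    Public Function CountLinesBefore(lines As IEnumerable(Of String), match As Predicate(Of String)) As Integer
        Dim count As Integer = 0
        For Each line As String In lines
            If match(line) Then Return count
            count += 1
        Next
        Return -1
    End Function
End Module
```

Passing File.ReadLines(fileName) (from System.IO) as the first argument streams the file lazily, so the lines after the match are never read.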

SSIS - Script Component, Split single row to multiple rows (Parent Child Variation)

Thanks in advance for your help. I need help writing an SSIS script component that splits a single delimited row into multiple rows. There were many helpful blogs and posts that I looked at, listed below:
http://beyondrelational.com/ask/public/questions/1324/ssis-script-component-split-single-row-to-multiple-rows-parent-child-variation.aspx
http://bi-polar23.blogspot.com/2008/06/splitting-delimited-column-in-ssis.html
However, I need a little extra help on coding to complete the project. Basically here's what I want to do.
Input data
ID  Item Name
1   Apple01,02,Banana01,02,03
2   Spoon1,2,Fork1,2,3,4
Output data
ParentID  ChildID  Item Name
1         1        Apple01
1         2        Apple02
1         3        Banana01
1         4        Banana02
1         5        Banana03
2         1        Spoon1
2         2        Spoon2
2         3        Fork1
2         4        Fork2
2         5        Fork3
2         6        Fork4
Below is my attempt at the code, but feel free to revise the whole thing if it's illogical. The SSIS output is set to asynchronous.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim posID As Integer, childID As Integer
    Dim delimiter As String = ","
    Dim txtHolder As String, suffixHolder As String
    Dim itemName As String = Row.ItemName
    Dim keyField As Integer = Row.ID
    If Not (String.IsNullOrEmpty(itemList)) Then
        Dim inputListArray() As String = _
            itemList.Split(New String() {delimiter}, _
            StringSplitOptions.RemoveEmptyEntries)
        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentID = keyField
            If item.Length >= 3 Then
                txtHolder = Trim(item)
                Output0Buffer.ItemName = txtHolder
            'when item length is less than 3, it's suffix
            Else
                suffixHolder = Trim(item)
                txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                    - Len(suffixHolder)) & suffixHolder.ToString()
                Output0Buffer.ItemName = txtHolder
            End If
        Next
    End If
End Sub
The current code produces the following output
ID  Item Name
1   Apple01
1   02
1   Banana01
1   02
1   03
2   Spoon1
2   2
2   Fork1
2   2
2   3
2   4
If I come across as pedantic in this response, it is not my intention. Based on the comment "I'm new at coding and having a problem troubleshooting" I wanted to walk through my observations and how I came to them.
Problem analysis
The desire is to split a single row into multiple output rows based on a delimited field associated to the row.
The code as it stands now generates the appropriate number of rows, so you do have the asynchronous (split) part of the script working, and that's a plus. What still needs to happen is 1) populate the Child ID column and 2) apply the item prefix to all subsequent rows when generating the child items.
I treat most every problem like that. What am I trying to accomplish? What is working? What isn't working? What needs to be done to make it work. Decomposing problems into smaller and smaller problems will eventually result in something you can do.
Code observations
Pasting in the supplied code resulted in an error that itemList was not declared. Based on usage, it seems that it was intended to be itemName.
After fixing that, you should notice the IDE indicating that you have 2 unused variables (posID, childID) and that the variable txtHolder is used before it has been assigned a value. A null reference exception could result at runtime. My coworker often remarks that warnings are errors that haven't grown up yet, so my advice to you as a fledgling developer is to pay attention to warnings unless you explicitly expect the compiler to warn you about said scenario.
Getting started
With a choice between solving the Child ID situation and the name prefix/suffix stuff, I'd start with the easy one, the child ID.
Generating a surrogate key
That's the fancy title phrase that, if you searched on it, would give you plenty of hits to ssistalk or sqlis or any of a number of fabulously smart bloggers. The devil, of course, is knowing what to search on. Nowhere do you compute or assign the child ID value to the stream, which of course is why it isn't showing up there.
We simply need to generate a monotonically increasing number which resets each time the source id changes. I am making an assumption that the inbound ID is unique in the incoming data like a sales invoice number would be unique and we are splitting out the items purchased. However if those IDs were repeated in the dataset, perhaps instead of representing invoice numbers they are salesperson id. Sales Person 1 could have another row in the batch selling vegetables. That's a more complex scenario and we can revisit if that better describes your source data.
There are two parts to generating our surrogate key (again, break problems down into smaller pieces). The first thing to do is make a thing that counts up from 1 to N. You have defined a childID variable to serve this. Initialize this variable to 1 and then increment it inside your For Each loop.
Now that we are counting, we need to push that value onto the output stream. Putting those two steps together would look like
childID = 1
For Each item As String In inputListArray
    Output0Buffer.AddRow()
    Output0Buffer.ParentId = keyField
    Output0Buffer.ChildId = childID
    ' There might be VB shorthand for ++
    childID = childID + 1
    ' (the item name logic stays here, unchanged)
Next
Run the package and success! Scratch the generate surrogate key off the list.
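For the more complex scenario mentioned earlier, where the same parent ID can appear on more than one input row, a single counter that is reset per row would hand out duplicate child IDs. One possible way around that, sketched here with a made-up ChildIdGenerator class, is to remember the last child ID issued for each parent. In a script component, an instance of it would presumably live in a class-level field so that it survives across calls to Input0_ProcessInputRow.

```vbnet
Imports System.Collections.Generic

' Hands out child IDs that count up from 1 separately for each parent ID,
' even when the same parent ID shows up on more than one input row.
Public Class ChildIdGenerator
    Private ReadOnly lastIssued As New Dictionary(Of Integer, Integer)

    Public Function NextChildId(parentId As Integer) As Integer
        Dim last As Integer = 0
        lastIssued.TryGetValue(parentId, last)   ' leaves last = 0 for a new parent
        lastIssued(parentId) = last + 1
        Return last + 1
    End Function
End Class
```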
String mashing
I don't know of a fancy term for what needs to be done in the other half of the problem, but I needed some title for this section. Given the source data, this one might be harder to get right. You've supplied values of Apple01, Banana01, Spoon1 and Fork1. It looks like there's a pattern there (a name concatenated with a code), but what is it? Your code indicates that if an entry is shorter than 3 characters it's a suffix, but how do you know what the base is? The first row uses a leading 0 and is two digits long, while the second row does not use a leading zero. This is where you need to understand your data. What is the rule for identifying the "code" part of the first row? Some possible algorithms:
1) Force your upstream data providers to provide consistent-length codes (I think this has worked once in my 13 years, but it never hurts to push back against the source).
2) Assuming the code is always digits, evaluate each character in reverse order, testing whether it can be cast to an integer (this handles variable-length codes).
3) Assume the second element in the split array will provide the length of the code. This is the approach you are taking with your code, and it actually works.
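The second option, scanning backwards for digits, could look roughly like the sketch below. BaseName is a hypothetical helper, and it rests on the stated assumption that the code is always a trailing run of digits; a suffix entry such as "02" would then be appended to the base name of the most recent full item.

```vbnet
Imports System

Module ItemNames
    ' Returns the part of value that precedes its trailing run of digits,
    ' i.e. the base name in front of a numeric code.
    Public Function BaseName(value As String) As String
        Dim cut As Integer = value.Length
        ' Move the cut point left while the character before it is a digit
        While cut > 0 AndAlso Char.IsDigit(value(cut - 1))
            cut -= 1
        End While
        Return value.Substring(0, cut)
    End Function
End Module
```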
I made no changes to make the generated item name work beyond fixing the local variable mix-up (itemName/itemList). The final code eliminates the warnings by removing posID and initializing txtHolder to an empty string.
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Dim childID As Integer
    Dim delimiter As String = ","
    Dim txtHolder As String = String.Empty, suffixHolder As String
    Dim itemName As String = Row.ItemName
    Dim keyField As Integer = Row.ID
    If Not (String.IsNullOrEmpty(itemName)) Then
        Dim inputListArray() As String = _
            itemName.Split(New String() {delimiter}, _
            StringSplitOptions.RemoveEmptyEntries)
        ' The inputListArray (our split out field)
        ' needs to generate values from 1 to N
        childID = 1
        For Each item As String In inputListArray
            Output0Buffer.AddRow()
            Output0Buffer.ParentId = keyField
            Output0Buffer.ChildId = childID
            ' There might be VB shorthand for ++
            childID = childID + 1
            If item.Length >= 3 Then
                txtHolder = Trim(item)
                Output0Buffer.ItemName = txtHolder
            Else
                'when item length is less than 3, it's suffix
                suffixHolder = Trim(item)
                txtHolder = Left(txtHolder.ToString(), Len(txtHolder) _
                    - Len(suffixHolder)) & suffixHolder.ToString()
                Output0Buffer.ItemName = txtHolder
            End If
        Next
    End If
End Sub
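Because the name logic is hard to step through inside a running package, it can help to pull it into a plain function that can be tried in a console project first. The sketch below is one way to do that, not part of the script component itself: ExpandItems is a made-up name, it trims each entry before checking its length (the script checks the length first), and the child ID is simply the position in the returned list plus 1.

```vbnet
Imports System
Imports System.Collections.Generic

Module ItemExpander
    ' Expands "Apple01,02,Banana01" into full item names using the same rule
    ' as the script: entries shorter than 3 characters are suffixes that
    ' replace the tail of the most recent full name.
    Public Function ExpandItems(itemList As String) As List(Of String)
        Dim result As New List(Of String)
        If String.IsNullOrEmpty(itemList) Then Return result
        Dim lastFull As String = String.Empty
        For Each raw As String In itemList.Split(New String() {","}, StringSplitOptions.RemoveEmptyEntries)
            Dim item As String = raw.Trim()
            If item.Length >= 3 Then
                lastFull = item
            ElseIf lastFull.Length >= item.Length Then
                ' Swap the tail of the previous name for this suffix
                lastFull = lastFull.Substring(0, lastFull.Length - item.Length) & item
            Else
                lastFull = item   ' no usable base name yet; keep the entry as-is
            End If
            result.Add(lastFull)
        Next
        Return result
    End Function
End Module
```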