Sort text file with delimiter - vb.net

I have an 800 KB text file in which every line is comma-delimited.
I am trying to sort the file by the first field of each line, which is a date.
When I run the code below it takes about 2 seconds to complete.
Something is really slowing it down. What do you guys see?
Dim sw As New Stopwatch
sw.Start()
Dim sMilli As Integer = 1000
Dim iSortedDates As New SortedDictionary(Of Date, String)
For Each line As String In IO.File.ReadAllLines(iFilePath)
Dim eachPart() As String = line.Split(","c)
Dim eachDate As Date = Date.Parse(eachPart(0)).AddMilliseconds(sMilli)
iSortedDates(eachDate) = line
If sMilli = 5000 Then sMilli = 1
sMilli += 1
Next
Dim iAllData As String = ""
For Each iSNew In iSortedDates.Keys
iAllData += iSortedDates(iSNew) & Environment.NewLine
Next
IO.File.WriteAllText(AppDomain.CurrentDomain.BaseDirectory & iFilePath, iAllData)
sw.Stop()
Debug.Print("Total Milliseconds: " & sw.Elapsed.TotalMilliseconds)

If you are targeting version 4 or higher of the Framework, you may be able to save some time by using IO.File.ReadLines instead of IO.File.ReadAllLines, as ReadLines doesn't make you wait until the whole file is read before you can start processing the lines.
You can avoid building that long iAllData string one line at a time by using iSortedDates.Values to create an array that can be written by IO.File.WriteAllLines. Each += copies the whole string built so far, which is likely where most of the time goes.
Dim sw As New Stopwatch
sw.Start()
Dim sMilli As Integer = 1000
Dim iSortedDates As New SortedDictionary(Of Date, String)
For Each line As String In IO.File.ReadLines(iFilePath)
Dim eachPart() As String = line.Split(","c)
Dim eachDate As Date = Date.Parse(eachPart(0)).AddMilliseconds(sMilli)
iSortedDates(eachDate) = line
If sMilli = 5000 Then sMilli = 1
sMilli += 1
Next
Dim iAllData() As String = iSortedDates.Values.ToArray
IO.File.WriteAllLines(AppDomain.CurrentDomain.BaseDirectory & iFilePath, iAllData)
sw.Stop()
Debug.Print("Total Milliseconds: " & sw.Elapsed.TotalMilliseconds)

I would look into LINQ if you can. Below is a quick query I have done for you: it reads all of the lines, drops the empty ones, and orders the rest by the first comma-separated field. Lines whose first field is not a date are kept and sorted along with the rest. Note that OrderBy here compares that field as a string, not as a parsed date. I tested this on a 6.73 MB file and it came out at 1.97 seconds, which I would call really quick.
You can use this anywhere:
Dim nDate As Date
Dim lines As List(Of String) = System.IO.File.ReadAllLines(yourfile).Where(Function(x) Date.TryParse(x.Split(","c)(0), nDate) OrElse Not String.IsNullOrEmpty(x)).OrderBy(Function(line) line.Split(",")(0)).ToList
IO.File.WriteAllText("FILE LOCATION", Concat(lines))
Function to return all lines as a single string:
Public Shared Function Concat(source As List(Of String)) As String
Dim sb As New System.Text.StringBuilder
For Each s As String In source
sb.AppendLine(s)
Next
Return sb.ToString()
End Function
P.S. Sorry if the LINQ query looks long; you can split it across several lines so it is easier to read if you want.
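If the lines should be ordered by the actual date value rather than by the text of the first field, a sketch along the following lines may help. It assumes the first field parses with the invariant culture; DateKey and SortByDate are hypothetical helper names, not part of the answer above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq

Module DateSortSketch
    ' Sort key: the parsed date of the first comma-separated field,
    ' or Date.MaxValue so that lines without a valid date sort last.
    Function DateKey(line As String) As Date
        Dim parsed As Date
        If Date.TryParse(line.Split(","c)(0), CultureInfo.InvariantCulture, DateTimeStyles.None, parsed) Then
            Return parsed
        End If
        Return Date.MaxValue
    End Function

    ' Drops blank lines and returns the rest ordered by DateKey.
    ' OrderBy is a stable sort, so lines with equal dates keep their original order.
    Function SortByDate(lines As IEnumerable(Of String)) As List(Of String)
        Dim nonBlank As IEnumerable(Of String) = lines.Where(Function(l) Not String.IsNullOrWhiteSpace(l))
        Return nonBlank.OrderBy(Function(l) DateKey(l)).ToList()
    End Function
End Module
```

The result can be passed straight to IO.File.WriteAllLines. Because the sort is stable, duplicate dates need no AddMilliseconds trick to keep every line.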

Related

VB.NET extract data from API

I am a beginner in VB.NET and I am trying to extract data from an API and add it to a ListView column, but I don't know how to extract the data.
[This is the API][1]
[1]: https://tmnf.exchange/api/tracks?author=lolsport&count=40&fields=TrackId%2CTrackName
It is an API for downloading race tracks for Trackmania.
The data is shown as follows: {"TrackId":9707620,"TrackName":"lolsport R444"},
Now what I need is the TrackIds and TrackNames.
I have two columns in my program that I want to sort them into, like so:
**TrackID** **TrackName**
9707620 lolsport R444
How can I do this? I googled a lot about regular expressions, but I can't seem to find anything that works.
Dim Data As String = "{""TrackId"":9707620,""TrackName"":""lolsport R444""}"
Dim dataaray() As String = Data.Split(",")
Dim dataval() As String
Dim fileloc As String = Environment.CurrentDirectory & "\Test.txt"
If Not File.Exists(fileloc) Then
File.Create(fileloc).Dispose()
Else
File.Delete(fileloc)
File.Create(fileloc).Dispose()
End If
Dim objwriter As New StreamWriter(fileloc, True)
Dim i As Integer = 0
Dim val As String = String.Empty
objwriter.WriteLine("**TrackID** **TrackName**")
For Each rw As String In dataaray
'Each piece looks like "key":value, so the value is the part after the colon
dataval = rw.Split(":"c)
val &= dataval(1).Replace("""", "").Replace("}", "") & vbTab
i += 1
If i = 2 Then 'Both fields of one record have been collected
objwriter.WriteLine(val.TrimEnd())
val = String.Empty
i = 0
End If
Next
objwriter.Close()
objwriter.Dispose()
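Since the question mentions regular expressions, here is a minimal sketch of that route. It assumes every record has TrackId immediately followed by TrackName, as in the sample, and that track names contain no escaped quotes. ExtractTracks is a hypothetical helper name; for anything less regular, a real JSON parser would be the safer choice.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TrackParseSketch
    ' Returns one (TrackId, TrackName) pair per record found in the text.
    ' Assumption: "TrackId" is directly followed by "TrackName" in each record.
    Function ExtractTracks(json As String) As List(Of KeyValuePair(Of Long, String))
        Dim result As New List(Of KeyValuePair(Of Long, String))
        ' Group 1 captures the numeric id, group 2 the text between the quotes.
        Dim pattern As String = """TrackId""\s*:\s*(\d+)\s*,\s*""TrackName""\s*:\s*""([^""]*)"""
        For Each m As Match In Regex.Matches(json, pattern)
            Dim id As Long = Long.Parse(m.Groups(1).Value)
            result.Add(New KeyValuePair(Of Long, String)(id, m.Groups(2).Value))
        Next
        Return result
    End Function
End Module
```

Each pair can then be added as a row, for example with ListView1.Items.Add(New ListViewItem(New String() {t.Key.ToString(), t.Value})), where ListView1 is a placeholder for a ListView in Details view with two columns.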

Creating multiple .txt files while restricting size of each

In my program, I collect bits of information on a massive scale, hundreds of thousands to millions of lines each. I am trying to limit each file I create to a certain size in order to be able to quickly open it and read the data. I am using a HashSet to collect all the data without duplicates.
Here's my code so far:
Dim Founds As HashSet(Of String)
Dim filename As String = (Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\Sorted_byKING\sorted" + Label4.Text + ".txt")
Using writer As New System.IO.StreamWriter(filename)
For Each line As String In Founds
writer.WriteLine(line)
Next
Label4.Text = Label4.Text + 1 'Increments sorted1.txt, sorted2.txt etc
End Using
So, my question is:
How do I go about saving, let's say 250,000 lines in a text file before moving to another one and adding the next 250,000?
First of all, do not use Labels to simply store values. You should use variables instead, that's what variables are for.
Another piece of advice: always use Path.Combine to concatenate paths. That way you don't have to worry about whether each part of the path ends with a separator character or not.
Now, to answer your question:
If you'd like to insert the text line by line, you can use something like:
Sub SplitAndWriteLineByLine()
Dim Founds As HashSet(Of String) 'Don't forget to initialize and fill your HashSet
Dim maxLinesPerFile As Integer = 250000
Dim fileNum As Integer = 0
Dim counter As Integer = 0
Dim filename As String = String.Empty
Dim writer As IO.StreamWriter = Nothing
For Each line As String In Founds
If counter Mod maxLinesPerFile = 0 Then
fileNum += 1
filename = IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop),
$"Sorted_byKING\sorted{fileNum.ToString}.txt")
If writer IsNot Nothing Then writer.Close()
writer = New IO.StreamWriter(filename)
End If
writer.WriteLine(line)
counter += 1
Next
If writer IsNot Nothing Then writer.Dispose() 'writer is still Nothing when Founds is empty
End Sub
However, if you will be inserting the text from the HashSet as is, you probably don't need to write line by line, instead you can write each "bunch" of lines at once. You could use something like the following:
Sub SplitAndWriteAll()
Dim Founds As HashSet(Of String) 'Don't forget to initialize and fill your HashSet
Dim maxLinesPerFile As Integer = 250000
Dim fileNum As Integer = 0
Dim filename As String = String.Empty
For i = 0 To Founds.Count - 1 Step maxLinesPerFile
fileNum += 1
filename = IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop),
$"Sorted_byKING\sorted{fileNum.ToString}.txt")
IO.File.WriteAllLines(filename, Founds.Skip(i).Take(maxLinesPerFile))
Next
End Sub

Why is each String being written to a different file?

I am trying to generate a size-based list of files. The current size being passed is 10 MB worth of file names per text file. Instead of counting up to 10 MB and then incrementing the version letter, it writes each file name to its own individual file. This is strange, as each file is ~150 KB, and I cannot figure out why it is reporting total as > number every time the code loops.
Private Function GenerateListsForSize(source As String, destination As String, name As String, number As Integer)
Dim files As ArrayList = New ArrayList
Dim total As Integer
Dim version As Char = "A"
Dim path As String
Dim counter As Integer = 0
Dim passTexts As ArrayList = New ArrayList
Dim infoReader As System.IO.FileInfo
For Each foundFile As String In My.Computer.FileSystem.GetFiles(source)
files.Add(foundFile)
Next
If files.Count > 1 Then 'If files exist in dir, count them and get how many lists
path = destination & "\" & name & version & ".txt"
Dim fs As FileStream = File.Create(path) 'creates the first text file
fs.Close()
passTexts.Add(path)
For Each foundfile As String In files
Using sw As StreamWriter = New StreamWriter(path)
Console.WriteLine(foundfile)
sw.WriteLine(foundfile)
End Using
infoReader = My.Computer.FileSystem.GetFileInfo(foundfile)
total = total + infoReader.Length
If total >= number Then 'If max file size is reached
version = Chr(Asc(version) + 1) 'Increments Version
path = destination & "\" & name & version & ".txt" 'Corrects path
fs = File.Create(path) 'creates the new text file with updated path
fs.Close()
passTexts.Add(path)
total = 0 'resets total
End If
Next
End If
Return passTexts
End Function
Every time through the loop, you open the file with a new StreamWriter, which overwrites the previous contents, so your file will only ever have one filename inside it. Instead of opening and writing every time through the loop, only write the file when you have accumulated all the filenames for it. I removed the calls to File.Create as they aren't necessary; the StreamWriter will create the file if it doesn't exist. I also changed the ArrayLists to List(Of String) since they're easier to work with. Also, be sure to turn Option Strict On. This code has not been tested, but it should get my point across. I hope I haven't misunderstood what you were trying to do.
Private Function GenerateListsForSize(source As String, destination As String, name As String, number As Integer) As List(Of String)
Dim files As New List(Of String)()
Dim filenamesToWrite As New List(Of String)()
Dim total As Long 'FileInfo.Length is a Long, so this compiles with Option Strict On
Dim version As Char = "A"c
Dim filename As String
Dim counter As Integer = 0
Dim passTexts As New List(Of String)()
Dim infoReader As System.IO.FileInfo
files.AddRange(My.Computer.FileSystem.GetFiles(source))
If files.Count > 1 Then 'If files exist in dir, count them and get how many lists
'Path.Combine is preferable to concatenating strings.
filename = Path.Combine(destination, String.Format("{0}{1}.txt", name, version))
passTexts.Add(filename)
For Each foundfile As String In files
filenamesToWrite.Add(foundfile)
infoReader = My.Computer.FileSystem.GetFileInfo(foundfile)
total = total + infoReader.Length
If total >= number Then 'If max file size is reached
'Only write when the list is complete for this batch.
Using sw As StreamWriter = New StreamWriter(filename)
For Each fname As String In filenamesToWrite
Console.WriteLine(fname)
sw.WriteLine(fname) 'Write each accumulated name, not the current foundfile
Next
End Using
version = Chr(Asc(version) + 1) 'Increments Version
filename = Path.Combine(destination, String.Format("{0}{1}.txt", name, version)) 'corrects path
passTexts.Add(filename) 'IS THIS A DUPLICATE????
total = 0 'resets total
filenamesToWrite.Clear() 'clear the list of file names to write
End If
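Next
'Write whatever is left over from the final, partial batch.
If filenamesToWrite.Count > 0 Then
File.WriteAllLines(filename, filenamesToWrite)
End If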
End If
Return passTexts
End Function
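The batching idea on its own, separated from the file system calls, could be sketched as follows. BatchBySize is a hypothetical helper that takes (name, size) pairs instead of reading FileInfo, so the grouping logic can be checked in isolation; it is an illustration of the approach, not the answerer's code.

```vbnet
Imports System
Imports System.Collections.Generic

Module BatchSketch
    ' Groups names into batches; a batch is closed as soon as the running
    ' size total reaches the limit. A final partial batch is kept as well.
    Function BatchBySize(files As IEnumerable(Of KeyValuePair(Of String, Long)), limit As Long) As List(Of List(Of String))
        Dim batches As New List(Of List(Of String))
        Dim current As New List(Of String)
        Dim total As Long = 0
        For Each f As KeyValuePair(Of String, Long) In files
            current.Add(f.Key)
            total += f.Value
            If total >= limit Then
                batches.Add(current)
                current = New List(Of String)
                total = 0
            End If
        Next
        ' Do not lose the names collected after the last full batch.
        If current.Count > 0 Then batches.Add(current)
        Return batches
    End Function
End Module
```

Each inner list would then be written to its own text file in one go, for example with File.WriteAllLines.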

How do I create a loop statement that finds the specific amount of characters in a string?

Beginner here, so bear with me. I apologize in advance for any mistakes.
It's some homework I'm having a bit of trouble with.
Overall goal: output the number of times specific characters occur in a string, using a loop statement. For example, if the user wants to find how many times "I" occurs in "Why did the chicken cross the road?", the answer should be 2.
1) The form/GUI has 1 multiline TextBox and 1 button titled "Search".
2) The user enters/copies/pastes text into the TextBox and clicks the "Search" button.
3) The Search button opens an InputBox where the user types in the character(s) they want to search for in the TextBox, then presses "OK".
4) (Where I really need help) Using a loop statement, the program searches for and counts the number of times the text entered into the InputBox appears in the text inside the multiline TextBox, then displays that count with MessageBox.Show.
All I have so far:
Private Sub Search_btn_Click(sender As System.Object, e As System.EventArgs) Handles Search_btn.Click
Dim counterInt As Integer = 0
Dim charInputStr As String
charInputStr = CStr(InputBox("Enter Search Characters", "Search"))
I would use the String.IndexOf(String, Integer) method for that. A simple example of the concept:
Dim input As String = "Test input string for Test and Testing the Test"
Dim search As String = "Test"
Dim count As Integer = -1
Dim index As Integer = 0
Do
index = input.IndexOf(search, index) + 1
count += 1
Loop While index > 0
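count is initialized to -1 because the body of a Do ... Loop While always runs at least once: the final, unsuccessful search still increments count, so starting at -1 leaves it at 0 when the pattern does not occur in the input string at all.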
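The same IndexOf idea can be wrapped in a function with a plain While loop, which avoids the -1 starting value. This is a sketch only: CountOccurrences and its optional comparison parameter are hypothetical names, and like the loop above it counts overlapping matches because the search resumes one character after each hit.

```vbnet
Imports System

Module CountSketch
    ' Counts how often search occurs in text, including overlapping matches.
    ' An empty search string is treated as "no matches" to avoid looping forever.
    Function CountOccurrences(text As String, search As String,
                              Optional comparison As StringComparison = StringComparison.Ordinal) As Integer
        If String.IsNullOrEmpty(text) OrElse String.IsNullOrEmpty(search) Then Return 0
        Dim count As Integer = 0
        Dim index As Integer = text.IndexOf(search, comparison)
        While index >= 0
            count += 1
            ' Resume one character after the last hit; index + 1 never exceeds text.Length.
            index = text.IndexOf(search, index + 1, comparison)
        End While
        Return count
    End Function
End Module
```

For the homework example, CountOccurrences("Why did the chicken cross the road?", "I", StringComparison.OrdinalIgnoreCase) gives 2, while the default case-sensitive comparison gives 0 because the sentence contains no upper-case I.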
Try this code:
Dim input As String = "Test input string for Test and Testing the Test"
Dim search() As String = {"Te"}
MsgBox(input.Split(input.Split(search, StringSplitOptions.None), StringSplitOptions.RemoveEmptyEntries).Count)
Concept: keep incrementing the count for as long as the input contains the search string. Each time it does, replace the first occurrence with String.Empty (String.Contains() always searches from the first index, that is 0).
Dim input As String = "Test input string for Test and Testing the Test"
Dim search As String = "T"
Dim count As Integer = 0
While input.Contains(search) : input = New Regex(search).Replace(input, String.Empty, 1) : count += 1 : End While
MsgBox(count)
Edit:
Another solution:
Dim Input As String = "Test input string for Test and Testing the Test"
Dim Search As String = "Test"
MsgBox((Input.Length - Input.Replace(Search, String.Empty).Length) / Search.Length)
Try this code. It's untested, but I know my VB :)
Function lol(ByVal YourLine As String, ByVal YourSearch As String)
Dim num As Integer = 0
Dim y = YourLine.ToCharArray
Dim z = y.Count
Dim x = 0
Do
Dim l = y(x)
If l = YourSearch Then
num = num + 1
End If
x = x + 1
Loop While x < z
Return num
End Function
It's a function that uses its own counter. For every character in the string it checks whether that character is the one you have set (YourSearch), and then it returns the number of matches it found. So in your case it would return 2, because there are two i's in your line.
Hope this helps!
EDIT:
This only works if you are searching for individual Characters not words
You can try with something like this:
Dim sText As String = TextBox1.Text
Dim searchChars() As Char = New Char() {"i"c, "a"c, "x"c}
Dim index, iCont As Integer
index = sText.IndexOfAny(searchChars)
While index >= 0
iCont += 1
index = sText.IndexOfAny(searchChars, index + 1)
End While
MessageBox.Show("Characters found " & iCont & " times in text")
If you want to search for words and the times each one is appearing try this:
Dim text As String = TextBox1.Text
Dim wordsToSearch() As String = New String() {"Hello", "World", "foo"}
Dim words As New List(Of String)()
Dim findings As Dictionary(Of String, List(Of Integer))
'Dividing into words
words.AddRange(text.Split(New String() {" ", Environment.NewLine()}, StringSplitOptions.RemoveEmptyEntries))
findings = SearchWords(words, wordsToSearch)
Console.WriteLine("Number of 'foo': " & findings("foo").Count)
With this function used:
Private Function SearchWords(ByVal allWords As List(Of String), ByVal wordsToSearch() As String) As Dictionary(Of String, List(Of Integer))
Dim dResult As New Dictionary(Of String, List(Of Integer))()
For Each s As String In wordsToSearch
dResult.Add(s, New List(Of Integer))
'Start a fresh search for every word and stop once IndexOf returns -1
Dim i As Integer = allWords.IndexOf(s)
While i >= 0
dResult(s).Add(i)
i = allWords.IndexOf(s, i + 1)
End While
Next
Return dResult
End Function
You will have not only the number of occurrences but also their index positions in the word list, grouped conveniently in a Dictionary.

How to check if lines in string are separated by space?

I'm building a program that gets the publisher of a book by scanning its title page with OCR … Since publishers are always at the bottom of the title page, I'm thinking that detecting lines that are separated by space is a solution, but I don't know how to test for that. Here is my code:
Dim builder As New StringBuilder()
Dim reader As New StringReader(txtOCR.Text)
Dim iCounter As Integer = 0
While True
Dim line As String = reader.ReadLine()
If line Is Nothing Then Exit While
'i want to put the condition here
End While
txtPublisher.Text = builder.ToString()
Do you mean empty lines? Then you can do this:
Dim bEmpty As Boolean
And then inside the loop:
If line.Trim().Length = 0 Then
bEmpty = True
Else
If bEmpty Then
'...
End If
bEmpty = False
End If
Why not do the following: from the bottom, go up until you find the first non-empty line (no idea how the OCR works … maybe the bottom-most line is always non-empty, in which case this is redundant). In the next step, go up until the first empty line. The text in the middle is the publisher.
You don’t need the StringReader for that:
Dim lines As String() = txtOCR.Text.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
Dim bottom As Integer = lines.Length - 1
' Find bottom-most non-empty line.
Do While bottom > 0 AndAlso String.IsNullOrWhiteSpace(lines(bottom))
bottom -= 1
Loop
' Find empty line above that (stop at the start of the text if there is none).
Dim top As Integer = bottom - 1
Do Until top < 0 OrElse String.IsNullOrWhiteSpace(lines(top))
top -= 1
Loop
' Copy lines top + 1 .. bottom into an array of exactly bottom - top elements.
Dim publisherSubset(bottom - top - 1) As String
Array.Copy(lines, top + 1, publisherSubset, 0, bottom - top)
Dim publisher As String = String.Join(Environment.NewLine, publisherSubset)
But to be honest I don’t think this is a particularly good approach. It’s inflexible and doesn’t cope well with unexpected formatting. I’d instead use a regular expression to describe what the publisher string (and its context) looks like. And maybe even that isn’t enough and you have to put some thought into describing the whole page to extrapolate which of the bits is the publisher.
Assuming the publisher is always on the last line and always comes after an empty line, then perhaps the following?
Dim lines As New List(Of String)
Dim previousLine As String = ""
Using reader As New StringReader(txtOCR.Text)
Dim currentLine As String = reader.ReadLine()
Do While currentLine IsNot Nothing
'Keep only non-blank lines that directly follow a blank line (or start the text)
If String.IsNullOrWhiteSpace(previousLine) AndAlso Not String.IsNullOrWhiteSpace(currentLine) Then lines.Add(currentLine)
previousLine = currentLine
currentLine = reader.ReadLine()
Loop
End Using
txtPublisher.Text = lines.LastOrDefault()
To ignore whether the previous line is blank/empty:
Dim lines As New List(Of String)
Using reader As New StringReader(txtOCR.Text)
Dim currentLine As String = reader.ReadLine()
Do While currentLine IsNot Nothing
lines.Add(currentLine)
currentLine = reader.ReadLine()
Loop
End Using
txtPublisher.Text = lines.LastOrDefault()