Speed up large string data parser function - vb.net

I currently have a file with 1 million characters, about 1 MB in size. I am trying to parse the data with this old function, which still works but is very slow. The data looks like this:
start0end
start1end
start2end
start3end
start4end
start5end
start6end
The code takes about 5 painful minutes to process the whole data set.
Any pointers and suggestions are appreciated.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim sFinal = ""
Dim strData = textbox.Text
Dim strFirst = "start"
Dim strSec = "end"
Dim strID As String, Pos1 As Long, Pos2 As Long, strCur As String = ""
Do While InStr(strData, strFirst) > 0
Pos1 = InStr(strData, strFirst)
strID = Mid(strData, Pos1 + Len(strFirst))
Pos2 = InStr(strID, strSec)
If Pos2 > 0 Then
strID = Microsoft.VisualBasic.Left(strID, Pos2 - 1)
End If
If strID <> strCur Then
strCur = strID
sFinal += strID & ","
End If
strData = Mid(strData, Pos1 + Len(strFirst) + 3 + Len(strID))
Loop
End Sub

The reason it is so slow is that you keep destroying and recreating a 1 MB string over and over. Strings are immutable, so strData = Mid(strData... creates a new string and copies the remainder of the 1 MB of data into it on every pass through the loop. sFinal += ... likewise builds a new string each time. Interestingly, even VB6 allowed for a progressive index, which avoids the copying.
I would have processed the disk file LINE BY LINE and plucked out the info as it was read (see StreamReader.ReadLine) to avoid working with a 1 MB string. Pretty much the same method could be used there.
' 1 MB textbox data (!?)
Dim sData As String = TextBox1.Text
' start/stop - probably fake
Dim sStart As String = "start"
Dim sStop As String = "end"
' result (StringBuilder requires Imports System.Text at the top of the file)
Dim sbResult As New StringBuilder
' progressive index
Dim nNDX As Integer = 0
' shortcut at least as far as typing and readability
Dim MagicNumber As Integer = sStart.Length
' NEXT index of start/stop after nNDX
Dim i As Integer = 0
Dim j As Integer = 0
' loop as long as another start marker is found
i = sData.IndexOf(sStart, nNDX) ' first start index, -1 if none
Do While (nNDX < sData.Length) AndAlso (i >= 0)
j = sData.IndexOf(sStop, i + MagicNumber) ' stop index
If j < 0 Then Exit Do ' no closing marker: stop instead of throwing
' Extract and append bracketed substring
sbResult.Append(sData.Substring(i + MagicNumber, j - (i + MagicNumber)))
' add a cute comma
sbResult.Append(",")
nNDX = j ' where we start next time
i = sData.IndexOf(sStart, nNDX)
Loop
' remove last comma (if anything was found)
If sbResult.Length > 0 Then sbResult.Remove(sbResult.Length - 1, 1)
' show my work
Console.WriteLine(sbResult.ToString)
EDIT: Small mod for the ad hoc test data
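The line-by-line approach mentioned above can be sketched as follows. This is only a minimal sketch, assuming one start...end record per line as in the sample data; the module name LineParser and the helper ExtractIds are hypothetical. It reads from any TextReader, so a StreamReader over the file can replace the in-memory StringReader used for the demo.

```vb
Imports System
Imports System.IO
Imports System.Text

Module LineParser
    ' Hypothetical helper: collects the text between startTag and endTag
    ' on each line of the reader and joins the pieces with commas.
    Function ExtractIds(reader As TextReader, startTag As String, endTag As String) As String
        Dim result As New StringBuilder()
        Dim line As String = reader.ReadLine()
        Do While line IsNot Nothing
            Dim i As Integer = line.IndexOf(startTag, StringComparison.Ordinal)
            If i >= 0 Then
                Dim idStart As Integer = i + startTag.Length
                Dim j As Integer = line.IndexOf(endTag, idStart, StringComparison.Ordinal)
                If j >= 0 Then
                    If result.Length > 0 Then result.Append(","c)
                    ' Append only the characters between the two markers
                    result.Append(line, idStart, j - idStart)
                End If
            End If
            line = reader.ReadLine()
        Loop
        Return result.ToString()
    End Function

    Sub Main()
        ' In-memory demo; for a real file pass New StreamReader(path) instead.
        Dim sample As String = "start0end" & Environment.NewLine &
                               "start1end" & Environment.NewLine &
                               "start2end"
        Using reader As New StringReader(sample)
            Console.WriteLine(ExtractIds(reader, "start", "end")) ' prints 0,1,2
        End Using
    End Sub
End Module
```

Only one line is held in memory at a time, so the 1 MB string is never built or copied.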


Best way to optimise For Loops and Do Until Loops [closed]

I have the following code that searches through folder directories listed in a DataGridView table and puts all files of the wanted format into a list. It also gathers a list of their last modified dates for later use in the application.
The code works, but it is sore on the eyes. I want to tidy up the following loops to improve efficiency - what I mean is that I have a For loop within a For loop that creates the list of filenames, then I have two separate Do Until loops that search through the list from start to finish to pick out file names that need adjustment.
I would be very interested to learn a better way of achieving the same result, as my knowledge of efficiency in coding is quite elementary. Basically, can this be done in one or two loops, as the idea of looping through the Lists twice seems inefficient?
Public Class
Private Sub btnDirectory_Click(sender As Object, e As EventArgs) Handles btnDirectory.Click
Dim FileNames As New List(Of String)
Dim FileDates As New List(Of Date)
Dim DocNo As String
Dim rowCheck As String
Dim ProjectNo As String = "1111"
Dim FileNameCheck As String
Dim str As String
Dim k As Integer = 0
Dim i As Integer
Dim j As Integer
Dim CorrectType As Boolean = False
'The first loop grabs all files of the wanted format from a datagridview table containing all directories to be checked
For Each rw In Background.Table1.Rows
rowCheck = Background.Table1(0, k).Value
If Not String.IsNullOrEmpty(rowCheck) Then
For Each file As String In My.Computer.FileSystem.GetFiles(Background.Table1(0, k).Value)
CorrectType = False
FileNameCheck = IO.Path.GetFileNameWithoutExtension(file)
If FileNameCheck.Contains(ProjectNo) AndAlso FileNameCheck.Contains("-") AndAlso Not String.IsNullOrEmpty(FileNameCheck) AndAlso FileNameCheck.Contains(" ") Then
DocNo = FileNameCheck.Substring(0, FileNameCheck.IndexOf(" "))
If FileNameCheck.Substring(0, FileNameCheck.IndexOf("-")) = ProjectNo AndAlso CountLetters(DocNo) = 3 Then
CorrectType = True
End If
End If
If CorrectType = True Then
FileNames.Add(FileNameCheck)
FileDates.Add(IO.File.GetLastWriteTime(file))
End If
Next
End If
k += 1
Next
'The next loop tidies up the file formats that contain a "-00-" in their names
j = FileNames.Count
i = 0
Do
str = FileNames(i)
If str.Contains("-00-") Then
FileNames(i) = RemoveChar(str, "-00-") ' RemoveChar is a function that replaces "-00-" with a "-"
End If
i += 1
Loop Until i = j
i = 0
j = FileNames.Count
'Finally, this loop checks that no two files have the exact same name, and gets rid of one of them if that is the case
Do
Dim st1 As String = FileNames(j - 1)
Dim st2 As String = FileNames(j - 2)
If st1 = st2 Then
FileNames.RemoveAt(j - 1)
FileDates.RemoveAt(j - 1)
End If
j -= 1
Loop Until j = 1
End Sub
End Class
The code is certainly hard on the eyes.
The For Each rw loop does not use rw. You could replace it with an indexed loop such as the one below (row indexes start at 0), and drop the separate k counter and its k += 1:
For k = 0 To Background.Table1.Rows.Count - 1
' Do things here
Next k
You assign rowCheck and use it once, but you missed the opportunity to reuse it in the For Each file line.
Where you have CorrectType = True you can easily place the corresponding code instead.
If FileNameCheck.Substring(0, FileNameCheck.IndexOf("-")) = ProjectNo AndAlso CountLetters(DocNo) = 3 Then
CorrectType = True
End If
End If
If CorrectType = True Then
FileNames.Add(FileNameCheck)
FileDates.Add(IO.File.GetLastWriteTime(file))
End If
becomes:
If FileNameCheck.Substring(0, FileNameCheck.IndexOf("-")) = ProjectNo AndAlso CountLetters(DocNo) = 3 Then
FileNames.Add(FileNameCheck)
FileDates.Add(IO.File.GetLastWriteTime(file))
End If
I must admit, the next two loops made my eyes bleed (figuratively, not literally).
j = FileNames.Count
i = 0
Do
str = FileNames(i)
If str.Contains("-00-") Then
FileNames(i) = RemoveChar(str, "-00-") ' RemoveChar is a function that replaces "-00-" with a "-"
End If
i += 1
Loop Until i = j
becomes
For i = 0 To FileNames.Count - 1 ' List indexes run from 0 to Count - 1
str = FileNames(i)
If str.Contains("-00-") Then
FileNames(i) = RemoveChar(str, "-00-") ' RemoveChar is a function that replaces "-00-" with a "-"
End If
Next i
And
i = 0
j = FileNames.Count
'Finally, this loop checks that no two files have the exact same name, and gets rid of one of them if that is the case
Do
Dim st1 As String = FileNames(j - 1)
Dim st2 As String = FileNames(j - 2)
If st1 = st2 Then
FileNames.RemoveAt(j - 1)
FileDates.RemoveAt(j - 1)
End If
j -= 1
Loop Until j = 1
becomes
'Finally, this loop checks that no two adjacent files have the exact same name, and gets rid of one of them if that is the case
For j = FileNames.Count - 1 To 1 Step -1 ' stop at 1: each item is compared with its predecessor at j - 1
Dim st1 As String = FileNames(j)
Dim st2 As String = FileNames(j - 1)
If st1 = st2 Then
FileNames.RemoveAt(j)
FileDates.RemoveAt(j)
End If
Next j
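To get down to a single loop, the clean-up and the duplicate check can be done at the moment a name is added, so the two follow-up loops disappear. The following is a minimal sketch under assumptions: AddCleaned is a hypothetical helper, String.Replace stands in for the asker's RemoveChar function, and, like the original loop, it only drops a name that repeats the one kept just before it.

```vb
Imports System
Imports System.Collections.Generic

Module FileListCleanup
    ' Hypothetical helper: normalises the name, then adds name and date
    ' unless the name repeats the entry that was kept just before it.
    Sub AddCleaned(names As List(Of String), dates As List(Of Date),
                   rawName As String, modified As Date)
        Dim cleaned As String = rawName.Replace("-00-", "-")
        If names.Count > 0 AndAlso names(names.Count - 1) = cleaned Then Return
        names.Add(cleaned)
        dates.Add(modified)
    End Sub

    Sub Main()
        Dim names As New List(Of String)
        Dim dates As New List(Of Date)
        ' Made-up sample names, for illustration only
        AddCleaned(names, dates, "1111-00-ABC-001 Plan", #1/1/2020#)
        AddCleaned(names, dates, "1111-ABC-001 Plan", #1/2/2020#)
        AddCleaned(names, dates, "1111-ABC-002 Plan", #1/3/2020#)
        Console.WriteLine(String.Join(", ", names))
        ' prints 1111-ABC-001 Plan, 1111-ABC-002 Plan
    End Sub
End Module
```

Inside the For Each file loop, a call to AddCleaned would take the place of the FileNames.Add and FileDates.Add lines.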

In string search for number to determine next in sequence?

I have a list of build numbers (e.g. R1079-AAA-001, ...-002 etc.) in which the value of "R1079" changes depending on the machine being used. What I want to do is search through the list to determine the last used build number (the last 3 digits) in relation to the specific machine I intend to use. I then need to add one and create a new log for the new build i.e. the last R1079 build was 056, therefore the new one is 057.
Currently the THEORY I have is an in string search for the machine number followed by a number search and store in the string and converted to integer. This is then added into a dynamic array and the maximum found when the loop is complete. One is added to this integer and the new name placed into a cell.
However, the code I have doesn't work so I assume I am missing things/got it all wrong.
Code below:
Sub test()
Dim x As String
Dim n As Integer
Dim i As Integer
Dim Machine_EBM As String
Dim retval As String
Dim retvalint As Integer
Dim LastBuild As Integer
Dim NextBuild As Integer
Dim myarr() As Integer
Machine = "R1079"
x = Cells("A1").Value 'get the first string in the list
n = 1
Do Until x = ""
If InStr(x, Machine) > 0 Then 'search for machine in string
For i = 6 To Len(Str) 'search for numbers at end of string
If Mid(x, i, 1) >= "0" And Mid(x, i, 1) <= "9" Then
retval = retval + Mid(s, i, 1) 'store numbers
End If
Next i
retvalint = CInt(retval) ' convert to integer
ReDim Preserve myarr(n)
myarr(n) = retvalint ' store integer value in array
n = n + 1
End If
Loop
LastBuild = Worksheet.Function.Max(myarr(n)) ' determine maximum array value
NewBuild = LastBuild + 1 'add one to the value
Range("C1").Select
ActiveCell = Machine = "-AAA-" + NewBuild 'input new build number
End Sub
I am fairly new to VBA and self taught so I realise there may be a lot of errors here that I am missing. Any help is appreciated!
Thanks,
Charlie
Here is a small piece of code that builds the next build number from a given build number.
I have tested the code and it gives the right answer, so you can use it.
Public Sub getBuildNo()
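'Each variable needs its own As String; otherwise only the last one is a String
Dim machineCode As String, lastBuildCode As String, newBuildCode As String
Dim buildNo As Integer
'Set machine code
machineCode = "R1079"
'Set last build code
lastBuildCode = Range("A1").Value
'Get last build no (the last 3 characters, converted to a number)
buildNo = CInt(Right(lastBuildCode, 3))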
'Increase 1
buildNo = buildNo + 1
'Get new build No
newBuildCode = machineCode & "-AAA-"
'adding prefix 0s for getting like (001, 002, 025, etc.)
If buildNo < 10 Then
newBuildCode = newBuildCode & "00" & buildNo
ElseIf buildNo < 100 Then
newBuildCode = newBuildCode & "0" & buildNo
Else
newBuildCode = newBuildCode & buildNo
End If
'show new code
Range("C1") = newBuildCode
End Sub

VBA to load very large file in one go (no buffering)

I am experiencing an unexpected vb limitation on the string max size, as explained in this post:
VBA unexpected reach of string size limit
While I was expecting to be able to load files up to 2GB (2^31 char) using open path for binary and get function, I get an out of string space error when I try to load a string larger than 255,918,061 characters.
I managed to work around this issue by buffering the input stream of Get. The problem is that I need to load the file as an array of strings by splitting the buffer on vbCrLf characters.
This requires building the array line by line. Moreover, since I cannot be sure whether the buffer ends on a line break or not, I need additional operations. This solution is time- and memory-consuming. Loading a 300 MB file with this code costs 900 MB (!) of memory in Excel. Is there a better solution?
Here is my code:
Function Load_File(path As String) As Variant
Dim MyData As String, FNum As Integer
Dim LenRemainingBytes As Long
Dim BufferSizeCurrent As Long
Dim FileByLines() As String
Dim CuttedLine As Boolean
Dim tmpSplit() As String
Dim FinalSplit() As String
Dim NbOfLines As Long
Dim LastLine As String
Dim count As Long, i As Long
Const BufferSizeMax As Long = 100000
FNum = FreeFile()
Open path For Binary As #FNum
LenRemainingBytes = LOF(FNum)
NbOfLines = FileNbOfLines(path)
ReDim FinalSplit(NbOfLines)
CuttedLine = False
Do While LenRemainingBytes > 0
MyData = ""
If LenRemainingBytes > BufferSizeMax Then
BufferSizeCurrent = BufferSizeMax
Else
BufferSizeCurrent = LenRemainingBytes
End If
MyData = Space$(BufferSizeCurrent)
Get #FNum, , MyData
tmpSplit = Split(MyData, vbCrLf)
If CuttedLine Then
count = count - 1
tmpSplit(0) = LastLine & tmpSplit(0)
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
Else
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
End If
Erase tmpSplit
LastLine = Right(MyData, Len(MyData) - InStrRev(MyData, vbCrLf) - 1)
CuttedLine = Len(LastLine) > 1
LenRemainingBytes = LenRemainingBytes - BufferSizeCurrent
Loop
Close FNum
Load_File = FinalSplit
Erase FinalSplit
End Function
Here FileNbOfLines is a function that efficiently returns the number of line breaks in the file.
Edit:
My Needs are:
To look for a specific string within the file
To get a specific number of lines coming after this string
Here you go, not pretty but should give you the general concept:
Sub GetLines()
Const fileName As String = "C:\Users\bloggsj\desktop\testfile.txt"
Const wordToFind As String = "FindMe"
Dim lineStart As String
Dim lineCount As String
Dim linesAfterWord As Long
With CreateObject("WScript.Shell")
lineCount = .Exec("CMD /C FIND /V /C """" """ & fileName & """").StdOut.ReadAll
lineStart = Split(.Exec("CMD /C FIND /N """ & wordToFind & """ """ & fileName & """").StdOut.ReadAll, vbCrLf)(2)
End With
linesAfterWord = CLng(Trim(Mid(lineCount, InStrRev(lineCount, ":") + 1))) - CLng(Trim(Mid(lineStart, 2, InStr(lineStart, "]") - 2)))
Debug.Print linesAfterWord
End Sub
This uses CMD to count the number of lines in the file, then finds the line on which the word appears, and subtracts one from the other to give you the number of lines after the word.
Answer: Yes, using ReadAll from FSO should do the job.
Best answer: Just avoid it !
My needs were:
Identify a specific string within the file
Extract a certain number of lines after this string
As long as you know exactly how much data you want to extract, and assuming this amount of data is below the VBA string size limit (!), the code below is what does the job fastest.
Computation time is reduced further by using binary comparison of strings (InStrB on ANSI byte strings). My code is as follows:
Function GetFileLines(path As String, str As String, NbOfLines As Long) As String()
Const BUFSIZE As Long = 100000
Dim StringFound As Boolean
Dim lfAnsi As String
Dim strAnsi As String
Dim F As Integer
Dim BytesLeft As Long
Dim Buffer() As Byte
Dim strBuffer As String
Dim BufferOverlap As String
Dim PrevPos As Long
Dim NextPos As Long
Dim LineCount As Long
Dim data As String
F = FreeFile(0)
strAnsi = StrConv(str, vbFromUnicode) 'Looked String
lfAnsi = StrConv(vbLf, vbFromUnicode) 'LineBreak character
Open path For Binary Access Read As #F
BytesLeft = LOF(F)
ReDim Buffer(BUFSIZE - 1)
'Overlapping buffer is 3/2 times the size of strBuffer
'(two bytes per character)
BufferOverlap = Space$(Int(3 * BUFSIZE / 4))
StringFound = False
Do Until BytesLeft = 0
If BytesLeft < BUFSIZE Then ReDim Buffer(BytesLeft - 1)
Get #F, , Buffer
strBuffer = Buffer 'Binary copy of bytes.
BytesLeft = BytesLeft - LenB(strBuffer)
Mid$(BufferOverlap, Int(BUFSIZE / 4) + 1) = strBuffer 'Overlapping Buffer
If Not StringFound Then 'Looking for the the string
PrevPos = InStrB(BufferOverlap, strAnsi) 'Position of the looked string within the buffer
StringFound = PrevPos <> 0
If StringFound Then strBuffer = BufferOverlap
End If
If StringFound Then 'When string is found, loop until NbOfLines
Do Until LineCount = NbOfLines
NextPos = InStrB(PrevPos, strBuffer, lfAnsi)
If NextPos = 0 And LineCount < NbOfLines Then 'Buffer end reached, NbOfLines not reached
'Adding end of buffer to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos)
PrevPos = 1
Exit Do
Else
'Adding New Line to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos, NextPos - PrevPos + 1)
End If
PrevPos = NextPos + 1
LineCount = LineCount + 1
If LineCount = NbOfLines Then Exit Do
Loop
End If
If LineCount = NbOfLines then Exit Do
Mid$(BufferOverlap, 1, Int(BUFSIZE / 4)) = Mid$(strBuffer, Int(BUFSIZE / 4))
Loop
Close F
GetFileLines = Split(data, vbCrLf)
End Function
To cut computation time even further, it is highly advisable to use fast string concatenation as explained here.
For instance, the following procedure can be used:
Sub FastConcat(ByRef Dest As String, ByVal Source As String, ByRef ccOffset)
Dim L As Long, Buffer As Long
Buffer = 50000
L = Len(Source)
If (ccOffset + L) >= Len(Dest) Then
If L > Buffer Then
Dest = Dest & Space$(L)
Else
Dest = Dest & Space$(Buffer)
End If
End If
Mid$(Dest, ccOffset + 1, L) = Source
ccOffset = ccOffset + L
End Sub
And then call it as follows:
NbOfChars = 0
Do until...
FastConcat MyString, AddedString, NbOfChars
Loop
MyString = Left$(MyString,NbOfChars)

How do I get an ASCII value to wrap to a certain value after it has passed 122?

I am trying to write an encryption program. The problem I am facing is that I am converting the text to ASCII codes and then adding on the offset. However, when a code goes past the letter 'z' I want it to wrap back to 'a' and go from there.
Sub enc()
Text = TextBox1.Text
finalmessage = ""
letters = Text.ToCharArray
offset = ComboBox1.SelectedItem
For x = LBound(letters) To UBound(letters)
finalmessage = finalmessage + Chr(Asc(letters(x)) + offset)
Next
TextBox2.Text = finalmessage
End Sub
I guess to make it easy to decode afterwards, you should do it somewhat along the lines of base64 encoding: first encode everything to a normalized binary string, then encode that in the range you want (since it works on binary, the size of the range has to be a power of two, 2^X).
To match your range, I used a base set of 32 characters and a simple encoding/decoding example (a bit more verbose than it should be, perhaps):
Module Module1
Dim encodeChars As String = "abcdefghijklmnopqrstuvwxyzABCDEF" ' use 32 as a base
Function Encode(text As String) As String
Dim bitEncoded As String = ""
Dim outputMessage As String = ""
For Each ch As Char In text.ToCharArray()
Dim i As Integer = Convert.ToByte(ch)
bitEncoded &= Convert.ToString(i, 2).PadLeft(8, "0"c)
Next
While bitEncoded.Length Mod 5 <> 0
bitEncoded &= "0"
End While
For position As Integer = 0 To bitEncoded.Length - 1 Step 5
Dim range As String = bitEncoded.Substring(position, 5)
Dim index As Integer = Convert.ToInt32(range, 2)
outputMessage &= encodeChars(index).ToString()
Next
Return outputMessage
End Function
Function Decode(encodedText As String) As String
Dim bitEncoded As String = ""
Dim outputMessage As String = ""
For Each ch In encodedText
Dim index As Integer = encodeChars.IndexOf(ch)
If index < 0 Then
Throw New FormatException("Invalid character in encodedText!")
End If
bitEncoded &= Convert.ToString(index, 2).PadLeft(5, "0"c)
Next
' strip the extra 0's
While bitEncoded.Length Mod 8 <> 0
bitEncoded = bitEncoded.Substring(0, bitEncoded.Length - 1)
End While
For position As Integer = 0 To bitEncoded.Length - 1 Step 8
Dim range As String = bitEncoded.Substring(position, 8)
Dim index As Integer = Convert.ToInt32(range, 2)
outputMessage &= Chr(index).ToString()
Next
Return outputMessage
End Function
Sub Main()
Dim textToEncode As String = "This is a small test, with some special characters! Just testing..."
Dim encodedText As String = Encode(textToEncode)
Dim decodedText As String = Decode(encodedText)
Console.WriteLine(textToEncode)
Console.WriteLine(encodedText)
Console.WriteLine(decodedText)
If Not String.Equals(decodedText, textToEncode) Then
Console.WriteLine("Encoding / decoding failed!")
Else
Console.WriteLine("Encoding / decoding completed succesfully!")
End If
Console.ReadLine()
End Sub
End Module
This then gives the following output:
This is a small test, with some special characters! Just testing...
krugsCzanfzsayjaonwwcBdmebAgkCBufqqhoAlunaqhgBBnmuqhgCdfmnuwcBbamnugcCtbmnAgkCtteeqeuDltoqqhizltoruwCzzofyxa
This is a small test, with some special characters! Just testing...
Encoding / decoding completed succesfully!
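For the wrap-around the question literally asks about (going past 'z' should continue from 'a'), a plain modular shift is a different and simpler technique than the base-32 encoding above. Here is a minimal sketch; the names CaesarShift and Shift are hypothetical, and leaving every character outside a-z unchanged is an assumption, since the question does not say what should happen to them.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CaesarShift
    ' Shifts lowercase a-z by offset, wrapping past "z" back to "a".
    ' All other characters are copied unchanged (an assumption).
    Function Shift(message As String, offset As Integer) As String
        Dim result As New StringBuilder(message.Length)
        For Each ch As Char In message
            If ch >= "a"c AndAlso ch <= "z"c Then
                ' Position in the alphabet, 0 for "a" up to 25 for "z"
                Dim pos As Integer = AscW(ch) - AscW("a"c)
                ' The double Mod keeps the result in 0..25 even for negative offsets
                pos = ((pos + offset) Mod 26 + 26) Mod 26
                result.Append(ChrW(AscW("a"c) + pos))
            Else
                result.Append(ch)
            End If
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Console.WriteLine(Shift("xyz", 3))   ' prints abc
        Console.WriteLine(Shift("abc", -3))  ' prints xyz
    End Sub
End Module
```

Decoding is the same call with the negated offset.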

Encrypter with Key and Message - VB.Net

Okay, so I am trying to make my program loop through each character of the key and find its ASCII code, then loop through each character of the message, find its ASCII code, and add the key code to the message code, doing this for each character in the key against each character in the message. I ran into a little problem: it changes the message right after the first letter is added, and I can't figure out how to fix it. Any help would be great!
Basically, all I want is to take the ASCII codes of the key characters, add them to the ASCII codes of the message characters, and then convert the resulting codes back to the new characters in the message text, using this:
tbxMessage.Text = (AscW(Mid(tbxMessage.Text, xForMess, 1)) + AscW(vTemp))
Here is everything I've got so far:
Public Class Form1
Function fctEncryptDecrypt(pMess As String, pKey As String) As String
If Len(tbxMessage.Text) > 0 Then
Dim xForKey As Integer
Dim xForMess As Integer
Dim intKey As Integer
Dim intMessage As Integer
Dim strAsciiKeyChar As String
Dim intAsciiKeyChar As Integer
Dim strAsciiMesChar As String
Dim intAsciiMesChar As Integer
Dim vTemp As String
Dim vTempMess As String
intKey = Len(tbxKey.Text)
intMessage = Len(tbxMessage.Text)
For xForKey = 1 To intKey
strAsciiKeyChar = Mid(tbxKey.Text, xForKey, 1)
intAsciiKeyChar = AscW(strAsciiKeyChar)
vTemp = intAsciiKeyChar
For xForMess = 1 To intMessage
strAsciiMesChar = Mid(tbxMessage.Text, xForMess, 1)
intAsciiMesChar = AscW(strAsciiMesChar)
vTempMess = vTemp + intAsciiMesChar
tbxMessage.Text = (AscW(Mid(tbxMessage.Text, xForMess, 1)) + AscW(vTemp))
Next xForMess
Next xForKey
Label1.Text = vTemp
Else
MessageBox.Show("No Message Found")
End If
End Function
Private Sub btnEncrypt_Click(sender As System.Object, e As System.EventArgs) Handles btnEncrypt.Click
fctEncryptDecrypt(tbxMessage.Text, tbxKey.Text)
End Sub
End Class
tbxMessage.Text = (AscW(Mid(tbxMessage.Text, xForMess, 1)) + AscW(vTemp))
in the inner For loop is setting the value of the text box to a number.
I'd expect the use of a temporary string variable that collects
Chr((intAsciiKeyChar + intAsciiMesChar) mod 256)
I'd also expect the key to be applied one letter at a time over the message. Something like:
Dim i As Integer
Dim s As String
Dim sKey As String
Dim sMesg As String
Dim intCharacter As Integer
s = ""
For i = 1 To Len(tbxMessage.Text)
' cycle through the key: positions 1..Len(key), then wrap around
sKey = Mid(tbxKey.Text, ((i - 1) Mod Len(tbxKey.Text)) + 1, 1)
sMesg = Mid(tbxMessage.Text, i, 1)
intCharacter = (AscW(sKey) + AscW(sMesg)) Mod 256
s = s & Chr(intCharacter)
Next
tbxMessage.Text = s
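The same idea can be wrapped in a function that also reverses the operation. This is a minimal sketch under assumptions: the names KeyCipher and Apply are hypothetical, and it wraps at 65536 with ChrW rather than at 256 with Chr as above, because a .NET Char is 16 bits wide and Chr depends on the system code page for values above 127.

```vb
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module KeyCipher
    ' Adds (or subtracts, when decrypting) the code of the matching key
    ' character to each message character, cycling through the key.
    Function Apply(message As String, key As String, decrypt As Boolean) As String
        If String.IsNullOrEmpty(key) Then Throw New ArgumentException("Key must not be empty.")
        Dim result As New StringBuilder(message.Length)
        For i As Integer = 0 To message.Length - 1
            Dim k As Integer = AscW(key(i Mod key.Length))
            If decrypt Then k = -k
            ' Wrap within the 16-bit range of a .NET Char
            Dim code As Integer = ((AscW(message(i)) + k) Mod 65536 + 65536) Mod 65536
            result.Append(ChrW(code))
        Next
        Return result.ToString()
    End Function

    Sub Main()
        Dim secret As String = Apply("hello", "key", False)
        Console.WriteLine(Apply(secret, "key", True)) ' prints hello
    End Sub
End Module
```

Building the result in a separate StringBuilder, instead of writing back to the text box inside the loop, is what keeps the message from changing while it is still being read.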