VBA to load very large file in one go (no buffering) - vba

I am experiencing an unexpected vb limitation on the string max size, as explained in this post:
VBA unexpected reach of string size limit
While I was expecting to be able to load files up to 2GB (2^31 char) using open path for binary and get function, I get an out of string space error when I try to load a string larger than 255,918,061 characters.
I managed to work around this issue buffering the input stream of get. The problem is that I need to load the file as an array of string by splitting the buffer on vbCrLf characters.
This requires then to build the array line by line. Moreover, since I cannot be sure whether the buffer is ending on a break line or not I need additional operations. This solution is Time and Memory consuming. Loading a file of 300MB with this code costs 900MB (!) use of memory by excel. Is there a better solution ?
Here bellow is my code:
Function Load_File(path As String) As Variant
Dim MyData As String, FNum As Integer
Dim LenRemainingBytes As Long
Dim BufferSizeCurrent As Long
Dim FileByLines() As String
Dim CuttedLine As Boolean
Dim tmpSplit() As String
Dim FinalSplit() As String
Dim NbOfLines As Long
Dim LastLine As String
Dim count As Long, i As Long
Const BufferSizeMax As Long = 100000
FNum = FreeFile()
Open path For Binary As #FNum
LenRemainingBytes = LOF(FNum)
NbOfLines = FileNbOfLines(path)
ReDim FinalSplit(NbOfLines)
CuttedLine = False
Do While LenRemainingBytes > 0
MyData = ""
If LenRemainingBytes > BufferSizeMax Then
BufferSizeCurrent = BufferSizeMax
Else
BufferSizeCurrent = LenRemainingBytes
End If
MyData = Space$(BufferSizeCurrent)
Get #FNum, , MyData
tmpSplit = Split(MyData, vbCrLf)
If CuttedLine Then
count = count - 1
tmpSplit(0) = LastLine & tmpSplit(0)
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
Else
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
End If
Erase tmpSplit
LastLine = Right(MyData, Len(MyData) - InStrRev(MyData, vbCrLf) - 1)
CuttedLine = Len(LastLine) > 1
LenRemainingBytes = LenRemainingBytes - BufferSizeCurrent
Loop
Close FNum
Load_File = FinalSplit
Erase FinalSplit
End Function
Where the function FileNbOfLines is efficiently returning the number of line break characters.
Edit:
My Needs are:
To look for a specific string within the file
To get a specific number of lines coming after this string

Here you go, not pretty but should give you the general concept:
Sub GetLines()
Const fileName As String = "C:\Users\bloggsj\desktop\testfile.txt"
Const wordToFind As String = "FindMe"
Dim lineStart As String
Dim lineCount As String
Dim linesAfterWord As Long
With CreateObject("WScript.Shell")
lineCount = .Exec("CMD /C FIND /V /C """" """ & fileName & """").StdOut.ReadAll
lineStart = Split(.Exec("CMD /C FIND /N """ & wordToFind & """ """ & fileName & """").StdOut.ReadAll, vbCrLf)(2)
End With
linesAfterWord = CLng(Trim(Mid(lineCount, InStrRev(lineCount, ":") + 1))) - CLng(Trim(Mid(lineStart, 2, InStr(lineStart, "]") - 2)))
Debug.Print linesAfterWord
End Sub
Uses CMD to count the number of lines, then find the line at which the word appears, then subtract one from the other to give you the amount of lines after the word has been found.

Answer: Yes, using ReadAll from FSO should do the job.
Best answer: Just avoid it !
My needs were:
Identify a specific string within the file
Extract a certain number of lines after this string
As far as you precisely know the exact amout of data you want to extract, and assuming this amount of data is below vba string size limit (!), here is what it does the job the faster.
Decrease of computation time is improved using binary comparison of strings. My code is as follows:
Function GetFileLines(path As String, str As String, NbOfLines As Long) As String()
Const BUFSIZE As Long = 100000
Dim StringFound As Boolean
Dim lfAnsi As String
Dim strAnsi As String
Dim F As Integer
Dim BytesLeft As Long
Dim Buffer() As Byte
Dim strBuffer As String
Dim BufferOverlap As String
Dim PrevPos As Long
Dim NextPos As Long
Dim LineCount As Long
Dim data As String
F = FreeFile(0)
strAnsi = StrConv(str, vbFromUnicode) 'Looked String
lfAnsi = StrConv(vbLf, vbFromUnicode) 'LineBreak character
Open path For Binary Access Read As #F
BytesLeft = LOF(F)
ReDim Buffer(BUFSIZE - 1)
'Overlapping buffer is 3/2 times the size of strBuffer
'(two bytes per character)
BufferOverlap = Space$(Int(3 * BUFSIZE / 4))
StringFound = False
Do Until BytesLeft = 0
If BytesLeft < BUFSIZE Then ReDim Buffer(BytesLeft - 1)
Get #F, , Buffer
strBuffer = Buffer 'Binary copy of bytes.
BytesLeft = BytesLeft - LenB(strBuffer)
Mid$(BufferOverlap, Int(BUFSIZE / 4) + 1) = strBuffer 'Overlapping Buffer
If Not StringFound Then 'Looking for the the string
PrevPos = InStrB(BufferOverlap, strAnsi) 'Position of the looked string within the buffer
StringFound = PrevPos <> 0
If StringFound Then strBuffer = BufferOverlap
End If
If StringFound Then 'When string is found, loop until NbOfLines
Do Until LineCount = NbOfLines
NextPos = InStrB(PrevPos, strBuffer, lfAnsi)
If NextPos = 0 And LineCount < NbOfLines Then 'Buffer end reached, NbOfLines not reached
'Adding end of buffer to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos)
PrevPos = 1
Exit Do
Else
'Adding New Line to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos, NextPos - PrevPos + 1)
End If
PrevPos = NextPos + 1
LineCount = LineCount + 1
If LineCount = NbOfLines Then Exit Do
Loop
End If
If LineCount = NbOfLines then Exit Do
Mid$(BufferOverlap, 1, Int(BUFSIZE / 4)) = Mid$(strBuffer, Int(BUFSIZE / 4))
Loop
Close F
GetFileLines = Split(data, vbCrLf)
End Function
To crunch even more computation time, it is highly advised to use fast string concatenation as explained here.
For instance the following function can be used:
Sub FastConcat(ByRef Dest As String, ByVal Source As String, ByRef ccOffset)
Dim L As Long, Buffer As Long
Buffer = 50000
L = Len(Source)
If (ccOffset + L) >= Len(Dest) Then
If L > Buffer Then
Dest = Dest & Space$(L)
Else
Dest = Dest & Space$(Buffer)
End If
End If
Mid$(Dest, ccOffset + 1, L) = Source
ccOffset = ccOffset + L
End Sub
And then use the function as follows:
NbOfChars = 0
Do until...
FastConcat MyString, AddedString, NbOfChars
Loop
MyString = Left$(MyString,NbOfChars)

Related

Word VBA Natural Sorting

I want to sort an array, or the Files from a Filesystemobject Folder, the way we'd expect them to be if sorted by a human. What I ultimately am trying to accomplish is a macro that takes images from a folder and inserts them into the word document with text above each one to identify what it represents, here I use steps for a guide and it's crucial that Step 2 come before step 100;
Setting up my test sub;
Sub RunTheSortMacro()
Dim i As Long
Dim myArray As Variant
'Set the array
myArray = Array("Step-1", "Step-2", "Step-10", "Step-15", "Step-9", "Step-20", "Step-100", "Step-8", "Step-7")
'myArray variable set to the result of SortArray function
myArray = SortArray(myArray)
'Output the Array through a message box
For i = LBound(myArray) To UBound(myArray)
MsgBox myArray(i)
Next i
End Sub
Then the only/best sort function I found is really only good for numbers;
Function SortArray(ArrayIn As Variant)
Dim i As Long
Dim j As Long
Dim Temp
'Sort the Array A-Z
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
If ArrayIn(i) > ArrayIn(j) Then
SrtTemp = ArrayIn(j)
ArrayIn(j) = ArrayIn(i)
ArrayIn(i) = SrtTemp
End If
Next j
Next i
SortArray = ArrayIn
End Function
That function returns the array as;
Step-1,
Step-10,
Step-100,
Step-15,
Step-2,
Step-20,
Step-7,
Step-8,
Step-9
but I want;
Step-1,
Step-2,
Step-7,
Step-8,
Step-9,
Step-10,
Step-15,
Step-20,
Step-100
I thought using StrComp(ArrayIn(i), ArrayIn(j), vbBinaryCompare/vbTextCompare) would be one way to go but they seem to sort the same way. If it's easier, I am only going the array route because I couldn't find a way to sort the input files from;
Set objFSO = CreateObject("Scripting.Filesystemobject")
Set Folder = objFSO.GetFolder(FolderPath)
For Each image In Folder.Files
ImagePath = image.Path
Selection.TypeText Text:=Left(image.Name, Len(image.Name) - 4)
Selection.TypeText Text:=vbCr
'Insert the images into the word document
Application.Selection.EndKey END_OF_STORY, MOVE_SELECTION
Application.Selection.InlineShapes.AddPicture (ImagePath)
Application.Selection.InsertBreak 'Insert a pagebreak
Next
So I was going to break the file name and path into two arrays that I could sort naturally;
Set objFiles = Folder.Files
FileCount = objFiles.Count
ReDim imageNameArray(FileCount)
ReDim imagePathArray(FileCount)
icounter = 0
For Each image In Folder.Files
imageNameArray(icounter) = (image.Name)
imagePathArray(icounter) = (image.Path)
icounter = icounter + 1
Next
but I can't find any reference to natural sorting in VBA.
Update, Additional Details;
I didn't think about the A and B after numbers and everything I search agrees on what "natural sorting" is; 1,2,3,A,B,C; Apple < 1A < 1C < 2. Regex might be good
This was how I achieved this in a python script;
import os
import re
def tryint(s):
try:
return int(s)
except:
return s
def alphanum_key(s):
""" Turn a string into a list of string and number chunks.
"z23a" -> ["z", 23, "a"]
"""
return [ tryint(c) for c in re.split('([0-9]+)', s) ]
def sort_nicely(l):
""" Sort the given list in the way that humans expect.
"""
l.sort(key=alphanum_key)
files = [file for file in os.listdir(".") if (file.lower().endswith('.png')) or (file.lower().endswith('.jpg'))]
files.sort(key=alphanum_key)
for file in sorted(files,key=alphanum_key):
stepname = file.strip('.jpg')
print(stepname.strip('.png')
For VBA I have found that these;
Function SortArray(ArrayIn As Variant)
Dim i As Long
Dim j As Long
Dim Temp1 As String
Dim Temp2 As String
Dim Temp3 As String
Dim Temp4 As String
'Sort the Array A-Z
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
Temp1 = ArrayIn(i)
Temp2 = ArrayIn(j)
Temp3 = onlyDigits(Temp1)
Temp4 = onlyDigits(Temp2)
If Val(Temp3) > Val(Temp4) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
End If
Next j
Next i
SortArray = ArrayIn
End Function
Function onlyDigits(s As String) As String
' Variables needed (remember to use "option explicit"). '
Dim retval As String ' This is the return string. '
Dim i As Integer ' Counter for character position. '
' Initialise return string to empty '
retval = ""
' For every character in input string, copy digits to '
' return string. '
For i = 1 To Len(s)
If Mid(s, i, 1) >= "0" And Mid(s, i, 1) <= "9" Then
retval = retval + Mid(s, i, 1)
End If
Next
' Then return the return string. '
onlyDigits = retval
End Function
Give me the numerical sort but not the alphabetical, so 1B is sorting before 1A.
Here's the solution to sort Naturally in VBA
Setup/Testing
Sub RunTheSortMacro()
Dim i As Long
Dim myArray As Variant
'Set the array
myArray = Array("Step 15B.png", "Cat 3.png", "Step 1.png", "Step 2.png", "Step 15C.png", "Dog 1.png", "Step 10.png", "Step 15A.png", "Step 9.png", "Step 20.png", "Step 100.png", "Step 8.png", "Step 7Beta.png", "Step 7Alpha.png")
'myArray variable set to the result of SortArray function
myArray = SortArray(myArray)
For i = LBound(myArray) To UBound(myArray)
Debug.Print myArray(i)
Next
End Sub
This is the only function needed to be called in the main part;
Function SortArray(ArrayIn As Variant)
Dim i As Long
Dim j As Long
Dim Temp1 As String
Dim Temp2 As String
Dim myRegExp, myRegExp2, Temp3, Temp4, Temp5, Temp6, regExp1_Matches, regExp2_Matches
'Number and what's after the number
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.IgnoreCase = True
myRegExp.Global = True
myRegExp.pattern = "[0-9][A-Z]"
'Text up to a number or special character
Set myRegExp2 = CreateObject("vbscript.regexp")
myRegExp2.IgnoreCase = True
myRegExp2.Global = True
myRegExp2.pattern = "^[A-Z]+"
'Sort by Fisrt Text and number
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
Temp1 = ArrayIn(i)
Temp2 = ArrayIn(j)
Temp3 = onlyDigits(Temp1)
Temp4 = onlyDigits(Temp2)
Set regExp1_Matches = myRegExp2.Execute(Temp1)
Set regExp2_Matches = myRegExp2.Execute(Temp2)
If regExp1_Matches.Count = 1 And regExp2_Matches.Count = 1 Then 'eliminates blank/empty strings
If regExp1_Matches(0) > regExp2_Matches(0) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
ElseIf regExp1_Matches(0) = regExp2_Matches(0) Then
If Val(Temp3) > Val(Temp4) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
End If
End If
End If
Next j
Next i
'Sort the array again by taking two at a time finds number followed by letters and sorts the two alphabetically, ex 1A, 1B
For i = LBound(ArrayIn) To (UBound(ArrayIn) - 1)
j = i + 1
Temp1 = ArrayIn(i)
Temp2 = ArrayIn(j)
Set regExp1_Matches = myRegExp.Execute(Temp1)
Set regExp2_Matches = myRegExp.Execute(Temp2)
If regExp1_Matches.Count = 1 And regExp2_Matches.Count = 1 Then
If regExp1_Matches(0) > regExp2_Matches(0) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
End If
End If
Next i
SortArray = ArrayIn
End Function
Found this was useful for the numerical sorting;
Function onlyDigits(s As String) As String
' Variables needed (remember to use "option explicit"). '
Dim retval As String ' This is the return string. '
Dim i As Integer ' Counter for character position. '
' Initialise return string to empty '
retval = ""
' For every character in input string, copy digits to '
' return string. '
For i = 1 To Len(s)
If Mid(s, i, 1) >= "0" And Mid(s, i, 1) <= "9" Then
retval = retval + Mid(s, i, 1)
End If
Next
' Then return the return string. '
onlyDigits = retval
End Function
Results
Input:
Step 15B.png
Cat 3.png
Step 1.png
Step 2.png
Step 15C.png
Dog 1.png
Step 10.png
Step 15A.png
Step 9.png
Step 20.png
Step 100.png
Step 8.png
Step 7Beta.png
Step 7Alpha.png
Output:
Cat 3.png
Dog 1.png
Step 1.png
Step 2.png
Step 7Alpha.png
Step 7Beta.png
Step 8.png
Step 9.png
Step 10.png
Step 15A.png
Step 15B.png
Step 15C.png
Step 20.png
Step 100.png

How to generate md5-hashes for large files with VBA?

I have the following functions to generate md5-hashes for files. The functions work great for small files, but crashes and generate Run-time error 7 - Out of memory when I try to hash files over ~250 MB (I don't actually know at which exact size it breaks, but files below 200 MB work fine).
I don't understand why it breaks at a certain size, so if anyone could shed some light on that I would appreciate it a lot.
Also, is there anything I can do to make the functions handle larger files? I intend to use the functions in a larger tool where I will need to generate hashes for files of unknown sizes. Most will be small enough for the current functions to work, but I will have to be able to handle large files as well.
I got my current functions from the most upvoted answer this post How to get the MD5 hex hash for a file using VBA?
Public Function FileToMD5Hex(ByVal strFileName As String) As String
Dim varEnc As Variant
Dim varBytes As Variant
Dim strOut As String
Dim intPos As Integer
Set varEnc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
'Convert the string to a byte array and hash it
varBytes = GetFileBytes(strFileName)
varBytes = varEnc.ComputeHash_2((varBytes))
'Convert the byte array to a hex string
For intPos = 1 To LenB(varBytes)
strOut = strOut & LCase(Right("0" & Hex(AscB(MidB(varBytes, intPos, 1))), 2))
Next
FileToMD5Hex = strOut
Set varEnc = Nothing
End Function
Private Function GetFileBytes(ByVal strPath As String) As Byte()
Dim lngFileNum As Long
Dim bytRtnVal() As Byte
lngFileNum = FreeFile
'If file exists, get number of bytes
If LenB(Dir(strPath)) Then
Open strPath For Binary Access Read As lngFileNum
ReDim bytRtnVal(LOF(lngFileNum)) As Byte
Get lngFileNum, , bytRtnVal
Close lngFileNum
Else
MsgBox "Filen finns inte" & vbCrLf & "Avbryter", vbCritical, "Filen hittades inte"
Exit Function
End If
GetFileBytes = bytRtnVal
Erase bytRtnVal
End Function
Thank you
It looks like you reached the memory limit.
A better way would be to compute the MD5 of the file by block:
Public Function ComputeMD5(filepath As String) As String
Dim buffer() As Byte, svc As Object, hFile%, blockSize&, i&
blockSize = 2 ^ 16
' open the file '
If Len(Dir(filepath)) Then Else Err.Raise 5, , "file not found" & vbCr & filepath
hFile = FreeFile
Open filepath For Binary Access Read As hFile
' allocate buffer '
If LOF(hFile) < blockSize Then blockSize = ((LOF(hFile) + 1024) \ 1024) * 1024
ReDim buffer(0 To blockSize - 1)
' compute hash '
Set svc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
For i = 1 To LOF(hFile) \ blockSize
Get hFile, , buffer
svc.TransformBlock buffer, 0, blockSize, buffer, 0
Next
Get hFile, , buffer
svc.TransformFinalBlock buffer, 0, LOF(hFile) Mod blockSize
buffer = svc.Hash
' cleanup '
svc.Clear
Close hFile
' convert to an hexa string '
ComputeMD5 = String$(32, "0")
For i = 0 To 15
Mid$(ComputeMD5, i + i + 2 + (buffer(i) > 15)) = Hex(buffer(i))
Next
End Function
This is an extension to FlorentB's answer, which worked brilliantly for me until my files surpassed the 2GB LOF() size limit.
I tried to adapt for getting file length by alternate means as follows:
Public Function ComputeMD5(filepath As String) As String
If Len(Dir(filepath)) Then Else Err.Raise 5, , "File not found." & vbCr & filepath
Dim blockSize As Long: blockSize = 2 ^ 20
Dim blockSize_f As Double
Dim buffer() As Byte
Dim fileLength As Variant
Dim hFile As Integer
Dim n_Reads As Long
Dim i As Long
Dim svc As Object: Set svc = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")
fileLength = DecGetFileSize(filepath)
If fileLength < blockSize Then blockSize = ((fileLength + 1024) \ 1024) * 1024
ReDim buffer(0 To blockSize - 1)
n_Reads = fileLength / blockSize
blockSize_f = fileLength - (CDbl(blockSize) * n_Reads)
hFile = FreeFile
Open filepath For Binary Access Read As hFile
For i = 1 To n_Reads
Get hFile, i, buffer
svc.TransformBlock buffer, 0, blockSize, buffer, 0
Next i
Get hFile, i, buffer
svc.TransformFinalBlock buffer, 0, blockSize_f
buffer = svc.Hash
svc.Clear
Close hFile
ComputeMD5 = String$(32, "0")
For i = 0 To 15
Mid$(ComputeMD5, i + i + 2 + (buffer(i) > 15)) = Hex(buffer(i))
Next
End Function
Public Function DecGetFileSize(fname As String) As Variant
Dim fso As New FileSystemObject
Dim f: Set f = fso.GetFile(fname)
DecGetFileSize = CDec(f.Size)
Set f = Nothing
Set fso = Nothing
End Function
This all runs fine, returning a string, however that string does not equal the MD5 calculated using other tools on the same file.
I can't work out where the discrepancy is originating.
I've checked and double checked filelength, n_reads, blockSize and blockSize_f and I'm sure those values are all correct.
I had some trouble with the Get function, where if I didn't explicitly tell it the block number, it dies at block 2048.
Any ideas / pointers would be much appreciated.

How do I get an Ascii to warp to a certain value after it has past 122?

I am trying to write an encryption program. The problem I am facing is that I am converting the text to ascii and then adding on the offset. However when it goes past the letter 'z' I want it to warp back to 'a' and go from there.
Sub enc()
Text = TextBox1.Text
finalmessage = ""
letters = Text.ToCharArray
offset = ComboBox1.SelectedItem
For x = LBound(letters) To UBound(letters)
finalmessage = finalmessage + Chr(Asc(letters(x)) + offset)
Next
TextBox2.Text = finalmessage
End Sub
I guess to make it easy to decode afterwards, you should to it somewhat in the line of base64 encoding, first encoding everything to a normalized binary string, then encode in the range you want (since using binary, it has to be something that fits with 2^X).
To match your range, i used a baseset of 32, and a simple encoding decoding example (a bit more verbose that it should be, perhaps)
Module Module1
Dim encodeChars As String = "abcdefghijklmnopqrstuvwxyzABCDEF" ' use 32 as a base
Function Encode(text As String) As String
Dim bitEncoded As String = ""
Dim outputMessage As String = ""
For Each ch As Char In text.ToCharArray()
Dim i As Integer = Convert.ToByte(ch)
bitEncoded &= Convert.ToString(i, 2).PadLeft(8, "0"c)
Next
While bitEncoded.Length Mod 5 <> 0
bitEncoded &= "0"
End While
For position As Integer = 0 To bitEncoded.Length - 1 Step 5
Dim range As String = bitEncoded.Substring(position, 5)
Dim index As Integer = Convert.ToInt32(range, 2)
outputMessage &= encodeChars(index).ToString()
Next
Return outputMessage
End Function
Function Decode(encodedText As String) As String
Dim bitEncoded As String = ""
Dim outputMessage As String = ""
For Each ch In encodedText
Dim index As Integer = encodeChars.IndexOf(ch)
If index < 0 Then
Throw New FormatException("Invalid character in encodedText!")
End If
bitEncoded &= Convert.ToString(index, 2).PadLeft(5, "0"c)
Next
' strip the extra 0's
While bitEncoded.Length Mod 8 <> 0
bitEncoded = bitEncoded.Substring(0, bitEncoded.Length - 1)
End While
For position As Integer = 0 To bitEncoded.Length - 1 Step 8
Dim range As String = bitEncoded.Substring(position, 8)
Dim index As Integer = Convert.ToInt32(range, 2)
outputMessage &= Chr(index).ToString()
Next
Return outputMessage
End Function
Sub Main()
Dim textToEncode As String = "This is a small test, with some special characters! Just testing..."
Dim encodedText As String = Encode(textToEncode)
Dim decodedText As String = Decode(encodedText)
Console.WriteLine(textToEncode)
Console.WriteLine(encodedText)
Console.WriteLine(decodedText)
If Not String.Equals(decodedText, textToEncode) Then
Console.WriteLine("Encoding / decoding failed!")
Else
Console.WriteLine("Encoding / decoding completed succesfully!")
End If
Console.ReadLine()
End Sub
End Module
this then gives the following output?
This is a small test, with some special characters! Just testing...
krugsCzanfzsayjaonwwcBdmebAgkCBufqqhoAlunaqhgBBnmuqhgCdfmnuwcBbamnugcCtbmnAgkCtteeqeuDltoqqhizltoruwCzzofyxa
This is a small test, with some special characters! Just testing...
Encoding / decoding completed succesfully!

Speed up large string data parser function

I currently have a file with 1 million characters.. the file is 1 MB in size. I am trying to parse data with this old function that still works but very slow.
start0end
start1end
start2end
start3end
start4end
start5end
start6end
the code, takes about 5 painful minutes to process the whole data.
any pointers and suggestions are appreciated.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim sFinal = ""
Dim strData = textbox.Text
Dim strFirst = "start"
Dim strSec = "end"
Dim strID As String, Pos1 As Long, Pos2 As Long, strCur As String = ""
Do While InStr(strData, strFirst) > 0
Pos1 = InStr(strData, strFirst)
strID = Mid(strData, Pos1 + Len(strFirst))
Pos2 = InStr(strID, strSec)
If Pos2 > 0 Then
strID = Microsoft.VisualBasic.Left(strID, Pos2 - 1)
End If
If strID <> strCur Then
strCur = strID
sFinal += strID & ","
End If
strData = Mid(strData, Pos1 + Len(strFirst) + 3 + Len(strID))
Loop
End Sub
The reason that is so slow is because you keep destroying and recreating a 1 MB string over and over. Strings are immutable, so strData = Mid(strData... creates a new string and copies the remaining of the 1 MB string data to a new strData variable over and over and over. Interestingly, even VB6 allowed for a progressive index.
I would have processed the disk file LINE BY LINE and plucked out the info as it was read (see streamreader.ReadLine) to avoid working with a 1MB string. Pretty much the same method could be used there.
' 1 MB textbox data (!?)
Dim sData As String = TextBox1.Text
' start/stop - probably fake
Dim sStart As String = "start"
Dim sStop As String = "end"
' result
Dim sbResult As New StringBuilder
' progressive index
Dim nNDX As Integer = 0
' shortcut at least as far as typing and readability
Dim MagicNumber As Integer = sStart.Length
' NEXT index of start/stop after nNDX
Dim i As Integer = 0
Dim j As Integer = 0
' loop as long as string remains
Do While (nNDX < sData.Length) AndAlso (i >= 0)
i = sData.IndexOf(sStart, nNDX) ' start index
j = sData.IndexOf(sStop, i) ' stop index
' Extract and append bracketed substring
sbResult.Append(sData.Substring(i + MagicNumber, j - (i + MagicNumber)))
' add a cute comma
sbResult.Append(",")
nNDX = j ' where we start next time
i = sData.IndexOf(sStart, nNDX)
Loop
' remove last comma
sbResult.Remove(sbResult.ToString.Length - 1, 1)
' show my work
Console.WriteLine(sbResult.ToString)
EDIT: Small mod for the ad hoc test data

Read Number of lines in Large Text File VB6

I have text File of Size 230MB. I want to Count Number of Lines OF that File.
I tried "Scripting.FileSystemOblect" but it goes out Of memory.
Please Help.
Thanks.
Normal Windows line breaks are CRLF, so you can count the LFs and add 1 to the count in cases where the last line of your files doesn't have one after it.
In true VB (i.e. VB5, VB6, etc.) you can make use of the byte-oriented String operations to speed many tasks. If we can assume the text files contain ANSI then this is pretty fast:
Option Explicit
Private Sub Main()
Const BUFSIZE As Long = 100000
Dim T0 As Single
Dim LfAnsi As String
Dim F As Integer
Dim FileBytes As Long
Dim BytesLeft As Long
Dim Buffer() As Byte
Dim strBuffer As String
Dim BufPos As Long
Dim LineCount As Long
T0 = Timer()
LfAnsi = StrConv(vbLf, vbFromUnicode)
F = FreeFile(0)
Open "big.txt" For Binary Access Read As #F
FileBytes = LOF(F)
ReDim Buffer(BUFSIZE - 1)
BytesLeft = FileBytes
Do Until BytesLeft = 0
If BufPos = 0 Then
If BytesLeft < BUFSIZE Then ReDim Buffer(BytesLeft - 1)
Get #F, , Buffer
strBuffer = Buffer 'Binary copy of bytes.
BytesLeft = BytesLeft - LenB(strBuffer)
BufPos = 1
End If
Do Until BufPos = 0
BufPos = InStrB(BufPos, strBuffer, LfAnsi)
If BufPos > 0 Then
LineCount = LineCount + 1
BufPos = BufPos + 1
End If
Loop
Loop
Close #F
'Add 1 to LineCount if last line of your files do not
'have a trailing CrLf.
MsgBox "Counted " & Format$(LineCount, "#,##0") & " lines in" & vbNewLine _
& Format$(FileBytes, "#,##0") & " bytes of text." & vbNewLine _
& Format$(Timer() - T0, "0.0#") & " seconds."
End Sub
Given a 7,000,000 line file of 293MB it only takes 0.7 seconds here. But note that I had not rebooted to ensure that the file wasn't cached when I ran that test. Without caching (i.e. after a reboot) I'd expect it to take as long as 5 times that.
Converting to handle Unicode text files is fairly simple. Just replace the B-functions by the non-B equivalents, make sure you set BUFSIZE to a multiple of 2, and search for vbLf instead of an ANSI LF byte.
You can do it by reading each line into the same variable. There's no need to save all the lines:
dim s as string
dim n as integer
open "filename.txt" for input as 1
n = 0
do while not eof(1)
line input #1, s
n = n + 1
loop
This has not been tested, and it's been a while since I've done any VB6, but it should be close.
This takes about 6 seconds for me on a 480mb binary file with 1mil+ 0xD (vbcr)
Dim buff() As Byte
Dim hF As Integer
Dim i As Long, n As Long
hF = FreeFile(0)
Open "c:\windows\Installer\2f91fd.msp" For Binary Access Read As #hF
ReDim buff(LOF(hF) - 1)
Get #hF, , buff()
Close #hF
For i = 0 To UBound(buff)
If buff(i) = 13 Then n = n + 1
Next
MsgBox n