Word VBA Natural Sorting - vba

I want to sort an array, or the Files from a Filesystemobject Folder, the way we'd expect them to be if sorted by a human. What I ultimately am trying to accomplish is a macro that takes images from a folder and inserts them into the word document with text above each one to identify what it represents, here I use steps for a guide and it's crucial that Step 2 come before step 100;
Setting up my test sub;
Sub RunTheSortMacro()
Dim i As Long
Dim myArray As Variant
'Set the array
myArray = Array("Step-1", "Step-2", "Step-10", "Step-15", "Step-9", "Step-20", "Step-100", "Step-8", "Step-7")
'myArray variable set to the result of SortArray function
myArray = SortArray(myArray)
'Output the Array through a message box
For i = LBound(myArray) To UBound(myArray)
MsgBox myArray(i)
Next i
End Sub
Then the only/best sort function I found is really only good for numbers;
Function SortArray(ArrayIn As Variant)
Dim i As Long
Dim j As Long
Dim Temp
'Sort the Array A-Z
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
If ArrayIn(i) > ArrayIn(j) Then
SrtTemp = ArrayIn(j)
ArrayIn(j) = ArrayIn(i)
ArrayIn(i) = SrtTemp
End If
Next j
Next i
SortArray = ArrayIn
End Function
That function returns the array as;
Step-1,
Step-10,
Step-100,
Step-15,
Step-2,
Step-20,
Step-7,
Step-8,
Step-9
but I want;
Step-1,
Step-2,
Step-7,
Step-8,
Step-9,
Step-10,
Step-15,
Step-20,
Step-100
I thought using StrComp(ArrayIn(i), ArrayIn(j), vbBinaryCompare/vbTextCompare) would be one way to go but they seem to sort the same way. If it's easier, I am only going the array route because I couldn't find a way to sort the input files from;
Set objFSO = CreateObject("Scripting.Filesystemobject")
Set Folder = objFSO.GetFolder(FolderPath)
For Each image In Folder.Files
ImagePath = image.Path
Selection.TypeText Text:=Left(image.Name, Len(image.Name) - 4)
Selection.TypeText Text:=vbCr
'Insert the images into the word document
Application.Selection.EndKey END_OF_STORY, MOVE_SELECTION
Application.Selection.InlineShapes.AddPicture (ImagePath)
Application.Selection.InsertBreak 'Insert a pagebreak
Next
So I was going to break the file name and path into two arrays that I could sort naturally;
Set objFiles = Folder.Files
FileCount = objFiles.Count
ReDim imageNameArray(FileCount)
ReDim imagePathArray(FileCount)
icounter = 0
For Each image In Folder.Files
imageNameArray(icounter) = (image.Name)
imagePathArray(icounter) = (image.Path)
icounter = icounter + 1
Next
but I can't find any reference to natural sorting in VBA.
Update, Additional Details;
I didn't think about the A and B after numbers and everything I search agrees on what "natural sorting" is; 1,2,3,A,B,C; Apple < 1A < 1C < 2. Regex might be good
This was how I achieved this in a python script;
import os
import re
def tryint(s):
try:
return int(s)
except:
return s
def alphanum_key(s):
""" Turn a string into a list of string and number chunks.
"z23a" -> ["z", 23, "a"]
"""
return [ tryint(c) for c in re.split('([0-9]+)', s) ]
def sort_nicely(l):
""" Sort the given list in the way that humans expect.
"""
l.sort(key=alphanum_key)
files = [file for file in os.listdir(".") if (file.lower().endswith('.png')) or (file.lower().endswith('.jpg'))]
files.sort(key=alphanum_key)
for file in sorted(files,key=alphanum_key):
stepname = file.strip('.jpg')
print(stepname.strip('.png')
For VBA I have found that these;
Function SortArray(ArrayIn As Variant)
Dim i As Long
Dim j As Long
Dim Temp1 As String
Dim Temp2 As String
Dim Temp3 As String
Dim Temp4 As String
'Sort the Array A-Z
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
Temp1 = ArrayIn(i)
Temp2 = ArrayIn(j)
Temp3 = onlyDigits(Temp1)
Temp4 = onlyDigits(Temp2)
If Val(Temp3) > Val(Temp4) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
End If
Next j
Next i
SortArray = ArrayIn
End Function
Function onlyDigits(s As String) As String
' Variables needed (remember to use "option explicit"). '
Dim retval As String ' This is the return string. '
Dim i As Integer ' Counter for character position. '
' Initialise return string to empty '
retval = ""
' For every character in input string, copy digits to '
' return string. '
For i = 1 To Len(s)
If Mid(s, i, 1) >= "0" And Mid(s, i, 1) <= "9" Then
retval = retval + Mid(s, i, 1)
End If
Next
' Then return the return string. '
onlyDigits = retval
End Function
Give me the numerical sort but not the alphabetical, so 1B is sorting before 1A.

Here's the solution to sort Naturally in VBA
Setup/Testing
Sub RunTheSortMacro()
Dim i As Long
Dim myArray As Variant
'Set the array
myArray = Array("Step 15B.png", "Cat 3.png", "Step 1.png", "Step 2.png", "Step 15C.png", "Dog 1.png", "Step 10.png", "Step 15A.png", "Step 9.png", "Step 20.png", "Step 100.png", "Step 8.png", "Step 7Beta.png", "Step 7Alpha.png")
'myArray variable set to the result of SortArray function
myArray = SortArray(myArray)
For i = LBound(myArray) To UBound(myArray)
Debug.Print myArray(i)
Next
End Sub
This is the only function needed to be called in the main part;
Function SortArray(ArrayIn As Variant)
Dim i As Long
Dim j As Long
Dim Temp1 As String
Dim Temp2 As String
Dim myRegExp, myRegExp2, Temp3, Temp4, Temp5, Temp6, regExp1_Matches, regExp2_Matches
'Number and what's after the number
Set myRegExp = CreateObject("vbscript.regexp")
myRegExp.IgnoreCase = True
myRegExp.Global = True
myRegExp.pattern = "[0-9][A-Z]"
'Text up to a number or special character
Set myRegExp2 = CreateObject("vbscript.regexp")
myRegExp2.IgnoreCase = True
myRegExp2.Global = True
myRegExp2.pattern = "^[A-Z]+"
'Sort by Fisrt Text and number
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
Temp1 = ArrayIn(i)
Temp2 = ArrayIn(j)
Temp3 = onlyDigits(Temp1)
Temp4 = onlyDigits(Temp2)
Set regExp1_Matches = myRegExp2.Execute(Temp1)
Set regExp2_Matches = myRegExp2.Execute(Temp2)
If regExp1_Matches.Count = 1 And regExp2_Matches.Count = 1 Then 'eliminates blank/empty strings
If regExp1_Matches(0) > regExp2_Matches(0) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
ElseIf regExp1_Matches(0) = regExp2_Matches(0) Then
If Val(Temp3) > Val(Temp4) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
End If
End If
End If
Next j
Next i
'Sort the array again by taking two at a time finds number followed by letters and sorts the two alphabetically, ex 1A, 1B
For i = LBound(ArrayIn) To (UBound(ArrayIn) - 1)
j = i + 1
Temp1 = ArrayIn(i)
Temp2 = ArrayIn(j)
Set regExp1_Matches = myRegExp.Execute(Temp1)
Set regExp2_Matches = myRegExp.Execute(Temp2)
If regExp1_Matches.Count = 1 And regExp2_Matches.Count = 1 Then
If regExp1_Matches(0) > regExp2_Matches(0) Then
ArrayIn(j) = Temp1
ArrayIn(i) = Temp2
End If
End If
Next i
SortArray = ArrayIn
End Function
Found this was useful for the numerical sorting;
Function onlyDigits(s As String) As String
' Variables needed (remember to use "option explicit"). '
Dim retval As String ' This is the return string. '
Dim i As Integer ' Counter for character position. '
' Initialise return string to empty '
retval = ""
' For every character in input string, copy digits to '
' return string. '
For i = 1 To Len(s)
If Mid(s, i, 1) >= "0" And Mid(s, i, 1) <= "9" Then
retval = retval + Mid(s, i, 1)
End If
Next
' Then return the return string. '
onlyDigits = retval
End Function
Results
Input:
Step 15B.png
Cat 3.png
Step 1.png
Step 2.png
Step 15C.png
Dog 1.png
Step 10.png
Step 15A.png
Step 9.png
Step 20.png
Step 100.png
Step 8.png
Step 7Beta.png
Step 7Alpha.png
Output:
Cat 3.png
Dog 1.png
Step 1.png
Step 2.png
Step 7Alpha.png
Step 7Beta.png
Step 8.png
Step 9.png
Step 10.png
Step 15A.png
Step 15B.png
Step 15C.png
Step 20.png
Step 100.png

Related

Parsing a string in MSAccess VBA with varying numbers of delimiters

In MSAccess VBA, I'm trying to parse a name field into last, first, middle. The problem is that the incoming format is not consistent:
Jones John Q
Doe Jane
Smith Robert X
This is what I'm doing
Dim rsNames As DAO.Recordset
Set rsNames = CurrentDb.OpenRecordset("SELECT * FROM tblInput")
If Not (rsNames.EOF And rsNames.BOF) Then
rsNames.MoveFirst
Do Until rsNames.EOF = True
strFullName = rsNames!Name
intLength = Len(strFullName)
intSpacePos = InStr(strFullName, " ")
strLname = Left(strFullName, intSpacePos - 1)
strFname = Mid(strFullName, intSpacePos, intLength - (intSpacePos - 1))
strFname = Trim(strFname)
If Len(strFname) + Len(strLname) + (intSpacePos - 1) < intLength Then
strMI = Right(strFullName, 1)
End If
rsNames.Edit
rsNames!LastName = strLname
rsNames!FirstName = strFname
rsNames!MiddleInitial = strMI
rsNames.Update
rsNames.MoveNext
Loop
Results
LastName: Jones
FirstName: John Q
Middle Initial: Q
LastName: Doe
FirstName: Jane
Middle Initial: E
If I change this line
strFname = Mid(strFullName, intSpacePos, intLength - (intSpacePos - 1)) to
strFname = Mid(strFullName, intSpacePos, intLength - (intSpacePos), the lines with middle initials parse correctly, but the lines without middle initials cut off the last character of the first name and move it to the middle initial field (Doe Jan E)
I've tried using split and replace but neither works properly because of the varying numbers of spaces separating the fields. I'm wondering if my only option is to read the string character by character and building the individual fields that way, but before I go down that path, am I missing something obvious? I have no control over the incoming file.
I'll propose you to use split() function, in this manner:
Dim rsNames As DAO.Recordset
Dim strLname As String, strFname As String, strMI As String
Dim i As Integer
Dim x, arr As Variant
Set rsNames = CurrentDb.OpenRecordset("SELECT * FROM tblInput")
If Not (rsNames.EOF And rsNames.BOF) Then
'rsNames.MoveFirst
Do Until rsNames.EOF = True
arr = Split(rsNames!Name)
strLname = ""
strFname = ""
strMI = ""
i = 0
For Each x In arr
If (x <> "") Then
If (i = 0) Then
strLname = x
ElseIf (i = 1) Then
strFname = x
Else
strMI = x
End If
i = i + 1
End If
'
If (i > 2) Then
Exit For
End If
Next
'
rsNames.Edit
rsNames!LastName = strLname
rsNames!FirstName = strFname
rsNames!MiddleInitial = strMI
rsNames.Update
rsNames.MoveNext
Loop
End If
rsNames.Close
Set rsNames = Nothing
We use a loop to find non empty split strings as LastName, FirstName and Middle initial.
This pure VBA code avoids us to use extra VBScript.RegExp replacement.
I would lean towards using RegEx and Split:
Private Sub Test()
Dim strFullName As String
Dim NameParts As Variant
strFullName = "Jones John Q"
With CreateObject("vbscript.regexp")
.Pattern = "\s+"
.Global = True
strFullName = .Replace(strFullName, " ")
End With
NameParts = Split(strFullName, " ")
End Sub
NameParts is an array containing Last, First, and possibly Middle names.
Are First Name and Last Name always in the same position? If so, the use of split can be use to determine the existence of the middle, i may be missing something though, i'd go for
Dim a() As String
a() = Split(s, Chr(32))
strLastName = a(0)
strFirstName = a(1)
If UBound(a) = 2 Then
strMiddle = a(2)
Else
strMiddle = ""
End If
Debug.Print strFirstName, strMiddle, strLastName
or something a bit less elegant
If Len(s) - Len(Replace(s, Chr(32), "")) = 2 Then
strMiddle = Right(s, Len(s) - InStrRev(s, Chr(32)))
End If

Adding +2 to any number (VBA)

I am writing a VBA code to add +2 to any string of numbers that are put in the function.
It works fine, until it reaches 6 and 7, then it breaks. I really have no clue why that is.
If you are wondering why I am doing this, this is part of an encryption algorithm and it is specifically looking to encrypt digits in a string.
My code is:
Sub AddNumbers()
Dim Nos As String
Dim AddNo As String
Dim Found As Boolean
Dim Split()
Nos = "0-1-2-3-4-5-6-7-8-9-10"
Sheets("Sheet1").Range("U2").Value = Nos
Length = Len(Nos)
ReDim Split(Length)
For i = 1 To Length
Found = False
Split(i) = Mid(Nos, i, 1)
For O = 48 To 55
If Split(i) = Chr(O) Then
Split(i) = Chr(O + 2)
Found = True
Exit For
End If
Next O
If Split(i) = Chr(56) Then
Split(i) = Chr(48)
ElseIf Split(i) = Chr(57) Then
Split(i) = Chr(49)
End If
Next i
AddNo = Join(Split, "")
Sheets("Sheet1").Range("U3").Value = AddNo
End Sub
I would really appreciate an insight to why it is breaking at 6 and 7.
Take me a moment, but you are double adding.
Look at your loop. When you encounter 6 (Char(54)) you add 2 and have 8 (Char(56)).
But then, after your loop you are testing again for same Split(i). Char for 6 and 7 are now accordingly 56 and 57 - so you add another 2 to them.
If Split(i) = Chr(56) And Found = False Then
Split(i) = Chr(48)
ElseIf Split(i) = Chr(57) And Found = False Then
Split(i) = Chr(49)
End If
Use the actual function Split:
Sub AddNumbers()
Dim Nos As String
Dim AddNo As String
Dim Found As Boolean
Dim SplitStr() As String
Nos = "0-1-2-3-4-5-6-7-8-9-10"
Sheets("Sheet1").Range("U2").Value = Nos
SplitStr = Split(Nos, "-")
Dim i As Long
For i = LBound(SplitStr) To UBound(SplitStr)
Dim vlue As String
vlue = StrConv(SplitStr(i), vbUnicode)
Dim substr() As String
substr = Split(Left(vlue, Len(vlue) - 1), vbNullChar)
Dim j As Long
For j = LBound(substr) To UBound(substr)
Select Case substr(j)
Case 8
substr(j) = 0
Case 9
substr(j) = 1
Case Else
substr(j) = substr(j) + 2
End Select
Next j
SplitStr(i) = Join(substr, "")
Next i
AddNo = Join(SplitStr, "-")
Sheets("Sheet1").Range("U3").Value = AddNo
End Sub
The overall problem is that you are using the Chr codes for numbers and not actual numbers. This method only returns 1 digit because a Chr() refers to a list of single characters.
You are going to need to use Split (mySplit = Split(Nos,"-")) to return each number and work with those.
The lines
If Split(i) = Chr(56) Then
Split(i) = Chr(48)
ElseIf Split(i) = Chr(57) Then
Split(i) = Chr(49)
End If
has me confused. You are saying if the value is "8" change to "0" and if it is "9" change to "1"
This is another way to do it:
Sub AddNumbers()
Dim Nos As String, Nos2 As String
Dim NumSplit As Variant
Dim Num As Variant
Dim tmp As String
Dim i As Long
Nos = "0-1-2-3-4-5-6-7-8-9-10"
Sheets("Sheet1").Range("U2").Value = Nos
NumSplit = Split(Nos, "-")
For Each Num In NumSplit
For i = 1 To Len(Num)
tmp = tmp & Mid(Num, i, 1) + 2
Next i
Nos2 = Nos2 & tmp & "-"
tmp = ""
Next Num
Nos2 = Left(Nos2, Len(Nos2) - 1)
Sheets("Sheet1").Range("U3").Value = Nos2
End Sub
It's a bit messy, but shows the basic idea of splitting the original array into the separate numbers.
The For....Next loop inside the For...Each loop takes care of any numbers with more than one digit (giving the 32).

gas behavior in VBA

my project is to predict non-ideal gas movement, so i wrote this code to give every molecule a specific number, but it keeps repeating numbers (i used randbetween)
how do i chnge it so it wont repeat the same number?
Sub Rand_Number()
'áåçø 20 àçåæ ùì îñôøé äîåì÷åìåú
Dim RandNum As Long
Dim k As Long
Dim Mone As Integer
Mone = 0
Num_molecules = Sheets("Data").Range("A14").Value
RandNum = WorksheetFunction.RandBetween(1, Num_molecules)
For j = 1 To Num_molecules * 0.2
If IsEmpty(Sheets("rand").Cells(1, 1)) = True Then
Sheets("rand").Cells(1, 1) = RandNum
Else
i = 1
'RandNum = WorksheetFunction.RandBetween(1, Num_molecules)
Do 'Until IsEmpty(Sheets("rand").Cells(i, 1)) = True
If Sheets("rand").Cells(i, 1) = RandNum Then
RandNum = WorksheetFunction.RandBetween(1, Num_molecules)
Do Until RandNum = Cells(i, 1) Or IsEmpty(Cells(i, 1)) = True
If RandNum = Sheets("rand").Cells(i, 1) Then
RandNum = WorksheetFunction.RandBetween(1, Num_molecules)
Else
i = i + 1
End If
Loop
ElseIf IsEmpty(Sheets("rand").Cells(i, 1)) = False Then
i = i + 1
Else
Sheets("rand").Cells(i, 1) = RandNum
Exit Do
End If
Loop
End If
Next j
End Sub
Generation of numbers until all numbers from the range are generated. It is inefficient as towards the end of the algorithm most random numbers become a "miss", but it is still more efficient than collection's remove method below.
Sub uniqRndMissedHits()
Dim lb As Long: lb = 1 ' lower bound
Dim ub As Long: ub = 1000 ' upper bound
' populate collection with numbers starting from lb to ub
Dim i As Long
Dim c As New Collection
' iterate while we haven't generated all the random numbers
' in the specified range
While c.Count < ub - lb + 1
i = Int((ub - lb + 1) * Rnd + lb)
If Not contains(c, CStr(i)) Then
c.Add i, CStr(i)
Debug.Print i ' this is your unique random number from the
' remaining in the collection
End If
Wend
End Sub
Function contains(col As Collection, key As String) As Boolean
On Error Resume Next
col.Item key
contains = (Err.Number = 0)
On Error GoTo 0
End Function
This example generates a guaranteed unique (i.e. previously not generated) values, but Remove method of the Collection makes it inefficient for large number of simulations.
Sub uniqRnd()
Dim lb As Long: lb = 1 ' lower bound
Dim ub As Long: ub = 1000 ' upper bound
' populate collection with numbers starting from lb to ub
Dim i As Long
Dim c As New Collection
For i = lb To ub: c.Add i: Next
' randomly pick the number and (!) remove it from the
' collection at the same time so it won't be repeated
While c.Count > 0
lb = 1
ub = c.Count
i = Int((ub - lb + 1) * Rnd + lb)
Debug.Print c(i) ' this is your unique random number from the
' remaining in the collection
c.Remove i
Wend
End Sub
Comparison of performance of all the methods in this answer can be found in this GitHub Gist Excel VBA: Generate complete set of unique random numbers
I'd recommend using a dictionary to keep track of the random numbers that have been generated so far. If the number doesn't exist in the dictionary you can proceed with the simulation, otherwise you could generate a new random number (this would be the Else condition)
Using a dictionary is very fast for doing the lookup.
Here's a code sample of how to work with a dictionary.
Public Sub DictionaryExample()
Dim myDict As Object: Set myDict = CreateObject("Scripting.Dictionary")
Dim myRand As Long
Dim i As Long
For i = 1 To 10000
myRand = WorksheetFunction.RandBetween(1, 10000)
If myDict.exists(myRand) = False Then ' The random number doesn't exist in the previous items added
'If it doesn't exist, add it to the dictionary
myDict.Add myRand, myRand 'First parameter is the key, or the unique value
'The second parameter is the value associated with the key, the lookup value
Else
'Do something here when it does exist
End If
Next i
End Sub

VBA - Setting multidimensional array values in one line

Right, so using Python I would create a multidimensional list and set the values on one line of code (as per the below).
aryTitle = [["Desciption", "Value"],["Description2", "Value2"]]
print(aryTitle[0,0] + aryTitle[0,1])
I like the way I can set the values on one line. In VBA I am doing this by:
Dim aryTitle(0 To 1, 0 To 1) As String
aryTitle(0, 0) = "Description"
aryTitle(0, 1) = "Value"
aryTitle(1, 0) = "Description2"
aryTitle(1, 1) = "Value2"
MsgBox (aryTitle(0, 0) & aryTitle(0, 1))
Is there a way to set the values in one line of code?
Not natively, no. But you can write a function for it. The only reason Python can do that is someone wrote a function to do it. The difference is that they had access to the source so they could make the syntax whatever they like. You'll be limited to VBA function syntax. Here's a function to create a 2-dim array. It's not technically 'one line of code', but throw it in your MUtilities module and forget about it and it will feel like one line of code.
Public Function FillTwoDim(ParamArray KeyValue() As Variant) As Variant
Dim aReturn() As Variant
Dim i As Long
Dim lCnt As Long
ReDim aReturn(0 To ((UBound(KeyValue) + 1) \ 2) - 1, 0 To 1)
For i = LBound(KeyValue) To UBound(KeyValue) Step 2
If i + 1 <= UBound(KeyValue) Then
aReturn(lCnt, 0) = KeyValue(i)
aReturn(lCnt, 1) = KeyValue(i + 1)
lCnt = lCnt + 1
End If
Next i
FillTwoDim = aReturn
End Function
Sub test()
Dim vaArr As Variant
Dim i As Long
Dim j As Long
vaArr = FillTwoDim("Description", "Value", "Description2", "Value2")
For i = LBound(vaArr, 1) To UBound(vaArr, 1)
For j = LBound(vaArr, 2) To UBound(vaArr, 2)
Debug.Print i, j, vaArr(i, j)
Next j
Next i
End Sub
If you supply an odd number of arguments, it ignores the last one. If you use 3-dim arrays, you could write a function for that. You could also write a fancy function that could handle any dims, but I'm not sure it's worth it. And if you're using more than 3-dim arrays, you probably don't need my help writing a function.
The output from the above
0 0 Description
0 1 Value
1 0 Description2
1 1 Value2
You can write a helper function:
Function MultiSplit(s As String, Optional delim1 As String = ",", Optional delim2 As String = ";") As Variant
Dim V As Variant, W As Variant, A As Variant
Dim i As Long, j As Long, m As Long, n As Long
V = Split(s, delim2)
m = UBound(V)
n = UBound(Split(V(0), delim1))
ReDim A(0 To m, 0 To n)
For i = 0 To m
For j = 0 To n
W = Split(V(i), delim1)
A(i, j) = Trim(W(j))
Next j
Next i
MultiSplit = A
End Function
Used like this:
Sub test()
Dim A As Variant
A = MultiSplit("Desciption, Value; Description2, Value2")
Range("A1:B2").Value = A
End Sub

VBA to load very large file in one go (no buffering)

I am experiencing an unexpected vb limitation on the string max size, as explained in this post:
VBA unexpected reach of string size limit
While I was expecting to be able to load files up to 2GB (2^31 char) using open path for binary and get function, I get an out of string space error when I try to load a string larger than 255,918,061 characters.
I managed to work around this issue buffering the input stream of get. The problem is that I need to load the file as an array of string by splitting the buffer on vbCrLf characters.
This requires then to build the array line by line. Moreover, since I cannot be sure whether the buffer is ending on a break line or not I need additional operations. This solution is Time and Memory consuming. Loading a file of 300MB with this code costs 900MB (!) use of memory by excel. Is there a better solution ?
Here bellow is my code:
Function Load_File(path As String) As Variant
Dim MyData As String, FNum As Integer
Dim LenRemainingBytes As Long
Dim BufferSizeCurrent As Long
Dim FileByLines() As String
Dim CuttedLine As Boolean
Dim tmpSplit() As String
Dim FinalSplit() As String
Dim NbOfLines As Long
Dim LastLine As String
Dim count As Long, i As Long
Const BufferSizeMax As Long = 100000
FNum = FreeFile()
Open path For Binary As #FNum
LenRemainingBytes = LOF(FNum)
NbOfLines = FileNbOfLines(path)
ReDim FinalSplit(NbOfLines)
CuttedLine = False
Do While LenRemainingBytes > 0
MyData = ""
If LenRemainingBytes > BufferSizeMax Then
BufferSizeCurrent = BufferSizeMax
Else
BufferSizeCurrent = LenRemainingBytes
End If
MyData = Space$(BufferSizeCurrent)
Get #FNum, , MyData
tmpSplit = Split(MyData, vbCrLf)
If CuttedLine Then
count = count - 1
tmpSplit(0) = LastLine & tmpSplit(0)
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
Else
For i = 0 To UBound(tmpSplit)
If count > NbOfLines Then Exit For
FinalSplit(count) = tmpSplit(i)
count = count + 1
Next i
End If
Erase tmpSplit
LastLine = Right(MyData, Len(MyData) - InStrRev(MyData, vbCrLf) - 1)
CuttedLine = Len(LastLine) > 1
LenRemainingBytes = LenRemainingBytes - BufferSizeCurrent
Loop
Close FNum
Load_File = FinalSplit
Erase FinalSplit
End Function
Where the function FileNbOfLines is efficiently returning the number of line break characters.
Edit:
My Needs are:
To look for a specific string within the file
To get a specific number of lines coming after this string
Here you go, not pretty but should give you the general concept:
Sub GetLines()
Const fileName As String = "C:\Users\bloggsj\desktop\testfile.txt"
Const wordToFind As String = "FindMe"
Dim lineStart As String
Dim lineCount As String
Dim linesAfterWord As Long
With CreateObject("WScript.Shell")
lineCount = .Exec("CMD /C FIND /V /C """" """ & fileName & """").StdOut.ReadAll
lineStart = Split(.Exec("CMD /C FIND /N """ & wordToFind & """ """ & fileName & """").StdOut.ReadAll, vbCrLf)(2)
End With
linesAfterWord = CLng(Trim(Mid(lineCount, InStrRev(lineCount, ":") + 1))) - CLng(Trim(Mid(lineStart, 2, InStr(lineStart, "]") - 2)))
Debug.Print linesAfterWord
End Sub
Uses CMD to count the number of lines, then find the line at which the word appears, then subtract one from the other to give you the amount of lines after the word has been found.
Answer: Yes, using ReadAll from FSO should do the job.
Best answer: Just avoid it !
My needs were:
Identify a specific string within the file
Extract a certain number of lines after this string
As far as you precisely know the exact amout of data you want to extract, and assuming this amount of data is below vba string size limit (!), here is what it does the job the faster.
Decrease of computation time is improved using binary comparison of strings. My code is as follows:
Function GetFileLines(path As String, str As String, NbOfLines As Long) As String()
Const BUFSIZE As Long = 100000
Dim StringFound As Boolean
Dim lfAnsi As String
Dim strAnsi As String
Dim F As Integer
Dim BytesLeft As Long
Dim Buffer() As Byte
Dim strBuffer As String
Dim BufferOverlap As String
Dim PrevPos As Long
Dim NextPos As Long
Dim LineCount As Long
Dim data As String
F = FreeFile(0)
strAnsi = StrConv(str, vbFromUnicode) 'Looked String
lfAnsi = StrConv(vbLf, vbFromUnicode) 'LineBreak character
Open path For Binary Access Read As #F
BytesLeft = LOF(F)
ReDim Buffer(BUFSIZE - 1)
'Overlapping buffer is 3/2 times the size of strBuffer
'(two bytes per character)
BufferOverlap = Space$(Int(3 * BUFSIZE / 4))
StringFound = False
Do Until BytesLeft = 0
If BytesLeft < BUFSIZE Then ReDim Buffer(BytesLeft - 1)
Get #F, , Buffer
strBuffer = Buffer 'Binary copy of bytes.
BytesLeft = BytesLeft - LenB(strBuffer)
Mid$(BufferOverlap, Int(BUFSIZE / 4) + 1) = strBuffer 'Overlapping Buffer
If Not StringFound Then 'Looking for the the string
PrevPos = InStrB(BufferOverlap, strAnsi) 'Position of the looked string within the buffer
StringFound = PrevPos <> 0
If StringFound Then strBuffer = BufferOverlap
End If
If StringFound Then 'When string is found, loop until NbOfLines
Do Until LineCount = NbOfLines
NextPos = InStrB(PrevPos, strBuffer, lfAnsi)
If NextPos = 0 And LineCount < NbOfLines Then 'Buffer end reached, NbOfLines not reached
'Adding end of buffer to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos)
PrevPos = 1
Exit Do
Else
'Adding New Line to data
data = data & Mid$(StrConv(strBuffer, vbUnicode), PrevPos, NextPos - PrevPos + 1)
End If
PrevPos = NextPos + 1
LineCount = LineCount + 1
If LineCount = NbOfLines Then Exit Do
Loop
End If
If LineCount = NbOfLines then Exit Do
Mid$(BufferOverlap, 1, Int(BUFSIZE / 4)) = Mid$(strBuffer, Int(BUFSIZE / 4))
Loop
Close F
GetFileLines = Split(data, vbCrLf)
End Function
To crunch even more computation time, it is highly advised to use fast string concatenation as explained here.
For instance the following function can be used:
Sub FastConcat(ByRef Dest As String, ByVal Source As String, ByRef ccOffset)
Dim L As Long, Buffer As Long
Buffer = 50000
L = Len(Source)
If (ccOffset + L) >= Len(Dest) Then
If L > Buffer Then
Dest = Dest & Space$(L)
Else
Dest = Dest & Space$(Buffer)
End If
End If
Mid$(Dest, ccOffset + 1, L) = Source
ccOffset = ccOffset + L
End Sub
And then use the function as follows:
NbOfChars = 0
Do until...
FastConcat MyString, AddedString, NbOfChars
Loop
MyString = Left$(MyString,NbOfChars)