How to recursively parse data out of an e-mail using VBA?

So I get e-mails every day with information in them. Unfortunately, for some reason, the data is sent in the body of the e-mail, and not as an attachment. Fine then. I'm using Excel to scrape Outlook, using VBA.
Sub mytry()
    Dim olapp As Object
    Dim olmapi As Object
    Dim olmail As Object
    Dim olitems As Object
    Dim olitem As Object
    Dim lrow As Integer
    Dim olattach As Object
    Dim str As String
    Dim TextWeNeedToParse As String
    Const path As String = "C:\HP\"
    Const emailpath As String = "C:\Dell\"
    Const olFolderInbox As Integer = 6
    Set olapp = CreateObject("outlook.application")
    Set olmapi = olapp.GetNamespace("MAPI")
    Set olmail = olmapi.GetDefaultFolder(olFolderInbox)
    ' MacroDate is assumed to be a Date set elsewhere; the filter is built by
    ' concatenation so that its value actually ends up in the filter string
    Set olitems = olmail.Items.Restrict("[ReceivedTime] >= '" & Format(MacroDate, "ddddd") & " 12:00 AM'")
    ' An empty result simply skips the loop, so no separate Count check is needed
    For Each olitem In olitems
        TextWeNeedToParse = olitem.Body
        'Recursive text parsing here
    Next olitem
End Sub
Ok, so this code snippet should get me the entire body of the text into a string. Now we can pass the string around, and manipulate it.
A sample of the text I'm dealing with:
WAL +300bp QTY
(M) FCTR SECURITY CPN ASK 1mPSA TYPE
0.77 1.15 458 0.04 GNR 2012-61 CA 2.00 99-16 217 SEQ
1.39 2.26 120 0.76 GNR 2005-13 AE 5.00 102-24 223 SUP
1.40 18.16 45 0.65 GNR 2015-157 NH 2.50 95-16 215 EXCH,+
1.50 21.56 25 0.94 GNR 2017-103 HD 3.00 98-08 375 PAC-2
So there are a few different ways I can see myself tackling this, but I don't quite know all of the pieces.
1) I could try counting how many carriage returns exist, and doing a loop. Then "counting" spaces to figure out where everything is. Not quite sure how well it would work.
2) I could regex out the unique ID in the middle, and if I can figure out how to regex the nth instance (a major point where I'm stuck), I could also use that to regex out the numbers - for example, line one would be the 1-5 instance of straight numbers/decimals together surrounded by spaces, and the first instance of number-number-dash-number-number.
Sample Regex Code that I'd throw through it:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function
3) I could try some of the methods above, but use recursion. My recursion is fairly weak.
So once I have the text string extracted, I imagine I'd need something like:
Sub QuickExample(Cusip As String, PriceStr As Variant, SpreadStr As Variant)
    Dim ws As Worksheet
    Dim LastRow As Long
    Set ws = ThisWorkbook.Sheets("Results")
    ' First empty row below the existing data in column A
    LastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row + 1
    ws.Cells(LastRow, 1).Value2 = Cusip
    ws.Cells(LastRow, 2).Value2 = PriceStr
    ws.Cells(LastRow, 3).Value2 = SpreadStr
End Sub
And lastly:
Sub ParsingDate(EmailText As String)
    Dim CarriageReturns As Long
    Dim i As Long
    ' Chr(10) is the line feed, so this counts the line breaks in the body
    CarriageReturns = Len(EmailText) - Len(Replace(EmailText, Chr(10), ""))
    For i = 1 To CarriageReturns
        'Parse out the data for the ith row, return it to the function above
    Next i
End Sub
It's the actual act of parsing which I'm struggling a bit with - how do I properly get the nth result, and only the nth result? How do I make sure it keeps working even if some extra spaces or lines get added? Is there a way to just use regex, and "look" around the nth finding of a given expression? Is it doable to make this without a lot of recursion?
Thank you

WAL +300bp QTY
(M) FCTR SECURITY CPN ASK 1mPSA TYPE
0.77 1.15 458 0.04 GNR 2012-61 CA 2.00 99-16 217 SEQ
1.39 2.26 120 0.76 GNR 2005-13 AE 5.00 102-24 223 SUP
1.40 18.16 45 0.65 GNR 2015-157 NH 2.50 95-16 215 EXCH,+
1.50 21.56 25 0.94 GNR 2017-103 HD 3.00 98-08 375 PAC-2
This seems like a pretty well formatted table. Perhaps pop each line into an array using Split() and then each field into an array, again using split():
Sub dump()
arrLine = Split(TextWeNeedToParse, Chr(10))
For Each Line In arrLine
For Each field In Split(Line, " ")
Debug.Print field
Next
Next
End Sub
That's short and runs quickly. You are just an If statement and a counter (or a regex test) away from getting the exact items you want.
Testing and counting may be easier if you collapse runs of spaces first, so that Split() puts each element in its proper place. You could employ a loop to remove the repeated spaces before splitting.
Fully implemented, it might look something like:
<your code to get the body>
'remove multiple spaces from string for parsing
'(search for two spaces and replace with one until none are left)
Do While InStr(1, TextWeNeedToParse, "  ") > 0
TextWeNeedToParse = Replace(TextWeNeedToParse, "  ", " ")
Loop
'Get each line into an array element
arrLine = Split(TextWeNeedToParse, Chr(10))
'Loop through the array
For Each Line In arrLine
'dump fields to an array
arrFields = Split(Line, " ")
'and spit out a particular element (your "unique id" is element 5)
If UBound(arrFields) >= 5 Then Debug.Print "unique id:"; arrFields(5)
Next
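On the "nth result" part of the question, one possible approach is to run the pattern with Global = True and index into the match collection that Execute returns. The sketch below assumes late-bound VBScript.RegExp is available; NthMatch is a hypothetical helper name, not something from the code above.

```vba
' Hypothetical helper: returns the n-th (1-based) match of pattern in source,
' or an empty string when there are fewer than n matches.
Function NthMatch(ByVal source As String, ByVal pattern As String, ByVal n As Long) As String
    Dim re As Object
    Dim matches As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference needed
    re.Global = True                           ' collect every match, not just the first
    re.Pattern = pattern
    Set matches = re.Execute(source)
    If n >= 1 And n <= matches.Count Then
        NthMatch = matches(n - 1).Value        ' the match collection is 0-based
    End If
End Function
```

For the first sample row, NthMatch(Line, "\d+-\d+", 2) would pick out 99-16, because 2012-61 is the first number-dash-number run on that line. Applying it to one line at a time (after the Split on Chr(10)) keeps extra blank lines or spaces elsewhere in the mail from shifting the count.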

Related

EXCEL: Extracting Millilitres and Litres from cell using vba or formula

I am trying to extract the millilitres and litres from text (only the value, not the ml or Ltr), but as there is no specific standard, it is difficult to write a formula. Below are some examples of the types of descriptions I have:
- Settlement 2.5% (Settlement Discount)
- BAGDOL/1 (U-POL DOLPHIN GLAZE Brushable Stopper Bag 440ml)
- P38/5 (ISOPON P38 LIGHTWEIGHT FILLER 3Ltr)
- EGC21TT (EUROPEAN COATINGS PRIMER 4:1 HS 4Ltr TRIPLE TIGHT)
- RLT/1KIT (U-POL RAPTOR TINTABLE 750ml & 250ml STANDARD HARDENER KIT)
- CCWP/AL (U-POL CUSTOM CAN Pregassed Aerosol 400ml (Waterbased))
I have used the formula below, which I feel I have over-complicated:
=IF(LEN(IFERROR(MID(G2,FIND("ml",G2)-3,LEN(G2)),""))=0,IFERROR(SUBSTITUTE(MID(G2,FIND("Ltr",G2)-2,LEN(G2)),"Ltr)","")*1000,""),IFERROR(SUBSTITUTE(MID(G2,FIND("ml",G2)-3,LEN(G2)),"ml)",""),""))
So text with 440ml returns 440 and text with 3Ltr returns 3000, but only if the measurement is at the end of the cell's text.
I want to be able to extract the ml or Ltr value no matter where it is in the text. Where the text has more than one ml value, I want to sum them. Example: 750ml & 250ml should give 1000. Settlement 2.5% (Settlement Discount) should return 0.
Is there a simple way of doing this? I'd be happier doing it in VBA, as I think this would be less complicated.
Any help would be appreciated!
There is probably a far better way of doing this using RegEx, but I'm not practiced in those dark arts so I'd use something like the following to get what you need:
Function getvolume(txt As String) As Long
Dim wrd As Variant, wrds As Variant
getvolume = 0
If txt = "" Then Exit Function
' strip out characters that will confuse
txt = Replace(txt, "(", " ")
txt = Replace(txt, ")", " ")
' convert all to lower case
txt = LCase(txt)
' split the text into individual "words"
wrds = Split(txt, " ")
For Each wrd In wrds
If wrd Like "*ml" Then getvolume = getvolume + Val(Replace(wrd, "ml", ""))
If wrd Like "*ltr" Then getvolume = getvolume + 1000 * Val(Replace(wrd, "ltr", ""))
Next
End Function
So if your text is in G2 then you'd use a formula of =getvolume(G2)
regex User Defined Function
Option Explicit
Function metricMeasure(str As String)
Dim n As Long, unit As String, nums() As Variant
Static rgx As Object, cmat As Object
'with rgx as static, it only has to be created once; beneficial when filling a long column with this UDF
If rgx Is Nothing Then
Set rgx = CreateObject("VBScript.RegExp")
End If
metricMeasure = vbNullString
With rgx
.Global = True
.MultiLine = False
.Pattern = "[0-9]{1,4}m?[lL]t?r?"
If .Test(str) Then
Set cmat = .Execute(str)
'resize the nums array to accept the matches
ReDim nums(cmat.Count - 1)
'get measurement unit
unit = Replace(cmat.Item(0), Val(cmat.Item(0)), vbNullString, 1, vbTextCompare)
'populate the nums array with the matches
For n = LBound(nums) To UBound(nums)
nums(n) = Val(cmat.Item(n))
Next n
'convert the nums array to a subtotal with unit suffix
metricMeasure = Application.Sum(nums) & unit
End If
End With
End Function
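A minimal sketch of a regex variant that returns a plain number in millilitres, converting Ltr values and summing every match in the text. VolumeInMl is a hypothetical name, and the pattern assumes the unit is always written as ml or Ltr directly after the number (optionally separated by whitespace).

```vba
' Hypothetical UDF: total volume in millilitres found anywhere in txt.
Function VolumeInMl(ByVal txt As String) As Double
    Dim re As Object
    Dim m As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True          ' find every measurement, not just the first
    re.IgnoreCase = True      ' accept ml, ML, Ltr, ltr, ...
    ' SubMatches(0) = the number, SubMatches(1) = the unit
    re.Pattern = "(\d+(?:\.\d+)?)\s*(ml|ltr)\b"
    For Each m In re.Execute(txt)
        If LCase$(m.SubMatches(1)) = "ml" Then
            VolumeInMl = VolumeInMl + Val(m.SubMatches(0))
        Else
            ' litres are converted to millilitres before summing
            VolumeInMl = VolumeInMl + 1000 * Val(m.SubMatches(0))
        End If
    Next m
End Function
```

With the sample descriptions, =VolumeInMl(G2) should give 1000 for the 750ml & 250ml kit, 3000 for the 3Ltr filler, and 0 for the settlement discount line, since 2.5% is not followed by a unit.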

find and replace multiple values in vba

I am trying to perform this task in Microstation using VBA. I want to use it to find and replace multiple numbers in the drawing.
I know the program generally, however I am having trouble putting this together. I have 2 variables.
Thank you in advance and sorry for the badly written code. Just getting used to VBA!
Sub main()
Dim Find_text () As string = split ("150 160 170 180 190 200 210 220")
Dim Replace_text () As string = split ("15 16 17 18 19 20 21 22")
For i As Integer = 0 To Find_text.length - 1
'I will write my find and replace code here
Next
End Sub
You can use the InStr() function to find a substring in a string and the Replace() function to replace what was found with the replacement string:
Sub main()
    Dim Find_text() As String
    Dim Replace_text() As String
    Dim str As String
    Dim i As Long
    str = "test 150 test 160 test 170 test 200 test 220"
    Find_text = Split("150 160 170 180 190 200 210 220")
    Replace_text = Split("15 16 17 18 19 20 21 22")
    ' One pass over the find/replace pairs is enough
    For i = 0 To UBound(Find_text)
        If InStr(str, Find_text(i)) > 0 Then
            str = Replace(str, Find_text(i), Replace_text(i))
        End If
    Next i
    MsgBox str
End Sub
The trick is to use the MicroStation API to get a list of elements to operate on. You can use .GetSelectedElements or .Scan depending on how you want your tool to work. As you said you are just getting used to VBA, I have heavily commented the code below.
Other gotchas: text elements and text nodes (multi-line text) need to be handled differently, and don't forget to .Rewrite the element once you have modified it.
Option Explicit
Sub replaceTextInModel()
Dim findText() As String
Dim replaceText() As String
'set find and replace
findText = Split("10 20 30 40 50")
replaceText = Split("a b c d e")
'guard against unequal length searches
If UBound(findText) <> UBound(replaceText) Then
MsgBox "Find and replace are not equal lengths"
Exit Sub
End If
' Scan Criteria are needed when looking for elements in a model
' Set up scan criteria to only include text types
Dim eSC As New ElementScanCriteria
eSC.ExcludeAllTypes
eSC.IncludeType msdElementTypeText 'text element
eSC.IncludeType msdElementTypeTextNode 'multiple line text element
Dim model As ModelReference
Set model = ActiveModelReference
' if you need to loop through multiple models you could use this
' Set model = ActiveDesignFile.Models(i)
' Element Enumerator is a list of elements
Dim elements As ElementEnumerator
'scan active model for text
Set elements = model.Scan(eSC)
' the elements could also be retrieved using
' Set elements = model.GetSelectedElements
' iterate through element set
' If there is another element in the list then
' MoveNext sets elements.Current to the next element and returns true
' otherwise it returns false and the loop exits.
Do While elements.MoveNext
Dim i As Integer
Dim textNodeI As Integer
Dim tempText As String
' elements.Current is a generic element
' we need to check its type to handle it correctly
If elements.Current.IsTextElement Then
' access the generic element using the text element interface
With elements.Current.AsTextElement
'.text here refers to elements.Current.AsTextElement.text as specified by the With statement
tempText = .text
'split is 0 indexed
For i = 0 To UBound(findText)
tempText = Replace(tempText, findText(i), replaceText(i))
Next
'set the elements text to the replaced text
.text = tempText
'rewrite the text element to the model
.Rewrite
End With
ElseIf elements.Current.IsTextNodeElement Then
With elements.Current.AsTextNodeElement
' TextNodes have an array of TextLines (1 indexed)
For textNodeI = 1 To .TextLinesCount
'same as for text but for each line of node
tempText = .TextLine(textNodeI)
For i = 0 To UBound(findText)
tempText = Replace(tempText, findText(i), replaceText(i))
Next
.TextLine(textNodeI) = tempText
Next
' Rewrite the text node after you have replaced each line.
.Rewrite
End With
End If
Loop
End Sub

Excel VBA Custom Function Remove Words Appearing in One String From Another String

I am trying to remove words appearing in one string from a different string using a custom function. For instance:
A1:
the was why blue hat
A2:
the stranger wanted to know why his blue hat was turning orange
The ideal outcome in this example would be:
A3:
stranger wanted to know his turning orange
I need to have the cells in reference open to change so that they can be used in different situations.
The function will be used in a cell as:
=WORDREMOVE("cell with words needing remove", "cell with list of words being removed")
I have a list of 20,000 rows and have managed to find a custom function that can remove duplicate words (below) and thought there may be a way to manipulate it to accomplish this task.
Function REMOVEDUPEWORDS(txt As String, Optional delim As String = " ") As String
Dim x
'Updateby20140924
With CreateObject("Scripting.Dictionary")
.CompareMode = vbTextCompare
For Each x In Split(txt, delim)
If Trim(x) <> "" And Not .exists(Trim(x)) Then .Add Trim(x), Nothing
Next
If .Count > 0 Then REMOVEDUPEWORDS = Join(.keys, delim)
End With
End Function
If you can guarantee that your words in both strings will be separated by spaces (no comma, ellipses, etc), you could just Split() both strings then Filter() out the words:
Function WORDREMOVE(ByVal strText As String, strRemove As String) As String
Dim a, w
a = Split(strText) ' Start with all words in an array
For Each w In Split(strRemove)
a = Filter(a, w, False, vbTextCompare) ' Remove every word found
Next
WORDREMOVE = Join(a, " ") ' Recreate the string
End Function
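One caveat with Filter(): it matches substrings, so removing "hat" would also drop "that" or "what" from the text. A minimal sketch of a whole-word variant is below, using a Scripting.Dictionary as a lookup; WordRemoveExact is a hypothetical name, and it still assumes words are separated by single or multiple spaces only.

```vba
' Hypothetical variant: removes only whole words that appear in strRemove.
Function WordRemoveExact(ByVal strText As String, ByVal strRemove As String) As String
    Dim dict As Object
    Dim w As Variant
    Dim result As String
    Set dict = CreateObject("Scripting.Dictionary")
    dict.CompareMode = vbTextCompare             ' case-insensitive lookups
    For Each w In Split(strRemove, " ")
        If Len(w) > 0 Then dict(w) = True        ' remember each word to drop
    Next w
    For Each w In Split(strText, " ")
        If Len(w) > 0 Then
            ' keep the word only if it is not in the removal list
            If Not dict.Exists(w) Then result = result & " " & w
        End If
    Next w
    WordRemoveExact = Mid$(result, 2)            ' drop the leading space
End Function
```

With the example cells, =WordRemoveExact(A2, A1) should return "stranger wanted to know his turning orange". The dictionary lookup also avoids rescanning the whole word array once per removed word, which may matter over 20,000 rows.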
You can also do this using Regular Expressions in VBA. The version below is case insensitive and assumes all words are separated only by space. If there is other punctuation, more examples would aid in crafting an appropriate solution:
Option Explicit
Function WordRemove(Str As String, RemoveWords As String) As String
Dim RE As Object
Set RE = CreateObject("vbscript.regexp")
With RE
.ignorecase = True
.Global = True
' \b keeps a short word such as "the" from matching inside a longer one such as "other"
.Pattern = "\b(?:" & Join(Split(WorksheetFunction.Trim(RemoveWords)), "|") & ")\b\s*"
WordRemove = .Replace(Str, "")
End With
End Function
My example is certainly not the best code, but it should work
Function WORDREMOVE(FirstCell As String, SecondCell As String)
Dim FirstArgument As Variant, SecondArgument As Variant
Dim FirstArgumentCounter As Integer, SecondArgumentCounter As Integer
Dim Checker As Boolean
WORDREMOVE = ""
FirstArgument = Split(FirstCell, " ")
SecondArgument = Split(SecondCell, " ")
For SecondArgumentCounter = 0 To UBound(SecondArgument)
Checker = False
For FirstArgumentCounter = 0 To UBound(FirstArgument)
If SecondArgument(SecondArgumentCounter) = FirstArgument(FirstArgumentCounter) Then
Checker = True
End If
Next FirstArgumentCounter
If Checker = False Then WORDREMOVE = WORDREMOVE & SecondArgument(SecondArgumentCounter) & " "
Next SecondArgumentCounter
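' Strip the trailing space; the guard avoids an error when every word was removed
If Len(WORDREMOVE) > 0 Then WORDREMOVE = Left(WORDREMOVE, Len(WORDREMOVE) - 1)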
End Function

Find and replace all names of variables in VBA module

Let's assume that we have one module with only one Sub in it, and there are no comments. How can all variable names be identified? Is it possible to identify the names of variables that are not declared with Dim? I would like to identify them and replace each with a random name to obfuscate my code (O0011011010100101, for example); the replace part is much easier.
List of characters which could be use in names of macros, functions and variables :
ABCDEFGHIJKLMNOPQRSTUVWXYZdefghijklmnopqrstuvwxyzg€‚„…†‡‰Š‹ŚŤŽŹ‘’“”•–—™š›śťžź ˇ˘Ł¤Ą¦§¨©Ş«¬­®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙ÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ˙
Below is a function I wrote recently:
Function randomName(n As Integer) As String
    Dim y As String
    Dim i As Integer
    y = "O"   ' a name must start with a letter, so lead with a capital O
    For i = 2 To n
        If Rnd() > 0.5 Then
            y = y & "0"
        Else
            y = y & "1"
        End If
    Next i
    randomName = y
End Function
To replace the given strings inside another string that holds the module's code, I use the sub below:
Sub substituteNames()
'count lines in "Module1" which is part of current workbook
linesCount = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.CountOfLines
'read code from module
code = ActiveWorkbook.VBProject.VBComponents("Module1").CodeModule.Lines(StartLine:=1, Count:=linesCount)
inputStr = Array("name1", "name2", "name2") 'some hard-coded array of strings to replace
namesLength = 20 'length of new variables names
For i = LBound(inputStr) To UBound(inputStr)
outputString = randomName(namesLength-1)
code = Replace(code, inputStr(i), outputString)
Next i
Debug.Print code 'view code
End Sub
Then we simply substitute the old code with the new one. But how can the strings that are variable names be identified?
Edit
Using Option Explicit weakens this simple obfuscation, because to reverse it you only have to follow the Dim statements and replace the ugly names with something readable. To make that substitution harder, I think it's a good idea to break the line in the middle of a variable name:
O0O000O0OO0O0000 _
0O00000O0OO0
Another simple method is to replace string literals with chains of Chr calls, e.g. chr(104)&chr(101)&chr(108)&chr(108)&chr(111) for "hello":
Sub stringIntoChrChain()
strInput = "hello"
strOutput = ""
For i = 1 To Len(strInput)
strOutput = strOutput & "chr(" & Asc(Mid(strInput, i, 1)) & ")&"
Next i
Debug.Print Mid(strOutput, 1, Len(strOutput) - 1)
End Sub
Comments like the ones below could impress the user and make him think that he does not possess the right tool to deal with the macro, etc.:
'(k=Äó¬)w}ż^¦ů‡ÜOyúm=ěËnóÚŽb W™ÄQó’ (—*-ĹTIäb
'R“ąNPÔKZMţ†üÍQ‡
'y6ű˛Š˛ŁŽ¬=iýQ|˛^˙ ‡ńb ¬ĂÇr'ń‡e˘źäžŇ/âéç;1qýěĂj$&E!V?¶ßšÍ´cĆ$Âű׺Ůî’ﲦŔ?TáÄu[nG¦•¸î»éüĽ˙xVPĚ.|
'ÖĚ/łó®Üă9Ę]ż/ĹÍT¶Mµę¶mÍ
'q[—qëýY~Pc©=jÍ8˘‡,Ú+ń8ŐűŻEüńWü1ďëDZ†ć}ęńwŠbŢ,>ó’Űçµ™Š_…qÝăt±+‡ĽČg­řÍ!·eŠP âńđ:ŶOážű?őë®ÁšńýĎáËTbž}|Ö…ăË[®™
You can use a regular expression to find variable assignments by looking for the equals sign. You'll need to add a reference to the Microsoft VBScript Regular Expressions 5.5 and Microsoft Visual Basic for Applications Extensibility 5.3 libraries as I've used early binding.
Please be sure to back up your work and test this before using it. I could have gotten the regex wrong.
UPDATE:
I've refined the regular expressions so that it no longer catches datatypes of strongly typed constants (Const ImAConstant As String = "Oh Noes!" previously returned String). I've also added another regex to return those constants as well. The last version of the regex also mistakenly caught things like .Global = true. That was corrected. The code below should return all variable and constant names for a given code module. The regular expressions still aren't perfect, as you'll note that I was unable to stop false positives on double quotes. Also, my array handling could be done better.
Sub printVars()
Dim linesCount As Long
Dim code As String
Dim vbPrj As VBIDE.VBProject
Dim codeMod As VBIDE.CodeModule
Dim regex As VBScript_RegExp_55.RegExp
Dim m As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim i As Long
Dim j As Long
Dim isInDatatypes As Boolean
Dim isInVariables As Boolean
Dim datatypes() As String
Dim variables() As String
Set vbPrj = VBE.ActiveVBProject
Set codeMod = vbPrj.VBComponents("Module1").CodeModule
code = codeMod.Lines(1, codeMod.CountOfLines)
Set regex = New RegExp
With regex
.Global = True ' match all instances
.IgnoreCase = True
.MultiLine = True ' "code" var contains multiple lines
.Pattern = "(\sAs\s)([\w]*)(?=\s)" ' get list of datatypes we've used
' match any whole word after the word " As "
Set matches = .Execute(code)
End With
ReDim datatypes(matches.count - 1)
For i = 0 To matches.count - 1
datatypes(i) = matches(i).SubMatches(1) ' return second submatch so we don't get the word " As " in our array
Next i
With regex
.Pattern = "(\s)([^\.\s][\w]*)(?=\s\=)" ' list of variables
' begins with a space; next character is not a period (handles "with" assignments) or space; any alphanumeric character; repeat until... space
Set matches = .Execute(code)
End With
ReDim variables(matches.count - 1)
For i = 0 To matches.count - 1
isInDatatypes = False
isInVariables = False
' check to see if current match is a datatype
For j = LBound(datatypes) To UBound(datatypes)
If matches(i).SubMatches(1) = datatypes(j) Then
isInDatatypes = True
Exit For
End If
'Debug.Print matches(i).SubMatches(1)
Next j
' check to see if we already have this variable
For j = LBound(variables) To i
If matches(i).SubMatches(1) = variables(j) Then
isInVariables = True
Exit For
End If
Next j
' add to variables array
If Not isInDatatypes And Not isInVariables Then
variables(i) = matches(i).SubMatches(1)
End If
Next i
With regex
.Pattern = "(\sConst\s)(.*)(?=\sAs\s)" 'strongly typed constants
' match anything between the words " Const " and " As "
Set matches = .Execute(code)
End With
For i = 0 To matches.count - 1
'add one slot to end of array
j = UBound(variables) + 1
ReDim Preserve variables(j)
variables(j) = matches(i).SubMatches(1) ' again, return the second submatch
Next i
' print variables to immediate window
For i = LBound(variables) To UBound(variables)
If variables(i) <> "" And variables(i) <> Chr(34) Then ' for the life of me I just can't get the regex to not match doublequotes
Debug.Print variables(i)
End If
Next i
End Sub

loading formatted data in VBA from a text file

I'm looking for the best way of loading formatted data in VBA. I’ve spent quite some time trying to find the equivalent of C-like or Fortran-like fscanf type functions, but without success.
Basically I want to read from a text file millions of numbers placed on many (100,000’s) lines with 10 numbers each (except the last line, possibly 1-10 numbers). The numbers are separated by spaces, but I don’t know in advance the width of each field (and this width changes between data blocks).
e.g.
397143.1 396743.1 396343.1 395943.1 395543.1 395143.1 394743.1 394343.1 393943.1 393543.1
-0.11 -0.10 -0.10 -0.10 -0.10 -0.09 -0.09 -0.09 -0.09 -0.09
0.171 0.165 0.164 0.162 0.158 0.154 0.151 0.145 0.157 0.209
Previously I’ve used the Mid function but in this case I can’t, because I don’t know in advance the width of each field. Also it's too many lines to load in an Excel sheet. I can think of a brute force way in which I look at each successive character and determine whether it’s a space or a number, but it seems terribly clumsy.
I’m also interested in pointers on how to write formatted data, but this seems easier -- just format each string and concatenate them using &.
The following snippet will read whitespace-delimited numbers from a text file:
Dim someNumber As Double
Open "YourDataFile.txt" For Input As #1
Do While Not (EOF(1))
Input #1, someNumber
' do something with someNumber here...
Loop
Close #1
Update: here is how you could read one line at a time, with a variable number of items on each line:
Dim someNumber As Double
Dim startPosition As Long
Dim endPosition As Long
Dim temp As String
Open "YourDataFile" For Input As #1
Do While Not (EOF(1))
startPosition = Seek(1) '// capture the current file position'
Line Input #1, temp '// read an entire line'
endPosition = Seek(1) '// determine the end-of-line file position'
Seek 1, startPosition '// jump back to the beginning of the line'
'// read numbers from the file until the end of the current line'
Do While Not (EOF(1)) And (Seek(1) < endPosition)
Input #1, someNumber
'// do something with someNumber here...'
Loop
Loop
Close #1
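If the Seek bookkeeping feels heavy, a simpler line-by-line variant is sketched below: read each line with Line Input, Split it on spaces, and skip the empty strings that runs of spaces leave behind. ReadNumbersByLine is a hypothetical name, the file name is the same placeholder as above, and the sketch assumes the numbers use a dot as the decimal separator, which is what Val expects.

```vba
' Hypothetical sketch: read whitespace-separated numbers one line at a time.
Sub ReadNumbersByLine()
    Dim fileNo As Integer
    Dim lineText As String
    Dim parts() As String
    Dim i As Long
    Dim value As Double
    fileNo = FreeFile                            ' next unused file number
    Open "YourDataFile.txt" For Input As #fileNo
    Do While Not EOF(fileNo)
        Line Input #fileNo, lineText             ' read an entire line
        parts = Split(Trim$(lineText), " ")
        For i = LBound(parts) To UBound(parts)
            If Len(parts(i)) > 0 Then            ' skip gaps left by repeated spaces
                value = Val(parts(i))            ' Val always treats "." as the decimal point
                ' do something with value here...
            End If
        Next i
    Loop
    Close #fileNo
End Sub
```

Because each line is split on its own, the field widths can differ between data blocks and the last line can hold fewer than 10 numbers without any special handling.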
You could also use a regular expression to collapse each run of whitespace into a single space and then use the Split function on each line, as the example code below shows. The code uses early binding, so it needs references to Microsoft VBScript Regular Expressions 5.5 and Microsoft Scripting Runtime.
After 65000 rows have been processed, a new sheet is added to the Excel workbook, so the source file can be bigger than the maximum number of rows in a sheet.
Dim rx As RegExp
Sub Start()
Dim fso As FileSystemObject
Dim stream As TextStream
Dim originalLine As String
Dim formattedLine As String
Dim rowNr As Long
Dim sht As Worksheet
Dim shtCount As Long
Const maxRows As Long = 65000
Set fso = New FileSystemObject
Set stream = fso.OpenTextFile("c:\data.txt", ForReading)
rowNr = 1
shtCount = 1
Set sht = Worksheets.Add
sht.Name = shtCount
Do While Not stream.AtEndOfStream
originalLine = stream.ReadLine
formattedLine = ReformatLine(originalLine)
If formattedLine <> "" Then
WriteValues formattedLine, rowNr, sht
rowNr = rowNr + 1
If rowNr > maxRows Then
rowNr = 1
shtCount = shtCount + 1
Set sht = Worksheets.Add
sht.Name = shtCount
End If
End If
Loop
End Sub
Function ReformatLine(line As String) As String
Set rx = New RegExp
With rx
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "[\s]+"
' Trim so that leading whitespace does not produce an empty first field
ReformatLine = Trim$(.Replace(line, " "))
End With
End Function
Sub WriteValues(formattedLine As String, rowNr As Long, sht As Worksheet)
    Dim colNr As Long
    Dim stringArray() As String
    Dim stringItem As Variant
    colNr = 1
    ' one array element per number on the line
    stringArray = Split(formattedLine, " ")
    For Each stringItem In stringArray
        sht.Cells(rowNr, colNr) = stringItem
        colNr = colNr + 1
    Next
End Sub