loading formatted data in VBA from a text file - vba

I'm looking for the best way of loading formatted data in VBA. I’ve spent quite some time trying to find the equivalent of C-like or Fortran-like fscanf type functions, but without success.
Basically I want to read from a text file millions of numbers placed on many (100,000’s) lines with 10 numbers each (except the last line, possibly 1-10 numbers). The numbers are separated by spaces, but I don’t know in advance the width of each field (and this width changes between data blocks).
e.g.
397143.1 396743.1 396343.1 395943.1 395543.1 395143.1 394743.1 394343.1 393943.1 393543.1
-0.11 -0.10 -0.10 -0.10 -0.10 -0.09 -0.09 -0.09 -0.09 -0.09
0.171 0.165 0.164 0.162 0.158 0.154 0.151 0.145 0.157 0.209
Previously I’ve used the Mid function but in this case I can’t, because I don’t know in advance the width of each field. Also it's too many lines to load in an Excel sheet. I can think of a brute force way in which I look at each successive character and determine whether it’s a space or a number, but it seems terribly clumsy.
I’m also interested in pointers on how to write formatted data, but this seems easier -- just format each string and concatenate them using &.

The following snippet will read whitespace-delimited numbers from a text file:
Dim someNumber As Double
Open "YourDataFile.txt" For Input As #1
Do While Not (EOF(1))
Input #1, someNumber
`// do something with someNumber here...`
Loop
Close #1
update: Here is how you could read one line at a time, with a variable number of items on each line:
Dim someNumber As Double
Dim startPosition As Long
Dim endPosition As Long
Dim temp As String
Open "YourDataFile" For Input As #1
Do While Not (EOF(1))
startPosition = Seek(1) '// capture the current file position'
Line Input #1, temp '// read an entire line'
endPosition = Seek(1) '// determine the end-of-line file position'
Seek 1, startPosition '// jump back to the beginning of the line'
'// read numbers from the file until the end of the current line'
Do While Not (EOF(1)) And (Seek(1) < endPosition)
Input #1, someNumber
'// do something with someNumber here...'
Loop
Loop
Close #1

You could also use regular expressions to replace multiple whitespaces to one space and then use the Split function for each line like the example code shows below.
After 65000 rows have been processed a new sheet will be added to the Excel workbook so the source file can be bigger than the max number of rows in Excel.
Dim rx As RegExp
Sub Start()
Dim fso As FileSystemObject
Dim stream As TextStream
Dim originalLine As String
Dim formattedLine As String
Dim rowNr As Long
Dim sht As Worksheet
Dim shtCount As Long
Const maxRows As Long = 65000
Set fso = New FileSystemObject
Set stream = fso.OpenTextFile("c:\data.txt", ForReading)
rowNr = 1
shtCount = 1
Set sht = Worksheets.Add
sht.Name = shtCount
Do While Not stream.AtEndOfStream
originalLine = stream.ReadLine
formattedLine = ReformatLine(originalLine)
If formattedLine <> "" Then
WriteValues formattedLine, rowNr, sht
rowNr = rowNr + 1
If rowNr > maxRows Then
rowNr = 1
shtCount = shtCount + 1
Set sht = Worksheets.Add
sht.Name = shtCount
End If
End If
Loop
End Sub
Function ReformatLine(line As String) As String
Set rx = New RegExp
With rx
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "[\s]+"
ReformatLine = .Replace(line, " ")
End With
End Function
Function WriteValues(formattedLine As String, rowNr As Long, sht As Worksheet)
Dim colNr As Long
colNr = 1
stringArray = Split(formattedLine, " ")
For Each stringItem In stringArray
sht.Cells(rowNr, colNr) = stringItem
colNr = colNr + 1
Next
End Function

Related

How to recursively parse data out of an e-mail using VBA?

So I get e-mails every day with information in them. Unfortunately, for some reason, the data is sent in the body of the e-mail, and not as an attachment. Fine then. I'm using Excel to scrape Outlook, using VBA.
Sub mytry()
Dim olapp As Object
Dim olmapi As Object
Dim olmail As Object
Dim olitem As Object
Dim lrow As Integer
Dim olattach As Object
Dim str As String
Dim TextWeNeedToParse as String
Const num As Integer = 6
Const path As String = "C:\HP\"
Const emailpath As String = "C:\Dell\"
Const olFolderInbox As Integer = 6
Set olp = CreateObject("outlook.application")
Set olmapi = olp.getnamespace("MAPI")
Set olmail = olmapi.getdefaultfolder(num)
If olmail.items.restrict("[ReceivedTime]>=""&MacroDate&12:00am&""").Count = 0 Then
Else
For Each olitem In olmail.items.restrict("[ReceivedTime]>=""&MacroDate&12:00am&""")
TextWeNeedToParse = olitem.body
'Recursive text parsing here
Next olitem
End If
Ok, so this code snippet should get me the entire body of the text into a string. Now we can pass the string around, and manipulate it.
A sample of the text I'm dealing with:
WAL +300bp QTY
(M) FCTR SECURITY CPN ASK 1mPSA TYPE
0.77 1.15 458 0.04 GNR 2012-61 CA 2.00 99-16 217 SEQ
1.39 2.26 120 0.76 GNR 2005-13 AE 5.00 102-24 223 SUP
1.40 18.16 45 0.65 GNR 2015-157 NH 2.50 95-16 215 EXCH,+
1.50 21.56 25 0.94 GNR 2017-103 HD 3.00 98-08 375 PAC-2
So there are a few different ways I can see myself tackling this, but I don't quite know all of the pieces.
1) I could try counting how many carriage returns exist, and doing a loop. Then "counting" spaces to figure out where everything is. Not quite sure how well it would work.
2) I could regex out the unique ID in the middle, and if I can figure out how to regex the nth instance (a major point where I'm stuck), I could also use that to regex out the numbers - for example, line one would be the 1-5 instance of straight numbers/decimals together surrounded by spaces, and the first instance of number-number-dash-number-number.
Sample Regex Code that I'd throw through it:
Function regex(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
Dim replaceNumber As Integer
With inputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = matchPattern
End With
With outputRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "\$(\d+)"
End With
With outReplaceRegexObj
.Global = True
.MultiLine = True
.IgnoreCase = False
End With
Set inputMatches = inputRegexObj.Execute(strInput)
If inputMatches.Count = 0 Then
regex = False
Else
Set replaceMatches = outputRegexObj.Execute(outputPattern)
For Each replaceMatch In replaceMatches
replaceNumber = replaceMatch.SubMatches(0)
outReplaceRegexObj.Pattern = "\$" & replaceNumber
If replaceNumber = 0 Then
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).Value)
Else
If replaceNumber > inputMatches(0).SubMatches.Count Then
'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
regex = CVErr(xlErrValue)
Exit Function
Else
outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
End If
End If
Next
regex = outputPattern
End If
End Function
3) I could try some of the methods above, but use recursion. My recursion is fairly weak.
So once I have the text string extracted, I imagine I'd need something like:
Sub QuickExample(Dim Cusip as String, Dim PriceStr as variant, Dim SpreadStr as variant)
Dim ws as WorkSheet
Set ws = thisworkbook.sheets("Results")
LastRow = ws.Cells(sht.Rows.Count, "A").End(xlUp).Row
ws.cells(Lastrow,1).value2 = Cusip
ws.cells(Lastrow,2).value2 = PriceStr
ws.cells(Lastrow,3).value2 = SpreadStr
End Sub
And lastly:
Sub ParsingDate(EmailText as String)
Dim CarriageReturns As Long
CarriageReturns = Len(EmailText) - Len(Replace(EmailText, Chr(10), ""))
For i = 1 to CarriageReturns
'Parse out the data for the ith row, return it to the function above
Next i
End Sub
It's the actual act of parsing which I'm struggling a bit with - how do I properly get the nth result, and only the nth result? How do I make sure it keeps working even if some extra spaces or lines get added? Is there a way to just use regex, and "look" around the nth finding of a given expression? Is it doable to make this without a lot of recursion?
Thank you
WAL +300bp QTY
(M) FCTR SECURITY CPN ASK 1mPSA TYPE
0.77 1.15 458 0.04 GNR 2012-61 CA 2.00 99-16 217 SEQ
1.39 2.26 120 0.76 GNR 2005-13 AE 5.00 102-24 223 SUP
1.40 18.16 45 0.65 GNR 2015-157 NH 2.50 95-16 215 EXCH,+
1.50 21.56 25 0.94 GNR 2017-103 HD 3.00 98-08 375 PAC-2
This seems like a pretty well formatted table. Perhaps pop each line into an array using Split() and then each field into an array, again using split():
Sub dump()
arrLine = Split(TextWeNeedToParse, Chr(10))
For Each Line In arrLine
For Each field In Split(Line, " ")
Debug.Print field
Next
Next
End Sub
That's super short and runs quick. You are just an if statement and counter (or regex test) away from getting the exact items you want.
Testing/counting may be easier if you remove multiple spaces so the split() puts each element in it's proper place. You could employee a loop to remove multiple spaces before running this:
Fully implemented it might be something like:
<your code to get the bod>
'remove multiple spaces from string for parsing
Do While InStr(1, TextWeNeedToParse, " ")
TextWeNeedToParse= Replace(TextWeNeedToParse, " ", " ")
Loop
'Get each line into an array element
arrLine = Split(TextWeNeedToParse, Chr(10))
'Loop through the array
For Each Line In arrLine
'dump fields to an array
arrFields = Split(Line, " ")
'and spit out a particular element (your "unique id" is element 5)
If UBound(arrFields) >= 5 Then Debug.Print "unique id:"; arrFields(5)
Next

Writing Fixed width text files from excel vba

This is the output of a program.
I have specified what shall be width of each cell in the program and my program shows correct output.
What I want to do is cell content shall be written from right to left. E.g highlighted figure 9983.54 has width of 21. Text file has used first 7 columns. But I want it to use last 7 columns of text file.
Please see expected output image.
I am not getting any clue how to do this. I am not a very professional programmer but I love coding. This text file is used as input to some other program and i am trying to automate writing text file from excel VBA.
Can anyone suggest a way to get this output format?
Here is the code which gave me first output
Option Explicit
Sub CreateFixedWidthFile(strFile As String, ws As Worksheet, s() As Integer)
Dim i As Long, j As Long
Dim strLine As String, strCell As String
'get a freefile
Dim fNum As Long
fNum = FreeFile
'open the textfile
Open strFile For Output As fNum
'loop from first to last row
'use 2 rather than 1 to ignore header row
For i = 1 To ws.Range("a65536").End(xlUp).Row
'new line
strLine = ""
'loop through each field
For j = 0 To UBound(s)
'make sure we only take chars up to length of field (may want to output some sort of error if it is longer than field)
strCell = Left$(ws.Cells(i, j + 1).Value, s(j))
'add on string of spaces with length equal to the difference in length between field length and value length
strLine = strLine & strCell & String$(s(j) - Len(strCell), Chr$(32))
Next j
'write the line to the file
Print #fNum, strLine
Next i
'close the file
Close #fNum
End Sub
'for example the code could be called using:
Sub CreateFile()
Dim sPath As String
sPath = Application.GetSaveAsFilename("", "Text Files,*.txt")
If LCase$(sPath) = "false" Then Exit Sub
'specify the widths of our fields
'the number of columns is the number specified in the line below +1
Dim s(6) As Integer
'starting at 0 specify the width of each column
s(0) = 21
s(1) = 9
s(2) = 15
s(3) = 11
s(4) = 12
s(5) = 10
s(6) = 186
'for example to use 3 columns with field of length 5, 10 and 15 you would use:
'dim s(2) as Integer
's(0)=5
's(1)=10
's(2)=15
'write to file the data from the activesheet
CreateFixedWidthFile sPath, ActiveSheet, s
End Sub
Something like this should work:
x = 9983.54
a = Space(21-Len(CStr(x))) & CStr(x)
Then a will be 14 spaces followed by x:
a = " 9983.54"
Here 21 is the desired column width --- change as necessary. CStr may be unnecessary for non-numeric x.
If you're going to right-justify a lot of different data to different width fields you could write a general purpose function:
Function LeftJust(val As String, width As Integer) As String
LeftJust = Space(width - Len(val)) & val
End Function
The you call it with LeftJust(CStr(9983.54), 21).
Also note that VBA's Print # statement has a Spc(n) parameter that you can use to produce fixed-width output, e.g., Print #fNum, Spc(n); a; before this statement you calculate n: n = 21-Len(CStr(a)).
Hope that helps

Trying to write a an Excel macro to find a large text string and copy to another sheet

I have a spreadsheet with a column of cells that each contain several paragraphs of text. I'm trying to write a macro that will grab several sentences between these text phrases "How we made our decision" and "Conclusion"
The location of this text string varies from sheet to sheet but the column is always consistent.
I've been able to find a bunch of vba scripts that allow me to find and copy 1 word at a time or simple batches of single word. I'm just not able to figure our or find something that allows me to copy an entire paragraph from within a single cell of paragraphs.
The code below just grabs the entire table. As you can see in the beginning portion I was able to get what I need however I found out that the (70) is irrelevant because the table size changes with each pull of the record.
Sub GetTheData()
Dim T As String
Dim SWs As New SHDocVw.ShellWindows
Dim IE As SHDocVw.InternetExplorer
Dim LetPr As InternetExplorer
Dim Doc As HTMLDocument
'Dim IE As Object
Dim tbls, tbl, trs, tr, tds, td, r, c
For Each IE In SWs
If IE.LocationName = "Letter Preparation Case Summary – Member Case" Then
Set LetPr = IE
'LetPr.document.getElementById
T = IE.document.getElementsByTagName("td")(70).innerText
'T = Trim(Mid(T, InStr(T, "How We Made Our Decision: ") + 0, InStr(T, "Conclusion") - (InStr(T, "How We Made Our Decision:") + 26)))
Exit For
End If
Next
Set tbls = IE.document.getElementsByTagName("table")
For r = 0 To tbls.Length - 1
Debug.Print r, tbls(r).Rows.Length
Next r
Set tbl = IE.document.getElementsByTagName("table")(9)
Set trs = tbl.getElementsByTagName("tr")
For r = 0 To trs.Length - 1
Set tds = trs(r).getElementsByTagName("td")
'if no <td> then look for <th>
If tds.Length = 0 Then Set tds = trs(r).getElementsByTagName("th")
For c = 0 To tds.Length - 1
ActiveSheet.Range("A1").Offset(r, c).Value = tds(c).innerText
Next c
Next r
End Sub
You stated that you wanted the text 'between these text phrases' so the beginning position of the found text will have to be adjusted by hte length of the searched string.
dim beginStr as string, endStr as string, beginPos as long, endPos as long
beginStr = "How We Made Our Decision:"
endStr = "Conclusion"
beginPos = instr(1, T, beginStr, vbtextcompare)
endPos = instr(beginPos, T, endStr, vbtextcompare)
if cbool(beginPos) and cbool(endPos) then
beginPos = beginPos + len(beginStr)
T = Trim(Mid(T, beginPos, endPos - beginPos))
end if
That last endPos - beginPos might have to be adjusted by subtracting 1.

Please suggest reg express for every nth occurrence of a character in a string in vba

At the very outset, let me confess. I am not a developer nor do I have any technical background. However, I have learned a bit of coding. I want to a regular expression for finding every nth position of a character. For example, google, yahoo, rediff, facebook, cnn, pinterest, gmail. I want to find every third occurrence of the comma in the string and replace it with a semicolon. This is for a excel vba macro I am working on. For now, I am trying to loop through it and then replace. If the data is large, the macro fails. Would appreciate your reply. Thanks a ton in advance.
Here is what I am doing:
Option Explicit
Sub reg()
Dim regx As RegExp
Set regx = New RegExp
Dim allMatches As Object
Dim contents As String
Dim contents2 As String
contents = "hello, wow, true, false, why, try, cry, haha"
contents = "contents1" & contents
regx.pattern = ",{4}"
regx.Global = True
regx.IgnoreCase = False
Set allMatches = regx.Execute(contents)
contents2 = regx.Replace(contents, ";")
MsgBox contents2
End Sub
I get the data from all selected cells. Join it. Add semicolon (an indicator for the row end) at every fourth comma found. Please suggest if there is a better way to do it as I am new to this:
Here is what I have done currently by looping through array. I want to avoid this.
Function insertColon(sInputString As String) As Variant
Dim data3 As String
Dim sFind As String
Dim sReplacewith As String
Dim result As String
'Dim i As Integer
Dim Counter As Integer
sFind = ","
sReplacewith = ";"
data3 = sInputString
' MsgBox = data3
' Dim J As Integer
Application.Volatile
FindN = 0
'Dim i As Integer
' i = 1
Counter = 4
' MsgBox Len(data3)
While InStr(Counter, sInputString, sFind)
FindN = InStr(Counter, sInputString, sFind)
data3 = Application.Substitute(data3, sFind, sReplacewith, Counter)
Counter = Counter + 3
' MsgBox "loop" & i
'
' i = i + 1
Wend
If I understood you properly then all your code could be summarized to a few lines
Dim sText As String
sText = "hello, wow, true, false, why, try, cry, haha, huh, muh"
For p = 3 To Len(sText) Step 2
sText = WorksheetFunction.Substitute(sText, ",", ";", p)
Next p
MsgBox sText
'Output
'hello, wow, true; false, why, try; cry, haha, huh; muh
Test it and let me know whether it fails. If not don't forget to accept the answer.

Splitting a string with variable number of spaces VBA

I have a file with a bunch of numbers in columns. These numbers are separated by a variable number of spaces. I want to skip the first line and get all the other lines and separate each number on the line. Finally, I want to write each number in Excel. I've been able to get the lines and write them on Excel but I can't separate each number (I'm getting the whole line as one string).
Does any body know how to split a string that has a variable number of spaces?
Here is my code.
Sub Test()
r = 0
With New Scripting.FileSystemObject
With .OpenTextFile("C:\Users\User\Desktop\File.tab", ForReading)
If Not .AtEndOfStream Then .SkipLine
Do Until .AtEndOfStream
ActiveCell.Offset(r, 0) = Split(.ReadLine, vbCrLf)
r = r + 1
Loop
End With
End With
End Sub
If you use the Excel worksheet function trim in place of the VBA function then excel will remove multiple spaces within a cell (not just from the left and right ends). Something like the below should solve the problem. I'm afraid I've not tested it as I haven't a copy of Excel handy.
Sub Test()
Dim splitValues As Variant
Dim i As Long
r = 0
With New Scripting.FileSystemObject
With .OpenTextFile("C:\Users\User\Desktop\File.tab", ForReading)
If Not .AtEndOfStream Then .SkipLine
Do Until .AtEndOfStream
ActiveCell.Offset(r, 0) = Split(.ReadLine, vbCrLf)
Application.Trim(ActiveCell.Offset(r, 0))
splitValues = Split(ActiveCell.Offset(r, 0), " ")
For i = 0 To UBound(x)
ActiveCell.Offset(r, i+1) = splitValues(i)
Next
Loop
r = r + 1
End With
End With