Trying to write a an Excel macro to find a large text string and copy to another sheet - vba

I have a spreadsheet with a column of cells that each contain several paragraphs of text. I'm trying to write a macro that will grab several sentences between these text phrases "How we made our decision" and "Conclusion"
The location of this text string varies from sheet to sheet but the column is always consistent.
I've been able to find a bunch of vba scripts that allow me to find and copy 1 word at a time or simple batches of single word. I'm just not able to figure our or find something that allows me to copy an entire paragraph from within a single cell of paragraphs.
The code below just grabs the entire table. As you can see in the beginning portion I was able to get what I need however I found out that the (70) is irrelevant because the table size changes with each pull of the record.
Sub GetTheData()
Dim T As String
Dim SWs As New SHDocVw.ShellWindows
Dim IE As SHDocVw.InternetExplorer
Dim LetPr As InternetExplorer
Dim Doc As HTMLDocument
'Dim IE As Object
Dim tbls, tbl, trs, tr, tds, td, r, c
For Each IE In SWs
If IE.LocationName = "Letter Preparation Case Summary – Member Case" Then
Set LetPr = IE
'LetPr.document.getElementById
T = IE.document.getElementsByTagName("td")(70).innerText
'T = Trim(Mid(T, InStr(T, "How We Made Our Decision: ") + 0, InStr(T, "Conclusion") - (InStr(T, "How We Made Our Decision:") + 26)))
Exit For
End If
Next
Set tbls = IE.document.getElementsByTagName("table")
For r = 0 To tbls.Length - 1
Debug.Print r, tbls(r).Rows.Length
Next r
Set tbl = IE.document.getElementsByTagName("table")(9)
Set trs = tbl.getElementsByTagName("tr")
For r = 0 To trs.Length - 1
Set tds = trs(r).getElementsByTagName("td")
'if no <td> then look for <th>
If tds.Length = 0 Then Set tds = trs(r).getElementsByTagName("th")
For c = 0 To tds.Length - 1
ActiveSheet.Range("A1").Offset(r, c).Value = tds(c).innerText
Next c
Next r
End Sub

You stated that you wanted the text 'between these text phrases' so the beginning position of the found text will have to be adjusted by hte length of the searched string.
dim beginStr as string, endStr as string, beginPos as long, endPos as long
beginStr = "How We Made Our Decision:"
endStr = "Conclusion"
beginPos = instr(1, T, beginStr, vbtextcompare)
endPos = instr(beginPos, T, endStr, vbtextcompare)
if cbool(beginPos) and cbool(endPos) then
beginPos = beginPos + len(beginStr)
T = Trim(Mid(T, beginPos, endPos - beginPos))
end if
That last endPos - beginPos might have to be adjusted by subtracting 1.

Related

Fast way to add an array of paragraphs to Word document

The test code below examines paragraphs in ActiveDocument and puts 'copies' of unique paragraphs at the bottom of the document, followed by their original spacing blank paragraphs. Paragraphs are manipulated in an array, and qualifying paragraphs are then added one by one to the bottom of that document. Is there a faster way of adding those paragraphs there? I am hoping there is a way to add the array directly without needing the loop. I think that it is possible to assign an array to a range in Excel (see Rick Rothstein), but I can't see how to do that in Word 2010.
Sub FullArray()
Dim StartTime 'Start time
Dim p As Paragraph 'is each initial paragraph object in ActiveDocument
Dim pDict As New Scripting.Dictionary 'Keys=plain text versions of each inital para
'Items=signifiers of each key's (and para's)uniqueness or otherwise
Dim t As String 'Plain text version of each p, being a key of pDict
Dim pArray(1000) As Variant 'Contains all initial paragraph objects
Dim c As Integer 'c is ordinal number of each element of pArray
Dim dky As String 'dky is whichever element of pArray is to be used as a key of pDict
Dim pc As Integer 'running count of plain text paras in pDict
Dim lastdky As Integer 'signifies whether previous key of pDict is unique
'faster when dimmed, option explicit
StartTime = Timer
Application.ScreenUpdating = False 'Line 1 of Go to end of doc
ActiveDocument.Characters.Last.Select 'Line 2 of Go to end of doc. Is there a Faster way?
Selection.Collapse
'ADD each para object to Array. Write its plain text to dictionary...
'...in order to determine uniqueness of each para.
For Each p In ActiveDocument.Paragraphs
t = p.Range.Text
If Not pDict.Exists(t) Then
pDict.Add Key:=t, Item:=1 '1 flag means 1st instance of a para, including blamk paras
Else: pDict(t) = 2 '2 flag means a para which has duplicates
End If
pc = pc + 1 'count plain text paras in pDict
pArray(pc) = p 'set element number pc of Array = current paragraph object
Next p
'PLACE copies of certain paras at the end of document...
'...being those content-containing paras which were initially unique....
'...and place after each such para any following contiguous blank paras
lastdky = 1 '2/1 means PREVIOUS initial paragraph had/had not dupes.
For c = pc - 1 To 1 Step -1
dky = pArray(c)
If pDict(dky) = 1 And pArray(c) <> Chr(13) Then Selection.FormattedText = pArray(c) 'place para with content ('content paras') at end
If pArray(c) = Chr(13) And lastdky = 1 Then Selection.FormattedText = pArray(c) 'place (only) blank paras following content paras at end
If pDict(dky) = 2 Then lastdky = 2 Else: lastdky = 1
Next c
MsgBox "This code ran successfully in " & Round(Timer - StartTime, 2) & " seconds", vbInformation
Application.ScreenUpdating = True
End Sub

Extracting cell contents based on the string within the cell using dictionaries

I am quite new to VBA and was looking through stackoverflow. I found a neat VBA macro that utilizes Dictionaries and seemed like it could be applied to my problem also. However, after editing the macro I can't seem to make it work like I mean to.
My data is as follows: In column A I have Review numbers, Review topics and Analysis numbers. These follow a structure where Review number is 1st, then 2 rows later there is Review topic and under Review topic can be multiple Analysis numbers but the 1st one is 2 rows below the Review topic. In column B there are details about the reviewed item. I am interested in 3 different ones(height, weight and price). There is sometimes extra details here which is why I use string-matching (InStr). Sometimes there are less details. In general the data does not have a specific enough form to rely on the amount of rows between specific groups of data.
Data looks generally like this: https://imgur.com/a/QcdrMcR
The goal is to move extract the contents of the cells containing Review number, Review topic, Analysis number, Height, Weight and Price. These should be in separate cells on the same row. In case of multiple analysis, the following analysis should be below the row containing the 1st analysis and height, weight and price following this like before. Review number and topic dont need to be duplicated.
In the code I use dictionaries and a whole lotta ElseIfs. As I said, this is largely taken from another post. It works fine if I try to find the 1st analysis detail but when I try to find all 3 it stops working altogether giving me error 424 object required on the 2 last ElseIfs in the 1st loop. Moreover, the part that mostly works (finding the detailA which is height) only works if the searched string is found on the cell one row below the current row. In other words, it only works if the height/detailA is on row i+1
Sub FindData()
Dim datasheet As Worksheet
Dim reportsheet As Worksheet
Dim SearchString As String
Dim SearchString2 As String
Dim i As Integer
Set datasheet = Sheet1
Set reportsheet = Sheet2
Dim chNum As String
Dim chSub As String
Dim analysisNum As String
Dim detailA As String
Dim detailB As String
Dim detailC As String
Dim ReviewCollection As New Dictionary
Dim dictKey1 As Variant
Dim dictKey2 As Variant
Dim dictKey3 As Variant
Dim dictKey4 As Variant
Dim dictKey5 As Variant
Dim dictKey6 As Variant
reportsheet.Range("A1:H200").ClearContents
finalrow = datasheet.Cells(datasheet.Rows.Count, 1).End(xlUp).Row
For i = 1 To finalrow
SearchString = datasheet.Range("A" & i)
SearchString2 = datasheet.Range("B" & i)
If InStr(1, SearchString, "Review number") Then
chNum = datasheet.Cells(i, 1)
ReviewCollection.Add chNum, New Dictionary 'For review numbers
ElseIf InStr(1, SearchString, "Review topic") Then
chSub = datasheet.Cells(i, 1)
ReviewCollection.Item(chNum).Add chSub, New Dictionary 'For review topics
ElseIf InStr(1, SearchString, "Analysis number") Then
analysisNum = datasheet.Cells(i, 1)
ReviewCollection.Item(chNum).Item(chSub).Add analysisNum, New Dictionary 'For Analysis numbers
ElseIf InStr(1, SearchString2, "Height") Then
detailA = datasheet.Cells(i, 2)
ReviewCollection.Item(chNum).Item(chSub).Item(analysisNum).Add detailA, New Dictionary 'For Analysis detail #1
'ElseIf InStr(1, SearchString2, "Weight") Then
' detailB = datasheet.Cells(i, 2)
' ReviewCollection.Item(chNum).Item(chSub).Item(analysisNum).Item(detailA).Add detailB, New Dictionary 'For Analysis detail #2
'ElseIf InStr(1, SearchString2, "Price") Then
' detailA = datasheet.Cells(i, 2)
' ReviewCollection.Item(chNum).Item(chSub).Item(analysisNum).Item(detailA).Item(detailB).Add detailC, New Dictionary 'For Analysis detail #3
End If
Next i
'Loop to print out the dictionary
i = 1
For Each dictKey1 In ReviewCollection.Keys
reportsheet.Cells(i, 1) = dictKey1 'Review number
If ReviewCollection.Item(dictKey1).Count > 0 Then
For Each dictKey2 In ReviewCollection.Item(dictKey1).Keys
reportsheet.Cells(i, 2) = dictKey2 'Review topic
If ReviewCollection.Item(dictKey1).Item(dictKey2).Count > 0 Then
For Each dictKey3 In ReviewCollection.Item(dictKey1).Item(dictKey2).Keys 'Report Number
reportsheet.Cells(i, 3) = dictKey3
If ReviewCollection.Item(dictKey1).Item(dictKey2).Item(dictKey3).Count > 0 Then
For Each dictKey4 In ReviewCollection.Item(dictKey1).Item(dictKey2).Item(dictKey3).Keys 'Analysis detail #1
reportsheet.Cells(i, 4) = dictKey4
'START of the printing for the problematic area
If ReviewCollection.Item(dictKey1).Item(dictKey2).Item(dictKey3).Item(dictKey4).Count > 0 Then
For Each dictKey5 In ReviewCollection.Item(dictKey1).Item(dictKey2).Item(dictKey3).Item(dictKey4).Keys 'Analysis detail #2
reportsheet.Cells(i, 5) = dictKey5
If ReviewCollection.Item(dictKey1).Item(dictKey2).Item(dictKey3).Item(dictKey4).Item(dictKey5).Count > 0 Then
For Each dictKey6 In ReviewCollection.Item(dictKey1).Item(dictKey2).Item(dictKey3).Item(dictKey4).Item(dictKey5).Keys 'Analysis detail #3
reportsheet.Cells(i, 6) = dictKey6
Next dictKey6
Else
i = i + 1 'no reports, so moves down to prevent overwriting change number
End If
Next dictKey5
Else
i = i + 1 'no reports, so moves down to prevent overwriting change number
End If
'END of the problematic area
Next dictKey4
Else
i = i + 1 'no reports, so moves down to prevent overwriting change number
End If
Next dictKey3
Else
i = i + 1 'no reports, so moves down to prevent overwriting change number
End If
Next dictKey2
Else
i = i + 1 'no change subject, so moves down to prevent overwriting change number
End If
Next dictKey1
End Sub
I am also open to any other improvements. My logic seems very heavy but I couldn't get it working even to this degree with any of the other ways (tried using more loops and less if-structures).
I plan on trimming the contents of the cells to only include the numbers but this is a worry for the future. I already have made working excel formulas for this.

Website data table scraper

Before I ask my question, I'm an amateur coder with basically no meaningful experience beyond VBA in ms office apps (I know - noob!)
I'm trying to create a web scraper using VBA to import data into excel and as per my comments in the below extract of code, the best I've been able to find on this is was in the winning answer to this question.
Below, I'm using investing.com as an example but in reality my project will be across multiple sites and will feed into a matrices which will be updating daily and self cannibalizing as events expire - For this reason I'd rather front-up the workload on the code side to make the inputs on an ongoing basis as minimal as possible (for me).
With that in mind, can I ask if there's a way to do any of the following (brace yourself, this will be cringe-worthy basic knowledge for some):
Is there a way in which I can and navigate to a url and run a for each loop on every table on that page (without have a known id for any)? this is to speed up my code as much as it's to minimise my inputs as there'll be quite a bit of data to be updated and I was planning on putting a 2 minute looping trigger on the refresh.
Instead of doing what I've been doing below, is it possible to reference a table, rather than a row, and do something along the lines of Cells(2,5).value to return the value within row 1, column 4? (assuming that both the array indexing starts at 0 in both dimensions?) Further to that, my first column (my primary key in some ways) may not be in the same order on all sources so is there a way in which I could do the equivalent to Columns("A:A").Find(What:=[Primary key], After:=Cells(1, 1), LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=True, SearchFormat:=False).Row to find what row within the table relates to the even I'm looking for?
Code :
Sub Scraper()
Dim appIE, allRowOfData As Object
' As per https://stackoverflow.com/questions/27066963/scraping-data-from-website-using-vba
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures" 'Sample page
.Visible = False
End With
Do While appIE.Busy
Application.Wait (Now + TimeValue("0:00:01")) 'If page not open, wait a second befor trying again
Loop
Set allRowOfData = appIE.document.getElementById("pair_8907")
'tr id="[ID of row within table]"
Dim myValue As String: myValue = allRowOfData.Cells(8).innerHTML
'The 8 is the column number of the table
'(note: column numbers start at 0 so the 9th column should have "8" entered here
Set appIE = Nothing
Range("A1").Value = myValue
End Sub
If you want to use Excel functions to navigate the tables why not dump the tables first onto a worksheet this code works for me
Option Explicit
Sub Scraper()
Dim appIE As Object
' As per http://stackoverflow.com/questions/27066963/scraping-data-from-website-using-vba
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://uk.investing.com/rates-bonds/financial-futures" 'Sample page
.Visible = True
End With
Do While appIE.Busy
DoEvents
Application.Wait (Now + TimeValue("0:00:01")) 'If page not open, wait a second befor trying again
Loop
'Debug.Print TypeName(appIE.document)
Dim doc As Object 'MSHTML.HTMLDocument
Set doc = appIE.document
'* appIE busy is good but you need to wait for the whole document to completely load and initialise so use this
While doc.readyState <> "complete"
DoEvents
Wend
'* we can select all the tables because they share the same CSS class name
Dim tablesSelectedByClass As Object 'MSHTML.HTMLElementCollection
Set tablesSelectedByClass = doc.getElementsByClassName("genTbl")
'* you can change this, it was just convenient for me to add sheets to my workbook
Dim shNewResults As Excel.Worksheet
Set shNewResults = ThisWorkbook.Worksheets.Add
Dim lRowCursor As Long '* this controls pasting down the sheet
lRowCursor = 1
Dim lTableIndexLoop As Long
For lTableIndexLoop = 0 To tablesSelectedByClass.Length - 1
Dim tableLoop As Object 'MSHTML.HTMLTable
Set tableLoop = tablesSelectedByClass.Item(lTableIndexLoop)
If LenB(tableLoop.ID) > 0 Then '* there are some extra nonsense tables, this subselects
Dim sParentColumn As String, objParentColumn As Object ' MSHTML.HTMLSemanticElement
Set objParentColumn = FindMyColumn(tableLoop, sParentColumn) '* need to understand is table on left hand or right hand side
Dim vHeader As Variant: vHeader = Empty
If sParentColumn = "leftColumn" Then
'* tables on the left have a preceding H3 element with the table's description
Dim objH3Headers As Object
Set objH3Headers = objParentColumn.getElementsByTagName("H3")
vHeader = objH3Headers.Item(lTableIndexLoop).innerText
Else
'* tables on the right have a hidden attribute we can use
vHeader = tableLoop.Attributes.Item("data-gae").Value
If Len(vHeader) > 3 Then
vHeader = Mid$(vHeader, 4)
Mid$(vHeader, 1, 1) = Chr(Asc(Mid$(vHeader, 1, 1)) - 32)
End If
End If
'* tables on the right do not have column headers
Dim bHasColumnHeaders As Boolean
bHasColumnHeaders = (tableLoop.ChildNodes.Length = 2)
Dim vTableCells() As Variant '* this will be our table data container which we will paste in one go
Dim lRowCount As Long: lRowCount = 0
Dim lColumnCount As Long: lColumnCount = 0
Dim lDataHeadersSectionIdx As Long: lDataHeadersSectionIdx = 0
Dim objColumnHeaders As Object: Set objColumnHeaders = Nothing
If bHasColumnHeaders Then
Set objColumnHeaders = tableLoop.ChildNodes.Item(0).ChildNodes.Item(0)
lRowCount = lRowCount + 1
lDataHeadersSectionIdx = 1
Else
lDataHeadersSectionIdx = 0
End If
Dim objDataRows As Object 'MSHTML.HTMLElementCollection
Set objDataRows = tableLoop.ChildNodes.Item(lDataHeadersSectionIdx).ChildNodes
lColumnCount = objDataRows.Item(0).ChildNodes.Length
lRowCount = lRowCount + objDataRows.Length
ReDim vTableCells(1 To lRowCount, 1 To lColumnCount) As Variant
'* we have them get the column headers
Dim lColLoop As Long
If bHasColumnHeaders Then
For lColLoop = 1 To lColumnCount
vTableCells(1, lColLoop) = objColumnHeaders.ChildNodes.Item(lColLoop - 1).innerText
Next
End If
'* get the data cells
Dim lRowLoop As Long
For lRowLoop = 1 To lRowCount - VBA.IIf(bHasColumnHeaders, 1, 0)
For lColLoop = 1 To lColumnCount
vTableCells(lRowLoop + VBA.IIf(bHasColumnHeaders, 1, 0), lColLoop) = objDataRows.Item(lRowLoop - 1).ChildNodes.Item(lColLoop - 1).innerText
Next
Next
'* paste our table description
shNewResults.Cells(lRowCursor, 1).Value2 = vHeader
lRowCursor = lRowCursor + 1
'* paste our table data
shNewResults.Cells(lRowCursor, 1).Resize(lRowCount, lColumnCount).Value2 = vTableCells
lRowCursor = lRowCursor + lRowCount + 1
End If
Next
End Sub
Function FindMyColumn(ByVal node As Object, ByRef psColumn As String) As Object
'* this code ascends the DOM looking for "column" in the id of each node
While InStr(1, node.ID, "column", vbTextCompare) = 0 And Not node.ParentNode Is Nothing
DoEvents
Set node = node.ParentNode
Wend
If InStr(1, node.ID, "column", vbTextCompare) > 0 Then
Set FindMyColumn = node
psColumn = CStr(node.ID)
End If
End Function
By the way, if you trade a lot the brokers get rich and you get poor, brokerage charges really impact in long run.

Read a table in outlook mail using macro

I'm writing a macro to read the below Email:
Start Date: July-07-2016
Name Accept Approved
John Yes No
Peter No No
I'm good with search the word "Start date" and get the next 13 character to copy and paste that in a text file. But my problem is the next part is in a Table format. So when I'm searching for the name "John" and trying to copy the next 10 Characters. It doesn't work.
Is there a way to search for the word "Accept" and get the First Row data(Which will be No) and then Second Row data(Which will be No)? Is that possible?
This EMail's table will have only 2 Rows. So, I don't need any dynamic way to get the data. Can someone guide me?
I've tried searching the internet first, but the solutions are too huge for me to understand. Is there any simple way?
I have even tried the solution give here: How to read table pasted in outlook message body using vba? but that method works when the body has ONLY TABLE. But my EMail will have text as well as table.
I've never actually programmed in vba, but I think I can help (a bit) nevertheless.
In the answer on the post you linked to, there is the line
Set msg = ActiveExplorer.Selection.item(1)
I think you can change this to something like
Set msg = Right(ActiveExplorer.Selection.item(1), 25)
to get rid of the text before the table (I got the Right part from here: http://www.exceltrick.com/formulas_macros/vba-substring-function/, but it should also work in Outlook).
This way, you run the code on the table itself instead of on the whole message.If there is also text after the table, it might be more difficult, but you might get that done by searching for the table ending.
I hope this helps!
Attempt 2
After some searching and thinking, I came up with the idea to get the html of the message and use that to parse the table (Ok, not really, I got it from the comments here: http://www.codeproject.com/Questions/567073/Howplustoplusrecognizeplusandplusreadplustableplus). Based on that and other sources, it is possible to write a code that gets the table from an email.
I've written some code that might work, but I couldn't test it as I do not have Outlook. Also, this is my first time writing vba, so there may be a lot of syntax errors (and the code is ugly).
Sub GetTable()
Dim msg As Outlook.mailItem
Dim html As String
Dim tableBegin As String
Dim tableEnd As String
Dim posTableBegin As Long
Dim posTableEnd As Long
Dim table As String
Dim rowBegin As String
Dim rowEnd As String
Dim rowCount As Long
Dim columnBegin As String
Dim columnBeginLen As Long
Dim columnEnd As String
Dim posRowBegin As Long
Dim posRowEnd As Long
Dim values As String(0, 3)
Dim beginValue0 As Long
Dim beginValue1 As Long
Dim beginValue2 As Long
Dim EndValue0 As Long
Dim EndValue1 As Long
Dim EndValue2 As Long
' Get the message and the html
Set msg = ActiveExplorer.Selection.item(1)
html = msg.HTMLbody
' Get the begin and end positions of the table (within the html)
tableBegin = "<table>"
tableEnd = "</table>"
posTableBegin = InStr(1, html, tableBegin)
posTableEnd = InStr(posTableBegin, html, tableEnd)
' Get the html table
table = Mid(html, posTableBegin + Len(tableBegin), posTableEnd - posTableBegin - Len(tableBegin))
' Set the variables for the loop
rowBegin = "<tr>"
rowEnd = "</tr>"
rowCount = 0
columnBegin = "<td>"
columnBeginLen = Len(columnBegin)
columnEnd = "</td>"
' Loop trough all rows
posRowBegin = InStr(lastPos, table, rowBegin)
Do While posRowBegin != 0
' Get the end from the current row
posRowEnd = InStr(posRowBegin, table, rowEnd)
rowCount = rowCount + 1
' Make the array larger
ReDim Preserve values(rowCount + 1, 3)
' Get the contents from that row
row = Mid(table, posRowBegin + Len(rowBegin), posRowEnd - posRowBegin - Len(rowBegin))
' Get the three values from that row (name, Accept, Approved) and put it in the array
beginValue0 = InStr(1, row, columnBegin) + columnBeginLen
endValue0 = InStr(beginValue0, row, columnEnd)
beginValue1 = InStr(endValue0, row, columnBegin) + columnBeginLen
endValue1 = InStr(beginValue1, row, columnEnd)
beginValue2 = InStr(endValue1, row, columnBegin) + columnBeginLen
endValue2 = InStr(beginValue2, row, columnEnd)
values(rowCount, 0) = Mid(row, beginValue0, endValue0)
values(rowCount, 1) = Mid(row, beginValue1, endValue1)
values(rowCount, 2) = Mid(row, beginValue2, endValue2)
' Get the beginning of the next row
posRowBegin = InStr(lastPos, table, rowBegin)
Loop
' The values are now in the (double) array 'values'.
' values(0, [1-3]) contains the headers.
End Sub
As said before, the original idea came from http://www.codeproject.com/Questions/567073/Howplustoplusrecognizeplusandplusreadplustableplus. Additionally, I used Word VBA how to select text between two substrings and assign to variable? and the Microsoft documentation to write this.
While it is likely that the code does not work out of the box, I think it still gets the general idea (and some specifics) across, so that it can be used as a guide. I hope this is the solution you need!
You can actually use the Word Object Model to parse out the text from the table - assuming that the email is in HTML format.
Get a Word.Document object from the Inspector.WordEditor property and use Word objects and methods to get the text, like the following below example from MSDN. Just replace ActiveDocument with the variable you declare and set from WordEditor.
Sub ReturnCellContentsToArray()
Dim intCells As Integer
Dim celTable As Cell
Dim strCells() As String
Dim intCount As Integer
Dim rngText As Range
If ActiveDocument.Tables.Count >= 1 Then
With ActiveDocument.Tables(1).Range
intCells = .Cells.Count
ReDim strCells(intCells)
intCount = 1
For Each celTable In .Cells
Set rngText = celTable.Range
rngText.MoveEnd Unit:=wdCharacter, Count:=-1
strCells(intCount) = rngText
intCount = intCount + 1
Next celTable
End With
End If
End Sub

loading formatted data in VBA from a text file

I'm looking for the best way of loading formatted data in VBA. I’ve spent quite some time trying to find the equivalent of C-like or Fortran-like fscanf type functions, but without success.
Basically I want to read from a text file millions of numbers placed on many (100,000’s) lines with 10 numbers each (except the last line, possibly 1-10 numbers). The numbers are separated by spaces, but I don’t know in advance the width of each field (and this width changes between data blocks).
e.g.
397143.1 396743.1 396343.1 395943.1 395543.1 395143.1 394743.1 394343.1 393943.1 393543.1
-0.11 -0.10 -0.10 -0.10 -0.10 -0.09 -0.09 -0.09 -0.09 -0.09
0.171 0.165 0.164 0.162 0.158 0.154 0.151 0.145 0.157 0.209
Previously I’ve used the Mid function but in this case I can’t, because I don’t know in advance the width of each field. Also it's too many lines to load in an Excel sheet. I can think of a brute force way in which I look at each successive character and determine whether it’s a space or a number, but it seems terribly clumsy.
I’m also interested in pointers on how to write formatted data, but this seems easier -- just format each string and concatenate them using &.
The following snippet will read whitespace-delimited numbers from a text file:
Dim someNumber As Double
Open "YourDataFile.txt" For Input As #1
Do While Not (EOF(1))
Input #1, someNumber
`// do something with someNumber here...`
Loop
Close #1
update: Here is how you could read one line at a time, with a variable number of items on each line:
Dim someNumber As Double
Dim startPosition As Long
Dim endPosition As Long
Dim temp As String
Open "YourDataFile" For Input As #1
Do While Not (EOF(1))
startPosition = Seek(1) '// capture the current file position'
Line Input #1, temp '// read an entire line'
endPosition = Seek(1) '// determine the end-of-line file position'
Seek 1, startPosition '// jump back to the beginning of the line'
'// read numbers from the file until the end of the current line'
Do While Not (EOF(1)) And (Seek(1) < endPosition)
Input #1, someNumber
'// do something with someNumber here...'
Loop
Loop
Close #1
You could also use regular expressions to replace multiple whitespaces to one space and then use the Split function for each line like the example code shows below.
After 65000 rows have been processed a new sheet will be added to the Excel workbook so the source file can be bigger than the max number of rows in Excel.
Dim rx As RegExp
Sub Start()
Dim fso As FileSystemObject
Dim stream As TextStream
Dim originalLine As String
Dim formattedLine As String
Dim rowNr As Long
Dim sht As Worksheet
Dim shtCount As Long
Const maxRows As Long = 65000
Set fso = New FileSystemObject
Set stream = fso.OpenTextFile("c:\data.txt", ForReading)
rowNr = 1
shtCount = 1
Set sht = Worksheets.Add
sht.Name = shtCount
Do While Not stream.AtEndOfStream
originalLine = stream.ReadLine
formattedLine = ReformatLine(originalLine)
If formattedLine <> "" Then
WriteValues formattedLine, rowNr, sht
rowNr = rowNr + 1
If rowNr > maxRows Then
rowNr = 1
shtCount = shtCount + 1
Set sht = Worksheets.Add
sht.Name = shtCount
End If
End If
Loop
End Sub
Function ReformatLine(line As String) As String
Set rx = New RegExp
With rx
.MultiLine = False
.Global = True
.IgnoreCase = True
.Pattern = "[\s]+"
ReformatLine = .Replace(line, " ")
End With
End Function
Function WriteValues(formattedLine As String, rowNr As Long, sht As Worksheet)
Dim colNr As Long
colNr = 1
stringArray = Split(formattedLine, " ")
For Each stringItem In stringArray
sht.Cells(rowNr, colNr) = stringItem
colNr = colNr + 1
Next
End Function