I am trying to get data from a different website using the vba code bellow, but I don't know how to identify the string inside the parenthesis in this statement "With htm.getelementbyid("comps-results"). How do I get the string in the parenthesis from, for example, this website
I would appreciate very much if someone could help me on this matter.
Thank you in advance.
Sub GetData()
Dim x As Long, y As Long
Dim htm As Object
Set htm = CreateObject("htmlFile")
With CreateObject("msxml2.xmlhttp")
.Open "GET", "http://www.zillow.com/homes/comps/67083361_zpid/", False
.send
htm.body.innerhtml = .responsetext
End With
With htm.getelementbyid("comps-results")
For x = 0 To .Rows.Length - 1
For y = 0 To .Rows(x).Cells.Length - 1
Sheets(1).Cells(x + 1, y + 1).Value = .Rows(x).Cells(y).innertext
Next y
Next x
End With
End Sub
The getElementByID method takes a unique ID as an argument and returns a single HTML element if there is one with such an ID value.
Probably what you need to do is use the getElementsByTagName method, which returns a collection of matching elements. Since this may result in multiple matches, I find it best to create an object first, and an iterator variable:
Dim compresults
Dim el
Set compresults = htm.getelementsbytagname("comps-results")
For each el in compresults
MsgBox el.InnerText
Next
BTW, I am fairly certain ( but have not verified) that an HTMLElementCollection does not have a .Rows member, so the next line in your code will probably raise an error. Likewise, the .Rows does not have a .Length property, so there's at least two errors on that single line of code AND in the next line, note that .Cells does not have a .Length member, either, so another error.
For assistance with those parts of your code, I urge you to ask a new question. This answer addresses your original question.
Related
Excel 2013 on Windows 7. XPath/Javascript/jQuery is out of scope.
I am trying to iterate over select div elements in a page, namely elements that have a specific data-level attribute.
My current approach is similar to this, but I was unable to find a non-manual way to select elements based on attributes. The closest I came was something like:
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", url, False
.Send
pHTML.body.innerHTML = .ResponseText
End With
Set eCollection = pHTML.getElementsByClassName("chapter").getElementsByTagName("div")
For i = 0 To eCollection.Length
If eCollection(i).getAttribute("data-level") >= 0 Then ' Throw cake
Next i
This solution, while I am sure it is viable (if unelegant), seems sub-optimal if only for how big the loop is going to end up being when I start looking for specific elements and sequences of elements within these elements.
So I am looking for a way to do something like this:
For Each pElement In pHTML.getElementsByClassName("chapter").getElementsByTagName("div").getElementsByAttribute("data-level")
' Throw cake at the element
Next
I'm aware that there is no method getElementsByAttribute, hence the question.
Is there some approach here that I am blind to, or am I locked to manual iteration?
Alternatively, if I swap my current approach for creating an IE instance, รก la this answer, could I concievably use querySelectorAll to end up with something resembling the result I have outlined above?
For anyone else coming this way, the outer shell, so to speak, can look like this:
Sub ScrapeWithHTMLObj(url As String, domClassName As String, domTag As String, domAttribute As String, domAttributeValue As String)
' Dependencies:
' * Microsoft HTML Object Library
' Declare vars
Dim pHTML As HTMLDocument
Dim pElements As Object, pElement As Object
Set pHTML = New HTMLDocument
' Basic URL healthcheck
Do While (url = "" Or (Left(url, 7) <> "http://" And Left(url, 8) <> "https://"))
MsgBox ("Invalid URL!")
url = InputBox("Enter new URL: (0 to terminate)")
If url = "0" Then Exit Sub
Loop
' Fetch page at URL
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", url, False
.Send
pHTML.body.innerHTML = .ResponseText
End With
' Declare page elements
Set pElements = pHTML.getElementsByClassName(domClassName)
Set pElement = pElements(0).getElementsByTagName(domTag)
' Extract only elements with wanted attribute
pEleArray = getElementsByAttribute(pElement, domAttribute, domAttributeValue)
For Each e In pEleArray
' Do stuff to elements
Debug.Print e.getAttribute(domAttribute)
Next
End Sub
If you go this route, you'll also need something like this:
Function getElementsByAttribute(pObj As Object, domAttribute As String, domAttributeValue As String) As Object()
Dim oTemp() As Object
ReDim oTemp(1 To 1)
For i = 0 To pObj.Length - 1
'Debug.Print pObj(i).getAttribute(domAttribute)
If pObj(i).getAttribute(domAttribute) = domAttributeValue Then
Set oTemp(UBound(oTemp)) = pObj(i)
ReDim Preserve oTemp(1 To UBound(oTemp) + 1)
End If
Next i
ReDim Preserve oTemp(1 To UBound(oTemp) - 1)
getElementsByAttribute = oTemp
End Function
Depending on the HTML tree, you'll need to change which elements you zero in on in the sub, obviously. For the site I used in testing, this structure worked flawlessly.
Example usage:
Call ScrapeWithHTMLObj("https://somesite", "chapter-index", "div", "data-level", "1")
It will enter the first class named chapter-index, select all elements with the div tag, and finally extract all elements containing the attribute data-level with value 1.
I'm writing a macro to read the below Email:
Start Date: July-07-2016
Name Accept Approved
John Yes No
Peter No No
I'm good with search the word "Start date" and get the next 13 character to copy and paste that in a text file. But my problem is the next part is in a Table format. So when I'm searching for the name "John" and trying to copy the next 10 Characters. It doesn't work.
Is there a way to search for the word "Accept" and get the First Row data(Which will be No) and then Second Row data(Which will be No)? Is that possible?
This EMail's table will have only 2 Rows. So, I don't need any dynamic way to get the data. Can someone guide me?
I've tried searching the internet first, but the solutions are too huge for me to understand. Is there any simple way?
I have even tried the solution give here: How to read table pasted in outlook message body using vba? but that method works when the body has ONLY TABLE. But my EMail will have text as well as table.
I've never actually programmed in vba, but I think I can help (a bit) nevertheless.
In the answer on the post you linked to, there is the line
Set msg = ActiveExplorer.Selection.item(1)
I think you can change this to something like
Set msg = Right(ActiveExplorer.Selection.item(1), 25)
to get rid of the text before the table (I got the Right part from here: http://www.exceltrick.com/formulas_macros/vba-substring-function/, but it should also work in Outlook).
This way, you run the code on the table itself instead of on the whole message.If there is also text after the table, it might be more difficult, but you might get that done by searching for the table ending.
I hope this helps!
Attempt 2
After some searching and thinking, I came up with the idea to get the html of the message and use that to parse the table (Ok, not really, I got it from the comments here: http://www.codeproject.com/Questions/567073/Howplustoplusrecognizeplusandplusreadplustableplus). Based on that and other sources, it is possible to write a code that gets the table from an email.
I've written some code that might work, but I couldn't test it as I do not have Outlook. Also, this is my first time writing vba, so there may be a lot of syntax errors (and the code is ugly).
Sub GetTable()
Dim msg As Outlook.mailItem
Dim html As String
Dim tableBegin As String
Dim tableEnd As String
Dim posTableBegin As Long
Dim posTableEnd As Long
Dim table As String
Dim rowBegin As String
Dim rowEnd As String
Dim rowCount As Long
Dim columnBegin As String
Dim columnBeginLen As Long
Dim columnEnd As String
Dim posRowBegin As Long
Dim posRowEnd As Long
Dim values As String(0, 3)
Dim beginValue0 As Long
Dim beginValue1 As Long
Dim beginValue2 As Long
Dim EndValue0 As Long
Dim EndValue1 As Long
Dim EndValue2 As Long
' Get the message and the html
Set msg = ActiveExplorer.Selection.item(1)
html = msg.HTMLbody
' Get the begin and end positions of the table (within the html)
tableBegin = "<table>"
tableEnd = "</table>"
posTableBegin = InStr(1, html, tableBegin)
posTableEnd = InStr(posTableBegin, html, tableEnd)
' Get the html table
table = Mid(html, posTableBegin + Len(tableBegin), posTableEnd - posTableBegin - Len(tableBegin))
' Set the variables for the loop
rowBegin = "<tr>"
rowEnd = "</tr>"
rowCount = 0
columnBegin = "<td>"
columnBeginLen = Len(columnBegin)
columnEnd = "</td>"
' Loop trough all rows
posRowBegin = InStr(lastPos, table, rowBegin)
Do While posRowBegin != 0
' Get the end from the current row
posRowEnd = InStr(posRowBegin, table, rowEnd)
rowCount = rowCount + 1
' Make the array larger
ReDim Preserve values(rowCount + 1, 3)
' Get the contents from that row
row = Mid(table, posRowBegin + Len(rowBegin), posRowEnd - posRowBegin - Len(rowBegin))
' Get the three values from that row (name, Accept, Approved) and put it in the array
beginValue0 = InStr(1, row, columnBegin) + columnBeginLen
endValue0 = InStr(beginValue0, row, columnEnd)
beginValue1 = InStr(endValue0, row, columnBegin) + columnBeginLen
endValue1 = InStr(beginValue1, row, columnEnd)
beginValue2 = InStr(endValue1, row, columnBegin) + columnBeginLen
endValue2 = InStr(beginValue2, row, columnEnd)
values(rowCount, 0) = Mid(row, beginValue0, endValue0)
values(rowCount, 1) = Mid(row, beginValue1, endValue1)
values(rowCount, 2) = Mid(row, beginValue2, endValue2)
' Get the beginning of the next row
posRowBegin = InStr(lastPos, table, rowBegin)
Loop
' The values are now in the (double) array 'values'.
' values(0, [1-3]) contains the headers.
End Sub
As said before, the original idea came from http://www.codeproject.com/Questions/567073/Howplustoplusrecognizeplusandplusreadplustableplus. Additionally, I used Word VBA how to select text between two substrings and assign to variable? and the Microsoft documentation to write this.
While it is likely that the code does not work out of the box, I think it still gets the general idea (and some specifics) across, so that it can be used as a guide. I hope this is the solution you need!
You can actually use the Word Object Model to parse out the text from the table - assuming that the email is in HTML format.
Get a Word.Document object from the Inspector.WordEditor property and use Word objects and methods to get the text, like the following below example from MSDN. Just replace ActiveDocument with the variable you declare and set from WordEditor.
Sub ReturnCellContentsToArray()
Dim intCells As Integer
Dim celTable As Cell
Dim strCells() As String
Dim intCount As Integer
Dim rngText As Range
If ActiveDocument.Tables.Count >= 1 Then
With ActiveDocument.Tables(1).Range
intCells = .Cells.Count
ReDim strCells(intCells)
intCount = 1
For Each celTable In .Cells
Set rngText = celTable.Range
rngText.MoveEnd Unit:=wdCharacter, Count:=-1
strCells(intCount) = rngText
intCount = intCount + 1
Next celTable
End With
End If
End Sub
I have some problems with my excel VBA code, it does not work and yes, I do not know why...
I want to add each Record number once in a collection. My code looks like this:
For i = 1 To lo.ListRows.Count
Count = 1
Do While recordList.Count >= Count
recordFound = False
If lo.ListColumns("Record").DataBodyRange.Rows(i) = recordList(Count) Then
recordFound = True
End If
If recordFound = False Then
recordList.Add (lo.ListColumns("Record").DataBodyRange.Rows(i))
End If
Count = Count + 1
Loop
Next
What it does now, it returns empty collection...
Whould be great if you could help me guys!
There is no real need to test the Collection to see if the item exists if you give it a key.
You can code something like:
On Error Resume Next
For I = 1 To lo.ListRows.Count
With lo.ListColumns("Record").DataBodyRange.Rows(I)
RecordList.Add Item:=.Value, Key:=CStr(.Value)
End With
Next I
On Error GoTo 0
Adding an item with the same key will cause the operation to be rejected. If you are concerned about other errors than the duplicate key error, you can always check the error number in the inline code and branch depending on the results.
I haven't been able to test this with the reference to lo but it works with a reference to a range
Dim objDictionary As Object
Dim dictionaryKey As Variant
Dim i As Long
Set objDictionary = CreateObject("Scripting.Dictionary")
For i = 1 To lo.ListRows
objDictionary.Item(CStr(lo.ListColumns("Record").DataBodyRange.Rows(i))) = objDictionary.Item(CStr(lo.ListColumns("Record").DataBodyRange.Rows(i))) + 1
Next i
For Each dictionaryKey In objDictionary.keys
' Do something
Next dictionaryKey
I have used a dictionary object instead of a normal collection object as it should do what you are trying to do. Because the item is incremented each time, you can also return the count of each item by using
objDictionary.item(dictionaryKey)
Weird problem. Stepping through the code with inspections gives me correct answers. Just running it doesn't.
This program loops through each cell in a column, searching for a regex match. When it finds something, checks in a adjacent column to which group it belongs and keeps a count in a dictonary. Ex: Group3:7, Group5: 2, Group3:8
Just stepping through the code gives me incorrect results at the end, but adding and inspection for each known item in the dictionary does the trick. Using Debug.Print for each Dictionary(key) to check how many items I got in each loop also gives me a good output.
Correct // What really hapens after running the code
Group1:23 // Group1:23
Group3:21 // Group3:22
Group6:2 // Group6:2
Group7:3 // Group7:6
Group9:8 // Group9:8
Group11:1 // Group11:12
Group12:2 // Group12:21
Sub Proce()
Dim regEx As New VBScript_RegExp_55.RegExp
Dim matches
Dim Rango, RangoJulio, RangoAgosto As String
Dim DictContador As New Scripting.Dictionary
Dim j As Integer
Dim conteo As Integer
Dim Especialidad As String
regEx.Pattern = "cop|col"
regEx.Global = False 'True matches all occurances, False matches the first occurance
regEx.IgnoreCase = True
i = 3
conteo = 1
RangoJulio = "L3:L283"
RangoAgosto = "L3:L315"
Julio = Excel.ActiveWorkbook.Sheets("Julio")
Rango = RangoJulio
Julio.Activate
For Each celda In Julio.Range(Rango)
If regEx.Test(celda.Value) Then
Set matches = regEx.Execute(celda.Value)
For Each Match In matches
j = 13 'column M
Especialidad = Julio.Cells(i, j).Value
If (Not DictContador.Exists(Especialidad)) Then
Call DictContador.Add(Especialidad, conteo)
GoTo ContinueLoop
End If
conteo = DictContador(Especialidad)
conteo = CInt(conteo) + 1
DictContador(Especialidad) = conteo
Next
End If
ContinueLoop:
i = i + 1
'Debug.Print DictContador(key1)
'Debug.Print DictContador(key2)
'etc
Next
'Finally, write the results in another sheet.
End Sub
It's like VBA saying "I'm going to dupe you if I got a chance"
Thanks
Seems like your main loop can be reduced to this:
For Each celda In Julio.Range(Rango)
If regEx.Test(celda.Value) Then
Especialidad = celda.EntireRow.Cells(13).Value
'make sure the key exists: set initial count=0
If (Not DictContador.Exists(Especialidad)) Then _
DictContador.Add Especialidad, 0
'increment the count
DictContador(Especialidad) = DictContador(Especialidad) +1
End If
Next
You're getting different results stepping through the code because there's a bug/feature with dictionaries that if you inspect items using the watch or immediate window the items will be created if they don't already exist.
To see this put a break point at the first line under the variable declarations, press F5 to run to the break point, then in the immediate window type set DictContador = new Dictionary so the dictionary is initialised empty and add a watch for DictContador("a"). You will see "a" added as an item in the locals window.
Collections offer an alternative method that don't have this issue, they also show values rather than keys which may be more useful for debugging. On the other hand an Exists method is lacking so you would either need to add on error resume next and test for errors instead or add a custom collection class with an exists method added. There are trade-offs with both approaches.
before I start I want to point out that I tagged this question as VBA because I can't actually make a new tag for Winwrap and I've been told that Winwrap is pretty much the same as VBA.
I'm working on SPSS V19.0 and I'm trying to make a code that will help me identify and assign value labels to all values that don't have a label in the specified variable (or all variables).
The pseudo code below is for the version where it's a single variable (perhaps inputted by a text box or maybe sent via a custom dialogue in the SPSS Stats program (call the .sbs file from the syntax giving it the variable name).
Here is the Pseudo Code:
Sub Main(variable As String)
On Error GoTo bye
'Variable Declaration:
Dim i As Integer, intCount As Integer
Dim strValName As String, strVar As String, strCom As String
Dim varLabels As Variant 'This should be an array of all the value labels in the selected record
Dim objSpssApp As 'No idea what to put here, but I want to select the spss main window.
'Original Idea was to use two loops
'The first loop would fill an array with the value lables and use the index as the value and
'The second loop would check to see which values already had labels and then
'Would ask the user for a value label to apply to each value that didn't.
'loop 1
'For i = 0 To -1
'current = GetObject(variable.valuelist(i)) 'would use this to get the value
'Set varLabels(i) = current
'Next
'Loop for each number in the Value list.
strValName = InputBox("Please specify the variable.")
'Loop for each number in the Value list.
For i = 0 To varLabels-1
If IsEmpty (varLabels(i)) Then
'Find value and ask for the current value label
strVar = InputBox("Please insert Label for value "; varLabels(i);" :","Insert Value Label")
'Apply the response to the required number
strCom = "ADD VALUE LABELS " & strVar & Chr$(39) & intCount & Chr$(39) & Chr$(39) & strValName & Chr$(39) &" ."
'Then the piece of code to execute the Syntax
objSpssApp.ExecuteCommands(strCom, False)
End If
'intCount = intCount + 1 'increase the count so that it shows the correct number
'it's out of the loop so that even filled value labels are counted
'Perhaps this method would be better?
Next
Bye:
End Sub
This is in no way functioning code, it's just basically pseudo code for the process that I want to achieve I'm just looking for some help on it, if you could that would be magic.
Many thanks in advance
Mav
Winwrap and VBA are almost identical with differences that you can find in this post:
http://www.winwrap.com/web/basic/reference/?p=doc_tn0143_technote.htm
I haven't used winwrap, but I'll try to answer with my knowledge from VBA.
Dim varLabels As Variant
You can make an array out of this by saying for example
dim varLabels() as variant 'Dynamically declared array
dim varLabels(10) as variant 'Statically declared array
dim varLabels(1 to 10) as variant 'Array starting from 1 - which I mostly use
dim varLabels(1 to 10, 1 to 3) 'Multidimensional array
Dim objSpssApp As ?
"In theory", you can leave this as a variant type or even do
Dim objSpssApp
Without further declaration, which is basically the same - and it will work because a variant can be anything and will not generate an error. It is good custom though to declare you objects according to an explicit datatype in because the variant type is expensive in terms of memory. You should actually find out about the objects class name, but I cannot give you this. I guess that you should do something like:
set objSpssApp = new <Spss Window>
set objSpssApp = nothing 'In the end to release the object
Code:
'loop 1
For i = 0 To -1
current = GetObject(variable.valuelist(i)) 'would use this to get the value
Set varLabels(i) = current
Next
I don't exactly know why you want to count from 0 to -1 but perhaps it is irrelevant.
To fill an array, you can just do: varLabels(i) = i
The SET statement is used to set objects and you don't need to create an object to create an array. Also note that you did not declare half of the variables used here.
Code:
strVar = InputBox("Please insert Label for value "; varLabels(i);" :","Insert Value Label")
Note that the concatenation operator syntax is &.
This appears to be the same in WinWrap:
http://www.winwrap.com/web/basic/language/?p=doc_operators_oper.htm
But you know this, since you use it in your code.
Code:
'intCount = intCount + 1 'increase the count so that it shows the correct number
'it's out of the loop so that even filled value labels are counted
'Perhaps this method would be better?
I'm not sure if I understand this question, but in theory all loops are valid in any situation, it depends on your preference. For ... Next, Do ... Loop, While ... Wend, in the end they all do basically the same thing. intCount = intCount + 1 seems valid when using it in a loop.
Using Next (for ... next)
When using a counter, always use Next iCounter because it increments the counter.
I hope this reply may be of some use to you!