I found some code on Stack Exchange, of all places, and was able to modify it 95% of the way for my needs but one last issue keeps cropping up, all the DIVs in the parent DIV paste into one cell, I want them to post to individual cells in my worksheet. The code came from Stack Overflow user "Portland Runner" and the original post can be found here. The HTML I'm up against looks something like this:
<div class="right-header">
<div>Entry 1</div>
<div>Entry 2</div>
<div>Entry 3</div>
<div>Entry 4</div>
<div>Entry 5</div>
<div>Entry 6</div>
</div>
The child DIVs have no IDs, Classes, or Styles, just information surrounded by a lonesome DIV tag. This all gets dumped into a single cell where I'd like instead for it to be dumped into Al (Entry 1), B1 (Entry 2), C1 (Entry 3), etc. The original code is as follows:
Sub extract()
Dim IE As InternetExplorer
Dim html As HTMLDocument
Set IE = New InternetExplorerMedium
IE.Visible = False
IE.Navigate2 "C:\Users\john\Documents\Test.html"
' Wait while IE loading
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Set html = IE.document
Set holdingsClass = html.getElementsByClassName("right-header")
Dim results As Variant
results = Split(holdingsClass(0).textContent, vbLf)
cntr = 1
For i = LBound(results) To UBound(results)
If Trim(results(i)) <> "" Then
Select Case Right(Trim(results(i)), 1)
Case "<div>"
Range("B" & cntr) = CStr(Trim(results(i)))
Case "%"
Range("C" & cntr).Value = Trim(results(i))
cntr = cntr + 1
Case 0
Range("C" & cntr).Value = Trim(results(i))
Case Else
Range("A" & cntr).Value = Trim(results(i))
End Select
End If
Next i
Sheets("Sheet3").Range("A1").Value = holdingsClass(0).textContent
'Cleanup
IE.Quit
Set IE = Nothing
End Sub
Thank you any and all for your help!
Compiled but not tested:
Sub extract()
Dim IE As InternetExplorer
Dim topDiv, div, childDivs, tc As String, cntr
Set IE = New InternetExplorerMedium
IE.Visible = False
IE.Navigate2 "C:\Users\john\Documents\Test.html"
' Wait while IE loading
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Set topDiv = IE.document.getElementsByClassName("right-header")(0)
Set childDivs = topDiv.getElementsByTagName("div")
cntr = 2
For Each div In childDivs
tc = Trim(div.textContent)
If tc <> "" Then
Select Case Right(tc, 1)
Case "<div>"
'not sure whether you should be seeing HTML in textcontent...?
Range("B" & cntr) = CStr(tc)
Case "%"
Range("C" & cntr).Value = tc
cntr = cntr + 1
Case 0
Range("C" & cntr).Value = tc
Case Else
Range("A" & cntr).Value = tc
End Select
End If
Next div
Sheets("Sheet3").Range("A1").Value = topDiv.textContent
'Cleanup
IE.Quit
Set IE = Nothing
End Sub
Related
I am working on a website data extractor. I have two worksheets one for input and other for output, which looks like this..
In the first sheet the cell contains the URL needed to extract data. I am trying this URL
https://www.propertyguru.com.sg/singapore-property-listing/property-for-sale?limit=30&market=residential&property_type_code%5B%5D=4S&property_type=H&freetext=Yishun
I have written this macro..
Sub extractTablesData()
Dim IE As Object, obj As Object
Dim str, e As String
Dim pgf, pgt, pg As Integer
Dim ele, Results As Object
Dim add, size, cno, price, inurl, sp, sp1 As String
Dim isheet, rts As Worksheet
Dim LastRow As Long
Dim pgno As Variant
Set IE = CreateObject("InternetExplorer.Application")
Set isheet = Worksheets("InputSheet")
Set rts = Worksheets("Results")
URL = isheet.Cells(3, 2)
RowCount = 1
rts.Range("A" & RowCount) = "Address"
rts.Range("B" & RowCount) = "Size"
rts.Range("C" & RowCount) = "Contact Number"
rts.Range("D" & RowCount) = "Price"
rts.Range("E" & RowCount) = "Url"
LastRow = rts.Cells(Rows.Count, 2).End(xlUp).Row
'RowCount = LastRow
With IE
.Visible = True
.navigate (URL)
DoEvents
Do While IE.Busy Or IE.readyState <> 4
Loop
'Application.Wait (Now + #12:00:05 AM#)
For Each Results In .document.all
Select Case Results.className
Case "title search-title"
str = Results.innerText
str1 = Split(str, " ")
str = CInt(str1(0))
End Select
If Results.className = "btn btn-main-inverted dropdown-toggle" And InStr(1, Results.Title, " page") > 2 Then
str2 = Results.Title
str1 = Split(str2, " ")
str2 = CInt(str1(0))
End If
Next
If str2 = 0 Then
pgno = CVErr(xlErrDiv0)
Else
pgno = WorksheetFunction.RoundUp(str / str2, 0)
End If
End With
IE.Quit
Set IE = Nothing
UrlS = Split(URL, "?")
Url1 = UrlS(0)
Url2 = "?" & UrlS(1)
For i = 1 To pgno
Set IE = CreateObject("InternetExplorer.Application")
URL = Url1 & "/" & i & Url2
With IE
.Visible = True
.navigate (URL)
DoEvents
Do While IE.Busy Or IE.readyState <> 4
Loop
'Application.Wait (Now + #12:00:08 AM#)
For Each ele In .document.all
Select Case ele.className
Case "listing-img-a"
inurl = ele.href
rts.Cells(LastRow + 1, 5) = inurl
Case "listing-location"
LastRow = LastRow + 1
add = ele.innerText
rts.Cells(LastRow, 1) = add
Case "lst-sizes"
sp = Split(ele.innerText, " ยท")
size = sp(0)
rts.Cells(LastRow, 2) = size
Case "pgicon pgicon-phone js-agent-phone-number" ' btn-tools" 'pgicon pgicon-phone js-agent-phone-number" 'agent-phone-number"
rts.Cells(LastRow, 3) = ele.innerText
Case "listing-price"
price = ele.innerText
rts.Cells(LastRow, 4) = price
End Select
Next
LastRow = rts.Cells(Rows.Count, 2).End(xlUp).Row
rts.Activate
rts.Range("A" & LastRow).Select
End With
IE.Quit
Set IE = Nothing
Application.Wait (Now + #12:00:04 AM#)
Next i
MsgBox "Success"
End Sub
When I run this macro I am getting the error
Type Miss Match
When I debug it highlights the code
For i = 1 To pgno
Set IE = CreateObject("InternetExplorer.Application") URL = Url1 & "/" & i & Url2
With IE .Visible = True .navigate (URL)
I have tried my best to figure it out but could not understand where the problem is. Please help me to make correction..
It is also not getting the whole records on the link. This link contains more than 200 Records as per page is 30 records.
You can rely on implicit conversion and use the following. Assuming all pages do have numbering. You might want to improve error handling. I default to page numbers = 1 if the penultimate li CSS selector fails, otherwise it attempts to get the last page number before the ">"
Refer to my prior answer to your related question which shows you how to more effiently scrape the info off the page.
Sample code to show function being used:
Option Explicit
Public Sub GetListings()
Dim IE As New InternetExplorer, pgno As Long
With IE
.Visible = True
.navigate "https://www.propertyguru.com.sg/singapore-property-listing/property-for-sale?limit=30&market=residential&property_type_code%5B%5D=4S&property_type=H&freetext=Yishun", False
While .Busy Or .readyState < 4: DoEvents: Wend
pgno = GetNumberOfPages(.document)
End With
End Sub
Public Function GetNumberOfPages(ByVal doc As HTMLDocument) As Long
On Error GoTo errhand:
GetNumberOfPages = doc.querySelector(".listing-pagination li:nth-last-child(2)").innerText
Exit Function
errhand:
If Err.Number <> 0 Then GetNumberOfPages = 1
End Function
The below code opens an instance of InternetExplorer and downloads odds. It works fine but occasionally a pop-up window appears which causes the code to not work. Any help on how to navigate the below pop-up (i.e. click 'continue to oddschecker') when the pop-up does appear?
<a class="continue beta-callout js-close-class" onclick='s_objectID="javascript:void(0)_9";return this.s_oc?this.s_oc(e):true' href="javascript:void(0)">Continue to Oddschecker</a>
Full code:
Sub Oddschecker()
Dim ie, wp As Object
Dim i As Integer
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.Navigate "https://www.oddschecker.com/horse-racing/racing-coupon"
Do While ie.Busy
DoEvents
Loop
Do While ie.ReadyState <> 4
DoEvents
Loop
Set wp = ie.Document
'Application.ActiveSheet.UsedRange.ClearContents
Application.Worksheets("sheet1").UsedRange.ClearContents
i = 2
For Each rw In wp.getElementsByTagName("table")(0).getElementsByTagName("tr")
If rw.className = "date" Then
Worksheets("sheet1").Range("A1") = rw.innerText
ElseIf rw.className = "fixture-name" Then
i = i + 1
Worksheets("sheet1").Range("A" & i) = rw.getElementsByTagName("td")(0).innerText
i = i + 1
ElseIf rw.className = "coupons-table-row match-on" Then
For Each od In rw.getElementsByTagName("p")
If InStr(od.innerText, "(") <> 0 Then
Worksheets("sheet1").Range("A" & i) = Trim(Left(od.innerText, InStr(od.innerText, "(") - 1))
np = Trim(Right(od.innerText, Len(od.innerText) - InStr(od.innerText, "(")))
Worksheets("sheet1").Range("B" & i) = Left(np, Len(np) - 1)
i = i + 1
Else
Worksheets("sheet1").Range("A" & i) = Trim(od.innerText)
i = i + 1
End If
Next od
End If
Next rw
ie.Quit
Range("A1:B" & i).WrapText = False
Columns("A:B").EntireColumn.AutoFit
Set wp = Nothing
Set ie = Nothing
End Sub
If you wish to continue with that page (navigating to that popup page), you can try like:
Dim HTML As HTMLDocument, addcheck As Object
While IE.Busy = True Or IE.readyState < 4: DoEvents: Wend ''(You can write it the way you feel comfortable)
Set HTML = IE.document ''place this line after the prevous line
Set addcheck = HTML.querySelector("#promo-modal a.continue")
If Not addcheck Is Nothing Then
addcheck.Click
End If
But, that is not a good idea cause it will lead you to some page where you might need to do some activity to get back on this data ridden page.
I suppose you should get rid of that popup blocker by ticking the cross button located on the top right area and continue to do what you are doing:
Dim HTML As HTMLDocument, addcheck As Object
While IE.Busy = True Or IE.readyState < 4: DoEvents: Wend ''(You can write it the way you feel comfortable)
Set HTML = IE.document ''place this line after the prevous line
Set addcheck = HTML.querySelector("#promo-modal span[title='Close")
If Not addcheck Is Nothing Then
addcheck.Click
End If
If I didn't understand what your intention was, do let me know. Thanks.
I have the following code, worked up from my earlier question which was answered Tim Williams. However, a few minutes after it was working I figured out something else that was a new requirement and wasn't posed with the original question. Given the code below and the following HTML structure how can I modify the code to extract data from the second or even third DIV using the same "right-header" class? The child DIVs have no class or ID, they are just wrappers.
Here is the HTML:
<div class="right-header">
<div>Entry 1</div>
<div>Entry 2</div>
<div>Entry 3</div>
<div>Entry 4</div>
<div>Entry 5</div>
<div>Entry 6</div>
</div>
<div class="right-header">
<div>Entry 1</div>
<div>Entry 2</div>
<div>Entry 3</div>
<div>Entry 4</div>
<div>Entry 5</div>
<div>Entry 6</div>
</div>
<div class="right-header">
<div>Entry 1</div>
<div>Entry 2</div>
<div>Entry 3</div>
<div>Entry 4</div>
<div>Entry 5</div>
<div>Entry 6</div>
</div>
Here is the modified VBA from Tim Williams:
Sub extract()
Dim IE As InternetExplorer
Dim topDiv, div, childDivs, tc As String, cntr
Set IE = New InternetExplorerMedium
IE.Visible = False
IE.Navigate2 "C:\Users\john\Documents\Test.html"
' Wait while IE loading
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
Set topDiv = IE.document.getElementsByClassName("right-header")(0)
Set childDivs = topDiv.getElementsByTagName("div")
cntr = 2
For Each div In childDivs
tc = Trim(div.textContent)
If tc <> "" Then
Select Case Right(tc, 1)
Case "<div>"
'not sure whether you should be seeing HTML in textcontent...?
Range("B" & cntr) = CStr(tc)
Case "%"
Range("C" & cntr).Value = tc
cntr = cntr + 1
Case 0
Range("C" & cntr).Value = tc
Case Else
Range("A" & cntr).Value = tc
End Select
End If
cntr = cntr + 1
Next div
Sheets("Sheet3").Range("A1").Value = topDiv.textContent
'Cleanup
IE.Quit
Set IE = Nothing
End Sub
Thank you everyone, sorry for the additional question so close to the original.
you can just put it in a loop if the number of divs are known
Sub extract()
Dim IE As InternetExplorer
Dim topDiv, div, childDivs, tc As String, cntr
Set IE = New InternetExplorerMedium
IE.Visible = False
IE.Navigate2 "C:\Nitesh\test.html"
' Wait while IE loading
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
For i = 0 To 2
Set topDiv = IE.document.getElementsByClassName("right-header")(i)
Set childDivs = topDiv.getElementsByTagName("div")
cntr = 2
For Each div In childDivs
tc = Trim(div.textContent)
If tc <> "" Then
Select Case Right(tc, 1)
Case "<div>"
'not sure whether you should be seeing HTML in textcontent...?
Range("B" & cntr).Offset(0, i) = CStr(tc)
Case "%"
Range("C" & cntr).Offset(0, i).Value = tc
cntr = cntr + 1
Case 0
Range("C" & cntr).Offset(0, i).Value = tc
Case Else
Range("A" & cntr).Offset(0, i).Value = tc
End Select
End If
cntr = cntr + 1
Next div
Next i
End Sub
and offset all your outputs by i to get the results in a new column.
Sub extract()
Dim IE As InternetExplorer
Dim topDivs, topDiv, div, childDivs, tc As String, cntr
Set IE = New InternetExplorerMedium
IE.Visible = False
IE.Navigate2 "C:\Users\john\Documents\Test.html"
' Wait while IE loading
Do While IE.Busy
Application.Wait DateAdd("s", 1, Now)
Loop
cntr = 2
'get all the top-level divs
Set topDivs = IE.document.getElementsByClassName("right-header")
'loop over the top-level divs
For Each topDiv In topDivs
'get child divs for this top-level div
Set childDivs = topDiv.getElementsByTagName("div")
For Each div In childDivs
tc = Trim(div.textContent)
If tc <> "" Then
Select Case Right(tc, 1)
Case "<div>"
'not sure whether you should be seeing HTML in textcontent...?
Range("B" & cntr) = CStr(tc)
Case "%"
Range("C" & cntr).Value = tc
cntr = cntr + 1
Case 0
Range("C" & cntr).Value = tc
Case Else
Range("A" & cntr).Value = tc
End Select
End If
cntr = cntr + 1
Next div
Next topDiv
'Sheets("Sheet3").Range("A1").Value = topDiv.textContent
'Cleanup
IE.Quit
Set IE = Nothing
End Sub
I have been trying to get this code to work for workflow efficiency purposes, but I cannot seem to make it function correctly.
Steps:
1. Login to Amazon Seller
Use order numbers in column A and place them in searchbox to search
Search for element innerText of "Estimated Delivery:" and scrape information into column B adjacent the order number
Move onto the next order number and repeat process until order number column is empty.
The webpage code (what I'm trying to obtain is highlighted):
Option Explicit
Dim HTMLDoc As HTMLDocument
Dim MyBrowser As InternetExplorer
Sub MyAmazonSellereEDD()
Dim MyHTML_Element As IHTMLElement
Dim MyURL As String
Dim oSignInLink As HTMLLinkElement
Dim oInputEmail As HTMLInputElement
Dim oInputPassword As HTMLInputElement
Dim oInputSignInButton As HTMLInputButtonElement
'InputSearchOrder will be the destination for order numbers taken from the workbook
Dim InputSearchOrder As HTMLInputElement
Dim InputSearchButton As HTMLInputButtonElement
Dim IE As InternetExplorer
Dim AAOrder As Workbook
Dim AAws As Worksheet
Dim AAws2 As Worksheet
Dim R As Range
Dim x As Integer
Dim i As Long
Dim ar As Variant
Dim elems As IHTMLElementCollection
Dim TDelement As HTMLTableCell
Dim ExcludWords() As Variant, a As Range, b As Long, LR As Long
ExcludWords = Array("Estimated Delivery:")
MyURL = "https://sellercentral.amazon.com/gp/homepage.html"
Set IE = New InternetExplorer
' Open the browser and navigate.
With IE
.Silent = True
.navigate MyURL
.Visible = True
Do
DoEvents
Loop Until .readyState = READYSTATE_COMPLETE
End With
' Get the html document.
Set HTMLDoc = IE.document
With HTMLDoc
.all.Item("username").Value = "blankityblank#blank.net"
.all.Item("password").Value = "*********"
.all.Item("sign-in-button").Click
End With
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:08"))
'Set AAOrder = Application.Workbooks.Open("Z:\Automation Anywhere\5 Automated Tracking Imports\Amazon Prime\PrimeOrdersWithNoFulfillment.csv")
'Set AAws = AAOrder.Worksheets("PrimeOrdersWithNoFulfillment")
x = 2
'Do Until Range("A" & x) = ""
If Range("B" & x).Value = "" Then
'If AAws.Range("B" & x).Value = "" Then
'x = x + 1
Do Until Range("A" & x) = ""
Set InputSearchOrder = HTMLDoc.getElementById("sc-search-field")
InputSearchOrder.Value = Range("A" & x)
Set InputSearchButton = HTMLDoc.getElementsByClassName("sc-search-button")(0)
InputSearchButton.Click
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:05"))
Set elems = HTMLDoc.getElementsByTagName("td")
'ExcludWords = Array("Package Weight:", "Tracking ID:", "Ship Date:", "Carrier:", "Shipping Service:")
i = 2
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).Value = TDelement.innerText
i = i + 1
End If
Next
LR = Range("B" & Rows.Count).End(xlUp).Row
For i = 1 To LR
Set a = Cells(i, "B")
For b = 0 To UBound(ExcludWords)
a.Formula = Replace((a.Formula), ExcludWords(b), "")
Next b
Next i
'End If
x = x + 1
Loop
'Loop
End If
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
MsgBox ("Process is done! :)")
End Sub
My issue is that when it scrapes the data, the words "Estimated Delivery:" and the actual estimated delivery date it is supposed to scrape are separate, but still should be included in the output data in column B. What it's doing is finding and inserting only "Estimated Delivery:" and then using the code, it trims those characters as instructed. The space is still blank after that. I'm not sure what the issue is.
The TDelement you pick up in the following part of code only includes "Estimated Delivery:" in its innerText, the part with the date is actually a separate TDelement:
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).Value = TDelement.innerText
i = i + 1
End If
Next
As there is not any unique information in the html code (e.g. id, name etc.) to use to reference the TDelement that contains the date you could use the reference you already have in conjunction with NextSibling so that you get the element after the one that contains the text "Estimated Delivery:". Perhaps try this (unable to test anything at the moment but should work):
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).value = TDelement.NextSibling.innerText
i = i + 1
End If
Next
I have problem with scraping table data from this page [http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures]. I use this code, but do not scrape any data:
Public Sub ScrapTableData()
Dim sURL As String
Dim XMLHttpRequest As XMLHTTP
Dim HTMLDoc As New HTMLDocument
Dim elc As HTMLHtmlElement
Dim i As Integer
sURL = "http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures"
Set XMLHttpRequest = New MSXML2.XMLHTTP
XMLHttpRequest.Open "GET", sURL, False
XMLHttpRequest.responseXML.async = False
XMLHttpRequest.send
Do While XMLHttpRequest.Status <> 200
DoEvents
Loop
While XMLHttpRequest.ReadyState <> 4
DoEvents
Wend
HTMLDoc.body.innerHTML = XMLHttpRequest.responseText
' Tables
Dim tbl As HTMLTable, row As HTMLTableRow, cell As HTMLTableCell
i = 1
For Each tbl In HTMLDoc.getElementsByTagName("table")
For Each row In tbl.Rows
For Each cell In row.Cells
ActiveSheet.Cells(i, 5) = cell.innerText
i = i + 1
Next
Next
Next
End Sub
My code does not find HTML table tags.
Also, if I use this part of code, do not list all HTML tags (for example HTML DIV tag) and HTML that describes 6 buttons:
i = 0
Dim elc As HTMLHtmlElement
For Each elc In HTMLDoc.all
Worksheets("Tables").Range("A1").Offset(i, 0) = elc.tagName
i = i + 1
Next
6 buttons: Year, Quarter, Month,..., Day
I need to simulate click on them to display (scrape) different tables' data.
I don't think the XMLHTTP approach will work in this case, you need to open IE. The following code will do this. You may need to modify the loop to put data in your worksheet, I didn't tinker with this. At the end, I've also placed some code that will change the tabs. Hope this helps
Sub test()
' open IE, navigate to the website of interest and loop until fully loaded
Set IE = CreateObject("InternetExplorer.Application")
my_url = "http://www.eex.com/en/market-data/power/derivatives-market/phelix-futures"
With IE
.Visible = True
.navigate my_url
.Top = 50
.Left = 530
.Height = 400
.Width = 400
Do Until Not IE.Busy And IE.readyState = 4
DoEvents
Loop
End With
' Collect data from tables
Set tbl = IE.document.getElementsByTagName("table")
For Each itm In tbl
i = 1
For Each itm2 In itm.Rows
For Each cell In itm2.Cells
ActiveSheet.Cells(i, 5) = cell.innertext
i = i + 1
Next
Next
Next
' Click on the 6 buttons, substitute "week", "year", etc. for the button you want to click
Set Results = IE.document.getElementsByTagName("a")
For Each itm In Results
If InStr(1, itm.innertext, "month", vbTextCompare) > 0 Then
itm.Click
Do Until Not IE.Busy And IE.readyState = 4
DoEvents
Loop
Exit For
End If
Next
' Do whatever is next
End Sub