Get website data from URLs using VBA - vba

I have multiple URLs stored in an Excel sheet. I want to get the data that resides within a particular div tag. For one website it works fine:
Sub Cityline()
Dim IE As Object
Set IE = CreateObject("Internetexplorer.application")
IE.Visible = True
IE.navigate "http://Someurl.com/bla/bla/bla"
Do While IE.busy
DoEvents
Loop
Do
DoEvents
Dim Doc As Object
Set Doc = IE.Document
Dim workout As String
workout = Doc.getElementsByClassName("CLASS_NAME_OF_DATA")(0).innertext
Range("A2") = workout
Loop
End Sub
I used the code below to loop through all the URLs, but it's not working:
Sub GetData()
Dim oHtm As Object: Set oHtm = CreateObject("HTMLFile")
Dim req As Object: Set req = CreateObject("msxml2.xmlhttp")
Dim oRow As Object
Dim oCell As Range
Dim url As String
Dim y As Long, x As Long
x = 1
For Each oCell In Sheets("sheet1").Range("A2:A340")
req.Open "GET", oCell.Offset(, 1).Value, False
req.send
With oHtm
.body.innerhtml = req.responsetext
With .getelementsbytagname("table")(1)
With Sheets(1)
.Cells(x, 1).Value = oCell.Offset(, -1).Value
.Cells(x, 2).Value = oCell.Value
End With
y = 3
For Each oRow In .Rows
Sheets(1).Cells(x, y).Value = oRow.Cells(1).innertext
y = y + 1
Next oRow
End With
End With
x = x + 1
Next oCell
End Sub
Can anyone suggest where I went wrong?
I also tried the approach from "Fetching Data from multiple URLs", but it doesn't work for me.
Please guide me on how to get data from all the URLs at once.

I'm new to SO, so apologies to the mods if this should be in comments (I couldn't get it to fit).
I agree with Silver's comments, but I thought I'd suggest a different approach that might help. If you have URLs in a column of cells, you could create a custom VBA function that will extract the relevant data out of the HTML. Just use this function in the cells to the right of your URL to return the relevant data from the HTML. An example is this:
Public Function GetHTMLData(SiteURL As String, FieldSearch As String) As String
    Dim IE As Object
    Dim BodyHTML As String
    ' Long rather than Integer: page HTML easily exceeds 32,767 characters
    Dim FoundAt As Long
    Dim FieldStart As Long
    Dim FieldEnd As Long
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Navigate SiteURL
        Do While .Busy Or .ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
            DoEvents
        Loop
        BodyHTML = IIf(StrComp(.Document.Title, "Cannot find server", vbTextCompare) = 0, _
            vbNullString, .Document.body.innerhtml)
        ' Only extract when the search string is actually present
        FoundAt = InStr(1, BodyHTML, FieldSearch)
        If FoundAt > 0 Then
            FieldStart = FoundAt + Len(FieldSearch) + 12
            FieldEnd = InStr(FieldStart, BodyHTML, "<")
            If FieldEnd > FieldStart Then
                GetHTMLData = Mid(BodyHTML, FieldStart, FieldEnd - FieldStart)
            End If
        End If
        .Quit
    End With
    Set IE = Nothing
End Function
The function above has 2 input parameters: the URL and a string that will be searched for within the HTML. It will then return a string from within the HTML, starting from 12 characters after the searched parameter and ending at the following '<' within the HTML.
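As a rough sketch of how the function could be applied to a whole column of URLs from a macro instead of a worksheet formula, something like the following might work. The sheet name "Sheet1", the URL column A, the output column B, and the search string are all assumptions for illustration; adjust them to your workbook and to the HTML you are targeting.

```vba
' Sketch only: calls GetHTMLData (defined above) for every URL in column A.
' "Sheet1", columns A/B and the search string are assumed, not from the question.
Public Sub FillFromUrls()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow   ' row 1 is assumed to hold headers
        If Len(ws.Cells(r, "A").Value) > 0 Then
            ws.Cells(r, "B").Value = _
                GetHTMLData(CStr(ws.Cells(r, "A").Value), "CLASS_NAME_OF_DATA")
        End If
    Next r
End Sub
```

Used as a worksheet formula instead, the equivalent would be =GetHTMLData(A2, "CLASS_NAME_OF_DATA") in cell B2, filled down. Each call opens and closes its own Internet Explorer instance, so a long list of URLs will be slow.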
Hope that helps.

Related

Extracting data from website to excel error

I am having difficulty setting up data extraction from a website to Excel.
I want to extract the exact price of a product into Excel.
So far I have this code:
Sub GetData()
Dim objIE As InternetExplorer 'Microsoft Internet Controls library added
Dim itemEle As Object
Dim data As String
Dim y As Integer
Set objIE = New InternetExplorer
objIE.Visible = True
objIE.navigate "https://www.nay.sk/samsung-ue55nu7172"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each itemEle In objIE.document.getElementsByClassName("price")
data = itemEle.getElementsByClassName("price")(0).innerText
y = y + 1
Next
data = Range("A1").Value
End Sub
What would you suggest?
Do you want every price?
You can list the first two for example this way:
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, i As Long, html As New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.nay.sk/samsung-ue55nu7172", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
Dim titles As Object, prices As Object
With html
.body.innerHTML = sResponse
Set titles = .querySelectorAll(".title")
Set prices = .querySelectorAll(".price")
End With
For i = 0 To 1
Debug.Print titles(i).innerText & prices(i).innerText
Next i
End Sub
That loop prints the first two titles, each followed by its price, to the Immediate window.
You actually have all the elements on the page with a price class stored in the object prices.
You can view all the prices by looping the length of that object/nodeList with:
For i = 0 To prices.Length - 1
Debug.Print Prices.item(i).innerText
Next i
Likewise, you can loop the .Length of titles, but note that it is a different length from prices: there are more elements with a price class on the page than elements with a title class.
References (VBE > Tools > References):
Microsoft HTML Object Library
Try this:
Sub GetData()
Dim objIE As New InternetExplorer 'Microsoft Internet Controls library added
Dim itemEle As Object
Dim data As String
Dim y As Integer
objIE.Visible = True
objIE.navigate "https://www.nay.sk/samsung-ue55nu7172"
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
y = 1
For Each itemEle In objIE.document.getElementsByClassName("price")
Cells(y, 1) = itemEle.outertext
y = y + 1
Next
End Sub
This writes the outerText of every element with the price class to column A, one element per row.
To find the correct property of itemEle to use:
set a breakpoint on a line inside the For Each loop
select itemEle with your mouse
press Shift+F9 to open Quick Watch

Need to split the data in the cell into different columns and help in copying data from website to excel using vba

I am copying the traffic data from this website.
I have used the following code so far:
Sub main()
Dim IE As InternetExplorer
Dim i
Set IE = New InternetExplorer
IE.Navigate "https://www.cp24.com/mobile/commuter-centre/traffic"
Do
DoEvents
Loop Until IE.ReadyState = ReadyState_Complete
Dim Doc As HTMLDocument
Set Doc = IE.Document
Dim AllRoute As String
Set holdingsClass = Doc.getElementsByClassName("trafficWidget")
ActiveSheet.Range("A1").Value = holdingsClass(0).textContent
IE.Quit
End Sub
There are two problems I am facing:
1) It copies all the data in the trafficWidget class into one cell, so data is lost when the cell runs out of space.
2) I want a way to split the data across columns; right now everything shows up in one cell.
It should look like this:
col.A col.B col.C col.D
HighwayName Current Ideal Delay
Any guidance would be appreciated.
Here you go using CSS selectors to target the information required.
Option Explicit
Sub Getinfo()
Dim http As New XMLHTTP60, html As New HTMLDocument ' XMLHTTP60 comes from the Microsoft XML, v6.0 reference; change it to match the MSXML version you have referenced
Const URL As String = "https://www.cp24.com/mobile/commuter-centre/traffic"
Application.ScreenUpdating = False
With http
.Open "GET", URL, False
.send
html.body.innerHTML = .responseText
End With
Dim routeNodeList As Object, currentNodeList As Object, idealNodeList As Object, delayNodeList As Object
With html
Set routeNodeList = .querySelectorAll(".location")
Set currentNodeList = .querySelectorAll(".current")
Set idealNodeList = .querySelectorAll(".ideal")
Set delayNodeList = .querySelectorAll(".delaymin")
End With
Dim i As Long
For i = 0 To routeNodeList.Length - 1
With ActiveSheet
.Cells(i + 2, 1) = routeNodeList.item(i).innerText
.Cells(i + 2, 2) = currentNodeList.item(i).innerText
.Cells(i + 2, 3) = idealNodeList.item(i).innerText
.Cells(i + 2, 4) = delayNodeList.item(i).innerText
End With
Next i
Application.ScreenUpdating = True
End Sub
References required (VBE > Tools > References):
Microsoft HTML Object Library and Microsoft XML (the version you have installed, e.g. v6.0)
Late bound version:
Option Explicit
Public Sub Getinfo()
Dim http As Object, html As Object, i As Long
Const URL As String = "https://www.cp24.com/mobile/commuter-centre/traffic"
Application.ScreenUpdating = False
With CreateObject("MSXML2.serverXMLHTTP")
.Open "GET", URL, False
.send
Set html = CreateObject("HTMLFile")
html.body.innerHTML = .responseText
End With
Dim counter As Long: counter = 1
With ActiveSheet
For i = 0 To html.all.Length - 1
Select Case html.all(i).className
Case "location"
counter = counter + 1
.Cells(counter, 1).Value = html.all(i).innerText
Case "current"
.Cells(counter, 2).Value = html.all(i).innerText
Case "ideal"
.Cells(counter, 3).Value = html.all(i).innerText
Case "delaymin"
.Cells(counter, 4).Value = html.all(i).innerText
End Select
Next i
End With
Application.ScreenUpdating = True
End Sub

Extracting data from a live website

I am a bit new to using VBA to extract data from a website into Excel. Could you help me extract data from a website into Excel using VBA and then make it run every hour?
I can use the code to educate myself.
Basically, I want to go to the following website
and then copy the travel times for all the roads into Excel. This would include Current, Ideal and Delay.
Would it be possible to do that?
You will need to add two references in your VBA project to run this code.
To add the required references: (VBE > Tools > References)
Microsoft HTML Object Library & Microsoft Internet Controls
After running this, you will notice that you need to split the string outputs to isolate the individual values (current, ideal, delay) that you want, and then systematically assign them to a table in your Excel sheet.
You should do some research on web scraping to fully understand what is happening. If you navigate to the page > right click on a route > Inspect element, you will see that the code below is pulling from the tag "tr". ("tr")(3) will correspond to the 3rd route detailed on the site.
Sub MainSub()
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Navigate "https://www.cp24.com/mobile/commuter-centre/traffic"
Do
DoEvents
Loop Until IE.ReadyState = ReadyState_Complete
Dim Doc As HTMLDocument
Set Doc = IE.Document
Dim FirstRoute As String
Dim SecondRoute As String
FirstRoute = Trim(Doc.getElementsByTagName("tr")(1).innerText)
SecondRoute = Trim(Doc.getElementsByTagName("tr")(2).innerText)
MsgBox FirstRoute & vbNewLine & vbNewLine & vbNewLine & SecondRoute
IE.Quit
Set IE = Nothing
End Sub
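The splitting step mentioned above could be sketched roughly as follows. This assumes that the innerText of a row separates its fields with line breaks or tabs, which has not been checked against the live page; the function name SplitRouteText and the sample text in the demo are hypothetical.

```vba
' Sketch only: splits a row's innerText into its fields.
' Assumes fields are separated by line breaks or tabs (not verified against the site).
Public Function SplitRouteText(ByVal rowText As String) As Variant
    Dim cleaned As String

    ' Normalise every separator to a single line feed
    cleaned = Replace(rowText, vbCrLf, vbLf)
    cleaned = Replace(cleaned, vbCr, vbLf)
    cleaned = Replace(cleaned, vbTab, vbLf)

    ' Collapse runs of separators so no empty fields are produced
    Do While InStr(cleaned, vbLf & vbLf) > 0
        cleaned = Replace(cleaned, vbLf & vbLf, vbLf)
    Loop

    ' Drop a leading or trailing separator
    If Left$(cleaned, 1) = vbLf Then cleaned = Mid$(cleaned, 2)
    If Right$(cleaned, 1) = vbLf Then cleaned = Left$(cleaned, Len(cleaned) - 1)

    SplitRouteText = Split(cleaned, vbLf)
End Function

' Demo with made-up sample text: writes each field to its own column in row 2.
Public Sub DemoSplitRoute()
    Dim parts As Variant, c As Long
    parts = SplitRouteText("Hwy 401" & vbCrLf & "12 min" & vbTab & "10 min" & vbLf & "2 min")
    For c = LBound(parts) To UBound(parts)
        ActiveSheet.Cells(2, c + 1).Value = Trim$(parts(c))
    Next c
End Sub
```

In MainSub, FirstRoute or SecondRoute could be passed to SplitRouteText in place of the sample string. If the page turns out to separate fields differently, only the Replace calls would need to change.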
This should do what you want!
Option Explicit
Sub Web_Table_Option_One()
Dim xml As Object
Dim html As Object
Dim objTable As Object
Dim result As String
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
With xml
.Open "GET", "https://www.cp24.com/mobile/commuter-centre/traffic", False
.send
End With
result = xml.responseText
Set html = CreateObject("htmlfile")
html.body.innerHTML = result
Set objTable = html.getElementsByTagName("Table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End Sub
This version is late bound (it uses CreateObject), so no additional references should be needed.
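For the "run every hour" part of the question, one possible sketch uses Application.OnTime to re-schedule the macro after each run. The procedure names ScheduleNextRun, RunHourly and CancelSchedule are made up for this example, and it only works while the workbook stays open in a running Excel session.

```vba
' Sketch only: re-runs Web_Table_Option_One (defined above) once an hour.
' Place in a standard module; the procedure names here are hypothetical.
Private NextRun As Date

Public Sub ScheduleNextRun()
    NextRun = Now + TimeSerial(1, 0, 0)   ' one hour from now
    Application.OnTime EarliestTime:=NextRun, Procedure:="RunHourly"
End Sub

Public Sub RunHourly()
    Web_Table_Option_One
    ScheduleNextRun                        ' queue the following run
End Sub

Public Sub CancelSchedule()
    ' OnTime raises an error if nothing is scheduled for NextRun, so ignore it
    On Error Resume Next
    Application.OnTime EarliestTime:=NextRun, Procedure:="RunHourly", Schedule:=False
    On Error GoTo 0
End Sub
```

Run ScheduleNextRun once to start the cycle and CancelSchedule to stop it, for example from the Workbook_BeforeClose event so that Excel does not reopen the workbook to run a pending call.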

Excel Amazon Seller Web Scraper Issue

I have been trying to get this code to work for workflow efficiency purposes, but I cannot seem to make it function correctly.
Steps:
1. Log in to Amazon Seller
2. Use order numbers in column A and place them in the search box to search
3. Search for the element with innerText of "Estimated Delivery:" and scrape the information into column B adjacent to the order number
4. Move on to the next order number and repeat the process until the order number column is empty.
The webpage code (what I'm trying to obtain is highlighted):
Option Explicit
Dim HTMLDoc As HTMLDocument
Dim MyBrowser As InternetExplorer
Sub MyAmazonSellereEDD()
Dim MyHTML_Element As IHTMLElement
Dim MyURL As String
Dim oSignInLink As HTMLLinkElement
Dim oInputEmail As HTMLInputElement
Dim oInputPassword As HTMLInputElement
Dim oInputSignInButton As HTMLInputButtonElement
'InputSearchOrder will be the destination for order numbers taken from the workbook
Dim InputSearchOrder As HTMLInputElement
Dim InputSearchButton As HTMLInputButtonElement
Dim IE As InternetExplorer
Dim AAOrder As Workbook
Dim AAws As Worksheet
Dim AAws2 As Worksheet
Dim R As Range
Dim x As Integer
Dim i As Long
Dim ar As Variant
Dim elems As IHTMLElementCollection
Dim TDelement As HTMLTableCell
Dim ExcludWords() As Variant, a As Range, b As Long, LR As Long
ExcludWords = Array("Estimated Delivery:")
MyURL = "https://sellercentral.amazon.com/gp/homepage.html"
Set IE = New InternetExplorer
' Open the browser and navigate.
With IE
.Silent = True
.navigate MyURL
.Visible = True
Do
DoEvents
Loop Until .readyState = READYSTATE_COMPLETE
End With
' Get the html document.
Set HTMLDoc = IE.document
With HTMLDoc
.all.Item("username").Value = "blankityblank#blank.net"
.all.Item("password").Value = "*********"
.all.Item("sign-in-button").Click
End With
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:08"))
'Set AAOrder = Application.Workbooks.Open("Z:\Automation Anywhere\5 Automated Tracking Imports\Amazon Prime\PrimeOrdersWithNoFulfillment.csv")
'Set AAws = AAOrder.Worksheets("PrimeOrdersWithNoFulfillment")
x = 2
'Do Until Range("A" & x) = ""
If Range("B" & x).Value = "" Then
'If AAws.Range("B" & x).Value = "" Then
'x = x + 1
Do Until Range("A" & x) = ""
Set InputSearchOrder = HTMLDoc.getElementById("sc-search-field")
InputSearchOrder.Value = Range("A" & x)
Set InputSearchButton = HTMLDoc.getElementsByClassName("sc-search-button")(0)
InputSearchButton.Click
Do
DoEvents
Loop Until IE.readyState = READYSTATE_COMPLETE
Application.Wait (Now + TimeValue("0:00:05"))
Set elems = HTMLDoc.getElementsByTagName("td")
'ExcludWords = Array("Package Weight:", "Tracking ID:", "Ship Date:", "Carrier:", "Shipping Service:")
i = 2
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).Value = TDelement.innerText
i = i + 1
End If
Next
LR = Range("B" & Rows.Count).End(xlUp).Row
For i = 1 To LR
Set a = Cells(i, "B")
For b = 0 To UBound(ExcludWords)
a.Formula = Replace((a.Formula), ExcludWords(b), "")
Next b
Next i
'End If
x = x + 1
Loop
'Loop
End If
Err_Clear:
If Err <> 0 Then
Err.Clear
Resume Next
End If
MsgBox ("Process is done! :)")
End Sub
My issue is that when it scrapes the data, the words "Estimated Delivery:" and the actual estimated delivery date it is supposed to scrape are separate, but still should be included in the output data in column B. What it's doing is finding and inserting only "Estimated Delivery:" and then using the code, it trims those characters as instructed. The space is still blank after that. I'm not sure what the issue is.
The TDelement you pick up in the following part of code only includes "Estimated Delivery:" in its innerText, the part with the date is actually a separate TDelement:
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).Value = TDelement.innerText
i = i + 1
End If
Next
As there is no unique information in the HTML (e.g. id, name etc.) that identifies the TDelement containing the date, you could use the reference you already have together with NextSibling, so that you get the element after the one that contains the text "Estimated Delivery:". Perhaps try this (I am unable to test anything at the moment, but it should work):
For Each TDelement In elems
If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
Range("B" & x).value = TDelement.NextSibling.innerText
i = i + 1
End If
Next

Excel VBA Macro: Scraping data from site table that spans multiple pages

Thanks in advance for the help. I'm running Windows 8.1, and I have the latest IE / Chrome browsers and the latest Excel. I'm trying to write an Excel macro that pulls data from Stack Overflow (https://stackoverflow.com/tags). Specifically, I'm trying to pull the date on which the macro is run, the tag names, the # of tags, and the brief description of what each tag is. I have it working for the first page of the table, but not for the rest (there are 1132 pages at the moment). Right now, it overwrites the data every time I run the macro, and I'm not sure how to make it look for the next empty cell before running. Lastly, I'm trying to make it run automatically once per week.
I'd much appreciate any help here. Problems are:
Pulling data from the web table beyond the first page
Making it scrape data to the next empty row rather than overwriting
Making the Macro run automatically once per week
Code (so far) is below. Thanks!
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub ImportStackOverflowData()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://stackoverflow.com/tags"
'Wait until IE is done loading page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Trying to go to StackOverflow ..."
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
'Cells.Clear
'put heading across the top of row 3
Range("A3").Value = "Date Pulled"
Range("B3").Value = "Keyword"
Range("C3").Value = "# Of Tags"
'Range("C3").Value = "Asked This Week"
Range("D3").Value = "Description"
Dim TagList As IHTMLElement
Dim Tags As IHTMLElementCollection
Dim Tag As IHTMLElement
Dim RowNumber As Long
Dim TagFields As IHTMLElementCollection
Dim TagField As IHTMLElement
Dim Keyword As String
Dim NumberOfTags As String
'Dim AskedThisWeek As String
Dim TagDescription As String
'Dim QuestionFieldLinks As IHTMLElementCollection
Dim TodaysDate As Date
Set TagList = html.getElementById("tags-browser")
Set Tags = html.getElementsByClassName("tag-cell")
RowNumber = 4
For Each Tag In Tags
'if this is the tag containing the details, process it
If Tag.className = "tag-cell" Then
'get a list of all of the parts of this question,
'and loop over them
Set TagFields = Tag.all
For Each TagField In TagFields
'if this is the keyword, store it
If TagField.className = "post-tag" Then
'store the text value
Keyword = TagField.innerText
Cells(RowNumber, 2).Value = TagField.innerText
End If
If TagField.className = "item-multiplier-count" Then
'store the integer for number of tags
NumberOfTags = TagField.innerText
'NumberOfTags = Replace(NumberOfTags, "x", "")
Cells(RowNumber, 3).Value = Trim(NumberOfTags)
End If
If TagField.className = "excerpt" Then
Description = TagField.innerText
Cells(RowNumber, 4).Value = TagField.innerText
End If
TodaysDate = Format(Now, "MM/dd/yy")
Cells(RowNumber, 1).Value = TodaysDate
Next TagField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next
Set html = Nothing
'do some final formatting
Range("A3").CurrentRegion.WrapText = False
Range("A3").CurrentRegion.EntireColumn.AutoFit
Range("A1:C1").EntireColumn.HorizontalAlignment = xlCenter
Range("A1:D1").Merge
Range("A1").Value = "StackOverflow Tag Trends"
Range("A1").Font.Bold = True
Application.StatusBar = ""
MsgBox "Done!"
End Sub
There's no need to scrape Stack Overflow when they make the underlying data available to you through things like the Data Explorer. Using this query in the Data Explorer should get you the results you need:
select t.TagName, t.Count, p.Body
from Tags t inner join Posts p
on t.ExcerptPostId = p.Id
order by t.count desc;
The permalink to that query is here and the "Download CSV" option which appears after the query runs is probably the easiest way to get the data into Excel. If you wanted to automate that part of things, the direct link to the CSV download of results is here
You can improve this to parse out exact elements, but it loops through all the pages and grabs all the tag info (everything next to a tag):
Option Explicit
Public Sub ImportStackOverflowData()
Dim ie As New InternetExplorer, html As HTMLDocument
Application.ScreenUpdating = False
With ie
.Visible = True
.navigate "https://stackoverflow.com/tags"
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Dim numPages As Long, i As Long, info As Object, item As Object, counter As Long
numPages = html.querySelector(".page-numbers.dots ~ a").innerText
For i = 1 To 2 ' numPages ''<==1 to 2 for testing; use to numPages
DoEvents
Set info = html.getElementById("tags_list")
For Each item In info.getElementsByClassName("grid-layout--cell tag-cell")
counter = counter + 1
Cells(counter, 1) = item.innerText
Next item
html.querySelector(".page-numbers.next").Click
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Next i
Application.ScreenUpdating = True
.Quit '<== Remember to quit application
End With
End Sub
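On the question's second point (appending to the next empty row rather than overwriting), a minimal sketch could look like this. The helper name NextEmptyRow is hypothetical, and it assumes column 1 is filled for every row of scraped data.

```vba
' Sketch only: returns the first unused row in a column of the given sheet.
' Assumes the chosen column has a value in every data row.
Public Function NextEmptyRow(ByVal ws As Worksheet, Optional ByVal col As Long = 1) As Long
    Dim lastRow As Long

    ' Jump up from the bottom of the sheet to the last used cell in the column
    lastRow = ws.Cells(ws.Rows.Count, col).End(xlUp).Row

    If lastRow = 1 And IsEmpty(ws.Cells(1, col).Value) Then
        NextEmptyRow = 1          ' the column is completely empty
    Else
        NextEmptyRow = lastRow + 1
    End If
End Function
```

In the macro above, setting counter = NextEmptyRow(ActiveSheet) - 1 before the For i loop would make each run continue below the existing data, since counter is incremented before each write.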
I'm not making use of the DOM, but I find it very easy to get by just searching between known tags. If the expressions you are looking for are too common, just tweak the code a bit so that it looks for a string after another string.
An example:
Public Sub ZipLookUp()
    Dim URL As String, xmlHTTP As Object, htmlResponse As String
    Dim SStr As String, EStr As String
    ' Long rather than Integer: the response can exceed 32,767 characters
    Dim StartS As Long, EndS As Long
    Dim Zip4Digit As String
    URL = "https://tools.usps.com/go/ZipLookupResultsAction!input.action?resultMode=1&companyName=&address1=1642+Harmon+Street&address2=&city=Berkeley&state=CA&urbanCode=&postalCode=&zip=94703"
    Set xmlHTTP = CreateObject("MSXML2.XMLHTTP")
    xmlHTTP.Open "GET", URL, False
    On Error GoTo NoConnect
    xmlHTTP.send
    On Error GoTo 0
    htmlResponse = xmlHTTP.ResponseText
    ' A String can never equal Null, so test for an empty response instead
    If Len(htmlResponse) = 0 Then
        MsgBox "Aborted Run - HTML response was empty"
        Application.ScreenUpdating = True
        Exit Sub
    End If
    'Searching for a string within 2 strings
    SStr = "<span class=""address1 range"">" ' first string
    EStr = "</span><br />" ' second string
    StartS = InStr(1, htmlResponse, SStr, vbTextCompare)
    If StartS = 0 Then MsgBox "Start string not found": Exit Sub
    StartS = StartS + Len(SStr)
    EndS = InStr(StartS, htmlResponse, EStr, vbTextCompare)
    If EndS = 0 Then MsgBox "End string not found": Exit Sub
    Zip4Digit = Left(Mid(htmlResponse, StartS, EndS - StartS), 4)
    MsgBox Zip4Digit
    Exit Sub
NoConnect:
    If Err.Number = -2147467259 Or Err.Number = -2146697211 Then MsgBox "Error - No Connection" 'MsgBox Err & ": " & Error(Err)
End Sub