Retrieve Website Data Excel VBA

First, I am very new to VBA and have only gotten this far by adapting solutions from other questions online. I have a macro that opens IE to a specified URL, enters text into a search box, loads the results, then repeats the search with more specific search values.
I am trying to scrape the search results into Excel. However, the results don't appear in the page's HTML source; they look to be generated by a script on the website.
An example of the page I am searching:
https://www.gamestop.com/PickUpAtStore/75083/0/917850
When loaded the results are found on the page, but not in the page source. Looking at the page source there looks to be a script that pulls the results in:
<script id="stores" type="text/x-handlebars-template">
{{#if this}}
<ul>
{{#each this}}
<li id="{{StoreNumber}}|{{#if true}}917850 {{/if}}" class="{{#if false}}checkOnly{{/if}}"">
<div class="fluidWrapper ats-storelist" id="{{StoreNumber}}">
<div class="contactInfo">
<div class="title ats-storetitle">{{DisplayName}}</div>
<div class="address ats-storeaddress">
{{{AddressStreet}}}<br />{{AddressCityStateZip}}
</div>
<div class="phoneNumber ats-storephone">
{{Phone}}
</div>
</div>
<div class="rightInfo">
<div class="distance ats-storedistance">{{Distance}} {{#if true}}<i id="showHoldOptions_{{StoreNumber}}" class="{{#if false}} plus_{{/if}}icon"></i>{{/if}}</div>
</div>
</div>
..................
Ideally, when the results are loaded, the store name, address and phone number would be put into Excel starting at A4, B4 and C4, with each additional store added on the next row.
Am I looking in entirely the wrong place to grab these results? I appreciate any help solving this.
Edit: adding my current macro:
Sub Search_Cell()
    Dim ie As Object
    Dim lRow As Long
    Dim URL As Range
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    For Each URL In Range("B1")
        ie.navigate URL.Value
        Application.StatusBar = "Submitting"
        While ie.Busy
            DoEvents
        Wend
    Next
    For lRow = 1 To 89
        With ie.document
            .all("puas_search").Value = Sheets("Zipcodes").Range("A" & lRow).Value
            .getElementById("puas_search").Focus
        End With
        Application.SendKeys ("~")
        Application.Wait Now + #12:00:02 AM# 'wait 2 seconds
        ' Get results of search
        ' Add Store name to A4, Address to B4, Phone# to C4 (but for following searches start at the next empty row)
        ' Add following results to next row
    Next lRow 'loop to next search
    ie.Quit
    Set ie = Nothing
    MsgBox "Done"
End Sub

I solved this. I was entirely wrong in thinking that the results couldn't be scraped from the HTML. Thank you @Tigregalis for nudging me in the right direction.
Here is the snippet of code that pulls the data I need, places it in the correct location in Excel, then moves to the next row.
Set HTMLDoc = IE.document
Set Stores = HTMLDoc.getElementsByClassName("contactInfo")
For Each Store In Stores
    ColNum = 1
    For Each Name In Store.Children
        Cells(RowNum, ColNum) = Name.innerText
        ColNum = ColNum + 1
    Next Name
    RowNum = RowNum + 1
Next Store
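The snippet above relies on RowNum being set elsewhere. As a minimal sketch of how it might be wrapped so that each search appends below the previous results (never above row 4, as the question asks), something like the following could work. The procedure name WriteStores and its parameters are hypothetical, and it assumes the browser runs in a document mode that supports getElementsByClassName.

```vba
' Hypothetical helper: writes each "contactInfo" block to the next free row.
' doc is the loaded IE document; ws is the target worksheet.
Sub WriteStores(ByVal doc As Object, ByVal ws As Worksheet)
    Dim stores As Object, store As Object, child As Object
    Dim rowNum As Long, colNum As Long

    ' First free row in column A, but never above row 4
    rowNum = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row + 1
    If rowNum < 4 Then rowNum = 4

    Set stores = doc.getElementsByClassName("contactInfo")
    For Each store In stores
        colNum = 1
        ' Children are the title, address and phone <div>s, in that order
        For Each child In store.Children
            ws.Cells(rowNum, colNum).Value = Trim$(child.innerText)
            colNum = colNum + 1
        Next child
        rowNum = rowNum + 1
    Next store
End Sub
```

Inside the search loop it would be called after the wait, for example as WriteStores ie.document, ActiveSheet.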

Related

Starting JavaScript query in IE from VBA

I am trying to develop a macro that automatically checks the format of a tracking number in a sheet, decides which courier site to use, and then gets the status of the shipment. So far it's going pretty well, since both TNT's and DHL's result pages have the tracking number in their address, but I got stuck with UPS, where the form submission seems to be done via JavaScript.
I managed to work around it by focusing on the input box and having my macro send the keys Tab, Tab and Enter after entering the number, but it's not very elegant and the success rate is about 70% in my testing.
I tried all the different variations of getElementsBy..., but nothing submitted the form successfully.
Is there a way to call the JavaScript to start the query, or another way of submitting the form itself?
I have added an edited chunk of code for testing, as well as the HTML for what seems to be the button and the JavaScript:
Sub ups()
    Dim objIE As Object
    Dim Website As String
    Dim Element As Object
    Dim trackNum As String
    trackNum = "1Z30RY119130505965"
    Website = "https://www.ups.com/tracking/tracking.html"
    Set objIE = CreateObject("InternetExplorer.Application")
    With objIE
        .Visible = True
        .navigate Website
        Do While objIE.Busy Or objIE.ReadyState <> 4
            DoEvents
        Loop
        Set Element = .Document.GetElementsByName("trackNums")
        'a part of code I found useful for when previous Do
        'While fails and events start executing too early
        Dim btimer As Integer
        btimer = 0
        Do While Element.Item(3).Value = 0
            Application.Wait (Now + TimeValue("0:00:01"))
            btimer = btimer + 1
            If btimer = 4 Then
                Exit Do
            End If
        Loop
        Element.Item(3).Focus
        Element.Item(3).Value = trackNum
        SendKeys "{TAB}"
        SendKeys "{TAB}"
        SendKeys "~" 'Enter ("{~}" would send a literal tilde)
    End With
End Sub
<div class="btnBar">
<input name="track.x" class="ups-cta ups-cta_primary" type="submit" value="Track"></input>
</div>
<script language="JavaScript1.2" type="text/javascript">
$(function)
{
Tracking.setLocaleString('en_KR');
Tracking.setLoadingText('Loading. Please wait a moment.');
Tracking.setErrorMessage3('The systen is unable to process your request at this time. Please try again later.');
Tracking.addInatructionTextForyinputPage("Enter up to 25 tracking or InfoNotice numbers, one per line.", 'y');
Thanks for any ideas in advance.
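One thing that might be worth trying instead of SendKeys is clicking the submit input directly through the DOM. The sketch below is untested against the UPS page; it only assumes what the question shows: an input named "track.x" of type submit, and the tracking field being item 3 of the "trackNums" collection. The procedure name SubmitTrackingForm is hypothetical, and whether the page's script reacts to a programmatic click is an open question.

```vba
' Hypothetical helper: fills the tracking field and clicks the Track button.
' doc is the loaded IE document (objIE.Document in the question's code).
Sub SubmitTrackingForm(ByVal doc As Object, ByVal trackingNumber As String)
    Dim inputs As Object, buttons As Object

    Set inputs = doc.getElementsByName("trackNums")
    Set buttons = doc.getElementsByName("track.x")

    ' Bail out quietly if the page does not look as expected
    If inputs.Length <= 3 Or buttons.Length = 0 Then Exit Sub

    inputs.Item(3).Value = trackingNumber   ' same index the question uses
    buttons.Item(0).Click                   ' DOM click instead of Tab, Tab, Enter
End Sub
```

Because the click goes through the document rather than the keyboard, it does not depend on which window has focus, which is a common reason for SendKeys being unreliable.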

Grab third element in html table row VBA

I am working on a personal project where I am getting some information from a website.
The website has a table. I know how to loop through each row. For each row, I want to compare the contents of the first and fourth columns with information from my own table and, where both match, copy the content of the fifth column in that row into my table, but I do not know how to go about doing this.
This is what each row in the table html looks like.
<tr class="player_tr_1" data-url="/18/player/1/Pelé">
<td class="table-row-text" style="text-align: left;">
<img style="padding: 0;" class="player_img player_right_curve
form rating icon gold rare" src="./FIFA 18 Players _
FUTBIN_files/237067.png">
//Check info here
<a href="https://www.futbin.com/18/player/1/Edson%20Arantes"
class="player_name_players_table"> CHECK THIS TEXT </a>
</td> <td><span class="form rating icon gold rare">98</span> </td>
<td class="">CAM</td>
//Check info here
<td class="">CHECK THIS INFO</td>
//Grab info here
<td><span class="ps4_color"> GRAB THIS INFO </span></td>
<td><span class="xb1_color">0</span></td>
<td><span class="pc_color">0</span></td>
<td><span class="yellow_players_stat">76</span></td>
<td>173cm | 5'8"</td>
<td>77</td>
<td>516</td>
<td>2513</td>
</tr>
Here is the code I have so far:
It iterates through each row in the table and in that loop I need to grab info from that row. I commented in the loop for what I need to do specifically
Private Sub CommandButton1_Click()
SearchBot
End Sub
'start a new subroutine called SearchBot '
Sub SearchBot()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLHtmlElement
Dim X As Integer 'integer variable we'll use as a counter
Dim version As String 'string variable that will hold our version '
Dim NumRows As Integer
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
' Set numrows = number of rows of data.
NumRows = Range("A2", Range("A2").End(xlDown)).Rows.Count + 1
' Select cell A2.
Range("A2").Select
' Establish "For" loop to loop "numrows" number of times.
' screen is updated every time a new price is inserted '
Application.ScreenUpdating = True
For X = 2 To NumRows
' go to player page with correct name '
objIE.navigate "https://www.futbin.com/18/players?page=1&search=" & Sheets("Sheet1").Range("A" & X).Value
' wait '
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
' loop through each row '
For Each aEle In objIE.document.getElementsByTagName("tr")
' check player name, if correct loop through table to grab price if it is also the right version maybe or'
' or grab first elements text content, check if correct name, check 4th element for correct version, grab 5th element, and insert in table '
'if 1st element = Sheets("Sheet1").Range("A" & X).Value & "" '
' And 4th element = Sheets("Sheet1").Range("B" & X).Value Then '
'Sheets("Sheet1").Range("F" & X).Value = 5th element '
Next
Next
' click the correct version of the player '
'close the browser
objIE.Quit
MsgBox ("Done")
'exit our SearchBot subroutine '
End Sub
EDIT: As you can see in the html, the elements I'm trying to grab do not have class names.
Thanks in advance!
Only one of the elements you need has no ID or class to identify it, so you'll have to get that one by index while you get the rest by class name.
'Strings are assigned without Set; DOM collections are zero-based
Dim FirstColumn As String
FirstColumn = aEle.getElementsByClassName("player_name_players_table")(0).innerText
Dim FourthColumn As String
FourthColumn = aEle.getElementsByTagName("td")(3).innerText 'fourth <td> in the row
Dim FifthColumn As String
FifthColumn = aEle.getElementsByClassName("ps4_color")(0).innerText
Note that I'm not very used to VBA (I'm a VB.NET guy), so the syntax may need adjusting. The collections returned by these DOM methods are zero-based. :)
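To show how those three lookups might fit into the question's loop, here is a minimal sketch of a helper that returns the fifth-column text only when the first and fourth columns match. The function name PriceIfMatch and its parameters are hypothetical. It also skips rows (such as a header row) that lack the expected elements, which would otherwise raise an error.

```vba
' Hypothetical helper: returns the "ps4_color" text of one <tr> when the
' player name and the fourth column both match, otherwise "".
Function PriceIfMatch(ByVal rowEle As Object, ByVal playerName As String, _
                      ByVal versionText As String) As String
    Dim nameLinks As Object, cells As Object, prices As Object

    Set nameLinks = rowEle.getElementsByClassName("player_name_players_table")
    Set cells = rowEle.getElementsByTagName("td")
    Set prices = rowEle.getElementsByClassName("ps4_color")

    ' Skip header or malformed rows
    If nameLinks.Length = 0 Or cells.Length < 5 Or prices.Length = 0 Then Exit Function

    ' Collections are zero-based: item 0 is the first match, cell 3 is the fourth <td>
    If StrComp(Trim$(nameLinks.Item(0).innerText), playerName, vbTextCompare) <> 0 Then Exit Function
    If StrComp(Trim$(cells.Item(3).innerText), versionText, vbTextCompare) <> 0 Then Exit Function

    PriceIfMatch = Trim$(prices.Item(0).innerText)
End Function
```

Inside the For Each aEle loop, the result could be written to column F whenever it is not empty, using the values in columns A and B of row X as the two comparison strings.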

I want to extract the element webpage title on vba excel

I am trying to use VBA to extract the title attribute, as text, from this element on a website:
<div class="train-logo
train-logo-monochrome
train-logo--trenitalia-monochrome
train-logo--frecciarossa-monochrome" title="Trenitalia">
</div>
VBA:
For Each treno In IE.Document.getelementsbytagname("div")
If treno.getattribute("title") Then
Cells(rownumber2, "u").Value = treno.innertext
Something like this should work:
Dim treno, t
For Each treno In IE.Document.getelementsbytagname("div")
t = treno.getattribute("title")
If t <> "" Then Cells(rownumber2, "u").Value = t
Next treno
It's not clear from your question how many of these you need to read from the page: if just one you can Exit For after getting the value.
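If only the first titled div is wanted, the Exit For variant could be sketched as below. The procedure name WriteFirstTitle and its parameters are hypothetical; doc stands for IE.Document and rowNumber for rownumber2 in the question. The IsNull check is there because getAttribute can return Null when the attribute is absent.

```vba
' Hypothetical helper: writes the first non-empty title attribute found
' on a <div> into column U of the given row, then stops looking.
Sub WriteFirstTitle(ByVal doc As Object, ByVal ws As Worksheet, ByVal rowNumber As Long)
    Dim treno As Object
    Dim t As Variant

    For Each treno In doc.getElementsByTagName("div")
        t = treno.getAttribute("title")
        If Not IsNull(t) Then
            If Len(t) > 0 Then
                ws.Cells(rowNumber, "u").Value = t
                Exit For    ' only one value is needed
            End If
        End If
    Next treno
End Sub
```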

Extracting a specific varying element from website source code

I'm trying to extract a specific link from a website and I'm having trouble pulling into a String.
I have to search about 5000 companies from a website and all of the links vary. A link to the source code of an example company (Nokia) is this: view-source:http://finder.fi/yrityshaku/Nokia+oyj this is the part I'm looking at:
<div class="itemName">
<!-- Yritysnimi -->
<!-- Aukeaa aina yhteystiedot-välilehdelle -->
<a href="/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159838" class="resultGray">
I want to extract the Substring between
<!-- Yritysnimi -->
<!-- Aukeaa aina yhteystiedot-välilehdelle -->
<a href="
and
" class="resultGray">
this substring will vary with each company I search and so I will only know what the strings are around the substring I'm trying to extract.
I've tried to use browserIE.Document.body.innerHTML
Sub Macro1()
Set browserIE = CreateObject("InternetExplorer.Application")
browserIE.Top = 0
browserIE.Left = 800
browserIE.Width = 800
browserIE.Height = 1200
browserIE.Visible = True
Set ws = ThisWorkbook.Worksheets("Sheet1")
browserIE.Navigate ("http://www.finder.fi/yrityshaku")
Do
DoEvents
Loop Until browserIE.ReadyState = 4
browserIE.Document.getElementById("companysearchform_query_companySearchTypename").Click
browserIE.Document.getElementById("SearchInput").Value = "nokia oyj"
browserIE.Document.getElementById("SearchSubmit").Click
Application.Wait (Now + TimeValue("0:00:4"))
codeArea = Mid(V, InStr(V, "<div class=""itemName""> <!-- Yritysnimi --> <!-- Aukeaa aina yhteystiedot-välilehdelle --> <a href="""), Len(V))
Debug.Print codeArea
theLink = Mid(codeArea, 117, InStr(codeArea, """ class=""resultGray"">" - 1))
End Sub
but I get an invalid procedure call or argument
I've researched some but I haven't found a suitable solution yet. Some have suggested pulling just an element from the source code and others copying the whole source code into a string variable. As a person who's not too expert in vba I'd prefer pulling the whole code into a string as I think this way would be easier to understand.
Original website (in finnish) http://finder.fi/yrityshaku/nokia+oyj
You need to locate all of the <div> elements with a classname of itemName. Loop through those to find the <a> element(s) and use the first one to get the href property.
Sub Macro1()
    Dim browserIE As Object, ws As Worksheet
    Set browserIE = CreateObject("InternetExplorer.Application")
    browserIE.Top = 0
    browserIE.Left = 800
    browserIE.Width = 800
    browserIE.Height = 1200
    browserIE.Visible = True
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    browserIE.Navigate ("http://www.finder.fi/yrityshaku")
    'wait while the page is still loading OR the browser is busy
    Do While browserIE.ReadyState <> 4 Or browserIE.Busy: DoEvents: Loop
    browserIE.Document.getElementById("companysearchform_query_companySearchTypename").Click
    browserIE.Document.getElementById("SearchInput").Value = "nokia oyj"
    browserIE.Document.getElementById("SearchSubmit").Click
    Do While browserIE.ReadyState <> 4 Or browserIE.Busy: DoEvents: Loop
    'Application.Wait (Now + TimeValue("0:00:4"))
    Dim iDIV As Long
    With browserIE.Document.body
        If CBool(.getelementsbyclassname("itemName").Length) Then
            'there is at least one div with the itemName class
            For iDIV = 0 To .getelementsbyclassname("itemName").Length - 1
                With .getelementsbyclassname("itemName")(iDIV)
                    If CBool(.getelementsbytagname("a").Length) Then
                        'there is at least one anchor element inside this div
                        Debug.Print .getelementsbytagname("a")(0).href
                    End If
                End With
            Next iDIV
        End If
    End With
End Sub
I added Microsoft HTML Object library and Microsoft Internet controls to the project via the VBE's Tools ► References.
Results from the Immediate window.
http://www.finder.fi/Televiestint%C3%A4laitteita+ja+palveluja/Nokia+Oyj/ESPOO/yhteystiedot/159843
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/SALO/yhteystiedot/960395
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/TAMPERE/yhteystiedot/853264
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/ESPOO/yhteystiedot/2931747
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/ESPOO/yhteystiedot/2931748
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/TAMPERE/yhteystiedot/835172
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159838
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/SALO/yhteystiedot/159839
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159850
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159857

Excel VBA HTML retrieve

I have a fully functional macro that goes through a list of personnel records and works out whether each person is a leaver or never started. The only issue is that it's a very slow process when searching the string built from all the HTML code (~10000 characters).
I was wondering if there is a way to restrict the retrieval to just part of the webpage.
The macro I am currently using is below. It iterates through each row and pulls in the code from the URL of each person's personnel page.
Sub RetrieveEndDate()
Dim myArray() As Variant, Search As Variant
Dim strURL As String, strCSV As String, dbClose As String
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Call LogOn
RowsWithData = Application.CountA(Range("A:A"))
For R = 2 To RowsWithData
Application.StatusBar = R & " Out of " & RowsWithData
UKNo = Cells(R, 1).Value
strURL = "http://www.pers.fs.com/People_Detail.asp?Pers_no=" & UKNo & "&mode=CURRENT"
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", strURL, False
http.Send
strCSV = http.responseText
Cells(R, 3).Value = strCSV
'Works out if employee has left or never started; if neither, leaves the cell blank
If InStr(1, strCSV, "Employee has Left") > 0 Then
Cells(R, 2).Value = "Left"
ElseIf InStr(1, strCSV, "Non-Starter") > 0 Then
Cells(R, 2).Value = "Did not start"
Else
Cells(R, 2).Value = ""
End If
Set http = Nothing
Next R
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
The text retrieved from the webpage is ~10000 characters long, but the info I am interested in is near the beginning of the page, as shown below: "(Employee has Left)", which is on the third line from the bottom of this excerpt.
<head>
<title> List</title>
<link rel="stylesheet" href="_stylesheets/atc.css" type="text/css">
</head>
<body CLASS="Skill" >
<form name="People_Detail" method="Post" action=History_list.asp>
<P><INPUT id="Pers_No" type = "HIDDEN" name="Pers_No" value=UK111111 ></P>
<P><INPUT id="mode" type = "HIDDEN" name="mode"Value="HISTORY_LIST"></P>
<Table Border = 0 CellPadding = 0 width = 100% >
<TR><TR><TD Colspan = 2 ><H1 id=Test name=test>Current Active Record<BR>(Employee has Left)</H1><TD align = right>
<P><INPUT id="btnSubmit" name="btnSubmit" type="SUBMIT" value="View Record History List"></P>
</TD></TD></TR></TR>
AFAIK there is no way to do this with XMLHTTP.
This KB article contains code that performs a download using the WinInet API.
The While bDoLoop loop reads the URL in Len(sReadBuffer) chunks, you can modify this to add a condition and exit the loop whenever you like.
If you wanted to begin the download at a specific offset (and the server supports it) you could also try InternetSetFilePointer.
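Separately from WinInet, and since the question says the text of interest sits near the top of the page, one low-effort thing to try is searching only the first part of the response. The sketch below does not reduce the download itself, only the amount of text scanned. The function name LeaverStatus and the default cut-off of 2000 characters are assumptions, not values taken from the question.

```vba
' Hypothetical helper: classifies a personnel page by scanning only its head.
' headLength is an assumed cut-off; adjust it to where the <H1> actually ends.
Function LeaverStatus(ByVal html As String, Optional ByVal headLength As Long = 2000) As String
    Dim head As String
    head = Left$(html, headLength)   ' Left$ copes with strings shorter than headLength

    If InStr(1, head, "Employee has Left", vbBinaryCompare) > 0 Then
        LeaverStatus = "Left"
    ElseIf InStr(1, head, "Non-Starter", vbBinaryCompare) > 0 Then
        LeaverStatus = "Did not start"
    Else
        LeaverStatus = ""
    End If
End Function
```

In the question's loop this would replace the If block, for example Cells(R, 2).Value = LeaverStatus(strCSV). Dropping the line that writes the whole response into column C may also save time, though that has not been measured here.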
I had a similar problem. The response text from a certain website was so big that it was taking my macro forever to search through it. The solution I came up with is as follows. First I used the SPLIT function on the response text.
arr_1 = Split(my_var, "zc-st-a", -1, vbTextCompare)
You didn't provide enough of the source code for me to be specific, but there is usually some tag you can split on that breaks the response text down into array elements with data you want and those elements without useful information. Next use the FILTER function to filter out the useless elements in arr_1
arr_2 = Filter(arr_1, "zc-pg-y", True, vbTextCompare)
Finally, you can combine the useful elements that are present in arr_2 using the JOIN function.
my_var = Join(arr_2, " ")
In my case, using this method to make the response text smaller reduced my macro's run time from 1 hour 15 minutes to 15 minutes. Hope this helps.
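The three steps above can be wrapped into one small function. This is only a sketch: the name ShrinkResponse and its parameter names are hypothetical, and the delimiter and marker strings depend entirely on the page being parsed.

```vba
' Hypothetical helper: keeps only the pieces of responseText that contain keepMarker.
Function ShrinkResponse(ByVal responseText As String, ByVal delimiter As String, _
                        ByVal keepMarker As String) As String
    Dim parts() As String, kept() As String

    If Len(responseText) = 0 Then Exit Function   ' nothing to do, returns ""

    ' 1. break the text into pieces at every occurrence of the delimiter
    parts = Split(responseText, delimiter, -1, vbTextCompare)
    ' 2. keep only the pieces that contain the marker
    kept = Filter(parts, keepMarker, True, vbTextCompare)
    ' 3. glue the surviving pieces back together
    ShrinkResponse = Join(kept, " ")
End Function
```

For example, Debug.Print ShrinkResponse("aXbKEEPcXd", "X", "KEEP") prints bKEEPc, because only the middle piece contains the marker.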