Grab third element in html table row VBA - vba

I am working on a personal project where I am getting some information from a website.
The website has a table. I know how to loop through each row. For each row, I want to compare the contents of the first and fourth columns with information from my table and, where both match, copy the content of the fifth column in that row into my table, but I do not know how to go about doing this.
This is what each row in the table html looks like.
<tr class="player_tr_1" data-url="/18/player/1/Pelé">
<td class="table-row-text" style="text-align: left;">
<img style="padding: 0;" class="player_img player_right_curve
form rating icon gold rare" src="./FIFA 18 Players _
FUTBIN_files/237067.png">
//Check info here
<a href="https://www.futbin.com/18/player/1/Edson%20Arantes"
class="player_name_players_table"> CHECK THIS TEXT </a>
</td> <td><span class="form rating icon gold rare">98</span> </td>
<td class="">CAM</td>
//Check info here
<td class="">CHECK THIS INFO</td>
//Grab info here
<td><span class="ps4_color"> GRAB THIS INFO </span></td>
<td><span class="xb1_color">0</span></td>
<td><span class="pc_color">0</span></td>
<td><span class="yellow_players_stat">76</span></td>
<td>173cm | 5'8"</td>
<td>77</td>
<td>516</td>
<td>2513</td>
</tr>
Here is the code I have so far:
It iterates through each row in the table and in that loop I need to grab info from that row. I commented in the loop for what I need to do specifically
Private Sub CommandButton1_Click()
SearchBot
End Sub
'start a new subroutine called SearchBot '
Sub SearchBot()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLHtmlElement
Dim X As Integer 'integer variable we'll use as a counter
Dim version As String 'string variable that will hold our version '
Dim NumRows As Integer
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
' Set numrows = number of rows of data.
NumRows = Range("A2", Range("A2").End(xlDown)).Rows.Count + 1
' Select cell A2.
Range("A2").Select
' Establish "For" loop to loop "numrows" number of times.
' screen is updated every time a new price is inserted '
Application.ScreenUpdating = True
For X = 2 To NumRows
' go to player page with correct name '
objIE.navigate "https://www.futbin.com/18/players?page=1&search=" & Sheets("Sheet1").Range("A" & X).Value
' wait '
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
' loop through each row '
For Each aEle In objIE.document.getElementsByTagName("tr")
' check player name, if correct loop through table to grab price if it is also the right version maybe or'
' or grab first elements text content, check if correct name, check 4th element for correct version, grab 5th element, and insert in table '
'if 1st element = Sheets("Sheet1").Range("A" & X).Value & "" '
' And 4th element = Sheets("Sheet1").Range("B" & X).Value Then '
'Sheets("Sheet1").Range("F" & X).Value = 5th element '
Next
Next
' click the correct version of the player '
'close the browser
objIE.Quit
MsgBox ("Done")
'exit our SearchBot subroutine '
End Sub
EDIT: As you can see in the html, the elements I'm trying to grab do not have class names.
Thanks in advance!

Only one of the elements has no ID or class that you can use to identify it, so you'll have to get that one by index and the rest by class name. The collections these methods return are zero-based, and innerText is a plain string, so it is assigned without Set.
Dim FirstColumn As String
FirstColumn = aEle.getElementsByClassName("player_name_players_table")(0).innerText
Dim FourthColumn As String
FourthColumn = aEle.getElementsByTagName("td")(3).innerText
Dim FifthColumn As String
FifthColumn = aEle.getElementsByClassName("ps4_color")(0).innerText
Note that I'm not very used to VBA (I'm a VB.NET guy), so double-check the syntax. :)
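One way to fit this into the question's loop is sketched below. It reads every cell by its zero-based position among the row's td elements, so it does not depend on class names at all. FindPrice, wantedName and wantedVersion are hypothetical names, and the exact-match comparison assumes the page text has no extra characters beyond leading or trailing spaces.

```vba
' Sketch: return the 5th-column text of the first row whose
' 1st and 4th columns match. Returns "" when no row matches.
' FindPrice and its parameter names are made up for this example.
Function FindPrice(doc As Object, wantedName As String, wantedVersion As String) As String
    Dim row As Object
    For Each row In doc.getElementsByTagName("tr")
        With row.getElementsByTagName("td")
            ' Skip header or spacer rows that have fewer than five cells.
            If .Length >= 5 Then
                ' Collections are zero-based: 0 = 1st column, 3 = 4th, 4 = 5th.
                If Trim$(.Item(0).innerText) = wantedName _
                   And Trim$(.Item(3).innerText) = wantedVersion Then
                    FindPrice = Trim$(.Item(4).innerText)
                    Exit Function
                End If
            End If
        End With
    Next row
End Function
```

Inside the question's For X loop, after the page has loaded, this could be called as Sheets("Sheet1").Range("F" & X).Value = FindPrice(objIE.document, Sheets("Sheet1").Range("A" & X).Value, Sheets("Sheet1").Range("B" & X).Value).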

Related

scraping webpage data modification

I have come up with some code that copies a webpage's text into my Excel sheet, but a few modifications are required.
The code needs to loop over the input in Excel (the Input sheet in the attachment) and change the URL each time. I noticed that only the last word of the URL needs to change; it should be taken from column 1 of the Excel file, row by row, until a blank cell is found.
The loop itself works correctly, but there is no loop for pasting the data, hence it dumps all the looped data into one cell.
My basic requirement for this macro is to access the link from column A and paste its data into column B.
Sub Trial()
Dim IE As Object
Dim URL As Range
For Each URL In Range("A1:A3").Cells
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate "1ox11is" & URL
Do Until .readyState = 4: DoEvents: Loop
'Range("B1").Value = .document.body.innerText
'wsSheet.Range("B" & Rows).Value = .document.body.innerText
Sheets("Sheet1").Range("B1").Value = .document.body.innerText
.Quit
End With
Next
End Sub
Assuming that links are in cells A1,A2,A3 etc. and data from websites is supposed to appear next to them in cells B1,B2,B3 etc, change:
Sheets("Sheet1").Range("B1").Value = .document.body.innerText
to:
Sheets("Sheet1").Range("B" & URL.Row).Value = .document.body.innerText

Retrieve Website Data Excel VBA

First, I am very new with VBA and have only got as far as I am with solutions from other questions asked online. What I have is a macro that opens IE to a specified URL, enters text into a search, loads the results, then loops the search through more specific search values.
What I am trying to do is scrape the results of a search into excel. However, the results don't appear in the resulting HTML code but look to be generated by a script on the website.
An example of the page I am searching:
https://www.gamestop.com/PickUpAtStore/75083/0/917850
When loaded the results are found on the page, but not in the page source. Looking at the page source there looks to be a script that pulls the results in:
<script id="stores" type="text/x-handlebars-template">
{{#if this}}
<ul>
{{#each this}}
<li id="{{StoreNumber}}|{{#if true}}917850 {{/if}}" class="{{#if false}}checkOnly{{/if}}"">
<div class="fluidWrapper ats-storelist" id="{{StoreNumber}}">
<div class="contactInfo">
<div class="title ats-storetitle">{{DisplayName}}</div>
<div class="address ats-storeaddress">
{{{AddressStreet}}}<br />{{AddressCityStateZip}}
</div>
<div class="phoneNumber ats-storephone">
{{Phone}}
</div>
</div>
<div class="rightInfo">
<div class="distance ats-storedistance">{{Distance}} {{#if true}}<i id="showHoldOptions_{{StoreNumber}}" class="{{#if false}} plus_{{/if}}icon"></i>{{/if}}</div>
</div>
</div>
..................
Ideally, what I would like to happen is when the results are loaded the store name, address and phone # are put into excel starting at A4, B4, C4 and adding each store to the next line.
Am I looking in the entirely wrong place to grab these results? I appreciate any help solving this.
edit adding current macro:
Sub Search_Cell()
Dim ie As Object
Dim lRow As Long
Dim URL As Range
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
For Each URL In Range("B1")
ie.navigate URL.Value
Application.StatusBar = "Submitting"
While ie.Busy
DoEvents
Wend
Next
For lRow = 1 To 89
With ie.document
.all("puas_search").Value = Sheets("Zipcodes").Range("A" & lRow).Value
.getElementById("puas_search").Focus
End With
Application.SendKeys ("~")
Application.Wait Now + #12:00:02 AM# 'wait 2 seconds
' Get results of search
' Add Store name to A4, Address to B4, Phone# to C4 (but for following searches start at the next empty row)
' Add following results to next row
Next lRow 'loop to next search
ie.Quit
Set ie = Nothing
MsgBox "Done"
End Sub
I solved this. I was entirely wrong in thinking that the results couldn't be scraped from the HTML. Thank you @Tigregalis for nudging me in the right direction.
Here is the snippet of code that pulls the data I need, places it in the correct location in Excel, then moves to the next row.
Set HTMLDoc = IE.document
Set Stores = HTMLDoc.getElementsByClassName("contactInfo")
For Each Store In Stores
ColNum = 1
For Each Name In Store.Children
Cells(RowNum, ColNum) = Name.innerText
ColNum = ColNum + 1
Next Name
RowNum = RowNum + 1
Next Store

Extracting a specific varying element from website source code

I'm trying to extract a specific link from a website and I'm having trouble pulling into a String.
I have to search about 5000 companies from a website and all of the links vary. A link to the source code of an example company (Nokia) is this: view-source:http://finder.fi/yrityshaku/Nokia+oyj this is the part I'm looking at:
<div class="itemName">
<!-- Yritysnimi -->
<!-- Aukeaa aina yhteystiedot-välilehdelle -->
<a href="/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159838" class="resultGray">
I want to extract the Substring between
<!-- Yritysnimi -->
<!-- Aukeaa aina yhteystiedot-välilehdelle -->
<a href="
and
" class="resultGray">
this substring will vary with each company I search and so I will only know what the strings are around the substring I'm trying to extract.
I've tried to use browserIE.Document.body.innerHTML
Sub Macro1()
Set browserIE = CreateObject("InternetExplorer.Application")
browserIE.Top = 0
browserIE.Left = 800
browserIE.Width = 800
browserIE.Height = 1200
browserIE.Visible = True
Set ws = ThisWorkbook.Worksheets("Sheet1")
browserIE.Navigate ("http://www.finder.fi/yrityshaku")
Do
DoEvents
Loop Until browserIE.ReadyState = 4
browserIE.Document.getElementById("companysearchform_query_companySearchTypename").Click
browserIE.Document.getElementById("SearchInput").Value = "nokia oyj"
browserIE.Document.getElementById("SearchSubmit").Click
Application.Wait (Now + TimeValue("0:00:4"))
codeArea = Mid(V, InStr(V, "<div class=""itemName""> <!-- Yritysnimi --> <!-- Aukeaa aina yhteystiedot-välilehdelle --> <a href="""), Len(V))
Debug.Print codeArea
theLink = Mid(codeArea, 117, InStr(codeArea, """ class=""resultGray"">" - 1))
End Sub
but I get an invalid procedure call or argument
I've researched some but I haven't found a suitable solution yet. Some have suggested pulling just an element from the source code and others copying the whole source code into a string variable. As a person who's not too expert in vba I'd prefer pulling the whole code into a string as I think this way would be easier to understand.
Original website (in finnish) http://finder.fi/yrityshaku/nokia+oyj
You need to locate all of the <div> elements with a classname of itemName. Loop through those to find the <a> element(s) and use the first one to get the href property.
Sub Macro1()
Dim browserIE As Object, ws As Worksheet
Set browserIE = CreateObject("InternetExplorer.Application")
browserIE.Top = 0
browserIE.Left = 800
browserIE.Width = 800
browserIE.Height = 1200
browserIE.Visible = True
Set ws = ThisWorkbook.Worksheets("Sheet1")
browserIE.Navigate ("http://www.finder.fi/yrityshaku")
'keep waiting while the page is still loading OR the browser is still busy
Do While browserIE.ReadyState <> 4 Or browserIE.Busy: DoEvents: Loop
browserIE.Document.getElementById("companysearchform_query_companySearchTypename").Click
browserIE.Document.getElementById("SearchInput").Value = "nokia oyj"
browserIE.Document.getElementById("SearchSubmit").Click
Do While browserIE.ReadyState <> 4 Or browserIE.Busy: DoEvents: Loop
'Application.Wait (Now + TimeValue("0:00:4"))
Dim iDIV As Long
With browserIE.Document.body
If CBool(.getelementsbyclassname("itemName").Length) Then
'there is at least one div with the itemName class
For iDIV = 0 To .getelementsbyclassname("itemName").Length - 1
With .getelementsbyclassname("itemName")(iDIV)
If CBool(.getelementsbytagname("a").Length) Then
'there is at least one anchor element inside this div
Debug.Print .getelementsbytagname("a")(0).href
End If
End With
Next iDIV
End If
End With
End Sub
I added Microsoft HTML Object library and Microsoft Internet controls to the project via the VBE's Tools ► References.
Results from the Immediate window.
http://www.finder.fi/Televiestint%C3%A4laitteita+ja+palveluja/Nokia+Oyj/ESPOO/yhteystiedot/159843
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/SALO/yhteystiedot/960395
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/TAMPERE/yhteystiedot/853264
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/ESPOO/yhteystiedot/2931747
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/ESPOO/yhteystiedot/2931748
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia/TAMPERE/yhteystiedot/835172
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159838
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/SALO/yhteystiedot/159839
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159850
http://www.finder.fi/Tietoliikennepalveluja%2C+tietoliikennelaitteita/Nokia+Oyj/TAMPERE/yhteystiedot/159857
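If you would rather stay with the string approach from the question, the text between two known markers can be pulled out with InStr and Mid. This is only a sketch: TextBetween is a hypothetical helper, and it assumes the markers appear in the HTML string exactly as you pass them in (browsers often rewrite whitespace and attribute order in innerHTML, which is one reason the DOM approach above tends to be more robust).

```vba
' Sketch: return the text found between startMarker and endMarker,
' or "" when either marker is missing. TextBetween is a made-up name.
Function TextBetween(html As String, startMarker As String, endMarker As String) As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, html, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function          ' start marker not found

    ' Move past the start marker so the result begins right after it.
    startPos = startPos + Len(startMarker)

    endPos = InStr(startPos, html, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function            ' end marker not found

    TextBetween = Mid$(html, startPos, endPos - startPos)
End Function
```

Checking each InStr result for 0 before calling Mid matters here, because passing a start position of 0 to Mid raises "Invalid procedure call or argument".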

Run-time error '91' (Object variable or With block variable not set)

I am relatively new to VBA and am trying to put together a msgbox that will give me a specific number from a web scrape, however I keep running into a run-time error '91' and I simply cannot figure out how to fix this. I have searched countless stackoverflow questions, youtube videos and generic google searches, however have not been successful in finding out the error on my own.
Here is the code:
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate ("http://brokercheck.finra.org")
Do
DoEvents
Loop Until IE.ReadyState = 4
'Enter values from the corresponding sheet
'Set some generic typing for ease
Set doc = IE.document
doc.GetElementbyID("GenericSearch_IndividualSearchText").Value = Worksheets("Master").Range("D203")
doc.GetElementbyID("GenericSearch_EmploymingFirmSearchText").Value = Worksheets("Master").Range("C203")
Set elements = doc.getElementsByTagName("button")
For Each element In elements
If element.getAttribute("type") = "submit" Then
element.Click
Exit For
End If
Next element
Do
DoEvents
Loop Until IE.ReadyState = 4
'find CRD#
Set crd = doc.getElementsByClassName("summarydisplaycrd")(0).innerText 'here is where the run time error occurs
MsgBox crd
and the HTML I am trying to get the information from:
<div class="searchresulttext">
<div class="bcrow">
<div class=""> <span class="summarydisplayname">[redacted]</span> <span class="summarydisplaycrd text-nowrap">(CRD# 5944070)</span></div>
I'm reviewing this code and the finra.org site, and have the following observations, which when addressed, should resolve the problem.
The HTML example you provided is simply incorrect, based on the actual HTML that is returned from the "Check" button.
The actual HTML returned looks like this, and the classname is "displaycrd", not "summarydisplaycrd":
<div class="SearchResultItemColor bcrow">
<div class="searchresulttext">
<div class="bcsearchresultfirstcol">
<span class="displayname">[redacted]</span> <span class="displaycrd">(CRD# 123456789)</span>
Your code exits the For Each element loop upon finding the first "submit" button. This may not be the "Check" button (although I can get results either way), so you may want to add more logic to the code to ensure that the "Check " button is the one being clicked.
UPDATE
On further review, while I can replicate the run-time error 91, I still don't know why your class name appears different from mine (maybe an IE11 thing, dunno...). In any case, I'm able to resolve the error by forcing a longer delay, as in this case the DoEvents loop is simply not adequate. This is sometimes the case when data is served dynamically from external functions: the browser is at ReadyState=4 while .Busy=True, so a loop that checks only ReadyState doesn't wait at all.
I use the WinAPI Sleep function to force a 1 second delay after the "Check " button is clicked, looping until ReadyState = 4 and .Busy = False.
NOTE you will need to modify the classname parameter depending on how it is appearing on your HTML.
Option Explicit
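'PtrSafe is required for API declarations in 64-bit Office (VBA7)
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If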
Sub finra()
Dim IE As Object
Dim doc As Object, element As Object, elements As Object, crd
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.Navigate ("http://brokercheck.finra.org")
Call WaitIE(IE, 1000)
'Enter values from the corresponding sheet
'Set some generic typing for ease
Set doc = IE.document
doc.GetElementbyID("GenericSearch_IndividualSearchText").Value = "steve"
doc.GetElementbyID("GenericSearch_EmploymingFirmSearchText").Value = "ed"
Set elements = doc.getElementsByTagName("button")
For Each element In elements
If element.getAttribute("type") = "submit" Then
If element.innerText = "Check " Then
element.Click
Exit For
End If
End If
Next element
Call WaitIE(IE, 1000)
Dim itms As Object
'Set itms = doc.getElementsByClassName("displaycrd")
crd = doc.getElementsByClassName("displaycrd")(0).innerText 'here is where the run time error occurs
MsgBox crd
End Sub
Sub WaitIE(IE As Object, Optional time As Long = 250)
Dim i As Long
Do
Sleep time
Debug.Print CStr(i) & vbTab & "Ready: " & CStr(IE.ReadyState = 4) & _
vbCrLf & vbTab & "Busy: " & CStr(IE.Busy)
i = i + 1
Loop Until IE.ReadyState = 4 And Not IE.Busy
End Sub
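An alternative to a fixed delay is to poll for the element itself and give up after a timeout, which also avoids error 91 when the element never appears. The following is a rough sketch rather than a tested fix for this site: WaitForClass and timeoutSeconds are hypothetical names, and it assumes the document object supports getElementsByClassName (as the IE document in the code above does).

```vba
' Sketch: wait until an element with the given class exists, then return it.
' Returns Nothing if the element has not appeared before the timeout.
' WaitForClass and timeoutSeconds are made-up names for this example.
Function WaitForClass(doc As Object, className As String, _
                      Optional timeoutSeconds As Double = 10) As Object
    Dim deadline As Double
    ' Timer is seconds since midnight, so this sketch is not safe across midnight.
    deadline = Timer + timeoutSeconds
    Do
        DoEvents   ' let the browser keep loading
        If doc.getElementsByClassName(className).Length > 0 Then
            Set WaitForClass = doc.getElementsByClassName(className)(0)
            Exit Function
        End If
    Loop While Timer < deadline
End Function
```

A caller would then test the result before using it, for example: Set el = WaitForClass(IE.document, "displaycrd") followed by If Not el Is Nothing Then MsgBox el.innerText.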

Excel VBA HTML retrieve

I have a fully functional macro that goes through a list of personnel records and works out whether each person is a leaver or never started. The only issue is that it is very slow when searching the string that holds all the HTML code (~10000 characters).
I was wondering if there is a way to restrict the retrieval to just part of the webpage.
The macro I am currently using is below. It iterates through each row and pulls in the code from the URL of each person's personnel page.
Sub RetrieveEndDate()
Dim myArray() As Variant, Search As Variant
Dim strURL As String, strCSV As String, dbClose As String
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Call LogOn
RowsWithData = Application.CountA(Range("A:A"))
For R = 2 To RowsWithData
Application.StatusBar = R & " Out of " & RowsWithData
UKNo = Cells(R, 1).Value
strURL = "http://www.pers.fs.com/People_Detail.asp?Pers_no=" & UKNo & "&mode=CURRENT"
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", strURL, False
http.Send
strCSV = http.responseText
Cells(R, 3).Value = strCSV
'Works out if the employee has left or never started; if neither, leaves the cell blank
If InStr(1, strCSV, "Employee has Left") > 0 Then
Cells(R, 2).Value = "Left"
ElseIf InStr(1, strCSV, "Non-Starter") > 0 Then
Cells(R, 2).Value = "Did not start"
Else
Cells(R, 2).Value = ""
End If
Set http = Nothing
Next R
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
The response from the webpage is ~10000 characters long, but the info I am interested in, "(Employee has Left)", is at the beginning of the page, on the third line from the bottom of the excerpt below:
<head>
<title> List</title>
<link rel="stylesheet" href="_stylesheets/atc.css" type="text/css">
</head>
<body CLASS="Skill" >
<form name="People_Detail" method="Post" action=History_list.asp>
<P><INPUT id="Pers_No" type = "HIDDEN" name="Pers_No" value=UK111111 ></P>
<P><INPUT id="mode" type = "HIDDEN" name="mode"Value="HISTORY_LIST"></P>
<Table Border = 0 CellPadding = 0 width = 100% >
<TR><TR><TD Colspan = 2 ><H1 id=Test name=test>Current Active Record<BR>(Employee has Left)</H1><TD align = right>
<P><INPUT id="btnSubmit" name="btnSubmit" type="SUBMIT" value="View Record History List"></P>
</TD></TD></TR></TR>
AFAIK there is no way to do this with XMLHTTP.
This KB article contains code that performs a download using the WinInet API.
The While bDoLoop loop reads the URL in Len(sReadBuffer) chunks, you can modify this to add a condition and exit the loop whenever you like.
If you wanted to begin the download at a specific offset (and the server supports it) you could also try InternetSetFilePointer.
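If you stay with XMLHTTP, a simpler workaround (not a partial download, just a narrower search) is to run InStr on only the first part of the response, since the question says the marker text sits near the top of the page. A minimal sketch under that assumption; EmployeeStatus and headLength are hypothetical names, and 2000 characters is an arbitrary guess at how far down the marker can appear.

```vba
' Sketch: classify a response by looking only at its first headLength characters.
' Returns "Left", "Did not start", or "" when neither marker is found.
' EmployeeStatus and headLength are made-up names for this example.
Function EmployeeStatus(responseText As String, Optional headLength As Long = 2000) As String
    Dim head As String
    head = Left$(responseText, headLength)   ' the whole page is still downloaded

    If InStr(1, head, "Employee has Left", vbBinaryCompare) > 0 Then
        EmployeeStatus = "Left"
    ElseIf InStr(1, head, "Non-Starter", vbBinaryCompare) > 0 Then
        EmployeeStatus = "Did not start"
    End If
End Function
```

In the question's loop this would replace the If/ElseIf block with Cells(R, 2).Value = EmployeeStatus(strCSV). It does not reduce the download time, only the amount of text searched.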
I had a similar problem. The response text at a certain website was so big that it was taking my macro forever to search through it. A solution that I came up with is as follows. First I used the SPLIT function on the response text.
arr_1 = Split(my_var, "zc-st-a", -1, vbTextCompare)
You didn't provide enough of the source code for me to be specific, but there is usually some tag you can split on that breaks the response text down into array elements with data you want and those elements without useful information. Next use the FILTER function to filter out the useless elements in arr_1
arr_2 = Filter(arr_1, "zc-pg-y", True, vbTextCompare)
Finally, you can combine the useful elements that are present in arr_2 using the JOIN function.
my_var = Join(arr_2, " ")
In my case, using this method to make the response text smaller reduced my macro run time from 1 hour 15 minutes to 15 minutes. Hope this helps.
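The three steps above can be wrapped into one helper, sketched here. ShrinkResponse, splitOn and keepIfContains are hypothetical names; the strings to split and filter on depend entirely on the page being scraped.

```vba
' Sketch: split the response on a delimiter, keep only the pieces that
' contain a given substring, and join those pieces back into one string.
' ShrinkResponse and its parameter names are made up for this example.
Function ShrinkResponse(responseText As String, splitOn As String, _
                        keepIfContains As String) As String
    Dim parts() As String, kept() As String

    parts = Split(responseText, splitOn, -1, vbTextCompare)    ' break into pieces
    kept = Filter(parts, keepIfContains, True, vbTextCompare)  ' keep matching pieces
    ShrinkResponse = Join(kept, " ")                           ' "" if nothing matched
End Function
```

With the values from this answer, the call would be my_var = ShrinkResponse(my_var, "zc-st-a", "zc-pg-y").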