Scraper clicking on the next page button but fetches nothing - vba

I've written some VBA code using Selenium to parse data from tables spread across multiple pages. When I run the script, it parses the data from the first page and then keeps clicking the next page button until no button is left. However, I only get the data from the first page: the browser clicks through the remaining pages, but nothing is fetched from them. I don't understand what I'm doing wrong here. Perhaps the loop I have created has something to do with it. Thanks for taking a look. Here is the full code:
Sub Table_data()
Dim driver As New ChromeDriver
Dim tabl As Object, rdata As Object, cdata As Object
driver.Get "https://toolkit.financialexpress.net/santanderam"
driver.Wait 1000
For Each tabl In driver.FindElementsByXPath("//table[@class='fe-datatable']")
For Each rdata In tabl.FindElementsByXPath(".//tr")
For Each cdata In rdata.FindElementsByXPath(".//td")
y = y + 1
Cells(x + 1, y) = cdata.Text
Next cdata
x = x + 1
y = 0
Next rdata
driver.FindElementByLinkText("Next").Click
driver.Wait 1000
Next tabl
End Sub

Consider pressing the Next button outside of your For Each loops. Put it in an outer loop that terminates when there is no Next button left to press (Run-time Error 7: NoSuchElementError).
The XPath //table[@class='fe-datatable'] returns the page numbers as well. You should use the inner table instead, which is //table[@class='fe-fund-tableBody'] by class name or //*[@id='docRows'] by id. Both point to the same element.
You might have noticed there are 7 occurrences of that element. Your code loops through the empty ones on each page. You can avoid this by looping through the first occurrence only, like this: (//table[@class='fe-fund-tableBody'])[1] or (//*[@id='docRows'])[1].
I would also recommend using an implicit/explicit wait instead of a fixed Wait. Without improving anything else, your code should end up looking something like this:
Sub Table_data()
    Dim driver As New ChromeDriver
    Dim tabl As Object, rdata As Object, cdata As Object
    driver.Get "https://toolkit.financialexpress.net/santanderam"
    driver.Wait 1000
    Do
        For Each tabl In driver.FindElementsByXPath("(//*[@id='docRows'])[1]") 'or "(//table[@class='fe-fund-tableBody'])[1]"
            For Each rdata In tabl.FindElementsByXPath(".//tr")
                For Each cdata In rdata.FindElementsByXPath(".//td")
                    y = y + 1
                    Cells(x + 1, y) = cdata.Text
                Next cdata
                x = x + 1
                y = 0
            Next rdata
        Next tabl
        On Error Resume Next
        driver.FindElementByLinkText("Next").Click
        driver.Wait 1000
    Loop Until Err.Number = 7
End Sub
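The control flow in the answer above (read the current page, then click Next once per page until the lookup fails) can be sketched without a browser. This is a minimal Python sketch, not the Selenium API: `FakePager`, `NoSuchElement`, `scrape_all` and the sample rows are hypothetical names made up for illustration.

```python
class NoSuchElement(Exception):
    """Hypothetical stand-in for the driver's 'element not found' error."""


class FakePager:
    """Hypothetical stand-in for a browser that shows one page of rows at a time."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def current_rows(self):
        return self.pages[self.index]

    def click_next(self):
        # Mimics the Next lookup failing once the last page is reached.
        if self.index + 1 >= len(self.pages):
            raise NoSuchElement("Next")
        self.index += 1


def scrape_all(pager):
    rows = []
    while True:
        # Read the whole current page first, outside of any per-table loop...
        rows.extend(pager.current_rows())
        # ...then try to advance exactly once per page.
        try:
            pager.click_next()
        except NoSuchElement:
            break
    return rows


pager = FakePager([[("A", 1), ("B", 2)], [("C", 3)], [("D", 4)]])
all_rows = scrape_all(pager)
print(all_rows)
```

The point of the sketch is the ordering: the click happens once per page, after all rows of that page have been read, and the missing button is what ends the loop.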

Personally I would change the way you are iterating the pages. It should be like this in pseudo code:
function element getNextButton(){
all_buttons = driver.findElementsByXpath("""//*[@id="Price_1_1"]/tfoot/tr/td/div/div/a""");
next_button = all_buttons[all_buttons.Size()-1];
return next_button;
}
main(){
next_button = getNextButton();
while true{
do something with your current table;
next_button.click();
wait(2); // wait some time till the page loads
next_button = getNextButton();
if next_button.text does not contains 'Next'{
break;
}
}
}
I have just tested it on Python:
from selenium import webdriver
import time
def get_next_button():
    # The last link in the table footer's pager is "Next" while more pages remain.
    # Selenium 4 removed find_elements_by_xpath; there, use
    # driver.find_elements(By.XPATH, ...) with selenium.webdriver.common.by.By.
    buttons = driver.find_elements_by_xpath("""//*[@id="Price_1_1"]/tfoot/tr/td/div/div/a""")
    return buttons[-1]
chrome_path = r"chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://toolkit.financialexpress.net/santanderam")
time.sleep(5)
next_button = get_next_button()
while True:
    # Do something with the table
    next_button.click()
    time.sleep(2)
    next_button = get_next_button()
    if 'Next' not in next_button.text:
        break
print('End')
I am not familiar with vba, but if you do not understand Python I can try to translate it to vba.
EDIT
An "approximation" to VBA solution should be this (please check syntax errors, I have never used VBA):
Function GetNextElement(driver As ChromeDriver) As Object
    Dim all_buttons As Object
    Set all_buttons = driver.FindElementsByXPath("//*[@id='Price_1_1']/tfoot/tr/td/div/div/a")
    ' The collection is 1-based, so the last pager link sits at index Count
    Set GetNextElement = all_buttons.Item(all_buttons.Count)
End Function
Sub Table_data()
    Dim driver As New ChromeDriver
    Dim next_button As Object
    driver.Get "https://toolkit.financialexpress.net/santanderam"
    driver.Wait 1000
    Set next_button = GetNextElement(driver)
    Do While True
        ' Do something with the table
        next_button.Click
        driver.Wait 2000
        Set next_button = GetNextElement(driver)
        ' Stop once the last pager link is no longer "Next"
        If InStr(next_button.Text, "Next") = 0 Then
            Exit Do
        End If
    Loop
End Sub
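Both answers above pause for a fixed time after each click, and the first one recommends an implicit/explicit wait instead. The idea behind an explicit wait is to poll a condition until it holds or a timeout expires. Below is a minimal Python sketch of that idea; `wait_until`, its parameters and `table_loaded` are hypothetical helpers written for illustration, not functions from Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


# Hypothetical condition: becomes true on the third check, like a table
# that only appears once the next page has finished loading.
calls = {"count": 0}


def table_loaded():
    calls["count"] += 1
    return calls["count"] >= 3


ready = wait_until(table_loaded, timeout=2.0, interval=0.01)
print(ready, calls["count"])
```

Compared with a fixed sleep, the loop returns as soon as the condition holds and fails loudly when it never does, instead of silently reading a half-loaded page.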

Related

VBA Selenium : How to extract the text data in the unmerged form..as in like Spec1 : Value1 | Spec2 : Value2 | Spec3 : Value3...etc

I am trying to extract some data from the list of webpage links that I have in my Excel sheet. With the code below I can extract the data, but the result comes back in merged form, like this: "ColorOrangeMaterialPolyethyleneDimensions6 x 37 inFor Use With(1 to 3) 36 in Blankets....". This is the link to the data, FYI: "https://www.grainger.com/product/SALISBURY-Blanket-Canister-Orange-3KUX9". Any suggestions on how to export the data in a form like Spec1:Value1|Spec2:Value2... would be greatly appreciated. Please advise.
This is the code..
Sub Test()
Dim ResultSections As Selenium.WebElements
Dim ResultSection As Selenium.WebElement
Dim i As Long
Dim lastrow As Long
lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
For i = 2 To lastrow
Dim MyUrl As String
MyUrl = Sheet1.Cells(i, 1).Value
Set MB = New Selenium.ChromeDriver
MB.Start
MB.Get MyUrl
MB.Wait 10000
Set ResultSections = MB.FindElementsByClass("P9I57X")
For Each ResultSection In ResultSections
Sheet1.Cells(i, "B").Value = ResultSection.Text
Exit For
Next ResultSection
If i = lastrow Then
MB.Quit
End If
Next i
End Sub
Kindly help me out with this..:-)
I tried to extract the data part of a webpage link, but it comes out in merged form and I can't tell which part is the label and which is the value. So I need the extracted data to be in the right format. Kindly advise.
Select the dt and dd elements rather than a parent. That way you get two lists to iterate over, and you access the text at the level where it appears on screen. If you select a single element higher up the DOM, as you are doing, you get this mangled-looking string.
The code below should get you started. Note that you also don't need to keep creating a new WebDriver instance inside your loop.
Dim specs As Selenium.WebElements, values As Selenium.WebElements, i As Long
Set specs = MB.FindElementsByCss("[data-testid='product-techs'] dt")
Set values = MB.FindElementsByCss("[data-testid='product-techs'] dd")
For i = 1 To specs.Count
Debug.Print Join$(Array(specs.Item(i).Text, values.Item(i).Text), ":")
Next
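To get the single-line Spec1:Value1|Spec2:Value2 form the question asks for, the two lists can be paired up and joined. Here is a minimal Python sketch of that step; `format_specs` and its separator parameters are hypothetical, and the sample labels and values are taken from the merged string quoted in the question.

```python
def format_specs(labels, values, pair_sep=":", item_sep="|"):
    """Join parallel label/value lists into 'Label:Value|Label:Value|...'."""
    # zip stops at the shorter list, so a missing value does not raise.
    pairs = (label.strip() + pair_sep + value.strip()
             for label, value in zip(labels, values))
    return item_sep.join(pairs)


labels = ["Color", "Material", "Dimensions"]       # text of the dt elements
values = ["Orange", "Polyethylene", "6 x 37 in"]   # text of the dd elements
line = format_specs(labels, values)
print(line)  # Color:Orange|Material:Polyethylene|Dimensions:6 x 37 in
```

In the VBA loop above, the same result could be built by collecting each label:value pair and joining the pieces with "|" before writing to the cell.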

Obtaining a part from pagesource using Selenium in VBA

I am a beginner of using VBA.
I am struggling to obtain the HTML from a webpage using Selenium and VBA. I found that I cannot get all of the HTML from that webpage into a cell, because the maximum number of characters allowed in an Excel cell is 32,767 (about 32k). What I am trying to do is to obtain the following line from the page source, going through the elements by Id and using .Attribute("InnerHtml") to print part of the page source, but it does not seem to work :(
I have tried all I can find from the Internet, including the
article class="q q-scale q-l0" id="i_67398910"
data-question-number="9"
Nevertheless
The code is following:
Sub Getting_full_pagesource()
Dim FindBy As New Selenium.By
Dim mypos, i, y As Integer
Set CD = New Selenium.ChromeDriver
CD.start
CD.Get Sheet1.Range("B1").Value
y = 1
Do While y <> 0
Sheet1("A" & i).Value = CD.PageSource
If CD.IsElementPresent(FindBy.class("btn-finish")) = True Then
CD.Quit
Exit Do
End If
y = y + 1
CD.FindElementByTag("button").Click
i = i + 1
Loop
End Sub

vba excel If condition error on final iteration

My code takes the input from my checkboxes and grabs data from the related worksheet. I stepped through it line by line and found that it always raises a runtime error at the If statement on the final iteration of the loop. Is there something wrong in my code?
Option Explicit
Private Sub UserForm_Initialize()
Dim counter As Long
Dim chkBox As MSForms.CheckBox
''''Add checkboxes based on total sheet count
For counter = 1 To Sheets.count - 2
Set chkBox = Me.Frame1.Controls.Add("Forms.CheckBox.1", "CheckBox" & counter)
chkBox.Caption = Sheets(counter + 2).Name
chkBox.Left = 10
chkBox.Top = 5 + ((counter - 1) * 20)
Next
End Sub
Private Sub cmdContinue_Click()
Dim Series As Object
Dim counter As Long
'''Clear old series
For Each Series In Sheets(2).SeriesCollection
Sheets(2).SeriesCollection(1).Delete
Next
''Cycle through checkboxes
For counter = 1 To Sheets.count - 2
''If the box is checked then
If Me.Frame1.Controls(counter).Value = True Then ''Error here on 4th iteration
''Add new series
With Sheets(2).SeriesCollection.NewSeries
.Name = Sheets(counter + 2).Range("$A$1")
.XValues = Sheets(counter + 2).Range("$A$12:$A$25")
.Values = Sheets(counter + 2).Range("$B$12:$B$25")
End With
End If
Next counter
Me.Hide
End Sub
Also, a second problem is that it always runs for the wrong sheet. If I check box 2 it'll run the data for the box 1 sheet, 3 runs for 2, 4 runs for 3, and 1 runs for 4. Can anyone explain the reason behind this?
EDIT: As VincentG points out below, adding an explicit name "checkbox" in there did the trick (I didn't know you could do that). Index 1 was probably taken by one of the buttons or the frame in the user form, causing it to be offset.
I guess your main problem comes from the fact that the controls have to be accessed starting from index 0. So to loop over all controls, you would do something like
For counter = 0 To Me.Frame1.Controls.Count - 1
    Debug.Print counter; Me.Frame1.Controls(counter).Name
Next counter
So, if you stick to your code, I assume you have to change the If statement to
If Me.Frame1.Controls(counter - 1).Value = True Then
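The off-by-one described above explains both symptoms: the wrong sheet being picked and the error on the last iteration. A minimal Python sketch of the same indexing follows. The checkbox states and sheet names are hypothetical, and `picked_sheets` is a made-up helper; Python lists are 0-based like the Controls collection, while Sheets(counter + 2) in VBA is 1-based.

```python
# Hypothetical data: four checkboxes (0-based, like Frame1.Controls) and six
# sheets, of which the first two are not data sheets.
checked = [False, True, False, False]   # the second box is ticked
sheets = ["Menu", "Chart", "Data1", "Data2", "Data3", "Data4"]


def picked_sheets(index_shift):
    picked = []
    for counter in range(1, len(checked) + 1):   # counter runs 1..4 as in the VBA loop
        if checked[counter + index_shift]:
            # Sheets(counter + 2) is 1-based in VBA, hence the 0-based index counter + 1.
            picked.append(sheets[counter + 1])
    return picked


fixed = picked_sheets(-1)   # Controls(counter - 1): box 2 maps to its own sheet
try:
    picked_sheets(0)        # Controls(counter): reads one box too far on the last pass
    error = None
except IndexError as exc:
    error = exc
print(fixed, type(error).__name__)
```

With the shift of -1 the ticked second box selects the second data sheet; without it, the first pass already reads the second box, and the last pass reads past the end of the collection.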

Pull data from Website into VBA

This might fall under the dumb question from a newbie. But I honestly don't know where to start in VBA. I tried a few different approaches on the web trying to pull data from the site I'm trying to and all of them failed miserably. Can someone help me (more or less show me) how to pull the data from this website?
https://rotogrinders.com/projected-stats/nfl?site=fanduel
It wouldn't even let me do the Data -> Import. Here is what I have so far. I keep getting stuck on the line For t = 0 To (Table.Length - 1).
Sub test1()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
Dim Table As Object
Dim t As Integer
Dim r As Integer
Dim c As Integer
With appIE
.Navigate "https://rotogrinders.com/projected-stats/nfl?site=fanduel"
.Visible = True
End With
Do While appIE.Busy
DoEvents
Loop
Set Table = appIE.document.getElementById("proj-stats")
For t = 0 To (Table.Length - 1)
For r = 0 To (Table(t).Rows.Length - 1)
For c = 0 To (Table(t).Rows(r).Cells.Length - 1)
ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = Table(t).Rows(r).Cells(c).innerText
Next c
Next r
Next t
appIE.Quit
Set appIE = Nothing
End Sub
You are close, and there are several ways to get the data. I chose to extract all cell elements (HTML <td>) and step through a simple loop. Since there are six columns, I'm using two variables (r and c, for row and column) to offset the data so it is laid out correctly.
Set Table = appIE.document.getElementsbytagname("td")
r = 0
c = 0
For Each itm In Table
    Worksheets(1).Range("A1").Offset(r, c).Value = itm.innertext
    c = c + 1
    If c Mod 6 = 0 Then
        r = r + 1
        c = 0
    End If
Next itm
Example Result:
One last note: sometimes the browser didn't finish loading before the script went on. I cheated by setting a breakpoint before the loop, waiting until the page loaded, then hitting F5 to continue execution, which ensured it would always run.
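The row and column bookkeeping in the loop above (fill six cells, then move down one row) can be checked in isolation. The following Python sketch mirrors it; `place_cells` and the sample cell texts are hypothetical, and the dictionary keys play the role of the (r, c) offsets passed to Range("A1").Offset.

```python
def place_cells(cells, width=6):
    """Return {(row, col): text}, mirroring Range("A1").Offset(r, c)."""
    grid = {}
    r = c = 0
    for text in cells:
        grid[(r, c)] = text
        c += 1
        if c % width == 0:   # row is full: move down and back to the first column
            r += 1
            c = 0
    return grid


cells = ["cell%d" % i for i in range(14)]   # hypothetical <td> texts
grid = place_cells(cells)
print(grid[(0, 5)], grid[(1, 0)], grid[(2, 1)])
```

This only works when every table row really has the same number of cells; a header row or a spanning cell with a different count would shift everything after it.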

How to identify string in htm.getelementbyid("mystring") using vba?

I am trying to get data from a different website using the VBA code below, but I don't know how to identify the string inside the parentheses in the statement With htm.getelementbyid("comps-results"). How do I get the string in the parentheses for, for example, this website?
I would appreciate very much if someone could help me on this matter.
Thank you in advance.
Sub GetData()
Dim x As Long, y As Long
Dim htm As Object
Set htm = CreateObject("htmlFile")
With CreateObject("msxml2.xmlhttp")
.Open "GET", "http://www.zillow.com/homes/comps/67083361_zpid/", False
.send
htm.body.innerhtml = .responsetext
End With
With htm.getelementbyid("comps-results")
For x = 0 To .Rows.Length - 1
For y = 0 To .Rows(x).Cells.Length - 1
Sheets(1).Cells(x + 1, y + 1).Value = .Rows(x).Cells(y).innertext
Next y
Next x
End With
End Sub
The getElementByID method takes a unique ID as an argument and returns a single HTML element if there is one with such an ID value.
Probably what you need to do is use the getElementsByTagName method, which returns a collection of matching elements. Note that it takes a tag name (for example "table"), not an id value. Since this may result in multiple matches, I find it best to create an object first, and an iterator variable:
Dim compresults
Dim el
' getElementsByTagName expects a tag name such as "table", not an id value
Set compresults = htm.getelementsbytagname("table")
For Each el In compresults
    MsgBox el.InnerText
Next
BTW, I am fairly certain ( but have not verified) that an HTMLElementCollection does not have a .Rows member, so the next line in your code will probably raise an error. Likewise, the .Rows does not have a .Length property, so there's at least two errors on that single line of code AND in the next line, note that .Cells does not have a .Length member, either, so another error.
For assistance with those parts of your code, I urge you to ask a new question. This answer addresses your original question.
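As for where the string in the parentheses comes from: it is the value of an element's id attribute in the page's HTML, so finding it for another site means looking through that site's markup. A minimal Python sketch that lists the ids in a piece of HTML is shown below; `IdCollector` is a hypothetical helper built on the standard library's `html.parser`, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect (tag, id) pairs for every element that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.append((tag, value))


# Hypothetical markup; for a real page this would be the HTTP response text.
html = ('<div id="wrapper"><table id="comps-results">'
        '<tr><td>1</td></tr></table><p>no id</p></div>')
collector = IdCollector()
collector.feed(html)
print(collector.ids)  # [('div', 'wrapper'), ('table', 'comps-results')]
```

The same information is visible in a browser's developer tools: inspect the table you want and read its id attribute. If the element has no id, getElementById cannot be used and a tag or class lookup is needed instead.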