I have written a Python script that aims to take data off a website, but I am unable to navigate and loop through pages to collect the links. The website is https://www.shearman.com/people? The HTML on the site looks like this:
<ul class="results-pagination">
  <li class="..."><a href="..." onclick="PageRequest('2', event)">
When I run the query below, it says that the element is not attached to the page:
try:
    # this navigates to the next page
    driver.find_element_by_xpath('//ul[@class="results-pagination"]/li/a[@onclick=">"]').click()
    time.sleep(5)
except NoSuchElementException:
    break
Any ideas what I'm doing wrong on this?
Many thanks in advance.
Chris
You can try this code:
browser.get("https://www.shearman.com/people")
wait = WebDriverWait(browser, 30)
main_tab = browser.current_window_handle
navigation_buttons = browser.find_elements_by_xpath('//ul[@class="results-pagination"]//descendant::a')
size = len(navigation_buttons)
print('length of the links list:', size)
i = 0
while i < size:
    # Ctrl+click opens each link in a new tab without leaving the main page
    ActionChains(browser).key_down(Keys.CONTROL).click(navigation_buttons[i]).key_up(Keys.CONTROL).perform()
    # keep the driver focused on the main tab
    browser.switch_to.window(main_tab)
    i = i + 1
Make sure to import these:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
Note this will open each link in a new tab. Alternatively, you can click the next-page button using this XPath: //ul[@class="results-pagination"]//descendant::a
If you want to open the links one by one in the same tab, then you will have to handle stale element references: once you navigate away from the main page, every element reference becomes stale.
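A minimal sketch of that same-tab approach, assuming the PageRequest onclick pattern from your snippet (the link is re-found on every pass, so the reference is never stale):
import time
from selenium.common.exceptions import NoSuchElementException

page = 2  # the first "next" link on this site calls PageRequest('2', event)
while True:
    # build the locator fresh each pass; a re-found element can never be stale
    locator = '//ul[@class="results-pagination"]//a[contains(@onclick, "PageRequest(\'%d\'")]' % page
    try:
        browser.find_element_by_xpath(locator).click()
    except NoSuchElementException:
        break  # no link for the next page number, so we are done
    time.sleep(5)  # simple pause; an explicit wait on the new content is more robust
    page += 1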
I am learning web scraping with Selenium for a Finance team project. The idea is:
Login to HR system
Search for Purchase Order Number
System display list of attachments
Download the attachments
Below is my code:
# interaction with Google Chrome
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# specify chromedriver location
PATH = './chromedriver_win32/chromedriver.exe'
# open Google Chrome browser & visit Purchase Order page within HRIS
browser = webdriver.Chrome(PATH)
browser.get('https://purchase.sea.com/#/list')
# <user inputs ID & password>
# the user interface shows "My Purchase Request" & "My Purchase Order" tabs
# click on the Purchase Order tab
try:
    po_tab = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "My Purchase Orders"))
    )
    po_tab.click()
except:
    print('HTML element not found!!')

# locate PO Number field and key in PO number
try:
    po_input_field = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "input-field"))
    )
    po_input_field.send_keys(<dummy PO num#>)  ## any PO number
except:
    print("PO field not found!!")

# locate Search button and click search
try:
    search_button = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Search"))
    )
    search_button.click()
except:
    print("Search button not found!!")
I'm stuck at the step # click on the Purchase Order tab and the following steps.
I can find the elements, but I see an error after executing the .py script. The most interesting part is... I can do it perfectly in Jupyter Notebook.
(screenshot: Python script execution error)
Here are the elements from the inspection screenshots:
Purchase Orders tab
PO Number input field
Search button
See, you are using presence_of_element_located, which is basically:
""" An expectation for checking that an element is present on the DOM
of a page. This does not necessarily mean that the element is visible.
locator - used to find the element
returns the WebElement once it is located
"""
What I would suggest is to use element_to_be_clickable, which additionally requires the element to be visible:
""" An expectation for checking that an element is present on the DOM of a
page and visible. Visibility means that the element is not only displayed
but also has a height and width that is greater than 0.
locator - used to find the element
returns the WebElement once it is located and visible
"""
so, in your code it'd be something like this:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "My Purchase Orders"))).click()
Also, if this does not work, we could try different locators, like CSS or XPath. But I'll let you comment first on whether this works or not.
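For instance, the same clickable wait with alternative locators might look like this (the selectors are placeholders, since we haven't seen your page's markup):
# by XPath, matching on the visible link text (hypothetical markup):
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//a[contains(., 'My Purchase Orders')]"))
).click()

# by CSS, if the tab carries a stable class or id (placeholder selector):
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "a.purchase-orders-tab"))
).click()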
The first error in your log is HTML element not found! You are performing a click before the element is visible in the DOM. Please try the possible solutions below.
Wait till the element is visible, then perform the click operation:
EC.visibility_of_element_located((By.LINK_TEXT, "My Purchase Orders"))
If you are still not able to click with the above code, then wait for the element to be clickable:
EC.element_to_be_clickable((By.LINK_TEXT, "My Purchase Orders"))
I would also suggest creating reusable methods for actions like click(), getText(), etc.
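A minimal sketch of such helpers (the function names are my own, not from the question):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def click_when_clickable(driver, locator, timeout=10):
    """Wait until the element is clickable, then click it."""
    WebDriverWait(driver, timeout).until(EC.element_to_be_clickable(locator)).click()

def text_when_visible(driver, locator, timeout=10):
    """Wait until the element is visible, then return its text."""
    return WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located(locator)).text

# usage, e.g.: click_when_clickable(browser, (By.LINK_TEXT, "My Purchase Orders"))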
Using Selenium WebDriver, I want to create a Drupal widget from some given data (texts, images, ...).
I managed to open the back office of a website built on Drupal, but I am stuck trying to make the "add widget" button work.
I have something like:
# id of the button "Ajouter widget" (add widget)
id_add_widget = 'edit-field-content-add-more-add-modal-form-area-add-more'
widget_button = driver.find_element_by_id(id_add_widget)
widget_button.click()
corresponding to the following state:
but it fails to open the list of widget choices that you would get by clicking on it manually:
(and now, I need to choose among this list...)
Is this id edit-field-content-add-more-add-modal-form-area-add-more unique in the HTML DOM?
If yes, then try the code trials below:
Code 1:
widget_button = driver.find_element_by_id(id_add_widget)
widget_button.click()
Code 2:
widget_button = driver.find_element_by_id(id_add_widget)
driver.execute_script("arguments[0].click();", widget_button)
Code 3:
Use Explicit waits
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "edit-field-content-add-more-add-modal-form-area-add-more"))).click()
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
If this id is not unique in the HTML DOM, then you'd have to look for a different locator.
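A quick way to check uniqueness (a sketch; find_elements returns every match, so a unique id should yield exactly one element):
matches = driver.find_elements_by_id(id_add_widget)
print(len(matches))  # 1 => the id is unique; 0 or >1 => pick another locator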
I am running a web scraper and I am not able to click on the third element. I am not sure what to do, as I have tried googling and running several types of code.
Below is a screenshot of the HTML and my code. I need the third element in the list to be clicked on. It is highlighted in the screenshot. I am not sure what to do with the css and data-bind attributes.
Here is the code for the max bed options. I also need to get the "2 Beds" option there, just like we did for the min bed options.
Thanks!!
According to the picture, the following should work:
driver.find_element_by_xpath('//span[@id="bedsMinMaxRangeControl"]//li[@data-value="2"]').click()
But we would need to see the entire page HTML to give a definitive answer.
Also, don't forget to use delays/waits there.
UPD
For the new question the code will be:
driver.find_element_by_xpath('//span[@id="bedsMinMaxRangeControl"]//ul[contains(@class,"maxBedOptions")]//li[@data-value="2"]').click()
Here you should also use the appropriate data-value, which takes values from -1 up to 3.
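Combining that XPath with an explicit wait could look like this sketch (the locator is inferred from the screenshot, so treat it as an assumption):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,
    '//span[@id="bedsMinMaxRangeControl"]//ul[contains(@class,"maxBedOptions")]//li[@data-value="2"]'
))).click()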
You can use a css_selector with the data-value attribute.
locator = ".dropdownContent .minBedOptions li[data-value='2']"
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, locator))).click()
I used WebDriverWait, so make sure to import it...
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
If it's just a list using li tags, as it seems from the shared snap, you could probably write the simplest XPath as:
//li[contains(text(), '2 Beds')]
and use it like this:
driver.find_element_by_xpath("//li[contains(text(), '2 Beds')]").click()
or, if you want to use the XPath in conjunction with WebDriverWait, use it like this:
wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.XPATH, "//li[contains(text(), '2 Beds')]")))
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Now let's talk about the case where we do not want to depend on the '2 Beds' text in the XPath, because if the text changes in the UI, we'd have to change the locator in the Selenium-Python bindings.
A good way to do this is one of the following (a usage sketch follows the list):
Make use of the data-value attribute: //li[@data-value = '2']
Make use of the ul and li tags: //ul[contains(@class, 'minBedOptions')]/li[@data-value = '2']
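Putting the attribute-based XPath together with the explicit wait (a sketch, assuming the minBedOptions markup from the screenshot):
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.XPATH,
    "//ul[contains(@class, 'minBedOptions')]/li[@data-value='2']"))).click()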
I'm trying to scrape Google results using selenium chromedriver. Before, I used requests + BeautifulSoup to scrape Google results, and this worked; however, I got blocked by Google after around 300 results. I've been reading into this topic and it seems to me that using selenium + webdriver is less easily blocked by Google.
Now, I'm trying to scrape Google results using selenium. I would like to scrape the title, link and description of all items. Essentially, I want to do this: How to scrape all results from Google search results pages (Python/Selenium ChromeDriver)
However, with the code from that question I get the following error:
NoSuchElementException: no such element: Unable to locate element:
{"method":"css selector","selector":"h3"} (Session info: chrome=90.0.4430.212)
Therefore, I'm trying another piece of code. This code is able to scrape some, but not ALL, of the titles + descriptions. See the picture below. I cannot scrape the last 4 titles, and the last 5 descriptions are also empty. Any clues on this? Much appreciated.
import urllib
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
root = "https://www.google.com/"
url = "https://google.com/search?q="
query = 'Why do I only see the first 4 results?' # Fill in google query
query = urllib.parse.quote_plus(query)
link = url + query
print(f'Main link to search for: {link}')
options = Options()
# options.headless = True
options.add_argument("--window-size=1920,1200")
driver = webdriver.Chrome(options=options)
driver.get(link)
wait = WebDriverWait(driver, 30)
wait.until(EC.presence_of_all_elements_located((By.XPATH, './/h3')))
link_tag = './/div[@class= "yuRUbf"]/a'
title_tag = './/h3'
description_tag = './/span[@class= "aCOpRe"]'
titles = driver.find_elements_by_xpath(title_tag)
links = driver.find_elements_by_xpath(link_tag)
descriptions = driver.find_elements_by_xpath(description_tag)
for t in titles:
    print('title:', t.text)
for l in links:
    print('links:', l.get_attribute("href"))
for d in descriptions:
    print('descriptions:', d.text)
# Why are the last 4 titles and the last 5 descriptions empty??
Image of the results:
Because those 4 are not actual links; Google always shows a "People also ask" block. If you look at their DOM structure:
<div style="padding-right:24px" jsname="xXq91c" class="cbphWd" data-kt="KjCl66uM1I_i7PsBqYb-irfI74DmAeDWm-uv7IveYLKIxo-bn9L1H56X2ZSUy9L-6wE" data-hveid="CAgQAw" data-ved="2ahUKEwjAoJ2ivd3wAhXU-nMBHWj1D8EQuk4oAHoECAgQAw">
  How do I get Google to show all results?
</div>
it is not an anchor tag, so there is no href to read, and your links list will have 4 empty values because there are 4 divs like that.
To grab those 4 you need to use a different locator:
XPath: //*[local-name()='svg']/../following-sibling::div[@style]
title_tags = driver.find_elements(By.XPATH, "//*[local-name()='svg']/../following-sibling::div[@style]")
for title in title_tags:
    print(title.text)
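If the goal is a clean list of organic results only, one approach is to iterate per result container so titles, links and descriptions stay paired (a sketch; the div.g container and the yuRUbf class from your own locators reflect Google's markup at the time and change often):
results = driver.find_elements_by_xpath('//div[@class="g"]')
for r in results:
    links = r.find_elements_by_xpath('.//div[@class="yuRUbf"]/a')
    if not links:
        continue  # skip "People also ask" and other non-organic blocks
    print('title:', r.find_element_by_xpath('.//h3').text)
    print('link:', links[0].get_attribute('href'))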
I tried to practice Selenium on indeed.ca. The following describes my steps:
Opened indeed.ca
Typed 'IT support' in the search text area
Clicked on the first job among the group of published jobs
Clicked on the 'Apply Now' button
A pop-up window appeared with fields for data such as 'First Name', 'Last Name' and 'Email', plus a 'choose file' button to upload a resume.
After I switched the driver's focus to the pop-up, I am unable to locate elements.
Here are all the links used:
https://www.indeed.ca/
https://www.indeed.ca/jobs?q=it+support&l=Toronto%2C+ON (with search criteria IT SUPPORT)
https://www.indeed.ca/jobs?q=it%20support&l=Toronto%2C%20ON&vjk=837c0cbbf26a68a7 (link for the window after clicking first option in the jobs list)
I have shared a screenshot of the pop-up that appears after clicking 'Apply Now'.
Try this. The fields are inside nested iframes:
driver.switch_to.frame(driver.find_element_by_id('indeed-ia-1532701404288-0-modal-iframe'))
driver.switch_to.frame(driver.find_element_by_tag_name('iframe'))
first_name = driver.find_element_by_id('input-applicant.name')
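Once you're done inside the nested frames, remember to switch back before touching anything on the main page:
driver.switch_to.default_content()  # return to the top-level document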
You can use this code after clicking on the Apply Now button.
There are two iframes; in order to interact with the newly opened pop-up, you will have to switch into both of them.
Code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path = r'D:/Automation/chromedriver.exe')
driver.maximize_window()
driver.get("https://www.indeed.ca/jobs?q=it%20support&l=Toronto%2C%20ON&vjk=837c0cbbf26a68a7")
wait = WebDriverWait(driver, 10)
apply_now = wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Apply Now']/ancestor::a")))
apply_now.click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"div.indeed-apply-bd>iframe")))
driver.switch_to.frame(driver.find_element_by_css_selector("iframe[src^='https://apply.indeed.com/indeedapply/resumeapply?']"))
Name = wait.until(EC.element_to_be_clickable((By.ID, "input-applicant.name")))
Name.send_keys("Vijay")
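For the 'choose file' resume control the question mentions, the usual Selenium approach is to send the file path straight to the underlying file input instead of clicking the button (the input[type='file'] selector and the path below are assumptions about this form):
# hypothetical locator -- inspect the pop-up for the real file input
resume_input = driver.find_element_by_css_selector("input[type='file']")
resume_input.send_keys(r"D:/Automation/resume.pdf")  # absolute path to the resume file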