I am learning web scraping with Selenium for a Finance team project. The idea is:
Log in to the HR system
Search for a Purchase Order number
The system displays a list of attachments
Download the attachments
Below is my code:
# interaction with Google Chrome
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# specify chromedriver location
PATH = './chromedriver_win32/chromedriver.exe'

# open Google Chrome browser & visit the Purchase Order page within HRIS
browser = webdriver.Chrome(PATH)
browser.get('https://purchase.sea.com/#/list')

# < user inputs ID & password >

# the user interface shows "My Purchase Request" & "My Purchase Order" tabs;
# click on the Purchase Order tab
try:
    po_tab = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "My Purchase Orders"))
    )
    po_tab.click()
except:
    print('HTML element not found!!')

# locate the PO Number field and key in the PO number
try:
    po_input_field = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "input-field"))
    )
    po_input_field.send_keys(<dummy PO num#>)  # any PO number
except:
    print("PO field not found!!")

# locate the Search button and click it
try:
    search_button = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Search"))
    )
    search_button.click()
except:
    print("Search button not found!!")
I'm stuck at the step # click on the Purchase Order tab and the following steps.
I can find the elements, but I see an error after executing the .py script. The most interesting part is that I can do it perfectly in Jupyter Notebook.
Python Script Execute Error
Here are the elements from the inspection screens:
Purchase Orders tab
PO Number input field
Search button
I see you are using presence_of_element_located, which is basically:
""" An expectation for checking that an element is present on the DOM
of a page. This does not necessarily mean that the element is visible.
locator - used to find the element
returns the WebElement once it is located
"""
What I would suggest is to use element_to_be_clickable:
""" An expectation for checking that an element is present on the DOM of a
page and visible. Visibility means that the element is not only displayed
but also has a height and width that is greater than 0.
locator - used to find the element
returns the WebElement once it is located and visible
"""
So, in your code it'd be something like this:
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "My Purchase Orders"))).click()
Also, we could try different locators (CSS selector, XPath) if this does not work. But I'll let you comment first on whether this works or not.
The first error in your log is HTML element not found!! You are performing a click before the element is visible in the DOM. Please try the possible solutions below.
Wait till the element is visible, then perform the click operation:
EC.visibility_of_element_located((By.LINK_TEXT, "My Purchase Orders"))
If you are still not able to click with the above code, then wait for the element to be clickable:
EC.element_to_be_clickable((By.LINK_TEXT, "My Purchase Orders"))
I would also suggest creating reusable methods for actions like click(), getText(), etc.
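To make that last suggestion concrete, here is a minimal sketch of what such a reusable wait-and-click helper could look like. The names (wait_until, safe_click) are mine, not Selenium's; wait_until just models the polling loop that WebDriverWait runs internally, and the lookup is passed in as a plain callable so the helper stays driver-agnostic.

```python
import time

def wait_until(condition, timeout=10, poll=0.5):
    # Poll `condition` (a zero-argument callable) until it returns a
    # truthy value or the timeout expires -- the same loop that
    # WebDriverWait runs internally.
    end = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= end:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        time.sleep(poll)

def safe_click(find_element, timeout=10):
    # Wait for the element lookup to succeed, then click the element.
    # `find_element` should return the element, or None while it is absent.
    element = wait_until(find_element, timeout)
    element.click()
    return element
```

With Selenium you might call it as safe_click(lambda: next(iter(browser.find_elements(By.LINK_TEXT, "Search")), None)) -- find_elements returns an empty list instead of raising, so the lambda yields the element or None.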
This is my first time using Selenium for web scraping, and I'm fairly new to Python. I have tried to scrape a Swedish housing site to extract price, address, area, size, etc., for every listing at a specific URL that shows all houses for sale in an area called "Lidingö".
I managed to bypass the pop-up window for accepting cookies.
However, the output I get in the terminal is blank when the script runs. I get nothing: not an error, not any output.
What could possibly be wrong?
The code is:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

s = Service("/Users/brustabl1/hemnet/chromedriver")
url = "https://www.hemnet.se/bostader?location_ids%5B%5D=17846&item_types%5B%5D=villa"
driver = webdriver.Chrome(service=s)
driver.maximize_window()
driver.implicitly_wait(10)
driver.get(url)

# The cookie button clicker
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[62]/div/div/div/div/div/div[2]/div[2]/div[2]/button"))).click()

lists = driver.find_elements(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]')

for list in lists:
    adress = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[2]/a/div[2]/div/div[1]/div[1]/h2')
    area = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[1]/div[1]/div/span[2]')
    price = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[1]')
    rooms = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[3]')
    size = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[2]')
    print(adress.text)
There are a lot of flaws in your code...
lists = driver.find_elements(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]')
In your code, this returns a list of elements in the variable lists.
for list in lists:
    adress = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[2]/a/div[2]/div/div[1]/div[1]/h2')
    area = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[1]/div[1]/div/span[2]')
    price = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[1]')
    rooms = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[3]')
    size = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[2]')
    print(adress.text)
You are not storing the value of each address in a list; instead, you are overwriting its value on each iteration. And since each of these XPaths refers to one exact element, your loop is selecting the same element over and over again!
Also, scraping text through Selenium is a bad practice; use BeautifulSoup instead.
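To illustrate the "store values instead of overwriting" point, here is a small, library-free sketch of the pattern: apply a relative lookup to each result card and append one record per card to a list. The names (extract_listings, the field callables) are mine; with Selenium each callable would be a relative lookup such as lambda card: card.find_element(By.XPATH, './/h2').text (the './/h2' locator is only an assumed example, not Hemnet's real markup).

```python
def extract_listings(cards, fields):
    # `cards` is an iterable of result elements; `fields` maps a field
    # name to a callable that extracts that field FROM the given card --
    # i.e. a relative lookup, not an absolute page-wide XPath.
    results = []
    for card in cards:
        results.append({name: extract(card) for name, extract in fields.items()})
    return results
```

Each iteration appends a fresh dict, so nothing is overwritten, and because every lookup starts from the card itself, different cards yield different values.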
I want to create a Drupal widget with Selenium WebDriver, given some data (texts, images, ...).
I managed to open the back office of a website built on Drupal, but I'm stuck trying to make the "add widget" button work:
I have something like:
# id of the button "Ajouter widget" (add widget)
id_add_widget = 'edit-field-content-add-more-add-modal-form-area-add-more'
widget_button = driver.find_element_by_id(id_add_widget)
widget_button.click()
corresponding to the following state:
but it fails to launch the list of widget choices that you would get by clicking on it:
(and now, I need to choose among this list...)
Is this id edit-field-content-add-more-add-modal-form-area-add-more unique in the HTML DOM?
If yes, then try the below code trials:
Code 1 :
widget_button = driver.find_element_by_id(id_add_widget)
widget_button.click()
Code 2:
widget_button = driver.find_element_by_id(id_add_widget)
driver.execute_script("arguments[0].click();", widget_button)
Code 3:
Use Explicit waits
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.ID, "edit-field-content-add-more-add-modal-form-area-add-more"))).click()
Imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
If this ID is not unique in the HTML DOM, then you'd have to look for a different locator.
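As a sketch of how Code 1 and Code 2 can be combined, here is a hypothetical helper that tries the native click first and falls back to a JavaScript click, which often works when an overlay intercepts the native event. The function name is mine, and execute_script is passed in as a callable so the helper stays driver-agnostic.

```python
def click_with_fallback(element, execute_script):
    # Try the native click first; if anything intercepts it, fall back
    # to a JavaScript click dispatched directly on the element.
    try:
        element.click()
        return "native"
    except Exception:
        execute_script("arguments[0].click();", element)
        return "javascript"
```

With Selenium this would be called as click_with_fallback(widget_button, driver.execute_script).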
I want to find all the elements on a webpage whose text contains the word panel, and create a list of them in Selenium. How can I do this?
I tried
driver.find_elements_by_xpath("//*[contains(text()),'panel')]")
but it's only pulling through a single result when there should be about 25.
The XPath in your question has an error; I'm not sure if it is a typo, but it will not fetch any results.
There is an extra parenthesis.
Instead of:
driver.find_elements_by_xpath("//*[contains(text()),'panel')]")
it should be:
driver.find_elements_by_xpath("//*[contains(text(),'panel')]")
Tip: To test your locators (XPath/CSS), instead of using them directly in code, try them out in a browser first.
Example for Chrome:
Right-click on the web page you are trying to automate
Click Inspect and press CTRL-F
Type in your XPath and press ENTER
You should be able to scroll through all the matched elements and also verify the total match count
To collect all the elements containing a certain text, e.g. panel, using Selenium, you have to induce a WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:
Using XPATH and contains():
elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[contains(., 'panel')]")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
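Once the elements are located, collecting the matches into a Python list is just a comprehension over their .text attribute. A minimal sketch (the function name is mine; with Selenium, elements would be the list returned by the wait above):

```python
def texts_containing(elements, needle):
    # Keep only the elements whose visible text contains `needle`,
    # returning the matched texts as a plain Python list.
    return [el.text for el in elements if needle in el.text]
```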
I tried to practice Selenium on indeed.ca. The following describes my steps:
Opened 'indeed.ca'
Typed 'IT support' in the search text area
Clicked on the first job among the group of published jobs
Clicked on the 'Apply Now' button
A pop-up window appeared with fields for data relevant to labels like 'First Name', 'Last Name', 'Email', and a 'Choose file' button to upload a resume.
After I switched the driver focus to the pop-up window, I am unable to locate the elements.
Here are all the links used:
https://www.indeed.ca/
https://www.indeed.ca/jobs?q=it+support&l=Toronto%2C+ON (with search criteria IT SUPPORT)
https://www.indeed.ca/jobs?q=it%20support&l=Toronto%2C%20ON&vjk=837c0cbbf26a68a7 (link for the window after clicking the first option in the jobs list)
I shared the screenshot of the pop-up window after clicking 'Apply Now'.
Try this. The fields are inside nested iframes:
driver.switch_to.frame(driver.find_element_by_id('indeed-ia-1532701404288-0-modal-iframe'))
driver.switch_to.frame(driver.find_element_by_tag_name('iframe'))
first_name = driver.find_element_by_id('input-applicant.name')
You can use this code after clicking on the Apply Now button.
There are two iframes; in order to interact with the newly opened pop-up, you will have to switch to both of the frames.
Code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path = r'D:/Automation/chromedriver.exe')
driver.maximize_window()
driver.get("https://www.indeed.ca/jobs?q=it%20support&l=Toronto%2C%20ON&vjk=837c0cbbf26a68a7")
wait = WebDriverWait(driver, 10)
apply_now = wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Apply Now']/ancestor::a")))
apply_now.click()
wait.until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"div.indeed-apply-bd>iframe")))
driver.switch_to.frame(driver.find_element_by_css_selector("iframe[src^='https://apply.indeed.com/indeedapply/resumeapply?']"))
Name = wait.until(EC.element_to_be_clickable((By.ID, "input-applicant.name")))
Name.send_keys("Vijay")
I have written a Python script that aims to take data off a website, but I am unable to navigate and loop through pages to collect the links. The website is https://www.shearman.com/people. The relevant markup on the site looks like this:
ul class="results-pagination"
li class/a href onclick="PageRequest('2', event)"
When I run the query below, it says that the element is not attached to the page:
try:
    # this navigates to the next page
    driver.find_element_by_xpath('//ul[@class="results-pagination"]/li/[@onclick=">"]').click()
    time.sleep(5)
except NoSuchElementException:
    break
Any ideas what I'm doing wrong?
Many thanks in advance.
Chris
You can try this code:
browser.get("https://www.shearman.com/people")
wait = WebDriverWait(browser, 30)
main_tab = browser.current_window_handle
navigation_buttons = browser.find_elements_by_xpath('//ul[@class="results-pagination"]//descendant::a')
size = len(navigation_buttons)
print('this is the length of the list:', size)
i = 0
while i < size:
    ActionChains(browser).key_down(Keys.CONTROL).click(navigation_buttons[i]).key_up(Keys.CONTROL).perform()
    browser.switch_to.window(main_tab)
    i = i + 1
    if i >= size:
        break
Make sure to import these:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
Note this will open each link in a new tab. As per your requirement, you can click on the next button using this XPath: //ul[@class="results-pagination"]//descendant::a
If you want to open the links one by one in the same tab, then you will have to handle stale element references, as all elements become stale once you navigate away from the main page.
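The same-tab variant can be sketched as a loop that re-locates the "next" control on every iteration, so a stale reference is never reused. This is a minimal model with names of my own choosing; with Selenium, find_next would be something like lambda: next(iter(driver.find_elements_by_xpath(...)), None), so it returns the freshly located button or None when there is no next page.

```python
def click_through_pages(find_next, max_pages=50):
    # Re-run the lookup on every iteration: after each navigation the old
    # element references go stale, so we must never reuse a cached one.
    pages_visited = 0
    while pages_visited < max_pages:
        next_button = find_next()   # fresh lookup each time
        if next_button is None:
            return pages_visited    # no 'next' control -> last page reached
        next_button.click()
        pages_visited += 1
    return pages_visited
```

The max_pages cap is just a safety net against a pagination control that never disappears.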