Web-scraping a dynamic website with user input using Selenium and Python

As a swimmer, I am trying to pull times from a table that can only be accessed after the user inputs their name or other optional fields. The website generates this data dynamically. Below is my current code, which does not factor in user input.
I am quite confused about how Selenium's automation works: how to find the right text field for it to type into, and how the rest of my code should then extract the table.
Can anyone give some advice on how to proceed?
Any help is appreciated. Thanks in advance.
This is my current code:
from selenium import webdriver
import pandas as pd

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

site = 'https://www.swimming.org.nz/results.html'
# Selenium 3 style: chromedriver path as first argument; pass in the options defined above
wd = webdriver.Chrome("C:\\Users\\joseph\\webscrape\\chromedriver.exe", options=options)
wd.get(site)

# Grab the rendered page and let pandas parse any tables it finds
html = wd.page_source
df = pd.read_html(html)
df[1].to_csv('Results.csv')
wd.quit()

To start with, you need to send a character sequence to the Swimmer field.
As the elements are within an <iframe>, you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired element to be clickable.
You can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get("https://www.swimming.org.nz/results.html")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe#iframe")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[id^='x-MS_FIELD_MEMBER']"))).send_keys("Joseph Zhang")
Using XPATH:
driver.get("https://www.swimming.org.nz/results.html")
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@id='iframe']")))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[starts-with(@id, 'x-MS_FIELD_MEMBER')]"))).send_keys("Joseph Zhang")
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
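Putting the pieces together for your original goal of writing the results table to CSV, a minimal end-to-end sketch could look like the one below. The search-button and table locators are assumptions (inspect the form inside the iframe to confirm them), and the table index passed to pandas may need adjusting:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Assumes chromedriver is discoverable (on PATH or via Selenium Manager);
# otherwise pass the executable path as in your original code
driver = webdriver.Chrome(options=options)
driver.get("https://www.swimming.org.nz/results.html")

# Switch into the iframe that hosts the results form
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#iframe")))

# Type the swimmer's name into the Swimmer field
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input[id^='x-MS_FIELD_MEMBER']"))).send_keys("Joseph Zhang")

# Submit the search -- this button locator is an assumption; inspect the form to confirm it
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))).click()

# Wait for a table to render inside the iframe, then hand the frame's HTML to pandas
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.TAG_NAME, "table")))
df = pd.read_html(driver.page_source)[0]  # the index of the results table may differ
df.to_csv('Results.csv', index=False)

driver.quit()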
References
You can find a couple of relevant discussions in:
Switch to an iframe through Selenium and python
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element while trying to click Next button with selenium
selenium in python : NoSuchElementException: Message: no such element: Unable to locate element

Related

Extracting information from HTML using CLASS_NAME attribute

I'm trying to extract a table from HTML using Selenium. My intention is to capture everything under the class "iframe-b3", because I need the date and the values in the table.
https://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/boletim-diario/boletim-diario-do-mercado/
"Tabela: Participação dos Investidores"
Code trials:
driver.find_element(by=By.CLASS_NAME, value="iframe-b3")
print_tabela = driver.find_element(by=By.CLASS_NAME, value="iframe-b3")
I need to copy this information and transcribe it into Excel.
I could not find the table on the website. However, the datepicker element is within an <iframe>, so you have to:
Induce WebDriverWait for the desired frame to be available and switch to it.
Induce WebDriverWait for the desired presence_of_element_located.
You can use either of the following locator strategies:
Using CSS_SELECTOR:
driver.get('https://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/boletim-diario/boletim-diario-do-mercado/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe.iframe-b3")))
print(WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.duet-date input[name='date']"))).get_attribute("value"))
driver.quit()
Console Output:
2023-02-07
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
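If the end goal is the "Tabela: Participação dos Investidores" itself rather than the datepicker value, one possible approach, once you have switched into the iframe, is to wait for the tables to render and hand the frame's HTML to pandas. This is only a sketch, assuming the table is rendered as a plain <table> element inside the same iframe; the index (or a match= filter) will need adjusting to pick the right table:
import pandas as pd

driver.get('https://www.b3.com.br/pt_br/market-data-e-indices/servicos-de-dados/market-data/consultas/boletim-diario/boletim-diario-do-mercado/')
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#onetrust-accept-btn-handler"))).click()
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe.iframe-b3")))

# Wait until at least one table has rendered inside the iframe
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.TAG_NAME, "table")))

# read_html() returns every <table> in the frame; pick the one you need by index or with match=
tables = pd.read_html(driver.page_source)
tables[0].to_excel("boletim.xlsx", index=False)  # writing .xlsx requires openpyxl
driver.quit()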
Reference
You can find a couple of relevant discussions in:
Ways to deal with #document under iframe
Switch to an iframe through Selenium and python
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element while trying to click Next button with selenium
selenium in python : NoSuchElementException: Message: no such element: Unable to locate element

How do I get part of website that is not written out

How do I get a part of a website that is not written out? I would be interested in the number shown in the picture, which is not displayed directly on the page.
To print the value of the data-datetime attribute you can use either of the following Locator Strategies:
Using Python and CSS_SELECTOR:
print(driver.find_element(By.CSS_SELECTOR, "div.TimeStamp span[data-interval='60']").get_attribute("data-datetime"))
Using Java and xpath:
System.out.println(wd.findElement(By.xpath("//div[@class='TimeStamp']/span[@data-interval='60']")).getAttribute("data-datetime"));
Ideally you need to induce WebDriverWait for the visibilityOfElementLocated() / visibility_of_element_located() and you can use either of the following Locator Strategies:
Using Java and cssSelector:
System.out.println(new WebDriverWait(driver, 20).until(ExpectedConditions.visibilityOfElementLocated(By.cssSelector("div.TimeStamp span[data-interval='60']"))).getAttribute("data-datetime"));
Using Python and XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='TimeStamp']/span[@data-interval='60']"))).get_attribute("data-datetime"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
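As a side note, if the data-datetime value turns out to be a Unix timestamp in milliseconds (an assumption here; it could just as well be an ISO-8601 string), converting it to a readable datetime in Python would look like this:
from datetime import datetime, timezone

raw = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.TimeStamp span[data-interval='60']"))).get_attribute("data-datetime")
# Assumption: the attribute holds epoch milliseconds; drop the /1000 if it is in seconds,
# or parse with datetime.fromisoformat() if it is an ISO-8601 string
print(datetime.fromtimestamp(int(raw) / 1000, tz=timezone.utc))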

Selenium access table by xpath

I'm fairly new to Selenium and I'm building a scraper to extract info from a table.
I'm able to access the table body by ID with no problem, but when I try to access its children they are not found.
The inspector shows the XPath for the first cell as //*[@id="tb_list"]/tr[1]/td[1], but
find_element_by_xpath('//*[@id="tb_list"]/tr[1]/td[1]')
can't find it.
I also tried the following to no avail.
table = driver.find_element_by_id("tb_list")
table.find_element_by_xpath(".//tr[1]/td[1]")
It's able to find tb_list but fails to locate the children:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//tr[1]/td[1]"}
Everywhere I looked, people suggest one of these two methods, so what am I doing wrong? The table is dynamically populated from a database; could this be an issue?
I'm using Python and the Chrome web driver.
I'm hesitant to give a snippet of the HTML as the site is not publicly available and I don't own it.
[1] indicates the first descendant. So the XPath:
//*[@id="tb_list"]/tr[1]/td[1]
can be optimized as:
//*[@id="tb_list"]/tr/td
Effectively the line of code would be:
driver.find_element(By.XPATH, "//*[@id='tb_list']/tr/td")
Ideally, to locate the element you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following Locator Strategies:
Using XPATH:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//*[@id='tb_list']/tr/td")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
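Once the row is reachable, you will probably want all of the rows and cells rather than a single td. A short sketch of how that extraction could look, using the descendant axis // so the path also matches rows nested under a browser-inserted <tbody>:
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "tb_list")))
rows = driver.find_elements(By.XPATH, "//*[@id='tb_list']//tr")
# Build a list of lists: one inner list of cell texts per table row
data = [[cell.text for cell in row.find_elements(By.XPATH, "./td")] for row in rows]
print(data)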

find_element_by_class_name in selenium giving error

I am trying to automate a button click using Selenium, but it is giving me an error. The HTML code of the page is:
The code I am trying is:
create_team=driver.find_element_by_class_name('ts-btn ts-btn-fluent ts-btn-fluent-secondary ts-btn-fluent-with-icon join-team-button')
create_team.click()
I am getting the following error:
driver.find_element_by_class_name() only accepts a single class name; it is not built to handle multiple class names (see How to locate an element with multiple class names?), though this seems to be up for debate.
Use driver.find_element_by_css_selector('.ts-btn.ts-btn-fluent.ts-btn-fluent-secondary.ts-btn-fluent-with-icon.join-team-button') instead.
With driver.find_element_by_css_selector you can chain multiple class names together using a dot (.) between each class name in the selector.
To click on the element you can use either of the following Locator Strategies:
Using css_selector:
driver.find_element_by_css_selector("button.ts-btn.ts-btn-fluent.ts-btn-fluent-secondary.ts-btn-fluent-with-icon.join-team-button[data-tid='tg-discover-team']").click()
Using xpath:
driver.find_element_by_xpath("//button[#class='ts-btn ts-btn-fluent ts-btn-fluent-secondary ts-btn-fluent-with-icon join-team-button' and #data-tid='tg-discover-team']").click()
As the desired element is an Angular element, ideally to click on the element you need to induce WebDriverWait for the element_to_be_clickable() and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.ts-btn.ts-btn-fluent.ts-btn-fluent-secondary.ts-btn-fluent-with-icon.join-team-button[data-tid='tg-discover-team']"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='ts-btn ts-btn-fluent ts-btn-fluent-secondary ts-btn-fluent-with-icon join-team-button' and @data-tid='tg-discover-team']"))).click()
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
create_team = driver.find_element_by_class_name('ts-btn.ts-btn-fluent.ts-btn-fluent-secondary.ts-btn-fluent-with-icon.join-team-button')
create_team.click()
You have to replace each space with a ., since a space indicates multiple classes.
You can also use XPath or CSS; with those, match the full class attribute value:
create_team = driver.find_element_by_xpath("//*[@class='ts-btn ts-btn-fluent ts-btn-fluent-secondary ts-btn-fluent-with-icon join-team-button']")
create_team.click()
create_team = driver.find_element_by_css_selector("[class='ts-btn ts-btn-fluent ts-btn-fluent-secondary ts-btn-fluent-with-icon join-team-button']")
create_team.click()
More detail on the answer:
If you check the exception raised by find_element_by_class_name,
you can see that it is using a CSS selector locator under the hood (it adds the . in front automatically).
Working example:
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get("https://stackoverflow.com/questions/65579491/find-element-by-class-name-in-selenium-giving-error/65579606?noredirect=1#comment115946541_65579606")
time.sleep(5)
# Multiple class names chained with dots work because class_name is translated to a CSS selector
elem = driver.find_element_by_class_name('overflow-x-auto.ml-auto.-secondary.grid.ai-center.list-reset.h100')
print(elem.get_attribute("outerHTML"))
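Note that the find_element_by_* helpers used throughout these examples are deprecated and were removed in Selenium 4; with the current API the same lookups would be written, for example, as:
from selenium.webdriver.common.by import By

# Selenium 4 style: a single find_element() that takes a By strategy
create_team = driver.find_element(By.CSS_SELECTOR, "button.ts-btn.ts-btn-fluent.ts-btn-fluent-secondary.ts-btn-fluent-with-icon.join-team-button")
create_team.click()

# By.CLASS_NAME still accepts only a single class name
join_button = driver.find_element(By.CLASS_NAME, "join-team-button")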

How to find all the elements containing a certain text using xpath Selenium and Python

I tried
driver.find_elements_by_xpath("//*[contains(text()),'panel')]")
but it's only pulling through a singular result when there should be about 25.
I want to find all the xpath id's containing the word panel on a webpage in selenium and create a list of them. How can I do this?
The XPath in your question has an error; not sure if it is a typo, but this will not fetch any results.
There is an extra parenthesis.
Instead of :
driver.find_elements_by_xpath("//*[contains(text()),'panel')]")
It should be :
driver.find_elements_by_xpath("//*[contains(text(),'panel')]")
Tip: To test your locators (XPath/CSS), instead of using them directly in code, try them out in a browser first.
Example for Chrome:
Right click on the web page you are trying to automate
Click Inspect and do a CTRL-F.
Type in your xpath and press ENTER
You should be able to scroll through all the matched elements and also verify the total match count
To collect all the elements containing a certain text, e.g. panel, using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located() and you can use the following Locator Strategy:
Using XPATH and contains():
elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[contains(., 'panel')]")))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
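If what you ultimately want is a list of the id attributes of the matched elements, you can build it from the returned element list, for example:
elements = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[contains(text(), 'panel')]")))
# Elements without an id attribute come back as an empty string / None
ids = [element.get_attribute("id") for element in elements]
print(ids)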