Scraping web-page with button-multitems click

Scraping web-page with button-multitems click - selenium

I'm trying to scrape this web page: https://whalewisdom.com/filer/fisher-asset-management-llc#tabholdings_tab_link
I would like to set up Python Selenium code that switches the holdings table to show "50" items per page.
But my code clicks the wrong button. Where is the error in my code?
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.FirefoxOptions()
options.binary_location = r'C:/Users/Mozilla Firefox/firefox.exe'
driver = webdriver.Firefox(executable_path='C:/geckodriver.exe', options=options)
driver.execute("get", {'url': 'https://whalewisdom.com/filer/fisher-asset-management-llc#tabholdings_tab_link'})
driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//label[@id='qtr-1-label']"))))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[@class='btn btn-default dropdown-toggle']"))).click()
Thank you for your help.
-ag

Your code clicks the wrong button because several elements share exactly the same class, and your locator fetches the first one and clicks it.
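To see why the first match gets clicked, here is a small offline sketch using the standard-library `xml.etree.ElementTree` on simplified, hypothetical markup (not the real whalewisdom HTML; the ids `quarter-picker` and `page-size` are invented). `find()` returns only the first matching element, much as Selenium's `find_element` does, while `findall()` shows that the class alone matches several buttons. Scoping the path to the intended container removes the ambiguity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two dropdowns share the same button class.
HTML = """
<div>
  <div id="quarter-picker" class="btn-group">
    <button class="btn btn-default dropdown-toggle">Quarter</button>
  </div>
  <div id="page-size" class="btn-group dropdown">
    <button class="btn btn-default dropdown-toggle">Items per page</button>
  </div>
</div>
"""

root = ET.fromstring(HTML)
cls = "btn btn-default dropdown-toggle"

# find() stops at the first match in document order, like find_element
first = root.find(f".//button[@class='{cls}']")

# findall() shows that the class on its own is ambiguous
matches = root.findall(f".//button[@class='{cls}']")

# Scoping the search to the intended container picks the right button
scoped = root.find(f".//div[@id='page-size']/button[@class='{cls}']")

print(first.text, len(matches), scoped.text)  # Quarter 2 Items per page
```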
I also noticed that a popup sometimes appears on the page, which can make other elements non-interactable. So we want to close the popup first (if it appears) and then continue.
Using Chrome driver
Setup and Imports
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# REPLACE YOUR CHROME PATH HERE
chrome_path = r"C:\Users\hpoddar\Desktop\Tools\chromedriver_win32\chromedriver.exe"
s = Service(chrome_path)
driver = webdriver.Chrome(service=s)
Fetch the page
driver.get('https://whalewisdom.com/filer/fisher-asset-management-llc#tabholdings_tab_link')
Close the popup (if it appears)
try:
    popup = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//a[@id='dfwid-close-184302']")))
    popup.click()
except TimeoutException:
    print("No popup appeared on the page")
Click the dropdown, then the menu item 50
dropdown = driver.find_element(By.CSS_SELECTOR, '.btn-group.dropdown')
dropdown.click()
ele50 = driver.find_element(By.XPATH, '//li[@role="menuitem"]/a[contains(text(), "50")]')
ele50.click()
Output
The code above clicks the "50" item.
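One caveat, offered as an assumption since the full list of page sizes isn't shown here: `contains(text(), "50")` is a substring test, so it would also match a hypothetical "500" or "150" entry. An exact-match XPath such as `//li[@role="menuitem"]/a[normalize-space()="50"]` avoids that. The sketch below mimics both predicates in plain Python over a hypothetical list of menu labels.

```python
# Hypothetical menu labels; the real page may offer different sizes.
labels = ["10", "25", "50", "100", "500"]

# Like XPath a[contains(text(), "50")]: a substring test
substring_hits = [label for label in labels if "50" in label]

# Like XPath a[normalize-space()="50"]: collapse whitespace, then compare exactly
exact_hits = [label for label in labels if " ".join(label.split()) == "50"]

print(substring_hits)  # ['50', '500']
print(exact_hits)      # ['50']
```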
Using Firefox driver
The other imports are the same as above, but Service should come from the Firefox package; the rest of the code is almost identical.
from selenium.webdriver.firefox.service import Service
# REPLACE YOUR FIREFOX DRIVER PATH HERE
firefoxpath = r'C:\Users\hpoddar\Desktop\Tools\firefoxdriver\geckodriver.exe'
s = Service(firefoxpath)
driver = webdriver.Firefox(service=s)
driver.get('https://whalewisdom.com/filer/fisher-asset-management-llc#tabholdings_tab_link')
try:
    popup = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, "//a[@id='dfwid-close-184302']")))
    popup.click()
except TimeoutException:
    print("No popup appeared on the page")
dropdown = driver.find_element(By.CSS_SELECTOR, '.btn-group.dropdown')
dropdown.click()
ele50 = driver.find_element(By.XPATH, '//li[@role="menuitem"]/a[contains(text(), "50")]')
ele50.click()
Output
This similarly clicks the desired element.

Related

Selenium trouble finding button

I am trying to click the Download button on the page, using the following code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import datetime
d_ref = datetime.date.today()
# Build the options (including the download folder) before starting the driver
chrome_options = webdriver.ChromeOptions()
prefs = {'download.default_directory': 'D:\\User\\Download'}
chrome_options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(service=Service('D:\\User\\Download\\chromedriver.exe'), options=chrome_options)
driver.get('https://www.anbima.com.br/pt_br/informar/sistema-reune.htm')
# driver.maximize_window()
driver.execute_script("window.scrollTo(0, 320);")
driver.switch_to.frame(0)
# driver.find_element(By.NAME, "Dt_Ref").clear()
# driver.find_element(By.NAME, "Dt_Ref").send_keys(d_ref.strftime('%d%m%Y'))
dropdown = driver.find_element(By.ID, "TpInstFinanceiro")
dropdown.find_element(By.XPATH, "//option[. = 'C F F']").click()
driver.find_element(By.CSS_SELECTOR, "fieldset:nth-child(3) input:nth-child(1)").click()

The CSS selector fieldset:nth-child(3) matches the Financial Instrument field, while fieldset:nth-child(3) input:nth-child(1) does not match any element in the DOM.
The following CSS selector matches the Download option in the DOM:
fieldset:nth-child(5) input:nth-child(4)
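A likely reason `input:nth-child(1)` matches nothing is that `:nth-child(n)` counts all element siblings, not just elements of the named type: `input:nth-child(1)` only matches an `input` that is the very first child of its parent. The sketch below reproduces that rule with `xml.etree.ElementTree` on a hypothetical fieldset (the real ANBIMA markup may differ) and contrasts it with `:nth-of-type(n)`, which counts only siblings with the same tag. The helpers `nth_child` and `nth_of_type` are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical fieldset: the first child is a <legend>, not an <input>.
FIELDSET = """
<fieldset>
  <legend>Actions</legend>
  <input type="button" value="Search"/>
  <input type="button" value="Clear"/>
  <input type="button" value="Download"/>
</fieldset>
"""

def nth_child(parent, tag, n):
    """Mimic CSS `tag:nth-child(n)`: the n-th element child (1-based), only if it has that tag."""
    children = list(parent)
    if 1 <= n <= len(children) and children[n - 1].tag == tag:
        return children[n - 1]
    return None

def nth_of_type(parent, tag, n):
    """Mimic CSS `tag:nth-of-type(n)`: the n-th child among siblings with that tag."""
    same_type = [child for child in parent if child.tag == tag]
    return same_type[n - 1] if 1 <= n <= len(same_type) else None

fieldset = ET.fromstring(FIELDSET)
print(nth_child(fieldset, "input", 1))                 # None: child 1 is <legend>
print(nth_child(fieldset, "input", 4).get("value"))    # Download
print(nth_of_type(fieldset, "input", 3).get("value"))  # Download
```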
It is better to close the cookie pop-up first so that you can interact with the other elements, and to use explicit waits.
# Imports
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver.get("https://www.anbima.com.br/pt_br/informar/sistema-reune.htm")
wait = WebDriverWait(driver, 20)
# Click on Proceed on the cookie pop-up
wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@class='LGPD_ANBIMA_global_sites__text__btn']"))).click()
# Switch to the iframe
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@class]")))
# Select the Download option
wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "fieldset:nth-child(5) input:nth-child(4)"))).click()
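For readers wondering what an explicit wait actually does: `WebDriverWait(driver, timeout).until(condition)` calls the condition repeatedly (every 0.5 s by default) until it returns a truthy value, treats `NoSuchElementException` as "not ready yet", and raises `TimeoutException` once the timeout expires. The plain-Python sketch below reproduces that polling idea without a browser; `wait_until` and `element_appears` are hypothetical names for illustration, not Selenium API, and `LookupError`/`TimeoutError` stand in for Selenium's exceptions.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.5, ignored=(LookupError,)):
    """Call `condition` every `poll` seconds until it returns something truthy.

    Exceptions in `ignored` count as "not ready yet", similar to how
    WebDriverWait ignores NoSuchElementException by default.
    Raises TimeoutError if nothing truthy arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)

# Hypothetical condition: the "element" only shows up on the third check.
attempts = {"count": 0}

def element_appears():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("not in the DOM yet")
    return "button"

print(wait_until(element_appears, timeout=2, poll=0.01))  # button

# An optional element (like the cookie pop-up) is handled by catching the timeout.
try:
    wait_until(lambda: None, timeout=0.05, poll=0.01)
except TimeoutError:
    print("No pop-up appeared; carrying on")
```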

Print piece of text using selenium with multiple open windows

With the help of the community I have been able to write code that prints a line of text from a webpage. However, I now want the code to print that piece of text for multiple webpages that match a certain XPath selector. How can this be done?
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time
driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")
driver.get('https://www.flashscore.com/')
wait = WebDriverWait(driver, 20)
driver.maximize_window()  # For maximizing window
time.sleep(2)
driver.find_element_by_id('onetrust-reject-all-handler').click()
matchpages = driver.find_elements_by_xpath("//*[@class='preview-ico icon--preview']//*[name()='use']")
for matchpage in matchpages:
    matchpage.click()
    new_window = driver.window_handles[1]
    original_window = driver.window_handles[0]
    driver.switch_to.window(driver.window_handles[1])
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.previewShowMore.showMore"))).click()
    main = driver.find_element(By.XPATH, "//div[@class='previewLine' and ./b[text()='Hot stat:']]").text
    main = main.replace('Hot stat:', '')
    print(main)
    driver.close()
    driver.switch_to.window(original_window)
I think the following line selects the first 'preview' page:
new_window = driver.window_handles[1]
However, this then needs to be adjusted to all the 'preview' pages on flashscore.com.
Furthermore, the following lines should also be incorporated in the opened windows, as I would like to print out these lines in order to get a quick overview of the hot stats of that day.
main = driver.find_element(By.XPATH, "//div[@class='previewLine' and ./b[text()='Hot stat:']]").text
main = main.replace('Hot stat:', '')
print(main)
Thanks in advance! : )

The code you provided was close. I ended up changing a few things, such as:
Used webdriver manager instead of locally installed version
Used service and options within webdriver.Chrome()
Used XPATH for most of the elements
Code is below:
NOTE: I had to click through to the next day to get PREVIEW buttons to test with. That code sits between two comment markers; remove it if you don't need it.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--disable-extensions')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--ignore-certificate-errors')
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)
url = 'https://www.flashscore.com/'
driver.get(url)
wait = WebDriverWait(driver, 20)
driver.maximize_window()  # For maximizing window
wait.until(ec.visibility_of_element_located((By.XPATH, "/html/body/div[6]/div[1]/div/div[1]/div[2]/div[4]/div[2]/div/section/div/div/div[2]")))
driver.find_element(By.ID, 'onetrust-reject-all-handler').click()
# Put this code in so I could test (clicks to the next day so there were PREVIEW buttons to click)
driver.find_element(By.XPATH, "/html/body/div[6]/div[1]/div/div[1]/div[2]/div[4]/div[2]/div/div[1]/div[2]/div/div[3]").click()
#
wait.until(ec.visibility_of_element_located((By.XPATH, "/html/body/div[6]/div[1]/div/div[1]/div[2]/div[4]/div[2]/div/section/div/div/div[2]")))
# Changed this to find all svg tags with the class of preview-ico icon--preview
matchpages = driver.find_elements(By.XPATH, "//*[local-name()='svg' and @class='preview-ico icon--preview']/..")
# Loop through those elements found
for matchpage in matchpages:
    try:
        matchpage.click()
        # Switch to pop-up window
        driver.switch_to.window(driver.window_handles[1])
        wait.until(ec.visibility_of_element_located((By.XPATH, "/html/body/div[2]/div/div[7]/div[1]")))
        # click on the show more
        driver.find_element(By.XPATH, "/html/body/div[2]/div/div[7]/div[2]/div[3]").click()
        # get text of Hot stat element
        main = driver.find_element(By.XPATH, "/html/body/div[2]/div/div[7]/div[2]/div[6]").text
        main = main.replace('Hot stat:', '')
        print(main)
        # Scroll to close window and click it
        close = driver.find_element(By.XPATH, "//*[contains(text(), 'Close window')]")
        driver.execute_script("arguments[0].scrollIntoView();", close)
        driver.find_element(By.XPATH, "//*[contains(text(), 'Close window')]").click()
        # Switch back to main window
        driver.switch_to.window(driver.window_handles[0])
    # Handle timeout
    except TimeoutException:
        close = driver.find_element(By.XPATH, "//*[contains(text(), 'Close window')]")
        driver.execute_script("arguments[0].scrollIntoView();", close)
        driver.find_element(By.XPATH, "//*[contains(text(), 'Close window')]").click()
        driver.switch_to.window(driver.window_handles[0])
    # Handle no element found
    except NoSuchElementException:
        close = driver.find_element(By.XPATH, "//*[contains(text(), 'Close window')]")
        driver.execute_script("arguments[0].scrollIntoView();", close)
        driver.find_element(By.XPATH, "//*[contains(text(), 'Close window')]").click()
        driver.switch_to.window(driver.window_handles[0])
driver.quit()
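The loop above assumes each preview always opens as `window_handles[1]`. A slightly more defensive approach, sketched here as an assumption rather than tested against flashscore, is to record the handles before clicking and switch to whichever handle is new afterwards. The helper `new_window_handle` is hypothetical and works on plain lists, so it can be checked without a browser; with Selenium you would pass `driver.window_handles` before and after the click.

```python
def new_window_handle(before, after):
    """Return the single handle present in `after` but not in `before`."""
    new = set(after) - set(before)
    if len(new) != 1:
        raise RuntimeError(f"expected exactly one new window, found {len(new)}")
    return new.pop()

# Hypothetical handle strings standing in for driver.window_handles.
before = ["main-handle"]
after = ["main-handle", "preview-handle"]
print(new_window_handle(before, after))  # preview-handle

# With Selenium (sketch, not run here):
#   before = driver.window_handles
#   matchpage.click()
#   WebDriverWait(driver, 10).until(ec.new_window_is_opened(before))
#   driver.switch_to.window(new_window_handle(before, driver.window_handles))
```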
EDIT
To handle either a "Hot streak" or a "Hot stat" label, add an if/elif statement after reading the text into main:
main = driver.find_element(By.XPATH, "/html/body/div[2]/div/div[7]/div[2]/div[6]").text
if 'Hot stat:' in main:
    main = main.replace('Hot stat:', '')
elif 'Hot streak:' in main:
    main = main.replace('Hot streak:', '')
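If more variants turn up, a single regular expression can strip any of the labels in one step. This is a sketch: the two labels come from the answer above, the sample strings are made up, and unlike `str.replace` the pattern also removes the whitespace after the colon and only touches a label at the start of the text.

```python
import re

# Matches a leading "Hot stat:" or "Hot streak:" label plus the whitespace after it.
LABEL = re.compile(r"^\s*Hot (?:stat|streak):\s*")

def strip_label(text):
    """Remove one leading Hot stat/Hot streak label, leaving other text untouched."""
    return LABEL.sub("", text, count=1)

# Hypothetical sample texts
print(strip_label("Hot stat: Team A have won their last 5 home games."))
print(strip_label("Hot streak: Team B unbeaten in 8."))
print(strip_label("No label here"))
```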

Selenium: How use while loop to click link if it exists?

I am trying to write a Python program that uses Selenium to click a button to go to the next page if the button is clickable. This is because I am web scraping from varying amounts of pages.
I have tried to use a while loop that checks the href attribute, but the code neither clicks the button nor raises an error. If I simply call button.click() without the while loop or the href check, the program clicks the button correctly.
My while loop condition is "variable is not None". Is this a valid use of "is not"? My intent is for the program to click the button and go to the next page whenever the button has an href to follow.
Code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
import numpy as np
import pandas as pd
PATH = r"C:\Program Files (x86)\chromedriver.exe"
wd = webdriver.Chrome(PATH)
wd.get("https://profiles.ucr.edu/app/home/search;name=;org=Physics%20and%20Astronomy;title=;phone=;affiliation=Faculty")
time.sleep(1)
button = wd.find_element_by_xpath("""//a[@aria-label='Next page']""")
#<a tabindex="0" aria-label="Next page" class="ng-star-inserted" style=""> Next <span class="show-for-sr">page</span></a>
href_data = button.get_attribute('href')
while (href_data is not None):
    time.sleep(0.5)
    button.click()
    href_data = button.get_attribute('href')
Would anyone here be willing to assist me with this? I understand that Selenium requires the user to download a webdriver, so I apologize for any difficulties with testing my code.
Thank you, ExactPlace441

To keep clicking until every page has been visited:
Imports
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time
wd.get('https://profiles.ucr.edu/app/home/search;name=;org=Physics%20and%20Astronomy;title=;phone=;affiliation=Faculty')
wait = WebDriverWait(wd, 10)
while True:
    try:
        wait.until(EC.element_to_be_clickable((By.XPATH, "//a[@aria-label='Next page']"))).click()
        time.sleep(5)
    except TimeoutException:
        # No clickable "Next page" link within 10 s: treat this as the last page
        break

I faced the same problem and solved it by using geckodriver (Selenium with Firefox) instead of Chrome. My code worked perfectly with Firefox, but the same code did not work with Chrome: without a while loop I had no problem clicking the button in Chrome, but it stopped working once I added the loop. Here is an example of a while loop you can use. It keeps clicking the button until the button disappears, i.e. until the last page is reached.
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
while True:
    try:
        button_element = driver.find_element(By.XPATH, "give your button xpath")
    except NoSuchElementException:
        # The button's XPath no longer matches anything, so we are on the last page
        break
    button_element.click()  # keep clicking for as long as the button is present
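The stop-when-the-button-disappears logic can be exercised without a browser. In this sketch `FakePager` is a hypothetical stand-in whose `find_next_button()` raises `LookupError` on the last page, the way Selenium's `find_element` raises `NoSuchElementException` when nothing matches; `click_through` is the same loop shape as above.

```python
class FakePager:
    """Hypothetical stand-in for a paginated site with `last_page` pages."""

    def __init__(self, last_page):
        self.page = 1
        self.last_page = last_page

    def find_next_button(self):
        # Mimics find_element raising NoSuchElementException on the last page
        if self.page >= self.last_page:
            raise LookupError("no 'next page' button")
        return self

    def click(self):
        self.page += 1

def click_through(pager):
    """Click "next" until the button is gone; return the pages visited."""
    visited = [pager.page]
    while True:
        try:
            button = pager.find_next_button()
        except LookupError:
            break  # button gone: last page reached
        button.click()
        visited.append(pager.page)
    return visited

print(click_through(FakePager(4)))  # [1, 2, 3, 4]
```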
Here is your code, modified:
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.maximize_window()
url = "https://profiles.ucr.edu/app/home/search;name=;org=Physics%20and%20Astronomy;title=;phone=;affiliation=Faculty"
driver.get(url)
timeout = 20
# Collect the names from the first page
containers = WebDriverWait(driver, timeout).until(EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="column ng-star-inserted"]')))
for container in containers:
    name = container.find_element(By.CSS_SELECTOR, '.header-details h5')  # scrape the name from each card
    print(name.text)
# Look for the "next page" button on every page and keep clicking it until the last page is reached
while True:
    try:
        next_page_button = driver.find_element(By.XPATH, "//li[@class='pagination-next ng-star-inserted']")
        next_page_button.click()
        # Collect the names from the second page through the last page
        containers = WebDriverWait(driver, timeout).until(EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="column ng-star-inserted"]')))
        for container in containers:
            name = container.find_element(By.CSS_SELECTOR, '.header-details h5')
            print(name.text)
        time.sleep(3)
    except (NoSuchElementException, TimeoutException):
        # No "next page" button any more: stop without raising an error
        break

How to locate button containing text with Selenium?

What I need: switch to the Reviews tab in the description of a Chrome Web Store extension (e.g. this one) in order to count the number of reviews.
What I've done: used BeautifulSoup + Selenium to switch between tabs. I used driver.find_element_by_id('id'), but it raises an error saying it cannot find the element.
Here's the code I use:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get(url)
button = driver.find_element_by_id(':22')
button.click()
page = requests.get(driver.current_url)
soup = BeautifulSoup(page.content, 'html5lib')
comment_list = soup.find('div', class_='e-f-b-L')  # the class of reviews I need to count
Here's the html-code of the Review button element:
Issues:
How do I make it click the 'Reviews' button so the 'Reviews' tab is displayed?

You can click the Reviews tab quite easily if you define a simple XPath such as '//div[.="Reviews"]'. Check out this script as a proof of concept:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import ui
url = "https://chrome.google.com/webstore/detail/emoji-keyboard-by-emojion/ipdjnhgkpapgippgcgkfcbpdpcgifncb?hl=en"
driver = webdriver.Chrome()
wait = ui.WebDriverWait(driver, 10)
driver.get(url)
wait.until(lambda driver: driver.find_element(By.XPATH, '//div[.="Reviews"]')).click()
driver.quit()
To make it headless:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import ui
url = "https://chrome.google.com/webstore/detail/emoji-keyboard-by-emojion/ipdjnhgkpapgippgcgkfcbpdpcgifncb?hl=en"
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("--headless")
driver = webdriver.Chrome(options=chromeOptions)
wait = ui.WebDriverWait(driver, 10)
driver.get(url)
wait.until(lambda driver: driver.find_element(By.XPATH, '//div[.="Reviews"]')).click()
print("It's done")
driver.quit()
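The predicate `[.="Reviews"]` compares the element's whole string value (its text including descendants) with "Reviews", which is why it finds the tab without relying on a generated id such as `:22`. Python's `xml.etree.ElementTree` (3.7+) supports a similar `[.='text']` predicate, so the behaviour can be tried offline on a hypothetical tab strip; the real Web Store markup will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical tab strip; ids and structure are invented for illustration.
TABS = """
<div>
  <div id=":21">Overview</div>
  <div id=":22"><span>Reviews</span></div>
  <div id=":23">Related</div>
</div>
"""

root = ET.fromstring(TABS)
# [.='Reviews'] compares the full text content, including the nested <span>
tab = root.find(".//div[.='Reviews']")
print(tab.get("id"))  # :22
```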

Download all secret links from a map design website

There is a website which shows links on a map (map layer currently can't be shown but links can be shown as points).
To reach the map, follow these steps (pictures 1-3 also show the way):
First, open the website 'http://svtbilgi.dsi.gov.tr/Sorgu.aspx'.
Second, choose '15. Kizilirmak Havzasi' from the 'Havza' dropdown.
Finally, click the 'sorgula' button.
After the final step you should see the page ('http://svtbilgi.dsi.gov.tr/HaritaNew.aspx') where the points are shown on a map.
Normally, I can use Selenium to download web pages, or grab all links with other libraries. However, these methods can't obtain the links here because they are embedded in an obscure way.
I would like to download all the links that these points have.
For example, the script below doesn't get past the 'parent_handle = driver.current_window_handle' line, and I don't know why.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Firefox(executable_path=r'D:\geckodriver.exe')
driver.get("http://svtbilgi.dsi.gov.tr/Sorgu.aspx")
driver.find_element_by_id("ctl00_hld1_cbHavza").click()
Select(driver.find_element_by_id("ctl00_hld1_cbHavza")).select_by_visible_text("15. Kizilirmak Havzasi")
driver.find_element_by_id("ctl00_hld1_cbHavza").click()
driver.find_element_by_id("ctl00_hld1_btnListele").click()
parent_handle = driver.current_window_handle
all_urls = []
all_images = driver.find_elements_by_xpath("//div[contains(@id,'OL_Icon')]/img")
for image in all_images:
    image.click()
    for handle in driver.window_handles:
        if handle != parent_handle:
            driver.switch_to.window(handle)
            WebDriverWait(driver, 5).until(lambda d: d.execute_script('return document.readyState') == 'complete')
            all_urls.append(driver.current_url)
            driver.close()
            driver.switch_to.window(parent_handle)

Why not click the icons one by one and then read the URL of each opened window with driver.current_url?
In the code below, I first wait for all the images and then perform the click with the ActionChains class, since the normal Selenium click() wasn't working.
Complete code in Python:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(service=Service(r'D:\Test automation\chromedriver.exe'))
driver.get("http://svtbilgi.dsi.gov.tr/Sorgu.aspx")
driver.find_element(By.ID, "ctl00_hld1_cbHavza").click()
Select(driver.find_element(By.ID, "ctl00_hld1_cbHavza")).select_by_visible_text("15. Kizilirmak Havzasi")
driver.find_element(By.ID, "ctl00_hld1_btnListele").click()
parent_handle = driver.current_window_handle
driver.maximize_window()
all_urls = []
all_images = WebDriverWait(driver, 15).until(EC.presence_of_all_elements_located((By.XPATH, "//div[contains(@id,'OL_Icon')]/img")))
print(len(all_images))
for image in all_images:
    # A plain image.click() did not work here, so move to the icon and click it via ActionChains
    webdriver.ActionChains(driver).move_to_element(image).click(image).perform()
    for handle in driver.window_handles:
        if handle != parent_handle:
            driver.switch_to.window(handle)
            WebDriverWait(driver, 15).until(lambda d: d.execute_script('return document.readyState') == 'complete')
            all_urls.append(driver.current_url)
            driver.close()
            driver.switch_to.window(parent_handle)
print(all_urls)
