Selenium can't find element by class name - selenium

I am trying to scroll down by element class name. I need to scroll tweet by tweet on twitter.com.
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://twitter.com/elonmusk')
sleep(5)
while True:
    html = driver.find_element_by_class_name('css-901oao r-1fmj7o5 r-1qd0xha r-a023e6 r-16dba41 r-rjixqe r-bcqeeo r-bnwqim r-qvutc0')
    html.send_keys(Keys.END)
I get this error:
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .css-901oao r-1fmj7o5 r-1qd0xha r-a023e6 r-16dba41 r-rjixqe r-bcqeeo r-bnwqim r-qvutc0

That class name has spaces in your code; class_name does not work with spaces (it expects a single class). Try the below XPath:
//div[@data-testid='tweet']
and you can write like this in code :
counter = 1
while True:
    html = driver.find_element_by_xpath(f"(//div[@data-testid='tweet'])[{counter}]")
    counter = counter + 1
    html.send_keys(Keys.END)
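As an alternative, a multi-class value like the one in the question can be turned into a CSS selector by prefixing each class with a dot. A small sketch (the class string is copied from the question; Twitter's generated class names change over time, so treat it as an example value only):

```python
# Class attribute copied from the question; generated class names like
# these are brittle and may have changed on the live site since.
classes = "css-901oao r-1fmj7o5 r-1qd0xha r-a023e6 r-16dba41 r-rjixqe r-bcqeeo r-bnwqim r-qvutc0"

# A space-separated class attribute becomes a CSS selector by joining
# the classes with dots: ".a.b.c" matches elements carrying all of them.
css_selector = "." + ".".join(classes.split())
print(css_selector)
# The selector could then be used with:
# driver.find_element(By.CSS_SELECTOR, css_selector)
```

That said, generated class names are fragile; the data-testid XPath is the more stable choice.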

Related

Selenium error 'Message: no such element: Unable to locate element'

I get this error when trying to get the price of the product:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element
But the thing is that I am searching by XPath copied from the Inspect HTML panel. So how come it doesn't find it?
Here is the code:
main_page = 'https://www.takealot.com/500-classic-family-computer-tv-game-ejc/PLID90448265'
PATH = r'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
driver.get(main_page)
time.sleep(10)
active_price = driver.find_element(By.XPATH,'//*[@id="shopfront-app"]/div[3]/div[1]/div[2]/aside/div[1]/div[1]/div[1]/span').text
print(active_price)
I found the way to get the price the other way, but I am still interested why selenium can't find it by XPath:
price = driver.find_element(By.CLASS_NAME, 'buybox-module_price_2YUFa').text
active_price = ''
after_discount = ''
count = 0
for char in price:
    if char == 'R':
        count += 1
    if count == 2:
        after_discount += char
    else:
        active_price += char
print(active_price)
print(after_discount)
To extract the text R 345 ideally you need to induce WebDriverWait for the visibility_of_element_located() and you can use either of the following locator strategies:
Using CSS_SELECTOR and text attribute:
driver.get("https://www.takealot.com/500-classic-family-computer-tv-game-ejc/PLID90448265")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span[data-ref='buybox-price-main']"))).text)
Using XPATH and get_attribute("innerHTML"):
driver.get("https://www.takealot.com/500-classic-family-computer-tv-game-ejc/PLID90448265")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@data-ref='buybox-price-main']"))).get_attribute("innerHTML"))
Console Output:
R 345
Note : You have to add the following imports :
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
References
Link to useful documentation:
get_attribute() method - gets the given attribute or property of the element.
text attribute - returns the text of the element.
Difference between text and innerHTML using Selenium
You are trying to get the price from buybox-module_price_2YUFa, but the price is actually inside the child span. Use the following XPath to get the price:
//span[@data-ref='buybox-price-main']
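As a side note on the question's workaround: the character-by-character loop that splits the combined price text can be replaced by a regular expression. A sketch, assuming the element text concatenates the two prices in a form like "R 465R 345" (the exact values here are made up):

```python
import re

price = "R 465R 345"  # assumed combined text: active price, then discounted price

# Each price is an "R", an optional space, then digits (commas allowed
# as thousands separators, e.g. "R 1,299").
prices = re.findall(r"R\s?[\d,]+", price)
active_price, after_discount = prices
print(active_price)    # R 465
print(after_discount)  # R 345
```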

Does Selenium only work on JavaScript sites? (BeautifulSoup works but Selenium does not)

I am trying to scrape data from Google Finance with the following link:
https://www.google.com/finance/quote/ACN:NYSE
The section I am trying to fetch is on right side containing information like market cap, p/e ratio etc.
Earlier I thought it was javascript and wrote the following snippet:
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located

class_name = 'gyFHrc'
options = Options()
options.headless = True
service = Service('/usr/local/bin/geckodriver')
browser = Firefox(service=service, options=options)
browser.get(base_url+suffix)
wait = WebDriverWait(browser, 15)
wait.until(presence_of_element_located((By.CLASS_NAME, class_name))) # <--line 58
stuff = browser.find_elements(By.CLASS_NAME, class_name)
print(f'stuff-->{stuff}')
for elem in stuff:
    html = elem.get_attribute("outerHTML")
    # print(f'html:{html}')
I get the following error:
File "scraping_google_finance_js.py", line 58, in <module>
wait.until(presence_of_element_located((By.CLASS_NAME, class_name)))
File "/Users/me/opt/anaconda3/envs/scraping/lib/python3.10/site-packages/selenium/webdriver/support/wait.py", line 90, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
WebDriverError#chrome://remote/content/shared/webdriver/Errors.jsm:183:5
NoSuchElementError#chrome://remote/content/shared/webdriver/Errors.jsm:395:5
element.find/</<#chrome://remote/content/marionette/element.js:300:16
Later, I realised that this was plain HTML and I could use BeautifulSoup as follows:
class_name = 'gyFHrc'
soup = BeautifulSoup(html, 'html.parser')
box_rows = soup.find_all("div", class_name)
print(box_rows)
for row in box_rows:
    print(type(row), str(row.contents[1].contents))
This worked with following output:
<class 'bs4.element.Tag'> ['$295.14']
<class 'bs4.element.Tag'> ['$289.67 - $298.00']
<class 'bs4.element.Tag'> ['$261.77 - $417.37']
.....
The question is, why did it not work with Selenium? Did I do something wrong, or does Selenium only work with JavaScript sites?
Clearly, time to load the page was not the problem, as BeautifulSoup could fetch and parse the page.
The error selenium.common.exceptions.TimeoutException says the element you are trying to find was not located within the given time.
Possibly your internet connection is too slow to load the content in time; increase the wait time to get the result.
This error usually happens when Selenium can't find the desired tag or element, but in your case the element was there.
I checked the code with a few changes and it worked for me, so it's probably an issue with the element loading in time.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(ChromeDriverManager().install())
class_name = "gyFHrc"
driver.get("https://www.google.com/finance/quote/ACN:NYSE")
wait = WebDriverWait(driver, 15)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, class_name))) # <--line 58
stuff = driver.find_elements(By.CLASS_NAME, class_name)
print(f"stuff-->{stuff}")
for elem in stuff:
    html = elem.get_attribute("outerHTML")
    print(f"html:{html}")
Result

How To Scrape This Field

I want the following field, the "514" id. (The id is located in the first row of this webpage.)
I tried using xpath with class name and then get attribute, but that prints blank.
Here is a screenshot of the tag in question
Screenshot
import time
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.abstractsonline.com/pp8/#!/10517/sessions/#timeSlot=Apr08/1')
page_source = driver.page_source
element = driver.find_elements_by_xpath('.//li[@class="result clearfix"]')
for el in element:
    id = el.find_element_by_class_name('name').get_attribute("data-id")
    print(id)
You can do the find in a single call.
by css - .result.clearfix .name
by xpath - .//*[@class='result clearfix']//*[@class='name']
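To illustrate what that descendant selector matches without needing a browser, here is a stdlib-only sketch. The HTML fragment is made up, modelled on the screenshot in the question; the real page's markup may differ:

```python
from html.parser import HTMLParser

# Made-up fragment modelled on the screenshot: an element with class
# "name" (carrying data-id) nested inside li.result.clearfix.
SAMPLE = """
<ul>
  <li class="result clearfix">
    <h1 class="name" data-id="514">Session title</h1>
  </li>
  <li class="other">
    <h1 class="name" data-id="999">Unrelated</h1>
  </li>
</ul>
"""

class DataIdExtractor(HTMLParser):
    """Collects data-id from .name elements inside .result.clearfix."""

    def __init__(self):
        super().__init__()
        self.inside_result = False
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "result" in classes and "clearfix" in classes:
            self.inside_result = True
        elif self.inside_result and "name" in classes:
            self.ids.append(attrs.get("data-id"))

    def handle_endtag(self, tag):
        if tag == "li":
            self.inside_result = False

parser = DataIdExtractor()
parser.feed(SAMPLE)
print(parser.ids)  # ['514']
```

In Selenium the same match is a one-liner, e.g. driver.find_element(By.CSS_SELECTOR, ".result.clearfix .name").get_attribute("data-id").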

Selenium find element error, but the same code in JS works

Using Python to run JavaScript, the button element can be found and clicked; this works:
driver.execute_script("document.getElementsByClassName(\"g-c-R webstore-test-button-label\")[0].click()")
But the seemingly equivalent find_element call does not work:
element_install_bottom=driver.find_element(by=By.ID, value=r'g-c-R webstore-test-button-label')
and throw the exception:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":"[id="g-c-R webstore-test-button-label"]"}
I have tried By.ID and By.CLASS_NAME, but it always throws the exception.
The whole code is as below:
from selenium import webdriver
from selenium.webdriver.common.by import By
import selenium.webdriver.support.ui as ui
import time
option = webdriver.ChromeOptions()
USER_DATA_PATH = r"D:\Chrome\User Data 3"
option.add_argument(f'--user-data-dir={USER_DATA_PATH}')
option.add_experimental_option('excludeSwitches', ['enable-automation'])
print(option.arguments)
driver = webdriver.Chrome(options=option)
extension_url = "https://chrome.google.com/webstore/detail/dark-reader/eimadpbcbfnmbkopoojfekhnkhdbieeh?hl=zh-CN"
driver.get(extension_url)
time.sleep(60)
element_install_bottom=driver.find_element(by=By.ID, value=r'g-c-R webstore-test-button-label')
print(element_install_bottom)
You're seeing that exception because there is no element with that ID; that value is its class. Note that it is actually two space-separated classes, and CLASS_NAME only accepts a single class, so locate it with a CSS selector instead:
driver.find_element(By.CSS_SELECTOR, ".g-c-R.webstore-test-button-label")

How to get the "display: none" HTML from Selenium

I'm trying to get some content by using Selenium; however, I can't get the part with "display: none". I tried get_attribute('innerHTML') but it still does not work as expected.
Hope if you could share some knowledge.
[Here is the html][1]
[1]: https://i.stack.imgur.com/LdDL4.png
# -*- coding: utf-8 -*-
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import re
from pyvirtualdisplay import Display
from lxml import etree
driver = webdriver.PhantomJS()
driver.get('http://flights.ctrip.com/')
driver.maximize_window()
time.sleep(1)
element_time = driver.find_element_by_id('DepartDate1TextBox')
element_time.clear()
element_time.send_keys(u'2017-10-22')
element_arr = driver.find_element_by_id('ArriveCity1TextBox')
element_arr.clear()
element_arr.send_keys(u'北京')
element_depart = driver.find_element_by_id('DepartCity1TextBox')
element_depart.clear()
element_depart.send_keys(u'南京')
driver.find_element_by_id('search_btn').click()
time.sleep(1)
print(driver.current_url)
driver.find_element_by_id('btnReSearch').click()
print(driver.current_url)
overlay=driver.find_element_by_id("mask_loading")
print(driver.execute_script("return arguments[0].getAttribute('style')",overlay))
driver.quit()
display is a CSS property, not an HTML attribute, so to retrieve its value you can use getCssValue:
String my_display = driver.findElement(By.id("mask_loading")).getCssValue("display");
System.out.println("Display property is set to : " + my_display);
If the element's style attribute has the value display:none, then it is a hidden element. Selenium basically doesn't interact with hidden elements; you have to use Selenium's JavaScript executor to interact with them. You can get the style value as shown below.
WebElement overlay = driver.findElement(By.id("mask_loading"));
JavascriptExecutor je = (JavascriptExecutor) driver;
String style = (String) je.executeScript("return arguments[0].getAttribute('style');", overlay);
System.out.println("style value of the element is " + style);
It prints the value "z-index: 12;display: none;"
or if you want to get the innerHTML:
String innerHTML = (String) je.executeScript("return arguments[0].innerHTML;", overlay);
In Python,
overlay = driver.find_element_by_id("mask_loading")
style = driver.execute_script("return arguments[0].getAttribute('style')", overlay)
or
innerHTML=driver.execute_script("return arguments[0].innerHTML;", overlay)