How To Scrape This Field - selenium

I want to scrape the following field, the "514" id (the id is located in the first row of this webpage).
I tried using XPath with the class name and then get_attribute, but that prints blank.
Here is a screenshot of the tag in question
import time
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.abstractsonline.com/pp8/#!/10517/sessions/#timeSlot=Apr08/1')
page_source = driver.page_source
element = driver.find_elements_by_xpath('.//li[@class="result clearfix"]')
for el in element:
    id = el.find_element_by_class_name('name').get_attribute("data-id")
    print(id)

You can use a single find call.
By CSS - .result.clearfix .name
By XPath - .//*[@class='result clearfix']//*[@class='name']
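A minimal sketch of how that locator could be used end to end, assuming (as in the question) that the data-id attribute sits on the .name element and that the results need a wait before they render:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://www.abstractsonline.com/pp8/#!/10517/sessions/#timeSlot=Apr08/1')
# Wait until the result list has rendered, then read data-id from each .name element
names = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.result.clearfix .name'))
)
for name in names:
    print(name.get_attribute('data-id'))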

Related

Selenium error 'Message: no such element: Unable to locate element'

I get this error when trying to get the price of the product:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element
But the thing is that I copied the XPath from the browser's Inspect panel. So how come it can't find it?
Here is the code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

main_page = 'https://www.takealot.com/500-classic-family-computer-tv-game-ejc/PLID90448265'
PATH = r'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
driver.get(main_page)
time.sleep(10)
active_price = driver.find_element(By.XPATH, '//*[@id="shopfront-app"]/div[3]/div[1]/div[2]/aside/div[1]/div[1]/div[1]/span').text
print(active_price)
I found a way to get the price differently, but I am still interested in why Selenium can't find it by that XPath:
price = driver.find_element(By.CLASS_NAME, 'buybox-module_price_2YUFa').text
active_price = ''
after_discount = ''
count = 0
for char in price:
    if char == 'R':
        count += 1
    if count == 2:
        after_discount += char
    else:
        active_price += char
print(active_price)
print(after_discount)
To extract the text R 345, ideally you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR and text attribute:
driver.get("https://www.takealot.com/500-classic-family-computer-tv-game-ejc/PLID90448265")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span[data-ref='buybox-price-main']"))).text)
Using XPATH and get_attribute("innerHTML"):
driver.get("https://www.takealot.com/500-classic-family-computer-tv-game-ejc/PLID90448265")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@data-ref='buybox-price-main']"))).get_attribute("innerHTML"))
Console Output:
R 345
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
References
Links to useful documentation:
get_attribute() method - gets the given attribute or property of the element.
text attribute - returns the text of the element.
Difference between text and innerHTML using Selenium
You are trying to get the price from buybox-module_price_2YUFa, but the price is actually inside the child span. Use the following XPath to get it:
//span[@data-ref='buybox-price-main']
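A short Python sketch of that idea (an illustration, not from the original answer), reusing the driver and the buybox-module_price_2YUFa element from the question's workaround and reading the child span instead of parsing the combined text:
# Locate the parent price container, then read the text of the child span
parent = driver.find_element(By.CLASS_NAME, 'buybox-module_price_2YUFa')
active_price = parent.find_element(By.XPATH, ".//span[@data-ref='buybox-price-main']").text
print(active_price)
Note that class names like buybox-module_price_2YUFa are auto-generated and may change, so the data-ref locator is the more stable choice.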

Selenium can't find element by class

I'm trying to scroll down by element class. I need to scroll tweet by tweet on twitter.com.
from time import sleep
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://twitter.com/elonmusk')
sleep(5)
while True:
    html = driver.find_element_by_class_name('css-901oao r-1fmj7o5 r-1qd0xha r-a023e6 r-16dba41 r-rjixqe r-bcqeeo r-bnwqim r-qvutc0')
    html.send_keys(Keys.END)
I get this error:
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: .css-901oao r-1fmj7o5 r-1qd0xha r-a023e6 r-16dba41 r-rjixqe r-bcqeeo r-bnwqim r-qvutc0
That class name has spaces; find_element_by_class_name does not work with space-separated (compound) class names. Try the XPath below instead:
//div[@data-testid='tweet']
and you can write like this in code :
counter = 1
while True:
    html = driver.find_element_by_xpath(f"(//div[@data-testid='tweet'])[{counter}]")
    counter = counter + 1
    html.send_keys(Keys.END)
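If you do want to match on those classes anyway, a possible alternative (not part of the original answer) is a CSS selector with the classes joined by dots; keep in mind that Twitter's auto-generated class names change often, so the data-testid XPath above is usually more stable:
# Compound class names can be expressed as a CSS selector by joining them with dots
tweet = driver.find_element_by_css_selector(
    '.css-901oao.r-1fmj7o5.r-1qd0xha.r-a023e6.r-16dba41.r-rjixqe.r-bcqeeo.r-bnwqim.r-qvutc0'
)
tweet.send_keys(Keys.END)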

Sometimes Selenium doesn't get the inspected element

I'm trying to crawl the part shown in the screenshot in the original post. Sometimes Selenium gets that part, but sometimes it doesn't, and I can't figure out why.
My code is below:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome(r'C:(my path)\chromedriver.exe')
url = 'https://www.rocketpunch.com/companies?page=1'
driver.get(url)
driver.implicitly_wait(5)
html = driver.page_source
print(html)
soup = BeautifulSoup(html, 'lxml')
comments = soup.findAll('h4', {'class': 'header name'})
for comment in comments:
    print(comment)
After the driver.implicitly_wait(5) you are using here, just add a short fixed delay like time.sleep(1).
implicitly_wait only sets how long Selenium waits when locating an element; it doesn't wait for all the elements / the entire page to be loaded.
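A minimal sketch of the suggestion applied to the question's code; waiting explicitly on h4.header.name is an assumption about which elements matter, and the time.sleep(1) is the short fixed delay mentioned above:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(r'C:(my path)\chromedriver.exe')
driver.get('https://www.rocketpunch.com/companies?page=1')
# Wait until at least one company name header is present, then pause briefly
WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'h4.header.name'))
)
time.sleep(1)
soup = BeautifulSoup(driver.page_source, 'lxml')
for comment in soup.find_all('h4', {'class': 'header name'}):
    print(comment)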

How to get the "display: none" HTML from Selenium

I'm trying to get some content using Selenium, but I can't get the part with "display: none". I tried get_attribute('innerHTML'), but it still doesn't work as expected.
Hope you can share some knowledge.
Here is the HTML: https://i.stack.imgur.com/LdDL4.png
# -*- coding: utf-8 -*-
from selenium import webdriver
import time
from bs4 import BeautifulSoup
import re
from pyvirtualdisplay import Display
from lxml import etree
driver = webdriver.PhantomJS()
driver.get('http://flights.ctrip.com/')
driver.maximize_window()
time.sleep(1)
element_time = driver.find_element_by_id('DepartDate1TextBox')
element_time.clear()
element_time.send_keys(u'2017-10-22')
element_arr = driver.find_element_by_id('ArriveCity1TextBox')
element_arr.clear()
element_arr.send_keys(u'北京')
element_depart = driver.find_element_by_id('DepartCity1TextBox')
element_depart.clear()
element_depart.send_keys(u'南京')
driver.find_element_by_id('search_btn').click()
time.sleep(1)
print(driver.current_url)
driver.find_element_by_id('btnReSearch').click()
print(driver.current_url)
overlay=driver.find_element_by_id("mask_loading")
print(driver.execute_script("return arguments[0].getAttribute('style')", overlay))
driver.quit()
To retrieve the value of the display property you can use the following line of code:
String my_display = driver.findElement(By.id("mask_loading")).getCssValue("display");
System.out.println("Display property is set to : " + my_display);
If the element's style attribute has the value display:none, then it is a hidden element. Basically Selenium doesn't interact with hidden elements; you have to go through Selenium's JavascriptExecutor to interact with them. You can get the style value as given below.
WebElement overlay = driver.findElement(By.id("mask_loading"));
JavascriptExecutor je = (JavascriptExecutor) driver;
String style = (String) je.executeScript("return arguments[0].getAttribute('style');", overlay);
System.out.println("style value of the element is " + style);
It prints the value "z-index: 12;display: none;"
or if you want to get the innerHTML,
String innerHTML = (String) je.executeScript("return arguments[0].innerHTML;", overlay);
In Python,
overlay = driver.find_element_by_id("mask_loading")
style = driver.execute_script("return arguments[0].getAttribute('style')", overlay)
or
innerHTML=driver.execute_script("return arguments[0].innerHTML;", overlay)

How to get innerHTML of whole page in selenium driver?

I'm using selenium to click to the web page I want, and then parse the web page using Beautiful Soup.
Somebody has shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.
Sample code in Python (based on the post above, the language doesn't seem to matter too much):
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)
the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')
To get the HTML for the whole page:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")
html = driver.page_source
To get the outer HTML (tag included):
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")
# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('outerHTML')
To get the inner HTML (tag excluded):
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")
# HTML from element with some JavaScript
element = driver.find_element_by_css_selector("#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element_by_css_selector("#hireme")
html = element.get_attribute('innerHTML')
driver.page_source is probably outdated. The following worked for me (JavaScript bindings):
let html = await driver.getPageSource();
Reference: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/ie_exports_Driver.html#getPageSource
Using page object in Java:
@FindBy(xpath = "xpath")
private WebElement element;

public String getInnerHtml() {
    System.out.println(waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML"));
    return waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML");
}
A C# snippet for those of us who might want to copy/paste a bit of working code some day:
var element = yourWebDriver.FindElement(By.TagName("html"));
string outerHTML = element.GetAttribute(nameof(outerHTML));
Thanks to those who answered before me. Anyone in the future who benefits from this C# snippet that gets the HTML for any page element in a Selenium test, please consider upvoting this answer or leaving a comment.