I've been trying to create a program that finds data on a site using Selenium. I've downloaded the webdriver and imported the By module.
driver.get("https://en.wikipedia.org/wiki/Main_Page")
number = driver.find_element(By.CSS_SELECTOR("articlecount a"))
print(number)
When I run the code, it displays an error saying the str object is not callable. (I'm assuming the error comes from the "articlecount a" part.)
I've tried removing the two quotation marks around "articlecount a", but that just creates more errors.
Does anyone know what I have to do to allow the CSS Selector to extract data?
Instead of this
number = driver.find_element(By.CSS_SELECTOR("articlecount a"))
It should be like
number = driver.find_element(By.CSS_SELECTOR,"articlecount a")
print(number)
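To see why the original line fails with that exact message: By.CSS_SELECTOR is just a string constant, not a function, so "calling" it calls a string. A minimal stand-in (no Selenium install needed) demonstrates this:

```python
# By.CSS_SELECTOR is a plain string constant ("css selector"), not a function.
# Writing By.CSS_SELECTOR("articlecount a") therefore calls a string, which
# raises exactly the error from the question. A stand-in value shows it:
CSS_SELECTOR = "css selector"  # same value Selenium assigns to By.CSS_SELECTOR

try:
    CSS_SELECTOR("articlecount a")  # mimics By.CSS_SELECTOR("articlecount a")
except TypeError as err:
    print(err)  # 'str' object is not callable
```

Passing the strategy and the selector as two separate arguments, as shown above, avoids this entirely.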
There are several problems here:
The command you are trying to use has somewhat different syntax.
Instead of
number = driver.find_element(By.CSS_SELECTOR("articlecount a"))
It should be something like
number = driver.find_element(By.CSS_SELECTOR,"articlecount a")
Your locator is wrong.
Instead of articlecount a it probably should be #articlecount a
You are missing a delay here. The best practice is to use Expected Conditions explicit waits.
You need to extract the text value from the number web element object.
With all of the above, your code will be:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)
driver.get("https://en.wikipedia.org/wiki/Main_Page")
number = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#articlecount a"))).text
print(number)
The output here will be
6,469,178
It should be
number = driver.find_element(By.CSS_SELECTOR,"#articlecount a")
print(number.text)
This is my first time using Selenium to scrape a website, and I'm fairly new to Python. I have tried to scrape a Swedish housing site to extract price, address, area, size, etc., for every listing at a specific URL that shows all houses for sale in an area called "Lidingö".
I managed to bypass the pop-up window for accepting cookies.
However, the output I get from the terminal is blank when the script runs. I get nothing, not an error, not any output.
What could possibly be wrong?
The code is:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
s = Service("/Users/brustabl1/hemnet/chromedriver")
url = "https://www.hemnet.se/bostader?location_ids%5B%5D=17846&item_types%5B%5D=villa"
driver = webdriver.Chrome(service=s)
driver.maximize_window()
driver.implicitly_wait(10)
driver.get(url)
# The cookie button clicker
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "/html/body/div[62]/div/div/div/div/div/div[2]/div[2]/div[2]/button"))).click()
lists = driver.find_elements(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]')
for list in lists:
    adress = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[2]/a/div[2]/div/div[1]/div[1]/h2')
    area = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[1]/div[1]/div/span[2]')
    price = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[1]')
    rooms = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[3]')
    size = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[2]')
    print(adress.text)
There are a lot of flaws in your code...
lists = driver.find_elements(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]')
In your code, this returns a list of elements in the variable lists.
for list in lists:
    adress = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[2]/a/div[2]/div/div[1]/div[1]/h2')
    area = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[1]/div[1]/div/span[2]')
    price = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[1]')
    rooms = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[3]')
    size = list.find_element(By.XPATH, '//*[@id="result"]/ul[1]/li[1]/a/div[2]/div/div[2]/div[1]/div[2]')
    print(adress.text)
You are not storing the value of each address in a list; instead, you are overwriting the variable on each iteration. And since each XPath points at one exact element, your loop is selecting the same element over and over again!
And scraping text through Selenium is considered bad practice; use BeautifulSoup instead.
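The append-instead-of-overwrite pattern can be sketched without a browser; FakeElement below is a hypothetical stand-in for a Selenium WebElement, so the logic runs as-is:

```python
# Sketch of the fix: append inside the loop and search *relative* to each
# card. FakeElement is a hypothetical stand-in for a Selenium WebElement so
# the pattern runs without a browser; with Selenium you would call
# card.find_element(By.XPATH, ".//h2") -- the leading "." scopes the search
# to the current card instead of the whole page.
class FakeElement:
    def __init__(self, text):
        self.text = text

    def find_element(self, xpath):
        return self  # a real WebElement would search within itself


cards = [FakeElement("Storgatan 1"), FakeElement("Lillgatan 2")]

addresses = []
for card in cards:
    addresses.append(card.find_element(".//h2").text)  # append, don't overwrite

print(addresses)  # ['Storgatan 1', 'Lillgatan 2']
```

With real elements, each card's relative lookup returns that card's own heading rather than the same page-wide element every time.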
I am trying to select a certain element on a webpage in Selenium. I know that the element's id looks like person_xxxxx, with xxxxx being random numbers. I would like to know if it is possible to select this element using an XPath or CSS selector. So far I have tried:
cartes_profils=liste_profils_shadow.find_elements_by_xpath('//*[contains(@id,"person_")]')
which is deprecated and doesn't work
cartes_profils=liste_profils_shadow.find_elements(by=By.CSS_SELECTOR,value='input[id^="#person_"]')
which runs but doesn't select the desired elements
cartes_profils=liste_profils_shadow.find_elements(by=By.XPATH,value=("//*[contains(@id, 'person_')]"))
which returns an "invalid selector" error
PS: I know that there are similar topics already answered but they are all a million years old and the solutions do not work for me
You should ideally be using starts-with for XPath. Also, yes, you are right:
find_elements_by_*** is deprecated in Selenium 4. You should use find_element(By.XPATH, "...")
so your effective xpath would be:
//*[starts-with(@id,'person_')]
use it like
driver.find_element(By.XPATH, "//*[starts-with(@id,'person_')]")
and the equivalent CSS would be (you are fairly close here):
input[id^="person_"]
use it like
driver.find_element(By.CSS_SELECTOR, "input[id^='person_']")
My recommendation would be explicit waits:
element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//*[starts-with(@id,'person_')]")))
You'll have to import these for explicit waits:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
I am running a webscraper and I am not able to click on the third element. I am not sure what to do as I have tried googling and running several types of code.
Below is a screenshot of the html and my code. I need the third element in the list to be clicked on. It is highlighted in the screenshot. I am not sure what to do with the css and data-bind
here is the code for max bed options. I also need to get the 2 beds just like we did for min bed options
thanks!!
According to the picture the following should work:
driver.find_element_by_xpath('//span[@id="bedsMinMaxRangeControl"]//li[@data-value="2"]').click()
But we need to see the entire page HTML to give a correct answer.
Also don't forget to use delays/ waits there.
UPD
For the new question the code will be:
driver.find_element_by_xpath('//span[@id="bedsMinMaxRangeControl"]//ul[contains(@class,"maxBedOptions")]//li[@data-value="2"]').click()
Here you should also use the appropriate data-value, which takes values from -1 up to 3.
You can use css_selector with the data-value attribute.
locator = ".dropdownContent .minBedOptions li[data-value='2']"
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, locator))).click()
I used WebDriverWait so make sure to import it...
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
If it's just a list using li tags, as it seems from the shared snap, you could probably write the simplest XPath as:
//li[contains(text(), '2 Beds')]
and use it like this :
driver.find_element_by_xpath("//li[contains(text(), '2 Beds')]").click()
or if you want to use the XPath in conjunction with WebDriverWait, use it like this:
wait = WebDriverWait(driver, 10)
element = wait.until(EC.element_to_be_clickable((By.XPATH, "//li[contains(text(), '2 Beds')]")))
Import:
from selenium.webdriver.support import expected_conditions as EC
Now let's talk about when we do not want to depend on the 2 Beds text in the XPath, because if the text changes in the UI, we'd have to change the locator in the Selenium-Python bindings.
A good way to do this is to:
Make use of the data-value attribute: //li[@data-value = '2']
Make use of the ul and li tags: //ul[contains(@class, 'minBedOptions')]/li[@data-value = '2']
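That text-independent locator can be wrapped in a tiny helper so the bed count is a parameter; the function name here is ours, purely for illustration:

```python
# Hypothetical helper: build a text-independent locator for any bed option,
# keyed on the data-value attribute instead of the visible "2 Beds" label.
def bed_option_xpath(data_value):
    return f"//ul[contains(@class, 'minBedOptions')]/li[@data-value = '{data_value}']"


print(bed_option_xpath(2))
# //ul[contains(@class, 'minBedOptions')]/li[@data-value = '2']
```

If the UI label ever changes from "2 Beds" to something else, this locator keeps working because it never mentions the text.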
How do I effectively use elements retrieved from Selenium that are stored in variables? I am using Python. In the program below:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.select import Select
driver = webdriver.Firefox()
driver.get("https://boards.4chan.org/wg/archive")
matching_threads = []
key = "Pixel"
for i in driver.find_elements_by_class_name("teaser-col"):
    if key in i.text:
        matching_threads.append(i)
        matched_thread = i
print(matching_threads)
driver.quit()
I get the following from the printout of matching_threads:
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="aa74a4a6-5bb2-4b48-92b6-50f5d51a9e5c", element="59b6076f-a5a2-4862-9c1f-028025e4b567")>]
How can I use that output to select the element in Selenium and interact with it? What I am trying to do is go to that element and then click on the element to the right of it. What I am failing to understand is how to retrieve the element in Selenium using the information stored in matching_threads.
If anyone can help me, I would very much appreciate it.
To click on the adjacent td that has an a tag with class quotelink:
i.find_element_by_xpath(".//following::td[1]/a[@class='quotelink']").click()
Now, if the page navigates away, you could just grab the href values, insert them into an array, then loop through them and use driver.get(). If it opens a new tab, you should be fine.
.get_attribute('href')
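The collect-then-visit idea might look like the sketch below, with hypothetical thread URLs standing in for the real href values (the driver.get call is commented out since no browser is attached here):

```python
# Collect-then-visit sketch. On real elements you would build the list with
# i.get_attribute('href'); the URLs below are hypothetical stand-ins, and
# driver.get is commented out because no browser is running here.
matched_hrefs = [
    "https://boards.4chan.org/wg/thread/111",
    "https://boards.4chan.org/wg/thread/222",
]

visited = []
for url in matched_hrefs:
    # driver.get(url)  # navigate to each saved thread in turn
    visited.append(url)

print(len(visited))  # 2
```

Collecting the hrefs first avoids stale-element errors, since navigating away invalidates the WebElement references themselves but not the saved URL strings.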
I'm trying to check the value of the src attribute for this image (highlighted in blue):
This is what I'm trying (not working):
visual = col_12_wrapper.find_element_by_class_name('visual')
left_text_img = visual.find_element_by_css_selector('div.col-sm-6:first-of-type')
left_img = left_text_img.find_element_by_tag_name('img')
#this line below fails
left_img[contains(#src,'../../static/images/databytes/colors/frame-0164.jpg')]
This line:
left_img[contains(#src,'../../static/images/databytes/colors/frame-0164.jpg')]
Is trying to use an XPATH as an index.
You would need to use find_element, like so:
left_img.find_element_by_xpath(".//*[contains(@src, '../../static/images/databytes/colors/frame-0164.jpg')]")
I would recommend a more direct path of finding this element though:
direct_path = driver.find_element_by_xpath(".//div[@class='visual']/div[@class='col-sm-6']//img[@class='color-frame' and contains(@src, 'frame-0164.jpg')]")
If you want to get the element and then check its src attribute, try this:
direct_path = driver.find_element_by_xpath(".//div[@class='visual']/div[@class='col-sm-6']//img[@class='color-frame']")
src_attribute = direct_path.get_attribute('src')
SIDENOTE: Based on your error message in the comments, you are running on an old chromedriver 2.35 which does not support your current version of Chrome 67, please go HERE to update your chromedriver as well. Recommended for build 67 is current chromedriver 2.40.
You can try this code. You can get the src attribute from the DOM like this, and it's always good to use WebDriverWait:
img = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.color-squares.squares-20+img')))
source = img.get_attribute("src")
print(source)
Note that you will have to import these:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
In case you are looking for the XPath, that would be:
//div[contains(@class,'color-squares squares-20')]/following-sibling::img
Below is the XPath of the img element
//img[@class='color-frame'][contains(@src,'frame-0164.jpg')]