Scrapy + Selenium + Datepicker - selenium

So I need to scrape a page like this one, for example, and I am using Scrapy + Selenium to interact with the datepicker calendar, but I am running into ElementNotVisibleException: Message: Element is not currently visible and so may not be interacted with.
So far i have:
def parse(self, response):
    self.driver.get("https://www.airbnb.pt/rooms/9315238")
    try:
        element = WebDriverWait(self.driver, 10).until(
            EC.presence_of_element_located((By.XPATH, "//input[@name='checkin']"))
        )
    finally:
        x = self.driver.find_element_by_xpath("//input[@name='checkin']").click()
        import ipdb; ipdb.set_trace()
        self.driver.quit()
I saw some references on how to achieve this: https://stackoverflow.com/a/25748322/977622 and https://stackoverflow.com/a/19009256/977622.
I would appreciate it if someone could help me out with my issue or even provide a better example of how I can interact with this datepicker calendar.

There are two elements with name="checkin" - the first one that you actually find is invisible. You need to make your locator more specific to match the desired input. I would also use the visibility_of_element_located condition instead:
element = WebDriverWait(self.driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, ".book-it-panel input[name=checkin]"))
)
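Once the visible input is clicked, the calendar popup itself still has to be handled. A minimal sketch of that follow-up step, assuming a jQuery-UI-style datepicker (the .ui-datepicker class and the day-cell markup are assumptions, not taken from the original post; inspect the live DOM and adjust the selectors):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

checkin = WebDriverWait(self.driver, 10).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, ".book-it-panel input[name=checkin]"))
)
checkin.click()  # opens the calendar popup

# Wait for the popup, then click a specific day (here: the 15th).
# NOTE: the datepicker selectors below are hypothetical.
day = WebDriverWait(self.driver, 10).until(
    EC.element_to_be_clickable(
        (By.XPATH, "//div[contains(@class, 'ui-datepicker')]//td/a[text()='15']")
    )
)
day.click()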

Login successfully but HTML element not found

I am learning web scraping with Selenium for a Finance team project. The idea is:
Login to the HR system
Search for a Purchase Order number
The system displays a list of attachments
Download the attachments
Below is my code:
# interaction with Google Chrome
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# specify chromedriver location
PATH = './chromedriver_win32/chromedriver.exe'

# open Google Chrome browser & visit Purchase Order page within HRIS
browser = webdriver.Chrome(PATH)
browser.get('https://purchase.sea.com/#/list')

< user input ID & password >

# user interface shows "My Purchase Request" & "My Purchase Order" tabs
# click on the Purchase Order tab
try:
    po_tab = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "My Purchase Orders"))
    )
    po_tab.click()
except:
    print('HTML element not found!!')

# locate PO Number field and key in PO number
try:
    po_input_field = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "input-field"))
    )
    po_input_field.send_keys(<dummy PO num#>)  ## any PO number
except:
    print("PO field not found!!")

# locate Search button and click search
try:
    search_button = WebDriverWait(browser, 10).until(
        EC.presence_of_element_located((By.LINK_TEXT, "Search"))
    )
    search_button.click()
except:
    print("Search button not found!!")
I am stuck at the step # click on the Purchase Order tab and the steps that follow.
I can find the elements, but I get an error after executing the .py script. The most interesting part is that I can do it all perfectly in a Jupyter Notebook.
(screenshot: Python script execution error)
Here are the elements after inspection (screenshots): Purchase Orders tab, PO Number input field, Search button.
See, you are using presence_of_element_located, which is basically:
""" An expectation for checking that an element is present on the DOM
of a page. This does not necessarily mean that the element is visible.
locator - used to find the element
returns the WebElement once it is located
"""
What I would suggest to you is to use element_to_be_clickable:
""" An expectation for checking an element is visible and enabled such that
you can click it.
"""
So, in your code it'd be something like this:
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.LINK_TEXT, "My Purchase Orders"))).click()
Also, we could try different locators (CSS selector, XPath) if this does not work, as sketched below. But I'll let you comment first on whether this works or not.
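A hedged sketch of such an alternative locator, assuming only that the tab's visible label is "My Purchase Orders" (the actual markup is only shown in the screenshots, so this text-based XPath is a guess):

# Fall back to matching the tab by its visible text anywhere in the DOM.
WebDriverWait(browser, 10).until(
    EC.element_to_be_clickable(
        (By.XPATH, "//*[contains(text(), 'My Purchase Orders')]")
    )
).click()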
The first error in your log is HTML element not found!! You are performing a click before the element is visible in the DOM. Please try the possible solutions below.
Wait until the element is visible, then perform the click operation:
EC.visibility_of_element_located((By.LINK_TEXT, "My Purchase Orders"))
If you are still not able to click with the above, then wait for the element to be clickable:
EC.element_to_be_clickable((By.LINK_TEXT, "My Purchase Orders"))
I would also suggest creating reusable methods for actions like click(), get_text(), etc., as sketched below.
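A minimal sketch of such reusable helpers (the names safe_click and get_text are illustrative, not from the original post):

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def safe_click(browser, locator, timeout=10):
    # Wait until the element is clickable, then click it.
    element = WebDriverWait(browser, timeout).until(
        EC.element_to_be_clickable(locator)
    )
    element.click()
    return element

def get_text(browser, locator, timeout=10):
    # Wait until the element is visible, then return its text.
    element = WebDriverWait(browser, timeout).until(
        EC.visibility_of_element_located(locator)
    )
    return element.text

# usage, e.g.:
# safe_click(browser, (By.LINK_TEXT, "My Purchase Orders"))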

How to display search results using selenium and GeckoDriver?

I'm trying to print the search results of DuckDuckGo using a headless WebDriver and Selenium. However, I cannot locate the DOM elements holding the search results, no matter what ID or class name I search for and no matter how long I wait for the page to load.
Here's the code:
opts = Options()
opts.headless = False
browser = Firefox(options=opts)
browser.get('https://duckduckgo.com')
search = browser.find_element_by_id('search_form_input_homepage')
search.send_keys("testing")
search.submit()
# wait for URL to change with 15 seconds timeout
WebDriverWait(browser, 15).until(EC.url_changes(browser.current_url))
print(browser.current_url)
results = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "links")))
time.sleep(10)
results = browser.find_elements_by_class_name('result results_links_deep highlight_d result--url-above-snippet')  # I tried many other IDs and class names
print(results) # prints []
I'm starting to suspect there is some trickery to avoid web scraping on DuckDuckGo. Does anyone have a clue?
I changed it to use a CSS selector and then it worked. I use Java, not Python.
List<WebElement> elements = driver.findElements(
    By.cssSelector(".result.results_links_deep.highlight_d.result--url-above-snippet"));
System.out.println(elements.size());
// 10
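For reference, the same fix translated back to the question's Python setup might look like this. The point is that a compound class attribute has to be written as one chained CSS selector; find_elements_by_class_name only accepts a single class name:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Chain all four classes into one CSS selector.
results = WebDriverWait(browser, 10).until(
    EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, ".result.results_links_deep.highlight_d.result--url-above-snippet")
    )
)
print(len(results))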

Scrapy returning empty lists when using css

I am trying to scrape Nordstrom product descriptions. I got all the item links (stored in a local MongoDB database) and am now iterating through them. Here is an example link: https://www.nordstrom.ca/s/leith-ruched-body-con-tank-dress/5420732?origin=category-personalizedsort&breadcrumb=Home%2FWomen%2FClothing%2FDresses&color=001
My code for the spider is:
def parse(self, response):
    items = NordstromItem()
    description = response.css("div._26GPU").css("div::text").extract()
    items['description'] = description
    yield items
I also tried scrapy shell, and the page returned is blank.
I am also using Scrapy random user agents.
I suggest you use a css or xpath selector to get the info you want. Here's more about it: https://docs.scrapy.org/en/latest/topics/selectors.html
You can also use a css/xpath checker to verify that the selector gets the info you want, like this Chrome extension: https://autonomiq.io/chropath/
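A quick way to test a selector before wiring it into the spider is scrapy shell. A sketch with the selector from the question (if the description is injected by JavaScript, the raw HTML will not contain the div and the selector will come back empty, which would explain the blank page you saw):

$ scrapy shell "https://www.nordstrom.ca/s/leith-ruched-body-con-tank-dress/5420732"
>>> response.css("div._26GPU").css("div::text").extract()
>>> # if this prints [], check whether the class exists in the raw HTML at all:
>>> "_26GPU" in response.text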

Selecting elements using xpath

So, I'm very new here to Selenium, but I'm having trouble selecting the element I want from this website. In this case, I got the XPath using Chrome's 'Copy XPath' tool. Basically, I'm looking to extract the CID text (in this case 4004) from the website, but my code seems unable to do this. Any help would be appreciated!
I have also tried using the CSS selector method as well, but it returns the same error.
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
driver = webdriver.Chrome(options=chrome_options)
chem_name = "D008294"
url = "https://pubchem.ncbi.nlm.nih.gov/#query=" + chem_name
driver.get(url)
elements = driver.find_elements_by_xpath('//*[@id="collection-results-container"]/div/div/div[2]/ul/li/div/div/div/div[2]/div[2]/div[2]/span/a/span/span')
driver.close()
print(elements.text)
As of now, this is the error I receive: 'list' object has no attribute 'text'
Here is the xpath that you can use.
//span[.='Compound CID']//following-sibling::a/descendant::span[2]
Why your script did not work: I see 2 issues in your code.
elements = driver.find_elements_by_xpath('//*[@id="collection-results-container"]/div/div/div[2]/ul/li/div/div/div/div[2]/div[2]/div[2]/span/a/span/span')
driver.close()  # <== don't close the browser until you are done with all your steps on the browser or its elements
print(elements.text)  # <== you cannot get text from a list (Python will throw an error here)
How to fix it:
CID = driver.find_element_by_xpath("//span[.='Compound CID']//following-sibling::a/descendant::span[2]").text  # <== returning the text using find_element (not find_elements)
driver.close()
print(CID)  # <== now you can print CID even though the browser is closed, as the value is already stored in the variable
The function driver.find_elements_by_xpath returns a list of elements. You should loop to get the text of each element, like this:
for ele in elements:
    print(ele.text)
Or, if you only want the first matching element, use the driver.find_element_by_xpath function instead.
An XPath copied from Chrome does not always work as expected. First you have to learn how to write an XPath and verify it in the Chrome console.
See these links, which will help you learn about XPaths:
https://www.guru99.com/xpath-selenium.html
https://www.w3schools.com/xml/xpath_syntax.asp
In this case, first find the span containing the text Compound CID, then move to the parent span and down to the child a/span/span, something like //span[contains(text(),'Compound CID')]/parent::span/a/span/span.
You also need to use find_element, which returns a single element, and get the text from it. If you use find_elements, it returns a list of elements, so you need to loop over them and get the text from each one.
xpath: //a[contains(@href, 'compound')]/span[@class='breakword']/span
You can use the href attribute as your reference, since I noticed that it has a unique value for each compound.
Example:
href="https://pubchem.ncbi.nlm.nih.gov/substance/53790330"
href="https://pubchem.ncbi.nlm.nih.gov/compound/4004"

Django selenium select form options in <select>

How do you use selenium in Django to choose and select an option in a <select> tag of a form?
This is how far I got:
def setUp(self):
    self.browser = webdriver.Firefox()

def tearDown(self):
    self.browser.quit()

def test_project_info_form(self):
    # set url
    self.browser.get(self.live_server_url + '/tool/project_info/')
    # get module select
    my_select = self.browser.find_element_by_name('my_select')
    #! select an option, say the first option !#
    ...
So this post was very useful:
https://sqa.stackexchange.com/questions/1355/what-is-the-correct-way-to-select-an-option-using-seleniums-python-webdriver
Basically I had to target the <select> and <option> by xpath directly, followed by a click event:
self.browser.find_element_by_xpath(
    "//select[@id='my_select_id']/option[text()='my_option_text']"
).click()
Or I could have targeted the option by index:
self.browser.find_element_by_xpath(
    "//select[@id='my_select_id']/option[2]"
).click()
I hope this is helpful to someone with a similar problem.
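As an aside, Selenium also ships a dedicated Select helper for <select> elements, which is the more idiomatic route. A sketch using the same (hypothetical) my_select_id:

from selenium.webdriver.support.ui import Select

select = Select(self.browser.find_element_by_id('my_select_id'))
select.select_by_visible_text('my_option_text')  # select by option label
# or, by zero-based index (option[2] above is the second option):
select.select_by_index(1)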