Dynamic (with mouseover/coordinates) web scraping in Python unable to extract information - Selenium

I'm trying to scrape data that only appears on mouseover (Selenium). It's a concert map, and this is my entire code. I keep getting TypeError: 'ActionChains' object is not iterable.
The idea is to hover over the whole map and scrape the markup whenever the HTML changes. I'm pretty sure I need two for loops for that, but I don't yet know how to combine them. I also know I'll have to use bs4; could someone share ideas on how I could go about this?
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.get('https://www.ticketcorner.ch/event/simple-minds-hallenstadion-12035377/')

# Accept the shadow-root cookie banner
time.sleep(5)
driver.execute_script('return document.querySelector("#cmpwrapper").shadowRoot.querySelector("#cmpbntyestxt")').click()
time.sleep(5)

# Click on the Saalplan tab so we get to the concert map
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#tickets > div.seat-switch-wrapper.js-tab-switch-group > a:nth-child(3) > div > div.seat-switch-radio.styled-checkbox.theme-switch-bg.theme-text-color.theme-link-color-hover.theme-switch-border > label'))).click()
time.sleep(5)

# Scroll to the concert map, which will be the element to hover over
element_map = driver.find_element(By.CLASS_NAME, 'js-tickettype-cta')
actions = ActionChains(driver)
actions.move_to_element(element_map).perform()

# Close the drop-down which is partially hiding the concert map
driver.find_element(by=By.XPATH, value='//*[@id="seatmap-tab"]/div[2]/div/div/section/div[2]/div[1]/div[1]/div/div/div[1]/div').click()

# Hover over the concert map and find the empty seats to extract the data
actions = ActionChains(driver)
data = actions.move_to_element_with_offset(driver.find_element(by=By.XPATH, value='//*[@id="seatmap-tab"]/div[2]/div/div/section/div[2]/div[1]/div[2]/div/div[2]/div[1]/div[2]/div[2]/canvas'), 0, 0)
for i in data:  # raises TypeError: 'ActionChains' object is not iterable
    actions.move_by_offset(50, 50).perform()
    time.sleep(2)
    # print the content of each box
    hover_data = driver.find_element(By.XPATH, '//*[@id="tooltipster-533522"]/div[1]').get_attribute('tooltipster-content')
    print(hover_data)
# The code I would use to hover over the element
# actions.move_by_offset(100, 50).perform()
# time.sleep(5)
# actions.move_by_offset(150, 50).perform()
# time.sleep(5)
# actions.move_by_offset(200, 50).perform()
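One way to combine the two loops without iterating the ActionChains object is to walk the canvas in a grid of offsets and read whatever tooltip is showing at each step. This is only a sketch under assumptions: the 50 px step size, the shortened canvas XPath, and the guess that every tooltip element has an id starting with "tooltipster" are not verified against the live page.

canvas = driver.find_element(By.XPATH, '//*[@id="seatmap-tab"]//canvas')  # assumed shorter locator
width, height = canvas.size['width'], canvas.size['height']

# Outer loop walks columns, inner loop walks rows. From Selenium 4.3 on,
# move_to_element_with_offset measures offsets from the element's center,
# so the grid is built around (0, 0) rather than the top-left corner.
for dx in range(-width // 2 + 25, width // 2, 50):
    for dy in range(-height // 2 + 25, height // 2, 50):
        ActionChains(driver).move_to_element_with_offset(canvas, dx, dy).perform()
        time.sleep(1)
        for tip in driver.find_elements(By.CSS_SELECTOR, 'div[id^="tooltipster"]'):
            if tip.text:  # a tooltip only carries text while a seat is hovered
                print(tip.text)

The tooltip text can then be handed to bs4, although .text already strips the markup, so bs4 may not even be needed.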

Related

Webdriver Selenium not loading new page after click()

I'm using Selenium to scrape a webpage and it finds the elements on the main page, but when I use the click() function, the driver never finds the elements on the new page. I used BeautifulSoup to check whether it's getting the HTML, but the HTML is always from the main page. (When I look at the driver window, it shows that the new page is open.)
import bs4 as bs  # assumed import, inferred from the bs.BeautifulSoup call below

html = driver.execute_script('return document.documentElement.outerHTML')
soup = bs.BeautifulSoup(html, 'html.parser')
print(soup.prettify())
I've used WebDriverWait() to see whether the page is just slow to load, but even after 60 seconds it never loads:
element = WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, "ddlProducto")))
I also used execute_script() to check whether clicking the button via JavaScript loads the page, but it returns None when I print the variable holding the element from the new page.
selectProducto = driver.execute_script("return document.getElementById('ddlProducto');")
print(selectProducto)
I also used chwd = driver.window_handles and driver.switch_to.window(chwd[1]), but it says that the index is out of range.
chwd = driver.window_handles
driver.switch_to.window(chwd[1])
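If the click stays in the same tab (which would explain why there is no second window handle and the index is out of range), one common pattern is to keep a reference to an element from the old page and wait for it to go stale before querying the new DOM. A minimal sketch, assuming the click really does trigger a navigation; the 'someButton' locator is a placeholder, not taken from the original page:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

old_body = driver.find_element(By.TAG_NAME, 'body')
driver.find_element(By.ID, 'someButton').click()  # placeholder locator for the button being clicked

# Wait for the old document to be discarded, then for an element of the new page.
WebDriverWait(driver, 60).until(EC.staleness_of(old_body))
WebDriverWait(driver, 60).until(EC.presence_of_element_located((By.ID, 'ddlProducto')))

If staleness_of never fires either, the click is most likely being intercepted or landing on the wrong element, and the page genuinely never changes.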

Why am I not able to get the page and search form with login details (Selenium, BeautifulSoup)

I want to scrape this site for some of my natural language processing work. I have a subscription to the website, but I am still not able to get the result: I get an error saying it is unable to locate the element.
The link to the login page is login.
This is the code that I tried in Python with Selenium.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_argument("disable-infobars")
driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", options=options)
driver.get('https://login.newscorpaustralia.com/login?state=hKFo2SBmOXc1TjRJNDlBX3hObkZPN1NsRWgzcktONTlPVnJMS6FupWxvZ2luo3RpZNkgUi1ZRmV2Z2dwcWJmZUpqdWtZdk5CUUllX0h3YngwanSjY2lk2SAwdjlpN0tvVzZNQkxTZmUwMzZZU1FUNzl6QThaYXo0WQ&client=0v9i7KoW6MBLSfe036YSQT79zA8Zaz4Y&protocol=oauth2&response_type=token%20id_token&scope=openid%20profile&audience=newscorpaustralia&site=couriermail&redirect_uri=https%3A%2F%2Fwww.couriermail.com.au%2Fremote%2Fidentity%2Fauth%2Flatest%2Flogin%2Fcallback.html%3FredirectUri%3Dhttps%253A%252F%252Fwww.couriermail.com.au%252Fsearch-results%253Fq%253Djason%252520huny&prevent_sign_up=true&nonce=7j4grLXRD39EVhGsxcagsO5c-PtAY4Md&auth0Client=eyJuYW1lIjoiYXV0aDAuanMiLCJ2ZXJzaW9uIjoiOS4xOS4wIn0%3D')
time.sleep(10)

elem = driver.find_element(by=By.CLASS_NAME, value='navigation_search')
username = driver.find_element(by=By.ID, value='1-email')
password = driver.find_element(by=By.NAME, value='password')
login = driver.find_element(by=By.NAME, value='submit')
username.send_keys("myid")
password.send_keys("password")
login.click()
time.sleep(20)

soup = BeautifulSoup(driver.page_source, 'html.parser')
search = driver.find_element(by=By.CSS_SELECTOR, value='form.navigation_search')
search.click()
search.send_keys("jason hunt")
print(driver.page_source)
Below is the error that I am getting. I want to grab the search icon and send keys to it, but I am not getting the search form after login.
Below is the text-based HTML of the element.
I tried printing the page source, and I was not able to locate the HTML element there either.
Not a proper answer, but since you can't add formatting to comments and this has the same desired effect:
driver.get("https://www.couriermail.com.au/search-results");
WebDriverWait(driver, timeout=10).until(lambda d: d.find_element(By.CLASS_NAME, "search_box_input"))
searchBox = driver.find_element(By.CLASS_NAME, "search_box_input")
searchBox.send_keys("test");
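If the login fields themselves are what cannot be located, replacing the fixed time.sleep(10) with explicit waits is worth trying before anything else. A sketch reusing the locators from the question; whether '1-email', 'password' and 'submit' are still the attributes on the live login form is an assumption:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 30)
# Wait until each field is actually visible instead of sleeping a fixed time.
username = wait.until(EC.visibility_of_element_located((By.ID, '1-email')))
password = wait.until(EC.visibility_of_element_located((By.NAME, 'password')))
username.send_keys("myid")
password.send_keys("password")
wait.until(EC.element_to_be_clickable((By.NAME, 'submit'))).click()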

Trying to click a link to get a popup in selenium

I'm trying to scrape a website that has a link that produces a popup that I want to scrape. If I click through to the product itself it would give me the information, but then I would have to call back() a million times; if I get the popup instead, I can scrape the info, close the popup, and move on to the next product.
These are just some of the things I've tried:
quick = driver.find_element_by_xpath("/html/body/div[2]/div[2]/div/div[3]/div[3]/div[5]/ul[2]/li[2]/div")
quick.click()
//*[#id="js_proList"]/ul[1]/li[1]/div/div[1]/span
//body[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[5]/ul[1]/li[1]/div[1]/div[1]/span[1]
//span[contains(#xpath,'1')]
This is the relevant HTML:
<div class="goods_img pr fast-btn-hover js_goodsHoverImg" data-goods-id="475526308" xpath="1"> <span data-logsss-const-value="" data-href="/m-goods-a-fast-id-7684901.htm" class="fast-buy js_fast_buy">QUICK SHOP</span>
That Quick Shop button is hidden; you need to hover over it first, and then it is interactable.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from time import sleep
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome(executable_path="D:/chromedriver.exe")
driver.get('https://www.rosegal.com/plus-size-tops-120/')
driver.maximize_window()

# Get all quick-shop buttons
quick_shop_as_list = driver.find_elements_by_xpath("//*[@class='fast-buy js_fast_buy']")

# For every quick-shop button, hover over the product tile to make it clickable
for i in range(0, len(quick_shop_as_list)):
    a = ActionChains(driver)
    current_product_to_hover_over = driver.find_elements_by_xpath("//div[(@class='goods_img pr fast-btn-hover js_goodsHoverImg')]")
    a.move_to_element(current_product_to_hover_over[i]).perform()
    sleep(1)
    quick_shop_as_list[i].click()
    sleep(1)
    # There's an iframe on this popup, so switch to it
    iframe = driver.find_element_by_class_name('xubox_iframe')
    driver.switch_to.frame(iframe)
    # Do the scraping; here we grab the whole div
    popup_div = driver.find_element_by_xpath("//div[@id='page']")
    print(popup_div.text)
    # Exit the iframe and close the popup
    driver.switch_to.default_content()
    driver.find_element_by_xpath("//*[contains(@class,'xubox_close')]").click()
    sleep(1)
# Will probably require some scrolling at some point, in case move_to_element fails
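If move_to_element does start failing once the products are below the fold, one option (an assumption on my part, not something the answer above tested) is to scroll the tile into view before hovering, reusing the variables from the loop:

# Scroll the current product tile to the middle of the viewport, then hover.
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});",
                      current_product_to_hover_over[i])
a.move_to_element(current_product_to_hover_over[i]).perform()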

Locating elements in section with selenium

I'm trying to enter text into a field (the subject field in the image) in a section using Selenium.
I've tried locating it by XPath, ID, and a few others, but it looks like maybe I need to switch context to the section. I've tried the following; errors are in the comments after the lines.
from time import sleep

from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

opts = Options()
browser = Firefox(options=opts)
browser.get('https://www.linkedin.com/feed/')

sign_in = '/html/body/div[1]/main/p/a'
browser.find_element_by_xpath(sign_in).click()
email = '//*[@id="username"]'
browser.find_element_by_xpath(email).send_keys(my_email)
pword = '//*[@id="password"]'
browser.find_element_by_xpath(pword).send_keys(my_pword)
signin = '/html/body/div/main/div[2]/div[1]/form/div[3]/button'
browser.find_element_by_xpath(signin).click()

search = '/html/body/div[8]/header/div[2]/div/div/div[1]/div[2]/input'
name = 'John McCain'
browser.find_element_by_xpath(search).send_keys(name + "\n")  # click()

# click on first result
first_result = '/html/body/div[8]/div[3]/div/div[1]/div/div[1]/main/div/div/div[1]/div/div/div/div[2]/div[1]/div[1]/span/div/span[1]/span/a/span/span[1]'
browser.find_element_by_xpath(first_result).click()

# hit message button
msg_btn = '/html/body/div[8]/div[3]/div/div/div/div/div[2]/div/div/main/div/div[1]/section/div[2]/div[1]/div[2]/div/div/div[2]/a'
browser.find_element_by_xpath(msg_btn).click()
sleep(10)

## find subject box in section
section_class = '/html/body/div[3]/section'
browser.find_element_by_xpath(section_class)  # no such element
browser.switch_to.frame('/html/body/div[3]/section')  # no such frame
subject = '//*[@id="compose-form-subject-ember156"]'
browser.find_element_by_xpath(subject).click()  # no such element
compose_class = 'compose-form__subject-field'
browser.find_element_by_class_name(compose_class)  # no such class
id = 'compose-form-subject-ember156'
browser.find_element_by_id(id)  # no such element
css_selector = 'compose-form-subject-ember156'
browser.find_element_by_css_selector(css_selector)  # no such element
wind = '//*[@id="artdeco-hoverable-outlet__message-overlay"]'
browser.find_element_by_xpath(wind)  # no such element
A figure showing the developer info for the text box in question is attached.
How do I locate the text box and send keys to it? I'm new to Selenium but have gotten through login and basic navigation to this point.
I've put the page source (as seen by the Selenium browser object at this point) here.
The page source (as seen when I click in the browser window and hit 'copy page source') is here.
Despite the window in focus being the one I wanted, it seems the browser object saw things differently. Using
window_after = browser.window_handles[1]
browser.switch_to.window(window_after)
allowed me to find the element using an XPath.
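One further note that may help if the locator still fails after switching: ids such as compose-form-subject-ember156 are generated by Ember at runtime, so they usually change between page loads. A sketch of a prefix-based locator, using the same old-style API as the code above; the assumption that the compose-form-subject- prefix stays stable is mine, not verified:

# Match the stable prefix of the generated id instead of the full ember id.
subject_box = browser.find_element_by_css_selector("[id^='compose-form-subject-']")
subject_box.send_keys("Subject line")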

Scrapy Running Results

Just getting started with Scrapy, I'm hoping for a nudge in the right direction.
I want to scrape data from here:
https://www.sportstats.ca/display-results.xhtml?raceid=29360
This is what I have so far:
import scrapy
import re


class BlogSpider(scrapy.Spider):
    name = 'sportstats'
    start_urls = ['https://www.sportstats.ca/display-results.xhtml?raceid=29360']

    def parse(self, response):
        headings = []
        results = []
        tables = response.xpath('//table')
        headings = list(tables[0].xpath('thead/tr/th/span/span/text()').extract())
        rows = tables[0].xpath('tbody/tr[contains(@class, "ui-widget-content ui-datatable")]')
        for row in rows:
            result = []
            tds = row.xpath('td')
            for td in enumerate(tds):
                if headings[td[0]].lower() == 'comp.':
                    content = None
                elif headings[td[0]].lower() == 'view':
                    content = None
                elif headings[td[0]].lower() == 'name':
                    content = td[1].xpath('span/a/text()').extract()[0]
                else:
                    try:
                        content = td[1].xpath('span/text()').extract()[0]
                    except:
                        content = None
                result.append(content)
            results.append(result)
        for result in results:
            print(result)
Now I need to move on to the next page, which I can do in a browser by clicking the "right arrow" at the bottom, which I believe is the following li:
<li><a id="mainForm:j_idt369" href="#" class="ui-commandlink ui-widget fa fa-angle-right" onclick="PrimeFaces.ab({s:"mainForm:j_idt369",p:"mainForm",u:"mainForm:result_table mainForm:pageNav mainForm:eventAthleteDetailsDialog",onco:function(xhr,status,args){hideDetails('athlete-popup');showDetails('event-popup');scrollToTopOfElement('mainForm\\:result_table');;}});return false;"></a>
How can I get scrapy to follow that?
If you open the URL in a browser with JavaScript disabled, you won't be able to move to the next page. As you can see inside the li tag, there is some JavaScript that has to be executed in order to get the next page.
To get around this, the first option is usually to try to identify the request generated by the JavaScript. In your case it should be easy: just analyze the JavaScript code and replicate it with Python in your spider. If you can do that, you can send the same request from Scrapy. If you can't, the next option is usually to use some package with JavaScript/browser emulation or something like that, such as ScrapyJS or Scrapy + Selenium.
You're going to need to perform a callback. Generate the URL from the XPath of the 'next page' button, so url = response.xpath(xpath to next_page_button), and then, when you're finished scraping that page, do yield scrapy.Request(url, callback=self.parse_next_page). Finally, you create a new function called def parse_next_page(self, response): (there is a sketch of this after the note below).
A final, final note: if the content happens to be rendered by JavaScript (and you can't scrape it even if you're sure you're using the correct XPath), check out my repo on using Splash with Scrapy: https://github.com/Liamhanninen/Scrape
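A minimal sketch of the callback pattern described above, as it might slot into the existing BlogSpider. The XPath for the arrow is an assumption, and because the arrow here triggers a PrimeFaces JavaScript post rather than exposing a plain href, the guard will likely skip it in practice; that limitation is exactly what the other answer's JavaScript/Splash suggestions address.

import scrapy


class BlogSpider(scrapy.Spider):
    name = 'sportstats'
    start_urls = ['https://www.sportstats.ca/display-results.xhtml?raceid=29360']

    def parse(self, response):
        # ... existing row-scraping code from the question goes here ...

        # Hand the next page off to a separate callback, as described above.
        next_href = response.xpath('//a[contains(@class, "fa-angle-right")]/@href').extract_first()
        if next_href and next_href != '#':
            yield scrapy.Request(response.urljoin(next_href), callback=self.parse_next_page)

    def parse_next_page(self, response):
        # The next page has the same table layout, so reuse the parsing logic above.
        return self.parse(response)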