Trying to find the correct xpath - selenium

I have made code, see following lines:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.by import By
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.flashscore.com/match/jBvNMej6/#match-summary")
print(driver.title)
driver.maximize_window() # For maximizing window
driver.implicitly_wait(10) # gives an implicit wait for 20 seconds
driver.find_element_by_id('onetrust-reject-all-handler').click()
time.sleep(2)
driver.find_element(By.CLASS_NAME,'previewShowMore.showMore').click()
main = driver.find_element(By.CLASS_NAME,'previewLine'[b[text()="Hot stat:"]]/text)
print(main.text)
time.sleep(2)
driver.close()
However, I get the following error.
main = driver.find_element(By.CLASS_NAME,'previewLine'[b[text()="Hot stat:"]]/text)
^
SyntaxError: invalid syntax
What can I do to avoid this?
thx! : )

Well, in this line
main = driver.find_element(By.CLASS_NAME,'previewLine'[b[text()="Hot stat:"]]/text)
You have made a great mix :)
Your locator is absolutely invalid.
Also, if you want to print the paragraph text without the "Hot streak" you will need to remove that string from the entire div (paragraph) text.
This should do what you are trying to achieve:
main = driver.find_element(By.XPATH,"//div[#class='previewLine' and ./b[text()='Hot streak']]").text
main = main.replace('Hot streak','')
print(main)

I'm not finding any text 'Hot stat:'. You'll have to attach the html code where you found that.
I assume that you want to retrieve the text of a specific previewLine?
main = driver.find_element(By.XPATH ,'//div[#class="previewLine"]/b[contains(text(),"Hot streak")]/..')
print(main.text)

Related

Selenium (Python)- Webscraping verb-conjugation tables (Accessing web elements underneath '#document')

Section 0: Introduction:
This is my first webscraping project and I am not experienced in using selenium . I am trying to scrape arabic verb-conjugation tables from the website:
Online Sarf Generator
Any help with the following probelem will be great.
Thank you.
Section 1: The Problem:
I am trying to webscrape from the following website:
Online Sarf Generator
For doing this, I am trying to use Selenium.
I basically need to select the three root letters and the family from the four toggle menus as shown in the picture below:
After this, I have to click the 'Generate Sarf Table' button.
Section 2: My Attempt:
Here is my code:
#------------------ Just Setting Up the web_driver:
s = Service('/usr/local/bin/chromedriver')
# Set some selenium chrome options:
chromeOptions = Options()
# chromeOptions.headless = False
driver = webdriver.Chrome(service=s, options=chromeOptions)
driver.get('https://sites.google.com/view/sarfgenerator/home')
# I switch the frame once:
iframe = driver.find_elements(by=By.CSS_SELECTOR, value='iframe')[0]
driver.switch_to.frame(iframe)
# I switch the frame again:
iframe = driver.find_elements(by=By.CSS_SELECTOR, value='iframe')[0]
driver.switch_to.frame(iframe)
This takes me to the frame within which the webelements that I need are located.
Now, I print the html to see where I am at:
print(BeautifulSoup(driver.execute_script("return document.body.innerHTML;"),'html.parser'))
Here is the output that I get:
<iframe frameborder="0" id="userHtmlFrame" scrolling="yes">
</iframe>
<script>function loadGapi(){var loaderScript=document.createElement('script');loaderScript.setAttribute('src','https://apis.google.com/js/api.js?checkCookie=1');loaderScript.onload=function(){this.onload=function(){};loadGapiClient();};loaderScript.onreadystatechange=function(){if(this.readyState==='complete'){this.onload();}};(document.head||document.body||document.documentElement).appendChild(loaderScript);}function updateUserHtmlFrame(userHtml,enableInteraction,forceIosScrolling){var frame=document.getElementById('userHtmlFrame');if(enableInteraction){if(forceIosScrolling){var iframeParent=frame.parentElement;iframeParent.classList.add('forceIosScrolling');}else{frame.style.overflow='auto';}}else{frame.setAttribute('scrolling','no');frame.style.pointerEvents='none';}clearCookies();clearStorage();frame.contentWindow.document.open();frame.contentWindow.document.write('<base target="_blank">'+userHtml);frame.contentWindow.document.close();}function onGapiInitialized(){gapi.rpc.call('..','innerFrameGapiInitialized');gapi.rpc.register('updateUserHtmlFrame',updateUserHtmlFrame);}function loadGapiClient(){gapi.load('gapi.rpc',onGapiInitialized);}if(document.readyState=='complete'){loadGapi();}else{self.addEventListener('load',loadGapi);}function clearCookies(){var cookies=document.cookie.split(";");for(var i=0;i<cookies.length;i++){var cookie=cookies[i];var equalPosition=cookie.indexOf("=");var name=equalPosition>-1?cookie.substr(0,equalPosition):cookie;document.cookie=name+"=;expires=Thu, 01 Jan 1970 00:00:00 GMT";document.cookie=name+"=;expires=Thu, 01 Jan 1970 00:00:01 GMT ;domain=.googleusercontent.com";}}function clearStorage(){try{localStorage.clear();sessionStorage.clear();}catch(e){}}</script>
However, the actual html on the website looks like this:
Section 3: The main problem with my approach:
I am unable to access the anything #document contained within the iframe.
Section 4: Conclusion:
Is there a possible solution that can fix my current approach to the problem?
Is there any other way to solve the problem described in Section 1?
You put a lot of effort into structuring your question, so I couldn't not answer it, even if it meant double negation.
Here is how you can drill down into the iframe with content:
EDIT: here is how you can select some options, click the button and access the results:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver_linux64/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 25)
url = 'https://sites.google.com/view/sarfgenerator/home'
driver.get(url)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[#aria-label="Custom embed"]')))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[#id="innerFrame"]')))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[#id="userHtmlFrame"]')))
first_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[#id="root1"]'))))
second_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[#id="root2"]'))))
third_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[#id="root3"]'))))
first_select.select_by_visible_text("ج")
second_select.select_by_visible_text("ت")
third_select.select_by_visible_text("ص")
wait.until(EC.element_to_be_clickable((By.XPATH, ('//button[#onclick="sarfGenerator(false)"]')))).click()
print('clicked')
result = wait.until(EC.presence_of_element_located((By.XPATH, '//p[#id="demo"]')))
print(result.text)
Result printed in terminal:
clicked
جَتَّصَ يُجَتِّصُ تَجتِيصًا مُجَتِّصٌ
جُتِّصَ يُجَتَّصُ تَجتِيصًا مُجَتَّصٌ
جَتِّصْ لا تُجَتِّصْ مُجَتَّصٌ Highlight Root Letters
Selenium setup is for Linux, you just have to observe the imports, and the part after defining the driver.
Selenium documentation can be found here.

Python move_to_element().click() is not pressing a right element visible on the screen or returns the error. A trial code is included

I try to interact with the elements (button at this scenario) inside Disqus iframe on this webpage:
This is my trial code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
path_to_chromedriver = r"c:\users\tv21\source\repos\chromedriver.exe"
driver = webdriver.Chrome(executable_path=path_to_chromedriver)
driver.maximize_window()
url = "https://www.postoj.sk/91472/po-navsteve-kina-si-precitajte-aj-kniznu-predlohu"
driver.get(url)
time.sleep(5)
button_to_close = driver.execute_script("return document.querySelector('body').querySelector('div.grv-dialog-host').shadowRoot.querySelector('div').querySelector('div.buttons-wrapper').querySelector('button.sub-dialog-btn.block_btn')")
ac = ActionChains(driver)
ac.move_to_element(button_to_close).click().perform()
open_discussion = driver.find_element_by_class_name('article-disqus-wrapper')
driver.execute_script("arguments[0].setAttribute('style','display: block;')", open_discussion)
disqus_thread = driver.find_element_by_id("disqus_thread")
iframe_element = disqus_thread.find_element_by_tag_name("iframe")
driver.switch_to.frame(iframe_element)
time.sleep(1)
button_to_load_more = driver.find_element_by_partial_link_text("Nahraj viac komentárov")
ac = ActionChains(driver)
ac.move_to_element(button_to_load_more).click().perform()
The issue is the last command:
ac.move_to_element(button_to_load_more).click().perform()
which shows an error: "move target out of bounds"
I tried instead:
button_to_load_more.click()
and
driver.execute_script("arguments[0].click();", button_to_load_more)
which both work completely fine as the alternatives and I can click the button.
However, I try to understand the reason for being out of bounds when using move_to_element(). I get exactly the same error always when I want to hover over any elements inside Disqus iframe too.
Can anyone help me to fix it or explain to me how to fix it?
First one dint worked because of the known issue in selenium,i guess you are using 3.4 hence facing this.(But it should work after trying newer version of selenium)
Some of the useful links fyr
Selenium MoveTargetOutOfBoundsException even after scrolling to element
https://github.com/SeleniumHQ/selenium/issues/4148

trying to open full text when webscraping using selenium

I am trying to open the full text on a page when using selenium.
I got the following code:
import selenium
from selenium import webdriver as wb
from selenium.webdriver.common.by import By
import time
webD=wb.Chrome("C:\Program Files (x86)\chromedriver.exe")
webD.get('https://www.flashscore.com/')
webD.maximize_window() # For maximizing window
webD.implicitly_wait(2) # gives an implicit wait for 20 seconds
webD.find_element_by_id('onetrust-reject-all-handler').click()
matchpages = webD.find_elements(By.CLASS_NAME, "preview-ico.icon--preview")
Open_full_text = webD.find_elements(By.CLASS_NAME, "previewShowMore.showMore")
for matchpages in matchpages:
matchpages.click()
time.sleep(5)
for Open_full_text in Open_full_text:
Open_full_text.click()
However, When I try to let this script click on the open full text button, it does not open up completely.
What is the reason for this, or how can this be adjusted? Is it ok to use the for loop in this scenario, or do I have to use another method?
Thanks very much!

how to use time sleep to make selenium output consistent

This might be the stupidest question i asked yet but this is driving me nuts...
Basically i want to get all links from profiles but for some reason selenium gives different amounts of links most of the time ( sometimes all sometimes only a tenth)
I experimented with time.sleep and i know its affecting the output somehow but i dont understand where the problem is.
(but thats just my hypothesis maybe thats wrong)
I have no other explanation why i get incosistent output. Since i get all profile links from time to time the program is able to find all relevant profiles.
heres what the output should be (for different gui input)
input:anlagenbau output:3070
Fahrzeugbau output:4065
laserschneiden output:1311
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.common.exceptions import TimeoutException
from urllib.request import urlopen
from datetime import date
from datetime import datetime
import easygui
import re
from selenium.common.exceptions import NoSuchElementException
import time
#input window suchbegriff
suchbegriff = easygui.enterbox("Suchbegriff eingeben | Hinweis: suchbegriff sollte kein '/' enthalten")
#get date and time
now = datetime.now()
current_time = now.strftime("%H-%M-%S")
today = date.today()
date = today.strftime("%Y-%m-%d")
def get_profile_url(label_element):
# get the url from a result element
onlick = label_element.get_attribute("onclick")
# some regex magic
return re.search(r"(?<=open\(\')(.*?)(?=\')", onlick).group()
def load_more_results():
# load more results if needed // use only on the search page!
button_wrapper = wd.find_element_by_class_name("loadNextBtn")
button_wrapper.find_element_by_tag_name("span").click()
#### Script starts here ####
# Set some Selenium Options
options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
# Webdriver
wd = webdriver.Chrome(options=options)
# Load URL
wd.get("https://www.techpilot.de/zulieferer-suchen?"+str(suchbegriff))
# lets first wait for the timeframe
iframe = WebDriverWait(wd, 5).until(
EC.frame_to_be_available_and_switch_to_it("efficientSearchIframe")
)
# the result parent
result_pane = WebDriverWait(wd, 5).until(
EC.presence_of_element_located((By.ID, "resultPane"))
)
#get all profilelinks as list
time.sleep(5)
href_list = []
wait = WebDriverWait(wd, 15)
while True:
try:
#time.sleep(1)
wd.execute_script("loadFollowing();")
#time.sleep(1)
try:
wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".fancyCompLabel")))
except TimeoutException:
break
#time.sleep(1) # beeinflusst in irgeneiner weise die findung der ergebnisse
result_elements = wd.find_elements_by_class_name("fancyCompLabel")
#time.sleep(1)
for element in result_elements:
url = get_profile_url(element)
href_list.append(url)
#time.sleep(2)
while True:
try:
element = wd.find_element_by_class_name('fancyNewProfile')
wd.execute_script("""var element = arguments[0];element.parentNode.removeChild(element);""", element)
except NoSuchElementException:
break
except NoSuchElementException:
break
wd.close #funktioniert noch nicht
print("####links secured: "+str(len(href_list)))
Since you say that the sleep is affecting the number of results, it sounds like they're loading asynchronously and populating as they're loaded, instead of all at once.
The first question is whether you can ask the web site developers to change this, to only show them when they're all loaded at once.
Assuming you don't work for the same company as them, consider:
Is there something else on the page that shows up when they're all loaded? It could be a button or a status message, for instance. Can you wait for that item to appear, and then get the list?
How frequently do new items appear? You could poll for the number of results relatively infrequently, such as only every 2 or 3 seconds, and then consider the results all present when you get the same number of results twice in a row.
The issue is the method presence_of_all_elements_located doesn't wait for all elements matching a passed locator. It waits for presence of at least 1 element matching the passed locator and then returns a list of elements found on the page at that moment matching that locator.
In Java we have
wait.until(ExpectedConditions.numberOfElementsToBeMoreThan(element, expectedElementsAmount));
and
wait.until(ExpectedConditions.numberOfElementsToBe(element, expectedElementsAmount));
With these methods you can wait for predefined amount of elements to appear etc.
Selenium with Python doesn't support these methods.
The only thing you can see with Selenium in Python is to build some custom method to do these actions.
So if you are expecting some amount of elements /links etc. to appear / be presented on the page you can use such method.
This will make your test stable and will avoid usage of hardcoded sleeps.
UPD
I have found this solution.
This looks to be the solution for the mentioned above methods.
This seems to be a Python equivalent for wait.until(ExpectedConditions.numberOfElementsToBeMoreThan(element, expectedElementsAmount));
myLength = 9
WebDriverWait(browser, 20).until(lambda browser: len(browser.find_elements_by_xpath("//img[#data-blabla]")) > int(myLength))
And this
myLength = 10
WebDriverWait(browser, 20).until(lambda browser: len(browser.find_elements_by_xpath("//img[#data-blabla]")) == int(myLength))
Is equivalent for Java wait.until(ExpectedConditions.numberOfElementsToBe(element, expectedElementsAmount));

how to restart the execution of a test case in selenium ide

I have a test case which stops and throws an error "Element not found".
Now what I want to do is, I want to selenium ide to refresh the page and run the script from beginning.
All I want to know is that, is there any way to make this happen in Selenium IDE. If yes, how?
Yes, of course. First of all, you should have a try-except-else structure. Then you should have a while True loop and specify the NoSuchElementException for continuing from the beginning of the loop. In the first line of this except block, you put a line as driver.refresh(). In the else block you break the loop and consider that you should have two except block; one for NoSuchElementException and another for other exceptions. In the second you should break the loop I guess. Here's a sample code:
import selenium.common.exceptions as SeleniumExceptions
from selenium import webdriver
import sys
driver = webdriver.Firefox()
while True:
try:
element = driver.find_element_by_id('some_id')
# do whatever you want
except SeleniumExceptions.NoSuchElementException:
driver.refresh()
continue
except:
print sys.exc_info()
break
else:
break
EDIT1:
Consider that if you have to work with a specific frame or iframe, you should switch back to it with driver.switch_to_frame(driver.find_element_by_id('frame_id')) and continue your work; or if you think you should wait for the frame to load, you can use this code:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, timeout=10).until(EC.frame_to_be_available_and_switch_to_it(driver.find_element_by_id('frame_id')))