trying to open full text when webscraping using selenium


I am trying to open the full text on a page when using selenium.
I got the following code:
import selenium
from selenium import webdriver as wb
from selenium.webdriver.common.by import By
import time
webD = wb.Chrome(r"C:\Program Files (x86)\chromedriver.exe")
webD.get('https://www.flashscore.com/')
webD.maximize_window() # maximize the browser window
webD.implicitly_wait(2) # sets an implicit wait of 2 seconds
webD.find_element_by_id('onetrust-reject-all-handler').click()
matchpages = webD.find_elements(By.CLASS_NAME, "preview-ico.icon--preview")
Open_full_text = webD.find_elements(By.CLASS_NAME, "previewShowMore.showMore")
for matchpages in matchpages:
    matchpages.click()
    time.sleep(5)
for Open_full_text in Open_full_text:
    Open_full_text.click()
However, when I let this script click the "open full text" button, the text does not open up completely.
What is the reason for this, or how can this be adjusted? Is it ok to use the for loop in this scenario, or do I have to use another method?
Thanks very much!
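On the for-loop question: writing "for matchpages in matchpages" is legal Python, but it rebinds the name to each element in turn, so after the loop the name refers to the last element rather than to the list. The sketch below shows this with plain strings standing in for the WebElements. The names match_links, shadowed, link and clicked are illustrative and not part of the original script, and this alone may not explain why the text fails to expand.

```python
# Stand-in strings replace the WebElements returned by find_elements().
match_links = ["match-1", "match-2", "match-3"]

# Reusing the list's name as the loop variable, as in the script above.
shadowed = list(match_links)
for shadowed in shadowed:
    pass
# The name now refers to the last item, not to the list.
print(shadowed)

# A distinct loop variable leaves the list available afterwards.
clicked = []
for link in match_links:
    clicked.append(link)  # with Selenium this would be link.click()
print(len(match_links), clicked)
```

Using a separate loop variable (for example "for link in match_links") keeps the list usable after the loop and makes the code easier to follow.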

Related

Selenium (Python)- Webscraping verb-conjugation tables (Accessing web elements underneath '#document')

Section 0: Introduction:
This is my first web scraping project and I am not experienced with Selenium. I am trying to scrape Arabic verb-conjugation tables from the website:
Online Sarf Generator
Any help with the following problem would be great.
Thank you.
Section 1: The Problem:
I am trying to webscrape from the following website:
Online Sarf Generator
For doing this, I am trying to use Selenium.
I basically need to select the three root letters and the family from the four toggle menus as shown in the picture below:
After this, I have to click the 'Generate Sarf Table' button.
Section 2: My Attempt:
Here is my code:
#------------------ Just Setting Up the web_driver:
s = Service('/usr/local/bin/chromedriver')
# Set some selenium chrome options:
chromeOptions = Options()
# chromeOptions.headless = False
driver = webdriver.Chrome(service=s, options=chromeOptions)
driver.get('https://sites.google.com/view/sarfgenerator/home')
# I switch the frame once:
iframe = driver.find_elements(by=By.CSS_SELECTOR, value='iframe')[0]
driver.switch_to.frame(iframe)
# I switch the frame again:
iframe = driver.find_elements(by=By.CSS_SELECTOR, value='iframe')[0]
driver.switch_to.frame(iframe)
This takes me to the frame within which the webelements that I need are located.
Now, I print the html to see where I am at:
print(BeautifulSoup(driver.execute_script("return document.body.innerHTML;"),'html.parser'))
Here is the output that I get:
<iframe frameborder="0" id="userHtmlFrame" scrolling="yes">
</iframe>
<script>function loadGapi(){var loaderScript=document.createElement('script');loaderScript.setAttribute('src','https://apis.google.com/js/api.js?checkCookie=1');loaderScript.onload=function(){this.onload=function(){};loadGapiClient();};loaderScript.onreadystatechange=function(){if(this.readyState==='complete'){this.onload();}};(document.head||document.body||document.documentElement).appendChild(loaderScript);}function updateUserHtmlFrame(userHtml,enableInteraction,forceIosScrolling){var frame=document.getElementById('userHtmlFrame');if(enableInteraction){if(forceIosScrolling){var iframeParent=frame.parentElement;iframeParent.classList.add('forceIosScrolling');}else{frame.style.overflow='auto';}}else{frame.setAttribute('scrolling','no');frame.style.pointerEvents='none';}clearCookies();clearStorage();frame.contentWindow.document.open();frame.contentWindow.document.write('<base target="_blank">'+userHtml);frame.contentWindow.document.close();}function onGapiInitialized(){gapi.rpc.call('..','innerFrameGapiInitialized');gapi.rpc.register('updateUserHtmlFrame',updateUserHtmlFrame);}function loadGapiClient(){gapi.load('gapi.rpc',onGapiInitialized);}if(document.readyState=='complete'){loadGapi();}else{self.addEventListener('load',loadGapi);}function clearCookies(){var cookies=document.cookie.split(";");for(var i=0;i<cookies.length;i++){var cookie=cookies[i];var equalPosition=cookie.indexOf("=");var name=equalPosition>-1?cookie.substr(0,equalPosition):cookie;document.cookie=name+"=;expires=Thu, 01 Jan 1970 00:00:00 GMT";document.cookie=name+"=;expires=Thu, 01 Jan 1970 00:00:01 GMT ;domain=.googleusercontent.com";}}function clearStorage(){try{localStorage.clear();sessionStorage.clear();}catch(e){}}</script>
However, the actual html on the website looks like this:
Section 3: The main problem with my approach:
I am unable to access anything under the #document node contained within the iframe.
Section 4: Conclusion:
Is there a possible solution that can fix my current approach to the problem?
Is there any other way to solve the problem described in Section 1?

You put a lot of effort into structuring your question, so I couldn't not answer it, even if it meant double negation.
Here is how you can drill down into the iframe with content:
EDIT: here is how you can select some options, click the button and access the results:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")
webdriver_service = Service("chromedriver_linux64/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 25)
url = 'https://sites.google.com/view/sarfgenerator/home'
driver.get(url)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@aria-label="Custom embed"]')))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@id="innerFrame"]')))
wait.until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, '//*[@id="userHtmlFrame"]')))
first_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[@id="root1"]'))))
second_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[@id="root2"]'))))
third_select = Select(wait.until(EC.element_to_be_clickable((By.XPATH, '//select[@id="root3"]'))))
first_select.select_by_visible_text("ج")
second_select.select_by_visible_text("ت")
third_select.select_by_visible_text("ص")
wait.until(EC.element_to_be_clickable((By.XPATH, ('//button[@onclick="sarfGenerator(false)"]')))).click()
print('clicked')
result = wait.until(EC.presence_of_element_located((By.XPATH, '//p[@id="demo"]')))
print(result.text)
Result printed in terminal:
clicked
جَتَّصَ يُجَتِّصُ تَجتِيصًا مُجَتِّصٌ
جُتِّصَ يُجَتَّصُ تَجتِيصًا مُجَتَّصٌ
جَتِّصْ لا تُجَتِّصْ مُجَتَّصٌ Highlight Root Letters
The Selenium setup shown is for Linux; on other systems, keep the imports and everything after the driver is defined, and adjust the chromedriver path.
Selenium documentation can be found here.

find_element_by_name selenium python question

I'm a beginner trying to use selenium to automate browser interactions through an undetectable chrome browser.
The code I've written so far is below (you have no idea how much time I've sunk into 5 lines).
I've tried so many iterations of the same code that I've lost my sanity. This SHOULD work?
This is almost copied exactly from a YouTube video now. The YouTubers used some other ideas too, but I didn't understand the code, so I haven't touched them. Anything commented out with # can be ignored; assume I've played with it and failed.
import autogui, sys, time, webbrowser, selenium
import undetected_chromedriver.v2 as uc
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common import action_chains
#Open Browser and visit website.
driver = uc.Chrome()
driver.get('https://www.iqrpg.com/game.html')
time.sleep(5)
#Complete username and password fields
userN = 'Gary'
passW = 'Barry'
find_element_by_name('login_username').send_keys(userN)
#find_element_by_name('login_password').send_keys(passW)
#driver.find_element_by_css_selector("input[type=\"submit"
#userField =
#passField = driver.find_element(By.ID, "passwd-id")
#search_box = driver.find_element_by_name('Battling')
#search_box.send_keys('ChromeDriver')
#search_box.submit()
#time.sleep(5)
1. Expecting the browser to open, forcibly logging you out due to selenium chrome
2. select name='login_username', send key the string saved under userN
3. same for password
4. click login (not yet coded, but plans)

In Selenium 4.3 and later for Python, the find_element_by_* helpers such as driver.find_element_by_name have been removed.
Instead, call driver.find_element on the driver object with a By locator, and pass the variables themselves rather than quoted strings:
from selenium.webdriver.common.by import By
driver.find_element(By.NAME, "login_username").send_keys(userN)
driver.find_element(By.NAME, "login_password").send_keys(passW)

EASY PYTHON SELENIUM: How do I hit CTRL + F?

I am following this tutorial but I cannot get it to work. My goal is to hit CTRL + F. The page opens but nothing happens after that. Do you see any issues?
Code:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from time import sleep
driver = webdriver.Chrome("C:\Program Files (x86)\chromedriver.exe")
driver.get("https://www.geeksforgeeks.org/")
sleep(2)
action = ActionChains(driver)
action.key_down(Keys.CONTROL).send_keys('f').key_up(Keys.CONTROL).perform()

I don't think you're required to include the key_up call; try:
action.key_down(Keys.CONTROL).send_keys('f').perform()
Additional information can be found here.

Trying to find the correct xpath

I have written the following code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.by import By
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.flashscore.com/match/jBvNMej6/#match-summary")
print(driver.title)
driver.maximize_window() # For maximizing window
driver.implicitly_wait(10) # sets an implicit wait of 10 seconds
driver.find_element_by_id('onetrust-reject-all-handler').click()
time.sleep(2)
driver.find_element(By.CLASS_NAME,'previewShowMore.showMore').click()
main = driver.find_element(By.CLASS_NAME,'previewLine'[b[text()="Hot stat:"]]/text)
print(main.text)
time.sleep(2)
driver.close()
However, I get the following error.
main = driver.find_element(By.CLASS_NAME,'previewLine'[b[text()="Hot stat:"]]/text)
^
SyntaxError: invalid syntax
What can I do to avoid this?
thx! : )

Well, in this line
main = driver.find_element(By.CLASS_NAME,'previewLine'[b[text()="Hot stat:"]]/text)
You have mixed a class-name locator with a fragment of XPath, and the XPath part sits outside the string quotes, which is what triggers the SyntaxError.
Also, if you want to print the paragraph text without the "Hot streak" label, you will need to remove that string from the entire div (paragraph) text.
This should do what you are trying to achieve:
main = driver.find_element(By.XPATH,"//div[@class='previewLine' and ./b[text()='Hot streak']]").text
main = main.replace('Hot streak','')
print(main)
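To see the locator logic without a browser, here is a minimal sketch that uses Python's standard xml.etree.ElementTree, which supports a small XPath subset. The SAMPLE markup is hypothetical, modelled only on the locator above, and the real page may be structured differently. The names SAMPLE, line, full_text and summary are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the answer's locator; the real page may differ.
SAMPLE = """
<div id="preview">
  <div class="previewLine"><b>Hot streak</b>: Team A won their last five games.</div>
  <div class="previewLine"><b>Hot stat</b>: Both sides scored in four of five meetings.</div>
</div>
"""

root = ET.fromstring(SAMPLE)

# Same shape as the XPath above: a div with class 'previewLine'
# that has a <b> child whose text is exactly 'Hot streak'.
line = root.find(".//div[@class='previewLine'][b='Hot streak']")

# itertext() joins the label and the sentence into one string.
full_text = "".join(line.itertext())

# Drop the label, then trim the leftover colon and space.
summary = full_text.replace("Hot streak", "").lstrip(": ")
print(summary)
```

The predicate selects the parent div rather than the b element, which is why the whole line of text is available before the label is stripped.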

I'm not finding any text 'Hot stat:'. You'll have to attach the html code where you found that.
I assume that you want to retrieve the text of a specific previewLine?
main = driver.find_element(By.XPATH ,'//div[@class="previewLine"]/b[contains(text(),"Hot streak")]/..')
print(main.text)

Selenium can't find search element inside form

I'm trying to use selenium to perform searches in lexisnexis and I can't get it to find the search box.
I've tried find_element_by using all possible attributes and I only get the "NoSuchElementException: Message: no such element: Unable to locate element: " error every time.
See screenshot of the inspection tab -- the highlighted part is the element I need
My code:
from selenium import webdriver
import numpy as np
import pandas as pd
searchTerms = r'something'
url = r'https://www.lexisnexis.com/uk/legal/news' # this is the page after login - not including the code for login here.
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
browser.get(url)
I tried everything:
browser.find_element_by_id('search-query')
browser.find_element_by_xpath('//*[@id="search-query"]')
browser.find_element_by_xpath('/html/body/div/header/div/form/div[2]/input')
etc..
Nothing works. Any suggestions?

Your page may be taking too long to load. In that case, you can use an explicit wait to avoid synchronization issues.
wait = WebDriverWait(browser, 10)
inputBox = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='search-query']")))
Note: add the imports below to your solution.
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
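An explicit wait keeps re-checking a condition until it succeeds or a timeout expires; WebDriverWait polls every 0.5 seconds by default and raises TimeoutException on failure. The following is a simplified sketch of that polling idea in plain Python, not Selenium's actual implementation. The names wait_until, search_box_ready and attempts are hypothetical, and the built-in TimeoutError stands in for Selenium's exception.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose search box only appears on the third check.
attempts = {"count": 0}


def search_box_ready():
    attempts["count"] += 1
    return "search-query" if attempts["count"] >= 3 else None


found = wait_until(search_box_ready, timeout=2.0, poll_interval=0.01)
print(found, attempts["count"])
```

Unlike a fixed time.sleep, this returns as soon as the condition holds, so a fast page is not slowed down and a slow page still gets the full timeout.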
