Which element to find to go to next webpage? - selenium

I'm trying to build a web scraper/crawler, but I can't find the right way to locate the element for the first recommended video on this Shutterstock page. I've tried every locator I could think of, but the lookup either runs forever (or at least for a very long time) or returns nothing.
Here is my current code:
while vid_count < num_vids:
    actual_vid_link = wd.find_element(By.TAG_NAME, 'source')
    print("Added - " + actual_vid_link.get_attribute('src'))
    vid_links.add(actual_vid_link.get_attribute('src'))
    vid_count = len(vid_links)
    if len(vid_links) >= num_vids:
        print(f"Found: {len(vid_links)} video links, done!")
        break
    else:
        wd.find_elements(By.CLASS_NAME, "jss996")[0].click()

Related

RSelenium: Entering search term - trouble with css selector

I am trying to trigger a search on this site. First, I want to enter a search term, then click the search button. I am able to do the second step, but I am unable to access the search field. My attempt is below.
Start RSelenium
link_to_page<-"https://www.cec.ro/sucursale"
library(RSelenium)
rD <- rsDriver(browser = "firefox", port = 483L, verbose = F)
remDr <- rD[["client"]]
# Navigate to site, and wait
remDr$navigate(link_to_page)
Sys.sleep(5)
#Search for element by its id
remDr$findElement(using="css", "#edit-localitate--_jjPr3WukFY")
Error: Summary: NoSuchElement
Detail: An element could not be located on the page using the given search parameters.
class: org.openqa.selenium.NoSuchElementException
Further Details: run errorDetails method
There is apparently something wrong with the CSS selector. I checked, and the element is not nested in an iframe, but maybe the problem is related to the 'form' element it is nested in? Grateful for any hint. Many thanks.
The error is NoSuchElement, which indicates that no element could be located on the page using the given search parameters.
The second part of the id, i.e. _jjPr3WukFY, is dynamically generated and is bound to change sooner or later. It may change the next time you load the page, or even the next time the application starts, so it can't be used in a locator.
Solution
You need to use an attribute, or part of one, that is static. For example:
remDr$findElement(using="css", "button[id^='edit-localitate']")
or
remDr$findElement(using="xpath", "//button[starts-with(@id, 'edit-localitate')]")
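The same idea can be sketched in Python as a small helper that derives a prefix-based locator from an id copied out of the browser's inspector. This is only an illustration: the function names are hypothetical, and it assumes the generated suffix is always introduced by a `--` separator, which is inferred from the single id in the question rather than from anything the site documents.

```python
def _static_prefix(full_id: str, separator: str = "--") -> str:
    """Return the part of the id before the (assumed) dynamic suffix."""
    prefix = full_id.split(separator, 1)[0]
    if not prefix:
        raise ValueError(f"no static prefix found in {full_id!r}")
    return prefix


def stable_id_css(full_id: str, tag: str = "", separator: str = "--") -> str:
    """Build a CSS 'id starts with' selector from the static prefix."""
    return f"{tag}[id^='{_static_prefix(full_id, separator)}']"


def stable_id_xpath(full_id: str, tag: str = "*", separator: str = "--") -> str:
    """Build the equivalent XPath starts-with() expression."""
    return f"//{tag}[starts-with(@id, '{_static_prefix(full_id, separator)}')]"


print(stable_id_css("edit-localitate--_jjPr3WukFY"))
# [id^='edit-localitate']
print(stable_id_xpath("edit-localitate--_jjPr3WukFY"))
# //*[starts-with(@id, 'edit-localitate')]
```

Leaving the tag out (or using `*` in XPath) avoids guessing whether the target is a button or an input; pass a tag name only once you have confirmed it in the page source.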

In Adobe Acrobat Javascript, how can I force a page to become "editable" before a certain part of a script acts upon it?

What I'm trying to do: Iterate over each page in a PDF, and extract the number of words on each page.
What is happening instead: The code below will return 0 words for any page that has not become "editable". Although I have selected for all pages to become editable at once, Adobe will not maintain the editability of a page for very long after I have left that page. Side note: It also seems to cap how many pages I can have "editable" at once. This is a problem because right now I'm working with a 10 page selection of a pdf file. This same code will have to work with a 120+ page pdf. Please click 'Edit PDF'-->'Scanned Documents'-->'Settings' to see what I mean by "editable". I have already selected the option to have all pages become editable at once.
What I've tried so far: I've tried various ways to get Acrobat to make the page being iterated over the "active" one so that it would become editable. I've tried manually setting the page number on each iteration of the for loop, and adding an artificial delay, such as the for loop over h in the sample code. I've also looked for a method that reports which page is the "active" one, but I've had no luck so far.
CurrDoc = app.activeDocs[0];
CurrDoc.title;
NumPagesInDoc = CurrDoc.numPages;
console.println("Document has " + NumPagesInDoc + " pages");
for (j = 0; j < NumPagesInDoc; j++)
{
    NumWordsOnPage = CurrDoc.getPageNumWords(j);
    CurrDoc.pageNum = j;
    for (h = 0; h < 10000; h++); // <-- I've tried adding delays so that
                                 // Acrobat can catch up, but this hasn't worked.
    console.println("Page number: " + j + " has this number of words: " + NumWordsOnPage);
};
Output:
Document has 10 pages
Page number: 0 has this number of words: 309
Page number: 1 has this number of words: 0
Page number: 2 has this number of words: 0
Page number: 3 has this number of words: 0
Page number: 4 has this number of words: 0
Page number: 5 has this number of words: 0
Page number: 6 has this number of words: 0
Page number: 7 has this number of words: 0
Page number: 8 has this number of words: 0
Page number: 9 has this number of words: 158
true
Note: Which pages report a non-zero word count varies between runs, depending on which pages I've clicked on most recently before running the script.
Any guidance or help would be greatly appreciated. Thank you for your time.
I'm still not entirely sure what the issue is, but I've found a way to get Acrobat to behave most of the time.
Before clicking the "make all pages editable" option, zoom all the way out until you can see every page in the document. For whatever reason, doing this seemed to refresh something in the settings and make all the pages editable again. Afterwards it even seemed to work when I opened a totally different PDF and pressed "make all pages editable" without zooming out.

Scraping a page every time it changes

Hi, I am looking to scrape a page such as "https://www.tennis24.com/match/ABiALWlt/#match-statistics;0" every time the score changes. Currently I can scrape it using Selenium and BeautifulSoup with the code below:
from selenium import webdriver
Chrom_path = r"C:\Users\Dan1\Desktop\chromedriver.exe"
driver = webdriver.Chrome(Chrom_path)
driver.get("https://www.tennis24.com/match/zVrM3ySQ/#match-statistics;0")
data = driver.find_elements_by_class_name("statTextGroup")
for d in data:
    sub_data = d.find_elements_by_xpath(".//*")
    assert len(sub_data)==3
    for s_d in sub_data:
        print(s_d.get_attribute('class')[19:], s_d.get_attribute('innerText'))
But I have no idea how to automate it so that once the score at the top of the page (shown as "Medical timeout6 : 6 ( 0 : 0 )") changes, the scraper scrapes the new data. The element to monitor is only visible while the match is in play, so it is not always there.
If you need any more info, please let me know and I'll be happy to add it.
You can poll the element with the "scoreboard" class in a while loop. When its text differs from the previously stored value, the score has changed and you can scrape the other data you want.
Hope that helps.
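A minimal sketch of that polling loop is shown below. It is deliberately independent of Selenium: `watch_for_changes`, `read_value`, `on_change`, `poll_seconds` and `max_polls` are hypothetical names chosen for illustration, and the commented wiring assumes the score lives in an element with the "scoreboard" class, as suggested above. A reading of `None` stands for "element not on the page right now" and is skipped, which covers the case where the score is only shown while the match is in play. The first successful reading is also reported as a change, so the initial state gets scraped once.

```python
import time
from typing import Callable, Optional


def watch_for_changes(
    read_value: Callable[[], Optional[str]],
    on_change: Callable[[str], None],
    poll_seconds: float = 2.0,
    max_polls: Optional[int] = None,
) -> int:
    """Poll read_value() and call on_change(value) whenever it differs
    from the last value seen. Returns the number of changes reported."""
    last = None
    changes = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        current = read_value()
        # None means the element is absent; keep the last known value.
        if current is not None and current != last:
            last = current
            changes += 1
            on_change(current)
        time.sleep(poll_seconds)
    return changes


# Hypothetical wiring with the driver from the question (not run here):
#   def read_score():
#       found = driver.find_elements_by_class_name("scoreboard")
#       return found[0].text if found else None
#   watch_for_changes(read_score, lambda score: scrape_statistics())

# Self-contained demo with canned readings instead of a browser.
readings = iter(["6 : 6", "6 : 6", None, "7 : 6"])
seen = []
count = watch_for_changes(lambda: next(readings), seen.append,
                          poll_seconds=0, max_polls=4)
print(count, seen)  # 2 ['6 : 6', '7 : 6']
```

Using `find_elements` (plural) in the reader avoids an exception when the scoreboard is missing, since it returns an empty list instead of raising.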

What's the proper way to access a previous page's elements using Protractor?

I have a test where I click through a list of links on a page. If I open the links in new pages, I can iterate through the list using browser.switchTo().window and the original window's handle, clicking on each one.
But if I open a link in the same page (_self) and I navigate back to the original list of links using browser.navigate().back(), I get the following error when it iterates to click the next link:
StaleElementReferenceError: stale element reference: element is not attached to the page document
What's the proper way to access elements on a prior page once you've navigated away?
When you request an object from Selenium, Selenium identifies that object with an internal unique ID. See the command below:
>>> driver.find_element_by_tag_name("a")
<selenium.webdriver.firefox.webelement.FirefoxWebElement
(session="93fc2bec-c9f8-0c46-aec3-1939af00c917",
element="5173f7fb-63ca-e447-b176-4a226d956834")>
As you can see, the element has a unique UUID. Selenium maintains this list internally, so when you take an action such as click, it fetches the element from its cache and acts on it.
Once the page is refreshed or a new page is loaded, this cache is no longer valid, but the object you created in your language binding still exists. If I try to execute some action on it:
>>> elem.is_displayed()
selenium.common.exceptions.StaleElementReferenceException:
Message: The element reference of [object Null] null
stale: either the element is no longer attached to the DOM or the page has been refreshed
So, in short, there is no way to use the same object, which means you need to alter your approach. Consider the code below:
for elem in driver.find_elements_by_tag_name("a"):
    elem.click()
    driver.back()
The above code will fail on the second call to elem.click(). The fix is to avoid re-using the collection object across page loads and to use an index-based loop instead. This can be written in several ways. Consider a few approaches below.
Approach 1
elems = driver.find_elements_by_tag_name("a")
count = len(elems)
for i in range(0, count):
    elems[i].click()
    driver.back()
    elems = driver.find_elements_by_tag_name("a")
This is not a great approach, because I fetch the whole collection on every iteration and use only one element from it. On a page with 500+ links this code will be quite slow.
Approach 2
elems = driver.find_elements_by_tag_name("a")
count = len(elems)
for i in range(1, count + 1):
    elem = driver.find_element_by_xpath("(//a)[{}]".format(i))
    elem.click()
    driver.back()
This is better than approach 1 because I fetch the whole collection just once; after that I fetch one element and use one.
Approach 3
elems = driver.find_elements_by_tag_name("a")
links = []
for elem in elems:
    links.append(elem.get_attribute("href"))
for link in links:
    driver.get(link)
    # do some action
This approach only works when the links are href-based. So, depending on the situation, I would choose or adapt one of these approaches.
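For approach 3, it can help to clean the collected href values before visiting them, since anchors without an href yield None and some links use schemes such as javascript: that driver.get() cannot load. The helper below is a rough sketch, not part of Selenium: `navigable_links` is a hypothetical name, and it assumes the hrefs have already been resolved to absolute URLs, which is how get_attribute("href") normally behaves.

```python
from typing import Iterable, List, Optional


def navigable_links(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Keep hrefs that a browser can load directly, without duplicates,
    preserving the order in which they appeared on the page."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # anchor without an href attribute
        if not href.lower().startswith(("http://", "https://")):
            continue  # javascript:, mailto: and similar schemes
        if href in seen:
            continue  # same target linked more than once
        seen.add(href)
        result.append(href)
    return result


sample = [
    "https://example.com/a",
    None,
    "javascript:void(0)",
    "https://example.com/a",
    "https://example.com/b",
]
print(navigable_links(sample))
# ['https://example.com/a', 'https://example.com/b']
```

In the loop above this would replace `for link in links:` with `for link in navigable_links(links):`. Links that only work through a click handler are dropped by this filter and still need approach 1 or 2.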

applescript getInputByClass2 with Safari 10.1

An AppleScript I used to run every day to get text from Safari has stopped working since my last system update.
It used to work only in Safari and not in Safari Preview; I guess the Safari Preview behavior has now been brought over to Safari.
tell application "Safari"
set DinfoGrab to do JavaScript "
document.getElementsByClassName(' field type-string field-Dinfo ')[0].innerHTML;" in tab 3 of window 1
end tell
It fails with this error:
Safari got an error: Can’t make " document.getElementsByClassName('
field type-string field-Dinfo ')[0].innerHTML;" into type text.
How can I fix that? Thanks.
UPDATE:
Here is something that works perfectly with Chrome:
tell application "Google Chrome"
tell tab 3 of window 1 to set r to execute javascript "document.getElementsByClassName('field type-string field-Dinfo')[0].innerHTML;"
end tell
Without seeing the complete code, I can't say for sure what's going on. But judging by the name of your function -- getInputByClass2 -- I assume you're trying to get the value of HTML <input> fields. If this is true, you should be using outPut.push(arr[i].value) instead of outPut.push(arr[i].innerHTML)
As for the second bit of code, your JavaScript doesn't have any error handling in case the value of document.getElementsByClassName(' field type-string field-Dinfo ')[0] is null.
var els = document.getElementsByClassName(' field type-string field-Dinfo ');
//set to value of [0].innerHTML if [0] exists, else empty string
var html = els.length ? els[0].innerHTML : "";
//return value to AppleScript
html;
Update (response to updated question)
Running the following script in Script Editor against this StackOverflow page will return the correct value (assuming you have the correct window/tab numbers set). If the search field at the top of this StackOverflow page is empty, you will get an empty string. If you enter a term (but don't submit) then run the AppleScript, you will get the value of the field.
tell application "Safari"
set DinfoGrab to do JavaScript "
document.getElementsByClassName('js-search-field')[0].value;" in tab 1 of window 1
end tell
The only changes from your script are the window/tab numbers, the classname (changed to match the StackOverflow page), and I used value instead of innerHTML.
I have tested in the most current version of Safari (10.0.3); if this doesn't work in your version of Safari, ensure you're pointing to the correct class name. If this script DOES work for you, then the issue is probably due to something on the page you're trying to search, perhaps related to the type of <input> field you're fetching or an incorrect classname. Maybe the update to Safari is causing the page to render differently, which indirectly affects your code.