I'm using RSelenium to automatically scroll down a social media website and save posts. Sometimes I get to the bottom of the webpage and no more posts can be loaded as no more data is available. I just want to be able to check if this is the case so I can stop trying to scroll.
How can I tell if it's possible to continue scrolling in RSelenium? The code below illustrates what I'm trying to do - I think I just need help with the "if" statement.
FYI there's a solution for doing this in Python here (essentially checking if page height changes between iterations), but I can't figure out how to implement it (or any other solution) in R.
# Open webpage
library(RSelenium)
rD = rsDriver(browser = "firefox")
remDr = rD[["client"]]
url = "https://stocktwits.com/symbol/NZDCHF"
remDr$navigate(url)
# Keep scrolling down page, loading new content each time.
ptm = proc.time()
repeat {
remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")
Sys.sleep(3) #delay by 3sec to give chance to load.
# Here's where I need help
if([INSERT CONDITION TO CHECK IF SCROLL DOWN IS POSSIBLE]) {
break
}
}
Stumbled across a way to do this in Python here and modified it to work in R. Below is a now-working update of the original code I posted above.
# Open webpage
library(RSelenium)
rD = rsDriver(browser = "firefox")
remDr = rD[["client"]]
url = "https://stocktwits.com/symbol/NZDCHF"
remDr$navigate(url)
# Keep scrolling down page, loading new content each time.
last_height = 0 # page height seen on the previous iteration
repeat {
remDr$executeScript("window.scrollTo(0,document.body.scrollHeight);")
Sys.sleep(3) #delay by 3sec to give chance to load.
# Updated if statement which breaks if we can't scroll further
new_height = remDr$executeScript("return document.body.scrollHeight")
if(unlist(last_height) == unlist(new_height)) {
break
} else {
last_height = new_height
}
}
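For readers driving a browser from JavaScript rather than R, here is a minimal sketch of the same stop condition (compare the page height before and after each scroll). The names `scrollUntilStable`, `scrollToBottom`, `getHeight` and `maxRounds` are made up for illustration; the safety cap is my own addition, and the pause that gives content time to load is left out so the loop logic is easy to see.

```javascript
// Hypothetical helper: keep scrolling until the page height stops changing.
// scrollToBottom and getHeight are injected, so the loop does not depend on
// any particular driver. maxRounds is a safety cap (an assumption, not part
// of the R answer above).
function scrollUntilStable(scrollToBottom, getHeight, maxRounds) {
  var lastHeight = getHeight();
  for (var round = 1; round <= maxRounds; round++) {
    scrollToBottom();
    var newHeight = getHeight();
    if (newHeight === lastHeight) {
      // Height did not change, so no further content was loaded.
      return { rounds: round, height: newHeight, reachedEnd: true };
    }
    lastHeight = newHeight;
  }
  return { rounds: maxRounds, height: lastHeight, reachedEnd: false };
}

// Usage with a fake page that grows twice and then stops growing.
var heights = [1000, 2000, 3000, 3000];
var index = 0;
var result = scrollUntilStable(
  function () { if (index < heights.length - 1) { index++; } },
  function () { return heights[index]; },
  10
);
console.log(result);
```

In real use you would wait between the scroll and the measurement, as the `Sys.sleep(3)` call does above; a slow network can otherwise look like the end of the page.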
Update: I used HtmlView, and yes, the height adjusts dynamically to the content,
but it doesn't seem to support <img />.
I'm currently using WebView to render the markdown content as below
<WebView :src="marked_content" height="1200px" margin="20dp" />
The result is fixed at 1200px with a scrollbar, as expected, but what I actually want is to render the whole markdown content at whatever height it needs, without a scrollbar.
Can anyone help, please?
PS: Any other methods that can render markdown content are welcome! Thank you!
Do you have control over the website? If yes, you can do this:
https://discourse.nativescript.org/t/dynamic-webview-height/4215/2?u=manojdcoder
I have worked out how to do this without plugins.
There is a solution above which appends the URL with a hash containing the page height. It didn't work for me because I'm adding HTML code directly.
For example
src="<p>blah blah</p>"
This is a plain JS solution, so you'll have to rework it to get it working in Vue / Typescript.
Give your WebView an id, do not set the height, and add the "loaded" and "loadFinished" handlers.
For the loaded handler.
platformModule = require("tns-core-modules/platform");
var webViewSrcObj = {};
exports.webViewLoaded = function(webargs){
if(platformModule.isAndroid){console.log("IS ANDROID!!!"); return false;}
webview = webargs.object;
if(webview.height == "auto"){
webViewSrcObj[webview.id] = webview.src;
webview.src += '<script>function getPageHeight(){if(document.documentElement.clientHeight>document.body.clientHeight){height = document.documentElement.clientHeight}else{height = document.body.clientHeight}; ph = document.getElementById("pageHeight"); window.location = "pageHeight.html?height="+height;} setTimeout(getPageHeight, 1);</script>';
}
}
It checks the platform, and returns false if it is Android (Android works fine already).
Then it checks if the height is set to "auto" (which is default).
If it is set to auto, it copies the HTML content (I'll explain more about this later). It then appends some JavaScript to the HTML that calculates the view height and redirects to an empty page. It does this so that the page height can be passed in the query string. Make sure that page exists in your app folder to avoid any page-not-found errors!
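To make the injected one-liner easier to follow, here is a rough sketch of the same calculation as a standalone function. `buildHeightRedirect` is a hypothetical name, and `doc` is any object shaped like a DOM document, so this can be run outside a WebView.

```javascript
// Sketch of the injected logic as a testable function.
// doc is any object shaped like a DOM document; redirectPage defaults to
// the empty page name used in the handler above.
function buildHeightRedirect(doc, redirectPage) {
  var page = redirectPage || "pageHeight.html";
  // Take whichever of the two reported heights is larger.
  var height = Math.max(doc.documentElement.clientHeight, doc.body.clientHeight);
  return page + "?height=" + height;
}

// Fake document for illustration only.
var fakeDoc = { documentElement: { clientHeight: 640 }, body: { clientHeight: 1275 } };
console.log(buildHeightRedirect(fakeDoc)); // pageHeight.html?height=1275
```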
Then for the loadFinished handler:
exports.webViewLoadFinished = function(webargs){
if(platformModule.isAndroid){
console.log("IS ANDROID!!!"); return false;
}
webview = webargs.object;
if(webargs.url.indexOf("?height") > -1){
height = (webargs.url).split("?height=");
height = height[1].substr(0, height[1].length)/1;
webview.height = height; webview.src = webViewSrcObj[webview.id];
}
}
This will check to see if the query string height value exists.
If so it uses the height value to set the height of the webview.
And finally it restores the HTML content that was copied in the loaded handler.
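The query-string step can be sketched on its own like this. `parseHeightFromUrl` is a hypothetical helper, not part of NativeScript; unlike the `/1` coercion in the handler, it uses `parseInt`, which also tolerates trailing characters after the number.

```javascript
// Hypothetical helper: pull the numeric height out of a URL such as
// "pageHeight.html?height=1275". Returns null when no usable value is found.
function parseHeightFromUrl(url) {
  var marker = "?height=";
  var start = url.indexOf(marker);
  if (start === -1) {
    return null;
  }
  var height = parseInt(url.slice(start + marker.length), 10);
  return isNaN(height) ? null : height;
}

console.log(parseHeightFromUrl("file:///app/pageHeight.html?height=1275")); // 1275
console.log(parseHeightFromUrl("file:///app/index.html")); // null
```

Returning null for a missing value lets the handler skip the resize instead of setting the height to NaN.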
My initial testing works well even with multiple webviews in a page.
I haven’t done extensive testing, but it might help to increase the setTimeout time if you experience any problems.
If anyone is able to improve this solution, please share your results.
I want to detect the location of elements on a page using Watir and PhantomJS.
My second approach using Capybara resulted in the same offset.
While the elements on the left side look good, the right side is misaligned:
I made the screenshot before and after I grab the positions for each element with element.wd.location, but the offset is always the same. I used evaluate_script and .getBoundingClientRect() with Capybara.
One thing looks suspicious to me: The search input field is not loaded correctly and not only shows a misalignment, but also a different size and the magnifying glass isn't shown. I don't know if this causes the offset.
I tested it with pure PhantomJS 2.1.1 (phantomjs file.js):
var fs = require('fs');
var page = require('webpage').create();
page.viewportSize = {
width: 1024,
height: 768
};
page.open('http://en.wikipedia.org/', function() {
var positions = page.evaluate(function() {
positions = [];
elements = document.getElementsByTagName('IMG');
for (var i=0, l=elements.length; i<l; i++) {
pos = elements[i].getBoundingClientRect();
positions.push(pos.left + ' ' + pos.top);
};
return positions;
});
fs.write('test.txt', positions.join("\r\n"), 'w');
page.render('test.png');
phantom.exit();
});
Same result: if you open test.png, you see an image on the right (left: 952px, top: 259px), but test.txt shows it shifted to the left (left: 891px).
Do you know what could cause this problem?
Do you know what could cause this offset?
A bug in PhantomJS v2.1.1 or in the embedded QtWebKit.
Is there any workaround?
No.
But I want it to work anyway, how?
Fix it yourself or hire someone to fix it or wait for it to be fixed.
Note that the issue no longer occurs in version 2.5, but it is still in beta:
https://github.com/ariya/phantomjs/milestone/16
https://bitbucket.org/ariya/phantomjs/downloads/
Here's a screenshot taken with phantomjs-2.5.0-beta:
This seems to be an issue in PhantomJS.
On the GitHub thread of the issue, @dantarion seems to have found a solution:
I am running this as well.
My fix is to run the following on the page in an evaluate block to force PhantomJS to render at the right viewport height. It works for my use case, and while I want to see it fixed in 2.2, since it's still an issue I thought I'd post here.
document.getElementsByTagName("body")[0].style.overflow = "hidden";
document.getElementsByTagName("body")[0].style.height = "1080px";
document.getElementsByTagName("body")[0].style.maxHeight = "1080px";
document.getElementsByTagName("html")[0].style.overflow = "hidden";
document.getElementsByTagName("html")[0].style.height = "1080px";
document.getElementsByTagName("html")[0].style.maxHeight = "1080px";
It seems to solve the problem. The only remaining issue is that background-size: cover might still be off (as reported by @Luke-SF).
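The six assignments above can be folded into one small function. This is only a sketch: `pinViewportHeight` is a hypothetical name, and the fake document exists purely so the snippet runs outside a browser.

```javascript
// Hypothetical helper: apply the same three style overrides to <body> and
// <html>. doc is a document-like object; heightPx is a number of pixels.
function pinViewportHeight(doc, heightPx) {
  var px = heightPx + "px";
  ["body", "html"].forEach(function (tagName) {
    var element = doc.getElementsByTagName(tagName)[0];
    element.style.overflow = "hidden";
    element.style.height = px;
    element.style.maxHeight = px;
  });
}

// Fake document for illustration only.
var fakeElements = { body: { style: {} }, html: { style: {} } };
var fakeDocument = {
  getElementsByTagName: function (tagName) { return [fakeElements[tagName]]; }
};
pinViewportHeight(fakeDocument, 768);
console.log(fakeElements.body.style.height); // 768px
```

In PhantomJS the body of this function would have to run inside page.evaluate, since that is the only place where the page's document exists.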
Please take a look at my jsFiddle here
I am using the jQuery Isotope plugin and I am having trouble using its itemPositionDataEnabled option to scroll from my clicked item to the top of what's currently visible in the browser's window.
With itemPositionDataEnabled I should be able to extract the x and y position of whatever item I'm requesting. However, mine does nothing at all.
var $this = $(this),
scrollTop = $(window).scrollTop(),
itemPosition = $this.data('isotope-item-position'),
itemPositionY = $this.itemPosition.y,
distance = (itemPositionY - scrollTop);
$('html, body').stop().animate({
scrollTop: distance
}, 1000);
You have a plain and simple error in these two lines:
itemPosition = $this.data('isotope-item-position'),
itemPositionY = $this.itemPosition.y;
The second line should be:
itemPositionY = itemPosition.y;
Not sure if you're all the way there, since it only seems to work the way you want on the first click.
http://jsfiddle.net/EA8tM/90/
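A guess at why only the first click behaves, offered as a possibility rather than a confirmed diagnosis: subtracting the current scrollTop makes the target depend on how far the window has already scrolled. Assuming the stored position is relative to the Isotope container, the target could instead be computed from the container's page offset. `scrollTargetFor` and `containerOffsetTop` are hypothetical names.

```javascript
// Sketch: absolute scroll target for an item.
// Assumes itemPosition.y is measured from the top of the Isotope container
// and containerOffsetTop is the container's distance from the top of the
// page (for example, what $container.offset().top would report).
function scrollTargetFor(containerOffsetTop, itemPosition) {
  return containerOffsetTop + itemPosition.y;
}

console.log(scrollTargetFor(120, { x: 0, y: 480 })); // 600
```

The result does not depend on the current scroll position, so repeated clicks on the same item give the same target.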
I am creating automated tests for a web application. The application is very complex; in effect it is a text editor for a specific kind of content. Part of its functionality relies on pop-up frames. You can open a pop-up, make some changes and save them, which closes the current frame. My problem is that the close button sits inside the frame, so clicking it removes the frame the button lives in, and this causes Chrome WebDriver to hang. My first try was like this:
driver.findElement(By.xpath("//input[@id='insert']")).click();
driver.switchTo().defaultContent();
But it hangs on the first line after executing the click command, because that command closes the frame.
Then I changed to this (I have jQuery on the page):
driver.executeScript("$(\"input#insert\").click()");
driver.switchTo().defaultContent();
But this leads to same result.
Then I use this solution:
driver.executeScript("setTimeout(function(){$(\"input#insert\").click()}, 10)");
driver.switchTo().defaultContent();
And it hangs on the second line. Only this solution works:
driver.executeScript("setTimeout(function(){$(\"input#insert\").click()}, 100)");
driver.switchTo().defaultContent();
but only if you ignore that it is unstable: timing issues may still occur.
So my question is: is there a cleaner and more stable way to switch out of a closed frame?
P.S.: executeScript is a self-defined function to reduce the amount of code. It simply executes some JS on the page.
Update:
I realized I was wrong. This problem does not affect all iframes. It occurs when a tinyMCE popup is used. The situation is exactly like in this topic. So it's doubtful I will find an answer here, but who knows. The solution described above helps, but only for a very short time: after several seconds chromedriver hangs on the next command.
This is how I would do it in Ruby; hopefully you can adapt it for Java.
$driver.find_element(:xpath, "//input[@id='insert']").click
$wait.until {$driver.window_handles.size < 2} #this will "explicitly wait" for the window to close
handles = $driver.window_handles #get available window handles
$driver.switch_to.window(handles[0]) #navigate to default in this case the First window handle
hope this helps
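The idea behind the explicit wait (poll a condition until it holds or a timeout expires, instead of relying on a fixed delay) can be sketched in plain JavaScript. `waitUntil` and the `clock` object are hypothetical; the clock is injected so the example runs instantly with a fake one.

```javascript
// Hypothetical polling helper: returns true as soon as condition() holds,
// or false once timeoutMs has elapsed. clock must provide now() and
// sleep(ms); both are assumptions made for this sketch.
function waitUntil(condition, timeoutMs, intervalMs, clock) {
  var deadline = clock.now() + timeoutMs;
  while (true) {
    if (condition()) { return true; }
    if (clock.now() >= deadline) { return false; }
    clock.sleep(intervalMs);
  }
}

// Fake clock and a fake popup that "closes" after 300 ms.
var fakeClock = {
  time: 0,
  now: function () { return this.time; },
  sleep: function (ms) { this.time += ms; }
};
var openWindows = 2;
var closed = waitUntil(function () {
  if (fakeClock.time >= 300) { openWindows = 1; }
  return openWindows < 2;
}, 5000, 100, fakeClock);
console.log(closed, fakeClock.time);
```

Compared with a fixed setTimeout delay, this only waits as long as it has to and reports a timeout instead of failing silently.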
The problem was in this line of the tinyMCEPopup code:
DOM.setAttrib(id + '_ifr', 'src', 'javascript:""'); // Prevent leak
Executing this script on the page fixes the hang (but possibly creates leaks :) ):
(function() {
var domVar;
if (window.tinymce && window.tinymce.DOM) {
domVar = window.tinymce.DOM
}
else if (window.tinyMCE && window.tinyMCE.DOM) {
domVar = window.tinyMCE.DOM
}
else {
return;
}
var tempVar = domVar.setAttrib; // keep a reference to the original setAttrib
domVar.setAttrib = function(id, attr, val) {
if (attr == 'src' && typeof(val)== 'string' &&(val + "").trim().match(/javascript\s*:\s*("\s*"|'\s*')/)) {
console.log("Cool");
return;
}
else {
tempVar.apply(this, arguments);
}
}
}());
Bug and solution also described here
Note: the code above should be added to the parent frame, not to the popup frame.
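The check inside the patched setAttrib can be pulled out and tried on its own. This is a sketch: `isEmptyJavascriptSrc` is a hypothetical name, and the pattern is copied unchanged from the patch above.

```javascript
// Hypothetical predicate: true when the call would set src to an empty
// javascript: URL such as javascript:"" or javascript:''.
function isEmptyJavascriptSrc(attr, val) {
  return attr === "src" &&
    typeof val === "string" &&
    /javascript\s*:\s*("\s*"|'\s*')/.test(val.trim());
}

console.log(isEmptyJavascriptSrc("src", 'javascript:""'));        // true
console.log(isEmptyJavascriptSrc("src", "http://example.com/"));  // false
console.log(isEmptyJavascriptSrc("href", 'javascript:""'));       // false
```

Only calls for which this returns true are swallowed by the patch; everything else still reaches the original setAttrib.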
I have written a PhantomJS script to scrape multiple URLs by chaining calls to page.open() recursively. (Code snippet below.) This works for up to 3 or 4 URLs, but with a larger number of URLs I just get blank pages. By blank, I mean that document.URL contains "about:blank", and a screenshot just shows a blank white background. I have also noticed that the memory usage of PhantomJS keeps increasing as it continues to process a large number of URLs. Is there anything specific I need to do to deallocate memory used to render previous pages?
Have other people seen this issue? Is it possible to scale PhantomJS to scrape a larger number of URLs (say 100)?
Thanks
Rohit
Recursive code snippet to scrape multiple URLs:
srcProducts = [{'url':'http://...' }, { 'url': 'http://...' },...];
destProducts = [];
gRetries = 0;
process();
function process() {
if (srcProducts.length == 0) {
// Output to file
phantom.exit();
} else {
product = srcProducts.pop();
page = require('webpage').create();
page.open(product['url'], onOpen);
}
}
function onOpen(status) {
// check status
// scrape info into product
destProducts.push(product);
process();
}
Someone was kind enough to answer this question on Google Groups. The solution is to call page.release() after you are done using a page object.
https://groups.google.com/forum/?fromgroups#!topic/phantomjs/lquzLFvZtrA
page.release() has been deprecated in the current version of PhantomJS (v1.9).
You should now use page.close() instead to free the page from memory.
http://phantomjs.org/api/webpage/method/release.html
http://phantomjs.org/api/webpage/method/close.html
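Putting the advice together, here is a rough sketch of the recursive loop with each page closed before the next one is opened. `scrapeAll`, `createPage` and `createFakePage` are hypothetical names; the page factory is injected so the snippet runs in plain Node with a fake page.

```javascript
// Sketch: process URLs one at a time and close each page when done.
// createPage is injected; done receives the collected results.
function scrapeAll(urls, createPage, done) {
  var queue = urls.slice();
  var results = [];
  function next() {
    if (queue.length === 0) {
      done(results);
      return;
    }
    var url = queue.shift();
    var page = createPage();
    page.open(url, function (status) {
      results.push({ url: url, status: status });
      page.close(); // free the page before opening the next one
      next();
    });
  }
  next();
}

// Fake page factory that tracks how many pages are open at once.
var openPages = 0;
var maxOpenPages = 0;
function createFakePage() {
  openPages++;
  maxOpenPages = Math.max(maxOpenPages, openPages);
  return {
    open: function (url, callback) { callback("success"); },
    close: function () { openPages--; }
  };
}

var scraped = [];
scrapeAll(
  ["http://a.example/", "http://b.example/", "http://c.example/"],
  createFakePage,
  function (results) { scraped = results; }
);
console.log(scraped.length, openPages, maxOpenPages);
```

In a PhantomJS script, createPage would presumably be a function returning require('webpage').create(), and the done callback would write the output and call phantom.exit().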