How to integrate mouse scrolling into a JavaScript web scraper? - qtwebkit

I found an awesome bit of code online from this source:
https://webscraping.com/blog/Scraping-multiple-JavaScript-webpages-with-webkit/
that lets me scrape JavaScript-heavy sites with ease. However, some sites only load fully when the page is scrolled with the mouse. How do I integrate mouse scrolling into this particular code? I have googled a lot and found methods for reading the AJAX calls directly, but I would still like to use this code, because some websites make so many calls that they are impossible to track. Hence I would rather use this code to just scrape the rendered HTML.
Part of the code is here, the rest is in the link above.
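import sys
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

class Render(QWebPage):
    def __init__(self, urls, cb):
        self.app = QApplication(sys.argv)  # a QApplication must exist before any QWebPage
        QWebPage.__init__(self)
        self.loadFinished.connect(self._loadFinished)  # called each time a page finishes loading
        self.urls = urls  # queue of URLs still to download
        self.cb = cb      # callback that receives each page's URL and HTML
        self.crawl()      # start loading the first URL (crawl/_loadFinished are in the linked post)
        self.app.exec_()  # run the Qt event loop until the crawl finishes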
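One common way to trigger this kind of lazy loading without tracking the AJAX calls is to scroll the page from script after loadFinished fires, give the page time to fetch more content, and repeat until the document height stops growing; only then grab the HTML. Below is a minimal sketch of that loop. It assumes `evaluate_js` is something like the frame's `evaluateJavaScript` method (with PyQt's sip API v2, so the result comes back as a plain number) and that `pump_events` is a hypothetical hook that waits while keeping the Qt event loop running, for example a `QEventLoop` quit by `QTimer.singleShot`. Note that this scrolls programmatically; a site that reacts only to real `wheel` events would need those events synthesized instead.

```python
from typing import Callable

# Scroll to the bottom, then evaluate to the document height so the caller
# can tell whether new content was appended. evaluateJavaScript-style APIs
# return the value of the last expression in the script.
SCROLL_JS = (
    "window.scrollTo(0, document.body.scrollHeight);"
    "document.body.scrollHeight;"
)


def scroll_until_stable(evaluate_js: Callable[[str], object],
                        pump_events: Callable[[float], None],
                        max_rounds: int = 20,
                        settle_seconds: float = 1.0) -> int:
    """Scroll repeatedly until the page height stops changing.

    evaluate_js: runs a script in the page and returns its result
        (assumed: a wrapper around QWebFrame.evaluateJavaScript).
    pump_events: hypothetical hook that waits `settle_seconds` while
        letting the event loop process network replies.
    Returns the last height seen. Stops once two consecutive scrolls
    report the same height, or after `max_rounds` scrolls.
    """
    last_height = -1
    for _ in range(max_rounds):
        height = int(evaluate_js(SCROLL_JS))
        if height == last_height:
            break  # nothing new was loaded since the previous scroll
        last_height = height
        pump_events(settle_seconds)  # give lazy-loading scripts time to run
    return last_height
```

In the Render class, this would be called at the start of `_loadFinished`, before reading `frame.toHtml()`. Keep `max_rounds` bounded, since infinite-scroll feeds may never stop growing.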

Related

Capybara Selenium Navigate To URL Hangs With Popup Alert on Safari

At the end of each test, Capybara automatically navigates to "about:blank" in order to set up the next test. Sometimes the application I'm testing shows a popup alert when the user leaves the page (which is expected). I have some code to handle this:
begin
  page.driver.browser.navigate.to("about:blank")
  page.driver.browser.switch_to.alert.accept
rescue Selenium::WebDriver::Error::NoAlertPresentError
  # No alert was present. Don't need to do anything
end
This works fine on Firefox, Chrome, and IE, but for some reason the navigate command hangs on Safari, I assume because of the popup. Does anyone know a workaround for this?
There is currently no simple workaround for this in any version of the Selenium language bindings. It is a known issue that the Selenium team is not interested in resolving. Fundamentally, it is due to the architecture of Safari and, consequently, the architecture of the Safari Driver.
The JavaScript of the Safari Driver extension does not know about most of the alerts, popups, and dialogs that appear as modal Cocoa-layer windows, and it cannot interact with them.
There is a way, but it won't be easy, and nobody has done it. You would need to use Cocoa, so you would want RubyCocoa in this case (or PyObjC if you were using Python). You might also want a sidecar app actually written in Objective-C.
The trick would be to use the AX (Accessibility) API from a separate process to observe whether an alert is the front window, and then poke at its labels and button text as exposed to the AX APIs. The AX APIs are probably reachable from RubyCocoa via the Scripting Bridge. However, you would need to add your 'app' to the Security preference pane's list of apps allowed to control the computer. With that, you could detect the window and handle it. It could be fairly brittle across websites, but if built well, it could handle the expected conditions.
You could try overriding confirm like this, which I believe should work across browsers:
# click OK on confirm dialogs automatically
page.execute_script('window.confirm = function() { return true; }')
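The "leave this page?" prompt described in the question normally comes from a `beforeunload` handler rather than from `window.confirm`, so overriding `confirm` alone may not suppress it. A different approach is to avoid the prompt instead of handling it: clear the handler from script before navigating away. The sketch below uses Selenium's Python bindings (`execute_script` and `get` are standard WebDriver methods); the helper name `navigate_away` is hypothetical. Setting `window.onbeforeunload = null` only removes a handler assigned to that property, not listeners added with `addEventListener`, and whether this sidesteps the Safari hang is untested.

```python
# Script run in the page before leaving it: drop an onbeforeunload handler
# (the usual source of "leave page?" prompts) and stub out alert/confirm.
# Listeners registered via addEventListener("beforeunload", ...) are NOT
# removed by this.
SUPPRESS_DIALOGS_JS = (
    "window.onbeforeunload = null;"
    "window.alert = function () {};"
    "window.confirm = function () { return true; };"
)


def navigate_away(driver, url: str = "about:blank") -> None:
    """Leave the current page without triggering its unload prompt.

    `driver` is assumed to be a Selenium WebDriver (Python bindings),
    which provides execute_script(script) and get(url).
    """
    driver.execute_script(SUPPRESS_DIALOGS_JS)
    driver.get(url)
```

From a Capybara suite, the same script could be run with `page.execute_script` in an after hook, before Capybara resets the session to about:blank.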

Use node-webkit to remote control an iframe?

I'm trying to automate a workflow where we have to log in to a website, navigate, get redirected several times, and finally upload a file into a reporting system.
After failing with PhantomJS/CasperJS (where we also don't really get visual feedback), I was thinking about using node-webkit.
So basically, what I am trying to do is write a "controller" that opens another web page in an iframe and then fills in fields, clicks buttons, and so on.
Is this something that can be done? If so, I am struggling to get a handle on the fields in order to fill them.
Or is this a classic "wrong tool" approach, and we shouldn't be doing that?
Something along the lines of:
var new_win = gui.Window.get(
    window.open('https://remote/login/site/')
);
gui.Window.get(new_win).on('loaded', function () {
    // none of this really works, but it might help you understand what I'm trying to do
    //window.console.log(new_win.window.document.getElementById("user"));
    //window.eval(new_win, "code_to_fill_the_user_field");
    //var userField = new_win.window.document.getElementById("user");
    //console.log(userField);
});
Update (2014-08-02):
I understand now that node-webkit is intended for creating desktop apps with HTML5, not for remote-controlling websites, so please disregard this question.
By the way, I did solve the problem with PhantomJS/CasperJS.

How to make my canvas drawings, made with a paper.js script, visible to another user who opens the same page

I am working with paper.js in an ASP.NET MVC 4 application, which lets me draw on an HTML canvas. I need your support with my requirement:
When I draw on my canvas, I want the drawings to become visible on the canvas of anyone else who has the same page open over the internet.
Paper.js exposes project.activeLayer (a property of the global project object) for accessing the items in the view. I saved the canvas data from the active layer in JSON format and then sent this data to the server. How can I rebuild the view on another canvas from the same data?
(or)
Is there any way to do this without transmitting the data?
Thanks,
surbob.
This is not going to be simple. What you're describing is basically a chat room with chat clients in the browser. You need to send the canvas data to the server and then have the server push updates to any other clients connected to it.
The best place to start would probably be a chat-room sample, modified to handle the canvas data. The SignalR real-time communications library would probably make things a lot simpler, and it has good samples to get you going.
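To make the relay idea concrete, here is a minimal in-memory sketch of what the server side of such a hub does: each connected client has an inbox, a drawing update from one client is forwarded to every other client, and a late joiner receives the most recent snapshot. The class and method names are hypothetical and this is not SignalR's API; in practice a SignalR hub (or plain WebSockets) would handle the transport. On the client, the JSON payload could come from something like `project.activeLayer.exportJSON()`, and a receiving page could rebuild its view with Paper.js's `importJSON`.

```python
import json
from collections import deque
from typing import Deque, Dict, Optional


class DrawingHub:
    """Hypothetical in-memory relay for canvas updates (a stand-in for a
    real-time hub such as SignalR; transport and auth are left out)."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, Deque[str]] = {}
        self._latest: Optional[str] = None  # last layer snapshot received

    def connect(self, client_id: str) -> Deque[str]:
        inbox: Deque[str] = deque()
        if self._latest is not None:
            inbox.append(self._latest)  # late joiners start from the current drawing
        self._inboxes[client_id] = inbox
        return inbox

    def disconnect(self, client_id: str) -> None:
        self._inboxes.pop(client_id, None)

    def publish(self, sender_id: str, layer_json: str) -> None:
        json.loads(layer_json)  # raises ValueError on a malformed payload
        self._latest = layer_json
        for client_id, inbox in self._inboxes.items():
            if client_id != sender_id:  # the sender already shows its own drawing
                inbox.append(layer_json)
```

Sending whole-layer snapshots keeps the sketch simple; for busy canvases, sending only the newly drawn item would cut traffic.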

Getting DOM from page using Chromium/WebKit

I'm trying to get access to a page's DOM after rendering. I do not need to view the page; I plan to do this programmatically, without any GUI or interaction.
The reason I am interested in the post-rendering DOM is that I want to know where objects appear. Some location information is available from the DOM (e.g., via offsetLeft), but much is not. Also, JavaScript can change the final positioning. I want positions that are as close as possible to what the user will see.
I've looked into the Chromium code and think there is a way to do this, but there is not enough documentation to get started.
Putting it VERY simply, I'd be interested in pseudocode like this:
DOMRoot *r = new Page("http://stackoverflow.com")->getDom();
Any tips on starting points?
You should use the Web API wrapper that Chromium exposes; specifically, the WebDocument class contains the functionality that you need. You can call it like this:
WebFrame * mainFrame = webView->mainFrame();
WebDocument document = mainFrame->document();
WebElement docElement = document.documentElement();
// Manipulate the DOM here using docElement
...
You can browse the source code for Chromium's Web API wrapper here. Although there's not much in the way of documentation, the header files are fairly well-commented and you can browse Chrome's source code to see the API in action.
It's difficult to get started using Chromium. I recommend looking at the test_shell application. Also, a framework like the Chromium Embedded Framework (CEF) simplifies the process of embedding Chromium in your application; I use CEF in my current project and I'm very satisfied with it.
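On the question's point about offsetLeft: an element's offsetLeft/offsetTop are measured relative to its offsetParent, not to the page, so page coordinates require walking up the offsetParent chain. The sketch below models that walk on a hypothetical `Box` record standing in for the layout values you would read from the rendered DOM. It ignores borders on offset parents, scrolled containers, and CSS transforms; in a live page, `getBoundingClientRect()` plus the window's scroll offset is usually the simpler route.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Box:
    """Hypothetical stand-in for a rendered element's layout data."""
    name: str
    offset_left: float  # like element.offsetLeft: relative to offset_parent
    offset_top: float   # like element.offsetTop
    offset_parent: Optional["Box"] = None


def page_position(el: Box) -> Tuple[float, float]:
    """Sum offsets up the offsetParent chain to get document coordinates."""
    x = y = 0.0
    node: Optional[Box] = el
    while node is not None:
        x += node.offset_left
        y += node.offset_top
        node = node.offset_parent
    return x, y
```

For example, a link at offset (10, 5) inside a container at (100, 50) under the body ends up at page position (110, 55).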

Cocoa automated WebView

I'm looking into making a kind of robotic testing browser, like Selenium, but one that we can use to run full integration tests of our site. I'm wondering whether it's possible to create a Cocoa app that loads a web page in a WebView and then programmatically sends click events. I realize you could use:
- (NSString *)stringByEvaluatingJavaScriptFromString:(NSString *)script
to send JavaScript click events, but it would be better if you could send click events to the DOMElements themselves. That way you could test file uploads and other elements that can't be accessed via JavaScript, like Flash. Does anyone know if this is possible?
You can obtain DOMNode* objects corresponding exactly to JavaScript Node objects by using a WebView's -windowScriptObject method (which returns the WebScriptObject* that corresponds to the JavaScript window object) or any frame's -DOMDocument method, which returns that frame's JavaScript document object.
Example:
DOMDocument* d = [[webView mainFrame] DOMDocument];
[[[d getElementsByTagName:@"a"] item:0] click];
Fake sounds like exactly what you want. It's WebKit-based, automated, has tab support, and comes with a huge library of useful actions such as evaluating JavaScript, assertions, variables, events, and loops. Highly recommended.