PhantomJS: Blank pages when scraping multiple URLs

I have written a PhantomJS script to scrape multiple URLs by chaining calls to page.open() recursively (code snippet below). This works for up to 3 or 4 URLs, but with a larger number of URLs I just get blank pages. By blank, I mean that document.URL contains "about:blank" and a screenshot shows only a blank white background. I have also noticed that PhantomJS's memory usage keeps increasing as it processes a large number of URLs. Is there anything specific I need to do to deallocate the memory used to render previous pages?
Have other people seen this issue? Is it possible to scale PhantomJS to scrape a larger number of URLs (say 100)?
Thanks
Rohit
Recursive code snippet to scrape multiple URLs:
var srcProducts = [{'url':'http://...' }, { 'url': 'http://...' },...];
var destProducts = [];
var gRetries = 0;
var product, page;   // shared between process() and onOpen()
process();
function process() {
  if (srcProducts.length === 0) {
    // Output to file
    phantom.exit();
  } else {
    product = srcProducts.pop();
    page = require('webpage').create();
    page.open(product['url'], onOpen);
  }
}
function onOpen(status) {
  // check status
  // scrape info into product
  destProducts.push(product);
  process();
}

Someone was kind enough to answer this question on Google Groups. The solution is to call page.release() once you are done using a page object.
https://groups.google.com/forum/?fromgroups#!topic/phantomjs/lquzLFvZtrA

page.release() has been deprecated as of PhantomJS 1.9.
You should now use page.close() instead to free the page from memory.
http://phantomjs.org/api/webpage/method/release.html
http://phantomjs.org/api/webpage/method/close.html
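The recursive loop from the question can be restructured so every page is closed before the next one is opened. This is a minimal sketch, not the original poster's code: the function name scrapeAll and its parameters createPage, scrapeFn and onDone are hypothetical, and createPage is injected so the loop can be exercised outside PhantomJS. Unlike the question's pop(), it processes URLs in their original order.

```javascript
// Sketch: scrape URLs one at a time, freeing each page before opening the next.
// scrapeAll, createPage, scrapeFn and onDone are hypothetical names.
// In PhantomJS, createPage would be: function () { return require('webpage').create(); }
function scrapeAll(urls, createPage, scrapeFn, onDone) {
  var queue = urls.slice();   // copy so the caller's array is not mutated
  var results = [];

  function next() {
    if (queue.length === 0) {
      onDone(results);
      return;
    }
    var url = queue.shift();
    var page = createPage();
    page.open(url, function (status) {
      if (status === 'success') {
        results.push(scrapeFn(page, url));
      } else {
        results.push({ url: url, error: status });
      }
      // Free the page's memory before moving on
      // (page.release() in PhantomJS versions before 1.9).
      page.close();
      next();
    });
  }

  next();
}
```

In a PhantomJS script, scrapeFn would typically read data with page.evaluate(), and onDone would write the results out and call phantom.exit().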

Related

How can I provide a tag that would run my cucumber background once for all scenarios?

I am using cucumber-js.
I have some slides within the same URL. For my feature, I want to give the tester a way to open a URL and then run multiple scenarios on that same URL.
The problem with the solution below is that the URL is re-opened for every scenario, resetting the slide to the start, so I can never test each slide step as a separate scenario.
Any help or suggestions appreciated. Example:
Feature: Valuation slide user journey - prerequisite
  As a developer I want to open the url /valuation/

  Background:
    Given I open the url "/valuation/"

  Scenario: Test valuation slide button
    Given the element "valuationIntro" is visible
    When I click on the button "valuationIntro.cta"
    Then I expect that element "valuationSlide1" becomes visible

  Scenario: Test valuation autocomplete
    Given the element "valuationSlide1.cta" has the class "invalid"
    When I set "jk5 7kj" to the inputfield "valuationSlide1.autocomplete"
    Then I expect that element "valuationSlide1.cta" does not have the class "invalid"
I understand I can use tags, but I am not entirely sure how to use a tag to run a background only once.
var executed = false;   // module-level flag, shared by all scenarios
var myStepDefinitionsWrapper = function () {
  this.Given(/^I open the url "([^"]*)"$/, function (url) {
    if (!executed) {
      // do some work with url (open it only on the first call)
      executed = true;
    }
  });
};
module.exports = myStepDefinitionsWrapper;
This is just a simplification to make a point; in practice I would use a singleton that holds the state.
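A singleton holding that state might look like the sketch below. It is only an illustration: openedUrls and openUrlOnce are hypothetical names, and openUrl stands in for whatever call actually navigates your browser. It also assumes the browser session is shared across scenarios; if your setup restarts the browser for each scenario, skipping navigation would leave later scenarios on a blank page.

```javascript
// Hypothetical singleton remembering which URLs were already opened in this run,
// so the Background step can skip navigation on later scenarios.
var openedUrls = (function () {
  var seen = new Set();
  return {
    // Returns true the first time a URL is seen, false afterwards.
    markIfNew: function (url) {
      if (seen.has(url)) return false;
      seen.add(url);
      return true;
    },
    // Forget everything, e.g. when a new feature should start fresh.
    reset: function () { seen.clear(); }
  };
}());

// Body for: Given I open the url "..."
// openUrl is a placeholder for your own navigation function (assumption).
function openUrlOnce(url, openUrl) {
  if (openedUrls.markIfNew(url)) {
    openUrl(url);
  }
}
```

Inside the step definition you would then call openUrlOnce(url, yourNavigateFunction) instead of navigating directly, and call openedUrls.reset() wherever a fresh start is wanted.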

HTML string to PDF conversion

I need to create various reports in PDF format and email them to specific people. I managed to load an HTML template into a string and replace certain "custom markers" with real data. At the end I have a fully viewable HTML file. This file must now be printed to PDF, which I am able to do after following this link: https://www.appcoda.com/pdf-generation-ios/. My problem is that I do not understand how to determine the number of pages from the HTML file, as the PDF renderer requires drawing page by page.
I know this is an old thread, but I would like to leave this answer here. I used the same tutorial you mentioned, and here's what I did to produce multiple pages. Just modify the drawPDFUsingPrintPageRenderer method like this:
func drawPDFUsingPrintPageRenderer(printPageRenderer: UIPrintPageRenderer) -> NSData! {
    let data = NSMutableData()
    // CGRect.zero selects the default page size (612 x 792 points)
    UIGraphicsBeginPDFContextToData(data, CGRect.zero, nil)
    printPageRenderer.prepare(forDrawingPages: NSMakeRange(0, printPageRenderer.numberOfPages))
    let bounds = UIGraphicsGetPDFContextBounds()
    // One PDF page per renderer page; 0..<n (unlike 0...(n - 1)) does not crash when n == 0
    for i in 0..<printPageRenderer.numberOfPages {
        UIGraphicsBeginPDFPage()
        printPageRenderer.drawPage(at: i, in: bounds)
    }
    UIGraphicsEndPDFContext()
    return data
}
In your custom print page renderer, you can read numberOfPages to get the total number of pages.

Create a full-page screenshot with WebDriver

Does anyone know a way to create full-page screenshots using WebDriver?
If one of my tests fails, I want to create a FULL-PAGE screenshot (including the part not visible on screen) before the browser closes, and save it to a shared location.
Also, if possible, I want to output the result to the Jenkins console log.
Thanks!
You can use the following extension for Firefox: https://addons.mozilla.org/nl/firefox/addon/fireshot/
You can find its JavaScript code in %APPDATA%\Mozilla\Firefox\Profiles\
The extension can copy the screenshot to the clipboard.
You can call its JS methods to take the screenshot. After that, you can retrieve the image from the clipboard and save it as a file on a shared location.
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Windows.Forms;

// Note: Clipboard access requires the calling thread to be STA
Image image = null;
IDataObject data = Clipboard.GetDataObject();
if (data != null)
{
    if (data.GetDataPresent(DataFormats.Bitmap))
    {
        // Assign the outer variable instead of declaring a second "image"
        image = (Image)data.GetData(DataFormats.Bitmap, true);
    }
    else
    {
        Console.WriteLine("The data in the clipboard is not in an image format");
    }
}
else
{
    Console.WriteLine("The clipboard was empty");
}
if (image != null)
{
    // @"..." is a verbatim string, so the backslash needs no escaping
    string newImageName = string.Concat(@"C:\SampleSharedFolder\", Guid.NewGuid(), ".jpg");
    image.Save(newImageName, ImageFormat.Jpeg);
    Console.WriteLine("Image save location: {0}", newImageName);
}
Once you have written the result to the console, it is easy to output it to the Jenkins console log. You can find more in my article: http://automatetheplanet.com/output-mstest-tests-logs-jenkins-console-log/
You can use Snagit to perform full page screenshots. More information here: https://www.techsmith.com/tutorial-snagit-documentation.html
First you need to start the Snagit server and then follow the documentation.

Chrome WebDriver hangs when the currently selected frame is closed

I am working on automated tests for a web application. The application is very complex; in fact, it is a text editor for specific content. As part of its functionality it has some pop-up frames. You can open a pop-up, make some changes and save them, which closes the current frame. My problem is that the close button is located inside the frame that gets removed, and this forces Chrome WebDriver to hang. My first try was this:
driver.findElement(By.xpath("//input[@id='insert']")).click();
driver.switchTo().defaultContent();
But it hangs on the first line after executing the click, because the click closes the frame.
Then I changed it to this (I have jQuery on the page):
driver.executeScript("$(\"input#insert\").click()");
driver.switchTo().defaultContent();
But this leads to the same result.
Then I used this solution:
driver.executeScript("setTimeout(function(){$(\"input#insert\").click()}, 10)");
driver.switchTo().defaultContent();
And it hangs on the second line. Only this solution works:
driver.executeScript("setTimeout(function(){$(\"input#insert\").click()}, 100)");
driver.switchTo().defaultContent();
but it is unstable: timing issues may occur.
So my question is: is there a cleaner and more stable way to switch out of a closed frame?
P.S.: executeScript is a self-defined function to reduce the amount of code. It simply executes some JS on the page.
Update:
I realized I was wrong. This problem does not occur with all iframes; it occurs when the tinyMCE popup is used. The situation is exactly like in this topic, so it's doubtful I will find an answer here, but who knows. The solution described above helps, but only for a very short time: after several seconds pass, ChromeDriver hangs on the next command.
This is how I would do it in Ruby; hopefully you can adapt it to Java:
# assumes $wait = Selenium::WebDriver::Wait.new(timeout: 10)
$driver.find_element(:xpath, "//input[@id='insert']").click
$wait.until { $driver.window_handles.size < 2 } # explicitly wait for the popup window to close
handles = $driver.window_handles # get the available window handles
$driver.switch_to.window(handles[0]) # switch back to the first (default) window handle
Hope this helps.
The problem was in this line of the tinyMCEPopup code:
DOM.setAttrib(id + '_ifr', 'src', 'javascript:""'); // Prevent leak
Executing this script on the page fixes the hang (but possibly creates leaks :) ):
(function() {
  // Find tinyMCE's DOM utility object (tinymce or tinyMCE, depending on version)
  var domVar;
  if (window.tinymce && window.tinymce.DOM) {
    domVar = window.tinymce.DOM;
  } else if (window.tinyMCE && window.tinyMCE.DOM) {
    domVar = window.tinyMCE.DOM;
  } else {
    return;
  }
  var originalSetAttrib = domVar.setAttrib;
  domVar.setAttrib = function(id, attr, val) {
    // Ignore only the call that sets an iframe's src to javascript:"" (the one that hangs ChromeDriver)
    if (attr == 'src' && typeof val == 'string' && val.trim().match(/javascript\s*:\s*("\s*"|'\s*')/)) {
      return;
    }
    // Every other call goes through unchanged
    return originalSetAttrib.apply(this, arguments);
  };
}());
The bug and solution are also described here.
Note: the code above should be added to the parent frame, not to the popup frame.
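The same override can be factored into a function that receives the DOM utility object as a parameter, which makes it easy to check which calls are suppressed. This is a sketch: patchSetAttrib and BLANK_JS_SRC are hypothetical names, and the matching rule is the one from the script above.

```javascript
// Pattern for the "blank out the iframe" src value, e.g. javascript:"" or javascript:''
var BLANK_JS_SRC = /javascript\s*:\s*("\s*"|'\s*')/;

// Hypothetical helper: wraps dom.setAttrib so that only the blank-src call is skipped.
function patchSetAttrib(dom) {
  var original = dom.setAttrib;
  dom.setAttrib = function (id, attr, val) {
    if (attr === 'src' && typeof val === 'string' && BLANK_JS_SRC.test(val.trim())) {
      return;   // swallow the call that makes ChromeDriver hang
    }
    // All other attribute changes are forwarded with the original `this` and arguments.
    return original.apply(this, arguments);
  };
  return dom;
}
```

In the parent frame this would be applied as, for example, patchSetAttrib(window.tinymce.DOM) after checking that the object exists, exactly as the original script does.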

How to get an image from mshtml.htmlimg to hard disk

Without using an API?
I know there are several ways.
I am using the mshtml library, by the way, which is better than the WebBrowser control. I am effectively automating Internet Explorer directly.
Basically, I would prefer a way to take the image directly, without having to know the URL of the htmlimg and download it.
I know I can take the URL from the image element and download it with WebClient, but the image changes depending on cookies and IP, so that wouldn't work.
I want the stored image to be exactly the one displayed by the htmlimg element.
Basically, as if someone were taking a local screenshot of what shows up on screen.
There's an old solution for this here:
http://p2p.wrox.com/c/42780-mshtml-how-get-images.html#post169674
These days, though, you probably want to check out the Html Agility Pack:
http://htmlagilitypack.codeplex.com/
The documentation isn't great, however, so this code snippet may help:
using System.IO;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
// You can also load a web page by using WebClient and loading the stream - use one of the htmlDoc.Load() overloads
var body = htmlDoc.DocumentNode.Descendants("body").FirstOrDefault();
if (body != null)
{
    int index = 0;
    using (WebClient client = new WebClient())
    {
        foreach (var img in body.Descendants("img"))
        {
            // Assumes src holds an absolute URL; relative URLs must be resolved first
            var fileUrl = img.GetAttributeValue("src", null);
            if (string.IsNullOrEmpty(fileUrl)) continue;
            // Give each image its own file instead of overwriting a single path
            var localFile = Path.Combine(@"c:\localpath", "image" + index++ + ".jpg");
            client.DownloadFile(fileUrl, localFile);   // pass the variable, not the literal "fileUrl"
        }
    }
}