Using PhantomJS to read the response body

Is there any way to request a resource with phantomjs and be able to get to the response's body?

UPDATE: Regarding the other possible meaning of "fetch and do something with all other resources such as images, CSS, fonts, etc.", I've recently blogged how to do this in SlimerJS. I believe the only way to do it in PhantomJS as of 1.9.1 is to apply a patch and recompile.
Maybe I'm misunderstanding what you mean by "response body", or maybe it was added to PhantomJS more recently than this question, but it is as easy as this:
var page = require('webpage').create();
var url = 'http://google.com/';
page.open(url, function () {
    console.log(page.content);
    phantom.exit();
});
(By the way, use page.plainText to get it without the HTML tags.)
If you just want the contents of the <body> tag, and none of the <head>, here is an alternative approach that can be used to get any part of the response:
var page = require('webpage').create();
var url = 'http://google.com/';
page.open(url, function () {
    var html = page.evaluate(function () {
        return document.getElementsByTagName('body')[0].innerHTML;
    });
    console.log(html);
    phantom.exit();
});

This is one big problem with PhantomJS right now. The open (as of writing) ticket is located at http://code.google.com/p/phantomjs/issues/detail?id=158 and, as of yet, it has no reliable solution. This applies to collecting your request data as well as response data, so you cannot collect your submitted POST data and then re-send it with a CasperJS download-like scheme.

Use SlimerJS. All your PhantomJS code will work with SlimerJS too.
More info here. Notice the body property at the end, which is only available in SlimerJS as of now.
Note: please set page.captureContent = [/.*/] for the body to show up in the response. More info about this: here

SlimerJS does not work on newer versions of Firefox, so it is no good for me.
This answer explains how to get the response body from XHR today, in late 2019.

Related

Blazor : How to get HTML of SSRS from controller?

I'm trying to get a report from the SSRS REST API. When I navigate to the URL
https://myPC:443/ReportService?%2fSSRS%2fPatientèle&rs:Command=Embed&rc:LinkTarget=main&Hospital=CHRU%20Strasbourg
in Chrome, I can see my report.
So I've tried to get the HTML from a controller:
[HttpGet]
public async Task<string> GetReportAsHTML()
{
    using (var client = new HttpClient())
    {
        using (var result = await client.GetAsync("http://myPC:80/ReportService?%2fSSRS%2fPatientèle&rs:Command=Embed&rc:LinkTarget=main&Hospital=CHRU%20Strasbourg"))
        {
            if (result.IsSuccessStatusCode)
            {
                return await result.Content.ReadAsStringAsync();
            }
        }
    }
    return "";
}
It's returning 401 Unauthorized, and the statement inside the if is never reached.
Can someone please explain how I can resolve this problem so I get the correct response?
EDIT
I tried both HTTP and HTTPS, and both return the report without authentication. HTTP access (http://localhost:80/...) was even better, because charts aren't displayed, only tables. With HTTPS access, I get the following picture instead of the charts:
If you need to embed reports, you can also consider doing so in an <iframe> - just point its src to the report URL.
<iframe src="https://myPC:443/ReportService?%2fSSRS%2fPatientèle&rs:Command=Embed&rc:LinkTarget=main&Hospital=CHRU%20Strasbourg "></iframe>
The added benefits are:
iframes usually share cookies with their parent, so if the report server needs such authentication it may work immediately
if the report is somehow interactive (say, it actually returns an HTML page with filters, dropdowns and the like), your end user would get that too, instead of static HTML that might even break when taken out of its context
You may also want to look into ready-made Blazor report viewer components - perhaps your reporting solution vendor already has one.
It seems like the URL in your code does not match the one you use through Chrome. Could this be the issue?

node express multer fast-csv pug file upload

I'm trying to upload a file using pug, multer and express.
The pug form looks like this
form(method='POST' enctype="multipart/form-data")
  div.form-group
    input#uploaddata.form-control(type='file', name='uploaddata')
  br
  button.btn.btn-primary(type='submit' name='uploaddata') Upload
The server code looks like this (taken out of context)
.post('/uploaddata', function(req, res, next) {
    upload.single('uploaddata', function(err) {
        if (err) {
            throw err;
        } else {
            res.json({success : "File upload sucessfully.", status : 200});
        }
    });
})
My issue is that while the file uploads successfully, the success message is not shown on the same page; i.e. a new page is loaded showing
{success : "File upload sucessfully.", status : 200}
As an example, for other elements (link clicks) the message is displayed via JavaScript like this:
$("#importdata").on('click', function(){
    $.get("/import", function(data) {
        $("#message").show().html(data['success']);
    });
});
I tried doing it in pure JavaScript in order to work around the default form behaviour, but no luck.
Your issue has to do with mixing form submissions and AJAX concepts. To be specific, you are submitting a form then returning a value appropriate to an AJAX API. You need to choose one or the other for this to work properly.
If you choose to submit this as a form you can't use res.json, you need to switch to res.render or res.redirect instead to render the page again. You are seeing exactly what you are telling node/express to do with res.json - JSON output. Rendering or redirecting is what you want to do here.
Here is the MDN primer on forms and also a tutorial specific to express.js.
Alternatively, if you choose to handle this with an AJAX API, you need to use jquery, fetch, axios, or similar in the browser to send the request and handle the response. This won't cause the page to reload, but you do need to handle the response somehow and modify the page, otherwise the user will just sit there wondering what has happened.
MDN has a great primer on AJAX that will help you get started there. If you are going down this path also make sure you read up on restful API design.
Neither one is inherently a better strategy, both methods are used in large-scale production applications. However, you do need to choose one or the other and not mix them as you have above.

Download file depending on mimetype in casperjs

In a web scraping exercise, I need to click on links, let them render the content if it is HTML, and download it otherwise. How do I accomplish this with CasperJS or some other tool on top of PhantomJS/SlimerJS?
As I understand it, PhantomJS/SlimerJS lack the APIs to support downloads. CasperJS has a download API, but I am not able to see how to examine the MIME type and let the HTML render while downloading other content.
In both PhantomJS and SlimerJS you can register a listener to each received response:
page.onResourceReceived = function(response) {
    ...
};
However, only in SlimerJS is response.body defined. By using this you can save the file. There is a full example in this blog post. (As that example shows, you must set page.captureContent to cover the files you want data for.)
There is no way to do this in PhantomJS 1.9.x (and I believe PhantomJS 2.x still has the same problem, but I have not personally confirmed this yet).
The other part of your question was about deciding what to save based on mime type. The full list of available fields shows you can use response.contentType.
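The decision itself can be sketched as a plain function that you would call from the onResourceReceived listener with response.contentType; shouldDownload is a made-up helper name, and the exact set of types treated as "render" is an assumption:

```javascript
// Hypothetical helper: given a Content-Type value, decide whether to let
// the page render the resource (HTML-ish documents) or save it to disk.
function shouldDownload(contentType) {
  if (!contentType) return false;             // no type header: let it render
  // Strip parameters like "; charset=utf-8" and normalize case.
  var type = contentType.split(';')[0].trim().toLowerCase();
  return type !== 'text/html' && type !== 'application/xhtml+xml';
}
```

Inside the listener this becomes: if (shouldDownload(response.contentType)) { save response.body } else { let the page render }.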

Set a string as the response of a webpage in phantomjs

Hi, what I am trying to do is not to get the webpage with
page.open(url);
but to set a string that has already been retrieved as the page response. Can that be done?
Yes, and it is as easy as assigning to page.content. It is usually also worth setting a page.url (as otherwise you might hit cross-domain issues if doing anything with Ajax, SSE, etc.), and the setContent function is helpful to do both those steps in one go. Here is the basic example:
var page = require('webpage').create();
page.setContent(
    "<html><head><style>body{background:#fff;text:#000;}</style><title>Test#1</title></head><body><h1>Test #1</h1><p>Something</p></body></html>",
    "http://localhost/imaginary/file1.html"
);
console.log(page.plainText);
page.render("test.png");
phantom.exit();
So call page.setContent with the "previously retrieved page response" that you have.

save phantom js processed page into html file with absolute url

I want to save certain web pages, after the document has loaded, to a specific file name, with all URLs and links converted to absolute URLs, like wget -k does.
//phantomjs
var page = require('webpage').create();
var url = 'http://google.com/';
page.open(url, function (status) {
    var js = page.evaluate(function () {
        return document;
    });
    console.log(js.all[0].outerHTML);
    phantom.exit();
});
For example, my HTML content has something like this:
<a href="/page">page</a>
which must become:
<a href="http://google.com/page">page</a>
That's my sample script above, but how can I convert all URLs and links, as wget -k does, using PhantomJS?
You can modify your final HTML so that it has a <base> tag; this will make all relative URLs work. In your case, try putting <base href="http://google.com/"> right after the <head> on the page.
It is not really supported, because PhantomJS is more than just an HTTP client. Imagine if there is JavaScript code which pulls random content with images onto the main landing page.
The workaround, which might or might not work for you, is to replace all the referred resources in the DOM. This is possible using some CSS3 selectors (href for a, src for img, etc.) and manual path resolution relative to the base URL. If you really need to track and enlist every single resource URL, use the network traffic monitoring feature.
Last but not least, to get the generated content you can use page.content instead of that complicated dance with evaluate and outerHTML.
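As a rough illustration of that manual-resolution workaround, here is a plain Node sketch (not a PhantomJS API): absolutify is a hypothetical helper, and the regex rewrite is a simplification that a real HTML parser would handle more robustly.

```javascript
// Resolve relative href/src attribute values against a base URL,
// similar in spirit to wget -k. Uses the WHATWG URL class, which is
// global in modern Node.
function absolutify(html, baseUrl) {
  return html.replace(/(href|src)=("|')(.*?)\2/g, function (m, attr, quote, value) {
    var abs;
    try {
      abs = new URL(value, baseUrl).href;   // resolves relative paths
    } catch (e) {
      return m;                             // leave unparsable values alone
    }
    return attr + '=' + quote + abs + quote;
  });
}
```

For instance, absolutify('<a href="/page">page</a>', 'http://google.com/') rewrites the link to point at http://google.com/page.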