Downloading a file that comes as an attachment in a POST request response in PhantomJS

I want to download a CSV file that is generated on a button click through a POST request. I searched the CasperJS and PhantomJS forums as best I could and came back empty-handed. In a normal browser such as Firefox, a download dialog appears after the POST request. How can I handle this case in PhantomJS? The response looks like this:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
Content-disposition: attachment;filename=ExportData.csv
X-AspNet-Version: 2.0.50727
X-Powered-By: ASP.NET
Date: Fri, 19 Apr 2013 23:26:40 GMT
Content-Length: 65183

I've found a way to do this using CasperJS (it should work with PhantomJS alone if you implement the download function using XMLHttpRequest, but I've not tried that).
Here is a working example that tries to download the most recent PDF from this page. When you click the download link, some JavaScript code is triggered that generates hidden input fields, which are then POSTed.
What we do is replace the form's onsubmit handler so that it cancels the submission, and capture the form destination (action) and all its fields. We use this information later to do the actual download.
var casper = require('casper').create();
casper.start("https://sede.gobcan.es/tributos/jsf/publico/notificaciones/comparecencia/ultimosanuncios.jsp", function() {
    var theFormRequest = this.page.evaluate(function() {
        var request = {};
        var formDom = document.forms["resultadoUltimasNotif"];
        formDom.onsubmit = function() {
            //iterate the form fields
            var data = {};
            for (var i = 0; i < formDom.elements.length; i++) {
                data[formDom.elements[i].name] = formDom.elements[i].value;
            }
            request.action = formDom.action;
            request.data = data;
            return false; //Stop form submission
        };
        //Trigger the click on the link.
        var link = $("table.listado tbody tr:first a");
        link.click();
        return request; //Return the form data to casper
    });
    //Start the download
    casper.download(theFormRequest.action, "downloaded_file.pdf", "POST", theFormRequest.data);
});
casper.run();
Note: you have to run it with --ignore-ssl-errors, as the CA they use isn't in your browser default CA list.
casperjs --ignore-ssl-errors=true downloadscript.js

You can listen to the page.resource.received event and download() the file when received:
casper.on('page.resource.received', function(resource) {
    if (resource.stage !== "end") {
        return;
    }
    if (resource.url.indexOf('ExportData.csv') > -1) {
        this.download(resource.url, 'ExportData.csv');
    }
});
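If the URL of the response does not contain the file name (in the question, only the Content-disposition header names the file), matching on the header may be more robust than matching on the URL. A minimal sketch, assuming the resource object carries a headers array of {name, value} pairs as in the PhantomJS resource events; findHeader and isAttachment are hypothetical helper names, not part of CasperJS:

```javascript
// Hypothetical helpers (not CasperJS API): inspect a received resource's headers.
// Assumes resource.headers is an array of {name: ..., value: ...} objects.
function findHeader(headers, name) {
    var wanted = name.toLowerCase();
    for (var i = 0; i < (headers || []).length; i++) {
        // Header names are case-insensitive, so compare in lower case.
        if (String(headers[i].name).toLowerCase() === wanted) {
            return headers[i].value;
        }
    }
    return null; // header not present
}

function isAttachment(resource) {
    var disposition = findHeader(resource.headers, 'Content-Disposition');
    return disposition !== null && /^\s*attachment/i.test(disposition);
}
```

Inside the listener, the URL test could then become if (isAttachment(resource)) { ... }. Note that download() issues a new request, so for a file generated by a POST the method and data would still need to be supplied.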

@julianjm's approach is almost the solution, but in my case I did not have the correct form name with which to replace the form submission.
So I found another solution using the PhantomJS beta:
There is a beta version of PhantomJS 2.0 that includes an event handler that solves this issue.
It is still a beta version, so there is no debugging.
So I developed the clicks and the page handling on the release version, and then switched the PhantomJS version to make the download work.
casper.start('http://www.website.com.br/', function() {
    this.page.onFileDownload = function(status) {
        console.log('onFileDownload(' + status + ')');
        // The system will detect the download, but you have to name the file yourself!
        return "ContactList_08-25-14.csv";
    };
});
casper.then(function() {
    // Do your stuff here to click on the download link.
});
casper.run();
Download: Phantom 2.0 BETA
Download the exe, rename the release version of phantom.exe to phantom.bkp.exe, and put this 2.0 version in its place.
Then, in CasperJS, you will need to add some lines at the beginning of casperjs/bin/bootstrap.js, right after the license header:
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*
*/
var system = require('system');
var argsdeprecated = system.args;
argsdeprecated.shift();
phantom.args = argsdeprecated;
Also comment out the version check (same file):
(function(version) {
// required version check
/* if (version.major !== 1) {
return __die('CasperJS needs PhantomJS v1.x');
} if (version.minor < 8) {
return __die('CasperJS needs at least PhantomJS v1.8 or later.');
}
if (version.minor === 8 && version.patch < 1) {
return __die('CasperJS needs at least PhantomJS v1.8.1 or later.');
} */
})(phantom.version);
Remember, this is a tweak!
These lines in bootstrap.js will cause problems if you want to run the PhantomJS release version or SlimerJS.
So develop on the release version, then switch to this tweaked version to be able to download.
If you need to debug, you will have to remove the added lines from bootstrap.js.

I have to deal with a site written with some kind of ASP.Net framework which sends a remarkable amount of POST data at each request (some 100 Kb of data, of which about 95 never seem to change between requests - viewport state related apparently).
However, no method I could find worked for me. I've looked into intercepting XHR, I've even found someone who is tackling the very same framework (at least judging from the selectors) but with a simpler case, inspired by this very question. I found out that back in the day this couldn't be done with PhantomJS.
My problem is that a click on a button starts a chain of AJAX requests culminating with the sending of this enormous POST form, to which finally the server replies with a "Content-Disposition: attachment".
In the end, I found this approach which works for me, even if it is network-inefficient:
...setting up everything, until I just need to click on a button...
phantomData = null;
phantomRequest = null;
// Here, I just recognize the form being submitted and copy it.
casper.on('resource.requested', function(requestData, request) {
    for (var h in requestData.headers) {
        if (requestData.headers[h].name === 'Content-Type') {
            if (requestData.headers[h].value === 'application/x-www-form-urlencoded') {
                phantomData = requestData;
                phantomRequest = request;
            }
        }
    }
});
// Here, I recognize when the request has FAILED because PhantomJS does
// not support straight downloading.
casper.on('resource.received', function(resource) {
    for (var h in resource.headers) {
        if (resource.headers[h].name === 'content-disposition') {
            if (resource.stage === 'end') {
                if (phantomData) {
                    // to do: get name from resource.headers[h].value
                    casper.download(
                        resource.url,
                        "output.pdf",
                        phantomData.method,
                        phantomData.postData
                    );
                } else {
                    // Something went wrong.
                }
                // Possibly, remove listeners?
            }
        }
    }
});
// Now, click on the button and initiate the dance.
casper.click(pdfLinkSelector);
The download works flawlessly, even if I can see that the file gets requested (and sent) twice.
[debug] [phantom] Navigation requested: url=https://somesite/SomePage.aspx, type=FormSubmitted, willNavigate=true, isMainFrame=true
[debug] [application] GOT FORM, REQUEST DATA SAVED
[warning] [phantom] Loading resource failed with status=fail (HTTP 200): https://somesite/SomePage.aspx
[debug] [application] END STAGE REACHED, PHANTOMDATA PRESENT
[debug] [application] ATTEMPTING CASPERJS.DOWNLOAD
[debug] [remote] sendAJAX(): Using HTTP method: 'POST'
[debug] [phantom] Downloaded and saved resource in output.pdf
[debug] [application] TERMINATING SUCCESSFULLY
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"
(Next, I'll probably modify the script to try invoking request.abort() from inside the resource.requested listener, set a semaphore and invoke again the downloader - I won't be able to get the attachment filename, but that matters little to me).
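The "to do" in the listener above (getting the file name from the header value) could be sketched roughly as follows. This is a minimal sketch, not a full parser: filenameFromContentDisposition is a hypothetical helper name, it only handles the plain filename=... and filename="..." forms, and it ignores the extended filename*= form.

```javascript
// Hypothetical helper: extract the file name from a Content-Disposition value,
// e.g. 'attachment;filename=ExportData.csv'. Returns `fallback` when no
// plain filename parameter is found.
function filenameFromContentDisposition(headerValue, fallback) {
    if (typeof headerValue !== 'string') {
        return fallback;
    }
    // Group 1: quoted form, filename="Export Data.csv"
    // Group 2: bare form,   filename=ExportData.csv
    var match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(headerValue);
    if (!match) {
        return fallback;
    }
    return match[1] !== undefined ? match[1] : match[2];
}
```

In the listener, the hard-coded "output.pdf" could then be replaced by filenameFromContentDisposition(resource.headers[h].value, "output.pdf").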

Related

Chrome Extension - Migration to Manifest v3 - chrome.permissions user gesture issue

I have built a chrome extension in manifest version 2 and am now looking at migrating to version 3. As part of this migration I have come across an issue when trying to toggle an optional permission to use the chrome notifications api.
Since you can't request a new permission from a content script as the api is not accessible from a content script, you have to send a message to the background script to perform the request and return the response to the content script. This worked as expected with version 2, now I am receiving this error:
Unchecked runtime.lastError: This function must be called during a user gesture
This means that the extension wants the permission request to be initiated on the back of an event initiated by a user action, such as a click. This indicates that the extension wishes the permission request to be completed from the content script but as stated above this is impossible.
Could anyone tell me whether I'm missing something?
Content Script:
chrome.runtime.sendMessage(
    {message: 'requestPermissions', permissions: ['notifications']},
    (res) => console.log(res)
);
Background Script:
export function requestPermissions(request, sender, sendResponse) {
    const {permissions} = request;
    new Promise((resolve) => {
        chrome.permissions.request(
            {
                permissions
            },
            (granted) => resolve(granted)
        );
    }).then((res) => sendResponse(res));
    return true;
}

Force reload cached image with same url after dynamic DOM change

I'm developing an Angular 2 application (a single-page application). My page is never "reloaded", but its content changes according to user interactions.
I'm having some cache problems, especially with images.
Context :
My page contains an editable image list :
<ul>
<li><img src="myImageController/1">Edit</li>
<li><img src="myImageController/2">Edit</li>
<li><img src="myImageController/3">Edit</li>
</ul>
When I want to edit an image (Edit link), my DOM content is completely replaced to show another Angular component containing a file-upload component.
The myImageController returns the Last-Modified header, and Cache-Control: no-cache and must-revalidate.
After a refresh (hitting F5), my page requests every img src, which is correct: if an image has been modified, it is downloaded; if not, I just get a 304, which is fine.
Note: my images are stored in the database as blob fields.
Problem:
When my page content is dynamically reloaded by my single-page app and contains img tags, the browser does not issue a GET request but immediately takes the image from the cache. I assume this is a browser optimization to avoid fetching the same resource multiple times on the same page.
Wrong solutions:
The first solution is to add something like ?time=(new Date()).getTime() to generate unique URLs and bypass the browser cache. This won't send the If-Modified-Since header in the request, so I would download the whole image every time.
Do a "real" refresh: the first page load in Angular apps is quite slow, and I don't want to refresh everything.
Tests:
To simplify the problem, I tried creating a static HTML page containing 3 images with the exact same link to my controller: /myImageController/1. With the Chrome developer tools, I can see that only one GET request is made. If I could get multiple server calls in this case, it would probably solve my problem.
Thank you for your help.
The HTML5 specification describes this behavior. A browser may reuse images regardless of cache-related HTTP headers. Check this answer for more information. You probably need to use XMLHttpRequest and blobs. In this case you also need to consider the same-origin policy.
You can use the following function to make sure the user agent performs every request:
var downloadImage = function ( imgNode, url ) {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", url, true);
    xhr.responseType = "blob";
    xhr.onreadystatechange = function () {
        if (xhr.readyState == 4) {
            if (xhr.status == 200 || xhr.status == 304) {
                var blobUrl = URL.createObjectURL(xhr.response);
                imgNode.src = blobUrl;
                // You can also use imgNode.onload callback to release blob resources.
                setTimeout(function () {
                    URL.revokeObjectURL(blobUrl);
                }, 1000);
            }
        }
    };
    xhr.send();
};
For more information check New Tricks in XMLHttpRequest2 article by Eric Bidelman, Working with files in JavaScript, Part 4: Object URLs article by Nicholas C. Zakas and URL.createObjectURL() MDN page and Same-origin policy MDN page.
You can use the random ID trick. This changes the URL so that the browser reloads the image. Note that this can be done in the query parameters to force a full cache break, or in the hash to allow the browser to revalidate the image from the cache (and avoid re-downloading it if unchanged).
function reloadWithCache(img: HTMLImageElement, url: string) {
    img.src = url.replace(/#.*/, "") + "#" + Math.random();
}
function reloadBypassCache(img: HTMLImageElement, url: string) {
    // Check the URL (not the image element) for an existing query string.
    let sep = url.indexOf("?") == -1 ? "?" : "&";
    img.src = url + sep + "nocache=" + Math.random();
}
Note that if you are using reloadBypassCache regularly you are better off fixing your cache headers. This function will always hit your origin server leading to higher running costs and making CDNs ineffective.
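One edge case in the query-parameter variant: if the URL already carries a #fragment, appending the parameter at the very end puts it inside the fragment, which is never sent to the server. A sketch of a variant that inserts the parameter before the fragment; addNoCacheParam and its token argument are hypothetical names introduced here for illustration:

```javascript
// Hypothetical helper: add a cache-busting query parameter to a URL,
// keeping any existing query string and #fragment intact.
function addNoCacheParam(url, token) {
    var hashIndex = url.indexOf('#');
    var base = hashIndex === -1 ? url : url.slice(0, hashIndex);
    var hash = hashIndex === -1 ? '' : url.slice(hashIndex);
    // Use "&" when the URL already has a query string, "?" otherwise.
    var sep = base.indexOf('?') === -1 ? '?' : '&';
    return base + sep + 'nocache=' + encodeURIComponent(token) + hash;
}
```

It could be used as img.src = addNoCacheParam(url, String(Math.random())). Passing the token in, rather than generating it inside, keeps the function deterministic.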

Migrating to self hosted Parse Server isn't giving me the logged user

I'm trying to migrate my Parse server to my own server instance on DigitalOcean. After deploying my parse-server I've run into an issue I can't understand.
When you make a call to the Cloud Code, you can retrieve your user as request.user if you have revocable sessions enabled.
Everything is OK, but sometimes (random times) I get this strange behaviour: my request.user doesn't appear in Cloud Code.
I thought it could be a bad session token so I got rid of it by doing:
if (!request.user) {
response.error("INVALID_SESSION_TOKEN");
return;
}
and force the user to log in again.
This wasn't working: I was getting an INVALID_SESSION_TOKEN every time I logged in, so I decided to debug. These are my steps:
1.- Log in my user, so a _Session object is created:
so the sessionToken is r:a425239d4184cd98b9b693bbdedfbc9c
2.- Call the cloud function (sniffed log):
POST /parse-debug/functions/getHomeAudios HTTP/1.1
X-Parse-OS-Version: 6.0.1
X-Parse-App-Build-Version: 17
X-Parse-Client-Key: **** (hidden)
X-Parse-Client-Version: a1.13.0
X-Parse-App-Display-Version: 1.15.17
X-Parse-Installation-Id: d7ea4fa0-b4dc-4eff-9b7d-ff53a1424dcb
User-Agent: Parse Android SDK 1.13.0 (com.pronuntiapp.debug.uat/17) API Level 23
X-Parse-Session-Token: r:a425239d4184cd98b9b693bbdedfbc9c
X-Parse-Application-Id: **** (hidden)
Content-Type: applicati[garbled]
Content-Length: 346
Host: 46.101.89.192:1338
Connection: Keep-Alive
Accept-Encoding: gzip
3.- request.user is still not appearing in Cloud Code.
EDIT: Resetting the parse-server worked in this case, but not in some others.
A few days ago I found the solution.
Once you have successfully deployed your Parse server, you get request.user in any cloud endpoint called from a client, but if you call a cloud function from Cloud Code, you won't get request.user unless you pass the sessionToken:
Parse.Cloud.define("foo", function(request, response) {
    if (!request.user) {
        response.error("INVALID_SESSION_TOKEN");
        return;
    }
    var countResponses = 0;
    var responsesNeeded = 1;
    Parse.Cloud.run('bar', request.params, {
        // Forward the caller's session so that bar also receives request.user.
        sessionToken: request.user.getSessionToken(),
        success: function(c) {
            countResponses++;
            var result = c;
            if (countResponses >= responsesNeeded) {
                response.success(result);
            }
        },
        error: function(error) {
            response.error(error);
        }
    });
});
In this case, foo will have request.user and bar won't, unless you pass sessionToken.
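If several cloud functions call each other, forwarding the token by hand in every call is easy to forget. A small sketch of a helper that adds it to whatever options are passed; withCallerSession is a hypothetical name, and the only Parse call it relies on is request.user.getSessionToken(), as used in the code above:

```javascript
// Hypothetical helper: copy the given Parse.Cloud.run options and add the
// caller's session token, so the called function sees the same request.user.
function withCallerSession(request, options) {
    var merged = {};
    for (var key in options || {}) {
        if (Object.prototype.hasOwnProperty.call(options, key)) {
            merged[key] = options[key];
        }
    }
    if (request.user) {
        // Same call as in the answer above.
        merged.sessionToken = request.user.getSessionToken();
    }
    return merged;
}
```

The nested call would then read Parse.Cloud.run('bar', request.params, withCallerSession(request, { success: ..., error: ... })).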

Firefox add-on SDK: Get http response headers

I'm new to add-on development and I've been struggling with this issue for a while now. There are some questions here that are somehow related but they haven't helped me to find a solution yet.
So, I'm developing a Firefox add-on that reads one particular header whenever a web page is loaded in any tab in the browser.
I'm able to observe tab loads, but I don't think there is a way to read HTTP headers inside the following (simple) code, only the URL. Please correct me if I'm wrong.
var tabs = require("sdk/tabs");
tabs.on('open', function(tab){
    tab.on('ready', function(tab){
        console.log(tab.url);
    });
});
I'm also able to read response headers by observing http events like this:
var {Cc, Ci} = require("chrome");
var httpRequestObserver =
{
    init: function() {
        var observerService = Cc["@mozilla.org/observer-service;1"].getService(Ci.nsIObserverService);
        observerService.addObserver(this, "http-on-examine-response", false);
    },
    observe: function(subject, topic, data)
    {
        if (topic == "http-on-examine-response") {
            subject.QueryInterface(Ci.nsIHttpChannel);
            this.onExamineResponse(subject);
        }
    },
    onExamineResponse: function (oHttp)
    {
        try
        {
            var header_value = oHttp.getResponseHeader("<the_header_that_i_need>"); // Works fine
            console.log(header_value);
        }
        catch(err)
        {
            console.log(err);
        }
    }
};
The problem (and a major source of personal confusion) is that when I'm reading the response headers I don't know which request the response belongs to. I want to somehow map the request (the request URL especially) to the response header ("the_header_that_i_need").
You're pretty much there, take a look at the sample code here for more things you can do.
onExamineResponse: function (oHttp)
{
    try
    {
        var header_value = oHttp.getResponseHeader("<the_header_that_i_need>");
        // URI is the nsIURI of the response you're looking at
        // and spec gives you the full URL string
        var url = oHttp.URI.spec;
    }
    catch(err)
    {
        console.log(err);
    }
}
People also often need to find the related tab, which is answered in: Finding the tab that fired an http-on-examine-response event
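To make the URL-to-header mapping explicit, the two lookups can be wrapped in one small function that returns both values together. A minimal sketch: describeResponse is a hypothetical helper name, and it only assumes that the channel object offers URI.spec and getResponseHeader as in the code above. The try/catch covers the case where the header is absent, which is why the original code wraps the call as well.

```javascript
// Hypothetical helper: pair the response URL with one header value.
// `channel` is expected to look like the oHttp object above
// (URI.spec and getResponseHeader).
function describeResponse(channel, headerName) {
    var value = null;
    try {
        value = channel.getResponseHeader(headerName);
    } catch (err) {
        // Header not available on this response: report it as null.
        value = null;
    }
    return { url: channel.URI.spec, header: headerName, value: value };
}
```

Inside onExamineResponse this could be used as var info = describeResponse(oHttp, "<the_header_that_i_need>"); followed by console.log(info.url, info.value).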

dojo.io.iframe.send does not send a request on second time onwards in dojo 1.8

Example code snippet
this._deferred = dojo.io.iframe.send({
    url: "/Some/Servie",
    method: "post",
    handleAs: 'html',
    content: {},
    load: function(response, ioArgs){
        // Do the success callback
    },
    error: function(response, ioArgs){
        // Do the failure callback
    }
});
Steps:
1. Click the submit button: a request is sent and a response is successfully received.
2. Click the submit button again: the request is never sent.
I'd appreciate any help.
I can't talk for 1.8, but I am using dojo 1.6 and had a very similar issue that I resolved with the following method:
dojo.io.iframe._currentDfd = null; //insert this line
dojo.io.iframe.send
({...
*verified in Chrome Version 25.0.1364.152 m
Source: http://mail.dojotoolkit.org/pipermail/dojo-interest/2012-May/066109.html
dojo.io.iframe.send will only send one request at a time, so if it thinks that the first request is still processing (whether it actually is or not), the second call won't work. The trick is to call cancel() on the returned deferred result if one exists, like so:
if (this._deferred) {
this._deferred.cancel();
}
this._deferred = dojo.io.iframe.send({
....
That will cancel the first request and allow the second request to be sent properly.
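The cancel-before-send pattern can also be kept in one place by wrapping the send function. A rough sketch, not Dojo API: makeCancellingSender is a hypothetical name, and its send argument stands in for dojo.io.iframe.send, assumed to return an object with a cancel() method as in the snippet above.

```javascript
// Hypothetical wrapper: remembers the last returned deferred and cancels it
// before issuing the next request.
function makeCancellingSender(send) {
    var pending = null;
    return function (args) {
        if (pending) {
            pending.cancel(); // drop the previous request before sending again
        }
        pending = send(args);
        return pending;
    };
}
```

With Dojo loaded, it might be set up once as var sendOnce = makeCancellingSender(dojo.io.iframe.send); and then called with the same argument object as before on every click.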
For dojo 1.8, dojo.io.iframe is deprecated. dojo.request.iframe is used instead.
And the solution from @Sorry-Im-a-N00b still works:
iframe._currentDfd = null;
iframe.get(url, {
data: sendData,
});