Hi, what I am trying to do is not to fetch the webpage with
page.open(url);
but to set a string that has already been retrieved as the page's response. Can that be done?
Yes, and it is as easy as assigning to page.content. It is usually also worth setting page.url (otherwise you might hit cross-domain issues when doing anything with Ajax, SSE, etc.), and the setContent function does both of those steps in one go. Here is a basic example:
var page = require('webpage').create();
page.setContent("<html><head><style>body{background:#fff;color:#000;}</style><title>Test#1</title></head><body><h1>Test #1</h1><p>Something</p></body></html>","http://localhost/imaginary/file1.html");
console.log(page.plainText);
page.render("test.png");
phantom.exit();
So call page.setContent with the "previously retrieved page response" that you have.
Related
I am testing login functionality on a third-party website. I have this URL: example.com/login. When I copy and paste it into the browser (Chrome), the page sometimes loads, but sometimes it does not (an empty blank white page).
The problem is that I have to run a script on this page to click one of the elements (all the elements are embedded inside #shadow-root). If the page loads, no problem: the script is evaluated successfully. But sometimes the page does not load, returning a 404 in response to an XHR request, and as a result my `* eval(script(...))` step returns "js eval failed...".
So the solution I found is to refresh the page, and to do that I am considering capturing the XHR request's response. If the status code is 404, refresh the page; if not, continue with the following steps.
Now, I think this may work, but I do not know how to implement Karate's intercepting of HTTP requests. And first of all, is that even doable?
I have looked into the documentation here, but could not understand the examples:
https://github.com/karatelabs/karate/tree/master/karate-netty
Meanwhile, if there is another way of refreshing the page conditionally, I would be more than happy to hear about it. Thanks in advance.
First, using JavaScript you should be able to handle shadow roots: https://stackoverflow.com/a/60618233/143475
The above answer links to advanced examples of executing JS in the context of the current page. I suggest you do some research into that, and try to get the help of someone who knows JS, the DOM, and HTML well - you should be able to find a way to know whether the XHR was made successfully, e.g. based on whether some element on the page has changed.
Finally, here is how you can do interception: https://stackoverflow.com/a/61372471/143475
I'm trying to get a report from the SSRS REST API. I can see it when I navigate to this URL in Chrome:
https://myPC:443/ReportService?%2fSSRS%2fPatientèle&rs:Command=Embed&rc:LinkTarget=main&Hospital=CHRU%20Strasbourg
When I navigate there in the browser, my report is displayed.
So I've tried to get the HTML from a controller:
[HttpGet]
public async Task<string> GetReportAsHTML()
{
    using (var client = new HttpClient())
    {
        using (var result = await client.GetAsync("http://myPC:80/ReportService?%2fSSRS%2fPatientèle&rs:Command=Embed&rc:LinkTarget=main&Hospital=CHRU%20Strasbourg"))
        {
            if (result.IsSuccessStatusCode)
            {
                return await result.Content.ReadAsStringAsync();
            }
        }
    }
    return "";
}
It's returning 401 Unauthorized, so the statement inside the if is never reached.
Can someone please explain how I can resolve this problem so that I get the correct response?
EDIT
I tried both HTTP and HTTPS, and both return the report without authentication. HTTP access (http://localhost:80/...) was even better, because charts aren't displayed, only tables. With HTTPS access, I get the following picture instead of the charts:
If you need to embed reports, you can also consider doing so in an <iframe> - just point its src at the report URL.
<iframe src="https://myPC:443/ReportService?%2fSSRS%2fPatientèle&rs:Command=Embed&rc:LinkTarget=main&Hospital=CHRU%20Strasbourg"></iframe>
The added benefits are:
iframes usually share cookies with their parent, so if the report server needs such authentication, it may work immediately
if the report is somehow interactive (say, it actually returns an HTML page with filters, dropdowns and the like), your end user would get that too, instead of static HTML that might even break when taken out of its context
You may also want to look into ready-made Blazor report viewer components - perhaps your reporting solution vendor already has one.
It seems like the URL in your code (http, port 80) does not match the one you use in Chrome (https, port 443). Could this be the issue?
I'm scraping http://www.germandeli.com/Meats/Sausages
I would like to extract the link for every product (or item) on the page. I use the scrapy shell to test, but it keeps returning an empty value [].
Here is the code I use:
response.xpath('*//h2[@class="item-cell-name"]/a/@href')
Any help would be greatly appreciated.
Well, unfortunately the item content is rendered through JS. But luckily the page sends an AJAX request to fetch a JSON list of the items, which makes it much easier for us to parse. You can check the XHR tab in the Google Chrome DevTools console to imitate the request with the required headers.
This URL returns the list of products. The limit and offset parameters in the URL can be played with to fetch the next set of data. To parse the JSON content you can use json.loads from the standard library.
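As a minimal sketch of those two points (the endpoint path, parameter names, and JSON shape below are made-up placeholders - check the XHR tab for the real ones), building the paginated URL and parsing the body with json.loads could look like this:

```python
import json
from urllib.parse import urlencode

# Hypothetical Ajax endpoint -- the real path and parameter names come
# from the request seen in the XHR tab of Chrome DevTools.
BASE = "http://www.germandeli.com/api/items"

def page_url(offset, limit=32):
    # Fetch the next "page" of products by increasing the offset.
    return BASE + "?" + urlencode({"offset": offset, "limit": limit})

# A stubbed response body standing in for the real JSON the site returns.
body = '{"items": [{"name": "Bratwurst", "url": "/Meats/Sausages/bratwurst"}]}'
data = json.loads(body)
product_links = [item["url"] for item in data["items"]]

print(page_url(32))
print(product_links)
```

In a real spider, each entry in product_links would be fed back to scrapy as a new Request, and the offset incremented until the response comes back empty.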
Is there any way to request a resource with phantomjs and be able to get to the response's body?
UPDATE: Regarding the other possible meaning of "fetch and do something with all other resources, such as images, CSS, fonts, etc.", I've recently blogged about how to do this in SlimerJS. I believe the only way to do it in PhantomJS as of 1.9.1 is to apply a patch and recompile.
Maybe I'm misunderstanding what you mean by "response body", or maybe it was added to PhantomJS more recently than this question, but it is as easy as this:
var page = require('webpage').create();
var url = 'http://google.com/';
page.open(url,function(){
console.log(page.content);
phantom.exit();
});
(By the way, use page.plainText to get it without the HTML tags.)
If you just want the contents of the <body> tag, and none of the <head>, here is an alternative way, which can be used to get any part of the response:
var page = require('webpage').create();
var url = 'http://google.com/';
page.open(url,function(){
var html = page.evaluate(function(){
return document.getElementsByTagName('body')[0].innerHTML;
});
console.log(html);
phantom.exit();
});
This is one big problem with PhantomJS right now. The open ticket (as of writing) is at http://code.google.com/p/phantomjs/issues/detail?id=158 and as yet it has no reliable solution. This applies to collecting your request data as well as response data, so you cannot collect your submitted POST data and then re-send it with a CasperJS download-like scheme.
Use SlimerJS. All your PhantomJS code will work with SlimerJS too.
More info here. Notice the body property at the end, which is only available in SlimerJS as of now.
Note: please set page.captureContent = [/.*/] for the body to show up in the response. More info about this: here
SlimerJS does not work on newer versions of Firefox, therefore it is no good for me.
This answer explains how to get the response body from an XHR as of late 2019.
I am using the scrapy tool to scrape content from a website, and I need your help with how to scrape a response that is dynamically loaded via AJAX.
When content loads via AJAX, the URL does not change - it remains the same - but the content changes, and it is on that event that I need to crawl.
Thank you,
G.kavirajan
yield FormRequest('http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php',
                  formdata={'type': 'new', 'ajax': '1'},
                  callback=self.your_callback_method)
Below are the URLs that you can easily catch using Fiddler or Firebug:
this is for the featured tab: http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php?ajax=1&type=random
this is for the new tab: http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php?ajax=1&type=new
You can request these URLs directly to get the results you require. Although the website uses POST requests to fetch data for these URLs, I tried with GET requests and they also work properly.
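As a small stdlib-only sketch of that last point (no network call is made here; the parameters come straight from the URLs above), the GET and POST variants carry the same ajax/type parameters, just in different places:

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://addons.prestashop.com/en/modules/featureproduct/ajax-homefeatured.php"
params = {"ajax": "1", "type": "new"}  # "random" for the featured tab

# GET variant: the parameters go in the query string.
get_url = BASE + "?" + urlencode(params)

# POST variant (what the site itself uses): the same parameters go in the body.
post_req = Request(BASE, data=urlencode(params).encode(), method="POST")

print(get_url)
print(post_req.get_method())
```

Either request shape should return the same product data, which is why the scrapy FormRequest above and a plain GET on the catchable URL are interchangeable here.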