Phantomjs write scraped data to database

I have written a PhantomJS script to scrape Hoover.
My flow is as follows:
1: Get data from the database using a Node.js API.
2: Fetch 10 rows at a time and pass them one at a time to the website to scrape it. This is where the problem is: I want to store the scraped results in an array or something similar, then pass that data back to the Node.js API to update the database in Azure.
Right now I can get data from Azure using the Node.js API and I can scrape with PhantomJS. My only problem is how to store the results in temporary storage or an array that can then be passed to the Node.js API to update the database in Azure.

(I'm using CasperJS, which adds a layer on top of PhantomJS, but I think this might also work in plain PhantomJS.)
You can have CasperJS make an AJAX call to your backend with the data you want to store.
Make CasperJS inject a content script into each page it visits:
var casper = require('casper').create({ clientScripts: ['content.js'] });
Then, in content.js:
// Runs inside the scraped page; posts the scraped data to your backend.
function sendToServer(theData) {
  var xhr2 = new XMLHttpRequest();
  xhr2.open('POST', your_server_url, true);
  xhr2.send(theData);
}
Now you can call sendToServer with casper.evaluate from your script.
And remember to include this in your receiving app (or see this module):
res.writeHead(200, {
  'Access-Control-Allow-Origin': '*'
});
otherwise your AJAX call will fail. You may also have to add an OPTIONS route that returns the CORS headers. Another solution is to disable cross-origin checks in PhantomJS with the --web-security=false command-line switch.
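On the receiving side, the CORS headers and the OPTIONS route mentioned above might look like the following sketch. It uses only Node's built-in http module. The names handleRequest, startReceiver and received, and the assumption that the page posts a JSON string, are illustrative and not part of the original answer; the in-memory array stands in for the Azure database update.

```javascript
const http = require('http');

// Hypothetical in-memory buffer for scraped rows; replace with your database update.
const received = [];

// Headers that let a page on another origin post to this server.
const CORS_HEADERS = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'POST, OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type'
};

function handleRequest(req, res) {
  if (req.method === 'OPTIONS') {
    // Preflight request: answer with the CORS headers and no body.
    res.writeHead(204, CORS_HEADERS);
    res.end();
    return;
  }
  if (req.method !== 'POST') {
    res.writeHead(405, CORS_HEADERS);
    res.end();
    return;
  }
  let body = '';
  req.on('data', function (chunk) { body += chunk; });
  req.on('end', function () {
    try {
      // Assumes the scraper sent JSON.stringify(row) as the request body.
      received.push(JSON.parse(body));
      res.writeHead(200, CORS_HEADERS);
      res.end('ok');
    } catch (err) {
      res.writeHead(400, CORS_HEADERS);
      res.end('invalid JSON');
    }
  });
}

// Call startReceiver(3000) to listen; the port is an arbitrary example.
function startReceiver(port) {
  return http.createServer(handleRequest).listen(port);
}
```

With this in place, the page-side sendToServer would be called with JSON.stringify(row) for each scraped row, and the rows accumulate in received until you write them to the database.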

Related

Token management in Karate parallel execution

Scenario: All the endpoints in my API tests need authentication, so an Authorization header must be passed. I have an Authentication.feature file in which I read a refresh token from a file, generate a new access token, and write the new refresh token back to the file. After each scenario runs, the refresh token must be written back to the file so that the next feature can consume it. Authentication.feature is called from karate-config.js, and the Authorization header is set as shown below:
var response = karate.call('classpath:Test/features/Authentication.feature',config).response;
var token = response.access_token
karate.configure('headers',{Authorization: 'Bearer '+token});
Everything works fine so far, but when I use the JUnit 5 parallel runner, it causes issues with the authentication token: the latest refresh token is not the one written to the file. I tried making the file read/write part synchronized, but that does not solve the problem. I also tried the @parallel=false tag in Authentication.feature, still with no luck. How can I run my tests in parallel and still update the file correctly with the latest refresh token?

The recommended way to do this is to use karate.callSingle() - please read about it if you haven't already: https://github.com/karatelabs/karate#hooks
Note that this code example below is JS in karate-config.js:
var result = karate.callSingle('classpath:some/package/my.feature');
Also see this answer for some other ideas: https://stackoverflow.com/a/53516885/143475
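Applied to the configuration from the question, the callSingle approach might look like the sketch below. It assumes that Authentication.feature leaves the token response in its response variable, as the original karate.call(...).response line implies, and that a single access token can be shared by every thread for the whole run. The wrapper name fn and the empty config object are illustrative.

```javascript
// karate-config.js (sketch)
function fn() {
  var config = {};

  // callSingle runs the feature once and reuses the cached result,
  // even when scenarios execute on several threads.
  var result = karate.callSingle('classpath:Test/features/Authentication.feature', config);
  var token = result.response.access_token;

  // Every request made by the tests now carries the shared token.
  karate.configure('headers', { Authorization: 'Bearer ' + token });
  return config;
}
```

Because the feature runs only once, the refresh-token file is read and written a single time per run, which avoids the race between parallel threads.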

How to return Vuex-generated page to client on initial Vue load?

I have a Vue / Nuxtjs app which displays lots of user-provided content (think of it as a crowdsourced blog). The content on the client is retrieved and stored in Vuex. When a page is loaded, it displays the current content and then uses fetch to get the updated data. Here is a typical component:
fetch() {
  this.$store.dispatch('feeds/refreshLatest')
},
computed: {
  feed() {
    return this.$store.state.feeds.latest
  }
}
where feeds/refreshLatest uses axios to retrieve the posts.
This works quite well. The problem is the initial load is very slow, especially on the front page which has to process and display dozens of articles.
I have SSR enabled, and would like the server to store the content, and then on initial load provide a rendered page to the client. However, the Vuex object on the server seems to be new for each request, and so the client has to wait for the entire set of articles to be fetched before anything is displayed, which is unacceptable. Doing all the fetches only on the client solves this problem, but it is still too slow.
I thought I could somehow reuse the same server-side Vuex store on each request and send it to the client with nuxtServerInit, but I don't see a way to share the Vuex store. Thank you for any pointers or other packages that could help.

If I understand the question correctly: on every request, the server-side render waits for the API fetch to finish before the rendered DOM is sent to the client, so each load is slow?
I solved a similar issue using cookies, because cookies are also available during server-side rendering. I used the approach below.
1: Store the data in a cookie after the initial API call, and send the data from the cookie to the client first. (If the cookie is present, do not call the API from the server.)
2: Call the API from the client to update the data.
I use this library:
https://github.com/microcipcip/cookie-universal/tree/master/packages/cookie-universal-nuxt#readme
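The two steps above can be sketched with the get and set methods that cookie-universal-nuxt exposes as $cookies. The cookie name latestFeed, the mutation feeds/setLatest, the helper cacheFeed and the cap of five posts are all hypothetical. Browsers limit a cookie to roughly 4 KB, so only a trimmed subset of the feed can realistically be cached this way.

```javascript
// Hypothetical cookie name; any short key works.
var FEED_COOKIE = 'latestFeed';

// Root store actions. In a Nuxt store/index.js this object would be exported.
var actions = {
  // Runs on the server for the initial request.
  nuxtServerInit: function (store, context) {
    var cached = context.app.$cookies.get(FEED_COOKIE);
    if (cached) {
      // Cookie present: fill the store without calling the API from the server.
      store.commit('feeds/setLatest', cached);
    }
  }
};

// Call on the client after the feed was refreshed, e.g. cacheFeed(this.$cookies, posts).
function cacheFeed(cookies, posts) {
  // Keep only a few posts so the cookie stays under the size limit.
  cookies.set(FEED_COOKIE, posts.slice(0, 5), { path: '/', maxAge: 60 * 60 });
}
```

The client still dispatches feeds/refreshLatest after the page loads, so the cached posts are only a first paint that gets replaced by fresh data.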

In TestCafe, is it possible to record XHR and use it for mocking (automocking)?

I'm using TestCafe for my functional tests.
My project makes a lot of XHR requests and I don't want to spend time writing every single mock by hand.
Is there an automocker for TestCafe like this one: https://github.com/scottschafer/cypressautomocker?

TestCafe does not provide the described functionality out of the box. However, you can use a combination of RequestLogger and RequestMock.
The idea is to create a JSON file with the request results on the first run using RequestLogger.
Then, based on the results of the first run, you can configure your RequestMock object to respond with the results from the file for all subsequent requests.
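A rough sketch of that record-then-replay idea follows. The helpers recordsFromLogger, parseBody and applyRecords are hypothetical names; they rely on the requests array of a RequestLogger and on the onRequestTo(...).respond(body, statusCode, headers) chain of a RequestMock. The sketch assumes the logger was created with logResponseBody and stringifyResponseBody enabled and that the responses are uncompressed JSON or text.

```javascript
// Convert RequestLogger entries into plain records that can be saved as JSON.
function recordsFromLogger(logger) {
  return logger.requests.map(function (entry) {
    return {
      url: entry.request.url,
      statusCode: entry.response.statusCode,
      body: entry.response.body
    };
  });
}

// Turn a logged body back into an object when it is JSON, else keep the text.
function parseBody(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return text;
  }
}

// Register one mocked response per saved record on a RequestMock instance.
function applyRecords(mock, records) {
  records.forEach(function (record) {
    mock.onRequestTo(record.url).respond(parseBody(record.body), record.statusCode, {
      'access-control-allow-origin': '*'
    });
  });
  return mock;
}

// Usage inside a TestCafe test file (not executed here):
//   const { RequestLogger, RequestMock } = require('testcafe');
//   First run:  attach RequestLogger(/api/, { logResponseBody: true, stringifyResponseBody: true })
//               and save JSON.stringify(recordsFromLogger(logger)) to a file.
//   Later runs: fixture(...).requestHooks(applyRecords(RequestMock(), savedRecords));
```

This matches requests by exact URL only; requests that differ by method or body would need a predicate-based filter instead.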

How is XHR a viable alternative to asynchronous module definition?

I'm learning about the case for asynchronous module definition (AMD) from here but am not quite clear about the below:
It is tempting to use XMLHttpRequest (XHR) to load the scripts. If XHR
is used, then we can massage the text above -- we can do a regexp to
find require() calls, make sure we load those scripts, then use eval()
or script elements that have their body text set to the text of the
script loaded via XHR.
XHR is using ajax or something to make a call to grab a resource from the database, correct? What does the eval() or script elements have to do with this? An example would be very helpful

That part of RequireJS' documentation is explaining why using XHR rather than doing what RequireJS does is problematic.
"XHR is using ajax or something to make a call to grab a resource from the database, correct?"
XHR is what allows you to make an Ajax call. jQuery's $.ajax for instance creates an XHR instance for you and uses it to perform the query. How the server responds depends on how the server is designed. Most of the servers I've developed won't use a database to answer a request made to a URL that corresponds to a JavaScript file. The file is just read from the file system and sent back to the client.
"What does the eval() or script elements have to do with this?"
Once the request is over, what you have is a string that contains JavaScript. You've fetched the code of your module but presumably you also want to execute it. eval is one way to do it but it has the disadvantages mentioned in the documentation. Another way to do it would be to create a script element whose body is the code you've fetched, and then insert this script in the DOM but this also has issues, as explained in the documentation you refer to.
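To make the quoted passage concrete, here is a minimal sketch of the approach it describes (and argues against): fetch the script as text, use a regexp to find its require() calls, then turn the text into running code. The function names are made up for illustration, and new Function is used here as a stand-in for eval; this is not how RequireJS itself loads modules.

```javascript
// Find the module ids named in require('...') calls inside fetched source text.
function findRequires(source) {
  var pattern = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  var ids = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

// Execute fetched source text as a CommonJS-style module body.
// new Function, like eval, turns a string into runnable code.
function runModule(source, requireFn) {
  var module = { exports: {} };
  var body = new Function('require', 'module', 'exports', source);
  body(requireFn, module, module.exports);
  return module.exports;
}

// Browser-only: fetch a script as plain text with XHR instead of a script tag.
function fetchSource(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.onload = function () { callback(xhr.responseText); };
  xhr.send();
}
```

A loader built this way would call fetchSource for a module, run findRequires on the returned text, fetch those dependencies first, and only then call runModule. The string returned by the server is just text until that last step executes it.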

Intercept Requests With Custom Responses in PhantomJS?

Is there a way to intercept a resource request and give it a response directly from the handler? Something like this:
page.onRequest(function(request){
  request.reply({data: 123});
});
My use case is for using PhantomJS to render a page that makes calls to my API. In order to avoid authentication issues, I'd like to intercept all http requests to the API and return the responses manually, without making the actual http request.
onResourceRequest almost does this, but doesn't have any modification capabilities.
Possibilities that I see:
1: I could store the page as a Handlebars template, render the data into the page, and pass it off as raw HTML to PhantomJS (instead of a URL). While this would work, it would make changes difficult since I'd have to write the data layer for each webpage, and the webpages couldn't stand alone.
2: I could redirect to localhost and have a server there that listens and responds to the requests. This assumes that it would be OK to have an open, unauthenticated version of the API on localhost.
3: Add the data via page.evaluate to the page's global window object. This has the same problems as #1: I'd need to know a priori what data the page needs, and write server-side code unique to each page.

I recently needed to do this when generating PDFs with PhantomJS.
It's slightly hacky, but it seems to work.
var page = require('webpage').create(),
    server = require('webserver').create(),
    totallyRandomPortnumber = 29522,
    ...
// In my actual code, totallyRandomPortnumber is created by a Java application,
// because PhantomJS will report the port in use as '0' when listening to a random port,
// thereby preventing its reuse in page.onResourceRequested...
server.listen(totallyRandomPortnumber, function(request, response) {
  // Local stand-in for the real API: always answer with canned JSON.
  response.statusCode = 200;
  response.setHeader('Content-Type', 'application/json;charset=UTF-8');
  response.write(JSON.stringify({data: 'somevalue'}));
  response.close();
});
page.onResourceRequested = function(requestData, networkRequest) {
  // Redirect matching requests to the local server above.
  if (requestData.url.indexOf('interceptme') != -1) {
    networkRequest.changeUrl('http://localhost:' + totallyRandomPortnumber);
  }
};
In my actual application I'm sending some data to PhantomJS to overwrite requests/responses, so I do more checking on URLs in both server.listen and page.onResourceRequested.
This feels like a poor man's interceptor, but it should get you (or whoever this may concern) going.
