CasperJS - How to open up in parallel all links from an array of links - phantomjs

I need to open all the links of the array in parallel.
How to make it?
In my code, all links will be open one by one, instead of parallel.
Here is my code:
casper.then(function(){
links = this.evaluate(function(){
var links = document.getElementsByTagName('a');
links = Array.prototype.map.call(links,function(link){
return link.getAttribute('href');
});
return links;
});
});
casper.then(function(){
this.each(links,function(self,link){
self.thenOpen(link,function(a){
this.echo(this.getCurrentUrl());
});
});
});
casper.run(function(){this.exit()});

The casper object represents a single browser window.
One approach is to create multiple casper objects, one per URL you want to get in parallel. Note that this is not officially supported, and so may be fragile.
Another approach is to use a bash script to start multiple instances of casperjs, and give each of them a set of the URLs to fetch. This is nice and clean (if using persistent cookies, you might want to make sure they each have their own cookie file: --cookies-file=/path/to/cookies.txt), but might be harder for you to script, depending on how you were getting your initial list of URLs.

Related

How to avoid two sequential alerts (one for read and one for edit) when using `window.showDirectoryPicker()`

const dirHandle = await window.showDirectoryPicker();
await dirHandle.requestPermission({ mode: "readwrite" });
I'm using the File System Access API in chrome. I'd like to let the user pick a folder, then write into the folder.
My code works, but two alerts are shown sequentially, one for read and one for write:
The first one is unnecessary. How can I avoid it?
Interestingly, if the user uses drag and drop, only the 2nd alert will appear after the folder is dropped, which is the desired behavior. The first alert seems to come from showDirectoryPicker. In the ideal world, I imagine being able to pass in an option like showDirectoryPicker({ permission: 'readwrite' }), which will request the 2 permissions together.
I agree it feels suboptimal, but it's a one-time thing. When you run the same code again and pick the same folder (or a nested folder), there will be no prompts at all.
This design was chosen because there are two different things that are being asked here:
First, for the app to read all files (which, recursively for subfolders can be a lot).
Second, for the app to be allowed to write (anywhere) into the folder.
As of Chrome 105, you can get a writable directory with just one prompt
const dirHandle = await window.showDirectoryPicker({ mode: "readwrite" });
Or be explicit in asking for a read-only directory (which is the default).
const dirHandle = await window.showDirectoryPicker({ mode: "read" });

How would you redirect calls to the top object in Cypress?

In my application code, there are a lot of calls (like 100+) to the "top object" referring to window.top such as top.$("title") and so forth. Now, I've run into the problem using Cypress to perform end-to-end testing. When trying to log into the application, there are some calls to top.$(...) but the DevTools shows a Uncaught TypeError: top.$ is not a function. This resulted in my team and I discovering that the "top" our application is trying to reach is the Cypress environment itself.
The things I've tried before coming here are:
1) Trying to stub the window.top with the window object referencing our app. This resulted in us being told window.top is a read-only object.
2) Researching if Cypress has some kind of configuration that would smartly redirect calls to top in our code to be the top-most environment within our app. We figured we probably weren't the only ones coming across this issue.
If there were articles, I couldn't find any, so I came to ask if there was a way to do that, or if anyone would know of an alternate solution?
Another solution we considered: Looking into naming window objects so we can reference them by name instead of "window" or "top". If there isn't a way to do what I'm trying to do through Cypress, I think we're willing to do this as a last resort, but hopefully, we don't have to change that, since we're not sure how much of the app it will break upfront.
#Mikkel Not really sure what code I can provide to be useful, but here's the code that causes Cypress to throw the uncaught exception
if (sample_condition) {
top.$('title').text(...).find('content') // Our iframe
} else {
top.$('title').text(page_title)
}
And there are more instances in our code where we access the top object, but they are generally similar. We found out the root cause of the issue is that within Cypress calls to "top" actually interface with Cypress instead of their intended environment which is our app.
This may not be a direct answer to your question, it's just expanding on your request for more information about the technique that I used to pass info from one script to another. I tried to do it within the same script without success - basically because the async nature of .then() stopped it from working.
This snippet is where I read a couple of id's from sessionStorage, and save them to a json file.
//
// At this point the cart is set up, and in sessionStorage
// So we save the details to a fixtures file, which is read
// by another test script (e2e-purchase.js)
//
cy.window().then(window => {
const contents = {
memberId: window.sessionStorage.getItem('memberId'),
cartId: window.sessionStorage.getItem('mycart')
}
cy.writeFile(`tests/cypress/fixtures/cart.json`, contents)
})
In another script, it loads the file as a fixture (fixtures/cart.json) to pull in a couple of id's
cy.fixture(`cart`).then(cart => {
cy.visit(`/${cart.memberId}/${cart.cartId}`)
})

Changing window.navigator within puppeteer to bypass antibot system

I'm trying to make my online bot undetectable. I read number of articles how to do it and I took all tips together and used them. One of them is to change window.navigator.webdriver.
I managed to change window.navigator.webdriver within puppeteer by this code:
await page.evaluateOnNewDocument(() => {
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
});
I'm bypassing this test just fine:
However this test is still laughing at me somehow:
Why WEBDRIVER is inconsistent?
Try this,
First, remove the definition, it will not work if you define and delete from prototype.
Object.defineProperty(navigator, 'webdriver', ()=>{}) // <-- delete this part
Replace your code with this one.
delete navigator.__proto__.webdriver;
Result:
Why does it work?
Removing directly just remove the instance of the object rather than the actual definition. The getter and setter is still there, so browser can find it.
However if you remove from the actual prototype, it will not exist at all in any instance anymore.
Additional Tips
You mentioned you want to make your app undetectable, there are many plugins which achieve the same, for example this package called puppeteer-extra-plugin-stealth includes some cool anti-bot detection techniques. Sometimes it's better to just reuse some packages than to re-create a solution over and over again.
PS: I might be wrong above the above explanation, feel free to guide me so I can improve the answer.

Data sharing mechanism

I need to develop a web application, where our data/html need to be displayed on third party sites using iframe or javascript. Example: cricket widget sharing.
Can someone tell me what type development is it called ?
There would be multiple kind of widget, some of them will also need to be upgrded short periodically (per x second).
Also, should i use iframe or use javascript implemenration merhod to generate the output on clients server.
Can someone provide me a reference or idea ?
Considering I have understood your question rightly, which means you currently need to develop a widget(not a web app) for websites. A widget is something that can be used by others on their websites.
Adding to the above understanding with the example you gave-
Say you develop a Cricket widget, 100's of other websites can put this widget on their site using a small piece of API or code. And, this widget refreshes every 'x' seconds to display Live Score. Hope this is what you need.
Here is the solution/way to solve this:
What you have to do is write an embedder or loader script keeping the following points in mind-
Make it asynchronous, so that the client website doesn't slow down
Keep it short. Abstract your code. Don't give the user 100's of lines.
Preferably don't use a framework. Chances are it can conflict with your client's website/frameworks. Don't even use jQuery!(Because it can conflict with user's jquery version and cause lot of problems to the widget and website) Write pure Javascript code :)
Don't ever use GLOBAL variables. Use var or let and follow best practices. Using global variables may conflict with user's variables and the whole website/client can get messed up.
A sample code -
<script data-version='v1' data-widget-id='your-clients-id' id='unique-embedder-id' type='text/javascript'>
// "data-version": It's useful to put the version of your embedder script in the "version" data attribute.
// This will enable you to more easily debug if any issues arise.
// "data-widget-id": This ID allows us to pull up the correct widget settings from our database.
// It's a "client id" of sorts.
// "id": This HTML id allows us to refer to this embedder script's location. This is helpful if you want to inject html
// code in specific places in hour hosts's website. We use it to insert our payload script into the <head> tag.
//We wrap our code in an anonymous function to prevent any ofour variables from leaking out into the host's site.
(function() {
function async_load(){
//BELOW: we begin to construct the payload script that we will shortly inject into our client's DOM.
var s = document.createElement('script');
s.type = 'text/javascript';
s.async = true;
// Below is the URL you want them to get your payload script from!
var theUrl = 'http://www.your-website.com/assets/your-payload-script.js';
// At the end we tack on the referrer's URL so we know which website is asking for our payload script.
s.src = theUrl + ( theUrl.indexOf("?") >= 0 ? "&" : "?") + 'ref=' + encodeURIComponent(window.location.href);
// This finds the location of our embedder script by finding the element by it's Id.
// See why it was useful to give it a unique id?
var embedder = document.getElementById('unique-embedder-id');
// This inserts our payload script into the DOM of the client's website, just before our embedder script.
embedder.parentNode.insertBefore(s, embedder);
}
// We want the our embedder to do its thing (run async_load and inject our payload script)
// only after our client's website is fully loaded (see above).
// Hence, we wait for onload event trigger. This no longer blocks the client's website from loading!
// We attachEvent (or addEventListener to be cross-browser friendly) because we don't want to replace our
// client's current onLoad function with ours (they might be doing a bunch of things on onLoad as well!)
if (window.attachEvent)
window.attachEvent('onload', async_load);
else
window.addEventListener('load', async_load, false);
})();
</script>
Read the comments. They are very helpful :)
Your users will have to just use a nicely abstracted script tag-
<script type="text/javascript" src="http://example.com/widget.js"></script>
Or maybe even an iFrame(old is gold days) render HTML in iFrame(Best way if you don't want to 'openly' expose(as most won't know) your Abstracted Javascript as well)
<iframe src="http://example.com/mywidget.html"></iframe>
Here are the references that I referred and also have sample widgets which can be reused-
Developing an Embeddable Javascript Widget / Snippet for Client Sites
Creating an Embeddable JavaScript Widget
Creating Asynchronous, Embeddable JavaScript Widgets
Building an embeddable Javascript widget (third-party javascript)
One last time, please don't use global variables and write asynchronous code. These 2 are the main things that you have to take care to make a great/successful widget with top-notch performance.
Happy Coding! Hope it helps :)

Phantomjs write scraped data to database

I have written a phantomjs script to scrap Hoover.
Following is my flow:
1:Get data from database using Nodejs API .
2:At a time I fetch 10 rows,pass these rows one at a time to Website,scrap it(the prob is here. I somehow want to store results from Scrapped into a array or something then pass this data back to node API to update database in Azure).
Right now I am able to get data from azure using nodejs API and also able to scrap using phantomjs my only prob is how do I store the results in tempopary storage or array, which then can be passsed to nodejs API for updating database in azure.
(I'm using CasperJS - it adds a layer on PhantomJS, but I think it might also work in PhantomJS)
You can have CasperJS do an AJAX call to your backend with the data you want to store.
Make CasperJS include a content script to each page it visits:
var casper = require('casper').create({ clientScripts: ['content.js'] });
Then, in content.js:
function sendToServer(theData){
var xhr2 = new XMLHttpRequest();
xhr2.open('POST', your_server_url, true);
xhr2.send(theData);
}
Now you can call sendToServer with casper.evaluate from your script.
And remember to include this in your receiving app (or see this module):
res.writeHead(200, {
'Access-Control-Allow-Origin': '*'
});
otherwise your ajax will fail. It is possible that you would have to add OPTIONS route that returns CORS headers as well. Another solution for this is disabling cross-origin checks on PhantomJS with command line switch.