Is there a way to block requests to the most common trackers using PhantomJS?

Using Fiddler, I am monitoring all network traffic during PhantomJS-based scraping.
I discovered that http://www.venez.net/tracker.js?site_id=4&section_id=0 gets called once for every session on port 9524.
What is the purpose of this call?
Edit:
As suggested below, this may be because the scraped site calls this tracker. Is there a way to block requests to the most common trackers using PhantomJS, maybe involving a dict of some kind?
Edit 2:
It has been suggested that this provides a solution. However, it does not: I am already blocking Google Analytics, and as we can see, the aforementioned tracker still gets called. None of the code in that solution addresses anything other than Google Analytics.
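To be clearer about what I am after, I imagine something along these lines, with a blocklist I can maintain myself (a minimal sketch; the listed domains are just illustrative examples, not an authoritative tracker list):

```javascript
// block-trackers.js - minimal PhantomJS sketch; blocklist entries are examples only.
var page = require('webpage').create();

// Hostname fragments to block; extend as needed.
var blocklist = [
    'google-analytics.com',
    'googletagmanager.com',
    'doubleclick.net',
    'facebook.net',
    'venez.net'          // the tracker observed in Fiddler
];

page.onResourceRequested = function (requestData, networkRequest) {
    for (var i = 0; i < blocklist.length; i++) {
        if (requestData.url.indexOf(blocklist[i]) !== -1) {
            console.log('Blocked: ' + requestData.url);
            networkRequest.abort();   // cancel the request before it is sent
            return;
        }
    }
};

page.open('http://example.com/', function (status) {
    console.log('Page loaded: ' + status);
    phantom.exit();
});
```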

Related

Workarounds for Safari ITP 2.3

I am very confused as to how Safari's ITP 2.3 works in certain respects, and why sites can't easily circumvent it. I don't understand under what circumstances limits are applied, what the exact limits are, what they are applied to, and for how long.
To clarify my question, I broke it down into several cases. I will be referring to Apple's official blog post about ITP 2.3 [1], which you can quote from, but feel free to link to any other authoritative or factually correct sources in your answer.
For third-party sites loaded in iframes:
Why can't they just use localStorage to store the values of cookies, and send this data along not as actual browser cookies, but as data in the body of the request? Similarly, they can parse the response to update localStorage. What limits does ITP actually place on localStorage in third-party iframes?
If the localStorage is frequently purged (see question 1), why can't they simply use postMessage to tell a script on the enclosing website to store some information (perhaps encrypted) and then spit it back whenever it loads an iframe? (A sketch of what I mean is at the end of this question.)
For sites that use link decoration:
I still don't understand what the limits on localStorage are for third-party sites in iframes that did NOT get classified as link-decorator sites. But let's say they are link-decorator sites. According to [1], Apple only starts limiting things further if there is a querystring or fragment. But can't a website rather trivially store this information in the URL path before the querystring, i.e. /in/here instead of ?in=here … surely large companies like Google can trivially choose to do that?
In the case where a site has been labeled a tracking site, does that mean all its non-cookie data is limited to 7 days? What about cookies set by the server, aren't they exempted? If so, simply make a request to your server to set the cookie instead of using JavaScript. After all, the operator of the site is very likely to also have access to its HTTP server and app code.
For all sites:
Why can't a service like Google Analytics or Facebook's widgets simply convince a site to add a CNAME record to its DNS and put Google's and Facebook's servers under a subdomain like gmail.mysite.com or analytics.mysite.com? And then boom, they can read and set cookies again, in some cases even on the top-level domain for website owners who don't know better. Doesn't this completely defeat the goals of Apple's ITP, since Google and Facebook have now become a "second party" in some sense?
Here on StackOverflow, when we log out on iOS Safari, the StackOverflow network is able to log us out of multiple sites at once … how is that even accomplished if no one can track users across websites? I have heard it said that "second-party cookies" can still be stored, but what exactly makes a second-party cookie different from a third-party one?
My question is broken down into 6 cases, but the overall theme in each case is: how does Apple's latest ITP work in that case, and how does it actually block all cases of potentially malicious tracking (to the point where a well-funded company can't just do the workarounds above) while at the same time allowing legitimate use cases?
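To make cases 1 and 2 concrete, here is a rough sketch of the mechanism I am asking about (all domains are hypothetical placeholders, and I have no idea whether ITP actually prevents this):

```javascript
// On the enclosing first-party page: hold state for the third party.
window.addEventListener('message', function (event) {
    if (event.origin !== 'https://tracker.example') return; // hypothetical third party
    if (event.data.type === 'store') {
        // Persist the third party's (possibly encrypted) state on its behalf.
        localStorage.setItem('3p-state', event.data.value);
    } else if (event.data.type === 'load') {
        event.source.postMessage(
            { type: 'state', value: localStorage.getItem('3p-state') },
            event.origin
        );
    }
});

// Inside the third-party iframe: recover the state from the parent.
window.parent.postMessage({ type: 'load' }, '*'); // use the real parent origin in practice
window.addEventListener('message', function (event) {
    if (event.data && event.data.type === 'state') {
        // Send the recovered identifier in the request body, not as a cookie.
        fetch('https://tracker.example/sync', {
            method: 'POST',
            body: JSON.stringify({ id: event.data.value })
        });
    }
});
```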
[1] https://webkit.org/blog/9521/intelligent-tracking-prevention-2-3/

Can ZAP be used as a DAST tool via the API without spidering?

I'm trying to use ZAP as a DAST tool via the API, and it's getting a bit annoying.
Can I use the tool as an attack tool instead of a proxy tool? What I mean is: currently I can't launch an active scan without the URL being in the site tree, which is only done via the spider, AFAIK, right?
What I want is to provide the URL, launch an active scan based on a policy, and get results. Now that I think about it, this is similar to fuzzing, just with attack vectors. Although I see the logic of what to do with URL X if there is no history or scanning done, can't it just scan the page for actions and variables? The main difference is page/URL scanning, as opposed to spidering, which assumes there are other URLs.
After writing this I'm not sure it can be done without a spider unless you're in my situation, so let me explain.
Let's say, for example's sake, I just want to scan the login page for SQLi, and I'm using OWASP Juice Shop to make things easier. Can I tell ZAP to attack that one page? The only way I found in that example is via the POST method, since the URL is not a static page and isn't picked up by ZAP unless it's an action, but then I can't launch it without spidering, so this is like a loop.
Sorry for the long post; hopefully you can provide some insights.
Update from the comments:
ZAP has to know about the site it's going to attack. We deliberately separate the concepts of discovery and attacking because there's no one discovery option that's best for all. You can use the standard spider, the ajax spider, import URLs, import definitions like OpenAPI, proxy your browser, proxy regression tests, or even make direct requests to the target site via the ZAP API.
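For example, the "direct requests via the ZAP API" route could look roughly like this (a sketch only; it assumes ZAP is listening on localhost:8080, Juice Shop on localhost:3000, and uses a placeholder API key):

```javascript
// Node 18+ sketch driving ZAP's JSON API directly: tell ZAP about one URL,
// then active-scan just that URL without spidering.
const ZAP = 'http://localhost:8080';
const APIKEY = 'changeme';                        // placeholder ZAP API key
const TARGET = 'http://localhost:3000/#/login';   // e.g. the Juice Shop login page

async function zap(path, params) {
    const qs = new URLSearchParams({ ...params, apikey: APIKEY });
    const res = await fetch(`${ZAP}${path}?${qs}`);
    return res.json();
}

(async () => {
    // 1. Add the URL to the site tree by requesting it through ZAP (no spider).
    await zap('/JSON/core/action/accessUrl/', { url: TARGET, followRedirects: 'true' });

    // 2. Launch an active scan against just that URL, with a chosen policy.
    const { scan } = await zap('/JSON/ascan/action/scan/', {
        url: TARGET,
        recurse: 'false',
        scanPolicyName: 'Default Policy',
    });

    // 3. Poll until the scan finishes, then pull the alerts.
    let status = '0';
    while (Number(status) < 100) {
        await new Promise(resolve => setTimeout(resolve, 2000));
        ({ status } = await zap('/JSON/ascan/view/status/', { scanId: scan }));
        console.log(`Active scan progress: ${status}%`);
    }
    const alerts = await zap('/JSON/core/view/alerts/', { baseurl: TARGET });
    console.log(JSON.stringify(alerts, null, 2));
})();
```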
It looks like you have quite a few questions about ZAP. The ZAP User Group is probably a better forum for them: https://groups.google.com/group/zaproxy-users

How to call the Google NLP API from a Google Chrome extension

My aim is to select some text on a web page, start a Google Chrome extension, and hand the text to a Google Cloud API (the Natural Language API, in my case).
I want to do some sentiment analysis and then use the result to mark/highlight positive sentences in green and negative ones in red.
I am new to this and do not know how to start.
The extension consists of a manifest, a popup, etc. How should I call an API that does natural language processing from there?
Should I create a Google Cloud application with an API_KEY to call? In that case I would have to upload my credentials, right?
Sorry, I know this sounds a bit confusing, but I just don't know how to bring these two things together and would be more than happy about any help.
The best way to authenticate your app will depend on the specific needs and use cases of your application. You can see an overview of all the different methods here.
If you are not planning on identifying users nor on using a back end server that handles authenticating (as I assume to be your case), the best option would indeed be to use API keys. They do not identify the user, but are enough for the Natural Language APIs.
To do this you will need to create an API key for the services you want and add the necessary restrictions to make the key as secure as possible. Detailed instructions on how to do this and how to use the key in a url can be found here.
The API call could be made from within the Chrome extension with any JavaScript method capable of performing POST requests, for example XMLHttpRequest or the Fetch API. You can find an example of the parameters that need to be included in the request here.
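For instance, a minimal fetch-based sketch could look like this (YOUR_API_KEY is a placeholder for the restricted key created above):

```javascript
// Sketch: send selected text to the Cloud Natural Language API for sentiment analysis.
async function analyzeSentiment(text) {
    const url = 'https://language.googleapis.com/v1/documents:analyzeSentiment'
        + '?key=YOUR_API_KEY'; // placeholder: use your own restricted API key
    const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
            document: { type: 'PLAIN_TEXT', content: text },
            encodingType: 'UTF8'
        })
    });
    const data = await response.json();
    // data.documentSentiment.score ranges from -1.0 (negative) to 1.0 (positive);
    // data.sentences holds per-sentence scores, handy for green/red highlighting.
    return data;
}
```

You could then color each sentence based on whether its score is above or below zero.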
You may run into CORS issues when making the request directly from the extension. I recommend reading this answer, where a couple of workarounds for these issues are suggested.

How to test Google Analytics with Selenium on my app?

I would like to test Google Analytics and see that it works properly (that events are tracked when appropriate, with the appropriate parameters, etc.). What would be the best approach? Can Selenium do something like expect certain REST calls and make sure they happened?
You could integrate a proxy such as BrowserMob into your tests. This would allow you to perform actions on the web page with Selenium, then query the proxy via its API to assert that the correct calls were made with the correct values.
Here is a link explaining how to block analytics using BrowserMob, but it should still be good background for approaching your problem:
https://sqa.stackexchange.com/questions/6859/how-do-you-block-google-analytics-from-selenium-automated-visits
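As a rough sketch of that approach using BrowserMob's REST interface together with selenium-webdriver (the app URL and the asserted analytics pattern are placeholders; note that inspecting HTTPS traffic requires the browser to trust the proxy's certificate):

```javascript
// Sketch: route Selenium traffic through BrowserMob Proxy, then assert on the HAR.
// Assumes BrowserMob Proxy is already running on localhost:8080.
const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');

(async () => {
    // 1. Create a proxy instance and start HAR capture via BrowserMob's REST API.
    const { port } = await (await fetch('http://localhost:8080/proxy', { method: 'POST' })).json();
    await fetch(`http://localhost:8080/proxy/${port}/har`, { method: 'PUT' });

    // 2. Drive the browser through that proxy.
    const options = new chrome.Options().addArguments(`--proxy-server=http://localhost:${port}`);
    const driver = await new Builder().forBrowser('chrome').setChromeOptions(options).build();
    await driver.get('https://your-app.example');  // placeholder app URL
    // ...perform the user actions that should fire analytics events...

    // 3. Pull the HAR and assert that the expected analytics call was made.
    const har = await (await fetch(`http://localhost:8080/proxy/${port}/har`)).json();
    const hits = har.log.entries.filter(e =>
        e.request.url.includes('google-analytics.com/collect')); // example pattern
    console.assert(hits.length > 0, 'expected at least one Google Analytics hit');

    await driver.quit();
})();
```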

How to properly benchmark / stress-test a single-page web application

I am somewhat familiar with benchmarking/stress-testing traditional web applications, and I find it relatively easy to start estimating the maximum load for one. With the tools I am familiar with (Apache ab, Apache JMeter) I can get a rough estimate of the number of requests/second a server with a standard app can handle. I can come up with a user story, create a list of pages I would like to check, and benchmark them separately. A lot of information can be found on the internet on how to go from a novice like me to a master.
But in my opinion a lot of things are different when benchmarking a single-page application. The main entry point is the most expensive request, because the user loads the majority of what is needed for a proper app experience (or at least my app works this way). After that, navigation to other places is just AJAX requests, waiting for JSON, and templating. So time to window load is not important anymore.
To add to the problem, I was not able to find any resources on how people do this properly.
In my particular case I have a SPA written with Knockout, sitting on an Apache server (most probably this is irrelevant). I would like a rough estimate of how many users my app can handle on a particular server. I am not looking for a tool recommendation (though that would be nice); I am looking for an experienced person to share their insight into the benchmarking process.
I suggest you test this application just like you would test any other web application; as you said, identify the common use cases, prepare scripts for them, run them in an appropriate mix, and analyze the results.
Web applications can break in many different ways for different reasons. You are speculating that the first page load is heavy and the rest is just small AJAX calls. From experience I can tell you that this is sometimes misleading: for example, you can find that the heavy page is served from cache and the server is not working hard for it, while a small AJAX response requires a lot of computing power, a long-running database query, or hits some locking in the code that causes it to break or be slow under load. That's why we do load testing.
You can do this with any load testing tool, ideally one that can handle these types of scripts with many dynamic values. My personal preference is WebLOAD by RadView.
I am dealing with a similar scenario: a SPA where the first page loads, and thereafter everything is done by requesting other HTML pages and/or making web service calls to get the data.
My goal is to stress test the web server and DB server.
My solution is to just create requests for those HTML pages (a very low performance cost, IMO, since they are static and can be cached in the browser for a very long time) and for the web service calls. The biggest load will come from the requests for data/processing via the web service calls.
Capture all the requests for HTML and web service calls using a tool like Fiddler, and use any load testing tool (like JMeter) to replay these requests with as many virtual users as you want to test your application with.
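If it helps to see the shape of that, here is a bare-bones Node 18+ sketch that replays one captured web service request with a number of concurrent virtual users (the endpoint is a placeholder; a real tool like JMeter adds ramp-up, assertions, and reporting on top of this):

```javascript
// Sketch: fire REQUESTS_PER_USER requests from USERS concurrent virtual users
// at one captured SPA endpoint and report rough latency numbers.
const TARGET = 'https://your-app.example/api/data'; // placeholder captured request
const USERS = 50;
const REQUESTS_PER_USER = 20;

async function virtualUser() {
    const timings = [];
    for (let i = 0; i < REQUESTS_PER_USER; i++) {
        const start = Date.now();
        const res = await fetch(TARGET);
        await res.text();                  // drain the body like a real client
        timings.push(Date.now() - start);
    }
    return timings;
}

(async () => {
    const all = (await Promise.all(
        Array.from({ length: USERS }, virtualUser)
    )).flat().sort((a, b) => a - b);

    const avg = all.reduce((sum, t) => sum + t, 0) / all.length;
    const p95 = all[Math.floor(all.length * 0.95)];
    console.log(`${all.length} requests, avg ${avg.toFixed(1)} ms, p95 ${p95} ms`);
})();
```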