ZAP Vaadin setup issues - authentication

Being absolutely new to security testing, I am trying to work through the basic steps and then run a spider and an active scan. I have seen a couple of videos from OWASP & YouTube and tried to make sense of the included ZAP documentation. However, I don't think ZAP is logged in while running the spider crawl or active scans.
Mine is a Java + Vaadin & Spring based application, and the POST URLs don't change for any kind of request; only the request parameters change. The POST URLs are always
http://example.com/UIDL/?v-uiId=0
So far I have:
used "Set Active Session" under the HTTP Sessions tab before running the spider/active scan,
set up the context (though very few URLs appear under Sites),
tried to update the structure parameters with words like "page" and "UIDL", which I could see in the URLs,
tried to exclude the logout URL, but since all the POST URLs are the same, the spider and active scan don't really do anything in that case.
On right-clicking a request I got the option to select it as Form-based Authentication. I have tried that as well; however, it populates the "Login Request POST Data" with values like
12929ddf-5d7f-4264-b810-2fa8f38eca6f[["597","v","v",["pagelength",["i","28"]]],["597","v","v",["firstToBeRendered",["i","0"]]],["597","v","v",["lastToBeRendered",["i","26"]]],["597","v","v",["reqfirstrow",["i","15"]]],["597","v","v",["reqrows",["i","12"]]]]
and fills the username/password parameter fields with exactly the same data.
One last thing: do we really need to disable XSRF protection to be able to use ZAP properly with Java/Vaadin applications?
I am simply trying to get it working and then continue learning from there. I would appreciate it if anyone could help me with the correct setup.
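For reference, here is a minimal sketch of how form-based authentication and an authenticated spider scan can be wired up through the ZAP Python API (the zapv2 package). The context name, login URL, POST data template and credentials below are placeholders of my own, not taken from the question, and whether ZAP's form-based authentication copes with Vaadin's UIDL requests at all is exactly what is in doubt here:
# Sketch only: assumes ZAP is running locally on port 8080 and that the
# application has a conventional login form at /login (an assumption --
# Vaadin's UIDL endpoint may not fit this model).
from zapv2 import ZAPv2

zap = ZAPv2(apikey='changeme',
            proxies={'http': 'http://127.0.0.1:8080',
                     'https': 'http://127.0.0.1:8080'})

context_name = 'vaadin-app'                      # placeholder context name
context_id = zap.context.new_context(context_name)
zap.context.include_in_context(context_name, 'http://example.com.*')

# Configure form-based authentication (login URL and POST data are placeholders).
zap.authentication.set_authentication_method(
    context_id, 'formBasedAuthentication',
    'loginUrl=http://example.com/login&'
    'loginRequestData=username%3D{%username%}%26password%3D{%password%}')

# Create a user with credentials and enable it.
user_id = zap.users.new_user(context_id, 'testuser')
zap.users.set_authentication_credentials(
    context_id, user_id, 'username=testuser&password=secret')
zap.users.set_user_enabled(context_id, user_id, 'true')

# Spider as that user so the crawl runs with an authenticated session.
zap.spider.scan_as_user(context_id, user_id, 'http://example.com')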

Related

ZAP: Mix manual browsing, active scanning and fuzzing for testing a very large Web application?

We've got a very large Web application with about 1000 pages to be tested (www.project-open.com, a project + finance management application for service companies). Each page may take multiple parameters (object-id, filters, column name to use for sorting, ...). We are now going to implement additional security checks on these parameters, so we need to systematically test a) that offensive parameter values are rejected and b) that the parameter values actually used by the application are accepted correctly.
Example: We might want to say that the sort_column parameter in a page should only consist of alphanumeric characters. But the application in reality may include a column name with a space in it, leading to a false positive security alert (space character not being an alphanumeric character).
My idea for testing this would be to 1) manually navigate to each of these pages in proxy mode, 2) tell ZAP to start spidering all links on this page for one or two levels and 3) tell ZAP to start fuzzing on these URLs.
How can this be implemented? I've got a basic understanding of ZAP and did some security testing of ]project-open[. I've read about a ZAP extension for scanning a list of URLs, but in our case we want to execute some specific ZAP actions on each of these URLs...
I'll summarise some of your options:
I'd start by using the ZAP desktop so that you can control it and see exactly what effect it has. You can launch a browser, explore your app and then active scan the URLs you've found. The standard spider will explore traditional apps very effectively, but apps that make heavy use of JavaScript will probably require the AJAX spider.
You can also use 'attack mode', which attacks everything that is in scope (which you define) that you proxy through ZAP. That just means ZAP effectively follows what you do and attacks anything new. If you don't explore part of your app then ZAP won't attack it.
If you want to implement your own tests then I'd have a look at creating scripted active scan rules. We can help you with those, but I'd start by just exploring your app and running the default rules for now.
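If you later want to automate that explore-then-scan flow, here is a minimal sketch using the ZAP Python API (the zapv2 package); the target URL, API key and polling intervals are placeholders of mine, and this simply mirrors the desktop workflow described above:
# Sketch only: assumes ZAP is already running locally on port 8080.
import time
from zapv2 import ZAPv2

target = 'http://www.yoursite.com'    # placeholder target
zap = ZAPv2(apikey='changeme',
            proxies={'http': 'http://127.0.0.1:8080',
                     'https': 'http://127.0.0.1:8080'})

zap.urlopen(target)                   # make sure the site appears in the Sites tree

# Run the traditional spider first.
scan_id = zap.spider.scan(target)
while int(zap.spider.status(scan_id)) < 100:
    time.sleep(2)

# Then active scan everything the spider found.
scan_id = zap.ascan.scan(target)
while int(zap.ascan.status(scan_id)) < 100:
    time.sleep(5)

# Finally, list the alerts that were raised.
for alert in zap.core.alerts(baseurl=target):
    print(alert['risk'], alert['alert'], alert['url'])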

Accessing Metacritic API and/or Scraping

Does anybody know where the documentation for the Metacritic API is, or whether it still works? There used to be a Metacritic API at https://market.mashape.com/byroredux/metacritic-v2#get-user-details, which disappeared today.
Otherwise I'm trying to scrape the site myself, but I keep getting blocked with a 429 Slow Down. I got data about 3 times this hour and haven't been able to get any more in the last 20 minutes, which is making testing difficult and the application possibly useless. Please let me know if there's anything else I could be doing to scrape that I don't know about.
I was using that API as well for an app I wrote a while ago. Looks like the creator removed it from Mashape. I just sent him an email to ask whether it'll be back up. I did find this scraper online. It only has a few endpoints but following the examples given you could easily add more. Let me know if you make any progress!
Edit: Looks like CBS requested it to be taken down. The ToS prohibits scraping:
[…] you agree not to do the following, or assist others to do the following:
Engage in unauthorized spidering, “scraping,” data mining or harvesting of Content, or use any other unauthorized automated means to gather data from or about the Services;
Though I was hoping for a JavaScript way of doing this, the creator of the API also gave me some info.
He says I was getting blocked for not having a User-Agent in the header, and that I should add a 429 handling procedure, i.e. re-request with progressively longer pauses in between.
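In Python, that advice might look roughly like the sketch below (the User-Agent string, URL handling and backoff values are arbitrary choices of mine):
# Sketch only: send a User-Agent and back off when the server answers 429.
import time
import requests

def fetch(url, max_retries=5):
    headers = {'User-Agent': 'Mozilla/5.0 (compatible; metacritic-client)'}  # placeholder UA
    delay = 5                                   # seconds; arbitrary starting pause
    for _ in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response
        # Honour Retry-After if the server sends it, otherwise back off exponentially.
        time.sleep(int(response.headers.get('Retry-After', delay)))
        delay *= 2
    raise RuntimeError('still rate-limited after %d retries' % max_retries)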
A PHP plugin is available as well: http://datalinx.io/shop/metacritic-api/
I had to add a user agent, as JCDJulian said, and now it allows me to scrape. So, for Ruby (with Mechanize):
agent = Mechanize.new
agent.user_agent_alias = "Mac Firefox"
Then it stopped giving me the 403 Forbidden error.

What is preventing people from using someone else's CAPTCHA as their own?

Why (other than moral reasons) don't more people use the CAPTCHAs of other sites as their own while selling the solving of said CAPTCHAs?
To me, such a system seems like it would be simple to implement:
set up a script that does something on another website that requires a CAPTCHA to be completed, through the use of a proxy service
when a user on your site performs a task that requires the completion of a CAPTCHA, simply serve them the CAPTCHA that the other site asks you to solve
when the user solves the CAPTCHA, your script can perform the desired action on the other site that is the source of the CAPTCHA, and the user on your site is also verified through this process
Is this commonplace? If not, why not? What, if anything, could be done to prevent this?
Fetching the captcha. Assume you could easily fetch the exact visual of the captcha from the foreign host. To do this, you have to pass the referrer check (most browsers, navigated by humans, do send the HTTP Referer header). You would also have to save the session_id and the secret from the hidden input.
Checking the result. The foreign host must link the saved variables with the ones associated with the session of your first request, which requires you to implement some tricky cURL handling. You would also have to handle multiple parallel requests, all coming from your single IP.
Your server will probably use more resources hacking a captcha on a foreign host than it would generating a captcha of its own.
Preventive measures:
an HTTP Referer check
limiting requests from a single IP to e.g. 5 per minute (a sketch of these first two checks follows this list)
good session handling and tricky cookies
it's not impossible to reverse engineer JavaScript, but the more complicated your JavaScript is, ...
the attacker has to find a pattern that recognizes the result on the foreign host; the easiest signature may be the Location header field, leading either to /path/success.html or /path/tryagain.php
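As an illustration of the first two measures, here is a minimal sketch of a Referer check plus a per-IP rate limit; Flask, the limits and the allowed hostname are my own assumptions, not part of the answer:
# Sketch only: reject requests with a foreign Referer and throttle each IP.
import time
from collections import defaultdict, deque
from flask import Flask, request, abort

app = Flask(__name__)
ALLOWED_HOST = 'www.example.com'          # placeholder: your own hostname
MAX_REQUESTS = 5                          # e.g. 5 requests ...
WINDOW = 60                               # ... per 60 seconds
recent = defaultdict(deque)               # ip -> timestamps of recent captcha requests

@app.route('/captcha')
def captcha():
    referer = request.headers.get('Referer', '')
    if ALLOWED_HOST not in referer:
        abort(403)                        # Referer check: only serve our own pages

    now = time.time()
    window = recent[request.remote_addr]
    while window and now - window[0] > WINDOW:
        window.popleft()                  # drop timestamps outside the window
    if len(window) >= MAX_REQUESTS:
        abort(429)                        # too many captcha fetches from this IP
    window.append(now)

    return 'captcha image would be generated and returned here'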
Challenge:
I took a moment to prepare an example: http://woisteinebank.de/test/
In this example, I attach keys to the session_id() and save them in the database.
Through session_regenerate_id() I get a fresh session on every request.
In check.php, I compare the database values to the $_GET values.
Try to find a way to leech this captcha, and I'll try to defend it. Every time you successfully use my captcha on your site, I'll adjust the defence.
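Stripped of the PHP specifics, the same idea might look like the sketch below: each captcha answer is bound to a single-use token tied to the caller's session, and the check invalidates the token so it cannot be replayed (the names and in-memory storage are purely illustrative):
# Sketch only: bind each captcha answer to a per-session, single-use token.
import secrets

pending = {}   # (session_id, token) -> expected answer; stands in for the database

def issue_captcha(session_id, answer):
    # Store the expected answer under a fresh token tied to this session.
    token = secrets.token_hex(16)
    pending[(session_id, token)] = answer
    return token          # embed the token in the form as a hidden input

def check_captcha(session_id, token, submitted):
    # Verify and invalidate in one step, so the token cannot be reused.
    expected = pending.pop((session_id, token), None)
    return expected is not None and submitted == expected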

Prevention from entire website downloading?

There is one IP (from China) which is trying to download my entire website. It downloads all my pages and puts a significant load on the server (I have more than 500,000 pages). Looking at the access logs I can tell it's definitely not the Google bot or any other search engine bot.
Temporarily I've banned it (using iptables rules), but that's not a solution for me, because some of my real users share the same IP, so they are also banned and cannot access the website.
Is there any way to prevent this kind of "user activity"? Maybe a mechanism that shows a captcha if you make more than 5 requests a second, or something similar?
P.S. I'm using the Yii framework (PHP).
Any suggestions are greatly appreciated.
Thank you!
You have answered your own question!
Make a captcha appear if the requests exceed a certain number per second or per minute!
You should use CCaptchaAction to implement it, like this.
I guess the best way to monitor for suspicious user activity is really the user session, via CWebUser's getState()/setState(). Store the current request time in the user session, compare it to several previous values, and show a captcha if the user makes requests too often.
Create a new component, preload it via CWebApplication::$preload and check user activity in the component's init() function. This way you'll be able to turn the bot check on and off easily.
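The rate-check logic itself is framework-agnostic; here is a rough sketch of it in Python rather than Yii/PHP, with the session represented by a plain dict and the thresholds picked arbitrarily:
# Sketch only: keep recent request times in the session and decide when to show a captcha.
import time

MAX_REQUESTS = 5      # arbitrary: more than 5 requests ...
WINDOW = 1.0          # ... within one second triggers the captcha

def should_show_captcha(session):
    # 'session' stands in for per-user state (e.g. Yii's getState()/setState()).
    now = time.time()
    times = [t for t in session.get('recent_requests', []) if now - t <= WINDOW]
    times.append(now)
    session['recent_requests'] = times
    return len(times) > MAX_REQUESTS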

Figure out if a website has restricted/password protected area

I have a big list of websites and I need to know if they have areas that are password protected.
I am thinking about doing this: downloading all of them with httrack and then writing a script that looks for keywords like "Log In" and "401 Unauthorized". But the problem is that these websites are all different (some static and some dynamic: HTML, CGI, PHP, Java applets...) and most of them won't use the same keywords...
Do you have any better ideas?
Thanks a lot!
Looking for password fields will get you so far, but won't help with sites that use HTTP authentication. Looking for 401s will help with HTTP authentication, but won't get you sites that don't use it, or ones that don't return 401. Looking for links like "log in" or "username" fields will get you some more.
I don't think that you'll be able to do this entirely automatically and be sure that you're actually detecting all the password-protected areas.
You'll probably want to take a library that is good at web automation and write a little program yourself that reads the list of target sites from a file, checks each one, and writes one file of "these are definitely passworded" and one of "these are not". Then you might want to go and manually check the ones that are not, and modify your program to accommodate them. Using httrack is great for grabbing data, but it's not going to help with detection; if you write your own "check for password protected area" program in a general-purpose high-level language, you can do more checks, and you can avoid generating more requests per site than are necessary to determine that a password-protected area exists.
You may need to ignore robots.txt.
I recommend using the Python port of Perl's mechanize, or whatever nice web automation library your preferred language has. Almost all modern languages have a good library for opening and searching through web pages and for looking at HTTP headers.
If you are not able to write this yourself, you're going to have a rather difficult time using httrack or wget or similar and then searching through the responses.
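A rough sketch of such a program in Python (using requests; the heuristics, timeouts and file names are my own choices):
# Sketch only: classify sites as "probably passworded" / "probably not" with a few heuristics.
import re
import requests

LOGIN_HINTS = re.compile(r'type=["\']password["\']|log\s*in|sign\s*in', re.IGNORECASE)

def looks_passworded(url):
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return None                                   # unreachable; decide by hand later
    if response.status_code in (401, 403):
        return True                                   # HTTP authentication (or forbidden area)
    return bool(LOGIN_HINTS.search(response.text))    # password field or login link on the page

with open('sites.txt') as sites, \
     open('passworded.txt', 'w') as yes, \
     open('not_passworded.txt', 'w') as no:
    for line in sites:
        url = line.strip()
        if not url:
            continue
        result = looks_passworded(url)
        if result is None:
            continue                                  # skip unreachable sites for manual review
        (yes if result else no).write(url + '\n')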
Look for forms with password fields.
You may need to scrape the site to find the login page. Look for links with phrases like "log in", "login", "sign in", "signin", or scrape the whole site (needless to say, be careful here).
I would use httrack with several limits and then search the downloaded files for password fields.
Typically, a login form can be found within two links of the home page. Almost all ecommerce sites, web apps, etc. have login forms that are reached by clicking a single link on the home page, but going another level or even two deeper would almost guarantee that you don't miss any.
I would also limit the speed at which httrack downloads, tell it not to download any non-HTML files, and prevent it from downloading external links. I'd also limit the number of simultaneous connections to the site to 2 or even 1. This should work for just about all of the sites you are looking at, and it should keep you off the hosts.deny list.
You could just use wget and do something like:
wget -A html,php,jsp,htm -S -r http://www.yoursite.com 2> output_yoursite.txt
This will cause wget to download the site recursively, but only keep files with the extensions listed after the -A option, which in this case avoids heavy files.
The server response headers printed by -S go to wget's log output (stderr), here redirected to output_yoursite.txt, which you can then search for 401 responses, meaning that part of the site requires authentication; then inspect the corresponding files according to Konrad's recommendation as well.
Looking for 401 codes won't reliably catch protected areas, as sites might not produce links to anything you don't have privileges for. That is, until you are logged in, a site won't show you anything you need to log in for. On the other hand, some sites (ones with all static content, for example) manage to pop up a login dialog box for some pages, so looking for password input tags would also miss things.
My advice: find a spider program that you can get the source for, add in whatever tests (plural) you plan on using, and make it stop at the first positive result. Look for a spider that can be throttled way back, can ignore non-HTML files (perhaps by making HEAD requests and looking at the MIME type), and can work with more than one site independently and simultaneously.
You might try using cURL and just attempting to connect to each site in turn (possibly put the sites in a text file, read each line, try to connect, repeat).
You can set up one of the callbacks to check the HTTP response code and do whatever you need from there.
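In Python that could be done with pycurl, the libcurl binding (my choice; the answer doesn't name a language), recording the response code for each site in the list:
# Sketch only: read sites from a text file and record the HTTP response code for each.
import io
import pycurl

with open('sites.txt') as sites:
    for line in sites:
        url = line.strip()
        if not url:
            continue
        body = io.BytesIO()                    # libcurl needs somewhere to write the body
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.FOLLOWLOCATION, True)
        curl.setopt(pycurl.WRITEDATA, body)
        try:
            curl.perform()
            code = curl.getinfo(pycurl.RESPONSE_CODE)
            print(url, code)                   # 401 suggests HTTP authentication is in use
        except pycurl.error as exc:
            print(url, 'error:', exc)
        finally:
            curl.close()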