I'm starting to learn about ModSecurity and rule creation so say I know a page in an web app is vulnerable to cross site scripting. For argument's sake lets say page /blah.html is prone to XSS.
Would the rule in question look something like this?:
SecRule ARGS|REQUEST_URI "blah" REQUEST_HEADERS “#rx <script>” id:101,msg: ‘XSS Attack’,severity:ERROR,deny,status:404
Is it possible to create a config file for that particular page (or even wise to do so?) or better said is it possible to create rules for particular URL's?
Not quite right as a few things wrong with this rule as it's written now. But think you get the general concept.
To explain what's wrong with the rule as it currently stands takes a fair bit of explanation:
First up, ModSecurity syntax for defining rules is made up of several parts:
SecRule
field or fields to check
values to check those fields for
actions to take
You have two sets of parts 2 & 3 which is not valid. If you want to check 2 things you need to chain two rules together, which I'll show an example of later.
Next up you are checking the Request Headers for the script tag. This is not where most XSS attacks exist and you should be checking the arguments instead - though see below for further discussion on XSS.
Also checking for <script> is not very thorough. It would easily be defeated by <script type="text/javascript"> for example. Or <SCRIPT> (should add a t:lowercase to ignore case). Or by escaping characters that might be processed by your parts of your application.
Moving on, there is no need to specify the #rx operator as that's implied if no other operator is given. While no harm in being explicitly it's a bit odd you didn't give it for blah but did for the <script> bit.
It's also advisable to specify a phase rather than use default (usually phase 2). In this case you'd want phase 2 which is when all Request Headers and Request Body pieces are available for processing, so default is probably correct but best to be explicit.
And finally 404 is "page not found" response code. A 500 or 503 might be a better response here.
So your rule would be better rewritten as:
SecRule REQUEST_URI "/blah.html" id:101,phase:2,msg: ‘XSS Attack’,severity:ERROR,deny,status:500,chain
SecRule ARGS "<script" "t:lowercase"
Though this still doesn't address all the ways that the basic check you are doing for a script tag can be worked around, as mentioned above. The OWASP Core Rule Set is a free ModSecurity set of rules that has been built up over time and has some XSS rules in it that you can check out. Though be warned some of their regexprs get quite complex to follow!
ModSecurity also works better as a more generic check, so it's unusual to want to protect just one page like this (though not completely unheard of). If you know one page is vulnerable to a particular attack then it's often better to code that page, or how it's processed, to fix the vulnerability, rather than using ModSecurity to handle it. Fix the problem at source rather than patching round it, is always a good mantra to follow where possible. You would do that by sanitising and escaping input HTML code from inputs for example.
Saying that it is often a good short term solution to use a ModSecurity rule to get a quick fix in while you work on the more correct longer term fix - this is called "virtual patching". Though sometimes they have a tendency to become the long term solutions then as no one gets time to fix the core problem.
If however you wanted a more generic check for "script" in any arguments for any page, then that's what ModSecurity is more often used for. This helps add extra protection on what already should be there in a properly coded app, as a back up and extra line of defence. For example in case someone forgets to protect one page out of many by sanitising user inputs.
So it might be best dropping the first part of this rule and just having the following to check all pages:
SecRule ARGS "<script" id:101,phase:2,msg: ‘XSS Attack’,severity:ERROR,deny,status:500,"t:lowercase"
Finally XSS is quite a complex subject. Here I've assumed you want to check parameters sent when requesting a page. So if it uses user input to construct the page and displays the input, then you want to protect that - known as "reflective XSS." There are other XSS vulnerabilities though. For example:
If bad data is stored in a database and used to construct a page. Known as "stored XSS". To address this you might want to check the results returned from the page in the RESPONSE_BODY parameter in phase 4, rather than the inputs sent to the page in the ARGS parameter in phase 2. Though processing response pages is typically slower and more resource intensive compared to requests which are usually much smaller.
You might be able to execute a XSS without going through your server e.g. If loading external content like a third party commenting system. Or page is served over http and manipulated between server and cling. This is known as "DOM-based XSS" and as ModSecurity is on your server then it cannot protect against these types of XSS attacks.
Gone into quite a lot of detail there but hope that helps explain a little more. I found the ModSecurity Handbook the best resource for teaching yourself ModSecurity despite its age.
Related
There's routing for my server:
/api/debts/:id
to get info about particular debt and
/api/debts/search?searchData=something
to search debts by something.
And first route always handles search requests. I want to change it and know how to do it in several ways (just remove /api/debts in the second route etc) but I want to know a good practice for this.
UPD: There's also the route
/api/debts
to get all debts
For search or filter purposes /api/debts?searchData=something is preferred.
In /api/debts/search?searchData=something, this is not considered good, because it has search which it verb and the searchData is outside the resources.
For details please refer
here
I want to change it and know how to do it in several ways (just remove /api/debts in the second route etc) but I want to know a good practice for this.
REST doesn't care what spelling conventions you use for your resource identifiers.
So if you decide, in the support of other concerns, that you want to use a different path, then go ahead and do that.
/debts/search?searchData=something
/search/debts?searchData=something
Those are both fine. Sticking additional path-segments like api on the front is also fine. General purpose clients don't care, because they are just copying the identifier into the request line
this is not considered good, because it has search which it verb
Consider the URI below
https://www.merriam-webster.com/dictionary/search
This works, and exactly the way you would expect, even though "search" happens to be a verb (sometimes) in the English language.
It's even ok to use registered HTTP methods in your identifiers:
GET /dictionary/delete HTTP/1.1
Host: www.merriam-webster.com
You can try that one in your web browser by clicking on
https://www.merriam-webster.com/dictionary/delete
What doesn't work particularly well is assuming that the identifier defines the request semantics; the request method is the appropriate mechanism for communicating the request semantics, not the identifier. In other words
GET /delete
is a request that promises safe semantics; if you handle that request by making a bunch of destructive edits, then that's a fault of the implementation
The confusion here is not Raj's fault, of course -- there is a LOT of literature on the web that (a) describes arbitrary spelling constraints for resource identifiers and (b) cites "REST" as an authority for those constraints.
Part (b) has no factual basis - it's just folklore that has gained traction.
REST doesn't care about spelling conventions for your identifiers in the same way that compilers don't care about spelling conventions for your variable names.
REST cares a lot about when two identifiers are the same, because caching, and particularly cache invalidation are important things for the general purpose components to be able to do correctly.
But the machines don't distinguish between nouns, verbs, hmac codes, rot13 ciphers, etc.
One of my clients has a PHP script that kept crashing inexplicably. After hours of research, I determined if you send any PHP script a variable (either through GET or POST) that contains " having t", or escaped for the URL "+having+t", it crashes the script and returns a "403 forbidden error". To test it, I made a sample script with the entire contents:
<?php echo "works";
I put it live (temporarily) here: http://primecarerefer.com/test/test.php
Now if you try sending it some data like: http://primecarerefer.com/test/test.php?x=+having+x
It fails. The last letter can be any letter and it will still crash, but changing any other letter makes the script load fine. What would cause this and how can it be fixed? The link is live for now if anyone wants to try out different combinations.
PS - I found that if I get the 403 error a bunch of times in a row, the sever blocks me for 15 minutes.
I had this type of issue on a webserver that ran apache mod_security, but it was very poorly configured, actually mod_security has very bad default regex rules, which are very easy to trip with valid POST or GET data.
To be clear, this has nothing to do with PHP or HTML, it's about POST and GET data passing through mod_security, almost certainly, and mod_security rejecting the request because it believes it is an sql injection attempt.
You can edit the rules yourself depending on the server access, but I don't believe you can do anything, well, if it's mod_security, I know you can't do anything via PHP to get around this.
/etc/httpd/conf.d/mod_security.conf (old path, it's changed, but it gives the idea)
Examples of the default rules:
SecFilter "delete[[:space:]]+from"
SecFilter "insert[[:space:]]+into"
SecFilter "select.+from"
These are samples of the rules
https://www.howtoforge.com/apache_mod_security
here they trip the filter:
http://primecarerefer.com/test/test.php?x=%20%22%20%20select%20from%22
Note that the article is very old and the rules actually are quite differently structured now, but the bad regex remains, ie: select[any number of characters, no matter how far removed, or close]from will trip it, any sql that matches these loose rules will trip it.
But since editing those default files requires access to them, and also assumes they won't be altered in an upgrade of apache mod_security at some point, it's not a good way to fix the problem I found, moving to a better, more professionally setup, hoster, fixed those issues for us. But it does help if you talk to the hosting support to know what the cause of the issue is.
In fact 'having' is not irrelevant at all, it's part of sql injection filters in the regex rules in the security filters run on POST/GET. We used to hit this all the time when admins would edit CMS pages, which would trigger invariably some sql filter, since any string of human words would invariably contain something like 'select.*from' or 'insert.*into' etc.
This mod_security issue used to drive me bonkers trying to debug why backend edit form updates would just hang, until I finally realized it was badly done generic regex patterns in the mod_security file itself.
In a sense, this isn't an answer, because the only fix is going into the server and either editing the rules file, which is pretty easy, or disabling mod_security, or moving to a web hoster that doesn't use those bad generic defaults.
I am struggling (in some sense) to determine which HTTP method is more appropriate for rebooting a remote resource: GET or PUT?
On one hand, it seems more semantic to call http://tools.serviceprovider.net/canopies/d34db33fc4f3?reboot=true because one might want to GET a representation of a freshly rebooted canopy.
On the other hand, a reboot is not 'safe' (nor is it necessarily idempotent, but then a canopy or modem is not just a row in a database) so it might seem more semantic to PUT the canopy into a state of rebooting, then have the server return a 202 to indicate that the reboot was initiated and is processing.
I have been reading up on HTTP/1.1, REST, HATEOAS, and other related concepts over the last week, so I am still putting the pieces together. Could a more seasoned developer please weigh in and confirm or dispel my hunch?
A GET doesn't seem appropriate because a GET is expected, like you said, to be "safe". i.e. no action other than retrieval.
A PUT doesn't seem appropriate because a PUT is expected to be idempotent. i.e. multiple identical operations cause same side-effects as as a single operation. Moreover, a PUT is usually used to replace the content at the request URI with the request body.
A POST appears most appropriate here. Because:
A POST need not be safe
A POST need not be idempotent
It also appears meaningful in that you are POSTing a request for a reboot (much like submitting a form, which also happens via POST), which can then be processed, possibly leading to a new URI containing reboot logs/results returned along with a 303 See Other status code.
Interestingly, Tim Bray wrote a blog post on this exact topic (which method to use to tell a resource representing a virtual machine to reboot itself), in which he also argued for POST. At the bottom of that post there are links to follow-ups on that topic, including one from none other than Roy Fielding himself, who concurs.
Rest is definitely not HTTP. But HTTP definitely does not have only four (or eight) methods. Any method is technically valid (even if as an extension method) and any method is RESTful when it is self describing — such as ‘LOCK’, ‘REBOOT’, ‘DELETE’, etc. Something like ‘MUSHROOM’, while valid as an HTTP extension, has no clear meaning or easily anticipated behavior, thus it would not be RESTful.
Fielding has stated that “The REST style doesn’t suggest that limiting the set of methods is a desirable goal. [..] In particular, REST encourages the creation of new methods for obscure operations” and that “it is more efficient in a true REST-based architecture for there to be a hundred different methods with distinct (non-duplicating), universal semantics.”
Sources:
http://xent.com/pipermail/fork/2001-August/003191.html
http://tech.groups.yahoo.com/group/rest-discuss/message/4732
With this all in mind I am going to be 'self descriptive' and use the REBOOT method.
Yes, you could effectively create a new command, REBOOT, using POST. But there is a perfectly idempotent way to do reboots using PUT.
Have a last_reboot field that contains the time at which the server was last rebooted. Make a PUT to that field with the current time cause a reboot if the incoming time is newer than the current time. If an intermediate server resends the PUT, no problem -- it has the same value as the first command, so it's a no-op.
You might want to get the current time from the server you're rebooting, unless you know that everyone is reasonably time-synced.
Or you could just use a times_rebooted count, eliminating the need for a clock. A PUT times_rebooted: 4 request will cause a reboot if times_rebooted is currently 3, but not if it's 4 or 5. If the current value is 2 and you PUT a 4, that's an error.
The only advantage to using time, if you have a clock, is that sometimes you care about when it happened. You could of course have BOTH a times_rebooted and a last_reboot_time, letting times_rebooted be the trigger.
I get confused when and why should you use specific verbs in REST?
I know basic things like:
Get -> for retrieval
Post -> adding new entity
PUT -> updating
Delete -> for deleting
These attributes are to be used as per the operation I wrote above but I don't understand why?
What will happen if inside Get method in REST I add a new entity or inside POST I update an entity? or may be inside DELETE I add an entity. I know this may be a noob question but I need to understand it. It sounds very confusing to me.
#archil has an excellent explanation of the pitfalls of misusing the verbs, but I would point out that the rules are not quite as rigid as what you've described (at least as far as the protocol is concerned).
GET MUST be safe. That means that a GET request must not change the server state in any substantial way. (The server could do some extra work like logging the request, but will not update any data.)
PUT and DELETE MUST be idempotent. That means that multiple calls to the same URI will have the same effect as one call. So for example, if you want to change a person's name from "Jon" to "Jack" and you do it with a PUT request, that's OK because you could do it one time or 100 times and the person's name would still have been updated to "Jack".
POST makes no guarantees about safety or idempotency. That means you can technically do whatever you want with a POST request. However, you will lose any advantage that clients can take of those assumptions. For example, you could use POST to do a search, which is semantically more of a GET request. There won't be any problems, but browsers (or proxies or other agents) would never cache the results of that search because it can't assume that nothing changed as a result of the request. Further, web crawlers would never perform a POST request because it could not assume the operation was safe.
The entire HTML version of the world wide web gets along pretty well without PUT or DELETE and it's perfectly fine to do deletes or updates with POST, but if you can support PUT and DELETE for updates and deletes (and other idempotent operations) it's just a little better because agents can assume that the operation is idempotent.
See the official W3C documentation for the real nitty gritty on safety and idempotency.
Protocol is protocol. It is meant to define every rule related to it. Http is protocol too. All of above rules (including http verb rules) are defined by http protocol, and the usage is defined by http protocol. If you do not follow these rules, only you will understand what happens inside your service. It will not follow rules of the protocol and will be confusing for other users. There was an example, one time, about famous photo site (does not matter which) that did delete pictures with GET request. Once the user of that site installed the google desktop search program, that archieves the pages locally. As that program knew that GET operations are only used to get data, and should not affect anything, it made GET requests to every available url (including those GET-delete urls). As the user was logged in and the cookie was in browser, there were no authorization problems. And the result - all of the user photos were deleted on server, because of incorrect usage of http protocol and GET verb. That's why you should always follow the rules of protocol you are using. Although technically possible, it is not right to override defined rules.
Using GET to delete a resource would be like having a function named and documented to add something to an array that deletes something from the array under the hood. REST has only a few well defined methods (the HTTP verbs). Users of your service will expect that your service stick to these definition otherwise it's not a RESTful web service.
If you do so, you cannot claim that your interface is RESTful. The REST principle mandates that the specified verbs perform the actions that you have mentioned. If they don't, then it can't be called a RESTful interface.
My Web host has refused to help me with this, so I'm coming to the wise folks here for some help "black-box debugging". Here's an edited version of what I sent to them:
I have two (among other) domains at dreamhost:
1) thefigtrees.net
2) shouldivoteformccain.com
I noticed today that when I host a CGI script on #1, that by the time the
CGI script runs, the HTTP GET query string passed to it as the QUERY_STRING
environment variable has already been URL decoded. This is a problem because
it then means that a standard CGI library (such as perl's CGI.pm) will try to
split on ampersands and then decode the string itself. There are two
potential problems with this:
1) the string is doubly-decoded, so if a value is submitted to the script
such as "%2525", it will end up being treated as just "%" (decoded twice)
rather than "%25" (decoded once)
2) (more common) if there is an ampersand in a value submitted, then it
will get (properly) submitted as %26, but the QUERY_STRING env. variable will
have it already decoded into an "&" and then the CGI library will improperly
split the query string at that ampersand. This is a big problem!
The script at http://thefigtrees.net/test.cgi demonstrates this. It echoes back the
environment variables it is called with. Navigating in a browser to:
http://thefigtrees.net/lee/test.cgi?x=y%26z
You can see that REQUEST_URI properly contains x=y%26z (unencoded) but that
QUERY_STRING already has it decoded to x=y&z.
If I repeat the test at domain #2 (
http://www.shouldivoteformccain.com/test.cgi?x=y%26z ) I see that the
QUERY_STRING remains undecoded, so that CGI.pm then splits and decodes
correctly.
I tried disabling my .htaccess files on both to make sure that was not the
problem, and saw no difference.
Could anyone speculate on potential causes of this, since my Web host seems unwilling to help me?
thanks,
Lee
I have the same behavior in Apache.
I believe mod_rewrite will automatically decode the URL if it is installed, however, I have seen the auto-decode behavior even without it. I haven't tracked down the other culprit.
A common workaround is to double encode the input parameter (taking advantage of URL decoding being safe when called on an unencoded URL).
Curious. Nothing I can see from here would give us a clue why this would happen... I can only confirm that it is an environment bug and suspect maybe configuration differences like maybe rewrite rules.
Per CGI 1.1, this decoding should only happen to SCRIPT-NAME and PATH-INFO, not QUERY-STRING. It's pointless and annoying that it happens at all, but that's the spec. Using REQUEST-URI instead of those variables where available (ie. Apache) is a common workaround for places where you want to put out-of-bounds and Unicode characters in path parts, so it might be reasonable to do the same for query strings until some sort of resolution is available from the host.
VPSs are cheap these days...