Why would Apache be URL decoding my query string?

My Web host has refused to help me with this, so I'm coming to the wise folks here for some help with "black-box debugging". Here's an edited version of what I sent to them:
I have two (among other) domains at dreamhost:
1) thefigtrees.net
2) shouldivoteformccain.com
I noticed today that when I host a CGI script on #1, by the time the CGI script runs, the HTTP GET query string passed to it as the QUERY_STRING environment variable has already been URL decoded. This is a problem because a standard CGI library (such as Perl's CGI.pm) will then try to split on ampersands and decode the string itself. There are two potential problems with this:
1) the string is doubly-decoded, so if a value is submitted to the script
such as "%2525", it will end up being treated as just "%" (decoded twice)
rather than "%25" (decoded once)
2) (more common) if there is an ampersand in a value submitted, then it
will get (properly) submitted as %26, but the QUERY_STRING env. variable will
have it already decoded into an "&" and then the CGI library will improperly
split the query string at that ampersand. This is a big problem!
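For example, the split-then-decode step that a CGI library performs looks roughly like this (a minimal Perl sketch using URI::Escape, not CGI.pm's actual code):
use URI::Escape;

my $qs = 'x=y&z';    # QUERY_STRING as Apache hands it over, already decoded once
my %params;
for my $pair (split /&/, $qs) {          # splits at the ampersand inside the value
    my ($key, $val) = split /=/, $pair, 2;
    $params{ uri_unescape($key) } = uri_unescape($val // '');
}
# Result: x => 'y' and z => '', instead of the intended x => 'y&z'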
The script at http://thefigtrees.net/test.cgi demonstrates this. It echoes back the
environment variables it is called with. Navigating in a browser to:
http://thefigtrees.net/lee/test.cgi?x=y%26z
you can see that REQUEST_URI properly contains x=y%26z (still encoded) but that QUERY_STRING already has it decoded to x=y&z.
If I repeat the test at domain #2 (
http://www.shouldivoteformccain.com/test.cgi?x=y%26z ) I see that the
QUERY_STRING remains undecoded, so that CGI.pm then splits and decodes
correctly.
I tried disabling my .htaccess files on both to make sure that was not the
problem, and saw no difference.
Could anyone speculate on potential causes of this, since my Web host seems unwilling to help me?
thanks,
Lee

I have seen the same behavior in Apache.
I believe mod_rewrite will automatically decode the URL if it is installed; however, I have seen the auto-decode behavior even without it, and I haven't tracked down the other culprit.
A common workaround is to double-encode the input parameter (taking advantage of the fact that URL decoding is harmless when applied to a string that is not encoded).
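For instance (a rough Perl sketch with URI::Escape; the same idea applies in any language):
use URI::Escape;

my $value = 'y&z';
my $once  = uri_escape($value);   # 'y%26z'   - normally-encoded form
my $twice = uri_escape($once);    # 'y%2526z' - double-encoded form
# Send "?x=$twice": the server's extra decode turns it back into 'y%26z',
# and the CGI library's own decode then yields the intended 'y&z'.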

Curious. Nothing I can see from here gives a clue as to why this would happen... I can only confirm that it is an environment bug and suspect configuration differences, such as rewrite rules.
Per CGI 1.1, this decoding should only happen to SCRIPT_NAME and PATH_INFO, not QUERY_STRING. It's pointless and annoying that it happens at all, but that's the spec. Using REQUEST_URI instead of those variables where available (i.e. Apache) is a common workaround for places where you want to put out-of-bounds and Unicode characters in path parts, so it might be reasonable to do the same for query strings until some sort of resolution is available from the host.
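A rough sketch of that workaround for the query string in Perl (names are illustrative; the point is to parse the still-encoded string out of REQUEST_URI yourself rather than trusting QUERY_STRING):
use URI::Escape;

# REQUEST_URI still holds the raw, undecoded request path (Apache-specific)
my ($raw_qs) = ($ENV{REQUEST_URI} // '') =~ /\?(.*)$/s;
my %params;
for my $pair (split /&/, $raw_qs // '') {
    my ($key, $val) = split /=/, $pair, 2;
    $params{ uri_unescape($key) } = uri_unescape($val // '');
}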
VPSs are cheap these days...


Modsecurity create config file with rules for specific URL

I'm starting to learn about ModSecurity and rule creation, so say I know a page in a web app is vulnerable to cross-site scripting. For argument's sake, let's say the page /blah.html is prone to XSS.
Would the rule in question look something like this?:
SecRule ARGS|REQUEST_URI "blah" REQUEST_HEADERS "@rx <script>" id:101,msg:'XSS Attack',severity:ERROR,deny,status:404
Is it possible to create a config file for that particular page (or even wise to do so?) or better said is it possible to create rules for particular URL's?
Not quite right; there are a few things wrong with this rule as it's written now, but I think you get the general concept.
To explain what's wrong with the rule as it currently stands takes a fair bit of explanation:
First up, ModSecurity syntax for defining rules is made up of several parts:
1) SecRule
2) field or fields to check
3) values to check those fields for
4) actions to take
You have two sets of parts 2 & 3 which is not valid. If you want to check 2 things you need to chain two rules together, which I'll show an example of later.
Next up you are checking the Request Headers for the script tag. This is not where most XSS attacks exist and you should be checking the arguments instead - though see below for further discussion on XSS.
Also, checking for <script> is not very thorough. It would easily be defeated by <script type="text/javascript"> for example. Or <SCRIPT> (you should add a t:lowercase transformation to ignore case). Or by escaping characters in ways that parts of your application might still process.
Moving on, there is no need to specify the @rx operator, as that's implied if no other operator is given. While there's no harm in being explicit, it's a bit odd that you didn't give it for blah but did for the <script> bit.
It's also advisable to specify a phase rather than use the default (usually phase 2). In this case you'd want phase 2, which is when all request headers and request body pieces are available for processing, so the default is probably correct, but it's best to be explicit.
And finally, 404 is the "page not found" response code. A 500 or 503 might be a better response here.
So your rule would be better rewritten as:
SecRule REQUEST_URI "/blah.html" "id:101,phase:2,msg:'XSS Attack',severity:ERROR,deny,status:500,chain"
SecRule ARGS "<script" "t:lowercase"
Though this still doesn't address all the ways that the basic check you are doing for a script tag can be worked around, as mentioned above. The OWASP Core Rule Set is a free set of ModSecurity rules that has been built up over time and has some XSS rules in it that you can check out. Though be warned, some of their regexes get quite complex to follow!
ModSecurity also works better as a more generic check, so it's unusual to want to protect just one page like this (though not completely unheard of). If you know one page is vulnerable to a particular attack, then it's often better to fix that page, or how it's processed, rather than using ModSecurity to handle it. Fixing the problem at source rather than patching around it is always a good mantra to follow where possible. You would do that by sanitising and escaping HTML in user inputs, for example.
That said, it is often a good short-term solution to use a ModSecurity rule to get a quick fix in while you work on the more correct longer-term fix; this is called "virtual patching". Though sometimes these have a tendency to become the long-term solution, as no one gets time to fix the core problem.
If however you wanted a more generic check for "script" in any arguments for any page, then that's what ModSecurity is more often used for. This helps add extra protection on top of what should already be there in a properly coded app, as a backup and extra line of defence; for example, in case someone forgets to protect one page out of many by sanitising user inputs.
So it might be best to drop the first part of this rule and just have the following to check all pages:
SecRule ARGS "<script" "id:101,phase:2,msg:'XSS Attack',severity:ERROR,deny,status:500,t:lowercase"
Finally, XSS is quite a complex subject. Here I've assumed you want to check parameters sent when requesting a page, so if it uses user input to construct the page and displays that input, then you want to protect that; this is known as "reflected XSS". There are other XSS vulnerabilities though. For example:
If bad data is stored in a database and later used to construct a page, that's known as "stored XSS". To address this you might want to check the content returned from the page in the RESPONSE_BODY variable in phase 4, rather than the inputs sent to the page in the ARGS variable in phase 2 (a sketch of such a rule follows this list). Though processing response pages is typically slower and more resource intensive than processing requests, which are usually much smaller.
You might be able to execute an XSS without going through your server, e.g. if loading external content like a third-party commenting system, or if the page is served over http and manipulated between server and client. This is known as "DOM-based XSS", and as ModSecurity sits on your server it cannot protect against these types of XSS attacks.
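As a rough sketch of the phase 4 response check mentioned for stored XSS above (the id and message are arbitrary, and SecResponseBodyAccess On is assumed so that the response body is actually buffered for inspection):
# Assumes SecResponseBodyAccess On; id and message are arbitrary examples
SecRule RESPONSE_BODY "<script" "id:102,phase:4,msg:'Possible stored XSS in response',severity:ERROR,deny,status:500,t:lowercase"
It has the same weaknesses as the simple <script check above, so treat it as a starting point only.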
I've gone into quite a lot of detail there, but I hope that helps explain a little more. I found the ModSecurity Handbook the best resource for teaching yourself ModSecurity, despite its age.

"+having+" in $GET/$POST causes server to return 403 Forbidden

One of my clients has a PHP script that kept crashing inexplicably. After hours of research, I determined if you send any PHP script a variable (either through GET or POST) that contains " having t", or escaped for the URL "+having+t", it crashes the script and returns a "403 forbidden error". To test it, I made a sample script with the entire contents:
<?php echo "works";
I put it live (temporarily) here: http://primecarerefer.com/test/test.php
Now if you try sending it some data like: http://primecarerefer.com/test/test.php?x=+having+x
It fails. The last letter can be any letter and it will still crash, but changing any other letter makes the script load fine. What would cause this and how can it be fixed? The link is live for now if anyone wants to try out different combinations.
PS - I found that if I get the 403 error a bunch of times in a row, the server blocks me for 15 minutes.
I had this type of issue on a webserver that ran Apache mod_security, but it was very poorly configured; mod_security actually has very bad default regex rules, which are very easy to trip with valid POST or GET data.
To be clear, this almost certainly has nothing to do with PHP or HTML; it's about POST and GET data passing through mod_security, and mod_security rejecting the request because it believes it is an SQL injection attempt.
You can edit the rules yourself, depending on your server access, but if it's mod_security, I know you can't do anything via PHP to get around this.
/etc/httpd/conf.d/mod_security.conf (old path, it's changed, but it gives the idea)
Examples of the default rules:
SecFilter "delete[[:space:]]+from"
SecFilter "insert[[:space:]]+into"
SecFilter "select.+from"
These samples are taken from this article: https://www.howtoforge.com/apache_mod_security
Here is an example that trips the filter:
http://primecarerefer.com/test/test.php?x=%20%22%20%20select%20from%22
Note that the article is very old and the rules are actually structured quite differently now, but the bad regexes remain, i.e. select[any number of characters, no matter how far removed or close]from will trip it; any SQL that matches these loose rules will trip it.
But since editing those default files requires access to them, and also assumes they won't be altered in an upgrade of Apache mod_security at some point, it's not a good way to fix the problem, I found; moving to a better, more professionally set up host fixed those issues for us. But it does help, when you talk to hosting support, to know what the cause of the issue is.
In fact 'having' is not irrelevant at all; it's part of the SQL injection filters in the regex rules run on POST/GET data. We used to hit this all the time when admins would edit CMS pages, which would invariably trigger some SQL filter, since any long string of human words would sooner or later contain something matching 'select.*from' or 'insert.*into' etc.
This mod_security issue used to drive me bonkers trying to debug why backend edit form updates would just hang, until I finally realized it was badly done generic regex patterns in the mod_security file itself.
In a sense, this isn't an answer, because the only fix is going into the server and either editing the rules file, which is pretty easy, or disabling mod_security, or moving to a web hoster that doesn't use those bad generic defaults.
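For completeness, if you do have access to the Apache config (not just PHP), the less drastic option with the newer ModSecurity 2.x is to remove only the offending rule by its id rather than disabling the module; the id below is purely illustrative, so check the error/audit log to see which rule actually fired:
<IfModule mod_security2.c>
    # Remove a single rule by id instead of switching the whole engine off
    SecRuleRemoveById 950001
</IfModule>
This goes in the server or vhost configuration, which is exactly the access a typical shared-hosting customer doesn't have, hence the answer above.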

Apache Tomcat with Mod_JK URL decoding issue

I am using Apache Tomcat with mod_jk and running Shindig on it. I am trying to pass the below URL to it:
http://download.finance.yahoo.com/d/quotes.csv?s=^GSPTSE+^SPCDNX+MIC.TO+ABX.TO+AEM.TO&f=snl1d1t1c1&e=.csv&random=5683
and it fails, giving error 400 (Invalid url parameter).
If I pass the URL without any parameters it works perfectly fine.
You can have a look at the console log for the below URL:
http://portaltab.com/shindig/gadgets/ifr?url=http://igstock.googlecode.com/svn/trunk/modules/canada_stock_market_on_ig.xml
I tried so many things, but no luck. I am not sure whether it is a Tomcat issue or something else.
If any expert has experienced the same issue, could you please share some info?
Thank you.
Regards,
Raj
Most likely your issue is because carets (^) are not valid URL characters. They are considered "unsafe" per RFC1738. Quoting from that RFC:
...Other characters are unsafe because gateways and other transport
agents are known to sometimes modify such characters. These
characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`".
You should encode the carets in your URL using %5E. Some programmers and libraries do not do this by default as it is not a commonly used symbol and some systems handle it without error even if not fully compliant.
It's not clear from your example if you are encoding your URL, and if so, where you are doing so. If not encoding at all, you might also need to encode the plus symbols. A fully encoded s value per your example would be:
%5EGSPTSE%2B%5ESPCDNX%2BMIC.TO%2BABX.TO%2BAEM.TO
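For example, with Perl's URI::Escape (shown only as an illustration; whatever URL-encoding routine your language or library provides will do the same job):
use URI::Escape;

my $s = '^GSPTSE+^SPCDNX+MIC.TO+ABX.TO+AEM.TO';
print uri_escape($s), "\n";
# prints: %5EGSPTSE%2B%5ESPCDNX%2BMIC.TO%2BABX.TO%2BAEM.TO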

How to force the dispatcher to cache URLs with GET parameters

As I understood after reading these links:
How to find out what does dispatcher cache?
http://docs.adobe.com/docs/en/dispatcher.html
The Dispatcher always requests the document directly from the AEM instance in the following cases:
If the HTTP method is not GET. Other common methods are POST for form data and HEAD for the HTTP header.
If the request URI contains a question mark "?". This usually indicates a dynamic page, such as a search result, which does not need to be cached.
The file extension is missing. The web server needs the extension to determine the document type (the MIME-type).
The authentication header is set (this can be configured)
But I want to cache URLs with parameters.
If I request myUrl/?p1=1&p2=2&p3=3 once,
then the next request to myUrl/?p1=1&p2=2&p3=3 must be served from the dispatcher cache, but myUrl/?p1=1&p2=2&p3=3&newParam=newValue should be served by CQ the first time and from the dispatcher cache for subsequent requests.
I think the config /ignoreUrlParams is what you are looking for. It can be used to whitelist the query parameters that are used to determine whether a page is cached / delivered from cache or not.
Check http://docs.adobe.com/docs/en/dispatcher/disp-config.html#Ignoring%20URL%20Parameters for details.
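Following the pattern shown in that document, a farm section that ignores a single parameter might look roughly like this (p1 is taken from your example; note that an "allow"ed parameter is ignored for caching, i.e. requests that differ only in that parameter share one cached copy rather than getting separate cache entries):
/ignoreUrlParams
  {
  # do not ignore any parameter by default (such requests bypass the cache) ...
  /0001 { /glob "*" /type "deny" }
  # ... but ignore p1, so ?p1=... requests are cached as the plain page
  /0002 { /glob "p1" /type "allow" }
  }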
It's not possible to cache requests that contain a query string. Such calls are considered dynamic, and therefore they should not be expected to be cached.
On the other hand, if you are certain that such requests should be cached because your application/feature is query driven, you can work on it this way:
1) Add an Apache rewrite rule that will move the given query string parameter into a selector (a sketch follows this list)
2) (optional) Add a CQ filter that will recognize the selector and move it back to a query string parameter
The selector can be constructed in the form key_value, but that puts some constraints on what can be passed here.
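A rough sketch of the rewrite from step 1 (the paths, the parameter name p1 and the selector format are illustrative; the trailing ? drops the original query string, and this assumes a virtual host context rather than .htaccess):
RewriteCond %{QUERY_STRING} ^p1=([^&]+)$
RewriteRule ^/content/mypage\.html$ /content/mypage.p1_%1.html? [PT,L]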
You can do this with Apache rewrites BUT it would not be ideal practice. You'll be breaking the pattern that AEM uses.
Instead, use selectors and extensions. E.g. instead of server.com/mypage.html?somevalue=true, use:
server.com/mypage.somevalue-true.html
Most things you will need to do that would ever get cached will work this way just fine. If you give me more details about your requirements and what you are trying to achieve, I can help you perfect the solution.

Could a manipulated URL cause security issues inside a RewriteRule?

I'm brand new to Apache.
I have the following .htaccess file
Options -Indexes
RewriteBase /
RewriteEngine on
RewriteRule ^([a-zA-Z0-9_]+)*$ redirect.php?uniqueID=$1 [QSA,L]
so that going to: mySite.com/242skl2j
loads the page: mySite.com/redirect.php?uniqueID=242skl2j
But let's say I didn't have this RegEx in my Apache code [a-zA-Z0-9_] and I just allowed for all characters.... could someone load Apache code directly into this by navigating to something like mySite.com/2%20[R]%20reWriteRule%20^([a-zA-Z0-9_]+)*$%20anything.html#www.aDifferentSite.com/index.html
Like SQL injection, but for Apache? (I'm not even sure %20 would convert to a space in my Apache config, but there might be something that can?)
Or is this not really a concern because they can't do any real "harm" to the site, only to their own unique navigation?
As far as I know, there is no known security hole in Apache where something like this could slip through. Whatever is in your URL gets escaped before it's used inside Apache's engine.
Also, unlike those in the central server config, rewrites and redirections defined in .htaccess cannot "break out" of the current web root*, so even an accidentally mis-written (or exploited) RewriteRule could not be used to get hold of something that isn't supposed to be served publicly.
* = see the description of RewriteRule's behaviour in the docs.
There are no big risks, but there are maybe some things to be a little careful about.
Usually things handled by mod_rewrite are already URL decoded, so you can work with HTTP_REFERER or the query path without any character decoding considerations.
Also, the rewrite rules do not suffer from rule injection. But you could try it: if you find a way to inject something into a rule that would be interpreted as RewriteRule code, you would become a rich guy :-). Seriously, I don't think you could; these rules manage arguments the way an SQL server manages arguments when using parameterized queries, where parameters cannot be read as code.
But mod_rewrite also receives the query string part of the request before any URL decoding is applied to it, so this is a point where you should be cautious.
Any rule that applies an access restriction based on query string arguments (everything after the ?), like this "Access Control by Query String" example from http://wiki.apache.org, is wrong:
Deny access to http://example.com/page?query_string if query_string contains the string foo.
RewriteCond %{QUERY_STRING} foo
RewriteRule ^/page - [F]
Using http://example.com/page?toto=%66oo, with %66 for f, would not be detected by this rule. Managing rules on the query string part of the request is very hard stuff; you could check this answer for examples, but usually it means checking for combinations of both encoded and decoded characters in the string (a rough sketch follows). The simple rule, though, is to avoid any access control by mod_rewrite based on the query string and to work only on the query path. And even with paths, double check that using // instead of / still works as expected.
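As a rough illustration of that combined check for the foo example (still not exhaustive; it ignores double encoding, for instance):
RewriteCond %{QUERY_STRING} (?:f|%66)(?:o|%6f)(?:o|%6f) [NC]
RewriteRule ^/page - [F]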
In the past some mod_rewrite exploits have existed:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2003-0542 (2003, buffer overflow)
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2006-3747 (2006, buffer overflow)
http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-1862 (2013, bad log escaping)
No, that's not possible. The only thing you need to worry about is how you handle the uniqueID parameter.
Short answer: No
Long answer: a rewrite rule is nothing more than a replace function; it just replaces what it finds with what you have given it. But since you're passing that to a web application, it may cause a problem if you haven't taken care of the incoming data!