Rewriting a URL causing a 400 Bad Request - apache

I'm trying to get around an HTTP 400 Bad Request with Apache that is caused when this URL is accessed (actual domain redacted):
http://example.com/nw/f/RUD/E.enc<space>
The URL ends with an actual space character, thus giving an HTTP 400 Bad Request. I cannot get the clients requesting this to remove the space in the URL, so I need to rewrite the URL without a space.
I've tried some RewriteRules, like this one (after enabling the RewriteEngine):
RewriteRule "^/nw/f/RUD/E(.*)$" /nw/f/RUD/E.enc [P]
The RewriteRule has no effect and is still giving an HTTP 400.
The same happens when I escape the space character instead of using the wildcard in the rule, and when I try to replace the HTTP 400 error page so that it leads to the actual content (which wouldn't be ideal, since there are 4 different files).
How can I correctly rewrite the URL, removing the space from it, without getting an HTTP 400?

Your RewriteRule produces a redirect loop as .../E.enc<space> and .../E.enc both match .../E(.*)$.
Better use a [R]edirect, though proxying also works.
RewriteEngine On
RewriteRule "^/nw/f/RUD/E\.enc $" /nw/f/RUD/E.enc [R,L]
or shorter
RewriteEngine On
RewriteRule "^(/nw/f/RUD/E\.enc) $" $1 [R,L]
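The question mentions four affected files. A minimal sketch that would cover all of them with one rule, assuming each is requested with one or more trailing spaces, that the request reaches mod_rewrite as in the rules above, and that this goes in the server or virtual host config (where the pattern is matched against a path with a leading slash):

```apache
RewriteEngine On
# Redirect any URL-path that ends in one or more spaces to the same path without them.
# $1 captures everything up to and including the last non-space character.
RewriteRule "^(/.*[^ ]) +$" $1 [R,L]
```

The redirected URL no longer ends in a space, so it does not match the pattern again and cannot loop.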

Related

htaccess url redirect with get parameters ID and reduce value

I want to redirect a URL to a new domain, retrieving the ID parameter but keeping only its first 4 characters. Does anyone know how to do this?
For example, an original url:
http://www.original.example/see/news/actualite.php?newsId=be9e836&newsTitle="blablabla"
To :
https://www.new.example/actualites/be9e
I have tested :
RewriteCond %{QUERY_STRING} ^newsId=(.*)$ [NC]
RewriteRule ^$ https://www.new.example/actualites/%1? [NC,L,R]
RewriteCond %{QUERY_STRING} ^newsId=(.*)$ [NC]
RewriteRule ^$ https://www.new.example/actualites/%1? [NC,L,R]
There are a couple of problems with this:
The regex ^$ in the RewriteRule pattern only matches the document root. The URL in your example is /see/news/actualite.php - so this rule will never match (and the conditions are never processed).
The regex ^newsId=(.*)$ is capturing everything after newsId=, including any additional URL parameters. You only need the first 4 characters of this particular URL param.
As an aside, your existing condition is dependent on newsId being the first URL parameter. Maybe this is always the case, maybe not. But it is relatively trivial to check for this URL parameter, regardless of order.
Also, do you need a case-insensitive match? Or is it always newsId as stated in your example. Only use the NC flag if this is necessary, not as a default.
Try the following instead:
RewriteCond %{QUERY_STRING} (?:^|&)newsId=([^&]{4})
RewriteRule ^see/news/actualite\.php$ https://www.new.example/actualites/%1 [QSD,R,L]
The %1 backreference now contains just the first 4 characters of the newsId URL parameter value (ie. non & characters), as denoted by the regex ([^&]{4}).
The QSD flag (Apache 2.4) discards the original query string from the redirect response. No need to append the substitution string with ? (an empty query string), as would have been required in earlier versions of Apache.
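For comparison, a sketch of the same rule for servers without the QSD flag (Apache 2.2 and earlier), using the trailing ? described above:

```apache
# No QSD flag available: a trailing "?" on the substitution
# replaces the original query string with an empty one.
RewriteCond %{QUERY_STRING} (?:^|&)newsId=([^&]{4})
RewriteRule ^see/news/actualite\.php$ https://www.new.example/actualites/%1? [R,L]
```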
UPDATE:
I have an anchor link (#) which is added at the end of the link, is there a possibility of deleting it to make a clean link? Example, currently I have: https://www.new.example/news/4565/#title Ideally : https://www.new.example/news/4565
The "problem" here is that the browser manages the "fragment identifier" (fragid) (ie. the "anchor link (#)") and preserves this through the redirect. In other words, the browser re-appends the fragid to the redirect response from the server. The fragid is never sent to the server, so we cannot detect this server side prior to issuing the HTTP redirect.
The only thing we can do is to append an empty fragid (ie. a trailing #) in the hope that the browser discards the original fragment. Unfortunately, you will likely end up with a trailing # on your redirected URLs (browser dependent).
For example (simplified):
:
RewriteRule .... https://example.com/# [R=301,NE,L]
Note that you will need the NE flag here to prevent Apache from URL-encoding the # in the redirect response.
As noted above, browsers might handle this differently.
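Applied to the redirect from the original question, the empty-fragment approach might look like the sketch below. Whether the browser then drops the original fragment, or leaves a trailing #, remains browser dependent.

```apache
RewriteCond %{QUERY_STRING} (?:^|&)newsId=([^&]{4})
# The trailing "#" is an empty fragment identifier.
# NE stops mod_rewrite from encoding it as %23 in the Location header.
RewriteRule ^see/news/actualite\.php$ https://www.new.example/actualites/%1# [QSD,NE,R,L]
```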
Further reading:
URL Fragment and 302 redirects
redirect is keeping hash
How to clear fragment identifier on 302 redirect?

Mod_rewrite rules not working in .htaccess to change the URL

I'm trying to rewrite the below URL but the URLs just don't change, no errors.
Current URL:
https://example.com/test/news/?c=value1&s=value2&id=9876
Expected URL:
https://example.com/test/news/value1/value2
My .htaccess
RewriteEngine On
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
but I've seen many articles where a url such as example.com/display_article.php?articleId=my-article can be rewritten as example.com/articles/my-article for example with .htaccess
But the important point here (that I think you are missing) is that the URL must already have been changed internally in your application - in all your internal links. It is a common misconception that .htaccess alone can be used to change the format of the URL. Whilst .htaccess is an important part of this, it is only part of it.
Yes, you can implement a redirect in .htaccess to redirect from the old to new URL - and this is essential to preserve SEO (see below), but it is not critical to your application working. If you don't first change the URL in your internal links then:
The "old" URL is still exposed in the HTML source. When a user hovers over or copies the link, they are seeing and copying the "old" URL.
Every time a user clicks one of your internal links they are externally redirected to the "new" URL. This is slow for your users, bad for SEO (you should never link to a URL that is redirected) and bad for your server, as it potentially doubles the number of requests hitting your server (OK, 301s are cached locally).
To quote from @IMSoP's answer to this reference question on the subject:
Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly
So, once you have changed your internal links to the "new" (expected) format, eg. /test/news/value1/value2 (or should that be /test/news/value1/value2/id or even /test/news/id/value1/value2? See below), then you can do as follows...
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
This internally rewrites a request from /test/news/<value1>/<value2> to /test/news/?c=<value1>&s=<value2>&id=1. However, there are a couple of issues with this:
/test/news/ is not itself a valid endpoint. This requires further rewriting. Perhaps you are serving a DirectoryIndex document (eg. index.php)? This might appear seamless to you, but this requires an additional internal subrequest and makes the rule dependent on other elements of the config. You should rewrite directly to the file that handles the request. eg. /test/news/index.php?c=<value1>&s=<value2>&id=1 (remember, this is entirely hidden from the user).
You are hardcoding the id=1 parameter? Should every URL have the same id? Or should this be passed in the "new" URL (which is what I would expect)? What does the id represent? If this is critical to the routing of the URL then the id should appear earlier in the URL-path, in case the URL gets accidentally truncated when copy/pasted/shared.
If the id is required then it needs to be passed in the "new" URL. We only have the "new" URL to route the request, so the information can't be hidden.
So, if the "new" URL is now /test/news/<id>/<value1>/<value2> then the rewrite would need to be like this instead:
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
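A variant of the same rewrite that targets the handling file directly, as suggested earlier, rather than relying on the DirectoryIndex. This is only a sketch: index.php is an assumption (the "eg." above) and should be replaced with whichever file actually handles the request.

```apache
# Rewrite "/test/news/<id>/<value1>/<value2>" straight to the script,
# avoiding the extra DirectoryIndex subrequest for "/test/news/".
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/index.php?c=$2&s=$3&id=$1 [L]
```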
Then (optionally*1) you can implement an external redirect in order to preserve SEO. This is for search engines that have indexed the "old" URLs or third party inbound links that cannot be updated - these need to be corrected to inform search engines of the change and get the user on the "new" canonical URL having followed an out-of-date inbound link.
(*1 It's not "optional" if you are changing an existing URL, but optional with regards to your application being functional.)
This "redirect" goes before the above rewrite:
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
The $0 backreference contains the full match from the RewriteRule pattern, ie. test/news/ in this case - this simply saves repetition.
The %1, %2 and %3 backreferences contain the values captured from the preceding condition. ie. the values of the c, s and id URL parameters respectively.
Note that the URL parameters / path segments should not be optional as in your original directive (ie. ([^/]*)). If they are optional and they are omitted, then the resulting URL becomes ambiguous. eg. <value2> becomes <value1> if <value1> is omitted.
Note that the URL parameters must be in the order as stated. If you have a mismatch of "old" URLs with these params in a different order (or even intermixed with other params) then this can be accounted for with additional complexity. (It may be easier to perform this redirect in your server-side script, instead of .htaccess.)
The first condition that checks against the REDIRECT_STATUS environment variable ensures that we only redirect direct requests and not rewritten requests by the later rewrite (which would otherwise result in a redirect loop). An alternative on Apache 2.4 is to use the END flag on the RewriteRule instead.
The QSD flag (Apache 2.4) discards the original query string from the request.
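The END alternative mentioned above could look like the following. It is only a sketch, assuming Apache 2.4 and the same URL layout as the rules above: the REDIRECT_STATUS condition is dropped from the redirect, and the later rewrite uses END instead of L so that the rewritten request is not passed through the redirect again.

```apache
# Redirect old URLs to the new "canonical" URL (no REDIRECT_STATUS check)
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]

# Rewrite new URLs to old/actual URL.
# END stops all further rewrite processing for this request.
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [END]
```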
You should test first with a 302 (temporary) redirect to avoid potential caching issues and only change to a 301 (permanent) redirect once you have tested that everything works as intended. 301s are cached persistently by the browser so can make testing problematic.
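If the "old" URLs may carry the c, s and id parameters in any order, one way to handle the additional complexity mentioned above is to chain conditions, carrying each captured value forward in the next TestString. This is an untested sketch rather than a drop-in rule. The "#" separator is an arbitrary choice, made on the assumption that a literal "#" never appears in the query string.

```apache
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# %1 = id
RewriteCond %{QUERY_STRING} (?:^|&)id=(\d+)(?:&|$)
# TestString is "<id>#<query string>"; afterwards %1 = id, %2 = c
RewriteCond %1#%{QUERY_STRING} ^([^#]*)#(?:.*&)?c=([^&]+)
# TestString is "<id>/<c>#<query string>"; afterwards %1 = id/c, %2 = s
RewriteCond %1/%2#%{QUERY_STRING} ^([^#]*)#(?:.*&)?s=([^&]+)
# 302 while testing; change to 301 once confirmed
RewriteRule ^test/news/$ /$0%1/%2 [QSD,R=302,L]
```

Only the backreferences from the last matched condition are available to the RewriteRule, which is why each condition re-captures what the previous one found.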
Summary
Your complete .htaccess file should look something like this:
Options -MultiViews +FollowSymLinks
# If relying on the DirectoryIndex to handle the request
DirectoryIndex index.php
RewriteEngine On
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]

POST information getting lost in .htaccess redirect

So, I have a fully working CRUD. The problem is that, because of my file structure, my URLs looked something like https://localhost/myapp/resources/views/add-product.php, which was too ugly. After some research and another post here, I was able to use a .htaccess file to make the links look like https://localhost/myapp/add-product (removing the .php extension and the directories), and I'm also using it to enforce HTTPS. Most of the views work fine, but my Mass Delete view uses POST information from a form on my index. Since I restructured the code to use the redirect, the Mass Delete view receives an empty array. If I remove the redirect and use the "ugly URLs" it works fine. Here's what my .htaccess file looks like:
Options +FollowSymLinks +MultiViews
RewriteEngine On
RewriteBase /myapp/
RewriteRule ^resources/views/(.+)\.php$ $1 [L,NC,R=301]
RewriteCond %{DOCUMENT_ROOT}/myapp/resources/views/$1.php -f
RewriteRule ^(.+?)/?$ resources/views/$1.php [END]
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
I didn't actually write any of it; it's a mix of answered questions and research. I did try to change the L flag to a P according to this post: Is it possible to redirect post data?, but that gave me the following error:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator at admin@example.com to inform them of the time this error occurred, and the actions you performed just before this error.
More information about this error may be available in the server error log.
Apache/2.4.52 (Win64) OpenSSL/1.1.1m PHP/8.1.2 Server at localhost Port 443
POST information getting lost in .htaccess redirect
You shouldn't be redirecting the form submission in the first place. Ideally, you should be linking directly to the "pretty" URL in your form action. If you are unable to change the form action in the HTML then include an exception in your .htaccess redirect to exclude this particular URL from being redirected.
Redirecting the form submission is not really helping anyone here. Users and search engines can still see the "ugly" URL (it's in the HTML source) and you are doubling the form submission that hits your server (and doubling the user's bandwidth).
"Redirects" like this are only for when search engines have already indexed the "ugly" URL and/or is linked to by external third parties that you have no control over. This is in order to preserve SEO, just like when you change any URL structure. All internal "ugly" URLs should have already been converted to the "pretty" version. The "ugly" URLs are then never exposed to users or search engines.
So, using a 307 (temporary) or 308 (permanent) status code to get the browser to preserve the request method across the redirect should not be necessary in the first place. For redirects like this it is common to see an exception for POST requests (because the form submission shouldn't be redirected). Or only target GET requests. For example:
RewriteCond %{REQUEST_METHOD} GET
:
Changing this redirect to a 307/8 is a workaround, not a solution. And if this redirect is for SEO (as it only should be) then this should be a 308 (permanent), not a 307 (temporary).
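The other form of exception mentioned above, excluding POST requests rather than targeting only GET, might look like this when applied to the redirect from the question. It is a sketch that assumes the same RewriteBase /myapp/ and file layout as the original .htaccess file.

```apache
# Redirect "ugly" URLs to the "pretty" form, except for form submissions.
RewriteCond %{REQUEST_METHOD} !=POST
RewriteRule ^resources/views/(.+)\.php$ $1 [L,NC,R=301]
```

With this in place a form that still posts to the "ugly" URL is served directly, so the POST data is not lost.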
Aside:
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
Your HTTP to HTTPS redirect is in the wrong place. This needs to go as the first rule, or make sure you are redirecting to HTTPS in the current first rule and include this as the second rule, before the rewrite (to ensure you never get a double redirect).
With this rule placed last, any HTTP requests to /resources/views/<something>.php (or /<something>) will not be upgraded to HTTPS.
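Putting the points above together, the rules could be reordered roughly as follows. This is a sketch under the question's own assumptions (the .htaccess file lives in /myapp/ and the views are in resources/views/), not a tested configuration.

```apache
RewriteEngine On
RewriteBase /myapp/

# 1. Canonical redirect: strip "resources/views/" and ".php", going straight
#    to HTTPS so there is never a double redirect. Only GET requests are
#    redirected, so form submissions are left alone.
RewriteCond %{REQUEST_METHOD} GET
RewriteRule ^resources/views/(.+)\.php$ https://%{HTTP_HOST}/myapp/$1 [L,NC,R=301]

# 2. Upgrade any remaining HTTP request to HTTPS.
RewriteCond %{HTTPS} off
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# 3. Internal rewrite from the "pretty" URL to the actual file.
RewriteCond %{DOCUMENT_ROOT}/myapp/resources/views/$1.php -f
RewriteRule ^(.+?)/?$ resources/views/$1.php [END]
```

Both redirects now run before the internal rewrite, so a request over HTTP is upgraded regardless of which URL form it uses.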

HTTPS redirect fails with .htaccess rewrite for certain URL length

I have an .htaccess file for showing a default image if the requested URL does not exist. I simplified it to this:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . default.png [L]
Using HTTPS, this suddenly stopped working if the URL exceeds a certain length (connection closed).
HTTP always works.
It used to work like this for years and it still does on other servers.
It also seems that the kind of characters matter:
not working:
https://server.abc/images/01234567890123456789012345678901234567890123456789abc.png
https://server.abc/images/012345678901234567890123456789012345678901234567890123456789.png
working:
https://server.abc/images/01234567890123456789012345678901234567890123456789.png
https://server.abc/images/01234567890123456789012345678901234567890123456789123.png
https://server.abc/images/0123456789012345678901234567890123456789012345678912345.png
The redirect works if the condition is removed (second line), so it seems like it has something to do with REQUEST_FILENAME, HTTPS and the byte size (encoding?) of the filename/URL string.
This occurs with Apache/2.4.46 and macOS/10.15.7. It might have started after one of the latest security updates.
Any idea where this is coming from or what kind of configuration could cause this?
Thanks for your help!
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . default.png [L]
It's not clear why this would "fail" for only certain requests over HTTPS only. A "security" update (particularly if it involves mod_security) is a likely cause - although an unusual one.
However, you shouldn't really be doing it this way to begin with. This will result in a request for any non-existent URL being served /default.png with a "200 OK" response and potentially risk being indexed by search engines and abused by a malicious user.
What you are doing here is essentially setting a custom 404 response to an image, which you could do with the following instead and which will also return the "correct" 404 status.
ErrorDocument 404 /default.png
Now, any request that does not map to a file (or directory) will be served the image /default.png but with a 404 "Not Found" HTTP response code, so search engines/bots get the "correct" response.
This also naturally gets around the REQUEST_FILENAME issue, assuming these "not working" URLs do ultimately result in a 404 and not some other response (due to the "security" update).
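Note that the ErrorDocument target is a root-relative URL-path, unlike the relative default.png in the original RewriteRule. A minimal sketch, assuming the .htaccess file sits in the /images/ directory from the example URLs and default.png is stored alongside it:

```apache
# /images/.htaccess (assumed location)
# Serve the fallback image, with a 404 status, for anything under
# /images/ that does not exist.
ErrorDocument 404 /images/default.png
```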

Apache rewrite XSS protection issues

We are developing a website for a trucking company and it has recently been subjected to penetration testing. One of the attacks injected an XSS script into the request URL:
ourcompanyhostname.com/abc/authorize<script>alert('xss');</script>
Since our web server is Apache, we fixed the issue by setting up the following in the httpd.conf file. Basically, rather than reflecting the script in the 404 error response, a generic 400 response is returned instead.
RewriteRule ^/abc/authorize/.*[^A-Za-z0-9./\-_]+ "-" [L,R=400]
The issue is when the attack was changed to the one below, it no longer can be caught:
ourcompanyhostname.com/abc/authorize%3c%3cSCRIPT%3ealert(%22XSS%22)%3b%2f%2f%3c%3c%2fSCRIPT%3e
Response still was 404 instead of 400.
Is there another way to achieve what we want? We have already tried the rule below, but it still doesn't work. We just want it to return an HTTP 400 error when an XSS attack is attempted.
RewriteCond %{REQUEST_URI} ^.*(\*|;|<|>|\)|%0A|%0D|%3C|%3E|%00).* [NC]
RewriteCond %{REQUEST_URI} abc
RewriteRule ^(.*)$ "-" [L,R=400]
I don't think the encoding matters, mod_rewrite sees the path in the URL after decoding.
I think you may have missed that your original rule requires matching a trailing slash after "authorize" and the new malicious request doesn't have it.
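A sketch of the original rule without the required trailing slash, assuming it stays in httpd.conf as in the question:

```apache
# No "/" is required after "authorize", so "/abc/authorize<script>..."
# is caught as well as "/abc/authorize/<script>...".
RewriteRule ^/abc/authorize.*[^A-Za-z0-9./\-_] - [R=400,L]
```

One caveat: with Apache's default AllowEncodedSlashes Off, a URL-path containing an encoded slash (%2f) is refused with a 404 before any rewrite rule runs, which may be a factor in the 404 seen for the second attack URL.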
Your final rule works fine for me. If you get an unexpected result for a particular URL, you have to study the RewriteLog (Apache 2.2) or LogLevel rewrite:trace8 (Apache 2.4) output.
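On Apache 2.4, rewrite tracing is enabled per module through LogLevel, and the output goes to the error log. A minimal sketch (the base level warn is just an example):

```apache
# Server or virtual host config, not .htaccess.
# Keep the general log level at "warn" and raise it for mod_rewrite only.
LogLevel warn rewrite:trace8
```

trace8 is very verbose, so it is best removed again once the rule has been debugged.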
If the 404 is generated by Apache, just use a custom ErrorDocument for 404.
If it is generated by your EE server, do the same in your web.xml.
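For the Apache-generated case, a custom 404 response with fixed content could be as simple as the sketch below. The message text is only an example.

```apache
# A fixed message: nothing from the request is echoed back in the response.
ErrorDocument 404 "The requested resource was not found."
```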