Rewrite rule to prevent Apache decoding URL before reaching htaccess?

We have an htaccess rule like this:
RewriteRule ^(.*)/(.*)/(.*) ../app$1/scripts/api/index.php?fn=$2&$3 [L]
This works fine in most cases. However, Apache decodes the URL before it reaches this rule, so a URL like beta/list/&cat=red%20%26%20blue is seen by htaccess as beta/list/&cat=red & blue. As a result, index.php receives cat='red' and blue=null instead of cat='red & blue'.
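To see why the decoded form loses the ampersand, here is a minimal Python sketch. It only mimics the steps involved (percent-decoding the path, applying the rule's pattern, then splitting the resulting query string on &). It is not how Apache or PHP are implemented, and the variable names are made up for illustration.

```python
import re
from urllib.parse import unquote, parse_qs

# The path as the browser sends it (taken from the example above).
raw_path = "beta/list/&cat=red%20%26%20blue"

# Step 1: the path is percent-decoded before the RewriteRule pattern sees it.
decoded_path = unquote(raw_path)

# Step 2: the rule's pattern splits the decoded path into three captures.
app, fn, rest = re.match(r"^(.*)/(.*)/(.*)", decoded_path).groups()

# Step 3: the third capture is pasted into the query string as-is.
query = "fn=" + fn + "&" + rest

# The literal "&" inside "red & blue" now acts as a parameter separator.
params = parse_qs(query, keep_blank_values=True)
print(decoded_path)
print(params)
```

In this model the value of cat is cut off at the ampersand and the remainder shows up as a separate, empty parameter, which matches the symptom described in the question.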
I've read that the workaround for this issue is to use server variables like %{REQUEST_URI} or %{THE_REQUEST} in the htaccess rule, as these are not decoded before use, but it's difficult to implement. The question mark in the RewriteRule makes everything go crazy and I can't figure out how to escape it.
Can any experts out there help me fix the rule below to behave like the one above?
RewriteCond %{REQUEST_URI} ^(.*)/(.*)/(.*)
RewriteRule . ../app%1/scripts/api/index.php?fn=%2&%3 [L]

Indeed, the solution is to use the special server-variable called THE_REQUEST.
From mod_rewrite documentation:
THE_REQUEST
The full HTTP request line sent by the browser to the server (e.g.,
"GET /index.html HTTP/1.1"). This does not include any additional
headers sent by the browser. This value has not been unescaped
(decoded), unlike most other variables below.
Here is how your rule should look:
# don't touch URLs ending in index.php
RewriteRule index\.php$ - [L]
# user request matching /xxx/xxx/xxx (with optional query string)
RewriteCond %{THE_REQUEST} \s/([^/\?]+)/([^/\?]+)/([^\?]+)(?:\s|\?) [NC]
RewriteRule ^ ../app%1/scripts/api/index.php?fn=%2&%3 [L,QSA]
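As a rough check of what the condition captures, the pattern can be tried against a sample request line in Python. This is only a sketch: Python's re module stands in for Apache's regex engine, the request line is an assumed example built from the URL in the question, and it says nothing about how Apache later handles the substituted values.

```python
import re
from urllib.parse import parse_qs

# Pattern copied from the RewriteCond above ([NC] becomes re.IGNORECASE).
pattern = re.compile(r"\s/([^/\?]+)/([^/\?]+)/([^\?]+)(?:\s|\?)", re.IGNORECASE)

# Assumed request line for the URL in the question.
request_line = "GET /beta/list/&cat=red%20%26%20blue HTTP/1.1"

# %1, %2 and %3 in the RewriteRule correspond to these three groups.
app, fn, rest = pattern.search(request_line).groups()

# Same shape as the rule's substitution: ?fn=%2&%3
query = "fn=" + fn + "&" + rest

# The receiving script decodes each value only after splitting on "&".
params = parse_qs(query)
print(rest)
print(params)
```

Because the third capture is still percent-encoded when it is placed in the query string, the %26 is not treated as a separator and cat arrives as a single value.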
Please note that you shouldn't use a relative path for an internal rewrite, as this can lead to confusion. Instead, define a RewriteBase, use an absolute path, or start from the domain root with a /.
UPDATE
Since you can have encoded forward slashes in your URL, you need to set AllowEncodedSlashes to NoDecode (or On, but that is unsafe). Note also that, due to a bug, you must put this directive inside a virtual host context, even though the server config context is documented as valid (otherwise, it is simply ignored). By default, AllowEncodedSlashes is set to Off, so Apache refuses URLs containing encoded slashes by itself, without passing the request to mod_rewrite. See the official documentation here.

Related

.htaccess rewrite returning Error 404

RewriteEngine on
RewriteCond %{QUERY_STRING} (^|&)public_url=([^&]+)($|&)
RewriteRule ^process\.php$ /api/%2/? [L,R=301]
Where domain.tld/app/process.php?public_url=abcd1234 is the actual location of the script.
But I am trying to get .htaccess to make the URL like this: domain.tld/app/api/acbd1234.
Essentially, this hides the process.php script and the GET query ?public_url.
However the script above is returning error 404 not found.
I think this is what you are actually looking for:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)public_url=([^&]+)(?:$|&)
RewriteRule ^/?app/process\.php$ /app/api/%1 [R=301,QSD]
RewriteRule ^/?app/api/([^/]+)/?$ /app/process.php?public_url=$1 [END]
If you receive an internal server error (HTTP status 500) for that, check your HTTP server's error log file. Chances are that you operate a very old version of the Apache HTTP server; in that case you may have to replace the [END] flag with the [L] flag, which will probably work just fine in this scenario.
And a general hint: you should always prefer to place such rules inside the HTTP server's (virtual) host configuration instead of using dynamic configuration files (.htaccess style files). Those files are notoriously error prone, hard to debug, and they really slow down the server. They are only supported as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).
UPDATE:
Based on your many questions in the comments below (we see again how important it is to be precise in the question itself ;-) ) I add this variant implementing a different handling of path components:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)public_url=([^&]+)(?:$|&)
RewriteRule ^/?app/process\.php$ /api/%1 [R=301,QSD]
RewriteRule ^/?api/([^/]+)/?$ /app/process.php?public_url=$1 [END]
I am trying to get .htaccess to make the URL like this: example.com/app/api/acbd1234.
You don't do this in .htaccess. You change the URL in your application and then rewrite the new URL to the actual/old URL. (You only need to redirect this, if the old URLs have been indexed by search engines - but you need to watch for redirect loops.)
So, change the URL in your application to /app/api/acbd1234 and then rewrite this in .htaccess (which I assume is in your /app subdirectory). For example:
RewriteEngine On
# Rewrite new URL back to old
RewriteRule ^api/([^/]+)$ process.php?public_url=$1 [L]
You included a trailing slash in your earlier directive, but you omitted this in your example URL, so I've omitted it here also.
If you then need to also redirect the old URL for the sake of SEO, then you can implement a redirect before the internal rewrite:
RewriteEngine On
# Redirect old URL to new (if requested by search engines or external links)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} (?:^|&)public_url=([^&]+)(?:$|&)
RewriteRule ^process\.php$ /app/api/%1? [R=302,L]
# Rewrite new URL back to old
RewriteRule ^api/([^/]+)$ process.php?public_url=$1 [L]
The check against REDIRECT_STATUS is to avoid a rewrite loop. ?: inside the parenthesised subpattern avoids the group being captured as a backreference.
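The effect of ?: on backreference numbering can be checked with a small Python sketch. Python's re module is used here as a stand-in for Apache's regex engine, and the sample query string is taken from the question.

```python
import re

query_string = "public_url=abcd1234"

# Pattern from the question: the leading (^|&) is capture group 1,
# so the value ends up in group 2 (hence %2 in the original rule).
capturing = re.search(r"(^|&)public_url=([^&]+)($|&)", query_string)

# Pattern from the answers: (?:...) groups are not numbered,
# so the value is group 1 (hence %1).
non_capturing = re.search(r"(?:^|&)public_url=([^&]+)(?:$|&)", query_string)

print(capturing.groups())
print(non_capturing.groups())
```

Both patterns match the same input; only the group numbering differs, which is why the backreference changes from %2 to %1.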
Change the 302 (temporary) to 301 (permanent) only when you are sure it's working OK, to avoid erroneous redirects being cached by the browser.

Why is this RewriteRule altering QUERY_STRING, but leaving REQUEST_URI untouched?

I have a copy of Concrete5, a PHP-based CMS, running on example.com.
Concrete5 comes with the following basic instructions for pretty URLs (redirecting all URLs to a central index.php)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_URI} !^/c5.7
# Concrete5 is running in the c5.7/ subdirectory
RewriteRule ^.*$ c5.7/$0 [L]
</IfModule>
Pretty straightforward.
Now I have a certain set of URLs that take the form
/product/{productname}
that I need to forward to the Concrete5 (virtual) URL
/products/details?name={productname}
That URL is set up and works as expected when I enter it manually in the browser.
So I added a line to the htaccess file and it now looks like this:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# New rule for products
RewriteCond %{REQUEST_URI} ^/product/
RewriteRule ^product/(.+)$ /products/details?name=$1 [QSA]
RewriteCond %{REQUEST_URI} !^/c5.7
RewriteRule ^.*$ c5.7/$0 [L]
</IfModule>
I can confirm the RewriteRule gets triggered when I choose a random, external URL as the redirection target.
But whenever it is an internal redirect like above, what happens is, I get a 404 inside Concrete5. When I inspect what was passed to it, I see:
REQUEST_URI: /product/my-random-product
QUERY_STRING: name=my-random-product
So it appears that the rule is triggered and does some rewriting, but REQUEST_URI remains unchanged!
Why?
Is it because PHP 7.1 is running via CGI?
I have tried a zillion variations and all the flags in the book, with little success.
The REQUEST_URI in PHP is not the same as the REQUEST_URI within mod_rewrite, so you can't do it like this. In PHP it always contains the original URL. So you can't change it like this if your CMS is working off that.
You should set up your CMS to use the URLs you want, rather than trying to augment your CMS's URL rewriting like this.
If you inspect REDIRECT_URL in PHP you will see the last rewritten URI.
REQUEST_URI in PHP will always be the original request URI.
Because this is already explained by LSerni and SuperDuperApps, I won't elaborate.
Instead, I'm offering a quick solution: modify the REQUEST_URI and add a name parameter in PHP instead of in .htaccess.
Add the following code to the start of your Concrete5 index.php to make sure that REQUEST_URI is modified before any Concrete5 code runs:
// Map /product/{productname} onto the existing /products/details page.
if (preg_match('-^/product/([^?]*)-', $_SERVER['REQUEST_URI'], $matches)) {
    $_SERVER['REQUEST_URI'] = '/products/details';
    $_GET['name'] = $matches[1];
}
Your setup works on a PHP 7.1 machine (without Concrete5). It does call a script I just put in, which is in /c5.7/products/details. So the Apache part is working.
Inside the script, I see that REQUEST_URI is the old value prior to the rewrite.
So its value is normal and it not being rewritten is a red herring - it isn't supposed to be rewritten. The 404 error must be due to something else.
Your Concrete5 routing should support the real URL, not just the virtual one, because C5's routing itself relies on REQUEST_URI. If this is so, you need to create a route for your short URLs
Route::register('/product/{productname}' ...)
and an appropriate controller to get the parameters and invoke the "old" controller.
One possibility using .htaccess could be this, but I'm not too sure it will work since REQUEST_URI is still left unchanged:
# New rule for products
RewriteCond %{REQUEST_URI} ^/product/
RewriteRule ^product/(.+)$ c5.7/products/details?name=$1 [L,QSA]
Otherwise you need to do an external redirect, which will disclose the URL in the browser:
RewriteRule product/(.*)$ http://.../products/details?name=$1 [QSA]
See also this other question.

Apache rewrite rule that ignores query/parameters, always redirects based on path

I'm trying to create a rewrite rule that will ignore any additional URL query/parameters and just redirect based on the path.
My company has a Wifi Hotspot service that does some DNS routing trick to force people to login before they can use it. Unfortunately when folks get disconnected from the WiFi and dropped back to their normal cell data service sometimes a URL request is still sent to our host, and it shows up as:
www.ourwebsite.com/login?dst=http://www.google.com/m?client=ms-android-verizon&source=android-home
I already wrote a set of rules to take care of base paths of /login and /login/ to redirect to our homepage,
RewriteCond %{THE_REQUEST} ^.*\/login/\ HTTP/
RewriteRule ^(.*)login/?$ "/$1" [R=301,L]
RewriteCond %{THE_REQUEST} ^.*\/login\ HTTP/
RewriteRule ^(.*)login?$ "/$1" [R=301,L]
but I am having trouble coming up with an appropriate string to ALWAYS redirect based solely on the path, and ignore any query parameters that may or may not come after.
Any help would be appreciated! Thanks in advance.
If I understood right, something like this should do it:
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /
RewriteRule ^login /? [R=301,L]
This rule-set will redirect to root as long as the incoming URL is something like:
http://www.ourwebsite.com/login?any_query
From Apache 2.4.0 on you can apply the QSD-flag to the rule.
When the requested URI contains a query string, and the target URI does not, the default behavior of RewriteRule is to copy that query string to the target URI. Using the [QSD] flag causes the query string to be discarded.
-- https://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_qsd
Using this flag with earlier Apache versions will cause a 500 Internal Server Error.
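The query-string handling described in these two answers can be summarised in a small Python model. This is a simplified sketch of the documented behaviour, not Apache code: the function name and parameters are hypothetical, and flags such as [QSA] are left out.

```python
def target_with_query(substitution, original_query, qsd=False):
    """Simplified model of which query string the rewritten target carries."""
    path, has_question_mark, new_query = substitution.partition("?")
    if qsd or has_question_mark:
        # [QSD], or a "?" in the substitution, drops the incoming query string.
        query = new_query
    else:
        # Otherwise the incoming query string is copied to the target.
        query = original_query
    return path + ("?" + query if query else "")


incoming = "dst=http://www.google.com/m"
print(target_with_query("/", incoming))            # query string copied
print(target_with_query("/?", incoming))           # trailing "?" drops it
print(target_with_query("/", incoming, qsd=True))  # [QSD] drops it
```

In this model the trailing question mark used in the first answer and the [QSD] flag from the second answer lead to the same target.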

htaccess rewrite drive me nuts

I want to use a rather simple rewrite, something like this:
RewriteRule monitor.html index.php/\?first_category_id=B008 [NC,L]
But it doesn't work as expected; it goes to something like index.php/monitor.html (which kicks in symfony's routing and returns a 404 error, but that is a different story).
However, if I include the full URL like:
RewriteRule monitor.html http://example.com/index.php/\?first_category_id=B008 [NC,L]
it responds with the correct content, but this looks like a full redirect: the rewritten URL is revealed in the browser. And that's neither transparent nor easily deployable.
What am i missing here?
the rest of the htaccess file if it matters:
RewriteCond %{REQUEST_URI} \..+$
RewriteRule .* - [L]
RewriteRule ^(.*)$ index.php [QSA,L]
Your rule is outputting a relative path and you're in a per-directory context. You need RewriteBase. In a per-directory context, rewriting is done on expanded filesystem paths, not on the original URLs. But the result of the expansion is converted to a URL again! RewriteBase supplies the prefix needed to do that. Without it, the URL is naively built from the same filesystem prefix that was stripped prior to the substitution, and you end up with, for instance, http://example.com/var/www/docroot/blah... which is nonsense. Either set RewriteBase or output an absolute path, beginning with a slash.
Also, you should anchor the match:
RewriteRule ^monitor\.html$ ...
Otherwise the pattern can match somewhere in the middle of the path, and the rule will fire for URLs you never intended. You don't want a request for amonitor.htmly/foobar to be rewritten to the index.php target, right?
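A quick way to see the difference anchoring makes is to try both patterns against a few paths in Python. This is a sketch using Python's re module in place of Apache's regex engine. The anchored pattern here also escapes the dot so that it only matches a literal "."; that is a choice made in this sketch, and the third test path is made up.

```python
import re

unanchored = re.compile(r"monitor.html")
anchored = re.compile(r"^monitor\.html$")

paths = ["monitor.html", "amonitor.htmly/foobar", "monitorXhtml"]

for path in paths:
    # A RewriteRule fires whenever its pattern is found anywhere in the path.
    print(path, bool(unanchored.search(path)), bool(anchored.search(path)))
```

Only the first path satisfies the anchored pattern, while the unanchored one accepts all three.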
You should not escape the question mark in the substitution. It's not a regexp! Just use index.php/?etc, not index.php/\?etc. (Could that backslash be what is causing the problem and producing index.php/monitor.html?)

How to prevent mod_rewrite from rewriting URLs more than once?

I want to use mod_rewrite to rewrite a few human-friendly URLs to arbitrary files in a folder called php (which is inside the web root, since mod_rewrite apparently won't let you rewrite to files outside the web root).
/ --> /php/home.php
/about --> /php/about_page.php
/contact --> /php/contact.php
Here are my rewrite rules:
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^$ php/home.php [L]
RewriteRule ^about$ php/about_page.php [L]
RewriteRule ^contact$ php/contact.php [L]
However, I also want to prevent users from accessing files in this php directory directly. If a user enters any URL beginning with /php, I want them to get a 404 page.
I tried adding this extra rule at the end:
RewriteRule ^php php/404.php [L]
...(where 404.php is a file that outputs 404 headers and a "Not found" message.)
But when I access / or /about or /contact, I always get redirected to the 404. It seems the final RewriteRule is applied even to the internally rewritten URLs (as they now all start with /php).
I thought the [L] flag (on the first three RewriteRules) was supposed to prevent further rules from being applied? Am I doing something wrong? (Or is there a smarter way to do what I'm trying to do?)
The [L] flag should be used only on the last rule.
L (Last Rule) stops the rewriting process at that rule and doesn't apply any more rewriting rules, which is why you are facing issues.
I had a similar problem. I have a content management system written in PHP and based on the Model-View-Controller paradigm. Its most basic part is mod_rewrite. I have successfully prevented access to PHP files globally. The trick is called THE_REQUEST.
What's the problem?
The rewrite module rewrites the URI. If the URI matches a rule, it is rewritten and the remaining rules are applied to the new, rewritten URI. But if the matched rule ends with [L], the engine does not actually terminate; it starts again with the rewritten URI. The new URI then no longer matches the rule ending with [L], so processing continues and it matches the last rule. The result? The programmer starts cursing at the unexpected 404 error page. The computer, however, does what you say, not what you want. I had this in my .htaccess file:
RewriteEngine On
RewriteBase /
RewriteRule ^plugins/.* pluginLoader.php [L]
RewriteCond %{REQUEST_URI} \.php$
RewriteRule .* index.php [L]
That's wrong. Even the URIs beginning with plugins/ are rewritten to index.php.
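The restart behaviour can be modelled with a few lines of Python. This is a deliberately simplified sketch, not mod_rewrite itself: it assumes per-directory (.htaccess) processing in which every rule carries [L] and the rule set is re-run until the path stops changing. The request path plugins/gallery and the function names are made up for illustration, and the RewriteCond on REQUEST_URI is folded into the second pattern.

```python
import re

# (pattern, substitution) pairs modelled on the two rules above.
RULES = [
    (r"^plugins/.*", "pluginLoader.php"),
    (r"\.php$", "index.php"),
]


def one_pass(path):
    """Apply the first matching rule; [L] ends this pass."""
    for pattern, target in RULES:
        if re.search(pattern, path):
            return target
    return path


def rewrite(path, max_passes=10):
    """Re-run the rule set until the path no longer changes."""
    history = [path]
    for _ in range(max_passes):
        new_path = one_pass(path)
        if new_path == path:
            break
        path = new_path
        history.append(path)
    return history


print(rewrite("plugins/gallery"))
```

In this model the plugin request first becomes pluginLoader.php and is then caught by the second rule on the next pass, ending up at index.php.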
Solution
You need to apply the rule if and only if the original (not rewritten) URI matches the rule. Regrettably, mod_rewrite does not provide a variable containing the original URI, but it does provide the THE_REQUEST variable, which contains the first line of the HTTP request. This variable is invariant: it doesn't change while the rewrite engine is working.
...
RewriteCond %{THE_REQUEST} \s.*\.php\s
RewriteRule \.php$ index.php [L]
The regular expression is different. It is applied not to the URI alone but to the entire first line of the request, that is, to something like GET /script.php HTTP/1.1. This time the critical rule is applied only if the user explicitly requests a PHP script directly. The rewritten URI is not used.
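The condition can be tried against a few request lines in Python to see which ones it accepts. This is a sketch: Python's re module stands in for Apache's regex engine, and the second and third request lines are assumed examples.

```python
import re

# Pattern from the RewriteCond above.
direct_php = re.compile(r"\s.*\.php\s")

request_lines = [
    "GET /script.php HTTP/1.1",       # direct request for a PHP file
    "GET /plugins/gallery HTTP/1.1",  # hypothetical friendly URL
    "GET /script.php?x=1 HTTP/1.1",   # PHP file followed by a query string
]

results = [bool(direct_php.search(line)) for line in request_lines]
print(results)
```

Only the first line matches. The third case suggests that, as written, the condition does not fire when a query string follows the .php extension; whether that matters depends on the site.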