mod_rewrite question - apache

I would like to serve /foo and /foo/ locally, but proxy requests for /foo/* to a remote server. However, the following rule matches all of the above. What am I doing wrong?
RewriteRule ^/foo/(.+)$ http://remote.host/$1 [P,L]

You need to handle the first two cases explicitly so that they are not sent off to the remote host along with everything else. Try this:
RewriteEngine On
RewriteRule ^foo$ - [L]
RewriteRule ^foo/$ - [L]
RewriteRule ^foo/([a-zA-Z0-9].*)$ http://example.com/$1 [P,L]
The first rule checks whether the request is plainly /foo. If so, stay at home.
The next rule checks whether the request is just /foo/. If so, again, stay local.
The last rule checks whether anything follows the slash. If so, you probably want the remote host, and the request is sent there.

Well, since mod_rewrite normally strips leading slashes from the matched text, I suspect you're either transcribing/anonymizing imperfectly or there's a good deal else going on in your rewrite configuration. That seems further borne out by the impossibility of the pattern /foo/.+ matching /foo.
Can you expand and double-check what you're posting from your rewrite config, so we can see what else might be going on?

I think I got it -- somewhere the default docname is being set to index.php, which is silently being appended to my rewrite.
RewriteLog output:
(2) init rewrite engine with requested uri /foo
(3) applying pattern '^/foo(/.+)+$' to uri '/foo'
(1) pass through /foo
(2) init rewrite engine with requested uri /foo/
(3) applying pattern '^/foo(/.+)+$' to uri '/foo/'
(1) pass through /foo/
(2) init rewrite engine with requested uri /foo/index.php
(3) applying pattern '^/foo(/.+)+$' to uri '/foo/index.php'
(2) rewrite '/foo/index.php' -> 'http://remote.host//index.php'
(2) forcing proxy-throughput with http://remote.host//index.php
(1) go-ahead with proxy request proxy:http://remote.host//index.php [OK]
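One possible way to keep the index.php lookup from being proxied is sketched below. It assumes the rules live in the server or virtual host configuration (as the leading slashes in the log suggest) and that the /foo/index.php request in the log is an internal subrequest issued for the directory index. Under those assumptions, the NS (nosubreq) flag limits the rule to the original request. Capturing only what follows the slash also avoids the double slash seen in the rewritten URL above. This has not been tested against the poster's setup.

```apache
RewriteEngine On
# /foo and /foo/ never match, because (.+) needs at least one character after the slash
# NS: skip this rule for internal subrequests, such as the directory index lookup (assumption)
RewriteRule ^/foo/(.+)$ http://remote.host/$1 [P,NS,L]
```

An explicit request for /foo/index.php from a client would still be proxied, since that is an original request rather than a subrequest.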


htaccess rewrites causing side effects

I have the following in my .htaccess file.
RewriteEngine On
RewriteRule ^([^/]+)?$ /member/profile.php?user=$1 [L]
RewriteRule ^assets(/.*)?$ /member/assets$1 [L]
RewriteRule ^images(/.*)?$ /member/images$1 [L]
RewriteRule ^php(/.*)?$ /member/php$1 [L]
The desired effect is:
https://example.com/username -> https://example.com/member/profile.php?user=$1
This works; however, it has two undesired side effects.
First: https://example.com and https://example.com/ return 404 errors but https://example.com/index.php works just fine.
Second: https://example.com/username/ ends up forwarding to https://example.com/member/php/?user=username and returning a 404 error.
I have also attempted
DirectoryIndex index.htm index.html index.php
But this seems to have no effect on the issue
My actual desired end result would look more like:
https://example.com -> https://example.com/index.php
https://example.com/ -> https://example.com/index.php
https://example.com/username -> https://example.com/member/profile.php?user=$1
https://example.com/username/ -> https://example.com/member/profile.php?user=$1
RewriteRule ^([^/]+)?$ /member/profile.php?user=$1 [L]
First: https://example.com and https://example.com/ return 404 errors but https://example.com/index.php works just fine.
The first rule will catch the request (since it allows an empty URL-path) and will rewrite the request to /member/profile.php?user=. So, presumably it is your script that is triggering the 404?
In fact, it looks like you are missing a slash before ? to match an optional trailing slash (ie. /username or /username/), rather than making the entire pattern optional! ie. ^([^/]+)/?$
You would also need the NS (nosubreq) flag to prevent the subrequest by mod_dir for the DirectoryIndex (ie. index.php) also being caught by this rule. However, this rule is arguably matching too much, as it will also catch direct requests for index.php (and any other files you might have in the root). So, maybe you need to be more restrictive in what characters are allowed in usernames? For example, at a minimum, exclude dots (as well as slashes) with ^([^/.]+)/?$? Or allow only letters and numbers (and underscores), eg. ^(\w+)/?$. (\w is a shorthand character class that represents [0-9a-zA-Z_].)
Note that the first rule will also match assets, images and php - so these are valid usernames. Is that intentional? You could reverse the rules so this does not happen, but you would need to ensure that there are no usernames that match these strings.
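A minimal sketch of the reversed order mentioned above, assuming the username pattern is restricted to word characters and that the substitutions are relative to the document root where this .htaccess lives:

```apache
RewriteEngine On
# Fixed directories first, so requests inside them are never treated as usernames
RewriteRule ^assets(/.*) member/assets$1 [L]
RewriteRule ^images(/.*) member/images$1 [L]
RewriteRule ^php(/.*) member/php$1 [L]
# Anything else made of word characters is a username, with an optional trailing slash
RewriteRule ^(\w+)/?$ member/profile.php?user=$1 [L]
```

With this order a request for assets/ is handled by the first rule, but a bare assets (no trailing slash) still falls through to the profile rule, so the caveat about reserved usernames still applies.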
NB: https://example.com and https://example.com/ are exactly the same request. (The browser effectively appends the slash after the hostname to make a valid HTTP request. See the following question on the Webmasters stack: Is trailing slash automagically added on click of home page URL in browser?)
Second: https://example.com/username/ ends up forwarding to https://example.com/member/php/?user=username and returning a 404 error.
I can't see how that would happen with the directives as posted. None of your rules would match /username/ (with a trailing slash), unless the username is "assets", "images" or "php" - but that still wouldn't result in the stated rewrite? However, /username/ would result in a 404 because nothing actually happens to rewrite the URL!
Your rules should perhaps be written like this instead:
RewriteEngine On
RewriteRule ^(\w+)/?$ member/profile.php?user=$1 [L]
RewriteRule ^assets(/.*) member/assets$1 [L]
RewriteRule ^images(/.*) member/images$1 [L]
RewriteRule ^php(/.*) member/php$1 [L]
The capturing subpattern in rules 2, 3 and 4 is not optional, so I've removed the trailing ?$.
I've also removed the slash prefix on the substitution string, to make it a relative file-path.
Which could also be further "simplified" to:
RewriteEngine On
RewriteBase /member
RewriteRule ^(\w+)/?$ profile.php?user=$1 [L]
RewriteRule ^((assets|images|php)/.*) $1 [L]

redirect a page to an url with #

I would like to redirect a URL (let's say https://www.example.com/de/page-a/) to a URL having a parameter with a # (https://www.example.com/de/page-b/#filter:fields=6).
I can't find the right rule using RewriteRule or Redirect. It always redirects to https://www.example.com/de/page-b/, ignoring the last part.
Can someone help me on that?
Best regards
Redirect ^de/page\-a/$ exemple.com/de/page-b/#filter:fields=51? [L,R=301]
You are mixing up the directives. The mod_alias Redirect directive takes a simple root-relative URL prefix (starting with a slash) as the source URL argument, not a regex. So the above will never match and nothing happens. The [L,R=301] flags belong to the mod_rewrite RewriteRule directive; the Redirect directive has no such flags.
For example:
Redirect 301 /de/page-a/ /de/page-b/#filter:fields=51
You do not need to specify an absolute URL as the target if you are redirecting to the same scheme + hostname.
When redirecting to a fragment identifier (everything after the #) you do need to be careful of redirect loops since the fragid is not passed back to the server. In this case you are OK since you are redirecting to a different URL-path, ie. page-a to page-b. But you could not redirect from page-a to page-a (same URL-path and query string) and simply change the fragid as it will create a redirect loop. For this you would need to use JavaScript.
NB: Test with 302 (temporary) redirect first to avoid caching issues.
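For testing, the same mod_alias directive can be written with a temporary status first. This is only a sketch of that suggestion, using the paths from the example above:

```apache
# Temporary redirect while testing; change 302 to 301 once the behaviour is confirmed
Redirect 302 /de/page-a/ /de/page-b/#filter:fields=51
```

Because Redirect matches by prefix, anything requested below /de/page-a/ is redirected as well, with the extra path appended to the target.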
If you want to implement this using mod_rewrite (ie. RewriteRule) - perhaps if you are already using mod_rewrite - then you could do the following instead:
RewriteEngine On
RewriteRule ^de/page-a/$ /de/page-b/#filter:fields=51 [NE,R=301,L]
Note that there is no slash prefix on the RewriteRule pattern.
The NE flag is required here in order to prevent the # being URL-encoded in the response and treated as part of the URL-path.

Rewrite rule to prevent apache decoding url before reaching htaccess?

We have a htaccess rule like this:
RewriteRule ^(.*)/(.*)/(.*) ../app$1/scripts/api/index.php?fn=$2&$3 [L]
This works fine in most cases, however, Apache decodes the url before it arrives at this rule, so a url like beta/list/&cat=red%20%26%20blue, is seen by htaccess as beta/list/&cat=red & blue so we get cat='red' and blue=null coming into index.php instead of cat='red & blue'.
I've read that the workaround for this issue is to use server variables like %{REQUEST_URI} %{THE_REQUEST} in the htaccess rule as these are not decoded before use, but it's difficult to implement. The question mark in the RewriteRule makes everything go crazy and I can't figure out how to escape it.
Can any experts out there help me fix the rule below to behave like the one above?
RewriteCond %{REQUEST_URI} ^(.*)/(.*)/(.*)
RewriteRule . ../app%1/scripts/api/index.php?fn=%2&%3 [L]
Indeed, the solution is to use the special server-variable called THE_REQUEST.
From mod_rewrite documentation:
THE_REQUEST
The full HTTP request line sent by the browser to the server (e.g.,
"GET /index.html HTTP/1.1"). This does not include any additional
headers sent by the browser. This value has not been unescaped
(decoded), unlike most other variables below.
Here is how your rule should look:
# don't touch urls ending by index.php
RewriteRule index\.php$ - [L]
# user request matching /xxx/xxx/xxx (with optional query string)
RewriteCond %{THE_REQUEST} \s/([^/\?]+)/([^/\?]+)/([^\?]+)(?:\s|\?) [NC]
RewriteRule ^ ../app%1/scripts/api/index.php?fn=%2&%3 [L,QSA]
Please note that you shouldn't use a relative path for an internal rewrite, as it can lead to confusion. Instead, define a RewriteBase, use an absolute path, or start from the domain root with a /.
UPDATE
Since you can have encoded forward slashes in your url, you need to set AllowEncodedSlashes to NoDecode (or On but it's unsafe). Note also that, due to a bug, you must put this directive inside a virtual host context, even if the server config context is said to be OK (otherwise, it is simply ignored). By default, AllowEncodedSlashes is set to Off. So, Apache handles encoded slashes automatically by itself and refuses them, without passing the request to mod_rewrite. See the official documentation here.
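A minimal sketch of where the directive would go, assuming a name-based virtual host on port 80; the ServerName is a placeholder, not taken from the question:

```apache
<VirtualHost *:80>
    # Hypothetical host name, for illustration only
    ServerName example.com
    # Accept encoded slashes (%2F) in the path and leave them encoded
    AllowEncodedSlashes NoDecode
</VirtualHost>
```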

mod_rewrite rule is ignored with Wordpress

I am trying to use mod_rewrite to redirect all requests to a certain directory to a specific page:
RewriteEngine On # Turn on rewriting
RewriteRule /about/(.*) /wp-content/themes/twentyfiteen/test.php
From here I plan to get the requested URI and serve up the appropriate page.
But it seems that this rule does not even get triggered.
Thanks
URIs that are passed through rewrite rules in an htaccess file have the leading slash removed, so you can't match /about/. You need to drop the leading slash from the pattern:
RewriteEngine On
RewriteRule ^about/(.*)$ /wp-content/themes/twentyfiteen/test.php [L]

How to prevent mod_rewrite from rewriting URLs more than once?

I want to use mod_rewrite to rewrite a few human-friendly URLs to arbitrary files in a folder called php (which is inside the web root, since mod_rewrite apparently won't let you rewrite to files outside the web root).
/ --> /php/home.php
/about --> /php/about_page.php
/contact --> /php/contact.php
Here are my rewrite rules:
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^$ php/home.php [L]
RewriteRule ^about$ php/about_page.php [L]
RewriteRule ^contact$ php/contact.php [L]
However, I also want to prevent users from accessing files in this php directory directly. If a user enters any URL beginning with /php, I want them to get a 404 page.
I tried adding this extra rule at the end:
RewriteRule ^php php/404.php [L]
...(where 404.php is a file that outputs 404 headers and a "Not found" message.)
But when I access / or /about or /contact, I always get redirected to the 404. It seems the final RewriteRule is applied even to the internally rewritten URLs (as they now all start with /php).
I thought the [L] flag (on the first three RewriteRules) was supposed to prevent further rules from being applied? Am I doing something wrong? (Or is there a smarter way to do what I'm trying to do?)
The [L] flag should be used only on the last rule.
L (Last Rule) stops the rewriting process at that point and does not apply any more rewriting rules, and that is why you are facing issues.
I had a similar problem. I have a content management system written in PHP and based on the Model-View-Controller paradigm. Its most basic part is mod_rewrite. I've successfully prevented direct access to PHP files globally. The trick is called THE_REQUEST.
What's the problem?
The rewrite module rewrites the URI. If the URI matches a rule, it is rewritten and the remaining rules are applied to the new, rewritten URI. But if the matched rule ends with [L], the engine does not actually terminate; it starts again with the rewritten URI. The new URI then no longer matches the rule ending with [L], so processing continues and it matches the last rule. The result? The programmer starts swearing at the unexpected 404 error page. However, the computer does what you say, not what you want. I had this in my .htaccess file:
RewriteEngine On
RewriteBase /
RewriteRule ^plugins/.* pluginLoader.php [L]
RewriteCond %{REQUEST_URI} \.php$
RewriteRule .* index.php [L]
That's wrong. Even the URIs beginning with plugins/ are rewritten to index.php.
Solution
You need to apply the rule if and only if the original (not rewritten) URI matches it. Regrettably, mod_rewrite does not provide a variable containing the original URI, but it does provide the THE_REQUEST variable, which contains the first line of the HTTP request. This variable is invariant: it doesn't change while the rewrite engine is working.
...
RewriteCond %{THE_REQUEST} \s.*\.php\s
RewriteRule \.php$ index.php [L]
The regular expression is different. It is applied not to the URI alone but to the entire first line of the request, that is, to something like GET /script.php HTTP/1.1. This time the critical rule is applied only if the user explicitly requests a PHP script directly. The rewritten URI is not used.
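Applied to the original question, the same idea might look like the sketch below. It assumes the .htaccess file sits in the web root and uses R=404 to return the server's standard 404 response instead of rewriting to php/404.php; that substitution is a simplification, not something the question asked for.

```apache
Options +FollowSymlinks
RewriteEngine On

# Return 404 only when the client itself requested a URL beginning with /php
RewriteCond %{THE_REQUEST} ^\S+\s+/php
RewriteRule ^php - [R=404,L]

# Friendly URLs; the rewritten php/... paths do not trigger the rule above,
# because THE_REQUEST still holds the original request line
RewriteRule ^$ php/home.php [L]
RewriteRule ^about$ php/about_page.php [L]
RewriteRule ^contact$ php/contact.php [L]
```

If the custom page is wanted, an ErrorDocument 404 directive pointing at it could be combined with the rule, though that is outside what the answers above describe.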