Whitelist in .htaccess - apache

Instead of blacklisting inaccessible directories (like with deny all) I want to use a whitelist. Basically, I need this functionality:
1. If the URI requests a file that exists in /public directory, display it;
2. Otherwise route the request to /public/index.php;
3. 'public' string is not needed in request string: http://site.com/flower.jpg displays DOCUMENT_ROOT/public/flower.jpg file from the file system;
Example:
Directory structure:
public\
    flower.jpg
    index.php
data\
    secret_file.crt
Request string and expected result:
site.com/flower.jpg
    flower.jpg is displayed
site.com/data/secret_file.crt
site.com/public/flower.jpg
site.com/public
site.com/data
site.com/any/random_url
    request is routed to public/index.php (all five requests above)
What I have now:
(and even that with outside help)
# the functionality described in #1 above
RewriteCond %{DOCUMENT_ROOT}/public%{REQUEST_URI} -f
RewriteRule .* public%{REQUEST_URI} [L]
# I'd like to take out the following line so ALL other requests route to index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* public/index.php
If I remove the
RewriteCond %{REQUEST_FILENAME} !-f
line, it ceases to work. I've experimented with countless configurations and read the mod_rewrite docs, but can't figure out why this simple thing refuses to simply function.
Can anyone help me out or point in the right direction?
Complete final solution for reference
RewriteEngine On
# following line stops mod_rewrite from looping because this rule has already been applied
RewriteCond %{REQUEST_URI} !^/public/index.php
RewriteCond %{DOCUMENT_ROOT}/public%{REQUEST_URI} -f
RewriteRule .* /public%{REQUEST_URI} [L]
# don't apply this rule if the first rule has been applied
RewriteCond %{REQUEST_URI} !^/public/
RewriteRule .* /public/index.php [L]
It's a little more complicated when the application is in a subdirectory, like http://site.com/uk/, but this works great.

OK, this is going to be a little confusing to explain. The problem you are having is that when mod_rewrite rewrites something in a per-directory (.htaccess) context without the [R] or [P] flag, it redirects internally, and all the rewrite rules get applied again. This keeps happening until the rewritten URI is the same as the un-rewritten URI. So the result of your first rule is getting rewritten by the second rule. You need to prevent that from happening.
First, let's look at the first rule. What you had is totally fine, except that you need to add a condition for the caveat that site.com/public/flower.jpg must be rerouted to public/index.php. This means that if the request itself starts with /public/, the first rule will not serve it (and the 2nd rule will handle it instead). An additional caveat is that a directory named "public" inside "/public", as in DOCUMENT_ROOT/public/public/, will be inaccessible.
# Make sure the request itself isn't for /public/
RewriteCond %{THE_REQUEST} !^[A-Z]+\ /public/
# Make sure the filename exists.
RewriteCond %{DOCUMENT_ROOT}/public%{REQUEST_URI} -f
RewriteRule ^ /public%{REQUEST_URI} [L]
Here we've added an extra check for a request line starting with something like GET /public/flower.jpg; if it matches, we skip this rule entirely. Also, this rule will break if you try to access a directory in /public/. For example, if you have a directory "stuff" inside "/public" and try to access it via the request site.com/stuff/, this rule will not allow you to see the contents (even if there is an index.html file in /stuff/) because it does not check whether directories exist. You can fix that by adding a condition for -d, like so:
# Make sure the request itself isn't for /public/
RewriteCond %{THE_REQUEST} !^[A-Z]+\ /public/
# Make sure the filename/directory exists.
RewriteCond %{DOCUMENT_ROOT}/public%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}/public%{REQUEST_URI} -d
RewriteRule ^ /public%{REQUEST_URI} [L]
The -d condition along with the [OR] of the -f says: if %{DOCUMENT_ROOT}/public%{REQUEST_URI} is a regular file OR a directory. (See the RewriteCond docs)
Now for the second rule, and this is going to look a bit confusing because we have to handle the negation of the first rule's conditions. If the first rule passes and the URI is rewritten, 2 things are true:
1. The request doesn't start with something like: GET /public/
2. The URI got rewritten to "/public/[something]"
So we'll have 2 conditions to deal with that. If the first rule rewrote the URI, we don't want to touch it again. This solves the problem that I mentioned in the first paragraph. Additionally, we don't want the URI to get re-rewritten, causing a rewrite loop. So we need to add a condition to stop rewriting if the 2nd rule has already been applied, which means the URI is now /public/index.php. Here is the combination of those conditions:
# stops mod_rewrite from looping because this rule has already been applied
RewriteCond %{REQUEST_URI} !^/public/index.php
# don't apply this rule if the first rule has been applied
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /public/ [OR]
RewriteCond %{REQUEST_URI} !^/public/
RewriteRule ^ /public/index.php [L]
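A possible alternative, as a minimal sketch: instead of testing for /public/ in each rule, the loop described above can be cut by skipping all rules once the request has already been rewritten internally. This assumes that the .htaccess file sits in DOCUMENT_ROOT and that the REDIRECT_STATUS environment variable is empty on the original request and set after an internal redirect, which is the usual behaviour but worth verifying on your server.

```apache
RewriteEngine On

# Assumption: REDIRECT_STATUS is non-empty only after an internal redirect,
# so on the second pass nothing is rewritten and the loop ends.
RewriteCond %{ENV:REDIRECT_STATUS} !^$
RewriteRule ^ - [L]

# Serve the file if it exists under /public
RewriteCond %{DOCUMENT_ROOT}/public%{REQUEST_URI} -f
RewriteRule ^ /public%{REQUEST_URI} [L]

# Everything else goes to the front controller
RewriteRule ^ /public/index.php [L]
```

With this guard, a request for site.com/public/flower.jpg is checked against DOCUMENT_ROOT/public/public/flower.jpg, which does not exist in the example layout, so it falls through to public/index.php.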

This may work:
RewriteCond %{DOCUMENT_ROOT}/public%{REQUEST_FILENAME} -f [OR]
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f
RewriteRule (.*) public$1 [QSA,L]
RewriteRule .* public/index.php
The optimized version may work too but I'm not sure:
RewriteCond %{DOCUMENT_ROOT}(/public|public|)%{REQUEST_FILENAME} -f
RewriteRule (.*) public$1 [QSA,L]
RewriteRule .* public/index.php
By the way your logic is weird: the following rule:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* public/index.php
Means: "if the request is not a file, rewrite to public/index.php". The problem is here: if it's a file, what's going on? Nothing. The RewriteRule is ignored. This is not safe, imagine if it's a file that you may not want the user to access? Just remove this rule, it's useless, and without it, it's safer (from my point of view).
May I ask you to tell me if the optimized version worked?
Please try to use the RewriteLog directive: it helps you to track down such problems:
# Trace:
# (!) file gets big quickly, remove in prod environments:
RewriteLog "/web/logs/mywebsite.rewrite.log"
RewriteLogLevel 9
RewriteEngine On
Tell me if it works.
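Note that RewriteLog and RewriteLogLevel were removed in Apache 2.4. A rough equivalent there, as a sketch, is to raise the log level for mod_rewrite only. The trace level 3 is an arbitrary choice (levels run from trace1 to trace8), and the directive belongs in the server or virtual host configuration rather than in .htaccess:

```apache
# Apache 2.4+: rewrite tracing is written to the regular error log
# (!) verbose, remove in prod environments
LogLevel alert rewrite:trace3
```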

I'm a bit confused by your first set of rules, since %{REQUEST_URI} would be /public/flower.jpg if I'm not mistaken. I would have done it this way:
RewriteCond public/%{REQUEST_FILENAME} -f
RewriteRule ^.*$ public/%{REQUEST_FILENAME} [L]
RewriteCond public/%{REQUEST_FILENAME} !-f
RewriteRule ^.*$ public/index.php [L]
I'm not sure of the behaviour if %{REQUEST_FILENAME} is empty, but basically the rules say:
If the filename exists in public, rewrite the URI to that file; if it does not, rewrite to index.php.
Would that work for you?

Have you considered programmatically creating your .htaccess file to blacklist anything that isn't on a whitelist that you set in whatever file you use to create it? If you ask me, you can't get much simpler.

Related

.htaccess 301 redirect a directory, not working

I need to 301 redirect a directory and the rewritten URLs it contains:
/old-directory/any-url.html 301-redirected to /new-directory/any-url.html
and
/old-directory/ 301-redirected to /new-directory/
Inside that directory (moved to /new-directory), I have this directory-specific .htaccess with these rules that I need to keep:
RewriteEngine On
RewriteRule ^list\.html$ list.php [QSA]
RewriteRule ^(.*)\.html$ index.php?hash=$1 [QSA,L]
Maybe it's not necessary to specify that, but on the root directory, I have this root .htaccess with rules that I don't want to interfere with:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([a-zA-Z0-9-]*)$ index.php?page=$1 [QSA,L]
I tried to add this in the root .htaccess:
RewriteRule ^old-directory$ /new-directory/ [R=301,NC,L]
RewriteRule ^old-directory/(.*)$ /new-directory/$1 [R=301,NC,L]
...but it does not work (I get a 500 internal server error).
Have you got an idea why? Since I successfully tested this rule on a .htaccess testing site, I guess it's because it interferes with the other /new-directory/ specific .htaccess... In that case, would it be better to merge the two .htaccess files into one, and how?
The issue is that, even when you fix your 500 error, your base rewrite rule will always take precedence because old-directory matches RewriteRule ^([a-zA-Z0-9-]*)$. So make sure the order is correct and place your old-directory rules BEFORE the existing ones:
RewriteEngine On
RewriteRule ^old-directory(.*)$ /new-directory$1 [R=301,NC,L]
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([a-zA-Z0-9-]*)$ index.php?page=$1 [QSA,L]
You'll notice I also simplified your rule. There's no reason you should have 2 lines for that.
Don't forget that once redirected to new-directory, RewriteRule ^([a-zA-Z0-9-]*)$ (from the root dir) will still apply, so you might need to change it to:
RewriteCond %{REQUEST_FILENAME}.php -f
# %{REQUEST_URI} always starts with a slash
RewriteCond %{REQUEST_URI} !^/new-directory
RewriteRule ^([a-zA-Z0-9-]*)$ index.php?page=$1 [QSA,L]
%{REQUEST_FILENAME}.php also looks a bit suspicious. You probably only need %{REQUEST_FILENAME}
In the end, you need to sort the 500 error first by looking at the Apache error logs, and then reorganise your rules so they don't clash. You could potentially merge them. It's actually easier to merge them to control their order. Otherwise, they just cascade and it can become fairly complex.

Complex(?) htaccess rewriting / redirecting

It seems every few weeks I have to ask more .htaccess rewriting/redirecting questions. Every time I think I understand it, another wrench gets thrown into my project that shows that I don't.
EDIT: My original question wasn't very clear so the following is an attempt to be more concise.
As it stands, all of the .html files live in the root directory. eg: http://example.com/about.html
There aren't any sub-directories with the exception of normal ones like img, css, etc.
For tracking purposes, if someone types in http://example.com/random/ where "random" can be any string of characters, I'd want them to see the index.html file, without modifying the url. The directory "random" doesn't actually exist on the server at all.
The same goes for other pages like about.html. If someone types in http://example.com/random/about.html I'd want them to see the about.html page.
At the same time, I'd like http://example.com/random/about or http://example.com/about (missing file extension) to also show the about page.
However, if someone typed in a page that doesn't exist, I'd like for it to use the ErrorDocument
Example: I don't have a file named "pickups.html" so the following would all be 404s:
http://example.com/pickups.html
http://example.com/pickups
http://example.com/random/pickups.html
http://example.com/random/pickups
It would be nice if the end redirect/rewrite did have the file extension stripped off (because it looks nicer).
My thoughts are that any request ending with a / would just serve up the index.html file that exists at the site root. So that leaves the files.
My thought process is:
1. strip the file extension off of the request
2. check if that file with an extension exists at site root
3. if yes, display that page.
4. if no, 404.
My initial code (had help on it) was this:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)/(.*)$ /$2 [R=301,L]
I understand that in that code I'm grabbing everything after the last slash and serving it from the document root. Unfortunately, it doesn't account for files that do not exist.
Starting with existing files, they will be passed through unchanged. This also prevents rewrite loops.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
Next are existing files, requested as part of an optional, virtual subdirectory
RewriteCond %{DOCUMENT_ROOT}/$2 -f
RewriteRule ^(.+/)?(.+)$ /$2 [L]
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+)$ /$2.html [L]
This splits the request into an optional prefix (.+/)? and the file part. If this file part exists, maybe with an appended .html, you're done.
Next comes anything with a trailing slash, just rewrite to index.html
RewriteRule /$ /index.html [L]
Anything else will be requests for non-existing files, which yield a 404 status code.
In order to remove an optional .html extension and remove an optional trailing slash / for existing files, we must insert two rules at the beginning
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)\.html/?$ /$1$2 [R,L]
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)/$ /$1$2 [R,L]
These rules are similar to the other rules, except they do a redirect (the R flag) instead of a rewrite, and have an additional condition to prevent a rewrite loop.
Putting everything together gives
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)\.html/?$ /$1$2 [R,L]
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)/$ /$1$2 [R,L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
RewriteCond %{DOCUMENT_ROOT}/$2 -f
RewriteRule ^(.+/)?(.+)$ /$2 [L]
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+)$ /$2.html [L]
RewriteRule /$ /index.html [L]

RewriteCond Being Ignored?

I am trying to use mod_rewrite on a Ubuntu 12.04 server to make my URLs more readable, however I want to add an exception for images and css files.
My input URLs are in the format \controller\action which is then re-written to index.php?controller=controller&action=action. I want to add an exception so that if an image or css file is specified, the URL is not re-written, e.g. \images\image.jpg would not be re-written.
My .htaccess code is as follows:
RewriteEngine on
RewriteCond %{REQUEST_URI} !(\.gif|\.jpg|\.png|\.css)$ [NC]
RewriteRule ^([a-zA-z]+)/([a-zA-z]+)$ test.php?controller=$1&action=$2 [L]
RewriteRule ^([a-zA-z]+)/([a-zA-z]+)/([^/]*)$ test.php?controller=$1&action=$2&$3 [L]
My re-write code is working fine and the URLs are coming out as intended, however even if I request an image, the URL is still being re-written. It appears that my RewriteCond is being ignored, anyone any suggestions as to why this might be?
The RewriteCond only applies to your first RewriteRule; it must be repeated for the second rule. However, I think it is better to add a non-rewriting rule before them that excludes anything which physically exists.
# Do nothing for files which physically exist
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule .* - [L]
# your MVC rules ([a-zA-Z] rather than [a-zA-z], which would also match [ \ ] ^ _ and `)
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)$ test.php?controller=$1&action=$2 [L]
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)/([^/]*)$ test.php?controller=$1&action=$2&$3 [L]
A RewriteCond applies only to the RewriteRule that immediately follows it.
So you need to at least repeat the RewriteCond for your second RewriteRule.
Now, there are certainly better things to do.
For example, a usual way of doing it is to test whether the URL matches a real static resource. If all your PHP code is outside the web directory (in a libraries directory, except for index.php), then the only static resources available directly under the document root are JS files, CSS files, or image files.
So this is the usual way of doing it:
RewriteEngine on
# [a-zA-Z] rather than [a-zA-z], which would also match [ \ ] ^ _ and `
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)$ test.php?controller=$1&action=$2 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)/([^/]*)$ test.php?controller=$1&action=$2&$3 [L]
But this is a starting point. We could certainly find something to avoid doing 2 rules for this (maybe I'll have a look later)
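One way the two rules might be folded into one, as a sketch rather than a tested drop-in: make the third path segment optional with a non-capturing group. It assumes the same test.php entry point as above. When the third segment is absent, $3 expands to an empty string, so the query string ends with a harmless trailing "&".

```apache
RewriteEngine On
# Skip requests for files that physically exist (images, CSS, ...)
RewriteCond %{REQUEST_FILENAME} !-f
# The third path segment is optional; $3 is empty when it is absent
RewriteRule ^([a-zA-Z]+)/([a-zA-Z]+)(?:/([^/]*))?$ test.php?controller=$1&action=$2&$3 [L]
```

Because only one RewriteRule remains, the single RewriteCond now covers both URL shapes.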

Is it possible to chain RewriteCond in htaccess?

I am going to be doing some basic %{HTTP_HOST} work in my .htaccess file and was wondering if it would be possible to do something similar to this:
RewriteCond %{HTTP_HOST} ((foo|bar|baz).com)$
RewriteCond %{DOCUMENT_ROOT}/apps/%1/webroot%{REQUEST_URI} -d [OR]
RewriteCond %{DOCUMENT_ROOT}/apps/%1/webroot%{REQUEST_URI} -f
RewriteRule %{DOCUMENT_ROOT}/apps/%1/webroot%{REQUEST_URI} [L]
RewriteRule ^(.*)$ index.php?uri=$1 [QSA,L]
basically, if someone visits foo.com on any sub-domain, I want them to be served files directly from that folder but also have any requests that aren't for specific files sent to my index.php file for processing (which will do the routing)
The reason I am asking is because what I have written above does not actually work, so is there a way to do it? (also if this SHOULD work then it'll obviously be a problem with the rest of my .htaccess file, but it all works when dealing with just one application folder)
The other (messy IMO) way would be to route everything to the folders and have a second .htaccess file for each domain, but I'd rather not do this if it can be done in one file!
Your first rewrite rule is missing a regex pattern, which makes the rewrite engine think that %{DOCUMENT_ROOT}/apps/%1/webroot%{REQUEST_URI} is the regular expression you are trying to match and that [L] is the target you want to rewrite to. I suspect you want it to look like:
RewriteRule ^ /apps/%1/webroot%{REQUEST_URI} [L]
The ^ matches anything, since you've already vetted the request with your 3 conditions.
Now you need to add a few conditions to your last rule so that it doesn't blindly get applied to everything. You probably want something like:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?uri=$1 [QSA,L]
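Put together, the two corrected rules might look like the following sketch. It assumes the .htaccess file is in the document root and that the apps/<domain>/webroot layout from the question exists. The dot in the host pattern is escaped here, which the original did not do.

```apache
RewriteEngine On

# Serve existing files/directories from the per-domain webroot.
# %1 is the host captured by the first condition, e.g. foo.com
RewriteCond %{HTTP_HOST} ((foo|bar|baz)\.com)$
RewriteCond %{DOCUMENT_ROOT}/apps/%1/webroot%{REQUEST_URI} -d [OR]
RewriteCond %{DOCUMENT_ROOT}/apps/%1/webroot%{REQUEST_URI} -f
RewriteRule ^ /apps/%1/webroot%{REQUEST_URI} [L]

# Everything that is not an existing file or directory goes to the router
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?uri=$1 [QSA,L]
```

On the second pass the rewritten path already points at a real file or directory, so the last rule leaves it alone.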

RewriteCond strange behavior (file exists check)

I can't understand why redirect depends on RewriteRule (not on RewriteCond).
My .htaccess:
Options +FollowSymLinks +SymLinksIfOwnerMatch
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ true.txt
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ false.txt
</IfModule>
Root folder contains:
true.txt (contains 'true')
false.txt (contains 'false')
test.txt (contains 'test')
If I try to open test.txt I get true, and if I try to open nonexist.txt I get true too.
Now I change my .htaccess:
...
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ $1
...
And now if I try to open test.txt I get test, and if I try to open nonexist.txt I get false.
UPDATE: Thanks for answers, I understood how it works but one problem still exists.
If I try to check 'if file exists' in another directory it always returns false.
/files/test.txt
/script/.htaccess
/script/false.txt
/script/true.txt
now my .htaccess looks like
RewriteCond %{REQUEST_FILENAME} .*(true|false).*$
RewriteRule .* - [S=2]
RewriteCond %{DOCUMENT_ROOT}/files/%{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ true.txt [L]
RewriteCond %{DOCUMENT_ROOT}/files/%{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ false.txt [L]
I always get false.
I also tried RewriteCond ../files/%{REQUEST_FILENAME} and also always got a false result.
If I move test.txt into the script folder and change the condition to RewriteCond %{REQUEST_FILENAME}, everything works fine.
It's because of the way mod_rewrite works: the user requests nonexist.txt, mod_rewrite catches the request and rewrites the URI to false.txt, then it makes a second pass, by sending an internal request for false.txt, which is caught and rewritten to true.txt (because false.txt is an existing file). Then a third pass is made, the request is caught and rewritten to true.txt, but since the URI stays the same, no more passes are made.
It's rather counter-intuitive, but there's logic to it. Here's the control flow diagram from the docs:
[Figure: mod_rewrite control flow diagram from the Apache documentation]
The [L] flag is often advertised as a magic bullet to stop the recursion, but in fact it just ensures that once a request matches a pattern, then the execution stops and no further processing will take place in that pass, but the internal request will be sent out anyhow, so a second pass is made through the same ruleset. The execution stops only if the URI is unchanged after a pass.
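Since Apache 2.3.9 there is also the [END] flag, which, unlike [L], stops rewriting for good, including the later passes in .htaccess context. A minimal sketch for the true.txt/false.txt experiment, assuming Apache 2.4 and the .htaccess file in the same folder as the three text files:

```apache
RewriteEngine On
# Existing file: answer with true.txt and stop all further rewriting
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ true.txt [END]
# Anything else: answer with false.txt; END prevents the second pass
# that would otherwise turn false.txt into true.txt
RewriteRule ^ false.txt [END]
```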
re: update
Your problem is, the REQUEST_FILENAME environmental variable actually holds a path (by default the full filesystem path, but there are a few twists to that), so %{DOCUMENT_ROOT}/files/%{REQUEST_FILENAME} ends up being something horrible.
As for a solution... well, it's tricky, I think. It'd be a lot easier if the .htaccess were in root. The only solution I can think of right now is:
RewriteEngine on
RewriteCond %{REQUEST_URI} script/(.*)$
RewriteCond %{DOCUMENT_ROOT}/files/%1 -f
RewriteRule .* true.txt [L]
RewriteCond %{REQUEST_URI} !(true.txt)|(false.txt)
RewriteRule .* false.txt [L]
It's rather ugly, and not very scalable or portable. In the first condition I get the file's name, in the second I check if it exists, and if it does, it's true. Everything else is false. Then again, if the files directory is also in the scope of the .htaccess, it's easier and nicer by magnitudes.
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{REQUEST_URI} !(true|false)\.txt$
RewriteRule .* true.txt [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* false.txt [L]
Note the second RewriteCond, which prevents the true.txt and false.txt files themselves from being rewritten, and the L flag on the rules, which stops rule execution for the current pass.
Together these prevent a rules loop.
UPDATE:
%{REQUEST_FILENAME} is a full path, hence if you append it to some other path, the test fails (it will essentially try to match something like /var/www/subfolder/var/www/filename.txt).
To match a file in another folder, you will need to match against part of the URI instead.
Here's how you can do it:
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_URI} ^/([^/]+)$
RewriteCond %{DOCUMENT_ROOT}/files/%1 -f
RewriteRule .* files/$0 [L]
The first condition checks whether the request was for some filename in the root directory (it checks that the URI starts with a / but does not contain any more slashes).
Note that the first condition encloses everything except the leading slash in parentheses; this matched subpattern is used later.
The second condition ensures that the file whose name is saved in subpattern %1 (matched by the first condition) exists in the subfolder files/ inside %{DOCUMENT_ROOT}.
If both conditions match, the request is rewritten to that file (internally; the browser is not redirected).
Instead of using "RewriteCond %{REQUEST_FILENAME} !-f" you can try:
"RewriteCond %{REQUEST_URI} !-U", which checks whether the address exists (by means of an internal subrequest).
Sometimes the file path and the address where the file is served are different, making the former unusable.
Example:
RewriteEngine On
# -U expects a URL path such as %{REQUEST_URI}; %{THE_REQUEST} holds the whole request line
RewriteCond %{REQUEST_URI} !-U
RewriteRule ^(.*/media/.*)\.(gif|png|jpe?g)$ https://xyz.company.com$1.$2 [NC,L,R=301]