Complex(?) htaccess rewriting / redirecting - apache

It seems every few weeks I have to ask more .htaccess rewriting/redirecting questions. Every time I think I understand it, another wrench gets thrown into my project that shows that I don't.
EDIT: My original question wasn't very clear so the following is an attempt to be more concise.
As it stands, all of the .html files live in the root directory. eg: http://example.com/about.html
There aren't any sub-directories with the exception of normal ones like img, css, etc.
For tracking purposes, if someone types in http://example.com/random/ where "random" can be any string of characters, I'd want them to see the index.html file, without modifying the url. The directory "random" doesn't actually exist on the server at all.
The same goes for other pages like about.html. If someone types in http://example.com/random/about.html I'd want them to see the about.html page.
At the same time, I'd like http://example.com/random/about or http://example.com/about (missing file extension) to also show the about page.
However, if someone typed in a page that doesn't exist, I'd like it to use the ErrorDocument.
Example: I don't have a file named "pickups.html" so the following would all be 404s:
http://example.com/pickups.html
http://example.com/pickups
http://example.com/random/pickups.html
http://example.com/random/pickups
It would be nice if the end redirect/rewrite did have the file extension stripped off (because it looks nicer).
My thoughts are that any request ending with a / would just serve up the index.html file that exists at the site root. So that leaves the files.
My thought process is:
strip the file extension off of the request
check if that file with an extension exists at site root
if yes, display that page.
if no, 404.
My initial code (had help on it) was this:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)/(.*)$ /$2 [R=301,L]
I understand that in that code I'm grabbing everything after the last slash and serving it from the document root. Unfortunately, it doesn't account for files that do not exist.

Starting with existing files, they will be passed through unchanged. This also prevents rewrite loops.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
Next come existing files that are requested through an optional, virtual subdirectory:
RewriteCond %{DOCUMENT_ROOT}/$2 -f
RewriteRule ^(.+/)?(.+)$ /$2 [L]
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+)$ /$2.html [L]
This splits the request into an optional prefix (.+/)? and the file part. If this file part exists, maybe with an appended .html, you're done.
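To see how that pattern divides a request, here is a small Python sketch. It assumes Python's re module treats this particular pattern the same way mod_rewrite's regex engine does, and the sample paths are made up for illustration. One difference to keep in mind: an unmatched optional group is None in Python, whereas mod_rewrite expands it to an empty string, so the sketch normalizes it.

```python
import re

# Same pattern as in the RewriteRule. In .htaccess context the path
# is matched without its leading slash.
SPLIT = re.compile(r'^(.+/)?(.+)$')

def split_request(path):
    """Return (prefix, file_part), i.e. what $1 and $2 would hold."""
    m = SPLIT.match(path)
    if m is None:
        return None
    # mod_rewrite expands an unmatched group to '', Python gives None.
    return (m.group(1) or '', m.group(2))

for sample in ('about', 'random/about.html', 'a/b/about'):
    print(sample, '->', split_request(sample))
```

Because the prefix group is greedy, everything up to the last slash ends up in the prefix and only the final path segment is tested against the document root.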
Next comes anything with a trailing slash, just rewrite to index.html
RewriteRule /$ /index.html [L]
Anything else will be requests for non-existing files, which yield a 404 status code.
In order to remove an optional .html extension and remove an optional trailing slash / for existing files, we must insert two rules at the beginning
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)\.html/?$ /$1$2 [R,L]
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)/$ /$1$2 [R,L]
These rules are similar to the other rules, except that they issue an external redirect (the R flag) instead of an internal rewrite, and they have an additional condition to prevent a rewrite loop.
Putting everything together gives
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)\.html/?$ /$1$2 [R,L]
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+?)/$ /$1$2 [R,L]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
RewriteCond %{DOCUMENT_ROOT}/$2 -f
RewriteRule ^(.+/)?(.+)$ /$2 [L]
RewriteCond %{DOCUMENT_ROOT}/$2.html -f
RewriteRule ^(.+/)?(.+)$ /$2.html [L]
RewriteRule /$ /index.html [L]
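As a rough way to reason about the order of these rules, the following Python sketch models the decision chain for an initial request. It is only a simplified model, not how Apache evaluates anything internally: the function name resolve and the files set are hypothetical, the set stands in for the -f checks against the document root, and query strings, DirectoryIndex handling of the bare root URL, and the second internal pass are left out.

```python
import re

def resolve(path, files):
    """Model the ruleset for an initial request.

    path  -- request path without the leading slash
    files -- set of file paths that exist under the document root
    Returns ('redirect', url), ('serve', url) or ('not_found', None).
    """
    # Rules 1 and 2: external redirect that strips ".html" or a trailing
    # slash, but only if the corresponding .html file exists.
    for pattern in (r'^(.+/)?(.+?)\.html/?$', r'^(.+/)?(.+?)/$'):
        m = re.match(pattern, path)
        if m and m.group(2) + '.html' in files:
            return ('redirect', '/' + (m.group(1) or '') + m.group(2))

    # Rule 3: existing files pass through unchanged.
    if path in files:
        return ('serve', '/' + path)

    # Rules 4 and 5: drop the virtual prefix, try the file part as is,
    # then with ".html" appended.
    m = re.match(r'^(.+/)?(.+)$', path)
    if m:
        if m.group(2) in files:
            return ('serve', '/' + m.group(2))
        if m.group(2) + '.html' in files:
            return ('serve', '/' + m.group(2) + '.html')

    # Rule 6: anything with a trailing slash gets index.html.
    if path.endswith('/'):
        return ('serve', '/index.html')

    # Nothing matched: Apache would answer with the 404 ErrorDocument.
    return ('not_found', None)

site = {'index.html', 'about.html'}
for request in ('random/about.html', 'random/about', 'random/', 'random/pickups'):
    print(request, '->', resolve(request, site))
```

In this model a request for random/about.html is first redirected to /random/about, and the follow-up request for random/about is then served from /about.html, which matches the behaviour asked for in the question.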

Related

How to have fake short urls but let apache get files from old directory?

We have a tool that unfortunately has to be set up in a subfolder - so it can't be moved to the root without a lot of effort. Even if it could, we would have unsightly URLs with parameters attached.
The current setup is something like this:
https://example.com/subfolder1/subfolder2/file.php?page=home
Nicer would be:
https://example.com/home/
Now we have the following RewriteRules:
RewriteEngine On
# www
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ %{REQUEST_SCHEME}://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
# add slash to the end
#RewriteRule ^(.*)([^/])$ /$1$2/ [L,R=301]
RewriteRule ^home/$ /subfolder1/subfolder2/file.php?page=home&param1=foo&param2=bar
RewriteRule ^contact/$ /subfolder1/subfolder2/file.php?page=contact&param1=foo&param2=bar
RewriteRule ^appointments/$ /subfolder1/subfolder2/file.php?page=appointments&param1=foo&param2=bar
So the URL https://example.com/home/ is currently "working": the content of the page is there, but of course all relatively referenced resources (images, JS, CSS, ...) now lead nowhere because e.g. https://example.com/home/images/ or https://example.com/home/js/ don't exist.
We have tried several other rules to tell Apache "show /home/ but take all other files from /subfolder1/subfolder2/". A few times the whole page was not reachable.
Currently these are the last lines in the htaccess:
RewriteCond %{DOCUMENT_ROOT}/subfolder1/subfolder2/%{REQUEST_FILENAME} -f
RewriteRule (.*) %{DOCUMENT_ROOT}/subfolder1/subfolder2/$1 [L]
What is wrong here? How should it be correct? And why? :)

htaccess rewrite not working for multiple files in main directory

The title does not fully describe the issue, but I have rewrite rules setup to go to three different files which exist in the main directory: api.php, admin.php, and index.php
Here is my .htaccess
RewriteEngine on
RewriteCond $1 ^(api)
RewriteRule ^(.*)$ /api.php/$1 [L]
RewriteCond $1 ^(admin)
RewriteRule ^(.*)$ /admin.php/$1 [L]
RewriteCond $1 !^(index\.php|admin\.php|api\.php|admin|api|_|robots\.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]
For /admin and /api I get a 500 Internal Server Error. I am not sure why that happens, yet if I put those php files within a folder like /_ and edit the .htaccess to match it, then it rewrites without an error. Am I limited in the number of main directory file redirects I can do? Or am I missing something?
My main goal is:
Redirect all /api requests to /api.php/whatever/is/after/here
Redirect all /admin requests to /admin.php/whatever/is/after/here
Redirect all other requests apart from the exceptions to /index.php/whatever/is/here
Try with:
RewriteEngine on
# skip all files and directories from rewrite rules below
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]
RewriteRule ^(api.*)$ /api.php/$1 [L]
RewriteRule ^(admin.*)$ /admin.php/$1 [L]
RewriteCond $1 !^(index\.php|admin\.php|api\.php|admin|api|_|robots\.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]
I found that I simply had to be more specific in my regex match. Previously I was matching anything that started with "admin" or "api" (e.g. admins-at-the-new-school), which I was unaware of and which would have caused problems in the future anyway. I changed my regex so that it now matches only if "admin" or "api" is followed by the end of the line, a slash, a question mark, or a pound sign (based on testing).
Here is my final code:
RewriteEngine on
RewriteRule ^(api(?=$|[/?#]).*)$ /api.php/$1 [L]
RewriteRule ^(admin(?=$|[/?#]).*)$ /admin.php/$1 [L]
RewriteCond $1 !^(index\.php|admin\.php|api\.php|_|robots\.txt)
RewriteRule ^(.*)$ /index.php/$1 [L]
I appreciate Croises' answer, and it did give me an idea of what might be going wrong and why his solution would work. However, this is what I was looking for, as I did not want to open up access to files and directories simply because they existed.
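The effect of the lookahead can be checked outside Apache. Below is a minimal Python sketch using the same pattern as the api rule; the helper name rewrite_api and the sample paths are made up for illustration. Note that the pattern of a RewriteRule is matched against the URL path only, so in practice the end-of-string and slash alternatives are the ones that matter.

```python
import re

# Same pattern as the api rule: "api" must be followed by the end of
# the string or by one of / ? #.
API_RULE = re.compile(r'^(api(?=$|[/?#]).*)$')

def rewrite_api(path):
    """Return the rewritten target, or None if the rule does not apply."""
    m = API_RULE.match(path)
    if m is None:
        return None
    return '/api.php/' + m.group(1)

for sample in ('api', 'api/users/5', 'apiary', 'api-docs'):
    print(sample, '->', rewrite_api(sample))
```

Paths such as apiary or api-docs fall through to the later rules instead of being sent to api.php.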

Apache Rewrite urls - remove .html and return 404 if .html is present

I am adding a directory to a website that is served with Apache 2 that I want to drop the .html extension from incoming requests. In /new-directory I have a .htaccess file containing:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^\.]+)$ $1.html [NC,L]
So with this rule /new-directory/page works, but /new-directory/page.html also works, which I don't want. I want all pages in new-directory/ and sub-directories to only serve pages without .html, and return a 404 not found if a page.html request comes in.
These are new pages so I don't care about redirecting.
Edit:
Forgot to mention that there is only one file in /new-directory (/new-directory/dhandler) - a Perl script that parses the incoming url if there is a matching database entry. There are no files to match so I can drop that condition.
Figured it out. It was my fault that I didn't explain that there is only one default file handler in /new-directory, which led to some confusion (see the edit above). Anyway, this worked:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^\.]+)$ $1.html [NC,L]
RewriteCond %{THE_REQUEST} .*\.html[\s\?]{1}
RewriteRule .*\.html$ - [R=404,L]
What messed me up was that sometimes these urls will have query strings so I have to check for both space and ? to match .html in %{THE_REQUEST}.
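To illustrate why both characters are needed, here is a small Python sketch that applies the same pattern to request lines of the kind %{THE_REQUEST} holds (method, raw URL including any query string, protocol). The helper name and the sample request lines are hypothetical. RewriteCond patterns are unanchored, which is why the sketch uses re.search.

```python
import re

# Same pattern as in the RewriteCond above.
HTML_IN_REQUEST = re.compile(r'.*\.html[\s\?]{1}')

def asks_for_html(request_line):
    """True if the original request line names a .html URL."""
    return HTML_IN_REQUEST.search(request_line) is not None

samples = (
    'GET /new-directory/page.html HTTP/1.1',       # .html followed by a space
    'GET /new-directory/page.html?id=3 HTTP/1.1',  # .html followed by ?
    'GET /new-directory/page HTTP/1.1',            # no extension
)
for line in samples:
    print(asks_for_html(line), line)
```

Matching against the original request line rather than the current URL is what keeps the internally rewritten page.html from being rejected.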
EDIT: If someone has actual .html files in a directory (instead of one file that dynamically handles all requests like I do), then they should probably add:
RewriteCond %{REQUEST_FILENAME} -f
right after 'RewriteEngine On' to make sure that the incoming request matches an existing file in /new-directory.
You need a couple of rules to do this:
RewriteBase /
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.*) $1.html [NC,L]
RewriteCond %{THE_REQUEST} .*\.html\s
RewriteRule .*\.html$ - [R=404,L]
The first one checks that there is actually an html file that corresponds to the request. If there is, it will internally rewrite the request to that file.
The second rule answers any request whose original URL ends in .html with a 404 Not Found.
Place this code in /new-directory/.htaccess:
RewriteEngine On
RewriteBase /new-directory/
RewriteCond %{THE_REQUEST} /(?:index)?(.*?)\.html[\s?] [NC]
RewriteRule ^ %1 [R=301,L,NE]
RewriteCond %{REQUEST_FILENAME} !-d
#RewriteCond %{DOCUMENT_ROOT}/new-directory/$1\.html -f [NC]
RewriteRule ^(.+?)/?$ $1.html [L]

RewriteCond strange behavior (file exists check)

I can't understand why redirect depends on RewriteRule (not on RewriteCond).
My .htaccess:
Options +FollowSymLinks +SymLinksIfOwnerMatch
<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ true.txt
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ false.txt
</IfModule>
Root folder contains:
true.txt (contains 'true')
false.txt (contains 'false')
test.txt (contains 'test')
If I try to open test.txt I get true, and if I try to open nonexist.txt I get true too.
Now I change my .htaccess:
...
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ $1
...
And now if I try to open test.txt I get test, and if I try to open nonexist.txt I get false.
UPDATE: Thanks for answers, I understood how it works but one problem still exists.
If I try to check 'if file exists' in another directory it always returns false.
/files/test.txt
/script/.htaccess
/script/false.txt
/script/true.txt
now my .htaccess looks like
RewriteCond %{REQUEST_FILENAME} .*(true|false).*$
RewriteRule .* - [S=2]
RewriteCond %{DOCUMENT_ROOT}/files/%{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ true.txt [L]
RewriteCond %{DOCUMENT_ROOT}/files/%{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ false.txt [L]
I always get false.
I also tried RewriteCond ../files/%{REQUEST_FILENAME} and also always get false result.
If I move test.txt into the script folder and change the condition back to RewriteCond %{REQUEST_FILENAME}, everything works fine.
It's because of the way mod_rewrite works: the user requests test.txt, mod_rewrite catches the requests and rewrites the URI to false.txt, then it makes a second pass, by sending an internal request for false.txt, which is caught and rewritten to true.txt. Then a third pass is made, the request is caught and rewritten to true.txt, but since the URI stays the same, no more passes are made.
It's rather counter-intuitive, but there's logic to it, as the control flow diagram in the docs shows.
The [L] flag is often advertised as a magic bullet to stop the recursion, but in fact it just ensures that once a request matches a pattern, then the execution stops and no further processing will take place in that pass, but the internal request will be sent out anyhow, so a second pass is made through the same ruleset. The execution stops only if the URI is unchanged after a pass.
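The pass behaviour described above can be modelled in a few lines of Python. This is a deliberately simplified sketch, not mod_rewrite's real algorithm: rules are hypothetical (pattern, substitution, last) tuples without conditions, and the max_passes limit is an assumption made so that the model terminates (Apache has its own limit on internal redirects and answers with a 500 error when it is exceeded).

```python
import re

def run_pass(uri, rules):
    """Run the ruleset once; [L] only ends the current pass."""
    for pattern, substitution, last in rules:
        m = re.search(pattern, uri)
        if m:
            uri = m.expand(substitution)
            if last:
                break
    return uri

def rewrite(uri, rules, max_passes=10):
    """Repeat passes until a pass leaves the URI unchanged."""
    for passes in range(1, max_passes + 1):
        new_uri = run_pass(uri, rules)
        if new_uri == uri:
            return uri, passes
        uri = new_uri
    raise RuntimeError('rewrite loop: too many passes')

# A front-controller rule settles on the second pass even with [L].
front_controller = [(r'^(.*)$', 'index.php', True)]
print(rewrite('foo', front_controller))

# A rule that keeps changing the URI never settles, [L] or not.
looping = [(r'^(.*)$', r'sub/\1', True)]
try:
    rewrite('foo', looping)
except RuntimeError as exc:
    print(exc)
```

In the first rule set the second pass rewrites index.php to index.php, so the URI is unchanged and processing stops. In the second, every pass produces a new URI, which is the kind of loop that conditions such as a file-exists check are meant to break.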
re: update
Your problem is, the REQUEST_FILENAME environmental variable actually holds a path (by default the full filesystem path, but there are a few twists to that), so %{DOCUMENT_ROOT}/files/%{REQUEST_FILENAME} ends up being something horrible.
As for a solution... well, it's tricky, I think. It'd be a lot easier if the .htaccess were in root. The only solution I can think of right now is:
RewriteEngine on
RewriteCond %{REQUEST_URI} script/(.*)$
RewriteCond %{DOCUMENT_ROOT}/files/%1 -f
RewriteRule .* true.txt [L]
RewriteCond %{REQUEST_URI} !(true.txt)|(false.txt)
RewriteRule .* false.txt [L]
It's rather ugly, and not very scalable or portable. In the first condition I get the file's name, in the second I check if it exists, and if it does, it's true. Everything else is false. Then again, if the files directory is also in the scope of the .htaccess, it's easier and nicer by magnitudes.
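The difference between the failing test string from the question and the one built from the captured URI part can be shown with plain string handling. In this Python sketch the document root /var/www is a made-up value, and REQUEST_FILENAME is assumed to be the full filesystem path, as described above.

```python
import re

# Hypothetical values for a request to /script/test.txt.
DOCUMENT_ROOT = '/var/www'
REQUEST_URI = '/script/test.txt'
REQUEST_FILENAME = DOCUMENT_ROOT + REQUEST_URI  # already a full path

# What the condition from the question ends up testing with -f:
broken = DOCUMENT_ROOT + '/files/' + REQUEST_FILENAME

# What the first two conditions above test: capture the part after
# "script/" (this is %1) and append only that to the files directory.
m = re.search(r'script/(.*)$', REQUEST_URI)
fixed = DOCUMENT_ROOT + '/files/' + m.group(1)

print(broken)
print(fixed)
```

The first string contains the document root twice and can never name an existing file, while the second points at the file inside files/.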
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{REQUEST_URI} !(true|false)\.txt$
RewriteRule .* true.txt [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* false.txt [L]
Note the second RewriteCond, which prevents the true.txt and false.txt files themselves from being rewritten, and the L flag on each rule, which stops rule execution for the current pass.
Both are there to prevent a rewrite loop.
UPDATE:
%{REQUEST_FILENAME} is the full path, hence if you append it to some other path, the check will fail (it will essentially try to match this: /var/www/subfolder/var/www/filename.txt).
To match a file in another folder you will need a match vs URI part...
Here's how you can do it:
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_URI} ^/([^/]+)$
RewriteCond %{DOCUMENT_ROOT}/files/%1 -f
RewriteRule .* files/$0 [L]
The first condition checks whether the request was for some filename in the root directory (it checks that the URI starts with a / but does not contain any more slashes).
Note that the first condition encloses everything but the leading slash in parentheses; this matched subpattern will be used later.
The second condition ensures that the file whose name is saved in subpattern %1 (matched by the first condition) exists in the subfolder files/ inside %{DOCUMENT_ROOT}.
If both conditions match, the request is internally rewritten to that file (the browser is not redirected).
Instead of using "RewriteCond %{REQUEST_FILENAME} !-f" you can try:
"RewriteCond %{REQUEST_URI} !-U", which checks whether the address exists (via an internal subrequest).
Sometimes the file path and the address where the file is served are different, making the former unusable.
example:
RewriteEngine On
RewriteCond %{REQUEST_URI} !-U
RewriteRule ^(.*/media/.*)\.(gif|png|jpe?g)$ https://xyz.company.com$1.$2 [NC,L,R=301]

Multiple RewriteConds and RewriteRule Stacked Together

I have this apache rewrite rule:
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} mycompany.com
RewriteRule ^$ http://mycompany.com/login [L]
# we check if the .html version is here (caching)
RewriteRule ^$ index.html [QSA]
RewriteRule ^([^.]+)$ $1.html [QSA]
RewriteCond %{REQUEST_FILENAME} !-f
# no, so we redirect to our front web controller
RewriteRule ^(.*)$ index.php [QSA,L]
The only part I can make sense of is that if the host is mycompany.com, the script redirects to http://mycompany.com/login. If not, then ...
I can't figure out the rest.
Any idea what the above script does?
Something quite interesting, not easy to understand.
A google search on the comment texts inside the code gave interesting results: http://www.google.com/search?q=%22%23+we+check+if+the+.html+version+is+here+%28caching%29%22
Edit: if we look at the last lines and knowing that Symfony uses caching (it creates local files with .html extension in the same directories as the URL shows 'em) I can try to explain the lines here
If the requested url is something like http://yoursite.com/blabla/ we try to open an index.html file in that directory. If the file is not there, another cycle of rewriting will happen and the last Cond will be hit (where the file does not exist)
RewriteRule ^$ index.html [QSA]
If something more is in the url, like http://yoursite.com/blabla/blblbl, try to find a file blblbl.html
RewriteRule ^([^.]+)$ $1.html [QSA]
This is the collector of all urls that did not match any of the previous rules or the cached file did not exist:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php [QSA,L]
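Put together, the three caching rules behave roughly like the following Python sketch. It is a simplified single-pass model under stated assumptions: the host-specific redirect to /login is ignored, the function name route and the cached_files set are hypothetical, and the set stands in for the !-f check on the rewritten path.

```python
import re

def route(path, cached_files):
    """Return the file that would answer a request for path.

    path         -- request path relative to the .htaccess directory
    cached_files -- set of files that exist on disk (cached pages, assets)
    """
    # RewriteRule ^$ index.html
    if path == '':
        path = 'index.html'
    # RewriteRule ^([^.]+)$ $1.html  (only paths without a dot)
    if re.match(r'^([^.]+)$', path):
        path = path + '.html'
    # RewriteCond %{REQUEST_FILENAME} !-f  ->  front controller
    if path not in cached_files:
        return 'index.php'
    return path

cache = {'index.html', 'blog/post.html', 'css/main.css'}
for request in ('', 'blog/post', 'blog/other', 'css/main.css'):
    print(repr(request), '->', route(request, cache))
```

Requests with a cached .html copy or an existing static file are served directly; everything else is handed to index.php.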