Wildcard in URL (mod_alias or mod_rewrite) - apache

I need to have URLs such as mydomain.com/whatever, where "whatever" can be any arbitrary string, all call the same php file where it sorts out what to display (or displays a 404). However, I want files and other php files to work normally (anything that is otherwise aliased, or that actually exists in the file system).
A simple AliasMatch /* myphpfile.php (after all the other Aliases in httpd.conf) works fine on my own setup, but on a production server, the wildcard alias sends all the other php files to myphpfile.php. I'm not sure what else might be confusing things.
Technically the whatever string will be alphabetic and lower case, so it can filter for that, but all attempts I've made with regexes haven't been successful.

Use these rules (you need mod_rewrite):
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /
# do not do anything for already existing files
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule .+ - [L]
RewriteRule ^([a-z]+)$ /myfile.php [L]
Place in .htaccess in website root folder. If placed elsewhere some tweaking may be required.
This will rewrite (internal redirect) all NON-EXISTING single-lowercase-word requests to /myfile.php, where the script can use $_SERVER['REQUEST_URI'] to determine which URL was called and decide what to do (routing).
This will work for URLs like /whatever, but will do nothing for /what-ever, /hello/pinky, /hello123.
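RewriteRule looks for its pattern anywhere in the URL-path (in .htaccess, the path without its leading slash), so it is the ^ and $ anchors that keep ([a-z]+) from also matching inside what-ever, hello/pinky or hello123. The check below uses Python's re module, which for a simple pattern like this behaves the same as the PCRE library mod_rewrite uses; rule_matches is just an illustrative helper name.

```python
import re

def rule_matches(pattern, path):
    """True if the pattern is found anywhere in path, the way RewriteRule tests it."""
    return re.search(pattern, path) is not None

# Paths as a per-directory (.htaccess) rule sees them: no leading slash.
for path in ["whatever", "what-ever", "hello/pinky", "hello123"]:
    unanchored = rule_matches(r"([a-z]+)", path)
    anchored = rule_matches(r"^([a-z]+)$", path)
    print(f"{path}: unanchored={unanchored} anchored={anchored}")
```

The unanchored pattern matches all four paths; only whatever matches the anchored one.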


.htaccess rewrite - remove all extensions

I would like to do a rewrite rule that removes all extensions, regardless of filename:
https://example.com/filename.extension -> https://example.com/filename
for example:
https://example.com/horses.txt -> https://example.com/horses
https://example.com/icecream.json -> https://example.com/icecream
I tried:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^(.*)\.*$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ *? [QSA,L]
</IfModule>
It is not working as it should.
You can only reasonably do what you are asking with MultiViews.
For example, as simple as:
Options +MultiViews
You need to remove your existing mod_rewrite directives.
Now, a request for example.com/horses will be correctly routed to /horses.txt, or whatever file extensions you are using. MultiViews uses mod_negotiation.
This isn't so easy to do with mod_rewrite, since you need to test each file extension in turn to work out which file to rewrite the request back to. For example, should a request for example.com/horses route to /horses.txt or /horses.jpg? MultiViews does this comparison for you.
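A rough Python model of the first step MultiViews performs shows why a single rewrite rule cannot pick the file: it collects every file in the directory named like the request followed by a dot and any extension. This is a simplification (mod_negotiation then ranks the candidates using the request's Accept headers), and multiviews_candidates and the file list are hypothetical.

```python
def multiviews_candidates(requested, filenames):
    """Files a MultiViews-style lookup (requested.*) could serve for an extensionless URL."""
    prefix = requested + "."
    return sorted(name for name in filenames if name.startswith(prefix))

# Hypothetical directory contents.
directory = ["horses.txt", "horses.jpg", "icecream.json", "index.html"]
print(multiviews_candidates("horses", directory))    # two candidates, so negotiation is needed
print(multiviews_candidates("icecream", directory))  # a single candidate
```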
I would like to do a rewrite rule that removes all extensions
However, you also need to actually remove the file extension in the HTML source. This isn't something you do in .htaccess, unless you need to preserve SEO or backlinks that already link to the old URLs.
UPDATE: Perhaps I wasn't clear enough. I would like the URL to display without the extension even if the link includes it, and to go to that file if linked without the extension.
Well, you need to actually remove the file extension on all your internal links. You can issue a "redirect" in .htaccess to remove the extension for the benefit of search engines and 3rd party links - but if you rely on this for your internal links then it will potentially slow users and your site as you are doubling the number of requests hitting your server.
To remove the file extension for direct requests (SEO / 3rd party links), you could do something like this:
RewriteEngine On
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^([^.]+)\.[\w]{2,4}$ /$1 [R=302,L]
This does assume that the only dot in the URL-path is the one that delimits the file extension.
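To see exactly which requests the redirect catches, here is the same RewriteRule pattern checked in Python against a few sample paths (as seen in .htaccess, without the leading slash). redirect_target is an illustrative name, and the sketch ignores the two RewriteCond checks.

```python
import re

# Same pattern as the RewriteRule: one dot, then a 2-4 character extension.
PATTERN = re.compile(r"^([^.]+)\.[\w]{2,4}$")

def redirect_target(path):
    """Return the extensionless redirect target, or None if the rule would not match."""
    m = PATTERN.search(path)
    return "/" + m.group(1) if m else None

for path in ["horses.txt", "icecream.json", "docs/report.pdf",
             "archive.tar.gz", "photo.jpeg", "readme.markdown"]:
    print(path, "->", redirect_target(path))
```

archive.tar.gz (two dots) and readme.markdown (an 8-character extension) are left alone, which is the assumption mentioned above.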
The difficult part is then internally rewriting the request back to the underlying file with an extension - that's where MultiViews comes in (first part of my answer).

mod_rewrite: redirect `x -> /legacy/x` if file _does not exist_ in `/` but _does exist_ in `/legacy/`

For over 10 years I have uploaded all sorts of files to the root of my webserver.
/
  oldphoto.jpg
  oldjunk.txt
  oldfolder/
    oldfile.txt
  newpage.html
  newimg.png
There are now ~1800 files in the root. My FTP client is slow to retrieve the directory listing, so managing the website is difficult.
I'd like to tidy this up. I know which files I want to be stored at root-level (new stuff). Everything else, I've moved into a legacy/ folder.
/
  legacy/
    oldphoto.jpg
    oldjunk.txt
    oldfolder/
      oldfile.txt
  newpage.html
  newimg.png
Problematically: some of those old files are still accessed today, from various external websites.
I want to make a cunning mod_rewrite rule that works like this:
file didn't exist?
okay, does it exist in the legacy/ folder?
then, I'll redirect you to the appropriate file in the legacy/ folder
So the following cases need to work:
#[A] file does not exist in /, exists in legacy folder: redirect
http://example.com/oldphoto.jpg -> http://example.com/legacy/oldphoto.jpg
#[B] file does not exist in /, exists in legacy folder: redirect
http://example.com/oldfolder/oldfile.txt -> http://example.com/legacy/oldfolder/oldfile.txt
#[C] file does not exist in /, does not exist in legacy folder: 404 as usual
http://example.com/not-exist.txt -> 404
#[D] file exists in /: serve page as usual
http://example.com/newpage.html -> http://example.com/newpage.html
I got pretty close to getting this working:
RewriteEngine on
# No such file exists:
RewriteCond %{SCRIPT_FILENAME} !-f
# No such directory exists:
RewriteCond %{SCRIPT_FILENAME} !-d
# Capture the head of REQUEST_URI into %2 backreference; this tells us the absolute path to our web root
RewriteCond %{REQUEST_URI}::%{SCRIPT_FILENAME} ^(.*?)::(.*)\1$
# File exists web_root/legacy/REQUEST_URI OR
RewriteCond %2/legacy/%{REQUEST_URI} -f [OR]
# Directory exists web_root/legacy/REQUEST_URI OR
RewriteCond %2/legacy/%{REQUEST_URI} -d
# Redirect to /legacy/REQUEST_URI
RewriteRule .* /legacy/%{REQUEST_URI} [L,R=301]
But this only solves cases A, C, and D. The nested case (B) fails, because %{SCRIPT_FILENAME} is not what I thought it was.
I am testing the redirect like so:
curl -sI 'http://example.com/oldphoto.jpg' | grep Location | sed 's/^Location: //'
http://example.com/legacy//oldphoto.jpg
Here is what the macros expand to:
# when requesting 'http://example.com/oldphoto.jpg':
SCRIPT_FILENAME: /customer/homepages/13/c12345678/htdocs/user/oldphoto.jpg
REQUEST_FILENAME: /customer/homepages/13/c12345678/htdocs/user/oldphoto.jpg
DOCUMENT_ROOT: /var/www/html
REQUEST_URI: /oldphoto.jpg
THE_REQUEST: HEAD /oldphoto.jpg HTTP/1.1
# when requesting 'http://example.com/oldfolder/oldfile.txt':
SCRIPT_FILENAME: /customer/homepages/13/c12345678/htdocs/user/oldfolder
REQUEST_FILENAME: /customer/homepages/13/c12345678/htdocs/user/oldfolder
DOCUMENT_ROOT: /var/www/html
REQUEST_URI: /oldfolder/oldfile.txt
THE_REQUEST: HEAD /oldfolder/oldfile.txt HTTP/1.1
We can see that %{SCRIPT_FILENAME} does not give the full path to my file.
We can also see that %{DOCUMENT_ROOT} cannot be relied upon to give us the absolute path to my web root.
If somebody requests http://example.com/oldfolder/oldfile.txt:
how do I check for the existence of /customer/homepages/13/c12345678/htdocs/user/legacy/oldfolder/oldfile.txt?
how do I redirect the user to http://example.com/legacy/oldfolder/oldfile.txt?
I assume that /customer/homepages/13/c12345678/htdocs/user/ is likely to change (I have managed web hosting), so I would prefer not to hard-code it.
I am surprised that %{DOCUMENT_ROOT} does not give me this. Maybe it gives logical web root instead of physical web root.
I am also surprised that %{SCRIPT_FILENAME} gives me …/oldfolder rather than the path suggested in %{REQUEST_URI}: …/oldfolder/oldfile.txt.
I think you might be overcomplicating the process. The process is simply:
If the requested file does not exist, but does exist in the /legacy subdirectory then redirect.
This would seem to handle situations A, B, C and D. For C and D you don't actually need to do anything.
So, try something like the following instead:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/legacy/$1 -f
RewriteRule (.*) /legacy/$1 [R=302,L]
This naturally only checks that a file exists in the /legacy subdirectory. Is there really a need to check for directories as well? I thought you were only moving "files"?
Change the 302 (temporary) redirect to a 301 (permanent) only when you are sure it's working OK. Make sure you clear your browser cache before testing.
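The decision these rules make for cases A to D can be modelled in Python against a throwaway directory tree. This is only a sketch: legacy_redirect is a hypothetical helper, the temporary directory stands in for the document root, and, like the rules, it only redirects to files (not directories) under legacy/.

```python
import os
import tempfile

def legacy_redirect(docroot, uri):
    """Return the redirect target for uri, or None to let Apache handle it normally.

    docroot is a stand-in for the document root (an assumption of this sketch).
    """
    rel = uri.lstrip("/")
    requested = os.path.join(docroot, rel)
    if os.path.isfile(requested) or os.path.isdir(requested):
        return None                                  # case D: serve as usual
    if os.path.isfile(os.path.join(docroot, "legacy", rel)):
        return "/legacy/" + rel                      # cases A and B: redirect
    return None                                      # case C: normal 404

# Throwaway tree mirroring the question's layout.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "legacy", "oldfolder"))
    for name in ["legacy/oldphoto.jpg", "legacy/oldfolder/oldfile.txt", "newpage.html"]:
        with open(os.path.join(root, name), "w") as f:
            f.write("test")
    for uri in ["/oldphoto.jpg", "/oldfolder/oldfile.txt", "/not-exist.txt", "/newpage.html"]:
        print(uri, "->", legacy_redirect(root, uri))
```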
We can also see that %{DOCUMENT_ROOT} cannot be relied upon to give us the absolute path to my web root.
I am surprised that %{DOCUMENT_ROOT} does not give me this.
Me too; something is a bit off here. This is precisely what the DOCUMENT_ROOT server variable should be returning: the absolute filesystem path to your web root (i.e. the document root). In your output, /var/www/html looks "normal". (If this weren't returning the expected value, wouldn't many web applications fail to work? Unless your environment is mashing something together before the server-side script gets to have a go?)
In order to get /customer/homepages/13/c12345678/htdocs/user, it looks like your server may be using some kind of Alias to map files from a different area of the filesystem?!
If you do need to grab the filesystem path from REQUEST_FILENAME (same as SCRIPT_FILENAME) then you could perhaps do something like:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} ^((/[^/]+){6})
RewriteCond %1/legacy%{REQUEST_URI} -f
RewriteRule .* /legacy/$0 [R=302,L]
This grabs the first 6 path segments from the REQUEST_FILENAME server variable.
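To check what the ^((/[^/]+){6}) condition captures, here is the same pattern applied in Python to the REQUEST_FILENAME value from case B in the question, followed by the path that the next RewriteCond tests with -f. It still assumes the web root is exactly six path segments deep; legacy_path is an illustrative name.

```python
import re

def legacy_path(request_filename, request_uri):
    """Mimic %1/legacy%{REQUEST_URI}, where %1 is the first six segments of REQUEST_FILENAME."""
    m = re.search(r"^((/[^/]+){6})", request_filename)
    return m.group(1) + "/legacy" + request_uri if m else None

# Values taken from the question's case B output.
print(legacy_path("/customer/homepages/13/c12345678/htdocs/user/oldfolder",
                  "/oldfolder/oldfile.txt"))
```

This prints /customer/homepages/13/c12345678/htdocs/user/legacy/oldfolder/oldfile.txt, even though REQUEST_FILENAME itself stopped at .../oldfolder.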

Redirect loop with simple htaccess rule

I have been pulling my hair out over this. It worked before the server migration!
Ok so basically it's as simple as this:
I have a .php file whose content I want to view using an SEO-friendly URL via a rewrite rule.
Also to canonicalise and to prevent duplicate content I want to 301 the .php version to the SEO friendly version.
This is what I used, and it always worked until now, on the new server:
RewriteRule ^friendly-url/$ friendly-url.php [L,NC]
RewriteRule ^friendly-url.php$ /friendly-url/$1 [R=301,L]
However disaster has struck and now it causes a redirect loop.
Logically I can only assume that in this version of Apache it is tripping up as it's seeing that the script being run is the .php version and so it tries the redirect again.
How can I re-work this to make it work? Or is there a config I need to switch in WHM?
Thanks!!
This is how your .htaccess should look:
Options +FollowSymLinks -MultiViews
RewriteEngine On
RewriteBase /
# To externally redirect /friendly-url.php to /friendly-url/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+(friendly-url)\.php [NC]
RewriteRule ^ /%1/? [R=302,L]
## To internally redirect /anything/ to /anything.php
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1\.php -f
RewriteRule ^(.+?)/$ $1.php [L]
Note that I am using R=302, because I don't want the redirect to be cached by my browser until I have confirmed it's working as expected; once it is, I switch from R=302 to R=301.
Keep in mind that, since you were using R=301, your browser may also have cached the redirect from previous attempts, so you are better off testing in a different browser that you haven't used for this site, just to make sure it's working.
However disaster has struck and now it causes a redirect loop.
It causes a redirect loop because you are redirecting it to itself. The difference in my code is that I check the original request (THE_REQUEST), redirect the .php URLs from there to make them friendly, and then use the internal rewrite.
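The loop can be reproduced with a rough Python model of per-directory processing: after an internal rewrite the rewritten path goes through the rules again, while THE_REQUEST keeps the original request line. question_rules and answer_rules are hypothetical stand-ins for the two rule sets, FILES stands in for the document root, and a redirect back to the URL that was just requested means the browser loops.

```python
import re

FILES = {"friendly-url.php"}  # hypothetical document root contents

def question_rules(path, the_request):
    if re.search(r"^friendly-url/$", path, re.IGNORECASE):   # ^friendly-url/$ friendly-url.php [L,NC]
        return "rewrite", "friendly-url.php"
    if re.search(r"^friendly-url.php$", path):               # ^friendly-url.php$ /friendly-url/$1 [R=301,L]
        return "redirect", "/friendly-url/"
    return "pass", path

def answer_rules(path, the_request):
    # RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+(friendly-url)\.php [NC]
    m = re.search(r"^[A-Z]{3,}\s/+(friendly-url)\.php", the_request, re.IGNORECASE)
    if m:
        return "redirect", "/" + m.group(1) + "/"
    # RewriteCond %{DOCUMENT_ROOT}/$1\.php -f  +  RewriteRule ^(.+?)/$ $1.php [L]
    m = re.search(r"^(.+?)/$", path)
    if m and m.group(1) + ".php" in FILES:
        return "rewrite", m.group(1) + ".php"
    return "pass", path

def handle(url_path, rules, max_passes=10):
    """Re-run the rules after each internal rewrite, as .htaccess processing does."""
    the_request = f"GET {url_path} HTTP/1.1"  # never changes during internal rewrites
    path = url_path.lstrip("/")               # per-directory rules see no leading slash
    for _ in range(max_passes):
        action, target = rules(path, the_request)
        if action == "redirect":
            return "redirect", target
        if action == "pass":
            return "serve", path
        path = target
    return "loop", path

print(handle("/friendly-url/", question_rules))  # redirects to itself: browser loop
print(handle("/friendly-url/", answer_rules))    # served from friendly-url.php
print(handle("/friendly-url.php", answer_rules)) # external redirect to the friendly URL
```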
The exact same rewrite rules will behave differently depending on where they are placed, because the [L]ast flag means something different depending on location. In ...conf, [L]ast means processing is finished, so get out; but in .htaccess the exact same [L]ast flag only ends the current pass, and if the URL was rewritten, processing starts all over at the top of this file.
To work as expected when moving a block of code from ...conf to .htaccess, most .htaccess files will need one or the other of these tweaks:
Change the [L]ast flags to [END]. (Problem is, the [END] flag is only available in newer [version 2.3.9 and later] Apaches, and won't even "fall back" in earlier versions.)
Add boilerplate code like this at the top of each of your .htaccess files:
RewriteCond %{ENV:REDIRECT_STATUS} !^[\s/]*$
RewriteRule ^ - [L]
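A rough Python model of how that boilerplate breaks the question's loop: on the first pass REDIRECT_STATUS is empty, and after an internal rewrite the rules run again with REDIRECT_STATUS set (typically to 200), so the guard stops processing before the .php redirect can fire. The function names are hypothetical and the model covers only the question's two rules.

```python
import re

def guarded_question_rules(path, redirect_status):
    # RewriteCond %{ENV:REDIRECT_STATUS} !^[\s/]*$  +  RewriteRule ^ - [L]
    if not re.search(r"^[\s/]*$", redirect_status):
        return "stop", path
    # RewriteRule ^friendly-url/$ friendly-url.php [L,NC]
    if re.search(r"^friendly-url/$", path, re.IGNORECASE):
        return "rewrite", "friendly-url.php"
    # RewriteRule ^friendly-url.php$ /friendly-url/$1 [R=301,L]  ($1 is empty here)
    if re.search(r"^friendly-url.php$", path):
        return "redirect", "/friendly-url/"
    return "stop", path

def handle(url_path, max_passes=10):
    """Re-run the rules after each internal rewrite; later passes see REDIRECT_STATUS=200."""
    path, status = url_path.lstrip("/"), ""
    for _ in range(max_passes):
        action, target = guarded_question_rules(path, status)
        if action == "redirect":
            return "redirect", target
        if action == "stop":
            return "serve", path
        path, status = target, "200"
    return "loop", path

print(handle("/friendly-url/"))     # served from friendly-url.php, no loop
print(handle("/friendly-url.php"))  # direct request still redirected
```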

How to do a mod_rewrite redirection to relative URL

I am trying to achieve a basic URL redirection for pretty-URLs, and due to images, CSS etc. also residing in the same path I need to make sure that if the URL is accessed without a trailing slash, it is added automatically.
This works fine if I put the absolute URL like this:
RewriteRule ^myParentDir/([A-Z0-9_-]+)$ http://www.mydomain.com/myParentDir/$1/ [R,nc,L]
But if I change this to a relative URL, so that I don't have to change it each time I move things in folders, this simply doesn't work.
These are what I tried; none of them work, and some redirect me to the actual internal directory path on the server, like /public_html/... :
RewriteRule ^myParentDir/([A-Z0-9_-]+)$ ./myParentDir/$1/ [R,nc,L]
RewriteRule ^myParentDir/([A-Z0-9_-]+)$ myParentDir/$1/ [R,nc,L]
What is the right way to do a URL redirection so that if the user enters something like:
http://www.mydomain.com/somedir/myVirtualParentDir/myVirtualSubdir
he gets redirected to (via HTTP 301 or 302):
http://www.mydomain.com/somedir/myVirtualParentDir/myVirtualSubdir/
Thanks.
EDIT: Adding some more details because it does not seem to be clear.
Let's say I am implementing a gallery, and I want to have pretty URLs using mod_rewrite.
So, I would like to have URLs as follows:
http://www.mydomain.com/somedir/galleries/cats
which shows thumbnails of cats, while:
http://www.mydomain.com/somedir/galleries/cats/persian
which shows one image from the thumbnails of all cats, named persian.
So in actual fact the physical directory structure and rewriting would be as follows:
http://www.domain.com/somedir/gallery.php?category=cats&image=persian
So what I want to do is put a .htaccess file in /somedir which catches all requests made to /galleries and depending on the virtual subdirectories following it, use them as placeholders in the rewriting, with 2 rewrite rules:
RewriteRule ^galleries/([A-Z0-9_-]+)/$ ./gallery.php?category=$1 [nc]
RewriteRule ^galleries/([A-Z0-9_-]+)/+([A-Z0-9_-]+)$ ./gallery.php?category=$1&image=$2 [nc]
Now the problem is that the gallery script in fact needs some CSS, Javascript and Images, located at http://www.domain.com/somedir/css, http://www.domain.com/somedir/js, and http://www.domain.com/somedir/images respectively.
I don't want to hardcode any absolute URLs, so the CSS, JS and images will be referred to using relative URLs (./css, ./js, ./images etc.). So I can rewrite those URLs as follows:
RewriteRule ^galleries/[A-Z0-9_-]+/css/(.*)$ ./css/$1 [nc]
The problem is that since http://www.domain.com/somedir/galleries/cats is a virtual directory, the above only works if the user types:
http://www.domain.com/somedir/galleries/cats/
If the user omits the trailing slash, mod_dir will not add it, because this directory does not actually exist.
If I put a redirect rewrite with the absolute URL it works:
RewriteRule ^galleries/([A-Z0-9_-]+)$ http://www.mydomain.com/subdir/galleries/$1/ [R,nc,L]
But I don't want to have the URL prefix hardcoded because I want to be able to put this on whatever domain I want in whatever subdir I want, so I tried this:
RewriteRule ^galleries/([A-Z0-9_-]+)$ galleries/$1/ [R,nc,L]
But instead it redirects to:
http://www.mydomain.com/home/myaccount/public_html/subdir/galleries/theRest
which obviously is not what I want.
EDIT: Further clarifications
The solution I am looking for is to avoid hardcoding the domain name or folder paths in .htaccess. I am looking for a solution where if I package the .htaccess with the rest of the scripts and resources, wherever the user unzips it on his web server it works out of the box. All works like that apart from this trailing slash issue.
So any solution which involves hardcoding the parent directory or the webserver's path in .htaccess in any way is not what I am looking for.
Here's a solution straight from the Apache Documentation (under "Trailing Slash Problem"):
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R]
Here's a solution that checks that REQUEST_URI does not already end in a slash (and contains no dot), then adds one:
RewriteCond %{REQUEST_URI} !(/$|\.)
RewriteRule (.+) http://www.example.com/$1/ [R=301,L]
Here's another solution that allows you to exempt certain REQUEST_URI patterns:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !example.php
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ http://domain.com/$1/ [L,R=301]
Hope these help. :)
This rule should add a trailing slash to any URL which is not a real file/directory (which is, I believe, what you need since Apache usually does the redirect automatically for existing directories).
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+[^/])$ $1/ [L,R=301]
Edit:
In order to prevent Apache from prefixing the redirect target with the directory's physical filesystem path, you have to use RewriteBase. So, for instance, in the folder meant to be your application's root, add the following, which overrides the physical path:
RewriteBase /
This might work:
RewriteRule ^myParentDir/[A-Z0-9_-]+$ %{REQUEST_URI}/ [NC,NS,L,R=301]
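Because %{REQUEST_URI} holds the full URL-path of the request, this substitution needs no hard-coded domain or parent directory. A rough Python model: the pattern is tested against the path below the .htaccess directory (case-insensitively, as the question's own rules do with [nc]), while the target is built from REQUEST_URI. The function and its parameters are illustrative, and the NS flag (skip internal subrequests) is not modelled.

```python
import re

def add_trailing_slash(request_uri, htaccess_url_dir):
    """Model RewriteRule ^myParentDir/[A-Z0-9_-]+$ %{REQUEST_URI}/ with case-insensitive matching."""
    # In .htaccess the pattern only sees the part below the .htaccess directory.
    per_dir_path = request_uri[len(htaccess_url_dir):]
    if re.search(r"^myParentDir/[A-Z0-9_-]+$", per_dir_path, re.IGNORECASE):
        return request_uri + "/"  # redirect target: the full original path plus "/"
    return None                   # no redirect

# The same rule works wherever the application is unpacked.
print(add_trailing_slash("/somedir/myParentDir/cats", "/somedir/"))
print(add_trailing_slash("/other/app/myParentDir/cats", "/other/app/"))
print(add_trailing_slash("/somedir/myParentDir/cats/", "/somedir/"))
```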
However, I'm not sure why you think you need this at all. Just make your CSS / JS / image file rewrite rule look something like this:
RewriteRule ^galleries/([A-Za-z0-9_-]+/)*(css|js|images)/(.*)$ ./$2/$3
and everything should work just fine regardless of whether the browser requests /somedir/galleries/css/whatever.css or /somedir/galleries/cats/css/whatever.css or even /somedir/galleries/cats/persian/calico/css/whatever.css.
Ps. One problem with this rule is that it prevents you from having any galleries named "css", "js" or "images". You might want to fix that by naming those virtual directories something like ".css", ".js" and ".images", or using some other naming scheme that doesn't conflict with valid gallery names.
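The asset rule can be checked in Python against the paths mentioned above (as seen from the .htaccess in /somedir, without the leading slash). asset_target is an illustrative name; the last sample shows the naming conflict: a gallery called css would have its image requests sent to the css directory.

```python
import re

ASSET_RULE = re.compile(r"^galleries/([A-Za-z0-9_-]+/)*(css|js|images)/(.*)$")

def asset_target(path):
    """Return the ./$2/$3 substitution, or None if the rule does not match."""
    m = ASSET_RULE.search(path)
    return f"./{m.group(2)}/{m.group(3)}" if m else None

for path in ["galleries/css/whatever.css",
             "galleries/cats/css/whatever.css",
             "galleries/cats/persian/calico/css/whatever.css",
             "galleries/cats/persian",
             "galleries/css/persian"]:
    print(path, "->", asset_target(path))
```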
I'm not sure I completely understand your problem.
The trailing slash redirect is done automatically on most Apache installations by the mod_dir module (there's a 99% chance you have mod_dir).
You may need to add:
DirectorySlash On
But it's the default value.
So, if you access foo/bar and bar is not a file in the foo directory but a subdirectory, mod_dir performs the redirect to foo/bar/.
The only thing I know of that could break this is Options MultiViews, which may be trying to find a bar.php, or bar.<any-extension-known-to-Apache>, in the directory. So you could try adding:
Options -MultiViews
and removing all your RewriteRules. If you do not get this default Apache behavior, you may have to look at mod_rewrite, but that's like using a nuclear bomb to kill a spider. Nuclear bombs can be quite touchy to use well.
EDIT:
For the trailing slash problem with mod_rewrite, you can check the Apache documentation how-to, which states that this should work:
RewriteEngine on
RewriteBase /myParentDir/
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R]

.htaccess mod_rewrite issue

In almost every project I work on, some issue with .htaccess occurs. I usually just find the easiest solution and leave it, because I don't have any knowledge or understanding of Apache, servers etc. But this time I thought I would ask you guys.
These are the files and folders in my (simplified) setup:
/modrewrite-test
  .htaccess
  /config
  /inc
  /lib
  /public_html
    .htaccess
    /cms
      /navigation
        index.php
        edit.php
      /pages
        index.php
        edit.php
    login.php
    page.php
The "config", "inc" and "lib" folders are meant to be "hidden" from the root of the website. I try to accomplish this with a .htaccess file in the root that redirects the user to "public_html". The .htaccess file contains this:
RewriteEngine On
RewriteRule (.*) public_html/$1
This works perfectly. If I type "http://localhost/modrewrite-test/login.php" in my browser, I end up in public_html/login.php, which is my intention. The .htaccess file in "public_html" contains this:
RewriteEngine On
# Root
RewriteRule ^$ page.php [L]
# Login
RewriteRule ^(admin|login)/?$ login.php [L]
# Page (if not a file/directory)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ page.php?url=$1 [L]
The first rewrite just redirects me to public_html/page.php if I try to reach "http://localhost/modrewrite-test/". The next rewrite is just for the convenience of users trying to log in: if they try to reach "http://localhost/modrewrite-test/admin" or "http://localhost/modrewrite-test/login", they will end up at the login.php file. The third and last rewrite handles the rest of the requests. If I try to reach "http://localhost/modrewrite-test/bla/bla/bla", it will just redirect me to public_html/page.php (with the 'url' GET variable set) instead of looking for a folder called "bla", containing a folder named "bla", and so on.
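A rough Python model of these three rules helps check which script each path reaches. The login pattern is written here with the alternation grouped, ^(admin|login)/?$; an ungrouped alternation such as ^(admin)|(login)\/?$ is parsed as "starts with admin" or "ends with login", so it would also catch paths like administrator or foo/login. EXISTING is a hypothetical stand-in for the -d / -f checks.

```python
import re

# Hypothetical paths below public_html that exist on disk (stand-in for -d / -f).
EXISTING = {"cms", "cms/navigation", "cms/navigation/index.php",
            "cms/navigation/edit.php", "login.php", "page.php"}

def route(path):
    """Return the internal target for a path below public_html (no leading slash)."""
    if path == "":                               # RewriteRule ^$ page.php [L]
        return "page.php"
    if re.search(r"^(admin|login)/?$", path):    # grouped alternation
        return "login.php"
    if path.rstrip("/") not in EXISTING:         # !-d and !-f
        return "page.php?url=" + path
    return path                                  # existing file or directory: untouched

# Ungrouped vs grouped alternation.
UNGROUPED = r"^(admin)|(login)\/?$"
for p in ["admin", "administrator", "foo/login"]:
    print(p, bool(re.search(UNGROUPED, p)), bool(re.search(r"^(admin|login)/?$", p)))
```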
All of this works perfectly, but a minor issue occurs when I, for instance, try to reach "http://localhost/modrewrite-test/cms/navigation" without a slash at the end of the URL. When I try to reach that page, the browser is somehow redirected to "http://localhost/modrewrite-test/public_html/cms/navigation/". The correct page is shown, but why does it get redirected and add the "public_html" part to the URL? The desired behavior is that the URL stays intact and that the page public_html/cms/navigation/index.php is shown.
The files and folders in the (simplified) setup can be found at http://highbars.com/modrewrite-test.zip
I ran into the same problem with "strange" redirects when trying to access an existing directory without a slash at the end. In my case the redirect was done by the mod_dir Apache module. To disable it, I used the DirectorySlash directive. Try putting the following line in your .htaccess files:
DirectorySlash Off
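The likely sequence behind the unexpected public_html in the URL can be sketched in Python: the root .htaccess internally rewrites cms/navigation to public_html/cms/navigation, that target is an existing directory requested without a trailing slash, and mod_dir's DirectorySlash then issues an external redirect built from the already rewritten URL rather than the one the browser asked for. This is a simplified model with hypothetical names, not Apache's actual code path.

```python
# Hypothetical directories that exist below /modrewrite-test.
EXISTING_DIRS = {"public_html", "public_html/cms", "public_html/cms/navigation"}

def mod_dir_location(request_uri, base="/modrewrite-test/"):
    """Return the Location mod_dir would send, or None if it would not redirect."""
    per_dir = request_uri[len(base):]
    # Root .htaccess: RewriteRule (.*) public_html/$1 (an internal rewrite).
    rewritten = "public_html/" + per_dir
    # DirectorySlash On: an existing directory without a trailing slash is
    # redirected to the current, already rewritten, URL plus "/".
    if rewritten in EXISTING_DIRS:
        return base + rewritten + "/"
    return None

print(mod_dir_location("/modrewrite-test/cms/navigation"))
print(mod_dir_location("/modrewrite-test/cms/navigation/"))
print(mod_dir_location("/modrewrite-test/bla/bla/bla"))
```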
RewriteBase may help. Try this in public_html/.htaccess:
RewriteEngine On
RewriteBase /
Add the following to /modrewrite-test/.htaccess:
RewriteBase /modrewrite-test
Just to be on the safe side, I'd also add the same directive to /modrewrite-test/public_html/.htaccess. I've found that always setting RewriteBase prevents a lot of potential problems in the future. However, this means that you might need to update the values if you change the URI structure of your site.
Update:
I don't think that this is possible with your current folder structure. I believe the problem is that existing subdirectories prevent the rewrite rules from firing. Please note the behavior: everything works fine while you are working with non-existent files and directories, thanks to these two conditions:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
However, if you try to open any index file from an existing subdirectory, you get redirected to .../public_html/.... Since you can properly open /modrewrite-test/cms/navigation/edit.php, I can only assume that the request is being overridden by some Apache core directive that adds slashes at the end of folder URLs. Notice that everything works fine if each URL has a trailing slash (i.e. Apache does not need to "correct" your URL, so everything gets rewritten by your own rewrite rules).
Suggested solution (unless anyone can advise better):
Change /modrewrite-test/public_html/.htaccess as follows:
RewriteEngine On
RewriteBase /modrewrite-test
# Page (if not a file/directory)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ page.php?url=$1 [L]
Then remove all PHP files from subfolders and use the Front Controller pattern, i.e. route all requests through your main page.php file and do not delegate anything further down.
You can then use the Factory pattern to instantiate individual UIs (e.g. navigation/edit.php) directly from your main page.php file based on the contents of $_GET['url'] (make sure to sanitize that properly).
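The front controller plus factory idea can be sketched in a few lines. This is written in Python rather than PHP purely for illustration: the class names, the ROUTES table and the whitelist regex are hypothetical, and in the question's project page.php would do the equivalent with $_GET['url'].

```python
import re

# Hypothetical UI classes; in the question's project these would be PHP classes.
class NavigationEdit:
    def render(self):
        return "navigation edit form"

class PagesIndex:
    def render(self):
        return "page list"

# Whitelist of routes -> factories (here simply the classes); anything else is a 404.
ROUTES = {
    "cms/navigation/edit": NavigationEdit,
    "cms/pages": PagesIndex,
}

def dispatch(url):
    """Front controller: sanitize the 'url' parameter, then build the matching UI."""
    url = url.strip("/")
    if not re.fullmatch(r"[a-z0-9_/-]*", url):  # reject dots, e.g. "../config"
        return "404"
    factory = ROUTES.get(url)
    return factory().render() if factory else "404"

print(dispatch("cms/navigation/edit"))
print(dispatch("cms/pages/"))
print(dispatch("../config"))
```

Keeping the route table as a whitelist means that a path which is not listed can never reach a script, which is the point of not delegating anything to the subfolders.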
Update #2:
This other post on StackOverflow describes the project structure used by Zend Framework; it essentially shows the approach I suggested above. It is valuable information regardless of whether you use Zend Framework or not.