Apache Mod_Rewrite Scenario - apache

I was wondering how I would do a complex mod_rewrite. Below is basically how I want it done.
If the user goes to:
-http://files.stuff.example.txt.r.site.com/doc.txt
Then the server would rewrite the url to:
-http://r.site.com/index.php?type=txt&username=example&dir=files.stuff&file=doc.txt
Better picture:
-http://[dir3-dir2-dir1].[username].[type].r.site.com/[file]
Rewrites to:
-http://r.site.com/index.php?type=[type]&username=[username]&dir=[dir3.dir2.dir1]&file=[file]
I created a colour coded image to clearly show what I mean:
(can't embed images) look here:
http://i.stack.imgur.com/24H8j.png
The first subdomains are a directory structure (shown in red), so the amount of subdomains can change.
I hope someone can provide me with a solution. Either using mod_rewrite or maybe another method. Thanks.

Provided that you have configured your DNS so that requested URL hits server where your application is (maybe wildcard DNS on your domain: *.site.com -> 123.45.67.89, if supported by your DNS server/hosting), you can create more or less complicated rewrite rule. I'd do it this way:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(.*).r.site.com$
RewriteCond $1 !^index.php
RewriteRule (.*) index.php?subdomain_part=%1&file_part=$1
So in index.php you get $_GET['subdomain_part'] and $_GET['file_part'], which you can parse further to extract parameters according to your convention.
Of course, you can write more complicated regex to get URL parts extracted by mod_rewrite (I'm not such an regex expert myself). However doing parsing in PHP would be much easier and you can do better error handling (e.g. if URL is not formed properly).

Related

Apache 301 redirect with get parameters

I am trying to do a 301 redirect with lightspeed webserver htaccess with no luck.
I need to do a url to url redirect without any related parameters.
for example:
from: http://www.example.com/?cat=123
to: http://www.example.com/some_url
I have tried:
RewriteRule http://www.example.com/?cat=123 http://www.example.com/some_url/ [R=301,L,NC]
Any help will be appreciated.
Thanks for adding your code to your question. Once more we see how important that is:
your issue is that a RewriteRule does not operate on URLs, but on paths. So you need something like that instead:
RewriteEngine on
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
From your question it is not clear if you want to ignore any GET parameters or if you only want to redirect if certain parameters are set. So here is a variant that will only get applied if some parameter is actually set in the request:
RewriteEngine on
RewriteCond %{QUERY_STRING} (?:^|&)cat=123(?:&|$)
RewriteRule ^/?$ /some_url/ [R=301,L,NC,QSD]
Another thing that does not really get clear is if you want all URLs below http://www.example.com/ (so below the path /) to be rewritten, or only that exact URL. If you want to keep any potential further path component of a request and still rewrite (for example http://www.example.com/foo => http://www.example.com/some_url/foo), then you need to add a capture in your regular expression and reuse the captured path components:
RewriteEngine on
RewriteRule ^/?(.*)$ /some_url/$1 [R=301,L,NC,QSD]
For either of this to work you need to have the interpretation of .htaccess style files enabled by means of the AllowOverride command. See the official documentation of the rewriting module for details. And you have to take care that that -htaccess style file is actually readable by the http server process and that it is located right inside the http hosts DOCUMENT_ROOT folder in the local file system.
And a general hint: you should always prefer to place such rules inside the http servers host configuration instead of using .htaccess style files. Those files are notoriously error prone, hard to debug and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).

apache redirect / Rewrite-Engine

Is the following possible?
A user requests the url http://example1.com/example.php and the apache opens http:// example1.com/example.php?id=1
A user requests the url http://example2.com/example.php and the apache opens http:// example2.com/example.php?id=2
But the user should not see the id in his browser adress bar (the user should only see http://example1.com/example.php or http://example2.com/example.php).
You can say the id is invisible for the user but transfered to the example.php.
How can I implement this?
Is that the correct solution?
RewriteEngine On
RewriteRule ^/example.php http://example1.com/example.php$1 [P]
ProxyPassReverse /example.php?id=1 http:// example1.com/example.php
RewriteEngine On
RewriteRule ^/example.php http://example2.com/example.php$1 [P]
ProxyPassReverse /example.php?id=2 http:// example2.com/example.php
You have to understand several concept.
Once the server received the user requested url he can do several things
Take the requested path from the url and use it without modifications. That's the default solution
Map the requested path to any other physical path, things that can be done via Alias, AliasMatch or RewriteRules.
Map the requested path to another website while hiding the fact thtat another website is requested. That's the proxy solution, thta mod_proxy or mod_rewrite could handle (but you do not need that)
Redirect the user to another path, sending him a new url to use, making another client/server roundtrip, with Redirect instructions or mod_rewrite (the swiss knife). But you do no need that.
So you want a server-side only remapping of the requested path.
Let,s say we will use mod rewrite to make this mapping. If you check all tags available in RewriteRule (summary here) the interesting ones are:
passthrough|PT : Forces the resulting URI to be passed back to the URL mapping engine for processing of other URI-to-filename translators, such as Alias or Redirect.
qsappend|QSA: Appends any query string from the original request URL to any query string created in the rewrite target
last|L: Stop the rewriting process immediately and don't apply any more rules. Especially note caveats for per-directory and .htaccess context (see also the END flag)
nocase|NC: Makes the pattern comparison case-insensitive.
details on the PT flag shows that:
The target (or substitution string) in a RewriteRule is assumed to be a file path, by default.
Well, that,s maybe enough for you. But using PT is a good thing, if you have other apache configusation elements you should try to let them apply after mod_rewrite job.
So... assuming you may need to handle some query strings arguments and that this id argument is based on the domain name in the request, and that only the example.php script needs this behavior; you should start your research with such rules (untested):
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example1.com$ [nocase]
RewriteRule ^example\.php$ example.php?id=1 [passthrough,qsappend,last]
RewriteCond %{HTTP_HOST} ^example2.com$ [nocase]
RewriteRule ^example\.php$ example.php?id=2 [passthrough,qsappend,last]

example.com./ vs example.com/ create problems with visitor tracking programs

I tried to search this but due to the search terms I can not find an answer anywhere.
My visitor tracking site (I use Piwik) can not decipher between the following issue below.
example.com./ vs example.com/
example.com./ when types in to find a website is of course wrong, somehow others get to my site and every page they visit shows as an external link and this ./ problem.
How can I in .htaccess (in my root of my domain) add a rule to fix this problem and possibly others that are similar to this?
This is not something that is fixable from htaccess, as the example.com. hostname is invalid and any request for that will never reach your server. There's no way to rewrite outgoing requests from the browser to piwik because those requests have nothing to do with your server and thus anything htaccess wouldn't be applicable.
You need to figure out why browsers are sending tracking information to piwik with the hostname example.com.. Just as a wild uneducated guess, it may have to do with some inelegant appending of hostname and badly formed relative URI's like <a href="./my/page.html">. So when the code unintelligently appends them together, you get example.com./my/page.html.
Try using the HTTP_HOST variable, something like the following might work.
RewriteEngine on
Rewritecond %{HTTP_HOST} ^www\.example\.com\.
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

SEO URLs with ColdFusion controller?

quick ref: area = portal type page.
I would like old urls http://domain.com/long/rubbish/url/blah/blah/index.cfm?id=12345
to redirect to http://domain.com/area/12345-short-title
http://domain.com/area/12345-short-title should display the content.
I have worked out so far to do this I could use apache to write all URLs to
http://domain.com/index.cfm/long/rubbish/url/blah/blah/index.cfm?id=12345
and
http://domain.com/index.cfm/area/12345-short-title
The index.cfm will either server the content or apply a permanent redirect, but it will need to get the title and area information from the database first.
There are 50,000 pages on this website. I also have other ideas for subdomain redirects, and permanent subdomains and controlling how they act through the index.cfm.
Infrastructure are keen to do as much through Apache rewrite as possible, we suspect it would be faster. However I'm not sure we have that choice if we need to get the area and title information for each page.
Has anyone got some experience with this that can provide input?
--
Something to note, I'm assuming we'll have to keep all the internal URLs used on the website in the old format. It would be a mega job to change them all.
This means all internal URLs will have to use a permanent redirect every time.
Rather than redirecting both groups of URLs to the same script, why not simply send them to two distinct scripts?
Simply like this:
RewriteCond ${REQUEST_URI} !-f
RewriteRule ^\w+/\d+-[\w-]+$ /content.cfm/$0 [L]
RewriteCond ${REQUEST_URI} !-f
RewriteRule ^.* /redirect.cfm/$0 [L,QSA]
Then, the redirect.cfm can lookup the replacement URL and do the 301 redirect, whilst content.cfm simply serves the content.
(You haven't specified how your CF is setup; you may need to update the Jrun/Tomcat/other config to support /content.cfm/* and /redirect.cfm/* - it'll be done the same as it's done for index.cfm)
For performance reasons, you still want to avoid the database hits for redirecting if you can, and you can do that by generating rewrite rules for each page that performs the 301 redirect on the Apache side. This can be as simple as appending a line to the .htaccess file, like so:
<cfset NewLine = 'RewriteRule #ReEscape(OldUrl)# #NewUrl# [L,QSA,R=301]' />
<cffile action="append" file="./.htaccess" output=#NewLine# />
(Where OldUrl and NewUrl have been looked-up from the database.)
You might also want to investigate using mod_alias redirect instead of mod_rewrite RewriteRule, where the syntax would be Redirect permanent #OldUrl# #NewUrl# - since the OldUrl is an exact path match it would likely be faster.
Note that these rules will need to be checked before the above redirect.cfm redirect is done - if they are in the same .htaccess you can't simply do an append, but if they are in the site's general Apache config files then the .htaccess rules will be checked first.
Also, as per Sharon's comment, you should verify if your Apache will handle 50k rules - whilst I've seen it reported that "thousands" of regex-based Apache rewrites are perfectly fine, there may well be some limit (or at least the need to split across multiple files).
Using apache rewrites would only be faster if they were static rewrites, or if they all followed some rule that you could write in regex within the .htaccess file. If you're having to touch the database for these redirects, then it may not make sense to do it in .htaccess.
Another approach is the one used by most CMSs for handling virtual directories and redirects. An index.cfm file at the root of the site handles all incoming requests and returns the correct pages and pathing. MURA CMS uses this approach (as well as Joomla and most of the others.)
Basically you're using the CGI.path_info variable on an incoming request, searching for it in your DB, and doing a redirect to the new path. As usual, Ben Nadel has a good write-up of how to use this approach: Ben Nadel: Using IIS URL Rewriting And CGI.PATH_INFO With IIS MOD-Rewrite
You can, however, use the .htaccess to remove the "index.cfm" from the url string entirely if you want by redirecting all incoming requests to the root URL with something that looks like this in your .htaccess:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^([a-zA-Z0-9-]{1,})/([a-zA-Z0-9/-]+)$ /$1/index.cfm/$2 [PT]
Basically this would redirect something like http://www.yourdomain.com/your-new-url/ to http://www.yourdomain.com/index.cfm/your-new-url/ where it could be processed as described by the blog post above. The user would never see the index.cfm.

Apache mod_rewrite not doing anything (?)

I'm having some trouble with Apache's mod_rewrite. One of the things I'm trying to get it to do is hide some of my implementation details, so that, for example, the user sees the URL http://www.mysite.com/login but Apache responds with the page at http://www.mysite.com/doc_root/login.php instead (preferably without showing the user that it's a PHP file or the directory structure). Here's what I have in my .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?mysite.com*
RewriteRule ^/(\w+) /doc_root/$1.php [L]
#Redirect http://www.mysite.com to the login page
RewriteRule ^/?$ https://www.mysite.com/doc_root/login.php
But when I go to http://www.mysite.com/login, I get a 404 error even though the page exists. I clearly don't have a great understanding of how the mod_rewrite conditionals and rules work, so can anyone please tell me what I'm doing wrong? Thanks.
Take doc_root out of all the stuff you have it in. That will give you the result you're asking for. However I'm not sure if it's desired or not. How are you going to force someone to login if they manually type http://www.mysite.com/index.php?
Also if you're trying to force all traffic to SSL it's better to use a second VirtualHost and Redirect instead of mod_rewrite. Those are all questions probably better suited for ServerFault
Unless your site has a bunch of different domain names, and you only want mysite.com to do the rewriting, you don't need the RewriteCond. (Potential problem. Apache likes to dick around with the domain name unless you set UseCanonicalName off. If the name isn't what it's expecting, the rewrite won't happen.)
In RewriteCond (and RewriteRule) patterns, . matches any character. Add a backslash before them. (Minor bug. Shouldn't cause rewrites to fail, but they would match stuff like "mysite-com" as well.)
mod_rewrite is actually a URL-to-filename filter. Though it is often used to rewrite URLs to other URLs, sometimes it will misbehave if what you're rewriting to is a URL and it can't tell. (Especially if what it's rewriting to would be an alias, or would otherwise not translate directly to a real filename.) If you add a [PT] flag onto your rule, though, it will consider the rewritten thing a URL and pass it along to the other filters (including the ones that turn URLs into filenames).
Do you really need "/doc_root"? The document root should already be set up in Apache using the DocumentRoot directive, and shouldn't need to be part of the URL unless you have multiple apps on the same domain (in which case it's the app root; the document root doesn't change).
UPDATE:
Another thing i just thought about: Rewrite rules work differently in .htaccess files. Apache likes to strip off the leading slash. So you will probably want to get rid of the first slash in your patterns, or at least make it optional (^/?login instead of ^/login).
^/?(\w+) will match /doc_root/login.php, and cause a rewrite to /doc_root/doc_root.php. You should probably have a $ at the end of your pattern.