example.com./ vs example.com/ create problems with visitor tracking programs - apache

I tried to search for this, but because of the search terms I cannot find an answer anywhere.
My visitor tracking software (I use Piwik) cannot distinguish between the following two forms:
example.com./ vs example.com/
example.com./ is of course wrong when typed in to find a website, yet somehow visitors reach my site that way, and every page they visit is then recorded as an external link with this ./ problem.
How can I add a rule in .htaccess (in the root of my domain) to fix this problem, and possibly others similar to it?

This is not something that is fixable from htaccess, as the example.com. hostname is invalid and any request for it will never reach your server. There's no way to rewrite outgoing requests from the browser to Piwik, because those requests have nothing to do with your server, so nothing in htaccess would apply.
You need to figure out why browsers are sending tracking information to Piwik with the hostname example.com.. As a wild, uneducated guess, it may have to do with some inelegant appending of the hostname to badly formed relative URIs like <a href="./my/page.html">. When the code naively joins the two, you get example.com./my/page.html.
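To illustrate the guess above, here is a minimal Python sketch of how naive string concatenation could produce the stray dot, compared with proper URL resolution. The hostname and link are taken from the example; nothing here is known to be what Piwik or the site's code actually does.

```python
from urllib.parse import urljoin

host = "example.com"       # hostname from the example above
href = "./my/page.html"    # relative link from the example above

# Naive concatenation keeps the "." of "./" and glues it onto the hostname.
naive = host + href

# Resolving the link against a base URL drops the "./" segment instead.
resolved = urljoin("http://" + host + "/", href)

print(naive)
print(resolved)
```

If the tracked URLs look like the first form, the code building them is the place to look, not the server configuration.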

Try using the HTTP_HOST variable. Something like the following might work:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.example\.com\.
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Related

I want to set up redirects in htaccess from one domain to another but I've gone wrong somewhere

I have a website, let's say fruit.com, and currently I have a bunch of redirects set up that work just fine, so for example fruit.com/apples/mcintosh will redirect to fruit.com/apples.php?id=mcintosh.
I also used to have some redirects set up to allow me to use a short URL, so fru.it/mcintosh would redirect to fruit.com/apples.php?id=mcintosh.
So far so good. A few years ago, though, my short domain lapsed and I didn't renew. Recently I've purchased it again and I'm interested in getting the same setup back.
Now, though, the redirects from the short domain to the main domain aren't working, although I've used exactly the same code, so I'm at a bit of a loss for what's going wrong.
RewriteCond %{HTTP_HOST} ^www\.fru\.it$
RewriteRule ^([0-9]+)$ "http\:\/\/www\.fruit\.com\/apples.php?id=$1" [R=301,L]
You say "although I've used exactly the same code", but the code you've posted won't redirect the stated example URL fru.it/mcintosh, since it matches digits only, not letters.
Try the following instead:
RewriteCond %{HTTP_HOST} ^www\.fru\.it
RewriteRule ^(\w+)$ http://www.fruit.com/apples.php?id=$1 [R=301,L]
The \w shorthand character class matches upper and lowercase letters, numbers and underscore.
You don't need all the backslash-escapes in the substitution string.
Also bear in mind that the order of these directives can be important. This rule would likely need to go near the top of the .htaccess file to avoid conflicts.
Test first with a 302 (temporary) redirect to avoid potential caching issues. Clear your browser cache before testing.
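The difference between the two patterns can be checked outside Apache. This is only a rough Python sketch: Python's re module is not the regex engine Apache uses, but these simple patterns behave the same way in both. The captured() helper is made up for the illustration.

```python
import re

# re.ASCII keeps \w limited to [A-Za-z0-9_], as described above.
digits_only = re.compile(r"^([0-9]+)$", re.ASCII)   # pattern from the question
word_chars = re.compile(r"^(\w+)$", re.ASCII)       # pattern from the answer

def captured(pattern, path):
    """Return what $1 would hold for this path, or None if the rule would not match."""
    m = pattern.match(path)
    return m.group(1) if m else None

for path in ("mcintosh", "12345", "granny-smith"):
    print(path, captured(digits_only, path), captured(word_chars, path))
```

Note that a hyphenated name such as granny-smith matches neither pattern, since \w does not include the hyphen.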
Aside: you say "fruit.com/apples/mcintosh will redirect to fruit.com/apples.php?id=mcintosh".
Wouldn't it make more sense for this to be an (internal) "rewrite", not an (external) "redirect"? The short URL would then redirect to fruit.com/apples/mcintosh, not fruit.com/apples.php?id=mcintosh.

Apache: doing pattern matching and grouping with a RewriteRule leads to the local path instead of getting the URL component

I'd like to use RewriteRule's pattern to get the path requested and redirect the client elsewhere keeping the path in the resulting redirect.
I thought something like this would do the trick:
RewriteRule ^(.*)$ http://testserver/test/$1
If the user requests foo, send him to test/foo (don't worry about looping, I put some RewriteCond logic to prevent that).
To my surprise, Apache ends up with something like http://testserver/foo/var/www/html. What it did was the following:
/bar -> /var/www/html/bar
I raised the log level of mod_rewrite and found that it did the match, but Apache was expanding the match to the local path of /, which is /var/www/html, and using that to redirect the browser, which surely won't work.
I tried using [PT], which I thought would prevent the expansion, but it didn't.
Any idea how I can prevent this from happening? Any help would be appreciated.
Best

SEO URLs with ColdFusion controller?

quick ref: area = portal type page.
I would like old urls http://domain.com/long/rubbish/url/blah/blah/index.cfm?id=12345
to redirect to http://domain.com/area/12345-short-title
http://domain.com/area/12345-short-title should display the content.
So far I have worked out that I could use Apache to rewrite all URLs to
http://domain.com/index.cfm/long/rubbish/url/blah/blah/index.cfm?id=12345
and
http://domain.com/index.cfm/area/12345-short-title
The index.cfm will either serve the content or apply a permanent redirect, but it will need to get the title and area information from the database first.
There are 50,000 pages on this website. I also have other ideas for subdomain redirects, and permanent subdomains and controlling how they act through the index.cfm.
Infrastructure are keen to do as much through Apache rewrite as possible, we suspect it would be faster. However I'm not sure we have that choice if we need to get the area and title information for each page.
Has anyone got some experience with this that can provide input?
--
Something to note, I'm assuming we'll have to keep all the internal URLs used on the website in the old format. It would be a mega job to change them all.
This means all internal URLs will have to use a permanent redirect every time.
Rather than redirecting both groups of URLs to the same script, why not simply send them to two distinct scripts?
Simply like this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^\w+/\d+-[\w-]+$ /content.cfm/$0 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.* /redirect.cfm/$0 [L,QSA]
Then, the redirect.cfm can lookup the replacement URL and do the 301 redirect, whilst content.cfm simply serves the content.
(You haven't specified how your CF is setup; you may need to update the Jrun/Tomcat/other config to support /content.cfm/* and /redirect.cfm/* - it'll be done the same as it's done for index.cfm)
For performance reasons, you still want to avoid the database hits for redirecting if you can, and you can do that by generating rewrite rules for each page that performs the 301 redirect on the Apache side. This can be as simple as appending a line to the .htaccess file, like so:
<cfset NewLine = 'RewriteRule #ReEscape(OldUrl)# #NewUrl# [L,QSA,R=301]' />
<cffile action="append" file="./.htaccess" output="#NewLine#" />
(Where OldUrl and NewUrl have been looked-up from the database.)
You might also want to investigate using mod_alias redirect instead of mod_rewrite RewriteRule, where the syntax would be Redirect permanent #OldUrl# #NewUrl# - since the OldUrl is an exact path match it would likely be faster.
Note that these rules will need to be checked before the above redirect.cfm redirect is done - if they are in the same .htaccess you can't simply do an append, but if they are in the site's general Apache config files then the .htaccess rules will be checked first.
Also, as per Sharon's comment, you should verify if your Apache will handle 50k rules - whilst I've seen it reported that "thousands" of regex-based Apache rewrites are perfectly fine, there may well be some limit (or at least the need to split across multiple files).
Using apache rewrites would only be faster if they were static rewrites, or if they all followed some rule that you could write in regex within the .htaccess file. If you're having to touch the database for these redirects, then it may not make sense to do it in .htaccess.
Another approach is the one used by most CMSs for handling virtual directories and redirects. An index.cfm file at the root of the site handles all incoming requests and returns the correct pages and pathing. MURA CMS uses this approach (as well as Joomla and most of the others.)
Basically you're using the CGI.path_info variable on an incoming request, searching for it in your DB, and doing a redirect to the new path. As usual, Ben Nadel has a good write-up of how to use this approach: Ben Nadel: Using IIS URL Rewriting And CGI.PATH_INFO With IIS MOD-Rewrite
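The front-controller logic described here can be sketched in a few lines. This is a Python illustration only, not CFML and not how MURA does it: the PAGES dictionary stands in for the database lookup, and short_url() and route() are hypothetical names.

```python
# Hypothetical rows standing in for the database: id -> (area, title slug).
PAGES = {12345: ("area", "short-title")}

def short_url(page_id):
    """Build the new-style path for a known page id."""
    area, slug = PAGES[page_id]
    return f"/{area}/{page_id}-{slug}"

def route(path_info, query_id=None):
    """Decide what the controller should do with a request.

    Old-style requests carry an id in the query string and get a 301 to
    the short URL; everything else is served as-is with a 200.
    """
    if query_id is not None and query_id in PAGES:
        return 301, short_url(query_id)
    return 200, path_info

print(route("/long/rubbish/url/blah/blah/index.cfm", 12345))
print(route("/area/12345-short-title"))
```

Every old-style request still costs a lookup, which is the trade-off against static Apache rules mentioned above.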
You can, however, use the .htaccess to remove the "index.cfm" from the url string entirely if you want by redirecting all incoming requests to the root URL with something that looks like this in your .htaccess:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^([a-zA-Z0-9-]{1,})/([a-zA-Z0-9/-]+)$ /$1/index.cfm/$2 [PT]
Basically this would redirect something like http://www.yourdomain.com/your-new-url/ to http://www.yourdomain.com/index.cfm/your-new-url/ where it could be processed as described by the blog post above. The user would never see the index.cfm.

Apache Mod_Rewrite Scenario

I was wondering how I would do a complex mod_rewrite. Below is basically how I want it done.
If the user goes to:
-http://files.stuff.example.txt.r.site.com/doc.txt
Then the server would rewrite the url to:
-http://r.site.com/index.php?type=txt&username=example&dir=files.stuff&file=doc.txt
Better picture:
-http://[dir3.dir2.dir1].[username].[type].r.site.com/[file]
Rewrites to:
-http://r.site.com/index.php?type=[type]&username=[username]&dir=[dir3.dir2.dir1]&file=[file]
I created a colour coded image to clearly show what I mean:
(can't embed images) look here:
http://i.stack.imgur.com/24H8j.png
The first subdomains are a directory structure (shown in red), so the amount of subdomains can change.
I hope someone can provide me with a solution. Either using mod_rewrite or maybe another method. Thanks.
Provided that you have configured your DNS so that the requested URL reaches the server where your application runs (perhaps with a wildcard DNS record on your domain, *.site.com -> 123.45.67.89, if your DNS server or hosting supports it), you can create a more or less complicated rewrite rule. I'd do it this way:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(.*)\.r\.site\.com$
RewriteCond $1 !^index\.php
RewriteRule (.*) index.php?subdomain_part=%1&file_part=$1
So in index.php you get $_GET['subdomain_part'] and $_GET['file_part'], which you can parse further to extract parameters according to your convention.
Of course, you can write a more complicated regex so that mod_rewrite extracts the URL parts itself (I'm not much of a regex expert myself). However, doing the parsing in PHP would be much easier, and you can do better error handling (e.g. if the URL is not formed properly).
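As a rough idea of what that application-side parsing might look like, here is a sketch in Python rather than PHP. It works on the full host name and assumes the convention from the question: the last label before r.site.com is the type, the one before that is the username, and anything further left is the directory. parse_host() is a made-up name.

```python
def parse_host(host, base="r.site.com"):
    """Split '<dirs>.<username>.<type>.<base>' into its parts.

    Returns None when the host does not follow the convention, which is
    where the error handling mentioned above would hook in.
    """
    suffix = "." + base
    if not host.endswith(suffix):
        return None
    labels = host[: -len(suffix)].split(".")
    if len(labels) < 2 or "" in labels:
        return None
    # Everything left of the username is the directory part; it may be empty.
    *dirs, username, file_type = labels
    return {"type": file_type, "username": username, "dir": ".".join(dirs)}

print(parse_host("files.stuff.example.txt.r.site.com"))
print(parse_host("txt.r.site.com"))
```

Because the directory part is just "whatever is left", the number of subdomain levels can vary without changing the rewrite rule.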

Apache mod_rewrite not doing anything (?)

I'm having some trouble with Apache's mod_rewrite. One of the things I'm trying to get it to do is hide some of my implementation details, so that, for example, the user sees the URL http://www.mysite.com/login but Apache responds with the page at http://www.mysite.com/doc_root/login.php instead (preferably without showing the user that it's a PHP file or the directory structure). Here's what I have in my .htaccess file:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?mysite.com*
RewriteRule ^/(\w+) /doc_root/$1.php [L]
#Redirect http://www.mysite.com to the login page
RewriteRule ^/?$ https://www.mysite.com/doc_root/login.php
But when I go to http://www.mysite.com/login, I get a 404 error even though the page exists. I clearly don't have a great understanding of how the mod_rewrite conditionals and rules work, so can anyone please tell me what I'm doing wrong? Thanks.
Take doc_root out of everything you have it in. That will give you the result you're asking for, though I'm not sure whether it's what you actually want. How are you going to force someone to log in if they manually type http://www.mysite.com/index.php?
Also, if you're trying to force all traffic to SSL, it's better to use a second VirtualHost and Redirect instead of mod_rewrite. Those are all questions probably better suited to ServerFault.
Unless your site has a bunch of different domain names, and you only want mysite.com to do the rewriting, you don't need the RewriteCond. (Potential problem. Apache likes to dick around with the domain name unless you set UseCanonicalName off. If the name isn't what it's expecting, the rewrite won't happen.)
In RewriteCond (and RewriteRule) patterns, . matches any character. Add a backslash before them. (Minor bug. Shouldn't cause rewrites to fail, but they would match stuff like "mysite-com" as well.)
mod_rewrite is actually a URL-to-filename filter. Though it is often used to rewrite URLs to other URLs, sometimes it will misbehave if what you're rewriting to is a URL and it can't tell. (Especially if what it's rewriting to would be an alias, or would otherwise not translate directly to a real filename.) If you add a [PT] flag onto your rule, though, it will consider the rewritten thing a URL and pass it along to the other filters (including the ones that turn URLs into filenames).
Do you really need "/doc_root"? The document root should already be set up in Apache using the DocumentRoot directive, and shouldn't need to be part of the URL unless you have multiple apps on the same domain (in which case it's the app root; the document root doesn't change).
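The point about unescaped dots is easy to verify outside Apache. A small Python sketch, assuming the host pattern from the question as the loose version and an escaped, anchored variant as the strict one (the strict pattern is my own suggestion, not something from the question):

```python
import re

loose = re.compile(r"^(www.)?mysite.com*")      # as posted: each "." matches any character
strict = re.compile(r"^(www\.)?mysite\.com$")   # escaped dots, anchored at the end

for host in ("www.mysite.com", "mysite.com", "mysite-com", "wwwXmysite.com"):
    print(host, bool(loose.search(host)), bool(strict.search(host)))
```

The loose pattern accepts all four hosts; the strict one accepts only the first two.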
UPDATE:
Another thing I just thought about: rewrite rules work differently in .htaccess files. Apache likes to strip off the leading slash. So you will probably want to get rid of the first slash in your patterns, or at least make it optional (^/?login instead of ^/login).
^/?(\w+) will match /doc_root/login.php, and cause a rewrite to /doc_root/doc_root.php. You should probably have a $ at the end of your pattern.
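A quick way to see why the trailing $ matters, again sketched in Python rather than in Apache itself, with the same two patterns:

```python
import re

unanchored = re.compile(r"^/?(\w+)", re.ASCII)    # no $ at the end
anchored = re.compile(r"^/?(\w+)$", re.ASCII)     # with $ at the end

already_rewritten = "/doc_root/login.php"

# Without $, the first path segment is captured, so the rule would fire again.
m = unanchored.match(already_rewritten)
print(m.group(1) if m else None)

# With $, the already rewritten path no longer matches, but a plain "login" still does.
print(anchored.match(already_rewritten))
print(anchored.match("login").group(1))
```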