Blocking IPs with htaccess and log bloat - apache

I set a 'deny from' in my htaccess to block certain spam bots from parsing my site. While using the code below, I noticed in my log file that I'm getting a lot of 'client denied by server configuration' and it's cluttering up the log files when the bot starts its scan. Any thoughts?
Thanks,
Steve
<Files *>
order allow,deny
allow from all
deny from 123.45.67.8
</Files>
I ended up going with the following:
RewriteCond %{REMOTE_ADDR} ^123\.45\.67\.8$
RewriteRule (.*) - [F,L]

Take a look at the conditional logging here - I think that will provide everything you need:
http://httpd.apache.org/docs/2.2/logs.html
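For example, you could tag requests from the offending address and exclude them from the access log. A minimal sketch, assuming the sample address above and a combined-format log (note this filters the access log only; on 2.2 the 'client denied' lines in the error log can't be filtered this way):
SetEnvIf Remote_Addr "123\.45\.67\.8" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog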
Also, if you can identify that the various bots always come from specific IP addresses, you can block them by IP address in your hosts.allow/hosts.deny files, or automatically with something like BlockHosts or possibly mod_evasive; that way Apache never sees the requests, so it never logs them.
-sean
UPDATE:
Are you identifying the IP addresses manually and then adding them to your .htaccess? That sounds painful. If you really want to do it that way, I would suggest you block the IP addresses at the firewall with a drop rule, or as above in hosts.allow/hosts.deny.
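For example, a hypothetical iptables drop rule for the sample address above:
iptables -A INPUT -s 123.45.67.8 -j DROP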
SPURIOUS BROKEN RECORD UPDATE:
Take a look at BlockHosts; it can block IP addresses based on their 'behavior' and will eliminate the need for you to manually prune them out every day.

You can also have Apache send a log to a program (i.e., a script) instead of a file. Perhaps implement a script that emits just a periodic summary and sends the rest to a log file?
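A minimal sketch of a piped error log (the script path is hypothetical; Apache starts the program at startup and writes each log line to its stdin):
ErrorLog "|/usr/local/bin/log-summary.sh"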

Related

Virtual Hosts (Apache) with mod_rewrite issues

I have been trying to fix this the whole day without success, so I hope someone might be able to help me. I have an app at http://localhost/, which uses Pylons. In addition to that, I need to host a PHP/MySQL site, so I had to use Apache too.
My current setup is that I use haproxy with this config for the Apache backend:
backend apache
mode http
timeout connect 4000
timeout server 30000
timeout queue 60000
balance roundrobin
server app02-8002 localhost:8002 maxconn 1000
This is triggered by this:
acl image url_sub images
use_backend apache if image
So, when I open IP/images, that ACL triggers and the request is handed to Apache on port 8002.
For Apache, I created virtual hosts, and this is the "image" one:
<VirtualHost *:8002>
ServerAdmin my#email.com
ServerName image
ServerAlias image
DocumentRoot /srv/www/image/public_html/
ErrorLog /srv/www/image/logs/error.log
CustomLog /srv/www/image/logs/access.log combined
</VirtualHost>
So that all works nicely: when I type IP/images it serves /srv/www/image/public_html. But then the issues come. As I am using an image-uploading script that involves a lot of rewriting, I had to enable mod_rewrite. This is the .htaccess located in the public_html/images folder (I somehow had to make this subfolder too, to "match" the URL with the actual location in public_html):
SetEnv PHP_VER 5_3
RewriteEngine On
# You must define your installation directory and uncomment the line :
RewriteBase /images/
RewriteRule ^([a-zA-Z]+)\.(jpg|gif|png|wbmp)$ controller/Resizer.php?m=original&a=$1&e=$2 [L]
RewriteRule ^(icon|small|medium|square)\/([a-zA-Z]+)\.(jpg|gif|png|wbmp)$ controller/Resizer.php?m=$1&a=$2&e=$3 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule (.*) application.php?request=$1 [L,QSA]
So, basically, this is somehow not working. I suppose there is a conflict between the virtual host, the subdirectory, and the rewriting, but I can't seem to isolate it.
It is a bit confusing: when I open IP/images/xxxx.jpg it serves the image, which is located in the public_html/images/upload/original folder, so the first rewrite is working. But the other rules seem not to be working. All of the thumbnails and smaller versions (icon, small, medium, square) are not rendering properly, which makes the site quite unusable.
Here is the link of the development server: http://localhost/images/
Thanks in advance for your time and help!
The first thing you should do is determine whether mod_rewrite is in fact part of the problem by accessing one of the failing URLs directly via its rewritten form and verifying that you get the expected result.
Indeed, the problem might simply be that the PHP script for the smaller resolutions "doesn't work" while it does for the original size ones. The first of the following URLs nicely served me an image; the second one is supposed to give me a smaller version of the same image, but served me an HTTP 500:
http://106.186.21.176/images/controller/Resizer.php?m=original&a=q&e=png
http://106.186.21.176/images/controller/Resizer.php?m=small&a=q&e=png
I got the same result (HTTP 500) for any of the smaller-size format names mentioned in your post, which matches your problem description.
Once you've verified that the script works as expected, it's likely that the problem is with mod_rewrite. If so, enable rewrite logging: use the RewriteLog directive to activate it, and RewriteLogLevel to control its verbosity. Especially at the higher log levels, it can give you very detailed information about exactly what it's doing. This should make the problem readily apparent from the logs.
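A minimal sketch for Apache 2.2, the version this question targets (these directives work only in the server config, not in .htaccess; in 2.4 they were replaced by per-module LogLevel):
RewriteLog /var/log/apache2/rewrite.log
RewriteLogLevel 3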
Also, if possible, try to avoid configuring mod_rewrite rules in .htaccess files; move them into your main server config file instead. The reason is explained in the Apache mod_rewrite Technical Details document, section "API phases":
Unbelievably mod_rewrite provides URL manipulations in per-directory context, i.e., within .htaccess files, although these are reached a very long time after the URLs have been translated to filenames. It has to be this way because .htaccess files live in the filesystem, so processing has already reached this stage. In other words: According to the API phases at this time it is too late for any URL manipulations. To overcome this chicken and egg problem mod_rewrite uses a trick: When you manipulate a URL/filename in per-directory context mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.
Again mod_rewrite tries hard to make this complicated step totally transparent to the user, but you should remember here: While URL manipulations in per-server context are really fast and efficient, per-directory rewrites are slow and inefficient due to this chicken and egg problem. But on the other hand this is the only way mod_rewrite can provide (locally restricted) URL manipulations to the average user.
In general, not using .htaccess at all has the added advantage that you can tell Apache not to even look for such files and disable the functionality altogether, which saves Apache from having to scan every directory level it serves from for .htaccess files.
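For example, once the rules live in the server config, you could disable .htaccess processing entirely (a sketch using the DocumentRoot from this question):
<Directory /srv/www/image/public_html>
AllowOverride None
</Directory>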

secure underlying directory with htaccess

I have created an extra FTP account for someone else, so he can upload files (tournament results, about 20-30 HTM files and images).
I am also very paranoid, so in case he uploads "possibly dangerous" files, I do not want those files to be accessible via an HTTP request. With the help of PHP I want to grab the content of those files. (I do not expect trouble with that yet.)
Problem:
My hoster does not allow extra FTP accounts to have access outside public_html.
So I thought .htaccess should solve my problem, just with a deny from all rule.
But with FTP access this .htaccess file can be deleted or changed.
So I tried to add the following code to my main .htaccess file in the root of my site:
<Directory "/home/xxxx.nl/public_html/xxxxxxxx.nl/onzetoernooien/swissmaster_ftp">
deny from all
</Directory>
My site hung with an internal server error.
I have no access to the httpd config file.
My idea was to use an .htaccess file above this directory.
If the absolute path is incorrect, could I use some kind of wildcard, like *swissmaster?
I have searched the Apache website, but I get lost in the overwhelming amount of information.
Thanks in advance for any help!
Unfortunately you can't use a <Directory> section in .htaccess, only in the server configuration file. That causes the server error (check your error logs and you'll see the error message). We can't secure a subdirectory with a <FilesMatch "subdir/.*$"> either, as FilesMatch examines only the filename part of the requested URI.
You can, however, use mod_rewrite, along these lines:
RewriteEngine on
RewriteRule ^subdir.*$ - [NC,F]
If the requested URI matches the regex pattern subdir.* (so "subdir" followed by anything else; you may need to tweak the pattern, as it happily catches subdir_new/something.txt too -- I'm sure you get the idea), then mod_rewrite's F flag will return a 403 Forbidden status (the NC stands for No-Case, making the pattern case-insensitive).
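One way to anchor the pattern to just that directory (a sketch; adjust the name to your actual folder):
RewriteEngine on
RewriteRule ^subdir(/.*)?$ - [NC,F]
This matches subdir itself and anything under subdir/, but no longer catches subdir_new/something.txt.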

Tips for debugging .htaccess rewrite rules

Many posters have problems debugging their RewriteRule and RewriteCond statements within their .htaccess files. Most of them are using a shared hosting service and therefore don't have access to the root server configuration. They cannot avoid using .htaccess files for rewriting and cannot enable RewriteLogLevel as many respondents suggest. Also, there are many .htaccess-specific pitfalls and constraints that aren't covered well. Setting up a local test LAMP stack involves too much of a learning curve for most.
So my question here is: how would we recommend that they debug their rules themselves? I provide a few suggestions below. Other suggestions would be appreciated.
Understand that the mod_rewrite engine cycles through .htaccess files. The engine runs this loop:
do
    execute server and vhost rewrites (in the Apache Virtual Host Config)
    find the lowest "Per Dir" .htaccess file on the file path with rewrites enabled
    if found(.htaccess)
        execute .htaccess rewrites (in the user's directory)
while rewrite occurred
So your rules will get executed repeatedly, and if you change the URI path the engine may end up executing other .htaccess files if they exist. So make sure that you terminate this loop, if necessary by adding extra RewriteCond lines to stop rules firing. Also delete any lower-level .htaccess rewrite rulesets unless you explicitly intend to use multi-level rulesets.
Make sure that the syntax of each regexp is correct by testing against a set of test patterns, to make sure it is valid syntax and does what you intend with a full range of test URIs. See the answer below for more details.
Build up your rules incrementally in a test directory. You can make use of the "execute the deepest .htaccess file on the path" feature to set up a separate test directory (tree) and debug rulesets there without screwing up your main rules and stopping your site from working. You have to add rules one at a time, because this is the only way to localise failures to individual rules.
Use a dummy script stub to dump out server and environment variables (see Listing 1 below). If your app uses, say, blog/index.php, then you can copy this into test/blog/index.php and use it to test out your blog rules in the test subdirectory. You can also use environment variables to make sure that the rewrite engine is interpreting substitution strings correctly, e.g.
RewriteRule ^(.*) - [E=TEST0:%{DOCUMENT_ROOT}/blog/html_cache/$1.html]
and look for these REDIRECT_* variables in the phpinfo dump. BTW, I used this one and discovered on my site that I had to use %{ENV:DOCUMENT_ROOT_REAL} instead. In the case of redirect looping, the REDIRECT_REDIRECT_* variables list the previous pass, and so on.
Make sure that you don't get bitten by your browser caching incorrect 301 redirects. See answer below. My thanks to Ulrich Palha for this.
The rewrite engine seems sensitive to cascaded rules within an .htaccess context (that is, where a RewriteRule results in a substitution and this falls through to further rules), as I found bugs with internal sub-requests (1) and incorrect PATH_INFO processing, which can often be prevented by use of the [NS], [L] and [PT] flags.
Any more comment or suggestions?
Listing 1 -- phpinfo
<?php phpinfo(INFO_ENVIRONMENT|INFO_VARIABLES);
Here are a few additional tips on testing rules that may ease the debugging for users on shared hosting
1. Use a Fake-user agent
When testing a new rule, add a condition to only execute it with a fake user-agent that you will use for your requests. This way it will not affect anyone else on your site.
e.g
#protect with a fake user agent
RewriteCond %{HTTP_USER_AGENT} ^my-fake-user-agent$
#Here is the actual rule I am testing
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^ http://www.domain.com%{REQUEST_URI} [L,R=302]
If you are using Firefox, you can use the User Agent Switcher to create the fake user agent string and test.
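If you prefer the command line, curl can send the fake agent too (the URL here is just a placeholder):
curl -A "my-fake-user-agent" -I http://www.domain.com/some-page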
2. Do not use 301 until you are done testing
I have seen so many posts where people are still testing their rules and they are using 301's. DON'T.
If you are not using suggestion 1 on your site, not only you, but anyone visiting your site at the time will be affected by the 301.
Remember that they are permanent, and aggressively cached by your browser.
Use a 302 instead till you are sure, then change it to a 301.
3. Remember that 301's are aggressively cached in your browser
If your rule does not work and it looks right to you, and you were not using suggestions 1 and 2, then re-test after clearing your browser cache or while in private browsing.
4. Use an HTTP capture tool
Use an HTTP capture tool like Fiddler to see the actual HTTP traffic between your browser and the server.
While others might say that your site does not look right, you could instead see and report that all of the images, css and js are returning 404 errors, quickly narrowing down the problem.
While others will report that you started at URL A and ended at URL C, you will be able to see that they started at URL A, were 302 redirected to URL B and 301 redirected to URL C. Even if URL C was the ultimate goal, you will know that this is bad for SEO and needs to be fixed.
You will be able to see cache headers that were set on the server side, replay requests, modify request headers to test ....
Online .htaccess rewrite testing
I found this while Googling for regex help; it saved me a lot of time from having to upload new .htaccess files every time I make a small modification.
from the site:
htaccess tester
To test your htaccess rewrite rules, simply fill in the URL that you're applying the rules to, place the contents of your .htaccess in the larger input area, and press the "Check Now" button.
Don't forget that in .htaccess files it is a relative URL that is matched.
In a .htaccess file the following RewriteRule will never match:
RewriteRule ^/(.*) /something/$1
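Dropping the leading slash gives a rule that can match in .htaccess context:
RewriteRule ^(.*) /something/$1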
Set environment variables and use headers to receive them:
You can create new environment variables with RewriteRule lines, as mentioned by OP:
RewriteRule ^(.*) - [E=TEST0:%{DOCUMENT_ROOT}/blog/html_cache/$1.html]
But if you can't get a server-side script to work, how can you then read this environment variable? One solution is to set a header:
Header set TEST_FOOBAR "%{REDIRECT_TEST0}e"
The value accepts format specifiers, including the %{NAME}e specifier for environment variables (don't forget the lowercase e). Sometimes, you'll need to add the REDIRECT_ prefix, but I haven't worked out when the prefix gets added and when it doesn't.
Make sure that the syntax of each Regexp is correct
by testing against a set of test patterns, to make sure it is valid syntax and does what you intend with a full range of test URIs.
See regexpCheck.php below for a simple script that you can add to a private/test directory on your site to help you do this. I've kept it brief rather than pretty. Just paste this into a file regexpCheck.php in a test directory to use it on your website. It will help you build up a regexp and test it against a list of test cases as you go. I am using the PHP PCRE engine here, but having had a look at the Apache source, it is basically identical to the one used in Apache. There are many HowTos and tutorials which provide templates and can help you build your regexp skills.
Listing 2 -- regexpCheck.php
<html><head><title>Regexp checker</title></head><body>
<?php
$a_pattern= isset($_POST['pattern']) ? $_POST['pattern'] : "";
$a_ntests = isset($_POST['ntests']) ? $_POST['ntests'] : 1;
$a_test = isset($_POST['test']) ? $_POST['test'] : array();
$res = array(); $maxM=-1;
foreach($a_test as $t ){
$rtn = @preg_match('#'.$a_pattern.'#',$t,$m); // @ suppresses warnings from an invalid pattern
if($rtn == 1){
$maxM=max($maxM,count($m));
$res[]=array_merge( array('matched'), $m );
} else {
$res[]=array(($rtn === FALSE ? 'invalid' : 'non-matched'));
}
}
?> <p> </p>
<form method="post" action="<?php echo $_SERVER['SCRIPT_NAME'];?>">
<label for="pl">Regexp Pattern: </label>
<input id="p" name="pattern" size="50" value="<?php echo htmlentities($a_pattern,ENT_QUOTES,"UTF-8");;?>" />
<label for="n"> Number of test vectors: </label>
<input id="n" name="ntests" size="3" value="<?php echo $a_ntests;?>"/>
<input type="submit" name="go" value="OK"/><hr/><p> </p>
<table><thead><tr><td><b>Test Vector</b></td><td> <b>Result</b></td>
<?php
for ( $i=0; $i<$maxM; $i++ ) echo "<td> <b>\$$i</b></td>";
echo "</tr><tbody>\n";
for( $i=0; $i<$a_ntests; $i++ ){
echo '<tr><td> <input name="test[]" value="',
htmlentities(isset($a_test[$i]) ? $a_test[$i] : '', ENT_QUOTES,"UTF-8"),'" /></td>';
// guard against fewer results than test rows (e.g. on first load)
if(isset($res[$i])) foreach ($res[$i] as $v) { echo '<td> ',htmlentities($v, ENT_QUOTES,"UTF-8"),' </td>';}
echo "</tr>\n";
}
?> </table></form></body></html>
One from a couple of hours that I wasted:
If you've applied all these tips and have only 500 errors to go on because you don't have access to the server error log, maybe the problem isn't in the .htaccess but in the files it redirects to.
After I had fixed my .htaccess problem, I spent two more hours trying to fix it further, even though the remaining errors were simply due to some forgotten file permissions.
Make sure you use the percent sign in front of variables, not the dollar sign.
It's %{HTTP_HOST}, not ${HTTP_HOST}. There will be nothing in the error_log, there will be no Internal Server Errors, your regexp is still correct, the rule will just not match. This is really hideous if you work with django / genshi templates a lot and have ${} for variable substitution in muscle memory.
If you're creating redirections, test with curl to avoid browser caching issues.
Use -I to fetch http headers only.
Use -L to follow all redirections.
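For example (hypothetical URL):
curl -I -L http://example.com/old-page
This prints each hop's status line and headers, so you can see the whole redirect chain without any caching getting in the way.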
Regarding 4., you still need to ensure that your "dummy script stub" is actually the target URL after all the rewriting is done, or you won't see anything!
A similar/related trick (see this question) is to insert a temporary rule such as:
RewriteRule (.*) /show.php?url=$1 [END]
Where show.php is some very simple script that just displays its $_GET parameters (you can display environment variables too, if you want).
This will stop the rewriting at the point you insert it into the ruleset, rather like a breakpoint in a debugger.
If you're using Apache <2.3.9, you'll need to use [L] rather than [END], and you may then need to add:
RewriteRule ^show.php$ - [L]
At the very top of your ruleset, if the URL /show.php is itself being rewritten.
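A minimal sketch of such a show.php (nothing here is prescribed by the answer above; it just dumps what the rewrite delivered):
<?php
// show.php -- debugging stub: print the rewritten GET parameters
header('Content-Type: text/plain');
echo "GET parameters:\n";
var_export($_GET);
echo "\n\nSelected server variables:\n";
var_export(array_intersect_key($_SERVER,
array_flip(array('REQUEST_URI', 'QUERY_STRING', 'SCRIPT_NAME'))));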
Some mistakes I have observed happen when writing .htaccess:
Using ^(.*)$ repetitively in multiple rules. ^(.*)$ renders other rules ineffective in most cases, because it matches the whole URL in a single hit.
So, if we are using a rule for the URL sample/url, it will also consume the URL sample/url/string.
The [L] flag should be used to ensure our rule has finished processing.
Things you should know about:
The difference between %n and $n
%n refers to captures from the RewriteCond pattern, while $n refers to captures from the RewriteRule pattern.
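A short illustration (example.com is a placeholder): %1 picks up "blog" from the RewriteCond capture, and $1 picks up the path from the RewriteRule capture:
RewriteCond %{HTTP_HOST} ^(blog)\.example\.com$ [NC]
RewriteRule ^(.*)$ /%1/$1 [L]
A request for /post on blog.example.com is rewritten to /blog/post.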
Working of RewriteBase
The RewriteBase directive specifies the URL prefix to be used for
per-directory (htaccess) RewriteRule directives that substitute a
relative path.
This directive is required when you use a relative path in a
substitution in per-directory (htaccess) context unless any of the
following conditions are true:
The original request, and the substitution, are underneath the
DocumentRoot (as opposed to reachable by other means, such as Alias).
The filesystem path to the directory containing the RewriteRule,
suffixed by the relative substitution is also valid as a URL path on
the server (this is rare). In Apache HTTP Server 2.4.16 and later,
this directive may be omitted when the request is mapped via Alias or
mod_userdir.
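A sketch along the lines of the official example (paths are hypothetical): for an .htaccess in a directory published under "Alias /xyz", a relative substitution needs the base spelled out:
RewriteEngine On
RewriteBase /xyz
RewriteRule ^oldstuff\.html$ newstuff.html [R]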
I found this question while trying to debug my mod_rewrite issues, and it definitely has some helpful advice. But in the end the most important thing is to make sure you have your regex syntax correct. Due to problems with my own RE syntax, installing the regexpCheck.php script was not a viable option.
But since Apache uses Perl-Compatible Regular Expressions (PCRE)s, any tool which helps writing PCREs should help. I've used RegexPlanet's tool with Java and Javascript REs in the past, and was happy to find that they support Perl as well.
Just type in your regular expression and one or more example URLs, and it will tell you if the regex matches (a "1" in the "~=" column) and if applicable, any matching groups (the numbers in the "split" column will correspond to the numbers Apache expects, e.g. $1, $2 etc.) for each URL. They claim PCRE support is "in beta", but it was just what I needed to solve my syntax problems.
http://www.regexplanet.com/advanced/perl/index.html
I'd have simply added a comment to an existing answer but my reputation isn't yet at that level. Hope this helps someone.
In case you are not working in a standard shared hosting environment, but in one to which you have administrative access (maybe your local test environment), make sure that the use of .htaccess and mod_rewrite is enabled. They are disabled in a default Apache installation, and in that case nothing configured in your .htaccess file works, even if the regexes are perfectly valid.
To enable the use of .htaccess:
Find the file apache2.conf (on Debian/Ubuntu it is in /etc/apache2), and within it find the section
<Directory /var/www/>
Options Indexes FollowSymLinks
AllowOverride None
Require all granted
</Directory>
and change the line AllowOverride None to AllowOverride All.
To enable module mod_rewrite:
On Debian/Ubuntu, execute
sudo a2enmod rewrite
By the way, to disable a module, you would use a2dismod instead of a2enmod.
After you did the above configuration changes, restart Apache for them to take effect:
sudo systemctl restart apache2
If you're planning on writing more than just one line of rules in .htaccess,
don't even think about trying one of those hot-fix methods to debug it.
I have wasted days setting up multiple rules without feedback from logs, only to finally give up.
I got Apache on my PC, copied the whole site to its HDD, and got the whole rule-set sorted out, using the logs, real fast.
Then I reviewed my old rules, which had been working. I saw they were not really doing what was desired: a time bomb, given a slightly different address.
There are so many pitfalls in rewrite rules; it's not a straight logic thing at all.
You can get Apache up and running in ten minutes; it's 10 MB, has a good license, is *NIX/WIN/MAC ready, and even runs without install.
Also, check the header lines of your server and get the same version of Apache from their archive if it's old. My server is still on 2.0; many things are not supported.
I'll leave this here, maybe an obvious detail, but it got me banging my head for hours:
be careful using %{REQUEST_URI}. What @Krist van Besien says in his answer is totally right, but for the REQUEST_URI string in particular, note that the output of this TestString starts with a /. So take care:
RewriteCond %{REQUEST_URI} ^/assets/$
                            ^
                            | check this pesky fella right here if missing
Best way to debug it!
Add LogLevel notice rewrite:trace8 to Apache's httpd.conf to log all notices of mod_rewrite. If you are on shared hosting and don't have access to httpd.conf, test it locally and then upload to the live site. Once enabled, this generates a very large log in a very short time, so it can't be left running on a live server anyway.
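In config form (per-module LogLevel requires Apache 2.4 or later and goes in the server or vhost config, not .htaccess; trace1 through trace3 are often verbose enough):
LogLevel notice rewrite:trace8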
(Similar to Doin's idea)
To show what is being matched, I use this code
$keys = array_keys($_GET);
foreach($keys as $i=>$key){
echo "$i => $key <br>";
}
Save it to r.php on the server root and then do some tests in .htaccess
For example, I want to match URLs that do not start with a language prefix:
RewriteRule ^(?!(en|de)/)(.*)$ /r.php?$1&$2 [L] #$1&$2&...
RewriteRule ^(.*)$ /r.php?nomatch [L] #report nomatch and exit
As pointed out by @JCastell, the online tester does a good job of testing individual redirects against an .htaccess file. More interesting, though, is the API it exposes, which can be used to batch-test a list of URLs using a JSON object. To make it more useful, I have written a small bash script that uses curl and jq to submit a list of URLs and parse the JSON response into CSV-formatted output with the line number and rule matched in the .htaccess file, along with the redirected URL, making it quite handy to compare a list of URLs in a spreadsheet and quickly determine which rules are not working.
Perhaps the best way to debug rewrite rules is not to use rewrite rules at all, but to defer URL processing from the htaccess file to a PHP file (let's call it router.php). Then, you can use PHP to do any manipulating you like, with proper error detection and the usual ways to do debugging. This even runs faster, too, since you don't have to use the rewriting module.
To transfer control immediately from .htaccess to router.php for any URL that is not found in the file system, just put the following line in .htaccess:
FallbackResource router.php
Yes, it's really that easy. And yes, it really works. Give it a try.
Note: You may need an ErrorDocument directive in your .htaccess file to transfer control explicitly for certain URLs to your router.php file on HTTP status 404, especially if you inherit from a parent .htaccess file that handles status 404. So that would make it a total of two lines to transfer control to a router file.
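A hypothetical sketch of such a router.php (the /article/ route is made up for illustration):
<?php
// router.php -- front controller replacing rewrite rules with plain PHP
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (preg_match('#^/article/(\d+)$#', $path, $m)) {
// stand-in for a real dispatch; debug with var_dump()/error_log() as usual
echo 'Article ' . (int)$m[1];
} else {
http_response_code(404);
echo 'No route for ' . htmlspecialchars($path);
}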
If you are working with URL rewriting, you might also want to check that mod_rewrite is actually enabled (see the instructions above).

How can I redirect requests to specific files above the site root?

I'm starting up a new web-site, and I'm having difficulties enforcing my desired file/folder organization:
For argument's sake, let's say that my website will be hosted at:
http://mywebsite.com/
I'd like (have set up) Apache's Virtual Host to map http://mywebsite.com/ to the /fileserver/mywebsite_com/www folder.
The problem arises when I've decided that I'd like to put a few files (favicon.ico and robots.txt) into a folder that is ABOVE the /www that Apache is mounting the http://mywebsite.com/ into
robots.txt+favicon.ico go into => /fileserver/files/mywebsite_com/stuff
So, when people go to http://mywebsite.com/robots.txt, Apache would serve them the file from /fileserver/files/mywebsite_com/stuff/robots.txt.
I've tried to setup a redirection via mod_rewrite, but alas:
RewriteRule ^(robots\.txt|favicon\.ico)$ ../stuff/$1 [L]
did me no good, because basically I was telling Apache to serve something that is above its mounted root.
Is it somehow possible to achieve the desired functionality by setting up Apache's (2.2.9) Virtual Hosts differently, or defining a RewriteMap of some kind that would rewrite the URLs in question not into other URLs, but into system file paths instead?
If not, what would be the preferred course of action for the desired organization (if any)?
I know that I can access the aforementioned files via PHP and then stream them, say with readfile(..), but I'd like to have Apache do as much of the work as possible; it's bound to be faster than doing I/O through PHP.
Thanks a lot, this has deprived me of hours of constructive work already. Not to mention poor Apache getting restarted every few minutes. Think of the poor Apache :)
It seems you are set on using a RewriteRule. However, I suggest you use an Alias:
Alias /robots.txt /fileserver/files/mywebsite_com/stuff/robots.txt
Additionally, you will have to tell Apache about the restrictions on that file. If you have more than one file treated this way, do it for the complete directory:
<Directory /fileserver/files/mywebsite_com/stuff>
Order allow,deny
Allow from all
</Directory>
Can you use symlinks?
ln -s /fileserver/files/mywebsite_com/stuff/robots.txt /fileserver/files/mywebsite_com/stuff/favicon.ico /fileserver/mywebsite_com/www/
(ln is like cp, but with -s it creates symlinks instead of copies.) Note that Apache will only follow the links if Options FollowSymLinks is enabled for that directory.

How can I block mp3 crawlers from my website under Apache?

Is there some way to block access from a referrer using a .htaccess file or similar? My bandwidth is being eaten up by people referred from http://www.dizzler.com which is a flash based site that allows you to browse a library of crawled publicly available mp3s.
Edit: Dizzler was still getting in (probably wasn't indicating referrer in all cases) so instead I moved all my mp3s to a new folder, disabled directory browsing, and created a robots.txt file to (hopefully) keep it from being indexed again. Accepted answer changed to reflect futility of my previous attempt :P
That's like saying you want to stop spam-bots from harvesting emails on your publicly visible page - it's very tough to tell the difference between users and bots without forcing your viewers to log in to confirm their identity.
You could use robots.txt to disallow the spiders that actually follow those rules, but that's on their side, not your server's. There's a page that explains how to catch the ones that break the rules and explicitly ban them : Using Apache to stop bad robots [evolt.org]
If you want an easy way to stop dizzler in particular, you should be able to pop open an .htaccess file in the mp3 directory and add the following (bare directives, since <Directory> sections are not allowed inside .htaccess):
Order Allow,Deny
Allow from all
Deny from 66.232.150.219
From this site: (put this in your .htaccess file)
RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://(www\.)?dizzler\.com [NC]
RewriteRule .* - [F]
You could use something like
SetEnvIfNoCase Referer dizzler.com spammer=yes
Order allow,deny
allow from all
deny from env=spammer
Source: http://codex.wordpress.org/Combating_Comment_Spam/Denying_Access
It's not a very elegant solution, but you could block the site's crawler bot, then rename your mp3 files to break the links already on the site.