How to return a 410 error with htaccess for all *.html queries? - apache

Some time ago, I had a problem with my server. Using a security hole in Joomla, someone created thousands of files (ending in .html) on my website.
I've deleted all these files, but Google keeps requesting them. I've already submitted a hundred (or more) of the filenames for removal in Google Webmaster Tools, but there are still lots left.
What I want is to add a rule to the .htaccess file that returns a 410 status code when any file ending in .html is requested, EXCEPT if the filename is google123456789abcdefg.html (a file from Google). The problem is that the .html files can be in any (non-existing) folder on the web server...
Can you help me with this problem? I don't have a clue how the .htaccess file works...

You can use the following rule:
RewriteEngine on
# Exclude /google12345abc.html (substitute your own Google verification filename)
RewriteCond %{REQUEST_URI} !^/google12345abc\.html$ [NC]
# Answer every other .html request with 410 Gone
RewriteRule \.html$ - [R=410,L]
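
To see which requests the rule above would catch, here is a minimal Python sketch that models its two regular expressions. It is only an approximation of mod_rewrite, not Apache itself: the function name status_for is hypothetical, and it assumes the .htaccess file sits in the web root.

```python
import re

# Hypothetical stand-ins for the two directives in the answer above.
EXCLUDE = re.compile(r"^/google12345abc\.html$", re.IGNORECASE)  # RewriteCond ... [NC]
HTML_SUFFIX = re.compile(r"\.html$")                              # RewriteRule pattern


def status_for(request_uri: str):
    """Return 410 if the modeled rule would fire for this URI, else None."""
    # In a root .htaccess, the RewriteRule pattern sees the path
    # without its leading slash; the RewriteCond sees the full URI.
    relative_path = request_uri.lstrip("/")
    if not HTML_SUFFIX.search(relative_path):
        return None  # not a .html request, the rule does not apply
    if EXCLUDE.search(request_uri):
        return None  # the verification file is let through
    return 410


for uri in ("/any/missing/folder/spam.html", "/google12345abc.html", "/index.php"):
    print(uri, "->", status_for(uri))
```

Because the exclusion is anchored with ^/, only the verification file in the web root is spared; a file of the same name in a subfolder would still get the 410. Apache also offers the [G] flag as a dedicated shorthand for answering with 410 Gone.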

Related

.htaccess Redirect Media File Requests to a Different Domain/Server

Simply speaking, I want to substitute one file path for another in the URI, but only for certain file types.
I have a load of image files (PNG, GIF and JPG) on one server and a wordpress installation on another server. I can't put them all on the same server at the moment (for reasons too complicated to go into).
So, when I get a request for a PNG, GIF or JPG file on e.g.
http://www.server1.com/images1/image1.png
I want to be able to divert this request to the same image, but on server 2, potentially in a different top level subfolder e.g. "allimages" such as:
http://www.server2.com/allimages/images1/image1.png
Then, say divert:
http://www.server1.com/images2/image2.png
to
http://www.server2.com/allimages/images2/image2.png
I tried to make a start with .htaccess (on SERVER1) but haven't got very far. I put a .htaccess file in the root of Server 1, with these lines in:
RewriteEngine On
RewriteCond %{REQUEST_URI} (\.png|\.jpg|\.gif|)$
RewriteRule ^(.*)$ http://www.server2.com/allimages/$1 [L,R=301]
But I know this isn't correct. Can anyone help? Many thanks!
Try with:
RewriteEngine On
RewriteRule ^(.+(?:\.png|\.jpg|\.gif))$ http://www.server2.com/allimages/$1 [L,R=301]
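
The difference between the two patterns can be checked outside Apache. The sketch below uses Python's re module as a stand-in for Apache's regex engine (an assumption that holds for these simple patterns). The asker's condition ends in an empty alternative, so it matches every URI, while the answer's pattern only matches paths ending in one of the three image extensions. The helper name redirect_target is hypothetical.

```python
import re

# The asker's condition: note the empty alternative after the last "|".
ASKER_COND = re.compile(r"(\.png|\.jpg|\.gif|)$")
# The pattern from the answer: the whole path is captured as group 1.
ANSWER_RULE = re.compile(r"^(.+(?:\.png|\.jpg|\.gif))$")


def redirect_target(path: str):
    """Return the server2 URL the answer's rule would redirect to, or None."""
    match = ANSWER_RULE.search(path)
    if match is None:
        return None
    return "http://www.server2.com/allimages/" + match.group(1)


# The empty alternative lets the asker's condition match a non-image URI.
print(ASKER_COND.search("/index.php") is not None)
print(redirect_target("images1/image1.png"))
print(redirect_target("index.php"))
```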

apache configuration for react-router with browserHistory and baseName subdirectory

My application is in a subdirectory of the main web site. I have implemented basename and browserHistory and put the following recommended apache rewrite code into .htaccess in the app folder:
RewriteEngine On
RewriteBase /
RewriteRule ^index\.html$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.html [L]
This recommended Apache rewrite works to return browserHistory URLs to the application's index.html. For some reason (see below), it does not find the "style.css" and "bundle.js" that are EMBEDDED in the index page HTML. The only difference I can see from the example .htaccess file is that I have a RewriteBase value of "/react_librivox_search", because my application is in that subfolder of the site.
RewriteBase /react_librivox_search
I have tested using various different beginning and ending slashes for paths and files in .htaccess, and the problem is not that.
The problem seems to be that the react application is setting a GET value for the files that includes a part of the PATH variable that is supposedly only a part of the react-router definition:
<Route path="/book/:id" component={BookDataDisplay}/>
Note that the additional path segment "/book/" is being appended to the base URL, and when files are not found in THAT directory (which does not really exist), the server returns "index.html" for the missing files, which accounts for the mime-type error for the librivox_search.css file and the "<" error for the bundle.js file.
The stylesheet http://jstest.dd:8083/react_librivox_search/book/librivox_search.css was not loaded because its MIME type, “text/html”, is not “text/css”. librivox_search.css (embedded in index.html)
SyntaxError: expected expression, got '<' bundle.js (embedded in index.html)
The same unexpected addition of "book" to the URL is at work here as well. Neither of the files embedded in index.html is in that subdirectory. But I want to keep "book" in the path, since it identifies what KIND of data is being passed to the route, which distinguishes it from other kinds of data. I just do not want it sent to the SERVER (but perhaps that cannot be avoided), since the files embedded in index.html are not actually there.
I suppose rewriting "/react_librivox_search/book" as "/react_librivox_search" might work, but it seems a hacky way to go about it. And I don't want to have to put duplicate bundle.js files in multiple directories (that works, but what a maintenance nightmare THAT would be, and no doubt bad practice).
Or is it recommended to put a separate .htaccess in a REAL "book" subfolder that returns "librivox_search.css" and "bundle.js" (in the base directory) depending on the file request?
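
The extra "book" segment in the failing requests is consistent with ordinary relative-URL resolution in the browser rather than with anything the router sends. The sketch below assumes that index.html references its assets with relative paths such as "librivox_search.css", which the question does not show; the page URL and the id 42 are made up for illustration. Python's urljoin follows the same resolution rules a browser applies.

```python
from urllib.parse import urljoin

# Hypothetical page URL for the route /book/:id (the id 42 is made up).
PAGE_URL = "http://jstest.dd:8083/react_librivox_search/book/42"

# A relative reference is resolved against the directory of the page URL,
# so it picks up the "book" segment.
relative_ref = urljoin(PAGE_URL, "librivox_search.css")

# A root-relative reference ignores the page's directory entirely.
root_relative_ref = urljoin(PAGE_URL, "/react_librivox_search/librivox_search.css")

print(relative_ref)
print(root_relative_ref)
```

If that assumption holds, referencing the stylesheet and bundle with root-relative paths in index.html would keep "book" in the route while the asset requests go to the real files.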

Use .htaccess to transparently manage versions of site depending on extension

After reading and trying a lot of tutorials and howtos... I have not found a way to do this using .htaccess ;-P
I have this folder structure:
somepath/mywebPHP_v314/ (including many subfolders with php files)
somepath/mywebJS_v007/ (including many subfolders with js files)
somepath/mywebCSS_v876/ (including many subfolders with css files)
somepath/mywebJPG_v543/ (including many subfolders with jpg files)
somepath/standardhostingfolder/.htaccess (Only file at this folder. No subfolders)
I would like to program .htaccess to get this behaviour:
Depending on the extension of the file, Apache will serve transparently the corresponding file.
E.g. when visiting www.mywebname.com/products/family/somearchive.php Apache will serve transparently somepath/mywebPHP_v314/products/family/somearchive.php (instead of serving the usual somepath/standardhostingfolder/products/family/somearchive.php)
e.g. when visiting www.mywebname.com/products/family/foto3.jpg Apache will serve transparently somepath/mywebJPG_v543/products/family/foto3.jpg
Thanks in advance!
P.S. Just for clarifying: "somepath/standardhostingfolder/" is the server's "web root" folder: If I delete somepath/standardhostingfolder/.htaccess: When visiting www.mywebname.com/products/family/foto3.jpg Apache will try to serve naturally somepath/standardhostingfolder/products/family/foto3.jpg .
This should work for .jpg, you can create 3 other rules accordingly:
RewriteCond %{REQUEST_URI} \.jpg$
RewriteCond %{REQUEST_URI} !^/mywebJPG_v543
RewriteRule (.*) /mywebJPG_v543%{REQUEST_URI} [L]
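
As a sketch of how the four rules would fit together, the Python function below models the mapping from file extension to version folder. The folder names come from the question; the function name rewrite is hypothetical. Note the assumption: the rule above rewrites to a URL path, so it only works if each version folder is reachable under the web root (for example through a symbolic link), which the question does not confirm.

```python
import re

# Version folders from the question. Assumption: each one is reachable
# as a URL path under the web root.
VERSION_FOLDERS = {
    "php": "/mywebPHP_v314",
    "js": "/mywebJS_v007",
    "css": "/mywebCSS_v876",
    "jpg": "/mywebJPG_v543",
}


def rewrite(request_uri: str) -> str:
    """Return the internal path the modeled rules would serve."""
    match = re.search(r"\.([A-Za-z0-9]+)$", request_uri)
    if match is None:
        return request_uri  # no extension, nothing to do
    folder = VERSION_FOLDERS.get(match.group(1))
    # The second RewriteCond: skip URIs already inside the version folder,
    # otherwise the rule would be applied again and again.
    if folder is None or request_uri.startswith(folder):
        return request_uri
    return folder + request_uri


print(rewrite("/products/family/foto3.jpg"))
print(rewrite("/products/family/somearchive.php"))
```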

mod_rewrite inserting full path to file

I need to create a rewrite that takes traffic going to mp3/mp4 files in a specific subdirectory and routes it to a PHP file that tracks download stats etc. before passing it on to the actual file location, since iTunes requires that your podcast RSS contain actual media file extensions (.mp3, .mp4, etc.).
I have created rewrites before with no problem but now I am running into an odd issue on this company's server.
My .htaccess located at www.company.com/companytools/podcasts
RewriteEngine on
RewriteRule ^/(.*).mp3$ /test.php?file=$1 [r=301,L]
Right now it is partially working: it does act on the mp3 file, but it ends up including the full path to test.php after the domain, so I end up with a 404 page looking for this URL:
www.company.com/www/internal/docs/companytools/podcasts/test.php?file=test
basically I need the path, but only the /companytools/podcasts part.
Any help is appreciated.
You probably don't need R=301 here; dropping it keeps the rewrite internal and hides the actual PHP handler.
Try this rule with RewriteBase:
RewriteEngine on
RewriteBase /companytools/podcasts/
RewriteRule ^(.+?)\.mp3$ test.php?file=$1 [L,QSA]
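
A rough Python model of the rule above shows what the handler would receive. This is a sketch under simplifying assumptions: it treats RewriteBase as a plain prefix added to the relative substitution, it ignores the QSA flag (which would append any original query string), and the function name rewrite is hypothetical.

```python
import re

REWRITE_BASE = "/companytools/podcasts/"
# Lazy capture up to a literal ".mp3" at the end of the path.
PATTERN = re.compile(r"^(.+?)\.mp3$")


def rewrite(relative_path: str):
    """Return the internal target for a path relative to the .htaccess folder."""
    match = PATTERN.search(relative_path)
    if match is None:
        return None  # not an mp3 request, rule does not apply
    return f"{REWRITE_BASE}test.php?file={match.group(1)}"


print(rewrite("test.mp3"))
print(rewrite("test.mp4"))
```

Escaping the dot matters here: the original ".mp3" in the question would also accept any character in place of the dot.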

How do I force Apache to simply redirect the user and ignore the directory structure?

Ok, so this problem recently arose and I don't know why it is happening; it's actually two problems in one...
0. My .htaccess file, for reference. (EDITED)
Options -Indexes +FollowSymLinks
RewriteEngine On
RewriteBase /
ErrorDocument 400 /index.php?400
ErrorDocument 401 /index.php?401
ErrorDocument 403 /index.php?403
ErrorDocument 404 /index.php?404
ErrorDocument 410 /index.php?410
ErrorDocument 414 /index.php?414
ErrorDocument 500 /global/500.php
RewriteCond %{HTTP_HOST} !^$ [NC]
RewriteRule .* index.php [L]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(.*\.)?(animuson)\.(biz|com|info|me|net|org|us|ws)/.*$ [NC]
RewriteRule ^.*$ - [F]
1. My 'pictures' folder is following the hard path instead of the redirect.
I have no idea WHY it is doing this. It's really bugging me. The 'pictures' folder is a symbolic link to another place so that I can easily upload files to that folder without having to search through folders and such via my FTP account, but that's the only thing I use it for. However, when I visit http://example.com/pictures my htaccess sees it as accessing that other folder, which is restricted, and throws a 403 error rather than redirecting to index.php and displaying the page like normal.
I figured it has something to do with that specific folder being a symbolic link causing it to act oddly, but I have determined that my rules are not being applied to folders at all. If I visit folders such as 'css' and 'com' which are folders in the web root, it displays a 404 error page and adds the '/' to the end of the URL because it's treating it as a directory. It also does the same 403 error for my 'images' directory which is set up in the same fashion.
So, the question here is how do I modify my RewriteRule to apply to the directories as well? I want everything accessed via the web to be redirected back to index.php while maintaining the full access path in the address bar, why is it not working? (I'm pretty sure it was working fine before.)
Here's a small chart to show the paths they're following...
example.com/pictures -> pictures/ -> /home/animuson/animuson-pictures -> 403
example.com/com -> com/ -> 404
example.com/test -> index.php
example.com/ -> index.php
example.com/images -> images/ -> /home/animuson/animuson-images -> 403
example.com/css -> css/ -> 404
EDIT: Following information added.
Apache is processing the structure of the directory first. It's determining if the path exists based on what was typed into the address bar. If someone types in a folder name that happens to exist, it will redirect the user to the path with the "/" at the end of the URL signifying that it's a directory. For the 'pictures' directory explained above, the user does not have permission to access that folder so it is redirecting them to a 403 Access Denied page rather than simply showing the page that is supposed to be displayed there via the RewriteRule above. My biggest question is why is Apache processing the directory first and how do I make it stop doing that? I would really love an answer to this question.
2. Why is my compression not working? (EDIT: This part is fixed.)
When analyzing my site through a web optimizer, it keeps saying my page isn't using web compression, but I'm almost 100% positive that it was working fine before under the same settings. Can anyone suggest any reasons why it might not be working with this set up or suggest a better way of doing it?
Where is this .htaccess file situated? At the root or in the pictures directory?
1) You're using Options -Indexes, which denies access to directory listings. The resulting 403 is handled by /index.php?403, which in turn redirects to /403. (I confirmed this by manually going to /index.php?403.) I don't see any other rules in the posted .htaccess that are supposed to affect this, so the redirect must come from index.php itself, from some other .htaccess file, or from a server-level rule.
You might also want to check the UNIX file permissions of the directory in question.
2) According to this optimizer, http://www.websiteoptimization.com/services/analyze/, compression is indeed enabled for html, js and css files, as specified in the rules. My bet is that your optimizer is being stupid and does one of these three things:
1)) Complaining about images not being compressed. (It's generally a bad idea to compress images because they're typically already compressed and the extra CPU load typically isn't worth it since the net gain is so small. So your rules are OK in this regard.)
2)) It might think that DEFLATE doesn't count as compression, and wants you to use GZip.
3)) It might also react to the externally included StatCounter js file, which is not compressed. (And there's not much you can do about that.)
After a while of deliberating on Apache's IRC channel, I was finally able to figure out the real reason behind this on a fluke. I just happened to be looking at the directory structure using ls -l and noticed that all of the symbolic links had somehow had their ownership changed to animuson:animuson from the original root:root. I tried to run a simple chown root:root on them and it had no effect, so I deleted them all and recreated them, and the problem has gone away. I don't really have any idea why the ownership made any difference in this scenario, but the solution worked and everything is okay now. I've also added DirectorySlash Off to my .htaccess file to get rid of the slashes after folders that exist, just to make it look that much nicer.