htaccess block every IP/visitor and bots except google bot - apache

I am learning htaccess. Is the following possible by using htaccess:
1) Block every visitor/IP to site.
2) Block all the bots except google bot.
RewriteEngine On
order deny,allow
deny from all
RewriteCond %{HTTP_USER_AGENT} (bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
Is the above htaccess right? Any help would be appreciated.

Is the above htaccess right?
No, of course it isn’t right – because you are blocking all requests with
order deny,allow
deny from all
– so the Google bot won’t get access either.
You can do this by using a combination of SetEnvIf and Allow – see http://httpd.apache.org/docs/2.2/mod/mod_authz_host.html#allow, that has an example for exactly this.
(You’ll need to remove the Directory directive used in there, because that can’t be used in .htaccess files. But only those two lines, what is inside the directive you have to keep of course.)
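As a rough sketch of what that combination could look like in a .htaccess file (Apache 2.2 syntax, or 2.4 with mod_access_compat loaded): the variable name is_googlebot is made up for this example, and matching on the User-Agent header is only an assumption about how you want to identify Googlebot, since that header is trivial to spoof.

```apache
# Flag requests whose User-Agent contains "Googlebot" (case-insensitive).
# "is_googlebot" is an arbitrary variable name chosen for this example.
SetEnvIfNoCase User-Agent "Googlebot" is_googlebot

# Deny directives are evaluated first, then Allow; a matching Allow wins.
Order Deny,Allow
Deny from all
Allow from env=is_googlebot
```

Every request that does not carry the flag, including other bots and ordinary visitors, gets a 403.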

Related

error 404 defining URL in mod rewrite

I'm trying to rewrite my URLs to be cleaner, more user friendly, and better for SEO. Whenever the user clicks a country link to see the list of train journeys for that country (e.g. Italy), it should call the page country.php?country=italy, but the URL should be rewritten to great_train_journeys/country/italy.
I've tried to set rewrite rules in a .htaccess file, but I'm getting a 404 error.
Here is my code for the .htaccess file:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^country=(.*)$
RewriteRule ^country/([A-Za-z0-9-]+)/?$ country.php?country=$1 [NC,R]
I'm using XAMPP to work on my local server, so my project folder is inside the HTDOCS folder, which is the root of my server:
Here is my project structure:
I've checked that mod_rewrite is enabled in the httpd.conf file and also changed AllowOverride to All, as shown below:
<Directory />
Options Indexes FollowSymLinks MultiViews
AllowOverride All
Order allow,deny
allow from all
</Directory>
Thanks for your help
I'm trying to rewrite my URL's from this, country.php?country=Italy to this country/Italy
I believe you meant it the other way around; at least that is what your code tells me. If someone enters example.com/country/italy, you want /country.php?country=italy to be called internally without the user seeing it.
So in this case you need:
RewriteEngine on
RewriteRule ^/?country/([^/]+)/?$ /country.php?country=$1 [NC,L]
But if you mean it the other way around, so that the URL in the browser is example.com/country.php?country=Italy but the request should internally go to example.com/country/Italy, then you need the following:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^(.*&)?country=([^&]*)(&.*)?$
RewriteRule ^/?country\.php$ /country/%2 [NC,L]
This has a small problem: if the real folder is example.com/country/Italy and someone enters example.com/country.php?country=italy, no folder named italy will be found and we get a 404 error.
In your original code you also used the [R] flag (which means redirect). If, for example, a user enters example.com/country.php?country=italy and the URL in the browser should change to example.com/country/italy,
then you should do this:
RewriteEngine on
RewriteCond %{QUERY_STRING} ^(.*&)?country=([^&]*)(&.*)?$
RewriteRule ^/?country\.php$ /country/%2? [NC,R=301,L]
Now we do a 301 redirect.
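If the internal rewrite and the 301 redirect are meant to live in the same .htaccess, matching on QUERY_STRING can loop, because the internal rewrite itself puts country=... into the query string. A common way around that is to match %{THE_REQUEST}, which holds the original request line and does not change after a rewrite. A sketch, assuming the .htaccess sits in the document root next to country.php (the paths would need adjusting if the site lives in a subfolder):

```apache
RewriteEngine On

# 1) External redirect, only when the client itself asked for
#    /country.php?country=... (THE_REQUEST is the raw request line).
RewriteCond %{THE_REQUEST} \s/+country\.php\?country=([^&\s]+) [NC]
# The trailing "?" drops the old query string from the redirect target.
RewriteRule ^ /country/%1? [R=301,L]

# 2) Internal rewrite from the clean URL to the real script.
RewriteRule ^country/([^/]+)/?$ country.php?country=$1 [NC,L]
```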

.htaccess block outside access on local server except for certain URL's

I currently have my local .htaccess on a MAMP server set up to block all incoming traffic from outside my local system:
<FilesMatch ".*">
Order deny,allow
Deny from all
Allow from 127.0.0.1
</FilesMatch>
This works fine, but I also use APIs like PayPal that require access to the site for IPNs. Is it possible to keep the restriction on the rest of the site and allow outside access only to specific URLs like https://example.com/paypal_ipn?
I understand I can just switch the restriction off when using IPN's but that's not what I'm looking for. Many thanks.
You can use mod_rewrite based rules instead in your root .htaccess:
RewriteEngine On
RewriteCond %{THE_REQUEST} !/paypal_ipn [NC]
RewriteCond %{REMOTE_ADDR} !^127\.0\.0\.1$
RewriteRule ^ - [F]
This will block every request unless it is either:
originating from localhost (127.0.0.1), or
for /paypal_ipn

Block IP access to specific page only

I need to block access from a certain IP address to one page of a website only, not the entire website.
Here's what I have, but it doesn't seem to be working (I swap the offending IP for my own and am still able to access the page after a refresh, cache dump, etc.)
<Files specificpage.php>
order deny,allow
deny from XX.XXX.XXX.XX
</Files>
Is there a better way of doing this or does anything jump out here?
thx
You can actually use mod_rewrite rules for finer control here. Place this in your root .htaccess:
RewriteEngine On
RewriteCond %{REMOTE_ADDR} =XX.XXX.XXX.XX
RewriteRule ^specificpage\.php$ - [F,NC]
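On Apache 2.4 the Order/Deny directives are deprecated in favour of Require. A minimal sketch of the same block in that syntax, assuming Apache 2.4 with mod_authz_core and mod_authz_host available; 203.0.113.7 is a placeholder address standing in for the real one:

```apache
<Files "specificpage.php">
    <RequireAll>
        # Let everyone in ...
        Require all granted
        # ... except this one address (placeholder from the documentation range).
        Require not ip 203.0.113.7
    </RequireAll>
</Files>
```

Whichever variant is used, if the server sits behind a proxy or load balancer, the address Apache sees may be the proxy's rather than the visitor's, which is one possible reason a rule never seems to match.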

Apache .htaccess to redirect index.html to root, why FollowSymlinks and RewriteBase?

In order to redirect all somefolder/index.html (and also somefolder/index.htm) to somefolder/ I use this simple rewrite rule in Apache .htaccess file:
RewriteEngine on
RewriteCond %{THE_REQUEST} ^.*\/index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ "/$1" [R=301,L]
This works well!
But on Google Groups they also suggest adding:
Options +FollowSymlinks
RewriteBase /
Could anyone be so kind as to explain why I would have to add these last lines, and what they mean and do?
Is there a potential security risk in not adding these lines?
Many thanks,
Why they're suggested:
It's suggested that you add Options +FollowSymlinks because it's necessary that symlink following is enabled for mod_rewrite to work, and there's a chance that, while you may be allowed to turn it on, it's not enabled by the main server configuration. I suspect the reason that symlink following is necessary is because the module makes a number of calls to apr_stat(), which looks like it needs to follow symlinks in order to get file information in all cases.
As for RewriteBase, it's typically not necessary. The documentation goes on about it, but as most people's files do live under the DocumentRoot somewhere, it usually only serves a purpose if you're redirecting externally and you use directory-relative URLs. To illustrate what I mean, consider the following:
RewriteEngine On
RewriteRule ^redirect index.html [R,L]
A request for example.com/redirect will result in an external redirect to example.com/full/path/to/web/root/index.html. The reason for this is that before it handles the redirection, mod_rewrite re-appends the current directory path (which is the default value of RewriteBase). If you modified RewriteBase to be /, then the path information would be replaced with that string, so a request for index.html would now be a request for /index.html.
Note that you could just have done this explicitly on the replace too, regardless of the value of RewriteBase:
RewriteEngine On
RewriteRule ^redirect /index.html [R,L]
...works as intended, for example. However, if you had many rules that needed a common base and were being shifted around between directories, or your content wasn't under the root, it would be useful to appropriately set RewriteBase in that case.
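For comparison, the RewriteBase variant described above could be sketched like this, assuming the .htaccess file sits in the document root so that / is the correct base:

```apache
RewriteEngine On
# Relative substitutions are resolved against this URL path
# instead of the filesystem directory.
RewriteBase /
# "index.html" now becomes "/index.html" in the redirect.
RewriteRule ^redirect index.html [R,L]
```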
The risk of not using them:
There's absolutely no security risk in not specifying Options +FollowSymlinks, because if you don't and it's not set by the main server configuration, mod_rewrite will always return 403 Forbidden. That's kind of problematic for people trying to view your content, but it definitely doesn't give them any extended opportunity to exploit your code.
Not setting RewriteBase could expose the path to your web content if you had an improperly configured rule set in one of your .htaccess files, but I'm not sure that there's any reason to consider that a security risk.

RewriteRule for mapping x.domain.com to y.domain.com

Is it possible to redirect all requests to x.domain.com/.* to y.domain.com/.* WITHOUT letting this redirection be visible in the url?
I have unsuccessfully tried several things in .htaccess. Just specifying the [L] flag still shows this redirection in the url (as it does when I use the [R] flag additionally).
EDIT: as somebody claimed there being no reason for this, let me give some more information :)
I have one nice url: x.domain.com , which is well known.
Then there are a number of other domains: spring.domain.com , summer.domain.com , autumn.domain.com, winter.domain.com .
Depending on the time of the year, a specific y.domain.com becomes the current one. The x.domain.com should always map to the current one.
EDIT2:
I'll write here, as the code isn't nicely rendered in the comments...
I tried what Arjan suggested:
RewriteCond %{HTTP_HOST} ^x.domain.com$
RewriteRule ^(.*)$ /path/to/y.domain.folder/$1
Unfortunately, though, this keeps redirecting forever. :(
Request exceeded the limit of 10 internal redirects due to probable configuration error. Use 'LimitInternalRecursion' to increase the limit if necessary. Use 'LogLevel debug' to get a backtrace.
Putting the [R] flag behind, I see in the url something like:
http://x.domain.com/path/to/y.domain.folder/path/to/y.domain.folder/path/to/y.domain.folder/ ...
Any suggestions?
Now that I can read the error logs, I can give a direct response as to what a possible 500 error refers to.
Assuming you have access to the Apache configuration, create the following virtual host for domain x.domain.com. Then simply update y to whatever you need each season.
<VirtualHost ...:80>
ServerName x.domain.com
UseCanonicalName Off
ProxyRequests Off
<Proxy *>
Order Allow,Deny
Allow from all
</Proxy>
ProxyPreserveHost Off
RewriteEngine On
RewriteRule ^$ http://y.domain.com/ [P,NC]
RewriteRule ^/(.*)$ http://y.domain.com/$1 [P,NC]
ProxyPassReverse / http://y.domain.com/
</VirtualHost>
Also to pick up the Alias suggestions, if you have multiple virtual hosts (one for each season) then you could put a server alias into the current domain. E.g.
<VirtualHost ...:80>
ServerName summer.domain.com
ServerAlias x.domain.com
...
</VirtualHost>
<VirtualHost ...:80>
ServerName spring.domain.com
...
</VirtualHost>
...
This would make Apache deliver the summer.domain.com pages if you go to x.domain.com. If your seasonal subdomains depend on the HOST header line to be set correctly (i.e. to season.domain.com) you would need to use the first suggestion above, though.
If these are not hosted on the same server, then you'd need the Proxy flag. This also requires the proxy module to be running. Not tested:
RewriteCond %{HTTP_HOST} ^x.domain.com$
RewriteRule ^(.*)$ http://y.domain.com/$1 [P]
EDIT: Given the edits to your question they're probably just on the same server. So then indeed, as jetru suggested an Alias might do. Or:
# No RewriteCond required; serve all content from other folder:
RewriteRule ^(.*)$ /path/to/y.domain.folder/$1
EDIT: The above would not change the HTTP_HOST header that was sent by the browser (maybe that can be done as well). This implies that it would only work if the subdomains are represented on the file system as separate directories. So, as the .htaccess would then be placed in the directory holding the website for x.domain.com, the RewriteCond wouldn't even be required. Also, the directory for this x.domain.com subdomain would in fact not need any HTML content then; in the end all content would be served from the directory of another subdomain.
EDIT: As the above does not seem to work either, and yields endless rewrite loops even when adding [NS], maybe simply adding [L] helps here:
RewriteRule ^(.*)$ /path/to/y.domain.folder/$1 [NS,L]
Or maybe one can set an environment variable to stop the loop:
RewriteCond %{ENV:MY_VAR} !=1
RewriteRule ^(.*)$ /path/to/y.domain.folder/$1 [E=MY_VAR:1]
But, for both [L] and [E]: I'm just guessing; I've never made mod_rewrite jump into the directory of another virtual host. I am not sure it can be done to start with.
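One commonly used guard against this kind of loop in per-directory context is to test REDIRECT_STATUS, which Apache leaves empty on the initial request and sets once an internal redirect has happened. This is only a sketch under the same assumptions as above (the target path is the placeholder from the question), and it does not settle whether the other virtual host's directory is reachable at all:

```apache
RewriteEngine On
# Only rewrite the original request, not the already-rewritten one.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule ^(.*)$ /path/to/y.domain.folder/$1 [L]
```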
Unfortunately, it's unclear how one would add a new subdomain. If one would just need to create a new directory with the name of the subdomain (without any use of some administrative tool) then the provider might be using system wide rewriting as well. In fact, even without subdomains the provider might be doing some Mass Virtual Hosting as described in the URL Rewrite Guide.
I guess the best solution would be to change the value of HTTP_HOST on the fly, to solve issues with any system wide rewriting. Maybe the following is allowed to achieve that:
RewriteCond %{HTTP_HOST} ^x.domain.com$
RewriteRule ^(.*)$ /path/to/y.domain.folder/$1 [E=HTTP_HOST:y.domain.com]
Again, as the above would only be present in the .htaccess in the x.domain.folder, the RewriteCond is probably not needed at all.
Have you tried
Alias /dir/file.html /full/path/to/other/file.html
??
To my knowledge, and from testing with Firebug, a redirect via .htaccess is always announced to the client, and it's up to the client how to proceed. It is therefore not an alternative to some sort of SSI functionality. To prevent a "fake" address, modern browsers should always make the REAL address visible to the user; however, I think I have seen some misbehavior in programs like "feeddemon" where IE is embedded. If you, for whatever reason, really want to show content from one subdomain on another, you can try using JavaScript or (i)frames on the client side, or some include functionality on the server side (e.g. file_get_contents with PHP). However, I don't recommend this.