Apache: include htaccess in conf with AllowOverride None, better performance?

Suppose we have the /home/example.org/public_html/ directory on the filesystem, which serves as the document root of my virtualhost.
The relevant httpd configuration for that vhost would look like this:
<VirtualHost *:80>
ServerName example.org:80
...
DocumentRoot /home/example.org/public_html
<Directory /home/example.org/public_html>
AllowOverride All
...
</Directory>
...
</VirtualHost>
To prevent the .htaccess lookups on the filesystem without losing the .htaccess functionality (at least at the DocumentRoot level), I transformed the configuration to the following:
<VirtualHost *:80>
ServerName example.org:80
...
DocumentRoot /home/example.org/public_html
<Directory /home/example.org/public_html>
AllowOverride None
Include /home/example.org/public_html/.htaccess
...
</Directory>
...
</VirtualHost>
The difference:
AllowOverride None
Include /home/example.org/public_html/.htaccess
Let’s see what we have accomplished with this:
httpd does not waste any time looking for and parsing .htaccess files, resulting in faster request processing.
Questions:
Using the Include directive, does Apache load the .htaccess file only at service start, or on each request?
If it only loads at start, how do I refresh the Apache configuration without httpd.exe -k restart?

Firstly, note that checking for .htaccess is commonly not all that big an issue, since the relevant bits of the disk are cached in memory. It becomes an issue where for example you have a very large number of directories under your web root directory or directories, and the hits are scattered amongst them so that the hit rate on cached disk blocks is low. You might be better dealing with that by disabling .htaccess selectively for directory trees where it creates a problem. Parsing the .htaccess directives creates a little CPU load of course, but CPU should generally not be your server's bottleneck.
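For instance, you could keep .htaccess support where it is needed but switch it off for a busy subtree; a minimal sketch, with illustrative paths:
<Directory /home/example.org/public_html>
AllowOverride All
</Directory>
# No .htaccess lookups below this heavily-hit subtree
<Directory /home/example.org/public_html/static>
AllowOverride None
</Directory>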
Answering your question as posed, though: yes, you will need to run a command as root to load the new configuration. Rather than using restart, use reload or (better) graceful.
httpd.exe -k graceful
You could (but probably shouldn't) write a cron job to periodically check whether this needs to be run. Without a lot of testing, I think something like this should work, run as a regular root cron job:
#!/bin/bash
# Reload only when .htaccess has been modified since the last (re)start,
# i.e. when it is newer than the server's pid file
[ /home/example.org/public_html/.htaccess -nt /var/run/httpd/http.pid ] \
&& httpd.exe -k graceful
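Installed as a root cron job, that could look like the following; the script path and the five-minute schedule are assumptions:
# /etc/crontab
*/5 * * * * root /usr/local/sbin/htaccess-reload-check.sh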
This creates a bit of disk load itself of course. This load doesn't increase with traffic volume, but might be an issue if you have many such included files.
SECURITY WARNING: It sounds like you are setting up a situation where a non-root user is likely to be able to get Apache to Include directives at will. This is much more powerful than what can be done with a .htaccess file, and amounts to a root exploit. E.g. it gives access to things like the User and LoadModule directives, which .htaccess directives can never do.
I recommend putting the Included directives in a file inside your Apache configuration directory, accessible only by root. There are other ways to make sure that only root can edit the .htaccess file, but getting these files out of the user-owned area makes it less likely you'll inadvertently open access again later.
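A sketch of that arrangement, with an assumed file name under the configuration directory:
<Directory /home/example.org/public_html>
AllowOverride None
# Root-owned file (e.g. owner root:root, mode 0644), outside the user's tree
Include /etc/httpd/conf.d/example.org-overrides.conf
</Directory>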
While the .htaccess mechanism does incur extra disk load, it is the mechanism that's designed for use by non-root users. It would be nice to have a mechanism for untrusted users to modify configuration with a limit on how often the .htaccess file would be checked for, but if it exists, I don't know it.

Apache reads and processes .htaccess files on each request. This is why you do not need to restart the server for changes in them to take effect.
You do need to restart the server/service to apply any changes made to apache.conf, httpd.conf or the vhost configurations.
Quoting from Apache's tutorial on the .htaccess file:
You should avoid using .htaccess files completely if you have access to the httpd main server config file. Using .htaccess files slows down your Apache http server. Any directive that you can include in a .htaccess file is better set in a Directory block, as it will have the same effect with better performance.
Since you are already trying to Include the .htaccess from inside a <Directory> block, performance would be even better if you moved everything from the file into that block itself. Functionally there is no difference, apart from having to maintain configuration in two places simultaneously.
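In other words, the block would end up carrying the former .htaccess directives directly; RewriteEngine here is just a placeholder for whatever your file actually contains:
<Directory /home/example.org/public_html>
AllowOverride None
# former .htaccess contents pasted here
RewriteEngine On
...
</Directory>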
Pulled in via Include, the .htaccess file gets processed just once, at server start.

Related

.htaccess file on localhost

I have a localhost on Ubuntu 16. In the localhost root directory (/var/www/html/) I put this .htaccess file.
AddDefaultCharset utf-8
RewriteEngine on
RewriteRule ^index?$ index.php
When I type localhost/index, Apache tells me
The requested URL /index was not found on this server.
Is that an error in Apache configuration?
Basically I want to redirect to index.php in the root of my site, and then parse something like localhost/cart/item/1 into an array to build an MVC structure. I am new to web development and do not really understand how to do it; please help me.
You have to enable the interpretation of such dynamic configuration files (".htaccess" style files) first. They are disabled by default, since they slow down the server considerably. Usually it is preferable to place such rules directly in the servers static configuration files.
To enable them, take a look at the AllowOverride directive: https://httpd.apache.org/docs/2.4/mod/core.html#allowoverride
<Directory "/var/www/html">
AllowOverride All
</Directory>
So since you have to modify that configuration anyway... why don't you place your rewrite rules in there too? Easier, more robust and faster too...
Apart from that, it is sometimes a good idea to implement rewrite rules in such a way that they work in both locations, the http server's (virtual) host configuration and dynamic configuration files:
RewriteEngine on
RewriteRule ^/?index/?$ index.php [L]
And a general hint: you should always prefer to place such rules inside the http server's host configuration instead of using dynamic configuration files (".htaccess"). Those files are notoriously error prone, hard to debug, and they really slow down the server. They are only provided as a last option for situations where you do not have control over the host configuration (read: really cheap hosting service providers) or if you have an application that relies on writing its own rewrite rules (which is an obvious security nightmare).
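For the routing part of the question (getting localhost/cart/item/1 into index.php for parsing), a common front-controller pattern is sketched below; the exact rules are an assumption, not something from the answer above:
RewriteEngine on
# Leave requests for real files and directories alone
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Route everything else through the front controller, keeping the query string
RewriteRule . /index.php [L,QSA]
index.php can then split $_SERVER['REQUEST_URI'] on "/" to recover cart, item and 1 as an array.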
Maybe you need to create a folder and then work on your application inside it. For example:
Create /var/www/html/project;
Put your .htaccess inside;
Work on your application inside that folder.

Apache always gets a 403 permission error after changing DocumentRoot

I'm a newbie with Apache. I just installed Apache 2.2 on the FreeBSD box at my home office. The FreeBSD documentation says that I can change the DocumentRoot directive in order to serve my own directory of data. Therefore, I replaced...
/usr/local/www/apache22/data
with
/usr/home/some_user/public_html
but something is not right. There's an index.html file inside the directory, but it seems that Apache cannot read the directory/file.
Forbidden
You don't have permission to access / on this server.
The permissions on public_html are drwxr-xr-x.
I wonder what could be wrong here. Also, in my case I am not going to host more than one website on this FreeBSD box, so I didn't look at using VirtualHost at all. Is it good practice to just change the DocumentRoot directive?
Somewhere in the Apache config is a line like:
# This should be changed to whatever you set DocumentRoot to.
#
<Directory "/usr/local/www/apache22/data">
You must change this path too, to make it work. This directive block contains, for example:
Order allow,deny
Allow from all
which gives the initial user access to the directory.
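Applied to the paths from the question, the adjusted configuration would look roughly like this (Apache 2.2 access-control syntax, matching the version in the question):
DocumentRoot "/usr/home/some_user/public_html"
<Directory "/usr/home/some_user/public_html">
Order allow,deny
Allow from all
</Directory>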
One possibility that comes to mind is SELinux blocking the web process from accessing that folder. If this is the case, you would see it in the SELinux log. You would have to check the context of your original web root with:
ls -Zl
and then apply it to your new web folder:
chcon whatevercontextyousaw public_html
Or, instead, if it's not a production server that requires security (like a development machine behind a firewall), you might want to just turn SELinux off.
Just one idea. Could be a number of other things.

What impact does it have on site performance if the .htaccess file is loaded with hundreds of URL redirections?

I have to redirect a bulk of pages to new URLs, and if I put them all in the .htaccess file, will it get overloaded and decrease the site's performance?
Please let me know if there are any alternatives for dynamically editing the .htaccess file.
Thanks
First, notice that using .htaccess files, or even allowing the use of .htaccess files, decreases performance, as you are telling Apache to do I/O on the filesystem, looking for .htaccess files throughout the directory tree. So for better performance, having AllowOverride None on <Directory /> (that is, from the root directory) is best, and of course you shouldn't re-enable AllowOverride in any subdirectory.
Now, if you have a .htaccess with a lot of rules, or a <Directory /path/to/my/directory> with a lot of rules (which is quite the same, except the second version is read at startup and not for every request), then you will of course slow down your Apache process, as it must check all the rules. But this is usually quite fast. The best thing is to measure it with tools like autobench, httperf, webinject, ab, etc.
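For example, a quick before-and-after measurement with ab; the URL and request counts are placeholders:
ab -n 1000 -c 10 http://example.org/some-redirected-page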
There is a way in mod_rewrite to speed up the rewriting process when you have a lot of rules using the same scheme: RewriteMap. When using RewriteMap with a hash file you will of course get better speed. But RewriteMap requires that you forget about .htaccess files and really use the Apache configuration files (/etc/apache/*) to edit your configuration. So you need administrative access to that configuration.
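A minimal sketch of the hash-file approach; the file locations are assumptions, and note that RewriteMap itself must be declared in server or virtual-host context, not in a .htaccess:
# Build the dbm map once from a text file of "old-path new-path" pairs:
#   httxt2dbm -i redirects.txt -o redirects.map
RewriteEngine On
RewriteMap redirects dbm:/etc/apache2/redirects.map
# Redirect only when the map holds an entry for the requested path
RewriteCond ${redirects:%{REQUEST_URI}} !^$
RewriteRule ^ ${redirects:%{REQUEST_URI}} [R=301,L]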
And at the end of your question you are talking about editing your .htaccess dynamically. If your rewrites are really dynamic, I would forget about using the webserver to handle that and push the rewriting policy into the application code (PHP/C#/etc.). Or you could use the prg: option of RewriteMap and write your own daemon script, in Perl, Python or anything else, called by Apache's mod_rewrite on each request to provide the rewrite policy. The rewriting performance will then depend on your programming skills.
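A minimal sketch of the prg: variant, declared in the server configuration as RewriteMap dynmap prg:/usr/local/bin/rewritemap.sh (the script name and mapping are assumptions). Apache starts the program once and feeds it one lookup key per line on stdin, expecting exactly one line of output per key:
#!/bin/bash
# Answer each key with a target path, or the literal string NULL for "no match"
while read -r key; do
case "$key" in
/old-page) echo "/new/page" ;;
*) echo "NULL" ;;
esac
done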
I believe it will not degrade the performance hugely, although the load on the Apache server will definitely increase. It also depends on your server's bandwidth and RAM speed; if those are good, I believe you will not run into any major issue.
In my local environment I am using WAMP, which allows me to add/remove modules dynamically. You can also use cPanel, which lets you add/edit many items dynamically for your Apache server:
http://docs.cpanel.net/twiki/bin/view/EasyApache3/WebHome
In addition, you can activate your virtual host settings by uncommenting the lines below in your httpd.conf file:
LoadModule vhost_alias_module modules/mod_vhost_alias.so
# Virtual hosts
Include conf/extra/httpd-vhosts.conf
Afterward you can add all your hundreds of virtual hosts in that same file, i.e. conf/extra/httpd-vhosts.conf.
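Each virtual host then gets its own entry in that file; a minimal sketch with placeholder names:
# conf/extra/httpd-vhosts.conf
<VirtualHost *:80>
ServerName site1.example.com
DocumentRoot "/var/www/site1"
</VirtualHost>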

.htaccess or httpd.conf

I need to do a url-rewriting job now.
I don't know whether I should put the code into a .htaccess or httpd.conf?
EDIT
What's the effective range of a .htaccess file? Will it affect all requests, or only requests to the specific directory it's located in?
If you won't have to change your rules very often, you should put them in httpd.conf and turn off overriding in the top directory your rules apply to:
AllowOverride None
With overriding off, Apache will not scan every directory for .htaccess files, reducing the overhead of each request.
Whenever you do have to change your rules, you will have to restart your Apache server if you put them in httpd.conf, as opposed to their being picked up instantly from .htaccess files, which are read on every request.
You can easily do this using a graceful restart with the apachectl tool to avoid cutting off any current requests being served.
apachectl graceful
If you aren't going to turn override off, you might as well just use .htaccess only.
Edit in response to your edit:
Say you have a request for www.example.com/dir1/dir2/dir3/file.
If overriding is allowed, Apache will look for a .htaccess file in all three of those directories, and in the document root, for rules to apply to the request.
Ease of use and, in my opinion, maintainability (any suitably permissioned user can just go to the directory they want) favour .htaccess, but those files are parsed repeatedly, versus the parse-once behaviour of httpd.conf, which is where über-high-volume sites are best configured.
There are three issues here in terms of which is "better":
performance
management
security
.htaccess is slower, harder to manage, and potentially less secure. If you have access to httpd.conf, then placing rules there can be easier to manage (everything in one place) and faster ("AllowOverride None" means the server does not look in the current directory and its parent directories for an override file to parse and follow), and since .htaccess files are not present in the website directory, they cannot be edited (and if created, will be ignored).
You may use both of them. IMHO, .htaccess will be a bit better.

How can I redirect requests to specific files above the site root?

I'm starting up a new web-site, and I'm having difficulties enforcing my desired file/folder organization:
For argument's sake, let's say that my website will be hosted at:
http://mywebsite.com/
I'd like (have set up) Apache's Virtual Host to map http://mywebsite.com/ to the /fileserver/mywebsite_com/www folder.
The problem arises because I've decided that I'd like to put a few files (favicon.ico and robots.txt) into a folder that is ABOVE the /www folder that Apache is mounting http://mywebsite.com/ onto:
robots.txt+favicon.ico go into => /fileserver/files/mywebsite_com/stuff
So, when people go to http://mywebsite.com/robots.txt, Apache would serve them the file from /fileserver/files/mywebsite_com/stuff/robots.txt.
I've tried to setup a redirection via mod_rewrite, but alas:
RewriteRule ^(robots\.txt|favicon\.ico)$ ../stuff/$1 [L]
did me no good, because basically I was telling Apache to serve something that is above its mounted root.
Is it somehow possible to achieve the desired functionality by setting up Apache's (2.2.9) Virtual Hosts differently, or defining a RewriteMap of some kind that would rewrite the URLs in question not into other URLs, but into system file paths instead?
If not, what would be the preferred course of action for the desired organization (if any)?
I know that I can access the aforementioned files via PHP and then stream them, say with readfile(..), but I'd like Apache to do as much of the work as possible; it's bound to be faster than doing I/O through PHP.
Thanks a lot, this has deprived me of hours of constructive work already. Not to mention poor Apache getting restarted every few minutes. Think of the poor Apache :)
It seems you are set to using a RewriteRule. However, I suggest you use an Alias:
Alias /robots.txt /fileserver/files/mywebsite_com/stuff/robots.txt
Additionally, you will have to tell Apache about the restrictions on that file. If you have more than one file treated this way, do it for the complete directory:
<Directory /fileserver/files/mywebsite_com/stuff>
Order allow,deny
Allow from all
</Directory>
Can you use symlinks?
ln -s /fileserver/files/mywebsite_com/stuff/robots.txt /fileserver/files/mywebsite_com/stuff/favicon.ico /fileserver/mywebsite_com/www/
(ln is like cp, but with -s it creates symlinks instead of copies.)
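One caveat: Apache will only serve the symlinked files if symlink following is enabled for that directory, so if your configuration restricts Options you may need something like:
<Directory /fileserver/mywebsite_com/www>
Options +FollowSymLinks
</Directory>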