Understanding Apache RewriteMap with RewriteLock - apache

I've taken over development of a fairly heavy-duty LAMP application. The original dev used an .htaccess file with RewriteMap and a PHP script to handle certain conditions of the app.
Specifically, when certain subdomain patterns are requested by the client, the RewriteMap catches them and sends them to the appropriate application module.
I'm quite comfortable with typical mod_rewrite redirects, and I think I've got the basic RewriteMap concept figured out; but I'm struggling to find decent documentation on how RewriteLock works. According to the Apache docs:
This directive sets the filename for a synchronization lockfile which mod_rewrite needs to communicate with RewriteMap programs. Set this lockfile to a local path (not on a NFS-mounted device) when you want to use a rewriting map-program. It is not required for other types of rewriting maps.
But this is still a little vague for me. Whats the exact purpose and function of RewriteLock and how does it work?

RewriteLock is used with the prg: keyword. RewriteMap can be used with several keywords, to use text files (txt:), hashfiles (dbm:), randomized text (rnd:) or external mapping scripts ( this one is the prg: keyword ). In this mode the external script is launched when apache start. Then for every incoming request, when mod-rewrite is calling the prg: mapping, apache sends input to that script and reads the output stream to get the value.
RewriteLock must be used in that case to prevent parallel requests (so parallel inputs to that external process) to mix answers on this process standard output. It's a locking mechanism (a file, the given path, which is a classical token, only one user) to enforce serialization of the calls to this external mapping script. IMHO it should be transparently applied by mod-rewrite when using prg: as I never found a prg case where this locking thing is not mandatory.
Edit:
Well in fact you could use an external prg: without the rewriteLock if randomization of the output is not a problem, i.e. for a given entry you can get a response which was given for another entry, like in a script doing some advanced rnd:, your own round-robin service. But if the output must reflect the entry, then you need that semaphore, which of course can slow down the rewritemap process.
So if you're only using the hashmap or textmap you do not need to set the RewriteLock.
Edit:
You may find useful details on this thread, like the fact the lock file exists only for a few milliseconds, when apache calls the prg and waits for an answer.
Edit:
On the question one strange fact is:
The original dev used an .htaccess file with RewriteMap
This is strange because RewriteMap cannot work on .htaccess files, .htaccess are configuration entries read dynamically and RewriteMap as stated here in the Context line can only be set in the main configuration or in a VirtualHost configuration. It cannot be in a Location, a Directory or a .htaccess. So chances are this will never work in a .htaccess.
Now #puk asked for an example of RewriteMap usage. Well, searching for "RewriteMap" in Stack overflow will show you several real examples:
here in a question
here a list of example in my answer
another here

Apache hangs if you define more than one RewriteLock directives or if you use it in a VHOST config.
The RewriteLock should be specified at server config level and ONLY ONCE. This lock file will be used by all prg type maps. So if you want to use multiple prg maps, I suggest using an internal locking mechanism, for example in PHP there is the flock function, and simply ignore the warning apache writes in the error log.
See here for more info:
http://books.google.com/books?id=HUpTYMf8-aEC&lpg=PP1&pg=PA298#v=onepage&q&f=false

Related

How can I use an .htaccess file in Nginx?

I am currently migrating my website from Apache to nginx, but my .htaccess file is not working. My website is inside the /usr/share/nginx/html/mywebsite folder. How can I use .htaccess in my nginx server?
This is my .htaccess file:
RewriteEngine on
RewriteRule video/watch/([a-zA-Z0-9_#$*-]+)/?$ "videos-single.php?id=$1" [NC]
Nginx doesn't support .htaccess (see here: "You can’t do this. You shouldn’t. If you need .htaccess, you’re probably doing it wrong.").
You've two choices (as I know):
import your .htaccess to nginx.conf (maybe the htaccess to nginx converter helps you)
use authd-htpasswd (I didn't try it)
Disclosure: I am the author of htaccess for nginx, which is now open source software.
Over the past years, I created a plugin which implements htaccess behaviour into nginx, especially things like RewriteRule, Allow and Deny, which can be crucial for web security. The plugin is used in my own productive environments without a problem.
I totally share the point of efficiency and speed in nginx, and why they didn't implement htaccess.
However, think about it. You cannot make it worse if you're using nginx plus htaccess. You still keep the great performance of nginx, plus you can drive your legacy appliances effortlessly on one webserver.
This is not supported officially in nginx. If you need this kind of functionality you will need to use Apache or some other http server which supports it.
That said, the official nginx reasoning is flawed because it conflates what users want to do with the way it is done. For example, nginx could easily check the directories only every 10 seconds / minute or so, or it could use inotify and similar mechanisms. This would avoid the need to check it on every request... But knowing that doesn't help you. :)
You could get around this limitation by writing a script that would wait for nginx config files to appear and then copy them to /etc/nginx/conf.d/. However there might be some security implications - as there is no native support for .htaccess in nginx, there is also no support for limiting allowed configuration directives in config files. YMMV.
Using the config file is one option, but the cool thing about the .htaccess file is that it provided a way for a web developer to have some control over server settings without having root access to the server. There doesn't seem to be anything like this on nginx which is a real bummer.
I understand how the way it's setup on apache slows down response times, but hoped there could be an nginx way to do the same thing without the performance hit... At least a way to do rewrites with regex on urls if nothing else.
"Is there no nginx way to do bulk redirects using regular expressions that doesn't slow down response times."
Just edit your database with myphpmyadmin.
Open myphpmyadmin select your database then find your "yourprefix_Posts" table.
Open it then click the "Search" tab, then "Find and Replace".
Select "post_content" in the dropdown
In the "Find" field, type URL you want to change: "website.com/oldURL".
In the "Replace" field, type the new URL: "website.com/newURL".
(To use regular expression, tick the "Regular Expression" box.)
NOTE: You can test this out by simply leaving the "Replace" field blank.
ALWAYS BACKUP database before making changes. This might sound scary but its really not. Its super simple and can be used to quickly replace just about anbything.

List of served files in apache

I am doing some reverse engineering on a website.
We are using LAMP stack under CENTOS 5, without any commercial/open source framework (symfony, laravel, etc). Just plain PHP with an in-house framework.
I wonder if there is any way to know which files in the server have been used to produce a request.
For example, let's say I am requesting http://myserver.com/index.php.
Let's assume that 'index.php' calls other PHP scripts (e.g. to connect to the database and retrieve some info), it also includes a couple of other html files, etc
How can I get the list of those accessed files?
I already tried to enable the server-status directive in apache, and although it is working I can't get what I want (I also passed the 'refresh' parameter)
I also used lsof -c httpd, as suggested in other forums, but it is producing a very big output and I can't find what I'm looking for.
I also read the apache logs, but I am only getting the requests that the server handled.
Some other users suggested to add the PHP directives like 'self', but that means I need to know which files I need to modify to include that directive beforehand (which I don't) and which is precisely what I am trying to find out.
Is that actually possible to trace the internal activity of the server and get those file names and locations?
Regards.
Not that I tried this, but it looks like mod_log_config is the answer to my own question

Aliases on Dreamhost, general management of http request / server errors

I had a hard time deciding how I should manage these errors (404, 500, ...) and when I finally decided, I am encountering problems. This is a reeeeeally long question, I appreciate anyone's attempt to help!
Let me first describe how I decided to set it up. I have several sites hosted on a shared Dreamhost account. In the folder structure that I see, everything of mine on the server is under /home/username, and for example, site1.com's web root is at /home/username/site1.com
I am creating a generic error handler (php script) for errors like 404 not found, 500, etc. that I want to store above the web roots of my sites at /home/username/error_handler/index.php so that I can use an .htaccess file at /home/username/.htaccess which includes something like the following:
ErrorDocument 404 /error_handler/index.php
ErrorDocument 500 /error_handler/index.php
...and many more
When these errors occur on any of my sites, I want it to be directed to /home/username/error_handler/index.phpThis is the problem I'm having a hard time figuring out. The ErrorDocument directives above will actually cause Apache to look for /home/username/site1.com/error_handler/index.php
Anyway, the errors should be redirected to my error handling php script. The script will use $_SERVER['REDIRECT_STATUS'] to get the error code, then use $_SERVER['REDIRECT_URL'] and $_SERVER['HTTP_HOST'] to decide what to do. It will check if an error handler specific to that site exists (for example: site1.com/errors/404.php). If this custom page doesn't exist, it will output a generic message that is slightly more user-friendly and styled, and perhaps will include some contact info for me depending on the error.
Doing it this way lets me funnel all these errors through this 1 php script. I can log the errors however I like or send email notifications if I want. It also lets me set up the ErrorDocument Apache directives once for all my sites instead of having to do it for every site. It will also continue to work without modification when I move the site around, since I already have a system that scans the folder structure to figure out where my site roots are when they really aren't at the web root technically speaking. This may not be possible with other solutions like using mod_rewrite for all 404 problems, which I know is common. Or if it is possible, it may be very difficult to do. Plus, I have already done that work, so it will be easy for me to adapt.
When I am working on sites for which I don't have a domain name yet (or sites where the domain name is already in use at the moment), I store them temporarily in site1.com/dev/site3.com for example. Moving the site to site3.com eventually would cause me to have to update the htaccess files if I had one for each site. Changing the domain name would do the same.
Ex: a site stored at site1.com/dev/site3.com would have this in its htaccess file:
ErrorDocument 404 /site1.com/dev/site3.com/error/404.php
And it would have to be changed to this:
ErrorDocument 404 /site3.com/error/404.php
Obviously, this isn't a huge amount of work, but I already manage a lot of sites and I will probably be making more every year, 95% of which will be hosted on my shared DreamHost account. And most of them get moved at least once. So setting up something automatic will save me a some effort in the long run.
I already have a system set up for managing site-relative links on all my sites. These links will work whether the site exists in a subdirectory of an existing site, or in their own domain. They also work without change in a local development server despite a difference in the web root location. For example, on the live server, the site-relative http link /img/1.jpg would resolve to the file /home/username/site1.com/img/1.jpg while on my local development server it would resolve to C:\xampp\htdocs\img\1.jpg, despite what I consider the logical site root being at C:\xampp\htdocs\site1.com. I love this system, and it is what gave me the idea to set up something that would work automatically like I expected it to, based on the file structure I used.
So, if I could get it to work, I think this seems like a pretty good system. But I am still very new to apache configuration, mod_rewrite, etc. It's possible there is a much easier and better way to do this. If you know of one, please let me know.
Anyway, all that aside, I can't get it working. The easiest thing would be if I could have the ErrorDocument directive send the requests to folders above the web root. But the path is a URL path relative to the document root. Using the following in /home/username/.htaccess,
ErrorDocument 404 /error_handler/index.php
a request for a non-existent resource causes Apache to look for the file at
site1.com/error_handler/index.php
So I thought I should set up a redirection (on all my sites) that would redirect those URLS to /home/username/error_handler. I tried a few things and couldn't get any of them to work.
Alias seemed like the simplest solution, but it is something that has to be set at server runtime (not sure if that is the right terminology - when the server is started). On my local server, it worked fine using:
Alias /error_handler C:\xampp\htdocs\error_handler2
I changed the local folder to test that the Alias was functioning properly. (On the local server, the URL path specified by the ErrorDocument directive is actually pointing to the right folder, since in my local server the web root is technically C:\xampp\htdocs and I store the error handler I want to use is stored locally at C:\xampp\htdocs\error_handler\index.php)
Dreamhost has a web client that can create what I am guessing is an Alias. When I tried to redirect the folder error_handler on site1.com to /home/username/error_handler, it would seem to work right if I typed site1.com/error_handler in the browser. But if I typed site1.com/test1234 (non-existant), it would say there was a 404 error trying to use the error handler. Also, I would have to login through the web client and point and click (and wait several minutes for the server to restart) every time I wanted to set this up for a new site, even if I could get it to work.
So I tried getting it to work with mod_rewrite, which seems like the most flexible solution. My first attempt looked something like this (stored in /home/username/site1.com/.htaccess for now, though it would eventually be at /home/username/.htaccess:
RewriteEngine On
RewriteRule ^error_handler/index.php$ /home/username/error_handler/index.php
The plain english version of what I was trying to do above is to send requests on any of my sites for error_handler/index.php to /home/username/error_handler/index.php. The mis-understanding I had is that the subsitution will be treated as a file path if it exists. But I missed that the documentation says "(or, in the case of using rewrites in a .htaccess file, relative to your document root)". So instead of rewriting to /home/username/error_handler/index.php, it's actually trying to rewrite to /home/username/site1.com/home/username/error_handler/index.php.
I tried including Options +FollowSymLinks because in the Apache documentation it says this:
To enable the rewrite engine in this context [per-directory re-writes in htaccess], you need to set "RewriteEngine On" and "Options FollowSymLinks" must be enabled. If your administrator has disabled override of FollowSymLinks for a user's directory, then you cannot use the rewrite engine. This restriction is required for security reasons.
I searched around for a while and I couldn't find anything about how Dreamhost handles this (probably because I don't know where to look).
I experimented with RewriteBase because in the Apache documentation it says this:
"This directive is required when you use a relative path in a substitution in per-directory (htaccess) context unless either of the following conditions are true:
The original request, and the substitution, are underneath the DocumentRoot (as opposed to reachable by other means, such as Alias)."
Since this is supposed to be a URL path, in my case it should be RewriteBase /, since all my redirects will be from site1.com/error_handler. I also tried Rewrite Base /home/username and RewriteRule ^error_handler/index.php$ error_handler/index.php. However, the Rewrite Base is a URL path relative to the document root. So I need to use something like an alias still. The implication in the quote from the documentation above is that it is possible to use mod_rewrite to send content above the web root. One of the many things I don't know is what the 'other means' besides Alias might be. I believe Alias might not be an option on Dreamhost. At least I couldn't make sense of it.
Why not use error pages in the site root, then include the actual file from the shared section?

How do I rewrite URLs with Nginx admin / Apache / Wordpress

I have the following URL format:
www.example.com/members/admin/projects/?projectid=41
And I would like to rewrite them to the following format:
www.example.com/avits/projectname/
Project names do not have to be unique when a user creates them therefore I will be checking for an existing name and appending an integer to the end of the project name if a project of the same name already exists. e.g. example.project, example.project1, example.project2 etc.
I am happy setting up the GET request to query the database by project name however I am having huge problems setting up these pretty url's.
I am using Apache with Nginx Admin installed which mens that all static content is served via Nginx without the overhead of apache.
I am totally confused as to whether I should be employing an nginx rewrite rule in my nginx.conf file or standard rewrites in my .htaccess file.
To confuse matters further although this is a rather large custom appliction it is build on top of a wordpress backbone for easy blogging functionality meaning that I also have the built in wordpress rewrite module at my disposal.
I have tried all three methods with absolutely no success. I have read a lot on the matter but simply cannot seem to get anything to work. I am certain this is purely down to a complete lack of understanding on with regards to URL rewriting. Combined with the fact that I don't know which type of rewriting should be applicable in my case means that I am doing nothing more than going round in circles.
Can anyone clear up this matter for me and explain how to rewrite my URLs in the manner described above?
Many thanks.
If you are proxying all the non static file requests to Apache, do the rewrites there - you don't need to do anything on nginx as it will just pass the requests to the back end.
The problem with what you are proposing is that it's not actually a rewrite, a rewrite is taking the first URL and just changing it around or moving the user to another location.
What you need actually takes logic to extrapolate the project name from the project ID.
For example you can rewrite:
www.example.com/members/admin/projects/?projectid=41
To:
www.example.com/avits/41/
Fairly easily, but can you map that /41/ in your app code to change it to /projectname/ - because a URL rewrite can't do that.

Detecting if Apache is using mod_rewrite

How can a client detect if a server is using mod_rewrite? Now I know that some mod_rewrite rules are not very obvious. But some are, such as "SEO Friendly Urls". What types of behavior is impossible unless a server is running mod_rewrite?
What types of behavior is impossible unless a server is running mod_rewrite?
The real answer is "none". In theory, any URL could be formed by actual files or directories, including the classical "SEO friendly" URLs.
There is only circumstantial evidence:
The best indication that I can think of is when the entire site structure consists of URLs without .htm .php .html file extensions:
http://domain.com/slugs/house-warming-party
to exclude the possibility of that URL being a directory, request
http://domain.com/slugs/house-warming-party/index.htm
http://domain.com/slugs/house-warming-party/index.html
http://domain.com/slugs/house-warming-party/index.php
http://domain.com/slugs/house-warming-party/index.asp
... whatever other extensions there are .....
if those requests all fail, it is very likely that the site is using mod_rewrite. However if they succeed, as #Gumbo says, it could also be the MultiViews option fixing the request. Either way, this is nowhere near safe!
Depending on what your use case is, you could also try to deduct things from the CMS used on the site. Wordpress with mod_rewrite turned on will show a different URL structure than with it turned off. The same holds true for most other CMSes. But of course, this is also a highly imperfect approach.
The use of HTML resources with a .html/.htm/.php ending would point slightly against the use of mod_rewrite, but you can never be sure.
The use of the PATHINFO variable (also known as poor man's mod_rewrite) would point somewhat strongly against the use of mod_rewrite:
http://example.com/index.php/slugs/house-warming-party
In conclusion, mod_rewrite (like most URL-rewriting tools) is supposed to be a module transparent to the outside world. I know of no sure-fire way to detect it from outside, and there may well be none.