Drupal Clean Urls break randomly for arbitrary paths - apache

I've done everything right: my server has mod_rewrite enabled, the directory for my virtual host has AllowOverride set to All, and the .htaccess file is in place with the same rewrite rules everyone else uses. Yet some pages can't be reached through their clean URL paths. For 90% of the pages, clean URLs work fine; for the other 10%, they don't.
I have checked whether those pages exist (they do) and whether they are accessible through index.php?q=[path] (they are). They are only inaccessible through their clean URL paths.
Can anyone help me with this mystery?

Since you can access your pages through index.php?q=path/to/menu/item, the fault clearly lies with mod_rewrite (or your rewrite configuration), not with Drupal.
To debug what your rewrite is doing, either turn on the rewrite log and tail -f it while you request the troubled pages (RewriteLog and RewriteLogLevel on Apache 2.2; Apache 2.4 replaced these with LogLevel and the rewrite:trace levels), or print_r($_GET) at the top of index.php or page.tpl.php to see what is actually being requested.
If you are comfortable posting your potentially sensitive .htaccess here, do so and we can have a look at it for you to see if there are any misconfigurations.
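To find the failing 10% systematically, you could request each path both ways and compare the status codes, automating the manual check from the question. The sketch below is a minimal Python version, assuming you can list the Drupal paths to check; the host and paths are placeholders, and url_pair, status, and find_broken are hypothetical helpers, not part of Drupal.

```python
import http.client
from urllib.parse import quote, urlencode


def url_pair(path):
    """Return (clean URL, index.php?q= URL) for a Drupal path."""
    path = path.strip("/")
    return "/" + quote(path), "/index.php?" + urlencode({"q": path})


def status(host, url, port=80):
    """Status code of a single GET request (http.client does not follow redirects)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", url)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


def find_broken(host, paths, port=80):
    """List (path, clean_status, query_status) where the two forms disagree."""
    broken = []
    for path in paths:
        clean, query = url_pair(path)
        clean_status = status(host, clean, port)
        query_status = status(host, query, port)
        if clean_status != query_status:
            broken.append((path, clean_status, query_status))
    return broken
```

For example, find_broken("www.example.com", ["node/12", "about"]) would return only the paths whose clean URL behaves differently from the ?q= form, which narrows down what to look for in the rewrite log.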

mod_rewrite has a few long-standing bugs that mangle URLs on the way through (do your problem URLs contain any escaped characters?). I don't know whether Drupal does this, but in other PHP apps I have had to add code that redoes the rewrite once the request has reached the correct entry point.
Unfortunately, Drupal can't take its request path from PATH_INFO (as many other apps do); otherwise you could use mod_alias, which is much simpler and much more reliable.
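To illustrate the kind of mangling meant here, assume the Drupal 6-style rule RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]. mod_rewrite matches against the %-decoded path, so an encoded &amp; (%26) or a literal + in the path ends up unescaped in the query string, where PHP splits on &amp; and turns + into a space. The Python below is only a rough model of that behavior: it ignores QSA, and quote() merely approximates what mod_rewrite's [B] flag (escape backreferences) does.

```python
from urllib.parse import parse_qs, quote, unquote


def rewrite(request_path, b_flag=False):
    """Rough model of RewriteRule ^(.*)$ index.php?q=$1 [L,QSA] in .htaccess.

    mod_rewrite matches the %-decoded path, so $1 arrives unescaped.
    With b_flag=True the capture is re-escaped first, approximating [B].
    The QSA part (appending the original query string) is not modelled.
    """
    captured = unquote(request_path.lstrip("/"))
    if b_flag:
        captured = quote(captured, safe="")
    return "index.php?q=" + captured


def q_seen_by_php(rewritten_url):
    """The value PHP would put in $_GET['q'] ('&' splits, '+' becomes a space)."""
    query = rewritten_url.split("?", 1)[1]
    return parse_qs(query).get("q", [""])[0]


for path in ["/node/12", "/tags/rock%26roll", "/tags/c++"]:
    print(path, repr(q_seen_by_php(rewrite(path))),
          repr(q_seen_by_php(rewrite(path, b_flag=True))))
```

In this model, /tags/rock%26roll reaches Drupal as "tags/rock" and /tags/c++ as "tags/c" followed by two spaces, while the [B]-style escaping preserves both paths; paths like /node/12 are unaffected, which would fit a "90% work, 10% don't" pattern.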

Related

Worried about potential performance issue with httpd.conf with thousands of 301 redirects

I've been doing further research about permanent (301) redirects.
I'm redesigning a site and 4,000 pages are changing their URLs, so I have a few thousand 301 redirect statements; the URLs are changing so much that I can't express them with regular expressions.
I've been researching the performance differences between putting them in a .htaccess file and putting them in httpd.conf, and I'm finding conflicting information about the benefits of each.
For example, this sounds promising:
"Note, if you are using Apache then it's strongly recommended to put the redirect rules in your httpd.conf (stored in memory when Apache starts) and not .htaccess files (which are loaded on every page request)."
Source - Major site rewrite and SEO with 301 redirects
but then this contradicts it:
"You can use Include directive in httpd.conf to be able to maintain redirects in another file. But it would not be very efficient, as every request would need to be checked against a lot of regular expressions. "
Source - http://www.faqoverflow.com/serverfault/414225.html
My host said:
"No performance impact with httpd.conf, you're in effect doing the same thing as adding them to the config itself. But you are doing so in a way that will not cause issues with it or have the changes lost."
Is it correct that adding thousands of 301 re-direct statements to httpd.conf won't cause performance issues for my site?
Thousands of redirects are fine. I've heard of people benchmarking setups like this, and there is practically no significant impact, no more than the regular work your server's OS does.
Your second quote is flat out wrong, at least as far as Apache 2.2-2.4 is concerned. When you use the Include directive, it loads the contents of the file(s) as part of the server's configuration. That means they are loaded when you start the server, or when you explicitly tell Apache to reload its configuration. Apache does not reread the included files for every request.
Apache uses this directive pretty liberally: most out-of-the-box configurations use Include to load entire directories of per-module and per-vhost configuration.
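For the few thousand one-off redirects in the question, one way to keep them maintainable is to generate the included file from a two-column CSV of old path and new target. This is a minimal Python sketch; the CSV layout, the function names, and the file names are assumptions. It lists longer paths first because mod_alias applies the first Redirect that matches, and Redirect matches whole path prefixes, so a general /products line placed first would shadow /products/widget.

```python
import csv


def redirect_directives(rows):
    """Turn (old_path, new_target) pairs into mod_alias "Redirect 301" lines.

    Assumes each old path starts with "/" and each new target is an absolute
    URL or a URL-path starting with "/".
    """
    pairs = {}
    for old, new in rows:
        old, new = old.strip(), new.strip()
        if not old.startswith("/"):
            raise ValueError(f"old path must start with '/': {old!r}")
        if old in pairs:
            raise ValueError(f"duplicate old path: {old!r}")
        pairs[old] = new
    # First match wins and Redirect matches path prefixes, so put longer
    # (more specific) paths first; sorted() keeps ties in input order.
    ordered = sorted(pairs, key=len, reverse=True)
    return [f'Redirect 301 "{old}" "{pairs[old]}"' for old in ordered]


def write_redirect_conf(csv_path, conf_path):
    """Read old,new rows from csv_path and write the directives to conf_path."""
    with open(csv_path, newline="") as f:
        rows = [row[:2] for row in csv.reader(f) if len(row) >= 2]
    with open(conf_path, "w") as out:
        out.write("\n".join(redirect_directives(rows)) + "\n")
```

The generated file (say, redirects.conf, a hypothetical name) can then be pulled into the server or virtual host configuration with an Include line, so it is read once at startup or reload, as described above.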

mod_rewrite - Does Apache cache .htaccess rules? (It still follows the rules even after I deleted the .htaccess file)

I was doing some tests with mod_rewrite in my WAMP environment.
I put a simple rule at the root of one of my websites, telling it to redirect any request ending with index.php to localhost (it made no sense; I just wanted to check the rule).
It worked, but afterwards none of the changes I made to the rule in my .htaccess file took effect.
After a while I just deleted the .htaccess file... and it's still redirecting! I just don't understand it. Does Apache cache the rules or something? (Restarting the services through the WAMP menu didn't change anything.)
(Don't ask for the exact rule I used; I deleted the file, and I don't think it's relevant anyway.)
.htaccess files are processed every time a request comes in. It is more likely that your browser cached the redirect; browsers may cache 301 (permanent) redirects indefinitely. Did you check the response headers with HttpFox or a similar tool?
Have you tried clearing the browser cache?
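To rule out the browser cache, you can ask the server directly what it sends back. Python's http.client does not follow redirects and keeps no cache, so the status and Location header returned by this sketch are what Apache actually sent. The host and path are placeholders for the WAMP setup, and check_redirect is a hypothetical helper name.

```python
import http.client


def check_redirect(host, path, port=80):
    """Send one GET and return (status, Location header or None).

    No redirect is followed and nothing is cached, unlike a browser.
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

If check_redirect("localhost", "/index.php") returns (200, None) once the .htaccess file is gone, the redirect you still see in the browser is coming from the browser's cache, not from Apache.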

Detecting if Apache is using mod_rewrite

How can a client detect whether a server is using mod_rewrite? I know that the effects of some mod_rewrite rules are not obvious, but some are, such as "SEO-friendly URLs". What kinds of behavior are impossible unless a server is running mod_rewrite?
The real answer is "none". In theory, any URL could be formed by actual files or directories, including the classical "SEO friendly" URLs.
There is only circumstantial evidence.
The best indication I can think of is a site whose entire URL structure has no .htm, .php, or .html file extensions:
http://domain.com/slugs/house-warming-party
To exclude the possibility that this URL is a directory, request:
http://domain.com/slugs/house-warming-party/index.htm
http://domain.com/slugs/house-warming-party/index.html
http://domain.com/slugs/house-warming-party/index.php
http://domain.com/slugs/house-warming-party/index.asp
... and so on for any other extensions
If those requests all fail, it is very likely that the site is using mod_rewrite. However, as @Gumbo says, the extensionless URL could also be served by the MultiViews option (for example, /slugs/house-warming-party being answered by a house-warming-party.php file). Either way, this is nowhere near certain!
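The probing step above could be scripted roughly as follows. This is a Python sketch under stated assumptions: INDEX_NAMES is a hypothetical list, HEAD requests are assumed to be allowed (some servers answer 405), and urllib follows redirects, so a redirect to a generic page that returns 200 (a "soft 404") would be counted as a hit. As the answer says, an empty result is only circumstantial evidence.

```python
import urllib.error
import urllib.request

# Hypothetical list of directory-index names; extend it for the stack you suspect.
INDEX_NAMES = ["index.htm", "index.html", "index.php", "index.asp"]


def index_candidates(url, names=INDEX_NAMES):
    """Directory-index URLs to try for an extensionless URL."""
    base = url.rstrip("/") + "/"
    return [base + name for name in names]


def probe(url, timeout=10):
    """HTTP status for a HEAD request (urllib follows redirects automatically)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def index_hits(url):
    """Candidates that answer 200; an empty list weakly suggests rewriting."""
    return [u for u in index_candidates(url) if probe(u) == 200]
```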
Depending on your use case, you could also try to deduce things from the CMS used on the site. WordPress with mod_rewrite turned on will show a different URL structure than with it turned off. The same holds true for most other CMSes. But of course, this is also a highly imperfect approach.
The use of HTML resources with a .html/.htm/.php ending would point slightly against the use of mod_rewrite, but you can never be sure.
The use of the PATH_INFO variable (also known as poor man's mod_rewrite) would point fairly strongly against the use of mod_rewrite:
http://example.com/index.php/slugs/house-warming-party
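How PATH_INFO works can be shown with a simplified model: the server resolves the URL path segment by segment, stops at the first prefix that names an existing file, and hands the remainder to that script as PATH_INFO. The Python function below implements only that simplification (it ignores directories, AcceptPathInfo, and the other checks Apache performs), and the set of existing files is made up for the example.

```python
def split_path_info(url_path, existing_files):
    """Split url_path into (script, path_info), or (None, None) if no file matches.

    Simplified model: walk the path segment by segment and stop at the first
    prefix that is an existing file; everything after it is PATH_INFO.
    """
    segments = url_path.strip("/").split("/")
    for i in range(1, len(segments) + 1):
        prefix = "/" + "/".join(segments[:i])
        if prefix in existing_files:
            rest = "/".join(segments[i:])
            return prefix, ("/" + rest) if rest else ""
    return None, None
```

With existing_files = {"/index.php"}, the URL above splits into the script /index.php and the PATH_INFO /slugs/house-warming-party, with no rewriting involved.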
In conclusion, mod_rewrite (like most URL-rewriting tools) is designed to be transparent to the outside world. I know of no sure-fire way to detect it from the outside, and there may well be none.

Redirecting a Directory to a Script on Apache

So I'm playing with a script that makes it super easy to mirror images from the web. The script works great (it's based on the old imgred.com source, if you've seen that); the problem is that it looks a little clunky to use.
Currently, in order to use the script, you go to a URL like:
http://mydomain.com/mirror/imgred.php?Image=http://otherdomain.com/image.jpg
What I'd like to do is to be able to go to:
http://mydomain.com/mirror/http://otherdomain.com/image.jpg
and have it redirect to the former URL, preferably transparent to the user.
I'm reasonably certain this can be done in .htaccess with a mod_rewrite rule of some kind, but I'm getting frustrated trying to get it to work.
After experimenting with this myself, I found that Apache collapses any double slash in the URL path (the part before the query string) into a single slash before passing it to mod_rewrite. Maybe that was causing your problems?
This might work for you (.htaccess in the mirror directory):
RewriteEngine On
RewriteBase /mirror
RewriteRule ^http(s?):/(.*) imgred.php?Image=http$1://$2 [L]
I don't know whether your script accepts https addresses as well, so I included that just to be sure.
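The effect of the slash collapsing on the rule can be checked with a rough Python model. It assumes, as described above, that // has already been merged into / before mod_rewrite sees the path, and that the per-directory prefix mirror/ has been stripped; rewrite_mirror is a hypothetical helper, not an Apache API.

```python
import re

# Same pattern as the RewriteRule above; in .htaccess the "mirror/" prefix is
# already stripped, so the pattern sees e.g. "http:/otherdomain.com/image.jpg".
RULE = re.compile(r"^http(s?):/(.*)")


def rewrite_mirror(request_path):
    """Rough model of the rule in /mirror/.htaccess; None if it does not match."""
    # Assumption (as described above): "//" is merged into "/" first.
    path = re.sub(r"/{2,}", "/", request_path)
    prefix = "/mirror/"
    if not path.startswith(prefix):
        return None
    m = RULE.match(path[len(prefix):])
    if m is None:
        return None
    return f"imgred.php?Image=http{m.group(1)}://{m.group(2)}"
```

This also shows why a pattern written as ^http(s?)://(.*) would never match: after collapsing, only one slash follows the colon. Note that the rewritten imgred.php does not start with "http", so the rule does not loop when mod_rewrite re-processes the request.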

Popular techniques to debug .htaccess

I'm a self-taught coder and I like to debug by echoing suspicious variables and commenting out code.
Lately, I've had to learn more about the .htaccess file. I need it to do things like run .php scripts as PHP 5, rewrite URLs, and limit the file upload size. I have a lot of trouble debugging a .htaccess file. I often have to migrate PHP applications from one shared hosting environment to another. Sometimes this breaks the .htaccess file (or rather, something in the .htaccess file breaks the site). I check to make sure domain names are updated.
Are there popular techniques for debugging a .htaccess file? Is it just looking at the Apache logs? Anything else?
Looking in the Apache logs is the easiest way to debug .htaccess, IMHO (add the RewriteLog directive if necessary; note that it belongs in the server or virtual host configuration, not in .htaccess, and that Apache 2.4 replaced it with LogLevel and the rewrite:trace levels).
About migrating: if you are not using any physical file paths inside .htaccess (e.g. /var/www/site/script.php), it should work without problems. If it doesn't, first remove all the options and leave only the redirect directives; that way you can see whether the problem is a server configuration that forbids overriding the default settings.
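The advice about physical file paths can be partly automated before a migration. The Python below is a heuristic sketch: it flags anything that looks like an absolute filesystem path under a few common Unix prefixes (/home, /var, /srv, /usr, /opt) or a Windows drive letter, skipping comment lines. The prefix list is an assumption; adjust it for your hosts, and expect both false positives (a URL-path that happens to start with /var/) and misses.

```python
import re

# Heuristic: prefixes that usually mean a server-specific filesystem path.
# The lookbehind skips matches inside URLs such as http://example.com/var/...
FS_PATH = re.compile(
    r"""(?<![\w./])(/(?:home|var|srv|usr|opt)/[^\s"':;]*|[A-Za-z]:[\\/][^\s"';]*)"""
)


def absolute_paths(htaccess_text):
    """Yield (line_number, path) for filesystem-looking paths in .htaccess text."""
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out directives are ignored by Apache too
        for match in FS_PATH.finditer(stripped):
            yield number, match.group(1)
```

For example, a line such as AuthUserFile "/home/olduser/.htpasswd" would be reported, since that path almost certainly differs on the new host.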