Worried about potential performance issue with httpd.conf with thousands of 301 redirects - apache

I've been doing further research about permanent 301 re-directs.
I'm doing a site re-design and 4,000 pages are changing their URLs so I have a few thousand 301 re-direct statements where the URLs are changing so much that I can't do regular expressions.
I've been researching about the performance differences between putting them in a .htaccess file or in httpd.conf. I'm reading and getting conflicting information about the benefits of each.
i.e. this which sounds promising:
"Note, if you are using Apache then it's strongly recommended to put the redirect rules in your httpd.conf (stored in memory when Apache starts) and not .htaccess files (which are loaded on every page request)."
Source - Major site rewrite and SEO with 301 redirects
but then conflicted by this:
"You can use Include directive in httpd.conf to be able to maintain redirects in another file. But it would not be very efficient, as every request would need to be checked against a lot of regular expressions. "
Source - http://www.faqoverflow.com/serverfault/414225.html
My host said:
"No performance impact with httpd.conf, you're in effect doing the same thing as adding them to the config itself. But you are doing so in a way that will not cause issues with it or have the changes lost."
Is it correct that adding thousands of 301 re-direct statements to httpd.conf won't cause performance issues for my site?

Is it correct that adding thousands of 301 re-direct statements to httpd.conf won't cause performance issues for my site?
Thousands of redirects are fine. I've heard or people attempting to do benchmarks on stuff like there's practically no significant impact, anymore so than regular stuff your server's OS does.
Your second quote is flat out wrong, at least as far as apache 2.2-2.4 is concerned. When you use the Include directive, it loads the contents of the file(s) as part of the server's configuration. That means it's loaded when you start the server, or when you explicitly tell apache to reload its configuration. It does not look at all the Included files for every request.
Apache uses this directive pretty liberally, as in most out-of-the-box configurations use Include to load entire directories of per-module configuration and per-vhost configuration.

Related

Is there any way in Apache to specify a single .htaccess file?

My hosting allows use of .htaccess files as the configuration files are not available.
I'm aware of the performance hit that override files incur though - so I was thinking - If Apache provided a mode for having a single .htaccess file - wouldn't that be faster than having to check for multiple .htaccess files whilst still maintaining the convenience?
If Apache provided a mode for having a single .htaccess file
Well, not a "mode" as such, but you could achieve this by allowing .htaccess for the parent directory (root directory or even one above the document root) and disable the use of .htaccess files in all subdirectories. The .htaccess file in the parent directory will still apply.
Realistically (if indeed this is at all "realistic"), you would probably need to enable .htaccess for the directory "above the document root" and disable .htaccess in the document root and below, rather than enabling .htaccess for the document root. Otherwise, if you enable .htaccess for the document root, you will have to disable .htaccess for each subdirectory individually. And if you add more subdirectories, the server config will need to be updated accordingly. (Since, the AllowOverride directive is only allowed in <Directory> containers without regex, not <DirectoryMatch> containers.) However, this might not be possible on some shared hosting environments (there might not be an "above the document root") and it could impact the installation of some CMSs.
Note that you obviously need access to the server config (or VirtualHost) in order to implement this, so it is hypothetical in this instance.
wouldn't that be faster than having to check for multiple .htaccess files
Possibly. But you are only talking about a micro-optimsation at best in real terms. On most sites, even enabling .htaccess files at all will hardly be noticeable - if at all. The "performance hit" you speak of is not as big as you might think. To put it another way, if you are finding that .htaccess is proving to be a bottle neck then you've either done something wrong, or you have far more serious problems to address.
Note, however, that you are generally only using .htaccess files on smaller sites anyway. On larger / high traffic sites you will have your own VPS / Server and access to the server config, so there wouldn't be any need to use .htaccess (or, importantly, have it enabled).
whilst still maintaining the convenience?
Not exactly. Part of the "convenience" is being able to put the .htaccess file in any directory you like, overriding parent directives and have it apply to just that directory tree. (It is the userland equivalent of the <Directory> container in the server config.)

How can I use an .htaccess file in Nginx?

I am currently migrating my website from Apache to nginx, but my .htaccess file is not working. My website is inside the /usr/share/nginx/html/mywebsite folder. How can I use .htaccess in my nginx server?
This is my .htaccess file:
RewriteEngine on
RewriteRule video/watch/([a-zA-Z0-9_#$*-]+)/?$ "videos-single.php?id=$1" [NC]
Nginx doesn't support .htaccess (see here: "You can’t do this. You shouldn’t. If you need .htaccess, you’re probably doing it wrong.").
You've two choices (as I know):
import your .htaccess to nginx.conf (maybe the htaccess to nginx converter helps you)
use authd-htpasswd (I didn't try it)
Disclosure: I am the author of htaccess for nginx, which is now open source software.
Over the past years, I created a plugin which implements htaccess behaviour into nginx, especially things like RewriteRule, Allow and Deny, which can be crucial for web security. The plugin is used in my own productive environments without a problem.
I totally share the point of efficiency and speed in nginx, and why they didn't implement htaccess.
However, think about it. You cannot make it worse if you're using nginx plus htaccess. You still keep the great performance of nginx, plus you can drive your legacy appliances effortlessly on one webserver.
This is not supported officially in nginx. If you need this kind of functionality you will need to use Apache or some other http server which supports it.
That said, the official nginx reasoning is flawed because it conflates what users want to do with the way it is done. For example, nginx could easily check the directories only every 10 seconds / minute or so, or it could use inotify and similar mechanisms. This would avoid the need to check it on every request... But knowing that doesn't help you. :)
You could get around this limitation by writing a script that would wait for nginx config files to appear and then copy them to /etc/nginx/conf.d/. However there might be some security implications - as there is no native support for .htaccess in nginx, there is also no support for limiting allowed configuration directives in config files. YMMV.
Using the config file is one option, but the cool thing about the .htaccess file is that it provided a way for a web developer to have some control over server settings without having root access to the server. There doesn't seem to be anything like this on nginx which is a real bummer.
I understand how the way it's setup on apache slows down response times, but hoped there could be an nginx way to do the same thing without the performance hit... At least a way to do rewrites with regex on urls if nothing else.
"Is there no nginx way to do bulk redirects using regular expressions that doesn't slow down response times."
Just edit your database with myphpmyadmin.
Open myphpmyadmin select your database then find your "yourprefix_Posts" table.
Open it then click the "Search" tab, then "Find and Replace".
Select "post_content" in the dropdown
In the "Find" field, type URL you want to change: "website.com/oldURL".
In the "Replace" field, type the new URL: "website.com/newURL".
(To use regular expression, tick the "Regular Expression" box.)
NOTE: You can test this out by simply leaving the "Replace" field blank.
ALWAYS BACKUP database before making changes. This might sound scary but its really not. Its super simple and can be used to quickly replace just about anbything.

How to implement url page redirection for a massive huge website

my site e.g. carparts.co.uk has 355000 unique urls. (it is a car parts catalogue site) (on webmaster tools it shows that 174000 of these are indexed)
We want to move our site to a new shopping cart platform (prestashop), and have completely changed the structure of the catalogue, which means we now have a new set of urls. (although the main domain is unchanged and is still carparts.co.uk)
i now have a excel sheet where I have a column of the 355000 'old' urls matched against the closest equivalent url on the new catalogue.
e.g.
old url: "carparts.co.uk/ford-ranger-alternator belts.htm"
goes to: "carparts.co.uk/belt-drive"
(and there are 355,000 of similar redirects)
my question is how should i do this?
i've that you can use htaccess to do this, but i'm worried because i've read that htaccess slows down sites if it is very large (is this slowness only encounted when trying to access one of the old urls?, or will it impact the speed of all my urls?
so what is the best thing for me to do with such a large number of urls?
Your best bet is probably setting up a RewriteMap. This requires server vhost config access as you can't configure the map from an htaccess file (though you can use one). The mapping is cached by apache so you don't need to worry about constant file access.
Something simple like:
RewriteMap redirects txt:/full/path/to/redirect-map.txt
Then in the file redirect-map.txt would simply have a "from" and "to":
"ford-ranger-alternator belts.htm" belt-drive
old-url.htm new-url
etc...
Then in either your htaccess file or in vhost config, just do:
RewriteCond $(redirects:$1|0) !=0
RewriteRule ^(.*)$ $(redirects:$1) [L,R]
Use of htaccess slows down the website because it needs to check several files for each request, and these are checked dynamically for every request.
It's more a problem for deep routed sites. For example, a request to:
www.example.com/folder1/folder2/folder3/folder4/index.htm
would need to check
The main config file.
Then add any overrides in the document root
htaccess file.
Then add any overrides in the folder1 htaccess file.
Then add any overrides in the folder2 htaccess file.
...etc.
However if you don't have deep nesting then it's not so bad. Still slower than not using them, but may not be noticeable on most sites.
The benefit of htaccess for you here, would mean that you wouldn't need to put all the redirects in one place, and could split them up amongst the htaccess file. I'm not sure of the impact of adding 355,000 redirects to the main Apache config, but it is a fair number, so imagine it could have a performance impact. The htaccess files, on the other hand, are read dynamically as the request is made, so all the redirects would not need to be loaded into Apache.
So, this might be one of the few use cases where htaccess might be a better solution, even if you do have access to the main config files.

Detecting if Apache is using mod_rewrite

How can a client detect if a server is using mod_rewrite? Now I know that some mod_rewrite rules are not very obvious. But some are, such as "SEO Friendly Urls". What types of behavior is impossible unless a server is running mod_rewrite?
What types of behavior is impossible unless a server is running mod_rewrite?
The real answer is "none". In theory, any URL could be formed by actual files or directories, including the classical "SEO friendly" URLs.
There is only circumstantial evidence:
The best indication that I can think of is when the entire site structure consists of URLs without .htm .php .html file extensions:
http://domain.com/slugs/house-warming-party
to exclude the possibility of that URL being a directory, request
http://domain.com/slugs/house-warming-party/index.htm
http://domain.com/slugs/house-warming-party/index.html
http://domain.com/slugs/house-warming-party/index.php
http://domain.com/slugs/house-warming-party/index.asp
... whatever other extensions there are .....
if those requests all fail, it is very likely that the site is using mod_rewrite. However if they succeed, as #Gumbo says, it could also be the MultiViews option fixing the request. Either way, this is nowhere near safe!
Depending on what your use case is, you could also try to deduct things from the CMS used on the site. Wordpress with mod_rewrite turned on will show a different URL structure than with it turned off. The same holds true for most other CMSes. But of course, this is also a highly imperfect approach.
The use of HTML resources with a .html/.htm/.php ending would point slightly against the use of mod_rewrite, but you can never be sure.
The use of the PATHINFO variable (also known as poor man's mod_rewrite) would point somewhat strongly against the use of mod_rewrite:
http://example.com/index.php/slugs/house-warming-party
In conclusion, mod_rewrite (like most URL-rewriting tools) is supposed to be a module transparent to the outside world. I know of no sure-fire way to detect it from outside, and there may well be none.

Drupal Clean Urls break randomly for arbitrary paths

I've done everything right. My server has mod_rewrite enabled, my virtualhost path has AllowOverride set to All, and I have the .htaccess file in place with the rewrite rules same as everyone. But I have trouble accessing some pages using their clean url paths. So for 90% of the pages, clean urls work fine. But for that 10%, they don't.
I have checked whether those pages exist -- they do. Checked whether they are accessible using index.php?q=[path] -- and they are. They are only inaccessible through clean url paths.
Can anyone help me with this mystery?
Because you can access your pages through q=path/to/menu/item, then it's clear that it is mod_rewrite that is at fault and not Drupal.
To debug what is going on with your rewrite, either turn on the rewrite log and tail -f it while you request the troubled pages, or alternatively print_r($_GET) at the top of index.php or page.tpl.php to see what is actually being requested.
If you are comfortable posting your potentially sensitive .htaccess here, do so and we can have a look at it for you to see if there are any misconfigurations.
mod_rewrite has a few long-standing bugs that mangle URLs on the way through (do your problem urls have any escape characters?). I don't know if Drupal does this, but in other PHP apps I have had to add code to re-do the rewrite once the correct entrypoint has been reached.
Unfortunately, Drupal can't take its search path in PATH_INFO (as a lot of other apps do), otherwise you could use mod_alias which is much simpler and much more reliable.