Handling multiple re-writing and redirection with .htaccess - apache

Working with .htaccess has always been a little confusing for many developers.
I am currently facing an issue myself:
we want the following things to work simultaneously with .htaccess:
1) redirect non-www to www
2) remove the .php extension
3) for pages with trailing parameters, such as abc.php?pageid=28 and abc.php?pageid=95&cat=92, show the actual page names, like www.xyz.com/about-us, rather than IDs.
All of the above must work together.

Refer to the following links; they may help resolve the issue.
How to write multiple rewrite conditions and rules in one .htaccess file for redirecting urls?
htaccess mod_rewrite - Trailing Slash and loop of redirects
.htaccess - Rewrite multiple subdirectories to root
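As a rough starting point for the first two requirements, here is a minimal sketch. It assumes the .htaccess file sits in the document root, that mod_rewrite is enabled, and that the site is served over plain http (the scheme is not stated in the question, so adjust it as needed). The third requirement is left out because mapping a pageid to a page name needs a lookup that the question does not describe.

```apache
RewriteEngine On

# 1) Externally redirect non-www to www (http scheme is an assumption)
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# 2) Internally serve /about-us from /about-us.php when that file exists
#    (assumes this .htaccess is in the document root)
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^([^.]+)$ /$1.php [L]
```

The external redirect comes first so that the hostname is canonical before any internal rewrite takes place.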

Related

Will over 1000 301 redirects slow down my site? [closed]

I've updated my site links to use pretty URLs instead of query strings.
Old URL: https://digimoncard.io/deck/?deckid=1241
New URL: https://digimoncard.io/deck/green-otk-1241
Both URLs can be visited without a 404 but I plan on adding 301 redirects to my htaccess like so:
RewriteEngine on
Redirect 301 https://digimoncard.io/deck/?deckid=1241 https://digimoncard.io/deck/green-otk-1241
Part of my concern is the number of redirects needed (will be 1218 exactly). Will this potentially slow down the site/server due to having to query each of these on every page load?
My other option is to leave it alone, let Google index the new URLs, and let the query-string ones go stale over time.
1218 redirect directives in .htaccess shouldn't cause a noticeable/significant delay, however there are other issues with your suggestion and this can be greatly optimised to avoid any additional overhead...
...having to query each of these on every page load?
It's not just "every page load", but potentially every request. All static resources (CSS, JS, images, etc.) will trigger the same set of directives (unless they are hosted away from the application server).
RewriteEngine on
Redirect 301 https://digimoncard.io/deck/?deckid=1241 https://digimoncard.io/deck/green-otk-1241
This won't work. The mod_alias Redirect directive matches the URL-path only, it does not match the query string (or scheme+hostname), so the above directive will simply fail to match and do nothing. (RewriteEngine is also part of mod_rewrite, not mod_alias.)
In order to match the query string you need to use mod_rewrite with an additional condition to check against the QUERY_STRING server variable. For example:
# QSD (Apache 2.4+) stops the old query string being appended to the redirect target
RewriteCond %{QUERY_STRING} ^deckid=(1241)$
RewriteRule ^deck/$ /deck/green-otk-%1 [QSD,R=301,L]
The %1 backreference contains 1241 (captured from the preceding CondPattern) - this simply saves repetition (that could potentially introduce bugs). Unless of course you are generating these directives automatically.
Don't use .htaccess - use your application logic instead
However, ideally, you would not be doing these redirects in .htaccess to begin with. It would be far more optimal to do these in your application logic (ideally when your site determines that the result would trigger a 404 - although that does not happen in your case). By placing these directives in .htaccess you are prioritising the "redirects" at the expense of normal site traffic. By implementing these redirects later (in your application) you prioritise normal site traffic.
Since I assume you are using a front-controller to route your URLs this should be relatively trivial to implement. (You only process the redirect logic when a request comes in that matches the old URL format.)
Optimised .htaccess version
However, if you do decide to go the .htaccess route, you could greatly optimise this, provided all your old URLs follow the same format. You could internally rewrite any request of the form /deck/?deckid=<number> to a subdirectory, and then have another .htaccess file in that subdirectory that processes all 1218 redirects. This way, only a single directive in your main .htaccess file is processed on every request, and the redirect logic (in the subdirectory .htaccess file) is only processed when it is needed.
This avoids the overhead of having 1000+ redirect directives in the main .htaccess file.
The directives in the subdirectory .htaccess file can also be simplified since we can rewrite the request to move the query string to the URL-path to avoid the additional condition later.
For example, at the top of your root .htaccess file:
RewriteEngine On
# Internally rewrite the request for (what looks like) an old URL
# ...to the "/redirect-old" subdirectory
# QSD (Apache 2.4+) discards the old query string so it is not carried onto the redirect
RewriteCond %{QUERY_STRING} ^deckid=(\d+)$
RewriteRule ^deck/$ redirect-old/%1 [QSD,L]
All URLs of the form /deck/?deckid=<number> are internally rewritten to /redirect-old/<number>...
Then, in /redirect-old/.htaccess you have simplified "redirect" directives like the following that match against the rewritten URL:
# /redirect-old/.htaccess
RewriteEngine On
# Redirect old URLs
RewriteRule ^1241$ /deck/green-otk-$0 [R=301,L]
RewriteRule ^1234$ /deck/foo-$0 [R=301,L]
RewriteRule ^4321$ /deck/bar-$0 [R=301,L]
:
These directives match the rewritten URL, i.e. /redirect-old/<number>, and redirect accordingly.
The $0 backreference in each case is simply the URL-path (i.e. the number) that is matched by the RewriteRule pattern. (This saves repetition, as mentioned above.)
Well, it may slow the site down, but not necessarily. Unless your .htaccess contains something like 10k or more redirects, it should be fine. As a precaution, you can keep the file small by removing unnecessary redirects and letting Google index the new URLs.
You can refer to this link for more information:
https://www.matthewedgar.net/do-redirects-add-to-website-speed/#htaccessredirectsspeed

htaccess redirection implied html extension not working

I've tried many things before coming here. It should be a simple problem, but there is surely something I am missing.
I want to redirect a bunch of URLs to other ones, one by one, and here is an example from my .htaccess file:
RewriteEngine On
Redirect 301 /index.php/Microcontrôleurs_Généralités https://newdomain.org/Microcontrôleurs_Généralités
The thing is that the old URLs are files in a real folder "index.php" but with ".html" extension.
When I go to https://olddomain.org/Microcontrôleurs_Généralités, apache serves me the implied .html file. I can go to https://olddomain.org/Microcontrôleurs_Généralités.html too, it's the same file on disk.
But my redirection as above does not redirect anything.
If I add the .html extension to the file like this :
RewriteEngine On
Redirect 301 /index.php/Microcontrôleurs_Généralités.html https://newdomain.org/Microcontrôleurs_Généralités
Then, if I go to the URL with an explicit ".html" at the end, it is redirected correctly, but if I omit the .html, Apache says the URL was not found.
I've turned this over in my head numerous times and I can't figure out the real problem.
Help would be much appreciated, thanks.

multiple folder redirect

I have been trying variations of the following without success:
Redirect permanent /([0-9]+)/([0-9]+)/(.?).html http://example.com/($3)
It seems to have no effect. I have also tried rewrite with similar lack of results.
I want all links similar to: http://example.com/2002/10/some-long-title.html
to redirect the browser and spiders to: http://example.com/some-long-title
When I save this to my server, and visit a link with the nested folders, it just returns a 404 with the original URL unchanged in the address bar. What I want is the new location in the address bar (and the content of course).
I guess this is more or less what you are looking for:
RewriteEngine On
RewriteRule ^/([0-9]+)/([0-9]+)/(.+)\.html$ http://example.com/$3 [L,R=301]
This can be used inside the central apache configuration. If you have to use .htaccess files because you don't have access to the apache configuration then the syntax is slightly different.
Using mod_alias, you want the RedirectMatch, not the regular Redirect directive:
RedirectMatch permanent ^/([0-9]+)/([0-9]+)/(.+)\.html$ http://example.com/$3
Your last grouping needs to be (.+), which matches one or more characters; what you had before, (.?), matches only zero or one character. Also, the last backreference doesn't need the parentheses.
Using mod_rewrite, it looks similar:
RewriteEngine On
RewriteRule ^/([0-9]+)/([0-9]+)/(.+)\.html$ http://example.com/$3 [L,R=301]
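Both mod_rewrite examples above anchor the pattern with a leading slash, which suits the main server or virtual host configuration. What follows is a sketch of the .htaccess form, assuming the file lives in the document root. In that context the directory prefix is stripped before matching, so the pattern carries no leading slash.

```apache
RewriteEngine On
# Per-directory (.htaccess) context: no leading slash in the pattern
RewriteRule ^([0-9]+)/([0-9]+)/(.+)\.html$ http://example.com/$3 [L,R=301]
```

With this in place, a request for /2002/10/some-long-title.html should be redirected to http://example.com/some-long-title.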

SEO URLs with ColdFusion controller?

quick ref: area = portal type page.
I would like old urls http://domain.com/long/rubbish/url/blah/blah/index.cfm?id=12345
to redirect to http://domain.com/area/12345-short-title
http://domain.com/area/12345-short-title should display the content.
So far I have worked out that, to do this, I could use Apache to rewrite all URLs to
http://domain.com/index.cfm/long/rubbish/url/blah/blah/index.cfm?id=12345
and
http://domain.com/index.cfm/area/12345-short-title
The index.cfm will either serve the content or apply a permanent redirect, but it will need to get the title and area information from the database first.
There are 50,000 pages on this website. I also have other ideas for subdomain redirects, and permanent subdomains and controlling how they act through the index.cfm.
Infrastructure are keen to do as much through Apache rewrite as possible, we suspect it would be faster. However I'm not sure we have that choice if we need to get the area and title information for each page.
Has anyone got some experience with this that can provide input?
--
Something to note, I'm assuming we'll have to keep all the internal URLs used on the website in the old format. It would be a mega job to change them all.
This means all internal URLs will have to use a permanent redirect every time.
Rather than redirecting both groups of URLs to the same script, why not simply send them to two distinct scripts?
Simply like this:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^\w+/\d+-[\w-]+$ /content.cfm/$0 [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^.* /redirect.cfm/$0 [L,QSA]
Then, the redirect.cfm can lookup the replacement URL and do the 301 redirect, whilst content.cfm simply serves the content.
(You haven't specified how your CF is set up; you may need to update the Jrun/Tomcat/other config to support /content.cfm/* and /redirect.cfm/*. It'll be done the same way as it's done for index.cfm.)
For performance reasons, you still want to avoid the database hits for redirecting if you can, and you can do that by generating rewrite rules for each page that performs the 301 redirect on the Apache side. This can be as simple as appending a line to the .htaccess file, like so:
<cfset NewLine = 'RewriteRule #ReEscape(OldUrl)# #NewUrl# [L,QSA,R=301]' />
<cffile action="append" file="./.htaccess" output=#NewLine# />
(Where OldUrl and NewUrl have been looked-up from the database.)
You might also want to investigate using mod_alias redirect instead of mod_rewrite RewriteRule, where the syntax would be Redirect permanent #OldUrl# #NewUrl# - since the OldUrl is an exact path match it would likely be faster.
Note that these rules will need to be checked before the above redirect.cfm redirect is done - if they are in the same .htaccess you can't simply do an append, but if they are in the site's general Apache config files then the .htaccess rules will be checked first.
Also, as per Sharon's comment, you should verify if your Apache will handle 50k rules - whilst I've seen it reported that "thousands" of regex-based Apache rewrites are perfectly fine, there may well be some limit (or at least the need to split across multiple files).
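For illustration, a generated entry for the example URL from the question might look like the sketch below. The old URLs carry the page id in the query string, which a RewriteRule pattern never sees, so this sketch assumes the generator also emits a RewriteCond on QUERY_STRING. The path, id and title are the ones used in the question; the trailing ? on the target is there to drop the old query string from the redirect.

```apache
# Hypothetical generated entry, one per page; it must appear before the catch-all rules
RewriteCond %{QUERY_STRING} (^|&)id=12345(&|$)
RewriteRule ^long/rubbish/url/blah/blah/index\.cfm$ /area/12345-short-title? [L,R=301]
```

Requests that match no generated entry would still fall through to redirect.cfm, so the database lookup remains available as a fallback.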
Using apache rewrites would only be faster if they were static rewrites, or if they all followed some rule that you could write in regex within the .htaccess file. If you're having to touch the database for these redirects, then it may not make sense to do it in .htaccess.
Another approach is the one used by most CMSs for handling virtual directories and redirects. An index.cfm file at the root of the site handles all incoming requests and returns the correct pages and pathing. MURA CMS uses this approach (as well as Joomla and most of the others.)
Basically you're using the CGI.path_info variable on an incoming request, searching for it in your DB, and doing a redirect to the new path. As usual, Ben Nadel has a good write-up of how to use this approach: Ben Nadel: Using IIS URL Rewriting And CGI.PATH_INFO With IIS MOD-Rewrite
You can, however, use the .htaccess to remove the "index.cfm" from the url string entirely if you want by redirecting all incoming requests to the root URL with something that looks like this in your .htaccess:
RewriteEngine On
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
RewriteRule ^([a-zA-Z0-9-]{1,})/([a-zA-Z0-9/-]+)$ /$1/index.cfm/$2 [PT]
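As written, the rule internally rewrites a request such as http://www.yourdomain.com/area/your-new-url to http://www.yourdomain.com/area/index.cfm/your-new-url (adjust the substitution if your index.cfm lives at the site root), where it can be processed as described by the blog post above. Because this is an internal rewrite rather than an external redirect, the user never sees the index.cfm.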

How can I redirect people accessing my files as directories?

I have the following situation:
On my webserver I have an instance of websvn running, where specific repositories and revisions can be accessed by a URL like
http://www.myhost.com/listing.php?repname=repository1&path=%2Ftrunk%2Fbackend
Somehow, out there in the wild, a wrong URL is being used to access this
http://www.myhost.com/listing.php/?repname=repository1&path=%2Ftrunk%2Fbackend
(Notice the slash after listing.php)
Now, although the URL works and websvn still shows the webpage, images and stylesheets do not get loaded correctly, since they are referenced with relative paths.
I tried to add an .htaccess file to the webroot to redirect people accessing the file as directory to the correct URL.
I have tried multiple variations and ended up with this file:
RewriteEngine on
RewriteRule ^/listing.php/ listing.php [R=301,QSA]
But, since I am writing here, you already guessed it: It doesn't work.
I also tried
RewriteEngine on
RewriteRule ^/listing.php(.*) listing.php$1 [R=301,QSA]
What am I doing wrong?
Perhaps among other things, a RewriteRule within .htaccess that starts with “^/” will never match anything at all. (Examples that include a leading slash are for the global configuration file.) Remove the leading forward slash and see if that helps.
Also, I recommend changing the 301 to a 307 until you get it working. Otherwise, your browser will cache the 301 result, redirecting on subsequent references without consulting your server at all and likely giving you very confusing results.
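Putting both suggestions together, a minimal sketch for the webroot .htaccess could look like this. It assumes mod_rewrite is enabled and that listing.php sits directly in the webroot. The condition on REQUEST_URI is an addition not taken from the answer: it limits the redirect to requests that have a slash right after listing.php, so the corrected URL is not redirected again.

```apache
RewriteEngine On
# Only act when the requested URL-path has a slash after listing.php
RewriteCond %{REQUEST_URI} ^/listing\.php/
# No leading slash in the pattern (.htaccess context); 307 while testing
RewriteRule ^listing\.php /listing.php [R=307,L]
```

The query string is passed through to the redirect target by default, so QSA is not needed here. Once the behaviour is confirmed, the 307 can be switched to a 301.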