Apache RedirectMatch - apache

I'm working on an apache server (2.2), and I'm trying to redirect a URL based off of a URL filter. For example,
https://mywebsite.com/path/to/page?folder=folderDirectory/folderName
will redirect to:
https://mywebsite.com/static/contentUnavailable.html
In my httpd.conf file I have the following code ..
RedirectMatch (.*)path/to/page?folder=folderDirectory/folderName /static/contentUnavailable.html
I restart apache every time I make modifications to this file, however the page is not redirecting. What am I doing wrong in the RedirectMatch?

You can't match the query string with RedirectMatch, sorry; you need mod_rewrite with a RewriteCond for this. Rough example:
RewriteCond %{QUERY_STRING} ^folder
RewriteRule ^ /static/contentUnavailable.html [R,L,QSD]
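This will match any query string that starts with folder (whatever follows it) and redirect every such request to the destination you want, discarding the query string in the process (QSD flag). Note that QSD requires Apache 2.4 or later; on Apache 2.2, drop the flag and end the target with a question mark instead (/static/contentUnavailable.html?) to get the same effect.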
In any case let me commend you for trying to stick to redirect/redirectmatch first (while everyone else just goes blindly for mod_rewrite even for the simplest redirects). You are doing things right.
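Since the question mentions Apache 2.2, where the QSD flag is not available, here is a minimal sketch of a stricter variant. It assumes the rules live in httpd.conf at server or virtual host level (so the pattern starts with a slash) and that only the exact path and query string from the question should redirect:

```apache
RewriteEngine On
# Only act on the exact query string from the question.
RewriteCond %{QUERY_STRING} ^folder=folderDirectory/folderName$
# Server/virtual host context: the pattern sees the leading slash.
# The trailing "?" on the target drops the original query string (works on 2.2).
RewriteRule ^/path/to/page$ /static/contentUnavailable.html? [R=302,L]
```

R=302 is used here so that browsers do not cache the redirect while it is still being tested.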

Related

Apache: doing pattern matching and grouping with a RewriteRule leads to the local path instead of getting the URL component

I'd like to use RewriteRule's pattern to get the path requested and redirect the client elsewhere keeping the path in the resulting redirect.
I thought something like this would do the trick:
RewriteRule ^(.*)$ http://testserver/test/$1
If the user requests foo, send him to test/foo (don't worry about looping, I put some RewriteCond logic to prevent that).
To my surprise, Apache ends up with something like http://testserver/foo/var/www/html. What it did was the following:
/bar /var/www/html/bar
I raised the log level of mod_rewrite and found that it did match, but Apache was expanding the match to the local path of /, which is /var/www/html, and using that to redirect the browser, which surely won't work.
I tried using [PT] which I thought would prevent the expansion, but it didn't.
Any idea how I can prevent this from happening? Any help would be appreciated.
Best

Apache %{REQUEST_URI} not working correctly

I am not using Virtual Hosts or anything fancy though I have some .htaccess files setup. Following is my rewrite rule in httpd.conf:
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/app/smsapi [NC]
RewriteRule (.*) https://www.example.com/uri=%{REQUEST_URI} [R,L]
This rule basically says that if the uri does not begin with /app/smsapi then fire the rewrite. But when I restart the server and try it I get some weird results.
When I request the URL https://www.example.com/app/smsapi/index.php, I get a 200 Success code which is as expected. But, when I request the URL http://www.example.com/app/smsapi/index.php, it redirects to https://www.example.com/uri=/app/smsapi/index.php. So it actually fires the rule even though the request URI does not satisfy the condition.
So, then I decided to turn off the rewrite rules and give it a go. Now, both of those URLs give me a 200 Success code.
Now, I know this problem cannot be solved easily by other people who do not have access to the server, but am I right in saying that this is certainly a problem with REQUEST_URI not firing correctly? I have shown that without the rewrite rule, everything works normally, but with the rewrite rule, the second URL is redirected. Therefore, the redirection must be caused by the rewrite rule? Additionally, the condition for redirect rule is not satisfied. Doesn't this prove that there is something wrong with the functioning of the rewrite rule?
Is there any other possibility?
UPDATE
Something very weird is happening here. I set up a local server and tried the same rule and what I got for the URL http://192.168.0.112/app/ is
http://192.168.0.112/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/uri=/app/
which is correct because as long as the URL is not like /app/smsapi, it should redirect it. Wonder why this is not happening on the real server. Also, where you insert these rules seems to make a difference. (I am only including these rules after the LoadModule command).
On localhost, if I put these rules either above or below the Directory section, it won't work. But, if I include it inside the Directory section it will.
On server, if I include the rules inside the Directory section, they won't work. But, if I include them either above or below the Directory section, they start working.
This seems to me to be due to a difference in the versions. My localhost is an Ubuntu Desktop 16.04 running Apache 2.4.18. While the server is CentOS 6.8 running Apache 2.2.15.
But I think the mystery of why the redirect happens only once on the server (though it is configured to go up to 20 times) has something to do with https, which is also related to the original problem, in which a redirect to https happens even on a non-matching rule.
Clues anyone?
UPDATE
I updated the httpd.conf file with the same rules but I used http:// instead of https:// and it gave me the correct result with 20 redirects. That means I have isolated the problem to https.

You are reporting the exact issue in the first phrase: "I am not using Virtual Hosts or anything fancy though I have some .htaccess files setup"
.htaccess is "fancy" and overcomplicated, not virtualhosts.
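If you had defined that rule in a virtual host in the first place it would be far easier to reason about, but .htaccess is per-dir context (aka a nightmare): there, the RewriteRule pattern is matched against the URL-path with the directory prefix, including the leading slash, stripped, so a pattern starting with ^/ will never match.
This stripping only affects the RewriteRule pattern, though. %{REQUEST_URI} holds the full URL-path, leading slash included, in every context (server config, directory or .htaccess), so the condition keeps its slash:
RewriteCond %{REQUEST_URI} !^/app/smsapi [NC]
As an extra, consider that in per-dir context you MAY NOT need a RewriteCond at all, because the negative lookahead can go straight into the pattern (without the leading slash):
RewriteRule ^(?!app/smsapi)(.*) https://www.example.com/uri=$1 [R,L]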
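To make the context difference concrete, here is a hedged sketch of the same redirect written for each place it could live. As far as the mod_rewrite documentation describes it, only the RewriteRule pattern loses its leading slash in per-directory context, while %{REQUEST_URI} always keeps it. The extra condition on /uri= is an assumption added here to stop the redirect from looping onto its own target; it is not part of the original rule.

```apache
# Variant A: server or virtual host context (httpd.conf, outside <Directory>).
# The RewriteRule pattern sees the full URL-path, e.g. "/app/foo".
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/app/smsapi [NC]
RewriteCond %{REQUEST_URI} !^/uri=
RewriteRule ^/(.*)$ https://www.example.com/uri=/$1 [R,L]

# Variant B: per-directory context (.htaccess in the document root).
# Use this INSTEAD of variant A, not in addition to it.
# The RewriteRule pattern sees "app/foo" (no leading slash),
# but %{REQUEST_URI} still contains "/app/foo".
#
# RewriteEngine On
# RewriteCond %{REQUEST_URI} !^/app/smsapi [NC]
# RewriteCond %{REQUEST_URI} !^/uri=
# RewriteRule ^(.*)$ https://www.example.com/uri=/$1 [R,L]
```

With either variant, a request for /app/smsapi/index.php should be left alone, and a request for /foo should be redirected once to https://www.example.com/uri=/foo.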

301 redirect for old urls with language parameter

I need a hand with some 301 redirects for my apache htaccess file. The old urls contain variables at the end and have structures like the following:
/furniture-248/category/570-shelves.html?lang=en
/all-products/furniture-248/shelves.html?page=2&lang=en
/store/product/asearch.html?path=7_632&lang=en&Itemid=284
The new urls don't contain parameters of this nature and would be simply of the form:
main-cat/subcat/sale.html
I tried a regular 301 redirect in the htaccess file which works for urls without parameters but those urls containing the ?lang=en simply don't work.
This is what I was trying:
Redirect 301 /furniture-248/category/570-shelves.html?lang=en http://www.domain.com/shelves.html
I'd be very grateful for any help and advice.
Many thanks in advance

You can't use the query string as part of a redirect like that. You have two options.
Option 1
Take the "?lang=en" part off and just redirect all instances of that URL, whatever the query string is.
Redirect 301 /furniture-248/category/570-shelves.html http://www.domain.com/shelves.html
This will leave the query string intact, so the new URL will include "?lang=en" if it is present, or any other query string.
But of course, you might need to only redirect it when it has the "?lang=en" part, or leaving the query string intact when redirecting might not be acceptable. In that case, it will need to be...
Option 2
Use mod_rewrite:
RewriteEngine On
RewriteCond %{QUERY_STRING} ^lang=en$
RewriteRule ^furniture-248/category/570-shelves\.html$ http://www.domain.com/shelves.html? [R=301,L]
This does exactly what you asked for, redirecting /furniture-248/category/570-shelves.html?lang=en to http://www.domain.com/shelves.html and only that.
Note that:
The query string is matched separately.
The opening forward slash on the matching part is not used (because in a .htaccess file at the website root level, the pattern is matched against the path without that opening slash).
The closing question mark on the redirect URL is important, as it tells the engine to drop the existing query string, which is what you want.
[R=301,L] means do a 301 redirect and don't process any more URL rewriting on this URL.
For the matching part in RewriteRule, the dot before "html" is escaped with "\" because dot has a special meaning in a regex.
Also for the matching parts, in both RewriteCond and RewriteRule, the ^ means the start of the string and the $ means the end of it, so that we are matching exactly that rather than it being possible for it to be part of a longer string.
And finally, if you're adding a number of these, you only need the "RewriteEngine On" part once, at the top. The other two parts are needed for each one.
Please be sure to test all redirects you add with this method as there is more to mod_rewrite than I have mentioned in this simple explanation.
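The second URL in the question carries lang=en next to another parameter (?page=2&lang=en), which the anchored condition ^lang=en$ would not match. A minimal sketch for that case follows; the redirect target is an assumption, since the question does not say where that URL should go:

```apache
RewriteEngine On
# Match lang=en as a whole parameter anywhere in the query string:
# at the start or after "&", and followed by "&" or the end of the string.
RewriteCond %{QUERY_STRING} (^|&)lang=en(&|$)
# Hypothetical target; the trailing "?" drops the old query string.
RewriteRule ^all-products/furniture-248/shelves\.html$ http://www.domain.com/shelves.html? [R=301,L]
```

The (^|&) and (&|$) guards keep the condition from matching parameters that merely contain the text, such as oldlang=en or lang=english.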

Does REQUEST_URI hide or ignore some filenames in .htaccess?

I'm having some difficulty with a super simple htaccess redirect.
All I want to do is rewrite absolutely everything, except a couple files.
htaccess looks like this:
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
The part that works is that everything gets redirected to new domain as it should be. And I can also access robots.txt without being forwarded, but not with sitemap.xml. If I try to go to sitemap.xml, the domain forwards along anyway and opens the sitemap file on the new domain.
I have this exact same issue when trying to "ignore" index.html. I can ignore robots, I can ignore alternate html or php files, but if I want to ignore index.html, the regex fails.
Since I can't actually SEE what is in the REQUEST_URI variable, my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI? I know this because of a stupid test. If I choose to ignore index.html like this:
RewriteCond %{REQUEST_URI} !index.html
Then if I type example.com/index.html I will be forwarded. But if I just type example.com/ the ignore actually works and it shows the content of index.html without forwarding!
How is it that when I choose to ignore the regex "index.html", it only works when "index.html" is not actually typed in the address bar!?!
And it gets even weirder! Should I type something like example.com/index.html?option=value, then the ignore rule works and I do NOT get forwarded when there are attributes like this. But index.html by itself doesn't work, and then just having the slash root, the rule works again.
I'm completely confused! Why does it seem like REQUEST_URI is not able to see some filenames like index.html and sitemap.xml? I've been Googling for 2 days and not only can I not find out if this is true, but I can't seem to find any websites which actually give examples of what these htaccess server variables actually contain!
Thanks!

"my guess is that somehow index.html and sitemap.xml are some kind of "special" files that don't end up in REQUEST_URI?"
This is not true. There is no such special treatment of any requested URL. The REQUEST_URI server variable contains the URL-path (only) of the request. This notably excludes the scheme + hostname and any query string (which are available in their own variables).
However, if there are any other mod_rewrite directives that precede this (including the server config) that rewrite the URL then the REQUEST_URI server variable is also updated to reflect the rewritten URL.
index.html (Directory Index)
index.html is possibly a special case. Although, if you are explicitly requesting index.html as part of the URL itself (as you appear to be doing) then this does not apply.
If, on the other hand, you are requesting a directory, eg. http://example.com/subdir/ and relying on mod_dir issuing an internal subrequest for the directory index (ie. index.html), then the REQUEST_URI variable may or may not contain index.html - depending on the version of Apache (2.2 vs 2.4) you are on. On Apache 2.2 mod_dir executes first, so you would need to check for /subdir/index.html. However, on Apache 2.4, mod_rewrite executes first, so you simply check for the requested URL: /subdir/. It's safer to check for both, particularly if you have other rewrites and there is a possibility of a second pass through the rewrite engine.
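Checking for both forms could look roughly like this. It is a sketch that assumes the .htaccess file sits in the document root and that the files to exclude are /index.html, /sitemap.xml and /robots.txt, as in the question:

```apache
RewriteEngine On
# Skip the bare directory request ("/") as well as the explicit "/index.html",
# so the exclusion holds whether or not mod_dir has already run.
RewriteCond %{REQUEST_URI} !^/(index\.html)?$
# Skip the two other files, anchored so that only these exact paths are excluded.
RewriteCond %{REQUEST_URI} !^/(sitemap\.xml|robots\.txt)$
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
```

Anchoring the conditions with ^ and $ also avoids accidentally excluding unrelated URLs that happen to contain "sitemap" or "robots" somewhere in the path.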
Caching problems
However, the most probable cause in this scenario is simply a caching issue. If the 301 redirect has previously been in place without these exceptions then it's possible these redirections have been cached by the browser. 301 (permanent) redirects are cached persistently by the browser and can cause issues with testing (as well as your users that also have these redirects cached - there is little you can do about that unfortunately).
RewriteCond %{REQUEST_URI} !(sitemap|index|alternate|alt) [NC]
RewriteRule .* alternate.html [R,L]
The example you presented in comments further suggests a caching issue, since you are now getting different results for sitemap than those posted in your question. (It appears to be working as intended in your second example).
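One way to keep cached redirects from muddying the results is to test with a temporary redirect and switch to 301 only once the exceptions behave as intended. A minimal sketch, reusing the rule from the question:

```apache
RewriteEngine On
RewriteCond %{REQUEST_URI} !sitemap
RewriteCond %{REQUEST_URI} !robots
# 302 while testing: browsers do not cache it persistently the way they do a 301.
# Change to R=301 once the behaviour is confirmed.
RewriteRule ^(.*)$ http://example.com/$1 [L,R=302]
```

Redirects that the browser cached earlier still need to be cleared, or the test repeated in a private window or with a command-line client.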
Examining Apache server variables
@zzzaaabbb mentioned one method to examine the value of the Apache server variable. (Note that the Apache server variable REQUEST_URI is different to the PHP variable of the same name.) You can also assign the value of an Apache server variable to an environment variable, which is then readable in your application code.
For example:
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]
You can then examine the value of the APACHE_REQUEST_URI environment variable in your server-side code. Note that if you have any other rewrites that cause the rewriting process to start over then you could get multiple env vars, each prefixed with REDIRECT_.
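If there is no application code at hand to read the environment variable, one option is to echo it back in a response header and inspect it in the browser's developer tools. This is only a sketch: it assumes mod_headers is loaded, and X-Debug-Request-Uri is a made-up header name.

```apache
RewriteEngine On
# Copy the Apache server variable into an environment variable.
RewriteRule ^ - [E=APACHE_REQUEST_URI:%{REQUEST_URI}]

<IfModule mod_headers.c>
    # Hypothetical debug header; "always" also covers redirect and error responses.
    Header always set X-Debug-Request-Uri "%{APACHE_REQUEST_URI}e"
</IfModule>
```

If another rewrite restarts processing, the value may end up under REDIRECT_APACHE_REQUEST_URI instead, so that name is worth checking too. Remove the header again once debugging is done.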

With the index.html problem, it is worth escaping the dot (index\.html), since you are in the regex pattern-matching area on the right-hand side of RewriteCond. Be aware, though, that an unescaped dot matches any single character, a literal dot included, so !index.html still matches a request for index.html; escaping makes the pattern stricter but is unlikely, on its own, to explain the unwanted forward.
For the sitemap not matching problem, you could check what REQUEST_URI actually contains by creating an empty dummy file (test.php here, to avoid a 404) and then adding a redirect at the top of .htaccess. Then type anything you want to see the REQUEST_URI for into the browser's address bar; after the redirect, the value shows up in the address bar.
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^ /test.php?var=%{REQUEST_URI} [NE,R,L]
Credit MrWhite with that easy test method.
Hopefully that will show that sitemap in URL ends up as something else, so will at least partially explain why it's not pattern-matching and preventing redirect, when it should be pattern-matching and preventing redirect.
I would also test by being sure that the server isn't stepping in front of things with custom 301 directive that for whatever reason makes sitemap behave unexpectedly. Put this at the top of your .htaccess for that test.
ErrorDocument 301 default

multiple folder redirect

I have been trying variations of the following without success:
Redirect permanent /([0-9]+)/([0-9]+)/(.?).html http://example.com/($3)
It seems to have no effect. I have also tried rewrite with similar lack of results.
I want all links similar to: http://example.com/2002/10/some-long-title.html
to redirect the browser and spiders to: http://example.com/some-long-title
When I save this to my server, and visit a link with the nested folders, it just returns a 404 with the original URL unchanged in the address bar. What I want is the new location in the address bar (and the content of course).

I guess this is more or less what you are looking for:
RewriteEngine On
RewriteRule ^/([0-9]+)/([0-9]+)/(.+)\.html$ http://example.com/$3 [L,R=301]
This can be used inside the central Apache configuration. If you have to use .htaccess files because you don't have access to the Apache configuration, the syntax is slightly different: the pattern is matched without the leading slash.

Using mod_alias, you want the RedirectMatch, not the regular Redirect directive:
RedirectMatch permanent ^/([0-9]+)/([0-9]+)/(.+)\.html$ http://example.com/$3
Your last grouping needs to be (.+), which matches one or more characters; what you had before, (.?), matches either zero or one character. Also, the last backreference doesn't need the parentheses: write $3, not ($3).
Using mod_rewrite, it looks similar:
RewriteEngine On
RewriteRule ^/([0-9]+)/([0-9]+)/(.+)\.html$ http://example.com/$3 [L,R=301]
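The mod_rewrite rule above starts its pattern with a slash, which fits the server or virtual host configuration. For a .htaccess file in the document root, a minimal sketch of the equivalent rule drops that leading slash. The capture groups around the year and month are also removed here, since only the title is reused:

```apache
RewriteEngine On
# Per-directory context: the pattern is matched against "2002/10/some-long-title.html".
# Only the title is captured, so it becomes $1 instead of $3.
RewriteRule ^[0-9]+/[0-9]+/(.+)\.html$ http://example.com/$1 [L,R=301]
```

The RedirectMatch version needs no such change, because mod_alias always matches against the full URL-path, leading slash included.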