.htaccess how to force/automatic clean URL - apache

I'm new to mod_rewrite but am trying my best to fix up my site with clean URLs.
Using RewriteRule I can get it so you can type in a clean URL and reach the right page, but I'm having trouble automatically redirecting to the clean URL when a "messy" one is used (which is quite likely because of user-submitted links and content).
Now, I have this bit of code which I found on another .htaccess forum, and it works in one situation, but not another. I'll explain:
# FORCE CLEAN URLS
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+index\.php\?(.*)=([^\s]+) [NC]
RewriteRule ^ %1\/%2? [R=301,L]
This works fine on an address like this, for example: www.domain.com/index.php?cmd=login it automatically redirects to www.domain.com/cmd/login
But the problem comes when there is more than one query, like: www.domain.com/index.php?cmd=view-profile&user=bob
I can't figure out how to make it sort out that kind of URL when there could be up to 3 or more queries in the address.
I'm not fully competent with regex, so my attempts to amend the code snippet I have have failed thus far.
Any help would be appreciated! I would like them to be 301 redirects so that the site can get indexed properly and be SEO compliant no matter what type of clean or messy URL is used, but I'm open to suggestions!
EDIT
After playing around with the regex for a few hours, I've progressed but got stumped again.
If I change the expression to this:
index\.php\?(.*)=([^\s]+)(&(.*)=([^\s]+))?+
$1/$2/$3/$4/$5
It will match these 3 URLs from index.php onwards:
http://site.com/index.php?cmd=shop&cat=78
http://site.com/index.php?cmd=shop
http://site.com/index.php?cmd=shop&cat=78&product=68
BUT the resulting output varies depending on which one it is. These are my results:
http://site.com/cmd=shop&cat/78///
http://site.com/cmd/shop///
http://site.com/cmd=shop&cat=78&product/68///
I'm not sure how to get it to treat certain parts as optional so it groups properly.

You'll need to deal with each number of pairs of parameters separately. The one you have can be used to handle one name/value pair, then approach it similarly for 2, and 3 (and 4 if needed):
# To handle a single name/value pair in the query string:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+index\.php\?([^&=]+)=([^&\ ]+)(\ |$) [NC]
RewriteRule ^ /%1/%2? [R=301,L]
# To handle 2:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+index\.php\?([^&=]+)=([^&\ ]+)&([^&=]+)=([^&\ ]+)(\ |$) [NC]
RewriteRule ^ /%1/%2/%3/%4? [R=301,L]
# To handle 3:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+index\.php\?([^&=]+)=([^&\ ]+)&([^&=]+)=([^&\ ]+)&([^&=]+)=([^&\ ]+)(\ |$) [NC]
RewriteRule ^ /%1/%2/%3/%4/%5/%6? [R=301,L]
Basically, you're adding another &([^&=]+)=([^&\ ]+) before the check for the end of the request, (\ |$), and adding another /%#/%# to the end of the target URI, where the #'s are appropriately incremented backreferences.
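To see what each of those patterns captures, here is a rough Python sketch that builds the same regex for 1, 2 and 3 pairs and tries them in order against a request line. It is only an approximation: Python's re module stands in for Apache's regex engine, and the names PAIR and clean_url are made up for this illustration.

```python
import re

# One name=value pair, mirroring ([^&=]+)=([^&\ ]+) from the rules above.
PAIR = r"([^&=]+)=([^& ]+)"


def clean_url(the_request, max_pairs=3):
    """Return the clean path for a request line, or None if nothing matches."""
    for count in range(1, max_pairs + 1):
        pattern = (
            r"^[A-Z]{3,}\s/+index\.php\?"
            + "&".join([PAIR] * count)
            + r"( |$)"
        )
        match = re.match(pattern, the_request, re.IGNORECASE)
        if match:
            # Drop the final group, which only captured the trailing space.
            return "/" + "/".join(match.groups()[:-1])
    return None


print(clean_url("GET /index.php?cmd=shop&cat=78 HTTP/1.1"))  # /cmd/shop/cat/78
```

Because each pattern insists on a space (or the end of the string) right after the last value, a request with two pairs cannot be swallowed by the one-pair pattern, which is why the rules can be listed in any order.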

Related

htaccess: Can match one slash, but not double slashes

I am unable to write a rule that matches double slashes.
In my .htaccess file:
#RULE 1:
RewriteCond %{REQUEST_URI} ^.*hi1.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
#RULE 2:
RewriteCond %{REQUEST_URI} ^.*hi2/.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
#RULE 3:
RewriteCond %{REQUEST_URI} ^.*hi3//.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
RESULTS:
https://www.example.com/hi1//
successfully redirects to google
https://www.example.com/hi2//
successfully redirects to google
https://www.example.com/hi3//
fails to redirect to google
Third url yields the following:
Sorry, this page doesn't exist.
Please check the URL or go back a page.
404 Error. Page Not Found.
EDIT # 1:
Interestingly:
#RULE 4:
RewriteCond %{REQUEST_URI} ^.*hi4/.*/.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
RESULTS:
https://www.example.com/hi4/abc/
successfully redirects to google
https://www.example.com/hi4//
fails to redirect to google
EDIT # 2:
My original post seems to have created confusion. I will try to be clearer: I need a rule that will match a url ending in double slash, and will not match a url that does not end in double slash. Currently, my .htaccess file contains only the following:
RewriteEngine on
RewriteRule yoyo https://www.cnn.com/ [R=301,L]
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Results:
https://www.example.com/about-us//
fails to redirect to google, and yields 404 error
(The first rule (yoyo) is only to ensure no caching.)
EDIT # 3:
I see that the confusion continues. So, my .htaccess file contains only:
RewriteEngine on
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Results:
https://www.example.com/about-us//
fails to redirect to google, and yields 404 error
This time, I think we can rule out caching, because I used the .htaccess on a website of mine that previously had no .htaccess file.
Simply, my efforts to match a url ending with double-slash are failing.
You don't need three rules: a single regex pattern can catch all of these similar URIs, and it also handles multiple occurrences of / at the end. Please try the following, and make sure you clear your browser cache after placing these rules in your .htaccess file.
RewriteEngine ON
RewriteCond %{REQUEST_URI} ^/hi[0-9]+/{2,}?$ [NC]
RewriteRule ^(.*)$ https://www.google.com/ [R=301,L]
EDIT:
OK now I get it. Only match paths ending with two slashes.
I updated the answer. The request URI inside THE_REQUEST is not at the end of the string; it is followed by a space and the protocol version, so matching //\s should work for you.
AmitVerma mentioned the correct answer in his comment, but it is being buried under other comments. For all the other people like me who did not know about the THE_REQUEST parameter (thank you Amit), here is a more complete answer.
The problem with the original rule is the use of the REQUEST_URI parameter. The value of this parameter will probably already have been cleaned by the webserver or other modules. Double slashes would have been removed.
The THE_REQUEST parameter contains the original unmodified request. Therefore the following will work as requested:
RewriteCond %{THE_REQUEST} //\s.*$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Regarding your updated question:
... I need a rule that will match a url ending in double slash
RewriteCond %{THE_REQUEST} //$
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
Aside: Your previous rules matched a URL containing a double slash anywhere in the URL-path (which would naturally catch a double slash at the end as well).
However, the above will not match a URL that ends with a double slash. In fact, it will never match anything because THE_REQUEST does not only contain the URL. THE_REQUEST server variable contains the first line of the HTTP request headers. For example, when you request https://example.com/about-us//, THE_REQUEST will contain a string of the form:
GET /about-us// HTTP/1.1
So, you can see from the above that a regex like //$ will never match. You will need to use a condition of the form:
RewriteCond %{THE_REQUEST} //\s
This matches two slashes followed by a space, which can only occur at the end of the URL. (It could also occur at the end of the query string, but cross that bridge when we come to it.)
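The difference between the two conditions is easy to check outside Apache. A small Python sketch, assuming THE_REQUEST has the "METHOD path PROTOCOL" form shown above (the helper names are invented for the example):

```python
import re


def matches_end_anchor(the_request):
    """Mimics the condition //$ from the question."""
    return re.search(r"//$", the_request) is not None


def matches_slash_space(the_request):
    """Mimics the condition //\\s suggested here."""
    return re.search(r"//\s", the_request) is not None


request_line = "GET /about-us// HTTP/1.1"
print(matches_end_anchor(request_line))   # False: the line ends in "HTTP/1.1"
print(matches_slash_space(request_line))  # True: "// " sits before the protocol
```

A double slash in the middle of the path (for example /a//b) is followed by a letter rather than a space, so //\s ignores it.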
However, since the other suggestions (eg. ^.*hi3//.*$) don't appear to have worked, then this is not going to work either.
You need to clear your browser cache before testing, and please test with 302 (temporary) redirects; otherwise, you can easily go round in circles chasing caching issues. You should also test this with the browser "Inspector" open on the "Network" tab, with the "Disable cache" option checked.
(UPDATE) Debugging...
This does not seem to be a question about regex, as the earlier answers/comments (and code snippets in the question itself) should already have produced the desired results. So "something else" would seem to be going on here.
To debug and see the value of THE_REQUEST, you can do something like the following (at the very top of your .htaccess file):
RewriteCond %{QUERY_STRING} !^the-request=
RewriteRule ^ /?the-request=%{THE_REQUEST} [R,L]
And then request /about-us//. You should then be redirected to a URL of the form:
/?the-request=GET%20/about-us//%20HTTP/1.1
(Where the %20 are naturally the URL encoded spaces.)
Please report back exactly what you are seeing.
Here's what finally worked to match double slashes (nothing else worked for me):
RewriteEngine on
RewriteCond %{THE_REQUEST} //
RewriteRule ^.*$ https://www.google.com/ [R=301,L]
(And, as I wrote, I was careful to prevent caching, so caching never was an issue.)
PLOT TWIST:
Even this solution, which is the only solution that works on one of my websites, does not work on the website I have been testing on for most of this discussion. In other words, there is not one single solution for matching double-slash on that server!

Editing .htaccess file to modify URL

I'm trying to modify my .htaccess file to modify my URL and have tried many methods but cannot achieve exactly what I want. For example I have this URL:
http://mywebsite.com/FOLDER/index.php?id=5
Now I want it to look like:
http://mywebsite.com/FOLDER/5
or
http://mywebsite.com/FOLDER/ID/5
My .htaccess contains the following code:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^index/([0-9]+)/([0-9a-zA-Z_-]+) index.php?id=$1 [NC]
I cannot figure out what's wrong. Thanks.
You can use:
RewriteEngine on
# external redirect from actual URL to pretty one
RewriteCond %{THE_REQUEST} \s/+FOLDER/index\.php\?id=(\d+) [NC]
RewriteRule ^ /FOLDER/%1? [R=301,L,NE]
# internal forward from pretty URL to actual one
RewriteRule ^FOLDER/(\d+)/?$ FOLDER/index.php?id=$1 [L,QSA,NC]
The first argument of RewriteRule is the pattern that the incoming URL is matched against, after the domain and any preceding directory path have been stripped. In your case the URL is http://mywebsite.com/FOLDER/5. Assuming that your .htaccess file is in your DocumentRoot, the regex will be matched against FOLDER/5.
You are currently trying to match FOLDER/5 with ^index/([0-9]+)/([0-9a-zA-Z_-]+), which is not going to work. A better regex would be ^(.*)/([0-9]+)$ or ^(.*)/ID/([0-9]+)$. You can then rewrite to $1/index.php?id=$2. I would recommend using the [L] flag to stop rewriting for this round to avoid common problems with multiple rules matching while you do not expect them to.
Besides this, make sure that your .htaccess files are being read (e.g. by checking that if you enter garbage, you get a 500 internal server error), that mod_rewrite is enabled, that you are allowed to override FileInfo. You also may need to turn AcceptPathInfo off.
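To check the patterns suggested above before touching the server, you can try them against the stripped path in Python. This is only a sketch: Python's re is standing in for Apache's regex engine, and the rewrite helper is a made-up name that mimics the $1/index.php?id=$2 substitution.

```python
import re

# The pattern from the question, which expects a path starting with "index/".
ORIGINAL = r"^index/([0-9]+)/([0-9a-zA-Z_-]+)"


def rewrite(pattern, path):
    """Return the rewritten target for path, or None if the pattern fails."""
    match = re.match(pattern, path)
    if match is None:
        return None
    return f"{match.group(1)}/index.php?id={match.group(2)}"


print(re.match(ORIGINAL, "FOLDER/5"))                   # None
print(rewrite(r"^(.*)/([0-9]+)$", "FOLDER/5"))          # FOLDER/index.php?id=5
print(rewrite(r"^(.*)/ID/([0-9]+)$", "FOLDER/ID/5"))    # FOLDER/index.php?id=5
```

Note that the plain ^(.*)/([0-9]+)$ pattern also matches FOLDER/ID/5, but then $1 is FOLDER/ID, so pick the pattern that fits the URL shape you actually want.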

Add parameter using htaccess on condition

This will be simple for those familiar with Apache rules.
Situation
Using Alipay for a payment platform, the return URL cannot feature any of your own URL parameters (be it GET or POST). However, I am using Joomla and specifically Akeeba subscriptions. This component expects a parameter in the URL in accordance with the payment platform in question.
I want to detect (through one of Alipay's URL parameters) when a return page is hit and add the extra parameter.
Example (domain and page redacted)
http://...?
currency=HKD&
total_fee=2.00&
out_trade_no=211&
trade_no=2014040100276615&
trade_status=TRADE_FINISHED
Desired outcome
http://...?
currency=HKD&
total_fee=2.00&
out_trade_no=211&
trade_no=2014040100276615&
trade_status=TRADE_FINISHED&
paymentmethod=alipay
The simple addition of a &paymentmethod=alipay
Problem
I can't seem to get Apache to pick up the rule; here are a couple of attempts so far. Please note, I definitely can use .htaccess and don't need to change RewriteBase.
-- Attempt 1 --
RewriteCond %{QUERY_STRING} out_trade_no=
RewriteRule ^out_trade_no paymentmethod=alipay&out_trade_no [R,L,QSA]
-- Attempt 2 --
RewriteCond %{QUERY_STRING} (^|&)out_trade_no=(&|$) [NC]
RewriteRule ^ %{REQUEST_URI}&paymentmethod=alipay [L,R=301,QSA]
Progress
Combining the two, I have made progress, but the rewrite now seems to keep appending "paymentmethod=alipay" over and over, which causes an error.
RewriteCond %{QUERY_STRING} out_trade_no=
RewriteCond %{QUERY_STRING} !paymentmethod=
RewriteRule ^ %{REQUEST_URI}&paymentmethod=alipay [R,L]
Now getting a redirect chain until it automatically stops at a redirect limit
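If you are just trying to match a query string from that URL with that RewriteCond, the easiest approach is to match the first parameter (currency).
Try this. It will send all the parameters you want. Because QSA appends the original query string after paymentmethod=alipay, the redirected query string no longer starts with currency=, so the rule does not fire a second time.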
RewriteCond %{QUERY_STRING} ^\bcurrency=
RewriteRule ^(.*)$ /$1?paymentmethod=alipay [R,QSA,L]
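Here is a rough Python model of why that rule should not loop. It assumes QSA puts the new parameter first and appends the original query string after it with "&"; the function name answer_rule is invented for the example.

```python
import re


def answer_rule(query):
    """Return the redirected query string, or None if the condition fails."""
    if re.search(r"^\bcurrency=", query) is None:
        return None
    # Substitution "?paymentmethod=alipay" with QSA: the new parameter comes
    # first and the original query string is appended after it.
    return "paymentmethod=alipay&" + query


first = answer_rule("currency=HKD&total_fee=2.00&out_trade_no=211")
print(first)               # paymentmethod=alipay&currency=HKD&total_fee=2.00&out_trade_no=211
print(answer_rule(first))  # None: the query no longer starts with currency=
```

The second call returning None is the point: after one redirect the condition stops matching, so there is no redirect chain.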

Htaccess Match Random 6 characters, with exceptions?

Alright, so I've been trying to wrap my head around (what I believe to be) a simple mod_rewrite case. Maybe it's not, but I'm hoping you can help me with that, Stack Overflow.
So what I want is this: there are several folders that need to be ignored (ie, "css", "js", "bootstrap", etc). If the url string doesn't match those, I want to check if it's a string of exactly six letters and numbers, and redirect that to one url. Otherwise, it gets redirected to another url.
This is what I have:
RewriteCond %{REQUEST_URI} !^/(index\.php|images|robots\.txt|bootstrap|phpmyadmin|css|js|font|recaptchalib.php|uploads)/
RewriteRule ^(a-z0-9+){6}$ /index.php/download/index/$1 [L]
RewriteRule ^(.*)$ /index.php/$1 [L]
If I take out the middle line, it works fine except I don't get the "match 6 random characters" functionality. With the middle line, I get a 500 error on every page.
Could someone help me out please?
You need to repeat the condition. A RewriteCond only applies to the immediately following RewriteRule. So your second rule doesn't exclude all those folders in the pattern that you have in your condition. Try:
RewriteCond %{REQUEST_URI} !^/(index\.php|images|robots\.txt|bootstrap|phpmyadmin|css|js|font|recaptchalib.php|uploads)/
RewriteRule ^([a-zA-Z0-9]{6})$ /index.php/download/index/$1 [L]
RewriteCond %{REQUEST_URI} !^/(index\.php|images|robots\.txt|bootstrap|phpmyadmin|css|js|font|recaptchalib.php|uploads)/
RewriteRule ^(.*)$ /index.php/$1 [L]
You should let CodeIgniter handle this. Keep your .htaccess for routing to the index.php front controller, and use route(s) to handle the URI and where it should go from there.
This is especially true because now ANY six-character URI is going to be defaulted. What if you have a controller like example.com/videos? It will always be assumed to be a download item, even if it's a real controller URI.
The "easiest" option (read: option that does not conflict with existing controllers/routes) is to use the 404_override route to check the URI and see if it's a valid download URL. Then you can run the appropriate code.
To explain a likely reason why your .htaccess code is not working: your regular expression for matching six alpha-numeric characters is wrong. Here's what you need:
^([a-zA-Z0-9]{6})$
This regex can be used as a CodeIgniter route, also, if you go that route (heh). Just remove the ^$ beginning/end characters, as CI puts them there for you.
As mentioned by Jon Lin, you also need to duplicate RewriteCond conditionals, as they are only good for one RewriteRule. After one, the conditionals reset.
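If you want to convince yourself about the regex, a quick Python check shows the difference (Python's re is used as a stand-in for Apache's engine here, and the sample strings are made up). Without square brackets, a-z0-9+ is not a character class: it is the literal text "a-z0-" followed by one or more 9s.

```python
import re

BROKEN = re.compile(r"^(a-z0-9+){6}$")
FIXED = re.compile(r"^([a-zA-Z0-9]{6})$")

print(BROKEN.match("abc123"))             # None
print(BROKEN.match("a-z0-9" * 6))         # matches the literal text repeated six times
print(FIXED.match("abc123").group(1))     # abc123
print(FIXED.match("css"))                 # None, too short
```

The fixed pattern still matches any six-character alphanumeric name, including a hypothetical controller called videos, which is the routing conflict described above.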

redirecting based on referer using mod_rewrite ignoring the GET parameters

I'm trying to restrict the access to a webpage with mod_rewrite, based on the referer.
The webpage's URL is
http://www.example.com/path/to/page.php
It is located on a Debian server in /var/www/path/to/page.php
I have a rewrite map allowedReferers containing a list of URLs
allowedReferers
http://www.example.com/test/test1.php:white
http://www.example.com/test/test2.php:white
I also have the following rewrite conditions/rules
RewriteCond %{HTTP_REFERER} ^(.*)$
RewriteCond ${allowedReferers:%1|black} ^black$ [NC]
RewriteRule /* http://www.someotherplace.com [R,L]
So far this works perfectly well.
http://www.example.com/test/test1.php
http://www.example.com/test/test2.php
Can access the website, while
http://www.example.com/test/test3.php
gets redirected to someotherplace.com.
My problem is that, in real life, my referers will contain GET parameters.
e.g.
http://www.example.com/test/test1.php?id=245
My idea was to rewrite the first condition to something like this
RewriteCond %{HTTP_REFERER} ^(.*)\?.*$
or this
RewriteCond %{HTTP_REFERER} ^(.*)\?id=[0-9]*$
I've tested both regexes in Firefox' RegexTester and they behave as I want them to.
Applied to the following input
http://www.example.com/test/test1.php?id=245
they return this for $1:
http://www.example.com/test/test1.php
I expected that %1 also contains the URL minus the GET parameters.
So that, leaving the rest of the rule unchanged:
RewriteCond ${allowedReferers:%1|black} ^black$ [NC]
RewriteRule /* http://www.someotherplace.com [R,L]
should result in the expected behavior:
http://www.example.com/test/test1.php?id=234
http://www.example.com/test/test2.php?id=222
can access the website, while
http://www.example.com/test/test3.php?id=256
(or http://www.athirdplace.com/ etc.)
will be redirected to someotherplace.com
Unfortunately it does not behave as expected at all.
Having applied the change to the first condition suddenly every referer has access to the website.
As I wanted to see what actually is inside of %1, I came up with the following idea:
RewriteCond %{HTTP_REFERER} ^(.*)\?id=[0-9]*$
RewriteCond ${allowedReferers:%1|black} ^black$ [NC]
RewriteRule /* %1 [R,L]
Assuming that referring to the page from
http://www.example.com/test/test2.php?id=234
would redirect me to
http://www.example.com/test/test2.php
Wrong assumption. It redirects me to
http://www.example.com/var/www/path/to/
which is, as I mentioned in the beginning, the address of the page whose access is to be restricted.
And of course provokes a 404, as /var/www/ is docroot.
Redirecting to %1 was just a desperate attempt to debug my problem, so I do not need a solution to achieve this. What I'm looking for is a way to solve my original redirection problem.
Referers like these
http://www.example.com/test/test1.php?id=234
http://www.example.com/test/test2.php?id=222
(no matter which id is passed)
go to
http://www.example.com/path/to/page.php
while everything else goes to
http://www.someotherplace.com
Finally I would also appreciate any ideas how to debug mod_rewrite, especially ways to peek into stuff like %{HTTP_REFERER}, %1, $1, and the likes.
Just found a way to (at least partially) debug mod_rewrite:
http://www.latenightpc.com/blog/archives/2007/09/05/a-couple-ways-to-debug-mod_rewrite
provides a very handy trick to output some of the values mod_rewrite is using.
I modified the example a little bit, to the following:
RewriteCond %{HTTP_REFERER} ^(.*)\?id=[0-9]*$
RewriteRule (.*) /path/to/mod_rewrite_debugger.php?referer=%{HTTP_REFERER}&p1=%1 [R=301,L,QSA]
mod_rewrite_debugger.php just contains
<?php echo "<pre>"; print_r($_GET); echo "</pre>";
The output is:
[referer] => http://vfh143.beuth-hochschule.de/tests/moodle3.php?id=3
[p1] => http://vfh143.beuth-hochschule.de/tests/moodle3.php
Which shows that my original assumption was right.
Unfortunately the debugger doesn't work anymore when the second condition is applied:
RewriteCond %{HTTP_REFERER} ^(.*)\?id=[0-9]*$
RewriteCond ${allowedReferers:%1|black} ^black$ [NC]
RewriteRule (.*) /path/to/mod_rewrite_debugger.php?referer=%{HTTP_REFERER}&p1=%1 [R=301,L,QSA]
produces the following output:
[referer] => http://vfh143.beuth-hochschule.de/tests/moodle3.php?id=3
[p1] =>
Turns out that the rules suddenly worked as originally intended. The Problem solved itself.
Maybe this helps someone.
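For anyone who wants to reason about the two conditions without a server, here is a small Python sketch of the logic as I understand it. The dictionary is a hypothetical in-memory stand-in for the allowedReferers map, and is_redirected is an invented name; a real RewriteMap lookup may differ in details.

```python
import re

# Hypothetical stand-in for the allowedReferers rewrite map.
ALLOWED_REFERERS = {
    "http://www.example.com/test/test1.php": "white",
    "http://www.example.com/test/test2.php": "white",
}

REFERER_PATTERN = re.compile(r"^(.*)\?id=[0-9]*$")


def is_redirected(referer):
    """True if both conditions hold, i.e. the visitor is sent elsewhere."""
    match = REFERER_PATTERN.match(referer)
    if match is None:
        # First condition fails, so the rule never fires.
        return False
    # Second condition: look up the captured URL, defaulting to "black".
    return ALLOWED_REFERERS.get(match.group(1), "black").lower() == "black"


print(is_redirected("http://www.example.com/test/test1.php?id=234"))  # False
print(is_redirected("http://www.example.com/test/test3.php?id=256"))  # True
print(is_redirected("http://www.athirdplace.com/"))                   # False
```

The last line is worth noting: a referer without ?id= fails the first condition, so under this model it is not redirected at all, which may be what made it look as if every referer had access during testing.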