I'm using Apache mod_rewrite to rewrite URLs, but I'm having problems with the plus sign.
With the following rule:
RewriteRule ^/(.+[^/])/?$ http://localhost:8080/app/home?tag=$1 [P,L]
Both:
http://localhost/1+2 and http://localhost/1%2B2
end up as
uri=http://localhost:8080/app/home, args=tag=1+2
So in both cases the application translates the plus sign into a space, and it can no longer differentiate between spaces and plus signs.
If I use the "B" flag, then in both cases the + signs are translated into %2B, and the application ends up with the same problem in reverse (spaces and plus signs both come through as plus signs).
Is there a way to get Apache to pass %2B through as a plus sign rather than a space?
I read something about mod_security, but I'm not using it, so I'm not sure whether some other security mechanism is causing this.
Any help would be greatly appreciated!
No, this isn't quite the same as the referenced question. The problem here is specifically plus signs, and the answer to "Apache: mod_rewrite: Spaces & Special Characters in URL not working" doesn't address that.
There's also an issue with slashes; for that, see http://httpd.apache.org/docs/current/mod/core.html#allowencodedslashes
(but you need access to the main Apache config for this; .htaccess won't do).
In fact, it is impossible to do this with a rewrite rule alone. Apache decodes the URL before putting it through mod_rewrite, but it doesn't treat plus signs in the path as spaces: http://example.com/a+b.html wouldn't deliver a file called "a b.html".
Plus signs in query strings are decoded as spaces by PHP when it builds the $_GET array (or by whatever the equivalent mechanism is in your language), because browsers use + for spaces when submitting forms. So Apache translates %2B to + before applying the rewrite and leaves a literal + alone, which means you can't tell the difference.
Of course, one could argue that + used as a space is simply invalid in such URLs and that only %20 should be used. However, if you don't control how the URLs are generated, you're bound to see them. Browsers won't generate them automatically, though.
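The ambiguity can be reproduced outside Apache. The Python sketch below is only an illustration, not Apache's actual code: it assumes Apache's path decoding behaves like urllib.parse.unquote (it decodes %2B to + but leaves a literal + alone), and that the application parses the query string the way PHP does, which urllib.parse.parse_qs approximates by turning + into a space. The function name app_sees is hypothetical.

```python
from urllib.parse import parse_qs, unquote

def app_sees(path: str) -> str:
    """Approximate what the application receives for the rule
    RewriteRule ^/(.+[^/])/?$ ...?tag=$1 when no B flag is used."""
    # Step 1 (assumed): Apache percent-decodes the path before rewriting.
    # %2B becomes '+', and a literal '+' is left untouched.
    decoded = unquote(path.lstrip("/"))
    # Step 2: the decoded text is pasted into the query string unchanged.
    query = "tag=" + decoded
    # Step 3: query-string parsing treats '+' as an encoded space.
    return parse_qs(query)["tag"][0]

for path in ("/1+2", "/1%2B2"):
    print(path, "->", repr(app_sees(path)))
```

Both inputs come out as '1 2', which is exactly the collision described in the question.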
The answer is DIY, and in many ways it is more predictable and simpler:
RewriteRule .* index.php [L]
Hence everything is routed to index.php and there's no attempt to construct a query string. If you want to exclude certain patterns, e.g. those containing slashes or those where an actual file exists, the obvious amendments apply, for example:
RewriteCond %{REQUEST_FILENAME} !-f
Then, in index.php:
$uri = substr($_SERVER['REQUEST_URI'], 1); // remove leading slash
$qmpos = strpos($uri, '?'); // is there a question mark, if so where
if ($qmpos !== FALSE) { $uri = substr($uri, 0, $qmpos); } // only the bit before q.m.
$decoded = urldecode($uri); // decode the path part (everything before the query string)
if (! empty($decoded)) { $_GET['args'] = $decoded; } // add result to $_GET
That takes the original request (minus the leading slash, and minus any additional query string), decodes it according to PHP's normal rules, and stores the result in $_GET['args'], so you can process it along with the rest of the $_GET query string parameters in the usual way. If your script lives deeper in a directory hierarchy, the details differ slightly, but the principle is the same.
I believe this should work for empty URLs (http://example.com/) or those which only have a query string (http://example.com/?foo=1), as well as the simple case (http://example.com/bar) and the case with a query string as well (http://example.com/bar?foo=1). No doubt similar approaches will work for other languages.
In your particular case, you actually don't want the pluses decoded by PHP at all. That's fine: use rawurldecode() instead of urldecode(), since it leaves + unchanged.
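For comparison, here is a rough Python port of the index.php logic above, intended only to check the decoding rules; the function name args_from_request_uri and its raw switch are hypothetical. urllib.parse.unquote behaves like PHP's rawurldecode (a + stays a +), and unquote_plus behaves like urldecode (a + becomes a space).

```python
from typing import Optional
from urllib.parse import unquote, unquote_plus

def args_from_request_uri(request_uri: str, raw: bool = True) -> Optional[str]:
    """Mirror the index.php snippet: take REQUEST_URI, drop the leading
    slash and any query string, then percent-decode what is left."""
    uri = request_uri[1:]              # remove leading slash
    uri = uri.split("?", 1)[0]         # keep only the part before '?'
    # raw=True  ~ rawurldecode: '+' stays '+', '%20' becomes ' '
    # raw=False ~ urldecode:    '+' also becomes ' '
    decoded = unquote(uri) if raw else unquote_plus(uri)
    return decoded or None             # None when there is no path part

for uri in ("/1+2", "/1%2B2", "/1%202", "/bar?foo=1", "/?foo=1", "/"):
    print(repr(uri), "->", repr(args_from_request_uri(uri)))
```

With raw=True a space can only arrive as %20, while a plus sign arrives as + or %2B, so the two stay distinguishable; with raw=False, /1+2 collapses to '1 2' again.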
Related
I need help rewriting a URL: I need to concatenate all the query string values into a single parameter, joined with + signs, with each = sign replaced by :.
So a link like this
http://domain.com/mypage?p1=v1&p2=v2&p3=v3
would become
http://domain.com/mypage?params=p1:v1+p2:v2+p3:v3
(Edit: I changed the page name, removing .php.)
First of all, this problem is not best solved with mod_rewrite. Ideally you would modify the receiving end (which you can't) or the sending end (which you might be able to) so that the proper data is sent in the first place. If you happen to know every parameter name that can possibly appear in the URL, you can use the approach from this question instead.
You can match the query string with the %{QUERY_STRING} variable in a RewriteCond, and then internally rewrite to a URL with that query string.
RewriteEngine on
RewriteCond %{QUERY_STRING} !params=
RewriteRule ^sillyframework/sillypage$ sillyframework/sillypage?%{QUERY_STRING}&params=
RewriteCond %{QUERY_STRING} ^([^&=]+)=([^&]+)&params=([^&=]*)$
RewriteRule ^sillyframework/sillypage$ sillyframework/sillypage?params=%3+%1:%2 [L]
RewriteCond %{QUERY_STRING} ^([^&=]+)=([^&]+)&(.*)&params=([^&=]*)$
RewriteRule ^sillyframework/sillypage$ sillyframework/sillypage?%3&params=%4+%1:%2 [L]
There is one problem with this approach: it requires several internal recursions to translate all the parameters in the query string. Why is this bad? Every recursion costs time, and to protect the server the default number of recursions allowed is 10, after which you'll get an internal server error. It is possible to get around this limit, either by raising it with LimitInternalRecursion or by using the [N] flag. The [N] flag is a very bad idea, and so is a badly chosen recursion limit. Apache will not stop an incorrect rule that uses the N flag, so a specially crafted request against such a rule can let mod_rewrite DoS your own server (Apache 2.4.8 and later do cap N, but only at a very high number of iterations). A badly chosen (high) value for LimitInternalRecursion does basically the same: instead of showing an internal server error, mod_rewrite will bring down the server before it reaches the limit. The difference is that with a high LimitInternalRecursion and the L flag, a looping rule will theoretically hit the limit eventually, whereas with N it may never do so.
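To see why the number of passes grows with the number of parameters, the following Python sketch replays the three rules above on a query string, one pass at a time. It is a simplified model, not how mod_rewrite is implemented: it treats every pass that changes the query as one recursion, ignores per-directory details, and its limit argument only loosely mirrors LimitInternalRecursion. The names simulate, RULE2 and RULE3 are hypothetical.

```python
import re

# Python equivalents of the two RewriteCond patterns (hypothetical names).
RULE2 = re.compile(r"^([^&=]+)=([^&]+)&params=([^&=]*)$")
RULE3 = re.compile(r"^([^&=]+)=([^&]+)&(.*)&params=([^&=]*)$")

def simulate(query: str, limit: int = 10):
    """Replay the rule set until nothing matches; return (query, passes).
    Raises RuntimeError once the pass count exceeds `limit`."""
    passes = 0
    while True:
        passes += 1
        if passes > limit:
            raise RuntimeError(f"gave up after {limit} passes")
        # Rule 1: append an empty params= if it is not there yet.
        if "params=" not in query:
            query += "&params="
        m = RULE2.match(query)
        if m:
            # Rule 2: last remaining pair, move it into params.
            new = f"params={m[3]}+{m[1]}:{m[2]}"
        else:
            m = RULE3.match(query)
            # Rule 3: move the first pair into params, keep the rest.
            new = f"{m[3]}&params={m[4]}+{m[1]}:{m[2]}" if m else None
        if new is None:
            return query, passes       # no rule fired: rewriting stops
        query = new

print(simulate("p1=v1&p2=v2&p3=v3"))
```

In this model three parameters take four passes, one per parameter plus a final pass where nothing matches, so a query with around ten parameters runs into the default limit. It also shows that the rules as written leave a leading + right after params=, which the receiving end would decode as a leading space.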
As the query string will not have a fixed number of parameters or fixed parameter names, I decided to write a CGI script to do the transformation and then redirect to the modified URL.
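A minimal sketch of the transformation such a CGI script might perform, written in Python purely for illustration (the original script isn't shown, and the function name to_params is hypothetical). It works on the raw query string, so percent-escapes pass through untouched; a value that itself contains : or + would make the result ambiguous.

```python
def to_params(query: str) -> str:
    """Turn 'p1=v1&p2=v2' into 'params=p1:v1+p2:v2' without decoding anything."""
    # Skip empty pieces, such as those produced by '&&' or a trailing '&'.
    pairs = [pair for pair in query.split("&") if pair]
    # Replace only the first '=' in each pair, so an '=' inside a value survives.
    return "params=" + "+".join(pair.replace("=", ":", 1) for pair in pairs)

# A CGI script would read the QUERY_STRING environment variable, then send a
# redirect whose Location header points at "/mypage?" + to_params(...).
print(to_params("p1=v1&p2=v2&p3=v3"))
```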
Thanks a lot for helping.
Is it possible to re-write parameters that are not always in the same order in a URL?
For example, we might have a URL like
/products/type/animal/id/123456
But it could also appear as:
/products/id/ab123456/type/animal
Using a mod_rewrite rule like
RewriteRule ^/products(?:\.html?)?/type/([^/]+)/id/([^/]+)/?$ products.html?id=$2&type=$1 [L,NC]
works fine for the first example, but of course fails for the second. Is there any way around this?
EDIT: There are multiple key/value pairs (perhaps 7 or 8), so it would not be possible to use a universal /([^/]+)/?/([^/]+)/ type regex.
Just write multiple rules that match each of the possible source orderings!
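Writing those rules by hand gets tedious quickly, so one option is to generate them. The Python sketch below assumes every key is always present exactly once, and it reuses the /products prefix and products.html target from the question; rules_for is a hypothetical helper name. The leading slash in the pattern suits server config; in .htaccess you would drop it. Keep in mind that the number of orderings is factorial (5,040 rules for 7 keys, 40,320 for 8) and that mod_rewrite backreferences only go up to $9, so this stays practical only for a handful of keys.

```python
from itertools import permutations

def rules_for(keys, prefix="/products", target="products.html"):
    """Emit one RewriteRule per ordering of the given keys."""
    rules = []
    for order in permutations(keys):
        # e.g. ^/products/type/([^/]+)/id/([^/]+)/?$
        pattern = "^" + prefix + "".join(f"/{k}/([^/]+)" for k in order) + "/?$"
        # Map each key to the backreference number of its position in this ordering.
        query = "&".join(f"{k}=${order.index(k) + 1}" for k in keys)
        rules.append(f"RewriteRule {pattern} {target}?{query} [L,NC]")
    return rules

for rule in rules_for(["id", "type"]):
    print(rule)
```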
I have a problem with Apache mod_rewrite: I need to generate a search-engine-friendly (SEF) URL with flexible parameters.
Example:
www.myname.com/category.php?p1=itemname&p2=categoryname&p3=color&p4=size
or
www.myname.com/category.php?p1=itemname&p3=color
or
www.myname.com/category.php?p3=color&p4=size
The combinations are always different.
How can I do it dynamically?
I started with:
RewriteRule ^search/([^/]+)-([^/]+)-([^/]+)-([^/]+)$ category.php?p1=$1&p2=$2&p3=$3&p4=$4
Thank You!!
It's not possible to build a regexp that matches in the flexible way you want.
I see some alternatives:
You could assign a position to each parameter in the URL, something like:
http://www.myname.com/param1-param2-param3-param4
But if one of the parameters is absent, its separator still has to appear in the URL:
http://www.myname.com/--color-size
In my opinion, this makes the URL UGLY.
Alternatively, you could consider URL path parameters; take a look at "What every developer should know about URL encoding".
With this alternative, the URL could look something like:
http://www.myname.com/item;name=xxx;category=yyy;color=zzz
I don't know how search engines treat these URLs, but I imagine they count as SEF.
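Either scheme pushes the parsing into the receiving script. The Python sketch below illustrates both under stated assumptions: parse_positional and parse_path_params are hypothetical helper names; the positional version assumes exactly four dash-separated slots named p1 to p4 as in the question (so a value containing its own - would break it); and the path-parameter version assumes unique names and no unescaped ; or = inside values.

```python
from urllib.parse import unquote

def parse_positional(path: str, names=("p1", "p2", "p3", "p4")) -> dict:
    """Map '/itemname-categoryname-color-size' onto named parameters.
    Empty slots (e.g. '/--color-size') are treated as missing."""
    parts = path.lstrip("/").split("-")
    if len(parts) != len(names):
        raise ValueError(f"expected {len(names)} slots, got {len(parts)}")
    return {name: value for name, value in zip(names, parts) if value}

def parse_path_params(segment: str):
    """Split 'item;name=xxx;color=zzz' into ('item', {'name': 'xxx', 'color': 'zzz'})."""
    head, *pieces = segment.split(";")
    params = {}
    for piece in pieces:
        if not piece:
            continue                      # tolerate ';;' or a trailing ';'
        name, _, value = piece.partition("=")
        params[unquote(name)] = unquote(value)   # percent-decode each part
    return unquote(head), params

print(parse_positional("/--color-size"))
print(parse_path_params("item;name=xxx;category=yyy;color=zzz"))
```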
I know there are a lot of questions like this, but I can't really find one that I can translate into what I need:
I'm trying to rewrite
www.domain.com/subfolder/index.php?p=test.php&condition=true&another=false
to
www.domain.com/subfolder/p/test/condition/true/another/false
where the PHP GET variables can be ints, strings or booleans. The p variable is special and will always be present.
I've tried
RewriteEngine On
RewriteRule ^([^/]*)$ /subfolder/index.php?p=$1 [L]
as a partial solution, but I can't get it to work (I just get 404s).
I'm a mod_rewrite noob, so any help would be much appreciated.
This has more to do with regular expressions. If memory serves, you need something more like this:
RewriteRule ^/subfolder/(.*)/(.*)/$ /subfolder/index.php?p=$1&condition=$2 [L]
Each numbered backreference corresponds to a capture group in the pattern: $1 is the first (.*), $2 the second, and so on.
I don't have time to test this, but the idea is more or less correct.
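An alternative to a fixed regex, in the spirit of the index.php approach earlier on this page, is to rewrite everything under /subfolder/ to index.php and split the path into key/value pairs in code. The Python sketch below shows only the splitting step: parse_pairs is a hypothetical name, it assumes keys and values strictly alternate, it leaves every value as a string (converting to ints or booleans is up to the application), and it does not add the .php suffix that the original p=test.php expects.

```python
from urllib.parse import unquote

def parse_pairs(path: str, prefix: str = "/subfolder/") -> dict:
    """Turn '/subfolder/p/test/condition/true' into {'p': 'test', 'condition': 'true'}."""
    if not path.startswith(prefix):
        raise ValueError(f"path does not start with {prefix!r}")
    parts = [unquote(p) for p in path[len(prefix):].split("/") if p]
    if len(parts) % 2:
        raise ValueError("keys and values must come in pairs")
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(parts[0::2], parts[1::2]))

print(parse_pairs("/subfolder/p/test/condition/true/another/false"))
```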
I have a site that displays products. In the simplest case, the URL of the page for a particular product is:
site.com/products/manufacturer_model, so for example, if I were displaying a Dell Latitude D700 laptop, my URL would look like:
site.com/products/dell_latitude_d700
I have a number of products whose names contain characters that would need URL escaping, for example a Dell Latitude 12?34. Obviously I cannot include a raw '?' character in the URL. For SEO purposes, should I just drop that character? e.g.
site.com/products/dell_latitude_1234
Or should I escape it? e.g.
site.com/products/dell_latitude_12%3F34
Seems like escaping it would be the most logical approach - but do crawlers understand this?
Well, using "_" is not very user-friendly, so I think "-" is better (check the SEOmoz beginner's guide).
Also, check RFC 3986 to see which characters really need escaping. If you are using PHP, look at the urlencode() page on php.net. I wrote a function for this updated conversion a few months ago ;)
But to get back to your main question: do use escaping (where RFC 3986 requires it) when writing your URLs. It is the safe way to avoid getting stuck or penalized.
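To make the two options from the question concrete, here is a Python sketch that builds hyphenated slugs either by dropping punctuation or by percent-escaping it. slug_drop and slug_escape are hypothetical helper names. The escaping version relies on urllib.parse.quote, which leaves RFC 3986 unreserved characters (letters, digits, - . _ ~) alone. In PHP, rawurlencode() is the function documented as following RFC 3986.

```python
import re
from urllib.parse import quote

def slug_drop(name: str) -> str:
    """'Dell Latitude 12?34' -> 'dell-latitude-1234' (punctuation removed)."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", name.lower())
    return "-".join(cleaned.split())

def slug_escape(name: str) -> str:
    """'Dell Latitude 12?34' -> 'dell-latitude-12%3F34' (punctuation escaped)."""
    # safe="" so that even '/' is escaped inside a single path segment.
    return quote("-".join(name.lower().split()), safe="")

print(slug_drop("Dell Latitude 12?34"))
print(slug_escape("Dell Latitude 12?34"))
```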