confusion between two rewrite urls rules - apache

These two rules conflict with each other:
RewriteRule ^health-institute-([a-zA-Z\-]+)-([a-zA-Z\-]+)$ search.php?city=$1&speciality=$2 [L]
RewriteRule ^health-institute-app-([a-zA-Z\-]+)$ search.php?city=$1 [L]
When I request health-institute-app-mycity (which should hit the 2nd rule), the server treats app as a value and rewrites to search.php?city=app&speciality=mycity (the 1st rule).
How can I tell Apache that these are two separate rules?

Yes, because the regex ^health-institute-([a-zA-Z\-]+)-([a-zA-Z\-]+)$ in the first rule also matches health-institute-app-mycity.
You need to reverse these two directives so the more specific rule is first.
For example:
RewriteRule ^health-institute-app-([a-zA-Z-]+)$ search.php?city=$1 [L]
RewriteRule ^health-institute-([a-zA-Z-]+)-([a-zA-Z-]+)$ search.php?city=$1&speciality=$2 [L]
(No need to backslash-escape the hyphen when at the start or end of the character class.)
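The effect of rule order can be sketched outside Apache. The following Python snippet is only an illustration, assuming that Python's re module treats these particular patterns the same way as Apache's regex engine; the rewrite helper and the two rule lists are hypothetical names, and the loop mimics "the first matching rule with [L] wins".

```python
import re

# The two rules from the question, in their original order.
# Substitutions use \1, \2 where the RewriteRule uses $1, $2.
RULES_ORIGINAL = [
    (r"^health-institute-([a-zA-Z-]+)-([a-zA-Z-]+)$",
     r"search.php?city=\1&speciality=\2"),
    (r"^health-institute-app-([a-zA-Z-]+)$",
     r"search.php?city=\1"),
]

# Same rules with the more specific one first.
RULES_REORDERED = list(reversed(RULES_ORIGINAL))


def rewrite(path, rules):
    """Return the substitution of the first rule whose pattern matches (like [L])."""
    for pattern, target in rules:
        match = re.match(pattern, path)
        if match:
            return match.expand(target)
    return None


print(rewrite("health-institute-app-mycity", RULES_ORIGINAL))
print(rewrite("health-institute-app-mycity", RULES_REORDERED))
```

With the original order the general rule claims the request first; with the reordered list the app rule is reached before it.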
HOWEVER, the regex in the (now) second rule is potentially ambiguous: the hyphen (-) delimits the two values (city and speciality), but it is also included in both character classes, so it can presumably be part of the values themselves. In practice, city and speciality cannot both contain hyphens, even though the regex appears to allow this.
For example, how should a request for health-institute-foo-bar-baz-qux be resolved? Since the quantifier + is greedy, this will currently result in search.php?city=foo-bar-baz&speciality=qux. If the speciality ever contains a hyphen (as the regex suggests it could), everything before its last hyphen will be captured as part of the city instead, so a hyphenated speciality will never be matched correctly.
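The greedy split described above can be checked with a short Python sketch. It assumes Python's re behaves like Apache's regex engine for this pattern; the function name and the lazy variant are hypothetical additions, shown only to illustrate that the regex alone cannot decide where the city ends.

```python
import re

# Pattern from the second rule: both groups allow hyphens.
GREEDY = re.compile(r"^health-institute-([a-zA-Z-]+)-([a-zA-Z-]+)$")

# Hypothetical variant with a lazy first group, for comparison only.
LAZY = re.compile(r"^health-institute-([a-zA-Z-]+?)-([a-zA-Z-]+)$")


def split_city_speciality(path, pattern=GREEDY):
    """Return (city, speciality) as captured by the pattern, or None."""
    match = pattern.match(path)
    return match.groups() if match else None


print(split_city_speciality("health-institute-foo-bar-baz-qux"))
print(split_city_speciality("health-institute-foo-bar-baz-qux", LAZY))
```

The greedy pattern gives everything up to the last hyphen to the city, while the lazy one stops at the first hyphen. Neither can express "both values may contain hyphens", which is why a different delimiter (or a fixed list of values) would be needed for that.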

Related

Rewrite encoded URLs with RewriteRules

I was rewriting "domain.com/lolmeter/platformValue/usernameValue" (platformValue and usernameValue are values requested by the user with text inputs) with the following rewrite rule:
RewriteRule ^lolmeter/([a-zA-Z0-9]*)/([a-zA-Z0-9]*)$ /lolmeter.html?platform=$1&username=$2 [L]
button.href = lolmeter/platformValue/usernameValue
I noticed that when the user inputs a whitespace or another non alphanumeric value, it is encoded with "%" symbols automatically, so I tried to rewrite the rule to accept them, like:
RewriteRule ^lolmeter/(([a-zA-Z0-9]|%)*)/(([a-zA-Z0-9]|%)*)$ /lolmeter.html?platform=$1&username=$2 [L]
But it doesn't work, I assume because of the parentheses. Which symbol should I use then for an inner "|" ?
P.S: Is there a more popular or modern way for changing URLs?
RewriteRule ^lolmeter/(([a-zA-Z0-9]|%)*)/(([a-zA-Z0-9]|%)*)$ /lolmeter.html?platform=$1&username=$2 [L]
The RewriteRule pattern matches against the %-decoded URL-path. So, if an encoded space (ie. %20) is present in the URL-path of the request then the rule matches against a literal space, not %20.
You can use the \s shorthand character class inside the character class in your regex to match any whitespace character.
For example:
RewriteRule ^lolmeter/([a-zA-Z0-9\s]+)/([a-zA-Z0-9\s]*)$ /lolmeter.html?platform=$1&username=$2 [L]
Note that I made the quantifier on the second/middle path segment + instead of * since I assume the middle path segment is not optional. Note that multiple contiguous slashes in the URL-path are also reduced before the regex is matched so if the middle path segment was omitted then the passed username would be seen as the platform, which I'm sure is not the intention.
Note also that in the above the space is not re-encoded in the resulting rewrite. Use the B flag to re-encode the space as a + in the query string. (If you specifically needed the space to be re-encoded as %20 then use the BNP flag as well - requires Apache 2.4.26)
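To make the decoding and re-encoding steps concrete, here is a rough Python simulation. It is a sketch under assumptions: unquote stands in for Apache decoding the URL-path before matching, quote_plus approximates what the B flag does to a space, and quote approximates B together with BNP. The simulate function and its parameters are hypothetical; the exact set of characters Apache escapes may differ.

```python
import re
from urllib.parse import quote, quote_plus, unquote

# Same pattern as the rule above; \s matches the decoded space.
PATTERN = re.compile(r"^lolmeter/([a-zA-Z0-9\s]+)/([a-zA-Z0-9\s]*)$")


def percent_encode(value):
    """Encode a space as %20 (roughly what B plus BNP would do)."""
    return quote(value, safe="")


def simulate(raw_path, space_as_plus=True):
    """Decode the path, match it, then re-encode the captured groups."""
    decoded = unquote(raw_path)          # the rule sees "my name", not "my%20name"
    match = PATTERN.match(decoded)
    if not match:
        return None
    encode = quote_plus if space_as_plus else percent_encode
    platform, username = (encode(group) for group in match.groups())
    return f"/lolmeter.html?platform={platform}&username={username}"


print(simulate("lolmeter/euw/my%20name"))
print(simulate("lolmeter/euw/my%20name", space_as_plus=False))
```

Note that the original pattern with (([a-zA-Z0-9]|%)*) fails on the same input because the decoded path contains a literal space and no % character.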
P.S: Is there a more popular or modern way for changing URLs?
Not sure exactly what you mean by this, but mod_rewrite on Apache is the URL rewriting module. Always has been and probably always will be.
However, you don't necessarily need to rewrite the request the way you have done, although you may still want to match the URL in a similar way (depending on what else you are doing). You could perhaps just rewrite the request to lolmeter.html and have your script parse the URL-path directly, rather than the query string.
Or, I suppose the "modern way" would be to rewrite everything to a "front-controller" - an entry script that parses the URL and "routes" the request appropriately. This avoids having to have a multitude of rewrites in .htaccess. Although this isn't anything "new", it has perhaps become more common. Many CMS/frameworks use this pattern.
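As a rough idea of what "parsing the URL-path directly" could look like, here is a minimal routing sketch in Python. Everything in it is hypothetical (the route function, the handler names, the returned tuple); a real front controller would live in whatever language the site uses and would dispatch to actual handlers.

```python
def route(url_path):
    """Map a URL-path to a (handler_name, params) pair without using a query string."""
    # Drop empty segments so "/lolmeter/euw/name/" and "lolmeter/euw/name" behave alike.
    segments = [segment for segment in url_path.split("/") if segment]

    if len(segments) == 3 and segments[0] == "lolmeter":
        return ("lolmeter", {"platform": segments[1], "username": segments[2]})

    # Anything else falls through to a not-found handler.
    return ("not_found", {})


print(route("/lolmeter/euw/my name"))
print(route("/something/else"))
```

The point of the design is that .htaccess only needs one catch-all rewrite to the entry script, and the URL structure is then defined in code.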

How to mod_rewrite query string which includes path and parameters?

My website uses a rather complicated query string parameter: Its value is a path including parameters.
For SEO (Search Engine Optimization) etc. I'm now attempting to mod_rewrite shortened versions...
example.com/path/c1/d1/e1.html?x=x1&y=y1
example.com/path/c2/d2/e2.html?x=x2&y=y2
example.com/path/c2/d3/e4.html?x=x5&y=y6
...to the currently required...
example.com/path/?param=a/b/c1/d1/e1?x=x1&y=y1
example.com/path/?param=a/b/c2/d2/e2?x=x2&y=y2
example.com/path/?param=a/b/c2/d3/e4?x=x5&y=y6
So the goal is to...
get rid of the fixed part (?param=a/b/) to shorten the address and
don't have two ? in the visible address
preserve the query string value's necessary variable path components (like c1/d1/e1 or c2/d2/e2 or c2/d3/e4)
add .html to the final part before the query string value's ? to make the folder structure appear 1 level less deep
preserve the query string value's necessary variable parameters (like ?x=x1&y=y1 or ?x=x2&y=y2 or ?x=x5&y=y6)
After hours of research and attempting lots of things that did not work, I signed up here to request your advice on how to solve this mess. Would you please be so kind to assist?
Edit / additional infos:
After the fixed string /path/?param=a/b/ it is always 3 variable path segments like c1/d1/e1.
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
Same applies to the parameter values (x1, y1). Additionally, y1 can contain percent symbol % due to URL-encoding.
Using two question marks (one to start the query string and the other as part of the parameter value) looks invalid but works.
The actual file that handles the request is /path/index.php.
Try the following at the top of your .htaccess file, using mod_rewrite:
RewriteEngine on
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=x1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?(x=[^&]+&y=[^&]+)\s
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(x=[^&]+&y=[^&]+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1 [L]
The first rule redirects any direct requests for the "old" URL of the form /path/?param=a/b/c1/d1/e1?x=x1&y=y1 (index.php is optional) to the "new" canonical URL of the form /path/c1/d1/e1.html?x=x1&y=y1. This is for the benefit of search engines and any third party inbound links that cannot be updated. You must, however, have already changed all your internal links to the "new" canonical URL.
By matching against THE_REQUEST (as opposed to the QUERY_STRING) we avoid a redirect loop by preventing the rewritten URL from being redirected. THE_REQUEST contains the first line of the request headers and is not changed by other rewrites. For example, THE_REQUEST would contain a string of the form:
GET /path/?param=a/b/c1/d1/e1?x=x1&y=y1 HTTP/1.1
This is currently a 302 (temporary) redirect. Only change this to a 301 (permanent) redirect once you have tested that this works OK, in order to avoid potential caching issues.
The second rule internally rewrites requests for the "new" canonical URL, eg. /path/c1/d1/e1.html?x=x1&y=y1, back to the original/underlying URL-path, eg. /path/index.php?param=a/b/c1/d1/e1?x=x1&y=y1. The & before the last URL parameter is intentionally un-escaped (ie. URL decoded) as discussed in comments.
The $1 and $2 backreferences refer back to the captured groups in the RewriteRule pattern, whereas the %1 and %2 backreferences refer to the captured groups in the preceding CondPattern.
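The interplay of the two rules and their backreferences can be traced with a small Python sketch. This is only a simulation, assuming Python's re matches these patterns the same way Apache does; the function names are hypothetical, match groups from the condition play the role of %1 and %2, and groups from the rule pattern play the role of $1 and $2.

```python
import re

# CondPattern of the redirect rule, matched against THE_REQUEST.
REDIRECT_COND = re.compile(
    r"^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/"
    r"([^/]+/[^/]+/[^/]+)\?(x=[^&]+&y=[^&]+)\s"
)
# Pattern and condition of the internal rewrite rule.
REWRITE_RULE = re.compile(r"^(path)/([^/]+/[^/]+/[^/]+)\.html$")
REWRITE_COND = re.compile(r"^(x=[^&]+&y=[^&]+)$")


def external_redirect(request_line):
    """Return the redirect target for an 'old' URL, or None if the condition fails."""
    cond = REDIRECT_COND.match(request_line)
    if not cond:
        return None
    return f"/path/{cond.group(1)}.html?{cond.group(2)}"   # /$1/%1.html?%2


def internal_rewrite(url_path, query_string):
    """Return the internal target for a 'new' URL, or None if rule or condition fails."""
    rule = REWRITE_RULE.match(url_path)
    cond = REWRITE_COND.match(query_string)
    if not (rule and cond):
        return None
    return f"{rule.group(1)}/index.php?param=a/b/{rule.group(2)}?{cond.group(1)}"


print(external_redirect("GET /path/?param=a/b/c1/d1/e1?x=x1&y=y1 HTTP/1.1"))
print(external_redirect("GET /path/c1/d1/e1.html?x=x1&y=y1 HTTP/1.1"))
print(internal_rewrite("path/c1/d1/e1.html", "x=x1&y=y1"))
```

The second call returns None because the request line of the "new" URL does not match the redirect condition, which is what prevents the redirect loop.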
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
I've used a more general (and shorter) subpattern in the regex above which will match more characters, but is arguably easier to read. ie. [^/]+ - matches anything except a slash and [^&]+ - matches anything except a &.
If you specifically wanted to match only the allowed characters then you could change the above subpatterns to [a-zA-Z0-9()%-]+ or [\w()%-]+ which also matches underscores (_).
UPDATE: x and y are just examples for parameter names, but in reality there can be lots of different parameter names.
The parameter names have more than a single character. They consist of letters a-z, A-Z and in the future maybe digits 0-9. There can be more than the two parameters x and y.
Maybe just match any query string (providing there is a query string).
Try the following instead:
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=x1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?([^\s]+)
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(.+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1 [L]
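A quick way to convince yourself that the generalised patterns accept arbitrary parameter names is to run them in Python. As before, this is a sketch that assumes Python's re agrees with Apache for these patterns; the function names and the lang, sort and page parameters are made up for the example.

```python
import re

# Generalised condition: capture whatever follows the second "?" up to the next space.
REDIRECT_COND = re.compile(
    r"^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?([^\s]+)"
)
REWRITE_RULE = re.compile(r"^(path)/([^/]+/[^/]+/[^/]+)\.html$")
REWRITE_COND = re.compile(r"^(.+)$")   # any non-empty query string


def redirect_target(request_line):
    """Old URL (first line of the request) to new canonical URL, or None."""
    cond = REDIRECT_COND.match(request_line)
    return f"/path/{cond.group(1)}.html?{cond.group(2)}" if cond else None


def rewrite_target(url_path, query_string):
    """New canonical URL to underlying index.php URL, or None."""
    rule = REWRITE_RULE.match(url_path)
    cond = REWRITE_COND.match(query_string)
    if not (rule and cond):
        return None
    return f"{rule.group(1)}/index.php?param=a/b/{rule.group(2)}?{cond.group(1)}"


print(redirect_target("GET /path/index.php?param=a/b/c2/d3/e4?lang=en&sort=asc&page=2 HTTP/1.1"))
print(rewrite_target("path/c2/d3/e4.html", "lang=en&sort=asc&page=2"))
```

A request for the .html URL without any query string is left alone by the rewrite, since ^(.+)$ requires at least one character.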

RewriteRule for Gmail Email Addresses

I'm trying to create a rewrite rule for unsubscribe URLs so that the url
https://example.com/unsubscribe/myemail@example.com/
will be re-written to
https://example.com/unsubscribe.php?email=myemail@example.com
In the past I've always used the following rule with no problems
ReWriteRule ^unsubscribe/(.*)/?$ /unsubscribe.php?email=$1 [NC,L]
However, when testing this recently, it seems to be replacing the "+" character (as is commonly used with Gmail tagging, for example "myemail+spam@example.com") with a space, creating an email address different to the one entered by the user. This is a problem. You can see an example here:
Example Rewrite Rule Processing
I don't really get why this is happening as the "(.*)" filter should allow any character any number of times, shouldn't it?
Any suggestions would be greatly appreciated.
You can use the mod-rewrite B flag in your rule:
ReWriteRule ^unsubscribe/(.*)/?$ /unsubscribe.php?email=$1 [NC,L,B]
From the apache mod-rewrite flag manual :
The [B] flag instructs RewriteRule to escape non-alphanumeric characters before applying the transformation.
In 2.4.26 and later, you can limit the escaping to specific characters in backreferences by listing them: [B=#?;]. Note: The space character can be used in the list of characters to escape, but it cannot be the last character in the list.
mod_rewrite has to unescape URLs before mapping them, so backreferences are unescaped at the time they are applied.
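Why the "+" turns into a space, and why escaping the backreference helps, can be shown with a short Python sketch. It rests on assumptions: parse_qs stands in for the way the receiving script decodes the query string (a "+" is read as a space), and quote approximates the escaping that the B flag applies to the captured value. The function name is hypothetical.

```python
from urllib.parse import parse_qs, quote


def email_seen_by_script(captured, escape_backref):
    """Build the rewritten query string and decode it the way a script would."""
    # With escaping (like the B flag), "+" becomes %2B before substitution.
    value = quote(captured, safe="") if escape_backref else captured
    query_string = f"email={value}"
    return parse_qs(query_string)["email"][0]


print(email_seen_by_script("myemail+spam@example.com", escape_backref=False))
print(email_seen_by_script("myemail+spam@example.com", escape_backref=True))
```

Without escaping, the literal "+" reaches the query string and is decoded as a space; with escaping, %2B is decoded back to "+".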

.htaccess rule that only accepts a specific value in a list

I want to accept this url structure, where $level can only be one of these values: a-, a+, b-, b+, ab-, ab+
domain.com/notes/a-
domain.com/notes/a+
domain.com/notes/b-
domain.com/notes/b+
domain.com/notes/ab-
domain.com/notes/ab+
I tried this approach, but I was unsuccessful.
RewriteRule ^notes/([a|o|b|ab]-+)$ /notes.php?level=$1 [L]
You're nearly there, but + is a special character so it has to be escaped, and alternation (pipe character) goes in parentheses. I removed 'o' that wasn't in your list. (?:) just says don't capture this.
RewriteRule ^notes/((?:a|b|ab)(?:-|\+))$ notes.php?level=$1 [L]
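The difference between the character class and the grouped alternation is easy to check in Python. This sketch assumes Python's re treats both patterns like Apache would; the level helper is a hypothetical name.

```python
import re

# Corrected pattern: alternation in a group, "+" escaped.
FIXED = re.compile(r"^notes/((?:a|b|ab)(?:-|\+))$")

# Pattern from the question: [a|o|b|ab] is a character class matching ONE of
# the characters a, |, o, b, and "-+" means "one or more hyphens".
ORIGINAL = re.compile(r"^notes/([a|o|b|ab]-+)$")


def level(path, pattern=FIXED):
    """Return the captured level, or None when the path is rejected."""
    match = pattern.match(path)
    return match.group(1) if match else None


for candidate in ["a-", "a+", "b-", "b+", "ab-", "ab+", "o-", "abc+"]:
    print(candidate, level("notes/" + candidate))

print(level("notes/a+", ORIGINAL), level("notes/ab-", ORIGINAL), level("notes/a--", ORIGINAL))
```

The original pattern rejects a+ and ab- but accepts a--, which is the opposite of what was intended.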

Multiple rewrite rule with different parameters in the same position

I would like to know if the following would be possible
RewriteRule ^([^/]*)/([^/]*)$ /search.php?type=$1&query=$2 [L]
RewriteRule ^([^/]*)/([^/]*)/([^/]*)$ /search.php?type=$1&query=$2&condition=$3 [L]
RewriteRule ^([^/]*)/([^/]*)/([^/]*)/([^/]*)$ /search.php?type=$1&query=$2&page=$3 [L]
As you can see, the second and third rules are similar, the only difference being the name of the third parameter. The second rule would be used for pages such as
/isbn/1203910293/new
whilst the third rule would be used for pages such as the following, where page is the page number
/title/harry-potter/2
I know this seems quite silly, considering I could just reuse the condition parameter, but it would make things clearer in the future if I used a parameter named page.
The third rule pattern ^([^/]*)/([^/]*)/([^/]*)/([^/]*)$ will not match
/title/harry-potter/2
because the rule requires four parts, e.g.
/title/harry-potter/2/xyz
or at least a trailing slash
/title/harry-potter/2/
Instead it will be matched by the second rule pattern, because it has three parts too, just like
/isbn/1203910293/new
If you want to match page numbers, you need a rule similar to the second rule but more specific, placed before the second rule so that it is tried first, e.g.
RewriteRule ^([^/]*)/([^/]*)/(\d+)$ /search.php?type=$1&query=$2&page=$3 [L]
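The matching behaviour described above can be reproduced with a short Python sketch. It assumes Python's re handles these patterns like Apache does and that the numeric rule is listed before the general three-part rule; the rewrite helper and the RULES list are hypothetical names.

```python
import re

# Rules in the order they would need to appear: the numeric "page" rule
# comes before the general "condition" rule.
RULES = [
    (r"^([^/]*)/([^/]*)$", r"/search.php?type=\1&query=\2"),
    (r"^([^/]*)/([^/]*)/(\d+)$", r"/search.php?type=\1&query=\2&page=\3"),
    (r"^([^/]*)/([^/]*)/([^/]*)$", r"/search.php?type=\1&query=\2&condition=\3"),
]

# The four-part pattern from the question, for comparison.
FOUR_PART = re.compile(r"^([^/]*)/([^/]*)/([^/]*)/([^/]*)$")


def rewrite(path):
    """Return the substitution of the first matching rule (like [L]), or None."""
    for pattern, target in RULES:
        match = re.match(pattern, path)
        if match:
            return match.expand(target)
    return None


print(rewrite("title/harry-potter/2"))
print(rewrite("isbn/1203910293/new"))
print(bool(FOUR_PART.match("title/harry-potter/2")))
print(bool(FOUR_PART.match("title/harry-potter/2/")))
```

One consequence of this approach is that a purely numeric third segment is always treated as a page number, so a condition value consisting only of digits could not be expressed with these rules.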