.htaccess: Add optional language parameter to all pages - apache

I have a .htaccess file rewriting all URLs. For example:
# Special urls
RewriteRule ^(article)/([^/]*)(?:/[^/]*)?$ /index.php?page=$1&keyword=$2 [L,QSA,NC]
RewriteRule ^(privacy)$ /index.php?page=legal&type=$1 [L,QSA,NC]
RewriteRule ^(imprint)$ /index.php?page=legal&type=$1 [L,QSA,NC]
# All urls
RewriteRule ^([0-9a-zA-Z\s]+)$ /index.php?page=$1 [QSA]
Now I want to integrate a language parameter in my URL. For example
example.com/test ➡ /index.php?page=test
example.com/en/test ➡ /index.php?lang=en&page=test
How would I accomplish this without having to edit all RewriteRules? Is there a way to check if the part after example.com matches a regex, append a query parameter and handle all future rules normally?

Yes, this is possible. Add the following rule before your existing rules:
# Add optional language URL param if present in the first path segment
RewriteRule ^(\w\w)/(.*) $2?lang=$1 [QSA,DPI]
UPDATE: The DPI flag discards the original path-info that would otherwise be appended to the URL-path after the rewrite. This would result in the directives that immediately follow from failing to match, until the next round of processing. (See update below that goes into more detail.)
This assumes that the language code is always 2 characters.
The URL is rewritten to remove the language path segment from the URL-path and appends this as a lang URL parameter. The L flag is specifically omitted so the following rules are left to match the URL-path without the language code.
Since the following rules already have the QSA flag then the lang= URL parameter is appended.
Note, however, that the lang=en parameter is appended at the end of the query string, not prefixed to the beginning, as in your example.
UPDATE: It seems like not only the language gets appended, but also the page name. For example: example.com/de/index results in index?lang=de/index
Solved by adding DPI to [QSA] ➡ [QSA,DPI]
Hhhmmm, yes and no... given only the directives as stated in the question this should still "work" without the DPI flag (although not as efficiently). In fact, index?lang=de/index does not seem to be possible as a resulting URL (there's no page URL parameter)? It's possible that other directives are perhaps resulting in the path-info being appended to the query string? One thing of note is that the last rule stated in the question is missing the L flag, so any directives that follow are perhaps being unnecessarily processed (and even conflicting). However, the DPI flag is certainly an improvement here and should be added.
In detail...
Given a request of the form /de/index, and /de does not exist as a physical directory, then the /index part on the end of the requested URL-path is additional pathname information (path-info) and this is appended to the URL-path after the rewrite above (which is indeed undesirable). So, a request of the form /de/index is rewritten to index?lang=de, which becomes index/index?lang=de after the path-info is re-appended (note that it's not appended to the query string).
The resulting URL-path index/index (with appended path-info) fails to match the RewriteRule directives that follow. However, the rewrite engine then starts over, at which point the path-info that was (unnecessarily) added is then naturally discarded before the next round of processing. This results in the "correct" index?lang=de URL being used as the input for the second round of processing. This matches the last rule stated in the question and the request is finally rewritten to /index.php?page=index&lang=de.
So, the DPI flag shouldn't strictly be necessary here for it to "work". However, it is certainly recommended as it avoids the unnecessary 2nd pass through the rewrite engine. With the DPI flag, the path-info is not appended after the first rewrite, so the URL-path would match the appropriate rule on the first pass.

Related

Need .htaccess recipe to display rss feed dynamically

I currently use the following recipe to route .rss files to a script that produces a rss feed dynamically:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
It works perfectly for URLs like this:
www.example.com/articles.rss
What I would to like to do is change the URL to this:
www.example.com/rss/articles/
Everything I have tried doesn't work.
I just tried to put some slashes in the recipe but I'm not an expert in these recipes so they didn't work. Somethig like this didn't work: RewriteRule ^/rss/(.*)/$ /get-feed.pl?item=$1
("recipe" = regular expression / "regex" for short OR RewriteRule "pattern" from the Apache docs - At least I think that is what you are referring to? We are not baking a cake here! ;) )
That is very close, except that the URL-path that the RewriteRule pattern matches against does not start with a slash when used in a .htaccess (directory) context. So, it would need to be like this: ^rss/(.*)/$. If you had looked to see what your first rule was returning you would have seen that there was no slash prefix in the backreference that was captured (ie. the value of the item URL parameter).
However, there are other (minor) issues here...
The 2nd path segment cannot be empty, so it would be preferable to match something, rather than anything. eg. (.+) instead of (.*). However, this should be made more restrictive, so to match just a single path segement, instead of any URL-path (which is likely to fail anyway I suspect). eg. Presumably /rss/foo/bar/baz/ should not match?
Again, if you only want to match a string of the form articles then make the regex more restrictive so that it only matches letters (or perhaps letters + numbers + hyphens)?
You are missing the L (last) flag on this rule, which is a problem if you have other directives that follow.
So, if you are wanting to rewrite URLs of the form www.example.com/rss/articles/ (note the trailing slash) then try the following instead:
RewriteRule ^rss/([\w-]+)/$ /get-feed.pl?item=$1 [L]
Make sure the browser cache is cleared before testing.
And this would need to go near the top of the .htaccess file, before any existing rewrites.
Aside: A quick look at your original directive:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
This is not strictly correct, as it potentially matches too much. The unescaped dot before rss matches any character. And the .* subpattern matches 0 or more characters of anything - it must be something. So, this should really be something like:
RewriteRule ^([\w-]+)\.rss$ /get-feed.pl?item=$1 [L]

Replace ? with / in rewrite rules

I need to achieve this map given below :
https://www.domain1.com/open.html?cuSDyh&utm_medium=rep-email
to
https://www.domain2.com/c/cuSDyh?utm_medium=rep-email
I write a rewrite rule to achieve this which seems wrong as "?" is not replaced in the end result.
RewriteRule ^/open.html?(.*)$ https://www.domain2.com/c/$1 [NC,R=301,L]
Can anyone help me to find the right rewrite rule for this one?
RewriteRule ^/open.html?(.*)$ https://www.domain2.com/c/$1 [NC,R=301,L]
There are a couple of problems here that prevents the rule from working:
The RewriteRule pattern (1st argument) matches against the URL-path only, which notably excludes the query string. To match the query string you need an additional condition (RewriteCond directive) and match against the QUERY_STRING server variable.
The URL-path matched by the RewriteRule pattern does not start with a slash.
Note that the RewriteRule pattern is a regex. The unescaped ? in the above regex is a special meta character (a quantifier) that makes the preceding token optional. So, in your example, the l is optional. (Although you match everytihng that follows anyway, so the ? is entirely redundant.)
FROM: /open.html?cuSDyh&utm_medium=rep-email
TO: /c/cuSDyh?utm_medium=rep-email
I'm assuming the first URL-parameter (eg. cuSDyh in your example) consists of letters only (lower or uppercase) and is not a name=value pair (as per your example). There can be 0 or more URL parameters that follow the first URL parameter.
Try the following instead:
RewriteCond %{QUERY_STRING} ^([a-z]+)(?:&|$)(.*) [NC]
RewriteRule ^open\.html$ /c/%1?%2 [NE,R=302,L]
The %1 and %2 backreferences match the captured subgroups in the preceding CondPattern. ie. %1 matches the first URL parameter (eg. cuSDyh in your example) and %2 matches all subsequent URL params, excluding the delimiter.
The NE flag (not NC) is required on the RewriteRule directive to prevent the query string part of the URL being doubly URL-encoded in the redirect response. (The QUERY_STRING variable is already URL-encoded.)
If there are no additional URL parameters after the first then the above rule will essentially append an empty query string (denoted by the trailing ?). However, the trailing ? is removed automatically by Apache in the redirect response.
To avoid potential caching issues, always test first with a 302 (temporary) redirect and only change to a 301 (permanent) redirect - if that is the intention - once testing is complete.

Rewrite encoded URLs with RewriteRules

I was rewritting "domain.com/lolmeter/platformValue/usernameValue" (platformValue and usernameValue are values requested by the user with text inputs) with the following rewrite rule:
RewriteRule ^lolmeter/([a-zA-Z0-9]*)/([a-zA-Z0-9]*)$ /lolmeter.html?platform=$1&username=$2 [L]
button.href = lolmeter/platformValue/usernameValue
I noticed that when the user inputs a whitespace or another non alphanumeric value, it is encoded with "%" symbols automatically, so I tried to rewrite the rule to accept them, like:
RewriteRule ^lolmeter/(([a-zA-Z0-9]|%)*)/(([a-zA-Z0-9]|%)*)$ /lolmeter.html?platform=$1&username=$2 [L]
But it doesn't work, I assume because of the parentheses. Which symbol should I use then for an inner "|" ?
P.S: Is there a more popular or modern way for changing URLs?
RewriteRule ^lolmeter/(([a-zA-Z0-9]|%)*)/(([a-zA-Z0-9]|%)*)$ /lolmeter.html?platform=$1&username=$2 [L]
The RewriteRule pattern matches against the %-decoded URL-path. So, if an encoded space (ie. %20) is present in the URL-path of the request then the rule matches against a literal space, not %20.
You can use the \s shorthand character class inside the character class in your regex to match any whitespace character.
For example:
RewriteRule ^lolmeter/([a-zA-Z0-9\s]+)/([a-zA-Z0-9\s]*)$ /lolmeter.html?platform=$1&username=$2 [L]
Note that I made the quantifier on the second/middle path segment + instead of * since I assume the middle path segment is not optional. Note that multiple contiguous slashes in the URL-path are also reduced before the regex is matched so if the middle path segment was omitted then the passed username would be seen as the platform, which I'm sure is not the intention.
Note also that in the above the space is not re-encoded in the resulting rewrite. Use the B flag to re-encode the space as a + in the query string. (If you specifically needed the space to be re-encoded as %20 then use the BNP flag as well - requires Apache 2.4.26)
P.S: Is there a more popular or modern way for changing URLs?
Not sure exactly what you mean by this, but mod_rewrite on Apache is the URL rewriting module. Always has been and probably always will be.
However, you don't necessarily need to rewrite the request the way you have done, although you may still want to match the URL in a similar way (depending on what else you are doing). You could perhaps just rewrite the request to lolmeter.html and have your script parse the URL-path directly, rather than the query string.
Or, I suppose the "modern way" would be to rewrite everything to a "front-controller" - an entry script that parses the URL and "routes" the request appropriately. This avoids having to have a multitude of rewrites in .htaccess. Although this isn't anything "new", it has perhaps become more common. Many CMS/frameworks use this pattern.

How to mod_rewrite query string which includes path and parameters?

My website uses a rather complicated query string parameter: Its value is a path including parameters.
For SEO (Search Engine Optimization) etc. I'm now attempting to mod_rewrite shortened versions...
example.com/path/c1/d1/e1.html?x=x1&y=y1
example.com/path/c2/d2/e2.html?x=x2&y=y2
example.com/path/c2/d3/e4.html?x=x5&y=y6
...to the currently required...
example.com/path/?param=a/b/c1/d1/e1?x=x1&y=y1
example.com/path/?param=a/b/c2/d2/e2?x=x2&y=y2
example.com/path/?param=a/b/c2/d3/e4?x=x5&y=y6
So the goal is to...
get rid of the fixed part (?param=a/b/) to shorten the address and
don't have two ? in the visible address
preserve the query string value's necessary variable path components (like c1/d1/e1 or c2/d2/e2 or c2/d3/e4)
add .html to the final part before the query string value's ? to make the folder structure appear 1 level less deep
preserve the query string value's necessary variable parameters (like ?x=x1&y=y1 or ?x=x2&y=y2 or ?x=x5&y=y6)
After hours of research and attempting lots of things that did not work, I signed up here to request your advice on how to solve this mess. Would you please be so kind to assist?
Edit / additional infos:
After the fixed string /path/?param=a/b/ it is always 3 variable path segments like c1/d1/e1.
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
Same applies to the parameter values (x1, y1). Additionally, y1 can contain percent symbol % due to URL-encoding.
Using two question marks (one to start the query string and the other as part of the parameter value) looks invalid but works.
The actual file that handles the request is /path/index.php.
Try the following at the top of your .htaccess file, using mod_rewrite:
RewriteEngine on
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?(x=[^&]+&y=[^&]+)\s
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(x=[^&]+&y=[^&]+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1 [L]
The first rule redirects any direct requests for the "old" URL of the form /path/?param=a/b/c1/d1/e1?x=1&y=y1 (index.php is optional) to the "new" canonical URL of the form /path/c1/d1/e1.html?x=x1&y=y1. This is for the benefit of search engines and any third party inbound links that cannot be updated. You must, however, have already changed all your internal links to the "new" canonical URL.
By matching against THE_REQUEST (as opposed to the QUERY_STRING) we avoid a redirect loop by preventing the rewritten URL from being redirected. THE_REQUEST contains the first line of the request headers and is not changed by other rewrites. For example, THE_REQUEST would contain a string of the form:
GET /path/?param=a/b/c1/d1/e1?x=1&y=y1 HTTP/1.1
This is currently a 302 (temporary) redirect. Only change this to a 301 (permanent) redirect once you have tested that this works OK, in order to avoid potential caching issues.
The second rule internally rewrites requests for the "new" canonical URL, eg. /path/c1/d1/e1.html?x=x1&y=y1, back to the original/underlying URL-path, eg. /path/index.php?param=a/b/c1/d1/e1?x=1&y=y1. The & before the last URL parameter is intentional un-escaped (ie. URL decoded) as discussed in comments.
The $1 and $2 backreferences refer back to the captured groups in the RewriteRule pattern. Whereas the %1 and %2 backreferencs refer to the captured groups in the preceding CondPattern.
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
I've used a more general (and shorter) subpattern in the regex above which will match more characters, but is arguably easier to read. ie. [^/]+ - matches anything except a slash and [^&]+ - matches anything except a &.
If you specifically wanted to match only the allowed characters then you could change the above subpatterns to [a-zA-Z0-9()%-]+ or [\w()%-]+ which also matches underscores (_).
UPDATE: x and y are just examples for parameter names, but in reality there can be lots of different parameter names.
the parameters have more than a single character. They consist of letters a-z, A-Z and in the future maybe digits 0-9. There can be more than the two parameters x and y.
Maybe just match any query string (providing there is a query string).
Try the following instead:
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?([^\s]+)
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(.+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1

mod rewrite help

I am trying to use mod rewrite to remove and replace part of my url. I am looking to get my urls looking like this.
http://domain.com/e813c697e8dd8dc2bbfecb1d20b15783.html
instead of
http://domain.com/lookup.php?md5=e813c697e8dd8dc2bbfecb1d20b15783
lookup.php calls matches the md5 to the database and fetches and forwards you to the correct url.
All I need to do now is rewrite it so that it rewrites from this
http://domain.com/lookup.php?md5=e813c697e8dd8dc2bbfecb1d20b15783
to this
http://domain.com/e813c697e8dd8dc2bbfecb1d20b15783.html
I have tried this which works but it makes rewrites from any .html page at root level and makes it display nothing "blank".
RewriteRule ^([a-z0-9]+)\.html$ /lookup.php?md5=$1
Can anyone tell me a way to do this so that my regular html pages are not messed up and be able to display these links how I want to? Thanks.
Your current rule is a way too broad. You need to make it more specific to only match md5 hash value -- which is easy:
RewriteRule ^([a-f0-9]{32})\.html$ /lookup.php?md5=$1 [QSA,L]
Your pattern for file name is too broad -- it will match any file with letters and digits. md5 hash, on another hand, uses very limited subset of characters (a-f only) and digits .. and has to be 32 characters long. This pattern ([a-f0-9]{32}) does the job perfectly.
I have also added L and QSA flags (QSA to preserve any existing query string (like, tracking info, for example) and L to stop matching any other rules).
To further ensure that it does not match any real files which may have name in such format (who knows), add RewriteCond %{REQUEST_FILENAME} !-f line before the rule.
One thing you can do is quantify the number of hex digits:
RewriteRule ^([0-9a-f]{32})\.html$ /lookup.php?md5=$1
as md5 will always have 32 hex digits.
That depends on the naming scheme of your regular html pages. For starters though you could change it to: RewriteRule ^([a-f0-9]+)\.html$ /lookup.php?md5=$1
Which would make any html pages which have a letter not between a and f work.
If none of your HTML pages have numbers in their names, and if the cost of a page not redirecting outweighs the odds of an md5 hash having no numbers in it. You could check that there is at least one digit in the filename: RewriteRule ^([a-f]*\d[a-f\d]+)\.html$ /lookup.php?md5=$1
Lastely if it is acceptable for you to have urls like http://domain.com/md5/e813c697e8dd8dc2bbfecb1d20b15783.html instead you could change it to: RewriteRule ^md5/([a-f0-9]+)\.html$ /lookup.php?md5=$1 and not have to worry about the fragileness of the other methods.
I'm not sure, but I think you have a RewriteBase / somewhere in you .htaccess judging by your example mod_rewrite. If not you might need to add a / right after the ^ in whatever RewriteRule you choose to go with.