How to change URL argument from arg-1 to arg_1 before apache processes it? use .htaccess? - apache

I have a CMS that takes url arguments to return a list of results with this structure:
website.com/argument_1/argument_2.
In order for the site to return the results the args have to have underscores.
However, the code that is generated for the url structure is
website.com/argument-1/argument-2. I need to keep this url structure, but, when someone clicks the link, I need it passed to PHP via apache with the underscores.
I hope that makes sense. Is this done with .htaccess and rewrite rules? I have never written anything like that before, so any help is appreciated. Thanks

You should definitely do this in PHP. Here is a solution that will work, but requires one pass through RewriteRule for each dash in the URL, so I don't recommend it. But regardless:
RewriteEngine On
RewriteRule ^(.*)-(.*)$ $1_$2 [QSA]
To clarify what will happen here, if the request is for http://www.example.com/argument-1/argument-2/argument-3, the RewriteRule will be run multiple times because it can only replace a single dash per pass. So the URL will be transformed something like this:
Pass 1: http://www.example.com/argument-1/argument-2/argument_3
Pass 2: http://www.example.com/argument-1/argument_2/argument_3
Pass 3: http://www.example.com/argument_1/argument_2/argument_3
As for the $1 and $2, these refer back to the parenthesized components from the regular expression. The regular expression, ^(.*)-(.*)$, breaks down like this:
^ - Match the beginning of the URI
(.*) - Match (and capture) any number of characters, this will be $1 in the replacement
- - Match a dash
(.*) - Match (and capture) any number of characters, this will be $2 in the replacement
$ - Match the end of the URI
So the first pass through, $1 will be /argument-1/argument-2/argument and $2 will be 3. The replacement then puts an underscore between the 2 of them and creates a new URI:
/argument-1/argument-2/argument_3
Then it runs again because the regular expression still matches the new URI (because it has a - in it) and $1 is /argument-1/argument and $2 is 2/argument_3. The replacement then puts an underscore between the 2 of them and creates a new URI:
/argument-1/argument_2/argument_3
Then it runs again because the regular expression still matches the new URI (because it has a - in it) and $1 is /argument and $2 is 1/argument_2/argument_3. The replacement then puts an underscore between the 2 of them and creates a new URI:
/argument_1/argument_2/argument_3
Then Apache continues with this URI since the regular expression no longer matches (because there are no more dashes).
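The pass-by-pass behaviour described above can be checked outside Apache. The following is only a sketch in Python, assuming that Python's re module treats this particular pattern the same way as Apache's regex engine (greedy matching, so the last dash is replaced first). The function name rewrite_passes is hypothetical, and the final str.replace call stands in for the single-call replacement you would do in PHP instead.

```python
import re

# Same pattern as the RewriteRule: greedy (.*) on both sides of one dash.
PATTERN = re.compile(r"^(.*)-(.*)$")


def rewrite_passes(path):
    """Apply the dash-to-underscore substitution until it no longer matches.

    Returns the path as it looks after each pass (hypothetical helper,
    it only imitates the repeated RewriteRule processing).
    """
    passes = []
    while True:
        m = PATTERN.match(path)
        if m is None:
            return passes
        # Equivalent of the substitution string $1_$2
        path = f"{m.group(1)}_{m.group(2)}"
        passes.append(path)


passes = rewrite_passes("argument-1/argument-2/argument-3")
for number, result in enumerate(passes, start=1):
    print(f"Pass {number}: {result}")

# The one-shot alternative: a plain string replacement, no regex passes.
one_shot = "argument-1/argument-2/argument-3".replace("-", "_")
```

Three dashes need three passes here, whereas the plain string replacement reaches the same result in one call, which is the reason for recommending the application-level approach.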

Related

Redirect to new URL having looked up new "ID" in RewriteMap which appears in two places in the target URL

This one may have been answered but I can't seem to find one that fits my specifics. So apologies if it sounds old.
My rewrite consists of using a rewrite map and a redirect to another domain.
For example:
An old site has moved and new id's have been generated along with file name changes
So this URL:
https://www.example.com/gallery/951/med_U951I1424251158.SEQ.0.jpg
would need to be redirected to
https://www.example2.com/gallery/5710/med_U5710I1424251158.SEQ.0.jpg
The new value 5710 comes from a rewrite map lookup where I pass 951
There are two changes needed, and the 2nd change has two possible variations:
The first change is gallery/951 to gallery/5710
The 2nd change is the filename where the id can be delimited in either of 2 ways:
U951I - delimited between a U and an I
or
U951. - delimited between a U and a .
I started with something like this:
RewriteRule "^gallery/(\d+)/(\s+)$"
But that is as far as I can get.
You could do something like the following:
RewriteCond ${MapName:$1|XXX} (.+)
RewriteRule ^/?gallery/(\d+)/(.*U)\1([I.][^/]+)$ https://www.example2.com/gallery/%1/$2%1$3 [R=302,L]
I'm assuming the domain being redirected to resides on a different server, otherwise (if this is in .htaccess) you would need to check the requested hostname in an additional condition (RewriteCond directive).
Explanation:
The regex ^/?gallery/(\d+)/(.*U)\1([I.][^/]+)$ matches and captures the relevant parts of the source URL...
^/?gallery/ - Literal text (optional slash prefix if this is being used in a server context, as opposed to .htaccess)
(\d+) - Captures the "old" member ID (later available using the $1 backreference and used to lookup the "new" member ID from the RewriteMap)
/ - literal slash
(.*U) - Captures the first part of the filename before the "old" member ID. Later available in the $2 backreference.
\1 - An internal backreference matches the "old" member ID that occurred in the 2nd path segment - Not captured.
([I.][^/]+)$ - Captures the last part of the filename after the "old" member ID - Later available in the $3 backreference. The character class [I.] matches either an I or a literal . (the dot does not need to be escaped when used inside a character class). And [^/]+ matches everything thereafter to the end of the filename (URL-path).
The RewriteCond directive serves to lookup the new member ID (using the old member ID saved in the $1 backreference) in the rewrite map defined earlier in the server config. The result is then captured, which is later available in the %1 backreference. I set a default XXX so it would be easy to spot any lookups that fail.
Using the RewriteCond directive means we only lookup the rewrite map once. (Results are probably cached anyway, but it saves repetition at the very least.)
/%1/$2%1$3 - The substitution string is then constructed from the backreferences captured earlier:
%1 is the new member ID (captured in the preceding CondPattern)
$2 is the part of the filename before the member ID (captured by the RewriteRule pattern against the URL-path).
$3 is the part of the filename after the member ID.
Note that backreferences of the form $n refer to captured groups in the RewriteRule pattern and backreferences of the form %n refer to captured groups in the last matched CondPattern (RewriteCond directive).
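To see how the pattern and the map lookup fit together, here is a rough Python simulation. It is a sketch under assumptions: the dictionary ID_MAP stands in for the RewriteMap (its single entry is taken from the example in the question), redirect_target is a hypothetical helper, and Python's re module is assumed to treat this pattern like Apache's regex engine.

```python
import re

# Stand-in for the RewriteMap: old member ID -> new member ID (assumed data).
ID_MAP = {"951": "5710"}

# Same regex as the RewriteRule pattern, including the internal \1 backreference.
PATTERN = re.compile(r"^/?gallery/(\d+)/(.*U)\1([I.][^/]+)$")


def redirect_target(url_path):
    """Return the redirect URL for url_path, or None if the rule does not match."""
    m = PATTERN.match(url_path)
    if m is None:
        return None
    # Equivalent of ${MapName:$1|XXX}: look up the old ID, default to XXX.
    new_id = ID_MAP.get(m.group(1), "XXX")
    # Equivalent of the substitution /gallery/%1/$2%1$3
    return f"https://www.example2.com/gallery/{new_id}/{m.group(2)}{new_id}{m.group(3)}"


print(redirect_target("gallery/951/med_U951I1424251158.SEQ.0.jpg"))
print(redirect_target("gallery/951/med_U951.jpg"))
```

A filename whose embedded ID differs from the ID in the path (for example U952I under gallery/951) does not match at all, because \1 must repeat the first capture exactly.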

Need .htaccess recipe to display rss feed dynamically

I currently use the following recipe to route .rss files to a script that produces a rss feed dynamically:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
It works perfectly for URLs like this:
www.example.com/articles.rss
What I would like to do is change the URL to this:
www.example.com/rss/articles/
Everything I have tried doesn't work.
I just tried to put some slashes in the recipe but I'm not an expert in these recipes so they didn't work. Something like this didn't work: RewriteRule ^/rss/(.*)/$ /get-feed.pl?item=$1
("recipe" = regular expression / "regex" for short OR RewriteRule "pattern" from the Apache docs - At least I think that is what you are referring to? We are not baking a cake here! ;) )
That is very close, except that the URL-path that the RewriteRule pattern matches against does not start with a slash when used in a .htaccess (directory) context. So, it would need to be like this: ^rss/(.*)/$. If you had looked to see what your first rule was returning you would have seen that there was no slash prefix in the backreference that was captured (ie. the value of the item URL parameter).
However, there are other (minor) issues here...
The 2nd path segment cannot be empty, so it would be preferable to match something, rather than anything. eg. (.+) instead of (.*). However, this should be made more restrictive, so as to match just a single path segment, instead of any URL-path (which is likely to fail anyway I suspect). eg. Presumably /rss/foo/bar/baz/ should not match?
Again, if you only want to match a string of the form articles then make the regex more restrictive so that it only matches letters (or perhaps letters + numbers + hyphens)?
You are missing the L (last) flag on this rule, which is a problem if you have other directives that follow.
So, if you are wanting to rewrite URLs of the form www.example.com/rss/articles/ (note the trailing slash) then try the following instead:
RewriteRule ^rss/([\w-]+)/$ /get-feed.pl?item=$1 [L]
Make sure the browser cache is cleared before testing.
And this would need to go near the top of the .htaccess file, before any existing rewrites.
Aside: A quick look at your original directive:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
This is not strictly correct, as it potentially matches too much. The unescaped dot before rss matches any character. And the .* subpattern matches 0 or more characters of anything, whereas it must match something. So, this should really be something like:
RewriteRule ^([\w-]+)\.rss$ /get-feed.pl?item=$1 [L]
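The difference between the loose and the stricter patterns can be tried out with a few sample URL-paths. This is a sketch in Python rather than Apache itself: the helper item_for is hypothetical, and re.ASCII is used on the assumption that \w in Apache's regex engine covers only ASCII letters, digits and the underscore.

```python
import re

# Patterns copied from the rules discussed above.
LOOSE = re.compile(r"^(.*).rss$")                    # original rule
STRICT = re.compile(r"^([\w-]+)\.rss$", re.ASCII)    # escaped dot, non-empty name
NEW_STYLE = re.compile(r"^rss/([\w-]+)/$", re.ASCII)  # /rss/<name>/ form


def item_for(pattern, url_path):
    """Return what would be passed as the item parameter, or None if no match."""
    m = pattern.match(url_path)
    return m.group(1) if m else None


# In .htaccess the URL-path has no leading slash, so "rss/articles/" is tested.
for path in ("articles.rss", "articlesXrss", "rss/articles/", "/rss/articles/"):
    print(path, item_for(LOOSE, path), item_for(STRICT, path), item_for(NEW_STYLE, path))
```

The loose pattern accepts articlesXrss and even produces an empty item for a path such as Xrss, while the stricter patterns reject both.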

How to mod_rewrite query string which includes path and parameters?

My website uses a rather complicated query string parameter: Its value is a path including parameters.
For SEO (Search Engine Optimization) etc. I'm now attempting to mod_rewrite shortened versions...
example.com/path/c1/d1/e1.html?x=x1&y=y1
example.com/path/c2/d2/e2.html?x=x2&y=y2
example.com/path/c2/d3/e4.html?x=x5&y=y6
...to the currently required...
example.com/path/?param=a/b/c1/d1/e1?x=x1&y=y1
example.com/path/?param=a/b/c2/d2/e2?x=x2&y=y2
example.com/path/?param=a/b/c2/d3/e4?x=x5&y=y6
So the goal is to...
get rid of the fixed part (?param=a/b/) to shorten the address and
don't have two ? in the visible address
preserve the query string value's necessary variable path components (like c1/d1/e1 or c2/d2/e2 or c2/d3/e4)
add .html to the final part before the query string value's ? to make the folder structure appear 1 level less deep
preserve the query string value's necessary variable parameters (like ?x=x1&y=y1 or ?x=x2&y=y2 or ?x=x5&y=y6)
After hours of research and attempting lots of things that did not work, I signed up here to request your advice on how to solve this mess. Would you please be so kind to assist?
Edit / additional infos:
After the fixed string /path/?param=a/b/ it is always 3 variable path segments like c1/d1/e1.
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
Same applies to the parameter values (x1, y1). Additionally, y1 can contain percent symbol % due to URL-encoding.
Using two question marks (one to start the query string and the other as part of the parameter value) looks invalid but works.
The actual file that handles the request is /path/index.php.
Try the following at the top of your .htaccess file, using mod_rewrite:
RewriteEngine on
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=x1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?(x=[^&]+&y=[^&]+)\s
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(x=[^&]+&y=[^&]+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1 [L]
The first rule redirects any direct requests for the "old" URL of the form /path/?param=a/b/c1/d1/e1?x=x1&y=y1 (index.php is optional) to the "new" canonical URL of the form /path/c1/d1/e1.html?x=x1&y=y1. This is for the benefit of search engines and any third party inbound links that cannot be updated. You must, however, have already changed all your internal links to the "new" canonical URL.
By matching against THE_REQUEST (as opposed to the QUERY_STRING) we avoid a redirect loop by preventing the rewritten URL from being redirected. THE_REQUEST contains the first line of the request headers and is not changed by other rewrites. For example, THE_REQUEST would contain a string of the form:
GET /path/?param=a/b/c1/d1/e1?x=x1&y=y1 HTTP/1.1
This is currently a 302 (temporary) redirect. Only change this to a 301 (permanent) redirect once you have tested that this works OK, in order to avoid potential caching issues.
The second rule internally rewrites requests for the "new" canonical URL, eg. /path/c1/d1/e1.html?x=x1&y=y1, back to the original/underlying URL-path, eg. /path/index.php?param=a/b/c1/d1/e1?x=x1&y=y1. The & before the last URL parameter is intentionally unescaped (ie. URL decoded) as discussed in comments.
The $1 and $2 backreferences refer back to the captured groups in the RewriteRule pattern. Whereas the %1 and %2 backreferences refer to the captured groups in the preceding CondPattern.
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
I've used a more general (and shorter) subpattern in the regex above which will match more characters, but is arguably easier to read. ie. [^/]+ - matches anything except a slash and [^&]+ - matches anything except a &.
If you specifically wanted to match only the allowed characters then you could change the above subpatterns to [a-zA-Z0-9()%-]+ or [\w()%-]+ which also matches underscores (_).
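As a way of checking what each rule captures, the two rules can be imitated in Python. This is only a sketch: external_redirect and internal_rewrite are hypothetical names, the request line is the sample one from the explanation, and Python's re module is assumed to behave like Apache's regex engine for these patterns.

```python
import re

# CondPattern of the first rule, matched against THE_REQUEST.
REQUEST_RE = re.compile(
    r"^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/"
    r"([^/]+/[^/]+/[^/]+)\?(x=[^&]+&y=[^&]+)\s"
)
# RewriteRule pattern and CondPattern of the second rule.
PATH_RE = re.compile(r"^(path)/([^/]+/[^/]+/[^/]+)\.html$")
QUERY_RE = re.compile(r"^(x=[^&]+&y=[^&]+)$")


def external_redirect(request_line):
    """First rule: old URL -> new canonical URL, or None if no redirect applies."""
    m = REQUEST_RE.match(request_line)
    if m is None:
        return None
    # Equivalent of /$1/%1.html?%2 (where $1 is the literal "path")
    return f"/path/{m.group(1)}.html?{m.group(2)}"


def internal_rewrite(url_path, query_string):
    """Second rule: new canonical URL -> underlying index.php URL, or None."""
    path_match = PATH_RE.match(url_path)
    query_match = QUERY_RE.match(query_string)
    if path_match is None or query_match is None:
        return None
    # Equivalent of $1/index.php?param=a/b/$2?%1
    return f"{path_match.group(1)}/index.php?param=a/b/{path_match.group(2)}?{query_match.group(1)}"


print(external_redirect("GET /path/?param=a/b/c1/d1/e1?x=x1&y=y1 HTTP/1.1"))
print(internal_rewrite("path/c1/d1/e1.html", "x=x1&y=y1"))
```

A request line for the new URL form does not match the first pattern, so it is never redirected again.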
UPDATE: x and y are just examples for parameter names, but in reality there can be lots of different parameter names.
the parameters have more than a single character. They consist of letters a-z, A-Z and in the future maybe digits 0-9. There can be more than the two parameters x and y.
Maybe just match any query string (providing there is a query string).
Try the following instead:
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=x1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?([^\s]+)
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(.+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1 [L]
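The effect of relaxing the query string condition can be compared on a few sample query strings. This is a sketch in Python, not Apache: accepted is a hypothetical helper and the parameter names colour, size and page are made up for illustration.

```python
import re

# CondPattern from the first version (exactly x=...&y=...).
FIXED_QUERY = re.compile(r"^(x=[^&]+&y=[^&]+)$")
# CondPattern from the updated version (any non-empty query string).
ANY_QUERY = re.compile(r"^(.+)$")


def accepted(pattern, query_string):
    """True if the RewriteCond would succeed for this query string."""
    return pattern.match(query_string) is not None


for query in ("x=x1&y=y1", "x=x1&y=y1&z=3", "colour=red&size=10&page=2", ""):
    print(repr(query), accepted(FIXED_QUERY, query), accepted(ANY_QUERY, query))
```

The relaxed condition accepts any parameter names and any number of parameters, but still fails for an empty query string, so a request without a query string is not rewritten by this rule.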

How to rewrite URLs in htaccess that end with recurring characters

I have changed web platforms and have old URLs that I cannot and do not want to match on the new platform where the old content is now living.
I have an array of old product URLs that all have '-p-' in the URL, followed by a string of numbers and ending in .html (osCommerce platform URLs).
I would like to know how to rewrite:
/x/[rest-of-url]-p-[random numbers].html
to
/x/[rest-of-url]
I would like the end result to look something like this:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo-p-2214.html
redirects to:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo
Does anyone know if this is doable in the htaccess file as a rewrite rule?
My managed hosting service provider, BeepWeb, answered my question.
RewriteRule ^/shop/(.*)-p-(.*).html$ http://www.shop.com/product/$1/ [R=302]
The first argument is the URI that you are matching. The (.*) matches any characters. The second argument is the destination URL. The $1 corresponds to the first (.*). $2 would be the second (.*), and so on... The [R=302] tells the rewrite to be a 302 redirect (use [R=301] for a 301 redirect).
Using the (.*) is essentially like using a wildcard. You can instead narrow this down by specifying which characters you want to match as opposed to all characters (instead of using (.*) you could use ([abc]*) which would match only against a, b and c characters).
Also, be careful that you do not match other URLs unintentionally (i.e. you need to make sure that the pattern matches are unique to the URLs being rewritten).
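To illustrate the point about narrowing the pattern, the rule above can be compared with a tighter variant in Python. This is a sketch only: the NARROW pattern (lowercase letters, digits and hyphens for the name, digits for the product number, escaped dot) is an assumption about what the old URLs look like, not something the hosting provider supplied, and redirect_target is a hypothetical helper.

```python
import re

# Pattern from the rule above (wildcards, unescaped dot before html).
LOOSE = re.compile(r"^/shop/(.*)-p-(.*).html$")
# Assumed narrower variant: slug characters, numeric ID, literal dot.
NARROW = re.compile(r"^/shop/([a-z0-9-]+)-p-(\d+)\.html$")


def redirect_target(pattern, url_path):
    """Return the redirect URL the rule would produce, or None if no match."""
    m = pattern.match(url_path)
    if m is None:
        return None
    # Equivalent of the substitution http://www.shop.com/product/$1/
    return f"http://www.shop.com/product/{m.group(1)}/"


old_url = "/shop/versace-black-snakeskin-pony-hair-hobo-p-2214.html"
print(redirect_target(LOOSE, old_url))
print(redirect_target(NARROW, old_url))
print(redirect_target(LOOSE, "/shop/gift-p-cardsXhtml"))
print(redirect_target(NARROW, "/shop/gift-p-cardsXhtml"))
```

Both patterns handle the real product URL the same way, but only the loose one also matches the unrelated path in the last two lines.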
If you need the source reference, see the following:
https://httpd.apache.org/docs/current/rewrite/intro.html
Thanks again to http://www.beepweb.com for their detailed response.
Hope it helps others.

Apache rewrite backreference variable not accessible after first use

I have come across a situation that seems odd to me. It seems that backreference variables when building apache rewrite rules get lost after the first use.
My requirement is changing an old URL pattern to conform to a new path pattern, e.g:
www.example.com/documents/newsletter/newsletter-issue-50.htm
to become
www.example.com/sites/default/newsletter/50/English/newsletter-issue-50.htm
As you can see, the new URL pattern needs to have the issue number specified in 2 places.
My rewrite rule is as follows:
RewriteRule ^documents/newsletter/newsletter-issue-(.*).htm$ http://www.example.com/sites/default/newsletter/$1/English/newsletter-issue-$1.htm [R=301,L]
When I use this rule, I still get a 404 because the resulting URL fails to replace the second "$1" with the issue number, in this case "50". What I get is
http://www.example.com/sites/default/newsletter/50/English/newsletter-issue-.htm
I have used this test site and it confirms that the second backreference variable is not being evaluated at all. I'm sure I'm missing something here, since it should be a simple rule to put in place.
Any help on this would be greatly appreciated.
Thanks.
Strangely enough, it works in the rewrite tester if you surround the group with 2 sets of parentheses:
RewriteRule ^documents/newsletter/newsletter-issue-((.*))[.]htm$ http://www.example.com/sites/default/newsletter/$1/English/newsletter-issue-$1.htm [R=301,L]
I have also escaped the dot before the file extension.
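For comparison, reusing one captured group twice is ordinary regex substitution behaviour. The sketch below shows this in Python with a single set of parentheses; it says nothing about why the online tester behaved differently, and redirect_target is a hypothetical helper. \g<1> is Python's spelling of the $1 backreference.

```python
import re

# Pattern with a single capture group and the dot escaped via [.]
PATTERN = re.compile(r"^documents/newsletter/newsletter-issue-(.*)[.]htm$")

# The issue number is referenced twice in the target.
TEMPLATE = (
    r"http://www.example.com/sites/default/newsletter/"
    r"\g<1>/English/newsletter-issue-\g<1>.htm"
)


def redirect_target(url_path):
    """Return the redirect URL, or None if the pattern does not match."""
    m = PATTERN.match(url_path)
    return m.expand(TEMPLATE) if m else None


print(redirect_target("documents/newsletter/newsletter-issue-50.htm"))
```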