How to rewrite URLs in htaccess that end with recurring characters - apache

I have changed web platforms and have old URLs that I cannot and do not want to match on the new platform where the old content is now living.
I have an array of old product URLs that all have '-p-' in the URL, followed by a string of numbers and ending in .html (osCommerce platform URLs).
I would like to know how to rewrite:
/x/[rest-of-url]-p-[random numbers].html
to
/x/[rest-of-url]
I would like the end result to look something like this:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo-p-2214.html
redirects to:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo
Does anyone know if this is doable in the htaccess file as a rewrite rule?

My managed hosting provider, BeepWeb, answered my question.
RewriteRule ^/shop/(.*)-p-(.*).html$ http://www.shop.com/product/$1/ [R=302]
The first argument is the pattern that is matched against the requested URI. Each (.*) matches any sequence of characters. The second argument is the destination URL. $1 corresponds to the first (.*), $2 to the second (.*), and so on. The [R=302] flag makes the rule issue a 302 redirect (use [R=301] for a 301 redirect).
Using (.*) is essentially like using a wildcard. You can narrow this down by specifying which characters you want to match instead of all characters: for example, instead of (.*) you could use ([abc]*), which matches only the characters a, b and c.
Also, be careful that you do not match other URLs unintentionally (i.e. you need to make sure that the pattern matches are unique to the URLs being rewritten).
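One way to check that a narrower pattern only catches the intended URLs is to try it outside Apache first. The sketch below uses Python's re module as a stand-in for Apache's regex engine (the two should agree for simple patterns like this one). The stricter pattern (digits only after -p-, escaped dot, no leading slash as in a .htaccess context), the helper name redirect_target and the /shop/ destination are assumptions for illustration, not part of BeepWeb's answer.

```python
import re

# Hypothetical stricter variant of the rule's pattern:
# only digits are allowed after "-p-", and ".html" is matched literally.
PATTERN = re.compile(r"^shop/(.+)-p-\d+\.html$")

def redirect_target(url_path):
    """Return the redirect URL for an old product path, or None if it does not match."""
    match = PATTERN.match(url_path)
    if match is None:
        return None
    # Assumed destination: the same path without the "-p-<digits>.html" suffix.
    return "http://www.shop.com/shop/" + match.group(1)

print(redirect_target("shop/versace-black-snakeskin-pony-hair-hobo-p-2214.html"))
print(redirect_target("shop/about-us.html"))       # not an old product URL
print(redirect_target("shop/flip-p-flop.html"))    # "-p-" present, but no digits follow
```

Only the first path produces a redirect target; the other two are left alone, which is the kind of check the warning above is asking for.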
If you need the source reference, see the following:
https://httpd.apache.org/docs/current/rewrite/intro.html
Thanks again to http://www.beepweb.com for their detailed response.
Hope it helps others.

Related

htaccess rewrite rule containing certain word

I have a Magento web shop and I use a plugin to import stock, prices and products. Annoyingly, this plugin doesn't save old URLs if I update the product name etc.
Is there a way I can do this with htaccess? For example, I removed the SKU from the end of a product URL, but Google has indexed some of these old URLs.
Is it possible to rewrite https://www.example.com/xerox-everyday-toner-for-tn242y-yellow-toner-cartridge-006r04226 to https://www.example.com/xerox-everyday-toner-for-tn242y-yellow-toner-cartridge using some wildcards? Obviously everything before the word "cartridge" changes per product, so I want a redirect that, if a URL contains "-cartridge-", removes everything after that pattern, as SKU lengths can change but only contain alphanumeric characters. If a URL does not contain "-cartridge-", do nothing.
I've tried a few regex patterns using an online htaccess builder but I can't seem to get this correct (unless these sites don't process the regex and that's why I think they don't work).
RewriteRule (.+-cartridge)-.+$ $1 [R=301,L]
This should do the job. Everything up to -cartridge gets captured (capturing the dynamic part before and this static suffix in one go means we don't have to assemble the substitution URL out of multiple parts, but can just use $1), and after it a - plus some arbitrary characters must follow.
Is there any way you can add a rule so it excludes URLs where "multipack" comes after the "-cartridge-"?
Often the easiest way to do this is to place a "do nothing" rule before the one that does the rewriting. Then you can work with a positive match ("if URL ends in -cartridge-multipack, do nothing") instead of trying to find a negated pattern.
RewriteRule -cartridge-multipack$ - [L]
RewriteRule (.+-cartridge)-.+$ $1 [R=301,L]
Pattern anchored at the end with $ (means nothing is allowed to come after this), - for "no substitution", and the L flag to make the rewrite engine stop the current round of processing.
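To see how the two rules interact, here is a rough simulation in Python. It is only a sketch: Python's re stands in for Apache's regex engine, the function name resolve is made up, and the L flag is modelled simply as "return as soon as a rule matches".

```python
import re

# First rule: "do nothing" when the URL ends in -cartridge-multipack.
SKIP = re.compile(r"-cartridge-multipack$")
# Second rule: capture everything up to -cartridge, require "-" plus more characters.
STRIP = re.compile(r"(.+-cartridge)-.+$")

def resolve(url_path):
    """Return the redirect target ($1), or None when no redirect happens."""
    if SKIP.search(url_path):
        return None              # "-" substitution with [L]: stop, no redirect
    match = STRIP.search(url_path)
    return match.group(1) if match else None

print(resolve("xerox-everyday-toner-for-tn242y-yellow-toner-cartridge-006r04226"))
print(resolve("black-toner-cartridge-multipack"))
print(resolve("black-toner-cartridge"))
```

The first URL loses its SKU, while the multipack URL and the URL that already ends in -cartridge are left untouched.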

Need .htaccess recipe to display rss feed dynamically

I currently use the following recipe to route .rss files to a script that produces a rss feed dynamically:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
It works perfectly for URLs like this:
www.example.com/articles.rss
What I would like to do is change the URL to this:
www.example.com/rss/articles/
Everything I have tried doesn't work.
I just tried to put some slashes in the recipe, but I'm not an expert in these recipes so they didn't work. Something like this didn't work: RewriteRule ^/rss/(.*)/$ /get-feed.pl?item=$1
("recipe" = regular expression / "regex" for short OR RewriteRule "pattern" from the Apache docs - At least I think that is what you are referring to? We are not baking a cake here! ;) )
That is very close, except that the URL-path that the RewriteRule pattern matches against does not start with a slash when used in a .htaccess (directory) context. So, it would need to be like this: ^rss/(.*)/$. If you had looked to see what your first rule was returning you would have seen that there was no slash prefix in the backreference that was captured (ie. the value of the item URL parameter).
However, there are other (minor) issues here...
The 2nd path segment cannot be empty, so it would be preferable to match something, rather than anything, eg. (.+) instead of (.*). However, this should be made more restrictive still, so that it matches just a single path segment instead of any URL-path (which is likely to fail anyway, I suspect). eg. Presumably /rss/foo/bar/baz/ should not match?
Again, if you only want to match a string of the form articles then make the regex more restrictive so that it only matches letters (or perhaps letters + numbers + hyphens)?
You are missing the L (last) flag on this rule, which is a problem if you have other directives that follow.
So, if you are wanting to rewrite URLs of the form www.example.com/rss/articles/ (note the trailing slash) then try the following instead:
RewriteRule ^rss/([\w-]+)/$ /get-feed.pl?item=$1 [L]
Make sure the browser cache is cleared before testing.
And this would need to go near the top of the .htaccess file, before any existing rewrites.
Aside: A quick look at your original directive:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
This is not strictly correct, as it potentially matches too much. The unescaped dot before rss matches any character. And the .* subpattern matches 0 or more characters of anything, so it also matches nothing at all, whereas the item name must be something. So, this should really be something like:
RewriteRule ^([\w-]+)\.rss$ /get-feed.pl?item=$1 [L]
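The difference between the patterns discussed above can be checked quickly outside Apache. This is a minimal sketch using Python's re module in place of Apache's regex engine; the helper name item and the test paths are made up for illustration.

```python
import re

LOOSE = re.compile(r"^(.*).rss$")          # original pattern: unescaped dot, (.*)
STRICT = re.compile(r"^([\w-]+)\.rss$")    # escaped dot, non-empty restricted name
PRETTY = re.compile(r"^rss/([\w-]+)/$")    # /rss/<name>/ form, no leading slash

def item(pattern, url_path):
    """Return the captured item name ($1), or None if the pattern does not match."""
    match = pattern.match(url_path)
    return match.group(1) if match else None

print(item(LOOSE, "articlesXrss"))       # matches: the bare dot accepts "X"
print(item(STRICT, "articlesXrss"))      # no match
print(item(PRETTY, "rss/articles/"))     # captures "articles"
print(item(PRETTY, "/rss/articles/"))    # no match: leading slash
print(item(PRETTY, "rss/foo/bar/baz/"))  # no match: more than one segment
```

The loose pattern also accepts a bare ".rss" with an empty item name, which is the "must be something" problem mentioned above.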

Rewrite encoded URLs with RewriteRules

I was rewriting "domain.com/lolmeter/platformValue/usernameValue" (platformValue and usernameValue are values requested by the user with text inputs) with the following rewrite rule:
RewriteRule ^lolmeter/([a-zA-Z0-9]*)/([a-zA-Z0-9]*)$ /lolmeter.html?platform=$1&username=$2 [L]
button.href = lolmeter/platformValue/usernameValue
I noticed that when the user inputs a whitespace or another non alphanumeric value, it is encoded with "%" symbols automatically, so I tried to rewrite the rule to accept them, like:
RewriteRule ^lolmeter/(([a-zA-Z0-9]|%)*)/(([a-zA-Z0-9]|%)*)$ /lolmeter.html?platform=$1&username=$2 [L]
But it doesn't work, I assume because of the parentheses. Which symbol should I use then for an inner "|" ?
P.S: Is there a more popular or modern way for changing URLs?
RewriteRule ^lolmeter/(([a-zA-Z0-9]|%)*)/(([a-zA-Z0-9]|%)*)$ /lolmeter.html?platform=$1&username=$2 [L]
The RewriteRule pattern matches against the %-decoded URL-path. So, if an encoded space (ie. %20) is present in the URL-path of the request then the rule matches against a literal space, not %20.
You can use the \s shorthand character class inside the character class in your regex to match any whitespace character.
For example:
RewriteRule ^lolmeter/([a-zA-Z0-9\s]+)/([a-zA-Z0-9\s]*)$ /lolmeter.html?platform=$1&username=$2 [L]
Note that I made the quantifier on the second/middle path segment + instead of * since I assume the middle path segment is not optional. Note that multiple contiguous slashes in the URL-path are also reduced before the regex is matched so if the middle path segment was omitted then the passed username would be seen as the platform, which I'm sure is not the intention.
Note also that in the above the space is not re-encoded in the resulting rewrite. Use the B flag to re-encode the space as a + in the query string. (If you specifically needed the space to be re-encoded as %20 then use the BNP flag as well - requires Apache 2.4.26)
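To illustrate why the pattern needs to allow a literal space rather than %20, here is a small sketch. It assumes Python's urllib.parse.unquote is a fair approximation of the %-decoding Apache applies before matching, and the function name match_request is made up.

```python
import re
from urllib.parse import unquote

# Same pattern as in the rule above.
PATTERN = re.compile(r"^lolmeter/([a-zA-Z0-9\s]+)/([a-zA-Z0-9\s]*)$")

def match_request(raw_path):
    """Decode the requested path first, then match, as the RewriteRule does."""
    decoded = unquote(raw_path)          # "%20" becomes a literal space
    match = PATTERN.match(decoded)
    return match.groups() if match else None

# Matching the raw (still encoded) path fails: "%" is not in the character class.
print(PATTERN.match("lolmeter/euw/some%20name"))
# Matching the decoded path succeeds and captures the space.
print(match_request("lolmeter/euw/some%20name"))
```

The captured username contains a real space, which is why the note above about re-encoding it in the query string matters.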
P.S: Is there a more popular or modern way for changing URLs?
Not sure exactly what you mean by this, but mod_rewrite on Apache is the URL rewriting module. Always has been and probably always will be.
However, you don't necessarily need to rewrite the request the way you have done, although you may still want to match the URL in a similar way (depending on what else you are doing). You could perhaps just rewrite the request to lolmeter.html and have your script parse the URL-path directly, rather than the query string.
Or, I suppose the "modern way" would be to rewrite everything to a "front-controller" - an entry script that parses the URL and "routes" the request appropriately. This avoids having to have a multitude of rewrites in .htaccess. Although this isn't anything "new", it has perhaps become more common. Many CMS/frameworks use this pattern.

How to mod_rewrite query string which includes path and parameters?

My website uses a rather complicated query string parameter: Its value is a path including parameters.
For SEO (Search Engine Optimization) etc. I'm now attempting to mod_rewrite shortened versions...
example.com/path/c1/d1/e1.html?x=x1&y=y1
example.com/path/c2/d2/e2.html?x=x2&y=y2
example.com/path/c2/d3/e4.html?x=x5&y=y6
...to the currently required...
example.com/path/?param=a/b/c1/d1/e1?x=x1&y=y1
example.com/path/?param=a/b/c2/d2/e2?x=x2&y=y2
example.com/path/?param=a/b/c2/d3/e4?x=x5&y=y6
So the goal is to...
get rid of the fixed part (?param=a/b/) to shorten the address and
don't have two ? in the visible address
preserve the query string value's necessary variable path components (like c1/d1/e1 or c2/d2/e2 or c2/d3/e4)
add .html to the final part before the query string value's ? to make the folder structure appear 1 level less deep
preserve the query string value's necessary variable parameters (like ?x=x1&y=y1 or ?x=x2&y=y2 or ?x=x5&y=y6)
After hours of research and attempting lots of things that did not work, I signed up here to request your advice on how to solve this mess. Would you please be so kind to assist?
Edit / additional infos:
After the fixed string /path/?param=a/b/ it is always 3 variable path segments like c1/d1/e1.
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
Same applies to the parameter values (x1, y1). Additionally, y1 can contain percent symbol % due to URL-encoding.
Using two question marks (one to start the query string and the other as part of the parameter value) looks invalid but works.
The actual file that handles the request is /path/index.php.
Try the following at the top of your .htaccess file, using mod_rewrite:
RewriteEngine on
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?(x=[^&]+&y=[^&]+)\s
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(x=[^&]+&y=[^&]+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1 [L]
The first rule redirects any direct requests for the "old" URL of the form /path/?param=a/b/c1/d1/e1?x=x1&y=y1 (index.php is optional) to the "new" canonical URL of the form /path/c1/d1/e1.html?x=x1&y=y1. This is for the benefit of search engines and any third party inbound links that cannot be updated. You must, however, have already changed all your internal links to the "new" canonical URL.
By matching against THE_REQUEST (as opposed to the QUERY_STRING) we avoid a redirect loop by preventing the rewritten URL from being redirected. THE_REQUEST contains the first line of the request headers and is not changed by other rewrites. For example, THE_REQUEST would contain a string of the form:
GET /path/?param=a/b/c1/d1/e1?x=1&y=y1 HTTP/1.1
This is currently a 302 (temporary) redirect. Only change this to a 301 (permanent) redirect once you have tested that this works OK, in order to avoid potential caching issues.
The second rule internally rewrites requests for the "new" canonical URL, eg. /path/c1/d1/e1.html?x=x1&y=y1, back to the original/underlying URL-path, eg. /path/index.php?param=a/b/c1/d1/e1?x=x1&y=y1. The & before the last URL parameter is intentionally un-escaped (ie. URL decoded) as discussed in comments.
The $1 and $2 backreferences refer back to the captured groups in the RewriteRule pattern, whereas the %1 and %2 backreferences refer to the captured groups in the preceding CondPattern.
These variable segments can contain alphanumerical characters a-z A-Z 0-9, dash symbol - and bracket symbols ( and ).
I've used a more general (and shorter) subpattern in the regex above which will match more characters, but is arguably easier to read. ie. [^/]+ - matches anything except a slash and [^&]+ - matches anything except a &.
If you specifically wanted to match only the allowed characters then you could change the above subpatterns to [a-zA-Z0-9()%-]+ or [\w()%-]+ which also matches underscores (_).
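To see what the first condition captures, and why the rewritten URL is not redirected again, the CondPattern can be tried against sample request lines. This is only a sketch: Python's re stands in for Apache's regex engine, and the function name redirect_target and the sample request lines are illustrative.

```python
import re

# CondPattern from the first rule, matched against THE_REQUEST.
COND = re.compile(
    r"^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/"
    r"([^/]+/[^/]+/[^/]+)\?(x=[^&]+&y=[^&]+)\s"
)

def redirect_target(request_line):
    """Build /path/%1.html?%2 from the request line, or return None."""
    match = COND.search(request_line)
    if match is None:
        return None
    return f"/path/{match.group(1)}.html?{match.group(2)}"

# An "old" URL is redirected to the "new" canonical form.
print(redirect_target("GET /path/?param=a/b/c1/d1/e1?x=x1&y=y1 HTTP/1.1"))
# A request for the "new" URL does not match, so there is no redirect loop.
print(redirect_target("GET /path/c1/d1/e1.html?x=x1&y=y1 HTTP/1.1"))
```

Because the request line still holds the URL the client originally asked for, the internal rewrite performed by the second rule never feeds back into this condition.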
UPDATE: x and y are just examples for parameter names, but in reality there can be lots of different parameter names.
the parameters have more than a single character. They consist of letters a-z, A-Z and in the future maybe digits 0-9. There can be more than the two parameters x and y.
Maybe just match any query string (providing there is a query string).
Try the following instead:
# REDIRECT: /path/?param=a/b/c1/d1/e1?x=1&y=y1
RewriteCond %{THE_REQUEST} ^[A-Z]{3,7}\s/path/(?:index\.php)?\?param=a/b/([^/]+/[^/]+/[^/]+)\?([^\s]+)
RewriteRule ^(path)/(?:index\.php)?$ /$1/%1.html?%2 [R=302,L]
# REWRITE: /path/c1/d1/e1.html?x=x1&y=y1
RewriteCond %{QUERY_STRING} ^(.+)$
RewriteRule ^(path)/([^/]+/[^/]+/[^/]+)\.html$ $1/index.php?param=a/b/$2?%1 [L]
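A rough simulation of the second (internal rewrite) rule with the "any query string" condition, again with Python's re standing in for Apache's regex engine. The function name internal_target and the extra z parameter are made up for illustration.

```python
import re

RULE = re.compile(r"^(path)/([^/]+/[^/]+/[^/]+)\.html$")  # RewriteRule pattern
COND = re.compile(r"^(.+)$")                               # any non-empty query string

def internal_target(url_path, query_string):
    """Return $1/index.php?param=a/b/$2?%1, or None if rule or condition fails."""
    rule = RULE.match(url_path)
    cond = COND.match(query_string)
    if rule is None or cond is None:
        return None
    return f"{rule.group(1)}/index.php?param=a/b/{rule.group(2)}?{cond.group(1)}"

# Works with any number of parameters and any parameter names.
print(internal_target("path/c2/d3/e4.html", "x=x5&y=y6&z=z7"))
# No query string: the condition fails and nothing is rewritten.
print(internal_target("path/c1/d1/e1.html", ""))
```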

Redirect 301 from a Directory to a Single File

I'm having a bit of trouble figuring out something that should be simple. I want to 301 redirect everything in a directory to one single file in a new location.
In my .htaccess, I've already tried the following...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/index.html
and this...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/
and this...
Redirect 301 /myDir http://www.mydomain.com/myNewDir
The problem is that each of those simply maps each file within /myDir/ and appends it to the end of the destination URL.
After Googling, I saw something that said to do this...
Redirect 301 ^/myDir(.*) http://www.mydomain.com/myNewDir
But that just does the same thing... it's mapping the existing file location to the end of the URL.
It was easy finding lots of ".htaccess redirect" tutorials online but they seem to only show the obvious examples like 'one-to-one file mapping' or 'one-to-one directory mapping'. These tutorials also seem to neglect explaining the various relevant file directives and how to properly use them.
This particular hosting account is garbage and also has FrontPage extensions installed. Mod-rewrite fails (breaks the whole site) yet the Redirect 301 lines are operating fine. So until I can move this new (non-FrontPage) site to a more robust hosting account, I'll need to stick with the Redirect 301 one-liner.
How can I simply use a Redirect 301 to redirect everything within /myDir/ to the same single file located at /myNewDir/index.html? (I'd prefer using just /myNewDir/ if possible). Kindly explain, in detail, the file directives used in your solution.
UPDATE:
Previously accepted answer is not working.
Example:
RedirectMatch 301 /myDir1/(.*) http://mydomain.org/newpath/myDir1/index.html
...is giving a "Too many redirects occurred trying to open" error.
This is because /myDir1/(.*) is matching anyplace within the string so if the target URL contains /myDir1/ anywhere, not just the root, it will get redirected into a nasty loop.
See my own posted answer for correct solution.
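The loop can be reproduced with a small simulation. This is a sketch under the assumption that RedirectMatch behaves like an unanchored regex search on the URL-path; the function name count_redirects, the starting path /myDir1/old.html and the hop limit are made up for illustration.

```python
import re

# Path part of the redirect target from the example above.
TARGET_PATH = "/newpath/myDir1/index.html"

UNANCHORED = re.compile(r"/myDir1/(.*)")   # matches anywhere in the path
ANCHORED = re.compile(r"^/myDir1/(.*)")    # matches only at the start of the path

def count_redirects(pattern, path, limit=10):
    """Follow redirects until the pattern stops matching or the limit is hit."""
    hops = 0
    while hops < limit and pattern.search(path):
        path = TARGET_PATH
        hops += 1
    return hops

print(count_redirects(UNANCHORED, "/myDir1/old.html"))  # keeps redirecting until the limit
print(count_redirects(ANCHORED, "/myDir1/old.html"))    # stops after a single redirect
```

With the unanchored pattern the target path itself matches, so every redirect triggers another one; anchoring with ^ breaks the cycle after the first hop.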
I found the answer within one of my old projects.
Redirect 301 is all wrong for this. I really wanted RedirectMatch 301 instead.
RedirectMatch 301 ^/myDir/(.*) http://www.example.com/myNewDir/
Explanation(s):
http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch
"This directive is equivalent to Redirect, but makes use of standard
regular expressions, instead of simple prefix matching."
http://www.zytrax.com/tech/web/regex.htm
"The ^ (circumflex or caret) outside square brackets means look only at
the beginning of the target string, for example, ^Win will not find
Windows in STRING1 but ^Moz will find Mozilla."
and...
"The . (period) means any character(s) in this position, for example,
ton. will find tons, tone and tonneau but not wanton because it has no
following character."
and...
"The * (asterisk or star) matches the preceding character 0 or more
times, for example, tre* will find tree (2 times) and tread (1 time)
and trough (0 times)."
Try this:
RedirectMatch 301 /myDir/.* http://www.mydomain.com/myNewDir/index.html
Reference: http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch.
As far as the brackets around .* are concerned, RedirectMatch uses standard regular expressions, which means that you can capture matched characters and use them in your redirect rule, referencing them as $1, $2, etc.
In regular expressions, * means any number of repetitions of the previous character, and . denotes any character. So the combination .* matches any number of any characters. Hence a pattern such as /myDir.* will match /myDir and /myDir/, as well as /myDir/test.html. So .* can also be used without the brackets.