htaccess rewrite rule containing certain word - apache

I have a Magento web shop and I use a plugin to import stock, prices and products. Annoyingly, this plugin doesn't save old URLs if I update the product name etc.
Is there a way I can do this with htaccess? For example, I removed the SKU from the end of a product URL, but Google has indexed some of these old URLs.
Is it possible to redirect https://www.example.com/xerox-everyday-toner-for-tn242y-yellow-toner-cartridge-006r04226 to https://www.example.com/xerox-everyday-toner-for-tn242y-yellow-toner-cartridge using some wildcards? Obviously everything before the word "cartridge" changes per product, so I want a redirect that, if a URL contains "-cartridge-", removes everything after that pattern. SKU lengths can vary, but SKUs only contain alphanumeric characters. If a URL does not contain "-cartridge-", do nothing.
I've tried a few regex patterns using an online htaccess builder but I can't seem to get this right (unless these sites don't process the regex, which would explain why I think they don't work).

RewriteRule (.+-cartridge)-.+$ $1 [R=301,L]
This should do the job. Everything up to -cartridge gets captured (capturing the dynamic part before it and this static suffix in one go means we don't have to assemble the substitution URL out of multiple parts, but can just use $1), and after it a - plus some arbitrary characters must follow. If the redirect target ends up with the server's filesystem path in it (this can happen with a relative substitution in .htaccess when no RewriteBase is set), use /$1 as the substitution instead.
Is there any way you can add a rule so that it excludes URLs where "multipack" comes after the "-cartridge-"?
Often the easiest way to do this is to place a "do nothing" rule before the one that does the rewriting. Then you can work with a positive match ("if URL ends in -cartridge-multipack, do nothing"), instead of trying to find a negated pattern.
RewriteRule -cartridge-multipack$ - [L]
RewriteRule (.+-cartridge)-.+$ $1 [R=301,L]
Pattern anchored at the end with $ (means nothing is allowed to come after this), - for "no substitution", and the L flag to make the rewrite engine stop the current round of processing.
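To check the two patterns before putting them on a live shop, here is a minimal Python sketch that mimics the rule order. It uses Python's re module rather than mod_rewrite, so it only approximates the behaviour, and the function name redirect_target is made up for illustration.

```python
import re

# Patterns copied from the two RewriteRule lines above.
SKIP = re.compile(r"-cartridge-multipack$")
STRIP = re.compile(r"(.+-cartridge)-.+$")

def redirect_target(path):
    """Return the path to redirect to, or None if the URL is left alone.

    `path` is the URL-path without the leading slash, which is what a
    RewriteRule pattern sees in a .htaccess file.
    """
    if SKIP.search(path):       # first rule: "-" substitution, [L] stops here
        return None
    m = STRIP.search(path)      # second rule: keep everything up to "-cartridge"
    return m.group(1) if m else None
```

Calling redirect_target() with the SKU URL from the question should return the same path without the SKU, while a path ending in -cartridge-multipack or one without "-cartridge-" should return None.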

Related

Need .htaccess recipe to display rss feed dynamically

I currently use the following recipe to route .rss files to a script that produces a rss feed dynamically:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
It works perfectly for URLs like this:
www.example.com/articles.rss
What I would like to do is change the URL to this:
www.example.com/rss/articles/
Everything I have tried doesn't work.
I just tried to put some slashes in the recipe, but I'm not an expert in these recipes so they didn't work. Something like this didn't work: RewriteRule ^/rss/(.*)/$ /get-feed.pl?item=$1
("recipe" = regular expression / "regex" for short OR RewriteRule "pattern" from the Apache docs - At least I think that is what you are referring to? We are not baking a cake here! ;) )
That is very close, except that the URL-path that the RewriteRule pattern matches against does not start with a slash when used in a .htaccess (directory) context. So, it would need to be like this: ^rss/(.*)/$. If you had looked to see what your first rule was returning you would have seen that there was no slash prefix in the backreference that was captured (ie. the value of the item URL parameter).
However, there are other (minor) issues here...
The 2nd path segment cannot be empty, so it would be preferable to match something, rather than anything, eg. (.+) instead of (.*). However, this should be made more restrictive so that it matches just a single path segment, instead of any URL-path (which is likely to fail anyway, I suspect). eg. Presumably /rss/foo/bar/baz/ should not match?
Again, if you only want to match a string of the form articles then make the regex more restrictive so that it only matches letters (or perhaps letters + numbers + hyphens).
You are missing the L (last) flag on this rule, which is a problem if you have other directives that follow.
So, if you are wanting to rewrite URLs of the form www.example.com/rss/articles/ (note the trailing slash) then try the following instead:
RewriteRule ^rss/([\w-]+)/$ /get-feed.pl?item=$1 [L]
Make sure the browser cache is cleared before testing.
And this would need to go near the top of the .htaccess file, before any existing rewrites.
Aside: A quick look at your original directive:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
This is not strictly correct, as it potentially matches too much. The unescaped dot before rss matches any character. And the .* subpattern matches 0 or more characters of anything, including nothing at all, whereas the item name must be something. So, this should really be something like:
RewriteRule ^([\w-]+)\.rss$ /get-feed.pl?item=$1 [L]
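To see the difference between the loose and the restrictive patterns, here is a rough Python sketch. It uses Python's re module, not Apache's regex engine, so treat it as an approximation (for example, \w in Python also matches non-ASCII letters by default). The names LOOSE, STRICT, PRETTY and item are made up for illustration.

```python
import re

LOOSE = re.compile(r"^(.*).rss$")         # original pattern, unescaped dot
STRICT = re.compile(r"^([\w-]+)\.rss$")   # tightened pattern
PRETTY = re.compile(r"^rss/([\w-]+)/$")   # pattern for /rss/articles/

def item(pattern, path):
    """Return the captured item name, or None when the pattern does not match.

    `path` has no leading slash, as in a .htaccess context.
    """
    m = pattern.search(path)
    return m.group(1) if m else None
```

With these, item(LOOSE, "articlesXrss") still captures "articles" because the unescaped dot accepts the X, while STRICT rejects it. PRETTY accepts "rss/articles/" but rejects both "rss/foo/bar/baz/" and the version with a leading slash.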

How to rewrite URLs in htaccess that end with recurring characters

I have changed web platforms and have old URLs that I cannot and do not want to match on the new platform where the old content is now living.
I have an array of old product URLs that all have '-p-' in the URL, followed by a string of numbers and ending in .html (osCommerce platform URLs).
I would like to know how to rewrite:
/x/[rest-of-url]-p-[random numbers].html
to
/x/[rest-of-url]
I would like the end result to look something like this:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo-p-2214.html
redirects to:
http://www.shop.com/shop/versace-black-snakeskin-pony-hair-hobo
Does anyone know if this is doable in the htaccess file as a rewrite rule?
My managed hosting service provider, BeepWeb, answered my question.
RewriteRule ^/shop/(.*)-p-(.*).html$ http://www.shop.com/product/$1/ [R=302]
The first argument is the URI that you are matching. The (.*) matches any characters. The second argument is the destination URL. The $1 corresponds to the first (.*). $2 would be the second (.*), and so on... The [R=302] tells the rewrite to be a 302 redirect (use [R=301] for a 301 redirect).
Using the (.*) is essentially like using a wildcard. You can instead narrow this down by specifying which characters you want to match as opposed to all characters (instead of using (.*) you could use ([abc]*) which would match only against a, b and c characters).
Also, be careful that you do not match other URLs unintentionally (i.e. you need to make sure that the pattern matches are unique to the URLs being rewritten).
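As an illustration of narrowing the match, here is a small Python sketch of a tighter pattern. This is my own assumption, not BeepWeb's rule: it only accepts digits after -p-, escapes the dot, tolerates a missing leading slash (as in .htaccess), and maps to /shop/... as asked in the question. The names OLD_PRODUCT and new_path are hypothetical.

```python
import re

# Hypothetical tightened version of the pattern above:
#   /?   leading slash is optional (absent in a .htaccess context)
#   \d+  only digits may follow "-p-"
#   \.   a literal dot before "html"
OLD_PRODUCT = re.compile(r"^/?shop/(.+)-p-\d+\.html$")

def new_path(path):
    """Map an old osCommerce product path to the new path, or None if it does not match."""
    m = OLD_PRODUCT.search(path)
    return "/shop/" + m.group(1) if m else None
```

The equivalent RewriteRule pattern would be ^/?shop/(.+)-p-\d+\.html$, but test it against your own list of old URLs before relying on it.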
If you need the source reference, see the following:
https://httpd.apache.org/docs/current/rewrite/intro.html
Thanks again to http://www.beepweb.com for their detailed response.
Hope it helps others.

Apache rewrite backreference variable not accessible after first use

I have come across a situation that seems odd to me. It seems that backreference variables when building apache rewrite rules get lost after the first use.
My requirement is changing an old URL pattern to conform to a new path pattern, e.g:
www.example.com/documents/newsletter/newsletter-issue-50.htm
to become
www.example.com/sites/default/newsletter/50/English/newsletter-issue-50.htm
As you can see, the new URL pattern needs to have the issue number specified in 2 places.
My rewrite rule is as follows:
RewriteRule ^documents/newsletter/newsletter-issue-(.*).htm$ http://www.example.com/sites/default/newsletter/$1/English/newsletter-issue-$1.htm [R=301,L]
When I use this rule, I still get a 404 because the resultant URL fails to replace the second "$1" with the issue number, in this case "50". What I get is
http://www.example.com/sites/default/newsletter/50/English/newsletter-issue-.htm
I have used this test site and it confirms that the second backreference variable is not being evaluated at all. I am sure I am missing something here, since it should be a simple rule to put in place.
Any help on this would be greatly appreciated.
Thanks.
Strangely enough, it works in the rewrite tester if you surround the group with two sets of parentheses:
RewriteRule ^documents/newsletter/newsletter-issue-((.*))[.]htm$ http://www.example.com/sites/default/newsletter/$1/English/newsletter-issue-$1.htm [R=301,L]
I have also escaped the dot before the file extension.
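For what it's worth, a backreference can normally be used more than once in a substitution, so the tester may be the part that misbehaves. Here is a quick Python sketch of the same substitution with a single pair of parentheses. It uses Python's re module (where \1 plays the role of $1), so it is only an analogy for mod_rewrite, and the function name rewrite is made up.

```python
import re

PATTERN = re.compile(r"^documents/newsletter/newsletter-issue-(.*)\.htm$")
# \1 in Python plays the role of $1 in the RewriteRule substitution.
TARGET = r"http://www.example.com/sites/default/newsletter/\1/English/newsletter-issue-\1.htm"

def rewrite(path):
    """Apply the substitution if the pattern matches, otherwise return None."""
    if not PATTERN.search(path):
        return None
    return PATTERN.sub(TARGET, path)
```

Here both occurrences of the backreference are filled in with the issue number, which is what I would expect the real rule to do on the server as well.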

Redirect 301 from a Directory to a Single File

I'm having a bit of trouble figuring out something that should be simple. I want to 301 redirect everything in a directory to one single file in a new location.
In my .htaccess, I've already tried the following...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/index.html
and this...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/
and this...
Redirect 301 /myDir http://www.mydomain.com/myNewDir
The problem is that each of those simply maps each file within /myDir/ and appends it to the end of the destination URL.
After Googling, I saw something that said to do this...
Redirect 301 ^/myDir(.*) http://www.mydomain.com/myNewDir
But that just does the same thing... it's mapping the existing file location to the end of the URL.
It was easy finding lots of ".htaccess redirect" tutorials online but they seem to only show the obvious examples like 'one-to-one file mapping' or 'one-to-one directory mapping'. These tutorials also seem to neglect explaining the various relevant file directives and how to properly use them.
This particular hosting account is garbage and also has FrontPage extensions installed. Mod-rewrite fails (breaks the whole site) yet the Redirect 301 lines are operating fine. So until I can move this new (non-FrontPage) site to a more robust hosting account, I'll need to stick with the Redirect 301 one-liner.
How can I simply use a Redirect 301 to redirect everything within /myDir/ to the same single file located at /myNewDir/index.html? (I'd prefer using just /myNewDir/ if possible). Kindly explain, in detail, the file directives used in your solution.
UPDATE:
Previously accepted answer is not working.
Example:
RedirectMatch 301 /myDir1/(.*) http://mydomain.org/newpath/myDir1/index.html
...is giving a "Too many redirects occurred trying to open" error.
This is because /myDir1/(.*) matches anywhere within the string, so if the target URL contains /myDir1/ anywhere, not just at the root, it gets redirected again and ends up in a nasty loop.
See my own posted answer for correct solution.
I found the answer within one of my old projects.
Redirect 301 is all wrong for this. I really wanted RedirectMatch 301 instead.
RedirectMatch 301 ^/myDir/(.*) http://www.example.com/myNewDir/
Explanation(s):
http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch
"This directive is equivalent to Redirect, but makes use of standard
regular expressions, instead of simple prefix matching."
http://www.zytrax.com/tech/web/regex.htm
"The ^ (circumflex or caret) outside square brackets means look only at
the beginning of the target string, for example, ^Win will not find
Windows in STRING1 but ^Moz will find Mozilla."
and...
"The . (period) means any character(s) in this position, for example,
ton. will find tons, tone and tonneau but not wanton because it has no
following character."
and...
"The * (asterisk or star) matches the preceding character 0 or more
times, for example, tre* will find tree (2 times) and tread (1 time)
and trough (0 times)."
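To make the effect of the ^ anchor concrete, here is a rough Python sketch that follows redirects until the path stops matching. It is a simplification under my own assumptions: only the URL-path is tracked (the host is ignored), and the function name follow and the hop limit are made up.

```python
import re

def follow(pattern, target, path, limit=10):
    """Follow redirects until `path` no longer matches `pattern`.

    Returns the number of hops taken, or None if more than `limit`
    hops would be needed (that is, a redirect loop).
    """
    hops = 0
    while re.search(pattern, path):
        if hops == limit:
            return None
        path = target   # RedirectMatch sends the client to the target path
        hops += 1
    return hops
```

With the unanchored pattern /myDir1/(.*) and a target of /newpath/myDir1/index.html, the target itself matches again and follow() gives up with None. With ^/myDir1/(.*) the same request is redirected once and then stops.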
Try this:
RedirectMatch 301 /myDir/.* http://www.mydomain.com/myNewDir/index.html
Reference: http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch.
As far as brackets around .* are concerned, RedirectMatch uses standard regular expressions, which means that you can capture matched characters and use them in your redirect rule, referencing them as $1, $2, etc.
In regular expressions * means any number of repetitions of the previous character, and . denotes any character. So the combination .* says that this pattern matches any number of any characters. Hence /myDir/.* matches /myDir/ and /myDir/test.html (but not /myDir without the trailing slash). The brackets are optional here: .* can be used as well as (.*) because the redirect target does not use the captured text.

mod rewrite help

I am trying to use mod rewrite to remove and replace part of my url. I am looking to get my urls looking like this.
http://domain.com/e813c697e8dd8dc2bbfecb1d20b15783.html
instead of
http://domain.com/lookup.php?md5=e813c697e8dd8dc2bbfecb1d20b15783
lookup.php matches the md5 against the database, fetches the correct URL and forwards you to it.
All I need to do now is rewrite it so that it rewrites from this
http://domain.com/lookup.php?md5=e813c697e8dd8dc2bbfecb1d20b15783
to this
http://domain.com/e813c697e8dd8dc2bbfecb1d20b15783.html
I have tried this, which works, but it also rewrites any other .html page at root level and makes it display nothing (a blank page).
RewriteRule ^([a-z0-9]+)\.html$ /lookup.php?md5=$1
Can anyone tell me a way to do this so that my regular html pages are not messed up and be able to display these links how I want to? Thanks.
Your current rule is way too broad. You need to make it more specific so that it only matches an md5 hash value, which is easy:
RewriteRule ^([a-f0-9]{32})\.html$ /lookup.php?md5=$1 [QSA,L]
Your pattern for the file name is too broad: it will match any file name made of letters and digits. An md5 hash, on the other hand, uses a very limited subset of characters (the letters a-f and digits) and has to be 32 characters long. The pattern ([a-f0-9]{32}) does the job perfectly.
I have also added L and QSA flags (QSA to preserve any existing query string (like, tracking info, for example) and L to stop matching any other rules).
To further ensure that it does not match any real files which may have name in such format (who knows), add RewriteCond %{REQUEST_FILENAME} !-f line before the rule.
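If you want to convince yourself that the 32-character pattern leaves regular pages alone, here is a small Python sketch. It uses Python's re module rather than mod_rewrite, and the name lookup_target is made up for illustration.

```python
import re

# Same pattern as in the RewriteRule above: exactly 32 lowercase hex digits.
MD5_PAGE = re.compile(r"^([a-f0-9]{32})\.html$")

def lookup_target(path):
    """Return the internal rewrite target for an md5-style page, else None."""
    m = MD5_PAGE.search(path)
    return "/lookup.php?md5=" + m.group(1) if m else None
```

The hash from the question is rewritten to lookup.php, while names such as index.html, contact.html or a shorter hex string are left untouched.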
One thing you can do is quantify the number of hex digits:
RewriteRule ^([0-9a-f]{32})\.html$ /lookup.php?md5=$1
as md5 will always have 32 hex digits.
That depends on the naming scheme of your regular html pages. For starters though you could change it to: RewriteRule ^([a-f0-9]+)\.html$ /lookup.php?md5=$1
That would make any HTML page whose name contains a letter outside a-f work again.
If none of your HTML pages have numbers in their names, and you can accept the very small chance that an md5 hash contains no digits at all (and so is not rewritten), you could check that there is at least one digit in the filename: RewriteRule ^([a-f]*\d[a-f\d]*)\.html$ /lookup.php?md5=$1
Lastly, if it is acceptable for you to have URLs like http://domain.com/md5/e813c697e8dd8dc2bbfecb1d20b15783.html instead, you could change it to: RewriteRule ^md5/([a-f0-9]+)\.html$ /lookup.php?md5=$1 and not have to worry about the fragility of the other methods.
I'm not sure, but judging by your example rule I think you have a RewriteBase / somewhere in your .htaccess. Note that in a .htaccess file the pattern is matched without the leading slash. If the rule lives in the main server or virtual host configuration instead, you will need to add a / right after the ^ in whichever RewriteRule you choose to go with.