Understanding htaccess Filesmatch code - apache

I am trying to install Drupal in a subdirectory on my Bluehost-hosted website...
It's a HUGE pain.
I think the following lines from the .htaccess are the problem. When I navigate to mysite.com/subdir/install.php I get a 403 error. However, when I take "deny" out of the lines below, the error goes away, so I suspect these lines are causing all the trouble.
My question is: can someone help me understand what is happening in the following code? Especially if you can break it down by component.
<FilesMatch "\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)(|~|\.sw[op]|\.bak|\.orig|\.save)?$|^(\..*|Entries.*|Repository|Root|Tag|Template)$|^#.*#$|\.php(~|\.sw[op]|\.bak|\.orig\.save)$">
Order allow,deny
</FilesMatch>

FilesMatch allows you to match files using a regular expression.
Your FilesMatch above contains four regular expressions joined by | (OR); the first one also has an optional secondary group for backup suffixes.
Basically, it forbids access (error 403) to any file whose name matches one of those expressions: with Order allow,deny and no Allow directive, every request for a matching file is denied.
For example:
\.(engine|inc ...)$|
This means: if the file name ends with .engine, or .inc, or any of the other listed extensions, deny access to it.
At the end of the first expression there is a |, which stands for OR: if the first expression does not match, the second one is tried, and it is slightly different.
^(\..*|Entries.*|Repository)$
Here the expression is anchored at both ends (^ and $), so the whole file name has to match one of the listed alternatives rather than just its ending. For example:
The file name matches if it starts with a . followed by anything (.* means any run of characters), such as .htaccess; or if it starts with Entries followed by anything; or if it is exactly Repository, Root, Tag or Template.
The next expression, ^#.*#$, matches a file name that starts and ends with a #; the # is treated literally.
The last expression works like the first one: it checks whether the file name ends with .php followed by one of the listed backup suffixes, such as ~ or .bak.
If you want to know more, I suggest reading up on Perl Compatible Regular Expressions (PCRE).
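To see the four alternatives in action, here is a minimal sketch that runs the pattern from the question against a few file names. It assumes Python's re module behaves like Apache's PCRE for this particular pattern, and the file names are made up for illustration.

```python
import re

# Pattern copied from the FilesMatch directive in the question.
# Assumption: Python's re stands in for Apache's PCRE engine here.
PATTERN = re.compile(
    r"\.(engine|inc|info|install|make|module|profile|test|po|sh|.*sql|theme"
    r"|tpl(\.php)?|xtmpl)(|~|\.sw[op]|\.bak|\.orig|\.save)?$"
    r"|^(\..*|Entries.*|Repository|Root|Tag|Template)$"
    r"|^#.*#$"
    r"|\.php(~|\.sw[op]|\.bak|\.orig\.save)$"
)


def is_blocked(filename: str) -> bool:
    """Return True if the file name matches any of the four alternatives."""
    return PATTERN.search(filename) is not None


# Hypothetical file names, one or two per alternative plus two plain PHP files.
names = ["node.module", "dump.mysql", "page.tpl.php", ".htaccess",
         "#draft#", "index.php.bak", "index.php", "install.php"]
for name in names:
    print(f"{name:15} blocked={is_blocked(name)}")
```

Under this sketch a plain .php file such as index.php or install.php matches none of the alternatives; only names like module.install (ending in .install) or install.php.bak would, so it may be worth checking for other causes of the 403 as well.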

Related

htaccess rewrite rule containing certain word

I have a Magento web shop and I use a plugin to import stock, prices and products. Annoyingly, this plugin doesn't save old URLs if I update the product name etc.
Is there a way I can do this with htaccess? For example, I removed the SKU from the end of a product URL, but Google has indexed some of these old URLs.
Is it possible to rewrite https://www.example.com/xerox-everyday-toner-for-tn242y-yellow-toner-cartridge-006r04226 to https://www.example.com/xerox-everyday-toner-for-tn242y-yellow-toner-cartridge using some wildcards? Obviously everything before the word "cartridge" changes per product, so I want a redirect that, if a URL contains "-cartridge-", removes everything after that pattern; SKU lengths can change, but SKUs only contain alphanumeric characters. If a URL does not contain "-cartridge-", do nothing.
I've tried a few regex patterns using an online htaccess builder but I can't seem to get this right (unless these sites don't process the regex and that's why I think they don't work).
RewriteRule (.+-cartridge)-.+$ $1 [R=301,L]
This should do the job. Everything up to -cartridge gets captured (capturing the dynamic part and the static -cartridge suffix in one go means we don't have to assemble the substitution URL out of multiple parts, but can just use $1), and after it a - plus at least one arbitrary character must follow.
Is there any way you can add a rule so it excludes URLs where "multipack" comes after "-cartridge-"?
Often the easiest way to do this is to place a "do nothing" rule before the one that does the rewriting. Then you can work with a positive match ("if the URL ends in -cartridge-multipack, do nothing") instead of trying to find a negated pattern.
RewriteRule -cartridge-multipack$ - [L]
RewriteRule (.+-cartridge)-.+$ $1 [R=301,L]
The pattern is anchored at the end with $ (meaning nothing is allowed to come after it), - stands for "no substitution", and the L flag makes the rewrite engine stop the current round of processing.
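The interplay of the two rules can be sketched outside Apache. The function below is only an approximation, assuming Python's re matches these simple patterns the same way mod_rewrite does and that the path arrives without its leading slash (as in .htaccess context); apply_rules and the multipack slug are hypothetical.

```python
import re


def apply_rules(path: str) -> str:
    """Mimic the two RewriteRules on a URL path without its leading slash."""
    # Rule 1: a path ending in "-cartridge-multipack" is left untouched.
    if re.search(r"-cartridge-multipack$", path):
        return path
    # Rule 2: keep everything up to "-cartridge" and drop the "-<sku>" tail.
    match = re.search(r"(.+-cartridge)-.+$", path)
    return match.group(1) if match else path


old = "xerox-everyday-toner-for-tn242y-yellow-toner-cartridge-006r04226"
print(apply_rules(old))                                  # SKU stripped
print(apply_rules("some-toner-cartridge-multipack"))     # unchanged
print(apply_rules("about-us"))                           # unchanged
```

A URL that already ends in -cartridge is also left alone, because the second pattern requires a dash and at least one more character after it.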

How do I turn dash breaks in file name into new directories with Apache mod_rewrite?

I'd like to use Apache's mod_rewrite in order to treat all dash breaks within a file name as separate folders starting from the top, or vice versa.
For example, if someone were to access /events/booking/?id=1, the file events-booking.php?id=1 would actually be requested.
The current script I have is taken from the StackOverflow post below, which simply strips the file of its extension, but I'd like the addition of the above too.
https://stackoverflow.com/a/1698807/2023781
Any help on how to modify my existing block of mod_rewrite script would be greatly appreciated (I'm not really good with regex yet unfortunately).
Off the top of my head, this should do it:
RewriteRule ^([^/]*)/([^/]*)/$ /$1-$2.php [L]
^ = beginning of path
([^/]*) = capture a run of zero or more characters other than a slash
$ = end of path
$1 and $2 are the two matched strings from the path.
The query string should carry over as-is.
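As a rough check of the pattern, here is a small sketch in Python. It assumes the path is matched without its leading slash (as in .htaccess context) and that Python's re treats this pattern like mod_rewrite does; the rewrite helper is hypothetical.

```python
import re
from typing import Optional

# Same pattern as the RewriteRule above: exactly two slash-terminated segments.
RULE = re.compile(r"^([^/]*)/([^/]*)/$")


def rewrite(path: str) -> Optional[str]:
    """Return the rewritten path, or None when the rule does not apply."""
    match = RULE.match(path)
    if match is None:
        return None
    # $1 and $2 in the RewriteRule correspond to group(1) and group(2).
    return f"/{match.group(1)}-{match.group(2)}.php"


print(rewrite("events/booking/"))        # two segments: rule applies
print(rewrite("events/"))                # one segment: no match
print(rewrite("events/booking/extra/"))  # three segments: no match
```

Note that the rule only covers paths with exactly two segments; deeper paths would need additional rules or a different approach.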

Apache rewrite backreference variable not accessible after first use

I have come across a situation that seems odd to me. It seems that backreference variables when building apache rewrite rules get lost after the first use.
My requirement is changing an old URL pattern to conform to a new path pattern, e.g:
www.example.com/documents/newsletter/newsletter-issue-50.htm
to become
www.example.com/sites/default/newsletter/50/English/newsletter-issue-50.htm
As you can see, the new URL pattern needs to have the issue number specified in 2 places.
My rewrite rule is as follows:
RewriteRule ^documents/newsletter/newsletter-issue-(.*).htm$ http://www.example.com/sites/default/newsletter/$1/English/newsletter-issue-$1.htm [R=301,L]
When I use this rule, I still get a 404 because the resulting URL fails to replace the second "$1" with the issue number, in this case "50". What I get is
http://www.example.com/sites/default/newsletter/50/English/newsletter-issue-.htm
I have used this test site and it confirms that the second backreference variable is not being evaluated at all. I'm sure I'm missing something here, since it should be a simple rule to put in place.
Any help on this would be greatly appreciated.
Thanks.
Strangely enough, it works in the rewrite tester if you surround the group with two sets of parentheses:
RewriteRule ^documents/newsletter/newsletter-issue-((.*))[.]htm$ http://www.example.com/sites/default/newsletter/$1/English/newsletter-issue-$1.htm [R=301,L]
I have also escaped the dot before the file extension.
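For comparison, a general-purpose regex engine lets the same group be referenced any number of times. The sketch below uses Python's re (not mod_rewrite or the rewrite tester), with \1 playing the role of $1; redirect_target is a hypothetical helper.

```python
import re
from typing import Optional

# Pattern from the rule above, with the dot before "htm" escaped.
OLD = re.compile(r"^documents/newsletter/newsletter-issue-(.*)[.]htm$")
# \1 is Python's spelling of $1; it appears twice, as in the RewriteRule.
NEW = (r"http://www.example.com/sites/default/newsletter/\1"
       r"/English/newsletter-issue-\1.htm")


def redirect_target(path: str) -> Optional[str]:
    """Return the redirect URL for an old newsletter path, else None."""
    match = OLD.match(path)
    return match.expand(NEW) if match else None


print(redirect_target("documents/newsletter/newsletter-issue-50.htm"))
```

Here both references are filled in with 50, which suggests the missing second value is a quirk of the tool being used rather than of backreferences as such.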

Regex rule to match % sign in url for apache mod rewrite

Hello, my rewrite rule sometimes fails because my URLs have % signs in them.
For example this url:
http://www.chillisource.co.uk/product/Grocery/Dr.%20Burnoriums%20Psycho%20Juice/1/B005MSE5KG/Psycho_Juice_70%_Ghost_Pepper
This is my rewrite rule:
RewriteRule ^product/([a-zA-Z]+)/([\sa-zA-Z0-9\-\+\.]+)/([0-9]+)/([A-Z0-9]+)/([a-zA-Z0-9]+) /product?&cat=$1&q=$2&page=$3&prod=$4&prodName=$5
How can I modify the 5th group ([a-zA-Z0-9]+) so that it does not fail when there is a % in the product name?
Thanks in advance.
Perhaps what the rule sees is not %20 but a space, because the URL is passed to mod_rewrite after URL-decoding. If so, add a space to the character class; if not, just add the percent sign to it.
---- Forget this part, I misunderstood the question ----
From what I get from the mod_rewrite documentation (http://httpd.apache.org/docs/current/mod/mod_rewrite.html), you should not have to deal with hex-encoded characters. I assume that from the following statement:
THE_REQUEST
The full HTTP request line sent by the browser to the server (e.g., "GET /index.html HTTP/1.1"). This does not include any additional headers sent by the browser. This value has not been unescaped (decoded), unlike most other variables below.
In fact, using mod_rewrite would be practically impossible otherwise, since you'd have to deal with encoding EVERYWHERE, e.g., you can always write %41 instead of 'A'.
--- But the following still is true ---
But your rewrite rule can't work, at least not with the request URL you posted: The last part of the regex "([a-zA-Z0-9]+)" is FAR too strict. In this case, it fails for the following reasons:
It lacks a treatment of the percent sign, as in "70%"
You forgot to include the underscore "_"
Try adding at least these two characters ("[a-zA-Z0-9%_]+") and it should work.
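The effect of the two character classes on the product name from the question can be sketched as follows, assuming Python's re behaves like mod_rewrite's engine for these classes. Because the rule has no $ at the end, the strict class does not reject the name outright in this sketch; it stops at the first character it does not list.

```python
import re

# Last path segment of the URL in the question.
name = "Psycho_Juice_70%_Ghost_Pepper"

# Original fifth group: letters and digits only.
strict = re.match(r"([a-zA-Z0-9]+)", name)
# Widened class from the answer: also allows "%" and "_".
wide = re.match(r"([a-zA-Z0-9%_]+)", name)

print(strict.group(1))  # stops at the first underscore
print(wide.group(1))    # covers the whole product name
```

With the strict class, prodName would receive only the leading run of letters, which is probably not what the application expects.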

Redirect 301 from a Directory to a Single File

I'm having a bit of trouble figuring out something that should be simple. I want to 301 redirect everything in a directory to one single file in a new location.
In my .htaccess, I've already tried the following...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/index.html
and this...
Redirect 301 /myDir/ http://www.mydomain.com/myNewDir/
and this...
Redirect 301 /myDir http://www.mydomain.com/myNewDir
The problem is that each of those simply maps each file within /myDir/ and appends it to the end of the destination URL.
After Googling, I saw something that said to do this...
Redirect 301 ^/myDir(.*) http://www.mydomain.com/myNewDir
But that just does the same thing... it's mapping the existing file location to the end of the URL.
It was easy finding lots of ".htaccess redirect" tutorials online but they seem to only show the obvious examples like 'one-to-one file mapping' or 'one-to-one directory mapping'. These tutorials also seem to neglect explaining the various relevant file directives and how to properly use them.
This particular hosting account is garbage and also has FrontPage extensions installed. Mod-rewrite fails (breaks the whole site) yet the Redirect 301 lines are operating fine. So until I can move this new (non-FrontPage) site to a more robust hosting account, I'll need to stick with the Redirect 301 one-liner.
How can I simply use a Redirect 301 to redirect everything within /myDir/ to the same single file located at /myNewDir/index.html? (I'd prefer using just /myNewDir/ if possible). Kindly explain, in detail, the file directives used in your solution.
UPDATE:
Previously accepted answer is not working.
Example:
RedirectMatch 301 /myDir1/(.*) http://mydomain.org/newpath/myDir1/index.html
...is giving a "Too many redirects occurred trying to open" error.
This is because /myDir1/(.*) matches anywhere within the string, so if the target URL contains /myDir1/ anywhere, not just at the root, it gets redirected again, resulting in a nasty loop.
See my own posted answer for correct solution.
I found the answer within one of my old projects.
Redirect 301 is all wrong for this. I really wanted RedirectMatch 301 instead.
RedirectMatch 301 ^/myDir/(.*) http://www.example.com/myNewDir/
Explanation(s):
http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch
"This directive is equivalent to Redirect, but makes use of standard
regular expressions, instead of simple prefix matching."
http://www.zytrax.com/tech/web/regex.htm
"The ^ (circumflex or caret) outside square brackets means look only at
the beginning of the target string, for example, ^Win will not find
Windows in STRING1 but ^Moz will find Mozilla."
and...
"The . (period) means any character(s) in this position, for example,
ton. will find tons, tone and tonneau but not wanton because it has no
following character."
and...
"The * (asterisk or star) matches the preceding character 0 or more
times, for example, tre* will find tree (2 times) and tread (1 time)
and trough (0 times)."
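Why the leading ^ matters can be sketched with the paths from the update in the question. This assumes Python's re.search mirrors how RedirectMatch looks for its pattern anywhere in the URL path; would_redirect is a hypothetical helper.

```python
import re


def would_redirect(pattern: str, url_path: str) -> bool:
    """Return True if the pattern is found anywhere in the URL path."""
    return re.search(pattern, url_path) is not None


target = "/newpath/myDir1/index.html"  # path of the redirect destination

# Unanchored: the destination itself matches, so it is redirected again.
print(would_redirect(r"/myDir1/(.*)", target))
# Anchored with ^: only paths that start with /myDir1/ match.
print(would_redirect(r"^/myDir1/(.*)", target))
print(would_redirect(r"^/myDir1/(.*)", "/myDir1/old.html"))
```

The first call reports a match on the destination path, which is the redirect loop described in the question; the anchored pattern matches only the old location.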
Try this:
RedirectMatch 301 /myDir/.* http://www.mydomain.com/myNewDir/index.html
Reference: http://httpd.apache.org/docs/1.3/mod/mod_alias.html#redirectmatch.
As far as the parentheses around .* are concerned, RedirectMatch uses standard regular expressions, which means that you can capture matched characters and use them in your redirect rule, referencing them as $1, $2, etc.
In regular expressions, * means any number of repetitions of the previous character, and . denotes any character. So the combination .* matches any number of any characters. Hence the pattern /myDir/.* matches /myDir/ itself as well as /myDir/test.html (but not /myDir without the trailing slash). Since nothing from the match is reused in the target here, .* can be used without the capturing parentheses.