Generic mod_rewrite referrer check - apache

I'm looking for a generic (host-independent) set of mod_rewrite rules for doing HTTP_REFERER checking on resources. I came up with the following which seemed intuitive, but sadly doesn't work:
RewriteCond %{HTTP_REFERER} !^https?://%{HTTP_HOST}/.*
# RewriteRule .* - [F] # <- or whatever
Apparently you can't have a variable on both sides of the comparison. So, a hack:
RewriteCond %{HTTP_HOST}##%{HTTP_REFERER} !^([^#]*)##https?://\1/.*
But wow, that's ugly -- and if you don't know exactly what's going on, it's terribly confusing.
Is there a better (cleaner) way to write these rules?

"if you don't know exactly what's going on, it's terribly confusing"
First, congrats on the workaround. Checking the source, mod_rewrite.c doesn't seem to do any form of variable interpolation in the CondPattern (only in the TestString), so I can't think of an alternative. As to your "confusing" point, isn't that why we have comments? I've also tidied up your regexp (e.g. the trailing .* is redundant) and used == as a delimiter to emphasise that you're doing a comparison.
It might look tacky, but your idea is near optimal in terms of runtime.
#
# Is HTTP_REFERER of the form http(s)://HTTP_HOST/...
# Note that mod_rewrite only does interpolation in the TestString, so this is
# set up in the format AAAA==BBBB and the pattern uses a backreference (\1) to
# match the corresponding elements of AAAA and BBBB
#
RewriteCond %{HTTP_HOST}==%{HTTP_REFERER} !^(.*?)==https?://\1/
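To see why the backreference trick works, here is a small sketch that mimics the condition outside Apache. It assumes Python's re module treats this pattern the same way Apache's PCRE does (both support the lazy *? and the \1 backreference). The helper name referer_is_same_host is hypothetical, and the TestString is built by hand the way mod_rewrite would expand %{HTTP_HOST}==%{HTTP_REFERER}.

```python
import re

# Same CondPattern as above, minus the leading "!" negation.
# Python's re stands in for Apache's PCRE (assumption: equivalent for this pattern).
SAME_HOST = re.compile(r"^(.*?)==https?://\1/")


def referer_is_same_host(http_host: str, http_referer: str) -> bool:
    """Hypothetical helper: build the TestString as %{HTTP_HOST}==%{HTTP_REFERER}
    would expand, then test it against the CondPattern."""
    test_string = f"{http_host}=={http_referer}"
    return SAME_HOST.match(test_string) is not None


if __name__ == "__main__":
    print(referer_is_same_host("example.com", "https://example.com/page"))       # True
    print(referer_is_same_host("example.com", "https://example.com.evil.net/"))  # False
    print(referer_is_same_host("example.com", ""))                               # False
```

The trailing slash in the pattern is what stops a lookalike host such as example.com.evil.net from passing. Because the real condition is negated with !, the RewriteRule fires whenever this returns False, which includes requests that send no Referer header at all.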


Need .htaccess recipe to display rss feed dynamically

I currently use the following recipe to route .rss files to a script that produces an RSS feed dynamically:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
It works perfectly for URLs like this:
www.example.com/articles.rss
What I would like to do is change the URL to this:
www.example.com/rss/articles/
Nothing I have tried works.
I just tried to put some slashes in the recipe, but I'm not an expert in these recipes, so they didn't work. Something like this didn't work: RewriteRule ^/rss/(.*)/$ /get-feed.pl?item=$1
("recipe" = regular expression / "regex" for short OR RewriteRule "pattern" from the Apache docs - At least I think that is what you are referring to? We are not baking a cake here! ;) )
That is very close, except that the URL-path that the RewriteRule pattern matches against does not start with a slash when used in a .htaccess (directory) context. So, it would need to be like this: ^rss/(.*)/$. If you had looked at what your first rule was returning, you would have seen that there was no slash prefix in the backreference that was captured (i.e. the value of the item URL parameter).
However, there are other (minor) issues here...
The 2nd path segment cannot be empty, so it would be preferable to match something, rather than anything, e.g. (.+) instead of (.*). However, this should be made more restrictive still, so that it matches just a single path segment instead of any URL-path (which is likely to fail anyway, I suspect). Presumably /rss/foo/bar/baz/ should not match?
Again, if you only want to match a string of the form articles then make the regex more restrictive so that it only matches letters (or perhaps letters + numbers + hyphens)?
You are missing the L (last) flag on this rule, which is a problem if you have other directives that follow.
So, if you are wanting to rewrite URLs of the form www.example.com/rss/articles/ (note the trailing slash) then try the following instead:
RewriteRule ^rss/([\w-]+)/$ /get-feed.pl?item=$1 [L]
Make sure the browser cache is cleared before testing.
And this would need to go near the top of the .htaccess file, before any existing rewrites.
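A quick way to sanity-check the pattern is to run it against a few sample URL-paths outside Apache. This is a sketch, assuming Python's re treats this pattern the same way as Apache's PCRE; rewrite_rss is a hypothetical helper, and its input is the URL-path as seen in .htaccess (no leading slash).

```python
import re

# The .htaccess pattern from the answer. re.ASCII keeps \w to [A-Za-z0-9_],
# which is assumed to match PCRE's default (non-Unicode) meaning of \w.
RSS_RULE = re.compile(r"^rss/([\w-]+)/$", re.ASCII)


def rewrite_rss(url_path: str):
    """Hypothetical helper: return the substitution for a matching
    URL-path, or None if the rule does not apply."""
    m = RSS_RULE.match(url_path)
    if m is None:
        return None
    return "/get-feed.pl?item=" + m.group(1)


for path in ["rss/articles/", "/rss/articles/", "rss/articles", "rss/foo/bar/"]:
    print(repr(path), "->", rewrite_rss(path))
```

Only the first path is rewritten: the leading slash, the missing trailing slash and the extra path segment all cause the match to fail, which is the restrictiveness described above.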
Aside: A quick look at your original directive:
RewriteRule ^(.*).rss$ /get-feed.pl?item=$1
This is not strictly correct, as it potentially matches too much. The unescaped dot before rss matches any character. And the .* subpattern matches 0 or more characters of anything, so it also matches an empty name, when really there must be something before .rss. So, this should really be something like:
RewriteRule ^([\w-]+)\.rss$ /get-feed.pl?item=$1 [L]
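The difference between the original and the corrected pattern can be seen by running both against a few inputs. This sketch assumes Python's re behaves like Apache's PCRE for these simple patterns; LOOSE and STRICT are just illustrative names.

```python
import re

LOOSE = re.compile(r"^(.*).rss$")                  # original: unescaped dot, .* may be empty
STRICT = re.compile(r"^([\w-]+)\.rss$", re.ASCII)  # suggested replacement

for path in ["articles.rss", "articlesXrss", ".rss", "foo/bar.rss"]:
    loose = LOOSE.match(path)
    strict = STRICT.match(path)
    print(f"{path!r}: loose={loose and loose.group(1)!r} strict={strict and strict.group(1)!r}")
```

The loose pattern accepts "articlesXrss" (the dot matches the X), an empty name in ".rss", and a multi-segment path; the strict one only accepts "articles.rss".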

mod_rewrite in .htaccess, go/*, go?, go?/* work, but go/?* doesn't

I tried going through some of the mod_rewrite questions on the site, but didn't find an answer to my problem (the fact that most questions were titled "mod_rewrite doesn't work" didn't make that task any easier; hopefully my title is a little more helpful).
This is the content of my /somepath/.htaccess:
RewriteEngine On
RewriteRule ^go/(.*) /cgi-bin/goto.cgi?$1
# ^^^ WORKS: /somepath/go/200001..
RewriteRule ^go?(.*) /cgi-bin/goto.cgi$1
# ^^^ WORKS: /somepath/go?200001..
ErrorDocument 404 /cgi-bin/goto.cgi?error404
In testing, I made goto.cgi simply return $ENV{QUERY_STRING}, $ENV{QUERY_STRING_UNESCAPED} and $ENV{PATH_INFO}. Currently, the above rules mean that I am getting the QUERY_STRING passed on correctly when the urls:
/somepath/go/200001
/somepath/go?200001
/somepath/go?/200001
are accessed, but not when
/somepath/go/?200001
is accessed, in which case $ENV{QUERY_STRING}, $ENV{QUERY_STRING_UNESCAPED} and $ENV{PATH_INFO} are empty.
So the question is, what rule can I use so that
/somepath/go/?200001
gives my script the ?200001 or even /?200001 part back?
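The first rule is destroying the query string of the example that fails: $1 is empty, so the substitution ends in a bare ?, and an empty query string in the substitution removes any existing query string.
And the second rule is gibberish: in a RewriteRule pattern the ? just makes the preceding o optional, because the query string is never part of the URL-path the pattern is matched against. Use these instead: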
RewriteRule ^go/(.+)$ /cgi-bin/goto.cgi?$1
RewriteRule ^go/?$ /cgi-bin/goto.cgi
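The query-string behaviour explained above can be modelled in a few lines. This is a deliberately simplified sketch, assuming no flags such as QSA or QSD and no URL escaping; apply_rule is a hypothetical helper, not part of any Apache API, and Python's re stands in for PCRE.

```python
import re


def apply_rule(pattern: str, substitution: str, path: str, query: str):
    """Simplified model of one RewriteRule (assumption: no QSA/QSD or other
    flags, no escaping). Returns (new_path, new_query) or None on no match."""
    m = re.match(pattern, path)
    if m is None:
        return None
    # Expand $N backreferences; a group that did not participate expands to "".
    target = re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))) or "", substitution)
    if "?" in target:
        # A query string in the substitution replaces the original one,
        # even when it is empty (a bare trailing "?").
        new_path, new_query = target.split("?", 1)
        return new_path, new_query
    # No "?" in the substitution: the original query string passes through.
    return target, query


# /somepath/go/?200001 with the original first rule, then with the replacement rules.
print(apply_rule(r"^go/(.*)", "/cgi-bin/goto.cgi?$1", "go/", "200001"))
print(apply_rule(r"^go/(.+)$", "/cgi-bin/goto.cgi?$1", "go/", "200001"))
print(apply_rule(r"^go/?$", "/cgi-bin/goto.cgi", "go/", "200001"))
```

With the original rule the query string comes back empty; with the replacement pair the first rule does not match "go/", and the second passes 200001 through untouched.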

Mod Rewrite - Ungreedy modifier needed?

I have this code in a .htaccess file:
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteRule ^(.+)-headquarters-([0-9]+)\.html company.php?lid=$2
Now my question is, what happens if the matching result from the first parentheses contains a hyphen? There is a very good chance a lot of them will. Is this where the ungreedy modifiers come in!?
I don't think in this case you need to make it ungreedy. e.g. test-headquarters-headquarters-42.html will match perfectly fine (with test-headquarters being $1).
But for completeness, I will suggest a few options.
You can make it ungreedy by adding a ? after the +. So your rule would become (although, in this case, it would not make any practical difference):
RewriteRule ^(.+?)-headquarters-([0-9]+)\.html company.php?lid=$2
You could also choose to match anything except a hyphen by using the following, although it will then no longer match names that themselves contain a hyphen:
RewriteRule ^([^-]+)-headquarters-([0-9]+)\.html company.php?lid=$2
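To make the greedy/ungreedy point concrete, this sketch runs the greedy (.+), the lazy (.+?) and the [^-]+ variants against a few filenames. It assumes Python's re backtracks the same way PCRE does for these patterns; the sample filenames are made up.

```python
import re

GREEDY = re.compile(r"^(.+)-headquarters-([0-9]+)\.html")
LAZY = re.compile(r"^(.+?)-headquarters-([0-9]+)\.html")
NO_HYPHEN = re.compile(r"^([^-]+)-headquarters-([0-9]+)\.html")

for name in ["acme-headquarters-42.html",
             "new-york-headquarters-42.html",
             "test-headquarters-headquarters-42.html"]:
    for label, rx in [("greedy", GREEDY), ("lazy", LAZY), ("no-hyphen", NO_HYPHEN)]:
        m = rx.match(name)
        print(f"{name:40} {label:10} {m.groups() if m else None}")
```

Greedy and lazy capture the same groups here, because the following -headquarters- plus digits only fit in one place. The [^-]+ version rejects new-york-headquarters-42.html outright.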

question regarding specific mod_rewrite syntax

I know there are other questions that are similar to this, but I'm really struggling with mod_rewrite syntax, so I could use some help.
Basically, what I am trying to do is have the following redirect occur:
domain.com/1/ redirect to domain.com/?id=$1 (also should work for www.domain.com)
What I have so far (that isn't working):
RewriteEngine On
ReRewriteRule ^/([0-9])$ /?id=$1
A few issues.
First is terminology: if you want a request for domain.com/1/ to be served by index.php?id=1, then you are rewriting /1/ to index.php?id=1, not the other way around as you said.
Second, simple typo: RewriteRule, not ReRewriteRule.
Third, [0-9] is the right way to match a number, but it'll only match a single digit. If you want to handle /13 then you should match one or more instances of [0-9] by writing [0-9]+.
Fourth, the target of your rule should be the file you want to serve. / is not a file or an absolute URL, so write out the index.php if that's what you mean.
Fifth, you say you want to handle /1/, but your rule says that the matched request must end in a number, not a slash. If you want to accept the slash whether it's there or not, put that in the rule.
RewriteRule ^/?([0-9]+)/?$ index.php?id=$1 [L]
Does that work?
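The effect of these fixes can be checked by comparing the question's pattern with the suggested one. This sketch assumes Python's re behaves like PCRE here; in .htaccess the rule sees domain.com/1/ as "1/" (no leading slash), while in server config it would see "/1/", and the optional /? in the fixed pattern covers both.

```python
import re

ORIGINAL = re.compile(r"^/([0-9])$")      # from the question (typo aside)
FIXED = re.compile(r"^/?([0-9]+)/?$")     # from the answer

for path in ["1/", "13", "/13/", "1", "abc", "1/2"]:
    o = ORIGINAL.match(path)
    f = FIXED.match(path)
    print(f"{path!r:8} original={o.group(1) if o else None} fixed={f.group(1) if f else None}")
```

The original pattern fails on every sample, while the fixed one captures the full number with or without slashes and still rejects non-numeric or multi-segment paths.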
You've three issues:
RewriteRule is misspelt (as pointed out by Michael), you need to worry about the trailing slash, and you need to stop processing rules once you've found the match:
RewriteRule ^/(\d+)/?$ /?id=$1 [L]
You have misspelled RewriteRule. Otherwise, I think your syntax looks correct.
RewriteEngine On
ReRewriteRule ^/([A-Za-z0-9]+)$ /?id=$1
--^^^---------
Actually, you should probably remove the /:
RewriteEngine On
RewriteRule ^([A-Za-z0-9$_.+!*'(),-]+)$ /?id=$1
------^^^---------
EDIT Added the +. Look at all the answers here. You need a composite of them, including the + and the [L] in addition to what I have here.
EDIT2 Also edited to include alpha characters in the id.
EDIT3 Added special characters to regex. These should be valid in a URL, but it's unusual to find them there.

Mod Rewrite problem

I have a problem that I cannot wrap my head around.
I'm using Apache and PHP
I need to get:
http://localhost.com/cat_ap.php?nid=5964
from
http://localhost.com/cat_ap~nid~5964.htm
How do I go about changing that around? I have done more simple mod rewrites, but this is slightly more complicated. Can anyone give me a leg up or point me in the right direction?
RewriteRule ^/cat_ap~nid~(.*)\.htm$ /cat_ap.php?nid=$1 [R]
The [R] at the end is optional. With it, Apache sends an external redirect, so the browser's address bar changes to the new URL; if you omit it, the rewrite happens internally and the user keeps seeing the .htm URL (the correct page is still served).
If the nid part is also a variable, you can try this:
RewriteRule ^/cat_ap~([^~]+)~(.*)\.htm$ /cat_ap.php?$1=$2 [R]
EDIT: As Ben Blank said in his comment, you might want to restrict the set of valid URLs. For example, you might want to make sure a nid exists, and that it's numerical:
RewriteRule ^/cat_ap~nid~([0-9]+)\.htm$ /cat_ap.php?nid=$1
or if the nid part is a variable, that it only consists of alphabetical characters:
RewriteRule ^/cat_ap~([A-Za-z]+)~([0-9]+)\.htm$ /cat_ap.php?$1=$2
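The value of restricting nid to digits shows up when the patterns are run against a few requests. This is a sketch assuming Python's re matches like PCRE for these patterns; to_target is a hypothetical helper that builds the cat_ap.php URL the question asks for, and the paths keep the leading slash, as in a server-config context.

```python
import re

LOOSE = re.compile(r"^/cat_ap~nid~(.*)\.htm$")
NUMERIC = re.compile(r"^/cat_ap~nid~([0-9]+)\.htm$")


def to_target(rx, path):
    """Hypothetical helper: build the rewritten URL, or None if the rule does not match."""
    m = rx.match(path)
    return f"/cat_ap.php?nid={m.group(1)}" if m else None


for path in ["/cat_ap~nid~5964.htm", "/cat_ap~nid~.htm", "/cat_ap~nid~59x.htm"]:
    print(path, to_target(LOOSE, path), to_target(NUMERIC, path))
```

The loose pattern happily produces nid= (empty) and nid=59x, whereas the numeric version only rewrites the well-formed request.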
Assuming the variable parts here are the "nid" and the 5964, you can do:
RewriteRule ^/cat_ap~(.+)~(.+)\.htm$ /cat_ap.php?$1=$2
The first "(.+)" matches "nid" and the second matches "5964".
If you want everything arbitrary:
RewriteRule ^/(\w+)~(\w+)~(\w+)\.htm$ /$1.php?$2=$3 [L]
Where \w is equal to [A-Za-z0-9_]. And if you want to use this rule in a .htaccess file, remove the leading / from the pattern.
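Here is the fully arbitrary rule in its .htaccess form (leading slash removed), simulated outside Apache. This is a sketch assuming Python's re with re.ASCII gives \w the same [A-Za-z0-9_] meaning described above; rewrite_generic is a hypothetical helper, and appending .php to the first group is an assumption based on the question's cat_ap.php target.

```python
import re

# .htaccess form of the generic rule: no leading slash in the pattern.
GENERIC = re.compile(r"^(\w+)~(\w+)~(\w+)\.htm$", re.ASCII)


def rewrite_generic(url_path: str):
    """Hypothetical helper: map script~key~value.htm to script.php?key=value."""
    m = GENERIC.match(url_path)
    if m is None:
        return None
    script, key, value = m.groups()
    return f"{script}.php?{key}={value}"


for path in ["cat_ap~nid~5964.htm", "/cat_ap~nid~5964.htm", "cat-ap~nid~5964.htm"]:
    print(path, "->", rewrite_generic(path))
```

Only the first path is rewritten: in .htaccess the URL-path has no leading slash, and a hyphen is not covered by \w.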