Log image filename that's cached by external cdn using htaccess - apache

I want to keep a log of image file names whenever a specific cdn caches our images but I can't quite get it. Right now, my code looks something like:
RewriteCond %{HTTP_USER_AGENT} Photon/1.0
RewriteRule ^(.*)$ log.php?image=$1 [L]
The above always logs the image as being "log.php" even if I'm making the cdn cache "example.jpg" and I thoroughly don't understand why.

Because in .htaccess the rewrite engine loops until the URL passes through unchanged (despite the presence of the L flag) and your rule also matches log.php (your rule matches everything) - so this is the "image" that is ultimately logged. The L flag simply stops the current pass through the rewrite engine.
For example:
Request /example.jpg
Request is rewritten to log.php?image=example.jpg
Rewrite engine starts over, passing /log.php?image=example.jpg to the start of the second pass.
Request is rewritten to log.php?image=log.php by the same RewriteRule directive.
Rewrite engine starts over, passing /log.php?image=log.php to the start of the third pass.
Request is rewritten to log.php?image=log.php (again).
URL has not changed in the last pass - processing stops.
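The passes listed above can be modelled with a few lines of Python. This is only a rough sketch of the looping behaviour, not of mod_rewrite itself: the function name simulate and the max_passes safety limit are invented for illustration, and the User-Agent condition is ignored.

```python
import re

def simulate(path, pattern, max_passes=10):
    """Re-run one rule (pattern -> log.php?image=$1) until a pass changes nothing."""
    query = None
    trace = []
    for _ in range(max_passes):
        match = re.search(pattern, path)
        if match is None:
            break  # rule does not apply: the URL passes through unchanged
        new_path, new_query = "log.php", f"image={match.group(1)}"
        if (new_path, new_query) == (path, query):
            break  # same result as the previous pass: processing stops
        path, query = new_path, new_query
        trace.append(f"{path}?{query}")
    return trace

# The catch-all pattern from the question also matches log.php itself.
print(simulate("example.jpg", r"^(.*)$"))
# ['log.php?image=example.jpg', 'log.php?image=log.php']
```

A pattern that cannot match log.php ends the loop after the first rewrite, so the original filename is the one that reaches the script.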
You need to make an exception so that log.php itself is not processed. Or, state that all non-.php files are processed (instead of everything). Or, if only images are meant to be processed then only check for images.
For example:
# Log images only
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule ^(.+\.(?:png|jpg|webp|gif))$ log.php?image=$1 [L]
Remember to backslash-escape literal dots in the regex.
Or,
# Log Everything except log.php itself
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteCond %{REQUEST_URI} ^/(.+)
RewriteRule !^log\.php$ log.php?image=%1 [L]
In the last example, %1 refers to the captured subpattern in the preceding CondPattern. I only did it this way, rather than using REQUEST_URI directly since you are excluding the slash prefix in your original logging directive (ie. you are passing image.jpg to your script when /image.jpg is requested). If you want to log the slash prefix as well, then you can omit the 2nd condition and pass REQUEST_URI directly. For example:
# Log Everything except log.php itself (include slash prefix)
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule !^log\.php$ log.php?image=%{REQUEST_URI} [L]
Alternatively, on Apache 2.4+ you can use the END flag instead of L to force the rewrite engine to stop and prevent further passes through the rewrite engine. For example:
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule (.+) log.php?image=$1 [END]

Related

remove //xx after directory in URL

I need to redirect all URLs like this:
example.com/podcasts//rebt
to
example.com/podcasts
I am trying to adjust this code to do both but I can't get it to work:
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
RewriteRule . %1/%3 [R=301,L]
To remove //<something> at the end of the URL-path (eg. /podcasts//rebt to /podcasts), try the following instead at the top of the root .htaccess file:
RewriteEngine On
RewriteCond %{THE_REQUEST} \s([^?]+?)//
RewriteRule . %1 [R=301,L]
The THE_REQUEST server variable contains the first line of the request (eg. GET /podcasts//rebt HTTP/1.1) and does not change when the request is internally rewritten (unlike REQUEST_URI).
The regex \s([^?]+?)// captures the part of the URL-path before the first instance of a double slash. Everything from the double slash onward is discarded. This regex also ensures we do not inadvertently match against the query string (if any).
The %1 backreference contains the captured subpattern (ie. everything before the first double slash in the URL-path) from the preceding CondPattern.
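To see what the condition captures, the same regex can be tried against a few request lines in Python. This is a minimal sketch: the function name redirect_target and the sample request lines are made up for illustration, and Python's re module is only assumed to behave like Apache's PCRE for this simple pattern.

```python
import re

# Same pattern as the CondPattern: whitespace, then a lazy run of
# non-"?" characters, then the first double slash.
DOUBLE_SLASH = re.compile(r"\s([^?]+?)//")

def redirect_target(the_request):
    """Return what %1 would hold, or None when the condition fails."""
    match = DOUBLE_SLASH.search(the_request)
    return match.group(1) if match else None

print(redirect_target("GET /podcasts//rebt HTTP/1.1"))   # /podcasts
print(redirect_target("GET /podcasts?x=a//b HTTP/1.1"))  # None
```

The second call returns None because the double slash sits in the query string, which [^?] cannot cross.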
Aside: Note that this will not work properly if the preceding URL-path maps to a physical directory, since it will result in two redirects. eg. /directory//something to /directory to /directory/ (by mod_dir). In this case, you should avoid removing the first trailing slash.
You should test first with a 302 (temporary) redirect to avoid any potential caching issues and only change to a 301 (permanent) redirect when you are sure it's working as intended. You should clear your browser cache before testing.
A look at your existing rule...
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{REQUEST_URI} ^(.*?)(/{2,})(.*)$
RewriteRule . %1/%3 [R=301,L]
This code is intended to reduce multiple slashes to single slashes in the URL-path, not remove the double slash and remaining path entirely. eg. /podcasts//rebt to /podcasts/rebt. However, since it checks against the REQUEST_URI server variable (which can change throughout the request) it may not work as intended.
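For comparison, here is a small Python sketch of what that regex and the %1/%3 substitution do to a path. It models only the pattern, not how REQUEST_URI may change during a request; collapse_slashes and max_redirects are illustrative names.

```python
import re

# The CondPattern from the existing rule.
MULTI_SLASH = re.compile(r"^(.*?)(/{2,})(.*)$")

def collapse_slashes(url_path, max_redirects=10):
    """Follow the rule's redirects until no run of slashes remains."""
    hops = []
    for _ in range(max_redirects):
        match = MULTI_SLASH.match(url_path)
        if match is None:
            break
        # Equivalent of the %1/%3 substitution.
        url_path = f"{match.group(1)}/{match.group(3)}"
        hops.append(url_path)
    return hops

print(collapse_slashes("/podcasts//rebt"))  # ['/podcasts/rebt']
print(collapse_slashes("/a//b///c"))        # ['/a/b///c', '/a/b/c']
```

Each hop corresponds to one external redirect, so a path with several runs of slashes needs several redirects before it is clean.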
Also, the condition that checks against the REQUEST_METHOD would seem to be redundant, unless you are erroneously POSTing to double-slashed URLs internally? A 301 redirect removes any POST data (since the browser converts it to GET) - hence why the check may be necessary in certain cases.

Rewrite every request to one file with url as parameter

I am coding a small CMS in PHP and need to redirect all requests to that file (called cms.php in my case). For example
/~ps0ke/ -> /~ps0ke/cms.php?path=index.html
/~ps0ke/projects/cms.html -> /~ps0ke/cms.php?path=projects/cms.html
and so on. There is also a lang parameter that is set if en/ is preceding the directory. This should not be of importance because my problem existed before I added multi-lingual support. Right now I am using Apache and the following .htaccess to achieve the rewrite:
RewriteEngine On
RewriteBase /~ps0ke/
# Serve index.html via cms.php when base dir or index.html is requested. Also
# set the language.
RewriteRule ^((en)/)?(index.html)?$ cms.php?lang=$2&path=index.html [NC,L]
# Serve everything else via cms.php. Also set the language.
# Serving from the page subdirectory is due to a problem with all-wildcard
# RewriteRule. This might be fixed.
RewriteRule ^((en)/)?page/(.*)$ cms.php?lang=$2&path=$3 [NC,L,B]
You may notice that there is an additional page/ in between the RewriteBase and the actual path. I am doing this because simply matching for
RewriteRule ^((en)/)?(.*)$ cms.php?lang=$2&path=$3 [NC,L,B]
does not work. I don't understand why. When I use the rule as above, outputting $_GET results in
Array
(
[lang] =>
[path] => cms.php
)
Regardless of the actual GET path, the path GET-Variable is always set to the script's name. And I just don't understand why.
The reason I don't want to have the page/ prefix included is that leaving it out maintains backwards compatibility. The CMS is specialized in serving a normal file structure and builds its navigation etc. just from the file system. Therefore it would be nice to have the real file structure represented in the GET path, so that the links would still work even if someone removes the CMS again.
For easier reference, here are the Apache manual entries for the flags used:
NC|nocase
Use of the [NC] flag causes the RewriteRule to be matched in a
case-insensitive manner. That is, it doesn't care whether letters
appear as upper-case or lower-case in the matched URI.
B (escape backreferences)
The [B] flag instructs RewriteRule to escape non-alphanumeric
characters before applying the transformation.
L|last
The [L] flag causes mod_rewrite to stop processing the rule set. In
most contexts, this means that if the rule matches, no further rules
will be processed. This corresponds to the last command in Perl, or
the break command in C. Use this flag to indicate that the current
rule should be applied immediately without considering further rules.
Any help (a fix or an explanation) is appreciated! Thanks in advance!
You are running into this problem because your rules are executing twice. You can stop it by excluding all resources (js, images, css etc.) from rewriting and by not letting the rules run a second time.
Have your rules like this:
RewriteEngine On
RewriteBase /~ps0ke/
# avoid any rules for resources and 2nd time:
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_URI} \.(?:jpe?g|gif|bmp|png|tiff|css|js)$ [NC,OR]
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule ^ - [L]
# Serve index.html via cms.php when base dir or index.html is requested. Also
# set the language.
RewriteRule ^((en)/)?(index.html)?$ cms.php?lang=$2&path=index.html [NC,L,QSA]
# Serve everything else via cms.php. Also set the language.
# Serving from the page subdirectory is due to a problem with all-wildcard
# RewriteRule. This might be fixed.
RewriteRule ^((en)/)?(.*)$ cms.php?lang=$2&path=$3 [NC,L,QSA]

.htaccess mod_rewrite linking to wrong page

I have in my .htaccess the following code:
RewriteEngine On
RewriteRule ^/?([^/\.]+)/?$ $1.php [L]
RewriteRule ^/?([^/\.]+).php$ $1/ [R,L]
RewriteRule ^/?([^/\.]+)/?$ $1.php [L] is working fine. What this is doing is taking a url like http://www.example.com/whatever and making it read the page as http://www.example.com/whatever.php.
However, what I'd like to be able to do is take a url like http://www.example.com/whatever.php and automatically send it to http://www.example.com/whatever, hence the second line of the code. However, this isn't working. What it's doing now is, as soon as it comes across a link ending in .php, the url becomes http://localhost/C:/Sites/page/whatever/, pulling a 403: Forbidden page.
All I want to know is what I can do so that http://www.example.com/whatever.php will be read as http://www.example.com/whatever, and that if http://www.example.com/whatever.php is entered into the URL bar, it will automatically redirect to http://www.example.com/whatever.
Does that make any sense?
EDIT
Ok, so it appears I wasn't all too clear.. basically, I want /whatever/ to read as whatever.php while the URL still stays as /whatever/, right? However, if the URL was /whatever.php, I want it to actually redirect the users URL to /whatever/, and then once again read it as whatever.php. Is this possible?
If your rules are inside an .htaccess file, you can omit the leading slash when you match against a URI:
RewriteRule ^([^/\.]+)/?$ /$1.php [L]
Also note that a leading slash is included in the target (/$1.php), this makes sure /whatever/ gets rewritten to /whatever.php. When you redirect, if you are missing this leading slash, apache prepends the document root to it. Thus /whatever.php gets redirected to the document root C:/Sites/page/whatever/. Even if you include the leading slash, this will never work because you're going to cause a redirect loop:
1. Enter "http://www.example.com/whatever.php" in your address bar
2. apache redirects you to "http://www.example.com/whatever/"
3. apache gets the URI whatever/ and applies the first rule and the URI gets rewritten to /whatever.php
4. The URI gets put through the rewrite engine again
5. the URI /whatever.php matches the second rule and redirects the browser to "http://www.example.com/whatever/"
6. repeat steps 3-5
You need to add a condition that the actual request is for /whatever.php:
RewriteCond %{THE_REQUEST} ^(GET|POST|HEAD)\ /([^/\.]+)\.php
RewriteRule ^ /%2/ [R,L]
So altogether, you'll have:
RewriteEngine On
RewriteRule ^([^/\.]+)/?$ /$1.php [L]
RewriteCond %{THE_REQUEST} ^(GET|POST|HEAD)\ /([^/\.]+)\.php
RewriteRule ^ /%2/ [R,L]
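The reason this breaks the loop is that THE_REQUEST keeps the line the browser originally sent, even after the internal rewrite to /whatever.php. A rough Python sketch of the condition follows; external_redirect is a hypothetical helper, and the escaped space from the CondPattern is written as a plain space here.

```python
import re

# Same idea as the RewriteCond: method, a space, then a single path
# segment without slashes or dots, followed by ".php".
THE_REQUEST_RE = re.compile(r"^(GET|POST|HEAD) /([^/.]+)\.php")

def external_redirect(the_request):
    """Return the redirect target (/%2/) or None if the condition fails."""
    match = THE_REQUEST_RE.match(the_request)
    return f"/{match.group(2)}/" if match else None

# The browser asked for the .php URL directly: redirect.
print(external_redirect("GET /whatever.php HTTP/1.1"))  # /whatever/

# The browser asked for /whatever/. The URI is rewritten internally to
# /whatever.php, but the request line is unchanged, so no redirect.
print(external_redirect("GET /whatever/ HTTP/1.1"))     # None
```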
You're making a relative path substitution in a per-directory context (.htaccess is a per-directory context). This requires RewriteBase. Per-directory rewrites are done in a later stage of processing, when URLs have been mapped to paths. But the rewrite must produce a URL, which is processed again. I think without the RewriteBase to supply the URL prefix, you end up with a filesystem prefix instead of the URL. That may be why you're getting the C:/Sites thing. Try RewriteBase. But after a correct RewriteBase to specify the correct URL prefix to be tacked in front to the relative rewritten part, I'm afraid you will have the rewrite loop, because you're rewriting whatever.php to whatever; and whatever to whatever.php.
Reference: http://httpd.apache.org/docs/current/rewrite/tech.html

What's going on with my mod_rewrite?

I have a simple mod_rewrite system set up on my site which basically converts
http://site.com/file -> http://site.com/file.php
Here's the .htaccess file
Options -MultiViews
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.site.com
RewriteRule ^(.*)$ http://site.com/$1 [R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z]+)/?$ http://site.com/$1.php [L]
This was working for a long time and then a couple of days ago I realized that while the RewriteRule was working, it was actually changing my URL in the status bar.
For instance, it would redirect /photos to /photos.php, but it would also change the URL to show the .php. This has never happened before and I'm not sure what happened to trigger the change.
Any ideas?
The first rewrite rule needs the [L] flag. From the mod_rewrite documentation for the [R] flag:
You will almost always want to use [R] in conjunction with [L] (that is, use [R,L]) because on its own, the [R] flag prepends http://thishost[:thisport] to the URI, but then passes this on to the next rule in the ruleset, which can often result in 'Invalid URI in request' warnings.
In this case, you don't get a warning, but appending the ".php" extension happens before issuing the redirect rather than when the second, redirected request comes in.
Also, remove the scheme and domain name from the substitution in the second rewrite rule. A full URL can cause an implicit redirect. From the documentation for RewriteRule:
The Substitution of a
rewrite rule is the string that replaces the original URL-path that
was matched by Pattern. The Substitution may
be a:
[...]
Absolute URL
If an absolute URL is specified,
mod_rewrite checks to see whether the
hostname matches the current host. If it does, the scheme and
hostname are stripped out and the resulting path is treated as
a URL-path. Otherwise, an external redirect is performed for
the given URL. To force an external redirect back to the
current host, see the [R] flag below.
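The documented decision can be sketched in Python as follows. This is a simplified model under stated assumptions: classify_substitution is an invented name, and the real module's comparison also involves details such as ports and the server's configured name, which are ignored here.

```python
from urllib.parse import urlsplit

def classify_substitution(substitution, current_host):
    """Decide whether a substitution stays internal or becomes a redirect."""
    parts = urlsplit(substitution)
    if not parts.scheme:
        # Plain URL-path or file path: always an internal rewrite.
        return ("internal", substitution)
    if parts.hostname == current_host.lower():
        # Host matches: scheme and hostname are stripped.
        return ("internal", parts.path or "/")
    # Different host: external redirect to the full URL.
    return ("redirect", substitution)

print(classify_substitution("http://site.com/photos.php", "site.com"))
print(classify_substitution("http://site.com/photos.php", "www.site.com"))
```

With a path-only substitution such as /$1.php the question of which host is "current" never comes up, which is why dropping the scheme and domain is the safer choice.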

Apache URL Rewriting

I am trying to get URL rewriting to work on my website. Here is the contents of my .htaccess:
RewriteEngine On
RewriteRule ^blog/?$ index.php?page=blog [L]
RewriteRule ^about/?$ index.php?page=about [L]
RewriteRule ^portfolio/?$ index.php?page=portfolio [L]
#RewriteRule ^.*$ index.php?page=blog [L]
Now the 3 uncommented rewrite rules work perfectly, if I try http://www.mysite.com/blog/, I get redirected to http://www.mysite.com/index.php?page=blog, the same for "about" and "portfolio". However, if I mistype blog, say I try http://www.mysite.com/bloh/, then obviously I get a 404 error. The last rule, the commented one, was to help prevent that. Any URL should get redirected to the blog, but of course this rule is still parsed even if we have successfully used a previous one, so I used the "last" flag ([L]). If I uncomment my last rule, anything, including blog, about, and portfolio, redirect to blog. Shouldn't the "last" flag stop the execution as soon as it finds a matching rule?
Thanks.
Yes, the Last flag means it won't apply any of the rules following this rule in this request.
After rewriting the URL, it makes an internal request using the new rewritten URL which would match your last RewriteRule and thus your redirects go into an infinite loop.
Use the RewriteCond directive to limit rewriting to URLs that don't start with index.php, and you should be fine.
You could add a condition like:
RewriteCond %{REQUEST_URI} !^/index\.php
I'll also mention that using RewriteRule ^.*$ is a good way to break all of your media requests (css, js, images) as well. You might want to add some conditions like:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
To make sure you're not trying to rewrite actual files or directories that exist on your server. Otherwise they'll be unreachable unless index.php serves those too!
From apache's mod_rewrite docs
'last|L' (last rule)
Stop the rewriting process here and don't apply any more rewrite
rules. This corresponds to the Perl
last command or the break command in
C. Use this flag to prevent the
currently rewritten URL from being
rewritten further by following rules.
Remember, however, that if the
RewriteRule generates an internal
redirect (which frequently occurs when
rewriting in a per-directory context),
this will reinject the request and
will cause processing to be repeated
starting from the first RewriteRule.
You could use
ErrorDocument 404 /index.php?page=blog
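but you should be aware that with a local path like this the response still carries the 404 status code unless index.php sets another one (only a full URL in ErrorDocument makes Apache send a redirect instead), and I don't know if that is such a good practice.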
After you [L]eave processing for the request, the whole processing runs again for the new (rewritten) URL. You could get out of that loop by using this before your other rules:
RewriteRule ^index\.php$ - [L]
which means "for index.php, don't rewrite and leave processing."
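The effect of such a guard rule can be checked with a small Python model of the restart behaviour. It is a sketch only: final_url, the rule lists and max_passes are illustrative, every rule is assumed to carry [L], and "-" stands for "no substitution".

```python
import re

def final_url(path, rules, max_passes=10):
    """Apply the first matching rule per pass, restarting after each rewrite."""
    query = ""
    for _pass in range(max_passes):
        hit = next((sub for pat, sub in rules if re.search(pat, path)), None)
        if hit is None or hit == "-":
            break  # nothing matched, or the guard rule left the URL alone
        new_path, _, new_query = hit.partition("?")
        if (new_path, new_query) == (path, query):
            break  # the pass changed nothing
        path, query = new_path, new_query
    return f"{path}?{query}" if query else path

PAGES = [
    (r"^blog/?$", "index.php?page=blog"),
    (r"^about/?$", "index.php?page=about"),
    (r"^portfolio/?$", "index.php?page=portfolio"),
    (r"^.*$", "index.php?page=blog"),  # the catch-all from the question
]
GUARDED = [(r"^index\.php$", "-")] + PAGES

print(final_url("about/", PAGES))    # index.php?page=blog
print(final_url("about/", GUARDED))  # index.php?page=about
print(final_url("bloh/", GUARDED))   # index.php?page=blog
```

Without the guard, the second pass hands index.php to the catch-all, which is why every page ends up as the blog.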