Rewrite every request to one file with url as parameter - apache

I am coding a small CMS in PHP and need to redirect all requests to that file (called cms.php in my case). For example
/~ps0ke/ -> /~ps0ke/cms.php?path=index.html
/~ps0ke/projects/cms.html -> /~ps0ke/cms.php?path=projects/cms.html
and so on. There is also a lang parameter that is set if en/ precedes the directory. This should not matter, because my problem existed before I added multilingual support. Right now I am using Apache and the following .htaccess to achieve the rewrite:
RewriteEngine On
RewriteBase /~ps0ke/
# Serve index.html via cms.php when base dir or index.html is requested. Also
# set the language.
RewriteRule ^((en)/)?(index.html)?$ cms.php?lang=$2&path=index.html [NC,L]
# Serve everything else via cms.php. Also set the language.
# Serving from the page subdirectory is due to a problem with all-wildcard
# RewriteRule. This might be fixed.
RewriteRule ^((en)/)?page/(.*)$ cms.php?lang=$2&path=$3 [NC,L,B]
You may notice that there is an additional page/ between the RewriteBase and the actual path. I am doing this because simply matching for
RewriteRule ^((en)/)?(.*)$ cms.php?lang=$2&path=$3 [NC,L,B]
does not work, and I don't understand why. When I use the rule above, outputting $_GET results in
Array
(
[lang] =>
[path] => cms.php
)
Regardless of the actual GET path, the path GET-Variable is always set to the script's name. And I just don't understand why.
The reason I don't want the page/ prefix included is backwards compatibility. The CMS specializes in serving a normal file structure and builds its navigation etc. just from the file system. It would therefore be nice to have the real file structure represented in the GET path, so that even if someone removes the CMS again, the links would still work.
For easier reference, here are the Apache manual entries for the flags used:
NC|nocase
Use of the [NC] flag causes the RewriteRule to be matched in a
case-insensitive manner. That is, it doesn't care whether letters
appear as upper-case or lower-case in the matched URI.
B (escape backreferences)
The [B] flag instructs RewriteRule to escape non-alphanumeric
characters before applying the transformation.
L|last
The [L] flag causes mod_rewrite to stop processing the rule set. In
most contexts, this means that if the rule matches, no further rules
will be processed. This corresponds to the last command in Perl, or
the break command in C. Use this flag to indicate that the current
rule should be applied immediately without considering further rules.
Any help (a fix or an explanation) is appreciated! Thanks in advance!

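You are running into this problem because your rules are executing twice: after the first rewrite to cms.php, the rule set runs again and the wildcard rule matches cms.php itself. You can stop this by excluding resources (JS, images, CSS, etc.) from rewriting and by not letting the rules run a second time.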
Have your rules like this:
RewriteEngine On
RewriteBase /~ps0ke/
# avoid any rules for resources and 2nd time:
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_URI} \.(?:jpe?g|gif|bmp|png|tiff|css|js)$ [NC,OR]
RewriteCond %{ENV:REDIRECT_STATUS} 200
RewriteRule ^ - [L]
# Serve index.html via cms.php when base dir or index.html is requested. Also
# set the language.
RewriteRule ^((en)/)?(index.html)?$ cms.php?lang=$2&path=index.html [NC,L,QSA]
# Serve everything else via cms.php. Also set the language.
# Serving from the page subdirectory is due to a problem with all-wildcard
# RewriteRule. This might be fixed.
RewriteRule ^((en)/)?(.*)$ cms.php?lang=$2&path=$3 [NC,L,QSA]
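On Apache 2.4 or later, the END flag discussed in the related questions below might replace the REDIRECT_STATUS guard. The following is only a sketch under that assumption, reusing the /~ps0ke/ base and the cms.php parameters from the question; it has not been tested against the asker's setup:

```apache
RewriteEngine On
RewriteBase /~ps0ke/

# Static resources are passed through untouched (assumes they exist on disk).
RewriteRule \.(?:jpe?g|gif|bmp|png|tiff|css|js)$ - [NC,END]

# Base directory or index.html, with an optional en/ prefix.
RewriteRule ^((en)/)?(index\.html)?$ cms.php?lang=$2&path=index.html [NC,END,QSA]

# Everything else. END stops all further passes, so the rewritten
# cms.php URL is never fed back into this wildcard rule.
RewriteRule ^((en)/)?(.*)$ cms.php?lang=$2&path=$3 [NC,END,QSA]
```

Note that a direct request for cms.php would still be caught by the last rule and passed as path=cms.php, so the script may want to reject that value.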

Related

Log image filename that's cached by external cdn using htaccess

I want to keep a log of image file names whenever a specific cdn caches our images but I can't quite get it. Right now, my code looks something like:
RewriteCond %{HTTP_USER_AGENT} Photon/1.0
RewriteRule ^(.*)$ log.php?image=$1 [L]
The above always logs the image as being "log.php" even if I'm making the cdn cache "example.jpg" and I thoroughly don't understand why.
"The above always logs the image as being "log.php" even if I'm making the cdn cache "example.jpg" and I thoroughly don't understand why."
Because in .htaccess the rewrite engine loops until the URL passes through unchanged (despite the presence of the L flag) and your rule also matches log.php (your rule matches everything) - so this is the "image" that is ultimately logged. The L flag simply stops the current pass through the rewrite engine.
For example:
Request /example.jpg
Request is rewritten to log.php?image=example.jpg
Rewrite engine starts over, passing /log.php?image=example.jpg to the start of the second pass.
Request is rewritten to log.php?image=log.php by the same RewriteRule directive.
Rewrite engine starts over, passing /log.php?image=log.php to the start of the third pass.
Request is rewritten to log.php?image=log.php (again).
URL has not changed in the last pass - processing stops.
You need to make an exception so that log.php itself is not processed. Or, state that all non-.php files are processed (instead of everything). Or, if only images are meant to be processed then only check for images.
For example:
# Log images only
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule ^(.+\.(?:png|jpg|webp|gif))$ log.php?image=$1 [L]
Remember to backslash-escape literal dots in the regex.
Or,
# Log Everything except log.php itself
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteCond %{REQUEST_URI} ^/(.+)
RewriteRule !^log\.php$ log.php?image=%1 [L]
In the last example, %1 refers to the captured subpattern in the preceding CondPattern. I only did it this way, rather than using REQUEST_URI directly since you are excluding the slash prefix in your original logging directive (ie. you are passing image.jpg to your script when /image.jpg is requested). If you want to log the slash prefix as well, then you can omit the 2nd condition and pass REQUEST_URI directly. For example:
# Log Everything except log.php itself (include slash prefix)
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule !^log\.php$ log.php?image=%{REQUEST_URI} [L]
Alternatively, on Apache 2.4+ you can use the END flag instead of L to force the rewrite engine to stop and prevent further passes through the rewrite engine. For example:
RewriteCond %{HTTP_USER_AGENT} Photon/1\.0
RewriteRule (.+) log.php?image=$1 [END]

Why does mod_rewrite ignore my [L] flag?

This is my .htaccess file. It should deliver static files from assets folder if the url matches them. Otherwise, everything should be redirected to index.php.
Note that the url doesn't contain assets as a segment here. So example.com/css/style.css directs to assets/css/style.css.
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [L]
# other requests to index.php
RewriteRule !^assets/ index.php [L]
Unfortunately, urls like example.com/assets/css/style.css also deliver the file, since for that url none of my rules applies and Apache's default behavior is applied which delivers the file.
So I tried changing the last line to this. I thought that this would work since the [L] flag in the rule above should stop execution for asset urls and deliver them.
RewriteRule ^(.*)$ index.php [L]
Instead, now all requests are redirected to index.php, even static assets like example.com/css/style.css. Why does the flag not stop execution of the rewrite rules, and how can I fix my problem?
I found the solution on the pages of the official documentation.
If you are using RewriteRule in either .htaccess files or in
<Directory> sections, it is important to have some understanding of
how the rules are processed. The simplified form of this is that once
the rules have been processed, the rewritten request is handed back to
the URL parsing engine to do what it may with it. It is possible that
as the rewritten request is handled, the .htaccess file or <Directory>
section may be encountered again, and thus the ruleset may be run
again from the start. Most commonly this will happen if one of the
rules causes a redirect - either internal or external - causing the
request process to start over.
It is therefore important, if you are using RewriteRule directives in
one of these contexts, that you take explicit steps to avoid rules
looping, and not count solely on the [L] flag to terminate execution
of a series of rules, as shown below.
An alternative flag, [END], can be used to terminate not only the
current round of rewrite processing but prevent any subsequent rewrite
processing from occurring in per-directory (htaccess) context. This
does not apply to new requests resulting from external redirects.
To fix my problem, I changed the [L] flags to [END].
RewriteEngine on
# disable directory browsing
Options -Indexes
# static assets
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [END]
# other requests to index.php
RewriteRule !^asset/ index.php [END]
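If direct requests such as example.com/assets/css/style.css should also end up in index.php, END may make the catch-all from the earlier attempt usable. A minimal sketch, assuming Apache 2.4+ and the same directory layout as in the question:

```apache
RewriteEngine on
Options -Indexes

# Serve a static file from assets/ when one exists for the requested path.
RewriteCond %{DOCUMENT_ROOT}/assets/$1 -f
RewriteRule ^(.*)$ assets/$1 [END]

# Everything else, including direct /assets/... URLs, goes to index.php.
# Because the rule above ends with END, its result is not rewritten again.
RewriteRule ^ index.php [END]
```

With this layout, /css/style.css is served from assets/css/style.css, while /assets/css/style.css fails the file test (there is no assets/assets/css/style.css) and falls through to index.php.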

How to prevent mod_rewrite from rewriting URLs more than once?

I want to use mod_rewrite to rewrite a few human-friendly URLs to arbitrary files in a folder called php (which is inside the web root, since mod_rewrite apparently won't let you rewrite to files outside the web root).
/ --> /php/home.php
/about --> /php/about_page.php
/contact --> /php/contact.php
Here are my rewrite rules:
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^$ php/home.php [L]
RewriteRule ^about$ php/about_page.php [L]
RewriteRule ^contact$ php/contact.php [L]
However, I also want to prevent users from accessing files in this php directory directly. If a user enters any URL beginning with /php, I want them to get a 404 page.
I tried adding this extra rule at the end:
RewriteRule ^php php/404.php [L]
...(where 404.php is a file that outputs 404 headers and a "Not found" message.)
But when I access / or /about or /contact, I always get redirected to the 404. It seems the final RewriteRule is applied even to the internally rewritten URLs (as they now all start with /php).
I thought the [L] flag (on the first three RewriteRules) was supposed to prevent further rules from being applied? Am I doing something wrong? (Or is there a smarter way to do what I'm trying to do?)
The [L] flag should be used only in the last rule.
L (Last Rule) stops the rewriting process at that point and does not apply any more rewriting rules, and that is why you are facing issues.
I had a similar problem. I have a content management system written in PHP and based on the Model-View-Controller paradigm. Its most basic part is mod_rewrite. I have successfully prevented access to PHP files globally. The trick is called THE_REQUEST.
What's the problem?
The rewrite module rewrites the URI. If the URI matches a rule, it is rewritten and the other rules are applied to the new, rewritten URI. But if the matched rule ends with [L], the engine does not in fact terminate; it starts again. The new URI then no longer matches the rule ending with [L], so processing continues and it matches the last rule. The result? The programmer starts saying bad words at the unexpected 404 error page. However, the computer does what you say, not what you want. I had this in my .htaccess file:
RewriteEngine On
RewriteBase /
RewriteRule ^plugins/.* pluginLoader.php [L]
RewriteCond %{REQUEST_URI} \.php$
RewriteRule .* index.php [L]
That's wrong. Even the URIs beginning with plugins/ are rewritten to index.php.
Solution
You need to apply the rule if and only if the original (not rewritten) URI matches it. Regrettably, mod_rewrite does not provide a variable containing the original URI, but it does provide the THE_REQUEST variable, which contains the first line of the HTTP request. This variable is invariant: it does not change while the rewrite engine is working.
...
RewriteCond %{THE_REQUEST} \s.*\.php\s
RewriteRule \.php$ index.php [L]
The regular expression is different. It is applied not to the URI alone but to the entire first line of the request, that is, to something like GET /script.php HTTP/1.1. This time, however, the critical rule is applied only if the user explicitly requests a PHP script directly. The rewritten URI is not used.
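Applied to the php/ folder from the question, the same idea might look like the sketch below. It assumes the .htaccess file sits in the web root and reuses the file names from the question; the exact THE_REQUEST pattern is an illustration, not part of either original post:

```apache
Options +FollowSymlinks
RewriteEngine On

RewriteRule ^$ php/home.php [L]
RewriteRule ^about$ php/about_page.php [L]
RewriteRule ^contact$ php/contact.php [L]

# Only send the 404 page when the client itself asked for /php...
# THE_REQUEST still holds the original request line (e.g. "GET /about HTTP/1.1")
# on later passes, so internally rewritten URLs are not caught here.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/php
RewriteRule ^php php/404.php [L]
```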

How can you ignore the end of a URL using mod_rewrite?

I'd like to structure my website like this:
domain.com/person/edit/1
domain.com/person/edit/2
domain.com/person/edit/3
etc.
I have a page to which all these requests should go:
domain.com/person/edit.html
The JavaScript will look at the trailing part of the url when the page is loaded so I want the server to internally ignore it.
I've got this rewrite rule:
RewriteRule ^person/view/(.*)$ person/view.html [L]
I'm sure that I'm missing something obvious but when I visit one of the pages above I get this 404 message:
The requested URL /person/view.html/1 was not found on this server.
As far as I understood it the [L] means that if this rule applies Apache should stop rewriting and serve up the alternate page. Instead it seems to be applying the rule at the earliest possible moment and then appending the rest of the unmatched url to the re-written one.
How do I get these re-writes to work properly?
"As far as I understood it the [L] means that if this rule applies Apache should stop rewriting and serve up the alternate page."
Well .. [L] flag tells Apache to stop checking other rules .. and rewrite goes to next iteration .. where it again checks against all rules again (that is how it works).
Try this "recipe" (put it somewhere near the top of your .htaccess):
Options +FollowSymLinks -MultiViews
# activate rewrite engine
RewriteEngine On
# Do not do anything for already existing files
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule .+ - [L]
Another idea to try -- add DPI flag to your [L]: [L,DPI]
If Options will not help, then rewrite rule should. But it all depends on your Apache's configuration. If the above does not work -- please post your whole .htaccess (update your question).
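For reference, the DPI suggestion could be sketched as follows. This assumes an Apache version that supports the DPI (discardpath) flag and uses the person/edit paths from the stated goal rather than the person/view paths in the posted rule:

```apache
Options +FollowSymLinks -MultiViews
RewriteEngine On

# Send /person/edit/<anything> to the static page. DPI discards the
# trailing path info so it is not re-appended to the rewritten target.
RewriteRule ^person/edit/ person/edit.html [L,DPI]
```

The JavaScript on the page can still read the trailing segment from the browser's address bar, since the rewrite is internal and the visible URL does not change.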

mod_rewrite for all pages (including subfolders) on a site to a single php page

I am in the process of converting my site with many static html pages to a site driven by a database. My problem is that I don't want to lose what google has already indexed, so I would like to rewrite requests to be sent through a php script which lookup the filepath for content in the database. My understanding is that a mod_rewrite would best serve this purpose, unfortunately I have never used it, so I am a bit lost.
What I have:
www.domain.com/index.html
www.domain.com/page.html?var=123&flag=true
www.domain.com/folder/subfolder/
www.domain.com/folder/subfolder/index.html
www.domain.com/folder/subfolder/new/test.html
www.domain.com/folder/subfolder/new/test.html?var=123&flag=true
What I want (I also probably need to urlencode the path)(passing the full uri is also ok):
www.domain.com/index.php?page=/index.html OR www.domain.com/index.php?page=www.domain.com/index.html
www.domain.com/index.php?page=/page.html?var=123&flag=true
www.domain.com/index.php?page=/folder/subfolder/
www.domain.com/index.php?page=/folder/subfolder/index.html
www.domain.com/index.php?page=/folder/subfolder/new/test.html
www.domain.com/index.php?page=/folder/subfolder/new/test.html?var=123&flag=true
Here's my first go at it:
RewriteEngine On # Turn on rewriting
RewriteCond %{REQUEST_URI} .* # Do I even need this?
^(.*)$ /index.php?page=$1
Ideas? Thanks in advance :)
Edit:
So I tried implementing Ragnar's solution, but I kept getting 500 errors when I use 'RewriteCond $1' or include the '/' on the last line. I have set up a test.php file which echoes $_GET["page"], so I can check whether the rewrite is working correctly. So far I can get some of the correct output (but only when I am not in root), for example:
RewriteEngine on
RewriteRule ^page/(.*)$ test.php?page=$1 [L]
If I visit the page http://www.domain.com/page/test/subdirectory/page.html?var=123 it will output 'test/subdirectory/page.html' (missing the querystring, which I need). However, if I use this example:
RewriteEngine on
RewriteRule ^(.*)$ test.php?page=$1 [L]
If I visit http://www.domain.com/page/test/subdirectory/page.html?var=123 it will only output 'test.php' which is thoroughly confusing. Thoughts?
Edit #2:
It seems I've been going about this all wrong. I just wanted the ability to use full uri in my php script page. The final working solution to do what I want is the following:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /test.php
Then in my php script, I can use $_SERVER['REQUEST_URI'] to get what I need. I knew this should have been easier than what I was trying...
I would recommend you to look into the Apache URL Rewriting Guide, it contains extensive information about rewriting with examples.
If I understand you correctly, you should be able to use something like this
RewriteEngine on
RewriteCond $1
RewriteRule ^(.*)$ index.php/?page=$1 [L]
Which is very similar code to the one you posted. If you want better information, be specific about your problem.
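Two details from the question's edits, the missing query string and the 'test.php' output, might be addressed along these lines. This is a sketch only, assuming the .htaccess file is in the document root and index.php is the target script:

```apache
RewriteEngine On

# Skip index.php itself so a later pass does not rewrite it again
# (otherwise page would end up as "index.php").
RewriteCond %{REQUEST_URI} !^/index\.php$

# QSA appends the original query string (e.g. var=123&flag=true)
# after page=..., instead of discarding it.
RewriteRule ^(.*)$ index.php?page=$1 [QSA,L]
```

With this, a request for /folder/subfolder/new/test.html?var=123&flag=true would reach index.php with page, var and flag as separate GET parameters rather than one encoded string.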
There's no need for so many lines, it only complicates things.
All you need is 2 lines in .htaccess:
rewriteengine on
#rewriterule-: ar1 path =relative. ar2 if relative, that to rewritebase.
rewriterule !^foo/bar/index\.php$ /foo/bar/index.php
#..assert ar1 dismatches correct url
In PHP
You can output the first input of rewriterule in PHP using:
<?=$_SERVER['REQUEST_URI'];
That will give you all the power and allow you to do all things. Simply parse $_SERVER["REQUEST_URI"] manually and you can echo totally different pages depending on the value of $_SERVER["REQUEST_URI"].
Sec bugs
Note that your server may do pathing or buggy pathing before rewriterule. (You can't override this behavior without server privileges.) Eg if the user visits /foo//// you may only see /foo/ or /foo. And eg if the user visits ///foo you may only see /foo. And eg if the user visits /a/../foo you may only see /foo. And eg if the user visits /a//b/../../foo you may only see /foo or /a/foo [because buggy production servers treat multiple / as distinct in the context of .., no kidding].
With circuit break
Rewrite circuit breaks on cin identical to htaccess∙parentfolder∙relative interpreted rewriterule∙arg2. (First off, personally I'd disable circuit breaks to reduce rule complexity but there seems to be no way to do so.)
Circuit-break solution:
rewriteengine on
#rewriterule-: ar1 path =relative. ar2 if relative, that to rewritebase.
rewriterule ^ /foo/bar/index.php
#..circuit-breaking, so assert ar2 if absolute, htaccess parentfolder =root, else, htaccess parentfolder not in interpreted ar2.
Circuit break and rewritebase undesigned use
Circuit break needs either of:
arg2 [of rewriterule] ⇌ absolute. and htaccess parentfolder ⇌ root.
arg2 ⇌ relative. and that folder not in interpreted arg2.
So when that folder ≠ root, circuit break needs arg2 ⇌ relative. when arg2 ⇌ relative, circuit break needs⎾that folder ⇌ not in interpreted arg2⏋.
Say we need circuit break and a htaccess parentfolder that's in interpreted arg2, so we edit arg2 via rewritebase:
rewriteengine on
#rewriterule-: ar1 path =relative. ar2 if relative, that to rewritebase.
rewriterule ^ bar/index.php
#..circuit-breaking, so assert ar2 if absolute, htaccess parentfolder =root, else, htaccess parentfolder not in interpreted ar2.
rewritebase /foo
#..needs to be absolute [</> starting]. pathing [eg </../a/..> and </../..///ran/dom-f/oobar/./././../../..////////.//.>] allowed