Frontloading mod_rewrite rule is causing index.php to load twice - apache

I've been working on a project that uses a frontloader to handle all requests (Routing domain.com/args/go/here to Index.php?req=args/go/here), and it's worked very well... Or I should say, I thought it did - I recently added a new logger, and to test it I placed a test log message in index.php. This message was being written to my log file twice, every time I reloaded the page, and after much debugging I found the cause to be my .htaccess file - for whatever reason, it loads index.php twice for every request.
Here's my .htaccess:
RewriteEngine On
RewriteBase /site/beta/ #I added this after I discovered the bug
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^index\.php$ #This too. Doesn't work
RewriteRule ^(.*)$ index.php?args=$1 [L]
I've also tried:
FallbackResource /site/beta/index.php
Which both doesn't work (Index.php just doesn't load if you try to go to, say, 127.0.0.1/site/beta/admin/controls/ - but it does if you just go to /index.php), and still loads twice.
Is anyone able to help me? I spent a few hours in IRC, and no one could come up with a solution that worked. (The two above are the only ones suggested)

Are you completly sure it's a mod_rewrite bug? If you enable RewriteLog file with a high rewriteLogLevel (9) do you see the same requests handled 2 times?
For me every time I see the 'same request done 2 times' I think about another strange web bug: The empty IMG src bug.
If you have somewhere in your HTML an
<IMG SRC="">
or in one of the css (harder to find) a:
url()
Then you've got it. HTTP protocol dictate that an empty GET url (and an image or url() in css is a GET implicit request) MUST be a call to the same url as the one which render the original page (and it can be a POST as well if you get your page as a POST request).
There're really few reason to have a mod_rewrite responding 2 times to one single request. Check with Firebug or LiveHTTP Requests that you're not always sending the index.php request 2 times. Or test your server with a telnet-mode HTTP request, by hand, as this will certainly send only one request.

This can also be the browser trying to load (invisibly it seems unless you check the apache access logs) favicon.ico. I had the same issue until I put one in my sites root directory. I know the issue was resolved for the initial asker, I'm putting this here for people like myself looking for an answer to the same question.

Related

POST information getting lost in .htaccess redirect

So, I have a fully working CRUD. The problem is, because of my file structure, my URLs were looking something like https://localhost/myapp/resources/views/add-product.php but that looked too ugly, so after research and another post here, I was able to use a .htaccess file to make the links look like https://localhost/myapp/add-product (removing .php extension and the directories), and I'm also using it to enforce HTTPS. Now, most of the views are working fine, but my Mass Delete view uses POST information from a form on my index. After restructuring the code now that the redirect works, the Mass Delete view is receiving an empty array. If I remove the redirect and use the "ugly URLs" it works fine. Here's how my .htaccess file is looking like:
Options +FollowSymLinks +MultiViews
RewriteEngine On
RewriteBase /myapp/
RewriteRule ^resources/views/(.+)\.php$ $1 [L,NC,R=301]
RewriteCond %{DOCUMENT_ROOT}/myapp/resources/views/$1.php -f
RewriteRule ^(.+?)/?$ resources/views/$1.php [END]
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
I didn't actually write any of it, it's a mesh between answered questions and research. I did try to change the L flag to a P according to this post: Is it possible to redirect post data?, but that gave me the following error:
Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator at admin#example.com to inform them of the time this error occurred, and the actions you performed just before this error.
More information about this error may be available in the server error log.
Apache/2.4.52 (Win64) OpenSSL/1.1.1m PHP/8.1.2 Server at localhost Port 443
POST information getting lost in .htaccess redirect
You shouldn't be redirecting the form submission in the first place. Ideally, you should be linking directly to the "pretty" URL in your form action. If you are unable to change the form action in the HTML then include an exception in your .htaccess redirect to exclude this particular URL from being redirected.
Redirecting the form submission is not really helping anyone here. Users and search engines can still see the "ugly" URL (it's in the HTML source) and you are doubling the form submission that hits your server (and doubling the user's bandwidth).
"Redirects" like this are only for when search engines have already indexed the "ugly" URL and/or is linked to by external third parties that you have no control over. This is in order to preserve SEO, just like when you change any URL structure. All internal "ugly" URLs should have already been converted to the "pretty" version. The "ugly" URLs are then never exposed to users or search engines.
So, using a 307 (temporary) or 308 (permanent) status code to get the browser to preserve the request method across the redirect should not be necessary in the first place. For redirects like this it is common to see an exception for POST requests (because the form submission shouldn't be redirected). Or only target GET requests. For example:
RewriteCond %{REQUEST_METHOD} GET
:
Changing this redirect to a 307/8 is a workaround, not a solution. And if this redirect is for SEO (as it only should be) then this should be a 308 (permanent), not a 307 (temporary).
Aside:
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
Your HTTP to HTTPS redirect is in the wrong place. This needs to go as the first rule, or make sure you are redirecting to HTTPS in the current first rule and include this as the second rule, before the rewrite (to ensure you never get a double redirect).
By placing this rule last then any HTTP requests to /resources/views/<something>.php (or /<something>) will not be upgraded to HTTPS.

How to redirect 404 errors (and 403) to index.html with a 200 response

I am building a static website that uses JS to parse a URL in order to work out what to display.
I need every URL to actually open index.html where the JS can pull apart the path and act accordingly.
For example http://my.site/action/params will be parsed as an action with some parameters params.
Background, this will be served from AWS S3 via CloudFront using custom error redirection - and this works fine on AWS.
I am, however, trying to build a dev environment under Ubuntu running apache and want to emulate the redirection locally.
I have found a couple of pages that come close, but not quite.
This page shows how to do the redirect to a custom error page on the server housed in a file called "404". As 404 is the actual error response code, the example looks a bit confusing and I am having trouble modifying the example to point to index.html.
The example in the accepted answer suggests:
Redirect 200 /404
ErrorDocument 404 /404
which I have modified to:
Redirect 200 /index.html
ErrorDocument 404 /index.html
However this returns a standard 404 Not Found error page.
If I remove the Redirect line, leaving just the ErrorDocument line, I get the index.html page returned as required, but the https status response is still a 404 code where I need it to be a 200.
If I leave the Redirect line as per the example I actually get the same result as my modified version, so I suspect this is the line that is incorrect, but I can't figure it out.
(I'm using the Chrome Dev Tools console to see the status codes etc).
I think I have found a solution using Rewrite rules instead of error docs.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
The key I was missing in this approach seems to be not including an R=??? status response code at the end of the rewrite rule. It took me a while to find that!
As it uses mod_rewrite rather than defining error pages I assume that the mechanism is different to how CloudFront does it, but for my dev system needs it seems that the result is the same - which means I can work on the site without having to invalidate the CloudFront cache after every code change and upload.

HTTPS redirect fails with .htaccess rewrite for certain URL length

I have an .htaccess file for showing a default image if the requested URL does not exist. I simplified it to this:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . default.png [L]
Using HTTPS, this suddenly stopped working if the URL exceeds a certain length (connection closed).
HTTP always works.
It used to work like this for years and it still does on other servers.
It also seems that the kind of characters matter:
not working:
https://server.abc/images/01234567890123456789012345678901234567890123456789abc.png
https://server.abc/images/012345678901234567890123456789012345678901234567890123456789.png
working:
https://server.abc/images/01234567890123456789012345678901234567890123456789.png
https://server.abc/images/01234567890123456789012345678901234567890123456789123.png
https://server.abc/images/0123456789012345678901234567890123456789012345678912345.png
The redirect works if the condition is removed (second line), so it seems like it has something to do with REQUEST_FILENAME, HTTPS and the byte size (encoding?) of the filename/URL string.
This occurs with Apache/2.4.46 and macOS/10.15.7. It might have started after one of the latest security updates.
Any idea where this is coming from or what kind of configuration could cause this?
Thanks for your help!
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . default.png [L]
It's not clear why this would "fail" for only certain requests over HTTPS only. A "security" update (particularly if it involves mod_security) is a likely cause - although an unusual one.
However, you shouldn't really be doing it this way to begin with. This will result in a request for any non-existent URL being served /default.png with a "200 OK" response and potentially risk being indexed by search engines and abused by a malicious user.
What you are doing here is essentially setting a custom 404 response to an image, which you could do with the following instead and which will also return the "correct" 404 status.
ErrorDocument 404 /default.png
Now, any request that does not map to file (or directory) will be served the image /default.png but with a 404 "Not Found" HTTP response code, so search engines/bots get the "correct" response.
This also naturally gets around the REQUEST_FILENAME issue, assuming these "not working" URLs do ultimately result in a 404 and not some other response (due to the "security" update).

RewriteRule causes page to reload twice

I shaped two different RewriteRules for my page:
# Enable URL Rewriting
RewriteEngine on
# exclude followed stuff
RewriteRule ^(js|img|css|favicon\.ico|image\.php|anprobe|content|libs|flash\.php|securimage)/ - [L,QSA,S=2]
# conditions (REQUEST dont point # file|dir|link)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-F
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
# rules
RewriteRule ^(?!index\.php)brillen/(.*(brillen)|360|neu)/(.*)([a-zA-Z0-9]{5}-[a-zA-Z0-9]{5}(?!\.))(.*)$ /index.php/brillen/$1?art_id=$4&$5&%{QUERY_STRING} [NS,QSA,L]
RewriteRule ^(?!index\.php)(.*)$ /index.php/$1 [NS,QSA,L]
... and I'm encountering a strange problem, which lies in every request causing the page internally to load twice, which leads to the problem that db actions and email dispatching are also executed twice.
Does anyone have an idea concerning that?
Thanks in advance!
Note 1: All requested resources are valid and available according to the browser's resource tracking.
Note 2: May the problem originate in retaining and post-processing the PATH_INFO? (/index.php/$1 => /index.php/foo/bar/...)
The rewrite Engine cannot make a single HTTP request run twice. It routes the HTTP request for Apache to either a static file, a proxy function, or a module (like PHP) with alteration in the request. But it cannot clone the request and give it 2 times to apache.
When you have any "run twice" problem chances are that you are hit by the empty image url bug. In fact it's not really a bug it's a feature of HTML (at least before HTML5) and a feature of url-parsing.
If you get somewhere an empty GET url, HTML states that the browser should re-send the same query (the one that gave him the current page) with same parameters. This can make a POST request happen 2 times (if the requested 1st page were a POST). So where are these empty GET url? Most of the time you get either :
<IMG SRC="" ...> (in the HTML)
or:
url() (in the css)
or:
<script type="text/javascript" src=""></script>
<link rel="stylesheet" type="text/css" href=""> (in the HTML headers)
Read also #Jon answer about the favicon query. You should always test the result without browsers behaviours by using wget or telnet 80 queries.
Update: detailled explanations and followups available on this blog with HTML5 additions which should remove this behavior for modern browsers.
I had the same issue (or so I thought). It was caused by the request for favicon.ico, which I hadn't considered in my rewrite rule.
I had the same problem, caused because I did some url rewriting, and the script was being loaded twice, due to the fact that i did not add this:
RewriteRule ^(js|img|css|favicon\.ico)/ - [L,QSA,S=2]
This will stop the script from being loaded twice; it solved my problem.

Apache mod_perl handler/dispatcher returning control to apache

Is it possible to have an apache mod_perl handler, which receives all incoming requests and decides based upon a set of rules if this request is something it wants to act upon, and if not, return control to apache which would serve the request as normal?
A use-case:
A legacy site which uses
DirectoryIndex for serving index.html
(or similar) and default handlers for
perl scripts etc, is being given a
freshened up url-scheme
(django/catalyst-ish). A dispatcher
will have a set of urls mapped to
controllers that are dispatched based
on the incoming url.
However, the tricky part is having
this dispatcher within the same
namespace on the same vhost as the old
site. The thought is to rewrite the
site piece by piece, as a "update all"
migration gives no chance in testing
site performance with the new system,
nor is it feasible due to the sheer
size of the site.
One of the many problems, is that the dispatcher now receives all URLs as expected, but DirectoryIndex and static content (which is mostly served by a different host, but not everything) is not served properly. The dispatcher returns an Apache::Const::DECLINED for non-matching urls, but Apache does not continue to serve the request as it normally would, but instead gives the default error page. Apache does not seem to try to look for /index.html etc.
How can this be solved? Do you need to use internal redirects? Change the handler stack in the dispatcher? Use some clever directives? All of the above? Not possible at all?
All suggestions are welcome!
I have done a similar thing but a while back, so I might be a little vague:
I think you need to have the standard file handler (I believe this is done using the set-handler directive) as well as the perl handler in the stack
You might need to use PerlTransHandler or a similar one to hook into the filename/url mapping phase and make sure the next handler inline will pick the right file up off the filesystem.
Maybe you will have success using a mod_rewrite configuration which only does rewrite URLs to your dispatcher if a requested file does not exist in the file system. That way your new application acts as an overlay to the old application and can be replaced in successive steps by just removing old parts of the application during deployment of new parts.
This can be accomblished by a combination of RewriteCond and RewriteRule. Your new application needs to sit in a private "namespace" (location) not otherwise used in the old application.
I am not a mod_perl expert but with eg mod_php it could work like this:
RewriteEngine on
# do not rewrite requests into the new application(s) / namespaces
RewriteRule ^new_app/ - [L]
# do not rewrite requests to existing file system objects
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
# do the actual rewrite here
RewriteRule ^(.*)$ new_app/dispatcher.php/$1