perl "fake" a 404, but let apache handle it - apache

I tried searching for the answer to this but no luck. Basically I have an .htaccess file which specifies a script to handle 404s:
ErrorDocument 404 /cgi-bin/handle_errors.cgi?404
I have another script which handles page requests according to the query string via a rewrite rule:
RewriteRule ^/path/handler/(.*)$ /cgi-bin/path_handler?$1 [QSA,L]
such that
insaner.com/path/handler/123
would show the content generated by:
insaner.com/cgi-bin/path_handler?123
if some conditions are met for "123". If they aren't, I would like to issue a "404" status, but have it handled by Apache itself (which would then in fact be handled by /cgi-bin/handle_errors.cgi?404). So, is such a thing possible? I know I can just call /cgi-bin/handle_errors.cgi?404 from the script after printing the 404 status, but is there a way to get Apache to handle the 404? I.e., such that if I later comment out the line in .htaccess, Apache issues its standard 404 response?
Also, is
print "Status: 404\n\n";
enough for the browser? Or do I need to do:
print "Status: 404 Not Found\n\n";
or something like that?

Found this answer to the second part of my question in the CGI.pm documentation:
Note that the human-readable phrase is also expected to be present to
conform with RFC 2616, section 6.1.
EDIT:
This could be solved by having the path handler script print the Status: code then just printing the results of an LWP::UserAgent request (which might clobber important environment variables and data in the process). So it is doable, but is not an approach I would recommend. It is better to break out the error handler code into a module that the path handler can then call itself to generate the content, avoiding the extra step of involving apache, and allowing you to keep any variables and data (and handling that if need be). As my question originally asked, yes, both approaches would also allow things to continue working even if the line in .htaccess was commented out or removed.
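The module approach described above could look roughly like the following. This is only a minimal sketch, written in Python rather than Perl purely to illustrate the structure; the names render_error and handle_path and the validity check are hypothetical and not taken from the original scripts.

```python
import io
import sys

def render_error(status_code, reason, out=sys.stdout):
    """Shared error handler: emits the CGI Status header and an error body."""
    # The reason phrase is included alongside the numeric code.
    out.write(f"Status: {status_code} {reason}\r\n")
    out.write("Content-Type: text/html\r\n\r\n")
    out.write(f"<h1>{status_code} {reason}</h1>\n")

def handle_path(query, out=sys.stdout):
    """Path handler: serves content, or delegates to the shared error handler."""
    # Hypothetical validity check; the real conditions are not given in the question.
    if not query.isdigit():
        render_error(404, "Not Found", out)
        return
    out.write("Content-Type: text/html\r\n\r\n")
    out.write(f"<p>Content for {query}</p>\n")
```

Because the path handler calls the error code directly, nothing here depends on the ErrorDocument line being present in .htaccess.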

Related

.htaccess redirect based on a part of URL

After site crash a redirect php script doesn't work as expected.
We are trying to fix it, but in the meantime we are looking for a quick solution to redirect search engine results so that visitors at least land on a relevant web page after clicking.
The URL structure of the search engine results is something like this:
https://www.example.com/MainCategory/SubCategory_1/SubCategory_2/Product?page=1
and I'd like to redirect using the "SubCategory_2" part of the URL to something like this
https://www.example.com/SubCategory_2.php
so until we fully repair the script at least our visitors will see a relevant web page.
I'm quite stuck... Any ideas?
Thank you
To redirect the stated URL, where all parts are variable (including an entirely variable, but present query string) then you can do something like the following using mod_rewrite near the top of your root .htaccess file (or crucially, before any existing internal rewrites):
RewriteEngine On
RewriteCond %{QUERY_STRING} .
RewriteRule ^[^/]+/[^/]+/([^/]+)/[^/]+$ /$1.php [QSD,R=302,L]
The QSD flag is necessary to discard the original query string from the redirected response.
The above will redirect:
/MainCategory/SubCategory_1/SubCategory_2/Product?page=1 to /SubCategory_2.php
/foo/bar/baz/qux?something to /baz.php
You can test it here using this htaccess tester.
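If no online tester is at hand, the matching logic of the rule above can be approximated offline. The sketch below uses Python's re module, which behaves the same as Apache's PCRE for this simple pattern; the function name redirect_target is hypothetical, and the code only mimics the condition, the pattern, and the QSD flag, not mod_rewrite as a whole.

```python
import re

# Same pattern as the RewriteRule above. In .htaccess the URL-path is matched
# without its leading slash, so the pattern does not start with "/".
PATTERN = re.compile(r"^[^/]+/[^/]+/([^/]+)/[^/]+$")

def redirect_target(url_path, query_string):
    """Return the redirect target, or None if the rule would not apply."""
    # RewriteCond %{QUERY_STRING} . requires a non-empty query string.
    if not query_string:
        return None
    match = PATTERN.match(url_path.lstrip("/"))
    if match is None:
        return None
    # QSD discards the query string, so only the new path is returned.
    return "/" + match.group(1) + ".php"

print(redirect_target("/MainCategory/SubCategory_1/SubCategory_2/Product", "page=1"))
print(redirect_target("/foo/bar/baz/qux", "something"))
```

A URL with a different number of path segments, or with no query string, returns None here, which corresponds to the rule doing nothing.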
UPDATE:
unfortunately without success. I get 404 error.
You'll get a 404 if the directive did not match the requested URL, or /SubCategory_2.php does not exist.
Is the URL redirected? What do you see in the browser's address bar?
If there was no redirect then the above rule did not match the requested URL and the rule did nothing. Either because:
The URL format is not as stated in the question.
The rule is in the wrong place in the .htaccess file. As stated, this rule needs to be near the top of the config file.
I found a basic solution here: "htaccess redirect if URL contains a certain string". I created something like this: RewriteRule ^(.*)SubCategory_2(.*)$ https://example.com/SubCategory_2.php[L,R=301] and it works just fine. My problem is that this is a "static solution", since "SubCategory_2" is a variable.
Ok, but that is a very generic (arguably "too generic") solution for the problem you appear to be attempting to solve. This matches "SubCategory_2" anywhere in the URL-path (not just whole path segments) and preserves any query string (present on your example URL) through the redirect. So, this would not perform the stated redirect on the example URL in your question.
However, the directive you've posted (which you say "works just fine") cannot possibly work as written, at least not by itself. Ignoring the missing space (a typo I assume) before the flags argument, this would result in an endless redirect loop, since the target URL /SubCategory_2.php also matches the regex ^(.*)SubCategory_2(.*)$.
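The loop can be illustrated with a small simulation. This is a sketch only, assuming the missing space before the flags is restored and no other rules interfere; count_redirects is a hypothetical helper and the hop limit stands in for the browser giving up.

```python
import re

# The "generic" pattern from the posted directive.
GENERIC = re.compile(r"^(.*)SubCategory_2(.*)$")

def count_redirects(url_path, limit=10):
    """Follow the generic rule until it stops matching or the limit is hit."""
    hops = 0
    while hops < limit and GENERIC.match(url_path.lstrip("/")):
        # Target of the posted rule; it still contains "SubCategory_2".
        url_path = "/SubCategory_2.php"
        hops += 1
    return hops

print(count_redirects("/MainCategory/SubCategory_1/SubCategory_2/Product"))
print(count_redirects("/some/other/page"))
```

The first call only stops because of the limit, since every redirected request matches the pattern again; the second never matches at all.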
Also, should this be a 301 (permanent) redirect? You seem to imply this is a "temporary" solution?
HOWEVER, it's not technically possible to make "SubCategory_2" entirely variable in this "basic solution" and search for this variable "something" in a larger string and redirect to "something.php". How do you know that you have found the correct part of a much larger URL? You need to be more specific about what you are searching for.
In your original question you are extracting the 3rd path segment in a URL-path that consists of 4 path segments and a query string. That is a perfectly reasonable pattern, but you can't extract "something" when you don't know what or where "something" is.

htaccess rewrite for everything after the extension that is not a query string

I'm trying to rewrite URLs in such a way as to remove everything after the extension (in my case .php) that is not a query string.
Ideally, I'd like these requests to respond with a 404, but it seems the default Apache/PHP setup is to simply return the page as normal.
For example a request to /index.php/anystring shows my home page, not a 404 as I would expect.
Wanted to reach out as I'd be surprised if someone hasn't solved this problem already.
Thanks.
You need to disable path info using the AcceptPathInfo directive.
Add the following line to your .htaccess file:
AcceptPathInfo off

Apache throwing 403 on www.example.com/dir/.php

I've discovered from Google Webmaster Tools that Google has found a bad link somewhere that is throwing a 403 error on my server.
The url is like this:
http://www.example.com/directory/.php
I don't know how that url has come about and the site is too complicated for me to find out, but I'd like to simply place a 301 redirect to:
http://www.example.com/directory/
I've put the correct rule in .htaccess for the redirect, but it doesn't appear to be triggered. It's almost as if the 403 is being generated before .htaccess is processed. Does anyone know why this might be and how I can successfully get the user redirected to the new page?
Apart from the possibility that the rule is indeed incorrect, this could also be because the request involves looking for a file whose name starts with a dot (.).
These files are usually "hidden" from the outside world, so it might be that either your Apache (configuration) or your OS does not allow serving this file.
Without knowing more about the configuration, this cannot be analyzed though. You could test it by creating a file called ".test" and see if that's reachable.

Aliases on Dreamhost, general management of http request / server errors

I had a hard time deciding how I should manage these errors (404, 500, ...) and when I finally decided, I am encountering problems. This is a reeeeeally long question, I appreciate anyone's attempt to help!
Let me first describe how I decided to set it up. I have several sites hosted on a shared Dreamhost account. In the folder structure that I see, everything of mine on the server is under /home/username, and for example, site1.com's web root is at /home/username/site1.com
I am creating a generic error handler (php script) for errors like 404 not found, 500, etc. that I want to store above the web roots of my sites at /home/username/error_handler/index.php so that I can use an .htaccess file at /home/username/.htaccess which includes something like the following:
ErrorDocument 404 /error_handler/index.php
ErrorDocument 500 /error_handler/index.php
...and many more
When these errors occur on any of my sites, I want it to be directed to /home/username/error_handler/index.php. This is the problem I'm having a hard time figuring out. The ErrorDocument directives above will actually cause Apache to look for /home/username/site1.com/error_handler/index.php
Anyway, the errors should be redirected to my error handling php script. The script will use $_SERVER['REDIRECT_STATUS'] to get the error code, then use $_SERVER['REDIRECT_URL'] and $_SERVER['HTTP_HOST'] to decide what to do. It will check if an error handler specific to that site exists (for example: site1.com/errors/404.php). If this custom page doesn't exist, it will output a generic message that is slightly more user-friendly and styled, and perhaps will include some contact info for me depending on the error.
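The lookup described above (site-specific page if present, generic message otherwise) might be sketched as follows. This is an illustration in Python rather than the PHP of the actual script; choose_error_page and its parameters are hypothetical, and the directory layout is assumed from the example site1.com/errors/404.php.

```python
import os

def choose_error_page(status, host, sites_root, exists=os.path.isfile):
    """Return the path of a site-specific error page, or None for the generic fallback."""
    # The Host header is client-supplied, so a real script should validate it
    # (for example against a list of known sites) before building a path from it.
    candidate = os.path.join(sites_root, host, "errors", f"{status}.php")
    if exists(candidate):
        return candidate
    return None

# status would come from REDIRECT_STATUS and host from HTTP_HOST.
page = choose_error_page("404", "site1.com", "/home/username")
print(page if page else "generic error message")
```

The exists parameter is only there so the lookup can be exercised without touching the filesystem.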
Doing it this way lets me funnel all these errors through this 1 php script. I can log the errors however I like or send email notifications if I want. It also lets me set up the ErrorDocument Apache directives once for all my sites instead of having to do it for every site. It will also continue to work without modification when I move the site around, since I already have a system that scans the folder structure to figure out where my site roots are when they really aren't at the web root technically speaking. This may not be possible with other solutions like using mod_rewrite for all 404 problems, which I know is common. Or if it is possible, it may be very difficult to do. Plus, I have already done that work, so it will be easy for me to adapt.
When I am working on sites for which I don't have a domain name yet (or sites where the domain name is already in use at the moment), I store them temporarily in site1.com/dev/site3.com for example. Moving the site to site3.com eventually would cause me to have to update the htaccess files if I had one for each site. Changing the domain name would do the same.
Ex: a site stored at site1.com/dev/site3.com would have this in its htaccess file:
ErrorDocument 404 /site1.com/dev/site3.com/error/404.php
And it would have to be changed to this:
ErrorDocument 404 /site3.com/error/404.php
Obviously, this isn't a huge amount of work, but I already manage a lot of sites and I will probably be making more every year, 95% of which will be hosted on my shared DreamHost account. And most of them get moved at least once. So setting up something automatic will save me some effort in the long run.
I already have a system set up for managing site-relative links on all my sites. These links will work whether the site exists in a subdirectory of an existing site, or in their own domain. They also work without change in a local development server despite a difference in the web root location. For example, on the live server, the site-relative http link /img/1.jpg would resolve to the file /home/username/site1.com/img/1.jpg while on my local development server it would resolve to C:\xampp\htdocs\img\1.jpg, despite what I consider the logical site root being at C:\xampp\htdocs\site1.com. I love this system, and it is what gave me the idea to set up something that would work automatically like I expected it to, based on the file structure I used.
So, if I could get it to work, I think this seems like a pretty good system. But I am still very new to apache configuration, mod_rewrite, etc. It's possible there is a much easier and better way to do this. If you know of one, please let me know.
Anyway, all that aside, I can't get it working. The easiest thing would be if I could have the ErrorDocument directive send the requests to folders above the web root. But the path is a URL path relative to the document root. Using the following in /home/username/.htaccess,
ErrorDocument 404 /error_handler/index.php
a request for a non-existent resource causes Apache to look for the file at
site1.com/error_handler/index.php
So I thought I should set up a redirection (on all my sites) that would redirect those URLS to /home/username/error_handler. I tried a few things and couldn't get any of them to work.
Alias seemed like the simplest solution, but it is something that has to be set at server runtime (not sure if that is the right terminology - when the server is started). On my local server, it worked fine using:
Alias /error_handler C:\xampp\htdocs\error_handler2
I changed the local folder to test that the Alias was functioning properly. (On the local server, the URL path specified by the ErrorDocument directive is actually pointing to the right folder, since on my local server the web root is technically C:\xampp\htdocs and the error handler I want to use is stored locally at C:\xampp\htdocs\error_handler\index.php)
Dreamhost has a web client that can create what I am guessing is an Alias. When I tried to redirect the folder error_handler on site1.com to /home/username/error_handler, it would seem to work right if I typed site1.com/error_handler in the browser. But if I typed site1.com/test1234 (non-existent), it would say there was a 404 error trying to use the error handler. Also, I would have to log in through the web client and point and click (and wait several minutes for the server to restart) every time I wanted to set this up for a new site, even if I could get it to work.
So I tried getting it to work with mod_rewrite, which seems like the most flexible solution. My first attempt looked something like this (stored in /home/username/site1.com/.htaccess for now, though it would eventually be at /home/username/.htaccess):
RewriteEngine On
RewriteRule ^error_handler/index.php$ /home/username/error_handler/index.php
The plain English version of what I was trying to do above is to send requests on any of my sites for error_handler/index.php to /home/username/error_handler/index.php. My misunderstanding was that the substitution would be treated as a file path if it exists. But I missed that the documentation says "(or, in the case of using rewrites in a .htaccess file, relative to your document root)". So instead of rewriting to /home/username/error_handler/index.php, it's actually trying to rewrite to /home/username/site1.com/home/username/error_handler/index.php.
I tried including Options +FollowSymLinks because in the Apache documentation it says this:
To enable the rewrite engine in this context [per-directory re-writes in htaccess], you need to set "RewriteEngine On" and "Options FollowSymLinks" must be enabled. If your administrator has disabled override of FollowSymLinks for a user's directory, then you cannot use the rewrite engine. This restriction is required for security reasons.
I searched around for a while and I couldn't find anything about how Dreamhost handles this (probably because I don't know where to look).
I experimented with RewriteBase because in the Apache documentation it says this:
"This directive is required when you use a relative path in a substitution in per-directory (htaccess) context unless either of the following conditions are true:
The original request, and the substitution, are underneath the DocumentRoot (as opposed to reachable by other means, such as Alias)."
Since this is supposed to be a URL path, in my case it should be RewriteBase /, since all my redirects will be from site1.com/error_handler. I also tried RewriteBase /home/username and RewriteRule ^error_handler/index.php$ error_handler/index.php. However, the RewriteBase is a URL path relative to the document root. So I still need to use something like an alias. The implication in the quote from the documentation above is that it is possible to use mod_rewrite to send content above the web root. One of the many things I don't know is what the 'other means' besides Alias might be. I believe Alias might not be an option on Dreamhost. At least I couldn't make sense of it.
Why not use error pages in the site root, then include the actual file from the shared section?

Weird apache behavior when trying to display urls without html extension

I have a url that is easily accessible when you request it as:
http://example.com/2005/01/example.html
or
http://example.com/2005/01/example
(I actually don't know why it works without the extension. Instead, it should return the usual 404 error.)
However, when I try to access the following url:
http://example.com/2005/01/example/
(note the trailing slash)
I get a 404 Not found error but with the requested url printed as:
http://example.com/2005/01/example.html/
So, it seems the ".html" part was automatically added by apache.
My question is: how do I disable this behavior? I need to do it because I want to add mod_rewrite rules to hide the html extension, so that I can access that url as:
http://example.com/2005/01/example/
My apache is 2.2.9 on Ubuntu 8.10.
Thanks!
MultiViews could cause this behavior. Try to disable it:
Options -MultiViews
Is example.html an actual file that lies in the directory path 2005/01? It seems like mod_rewrite is already active. If you use a blog or content management system on your server, then it probably rewrites your URLs already.