I am working with a Tomcat application that I cannot modify, though I have full control over Apache, which is set up in front of it as a reverse proxy.
The application will do a 302 redirect to pagenotavailable.jsp when it encounters a URL that's no longer valid (i.e. the ID no longer exists or is malformed).
I need to figure out a way to make these return a 404 so that these URLs drop out of the search engine indexes.
One possibility I came up with is to set up a mod_rewrite redirect from pagenotavailable.jsp to a page I made called 404.html :
RewriteEngine on
RewriteRule ^/pagenotavailable\.jsp /404.html [NC,R=404,L]
Or just this, since 404.html is already set up as ErrorDocument:
RewriteRule ^/pagenotavailable\.jsp$ - [R=404]
That is showing the content of my custom 404.html page, but the URL does not update (it is still pagenotavailable.jsp) and the status code is still 302.
Any ideas why I don't get a 404, or any alternative approaches, are appreciated!
If you are responding with a 302, then the client will always update its URL (e.g. in the URL bar) to show .../pagenotavailable.jsp. If you want that to go away, you'll have to redirect again to your preferred URL. The only other option would be to modify the application so that it doesn't perform a redirect but a "forward", which is handled entirely on the server side. But you specifically mentioned that you can't modify the web app, so...
I'd be very surprised if the status code were still 302... when a client receives a 302 response, it should perform a GET to the Location provided in the 302 response header. If anything, I'd expect a 200 response with your 404.html content if the response code wasn't being set to 404.
The [R=xxx] flag tells mod_rewrite to issue a redirect, which is normally a 3xx response code. You can use a 404, but you should be aware of the caveats. The mod_rewrite flags documentation says what happens with the Location header (i.e. nothing) and that a non-3xx code implies the [L] flag, but it doesn't say anything about what response code will actually be sent to the client.
What about using RewriteRule to rewrite the page to something that actually doesn't exist?
RewriteRule ^/pagenotavailable\.jsp$ /does-not-exist.html
... then let your standard 404 handler handle the error (and return the contents of 404.html plus the 404 protocol response).
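A minimal sketch of this idea, assuming server or virtual-host context (where the pattern starts with a slash, as in the question), that /404.html is served by Apache itself, and that /does-not-exist.html is a made-up path with no file behind it:

```apache
RewriteEngine On

# Custom error page, already configured according to the question
ErrorDocument 404 /404.html

# Internally rewrite the application's "not available" page to a path that
# deliberately has no file behind it (hypothetical name), so that Apache's
# normal 404 handling produces the response
RewriteRule ^/pagenotavailable\.jsp$ /does-not-exist.html [L]
```

This only affects the request for pagenotavailable.jsp itself. The 302 that Tomcat sends for the original URL still reaches the client first.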
Related
I am building a static website that uses JS to parse a URL in order to work out what to display.
I need every URL to actually open index.html where the JS can pull apart the path and act accordingly.
For example http://my.site/action/params will be parsed as an action with some parameters params.
Background, this will be served from AWS S3 via CloudFront using custom error redirection - and this works fine on AWS.
I am, however, trying to build a dev environment under Ubuntu running apache and want to emulate the redirection locally.
I have found a couple of pages that come close, but not quite.
This page shows how to do the redirect to a custom error page on the server housed in a file called "404". As 404 is the actual error response code, the example looks a bit confusing and I am having trouble modifying the example to point to index.html.
The example in the accepted answer suggests:
Redirect 200 /404
ErrorDocument 404 /404
which I have modified to:
Redirect 200 /index.html
ErrorDocument 404 /index.html
However this returns a standard 404 Not Found error page.
If I remove the Redirect line, leaving just the ErrorDocument line, I get the index.html page returned as required, but the HTTP status response is still a 404 code where I need it to be a 200.
If I leave the Redirect line as per the example I actually get the same result as my modified version, so I suspect this is the line that is incorrect, but I can't figure it out.
(I'm using the Chrome Dev Tools console to see the status codes etc).
I think I have found a solution using Rewrite rules instead of error docs.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
The key I was missing in this approach seems to be not including an R=??? status response code at the end of the rewrite rule. It took me a while to find that!
As it uses mod_rewrite rather than defining error pages I assume that the mechanism is different to how CloudFront does it, but for my dev system needs it seems that the result is the same - which means I can work on the site without having to invalidate the CloudFront cache after every code change and upload.
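For comparison, mod_dir has a directive aimed at this "everything goes to one page" pattern. A minimal sketch, assuming mod_dir is loaded (Apache 2.2.16 or later) and that index.html sits in the document root:

```apache
# Serve /index.html, with a 200 status, for any request that does not map
# to an existing file; existing files are still served as usual
FallbackResource /index.html
```

Whether this matches the CloudFront behaviour closely enough is something to check in the dev environment. It is only meant as a shorter equivalent of the rewrite rules above.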
I have an .htaccess file for showing a default image if the requested URL does not exist. I simplified it to this:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . default.png [L]
Using HTTPS, this suddenly stopped working if the URL exceeds a certain length (connection closed).
HTTP always works.
It used to work like this for years and it still does on other servers.
It also seems that the kind of characters matter:
not working:
https://server.abc/images/01234567890123456789012345678901234567890123456789abc.png
https://server.abc/images/012345678901234567890123456789012345678901234567890123456789.png
working:
https://server.abc/images/01234567890123456789012345678901234567890123456789.png
https://server.abc/images/01234567890123456789012345678901234567890123456789123.png
https://server.abc/images/0123456789012345678901234567890123456789012345678912345.png
The redirect works if the condition is removed (second line), so it seems like it has something to do with REQUEST_FILENAME, HTTPS and the byte size (encoding?) of the filename/URL string.
This occurs with Apache/2.4.46 and macOS/10.15.7. It might have started after one of the latest security updates.
Any idea where this is coming from or what kind of configuration could cause this?
Thanks for your help!
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule . default.png [L]
It's not clear why this would "fail" for only certain requests over HTTPS only. A "security" update (particularly if it involves mod_security) is a likely cause - although an unusual one.
However, you shouldn't really be doing it this way to begin with. This will result in a request for any non-existent URL being served /default.png with a "200 OK" response and potentially risk being indexed by search engines and abused by a malicious user.
What you are doing here is essentially setting a custom 404 response to an image, which you could do with the following instead and which will also return the "correct" 404 status.
ErrorDocument 404 /default.png
Now, any request that does not map to a file (or directory) will be served the image /default.png but with a 404 "Not Found" HTTP response code, so search engines/bots get the "correct" response.
This also naturally gets around the REQUEST_FILENAME issue, assuming these "not working" URLs do ultimately result in a 404 and not some other response (due to the "security" update).
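As a sketch of how this could be scoped to the image directory only, assuming an .htaccess file inside a directory served at /images/ (the path is taken from the example URLs; the location of default.png is an assumption) and that AllowOverride permits FileInfo:

```apache
# .htaccess in the images directory
# Missing images get the placeholder body with a 404 status;
# the rest of the site keeps its own error handling
ErrorDocument 404 /images/default.png
```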
My goal is to reduce the visibility of my app's signature. This is not security by obscurity, just a superficial bit of defence in depth, so that at first glance an attacker cannot tell if it is a static site or not. (Also cosmetic; it just feels "cleaner" to hide app details even if they would never become visible in normal operation). Therefore I want to deny access to some directories without revealing that they exist, so I must give the exact same 404 response my app would give if the user requested a non-existent page.
In an .htaccess file, I have the following:
RewriteEngine on
RewriteCond "%{REQUEST_FILENAME}" "!-f"
RewriteCond "%{REQUEST_FILENAME}" "!-d"
RewriteRule "^(.*)" "index.php?page=$1"
RewriteRule "^(secret_dir1|secret_dir2)(/.*)?$" "index.php?page=404"
where index.php renders a nice pretty webpage according to the value of the "page" GET parameter; if "page" does not correspond to a page at the app level, or "page" is set to 404, the script renders a pretty 404 page with proper headers and everything.
Here's where the problem happens. "App-level" 404s work as expected; a 404 page is rendered. However, if the user requests mydomain.com/dir_i_am_trying_to_hide, they are given a 301 redirect to mydomain.com/dir_i_am_trying_to_hide/?page=404: an external redirect instead of an internal rewrite.
Why is it sending out an external redirect instead of just rewriting the url? How am I supposed to avoid this properly? Barring that, is there a way to force the server to do an internal rewrite instead? (The Apache docs seem to indicate you can force a RewriteRule to be external, but not the other way around)
Turns out my rewrite rule was not causing the external redirect; Apache's DirectorySlash was; I would query hostname/secret_dir1 and it would send a redirect to hostname/secret_dir1/.
I'm not sure why the query string was changed, but adding DirectorySlash Off fixed it.
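Put together, the .htaccess from the question with that fix applied might look like the sketch below. The rules are unchanged from the question; the only assumption is that AllowOverride includes Indexes, which DirectorySlash needs when it is set in .htaccess:

```apache
RewriteEngine on

# Stop mod_dir from redirecting /secret_dir1 to /secret_dir1/
DirectorySlash Off

RewriteCond "%{REQUEST_FILENAME}" "!-f"
RewriteCond "%{REQUEST_FILENAME}" "!-d"
RewriteRule "^(.*)" "index.php?page=$1"
RewriteRule "^(secret_dir1|secret_dir2)(/.*)?$" "index.php?page=404"
```

The Apache documentation notes that turning off the trailing-slash redirect can lead to information disclosure when directory listings (mod_autoindex) are enabled, so it is worth checking that listings are off for these directories.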
I am trying to make any 403 response go to a custom error page with a 200 OK status (for security reasons).
I tried this to send 4xx errors to a custom error page:
ErrorDocument 403 /shared/error.html
But this will not change the response code (it will still be 403).
There is this option, but it gives a 302 here.
Tried this to change the error code:
RewriteEngine on
RewriteRule ^/shared/.*/$ /shared/error.html [R=200,L]
But somehow it doesn't redirect to the custom error page I want.
Tried it with absolute path as well:
RewriteEngine on
RewriteRule ^/shared/.*/$ https://%{SERVER_NAME}/shared/error.html [R=200,L]
Still doesn't work. Is it not possible or am I missing something obvious? This is my first time modifying but I did some research already. Any help is appreciated.
If you can use ErrorDocument for the code 200:
ErrorDocument 200 /shared/error.html
RewriteEngine on
RewriteRule ^/shared/.*/$ anything [R=200]
When the rule is applied it will execute the redirection to the ErrorDocument straight away, and that is the reason why you can put "anything" in there.
Check the documentation for the R flag (https://httpd.apache.org/docs/current/rewrite/flags.html):
The status code specified need not necessarily be a redirect (3xx) status code. However, if a status code is outside the redirect range (300-399) then the substitution string is dropped entirely, and rewriting is stopped as if the L were used.
Beware that you might not want this redirection for the whole code, so you should apply it within a context. And take care with your regex so you don't end up in a loop.
Source: https://www.askapache.com/htaccess/crazy-advanced-mod_rewrite-tutorial/
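The documented behaviour of the R flag quoted above can be sketched as follows, assuming server or virtual-host context; the paths /retired/ and /removed/ are hypothetical:

```apache
RewriteEngine On

# Non-3xx status: the substitution ("-" here) is dropped and rewriting stops,
# as if [L] had been given; Apache answers with a 404 and the configured
# ErrorDocument 404, if there is one
RewriteRule ^/retired/ - [R=404]

# Same mechanism with 410 Gone
RewriteRule ^/removed/ - [R=410]
```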
If you'd use a php-script for example instead of your static html-document, you can override the response code inside your script.
See: http://php.net/manual/en/function.http-response-code.php
Hey guys, have a question regarding apache. I have a site that's been re-engineered, but I want to capture all the 'old' links that people may have bookmarked or come from search engines to the old site which is under a new domain name. How do I get apache to redirect only 404 not found to the old site?
TIA,
J
You should first decide what status code you want to send. Sending both a 404 status code and a redirect is not possible.
But seth did already mention the right method, the ErrorDocument directive:
# local path
ErrorDocument 404 /local/path/to/error/document
# external URI
ErrorDocument 404 http://uri.example/to/error/document
If you use a local path, the 404 status code is sent. If you use an absolute URI, a 302 status code (temporary redirect) is sent.
And if you want to send a 301 redirect:
Redirect 301 / http://new.example.com/
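If the aim is to keep serving the pages that exist and send only the missing ones elsewhere with the original path preserved, mod_rewrite is one option. The sketch below assumes an .htaccess file in the document root with mod_rewrite enabled; other.example.com is a placeholder for whichever host should receive the unmatched requests:

```apache
RewriteEngine On

# Only act when the request maps to neither a file nor a directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Redirect to the same path on the other host (placeholder name)
RewriteRule ^(.*)$ http://other.example.com/$1 [R=301,L]
```

As with the other options, the client then sees a redirect for these URLs and never a 404.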
Your old domain should capture all responses and return a '301 moved permanently' response with the new domain in the 'Location' field of the header. A 404 means 'not found' and in this case it's not strictly true.
Another option, similar to that proposed by @seth, is to add the handler so that it points to a static HTML page which you can use to explain to the user what happened, and present them with options.
You can include a meta redirect so that if they don't do anything after a few seconds they're automatically redirected.
Which option will work best is really up to you to decide.
You could set your 404 document to a CGI that redirects the user.
ErrorDocument 404 /cgi-bin/redirect-to-other.cgi