IIS8 - Slash after file name delivers file & HTTP 200 for CFM, HTML, but HTTP 404 for ASP

I'm working on a site that's recently migrated to IIS8/Windows Server 2012 R2. It's running ColdFusion 11, and serves a mix of static HTML, CFM and ASP content.
The problem I'm facing is this:
For .cfm and .htm files, it is possible to enter a URL with a trailing slash (with or without arbitrary text after the slash) and still get the file served with an HTTP 200 OK response code. For example, all of these:
myurl.com/about.cfm/foo
myurl.com/about.cfm/
myurl.com/about.cfm/typewhateveryouwantitdoesnotmatter
will deliver the same content as
myurl.com/about.cfm
The only difference is that any relative URLs in the page break: images, CSS, scripts, links, etc. IIS interprets about.cfm/ as a directory, and returns the rendered content of the about.cfm file for any nonexistent "file" in that "directory".
This behavior is undesirable; I'd rather it produced a 404 error.
Interestingly, the misbehavior described above does not occur for ASP content.
myurl.com/about.asp/foo
returns HTTP 404: Not Found just as one would want.
Googling this problem is tough because of the signal-to-noise ratio - I'm wading through reams of basic IIS URL rewrite advice concerning the addition of trailing slashes to directories, and having a hard time finding anything like my situation. Thanks in advance for your help!
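To pin down exactly which requests should produce a 404, the rule can be written as a small check. This is only a sketch in Python, not IIS configuration: the function name has_trailing_path_info and the extension list are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical list of extensions that identify a file rather than a directory.
FILE_EXTENSIONS = (".cfm", ".htm", ".html", ".asp")


def has_trailing_path_info(url: str) -> bool:
    """Return True if the path continues past a segment that looks like a file."""
    segments = urlsplit(url).path.split("/")
    # Inspect every segment except the last one: a file name in any of those
    # positions means something (even an empty string, i.e. a bare trailing
    # slash) follows it in the path.
    for segment in segments[:-1]:
        if segment.lower().endswith(FILE_EXTENSIONS):
            return True
    return False
```

With this rule, http://myurl.com/about.cfm/foo and http://myurl.com/about.cfm/ are flagged, while http://myurl.com/about.cfm is not.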

Related

Archiving an old PHP website: will any webhost let me totally disable query string support?

I want to archive an old website which was built with PHP. Its URLs are full of .phps and query strings.
I don't want anything to actually change from the perspective of the visitor -- the URLs should remain the same. The only actual difference is that it will no longer be interactive or dynamic.
I ran wget --recursive to spider the site and grab all the static content. So now I have thousands of files such as page.php?param1=a&param2=b. I want to serve them up as they were before, so that means they'll mostly have Content-Type: text/html, and the webserver needs to treat ? and & in the URL as literal ? and & in the files it looks up on disk -- in other words it needs to not support query strings.
And ideally I'd like to host it for free.
My first thought was Netlify, but deployment on Netlify fails if any files have ? in their filename. I'm also concerned that I may not be able to tell it that most of these files are to be served as text/html (and one as application/rss+xml) even though there's no clue about that in their filenames.
I then considered https://surge.sh/, but hit exactly the same problems.
I then tried AWS S3. It's not free but it's pretty close. I got further here: I was able to attach metadata to the files I was uploading so each would have the correct content type, and it doesn't mind the files having ? and & in their filenames. However, its webserver interprets ?... as a query string, and it looks up and serves the file without that suffix. I can't find any way to disable query strings.
Did I miss anything -- is there a way to make any of the above hosts act the way I want them to?
Is there another host which will fit the bill?
If all else fails, I'll find a way to transform all the filenames and all the links between the files. I found how to get wget to transform ? to #, which may be good enough. It would be a shame to go this route, however, since then the URLs are all changing.
I found a solution with Netlify.
I added the wget options --adjust-extension and --restrict-file-names=windows.
The --adjust-extension part adds .html at the end of filenames which were served as HTML but didn't already have that extension, so now we have for example index.php.html. This was the simplest way to get Netlify to serve these files as HTML. It may be possible to skip this and manually specify the content types of these files.
The --restrict-file-names=windows alters filenames in a few ways, the most important of which is that it replaces ? with #. This is needed since Netlify doesn't let us deploy files with ? in the name. It's a bit of a hack; this is not really what this option is meant for.
This gives static files with names like myfile.php#param1=value1&param2=value2.html and myfile.php.html.
I did some cleanup. For example, I needed to adjust a few link and resource paths to be absolute rather than relative due to how Netlify manages presence or lack of trailing slashes.
I wrote a _redirects file to define URL rewriting rules. As the Netlify redirect options documentation shows, we can test for specific query parameters and capture their values. We can use those values in the destinations, and we can specify a 200 code, which makes Netlify handle it as a rewrite rather than a redirection (i.e. the visitor still sees the original URL). An exclamation mark is needed after the 200 code if a "query-string-less" version (such as mypage.php.html) exists, to tell Netlify we are intentionally shadowing.
/mypage.php param1=:param1 param2=:param2 /mypage.php#param1=:param1&param2=:param2.html 200!
/mypage.php param1=:param1 /mypage.php#param1=:param1.html 200!
/mypage.php param2=:param2 /mypage.php#param2=:param2.html 200!
If not all query parameter combinations are actually used in the dumped files, not all of the redirect lines need to be included of course.
There's no need for a final /mypage.php /mypage.php.html 200 line, since Netlify automatically looks for a file with a .html extension added to the requested URL and serves it if found.
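When a page takes more than a couple of parameters, writing these lines by hand gets tedious. The following Python sketch generates them; the function name redirect_rules is hypothetical, and it assumes the parameters appear in the dumped file names in the same order as in the list passed in.

```python
from itertools import combinations


def redirect_rules(page: str, params: list[str]) -> list[str]:
    """Build one rewrite rule per non-empty combination of params."""
    rules = []
    # Largest combinations first, matching the order of the lines above.
    for size in range(len(params), 0, -1):
        for combo in combinations(params, size):
            conditions = " ".join(f"{name}=:{name}" for name in combo)
            query = "&".join(f"{name}=:{name}" for name in combo)
            rules.append(f"/{page} {conditions} /{page}#{query}.html 200!")
    return rules


for rule in redirect_rules("mypage.php", ["param1", "param2"]):
    print(rule)
```

For mypage.php with param1 and param2 this prints the same three rules shown above. Lines for combinations that never occur in the dump can be deleted from the output.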
I wrote a _headers file to set the content type of my RSS file:
/rss.php
  Content-Type: application/rss+xml
I hope this helps somebody.

Apache httpd mod_include - handle include of 400+ responses with blank

I want to use Apache 2.2 httpd to include URLs via SSI using
<!--#include virtual="/content/foo.html" -->
My problem is that if the SSI-included page doesn't exist on my app server, the server responds with a 404 and a default HTML error page, which is then stitched into my page via the include.
For failing (4xx, 5xx) SSI includes, I simply want the include to add an empty string to my page.
Apache 2.2 doesn't appear to support the 'onerror' option (which I think would solve this), and I don't see any other options.
http://httpd.apache.org/docs/2.2/mod/mod_include.html
You could potentially add a rewrite to handle those portions of your application's URI space, but I'd advise against it. That approach does not fix the main problem: SSI hinges on the included files being consistently available. If the included files are returning 4xx or 5xx errors, the onus is on you to fix those errors.

Apache throwing 403 on www.example.com/dir/.php

I've discovered from Google Webmaster Tools that Google has found a bad link somewhere that is throwing a 403 error on my server.
The url is like this:
http://www.example.com/directory/.php
I don't know how that url has come about and the site is too complicated for me to find out, but I'd like to simply place a 301 redirect to:
http://www.example.com/directory/
I've put the correct rule in .htaccess for the redirect, but it doesn't appear to be triggered. It's almost as if the 403 is being generated before .htaccess is processed. Does anyone know why this might be and how I can successfully get the user redirected to the new page?
Apart from the possibility that the rule itself is incorrect, this could be because the request looks for a file whose name starts with a dot (.).
These files are usually "hidden" from the outside world, so it might be that either your Apache (configuration) or your OS does not allow serving this file.
Without knowing more about the configuration, this cannot be analyzed though. You could test it by creating a file called ".test" and see if that's reachable.

Showing non-ascii characters in URL

I'm trying to make a page that will show Arabic/Hebrew in the URL.
for example: www.mydomain.co.ar/אבא.php
The problem is that when I upload the page to the Apache server and try to browse to it, either with "www.mydomain.co.ar/אבא.php" or the percent-encoded form "www.mydomain.co.ar/%D7%90%D7%91%D7%90.php", I get a 404.
When I list the directory, Apache shows àáà.php.
I know there is a way to show non-ASCII characters in URLs; Wikipedia has been doing it for ages.
My thought is that maybe an .htaccess rewrite would work? If so, how can I accomplish that?
Looks like you have to tell Apache that the file system is encoded in UTF-8 (or whatever encoding it actually uses). Maybe starting Apache with a UTF-8 locale active (LC_CTYPE=ar.utf8 or similar) helps there.
Wikipedia parses the URLs in the PHP software (and then asks the database about the right article), so this does not necessarily say how Apache does this.
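The encoding mismatch can be reproduced outside Apache. The Python sketch below shows that the percent-encoded URL carries UTF-8 bytes, and that the àáà listing is what appears if the file name was stored in a single-byte Hebrew code page and then read as Latin-1. The choice of cp1255 is an assumption; the actual encoding used during upload is not known from the question.

```python
from urllib.parse import quote, unquote

name = "\u05d0\u05d1\u05d0"  # the three Hebrew letters from the URL

# Percent-encoding uses the UTF-8 bytes, two per letter here.
encoded = quote(name)
print(encoded)  # %D7%90%D7%91%D7%90

# Decoding the request path as UTF-8 recovers the original name.
assert unquote(encoded) == name

# Assumption: the name on disk was written in a legacy Hebrew code page.
# Reading those single bytes as Latin-1 reproduces the directory listing.
legacy_bytes = name.encode("cp1255")
listing = legacy_bytes.decode("latin-1")
print(listing)  # àáà
```

If this is what happened, the request (six UTF-8 bytes) and the stored name (three legacy bytes) never match, which would explain the 404; renaming the file on the server so that its name is stored as UTF-8 would be one thing to try.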

Issues with FastCGI and links containing index.php? versus index.php

On a Windows 2003 server running IIS 6.0 and FastCGI with an ExpressionEngine-powered website, I've encountered an issue where links containing index.php fail unless a question mark is added.
The basic issue is that if a link points to "index.php/archive/article", the page fails to load (see below) but it will work when "index.php?/archive/article" is used.
What happens when the "index.php" links fail is the URL will change in the browser address bar, but the main page content is still displayed. Append a question mark to "index.php" and the page loads properly.
The site was previously running with ISAPI as the Server API with no issues: the server saw "index.php" and "index.php?" as being synonymous and pages with "index.php" in the path would load as expected.
Is there a setting I can configure that tells FastCGI to treat "index.php" and "index.php?" the same way?
I am a bit green when it comes to Windows servers; my experience is mostly with Apache servers running on Unix boxes.
Any guidance or pointers would be most appreciated.
One option is that you could simply enable EE's force URL query string option.
But, if you don't like having the question mark in the URL, you can try this workaround.
I can't say that I know anything about Windows servers, but this has worked for me on Apache servers when running PHP as CGI. Best of luck!
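For background on why the question mark matters: the two link styles are split differently before they ever reach PHP. Under the CGI convention, text after the script name in the path arrives as PATH_INFO, while text after the question mark arrives as QUERY_STRING. The Python sketch below only illustrates that split; the function name split_request is hypothetical and nothing here configures FastCGI or ExpressionEngine.

```python
from urllib.parse import urlsplit


def split_request(url: str) -> tuple[str, str, str]:
    """Return (script, path_info, query) for a URL containing index.php."""
    parts = urlsplit(url)
    script, marker, rest = parts.path.partition("index.php")
    # Whatever follows "index.php" inside the path is the PATH_INFO part;
    # whatever follows "?" is the query string.
    return script + marker, rest, parts.query


print(split_request("http://example.com/index.php/archive/article"))
print(split_request("http://example.com/index.php?/archive/article"))
```

In the first form the article path is path info and the query is empty; in the second it is the other way round. A setup that passes path info through incorrectly would therefore break the first form while leaving the second intact, which is consistent with the symptom described.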