Why does Apache get a 404 when trying to include SSI from another server? - apache

I have two web sites that produce data that I would like to combine into one web page. Site one (the "main" site) produces most of the web page, and site two contains additional data that I want to include on that page.
I figure the best way would be to use SSI to include data from site two into the web page produced by site one. Apache on site one seems to contact site two properly, but site two returns a 404. If I contact site two directly with a browser, using the exact same URL that site one is using, I get proper data. Why wouldn't Apache on site one get the same data?
I've tried two ways to include the data from site two, one directly and one using a reverse proxy, but neither works. Other (local) SSIs and reverse-proxies work fine on this page. These are the two include lines:
<!--#include virtual="/servertwodata" -->
<!--#include virtual="http://www.servertwo.com/execs/somescript.sh?task=overview" -->
The error that I get in the Apache error log is:
unable to include "http://www.servertwo.com/execs/somescript.sh?task=overview" in parsed file /var/www/html/index.html, subrequest returned 404, referer: http://www.serverone.com/index.html
Does anyone have a clue why Apache on site one would get a 404 from site two, but the exact same URL in a generic browser would get the data fine?

From the documentation:
The value is a (%-encoded) URL-path. The URL cannot contain a scheme or hostname, only a path and an optional query string. If it does not begin with a slash (/) then it is taken to be relative to the current document.
In short: It doesn't support external URLs.
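As a rough illustration of the rule quoted above, here is a minimal Python sketch that checks whether a value would be acceptable for include virtual. The function name is hypothetical, and it only tests for a scheme or hostname; it does not validate %-encoding.

```python
from urllib.parse import urlsplit


def is_valid_include_virtual(value: str) -> bool:
    """Return True if value is only a URL-path plus optional query string.

    Hypothetical helper: it mirrors the documented restriction that the
    value must not contain a scheme or a hostname.
    """
    parts = urlsplit(value)
    return not parts.scheme and not parts.netloc and bool(parts.path)


# The two include values from the question:
print(is_valid_include_virtual("/servertwodata"))
print(is_valid_include_virtual(
    "http://www.servertwo.com/execs/somescript.sh?task=overview"))
```

The first value passes because it is a bare path; the second fails because it carries both a scheme and a hostname.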

It turns out that the reverse proxy discards any query string that is included as part of the ProxyPass configuration parameter. I removed the query string from the ProxyPass parameter and moved it to the SSI include virtual tag, and site two no longer returns a 404.
The response that is put on the web page is scrambled hieroglyphics, but what's causing that is a different question.
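The change described above can be sketched in Python as follows. This is only an illustration under assumptions: the question does not show the actual ProxyPass line, so mapping /servertwodata to the script URL is a guess, and split_for_proxy is a hypothetical helper that merely prints what the two pieces would look like.

```python
from urllib.parse import urlsplit, urlunsplit


def split_for_proxy(local_path: str, remote_url: str) -> tuple[str, str]:
    """Move the query string off the proxy target and onto the SSI tag.

    Hypothetical helper: returns (ProxyPass line, SSI include tag).
    """
    parts = urlsplit(remote_url)
    # Proxy target keeps scheme, host and path, but no query string.
    target = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # The query string travels with the local include path instead.
    include_path = local_path + ("?" + parts.query if parts.query else "")
    proxy_pass = f'ProxyPass "{local_path}" "{target}"'
    include_tag = f'<!--#include virtual="{include_path}" -->'
    return proxy_pass, include_tag


for line in split_for_proxy(
    "/servertwodata",
    "http://www.servertwo.com/execs/somescript.sh?task=overview",
):
    print(line)
```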

Related

How to Avoid a Mixed-Content Error When Displaying a Search Result?

Question:
How can I include both https: and http: results from a single domain in a Google custom search engine but display any such result in an iframe with a secure parent window?
How It's Structured:
My Google custom search engine currently searches "mydomainname.com/directory/" with the option to "Include all pages whose address contains this URL". It operates on a specific page of the website to search pages within the specified directory. The Link Target set in Websearch Settings is an iframe on the same page as the search bar.
The browser window and the iframe src are both on the same secure domain. Since the search results all come from a directory within the site structure, they are all on this same domain as well.
Currently some results appear as "https://..." and some appear "www...". Obviously, this creates a mixed-content error when the browser window is https:// and an attempt is made to display a http:// search result in the iframe.
The results that are http:// will, of course, also work as https:// urls. I do not know what makes a page or file appear in the search results as "www." or "https://" when they all originate from a single secure domain.
The "http://" results appear even if I specify the site to be searched as https://www.mydomainname.com/directory/. I don't want to exclude these results, but I want them to be able to be displayed when browsing the site securely.
The Objective:
So the bottom-line rule that I need to work around is that insecure pages or files cannot be loaded into an iframe on a secure web page. I obviously want users to be able to utilize the https:// site but then I need the search to function in such a way that allows for all possible search results for these users.
The reason I need the results' target to be this iframe is that this is the frame that displays all the content of the web page. The search results work in harmony with the organization of other information: choosing a link from a category in the page's navigation and choosing a search result from the custom search both display the chosen content in the same location, the iframe.
What I've Tried:
I've tried designating https:// specifically in the Google Search Engine (gse) settings and removing : 'http' from the script line gcse.src =(document.location.protocol == 'https:' ? 'https:' : 'http:') + '//cse.google.com/cse.js?cx=' + cx;.
I looked in the script file that it's linking to: http://cse.google.com/cse.js?cx=012685392925564329750:ghl2znnfada but I can't decipher what might need to be changed in it.
In the error log on the console I don't see much that is relevant, except for the expected inability to load insecure pages while browsing securely. There is this message, which may be relevant, though I could be completely wrong because I can't really decipher it either:
Mixed Content: The page at 'https://mydomainname.com/directory/index.php' was loaded over HTTPS, but requested an insecure script 'http://www.google.com/jsapi?key=ABQIAAAAdCtw6Xq1Q31YAr7VSQOSvxS5g7WKqCWUBuUdhz3-rUOumR2saRSPGvey2WjYALW7f5_JzakSL3lAEg'. This request has been blocked; the content must be served over HTTPS.
Insecure Script from Error Message:
http://www.google.com/jsapi?key=ABQIAAAAdCtw6Xq1Q31YAr7VSQOSvxS5g7WKqCWUBuUdhz3-rUOumR2saRSPGvey2WjYALW7f5_JzakSL3lAEg
Proposed Paths to a Solution:
I am open to any solution methods that may be possible. I have considered several routes but am not sure how to properly execute them or have failed in my attempts to execute them.
Some solutions I thought may work are:
Show all results as https:// links (without excluding any) so that they can be accessed whether on a secure connection to the site or not.
Redirect any links clicked without https:// to be loaded into the iframe as https://
Change something about the pages and files on the server so that they only appear in the search results as https://
Change something about Google's search engine script so it parses all found results as https://
Somehow show links as http:// if browsing non-secure, and https:// if browsing secure *
*I don't know how viable or efficient this would be
The most robust solution is to migrate your whole website to HTTPS:
use a 301 (permanent) redirect from http to https
and activate HSTS (if possible with includeSubDomains)
Google will take a little time to update its index, but with HSTS the browser will automatically replace http with https for your domain, so you should avoid any mixed-content issues.
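To make the two steps concrete, here is a small Python sketch of what the server-side behavior amounts to. It is not a server configuration; https_redirect is a hypothetical helper, and the one-year max-age is an assumed value that the answer does not specify.

```python
from urllib.parse import urlsplit

# HSTS response header; the max-age value (one year) is an assumption.
HSTS_HEADER = ("Strict-Transport-Security",
               "max-age=31536000; includeSubDomains")


def https_redirect(url: str):
    """Return (301, https_url) for an http URL, or None if no redirect applies."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return None
    # Same host, path and query; only the scheme changes.
    return 301, parts._replace(scheme="https").geturl()


print(https_redirect("http://www.mydomainname.com/directory/page.html"))
print(https_redirect("https://www.mydomainname.com/directory/page.html"))
print(HSTS_HEADER)
```

The HSTS header only has an effect when it is sent on HTTPS responses, so it belongs on the secure site rather than on the redirect itself.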

Set Base URL using .htaccess

I'm setting up a clients area so my customers can review their site during development. I want to set it up so the URL is http://clients.mydomain.com/clientname/
Is there a way in the .htaccess file to set that as the base URL? I'm using the leading / format for my URLs in the page (i.e. /about/ or /css/), which is fine locally and when I deploy to production, but doesn't work in the scenario outlined above.
The proper way would be to use relative links in your HTML. It's unreliable to try to track the referer and rewrite every subsequent request to shove the /clientname/ back in as a prefix.
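To show why relative links survive the move into a /clientname/ directory, here is a minimal Python sketch that converts a root-relative link into a relative one. The relative_link helper and the example paths are hypothetical.

```python
import posixpath


def relative_link(page_path: str, target_path: str) -> str:
    """Rewrite a root-relative target as a link relative to the page's directory."""
    start = posixpath.dirname(page_path) or "/"
    rel = posixpath.relpath(target_path, start)
    # relpath drops a trailing slash, so restore it for directory URLs.
    if target_path.endswith("/") and not rel.endswith("/"):
        rel += "/"
    return rel


# The same relative link results with or without the /clientname/ prefix.
print(relative_link("/about/", "/css/"))
print(relative_link("/clientname/about/", "/clientname/css/"))
```

Because both calls produce the same link, the HTML does not need to change between the review URL and production.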
If you make a subdomain for each customer and develop the sites there, you don't have to change the base URL. This also prevents other .htaccess rules from breaking when you deploy to the live server.
So use:
http://clientname.mydomain.com

redirect a subdomain to a remote url, preferably via only DNS setting or with httpd.conf, without changing url displayed

For example I wish to redirect list.mydomain.com to http://my.emailingapp.com/lists/
but keeping the name displayed in URL as "list.mydomain.com".
Note that all parameters are to be passed over. e.g. list.mydomain.com/?stuff=a should be the same with http://my.emailingapp.com/lists/?stuff=a
Another note: these domains are on different server.
There are many other similar posts, but none of them works exactly the way I want.
Add a CNAME record pointing list.mydomain.com to my.emailingapp.com. With that you can achieve the following:
URL remains list.mydomain.com
The arguments get displayed in URL as list.mydomain.com/?stuff=a
Your requirement of having /lists/ should be implementable by URL rewrite rules.
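DNS itself has no notion of URL paths, so the /lists/ prefix has to be added by whatever handles the HTTP request. The Python sketch below only illustrates the mapping such a rewrite rule would need to perform; map_to_backend and the constants are hypothetical names, and this is not a rewrite configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Target taken from the question; the constant names are hypothetical.
BACKEND_SCHEME = "http"
BACKEND_HOST = "my.emailingapp.com"
BACKEND_PREFIX = "/lists"


def map_to_backend(url: str) -> str:
    """Map a list.mydomain.com URL to its my.emailingapp.com/lists/ equivalent."""
    parts = urlsplit(url)
    # Prefix the path and carry the query string over unchanged.
    path = BACKEND_PREFIX + (parts.path or "/")
    return urlunsplit((BACKEND_SCHEME, BACKEND_HOST, path, parts.query, ""))


print(map_to_backend("http://list.mydomain.com/?stuff=a"))
```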

SEO Canonical Issue resolution on iis

I have a site running on IIS that has a canonical URL issue.
The error is:
The page with URL "http://www.site.org/images/join_forum.gif" can also be accessed by using URL "https://www.site.org/images/join_forum.gif". Search engines identify unique pages by using URLs. When a single page can be accessed by using any one of multiple URLs, a search engine assumes that there are multiple unique pages. Use a single URL to reference a page to prevent dilution of page relevance. You can prevent dilution by following a standard URL format.
How can I resolve this?
If the only difference is http vs https then don't worry about it. Search engines are smart enough to know they are the same file. And especially so for images.

I want all page requests to point to a single page

The wrinkle is that the pages being requested are aspx pages and they are no longer present. I want any request coming to the root domain (and any subdomain like www) to redirect to a single page in the root directory (namely index.html). I went into the IIS admin tool, selected the domain and tried to redirect to a URL (http://mydomain.com/index.html), but that caused index.html to be appended multiple times and resulted in an error.
What is the best way to do this, so that any http request to this domain goes to the index.html page?
Thanks in advance.
Warren
You can achieve this using the ASP.Net App_Offline feature; if you place a file in the root of your website called App_Offline.htm, the contents of that file will be returned in response to all incoming requests.
Alternatively, find your default 404.html file and put some redirection code into it.
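Whichever mechanism is used, a redirect-everything rule has to exclude the target page itself, otherwise the redirect can loop. The Python sketch below shows only that decision logic; redirect_target is a hypothetical helper, and whether a loop of this kind caused the error in the question is an assumption.

```python
TARGET = "/index.html"


def redirect_target(request_path: str):
    """Return the path to redirect to, or None when the page should be served."""
    if request_path == TARGET:
        # Already at the target: serve it instead of redirecting again.
        return None
    return TARGET


print(redirect_target("/default.aspx"))
print(redirect_target("/index.html"))
```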