How should I handle Spiders/Web Crawlers using HTTP/0.9 if I am using Apache 2?

I am using Apache 2 to serve content, and Bingbot is requesting pages from my server over HTTP/0.9. HTTP/0.9 requests carry no Host header, and my server does not serve content for bare-IP requests.
How should I handle the spider if I can't tell which host it wants, but still need it to index my site?
I currently return 400 Bad Request, but it makes me nervous that my sites will not be indexed by Bing or Yahoo.
Thanks

[SOLVED]: I have been returning 400 Bad Request and Bing/Yahoo have taken the hint.
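For what it's worth, Apache 2.4.24 and later can be told to refuse HTTP/0.9 outright, which makes the 400 deliberate rather than incidental. A minimal sketch (global configuration or a virtual host; treat the exact placement as your choice):

# HTTP/0.9 requests carry no Host header, so name-based virtual hosting
# cannot route them; reject them explicitly (Apache 2.4.24+).
HttpProtocolOptions Strict Require1.0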

Related

Can I use Cloudflare to look like multiple websites from one website?

I'm a Cloudflare noob. I have had one site up and running for a while, using Cloudflare as a way of doing HTTPS for my site. I host it at http://www.pishandtish.com (made-up name for this example) and, through the joys of Cloudflare, the world sees it as https://www.pishandtish.com - behind the scenes, Cloudflare fetches the content from my http://www.pishandtish.com site and proxies it to the world as https://www.pishandtish.com.
Pretty straightforward stuff - Cloudflare is proxying my content.
But if, say, I had http://foo.pishandtish.com and http://bar.pishandtish.com, could I use Cloudflare so that the rest of the world sees my http://foo.pishandtish.com as https://foo.com, and my http://bar.pishandtish.com as https://bar.com? (i.e. a way for a cheapskate to do two websites on a single-website hosting plan)
And if so, any clues as to how? Some sort of request rewriting? And can I do it on the free Cloudflare plan?
These are just for very small (zero budget) community groups, and very-low-traffic sites.
OK, what I think you'd need here is for Cloudflare to rewrite the Host header of the HTTP request, so that my request for something from foo.com looks like a request for foo.pishandtish.com by the time it arrives at my website (proxied and host-header-rewritten by Cloudflare).
Host-rewriting is a feature for domains on the Cloudflare Enterprise plan.
https://support.cloudflare.com/hc/en-us/articles/206652947-Using-Page-Rules-to-Re-Write-Host-Headers
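Whichever layer ends up rewriting the Host header, the origin still has to map each incoming hostname to the right content. Assuming the hosting account runs Apache with name-based virtual hosts (the document roots here are made up for the sketch), the origin side would look something like:

<VirtualHost *:80>
    ServerName foo.pishandtish.com
    DocumentRoot /var/www/foo
</VirtualHost>

<VirtualHost *:80>
    ServerName bar.pishandtish.com
    DocumentRoot /var/www/bar
</VirtualHost>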

How can I find out who is hotlinking my content?

I have lots of videos on my website, and I am curious to know which websites are hotlinking to them.
I am using cPanel with AWStats, and I have Google Analytics too.
The server is running Apache.
You can check the Referer header.
If you want to block all requests coming from outside your own domain, there is an example for an Apache server below.
But this technique has two disadvantages:
It is very easy to send a faked Referer header.
In rare cases, some browsers may not send a Referer header at all.
The most common way to prevent content from being hotlinked is to generate dynamic temporary links with a limited lifetime.
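A minimal sketch of that Apache example, using mod_rewrite in an .htaccess file; example.com and the listed file extensions are placeholders for your own domain and media types:

RewriteEngine On
# Allow empty Referers (second caveat above) and requests from your own site;
# refuse everything else that asks for a video file.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule \.(mp4|webm|ogv|flv)$ - [F,NC,L]

To find out who is already hotlinking, the Referer column of the Apache access log (or the external-referrers report in AWStats) shows which outside pages are requesting your video files.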

How to solve HTTPS response 498 when googlebot comes along?

I have an AJAX site, leuker.nl, and when Googlebot comes along the site is started up and retrieves an XML file containing the site text from my backend server.
The HTTP GET request used to retrieve the file returns HTTP error 498.
Looking at LINK, it explains that this concerns an invalid/expired token (Esri) returned by "ArcGIS for Server".
I don't understand this error; I don't even use ArcGIS and had never heard of it before.
Any idea how to solve this?
In the backend I use Apache httpd 2.4 in combination with Tomcat 8.0. Apache proxies requests to Tomcat through an AJP connector. The requested XML file is returned directly by Apache.
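For reference, the proxy arrangement described above would look roughly like this in the Apache configuration; the XML file name, the AJP port, and the paths are assumptions for the sketch, not taken from the real site:

<VirtualHost *:443>
    ServerName leuker.nl
    # Serve the static site-text XML directly from Apache...
    ProxyPass /sitetext.xml !
    # ...and hand everything else to Tomcat over the AJP connector.
    ProxyPass        / ajp://localhost:8009/
    ProxyPassReverse / ajp://localhost:8009/
</VirtualHost>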

Browsing to IP and port number displaying raw html 400 bad request

I set up HTTPS and a redirect from HTTP to HTTPS. Browsing to just the IP address, with or without http:// or https://, works great and redirects perfectly. But when browsing to X.X.X.X:443, the web server displays the 400 Bad Request as raw HTML. Can I either disable the 400 Bad Request or redirect those requests to HTTPS? Please help. Thanks!
Whether that is possible depends on which web server you are using, and you didn't specify that. However...
Doing so would actually be a bad idea, as it would encourage people to use HTTP (no S) to connect to your secure server. In doing so, they would send their request in plaintext. If the system just returned a "301 Moved Permanently" to the HTTPS URL, the second request (and its reply) would be protected, but you would still have leaked the request to a potential attacker during the first attempt.
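If the server happens to be Apache and you accept that trade-off, the usual trick is an ErrorDocument inside the SSL virtual host, which turns the built-in 400 into a redirect (the hostname below is a placeholder):

# Inside the existing *:443 virtual host: a plain-HTTP request to the SSL
# port draws Apache's built-in 400 response; pointing ErrorDocument at a
# full URL turns that error into a redirect to the HTTPS site.
ErrorDocument 400 https://www.example.com/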

Is there a way to use WCF to redirect all HTTP requests on a certain port

I need to redirect all requests on port 80 of an application server to a web server. I'm trying to avoid the need to install IIS and instead use WCF to do the job.
It looks like an operation such as the one below would be suitable, but one problem I've got is that if a URL of the form http://mydomain.com/ is used, WCF presents a metadata page instead of redirecting.
[OperationContract, WebGet(UriTemplate = "*")]
void RedirectToWebServer();
Does anybody know of a way to get WCF behaving the same as IIS in redirect mode?
This just seems like the wrong tool for the job. If you really don't want to use one of the many web servers that could do this with a couple minutes of setup time (IIS, Apache, Lighttpd), you could just make a simple HTTP socket server.
Listen on port 80. As soon as you get two newlines in a row, send back the response:
HTTP/1.1 301 Moved Permanently
Location: http://myothersite.com/whatever
(I'm almost certain that's the minimum you need.) If you want to be really fancy and follow the HTTP specs, match HTTP/1.1 or HTTP/1.0 based on what the request uses, but for a quick and dirty redirect, that's all you need.
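A rough sketch of that quick-and-dirty redirector in C# (staying in .NET since the question is about WCF); the target URL is the placeholder from above, binding to port 80 typically needs elevated rights, and there is no error handling:

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

class RedirectServer
{
    static void Main()
    {
        var listener = new TcpListener(IPAddress.Any, 80);
        listener.Start();
        while (true)
        {
            using (var client = listener.AcceptTcpClient())
            using (var stream = client.GetStream())
            {
                // Read until the blank line that ends the request headers
                // ("two newlines in a row", as described above).
                var request = new StringBuilder();
                int b;
                while ((b = stream.ReadByte()) != -1)
                {
                    request.Append((char)b);
                    var text = request.ToString();
                    if (text.EndsWith("\r\n\r\n") || text.EndsWith("\n\n")) break;
                }

                // Ignore the request contents and always send the same redirect.
                var response =
                    "HTTP/1.1 301 Moved Permanently\r\n" +
                    "Location: http://myothersite.com/whatever\r\n" +
                    "Connection: close\r\n" +
                    "Content-Length: 0\r\n" +
                    "\r\n";
                var bytes = Encoding.ASCII.GetBytes(response);
                stream.Write(bytes, 0, bytes.Length);
            }
        }
    }
}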
That said, again, I'd say go grab another web server and set up a redirect using it. There are many lightweight HTTP servers that will work.