Can we detect if a site is on a CDN?

Is there a way to detect if a site is on a Content Delivery Network (CDN), and if so, can we tell which service they are using?

One method achievable from the command line is using the 'host' command with the -a flag set to see the full DNS record, e.g.
host -a www.visitbritain.com
Returns:
www.visitbritain.com. 0 IN CNAME d18sjq5nyxcof4.cloudfront.net.
Here the CNAME entry tells us that the site is using CloudFront as its CDN.
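If you prefer dig, the same check looks like this (the output shown in the comments is what you would expect for a CloudFront-fronted site; it will vary by resolver and region):
dig +short www.visitbritain.com
# the first line of the output is the CNAME, e.g. d18sjq5nyxcof4.cloudfront.net.,
# followed by the A records it resolves to
Well-known CNAME suffixes map to CDNs: cloudfront.net is Amazon CloudFront, edgekey.net and akamaiedge.net are Akamai, cdn.cloudflare.net is Cloudflare, and fastly.net is Fastly.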

Just take a look at the URLs of the images (and other media) on the site.
Reverse-lookup the IPs of the hostnames you see there and you will see who owns them.
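As a sketch of that lookup chain (the media hostname and IP below are invented; substitute whatever the page source actually references):
dig +short images.example-shop.com                # hypothetical media hostname from the page
host 151.101.1.57                                 # reverse (PTR) lookup of a returned IP
whois 151.101.1.57 | grep -iE 'orgname|netname'   # the owning network usually names the CDN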

I built this little tool to identify the CDN used by a site or a domain, feel free to try it.
The URL: http://www.whatsmycdn.com/

You might also be able to tell from the HTTP headers of the media if the URL doesn't give it away. For example, media served by SimpleCDN has Server: SimpleCDN 5.6a4 in its headers.
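As a sketch (the URL is a placeholder, and the exact header names differ per CDN), you can pull just the response headers with curl and grep for the telltale ones:
curl -sI https://www.example.com/logo.png | grep -iE '^(server|via|x-cache|x-served-by|cf-ray):'
# cf-ray points at Cloudflare, x-served-by is typical of Fastly/Varnish,
# and 'x-cache: ... cloudfront' points at Amazon CloudFront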

CDN Planet now has its CDN Finder tool on GitHub:
http://www.cdnplanet.com/blog/better-cdn-finder/ The tool installs on the command line and lets you feed in hostnames and check whether they use a CDN.

If a website is using Google Cloud's CDN, you can simply check it using curl:
curl -I https://<site-url>
In the response you may find the following headers:
x-goog-metageneration: 2
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 17393
x-goog-meta-object-id: 11602
x-goog-meta-source-id: 013dea516b21eedfd422a05b96e2c3e4
x-goog-meta-file-hash: cf3690283997e18819b224c6c094f26c
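To filter just those Google-specific headers (the bucket and object names below are placeholders):
curl -sI https://storage.googleapis.com/your-bucket/your-object | grep -i '^x-goog-'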

Yes, you can find out with:
host -a www.website.com

Apart from some excellent answers already posted here, which include some direct methods that may or may not work for every website out there, there is also an indirect way to see if a CDN is in place. It is especially useful if it's your own website and you want to know whether you are getting what you are paying for!
The promise of a CDN is that connections from your users are terminated closer to them, so they incur less TCP/TLS connection-establishment overhead, and that static content is cached closer to them, so it loads faster and puts less strain on your origin servers.
To verify this, you can take measurements of site load times across the globe and see whether all your users get similar load times. No, you don't have to get a machine everywhere in the world to do that! Someone has already done it for you.
Head to https://prober.tech/ and enter the URL you wish to test for load times.
Because that site itself is behind Cloudflare's CDN, you can put its own link in the test box and use it as a baseline!
More information on using the tool can be found here.
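For a quick single-vantage-point complement from your own machine (it won't prove global coverage, but it shows where the time goes), curl's --write-out timings are handy; the URL is a placeholder:
curl -s -o /dev/null -w 'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' https://www.example.com/
# low connect/tls numbers measured from several regions are exactly what a CDN should buy you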

Related

How to create a friendly url in Tomcat?

I want to modify my application URL from http://localhost:8080/monitor/index.html to just monitor, so that typing monitor into the browser opens my application. Is there a way to achieve this? Can someone suggest the configuration changes that would be required?
Can I map my short URL to the existing one, maybe somewhere in web.xml? I am not sure about the approach; any suggestions would be great.
Thanks and regards
Deb
You're mixing up several different protocol layers in your question.
If you just enter nothing but "monitor" in the browser URL bar, the browser will first look up "monitor" in DNS and, finding nothing, will then probably send a query to Google or your configured search engine. In the past, browsers took other steps, such as appending ".com" and prepending "www.", but I don't think modern browsers do that any more.
So far, your server is not even remotely involved.
If you're a customer of a large ISP (Time Warner, Comcast) and use their DNS, it's also possible the ISP will intercept your failed DNS lookup and route the request to a "helpful" search page (i.e. spam) of their own.
At this point the request is still nowhere near your server.
I suppose you could mess with the /etc/hosts file on your local system to resolve "monitor" to the proper hostname, but that's an extremely brittle solution that has to be hard-coded on each machine where you want this "shortcut" (and which breaks when the hostname changes).
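For completeness, the brittle variant would look something like this (a sketch that additionally assumes Tomcat listens on port 80 and the app is deployed as the ROOT context; otherwise you would still have to type the port and path):
# as root, on every machine that should know the shortcut
echo "127.0.0.1  monitor" >> /etc/hosts
# http://monitor/ now resolves, but only on machines carrying this entry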
You're much better off just setting up a web shortcut in your browser that points to the right place.

Strange domains in mod_pagespeed cache folder

About a year ago I installed mod_pagespeed on my VPS, set it up and left it running. Recently, while exploring files on my server, I went into the pagespeed cache folder and discovered some strange folders.
The folders are usually named like ,2Fwww.mydomain.com, or ,2F111.111.111.111 for IP addresses. I was surprised to see some domains that do not belong to me, such as:
24x7-allrequestsallowed.com
allrequestsallowed.com
m.odnoklassniki.ru
www.fbi.gov
www.securitylab.ru
It looks like something dodgy is going on. Was my server compromised, or is there a reasonable explanation?
That does look peculiar. Everything in the cache folder should be files that mod_pagespeed tried to rewrite. There are two ways that I know of that this can happen:
1) You reference some third-party resource (say an image from another domain, or google analytics script) and you have explicitly enabled rewriting of that domain with ModPagespeedDomain www.example.com or ModPagespeedDomain *.
2) If your server accepts HTTP requests with invalid Host headers. Try (for example) wget --header="Host: www.fbi.gov" www.yourdomain.com/foo/bar.html. If your server accepts requests like that it may be providing mod_pagespeed with an incorrect base domain, and then subresources would be fetched from the same domain (so if www.yourdomain.com/foo/bar.html references some.jpeg, and your server accepts invalid host headers, we could fetch www.fbi.gov/foo/some.jpeg as the resource). There was a recent security release that makes sure all of these subrequests are done against localhost (not arbitrary third-party websites). Please see: https://developers.google.com/speed/docs/mod_pagespeed/CVE-2012-4001
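The same probe with curl, if you prefer (www.yourdomain.com is a stand-in for your server), checking whether an arbitrary Host header is accepted:
curl -s -o /dev/null -w '%{http_code}\n' -H 'Host: www.fbi.gov' http://www.yourdomain.com/foo/bar.html
# a 200 here means the server serves pages for host names it does not own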
You might want to look through these folders and see what specific resources are in there. I think that the biggest concern you should have is that someone might be trying to perform an XSS attack on your users or maybe a DDoS attack against another website (like www.fbi.gov), using your server as one vector. I do not think that these folders are indicative that your server itself is compromised.
If you would like to discuss this more, https://groups.google.com/forum/?fromgroups#!forum/mod-pagespeed-discuss is a good list to join and email.

Why is my favicon appearing on Amazon S3 endpoint but not on the forwarded domain?

I have tried everything possible and am out of ideas as to why my favicon is still not appearing. If I told you how much time I've spent trying to figure this out, you'd understand why I'm on the verge of losing my mind.
Here's the rundown (I'm not technical, just starting to learn, so please bear with me):
I'm using Amazon S3 as my host. GoDaddy is the DNS, and I have forwarding with a mask set up so that the Amazon endpoint is directed to the actual domain.
Here's the strange thing: the favicon appears on the Amazon endpoint but not on the forwarded domain, which is where I want it to appear. The favicon also appears when I do some testing using Dreamweaver.
I can assure you that it isn't a matter of clearing the cache, as I've done that numerous times and have run tests to make sure it's working. I've tried all the possible variations of the code and nothing works. I'm led to believe that it's not an issue with the code, cache or file, but rather something else that is outside my realm of knowledge.
So I come to Stackoverflow.
Please-- any help will be GREATLY appreciated!
For anyone having this problem: making the favicon public and using the direct link found in the file's properties on S3 did the trick.
That means using a full URL that is always going to work from everywhere. Depending on how things are set up, a hostname could resolve to something like localhost on some machines, so you want to make sure the host name you're using always has the resource at that location. CORS shouldn't have anything to do with it, as it is a standard full GET request.
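If you manage the bucket with the AWS CLI, making the favicon public looks roughly like this (the bucket name is a placeholder, and newer buckets with "Block Public Access" enabled need that setting relaxed first):
aws s3 cp favicon.ico s3://your-bucket/favicon.ico --acl public-read
# the object is then reachable at https://your-bucket.s3.amazonaws.com/favicon.ico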

Adding decision logic to Apache's mod_proxy_balancer with Memcache

What I am trying to achieve is to have Apache's mod_proxy_balancer check if a request was already made using a Memcache store.
Basically (a shell sketch of this lookup follows the list):
Streaming media request comes in.
Check if streaming media has already been served with Memcache.
If it has, check whether that streaming media server can handle another request.
If it can, send the request to said streaming media server.
If it can't, send the request to the next streaming media server in line.
Store the key:value pair in Memcache.
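Before touching the C module, the decision logic itself can be prototyped from the shell; here is a hedged sketch using memcached's plain-text protocol over nc (the key scheme and backend names are invented for illustration):
uri="/media/clip123.mp4"
# step 2: has this URI been served before, and by whom?
backend=$(printf 'get served:%s\r\n' "$uri" | nc -w 1 localhost 11211 | sed -n '2p' | tr -d '\r')
if [ -n "$backend" ]; then
    echo "steps 3-4: route to $backend if it has capacity"
else
    backend="stream2.internal:8080"   # step 5: next server in line
    # step 6: remember the mapping for an hour
    printf 'set served:%s 0 3600 %d\r\n%s\r\n' "$uri" "${#backend}" "$backend" | nc -w 1 localhost 11211 >/dev/null
fi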
My questions are:
Does mod_proxy_balancer already do this in some way?
Is there any way to make Apache a content-aware load balancer?
Any other suggestions would be greatly appreciated too, other software, other approach, etc.
Cheers.
Looking at mod_proxy_balancer.c, one could, as suggested in the comments in the file, add additional lbmethods, something along the lines of "bymemcached_t" or "bymemcached_r", where the t and r endings denote the "bytraffic" and "byrequests" methods respectively. We would run the pseudocode above and, if no entry is found, proceed to the other methods, saving the result in the memcached store.
In my research I came across HAProxy, which, per its documentation, does exactly what I want via the balance algorithm option 'uri', just without Memcached. That is fine for my purposes.

How do I mix ssl with non-ssl content?

I have an SSL page that also downloads an avatar from a non-SSL site. Is there anything I can do to isolate that content so that the browser does not warn the user about mixed content?
Just an idea; either:
try to use an SSL URL on the avatar website, if necessary by editing whatever JS/PHP/... script they provide, or:
use your scripting language of choice to grab a copy of the avatar and store it on your server, then serve it from there (sketched below).
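The second option, sketched with an invented avatar URL (in practice you would do this from your application or a cron job and re-fetch on a schedule so the avatar stays current):
curl -fsS -o /var/www/static/avatars/user123.png "http://avatars.example.com/user123.png"
# then reference /static/avatars/user123.png from the https page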
There are a number of good security reasons for the browser to warn about this situation, and attempting to directly bypass it is only likely to set off more red flags.
Ninefingers' suggestions are good, and I would suggest a third option: you can proxy the content directly through your own server using a simple binary retrieve/transmit script, if it changes frequently and is unsuitable for caching.
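A minimal retrieve/transmit script can be as small as a CGI wrapper; this sketch assumes CGI is enabled on the HTTPS server, uses an invented upstream host, and trusts its input far too much for production use (sanitize the path first):
#!/bin/sh
# avatar-proxy.cgi - fetch the avatar over http, replay it over https
echo "Content-Type: image/png"
echo ""
curl -s "http://avatars.example.com/${QUERY_STRING}"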
If all the content you want to include from foreign sites comes from a specific server and path (i.e. http://other.guy/avatar/*), you could use mod_proxy to create a reverse proxy so that https://your.site/avatar_proxy/{xyz} mirrors http://other.guy/avatar/{xyz}. This will increase your bandwidth usage and probably slow things down.