Do I need a Content Delivery Network if my audience is in one city? - optimization

I asked a question earlier about building a social network website with lots of images. The problem is that the more users there are, the more images the website will have, and I was afraid it would take a LONG time for the images to load on the client side.
How to handle A LOT of images in a webpage?
The feedback I got was to use a content delivery network. Based on my limited knowledge, a CDN is a series of computers containing copies of data, and clients access particular servers depending on where they are in the world. What if I'm planning to release my website only for a university, only for students? Would I need something like a CDN for my images to load instantly, or would I need to rent a REALLY expensive server? Thanks.

The major hold-up with lots of images is the number of requests the browser has to make to the server and, in turn, the number of requests the server has to queue up and send back.
While one benefit of a CDN is location (it serves assets from the nearest physical server), the other benefit is that it's simply another server. So instead of one server having to queue up and deliver all 20 file requests, it can handle 10 while the other server simultaneously handles the other 10.
Would you see a huge benefit? Well, it really doesn't matter at this point. Having too much traffic is a really good problem to have. Wait until you actually have that problem, then you can figure out what your best option is at that point.

If your target audience will not be very large, you shouldn't have a big problem with images loading. A content delivery network is useful when you have a large application with a distributed user base and very high traffic. Below that, you shouldn't have a problem.

Hardware stress aside, another valuable reason for using a CDN is that browsers limit the number of simultaneous connections to a single host. Say the browser is limited to 6 connections per host and one page load requires 10 images, 3 CSS files, and 3 JavaScript files. If all 16 of those files come from one host, it will take a while to work through them all. If, however, the 10 images are served from a CDN that uses different hostnames, that load time can be drastically reduced.
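To illustrate the idea, here is a minimal domain-sharding sketch in TypeScript. The shard hostnames and the data-file attribute are hypothetical; the point is simply to spread image requests across several hostnames so the per-host connection limit applies to each shard separately.

```typescript
// Hypothetical CDN shard hostnames; each gets its own per-host connection quota.
const SHARD_HOSTS = [
  "img1.cdn.example.com",
  "img2.cdn.example.com",
  "img3.cdn.example.com",
];

// Pick a shard deterministically from the file name so the same image always
// resolves to the same hostname (keeps browser caching effective).
function shardedUrl(fileName: string): string {
  let hash = 0;
  for (const ch of fileName) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  const host = SHARD_HOSTS[hash % SHARD_HOSTS.length];
  return `https://${host}/images/${encodeURIComponent(fileName)}`;
}

// Point each image at its sharded URL instead of the origin server.
document.querySelectorAll<HTMLImageElement>("img[data-file]").forEach((img) => {
  img.src = shardedUrl(img.dataset.file!);
});
```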

Even if all your users are geographically close, they may have very different network topologies to reach your hosting provider. If you choose a hosting provider that has peering agreements with all the major ISPs that serve your town, then a CDN may not provide much benefit. If your hosting provider has only one peer, who may in turn be poorly connected to the ISPs in your town, then a CDN may provide a huge benefit if it can remove latency for some or all of your users.
If you can measure latency from all the major ISPs in your area to your hosting provider, that will help you decide whether you need a CDN to shorten the hops between your content and your clients.
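One rough way to collect that data is to have a test page on your host time a few small requests from whichever network the visitor happens to be on. A minimal browser-side sketch, assuming a hypothetical /ping endpoint and /report-latency collector on your own server:

```typescript
// Time several small requests to the origin and report the median, so results
// can be compared across users on different ISPs. /ping and /report-latency
// are hypothetical endpoints on your own server.
async function measureLatencyMs(samples = 5): Promise<number> {
  const times: number[] = [];
  for (let i = 0; i < samples; i++) {
    const start = performance.now();
    // "no-store" forces a real network round trip on every sample.
    await fetch(`/ping?nocache=${Date.now()}`, { cache: "no-store" });
    times.push(performance.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(times.length / 2)];
}

measureLatencyMs().then((ms) => {
  // Send the measurement back so it can be aggregated per ISP / subnet.
  navigator.sendBeacon("/report-latency", JSON.stringify({ ms }));
});
```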

Related

Is WebRTC too privacy invasive to use for video chat without TURN servers?

I'd like to implement a simple video chat system for students to tutor each other. I'm a one-man show and would like a system I can run in a cost-effective way, starting with 10 users and hopefully scaling up as needed.
WebRTC seems like a great, low-latency, and cheap option for building this feature. However, if clients are communicating, they must know each other's public IP. Is this a significant privacy or security issue?
What is the worst-case scenario if somebody gets my IP address? Wouldn't any malicious actor have to go through my ISP to get my specific location?
Thanks!
If you host it yourself, WebRTC can be extremely cost-effective. I've been running the SFU at galene.org (disclaimer: I'm the main developer), which is used for multiple lectures with up to a hundred students. Even though this is a full-fledged SFU (and not a mere TURN server), hosting amounts to just over €6/month.
If your tutoring sessions involve just two or three people, then peer-to-peer WebRTC might be enough, but even then a TURN server will be required, especially if some of your users are on university networks. For larger groups, you will need to push your traffic through an SFU.
If you do peer-to-peer WebRTC, then any user can learn the IP of any user they are communicating with; this is most probably not an issue, since those IP addresses are likely already being disclosed elsewhere (e.g. in mail headers). If you go through an SFU, then the IP addresses are not deliberately disclosed, but they might still leak; for example, the SFU implementation mentioned above (Galene) discloses IP addresses when a user initiates a file transfer, since file transfers happen directly between clients, in a peer-to-peer fashion. (It may be possible to avoid this disclosure by setting the iceTransportPolicy field to relay in the PeerConnection constructor, but I haven't tested how effective it is.)
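For reference, this is roughly what forcing relayed candidates looks like in the browser API; the TURN server URL and credentials below are placeholders for your own server:

```typescript
// Force all ICE candidates through the TURN relay so peers never see each
// other's host or server-reflexive addresses. The TURN URL and credentials
// are placeholders.
const pc = new RTCPeerConnection({
  iceServers: [
    {
      urls: "turn:turn.example.org:3478",
      username: "demo-user",
      credential: "demo-pass",
    },
  ],
  // "relay" tells the browser to gather only relayed (TURN) candidates,
  // at the cost of routing all traffic through the relay.
  iceTransportPolicy: "relay",
});
```

The trade-off is that all media then flows through the TURN server, which increases your bandwidth bill.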
WebRTC doesn't have to be P2P. You could run an SFU: each user uploads their video to your server, and the server distributes it via WebRTC. Then the users never learn each other's IPs.
I don't have any exact numbers, but it isn't expensive either. Your biggest expense will probably be bandwidth. Lots of open-source SFUs exist; this is a good list to get started.

Would a Google Cloud Platform machine with many different CPUs allow me to run API requests through several different IP addresses?

I am trying to query public utility data from an API (oasis.caiso.com) with a threaded script in R. Apparently this API will refuse requests from certain IP addresses if too many are made. Therefore I need to run many different API requests in parallel across different IP addresses, and I am wondering if a machine with many CPUs on Google Cloud Platform will allow this.
I was looking at the n1-highcpu-96 option from this page: https://cloud.google.com/compute/docs/machine-types
If this is a poor solution, can anyone suggest another distributed computing approach that can scale to allow dozens or even hundreds of API queries simultaneously from different IPs?
If I needed multiple IPs to perform "light" API calls, I would not scale vertically (with a single 96-core machine). I would instead create an instance group of 50, 100, or n Debian micro or small preemptible instances, with the size depending on the kind of computation you need to perform.
You can set up a startup script, loaded from the instance metadata or baked into a custom image, that connects to the API server, does what it has to do, and saves the result to a bucket. If the instance gets an "API refused" response, I would simply kill that instance automatically and let the managed instance group create a new one for me, most likely with a new IP; a rough sketch of such a worker follows.
I think this is one easy way to achieve what you want, but I guess there are multiple solutions.
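As a very rough illustration of that pattern (untested; the endpoint, bucket, and zone are placeholders, and it assumes Node 18+ is available on the instance), the startup script could launch a worker along these lines:

```typescript
// Hypothetical worker for one preemptible instance in a managed instance
// group: call the API, store the result in a bucket, and delete the instance
// when the API refuses requests so the group replaces it, often with a new IP.
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

const API_URL = "https://example.com/oasis-query"; // placeholder endpoint
const RESULT_FILE = "/tmp/result.json";
const BUCKET = "gs://my-results-bucket";            // placeholder bucket

async function main(): Promise<void> {
  const res = await fetch(API_URL);

  if (res.status === 403 || res.status === 429) {
    // The API is refusing this IP: delete the instance so the managed
    // instance group recreates it, hopefully behind a different IP.
    const name = execSync("hostname").toString().trim();
    execSync(
      `gcloud compute instances delete ${name} --zone=us-central1-a --quiet`
    );
    return;
  }

  // Otherwise save the response to the bucket for later aggregation.
  writeFileSync(RESULT_FILE, await res.text());
  execSync(`gsutil cp ${RESULT_FILE} ${BUCKET}/`);
}

main().catch((err) => console.error(err));
```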
I am not sure what you are trying to achieve, and I think you need to check first that it is legal and that the owner of the API agrees.

How many active, simultaneous connections can a web server accept?

I know this is a difficult question but here it is, in context:
Our company has a request to build a WordPress website for a certain client. The caveat is that, on one day per year, for a period of about 20 minutes, 5,000 - 10,000 people will attempt to access the home page of this website. Their purpose: Only to acquire an outbound link to another site.
My concern is, no matter what kind of hosting we provide, the server may reject the connections after a certain number of connections are reached.
Any ideas on this?
This does not depend on WordPress. WordPress is basically software that renders web pages: it helps you quickly modify the content of a page. Other software, such as Apache, accepts the connections and hands the requests off to, for instance, WordPress.
Apache can be configured to accept more connections; I think the default is about 200. Whether that is a problem really depends. If the purpose is only to hand out another URL, connections will be terminated quickly, so that's not really an issue; 5,000-10,000 visitors spread over 20 minutes averages fewer than ten new connections per second. If, on the other hand, you want to generate an entire page with PHP and MySQL, it can take some time before a client is satisfied, and in that case 200 simultaneous connections are perhaps not sufficient.
As B-Lat points out, you can use cloud computing platforms like Google App Engine or Microsoft Azure that provide a lot of server power but only bill you for the resources actually consumed. In other words, you can accept thousands of connections at once, yet you don't pay extra on the other days of the year when clients visit the website far less often.

Is it a bad idea to have the web browser query another API instead of my site providing it?

Here's my issue. I have a site that provides some investing services. I pay for end-of-day data, which is all I really need for my service, but it feels a bit odd when people check in during the day and it only displays yesterday's closing price. End-of-day data is fine for my analytics, but I want to display delayed quotes on my site.
According to Yahoo's YQL FAQ, if you use IP-based authentication you are limited to 1,000 calls/day/IP. If my site grows I may exceed that, so I was thinking of pushing this request to the people browsing my site themselves, since it's extremely unlikely that the same IP will visit my site 1,000 times a day (my site itself has no use for this info). I would call a URL from their browser, then parse the results so I can display them in the format of the site's template.
I'm new to web development, so I'm wondering: is it common practice, or a bad idea, to have the user's browser make the API call itself?
It is not a bad idea at all:
You stretch those rate limits this way;
Your server will respond faster (since it does not have to contact the API);
Your page will load faster because the initial response is smaller;
You can load the remaining data from the API asynchronously while your UI is already responsive.
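A minimal sketch of that pattern in the browser; the quote endpoint and response shape are placeholders rather than the actual YQL API:

```typescript
// Fetch a delayed quote from the visitor's own browser, so each user's IP
// makes the call instead of your server. Endpoint and response shape are
// placeholders.
interface Quote {
  symbol: string;
  price: number;
}

async function loadQuote(symbol: string): Promise<void> {
  const res = await fetch(
    `https://quotes.example.com/api?symbol=${encodeURIComponent(symbol)}`
  );
  if (!res.ok) {
    return; // fall back to the end-of-day price already rendered by the server
  }
  const quote: Quote = await res.json();

  // Render the delayed quote into the page using the site's own template.
  const el = document.querySelector(`[data-quote="${symbol}"]`);
  if (el) {
    el.textContent = quote.price.toFixed(2);
  }
}

// Kick off the request once the initial page has rendered.
loadQuote("AAPL");
```

Note that this only works if the third-party API allows cross-origin requests (CORS) or offers a JSONP-style interface.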
Generally speaking, it is a great idea to talk to APIs from the client: it's more dynamic, you spread the traffic, you get more responsiveness, and so on.
The biggest downside I can think of is that you become dependent on the availability of another service. On the other hand, your server(s) will be stressed less because the traffic is spread out.
Hope this helped a bit! Cheers!

Being cost-effective with bandwidth for a streaming service

Basically, I'm about to launch a music streaming app, and I'm trying to figure out cost.
Cloud services like S3 and Rackspace Cloud are expensive. As far as scalability is concerned, I'm assuming that an average user listens to music for an hour, and let's say our app scales to hundreds of thousands of users. That's about 90 MB per hour per user of bandwidth. Let's make another assumption and say we average 10,000 concurrent users streaming music over a 24-hour period: 90 MB/hr * 10,000 * 24 = 21,600,000 MB, or roughly 20.5 TB per day. That's a shitload of bandwidth! According to Rackspace's pricing, that's $3,780 USD per day... holy crap! Another thing: services like Rdio, Grooveshark, etc. have roughly 15 million (licensed) songs. If I throw that into the mix, storage is 15,000,000 * 3 MB (average song) = ~43,945 GB, or about $4,300 a month.
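Here's the same back-of-the-envelope calculation in code form; the $0.18/GB rate is just my assumption, roughly what that daily price implies:

```typescript
// Back-of-the-envelope bandwidth cost check for the figures above.
const mbPerUserHour = 90;       // average streaming bandwidth per user
const concurrentUsers = 10_000; // assumed average concurrent listeners
const hoursPerDay = 24;
const pricePerGB = 0.18;        // USD, assumed bandwidth rate

const mbPerDay = mbPerUserHour * concurrentUsers * hoursPerDay; // 21,600,000 MB
const gbPerDay = mbPerDay / 1024;                               // ~21,094 GB (~20.6 TB)
const costPerDay = gbPerDay * pricePerGB;                       // ~$3,800 per day

console.log({ mbPerDay, gbPerDay: Math.round(gbPerDay), costPerDay: Math.round(costPerDay) });
```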
So at these rates, there is no way companies like Rdio, Grooveshark, etc. pay this much.
So my question is simple: generally, what are some routes to take when creating a streaming service? Being specific would earn my vote! (i.e., links to well-rated companies offering cheaper CDN services or unmetered colocation for a flat rate)
Thanks duders!
More info:
Application servers will be hosted on Rackspace... but this is somewhat irrelevant, considering that I really just need a fast "CDN".
Look at accelerating load balancers like jetNEXUS. They are very simple to set up and use techniques like static caching, HTML muxing, and compression to dramatically reduce the amount of data hitting the actual servers. This can save you a ton of money in bandwidth costs.
I think Rackspace has some Zeus or jetNEXUS offerings, and I know it's available as an option on Amazon's cloud.
There are plenty of ways to reduce that cost. I know Spotify does the following (among other things):
Cache the songs locally.
Use P2P to download from other clients (they mainly use the server to guarantee low latency).
Only allow high bit-rates for paying users.
I recommend you read the following: http://www.csc.kth.se/~gkreitz/spotify-p2p10/
If you're looking for cheap hosting, then I suggest you check out http://www.hetzner.de/. I haven't used them, but I've heard lots of good things about them.
We've been working on reducing the costs for our high-volume email delivery service (http://elasticemail.com), which uses a lot of bandwidth and needs to scale. We found that by switching to OVH we could get much more bandwidth and much more hardware for a lot less money, and they have great APIs to automate a lot of the complexities you'd find in a complex infrastructure.
So kudos to OVH (http://ovh.ie) for saving us a lot of money.
I know Rackspace Cloud Files uses Akamai for its CDN (which is included in the price). Akamai doesn't seem to publish any pricing on the web, but after some googling they do appear to be expensive.
I'd try these things.
Tell Rackspace your plans and ask if they can work out some kind of a bulk deal.
Contact Akamai and tell them about your plans and see what they offer.
Google "cheapest content delivery network" and see what comes up.
I think a CDN is what you want; that would give you the capacity you need. I don't think it'd be possible to serve that much from a simple VPS or cloud provider without a CDN behind it.
Basically, if you're serving a lot of static content from cloud servers (VPSs), it's going to clog up your pipes at some point. Even with a few servers you'll eventually reach capacity, but with a CDN all the content is pushed out to the edge nodes, so it basically scales on and on :)
From my experience, Akamai's CDN is awesome. I've used it quite a bit (through Rackspace Cloud Files) and in about two years only hit two issues: one was the end user's fault for using some far-away DNS servers, and the other, where a user in Italy or somewhere was getting content served from another country, was fixed in about a day and a half.
Akamai uses a geo-IP database lookup on the DNS server that requests the URL to give you the IP of a nearby host. This works great for most people, as they'll be using their ISP's DNS servers for lookups.
On the plus side, most users get ping times much smaller than if they downloaded from America; for example, on the Gold Coast my ping time to Akamai is about 20-50 ms, while to the USA it's 250-400 ms.
Update: After doing some googling myself, this looked promising: http://24ways.org/2008/using-google-app-engine-as-your-own-cdn - they suggest using Google App Engine as your own CDN. On the plus side, last time I checked you could do that for free; on the downside, I wouldn't build a business on the assumption that it stays free, going by Google's history of releasing free things and later charging for them or dropping them.