Designing a service for scale. Number of servers needed

Suppose that I need to design a web service. To keep it simple, assume that I use LAMP (Linux-Apache-MySQL-PHP).
I know that I will serve exactly N user requests per second. The requests are basically simple CRUD operations to the database, no file uploads or complex calculations.
Suppose that each request takes M ms to execute and uses K MB of memory on my server, which has G GB of RAM.
How many such servers do I need? Is it just N * K / G?
The reasonable value for M is 200ms. What is the reasonable value for K?
Do we need to take CPU power into account in this question?
Any additional considerations?

What you're doing is a good back-of-the-envelope approximation, but by no means should you use your thought exercise as a definitive guide for scaling your service.
That is because no service exhibits the kind of constant behavior you describe (blame it on unpredictable peripheral I/O, garbage collection, external factors, user input, etc.).
The correct approach is to perform scale and load testing. After you've written your service, load test it and note its performance characteristics. If you do things right, you should reach a point where your configuration maxes out one resource: the CPU, the network throughput, the memory, or disk I/O. If none of them is maxed out and you still hit a limit, then the bottleneck is one of your upstream dependencies (your database, etc.).
Once you've reached your peak it will tell you how many requests per second you can handle at peak.
You will also notice that in most cases peak performance is not sustainable: your setup may be able to burst for short periods of time handling many more requests per second than under sustained load.
After you get the numbers for a single server, you can vary the setup in two ways:
test with different hardware configurations (add more RAM if you're memory bound, add a better CPU if you're CPU bound, etc)
test with multiple servers; start adding servers and see how your service scales horizontally
Ideally your service should scale linearly as you add servers but you will likely find that the performance curve is not linear.
Get your numbers, tweak your design. Rinse. Repeat.
There is no substitute for this, and no magic formula.
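
For the back-of-the-envelope part of the question only, a rough sketch using Little's law (requests in flight = arrival rate × time per request) might look like the following. Note that N * K / G leaves out M: memory is held only while a request is in flight. The sample values (N = 500, K = 20 MB, G = 8 GB) and the `usable_fraction` headroom factor are illustrative assumptions, not measurements, and the estimate covers memory only; load testing still decides the real number.

```python
import math

def servers_needed(n_rps, m_ms, k_mb, g_gb, usable_fraction=0.5):
    """Rough memory-bound server count. All inputs are estimates.

    usable_fraction is an assumed share of RAM left for request
    handling after the OS, Apache itself, and caches take theirs.
    """
    in_flight = n_rps * (m_ms / 1000.0)      # Little's law: concurrent requests
    memory_needed_mb = in_flight * k_mb      # memory held by in-flight requests
    usable_mb = g_gb * 1024 * usable_fraction
    return max(1, math.ceil(memory_needed_mb / usable_mb))

# Hypothetical inputs: 500 req/s, 200 ms each, 20 MB per request, 8 GB servers
print(servers_needed(500, 200, 20, 8))
```

CPU, network, and database limits are not modeled here, which is exactly why the measured numbers from a load test should replace this estimate as soon as they exist.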

Related

Redis > Isolate keys with large values?

It's my understanding that best practice for redis involves many keys with small values.
However, we have dozens of keys in which we'd like to store a few MB each. When traffic is low, this works out most of the time, but in high-traffic situations, timeout errors start to stack up. This causes issues for all of our tiny requests to redis, which were previously reliable.
The large values are for optimizing a key part of our site's functionality, and a real performance boost when it's going well.
Is there a good way to isolate these large values so that they don't interfere with the network I/O of our best practice-sized values?
Note, we don't need to dynamically discover if a value is >100KB or in the MBs. We have a specific method that we could point at a separate redis server/instance/database/node/shard/partition (I'm not a hardware guy).

Just install/configure as many instances as needed (2 in this case), each independently managing a logical subset of keys (e.g. big and small), with routing done by the application. Simple and effective: divide and conquer.

The correct solution would be to have 2 separate redis clusters, one for big keys and another one for small keys. These 2 clusters could run on the same set of physical or virtual machines, aka multitenancy (you would want to do that to fully utilize the underlying cores on your machine, as the redis server is single threaded). This way you would be able to scale the two clusters separately, and your problem of small requests timing out because they queue behind the bigger ones would be alleviated.
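
The application-side routing both answers describe could be sketched roughly as follows. The `KeyRouter` class, the `big:` key prefix, and the `FakeClient` stand-in are all hypothetical names introduced for illustration; the only thing assumed of a real client is that it offers `get(key)` and `set(key, value)`.

```python
class KeyRouter:
    """Sends keys with a 'big' prefix to one client, everything else to another."""

    def __init__(self, small_client, big_client, big_prefix="big:"):
        self.small = small_client
        self.big = big_client
        self.big_prefix = big_prefix  # assumed naming convention for large values

    def client_for(self, key):
        # The caller already knows which values are large, so a fixed
        # prefix is enough; nothing is measured at runtime.
        return self.big if key.startswith(self.big_prefix) else self.small

    def get(self, key):
        return self.client_for(key).get(key)

    def set(self, key, value):
        return self.client_for(key).set(key, value)


class FakeClient(dict):
    """In-memory stand-in so the sketch runs without a Redis server."""

    def set(self, key, value):
        self[key] = value
        return True


small, big = FakeClient(), FakeClient()
router = KeyRouter(small, big)
router.set("user:1", "alice")
router.set("big:report", "x" * 1000)
print(sorted(small), sorted(big))
```

With a real deployment, the two stand-ins would be replaced by two client objects pointing at the two instances (for example two `redis.Redis(host=..., port=...)` connections from redis-py), so that slow transfers of large values only queue behind each other.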

With http2, does number of XHRs have any effect on performance if overall data size is the same?

As far as I know, HTTP/2 no longer uses separate TCP connections for every request, which is the main performance-booster of the protocol.
Does that mean it doesn't matter whether I use 10 XHRs with 10kB of content each or one XHR with 100kB and then split the parts client-side?

A precise answer would require a benchmark for your specific case.
In more general terms, from the client's point of view, if you can issue the 10 XHRs at the same time (for example, in a tight loop), then those 10 requests will leave the client at more or less the same time, incur the latency between the client and the server, and be processed on the server (more or less in parallel, depending on the server architecture). The result could therefore be similar to that of a single XHR, although I would expect the single request to be more efficient.
From the server point of view, however, things may be different.
If you multiply by 10 what could have been done with a single request, now your server sees a 10x increase in request rate.
Reading from the network, request parsing and request dispatching are all activities that are heavily optimized in servers, but they do have a cost, and a 10x increase in that cost may be noticeable.
And that 10x increase in request rate to the server may impact the database as well, the filesystem, etc. so there may be ripple effects that can only be noticed by actually performing the benchmark.
Other things that you have to weigh are the amount of work that you need to do in the server to aggregate things, and to split them on the client; along with other less measurable things like code clarity and maintainability, and so forth.
I would say that common pragmatic judgement applies here: if you can do the same work with one request, why make 10 requests? Do you have a more specific example?
If you are in doubt, measure.

Redis mimic MASTER/MASTER? or something else?

I have been reading a lot of the posts on here and surfing the web, but maybe I am not asking the right question. I know that Redis is currently Master/slave until Cluster becomes available. However, I was wondering if someone can tell me how I would want to configure Redis logistically to meet my needs (or if it's not the right tool).
Scenario:
We have 2 sites on opposite ends of the US. We want clients to be able to write at each site at a high volume, and we want each client to be able to perform reads at their own site as well. However, we want data written at one site to be available at the sister site in < 50ms. Given that we have plenty of bandwidth, is there a way to configure Redis to meet our needs? Our maximum write size would be on the order of 5k, usually much less. The main point is: how can I have 2 masters that sync to one another, even if it is not supported by default?

The catch with Tom's answer is that you are not running any sort of cluster, you are just writing to two servers. This is a problem if you want to ensure consistency between them. Consider what happens when your client fails a write to the remote server. Do you undo the write to local? What happens to the application when you can't write to the remote server? What happens when you can't read from the local?
The second catch is the fundamental physics issue Joshua raises. For a round trip you are talking a theoretical minimum of 38ms leaving a theoretical maximum processing time on both ends (of three systems) of 12ms. I'd say that expectation is a bit too much and bandwidth has nothing to do with latency in this case. You could have a 10GB pipe and those timings are still extant. That said, transferring 5k across the continent in 12ms is asking a lot as well. Are you sure you've got the connection capacity to transfer 5k of data in 50ms, let alone 12? I've been on private no-utilization circuits across the continent and seen ping times exceeding 50ms - and ping isn't transferring 5k of data.
How will you keep the two unrelated servers in-sync? If you truly need sub-50ms latency across the continent, the above theoretical best-case means you have 12ms to run synchronization algorithms. Even one query to check the data on the other server means you are outside the 50ms window. If the data is out of sync, how will you fix it? Given the above timings, I don't see how it is possible to synchronize in under 50ms.
I would recommend revisiting the fundamental design requirements. Specifically, why this requirement? Latency requirements of 50ms round trip across the continent are usually the sign of marketing or lack of attention to detail. I'd wager that if you analyze the requirements you'll find that this 50ms window is excessive and unnecessary. If it isn't, and data synchronization is actually important (likely), then someone will need to determine whether the significant extra effort to write synchronization code is worth it, or whether it is even possible to stay within the 50ms window. Cross-continent sub-50ms latency data sync is not a simple issue.
If you have no need for synchronization, why not simply run one server? You could use a slave on the other side of the continent for recovery-only purposes. Of course, that still means that best-case you have 12ms to get the data over there and back. I would not count on 50ms round trip operations+latency+5k/10k data transfer across the continent.

It's about 19ms at the speed of light in optical fiber to cross the US (roughly 13ms in a vacuum). <50ms is going to be hard to achieve.
http://www.wolframalpha.com/input/?i=new+york+to+los+angeles
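
The latency figures quoted in this thread can be approximately reproduced with a short calculation. This is a sketch under stated assumptions: the city coordinates, the spherical Earth radius, and a fiber refractive index of about 1.47 are assumed typical values, and a real fiber route is longer than the great-circle path, so actual numbers will be worse.

```python
import math

EARTH_RADIUS_KM = 6371.0       # mean radius, spherical approximation
C_VACUUM_KM_S = 299_792.458    # speed of light in a vacuum
FIBER_INDEX = 1.47             # assumed refractive index of optical fiber

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

NYC = (40.7128, -74.0060)      # assumed city-center coordinates
LA = (34.0522, -118.2437)

distance_km = great_circle_km(*NYC, *LA)
one_way_vacuum_ms = distance_km / C_VACUUM_KM_S * 1000
one_way_fiber_ms = one_way_vacuum_ms * FIBER_INDEX   # light is slower in glass
round_trip_fiber_ms = 2 * one_way_fiber_ms
budget_left_ms = 50 - round_trip_fiber_ms            # what remains of a 50 ms target

print(f"distance        {distance_km:8.0f} km")
print(f"one way, vacuum {one_way_vacuum_ms:8.1f} ms")
print(f"one way, fiber  {one_way_fiber_ms:8.1f} ms")
print(f"round trip      {round_trip_fiber_ms:8.1f} ms")
print(f"left of 50 ms   {budget_left_ms:8.1f} ms")
```

Whatever remains after the round trip has to cover serialization, queuing, and processing on every system involved, which is why bandwidth does not help here.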

This is probably best handled as part of your client - just have the client write to both nodes. Writes generally don't need to be synchronous, so sending the extra command shouldn't affect the performance you get from having a local node.

How are online-game clients able to exchange data over the internet so fast?

Let's imagine a really simple game: we have a labyrinth and two players trying to find the exit in real time over the internet.
On every move, the game client should send the player's coordinates to the server and receive the current coordinates of the other client. How is it possible to make this exchange so fast (as all modern games do)?
OK, we can use memcache or a similar technology to reduce data mining operations on the server side. We can also use the fastest webserver etc., but we will still have problems with timings.
So, the questions are...
What protocol do game clients usually use for exchanging information with the server?
What server technologies are available to solve this problem?
What algorithms are applied to deal with delays during the game, etc.?

Usually with Network Interpolation and prediction. Gamedev is a good resource: http://www.gamedev.net/reference/list.asp?categoryid=30
Also check out this one: http://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
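
As a rough illustration of interpolation and prediction, here is a minimal sketch. The snapshot layout `(timestamp_ms, x, y)` and the function names are assumptions made for this example, not taken from either linked resource. Interpolation renders a remote player slightly in the past, between two snapshots already received; prediction (dead reckoning) continues from the last snapshot at the last known velocity when no newer one has arrived.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def interpolate(snap_a, snap_b, render_time_ms):
    """Position between two received snapshots (timestamp_ms, x, y)."""
    t0, x0, y0 = snap_a
    t1, x1, y1 = snap_b
    t = (render_time_ms - t0) / (t1 - t0)
    t = max(0.0, min(1.0, t))            # clamp: never leave the known segment
    return lerp(x0, x1, t), lerp(y0, y1, t)

def extrapolate(snap_a, snap_b, render_time_ms):
    """Dead reckoning: keep moving at the last observed velocity."""
    t0, x0, y0 = snap_a
    t1, x1, y1 = snap_b
    vx = (x1 - x0) / (t1 - t0)           # units per millisecond
    vy = (y1 - y0) / (t1 - t0)
    dt = render_time_ms - t1
    return x1 + vx * dt, y1 + vy * dt

older = (0, 0.0, 0.0)                    # hypothetical snapshots 100 ms apart
newer = (100, 10.0, 0.0)
print(interpolate(older, newer, 50))     # halfway between the two snapshots
print(extrapolate(older, newer, 150))    # 50 ms past the newest snapshot
```

When a late snapshot contradicts the predicted position, games typically blend toward the corrected position over a few frames rather than snapping to it.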

use UDP, not TCP
use a custom protocol, usually a single byte defining a "command", and as few subsequent bytes as possible containing the command arguments
prediction is used to make the other players' movements appear smooth without having to get an update for every single frame
hint: prediction is used anyway to smooth the fast screen update (~60fps) since the actual game speed is usually slower (~25fps).
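
A compact message of the kind described above (one command byte followed by as few argument bytes as possible) could be sketched with Python's `struct` module. The command id `CMD_MOVE` and the choice of two signed 16-bit coordinates are hypothetical; a real game defines its own layout.

```python
import struct

CMD_MOVE = 0x01          # hypothetical command id
MOVE_FORMAT = "!Bhh"     # network byte order: 1-byte command, two signed 16-bit ints

def encode_move(x, y):
    """Pack a move command into 5 bytes."""
    return struct.pack(MOVE_FORMAT, CMD_MOVE, x, y)

def decode_move(packet):
    """Unpack a 5-byte move command into (command, x, y)."""
    return struct.unpack(MOVE_FORMAT, packet)

packet = encode_move(12, -7)
print(len(packet), decode_move(packet))
```

Such a packet would be sent over a datagram socket (`socket.SOCK_DGRAM`) with `sendto`. Because UDP gives no delivery or ordering guarantee, real protocols usually add a sequence number so that stale updates can be dropped.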

The other answers haven't spelled out a couple of important misconceptions in the original post, namely that these games aren't websites and operate quite differently. In particular:
There is no or little "data-mining" that needs to be speeded up. The fastest online games (eg. first person shooters) typically are not saving anything to disk during a match. Slower online games, such as MMOs, may use a database, primarily for storing player information, but for the most part they hold their player and world data in memory, not on disk.
They don't use webservers. HTTP is a relatively slow protocol, and even TCP alone can be too slow for some games. Instead they have bespoke servers that are written just for that particular game. Often these servers are tuned for low latency rather than throughput, because they typically don't serve up big documents like a web server would, but many tiny messages (eg. measured in bytes rather than kilobytes).
With those two issues covered, your speed problem largely goes away. You can send a message to a server and get a reply in under 100ms and can do that several times per second.

Algorithmically suggest best node to perform demanding computation

At work we perform demanding numerical computations.
We have a network of several Linux boxes with different processing capabilities. At any given time, there can be anywhere from zero to dozens of people connected to a given box.
I created a script to measure the MFLOPS (Millions of Floating-Point Operations per Second) using the Linpack Benchmark; it also reports the number of cores and the amount of memory.
I would like to use this information together with the load average (obtained using the uptime command) to suggest the best computer for performing a demanding computation. In other words, it's 3:00pm; I have a meeting in two hours; I need to run a demanding process: which node will get me the answer fastest?
I envision a script which will output a suggestion along the lines of:
SUGGESTED HOSTS (IN ORDER OF PREFERENCE)
HOST1.MYNETWORK
HOST2.MYNETWORK
HOST3.MYNETWORK
Such a suggestion should favor fast computers (high MFLOPS) if the load average is low and, as the load average increases for a given node, it should favor available nodes instead (i.e., I'd rather run on a slower computer with no users than on an eight-core with forty dudes logged in).
How should I prioritize? What algorithm (rationale) would you use? Again, what I have is:
Load Average (1min, 5min, 15min)
MFLOPS measure
Number of users logged in
RAM (installed and available)
Number of cores (important to normalize the load average)
Any thoughts? Thanks!

You don't have enough data to make a well-informed decision. It sounds as though the scheduling is very volatile: "At any given time, there can be anywhere from zero to dozens of people connected to a given box." So the current load does not necessarily reflect the future load of the machines.
To properly assess which hosts someone should use to minimize computation time, you would need to know when the current jobs will terminate. If a powerful machine is about to finish most of its jobs, it would be a good candidate even though it currently has a high load.
If you want to guess purely based on the current situation, you can do a weighted calculation to find out which hosts have the most MFLOPS available.
MFLOPS available = host's MFLOPS * (number of logical processors - load average) / number of logical processors
Sort the hosts by MFLOPS available and suggest them in a descending order.
This formula assumes that the MFLOPS of a host is linearly related to its load average. This might not be exactly true, but it's probably fairly close.
I would favor the most recent (1-minute) load average since it's closer to the current/future situation, whereas jobs from 15 minutes ago might have completed by now.
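
One way to turn this weighting into a ranking script is sketched below. It assumes the MFLOPS figure was measured for the whole machine and scales linearly with the idle share of its logical processors; that scaling, the function names, and the sample host numbers are all illustrative assumptions.

```python
def available_mflops(mflops, logical_cpus, load_1min):
    """Estimate spare compute as MFLOPS scaled by the idle CPU fraction."""
    idle_cpus = max(0.0, logical_cpus - load_1min)   # an overloaded host has none
    return mflops * idle_cpus / logical_cpus

def rank_hosts(hosts):
    """hosts maps name -> (mflops, logical_cpus, load_1min); best host first."""
    return sorted(hosts, key=lambda name: available_mflops(*hosts[name]), reverse=True)

# Hypothetical measurements: (MFLOPS, logical CPUs, 1-minute load average)
hosts = {
    "HOST1.MYNETWORK": (40000, 8, 7.5),   # fast but nearly saturated
    "HOST2.MYNETWORK": (12000, 4, 0.2),   # slower but almost idle
    "HOST3.MYNETWORK": (20000, 8, 4.0),   # half loaded
}

print("SUGGESTED HOSTS (IN ORDER OF PREFERENCE)")
for name in rank_hosts(hosts):
    print(name)
```

Available RAM and the number of logged-in users are deliberately left out; they could be added as filters (for example, skipping hosts without enough free memory) before ranking.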

Have you considered a distributed approach to computation? Not all computations can be broken up such that more than one cpu can work on them. But perhaps your problem space can benefit from some parallelization. Have a look at Hadoop.

You don't need to know FLOPS. The parallel computing center I go to runs Beowulf clusters with modules, and it surely has a script for this.
PDC operates leading-edge, high-performance computers on a national level. PDC offers easily accessible computational resources that primarily cater to the ...
