Why is consistent hashing mentioned in load balancing literature? - load-balancing

I've been reviewing load balancing and I don't understand why the topic of consistent hashing has come up some times. If the goal is to ensure uniform allocation of traffic to appropriate servers, why not just generate a random number [1,M] and assign to that server? (Source: educative.io behind paywall.)
I have read that priority queues can be useful too when certain tasks must be executed in a sequence which not be the order in which the arrived. Yet this doesn't (at least on the surface) be related to consistent hashing either.

Related

What is a 'Partition' in Apache Helix

I am learning Apache Helix. I came across the keyword 'Partitions'.
According to the definition mentioned here http://helix.apache.org/Concepts.html, Each subtask (of a main task) is referred to as a partition in Helix.
When I gone through the recipe - Distributed Lock Manager, partitions are nothing but instances of a resource. (Increase the numOfPartitions, number of locks is increased).
final int numPartitions = 12;
admin.addResource(clusterName, lockGroupName, numPartitions, "OnlineOffline",
RebalanceMode.FULL_AUTO.toString());
Can someone explain with simple example, what exactly the partition in Apache Helix is ?
I think you're right that a partition is essentially an instance of a resource. As is the case in other distributed systems, partitions are used to achieve parallelism. A resource with only one instance can only run on one machine. Partitions simply provide the construct necessary to split a single resource among many machines by, well, partitioning the resource.
This is a pattern that is found in a large portion of distributed systems. The difference, though, is while e.g. distributed databases explicitly define partitions essentially as a subset of some larger data set that can fit on a single node, Helix is more generic in that partitions don't have a definite meaning or use case, but many potential meanings and potential use cases.
One of these use cases in a system with which I'm very familiar is Apache Kafka's topic partitions. In Kafka, each topic - essentially a distributed log - is broken into a number of partitions. While the topic data can be spread across many nodes in the cluster, each partition is constrained to a single log on a single node. Kafka provides scalability by adding new partitions to new nodes. When messages are produced to a Kafka topic, internally they're hashed to some specific partition on some specific node. When messages are consumed from a topic, the consumer switches between partitions - and thus nodes - as it consumes from the topic.
This pattern generally applies to many scalability problems and is found in almost any HA distributed database (e.g. DynamoDB, Hazelcast), map/reduce (e.g. Hadoop, Spark), and other data or task driven systems.
The LinkedIn blog post about Helix actually gives a bunch of useful examples of the relationships between resources and partitions as well.

Algorithm for distributed messaging?

I have a distributed application across which I'd like to replicate a single, eventually consistent state. The data is suitable for a CRDT (http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf) which has the excellent property that each node, given the same set of messages, will deterministically converge to the same value without complicated consensus protocols.
However, I need another messaging/log layer that will ensure that each node actually sees every message, even in the face of adverse network conditions.
Specifically, I'm looking for an algorithm that has the following properties:
Works on an asynchronous network.
Nodes are only necessarily aware of their neighbors, not the whole network.
Nodes may be added or dropped at any time (that is, the network is not of a fixed size or topology).
The network can be acyclic (this can be a requirement, if necessary).
Is capable of bringing up to date a node that has become behind due to temporary network outage or dropped messages.
Is capable of bringing a new, empty node joining the cluster up to date.
There is not a hard limit on the time taken for the network to converge on a value (that is, for every node to recieve every message), but given no partitions it should be fairly quick (in fuzzy terms, a matter of seconds, not minutes).
Is bounded in size. Algorithms that keep the entire message history (which will grow boundlessly) are unsuitable.
Is anyone aware of an algorithm with these properties?

redis sharding, pipelining, and round-trips

Suppose in your web application you need to do a number of redis calls to render a page, like, getting a bunch of user hashes. To speed this up you could wrap up your redis commands in a MULTI/EXEC section, thus using pipelining, so that you avoid doing many round-trips. But you also want to shard your data, because you have lots of it and/or you want to distribute writes. Then pipelining wouldn't work, because different keys would potentially live on different nodes, unless you have a clear idea of the data layout of your application and shard based on roles rather than using a hash function. So, what are the best practices to shard data across different servers without compromising performance too much due to many servers being contacted to complete a "conceptually unique" job? I believe the answer depends on the web application one is developing, and I'll eventually run some tests, but it'd be helpful to know how others have coped with the trade-offs I mentioned.
MULTI/EXEC and pipelining are two different things. You can do MULTI/EXEC without any pipelining and vice versa.
If you want to shard and pipeline at the same time, you need to group the operations to pipeline per Redis instance, and then use pipelining for each instance.
Here is a simple example using Ruby: https://gist.github.com/2587593
One way to further improve performance is to parallelize the traffic on the Redis instances once the operations have been grouped (i.e. you group the operations, you send them to all instances in parallel, you wait for the answers from all instances).
This is a bit more complex, because an asynchronous non blocking client is required. For maximum performance, C/C++ should be used on client side. This can be easily implemented by using hiredis + the event loop of your choice.

Redis mimic MASTER/MASTER? or something else?

I have been reading a lot of the posts on here and surfing the web, but maybe I am not asking the right question. I know that Redis is currently Master/slave until Cluster becomes available. However, I was wondering if someone can tell me how I would want to configure Redis logistically to meet my needs (or if its not the right tool).
Scenerio:
we have 2 sites on opposite ends of the US. We want clients to be able to write at each site at a high volume. We then want each client to be able to perform reads at their site as well. However we want the data to be available from a write at the sister site in < 50ms. Given that we have plenty of bandwidth. Is there a way to configure redis to meet our needs? our writes maximum size would be on the order of 5k usually much less. The main point is how can i have2 masters that are syncing to one another even if it is not supported by default.
The catch with Tom's answer is that you are not running any sort of cluster, you are just writing to two servers. This is a problem if you want to ensure consistency between them. Consider what happens when your client fails a write to the remote server. Do you undo the write to local? What happens to the application when you can't write to the remote server? What happens when you can't read from the local?
The second catch is the fundamental physics issue Joshua raises. For a round trip you are talking a theoretical minimum of 38ms leaving a theoretical maximum processing time on both ends (of three systems) of 12ms. I'd say that expectation is a bit too much and bandwidth has nothing to do with latency in this case. You could have a 10GB pipe and those timings are still extant. That said, transferring 5k across the continent in 12ms is asking a lot as well. Are you sure you've got the connection capacity to transfer 5k of data in 50ms, let alone 12? I've been on private no-utilization circuits across the continent and seen ping times exceeding 50ms - and ping isn't transferring 5k of data.
How will you keep the two unrelated servers in-sync? If you truly need sub-50ms latency across the continent, the above theoretical best-case means you have 12ms to run synchronization algorithms. Even one query to check the data on the other server means you are outside the 50ms window. If the data is out of sync, how will you fix it? Given the above timings, I don't see how it is possible to synchronize in under 50ms.
I would recommend revisiting the fundamental design requirements. Specifically, why this requirement? Latency requirements of 50ms round trip across the continent are usually the sign of marketing or lack of attention to detail. I'd wager that if you analyze the requirements you'll find that this 50ms window is excessive and unnecessary. If it isn't, and data synchronization is actually important (likely), than someone will need to determine if the significant extra effort to write synchronization code is worth it or even possible to keep within the 50ms window. Cross-continent sub-50ms latency data sync is not a simple issue.
If you have no need for synchronization, why not simply run one server? You could use a slave on the other side of the continent for recovery-only purposes. Of course, that still means that best-case you have 12ms to get the data over there and back. I would not count on
50ms round trip operations+latency+5k/10k data transfer across the continent.
It's about 19ms at the speed of light to cross the US. <50ms is going to be hard to achieve.
http://www.wolframalpha.com/input/?i=new+york+to+los+angeles
This is probably best handled as part of your client - just have the client write to both nodes. Writes generally don't need to be synchronous, so sending the extra command shouldn't affect the performance you get from having a local node.

How online-game clients are able to exchange data through internet so fast?

Let's imagine really simple game... We have a labirinth and two players trying to find out exit in real time through internet.
On every move game client should send player's coordinates to server and accept current coordinates of another client. How is it possible to make this exchange so fast (as all modern games do).
Ok, we can use memcache or similar technology to reduce data mining operations on server side. We can also use fastest webserver etc., but we still will have problems with timings.
So, the questions are...
What protocol game clients are usually using for exchanging information with server?
What server technologies are coming to solve this problem?
What algorithms are applied for fighting with delays during game etc.
Usually with Network Interpolation and prediction. Gamedev is a good resource: http://www.gamedev.net/reference/list.asp?categoryid=30
Also check out this one: http://developer.valvesoftware.com/wiki/Source_Multiplayer_Networking
use UDP, not TCP
use a custom protocol, usually a single byte defining a "command", and as few subsequent bytes as possible containing the command arguments
prediction is used to make the other players' movements appear smooth without having to get an update for every single frame
hint: prediction is used anyway to smooth the fast screen update (~60fps) since the actual game speed is usually slower (~25fps).
The other answers haven't spelled out a couple of important misconceptions in the original post, which is that these games aren't websites and operate quite differently. In particular:
There is no or little "data-mining" that needs
to be speeded up. The fastest online
games (eg. first person shooters)
typically are not saving anything to
disk during a match. Slower online
games, such as MMOs, may use a
database, primarily for storing
player information, but for the most
part they hold their player and world data in memory,
not on disk.
They don't use
webservers. HTTP is a relatively slow
protocol, and even TCP alone can be
too slow for some games. Instead they
have bespoke servers that are written just for that particular game. Often these servers are tuned for low latency rather than throughput, because they typically don't serve up big documents like a web server would, but many tiny messages (eg. measured in bytes rather than kilobytes).
With those two issues covered, your speed problem largely goes away. You can send a message to a server and get a reply in under 100ms and can do that several times per second.