I need to generate sequential unique IDs for distributed clients.
I considered Redis in standalone mode, Jedis client, INCR command to
increment a counter value against a key shared by all the
clients. AOF always option enabled. This option is not good because of the
standalone Redis server being SPOF and hence downtime for the clients.
The master and replicas with Sentinels setup is also not foolproof
because there will always be a replication lag. It is not guaranteed
that in case of master failure, there would be no data lag. I don't want to use the WAIT command to block in unbounded fashion. Consequently, the replicas would be slightly lagged and hence possible duplicate values might be visible to the clients.
Considering that, is Redis reliable for generating unique id sequence for concurrent clients?
Related
I need to load static data one time in redis in the master node and only when the synchronization is finished for all slaves I am going to be able to read. This is because we are going to have a lot reading and a few writing, and the data is not going to change for a long time.
I read from oficial documentation https://docs.redis.com/latest/rs/concepts/data-access/consistency-durability/, https://docs.redis.com/latest/rs/concepts/data-access/consistency-durability/ and https://redis.io/topics/cluster-tutorial in Redis Cluster consistency guarantees.
I read also Can the WAIT command provide strong consistency in Redis? but without to get a conclusion.
If I use synchronous replication and wait command to check if the replication was successful, do I have some guarantees about consistency ?
By default, a Redis Cluster is not able to guarantee strong consistency. It means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
The reason why Redis Cluster can lose writes is because it uses asynchronous replication, however, you can improve consistency by forcing the database to flush data to disk before replying to the client, but this usually results in prohibitively low performance. That would be the equivalent of synchronous replication in the case of Redis Cluster. Basically, there is a trade-off to be made between performance and consistency, if you are fine with that!
Redis Cluster has support for synchronous writes when absolutely needed, implemented via the WAIT command. This makes losing writes a lot less likely. However, note that Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible, under more complex failure scenarios, that a replica that was not able to receive the write will be elected as master.
There is another notable scenario where Redis Cluster will lose writes, that happens during a network partition where a client is isolated with a minority of instances including at least a master.
For example, imagine a 6 nodes cluster composed of A, B, C, A1, B1, C1, with 3 masters and 3 replicas. There is also a client, let's call it Z1.
After a partition occurs, it is possible that in one side of the partition we have A, C, A1, B1, C1, and in the other side we have B and Z1.
Z1 is still able to write to B, which will accept its writes. If the partition heals in a very short time, the cluster will continue normally. However, if the partition lasts enough time for B1 to be promoted to master on the majority side of the partition, the writes that Z1 has sent to B in the meantime will be lost.
Note that there is a maximum window to the amount of writes Z1 will be able to send to B: if enough time has elapsed for the majority side of the partition to elect a replica as master, every master node in the minority side will have stopped accepting writes.
This amount of time is a very important configuration directive of Redis Cluster, and is called the node timeout.
After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly, after node timeout has elapsed without a master node to be able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.
I have a key (hash type) in redis
Key is
service_status:cluster_1
Value is like below
{
service_1: normal,
service_2: normal,
service_3: normal,
service_4: normal,
service_5: down
...
}
The system is a monitor system. This data is used to store the services status of one cluster.
There are thousands of services in the cluster, so thousands of update request may hit redis to update the same key at the same time.
My concern is how redis handle this? Will there be some lock since these update pointing to the same data?
Redis is single-threaded so there are no "parallel" updates, and therefore no need for locking. Operations in general and updates to a specific hash key, in particular, are executed one at a time.
I've found different zookeeper definitions across multiple resources. Maybe some of them are taken out of context, but look at them pls:
A canonical example of Zookeeper usage is distributed-memory computation...
ZooKeeper is an open source Apacheā¢ project that provides a centralized infrastructure and services that enable synchronization across a cluster.
Apache ZooKeeper is an open source file application program interface (API) that allows distributed processes in large systems to synchronize with each other so that all clients making requests receive consistent data.
I've worked with Redis and Hazelcast, that would be easier for me to understand Zookeeper by comparing it with them.
Could you please compare Zookeeper with in-memory-data-grids and Redis?
If distributed-memory computation, how does zookeeper differ from in-memory-data-grids?
If synchronization across cluster, than how does it differs from all other in-memory storages? The same in-memory-data-grids also provide cluster-wide locks. Redis also has some kind of transactions.
If it's only about in-memory consistent data, than there are other alternatives. Imdg allow you to achieve the same, don't they?
https://zookeeper.apache.org/doc/current/zookeeperOver.html
By default, Zookeeper replicates all your data to every node and lets clients watch the data for changes. Changes are sent very quickly (within a bounded amount of time) to clients. You can also create "ephemeral nodes", which are deleted within a specified time if a client disconnects. ZooKeeper is highly optimized for reads, while writes are very slow (since they generally are sent to every client as soon as the write takes place). Finally, the maximum size of a "file" (znode) in Zookeeper is 1MB, but typically they'll be single strings.
Taken together, this means that zookeeper is not meant to store for much data, and definitely not a cache. Instead, it's for managing heartbeats/knowing what servers are online, storing/updating configuration, and possibly message passing (though if you have large #s of messages or high throughput demands, something like RabbitMQ will be much better for this task).
Basically, ZooKeeper (and Curator, which is built on it) helps in handling the mechanics of clustering -- heartbeats, distributing updates/configuration, distributed locks, etc.
It's not really comparable to Redis, but for the specific questions...
It doesn't support any computation and for most data sets, won't be able to store the data with any performance.
It's replicated to all nodes in the cluster (there's nothing like Redis clustering where the data can be distributed). All messages are processed atomically in full and are sequenced, so there's no real transactions. It can be USED to implement cluster-wide locks for your services (it's very good at that in fact), and tehre are a lot of locking primitives on the znodes themselves to control which nodes access them.
Sure, but ZooKeeper fills a niche. It's a tool for making a distributed applications play nice with multiple instances, not for storing/sharing large amounts of data. Compared to using an IMDG for this purpose, Zookeeper will be faster, manages heartbeats and synchronization in a predictable way (with a lot of APIs for making this part easy), and has a "push" paradigm instead of "pull" so nodes are notified very quickly of changes.
The quotation from the linked question...
A canonical example of Zookeeper usage is distributed-memory computation
... is, IMO, a bit misleading. You would use it to orchestrate the computation, not provide the data. For example, let's say you had to process rows 1-100 of a table. You might put 10 ZK nodes up, with names like "1-10", "11-20", "21-30", etc. Client applications would be notified of this change automatically by ZK, and the first one would grab "1-10" and set an ephemeral node clients/192.168.77.66/processing/rows_1_10
The next application would see this and go for the next group to process. The actual data to compute would be stored elsewhere (ie Redis, SQL database, etc). If the node failed partway through the computation, another node could see this (after 30-60 seconds) and pick up the job again.
I'd say the canonical example of ZooKeeper is leader election, though. Let's say you have 3 nodes -- one is master and the other 2 are slaves. If the master goes down, a slave node must become the new leader. This type of thing is perfect for ZK.
Consistency Guarantees
ZooKeeper is a high performance, scalable service. Both reads and write operations are designed to be fast, though reads are faster than writes. The reason for this is that in the case of reads, ZooKeeper can serve older data, which in turn is due to ZooKeeper's consistency guarantees:
Sequential Consistency
Updates from a client will be applied in the order that they were sent.
Atomicity
Updates either succeed or fail -- there are no partial results.
Single System Image
A client will see the same view of the service regardless of the server that it connects to.
Reliability
Once an update has been applied, it will persist from that time forward until a client overwrites the update. This guarantee has two corollaries:
If a client gets a successful return code, the update will have been applied. On some failures (communication errors, timeouts, etc) the client will not know if the update has applied or not. We take steps to minimize the failures, but the only guarantee is only present with successful return codes. (This is called the monotonicity condition in Paxos.)
Any updates that are seen by the client, through a read request or successful update, will never be rolled back when recovering from server failures.
Timeliness
The clients view of the system is guaranteed to be up-to-date within a certain time bound. (On the order of tens of seconds.) Either system changes will be seen by a client within this bound, or the client will detect a service outage.
I'm using Redis as a session store in my app. Can I use the same instance (and db) of Redis for my job queue? If it's of any significance, it's hosted with redistogo.
It is perfectly fine to use the same redis for multiple operations.
We had a similar use case where we used Redis as a key value store as well as a job queue.
However you may want to consider other aspects like the performance requirements for your application. Redis can ideally handle around 70k operations per second and if at some time in future you think you may hit these benchmarks it's much better to split your operations to multiple redis instances based on the kind of operations you perform. This will allow you to make decisions about availability and replication at a more finer level depending on the requirements. As a simple use case once your key size grows you may be able to flush your session app redis or shard your keys using redis cluster without affecting job queing infrastructure.
If redis gets overloaded, can I configure it to drop set requests? I have an application where data is updated in real time (10-15 times a second per item) for a large number of items. The values are outdated quickly and I don't need any kind of consistency.
I would also like to compute parallel sum of the values that are written in real time. What's the best option here? LUA executed in redis? Small app located on the same box as redis using UNIX sockets?
When Redis gets overloaded it will just slow down its clients. For most commands, the protocol itself is synchronous.
Redis supports pipelining though, but there is no way a client can cancel the traffic still in the pipeline, but not yet acknowledged by the server. Redis itself does not really queue the incoming traffic, the TCP stack does it.
So it is not possible to configure the Redis server to drop set requests. However, it is possible to implement a last value queue on client side:
the queue is actually a represented by 2 maps indexed by your items (only one value stored per item). The primary map will be used by the application. The secondary map will be used by a specific thread. The 2 maps content can be swapped in an atomic way.
a specific thread is blocking when the primary map is empty. When it is not, it swaps the content of the two maps, sends the content of the secondary map asynchronously to Redis using aggressive pipelining and variadic parameters commands. It also receives ack from Redis.
while the thread is working with the secondary map, the application can still fill the primary map. If Redis is too slow, the application will only accumulate last values in the primary map.
This strategy could be implemented in C with hiredis and the event loop of your choice.
However, it is not trivial to implement, so I would first check if the performance of Redis against all the traffic is not enough for my purpose. It is not uncommon to benchmark Redis at more than 500K op/s these days (using a single core). Nothing prevents you to shard your data on multiple Redis instances if needed.
You will likely saturate the network links before the CPU of the Redis server. That's why it is better to implement the last value queue (if needed) on client side rather than server side.
Regarding the sum computation, I would try to calculate and maintain it in real time. For instance, the GETSET command can be used to set a new value while returning the previous one.
Instead of just setting your values, you could do:
[old value] = GETSET item <new value>
INCRBY mysum [new value] - [old value]
The mysum key will contain the sum of your values for all the items at any time. With Redis 2.6, you can use Lua to encaspulate this calculation to save roundtrips.
Running a big batch to calculate statistics on existing data (this is how I understand your "parallel" sum) is not really suitable for Redis. It is not designed for map/reduce like computation.