Redis : How to prevent this race condition - redis

I have a hash in redis in which one the field has value as stringified array, whenever user register for an event,
Fetch this stringified array from redis
Parse in backend and add the user's username in the array
stringify the array and store back to hash
There is a potential race condition possibility here if two users register at close enough time.
Race condition could be like this that the both users get the same stringified array from redis and then they modify, and only one update will happen as one will be overwritten by other.
Is there a way to prevent this race condition like transactions in SQL. I have read about multi, but it does not allow to do computation between commands on server.
Or storing stringifying array and storing as hash field is a bad idea, and I should use a normal list for this on redis.

The solution is to use atomic operations where you can. You have several options:
use real Redis lists that support handy commands such as LPUSH
do everything inside a Lua script (they are atomic by definition)
use Redis transactions and the WATCH command to track changes
The typical WATCH usage involves attempting to execute the transaction until it succeeds. You can do this with a simple loop, however it's possible that your connector has a special convenience method exactly for that.

Related

Making sure (distributed) caches only store the latest value in a distributed system

Let's say I want to use a built-in solution such as Redis or Memcached to cache database rows (as an example), to avoid recurrent costly trips to the database.
For the sake of the argument, let's assume I have a TABLE(id, x, y) and that I want to cache all rows so I never have to read directly from the database.
Questions:
Consider the following case: NodeA tries to update a given row's field x while NodeB tries to update y, then both simultaneously try to update the cache line. If they try to "manually" update the field they just changed to the row in the cache, if we follow the typical last-write-wins, one of the fields is going to be discarded, which is catastrophic. This makes me think I need to always fill the cache's rows with a full row read from the database.
But this by itself won't necessarily help me. If NodeA writes to x and loads the entire row in memory and then NodeB writes to y and reads the entire row in memory, if NodeB writes to the cache before NodeA then NodeB's changes will be overwritten! This makes me believe I need to always somehow version the rows both in the DB and in the cache. Is this the case? Memcached seems to have a compare and set primitive, but I see no such thing in Redis.
Even if 1. and 2. are not an issue, I still need to guarantee that my write / read has read-after-write consistency, otherwise it may happen that what I'm reading and intending to put in the cache is not necessarily the most up-to-date version. If that's the case, how can I make sure of this? By requiring w + r > n?
This seems to be a very common use-case, I'd guess it's pretty much a solved problem. What can I try to resolve this?
Key value stores as redis support advance data structures, such as HASHs.
If you're doing partial updates to cached entities (only a set of fields is updated as part of the super set), and given your goal is to avoid time-consuming database reads, simply save the table entry as a HASH K/V pairs (using HSET) and the use HGETALL for reading.
Redis OPS are atomic by nature, so that should solve your problems, if I got them right.
On a side note, if you're caching an entire entity yet doing partial updates, you should consider a simpler caching approach, such as read-through (making cache validity a reader-only concern).
As opposed to Database accesses. Redis cache access from different location unless somehow serialized, will always have the potential of being out of order when it comes to distributed systems, as there's always the execution environment (network, threading) to introduce possible delays.
Doing read-through caching will ensure data is always updated after the most recent write without the need to synchronize anything else.
This is how Facebook solved the issue with Memcached: http://nil.csail.mit.edu/6.824/2020/papers/memcache-faq.txt.
The idea is to use the concept of a lease: when a request for a cached value is received and there is no data for such key, a lease token (64 bits id) is returned.
When the webserver fetches the data from the database it can then store the data in the cache with that token. Every time an invalidation request is invoked on a key, a new lease token is created, and as such, if a put is attempted for an old token, the put ends up rejected.
As far as I understand, it's not really possible to (easily) replicate this behavior with Redis without resorting to LUA scripts.

Is there a way to increment all fields in a Redis hash at once?

I want to use Redis to cache a large amount of highly dynamic counters. In my case users are subscribing to sources and I want to cache each user's counter for that source. When a new item arrives at the source it's natural that the counters for all users subscribed to this source should be incremented by 1. Some sources have hundreds of thousands of subscribers, so it's important that this happens immediately.
However, Redis doesn't have a native method to increment all hash fields at once, only HINCRBY. Is there a solution for this?
While searching, I stumbled upon some threads where people wanted a bulk version of HINCRBY, but my case is different. I want to increment all fields in the hash.
Such operation should be achieved by using LUA script see:
https://redis.io/commands/evalsha
https://redis.io/commands/eval
You can use LUA script to "batch" couple of operations on a single hash

Redis atomic pop and add to sorted set, BRPOPLPUSH equivalent

I have one redis list "waiting" and one redis list "partying".
I have a long running process that safely blocks on the "waiting" list item to come along, and then pops it and pushes it onto the "partying" list atomically using BRPOPLPUSH. Awesome.
Users in the "waiting" list are repeatedly querying "am I in the "partying" list yet?", but there is no fast (i.e. < O(n)) method of checking if a user is in a redis list. You have to grab the whole list and loop through it.
So I'm resorting to switching from a redis list to a redis sorted set, with the 'score' as the unix timestamp of when they joined the "waiting" sorted set. I can blocking pop on the lowest score (user at the head of the queue). Using sorted sets, I can use ZSCORE to check in O(1) time if they're on either list, so it's looking hopeful.
How can I perform the nice atomic equivalent of BRPOPLPUSH on sorted sets?
It's like I need a mythical BZRPOPMIN & ZADD= BZRPOPMINZADD. If the process dies between these two, a user will effectively be disappeared from both sets.
Looking into MULTI EXEC transactions in redis, they are not what they appear to be at first glance, they're more like 'pipelines', in that I can't get the result of the first command (BZRPOPMIN) and feed it into the second command (ZADD). I'm very suspicious of putting the blocking BZRPOPMIN into the MULTI too, am I right to be?
How can I perform the nice atomic equivalent of BRPOPLPUSH on sorted sets? 
Sorry, you can't. We actually discussed this when the ZPOP family was added and decided against it: "However I'm not for the BZPOPZADD part, because instead experience with lists shown that this is not a good idea in general, unfortunately, and that that adding safety of message processing may be used other means. The worst thing abut BZPOPZADD and BRPOPLPUSH and so forth are the cascading effects, they create a lot of problems in replication for instance, and our BRPOPLPUSH replication is still not correct in certain ways (we can talk about it if you want)." (ref: https://github.com/antirez/redis/pull/4879#issuecomment-389116241)
 I'm very suspicious of putting the blocking BZRPOPMIN into the MULTI too, am I right to be?
Definitely, and blocking commands can't be called inside a transaction anyway.

Am I guaranteed that the requests to Redis will be executed in order?

all. I write a data item to Redis. Then later, I read the data item out of Redis.
Since there may be multiple servers taking these Redis requests and satisfying them, if I make the write request 1 ms before I make the read request (suppose they're both being done by the same process) am I assured that the read request won't be processed first, and I get a response back like "that data item doesn't exist"?
Assuming that the commands are being issued in sequence, you can assume that they will be atomic and single-threaded operations. Read more about this in this stackoverflow answer.
The above is True for a single Redis server, and not guaranteed for a Cluster behavior (thanks #mwp). In that situation, I'd recommend adding a check at the client level. If key doesn't exist when Redis makes its GET call, by default a nil value is returned.
Last note: depending on your implementation you may want to look into storing your redis items in a list, LPUSH-ing the write requests and BRPOP-ing the out values so you would always be guaranteed a value exists...

What is best practice for list and set handling in Redis?

We are using Redis as a caching server, and often have to deal with caching list. When we cache simple objects we do a GET and Redis will return null if the object doesn't exist and we'll know that the object isn't cached and have to be loaded from database.
But how do we best handle the same for lists - an empty list can be a valid value. Do we need to call EXISTS to check if the list exists (but making the operation 2 calls instead of one) or does someone have a better idea how to handle this scenario?
/Thanks
If you absolutely need to do that, when the list is created you can push a "sentinel" as first element that is never removed. In order to do this atomically you can use MULTI/EXEC/WATCH, but watch is only available in Redis 2.2 that is currently a preview (even if pretty stable, you can grab it from github master branch).
I think in your use case you may also want RPUSHX and LPUSHX, that will atomically push against a list only if it already exists.
Note that since Redis 2.2 to exist means to have at least 1 element for a list, as lists that will reach zero elements are automatically removed, for many good reasons ;)
Unfortunately, list/set retrieval commands such as LRANGE and SMEMBERS do not seem to distinguish between an empty list/set and a non-existent list/set.
So if you absolutely need to distinguish between the two cases, I guess you'll need to do an EXISTS first. Try pipelining your commands for better performance. Most Redis client libraries support pipelining.
Or you might reconsider your caching strategy so that you don't need to distinguish them.
If you are using php, I would assign the return value to an variable and then check if it is an array. (This is how it works using the Predis library)
$res = $redis->get('Key');
if(is_array($res))
do code here