What is best practice for list and set handling in Redis? - redis

We are using Redis as a caching server, and often have to deal with caching list. When we cache simple objects we do a GET and Redis will return null if the object doesn't exist and we'll know that the object isn't cached and have to be loaded from database.
But how do we best handle the same for lists - an empty list can be a valid value. Do we need to call EXISTS to check if the list exists (but making the operation 2 calls instead of one) or does someone have a better idea how to handle this scenario?
/Thanks

If you absolutely need to do that, when the list is created you can push a "sentinel" as first element that is never removed. In order to do this atomically you can use MULTI/EXEC/WATCH, but watch is only available in Redis 2.2 that is currently a preview (even if pretty stable, you can grab it from github master branch).
I think in your use case you may also want RPUSHX and LPUSHX, that will atomically push against a list only if it already exists.
Note that since Redis 2.2 to exist means to have at least 1 element for a list, as lists that will reach zero elements are automatically removed, for many good reasons ;)

Unfortunately, list/set retrieval commands such as LRANGE and SMEMBERS do not seem to distinguish between an empty list/set and a non-existent list/set.
So if you absolutely need to distinguish between the two cases, I guess you'll need to do an EXISTS first. Try pipelining your commands for better performance. Most Redis client libraries support pipelining.
Or you might reconsider your caching strategy so that you don't need to distinguish them.

If you are using php, I would assign the return value to an variable and then check if it is an array. (This is how it works using the Predis library)
$res = $redis->get('Key');
if(is_array($res))
do code here

Related

Making sure (distributed) caches only store the latest value in a distributed system

Let's say I want to use a built-in solution such as Redis or Memcached to cache database rows (as an example), to avoid recurrent costly trips to the database.
For the sake of the argument, let's assume I have a TABLE(id, x, y) and that I want to cache all rows so I never have to read directly from the database.
Questions:
Consider the following case: NodeA tries to update a given row's field x while NodeB tries to update y, then both simultaneously try to update the cache line. If they try to "manually" update the field they just changed to the row in the cache, if we follow the typical last-write-wins, one of the fields is going to be discarded, which is catastrophic. This makes me think I need to always fill the cache's rows with a full row read from the database.
But this by itself won't necessarily help me. If NodeA writes to x and loads the entire row in memory and then NodeB writes to y and reads the entire row in memory, if NodeB writes to the cache before NodeA then NodeB's changes will be overwritten! This makes me believe I need to always somehow version the rows both in the DB and in the cache. Is this the case? Memcached seems to have a compare and set primitive, but I see no such thing in Redis.
Even if 1. and 2. are not an issue, I still need to guarantee that my write / read has read-after-write consistency, otherwise it may happen that what I'm reading and intending to put in the cache is not necessarily the most up-to-date version. If that's the case, how can I make sure of this? By requiring w + r > n?
This seems to be a very common use-case, I'd guess it's pretty much a solved problem. What can I try to resolve this?
Key value stores as redis support advance data structures, such as HASHs.
If you're doing partial updates to cached entities (only a set of fields is updated as part of the super set), and given your goal is to avoid time-consuming database reads, simply save the table entry as a HASH K/V pairs (using HSET) and the use HGETALL for reading.
Redis OPS are atomic by nature, so that should solve your problems, if I got them right.
On a side note, if you're caching an entire entity yet doing partial updates, you should consider a simpler caching approach, such as read-through (making cache validity a reader-only concern).
As opposed to Database accesses. Redis cache access from different location unless somehow serialized, will always have the potential of being out of order when it comes to distributed systems, as there's always the execution environment (network, threading) to introduce possible delays.
Doing read-through caching will ensure data is always updated after the most recent write without the need to synchronize anything else.
This is how Facebook solved the issue with Memcached: http://nil.csail.mit.edu/6.824/2020/papers/memcache-faq.txt.
The idea is to use the concept of a lease: when a request for a cached value is received and there is no data for such key, a lease token (64 bits id) is returned.
When the webserver fetches the data from the database it can then store the data in the cache with that token. Every time an invalidation request is invoked on a key, a new lease token is created, and as such, if a put is attempted for an old token, the put ends up rejected.
As far as I understand, it's not really possible to (easily) replicate this behavior with Redis without resorting to LUA scripts.

Redis atomic pop and add to sorted set, BRPOPLPUSH equivalent

I have one redis list "waiting" and one redis list "partying".
I have a long running process that safely blocks on the "waiting" list item to come along, and then pops it and pushes it onto the "partying" list atomically using BRPOPLPUSH. Awesome.
Users in the "waiting" list are repeatedly querying "am I in the "partying" list yet?", but there is no fast (i.e. < O(n)) method of checking if a user is in a redis list. You have to grab the whole list and loop through it.
So I'm resorting to switching from a redis list to a redis sorted set, with the 'score' as the unix timestamp of when they joined the "waiting" sorted set. I can blocking pop on the lowest score (user at the head of the queue). Using sorted sets, I can use ZSCORE to check in O(1) time if they're on either list, so it's looking hopeful.
How can I perform the nice atomic equivalent of BRPOPLPUSH on sorted sets?
It's like I need a mythical BZRPOPMIN & ZADD= BZRPOPMINZADD. If the process dies between these two, a user will effectively be disappeared from both sets.
Looking into MULTI EXEC transactions in redis, they are not what they appear to be at first glance, they're more like 'pipelines', in that I can't get the result of the first command (BZRPOPMIN) and feed it into the second command (ZADD). I'm very suspicious of putting the blocking BZRPOPMIN into the MULTI too, am I right to be?
How can I perform the nice atomic equivalent of BRPOPLPUSH on sorted sets? 
Sorry, you can't. We actually discussed this when the ZPOP family was added and decided against it: "However I'm not for the BZPOPZADD part, because instead experience with lists shown that this is not a good idea in general, unfortunately, and that that adding safety of message processing may be used other means. The worst thing abut BZPOPZADD and BRPOPLPUSH and so forth are the cascading effects, they create a lot of problems in replication for instance, and our BRPOPLPUSH replication is still not correct in certain ways (we can talk about it if you want)." (ref: https://github.com/antirez/redis/pull/4879#issuecomment-389116241)
 I'm very suspicious of putting the blocking BZRPOPMIN into the MULTI too, am I right to be?
Definitely, and blocking commands can't be called inside a transaction anyway.

Am I guaranteed that the requests to Redis will be executed in order?

all. I write a data item to Redis. Then later, I read the data item out of Redis.
Since there may be multiple servers taking these Redis requests and satisfying them, if I make the write request 1 ms before I make the read request (suppose they're both being done by the same process) am I assured that the read request won't be processed first, and I get a response back like "that data item doesn't exist"?
Assuming that the commands are being issued in sequence, you can assume that they will be atomic and single-threaded operations. Read more about this in this stackoverflow answer.
The above is True for a single Redis server, and not guaranteed for a Cluster behavior (thanks #mwp). In that situation, I'd recommend adding a check at the client level. If key doesn't exist when Redis makes its GET call, by default a nil value is returned.
Last note: depending on your implementation you may want to look into storing your redis items in a list, LPUSH-ing the write requests and BRPOP-ing the out values so you would always be guaranteed a value exists...

Redis : How to prevent this race condition

I have a hash in redis in which one the field has value as stringified array, whenever user register for an event,
Fetch this stringified array from redis
Parse in backend and add the user's username in the array
stringify the array and store back to hash
There is a potential race condition possibility here if two users register at close enough time.
Race condition could be like this that the both users get the same stringified array from redis and then they modify, and only one update will happen as one will be overwritten by other.
Is there a way to prevent this race condition like transactions in SQL. I have read about multi, but it does not allow to do computation between commands on server.
Or storing stringifying array and storing as hash field is a bad idea, and I should use a normal list for this on redis.
The solution is to use atomic operations where you can. You have several options:
use real Redis lists that support handy commands such as LPUSH
do everything inside a Lua script (they are atomic by definition)
use Redis transactions and the WATCH command to track changes
The typical WATCH usage involves attempting to execute the transaction until it succeeds. You can do this with a simple loop, however it's possible that your connector has a special convenience method exactly for that.

Redis and Object Versioning

How are people coping with changes to redis object schemas - adding or removing properties from objects?
Sharing from my own experience (one year old project with thousands of user requests per second).
Usually, there were two scenarios for me:
Add new information to existing structures (like, "email" field to a user)
Remove or change existing values in existing structures (like, change format of some field)
Drop stuff from the database
For 1 I keep following simple strategy: degrade gracefully, e.g. if user doesn't have email record - treat it as empty email. Worked all the time.
For 2 and 3 it depends, whether data can be changed/calculated/fixed before releasing or after. I run a job on database that does all the work for me, for few millions of keys it takes considerable time (minutes). If that job can be run only after I release the new code - then degrading gracefully helps a lot, I simply release and then run the job.
PS: If you affect a lot of keys in redis then it is very important to use http://redis.io/topics/pipelining Saves a lot of time.
Take a list of all affected (i.e. you want to fix them in any way) keys or records in pipeline
Do whatever you want on them. If it's possible try to queue writing operations into pipeline too
Send queued operations to redis.
It is also very important for you to make indexes of your structures. I keep sets with ids. Then I simply iterate over SMEMBERS(set_with_ids).
It is much, much better than iterating over KEYS command.
For extremely simple versioning, you could use different database numbers. This could be quite limiting in cases where almost everything is the same between two versions but it's also a very clean way to do it if it will work for you.