Using HSCAN for pagination

I have a large hash in Redis and would like to expose an API to browse through it with pagination. The obvious candidate is HSCAN with a cursor, which API users would pass back to fetch the next batch of records. I'm afraid, however, that if a new entry is added to the hash in between calls, the ordering could be messed up and the cursor will become invalid (similar to how iterators break once you update a hash during iteration).
Is my concern valid? If not, why? If yes, are there any workarounds, or will I have to change the storage approach?
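For reference, here is a minimal sketch of the cursor-passing API described above, assuming a redis-py style client whose `hscan` returns a `(next_cursor, dict)` pair. The function names `fetch_page` and `iterate_all` are hypothetical. As far as the Redis SCAN documentation describes it, the cursor itself stays usable when the hash changes between calls; what can happen is that a field is returned more than once, or that a field added or removed mid-iteration is or is not returned, so the caller should deduplicate.

```python
def fetch_page(client, key, cursor=0, count=100):
    """Run one HSCAN call and return (next_cursor, items).

    `count` is only a hint to Redis, so a page may hold more or fewer items.
    A next_cursor of 0 means the iteration is complete.
    """
    next_cursor, items = client.hscan(key, cursor=cursor, count=count)
    return int(next_cursor), dict(items)


def iterate_all(client, key, count=100):
    """Walk the whole hash page by page, deduplicating repeated fields."""
    seen = {}
    cursor = 0
    while True:
        cursor, items = fetch_page(client, key, cursor, count)
        seen.update(items)  # the same field may be returned on several pages
        if cursor == 0:
            break
    return seen
```

An API handler would return `next_cursor` to the caller and accept it back as the `cursor` argument of the next request.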

Related

Redis atomic pop and add to sorted set, BRPOPLPUSH equivalent

I have one redis list "waiting" and one redis list "partying".
I have a long running process that safely blocks on the "waiting" list item to come along, and then pops it and pushes it onto the "partying" list atomically using BRPOPLPUSH. Awesome.
Users in the "waiting" list are repeatedly querying "am I in the 'partying' list yet?", but there is no fast (i.e. < O(n)) method of checking if a user is in a redis list. You have to grab the whole list and loop through it.
So I'm resorting to switching from a redis list to a redis sorted set, with the 'score' as the unix timestamp of when they joined the "waiting" sorted set. I can do a blocking pop on the lowest score (the user at the head of the queue). Using sorted sets, I can use ZSCORE to check in O(1) time if they're in either set, so it's looking hopeful.
How can I perform the nice atomic equivalent of BRPOPLPUSH on sorted sets?
It's like I need a mythical BZPOPMIN + ZADD = BZPOPMINZADD. If the process dies between these two, a user will effectively disappear from both sets.
Looking into MULTI/EXEC transactions in redis, they are not what they appear to be at first glance. They're more like 'pipelines', in that I can't get the result of the first command (BZPOPMIN) and feed it into the second command (ZADD). I'm very suspicious of putting the blocking BZPOPMIN into the MULTI too, am I right to be?

How can I perform the nice atomic equivalent of BRPOPLPUSH on sorted sets? 
Sorry, you can't. We actually discussed this when the ZPOP family was added and decided against it: "However I'm not for the BZPOPZADD part, because instead experience with lists shown that this is not a good idea in general, unfortunately, and that that adding safety of message processing may be used other means. The worst thing abut BZPOPZADD and BRPOPLPUSH and so forth are the cascading effects, they create a lot of problems in replication for instance, and our BRPOPLPUSH replication is still not correct in certain ways (we can talk about it if you want)." (ref: https://github.com/antirez/redis/pull/4879#issuecomment-389116241)
I'm very suspicious of putting the blocking BZPOPMIN into the MULTI too, am I right to be?
Definitely, and blocking commands can't be called inside a transaction anyway.
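If blocking is not a hard requirement, one possible workaround (not part of the answer above, and only a sketch) is a Lua script that performs a non-blocking ZPOPMIN followed by ZADD in one atomic step, with the worker polling instead of blocking. The example assumes a redis-py style client with an `eval(script, numkeys, *keys_and_args)` method; `move_min` and `MOVE_MIN_LUA` are hypothetical names. In a cluster, both keys would have to live in the same hash slot.

```python
# Lua scripts run atomically inside Redis, so no other command can run
# between the pop and the add.
MOVE_MIN_LUA = """
local popped = redis.call('ZPOPMIN', KEYS[1])
if #popped == 0 then
  return false
end
-- popped[1] is the member, popped[2] is its score
redis.call('ZADD', KEYS[2], popped[2], popped[1])
return popped
"""


def move_min(client, src, dst):
    """Move the lowest-scored member from src to dst.

    Returns (member, score), or None when src is empty (the caller then
    sleeps and retries, since this variant does not block).
    """
    reply = client.eval(MOVE_MIN_LUA, 2, src, dst)
    if not reply:
        return None
    member, score = reply
    return member, float(score)
```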

Redis : How to prevent this race condition

I have a hash in redis in which one of the fields holds a stringified array. Whenever a user registers for an event, I:
Fetch this stringified array from redis
Parse it in the backend and add the user's username to the array
Stringify the array and store it back in the hash
There is a potential race condition here if two users register at nearly the same time.
Both users could get the same stringified array from redis and modify it, and then only one update survives because the other is overwritten.
Is there a way to prevent this race condition, like transactions in SQL? I have read about MULTI, but it does not allow doing computation between commands on the server.
Or is storing a stringified array as a hash field a bad idea, and should I use a normal redis list for this?

The solution is to use atomic operations where you can. You have several options:
use real Redis lists that support handy commands such as LPUSH
do everything inside a Lua script (they are atomic by definition)
use Redis transactions and the WATCH command to track changes
The typical WATCH usage involves attempting to execute the transaction until it succeeds. You can do this with a simple loop, however it's possible that your connector has a special convenience method exactly for that.
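As an illustration of the WATCH option, here is a minimal sketch assuming a redis-py style client, whose `transaction(func, *watches)` convenience method watches the given keys, calls `func` with a pipeline, and retries when the watched key changed before EXEC. The names `register_user`, `hash_key` and `field` are hypothetical.

```python
import json


def register_user(client, hash_key, field, username):
    """Append username to the JSON array stored in a hash field, safely."""

    def _update(pipe):
        # The key is already being WATCHed, so this read executes immediately.
        raw = pipe.hget(hash_key, field)
        users = json.loads(raw) if raw else []
        if username not in users:
            users.append(username)
        # Commands after multi() are queued and sent with EXEC; EXEC fails
        # (and the helper retries) if hash_key changed since the WATCH.
        pipe.multi()
        pipe.hset(hash_key, field, json.dumps(users))

    client.transaction(_update, hash_key)
```

Storing the attendees in a real Redis set or list and using SADD or LPUSH would avoid the read-modify-write cycle altogether.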

How to find Keys with specific member value?

I'm new to Redis and use Redis 2.8 with the StackExchange.Redis library.
How can I write a KEYS pattern to get all keys whose hash has a specific member value?
Since I use StackExchange.Redis, I want to get keys with a pattern like this (where username is a member of the key's hash): KEYS "username:*AAA*".
database.HashKeys("suggest me a pattern :) ")
I will call this method many times, on each HTTP user request, to find the user's session data stored in the Redis database. Can you suggest a better alternative to this approach?

This simply isn't a direct fit for any redis features. You certainly shouldn't use KEYS for this - in addition to being expensive (you should prefer SCAN, btw), that scans the keys, not the values.

Redis: how to store a list of user hashes and retrieve it?

I've started using redis today and I've been through the tutorial and some links at stackoverflow, but I'm failing to understand how to properly use redis for what seems to be a very simple use case.
Goal: Save several users data into redis and read all of the users at once.
I start a redis client and I start by adding the first user which has id 1:
127.0.0.1:6379> hmset user:1 name "vitor" age 35
OK
127.0.0.1:6379> hgetall user:1
1) "name"
2) "vitor"
3) "age"
4) "35"
I add a couple more users, running several commands like this one:
127.0.0.1:6379> hmset user:2 name "nuno" age 10
I was (probably wrongly) expecting to be able to now query all my users by doing:
hgetall "user:"
or even
hgetall "user:*"
The fact that I've not seen anything like this in the tutorials, kind of tells me that I'm not using redis right for this use case.
Would you be able to tell me what should be the approach for this use case?

To understand why this kind of operation seems non-trivial in NoSQL implementations, it's good to think about why NoSQL exists (and has become very popular) at all.
When you look at an early NoSQL implementation like memcached, the first use case was very simple, but very important: a blazingly fast cache for distributed data, to cache for example web page data. Very quickly stuff like clustering and sharding was added, so not all data has to be available everywhere at once at every single node in the cluster, but can be gathered on demand.
NoSQL is very different from relational data storage. Don't overuse it. Consider relational databases as well, as they are sometimes far more suited for what you are trying to accomplish. In everything you design, ask yourself "Does this scale well?".
Okay, back to your question. It is in general bad practice to do wildcard searches. You prepare your data in a way that you can retrieve your data in a scalable way.
Redis is a very chic solution, allowing you to overcome a lot of NoSQL limitations in an elegant way.
If getting "a list of all users" isn't something you have to do very often, or doesn't need to scale well, or is always a case of "I really want all users" because it's for a daily scan anyway, use SCAN (or HSCAN, if the users are fields of a single hash). SCAN operations with a proper batch size don't get in the way of other clients: you can just retrieve your records a couple of thousand at a time, and after a few calls you've got everything.
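For the `user:<id>` hashes from the question, the batch-wise scan could look roughly like this. It is a sketch assuming a redis-py style client with `scan_iter` and `hgetall`; it iterates over key names with SCAN, since each user is a separate key. The function name `load_all_users` and the default batch size are hypothetical.

```python
def load_all_users(client, pattern="user:*", batch=1000):
    """Collect every hash whose key matches pattern, scanning in batches.

    `batch` is passed as the COUNT hint, so each underlying SCAN call
    stays short and other clients are not blocked for long.
    """
    users = {}
    for key in client.scan_iter(match=pattern, count=batch):
        # SCAN may report a key more than once; the dict deduplicates.
        users[key] = client.hgetall(key)
    return users
```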
You can also store your users in a SET. There's no ordering in a set, so no pagination. It can help to keep your user names unique.
If you want to do things like "get me all users that start with the letter 'a'", I'd use a ZSET. I'd wait a week or two for ZRANGEBYLEX, which is just about to be released and is in the works as we speak. Or use an ORM like Josiah Carlson's 'rom' package.
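A prefix lookup with ZRANGEBYLEX might be sketched as follows, assuming a redis-py style client with `zrangebylex(name, min, max)` and an index ZSET in which every user name was added with the same score (lexicographic ranges are only meaningful in that case). The names `prefix_range` and `users_with_prefix` are hypothetical.

```python
def prefix_range(prefix):
    """Build inclusive ZRANGEBYLEX bounds covering every member with prefix.

    Bounds are bytes so that the trailing 0xff is a single byte, which sorts
    after any byte that can follow the prefix in a UTF-8 encoded name.
    """
    raw = prefix.encode("utf-8")
    return b"[" + raw, b"[" + raw + b"\xff"


def users_with_prefix(client, index_key, prefix):
    """Return all members of the index ZSET that start with prefix."""
    low, high = prefix_range(prefix)
    return client.zrangebylex(index_key, low, high)
```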
When you ask yourself "But now I have to do three calls instead of one when storing my data...?!": yup, that's how it works. If you need atomicity, use a Lua script, or MULTI+EXEC pipelining. Lua is generally easier.
You can also ask yourself if using a HSET is needed. Do you need to retrieve the individual data members? Each key or member has some overhead. On top of that, HGETALL has a Big-O specification of O(N), so it doesn't scale well. It might be better to serialize your row as a whole, using JSON or MsgPack, and store it in one HSET member, or just a simple GET/SET. Also read up on SORT.
Hope this helps, TW

If you still want to use Redis you can use something like :
SADD users '{"userId":1,"name":"John","vitor":"x","age":35}'
SADD users '{"userId":2,"name":"xt","vitor":"x","age":43}'
...
And you can retrieve the same using :
SMEMBERS users

What is best practice for list and set handling in Redis?

We are using Redis as a caching server, and often have to deal with caching lists. When we cache simple objects, we do a GET and Redis returns null if the object doesn't exist, so we know the object isn't cached and has to be loaded from the database.
But how do we best handle the same for lists? An empty list can be a valid value. Do we need to call EXISTS to check if the list exists (making the operation two calls instead of one), or does someone have a better idea how to handle this scenario?
/Thanks

If you absolutely need to do that, when the list is created you can push a "sentinel" as the first element that is never removed. In order to do this atomically you can use MULTI/EXEC/WATCH, but WATCH is only available in Redis 2.2, which is currently a preview (even if pretty stable; you can grab it from the github master branch).
I think in your use case you may also want RPUSHX and LPUSHX, that will atomically push against a list only if it already exists.
Note that since Redis 2.2, for a list to exist means that it has at least one element, as lists that reach zero elements are automatically removed, for many good reasons ;)
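The sentinel idea can be sketched like this, assuming a redis-py style client with `pipeline`, `delete`, `rpush` and `lrange`. The sentinel value and the function names are hypothetical, and a MULTI/EXEC pipeline is used here so the old list is replaced in one step.

```python
SENTINEL = "__cached__"  # hypothetical marker, always stored as element 0


def cache_list(client, key, items):
    """Store items under key; the sentinel keeps even an empty list alive."""
    pipe = client.pipeline(transaction=True)  # DELETE + RPUSH in one MULTI/EXEC
    pipe.delete(key)
    pipe.rpush(key, SENTINEL, *items)
    pipe.execute()


def get_cached_list(client, key):
    """Return the cached list (possibly empty), or None on a cache miss."""
    raw = client.lrange(key, 0, -1)
    if not raw:
        return None  # key does not exist: nothing was cached
    return raw[1:]  # drop the sentinel
```

A single LRANGE is then enough to tell "cached and empty" apart from "not cached".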

Unfortunately, list/set retrieval commands such as LRANGE and SMEMBERS do not seem to distinguish between an empty list/set and a non-existent list/set.
So if you absolutely need to distinguish between the two cases, I guess you'll need to do an EXISTS first. Try pipelining your commands for better performance. Most Redis client libraries support pipelining.
Or you might reconsider your caching strategy so that you don't need to distinguish them.

If you are using PHP, I would assign the return value to a variable and then check if it is an array. (This is how it works using the Predis library.)
$res = $redis->get('Key');
if (is_array($res)) {
    // do code here
}
