Redis SISMEMBER performance: Lua script on multiple keys vs single transaction - redis

I have a dozen Redis keys of type SET, e.g.
PUBSUB_USER_SET-1-1668985588478915880,
PUBSUB_USER_SET-2-1668985588478915880,
PUBSUB_USER_SET-3-1668988644477632747,
...
PUBSUB_USER_SET-10-1668983464477632083
Each set contains userIds, and the problem is to check whether a given user is present in any of the sets.
The solution I tried is to get all the keys, join them with a comma delimiter, and pass the result as an argument to a Lua script, which splits the keys with gmatch and runs SISMEMBER on each one until there is a hit.
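-- KEYS[1]: comma-separated list of set keys; KEYS[2]: the userId to look for
local vals = KEYS[1]
for match in (vals..","):gmatch("(.-)"..",") do
    -- match is one set key; stop at the first set that contains the user
    local exist = redis.call('sismember', match, KEYS[2])
    if (exist == 1) then
        return 1
    end
end
return 0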
Now, as the number of keys grows to PUBSUB_USER_SET-20 or PUBSUB_USER_SET-30, I see latency increase and throughput degrade.
Is this the better way to do it, or is it better to batch the Lua scripts, i.e. instead of passing 30 keys as arguments, pass them in batches of 10 keys and return as soon as the user is found? Or is there a better way to do this?

I would propose a different approach: instead of spreading users across the sets arbitrarily, store each user in exactly one set chosen deterministically, so that you only need to query that one set to check whether the user is there.
Let's say we have N sets, numbered s-0, s-1, s-2, ..., s-19.
Put each user in one of these sets based on a hash of its ID. Then a lookup needs to query only one set instead of checking all of them. You can use any hashing algorithm.
To make it more interesting, you can try consistent hashing.
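As a minimal Python sketch of the hashing idea (assuming the 20 sets are named s-0 to s-19 and that CRC32 from the standard zlib module is an acceptable hash; the helper name shard_key is hypothetical), the same function is used both when adding a user and when checking for one:

```python
import zlib

NUM_SHARDS = 20  # assumption: sets s-0 .. s-19


def shard_key(user_id: str, num_shards: int = NUM_SHARDS, prefix: str = "s-") -> str:
    """Return the name of the single set that should contain user_id."""
    # CRC32 gives the same result in every process and on every machine,
    # unlike Python's built-in hash(), which is randomized per process for str.
    bucket = zlib.crc32(user_id.encode("utf-8")) % num_shards
    return f"{prefix}{bucket}"


# Adding:   SADD      <shard_key(user_id)> <user_id>
# Checking: SISMEMBER <shard_key(user_id)> <user_id>   (one lookup instead of N)
```

With plain modulo hashing, changing the number of sets remaps most users to a different set, so existing members would have to be moved; consistent hashing reduces how many users move when a set is added or removed.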

You can use a Redis pipeline with batching (10 keys per iteration) to improve performance.
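A minimal sketch of that batching in Python, assuming a redis-py client (the function name user_in_any_set is hypothetical). Each batch of SISMEMBER calls goes to the server in one round trip, and later batches are skipped as soon as one hit is found:

```python
from typing import Sequence


def user_in_any_set(client, set_keys: Sequence[str], user_id: str, batch_size: int = 10) -> bool:
    """Return True if user_id is a member of any of set_keys.

    `client` is assumed to be a redis-py Redis instance: anything whose
    .pipeline(transaction=...) returns an object with .sismember(name, value)
    and .execute().
    """
    for start in range(0, len(set_keys), batch_size):
        pipe = client.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
        for key in set_keys[start:start + batch_size]:
            pipe.sismember(key, user_id)
        # execute() returns one reply per queued command, in order
        if any(pipe.execute()):
            return True  # stop early: the remaining batches are never sent
    return False
```

Unlike the Lua script, this is not atomic across batches, which is usually acceptable for a read-only membership check.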

Related

Fixed length data structures in redis

I need to match tens of thousands of 4-byte strings with one or more boolean values each. I don't mind using up a whole word for the booleans if it means faster retrieval. However, my data has such tight constraints that I imagine there are some, albeit minor, optimizations that could be made if these constraints are reported to the storage engine in advance. Does Redis have any way to take advantage of this?
Here is a sample of my data:
"DENL": false
"NLES": false
"NLUS": true
"USNL": true
"AEGB": true
"ITAE": true
"ITFR": false
The keys are the concatenation of two ISO 3166-1 alpha-2 codes. As such, they are guaranteed to be 4 uppercase English letters.
The data structures I have considered using are:
Hashes to map the 4 byte keys to a string representing the booleans
A separate set for each boolean value
And since my data only contains uppercase English letters and there are only 456976 possible combinations of those (which comes out to 56KB per bit stored per key):
One or more strings that are accessed with bitwise operations (GETBIT, BITFIELD) using a function to convert the key string to a bit index.
I think sets are probably the most elegant solution, and a binary string over all possible combinations will be the most efficient. I would like to know whether there is some kind of middle ground, like a set with fixed-length strings as members. I would expect a data type optimized for fixed-length strings to provide faster searching than one optimized for variable-length strings.
It is slightly better to use the 4-letter country-code-combination as a simple key, with an empty value.
The set data type is really a hash map where the elements are the keys, added to the hash map with a NULL value. I wouldn't use a set, as this implies two hashes and two lookups into a hash map: the first for the set key in the database and the second for the element in the set's internal hash.
Use the existence of the key to mean either "needs a customs declaration" or "does not need a customs declaration", as Tomasz says.
Using simple keys allows you to use the SET command with NX/XX conditions, which may be handy in your logic:
NX -- Only set the key if it does not already exist.
XX -- Only set the key if it already exists.
Use the EXISTS command instead of GET, as it is slightly faster (no type checking, no value fetching).
Another advantage of simple keys over sets is that you can get the values of multiple keys at once using MGET:
> MGET DENL NLES NLUS
1) ""
2) ""
3) (nil)
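A minimal redis-py sketch of the simple-key approach, assuming a client connected to a database reserved for these codes (the helper names are hypothetical). The existence of a key stands for "needs customs declaration":

```python
def mark_needs_declaration(client, code: str) -> bool:
    """SET <code> "" NX: create the flag key only if it does not exist yet."""
    # redis-py returns True when the key was set and None when NX blocked it.
    return bool(client.set(code, "", nx=True))


def needs_declaration(client, code: str) -> bool:
    """EXISTS <code>: only the key's presence is checked, no value is fetched."""
    return client.exists(code) == 1


def declaration_flags(client, codes: list) -> dict:
    """MGET several codes at once; an existing key comes back as "" (or b"")."""
    return {code: value is not None for code, value in zip(codes, client.mget(codes))}
```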
To do complex queries, assuming these are rare and not optimized for performance, you can use SSCAN (if you go with sets) or KEYS (if you go with simple keys). However, if you go with simple keys, you had better use a dedicated database; see SELECT.
To query for those with NL on the left side you would use:
KEYS NL??
There are a couple of optimizations you could try:
use a set and treat all values as either "need customs declaration" or "does not need a customs declaration", depending on which one has fewer values; then with SISMEMBER you can check whether your key is in that set, which gives you the correct answer,
have a look at the introduction to Redis data types, chapter "Bitmaps": if you pre-define all of your keys in some array, you can use SETBIT and GETBIT operations to store the flag "needs customs declaration" for a given bit number (index in the array).
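For the bitmap option, here is a minimal Python sketch of the key-to-bit-index function, assuming the codes are always exactly four uppercase ASCII letters (the names code_to_bit_index and needs_declaration are hypothetical). It treats the code as a base-26 number, which also confirms the 26^4 = 456976 bits (57122 bytes, about 56 KB) per flag mentioned in the question:

```python
import string

ALPHABET = string.ascii_uppercase  # 'A'..'Z', 26 letters


def code_to_bit_index(code: str) -> int:
    """Map a 4-letter uppercase code to a unique bit offset in [0, 26**4)."""
    if len(code) != 4 or any(c not in ALPHABET for c in code):
        raise ValueError(f"expected 4 uppercase letters, got {code!r}")
    index = 0
    for c in code:
        index = index * 26 + (ord(c) - ord("A"))  # append one base-26 digit
    return index


TOTAL_BITS = 26 ** 4                   # 456976 possible codes
BITMAP_BYTES = (TOTAL_BITS + 7) // 8   # 57122 bytes per flag

# With a redis-py client this would be used as, e.g.:
#   client.setbit("needs_declaration", code_to_bit_index("NLUS"), 1)
#   client.getbit("needs_declaration", code_to_bit_index("NLUS"))
```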

What Redis data type fits best for the following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1., 2. and 3., because new numbers will constantly be added to the REDIS structure. Those numbers represent Unix timestamps in milliseconds, so out of the box they will always be sorted by time of addition.
Conditionally means fetch those unix timestamp from REDIS that are less or equal to current unix timestamp in milliseconds(Date.now())
The question is which REDIS data type fits best for this use case, bearing in mind that this code will be scaled up to N instances, so N instances will share access to a single REDIS instance. To share the load equally, each instance will read, for example, the first (oldest) 5 numbers from REDIS. Numbers are unique (adding the same number should fail silently), so a REDIS SET seems like a good choice, but reading the first M elements from a REDIS set seems impossible.
To prevent two different instances of the code from reading the same numbers, the REDIS read operation should be atomic: it should read the numbers and delete them. If any async operation fails on a specific number (steps 2. and 3.), the number should be added to REDIS again to be handled again. It should be re-added at the head, not at the end, so that it is handled again as soon as possible. As far as I know, SADD would push it to the tail.
SMEMBERS key would read everything, which looks like a hammer to me. I would need to include some application logic to get the first five, then check which are less than or equal to Date.now(), then delete those, and somehow wrap everything in a single transaction. Besides that, the set cardinality can be huge.
SSCAN sounds interesting, but I don't have any clue how it works in a "scaled" environment like the one described above. Besides that, per the REDIS docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. As described above, the collection will change frequently.
A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp and you can perform range searches (i.e. anything less or equal a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure atomicity when reading and removing members, you can choose between the following options: Redis transactions, a Redis Lua script and, in the next version (v4), a Redis module.
Transactions
Using transactions simply means running the following on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
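The same transaction from application code, as a minimal sketch assuming a redis-py client (the names claim_due and requeue are hypothetical). It also covers the re-queue requirement from the question: since members are ordered by score, re-adding a failed item with its original timestamp puts it back at the front rather than at the tail:

```python
import time
from typing import Optional


def claim_due(client, key: str, now_ms: Optional[int] = None) -> list:
    """Atomically read and remove every member whose score is <= now_ms.

    Mirrors the MULTI / ZRANGEBYSCORE / ZREMRANGEBYSCORE / EXEC block above.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # same unit as Date.now()
    pipe = client.pipeline(transaction=True)  # queued commands run inside MULTI/EXEC
    pipe.zrangebyscore(key, "-inf", now_ms, withscores=True)
    pipe.zremrangebyscore(key, "-inf", now_ms)
    due, _removed_count = pipe.execute()
    return due  # list of (member, score) pairs, lowest score first


def requeue(client, key: str, member, score: float) -> None:
    """Put a failed item back with its original timestamp as the score."""
    client.zadd(key, {member: score})
```

Limiting each instance to the oldest 5 due items cannot be done with ZREMRANGEBYSCORE alone, since it has no count argument; one option is a Lua script that runs ZRANGEBYSCORE with LIMIT 0 5 and then ZREM on the returned members.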
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

How to list keys without serial number in REDIS?

I am trying to list keys with specific pattern like below:
KEYS "*Team*"
and I am getting a result set with serial numbers, like below:
1) "TeamMetricSummary_google_bps_app_google wfep
league_chambersc2016:04-03-2016_06-04-2016"
2) "\xac\xed\x00\x05t\x00TTeamMetricSummary_google_bps_app_google wfep
league_malini.gto:12-06-2015_04-02-2016"
My problem is that I want to avoid the serial numbers in the result set.
Is that possible?
That is not possible. Redis will return the whole key. You can use regex or string operations like split in your application logic to achieve this, but for that you must know the structure of your input. For example, if your key follows a pattern like xTeamNamey, where x and y are parts (serial numbers) you want to drop, you can instead insert your key as x:TeamName:y. On retrieval, you can use string.split(":")[1] to get the TeamName.
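In Python, for example, assuming keys are written in the hypothetical layout prefix:TeamName:suffix and that the team name itself never contains a colon (the function name is also hypothetical):

```python
def team_name_from_key(key: str) -> str:
    """Return TeamName from a key stored as <prefix>:<TeamName>:<suffix>."""
    parts = key.split(":")
    if len(parts) != 3:
        # the layout assumption is violated, e.g. a ':' inside the team name
        raise ValueError(f"unexpected key layout: {key!r}")
    return parts[1]  # same as string.split(":")[1] above
```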
To answer your specific question:
Redis supports Lua scripting, and the Lua interpreter embedded in Redis (Lua 5.1) supports Lua patterns, a lightweight alternative to regular expressions. Write a Lua script that calls the KEYS command via redis.call and then uses pattern matching to remove the serial numbers for you.
Alternative:
By the way, assuming the above is part of software that runs in production, here is what I would suggest: don't use the KEYS command in production! As its documentation says, Redis has to go through all the keys to find those matching your pattern. Alternatively, you may consider doing shadow writes to a Redis set, adding the key trimmed of its serial number every time you add a key. When you need the list of keys, you can read the whole set. However, you'll have to handle the expiration of keys manually.

Get multiple sets

I've currently got a dataset which is something like:
channel1 = user1,user2,user3
channel2 = user4,user5,user6
(note- these are not actual names, the text is not a predictable sequence)
I would like to have the most optimized capability for the following:
1) Add user to a channel
2) Remove user from a channel
3) Get list of all users in several selected channels, maintaining knowledge of which channel they are in (in case it matters- this can also be simply checking whether a channel has any users or not without getting an actual list of them)
4) Detect if a specific user is in a channel (willing to forego this feature if necessary)
I'm a bit hung up on the fact that there are only two ways I can see of getting multiple keys at once:
A) Using regular keys and a mget key1, key2, key3
In this solution, each value would be a JSON string which can then be manipulated and queried client-side to add/remove/determine values. This itself has a couple of problems: firstly, another client may change the data while it's being processed (i.e. this solution is not atomic), and secondly, it's not easy to detect right away whether a channel contains a specific user, even though it is easy to detect whether a channel has any users (this is low priority, as stated above).
B) Using sets and sunion
I would really like to use sets for this solution somehow, the above solution just seems wrong... but I cannot see how to query multiple sets at once and maintain info about which set each member is from or if any of the sets in the union have 0 members (sunion only gives me a final set of all the combined members)
Any solutions which can implement the above points 1-4 in optimal time and atomic operations?
EDIT: One idea which might work in my specific case is to store the channel name as part of the username and then use sets. Still, it would be great to get a more generic answer
Short answer: use sets + pipelining + MULTI/EXEC, or sets + Lua.
1) Add user to a channel
SADD command
2) Remove user from a channel
SREM command
3) Get list of all users in several selected channels
There are several ways to do it.
If you don't need strict atomicity, you just have to pipeline several SMEMBERS commands to retrieve all the sets in one roundtrip. If you are just interested whether channels have users or not, you can replace SMEMBERS by SCARD.
If you need strict atomicity, you can pipeline a MULTI/EXEC block containing SMEMBERS or SCARD commands. The output of the EXEC command will contain all the results. This is the solution I would recommend.
An alternative (atomic) way is to call a server-side Lua script using the EVAL command. Lua script executions are always atomic. The script could take a number of channels as input parameters and build a nested multi-bulk reply to return the output.
4) Detect if a specific user is in a channel
SISMEMBER command - pipeline them if you need to check for several users.
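Points 3 and 4 as a minimal sketch, assuming a redis-py client (the function names are hypothetical). Each reply is paired with its channel name, which is exactly the information SUNION loses:

```python
def channel_members(client, channels, counts_only=False) -> dict:
    """Fetch several channel sets in one round trip inside MULTI/EXEC."""
    channels = list(channels)  # iterated twice: once to queue, once to zip
    pipe = client.pipeline(transaction=True)  # queued commands run inside MULTI ... EXEC
    for channel in channels:
        if counts_only:
            pipe.scard(channel)     # only "does this channel have users?"
        else:
            pipe.smembers(channel)
    # EXEC returns the replies in the order the commands were queued
    return dict(zip(channels, pipe.execute()))


def user_in_channels(client, user, channels) -> dict:
    """Pipelined SISMEMBER of one user against several channels."""
    channels = list(channels)
    pipe = client.pipeline(transaction=False)  # a plain pipeline is enough for reads
    for channel in channels:
        pipe.sismember(channel, user)
    return {ch: bool(hit) for ch, hit in zip(channels, pipe.execute())}
```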

REDIS - How can I query keys and get their values in one query?

Using keys I can query the keys as you can see below:
redis> set popo "pepe"
OK
redis> set coco "kansas"
OK
redis> set cool "rock"
OK
redis> set cool2 "punk"
OK
redis> keys *co*
1) "cool2"
2) "coco"
3) "cool"
redis> keys *ol*
1) "cool2"
2) "cool"
Is there any way to get the values instead of the keys? Something like: mget (keys *ol*)
NOTICE: As others have mentioned, along with myself in the comments on the original question, KEYS should be avoided in production environments. If you're just running queries on your own box and hacking something together, go for it. Otherwise, question whether REDIS makes sense for your particular application and whether you really need to do this; if so, impose limits and avoid large blocking calls such as KEYS. (For help with this, see the 2015 Edit below.)
My laptop isn't readily available right now to test this, but from what I can tell there aren't any native commands that would allow you to use a pattern in that way. If you want to do it all within Redis, you might have to use EVAL to chain the commands:
eval "return redis.call('MGET', unpack(redis.call('KEYS', KEYS[1])))" 1 "*co*"
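(Replace the *co* at the end with whatever pattern you're searching for. If no keys match, MGET is called with no arguments and Redis returns a wrong-number-of-arguments error.)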
http://redis.io/commands/eval
Note: This runs the string as a Lua script. I haven't dug much into it, so I don't know whether it sanitizes the input in any way. Before you use it (especially with any user input), test injecting further redis.call functions and see whether it evaluates those too. If it does, be careful with it.
Edit: Actually, this should be safe, because neither Redis nor its Lua evaluation allows escaping the containing string: http://redis.io/topics/security
2015 Edit: Since my original post, REDIS has released 2.8, which includes the SCAN command, which is a better fit for this type of functionality. It will not work for this exact question, which requests a one-liner command, but it's better for all reasonable constraints / environments.
Details about SCAN can be read at http://redis.io/commands/scan .
To use this, essentially you iterate over your data set using something like scan ${cursor} MATCH ${query} COUNT ${maxPageSize} (e.g. scan 0 MATCH *co* COUNT 500). Here, cursor should always be initialized as 0.
This returns two things: first is a new cursor value that you can use to get the next set of elements, and second is a collection of elements matching your query. You just keep updating cursor, calling this query until cursor is 0 again (meaning you've iterated over everything), and push the found elements into a collection.
I know SCAN sounds like a lot more work, but I implore you, please use a solution like this instead of KEYS for anything important.
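A minimal sketch of that loop in Python, assuming a redis-py client (the function name values_matching is hypothetical). It also fetches the values, which was the original goal, with one MGET per page:

```python
def values_matching(client, pattern: str, page_size: int = 500) -> dict:
    """Collect {key: value} for keys matching `pattern` without using KEYS.

    Not atomic: keys may change between SCAN and MGET, and SCAN may return
    the same key more than once (the dict absorbs such duplicates).
    """
    result = {}
    cursor = 0  # a new iteration always starts at cursor 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=page_size)
        if keys:  # MGET with zero keys is an error, so skip empty pages
            for key, value in zip(keys, client.mget(keys)):
                if value is not None:  # key was deleted after SCAN returned it
                    result[key] = value
        if cursor == 0:  # back to 0: the full iteration is complete
            break
    return result
```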