Sort sets by number of elements in Redis

Sort sets by number of elements in Redis - redis

I have a Redis database with a number of sets, all identified by a common key pattern, let's say "myset:".
Is there a way, from the command line client, to sort all my sets by number of elements they contain and return that information? The SORT command only takes single keys, as far as I understand.
I know I can do it quite easily with a programming language, but I prefer to be able to do it without having to install any driver, programming environment and so on on the server.
Thanks for your help.

No, there is no easy trick to do this.
Redis is a store, not really a database management system. It supports no query language. If you need some data to be retrieved, then you have to anticipate the access paths and design the data structure accordingly.
For instance in your example, you could maintain a zset while adding/removing items from the sets you are interested in. In this zset, the value will be the key of the set, and the score the cardinality of the set.
Retrieving the content of the zset by rank will give you the sets sorted by cardinality.
If you did not plan for this access path and still need the data, you will have no other choice than using a programming language. If you cannot install any Redis driver, then you could work from a Redis dump file (to be generated by the BGSAVE command), download this file to another box, and use the following package from Sripathi Krishnan to parse it and calculate the statistics you require.
https://github.com/sripathikrishnan/redis-rdb-tools

Caveat: The approach in this answer is not intended as a general solution -- remember that use of the keys command is discouraged in a production setting.
That said, here's a solution which will output the set name followed by it's length (cardinality), sorted by cardinality.
# Capture the names of the keys (sets)
KEYS=$(redis-cli keys 'myset:*')
# Paste each line from the key names with the output of `redis-cli scard key`
# and sort on the second key - the size - in reverse
paste <(echo "$KEYS") <(echo "$KEYS" | sed 's/^/scard /' | redis-cli) | sort -k2 -r -n
Note the use of the paste command above. I count on redis-cli to send me the results in order, which I'm pretty sure it will do. So paste will take one name from the $KEYS and one value from the redis output and output them on a single line.

Related

What Redis data type fit the most for following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1. 2. 3. because new numbers will be constantly added to REDIS structure.Those numbers represent unix timestamp in milliseconds so out of the box those numbers will always be sorted in time of addition
Conditionally means fetch those unix timestamp from REDIS that are less or equal to current unix timestamp in milliseconds(Date.now())
Question is what REDIS data type fit the most for this use case having in mind that this code will be scaled up to N instances, so N instances will share access to single REDIS instance. To equally share the load each instance will read for example first(oldest) 5 numbers from REDIS. Numbers are unique (adding same number should fail silently) so REDIS SET seems like a good choice but reading M first elements from REDIS set seems impossible.
To prevent two different instance of the code to read same numbers REDIS read operation should be atomic, it should read the numbers and delete them. If any async operation fail on specific number (steps 2. and 3.), numbers should be added again to REDIS to be handled again. They should be re-added back to the head not to the end to be handled again as soon as possible. As far as i know SADD would push it to the tail.
SMEMBERS key would read everything, it looks like a hammer to me. I would need to include some application logic to get first five than to check what is less or equal to Date.now() and then to delete those and to wrap somehow everything in single transaction. Besides that set cardinality can be huge.
SSCAN sounds interesting but i don't have any clue how it works in "scaled" environment like described above. Besides that, per REDIS docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. Like described above collection will be changed frequently

A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp and you can perform range searches (i.e. anything less or equal a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure the atomicity when reading and removing members, you can choose between the the following options: Redis transactions, Redis Lua script and in the next version (v4) a Redis module.
Transactions
Using transactions simply means doing the following code running on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

How to list keys without serial number in REDIS?

I am trying to list keys with specific pattern like below:
KEYS "*Team*"
and I am getting resultset with serial number like below:
1) "TeamMetricSummary_google_bps_app_google wfep
league_chambersc2016:04-03-2016_06-04-2016"
2) "\xac\xed\x00\x05t\x00TTeamMetricSummary_google_bps_app_google wfep
league_malini.gto:12-06-2015_04-02-2016"
My problem is that I want to avoid serial number in result set.
Is that possible?

That is not possible. Redis will return the whole key. You can use regex or string operations like split in your application logic to achieve this. For this you must know your input. For example if your key is in a pattern like xTeamNamey. where x and y are some constraints (serial number) you want to avoid, you can insert your key like x:TeamName:y. In retrieval you can use string.split(":")[1] to get the TeamName.

To answer your specific question:
Redis supports Lua scripting. If you're on a version of Redis that is bundled with Lua version 5.0 or above, you Lua script can use regular expressions. Write a Lua script that does redis.call of KEYS command and then does pattern matching to remove serial numbers for you.
Alternative:
By the way, assuming above is part of software that runs in production, here is what I would suggest: Don't use KEYS command on production! As it's documentation says, Redis has to go through all the keys to get keys matching your pattern. Alternatively, you may consider doing shadow writes to a Redis set that's trimmed of serial number every time you add a key. When you need list of keys, you can read the whole set. However, you'll have to manually handle the expiration of keys.

Alphabetical index with millions of rows in redis

For my application, I need an alphabetical index on a set with millions of rows.
When I use a sorted set, and give all members the same score, the result looks perfect.
Performance is also great, with a test set of 2 million rows, the last third does not perform noticably less than the first third of the set.
However, I need to query those results. For example, get the first (max) 100 items that start with "goo". I played around with zscan and sort, but it does not give me a working and performant result.
Since redis is very fast when inserting a new member to the sorted set, it must be technically possible to immediately (well, very quickly) go to the right memory location. I suppose redis uses some kind of quicksort mechanism to accomplish this.
But.. I don't seem to get the result when I just want to query the data, and not write to it.
We use replicated slaves for read actions, and we prefer the (default) read-only config switch. So creating a dummy key and deleting it afterward (however unelegant) is not really an option.
I'm stuck a bit, and I'm thinking about writing a ZLEX command in redis-server itself. Which I could use like this:
HELP "ZLEX" -> (ZLEX set score startswith)
-- Query the lexicographical index of a sorted set, supplying a 'startswith' string.
127.0.0.1:12345> ZLEX myset 0 goo LIMIT 0 100
1) goo
2) goof
3) goons
4) goozer
What are your thoughts? Am I missing something in the standard redis commands?
We're using Redis 2.8.4 x64 on Debian.
Kind regards, TW
Edits:
Note:
Related issue: indexing-using-redis-sorted-sets -> At least the name I gave to ZLEX seems to conform with Antirez' (Salvatore's) standards. As of 24-1-2014, I'm working on implementing ZLEX. It seems to be the easiest and most straight-forward solution for this use case, and Antirez could merge it into the main branch for everyone's benefit.

I've implemented ZLEX.
Here are the full specs.
You can grab the new functionality from here: github tw-bert
I also posted a pull request to Antirez here.
Kind regards, TW

Have you had a look at this ?
It can be useful depending on the length of the field by which you sort, this method requires b*(a^2) keys, where a is the length of the field , and b is amount of rows for this field.

REDIS - How can I query keys and get their values in one query?

Using keys I can query the keys as you can see below:
redis> set popo "pepe"
OK
redis> set coco "kansas"
OK
redis> set cool "rock"
OK
redis> set cool2 "punk"
OK
redis> keys *co*
1) "cool2"
2) "coco"
3) "cool"
redis> keys *ol*
1) "cool2"
2) "cool"
Is there any way to get the values instead of the keys? Something like: mget (keys *ol*)

NOTICE: As others have mentioned, along with myself in the comments on the original question, in production environments KEYS should be avoided. If you're just running queries on your own box and hacking something together, go for it. Otherwise, question if REDIS makes sense for your particular application, and if you really need to do this - if so, impose limits and avoid large blocking calls, such as KEYS. (For help with this, see 2015 Edit, below.)
My laptop isn't readily available right now to test this, but from what I can tell there isn't any native commands that would allow you to use a pattern in that way. If you want to do it all within redis, you might have to use EVAL to chain the commands:
eval "return redis.call('MGET', unpack(redis.call('KEYS', KEYS[1])))" 1 "*co*"
(Replacing the *co* at the end with whatever pattern you're searching for.)
http://redis.io/commands/eval
Note: This runs the string as a Lua script - I haven't dove much into it, so I don't know if it sanitizes the input in any way. Before you use it (especially if you intend to with any user input) test injecting further redis.call functions in and see if it evaluates those too. If it does, then be careful about it.
Edit: Actually, this should be safe because neither redis nor it's lua evaluation allows escaping the containing string: http://redis.io/topics/security
2015 Edit: Since my original post, REDIS has released 2.8, which includes the SCAN command, which is a better fit for this type of functionality. It will not work for this exact question, which requests a one-liner command, but it's better for all reasonable constraints / environments.
Details about SCAN can be read at http://redis.io/commands/scan .
To use this, essentially you iterate over your data set using something like scan ${cursor} MATCH ${query} COUNT ${maxPageSize} (e.g. scan 0 MATCH *co* COUNT 500). Here, cursor should always be initialized as 0.
This returns two things: first is a new cursor value that you can use to get the next set of elements, and second is a collection of elements matching your query. You just keep updating cursor, calling this query until cursor is 0 again (meaning you've iterated over everything), and push the found elements into a collection.
I know SCAN sounds like a lot more work, but I implore you, please use a solution like this instead of KEYS for anything important.

Get a range of keys with redis?

Something that comes up all the time with our data set at work is needing to query for a bunch of values given a range of keys. Date ranges are an obvious example.
I know you can use unix timestamps and a sorted set to query by date ranges, but it seems annoying, because I'd have to either
put the whole document as the value in the sorted set, or
just put ids in it, then ask redis for each key.
Maybe option 2 is standard? Is there a way to ask redis for multiple keys at once? Like mongodb's $in query? Or perhaps asking for a bunch of keys in a pipeline is just as fast?

Options 2, put Ids into sorted set then use mget to get values out, if your keys are hashes then you need to issue multiple hget, but the advantage is that you can pull out specific parts of the object that you actually need instead of everything. It is very fast in practice.

Maybe some bash magic helps?
echo 'keys YOURKEY*' | redis-cli | sed 's/^/get /' | redis-cli
This will output the data from all the keys which begin with YOURKEY

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas