REDIS - How can I query keys and get their values in one query?

REDIS - How can I query keys and get their values in one query? - redis

Using keys I can query the keys as you can see below:
redis> set popo "pepe"
OK
redis> set coco "kansas"
OK
redis> set cool "rock"
OK
redis> set cool2 "punk"
OK
redis> keys *co*
1) "cool2"
2) "coco"
3) "cool"
redis> keys *ol*
1) "cool2"
2) "cool"
Is there any way to get the values instead of the keys? Something like: mget (keys *ol*)

NOTICE: As others have mentioned, along with myself in the comments on the original question, in production environments KEYS should be avoided. If you're just running queries on your own box and hacking something together, go for it. Otherwise, question if REDIS makes sense for your particular application, and if you really need to do this - if so, impose limits and avoid large blocking calls, such as KEYS. (For help with this, see 2015 Edit, below.)
My laptop isn't readily available right now to test this, but from what I can tell there isn't any native commands that would allow you to use a pattern in that way. If you want to do it all within redis, you might have to use EVAL to chain the commands:
eval "return redis.call('MGET', unpack(redis.call('KEYS', KEYS[1])))" 1 "*co*"
(Replacing the *co* at the end with whatever pattern you're searching for.)
http://redis.io/commands/eval
Note: This runs the string as a Lua script - I haven't dove much into it, so I don't know if it sanitizes the input in any way. Before you use it (especially if you intend to with any user input) test injecting further redis.call functions in and see if it evaluates those too. If it does, then be careful about it.
Edit: Actually, this should be safe because neither redis nor it's lua evaluation allows escaping the containing string: http://redis.io/topics/security
2015 Edit: Since my original post, REDIS has released 2.8, which includes the SCAN command, which is a better fit for this type of functionality. It will not work for this exact question, which requests a one-liner command, but it's better for all reasonable constraints / environments.
Details about SCAN can be read at http://redis.io/commands/scan .
To use this, essentially you iterate over your data set using something like scan ${cursor} MATCH ${query} COUNT ${maxPageSize} (e.g. scan 0 MATCH *co* COUNT 500). Here, cursor should always be initialized as 0.
This returns two things: first is a new cursor value that you can use to get the next set of elements, and second is a collection of elements matching your query. You just keep updating cursor, calling this query until cursor is 0 again (meaning you've iterated over everything), and push the found elements into a collection.
I know SCAN sounds like a lot more work, but I implore you, please use a solution like this instead of KEYS for anything important.

Related

What Redis data type fit the most for following example

I have following scenario:
Fetch array of numbers (from REDIS) conditionally
For each number do some async stuff (fetch something from DB based on number)
For each thing in result set from DB do another async stuff
Periodically repeat 1. 2. 3. because new numbers will be constantly added to REDIS structure.Those numbers represent unix timestamp in milliseconds so out of the box those numbers will always be sorted in time of addition
Conditionally means fetch those unix timestamp from REDIS that are less or equal to current unix timestamp in milliseconds(Date.now())
Question is what REDIS data type fit the most for this use case having in mind that this code will be scaled up to N instances, so N instances will share access to single REDIS instance. To equally share the load each instance will read for example first(oldest) 5 numbers from REDIS. Numbers are unique (adding same number should fail silently) so REDIS SET seems like a good choice but reading M first elements from REDIS set seems impossible.
To prevent two different instance of the code to read same numbers REDIS read operation should be atomic, it should read the numbers and delete them. If any async operation fail on specific number (steps 2. and 3.), numbers should be added again to REDIS to be handled again. They should be re-added back to the head not to the end to be handled again as soon as possible. As far as i know SADD would push it to the tail.
SMEMBERS key would read everything, it looks like a hammer to me. I would need to include some application logic to get first five than to check what is less or equal to Date.now() and then to delete those and to wrap somehow everything in single transaction. Besides that set cardinality can be huge.
SSCAN sounds interesting but i don't have any clue how it works in "scaled" environment like described above. Besides that, per REDIS docs: The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. Like described above collection will be changed frequently

A more appropriate data structure would be the Sorted Set - members have a float score that is very suitable for storing a timestamp and you can perform range searches (i.e. anything less or equal a given value).
The relevant starting points are the ZADD, ZRANGEBYSCORE and ZREMRANGEBYSCORE commands.
To ensure the atomicity when reading and removing members, you can choose between the the following options: Redis transactions, Redis Lua script and in the next version (v4) a Redis module.
Transactions
Using transactions simply means doing the following code running on your instances:
MULTI
ZRANGEBYSCORE <keyname> -inf <now-timestamp>
ZREMRANGEBYSCORE <keyname> -inf <now-timestamp>
EXEC
Where <keyname> is your key's name and <now-timestamp> is the current time.
Lua script
A Lua script can be cached and runs embedded in the server, so in some cases it is a preferable approach. It is definitely the best approach for short snippets of atomic logic if you need flow control (remember that a MULTI transaction returns the values only after execution). Such a script would look as follows:
local r = redis.call('ZRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', ARGV[1])
return r
To run this, first cache it using SCRIPT LOAD and then call it with EVALSHA like so:
EVALSHA <script-sha> 1 <key-name> <now-timestamp>
Where <script-sha> is the sha1 of the script returned by SCRIPT LOAD.
Redis modules
In the near future, once v4 is GA you'll be able to write and use modules. Once this becomes a reality, you'll be able to use this module we've made that provides the ZPOP command and could be extended to cover this use case as well.

Redis - Sorted set, find item by property value

In redis I store objects in a sorted set.
In my solution, it's important to be able to run a ranged query by dates, so I store the items with the score being the timestamp of each items, for example:
# Score Value
0 1443476076 {"Id":"92","Ref":"7ADT","DTime":1443476076,"ATime":1443901554,"ExTime":0,"SPName":"7ADT33CFSAU6","StPName":"7ADT33CFSAU6"}
1 1443482969 {"Id":"11","Ref":"DAJT","DTime":1443482969,"ATime":1443901326,"ExTime":0,"SPName":"DAJTJTT4T02O","StPName":"DAJTJTT4T02O"}
However, in other situations I need to find a single item in the set based on it's ID.
I know I can't just query this data structure as if it were a nosql db, but I tried using ZSCAN, which didn't work.
ZSCAN MySet 0 MATCH Id:92 count 1
It returns; "empty list or set"
Maybe I need to serialize different?
I have serialized using Json.Net.
How, if possible, can I achieve this; using dates as score and still be able to lookup an item by it's ID?
Many thanks,
Lars
Edit:
Assume it's not possible, but any thoughts or inputs are welcome:
Ref: http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/
In Redis, data can only be queried by its key. Even if we use a hash,
we can't say get me the keys wherever the field race is equal to
sayan.
Edit 2:
I tried to do:
ZSCAN MySet 0 MATCH *87*
127.0.0.1:6379> ZSCAN MySet 0 MATCH *87*
1) "192"
2) 1) "{\"Id\":\"64\",\"Ref\":\"XQH4\",\"DTime\":1443837798,\"ATime\":1444187707,\"ExTime\":0,\"SPName\":\"XQH4BPGW47FM\",\"StPName\":\"XQH4BPGW47FM\"}"
2) "1443837798"
3) "{\"Id\":\"87\",\"Ref\":\"5CY6\",\"DTime\":1443519199,\"ATime\":1444172326,\"ExTime\":0,\"SPName\":\"5CY6DHP23RXB\",\"StPName\":\"5CY6DHP23RXB\"}"
4) "1443519199"
And it finds the desired item, but it also finds another one with an occurance of 87 in the property ATime. Having more unique, longer IDs might work this way and I would have to filter the results in code to find the one with the exact value in its property.
Still open for suggestions.

I think it's very simple.
Solution 1(Inferior, not recommended)
Your way of ZSCAN MySet 0 MATCH Id:92 count 1 didn't work out because the stored string is "{\"Id\":\"92\"... not "{\"Id:92\".... The string has been changed into another format. So try to use MATCH Id\":\"64 or something like that to match the json serialized data in redis. I'm not familiar with json.net, so the actual string leaves for you to discover.
By the way, I have to ask you did ZSCAN MySet 0 MATCH Id:92 count 1 return a cursor? I suspect you used ZSCAN in a wrong way.
Solution 2(Better, strongly recommended)
ZSCAN is good when your sorted set is not large and you know how to save network roundtrip time by Redis' Lua transaction. This still make "look up by ID" operation O(n). Therefore, a better solution is to change you data model in the following way:
change sorted set
from
# Score Value
0 1443476076 {"Id":"92","Ref":"7ADT","DTime":1443476076,"ATime":1443901554,"ExTime":0,"SPName":"7ADT33CFSAU6","StPName":"7ADT33CFSAU6"}
1 1443482969 {"Id":"11","Ref":"DAJT","DTime":1443482969,"ATime":1443901326,"ExTime":0,"SPName":"DAJTJTT4T02O","StPName":"DAJTJTT4T02O"}
to
# Score Value
0 1443476076 Id:92
1 1443482969 Id:11
Move the rest detailed data in another set of hashes type keys:
# Key field-value field-value ...
0 Id:92 Ref-7ADT DTime-1443476076 ...
1 Id:11 Ref-7ADT DTime-1443476076 ...
Then, you locate by id by doing hgetall id:92. As to ranged query by date, you need do ZRANGEBYSCORE sortedset mindate maxdate then hgetall every id one by one. You'd better use lua to wrap these commands in one and it will still be super fast!
Data in NoSql database need to be organized in a redundant way like above. This may make some usual operation involve more than one commands and roundtrip, but it can be tackled by redis's lua feature. I strongly recommend the lua feature of redis, cause it wrap commands into one network roundtrip, which are all executed on the redis-server side and is atomic and super fast!
Reply if there's anything you don't know

How to get all hashes in foo:* using a single id counter instead of a set/array

Introduction
My domain has articles, which have a title and text. Each article has revisions (like the SVN concept), so every time it is changed/edited, those changes will be stored as a revision. A revision is composed of changes and the description of those changes
I want to be able to obtain all revisions descriptions at once.
What's the problem?
I'm certain that I would store the revision as a hash in articles:revisions:<id> storing the changes, and the description in it.
What I'm not certain of is how do I get all of the descriptions at once.
I have many options to do this, but none of them convinces me.
Store the revision ids for an article as a set, and use SORT articles:revisions:idSet BY NOSORT GET articles:revisions:*->description. This means that I would store a set for each article. If every article had 50 revisions, and we had 10.000 articles, we would have 500.000 ids stored.
Is this the best way? Isn't this eating up too much RAM?
I have other ideas in mind, but I don't consider them good either.
Iterate from 0 to the last revision's id, doing a HGET for each id using MULTI
Create the idSet for a specific article if it doesn't exist and is request, expire after some time.
Isn't there a way for redis to do a SORT array BY NOSORT GET, with array being an adhoc array in the form of [0, MAX]?

Seems like you have a good solution.
As long as you keep those id numbers less than 10,000 and your sets with less than 512 elements(set-max-intset-entries), your memory consumption will be much lower than you think.
Here's a good explanation of it.

This can be solved in an optimized way using a TRIE or DAWG better than what Redis provides. I don't know your application or other info on your search problem (e.g. construction time, unsuccessful searches, update performance).
If you search much more often than you need to update / insert into your lookup storage, I'd suggest you have a look at DAWGDIC [1] as a library, and construct "search paths" (similar as you already described) using a string format that can be search-completed later:
articleID:revisionID:"changeDescription":"change"
Example (I assume you have one description per revision, and n changes. This isn't clear to me from your question):
1:2:"Some changes":"Added two sentences here, removed one sentence there"
1:2:"Some changes":"Fixed article title"
2:4:"Advertisement changes":"Added this, removed that"
Note: Even though you construct these strings with duplicate prefixes, the DAWG will store them in a very space efficient way (simply put, it will append the right side of the string to the data structure and create a shortcut for the common prefix, see also [2] for a comparison of TRIE data structures).
To list changes of article 1, revision 2, set the common prefix for your lookup:
completer.Start(index, "1:2");
Now you can simple call completer.Next() to lookup a next record that shares the same prefix, and completer.value() to get the record's value. In our example we'll get:
1:2:"Some changes":"Added two sentences here, removed one sentence there"
1:2:"Some changes":"Fixed article title"
Of course you need to parse the strings yourself into your data object.
Maybe that's not what you're looking for and overkill. But it can be a very space and search performance efficient way, if it meets your requirements.
[1] https://code.google.com/p/dawgdic/
[2] http://kmike.ru/python-data-structures/

Alphabetical index with millions of rows in redis

For my application, I need an alphabetical index on a set with millions of rows.
When I use a sorted set, and give all members the same score, the result looks perfect.
Performance is also great, with a test set of 2 million rows, the last third does not perform noticably less than the first third of the set.
However, I need to query those results. For example, get the first (max) 100 items that start with "goo". I played around with zscan and sort, but it does not give me a working and performant result.
Since redis is very fast when inserting a new member to the sorted set, it must be technically possible to immediately (well, very quickly) go to the right memory location. I suppose redis uses some kind of quicksort mechanism to accomplish this.
But.. I don't seem to get the result when I just want to query the data, and not write to it.
We use replicated slaves for read actions, and we prefer the (default) read-only config switch. So creating a dummy key and deleting it afterward (however unelegant) is not really an option.
I'm stuck a bit, and I'm thinking about writing a ZLEX command in redis-server itself. Which I could use like this:
HELP "ZLEX" -> (ZLEX set score startswith)
-- Query the lexicographical index of a sorted set, supplying a 'startswith' string.
127.0.0.1:12345> ZLEX myset 0 goo LIMIT 0 100
1) goo
2) goof
3) goons
4) goozer
What are your thoughts? Am I missing something in the standard redis commands?
We're using Redis 2.8.4 x64 on Debian.
Kind regards, TW
Edits:
Note:
Related issue: indexing-using-redis-sorted-sets -> At least the name I gave to ZLEX seems to conform with Antirez' (Salvatore's) standards. As of 24-1-2014, I'm working on implementing ZLEX. It seems to be the easiest and most straight-forward solution for this use case, and Antirez could merge it into the main branch for everyone's benefit.

I've implemented ZLEX.
Here are the full specs.
You can grab the new functionality from here: github tw-bert
I also posted a pull request to Antirez here.
Kind regards, TW

Have you had a look at this ?
It can be useful depending on the length of the field by which you sort, this method requires b*(a^2) keys, where a is the length of the field , and b is amount of rows for this field.

Get multiple sets

I've currently got a dataset which is something like:
channel1 = user1,user2,user3
channel2 = user4,user5,user6
(note- these are not actual names, the text is not a predictable sequence)
I would like to have the most optimized capability for the following:
1) Add user to a channel
2) Remove user from a channel
3) Get list of all users in several selected channels, maintaining knowledge of which channel they are in (in case it matters- this can also be simply checking whether a channel has any users or not without getting an actual list of them)
4) Detect if a specific user is in a channel (willing to forego this feature if necessary)
I'm a bit hungup on the fact that there are only two ways I can see of getting multiple keys at once:
A) Using regular keys and a mget key1, key2, key3
In this solution, each value would be a JSON string which can then be manipulated and queried clientside to add/remove/determine values. This itself has a couple problems- firstly that it's possible another client will change the data while it's being processed (i.e. this solution is not atomic) and it's not easy to detect right away if a channel contains a specific user even though it is easy to detect if a channel has any users (this is low priority, as stated above)
B) Using sets and sunion
I would really like to use sets for this solution somehow, the above solution just seems wrong... but I cannot see how to query multiple sets at once and maintain info about which set each member is from or if any of the sets in the union have 0 members (sunion only gives me a final set of all the combined members)
Any solutions which can implement the above points 1-4 in optimal time and atomic operations?
EDIT: One idea which might work in my specific case is to store the channel name as part of the username and then use sets. Still, it would be great to get a more generic answer

Short answer: use sets + pipelining + MULTI/EXEC, or sets + Lua.
1) Add user to a channel
SADD command
2) Remove user from a channel
SREM command
3) Get list of all users in several selected channels
There are several ways to do it.
If you don't need strict atomicity, you just have to pipeline several SMEMBERS commands to retrieve all the sets in one roundtrip. If you are just interested whether channels have users or not, you can replace SMEMBERS by SCARD.
If you need strict atomicity, you can pipeline a MULTI/EXEC block containing SMEMBERS or SCARD commands. The output of the EXEC command will contain all the results. This is the solution I would recommend.
An alternative (atomic) way is to call a server-side Lua script using the EVAL command. Lua script executions are always atomic. The script could take a number of channel as input parameters, and build a multi-layer bulk reply to return the output.
4) Detect if a specific user is in a channel
SISMEMBER command - pipeline them if you need to check for several users.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas