Searching in values of a redis db - redis

I am new to Redis. After reading some of the documentation, looking at examples on the Internet, and scanning stackoverflow.com, I can see that Redis is very fast and scales well, but the price is that we have to work out at design time how our data will be accessed and what operations it will undergo. This I can understand, but I am a little confused about searching in the data, which was so easy, however slow, with plain old SQL. I could do it with the KEYS command, but that is an O(N) operation and not O(log(N)), so I would lose one of the advantages of Redis.
What do more experienced colleagues say here?
Let's take an example use case: we need to store personal data for approx. 100,000 people, and the data needs to be searchable by name and phone number.
For this I would use the following structures:
1. SET for storing all persons' ids {id1, id2, ...}
2. HASH for each person to store personal data and name it
like map:<id> e.g. map:id1{name:<name>, phone:<number>, etc...}
Solution 1:
1. HASH for storing all persons' ids but the key should be the phone number
2. Then with the command KEYS 123* all ids could be retrieved that have a phone number
starting with 123. On the basis of the ids, the other personal data could also be retrieved.
3. And so forth: for each attribute to be searched, a separate HASH should be created.
But a major drawback of this solution is that the attribute values must also be unique, so that the assignment of phone numbers to ids in the HASH is unambiguous. On the other hand, O(N) runtime is not ideal.
Moreover, this uses more space than necessary, and the KEYS command degrades access performance. (http://redis.io/commands/keys)
How should it be done the right way? I could also imagine that the ids would go in a ZSET and the data to be searched could be the scores, but this only makes it possible to work with ranges, not with searches.
Thank you also in advance, regards, Tamas
Answer summary:
Both responses state that Redis was not designed to search in the values of keys. If this use case is necessary, then workarounds need to be implemented, either as shown in my original solution or as in the solution below.
The solution below by Eli performs much better than my original one, because access to the keys can be considered constant time, O(1), and only the list of ids needs to be iterated through. This data model also allows one person to have the same phone number as someone else, and likewise for names etc., so a 1-n relationship is also possible (in old ERD terminology).
The drawback of this solution is that it consumes much more space than mine, and phone numbers for which only the starting digits are known cannot be searched.
Thanks for both responses.

Redis is for use cases where you need to access and update data at very high frequency and where you benefit from the use of data structures (hashes, sets, lists, strings, or sorted sets). It's made to fill very specific use cases. If you have a general use case like very flexible searching, you'd be much better served by something built for this purpose, like Elasticsearch or Solr.
That said, if you must do this in Redis, here's how I'd do it (assuming users can share names and phone numbers):
name:some_name -> set([id1, id2, etc...])
name:some_other_name -> set([id3, id4, etc...])
phone:some_phone -> set([id1, id3, etc...])
phone:some_other_phone -> set([id2, id4, etc...])
id1 -> {'name' : 'bob', 'phone' : '123-456-7891', etc...}
id2 -> {'name' : 'alice', 'phone' : '987-456-7891', etc...}
In this case, we're making a new key for every name (prefixed with "name:") and every phone number (prefixed with "phone:"). Each key points to a set of ids, and each id holds all the info you want for a user. When you search by phone, for example, you'll do:
SMEMBERS 'phone:123-456-7891'
and then loop through the results and return whatever info you need on each (name in our example) in your language of choice (you can do this whole thing in server-side Lua on the Redis box to go even faster and avoid network back-and-forth, if you want):
for id in results:
    HGET id 'name'
Your cost here will be O(m), where m is the number of users with the given phone number, and this will be a very fast operation on Redis because of how optimized it is for speed. It'll be overkill in your case because you probably don't need things to go so fast, and you'd prefer having flexible search, but this is how you would do it.
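A minimal sketch of this index layout, modelled with plain Python containers so that it runs without a Redis server. The helper names (add_person, find_by_phone) are illustrative assumptions and not part of any Redis client; with a real client, the set operations would correspond to SADD and SMEMBERS and the per-user lookups to HSET and HGET.

```python
from collections import defaultdict

# In-memory stand-ins for Redis (assumption: no server involved).
index = defaultdict(set)   # "name:<name>" / "phone:<phone>" -> set of ids
people = {}                # id -> {"name": ..., "phone": ...}


def add_person(person_id, name, phone):
    """Store the user record and register its id under both index keys."""
    people[person_id] = {"name": name, "phone": phone}
    index[f"name:{name}"].add(person_id)      # like SADD name:<name> <id>
    index[f"phone:{phone}"].add(person_id)    # like SADD phone:<phone> <id>


def find_by_phone(phone):
    """Return the names of all users sharing this phone number, sorted."""
    ids = index.get(f"phone:{phone}", set())  # like SMEMBERS phone:<phone>
    return sorted(people[i]["name"] for i in ids)  # like HGET <id> name
```

The lookup touches only the ids stored under one index key, which is where the O(m) cost comes from. Only exact matches work; a phone number known only by its leading digits has no key to look up.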

Redis is awesome, but it's not built for searching on anything other than keys. You simply can't query on values without building extra data sets to facilitate such querying, and even then you don't get true search, just more maintenance, inefficient use of memory, yada, yada...
This question has already been addressed, you've got some reading to do :-D
To search strings, build auto-complete in redis and other cool things...
How do I search strings in redis?
Why using MongoDB over redis is smart when searching inside documents...
What's the most efficient document-oriented database engine to store thousands of medium sized documents?

Original Secondary Indices in Redis
The accepted answer here is correct in that the traditional way of handling searching in Redis has been through secondary indices built around Sets and Sorted Sets.
e.g.
HSET Person:1 firstName Bob lastName Marley age 32 phoneNum 8675309
You would maintain secondary indices, so you would have to call
SADD Person:firstName:Bob Person:1
SADD Person:lastName:Marley Person:1
SADD Person:phoneNum:8675309 Person:1
ZADD Person:age 32 Person:1
This allows you to now perform search-like operations
e.g.
SELECT p.age
FROM People AS p
WHERE p.firstName = 'Bob' and p.lastName = 'Marley' and p.phoneNum = '8675309'
Becomes:
ids = SINTER Person:firstName:Bob Person:lastName:Marley Person:phoneNum:8675309
foreach id in ids:
age = HGET id age
print(age)
The key challenge with this methodology is that, in addition to being relatively complicated to set up (it really forces you to think about your model), it becomes extremely difficult to maintain atomically, particularly in sharded environments (where cross-shard key constraints can become problematic). Consequently, the keys and the index can drift apart, forcing you to periodically loop through and rebuild the index.
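To make the maintenance burden concrete, here is a small in-memory sketch of what every write has to do to keep the indices in step with the hash. The function names and the INDEXED_FIELDS tuple are hypothetical; in real Redis the removal and the additions would also need to run atomically (for example in MULTI/EXEC or a script), which this model does not show.

```python
from collections import defaultdict

hashes = {}                # stands in for hashes: key -> {field: value}
index = defaultdict(set)   # stands in for index sets: index key -> hash keys

INDEXED_FIELDS = ("firstName", "lastName", "phoneNum")  # assumed for the sketch


def save_person(key, fields):
    """Write the hash and update every secondary index entry that changed."""
    old = hashes.get(key, {})
    for field in INDEXED_FIELDS:
        if field in old and old[field] != fields.get(field):
            # Skipping this removal is exactly how index and data drift apart.
            index[f"Person:{field}:{old[field]}"].discard(key)   # like SREM
        if field in fields:
            index[f"Person:{field}:{fields[field]}"].add(key)    # like SADD
    hashes[key] = dict(fields)                                   # like HSET


def find(**criteria):
    """Intersect the index sets of all criteria, like SINTER."""
    sets = [index.get(f"Person:{f}:{v}", set()) for f, v in criteria.items()]
    return set.intersection(*sets) if sets else set()
```

Calling save_person twice for the same key with a different phoneNum leaves the key only in the index set of the new number; without the discard step it would stay in both.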
Newer Secondary Indices with RediSearch
Caveat: This uses RediSearch, a Redis module available under the Redis Source Available License.
There's a newer module that plugs into Redis, called RediSearch, that can do all this for you. It lets you declare secondary indices and then takes care of indexing everything for you as you insert it. For the above example, you would just need to run
FT.CREATE person-idx ON HASH PREFIX 1 Person: SCHEMA firstName TAG lastName TAG phoneNumber TEXT age NUMERIC SORTABLE
That would declare the index, and after that all you need to do is insert stuff into Redis, e.g.
HSET Person:1 firstName Bob lastName Marley phoneNumber 8675309 age 32
Then you could run:
FT.SEARCH person-idx "@firstName:{Bob} @lastName:{Marley} @phoneNumber:8675309 @age:[-inf 33]"
This returns all the items matching the pattern; see the query syntax documentation for more details.

zeeSQL is a novel Redis module with SQL and secondary index capabilities, allowing search by the value of Redis keys.
You can set it up in such a way to track the values of all the hashes and put them into a standard SQL table.
For your example of searching people by phone number and name, you could do something like.
> ZEESQL.CREATE_DB DB
"OK"
> ZEESQL.INDEX DB NEW PREFIX customer:* TABLE customer SCHEMA id INT name STRING phone STRING
At this point zeeSQL will track all the hashes that start with customer and will put them into a SQL table. It will store the field id as an integer, name as a string, and phone as a string.
You can populate the table simply adding hashes to Redis, and zeeSQL will keep everything in sync.
> HMSET customer:1 id 1 name joseph phone 123-345-2345
> HMSET customer:2 id 2 name lukas phone 234-987-4453
> HMSET customer:3 id 3 name mary phone 678-443-2341
At this point you can look into the customer table and you will find the result you are looking for.
> ZEESQL.EXEC DB COMMAND "select * from customer"
1) 1) RESULT
2) 1) id
2) 2) name
2) 3) phone
3) 1) INT
3) 2) STRING
3) 3) STRING
4) 1) 1
4) 2) joseph
4) 3) 123-345-2345
5) 1) 2
5) 2) lukas
5) 3) 234-987-4453
6) 1) 3
6) 2) mary
6) 3) 678-443-2341
The results specify first the names of the columns, then the types of the columns, and finally the actual result set.
zeeSQL is based on SQLite and it supports all the SQLite syntax for filtering and aggregation.
For instance, you could search for people knowing only the prefix of their phone number.
> ZEESQL.EXEC DB COMMAND "select name from customer where phone like '678%'"
1) 1) RESULT
2) 1) name
3) 1) STRING
4) 1) mary
You can find more examples in the tutorial: https://doc.zeesql.com/tutorial#using-secondary-indexes-or-search-by-values-in-redis

Related

Cascade deletes in Redis

On my current project I'm implementing an autocompletion service on top of Redis, using the following approach (this article describes it in more detail):
1) To store the data itself I have a hash in which I put the searchable objects as values, for instance
HSET data 1 "{\"name\":\"Kill Bill\",\"year\":2003}"
HSET data 2 "{\"name\":\"King Kong\",\"year\":2005}"
2) To store all possible sequences of input characters that could be used in a search (which I generate in advance) I use sorted sets, like
ZADD search:index:k 0 1
ZADD search:index:ki 0 1
ZADD search:index:kil 0 1
ZADD search:index:kill 0 1
The value stored in the sorted set (in my example '1') is the key of the data in the hash. So, to search for some data (for example where the name starts with 'ki') we need two steps:
data_keys = REDIS.zrevrange('search:index:ki', 0, -1)
matching_data = REDIS.hmget('data', *data_keys)
The issue I am trying to solve: how do I automatically remove all entries related to a hash value from the sorted sets when I remove that value? In relational databases I can use cascade deletion for such cases, but how can I handle it in Redis?
Your design appears awkward to me. I'm unsure what you're actually trying to do with Redis, and perhaps that could be the topic of another question.
That said, to address your question: Redis does not offer a "cascading delete"-like behavior. Instead, if you're deleting hash field "1", iterate over the prefixes and ZREM it from the relevant sorted sets.
Note: do not use a Lua script for this task, as it would generate key names (i.e. the sorted sets by prefix), and that is against the recommendations (it will not work on a cluster).
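The manual cascade can be sketched as follows, using in-memory Python containers in place of Redis. The helper names are hypothetical, and the sketch assumes the index holds every leading substring of the lower-cased name, which may differ from how the prefixes were actually generated in the question.

```python
from collections import defaultdict

data = {}                         # stands in for the "data" hash: id -> record
search_index = defaultdict(dict)  # stands in for sorted sets: key -> {member: score}


def prefixes(name):
    """All leading substrings of the lower-cased name, e.g. 'Ki' -> ['k', 'ki']."""
    word = name.lower()
    return [word[:i] for i in range(1, len(word) + 1)]


def add_item(item_id, record):
    data[item_id] = record                                   # like HSET data <id> <json>
    for p in prefixes(record["name"]):
        search_index[f"search:index:{p}"][item_id] = 0       # like ZADD <key> 0 <id>


def delete_item(item_id):
    """Remove the record and then walk its prefixes, dropping the id from each set."""
    record = data.pop(item_id, None)                         # like HDEL data <id>
    if record is None:
        return
    for p in prefixes(record["name"]):
        search_index[f"search:index:{p}"].pop(item_id, None)  # like ZREM <key> <id>
```

The record has to be read before it is deleted, because the name is the only source from which the affected sorted-set keys can be regenerated.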

Getting top results from Redis hash

I am trying to write a query in Redis to get the first 2 field values of my hash key.
Basically, when I do HVALS hashname, I want to get the values of the first 2 fields added (the oldest 2). This is somewhat like getting the TOP 2 tuples in a SQL database.
Is this possible in redis?
No, this isn't possible. The order of fields and values in a Redis Hash is for all intents and purposes random (despite the empirical evidence obtained from experimenting on smallish Hashes). For ordering elements, refer to Redis' Sorted Sets.
Update: to answer the question in the comment, IIUC it looks like you can solve it easily with just Strings. Because of Redis' nature, at any given moment there is either one user waiting for a specific match, or zero. You can SET matchmaking:blue username1:token if the key doesn't exist (i.e. zero users waiting for the match) and GET and DEL it if it exists. Be sure to use SET's "NX" option, MULTI/EXEC and/or Lua to ensure the atomicity of these two logical operations.
From what I have experimented with, HVALS returns values in the order you're looking for, i.e. oldest field first. Now it's up to you to pick only the first two values in the client program, e.g. after HSET myhmap name "abhi", HSET myhmap email "test@test", HSET myhmap planet "earth", HSET myhmap galaxy "andromeda", HVALS myhmap will return "abhi", "test@test", "earth", "andromeda".

What is the best way to retrieve soccer games by league names in redis?

I have hundreds of soccer games saved in my Redis database. They are saved in hashes under the key games:soccer:data. I have three sorted sets to classify them into upcoming, live, and ended, all ordered by date (the score). This way I can easily retrieve them depending on whether they will start soon, are already happening, or have already ended. Now, I want to be able to retrieve them by league name.
I came up with two alternatives:
First alternative: save single hashes containing the game id and the league name. This way I can get all live game ids and then check each id against its respective hash; if it matches the league name(s) I want, I push it into an array, and if not, I skip it. Finally, I return the array with all game ids for the leagues I wanted.
Second alternative: create keys for each league and have live, upcoming, and ended sets for each. This way, I think, it would be faster to retrieve the game ids; however, it would be a pain to maintain each set.
If you have any other way of doing this, please let me know. I don't know if sorting would be faster and save me some memory.
I am looking for speed and low memory usage.
EDIT (following hobbs' alternative):
const multi = client.multi();
const tempSet = 'users:data:14:sports:soccer:lists:temp_' + getTimestamp();
return multi
  .sunionstore(
    tempSet,
    [
      'sports:soccer:lists:leagueNames:Bundesliga',
      'sports:soccer:lists:leagueNames:La Liga'
    ]
  )
  .zinterstore(
    'users:data:14:sports:soccer:lists:live',
    2,
    'sports:lists:live',
    tempSet
  )
  .del(tempSet)
  .execAsync()
I need to set AGGREGATE MAX to my query and I have no idea how.
One way would be to use a SET containing all of the games for each league, and use ZINTERSTORE to compute the intersection between your league sets and your existing sets. You could do the ZINTERSTORE every time you query the data (it's not a horribly expensive operation unless your data is very large), or you could do it only when writing to one of the "parent" sets, or you could treat it as a sort of cache by giving it a short TTL and creating it only if it doesn't exist when you go to query it.
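The following Python function is a rough model, not a Redis client call, of what ZINTERSTORE computes when one input is a sorted set of games scored by date and the other is a plain league set. It relies on the documented behavior that members of a plain set are treated as having score 1; the function name and the sample data in the usage note are made up for illustration.

```python
def zinterstore(live, league_members, aggregate="sum"):
    """Model of ZINTERSTORE over one sorted set and one plain set.

    live: {game_id: timestamp score}
    league_members: set of game ids (each treated as having score 1)
    aggregate: "sum" (the Redis default), "min" or "max"
    """
    combine = {"sum": lambda a, b: a + b, "min": min, "max": max}[aggregate]
    # Only members present in both inputs survive the intersection.
    return {game: combine(score, 1)
            for game, score in live.items()
            if game in league_members}
```

With live = {"g1": 1700000000, "g2": 1700000500} and a league set {"g1"}, the default SUM yields a score of 1700000001 for g1, while MAX keeps the original 1700000000. That is why AGGREGATE MAX is the option to use when the date ordering has to be preserved.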

How to combine multi-field values and sorted time ranges using Redis

I am trying to insert time-based records whose values have multiple fields (with TTL enabled).
For the multiple fields, the best way to do it in Redis is to use HSET:
HSET user:32 name "johns" timecreated "3333311232" address "somewhere"
I am also trying to read those values by time range,
for example to return all history records (for example for user 32) that were inserted in the last day.
The best way to do that would be storing them via ZADD using scores (this time I lose the hash-map structure for easy retrieval):
ZADD user:32 3333311232 "name=johns,timecreated=3333311232,address=somewhere"
On top of that, I want to add a TTL for each record.
Any idea how I could optimize my design?
I could split it into two structures, but that would require two queries when reading:
ZADD user:32 3333311232 "user:32:3333311232"
HMSET user:32:3333311232 name "johns" timecreated "3333311232" address "somewhere"
Then to retrieve I'll need:
//some range
ZRANGEBYSCORE user:32 3333311232 333331123
result: 1389772850
now to get all information: HGETALL user:32:1389772850
What do you think?
Thank you,
ray.
The two methods you describe are the two common approaches. If you store the entire object in the ZSET, you would typically store it as a JSON string. If you don't need "random" access to the object, that's a valid approach.
I usually go for the other approach: a ZSET combined with hashes. The two queries are not a big deal. You could even abstract them away with a Lua script; see EVAL.
Regarding the TTL, while you cannot expire individual ZSET values, you could expire the hash, and use keyspace notifications to listen for the expired event, and remove the corresponding value from the ZSET.
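A minimal sketch of that cleanup flow, again modelled in memory rather than against a live server. It assumes an "expired" notification delivers the name of the expired hash key and that record keys follow the user:<id>:<timestamp> pattern from the question; the function names are hypothetical.

```python
index = {}    # stands in for sorted sets: zset key -> {member: score}
hashes = {}   # stands in for hashes: key -> {field: value}


def add_record(user_key, timestamp, fields):
    """Store one record as a hash and reference it from the user's sorted set."""
    record_key = f"{user_key}:{timestamp}"
    hashes[record_key] = dict(fields)                        # like HSET (+ EXPIRE)
    index.setdefault(user_key, {})[record_key] = timestamp   # like ZADD
    return record_key


def on_expired(record_key):
    """Handler for an expired-key notification: drop the dangling index entry."""
    hashes.pop(record_key, None)  # in Redis the hash is already gone at this point
    user_key, _, _timestamp = record_key.rpartition(":")
    index.get(user_key, {}).pop(record_key, None)            # like ZREM


def records_between(user_key, lo, hi):
    """Range query by score, then one hash read per member (the two-step read)."""
    members = index.get(user_key, {})
    keys = sorted((k for k, s in members.items() if lo <= s <= hi), key=members.get)
    return [hashes[k] for k in keys if k in hashes]
```

The final filter in records_between skips index entries whose hash has vanished, so a read stays correct even if a notification is missed and the cleanup never runs.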
Let me know if you need some more specifics.

Redis - Sorted set, find item by property value

In redis I store objects in a sorted set.
In my solution, it's important to be able to run a range query by date, so I store the items with the score being the timestamp of each item, for example:
# Score Value
0 1443476076 {"Id":"92","Ref":"7ADT","DTime":1443476076,"ATime":1443901554,"ExTime":0,"SPName":"7ADT33CFSAU6","StPName":"7ADT33CFSAU6"}
1 1443482969 {"Id":"11","Ref":"DAJT","DTime":1443482969,"ATime":1443901326,"ExTime":0,"SPName":"DAJTJTT4T02O","StPName":"DAJTJTT4T02O"}
However, in other situations I need to find a single item in the set based on its ID.
I know I can't just query this data structure as if it were a NoSQL db, but I tried using ZSCAN, which didn't work.
ZSCAN MySet 0 MATCH Id:92 count 1
It returns "empty list or set".
Maybe I need to serialize differently?
I have serialized using Json.Net.
How, if possible, can I achieve this: using dates as the score and still being able to look up an item by its ID?
Many thanks,
Lars
Edit:
Assume it's not possible, but any thoughts or inputs are welcome:
Ref: http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/
In Redis, data can only be queried by its key. Even if we use a hash,
we can't say get me the keys wherever the field race is equal to
sayan.
Edit 2:
I tried to do:
ZSCAN MySet 0 MATCH *87*
127.0.0.1:6379> ZSCAN MySet 0 MATCH *87*
1) "192"
2) 1) "{\"Id\":\"64\",\"Ref\":\"XQH4\",\"DTime\":1443837798,\"ATime\":1444187707,\"ExTime\":0,\"SPName\":\"XQH4BPGW47FM\",\"StPName\":\"XQH4BPGW47FM\"}"
2) "1443837798"
3) "{\"Id\":\"87\",\"Ref\":\"5CY6\",\"DTime\":1443519199,\"ATime\":1444172326,\"ExTime\":0,\"SPName\":\"5CY6DHP23RXB\",\"StPName\":\"5CY6DHP23RXB\"}"
4) "1443519199"
And it finds the desired item, but it also finds another one with an occurrence of 87 in the property ATime. Having more unique, longer IDs might work this way, and I would have to filter the results in code to find the one with the exact value in its property.
Still open for suggestions.
I think it's very simple.
Solution 1(Inferior, not recommended)
Your ZSCAN MySet 0 MATCH Id:92 count 1 didn't work because the stored string contains "Id":"92" (shown by redis-cli as \"Id\":\"92\"), not Id:92, and because MATCH takes a glob pattern that must cover the whole member. So try a pattern such as *"Id":"92"* to match the JSON-serialized data in Redis. I'm not familiar with Json.Net, so the exact string is left for you to discover.
By the way, I have to ask: did ZSCAN MySet 0 MATCH Id:92 count 1 return a cursor? I suspect you used ZSCAN the wrong way.
Solution 2(Better, strongly recommended)
ZSCAN is fine when your sorted set is not large and you know how to save network round-trip time with Redis' Lua scripting. It still makes the "look up by ID" operation O(n). Therefore, a better solution is to change your data model in the following way:
change sorted set
from
# Score Value
0 1443476076 {"Id":"92","Ref":"7ADT","DTime":1443476076,"ATime":1443901554,"ExTime":0,"SPName":"7ADT33CFSAU6","StPName":"7ADT33CFSAU6"}
1 1443482969 {"Id":"11","Ref":"DAJT","DTime":1443482969,"ATime":1443901326,"ExTime":0,"SPName":"DAJTJTT4T02O","StPName":"DAJTJTT4T02O"}
to
# Score Value
0 1443476076 Id:92
1 1443482969 Id:11
Move the rest of the detailed data into a separate set of hash-type keys:
# Key field-value field-value ...
0 Id:92 Ref-7ADT DTime-1443476076 ...
1 Id:11 Ref-DAJT DTime-1443482969 ...
Then you locate an item by id with HGETALL Id:92. For a range query by date, you do ZRANGEBYSCORE sortedset mindate maxdate and then HGETALL every id one by one. You'd better use Lua to wrap these commands into one, and it will still be super fast!
Data in a NoSQL database needs to be organized in a redundant way like the above. This may make some common operations involve more than one command and round trip, but that can be tackled with Redis' Lua feature. I strongly recommend it, because it wraps the commands into one network round trip, and they are all executed atomically on the Redis server side, which is super fast!
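As a rough illustration of the split model, the sketch below keeps a sorted list of (score, member) pairs in place of the sorted set and one dict per item in place of the hashes. It runs in plain Python without Redis; the function names are assumptions made for the example.

```python
import bisect

by_time = []   # sorted list of (score, member), standing in for the sorted set
details = {}   # member -> field dict, standing in for one hash per item


def add_item(item_id, dtime, **fields):
    member = f"Id:{item_id}"
    bisect.insort(by_time, (dtime, member))        # like ZADD sortedset <dtime> Id:<id>
    details[member] = {"DTime": dtime, **fields}   # like HSET Id:<id> ...


def get_by_id(item_id):
    """Direct lookup by ID, no scan needed (like HGETALL Id:<id>)."""
    return details.get(f"Id:{item_id}")


def get_by_date(min_date, max_date):
    """Inclusive range by score (like ZRANGEBYSCORE), then one hash read per id."""
    lo = bisect.bisect_left(by_time, (min_date, ""))
    hi = bisect.bisect_right(by_time, (max_date, chr(0x10FFFF)))
    return [details[member] for _, member in by_time[lo:hi]]
```

Lookup by ID becomes a single key access instead of a scan over JSON strings, while the date range still comes from the ordered structure.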
Reply if there's anything you don't know