I'm looking into Redis for a specific need, but I don't see how I could achieve what I need.
I want to store sets of integers (hundreds to low thousands of entries per set) and then "query" them by an input set. Every stored set that contains all the elements of the query should match (i.e., SDIFF query key returns an empty set; the naive approach is to iterate over all keys).
I don't see how this can be done efficiently (my target is roughly 5 ms per 10k keys)?
If you will only query your data by integer, consider inverting the storage and using the integer as the key. Instead of:
SADD a 3 5
SADD b 3 7
You can:
SADD int:3 a
SADD int:5 a
SADD int:3 b
SADD int:7 b
Then you use SINTER int:3 int:7 ... to get the names of all matching sets (what you used as keys originally).
If you do need to query both ways, then you may need to do both. This is like doing a many-to-many relationship in Redis.
Here, you are trading insert time and memory usage for fast query performance. Every time you add a pair, you need to do both SADDs: SADD setName int and SADD prefix:int setName.
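A minimal sketch of this dual-write pattern using redis-py (the int: prefix and helper name are my assumptions, not anything from the question):

import redis

r = redis.Redis(decode_responses=True)

def add_member(set_name, value):
    # forward mapping: the set itself
    r.sadd(set_name, value)
    # reverse mapping: which sets contain this integer
    r.sadd(f"int:{value}", set_name)

add_member("a", 3)
add_member("a", 5)
add_member("b", 3)
add_member("b", 7)

# all sets containing both 3 and 7
print(r.sinter("int:3", "int:7"))  # {'b'}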
If this extra memory and insert effort is not an option in your case, your next option is a Lua script that loops through the keys (pattern matching your set names) and uses SISMEMBER to test the integers of your query. See "Lua script for Redis which sums the values of keys" for an example of looping through a set of keys in Lua.
A Lua script is like a stored procedure: it runs atomically on your Redis server. However, whether it will perform at 5 ms for 10k sets tested against multiple integer members remains to be seen.
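A rough sketch of such a script, assuming redis-py and that your set keys share a prefix like set: (my assumption; passing the pattern in KEYS[1] is just a convenience here, not a real key, and KEYS inside a script blocks the server the same way the command does):

import redis

r = redis.Redis(decode_responses=True)

lua = """
local matches = {}
for _, key in ipairs(redis.call('KEYS', KEYS[1])) do
    local ok = true
    for _, member in ipairs(ARGV) do
        if redis.call('SISMEMBER', key, member) == 0 then
            ok = false
            break
        end
    end
    if ok then
        matches[#matches + 1] = key
    end
end
return matches
"""

find_supersets = r.register_script(lua)
print(find_supersets(keys=["set:*"], args=[3, 7]))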
I am looking to have a structure that stores elements and whose cardinality I can retrieve afterwards. I noticed I could use the commands SADD or PFADD and then use SCARD or PFCOUNT. What is the difference between these two? What are the advantages/disadvantages?
When using SADD, you store data in a Set.
When using PFADD, you store data in a HyperLogLog, which is a different kind of data structure.
A Set stores unique values for when you need to access those values again later.
A HyperLogLog gives you an approximate count of the number of unique values added with PFADD. It is useful when you have a great number of distinct values and don't need to get them back. It can be used, for example, to count the unique visitors to a given page on a given day on a high-traffic web site (you just add the unique visitor IDs to the HLL).
SADD and SCARD are for "Set".
PFADD and PFCOUNT are for "HyperLogLog".
Advantage of "HyperLogLog":
"HyperLogLog" takes much less memory than "Set".
The video below explains HyperLogLog clearly in about 5 minutes.
https://youtu.be/UAL2dxl1fsE
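A minimal side-by-side sketch using redis-py (the visitor IDs and key names are made up for illustration):

import redis

r = redis.Redis(decode_responses=True)

visitors = ["u1", "u2", "u1", "u3", "u2"]

# exact cardinality with a Set: members are kept and retrievable
r.sadd("visitors:set", *visitors)
print(r.scard("visitors:set"))      # 3, exact
print(r.smembers("visitors:set"))   # {'u1', 'u2', 'u3'}

# approximate cardinality with a HyperLogLog: tiny memory, members not retrievable
r.pfadd("visitors:hll", *visitors)
print(r.pfcount("visitors:hll"))    # ~3 (approximate, ~0.81% standard error)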
I have tables like Job and JobInfo, and I want to perform queries like the one below:
"SELECT J.JobID FROM Job J, JobInfo B WHERE B.JobID = J.JobID AND BatchID=5850 AND B.Status=0 AND J.JobType<>2"
How shall I go about modeling my Redis data types so that I can map such queries onto Redis?
If I try to map the rows of table Job into a Redis hash, e.g. (HSET j jobid 1 status 2), and similarly the rows of table JobInfo into another Redis hash, e.g. (HSET jinfo jobid 1 jobtype 3), then my tables become collections of hashes: the Job table can be a set with entries like JobSet:jobid, and the JobInfo table a set with entries like JobInfoSet:jobid.
But I am confused about what happens when I do a SINTER on JobSet and JobInfoSet. How am I going to query those hashes to get keys? The hash contents of JobSet are not identical to those of JobInfoSet (they may have different key-value pairs).
So what exactly am I going to get as the output of SINTER? And how am I going to query that output as key-value pairs?
Redis is not designed to structure data the SQL way. Besides being an in-memory key-value store, it supports five data structures: Strings, Hashes, Lists, Sets and Sorted Sets. At a high level this is a sufficient hint that Redis is designed to solve the performance problems that arise from heavy computation on relational data models. However, if you want to execute SQL queries against an in-memory store, you may want to look at MemSQL.
Let's break down the SQL statement into different components and I'll try to show how redis can accomplish various parts.
Select J.JobID, J.JobName from Job J;
We translate each row in "Job" into a hash, using the SQL primary key as the natural key in Redis.
For example:
SQL
==JobId==|==Name==
   123   |  Fred
Redis
HSET Job:123 Name Fred
which can be conceptualized as
Job-123 => {"Name":"Fred"}
Thus we can store columns as hash fields in Redis.
Let's say we do the same thing for JobInfo. Each JobInfo object has its own ID:
JobInfo-876 => {"meta1": "some value", "meta2": "bla", "JobID": "123"}
In SQL we would normally create a secondary index on JobInfo.JobID, but in NoSQL land we maintain our own secondary indexes.
Sorted Sets are great for this.
Thus when we want to fetch JobInfo objects by some field, JobID in this case, we can add them to a sorted set like this:
ZADD JobInfo-JobID 123 JobInfo-876
This results in a set with 1 element in it {JobInfo-876} which has a score of 123. I realize that forcing all JobIDs into the float range for the score is a bad idea, but work with me here.
Now when we want to find all JobInfo objects for a given JobID we just do a log(N) lookup into the index.
ZRANGEBYSCORE JobInfo-JobID 123 123
which returns "JobInfo-876"
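As a hedged sketch of maintaining that index by hand with redis-py (the helper name and field layout are mine):

import redis

r = redis.Redis(decode_responses=True)

def add_job_info(info_id, job_id, **fields):
    # the object itself lives in a hash...
    r.hset(f"JobInfo-{info_id}", mapping={**fields, "JobID": job_id})
    # ...and is registered in the secondary index, scored by JobID
    r.zadd("JobInfo-JobID", {f"JobInfo-{info_id}": job_id})

add_job_info(876, 123, meta1="some value", meta2="bla")

# log(N) lookup of all JobInfo objects for JobID 123
print(r.zrangebyscore("JobInfo-JobID", 123, 123))  # ['JobInfo-876']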
Now to implement simple joins we simply reuse this JobInfo-JobID index by storing Job keys by their JobIDs.
ZADD JobInfo-JobID 123 Job-123
Thus when doing something akin to
SELECT J.JobID, J.Name, B.meta1 FROM Job J JOIN JobInfo B USING (JobID);
This would translate to scanning through the JobInfo-JobID secondary index and reorganizing the Job and JobInfo objects returned.
ZRANGEBYSCORE JobInfo-JobID -inf +inf WITHSCORES
123 -> (Job-123, JobInfo-876)
These objects all share the same JobID (score). Client-side you'd then asynchronously fetch the needed fields, or you could embed these lookups in a Lua script. Such a script could make Redis hang for a long time: Redis tries to be fair with clients and prefers short batched queries over one long query.
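A rough client-side version of that scan-and-regroup step, assuming the same index as above and a redis-py pipeline for the batched fetch:

import redis
from collections import defaultdict

r = redis.Redis(decode_responses=True)

# walk the shared index; members with the same score share a JobID
pairs = r.zrangebyscore("JobInfo-JobID", "-inf", "+inf", withscores=True)

by_job_id = defaultdict(list)
for member, job_id in pairs:
    by_job_id[job_id].append(member)

# fetch all referenced hashes in one round trip
pipe = r.pipeline()
for member, _ in pairs:
    pipe.hgetall(member)
rows = dict(zip((m for m, _ in pairs), pipe.execute()))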
Now we come to a big problem: what if we want to combine secondary indexes? Let's say we have a secondary index on JobInfo.Status and another on Job.JobType.
If we make a set of all jobs with the right JobType and use it as a filter on the shared JobInfo-JobID secondary index, we eliminate not only the bad Job elements but also every JobInfo element. We could, I guess, fetch the scores (JobIDs) of the intersection and re-fetch all JobInfo objects with those scores, but we lose some of the filtering we did.
It is at this point where redis breaks down.
Here is an article on secondary indexes from the creator of redis himself: http://redis.io/topics/indexes
He touches on multi-dimensional indexes for filtering purposes. As you can see, he designed the data structures in a very versatile way. The most appealing fact is that sorted set elements with the same score are stored in lexicographical order. Thus you can give all elements a score of 0, piggyback on Redis's speed, and use it more like CockroachDB, which relies on a global key order to implement many SQL features.
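A small sketch of that score-0, lexicographic trick (the key and member names are illustrative only):

import redis

r = redis.Redis(decode_responses=True)

# all members get score 0, so Redis keeps them in lexicographical order
r.zadd("lex-index", {"usa:ford:f150": 0, "usa:ford:f250": 0, "usa:gm:volt": 0})

# prefix range query, roughly like a B-tree scan;
# the trailing 0xff byte keeps the upper bound inside the usa:ford: prefix
print(r.zrangebylex("lex-index", "[usa:ford:", b"[usa:ford:\xff"))
# ['usa:ford:f150', 'usa:ford:f250']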
The other answers are completely correct for Redis up to version 3.4.
The latest releases of Redis, from 4.0 onward, include support for modules.
Modules are extremely powerful, and it happens that I just wrote a small module to embed SQLite into Redis itself: rediSQL.
With that module you can actually use a fully functional SQL database inside your Redis instance.
I have a large hash which maps skuIds (strings, e.g. "AB-1") to names (e.g. "hola"); the skuIds are unique, but the names are not.
There are about 1 million skuIds mapped to about 1000 unique names. Now I need to get the unique name list for any subset of the skuId set.
I tried the hash's HMGET, but retrieving millions of records and looping through them is not efficient. Then I tried Sorted Sets (keeping the same score for the same name), but then I needed the set of distinct scores of a sorted set, which Redis does not provide directly.
We can do this with a Lua script (taking about 10-15 seconds), but I am not sure about the disadvantages of having Lua scripts in the code.
Use HSCAN for that. Also look at this answer about HSCAN usage.
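For illustration, a minimal redis-py sketch (the hash name sku:names and the sample skuIds are my inventions); hscan_iter streams the hash in cursor-sized chunks instead of pulling millions of entries in one blocking reply:

import redis

r = redis.Redis(decode_responses=True)

wanted = {"AB-1", "AB-2"}  # the skuId subset you care about
unique_names = set()

# HSCAN walks the hash incrementally, never blocking the server for long
for sku_id, name in r.hscan_iter("sku:names", count=1000):
    if sku_id in wanted:
        unique_names.add(name)

print(unique_names)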
This is a 2 part question.
I have a redis db storing items with the following keys:
record type 1: "site_id:1_item_id:3"
record type 2: "site_id:1_item_id:3_user_id:6"
I've been using KEYS site_id:1_item_id:* to grab record type 1 items (in this case for site 1)
Unfortunately, it returns all type 1 and type 2 items.
What's the best way to grab all "site_id:1_item_id:3"-type records while avoiding the ones that include user_id? Is there an end-of-key match I can use?
Secondly, I've read using KEYS is a bad choice, can anyone recommend a different approach here? I'm open to editing the key names if I must.
First things first: unless you are the only Redis user on your local development machine, you are right that using KEYS is wrong. It blocks the Redis instance until it completes, so anyone querying the instance while your KEYS command runs has to wait for it to finish. Use SCAN instead.
SCAN iterates over the entries in a non-blocking way, and you are guaranteed to see every key that is present for the whole duration of the iteration.
I don't know which language you use to query Redis, but in Python it is quite easy to fetch the keys with SCAN and filter them on the fly.
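Something along these lines (scan_iter is redis-py's wrapper around the SCAN cursor; rejecting keys containing _user_id: is one way to drop the type 2 records):

import redis

r = redis.Redis(decode_responses=True)

# type 1 keys: match the prefix, then filter out anything with a user_id suffix
for key in r.scan_iter(match="site_id:1_item_id:*", count=500):
    if "_user_id:" not in key:
        print(key)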
But let's say you would like to use KEYS anyway. It looks to me like either KEYS site_id:1_item_id:? or KEYS site_id:1_item_id:3 does the trick, depending on whether you want every key with a single-character item ID or just the ones ending in "3" (I am not sure I completely understood your question here).
Here is an example that I tried on my local machine:
redis 127.0.0.1:6379> flushall
OK
redis 127.0.0.1:6379> set site_id:1_item_id:3 a
OK
redis 127.0.0.1:6379> set site_id:1_item_id:3_user_id:6 b
OK
redis 127.0.0.1:6379> set site_id:1_item_id:4 c
OK
// ok so we have got the database cleaned and set up
redis 127.0.0.1:6379> keys *
1) "site_id:1_item_id:3"
2) "site_id:1_item_id:4"
3) "site_id:1_item_id:3_user_id:6"
// gets all the keys like site_id:1_item_id:X
redis 127.0.0.1:6379> keys site_id:1_item_id:?
1) "site_id:1_item_id:3"
2) "site_id:1_item_id:4"
// gets all the keys like site_id:1_item_id:3
redis 127.0.0.1:6379> keys site_id:1_item_id:3
1) "site_id:1_item_id:3"
Don't forget that Redis KEYS uses glob-style patterns, which are not exactly like regexes.
You can check out the examples in the KEYS documentation to make sure you understand them.
The correct approach here is to use an index of keys, maintained by you. Redis should not be queried for its keys in any conventional sense.
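A sketch of that self-maintained index (the index key name is made up): every time you create a type 1 record, register its key in a Set, then read the Set instead of ever running KEYS or SCAN.

import redis

r = redis.Redis(decode_responses=True)

def create_item(site_id, item_id, value):
    key = f"site_id:{site_id}_item_id:{item_id}"
    r.set(key, value)
    # maintain the index ourselves at write time
    r.sadd(f"index:site:{site_id}:items", key)

create_item(1, 3, "a")
create_item(1, 4, "c")

# O(set size) read, no server-wide key scan involved
print(r.smembers("index:site:1:items"))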
When creating a key in Redis, I understand using the ":" format and treating it similarly to a URL structure.
But what if that structure itself contains key-value type combinations? Does one put the key in the structure?
Made-up Example:
Option A) "country:usa:manufacturer:ford:vehicle:f150:color" = black
or
Option B) "usa:ford:f150:color" = black
In some ways, I think that there is strength in the structure of Option A, but it also adds a lot of complexity to the key.
Thoughts?
While keeping in mind your made-up example (do try to use an actual example, you'll get better answers), I would have to say neither.
I would go with an ID for the key, likely an int. Then I'd put each key/value pair from your Option A in as a hash field and value.
For example:
HSET 1 country USA
HSET 1 manufacturer ford
And so on. Or you could use an HMSET operation to set them all at once.
Why? You keep the field names that describe the data (which you lose in your Option B), you get the memory advantages of hashes over strings, and you reduce the complexity of the key structure, not to mention the memory benefit of a short integer key name versus a long string.
Further, you get a memory-cheap way to create indexes as integer sets. For example, a key called "country:1" could be a set of entry IDs, which then gives you a way to pull all entries for country ID 1 (USA in the example). By using integers you can store these in a very memory-efficient way, at the minor cost of a lookup table. This could even be done in Lua to avoid a network hop.
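A brief sketch of the ID-plus-index layout (the vehicle: key prefix and the country lookup table are illustrative assumptions on my part):

import redis

r = redis.Redis(decode_responses=True)

# entry 1, with descriptive field names kept on the hash
r.hset("vehicle:1", mapping={
    "country": "USA", "manufacturer": "ford",
    "model": "f150", "color": "black",
})

# integer-set index: country ID 1 stands for USA in a small lookup table
r.sadd("country:1", 1)

# "pull all entries for country ID 1"
for entry_id in r.smembers("country:1"):
    print(r.hgetall(f"vehicle:{entry_id}"))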
The greater the range of possible combinations and entries, the more valuable the memory savings are. If you've got millions or billions of them, you'll want to follow the integer-ID & lookup route. This would also set you up nicely if you ever need to shard data - either server side or client side.