How to design Redis data structures in order to perform queries similar to DB queries in Redis? - sql

I have tables like Job and JobInfo, and I want to perform queries like the one below:
"SELECT J.JobID FROM Job J, JobInfo B WHERE B.JobID = J.JobID AND BatchID=5850 AND B.Status=0 AND J.JobType<>2"
How should I go about designing my Redis data types so that I can map such queries onto Redis?
If I try to map each row of table Job to a Redis hash (e.g. hash j with fields jobid 1, status 2) and similarly each row of table JobInfo to another Redis hash (e.g. hash jinfo with fields jobid 1, jobtype 3), then my tables become collections of hashes: the Job table can be a set with entries like JobSet:jobid, and the JobInfo table a set with entries like JobInfoSet:jobid.
But I am confused about what happens when I do a SINTER on JobSet and JobInfoSet. How am I going to query the resulting hashes to get keys? The hash content behind JobSet is not identical to the hash content behind JobInfoSet (they may have different key-value pairs).
So what exactly will I get as the output of SINTER, and how am I going to query that output as key-value pairs?

Redis is not designed to structure data the SQL way. Besides being an in-memory key-value store, it supports five data structures: Strings, Hashes, Lists, Sets and Sorted Sets. At a high level this is a sufficient hint that Redis is designed to solve the performance problems that arise from heavy computation over relational data models. However, if you want to execute SQL queries against an in-memory store, you may want to look at MemSQL.
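For orientation, here is one tiny command per structure (all key names are made up for illustration):
SET greeting "hello"            # String: a single value per key
HSET user:1 name "Ann"          # Hash: field/value pairs under one key
LPUSH queue:jobs job-42         # List: ordered, push/pop at both ends
SADD tags:redis article-7      # Set: unordered, unique members
ZADD leaderboard 100 player-9  # Sorted Set: members ordered by a score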

Let's break the SQL statement down into its components, and I'll try to show how Redis can accomplish the various parts.
Select J.JobID, J.JobName from Job J;
We translate each row in "Job" into a Redis hash, using the SQL primary key as the natural key in Redis.
For example:
SQL
==JobId==|==Name==
   123   |  Fred
Redis
HSET Job:123 Name Fred
which can be conceptualized as
Job-123 => {"Name":"Fred"}
Thus we can store columns as hash fields in Redis.
Let's say we do the same thing for JobInfo. Each JobInfo object has its own ID:
JobInfo-876 => {"meta1": "some value", "meta2": "bla", "JobID": "123"}
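In Redis commands, mirroring the HSET used for Job above, that JobInfo object might be created like this (HMSET is the multi-field form that works on older Redis; on 4.0+ HSET also accepts multiple pairs):
HMSET JobInfo-876 meta1 "some value" meta2 "bla" JobID "123"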
In SQL we would normally create a secondary index on JobInfo.JobID, but in NoSQL land we maintain our own secondary indexes.
Sorted Sets are great for this.
When we want to fetch JobInfo objects by some field (JobID in this case), we add them to a sorted set like this:
ZADD JobInfo-JobID 123 JobInfo-876
This results in a sorted set with one element, {JobInfo-876}, which has a score of 123. I realize that forcing all JobIDs into the float range of the score is a bad idea, but work with me here.
Now when we want to find all JobInfo objects for a given JobID, we just do an O(log N) lookup into the index.
ZRANGEBYSCORE JobInfo-JobID 123 123
which returns "JobInfo-876"
To implement simple joins, we reuse this JobInfo-JobID index by also storing Job keys under their JobIDs.
ZADD JobInfo-JobID 123 Job-123
Thus when doing something akin to
SELECT J.JobID, J.Name, B.meta1 FROM Job J JOIN JobInfo B USING (JobID);
This would translate to scanning through the JobInfo-JobID secondary index and reorganizing the Job and JobInfo objects returned.
ZRANGEBYSCORE JobInfo-JobID -inf +inf WITHSCORES
123 -> (Job-123, JobInfo-876)
These objects all share the same JobID. Client-side you'd then asynchronously fetch the needed fields, or you could embed these lookups in a Lua script. Beware that a long-running Lua script can block Redis for a long time; Redis tries to be fair with clients and prefers short batched queries over one long query.
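As a sketch of the Lua route (reusing the JobInfo-JobID index and hash keys from above; this bends the rule that scripts should declare all keys up front, so it is fine on a single instance but not in cluster mode):
-- fetch-by-jobid.lua: return the full hash of every key indexed under a JobID range
local keys = redis.call('ZRANGEBYSCORE', KEYS[1], ARGV[1], ARGV[2])
local out = {}
for i, key in ipairs(keys) do
  out[i] = redis.call('HGETALL', key)
end
return out
Invoked as: redis-cli --eval fetch-by-jobid.lua JobInfo-JobID , 123 123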
Now we come to a big problem: what if we want to combine secondary indexes? Let's say we have a secondary index on JobInfo.Status and another on Job.JobType.
If we make a set of all Jobs with the right JobType and intersect it with the shared JobInfo-JobID secondary index, then we eliminate not only the bad Job elements but also every JobInfo element. We could, I suppose, fetch the scores (JobIDs) of the intersection and refetch all JobInfo objects with those scores, but we would lose some of the filtering we already did.
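To make the problem concrete, here is a hedged sketch (the JobType-ok set is hypothetical, holding the Jobs that pass the JobType filter):
SADD JobType-ok Job-123
ZINTERSTORE matched 2 JobInfo-JobID JobType-ok WEIGHTS 1 0
ZRANGE matched 0 -1 WITHSCORES         # -> only Job-123 (score 123); JobInfo-876 was dropped
ZRANGEBYSCORE JobInfo-JobID 123 123    # refetch by score: JobInfo-876 returns, but unfiltered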
It is at this point that Redis breaks down.
Here is an article on secondary indexes from the creator of Redis himself: http://redis.io/topics/indexes
He touches on multi-dimensional indexes for filtering purposes, and as you can see he designed the data structures in a very versatile way. The most appealing property is that sorted set elements with the same score are stored in lexicographical order. Thus you can give all elements a score of 0, piggyback on Redis's speed, and use it more like CockroachDB, which relies on a global order to implement many SQL features.
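A hedged sketch of that score-0 trick (the key name and zero-padded IDs are made up; padding keeps numeric and lexicographic order aligned):
ZADD rows-by-jobid 0 00123:Job-123 0 00123:JobInfo-876 0 00124:Job-124
ZRANGEBYLEX rows-by-jobid [00123: (00123;   # ';' is the ASCII character after ':'
# -> 00123:Job-123, 00123:JobInfo-876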

The other answers are completely correct for Redis before version 4.0.
The latest releases of Redis, from 4.0 onward, include support for modules.
Modules are extremely powerful, and it happens that I just wrote a small module to embed SQLite into Redis itself: rediSQL.
With that module you can use a fully functional SQL database inside your Redis instance.

Related

Performance of Lucene queries in Ignite

I have a very simple object as keys in my cache and I want to be able to iterate on the key/value pairs where a string matches a field in my keys.
Here is how the field is declared in the class
@AffinityKeyMapped @QueryTextField String crawlQueueID;
I run many queries and expect a small amount of documents to match. The queries take a relatively large amount of time, which is surprising given that there are maybe only 100K pairs locally in the cache. My queries are local, I want to hit only the K/V stored in the local node.
According to the profiler I am using, 80% of the CPU is spent here
GridLuceneIndex.java:285 org.apache.lucene.search.IndexSearcher.search(Query, int)
Knowing Lucene's performance, I am really surprised. Any suggestions?
BTW I want to sort the results based on a numerical field in the value object. Can this be done via annotations?
I could have one cache per value of the field I am querying against but given that there are potentially hundreds of thousands or even millions of different values, that would probably be too many caches for Ignite to handle.
EDIT
Looking at the code that handles the Lucene indexing and querying, the index gets reloaded for every query. Given that I run hundreds of queries in a row, we probably don't benefit from any caching or optimisation of the index structure in Lucene.
Additionally, a range query runs as a filter to check the TTL. Filter queries are faster, but on a fresh IndexReader there would not be much caching either. Of course, if no TTL is needed for a given table, this should not be required.
Judging by the documentation on SQL indexing ("Ignite automatically creates indexes for each primary key and affinity key field"), the indexing is done on the key alone. In my case, the value I want to use for sorting is in the value object, so that would not work.

redis: Get all keys that contain queried element in set

I'm looking into Redis for my specific need, but I don't see how I could achieve what I need.
I want to store sets of integers (hundreds to low thousands of entries per set) and then "query" by an input set: every stored set that contains all the elements of the query should match (i.e. SDIFF query key returns an empty set; the naive approach iterates over all keys).
Can this be done efficiently (about 5 ms per 10k keys)?
If you will only query your data by integer, consider storing with the integer as the key. Instead of:
SADD a 3 5
SADD b 3 7
You can:
SADD int:3 a
SADD int:5 a
SADD int:3 b
SADD int:7 b
Then you use SINTER int:3 int:7 ... to get all matching integer set names (what you used for keys originally).
If you do need to query both ways, then you may need to do both. This is like doing a many-to-many relationship in Redis.
Here, you are trading off insert time and memory usage for fast query performance. Every time you add a pair, you need to do both SADDs: SADD setName int and SADD prefix:int setName.
If this extra memory and insert effort is not an option in your case, your next option is a Lua script that loops through the keys (pattern-matching your set names) and uses SISMEMBER to test each integer of your query. See "Lua script for Redis which sums the values of keys" for an example of looping through a set of keys in Lua.
A Lua script is like a stored procedure: it runs atomically on your Redis server. However, whether it will perform at 5 ms for 10k sets, each tested for multiple integer members, remains to be seen.
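A minimal sketch of that loop (assumptions: set keys match the pattern set:*, the queried integers arrive as script arguments, and we are on a single instance since the script touches undeclared keys):
-- match-sets.lua: return every key matching set:* that contains all ARGV members
local matches = {}
local cursor = '0'
repeat
  local res = redis.call('SCAN', cursor, 'MATCH', 'set:*')
  cursor = res[1]
  for _, key in ipairs(res[2]) do
    local ok = true
    for _, member in ipairs(ARGV) do
      if redis.call('SISMEMBER', key, member) == 0 then ok = false; break end
    end
    if ok then matches[#matches + 1] = key end
  end
until cursor == '0'
return matches
Invoked as: redis-cli --eval match-sets.lua , 3 7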

Out of Process in memory database table that supports queries for high speed caching

I have a SQL table that is accessed continually but changes very rarely.
The Table is partitioned by UserID and each user has many records in the table.
I want to save database resources and move this table closer to the application in some kind of memory cache.
In-process caching is too memory-intensive, so it needs to be external to the application.
Key-value stores like Redis are proving inefficient due to the overhead of serializing and deserializing the table to and from Redis.
I am looking for something that can store this table (or partitions of the data) in memory, but let me query only the information I need without serializing and deserializing large blocks of data on each read.
Is there anything that provides an out-of-process in-memory database table that supports queries, for high-speed caching?
Searching has shown that Apache Ignite might be a possible option, but I am looking for more informed suggestions.
Since it's out-of-process, it has to do serialization and deserialization. The real question is how to reduce that serialization/deserialization work. If you use Redis' STRING type, you cannot reduce it.
However, you can use a HASH to solve the problem: map your SQL table to a HASH.
Suppose you have the following table: person: id(varchar), name(varchar), age(int). You can take the person id as the key, and name and age as fields. When you want to look up someone's name, you only need to get the name field (HGET person-id name); the other fields won't be deserialized.
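A minimal sketch of that mapping (the key person:42 is made up; one hash per row):
HMSET person:42 name "Alice" age 30
HGET person:42 name        # fetches only the name field; age is never touched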
Ignite is indeed a possible solution for you, since you can reduce serialization/deserialization overhead by using its internal binary representation to access objects' fields. You can refer to this documentation page for more information: https://apacheignite.readme.io/docs/binary-marshaller
Access overhead may also be reduced by disabling the copy-on-read option: https://apacheignite.readme.io/docs/performance-tips#section-do-not-copy-value-on-read
Data collocation by user id is also possible with Ignite: https://apacheignite.readme.io/docs/affinity-collocation
As @for_stack said, a hash is very suitable for your case.
You said each user has many rows in the DB, indexed by user_id and tag_id, so (user_id, tag_id) uniquely identifies one row. Since every row functionally depends on this tuple, you can use the tuple as the hash key.
For example, to save the row (user_id, tag_id, username, age) with values ("123456", "FDSA", "gsz", 20) into Redis, you could do this:
HMSET 123456:FDSA username "gsz" age 20
When you want to query the username given a user_id and tag_id, you can do this:
HGET 123456:FDSA username
So every hash key is a combination of user_id and tag_id. If you want the key to be more human-readable, you can add a prefix string such as "USERINFO", e.g. USERINFO:123456:FDSA.
But if you want to query with only a user_id and get all rows for that user_id, the method above is not enough.
You can build secondary indexes in Redis for your hashes.
As above, we use user_id:tag_id as the hash key, because it uniquely points to one row. Now suppose we want to query all the rows for one user_id.
We can use a sorted set as a secondary index recording which hashes store the info for this user_id.
We add this to the sorted set:
ZADD user_index 0 123456:FDSA
As above, we set the member to the hash key string and the score to 0. The rule is that all scores in this zset must be 0; we can then use lexicographical order to do range queries. See ZRANGEBYLEX.
E.g., to get all rows for user_id 123456:
ZRANGEBYLEX user_index [123456 (123457
It returns all hash keys whose prefix is 123456; we can then use each of these strings as a hash key with HGET or HMGET to retrieve the information we want.
[ means inclusive, and ( means exclusive. Why 123457? To get all rows for a user_id, we form the upper bound by incrementing the last character of the user_id string by one ASCII value.
For more about lexicographical indexes, see the article mentioned above.
You can try Apache Mnemonic, started by Intel: http://incubator.apache.org/projects/mnemonic.html. It supports serde-less features.
For a read-dominant workload, the MySQL MEMORY engine should work fine (writing DML locks the whole table). This way you don't need to change your data retrieval logic.
Alternatively, if you're okay with changing the data retrieval logic, then Redis is also an option. To add to what @GuangshengZuo described, there's ReJSON, a dynamically loadable Redis module (for Redis 4+) that implements a document store on top of Redis. It can further relax the requirements for marshalling big structures back and forth over the network.
With just 6 principles (which I collected here), it is very easy for a SQL-minded person to adapt to the Redis approach. Briefly, they are:
1. Most importantly, don't be afraid to generate lots of key-value pairs. Feel free to store each row of the table in a different key.
2. Use Redis' hash data type.
3. Form the key name from the primary key values of the table, joined by a separator (such as ":").
4. Store the remaining fields as a hash.
5. To query a single row, form the key directly and retrieve its fields.
6. To query a range, use the wildcard "*" in your key pattern. But be aware that scanning keys interrupts other Redis processes, so use this method only if you really have to.
The link just gives a simple table example and how to model it in Redis. Following those 6 principles, you can continue to think as you do for normal tables (of course without some not-so-relevant concepts such as CRUD, constraints, relations, etc.). A sketch of the principles follows.
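A hedged sketch of the principles for a table person with primary key id (key names are made up):
HMSET person:7 name "Ann" age 30   # principles 1-4: one key per row, columns as hash fields
HGETALL person:7                   # principle 5: single-row query by directly formed key
SCAN 0 MATCH person:*              # principle 6: range query (prefer SCAN over a blocking KEYS)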
Using a combination of Memcache and Redis on top of MySQL also comes to mind.

Correct modeling in Redis for writing single entity but querying multiple

I'm trying to move data that lives in a SQL DB into Redis, in order to gain much higher throughput (it's a very high-throughput workload). I'm aware of the downsides regarding persistence, storage costs, etc.
So, I have a table called "Users" with a few columns. Let's assume: ID, Name, Phone, Gender.
Around 90% of the requests are writes, each updating a single row.
Around 10% of the requests are reads, each fetching 20 rows.
I'm trying to get my head around the right modeling of this in order to get the max out of it.
If there were only updates, I would use hashes.
But because of the 10% of reads, I'm afraid it won't be efficient.
Any suggestions?
Actually, the real question is whether you need to support partial updates.
Supposing partial updates are not required, you can store your record as a blob associated with a key (i.e. the string datatype). All write operations can be done in one roundtrip, since the record is always written at once. Several read operations can be done in one roundtrip as well, using the MGET command.
Now, supposing partial updates are required, you can store your record as a dictionary associated with a key (i.e. the hash datatype). All write operations can be done in one roundtrip (even partial ones). Several read operations can also be done in one roundtrip, provided the HGETALL commands are pipelined.
Pipelining several HGETALL commands is a bit more CPU-consuming than using MGET, but not by much. In terms of latency, it should not be significantly different, unless you execute hundreds of thousands of them per second on the Redis instance.
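A hedged sketch of the two layouts (key names and the JSON blob format are assumptions, not requirements):
# string layout: whole record as one blob, no partial updates
SET user:1 "{\"name\":\"Ann\",\"phone\":\"555-0100\",\"gender\":\"F\"}"
MGET user:1 user:2 user:3           # several rows in one roundtrip
# hash layout (alternative, shown on a fresh key): per-field access
HMSET user:2 Name "Ann" Phone "555-0100" Gender "F"
HSET user:2 Phone "555-0199"        # partial update in one roundtrip
# reads: pipeline one HGETALL user:<id> per row from the client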

Design Redis database table like SQL?

Suppose my database table structure is like this
id | name | college | address
 1 | xxx  | nnn     | xn
 2 | yyy  | nnm     | yn
 3 | zzz  | nnz     | zn
In SQL, if I want to get the student details by name, I would use:
select * from student where name = 'xxx'
How is this possible in the Redis database?
Redis, like other NoSQL datastores, has different requirements based on what you are going to be doing.
Redis has several data structures that could be useful depending on your need. For example, given your desire for a select * from student where name = 'xxx' you could use a Redis hash.
redis 127.0.0.1:6379> hmset student:xxx id 1 college nnn address xn
OK
redis 127.0.0.1:6379> hgetall student:xxx
1) "id"
2) "1"
3) "college"
4) "nnn"
5) "address"
6) "xn"
If you have other queries, though, like the same thing but selecting on where college = 'nnn', then you are going to have to denormalize your data. Denormalization is usually a bad thing in SQL, but in NoSQL it is very common.
If your primary query will be against the name, but you may need to query against the college, then you might do something like adding a set in addition to the hashes.
redis 127.0.0.1:6379> sadd college:nnn student:xxx
(integer) 1
redis 127.0.0.1:6379> smembers college:nnn
1) "student:xxx"
With your data structured like this, if you wanted to find all information for students attending college nnn, you would first read the set, then fetch each hash based on the member names returned.
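Putting the two lookups together (continuing the example above):
SMEMBERS college:nnn     # -> "student:xxx"
HGETALL student:xxx      # -> the full student hash shown earlier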
Your requirements will generally drive the design and the structures you use.
With just 6 principles (which I collected here), it is very easy for a SQL-minded person to adapt to the Redis approach. Briefly, they are:
1. Most importantly, don't be afraid to generate lots of key-value pairs. Feel free to store each row of the table in a different key.
2. Use Redis' hash data type.
3. Form the key name from the primary key values of the table, joined by a separator (such as ":").
4. Store the remaining fields as a hash.
5. To query a single row, form the key directly and retrieve its fields.
6. To query a range, use the wildcard "*" in your key pattern.
The link just gives a simple table example and how to model it in Redis. Following those 6 principles, you can continue to think as you do for normal tables (of course without some not-so-relevant concepts such as CRUD, constraints, relations, etc.).
For plain, vanilla Redis the other answers are completely correct; however, yesterday (2 December 2016) Redis 4.0-rc1 was released.
Redis v4 provides support for modules, and I just wrote a small module to embed SQLite into Redis itself: rediSQL.
With that module you can use a fully functional SQL database inside your Redis instance.
Redis just offers some basic data structures; NoSQL and SQL are different worlds. But you can use Redis like a schema'd SQL data store. There is an interesting program, Redisql, on GitHub that tries to drive Redis via SQL; the idea behind Redisql is the one @sberry mentioned.
Hope it is not too late, since the original question is now six years old. You may try my dbx plugin: https://github.com/cscan/dbx
It supports simple SQL for maintaining hashes in Redis. Something like this:
127.0.0.1:6379> dbx.select name, tel from phonebook where gender = 'F' order by age desc
or, calling it from the shell:
$ redis-cli "dbx.select name, tel from phonebook where gender = 'F' order by age desc"
Hope this helps.
You can try the searchbox framework. searchbox provides an easy way to query Redis data with its Criteria API.
OnceDB is a full-text search in-memory database based on Redis. It supports data management like SQL relational databases and NoSQL schemaless databases.
OnceDB does not change the data storage structure of Redis and is fully compatible with Redis. Redis database files can be operated on directly in OnceDB and then returned to Redis for use.
OnceDB automatically creates auxiliary indexes through operators:
= Ordinary field value, no index
# Primary key
? Grouping index
* Keyword grouping index, keywords separated by ','
\ Sort index; the index's score weight is the value of the field
For example, execute the following command to add user data:
upsert user username # dota password = 123456 title ? SDEI skills * java,go,c
> OK
You can search the index by operator, e.g. find user data containing the "c" keyword and print the username and password fields:
find user 0 -1 username = * password = * skills * c
1) (integer) 1
2) "user:dota"
3) "dota"
4) "123456"
5) "java,go,c"
Read more: OnceDB quick start
In SQL database design, we first put everything into the database and then figure out how we will query it.
In Redis design, we first figure out which queries we need to answer, and then structure our data accordingly.
That is why Redis is super fast. Redis stores data in a hash in some cases: if the record has many attributes (in your case a student might have age, name, and class attributes), storing the student as a hash is useful.
In Redis, when you build your application, you have to look at what you are going to store (users, sessions, products) and, based on those things, plan which data structures to use for each.