Redis design events statistics store - redis

I asked another question about storing time sequence data
Want to use Redis as an events statistics store
ZADD [system]:[event] [timestamp] [result]:[timestamp]
This was the result of that question.
However using this type of data I feel could limit the usefulness. Ideally having more data as a value.
If I wanted to store a more complex key like a hash or maybe just some json. eg: {page: x, loadtime: y} and I wanted to group or query by that data, would that have to be in my app layer or can redis help here too.

Related

Out of Process in memory database table that supports queries for high speed caching

I have a SQL table that is accessed continually but changes very rarely.
The Table is partitioned by UserID and each user has many records in the table.
I want to save database resources and move this table closer to the application in some kind of memory cache.
In process caching is too memory intensive so it needs to be external to the application.
Key Value stores like Redis are proving inefficient due to the overhead of serializing and deserializing the table to and from Redis.
I am looking for something that can store this table (or partitions of data) in memory, but let me query only the information I need without serializing and deserializing large blocks of data for each read.
Is there anything that would provide Out of Process in memory database table that supports queries for high speed caching?
Searching has shown that Apache Ignite might be a possible option, but I am looking for more informed suggestions.
Since it's out-of-process, it has to do serialization and deserialization. The problem you concern is how to reduce the serialization/deserizliation work. If you use Redis' STRING type, you CANNOT reduce these work.
However, You can use HASH to solve the problem: mapping your SQL table to a HASH.
Suppose you have the following table: person: id(varchar), name(varchar), age(int), you can take person id as key, and take name and age as fields. When you want to search someone's name, you only need to get the name field (HGET person-id name), other fields won't be deserialzed.
Ignite is indeed a possible solution for you since you may optimize serialization/deserialization overhead by using internal binary representation for accessing objects' fields. You may refer to this documentation page for more information: https://apacheignite.readme.io/docs/binary-marshaller
Also access overhead may be optimized by disabling copy-on-read option https://apacheignite.readme.io/docs/performance-tips#section-do-not-copy-value-on-read
Data collocation by user id is also possible with Ignite: https://apacheignite.readme.io/docs/affinity-collocation
As the #for_stack said, Hash will be very suitable for your case.
you said that Each user has many rows in db indexed by the user_id and tag_id . So It is that (user_id, tag_id) uniquely specify one row. Every row is functional depends on this tuple, you could use the tuple as the HASH KEY.
For example, if you want save the row (user_id, tag_id, username, age) which values are ("123456", "FDSA", "gsz", 20) into redis, You could do this:
HMSET 123456:FDSA username "gsz" age 30
When you want to query the username with the user_id and tag_id, you could do like this:
HGET 123456:FDSA username
So Every Hash Key will be a combination of user_id and tag_id, if you want the key to be more human readable, you could add a prefix string such as "USERINFO". e.g. : USERINFO:123456:FDSA .
BUT If you want to query with only a user_id and get all rows with this user_id, this method above will be not enough.
And you could build the secondary indexes in redis for you HASH.
as the above said, we use the user_id:tag_id as the HASH key. Because it can unique points to one row. If we want to query all the rows about one user_id.
We could use sorted set to build a secondary indexing to index which Hashes store the info about this user_id.
We could add this in SortedSet:
ZADD user_index 0 123456:FDSA
As above, we set the member to the string of HASH key, and set the score to 0. And the rule is that we should set all score in this zset to 0 and then we could use the lexicographical order to do range query. refer zrangebylex.
E.g. We want to get the all rows about user_id 123456,
ZRANGEBYLEX user_index [123456 (123457
It will return all the HASH key whose prefix are 123456, and then we use this string as HASH key and hget or hmget to retrieve infomation what we want.
[ means inclusive, and ( means exclusive. and why we use 123457? it is obvious. So when we want to get all rows with a user_id, we shoud specify the upper bound to make the user_id string's leftmost char's ascii value plus 1.
More about lex index you could refer the article I mentioned above.
You can try apache mnemonic started by intel. Link -http://incubator.apache.org/projects/mnemonic.html. It supports serdeless features
For a read-dominant workload MySQL MEMORY engine should work fine (writing DMLs lock whole table). This way you don't need to change you data retrieval logic.
Alternatively, if you're okay with changing data retrieval logic, then Redis is also an option. To add to what #GuangshengZuo has described, there's ReJSON Redis dynamically loadable module (for Redis 4+) which implements document-store on top of Redis. It can further relax requirements for marshalling big structures back and forth over the network.
With just 6 principles (which I collected here), it is very easy for a SQL minded person to adapt herself to Redis approach. Briefly they are:
The most important thing is that, don't be afraid to generate lots of key-value pairs. So feel free to store each row of the table in a different key.
Use Redis' hash map data type
Form key name from primary key values of the table by a separator (such as ":")
Store the remaining fields as a hash
When you want to query a single row, directly form the key and retrieve its results
When you want to query a range, use wild char "*" towards your key. But please be aware, scanning keys interrupt other Redis processes. So use this method if you really have to.
The link just gives a simple table example and how to model it in Redis. Following those 6 principles you can continue to think like you do for normal tables. (Of course without some not-so-relevant concepts as CRUD, constraints, relations, etc.)
using Memcache and REDIS combination on top of MYSQL comes to Mind.

How to design Redis data structures in order to perform queries similar to DB queries in redis?

I have tables like Job, JobInfo. And i want to perform queries like below -
"SELECT J.JobID FROM Job J, JobInfo B WHERE B.JobID = J.JobID AND BatchID=5850 AND B.Status=0 AND J.JobType<>2"
How shall i go about writing my redis data types so that i can map such queries in redis?
IF i try to map the rows of table job in a redis hash for e.g. (hash j jobid 1 status 2) & similarly the rows of table JobInfo in again a redis hash as (hash jinfo jobid 1 jobtype 3.)
So my tables can be a set of hashes. Job table can be set with entries JobSet:jobid & JobInfo table can be set with entries like JobInfoSet:jobid
But i am confused in when i will do a SINTER on JobSet & JobInfoSet. how am i going to query that hash to get keys? As in the hash content of set jobSet is not identical to hash content of table JobInfoSet (they may have different key value pair.
So what exactly am i going to get as an output of SINTER? And how am i going to query that output as key-value pair?
So the tables will be a collection of redis hashes
Redis is not designed to structure the data in SQL way. Beside a in-memory key value store, it supports five types of data structures: Strings, Hashes, Lists, Sets and Sorted Sets. At high level this is a sufficient hint that Redis is designed to solve performance problems that arises due to high computation in relational data models. However, if you want to execute sql query in a in-memory structure, you may want to look at memsql.
Let's break down the SQL statement into different components and I'll try to show how redis can accomplish various parts.
Select J.JobID, J.JobName from Job J;
We translate each row in "Job" into a hash in redis using the SQL primary index as the redis natural index in redis.
For example:
SQL
==JobId==|==Name==
123 Fred
Redis
HSET Job:123 Name Fred
which can be conceptualized as
Job-123 => {"Name":"Fred"}
Thus we can store columns as hash fields in redis
Let's say we do the same thing for JobInfo. Each JobInfo object has its own ID
JobInfo-876 => {"meta1": "some value", "meta2": "bla", "JobID": "123"}
In sql normally we would make a secondary index on JobInfo.JobID but in NoSql land we maintain our own secondary indexes.
Sorted Sets are great for this.
Thus when we want to fetch JobInfo objects by some field, JobId in this case we can add it to a sorted set like this
ZADD JobInfo-JobID 123 JobInfo-876
This results in a set with 1 element in it {JobInfo-876} which has a score of 123. I realize that forcing all JobIDs into the float range for the score is a bad idea, but work with me here.
Now when we want to find all JobInfo objects for a given JobID we just do a log(N) lookup into the index.
ZRANGEBYSCORE JobInfo-JobID 123 123
which returns "JobInfo-876"
Now to implement simple joins we simply reuse this JobInfo-JobID index by storing Job keys by their JobIDs.
ZADD JobInfo-JobID 123 Job-123
Thus when doing something akin to
SELECT J.JobID, J.Name, B.meta1 FROM Job, JobInfo USING (JobID).
This would translate to scanning through the JobInfo-JobID secondary index and reorganizing the Job and JobInfo objects returned.
ZRANGEBYSCORE JobInfo-JobID -inf +inf WITHSCORES
5 -> (Job-123, JobInfo-876)
These objects all share the same JobID. CLient side you'd then asynchronously fetch the needed fields. Or you could embed these lookups in a lua script. This lua script could make redis hang for a long time. Normally redis tries to be fair with clients and prefers you to have short batched queries instead of one long query.
Now we come to a big problem, what if we want to combine secondary indexes. Let's say we have a secondary index on JobInfo.Status, and another on Job.JobType.
If we make a set of all jobs with the right JobType and use that as a filter on the JobInfo-JobID shared secondary index then we not only eliminate the bad Job elements but also every JobInfo element. We could, I guess fetch the scores(JobID) on the intersection and refetch all JobInfo objects with those scores, but we lose some of the filtering we did.
It is at this point where redis breaks down.
Here is an article on secondary indexes from the creator of redis himself: http://redis.io/topics/indexes
He touches multi-dimensional indexes for filtering purposes. As you can see he designed the data structures in a very versatile way. One that is the most appealing is the fact that sorted set elements with the same score are stored in lexicographical order. Thus you can easily have all elements have a score of 0 and piggyback on Redis's speed and use it more like cockroachDB, which relies on a global order to implement many SQL features.
The other answer are completely correct for redis up to version 3.4
The latest releases of redis, from 4.0 onward, include supports for modules.
Modules are extremelly powerfull and it happens that I just wrote a small module to embed SQLite into redis itself; rediSQL.
With that module you can actually use a fully functional SQL database inside your redis instace.

Want to use Redis as an events statistics store

I am really interested in Redis, I have an idea and wanted to know if it is a suitable use case, or if it is not any other suggestions on a data store. Also any tips on storing the data would be appreciated.
My idea is just a simple event system so an event happens and it is stored in redis as follows
Key | Value
[unixtimestamp]:[system]:[event] | [result]
The data could be anything sales, impressions, errors, api response times, page load times any real time analytics. I then want to be able to make graphs based on that data.
This isn't an ideal design because it won't support your read pattern effectively and it will probably wasteful in terms of RAM if your [result] is short/small. Instead, look into using Redis' sorted sets with the timestamp as score, in the following fashion:
ZADD [system]:[event] [timestamp] [result]
Note that set members have to be unique so if [result]'s cardinality is low, make it unique by concatenating the timestamp to it (and filtering it out when you graph), i.e.:
ZADD [system]:[event] [timestamp] [result]:[timestamp]
This way you'll be able to fetch ranges of measurements by calling ZRANGEBYSCORE and graphing the results.

Redis suggesstion for selecting data type

We have questions based where in home page we were showing 2 list
Questions by date modified
Question have bigger views and ans count. And in this both listing if question have same views or ans count then sorting is based on date.
Previously i am directly quiring to MySQL database and fetching the values so it's easy.
But each page request hitting to MySQL it's bit expensive then start doing caching.
I started using Redis. Following is the cases when i use redis cache
Issues is On second listing i have to display questions by votes and not answered combine.
How can i stored this type of data in redis to load faster with sorting based by 2 conditions votes with time and ans count with time?
You can use sorted sets in redis. Your view or answer count can be the score. create a key based on timestamp. Sorted set method zrevrangebyscore will give you the correct order.
you can set your member of sorted set as:
'YEAR_MONTH_DATE_HOUR_MINUTE_SECONDS:question_id'
This way if you sort, questions with same score, will be returned in lexicographical order. That way question which came later will be placed higher if you use zrevrangebyscore.
You can create a hash map to map timestamp and question_id. for faster lookup
I asked a similar question, where I also purposed a solution. I want something different but it will do exactly what you want.
Redis zrevrangebyscore, sorting other than lexicographical order

In Redis are all hash keys stored in the same "table"? and if so how does it affect performance?

Looking at this example http://redis.io/topics/twitter-clone where user records are stored by using a hash key ("uid:1000") and "tweets" are stored by hash key ("post:60"), does this mean that all those records are stored in the same data structure and adding tweets will effect the time for retrieving user records?
Yes, the users and tweets are stored in the same data structure. That data structure is a hash table.
Internally, Redis has no concept of record types. As far as Redis is concerned,User:1000 and Post:60 are just a sequence of bytes. So yes, Redis does store all records in the same data structure.
Because Redis does not differentiate between Tweets and Users, the response times for both types of records is going to be similar.
So, everything boils down to the question - "Does Redis' performance scale to the number of records?"
The answer to that is YES, it does. As long as you have the memory to keep all your data, Redis' performance should not depend on the number of records.