"Grouping" data in Redis - redis

I have recently been looking at Redis and it seems almost perfect as I am doing something that mostly needs key-value based data structures.
As someone who has mostly used MySQL, I have got used to grouping data in tables, and I am confused because I have seen no mention of tables, or of any other way of grouping data, while reading about Redis. Does this mean there is no concept of tables in Redis?
For example, suppose I had a simple website where users could post comments about other users. In a relational database I could have a table "users" and a table "comments". How would this be done using Redis?
Hopefully this is clear enough, thanks in advance.

Yes, Redis is a super-powered key-value store, not a relational database. There are no tables.
However, you can still model grouped data. Take a look at LamerNews. It's a Hacker News-like site that uses Redis as its data store.

Users can be stored in a SET or a LIST in Redis.
User comments can be stored in a HASH whose field names have the form commenter:commented and whose values are the comment text. So if user1 comments on user2 with some text like "Hello hw do u do?", then our HASH, which we can call UserComments, will have this key and value:
Key = user1:user2
Value = "Hello hw do u do?"
From the HASH you can get all the comments posted by users at any time, and if you tokenize the key you get the commenter and the commented user.
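As a rough illustration of the layout described above, here is a minimal Python sketch that uses a plain set and dict as in-memory stand-ins for the Redis SET and the UserComments HASH. No Redis server is involved, and the helper names (add_comment, comments_by) are hypothetical, not part of any Redis client.

```python
# In-memory stand-ins: `users` plays the Redis SET, `user_comments` the HASH.
users = set()
user_comments = {}


def add_comment(commenter, commented, text):
    """Record a comment under the field name 'commenter:commented'."""
    users.update([commenter, commented])
    user_comments[f"{commenter}:{commented}"] = text


def comments_by(commenter):
    """Tokenize each field name and keep the comments written by `commenter`."""
    result = {}
    for field, text in user_comments.items():
        author, target = field.split(":", 1)
        if author == commenter:
            result[target] = text
    return result


add_comment("user1", "user2", "Hello, how do you do?")
add_comment("user3", "user2", "Nice profile")
print(comments_by("user1"))
```

One consequence of this layout, at least in this sketch, is that a second comment from the same commenter to the same user overwrites the first, because both map to the same field name.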

Related

Out of Process in memory database table that supports queries for high speed caching

I have a SQL table that is accessed continually but changes very rarely.
The Table is partitioned by UserID and each user has many records in the table.
I want to save database resources and move this table closer to the application in some kind of memory cache.
In process caching is too memory intensive so it needs to be external to the application.
Key Value stores like Redis are proving inefficient due to the overhead of serializing and deserializing the table to and from Redis.
I am looking for something that can store this table (or partitions of data) in memory, but let me query only the information I need without serializing and deserializing large blocks of data for each read.
Is there anything that would provide Out of Process in memory database table that supports queries for high speed caching?
Searching has shown that Apache Ignite might be a possible option, but I am looking for more informed suggestions.
Since it's out-of-process, it has to do serialization and deserialization. The problem you are concerned with is how to reduce the serialization/deserialization work. If you use Redis' STRING type, you CANNOT reduce this work.
However, you can use a HASH to solve the problem by mapping your SQL table to a HASH.
Suppose you have the following table: person: id(varchar), name(varchar), age(int). You can take the person id as the key and name and age as fields. When you want to look up someone's name, you only need to get the name field (HGET person-id name); the other fields won't be deserialized.
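To make the row-to-HASH mapping concrete, here is a small Python sketch with a dict of dicts standing in for Redis. The hset and hget functions below are local stand-ins that only mimic the shape of the HSET and HGET commands, and the key person:42 is a made-up example.

```python
# Stand-in for the Redis keyspace: key -> {field: value}.
store = {}


def hset(key, mapping):
    """Set one or more fields on the hash stored at `key`."""
    store.setdefault(key, {}).update(mapping)


def hget(key, field):
    """Return a single field of the hash at `key`, or None if absent."""
    return store.get(key, {}).get(field)


# One SQL row (id=42, name='alice', age=30) becomes one hash.
# Redis stores field values as strings, so age is kept as "30".
hset("person:42", {"name": "alice", "age": "30"})

# Only the requested field is read; the rest of the row is untouched.
print(hget("person:42", "name"))
```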
Ignite is indeed a possible solution for you since you may optimize serialization/deserialization overhead by using internal binary representation for accessing objects' fields. You may refer to this documentation page for more information: https://apacheignite.readme.io/docs/binary-marshaller
Also access overhead may be optimized by disabling copy-on-read option https://apacheignite.readme.io/docs/performance-tips#section-do-not-copy-value-on-read
Data collocation by user id is also possible with Ignite: https://apacheignite.readme.io/docs/affinity-collocation
As @for_stack said, a HASH will be very suitable for your case.
You said that each user has many rows in the database, indexed by user_id and tag_id. So (user_id, tag_id) uniquely specifies one row. Every row is functionally dependent on this tuple, so you can use the tuple as the HASH key.
For example, if you want to save the row (user_id, tag_id, username, age) with the values ("123456", "FDSA", "gsz", 20) into Redis, you could do this:
HMSET 123456:FDSA username "gsz" age 20
When you want to query the username by user_id and tag_id, you could do this:
HGET 123456:FDSA username
So every HASH key is a combination of user_id and tag_id. If you want the key to be more human-readable, you can add a prefix string such as "USERINFO", e.g. USERINFO:123456:FDSA.
BUT if you want to query with only a user_id and get all rows for that user_id, the method above is not enough.
For that, you can build a secondary index in Redis for your HASHes.
As said above, we use user_id:tag_id as the HASH key because it uniquely identifies one row. Now suppose we want to query all the rows for one user_id.
We can use a sorted set as a secondary index that records which HASHes store the info for this user_id.
We add the HASH key to the sorted set:
ZADD user_index 0 123456:FDSA
As above, we set the member to the HASH key string and the score to 0. The rule is that every member of this zset must have the same score (here 0), so that we can use lexicographical order for range queries; see ZRANGEBYLEX.
E.g. to get all rows for user_id 123456:
ZRANGEBYLEX user_index [123456 (123457
It returns all the HASH keys whose prefix is 123456; we then use each returned string as a HASH key with HGET or HMGET to retrieve the information we want.
[ means inclusive and ( means exclusive. Why 123457? It is the user_id with the ASCII value of its last character increased by 1, so it sorts just after every key that starts with 123456. So when we want all rows for a user_id, we build the upper bound by adding 1 to the last character of the user_id string.
For more about lexicographical indexes, refer to the ZRANGEBYLEX documentation mentioned above.
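The range query above can be tried out without a server. The following Python sketch mimics the inclusive-lower, exclusive-upper behaviour of ZRANGEBYLEX on a sorted list of members using the standard bisect module. The zrangebylex function and the sample keys are illustrative assumptions, not the Redis implementation.

```python
import bisect


def zrangebylex(members, lower_inclusive, upper_exclusive):
    """Mimic `ZRANGEBYLEX key [lower (upper` on members that share one score."""
    ordered = sorted(members)  # plain string order, like byte order for ASCII
    start = bisect.bisect_left(ordered, lower_inclusive)
    stop = bisect.bisect_left(ordered, upper_exclusive)
    return ordered[start:stop]


# Hypothetical index contents; note the longer user_id 1234560.
user_index = ["123456:FDSA", "123456:QWER", "1234560:ZZZZ", "123457:ABCD"]

# Bounds from the answer: [123456 (123457
loose = zrangebylex(user_index, "123456", "123457")

# Bounds that include the separator: [123456: (123456;  (';' follows ':' in ASCII)
strict = zrangebylex(user_index, "123456:", "123456;")

print(loose)
print(strict)
```

In this sketch the bounds [123456 (123457 also pick up the key of the longer id 1234560, because it shares the same prefix. If user_ids can differ in length, including the ":" separator in both bounds appears to be the safer choice; with fixed-length ids the original bounds behave the same.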
You can try Apache Mnemonic, which was started by Intel. Link: http://incubator.apache.org/projects/mnemonic.html. It supports serdeless (serialization-free) features.
For a read-dominant workload, the MySQL MEMORY engine should work fine (write DML statements lock the whole table). This way you don't need to change your data retrieval logic.
Alternatively, if you're okay with changing the data retrieval logic, then Redis is also an option. To add to what @GuangshengZuo has described, there's ReJSON, a dynamically loadable Redis module (for Redis 4+) which implements a document store on top of Redis. It can further relax the requirements for marshalling big structures back and forth over the network.
With just 6 principles (which I collected here), it is very easy for a SQL-minded person to adapt to the Redis approach. Briefly, they are:
Most importantly, don't be afraid to generate lots of key-value pairs. Feel free to store each row of the table under a different key.
Use Redis' hash map data type.
Form the key name from the primary key values of the table, joined by a separator (such as ":").
Store the remaining fields as a hash.
When you want to query a single row, form the key directly and retrieve its result.
When you want to query a range, use the wildcard "*" in the key pattern. But please be aware that scanning the keyspace with KEYS blocks other Redis operations while it runs, so use this method only if you really have to.
The link just gives a simple table example and shows how to model it in Redis. Following those 6 principles, you can continue to think the way you do for normal tables. (Of course without some not-so-relevant concepts such as CRUD, constraints, relations, etc.)
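The key-naming principles above can be sketched in a few lines of Python. A dict stands in for the Redis keyspace and fnmatch stands in for glob-style key matching; make_key, save_row and match_keys are hypothetical helpers. Against a real server, SCAN with a MATCH pattern is the incremental alternative to KEYS.

```python
from fnmatch import fnmatchcase

rows = {}  # stand-in for the Redis keyspace: key -> hash of remaining fields


def make_key(table, *primary_key_values):
    """Join the table name and primary key values with ':'."""
    return ":".join([table, *map(str, primary_key_values)])


def save_row(table, primary_key_values, fields):
    """Store one table row as one hash under its own key."""
    rows[make_key(table, *primary_key_values)] = dict(fields)


def match_keys(pattern):
    """Return keys matching a glob pattern (walks every key, like KEYS)."""
    return sorted(key for key in rows if fnmatchcase(key, pattern))


save_row("USERINFO", (123456, "FDSA"), {"username": "gsz", "age": "20"})
save_row("USERINFO", (123456, "QWER"), {"username": "gsz", "age": "20"})
save_row("USERINFO", (777, "FDSA"), {"username": "other", "age": "31"})

# Single row: form the key directly.
print(rows[make_key("USERINFO", 123456, "FDSA")]["username"])

# Range: wildcard on the key.
print(match_keys("USERINFO:123456:*"))
```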
Using a combination of Memcache and Redis on top of MySQL comes to mind.

Mysql representation in redis

I don't have any experience with Redis and I would like to use it to cache MySQL results in PHP, but it is getting a bit complicated.
In MySQL I have two tables:
tbl_users: id, username, first_name, last_name
tbl_orders: order_id, order_name, order_date........
Suppose I would like to store the records of both the users and orders tables in Redis. I pictured a JSON document looking like this:
{
"users":{
"1":{"username":"jane", "first_name":"user1", "last_name":".."},
"2":{"username":"jane", "first_name":"user1", "last_name":".."},
...........
},
"orders":{
"1":{"order_name":"jane", "order_date":"user1", ....},
"2":{"order_name":"jane", "order_date":"user1", ....}
}
}
In this case, should I create two Redis servers, one for users and the other for orders, or how should I go about this?
First, you should read An introduction to Redis data types and abstractions, to understand what Redis is and what it can do.
It sounds like you have 3 things you want to cache in Redis:
A specific user:
This is what Redis is best at.
SET user:1 '{username:"jane", first_name:"user1", last_name:".."}',
then read the user JSON back with GET user:1
All users:
This probably isn't worth caching in Redis, since MySQL will be quite fast at returning all the rows in tbl_users already.
You could use a Redis list to cache it (adding entries with LPUSH and reading all of them back with LRANGE {listname} 0 -1).
But you're just duplicating the functionality of a database table at this point, so I really wouldn't recommend it.
Orders for a specific user:
Again, MySQL should be able to do this efficiently for you, with a good index on tbl_orders, but Redis can cache it using a list as well.
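A rough Python sketch of the first and third cases above: one JSON string per user, and one list of JSON strings per user's orders. Plain dicts stand in for Redis string keys and lists, so the sketch runs without a server. The function names and the key user:{id}:orders are assumptions made for illustration.

```python
import json

cache = {}        # stand-in for Redis string keys (SET / GET)
cache_lists = {}  # stand-in for Redis lists (LPUSH or RPUSH / LRANGE)


def cache_user(user):
    """Store one user row as a JSON string under user:{id}."""
    cache[f"user:{user['id']}"] = json.dumps(user)


def get_user(user_id):
    """Read a cached user back, or None on a cache miss."""
    raw = cache.get(f"user:{user_id}")
    return json.loads(raw) if raw is not None else None


def cache_orders(user_id, orders):
    """Store a user's orders as a list of JSON strings."""
    cache_lists[f"user:{user_id}:orders"] = [json.dumps(o) for o in orders]


def get_orders(user_id):
    """Read all cached orders for a user (like LRANGE key 0 -1)."""
    return [json.loads(o) for o in cache_lists.get(f"user:{user_id}:orders", [])]


cache_user({"id": 1, "username": "jane", "first_name": "Jane", "last_name": "Doe"})
cache_orders(1, [{"order_id": 10, "order_name": "book"}])

print(get_user(1)["username"])
print(get_orders(1))
```

On a miss (get_user returning None) the application would fall back to MySQL and then populate the cache; a single Redis server can hold both kinds of keys.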

Design Redis database table like SQL?

Suppose my database table structure is like this
id name college address
1 xxx nnn xn
2 yyy nnm yn
3 zzz nnz zn
If I want to get the student details based on the name, in SQL I would write:
select * from student where name = 'xxx'
How is this possible in a Redis database?
Redis, like other NoSQL datastores, has different requirements based on what you are going to be doing.
Redis has several data structures that could be useful depending on your need. For example, given your desire for a select * from student where name = 'xxx' you could use a Redis hash.
redis 127.0.0.1:6379> hmset student:xxx id 1 college nnn address xn
OK
redis 127.0.0.1:6379> hgetall student:xxx
1) "id"
2) "1"
3) "college"
4) "nnn"
5) "address"
6) "xn"
If you have other queries though, like you want to do the same thing but select on where college = 'nnn' then you are going to have to denormalize your data. Denormalization is usually a bad thing in SQL, but in NoSQL it is very common.
If your primary query will be against the name, but you may need to query against the college, then you might do something like adding a set in addition to the hashes.
redis 127.0.0.1:6379> sadd college:nnn student:xxx
(integer) 1
redis 127.0.0.1:6379> smembers college:nnn
1) "student:xxx"
With your data structured like this, if you wanted to find all information for the students going to college nnn, you would first read the set college:nnn, then fetch each hash named by the members returned from the set.
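The hash-plus-set design from this answer can be sketched in Python with dicts standing in for Redis hashes and sets. The data comes from the table in the question; add_student and students_at are hypothetical helper names.

```python
hashes = {}  # key -> {field: value}, stand-in for Redis hashes
sets = {}    # key -> set of members, stand-in for Redis sets


def add_student(name, student_id, college, address):
    """Write the student hash and add its key to the college's set."""
    key = f"student:{name}"
    hashes[key] = {"id": str(student_id), "college": college, "address": address}
    sets.setdefault(f"college:{college}", set()).add(key)


def students_at(college):
    """Read the college set (SMEMBERS), then fetch each hash (HGETALL)."""
    members = sets.get(f"college:{college}", set())
    return {key: hashes[key] for key in sorted(members)}


add_student("xxx", 1, "nnn", "xn")
add_student("yyy", 2, "nnm", "yn")
add_student("zzz", 3, "nnz", "zn")

print(hashes["student:xxx"])   # select * from student where name = 'xxx'
print(students_at("nnn"))      # select * from student where college = 'nnn'
```

The set is denormalized data: every write to a student has to keep it in step with the hash, which is the cost of answering the second query without a scan.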
Your requirements will generally drive the design and the structures you use.
With just 6 principles (which I collected here), it is very easy for a SQL-minded person to adapt to the Redis approach. Briefly, they are:
Most importantly, don't be afraid to generate lots of key-value pairs. Feel free to store each row of the table under a different key.
Use Redis' hash map data type.
Form the key name from the primary key values of the table, joined by a separator (such as ":").
Store the remaining fields as a hash.
When you want to query a single row, form the key directly and retrieve its result.
When you want to query a range, use the wildcard "*" in the key pattern.
The link just gives a simple table example and shows how to model it in Redis. Following those 6 principles, you can continue to think the way you do for normal tables. (Of course without some not-so-relevant concepts such as CRUD, constraints, relations, etc.)
For plain, vanilla Redis the other answers are completely correct. However, yesterday (02 December 2016) Redis 4-rc1 was released.
Redis v4 provides support for modules, and I just wrote a small module that embeds SQLite into Redis itself: rediSQL.
With that module you can actually use a fully functional SQL database inside your Redis instance.
Redis only comes with some basic data structures; NoSQL and SQL are different worlds. But you can use Redis like a SQL data store with a schema. There is a fun program on GitHub, Redisql, which tries to work with Redis via SQL, and the idea behind Redisql is the one @sberry mentioned.
I hope it is not too late, since the original question is six years old. You may try my dbx plugin: https://github.com/cscan/dbx
It supports simple SQL for maintaining the hashes in Redis. Something like this:
127.0.0.1:6379> dbx.select name, tel from phonebook where gender = 'F' order by age desc
or calling from shell
$ redis-cli "dbx.select name, tel from phonebook where gender = 'F' order by age desc"
Hope this helps.
You can try searchbox framework. searchbox provides easy way for querying redis data with its Criteria api.
OnceDB is a full-text search in-memory database based on Redis. It supports data management like SQL relational databases and NoSQL schemaless databases.
OnceDB does not change the data storage structure of Redis and is fully compatible with Redis. Redis database files can be directly operated in OnceDB and then returned to Redis for use.
OnceDB automatically creates auxiliary indexes through operators:
= Ordinary field value, no index
# Primary key
? Grouping index
* Keyword grouping index, separated by ',' between keywords
\ Sort index, the score weight of the index is the value of the field
For example, execute the following command to add the user data:
upsert user username # dota password = 123456 title ? SDEI skills * java,go,c
> OK
You can search the index with an operator, for example to find user data containing the keyword c and print the username and password fields:
find user 0 -1 username = * password = * skills * c
1) (integer) 1
2) "user:dota"
3) "dota"
4) "123456"
5) "java,go,c"
Read more: OnceDB quick start
In SQL database design, we first put everything into the database and then figure out how we will query it.
In Redis design, we first figure out which queries we need to answer, and then we structure our data around them.
That is one reason Redis queries are fast: the data is already laid out for them. Redis stores data as a hash in some cases. If the record has many attributes (in your case, a student might have "age, name, class" attributes), storing "student" as a hash is useful.
In Redis, when you build your application, you have to look at what you are going to store (users, sessions, products) and, based on those things, plan which data structure to use for each of them.

In Redis are all hash keys stored in the same "table"? and if so how does it affect performance?

Looking at this example http://redis.io/topics/twitter-clone where user records are stored under a hash key ("uid:1000") and "tweets" are stored under a hash key ("post:60"), does this mean that all those records are stored in the same data structure, and that adding tweets will affect the time for retrieving user records?
Yes, the users and tweets are stored in the same data structure. That data structure is a hash table.
Internally, Redis has no concept of record types. As far as Redis is concerned, User:1000 and Post:60 are just sequences of bytes. So yes, Redis does store all records in the same data structure.
Because Redis does not differentiate between tweets and users, the response times for both types of records are going to be similar.
So, everything boils down to the question: "Does Redis' performance scale with the number of records?"
The answer to that is YES, it does. Looking up a key in a hash table takes roughly constant time, so as long as you have the memory to keep all your data, Redis' performance should not depend on the number of records.

Encrypted database query

I've just found out about Stack Overflow and I'm just checking if there are ideas for a constraint I'm having with some friends in a project, though this is more of a theoretical question to which I've been trying to find an answer for some time.
I'm not much given into cryptography but if I'm not clear enough I'll try to edit/comment to clarify any questions.
Trying to be brief, the environment is something like this:
An application where the front end has access to the encrypt/decrypt keys and the back end is just used for storage and queries.
A database in which you can't read a couple of fields, for example "address", which is text/varchar as usual.
You don't have access to the key for decrypting the information, and all information arrives at the database already encrypted.
The main problem is how to consistently run queries on the database; it's impossible to do things like "where address like '%F§YU/´~#JKSks23%'". (If anyone has an answer for this, feel free to shoot.)
But is it OK to do where address='±!NNsj3~^º-:'? Or would that also completely eat up the database?
Another constraint that might apply is that the front end doesn't have much processing power available, so encrypting/decrypting information already pushes it to its limits. (I mention this just to avoid replies like "Export a join of tables to the front end and query it there".)
Could someone point me in a direction to keep thinking about it?
Well, thanks for such fast replies at 4 AM; for a first-time user I'm really impressed with this community. (Or maybe it's just the different time zone.)
Just feeding some information:
The main problem is all about partial matching, since allowing partial matches is a mandatory requirement for most databases. The main constraint is that the database owner must not be allowed to look inside the database for information. During the last 10 minutes I've come up with a possible solution, which again leads to possible database problems, so I'll add it here:
Possible solution to allow semi partial matching:
The password + a couple of public fields of the user are actually the key for encrypting. For authentication the idea is to encrypt a static value and compare it within the database.
Creating a new set of tables where information is stored in a parsed way, meaning something like: "4th Street" would become 2 encrypted rows (one for '4th' another for 'Street'). This would already allow semi-partial matching as a search could already be performed on the separate tables.
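The word-level idea above can be sketched in Python. This sketch swaps in a keyed hash (HMAC-SHA256) as a deterministic stand-in for the encryption of each word, since equality search only needs equal words to produce equal stored values; it is not the scheme from the question. SECRET_KEY, the lowercasing step, and the helper names are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical key that only the front end would hold.
SECRET_KEY = b"demo-key-held-by-the-front-end"


def token(word):
    """Deterministic keyed hash of one word (stand-in for an encrypted row)."""
    return hmac.new(SECRET_KEY, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def index_address(index, row_id, address):
    """Split the address into words and store one token per word."""
    for word in address.split():
        index.setdefault(token(word), set()).add(row_id)


def search(index, word):
    """Exact-match lookup on a whole word; the store only ever sees tokens."""
    return index.get(token(word), set())


word_index = {}  # stand-in for the extra table of per-word rows
index_address(word_index, 1, "4th Street")
index_address(word_index, 2, "Main Street")

print(sorted(search(word_index, "street")))
print(sorted(search(word_index, "4th")))
print(sorted(search(word_index, "Str")))
```

As the last lookup shows, this only matches whole words, so it is semi-partial matching at best. Because equal words always map to equal tokens, whoever can read the table can still see how often each (unknown) word occurs.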
New question:
Would this probably eat up the database server again, or does anyone think it is a viable solution for the partial matching problem?
Post Scriptum: I've unaccepted the answer from Cade Roux just to allow for further discussion and specially a possible answer to the new question.
You can do it the way you describe, effectively querying the hash, say. But there aren't many systems with that requirement, because at that point the security requirements interfere with the other requirements that make the system usable, i.e. no partial matches, since the encryption rules them out. It's the same problem as with compression. Years ago, in a very small environment, I had to compress the data before putting it in the data format. Of course, those fields could not easily be searched.
In a more typical application, ultimately, the keys are going to be available to someone in the chain - probably the web server.
For end user traffic SSL protects that pipe. Some network switches can protect it between web server and database, and storing encrypted data in the database is fine, but you're not going to query on encrypted data like that.
And once the data is displayed, it's out there on the machine, so any general purpose computing device can be circumvented at that point, and you have perimeter defenses outside of your application which really come into play.
Why not encrypt the disk holding the database tables, encrypt the database connections, and let the database operate normally?
[I don't really understand the context/constraints that require this level of paranoia]
EDIT: "law constraints" eh? I hope you're not involved in anything illegal, I'd hate to be an inadvertent accessory... ;-)
If the - ahem - legal constraints - force this solution, then that's all there is to be done: no LIKE matches, and slow response if the client machines can't handle it.
A few months ago I came across the same problem: the whole database (except for the indexes) is encrypted, and the problem of partial matches came up.
I searched the Internet looking for a solution, but it seems that there's not much to do about this except a "workaround".
The solution I finally adopted is:
Create a temporary table with the decrypted data of the field the query is run against, plus another field that is the primary key of the table (obviously, this field doesn't have to be decrypted, as it is plain text).
Perform the partial match against that temporary table and retrieve the identifiers.
Query the real table for those identifiers and return the result.
Drop the temporary table.
I am aware that this supposes a non-trivial overhead, but I haven't found another way to perform this task when it is mandatory that the database is fully encrypted.
Depending on each particular case, you may be able to filter the number of lines that are inserted into the temporary table without losing data for the result (only consider those rows which belong to the user that is performing the query, etc...).
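The four steps of the temporary-table workaround can be sketched with Python's built-in sqlite3 module. This is only an illustration under loud assumptions: the table and column names are made up, and base64 is used purely as a placeholder for the real cipher (it is NOT encryption).

```python
import base64
import sqlite3


def encrypt(text):
    # Placeholder only: base64 is NOT encryption, it stands in for a real cipher.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decrypt(blob):
    return base64.b64decode(blob.encode("ascii")).decode("utf-8")


def partial_match(conn, fragment):
    """Return (id, encrypted address) rows whose plaintext contains `fragment`."""
    # Step 1: temporary table with the plaintext primary key and decrypted field.
    conn.execute("CREATE TEMP TABLE tmp_address (id INTEGER PRIMARY KEY, address TEXT)")
    try:
        rows = conn.execute("SELECT id, address FROM customers").fetchall()
        conn.executemany(
            "INSERT INTO tmp_address (id, address) VALUES (?, ?)",
            [(row_id, decrypt(blob)) for row_id, blob in rows],
        )
        # Step 2: partial match against the temporary table, keep the ids.
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM tmp_address WHERE address LIKE ? ORDER BY id",
            (f"%{fragment}%",),
        )]
        if not ids:
            return []
        # Step 3: query the real (encrypted) table for those ids.
        placeholders = ",".join("?" * len(ids))
        return conn.execute(
            f"SELECT id, address FROM customers WHERE id IN ({placeholders}) ORDER BY id",
            ids,
        ).fetchall()
    finally:
        # Step 4: drop the temporary table.
        conn.execute("DROP TABLE tmp_address")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT)")
conn.executemany(
    "INSERT INTO customers (id, address) VALUES (?, ?)",
    [(1, encrypt("4th Street")), (2, encrypt("Main Avenue")), (3, encrypt("12 Street Road"))],
)

matches = partial_match(conn, "Street")
print([row_id for row_id, _ in matches])
```

Note that whatever runs this code needs the decryption key and briefly holds plaintext in the temporary table, which is the overhead and the trade-off the answer describes.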
You want to use MD5 hashing. Basically, it takes your string and turns it into a fixed-length hash that cannot be reversed back into the original. You can then use it for exact-match checks later. For example:
$salt = "123-=asd";
$address = "3412 g ave";
// Store only the salted MD5 hash of the address, never the plain text.
$sql = "INSERT INTO addresses (address) VALUES ('" . md5($salt . $address) . "')";
// Legacy mysql_* API as in the original answer; it was removed in PHP 7 (use mysqli or PDO there).
mysql_query($sql);
Then, to validate an address in the future:
$salt = "123-=asd";
$address = "3412 g ave";
// Hash the candidate address the same way and compare hashes.
$sql = "SELECT address FROM addresses WHERE address = '" . md5($salt . $address) . "'";
$res = mysql_query($sql);
if (mysql_fetch_row($res)) {
    // exists
} else {
    // does not
}
Now only the hash is stored on the database side, so nobody can read the address there, even if they looked in your source code. However, someone who finds the salt can hash candidate addresses themselves and compare, which makes guessing much easier.
http://en.wikipedia.org/wiki/MD5
If you need to store sensitive data that you want to query later, I'd recommend storing it in plain text and restricting access to those tables as much as you can.
If you can't do that, and you don't want overhead in the front end, you can build a component in the back end, running on a server, that processes the encrypted data.
Making queries on encrypted data? If you're using a good encryption algorithm, I can't imagine how to do that.