Redis: how to store a list of user hashes and retrieve it? - redis

I started using redis today and I've been through the tutorial and some links on Stack Overflow, but I'm failing to understand how to properly use redis for what seems to be a very simple use case.
Goal: Save several users data into redis and read all of the users at once.
I start a redis client and begin by adding the first user, which has id 1:
127.0.0.1:6379> hmset user:1 name "vitor" age 35
OK
127.0.0.1:6379> hgetall user:1
1) "name"
2) "vitor"
3) "age"
4) "35"
I add a couple more users with several commands like this one:
127.0.0.1:6379> hmset user:2 name "nuno" age 10
I was (probably wrongly) expecting to be able to now query all my users by doing:
hgetall "user:"
or even
hgetall "user:*"
The fact that I've not seen anything like this in the tutorials kind of tells me that I'm not using redis the right way for this use case.
Would you be able to tell me what should be the approach for this use case?

To understand why these kinds of operations seem non-trivial in NoSQL implementations, it's good to think about why NoSQL exists (and has become very popular) at all.
When you look at an early NoSQL implementation like memcached, the first use case was very simple, but very important: a blazingly fast cache for distributed data, for example to cache web page data. Very quickly, features like clustering and sharding were added, so that not all data has to be available at every single node in the cluster at once, but can be gathered on demand.
NoSQL is very different from relational data storage. Don't overuse it. Consider relational databases as well, as they are sometimes far more suited for what you are trying to accomplish. In everything you design, ask yourself "Does this scale well?".
Okay, back to your question. Wildcard searches are in general bad practice. Instead, you prepare your data so that you can retrieve it in a scalable way.
Redis is a very chic solution, allowing you to overcome a lot of NoSQL limitations in an elegant way.
If getting "a list of all users" isn't something you have to do very often, or doesn't need to scale well, is always "I really always want all users" because it's for a daily scan anyway, use HSCAN. SCAN operations with a proper batch size don't get in the way of other clients, you can just retrieve your records a couple of thousand at a time, and after a few calls you've got everything.
You can also store your users in a SET. There's no ordering in a set, so no pagination. It can help to keep your user names unique.
If you want to do things like "get me all users that start with the letter 'a'", I'd use a ZSET. I'd wait a week or two for ZRANGEBYLEX, which is just about to be released and in the works as we speak. Or use an ORM like Josiah Carlson's 'rom' package.
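A minimal sketch of the ZSET idea, assuming all members get score 0 so the ordering is purely lexicographic (the key and names are just illustrative):
ZADD usernames 0 "andre" 0 "anna" 0 "bob"
ZRANGEBYLEX usernames [a (b
The second command returns every member that is >= "a" and < "b", i.e. all usernames starting with 'a'.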
When you ask yourself "But now I have to do three calls instead of one when storing my data...?!": yup, that's how it works. If you need atomicity, use a Lua script, or MULTI+EXEC pipelining. Lua is generally easier.
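A hedged sketch of the MULTI+EXEC variant, assuming you maintain both the per-user hash and a set of user ids (the key names and values are illustrative):
MULTI
HMSET user:3 name "ana" age 28
SADD users:ids 3
EXEC
Both commands are queued and then executed atomically when EXEC runs.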
You can also ask yourself whether a hash (HSET) is needed at all. Do you need to retrieve the individual data members? Each key or member has some overhead. On top of that, HGETALL has a Big-O specification of O(N) in the number of fields, so it doesn't scale well for huge hashes. It might be better to serialize your row as a whole, using JSON or MsgPack, and store it in one hash member, or just with a simple GET/SET. Also read up on SORT.
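A sketch of the serialize-the-whole-row idea as a plain string key (this replaces the hash layout, so the key must not already hold a hash; the JSON encoding is just one option):
SET user:1 '{"name":"vitor","age":35}'
GET user:1
One GET now returns the whole record in a single O(1) lookup; the trade-off is that you can no longer read or update a single field server-side.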
Hope this helps, TW

If you still want to use Redis, you can use something like:
SADD users "{"userId":1,"name":John, "vitor":x,"age:35}"
SADD users "{"userId":2,"name":xt, "vitor":x,"age:43}"
...
And you can retrieve them using:
SMEMBERS users

Related

Proper way of caching data in Redis

I'm trying to build a freelance platform using MongoDB as the main database and Redis for caching, but I really couldn't figure out the proper way of caching. Basically I'll store JWT tokens, verification codes and other stuff with an expiration date. On the other hand, let's say I'll store 5 big collections such as Gigs, JobOffers, Reviews, Users, Companies. I also want to query them.
Example Use Case 1
Getting job offers only categorised as "Web Design"
Example Use Case 2
Getting job offers only posted by Company X
Option 1
For these two queries I can create two hashes:
hash1
key "job-offers:<categoryId>", field jobOfferId, value JobOffer
hash2
key "job-offers:<companyId>", field jobOfferId, value JobOffer
Option 2
Using RedisJSON and RediSearch for querying, holding everything in JSON format
Option 3
Using RediSearch while creating multiple hashes
I couldn't figure out which approach would be best, or whether there is another approach that is better than these.
Option 1 seems suitable for your scenario. Binding job offers to category or company ids is the smartest solution.
You can use HGETALL to get all field data from your hash.
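As a rough sketch of option 1, assuming the category or company id is embedded in the key, the job offer id is the field and the serialized offer is the value (all ids and names here are illustrative):
HSET job-offers:category:7 1001 '{"title":"Landing page redesign","companyId":42}'
HSET job-offers:company:42 1001 '{"title":"Landing page redesign","categoryId":7}'
HGETALL job-offers:category:7
Each of your example use cases then becomes a single HGETALL on the matching hash.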
When using Redis as a request-caching mechanism, remember that you have to keep the Redis cache consistent with the SQL or NoSQL database it is generated from.
Good question.
As far as I can see, Redis data (and part of Mongo's) is stored in RAM, and RAM is more expensive than hard disk. If you don't care about the price, can handle your situations with Redis/Mongo, and the data can be recovered from AOF/RDB files (or things like that), then you can use whichever you want.
If you do care about the price of RAM, you can just use MySQL with the InnoDB engine: it is cheap, stores data on disk, can recover, and a lot of people use these databases (MySQL, Postgres).
If I were you, I would probably choose MySQL with InnoDB and build the right indexes; it is fast enough for tables that hold millions of rows (it gets less comfortable at hundreds of millions of rows).

Relational or document database for storing instant messages? Maybe something else?

I started playing around with RavenDB a few days ago. I like it this far, but I am pretty new to the whole NoSQL world. I am trying to think of patterns when to prefer it (or any other DocumentDB or any other NoSQL-kind of data store) over traditional RDBMSs. I do understand that "when you need to store documents or unstructured/dynamically structured data opt for DocumentDB" but that just feels way too general to grasp.
Why? Because from what I've read, people have been writing examples of "documents" such as the order details in an e-commerce application and the form details of a workflow management application. But these have been built with RDBMSs for ages without too much trouble - for example, the details of an order, such as quantity, total price, discount, etc., are perfectly structured.
So I think there's an overlap here. But now, I am not asking for general advice on when to use what, because I believe the best thing for me would be to figure it out myself through experimenting; so I am just going to ask about a concrete case along with my concerns.
So let's say I develop an instant messenger application which stores messages going ages back, like Facebook's messaging system does. I think an RDBMS is not suitable here. My reason for this is that most people use instant messaging systems like this:
A: hi
B: hey
A: how r u?
B: good, u?
A: me 2
...
The thing to note is that most messages are very short, so storing each in a single row with this structure:
Messages(fromUserId, toUserId, sent, content)
feels very inefficient, because the actual useful information (the content) is very small, whereas the table would contain an enormous number of rows and therefore the indexes would grow huge. Add to this the fact that messages are sent very frequently, and the size of the indexes would have a huge impact on performance. So a very large number of rows must be managed and stored while every row contains a minimal amount of actual information.
In RavenDB, I would use a structure such as this:
// a Conversation object
{
    "FirstUserId": "users/19395",
    "SecondUserId": "users/19396",
    "Messages": [
        {
            "Order": 0,
            "Sender": "Second",
            "Sent": "2016-04-02T19:27:35.8140061",
            "Content": "lijhuttj t bdjiqzu "
        },
        {
            "Order": 1,
            "Sender": "Second",
            "Sent": "2016-04-02T19:27:35.8200960",
            "Content": "pekuon eul co"
        }
    ]
}
With this structure, I only need to find out which conversation I am looking for: the one between User A and User B. Any message between User A and User B is stored in this object, regardless of whether User A or User B was the sender. So once I find the conversation between them - and there are far fewer conversations than actual messages - I can just grab all of the messages associated with it.
However, if the two participants talk a lot (and assuming that messages are stored for, let's say, 3 years) there can be tens of thousands of messages in a single conversation causing the object to grow very large.
But there is one thing I don't know about how RavenDB specifically works. Do its internal storage and query mechanisms allow the DB engine (not the client) to grab just the, say, 50 most recent messages without reading the whole object? After all, it uses indexing on the properties of objects, but I haven't found any information about whether reading parts of an object is possible DB-side (that is, whether the DB engine can send back just the required parts to the client without reading the whole object from disk and parsing it).
If it is possible, I think using Raven is a better option in this scenario; if not, then I am not sure. So please help me clear it up by answering the question in the previous paragraph, along with any advice on which DB model would suit this scenario best. RDBMSs? DocDBs? Maybe something else?
Thanks.
I would say the primary distinctions will be:
Does your application consume the data in JSON? -- Then store it as JSON (in a document DB) and avoid serializing/deserializing it.
Do you need to run analytical workloads on the data? -- Then use SQL
What consistency levels do you need? -- SQL is made for high consistency, docDBs are optimized for lower consistencies
Does your schema change much? -- then use a (schema-less) docDB
What scale are you anticipating? -- docDBs are usually easier to scale out
Note also that many modern cloud document databases (like Azure DocDB) can give you the best of both worlds as they support geo-replication, schema-less documents, auto-indexing, guaranteed latencies, and SQL queries. SQL Databases (like AWS Aurora) can handle massive throughput rates, but usually still require more hand-holding from a DBA.

Caching temporary data - PostgreSQL and Mongo

I have some data from an API I need to cache. This data I want invalidated after X days, but I want it available locally to save time querying and compiling things for the end user.
Presently I have a PostgreSQL database. I want to keep this around because there's permanent data like user records I don't want to put in Mongo (unless you guys can convince me otherwise). I really have nothing against Mongo, but I can normalize some things with users and the only way I could think to do it without massive amounts of duplication is via PostgreSQL.
Now my API data is flat, and in JSON. I don't need to create any sort of link to any other table, and it has a field that I can use as a key pretty easily. My idea is to literally "throw" the data into a Mongo instance and query as needed, invalidating every X days. This also offers some persistence should the server go down for whatever reason.
So my questions to you guys are these. Is this a good use case for Mongo over memcached? Should I just memcache the raw data instead? If you guys do suggest Mongo, should I move my users table and the relations over to Mongo as well?
Thanks!
This is the sort of thing Redis is really good for. Redis, possibly with selective cache invalidation via PostgreSQL's LISTEN and NOTIFY, is a pretty low pain way to manage caching.
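If you do go with Redis, a minimal sketch of the "invalidate after X days" part, assuming each API record is cached as a JSON string under a key derived from its natural id (the key name and the 7-day TTL are illustrative):
SET api:item:123 '{"payload":"..."}' EX 604800
TTL api:item:123
The EX option expires the entry after the given number of seconds (604800 = 7 days), so Redis drops stale entries by itself and you don't need a separate invalidation job.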
Another option is to use UNLOGGED tables in PostgreSQL.

Redis full text search : reverse indexing or sunspot?

I have 3.5 million records (read-only) currently stored in a MySQL DB that I want to pull out into Redis for performance reasons. Actually, I've managed to store things like this in Redis:
1 {"type":"Country","slug":"albania","name_fr":"Albanie","name_en":"Albania"}
2 {"type":"Country","slug":"armenia","name_fr":"Arménie","name_en":"Armenia"}
...
The key I use here is the legacy MySQL id, so with some Ruby glue I can break as few things as possible in this existing app (and this is a serious concern here).
Now the problem is when I need to search for the keyword "Armenia" inside the value part. It seems like there are only two ways out:
Either I multiply the Redis indexes:
id => JSON values (as shown above)
slug => id (reverse indexing based on the slug, that could do the basic search trick)
finally, another huge index specifically for autocomplete, as shown in this post : http://oldblog.antirez.com/post/autocomplete-with-redis.html
Or I use sunspot or some full-text search engine (unfortunately, I currently use ThinkingSphinx, which is too tied to MySQL :-(
So, what would you do? Do you think moving a single table from MySQL to Redis is even a good idea? I'm afraid of the memory footprint those gigantic Redis key/values could take on a 16GB RAM server.
Any feedback on a similar Redis usage?
Before I start with a real answer, I wanted to mention that I don't see a good reason for you to be using Redis here. Based on the kinds of use cases you're describing, something like Elasticsearch would be more appropriate for you.
That said, if you just want to be able to search for a few different fields within your JSON, you've got two options:
Auxiliary index that points field_key -> list_of_ids (in your case, "Armenia" -> 2).
Use Lua on top of Redis with JSON encoding and decoding to get at what you want. This is way more flexible and space efficient, but will be slower as your table grows.
Again, I don't think either is appropriate for you because it doesn't sound like Redis is going to be a good choice for you, but if you must, those should work.
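If you do go that route, here's a rough sketch of option 1, the auxiliary index, using the records from the question (the index key name is just one possible convention; the JSON values stay under the legacy id keys as in the question):
SADD name_en:Armenia 2
SMEMBERS name_en:Armenia
GET 2
SMEMBERS gives you the ids matching the keyword, and a GET per id returns the stored JSON.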
Here's my take on Redis.
Basically I think of it as an in-memory cache that can be configured to evict the least recently used data (LRU) when it runs out of memory. That is the role I made it play in my use case, and the logic may help you think about yours.
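For reference, a minimal sketch of that configuration in redis.conf (the 2gb limit is just an example value; the same settings can be applied at runtime with CONFIG SET):
maxmemory 2gb
maxmemory-policy allkeys-lru
With allkeys-lru, Redis evicts the least recently used keys once the memory limit is reached; volatile-lru restricts eviction to keys that have a TTL.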
I'm currently using Redis to cache results for a search engine based on some complex (slow) queries, backed by data in another DB (similar to your case). So Redis serves as cache storage for answering queries. All queries are served either from the data in Redis or, on a cache miss, from the DB. So, note that Redis is not replacing the DB, but merely extending it via a cache in my case.
This fit my specific use case, because the addition of Redis was supposed to assist future scalability. The idea is that repeated access of recent data (in my case, if a user does a repeated query) can be served by Redis, and take some load off of the DB.
Basically my Redis schema ended up looking somewhat like the index duplication you outlined above. I used sets and sorted sets to create "batches / sets" of redis-keys, each of which pointed to specific query results stored under a particular redis-key. And in the DB, I still had the complete data set and an index.
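As a loose illustration of that layout (all key names here are invented for the example, not my actual schema):
SET query:af39c1 '{"ids":[12,57,344]}'
ZADD user:45:queries 1712000000 query:af39c1
ZRANGE user:45:queries 0 -1
The string key holds one cached query result, and the sorted set (scored by a timestamp) tracks which result keys belong to that user.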
If your data set fits in RAM, you could do the "table dump" into Redis and get rid of the need for MySQL. I could see this working, as long as you plan for persistent Redis storage and for the possible growth of your data, if this "table" will grow in the future.
So depending on your actual use case, how you see Redis fitting into your stack, and the load your DB serves, don't rule out the possibility of having to do both of the options you outlined above (which is what happened in my case).
Hope this helps!
Redis does provide Full Text Search with RediSearch.
RediSearch implements a search engine on top of Redis. It also enables more advanced features, like exact phrase matching, auto-suggestions and numeric filtering for text queries, that are not possible or efficient with traditional Redis search approaches.
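For completeness, a rough sketch assuming the RediSearch 2.x module is loaded and each country is stored as a hash under a country: prefix (the index and field names are illustrative):
FT.CREATE countryIdx ON HASH PREFIX 1 country: SCHEMA name_en TEXT name_fr TEXT slug TAG
HSET country:2 slug "armenia" name_fr "Arménie" name_en "Armenia"
FT.SEARCH countryIdx "Armenia"
FT.SEARCH returns the matching hashes, so a keyword query no longer needs a hand-rolled reverse index.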

Redis set vs hash

In many Redis tutorials (such as this one), data is stored in a set, but with multiple values combined together in a string (i.e. a user account might be stored in the set as two entries, "user:1000:username" and "user:1000:password").
However, Redis also has hashes. It seems that it would make more sense to have a "user:1000" hash, which contains a "username" entry and a "password" entry. Rather than concatenating strings to access a particular value, you just access them directly in the hash.
So why isn't it used as much? Are these just old tutorials? Or do Redis hashes have performance issues?
Redis hashes are good for storing more complex data, like you suggest in your question. I use them for exactly that - to store objects with multiple attributes that need to be cached (specifically, inventory data for a particular product on an e-commerce site). Sure, I could use a concatenated string - but that adds unneeded complexity to my client code, and updating an individual field is not possible.
You may be right - the tutorials may simply be from before Hashes were introduced. They were clearly designed for storing Object representations: http://oldblog.antirez.com/post/redis-weekly-update-1.html
I suppose one concern would be the number of commands Redis must service when a new item is inserted (n number of commands, where n is the number of fields in the Hash) when compared to a simple String SET command. I haven't found this to be a problem yet on a service which hits Redis about 1 million times per day. Using the right data structure to me is more important than a negligible performance impact.
(Also, please see my comment regarding Redis Sets vs. Redis Strings - I think your question is referring to Strings but correct me if I'm wrong!)
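To make the comparison concrete, a small sketch of the two layouts from the question (the values are placeholders). The concatenated-key layout from the older tutorials:
SET user:1000:username "john"
SET user:1000:password "s3cret"
And the hash layout:
HMSET user:1000 username "john" password "s3cret"
HGET user:1000 username
With the hash, one key holds the whole object and individual fields can still be read or updated with HGET/HSET.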
Hashes are one of the most efficient ways to store data in Redis; the Redis documentation even goes so far as to recommend them for use whenever effectively possible.
http://redis.io/topics/memory-optimization
Use hashes when possible
Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.
Use case comparison:
Sets provide a semantic interface to store data as a set in the Redis server. The use cases for this kind of data are more for analytics purposes, for example how many people browse the product page and how many end up purchasing the product.
Hashes provide a semantic interface to store simple and complex data objects in the Redis server. For example, a user profile, a product catalog, and so on.
Ref: Learning Redis
Use cases for SETS
Uniqueness:
We have to make sure in our application that every username can be used by only one single person. If someone signs up with a username, we first look it up in the set of usernames:
SISMEMBER setOfUsernames newUsername
Creating relationships between different records:
Imagine you have Like functionality in your app. You might have a separate set for every single user and store the IDs of the images that user has liked so far.
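For instance, a small sketch with a per-user set (the key name and ids are made up):
SADD user#45:likedImages 901 902 903
SISMEMBER user#45:likedImages 901
SMEMBERS user#45:likedImages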
Find common attributes that people like
In dating apps, users usually pick different attributes, and those attributes are stored in sets. And to help people match easily, our app might check the intersection of those attribute sets:
SINTER user#45:likesSet user#34:likesSet
When we have lists of items and order does not matter
For example, if you want to restrict the IP addresses that are allowed to reach your API, or block certain addresses from emailing you, you can store them in a set.
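For example, a quick sketch of such a blocklist (the key and address are illustrative):
SADD blocked:senders "spam@example.com"
SISMEMBER blocked:senders "spam@example.com"
SISMEMBER returns 1 if the address is on the blocklist and 0 otherwise.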
Use cases for Hash
Redis Hashes are usually used to store complex data objects: sessions, users, etc. Hashes are also more memory-optimized.
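A brief sketch of the session case, with made-up field names and an hour-long TTL:
HSET session:9f2c userId 42 role "member" lastSeen "2021-06-01T12:00:00Z"
EXPIRE session:9f2c 3600
HGETALL session:9f2c
The hash keeps the session attributes readable individually, while EXPIRE makes the whole session disappear automatically.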