In many Redis tutorials (such as this one), data is stored in a set, but with multiple values combined together in a string (e.g. a user account might be stored in the set as two entries, "user:1000:username" and "user:1000:password").
However, Redis also has hashes. It seems that it would make more sense to have a "user:1000" hash, which contains a "username" entry and a "password" entry. Rather than concatenating strings to access a particular value, you just access them directly in the hash.
So why isn't it used as much? Are these just old tutorials? Or do Redis hashes have performance issues?
Redis hashes are good for storing more complex data, like you suggest in your question. I use them for exactly that - to store objects with multiple attributes that need to be cached (specifically, inventory data for a particular product on an e-commerce site). Sure, I could use a concatenated string - but that adds unneeded complexity to my client code, and updating an individual field is not possible.
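To make the difference concrete, here is a minimal in-memory sketch in Python. Plain dicts stand in for the Redis keyspace (this is a model, not a real client), and the key names and inventory fields are made up for illustration. It contrasts changing one attribute of a serialized string with changing one field of a hash.

```python
import json

# Plain dicts stand in for the Redis keyspace; this is a model, not a client.
keyspace = {}

# Option 1: the whole object serialized into one string value (like SET).
keyspace["product:42:blob"] = json.dumps({"sku": "A-42", "stock": 7})


def update_blob_field(key, field, value):
    """Read-modify-write: the entire value is fetched, parsed and rewritten."""
    obj = json.loads(keyspace[key])
    obj[field] = value
    keyspace[key] = json.dumps(obj)


# Option 2: a hash, modelled as a nested dict (like HSET / HGET).
keyspace["product:42"] = {"sku": "A-42", "stock": "7"}


def hset(key, field, value):
    """Touch only the named field; Redis stores hash values as strings."""
    keyspace.setdefault(key, {})[field] = str(value)


update_blob_field("product:42:blob", "stock", 6)  # rewrites the whole blob
hset("product:42", "stock", 6)                    # rewrites one field
```

With the blob, every update moves the full object across the network twice; with the hash, only the changed field travels.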
You may be right - the tutorials may simply be from before Hashes were introduced. They were clearly designed for storing Object representations: http://oldblog.antirez.com/post/redis-weekly-update-1.html
I suppose one concern would be the number of commands Redis must service when a new item is inserted (up to n commands if each of the n fields in the Hash is set separately, although HMSET can set several fields in one call) compared to a single String SET command. I haven't found this to be a problem yet on a service which hits Redis about 1 million times per day. To me, using the right data structure is more important than a negligible performance impact.
(Also, please see my comment regarding Redis Sets vs. Redis Strings - I think your question is referring to Strings but correct me if I'm wrong!)
Hashes are one of the most memory-efficient ways to store data in Redis; the official documentation goes so far as to recommend using them whenever possible:
http://redis.io/topics/memory-optimization
Use hashes when possible
Small hashes are encoded in a very small space, so you should try representing your data using hashes every time it is possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.
Use case comparison:
Sets provide a semantic interface to store data as a set in the Redis server. The use cases for this kind of data are mostly analytical, for example, how many people browse the product page and how many end up purchasing the product.
Hashes provide a semantic interface to store simple and complex data objects in the Redis server. For example, user profile, product catalog, and so on.
Ref: Learning Redis
Use cases for SETS
Uniqueness:
Our application has to make sure that every username is used by only one person. When someone signs up with a username, we first look it up in the set of usernames:
SISMEMBER setOfUsernames newUsername
Creating relationships between different records:
Imagine you have Like functionality in your app. You might keep a separate set for every user and store the IDs of the images that user has liked so far.
Find common attributes that people like
In dating apps, users usually pick different attributes, and those attributes are stored in sets. To help people match easily, the app might compute the intersection of those attributes:
SINTER user#45:likesSet user#34:likesSet
When we have lists of items and order does not matter
For example, if you want to restrict which IP addresses can reach your app, or block certain email addresses from sending you email, you can store them in a set.
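The set use cases above can be sketched with Python's built-in set, which has the same semantics as SADD, SISMEMBER and SINTER. This is only an in-memory illustration, not a Redis client; the usernames, attributes and addresses are invented.

```python
# Python sets model the semantics of Redis sets; nothing here talks to Redis.

usernames = set()


def register(username):
    """Return True if the name was free, False if it is already taken.

    In Redis a single SADD gives the same answer: it replies 1 for a new
    member and 0 for an existing one.
    """
    if username in usernames:      # SISMEMBER setOfUsernames <name>
        return False
    usernames.add(username)        # SADD setOfUsernames <name>
    return True


# One set of liked attributes per user (hypothetical data).
likes = {
    "user#45": {"hiking", "jazz", "cooking"},
    "user#34": {"jazz", "chess", "cooking"},
}


def common_likes(a, b):
    """Attributes both users picked, like SINTER a:likesSet b:likesSet."""
    return likes[a] & likes[b]


# Unordered membership list: a block list of addresses.
blocked_ips = {"203.0.113.7", "198.51.100.23"}


def is_blocked(ip):
    return ip in blocked_ips
```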
Use cases for Hash
Redis Hashes are usually used to store complex data objects: sessions, users etc. Hashes are more memory-optimized.
I've been trying to make a replay system. Basically, when a player moves, the system saves their data (movements, location, animation, etc.) into a JSON file. By the end of a recording, the JSON file may be over 50 MB. I'd like to save this data in Redis with an expiry (24-48 hours).
My questions are:
Is it bad to save over 50 MB in Redis with an expiry?
How many objects of over 50 MB can Redis handle without performance loss?
If players make 500 recordings in 48 hours, would that be bad for Redis?
How many milliseconds does it take to fetch 50 MB of data from Redis on an average VDS/VPS?
Storing a large object (in terms of size) is not a good practice. You may read about it here. One of the problems is the network: you need to send a 50 MB payload to the Redis server in a single call. Also, if you save everything as one big object, then to read or update a single field or element you have to fetch all 50 MB from the server, parse it, update the field, and send the whole object back. That is a serious cost in terms of network traffic.
Instead of Redis strings, you may prefer sorted sets or lists, depending on your use case. If you are going to store events with timestamps and fetch the range of events between two timestamps, sorted sets may be an ideal solution. They are also good for pagination. One crucial drawback is that adding a new element is O(log(N)).
Lists may also suit your case. You can use LPUSH/RPUSH to add new events to your list, and since Redis lists are implemented as linked lists, adding an element at the beginning or at the end of the list costs the same, O(1), which is great.
Whenever an event happens, you call either ZADD or RPUSH/LPUSH to send it to Redis. If you need to query the events, you can use commands such as ZRANGEBYSCORE or LRANGE, depending on your choice.
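As a rough illustration of the sorted-set approach, the following Python sketch keeps events ordered by timestamp and answers range queries, which is the behaviour ZADD and ZRANGEBYSCORE give you. It is an in-memory model, not a Redis client, and the class name and event payloads are made up. Note that the Python list insert here is O(N); only the position lookup is logarithmic, whereas Redis does the whole insert in O(log(N)).

```python
import bisect


class EventLog:
    """Events ordered by timestamp, queried by timestamp range."""

    def __init__(self):
        self._timestamps = []  # kept sorted
        self._payloads = []    # parallel to _timestamps

    def add(self, timestamp, payload):
        """Insert one event at its sorted position (the role of ZADD)."""
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._payloads.insert(i, payload)

    def range_by_score(self, lo, hi):
        """Payloads with lo <= timestamp <= hi (the role of ZRANGEBYSCORE)."""
        start = bisect.bisect_left(self._timestamps, lo)
        end = bisect.bisect_right(self._timestamps, hi)
        return self._payloads[start:end]


log = EventLog()
# Events may arrive out of order; each one is a small record, not a 50 MB blob.
log.add(30, "jump")
log.add(10, "spawn")
log.add(20, "move")
log.add(40, "stop")
```

Reading a slice of the replay then only transfers the events in that slice instead of the whole recording.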
While designing your keys you may use an identifier such as the user ID, just as you mentioned in the comments. With lists or sorted sets you will not have the problems you would have with strings. Which one is most suitable depends on your read/write patterns and business rules.
Here are some useful links to read:
Redis data types intro
Redis data types
Redis labs documentation about data types
I am currently researching some data storage strategies with ElasticSearch and wonder why for storing logs, this page indicates:
A standard format is to assign a new index for each day.
Would it not make more sense to create one index (database) with a new type (table) per day?
I am looking at this from the point of view of each index is tied to a different web application.
In another scenario, a web app uses one index. One of the types within that index is used for logging (what we currently do with SQL Server). Is this a good approach?
Interesting idea and, yes, you could probably do that. Why use multiple indices instead? If having control over things like shard-to-node allocation (maybe you want all of 2015 stored on one set of nodes, 2014, another), filter cache size, and similar is important, you lose that by going to a single index/multi-mapping approach. For very high volume applications, that control might be significant. YMMV.
With regard to the "each index is tied to a different web application" sentiment, aliases can be (and are) used to collect multiple physical indices under a single searchable umbrella; you create one index per day/week/whatever, say, logs-20150730, logs-20150731... and assign the logs alias to all of the indices in the series. The net effect is the same as having a single "index".
Nice part of the alias approach is that purging/pruning old data is trivial; just delete the index when its contents age out of whatever your data retention policy is. With multi-mappings, you'd have to delete the requisite mapping within the index (do-able, but pretty I/O intrusive, since you'd likely be shoving stuff around inside every shard the mapping was distributed through.)
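The pruning step is mostly date arithmetic on index names. A small Python sketch, assuming the logs-YYYYMMDD naming used above; the function names and the 7-day retention figure are only examples, and the actual deletion call to Elasticsearch is deliberately left out.

```python
from datetime import date, timedelta


def index_name(day, prefix="logs"):
    """Name of the daily index for a given date, e.g. logs-20150730."""
    return f"{prefix}-{day:%Y%m%d}"


def expired_indices(existing, today, retention_days, prefix="logs"):
    """Return the index names whose day falls before the retention window."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in existing:
        stamp = name[len(prefix) + 1:]  # the YYYYMMDD part after "logs-"
        day = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


existing = ["logs-20150730", "logs-20150731", "logs-20150801"]
to_delete = expired_indices(existing, today=date(2015, 8, 7), retention_days=7)
```

Each name in to_delete would then be dropped as a whole index, which is the cheap operation the alias approach buys you.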
I've started using Redis today and I've been through the tutorial and some links on Stack Overflow, but I'm failing to understand how to properly use Redis for what seems to be a very simple use case.
Goal: Save several users data into redis and read all of the users at once.
I start a redis client and I start by adding the first user which has id 1:
127.0.0.1:6379> hmset user:1 name "vitor" age 35
OK
127.0.0.1:6379> hgetall user:1
1) "name"
2) "vitor"
3) "age"
4) "35"
I add a couple more users, running several commands like this one:
127.0.0.1:6379> hmset user:2 name "nuno" age 10
I was (probably wrongly) expecting to be able to now query all my users by doing:
hgetall "user:"
or even
hgetall "user:*"
The fact that I've not seen anything like this in the tutorials, kind of tells me that I'm not using redis right for this use case.
Would you be able to tell me what should be the approach for this use case?
To understand why this kind of operation seems non-trivial in NoSQL implementations, it's good to think about why NoSQL exists (and has become very popular) at all.
When you look at an early NoSQL implementation like memcached, the first use case was very simple, but very important: a blazingly fast cache for distributed data, to cache for example web page data. Very quickly stuff like clustering and sharding was added, so not all data has to be available everywhere at once at every single node in the cluster, but can be gathered on demand.
NoSQL is very different from relational data storage. Don't overuse it. Consider relational databases as well, as they are sometimes far more suited for what you are trying to accomplish. In everything you design, ask yourself "Does this scale well?".
Okay, back to your question. It is in general bad practice to do wildcard searches. You prepare your data in a way that you can retrieve your data in a scalable way.
Redis is a very chic solution, allowing you to overcome a lot of NoSQL limitations in an elegant way.
If getting "a list of all users" isn't something you have to do very often, or doesn't need to scale well, or is always "I really want all users" because it's for a daily scan anyway, use SCAN with a MATCH pattern (or HSCAN if all users live in a single hash). SCAN operations with a proper batch size don't get in the way of other clients; you can retrieve your records a couple of thousand at a time, and after a few calls you've got everything.
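The client-side loop for a SCAN-style iteration looks roughly like this. The fake_scan function below is a toy stand-in written for this sketch, not the Redis implementation: a real SCAN cursor is an opaque value, batches may be empty, and keys may be returned more than once. The loop shape (start at cursor 0, stop when the server hands back 0) is the same.

```python
import fnmatch

# Toy keyspace; the key names are made up.
KEYS = [f"user:{i}" for i in range(1, 8)] + ["session:abc"]


def fake_scan(cursor, match="*", count=3):
    """Stand-in for SCAN: returns (next_cursor, keys); next_cursor 0 means done."""
    window = KEYS[cursor:cursor + count]
    next_cursor = cursor + count if cursor + count < len(KEYS) else 0
    return next_cursor, [k for k in window if fnmatch.fnmatchcase(k, match)]


def scan_all(match, count=3):
    """Walk the keyspace in small batches until the cursor returns to 0."""
    cursor, found = 0, []
    while True:
        cursor, batch = fake_scan(cursor, match, count)
        found.extend(batch)
        if cursor == 0:
            return found


user_keys = scan_all("user:*")
```

Each returned key would then be read with HGETALL, ideally pipelined so the batch costs one round trip.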
You can also store your users in a SET. There's no ordering in a set, so no pagination. It can help to keep your user names unique.
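A minimal sketch of that idea, with plain Python containers standing in for Redis (not a real client): every write stores the user hash and also records the ID in an index set, so "all users" becomes a walk over the set. The key name "users" and the helper names are assumptions made for the example.

```python
# Dicts and sets stand in for the Redis keyspace in this model.
keyspace = {}


def save_user(user_id, **fields):
    """Store the hash and register the ID in the index set."""
    # HMSET user:<id> field value ...
    keyspace[f"user:{user_id}"] = {k: str(v) for k, v in fields.items()}
    # SADD users <id>
    keyspace.setdefault("users", set()).add(str(user_id))


def all_users():
    """SMEMBERS users, then one HGETALL per ID."""
    ids = sorted(keyspace.get("users", set()))
    return {uid: dict(keyspace[f"user:{uid}"]) for uid in ids}


save_user(1, name="vitor", age=35)
save_user(2, name="nuno", age=10)
```

The cost is two writes per user instead of one, which is where Lua or a transaction comes in if the two must never get out of step.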
If you want to do things like "get me all users that start with the letter 'a'", I'd use a ZSET with ZRANGEBYLEX (available since Redis 2.8.9). Or use an ORM like Josiah Carlson's 'rom' package.
When you ask yourself "But now I have to do three calls instead of one when storing my data...?!": yup, that's how it works. If you need atomicity, use a Lua script or a MULTI/EXEC transaction. Lua is generally easier.
You can also ask yourself whether a hash is needed at all. Do you need to retrieve the individual data members? Each key or field has some overhead. On top of that, HGETALL is O(N) in the number of fields, so it doesn't scale well for large hashes. It might be better to serialize your row as a whole, using JSON or MsgPack, and store it in a single hash field, or just use a simple GET/SET. Also read up on SORT.
Hope this helps, TW
If you still want to use Redis, you can use something like:
SADD users '{"userId":1,"name":"John","vitor":"x","age":35}'
SADD users '{"userId":2,"name":"xt","vitor":"x","age":43}'
...
And you can retrieve them all using:
SMEMBERS users
This question concerns the sharded configuration of Redis. I have implemented a small test application in Java which creates 100,000 user hashes through Jedis, with keys of the form user:userID. Each hash has the fields name, phone, department and userID. I also created simple key-value pairs with the key phone:phone number, whose value is the userID that owns that phone number, and a set for each department containing the userIDs of the people who work in that department. I use the latter two types only for searching. These structures and the search are similar to those in "Searching in values of a redis db".
In short, the data structures:
user:userID->{name, department, phone, userID}
department:department->([userID1, userID2,....])
phone:phone->userID
Use cases for the search:
access to user-hashes based on key i.e. userID
search for users with a phone number
search for all users of a department
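For readers who want to see the three structures side by side, here is a small in-memory Python model of them (dicts and sets standing in for Redis hashes, sets and string keys). It is not the Jedis code from the question; the function names and sample values are invented for illustration.

```python
users = {}          # user:<userID>           -> hash of fields
by_department = {}  # department:<department> -> set of userIDs
by_phone = {}       # phone:<phone>           -> userID


def create_user(user_id, name, department, phone):
    """Write the hash and keep both lookup structures in step."""
    users[user_id] = {
        "name": name,
        "department": department,
        "phone": phone,
        "userID": user_id,
    }
    by_department.setdefault(department, set()).add(user_id)
    by_phone[phone] = user_id


def find_by_phone(phone):
    """Two hops: phone -> userID, then userID -> hash."""
    user_id = by_phone.get(phone)
    return users.get(user_id) if user_id is not None else None


def find_by_department(department):
    """All user hashes of one department, ordered by userID."""
    return [users[uid] for uid in sorted(by_department.get(department, ()))]


create_user("1", "vitor", "sales", "555-0100")
create_user("2", "nuno", "sales", "555-0101")
create_user("3", "ana", "support", "555-0102")
```

Both lookups are exact-match only; nothing in this layout supports a search by phone prefix, which is what the first question below runs into.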
Everything works fine in both the single-instance and the sharded configuration, but I have the following questions:
In the single-instance configuration it is possible to look up a phone number with a wildcard, e.g. with the KEYS command, but this is not available in the sharded configuration. How would it be possible to look for keys whose first part is known?
The user ID is generated from a zset whose score is incremented for each new userID. This can be done in a transaction in the single-instance configuration, but transactions do not seem to be supported for sharded configurations in Jedis, even if the participating keys are on the same instance. How could this problem be solved if multiple client threads can create users at the same time?
Thank you in advance for your responses.
For the 1st part of your question:
There is no magic here: if you want to search across all your shards, you have to iterate over all of them. ShardedJedis doesn't have such a method, but you could extend it to add one (untested):
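// Requires java.util.HashSet, java.util.Set and redis.clients.jedis.Jedis.
// Collects the keys matching the pattern from every shard.
// KEYS walks the whole keyspace of each shard, so use it with care.
public Set<String> keys(String pattern) {
    Set<String> found = new HashSet<String>();
    for (Jedis jedis : getAllShards()) {
        found.addAll(jedis.keys(pattern));
    }
    return found;
}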
For the 2nd part of your question:
AFAIK, Jedis doesn't support transactions when using Shards, even if you do force the related keys to be on the same shard (see Jedis Advanced Usage).
The same link suggests a workaround that may apply in a few scenarios:
Mixed approach
If you want easy load distribution of ShardedJedis, but still need transactions/pipelining/pubsub etc, you can also mix the normal and the sharded approach: define a master as normal Jedis, the others as sharded Jedis. Then make all the shards slaveof master. In your application, direct your write requests to the master, the read requests to ShardedJedis. Your writes don't scale anymore, but you gain good read distribution, and you have transactions/pipelining/pubsub simply using the master. Dataset should fit in RAM of master. Remember that you can improve performance of the master a lot, if you let the slaves do the persistence for the master!
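The routing rule in that mixed approach can be sketched as follows. This is an in-memory Python model, not Jedis: dicts stand in for the Redis instances, replication is modelled as an immediate copy (real Redis replication is asynchronous), and CRC32 is an arbitrary stand-in for whatever hash the sharded client actually uses. The class and method names are made up.

```python
import zlib


class MixedRouter:
    """Writes go to one master; reads are spread over its replicas."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        """All writes (and anything transactional) hit the master only."""
        self.master[key] = value
        # Model of replication: every replica receives the full dataset.
        for replica in self.replicas:
            replica[key] = value

    def replica_for(self, key):
        """Pick a replica by hashing the key (assumed hash: CRC32)."""
        return zlib.crc32(key.encode("utf-8")) % len(self.replicas)

    def read(self, key):
        """Reads are distributed; any replica could answer."""
        return self.replicas[self.replica_for(key)].get(key)


router = MixedRouter(replica_count=3)
router.write("user:1", "vitor")
```

Because every replica holds the whole dataset, the hash only spreads read load; it does not partition the data, which is why the dataset has to fit in the master's RAM.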
In my current project I actively use Redis for various purposes. The application uses two Redis databases:
The first one contains purely temporary data: how many users are online, who is online, and various admin counters. This database is cleared by a start-up script before the application starts.
The second database is used for persistent data such as users' ratings, users' friends, etc.
Everything seems to be correct and everybody is happy.
However, when I started implementing new functionality in my application, I discovered that I need to intersect the set of a user's friends with the set of online users. These sets are stored in different Redis databases, and I haven't found any way to do this in Redis, other than changing the application architecture and moving all keys into one namespace (database).
Is there actually any way to run a Redis command on data from multiple databases? Or is my use of Redis wrong, so that I have to fix the system architecture?
There is not. There is a command that makes it easy to move keys to another DB:
http://redis.io/commands/move
If you move all keys to one DB, make sure you don't have any key clashes! You could suffix or prefix the keys from the temp DB to make absolutely sure. MOVE will do nothing if the key already exists in the target DB, so make sure you act on a '0' reply.
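A small in-memory model of that migration, in Python. Two dicts stand in for the two numbered databases; move mimics the reply of MOVE (1 on success, 0 if nothing happened), and move_with_prefix renames the key first, in the spirit of RENAMENX, so it cannot clash. The "tmp:" prefix, the key names and the helper names are assumptions for the example.

```python
# DB 0 holds temporary data, DB 1 persistent data (hypothetical contents).
dbs = {
    0: {"online": {"u1", "u3"}},
    1: {"friends:u2": {"u1", "u4"}, "online": {"stale"}},
}


def move(key, src, dst):
    """Model of MOVE: 1 if moved, 0 if missing in src or present in dst."""
    if key not in dbs[src] or key in dbs[dst]:
        return 0
    dbs[dst][key] = dbs[src].pop(key)
    return 1


def move_with_prefix(key, src, dst, prefix="tmp:"):
    """Rename the key with a prefix, then move it; 0 means nothing was done."""
    new_key = prefix + key
    if key not in dbs[src] or new_key in dbs[src] or new_key in dbs[dst]:
        return 0
    dbs[src][new_key] = dbs[src].pop(key)  # rename only if the name is free
    return move(new_key, src, dst)
```

Once both sets live in the same database, the intersection of friends and online users is an ordinary SINTER over the two keys.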
Using multiple DBs is definitely not a good idea:
A Quote from Salvatore Sanfilippo (the creator of redis):
I understand how this can be useful, but unfortunately I consider Redis multiple database errors my worst decision in Redis design at all... without any kind of real gain, it makes the internals a lot more complex. The reality is that databases don't scale well for a number of reason, like active expire of keys and VM. If the DB selection can be performed with a string I can see this feature being used as a scalable O(1) dictionary layer, that instead it is not.
With DB numbers, with a default of a few DBs, we are communication better what this feature is and how can be used I think. I hope that at some point we can drop the multiple DBs support at all, but I think it is probably too late as there is a number of people relying on this feature for their work.
https://groups.google.com/forum/#!msg/redis-db/vS5wX8X4Cjg/8ounBXitG4sJ