Choose between HINCRBY and INCR for redis

Choose between HINCRBY and INCR for redis - redis

I have a forum and want to save and show topics' view count using redis. It seems I have two methods to do this: HINCRBY and INCR. Which should I choose? And why?( Given I have 10,000,000 topics in total )
With HINCRBY I can use one key to keep all values, but the hash is big. But with INCR I'll have many keys.

If you use hashes (so go with HINCRBY) you can reduce your memory footprint if you can use multiple hashes instead of one: http://redis.io/topics/memory-optimization#using-hashes-to-abstract-a-very-memory-efficient-plain-key-value-store-on-top-of-redis
All you have to do is find some way of distributing your keys into multiple hashes, not just one, for example these guys found a way: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value

Related

Count the number of keys that start with string

I've two different types of keys saved, one prefixed with response, and another uuid. I want to count the number of keys that start with uuid. According to this answer, it's possible to filter keys using MATCH but is it possible to get a count?

There's no operation in Redis to get a count given a particular prefix. Here are a couple of ideas for you:
If you're only storing Strings, you can use a Hash instead to keep track of everything, in this case, your UUID's would be fields in the hash rather than strings, but they are effectively equivalent. This would permit you to use the HLEN Command to see how many UUIDs you have. This would effectively be an O(1) solution - and has the added benefit of not relying on a separate counter key
Keep a uuid_coutner key, whenever a new UUID is added just use INCR, if one is removed, simply use DECR. Can use INCRBY/DECRBY if you add/remove more than 1 at a time. This would effectively be an O(1) solution, but of course, the counter/keys are stored independently from each other.
You can of course use the post you alluded to (a scan over all of your keys) to count them as well, but that is both O(n) and will incur a lot of unecessary traffic.

String vs Hash for string type? Hash will have only one key instead of many

For example, I see many people are doing something like the following:
> set data:1000 "some string 1"
> set data:1001 "some string 2"
But what about using a hash to minimize the number of keys?
> hset data 1000 "some string 1"
> hset data 1001 "some string 2"
In the second way, it will only create one data key instead of creating many keys in the first way.
Which way is recommended?
I just see some people and tutorial are doing hset data:10 01 xxx. This is actually not related to my question. My question is simply asking what it's recommended between set data:1001 xxx and hset data 1001 xxx.
And I don't plan to modify hash-max-zipmap-entries and hash-max-zipmap-value. That means the hash will exceed the two values eventually. In such a config, are the two ways the same? or Which way is recommended?

Reasons to use strings:
you need per value timeouts
the values are semantically isolated
you're on cluster and want the values to be sharded over different nodes to spread load (sharding is based on the key)
Reasons to use hashes:
you want to be able to purge all of them at once (del/unlink), or have a timeout that impacts all of those values at once
you want to be able to enumerate them (prefer hscan/hgetall over scan/keys)
slightly better memory usage on the keys themselves
the values are semantically related
it is OK for all the values to be on the same node (whether single-server or cluster)

This all depends on the tradeoffs you want to support. In general, using hashes will have a smaller memory footprint than using simple keys. In fact, it is about an order of magnitude less memory. And access to hash values is constant time. So, if you are using redis simply as a key-value store, then hashes are way more efficient than simple keys.
However, you will want to use simple keys if you need to support expiration, keyspace notifications, etc, then you will need to use simple keys.
Just be careful to tweak the values for hash-max-zipmap-entries and hash-max-zipmap-value in your redis.conf in order to ensure that hashes are treated correctly for your environment.
You can read all about the details in the memory optimization section of the documentation.

In Redis is it possible to find all hashes with a key containing a specified value?

I am using Jedis, and new to both that and Redis itself. I have db that stores hashes, and need to find all keys in the db that contain an entry with a specified key and a specified value. EG: "find all hashes in the db that have key/value of STATUS=ERROR". Is this possible in Jedis? From what I can tell from googling, hscan will find keys in a specified hash.
More generally, by way of teaching me to fish, any pointers for where to look this up? It seems there is no real jedis api doc, and not even the Redis doc itself seems to have nothing on hscan.

As you mentioned, you can use HSCAN to find the specified key-value pair from a hash. Also, you need to use the SCAN command to find all hashes.
However, this is NOT an efficient solution. In order to achieve your goal efficiently, you need to build an extra index, i.e. use a Redis SET to save keys of all hashes that have the specified key-value pair.
HSET hash1 STATUS ERROR
// ...
// HSET other members
// ...
// add it to index
SADD status:error hash1
// get all hashes have the specified key-value pair
SMEMBERS status:error
UPDATE:
As #Itamar Haber mentioned in the comments, if you have many records in the SET, you should use SSCAN to get these members. Since in this case, SMEMBERS might block Redis for a long time.

Which approach is better when using Redis?

I'm facing following problem:
I wan't to keep track of tasks given to users and I want to store this state in Redis.
I can do:
1) create list called "dispatched_tasks" holding many objects (username, task)
2) create many (potentialy thousands) lists called dispatched_tasks:username holding usually few objects (task)
Which approach is better? If I only thought of my comfort, I would choose the second one, as from time to time I will have to search for particular user tasks, and this second approach gives this for free.
But how about Redis? Which approach will be more performant?
Thanks for any help.

Redis supports different kinds of data structures as shown here. There are different approaches you can take:
Scenario 1:
Using a list data type, your list will contain all the task/user combination for your problem. However, accessing and deleting a task runs in O(n) time complexity (it has to traverse the list to get to the element). This can have an impact in performance if your user has a lot of tasks.
Using sets:
Similar to lists, but you can add/delete/check for existence in O(1) and sets elements are unique. So if you add another username/task that already exists, it won't add it.
Scenario 2:
The data types do not change. The only difference is that there will be a lot more keys in redis, which in can increase the memory footprint.
From the FAQ:
What is the maximum number of keys a single Redis instance can hold? and what the max number of elements in a Hash, List, Set, Sorted
Set?
Redis can handle up to 232 keys, and was tested in practice to handle
at least 250 million keys per instance.
Every hash, list, set, and sorted set, can hold 232 elements.
In other words your limit is likely the available memory in your
system.
What's the Redis memory footprint?
To give you a few examples (all obtained using 64-bit instances):
An empty instance uses ~ 3MB of memory. 1 Million small Keys ->
String Value pairs use ~ 85MB of memory. 1 Million Keys -> Hash
value, representing an object with 5 fields, use ~ 160 MB of
memory. To test your use case is trivial using the
redis-benchmark utility to generate random data sets and check with
the INFO memory command the space used.

Use set or just create keys in redis to check existence?

I can think of two ways of checking existence using redis:
Use the whole database as a 'set', and just SET a key and checking existence by GETing it (or using EXISTS as mentioned in the comment by #Sergio Tulentsev)
Use SADD to add all members to a key and check existence by SISMEMBER
Which one is better? Will it be a problem, compared to the same amount of keys in a single set, if I choose the first method and the number of keys in a database gets larger?

In fact, besides these two methods, you can also use the HASH data structure with HEXISTS command (I'll call this method as the third solution).
All these solutions are fast enough, and it's NOT a problem if you have a large SET, HASH, or keyspace.
So, which one should we use? It depends on lots of things...
Does the key has value?
Keys of both the first and the third solution can have value, while the second solution CANNOT.
So if there's no value for each key, I'd prefer the second solution, i.e. SET solution. Otherwise, you have to use the first or third solution.
Does the value has structure?
If the value is NOT raw string, but a data structure, e.g. LIST, SET. You have to use the first solution, since HASH's value CAN only be raw string.
Do you need to do set operations?
If you need to do intersection, union or diff operations on multiple data sets, you should use the second solution. Redis has built-in commands for these operations, although they might be slow commands.
Memory efficiency consideration
Redis takes more memory-efficient encoding for small SET and HASH. So when you have lots of small data sets, take the second and the third solution can save lots of memory. See this for details.
UPDATE
Do you need to set TTL for these keys?
As #dizzyf points out in the comment, if you need to set TTL for these keys, you have to use the first solution. Because items of HASH and SET DO NOT have expiration property. You can only set TTL for the entire HASH or SET, NOT their elements.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas