In Redis, is it possible to find all hashes with a key containing a specified value?

I am using Jedis, and I'm new to both it and Redis itself. I have a db that stores hashes, and I need to find all keys in the db whose hash contains an entry with a specified key and a specified value, e.g. "find all hashes in the db that have the key/value STATUS=ERROR". Is this possible in Jedis? From what I can tell from googling, HSCAN will find keys within a specified hash.
More generally, by way of teaching me to fish, any pointers for where to look this up? There seems to be no real Jedis API doc, and even the Redis docs themselves seem to have nothing on HSCAN.

As you mentioned, you can use HSCAN to find the specified key-value pair from a hash. Also, you need to use the SCAN command to find all hashes.
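A minimal sketch of that brute-force search in Python, assuming `r` is a redis-py client created with `decode_responses=True` (the function name `find_hashes_with` is hypothetical). Because the field name is known, it checks each hash with HGET rather than HSCAN; HSCAN is only needed when the field names themselves are unknown. It still touches every key in the database, which is why it does not scale:

```python
from typing import Iterator


def find_hashes_with(r, field: str, value: str, match: str = "*") -> Iterator[str]:
    """Yield keys of hashes whose `field` equals `value` (full keyspace walk)."""
    seen = set()
    for key in r.scan_iter(match=match, count=1000):
        # SCAN may return a key more than once; report each key only once.
        if key in seen:
            continue
        seen.add(key)
        # SCAN returns keys of every type; skip anything that is not a hash.
        if r.type(key) != "hash":
            continue
        if r.hget(key, field) == value:
            yield key
```

For example, `list(find_hashes_with(r, "STATUS", "ERROR"))` collects every matching key.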
However, this is NOT an efficient solution. In order to achieve your goal efficiently, you need to build an extra index, i.e. use a Redis SET to save keys of all hashes that have the specified key-value pair.
HSET hash1 STATUS ERROR
// ...
// HSET other members
// ...
// add it to index
SADD status:error hash1
// get all hashes that have the specified key-value pair
SMEMBERS status:error
UPDATE:
As Itamar Haber mentioned in the comments, if the SET has many members, you should use SSCAN to iterate over them instead, because SMEMBERS returns the whole set in one call and might block Redis for a long time.
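A sketch of keeping such an index in sync from Python, assuming a redis-py client `r` with `decode_responses=True`; the `status:<value>` naming follows `status:error` above, and `index_key`, `set_status` and `hashes_with_status` are hypothetical helpers. When a hash's STATUS changes, its key moves from the old index SET to the new one inside a MULTI/EXEC pipeline, and reads use SSCAN as suggested in the update:

```python
def index_key(status: str) -> str:
    # Hypothetical naming scheme that mirrors "status:error" above.
    return f"status:{status.lower()}"


def set_status(r, hash_key: str, status: str) -> None:
    """Set STATUS on a hash and move its key to the matching index SET."""
    old = r.hget(hash_key, "STATUS")
    # This HGET runs outside the transaction; with concurrent writers you
    # would need WATCH or a Lua script to make the read and update atomic.
    pipe = r.pipeline(transaction=True)
    pipe.hset(hash_key, "STATUS", status)
    if old is not None and old != status:
        pipe.srem(index_key(old), hash_key)
    pipe.sadd(index_key(status), hash_key)
    pipe.execute()


def hashes_with_status(r, status: str) -> set:
    # SSCAN walks the index in batches instead of one big SMEMBERS reply;
    # collecting into a set drops any member SSCAN happens to return twice.
    return set(r.sscan_iter(index_key(status), count=1000))
```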

Related

String vs Hash for string type? Hash will have only one key instead of many

For example, I see many people doing something like the following:
> set data:1000 "some string 1"
> set data:1001 "some string 2"
But what about using a hash to minimize the number of keys?
> hset data 1000 "some string 1"
> hset data 1001 "some string 2"
The second way creates only one data key, instead of one key per value as in the first way.
Which way is recommended?
I see some people and tutorials doing hset data:10 01 xxx. That is not related to my question, which is simply asking which is recommended: set data:1001 xxx or hset data 1001 xxx.
And I don't plan to modify hash-max-zipmap-entries and hash-max-zipmap-value, which means the hash will eventually exceed those two limits. In such a config, are the two ways the same? Or which way is recommended?
Reasons to use strings:
you need per value timeouts
the values are semantically isolated
you're on a cluster and want the values to be sharded over different nodes to spread the load (sharding is based on the key)
Reasons to use hashes:
you want to be able to purge all of them at once (del/unlink), or have a timeout that impacts all of those values at once
you want to be able to enumerate them (prefer hscan/hgetall over scan/keys)
slightly better memory usage on the keys themselves
the values are semantically related
it is OK for all the values to be on the same node (whether single-server or cluster)
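To make these tradeoffs concrete, here is a sketch of the two layouts in Python, assuming a redis-py client `r` (both helper names are hypothetical). With strings each value gets its own TTL; with a hash, a single EXPIRE covers every field and a single DEL/UNLINK removes them all:

```python
def save_as_strings(r, prefix: str, items: dict, ttl_seconds: int) -> None:
    """One top-level key per value (data:1000, data:1001, ...).

    Each key carries its own TTL and, on a cluster, is sharded independently.
    """
    pipe = r.pipeline(transaction=False)
    for item_id, value in items.items():
        pipe.setex(f"{prefix}:{item_id}", ttl_seconds, value)
    pipe.execute()


def save_as_hash(r, name: str, items: dict, ttl_seconds: int) -> None:
    """All values as fields of one hash (data).

    One TTL applies to every field, and all fields live on the same node.
    """
    pipe = r.pipeline(transaction=True)
    pipe.hset(name, mapping=items)
    pipe.expire(name, ttl_seconds)
    pipe.execute()
```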
This all depends on the tradeoffs you want to support. In general, hashes have a smaller memory footprint than simple keys: a small hash stored in the compact encoding can use up to about an order of magnitude less memory than the equivalent top-level keys. That advantage only holds while the hash stays under the encoding limits; once it grows past them, Redis converts it to a regular hash table and much of the saving disappears. Access to a hash value (HGET) is O(1). So, if you are using Redis simply as a key-value store, hashes can be considerably more memory-efficient than simple keys.
However, if you need per-value expiration, keyspace notifications, etc., you will need to use simple keys.
Just be careful to tweak the values for hash-max-zipmap-entries and hash-max-zipmap-value in your redis.conf (renamed hash-max-ziplist-* in Redis 2.6 and hash-max-listpack-* in Redis 7.0) so that hashes are encoded appropriately for your environment.
You can read all about the details in the memory optimization section of the documentation.

Get all hashes exists in redis

I have hashes in a Redis cache like:
Hash      Key     Value
hashme:1  Hello   World
hashme:2  Here    Iam
myhash:1  Next    One
My goal is to get the Hashes as output in the CLI like:
hashme
myhash
If there's no such option, this is ok too:
hashme:1
hashme:2
myhash:1
I didn't find any relevant command for it in the Redis API.
Any suggestions ?
You can use the SCAN command to get all keys from Redis. Then for each key, use the TYPE command to check if it's a hash.
UPDATE:
With Redis 6.0, the SCAN command supports a TYPE option, and you can use it to scan all keys of a specified type:
SCAN 0 TYPE hash
Also, never use the KEYS command in a production environment! It's a dangerous command that might block Redis for a long time.
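A sketch in Python that produces the output the question asks for (hashme, myhash), assuming Redis 6.0+ for `SCAN ... TYPE hash`, a redis-py client `r` with `decode_responses=True`, and `:` as the separator between prefix and ID. The helper name `hash_prefixes` is hypothetical:

```python
def hash_prefixes(r, match: str = "*") -> list:
    """Return the sorted, distinct prefixes of all hash keys."""
    prefixes = set()
    # scan_iter(_type="hash") sends SCAN ... TYPE hash, so only hashes come back.
    for key in r.scan_iter(match=match, count=1000, _type="hash"):
        # "hashme:1" -> "hashme"; keys without ":" are kept whole.
        prefixes.add(key.rsplit(":", 1)[0])
    return sorted(prefixes)
```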
keys *
works for me. You can try it.
The idea of Redis (and other K/V stores) is for you to build an index. The database won't do it for you. It's a major difference from relational databases, and it contributes to better performance.
So when your app creates a hash, put its key into a SET. When your app deletes a hash, remove its key from the SET. And then to get the list of hash IDs, just use SMEMBERS to get the content of the SET.
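A minimal sketch of that pattern in Python, assuming a redis-py client `r`; the registry name `hashes:all` and the helper names are hypothetical. Each write and its index update run in one MULTI/EXEC pipeline so the two cannot drift apart if the client dies halfway:

```python
REGISTRY = "hashes:all"  # hypothetical name for the index SET


def create_hash(r, key: str, fields: dict) -> None:
    # Write the hash and register its key atomically.
    pipe = r.pipeline(transaction=True)
    pipe.hset(key, mapping=fields)
    pipe.sadd(REGISTRY, key)
    pipe.execute()


def delete_hash(r, key: str) -> None:
    # Delete the hash and unregister its key atomically.
    pipe = r.pipeline(transaction=True)
    pipe.delete(key)
    pipe.srem(REGISTRY, key)
    pipe.execute()


def all_hash_keys(r) -> set:
    # SMEMBERS as in the answer; prefer r.sscan_iter(REGISTRY) for very large sets.
    return r.smembers(REGISTRY)
```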
connection.keys('*') will return all the keys irrespective of data type, since everything in Redis is stored in key-value format.
For Redis in Python, you can use the code below to retrieve keys from a Redis db:
def standardize_list(bytelist):
    # redis-py returns bytes unless the client was created with decode_responses=True
    return [x.decode('utf-8') for x in bytelist]
>>> standardize_list(r.keys())
['hat:1236154736', 'hat:56854717', 'hat:1326692461']
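Here r is the redis connection object (a redis.Redis instance). Note that r.keys() issues the KEYS command, with the same production caveats as above.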

Choose between HINCRBY and INCR for redis

I have a forum and want to save and show topics' view counts using Redis. It seems I have two methods to do this: HINCRBY and INCR. Which should I choose, and why? (Given I have 10,000,000 topics in total.)
With HINCRBY I can use one key to keep all values, but the hash is big. But with INCR I'll have many keys.
If you use hashes (so go with HINCRBY) you can reduce your memory footprint if you can use multiple hashes instead of one: http://redis.io/topics/memory-optimization#using-hashes-to-abstract-a-very-memory-efficient-plain-key-value-store-on-top-of-redis
All you have to do is find some way of distributing your keys into multiple hashes, not just one. For example, Instagram found a way: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value
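A sketch of that bucketing for the 10,000,000 topic counters, assuming a redis-py client `r` and an assumed bucket size of 1000 fields per hash; the `views:<bucket>` key scheme and helper names are hypothetical. The compact hash encoding only applies while a hash has at most `hash-max-listpack-entries` fields (`hash-max-ziplist-entries` before Redis 7.0), so that setting must be at least `BUCKET_SIZE`:

```python
BUCKET_SIZE = 1000  # assumed; tune together with the hash encoding limit


def view_counter_location(topic_id: int) -> tuple:
    """Map a topic ID to (hash key, field).

    10M topics become ~10k hashes of up to 1000 fields each, instead of
    one huge hash or 10M top-level keys.
    """
    bucket, offset = divmod(topic_id, BUCKET_SIZE)
    return f"views:{bucket}", str(offset)


def record_view(r, topic_id: int) -> int:
    # HINCRBY creates the hash and the field on first use and returns the new count.
    key, field = view_counter_location(topic_id)
    return r.hincrby(key, field, 1)
```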

Redis expire on large set of keys

My problem is: I have a set of values, each of which has to have an expiry.
Code (the TTL is usually 604800 seconds, i.e. one week):
set a:11111:22222 someValue
expire a:11111:22222 604800
In a perfect world I would have put all those values in a hash and given each of them its appropriate expire value, but Redis does not allow expiring individual hash fields.
The problem is that I also have a process that needs to get all those keys about once an hour:
keys a:*
This command is really expensive and, according to the Redis documentation, can cause performance issues. I have about 25000-30000 keys at any given moment.
Does anyone know how I can solve such a problem?
Let me propose an alternative solution.
Rather than asking Redis to scan all the keys, why not perform a background dump (BGSAVE) and parse the dump to extract the keys? This way, the only load on the Redis instance itself is the dump (BGSAVE forks a child process, which has some cost on large datasets).
Parsing the dump file is not as scary as it sounds, because you can use the excellent redis-rdb-tools package:
https://github.com/sripathikrishnan/redis-rdb-tools
You can either convert the dump file into a json file, and then parse the json file, or use the Python API to extract the keys by yourself.
As you've already mentioned, using keys is not a good solution to get your keys:
Warning: consider KEYS as a command that should only be used in production environments with extreme care. It may ruin performance when it is executed against large databases. This command is intended for debugging and special operations, such as changing your keyspace layout. Don't use KEYS in your regular application code. If you're looking for a way to find keys in a subset of your keyspace, consider using sets.
Source: Redis docs for KEYS
As the docs are suggesting, you should build your own indices!
A common way of building an index is to use a sorted set. You can read more about how it works in my question over here.
Building references to your a:* keys using a sorted set will also allow you to select only the required keys by a date or any other integer value, in case you're filtering the results!
And yes: it would be awesome if individual hash fields could expire. Sadly it looked like that was not going to happen (Redis 7.4 eventually added per-field expiration with HEXPIRE), but there are creative alternatives to take care of it yourself.
Why don't you use a sorted set?
Here is an example data creation sequence. Each member's score is the key's expiry time as a Unix timestamp (creation time + 604800 seconds, e.g. 1384507635 + 604800 = 1385112435 and 1384508489 + 604800 = 1385113289):
redis 127.0.0.1:6379> setex a:11111:22222 604800 someValue
OK
redis 127.0.0.1:6379> zadd user:index 1385112435 a:11111:22222
(integer) 1
redis 127.0.0.1:6379> setex a:11111:22223 604800 someValue2
OK
redis 127.0.0.1:6379> zadd user:index 1385113289 a:11111:22223
(integer) 1
redis 127.0.0.1:6379> zrangebyscore user:index 1385112435 1385113289
1) "a:11111:22222"
2) "a:11111:22223"
This avoids the performance issue when selecting keys, but it costs more memory and extra work on each insert.
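The same sequence from Python, as a sketch assuming a redis-py client `r` with `decode_responses=True` and a caller-supplied `now` (e.g. `int(time.time())`); `user:index` comes from the example above and the helper names are hypothetical. One detail the CLI example leaves out: sorted-set members do not disappear when the keys they reference expire, so the hourly job prunes entries whose score (expiry time) has passed before reading:

```python
WEEK = 604800  # seconds, the TTL used in the question
INDEX = "user:index"  # sorted-set name from the example above


def add_with_expiry(r, key: str, value: str, now: int, ttl: int = WEEK) -> None:
    """SETEX the key and record its expiry timestamp as the ZSET score."""
    pipe = r.pipeline(transaction=True)
    pipe.setex(key, ttl, value)
    pipe.zadd(INDEX, {key: now + ttl})
    pipe.execute()


def live_keys(r, now: int) -> list:
    """Hourly job: drop index entries that have expired, return the rest.

    ZREMRANGEBYSCORE removes members whose expiry is <= now; the remaining
    members are returned ordered by expiry time.
    """
    r.zremrangebyscore(INDEX, "-inf", now)
    return r.zrangebyscore(INDEX, now, "+inf")
```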

paging through entries in redis hash

I couldn't find a way to "page" through redis hashes (doc).
I've got ~5 million hash entries in one Redis db. I am trying to iterate through all of them without having to resort to building a list of entry keys.
Can this be achieved?
Since all the Redis hash commands require the key (field) element, you need to store your set of keys in order to page through your hash.
See my answer to this question for an example of key iteration using extra sets.
There is no way to avoid storing extra sets (or lists) and still iterate over a huge number of keys. The KEYS command is not an option. (This was true before Redis 2.8, which introduced HSCAN.)
I had exactly the same requirement for Redis hash pagination, and yes, it is possible to page through a Redis hash using the HSCAN command. Detailed documentation is on the SCAN command page.
Usage: HSCAN <your key/hash name> <cursor-id> COUNT <page-size>
The cursor ID to pass initially is 0. Each call returns a new cursor ID plus a batch of data of roughly page-size entries (COUNT is only a hint, so a batch can be larger or smaller). Pass the returned cursor ID in the next call to fetch the subsequent data; the iteration is complete when the returned cursor ID is 0.
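A sketch of that loop in Python, assuming a redis-py client `r` with `decode_responses=True`; `hash_pages` is a hypothetical helper name. It yields one batch per HSCAN call and stops when the server returns cursor 0:

```python
from typing import Dict, Iterator


def hash_pages(r, name: str, page_size: int = 1000) -> Iterator[Dict[str, str]]:
    """Yield successive batches of field/value pairs from one big hash.

    COUNT is only a hint: a batch may be larger or smaller than page_size
    (small, compactly encoded hashes come back in a single batch), and a
    field may occasionally appear in more than one batch.
    """
    cursor = 0
    while True:
        cursor, batch = r.hscan(name, cursor=cursor, count=page_size)
        if batch:
            yield batch
        # Iteration is complete only when the server hands back cursor 0.
        if cursor == 0:
            break
```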