Is it possible to search for occurrences of a specific value in redis?
It's easy enough to do the same for keys
SET firstname "John"
KEYS f?rstname
["firstname"]
But can one search for all occurrences of "John" or better yet "J*hn" ?
As far as I know, Redis has no such option. As you mentioned, KEYS can search for keys that match a pattern, but the equivalent for values would have to search every key, field, and element, which is not trivial because Redis has richer data structures such as hashes, sets, and lists. Such an operation would cost at least O(N) in the number of keys, and more once the elements inside each structure are counted. That O(N) cost is also why KEYS itself shouldn't be used in production environments.
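To make that cost concrete, here is a minimal sketch of a brute-force value search. It assumes `client` behaves like a redis-py client created with `decode_responses=True` (so `scan_iter`, `type`, and `get` return `str`); the function name `find_keys_by_value` is hypothetical. Python's `fnmatchcase` is used for the glob match, and its rules are close to, but not identical to, the Redis glob syntax.

```python
from fnmatch import fnmatchcase


def find_keys_by_value(client, value_pattern, key_pattern="*"):
    """Return the string keys whose value matches a glob pattern.

    Assumes a redis-py style client with decode_responses=True.
    Every key is visited once, so the cost grows with the keyspace.
    """
    matches = []
    for key in client.scan_iter(match=key_pattern):
        # GET only works on plain strings; skip hashes, sets, lists, etc.
        if client.type(key) != "string":
            continue
        value = client.get(key)
        if value is not None and fnmatchcase(value, value_pattern):
            matches.append(key)
    return matches
```

Calling `find_keys_by_value(client, "J*hn")` would visit every key, so this is only reasonable for small keyspaces or offline jobs.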
Related
For example, I see many people are doing something like the following:
> set data:1000 "some string 1"
> set data:1001 "some string 2"
But what about using a hash to minimize the number of keys?
> hset data 1000 "some string 1"
> hset data 1001 "some string 2"
The second way creates only one key, data, instead of one key per entry as in the first way.
Which way is recommended?
I've seen some people and tutorials use hset data:10 01 xxx, but that is not what I'm asking about. My question is simply which is recommended: set data:1001 xxx or hset data 1001 xxx.
Also, I don't plan to modify hash-max-zipmap-entries or hash-max-zipmap-value, so the hash will eventually exceed both limits. With that configuration, are the two ways equivalent, or is one of them recommended?
Reasons to use strings:
you need per value timeouts
the values are semantically isolated
you're on cluster and want the values to be sharded over different nodes to spread load (sharding is based on the key)
Reasons to use hashes:
you want to be able to purge all of them at once (del/unlink), or have a timeout that impacts all of those values at once
you want to be able to enumerate them (prefer hscan/hgetall over scan/keys)
slightly better memory usage on the keys themselves
the values are semantically related
it is OK for all the values to be on the same node (whether single-server or cluster)
This all depends on the tradeoffs you want to support. In general, hashes have a smaller memory footprint than simple keys, and for small hashes that stay within the compact-encoding limits the saving can be up to an order of magnitude. Access to hash values is also constant time. So if you are using Redis simply as a key-value store, small hashes are considerably more memory-efficient than simple keys.
However, you will need to use simple keys if you need to support expiration, keyspace notifications, etc.
Just be careful to tune hash-max-zipmap-entries and hash-max-zipmap-value in your redis.conf (renamed hash-max-ziplist-* in Redis 2.6 and hash-max-listpack-* in Redis 7.0) so that hashes keep the compact encoding in your environment.
You can read all about the details in the memory optimization section of the documentation.
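For reference, the hset data:10 01 xxx form mentioned in the question is a bucketing scheme that keeps every hash small enough to stay in the compact encoding. A minimal sketch of the key split, where the helper name `bucket_for` and the default of two field digits are assumptions for illustration:

```python
def bucket_for(key_prefix, item_id, field_digits=2):
    """Split a numeric id into (hash key, field name).

    With field_digits=2, id 1001 maps to hash "data:10" and field "01",
    so each hash holds at most 100 fields.
    """
    text = str(item_id)
    if len(text) <= field_digits:
        # Short ids all share the bucket with an empty numeric part.
        return f"{key_prefix}:", text
    return f"{key_prefix}:{text[:-field_digits]}", text[-field_digits:]
```

With a redis-py style client this could be used as `client.hset(*bucket_for("data", 1001), "some string 1")`. If the hashes are allowed to grow past the configured limits, as in the question, the memory advantage of this layout largely disappears.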
I have an application for which I'm storing millions of keys in Redis in the format:
Type+#+Year+#+MachineType+#+City+#+State+#+Country+#+Size
Sample_Key                                      Value
Retail#2017#MachineA#SanFrancisco#CA#USA#500    1000
Bulk#2017#MachineB#NewYorkCity#NY#USA#1000      100000
Retail#2017#MachineA#NewYorkCity#NY#USA#1000    5000
My customers would come in and want a specific value or set of values aggregated, so say, everything in San Francisco, CA and New York, NY, so as to get 1000 and 5000 (and then we'd perform some aggregation on it). What Redis function(s) are best suited to say, "Give me all values for keys that contain either San Francisco or New York", or "Give me all keys for store sizes of 1000", or any such combination.
I'd need to get all these values, aggregate them and serve it out with (ideally) millisecond-level latency.
I have looked up KEYS which they say NOT to use in Production (Plus, I never quite figured out how to get the values for BOTH (or potentially more) cities/other combinations). I have looked up SCAN, but being cursor-based, it might not be the best solution. There's nothing else that I have looked at which quite covers the scenario I've described in a quick and easy way. Help?
You are right not to use KEYS. In your case SCAN is a poor fit too, because you would have to iterate over a large number of keys (millions). SCAN won't block other commands for long, but the total time spent is no shorter. If you run SCAN often, the load on Redis will be quite heavy. In fact, I don't think Redis fits this use case as it stands; an RDBMS is a better match.
I have two suggestions:
Use SCAN, be aware of the potential risk, and test it seriously under realistic load.
Store and maintain a key relationship (e.g. city to keys) somewhere: first find the keys you need, then fetch their values from Redis. In your case, I'd suggest putting all the keys in an RDBMS table and using LIKE to find the ones you need (a search engine such as Solr or Elasticsearch will of course perform much better).
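One variation on the second suggestion keeps the key relationship in Redis sets instead of an RDBMS table. This is a swapped-in technique, not what the answer prescribes. The sketch below assumes `client` behaves like a redis-py client; the `idx:<field>:<value>` key naming and both function names are hypothetical.

```python
FIELDS = ("type", "year", "machine", "city", "state", "country", "size")


def index_record(client, key, value):
    """Store the value and add the key to one index set per attribute."""
    client.set(key, value)
    for field, part in zip(FIELDS, key.split("#")):
        client.sadd(f"idx:{field}:{part}", key)


def sum_for_cities(client, cities):
    """Sum the values of every key that belongs to any of the given cities."""
    # SUNION merges the per-city index sets; sorted() accepts a set or a list.
    keys = sorted(client.sunion([f"idx:city:{c}" for c in cities]))
    if not keys:
        return 0
    # MGET fetches all matching values in one round trip.
    return sum(int(v) for v in client.mget(keys) if v is not None)
```

Combinations such as "size 1000 in New York" would use SINTER on the corresponding index sets instead of SUNION. The price is extra memory and the need to keep the index sets in step with every write and delete.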
I can think of two ways of checking existence using redis:
Use the whole database as a 'set': SET a key, then check existence by GETing it (or by using EXISTS, as mentioned in the comment by @Sergio Tulentsev)
Use SADD to add all members to a key and check existence by SISMEMBER
Which one is better? Will it be a problem, compared to the same amount of keys in a single set, if I choose the first method and the number of keys in a database gets larger?
In fact, besides these two methods, you can also use the HASH data structure with the HEXISTS command (I'll call this the third solution).
All these solutions are fast enough, and it's NOT a problem if you have a large SET, HASH, or keyspace.
So, which one should we use? It depends on lots of things...
Does the key have a value?
Keys in both the first and the third solution can have a value, while members in the second solution CANNOT.
So if there's no value for each key, I'd prefer the second solution, i.e. the SET solution. Otherwise, you have to use the first or third solution.
Does the value have structure?
If the value is NOT a raw string but a data structure, e.g. a LIST or SET, you have to use the first solution, since a HASH field's value can ONLY be a raw string.
Do you need to do set operations?
If you need to do intersection, union or diff operations on multiple data sets, you should use the second solution. Redis has built-in commands for these operations, although they might be slow commands.
Memory efficiency consideration
Redis uses a more memory-efficient encoding for small SETs and HASHes. So when you have lots of small data sets, the second and third solutions can save a lot of memory. See this for details.
UPDATE
Do you need to set TTL for these keys?
As @dizzyf points out in the comment, if you need to set a TTL for these keys, you have to use the first solution, because the items of a HASH or SET do NOT have an expiration property. You can only set a TTL on the entire HASH or SET, NOT on its elements. (Redis 7.4 and later can expire individual HASH fields with HEXPIRE; SET members still cannot expire.)
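The three existence checks compared above might look like this side by side. This is a sketch assuming `client` behaves like a redis-py client; the key names (`member:` prefix, `members`, `members:hash`) and function names are hypothetical.

```python
def add_as_key(client, member, ttl_seconds=None, prefix="member:"):
    """Solution 1: one top-level key per member, optionally with its own TTL."""
    client.set(prefix + member, 1, ex=ttl_seconds)


def exists_as_key(client, member, prefix="member:"):
    """Solution 1: EXISTS returns how many of the given keys exist."""
    return client.exists(prefix + member) == 1


def exists_in_set(client, member, set_key="members"):
    """Solution 2: SISMEMBER against a single SET."""
    return bool(client.sismember(set_key, member))


def exists_in_hash(client, member, hash_key="members:hash"):
    """Solution 3: HEXISTS against a single HASH."""
    return bool(client.hexists(hash_key, member))
```

Only the first form accepts a per-member TTL, which matches the point made in the update.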
How can I find keys that match any of several patterns? For example, I have keys with the
foo:*, event:*, poi:* and article:* patterns.
How do I use the Redis KEYS command to match foo:* or poi:*? In other words,
find all keys with the prefix foo: or poi:
You should not do this. KEYS is mainly a debug command. It is not supposed to be used for anything else.
Redis is not a database supporting ad-hoc queries: you are supposed to provide access paths for the data you put into Redis (using extra set or hash or zset indexes).
If you really need to run arbitrary boolean expressions on keys to select data, I would suggest doing it offline with the redis-rdb-tools package.
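If a one-off enumeration is still needed on a live server, SCAN is the incremental alternative to KEYS, and several patterns can be covered by running one scan per pattern. This is a sketch rather than a recommendation from the answer; it assumes `client` behaves like a redis-py client, and `scan_any` is a hypothetical helper name.

```python
def scan_any(client, patterns, count=1000):
    """Yield each key that matches at least one glob pattern, without KEYS."""
    seen = set()
    for pattern in patterns:
        for key in client.scan_iter(match=pattern, count=count):
            # SCAN may return a key more than once, and patterns may overlap.
            if key not in seen:
                seen.add(key)
                yield key
```

For example, `list(scan_any(client, ["foo:*", "poi:*"]))`. Each pattern still walks the whole keyspace, so maintaining explicit index sets remains the better access path for anything that runs regularly.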
What is the most convenient/fast way to implement a sorted set in Redis where the values are objects, not just strings?
Should I just store object IDs in the sorted set and then query every one of them individually by its key, or is there a way that I can store them directly in the sorted set, i.e. must the value be a string?
It depends on your needs. If you need to share this data with other zsets/structures and want to write the value only once for every change, you can put an id as the zset value and add a hash to store the object. This implies additional queries when you read data from the zset (one ZRANGE plus n HGETALL calls for n values in the zset), but writing and synchronising the value between many structures is cheap (you only update the hash that corresponds to the value).
But if it is "self-contained", with few or no accesses outside the zset, you can serialize your object to a format of your choice (JSON, MessagePack, Kryo...) and store it as the value of your zset entry. This way, reads from the zset are faster (a single query at O(log(N)+M), which is probably the best you can get), but you may have to duplicate the value in other zsets/structures if you need to read or write it elsewhere, which also means keeping the copies in sync by hand.
Redis documents the time complexity of each command, so work out which queries you would issue and total up their cost to compare the two options properly.
Also, don't forget that Redis only offers optimistic locking (WATCH with MULTI/EXEC), so if you need pessimistic locking (because of contention, for instance) you will have to implement it by hand and/or with Lua scripts. If you need a lot of synchronisation, the first option seems better (slower reads, though still good, with fewer queries and less complexity on writes). If your values don't change much and memory is not a problem, the second option gives better read performance (you can duplicate the value in Redis and synchronise the copies periodically, for instance).
Short answer: Yes, everything must be stored as a string.
Longer answer: you can serialize your object into any format of your choosing, text or binary, since Redis strings are binary-safe. Most people choose MsgPack or JSON because they are compact and serializers are available in just about any language.
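The two layouts discussed in these answers can be sketched as follows. The code assumes `client` behaves like a redis-py client created with `decode_responses=True`, that objects are flat dicts of strings or numbers, and that the `obj:<id>` key naming and all function names are hypothetical.

```python
import json


def add_inline(client, zset_key, obj, score):
    """Serialized layout: the JSON text itself is the sorted-set member."""
    # sort_keys gives the same member text for equal objects.
    member = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    client.zadd(zset_key, {member: score})


def read_inline(client, zset_key, start=0, stop=-1):
    """One ZRANGE call returns the objects directly."""
    return [json.loads(m) for m in client.zrange(zset_key, start, stop)]


def add_by_id(client, zset_key, obj_id, obj, score):
    """Id layout: the member is an id and the fields live in a hash."""
    client.hset(f"obj:{obj_id}", mapping=obj)
    client.zadd(zset_key, {obj_id: score})


def read_by_id(client, zset_key, start=0, stop=-1):
    """One ZRANGE call plus one HGETALL per returned id."""
    ids = client.zrange(zset_key, start, stop)
    return [client.hgetall(f"obj:{i}") for i in ids]
```

With the serialized layout, changing any field of an object changes the member text, so an update means removing the old member and adding the new one. With the id layout, only the hash is rewritten.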