Iterating through keys in Redis

I have just started with Redis. My DB contains about 1 billion records. Using HKEYS * results in an out of memory error.
Is there a way to iterate through keys? Something like HKEYS * but with a limit n?
Edit:
I am now using a loop which matches a pattern
for c in '1234567890abcdef':
    r.keys(c + '*')

Available since Redis 2.8.0 are the cursor based Redis iteration commands (SCAN, HSCAN etc) that let you iterate efficiently over billions of keys.
For your specific case, start using HSCAN instead of HKEYS/HGETALL. It is efficient, cheap on server resources and scales very well. Unlike HKEYS, HSCAN also accepts a MATCH pattern.
e.g.
127.0.0.1:6379> HMSET hash0 key0 value0 key1 value1 entry0 data0 entry1 data1
OK
127.0.0.1:6379> HSCAN hash0 0 MATCH key*
1) "0"
2) 1) "key0"
   2) "value0"
   3) "key1"
   4) "value1"
127.0.0.1:6379> HSCAN hash0 0
1) "0"
2) 1) "key0"
   2) "value0"
   3) "key1"
   4) "value1"
   5) "entry0"
   6) "data0"
   7) "entry1"
   8) "data1"
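
The same cursor loop can be sketched in Python. This is a minimal sketch, assuming a redis-py style client object r whose hscan(name, cursor, match, count) returns a (cursor, dict) pair; the helper name iter_hash_fields and the default batch size are illustrative and not part of the answer above.

```python
def iter_hash_fields(r, name, match=None, count=1000):
    """Yield (field, value) pairs of one hash, one HSCAN batch at a time.

    `r` is assumed to be a redis-py style client. COUNT is only a hint to
    the server, so a batch can hold fewer or more entries than `count`.
    """
    cursor = 0
    while True:
        cursor, batch = r.hscan(name, cursor=cursor, match=match, count=count)
        for field, value in batch.items():
            yield field, value
        # The server returns cursor 0 once the whole hash has been visited.
        if int(cursor) == 0:
            break


# Example use (needs a running server and the redis package):
# import redis
# r = redis.Redis()
# for field, value in iter_hash_fields(r, "hash0", match="key*"):
#     print(field, value)
```

Because the generator holds only one batch in memory at a time, the size of the hash does not matter to the client.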

You can't iterate over redis keys directly, but you can accomplish something very similar by transactionally writing the key portion of your key-value pair to a sorted set at the same time you write your key-value pair.
Downstream, you would "iterate" over your keys by reading n keys from the sorted set, and then transactionally removing them from the sorted set at the same time as you remove the associated key-value pair.
I wrote up an example with some C# code here: http://rianjs.net/2014/04/how-to-iterate-over-redis-keys/
You could do this in any language that has a redis library that supports transactions.
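
A rough Python sketch of this pattern, assuming a redis-py style client r with pipeline(transaction=True). The index name key_index, the helper names put and take_batch, and the use of the insertion time as the score are all assumptions made for illustration; they are not taken from the linked C# example.

```python
import time

INDEX = "key_index"  # hypothetical name of the sorted set that tracks keys


def put(r, key, value):
    """Write the value and record its key in the index in one MULTI/EXEC."""
    pipe = r.pipeline(transaction=True)
    pipe.set(key, value)
    pipe.zadd(INDEX, {key: time.time()})  # score = insertion time (assumption)
    pipe.execute()


def take_batch(r, n):
    """Read up to n keys with their values, then remove both together."""
    keys = r.zrange(INDEX, 0, n - 1)
    if not keys:
        return {}
    values = r.mget(keys)
    pipe = r.pipeline(transaction=True)
    pipe.delete(*keys)
    pipe.zrem(INDEX, *keys)
    pipe.execute()
    return dict(zip(keys, values))
```

The read and the removal are separate round trips here, so the sketch assumes a single consumer. With several consumers, the read would have to move into the same transaction (for example with WATCH or a Lua script).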

For iterating through keys:
SCAN cursor [MATCH pattern] [COUNT count]
http://redis.io/commands/scan
For iterating through the fields and values of a hash:
HSCAN key cursor [MATCH pattern] [COUNT count]
http://redis.io/commands/hscan

Sorry, at the current time, year 2012, the simple answer is no, however, with lua scripting you could do it, although that is not direct redis in the strictest sense.

Related

Can we store huge array of objects into Redis?

I have an array of objects similar to this
MyArray({obj1:"obj1value",obj2:"obj2value",obj3:"obj3value"})
The array holds 500,000 entries. Can I push it into Redis? I tried HMSET but could not get it to work properly.
Storing this array in MySQL is expensive, because I have to iterate over it and insert each entry one at a time. Hence I thought of going with Redis. I am using Node and Redis.

You can use a Redis hash instead of a Redis list to store your data.
var redis = require("redis");
var client = redis.createClient();
client.hmset('myObject', {
    'obj1': 'objValue1',
    'obj2': 'objValue2',
    'obj3': 'objValue3'
});
Here, myObject is the name of the hash that holds the key-value pairs.
Result:
127.0.0.1:6379> hgetall myObject
1) "obj1"
2) "objValue1"
3) "obj2"
4) "objValue2"
5) "obj3"
6) "objValue3"
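
For 500,000 entries it may help to send the fields in several smaller HSET calls instead of one huge command. A minimal Python sketch of that idea, assuming a redis-py style client r whose hset accepts a mapping argument; the helper names chunks and load_hash and the batch size are illustrative assumptions, and the same approach carries over to the Node client.

```python
from itertools import islice


def chunks(mapping, size):
    """Yield successive dicts with at most `size` items from `mapping`."""
    it = iter(mapping.items())
    while True:
        part = dict(islice(it, size))
        if not part:
            return
        yield part


def load_hash(r, name, mapping, batch_size=10_000):
    """Write `mapping` into the hash `name` using one HSET per batch."""
    # A non-transactional pipeline only batches the network traffic.
    pipe = r.pipeline(transaction=False)
    for part in chunks(mapping, batch_size):
        pipe.hset(name, mapping=part)
    pipe.execute()


# Example use (needs a running server and the redis package):
# import redis
# load_hash(redis.Redis(), "myObject", {"obj1": "objValue1", "obj2": "objValue2"})
```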

In Redis, is it possible to sort a member across multiple Sorted Sets?

I'm tracking members in multiple Sorted Sets in Redis as a way to do multi-column indexing on the members. As an example, let's say I have two Sorted Sets, lastseen (which is epoch time) and points, and I store usernames as members in these Sorted Sets.
I'm wanting to first sort by lastseen so I can get the users seen within the last day or month, then I'm wanting to sort the resulting members by points so I effectively have the members seen within the last day or month sorted by points.
This would be easy if I could store the result of a call to ZREVRANGEBYSCORE to a new Sorted Set (we'll call the new Sorted Set temp), because then I could sort lastseen with limits, store the result to temp, use ZINTERSTORE against temp and points with a weight of zero for out (stored to result), and finally use ZREVRANGEBYSCORE again on result. However, there's no built-in way in Redis to store the result of ZRANGE to a new Sorted Set.
I looked into using the solution posted here, and while it does seem to order the results correctly, the resulting scores in the Sorted Set can no longer be used to accurately limit results based on time (ie. only want ones within the last day).
For example:
redis> ZADD lastseen 12345 "foo"
redis> ZADD lastseen 12350 "bar"
redis> ZADD lastseen 12355 "sucka"
redis> ZADD points 5 "foo"
redis> ZADD points 3 "bar"
redis> ZADD points 9 "sucka"
What I'd like to end up with, assuming my time window is between 12349 and 12356, is the list of members ['sucka', 'bar'].

The solutions I can think of are:
1) Your wish was to ZREVRANGEBYSCORE and somehow save the temporary result. Instead you could copy the zset (which can be done with a ZINTERSTORE with only one set as an argument), then do a ZREMRANGEBYSCORE on the new copy to get rid of the times you're not interested in, then do the final ZINTERSTORE.
2) Do it in a loop on the client, as Eli suggested.
3) Do the same thing in a Lua script.
These are all potentially expensive operations, so what's going to work best will depend on your data and use case. Without knowing more, I would personally lean towards the Lua solution.
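
Option 1 could look roughly like this in Python. It is a sketch under assumptions: r is a redis-py style client, the temporary key names tmp:window and tmp:result are made up for the example, and the function name is hypothetical.

```python
def members_in_window_by_points(r, start, finish,
                                seen="lastseen", points="points",
                                tmp="tmp:window", out="tmp:result"):
    """Members with start <= lastseen <= finish, highest points first."""
    pipe = r.pipeline(transaction=True)
    # 1. Copy the zset: ZINTERSTORE with a single input set.
    pipe.zinterstore(tmp, [seen])
    # 2. Drop everything outside the time window ("(" marks an exclusive bound).
    pipe.zremrangebyscore(tmp, "-inf", "(%s" % start)
    pipe.zremrangebyscore(tmp, "(%s" % finish, "+inf")
    # 3. Intersect with points; weight 0 discards the timestamps.
    pipe.zinterstore(out, {tmp: 0, points: 1})
    pipe.zrevrange(out, 0, -1)
    # 4. Clean up the temporary keys.
    pipe.delete(tmp, out)
    return pipe.execute()[4]  # reply of the ZREVRANGE call
```

With the sample data from the question and a window of 12349 to 12356, this should yield sucka followed by bar. The copy in step 1 is O(N) in the size of lastseen, which is why the answer calls these operations potentially expensive.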

For queries that get this complex, you want to supplement Redis' built-in commands with another processing language. The easiest way is to call Redis from whatever your backend language is and do the remaining processing there. An example in Python using redis-py is:
import redis

finish_time, start_time = 12356, 12349
# decode_responses=True makes the client return str instead of bytes
r = redis.Redis(host='localhost', port=6379, db=0, password='some_pass', decode_responses=True)
entries_in_time_frame = r.zrevrangebyscore('lastseen', finish_time, start_time)
# Queue one ZSCORE per member and send them all in a single round trip
p = r.pipeline()
for entry in entries_in_time_frame:
    p.zscore('points', entry)
scores = zip(entries_in_time_frame, p.execute())
# Highest points first
sorted_entries = [tup[0] for tup in sorted(scores, key=lambda tup: tup[1], reverse=True)]
# sorted_entries == ['sucka', 'bar']
Note the pipeline: we only ever make two round trips to the Redis server, so network latency shouldn't slow us down much. If you need to go even faster (perhaps if what's returned by the first ZREVRANGEBYSCORE is very long), you can rewrite the same logic as a Lua script. Here's a working example (note my Lua is rusty, so this can be optimized):
local start_time = ARGV[1]
local finish_time = ARGV[2]
local entries_in_time_frame = redis.call('ZREVRANGEBYSCORE', KEYS[1], finish_time, start_time)
-- ZSCORE replies with a string, so convert it before comparing numerically
local sort_function = function (k0, k1)
    local s0 = tonumber(redis.call('ZSCORE', KEYS[2], k0))
    local s1 = tonumber(redis.call('ZSCORE', KEYS[2], k1))
    return (s0 > s1)
end
table.sort(entries_in_time_frame, sort_function)
return entries_in_time_frame
You can call it like so:
redis-cli -a some_pass EVAL "$(cat script.lua)" 2 lastseen points 12349 12356
Returning (with the sample data from the question):
1) "sucka"
2) "bar"

Redis scan count: How to force SCAN to return all keys matching a pattern?

I am trying to fetch the values stored under the keys that match a pattern in Redis. I tried using SCAN so that I can later use MGET to get all the values. The problem is:
SCAN 0 MATCH "foo:bar:*" COUNT 1000
does not return any value whereas
SCAN 0 MATCH "foo:bar:*" COUNT 10000
returns the desired keys.
How do I force SCAN to look through all the existing keys? Do I have to look into Lua for this?

The command below examines roughly the first 1000 keys of the keyspace, starting from cursor 0, and returns the ones that match:
SCAN 0 MATCH "foo:bar:*" COUNT 1000
The reply also contains a new cursor, which you pass to the next call
SCAN YOUR_NEW_CURSOR MATCH "foo:bar:*" COUNT 1000
to examine the next 1000 keys. COUNT limits how many keys are examined per call, not how many are returned, so when you raise it from 1000 to 10000 you examine more keys and, in your case, find more matches.
To scan the entire keyspace, keep calling SCAN until the cursor in the reply is zero, which marks the end of a full iteration.
Alternatively, use the INFO command to get the number of keys, for example
db0:keys=YOUR_AMOUNT_OF_KEYS,expires=0,avg_ttl=0
Then call
SCAN 0 MATCH "foo:bar:*" COUNT YOUR_AMOUNT_OF_KEYS

Just going to put this here for anyone interested in how to do it using the python redis library:
import redis
redis_server = redis.StrictRedis(host=settings.redis_ip, port=6379, db=0)
mid_results = []
cur, results = redis_server.scan(0,'foo:bar:*',1000)
mid_results += results
while cur != 0:
    cur, results = redis_server.scan(cur,'foo:bar:*',1000)
    mid_results += results
final_uniq_results = set(mid_results)
It took me a few days to figure this out, but basically each scan will return a tuple.
Examples:
(cursor, results_list)
(5433L, [... keys here ...])
(3244L, [... keys here, maybe ...])
(6543L, [... keys here, duplicates maybe too ...])
(0L, [... last items here ...])
Keep scanning until the cursor returns to 0.
There is a guarantee that it will return to 0, even if a scan returns an empty results_list along the way.
However, as noted by @Josh in the comments, SCAN is not guaranteed to terminate under a race condition where inserts are happening at the same time.
I had a hard time figuring out what the cursor number was and why I would randomly get an empty list or repeated items, even though I knew I had just put items in.
After reading
https://github.com/antirez/redis/blob/unstable/src/dict.c#L772-L855
it made more sense, but there is still some deep programming magic and compromise involved in iterating the sets.

If your use case involves Python, or if you just want to get the values once and have Python installed on your machine, this is a trivial task with the scan_iter method of the redis Python library:
from redis import StrictRedis
redis = StrictRedis.from_url(REDIS_URI)
keys = []
for key in redis.scan_iter('foo:bar:*', 1000):
    keys.append(key)
In the end, keys will contain all the keys you would get by applying @khanou's method.
This is also more efficient than doing shell scripts, since those spawn a new client on each iteration of the loop.
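
Since the question was about fetching the values afterwards, the scan can be combined with MGET. A minimal sketch, assuming a redis-py style client r with scan_iter and mget; the helper name scan_values and the batch size are illustrative assumptions.

```python
from itertools import islice


def scan_values(r, pattern, batch_size=1000):
    """Yield (key, value) for every key matching `pattern`.

    Keys are discovered with SCAN (through scan_iter) and fetched with one
    MGET per batch. SCAN may report a key more than once, so keys that were
    already seen are skipped.
    """
    seen = set()
    keys = iter(r.scan_iter(match=pattern, count=batch_size))
    while True:
        raw = list(islice(keys, batch_size))
        if not raw:
            break
        batch = []
        for key in raw:
            if key not in seen:
                seen.add(key)
                batch.append(key)
        if batch:
            # A value is None if the key vanished or does not hold a string.
            for key, value in zip(batch, r.mget(batch)):
                yield key, value
```

The seen set grows with the number of matching keys, so for very large keyspaces it may be preferable to drop it and tolerate the occasional duplicate.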

How to get the values of multiple keys at a time in Redis?

How do I get the values of multiple keys in Redis? For example, the keys are x, y, and z, and they have the values a, b, and c respectively. I want to get the values of all such keys at once.

Redis natively supports the MGET command, as shown in the documentation:
redis> SET key1 "Hello"
OK
redis> SET key2 "World"
OK
redis> MGET key1 key2 nonexisting
1) "Hello"
2) "World"
3) (nil)
redis>
This method allows you to retrieve the values of multiple keys in a single roundtrip to the server. Depending on the actual platform you are using and the client code, the method might be called differently in your client library. For example if you are using .NET and the ServiceStack.Redis client you could use the GetValues method on the IRedisClient:
List<string> GetValues(List<string> keys);
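
In Python the equivalent could be sketched as follows, assuming a redis-py style client r whose mget returns the values in the same order as the keys; the helper name get_many is made up for the example.

```python
def get_many(r, keys):
    """Return {key: value} for `keys` using a single MGET round trip."""
    keys = list(keys)
    if not keys:
        return {}  # MGET needs at least one key
    # Keys that do not exist come back as None, like (nil) in redis-cli.
    return dict(zip(keys, r.mget(keys)))


# Example use (needs a running server and the redis package):
# import redis
# print(get_many(redis.Redis(), ["x", "y", "z"]))
```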

Add a value to a Redis list only if it doesn't already exist in the list?

I'm trying to add a value to a list but only if it hasn't been added yet.
Is there a command to do this or is there a way to test for the existence of a value within a list?
Thanks!

I need to do the same.
My approach is to remove the element from the list and then add it again. If the element is not in the list, LREM simply returns 0, so there is no error.
lrem mylist 0 myitem
rpush mylist myitem
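
To keep other clients from seeing the list between the two commands, they can be wrapped in a transaction. A minimal Python sketch, assuming a redis-py 3.x or later style client r (where lrem takes name, count, value in that order); the helper name push_unique is hypothetical.

```python
def push_unique(r, list_name, item):
    """Append `item`, first removing any existing copies of it.

    Both commands run in one MULTI/EXEC block. Returns the number of
    copies that were removed (0 if the item was not in the list).
    """
    pipe = r.pipeline(transaction=True)
    pipe.lrem(list_name, 0, item)  # count 0 removes every occurrence
    pipe.rpush(list_name, item)
    removed, _new_length = pipe.execute()
    return removed
```

Note that an item that was already present moves to the tail of the list, so this only fits when the list order does not matter.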

As Tommaso Barbugli mentioned, you should use a set instead of a list if you only need unique values.
See the Redis documentation for SADD:
redis> SADD myset "Hello"
(integer) 1
redis> SADD myset "World"
(integer) 1
redis> SADD myset "World"
(integer) 0
redis> SMEMBERS myset
1) "World"
2) "Hello"
If you want to check the presence of a value in the set you may use SISMEMBER
redis> SADD myset "one"
(integer) 1
redis> SISMEMBER myset "one"
(integer) 1
redis> SISMEMBER myset "two"
(integer) 0

It looks like you need a set or a sorted set.
Sets have O(1) membership test and enforced uniqueness.

If you can't use the SETs (in case you want to achieve some blocking POP/PUSH list features) you can use a simple script:
script load 'local exists = false; for idx = 0, redis.call("LLEN", KEYS[1]) - 1 do if (redis.call("LINDEX", KEYS[1], idx) == ARGV[1]) then exists = true; break; end end; if (not exists) then redis.call("RPUSH", KEYS[1], ARGV[1]) end; return not exists or 0'
This returns the SHA1 digest of the script you just loaded. Note that LINDEX is zero-based, so the loop runs from 0 to LLEN - 1.
Then call it with that digest, for example:
evalsha 3e31bb17571f819bea95ca5eb5747a373c575ad9 1 test-list myval
where
3e31bb17571f819bea95ca5eb5747a373c575ad9 is the SHA1 digest returned by SCRIPT LOAD (use the value your own call returned)
1 is the number of keys passed to the script (always 1 for this function)
test-list is the name of your list
myval is the value you need to add
It returns 1 if the new item was added or 0 if it was already in the list.

A similar existence check is available for hashes through the HEXISTS command in Redis.

Checking a list to see if a member exists within it is O(n), which can get quite expensive for big lists and is definitely not ideal. That said, everyone else seems to be giving you alternatives. I'll just tell you how to do what you're asking to do, and assume you have good reasons for doing it the way you're doing it. I'll do it in Python, assuming you have a connection to Redis called r, a list stored under the key list_name and some new item to add called new_item:
# LRANGE with 0 and -1 returns the whole list
lst = r.lrange(list_name, 0, -1)
if new_item not in lst:
    r.rpush(list_name, new_item)

I encountered this problem while adding to a task worker queue, because I wanted to avoid adding many duplicate tasks. Using a Redis set (as many people are suggesting) would be nice, but Redis sets don't have a "blocking pop" like BRPOPLPUSH, so they're not good for task queues.
So, here's my slightly non-ideal solution (in Python):
def pushOnlyNewItemsToList(redis, list_name, items):
    """ Adds only the items that aren't already in the list.
    Though if run simultaneously in multiple threads, there's still a tiny chance of adding duplicate items.
    O(n) on the size of the list."""
    existing_items = set(redis.lrange(list_name,0,-1))
    new_items = set(items).difference(existing_items)
    if new_items:
        redis.lpush(list_name, *new_items)
Note the caveats in the docstring.
If you need to truly guarantee no duplicates, the alternative is to run LREM, LPUSH inside a Redis pipeline, as in 0xAffe's answer. That approach causes less network traffic, but has the downside of reordering the list. It's probably the best general answer if you don't care about list order.
