I have key/value pairs stored in Redis and want to expire all keys that start with a given pattern.
For example, suppose I have stored:
SET hello.world "Hello"
SET there.how "There"
SET hello.are.you "Are you"
After the keys have been set, I want to expire all keys starting with "hello".
Assume there is a very large number of keys, not just this simple example. I do not want to make many round trips to Redis.
I think the easiest way is to iterate over the keys with scan(pattern) and expire them through a pipeline. It still costs some round trips, as you say, but only one SCAN and one pipelined batch per page of keys:
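import re

def while_true_loop():
    # rs is an existing redis-py client, assumed to be created with
    # decode_responses=True so that keys come back as str, not bytes.
    # SCAN only understands glob patterns, so pre-filter with a glob and
    # apply the stricter regular expression on the client.
    target_pattern = r'hello\.([A-Za-z0-9]*)$'
    simple_pattern = 'hello.*'
    cursor = 0
    while True:
        cursor, keys = rs.scan(cursor, match=simple_pattern, count=1000)
        pipe = rs.pipeline()
        for key in keys:
            if re.match(target_pattern, key):
                pipe.expire(key, 3600)
        pipe.execute()
        # SCAN is complete only when the cursor comes back as 0;
        # an empty page does not mean the iteration is over.
        if cursor == 0:
            break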
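The loop above can also be written with redis-py's scan_iter, which hides the cursor handling. This is only a sketch: it assumes a redis-py client is passed in as r, and the helper name expire_matching and the default batch size are illustrative choices, not part of the original answer.

```python
def expire_matching(r, pattern, ttl_seconds, batch_size=1000):
    """Set a TTL on every key matching a glob pattern.

    `r` is assumed to be a redis-py client (e.g. redis.Redis()).
    Returns the number of EXPIRE commands that were queued.
    """
    pipe = r.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    queued = 0
    for key in r.scan_iter(match=pattern, count=batch_size):
        pipe.expire(key, ttl_seconds)
        queued += 1
        if queued % batch_size == 0:
            pipe.execute()  # one round trip per full batch
    pipe.execute()  # flush the final, partially filled batch
    return queued
```

Called as expire_matching(r, 'hello.*', 3600), this would cost roughly one SCAN round trip and one pipeline round trip per batch_size keys.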
I need to get, in one call, all data fields for a set of known Redis hash keys. I've used MGET for string keys, such as:
MGET key [key ...]
Available since 1.0.0.
Time complexity: O(N) where N is the number of keys to retrieve.
Returns the values of all specified keys. For every key that does not hold a string value or does not exist, the special value nil is returned. Because of this, the operation never fails.
HMGET only returns fields for one key. I need all fields for many keys, grouped by key.
There is no command like that. Redis hash commands work within a single hash: HMGET returns the requested fields of one hash, and HGETALL returns all of its fields. There is no way to read the fields of multiple hashes with one command.
However, you can issue one HGETALL per hash and pipeline these commands so that they execute in one go.
Option 1
Example implementation in pseudocode
Pipeline p = redis.pipeline();
p.hgetall('key1');
p.hgetall('key2');
p.hgetall('key3');
List<Map<String, String>> results = p.exec();
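In Python, Option 1 could be sketched roughly as follows. It assumes a redis-py client is passed in as r; hgetall_many is a hypothetical helper name used only for illustration.

```python
def hgetall_many(r, keys):
    """Fetch all fields of several hashes in a single round trip.

    `r` is assumed to be a redis-py client. Returns {key: {field: value}}.
    """
    keys = list(keys)
    pipe = r.pipeline(transaction=False)
    for key in keys:
        pipe.hgetall(key)  # queued locally, nothing is sent yet
    results = pipe.execute()  # all HGETALL commands are sent together
    return dict(zip(keys, results))
```

A hash that does not exist comes back as an empty mapping, so the result always has one entry per requested key.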
Option 2
Another option is to write a Lua script and call it using EVAL. Note that KEYS walks the entire keyspace and blocks the server while it runs, so this is only reasonable on small datasets.
local array = {}
local keys = redis.call('KEYS', '<your pattern>')
for _,key in ipairs(keys) do
  local val = redis.call('HGETALL', key)
  array[#array + 1] = val
end
return array
Call the Lua script:
redis-cli EVAL "$(cat test.lua)" 0
1) 1) "field1"
   2) "val"
2) 1) "field1"
   2) "val"
   3) "field2"
   4) "val2"
As noted in another answer, there's no built-in way, but there are more workarounds besides a transaction.
Option 1: use a Lua script (i.e. EVAL "..." 3 myhash1 myhash2 myhash3 myfield)
local r = {}
while (#KEYS > 0) do
  local k = table.remove(KEYS,1)
  r[#r+1] = redis.call('HGET', k, ARGV[1])
end
return r
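From Python, a script like this could be invoked through redis-py's eval. The sketch below assumes a redis-py client is passed in as r; hget_many is a hypothetical helper name, and the embedded script is a variant that walks KEYS with ipairs instead of removing entries from it.

```python
# Variant of the script above: iterates KEYS rather than mutating it.
HGET_MANY_LUA = """
local r = {}
for i, k in ipairs(KEYS) do
  r[i] = redis.call('HGET', k, ARGV[1])
end
return r
"""

def hget_many(r, keys, field):
    """Read the same field from several hashes with one EVAL call.

    `r` is assumed to be a redis-py client.
    """
    keys = list(keys)
    # EVAL layout: script, number of keys, the keys, then extra arguments
    return r.eval(HGET_MANY_LUA, len(keys), *keys, field)
```

For example, hget_many(r, ['myhash1', 'myhash2', 'myhash3'], 'myfield') corresponds to the EVAL "..." 3 myhash1 myhash2 myhash3 myfield call shown above.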
Option 2: write a Redis module
Out of scope as an answer :)
I'm iterating through data and dumping some to a Redis DB. Here's an example:
hmset id:1 username "bsmith1" department "accounting"
How can I increment the unique ID on the fly and then use that during the next hmset command? This seems like an obvious ask but I can't quite find the answer.
Use another key, a String, for storing the last ID. Before calling HMSET, call INCR on that key to obtain the next ID. Wrap the two commands in a MULTI/EXEC block or a Lua script to ensure the atomicity of the transaction.
Like Itamar mentions you can store your index/counter in a separate key. In this example I've chosen the name index for that key.
Python 3
import redis

KEY_INDEX = 'index'
r = redis.from_url(host)  # host: your Redis URL

def store_user(user):
    # INCR is atomic and returns the new value as an int, so no separate GET
    # is needed (a GET could race with other writers).
    # If the key doesn't exist it is created and set to 1.
    index = r.incr(KEY_INDEX)
    result = r.set('user::%d' % index, user)
    ...
Note that user::<index> is an arbitrary key chosen by me. You can use whatever you want.
If you have multiple machines writing to the same DB you probably want to use pipelines.
How can I delete a complete folder in Redis, using the DEL command or in C# with StackExchange.Redis?
SomeFolder1:SomeSubfolder1:somekey1
SomeFolder1:SomeSubfolder1:somekey2
SomeFolder1:SomeSubfolder2:somekey1
How can I delete SomeFolder1:SomeSubfolder1 so that the remaining keys are
SomeFolder1:SomeSubfolder2:somekey1
There's no such thing as a "folder" in redis, and no functionality to del based on a pattern. Options:
use a database; you can have an arbitrary number of databases (each is numbered, the default is 0), and you can discard a database in a single operation: flushdb - so keep all this associated data in one database and you're sorted
use scan to iterate the matching keys, issuing a del for each match
note that scan is a server command, so you need:
const int dbIndex = 0;
var server = muxer.GetServer(...);
var db = muxer.GetDatabase(dbIndex);
foreach (var key in server.Keys(dbIndex, "SomeFolder1:SomeSubfolder1:*",
                                pageSize: 500))
{
    db.KeyDelete(key, flags: CommandFlags.FireAndForget);
}
The FireAndForget flag lets you ignore the individual replies, which means you aren't bound by latency and you don't have TPL overheads.
Fundamentally, though: redis is not meant to be used like this - if you find yourself scanning keys, you are doing something wrong. A more typical implementation might be to use a hash to store the folder (with each key/value pair inside the hash being the contents) - then deleting the hash is one operation. Alternatively, use a set to store the keys of the items inside each logical folder.
Hash approach:
hash = SomeFolder1:SomeSubfolder1
    key = somekey1, value = ...
    key = somekey2, value = ...
hash = SomeFolder1:SomeSubfolder2
    key = somekey1, value = ...
Set approach:
set: SomeFolder1:SomeSubfolder1
    somekey1
    somekey2
set: SomeFolder1:SomeSubfolder2
    somekey1
string: SomeFolder1:SomeSubfolder1:somekey1
    value
string: SomeFolder1:SomeSubfolder1:somekey2
    value
string: SomeFolder1:SomeSubfolder2:somekey1
    value
I have a Redis set with key 'a' and value '1','2','3'.
Is there a way to set a different expire time for each key-value pair in the set?
For example, ('a','1') should expire after 60 seconds whereas ('a','2') should expire after 120 seconds.
Unfortunately, no. Redis' "containers" (i.e. lists, hashes, sets and sorted sets) do not support per-member expiry, although this functionality has been requested many times in the past.
You can, however, implement your own logic to achieve that result. There are several possible approaches to address this - here's one example. Instead of using a set, use a sorted set (ZSET) and set each member's score to its expiry time using epoch values. This type of workflow could be implemented using a Lua script for example. To add members use something like:
redis.call('zadd', KEYS[1], redis.call('time')[1] + ARGV[1], ARGV[2])
and EVAL it using '1 a 60 1' and '1 a 120 2' as arguments, per your example. The server clock is read with the TIME command because os.time() is not available inside Redis' Lua sandbox; on older Redis versions this may require calling redis.replicate_commands() first. To actually "expire" the items from the set, you'll need to delete them once their time has passed. You can do that either by implementing a periodic process that scans your sorted set or upon accessing it. For example, the following Lua can be used to expire members:
redis.call('zremrangebyscore', KEYS[1], '-inf', redis.call('time')[1])
and EVAL it using '1 a' as arguments per your example.
EDIT: How to achieve the above using Python
import time
import redis

def add(r, key, ttl, member):
    # redis-py 3.0+ takes a {member: score} mapping
    r.zadd(key, {member: int(time.time() + ttl)})

def expire(r, key):
    r.zremrangebyscore(key, '-inf', int(time.time()))
...
r = redis.Redis()
add(r, 'a', 60, 1)   # member 1 expires after 60 seconds
add(r, 'a', 120, 2)  # member 2 expires after 120 seconds
# periodically or before every operation do
expire(r, 'a')
I have data consisting of user_ids and tags of these user ids.
The user_ids occur multiple times and have a pre-specified number of tags (500); however, that might change in the future. What must be stored is the user_id, its tags and their counts.
Later I want to easily find the tags with the top scores, and so on. Every time a tag appears, its count is incremented.
My implementation in redis is done using sorted sets
every user_id is a sorted set
key is user_id and is a hex number
works like this:
zincrby user_id:x 1 "tag0"
zincrby user_id:x 1 "tag499"
zincrby user_id:y 1 "tag3"
and so on
Bearing in mind that I want to get the tags with the highest scores, is there a better way?
The second issue is that right now I'm using "keys *" to retrieve these keys for client-side manipulation, which I know is not intended for production systems.
It would also help with memory problems to iterate through a specified number of keys at a time (in the range of 10000). I know that keys have to be stored in memory; however, they don't follow a specific pattern that would allow partial retrieval and let me avoid the "zmalloc" error (4 GB, 64-bit Debian server).
The keys number around 20 million.
Any thoughts?
My first point would be to note that 4 GB is tight for storing 20M sorted sets. A quick try shows that 20M users, each with 20 tags, would take about 8 GB on a 64-bit box (and that accounts for the sorted set ziplist memory optimizations provided with Redis 2.4 - don't even try this with earlier versions).
Sorted sets are the ideal data structure to support your use case. I would use them exactly as you described.
As you pointed out, KEYS cannot be used to iterate on keys. It is rather meant as a debug command. To support key iteration, you need to add a data structure to provide this access path. The only structures in Redis which can support iteration are the list and the sorted set (through the range methods). However, they tend to transform O(n) iteration algorithms into O(n^2) (for list), or O(nlogn) (for zset). A list is also a poor choice to store keys since it will be difficult to maintain it as keys are added/removed.
A more efficient solution is to add an index composed of regular sets. You need to use a hash function to associate a specific user with a bucket, and add the user id to the set corresponding to this bucket. If the user ids are numeric values, a simple modulo function will be enough. If they are not, a simple string hashing function will do the trick.
So to support iteration on user:1000, user:2000 and user:1001, let's choose a modulo 1000 function. user:1000 and user:2000 will be put in bucket index:0 while user:1001 will be put in bucket index:1.
So on top of the zsets, we now have the following keys:
index:0 => set[ 1000, 2000 ]
index:1 => set[ 1001 ]
In the sets, the prefix of the keys is not needed, and it allows Redis to optimize the memory consumption by serializing the sets provided they are kept small enough (integer sets optimization proposed by Sripathi Krishnan).
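Since the question mentions hexadecimal user ids, the bucket function could look like one of the two sketches below. Both function names are illustrative, and CRC32 is just one possible stable string hash; Python's built-in hash() is not suitable here because string hashes are randomized per process.

```python
import zlib

NBUCKETS = 1000  # same bucket count as the modulo example above

def bucket_for_hex_id(user_id):
    """Bucket for an id given as a hexadecimal string, e.g. '3e8' (= 1000)."""
    return int(user_id, 16) % NBUCKETS

def bucket_for_string_id(user_id):
    """Bucket for an arbitrary string id, using CRC32 as a stable hash."""
    return zlib.crc32(user_id.encode("utf-8")) % NBUCKETS
```

With the hex variant, ids 0x3e8 (1000) and 0x7d0 (2000) land in bucket 0 and 0x3e9 (1001) lands in bucket 1, matching the modulo example. Storing the decoded integer in the bucket set, rather than the hex string, keeps the integer set optimization applicable.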
The global iteration consists of a simple loop over the buckets from 0 to 1000 (excluded). For each bucket, the SMEMBERS command is applied to retrieve the corresponding set, and the client can then iterate on the individual items.
Here is an example in Python:
#!/usr/bin/env python3
# ----------------------------------------------------
import random

import redis

POOL = redis.ConnectionPool(host='localhost', port=6379, db=0)
NUSERS = 10000
NTAGS = 500
NBUCKETS = 1000

# ----------------------------------------------------
# Fill redis with some random data
def fill(r):
    p = r.pipeline()
    # Create only 10000 users for this example
    for id in range(NUSERS):
        user = "user:%d" % id
        # Add the user in the index: a simple modulo is used to hash the user id
        # and put it in the correct bucket
        p.sadd("index:%d" % (id % NBUCKETS), id)
        # Add random tags to the user
        for x in range(20):
            tag = "tag:%d" % random.randint(0, NTAGS - 1)
            # redis-py 3.0+ argument order: name, amount, member
            p.zincrby(user, 1, tag)
        # Flush the pipeline every 1000 users
        if id % 1000 == 0:
            p.execute()
            print(id)
    # Flush one last time
    p.execute()

# ----------------------------------------------------
# Iterate on all the users and display their 5 highest ranked tags
def iterate(r):
    # Iterate on the buckets of the key index
    # The range depends on the function used to hash the user id
    for x in range(NBUCKETS):
        # Iterate on the users in this bucket
        for id in r.smembers("index:%d" % x):
            user = "user:%d" % int(id)
            print(user, r.zrevrangebyscore(user, "+inf", "-inf",
                                           start=0, num=5, withscores=True))

# ----------------------------------------------------
# Main function
def main():
    r = redis.Redis(connection_pool=POOL)
    r.flushall()  # WARNING: wipes every database on this server
    m = r.info()["used_memory"]
    fill(r)
    info = r.info()
    print("Keys: ", info["db0"]["keys"])
    print("Memory: ", info["used_memory"] - m)
    iterate(r)

# ----------------------------------------------------
main()
By tweaking the constants, you can also use this program to evaluate the global memory consumption of this data structure.
IMO this strategy is simple and efficient, because it offers O(1) complexity to add/remove users, and true O(n) complexity to iterate on all items. The only downside is the key iteration order is random.