I have a Redis set with key 'a' and value '1','2','3'.
Is there a way to set a different expiry time for each key-value pair in the set?
For example, ('a','1') should expire after 60 seconds, whereas ('a','2') should expire after 120 seconds.
Unfortunately, no. Redis' "containers" (i.e. lists, hashes, sets and sorted sets) do not support per-member expiry, although this functionality has been requested many times in the past.
You can, however, implement your own logic to achieve that result. There are several possible approaches to address this - here's one example. Instead of using a set, use a sorted set (ZSET) and set each member's score to its expiry time using epoch values. This type of workflow could be implemented using a Lua script for example. To add members use something like:
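-- os.time() is not exposed in Redis' Lua sandbox, so read the server clock with TIME
redis.call('zadd', KEYS[1], redis.call('time')[1] + ARGV[1], ARGV[2])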
and EVAL it using '1 a 60 1' and '1 a 120 2' as arguments, per your example. To actually "expire" the items from the set, you'll need to delete them once their time has passed. You can do that either with a periodic process that scans the sorted set, or lazily whenever you access it. For example, the following Lua can be used to expire members:
-- os.time() is not exposed in Redis' Lua sandbox, so read the server clock with TIME
redis.call('zremrangebyscore', KEYS[1], '-inf', redis.call('time')[1])
and EVAL it using '1 a' as arguments per your example.
EDIT: How to achieve the above using Python
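import time
import redis

def add(r, key, ttl, member):
    # redis-py 3.0+ takes a {member: score} mapping; the score is the expiry time in epoch seconds
    r.zadd(key, {member: int(time.time() + ttl)})

def expire(r, key):
    # Remove every member whose expiry time has already passed
    r.zremrangebyscore(key, '-inf', int(time.time()))
...
r = redis.Redis()
add(r, 'a', 60, '1')
add(r, 'a', 120, '2')
# periodically or before every operation do
expire(r, 'a')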
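If you would rather not delete on every access, reads can simply skip members whose score is already in the past. The following is a minimal sketch of that idea, assuming `r` is a redis-py client and that scores are epoch expiry times as written by `add` above; `live_members` and its `now` parameter are hypothetical names introduced here for illustration.

```python
import time

def live_members(r, key, now=None):
    """Return the members of the sorted set whose expiry score is still in the future."""
    if now is None:
        now = int(time.time())
    # The '(' prefix makes the lower bound exclusive, so a member whose
    # expiry time equals `now` is treated as expired, matching expire() above
    return r.zrangebyscore(key, '(%d' % now, '+inf')
```

Stale members still occupy memory until `expire` runs, so this complements the periodic cleanup rather than replacing it.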
Related
I have key-value pairs stored in Redis and wish to expire all keys starting with a given pattern.
For example, I have stored:
SET hello.world "Hello"
SET there.how "There"
SET hello.are.you "Are you"
Then, after the keys are set, I want to expire all keys starting with "hello".
Assume there is a really large number of keys, not just this simple example. I do not wish to make many round-trip calls to Redis.
I think the easiest way is to iterate over the keys with SCAN (using a match pattern) and expire them through a pipeline. As you say, this still takes some round trips, but each one handles a whole batch of keys rather than a single key.
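import re
import redis

# Assumed connection; decode_responses=True returns keys as str so the regex can match them
rs = redis.Redis(decode_responses=True)

def while_true_loop():
    # Optional stricter client-side filter: keeps only keys with a single
    # alphanumeric segment after "hello." (SCAN's MATCH only supports globs)
    target_pattern = r'hello\.([A-Za-z0-9]*)$'
    simple_pattern = 'hello.*'
    cursor = 0
    while True:
        cursor, keys = rs.scan(cursor, match=simple_pattern, count=1000)
        pipe = rs.pipeline()
        for key in keys:
            if re.match(target_pattern, key):
                pipe.expire(key, 3600)
        pipe.execute()
        # The scan is complete only when the cursor returns to 0;
        # an empty batch does not mean the iteration is over
        if cursor == 0:
            break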
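A variant of the same approach lets redis-py drive the cursor through `scan_iter` and flushes the pipeline in fixed-size batches. This is only a sketch: `expire_by_prefix` and `batch_size` are hypothetical names, `r` is assumed to be a redis-py client, and the prefix is assumed to contain no glob characters such as `*`, `?` or `[`.

```python
def expire_by_prefix(r, prefix, ttl, batch_size=1000):
    """Set a TTL on every key starting with `prefix`; return the number of keys touched."""
    # transaction=False: the EXPIRE calls only need batching, not MULTI/EXEC
    pipe = r.pipeline(transaction=False)
    pending = 0
    total = 0
    for key in r.scan_iter(match=prefix + '*', count=batch_size):
        pipe.expire(key, ttl)
        pending += 1
        if pending >= batch_size:
            # One round trip for the whole batch
            pipe.execute()
            total += pending
            pending = 0
    if pending:
        # Flush whatever is left after the scan finishes
        pipe.execute()
        total += pending
    return total
```

For the example in the question this would be called as `expire_by_prefix(r, 'hello', 3600)`.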
In my website, users are allowed to keep the same usernames. Moreover, at any point in time a user logs in, I temporarily save their username in a redis key with a ttl of 10 mins.
The question is: is there any way - using Redis - to find all user ids online within the last 10 mins, sharing the same username?
Currently, I'm extracting all the keys' values and finding collisions in Python - which doesn't really help since I need to do this multiple times at runtime (and there's lots of user traffic).
I hypothesize that I could have created sets with a unique username as the key, and stored all user ids in the set to give me O(1) look ups on users sharing the same usernames. But then, I'd have to sacrifice the 10 mins ttl condition (which I need for every username individually).
Btw Redis/Lua beginner here, hence the noob question (if it is).
Where there is a will, there is a way... :)
Begin by storing the logins in a Sorted Set. Assuming that user id 123 had logged in at time 456 with the username "foo", you can represent that as:
ZADD logins 456 123:foo
Note: you'll also have to remove old elements from that Sorted Set so it doesn't just grow out of control.
Next, you want to search for the users from the last 10 minutes, so you'd use ZRANGEBYSCORE for that. Instead of shipping the entire thing back to your client, use Lua to process it and check for collisions.
The following script example wraps together all of the above:
-- Keys: 1) The logins Sorted Set
-- Args: 1) The epoch value of 'now'
--       2) The logged in user id
--       3) The logged in user name

-- Get logins from the last 10 minutes
local l = redis.call('ZRANGEBYSCORE', KEYS[1], ARGV[1]-600, '+inf')
-- "Evict" old logins
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', '(' .. ARGV[1]-600)
-- Store the new login
redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2] .. ':' .. ARGV[3])

local c = {} -- detected name collision
for _, v in pairs(l) do
  local p = v:find(':') -- no string.split in Lua
  local i = v:sub(1,p-1) -- id
  local n = v:sub(p+1) -- name
  if n == ARGV[3] then
    c[#c+1] = i
  end
end
return c
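From Python, the script above could be invoked roughly as follows. This is a sketch under assumptions: `r` is a redis-py client, `script` holds the Lua source shown above, and `login_and_find_collisions` is a hypothetical helper name. Because the script reads the recent logins before storing the new one, a user who logs in twice within 10 minutes would show up in their own result, so the helper filters that id out.

```python
import time

def login_and_find_collisions(r, script, user_id, username, key='logins', now=None):
    """Record a login and return the ids of other recent users with the same username."""
    if now is None:
        now = int(time.time())
    # EVAL arguments: script, number of keys, then the keys followed by the args
    ids = r.eval(script, 1, key, now, user_id, username)
    # Replies are bytes unless the client was created with decode_responses=True
    decoded = [i.decode('utf-8') if isinstance(i, bytes) else i for i in ids]
    # Drop the caller's own id (present if they logged in again within the window)
    return [i for i in decoded if i != str(user_id)]
```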
I'm iterating through data and dumping some to a Redis DB. Here's an example:
hmset id:1 username "bsmith1" department "accounting"
How can I increment the unique ID on the fly and then use that during the next hmset command? This seems like an obvious ask but I can't quite find the answer.
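Use another key, a String, for storing the last ID. Before calling HMSET, call INCR on that key to obtain the next ID. INCR is atomic, so concurrent clients never receive the same ID. If the two commands must also run as a single atomic unit, use a Lua script; a plain MULTI/EXEC block cannot do this, because INCR's reply only arrives after EXEC and so cannot be used to build the HMSET key.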
Like Itamar mentions you can store your index/counter in a separate key. In this example I've chosen the name index for that key.
Python 3
import redis

KEY_INDEX = 'index'
r = redis.from_url(host)  # host is your Redis URL

def store_user(user):
    # INCR creates the key if it doesn't exist and atomically returns the new value,
    # so no separate GET is needed (a GET could observe another client's increment)
    index = r.incr(KEY_INDEX)
    result = r.set('user::%d' % index, user)
    ...
Note that user::<index> is an arbitrary key chosen by me. You can use whatever you want.
If you have multiple machines writing to the same DB you probably want to use pipelines.
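For completeness, the Lua-script route mentioned in the first answer might look like the sketch below, which allocates the ID and writes the hash in one atomic step. `STORE_USER_LUA` and `store_user_atomic` are hypothetical names, `r` is assumed to be a redis-py client, and the field names come from the question. The script builds the hash key itself, which is fine on a single Redis instance but is an assumption that does not hold for Redis Cluster, where every key must be passed in KEYS.

```python
# Lua source: INCR the counter, then write the hash under the new id.
# Assumption: single-instance Redis (the 'id:<n>' key is not declared in KEYS).
STORE_USER_LUA = """
local id = redis.call('INCR', KEYS[1])
redis.call('HMSET', 'id:' .. id, 'username', ARGV[1], 'department', ARGV[2])
return id
"""

def store_user_atomic(r, username, department, counter_key='index'):
    """Run the script via EVAL and return the newly allocated integer id."""
    return r.eval(STORE_USER_LUA, 1, counter_key, username, department)
```

A call such as `store_user_atomic(r, 'bsmith1', 'accounting')` would then create `id:1` on an empty database, `id:2` on the next call, and so on.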
I want to match my user to a different user in his/her community every day. Currently, I use code like this:
@matched_user = User.near(@user).order("RANDOM()").first
But I want to have a different @matched_user on a daily basis. I haven't been able to find anything in Stack or in the APIs that has given me insight on how to do it. I feel it should be simpler than having to resort to a rake task with cron. (I'm on postgres.)
Whenever I find myself hankering for shared 'memory' or transient state, I think to myself "this is what (distributed) caches were invented for".
@matched_user = Rails.cache.fetch(@user.cache_key + '/daily_match', expires_in: 1.day) {
  User.near(@user).order("RANDOM()").first
}
NOTE: While specifying a TTL for cache entry tells Rails/the cache system to try and keep that value for the given timeframe, there's NO guarantee that it will. In particular, a cache that aggressively tries to reclaim memory may expire an entry well before its desired expires_in time.
For this particular use case, it shouldn't be a big deal but in cases where the business/domain logic demands periodically generated values that are durable then you really have to factor that into your database.
How about using PostgreSQL's SETSEED function? I used the date as the seed, so the seed changes every day but stays consistent within a day:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
You may want to seed a random value after using this so that any future calls to random aren't biased:
random = User.connection.execute("SELECT RANDOM()").to_a.first["random"]
# Same code as above:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
# Use random value before seed to make new seed:
User.connection.execute "SELECT SETSEED(#{random})"
I have split these steps into separate sections just for readability. You can optimise the query later.
1) Find all user records created before this morning, so that the count stays fixed for the day.
usrs_till_today_morning = User.where("created_at < ?", DateTime.now.in_time_zone(Time.zone).beginning_of_day)
2) Pluck all IDs
user_ids = usrs_till_today_morning.pluck(:id)
3) Today's day of the month: it is in the range 1..31 but remains constant throughout the day.
day_today = Time.now.day
4) Select the same ID for the day
todays_user_id = user_ids[day_today % user_ids.count]
@matched_user = User.find(todays_user_id)
So it will give you a user record that changes from day to day while staying the same throughout the day!
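The idea shared by the last two answers (derive the pick deterministically from the date) is language-neutral. Here is a minimal sketch in Python rather than Rails, with hypothetical names (`daily_match`, `candidate_ids`), assuming the candidate ids have already been fetched. Seeding with both the user id and the date gives each user their own match, and a private generator leaves the global random state alone.

```python
import datetime
import random

def daily_match(user_id, candidate_ids, today=None):
    """Pick one candidate per user per calendar day, stable within that day."""
    if not candidate_ids:
        return None
    if today is None:
        today = datetime.date.today()
    # Private generator seeded from user id and date: same inputs, same pick
    rng = random.Random("%s:%s" % (user_id, today.isoformat()))
    # Sort so the result does not depend on the order the candidates arrive in
    return rng.choice(sorted(candidate_ids))
```

As with the modulo approach above, the pick can still change during the day if the candidate list itself changes.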
I have data consisting of user_ids and tags of these user ids.
The user_ids occur multiple times and have a pre-specified number of tags (500); however, that might change in the future. What must be stored is the user_id, their tags and their count.
Later I want to easily find the tags with the top score, etc. Every time a tag appears, its count is incremented.
My implementation in redis is done using sorted sets
every user_id is a sorted set
key is user_id and is a hex number
works like this:
zincrby user_id:x 1 "tag0"
zincrby user_id:x 1 "tag499"
zincrby user_id:y 1 "tag3"
and so on
having in mind that I want to get tags with highest score, is there a better way?
The second issue is that right now I'm using "keys *" to retrieve these keys for client-side manipulation, which I know is not meant for production systems.
Plus, to avoid memory problems, it would be great to iterate through a specified number of keys at a time (in the range of 10000). I know that keys have to be stored in memory; however, they don't follow a specific pattern that would allow partial retrieval and so let me avoid the "zmalloc" error (4GB 64 bit debian server).
Keys amount to range of 20 million.
Any thoughts?
My first point would be to note that 4 GB is tight for storing 20M sorted sets. A quick try shows that 20M users, each of them with 20 tags, would take about 8 GB on a 64-bit box (and that accounts for the sorted set ziplist memory optimizations provided with Redis 2.4 - don't even try this with earlier versions).
Sorted sets are the ideal data structure to support your use case. I would use them exactly as you described.
As you pointed out, KEYS cannot be used to iterate on keys. It is rather meant as a debug command. To support key iteration, you need to add a data structure to provide this access path. The only structures in Redis which can support iteration are the list and the sorted set (through the range methods). However, they tend to transform O(n) iteration algorithms into O(n^2) (for list), or O(nlogn) (for zset). A list is also a poor choice to store keys since it will be difficult to maintain it as keys are added/removed.
A more efficient solution is to add an index composed of regular sets. You need to use a hash function to associate a specific user to a bucket, and add the user id to the set corresponding to this bucket. If the user ids are numeric values, a simple modulo function will be enough. If they are not, a simple string hashing function will do the trick.
So to support iteration on user:1000, user:2000 and user:1001, let's choose a modulo 1000 function. user:1000 and user:2000 will be put in bucket index:0 while user:1001 will be put in bucket index:1.
So on top of the zsets, we now have the following keys:
index:0 => set[ 1000, 2000 ]
index:1 => set[ 1001 ]
In the sets, the prefix of the keys is not needed, and it allows Redis to optimize the memory consumption by serializing the sets provided they are kept small enough (integer sets optimization proposed by Sripathi Krishnan).
The global iteration consists in a simple loop on the buckets from 0 to 1000 (excluded). For each bucket, the SMEMBERS command is applied to retrieve the corresponding set, and the client can then iterate on the individual items.
Here is an example in Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ----------------------------------------------------
import random
import redis

POOL = redis.ConnectionPool(host='localhost', port=6379, db=0)
NUSERS = 10000
NTAGS = 500
NBUCKETS = 1000

# ----------------------------------------------------
# Fill redis with some random data
def fill(r):
    p = r.pipeline()
    # Create only 10000 users for this example
    for id in range(0, NUSERS):
        user = "user:%d" % id
        # Add the user in the index: a simple modulo is used to hash the user id
        # and put it in the correct bucket
        p.sadd("index:%d" % (id % NBUCKETS), id)
        # Add random tags to the user
        for x in range(0, 20):
            tag = "tag:%d" % (random.randint(0, NTAGS - 1))
            # redis-py 3.0+ argument order: name, amount, value
            p.zincrby(user, 1, tag)
        # Flush the pipeline every 1000 users
        if id % 1000 == 0:
            p.execute()
            print(id)
    # Flush one last time
    p.execute()

# ----------------------------------------------------
# Iterate on all the users and display their 5 highest ranked tags
def iterate(r):
    # Iterate on the buckets of the key index
    # The range depends on the function used to hash the user id
    for x in range(0, NBUCKETS):
        # Iterate on the users in this bucket
        for id in r.smembers("index:%d" % x):
            user = "user:%d" % int(id)
            print(user, r.zrevrangebyscore(user, "+inf", "-inf", 0, 5, True))

# ----------------------------------------------------
# Main function
def main():
    r = redis.Redis(connection_pool=POOL)
    r.flushall()
    m = r.info()["used_memory"]
    fill(r)
    info = r.info()
    print("Keys: ", info["db0"]["keys"])
    print("Memory: ", info["used_memory"] - m)
    iterate(r)

# ----------------------------------------------------
main()
By tweaking the constants, you can also use this program to evaluate the global memory consumption of this data structure.
IMO this strategy is simple and efficient, because it offers O(1) complexity to add/remove users, and true O(n) complexity to iterate on all items. The only downside is the key iteration order is random.
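For non-numeric ids, such as the hex ids mentioned in the question, the "simple string hashing function" could be sketched as below. This is an illustration under assumptions, not part of the program above: `bucket_key` is a hypothetical name, and CRC32 is just one convenient choice of a hash that is stable across processes (Python's built-in `hash()` for strings is randomized per process, so it would not work here).

```python
import zlib

NBUCKETS = 1000  # same bucket count as in the example above

def bucket_key(user_id, nbuckets=NBUCKETS):
    """Map an arbitrary string id to the key of its index set, e.g. 'index:42'."""
    # crc32 returns the same unsigned value on every run and every machine
    digest = zlib.crc32(user_id.encode('utf-8'))
    return "index:%d" % (digest % nbuckets)
```

With this, `fill` would call something like `p.sadd(bucket_key(uid), uid)`. Note that storing non-integer ids in the sets means the integer set memory optimization mentioned earlier no longer applies.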