On my website, users are allowed to share the same username. Moreover, whenever a user logs in, I temporarily save their username in a Redis key with a TTL of 10 minutes.
The question is: is there any way - using Redis - to find all user ids online within the last 10 mins, sharing the same username?
Currently, I'm extracting all the keys' values and finding collisions in Python - which doesn't really help since I need to do this multiple times at runtime (and there's lots of user traffic).
I considered creating sets with each unique username as the key and storing all matching user ids in the set, which would give me O(1) lookups on users sharing the same username. But then I'd have to sacrifice the 10-minute TTL condition (which I need for every username individually).
Btw Redis/Lua beginner here, hence the noob question (if it is).
Where there is a will, there is a way... :)
Begin by storing the logins in a Sorted Set. Assuming that user id 123 had logged in at time 456 with the username "foo", you can represent that as:
ZADD logins 456 123:foo
Note: you'll also have to remove old elements from that Sorted Set so it doesn't just grow out of control.
Next, you want to search for the users from the last 10 minutes, so you'd use ZRANGEBYSCORE for that. Instead of shipping the entire thing back to your client, use Lua to process it and check for collisions.
The following script example wraps together all of the above:
-- Keys: 1) The logins Sorted Set
-- Args: 1) The epoch value of 'now'
-- 2) The logged in user id
-- 3) The logged in user name
-- Get logins from the last 10 minutes
local l = redis.call('ZRANGEBYSCORE', KEYS[1], ARGV[1]-600, '+inf')
-- "Evict" old logins
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', '(' .. ARGV[1]-600)
-- Store the new login
redis.call('ZADD', KEYS[1], ARGV[1], ARGV[2] .. ':' .. ARGV[3])
local c = {} -- detected name collision
for _, v in pairs(l) do
  local p = v:find(':') -- no string.split in Lua
  local i = v:sub(1,p-1) -- id
  local n = v:sub(p+1) -- name
  if n == ARGV[3] then
    c[#c+1] = i
  end
end
return c
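To make the script's flow easier to test without a server, here is a minimal sketch of the same logic in plain Python. The function name `find_collisions` and the dict standing in for the Sorted Set are assumptions for illustration; they are not part of the answer's script.

```python
def find_collisions(logins, now, user_id, username, window=600):
    """Model the Lua script's logic on a plain dict of member -> score.

    `logins` stands in for the Sorted Set: keys are "id:name" members,
    values are login timestamps (the scores). It is mutated in place.
    """
    cutoff = now - window
    # Logins inside the window (the ZRANGEBYSCORE step)
    recent = [m for m, ts in logins.items() if ts >= cutoff]
    # Evict older logins (the ZREMRANGEBYSCORE step)
    for member in [m for m, ts in logins.items() if ts < cutoff]:
        del logins[member]
    # Store the new login (the ZADD step)
    logins[f"{user_id}:{username}"] = now
    # Split on the first ':' only, so names containing ':' survive
    collisions = []
    for member in recent:
        uid, name = member.split(":", 1)
        if name == username:
            collisions.append(uid)
    return collisions


logins = {"123:foo": 456, "7:foo": 1000, "8:bar": 1010}
print(find_collisions(logins, now=1100, user_id=9, username="foo"))  # ['7']
```

The login at time 456 falls outside the 600-second window and is evicted, so only user 7 is reported as sharing the name "foo".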
I'm creating a game matchmaking system using Redis based on MMR, a number that roughly sums up the skill of a player, so the system can match him/her with others of about the same skill.
For example, if a player with an MMR of 1000 joins the queue, the system will try to find other people with an MMR in the range 950 to 1050 to match with this player. But if after one minute it cannot find any such player, it will widen the range to 900 to 1100 (a constant threshold).
What I want to do is really easy with a relational database design, but I can't figure out how to do it with Redis.
The queue table implementation would be like this:
+----+---------+------+-------+
| ID | USER_ID | MMR | TRIES |
+----+---------+------+-------+
| 1 | 50 | 1000 | 1 |
| 2 | 70 | 1500 | 1 |
| 3 | 350 | 1200 | 1 |
+----+---------+------+-------+
So when a new player queues up, the system checks its MMR against the other players in the queue. If it finds one within the 5% threshold, it matches the two players. If not, it adds the new player to the table and waits, either for new players to queue up and be compared, or for 1 minute to pass, at which point the cron job increments the tries and retries the match.
The only way I can imagine is to use two separate keys for the low and high of each player in the queue, like this:
MatchMakingQueue:User:1:Low => 900
MatchMakingQueue:User:1:High => 1100
But the keys will all be different, and I can't get, for example, all users whose range falls between a low of 900 and a high of 1100!
I hope I've been clear enough. Any help would be much appreciated.
As @Guy Korland suggested, a Sorted Set can be used to track and match players based on their MMR, and I do not agree with the OP's "won't scale" comment.
Basically, when a new player joins, the ID is added to a zset with the MMR as its score.
ZADD players:mmr 1000 id:50
The matchmaking is made for each user, e.g. id:50 with the following query:
ZREVRANGEBYSCORE players:mmr 1050 950 LIMIT 0 2
A match is found if two IDs are returned and at least one of them is different than that of the new player. To make the match, both IDs (the new player's and the matched with one) need to be removed from the set - I'd use a Lua script to implement this piece of logic (matching and removing) for atomicity and communication reduction, but it can be done in the client as well.
There are different ways to keep track of the retries, but perhaps the simplest one is to use another Sorted Set, where the score is that metric.
The following pseudo Redis Lua code is a minimal example of the approach:
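local kmmrs, kretries = KEYS[1], KEYS[2]
local id = ARGV[1]
-- ZSCORE replies with a string, or false when the member is missing
local mmr = tonumber(redis.call('ZSCORE', kmmrs, id))
if not mmr then
  return nil -- the player is not queued
end
-- No entry yet counts as the first try
local retries = tonumber(redis.call('ZSCORE', kretries, id) or 1)
-- Widen the window by 5% per try
local min, max = mmr*(1-0.05*retries), mmr*(1+0.05*retries)
local candidates = redis.call('ZREVRANGEBYSCORE', kmmrs, max, min, 'LIMIT', 0, 2)
if #candidates < 2 then
  -- No opponent in range: record one more try and give up for now
  redis.call('ZADD', kretries, retries + 1, id)
  return nil
end
local reply
if candidates[1] ~= id then
  reply = candidates[1]
else
  reply = candidates[2]
end
-- Remove both matched players from the queue
redis.call('ZREM', kmmrs, id, reply)
redis.call('ZREM', kretries, id, reply)
return reply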
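As a worked check of the widening window, the following sketch reproduces the arithmetic in plain Python, using the numbers from the question. The names `mmr_window` and `find_match`, and the dict standing in for the zset, are hypothetical; this only models the logic and does not talk to Redis.

```python
def mmr_window(mmr, retries, step_pct=5):
    """Return (low, high) for a window of +/- step_pct percent per retry."""
    pct = step_pct * retries
    return mmr * (100 - pct) / 100, mmr * (100 + pct) / 100


def find_match(queue, player_id, retries):
    """Pick another queued player inside player_id's window, or None.

    `queue` is a plain dict of player id -> MMR standing in for the zset.
    """
    low, high = mmr_window(queue[player_id], retries)
    for other, mmr in queue.items():
        if other != player_id and low <= mmr <= high:
            return other
    return None


queue = {"id:50": 1000, "id:70": 1500, "id:350": 1080}
print(mmr_window(1000, 1))            # (950.0, 1050.0)
print(find_match(queue, "id:50", 1))  # None
print(find_match(queue, "id:50", 2))  # id:350
```

With one try the window is 950 to 1050 and nobody qualifies; after a second try it widens to 900 to 1100 and the player with an MMR of 1080 becomes a match.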
Let me get the problem right! Your problem is that you want to find all the users in a given range of MMR values. What if you make the other users say "I'm falling in this range"?
Read about Redis Pub/Sub.
When a user joins in, publish its MMR to the rest of the players.
Write the code on the user side to check if his/her MMR is falling in the range.
If it is, the user will publish back to a channel that it is falling in that range. Else, the user will silently discard the message.
Repeat these steps if you get no response back in 1 minute.
You can make one channel (let's say MatchMMR), subscribed to by all the users, on which match requests with an MMR are published. And make a user-specific channel for replies in case somebody has an MMR in the calculated range.
Form your published messages such that they carry all the information, like "retry count", "match range percentage", "MMR value" etc., so that the code on the user side can calculate whether it is the right fit for the MMR.
Read more about Redis Pub/Sub at: https://redis.io/topics/pubsub
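One possible shape for such a message, and the check each subscriber would run on it, is sketched below. The JSON field names (`user_id`, `mmr`, `retry_count`, `range_pct`) and both function names are assumptions made for illustration; the actual publishing and subscribing calls are left out.

```python
import json


def build_request(user_id, mmr, retry_count, range_pct=5):
    """Serialize a match request for the shared channel."""
    return json.dumps({
        "user_id": user_id,
        "mmr": mmr,
        "retry_count": retry_count,
        "range_pct": range_pct,
    })


def should_respond(raw_message, my_id, my_mmr):
    """Decide on the subscriber side whether to answer a request."""
    req = json.loads(raw_message)
    if req["user_id"] == my_id:
        return False  # ignore our own broadcast
    pct = req["range_pct"] * req["retry_count"]
    low = req["mmr"] * (100 - pct) / 100
    high = req["mmr"] * (100 + pct) / 100
    return low <= my_mmr <= high


msg = build_request(user_id=50, mmr=1000, retry_count=1)
print(should_respond(msg, my_id=70, my_mmr=1040))  # True
print(should_respond(msg, my_id=71, my_mmr=1200))  # False
```

A subscriber for which `should_respond` is true would then publish to the requester's own channel; everyone else drops the message.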
I'm iterating through data and dumping some to a Redis DB. Here's an example:
hmset id:1 username "bsmith1" department "accounting"
How can I increment the unique ID on the fly and then use that during the next hmset command? This seems like an obvious ask but I can't quite find the answer.
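Use another key, a String, for storing the last ID. Before calling HMSET, call INCR on that key to obtain the next ID. INCR is atomic, so concurrent writers never receive the same ID. If the increment and the HMSET must also succeed or fail together, wrap them in a Lua script; a MULTI/EXEC block cannot do this on its own, because the INCR reply (needed to build the HMSET key) only arrives once EXEC runs.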
Like Itamar mentions you can store your index/counter in a separate key. In this example I've chosen the name index for that key.
Python 3
import redis

KEY_INDEX = 'index'
r = redis.from_url(host)  # host is your Redis URL, e.g. 'redis://localhost:6379/0'

def store_user(user):
    # INCR creates the key if it doesn't exist and returns the new value as an int,
    # so no separate GET is needed (a GET could observe another client's increment)
    index = r.incr(KEY_INDEX)
    result = r.set('user::%d' % index, user)
    ...
Note that user::<index> is an arbitrary key chosen by me. You can use whatever you want.
Because INCR is atomic and its return value is used directly, this stays correct when multiple machines write to the same DB. Pipelines are still useful there for batching several commands into one round trip.
I have a Redis set with key 'a' and members '1', '2', '3'.
Is there a way to set a different expiry time for each member of the set?
For example, ('a','1') should expire after 60 seconds whereas ('a','2') should expire after 120 seconds.
Unfortunately, no. Redis' "containers" (i.e. lists, hashes, sets and sorted sets) do not support per-member expiry, although this functionality has been requested many times in the past.
You can, however, implement your own logic to achieve that result. There are several possible approaches to address this - here's one example. Instead of using a set, use a sorted set (ZSET) and set each member's score to its expiry time using epoch values. This type of workflow could be implemented using a Lua script for example. To add members use something like:
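redis.call('zadd', KEYS[1], redis.call('time')[1] + ARGV[1], ARGV[2])
and EVAL it using '1 a 60 1' and '1 a 120 2' as arguments, per your example. (The script reads the clock with Redis' TIME command because os.time() is not available in Redis' Lua sandbox. On Redis versions before 5 you may need to call redis.replicate_commands() first, or pass the current time in as an argument instead.) To actually "expire" the items from the set, you'll need to delete them once their time has passed. You can do that either by implementing a periodical process that scans your sorted set or upon accessing it. For example, the following Lua can be used to expire members:
redis.call('zremrangebyscore', KEYS[1], '-inf', redis.call('time')[1])
and EVAL it using '1 a' as arguments per your example.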
EDIT: How to achieve the above using Python
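import time
import redis

def add(r, key, ttl, member):
    # redis-py 3.0+ takes a {member: score} mapping; the score is the expiry time
    r.zadd(key, {member: int(time.time() + ttl)})

def expire(r, key):
    # Drop every member whose expiry time has passed
    r.zremrangebyscore(key, '-inf', int(time.time()))
...
r = redis.Redis()
add(r, 'a', 60, 1)   # member 1 expires after 60 seconds
add(r, 'a', 120, 2)  # member 2 expires after 120 seconds
# periodically or before every operation do
expire(r, 'a')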
I want to match my user to a different user in his/her community every day. Currently, I use code like this:
@matched_user = User.near(@user).order("RANDOM()").first
But I want to have a different @matched_user on a daily basis. I haven't been able to find anything in Stack or in the APIs that has given me insight on how to do it. I feel it should be simpler than having to resort to a rake task with cron. (I'm on postgres.)
Whenever I find myself hankering for shared 'memory' or transient state, I think to myself "this is what (distributed) caches were invented for".
@matched_user = Rails.cache.fetch(@user.cache_key + '/daily_match', expires_in: 1.day) {
  User.near(@user).order("RANDOM()").first
}
NOTE: While specifying a TTL for a cache entry tells Rails/the cache system to try to keep that value for the given timeframe, there's NO guarantee that it will. In particular, a cache that aggressively tries to reclaim memory may expire an entry well before its desired expires_in time.
For this particular use case, it shouldn't be a big deal, but in cases where the business/domain logic demands periodically generated values that are durable, you really have to store them in your database.
How about using PostgreSQL's SETSEED function? I used the date as the seed so that the seed changes every day but stays consistent within a day:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
You may want to seed a random value after using this so that any future calls to random aren't biased:
random = User.connection.execute("SELECT RANDOM()").to_a.first["random"]
# Same code as above:
User.connection.execute "SELECT SETSEED(#{Date.today.strftime("%y%d%m").to_i/1000000.0})"
@matched_user = User.near(@user).order("RANDOM()").first
# Use random value before seed to make new seed:
User.connection.execute "SELECT SETSEED(#{random})"
I have split these steps into separate sections just for readability. You can optimise the query later.
1) Find all user records created before this morning, so that the count stays frozen for the day.
usrs_till_today_morning = User.where("created_at < ?", DateTime.now.in_time_zone(Time.zone).beginning_of_day)
2) Pluck all IDs, in a fixed order so that the same index always maps to the same user
user_ids = usrs_till_today_morning.order(:id).pluck(:id)
3) Today's day of the month: it falls in the range 1..31 and stays constant throughout the day.
day_today = Time.now.day
4) Select the same ID for the whole day
todays_user_id = user_ids[day_today % user_ids.count]
@matched_user = User.find(todays_user_id)
This gives you a user that changes from day to day but stays the same throughout the day.
I have data consisting of user_ids and tags of these user ids.
The user_ids occur multiple times and have a pre-specified number of tags (500); however, that might change in the future. What must be stored is the user_id, their tags and their count.
Later I want to easily find the tags with the top score, etc. Every time a tag appears, its count is incremented.
My implementation in redis is done using sorted sets
every user_id is a sorted set
key is user_id and is a hex number
works like this:
zincrby user_id:x 1 "tag0"
zincrby user_id:x 1 "tag499"
zincrby user_id:y 1 "tag3"
and so on
Bearing in mind that I want to get the tags with the highest score, is there a better way?
The second issue is that right now I'm using "keys *" to retrieve these keys for client-side manipulation, which I know is not meant for production systems.
It would also help with memory problems to iterate through a specified number of keys at a time (in the range of 10000). I know that keys have to be stored in memory; however, they don't follow a specific pattern that would allow partial retrieval and let me avoid the "zmalloc" error (4GB 64 bit debian server).
The keys number around 20 million.
Any thoughts?
My first point would be to note that 4 GB is tight for storing 20M sorted sets. A quick try shows that 20M users, each of them with 20 tags, would take about 8 GB on a 64-bit box (and this accounts for the sorted set ziplist memory optimizations provided with Redis 2.4; don't even try this with earlier versions).
Sorted sets are the ideal data structure to support your use case. I would use them exactly as you described.
As you pointed out, KEYS cannot be used to iterate on keys. It is rather meant as a debug command. To support key iteration, you need to add a data structure to provide this access path. The only structures in Redis which can support iteration are the list and the sorted set (through the range methods). However, they tend to transform O(n) iteration algorithms into O(n^2) (for list), or O(nlogn) (for zset). A list is also a poor choice to store keys since it will be difficult to maintain it as keys are added/removed.
A more efficient solution is to add an index composed of regular sets. You need to use a hash function to associate a specific user with a bucket, and add the user id to the set corresponding to this bucket. If the user ids are numeric values, a simple modulo function will be enough. If they are not, a simple string hashing function will do the trick.
So to support iteration on user:1000, user:2000 and user:1001, let's choose a modulo 1000 function. user:1000 and user:2000 will be put in bucket index:0 while user:1001 will be put in bucket index:1.
So on top of the zsets, we now have the following keys:
index:0 => set[ 1000, 2000 ]
index:1 => set[ 1001 ]
In the sets, the prefix of the keys is not needed, and it allows Redis to optimize the memory consumption by serializing the sets provided they are kept small enough (integer sets optimization proposed by Sripathi Krishnan).
The global iteration consists in a simple loop on the buckets from 0 to 1000 (excluded). For each bucket, the SMEMBERS command is applied to retrieve the corresponding set, and the client can then iterate on the individual items.
Here is an example in Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# ----------------------------------------------------
import random

import redis

POOL = redis.ConnectionPool(host='localhost', port=6379, db=0)
NUSERS = 10000
NTAGS = 500
NBUCKETS = 1000

# ----------------------------------------------------
# Fill redis with some random data
def fill(r):
    p = r.pipeline()
    # Create only 10000 users for this example
    for id in range(0, NUSERS):
        user = "user:%d" % id
        # Add the user in the index: a simple modulo is used to hash the user id
        # and put it in the correct bucket
        p.sadd("index:%d" % (id % NBUCKETS), id)
        # Add random tags to the user
        for x in range(0, 20):
            tag = "tag:%d" % random.randint(0, NTAGS - 1)
            # redis-py 3.0+ argument order: key, amount, member
            p.zincrby(user, 1, tag)
        # Flush the pipeline every 1000 users
        if id % 1000 == 0:
            p.execute()
            print(id)
    # Flush one last time
    p.execute()

# ----------------------------------------------------
# Iterate on all the users and display their 5 highest ranked tags
def iterate(r):
    # Iterate on the buckets of the key index
    # The range depends on the function used to hash the user id
    for x in range(0, NBUCKETS):
        # Iterate on the users in this bucket
        for id in r.smembers("index:%d" % x):
            user = "user:%d" % int(id)
            print(user, r.zrevrangebyscore(user, "+inf", "-inf", 0, 5, True))

# ----------------------------------------------------
# Main function
def main():
    r = redis.Redis(connection_pool=POOL)
    r.flushall()  # WARNING: wipes every database on the server
    m = r.info()["used_memory"]
    fill(r)
    info = r.info()
    print("Keys: ", info["db0"]["keys"])
    print("Memory: ", info["used_memory"] - m)
    iterate(r)

# ----------------------------------------------------
main()
By tweaking the constants, you can also use this program to evaluate the global memory consumption of this data structure.
IMO this strategy is simple and efficient, because it offers O(1) complexity to add/remove users, and true O(n) complexity to iterate on all items. The only downside is the key iteration order is random.
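For non-numeric ids, the "simple string hashing function" mentioned earlier could look like the sketch below. Using `zlib.crc32` is my assumption, as are the helper names `bucket_for` and `index_key`; any hash that is stable across processes would do (Python's built-in `hash()` is salted per process, so it is not suitable here).

```python
import zlib

NBUCKETS = 1000


def bucket_for(user_id, nbuckets=NBUCKETS):
    """Map a user id (int or str) to an index bucket number."""
    if isinstance(user_id, int):
        return user_id % nbuckets
    # crc32 gives the same value in every process, unlike hash()
    return zlib.crc32(user_id.encode("utf-8")) % nbuckets


def index_key(user_id):
    """Name of the set that indexes this user id."""
    return "index:%d" % bucket_for(user_id)


print(index_key(1000), index_key(2000), index_key(1001))  # index:0 index:0 index:1
print(index_key("a3f9c2"))  # some bucket between index:0 and index:999
```

Note that sets holding string members will not benefit from the integer set optimization described above. If the hex ids fit in 64 bits, converting them with `int(user_id, 16)` before storing keeps the index numeric.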