Strategy to synchronize 2 sorted sets in Redis?

I have 2 sorted sets:
mainSet: contains a list of <score, id> pairs.
userSet: contains some of the items from mainSet, filtered based on certain rules. (Each user in the system has a userSet with the userId as key.)
If there are changes in the mainSet that are likely to re-sort the items (such as scores being updated or new items being added), then I want to update the userSet accordingly.
For example,
mainSet<key, 1, id1>
mainSet<key, 2, id2>
mainSet<key, 3, id3>
mainSet<key, 4, id4>
userSet<userKey, 2, id2>
userSet<userKey, 3, id3>
Items in the mainSet are added to or updated in the userSet based on some filter rules.
Currently, to make sure the items in the userSet stay up to date with the mainSet (for example, when item scores change), I have to traverse all items in the mainSet, check the filter rules, and re-add them to the userSet, but this takes around 250 ms for 200 items in the mainSet. (The mainSet may have a maximum of 1000 items.)
I want to know if there are any better approaches for my case.

You should look into Redis Keyspace Notifications, a feature available since 2.8.0. If you add a listener for your main set, and for each received event validate whether the object should be moved into or out of the subset, the system will handle the changes automatically.
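A minimal sketch of such a listener in Python with redis-py (the key names, database index, and filter rule are placeholder assumptions; note that keyspace events only carry the command name, not the member that changed, so this sketch re-checks the main set on each event):

import redis

r = redis.Redis()

# Enable keyspace notifications for sorted-set commands:
# K = keyspace events, z = zset commands.
r.config_set("notify-keyspace-events", "Kz")

p = r.pubsub()
p.subscribe("__keyspace@0__:mainSet")  # assumed key name, db 0

def passes_filter(member, score):
    return score >= 2  # placeholder for the real filter rules

for msg in p.listen():
    if msg["type"] != "message":
        continue
    # msg["data"] is the command that touched mainSet, e.g. b"zadd".
    # The event does not say which member changed, so re-check the set.
    for member, score in r.zrange("mainSet", 0, -1, withscores=True):
        if passes_filter(member, score):
            r.zadd("userSet:someUser", {member: score})
        else:
            r.zrem("userSet:someUser", member)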

Related

How to get list of key values in the Redis database at once?

I'm kind of new to the Redis cache database. I have a scenario where I have a list of items (around 5000) which, when converted to JSON format, takes around 400 MB. So each time any one of the items is modified, I have to get the value from the Redis database, change that particular item, and push the whole thing back to the Redis database. This is hurting performance.
So instead of caching all the items as one value, I have decided to cache the items individually in the Redis database. But if I need to get all the items at once, I have to loop and pass 5000 keys to Redis, which will make 5000 calls. Is there any option to get all the items at once whenever necessary, so that I can get all the data as well as individual items where needed? Also, are there any limitations on how many keys can be stored in Redis?
Thanks in advance.
You have many options here. Most likely the best is to use one hash to store all your items, one item per hash field. The item itself most likely has its own values; you can serialize each item as JSON. You can create it with:
> hmset items 01 "{'name': 'myCoolItem', 'price': 0.99}" 02 "{'name': 'anotherItem', 'price': 2.99}" ...
You can get all items with:
> hgetall items
1) "01"
2) "{'name': 'myCoolItem', 'price': 0.99}"
3) "02"
4) "{'name': 'anotherItem', 'price': 2.99}"
...
You can get a single item with:
> hget items 02
"{'name': 'anotherItem', 'price': 2.99}"
This is a tradeoff between updating and getting items, assuming the update operation on a single item is not too frequent and the items don't have tons of fields. To update a given field of a given item, you still have to deserialize locally, modify, serialize, and then update back to Redis, but you are doing it on a per-item basis.
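For illustration, that per-item round trip in Python with redis-py might look like this (assuming the stored values are valid JSON):

import json
import redis

r = redis.Redis()

# Fetch one item's JSON from the hash, edit a field, write it back.
item = json.loads(r.hget("items", "02"))
item["price"] = 3.49
r.hset("items", "02", json.dumps(item))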
If you have tons of fields per item or the update operation is rather frequent, you can use one hash key per item:
> hset item:01 name myCoolItem price 0.99
> hset item:02 name anotherItem price 2.99
This allows you to get, or modify, one field of one item at a time:
> hset item:02 price 3.49
> hget item:01 price
"0.99"
You can then get all your items, using multiple hgetall commands. To avoid hitting the server once per item, you can use pipelining. See Is there MGET analog for Redis hashes?
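A minimal pipelining sketch with redis-py (the key pattern and id range are assumptions):

import redis

r = redis.Redis()

# One round trip for all items instead of 5000 separate calls.
pipe = r.pipeline()
for item_id in range(1, 5001):
    pipe.hgetall(f"item:{item_id:02d}")
all_items = pipe.execute()  # a list of field-to-value dicts, one per item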
A third option: you can consider using a module like RedisJSON.
You can also achieve this by using a hash in Redis, because with HSET you can update an individual value, and with HGETALL you can get all 5000 entries in a single call.

Most efficient way to find random item not present in join table and without repeating previous random item

In my Rails 4 app I have an Item model and a Flag model. Item has_many Flags. Flags belong_to Item. Flag has the attributes item_id, user_id, and reason. I am using enum for pending status. I need the most efficient way to get an item that doesn't exist in the flags table because I have a VERY large table. I also need to make sure that when a user clicks to generate another random item, it will not repeat the current random item back to back. It would be OK to repeat any time afterwards, just not back to back.
This is what I have so far:
def pending_item
  @pending_items = Item.joins("LEFT OUTER JOIN flags ON flags.item_id = items.id").
    where("items.user_id != #{current_user.id} and flags.id is null")
  @pending_item = @pending_items.offset(rand(@pending_items.count)).first
end
Is there a more efficient way than this?
And how can I make sure there are no back to back repeats of the same pending_item?
What you have is the fastest way I know of to do it in the database. Many other Stack Overflow posts discuss ways to efficiently select random rows from tables; there are more efficient methods if you're selecting from an entire table, but they don't apply when you're selecting a random result from a query whose results can differ every time.
If performance is critical, it would be much faster to do it in memory.
The first time you need to pick a random pending item for a given user, select all of the user's pending items from the database and store them in the Rails cache. (This only works if there's a reasonable number of pending items per user.)
Each time you need to pick a random pending item for a given user, get the full list from the cache and pick a random member with .sample or whatever.
Here's the tricky part: to keep the cache consistent, every time you do anything that could change a user's full list of pending items (including something like adding a new flag type), you'll need to invalidate the cache entry.
This is a lot of effort, so you really have to want to do it.
Regarding avoiding repeats, the only way to do that reliably is to store the last pending item displayed and exclude it from your query:
def pending_item
  @pending_items = Item.
    joins("LEFT OUTER JOIN flags ON flags.item_id = items.id").
    where("items.user_id != ?", current_user.id).
    where("flags.id is null").
    where("items.id != ?", previous_shown_item_id)
  @pending_item = @pending_items.offset(rand(@pending_items.count)).first
end
or, if you do the random selection in memory, exclude the last shown pending item when you do that.
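As a language-agnostic sketch of that in-memory approach (plain Python, with a dict standing in for the Rails cache; load_pending_item_ids is a hypothetical query helper):

import random

pending_cache = {}  # user_id -> list of pending item ids (stand-in for Rails.cache)

def pending_item_ids(user_id):
    if user_id not in pending_cache:
        pending_cache[user_id] = load_pending_item_ids(user_id)  # hypothetical DB query
    return pending_cache[user_id]

def random_pending_item(user_id, last_shown_id=None):
    # Exclude the previously shown item to avoid back-to-back repeats.
    candidates = [i for i in pending_item_ids(user_id) if i != last_shown_id]
    return random.choice(candidates) if candidates else None

def invalidate_pending_cache(user_id):
    # Call this from every code path that can change the user's pending items.
    pending_cache.pop(user_id, None)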

Which datastructure should I use in Redis for a notification system?

I am trying to make a notification system with Redis rather than using MySQL which is what I use for the rest of the system. The reason for this is that I don't really need to save that much data so it can be saved in memory and I want it to be lightweight and fast.
The notifications will be kept temporarily. What I mean by that is that I do not want to save all notifications, only something like the 50 latest unseen notifications for each user. So the first thing I thought about was to use a linked list with a capped length of 50.
I would need to save this information for the notification:
postId
commentId
type
time
userId
username
image
So perhaps a JSON serialized string like this:
{"postId":1,"commentId":10,"type":1,"time":1462960058,"userId":2,"username":"Alexander","image":"ntfpRrgx.png"}
The notifications would be output like this on the client side:
Alexander commented on your post.
Alexander replied to your comment.
The type determines what kind of notification it is. I can handle "type" checks client side and output the notification format accordingly. But here is the part I am having difficulty with.
1) I need to be able to save the notifications in an ordered way so that I know which notification is newest.
2) I need to be able to mark a notification as seen, so that it is no longer counted as unseen.
3) I need to have a count of unseen notifications that I can show to the user. And If the user clicks on a notification, I need to mark that as a seen notification and decrement the count of unseen notifications.
4) I need to be able to mark all notifications as marked seen if the user wishes to do that.
5) I need to be able to get a subset of the notifications, whether seen or unseen, like an offset and limit on MySQL. For example, the user sees the newest 5 notifications, but he could click a next button and see the next 5, and the next 5 and so on.
I have no idea how to do all of this on Redis.
The key for the list or set could be user:1:notification. I know a list preserves insertion order, and we can add and remove from the head and tail. But how do I achieve all these points?
1: You can use Redis sorted set (zset) operations, with the timestamp as the score and the event id (or the entire event JSON) as the member.
ZADD my-set-key timestamp event-id
Then, to get a page of the newest items, you use the ZREVRANGE command. If you choose to use the event id as the member, you need an additional structure to store the event fields; I would recommend a hash per event (HSET eventid field value).
2: You can remove an item by member (event-id)
ZREM my-set-key event-id
3: Assuming your zset only keeps unseen notifications, you can use ZCARD to get the size of the set:
ZCARD my-set-key
4: You can remove an entire set in one shot using
DEL my-set-key
5: You can paginate using zrange/zrevrange:
ZREVRANGE my-set-key start-position to-position
If you need to keep both seen and unseen items, then you need an extra zset to which you only add, but from which you don't remove, once an item is seen.
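Putting those pieces together, a minimal sketch with redis-py (the key pattern, the 50-item cap, and the page size are assumptions based on the question):

import json
import time
import redis

r = redis.Redis()

def add_notification(user_id, event):
    key = f"user:{user_id}:notifications"
    r.zadd(key, {json.dumps(event): time.time()})
    # Keep only the 50 newest: drop everything below the top 50 ranks.
    r.zremrangebyrank(key, 0, -51)

def unseen_count(user_id):
    return r.zcard(f"user:{user_id}:notifications")

def mark_seen(user_id, event_json):
    r.zrem(f"user:{user_id}:notifications", event_json)

def mark_all_seen(user_id):
    r.delete(f"user:{user_id}:notifications")

def page(user_id, offset=0, limit=5):
    # Newest first, like OFFSET/LIMIT pagination in MySQL.
    return r.zrevrange(f"user:{user_id}:notifications", offset, offset + limit - 1)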

Key-value list in Redis

How can I store a collection of key-value pairs in Redis? For example, I want to log the time when a user tried to log in. Every user has an id, so I want to use it as a key. But I want to store it separately from other elements, in a separate collection.
For each user you can have a sorted set, with the user id in the name of the sorted set. Just use 1 as the value, since you don't need to store anything there, and use the timestamp as the score.
zadd 'user:' + uid +':logins' currentTimestamp 1
With this you can run queries with ZCOUNT etc. to see how many times a user tried to log in during certain periods.
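A sketch of this in Python with redis-py. One caveat with the command above: sorted-set members are unique, so with a constant member of 1 each new login overwrites the previous entry; using the timestamp itself as the member keeps one entry per login (key pattern as above):

import time
import redis

r = redis.Redis()

def record_login(uid):
    ts = time.time()
    # Timestamp as both member and score, so every login is kept.
    r.zadd(f"user:{uid}:logins", {str(ts): ts})

def logins_between(uid, start_ts, end_ts):
    return r.zcount(f"user:{uid}:logins", start_ts, end_ts)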

How to keep a list of 'used' data per user

I'm currently working on a project in MongoDB where I want to get a random sampling of new products from the DB. But my problem is not MongoDB specific, I think it's a general database question.
The scenario:
Let's say we have a collection (or table) of products. And we also have a collection (or table) of users. Every time a user logs in, they are presented with 10 products. These products are selected randomly from the collection/table. Easy enough, but the catch is that every time the user logs in, they must be presented with 10 products that they have NEVER SEEN BEFORE. The two obvious ways that I can think of solving this problem are:
Every user begins with their own private list of all products. Each time they get one of these products, the product is removed from their private list. The result is that the next time products are chosen from this previously trimmed list, it already contains only new items.
Every user has a private list of previously viewed products. When a user logs in, they select 10 random products from the master list, compare the id of each against their list of previously viewed products, and if an item appears on the previously viewed list, the application throws it away, selects a new one, and iterates until there are 10 new items, which it then adds to the previously viewed list for next time.
The problem with #1 is that it seems like a tremendous waste: you would basically be duplicating the list data for n users, and removing/adding new items to the system would be a nightmare since it would have to iterate through all users. #2 seems preferable, but it too has issues: you could end up making a lot of extra and unnecessary calls to the DB in order to guarantee 10 new products. As a user goes through more and more products, there are fewer new ones to choose from, so the chance of having to throw one away and get a new one from the DB greatly increases.
Is there an alternative solution? My first and primary concern is performance. I will give up disk space in order to optimize performance.
Those 2 ways are a complete waste of both primary and secondary memory.
You want to show never-before-seen products, but is that a real must? If you have a lot of products, 10 random ones have a high chance of being unique.
3. You could just list 10 random products; even though that is not as easy as in MySQL, it is still less complicated than #1 and #2.
If you don't care how random the sequence of ids is, you could do this:
Create a single randomized table of just product id's and a sequential integer surrogate key column. Start each customer at a random point in the list on first login and cycle through the list ordered by that key. If you reach the end, start again from the top.
The customer record would contain a single value for the last product they saw (the surrogate from the randomized list, not the actual id). You'd then pull the next ten on login and do a single update to the customer. It wouldn't really be random, of course. But this kind of table-seed strategy is how a lot of simpler pseudo-random number generators work.
The only problem I see is if your product list grows more quickly than your users log in. Then they'd never see the portions of the list which appear before wherever they started. Even so, with a large list of products and very active users this should scale much better than storing everything they've seen. So if it doesn't matter that products appear in a set pseudo-random sequence, this might be a good fit for you.
Edit:
If you stored the first record they started with as well, you could still generate the list of all things seen. It would be everything between that value and last viewed.
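A minimal sketch of that strategy in Python (the names are assumptions; randomized stands for the pre-shuffled product table keyed by a sequential surrogate key, and customer for the customer record):

import random

def next_ten(customer, randomized, total):
    # randomized: surrogate key (1..total) -> product id.
    start = customer.get("last_seen_seq")
    if start is None:
        # First login: start at a random point in the shuffled list.
        start = random.randint(1, total)
        customer["first_seq"] = start  # optional: lets you reconstruct what was seen
    # Pull the next ten surrogate keys, wrapping around at the end.
    seqs = [(start + i - 1) % total + 1 for i in range(1, 11)]
    customer["last_seen_seq"] = seqs[-1]  # the single per-login update
    return [randomized[s] for s in seqs]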
How about doing this: create a collection prodUser where you have just the id of the product and the list of customer ids who have seen that product:
{
prodID : 1,
userID : []
}
When a customer logs in, you find 10 prodIDs which have not been assigned to that user:
db.prodUser.find({
userID : {
$nin : [yourUser]
}
})
(For some reason $not is not working; I do not have time to figure out why. If you do, please let me know.) After showing the person his products, you can update the prodUser collection. To mitigate Mongo's inability to find random elements, you can insert elements randomly and just find the first 10.
Everything should work really fast.
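For reference, a minimal sketch of that flow in Python with pymongo (the database, collection, and field names are assumptions matching the answer above):

from pymongo import MongoClient

db = MongoClient().mydb  # hypothetical database name

def ten_unseen_products(user_id):
    # Find 10 products whose seen-by list does not contain this user.
    docs = list(db.prodUser.find({"userID": {"$nin": [user_id]}}).limit(10))
    # Record that the user has now seen them.
    ids = [d["prodID"] for d in docs]
    db.prodUser.update_many({"prodID": {"$in": ids}},
                            {"$push": {"userID": user_id}})
    return ids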