I'm curious about best practices for expiring secondary indexes in Redis.
For instance, say I have object IDs with positions that are only considered valid for 5 seconds before they expire. I have:
objectID -> hash of object info
I set these entries to automatically expire after 5 seconds.
I also want to look up objects by their zip code, so I have another mapping:
zipCode -> set or list of objectID
I know there's no way to automatically expire elements of a set, but I'd like to automatically remove objectIDs from that zip code mapping when they expire.
What is the standard practice for doing this? If there were an event fired upon expiration, I could find the associated zip code mapping and remove the objectID, but I'm not sure there is one (I will be using Go).
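For concreteness, the schema above might be set up like this in Go (a minimal sketch assuming the go-redis client; key names like obj:<zip>:<id> are hypothetical, chosen so the zip code survives in the key name after the hash itself expires):

package example

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// storeObject writes the object hash with a 5-second TTL and indexes it
// under its zip code. Encoding the zip in the key name is an assumption:
// it lets a cleanup process recover the zip after the hash is gone.
func storeObject(ctx context.Context, rdb *redis.Client, id, zip string) error {
	key := "obj:" + zip + ":" + id
	if err := rdb.HSet(ctx, key, "lat", "40.7", "lon", "-74.0").Err(); err != nil {
		return err
	}
	if err := rdb.Expire(ctx, key, 5*time.Second).Err(); err != nil {
		return err
	}
	// Secondary index: zipCode -> set of object keys (set members can't expire).
	return rdb.SAdd(ctx, "zip:"+zip, key).Err()
}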
If there were an event fired upon expiration, I could find the associated zip code mapping and remove the objectID, but I'm not sure there is one
Yes, there is an expiration notification; check this for an example.
You can have a client subscribe to the expiration event and delete items from the zipCode set or list.
However, this solution has two problems:
The notification is NOT reliable. If the client disconnects from Redis, or simply crashes, it will miss some notifications. To make this more reliable, you can have multiple clients listening for the notifications; if one client crashes, the others can still remove items from zipCode.
It's NOT atomic. There's a time window between the key expiration and removing items from zipCode.
If you're fine with these two problems, you can try this solution.
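As a hedged sketch, a Go subscriber for those expired events might look like this (go-redis client assumed; it requires notify-keyspace-events to include "Ex", and it relies on the hypothetical obj:<zip>:<id> key naming from above, since the hash's fields are no longer readable once the key has expired):

package example

import (
	"context"
	"strings"

	"github.com/redis/go-redis/v9"
)

// reapExpired removes expired object keys from their zip code index.
func reapExpired(ctx context.Context, rdb *redis.Client) {
	// Channel format: __keyevent@<db>__:expired; the payload is the expired key.
	sub := rdb.PSubscribe(ctx, "__keyevent@0__:expired")
	defer sub.Close()
	for msg := range sub.Channel() {
		parts := strings.SplitN(msg.Payload, ":", 3) // "obj", zip, id
		if len(parts) == 3 && parts[0] == "obj" {
			rdb.SRem(ctx, "zip:"+parts[1], msg.Payload)
		}
	}
}

Running several such subscribers, as suggested above, mitigates (but does not eliminate) the reliability problem.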
Related
I'm trying to implement tagging using Redis. This is what it looks like:
mykey (my item)
mykey:tags (a set with the tags associated to that item)
tags:tag1 (a set with references to all items tagged with "tag1")
...
I'm planning on using Redis Keyspace Notifications to prevent expired keys from staying in my tag sets forever (even though every item in the cache has a default TTL set, I don't like to keep stale data around).
These are the options I'm considering:
1) Subscribe to all "expired" events.
psubscribe '__keyevent@*__:expired'
Pros:
Only 1 subscriber.
Cons:
Since not all items have tags, I will have to check for mykey:tags and, if it exists, get the tags and remove the item from each tag set. The contention of this method will increase with the number of keys in the store.
2) Subscribe to all events for those keys containing tags only.
psubscribe '__keyspace@*__:mykey'
Pros:
Subscriptions will be created for those items with tags only.
Cons:
There must be overhead associated with each subscriber. The number of subscribers can grow pretty fast depending on the number of tagged items in the store.
Questions:
Which option should I implement? Should I be concerned about the number of subscribers in 2), or is the contention in 1) a bigger deal? I couldn't find any recommendations on this subject.
The end game is to implement this on Redis Cluster. Does this add any extra concerns to the implementation?
Update 1:
This is a generic implementation for tagging on top of our cache. I'm not sure at this point how we will end up using it; this is more of a PoC I'm working on. Some numbers to answer questions in the comments:
Volume: We have tens of millions of unique visitors per day. Not all items stored in cache for each visitor have tags, though, and this changes constantly.
Tags: Tags are managed. There are currently a couple dozen tags. We are considering supporting free-text tags in the future.
I haven't tested either of the two approaches I'm suggesting here. I was hoping that one of the options would be so bad that it wasn't even an option :)
Update 2:
After some trial and error and some more research, I discarded 2). There is a limit on the number of Redis clients as well as on the output buffers, which makes this option a no-go. You can find more information here and here.
I tried 1) and it works just fine. I even set the expiration of the keys 5 ms apart from each other and the code handled it properly. This is a viable way to go.
Another option is the one suggested by @thepirat000. I'm marking that answer as the accepted one, but I'm also adding a little tweak to the suggestion: instead of doing maintenance on the tags on every tag operation, I can randomly determine when to do it. This is a good-enough approach that uses neither pub/sub nor keyspace notifications.
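A tiny sketch of that randomized tweak (hypothetical; cleanTag stands for whatever maintenance routine is used, such as the clean-on-read sketch after the next answer):

package example

import "math/rand"

// maybeCleanTag runs tag maintenance on roughly 1 in 100 tag operations
// instead of on every one, trading some staleness for lower per-operation cost.
func maybeCleanTag(tag string, cleanTag func(string)) {
	if rand.Intn(100) == 0 {
		cleanTag(tag)
	}
}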
There will probably be too much overhead in using Keyspace Notifications for this.
Why don't you do the clean-up as a scheduled or recurring task, or even when the keys are retrieved by tag?
I've worked on something similar in CachingFramework.Redis, where the cleanup is optionally run when retrieving the keys related to a tag. Also, the tag set's TTL is the MAX(TTL) of the keys it contains.
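CachingFramework.Redis is a .NET library, but the clean-on-read idea translates; a hedged Go sketch (go-redis client assumed, tags:<tag> naming taken from the question):

package example

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// getKeysByTag returns the live members of tags:<tag>, removing references
// to keys that have already expired.
func getKeysByTag(ctx context.Context, rdb *redis.Client, tag string) ([]string, error) {
	members, err := rdb.SMembers(ctx, "tags:"+tag).Result()
	if err != nil {
		return nil, err
	}
	live := make([]string, 0, len(members))
	for _, key := range members {
		n, err := rdb.Exists(ctx, key).Result()
		if err != nil {
			return nil, err
		}
		if n == 0 {
			rdb.SRem(ctx, "tags:"+tag, key) // stale reference: clean it up
		} else {
			live = append(live, key)
		}
	}
	return live, nil
}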
I have set an expiration value on a key in Redis, and I want the opportunity to run a piece of code before the key is deleted by Redis. Is this possible, and if so, how?
Thanks
My solution was to create a new key with the same name as the one I wanted to hook, except with a prefix indicating it's a key used for timeouts ("TO_"), something like:
set key1 data1
set TO_key1 ""
expire TO_key1 20
In the example above, as soon as "TO_key1" expires, my program will be notified, and I'll get the opportunity to run my code before I manually delete "key1".
I found this link very useful for creating the listener for redis: Redis Key expire notification with Jedis
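The same shadow-key listener could be written in Go instead of Jedis; a minimal sketch (go-redis client assumed, "TO_" prefix as above, and notify-keyspace-events must include "Ex"):

package example

import (
	"context"
	"strings"

	"github.com/redis/go-redis/v9"
)

// watchShadowKeys runs hook for every expired "TO_"-prefixed key, then
// deletes the corresponding real key (which itself carries no TTL).
func watchShadowKeys(ctx context.Context, rdb *redis.Client, hook func(key string)) {
	sub := rdb.PSubscribe(ctx, "__keyevent@0__:expired")
	defer sub.Close()
	for msg := range sub.Channel() {
		if strings.HasPrefix(msg.Payload, "TO_") {
			real := strings.TrimPrefix(msg.Payload, "TO_")
			hook(real)         // run the custom code first...
			rdb.Del(ctx, real) // ...then delete the data key manually
		}
	}
}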
This isn't possible with standard open-source Redis... yet. There is a way, however, to do something similar without too much hassle. If you stop using Redis' expiry (at least for those keys that you're interested in "hooking") and manage expiry "manually" in your code, you can do anything you want before/during/after the expiry event.
Since Redis offers key-level expiry out of the box, people are usually content with it. In some cases, e.g. expiring members of a Set, the manual approach is the only way to go, but it is still valid for regular keys when you need finer control.
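One way that manual approach might look (an illustration, not part of the answer; go-redis client and the expiry:index key name are assumptions): keep a sorted set of deadlines and sweep it, running your hook before deleting each key.

package example

import (
	"context"
	"strconv"
	"time"

	"github.com/redis/go-redis/v9"
)

// scheduleExpiry records a deadline in a sorted set instead of calling EXPIRE.
func scheduleExpiry(ctx context.Context, rdb *redis.Client, key string, ttl time.Duration) error {
	deadline := float64(time.Now().Add(ttl).Unix())
	return rdb.ZAdd(ctx, "expiry:index", redis.Z{Score: deadline, Member: key}).Err()
}

// sweep deletes all due keys, invoking hook before each deletion.
func sweep(ctx context.Context, rdb *redis.Client, hook func(string)) error {
	now := strconv.FormatInt(time.Now().Unix(), 10)
	due, err := rdb.ZRangeByScore(ctx, "expiry:index", &redis.ZRangeBy{Min: "0", Max: now}).Result()
	if err != nil {
		return err
	}
	for _, key := range due {
		hook(key) // the "before deletion" code the question asks for
		rdb.Del(ctx, key)
		rdb.ZRem(ctx, "expiry:index", key)
	}
	return nil
}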
Question
I want to pass data between applications in a publish-subscribe manner. Data may be produced at a much higher rate than it is consumed, and messages may get lost, which is not a problem. Imagine a fast sensor and a slow sensor-data processor. For that, I use Redis pub/sub and wrote a class that acts as a subscriber, receives every message, and puts it into a buffer. The buffer is overwritten when a new message comes in, or nullified when the message is requested by the "real" function. So when I ask this class, I either immediately get a response (a hint that my function is slower than the incoming data) or I have to wait (a hint that my function is faster than the data).
This works pretty well when data comes in fast. But for data that arrives relatively seldom, say every five seconds, it does not: imagine my consumer is launched slightly after the producer; the first message is lost, and my consumer has to wait nearly five seconds before it can start working.
I think I have to solve this with Redis tools. Instead of pub/sub, I could simply use the get/set methods, putting the cache functionality directly into Redis. But then my consumer would have to poll the database instead of the event magic I have at the moment. Keys could look like "key:timestamp", and my consumer would now have to get key:* and compare the timestamps continuously, which I think would cause a lot of load. There is no natural opportunity to sleep: although I don't care about dropped messages (there is nothing I can do about them), I do care about delay.
Does someone use Redis for a similar thing and could give me a hint about clever use of Redis tools and data structures?
Edit:
Ideally, my program flow would look like this:
start the program
retrieve key from Redis
tell Redis, "hey, notify me on changes of key".
launch something asynchronously, with a callback for new messages.
By writing this, an idea came up: the publisher not only publishes the message on topic key, but also does SET key message. This way, an application could initially GET and then SUBSCRIBE.
Good idea or not really?
What I did after I got the answer below (the accepted one)
Keyspace notifications are really what I need here. Redis acts as the primary source of information; my client subscribes to keyspace notifications, which notify subscribers about events affecting specific keys. In the asynchronous part of my client, I subscribe to notifications about my key of interest. Those notifications set a key_has_updates flag. When I need the value, I get it from Redis and unset the flag. With an unset flag, I know that there is no new value for that key on the server. Without keyspace notifications, this is where I would have needed to poll the server. The advantage is that I can use all sorts of data structures, not only the pub/sub mechanism, and a slow joiner that misses the first event is still able to get the initial value, which with pub/sub would have been lost.
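A compact sketch of that flag pattern (go-redis client assumed; it requires notify-keyspace-events to include "K" plus the relevant event classes, e.g. "KA"):

package example

import (
	"context"
	"sync/atomic"

	"github.com/redis/go-redis/v9"
)

type watchedKey struct {
	rdb        *redis.Client
	key        string
	hasUpdates atomic.Bool
}

// watch subscribes to keyspace notifications for one key and raises the
// flag whenever any event touches it.
func (w *watchedKey) watch(ctx context.Context) {
	sub := w.rdb.Subscribe(ctx, "__keyspace@0__:"+w.key)
	defer sub.Close()
	for range sub.Channel() {
		w.hasUpdates.Store(true)
	}
}

// getIfChanged fetches the value only if the flag was raised, lowering it.
func (w *watchedKey) getIfChanged(ctx context.Context) (string, bool, error) {
	if !w.hasUpdates.Swap(false) {
		return "", false, nil // nothing new on the server
	}
	val, err := w.rdb.Get(ctx, w.key).Result()
	if err != nil {
		return "", false, err
	}
	return val, true, nil
}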
One idea is to push the data to a list (LPUSH) and trim it (LTRIM) so it doesn't grow forever if there are no consumers. On the other end, the consumer grabs items from that list and processes them. You can also use keyspace notifications and be alerted each time an item is added to that queue.
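A tiny sketch of that capped-list idea (go-redis client assumed; the list name and cap of 100 are arbitrary):

package example

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// pushCapped appends a reading and trims the list so it never holds more
// than 100 entries, even with no consumer attached.
func pushCapped(ctx context.Context, rdb *redis.Client, payload string) error {
	pipe := rdb.Pipeline()
	pipe.LPush(ctx, "sensor:data", payload)
	pipe.LTrim(ctx, "sensor:data", 0, 99)
	_, err := pipe.Exec(ctx)
	return err
}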
I pass data between applications using two native Redis commands: RPUSH and BLPOP.
"blpop blocks the connection when there are no elements to pop from any of the given lists".
Data is passed in JSON format between applications, using a list as a queue. The application that wants to send data (acting as the publisher) does an RPUSH on a list; the application that wants to receive data (acting as the subscriber) does a BLPOP on the same list.
The code would be (in Perl):
Sender (we assume a hash reference is passed):
use Redis;
use JSON;

# Encode the hash reference in JSON format
my $json_text = encode_json($hash_ref);

# Connect to Redis and push onto the shared list
my $r = Redis->new(server => "127.0.0.1:6379");
$r->rpush("shared_queue", $json_text);
$r->quit;
Receiver (in an infinite loop):
use Redis;
use JSON;

# Connect once, outside the loop
my $r = Redis->new(server => "127.0.0.1:6379");
while (1) {
    # blpop returns (list_name, element); a timeout of 0 blocks indefinitely
    my @elem = $r->blpop("shared_queue", 0);
    # Decode the JSON element back into a hash reference
    my $hash_ref = decode_json($elem[1]);
    # ... do some stuff with $hash_ref ...
}
I find this approach very useful, for several reasons:
Elements are stored in a list, so temporarily disabling the receiver causes no information loss; when the receiver restarts, it can process all items in the list.
A high sender rate can be handled with multiple receiver instances.
Multiple senders can push data onto a single list; in that case a data collector is easily implemented.
A receiver process that acts as a daemon can be monitored with standard tools (e.g. pm2).
From Redis 5 there is a new data type called "Streams", an append-only data structure. Redis Streams can be used as a reliable message queue, with both point-to-point and multicast communication via the consumer-group concept: Redis_Streams_MQ
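A hedged sketch of that Streams approach in Go (go-redis client assumed; stream, group, and consumer names are made up):

package example

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// produce appends a sensor reading to the stream.
func produce(ctx context.Context, rdb *redis.Client, temp string) error {
	return rdb.XAdd(ctx, &redis.XAddArgs{
		Stream: "sensor:stream",
		Values: map[string]interface{}{"temp": temp},
	}).Err()
}

// consume reads new entries via a consumer group, acknowledging each one.
func consume(ctx context.Context, rdb *redis.Client, handle func(redis.XMessage)) error {
	// Create the group once; this returns a BUSYGROUP error if it already exists.
	rdb.XGroupCreateMkStream(ctx, "sensor:stream", "processors", "$")
	for {
		res, err := rdb.XReadGroup(ctx, &redis.XReadGroupArgs{
			Group:    "processors",
			Consumer: "worker-1",
			Streams:  []string{"sensor:stream", ">"},
			Block:    0, // block until data arrives
		}).Result()
		if err != nil {
			return err
		}
		for _, stream := range res {
			for _, msg := range stream.Messages {
				handle(msg)
				rdb.XAck(ctx, "sensor:stream", "processors", msg.ID)
			}
		}
	}
}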
I have an activity feed system that uses Redis sorted sets.
Events happen and a message is placed into a sorted set for each relevant user, with a timestamp for the score.
The messages are then distributed to the users, either when they log in, or through a push if the user is currently logged in.
I'd like to differentiate between messages that have been "read" by the user and ones that are still unread.
From my understanding, I can't just have a "read/unread" property as part of the member, since changing it would make the member different and it would therefore be added a second time instead of replacing the current member.
So, what I'm thinking is that for each user I have two sorted sets: an "unread" set and a "read" set.
When new events come in, they're added to the "unread" set
When the user reads a message, I add the message to the read set and remove it from the unread set.
I'm a little less sure how to deliver them. I can't just union them, as I'd lose the read/unread distinction, unless I invert the score on the unread ones, for instance.
Returning the two sets individually (and merging them in code) makes paging difficult. I'd like to be able to get the 20 most recent messages regardless of read/unread status.
So, questions:
Is the read-set/unread-set approach the best way to do it? Is there a better way?
What's the best way to return subsets of the merged/unioned data?
Thanks!
Instead of trying to update the member, you could just pop it and insert a new version. That should be no problem, because you know both the member and its timestamp.
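A sketch of that pop-and-reinsert move for the read/unread case (go-redis client assumed; key names are illustrative):

package example

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// markRead moves a message from the unread set to the read set, keeping the
// original timestamp as the score so chronological ordering is preserved.
func markRead(ctx context.Context, rdb *redis.Client, user, msgID string, ts float64) error {
	pipe := rdb.TxPipeline() // both steps land atomically (MULTI/EXEC)
	pipe.ZRem(ctx, "feed:unread:"+user, msgID)
	pipe.ZAdd(ctx, "feed:read:"+user, redis.Z{Score: ts, Member: msgID})
	_, err := pipe.Exec(ctx)
	return err
}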
I'm trying to solve the following problem in Redis.
I have a list that contains various available keys:
List MASTER:
111A
222B
333C
444D
555E
I'd like to be able to pop an element off the list and use it as a key with an expire.
After the expire is up, I'd like to push this number back onto MASTER for future use. I don't see any obvious way to do this, so I'm soliciting creative suggestions.
The best method would be to get called back by Redis when the key expires and then take action.
However, callback support has yet to be added (http://code.google.com/p/redis/issues/detail?id=360).
You can either use a Redis version that contains a custom/community modification supporting this feature (like the last one in the link I've posted), or, worse :), start tracking keys and timeouts in your client app.
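The client-side tracking mentioned above could be sketched like this in Go (go-redis client assumed; note that if the process dies before the timer fires, the key is never returned to MASTER, which is exactly why this option is "worse"):

package example

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// leaseKey pops a key from MASTER, marks it in use with a TTL, and tracks
// the timeout client-side so the key is pushed back once it expires.
func leaseKey(ctx context.Context, rdb *redis.Client, ttl time.Duration) (string, error) {
	key, err := rdb.LPop(ctx, "MASTER").Result()
	if err != nil {
		return "", err
	}
	if err := rdb.Set(ctx, key, "in-use", ttl).Err(); err != nil {
		return "", err
	}
	time.AfterFunc(ttl, func() {
		rdb.RPush(context.Background(), "MASTER", key) // return it to the pool
	})
	return key, nil
}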