Redis Keyspace Notifications - No. of Subscribers vs Contention - redis

I'm trying to implement a Tagging using Redis. This is how it looks like:
mykey (my item)
mykey:tags (a set with the tags associated to that item)
tags:tag1 (a set with references to all items tagged with "tag1")
...
I'm planning on using Redis Keyspace Notifications to prevent expired keys to stay on my tag sets forever (even when every item in the cache has a default TTL set, I don't like to keep stale data around).
These are the options I'm considering:
1) Subscribe to all "expired" events.
psubscribe '__keyevent#*:expired'
Pros:
Only 1 subscriber.
Cons:
Since not all items contain tags, I will have to check for mykey:tags
and if exists get the tags and remove the item from each tag set.
The contention on this method will increase with the amount of keys
in the store.
2) Subscribe to all events for those keys containing tags only.
psubscribe '__keyspace#*:mykey'
Pros:
Subscriptions will be created for those items with tags only.
Cons:
There must be overhead associated with each subscriber.
The number of subscriber can grow pretty fast depending on the number
of tagged items in the store.
Questions:
Which option should I implement? Should I be concerned about the
number of subscribers on 2) or is the contention on 1) a bigger
deal? I couldn't find any recommendations about this subject.
The end game is to implement this on Redis Cluster. Does this add
any extra concern to the implementation?
Update 1:
This is a generic implementation for tagging on top of our cache. I'm not sure at this point about how we ended up using it. This is more like a PoC I'm working on. Some numbers trying to answer some questions in the comments:
Volume: We have tens of millions of unique visitors per day. Not all items stored in cache for each visitor has tags though. But this changes constantly.
Tags: Tags are managed. There are currently a couple of dozen of tags. We are considering supporting free text tags in the future.
I haven't tested any of the two approaches I'm suggesting here. I was hoping that one of the options were so bad that was not even an option :)
Update 2:
After some trials and errors and some more research I discarded 2). There is a limit for redis clients as well as for the Output Buffers which makes this option a no go. You can find more information here and here.
I tried 1) and it works just fine. I even set the expiration of the keys 5ms apart from each other and the code handle it properly. This can be an alternative to go.
Another option can be the one suggested by #thepirat000. I'm marking this answer as the accepted one but I'm also adding a little tweak to his suggestion: I don't want to do maintenance in the tags on every tag operation, instead I can randomly determine when to do it. This is a good enough approach which doesn't use pub/sub nor the keyspace notifications.

There will be probably too much overhead by using Keyspace Notifications for this.
Why don't you do the clean-up as a scheduled or recurring task, or even when the keys are retrieved by tag?
I've worked on something similar on CachingFramework.Redis where the cleanup is optionally run when retrieving the keys related to a tag. Also the tag set TTL is the MAX(TTL) of the keys it contains.

Related

How to quickly subscribed to relevant subset of large set of routing keys?

I have the feeling I am not understanding something fundamental in AMQP/RabbitMQ, since I cannot find much help on this specific detail.
Let's assume I have a system made up of several components sending each other messages via a RabbitMQ broker. The messages can have routing keys of the form XXX.YYY. Let's further assume XXX and YYY are numbers between 000 and 999. That means there are a total of 1,000,000 different possible routing keys.
Now, not every component in my system is interested in every message. Let's say there is a component that wants all the messages in which XXX is between 300 and 500 and YYY is between 600 and 900. That means the component wants to process messages referring to 200*300 = 60,000 different routing keys. Also, the component might be restarted at any point in time and needs to be able to start processing the messages quickly after restart.
Furthermore, the routing keys the component is interested in might change at runtime.
There are several ways to approach this that I can think of:
Use topic exchanges and subscribe to each routing key. If I do this using one connection and one channel, it is awfully slow. My understanding is that bindings are created sequentially for each channel and thus creating 60,000 bindings takes a while. Adding and removing bindings is trivial, though. Would it be feasible to create more channels so that bindings can be created in parallel?
Use topic exchanges and wildcards, discard messages you're not interested in in the client. We could subscribe to *.* and receives messages for all 1,000,000 routing keys => much more load in the client. Or subscribe to all 200 relevant values of XXX.* and receive messages for 200,000 routing keys. Is this a generally applied pattern?
Use headers exchanges and set x-match to any. This feels a little hacky and it seems headers exchanges are not widely used. You also have to deal with the maximum size of the header when defining a binding. Do people do this? You only need a handful of bindings though, so re-creating the bindings after a restart is very fast. Updating the set of topics we're interested in is also not a problem: Just re-create everything.
So, I guess my question is: What's the best practice to subscribe to a large amount of topics very quickly (<5s) and still be able change routing keys dynamically at run-time?
Would it be feasible to split the component which needs the messages and the subscription into two components? One component is only responsible for keeping the subscriptions up-to-date (this would exchange-to-exchange subscriptions) and the other components receives every message from the downstream exchange.

Redis keyspace notifications - get values(small size) of set operations

I'm working on creating DB with Redis.
One of my recruitments is that all the clients in the system will be able to listen to set events and get information about both key and value change.
I know that publishing value may be big(512 MB) but I know that in my system the size of value will not be more than 100 chars.
I have 3 possible solutions and I wonder which one will be better or consider other solutions:
1) After each set operation client will also publish it (PUB/SUB)
2)Edit setGenericCommand function to publish the value as well and use keyspace binding.
3)After client receive keyspace notification it will get the value with get operation.
I would like to understand which approach will be better?
Thank you!
So, 1st and foremost, remember that PubSub is at-most-once delivery. If you really need to process every change in the client, you should consider a more resilient way to do so.
That said, assuming you're ok with PubSub's promises, 1 is the simplest and I'd go with that. At most, I'd provide the clients with a Lua wrapper that combines the SET and PUBLISH commands. This, of course, removes the need to actually listen to Keyspace notifications as you basically implementing it yourself.
2 means hacking Redis, which is great but means you'll have to maintain your own which is meh--;
3 is also simple enough, but with 1 you get away with a single round trip instead of 2.
Another (4) approach is to write a custom module, but IMO too complex for this need. Go with 1 and Lua, and may the force be with you.

RabbitMQ - How to ensure two queues stay synchronized

I have two queues that both have distinct data types that affect one another as they're being processed by my application, therefore processing messages from the two queues asynchronously would cause a data integrity issue.
I'm curious as to the best practice for making sure only one consumer is consuming at any given time. Here is a summary of what I have so far:
EventMessages receive information about external events that may or may not have an impact on the enqueued/existing PurchaseOrderMessages.
Since we anticipate we'll be consuming more PurchaseOrderMessage than EventMessage, maybe we should just ensure the EventMessage Queue is empty (via the API) before we process anything in PurchaseOrderMessage Queue - but that gets into the question of wait times, etc. and this all needs to happen as close to real time as possible.
If there's a way to simply pause a Consumer A until Consumer B is at rest that might be the simplest solution, I'm just not quite sure which direction I need to go in.
UPDATE
To provide some additional context, a PurchaseOrderMessage will contain a Origin and Destination.
A EventMessage also contains location data.
Each time a PurchaseOrderMessage is processed, it will query the current EventMessage records for any Event locations that match the Origin and Destination of that PurchaseOrder and create an association.
Each time an EventMessage is processed, it will query the current PurchaseOrderMessage records for any Origins of Destinations that match that Event and create an association.
If synchronous queues aren't a good solution, what's an alternative that would insure none of the associations are missed when EventMessages and PurchaseOrderMessages are getting published to the app at the same time?
UPDATE 2
Ultimately this data will serve a UI which will have a list of PurchaseOrders and the events that might be affecting their delivery dates. It would be too slow to do the "Event Check" as the PurchaseOrder data was being rendered/retrieved by the end user which is why we're wanting to do it as they're processed/consumed.
Let me begin with the bottom line up front - on the face of it, what you are asking doesn't make sense.
Queues should never require synchronization. The very thought of doing so entirely defeats the purpose of having a queue. For some background, visit this answer.
Let's consider some common places from real life where we encounter multiple queues:
Movie theaters (box office, concession counter, usher)
Theme parks (snack bars, major attractions)
Manufacturing floors (each station may have a queue waiting to process)
In each of these examples, from the point of view of the object in the queue, it can only wait in one at a time. It cannot wait in one line while it is waiting in another- such a thing is physically impossible.
Your example seems to take two completely unrelated things and merge them together. You have a queue for PurchaseOrder objects - but what is the queue for? That's the equivalent of going to Disney World and waiting in the Customer queue - what is the purpose of such a queue? If the purpose is not clear, it's not a real queue.
Addressing your issue
This particular issue needs to be addressed first by clearly defining the various operations that are being done to a PurchaseOrder, then creating queues for each of those operations. If these operations are truly synchronous, then your business logic should be coded to wait for one operation to complete before starting another. In this circumstance, it would be considered an exception if a PurchaseOrder got to the head of one queue without fulfilling a pre-requisite.
Please remember that a message queue typically serves a stateless operation. Good design dictates that messages in the queue contain all the information needed for the processor to process the message. If you don't adhere to this, then your database becomes a single point of contention for your system - and while this is not an insurmountable problem, it does make the design more complex.
Waiting in Multiple Queues
Now, if you've ever been to Disney World, you'll also know that they have something called a FastPass+ (FP+), which allows the holder to skip the line at the designated attraction. Disney allocates a certain number of slots per hour for each major attraction at the park, and guests are able to request up to three FP+s during each day. FP+ times are allocated for one hour blocks, and guests cannot have two overlapping FP+ time blocks. Once all FP+ slots have been issued for the ride, no more are made available. The FP+ system ensures these rules are enforced, independently of the standby queues for each ride. Essentially, by using FastPass+, guests can wait in multiple lines virtually and experience more attractions during their visit.
If you are unable to analyze your design and come up with an alternative, perhaps the FastPass+ approach could help alleviate some of the bottlenecks.
Disclaimer: I don't work for Disney, but I do go multiple times per month, always getting my FastPass first

Reclaiming expired keys in Redis

I'm trying to solve the following problem in Redis.
I have a list that contains various available keys:
List MASTER:
111A
222B
333C
444D
555E
I'd like to be able to pop an element off of the list and use it as a key with an expires.
After the expires is up, I'd like to be able to push this number back onto MASTER for future use. I don't see any obvious way to do this, so I'm soliciting for a creative one.
The best method would be to get called back by Redis when the key expires and then take action.
However, callbacks support is still to be added (http://code.google.com/p/redis/issues/detail?id=360).
You can either use a Redis version that contains a custom/community modification to support this feature (like the last one in the link I've posted), or worse :): start tracking keys and timeouts in your client app.

Redis usecase for notification system

Does anyone know the usecases for notification system (redis).
I tried many patterns, but not satisfied.
I would think using a List to create a queue would be the best approach. You can push a JSON document or some other serialized data representing the notice onto the list then pop them off as they are delivered (or keep them in the list depending upon your need). Using things like LRANGE you can easily paginate to handle any number of notices.
Take a look at Staircar: Redis-powered notifications. Tumblr team actually uses Redis SortedSet for notifications:
Redis Sorted Sets fit the characteristics of notifications perfectly, without the I/O and concurrency pitfalls of implementing a similar structure in MySQL. Sorted sets in Redis are ordered by a score (unix timestamp in our case), contain unique elements (non-repeating collections of strings in redis speak), can be trimmed or appended to cheaply, and are keyed off, well, a key (user in our case)
Take a look at Thoonk. It produces like pub/sub events that correspond to publish/edit/retract/resorts on higher level objects called feeds. It works well for notification inboxes and application state changes.
The contract/schema https://github.com/andyet/thoonk.js/blob/master/contract.txt provides a lot of recipes that you may be interested in.