Redis usecase for notification system - notifications

Does anyone know good use cases (patterns) for building a notification system on Redis?
I have tried many patterns, but I am not satisfied with any of them.

I would think using a List to create a queue would be the best approach. You can push a JSON document or some other serialized data representing the notice onto the list then pop them off as they are delivered (or keep them in the list depending upon your need). Using things like LRANGE you can easily paginate to handle any number of notices.
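A minimal sketch of the list-based approach described above. It assumes `client` is a redis-py style client exposing `lpush` and `lrange`; the `notifications:<user_id>` key layout and the function names are hypothetical and chosen only for illustration.

```python
import json


def push_notice(client, user_id, notice):
    """Serialize a notice as JSON and push it onto the head of the user's list."""
    key = f"notifications:{user_id}"  # hypothetical key layout
    client.lpush(key, json.dumps(notice))


def get_page(client, user_id, page, page_size=20):
    """Return one page of notices, newest first (page numbers start at 0)."""
    key = f"notifications:{user_id}"
    start = page * page_size
    stop = start + page_size - 1  # LRANGE bounds are inclusive
    return [json.loads(raw) for raw in client.lrange(key, start, stop)]
```

Because new notices are pushed onto the head, page 0 always holds the most recent ones. Popping delivered notices instead of keeping them is a separate decision that depends on your needs.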

Take a look at Staircar: Redis-powered notifications. The Tumblr team actually uses Redis Sorted Sets for notifications:
Redis Sorted Sets fit the characteristics of notifications perfectly, without the I/O and concurrency pitfalls of implementing a similar structure in MySQL. Sorted sets in Redis are ordered by a score (unix timestamp in our case), contain unique elements (non-repeating collections of strings in redis speak), can be trimmed or appended to cheaply, and are keyed off, well, a key (user in our case)
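To make the sorted-set idea concrete, here is a rough sketch, not Tumblr's actual code. It assumes a redis-py style `client` with `zadd`, `zremrangebyrank` and `zrevrange`; the key layout, the `max_items` cap and the function names are assumptions made for illustration.

```python
import time


def add_notification(client, user_id, payload, max_items=100, now=None):
    """Add a notification scored by unix timestamp and trim old entries."""
    key = f"notifications:{user_id}"  # hypothetical key layout
    score = time.time() if now is None else now
    # Members are unique: re-adding the same payload only updates its score.
    client.zadd(key, {payload: score})
    # Drop everything except the max_items highest-scored (newest) members.
    client.zremrangebyrank(key, 0, -(max_items + 1))


def latest_notifications(client, user_id, count=10):
    """Return up to `count` notifications, newest first."""
    key = f"notifications:{user_id}"
    return client.zrevrange(key, 0, count - 1)
```

Trimming on every write keeps each user's set bounded, which is the "trimmed cheaply" property mentioned in the quote.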

Take a look at Thoonk. It produces pub/sub-like events that correspond to publish/edit/retract/resort operations on higher-level objects called feeds. It works well for notification inboxes and application state changes.
The contract/schema https://github.com/andyet/thoonk.js/blob/master/contract.txt provides a lot of recipes that you may be interested in.

Related

Max limit for number of routing keys per queue?

I am trying to use rabbitmq as a part of the notification system. I have an exchange called "notification_events" and the queues in the exchange are based on the types of events, for example, 'send_account_notification_queue' or 'send_tickets_notification_queue'. In order to send to specific user(s) I plan on binding userId to the appropriate queue as a routing key. And I'm sure the number of routing keys will grow with more users...
I read that it is bad to have thousands or millions of queues, but how about routing keys? Are there better ways of doing this? Any help is appreciated and thanks in advance for your time :)
Do you really need queueing per user? Have you considered having a single queue per type of event and using the userId to notify the appropriate user? This assumes that notifying a given user is fast and cannot fail for only a subset of users.
If you need more complex per-user logic (like reordering events), or need to deal with a specific user not being able to receive an event, then a queueing system is not the right abstraction. Look for an orchestration system like temporal.io that supports a per-user object with logic as complex as necessary.
See this answer that explains how Temporal solves it for a system with similar requirements.

What is the diff between data-sync and pub-sub in Deepstream

All:
I am pretty new to deepstream. On its website, the core concepts section describes:
data-sync Interactive JSON documents that can be edited and observed.
Changes are persisted and synced across clients.
and
publish-subscribe Many clients can subscribe to topics and receive
data whenever other clients publish it to the same topic
I wonder what the difference is between data-sync and pub-sub in terms of their purpose. Put another way, what task can one do that the other cannot?
Thanks
PubSub is a way for clients and servers to send messages to each other. These messages can contain all sorts of data, but once the message is delivered it's gone: there is no storage or statefulness. If you're familiar with EventEmitters in e.g. JavaScript, you're already familiar with the pattern.
Data-Sync, on the other hand, is stateful, persistent data. Clients can request JSON documents called records, update them and subscribe to changes made by other clients. Records can be arranged in lists and lists can be referenced by records, allowing data-sync to become the realtime backbone for all the data that drives your app.

Redis Keyspace Notifications - No. of Subscribers vs Contention

I'm trying to implement tagging using Redis. This is what it looks like:
mykey (my item)
mykey:tags (a set with the tags associated to that item)
tags:tag1 (a set with references to all items tagged with "tag1")
...
I'm planning on using Redis Keyspace Notifications to prevent expired keys from staying in my tag sets forever (even though every item in the cache has a default TTL set, I don't like to keep stale data around).
These are the options I'm considering:
1) Subscribe to all "expired" events.
psubscribe '__keyevent@*__:expired'
Pros:
Only 1 subscriber.
Cons:
Since not all items contain tags, I will have to check for mykey:tags
and if exists get the tags and remove the item from each tag set.
The contention on this method will increase with the amount of keys
in the store.
2) Subscribe to all events for those keys containing tags only.
psubscribe '__keyspace@*__:mykey'
Pros:
Subscriptions will be created for those items with tags only.
Cons:
There must be overhead associated with each subscriber.
The number of subscribers can grow pretty fast depending on the number
of tagged items in the store.
Questions:
Which option should I implement? Should I be concerned about the
number of subscribers on 2) or is the contention on 1) a bigger
deal? I couldn't find any recommendations about this subject.
The end game is to implement this on Redis Cluster. Does this add
any extra concern to the implementation?
Update 1:
This is a generic implementation for tagging on top of our cache. I'm not sure at this point how we will end up using it. This is more like a PoC I'm working on. Here are some numbers to answer the questions in the comments:
Volume: We have tens of millions of unique visitors per day. Not all items stored in the cache for each visitor have tags, though. But this changes constantly.
Tags: Tags are managed. There are currently a couple of dozen tags. We are considering supporting free-text tags in the future.
I haven't tested either of the two approaches I'm suggesting here. I was hoping that one of the options was so bad that it was not even an option :)
Update 2:
After some trial and error and some more research I discarded 2). There is a limit on the number of Redis clients as well as on the Output Buffers, which makes this option a no-go. You can find more information here and here.
I tried 1) and it works just fine. I even set the expiration of the keys 5ms apart from each other and the code handles it properly. This is a viable alternative.
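For reference, the handler for option 1) could look roughly like the sketch below. It is not the code from the trial above. It assumes a redis-py style `client` created with `decode_responses=True` (so set members are `str`), and that `expired_key` is the key name delivered by the expired event. Deleting the leftover `mykey:tags` set is an added assumption.

```python
def handle_expired(client, expired_key):
    """Remove an expired item from every tag set that references it.

    Returns the number of tags the item had (0 if it was untagged).
    """
    tags_key = f"{expired_key}:tags"
    tags = client.smembers(tags_key)  # empty set when the item has no tags
    for tag in tags:
        client.srem(f"tags:{tag}", expired_key)
    if tags:
        # Assumption: the per-item tag set is useless once the item is gone.
        client.delete(tags_key)
    return len(tags)
```

With redis-py's pub/sub interface, the key name of an expired event would typically arrive in the message's `data` field, so the subscriber loop only needs to pass that value to `handle_expired`.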
Another option is the one suggested by @thepirat000. I'm marking this answer as the accepted one, but I'm also adding a little tweak to his suggestion: I don't want to do maintenance on the tags on every tag operation; instead I can randomly determine when to do it. This is a good enough approach which uses neither pub/sub nor keyspace notifications.
There will probably be too much overhead in using Keyspace Notifications for this.
Why don't you do the clean-up as a scheduled or recurring task, or even when the keys are retrieved by tag?
I've worked on something similar on CachingFramework.Redis where the cleanup is optionally run when retrieving the keys related to a tag. Also the tag set TTL is the MAX(TTL) of the keys it contains.
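A minimal sketch of cleanup on retrieval, combined with the random maintenance idea from the question's update. This is not the CachingFramework.Redis implementation. `client` is assumed to be a redis-py style client with `decode_responses=True`; `cleanup_probability` and `rng` are hypothetical parameters added for illustration and testing.

```python
import random


def keys_for_tag(client, tag, cleanup_probability=1.0, rng=random.random):
    """Return the live keys for a tag, occasionally pruning stale members."""
    tag_key = f"tags:{tag}"
    members = client.smembers(tag_key)
    # A member is stale when the item it points to has expired.
    live = [m for m in members if client.exists(m)]
    stale = [m for m in members if m not in live]
    # Only pay for the write some of the time.
    if stale and rng() < cleanup_probability:
        client.srem(tag_key, *stale)
    return sorted(live)
```

Callers always get only live keys; the probability just controls how often the tag set itself is rewritten.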

Redis as a message broker

Question
I want to pass data between applications, in a publish-subscribe manner. Data may be produced at a much higher rate than consumed and messages get lost, which is not a problem. Imagine a fast sensor and a slow sensor data processor. For that, I use redis pub/sub and wrote a class which acts as a subscriber, receives every message and puts that into a buffer. The buffer is overwritten when a new message comes in or nullified when the message is requested by the "real" function. So when I ask this class, I immediately get a response (hint that my function is slower than data comes in) or I have to wait (hint that my function is faster than the data).
This works pretty well when data comes in fast. But for data that arrives relatively seldom, say every five seconds, it does not work: imagine my consumer is launched slightly after the producer; the first message is lost and my consumer has to wait nearly five seconds before it can start working.
I think I have to solve this with Redis tools. Instead of pub/sub, I could simply use the get/set methods, thus putting the cache functionality into Redis directly. But then my consumer would have to poll the database instead of relying on the event magic I have at the moment. Keys could look like "key:timestamp", and my consumer would have to get key:* and compare the timestamps permanently, which I think would cause a lot of load. There is no natural opportunity to sleep, since although I don't care about dropped messages (there is nothing I can do about them), I do care about delay.
Does someone use Redis for a similar thing and could give me a hint about clever use of Redis tools and data structures?
edit
Ideally, my program flow would look like this:
start the program
retrieve key from Redis
tell Redis, "hey, notify me on changes of key".
launch something asynchronously, with a callback for new messages.
While writing this, an idea came up: the publisher not only publishes the message on the topic key, but also runs SET key message. This way, an application could initially GET the value and then subscribe.
Good idea or not really?
What I did after I got the answer below (the accepted one)
Keyspace notifications are really what I need here. Redis acts as the primary source of information, and my client subscribes to keyspace notifications, which notify subscribers about events affecting specific keys. In the asynchronous part of my client, I subscribe to notifications about my key of interest. Those notifications set a key_has_updates flag. When I need the value, I get it from Redis and unset the flag. With an unset flag, I know that there is no new value for that key on the server. Without keyspace notifications, this would have been the part where I needed to poll the server. The advantage is that I can use all sorts of data structures, not only the pub/sub mechanism, and a slow joiner which misses the first event is always able to get the initial value, which with pub/sub would have been lost.
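The flag logic described above could be sketched as follows. This is an illustration under assumptions, not the original client: `client` is any object with a redis-py style `get`, the class and method names are hypothetical, and wiring `on_notification` to an actual keyspace-notification subscription is left out.

```python
import threading


class WatchedKey:
    """Cache a key's value and re-read it only after a change notification."""

    def __init__(self, client, key):
        self._client = client
        self._key = key
        self._has_updates = threading.Event()
        # Start with the flag set so the first get() fetches the initial
        # value; this covers the slow joiner that missed earlier events.
        self._has_updates.set()
        self._value = None

    def on_notification(self, message=None):
        """Callback for the subscriber thread: mark the key as changed."""
        self._has_updates.set()

    def get(self):
        if self._has_updates.is_set():
            # Clear before reading so a notification arriving during the
            # read is not lost; at worst it causes one extra fetch.
            self._has_updates.clear()
            self._value = self._client.get(self._key)
        return self._value
```

The server is contacted only when the flag is set, so repeated reads between notifications are served from the cached value without polling.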
One idea is to push the data to a list (LPUSH) and trim it (LTRIM), so it doesn't grow forever if there are no consumers. On the other end, the consumer would grab items from that list and process them. You can also use keyspace notifications, and be alerted each time an item is added to that queue.
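A small sketch of the LPUSH plus LTRIM idea, assuming a redis-py style `client` with `lpush`, `ltrim` and `rpop`. The function names and the `max_len` default are made up for the example.

```python
def push_capped(client, key, item, max_len=100):
    """Push an item and keep only the newest max_len entries."""
    client.lpush(key, item)
    client.ltrim(key, 0, max_len - 1)  # LTRIM bounds are inclusive


def pop_oldest(client, key):
    """Take the oldest remaining item, or None when the list is empty."""
    return client.rpop(key)
```

A consumer that only cares about the freshest reading could pop from the head of the list instead. In production the push and trim would usually be sent together in one pipeline or transaction.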
I pass data between applications using two native Redis commands:
rpush and blpop.
"blpop blocks the connection when there are no elements to pop from any of the given lists".
Data is passed between applications in JSON format, using a list as a queue.
An application that wants to send data (acting as publisher) does an rpush on a list.
An application that wants to receive data (acting as subscriber) does a blpop on the same list.
The code should look like this (in Perl).
Sender (we assume a hash reference is passed):
use Redis;
use JSON;
# Encode the hash in JSON format
my $json_text = encode_json($hash_ref);
# Connect to Redis and push onto the list
my $r = Redis->new(server => "127.0.0.1:6379");
$r->rpush("shared_queue", $json_text);
$r->quit;
Receiver (in an infinite loop):
use Redis;
use JSON;
# Connect once, outside the loop
my $r = Redis->new(server => "127.0.0.1:6379");
while (1) {
    # blpop returns (list name, element); a timeout of 0 blocks indefinitely
    my @elem = $r->blpop("shared_queue", 0);
    # Decode the JSON element back into a hash reference
    my $hash_ref = decode_json($elem[1]);
    # do some work with $hash_ref
}
I find this approach very useful for several reasons:
Elements are stored in the list, so temporarily disabling the receiver causes no information loss. When the receiver restarts, it can process all the items in the list.
A high sender rate can be handled with multiple receiver instances.
Multiple senders can send data to a single list. In this case, a data collector is easy to implement.
A receiver process that acts as a daemon can be monitored with specific tools (e.g. pm2).
Since Redis 5 there is a new data type called "Streams", which is an append-only data structure. Redis Streams can be used as a reliable message queue with both point-to-point and multicast communication using the consumer group concept (see Redis_Streams_MQ).
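A rough sketch of the consumer-group pattern with Streams. It assumes a redis-py style `client` with `xadd`, `xgroup_create`, `xreadgroup` and `xack`, and the default reply shape of a list of `[stream, [(id, fields), ...]]` pairs. The function names and the `handler` callback are hypothetical.

```python
def ensure_group(client, stream, group):
    """Create the consumer group (and the stream) if it does not exist yet."""
    try:
        client.xgroup_create(stream, group, id="0", mkstream=True)
    except Exception as exc:  # redis-py raises ResponseError here
        # Redis answers BUSYGROUP when the group already exists.
        if "BUSYGROUP" not in str(exc):
            raise


def produce(client, stream, fields):
    """Append an entry to the stream and return its generated id."""
    return client.xadd(stream, fields)


def consume_once(client, stream, group, consumer, handler, count=10, block_ms=1000):
    """Read new entries for this consumer, handle them, then acknowledge."""
    replies = client.xreadgroup(group, consumer, {stream: ">"},
                                count=count, block=block_ms)
    handled = 0
    for _stream_name, entries in replies or []:
        for entry_id, fields in entries:
            handler(fields)
            # Acknowledge only after the handler succeeded, so a crash
            # leaves the entry pending for redelivery.
            client.xack(stream, group, entry_id)
            handled += 1
    return handled
```

Consumers in the same group share the entries (point to point), while separate groups each see every entry (multicast).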

Prevent subscribers from reading certain samples temporarily

We have a situation where there are 2 modules, with one having a publisher and the other subscriber. The publisher is going to publish some samples using key attributes. Is it possible for the publisher to prevent the subscriber from reading certain samples? This case would arise when the module with the publisher is currently updating the sample, which it does not want anybody else to read till it is done. Something like a mutex.
We are planning on using Opensplice DDS but please give your inputs even if they are not specific to Opensplice.
Thanks.
RTI Connext DDS supplies an option to coordinate writes (called "coherent write" in the documentation; see Section 6.3.10 and the PRESENTATION QoS).
myPublisher->begin_coherent_changes();
// (writers in that publisher do their writes) /* data captured at publisher */
myPublisher->end_coherent_changes(); /* all writes now leave */
Regards,
rip
If I understand your question properly, then there is no native DDS mechanism to achieve what you are looking for. You wrote:
This case would arise when the module with the publisher is currently updating the sample, which it does not want anybody else to read till it is done. Something like a mutex.
There is no such thing as a "global mutex" in DDS.
However, I suspect you can achieve your goal by adding some information to the data model and adjusting your application logic. For example, you could add an enumeration field to your data; let's say you add a field called status that can take one of the values CALCULATING or READY.
On the publisher side, instead of "taking the mutex", your application could publish a sample with the status value set to CALCULATING. When the calculation is finished, the new sample can be written with the value of status set to READY.
On the subscriber side, you could use a QueryCondition with status=READY as its expression. Read or take actions should only be done through the QueryCondition, using read_w_condition() or take_w_condition(). Whenever the status is not equal to READY, the subscribing side will not see any samples. This approach takes advantage of the mechanism that newer samples overwrite older ones, assuming that your history depth is set to the default value of 1.
If this results in the behaviour that you are looking for, then there are two remaining disadvantages to this approach. First, the application logic gets somewhat polluted by the use of the status field and the QueryCondition. This could easily be hidden by an abstraction layer though. It would even be possible to hide it behind some lock/unlock-like interface. The second disadvantage is the extra sample going over the wire when setting the status field to CALCULATING. But extra communication cannot be avoided anyway if you want to implement distributed mutex-like functionality. This is an issue only if your samples are pretty big and/or written at high frequency. In that case, you might have to resort to a dedicated, small Topic for the single purpose of simulating the locking mechanism.
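The status-field approach can be modelled with a toy in-memory example. This is plain Python, not the DDS API: `KeyedTopic` stands in for a keyed topic with history depth 1, and `read_ready` plays the role of a read through a QueryCondition on status=READY. All names are hypothetical.

```python
from enum import Enum


class Status(Enum):
    CALCULATING = "CALCULATING"
    READY = "READY"


class KeyedTopic:
    """Toy stand-in for a keyed topic where each key keeps only its last sample."""

    def __init__(self):
        self._latest = {}

    def write(self, key, status, value=None):
        # History depth 1: a newer sample overwrites the older one.
        self._latest[key] = (status, value)

    def read_ready(self):
        """Return only the samples whose status is READY."""
        return {key: value
                for key, (status, value) in self._latest.items()
                if status is Status.READY}
```

Writing a CALCULATING sample for a key hides that key from `read_ready` until a READY sample replaces it, which is the lock-like behaviour the answer describes.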
The PRESENTATION QoS is not specific to RTI Connext DDS. It is part of the OMG DDS specification. That said, the ability to write coherent changes on multiple DataWriters/Topics (as opposed to using a single DataWriter) is part of one of the optional profiles (the object model profile), so not all DDS implementations necessarily support it.
Gerardo