Why pub sub in redis cannot be used together with other commands?

Why pub sub in redis cannot be used together with other commands? - redis

I'm reading here, and I see a warning stating that PUB/SUB subscribers in Redis should not issue other commands:
A client subscribed to one or more channels should not issue commands,
although it can subscribe and unsubscribe to and from other channels.
I have two questions:
Why is this limitation?
For the scope of the paragraph, what's a client? A whole process? A Redis connection? A complete Redis instance? Or is it a bad idea in general to issue commands and subscribe to channels, and the admonition goes for every and any scope I can think of?

A client, in this case, is an instance of a connection to Redis. An application could well have multiple clients, each with different responsibilities or as a way to provide higher degrees of parallelism to the application.
What they are suggesting here, however, is that you use an individual client (think 'connection') to handle your incoming subscription messages and to react to those messages as its sole responsibility. The reason it's recommended not to make calls with this connection is because while it is waiting on incoming messages from subscribed channels, the client is in a blocked state.
Trying to make a call on a given client won't work while it's awaiting response from a blocking call.

Related

How does Redis PubSub subscribe mechanism works?

I want to create a Publish-Subscribe infrastructure in which every subscriber will listen to multiple (say 100k) channels.
I think to use Redis PubSub for that purpose but I'm not sure if subscribing to thousands of channels is the best practice here.
To answer this I want to know how subscribing mechanism in Redis works in the background.
Another option is to create a channel per subscriber and put some component in between, that will get all messages and publish it to relevant channels.
Any other Idea?

Salvatore/creator of Redis has answered this here: https://groups.google.com/forum/#!topic/redis-db/R09u__3Jzfk
All the complexity on the end is on the PUBLISH command, that performs
an amount of work that is proportional to:
a) The number of clients receiving the message.
b) The number of clients subscribed to a pattern, even if they'll not
match the message.
This means that if you have N clients subscribed to 100000 different
channels, everything will be super fast.
If you have instead 10000 clients subscribed to the same channel,
PUBLISH commands against this channel will be slow, and take maybe a
few milliseconds (not sure about the actual time taken). Since we have
to send the same message to everybody.

Persisting Data in a Twisted App

I'm trying to understand how to persist data in a Twisted application. Let's say I've decided to write a Twisted server that:
Accepts inbound SMTP requests
Sends the message to a 3rd party system for modification
Relays the modified message to its destination
A typical Twisted tutorial would have you build this app using Deferreds and callbacks, roughly:
A Factory handles inbound requests
Each time a full email is received a call is sent to the remote message processor, returning a deferred
Add an errback that substitutes the original message if anything goes wrong in the modify call.
Add a callback to send the message on to the recipient, which again returns a deferred.
A real server would add/include additional call/errbacks to retry or notify the sender or whatnot. Again for simplicity, assume we consider this an acceptable amount of effort and just log errors.
Of course, this persists NO data in the event of a crash/restart/something else. I get that a solution involves a 3rd party persistent datastore (RabbitMQ is often mentioned) and could probably come up with a dozen random ways to achieve the outcome.
However, I imagine there are a few approaches that work best in a Twisted app. What do they look like? How do they store (and restore in the event of a crash) the in-process messages?

If you found this question, you probably already know that Twisted is event-based. It sounds simple, but the "hardest" part of the answer is to get the persistence platform generating the events we need when we need them. Naturally, you can persist the data in a DB or a message queue, but some platforms don't naturally generate events. For example:
ZeroMQ has (or at least had) no callback for new data. It's also relatively poor at persistence.
In other cases, events are easy but reliability is a problem:
pgSQL can be configured to generate events using triggers, but they're one-time things so you can't resume incomplete events
The light at the end of the tunnel seems to be something like RabbitMQ.
RabbitMQ can persist the message in a database to survive a crash
We can use acknowledgements on both legs (incoming SMTP to RabbitMQ and RabbitMQ to outgoing SMTP) to ensure the application is reliable. Importantly, RabbitMQ supports acknowledgements.
Finally, several of the RabbitMQ clients provide full asynchronous support (see for example pika, txampq, and puka)
It's enough for our purposes that the RabbitMQ client provides us an event-based interface.
At a more theoretical level, however, this need not be the case. In fact, despite the "notification" issue above, ZeroMQ has an event-based client. Even if our software is elegantly event-based, we will run into systems that aren't. In these cases, we have no choice but to fall back on polling. In principle, if not in practice, we just query the message provider for messages. When we exhaust the current queue (and immediately if there are no messages), we use a callLater command to check again in the future. It may feel anti-pattern, but it's (to the best of my knowledge anyway) the right way to handle this particular case.

Message bus: sender must wait for acknowledgements from multiple recipients

In our application the publisher creates a message and sends it to a topic.
It then needs to wait, when all of the topic's subscribers ack the message.
It does not appear, the message bus implementations can do this automatically. So we are leaning towards making each subscriber send their own new message for the client, when they are done.
Now, the client can receive all such messages and, when it got one from each destination, do whatever clean-ups it has to do. But what if the client (sender) crashes part way through the stream of acknowledgments? To handle such a misfortune, I need to (re)implement, what the buses already implement, on the client -- save the incoming acknowledgments until I get enough of them.
I don't believe, our needs are that esoteric -- how would you handle the situation, where the sender (publisher) must wait for confirmations from multiple recipients (subscribers)? Sort of like requesting (and awaiting) Return-Receipts from each subscriber to a mailing list...
We are using RabbitMQ, if it matters. Thanks!

The functionality that you are looking for sounds like a messaging solution that can perform transactions across publishers and subscribers of a message. In The Java world, JMS specifies such transactions. One example of a JMS implementation is HornetQ.
RabbitMQ does not provide such functionality and it does for good reasons. RabbitMQ is built for being extremely robust and to perform like hell at the same time. The transactional behavior that you describe is only achievable with the cost of reasonable performance loss (especially if you want to keep outstanding robustness).
With RabbitMQ, one way to assure that a message was consumed successfully, is indeed to publish an answer message on the consumer side that is then consumed by the original publisher. This can be achieved through RabbitMQ's RPC procedure calls which might help you to get a clean solution for your problem setting.
If the (original) publisher crashes before all answers could be received, you can assume that all outstanding answers are still queued on the broker. So you would have to build your publisher in a way that it is capable to resume with processing those left messages. This might turn out to be none-trivial.
Finally, I recommend the following solution: Design your producing component in a way that you can consume the answers with one or more dedicated answer consumers that are separated from the origin publisher.
Benefits of this solution are:
the origin publisher can finish its task independent of consumer success
the origin publisher is independent of consumer availability and speed
the origin publisher implementation is far less complex
in a crash scenario, the answer consumer can resume with processing answers
Now to a more general point: One of the major benefits of messaging is the decoupling of application components by the broker. In AMQP, this is achieved with exchanges and bindings that allow you to move message distribution logic from your application to a central point of configuration.
If you add RPC-style calls to your clients, then your components are most likely closely coupled again, meaning that the publishing component fails if one of the consuming components fails / is not available / too slow. This is exactly what you will want to avoid. Otherwise, why would you have split the components then?
My recommendation is that you design your application in a way that publishers can complete their tasks independent of the success of consumers wherever possible. Back-channels should be an exceptional case and be implemented in the described not-so coupled way.

Why is pausing a queue not a broker function?

I was looking for an ActiveMQ broker admin command, to tell it to pause a queue - that is:
continue accepting messages from producing clients
cease delivering to consuming clients, allowing the queue backlog to grow until the queue is resumed, whereupon the backlog is sent to clients.
I was unable to find such a command. The commonest answer was that it should be managed at the client end -- that is, locate every consumer and stop it. Other answers were workarounds, like manipulating network routes or firewalls so that the clients and broker could no longer communicate.
A cursory survey of other message queues indicates that ActiveMQ is not unusual in this regard.
It seems to me there are two reasons this functionality might not be implemented:
It is difficult to implement -- but I can't think of any reason why.
It is counter to the design philosophy of message queues
Which is it, and why?

Being able to pause a queue is supported in the newly released ActiveMQ 5.12.0:
When the queue is "paused":
NO messages sent to the associate consumers
messages still to be enqueued on the queue
ability to be able to browse the queue
all the JMX counters for the queue to be available and correct.
...
implemented pause/resume/isPaused queue view mbean ops and attribute
when paused, there is no dispatch to regular queue consumers, send
and browse work as normal. Any inflight messages will continue inflight
till ackes as normal.
See https://issues.apache.org/jira/browse/AMQ-5229
If you have Jolokia enabled (I think it is enabled by default nowadays), you can use something like the following curl request to pause the queue:
curl --user admin:admin http://127.0.0.1:8161/api/jolokia/exec/org.apache.activemq:brokerName=localhost,destinationName=myQueue,destinationType=Queue,type=Broker/pause
(Using the default username, password and broker name and a queue called myQueue)
Replace "pause" with "resume" in order to resume the queue.

Probably not too complicated to implement - as you say.
I don't know if it's an active design decision of if there has been no demand. Other similar products such as IBM WebSphere MQ implements "get/put inhibited" on queues, so it's obviously is not totally against the philosofy of messaging - rather a tool to operate and trouble shoot live systems.
I'm a bit biased, but I actually like to decouple the sender from the receive (if the are two different systems, that might eventually get switched/upgraded/changed..).
An easy way to decouple the systems, and be able to do what you want is to make the sender send to one queue "DATA.OUT" and the receiver listen to another "DATA.IN". Then you can use Apache Camel (which is typically bundled with ActiveMQ to achieve Enterprise Integration Patterns), to route from DATA.OUT to DATA.IN.
A Camel Route is possible to start/stop via JMX, which will achieve something similar to what you described.
I guess ActiveMQ design in the matter rather have you do these kind of things in a middleware layer, such as Apache Camel, rather than direct on the queues.

Does the redis pub/sub model require persistent connections to redis?

In a web application, if I need to write an event to a queue, I would make a connection to redis to write the event.
Now if I want another backend process (say a daemon or cron job) to process the or react the the publishing of the event in redis, do I need a persistant connection?
Little confused on how this pub/sub process works in a web application.

Basically in Redis there are two different messaging models:
Fire and Forget / One to Many: Pub/Sub. At the time a message is PUBLISH-ed all the subscribers will receive it, but this message is then lost forever. If a client was not subscribed there is no way it can get it back.
Persisting Queues / One to One: Lists, possibly used with blocking commands such as BLPOP. With lists you have a producer pushing into a list, and one or many consumers waiting for elements, but one message will reach only one of the waiting clients. With lists you have persistence, and messages will wait for a client to pop them instead of disappearing. So even if no one is listening there is a backlog (as big as your available memory, or you can limit the backlog using LTRIM).
I hope this is clear. I suggest you studying the following commands to understand more about Redis and messaging semantics:
LPUSH/RPUSH, RPOP/LPOP, BRPOP/BLPOP
PUBLISH, SUBSCRIBE, PSUBSCRIBE
Doc for this commands is available at redis.io

I'm not totally sure, but I believe that yes, pub/sub requires a persistent connection.
For an alternative I would take a peek at resque and how it handles that. Instead of using pub/sub it simply adds an item to a list in redis, and then whatever daemon or cron job you have can use the lpop command to get the first one.
Sorry for only giving a pseudo answer and then a plug.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas