RabbitMQ: persist last message for specific routing key

I'm in the process of restructuring my home automation and am aiming to use RabbitMQ as the central entry point for data (weather, lights, heating, ...), moving away from REST.
Messages have the following structure:
routing key: mcu.le.d8.w1.terrasse.BA1
{
"DS18B20": {
"T1": 14.75,
"T2": 14.56
},
"HYT221": {
"H": 73.2,
"T": 14.23
},
"LDR": 53,
"T": 14.66
}
or
routing key: weather.wetter.now
{
"from": "19:30",
"from-ts": 1508347800,
"local": "2017-10-18 19:59:00",
"sunrise": "07:49",
"sunrise-ts": 1508305740,
"sunset": "18:27",
"sunset-ts": 1508344020,
"temp": 14.1,
"text": "klar",
"wind-direction": 270,
"wind-speed": 2.5
}
with additional data such as where the data comes from and when it was generated.
All this data is pushed into one single topic exchange, to which clients bind with exclusive, auto-delete queues.
What I want is that when a new client connects and creates/binds a queue, the last message for each routing key is sent to it, so that it obtains an up-to-date state of every routing key it subscribes to as soon as it subscribes.
An alternative would be to create a client that subscribes to everything and inserts it into a database; new clients would query the database first to get the snapshot, and then start listening to updates from RabbitMQ.
Is there a way to do this without a database, using only RabbitMQ? I don't want to store all the messages, only the last one for each routing key.
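For reference, the subscriber side described above looks roughly like this with the RabbitMQ Java client (the exchange name "sensors" and the binding key are simplified placeholders, not my real ones):

import com.rabbitmq.client.*;

public class SensorSubscriber {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel ch = conn.createChannel();

        ch.exchangeDeclare("sensors", BuiltinExchangeType.TOPIC, true);
        // server-named, exclusive, auto-delete queue, as described above
        String queue = ch.queueDeclare().getQueue();
        ch.queueBind(queue, "sensors", "mcu.#");

        // only messages published from now on arrive; nothing replays the
        // last message per routing key, which is exactly what I'm missing
        ch.basicConsume(queue, true, (tag, delivery) ->
                System.out.println(delivery.getEnvelope().getRoutingKey()
                        + " -> " + new String(delivery.getBody())),
                tag -> { });
    }
}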

It sounds like you need the exchange to persist messages, and I do not think that is possible in Rabbit. The flow, as I understand it, is that the exchange receives a message and forwards it to all bound queues. If no queue is bound, the message is lost.
If you really want to avoid the database and don't mind a somewhat unusual implementation, you could create a backup queue per message type: before publishing, read and acknowledge the previous message from the respective backup queue, then publish the new message to it as well. When a new client comes in, it can read the backup queue without acknowledging, so the message remains there for future clients until the publisher replaces it. This is probably more complex than storing to a database, though...
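A minimal sketch of that idea with the RabbitMQ Java client (the exchange name "events" and the per-routing-key backup queue name are illustrative assumptions):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.GetResponse;
import com.rabbitmq.client.MessageProperties;

public class BackupQueueSketch {
    // ch is an open Channel; "backup.weather.wetter.now" is a durable
    // backup queue holding at most the latest message for one routing key
    static void publishWithBackup(Channel ch, byte[] body) throws Exception {
        // read and acknowledge (i.e. remove) the previous snapshot, if any
        GetResponse old = ch.basicGet("backup.weather.wetter.now", false);
        if (old != null) ch.basicAck(old.getEnvelope().getDeliveryTag(), false);
        // store the new snapshot, then publish normally
        ch.basicPublish("", "backup.weather.wetter.now",
                MessageProperties.PERSISTENT_TEXT_PLAIN, body);
        ch.basicPublish("events", "weather.wetter.now",
                MessageProperties.PERSISTENT_TEXT_PLAIN, body);
    }

    // a new client peeks without acking, so the snapshot survives for later clients
    static byte[] readSnapshot(Channel ch) throws Exception {
        GetResponse snapshot = ch.basicGet("backup.weather.wetter.now", false);
        return snapshot == null ? null : snapshot.getBody();
    }
}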
P.S. sounds like you have a cool project going on. If you plan to put it on Instructables please send me a link.

RabbitMQ does not have this concept (it was mentioned as planned over 10 years ago, and it still isn't implemented). AMPS does, though - it's called "the state of the world".
http://devnull.crankuptheamps.com/documentation/html/5.2.1.0/user-guide/html/chapters/sow.html

Related

MassTransit exchange to exchange bindings benefit

The documentation says:
Published messages are routed to a receive endpoint queue by message type, using exchanges and exchange bindings. A service's receive endpoints do not affect other services or their receive endpoints, as long as they do not share the same queue.
As I understand it, creating a ReceiveEndpoint like the one below creates one exchange and one queue with the same name (e.g. some-queue), and binds this exchange to the message type's exchange.
services.AddMassTransit(x =>
{
    x.AddConsumer<EventConsumer>();
    x.UsingRabbitMq((ctx, cfg) =>
    {
        cfg.ReceiveEndpoint("some-queue", e =>
        {
            e.ConfigureConsumer<EventConsumer>(ctx);
        });
    });
});
However, I don't see the point of the additional "some-queue" exchange. Any example use case would be helpful.
I cover the reasons for the topology in several videos, including this one.
I was looking for the answer to this question myself. For the benefit of the answer, I'm pasting some lightly edited quotes from Chris's linked video here:
MassTransit has done this since the very first versions.
If you wanted to send directly to a queue, you would have to either:
Specify a blank exchange name and set the routing key equal to the queue name.
or
Send to an exchange that's bound to the queue.
You can't send to a queue directly.
[...]
When we looked at the topology for the broker for MassTransit, the approach we took is to create an exchange with the same name as the queue. This gives us some actually really cool features:
Let's say I want to keep a copy of every message sent to my endpoint. I can do that by just creating another queue and binding it to that exchange. That lets me do a wiretap, which is actually a messaging pattern.
[...]
When you're troubleshooting and ask yourself "Why didn't the service do what I expected?": with this I can go to my wiretap queue and look at every message that was received. This is a really cool way to kind of steal traffic, look at it, and figure out what's going on.
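In plain AMQP terms, the wiretap described above is just one extra queue and one extra binding on the endpoint's exchange. A sketch with the RabbitMQ Java client, assuming MassTransit has already declared the fanout exchange and queue both named "some-queue" (the wiretap queue name is my own):

import com.rabbitmq.client.Channel;

public class WiretapSketch {
    // ch is an open Channel
    static void addWiretap(Channel ch) throws Exception {
        ch.queueDeclare("some-queue-wiretap", true, false, false, null);
        ch.queueBind("some-queue-wiretap", "some-queue", "");
        // every message routed to the endpoint exchange now also lands
        // in the wiretap queue, without disturbing the real consumer
    }
}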

Message Delivery Guarantee for Multiple Consumers in Pub/Sub and Messaging Queues

Requirement
A system undergoes a state change, and multiple other parts of the system (let's call them observers) have to know about it so they can perform actions based on the current state. The observers' actions are important: if some of the observers are not online (not currently listening due to some trouble, but back soon), the message should not be discarded until all the observers have received it.
I'm trying to accomplish this with the pub/sub model; here are my findings (please correct me if this understanding is wrong):
The publisher creates an event on a specific topic, and multiple subscribers can consume the same message. This model either provides no delivery guarantee (Redis), or guarantees delivery exactly once (messaging queues), i.e. when one of the consumers acknowledges a message, the message is discarded (RabbitMQ).
Example
A new Person Profile entity gets created in DB
Now,
A background verification service has to know this to trigger the verification process.
Subscriptions service has to know this to add default subscriptions to the user.
Both tasks are important and unrelated, and they can run in parallel.
Now, in the queue model, if the subscription service is down for some reason and the background verification process acknowledges the message, the message is removed from the queue; and if it is fire-and-forget, like most pub/sub, delivery isn't guaranteed for either service anyway.
One more point: both tasks are unrelated and need not be triggered one after the other.
In short, I need to make sure all the consumers get the same message and can acknowledge it individually; the message should be evicted only after all the consumers have acknowledged it. Neither of the above approaches does this.
Am I missing anything here? How should I approach this problem?
This scenario is explicitly supported by RabbitMQ's model, which separates "exchanges" from "queues":
A publisher always sends a message to an "exchange", which is just a stateless routing address; it doesn't need to know what queue(s) the message should end up in
A consumer always reads messages from a "queue", which contains its own copy of messages, regardless of where they originated
Multiple consumers can subscribe to the same queue, and each message will be delivered to exactly one consumer
Crucially, an exchange can route the same message to multiple queues, and each will receive a copy of the message
The key thing to understand here is that while we talk about consumers "subscribing" to a queue, the "subscription" part of a "pub-sub" setup is actually the routing from the exchange to the queue.
So a RabbitMQ pub-sub system might look like this:
A new Person Profile entity gets created in DB
This event is published as a message to an "events" topic exchange with a routing key of "entity.profile.created"
The exchange routes copies of the message to multiple queues:
A "verification_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.#"
A "subscription_setup_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.created"
The consuming scripts don't know anything about this routing; they just know that messages will appear in their queue for events that are relevant to them:
The verification service picks up the copy of the message on the "verification_service" queue, processes, and acknowledges it
The subscription setup service picks up the copy of the message on the "subscription_setup_service" queue, processes, and acknowledges it
If there are multiple consuming scripts looking at the same queue, they'll share the messages on that queue between them, but still completely independently of any other queue.
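A minimal sketch of this topology with the RabbitMQ Java client (exchange, queue, and routing-key names taken from the steps above; host and payload are placeholders):

import com.rabbitmq.client.*;

public class ProfileEvents {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection()) {
            Channel ch = conn.createChannel();
            ch.exchangeDeclare("events", BuiltinExchangeType.TOPIC, true);

            // each service gets its own queue, and thus its own copy of the message
            ch.queueDeclare("verification_service", true, false, false, null);
            ch.queueBind("verification_service", "events", "entity.profile.#");
            ch.queueDeclare("subscription_setup_service", true, false, false, null);
            ch.queueBind("subscription_setup_service", "events", "entity.profile.created");

            // one publish, two copies: one per matching queue, acked independently
            ch.basicPublish("events", "entity.profile.created",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "{\"profileId\": 42}".getBytes());
        }
    }
}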
As you mentioned, this is not something you can control with the Redis Pub/Sub data structure.
But you can do it easily with Redis Streams.
Streams allow you to post messages using the XADD command and then control which consumers are dealing with each message and acknowledge that the message has been processed.
You can look at these sample applications that provide examples (in Java) of:
posting and consuming messages
create multiple consumer groups
manage exceptions
Links:
Getting Started with Redis Streams and Java
Redis Streams in Action (a project that shows how to use ADD/ACK/PENDING/CLAIM and build an error-proof streaming application with Redis Streams and Spring Data)
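A hedged sketch of that flow, assuming the Jedis 4 client (stream, group, consumer, and field names are all my own):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.StreamEntryID;
import redis.clients.jedis.params.XAddParams;
import redis.clients.jedis.params.XReadGroupParams;
import redis.clients.jedis.resps.StreamEntry;
import java.util.List;
import java.util.Map;

public class StreamsSketch {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // XADD: publish the event to the stream
            jedis.xadd("profile-events", XAddParams.xAddParams(),
                    Map.of("event", "entity.profile.created", "profileId", "42"));

            // each service gets its own consumer group, so each group tracks
            // and acknowledges its own copy of every entry
            try {
                jedis.xgroupCreate("profile-events", "verification",
                        new StreamEntryID(), true);
            } catch (Exception alreadyExists) { /* group already created */ }

            // XREADGROUP: fetch one new entry for this consumer
            List<Map.Entry<String, List<StreamEntry>>> batch = jedis.xreadGroup(
                    "verification", "worker-1",
                    XReadGroupParams.xReadGroupParams().count(1),
                    Map.of("profile-events", StreamEntryID.UNRECEIVED_ENTRY));

            // XACK: acknowledge for this group only; other groups still see the entry
            StreamEntry entry = batch.get(0).getValue().get(0);
            jedis.xack("profile-events", "verification", entry.getID());
        }
    }
}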

RabbitMQ is showing a publish rate lower than the deliver/ack rate, without a backlog to consume. Can these statistics be accurate?

We have been using Rabbit for a while, and after a recent deploy to create a new fanout exchange, we got quite spooked by something very curious: in the real-time graphs in the management console, and in the REST API, our queues consistently show fewer publishes than delivers/acks, even though the queue size is hovering around 0!
Sample stats from REST API:
"message_stats": {
"ack": 149063323,
"ack_details": {
"rate": 305.0
},
"deliver": 149089898,
"deliver_details": {
"rate": 318.8
},
"deliver_get": 149089898,
"deliver_get_details": {
"rate": 318.8
},
"disk_reads": 4058297,
"disk_reads_details": {
"rate": 0.0
},
"disk_writes": 149084451,
"disk_writes_details": {
"rate": 227.6
},
"publish": 142374350,
"publish_details": {
"rate": 129.6
},
"redeliver": 5379,
"redeliver_details": {
"rate": 0.0
}
},
The queue has been flatlined at 0 for the time periods we're worrying about.
From what we can tell, we are not getting messages multiple times, so we are currently holding the opinion that this is a bug in the statistics collection or reporting. We have not yet had a window to reboot this node as it is a live system. We have discovered no ill-effects yet.
Gory details: our queue has 240 consumers over 12 connections, and it's being published to via a fanout exchange bound to two queues. Both of those queues are exhibiting this behavior. The publish rates of the queues match the publish rates of the exchange. Both queues have a dead-letter exchange/queue combo and require acks. The fanout exchange and one of the queues are new, and there have been some changes to how these messages are published (they used to be published to the default exchange, routed to the old queue; now we publish to the new fanout exchange to hit both the old and new queues). The consumers' code has barely changed for the old queue (one fewer insert into a Postgres table), and the new queue runs code that is extremely similar to the old queue's (except it only performs that one Postgres insert).
Are these stats possible to encounter in the wild outside of buggy conditions? Could this be caused by a bad topology? What would such a state entail, what side effects would it have, and what types of setups would create this?

What happens if a Publisher terminates before receive ack?

I want to ensure that certain kinds of messages can't be lost, hence I should use Confirms (aka Publisher Acknowledgements).
The broker loses persistent messages if it crashes before said messages are written to disk. Under certain conditions, this causes the broker to behave in surprising ways.
For instance, consider this scenario:
a client publishes a persistent message to a durable queue
a client consumes the message from the queue (noting that the message is persistent and the queue durable), but doesn't yet ack it,
the broker dies and is restarted, and
the client reconnects and starts consuming messages.
At this point, the client could reasonably assume that the message will be delivered again. This is not the case: the restart has caused the broker to lose the message. In order to guarantee persistence, a client should use confirms.
But what if, using confirms, the publisher goes down before receiving the ack, and the message wasn't delivered to the queue for some reason (e.g. a network failure)?
Suppose we have a simple REST endpoint where we can POST new COMMENTS, and when a new COMMENT is created we want to publish a message to a queue. (Note: it doesn't matter if I send a message for a new COMMENT that in the end isn't created, due to a rollback for example.)
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.MessageProperties;

class CommentEndpoint {
    Channel channel;              // confirms enabled earlier via channel.confirmSelect()
    CommentRepository repository;

    void post(String comment) throws Exception {
        // publish to the durable "comments-queue" via the default exchange
        channel.basicPublish("", "comments-queue",
                MessageProperties.PERSISTENT_TEXT_PLAIN, comment.getBytes());
        Comment aNewComment = new Comment(comment);
        repository.save(aNewComment);
        // what happens if the server where this publisher is running terminates here?
        channel.waitForConfirmsOrDie();
    }
}
When the server restarts, the channel is gone and the message may never have been delivered.
One solution that comes to mind is, after a restart, to query the recent comments (something like the comments created in the last 3 minutes before the crash?) in the repository, send one message for each one, and await confirmations.
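A rough sketch of that recovery idea (the repository query and helper names are hypothetical; the queue name and the 3-minute window come from the question above):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.MessageProperties;
import java.time.Duration;
import java.time.Instant;

public class CommentRecovery {
    // on startup, republish anything that may have been in flight when we died
    static void recoverAfterRestart(Channel ch, CommentRepository repository) throws Exception {
        ch.confirmSelect();  // enable publisher confirms on the fresh channel
        Instant cutoff = Instant.now().minus(Duration.ofMinutes(3));
        for (Comment c : repository.findCreatedAfter(cutoff)) {  // hypothetical query
            ch.basicPublish("", "comments-queue",
                    MessageProperties.PERSISTENT_TEXT_PLAIN, c.getText().getBytes());
        }
        ch.waitForConfirmsOrDie();
        // note: consumers must be idempotent, since some of these are duplicates
    }
}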
What you are worried about is really no longer a RabbitMQ-only issue; it is a distributed-transaction issue. This discussion gives one reasonable lightweight solution, and there are stricter solutions, for instance two-phase commit, three-phase commit, etc., to ensure data consistency when it is really necessary.

How to get delivery path in rabbitmq to become message property?

The underlying use case
It is a typical pub/sub use case: consider M news sources and N subscribers who subscribe to the news sources they want and receive news updates. However, we want these updates to land in MongoDB, essentially maintaining the most recent 'k' updates (which can be indexed, searched, etc.). We want to design for M scaling up to a million publishers and N scaling to a few million subscribers.
Subscribers' updates are finally received and stored across more than one host, each with its own local MongoDB.
Modeling in RabbitMQ
RabbitMQ will be used to persist the mappings (who subscribes to which news source).
I have set up a pub/sub system this way: we create publisher exchanges (each mapping to one news source) of type 'fanout'.
For modelling subscribers, there are two options.
In the first option, we have one queue for each subscriber, bound to the relevant publisher exchanges, and the client process opens connections to all these subscriber queues and receives the updates (persisting them to MongoDB). Note that in this option, when the client is restarted, it has to manage the list of all subscribers and open connections to all the subscriber queues it is responsible for.
In the second option, we want to remove the overhead of having to explicitly open each subscriber queue upon startup. Instead, we want to listen on only one queue, representative of all subscribers who will send updates to this client host.
To achieve this, we first create one exchange for each subscriber and bind it to the publisher exchange(s) it follows. We create a single queue for each client and bind the subscriber exchange to this queue (type=direct) if the subscriber belongs to that client.
Once the client receives the update message, it should know which subscriber exchange it came from; only then can we add it to MongoDB for the relevant subscriber. Presumably the subscriber exchange should add this information as a new header on the message.
As per the RabbitMQ docs, I believe there is no way to achieve this (or, more specifically, no way to get a 'delivery path' property from the delivered message, from which we could derive this information).
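For reference, the second option's wiring looks roughly like this with the RabbitMQ Java client (all names are placeholders; I use fanout subscriber exchanges here so the original routing key passes through unchanged):

import com.rabbitmq.client.BuiltinExchangeType;
import com.rabbitmq.client.Channel;

public class SubscriberWiring {
    // ch is an open Channel
    static void wireSubscriber(Channel ch) throws Exception {
        // one fanout exchange per news source (publisher)
        ch.exchangeDeclare("source.reuters", BuiltinExchangeType.FANOUT, true);
        // one exchange per subscriber, bound to the source(s) it follows
        ch.exchangeDeclare("subscriber.alice", BuiltinExchangeType.FANOUT, true);
        ch.exchangeBind("subscriber.alice", "source.reuters", "");
        // one queue per client host, fed by the subscriber exchanges it owns
        ch.queueDeclare("client.host1", true, false, false, null);
        ch.queueBind("client.host1", "subscriber.alice", "");
        // the delivered message keeps only the original routing key; nothing
        // records that it passed through "subscriber.alice", which is the problem
    }
}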
My questions:
Is it possible to add a new header to message as it passes through exchange?
If this is not possible, then can we achieve it through custom exchange and relevant plugin? Any plugin that I can readily use for this purpose?
I am curious as to why RabbitMQ does not provide a delivery-path property as an optional configuration.
Is there any other way I can achieve the same? (See pubsubhubbub note below)
PubSubHubBub
The use case is very similar to what the PubSubHubbub protocol provides for, and there is a RabbitMQ plugin for it called rabbithub. However, our system will be a closed system, and I believe the protocol's webhook approach would be too much overhead compared to listening on a single queue (also from a performance perspective).
The producer (RMQ Client) of the message should add all the required headers (including the originator's identity) before producing (publishing) it on RMQ. These headers are used for routing.
If, while in transit, the message (including headers) needs to be transformed (e.g. adding new headers), it needs to be sent to the transformer (another RMQ Client). This transformer will essentially become the new publisher.
The actual consumer should receive its intended messages (to which it has subscribed) through a single queue. The routing of all its subscribed messages should be arranged on the RabbitMQ exchange.
Managing the last 'K' updates should be the responsibility of neither the producer nor the consumer, so it should be done in the transformer. Producers' messages should be routed to this transformer (for storage) before being re-routed to the exchange(s) from which consumers consume.
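A sketch of the first point, stamping the originator's identity as a header at publish time with the RabbitMQ Java client (the header name and exchange name are my own choices):

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import java.util.Map;

public class OriginHeader {
    // ch is an open Channel; a consumer reads the header back via
    // delivery.getProperties().getHeaders().get("x-origin")
    static void publishWithOrigin(Channel ch, byte[] body) throws Exception {
        Map<String, Object> headers = Map.of("x-origin", "source.reuters");
        AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                .headers(headers)
                .build();
        ch.basicPublish("source.reuters", "", props, body);
    }
}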