OpenSplice DDS subscriber does not receive data all the time - data-distribution-service

I have the following scenario when using OpenSplice DDS:
A publisher publishes instances of a DDS topic continuously. A subscriber receives the data using a NoOpDataReaderListener. My test runs both publisher and subscriber at the same time, and the on_data_available method of NoOpDataReaderListener in the subscriber prints the data content using std::cout.
My problem is that the data is not printed in every run when the test is executed repeatedly, even though the on_subscription_matched method of NoOpDataReaderListener is invoked in every run, which indicates that the publisher and subscriber matched successfully. Yet on_data_available does not always print the data content.
Even if I replace the data printing inside on_data_available with a simple std::cout << "data do come in" message, that message does not appear in every run, only in roughly 7 out of 10 runs (in 7 runs all DDS samples are seen by the subscriber; in the other 3 runs no DDS samples reach the subscriber at all). So I think this is a DDS problem rather than std::cout not being flushed. Does anyone have an idea how to diagnose a DDS communication channel problem?
Some more information, with wireshark, I have observed the following:
It looks like whether a run is successful (the subscriber actually getting data from the publisher) depends on whether the user unicast port in the publisher process sends data (RTPS DATA rather than just RTPS HEARTBEAT) to the user multicast locator (239.255.0.1::multicast_port) in the subscriber process.

Related

Cumulocity - managedObject Event - detect device first connection

Looking to understand whether there is a bulletproof event from the managedObject side of c8y that tells us the device has just connected.
I have a microservice that listens for events in real time and I want to trigger a process once we know a device has connected to send its payload.
We have used:
"c8y_Connection": {"status":"CONNECTED"}
We had the microservice log all managedObject events to Slack, and for three days we saw the "status":"CONNECTED" value in the payload of our demo devices at reporting times.
But after three days we no longer see this "CONNECTED" state (all payloads show "DISCONNECTED").
What I am trying to achieve from the inventoryObject event is to understand when a device has connected and sent its payload, so that I know when data has arrived. I then go get the data and process it externally. This is post-registration and part of the daily data-send cycle for my type of device.
What would be the best way, in a microservice, to understand when a device has sent a payload? I want to notify an external application with either "data is arriving for id 35213" or, even better, "data has arrived for device 35213, and here's the $payload".
Just some general information up front:
The c8y_Connection fragment showing CONNECTED indicates an active MQTT connection or an active long-polling connection, and it is only evaluated once every minute.
So if the client just sends data and immediately disconnects afterwards, this might not be picked up.
If you want to see whether the device has sent something to Cumulocity, the c8y_Availability fragment may be a better fit, as it holds the timestamp of when the device last sent something:
{ "lastMessage": "2022-10-11T14:49:50.201+09:00", "status": "UNAVAILABLE"}
Here too, the evaluation (or rather the update to the database) only happens once every minute.
However, both c8y_Availability and c8y_Connection are only generated if availability monitoring has been activated for the device (by defining a required interval for the device).
So if you have activated availability monitoring and you see a "lastMessage", you can reliably say that the device has already sent something to Cumulocity.
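As a rough illustration, a microservice could read the c8y_Availability fragment from the device's managed object via the inventory REST API. The sketch below is a minimal standalone example using plain java.net.http with Basic auth; the tenant URL, device id, and credentials are placeholders, and inside a real Cumulocity microservice you would more likely use the platform SDK's inventory API instead.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class AvailabilityCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder tenant URL, device id and credentials - replace with your own.
        String tenant = "https://your-tenant.cumulocity.com";
        String deviceId = "35213";
        String auth = Base64.getEncoder()
                .encodeToString("tenant/user:password".getBytes());

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(tenant + "/inventory/managedObjects/" + deviceId))
                .header("Authorization", "Basic " + auth)
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response body is the managed object; if availability monitoring is
        // enabled, its c8y_Availability fragment contains "lastMessage" and "status".
        System.out.println(response.body());
    }
}

Comparing the lastMessage timestamp against the value from your previous poll tells you whether new data has arrived since you last checked.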

Pushing messages to new subscribers

I am creating a bulk video processing system using spring-boot. The user provides all the video-related information through an xlsx sheet, and we process the videos in the backend. I am using RabbitMQ for queuing up the requests.
Let's say a user has uploaded a sheet with 100 rows; then there will be 100 messages in the RabbitMQ queue. In the backend we are auto-scaling the subscribers (servers), so we start with only one subscriber and, based on the load (the number of messages in the queue), scale up to 15 subscribers.
But our producer is very fast, and it allocates all the messages to our first subscriber (before the other subscribers come up), so our new subscribers do not get any messages from the queue.
If all the subscribers are available before the producer starts pushing messages, then the messages are distributed across all servers.
How can our new subscribers pull messages from the queue that were produced earlier?
You are probably being affected by the listener container prefetchCount property - it defaults to 250 with recent versions (since 2.0).
So the first consumer will get up to 250 messages when it starts.
It sounds like you should reduce it to a small number, even all the way down to 1 so only one message is outstanding at each consumer.
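For reference, here is a minimal sketch of lowering the prefetch in a Spring Boot application, assuming you consume with @RabbitListener via a SimpleRabbitListenerContainerFactory (the bean and class names here are illustrative):

import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RabbitConfig {

    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(
            ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        // Deliver only one unacknowledged message per consumer, so newly scaled-up
        // subscribers can pick up the remaining messages from the queue.
        factory.setPrefetchCount(1);
        return factory;
    }
}

With Spring Boot you can usually achieve the same thing by setting the spring.rabbitmq.listener.simple.prefetch=1 property in application.properties.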

Select consumers before publishing a message rabbitmq

I am trying to build a system where I need to select the next available and suitable consumer to send a message to from a queue (or maybe some other solution that does not use a queue).
Requirements
On one side we have multiple publishers/clients that send objects (images) to be processed, and on the other side multiple Analysts who process them; once an object is processed, the publisher should get the corresponding response.
The publishers do not care which Analyst is going to process the data.
Users have a web app where they can map each client/publisher to one, several, or all agents. For instance, if publisher P1 is mapped to agents A and B, all objects coming from P1 can be processed by agent A or agent B. Note: an object can be processed by one agent only.
Depending on the mapping, I need a middleware that consumes the messages from all publishers and distributes them to the agents.
Solution 1
My initial thought was to have a queue where all publishers post their messages, and another queue where agents publish a message saying they are waiting to process an object.
A middleware picks up a message, gets the list of agents it can send the message to (from a cached database), goes through the agents queue to find the next suitable and available agent, and publishes the message to that agent.
The issue with this solution: if the agents queue contains a, b, c, d and the message I receive can only be processed by agent b, I will be rejecting agents d and c, and they end up at the tail of the queue. With around 180 agents they might never be picked, or if the next message can only be processed by agent d (for example), we have to reject all the other agents to get there.
Solution 2
First bit from publishers to middleware is still the same
Have a scaled, fast NoSQL database where agents add a record to signal their availability; basically a key-value pair.
The middleware gets the config from the cache, gets the next available and suitable agent from the NoSQL database, sends the message to that agent's queue (through a direct exchange), updates the NoSQL record to set isavailable to false, and picks up the next message.
The issue with this solution is that the DB and the middleware can become a bottleneck. Also, if I scale the middleware I will run into database concurrency issues. For example, if I have two copies of the middleware running and each receives a message that can be processed by agents A and B, and both agents are available, the two middleware copies would query the DB, might both get A as available, and end up sending both messages to A while B is still waiting for a message to process.
I will have around 100 publishers and 180 agents to start with.
Any ideas on how to improve these solutions, or any other feasible solution, would be highly appreciated.
Depending on this, I also need to figure out how the agent would send the response back to the publisher.
Thank you
I'll answer this from the perspective of my open-source service bus: Shuttle.Esb.
Typically one would ignore any content-based routing and simply use a distributor pattern: all messages go to the primary endpoint, which distributes them. However, if you decide to stick to these logical groupings, you could have a primary endpoint for each logical grouping (per agent group). You would still have the primary endpoint, but instead of having worker endpoints mapped to agents, you would have agent groupings mapped to a logical primary endpoint with workers backing it.
Then in the primary endpoint you would, based on your content (being the agent identifier), forward the message to the relevant logical primary endpoint. All the while you keep track of the original sender. In the worker you would then send a message back to the queue of the original sender.
I'm sure you could do pretty much the same using any service bus.
I see several requirements in here that can be boiled down to a few things, I think:
publisher does not care which agent processes the image
publisher needs to know when the image processing is done
agent can only process 1 image at a time
agent can only process certain images
Are these assumptions correct? Did I miss anything important?
If not, then your solution is pretty much built into RabbitMQ with routing and queues. There should be no need to build a custom middle-tier service to manage this.
With RabbitMQ, you can set a consumer to only process one message at a time. The consumer sets its "prefetch" limit to 1 and retrieves a message from the queue with "no ack" set to false, meaning it must acknowledge the message when it is done processing it.
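A minimal sketch of that consumer side with the RabbitMQ Java client (the queue name and connection details are placeholders):

import com.rabbitmq.client.*;

public class OneAtATimeConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                       // placeholder broker
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        String queue = "typeA.images";                      // hypothetical queue name
        channel.queueDeclare(queue, true, false, false, null);

        channel.basicQos(1);                                // prefetch = 1

        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            // ... process the image referenced by the message body ...
            // Acknowledge only after processing, so at most one message is outstanding.
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume(queue, false /* autoAck off */, onDeliver, consumerTag -> { });
    }
}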
To consume only messages that a particular agent can handle, use RabbitMQ's routing capabilities with multiple queues. The queues would be created based on the type of image or some other criteria by which the consumers can select images.
For example, if there are two types of images: TypeA and TypeB, you would have 2 queues - one for TypeA and one for TypeB.
Then, if Agent1 can only handle TypeA images, it would only consume from the TypeA queue. If Agent2 can handle both types of images, it would consume from both queues.
To put the right images in the right queue, the publisher would need to use the right routing key. If you know the image type (or whatever the selection criteria is), you change the routing key on the publisher side to match that selection criteria. The routing in RabbitMQ would be set up to move messages for TypeA into the TypeA queue, etc.
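A sketch of that topology with the Java client, assuming a direct exchange and the two example image types (all names here are illustrative):

import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;

public class ImageTopology {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection connection = factory.newConnection()) {
            Channel channel = connection.createChannel();

            // One exchange, one queue per image type, bound by routing key.
            channel.exchangeDeclare("images", BuiltinExchangeType.DIRECT, true);
            channel.queueDeclare("typeA.images", true, false, false, null);
            channel.queueDeclare("typeB.images", true, false, false, null);
            channel.queueBind("typeA.images", "images", "TypeA");
            channel.queueBind("typeB.images", "images", "TypeB");

            // The publisher picks the routing key that matches the selection criteria,
            // so TypeA work only ever lands in the TypeA queue.
            String routingKey = "TypeA";
            channel.basicPublish("images", routingKey, null,
                    "image-reference-or-metadata".getBytes(StandardCharsets.UTF_8));
        }
    }
}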
The last part is getting a response when the image is done processing. That can be accomplished through RabbitMQ's "reply to" field and related code. The gist of it is that the publisher has its own exclusive queue. When it publishes a message, it includes the name of its exclusive queue in the "reply to" header of the message. When the agent finishes processing the image, it sends a status update message back through the queue found in the "reply to" header. That status update message tells the producer the status of the request.
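A rough sketch of the publisher side of that request/response flow with the Java client, reusing the exchange and routing key from the topology sketch above (names are placeholders); the agent would read getReplyTo() from the incoming message properties and publish its status update to that queue:

import com.rabbitmq.client.*;
import java.nio.charset.StandardCharsets;

public class PublisherWithReplyQueue {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Exclusive, auto-delete, server-named queue owned by this publisher for status replies.
        String replyQueue = channel.queueDeclare().getQueue();

        AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                .replyTo(replyQueue)
                .correlationId("request-42")   // hypothetical id to match reply to request
                .build();
        channel.basicPublish("images", "TypeA", props,
                "image-reference-or-metadata".getBytes(StandardCharsets.UTF_8));

        // Wait for the agent's status update on the reply queue.
        channel.basicConsume(replyQueue, true, (tag, delivery) ->
                System.out.println("status: " + new String(delivery.getBody(), StandardCharsets.UTF_8)),
                tag -> { });
    }
}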
From a RabbitMQ perspective, these pieces can be put together using the examples and documentation found here:
http://www.rabbitmq.com/getstarted.html
Look at these specifically:
Work Queues: http://www.rabbitmq.com/tutorials/tutorial-two-python.html
Topics: http://www.rabbitmq.com/tutorials/tutorial-five-python.html
RPC (aka Request/Response): http://www.rabbitmq.com/tutorials/tutorial-six-python.html
You'll find examples in many languages, in these docs.
I also cover most of these scenarios (and others) in my RabbitMQ Patterns eBook
Since the total number of senders and receivers is only in the hundreds, how about creating one queue for each of your senders? Based on your sender-receiver mapping, receivers subscribe to the sender queues (and update their subscriptions on mapping changes). You could configure a receiver to only receive the next message from the queues it subscribes to (picked at random) when it finishes processing one message.
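A minimal sketch of that receiver side with the Java client, assuming per-sender queue names and a mapping lookup that are both made up here; the global prefetch flag keeps the receiver to one in-flight message across all of its subscribed queues (the broker then decides which queue supplies the next delivery, rather than the random pick described above):

import com.rabbitmq.client.*;
import java.util.List;

public class PerSenderQueuesReceiver {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Channel channel = factory.newConnection().createChannel();

        // Hypothetical mapping result: the sender queues this receiver may work on.
        List<String> mappedSenderQueues = List.of("sender.P1", "sender.P7");

        // global=true shares the prefetch limit across all consumers on this channel,
        // so the receiver holds at most one unacknowledged message across all queues.
        channel.basicQos(1, true);

        DeliverCallback onDeliver = (tag, delivery) -> {
            // ... process the object ...
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        for (String q : mappedSenderQueues) {
            channel.queueDeclare(q, true, false, false, null);
            channel.basicConsume(q, false, onDeliver, tag -> { });
        }
    }
}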

"Unknown delivery tag" from RabbitMQ when ack'ing a message in a cluster with replicated queues

We've been using Rabbit successfully for about a year. We recently upgraded to v2.6.1 because we want to use clusters with replicated message queues.
My testing has hit a puzzling behavior that smells like a Rabbit bug to me. The test that uncovers this works with a two-node cluster. Both nodes are running v2.6.1. Both nodes are disk nodes. Both nodes are running on Mac OS, though I doubt this is pertinent.
I'm also running Alice on the node that runs the test. The test uses it to programmatically do a stop_app on one of the nodes, because the test is trying to validate that if the cluster master fails and a slave is promoted to take its place, we don't lose messages.
So, the test has a small thread pool, which is given tasks that periodically 1) publish messages, and 2) toggle the state of the Rabbit master node (stopped if running; started if stopped). Other threads are consuming messages from queues.
I'm using publisher confirms, and I'm also acknowledging the messages in the consumers (using autoAck=false for channel.basicConsume()).
When the master node is stopped, I see both the producers and consumers catching ShutdownSignalException. They handle this by attempting to reconnect to the cluster. This works fine. When reconnected, they continue with their business.
Sometimes, what I see is that a consumer has successfully fetched a message from the broker, and is calling channel.basicAck() when it gets that ShutdownSignalException.
Later, when the consumer has reconnected, it again pulls down the same message. (The message bodies are tagged with a UUID, so I know it is the same one.) This time, when the consumer attempts to basicAck() the message, it again gets ShutdownSignalException, but this one has the following text in it: "reply-text=PRECONDITION_FAILED - unknown delivery tag 7".
In fact, that is the same delivery tag that was offered to the consumer by the broker before the master went down and the consumer reconnected.
Googling suggests that this event means that the consumer is attempting to ack the same message more than once.
But, how can this be so? If the first ack succeeded, then the message should have been removed from the broker's queues, and the consumer shouldn't see the same message again.
Yet, if the first ack did not succeed, then the consumer shouldn't be dinged for attempting to re-ack the message.
Anyone seen this before? It smells like a bug in Rabbit's replicated queues to me, but I'm still new to Rabbit, and so am willing to believe there's a subtlety here in consuming from a clustered broker that I haven't yet grokked!
Thanks, --Steve
I'm not sure if my case matches yours, but I have seen a similar "unknown delivery tag" on attempts to ack after a reconnect, after which the same message arrived again. Initially it looked like a bug to me, but in fact this is expected behavior. A consumer with QoS > 1 may have some messages in its local buffer, and the delivery tags will be invalid for all of them after a reconnect. On the other hand, attempting to ack even the current message after a reconnect doesn't make sense, because that message was already nacked (requeued) automatically when the connection was lost, which is why I received it again.
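As a rough illustration of why this happens: delivery tags are scoped to the channel they were delivered on, so after a reconnect any tag remembered from the old channel is meaningless on the new one. A defensive consumer sketch with the Java client (queue name and structure are placeholders, not the original poster's code) is to always ack with the tag of the current delivery on the channel that delivered it, and to treat redeliveries idempotently rather than retrying a stale ack:

import com.rabbitmq.client.*;

public class ReconnectAwareConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        factory.setAutomaticRecoveryEnabled(true);   // client-side recovery after a node failure

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("work", true, false, false, null);  // hypothetical queue
        channel.basicQos(1);

        channel.basicConsume("work", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String consumerTag, Envelope envelope,
                                       AMQP.BasicProperties props, byte[] body) {
                try {
                    // envelope.isRedeliver() is true when the broker requeued the message
                    // after the previous connection/channel went away.
                    // ... process body idempotently, e.g. keyed by a UUID in the payload ...

                    // Ack with the tag from THIS delivery on THIS channel; never reuse
                    // a tag remembered from before a reconnect.
                    getChannel().basicAck(envelope.getDeliveryTag(), false);
                } catch (Exception e) {
                    // If the channel died mid-processing, the broker will redeliver;
                    // do not retry the ack on a new channel with the old tag.
                }
            }
        });
    }
}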

NServiceBus Message Replay Archive Architecture

I'm building an application that needs to preserve copies of the messages it is sending so that I can replay all the messages at a later stage. This is necessary because the processing of the messages will change dramatically over the course of development, but the data must be captured ASAP since it is realtime observation data. I can't seem to find any built-in functionality that directly addresses this and while I could write a custom tool to persist the data, that seems to contradict the purpose of using NServiceBus in the first place. Some options I'm considering:
Use the ForwardReceivedMessagesTo functionality of the Target bus to create an Archive queue, and build a simple application which uses this Archive queue as an input queue for simply forwarding messages onto the Target bus whenever the Replayer tool runs. This does clear the Archive queue, requiring it to be backed up first using the mqbkup utility, but this can be automated as part of the replay process. Alternatively, using two alternating Archive queues (one taking in new messages, and one for replaying) would solve this.
Use a publish/subscribe model and have an Archiver subscribe to the Target queue, placing the message in an Archive queue. A Replayer tool similar to the one above could use the Archive queue as an input queue and forward the messages to the Target. This would also clear the Archive queue, requiring one of the solutions above.
MassTransit people mention something called BusDriver that allows copying between queues but I can't find anything more about it.
My primary concern is to choose the approach that is least likely to lose data as once an observation is made it can never be made again outside of a narrow time window. This seems like it should be a common problem and yet I can't seem to find a straightforward solution to it. Suggestions?
Update: I've decided to go with a journalled Target queue. I'll have an Archiver use the journal as an input and store messages to a database (it could just be file-based), as well as allow for replaying messages from that database to the Target queue. While it would be possible to write a tool that copies messages from the journal queue to the target queue, the real problem - from a practical perspective - is that of managing the journal queue: it can't be backed up easily (mqbkup brings down the MSMQ service, which is unacceptable), and operating non-destructively on the queue requires me to write an MSMQ-based tool when I'd rather stick to the NServiceBus level of abstraction. Ultimately, MSMQ is a transport and not a store of messages, so it needs to be treated as such.
The first option seems viable. I would add that NSB comes with a tool named ReturnToSourceQueue.exe that will replay messages for you. You could turn on journalling to keep messages, but something would have to clean that up since it doesn't roll. We have NetApp and we've used its SnapMirror to back up queue data; I'm sure there are similar things out there.
#3 BusDriver command-line reference:
BusDriver is a command-line utility used to administer queues, service bus instances and other things related to MassTransit.
Command-Line Reference
busdriver.exe [verb] [-option:value] [--switch]
help, --help Displays help
count Counts the number of messages in the specified
queue
-uri The URI of the queue
peek Displays the body of messages in the queue without
removing the messages
-uri The URI of the queue
-count The number of messages to display
move Move messages from one queue to another
-from The URI of the source queue
-to The URI of the destination queue
-count The number of messages to move
requeue Requeue messages from one queue to another
-uri The URI of the queue
-count The number of messages to move
save Save messages from a queue to a set of files
-uri The URI of the source queue
-file The name of the file to write to (will have .1, .2
appended automatically for each message)
-count The number of messages to save
--remove If set, the messages will be removed from the queue
load Load messages from a set of files into a queue
-uri The URI of the destination queue
-file The name of the file to read from (will have .1, .2
appended automatically for each message)
-count The number of messages to load
--remove If set, the message file will be removed once the
message has been loaded
trace Request a trace of messages that have been received
by a service bus
-uri The URI of the control bus for the service bus
instance
-count The number of messages to request
status Request a status probe of the bus at the endpoint
-uri The URI of the control bus for the service bus instance
exit, quit Exit the interactive console (run without arguments
to start the interactive console)
Examples:
count -uri:msmq://localhost/mt_server
Returns the number of messages that are present in the local
MSMQ private queue named "mt_server"
peek -uri:msmq://localhost/mt_client
Displays the body of the first message present in the local
MSMQ private queue named "mt_client"
trace -uri:msmq://localhost/mt_subscriptions
Requests and displays a trace of the last 100 messages received
by the mt_subscriptions (the default queue name used by the
subscription service, which is part of the RuntimeServices)
move -from:msmq://localhost/mt_server_error -to:msmq://localhost/mt_server
Moves one message from the mt_server_error queue to the mt_server
queue (typically done to reprocess a message that was previously
moved to the error queue due to a processing error, etc.)