RabbitMQ application stops when another node in cluster is shutdown - rabbitmq

I am new to RabbitMQ and I am having trouble handling a RabbitMQ cluster.
The topology is as follows:
At first, everything is OK. RabbitMQ node1 and RabbitMQ node2 are in a cluster.
They are interconnected by a RabbitMQ plugin called autocluster.
Then I delete pod rabbitmq-1 with kubectl delete pod rabbitmq-1, and I find that the RabbitMQ application on node1 is stopped. I don't understand why RabbitMQ stops the application when it detects another node's failure. It does not make sense to me. Is this behaviour designed by RabbitMQ or by autocluster? Can you enlighten me?
My config is like:
[
  {rabbit, [
    {tcp_listen_options, [
      {backlog, 128},
      {nodelay, true},
      {linger, {true, 0}},
      {exit_on_close, false},
      {sndbuf, 12000},
      {recbuf, 12000}
    ]},
    {loopback_users, [<<"guest">>]},
    {log_levels, [{autocluster, debug}, {connection, debug}]},
    {cluster_partition_handling, pause_minority},
    {vm_memory_high_watermark, {absolute, "3276MiB"}}
  ]},
  {rabbitmq_management, [
    {load_definitions, "/etc/rabbitmq/rabbitmq-definitions.json"}
  ]},
  {autocluster, [
    {dummy_param_without_comma, true},
    {autocluster_log_level, debug},
    {backend, etcd},
    {autocluster_failure, ignore},
    {cleanup_interval, 30},
    {cluster_cleanup, false},
    {cleanup_warn_only, false},
    {etcd_ttl, 30},
    {etcd_scheme, http},
    {etcd_host, "etcd.kube-system.svc.cluster.local"},
    {etcd_port, 2379}
  ]}
]
In my case, x-ha-policy is enabled.

You set cluster_partition_handling to pause_minority. One node out of two is not a majority, so the surviving node pauses itself, exactly as configured. You either have to add a third node or set cluster_partition_handling to ignore.
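For example, in the rabbit section of the config shown above that is a one-line change (same classic config syntax as in the question):
{cluster_partition_handling, ignore}
Keep in mind that ignore favours availability: during a real network partition both nodes keep running and their state may diverge.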
From the docs:
In pause-minority mode RabbitMQ will automatically pause cluster nodes
which determine themselves to be in a minority (i.e. fewer or equal
than half the total number of nodes) after seeing other nodes go down.
It therefore chooses partition tolerance over availability from the
CAP theorem. This ensures that in the event of a network partition, at
most the nodes in a single partition will continue to run. The
minority nodes will pause as soon as a partition starts, and will
start again when the partition ends.

Merge two message threads into one

I have two message threads that together consist of ten messages, and I need a query that displays these two chains as one.
The combined thread must consist of ten different messages: five messages from one system and five from another (backup) system. Messages from a given system share the same srcMsgId value, and each system has a unique srcMsgId within the same chain. The message chain from the backup system arrives in Splunk immediately after the messages from the main system. Messages from the standby system also carry a Mainsys_srcMsgId value, which is identical to the main system's srcMsgId. How can I display a chain of all ten messages - for example, first the messages from the first (main) system, then those from the second (backup) one, showing the time of arrival at the server?
Specifically, we want to see all ten messages one after the other, in the order in which they arrived at the server: five messages from the primary, for example ("srcMsgId": "rwfsdfsfqwe121432gsgsfgd71"), and five from the backup ("srcMsgId": "rwfsdfsfqwe121432gsgsfgd72"). The problem is that messages from other systems also arrive at the server and everything is mixed together chaotically, which is why we want to group all messages from one system and its counterpart in the search. Messages from the backup system are associated with the main system only by the "Mainsys_srcMsgId" parameter - using this key we understand that a message comes from the backup system (secondary to the main one).
Examples of messages from the primary and secondary system:
Main system:
{
"event": "Sourcetype test please",
"sourcetype": "testsystem-2",
"host": "some-host-123",
"fields":
{
"messageId": "ED280816-E404-444A-A2D9-FFD2D171F32",
"srcMsgId": "rwfsdfsfqwe121432gsgsfgd71",
"Mainsys_srcMsgId": "",
"baseSystemId": "abc1",
"routeInstanceId": "abc2",
"routepointID": "abc3",
"eventTime": "1985-04-12T23:20:50Z",
"messageType": "abc4",
.....................................
Message from backup system:
{
"event": "Sourcetype test please",
"sourcetype": "testsystem-2",
"host": "some-host-123",
"fields":
{
"messageId": "ED280816-E404-444A-A2D9-FFD2D171F23",
"srcMsgId": "rwfsdfsfqwe121432gsgsfgd72",
"Mainsys_srcMsgId": "rwfsdfsfqwe121432gsgsfgd71",
"baseSystemId": "abc1",
"routeInstanceId": "abc2",
"routepointID": "abc3",
"eventTime": "1985-04-12T23:20:50Z",
"messageType": "abc4",
"GISGMPRequestID": "PS000BA780816-E404-444A-A2D9-FFD2D1712345",
"GISGMPResponseID": "PS000BA780816-E404-444B-A2D9-FFD2D1712345",
"resultcode": "abc7",
"resultdesc": "abc8"
}
}
When we want to combine in one query just the five messages of a single chain, related by "srcMsgId", we make the following request:
index="bl_logging" sourcetype="testsystem-2"
| transaction maxpause=5m srcMsgId Mainsys_srcMsgId messageId
| table _time srcMsgId Mainsys_srcMsgId messageId duration eventcount
| sort srcMsgId _time
| streamstats current=f window=1 values(_time) as prevTime by srcMsgId
| eval timeDiff=_time-prevTime
| delta _time as timediff
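One way to get all ten messages of a chain into a single ordered list is to first normalize the chain key: for backup messages use Mainsys_srcMsgId, otherwise srcMsgId. A rough sketch (the index, sourcetype and field names are taken from the examples above; the concrete chain id is only an illustration):
index="bl_logging" sourcetype="testsystem-2"
| eval chainId=if(isnotnull(Mainsys_srcMsgId) AND Mainsys_srcMsgId!="", Mainsys_srcMsgId, srcMsgId)
| search chainId="rwfsdfsfqwe121432gsgsfgd71"
| sort 0 _time
| table _time host srcMsgId Mainsys_srcMsgId messageId eventTime
With chainId computed this way, the five primary and five backup messages share one value and can be listed together in arrival order.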

How frequently are the Azure Storage Queue metrics updated?

I observed that it took about 6 hours from the time of setting up Diagnostics (the newer offering, still in preview) for the Queue Message Count metric to move from 0 to the actual total number of messages in the queue. The other capacity metrics, Queue Capacity and Queue Count, took about 1 hour to reflect actual values.
Can anyone shed light on how these metrics are updated? It would be good to know how to predict the accuracy of the graphs.
I am concerned because, if the latency of these metrics is typically this large, an alert based on queue metrics could take too long to fire.
Update:
Platform metrics are created by Azure resources and give you visibility into their health and performance. Each type of resource creates a distinct set of metrics without any configuration required. Platform metrics are collected from Azure resources at one-minute frequency unless specified otherwise in the metric's definition.
And 'Queue Message Count' is a platform metric, so it should be updated every minute.
But it isn't. And this is not a problem that occurs only in the portal: even if you use the REST API to get the QueueMessageCount, it is still not updated after 1 minute:
https://management.azure.com/subscriptions/xxx-xxx-xxx-xxx-xxx/resourceGroups/0730BowmanWindow/providers/Microsoft.Storage/storageAccounts/0730bowmanwindow/queueServices/default/providers/microsoft.insights/metrics?interval=PT1H&metricnames=QueueMessageCount&aggregation=Average&top=100&orderby=Average&api-version=2018-01-01&metricnamespace=Microsoft.Storage/storageAccounts/queueServices
{
  "cost": 59,
  "timespan": "2021-05-17T08:57:56Z/2021-05-17T09:57:56Z",
  "interval": "PT1H",
  "value": [
    {
      "id": "/subscriptions/xxx-xxx-xxx-xxx-xxx/resourceGroups/0730BowmanWindow/providers/Microsoft.Storage/storageAccounts/0730bowmanwindow/queueServices/default/providers/Microsoft.Insights/metrics/QueueMessageCount",
      "type": "Microsoft.Insights/metrics",
      "name": {
        "value": "QueueMessageCount",
        "localizedValue": "Queue Message Count"
      },
      "displayDescription": "The number of unexpired queue messages in the storage account.",
      "unit": "Count",
      "timeseries": [
        {
          "metadatavalues": [],
          "data": [
            {
              "timeStamp": "2021-05-17T08:57:00Z",
              "average": 1.0
            }
          ]
        }
      ],
      "errorCode": "Success"
    }
  ],
  "namespace": "Microsoft.Storage/storageAccounts/queueServices",
  "resourceregion": "centralus"
}
This may be an issue that needs to be reported to the Azure team. It is so slow that it almost loses its practicality; I think sending an alert based on this metric is a bad idea (it is simply too slow).
Maybe you can design your own logic in code to check the QueueMessageCount.
Just a sample (C#):
1. Get the queues, then collect all of the queue names.
2. Get the properties of each queue, which include its message count.
3. Sum the obtained numbers.
4. Send a custom alert.
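A minimal sketch of those steps, assuming the Azure.Storage.Queues v12 SDK, a connection string in an environment variable, and a placeholder threshold/alert action:
using System;
using Azure.Storage.Queues;

class QueueBacklogAlert
{
    static void Main()
    {
        // Assumption: the storage connection string is supplied via an environment variable.
        string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
        var serviceClient = new QueueServiceClient(connectionString);

        long total = 0;

        // Steps 1 and 2: list every queue and read its approximate message count.
        foreach (var queue in serviceClient.GetQueues())
        {
            var queueClient = serviceClient.GetQueueClient(queue.Name);
            total += queueClient.GetProperties().Value.ApproximateMessagesCount;
        }

        // Step 3: 'total' now holds the sum across all queues.
        Console.WriteLine($"Total messages across all queues: {total}");

        // Step 4: hypothetical alert logic - replace with an email, webhook, etc.
        const long threshold = 1000;
        if (total > threshold)
        {
            Console.WriteLine("ALERT: queue backlog above threshold");
        }
    }
}
Note that ApproximateMessagesCount is, as the name says, approximate, but it is available immediately rather than hours later.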
Original Answer:
At first, after I sent a message to one queue in the queue storage, the 'Queue Message Count' also remained stubbornly at zero on my side, but a few hours later it showed the correct 'Queue Message Count'.
I thought it was a bug, but it seems to work well now.

RabbitMQ File Size-based Log Rotation Default Size

I'm new to RabbitMQ and I have an application that uses RabbitMQ as the message broker. Up until now, I've been using the default settings - no log rotation. I wanted to use the log rotation feature, so I set it using:
{log, [
{file, [{file, "MyAppLogs.log"},
{level, info},
{date, "$D0"},
{size, 1073741824},
{count, 30}
]}
]}
Of course, testing would take a while with a 1 GB file size, so for testing purposes I changed it to 1024 instead. I expected the log to rotate when it reached 1 KB, but it did not. I've noticed that the log files only rotate once the file size reaches 5 KB.
So my question is - is the minimum log file size for RabbitMQ file-based log rotation 5 KB?
I've looked around the web - especially the RabbitMQ documentation: https://www.rabbitmq.com/logging.html - but there's no mention of any minimum size.
Here are the test settings that I've used:
Test Settings:
{log, [
  {file, [{file, "rabbit.log"},
          {level, info},
          {date, "$D0"},
          {size, 1024},
          {count, 3}
         ]}
]}
https://groups.google.com/d/topic/rabbitmq-users/wJGMVGB1cAk/discussion
Hi Renya,
Please always let us know what version of RabbitMQ and Erlang you are using. I can tell you're using Windows - what version?
Log rotation is not necessarily precise due to when it happens in the logging process, as well as buffering.
Thanks -
Luke
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
This requires RabbitMQ 3.7 or newer. Put the log rotation logic in your configuration file like below (classic Erlang-term syntax; the comments show the equivalent new-style rabbitmq.conf keys):
{log, [
{file, [{file, "/var/log/rabbitmq/rabbitmq.log"}, %% log.file
{level, info}, %% log.file.info
{date, "$D0"}, %% log.file.rotation.date
{size, 1024}, %% log.file.rotation.size
{count, 15} %% log.file.rotation.count
]}
]},
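For comparison, the same rotation settings expressed in the new-style rabbitmq.conf format would look roughly like this (the file path is just an example):
log.file = /var/log/rabbitmq/rabbitmq.log
log.file.level = info
log.file.rotation.date = $D0
log.file.rotation.size = 1024
log.file.rotation.count = 15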

RabbitMQ Queues HA and Dead Letter Exchanges Not Working

I have 3 nodes (A, B, C) in my cluster. I want to configure queue high availability using the ha-nodes option, with nodes A and C as the parameters. I successfully configured the HA policy and it works. But after I apply a DLX policy to all queues, the HA policy no longer works.
Is that normal, or am I missing something here?
I want to use the HA policy and the DLX policy together, but right now it seems impossible. Thanks.
Only one policy is applied at a time to a given queue or exchange:
http://www.rabbitmq.com/parameters.html#policies
But you can still configure HA and dead-lettering together: you just need to do it in one policy. Here is an example definition:
{
  "ha-mode": "nodes",
  "ha-params": ["A", "C"],
  "dead-letter-exchange": "my-dlx"
}
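For example, applied with rabbitmqctl (the policy name and queue pattern here are just placeholders):
rabbitmqctl set_policy ha-dlx "^my-queues\." \
  '{"ha-mode":"nodes","ha-params":["A","C"],"dead-letter-exchange":"my-dlx"}' \
  --apply-to queues
Because this is a single policy, both the mirroring settings and the dead-letter exchange apply to every queue the pattern matches.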

Persistence and Durability Concepts Confusion in AMQP

Being a bit confused about these two terms, I'm wondering: what is the purpose of having a persistent message in a transient (non-durable) queue?
After all, if the broker restarts and the queues are not restored, the recovered messages are wasted anyway.
You can have a durable queue but "mortal" (transient) messages, so after the broker restarts you still have the queue, but it will be empty - and vice versa; as you said, yes, you'll lose all messages in the queue.
In the combination you described, the message persistence option is effectively useless, but it causes no error.
But if you bind an alternate exchange to the exchange you are publishing messages to, and that alternate exchange is durable with durable queues, then after a restart messages can be routed to it, since the transient queues are no longer declared.
Example:
Assume we have such a combination with properly bound queues, where Q*1 receives messages M*1 and Q*2 receives M*2.
[ Exchange-main/durable ] + [Exchange-alternate/durable]
[Qm1/transient][Qm2/transient] [Qax1/durable][Qax2/durable]
Let's publish messages [Mt1/transient] and [Md1/durable]; we'll get this situation:
[ Exchange-main/durable ] + [Exchange-alternate/durable]
[Qm1/transient][Qm2/transient] [Qax1/durable][Qax2/durable]
[Mt1/transient]
[Md1/durable]
After a restart we'll get:
[ Exchange-main/durable ] + [Exchange-alternate/durable]
[Qax1/durable][Qax2/durable]
Let's publish two messages again, [Mt1/transient] and [Md1/durable]:
[ Exchange-main/durable ] + [Exchange-alternate/durable]
[Qax1/durable][Qax2/durable]
[Mt1/transient]
[Md1/durable]
So, restart the broker again:
[ Exchange-main/durable ] + [Exchange-alternate/durable]
[Qax1/durable][Qax2/durable]
[Md1/durable]
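To make the two terms concrete, here is a small sketch using the RabbitMQ .NET client (v6): it declares a durable queue, whose definition survives a broker restart, and publishes a persistent message, which is written to disk but only survives a restart because the queue holding it is durable. The queue name and host are placeholders:
using System.Text;
using RabbitMQ.Client;

class DurabilityDemo
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();

        // Durable queue: its definition survives a broker restart.
        channel.QueueDeclare(queue: "orders", durable: true, exclusive: false,
                             autoDelete: false, arguments: null);

        // Persistent message (delivery_mode = 2): kept across a restart
        // only because the queue above is durable.
        var props = channel.CreateBasicProperties();
        props.Persistent = true;

        channel.BasicPublish(exchange: "", routingKey: "orders",
                             basicProperties: props,
                             body: Encoding.UTF8.GetBytes("hello"));
    }
}
Declaring the queue with durable: false while keeping props.Persistent = true reproduces the combination asked about: the publish succeeds, but the message disappears with the queue on restart.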