Is rabbitmq writing to disk asynchronous or synchronous? - rabbitmq

Is it possible to adjust asynchronous or synchronous like kafka?
When testing the throughput of rabbitmq, I found that with message persistence on, the throughput is only 1k-2k/s, but without message persistence, it can be 2w+/s.
The disk is SSD
kafka has a lot of options to adjust the mode of writing to the hard disk, but rabbitmq has no information on this! Is there anyone who knows more about it?

Related

RabbitMQ HA with Durable features

Background
I have a RabbitMQ cluster that running for more than a year without any problems. Lastly, I found that sometimes, the CPU of the machine is touching the 100% CPU. I'm investigating ways to increase the throughput of the cluster to serve more customers.
The cluster architecture is that we have HA enabled (exactly 1 replica), and durable messages (for all the queues). As I understand it, the durable feature is the most expensive one in terms of performance. So, I trying to understand if it is needed for me.
Question
According to my experience, the cluster was running for more than a year without problems. So I assume that the chance for a problem is very low. Even after this, I want to create another layer of protection, just in case...
If I have two servers that holding the same data, but not storing it into the disk (durable OFF), is not safe enough for 99.99% of the cases? Those two servers are in different regions so the chance that both of them will go down is very low. Wondering if saving it to the disk can be helpful, or just a waste?
There is a thumb rule about the performance improvements of disabling the durable feature? In percents.
Thank you!
The influence of durable on performance
For reliable delivery, rabbitmq use the publish confirmation mechanism. Everytime the publisher publish a message to rabbitmq server, the server will respond with basic.ack rpc to ack the message. For routable messages, the basic.ack is sent when a message has been accepted by all the queues. For persistent messages routed to durable queues, this means persisting to disk. For mirrored queues, this means that all mirrors have accepted the message. So as you mentioned, the IO may become bottlenect of performance.
Is it overhead both durable and mirrored
It depends on your consideration between performance and HA. Imagine if you declare non-durable mirrored queue, and the master and slave are down, your messages will get lost. So whether overhead depends on how important message safty is.
Is the performance bottleneck mainly caused by durable?
As we discussed, if you declare non-durable queue, the throught maybe increase. But this may not be the main cause of low performance. You have said the cpu usage sometimes is 100%, which means very little I/O waitting. The high load maybe due to many connections and high throughput. In order to determine how to increase throughput, you can use benchmark tool to find the bottleneck.
pages maybe useful:
https://www.cloudamqp.com/blog/2016-01-25-identify-and-protect-against-high-cpu-and-memory-usage.html
https://www.cloudamqp.com/blog/2018-01-08-part2-rabbitmq-best-practice-for-high-performance.html

What's the difference between RabbitMQ and kafka?

Which will fair better under different scenarios?
I know that RabbitMQ is based on AMQP protocol, and has visualization for the developer.
RabbitMQ, as you noted, is an implementation of AMQP. AMQP is a messaging protocol which originated in the financial industry. Its core metaphors are Messages, Exchanges, and Queues.
Kafka was designed as a log-structured distributed datastore. It has features that make it suitable for use as part of a messaging system, but it also can accommodate other use cases, like stream processing. It originated at LinkedIn and its core metaphors are Messages, Topics, and Partitions.
Subjectively, I find RabbitMQ more pleasant to work with: its web-based management tool is nice; there is little friction in creating, deleting, and configuring queues and exchanges. Library support is good in many languages. As in its default configurations Rabbit only stores messages in memory, latencies are low.
Kafka is younger, the tooling feels more clunky, and it has had relatively poor support in non-JVM languages, though this is getting better. On the other hand, it has stronger guarantees in the face of network partitions and broker loss, and since it is designed to move messages to disk as soon as possible, it can accommodate a larger data set on typical deployments. (Rabbit can page messages to disk but sometimes it doesn't perform well).
In either, you can design for direct (one:one), fanout (one:many), and pub-sub (many:many) communication patterns.
If I were building a system that needed to buffer massive amounts of incoming data with strong durability guarantees, I'd choose Kafka for sure. If I was working with a JVM language or needed to do some stream processing over the data, that would only reinforce the choice.
If, on the other hand, I had a use case in which I valued latency over throughput and could handle loss of transient messages, I'd choose Rabbit.
Kafka:
Message will be always there. You can manage this by specifying a
message retention policy.
It is distributed event streaming platform.
We can use it as a log
Kafka streaming, you can change and process the message automatically.
We can not set message priority
Retain order only inside a partition. In a partition, Kafka guarantees that the whole batch of messages either fail or pass.
Not many mature platforms like RMQ (Scala and Java)
RabbitMQ:
RabbitMQ is a queuing system so messages get deleted just after consume.
It is distributed, by a message broker.
It can not be used like this
Streaming is not supported in RMQ
We can set the priority of the message and can consume on the basis of the same.
Does not support guarantee automaticity even in relation to transactions involving a single queue.
Mature platform ( Have the best compatibility with all languages)

Specifying RabbitMQ messaging strategy on memeory or disc

I am new at RabbitMQ am wonder something about saving message strategy. By default RabbitMQ saves message queuses on memeory. This way is high performance. But messages are important and should be save on disc. Because server may down at any time. This way shows slower performace.
Which stuation should be prefable. What is your real world experience?
There is a whole lot regarding persistance here.
You can make queues durable, in that way messages are saved to the disk. Of course only until they are acknowledged!
You didn't say what is your use case and what do you need this for, but bare in mind that RAbbitMQ is not a database.

Redis PUB/SUB and high availability

Currently I'm working on a distributed test execution and reporting system. I'm planning to use Redis PUB/SUB as a message queue and message distribution system.
I'm new to Redis, so I'm trying to read as many docs as I can and play around with it. One of the most important topics is high availability. As I said, I'm not an expert, but I'm aware of the possible options - using Sentinel, replication, clustering, etc.
What's not clear for me is how the Pub/Sub feature and the HA options are related each other. What's the best practice to build a reliable messaging system with Redis? By reliable I mean if my Redis message broker is down there should be some kind of a backup node (a slave?) that should be able to take over this role.
Is there a purely server-side solution? Or do I need to create a smart wrapper around the Redis client to handle this? Will a Sentinel-driven setup help me?
Doing pub sub in Redis with failover means thinking about additional factors in the client side. A key piece to understand is that subscriptions are per-connection. If you are subscribed to a channel on a node and it fails, you will need to handle reconnect and resubscribe. Because subscriptions are done at the connection level it is not something which can be replicated.
Regarding the details as to how it works and what you can expect to see, along with ways around it see a post I made earlier this year at https://objectrocket.com/blog/how-to/reliable-pubsub-and-blocking-commands-during-redis-failovers
You can lower the risk surface by subscribing to slaves and publishing to the master, but you would then need to have non-promotable slaves to subscribe to and still need to handle losing a slave - there is just as much chance to lose a given slave as there is a master.
IMO, PUB/SUB is not a good choice, may be disque (comes from antirez, author of the Redis) fits better:
Disque, an in-memory, distributed job queue

Difference between Redis and Kafka

Redis can be used as realtime pub-sub just as Kafka.
I am confused which one to use when.
Any use case would be a great help.
Redis pub-sub is mostly like a fire and forget system where all the messages you produced will be delivered to all the consumers at once and the data is kept nowhere. You have limitation in memory with respect to Redis. Also, the number of producers and consumers can affect the performance in Redis.
Kafka, on the other hand, is a high throughput, distributed log that can be used as a queue. Here any number of users can produce and consumers can consume at any time they want. It also provides persistence for the messages sent through the queue.
Final Take:
Use Redis:
If you want a fire and forget kind of system, where all the messages that you produce are delivered instantly to consumers.
If speed is most concerned.
If you can live up with data loss.
If you don't want your system to hold the message that has been sent.
The amount of data that is gonna be dealt with is not huge.
Use kafka:
If you want reliability.
If you want your system to have a copy of messages that has been sent even after consumption.
If you can't live up with data loss.
If Speed is not a big concern.
data size is huge
Redis 5.0+ version provides the Stream data structure. It could be considered as a log data structure with delivery guarantees. It offers a set of blocking operations allowing consumers to wait for new data added to a stream by producers, and in addition to that, a concept called Consumer Groups.
Basically Stream structure provides the same capabilities as Kafka.
Here is the documentation https://redis.io/topics/streams-intro
There are two most popular Java clients that support this feature: Redisson and Jedis
Redisson provides ReliableTopic object if reliability of delivery is required. https://github.com/redisson/redisson/wiki/6.-distributed-objects/#613-reliable-topic