I was just wondering: why would you use something like RabbitMQ instead of a persistent store, especially a document store like MongoDB? Aren't they kind of the same? What's the benefit of something like RabbitMQ over a database?
Could anyone who has used something like RabbitMQ elaborate on the benefits?
RabbitMQ is message broker software, i.e. a queue, and not a NoSQL database!
While the trend is towards storing more and more data in scaled-up queues and processing data in real time, thus obliterating the need for additional data storage, queues are not to be confused with databases:
most queues don't persist data indefinitely.
the data in queues is not available on demand by the use of queries, but accessed via an automatically triggered consumer mechanism.
the architectural intention behind queues differs tremendously from that of databases. Their purpose in a system's architecture is not data storage, but system integration and data distribution. For more good information on queue architecture, please check this article from the Kafka guys.
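The difference in access pattern can be sketched with standard-library stand-ins. This is only an illustration, assuming `queue.Queue` as a rough substitute for a broker queue and an in-memory `sqlite3` database as a substitute for a data store; neither is RabbitMQ or MongoDB, and the message and table names are made up.

```python
import queue
import sqlite3

# Stand-in for a broker queue: a message is handed to a consumer once, then it is gone.
broker_queue = queue.Queue()
broker_queue.put("order-created:42")

received = broker_queue.get()      # the consumer takes the message
broker_queue.task_done()           # rough analogue of acknowledging it
queue_is_empty = broker_queue.empty()

# Stand-in for a database: a record stays put and can be queried on demand, repeatedly.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (42, "created"))

first_read = db.execute("SELECT status FROM orders WHERE id = ?", (42,)).fetchone()
second_read = db.execute("SELECT status FROM orders WHERE id = ?", (42,)).fetchone()
```

After the consumer has taken the message the queue is empty, while the same row can be read from the table as often as needed.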
We have been using the ActiveMQ 5.16.0 broker with single instances in production. Now we are planning to use a cluster of AMQ brokers for HA and load distribution, with consistent message data. Currently we are using only one queue.
HA can be achieved using failover, but do we need to use the same datastore, or can it be separate? If I use different instances for the AMQ brokers, how do I set up a common datastore?
Please guide me on how to set up the datastore for HA and load distribution.
Multiple ActiveMQ servers clustered together can provide HA in a couple of ways:
1. Scale message flow by using compute resources across multiple broker nodes
2. Maintain message flow during a planned or unplanned outage of a single broker node
3. Share the data store in the event of ActiveMQ process failure.
A network of brokers solves #1 and #2. A standard 3-node cluster will give you excellent performance and the ability to scale the number of producers and consumers, along with splitting the full flow across 3 nodes to provide increased capacity.
Solving for #3 is complicated in all messaging products. Brokers are always working to be completely empty, so clustering the data store of a single broker becomes an anti-pattern of sorts. Many times, relying on RAID disk with a single broker node will provide higher reliability than adding NFSv4, GFS2, or JDBC and using a shared store.
That being said, if you must use a shared store, follow best practices and use GFS2 or NFSv4. JDBC is much slower and requires significant DB maintenance to keep running efficiently.
Note: @Kevin Boone's note about CIFS/SMB is incorrect and CIFS/SMB should not be used. Otherwise, his responses are solid.
You can configure ActiveMQ so that instances share a message store, or so they have separate message stores. If they share a message store, then (essentially) the brokers will automatically form a master-slave configuration, such that only one broker (at a time) will accept connections from clients, and only one broker will update the store. Clients need to identify both brokers in their connection URIs, and will connect to whichever broker happens to be master.
With a shared message store like this, locks in the message store coordinate the master-slave assignment, which makes the choice of message store critical. Stores can be shared filesystems, or shared databases. Only a few shared filesystem implementations work properly -- anything based on NFS 4.x should work. CIFS/SMB stores can work, but there's so much variation between providers that it's hard to be sure. NFS v3 doesn't work, however well-implemented, because the locking semantics are inappropriate. In any case, the store needs to be robust, or replicated, or both, because the whole broker cluster depends on it. No store, no brokers.
In my experience, it's easier to get good throughput from a shared file store than a shared database although, of course, there are many factors to consider. Poor network connectivity will make it hard to get good throughput with any kind of shared store (or any kind of cluster, for that matter).
When using individual message stores, it's typical to put the brokers into some kind of mesh, with 'network connectors' to pass messages from one broker to another. Both brokers will accept connections from clients (there is no master), and the network connections will deal with the situation where messages are sent to one broker, but need to be consumed from another.
Clients don't necessarily need to specify all brokers in their connection URIs, but generally will, in case one of the brokers is down.
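In either setup, ActiveMQ clients usually list the brokers through the failover transport. Below is a small sketch of what such a URI looks like; the host names and port are placeholders, and `build_failover_uri` is a hypothetical helper written for this example, not part of any ActiveMQ client library.

```python
def build_failover_uri(brokers, randomize=False):
    """Build an ActiveMQ failover transport URI from (host, port) pairs."""
    if not brokers:
        raise ValueError("at least one broker is required")
    endpoints = ",".join(f"tcp://{host}:{port}" for host, port in brokers)
    # randomize=false makes clients try the brokers in the listed order.
    return f"failover:({endpoints})?randomize={str(randomize).lower()}"


uri = build_failover_uri([
    ("broker-a.example.com", 61616),   # placeholder hosts
    ("broker-b.example.com", 61616),
])
```

With a shared store the client ends up on whichever broker currently holds the lock; in a mesh, leaving `randomize` at its default of true spreads clients across the brokers.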
A mesh is generally easier to set up, and (broadly speaking) can handle more client load, than a master-slave with shared filestore. However, (a) losing a broker amounts to losing any messages that were associated with it (until the broker can be restored) and (b) the mesh interferes with messaging patterns like message grouping and exclusive consumers.
There's really no hard-and-fast rule to determine which configuration to use. Many installers who already have some sort of shared store infrastructure (a decent relational database, or a clustered NFS, for example) will tend to want to use it. The rise in cloud deployments has had the effect that mesh operation with no shared store has become (I think) a lot more popular, because it's so symmetric.
There's more -- a lot more -- that could be said here. As a broad question, I suspect the OP is a bit out-of-scope for SO. You'll probably get more traction if you break your question up into smaller, more focused, parts.
I am new to RabbitMQ and am wondering about the strategy for saving messages. By default, RabbitMQ keeps message queues in memory, which gives high performance. But the messages are important and should be saved to disk, because the server may go down at any time. That approach is slower.
Which approach is preferable? What is your real-world experience?
There is a whole lot regarding persistence here.
You can make queues durable and publish messages as persistent; that way messages are saved to disk. Of course, only until they are acknowledged!
You didn't say what your use case is and what you need this for, but bear in mind that RabbitMQ is not a database.
Currently I'm working on a distributed test execution and reporting system. I'm planning to use Redis PUB/SUB as a message queue and message distribution system.
I'm new to Redis, so I'm trying to read as many docs as I can and play around with it. One of the most important topics is high availability. As I said, I'm not an expert, but I'm aware of the possible options - using Sentinel, replication, clustering, etc.
What's not clear to me is how the Pub/Sub feature and the HA options relate to each other. What's the best practice for building a reliable messaging system with Redis? By reliable I mean that if my Redis message broker is down, there should be some kind of backup node (a slave?) that is able to take over this role.
Is there a purely server-side solution? Or do I need to create a smart wrapper around the Redis client to handle this? Will a Sentinel-driven setup help me?
Doing pub/sub in Redis with failover means thinking about additional factors on the client side. A key piece to understand is that subscriptions are per-connection. If you are subscribed to a channel on a node and it fails, you will need to handle reconnect and resubscribe. Because subscriptions are done at the connection level, they are not something that can be replicated.
For details on how it works and what you can expect to see, along with ways around it, see a post I made earlier this year at https://objectrocket.com/blog/how-to/reliable-pubsub-and-blocking-commands-during-redis-failovers
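The reconnect-and-resubscribe handling can be sketched roughly as follows. This is a library-agnostic outline, not a real Redis client: `connect` is a hypothetical factory you would supply, assumed to return an object with `subscribe(channel)` and `messages()`, and `ConnectionLost` is a made-up exception standing in for whatever your client raises when a node goes away.

```python
import time


class ConnectionLost(Exception):
    """Raised by a connection when the node it talks to goes away."""


def listen_with_resubscribe(connect, channels, handle,
                            max_reconnects=5, backoff_seconds=0.0):
    """Consume messages, reconnecting and resubscribing after each connection loss.

    Returns the number of reconnects once the message stream ends cleanly.
    """
    reconnects = 0
    while True:
        conn = connect()                 # may now point at the promoted node
        for channel in channels:
            conn.subscribe(channel)      # subscriptions do not survive a reconnect
        try:
            for message in conn.messages():
                handle(message)
            return reconnects            # stream ended without a failure
        except ConnectionLost:
            reconnects += 1
            if reconnects > max_reconnects:
                raise
            time.sleep(backoff_seconds)  # simple fixed pause before retrying
```

Anything published between the failure and the resubscribe is not delivered to this client, so the loop restores the subscription but not the missed messages.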
You can lower the risk surface by subscribing to slaves and publishing to the master, but you would then need to have non-promotable slaves to subscribe to and still need to handle losing a slave - there is just as much chance to lose a given slave as there is a master.
IMO, PUB/SUB is not a good choice; maybe Disque (from antirez, the author of Redis) fits better:
Disque, an in-memory, distributed job queue
Redis can be used for real-time pub-sub, just like Kafka.
I am confused about which one to use when.
Any use case would be a great help.
Redis pub-sub is mostly a fire-and-forget system: every message you produce is delivered to all consumers at once, and the data is not kept anywhere. Redis is also limited by available memory, and the number of producers and consumers can affect its performance.
Kafka, on the other hand, is a high-throughput distributed log that can be used as a queue. Any number of producers can produce, and consumers can consume at any time they want. It also provides persistence for the messages sent through the queue.
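The two delivery models described above can be contrasted with a toy sketch. Both classes are simplified in-process stand-ins invented for this example (they are not Redis or Kafka clients), and the message strings are made up.

```python
class FireAndForgetChannel:
    """Pub/sub stand-in: a message goes to current subscribers only and is not stored."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)
        return len(self.subscribers)   # how many subscribers received it


class AppendOnlyLog:
    """Log stand-in: messages are kept, and each consumer reads from its own offset."""

    def __init__(self):
        self.entries = []

    def append(self, message):
        self.entries.append(message)
        return len(self.entries) - 1   # offset of the new entry

    def read_from(self, offset):
        return self.entries[offset:]


channel = FireAndForgetChannel()
channel.publish("test-1 passed")          # nobody is listening yet: message is lost
late_subscriber = []
channel.subscribe(late_subscriber.append)
channel.publish("test-2 passed")

log = AppendOnlyLog()
log.append("test-1 passed")
log.append("test-2 passed")
late_reader = log.read_from(0)            # a late consumer can still replay everything
```

The late subscriber only ever sees the second message, while the late reader of the log gets both, which is the practical difference between the two recommendations.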
Final Take:
Use Redis:
If you want a fire-and-forget kind of system, where all the messages that you produce are delivered instantly to consumers.
If speed is the main concern.
If you can live with data loss.
If you don't want your system to hold on to messages that have been sent.
If the amount of data to be dealt with is not huge.
Use Kafka:
If you want reliability.
If you want your system to keep a copy of messages that have been sent, even after consumption.
If you can't live with data loss.
If speed is not a big concern.
If the data size is huge.
Redis 5.0+ provides the Stream data structure. It can be considered a log data structure with delivery guarantees. It offers a set of blocking operations that let consumers wait for new data added to a stream by producers, plus a concept called Consumer Groups.
Basically, the Stream structure provides capabilities similar to Kafka's.
Here is the documentation https://redis.io/topics/streams-intro
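As a rough illustration of the consumer-group flow, here is a sketch of a read-handle-acknowledge step. It assumes `client` behaves like a redis-py `redis.Redis` instance (using its `xreadgroup` and `xack` methods); the function name `consume_new_entries` and the stream, group and consumer names are hypothetical.

```python
def consume_new_entries(client, stream, group, consumer, handle, count=10):
    """Read new entries for `consumer` in `group`, acknowledging each after handling.

    `client` is assumed to be redis-py compatible and created elsewhere.
    Returns the IDs of the entries that were handled and acknowledged.
    """
    # '>' asks for entries never delivered to any consumer in this group.
    response = client.xreadgroup(group, consumer, {stream: ">"}, count=count)
    handled = []
    for _stream_name, entries in response:
        for entry_id, fields in entries:
            handle(fields)
            # Until XACK, the entry stays in the group's pending list.
            client.xack(stream, group, entry_id)
            handled.append(entry_id)
    return handled
```

In a real setup the group would have to exist first (for example via `client.xgroup_create(stream, group, id="0", mkstream=True)`), and producers would add entries with `client.xadd(stream, {...})`. Acknowledging only after `handle` returns means an entry whose handler crashed stays pending and can be picked up again.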
The two most popular Java clients that support this feature are Redisson and Jedis.
Redisson provides a ReliableTopic object if reliable delivery is required: https://github.com/redisson/redisson/wiki/6.-distributed-objects/#613-reliable-topic
Why does Redis, a datastore, have Pub/Sub features? My first thought is that it's the wrong layer to implement such a thing. But maybe I need to think outside the box.
Redis is defined as a data structure server. It provides multiple kinds of functionality, such as memcache-style caching, queues, and pub/sub. This is very useful for a cloud app or web stack, where the three components RabbitMQ (queuing) + XMPP (pub/sub) + Memcache can be replaced with Redis alone. Its queuing is not as feature-rich as RabbitMQ's, though.
That would be true if it was about feeds for end users to subscribe to. Actually it's closer to the concept of events or database triggers - a process that knows the internals of the datastore keeps a connection open and does something when a change happens.