The Scenario:
We have multiple geographically distributed nodes, each with queues collecting messages for that location. We then want to send the data collected in every queue on every node to corresponding queues in a central location. On the central node, we will pull out the data collected in those queues (from the other nodes), process it, and store it persistently.
Constraints:
Data is very important to us, so we have to make sure that we are not losing data under any circumstances.
Therefore, we need persistent queues on every node so that even if the node goes down for some random reason, when we bring it up we have the collected data safe with us and we can send it to the central node where it can be processed.
Similarly, if the central node goes down, the data must remain at all the other nodes so that when the central node comes up we can send all the data to the central node for processing.
Also, the data on the central node must not get duplicated or stored twice. That is, data collected on one of the nodes should be stored on the central node only once.
The data that we are collecting is very important to us and the order of data delivery to the central node is not an issue.
Our Solution
We have considered a couple of solutions, of which I am going to describe the one that we thought would be the best. A possible solution (in our opinion) is to use Redis to maintain queues everywhere, because Redis provides persistent storage. Then we could have a daemon running on each of the geographically separated nodes which reads the data from the queue and sends it to the central node. The central node, on receiving the data, sends an ACK to the node it received the data from (because the data is very important to us), and on receiving the ACK, the node deletes the data from its queue. Of course, there will be a timeout period within which the ACK must be received.
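For illustration, here is a minimal sketch of that forwarder daemon using the Redis "reliable queue" pattern (moving items into a processing list before sending), assuming the redis-py client; send_to_central() and wait_for_ack() are hypothetical placeholders for the network call and the ACK check:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

QUEUE = "site:events"                  # queue filled by local collectors
PROCESSING = "site:events:processing"  # holds items that are currently in flight

def forward_one():
    # Atomically move one item into the processing list so it is never lost
    item = r.brpoplpush(QUEUE, PROCESSING, timeout=5)
    if item is None:
        return                              # queue was empty
    send_to_central(item)                   # hypothetical network call
    if wait_for_ack(item, timeout=30):      # hypothetical ACK from the central node
        r.lrem(PROCESSING, 1, item)         # delete only after the ACK arrives
    # If no ACK arrives, the item stays in PROCESSING and can be retried
    # (or pushed back onto QUEUE) after the daemon restarts.
```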
The Problem
The above solution (in our view) would work fine, but the issue is that we don't want to implement the whole synchronization protocol ourselves, for the simple reason that we might get it wrong. We were unable to find this particular kind of synchronization in Redis, so we are open to other message queues like RabbitMQ, ZeroMQ, etc. Again, we were not able to figure out whether we can do this with those solutions.
Do these Message Queues or any other data store provide features that can be the solution to our problem? If yes, then how?
If not, then is our solution good enough?
Can anyone suggest a better solution?
Can there be a better way to do this?
What would be the best way to make it fail safe?
"The data that we are collecting is very important to us and the order of data delivery to the central node is not an issue."
You could do this with RabbitMQ by setting up the central node (or cluster of nodes) to be a consumer of messages from the other nodes, and using the message acknowledgement feature. This feature means that the central node(s) can ack delivery, so that other nodes only delete messages after the ack. See for example: http://www.rabbitmq.com/tutorials/tutorial-two-python.html
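As a rough sketch of what that looks like on the central node with the Python client (pika), assuming a queue named site_data and a hypothetical store_persistently() function:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("central-node-host"))
ch = conn.channel()
ch.queue_declare(queue="site_data", durable=True)

def handle(channel, method, properties, body):
    store_persistently(body)                              # hypothetical: write to the permanent store
    channel.basic_ack(delivery_tag=method.delivery_tag)   # ack only after the data is safe

ch.basic_consume(queue="site_data", on_message_callback=handle)
ch.start_consuming()
```

Because the ack is sent only after the message is stored, RabbitMQ will redeliver any message whose consumer died before acking it.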
If you have further questions please email the mailing list rabbitmq-discuss.
Related
I have a question about a tricky situation in an event-driven system that I would like advice on. Here is the situation:
In our system, I use Redis as an in-memory cache database and Kafka as the message queue. To increase the performance of Redis, I use Lua scripting to process data and, at the same time, push events onto a Redis list that acts as a blocking queue. A separate process then picks up events from that list and moves them to Kafka. This process has 3 steps:
1) Read events from redis list
2) Produce in batch into kafka
3) Delete corresponding events in redis
Unfortunately, if the process dies between steps 2 and 3, meaning that after producing all events into Kafka it fails to delete the corresponding events from Redis, then after the process is restarted it will produce duplicate events into Kafka, which is unacceptable. Does anyone have a solution to this problem? Thanks in advance, I really appreciate it.
Kafka is prone to reprocess events, even if written exactly once. Reprocessing will almost certainly be caused by rebalancing clients. Rebalancing might be triggered by:
Modification of partitions on a topic.
Redeployment of servers and subsequent temporary unavailability of clients.
Slow message consumption and subsequent recreation of client by the broker.
In other words, if you need to be sure that messages are processed exactly once, you need to ensure that at the client. You could do so by setting a partition key that ensures related messages are consumed sequentially by the same client. That client could then maintain a database record of what it has already processed.
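One way to keep that record, sketched here with the kafka-python and redis-py clients; the topic name, the processed_event_ids set, and the extract_event_id()/process() helpers are hypothetical:

```python
import redis
from kafka import KafkaConsumer

r = redis.Redis()
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="kafka:9092",
    group_id="event-processor",
    enable_auto_commit=False,
)

for msg in consumer:
    event_id = extract_event_id(msg.value)       # hypothetical: a stable ID carried in each event
    if r.sadd("processed_event_ids", event_id):  # returns 1 only the first time an ID is seen
        process(msg.value)                       # hypothetical business logic
    consumer.commit()                            # commit the offset whether or not it was a duplicate
```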
I have a cluster of backend servers on GCP, and they need to send messages to each other. All the servers need to receive every message, but I can tolerate a low error rate. I can deal with receiving the message more than once on a given server. Packet ordering doesn't matter.
I don't need much of a persistence layer. A message becomes stale within a couple of seconds after sending it.
I wired up Google Cloud Pub/Sub and pretty quickly realized that for a given subscription, you can have any number of subscribers, but only one of them will get any given message. I considered making the subscribers all fail to ack it, but that seems like a gross hack that probably won't work well.
My server cluster is sized dynamically by an autoscaler. It spins up VM instances as needed, with dynamic hostnames and IP addresses. There is no convenient way to map the dynamic hosts to static subscriptions, but it feels like that's my only real option: Create more subscriptions than my max server pool size, and then use some sort of paxos system (runtime config, zookeeper, whatever) to allocate servers to subscriptions.
I'm starting to feel that even though my use case feels really simple ("Every server can multicast a message to every other server in my group"), it may not be a good fit for Cloud PubSub.
Should I be using GCM/FCM? Or some other technology?
Cloud Pub/Sub may or may not be a fit for you, depending on the size of your server cluster. Failing to ack the messages certainly won't work because you can't be sure each instance will get the message; it could just be redelivered to the same instance over and over again.
You could use multiple subscriptions and have each instance create a new subscription when it starts up. This only works if you don't plan to scale beyond 10,000 instances in your cluster, as that is the maximum number of subscriptions per topic allowed. The difficulty here is in cleaning up subscriptions for instances that go down. Ones that cleanly shut down could probably delete their own subscriptions, but there will always be some that don't get cleaned up. You'd need some kind of external process that can determine if the instance for each subscription is still up and running and if not, delete the subscription. You could use GCE shutdown scripts to catch this most of the time, though there will still be edge cases where deletes would have to be done manually.
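A rough sketch of the per-instance subscription approach with the google-cloud-pubsub Python client; the project/topic names and the handle() function are placeholders, and the exact create/delete signatures vary a little between library versions:

```python
import socket
from google.cloud import pubsub_v1

PROJECT, TOPIC = "my-project", "cluster-broadcast"   # placeholders

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path(PROJECT, TOPIC)
sub_path = subscriber.subscription_path(PROJECT, f"bcast-{socket.gethostname()}")

# Each instance creates its own subscription at startup, so every instance
# receives every message published to the topic.
subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})

def callback(message):
    handle(message.data)   # hypothetical per-message handler
    message.ack()

future = subscriber.subscribe(sub_path, callback=callback)

# On a clean shutdown (e.g. from a GCE shutdown script), drop the subscription:
# subscriber.delete_subscription(request={"subscription": sub_path})
```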
I've found different zookeeper definitions across multiple resources. Maybe some of them are taken out of context, but look at them pls:
A canonical example of Zookeeper usage is distributed-memory computation...
ZooKeeper is an open source Apache™ project that provides a centralized infrastructure and services that enable synchronization across a cluster.
Apache ZooKeeper is an open source file application program interface (API) that allows distributed processes in large systems to synchronize with each other so that all clients making requests receive consistent data.
I've worked with Redis and Hazelcast, so it would be easier for me to understand ZooKeeper by comparing it with them.
Could you please compare Zookeeper with in-memory-data-grids and Redis?
If it is distributed-memory computation, how does ZooKeeper differ from in-memory data grids?
If it is synchronization across a cluster, then how does it differ from all the other in-memory stores? The same in-memory data grids also provide cluster-wide locks, and Redis also has some kind of transactions.
If it's only about consistent in-memory data, then there are other alternatives. IMDGs allow you to achieve the same thing, don't they?
https://zookeeper.apache.org/doc/current/zookeeperOver.html
By default, ZooKeeper replicates all your data to every node and lets clients watch the data for changes. Changes are sent very quickly (within a bounded amount of time) to clients. You can also create "ephemeral nodes", which are deleted within a specified time if a client disconnects. ZooKeeper is highly optimized for reads, while writes are much slower (since they have to go through the leader and be agreed by a quorum of servers before they complete). Finally, the maximum size of a "file" (znode) in ZooKeeper is 1MB, but typically they'll be single strings.
Taken together, this means that ZooKeeper is not meant to store much data, and definitely not to be a cache. Instead, it's for managing heartbeats / knowing what servers are online, storing/updating configuration, and possibly message passing (though if you have large numbers of messages or high throughput demands, something like RabbitMQ will be much better for that task).
Basically, ZooKeeper (and Curator, which is built on it) helps in handling the mechanics of clustering -- heartbeats, distributing updates/configuration, distributed locks, etc.
It's not really comparable to Redis, but for the specific questions...
It doesn't support any computation, and for most data sets it won't be able to store the data with acceptable performance.
It's replicated to all nodes in the cluster (there's nothing like Redis clustering where the data can be distributed). All messages are processed atomically in full and are sequenced, so there are no real transactions. It can be USED to implement cluster-wide locks for your services (it's very good at that, in fact), and there are a lot of locking primitives on the znodes themselves to control which nodes access them.
Sure, but ZooKeeper fills a niche. It's a tool for making distributed applications play nicely across multiple instances, not for storing/sharing large amounts of data. Compared to using an IMDG for this purpose, ZooKeeper will be faster, manages heartbeats and synchronization in a predictable way (with a lot of APIs for making this part easy), and has a "push" paradigm instead of "pull", so nodes are notified very quickly of changes.
The quotation from the linked question...
A canonical example of Zookeeper usage is distributed-memory computation
... is, IMO, a bit misleading. You would use it to orchestrate the computation, not provide the data. For example, let's say you had to process rows 1-100 of a table. You might put 10 ZK nodes up, with names like "1-10", "11-20", "21-30", etc. Client applications would be notified of this change automatically by ZK, and the first one would grab "1-10" and set an ephemeral node clients/192.168.77.66/processing/rows_1_10
The next application would see this and go for the next group to process. The actual data to compute would be stored elsewhere (i.e., Redis, a SQL database, etc.). If the node failed partway through the computation, another node could see this (after 30-60 seconds) and pick up the job again.
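To make the claim step concrete, here is a minimal sketch with the kazoo Python client, assuming a hypothetical ensemble address and reusing the znode path from the example above:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")   # placeholder ensemble address
zk.start()

CLAIM = "/clients/192.168.77.66/processing/rows_1_10"

# Claim the work unit: ephemeral, so it disappears if this client's session dies.
zk.create(CLAIM, b"", ephemeral=True, makepath=True)

# Other clients can watch the claim; if the owner dies, they are notified and can take over.
@zk.DataWatch(CLAIM)
def on_change(data, stat):
    if stat is None:   # the znode was deleted (the owner's session expired)
        pass           # hypothetical: re-claim the znode and resume processing rows 1-10
```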
I'd say the canonical example of ZooKeeper is leader election, though. Let's say you have 3 nodes -- one is master and the other 2 are slaves. If the master goes down, a slave node must become the new leader. This type of thing is perfect for ZK.
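For a concrete feel of that, here is a minimal leader-election sketch using the kazoo Python client; the hosts string, path, and do_master_work() function are placeholders:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")   # placeholder ensemble
zk.start()

def do_master_work():
    # Runs only while this process holds leadership; returning releases it.
    ...

# Each candidate creates an ephemeral sequential znode under this path;
# the lowest sequence number becomes the leader, the others wait their turn.
election = zk.Election("/myapp/leader", identifier="node-1")
election.run(do_master_work)   # blocks until elected, then calls do_master_work()
```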
Consistency Guarantees
ZooKeeper is a high performance, scalable service. Both read and write operations are designed to be fast, though reads are faster than writes. The reason for this is that in the case of reads, ZooKeeper can serve older data, which in turn is due to ZooKeeper's consistency guarantees:
Sequential Consistency
Updates from a client will be applied in the order that they were sent.
Atomicity
Updates either succeed or fail -- there are no partial results.
Single System Image
A client will see the same view of the service regardless of the server that it connects to.
Reliability
Once an update has been applied, it will persist from that time forward until a client overwrites the update. This guarantee has two corollaries:
If a client gets a successful return code, the update will have been applied. On some failures (communication errors, timeouts, etc.) the client will not know whether the update has been applied or not. We take steps to minimize such failures, but the guarantee is only present with successful return codes. (This is called the monotonicity condition in Paxos.)
Any updates that are seen by the client, through a read request or successful update, will never be rolled back when recovering from server failures.
Timeliness
The client's view of the system is guaranteed to be up-to-date within a certain time bound (on the order of tens of seconds). Either system changes will be seen by a client within this bound, or the client will detect a service outage.
I'm using 6 servers to make a cluster, and they are all disk nodes. I use RabbitMQ to collect log files for our website. At peak hours, the publish rate is about 30k messages per second. There are 2 main consumers (HDFS and Elasticsearch) and each one needs to handle all messages, so the delivery rate hits about 60k per second.
In my scenario, a single server can handle a delivery rate of about 10k per second, so I use 6 nodes to spread the load. My solution is that I created 2 queues on each node, and each message is published with a random routing key (something like message.0, message.1, etc.) to distribute the pressure across every node.
What confused me is:
All messages are sent to one node. Should I use HAProxy to load balance this publishing pressure?
Is there any performance difference between Durable Queues and Transient Queues?
Is there any performance difference between Memory Node and Disk Node? As far as I know, the only difference between a memory node and a disk node concerns metadata such as the queue configuration.
How can I improve the performance of the publishing and delivery code? I've researched and I know several methods:
disable the confirm mechanism (in the publishing code?)
enable HiPE (I've done that and it helped a lot)
For example, if the input is 10k mps (messages per second) and there are two consumers that each consume all messages, then the output is 20k mps. If my server can handle 10k mps, I need two servers to handle the 20k-mps load. Now a new consumer needs to consume all messages too, so the output hits 30k mps and I need one more server. In conclusion: one more consumer of all messages means one more server?
"All message send to one node. Should I use a HA Proxy to load balance this publish pressure?"
This article outlines a number of designs aimed at distributing load in RabbitMQ.
"Is there any performance difference between Durable Queues and Transient Queues?"
Yes, Durable Queues are backed up to disk so that they can be reinstated on server-restart, for example. This adds a nominal overhead, though the actual process occurs asynchronously.
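For reference, this is roughly what the two variants look like with the pika client; the queue name is a placeholder. Declaring the queue durable and marking messages persistent (delivery_mode=2) is what triggers the disk writes, which is where the overhead comes from:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# Durable queue: the queue definition (and persistent messages in it)
# survive a broker restart.
ch.queue_declare(queue="weblogs", durable=True)

ch.basic_publish(
    exchange="",
    routing_key="weblogs",
    body=b"log line ...",
    properties=pika.BasicProperties(delivery_mode=2),  # mark the message persistent
)

# Transient variant: no disk writes, lower overhead, lost on broker restart.
# ch.queue_declare(queue="weblogs_fast", durable=False)
```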
"Is there any performance difference between Memory Node and Disk Node?"
Not that I'm aware of, but that would depend on the machine itself.
"How can I imrove the performance in publish and delivery codes?"
Try this out.
I have implemented a WCF service, and now my client wants three copies of it, working independently on different machines in a master-slave arrangement. I need to find a solution that enables the following behavior:
The first service instance that starts "asks" the other two whether they are alive. If not, it becomes the master and is the one active on the network. The other two, once started, see that there is already a master alive, so they become slaves and go to sleep. There also needs to be some mechanism to periodically check whether the master has died, and if so, to choose the next copy that is alive to become the master (until that one dies in turn).
I think this must be a known architectural pattern, so I would be very glad for any advice.
thanks
I would suggest looking at the WCF peer channel (System.Net.PeerToPeer) to facilitate each node knowing about the other nodes. Here is a link that offers a decent introduction.
As for determining which node should be the master, the trick will be negotiating which node should be the master if two or more nodes come online at about the same time. Once the nodes become aware of each other, there needs to be some deterministic mechanism for establishing the master. For example, you could use the earliest creation time, the lowest value of the last octet of each node's IP address, or anything really. You'll just need to define some scheme that allows the nodes to negotiate this automatically.
Finally, as for checking if the master is still alive, I would suggest using the event-based mechanism described here. The master could send out periodic health-and-status events that the other nodes would register for. I'd put a try/catch/finally block at the code entry point so that if the master were to crash, it could publish one final MasterClosing event to let the slaves know it's going away. What this does not account for is a system crash, e.g., power failure, etc. To handle this, provide a timeout in the slaves so that when the timeout expires, the slaves can query the master to see if it's still there. If not, the slaves can negotiate between themselves using your deterministic algorithm about who should be the next master.
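To make the negotiation and the timeout concrete, here is a small language-agnostic sketch (written in Python only for brevity, since the details live in your WCF code); the node IDs, peer list, and promotion logic are all hypothetical:

```python
import time

HEARTBEAT_TIMEOUT = 10.0   # seconds without a heartbeat before re-election

def elect_master(alive_node_ids):
    # Deterministic rule every node applies identically, e.g. the lowest node ID
    # (it could just as well be earliest creation time or lowest last IP octet).
    return min(alive_node_ids)

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.last_heartbeat = time.monotonic()

    def on_master_heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def check_master(self, alive_node_ids):
        if time.monotonic() - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            # Master presumed dead: every slave runs the same election and
            # only the winner promotes itself.
            if elect_master(alive_node_ids) == self.node_id:
                self.become_master()   # hypothetical promotion logic

    def become_master(self):
        ...
```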