why use rabbitmq or similar versus python builtin multiprocessing queue? - rabbitmq

I have a producer of tasks and multiple workers to consume those tasks. Many places recommend rabbitmq and/or celery. However python has a builtin multiprocessing queue that can be shared on an ip/port using a manager/proxy. What would be the advantages of using something like rabbitmq instead?

RabbitMq is an enterprise level tool, typically deployed separately on out-of-process servers / VMs / Containers, and plays in the enterprise service bus space.
Rabbit has reliable messaging as an objective - e.g. messages are persisted, and nodes in the cluster can be restarted without losing messages.
Supports a large range of messaging topologies, such as Point-Point, Fan out, and Topic subscriptions
Can be scaled for volume by adding multiple nodes to a cluster
Allows for conditional routing of messages to queues using routing keys or header filters
Agnostic of client technology, i.e. Clients can be on any platform which support the AMQP protocol
Has an out of the box administration, monitoring and diagnostics UI
Has a wide range of extensions and tools, such as shovels allowing messages to be replicated across multiple RabbitMQ clusters.
I'm no Python expert, but from what I understand of the multiprocessing package, it serves as an manager for distributing work between worker processes and threads, so IMO would be regarded as a more local system concern, as opposed to 'enterprise' level.
e.g. you would need to handle persistence, i.e. so messages are not lost during a crash / restart, and would likely need to built your own administration and monitoring tools.

Related

Is Apache Kafka another API for JMS?

Is not Apache Kafka another implementation of JMS?
I am using JMS+AMQ in my application, and migrating to Apache Kafka. Do I have to change all JMS codes?
No, Kafka is different from JMS systems such as ActiveMQ.
see ActiveMQ vs Apollo vs Kafka
Kafka has less features than ActiveMQ, as the stress has been put on performances. So before migrating, check that the features you use in AMQ are in Kafka.
However, there is an open suggestion for a bridge between JMS and Kafka, to allow exactly what you need. Maybe the provided links can help you
https://issues.apache.org/jira/browse/KAFKA-1995
Actually, the two are not the same. And with a little more time seeing the two co-exist - and listening to problems and happy points from those deploying each in the field - there is a little more to say about each one.
Firstly, JMS supports both point-to-point messaging (where messages are sent to single consumers; the consumers themselves maintain their message queues) and the publish-and-subscribe (pub/sub) model (where messages are written to a single topic, and consumers, independently, decide which messages to consume).
In a point-to-point messaging architecture, message producers and consumers know each other, where as in a pub/sub model they do not. Apache Kafka focuses on a pub/sub model, maintaining a separate log/topic from which consumers read from offsets. Kafka is also built for the cloud, with high-throughput a core consideration.
Many in our community and at meetups throw their hands up in frustration at MOMs (message-oriented middlewares) like JMS and switch to Kafka, for, what boils down to one reason: scalability. They argue that Kafka is better suited for scale than other MOMs because Kafka maintains a partitioned topic log. In so doing, Kafka can split up message flow to groups of consumers by partition and batch transmit the messages.
This concept also allows Kafka to have more granular control over ACLs (access control) to Kafka Consumers, although there are some issues there, which Apache Pulsar is addressing.
Finally, on Kafka, since the client/consumer decides which messages to consume (by offset in the topic), this removes some of the producer-side complexity of routing rules built into MOMs like JMS.
There's more differences than that, but this is a distillation of some of the ones that keep coming up! Hope this helps.
No, Kafka uses its own non-standard protocol and clients.
However, there's a 3rd-party JMS Client for Kafka from Confluent.

Real world example of Apache Helix, Zookeeper, Mesos and Erlang?

I am new in
Apache ZooKeeper : ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Apache Mesos : Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.
Apache Helix : Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes.
Erlang Langauge : Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability.
It sounds to me that Helix and Mesos both are useful for Clustering management System. How they are related to ZooKeeper? It'd better if someone give me a real world example for their usage.
I am curious to know How [BOINC][1] are distributing tasks to their clients? Are they using any of the above technologies? (Forget about Erlang).
I just need a brief view on it :)
Erlang was built by Ericsson, designed for use in phone systems. By design, it runs hundreds, thousands, or even 10s of thousands of small processes to handle tasks by sending information between them instead of sharing memory or state. This enables all sorts of interesting features that are great for high availability distributed systems such as:
hot code reloading. Each process is paused, it's relevant module code is swapped out, and it is resumed where it left off, so deploys can happen without restarting or causing significant interruption.
Easy distributed messaging and clustering. Sending a message to a local process or a remote one is fairly seamless in most instances.
Process-local GC. Garbage collection happens in each process independently instead of a global stop-the-world even like java, aiding in low-latency results.
Supervision trees and complex process hierarchy and monitoring/managing.
A few concrete real-world examples that makes great use of Erlang would be:
MongooseIM A highly performant and incredibly scalable, distributed XMPP / Chat server
Riak A distributed key/value store.
Mesos, on the other hand, you can sort of think of as a platform effectively for turning a datacenter of servers into a platform for teams and developers. If I, say as a company, own a datacenter with 10,000 physical servers, and I have 1,000 engineers developing hundreds of services, a good way to allow the engineers to deploy and manage services across that hardware without them needing to worry about the servers directly. It's an abstraction layer over-top of the physical servers to that allows you to share and intelligently allocate resources.
As a user of Mesos, I might say that I have Service X. It's an executable bundle that lives in location Y. Each instance of Service X needs 4 GB of RAM and 2 cores. And I need 8 instances which will be attached to a load balancer. You can specify this in configuration and deploy based on that config. Mesos will find hardware that has enough ram and CPU capacity available to handle each instance of that service and start it running in each of those locations.
It can handle a lot of other more complex topics about the orchestration of them as well, but that's probably a bit in-depth for this :)
Zookeepers most common use cases are Service Discover and configuration management. You can think of it, fundamentally, a bit like a nested key value store, where services can look at pre-defined paths to see where other services currently live.
A simple example is that I have a web service using a shared database cluster. I know a simple name for that database cluster and where the configuration for it lives in zookeeper. I can look up (or repeatedly poll) that path in zookeeper to check what the addresses of the active database hosts are. And on the other side, if I take a database node out of rotation and replace it with a new one, the config in zookeeper gets updated with the new address, and anything continually looking at it will detect this change and change where it's connected to.
A more complex use case for zookeeper is how Kafka uses it (or did at the time that I last used Kafka). Kafka has streams, and streams have many shards. Each consumer of each stream use zookeeper to save checkpoints in each shard after they have read and processed up to a certain point in the stream. That way if the consumer crashes or is restarted, it knows where to pick up in the stream.
I dont know about Meos and Earlang language. But this article might help you with Helix and Zookeeper.
This article tells us:
Zookeeper is responsible for gluing all parts together where Helix is cluster management component that registers all cluster details (cluster itself, nodes, resources).
The article is related to clustering in JBPM using helix and zookeeper.But with this you will get a basic idea on what helix and zookeeper is used for.
And from most of the articles i read online it seems like zookeeper and helix are used together.
Apache Zookeeper can be installed on a single machine or on a cluster.
It can be used to keep track of logs. It can provide various services on a distributed platform.
Storm and Kafka rely on Zookeeper.
Storm uses Zookeeper to store all state so that it can recover from an outage in any of its (distributed) component services.
Kafka queue consumers can use Zookeeper to store information on what has been consumed from the queue.

NServicebus+RabbitMQ and Distributor

NServiceBus Distributor/Worker pattern makes perfect sense for MSMQ due to the hard requirement of local input queues.
But this is not the case with RabbitMQ, I am trying to understand how and when the NServiceBus distributor is relevant with RabbitMQ. With RabbitMQ multiple workers can read from the same remote queue.
The actual scenario is similar to using an AWS auto-scaling group to scale out workers pointing to a high available RabbitMQ cluster. Now avoiding distributor altogether makes the setup much simpler to build, test and provision.
Thoughts?
As RabbitMQ transport falls into the broker style bus, so, in your use case, it would make more sense not to use the distributor.
The same goes for all broker-style transports, where you can use a competing consumer pattern to scale out.
NServiceBus is an excellent system and does wonders in most message queuing system where you don't have an integrated distributor (which you do with exchanges in RabbitMQ). We use NServiceBus here at our company.
Azure Queues and MSMQ are perfect examples of such queuing technologies.
NServiceBus handles the distribution internally and therefore reproduces this capability for you.
However... If you are blessed with the possibility of imposing what queuing technology you can use, then I would highly encourage you to look into RabbitMQ and a product (Open Source) called MassTransit
http://masstransit-project.com/
MassTransit can in turn function in the two modes and will either delegate or simulate the distribution for you - however I nonetheless have a soft spot for NServiceBus as do our senior devs here.
Per this page...
http://docs.particular.net/nservicebus/load-balancing-with-the-distributor
Using the distributor is only useful when using MSMQ - if you aren't using MSMQ then there is no point. RabbitMQ and other transport will allow access to the same queue from multiple consumers, while MSMQ will not. The distributor in a nutshell will take messages from the main queue and distribute them across multiple worker queues as they report that they are done with whatever they are working on.

Are Activemq, Redis and Apache camel a right combination?

Are Activemq, Redis and Apache camel a right combination?
Am planning for a high performant enterprise level integration solution accross multiple applications
My objective is to make the solution
a. independent of the consumers performance
b. able to trouble shoot in case of any issue
c. highly available with failover support
d. Hanlde 10k msgs per second
Here I'm planning to have
a. network of activemq brokers running in all app servers and storing the consumed messages in redis data store
b. from redis data store, application can retrieve the messages through camel end points
(camel end point is chosen to process the messages before reaching the app).
Also can ActiveMQ be removed with only Redis + Apache camel, as I see from the discussions forms that Redis does most of the ActiveMQ stuff
Could any one advise on this technology stack.
ActiveMQ and Camel works great together and scales very well - should be no problem to handle the load given proper hardware.
Are you thinking about something like this?
Message producer App -> ActiveMQ -> Camel -> Redis
Message Consumer App <- Camel [some endpoint] <- Redis
Puting ActiveMQ in between is usually a very good way to achieve HA, load balancing and making the solution elastic. Depending on your specific setup with machines etc. ActiveMQ can help in many ways to solve HA issues.
Removing ActiveMQ can a good option if your apps use some other protocol than JMS/ActiveMQ messaging, i.e. HTTP, raw tcp or similar. Can you elaborate on how the apps will communicate with Camel? ActiveMQ, by default, supports transactions, guaranteed delivery and you can live with a limited number of threads on the server, even for your heavy traffic. For other protocols, this might be a bit trickier to achieve. Without a HA layer (cluster) in ActiveMQ you need to setup Redis to handle HA in all aspects, which might be just as easy, but Redis is a bit memory hungry, so be aware of that.

rabbitmq HA cluster

I am wanting to setup RabbitMQ as a two (or more) node cluster with HA.
Use case: a client producer app (C#.NET) knows that the cluster has two nodes and publishes to the cluster. Various consumer apps (also C#.NET) connect to the cluster and get all messages generated by the producer. So long as at least one node is up and running the producer and consumers will all continue to work without error. Supposing nodes A and B are running and B dies for a while, then gets restarted, then a while later A dies, the clients all continue to function without receiving an error since at all times at least one node is up.
Can it be made to work like this out of the box?
Are there any other MQs that would be more appropriate (commercial ok) for a Windows/.NET application environment?
RabbitMQ v2.6.0 now supports high-availability queues using active/active clustering. Microsoft and a number of other companies have collaborated on Apache QPid which has C# bindings and which also supports active/active HA clustering.
Can it be made to work like this out of the box?
No. When a node goes down, all of its connections are closed. Since AMQP connections are stateful, there's no way around this. What you could achieve is 1) broker goes down, 2) all clients disconnect, 3) clients connect to other node (masquerading as original) and are none the wiser.
On a side note, rabbit does not support active-active HA clustering at the moment. It does support active-passive clustering and a form of logical clustering (which might be what you're looking for).