We're running a Flask application and do all our heavy processing with Celery. We use a Redis instance hosted on Amazon as our message broker. It recently failed, causing much pain and bleeding, so we're looking into failover strategies.
The first project we came across was celery-redis-sentinel: https://github.com/dealertrack/celery-redis-sentinel
Would this be something that would give us a fail-over capability?
We've been doing some tests, and it seems not to be working as anticipated.
In your case, moving the Celery broker to RabbitMQ might be better, since RabbitMQ can persist queued messages to disk (durable queues with persistent messages).
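On the Sentinel question itself: Celery 4.x and later (through kombu) ship a sentinel:// broker transport, so a separate package may not be needed. Below is a minimal sketch of the settings involved. The three hosts and the master name "mymaster" are made up for illustration, and the helper function is hypothetical; only the broker_url and broker_transport_options setting names come from Celery. It is also worth checking whether your managed Redis exposes Sentinel at all before going down this route.

```python
def sentinel_broker_settings(sentinels, master_name):
    """Build Celery settings for a Redis Sentinel broker.

    sentinels: list of (host, port) tuples (hypothetical hosts below).
    master_name: the name the Sentinels use for the monitored master.
    """
    # Several Sentinel URLs are joined into one semicolon-delimited string.
    broker_url = ";".join(f"sentinel://{host}:{port}" for host, port in sentinels)
    return {
        "broker_url": broker_url,
        "broker_transport_options": {"master_name": master_name},
    }


settings = sentinel_broker_settings(
    [
        ("sentinel-1.example", 26379),
        ("sentinel-2.example", 26379),
        ("sentinel-3.example", 26379),
    ],
    master_name="mymaster",
)
# With Celery installed, this dict could be applied via app.conf.update(settings).
print(settings["broker_url"])
```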
I have an application which uses Celery with RabbitMQ as a broker.
Recently I noticed that the application had become slow. All RabbitMQ connections were in the "blocking" state and memory usage was pretty much at the high watermark.
After making sure that RabbitMQ has enough memory, a couple of connections went to the "running" state and the overall system normalized.
Now I want to be able to recognize this earlier. While I will improve the monitoring/alerting for RabbitMQ itself, I was wondering if it's possible to detect this state on the Celery app side. What does Celery do when all connections of the broker are blocking / when the broker has issues?
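As far as I understand, once the memory alarm fires RabbitMQ stops reading from connections that publish, so task publishing stalls rather than raising an error, which makes the state hard to spot from inside the Celery app. One way to catch it earlier is to poll the management plugin's /api/nodes endpoint and compare mem_used with mem_limit. A rough sketch, assuming you already fetch that JSON somehow; the function name, the 0.8 threshold and the sample values are all made up:

```python
import json


def nodes_near_watermark(nodes, threshold=0.8):
    """Return (name, ratio) for nodes whose memory use is at or above
    `threshold` of their limit, or which already report a memory alarm."""
    flagged = []
    for node in nodes:
        limit = node.get("mem_limit") or 0
        ratio = node.get("mem_used", 0) / limit if limit else 0.0
        if node.get("mem_alarm") or ratio >= threshold:
            flagged.append((node["name"], round(ratio, 2)))
    return flagged


# Sample shaped like a /api/nodes response; the numbers are invented.
sample = json.loads("""[
  {"name": "rabbit@a", "mem_used": 900, "mem_limit": 1000, "mem_alarm": false},
  {"name": "rabbit@b", "mem_used": 300, "mem_limit": 1000, "mem_alarm": false}
]""")
print(nodes_near_watermark(sample))
```

Alerting at some fraction of the limit, rather than on the alarm itself, gives you time to react before publishers start to block.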
I have installed both Celery and RabbitMQ. Now I would like to track how many messages are in the queue and how they are distributed, and to see the list of Celery consumers and the tasks they are executing. This is because I had issues with Celery getting stuck under memory pressure. I tried installing the RabbitMQ management plugin for a start, but when I tried to log in at myservr.com:15672 it said the user can only log in through localhost. Is there any workaround? Also, is it a good idea to run such monitoring on production servers? Is there any chance of memory leaks?
We have been having the issues below with RabbitMQ and have been manually restarting the servers every weekend as a workaround.
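The localhost message usually comes from the default guest account, which RabbitMQ restricts to loopback connections; creating a separate user with a management tag is the usual way around it. Once you can reach the management API, /api/queues reports messages_ready, messages_unacknowledged and consumers per queue, and celery inspect active shows what each worker is running. A small sketch that summarizes such a response; the function name and the sample data are invented for illustration:

```python
def summarize_queues(queues):
    """Return (name, ready, unacked, consumers) tuples, deepest backlog first."""
    rows = [
        (
            q["name"],
            q.get("messages_ready", 0),
            q.get("messages_unacknowledged", 0),
            q.get("consumers", 0),
        )
        for q in queues
    ]
    # Sort by total backlog (ready + unacknowledged), largest first.
    rows.sort(key=lambda row: row[1] + row[2], reverse=True)
    return rows


# Sample shaped like a /api/queues response; the numbers are made up.
sample = [
    {"name": "celery", "messages_ready": 120, "messages_unacknowledged": 8, "consumers": 4},
    {"name": "emails", "messages_ready": 3, "messages_unacknowledged": 0, "consumers": 1},
    {"name": "reports", "messages_ready": 900, "messages_unacknowledged": 2, "consumers": 0},
]
for name, ready, unacked, consumers in summarize_queues(sample):
    print(f"{name}: ready={ready} unacked={unacked} consumers={consumers}")
```

A queue with a growing backlog and zero consumers (like the made-up "reports" queue here) is the kind of thing worth alerting on.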
Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions.
We have gone through other popular posts on the topic e.g. here and here
Our network is not highly reliable and occasional blips are expected, but when it comes back up I would have expected the one node of the 4-node RabbitMQ cluster to rejoin the rest of the cluster, as is the case with the 4 Tomcat nodes installed on the same servers.
Although the nodes in each partition continue to run independently, that doesn't seem like a graceful recovery from the failure of one node.
We didn't have much luck with rabbitmqctl commands such as rabbitmqctl cluster_status. They sporadically caused the RabbitMQ process to hang, which then needed a sudo kill of the RabbitMQ process.
We are at the point of evaluating a move to Kafka or any other message broker that handles network partitions well.
Any thoughts on how to avoid the manual RabbitMQ restarts, or on Kafka's ability to handle this situation, are highly appreciated.
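One thing worth checking before switching brokers: RabbitMQ has a cluster_partition_handling setting (its values include ignore, pause_minority and autoheal), and with the default of ignore a partitioned cluster stays split until someone intervenes. Under pause_minority, a node pauses itself when it can see half or fewer of the cluster's nodes, counting itself. The sketch below only illustrates that arithmetic for a 4-node cluster; the function name is made up and this is not RabbitMQ's code:

```python
def pauses_under_pause_minority(cluster_size, reachable_including_self):
    """True if a node would pause: it sees half or fewer of the cluster's nodes."""
    return 2 * reachable_including_self <= cluster_size


# Possible views of a node in a 4-node cluster after a network blip.
for side in (3, 2, 1):
    paused = pauses_under_pause_minority(4, side)
    print(f"node that sees {side} of 4 nodes pauses: {paused}")
```

Note that with an even number of nodes a 2/2 split pauses both halves, which is one reason an odd cluster size is usually recommended with this mode.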
I think Kafka with replication should be able to handle network partitions quite easily, as long as the number of brokers cut off is less than the replication factor of your topic (that is, the consumers and producers can always reach at least one broker for the topics they're operating on).
To avoid backpressure in the clients while ZooKeeper discovers the partition and propagates the information to the producers and consumers, you may want to set a short ZK heartbeat interval (yes, you'll need ZK, and a cluster of it too, since you absolutely don't want your whole ZK cluster partitioned).
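To put rough numbers on that rule: a partition stays available while at least one in-sync replica survives, and if producers use acks=all, writes additionally need min.insync.replicas replicas to be in sync. A back-of-the-envelope sketch, where the function name is made up and the replication factor of 3 is just an example:

```python
def tolerated_broker_losses(replication_factor, min_insync_replicas=1):
    """Return (losses_keeping_partition_available, losses_keeping_acks_all_writes)."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min_insync_replicas must be between 1 and replication_factor")
    # One surviving replica is enough to keep serving the partition.
    available = replication_factor - 1
    # acks=all writes need at least min_insync_replicas replicas still in sync.
    writable = replication_factor - min_insync_replicas
    return available, writable


available, writable = tolerated_broker_losses(replication_factor=3, min_insync_replicas=2)
print(f"partition survives {available} lost replicas; acks=all writes survive {writable}")
```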
Fair warning though: Kafka only guarantees ordering within a single partition, so once a topic is spread over several partitions (as it typically is on a cluster of brokers) you lose the global FIFO behaviour of your message queue. That can be pretty disturbing if you expect consumers to read messages in exactly the order the producers wrote them, which you could expect with RabbitMQ.
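The usual way to live with this is to give related messages the same key, so they land in the same partition and keep their relative order. The sketch below imitates that routing in plain Python. It uses crc32 as a stand-in hash (Kafka's default partitioner uses a different hash function), and the order keys and events are invented:

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Stand-in hash for illustration; not the hash Kafka itself uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Append each (key, value) to the partition chosen by its key."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


messages = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]
partitions = route(messages, num_partitions=3)
for key in ("order-1", "order-2"):
    p = partition_for(key, 3)
    events = [value for k, value in partitions[p] if k == key]
    print(key, "-> partition", p, events)
```

Events for one key stay in order inside their partition, but nothing orders "order-1" events relative to "order-2" events if the two keys end up in different partitions.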
I'm trying to figure out how HA (high availability queues) works.
My current configuration is: every machine runs multiple Celery workers and points to itself as the broker. Each machine can do this, rather than pointing at one broker machine, because of HA; this way there is less load on any one machine, as all of them are brokers and hold copies of the same queue.
My question is, is my above logic correct? Or do all workers need to point to one broker machine regardless of HA?
If you have looked at HA and clustering and have ensured that the queues mirror each other, then what you are doing should be fine. It may be a tad inefficient, though, to run a broker on every server where you run your workers.
The other option is to run your queues on a few servers for HA and have the other servers, which run the workers, point to them. Celery's broker_url setting does accept more than one URL for failover, but a worker only talks to one broker at a time, so if you want connections spread across the broker nodes you would put a load balancer in front of them and point all workers at it. This is to the best of what I've come to understand over the past few years about RabbitMQ HA for Celery.
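For reference, Celery's broker_url setting can take several URLs (a list or a semicolon-delimited string), and broker_failover_strategy defaults to round-robin, so a worker can move to another node when its current broker goes away. The sketch below only imitates that round-robin idea in plain Python. The connect callable, the host names and the function name are hypothetical; with Celery you would just set broker_url:

```python
from itertools import cycle, islice


def connect_with_failover(urls, connect, max_attempts=None):
    """Try broker URLs in round-robin order; return (url, connection) on success.

    `connect` is a hypothetical callable that raises ConnectionError on failure.
    """
    attempts = max_attempts or len(urls)
    last_error = None
    for url in islice(cycle(urls), attempts):
        try:
            return url, connect(url)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError(f"no broker reachable after {attempts} attempts") from last_error


def fake_connect(url):
    # Pretend the first node is down, to exercise the failover path.
    if "node-a" in url:
        raise ConnectionError("node-a is down")
    return f"connected:{url}"


urls = ["amqp://node-a.example//", "amqp://node-b.example//"]
chosen, conn = connect_with_failover(urls, fake_connect)
print(chosen)
```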
Are ActiveMQ, Redis and Apache Camel the right combination?
I am planning a high-performance, enterprise-level integration solution across multiple applications.
My objective is to make the solution:
a. independent of the consumers performance
b. easy to troubleshoot in case of any issue
c. highly available with failover support
d. able to handle 10k msgs per second
Here I'm planning to have:
a. a network of ActiveMQ brokers running on all app servers and storing the consumed messages in a Redis data store
b. applications retrieving the messages from the Redis data store through Camel endpoints
(Camel endpoints are chosen so that messages can be processed before they reach the app).
Also, can ActiveMQ be removed, leaving only Redis + Apache Camel? From what I see in discussion forums, Redis does most of what ActiveMQ does.
Could anyone advise on this technology stack?
ActiveMQ and Camel work great together and scale very well, so handling the load should be no problem given proper hardware.
Are you thinking about something like this?
Message producer App -> ActiveMQ -> Camel -> Redis
Message Consumer App <- Camel [some endpoint] <- Redis
Putting ActiveMQ in between is usually a very good way to achieve HA and load balancing and to make the solution elastic. Depending on your specific setup with machines etc., ActiveMQ can help in many ways to solve HA issues.
Removing ActiveMQ can be a good option if your apps use some protocol other than JMS/ActiveMQ messaging, e.g. HTTP, raw TCP or similar. Can you elaborate on how the apps will communicate with Camel? ActiveMQ, by default, supports transactions and guaranteed delivery, and you can get by with a limited number of threads on the server, even for your heavy traffic. For other protocols, this might be a bit trickier to achieve. Without an HA layer (cluster) in ActiveMQ you need to set up Redis to handle HA in all aspects, which might be just as easy, but Redis is a bit memory hungry, so be aware of that.
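To get a feel for the memory point, here is a back-of-the-envelope estimate of how much RAM Redis would need to hold the backlog at 10k msgs per second. The 1 KiB average message size, the 10-minute retention window and the 1.5x overhead factor are all assumptions picked for illustration, and the function name is made up:

```python
def redis_memory_estimate_gib(msgs_per_sec, avg_msg_bytes, retention_seconds,
                              overhead_factor=1.5):
    """Rough RAM needed to keep `retention_seconds` worth of messages in memory.

    overhead_factor is a guess covering per-key and data-structure overhead.
    """
    raw_bytes = msgs_per_sec * avg_msg_bytes * retention_seconds
    return raw_bytes * overhead_factor / 2**30


estimate = redis_memory_estimate_gib(
    msgs_per_sec=10_000,    # target rate from the question
    avg_msg_bytes=1024,     # assumed average message size
    retention_seconds=600,  # assumed: messages are consumed within 10 minutes
)
print(f"about {estimate:.1f} GiB")
```

The result scales linearly with how long messages sit in Redis, so slow consumers translate directly into memory.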