ActiveMQ tuning for 20000 threads

I am running ActiveMQ, and 20000+ servers connect to it over the STOMP port at the same time to publish and consume messages. The ActiveMQ server has 8 CPUs and 32 GB of memory. I have set the JVM max heap to -Xmx16384m. Still, when all the servers are connected to this ActiveMQ instance, the server gets overloaded: virtual memory usage is about 21 GB and CPU utilization sometimes reaches about 500%.
I am not sure whether the JVM itself uses that much or whether something else in this ActiveMQ setup is responsible. I have tried many tuning options with no improvement.

Maybe you should reconsider the architecture. If you really need that many servers, you may want to try a non-blocking messaging bus such as ActiveMQ Artemis. I don't know for sure how many STOMP clients it will support under your setup, but it's worth a try. Keeping that many clients as separate threads has a huge memory footprint, and I think Artemis will handle such cases better. Not sure about STOMP, though.
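To get a rough feel for why one thread per client is expensive, the arithmetic below estimates the stack memory reserved by the threads alone. It is only a sketch: the 1 MiB stack size is an assumption (a common default on 64-bit Linux JVMs, adjustable with -Xss), and the class and method names are made up for illustration.

```java
// Rough estimate only: stack memory reserved by a thread-per-connection design.
final class ThreadStackEstimate {
    /** Returns the stack memory, in MiB, reserved for the given number of threads. */
    static long reservedStackMiB(int threads, int stackKiBPerThread) {
        return (long) threads * stackKiBPerThread / 1024;
    }

    public static void main(String[] args) {
        int clients = 20_000;   // from the question
        int stackKiB = 1024;    // assumption: 1 MiB per thread (tunable via -Xss)
        System.out.println(reservedStackMiB(clients, stackKiB) + " MiB of reserved stack space");
    }
}
```

With these assumed numbers the result is 20000 MiB, roughly 19.5 GiB. Thread stacks live outside the heap limited by -Xmx, and most of this is reserved address space rather than resident memory, so treat the figure as an upper bound on stack usage, not as a measurement.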

Related

RabbitMQ as Message Broker used by Spring Websocket dies under load

I develop an application where we need to handle 160k concurrent users which are connected to the backend via a websocket connection.
We decided to use the spring websocket implementation and RabbitMQ as the message broker.
In our application every user needs to subscribe to their own user queue, /exchange/amq.direct/update, as well as to another queue, /topic/someUniqueName, to which other users can potentially subscribe as well.
In our first performance test we did the naive approach where every user subscribes to two new queues.
When running the test, RabbitMQ dies silently when around 800 users are connected at the same time, so around 1600 queues are active (see the graph of all RabbitMQ objects here).
I have read, though, that you should be careful about opening many connections to RabbitMQ.
Now I wonder whether the approach anticipated by Spring Websocket, opening one queue per user, is a conceptual problem for systems with high load, or whether there is another error in my system.
Limiting factors for RabbitMQ are usually:
Memory (can be checked in the dashboard), which needs to grow with the number of messages and the number of queues (unless you use lazy queues, which go directly to disk).
The maximum number of file descriptors (at least 1 per connection), which often defaults to too low a value on many distributions (ref: https://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-April/019615.html).
CPU for routing the messages.
I did find the issue. I had actually misconfigured the RabbitMQ service and given it a file descriptor limit of only 1024. Increasing it solved the issue.
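As a quick sanity check for this kind of limit, here is a minimal Java sketch. The fits helper and its reserve parameter are hypothetical names introduced for illustration. The main method reads the limits of the current JVM process through com.sun.management.UnixOperatingSystemMXBean (only present on Unix-like JVMs); it does not inspect RabbitMQ itself, whose limit belongs to the broker process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

final class FdHeadroom {
    /** True if the limit covers one descriptor per connection plus a reserve for files, logs, etc. */
    static boolean fits(long fdLimit, long connections, long reserve) {
        return connections + reserve <= fdLimit;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The Unix-specific bean exposes descriptor counts for THIS JVM process only.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            System.out.println("open=" + unix.getOpenFileDescriptorCount()
                    + " max=" + unix.getMaxFileDescriptorCount());
        }
        // 160k users (from the question) against the misconfigured limit of 1024.
        System.out.println(fits(1024, 160_000, 256));
    }
}
```

One descriptor per connection is a lower bound; a broker may need more per client, so the reserve value here is an assumption to adjust.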

What is the maximum number of MQTT connections supported by ActiveMQ 5.10.0?

I want to support around 100K MQTT connections using ActiveMQ. The ActiveMQ server is rejecting connections beyond 30K. How can I tune ActiveMQ to support more connections?
I have tried the following
<transportConnector name="mqtt" allowLinkStealing="true"
uri="mqtt+nio://0.0.0.0:1883?maximumConnections=100000&amp;wireFormat.maxFrameSize=104857600&amp;transport.defaultKeepAlive=60000&amp;transport.closeAsync=false&amp;useQueueForAccept=false"/>
in activemq.xml, but to no avail.
I also did some Unix kernel tuning to raise the number of open file descriptors to 100000.
Has anyone solved this problem?
If you are going to handle > 100k connections I'd recommend looking into a dedicated MQTT broker instead of a multi-protocol message broker. You can see a list of MQTT brokers at the MQTT Github wiki.
ActiveMQ is afaik not designed for handling that many MQTT connections and is not optimized for MQTT because it's a multi-purpose message queue. If you want to stick with Apache software, perhaps using Apache Apollo can help. I don't know of any MQTT Apollo deployments of that size, but it is probably worth a try if you need a multi-protocol broker. Again, I'd recommend a dedicated MQTT broker for large numbers of MQTT connections.
You should definitely look into reactive and multi-threaded MQTT brokers if you want to handle that number of connections, and you should make sure that the MQTT broker you choose is known to work with your desired connection count and load. HiveMQ, for example, is capable of handling >100k connections.
Full disclosure: I work for the company behind HiveMQ.
May I suggest you use Apache Apollo for MQTT connections when you have that number of concurrent sessions?
Apache Apollo is a subproject of ActiveMQ with the intent of making the broker scalable to a large number of connected clients. While ActiveMQ supports MQTT, it's not really optimized for this scenario.
JoramMQ (http://jorammq.com) is based on the Joram (http://joram.ow2.org) multi-protocol message broker and it supports more than 500K concurrent MQTT connections.
For anyone still trying to find a fitting MQTT broker for many connections, here are my tests of multiple brokers (I should actually add ActiveMQ to the comparison). Performance is not the only thing to compare; clustering, monitoring, support and price matter too. The final pick depends on your own needs.
Tests were conducted on a 32GB RAM, AMD 5800X, Ubuntu 18 PC.
50 000 MQTT clients connected with no ssl.
Clients subscribed to 4 channels & no messages were published.
Tests above 50k need multiple machines involved or some other tricks because of the 65k limit of outgoing sockets in the system.
Test results
RabbitMQ: 21GB of RAM and ~4 cores.
Mosquitto: 200MB of RAM and ~0.05 core.
HiveMQ: 2.1GB of RAM and ~0.05 core.
EMQX: 1.4GB of RAM and ~1 core.
VerneMQ: 1.7GB of RAM and ~0.5 core.
If pricing is OK for you, HiveMQ looks to me like the best broker.
If you are looking for something for free - check VerneMQ.
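On the 65k outgoing socket limit mentioned in the test setup: each connection from one source address to the same broker address and port needs its own source port, and ports are 16-bit numbers. A rough Java sketch of how many source addresses (or machines) a load test would need follows. The usable-ports figure is an assumption, since part of the port range is normally reserved, and the class and method names are invented for illustration.

```java
// Rough planning helper for load tests; not tied to any particular broker.
final class LoadGeneratorPlan {
    /** Number of source addresses needed to open the given number of connections to one broker endpoint. */
    static int sourceAddressesNeeded(int connections, int usablePortsPerAddress) {
        // Ceiling division: a partially used address still counts as one.
        return (connections + usablePortsPerAddress - 1) / usablePortsPerAddress;
    }

    public static void main(String[] args) {
        int usablePorts = 60_000; // assumption: a bit under 65535 to leave room for reserved ports
        System.out.println(sourceAddressesNeeded(50_000, usablePorts));
        System.out.println(sourceAddressesNeeded(500_000, usablePorts));
    }
}
```

Under that assumption, 50,000 clients fit on one source address, while 500,000 would need 9.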

Real world example of Apache Helix, Zookeeper, Mesos and Erlang?

I am new to the following:
Apache ZooKeeper : ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Apache Mesos : Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.
Apache Helix : Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes.
Erlang Language : Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability.
It sounds to me as if Helix and Mesos are both useful for cluster management. How are they related to ZooKeeper? It would be great if someone could give me a real-world example of their usage.
I am also curious to know how BOINC distributes tasks to its clients. Does it use any of the above technologies? (Forget about Erlang.)
I just need a brief overview :)
Erlang was built by Ericsson and designed for use in phone systems. By design, it runs hundreds, thousands, or even tens of thousands of small processes that handle tasks by sending information between them instead of sharing memory or state. This enables all sorts of interesting features that are great for high-availability distributed systems, such as:
Hot code reloading. Each process is paused, its relevant module code is swapped out, and it is resumed where it left off, so deploys can happen without restarting or causing significant interruption.
Easy distributed messaging and clustering. Sending a message to a local process or a remote one is fairly seamless in most instances.
Process-local GC. Garbage collection happens in each process independently instead of in a global stop-the-world event as in Java, which helps keep latency low.
Supervision trees and complex process hierarchy and monitoring/managing.
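To make the "send messages instead of sharing state" idea concrete, here is a loose Java analogy. It is not Erlang and not how the Erlang VM works: Java threads are far heavier than Erlang processes, and all names here (MailboxDemo, Msg, spawnDoubler, askDouble) are invented for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class MailboxDemo {
    /** A message carries its payload and the mailbox to reply to; no state is shared. */
    static final class Msg {
        final int value;
        final BlockingQueue<Integer> replyTo;
        Msg(int value, BlockingQueue<Integer> replyTo) {
            this.value = value;
            this.replyTo = replyTo;
        }
    }

    /** Starts a "process" that doubles every number it receives, and returns its mailbox. */
    static BlockingQueue<Msg> spawnDoubler() {
        BlockingQueue<Msg> mailbox = new LinkedBlockingQueue<>();
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Msg m = mailbox.take();       // block until a message arrives
                    m.replyTo.add(m.value * 2);   // answer with a message, not shared memory
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        });
        t.setDaemon(true);                        // do not keep the JVM alive
        t.start();
        return mailbox;
    }

    /** Sends one number to the doubler and waits for the reply. */
    static int askDouble(BlockingQueue<Msg> doubler, int n) {
        BlockingQueue<Integer> reply = new LinkedBlockingQueue<>();
        doubler.add(new Msg(n, reply));
        try {
            return reply.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        }
    }
}
```

Calling askDouble(spawnDoubler(), 21) returns 42. The only thing the two sides have in common is the mailbox, which is the property the list above relies on.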
A few concrete real-world examples that make great use of Erlang would be:
MongooseIM: a highly performant and incredibly scalable, distributed XMPP / chat server.
Riak: a distributed key/value store.
Mesos, on the other hand, can be thought of as a way of turning a datacenter of servers into a platform for teams and developers. Say that I, as a company, own a datacenter with 10,000 physical servers and have 1,000 engineers developing hundreds of services. Mesos is a good way to let those engineers deploy and manage services across that hardware without needing to worry about the servers directly. It's an abstraction layer on top of the physical servers that allows you to share and intelligently allocate resources.
As a user of Mesos, I might say that I have Service X. It's an executable bundle that lives in location Y. Each instance of Service X needs 4 GB of RAM and 2 cores. And I need 8 instances, which will be attached to a load balancer. You can specify this in configuration and deploy based on that config. Mesos will find hardware that has enough RAM and CPU capacity available to handle each instance of that service and start it running in each of those locations.
It can handle a lot of other more complex topics about the orchestration of them as well, but that's probably a bit in-depth for this :)
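The "find hardware with enough RAM and CPU" step can be pictured with a toy first-fit placement in Java. This is a sketch of the general idea only and not Mesos's actual mechanism: Mesos offers resources to framework schedulers, which make the real placement decision. Host, place and all the numbers in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FirstFitPlacement {
    /** A machine with some free capacity left. */
    static final class Host {
        final String name;
        int freeRamGb;
        int freeCores;
        Host(String name, int freeRamGb, int freeCores) {
            this.name = name;
            this.freeRamGb = freeRamGb;
            this.freeCores = freeCores;
        }
    }

    /**
     * Places the requested number of instances, each needing ramGb and cores,
     * on the first host that still has room. Returns the chosen host names in order.
     * Throws if capacity runs out (hosts already used stay reserved in this toy version).
     */
    static List<String> place(List<Host> hosts, int instances, int ramGb, int cores) {
        List<String> placed = new ArrayList<>();
        for (int i = 0; i < instances; i++) {
            Host chosen = null;
            for (Host h : hosts) {
                if (h.freeRamGb >= ramGb && h.freeCores >= cores) {
                    chosen = h;
                    break;
                }
            }
            if (chosen == null) {
                throw new IllegalStateException("not enough capacity for instance " + (i + 1));
            }
            chosen.freeRamGb -= ramGb;   // reserve the resources on that host
            chosen.freeCores -= cores;
            placed.add(chosen.name);
        }
        return placed;
    }
}
```

For example, with host "a" (16 GB, 8 cores) and host "b" (8 GB, 2 cores), placing 5 instances of 4 GB / 2 cores puts four on "a" and one on "b"; asking for 8 instances fails because the two hosts together only have room for 5.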
ZooKeeper's most common use cases are service discovery and configuration management. You can think of it, fundamentally, as a bit like a nested key-value store, where services can look at pre-defined paths to see where other services currently live.
A simple example is that I have a web service using a shared database cluster. I know a simple name for that database cluster and where the configuration for it lives in zookeeper. I can look up (or repeatedly poll) that path in zookeeper to check what the addresses of the active database hosts are. And on the other side, if I take a database node out of rotation and replace it with a new one, the config in zookeeper gets updated with the new address, and anything continually looking at it will detect this change and change where it's connected to.
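The lookup-and-get-notified pattern in that example can be sketched with a tiny in-memory registry. This is not the ZooKeeper client API; TinyRegistry and the path used in the usage note are invented for illustration. It also differs from real ZooKeeper in several ways: for instance, classic ZooKeeper watches fire once and must be re-registered, and this sketch is neither distributed nor thread-safe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class TinyRegistry {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

    /** Registers a callback that receives the new value on every later change to the path. */
    void watch(String path, Consumer<String> onChange) {
        watchers.computeIfAbsent(path, p -> new ArrayList<>()).add(onChange);
    }

    /** Stores the value under the path and notifies everyone watching that path. */
    void set(String path, String value) {
        data.put(path, value);
        for (Consumer<String> w : watchers.getOrDefault(path, new ArrayList<>())) {
            w.accept(value);
        }
    }

    /** Returns the current value for the path, or null if nothing is stored there. */
    String get(String path) {
        return data.get(path);
    }
}
```

A web service would call get("/services/db/hosts") at startup and watch the same path; when an operator swaps a database node and calls set with the new address, the callback fires and the service can reconnect.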
A more complex use case for ZooKeeper is how Kafka uses it (or did at the time that I last used Kafka). Kafka has streams (topics, in Kafka's own terms), and streams have many shards (partitions). Each consumer of each stream uses ZooKeeper to save checkpoints in each shard after it has read and processed up to a certain point in the stream. That way, if the consumer crashes or is restarted, it knows where to pick up in the stream.
I don't know about Mesos or the Erlang language, but this article might help you with Helix and ZooKeeper.
This article tells us:
Zookeeper is responsible for gluing all parts together where Helix is cluster management component that registers all cluster details (cluster itself, nodes, resources).
The article is about clustering in jBPM using Helix and ZooKeeper, but it will give you a basic idea of what Helix and ZooKeeper are used for.
And from most of the articles I have read online, it seems that ZooKeeper and Helix are used together.
Apache Zookeeper can be installed on a single machine or on a cluster.
It can be used to keep track of logs. It can provide various services on a distributed platform.
Storm and Kafka rely on Zookeeper.
Storm uses Zookeeper to store all state so that it can recover from an outage in any of its (distributed) component services.
Kafka queue consumers can use Zookeeper to store information on what has been consumed from the queue.

Are ActiveMQ, Redis and Apache Camel the right combination?

Are ActiveMQ, Redis and Apache Camel the right combination?
I am planning a high-performance, enterprise-level integration solution across multiple applications.
My objective is to make the solution:
a. independent of the consumers' performance
b. easy to troubleshoot in case of any issue
c. highly available with failover support
d. able to handle 10k messages per second
Here I'm planning to have:
a. a network of ActiveMQ brokers running on all app servers and storing the consumed messages in a Redis data store
b. applications retrieving the messages from the Redis data store through Camel endpoints
(a Camel endpoint is chosen so that the messages can be processed before reaching the app).
Also, can ActiveMQ be dropped in favor of only Redis + Apache Camel? From discussion forums I gather that Redis does most of what ActiveMQ does.
Could anyone advise on this technology stack?
ActiveMQ and Camel work great together and scale very well. Handling the load should be no problem given proper hardware.
Are you thinking about something like this?
Message producer App -> ActiveMQ -> Camel -> Redis
Message Consumer App <- Camel [some endpoint] <- Redis
Putting ActiveMQ in between is usually a very good way to achieve HA and load balancing and to make the solution elastic. Depending on your specific setup with machines etc., ActiveMQ can help in many ways to solve HA issues.
Removing ActiveMQ can be a good option if your apps use some protocol other than JMS/ActiveMQ messaging, e.g. HTTP, raw TCP or similar. Can you elaborate on how the apps will communicate with Camel? ActiveMQ, by default, supports transactions and guaranteed delivery, and you can live with a limited number of threads on the server, even for your heavy traffic. For other protocols, this might be a bit trickier to achieve. Without an HA layer (cluster) in ActiveMQ you need to set up Redis to handle HA in all aspects, which might be just as easy, but Redis is a bit memory hungry, so be aware of that.

Using Apache Camel for Load Balancing

Can I access SEDA or VM queue from another machine or JVM?
I actually want to implement load balancing with the help of Camel but do not want to introduce another messaging framework for this. I just want to distribute load from a producer to different consumers using some built-in queue.
Is it possible? If not, what are my options?
Another approach (pull approach):
I'm not sure how optimal this new approach is or what its advantages and disadvantages are, so please help me analyze it.
Messages will be put into a master queue, and all the worker systems will listen to that queue. Let's say 100,000 messages are put into the master queue and 5 worker systems are listening to it. The worker systems will process the messages one by one from the master queue. There are two big benefits to this approach:
I don't need to worry about registering my worker systems with the producer. A sixth system can just boot up and start listening to the master queue.
I don't need to find a free consumer system to send each message to. When a worker system is done processing a message, it picks up another one from the master queue.
Let me know your thoughts on it.
SEDA and VM:// work only on the same JVM.
Load balancing in Java messaging is usually achieved using JMS and the Competing Consumers pattern. You send messages to a queue, and multiple consumers compete to process them.
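The pattern itself can be tried out inside a single JVM before involving a broker. The sketch below uses a plain java.util.concurrent queue as a stand-in for the JMS queue, so it only illustrates the competing-consumers idea and does not work across machines. The class name and the idea of counting messages per worker are assumptions made for the demo.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class CompetingConsumers {
    /** Drains a pre-filled queue with several worker threads; returns how many messages each handled. */
    static int[] run(int messages, int workers) {
        BlockingQueue<Integer> master = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            master.add(i);
        }
        int[] handled = new int[workers];   // each slot is written by exactly one worker
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                // poll() returns null once the queue is empty, which ends this worker.
                while (master.poll() != null) {
                    handled[id]++;          // "process" the message
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();                   // wait so the counts are final and visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return handled;
    }
}
```

Calling run(100_000, 5) mirrors the numbers in the question: every message is handled exactly once, and whichever worker is free takes the next one, so the split between workers varies from run to run. With a real JMS queue the consumers would block waiting for new messages instead of exiting when the queue is empty.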
If the broker with its queue becomes a bottleneck, consider using the fan-out pattern and a network of brokers.
SEDA and VM endpoints are valid for the host Context and JVM respectively. To facilitate JVM-to-JVM messaging you will need to use an over-the-wire protocol component such as, but not limited to, Mina, HTTP or JMS.
The easiest way is to use JMS. If you have n routes listening on the same JMS queue, they will automatically load balance. If one goes away, the load will be balanced over the remaining ones. I recommend starting with ActiveMQ, as it is very easy to set up and well integrated with Camel. To make the broker highly available, you can either set up two standalone brokers or set up one embedded broker per Camel instance.