HiveMQ/RabbitMQ as load balancing MQTT node(s) before Thingsboard IoT system

Our endpoint devices are pushing data over MQTT to an IoT system based on the Thingsboard IoT platform. There is only one MQTT topic called /telemetry where all devices connect. The server knows which device the data belongs to based on the device's token used as the MQTT username.
Because peaks in data load are not rare, outages happen.
My question is:
Is it possible to place HiveMQ (RabbitMQ or some similar product) between the devices and our IoT system to avoid data loss and smooth out the peaks, and if so, how?

This post explains how to use Quality of Service levels, offline buffering, throttling, automatic reconnect and more to avoid data loss and maintain uptime.
The tl;dr is that MQTT and HiveMQ have built-in features to help avoid data loss, guarantee delivery, absorb traffic spikes and handle back-pressure.
It may be worth considering what you can do with your existing tools before expanding your deployment footprint, which just adds unnecessary complexity if it is unwarranted.
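As a rough illustration, here is a minimal device-side sketch using the Python Paho MQTT client (assuming the Paho 1.x API; the broker host, client id and token below are placeholders, not from the question). It uses QoS 1, a persistent session, automatic reconnect and a bounded client-side queue, so short outages and peaks do not immediately translate into lost telemetry:

import paho.mqtt.client as mqtt

DEVICE_TOKEN = "device-access-token"   # placeholder device token used as the MQTT username
BROKER_HOST = "broker.example.com"     # placeholder broker address

client = mqtt.Client(client_id="device-42", clean_session=False)  # persistent session
client.username_pw_set(DEVICE_TOKEN)

client.max_queued_messages_set(1000)   # buffer up to 1000 unsent QoS>0 messages while disconnected
client.max_inflight_messages_set(20)   # limit in-flight messages to apply back-pressure on the publisher
client.reconnect_delay_set(min_delay=1, max_delay=60)  # automatic reconnect with backoff

client.connect(BROKER_HOST, 1883, keepalive=60)
client.loop_start()                    # background network loop handles reconnects

# QoS 1 gives at-least-once delivery once the broker is reachable again.
client.publish("/telemetry", '{"temperature": 21.5}', qos=1)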

I would recommend using Apache Kafka or Confluent between the MQTT broker and ThingsBoard. Kafka stores all data on disk (instead of in RAM, as RabbitMQ does) and scales across multiple cluster nodes. You could also reload data into ThingsBoard by resetting consumer offsets. This could be useful if there was an error in the configuration of a rule chain and you want ThingsBoard to reprocess the data.
To connect with Kafka/Confluent you can use the ThingsBoard Integration.
Find more details here:
https://medium.com/python-point/mqtt-and-kafka-8e470eff606b
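For illustration only, a minimal bridge along these lines could look like the following (Python with paho-mqtt and kafka-python; the hosts and topic names are placeholder assumptions, and in practice the ThingsBoard Kafka integration would sit on the consuming side):

import paho.mqtt.client as mqtt
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka.example.com:9092")  # placeholder Kafka broker

def on_message(client, userdata, msg):
    # Kafka keeps the data on disk; ThingsBoard can re-read it later by
    # resetting consumer group offsets if a rule chain was misconfigured.
    producer.send("telemetry", key=msg.topic.encode(), value=msg.payload)

bridge = mqtt.Client(client_id="kafka-bridge")
bridge.on_message = on_message
bridge.connect("mqtt.example.com", 1883)   # placeholder MQTT broker
bridge.subscribe("/telemetry", qos=1)
bridge.loop_forever()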

Related

RabbitMQ, VerneMQ and HiveMQ - Distribution and HA

I'm trying to pick one up for a project I'm doing.
It's utterly unclear from the documentation how HA works in all of them except for RabbitMQ, which states that incoming requests will always find their way to the node that originally created the queue in question. Meaning if I want to publish to QueueA, which was created by Node1, but end up making the request at Node2, I will be internally routed to Node1.
I, on the other hand, was looking for a TRUE distributed solution where I would be able to address any node without having my request re-routed internally, and get an answer back from it.
VerneMQ provides clustering without the use of any additional components like Zookeeper, full synchronisation of the topic tree and subscriptions, and self-healing of netsplits. VerneMQ routes every message from any cluster node to any other node where subscribers live. VerneMQ can do queue migration in the background for offline queues.
VerneMQ has an eventually consistent synchronisation model. It broadcasts events in the cluster, but also uses anti-entropy to make sure the synchronisation happens.
Contrary to brokers based on Mnesia (Rabbit), VerneMQ has a defined model regarding CAP (consistency, availability, partition tolerance).
Contrary to other brokers, VerneMQ offers these features in its fully Open Source version!
Disclaimer: I'm with the VerneMQ project.
Disclaimer: I work for HiveMQ.
From the HiveMQ User Guide:
An MQTT broker cluster is a distributed system that represents one logical MQTT broker to connected MQTT clients, meaning that for the MQTT client it makes zero difference whether it is connected to a single HiveMQ broker node or a multi-node HiveMQ cluster.
Data sets get replicated between the different nodes, and when a node leaves or joins the cluster the data gets re-synchronised to ensure the configured replica count is always upheld.
HiveMQ provides a masterless, fully elastic cluster with true high availability.
No additional software such as Zookeeper is required.

Kafka or Redis broker on embedded device

I would like to understand the implications of running a Kafka broker on an embedded device with a 2 GHz CPU and 4 GB of memory. I am also interested in others' personal experience of running Redis or Kafka under stringent hardware constraints.
I've been playing around with a setup that logs data from sensors on a Raspberry Pi device; the logs are pushed out to a web client that displays the data in real time. It's easier to use a broker such as Redis or Kafka to facilitate the data stream, but I am unsure about the performance of running one on an embedded device.
In short, what is it like to run a Redis or Kafka broker on an embedded device in terms of performance? By the way, this is done in a Python environment.
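For context, the lightweight end of the setup being described could look something like this sketch (assuming redis-py, a local Redis instance, and placeholder channel/sensor names): sensor readings are published to a Redis channel, and the web backend subscribes and relays them to the browser.

import json
import time
import redis

r = redis.Redis(host="localhost", port=6379)   # Redis running locally on the Pi

def read_sensor():
    return {"ts": time.time(), "temperature": 21.5}   # placeholder for the real sensor read

while True:
    # The web backend subscribes to this channel and pushes updates to clients.
    r.publish("sensor-logs", json.dumps(read_sensor()))
    time.sleep(1.0)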

Client queue persistence

AMQP brokers have persistence settings that allow guaranteed delivery, but that only works if the message actually reaches the broker. If there is a network failure and a subsequent client crash/reboot, messages could be lost. Is there some way in RabbitMQ, ActiveMQ or some other messaging framework for the client (producer) to persist messages to disk, so that in the event the client crashes or is rebooted any unsent messages will not be lost?
I have seen people run a broker locally in order to get around this issue. That seems like an unnecessary amount of work, especially if you don't have much control over the deployment of your client.
In reality you've answered your own question pretty well. Many people looking for client-side persistence turn to embedded brokers because it's actually a very good solution. Having a local broker that can store and forward gives you a lot more flexibility than a built-in persistence layer in each client: all local clients can share one broker instance, which allows you to move storage as needed in cases where your stored local messages are building up due to unforeseen remote downtime.
There are of course some client implementations that do offer storage, but finding one depends on your chosen broker/protocol and of course your willingness to shell out money for support or licensing if that client happens not to be an open source implementation. The MQTT Paho client does, I think, have a local storage option, as do some others.
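For what it's worth, the Java Paho client ships file persistence (MqttDefaultFilePersistence); the Python Paho client does not, but a hand-rolled version of the same idea is not much code. The following is only a sketch under that assumption (Paho 1.x callback API; the spool path, topic and broker host are placeholders): spool each message to disk before publishing with QoS 1, and delete the file once the broker acknowledges it.

import json
import os
import uuid
import paho.mqtt.client as mqtt

SPOOL_DIR = "/var/spool/telemetry"           # placeholder spool path
os.makedirs(SPOOL_DIR, exist_ok=True)

pending = {}                                 # message id -> spool file path

def on_publish(client, userdata, mid):
    # The broker acknowledged the QoS 1 publish, so the spool copy can be removed.
    path = pending.pop(mid, None)
    if path and os.path.exists(path):
        os.remove(path)

client = mqtt.Client(client_id="producer-1", clean_session=False)
client.on_publish = on_publish
client.connect("broker.example.com", 1883)   # placeholder broker
client.loop_start()

def send(payload):
    path = os.path.join(SPOOL_DIR, str(uuid.uuid4()) + ".json")
    with open(path, "w") as f:
        json.dump(payload, f)                # persist before handing to the network
    info = client.publish("telemetry", json.dumps(payload), qos=1)
    pending[info.mid] = path

def replay_spool():
    # Called on startup: republish anything that was never acknowledged.
    for name in os.listdir(SPOOL_DIR):
        path = os.path.join(SPOOL_DIR, name)
        with open(path) as f:
            payload = json.load(f)
        info = client.publish("telemetry", json.dumps(payload), qos=1)
        pending[info.mid] = path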

what are the maximum mqtt connections supported by activemq 5.10.0

I want to support around 100K MQTT connections using ActiveMQ. The ActiveMQ server is rejecting connections beyond 30K. How can I tune ActiveMQ to support more connections?
I have tried the following
<transportConnector name="mqtt" allowLinkStealing="true"
    uri="mqtt+nio://0.0.0.0:1883?maximumConnections=100000&amp;wireFormat.maxFrameSize=104857600&amp;transport.defaultKeepAlive=60000&amp;transport.closeAsync=false&amp;useQueueForAccept=false"/>
in activemq.xml, but to no avail.
I also did some Unix kernel tuning to raise the number of open file descriptors to 100000.
Has anyone solved this problem?
If you are going to handle > 100k connections I'd recommend looking into a dedicated MQTT broker instead of a multi-protocol message broker. You can see a list of MQTT brokers at the MQTT Github wiki.
ActiveMQ is AFAIK not designed for handling that many MQTT connections and is not optimized for MQTT because it's a multi-purpose message queue. If you want to stick with Apache software, perhaps using Apache Apollo can help, although I don't know of any MQTT Apollo deployments of that size; it is probably worth a try if you need a multi-protocol broker. Again, I'd recommend a dedicated MQTT broker for large numbers of MQTT connections.
You should definitely look into reactive and multi-threaded MQTT brokers if you want to handle that amount of connections and you should make sure that the MQTT broker you choose is known to work with your desired connection amount and load. HiveMQ for example is capable of handling >100k connections.
Full disclosure: I work for the company behind HiveMQ.
May I suggest you use Apache Apollo for MQTT connections when you have that number of concurrent sessions?
Apache Apollo is a subproject of ActiveMQ with the intent of making the broker scalable to a large number of connected clients. While ActiveMQ supports MQTT, it's not really optimized for this scenario.
JoramMQ (http://jorammq.com) is based on the Joram (http://joram.ow2.org) multi-protocol message broker and it supports more than 500K concurrent MQTT connections.
For anyone still trying to find a fitting MQTT broker for many connections, here are my tests of multiple brokers (I should actually add ActiveMQ to the comparison). Performance is not the only thing to compare; also consider clustering, monitoring, support and price. The final pick depends on your own needs.
Tests were conducted on a PC with 32GB of RAM, an AMD 5800X and Ubuntu 18.
50,000 MQTT clients were connected, without SSL.
Clients subscribed to 4 channels and no messages were published.
Tests above 50k need multiple machines, or some other tricks, because of the ~65k limit on outgoing sockets per system.
Test results:
RabbitMQ: 21GB of RAM and ~4 cores.
Mosquitto: 200MB of RAM and ~0.05 core.
HiveMQ: 2.1GB of RAM and ~0.05 core.
EMQX: 1.4GB of RAM and ~1 core.
VerneMQ: 1.7GB of RAM and ~0.5 core.
If the pricing is OK for you, HiveMQ looks to me like the best broker.
If you are looking for something free, check out VerneMQ.
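In case it helps to reproduce a test like this, a rough connection-load sketch in Python with paho-mqtt could look as follows (the broker host, client count and topic names are placeholder assumptions; as noted above, a real 50k run needs several machines or multiple source IPs):

import time
import paho.mqtt.client as mqtt

BROKER = "broker.example.com"   # placeholder broker under test
N_CLIENTS = 1000                # per machine; scale out across machines for 50k total

clients = []
for i in range(N_CLIENTS):
    c = mqtt.Client(client_id="load-%d" % i)
    c.connect(BROKER, 1883, keepalive=300)
    for ch in range(4):
        c.subscribe("bench/channel/%d" % ch, qos=0)   # 4 channels, nothing published
    c.loop_start()              # one background network thread per client
    clients.append(c)

time.sleep(3600)                # hold the connections open and watch the broker's RAM/CPU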

MQTT vs MQ design considerations

I don't have a specific query here ; just need some design guidelines.
I came across this article on Node.js, MQTT and WebSockets.
I guess we can achieve a similar purpose using Node/Java + ActiveMQ + WebSockets. My query is how to select between MQ and MQTT. Can I safely use an "open" server like Mosquitto in a medium-to-large scale project, compared to ActiveMQ?
This article has provided some insight, and it seems like I should use both MQ and MQTT, as MQTT may help if I get lightweight clients in the future.
Thanks!
Adding to what Shashi has said, these have different capabilities and use cases.
MQTT defines a standard wire protocol for pub/sub and, as Shashi noted, is designed for very lightweight environments. As such it has a very minimal wire format, a few basic qualities of service and a basic feature set.
Traditional message queueing systems on the other hand are generally proprietary (although AMQP aims to change that), cover both point-to-point and pub/sub, offer many qualities of service and tend to have a more heavyweight wire format, although this exists to support enhanced feature sets such as reply-to addressing, protocol conversion, etc.
A good example of MQTT would be where you have endpoints in phones, tablets and set-top boxes. These have minimal horsepower, memory and system resources. Typically connections from these either stay MQTT and they talk amongst themselves, or they are bridged to an enterprise-class MQ where they can intercommunicate with back-end applications. For example, an MQTT-based chat client might talk directly to another through the MQTT broker. Alternatively, an MQTT-based content-delivery system would bridge to an enterprise messaging network which hosted the ads and other content to be delivered to apps running on phones and tablets. The enterprise back-end would manage all the statistics of ad delivery and views upon which billings are based and the MQTT leg allows the content to be pushed with minimal battery or horsepower consumption on the end-user device.
So MQTT is used for embedded systems and end-user devices where power, bandwidth and network stability are issues. This is often in combination with traditional MQ messaging, although I haven't ever seen MQTT used as the exclusive transport for traditional messaging applications. Presumably, this is because MQTT lacks some of the more robust features such as message correlation, reply-to addressing and point-to-point addressing that have been core to messaging for 20 years.
The MQTT protocol is suited to small devices, like sensors and mobile phones, that have a small memory footprint. These devices are typically located in a brittle network and usually have low computing power.
These devices connect to an organization's back-end network via the MQTT protocol for sending and receiving messages. For example, a temperature sensor in an oil pipeline would collect the temperature of the oil flowing through the pipe and send it to the control center. In response, a command message could be sent over MQTT to another device to reduce or stop the flow of oil through that pipe.
WebSphere MQ has the capability to send/receive messages to/from MQTT devices. So if you plan to implement a messaging-based solution that involves devices and sensors, you can consider MQ and MQTT.
HTH
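To make the pipeline example above concrete, here is a minimal device-side sketch in Python with paho-mqtt (the topic names, broker host and actuator call are hypothetical placeholders): the sensor publishes temperature readings and listens on a command topic for a stop instruction from the back end.

import json
import time
import paho.mqtt.client as mqtt

def on_command(client, userdata, msg):
    command = json.loads(msg.payload)
    if command.get("action") == "stop":
        print("Closing valve")                # placeholder for the real actuator call

client = mqtt.Client(client_id="pipeline-sensor-7")
client.message_callback_add("pipeline/7/command", on_command)
client.connect("broker.example.com", 1883)    # placeholder broker
client.subscribe("pipeline/7/command", qos=1)
client.loop_start()

while True:
    reading = {"ts": time.time(), "temperature": 83.2}
    client.publish("pipeline/7/temperature", json.dumps(reading), qos=1)
    time.sleep(10)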
As already discussed, MQTT defines an applicative wire protocol (i.e. how the information is organized and then serialized before being transferred).
Mosquitto, or any other MQTT broker, is just an implementation of the hub-and-spoke integration pattern, just like JMS- and AMQP-based brokers; the difference lies in the wire protocol at the transport level. AMQP defines a standardized transport wire protocol, whereas JMS brokers like ActiveMQ define their own proprietary format, namely OpenWire. Of course, non-standard implementations like Mosquitto implement a proprietary wire transport protocol (this impacts interoperability, but can be a better choice in terms of performance).
Back to the question. Brokers like Mosquitto can be used in real scenarios, according to your needs in terms of scalability and reliability: normally, clustering is needed to assure i. availability, ii. reliability and iii. scalability. Brokers designed for PANs (Personal Area Networks) normally do not provide such features OTB (Out of The Box); ActiveMQ does provide that.
In conclusion, it's up to your requirements to pick the best solution for you.