Kafka or Redis broker on embedded device - redis

I would like to understand the implication of running a Kafka broker on a embedded device that has the spec 2GH and 4GB of memory. I am also interest in other's personal experience of running Redis or Kafka on a stringent hardware requirement.
I've been playing around a setup that logs the data from sensors on a Raspberry PI device, the logs is push out to a web client that display these data real-time. It's easier to use a broker such as Redis or Kafka to facilitate with the data stream but, I am unsure about the performance of running it on embedded device.
In short, how is it like to run a Redis or Kafka broker on a embedded device in terms of performance. By the way, this is done on a Python env.

Related

HiveMQ/RabbitMQ as load balancing MQTT node(s) before Thingsboard IoT system

Our endpoint devices are pushing data over MQTT to an IoT system based on the Thingsboard IoT platform. There is only one MQTT topic called /telemetry where all devices connect. The server knows which device the data belongs to based on the device's token used as the MQTT username.
Due to not rare peaks of data loading, outages happen.
My question is:
Is it possible and how to use HiveMQ (RabbitMQ or some similar product) between devices and our IoT system to avoid data loss and smooth out peaks?
This post explains how to use Quality of Service levels, offline buffering, throttling, automatic reconnect and more to avoid data loss and maintain uptime.
The tldr; is that MQTT and HiveMQ have features built in to help avoid data loss, guaranteed delivery, traffic spikes and to handle back-pressure.
It may be worth considering what you can do with your existing tools before expanding your deploy footprint which just adds unnecessary complexity if unwarranted.
I would recommend using Apache Kafka or Confluent in between the MQTT Broker & ThingsBoard. Kafka stores all data on disk (instead of RAM in the case of RabbitMQ) and is scalable among multiple cluster nodes. You could also reload data to ThingsBoard by resetting offsets. This could be useful if there was an error in the configuration of a rulechain and you would have ThingsBoard reprocess the data again.
To connect with Kafka/Confluent you can use the ThingsBoard Integration.
Find more details here:
https://medium.com/python-point/mqtt-and-kafka-8e470eff606b

Is there a redis pub/sub replacement option, with high availability and redundancy, or, probably p2p messaging?

I have an app with hundreds of horizontally scaled servers which uses redis pub/sub, and it works just fine.
The redis server is a central point of failure. Whenever redis fails (well, it happens sometimes), our application falls into inconsistent state and have to follow recovery process which takes time. During this time the entire app is hardly useful.
Is there any messaging system/framework option, similar to redis pub/sub, but with redundancy and high availability so that if one instance fails, other will continue to deliver the messages exchanged between application hosts?
Or, better, is there any distributed messaging system in which app instances exchange the messages in a peer-to-peer manner, so that there is no single point of failure?

Is it a good way to run Kafka on Kubernetes?

For a large online application, use k8s to run it. The scale maybe daily activity user 500,000.
The application inside k8s need messaging feature - Pub/Sub, there are these options:
Kafka
RabbitMQ
Redis
Kafka
It needs zookeeper and good to run on os depends on disk I/O. So if install it into k8s cluster, how? The performance will be worse?
And, if keep Kafka outside of the k8s cluster, connect Kafka from application inside the k8s cluster, how about that performance? They are in the different layer, won't be slow?
RabbitMQ
It's slow than Kafka, but for a daily activity user 500,000 application, is it good enough? If so, maybe it's a good choice.
Redis
It's another option. Maybe the most simple one. But from the internet I got that it will lose message sometimes. If true, that's terrible.
So, the most important thing is, use Kafka(also with zookeeper) on k8s, good or not in this use case?
Yes, running Kafka on Kubernetes is great. Check out this example: https://github.com/Yolean/kubernetes-kafka. It includes ZooKeeper and Kafka as StatefulSets.
PS. Running any of the services in your question on Kubernetes will be pleasant. You can Google the name of the service and "kubernetes" and find example manifests. Many examples here: https://github.com/kubernetes/charts.
For Kafka, you can find some suggestion here. Kubernetes 1.7+ supports local persistent volume, which may be good for Kafka deployment.
You can also take a look to the following project :
https://github.com/EnMasseProject/barnabas
It's about running Kafka on Kubernetes and OpenShift as well. It provides deploying with StatefulSets with persistent volumes or just in memory (for developing or just testing purpose). It provides deploying for Kafka Connect and Prometheus metrics as well.
Another simple configuration of Kafka/Zookeeper on Kubernetes in DigitalOcean with external access:
https://github.com/StanislavKo/k8s_digitalocean_kafka
You can connect to Kafka from outside of AWS/DO/GCE by regular binary protocol. Connection is PLAINTEXT or SASL_PLAINTEXT (user/password).
Kafka cluster is StatefulSet, so you can scale cluster easily.

what are the maximum mqtt connections supported by activemq 5.10.0

I want to support around 100K mqtt connections using activemq. The activemq server is rejecting connections beyond 30K. How to tune activemq to support more number of connections.
I have tried the following
transportConnector name="mqtt" allowLinkStealing="true"
uri="mqtt+nio://0.0.0.0:1883?maximumConnections=100000&wireFormat.maxFrameSize=104857600&transport.defaultKeepAlive=60000&transport.closeAsync=false&useQueueForAccept=false
in activemq.xml but of no use.
I did some unix kernel tuning for number of open file fds to 100000.
Any one solved this problem ?
If you are going to handle > 100k connections I'd recommend looking into a dedicated MQTT broker instead of a multi-protocol message broker. You can see a list of MQTT brokers at the MQTT Github wiki.
ActiveMQ is afaik not designed for handling that much MQTT connections and is not optimized for MQTT because it's a multi-purpose Message Queue. If you want to stick with Apache software, perhaps using Apache Apollo can help although I don't know of any MQTT Apollo deployments with that size, but probably wort a try if you need a multi-protocol broker. Again, I'd recommend a dedicated MQTT broker for large amounts of MQTT connections.
You should definitely look into reactive and multi-threaded MQTT brokers if you want to handle that amount of connections and you should make sure that the MQTT broker you choose is known to work with your desired connection amount and load. HiveMQ for example is capable of handling >100k connections.
Full disclosure: I work for the company behind HiveMQ.
May I suggest you use Apache Apollo for MQTT connections when you have that number of concurrent sessions?
Apache Apollo is a sub project of ActiveMQ with the intent to make the broker scalable to a large number of connected clients. While ActiveMQ supports MQTT, it's not really optimized for this scenario.
JoramMQ (http://jorammq.com) is based on the Joram (http://joram.ow2.org) multi-protocol message broker and it supports more than 500K concurrent MQTT connections.
For anyone still trying to find a fitting MQTT broker for many connections here are my tests of multiple brokers (I should actually add ActiveMQ to the comparison). Performance is not the only thing to compare, but also clustering, monitoring, support, price. Final pick depeneds on your own needs.
Tests were conducted on a 32GB RAM, AMD 5800X, Ubuntu 18 PC.
50 000 MQTT clients connected with no ssl.
Clients subscribed to 4 channels & no messages were published.
Tests above 50k need multiple machines involved or some other tricks because of the 65k limit of outgoing sockets in the system.
Test results
RabbitMQ: 21GB of RAM and ~4 cores.
Mosquitto: 200Mb of RAM and ~0.05 core.
HiveMQ: 2.1GB of RAM and ~0.05 core.
EMQX: 1.4GB of RAM and ~1
core.
VerneMQ: 1.7GB of RAM and ~0.5 core.
If pricing is OK for you - HiveMQ lookes to me like the best broker.
If you are looking for something for free - check VerneMQ.

Real world example of Apache Helix, Zookeeper, Mesos and Erlang?

I am new in
Apache ZooKeeper : ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Apache Mesos : Apache Mesos is a cluster manager that simplifies the complexity of running applications on a shared pool of servers.
Apache Helix : Apache Helix is a generic cluster management framework used for the automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes.
Erlang Langauge : Erlang is a programming language used to build massively scalable soft real-time systems with requirements on high availability.
It sounds to me that Helix and Mesos both are useful for Clustering management System. How they are related to ZooKeeper? It'd better if someone give me a real world example for their usage.
I am curious to know How [BOINC][1] are distributing tasks to their clients? Are they using any of the above technologies? (Forget about Erlang).
I just need a brief view on it :)
Erlang was built by Ericsson, designed for use in phone systems. By design, it runs hundreds, thousands, or even 10s of thousands of small processes to handle tasks by sending information between them instead of sharing memory or state. This enables all sorts of interesting features that are great for high availability distributed systems such as:
hot code reloading. Each process is paused, it's relevant module code is swapped out, and it is resumed where it left off, so deploys can happen without restarting or causing significant interruption.
Easy distributed messaging and clustering. Sending a message to a local process or a remote one is fairly seamless in most instances.
Process-local GC. Garbage collection happens in each process independently instead of a global stop-the-world even like java, aiding in low-latency results.
Supervision trees and complex process hierarchy and monitoring/managing.
A few concrete real-world examples that makes great use of Erlang would be:
MongooseIM A highly performant and incredibly scalable, distributed XMPP / Chat server
Riak A distributed key/value store.
Mesos, on the other hand, you can sort of think of as a platform effectively for turning a datacenter of servers into a platform for teams and developers. If I, say as a company, own a datacenter with 10,000 physical servers, and I have 1,000 engineers developing hundreds of services, a good way to allow the engineers to deploy and manage services across that hardware without them needing to worry about the servers directly. It's an abstraction layer over-top of the physical servers to that allows you to share and intelligently allocate resources.
As a user of Mesos, I might say that I have Service X. It's an executable bundle that lives in location Y. Each instance of Service X needs 4 GB of RAM and 2 cores. And I need 8 instances which will be attached to a load balancer. You can specify this in configuration and deploy based on that config. Mesos will find hardware that has enough ram and CPU capacity available to handle each instance of that service and start it running in each of those locations.
It can handle a lot of other more complex topics about the orchestration of them as well, but that's probably a bit in-depth for this :)
Zookeepers most common use cases are Service Discover and configuration management. You can think of it, fundamentally, a bit like a nested key value store, where services can look at pre-defined paths to see where other services currently live.
A simple example is that I have a web service using a shared database cluster. I know a simple name for that database cluster and where the configuration for it lives in zookeeper. I can look up (or repeatedly poll) that path in zookeeper to check what the addresses of the active database hosts are. And on the other side, if I take a database node out of rotation and replace it with a new one, the config in zookeeper gets updated with the new address, and anything continually looking at it will detect this change and change where it's connected to.
A more complex use case for zookeeper is how Kafka uses it (or did at the time that I last used Kafka). Kafka has streams, and streams have many shards. Each consumer of each stream use zookeeper to save checkpoints in each shard after they have read and processed up to a certain point in the stream. That way if the consumer crashes or is restarted, it knows where to pick up in the stream.
I dont know about Meos and Earlang language. But this article might help you with Helix and Zookeeper.
This article tells us:
Zookeeper is responsible for gluing all parts together where Helix is cluster management component that registers all cluster details (cluster itself, nodes, resources).
The article is related to clustering in JBPM using helix and zookeeper.But with this you will get a basic idea on what helix and zookeeper is used for.
And from most of the articles i read online it seems like zookeeper and helix are used together.
Apache Zookeeper can be installed on a single machine or on a cluster.
It can be used to keep track of logs. It can provide various services on a distributed platform.
Storm and Kafka rely on Zookeeper.
Storm uses Zookeeper to store all state so that it can recover from an outage in any of its (distributed) component services.
Kafka queue consumers can use Zookeeper to store information on what has been consumed from the queue.