Kafka HBase Sink Connector unable to deliver its messages to HBase - ssl

I have a particular Kafka HBase Sink Connector problem for which I would appreciate any advice or suggestions.
It is a 3-node Kafka cluster: 2 nodes for connect-distributed and 1 node for the Schema Registry + Kafka Streams. The Kafka version is 0.10.1, part of the Hortonworks platform 2.6.3. SSL and Kerberos authentication are also configured. On top of it I have a custom Kafka application that receives messages, processes them via Kafka Streams, and delivers them to HBase.
The process model is:
1) Input topic;
2) Processing (in Kafka Streams);
3) Output topic;
4) HBase sink connector;
5) HBase.
The delivered messages in 1) are successfully transferred and processed up to and including step 3). Then, although the sink connector appears to run fine, no message is delivered to HBase.
That being said, I tested our custom application model with unit tests that create an embedded Kafka cluster with its own basic settings, and the tests were successful. This quite likely indicates that the connectivity problem comes from some cluster setting(s).
For your information, I observed 3 specific things:
The standard console consumer is able to successfully consume the messages from the sink topic;
No consumer id is established for the sink connection;
The connection process starts successfully but then stops for reasons that are not logged, and never reaches the WorkerSinkTask Java class, where the writing to HBase actually happens.
An additional important point is the whole SSL encryption and Kerberos authentication setup, which might be misconfigured.
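For reference, here is a minimal sketch of the security settings a Connect worker of that era typically needs (property names are the standard Kafka client configs; paths and passwords are placeholders). One common gotcha that would match the "no consumer id" symptom: the consumer a sink connector creates does not inherit the worker's own security settings, and needs its own `consumer.`-prefixed copies in the worker config:

```properties
# Worker's own connection to the brokers (placeholder paths/passwords)
security.protocol=SASL_SSL
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/security/kafka.client.truststore.jks
ssl.truststore.password=changeit

# The sink connector's consumer does NOT automatically inherit the above;
# it needs the same settings repeated under the consumer. prefix:
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=GSSAPI
consumer.sasl.kerberos.service.name=kafka
consumer.ssl.truststore.location=/etc/security/kafka.client.truststore.jks
consumer.ssl.truststore.password=changeit
```

If the prefixed settings are missing, the sink's consumer may fail to join the group without an obvious error, which would be consistent with the connector starting fine yet never delivering anything.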
In case anyone has faced such a case, I would greatly appreciate any comments that could be of help.
Dimitar

Related

Azure IoT on Edge - IoTSDK - Read batch of messages from ModuleClient

I'm trying to develop a high-frequency message dispatching application, and I'm observing the behavior of the SDK regarding message reading from the ModuleClient connected to the edgeHub using the "MQTT on TCP Only" transport settings.
It seems that there is no way to read multiple messages at a time (batch) from the edgeHub (I think it is something related to the underlying protocol).
So the result is that one must sequentially read a message, process it, and give the ack to the hub.
Is there a way to process multiple messages at a time without waiting for the processing of the previous one?
Is this "limitation" tied to the underlying protocol?
I'm using Microsoft.Azure.Devices.Client 1.37.2 on a .NET Core 3.1 application deployed on Azure Kubernetes (AKS) by using Azure IoT Edge on Kubernetes workload.
You are correct: you cannot batch with the MQTT protocol. This is a limitation of IoT Hub when using MQTT.
IoT Hub only supports batch send over AMQP and HTTPS at the moment.
The MQTT implementation loops over the batch and sends each message
individually.
Ref: https://github.com/Azure/azure-iot-sdk-csharp
I suggest you add a new feature request if you need IoT Hub to support batching over MQTT: https://feedback.azure.com/forums/321918-azure-iot-hub-dps-sdks
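Since batch receive isn't available over MQTT, a practical workaround is to keep the sequential receive loop but fan the processing out to a worker pool, so one slow message doesn't block the next read. A minimal sketch in Python, where `receive`, `process`, and `ack` are hypothetical stand-ins for the SDK calls (note that whether out-of-order acks are acceptable depends on the transport, so verify the ordering rules for your hub before adopting this):

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(receive, process, ack, workers=4):
    """Receive sequentially (the MQTT constraint) but process concurrently.

    receive: callable returning an iterable of messages
    process: callable handling one message
    ack:     callable acknowledging one message
    """
    def handle(msg):
        process(msg)
        ack(msg)  # ack only after successful processing

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for msg in receive():           # still one message at a time
            pool.submit(handle, msg)    # but processing overlaps
    # leaving the with-block waits for all in-flight work to finish
```

The receive loop stays single-threaded, so the protocol's one-at-a-time delivery is respected, while throughput is no longer bounded by per-message processing latency.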

Why did we only receive the response half of the time (round-robin) with "Spring Cloud DataFlow for HTTP request/response" approach deployed in PCF?

This issue is related to 2 earlier questions:
How to implement HTTP request/reply when the response comes from a rabbitMQ reply queue using Spring Integration DSL?
How do I find the connection information of a RabbitMQ server that is bound to a SCDF stream deployed on Tanzu (Pivotal/PCF) environment?
As you can see from the update to question 2 above, we can receive the correct response back from the rabbit sink. However, it only works half of the time, alternating in a round-robin way (success-timeout-success-timeout-...). The outside HTTP app was implemented with Spring Integration as shown in question 1: sending the request to the rabbit source request queue and receiving the response from the rabbit sink response queue. This only happened in the PCF environment, after we deployed the outside HTTP app and created the stream (see the following POC stream) there. However, it works locally all the time (NOT alternately). Did we miss anything? Not sure what the culprit is in PCF. Thanks.
rabbitSource: rabbit --queues=rabbitSource | my-processor | rabbitSink: rabbit --routing-key=pocStream.rabbitSink.pocStream
It sounds like you have several instances of your stream in that PCF environment. That way there is more than one subscriber (round-robin feels like two) to the same RabbitMQ queue, where there must be only one consumer for that queue: only the initiator of the request waits for the reply, so odd (or even) replies go to a different consumer of the same queue. I don't place this as an answer, just because it is the best guess at what is going on, since you don't see the problem locally.
Please investigate your PCF environment and how it scales instances for your stream. There might also be an SCDF option that does the scaling for you.
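One concrete way to check the instance counts, assuming the SCDF shell and the `cf` CLI are available (app names below come from the POC stream and are illustrative):

```shell
# See how many instances each deployed app is running in PCF
cf apps

# Redeploy the stream pinning each app to a single instance
# (deployer.<app>.count is the standard SCDF deployer property)
stream deploy pocStream --properties "deployer.rabbit.count=1,deployer.my-processor.count=1"
```

If any of the apps shows more than one instance, that would explain the alternating success/timeout pattern.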

One mule app server in cluster polling maximum message from MQ

My Mule application is comprised of 2 nodes running in a cluster, and it listens to an IBM MQ cluster (basically connecting to 2 queue managers). There are situations where one Mule node pulls more than 80% of the messages from the MQ cluster and the other Mule node picks up the remaining 20%. This is causing CPU performance issues.
We have double-checked that all load balancing is proper, and only occasionally do we get the CPU performance problem. Can anybody give some ideas on what the possible reason could be?
Example: in the last scenario there were 200,000 messages in the queue, and the node2 Mule server picked up 92% of the messages from the queue within a few minutes.
This issue has been fixed now. We got to the root cause: our Mule application running on MULE_NODE01 reads/writes to WMQ_NODE01, and similarly for node 2. One of the Mule nodes (let's say MULE_NODE02) reads from the Linux/Windows file system and puts huge messages onto its corresponding WMQ_NODE02. It is then IBM MQ that tries to push the maximum load to the other WMQ node to balance the workload. That's why MULE_NODE01 reads all those loaded files from WMQ_NODE01 and causes CPU usage alerts.
#JoshMc your clue helped a lot in understanding the issue, thanks a lot for helping.
It's the WMQ node in a cluster that tries to push maximum load to the other WMQ node; it seems this is how MQ works internally.
To solve this, we are now connecting our Mule nodes to an MQ gateway rather than making 1-to-1 connectivity.
This could be solved by avoiding the race condition caused by multiple listeners:
Configure the listener in the cluster to the primary node only;
Republish the message to a persistent VM queue;
Move the logic to another flow that can be triggered via a VM listener, and let the Mule cluster do the load balancing.
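A rough Mule 4 sketch of that pattern, assuming the JMS and VM connectors (attribute and element names should be checked against your connector versions; queue names are illustrative):

```xml
<!-- Persistent VM queue so buffered work survives a node restart -->
<vm:config name="VM_Config">
    <vm:queues>
        <vm:queue queueName="workQueue" queueType="PERSISTENT"/>
    </vm:queues>
</vm:config>

<!-- Only the primary node in the cluster polls MQ -->
<flow name="mq-intake">
    <jms:listener config-ref="MQ_Config" destination="INPUT.QUEUE"
                  primaryNodeOnly="true"/>
    <vm:publish config-ref="VM_Config" queueName="workQueue"/>
</flow>

<!-- The VM listener runs on all nodes; the cluster load-balances it -->
<flow name="mq-processing">
    <vm:listener config-ref="VM_Config" queueName="workQueue"/>
    <!-- processing logic here -->
</flow>
```

This decouples "who reads from MQ" (one node) from "who does the work" (all nodes), so the skew in MQ delivery no longer dictates CPU distribution.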

Kafka To S3 Connector

Let's assume we are using the Kafka S3 Sink Connector in standalone mode.
As written on the Confluent page, it has an exactly-once delivery guarantee.
I don't understand how that works...
If, for example, at some point in time the connector wrote messages to S3 but didn't manage to commit the offsets back to Kafka and crashed, will it process the previous messages again the next time it starts up?
Or does it use transactions internally?
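For the default partitioners, the connector's exactly-once story rests on idempotent, deterministic object naming rather than Kafka transactions: each flushed file's S3 key is derived from the topic, partition, and the starting offset of the records it contains, so re-writing the same batch after a crash simply overwrites the same object. A simplified sketch of that naming scheme (the layout mirrors the connector's documented default, but treat the details as illustrative):

```python
def s3_object_key(topic: str, partition: int, start_offset: int,
                  fmt: str = "avro") -> str:
    """Deterministic key in the style of the S3 sink's default layout:
    topics/<topic>/partition=<p>/<topic>+<p>+<startOffset>.<ext>
    """
    return (f"topics/{topic}/partition={partition}/"
            f"{topic}+{partition}+{start_offset:010d}.{fmt}")

# Re-processing the same batch after a crash yields the same key, so the
# second upload overwrites the first: the write is idempotent, and consumers
# reading from S3 never see duplicates.
```

So yes, on restart the connector does reprocess records from the last committed offset, but because the resulting S3 objects are byte-for-byte replacements under identical keys, the end result is still exactly-once from the reader's point of view.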

MQTT backend scaling

I am currently developing a typical IoT service. At the moment, multiple devices connect to one MQTT broker (mosquitto), and my Java backend also connects to the broker (Paho).
The problem I see is the following:
When I have multiple instances of my Java backend, every backend will receive and process every message. That's a big issue. I just want to deliver a message to only one Java backend. Does anybody have an idea how to deal with this problem?
By the way: Java backends will be added or removed depending on the load.
There are a couple of options:
Place a queuing system between your application and the MQTT broker, possibly something like Apache Kafka.
The HiveMQ and IBM MessageSight brokers support (different implementations of) something called shared subscriptions. This allows messages to be shared out between more than one client. Shared subscriptions are likely to be formally added to the MQTT v5 spec, which should mean they will be added to more brokers and have a standard implementation.
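With a broker that supports shared subscriptions, each backend instance subscribes through a shared group and the broker delivers each message to only one member of the group, which gives exactly the scale-out behavior asked for. A small sketch, assuming the paho-mqtt client and an illustrative topic and group name:

```python
def shared_topic(group: str, topic_filter: str) -> str:
    """Build an MQTT shared-subscription filter: $share/<group>/<filter>."""
    return f"$share/{group}/{topic_filter}"

# With paho-mqtt against a broker that supports shared subscriptions:
#   import paho.mqtt.client as mqtt
#   client = mqtt.Client()
#   client.connect("broker.example.com")
#   client.subscribe(shared_topic("backend-pool", "devices/+/telemetry"))
#
# Every backend instance uses the same group name ("backend-pool"), so the
# broker distributes messages across the instances instead of fanning out
# a copy to each one. Instances can join and leave the group freely, which
# fits the "backends added or removed depending on load" requirement.
```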