Is there any way to read messages from a Kafka topic without a consumer? - testing

Just for testing purposes, I want to automate a scenario where I need to check the content of Kafka messages, so I wanted to know if it is possible to read messages directly from a topic, without a consumer, using the Kafka Java libraries?
I'm new to Kafka, so any suggestion would be welcome.
Thanks in advance!

You could SSH to the broker in question and dump the log segments in deserialized form, but it would take less time to simply use a consumer, in any language, not necessarily Java.
"For testing purposes" Kafka Java API provides MockProducer and MockConsumer, which are backed by Lists, not a full broker

Related

Is it possible to use RabbitMQ streams with Celery?

RabbitMQ introduced streams last year. They claim streams work with AMQP 0.9 and 1.0 as well, as mentioned here. That is, in theory we should be able to create a queue backed by a stream, connect as many workers as we need to fan out to the queue, and each worker should get the messages delivered.
My question is, has anyone tried to use streams with celery yet? If so, please share any info on how to configure streams in Celery and your experience with them so far. There are unfortunately no blog posts nor any documentation I could find on this topic. I am hoping this post brings together all this information in one place.
The big advantage of streams is they allow large fan-out using the existing infra of RabbitMQ + Celery.
As far as I am aware, there is no way Celery can utilise streams. However, you could probably spin up a long-running Celery task that processes a particular stream. This is probably the reason nobody has attempted it (or, more precisely, recorded the attempt in a blog post or similar): why bother using Celery for something it is not made for?

How can I retrieve the content of a Redis channel at the time of subscribe?

When my web app subscribes to a Redis channel (mostly on Application_Start), it should automatically load the current channel content, but not wait for the next publish within this channel.
I couldn't find any way to achieve this, but as this "problem" appears to be so common and trivial, I guess there must be an easy solution?
In the web app I'm using StackExchange.Redis (in case that's relevant). Who can help? Thanks in advance!
The answer is no; there is no way to do this using Redis pub/sub. Redis doesn't actually store the messages that are published to a channel, so you can't retrieve them when you subscribe to it.
Take a look at RabbitMQ with its persistent queues and message acknowledgements, which it provides out of the box.
As there's obviously no convenient option available in Redis, I'm now also publishing the channel message as a regular key-value pair. Clients read it from the key-value store before subscribing to the channel.
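A minimal sketch of that pattern, shown here with the Jedis Java client (the key and channel names are placeholders; the question uses StackExchange.Redis, but the same idea translates directly):

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPubSub;

    // Publisher side: store the latest payload under a key, then publish it.
    try (Jedis jedis = new Jedis("localhost", 6379)) {
        jedis.set("news:latest", "payload");   // hypothetical key
        jedis.publish("news", "payload");      // hypothetical channel
    }

    // Subscriber side: read the current value first, then subscribe for updates.
    try (Jedis jedis = new Jedis("localhost", 6379)) {
        String current = jedis.get("news:latest"); // "channel content" at subscribe time
        System.out.println("initial: " + current);
        jedis.subscribe(new JedisPubSub() {
            @Override
            public void onMessage(String channel, String message) {
                System.out.println("update: " + message);
            }
        }, "news"); // blocks, dispatching published messages to onMessage
    }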

What is difference between a Message Queue and ESB?

I was just reading about Enterprise Service Bus and trying to figure out how to implement it. However, the more I read about it, my conclusion was that it is just a glorified message queue.
I read about it here: What is an ESB and what is it good for?
We use RabbitMQ in our architecture quite a lot, and what I was having a hard time understanding was that there are many similarities between the two concepts:
Both are basically post and forget
You can post a message of any format in both queues
My question is: what is it that an ESB does that RabbitMQ is not able to do?
I have not used RabbitMQ, so I won't be able to comment on it. I have used an ESB and am currently using one.
ESB: It provides multiple ways of subscribing to your messages. It is mostly used in a publisher-subscriber model, in which topics and subscriptions are used. You publish your message payload to topics (similar to queues). Unlike a queue, a topic gives you the ability to have more than one subscription for a single topic. These subscriptions can be divided based on your business needs, and you can define a filter expression on a topic (also called a channel); with the specified filter, the appropriate subscriber will pull the message from the bus. A single message can also be consumed by multiple subscribers at a time. If no filtering is used on a topic, then all subscribers for that topic will pull the message from the channel.
This is an asynchronous, post-and-forget mechanism, as you mentioned. There is also a retry mechanism in an ESB, where delivery of a message is retried some number of times, I think 10 times (max), after which it is sent to a dead-letter queue.
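As a rough illustration of the topic/subscription-with-filter model, here is a minimal sketch using plain JMS with ActiveMQ rather than any particular ESB product; the 'Sports' topic, subscription name, and 'discipline' property are invented for the example:

    import javax.jms.*;
    import org.apache.activemq.ActiveMQConnectionFactory;

    // One topic, multiple filtered subscriptions (JMS message selectors).
    ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
    Connection connection = factory.createConnection();
    connection.setClientID("sports-app"); // required for durable subscriptions
    Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
    Topic topic = session.createTopic("Sports");

    // This subscriber only receives messages whose 'discipline' property is 'football'.
    MessageConsumer football = session.createDurableSubscriber(
            topic, "football-sub", "discipline = 'football'", false);
    football.setMessageListener(msg -> System.out.println("football message received"));

    connection.start();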
So if your requirement is to connect multiple enterprise systems with a loosely coupled architecture, then an ESB is a good option.
I hope this was helpful in understanding an ESB.

how to make sub-topics in Kafka

I am trying to represent Topics and Sub-topics in Kafka.
Example: topic 'Sports', with sub-topics 'Football' and 'Handball'.
And as far as I know, Kafka doesn't support this. What I am using now are topics like 'Sports_Football' and 'Sports_Handball'...
This is not really workable, because whenever we want the topic 'Sports' with all its sub-topics, we have to query every one of those topics.
We are also using Redis and Apache Storm. So please, is there a better way of doing this?
You are correct. There is no such thing as a "subtopic" in Kafka, however, consuming all topics that begin with the word 'Sports' is trivial. Assuming you're using Java, once you have initialized a consumer use the method consumer.subscribe(Pattern.compile("^Sports_.+")) to subscribe to your "subtopics." Calling consumer.poll(timeout) will now read from all topics beginning with 'Sports_'.
The only downside to doing it this way is that the consumer will have to resubscribe when new 'Sports_' topics are added.
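A minimal sketch of that pattern subscription (the bootstrap server, group id, and string deserializers here are placeholder assumptions):

    import java.time.Duration;
    import java.util.Properties;
    import java.util.regex.Pattern;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumption
    props.put("group.id", "sports-consumers");        // assumption
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Pattern.compile("^Sports_.+")); // all "subtopics" of Sports
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("%s: %s%n", record.topic(), record.value());
        }
    }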
Apache Kafka doesn't support it, you are right. But Kafka supports message partitioning. Kafka guarantees that all messages with the same key go to the same partition.
You can consume all partitions, or only a single one. So you can simply set a different key for each sport in order to separate the messages.
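A sketch of that idea, using sport names as record keys; the topic name, broker address, and serializer settings are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumption
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        // Same key => same partition, so each sport's messages stay together.
        producer.send(new ProducerRecord<>("Sports", "Football", "goal scored"));
        producer.send(new ProducerRecord<>("Sports", "Handball", "match started"));
    }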
There is also the option of using Redis Streams: with the kafka-redis-connector you can push data to Redis Streams. But consider the benefits and drawbacks of Redis Streams first.
Another interesting solution is using Kafka Streams, so you can create sub-topics:
Broker (Sports) ==> Sports stream (Football, Handball) ==> a consumer can receive the topic from the broker or receive a sub-topic from the stream.
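A hedged sketch of that Kafka Streams approach, assuming messages are keyed by sport; the application id, broker address, and topic names are placeholders:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sports-splitter"); // assumption
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> sports = builder.stream("Sports");

    // Route each record to a per-sport output topic based on its key.
    sports.filter((key, value) -> "Football".equals(key)).to("Sports_Football");
    sports.filter((key, value) -> "Handball".equals(key)).to("Sports_Handball");

    new KafkaStreams(builder.build(), props).start();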

Read all messages from the very beginning

Consider a group chat scenario where 4 clients connect to a topic on an exchange. These clients each send and receive messages to/from this topic.
Now imagine that a 5th client comes in and wants to read everything that was sent from the beginning of time (as in, since the topic was first created and connected to).
Is there a built-in functionality in RabbitMQ to support this?
Many thanks,
Edit:
For clarification, what I'm really asking is whether or not RabbitMQ supports SOW (State of the World), since I was unable to find it anywhere in the documentation (http://devnull.crankuptheamps.com/documentation/html/develop/configuration/html/chapters/sow.html).
Specifically, the question is: is there a way for RabbitMQ to replay all messages that have been sent to a topic when a new subscriber joins?
The short answer is no.
The long answer is maybe. If all potential "participants" are known up front, the participant queues can be set up and configured in advance and subscribed to the topic, and they will collect all messages published to the topic (matching the routing key) while the server is running. Additional server configuration can yield queues that persist across server reboots.
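A minimal sketch of pre-declaring such a durable participant queue with the RabbitMQ Java client; the exchange, queue, and binding-key names are invented for illustration:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    ConnectionFactory factory = new ConnectionFactory();
    factory.setHost("localhost"); // assumption

    try (Connection conn = factory.newConnection();
         Channel channel = conn.createChannel()) {
        // Durable exchange and queue survive broker restarts; once bound,
        // the queue accumulates matching messages even while its consumer is offline.
        channel.exchangeDeclare("chat", "topic", true);
        channel.queueDeclare("participant-5", true, false, false, null);
        channel.queueBind("participant-5", "chat", "room.#");
    }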
Note that the original question/feature request as described is inconsistent with RabbitMQ's architecture. RabbitMQ is supposed to be a transient storage node, where clients connect and disconnect at random. Messages put into queues are intended to be processed by exactly one consumer, and once a message is processed, the broker's job is to forget about it.
One other way of implementing such functionality is to have an audit queue to which all published messages are also routed, and a writer service that writes them all to an audit log somewhere (usually a persistent data store or a text file). This is something you would have to build, as there is currently no plug-in to automatically send messages out to persistent storage (e.g. Couchbase, Elasticsearch).
Alternatively, if used as a debug tool, there is the Firehose plug-in. This is satisfactory when you are able to manually enable/disable it, but is not a good long-term solution as it will turn itself off upon any interruption of the broker.
What you would like to do is not a correct usage of RabbitMQ. Message queues are not databases; they are not long-term persistence solutions the way an RDBMS is. You can mainly use RabbitMQ as a buffer for incoming messages, which, after a consumer handles them, get inserted into the database. When a new client connects to your service, the database is read, not the message queue.
Also, unless you are building a really big, highly scalable system, I doubt you actually need RabbitMQ.
Apache Kafka is the right solution for this use case. "Log compaction enabled topics", a.k.a. compacted topics, are designed specifically for this use case. But the catch is that your messages obviously have to be idempotent, strictly no delta business, because Kafka will compact from time to time and may retain only the last message for a given key.
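A minimal sketch of creating such a compacted topic with the Kafka AdminClient; the topic name, partition count, and replication factor are placeholders:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // assumption

    try (AdminClient admin = AdminClient.create(props)) {
        // cleanup.policy=compact keeps the latest message per key indefinitely.
        NewTopic topic = new NewTopic("chat-history", 1, (short) 1)
                .configs(Collections.singletonMap(
                        TopicConfig.CLEANUP_POLICY_CONFIG,
                        TopicConfig.CLEANUP_POLICY_COMPACT));
        admin.createTopics(Collections.singleton(topic)).all().get();
    }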