I am trying to represent Topics and Sub-topics in Kafka.
Example : Topic 'Sports' Sub-topic 'Football', 'Handball'
As far as I know, Kafka doesn't support this, so what I am using now are topics like 'Sports_Football', 'Sports_Handball'...
This is not really practical, because when we want the topic 'Sports' with all of its sub-topics we have to query every one of those topics.
We are also using Redis and Apache Storm. Is there a better way of doing this?
You are correct; there is no such thing as a "subtopic" in Kafka. However, consuming all topics that begin with the word 'Sports' is trivial. Assuming you're using Java, once you have initialized a consumer, use the method consumer.subscribe(Pattern.compile("^Sports_.+")) to subscribe to your "subtopics." Calling consumer.poll(timeout) will then read from all topics beginning with 'Sports_'.
The only downside to doing it this way is that newly created 'Sports_' topics are only picked up on the consumer's next metadata refresh, so there can be a short delay before they are consumed.
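A minimal sketch of that approach, assuming a reasonably recent kafka-clients version, a broker at localhost:9092, and a placeholder group id:

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SportsSubtopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "sports-all");               // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Subscribe to every topic whose name starts with "Sports_".
            consumer.subscribe(Pattern.compile("^Sports_.+"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s: %s%n", record.topic(), record.value());
                }
            }
        }
    }
}
```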
You are right, Apache Kafka doesn't support sub-topics. But Kafka does support message partitioning, and it guarantees that all messages with the same key go to the same partition.
You can consume all partitions or only a single one, so you can basically set a different key for each sport in order to separate the messages.
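As a rough sketch (the topic name, broker address, and keys are only examples), the producer would key each record by its sport so the default partitioner keeps each sport together:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SportsKeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition, so each sport stays together inside the 'Sports' topic.
            producer.send(new ProducerRecord<>("Sports", "Football", "goal in minute 90"));
            producer.send(new ProducerRecord<>("Sports", "Handball", "seven-meter throw"));
        }
    }
}
```

Note that two different keys can still hash to the same partition; the guarantee is only that a given key always lands in the same one.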
There is also the option of using Redis streams: using kafka-redis-connector you can push data to Redis streams, but consider the benefits and drawbacks of Redis streams first.
Another interesting solution is to use Kafka Streams, so you can create sub-topics:
Broker(Sport) ==> Sport_Stream(Football, Handball) ==> the consumer can receive topics from the broker or receive a sub-topic from the stream.
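A minimal Kafka Streams sketch of that idea, assuming a 'Sports' topic whose records are keyed by sport (the application id, broker address, and output topic names are illustrative):

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class SportsSubtopicStreams {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sports-splitter");   // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> sports = builder.stream("Sports");

        // Route each record to a per-sport "sub-topic" based on its key.
        sports.filter((sport, event) -> "Football".equals(sport)).to("Sports_Football");
        sports.filter((sport, event) -> "Handball".equals(sport)).to("Sports_Handball");

        new KafkaStreams(builder.build(), props).start();
    }
}
```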
Related
Sorry if this is answered in the documentation, but I need some more insight. We currently use RabbitMQ and need a distributed system. I would like to build a distributed system with 3 or more distributed brokers, named NEWYORK, NEVADA and TEXAS. I am looking to see whether it is workable to send queue messages with routing keys like NEWYORK.terminal.abc from NEVADA, with the ability to send a reply back via a replyTo-type option. Also things like NEVADA.jobQueue.fastpace from TEXAS, or TEXAS.queues, etc.
Then the ability to send TOPIC-type messages from NEWYORK.weather, with other sites subscribing to NEWYORK.weather, and so on.
Is this something that ActiveMQ/Artemis can do?
Yes, this sort of data transmission is done all the time with ActiveMQ.
Tip: Topics become confusing and complicated to configure once you go to a multi-broker architecture. Look into using Virtual Topics or Composite Destinations to get your data subscriptions lined up how you want, while maintaining pub-sub pattern.
Virtual Topic summary:
Producers send to a topic
Consumers read from specially named queue(s)
Ability to have multiple subscribers, and to separate local traffic from over-the-WAN traffic into separate queues
Support for consumer-provided server-side filtering using JMS standard selectors
ref: https://activemq.apache.org/virtual-destinations
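As a rough JMS sketch of the virtual-topic pattern (the broker URL, destination names, and consumer name are only examples): the producer publishes once to a topic named VirtualTopic.NEWYORK.weather, and each subscribing application reads from its own backing queue Consumer.<app>.VirtualTopic.NEWYORK.weather.

```java
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class VirtualTopicExample {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // assumed broker URL
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);

        // Each subscribing application consumes from its own backing queue.
        MessageConsumer consumer =
                session.createConsumer(session.createQueue("Consumer.siteA.VirtualTopic.NEWYORK.weather"));

        // The producer publishes once; the broker fans the message out to every consumer queue.
        MessageProducer producer =
                session.createProducer(session.createTopic("VirtualTopic.NEWYORK.weather"));
        producer.send(session.createTextMessage("snow expected tonight"));

        TextMessage received = (TextMessage) consumer.receive(5000);
        System.out.println(received.getText());

        connection.close();
    }
}
```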
RabbitMQ introduced streams last year. They claim streams work with AMQP 0.9 and 1.0 as well, as mentioned here. That is, theoretically we should be able to create a queue backed by a stream, connect as many workers as we need to the queue for fan-out, and each worker should get the messages delivered.
My question is, has anyone tried to use streams with celery yet? If so, please share any info on how to configure streams in Celery and your experience with them so far. There are unfortunately no blog posts nor any documentation I could find on this topic. I am hoping this post brings together all this information in one place.
The big advantage of streams is they allow large fan-out using the existing infra of RabbitMQ + Celery.
As far as I am aware, there is no way Celery can utilise streams. However, you could probably spin up a long-running Celery task that processes a particular stream. This is probably the reason why nobody has attempted this (or at least recorded it in a blog post or something similar): why bother using Celery for something it is not made for?
Just for testing purposes, I want to automate a scenario where I need to check the content of Kafka messages, so I wanted to know whether it is possible to read messages directly from a topic, without consumers, using the Kafka Java libraries?
I'm new to Kafka so any suggestion will be good for me.
Thanks in advance!
You could SSH to the broker in question and dump the log segments in deserialized form, but it would take less time to simply use a consumer in any language, not necessarily Java.
For testing purposes, the Kafka Java API provides MockProducer and MockConsumer, which are backed by in-memory lists rather than a full broker.
I haven't been able to find much information about this online. I'm wondering if it's possible to build a Flink app that can dynamically consume all topics matching a regex pattern and sync those topics to S3. Also, each topic being dynamically synced would have Avro messages, and the Flink app would use Confluent's Schema Registry.
You're in luck! Flink 1.4 was released just a few days ago, and it is the first version that supports consuming Kafka topics using a regex. According to the Javadocs, here is how you can use it:
FlinkKafkaConsumer011
public FlinkKafkaConsumer011(Pattern subscriptionPattern, DeserializationSchema<T> valueDeserializer, Properties props)
Creates a new Kafka streaming source consumer for Kafka 0.11.x. Use this constructor to subscribe to multiple topics based on a regular expression pattern. If partition discovery is enabled (by setting a non-negative value for FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS in the properties), topics with names matching the pattern will also be subscribed to as they are created on the fly.
Parameters:
subscriptionPattern - The regular expression for a pattern of topic names to subscribe to.
valueDeserializer - The de-/serializer used to convert between Kafka's byte messages and Flink's objects.
props - The properties used to configure the Kafka consumer client, and the ZooKeeper client.
Just note that a running Flink streaming application fetches topic metadata at the interval specified by the consumer config
FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS
This means every consumer re-syncs its metadata, including the topic list, at that interval. As the Javadoc above notes, discovery is only active when this property is set to a non-negative value, so configure it with your desired interval; once a new topic is added, you should expect the consumer to start consuming it within at most that interval.
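Putting the pieces together, a rough sketch (the broker address, group id, and the 30-second discovery interval are example values, not defaults) could look like:

```java
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema; // older Flink versions: org.apache.flink.streaming.util.serialization
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase;

public class RegexTopicJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "regex-sync");               // placeholder group id
        // Enable discovery of topics created after the job starts (example: every 30 seconds).
        props.setProperty(FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, "30000");

        FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<>(
                Pattern.compile("topic-.*"),   // subscribe to every topic matching the pattern
                new SimpleStringSchema(),
                props);

        env.addSource(consumer).print();
        env.execute("regex-topic-consumer");
    }
}
```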
Subscribing to Kafka topics with a regex pattern was added in Flink 1.4. See the documentation here.
S3 is one of the file systems supported by Flink. For reliable, exactly-once delivery of a stream into a file system, use the flink-connector-filesystem connector.
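For the S3 part, a sketch using the BucketingSink from flink-connector-filesystem might look like the following (the bucket path is a placeholder, and the S3 filesystem itself has to be configured separately, e.g. via Hadoop's s3a support):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

public class S3SinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source elided here; in practice this would be the regex-based Kafka consumer.
        DataStream<String> stream = env.fromElements("example record");

        // Writes records into time-bucketed files under the given base path.
        BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/kafka-sync"); // placeholder bucket
        stream.addSink(sink);

        env.execute("kafka-to-s3-sketch");
    }
}
```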
You can configure Flink to use Avro, but I'm not sure what the status is of interop with Confluent's schema registry.
For searching on these and other topics, I recommend the search on the Flink doc page. For example: https://ci.apache.org/projects/flink/flink-docs-release-1.4/search-results.html?q=schema+registry
I come from a RabbitMQ background, and with RabbitMQ, you can set up exchanges that route messages to different queues based on a routing key.
In Kafka, my current understanding is that topics can be thought of as queues (that never get emptied). However, I am interested in putting different messages into different topics based on certain criteria, and I would like to avoid doing that logic on the producer side.
Are there Kafka equivalent(s) to RabbitMQ's exchanges?
There is no equivalent. The only way to route different messages to different topics is to put that logic on the producer side. Even deciding which partition of a topic to send an individual message to is left up to the producer.
Kafka's great strength is that it's really simple. That's part of why Kafka can scale really, really well. The downside is that Kafka doesn't have the feature set of a conventional message queue.
Kafka has something called a message key. When a message carries a key, the default partitioner uses that key to choose a partition in the topic, so every subsequent message with the same key is pushed to the same partition.