Reactor Kafka: consume messages synchronously and process them asynchronously


I'm quite new to the reactive world and am using Spring WebFlux + Reactor Kafka.
kafkaReceiver
    .receive()
    // .publishOn(Schedulers.boundedElastic())
    .doOnNext(a -> log.info("Reading message: {}", a.value()))
    .concatMap(kafkaRecord ->
        // perform DB operation (saveToDb is a hypothetical method returning Mono<Void>)
        saveToDb(kafkaRecord)
            // then acknowledge the record's offset
            .doOnSuccess(v -> kafkaRecord.receiverOffset().acknowledge())
    )
    .doOnError(e -> log.error("Error", e))
    .retry()
    .subscribe();
I understand that to parallelise message consumption I have to instantiate one KafkaReceiver per partition, but is it possible (and recommended) to read a partition's messages sequentially and process them asynchronously, including the manual acknowledgement?
This is the desired output:
Reading message:1
Reading message:2
Reading message:3
Reading message:4
Stored message 1 in DB + ack
Reading message:5
Stored message 2 in DB + ack
Stored message 5 in DB + ack
Stored message 3 in DB + ack
Stored message 4 in DB + ack
In case of errors, I'm thinking of publishing the record to a dead-letter topic (DLT).
I've also tried flatMap, but the entire processing still seems to happen sequentially on a single thread. If I publish to a new scheduler, the processing just moves to a different single thread.
If what I'm asking is possible, can someone please help me with a code snippet?

What does your current code log?
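The pattern the question asks for (read one partition in order, process records concurrently, acknowledge each one when its DB write finishes) has one subtlety: Kafka keeps a single committed position per partition, so if record 5 finishes before record 3, committing 5's position would also mark 3 as done. Below is a minimal, library-free Java sketch of that bookkeeping, not Reactor Kafka's API: OrderedAckTracker is a hypothetical helper, a fixed thread pool stands in for the asynchronous DB writes, and Thread.sleep stands in for write latency. In Reactor terms, the concurrent part roughly corresponds to flatMap with each inner DB call given its own subscribeOn(...), since concatMap waits for each inner publisher to finish before starting the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: records out-of-order completions and reports the highest
// position that is safe to commit (Kafka commits the offset of the NEXT record to read).
final class OrderedAckTracker {
    private final TreeSet<Long> completed = new TreeSet<>();
    private long nextToCommit; // lowest offset not yet known to be processed

    OrderedAckTracker(long startOffset) {
        this.nextToCommit = startOffset;
    }

    // Marks one offset as processed and returns the position that may now be committed.
    synchronized long acknowledge(long offset) {
        completed.add(offset);
        // Advance only over a gap-free run of completed offsets.
        while (completed.remove(nextToCommit)) {
            nextToCommit++;
        }
        return nextToCommit;
    }

    synchronized long committable() {
        return nextToCommit;
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(4); // stands in for async DB calls
        OrderedAckTracker tracker = new OrderedAckTracker(0);
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        Random latency = new Random(42);

        for (long offset = 0; offset < 10; offset++) {
            final long current = offset;
            final int delayMs = latency.nextInt(50);
            System.out.println("Reading message: " + current); // reads happen in order
            inFlight.add(CompletableFuture.runAsync(() -> {
                sleep(delayMs); // simulated DB write
                long safe = tracker.acknowledge(current);
                System.out.println("Stored message " + current + ", committable position " + safe);
            }, workers));
        }

        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        workers.shutdown();
        System.out.println("Final committable position: " + tracker.committable());
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Running main prints the reads in order and the stores in whatever order the pool finishes them; the committable position only advances once every earlier offset has been stored, ending at 10.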

Related

Spring Webflux - initial message without subscriber

I am trying to build an SSE application with Spring WebFlux. According to the documentation, messages emitted to the sink are not delivered if there is no subscriber. In my use case, I would like a subscriber to receive the last message when it subscribes. I found that the sink can be configured as follows:
Sinks.many().replay().latest();
When I have both a publisher and a subscriber and another subscriber subscribes, it receives the last sent message, which is great. However, if there are no subscribers when the publisher sends the message, the first subscriber that comes in afterwards receives nothing. That is actually just what the documentation above says, but I am wondering how to solve this to meet my needs. As a workaround I did something like this:
if (shareSinks.currentSubscriberCount() == 0) {
shareSinks.asFlux().subscribe();
}
shareSinks.tryEmitNext(shareDTO);
But subscribing the publisher to its own subscription doesn't sound like a clean way to do this...

This is a matter of hot and cold publishers. Currently, your publisher (Sinks.many().replay().latest()) is a cold publisher. Events that are emitted while there is no subscriber will just vanish.
What you need is a so-called hot publisher. Hot publishers cache the events, and a new subscriber will receive all previously cached events.
This will do the trick:
final Sinks.Many<String> shareSinks = Sinks.many()
        .replay()
        .all(); // or .limit(10); to keep only the last 10 emissions
final Flux<String> hotPublisher = shareSinks.asFlux()
        .cache(); // .cache() turns the cold flux into a hot flux
shareSinks.tryEmitNext("1");
shareSinks.tryEmitNext("2");
shareSinks.tryEmitNext("3");
shareSinks.tryEmitNext("4");
hotPublisher.subscribe(message -> System.out.println("received: " + message));
The console output would be:
received: 1
received: 2
received: 3
received: 4
The Reactor reference documentation also has a chapter on hot vs. cold publishers.
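What the question asks for is "replay latest" behaviour: the publisher remembers the most recent emission and hands it to each new subscriber, including the first subscriber that arrives after the emission. A minimal, library-free Java sketch of just that semantics follows; the ReplayLatest class is hypothetical, is not how Reactor's sinks are implemented, and ignores backpressure, errors and completion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, library-free illustration of "replay latest" semantics (not Reactor's implementation).
final class ReplayLatest<T> {
    private final List<Consumer<? super T>> subscribers = new ArrayList<>();
    private T latest;
    private boolean hasValue;

    // Stores the value as the latest one and pushes it to everyone currently subscribed.
    synchronized void emit(T value) {
        latest = value;
        hasValue = true;
        for (Consumer<? super T> subscriber : subscribers) {
            subscriber.accept(value);
        }
    }

    // Registers a subscriber and immediately replays the latest value, if there is one,
    // even when that value was emitted before anyone was listening.
    synchronized void subscribe(Consumer<? super T> subscriber) {
        subscribers.add(subscriber);
        if (hasValue) {
            subscriber.accept(latest);
        }
    }

    public static void main(String[] args) {
        ReplayLatest<String> share = new ReplayLatest<>();
        share.emit("1");
        share.emit("2"); // emitted while nobody is subscribed
        share.subscribe(m -> System.out.println("A received: " + m)); // replays "2"
        share.emit("3"); // delivered live to A
        share.subscribe(m -> System.out.println("B received: " + m)); // replays "3"
    }
}
```

Running main prints "A received: 2", then "A received: 3", then "B received: 3".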

Kafka Streams write an event back to the input topic

In my Kafka Streams app, I need to retry processing a message whenever a particular type of exception is thrown in the processing logic.
Rather than wrapping my logic in a RetryTemplate (I am using Spring Boot), I am considering simply writing the message back to the input topic. My assumption is that this message will be appended to the end of the log in the appropriate partition and will eventually be reprocessed.
I am aware that this would break the ordering, and I'm okay with that.
My question is: would Kafka Streams have an issue when it encounters a message that was supposedly already processed in the past (I am assuming Kafka Streams has a way of marking the messages it has processed, especially when exactly-once processing is enabled)?
Here is an example of the code I am considering for this solution.
// key/value types and SuccessfulResult/ErrorResult are assumed result types
val branches = streamsBuilder.stream<String, String>(inputTopicName)
    .mapValues { value -> myServiceObject.executeSomeLogic(value) }
    .branch(
        { _, value -> value is SuccessfulResult }, // success
        { _, value -> value is ErrorResult }       // exception was thrown
    )
branches[0].to(outputTopicName)
branches[1].to(inputTopicName) // write them back to the input topic as a way of retrying
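As far as Kafka is concerned, a record written back to the input topic is a new record at a new offset, so it is read again like any other record; what the rewrite changes is its position relative to the records already queued behind it. The following Java sketch simulates that with an in-memory deque standing in for one partition; RequeueSimulation, Msg and the attempt limit are hypothetical. The attempt limit is an added assumption: without one, a message that always fails would circle through the topic forever (in Kafka, the attempt count could, for example, travel in a record header).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical simulation: an ArrayDeque plays the role of a single partition.
final class RequeueSimulation {
    // A message plus how many times it has been attempted.
    record Msg(String value, int attempt) {}

    static List<String> run(List<String> input, Predicate<Msg> succeeds, int maxAttempts) {
        Deque<Msg> partition = new ArrayDeque<>();
        for (String v : input) {
            partition.addLast(new Msg(v, 1));
        }
        List<String> outcome = new ArrayList<>();
        while (!partition.isEmpty()) {
            Msg m = partition.pollFirst();
            if (succeeds.test(m)) {
                outcome.add("ok " + m.value());
            } else if (m.attempt() < maxAttempts) {
                // Equivalent of branches[1].to(inputTopicName): append to the tail, retry later.
                partition.addLast(new Msg(m.value(), m.attempt() + 1));
            } else {
                outcome.add("gave up " + m.value()); // e.g. route to a dead-letter topic instead
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        // "B" fails on its first attempt only, so it is processed after "C", i.e. out of order.
        Predicate<Msg> failsOnceForB = m -> !(m.value().equals("B") && m.attempt() == 1);
        System.out.println(run(List.of("A", "B", "C"), failsOnceForB, 3)); // [ok A, ok C, ok B]
    }
}
```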

How to keep redis connection open when reading from reactive API

I am continuously listening on Redis streams using the Spring reactive API (with the Lettuce driver). I am using a standalone connection. It seems like the reactor's event loop opens a new connection every time it reads messages instead of keeping the connection open. I see a lot of ports in TIME_WAIT on my machine when I run my program. Is this normal? Is there a way to tell Lettuce to reuse the connection instead of reconnecting every time?
This is my code:
StreamReceiver<String, MapRecord<String, String, String>> receiver = StreamReceiver.create(factory);
return receiver
.receive(Consumer.from(keyCacheStreamsConfig.getConsumerGroup(), keyCacheStreamsConfig.getConsumer()),
StreamOffset.create(keyCacheStreamsConfig.getStreamName(), ReadOffset.lastConsumed()))//
// flatMap reads 256 messages by default and processes them in the given scheduler
.flatMap(record -> Mono.fromCallable(() -> consumer.consume(record)).subscribeOn(Schedulers.boundedElastic()))//
.doOnError(t -> {
log.error("Error processing.", t);
streamConnections.get(nodeName).setDirty(true);
})//
.onErrorContinue((err, elem) -> log.error("Error processing message. Continue listening."))//
.subscribe();

It looks like spring-data-redis reuses the connection only if the poll timeout in the stream receiver options is set to 0 and the options are passed as the second argument to StreamReceiver.create(factory, options). I figured this out by reading the spring-data-redis source code.

Camunda - Intermediate message event cannot correlate to a single execution

I created a small application (Spring Boot and Camunda) to run an order process. The Order-Service receives a new order via REST and starts the BPMN order workflow through its start event. The order process contains two asynchronous JMS calls (customer check and warehouse stock check). Once both checks have returned, the order process should continue.
The start event is triggered within a Spring REST controller:
ProcessInstance processInstance =
runtimeService.startProcessInstanceByKey("orderService", String.valueOf(order.getId()));
The send task (e.g. the customer check) sends a JMS message to an asynchronous queue.
The service's response is caught by another Spring component, which then tries to correlate an intermediate message:
runtimeService.createMessageCorrelation("msgReceiveCheckCustomerCredibility")
.processInstanceBusinessKey(response.getOrder().getBpmnBusinessKey())
.setVariable("resultOrderCheckCustomterCredibility", response)
.correlate();
I deactivated the warehouse service to see whether the order process waits for the second response to arrive, but instead I get this exception:
1115 06:33:08.564 WARN [o.c.b.e.jobexecutor] ENGINE-14006 Exception while executing job 67d2cc24-0769-11ea-933a-d89ef3425300:
org.springframework.messaging.MessageHandlingException: nested exception is org.camunda.bpm.engine.MismatchingMessageCorrelationException: ENGINE-13031 Cannot correlate a message with name 'msgReceiveCheckCustomerCredibility' to a single execution. 4 executions match the correlation keys: CorrelationSet [businessKey=1, processInstanceId=null, processDefinitionId=null, correlationKeys=null, localCorrelationKeys=null, tenantId=null, isTenantIdSet=false]
This is my process. I cannot see a way to post my bpmn file :-(
Why can't it correlate using the message name and the business key? The JMS queues are empty; there are no other messages with the same businessKey waiting.
Thanks!

To narrow down the problem: run an event subscription query (runtimeService.createEventSubscriptionQuery()) before you try to correlate, and check which subscriptions are actually waiting. Maybe you have a duplicate message name? Maybe you (accidentally) have another instance of the same process running? Once you have identified the subscriptions, you could notify the execution directly instead of using the correlation builder.

Twitter stream api with agents in F#

Based on Don Syme's blog post (http://blogs.msdn.com/b/dsyme/archive/2010/01/10/async-and-parallel-design-patterns-in-f-reporting-progress-with-events-plus-twitter-sample.aspx), I tried to implement a Twitter stream listener. My goal is to follow the guidance of the Twitter API documentation, which says that "tweets should often be saved or queued before processing when building a high-reliability system".
So my code needs to have two components:
A queue that piles up and processes each status/tweet JSON
Something that reads the Twitter stream and dumps each tweet into the queue as a JSON string
I chose the following:
An agent to which I post each tweet, which decodes the JSON and dumps it to the database
A simple HTTP web request
I would also like to write any error from the database inserts to a text file (I will probably switch to a supervisor agent for all errors).
Two problems:
Is my strategy here any good? If I understand correctly, the agent behaves like a smart queue and processes its messages asynchronously (if it has 10 messages in its queue, it will process a bunch of them at a time instead of waiting for the 1st one to finish, then the 2nd, etc.), correct?
According to Don Syme's post, everything before the while loop is isolated, so the StreamWriter and the database dump are isolated. But because I need this, I never close my database connection... is that OK?
The code looks something like:
open System.IO
let dumpToDatabase databaseName =
    // opens database connection
    fun tweet -> () // placeholder: inserts tweet in database
type Agent<'T> = MailboxProcessor<'T>
let agentDump =
    Agent.Start(fun (inbox: MailboxProcessor<string>) ->
        async {
            use w2 = new StreamWriter(@"\Errors.txt")
            let dumpError = fun (error: string) -> w2.WriteLine(error)
            let dumpTweet = dumpToDatabase "stream"
            while true do
                let! msg = inbox.Receive()
                try
                    let tw = decode msg
                    dumpTweet tw
                with
                | :? MySql.Data.MySqlClient.MySqlException as ex ->
                    dumpError (msg + ex.ToString())
                | _ -> ()
        }
    )
let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
let parameters = "track=RT&"
let stream_url = filter_url
let stream = twitterStream MyCredentials stream_url parameters
while true do
    agentDump.Post(stream.ReadLine())
Thanks a lot!
Edit: code with a processor agent:
open System
let dumpToDatabase (tweets: tweet list) =
    () // placeholder: bulk insert of tweets in database
let agentProcessor =
    Agent.Start(fun (inbox: MailboxProcessor<string list>) ->
        async {
            while true do
                let! msg = inbox.Receive()
                try
                    msg
                    |> List.map decode
                    |> dumpToDatabase
                with
                | ex -> Console.WriteLine("Processor " + ex.ToString())
        }
    )
let agentDump =
    Agent.Start(fun (inbox: MailboxProcessor<string>) ->
        let rec loop messageList count = async {
            try
                let! newMsg = inbox.Receive()
                let newMsgList = newMsg :: messageList
                if count = 9 then // the list now holds 10 messages
                    agentProcessor.Post(newMsgList)
                    return! loop [] 0
                else
                    return! loop newMsgList (count + 1)
            with
            | ex -> Console.WriteLine("Dump " + ex.ToString())
        }
        loop [] 0)
let filter_url = "http://stream.twitter.com/1/statuses/filter.json"
let parameters = "track=RT&"
let stream_url = filter_url
let stream = twitterStream MyCredentials stream_url parameters
while true do
    agentDump.Post(stream.ReadLine())

I think the best way to describe an agent is as a running process that keeps some state and can communicate with other agents (or web pages or databases). When writing an agent-based application, you can often use multiple agents that send messages to each other.
I think the idea of creating an agent that reads tweets from the web and stores them in a database is a good choice (though you could also keep the tweets in memory as the state of the agent).
I wouldn't keep the database connection open all the time: MSSQL (and likely MySQL too) implements connection pooling, so the underlying connection is not actually closed when you release it. This means that it is safer and similarly efficient to reopen the connection each time you need to write data to the database.
Unless you expect to receive a large number of error messages, I would probably do the same for the file stream (open it in append mode when writing, so that new content is added to the end).
An F# agent's queue processes messages one by one: in your example, you wait for a message using inbox.Receive(), and when the queue contains multiple messages, you get them one at a time (in a loop).
If you wanted to process multiple messages at once, you could write an agent that waits for, say, 10 messages and then sends them as a list to another agent (which would then perform bulk-processing).
You can also pass a timeout parameter to the Receive method, so you could wait for at most 10 messages as long as they all arrive within one second. This way, you can quite elegantly implement bulk processing that doesn't hold messages for a long time.
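The batching idea from the last two paragraphs (collect up to 10 messages, but never hold them longer than about a second) can be sketched outside F# as well. The following is a minimal Java analogue, not the MailboxProcessor API: a BlockingQueue plays the agent's inbox, and poll with a timeout plays Receive with a timeout. Batcher and its parameters are hypothetical, and here the time limit applies to the whole batch rather than to each individual receive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical Java analogue of a batching agent: BlockingQueue stands in for the mailbox.
final class Batcher {
    // Blocks for the first message, then collects up to maxBatch messages in total,
    // returning early once maxWaitMs has elapsed since the first message arrived.
    static <T> List<T> nextBatch(BlockingQueue<T> inbox, int maxBatch, long maxWaitMs) {
        List<T> batch = new ArrayList<>();
        try {
            batch.add(inbox.take());
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
            while (batch.size() < maxBatch) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // time is up: flush what we have
                }
                T next = inbox.poll(remaining, TimeUnit.NANOSECONDS); // null on timeout
                if (next == null) {
                    break;
                }
                batch.add(next);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the flag; return whatever was collected
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        for (int i = 0; i < 25; i++) {
            inbox.add("tweet " + i);
        }
        while (!inbox.isEmpty()) {
            List<String> batch = nextBatch(inbox, 10, 1000);
            System.out.println("bulk insert of " + batch.size() + " messages");
        }
    }
}
```

Running main drains the 25 queued messages as batches of 10, 10 and 5; the last batch is flushed after the one-second limit because no further messages arrive.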

