How do you close a cold observable gracefully after a certain time has elapsed since subscription? - flowable

Assume the code below, which processes data from an external paginatedAPI.
Flowable<Data> process = Flowable.generate(
    () -> new State(),
    new BiConsumer<State, Emitter<Data>>() {
        @Override
        public void accept(State state, Emitter<Data> dataEmitter) {
            // get data from the upstream service;
            // this calls dataEmitter.onNext() internally
            pageId = paginatedAPI.get(pageId, dataEmitter);
            // process
            ...
            // update state
            state.updatePageId(pageId);
        }
    }).subscribeOn(Schedulers.from(executor));
Since this is created with .generate, accept is called only when the subscriber is ready for the next item.
I have full control over what I add to state, but I can't change paginatedAPI.
Requirement:
After a time T from subscription,
a) Iterate through all pages without sending them to subscriber and call paginatedAPI.close()
b) Provide subscriber with data from paginatedAPI.close()
If the subscriber disconnects before time T, then
a) Iterate through all pages without sending them to subscriber and call paginatedAPI.close()
I don't understand how to bring the time elapsed since subscription into the flowable's control logic.
Also, accept may call onNext at most once. How, then, can I run through the remaining pages of paginatedAPI, which would call onNext multiple times?
Edit: Added details on the emitter and the internal onNext call in paginatedAPI.get(pageId, dataEmitter).

Kafka streams: groupByKey and reduce not triggering action exactly once when error occurs in stream

I have a simple Kafka Streams scenario where I do a groupByKey, then a reduce, and then an action. There could be duplicate events in the source topic, hence the groupByKey and reduce.
The action could fail, and in that case I need the streams app to reprocess that event. In the example below I always throw an error to demonstrate the point.
It is very important that the action happens exactly once: at least once, and never more than once.
The problem I'm finding is that when the streams app reprocesses the event, the reduce function is called, and because it returns null the action is not invoked again.
As only one event is produced to the source topic TOPIC_NAME, I would expect the reduce to have no previous value and skip down to the mapValues.
val topologyBuilder = StreamsBuilder()
topologyBuilder.stream(
        TOPIC_NAME,
        Consumed.with(Serdes.String(), EventSerde())
    )
    .groupByKey(Grouped.with(Serdes.String(), EventSerde()))
    .reduce { current, _ ->
        println("reduce hit")
        null
    }
    .mapValues { _, v ->
        println("Id: ${v.correlationId}")
        throw Exception("simulate error")
    }
To cause the issue I run the streams app twice. This is the output:
First run
Id: 90e6aefb-8763-4861-8d82-1304a6b5654e
11:10:52.320 [test-app-dcea4eb1-a58f-4a30-905f-46dad446b31e-StreamThread-1] ERROR org.apache.kafka.streams.KafkaStreams - stream-client [test-app-dcea4eb1-a58f-4a30-905f-46dad446b31e] All stream threads have died. The instance will be in error state and should be closed.
Second run
reduce hit
As you can see, .mapValues doesn't get called on the second run, even though it errored on the first run and the streams app picks up the same event again.
Is it possible to have a streams app reprocess an event through a reduce step as if it had never seen the event before? Or is there a better approach than the one I'm taking?
I was missing a property setting for the streams app:
props["processing.guarantee"] = "exactly_once"
With this set, any state created after the event was picked up is rolled back if an exception is thrown and the streams app crashes.
The problem was that the streams app would pick up the event again to reprocess it, but the reduce step had state that had already been persisted. Enabling exactly_once ensures that the reducer state is rolled back as well.
It now successfully reprocesses the event as if it had never seen it before.
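For reference, here is a minimal sketch of the same setting in plain Java. It uses only java.util.Properties so it runs without the Kafka client on the classpath. The application id and bootstrap address are placeholders I made up; "processing.guarantee" is the key from the answer above. Newer Kafka Streams releases deprecate the "exactly_once" value in favour of "exactly_once_v2", so check which one your version expects.

```java
import java.util.Properties;

final class StreamsConfigSketch {

    // Builds the configuration that would be handed to the KafkaStreams constructor.
    static Properties buildProps() {
        Properties props = new Properties();
        props.put("application.id", "test-app");           // placeholder
        props.put("bootstrap.servers", "localhost:9092");  // placeholder
        // Makes state store updates and offset commits part of one transaction,
        // so a crash rolls back the reducer state together with the offsets.
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildProps().getProperty("processing.guarantee"));
    }
}
```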

How to chase a JFR event over multiple threads

I'm struggling to model asynchronous servlet request processing with custom JFR events.
The challenge I'm facing is that in asynchronous processing a request may be #dispatch()ed several times. This means the whole request processing chain may be executed multiple times, some time apart in different threads. How do I model this with custom JFR events?
What would help me is either the concept of a "parent" event (possibly in a different thread) or the suspension and resumption of an event.
Edit
To illustrate the issue a bit. An async request may take 100 seconds wall clock time to process. However the actual processing may happen in only 4 seconds user time in a Servlet#service() method:
second 0-1 in thread A, Servlet#service() method returns, AsyncContext started
second 10-11 in thread B, Servlet#service() method returns, AsyncContext started
second 80-81 in thread A, Servlet#service() method returns, AsyncContext started
second 99-100 in thread C, Servlet#service() method returns
I'm only interested in generating events for these four durations in these three threads and then correlating them with a single request.
You can add a thread field to the event:
public class MyEvent extends Event {
    @Label("Start Thread")
    @TransitionFrom
    private final Thread startThread;

    MyEvent(Thread thread) {
        this.startThread = thread;
    }
}
When you commit the event the end thread will be stored.
If you want to track an event over several threads, you would need to create an event for every thread and have an id so you can understand the flow.
class MyEvent extends Event {
    @Label("Transition id")
    long id;
}
If you like you can create a relational id to describe the relation and JMC should be able to hint (in context menus etc.) there is a relation among events.
@MetadataDefinition
@Label("Transition Id")
@Relational
@Target({ ElementType.FIELD })
@Retention(RetentionPolicy.RUNTIME)
@interface TransitionId {
}
If you don't want to repeat yourself, you can write the above functionality in a method in a base class, which you can call for every new thread the event visits.
abstract class AbstractTransition extends Event {
    @TransitionId
    @Label("Transition Id")
    private long id;

    public void setTransitionId(long id) {
        this.id = id;
    }
}
There is no other way to do this.
It's not possible for the JVM to know which thread an event object is in, or which threads should be recorded. The user needs to provide at least one method call for every thread that should be touched (together with some context).
The problem is similar to how to tie JFR events for spans and scopes together in distributed tracers.
This article may help:
http://hirt.se/blog/?p=1081
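A minimal sketch of the one-event-per-thread approach described above, using only the jdk.jfr API that ships with the JDK. The event name example.RequestSlice, the requestId field, and the runSlice helper are made up for illustration. In a servlet container you would wrap each Servlet#service() invocation (and each dispatched continuation) in something like runSlice, passing an id you stored on the request. Events are only written while a recording is active, for example when the JVM is started with -XX:StartFlightRecording.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

final class RequestSliceDemo {

    // One event per burst of work; requestId ties the bursts together.
    @Name("example.RequestSlice")
    @Label("Request Slice")
    static class RequestSlice extends Event {
        @Label("Request Id")
        long requestId;
    }

    // Wraps the work done on the current thread in its own event.
    static void runSlice(long requestId, Runnable work) {
        RequestSlice event = new RequestSlice();
        event.requestId = requestId;
        event.begin();
        try {
            work.run();
        } finally {
            // commit() ends the event and records its duration and thread
            event.commit();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long requestId = 42L; // hypothetical id, e.g. taken from a request attribute
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> runSlice(requestId, () -> { }), "worker-" + i);
            t.start();
            t.join();
        }
    }
}
```

Filtering or grouping on the Request Id field in JMC then gives you the slices that belong to one request, each with its own thread and duration.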

Why does Alamofire use the dispatch_sync() function when creating a dataTask?

The code below is from source code of Alamofire
let queue = dispatch_queue_create(nil, DISPATCH_QUEUE_SERIAL)

public func request(URLRequest: URLRequestConvertible) -> Request {
    var dataTask: NSURLSessionDataTask!
    dispatch_sync(queue) { dataTask = self.session.dataTaskWithRequest(URLRequest.URLRequest) }
    let request = Request(session: session, task: dataTask)
    self.delegate[request.delegate.task] = request.delegate
    if startRequestsImmediately {
        request.resume()
    }
    return request
}
It seems that every time it creates a dataTask, it dispatches the creation to a serial queue. Does this protect the program from some kind of multithreading trap?
I can't figure out what difference it would make without that queue.
We implemented that check because of Alamofire Issue #393. We were seeing duplicate task identifiers without the serial queue when creating data and upload tasks in parallel from multiple threads. It appears that Apple has a thread safety issue when incrementing the task identifiers. Therefore, in Alamofire, we eliminate the issue by creating the tasks on a serial queue.
Cheers. 🍻
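The same pattern can be sketched outside of Swift. The Java snippet below is an analogy, not Alamofire code: UnsafeTaskFactory is a hypothetical stand-in for a session whose identifier counter is not thread-safe, a single-threaded executor plays the role of the serial queue, and submit(...).get() approximates dispatch_sync.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SerialCreation {

    // Hypothetical factory with a non-atomic counter (not safe to call concurrently).
    static final class UnsafeTaskFactory {
        private int nextId = 0;
        int createTask() { return nextId++; }
    }

    private final UnsafeTaskFactory factory = new UnsafeTaskFactory();
    private final ExecutorService serial = Executors.newSingleThreadExecutor();

    // Blocks the caller until the task id has been created on the serial thread.
    int createTask() {
        Callable<Integer> create = factory::createTask;
        try {
            return serial.submit(create).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() { serial.shutdown(); }

    public static void main(String[] args) throws InterruptedException {
        SerialCreation creator = new SerialCreation();
        Set<Integer> ids = ConcurrentHashMap.newKeySet();
        Thread[] callers = new Thread[4];
        for (int i = 0; i < callers.length; i++) {
            callers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    ids.add(creator.createTask());
                }
            });
            callers[i].start();
        }
        for (Thread t : callers) {
            t.join();
        }
        creator.shutdown();
        // Every id is unique because the counter is only touched by one thread.
        System.out.println("unique ids: " + ids.size());
    }
}
```

Calling factory.createTask() directly from the four threads instead could lose increments and hand out the same id twice, which is the kind of duplicate described in the answer.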

RabbitMQ Wait for a message with a timeout

I'd like to send a message to a RabbitMQ server and then wait for a reply message (on a "reply-to" queue). Of course, I don't want to wait forever in case the application processing these messages is down, so there needs to be a timeout. It sounds like a very basic task, yet I can't find a way to do this. I've now run into this problem with the Java API.
The RabbitMQ Java client library now supports a timeout argument to its QueueingConsumer.nextDelivery() method.
For instance, the RPC tutorial uses the following code:
channel.basicPublish("", requestQueueName, props, message.getBytes());
while (true) {
    QueueingConsumer.Delivery delivery = consumer.nextDelivery();
    if (delivery.getProperties().getCorrelationId().equals(corrId)) {
        response = new String(delivery.getBody());
        break;
    }
}
Now you can use consumer.nextDelivery(1000) to wait for at most one second. If the timeout is reached, the method returns null.
channel.basicPublish("", requestQueueName, props, message.getBytes());
while (true) {
    // Use a timeout of 1000 milliseconds
    QueueingConsumer.Delivery delivery = consumer.nextDelivery(1000);
    // A null delivery means the timeout was reached, so stop waiting;
    // without this check the loop would simply poll again forever
    if (delivery == null) {
        break;
    }
    if (delivery.getProperties().getCorrelationId().equals(corrId)) {
        response = new String(delivery.getBody());
        break;
    }
}
com.rabbitmq.client.QueueingConsumer has a nextDelivery(long timeout) method, which will do what you want. However, this has been deprecated.
Writing your own timeout isn't so hard, although it may be better to have an ongoing thread and a list of in-time identifiers, rather than adding and removing consumers and associated timeout threads all the time.
Edit to add: Noticed the date on this after replying!
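A rough sketch of the "ongoing thread and a list of in-time identifiers" idea, written against plain java.util.concurrent so it does not depend on any particular RabbitMQ client version. PendingReplies and its methods are names I made up; onDelivery is meant to be called from whatever long-lived consumer you already have on the reply queue, with the correlation id and body taken from the delivery.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PendingReplies {

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Call before publishing the request so a fast reply cannot be missed.
    CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(correlationId, reply);
        return reply;
    }

    // Call from the single reply-queue consumer for every delivery.
    void onDelivery(String correlationId, String body) {
        CompletableFuture<String> reply = pending.remove(correlationId);
        if (reply != null) {
            reply.complete(body);
        }
        // else: late or unknown reply, nothing is waiting for it
    }

    // Waits for the reply; returns null when the timeout elapses.
    String await(String correlationId, CompletableFuture<String> reply, long timeoutMillis) {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pending.remove(correlationId); // forget the id whether or not it was answered
        }
    }

    public static void main(String[] args) {
        PendingReplies replies = new PendingReplies();

        CompletableFuture<String> answered = replies.register("a");
        replies.onDelivery("a", "pong"); // simulates the consumer thread
        System.out.println(replies.await("a", answered, 1000)); // prints "pong"

        CompletableFuture<String> unanswered = replies.register("b");
        System.out.println(replies.await("b", unanswered, 100)); // prints "null"
    }
}
```

Unlike the nextDelivery(timeout) loop, the timeout here covers the whole wait for one specific correlation id, and replies for other requests do not reset it.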
There is a similar question. Although its answers don't use Java, maybe you can get some hints:
Wait for a single RabbitMQ message with a timeout
I approached this problem using C# by creating an object to keep track of the response to a particular message. It sets up a unique reply queue for a message, and subscribes to it. If the response is not received in a specified timeframe, a countdown timer cancels the subscription, which deletes the queue. Separately, I have methods that can be synchronous from my main thread (uses a semaphore) or asynchronous (uses a callback) to utilize this functionality.
Basically, the implementation looks like this:
//Synchronous case:
//Throws TimeoutException if timeout happens
var msg = messageClient.SendAndWait(theMessage);
//Asynchronous case
//myCallback receives an exception message if there is a timeout
messageClient.SendAndCallback(theMessage, myCallback);

How to write a transactional, multi-threaded WCF service consuming MSMQ

I have a WCF service that posts messages to a private, non-transactional MSMQ queue. I have another WCF service (multi-threaded) that processes the MSMQ messages and inserts them in the database.
My issue is with sequencing. I want the messages to be processed in a certain order. For example, MSG-A needs to go to the database before MSG-B is inserted. My current solution for that is very crude and expensive from a database perspective.
I read the message; if it's MSG-B and there is no MSG-A in the database, I throw it back on the message queue, and I keep doing that until MSG-A is inserted in the database. But this is a very expensive operation, as it involves a table scan (SELECT statement).
The messages are always posted to the queue in sequence.
Short of making my WCF Queue Processing service Single threaded (By setting the service behavior attribute InstanceContextMode to Single), can someone suggest a better solution?
Thanks
Dan
Instead of immediately pushing messages to the DB after taking them out of the queue, keep a list of pending messages in memory. When you get an A or B, check to see if the matching one is in the list. If so, submit them both (in the right order) to the database, and remove the matching one from the list. Otherwise, just add the new message to that list.
If checking for a match is too expensive a task to serialize (I assume you are multithreading for a reason), then you could have another thread process the list. The existing multiple threads read, immediately submit most messages to the DB, but put the As and Bs aside in the (threadsafe) list. The background thread scavenges through that list finding matching As and Bs, and when it finds them it submits them in the right order (and removes them from the list).
The bottom line is: since you're removing items from the queue with multiple threads, you're going to have to serialize somewhere in order to ensure ordering. The trick is to minimize the number of times and the length of time you spend locked up in serial code.
There might also be something you could do at the database level, with triggers or something, to reorder the entries when it detects this situation. I'm afraid I don't know enough about DB programming to help there.
UPDATE: Assuming the messages contain some id that lets you associate a message 'A' with the correct associated message 'B', the following code will make sure A goes in the database before B. Note that it does not make sure they are adjacent records in the database - there could be other messages between A and B. Also, if for some reason you get an A or B without ever receiving the matching message of the other type, this code will leak memory since it hangs onto the unmatched message forever.
(You could extract those two 'lock'ed blocks into a single subroutine, but I'm leaving it like this for clarity with respect to A and B.)
static private object dictionaryLock = new object();
static private Dictionary<int, MyMessage> receivedA =
    new Dictionary<int, MyMessage>();
static private Dictionary<int, MyMessage> receivedB =
    new Dictionary<int, MyMessage>();

public void MessageHandler(MyMessage message)
{
    MyMessage matchingMessage = null;
    if (IsA(message))
    {
        InsertIntoDB(message);
        lock (dictionaryLock)
        {
            if (receivedB.TryGetValue(message.id, out matchingMessage))
            {
                receivedB.Remove(message.id);
            }
            else
            {
                receivedA.Add(message.id, message);
            }
        }
        if (matchingMessage != null)
        {
            InsertIntoDB(matchingMessage);
        }
    }
    else if (IsB(message))
    {
        lock (dictionaryLock)
        {
            if (receivedA.TryGetValue(message.id, out matchingMessage))
            {
                receivedA.Remove(message.id);
            }
            else
            {
                receivedB.Add(message.id, message);
            }
        }
        if (matchingMessage != null)
        {
            InsertIntoDB(message);
        }
    }
    else
    {
        // not A or B, do whatever
    }
}
If you're the only client of those queues, you could very easily add a timestamp as a message header (see the IDesign sample) and save the Sent On field (kind of like an Outlook message) in the database as well. You could then process the messages in the order they were sent (basically, you move the sorting logic to the time of consumption).
Hope this helps,
Adrian