Are there queues between WebFilters - spring-webflux

I am planning to use webflux to create a set of micro-services. Each service has a set of processing stages - a set of webfilters for pre-processing, a filter/stage to do some IO (which could become very slow), and a set of filters to do post processing. Each post procession filter need result of the IO operation to do its work. How does this work under the hood? Does the framework automatically trigger filters when results of the proceeding filter is available?
More information
Lets say I have a pipeline of filters. First filter does a remote API call - which may be slow. Second filter uses results of this API call. In a event loop based asynchronous processing pipeline, this would be modeled using one or more queues and a thread pool. Incoming requests would be put in to the queue and will be picked up by a worker thread. As the first step of the pipeline is doing the API call, this thread would do the API call and return immediately without waiting for the result. When result is received, it will be placed in the queue again and will be picked up by a worker thread and dispatched to the second filter which can use the result of the API call. Without a framework, we will setup queues, threads, message handlers in proper way so that at runtime above sequence of events happen. I just wanted to understand how this work in webflux.

Related

Is the WebSocket returned from HttpContext.WebSockets.AcceptWebSocketAsync thread-safe?

In ASP.NET Core v2, is the WebSocket returned by HttpContext.WebSockets.AcceptWebSocketAsync thread-safe?
More specifically, can I call ReceiveAsync in parallel with a thread that calls SendAsync?
I'd like to be able to have a message loop receiving messages like the close event, while at the same time be able to send messages in response to server-side events (that is, not in response to received events).
I haven't found any documentation specifying what implementation AcceptWebSocketAsync returns, but in practice it appears to consistently return a ManagedWebSocket instance.
I haven't found any API documentation for ManagedWebSocket. Fortunately the source code has been published and it contains this helpful note:
Thread-safety:
It's acceptable to call ReceiveAsync and SendAsync in parallel. One of each may run concurrently.
It's acceptable to have a pending ReceiveAsync while CloseOutputAsync or CloseAsync is called.
Attemping to invoke any other operations in parallel may corrupt the instance. Attempting to invoke a send operation while another is in progress or a receive operation while another is in progress will result in an exception.
— (source1)
(source2)
tl;dr: not thread-safe in general, but the read and send in parallel scenario is supported

How to make a Saga handler Reentrant

I have a task that can be started by the user, that could take hours to run, and where there's a reasonable chance that the user will start the task multiple times during a run.
I've broken the processing of the task up into smaller batches, but the way the data looks it's very difficult to tell what's still to be processed. I batch it using messages that each process a bite sized chunk of the data.
I have thought of using a Saga to control access to starting this process, with a Saga property called Processing that I set at the start of the handler and then unset at the end of the handler. The handler does some work and sends the messages to process the data. I check the value at the start of the handler, and if it's set, then just return.
I'm using Azure storage for Saga storage, if it makes a difference for the next bit. I'm also using NSB 6
I have a few questions though:
Is this the correct approach to re-entrancy with NSB?
When is a change to Saga data persisted? (and is it different depending on the transport?)
Following on from the above, if I set a Saga value in a handler, wait a while and then reset it to its original value will it change the persistent storage at all?
Seem to be cross posted in the Particular Software google group:
https://groups.google.com/forum/#!topic/particularsoftware/p-qD5merxZQ
Sagas are very often used for such patterns. The saga instance would track progress and guard that the (sub)tasks aren't invoked multiple times but could also take actions if the expected task(s) didn't complete or is/are over time.
The saga instance data is stored after processing the message and not when updating any of the saga data properties. The logic you described would not work.
The correct way would be having a saga that orchestrates your process and having regular handlers that do the actual work.
In the saga handle method that creates the saga check if the saga was already created or already the 'busy' status and if it does not have this status send a message to do some work. This will guard that the task is only initiated once and after that the saga is stored.
The handler can now do the actual task, when it completes it can do a 'Reply' back to the saga
When the saga receives the reply it can now start any other follow up task or raise an event and it can also 'complete'.
Optimistic concurrency control and batched sends
If two message are received that create/update the same saga instance only the first writer wins. The other will fail because of optimistic concurrency control.
However, if these messages are not processed in parallel but sequential both fail unless the saga checks if the saga instance is already initialized.
The following sample demonstrates this: https://github.com/ramonsmits/docs.particular.net/tree/azure-storage-saga-optimistic-concurrency-control/samples/azure/storage-persistence/ASP_1
The client sends two identical message bodies. The saga is launched and only 1 message succeeds due to optimistic concurrency control.
Due to retries eventually the second copy will be processed to but the saga checks the saga data for a field that it knows would normally be initialized by by a message that 'starts' the saga. If that field is already initialized it assumes the message is already processed and just returns:
It also demonstrates batches sends. Messages are not immediately send until the all handlers/sagas are completed.
Saga design
The following video might help you with designing your sagas and understand the various patterns:
Integration Patterns with NServiceBus: https://www.youtube.com/watch?v=BK8JPp8prXc
Keep in mind that Azure Storage isn't transactional and does not provide locking, it is only atomic. Any work you do within a handler or saga can potentially be invoked more than once and if you use non-transactional resources then make sure that logic is idempotent.
So after a lot of testing
I don't believe that this is the right approach.
As Archer says, you can manipulate the saga data properties as much as you like, they are only saved at the end of the handler.
So if the saga receives two simultaneous messages the check for Processing will pass both times and I'll have two processes running (and in my case processing the same data twice).
The saga within a saga faces a similar problem too.
What I believe will work (and has done during my PoC testing) is using a database unique index to help out. I'm using entity framework and azure sql, so database access is not contained within the handler's transaction (this is the important difference between the database and the saga data). The database will also operate across all instances of the endpoint and generally seems like a good solution.
The table that I'm using has each of the columns that make up the saga 'id', and there is a unique index on them.
At the beginning of the handler I retrieve a row from the database. If there is a row, the handler returns (in my case this is okay, in others you could throw an exception to get the handler to run again). The first thing that the handler does (before any work, although I'm not 100% sure that it matters) is to write a row to the table. If the write fails (probably because of the unique constraint being violated) the exception puts the message back on the queue. It doesn't really matter why the database write fails, as NSB will handle it.
Then the handler does the work.
Then remove the row.
Of course there is a chance that something happens during processing of the work, so I'm also using a timestamp and another process to reset it if it's busy for too long. (still need to define 'too long' though :) )
Maybe this can help someone with a similar problem.

How to figure out if mule flow message processing is in progress

I have a requirement where I need to make sure only one message is being processed at a time by a mule flow.Flow is triggered by a quartz scheduler which reads one file from FTP server every time
My proposed solution is to keep a global variable "FLOW_STATUS" which will be set to "RUNNING" when a message is received and would be reset to "STOPPED" once the processing of message is done.
Any messages fed to the flow will check for this variable and abort if "FLOW_STATUS" is "RUNNING".
This setup seems to be working , but I was wondering if there is a better way to do it.
Is there any best practices around this or any inbuilt mule helper functions to achieve the same instead of relying on global variables
It seems like a more simple solution would be to set the maxActiveThreads for the flow to 1. In Mule, each message processed gets it's own thread. So setting the maxActiveThreads to 1 would effectively make your flow singled threaded. Other pending requests will wait in the receiver threads. You will need to make sure your receiver thread pool is large enough to accommodate all of the potential waiting threads. That may mean throttling back your quartz scheduler to allow time process the files so the receiver thread pool doesn't fill up. For more information on the thread pools and how to tune performance, here is a good link: http://www.mulesoft.org/documentation/display/current/Tuning+Performance

Mule ESB: How to achieve typical ReTry Mechanism in MULE ESB

I need to implement a logic on Retry. Inbound endpoint pushes the messages to Rest (Outbound). If the REST is unavailable, I need to retry for 1 time and put it in the queue. But the second upcoming messages should not do any retry, it has to directly put the messages in to queue until the REST service is available.
Once the service is available, I need to pushes all the messages from QUEUE to REST Service (in ordering) via batch job.
Questions:
How do I know the service is unavailable for my second message? If I use until Successful, for every message it do retry and put in queue. Plm is 2nd message shouldn't do retry.
For batch, I thought of using poll, but how to tell to poll, when the service becomes available to begin the batch process. (bcz,Poll is more of with configuring timings to run batch)?
Other ticky confuses me is - Here ordering has to be preserved. once the service is available. Queue messages ( i,e Batch) has to move first to REST Services then with real time. I doubt whether Is it applicable.
It will be very helpful for the quick response to implement the logic.
Using Mule: 3.5.1
I could try something like below: using flow controls
process a message; if exception or bad response code, set a variable/property like serviceAvailable=false.
subsequent message processing will first check the property serviceAvailable to process the messages. if property is false, en-queue the messages to a DB table with status=new/unprocessed
create a flow/scheduler to process the messages from DB sequentially, but it will not check the property serviceAvailable and call the rest service.
If service throws exception it will not store the messages in db again but if processes successfully change the property serviceAvailable=true and de-queue the messages or change the status. Add another property and set it to true if there are more messages in db table like moreDBMsg=true.
New messages should not be processed/consumed until moreDBMsg=false
once moreDBMsg=false and serviceAvailable=true start processing the messages from queue.
For the timeout I would still look at the response code and catch time-outs to determine if the call was successful or requires a retry. Practically you normally do multi threading anyway, so you have multiple calls in parallel anyway. Or simply one call starts before the other ends.
That is just quite normal.
But you can simply retry calls in a queue that time out. And after x amounts of time-outs you "skip" or defer the retry.
But all of this has been done using actual Mule flow components like either:
MEL http://www.mulesoft.org/documentation/display/current/Mule+Expression+Language+Reference
Or flow controls: http://www.mulesoft.org/documentation/display/current/Choice+Flow+Control+Reference
Or for example you reference a Spring Bean and do it in native Java code.
One possibility for the queue would be to persist it in a database. Mule has database connector that has a "poll" feature, see: http://www.mulesoft.org/documentation/display/current/JDBC+Transport+Reference#JDBCTransportReference-PollingTransport

On Heroku, does utilising Node.js prevent the need for queues + worker dynos for third-party API calls?

The Heroku Dev Center on the page about using worker dynos and background jobs states that you need to use worker's + queues to handle API calls, such as fetching an RSS feed, as the operation may take some time if the server is slow and doing this on a web dyno would result in it being blocked from receiving additional requests.
However, from what I've read, it seems to me that one of the major points of Node.js is that it doesn't suffer from blocking under these conditions due to its asynchronous event-based runtime model.
I'm confused because wouldn't this imply that it would be ok to do API calls (asynchronously) in the web dynos? Perhaps the docs were written more for the Ruby/Python/etc use cases where a synchronous model was more prevalent?
NodeJS is an implementation of the reactor pattern. The default build of of NodeJS uses 5 reactors. Once these 5 reactors are being used for IO bound tasks, the main event loop will block.
A common misconception about NodeJS is that it is a system that allows you to do many things at once. This is not necessarily the case, it allows you to do other things while waiting on IO bound tasks, up to 5 at a time.
Any CPU bound tasks are always executed in the main event loop, meaning they will block.
This means if your "job" is IO bound, like putting things in databases then you can probably get away with not using dynos. This of course is dependent on how many things you plan on having go on at once. Remember, any task you put in your main app will take away resources from other incoming requests.
Generally it is not recommended for things like this, if you have a job that does some processing, it belongs in a queue that is executed in its own process or thread.