How to use a different named worker pool in the same verticle? - blocking

I have one verticle in my service which takes in HTTP requests and uses executeBlocking to talk to a MySQL DB. I am using a named worker pool to interact with the DB. Now, for pushing application metrics (using a library which is blocking), I want to use a different named worker pool, because I don't want the DB operations to be held up by metrics work.
I could use the event bus and a worker verticle to push the metrics, but as that has the overhead of transforming everything to JsonObject, I want to use executeBlocking from the same verticle.
As mentioned here https://groups.google.com/d/msg/vertx/eSf3AQagGGU/9m8RizIJeNQJ , the worker pool used in both cases is the same. So will creating a new worker verticle really help me decouple the threads used for DB operations from the ones used to push metrics?
Can anyone suggest a better design, or a way to use a different worker pool from within the same verticle?

Try the following code (written in Kotlin, but you get the idea):
val workerExecutor1 = vertx.createSharedWorkerExecutor("executor1", 4)
val workerExecutor2 = vertx.createSharedWorkerExecutor("executor2", 4)
workerExecutor1.executeBlocking(...) // execute your db code here
workerExecutor2.executeBlocking(...) // execute your metrics code here
Don't forget to close the workerExecutor once it's not needed:
workerExecutor1.close()
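For a slightly fuller picture, here is a sketch of how the two executors keep DB and metrics work on separate threads, assuming a Vert.x 3.x/4.x-style executeBlocking with a result handler; loadFromDb() and pushMetricsBlocking() are placeholders for your own blocking calls:

// Inside the same verticle (e.g. in start()):
val dbExecutor = vertx.createSharedWorkerExecutor("db-pool", 4)
val metricsExecutor = vertx.createSharedWorkerExecutor("metrics-pool", 2)

dbExecutor.executeBlocking<String>({ promise ->
    // runs on a "db-pool" thread; slow MySQL work never touches the metrics pool
    promise.complete(loadFromDb())          // loadFromDb() is a placeholder for your JDBC call
}, { ar ->
    if (ar.succeeded()) { /* write the HTTP response with ar.result() */ }
})

metricsExecutor.executeBlocking<Void>({ promise ->
    // runs on a "metrics-pool" thread, isolated from the DB pool
    pushMetricsBlocking()                   // placeholder for the blocking metrics-library call
    promise.complete()
}, { ar ->
    if (ar.failed()) ar.cause().printStackTrace()
})

Because each executor has its own named pool, a spike of slow metrics pushes cannot starve the DB threads, and vice versa.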

Related

No duplicate work in two instances of a simple job processor

I have a web app (ASP.NET Core 2.0) and a simple job processor (.NET Core 2.0), shown below.
My web app adds jobs to a database; the processor picks up jobs every 5 minutes and runs some logic.
I wrapped the processor in Docker and deployed and run it on two servers (so there are two instances).
Are there any solutions to make sure there is no duplicate work here? I want both instances active at the same time.
Simple job processor
while (true)
{
    Console.WriteLine("Background worker is running");
    // Query the next job from the job table
    if (DateTime.UtcNow < job.ExpiredAt)
    {
        // Call external REST API
        // Do something
    }
    Console.WriteLine($"Background worker is delayed for 5 minutes\r\n");
    Task.Delay(JobInterval * 60 * 1000).Wait();
}
You need something to coordinate your workers. You can't just have multiple instances grabbing at the same pool and keep things separate with no duplication of work. Concurrency will eat your lunch. Instead, there should be a coordinating node that assigns tasks out to the other nodes. That's the only way you can handle this.
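To illustrate the idea, here is one minimal sketch of "a coordinator assigns work"; the jobs table, columns, and worker IDs are hypothetical, and it is shown in Kotlin with plain JDBC only to keep it short (translating it to C#/ADO.NET is straightforward):

import java.sql.Connection

// Hypothetical schema: jobs(id BIGINT, status VARCHAR, assigned_worker VARCHAR, ...)
fun assignPendingJobs(conn: Connection, workerIds: List<String>) {
    require(workerIds.isNotEmpty()) { "need at least one worker" }
    val pendingIds = mutableListOf<Long>()
    conn.createStatement().use { st ->
        st.executeQuery(
            "SELECT id FROM jobs WHERE status = 'pending' AND assigned_worker IS NULL"
        ).use { rs ->
            while (rs.next()) pendingIds.add(rs.getLong("id"))
        }
    }
    // Hand the pending jobs out round-robin across the known worker instances
    conn.prepareStatement("UPDATE jobs SET assigned_worker = ? WHERE id = ?").use { ps ->
        pendingIds.forEachIndexed { i, id ->
            ps.setString(1, workerIds[i % workerIds.size])
            ps.setLong(2, id)
            ps.executeUpdate()
        }
    }
}
// Each processor instance then only queries jobs WHERE assigned_worker equals its own id,
// so the two containers never pick up the same row.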

DMLC and concurrent consumers working

Does DMLC create separate threads for each concurrent consumer? What happens under the hood? The documentation says:
Actual MessageListener execution happens in asynchronous work units which are created through Spring's TaskExecutor abstraction. By default, the specified number of invoker tasks will be created on startup, according to the "concurrentConsumers" setting.
I am not able to understand this: are these tasks executed in parallel? If so, what are the default limits, such as thread count?
Thanks!
Yes, a separate thread is used for each consumer (obtained from the task executor). By default, a SimpleAsyncTaskExecutor is used, and the thread is destroyed when the consumer is stopped. There is no thread limit beyond the container's concurrency settings.
If you inject a different kind of task executor (such as a ThreadPoolTaskExecutor) you must make sure it has enough available threads to support your container's concurrency settings. Container threads are generally long-lived.
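If it helps to see that wired up, a sketch of a container given its own ThreadPoolTaskExecutor might look roughly like this (Spring JMS with javax.jms coordinates assumed; the queue name and pool sizes are made up):

import javax.jms.ConnectionFactory
import javax.jms.MessageListener
import org.springframework.jms.listener.DefaultMessageListenerContainer
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor

fun listenerContainer(cf: ConnectionFactory): DefaultMessageListenerContainer {
    // The pool must be at least as large as maxConcurrentConsumers, since container threads are long-lived
    val executor = ThreadPoolTaskExecutor().apply {
        corePoolSize = 10
        maxPoolSize = 10
        setThreadNamePrefix("dmlc-")
        initialize()
    }
    return DefaultMessageListenerContainer().apply {
        setConnectionFactory(cf)
        setDestinationName("demo.queue")
        setMessageListener(MessageListener { msg -> println("got $msg") })
        setConcurrentConsumers(5)       // 5 invoker tasks created at startup
        setMaxConcurrentConsumers(10)   // the container may scale up to 10 under load
        setTaskExecutor(executor)
    }
}

With the default SimpleAsyncTaskExecutor none of this is needed; the above only matters once you swap in a bounded pool.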

On Heroku, does utilising Node.js prevent the need for queues + worker dynos for third-party API calls?

The Heroku Dev Center, on the page about using worker dynos and background jobs, states that you need to use workers + queues to handle API calls such as fetching an RSS feed, as the operation may take some time if the remote server is slow, and doing this on a web dyno would result in it being blocked from receiving additional requests.
However, from what I've read, it seems to me that one of the major points of Node.js is that it doesn't suffer from blocking under these conditions due to its asynchronous event-based runtime model.
I'm confused because wouldn't this imply that it would be ok to do API calls (asynchronously) in the web dynos? Perhaps the docs were written more for the Ruby/Python/etc use cases where a synchronous model was more prevalent?
Node.js is an implementation of the reactor pattern: a single-threaded event loop backed by libuv's worker thread pool (4 threads by default) for operations that cannot be done asynchronously at the OS level, such as file I/O, DNS lookups and some crypto.
A common misconception about Node.js is that it is a system that allows you to do many things at once. That is not really the case: it lets you do other things while waiting on I/O-bound tasks, but the work itself is not magically parallel.
Any CPU-bound task executes on the main event loop, meaning it will block everything else.
This means that if your "job" is I/O bound, like putting things in a database, you can probably get away without worker dynos. That of course depends on how much you plan on having going on at once. Remember, any task you run in your main app takes resources away from other incoming requests.
Generally this is not recommended: if you have a job that does real processing, it belongs in a queue and should be executed in its own process or thread.

Prioritize real time msgs over batch msgs using Queues/MDBs

In my application a specific service has a fixed capacity (e.g., 100 transactions at a time). Requests to the service arrive in real time as well as from batch jobs (queues). The real-time requests do not have a uniform distribution. I need a way to make sure that real-time jobs are processed before the batch jobs, while also making sure that at no time I exceed the threshold of the service.
Please evaluate the following approach.
Have two queues, A for real-time and B for batch jobs. Have a thread pool of size 100 (the service threshold) and let each thread first try to pick messages from A, falling back to B only when A is empty (roughly like the sketch further below).
My application runs on WebLogic. I would like to use MDBs instead of the thread pool, but there is no way to make a single MDB listen to multiple queues.
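To make the thread-pool idea concrete, something roughly like this sketch is what I have in mind (the Request type and process() function here are just placeholders):

import java.util.concurrent.Executors
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit

data class Request(val payload: String)            // placeholder for the real request type

fun process(r: Request) { /* call the capacity-limited service */ }

fun main() {
    val realTimeQueue = LinkedBlockingQueue<Request>()   // queue A
    val batchQueue = LinkedBlockingQueue<Request>()      // queue B
    val pool = Executors.newFixedThreadPool(100)         // 100 = service threshold

    repeat(100) {
        pool.execute {
            while (true) {
                // Always prefer real-time work; fall back to batch only when A is empty
                val job = realTimeQueue.poll()
                    ?: batchQueue.poll()
                    ?: realTimeQueue.poll(500, TimeUnit.MILLISECONDS)
                    ?: continue
                process(job)
            }
        }
    }
}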
Within JMS you can set a message priority which should be respected if possible. This may be something simple to try.
Another option could be to set a JMS property on the message with the client and use a Message Selector on the MDB. You could set MY_MESSAGE_TYPE=batch/rt and then have multiple MDBs deployed that are listening to the same queue but can be assigned to different work managers. Keep in mind that Work Manager != Thread Pool. You can also set a Request Class to ensure that if the batch pool is in use, the RT pool will not be starved for threads/CPU.
With this design I believe that if you have two MDBs, one with a message selector, messages that meet the selector criteria should be delivered to the MDB with that selector (RT) before an MDB with no selectors (BATCH). This would be a fairly simple POC to do: set up a client that sends messages to the queue, some of which have the JMS property set to RT and others that do not have it set.
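For a rough idea of what that POC could look like (the queue lookup name and property values below are made up, the activation-config keys assume a JMS 2.0 / EJB 3.2 style container, and the WebLogic work-manager / request-class wiring lives in deployment descriptors that are not shown here):

import javax.ejb.ActivationConfigProperty
import javax.ejb.MessageDriven
import javax.jms.Message
import javax.jms.MessageListener
import javax.jms.MessageProducer
import javax.jms.Session

// Producer side: tag each message so the selectors can route it
fun send(session: Session, producer: MessageProducer, payload: String, isRealTime: Boolean) {
    val msg = session.createTextMessage(payload)
    msg.setStringProperty("MY_MESSAGE_TYPE", if (isRealTime) "rt" else "batch")
    producer.send(msg)
}

// Consumer side: two MDBs on the same queue, differing only in their selector
@MessageDriven(activationConfig = [
    ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "jms/ServiceQueue"),
    ActivationConfigProperty(propertyName = "messageSelector", propertyValue = "MY_MESSAGE_TYPE = 'rt'")
])
class RealTimeMdb : MessageListener {
    override fun onMessage(message: Message) {
        // handle a real-time request
    }
}

A second MDB class without the messageSelector property (or with MY_MESSAGE_TYPE = 'batch') would then handle the batch traffic.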
WebLogic 10.0 reference (which is still applicable): http://docs.oracle.com/cd/E11035_01/wls100/config_wls/self_tuned.html

How should I design my workflow so that tasks can run in parallel?

How do I design a parallel-processing workflow?
I have a scenario involving data analysis.
There are basically four steps:
1. pick up a task, either read from a queue or received as a message through an API (a web service maybe), to trigger the service
2. submit a request to a remote service based on the parameters from step 1
3. wait for the remote service to finish and download the result
4. process the data downloaded in step 3
The four steps above look like a sequential workflow.
My question is how I can scale it out.
Every day I might need to perform hundreds to thousands of these tasks.
If I can run them in parallel, that will help a lot; e.g., run 20 tasks at a time.
So can Windows Workflow Foundation be configured to run tasks in parallel?
Thanks.
You may want to use PFX (the Parallel Framework Extensions, http://www.albahari.com/threading/part5.aspx); then you can control how many threads are used for fetching, and I find PLINQ helpful for this.
So you loop over the list of URLs, perhaps read from a file or database, and then in your select you can call a function to do the processing.
If you can go into more detail, for example whether you want the fetching and processing on different threads, it may be easier to give a more complete answer.
UPDATE:
This is how I would approach it, but I am also using ConcurrentQueue (http://www.codethinked.com/net-40-and-system_collections_concurrent_concurrentqueue) so I can keep putting data into the queue while reading from it.
This way each thread can dequeue safely, without worrying about having to lock the collection.
Parallel.For(0, queue.Count, new ParallelOptions() { MaxDegreeOfParallelism = 20 },
    (j) =>
    {
        String i;
        queue.TryDequeue(out i);
        // call out to URL
        // process data
    });
You may want to put the data into another concurrent collection and have that be processed separately; it depends on your application's needs.
Depending on the way your tasks and workflow are modeled, you can use a Parallel activity and create different branches for the different tasks to be performed. Each branch has its own logic, and the WF runtime will start a second WCF request to retrieve data as soon as it is waiting for the first to respond. This requires you to model the number of branches explicitly but allows for different activities in each branch.
But from your description it sounds like you have the same steps for each task, and in that case you could model it using a ParallelForEach activity and have that iterate over a collection of tasks. Each task object would need to contain all the information used for the request. This requires each task to have the same steps, but you can put in as many tasks as you want.
What works best really depends on your scenario.