I am looking for a reliable and easy pattern for consuming (executing) background tasks in .NET Core in parallel.
I found this https://stackoverflow.com/a/49814520/1448545 answer, but the problem is that there is always a single consumer of the tasks.
What if there is one new task to perform every 100 ms, while each task takes 500 ms to complete (e.g. a long-running API call)? In that case the tasks will pile up.
How can I make this dynamic, so that when there are more items in BlockingCollection<TaskSettings> _tasks, .NET Core creates more task executors (consumers)?
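A sketch of what the multi-consumer version could look like, assuming the TaskSettings type from the linked answer and a hypothetical ExecuteAsync method for the long-running call; a fixed pool is shown, sized for the rates above:

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var tasks = new BlockingCollection<TaskSettings>();

// ~500 ms of work arriving every ~100 ms needs at least 5 concurrent consumers.
const int consumerCount = 5;

var consumers = Enumerable.Range(0, consumerCount)
    .Select(_ => Task.Run(async () =>
    {
        // GetConsumingEnumerable blocks each consumer until an item is
        // available and completes once CompleteAdding() has been called.
        foreach (var settings in tasks.GetConsumingEnumerable())
        {
            await ExecuteAsync(settings); // hypothetical long-running API call
        }
    }))
    .ToArray();

To make it truly dynamic you could watch tasks.Count and spin up extra consumers when the backlog grows, but a pool sized for the worst case is usually simpler.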
I have an application that mimics an equity market. One part of it generates price changes and POSTs them to a consumer, sending roughly 100 price changes about ten times per second. The second part of the exchange takes in orders, executes them randomly, and asynchronously sends execution reports back to the same consumer as the price changes.
I want to put this on an App Service, but here's the issue:
I want the price generator to start immediately and run continuously.
The order execution only needs to run while orders are being sent (asynchronously), until all the orders have been executed or cancelled; then it can shut down until another order is received.
It seems like I'm forced into one of two buckets, and neither fits what I want to do. A WebJob works like a Windows Service: it starts immediately and runs until you shut it down, but it has no way to host an ASP.NET-style controller.
Deploying as an App Service works as long as I wake it up by POSTing an order, but the price feed doesn't start until I send that first order.
So here's the question: How do you deploy a .NET Core application as an App Service and have it start automatically (without waking it up with an initial HTTP call)?
Based on your description, I suggest you consider turning the price feed into a background service inside the .NET Core application. Background tasks can be implemented as hosted services; a hosted service is a class with background task logic that implements the IHostedService interface.
It contains the StartAsync method. StartAsync(CancellationToken) contains the logic to start the background task, and it is called before:
The app's request processing pipeline is configured.
The server is started and IApplicationLifetime.ApplicationStarted is triggered.
For more details, you can refer to this article.
Besides, I suggest you also enable the Azure Web App's Always On setting, so the app isn't unloaded when idle.
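A minimal sketch of that direction, using the BackgroundService base class (which implements IHostedService); PublishPriceChangesAsync is a hypothetical stand-in for your existing feed logic:

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Starts with the app (no initial HTTP call needed) and runs until shutdown.
public class PriceFeedService : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await PublishPriceChangesAsync(stoppingToken);
            await Task.Delay(100, stoppingToken); // ~10 batches per second
        }
    }

    // Hypothetical stand-in for the existing feed logic (POST ~100 changes).
    private Task PublishPriceChangesAsync(CancellationToken ct) => Task.CompletedTask;
}

Register it with services.AddHostedService<PriceFeedService>() and it starts as soon as the app does.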
What am I doing wrong?
IHostedService classes are registered in the ASP.NET Core HostBuilder.
They then run continuously in the background, even if they have no work to do.
There is no way to pass them work, so I suppose they must pick up tasks from a dynamic store, e.g. a database.
So it seems they mostly poll, or run short-interval timers looking for work.
Then, when they get jobs, they can only do one at a time.
So if my users (100+) run reports on a Friday (or whatever day they wish), the service just polls the database for 6.5 days and is then throttled for 0.5 days to get the 100+ reports generated.
So how can I:
control the starting of the IHostedService service
Run more than one instance of the IHostedService service
Send tasks in the form of data to the IHostedService service (instance)
Further to this, I will need 10+ different types of IHostedService (10+ different polling types).
So running them in the background just to poll the database takes up CPU cycles on both the web server and the database server.
control the starting of the IHostedService service
You can't. The start and stop of an IHostedService is controlled by the host itself. See the official docs.
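For illustration, registration is the one thing you do control; the host then owns the lifetime (ReportWorker is a hypothetical worker class):

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// The host starts and stops hosted services itself; your only control point
// is which services you register at startup.
public static IHostBuilder CreateHostBuilder(string[] args) =>
    Host.CreateDefaultBuilder(args)
        .ConfigureServices(services =>
        {
            services.AddHostedService<ReportWorker>();
        });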
Run more than one instance of the IHostedService service
As IHostedService instances are run by the host itself, I believe you don't actually want to start multiple IHostedService instances; rather, you're seeking a way to start tasks in parallel and then wait for all of them to be done. IMO, a better way is to create a delegate that returns a task composed of several sub-tasks. These sub-tasks will run in parallel.
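For illustration, such a delegate might look like this (GenerateReportAsync is a hypothetical sub-task):

using System;
using System.Threading;
using System.Threading.Tasks;

// One work item that fans out into several sub-tasks and completes
// only when all of them are done.
Func<CancellationToken, Task> workItem = async token =>
{
    var subTasks = new[]
    {
        GenerateReportAsync(1, token), // hypothetical
        GenerateReportAsync(2, token),
        GenerateReportAsync(3, token)
    };
    await Task.WhenAll(subTasks); // the sub-tasks run in parallel
};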
Send tasks in the form of data to the IHostedService service (instance)
The official docs have an excellent example of this:
You can wrap your task in a delegate (which is a form of data).
When you want to send this task to the IHostedService from somewhere (e.g. a controller), just enqueue your task wrapper (the delegate instance) into the queue service.
The hosted service will wait until there's a task in the queue, dequeue a work item, and execute that delegate instance.
If you want to start several work items at the same time, just dequeue multiple work items and then start them with Task.WhenAll:
await Task.WhenAll(...);
For more details, see parallel-programming
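A minimal sketch of that queue pattern, in the spirit of the docs' example but built on System.Threading.Channels (the type names here are illustrative, not the docs' exact ones):

using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Queue service shared between producers (e.g. controllers) and the consumer.
public class BackgroundTaskQueue
{
    private readonly Channel<Func<CancellationToken, Task>> _queue =
        Channel.CreateUnbounded<Func<CancellationToken, Task>>();

    // Called from a controller to hand a work item to the hosted service.
    public void Enqueue(Func<CancellationToken, Task> workItem) =>
        _queue.Writer.TryWrite(workItem);

    public ValueTask<Func<CancellationToken, Task>> DequeueAsync(CancellationToken ct) =>
        _queue.Reader.ReadAsync(ct);
}

public class QueuedHostedService : BackgroundService
{
    private readonly BackgroundTaskQueue _queue;

    public QueuedHostedService(BackgroundTaskQueue queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // Waits until a work item is available, then runs it.
            var workItem = await _queue.DequeueAsync(stoppingToken);
            await workItem(stoppingToken);
        }
    }
}

Register both at startup (services.AddSingleton<BackgroundTaskQueue>() plus services.AddHostedService<QueuedHostedService>()) and enqueue delegates wherever you need to hand off work.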
I have a web app (ASP.NET Core 2.0) and a simple job processor (.NET Core 2.0), shown below.
My web app adds jobs to a database; the processor picks up jobs every 5 minutes and runs some logic.
I wrapped the processor in Docker and deployed and ran it on two servers (so there are two instances).
Any solutions to make sure there is no duplicate work here? I want both instances active at the same time.
Simple job processor
while (true)
{
    Console.WriteLine("Background worker is running");

    // Query the next job from the job table
    if (DateTime.UtcNow < job.ExpiredAt)
    {
        // Call external REST API
        // Do something
    }

    Console.WriteLine("Background worker is delayed for 5 minutes\r\n");

    // Synchronously block this loop for JobInterval minutes between polls
    Task.Delay(JobInterval * 60 * 1000).Wait();
}
You need something to coordinate your workers. You can't just have multiple instances grabbing at the same pool and expect the work to stay separate with no duplication; concurrency will eat your lunch. Instead, there should be a coordinating component that assigns tasks out to the nodes; that's really the only safe way to handle this.
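One lightweight sketch of that coordination, letting the database itself arbitrate: each instance atomically claims a pending job before working on it, so only one instance can win a given row. This assumes SQL Server and an illustrative Jobs table with Id, Status, and ClaimedBy columns:

using System;
using System.Data.SqlClient;

// Atomically claim one pending job; only one instance's UPDATE can win a row.
const string claimSql = @"
    UPDATE TOP (1) Jobs
    SET    Status = 'Claimed', ClaimedBy = @worker
    OUTPUT inserted.Id
    WHERE  Status = 'Pending';";

using (var conn = new SqlConnection("<connection string>"))
using (var cmd = new SqlCommand(claimSql, conn))
{
    cmd.Parameters.AddWithValue("@worker", Environment.MachineName);
    conn.Open();
    var claimedId = cmd.ExecuteScalar(); // null when nothing is pending
    if (claimedId != null)
    {
        // run the job for claimedId; the other instance never sees it
    }
}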
The Heroku Dev Center, on the page about using worker dynos and background jobs, states that you need to use workers plus queues to handle API calls such as fetching an RSS feed, since the operation may take some time if the remote server is slow, and doing this on a web dyno would block it from receiving additional requests.
However, from what I've read, it seems to me that one of the major points of Node.js is that it doesn't suffer from blocking under these conditions due to its asynchronous event-based runtime model.
I'm confused, because wouldn't this imply that it would be OK to make (asynchronous) API calls in the web dynos? Perhaps the docs were written more for the Ruby/Python/etc. use cases, where a synchronous model was more prevalent?
Node.js is an implementation of the reactor pattern. The default build of Node.js backs blocking operations with a small libuv thread pool (four threads by default); once those threads are all busy with I/O-bound tasks, further thread-pool work queues up behind them.
A common misconception about Node.js is that it is a system that allows you to do many things at once. That is not necessarily the case: it allows you to do other things while waiting on I/O-bound tasks.
Any CPU-bound task executes on the main event loop, meaning it will block everything else.
This means that if your "job" is I/O bound, like putting things into databases, you can probably get away without worker dynos. This of course depends on how much you plan on having go on at once. Remember, any task you put in your main app takes resources away from other incoming requests.
Generally, though, it is not recommended for things like this: if you have a job that does real processing, it belongs in a queue and should be executed in its own process or thread.
How to design a parallel processing workflow
I have a scenario involving data analysis.
There are basically four steps:
pick up a task, either read from a queue or received as a message through an API (a web service, maybe), to trigger the service
submit a request to a remote service based on the parameters from step 1
wait for the remote service to finish, then download the result
perform processing on the data downloaded in step 3
The four steps above look like a sequential workflow.
My question is: how can I scale it out?
Every day I might need to perform hundreds to thousands of these tasks.
If I can do them in parallel, that will help a lot,
e.g. run 20 tasks at a time.
So can we configure Windows Workflow Foundation to run in parallel?
Thanks.
You may want to use PFX, the Parallel Extensions (http://www.albahari.com/threading/part5.aspx); then you can control how many threads to create for fetching, and I find PLINQ helpful for this.
So you loop over the list of URLs, perhaps read from a file or database, and then in your Select you call a function to do the processing.
If you can go into more detail as to whether you want the fetching and processing to be on different threads, for example, it may be easier to give a more complete answer.
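As a sketch of that loop-and-Select idea, assuming urls is your list of URL strings and FetchAndProcess is a hypothetical per-URL function:

using System.Linq;

// Fetch and process the URLs with a bounded number of workers.
var results = urls
    .AsParallel()
    .WithDegreeOfParallelism(20)
    .Select(url => FetchAndProcess(url))
    .ToList();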
UPDATE:
This is how I would approach it, but I am also using ConcurrentQueue (http://www.codethinked.com/net-40-and-system_collections_concurrent_concurrentqueue) so I can keep putting data into the queue while reading from it.
That way each thread can dequeue safely, without worrying about locking the collection.
Parallel.For(0, queue.Count, new ParallelOptions { MaxDegreeOfParallelism = 20 },
    j =>
    {
        // TryDequeue returns false if another thread emptied the queue first
        string url;
        if (queue.TryDequeue(out url))
        {
            // call out to URL
            // process data
        }
    });
You may want to put the processed data into another concurrent collection and have it handled separately; it depends on your application's needs.
Depending on how your tasks and workflow are modeled, you can use a Parallel activity and create different branches for the different tasks to be performed. Each branch has its own logic, and the WF runtime will start a second WCF request to retrieve data as soon as the first branch is waiting for its response. This requires you to model the number of branches explicitly, but it allows for different activities in each branch.
But from your description it sounds like you have the same steps for each task; in that case you could model it with a ParallelForEach activity and have it iterate over a collection of tasks. Each task object would need to contain all the information used for its request. This requires every task to have the same steps, but you can put in as many tasks as you want.
What works best really depends on your scenario.
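If it helps, here is a code-only sketch of the ParallelForEach approach (WF4), where each string stands in for one task's parameters and WriteLine is a placeholder for the real submit/wait/download/process branch logic:

using System;
using System.Activities;
using System.Activities.Statements;
using System.Collections.Generic;

var requests = new List<string> { "task1", "task2", "task3" };
var item = new DelegateInArgument<string>();

Activity workflow = new ParallelForEach<string>
{
    Values = new InArgument<IEnumerable<string>>(ctx => requests),
    Body = new ActivityAction<string>
    {
        Argument = item,
        Handler = new WriteLine
        {
            // Placeholder for the per-task branch logic
            Text = new InArgument<string>(ctx => "Processing " + item.Get(ctx))
        }
    }
};

WorkflowInvoker.Invoke(workflow);

Note that WF schedules all branches on a single thread; the overlap comes from branches going idle while they wait on their remote calls, not from extra threads.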