Send notification to an Azure Web Role from an Azure Worker Role - Best Practice - Lucene

Situation
Users can upload documents; when one is uploaded, a queue message containing the document's ID is placed onto the queue. The Worker Role picks this up, retrieves the document, and parses it completely with Lucene. After parsing is complete, the Lucene IndexSearcher on the Web Role should be updated.
On the Web Role I keep a static Lucene IndexSearcher, because otherwise you would have to create a new IndexSearcher for every search request, which adds a lot of overhead.
What I want to do is send a notification from the Worker Role to the Web Role that it needs to update its IndexSearcher.
Possible Solutions
Make some sort of notification queue. The Web Role starts an endless task that keeps checking the notification queue. If it finds a message, it updates the IndexSearcher.
Start a WCF service on the Worker Role and connect to it from the Web Role. Do a callback from the Worker Role and tell the Web Role through the service that it needs to update its IndexSearcher.
Just update it at a regular interval
What would be the best solution or is there any other solution for this?
Many thanks!

If your worker roles write each finished job's details to a table using a PK of something like (DateTime.MaxValue - DateTime.UtcNow).Ticks.ToString("d19"), you will have a sorted list of the latest jobs that have been processed. Set your web role to poll the table like so:
// Reverse-tick partition keys sort newest-first, so any key that compares
// "less than" the last index time's reverse ticks belongs to a newer job.
var q = ctx.CreateQuery<LatestJobs>("jobstable")
           .Where(j => j.PartitionKey.CompareTo(LastIndexTime.GetReverseTicks()) < 0)
           .Take(1)
           .AsTableServiceQuery();
if (q.Count() > 0)
{
    // New jobs exist since the last check... re-index.
}
For the worker roles that do the indexing, this is great because they can write to the table indiscriminately without worrying about conflicts. You also get an audit log of the jobs they have processed (assuming you put some details in there).
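The worker-side write could look something like this (a minimal sketch against the same TableServiceContext style; the LatestJobs entity and its Details property are illustrative names):
// Hypothetical logging of a finished job from the worker role. The
// reverse-tick PartitionKey makes the newest entries sort first.
var entry = new LatestJobs
{
    PartitionKey = (DateTime.MaxValue - DateTime.UtcNow).Ticks.ToString("d19"),
    RowKey = documentId,           // e.g. the ID of the document just indexed
    Details = "Indexed document"   // whatever audit info you want to keep
};
ctx.AddObject("jobstable", entry);
ctx.SaveChanges();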
However, you have one remaining problem: it sounds like you have one Web Role that updates the index. That Web Role can of course poll this table at whatever frequency you choose (just track the LastIndexTime for searching later). Your issue is how to control concurrency of the Web Role(s) if you have more than one. Does each Web Role instance maintain its own index, or do you have one stored somewhere for all of them? Sorry, I am not an expert in Lucene, if that should be obvious.
Anyhow, if you have multiple instances in your Web Role and a single index that all of them can see, you need to prevent multiple roles from updating the index over and over. You can do this by leasing the index (if it is stored in blob storage).
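As a rough sketch of the lease approach (assuming the classic Microsoft.WindowsAzure.Storage client and that the lock blob already exists; the blob name and RebuildIndex are illustrative):
// Only the instance that wins the lease performs the re-index.
var blob = container.GetBlockBlobReference("lucene-index.lock");
string leaseId = null;
try
{
    leaseId = blob.AcquireLease(TimeSpan.FromSeconds(60), null);
    RebuildIndex(); // hypothetical: your Lucene update work
}
catch (StorageException)
{
    // Another instance holds the lease; skip this cycle.
}
finally
{
    if (leaseId != null)
        blob.ReleaseLease(AccessCondition.GenerateLeaseCondition(leaseId));
}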
Update based on comment:
If each WebRole instance has its own index, then you don't have to worry about leasing; that is only needed if they share a blob resource. So this technique should work fine as-is, and your only potential obstacle is that the polling intervals of the Web Role instances could be slightly out of sync, causing somewhat different results until all of them have updated (depending on which instance you hit). Poll the table every 30 seconds and that will be your maximum out-of-sync window. Each Web Role instance simply needs to track the last time it updated and do incremental searches from that point.

Depending on upload frequency, you may find queue messages to cause you unneeded updates. For instance, if you get a dozen uploads and process them in close time proximity, you'd now have a dozen queue messages, each telling your web role to update. It would make more sense to keep a single signal (maybe a table row or SQL Azure row). You could simply set a row value to 1, signaling the need to update. When your web role detects this change, reset to 0 and start the update. Note: If using an Azure Table row, you'd need to poll for updates (and depending on traffic, you could start accumulating a large number of transactions). You could use the AppFabric Cache for this signal as well.
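A sketch of that single-signal poll against an Azure Table row (SignalEntity, NeedsUpdate, and RefreshIndexSearcher are illustrative names, not a specific API):
// Poll the signal row; reset it before updating so that a burst of
// uploads collapses into a single re-index.
var signal = ctx.CreateQuery<SignalEntity>("signals")
                .Where(s => s.PartitionKey == "lucene" && s.RowKey == "reindex")
                .Take(1)
                .AsTableServiceQuery()
                .FirstOrDefault();
if (signal != null && signal.NeedsUpdate == 1)
{
    signal.NeedsUpdate = 0;
    ctx.UpdateObject(signal);
    ctx.SaveChanges();
    RefreshIndexSearcher(); // hypothetical: reopen the static IndexSearcher
}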
You could use a WCF service on an internal endpoint on your Web Role. However, you still have the burst issue (if you get, say, a dozen uploads while the Web Role is updating, you don't want to then do another dozen updates).

Related

Hangfire - Is there a way to attach additional meta data to jobs when they are created to be able to identify them later?

I am looking to implement Hangfire within an ASP.NET Core application.
However, I'm struggling to understand how best to prevent the user from creating duplicate Hangfire "Fire-and-Forget" jobs.
The Problem
Say the user, via the app, creates a job that does some processing relating to a specific client. The process may take several minutes to complete. I want to be able to prevent the user from creating another job for the same client while there are other jobs for that client still being processed by Hangfire (i.e. there can only be 1 processing job for a specific client at any one time, although several different clients could also each have their own job being processed).
Solution?
I need a way to attach additional meta-data (in this example, the client id) to each job as it is created, which I can then use to interrogate the jobs currently processing in Hangfire to see if any of them relate to the client id in question.
It seems like such a basic feature that would prove so useful for such scenarios, but I'm coming to the conclusion that such a thing isn't supported, which surprises me.
... Unless you know different.
Hangfire looks great, and I'm keen to use it, but this might be a show-stopper for me.
Any advice would be greatly appreciated.
Thanks
I need a way to attach additional meta-data (in this example, the client id) to each job as it is created
Adding metadata to jobs can be achieved by means of Hangfire filters.
You may have a look at this answer.
https://stackoverflow.com/a/57396553/1236044
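For illustration, a minimal client filter along those lines might look like this (ClientIdFilter is a made-up name, and it assumes the job method's first argument is the client id):
using System.Linq;
using Hangfire.Client;
using Hangfire.Common;

public class ClientIdFilter : JobFilterAttribute, IClientFilter
{
    public void OnCreating(CreatingContext context)
    {
        // Stamp the job with a "ClientId" parameter at creation time.
        var clientId = context.Job.Args.FirstOrDefault()?.ToString();
        context.SetJobParameter("ClientId", clientId);
    }

    public void OnCreated(CreatedContext context) { }
}
Register it with GlobalJobFilters.Filters.Add(new ClientIdFilter()); you can then read the value back with IStorageConnection.GetJobParameter when scanning the currently processing jobs.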
Depending on your needs, you may use more filter types.
For example, the IElectStateFilter may be useful to filter out jobs if another one is currently processing.
If you have several processing servers, you will need your own storage solution to handle your own custom currently-processing/priority/locking mechanism.

How to avoid DB deadlocks when multiple Kafka messages are produced for the same item?

We have two different web applications; let's name them A and B.
When a user changes the analysis of an item in app A, app A does its processing and produces a Kafka message.
A REST API in app B consumes the message via the Confluent HTTP sink connector.
The REST API in app B calls a SQL stored procedure that updates records within a transaction.
When the user changes the analysis of the same item in app A repeatedly (which happens a lot), a deadlock occurs in the DB, because the stored procedure is still working on the records when another call for the same item arrives.
What is the best practice to handle this issue?
Manage some global list of the item IDs currently inside the stored procedure, and remove them when the procedure finishes? Handle it in the DB? Any other suggestions?
Some relevant info:
The apps are ASP.NET Core.
They are hosted in a load-balanced environment (AWS).
Any relevant answer is appreciated.
Thanks!
Make sure the same item is always published with the same key (e.g. use the item's hash code as the key). This ensures that all requests from app A will go to the same topic partition.
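From app A's side, a minimal keyed produce with the Confluent.Kafka .NET client might look like this (the topic name and payload variables are illustrative, and it assumes you're inside an async method):
using Confluent.Kafka;

var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
using var producer = new ProducerBuilder<string, string>(config).Build();

// Same item => same key => same partition => ordered delivery.
await producer.ProduceAsync("item-analysis", new Message<string, string>
{
    Key = itemId,        // the item's ID (or hash) as the message key
    Value = analysisJson // hypothetical serialized analysis payload
});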
In app B, make sure the procedure call is done on the consumer polling thread (don't spawn a new thread), so that all procedure calls for the same item are guaranteed to execute sequentially.
This will resolve deadlocks at the cost of performance. For multiple items you can scale horizontally with multiple consumers (as long as you have plenty of partitions). If performance on repeated requests for the same item is too slow then you have a more complex design issue to address.

Decide large process and notify users

I have some processing to do on the server side.
When a user selects a large amount of data for processing (say, inserts, updates, and deletes in the database, plus file read/write work), it takes a long time.
I am using C# with a .NET Core MVC web application.
In this case, is it possible to detect when a process takes more than some set amount of time, run it in the background (or hand the process off to another tool if possible), and notify the user that it will take some time and that they will be notified once it's done? (That notification need not be real time; we can mail it.)
So is there any mechanism to do this?
You can go ahead and create a job for processing the data; try Hangfire, which allows you to create background jobs inside your ASP.NET Core application.
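A minimal sketch (the IDataProcessor service is an illustrative name; AddHangfire, AddHangfireServer, and BackgroundJob.Enqueue are Hangfire's real APIs):
// In Startup.ConfigureServices: register Hangfire with SQL Server storage.
services.AddHangfire(cfg => cfg.UseSqlServerStorage(connectionString));
services.AddHangfireServer();

// In your controller: enqueue the long-running work and return at once.
var jobId = BackgroundJob.Enqueue<IDataProcessor>(p => p.Process(requestId));
The request returns immediately with the job ID, while a Hangfire server thread performs the processing in the background.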
I don't think you will be able to do exactly what you want. On the other hand, you can use a parallel foreach to process more data faster.
Ex:
Parallel.ForEach(entities, // your list of entities
    new ParallelOptions { MaxDegreeOfParallelism = 2 },
    entity =>
    {
        // Your code processing the entity (insert, update, delete)
    });
The MaxDegreeOfParallelism property defines the maximum number of threads to use. I suggest starting with 2 and increasing it one by one to see what fits best for you.
Parallel foreach is going to use more threads to process your data.
If parallel foreach does not solve your problem, you can use a strategy that consists of receiving your user's data, assuming up front that the processing will take a long time, storing the data as-is in your database (or any other kind of storage), and answering the user with a transaction ID and a message explaining that the work is going to take a while and that the result will be sent by e-mail. To do this you will need to build another service that processes these transactions and e-mails the users with whatever you think is necessary.
Another possibility is, instead of notifying the user through e-mail, to create a method that checks the processing status by transaction ID and use a polling strategy, so the user won't even notice that the processing is being done in the background.
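A sketch of such a status endpoint (the route, repository, and Status field are all illustrative):
// Hypothetical endpoint the client polls with its transaction ID.
[HttpGet("transactions/{id}/status")]
public IActionResult GetStatus(Guid id)
{
    var tx = _repository.Find(id);       // look up the stored transaction
    if (tx == null) return NotFound();
    return Ok(new { tx.Id, tx.Status }); // e.g. Pending / Processing / Done
}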
Hope you succeed
Best Regards

Periodic Email Notifications (Windows Azure .Net)

I have an application written in C# ASP.NET MVC4, running as a Windows Azure Website. I would like to write a service/job to perform the following:
1. Read the user information from the website database
2. Build a user-wise site activity summary
3. Generate an HTML email message that includes the summary for each user account
4. Periodically send such emails to each user
I am new to Windows Azure Cloud Services and would like to know best approach / solution to achieve the above.
Based on my study so far, it seems that an independent Cloud Services Worker Role, along with SendGrid and Postal, would be the best fit. Please suggest.
You're on the right track, but... Remember that a Worker Role (or Web Role) is basically a blueprint for a Windows Server VM, and you run one or more instances of that role definition. That VM, just like Windows Server running locally, can perform a bunch of tasks simultaneously. So... there's no need to create a separate Worker Role just for doing hourly emails. Think about it: for nearly an hour it'll be sitting idle, and you'll be paying for it (for however many instances of the role you launch, and you cannot drop it to zero; you'll always need at least one instance).
If, however, you create a thread on an existing worker or web role, which simply sleeps for an hour and then does the email updates, you basically get this ability at no extra cost (and you should hopefully cause minimal impact to the other tasks running on that web/worker role's instances).
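As a rough sketch of that reused-role approach (SendHourlyEmailUpdates is a hypothetical method; you would start this from the role's OnStart or Run):
var emailThread = new Thread(() =>
{
    while (true)
    {
        Thread.Sleep(TimeSpan.FromHours(1));
        SendHourlyEmailUpdates(); // build summaries and send via SendGrid/Postal
    }
});
emailThread.IsBackground = true;
emailThread.Start();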
One thing you'll need to do, whether you use a separate role or reuse an existing one: be prepared for multiple instances. That is, if you have two role instances, they'll both be running the code to check every hour, so you'll need a scheme to prevent both instances from doing the same task. This can be solved in several ways. For example: use a queue message that stays invisible for an hour, then appears; your code checks for a queue message maybe every minute, and whichever instance gets it first does the hourly work. Or maybe run Quartz.NET.
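The queue variant could be sketched like this with the classic storage client (whichever instance dequeues the message first does the work, then re-schedules it):
var msg = queue.GetMessage();
if (msg != null)
{
    SendHourlyEmailUpdates(); // hypothetical hourly work
    queue.DeleteMessage(msg);
    // Re-schedule: the new message stays invisible for the next hour.
    queue.AddMessage(new CloudQueueMessage("hourly-email"),
                     null, TimeSpan.FromHours(1), null, null);
}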
I didn't know Postal, but it seems like the right combination to use.

Real time application on Microsoft Azure

I'm working on a real-time application and building it on Azure.
The idea is that every user reports something about themselves, and all the other users should see it immediately (they poll the service every second or so for new info).
My approach so far has been to use a Web Role hosting a WCF REST service, where I do all the writing to the DB (SQL Azure) without a Worker Role, so that data is written immediately.
I've come to think that using a Worker Role and a queue to do the writing might be much more scalable, but it might interfere with the real-time nature of the service (the Worker Role might not pick the job up from the queue immediately).
Is that true? How should I go about this issue?
Thanks
While it's true that the queue will add a bit of latency, you'll be able to scale out the number of Worker Role instances to handle the sheer volume of messages.
You can also optimize queue-reading by getting more than one message at a time. Since a single queue has a scalability target of 500 TPS, this lets you go well beyond 500 messages per second on reads.
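For illustration, a batched read with the classic storage client (ProcessUserUpdate is a hypothetical handler):
// Pull up to 32 messages per call instead of one, cutting the number
// of storage transactions per message processed.
foreach (var msg in queue.GetMessages(32, TimeSpan.FromMinutes(5)))
{
    ProcessUserUpdate(msg.AsString);
    queue.DeleteMessage(msg);
}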
You might look into a Cache for buffering the latest user updates, so when polling occurs, your service reads from cache instead of SQL Azure. That might help as the volume of information increases.
You could have a look at SignalR. It does not support farm scenarios out of the box, but it should work with either internal endpoint calls to update every instance, the Azure Service Bus, or the AppFabric Cache. This way you get a push scenario rather than a pull scenario, so you don't have to poll your endpoints for potential updates.