Hangfire - Is there a way to attach additional metadata to jobs when they are created, to be able to identify them later?

I am looking to implement Hangfire within an Asp.Net Core application.
However, I'm struggling to understand how best to prevent the user from creating duplicate Hangfire "Fire-and-Forget" jobs.
The Problem
Say the user, via the app, creates a job that does some processing relating to a specific client. The process may take several minutes to complete. I want to be able to prevent the user from creating another job for the same client while there are other jobs for that client still being processed by Hangfire (i.e. there can only be 1 processing job for a specific client at any one time, although several different clients could also each have their own job being processed).
Solution?
I need a way to attach additional metadata (in this example, the client id) to each job as it is created, which I can then use to interrogate the jobs currently processing in Hangfire to see if any of them relate to the client id in question.
It seems like such a basic feature that would prove so useful for such scenarios, but I'm coming to the conclusion that such a thing isn't supported, which surprises me.
... Unless you know different.
Hangfire looks great, and I'm keen to use it, but this might be a show-stopper for me.
Any advice would be gratefully received.
Thanks

I need a way to attach additional metadata (in this example, the client id) to each job as it is created
Adding metadata to jobs can be achieved by means of Hangfire filters.
You may have a look at this answer: https://stackoverflow.com/a/57396553/1236044
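To illustrate, here is a minimal sketch of such a client filter. It assumes the client id is the job method's first argument, and it stores it under a parameter name ("ClientId") chosen just for this example:

    using System.Linq;
    using Hangfire.Client;
    using Hangfire.Common;

    public class ClientIdFilter : JobFilterAttribute, IClientFilter
    {
        public void OnCreating(CreatingContext context)
        {
            // Assumption: the job method's first argument is the client id.
            var clientId = context.Job.Args.FirstOrDefault()?.ToString();
            if (clientId != null)
            {
                context.SetJobParameter("ClientId", clientId);
            }
        }

        public void OnCreated(CreatedContext context)
        {
        }
    }

You would register it with GlobalJobFilters.Filters.Add(new ClientIdFilter()) and can later read the parameter back for any job id through IStorageConnection.GetJobParameter.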
Depending on your needs, you may use more filter types.
For example, the IElectStateFilter may be useful to filter out jobs if another one is currently processing.
If you have several processing servers, you will need your own storage solution to handle your own custom currently-processing/priority/locking mechanism.
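Here is a rough sketch of that IElectStateFilter idea, reusing the hypothetical "ClientId" parameter from the filter above: when a job is about to enter the Processing state, it is deleted if another job with the same client id is already processing. Note that the check itself is not atomic, so with several servers you would still need the distributed locking mentioned above:

    using System.Linq;
    using Hangfire;
    using Hangfire.Common;
    using Hangfire.States;

    public class SkipDuplicateClientAttribute : JobFilterAttribute, IElectStateFilter
    {
        public void OnStateElection(ElectStateContext context)
        {
            // Only intervene when the job is about to start processing.
            if (!(context.CandidateState is ProcessingState)) return;

            // Job parameters are stored serialized, so compare raw values.
            var clientId = context.Connection.GetJobParameter(
                context.BackgroundJob.Id, "ClientId");
            if (clientId == null) return;

            var monitoring = JobStorage.Current.GetMonitoringApi();
            var duplicate = monitoring
                .ProcessingJobs(0, (int)monitoring.ProcessingCount())
                .Any(j => j.Key != context.BackgroundJob.Id &&
                          context.Connection.GetJobParameter(j.Key, "ClientId") == clientId);

            if (duplicate)
            {
                // Delete the duplicate (you could reschedule it instead).
                context.CandidateState = new DeletedState
                {
                    Reason = "A job for this client is already being processed."
                };
            }
        }
    }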

Related

Pentaho large source table processing to target table same schema

I currently have an ETL job that reads a source table with over 1 million records and then processes them sequentially into a target table. Both source and target are in the same schema, but in between there is an external REST endpoint call to post some data from the source table, and this job is performing very badly right now. Can someone please let me know some ways to improve performance, for example by parallelizing this or reducing the fetch size, to cut the job's running time?
Check if your REST endpoint supports batching, and then implement that. Most APIs do these days. (In this case, you send multiple requests in one JSON/XML file to the endpoint.)
Otherwise you simply need to use multiple copies of the REST client step. You should be able to get away with 8-10 at least, but check that you're not limited in some way at the other end.
Finally, if none of that helps, try concocting your own HTTP client in the Java class step (not the JavaScript one), and make sure you authenticate with the REST endpoint only once, not on every request, by keeping the session open. I'm not 100% convinced the REST client step does this, and authentication is often the most expensive bit.
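To show the authenticate-once idea concretely, here is a sketch in C# (a Java class step version would have the same shape): one shared client that logs in a single time and then posts records in batches. All URLs, credentials, and payload shapes here are made up for the example:

    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Text;
    using System.Threading.Tasks;

    public static class BatchPoster
    {
        // A single shared HttpClient reuses connections and cookies, so
        // authentication happens once instead of once per record.
        private static readonly HttpClient Client =
            new HttpClient(new HttpClientHandler { UseCookies = true });

        public static async Task PostAllAsync(IEnumerable<string> jsonRecords)
        {
            // Hypothetical login endpoint; called exactly once.
            await Client.PostAsync("https://api.example.com/login",
                new StringContent("{\"user\":\"etl\",\"pass\":\"secret\"}",
                    Encoding.UTF8, "application/json"));

            // Send records in batches of 100 instead of one request per row.
            foreach (var batch in jsonRecords
                .Select((record, i) => new { record, i })
                .GroupBy(x => x.i / 100, x => x.record))
            {
                var body = "[" + string.Join(",", batch) + "]";
                var response = await Client.PostAsync("https://api.example.com/bulk",
                    new StringContent(body, Encoding.UTF8, "application/json"));
                response.EnsureSuccessStatusCode();
            }
        }
    }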

Use IronWorker while reusing my existing work

My website is hosted on AWS Elastic Beanstalk (PHP). I use Yii Framework as an MVC.
A while ago I wanted to run a SQL query everyday. I looked up how to run crons on Beanstalk and it seemed complicated to merge the concepts of Cloud and Cron. I ran into Iron Worker (http://www.iron.io/worker), and managed to create a worker that is currently doing its job fine.
Today I want to run a more complex cron: look for notifications in my database, decide whether to send an email, build an email template, and send the email (via AWS SES).
From what I understand, worker files are supposed to be self-contained items, with everything they need to work.
However, I have invested a lot of time and effort in building my MVC. I have complex models, verifications, an email templating engine, etc...
It seems very difficult to use the work I've done to create an Iron Worker. Even if I managed to port all of my code to a worker (which seems like a great deal of work), it means anytime I make changes to my main code I need to make sure the worker also has those changes. It means I would have a "branch" of my code. Even more so if I want to create more workers in the future.
What is the correct approach?
Short-term, you could likely just use the scheduling capabilities in IronWorker and have the worker hit an endpoint in your application. The endpoint will then trigger the operations to run within your app environment.
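For illustration, the application-side endpoint could look something like this (sketched in C#; a Yii controller action would take the same shape). The route, the token check, and the email helper are all hypothetical:

    using System.Web.Mvc;

    public class CronController : Controller
    {
        [HttpPost]
        public ActionResult Nightly(string token)
        {
            // Reject calls that don't carry the shared secret the worker sends.
            if (token != "expected-secret") return new HttpStatusCodeResult(403);

            // Runs inside the app, with full access to models, templates, etc.
            SendPendingEmails();
            return new HttpStatusCodeResult(200);
        }

        private static void SendPendingEmails()
        {
            // Hypothetical: query notifications, build templates, send via SES.
        }
    }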
Longer-term, we do suggest you look at more of a service-oriented approach, whereby you break your application up to be more loosely coupled and distributed. Here's a post on the subject. The advantages are many, especially around scalability and development agility.
https://blog.heroku.com/archives/2013/12/3/end_monolithic_app
You can also take a look at this Yii extension: http://www.yiiframework.com/extension/yiiron/
We certainly don't want you to rewrite your app unnecessarily, but there are likely areas where you can look to decouple. We suggest creating a worker directory and making an effort to write the workers to be self-contained. That way, you could run them in a different environment and just pass payloads to the workers. (Push queues can also be used to push to these workers.) Once you get used to distributed async processing, it's a pretty easy process to manage.
(Note: I work at Iron.io)

Periodic Email Notifications (Windows Azure .Net)

I have an application written in C# ASP.NET MVC4, running on a Windows Azure Website. I would like to write a service / job to perform the following:
1. Read the user information from the website database
2. Build a user-wise site activity summary
3. Generate an HTML email message that includes the summary for each user account
4. Periodically send such emails to each user
I am new to Windows Azure Cloud Services and would like to know the best approach / solution to achieve the above.
Based on my study so far, I see that an independent Cloud Services Worker Role, along with SendGrid and Postal, would be the best fit. Please suggest.
You're on the right track, but... Remember that a Worker Role (or Web Role) is basically a blueprint for a Windows Server VM, and you run one or more instances of that role definition. And that VM, just like Windows Server running locally, can perform a bunch of tasks simultaneously. So... there's no need to create a separate worker role just for doing hourly emails. Think about it: for nearly an hour, it'll be sitting idle, and you'll be paying for it (for however many instances of the role you launch, and you cannot drop it to zero; you'll always need a minimum of one instance).
If, however, you create a thread on an existing worker or web role, which simply sleeps for an hour and then does the email updates, you basically get this ability at no extra cost (and you should hopefully cause minimal impact to the other tasks running on that web/worker role's instances).
One thing you'll need to do, whether you use a separate role or reuse one: be prepared for multiple instances. That is, if you have two role instances, they'll both be running the code to check every hour, so you'll need a scheme to prevent both instances from doing the same task. This can be solved in several ways. For example: use a queue message that stays invisible for an hour and then appears; your code would check, maybe every minute, for a queue message (and the first instance that gets it does the hourly work). Or maybe run Quartz.NET.
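Here is a rough sketch of that queue-message timer, using the classic Microsoft.WindowsAzure.Storage SDK. The queue name is a placeholder, and you would seed the first "tick" message once (e.g. at deployment) with the same one-hour visibility delay, rather than from every instance:

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    public static class HourlyEmailTimer
    {
        public static void Run(string connectionString)
        {
            var queue = CloudStorageAccount.Parse(connectionString)
                .CreateCloudQueueClient()
                .GetQueueReference("hourly-email");  // placeholder name
            queue.CreateIfNotExists();

            while (true)
            {
                // Only the instance that dequeues the now-visible message
                // does the hourly work; every other instance sees null.
                var msg = queue.GetMessage();
                if (msg != null)
                {
                    SendSummaryEmails();
                    queue.DeleteMessage(msg);

                    // Re-arm the timer: the next tick appears in one hour.
                    queue.AddMessage(new CloudQueueMessage("tick"),
                        initialVisibilityDelay: TimeSpan.FromHours(1));
                }
                Thread.Sleep(TimeSpan.FromMinutes(1));
            }
        }

        private static void SendSummaryEmails()
        {
            // Hypothetical: build the summaries and send via SendGrid/Postal.
        }
    }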
I didn't know Postal, but it seems like the right combination to use.

Send notice to Azure Web role from an Azure Worker role - Best Practice

Situation
Users can upload documents; a queue message is placed onto the queue with the document's ID. The Worker Role picks this up, gets the document, and parses it completely with Lucene. After the parsing is complete, the Lucene IndexSearcher on the Web Role should be updated.
On the Web Role I'm keeping a static Lucene IndexSearcher, because otherwise you have to create a new IndexSearcher for every search request, which adds a lot of overhead.
What I want to do is send a notice from the Worker Role to the Web Role that it needs to update its IndexSearcher.
Possible Solutions
Make some sort of notice queue. The Web Role starts an endless task that keeps checking the notice queue. If it finds a message, it should update the IndexSearcher.
Start a WCF service on the Worker Role and connect with the Web Role. Do a callback from the Worker Role and tell the Web Role through the service that it needs to update its IndexSearcher.
Just update it at a regular interval.
What would be the best solution or is there any other solution for this?
Many thanks!
If your worker roles write each finished job's details to a table using a PK of something like (DateTime.MaxValue - DateTime.UtcNow).Ticks.ToString("d19"), you will have a sorted list of the latest jobs that have been processed. Set your web role to poll the table like so:
    // Assumes 'ctx' is the TableServiceContext for your storage account, and
    // that LastIndexTime.GetReverseTicks() returns
    // (DateTime.MaxValue - LastIndexTime).Ticks.ToString("d19").
    var q = ctx.CreateQuery<LatestJobs>("jobstable")
        .Where(j => j.PartitionKey.CompareTo(LastIndexTime.GetReverseTicks()) < 0)
        .Take(1)
        .AsTableServiceQuery();

    if (q.Count() > 0)
    {
        // New jobs exist since the last check... re-index.
    }
For the worker roles that do the indexing work, this is great because they can write indiscriminately to the table without worrying about conflicts. You also get an audit log of the jobs they are processing (assuming you put some details in there).
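For illustration, the worker-side write could look like this, using the same (old) table storage SDK as the query above; the entity shape and the DocumentId detail are assumptions:

    using System;
    using Microsoft.WindowsAzure.StorageClient;

    public class LatestJobs : TableServiceEntity
    {
        public string DocumentId { get; set; }  // hypothetical audit detail

        public LatestJobs() { }

        public LatestJobs(string documentId)
        {
            // Newest-first sort: later times produce smaller keys.
            PartitionKey = (DateTime.MaxValue - DateTime.UtcNow).Ticks.ToString("d19");
            RowKey = Guid.NewGuid().ToString();  // avoid collisions within one tick
            DocumentId = documentId;
        }
    }

    // Writing a row (ctx is the same TableServiceContext as in the query):
    //   ctx.AddObject("jobstable", new LatestJobs(documentId));
    //   ctx.SaveChangesWithRetries();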
However, you have one remaining problem: it sounds like you have one web role that updates the index. This web role can of course poll the table on whatever frequency you choose (just track the LastIndexTime for searching later). Your issue is how to control concurrency of the web role(s) if you have more than one. Does each web role maintain its own index, or do you have one stored somewhere for all? Sorry, but I am not an expert in Lucene if that should be obvious.
Anyhow, if you have multiple instances in your Web Role and a single index that all can see, you need to prevent multiple instances from updating the index over and over. You can do this by taking a lease on the index (if it is stored in blob storage).
Update based on comment:
If each WebRole instance has its own index, then you don't have to worry about leasing. That is only if they are sharing a blob resource together. So, this technique should work fine as-is and your only potential obstacle is that the polling intervals for the web roles could be slightly out of sync, causing somewhat different results until all update (depending on which instance you hit). Poll every 30 seconds on the table and that will be your max out of sync. Each web role instance simply needs to track the last time it updated and do incremental searches from that point.
Depending on upload frequency, you may find queue messages to cause you unneeded updates. For instance, if you get a dozen uploads and process them in close time proximity, you'd now have a dozen queue messages, each telling your web role to update. It would make more sense to keep a single signal (maybe a table row or SQL Azure row). You could simply set a row value to 1, signaling the need to update. When your web role detects this change, reset to 0 and start the update. Note: If using an Azure Table row, you'd need to poll for updates (and depending on traffic, you could start accumulating a large number of transactions). You could use the AppFabric Cache for this signal as well.
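As a sketch of that single-signal idea against a SQL Azure table (assuming a one-row table such as CREATE TABLE IndexSignal (NeedsReindex bit)):

    using System.Data.SqlClient;

    public static class IndexSignal
    {
        // Worker role: request a re-index. Any number of uploads in a burst
        // collapses into a single pending signal.
        public static void Raise(string connStr)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "UPDATE IndexSignal SET NeedsReindex = 1", conn))
            {
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }

        // Web role: atomically consume the signal in one UPDATE;
        // returns true if a re-index is due.
        public static bool TryConsume(string connStr)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(
                "UPDATE IndexSignal SET NeedsReindex = 0 " +
                "OUTPUT DELETED.NeedsReindex WHERE NeedsReindex = 1", conn))
            {
                conn.Open();
                return cmd.ExecuteScalar() != null;
            }
        }
    }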
You could use a WCF service on an internal endpoint on your Web Role. However, you still have the burst issue (if you get, say, a dozen uploads while the webrole is updating, you don't want to then do another dozen updates).

Periodic tasks inside WCF service hosted in IIS

We would like to have some periodic actions executed by our WCF service hosted in IIS. What is the best way to do this? Creating a timer doesn't look like a good solution. Creating a Windows service that would act as some kind of heartbeat looks like a solution to the problem, but it still doesn't smell good. What approach would be a good solution to this problem?
That depends on what your action is trying to do. If it's a database-related clean-up action, e.g. deleting orphaned shopping carts, you could schedule a job for this in your database of choice, like SQL Server's very reliable job engine. A Windows service would be a great candidate if it's an OS-based action, like periodic clean-up/deletion of files. Since an IIS/WCF service is usually designed more to handle external requests, I don't think it'd be wrong to use the service layers of the OS or DB for your task.
I used to run into tasks like this in my PHP days, when I would want to schedule an email to be sent at a given time. After many months of tinkering (mainly trying to handle calls to a page that may never come in), I eventually came to the conclusion that an essentially stateless bit of code is not the place to do it, and scheduled a cron job to fire each night.
I'd definitely recommend going down the route of an externally triggered job (in SQL, a Windows service, etc.) and handling your operations from there. The pain, as I know to my cost, is just not worth the return.
I have struggled a lot with this and, in some cases where clean-up is required, have just run an asynchronous (background) task on the back of a common function to do the periodic clean-up. For example, in GetCommonList() I do a check against a lastrun value in settings/appsettings and then kick the task off once a day or every 5 minutes, etc. That way, if the app goes to greener pastures (which does happen), I don't need to worry about any lingering tasks running somewhere. It doesn't work in all cases, but security etc. is also automatically taken care of, whereas with services you may still have issues with that. Just my 2c.
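As a rough sketch of that piggy-backed clean-up pattern (the names are illustrative, and a real version would persist the last-run time in settings, as described above, rather than in a static field):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    public static class LazyCleanup
    {
        private static long _lastRunTicks;  // 0 = never run

        public static void MaybeRun()
        {
            var last = Interlocked.Read(ref _lastRunTicks);
            var now = DateTime.UtcNow.Ticks;
            if (now - last < TimeSpan.FromDays(1).Ticks) return;

            // Only one caller wins the swap, so the task starts at most once.
            if (Interlocked.CompareExchange(ref _lastRunTicks, now, last) == last)
            {
                Task.Run(() => DeleteOrphanedRows());
            }
        }

        private static void DeleteOrphanedRows()
        {
            // Hypothetical periodic work, e.g. database clean-up.
        }
    }

    // Call it from any frequently hit code path:
    //   public List<Item> GetCommonList() { LazyCleanup.MaybeRun(); ... }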