I am building a system that will consist of several instances, each running our Optaplanner implementation. These instances will monitor a common queue for incoming jobs. I don't want an instance that is already busy to take the job, so I want to check the number of ongoing jobs in the solver manager.
In the debugger, it looks like the solverManager has some internals that could help me check that (problemIdToSolverJobMap.size() < parallelSolverCount would work, for instance), but these are private and not accessible to me.
What is the most robust way to check the status of the solver manager as a whole, rather than for a specific job?
That would be useful indeed. This is clearly an API gap. Please create a Jira issue.
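Until such an API exists, one workaround is to count in-flight jobs yourself, outside the solver manager. The sketch below is only an illustration of that idea in Python, not OptaPlanner code: `CapacityGate`, `try_acquire` and `release` are hypothetical names, and the limit is assumed to be set to the same value as your parallel solver count.

```python
import threading


class CapacityGate:
    """Counts in-flight jobs so an instance can refuse work when it is full."""

    def __init__(self, max_parallel_jobs):
        self._max = max_parallel_jobs
        self._active = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Reserve a slot; return False if the instance is already at capacity."""
        with self._lock:
            if self._active >= self._max:
                return False
            self._active += 1
            return True

    def release(self):
        """Free a slot; call this when a job finishes or fails."""
        with self._lock:
            if self._active > 0:
                self._active -= 1

    def active_jobs(self):
        """Return the number of jobs currently holding a slot."""
        with self._lock:
            return self._active
```

The instance would call `try_acquire()` before taking a job off the common queue and `release()` from whatever callback runs when solving ends, including the failure path.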
I am looking to implement Hangfire within an Asp.Net Core application.
However, I'm struggling to understand how best to prevent the user from creating duplicate Hangfire "Fire-and-Forget" jobs.
The Problem
Say the user, via the app, creates a job that does some processing relating to a specific client. The process may take several minutes to complete. I want to be able to prevent the user from creating another job for the same client while there are other jobs for that client still being processed by Hangfire (i.e. there can only be 1 processing job for a specific client at any one time, although several different clients could also each have their own job being processed).
Solution?
I need a way to attach additional meta-data (in this example, the client id) to each job as it is created, which I can then use to interrogate the jobs currently processing in Hangfire to see if any of them relate to the client id in question.
It seems like such a basic feature that would prove so useful for such scenarios, but I'm coming to the conclusion that such a thing isn't supported, which surprises me.
... Unless you know different.
Hangfire looks great, and I'm keen to use it, but this might be a show-stopper for me.
Any advice would be greatly received.
Thanks
"I need a way to attach additional meta-data (in this example, the client id) to each job as it is created"
Adding metadata to jobs can be achieved by means of Hangfire filters.
You may have a look at this answer.
https://stackoverflow.com/a/57396553/1236044
Depending on your needs, you may use more filter types.
For example, IElectStateFilter may be useful for filtering out a job if another one is currently processing.
If you have several processing servers, you will need your own storage solution to handle your own custom "currently processing", priority, or locking mechanism.
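The bookkeeping behind "only one processing job per client" can be sketched independently of Hangfire. The Python below is a minimal in-process illustration, not Hangfire's API: `ClientJobRegistry`, `try_start` and `finish` are hypothetical names, and with several servers the dictionary would have to be replaced by shared storage, as noted above.

```python
import threading


class ClientJobRegistry:
    """Tracks which client ids currently have a job in progress."""

    def __init__(self):
        self._running = {}  # client_id -> job_id of the job being processed
        self._lock = threading.Lock()

    def try_start(self, client_id, job_id):
        """Register the job; return False if the client already has one running."""
        with self._lock:
            if client_id in self._running:
                return False
            self._running[client_id] = job_id
            return True

    def finish(self, client_id, job_id):
        """Clear the entry, but only if it still belongs to this job."""
        with self._lock:
            if self._running.get(client_id) == job_id:
                del self._running[client_id]
```

A filter would call `try_start` when a job is about to move to the processing state and `finish` when it succeeds or fails; different clients never block each other.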
We are developing a Web API using .Net Core. To perform background tasks we have used Hosted Services.
The system is hosted in an AWS Elastic Beanstalk environment behind a load balancer, so Beanstalk creates or removes instances of the system based on the load.
Our problem is,
Since the background services also run inside the API, when the load balancer increases the number of instances, the number of background services increases as well, and the same task may be executed multiple times. Ideally there should be only one instance of the background services.
One way to tackle this is to stop executing background services in the load-balanced environment and have a dedicated, non-load-balanced, single-instance environment for background services only.
That is a bit of an ugly solution. So,
1) Is there a better solution for this?
2) Is there a way to identify the primary instance while in a load balanced environment? If so I can conditionally register Hosted services.
Any help is really appreciated.
Thanks
I am facing the same scenario and am thinking of implementing a custom service architecture that runs normally on all of the instances but takes advantage of a pub/sub broker and a distributed memory service, so that those small services contact each other and coordinate what is to be done. It is complicated to develop, yes, but a very robust solution IMO.
You'll "have to" use a distributed "lock" system. For example, you can use a distributed memory cache in which a node of your cluster places a lock while it is working on a background task. If another node tries to do the same job, it will be blocked by the first node's lock for as long as the work isn't done.
What I mean is, if your nodes don't share a "sync handler", you can't handle this kind of situation. It could be a SQL app lock, a distributed memory cache, or something else.
There is something called a Mutex, but even that won't control this in a multi-instance environment. However, there are ways to control it to some level (maybe even 100%). One way would be to keep a tracker in the database. For example, if the job has to run daily, then before starting the job in the background service you could query the database for an entry for today; if there is none, you insert an entry and start your job.
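The database tracker can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` so that it runs on its own; the table name `job_runs`, its columns, and the function names are assumptions, and a real deployment would point at the database shared by all instances. Note one deliberate change from the description above: instead of querying first and inserting afterwards, the sketch relies on a primary key so that the check and the insert happen in a single step, which avoids two instances both seeing "no entry" at the same moment.

```python
import sqlite3


def create_tracker(conn):
    """Create the tracker table; the primary key allows one row per job per day."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_runs ("
        " job_name TEXT NOT NULL,"
        " run_date TEXT NOT NULL,"
        " instance_id TEXT NOT NULL,"
        " PRIMARY KEY (job_name, run_date))"
    )
    conn.commit()


def try_claim(conn, job_name, run_date, instance_id):
    """Return True if this instance claimed today's run, False if another did."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_runs (job_name, run_date, instance_id)"
                " VALUES (?, ?, ?)",
                (job_name, run_date, instance_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The row already exists, so another instance owns this run.
        return False
```

Each background service would call `try_claim` before starting the job and simply skip the run when it returns `False`.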
Complete newbie to PigLatin, but looking to pull data from the MetOffice DataPoint API e.g.:
http://datapoint.metoffice.gov.uk/public/data/val/wxfcs/all/xml/350509?res=3hourly&key=abc123....
...into Hadoop.
My question is "Can this be undertaken using PigLatin (from within Pig View, in Ambari)"?
I've hunted round for how to format a GET request into the code, but without luck.
Am I barking up the wrong tree? Should I be looking to use a different service within the Hadoop framework to accomplish this?
It is a very bad idea to make calls to external services from inside map-reduce jobs. The reason is that your jobs scale very well when running on the cluster, whereas the external system might not. Modern resource managers like YARN make this situation even worse: when you swamp the external system with requests, your tasks on the cluster will mostly be sleeping while they wait for replies from the server. The resource manager will see that the tasks are not using the CPU and will schedule more of your tasks to run, which will send even more requests to the external system and swamp it further. I've seen a modest 100-machine cluster putting out 100K requests per second.
What you really want to do is either get the bulk data from the web service somehow, or set up a system with a queue and a small, controlled number of workers that pull from the external system at a set rate.
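The "set rate" part can be sketched as a single worker that spaces out its calls. This is only an illustration under stated assumptions: `fetch_at_fixed_rate` is a hypothetical helper, and `fetch` stands for whatever function actually performs the HTTP GET (for example a thin wrapper around an HTTP client). The `sleep` and `clock` parameters exist only so the pacing can be tested without waiting.

```python
import time


def fetch_at_fixed_rate(urls, fetch, min_interval_seconds,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch(url) for each url, pausing between consecutive calls.

    At least min_interval_seconds pass between the end of one request
    and the start of the next, so the external service sees a bounded rate.
    """
    results = []
    next_allowed = clock()
    for url in urls:
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        results.append(fetch(url))
        next_allowed = clock() + min_interval_seconds
    return results
```

The responses could then be written to files and loaded into Hadoop in bulk, which keeps the external calls out of the cluster jobs entirely.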
As for your original question, I don't think PigLatin provides such a service, but it could easily be done with a UDF in either Python or Java. With Python you can use the excellent requests library, which will keep your UDF to about 6 lines of code. A Java UDF will be a little more verbose, but nothing terrible by Java standards.
"Can this be undertaken using PigLatin (from within Pig View, in Ambari)"?
No. By default Pig loads from HDFS storage, unless you write your own loader.
I agree with @Vlad that this is not a good idea. There are many other components meant for data ingestion, but this is not a use case for Pig!
I am new to WebLogic Server and I am using a work manager. I want to know what a work manager is and why we need it. What is the difference between a normal request without a work manager and a request with one?
I think the documentation is rather good on this subject.
"WebLogic Server prioritizes work and allocates threads based on an execution model that takes into account administrator-defined parameters and actual run-time performance and throughput. Administrators can configure a set of scheduling guidelines and associate them with one or more applications, or with particular application components. For example, you can associate one set of scheduling guidelines for one application, and another set of guidelines for other application. At run-time, WebLogic Server uses these guidelines to assign pending work and enqueued requests to execution threads."
Essentially, with work managers you can attach a scheduling policy to an application, e.g. to make sure that a specific application gets a fair share of the available computing resources under heavy load. Or you might want to restrict the maximum number of threads that will be allocated to an application, to prevent a buggy or untested application from bringing the whole application server to its knees. (But surely all apps have been tested not to do anything like that.... ;) )
Outside of modifying the default allocation algorithms, the Work Manager is also useful if you are using a Foreign JMS Provider (such as IBM MQ) and need to process more than 16 messages at a time.
I wish to develop an application in VB.NET to provide the following functionality, and hope you can give me some pointers on which direction to take.
I need some kind of "server" type component which sits in the background, monitoring requests from users and performing various tasks. (This component can be installed locally or centrally.)
The users submit an instruction to the "server" to perform a certain task at a designated date and time (or to perform the task straight away).
The "server" would perform the task at the desired date and time and inform the user of the result of the task.
I have thought of using a central database to which the user writes the instructions. The “server” could read from the database to obtain the instructions, and write the result back to the database.
I want a fast reaction to the instructions, so the “server” must poll the database every few seconds; I fear this may be detrimental to performance. Also how do I get the server to perform the task at the desired time?
Again checking all outstanding tasks against the current time is not very efficient, so I thought about utilising the Windows Scheduler, but I am not sure of the best way of integrating this functionality.
I would be grateful for any ideas, pointers or suggestions.
Have you looked at quartz.net? It's a scheduling framework which might be useful to you.
We have a similar system where we work, utilising a webservice to accept requests, run them when required, and notify callers with the results if necessary.
In our case the callers were other applications and not people.
The web service consisted of the following methods: (rough version, not exact)
int AddJob(string jobType, string input, DateTime startTime) // schedules the job, sets a timer to call StartJobs when needed, and returns the job id
void GetResults(int jobId, out string status, out string output) // gets results (status = "queued" / "running" / "completed" / "failed")
void StartJobs() // called via a timer as needed to kick off scheduled jobs
We also built in checks to limit how many jobs could run simultaneously and whether they could retry if they failed, and to email admins if any job fails its last attempt.
Our version is much more comprehensive than this, with the jobs actually being webservices themselves, supporting simultaneous running, built-in workflow so jobs can wait on others, but maybe it will give you some ideas. It's not a trivial project, but was fun to implement!
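To make the three methods above more concrete, here is a minimal in-memory sketch in Python. It is not our actual implementation: `JobService`, the `handlers` mapping and `next_start_time` are names invented for illustration, jobs run synchronously, and there is no retry, concurrency limit or notification. The point it shows is that keeping jobs ordered by start time lets a timer be set for the earliest one, instead of polling all outstanding tasks every few seconds.

```python
import heapq
import itertools
from datetime import datetime


class JobService:
    """In-memory job scheduler with add / start / get-results operations."""

    def __init__(self, handlers):
        self._handlers = handlers        # job_type -> callable(input) -> output
        self._queue = []                 # heap of (start_time, job_id)
        self._jobs = {}                  # job_id -> job record
        self._ids = itertools.count(1)

    def add_job(self, job_type: str, job_input: str, start_time: datetime) -> int:
        """Queue a job for start_time and return its id."""
        job_id = next(self._ids)
        self._jobs[job_id] = {"type": job_type, "input": job_input,
                              "status": "queued", "output": None}
        heapq.heappush(self._queue, (start_time, job_id))
        return job_id

    def get_results(self, job_id: int):
        """Return (status, output) for a job."""
        job = self._jobs[job_id]
        return job["status"], job["output"]

    def next_start_time(self):
        """Earliest pending start time, i.e. when the timer should fire next."""
        return self._queue[0][0] if self._queue else None

    def start_jobs(self, now: datetime) -> None:
        """Run every queued job whose start time has been reached."""
        while self._queue and self._queue[0][0] <= now:
            _, job_id = heapq.heappop(self._queue)
            job = self._jobs[job_id]
            job["status"] = "running"
            try:
                job["output"] = self._handlers[job["type"]](job["input"])
                job["status"] = "completed"
            except Exception as exc:
                job["output"] = str(exc)
                job["status"] = "failed"
```

A timer (or a scheduling framework such as the one suggested above) would call `start_jobs` at the time reported by `next_start_time`, and again whenever `add_job` queues something earlier.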