I'm working on a workflow that has both Hive and Java actions. Very often we notice a delay of a few minutes between the Java action's start time and the job submission time. We don't see that with Hive jobs: they seem to be submitted almost immediately after they are started. The Java jobs do not do much, so they finish successfully within seconds of being submitted, but the time between start and submission seems to be very high (4-5 minutes). We are using the fair scheduler and there are enough mapper/reducer slots available. Even if it were a resource problem, the Hive jobs should also show a delay between start and submission, but they don't! The Java jobs are very simple: they don't process any files and are basically used to call a web service. They spawn only a single mapper and no reducers, whereas the Hive jobs create hundreds of mapper/reducer tasks, yet show no delay between start and submission. We are not able to figure out why Oozie is not submitting the Java job immediately. Any ideas?
I have an application that sends Redis job output to the front end via a socket. Some special types of nodes need to wait for their jobs to finish, so I'm using an infinite wait until those jobs complete. Something like below.
while True:
    # Blocks this worker until the job reports that it has finished
    if job.status() == "finished":
        break
I want to know if there's any way to free the resources and re-run the jobs from their previous state.
I tried a solution that stays awake all the time. I want a solution where the resources are not bound to one job, so the system can use them for other jobs.
Well, what you can do is return immediately if the job is special, and save the state of the job in Redis.
If you have a back end in your application, it can periodically check whether the special jobs have finished running.
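A minimal sketch of that idea, assuming a key-value store with Redis-like get/set semantics. A plain dict stands in for Redis here so the example runs without a server; the function names and the key layout are hypothetical, not part of any library.

```python
import json

# Stand-in for Redis so the sketch runs without a server (assumption).
# With a real client the reads and writes below would be GET/SET calls.
store = {}


def submit(job_id, is_special, state):
    """Handle a job; special jobs return at once instead of blocking."""
    if is_special:
        # Save enough state to resume later, then free the worker.
        record = {"status": "pending", "state": state}
        store[f"job:{job_id}"] = json.dumps(record)
        return "deferred"
    return "finished"


def mark_finished(job_id):
    """Called when the special job's external work completes."""
    key = f"job:{job_id}"
    record = json.loads(store[key])
    record["status"] = "finished"
    store[key] = json.dumps(record)


def collect_finished():
    """Periodic back-end check: return the saved state of finished jobs."""
    done = {}
    for key in list(store):
        record = json.loads(store[key])
        if record["status"] == "finished":
            done[key] = record["state"]
            del store[key]  # state has been handed back; drop the entry
    return done
```

The worker that called submit is free as soon as "deferred" comes back, and the back end can call collect_finished on a timer to pick up the saved state and continue from there.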
I have some tasks in Pentaho, and for some reason some of them sometimes simply stall with the message "Carte - Installing timer to purge stale objects after 1440 minutes." For example, I scheduled one task to run at 05:00 AM; it usually runs in 10 minutes, but sometimes it never ends. The task stalls with the aforementioned message. However, when I run the task on the Pentaho Data Integration canvas, the job works.
The command that I use to run it is:
cd c:\data-integration
kitchen.bat /rep:repo /job:jobs//job_ft_acidentes /dir: /level:Minimal
How can I prevent this error?
I'm running into a weird issue with Pentaho 7.1. I run a job that I created and it runs perfectly and quickly the first time I run it.
The job is an ETL Job consisting of a Start widget, 7 Transformations running in a sequence, and a Success widget.
I'm confused as to why the job runs once, and when I try to run it again it says "Spoon - Starting job..." and then the job just hangs.
If I delete the job and I create a brand new one, I am then able to run the job once and I am stuck again with the job no longer able to run after that. I don't understand why the job keeps hanging after it gets executed once, and it is then 100% broken after a Successful run...
I turned up the logging in Pentaho 7.1 Spoon, and it shows this continuously...
2018/08/14 17:30:24 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
2018/08/14 17:30:34 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
2018/08/14 17:30:44 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
2018/08/14 17:30:54 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
I can't seem to put my finger on why this is happening.
Any help is appreciated
Probable answer: Check that your transformations are not opening the same database for input and output. A quick check may be to run the first transformation directly (without the job) and see if it locks.
In my case this happened because the database server being updated was slow to respond, probably due to high CPU and RAM usage. After I increased the RAM and CPU for the DB server, my job ran fine.
I use Hangfire to run recurring jobs.
My job gets current data from the database, performs an action, and leaves a trace on the records processed. If a job didn't run this minute, I have no need to run it twice the next minute.
Somehow my recurring jobs (1-minute cycle) got queued by the thousands and were never executed. When I restarted IIS, it tried to execute them all at once and clogged the DB.
Besides fixing the problem of no execution, is there a way to stop them from queuing up?
If you want to disable retries of a failed job, simply decorate your method with AutomaticRetryAttribute and set Attempts to 0.
See https://github.com/HangfireIO/Hangfire/blob/master/src/Hangfire.Core/AutomaticRetryAttribute.cs for more details
Is there a plugin, or can I somehow configure it, so that a job (triggered by 3 other jobs) queues until a specified time and only then executes the whole queue?
Our case is this:
1. We have tests run for 3 branches.
2. Each of the 3 build jobs for those branches triggers the same smoke-test-job, which runs immediately.
3. Each of the 3 build jobs for those branches triggers the same complete-test-job.
Points 1 and 2 work perfectly fine.
The complete-test-job should queue the tests all day long and just execute them in the evening or at night (starting from a defined time like 6 pm), so that the tests are run at night and during the day the job is silent.
It's not an option to trigger the complete-test-job at a specified time with the newest version. We absolutely need the trigger from the upstream build job (because of the promotion plugin, and because we do not want to re-run versions that have already been run).
That seems a rather strange request. Why queue a build if you don't want it now... And if you want a build later, then you shouldn't be triggering it now.
You can use Jenkins Exclusion plugin. Have your test jobs use a certain resource. Make another job whose task is to "hold" the resource during the day. While the resource is in use, the test jobs won't run.
Problem with this: you are going to kill your executors by having queued non-executing jobs, and there won't be free executors for other jobs.
Haven't tried it myself, but this sounds like a solution to your problem.
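The hold-the-resource idea boils down to a gate that accepts triggers at any time but only releases them inside a time window. The sketch below illustrates that logic in Python; it is not Jenkins or Exclusion plugin code, and the names and the 18:00 to 06:00 window are assumptions based on the 6 pm example in the question.

```python
from collections import deque
from datetime import time


def window_open(now, start=time(18, 0), end=time(6, 0)):
    """True if `now` falls in the nightly window (which may wrap midnight)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


class GatedQueue:
    """Accepts triggers all day; releases them only while the window is open."""

    def __init__(self):
        self.pending = deque()

    def trigger(self, build):
        # Skip versions that are already waiting, so none runs twice.
        if build not in self.pending:
            self.pending.append(build)

    def release(self, now):
        """Return the builds to run now (empty outside the window)."""
        if not window_open(now):
            return []
        ready = list(self.pending)
        self.pending.clear()
        return ready
```

Upstream builds would call trigger whenever they finish, and something that runs periodically would call release with the current time. Unlike the approach of holding real executors, nothing here is occupied while the builds wait.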