I start my Job, which contains just one transformation; it starts and then stays in that state. It doesn't throw any errors.
The transformation on its own runs perfectly and quickly, but the Job doesn't.
What can the problem be? I've been searching without success.
Screenshot of my Job
I'm running into a weird issue with Pentaho 7.1. I run a job that I created and it runs perfectly and quickly the first time I run it.
The job is an ETL Job consisting of a Start widget, 7 Transformations running in a sequence, and a Success widget.
I'm confused as to why the job runs once, and when I try to run it again it says "Spoon - Starting job..." and then the job just hangs.
If I delete the job and I create a brand new one, I am then able to run the job once and I am stuck again with the job no longer able to run after that. I don't understand why the job keeps hanging after it gets executed once, and it is then 100% broken after a Successful run...
I turned up the logging in Pentaho 7.1 Spoon, and it shows this continuously...
2018/08/14 17:30:24 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
2018/08/14 17:30:34 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
2018/08/14 17:30:44 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
2018/08/14 17:30:54 - Job1 - Triggering heartbeat signal for ported_jobs at every 10 seconds
I can't seem to put my finger on why this is happening.
Any help is appreciated
Probable answer: Check that your transformations are not opening the same database for input and output. A quick check may be to run the first transformation directly (without the job) and see if it locks.
This happened to me because the database server I was updating was slow to respond, probably because of high CPU and RAM usage. I increased the RAM and CPU for the DB server, and now my job runs okay.
It's been more than an hour and the job is still running. I guess it is dead already. What I was doing is very simple:
I have two very small text files that I have already imported into HDFS, and I would like to practice some Pig Latin operations. Here is what I did:
1. I created two relations, one for each
2. I created a co-grouping
3. I tried to get a dump
The dump has been running for more than an hour now. I checked the GUI a few times and found that the same job had ended and started again:
1. Completed 50%
2. Started again and hanging
btw: what the heck is Dr. Who showing in this screenshot (top right corner):
In this case you may want to kill the job, the command is:
yarn application -kill application_xxxxxx
and refresh the queue after the job is killed:
yarn rmadmin -refreshQueues
Hi spring batch users,
Regarding the documentation at http://docs.spring.io/spring-batch/reference/htmlsingle/#d5e1320:
"If the process died ("kill -9" or server failure) the job is, of course, not running, but the JobRepository has no way of knowing because no-one told it before the process died."
I am trying to find and restart the stale job executions by using
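// Find the executions that the repository still believes are running.
Set<JobExecution> jobExecutions = jobExplorer.findRunningJobExecutions(jobName);
...
for (JobExecution jobExecution : jobExecutions) {
    // Mark the stale execution as failed so that it becomes restartable.
    jobExecution.setStatus(BatchStatus.FAILED);
    jobExecution.setEndTime(new Date());
    jobRepository.update(jobExecution);
    jobOperator.restart(jobExecution.getId());
}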
But this seems to be very inconvenient.
1) I have to do this before other (new) jobs could be started.
2) I have to handle multiple instances of running servers so findRunningJobExecutions will not do the trick.
You can find other questions regarding this topic:
https://jira.spring.io/browse/BATCH-2433?jql=project%20%3D%20BATCH%20AND%20status%20%3D%20Open%20ORDER%20BY%20priority%20DESC
Spring Batch after JVM crash
I would love to see a solution for registering a "start up clean jobs listener". This will still not fix the problems caused by the multi-server environment, because Spring Batch does not know whether a JobExecution marked as STARTED is actually running on another instance.
Thanks for any advice
Alex
Your job cannot and should not recover "automatically" from a kill -9 scenario. A kill -9 is treated very differently than your application throwing a caught Exception. The reason for this is that you've effectively pulled the carpet out from under the application without giving it a chance to reach a synchronization point with the database to commit any necessary information to the ExecutionContext or update the job/step status(es). Therefore, the last status touchpoint with the database will remain and the job will still look STARTED.
"OK, fine" you say, "but if I start another execution, I want it to find that STARTED execution, and pick up where it left off." The problem here is that there is no clean way for the application to distinguish a job that is ACTUALLY RUNNING from one that has failed but couldn't update the database. The framework here correctly errs on the side of caution and prevents you from starting a job that already appears to be running, and this is a GOOD thing.
Why? Because let's assume your job was actually still running and you restarted it by accident. As coded, the framework will start to spin up, see your running execution, and fail with the following message: "A job execution for this job is already running". I can't tell you how many times we've been saved by this because someone accidentally launched a job twice!
If you were to implement the listener you suggest, the 2nd execution would instead be allowed to start and you'd have 2 different JVMs repeating the same work, possibly writing to the same files/tables and causing a huge data mess that could be impossible to clean up.
Trust me, in the event the Linux terminal kills your job or your job dies because the connection to the database has been severed, you WANT human eyes on those execution states before you attempt a restart.
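To make the guard described above concrete, here is a minimal sketch in plain Java. It is not the Spring Batch API: the class RunningGuardSketch, its methods, and the job name "nightlyLoad" are all hypothetical, and the in-memory map only stands in for the status column the real JobRepository keeps in the database. It shows why an execution that was killed with kill -9 keeps blocking new launches until someone explicitly marks it as failed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a job repository; NOT the Spring Batch API.
class RunningGuardSketch {
    enum Status { STARTED, COMPLETED, FAILED }

    // Last status recorded per job name (what the database would hold).
    private final Map<String, Status> lastStatus = new HashMap<>();

    // Refuse to launch while the last recorded status is still STARTED.
    void launch(String jobName) {
        if (lastStatus.get(jobName) == Status.STARTED) {
            throw new IllegalStateException(
                "A job execution for this job is already running: " + jobName);
        }
        lastStatus.put(jobName, Status.STARTED);
    }

    // A clean finish or a caught exception reaches this synchronization point;
    // a kill -9 never does, so the status is left at STARTED.
    void finish(String jobName, Status status) {
        lastStatus.put(jobName, status);
    }

    Status statusOf(String jobName) {
        return lastStatus.get(jobName);
    }

    public static void main(String[] args) {
        RunningGuardSketch repo = new RunningGuardSketch();
        repo.launch("nightlyLoad");
        // Simulate kill -9: finish() is never called.
        try {
            repo.launch("nightlyLoad");
        } catch (IllegalStateException e) {
            System.out.println("Blocked: " + e.getMessage());
        }
        // Only after a human has inspected the state and marked it FAILED
        // is a new launch allowed.
        repo.finish("nightlyLoad", Status.FAILED);
        repo.launch("nightlyLoad");
        System.out.println("Restarted, status = " + repo.statusOf("nightlyLoad"));
    }
}
```

The sketch cannot tell a dead execution from a live one either, which is exactly the point: the decision to flip STARTED to FAILED has to come from outside.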
Finally, on the off chance you actually wanted to kill your job, you can leverage several other standard patterns for stopping jobs:
Stop via throw Exception
Stop via JobOperator.stop()
I hope someone can point me in the right direction or shed some light on the issue I'm having. We have Autosys 11.3.5 running in a Windows environment.
I have several jobs set up to launch on a remote NAS server.
I needed JOB_1 in particular to only run if another completed successfully.
Seems straightforward enough. In the UI there's a section to specify a Condition such as s(job_name), as I have done, and I'm assuming that my initial job should run ONLY if the job named job_name succeeds.
No matter what I do, when I make the second job fail on purpose (whether by manually setting its status to FAILURE or by changing some of its parameters so that it fails naturally at run time), the other job that I run afterwards seems to ignore the condition altogether and completes successfully each time.
I've triple-checked the job names (in fact I copied and pasted them from the JIL definition of the job, so there are no typos), but the condition is still being ignored.
Any help in figuring out how to make one job only run if another did not fail (and NOT to run if it DID fail) would be appreciated.
If both the jobs are scheduled and become active together, then this should not happen.
My guess is that you are force-starting the other job while the first one has failed. If that's the case, then conditions will not work.
You need to let both jobs start as per schedule, or at least let the other job start as per schedule while the first one has failed. In that case the other job will stay in AC state until the first one is SU.
Let me know if this is not the case; I will have to suggest another solution then.
I have a SSIS package that monitors a folder. This package will run continuously until it's terminated.
I want to schedule this using a SQL Agent job. This SQL Agent job will use two steps. It is a kind of heartbeat job to make sure the SSIS package runs.
Step 1 - Check whether the SSIS package is running. If it is running, quit; otherwise go to step 2.
Step 2 - Execute the SSIS job. If OK, report success and quit; otherwise report failure and quit.
The job uses a daily schedule, Mon-Fri, every 4 hours.
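The two steps above boil down to a check-then-start routine. Here is a rough sketch of that control flow in plain Java, just to pin down the logic; it is not SQL Agent or SSIS code, and HeartbeatSketch, Outcome, isRunning and startPackage are hypothetical names standing in for whatever check and launch mechanism is actually used.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of the two-step heartbeat job; not SQL Agent or SSIS code.
class HeartbeatSketch {
    enum Outcome { ALREADY_RUNNING, STARTED, START_FAILED }

    // Step 1: if the package is already running, quit.
    // Step 2: otherwise try to start it and report success or failure.
    static Outcome heartbeat(BooleanSupplier isRunning, BooleanSupplier startPackage) {
        if (isRunning.getAsBoolean()) {
            return Outcome.ALREADY_RUNNING;
        }
        return startPackage.getAsBoolean() ? Outcome.STARTED : Outcome.START_FAILED;
    }

    public static void main(String[] args) {
        // Package already running: the heartbeat quits without starting anything.
        System.out.println(heartbeat(() -> true, () -> true));
        // Package not running and the start succeeds.
        System.out.println(heartbeat(() -> false, () -> true));
        // Package not running and the start fails.
        System.out.println(heartbeat(() -> false, () -> false));
    }
}
```

Note that in this sketch startPackage is assumed to return as soon as the launch has been attempted. If the start call instead blocks until the package ends, the heartbeat itself stays in progress for as long as the package runs.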
When I execute the SQL Job, it starts the SSIS package, but the job keeps running and the job monitor and history show it as "In Progress".
I had to close the job to come out of the dialog but in background the SSIS job is still running as expected.
Is this normal behavior? Do I need to approach this in a different way?
Appreciate any pointers or help on this.
Once the job has begun, the Start Jobs dialog box has no impact whatsoever on the running of the job itself - it exists solely to provide a monitoring window for you. Closing it will have no effect on the running job.
From other phrases in your question, I gather that you do not expect the job to ever 'finish' - therefore I would expect it to always show as In Progress unless it errors out or is stopped.
"This package will run continuously until it's terminated."
"The job keeps running and the job monitor and history shows it as in progress"