How to get application id from oozie job id - hadoop-yarn

Is there a way to find out the application id from the Oozie job id in a single command?
I want to display the application ids of the failed jobs for an Oozie id.
oozie job -info
Can any option be used with the above command to retrieve the external IDs of the failed jobs?
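For reference, a minimal sketch of one way to do this with the command above (the workflow id is a placeholder, and the column layout of the actions table varies between Oozie versions):
# print the workflow's actions and keep only the FAILED ones; the "Ext ID" column
# holds the Hadoop job id (job_...), and the matching YARN application id is the
# same string with the prefix swapped to application_
oozie job -info 0000123-200101000000000-oozie-oozi-W | grep -i FAILED
# e.g. Ext ID job_1580716305878_0001 -> application_1580716305878_0001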

Related

Tivoli Workload Scheduler WAPL to restart failed job from step

Is it possible to restart a failed job in TWS on z/OS, either from a particular step or as the entire job, using WAPL?
I am trying to automate the restart from Jenkins using WAPL and was unable to find the right syntax.
Thanks

s3distcp fail with "mapreduce_shuffle does not exist"

When I run the command below,
s3-dist-cp --src s3://test/9.19 --dest hdfs:///user/hadoop/test
I get an error about an auxService.
20/02/03 07:52:13 INFO mapreduce.Job: Task Id : attempt_1580716305878_0001_m_000000_2, Status : FAILED
Container launch failed for container_1580716305878_0001_01_000004 : org.apache.hadoop.yarn.exceptions.InvalidAuxServiceException: The auxService:mapreduce_shuffle does not exist
In many Q&As I found a solution like this
link.
But there is no NodeManager process on my node:
[hadoop@ip-172-31-37-115 ~]$ initctl list | grep yarn
hadoop-yarn-timelineserver start/running, process 8149
hadoop-yarn-resourcemanager start/running, process 17331
hadoop-yarn-proxyserver start/running, process 8147
My EMR cluster was created with the quick-create menu on emr-5.28.0.
Does anyone know about this problem?
Thanks!
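For reference, the fix those Q&As point to is to declare the shuffle aux service on every node that runs a NodeManager; a quick way to check this on an EMR node is sketched below (the config path and service name are the usual EMR defaults and may differ on other releases):
# check whether the shuffle aux service is declared in this node's YARN config
grep -A1 'yarn.nodemanager.aux-services' /etc/hadoop/conf/yarn-site.xml
# check whether a NodeManager is running here (on EMR it normally runs on
# core/task nodes, not on the master where the ResourceManager lives)
initctl list | grep hadoop-yarn-nodemanager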
I'm sure there's some way to update the configs, but what I did was create a cluster using the 'advanced' setup and chose these software packages:
Ganglia
Hive
Hue
Mahout
Pig
Tez
Spark
Hadoop
(8 in total)
Most of those, except Spark, are installed with the default settings (the first radio button for software packages in the quick setup). One of these software packages, or something related to it, is what causes s3-dist-cp to be installed, and I was able to use it with no problems with that setup.

When I run a command with yarn, how do I get the applicationId?

I'm submitting a job with the yarn jar command to run the distributed shell. How do I get the applicationId programmatically?
To get the application id you can go to the ResourceManager web UI, which is reachable at the IP address of the node running the ResourceManager on port 8088. There you can see the application id, the container ids and your job status.
You can also check the job status from the CLI. You can list all running jobs with yarn application -list and query a specific one with yarn application -status <application-id>. The output is not as detailed as the web UI, but it will give you the status and the list of running jobs.
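If you need the id in a script rather than from the web UI, here is a small sketch (the application name "DistributedShell" is an assumption for the distributed-shell example; filter on whatever name or user your job actually uses):
# list running applications and print the id of the first one whose name matches
yarn application -list -appStates RUNNING | awk '/DistributedShell/ {print $1; exit}'
# then query that application directly (the id below is a placeholder)
yarn application -status application_1580716305878_0001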

Oozie prepare war error Stop oozie first [Oozie v4.3]

I am trying to migrate the Oozie database from Derby to MySQL. Everything runs fine except when I try to run the command
./oozie-setup.sh prepare-war
I got the following error
setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
ERROR: Stop Oozie first
And when I try to run
./oozied.sh stop
I get the error PID file found but no matching process was found. Stop aborted.
I am stuck because I cannot find a way to stop Oozie so that I can proceed.
After a long search and some trial and error I was able to resolve the issue.
You need to delete the PID file, which in my case was located at
oozie-4.3.0/oozie-server/temp/oozie.pid
After that you can run ./oozie-setup.sh prepare-war and there will be no error.
The actual issue was that Tomcat had aborted abnormally and the PID file was left behind, so removing the file did the job.
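For completeness, the sequence that worked, as a sketch (the path is from this 4.3.0 install; verify nothing is actually running before removing the stale PID file):
# confirm there is no live Oozie/Tomcat process behind the stale PID file
ps -ef | grep -i '[o]ozie'
# remove the leftover PID file and rerun the setup
rm oozie-4.3.0/oozie-server/temp/oozie.pid
./oozie-setup.sh prepare-war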

Use of Enable blocking in PDI - Pig Script Executor

I am exploring the Big Data plugin in Pentaho 5.2 and was trying to run the Pig Script Executor. I do not understand the usage of the
Enable Blocking option. The PDI documentation says that
If checked, the Pig Script Executor job entry will prevent downstream
entries from executing until the script has finished processing.
I am aware that running a Pig script converts the execution to MapReduce jobs. I am running the job as Start job -> Pig Script. If I disable the Enable blocking option I am unable to execute the script; I get permission denied errors.
What does downstream mean here? I do not pass any hops out of the Pig Script step. I do not understand the Enable blocking option. Any hints would be helpful and appreciated.
Enable blocking enabled: the task is deployed to the Hadoop cluster; PDI follows its progress and only proceeds with the rest of the job entries after the Hadoop job has finished.
Enable blocking disabled: PDI deploys the task to the Hadoop cluster and forgets about it. The rest of the job entries proceed as soon as the cluster accepts the task; PDI does not wait for it to complete.