Yarn Job related commands - hadoop-yarn

For any job submitted to YARN (via the YARN console or the YARN Cluster UI), how do I find:
Who submitted the job?
To which YARN queue was the job submitted?
How long did the job take to finish?
I tried the command below, but it prints a lot of details and nothing specific:
yarn application -list

Take a look at the YARN admin page; it has the details of all the jobs you have submitted to the cluster.
Just access <Local_ip>:8088, e.g. localhost:8088.
There is also a logs section under the /logs/userlogs directory. This directory contains the logs for all applications run by a user.
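If you prefer the CLI, yarn application -status on a single application id reports all three of those details. A minimal sketch (the application id is a placeholder, and the exact field labels may vary slightly between Hadoop versions):

# Full report for one application: includes the submitting user, the queue,
# and the start/finish timestamps
yarn application -status application_1234567890123_0001

# Pull out just the fields in question (label names assumed from a typical report)
yarn application -status application_1234567890123_0001 2>/dev/null \
  | grep -E 'User|Queue|Start-Time|Finish-Time'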


upload yarn application logs from emr cluster to s3

I know they can be uploaded to s3 in ~5 minute intervals with logpusher, but I would ideally like to get them within 30s-1min of step completion.
The logs I am looking for are the application stdout logs.
I can ssh to the master node and get these logs via:
yarn logs -applicationId <<application_id>>
Is there a way that I can either write a bootstrap script that restarts the logpusher service after a step has been completed, or a way to submit an emr step that will export the yarn logs to s3?
EDIT:
I ended up accomplishing this by setting up an automatic follow-up step with boto3, using AWS's script-runner jar. The step runs a bash script that uses the yarn CLI to write a list of yarn application ids to a text file, downloads a Python script I made from s3 to parse that file and find the most recent application id, passes that id back to the yarn CLI to write that application's logs to a text file, and then uploads the logs to s3. This reduces the wait time to ~15 seconds after a job completes.
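For reference, a rough sketch of what that follow-up step boils down to, compressed into one bash script (the bucket, prefix, and local paths are placeholders, and the real setup described above keeps the parsing logic in a separate Python script pulled from s3):

#!/bin/bash
# Sketch: find the most recently submitted application, dump its aggregated
# logs with the yarn CLI, and copy them to s3.
set -euo pipefail

# Most recent application id; assumes sequence numbers stay small enough
# that a lexical sort matches submission order.
app_id=$(yarn application -list -appStates ALL 2>/dev/null \
  | grep -o 'application_[0-9_]*' | sort | tail -n 1)

# Write that application's logs to a local file
yarn logs -applicationId "$app_id" > "/tmp/${app_id}.log"

# Upload to s3 (bucket and prefix are placeholders)
aws s3 cp "/tmp/${app_id}.log" "s3://my-log-bucket/yarn-logs/${app_id}.log"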
You need not specify anything. By default, EMR pushes application logs to the s3 location specified in the Log URI. Just look under the containers folder in the Log URI location.

yarn usercache dir not resolved properly when running an example application

I am using Hadoop 3.2.0 and trying to run a simple application in a Docker container. I have made the required configuration changes in both yarn-site.xml and container-executor.cfg to choose the LinuxContainerExecutor and the Docker runtime.
I am following the distributed-shell example from a Hortonworks blog post: https://hortonworks.com/blog/trying-containerized-applications-apache-hadoop-yarn-3-1/
The problem is that when the application is submitted to YARN, it fails with a directory-creation error:
2019-02-14 20:51:16,450 INFO distributedshell.Client: Got application report from ASM for, appId=2, clientToAMToken=null,
appDiagnostics=Application application_1550156488785_0002 failed 2 times due to AM Container for appattempt_1550156488785_0002_000002 exited with exitCode: -1000 Failing this attempt.
Diagnostics: [2019-02-14 20:51:16.282]Application application_1550156488785_0002 initialization failed (exitCode=20) with output:
main : command provided 0
main : user is myuser
main : requested yarn user is myuser
Failed to create directory /data/yarn/local/nmPrivate/container_1550156488785_0002_02_000001.tokens/usercache/myuser - Not a directory
I have configured yarn.nodemanager.local-dirs in yarn-site.xml, and I can see it reflected in the YARN web UI at localhost:8088/conf:
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data/yarn/local</value>
  <final>false</final>
  <source>yarn-site.xml</source>
</property>
I do not understand why it is trying to create the usercache dir inside the nmPrivate directory.
Note: I have verified myuser's permissions on the directories and have also tried clearing the directories manually, as suggested in a related post, but with no luck. I do not see any additional information about the container launch failure in any other logs.
How do I debug why the usercache dir is not resolved properly?
Any help on this is really appreciated.
I realized that this was all down to the users the services were started as and their permissions on the directories the services work with.
After making the required changes, I am able to run the examples and other applications seamlessly.
Thanks to the Hadoop user community for the direction. Adding the link here for more details:
http://mail-archives.apache.org/mod_mbox/hadoop-user/201902.mbox/browser
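For anyone else hitting this, a minimal sketch of the kind of check that was involved (the yarn user and hadoop group below are assumptions; substitute the account and group your NodeManager actually runs as):

# Which user owns the running NodeManager process?
ps -o user= -p "$(pgrep -f NodeManager | head -n 1)"

# Ownership and permissions of the configured local dir and its contents
ls -ld /data/yarn/local
ls -l /data/yarn/local

# If the ownership is wrong, hand the tree back to the NodeManager user
# (user and group are placeholders)
sudo chown -R yarn:hadoop /data/yarn/local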

When I run a command with yarn, how do I get the applicationId?

I'm submitting a job with the yarn jar command to run the distributed shell. How do I get the applicationId programmatically?
To get the application id, you can go to the ResourceManager web UI, which is reachable at the IP address of the node running the ResourceManager on port 8088. There you can see the application id, container id, and your job's status.
You can also check the job status from the CLI: list all running jobs with yarn application -list, and query a specific one with yarn application -status <application_id>. The output is not as detailed as the web UI, but it gives you the running jobs and their status.
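If "programmatically" means from a shell script, one sketch is to scrape the id out of yarn application -list; the application name used in the grep is a placeholder for whatever name your distributed shell job is submitted under:

# Most recent application id whose name matches (name filter is a placeholder)
app_id=$(yarn application -list -appStates ALL 2>/dev/null \
  | grep 'DistributedShell' | grep -o 'application_[0-9_]*' | sort | tail -n 1)

echo "Found application id: $app_id"
yarn application -status "$app_id"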

yarn not getting nodes

This is an AWS EMR cluster with 2 task nodes and a master.
I'm trying hello-samza, which launches a YARN job. The job gets stuck in the ACCEPTED state. I looked in other posts and it seems that YARN is getting no nodes. Any help on why YARN is not getting the task nodes would be appreciated.
[hadoop@xxx hello-samza]$ deploy/yarn/bin/yarn node -list
17/04/18 23:30:45 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Total Nodes:0
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
[hadoop@xxx hello-samza]$ deploy/yarn/bin/yarn application -list -appStates ALL
17/04/18 23:26:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1492557889328_0001 wikipedia-parser_1 Samza hadoop default ACCEPTED UNDEFINED 0% N/A
I wrote a complete answer for a similar case I had been experiencing: have a look at it, it might be the same kind of configuration issue.
It seems like the NodeManagers are not running on either node (either not started at all, or exited with an error). Use the jps command to check whether all the daemons associated with YARN are running on the two nodes. Additionally, check both NodeManager logs to see if any exceptions might have killed them.
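A quick sequence to run on each task node, sketched below (the log directory is an assumption; on EMR the NodeManager logs commonly live under /var/log/hadoop-yarn, but the path depends on the distribution):

# Is a NodeManager process running on this node?
jps | grep -i nodemanager

# If not, look at the newest NodeManager log for the exception that killed it
# (log path is an assumption; adjust for your install)
ls -lt /var/log/hadoop-yarn/ | head
tail -n 200 /var/log/hadoop-yarn/yarn-yarn-nodemanager-*.log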

How to disable simultaneous build on drone io?

I use Drone as CI and want to know how I can disable simultaneous builds. What happens is that when I push two commits to the git repo, Drone triggers a build for each commit. How can I make the second build wait until the first one finishes?
Regarding the open source version of Drone: set the DOCKER_MAX_PROCS environment variable of your drone agent to 1, i.e. docker run -e DOCKER_MAX_PROCS=1 [...] drone/drone:0.5 agent. The agent will then run only one build at a time; other builds will queue up.
See the Installation Reference section in the readme for more info.