A few times I restarted YARN while MapReduce jobs were running on it, and I found that the running MR jobs were not affected, i.e. after restarting YARN, the MR jobs resumed immediately. Why didn't the MR jobs fail? By the way, all of my MapReduce jobs were Pig script jobs.
The link describes the high-availability (HA) architecture of YARN's ResourceManager.
In your case, I believe automatic failover is enabled, so when the ResourceManager goes down, another RM is automatically elected as the active one.
For any job submitted to YARN, how can I find the following using the YARN console and the YARN cluster UI:
Who has submitted the job?
To which YARN queue is a job submitted?
How much time did it take to finish the job?
I tried the command below, but it prints a lot of details, not the specific ones I need:
yarn application -list
Take a look at the YARN ResourceManager web UI; it shows the details of all the jobs you have submitted to the cluster.
Just open <Local_ip>:8088 in a browser, e.g. localhost:8088.
Also, the logs are under the /logs/userlogs directory, which contains the logs for all applications run by a user.
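The same data the :8088 page shows (submitting user, queue, run time) is also available as JSON from the ResourceManager REST API at /ws/v1/cluster/apps, where each application entry carries fields such as user, queue and elapsedTime (in milliseconds). A minimal sketch, assuming the RM web address is reachable from where you run it; the helper names fetch_apps and summarize are hypothetical.

```python
import json
import urllib.request


def fetch_apps(rm_url):
    """Download the application list from the ResourceManager REST API."""
    # Same data the :8088 web UI renders, e.g. rm_url = "http://localhost:8088"
    with urllib.request.urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/apps") as resp:
        return json.load(resp)


def summarize(payload):
    """Return (application id, user, queue, elapsed seconds) for each application."""
    # The RM returns {"apps": null} when it knows no applications.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        (app["id"], app["user"], app["queue"], app["elapsedTime"] / 1000)
        for app in apps
    ]
```

For example, summarize(fetch_apps("http://localhost:8088")) yields one tuple per application. The endpoint also accepts query parameters such as user and queue if you prefer to filter on the server side.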
I am running an EMR cluster (emr-5.24.1) with 3 m5.xlarge nodes (1 master, 2 core) and Flink 1.8 installed.
On the master node I start a Flink session in the YARN cluster using the following command:
flink-yarn-session -s 4 -jm 12288m -tm 12288m
That is the maximum memory and number of slots per TaskManager that YARN lets me set, given the selected instance types.
During startup, this is logged:
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=12288, taskManagerMemoryMB=12288, numberTaskManagers=1, slotsPerTaskManager=4}
This shows that there is only one TaskManager. Also, when looking at the YARN NodeManager, I see only one container running, on one of the core nodes. The YARN ResourceManager shows that the application is using only 50% of the cluster.
With the current setup I would expect to be able to run a Flink job with parallelism 8 (2 TaskManagers * 4 slots), but if the submitted job's parallelism is greater than 4, it fails after a while because it cannot get the required resources.
If the job parallelism is 4 or less, the job runs as it should. CPU and memory utilisation in Ganglia shows that only one node is utilised, while the other stays flat.
Why does the application run on only one node, and how can I utilise the other node as well? Do I need to configure something in YARN so that it also starts Flink on the other node?
In previous versions of Flink there was a startup option -n to specify the number of TaskManagers. That option is now obsolete.
When you start a 'Session Cluster', you should see only one container, which is used for the Flink JobManager. This is probably what you see in the YARN ResourceManager. Additional containers are allocated automatically for TaskManagers once you submit a job.
How many cores do you see available in the Resource Manager UI?
Don't forget that the JobManager also uses cores out of the available 8, and its container also takes memory (12288 MB here, because of -jm 12288m).
You need to do a little math here.
For example, if you had set 2 slots per TM and less memory per TM, and then submitted a job with a parallelism of 6, it should have worked with 3 TMs.
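The "math" can be sketched as a memory-only packing check. This is a simplified model, not Flink's or YARN's actual scheduler: it assumes each of the two core nodes (where the containers show up) offers 12288 MB to YARN, matching the maximum the question reports, places the JobManager container first, and ignores vcores and YARN's allocation rounding. The function name max_parallelism and the smaller container sizes in the second call are hypothetical.

```python
def max_parallelism(node_mem_mb, num_nodes, jm_mem_mb, tm_mem_mb, slots_per_tm):
    """Estimate how many slots a YARN session can get (memory only, simplified)."""
    # Free memory on each NodeManager; assumes identical nodes.
    free = [node_mem_mb] * num_nodes
    # The JobManager container is placed first, on the first node it fits on.
    for i, mem in enumerate(free):
        if mem >= jm_mem_mb:
            free[i] -= jm_mem_mb
            break
    else:
        return 0  # the JobManager itself does not fit
    # Each node then holds as many TaskManager containers as its remaining memory allows.
    task_managers = sum(mem // tm_mem_mb for mem in free)
    return task_managers * slots_per_tm


# Setup from the question: the JobManager fills one core node, so only one TaskManager fits.
print(max_parallelism(12288, 2, 12288, 12288, 4))  # 4
# Hypothetical smaller containers: 4096 MB JobManager, 6144 MB TaskManagers with 2 slots each.
print(max_parallelism(12288, 2, 4096, 6144, 2))  # 6
```

Under these assumptions the second call yields 3 TaskManagers (one next to the JobManager, two on the other node), which is the 3-TM, parallelism-6 case described above.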
I want to run two jobs in parallel on the same slave.
Job 1 is a functional-testing job that doesn't require a browser, and Job 2 is a Selenium job that requires a browser for testing.
To run the jobs on the same slave, you can use the option "Restrict where this project can be run", assuming the Jenkins slave is configured in your setup.
How you run the jobs in parallel depends on whether you use a Jenkinsfile or freestyle jobs. For a Jenkinsfile, you can use the parallel stages feature as described here. For freestyle jobs, I would suggest adding one more job (for example, a setup job) and using it to trigger your two jobs at the same time. Here are a few screenshots showing one of my pipelines triggering jobs in parallel.
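The setup job only has to ask Jenkins to start the two jobs: Jenkins queues each build independently, so both run at the same time as long as the slave has at least two executors. Instead of a build step inside Jenkins, this Python sketch triggers the jobs through Jenkins' remote access endpoint POST <JENKINS_URL>/job/<name>/build, authenticating with a user API token. The host, job names and helper names are hypothetical, and depending on your Jenkins version and CSRF settings a crumb header may also be required.

```python
import base64
import urllib.parse
import urllib.request


def build_request(base_url, job, user, api_token):
    """Create a POST request that asks Jenkins to queue one build of `job`."""
    url = f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job)}/build"
    token = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {token}"}
    )


def trigger_all(reqs, send=urllib.request.urlopen):
    """Send every trigger without waiting for the builds; Jenkins only queues them."""
    return [send(req) for req in reqs]
```

For example, trigger_all([build_request("http://jenkins:8080", name, "user", "api-token") for name in ("functional-tests", "selenium-tests")]) queues both builds back to back.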
I am exploring the Big Data plugin in Pentaho 5.2 and was trying to run the Pig Script Executor. I am unable to understand the usage of Enable blocking. The PDI documentation says that
If checked, the Pig Script Executor job entry will prevent downstream entries from executing until the script has finished processing.
I am aware that running a Pig script converts the execution into MapReduce jobs. I am running the job as Start job -> Pig Script. If I disable Enable blocking, I am unable to execute the script and get permission-denied errors. As per the documentation " ".
What does "downstream" mean here? I do not have any hops going out of the Pig script entry, so I don't understand the Enable blocking option. Any hints would be appreciated.
Enable blocking checked: the task is deployed to the Hadoop cluster; PDI follows its progress and proceeds with the rest of the job entries only AFTER the Hadoop job finishes.
Enable blocking unchecked: PDI deploys the task to the Hadoop cluster and forgets about it. The rest of the job entries proceed immediately after the cluster accepts the task; PDI does not wait for it to complete.
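The difference is the same as waiting for a child process versus starting it and moving on. The sketch below is only an analogy using Python's subprocess module, not PDI's implementation; run_entry is a hypothetical name, and the small Python commands stand in for the Pig script.

```python
import subprocess
import sys


def run_entry(cmd, blocking=True):
    """Start `cmd`; optionally wait for it, like PDI's "Enable blocking" option."""
    proc = subprocess.Popen(cmd)
    if blocking:
        # Blocking: whatever comes next (the "downstream" entries) runs only
        # after the command has finished; the caller gets its exit code.
        return proc.wait()
    # Non-blocking: hand the work off and return immediately.
    return proc


# Blocking: returns only once the child process has exited.
exit_code = run_entry([sys.executable, "-c", "print('script done')"])
# Non-blocking: returns at once while the child keeps running.
child = run_entry([sys.executable, "-c", "import time; time.sleep(1)"], blocking=False)
```

In the non-blocking case nothing waits for the child, so any later step that depends on its output may start too early.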
I want a Hudson setup with two JBoss cluster nodes. There is already a test machine running Hudson with the nightly build and tests. At the moment the application is deployed on the Hudson box.
I have a couple of options in mind. One is to use the SCP plugin for Hudson to copy the EAR file from the master to the cluster nodes. The other is to set up Hudson slaves on the cluster nodes.
Any opinions, experiences or other approaches?
Edit: I set up a slave, but it seems I can't make a job run on more than one slave without copying the job. Am I missing something?
You are right. You can't run different build steps of one job on different nodes. However, a job can be configured to run on different slaves; Hudson then determines at execution time which node the job will run on.
You need to configure labels for your nodes. A node can have more than one label, and a job can also require more than one label.
Example:
Node 1 has labels maven and db2
Node 2 has labels maven and ant
Job 1 requires label maven: can run on Node 1 and Node 2
Job 2 requires label ant: can run on Node 2
Job 3 requires labels maven and db2: can run on Node 1
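The matching rule in this example is a subset check: a job can run on a node when the node carries every label the job requires. A minimal Python sketch of that rule using the nodes and jobs above; the function name eligible_nodes is hypothetical, and the richer label expressions that newer Jenkins versions accept are ignored here.

```python
NODE_LABELS = {
    "Node 1": {"maven", "db2"},
    "Node 2": {"maven", "ant"},
}


def eligible_nodes(required_labels, node_labels=NODE_LABELS):
    """Return the nodes that carry every label the job requires."""
    return sorted(
        node for node, labels in node_labels.items() if set(required_labels) <= labels
    )


print(eligible_nodes({"maven"}))         # ['Node 1', 'Node 2']
print(eligible_nodes({"ant"}))           # ['Node 2']
print(eligible_nodes({"maven", "db2"}))  # ['Node 1']
```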
If you need different build steps to run on different nodes, you have to create more than one job and chain them. You trigger only the first job, which triggers the subsequent jobs. A downstream job can access the artifacts of the previous job. You can even run two jobs in parallel and automatically trigger the next job when both are done; you will need the Join Plugin for the parallel jobs.
If you want load balancing and central administration from Hudson (i.e. configuring projects, seeing which builds are running at the moment, etc.), you must run slaves.
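The chain-plus-join ordering can be sketched outside Hudson: run the first job, fan out to two jobs in parallel, and start the final job only after both have finished. This Python sketch models each job as a plain function and uses a thread pool; it illustrates the ordering the Join Plugin provides, not how the plugin is implemented, and all job names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(first, parallel_jobs, join_job):
    """Run `first`, then `parallel_jobs` concurrently, then `join_job` once all are done."""
    artifact = first()  # upstream job; its result plays the role of an artifact
    with ThreadPoolExecutor(max_workers=len(parallel_jobs)) as pool:
        # Each downstream job receives the upstream artifact.
        futures = [pool.submit(job, artifact) for job in parallel_jobs]
        results = [f.result() for f in futures]  # the "join": wait for every branch
    return join_job(results)


# Hypothetical jobs standing in for Hudson builds.
def build():
    return "app.ear"


def deploy_node1(ear):
    return f"{ear} -> node1"


def deploy_node2(ear):
    return f"{ear} -> node2"


def integration_tests(deployments):
    return "tested: " + ", ".join(deployments)


print(run_pipeline(build, [deploy_node1, deploy_node2], integration_tests))
# tested: app.ear -> node1, app.ear -> node2
```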