I know from this JIRA ticket (https://issues.apache.org/jira/browse/IGNITE-2738) that Ignite still does not support setting custom YARN queues. However, I cannot find any information on whether Ignite supports running its containers on specified YARN node labels.
Currently, all of the nodes in our cluster are labelled, and when we attempt to start an Ignite application, the app is stuck in the Pending state. It is waiting for resources to be allocated for the AM, and the AM container's node label expression defaults to <DEFAULT_PARTITION>.
Is there a way to supply node labels for Ignite on YARN?
ignite-yarn doesn't seem to set node labels.
Have you tried specifying them externally?
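One external route, sketched under assumptions: because of IGNITE-2738 the application goes to a queue you cannot choose (presumably root.default, YARN's default queue), so you could make the node label that queue's default through the CapacityScheduler instead. The property suffixes used below (`accessible-node-labels`, `accessible-node-labels.<label>.capacity`, `default-node-label-expression`) are standard CapacityScheduler settings; the label name `ignite`, the queue path and the 100% capacity are hypothetical and assume no sibling queue shares the label. The helper only renders XML to merge into capacity-scheduler.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def node_label_properties(queue_path: str, label: str, capacity: int = 100) -> Dict[str, str]:
    """Build CapacityScheduler properties that give `queue_path` access to
    `label` and make that label the queue's default node label expression.

    Assumes `capacity` percent of the label goes to every queue on the path,
    i.e. no sibling queue shares the label (sibling capacities must sum to 100).
    """
    props: Dict[str, str] = {}
    parts = queue_path.split(".")
    for depth in range(1, len(parts) + 1):
        queue = ".".join(parts[:depth])
        prefix = f"yarn.scheduler.capacity.{queue}"
        if queue != "root":
            # Non-root queues list the labels they may use explicitly.
            props[f"{prefix}.accessible-node-labels"] = label
        props[f"{prefix}.accessible-node-labels.{label}.capacity"] = str(capacity)
    # Requests that carry no label expression are expected to fall back to this.
    props[f"yarn.scheduler.capacity.{queue_path}.default-node-label-expression"] = label
    return props


def to_configuration_xml(props: Dict[str, str]) -> str:
    """Render properties in Hadoop's <configuration><property> layout."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical label "ignite" on YARN's default queue.
    print(to_configuration_xml(node_label_properties("root.default", "ignite")))
```

After merging the properties, `yarn rmadmin -refreshQueues` reloads the queue configuration. Whether the AM request actually picks up the queue default may depend on your Hadoop version, so check the AM container's node label expression in the ResourceManager UI afterwards.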
I've installed a fresh copy of DSE 6.8 for development purposes. After installing a single-node cluster (Cassandra + Solr), I want to enable Graph, but the job keeps failing with this error:
Graph is enabled and should have native-transport-address set to 0.0.0.0. name="node1" ssh-management-address="IP" rack="rack1"
I changed cassandra.yaml from:
native_transport_address: IP
to:
native_transport_address: localhost
The job keeps failing, any ideas?
As the error says, you need to set native_transport_address to 0.0.0.0 in the node definition dialog, and native_transport_broadcast_address to the node's actual IP address.
Make this change in the LCM UI as described in the documentation, then run a reconfigure or reinstall job. Don't change cassandra.yaml directly, because it is generated by LCM.
We have a 10-node cluster running version 3.9 with cold-start-empty set to false, on which we performed the following steps:
Added a node, 10.0.29.212, running community build 3.13.0.10.
Waited for migrations to finish (new cluster size: 11). As expected, incoming migrations occurred only on 10.0.29.212.
Added two nodes, 10.0.29.190 and 10.0.29.135, simultaneously, both running community build 3.13.0.10.
Waited for migrations to finish (new cluster size: 13). As expected, incoming migrations occurred only on these two nodes.
A few hours later, added a node, 10.0.29.214, running community build 3.13.0.10.
Immediately after that node was added, the total number of master objects in the cluster dropped, incoming migrations started on all nodes, and we started getting timeouts on the cluster.
I am running an EMR cluster with three m5.xlarge nodes (1 master, 2 core) and Flink 1.8 installed (emr-5.24.1).
On the master node, I start a Flink session in the YARN cluster using the following command:
flink-yarn-session -s 4 -jm 12288m -tm 12288m
These are the maximum memory and number of slots per TaskManager that YARN lets me set for the selected instance type.
During startup, the following is logged:
org.apache.flink.yarn.AbstractYarnClusterDescriptor - Cluster specification: ClusterSpecification{masterMemoryMB=12288, taskManagerMemoryMB=12288, numberTaskManagers=1, slotsPerTaskManager=4}
This shows that there is only one TaskManager. In the YARN NodeManager UI, I also see only one container running, on one of the core nodes. The YARN ResourceManager shows that the application is using only 50% of the cluster.
With this setup, I would expect to be able to run a Flink job with parallelism 8 (2 TaskManagers × 4 slots), but if the submitted job's parallelism is greater than 4, it fails after a while because it cannot obtain the required resources.
If the job parallelism is 4 or less, the job runs as expected. Ganglia's CPU and memory graphs show that only one node is utilised, while the other stays flat.
Why does the application run on only one node, and how can I utilise the other node as well? Do I need to configure something in YARN so that Flink also runs on the other node?
In previous versions of Flink there was a startup option, -n, for specifying the number of TaskManagers, but it is now obsolete.
When you start a 'Session Cluster', you should see only one container, which is used for the Flink JobManager. This is probably what you see in the YARN ResourceManager. Additional containers are allocated automatically for TaskManagers once you submit a job.
How many cores do you see available in the Resource Manager UI?
Don't forget that the Job Manager also uses cores out of the available 8.
You need to do a little "Math" here.
For example, if you had set 2 slots per TM and less memory per TM, and then submitted a job with a parallelism of 6, it should have worked with 3 TMs.
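The "math" can be sketched as a memory-only placement check. This is a simplification, not how YARN actually schedules: it uses naive first-fit, ignores vcores and the rounding of requests to `yarn.scheduler.minimum-allocation-mb`, and assumes each core node offers 12288 MB to YARN (the maximum the question reports). The function `plan_session` and the 2048/4096 MB sizes in the second example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SessionPlan:
    taskmanagers_needed: int
    taskmanagers_placed: int

    @property
    def fits(self) -> bool:
        return self.taskmanagers_placed == self.taskmanagers_needed


def _place(free_mb: List[int], request_mb: int) -> bool:
    """First-fit: take request_mb from the first node with enough free memory."""
    for i, free in enumerate(free_mb):
        if free >= request_mb:
            free_mb[i] -= request_mb
            return True
    return False


def plan_session(node_memory_mb: List[int], jm_mb: int, tm_mb: int,
                 slots_per_tm: int, parallelism: int) -> SessionPlan:
    """Estimate how many TaskManagers a YARN session can place (memory only)."""
    free = list(node_memory_mb)
    needed = math.ceil(parallelism / slots_per_tm)
    if not _place(free, jm_mb):  # the JobManager container is allocated first
        return SessionPlan(needed, 0)
    placed = 0
    while placed < needed and _place(free, tm_mb):
        placed += 1
    return SessionPlan(needed, placed)


if __name__ == "__main__":
    core_nodes = [12288, 12288]  # assumed YARN memory per m5.xlarge core node
    print(plan_session(core_nodes, 12288, 12288, 4, 8))  # the question's setup
    print(plan_session(core_nodes, 2048, 4096, 2, 6))    # hypothetical smaller containers
```

Under these assumptions, the 12288 MB JobManager fills one core node and the first TaskManager fills the other, so only 4 slots can ever exist; with smaller containers, three 2-slot TaskManagers fit alongside the JobManager.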
This is on an AWS EMR cluster with two task nodes and a master node.
I'm trying hello-samza, which launches a YARN job. The job gets stuck in the ACCEPTED state. From other posts, it seems that YARN is not getting any nodes. Any help on why YARN isn't getting the task nodes would be appreciated.
[hadoop#xxx hello-samza]$ deploy/yarn/bin/yarn node -list
17/04/18 23:30:45 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Total Nodes:0
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
[hadoop#xxx hello-samza]$ deploy/yarn/bin/yarn application -list -appStates ALL
17/04/18 23:26:30 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
Total number of applications (application-types: [] and states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1492557889328_0001 wikipedia-parser_1 Samza hadoop default ACCEPTED UNDEFINED 0% N/A
I wrote a complete answer for a similar case I ran into; have a look at it, since this might be the same kind of configuration issue.
It seems that the NodeManagers are not running on either node (they either never started or exited with an error). Use the jps command to check whether all the YARN daemons are running on the two nodes. Also check both NodeManager logs for exceptions that might have killed them.
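Both checks can be scripted. This is a minimal Python sketch, assuming `jps` is on the PATH of the node being checked and that `yarn node -list` prints the "Total Nodes:" line shown in the question; the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Iterable, Set


def registered_nodes(node_list_output: str) -> int:
    """Parse the 'Total Nodes:N' line printed by `yarn node -list`."""
    match = re.search(r"Total Nodes:\s*(\d+)", node_list_output)
    if match is None:
        raise ValueError("no 'Total Nodes:' line found")
    return int(match.group(1))


def missing_daemons(jps_output: str, required: Iterable[str] = ("NodeManager",)) -> Set[str]:
    """Return the required JVM main classes that do not appear in `jps` output.

    Each jps line has the form '<pid> <MainClass>'.
    """
    running = set()
    for line in jps_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            running.add(fields[1])
    return set(required) - running


def run_jps() -> str:
    """Run jps locally and return its stdout (raises if jps is not installed)."""
    if shutil.which("jps") is None:
        raise FileNotFoundError("jps not found on PATH")
    return subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
```

On a task node, `missing_daemons(run_jps())` returning `{'NodeManager'}` would be consistent with `Total Nodes:0`; the NodeManager log on that node is then the next place to look.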
Is it possible to specify a shell script that is executed each time a new node is added to an Ambari cluster, and if so, how?
I'm using HDP Ambari, and I would like to add some symbolic links when the setup of a new node is completed. I want to automate this so that I (or someone else) don't forget it.
No existing functionality lets you execute a script when a node is added to the cluster. What you're asking for is a custom hook. You would have to look through the Ambari source code to see whether you can define a custom hook for the stack. Each stack provides a few hooks; for examples, see: https://github.com/apache/ambari/tree/trunk/ambari-server/src/main/resources/stacks/HDP/2.0.6/hooks
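Whatever ends up running the step (a custom stack hook, as suggested, or a manual post-install command), it is easier to trust if it is idempotent. This is a minimal Python sketch of the symlink step only; it does not integrate with Ambari's hook mechanism, and the function name and any paths you pass to it are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List


def ensure_symlinks(links: Dict[str, str]) -> List[str]:
    """Make each `link` a symlink to `target`; return the links created or fixed.

    Safe to re-run: links that already point at the right target are skipped.
    """
    changed: List[str] = []
    for link, target in links.items():
        link_path = Path(link)
        if link_path.is_symlink():
            if os.readlink(link_path) == target:
                continue  # already correct, nothing to do
            link_path.unlink()  # stale link pointing elsewhere: replace it
        elif link_path.exists():
            # Refuse to clobber a real file or directory.
            raise FileExistsError(f"{link} exists and is not a symlink")
        link_path.parent.mkdir(parents=True, exist_ok=True)
        link_path.symlink_to(target)
        changed.append(str(link_path))
    return changed
```

Running it a second time with the same mapping returns an empty list, so it is harmless if the hook (or a person) runs it again on a node that is already set up.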