Resource usage in nodelabel jobs - hadoop-yarn

I am using Hadoop 2.7.2. Is there any way I can measure the resource usage of the jobs running under a node label? I am looking for a way to get this through JMX.
I want to see how much of the node label's capacity is in use and scale up or down accordingly.

Yes, you can. You can access the YARN ResourceManager web UI on port 8088, and if you want the JMX report, simply append /jmx to that address.
E.g.: 192.1.1.1:8088/jmx (for the JMX report)
192.1.1.1:8088 (for the ResourceManager UI)
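If you want to consume those numbers programmatically rather than in a browser, the /jmx endpoint returns plain JSON over HTTP and accepts a qry parameter to filter beans. A minimal sketch follows; the host and the QueueMetrics bean pattern are illustrative, so check your own /jmx output for the exact names, since per-node-label usage depends on your scheduler configuration:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class RmJmxProbe {
    public static void main(String[] args) throws Exception {
        // QueueMetrics beans carry AllocatedMB, AvailableMB, AllocatedVCores, etc.
        URL url = new URL("http://192.1.1.1:8088/jmx?qry=Hadoop:service=ResourceManager,name=QueueMetrics,q0=root");
        try (BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()))) {
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line); // raw JSON; parse it and feed your scale/descale logic
            }
        }
    }
}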

Related

IBM APIConnect - task security-appID

I have an instance of APIConnect on premise.
Analyzing the logs, I have seen the task called "security-appID" moving from 10ms execution time to 200ms execution time.
What is the meaning of this task?
I believe this task offloads application security requests to other integrations if you have it configured that way. It does not necessarily have anything to do with API Connect itself; it is probably related to your Bluemix ID, dashboard or landing page and how that is set up. You can probably find more information about it in the Bluemix docs: https://console.dys0.bluemix.net/docs/services/appid/existing.html#adding-app-id-to-an-existing-app

YARN Architecture of Hadoop 2.0

From the link below on the Apache Hadoop site, I learned that the
ApplicationMaster has the responsibility of negotiating appropriate
resource containers from the Scheduler (ResourceManager)
and also that the
ApplicationsManager is responsible for negotiating the first container
for executing the ApplicationMaster
Link : http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
So here is my confusion.
If the ApplicationMaster has the responsibility to request containers from the ResourceManager, then who creates the first container, and what is the process for creating the first container that executes the ApplicationMaster?
Does anyone issue a request to create that first container?
What are the responsibilities of the first container? Does the first container only execute the ApplicationMaster, or does it also behave like any other resource container?
Please let me know if anyone has an idea about this.
First of all, you are confusing the terms ApplicationsManager and ApplicationMaster. They are not the same; have a look at my answer to understand the difference between the ApplicationsManager and the ApplicationMaster in YARN.
Answers to your questions are given below:
The YarnClient has the responsibility of submitting the application to the ResourceManager: it sends an ApplicationSubmissionContext object to the ResourceManager, which represents all of the information the ResourceManager needs to launch the ApplicationMaster for an application (a rough sketch of this path is shown after this answer).
Yes, YarnClient does that!
The first container is the ApplicationMaster. Its job is to request resources (containers) from the ResourceManager and make application-level decisions. If a sufficient number of containers (defined by the logic in your ApplicationMaster) are provided by the ResourceManager, the ApplicationMaster can go ahead and launch the application code in those containers. Furthermore, the ApplicationMaster keeps track of failed containers and relaunches them, or terminates the application (killing all other containers), again based on the logic of your ApplicationMaster.
To understand the internals of Hadoop YARN, I would suggest reading the YARN paper or, if you have more time, a book on Hadoop YARN.
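For concreteness, here is a minimal sketch of that submission path using the YarnClient API. The application name and the AM command line are made-up placeholders, and a real AM launch context would also need jars, environment variables and local resources:

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApp {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("demo"); // placeholder name

        // This launch context describes the FIRST container: the one the
        // ApplicationsManager asks a NodeManager to start for the ApplicationMaster.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("java my.pkg.MyApplicationMaster")); // placeholder AM
        ctx.setAMContainerSpec(amContainer);
        ctx.setResource(Resource.newInstance(1024, 1)); // memory (MB) and vcores for the AM container

        yarnClient.submitApplication(ctx); // the ResourceManager takes over from here
    }
}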

Saving VisualVM information as data

Using VisualVM, I'd like to obtain this as data, without image processing algorithms being applied... How can I do that? I don't think this comes out of a snapshot.
I am not sure how VisualVM and jVisualVM differ (the naming is certainly confusing), but I'm running the Oracle-supplied one (Version 1.7.0_80, Build 150109).
Thanks!
You can use the Tracer plugin with various probes. Tracer can export data as CSV, HTML or XML.
All this information is available through JMX. That's how VisualVM gets the information, and you can use the same technology to get it too. First install the VisualVM-MBeans plugin from the Tools menu. This adds another tab, titled MBeans, where you can see all the data available for your application. You will find the graphed data under java.lang.Memory and java.lang.OperatingSystem.
If you're trying to check information for your own process, it's as simple as calling ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage() and ManagementFactory.getMemoryMXBean().getHeapMemoryUsage(). There are more, but these should get you started.
To get precise CPU usage see: Using OperatingSystemMXBean to get CPU usage
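Put together, a minimal self-monitoring example looks like this:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class SelfMonitor {
    public static void main(String[] args) {
        // The same beans VisualVM graphs, read from inside the process itself.
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("system load average: " + load); // -1 if the platform cannot report it
        System.out.println("heap used/max bytes: " + heap.getUsed() + "/" + heap.getMax());
    }
}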
If you want to get information on another process, you'd need some more code. There is a complete answer on Accessing a remote MBean server, but basically:
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// replace <addr> and <port> with the remote host and JMX port
JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://<addr>:<port>/jmxrmi");
JMXConnector jmxConnector = JMXConnectorFactory.connect(url);
MBeanServerConnection connection = jmxConnector.getMBeanServerConnection();
OperatingSystemMXBean bean = ManagementFactory.getPlatformMXBean(connection, OperatingSystemMXBean.class);
double load = bean.getSystemLoadAverage(); // -1 if the platform cannot report it
You will also have to start your Java process with JMX exposed, as explained in How to activate JMX on my JVM for access with jconsole?, but basically:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=9010
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
There is also a way to enumerate the Java processes running on the local machine, and even to connect to processes that don't have JMX enabled (though you get less data). If that's what you're looking for, the VisualVM source code is a good place to start.
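For the local enumeration, one way (on JDK 7/8, with tools.jar on the classpath) is the Attach API; a small sketch:

import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;

public class ListJvms {
    public static void main(String[] args) {
        // Lists every attachable JVM on this machine, much like the jps tool does.
        for (VirtualMachineDescriptor vmd : VirtualMachine.list()) {
            System.out.println(vmd.id() + "\t" + vmd.displayName());
        }
    }
}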
To answer your other question about naming:
VisualVM is an open-source project hosted at visualvm.java.net, and Java VisualVM is a stable version of VisualVM with Oracle branding and other small changes. Java VisualVM is distributed with the JDK. There is a table where you can find which VisualVM release is the basis for the Java VisualVM in each JDK update.

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters, one in a local virtual machine and another in a remote cloud. Both clusters are in Standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
I want to finish this:
Write code (just a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark Master URL to spark://local1:7077 or spark://remote1:7077, then run the code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit a job.
But I have a problem:
When I use the local cluster, everything goes well. Running the code in IntelliJ IDEA and submitting with spark-submit both get the job to the cluster and finish it.
But when I use the remote cluster, I get a warning log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Note that it says sufficient resources, not sufficient memory!
This log keeps printing and nothing further happens. spark-submit and running the code in IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit codes from IntelliJ IDEA to remote cluster?
If it's OK, does it need configuration?
What are the possible reasons that can cause my problem?
How can I handle this problem?
Thanks a lot!
Update
There is a similar question here, but I think my situation is different. When I run my code in IntelliJ IDEA with the Spark Master set to the local virtual machine cluster, it works. Against the remote cluster I get the Initial job has not accepted any resources;... warning instead.
I want to know whether a security policy or a firewall could cause this.
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the very least there is a variety of environment settings and considerations, handled by the spark-submit script, that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads in the Spark developer community on the topic.
My answer here is about a portion of your post: specifically the
TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have
sufficient resources
The reason is typically a mismatch between the memory and/or number of cores your job requested and what is available on the cluster. Possibly, when submitting from IntelliJ IDEA, the settings in
$SPARK_HOME/conf/spark-defaults.conf
did not properly match the parameters required for your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark UI on port 8080 to verify that the parameters you requested are actually available on the cluster.
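When running directly from the IDE, you can also set the equivalent limits on the SparkConf itself, since the IDE run configuration may not pick up spark-defaults.conf at all. A rough sketch using the Java API follows; the master URL, jar path and sizes are placeholders, and note that spark.driver.memory cannot be raised this way because the driver JVM is already running:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class RemoteWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("word-count")
                .setMaster("spark://remote1:7077")                // remote Standalone master
                .set("spark.executor.memory", "2g")               // must fit what the workers offer
                .set("spark.executor.cores", "2")
                .setJars(new String[] {"target/word-count.jar"}); // ship your compiled classes to the workers
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println(sc.textFile("hdfs:///input.txt").count());
        sc.stop();
    }
}

On the firewall question: the executors must be able to open connections back to the driver running on your laptop, so a firewall or NAT between the laptop and the remote cluster can produce exactly this symptom, because executors that cannot register with the driver effectively leave the job with no accepted resources.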

How to investigate if a workmanager is being used or not?

We have about 20 Work Managers, and right now our project is in a cleanup phase. I have been assigned the task of listing all the Work Managers that are being used and those that are not. I can see the list of created Work Managers in the WebLogic Console, but how can I figure out whether a given Work Manager is handling any requests or not?
Is there any history graph?
Is there any log?
Anything that tells which Work Manager processed which request?
WebLogic 10.2
Look at the admin console to see whether all 20 Work Managers are serving requests, under
Deployments -> Your Application name -> Configuration -> Workload tab.
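If you'd rather collect this programmatically than click through the console, each Work Manager also exposes a WorkManagerRuntimeMBean with counters such as CompletedRequests and PendingRequests. A hedged sketch against the Domain Runtime MBean server follows; the host, port and credentials are placeholders, and wljmxclient.jar must be on the classpath:

import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import javax.naming.Context;

public class WorkManagerAudit {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.SECURITY_PRINCIPAL, "weblogic");   // placeholder admin user
        env.put(Context.SECURITY_CREDENTIALS, "password"); // placeholder password
        env.put(JMXConnectorFactory.PROTOCOL_PROVIDER_PACKAGES, "weblogic.management.remote");
        JMXServiceURL url = new JMXServiceURL("t3", "adminhost", 7001,
                "/jndi/weblogic.management.mbeanservers.domainruntime");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, env)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // A Work Manager whose CompletedRequests never moves off 0 is a removal candidate.
            for (ObjectName wm : conn.queryNames(new ObjectName("com.bea:Type=WorkManagerRuntime,*"), null)) {
                System.out.println(wm.getKeyProperty("Name")
                        + " completed=" + conn.getAttribute(wm, "CompletedRequests")
                        + " pending=" + conn.getAttribute(wm, "PendingRequests"));
            }
        }
    }
}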