In YARN, what is the difference between a managed and an unmanaged ApplicationMaster?

I'm experimenting with the Distributed Shell example in YARN 2.2 and am hoping that someone can clarify what the difference between a managed and an unmanaged ApplicationMaster is.
For example, the following lines appear in the client code:
// unmanaged AM
appContext.setUnmanagedAM(true);
but I am unable to find documentation explaining the difference this line makes to the execution behaviour.
Many thanks.

setUnmanagedAM(true) is used for debugging purposes: it makes the ApplicationMaster run locally in the client process instead of in a container allocated by the ResourceManager on the cluster, so it is easier to step into the code and debug.
You can see it in use in the hadoop-yarn-applications-unmanaged-am-launcher.jar that ships with YARN.
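For illustration, here is a minimal sketch (not the Distributed Shell client itself; the application name is a placeholder) of how a client can set that flag when submitting an application through YarnClient:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;

public class UnmanagedAmSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("unmanaged-am-demo"); // placeholder name

        // Tell the ResourceManager NOT to allocate a container for the AM;
        // the client/gateway process runs the AM itself, e.g. under a debugger.
        appContext.setUnmanagedAM(true);

        yarnClient.submitApplication(appContext);
        // ...the client would now start the AM code locally and drive it from here.
    }
}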

Check the respective JIRA tickets: JIRA-420 and JIRA-419 (client side)
Currently, the RM itself manages the AM: it allocates a container for it, negotiates the launch on a NodeManager, and manages the AM's lifecycle. Thereafter, the AM negotiates resources with the RM and launches tasks to do the real work.
It would be a useful improvement to enhance this model by allowing the AM to be launched independently by the client without requiring the RM to do it. These AMs would be launched on a gateway machine that can talk to the cluster. This would open up new use cases such as the following:
1) Easy debugging of the AM, especially during initial development. Having the AM launched on an arbitrary cluster node makes it hard to look at logs or attach a debugger to the AM. If it can be launched locally, these tasks become easier.
2) Running AMs that need special privileges that may not be available on machines managed by the NodeManager.
Blog post with more implementation details on unmanaged AM: click-me
Example of how Impala manages its resources with the help of unmanaged applications: Llama

Mac OS X XPC as IPC between two applications

I have a windowed application and a windowless helper that sits inside the app bundle and runs as a login item. The app can start and stop the helper; everything works there. The problem is that I need to create a bidirectional communication channel between them. It should work in both sandboxed and non-sandboxed versions, ideally on OS X 10.7+.
I've investigated the topic and found that XPC can provide a peer-to-peer connection. I've read the related Apple docs, as well as a few topics below:
Is possible to use Mac OS X XPC like IPC to exchange messages between processes? How?
Communicate with another app using XPC
http://afewguyscoding.com/2012/07/ipc-easy-introducing-xpc-nsxpcconnection/
https://www.objc.io/issues/14-mac/xpc/
But I can't find any description of how I should organize my Xcode project. I have two targets: "Main App" and "Helper App". Now I need to add a third one, using the XPC Service template. OK, but what do I do next? Where should this XPC bundle be located so that it is available to both applications? Note that the helper sits in the main app bundle, as it's a login item. So I need some clear instructions or just an Xcode project sample.
Thanks,
Alex
Alright, for anyone who has been struggling with this: I was finally able to get communication working 100% between two application processes, using NSXPCConnection.
The key to note is that you can only create an NSXPCConnection to three things.
1) An XPCService. You can connect to an XPCService strictly through a name.
2) A Mach service. You can also connect to a Mach service strictly through a name.
3) An NSXPCEndpoint. This is what we're looking for, to communicate between two application processes.
The problem being that we can't directly transfer an NSXPCEndpoint from one application to another.
It involved creating a mach service launch agent (see this example for how to do that) that held an NSXPCEndpoint property. One application can connect to the mach service and set that property to its own [NSXPCListener anonymousListener].endpoint
Then the other application can connect to the machservice, and ask for that endpoint.
Then using that endpoint, an NSXPCConnection can be created, which successfully established a bridge between the two applications. I have tested sending objects back and forth, and it all works as expected.
Note that if your application is sandboxed, you will have to create an XPCService as a middleman between your application and the mach service.
I'm pretty pumped that I got this working. I'm fairly active on SO, so if anybody is interested in the source code, just add a comment and I can go through the effort of posting more details.
Some hurdles I came across:
You have to launch your mach service; these are the lines:
OSStatus err;
AuthorizationExternalForm extForm;

// Create an authorization reference to hand to SMJobBless().
err = AuthorizationCreate(NULL, NULL, 0, &self->_authRef);
if (err == errAuthorizationSuccess) {
    NSLog(@"SUCCESS AUTHORIZING DAEMON");
}
assert(err == errAuthorizationSuccess);

Boolean success;
CFErrorRef error = NULL;

// Install and start the privileged helper (the mach service) via launchd.
success = SMJobBless(
    kSMDomainSystemLaunchd,
    CFSTR("DAEMON IDENTIFIER HERE"),
    self->_authRef,
    &error
);
Also, every time you rebuild your daemon, you have to unload the previous one with these bash commands:
sudo launchctl unload /Library/LaunchDaemons/com.example.apple-samplecode.EBAS.HelperTool.plist
sudo rm /Library/LaunchDaemons/com.example.apple-samplecode.EBAS.HelperTool.plist
sudo rm /Library/PrivilegedHelperTools/com.example.apple-samplecode.EBAS.HelperTool
(With your corresponding identifiers, of course)

YARN Architecture of Hadoop 2.0

From the link below on the Apache Hadoop site, I learned that
the ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler (ResourceManager)
and also that
the ApplicationsManager negotiates the first container for executing the ApplicationMaster.
Link : http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html
So here is my confusion.
If the ApplicationMaster has the responsibility of requesting containers from the ResourceManager, then who creates the first container, and what is the process for creating the first container that executes the ApplicationMaster?
Does anyone issue a request to create the first container?
What are the responsibilities of the first container? Does the first container only execute the ApplicationMaster, or does it also behave like the other resource containers?
Please let me know if anyone has an idea about this.
First of all, you are confusing the terms ApplicationsManager and ApplicationMaster. They are not the same; have a look at my answer to understand the difference between the ApplicationsManager and the ApplicationMaster in YARN.
Answers to your questions are given below:
The YarnClient has the responsibility of submitting the application to the ResourceManager: it sends an ApplicationSubmissionContext object, which represents all of the information the ResourceManager needs to launch the ApplicationMaster for an application.
Yes, the YarnClient does that! (A rough sketch of such a submission is shown below.)
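To make that concrete, here is a rough client-side sketch (not from any particular application; the AM class, command, and sizes are placeholders) of the submission that causes the ResourceManager to allocate and launch that first container:

import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class SubmissionSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();

        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("my-app");

        // What to run inside the first container: the ApplicationMaster itself.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("java com.example.MyApplicationMaster"));
        appContext.setAMContainerSpec(amContainer);

        // How big that first container should be (1 GB, 1 vcore here).
        appContext.setResource(Resource.newInstance(1024, 1));

        // The ApplicationsManager inside the RM now negotiates this first
        // container and launches the AM in it via a NodeManager.
        yarnClient.submitApplication(appContext);
    }
}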
The first container runs the ApplicationMaster; its job is to request resources (containers) from the ResourceManager and make application-level decisions. If a sufficient number of containers (defined by the logic in your ApplicationMaster) is provided by the ResourceManager, the ApplicationMaster can go ahead and launch the application code on those containers. Furthermore, the ApplicationMaster keeps track of failed containers and relaunches them or terminates the application (killing all other containers), again based on the logic of your ApplicationMaster.
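A rough sketch of that request loop from the AM side, using the synchronous AMRMClient (container count, sizes, and sleep interval are illustrative only; launching the work on a granted container would additionally use an NMClient, omitted here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AmSideSketch {
    public static void main(String[] args) throws Exception {
        // This code runs inside the first container.
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new Configuration());
        rm.start();
        rm.registerApplicationMaster("", 0, "");

        // Ask the ResourceManager for two 1 GB / 1 vcore containers.
        for (int i = 0; i < 2; i++) {
            rm.addContainerRequest(new ContainerRequest(
                    Resource.newInstance(1024, 1), null, null, Priority.newInstance(0)));
        }

        // Heartbeat until the containers are granted, then launch the real work on them.
        int granted = 0;
        while (granted < 2) {
            AllocateResponse response = rm.allocate(0.0f);
            for (Container c : response.getAllocatedContainers()) {
                granted++;
                // ...launch the application code on container c via an NMClient...
            }
            Thread.sleep(1000);
        }

        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}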
To understand the internals of Hadoop YARN, I would suggest reading the YARN paper, or, if you have more time, a book on Hadoop YARN.

How to submit code to a remote Spark cluster from IntelliJ IDEA

I have two clusters: one in a local virtual machine and another in a remote cloud. Both clusters are in standalone mode.
My Environment:
Scala: 2.10.4
Spark: 1.5.1
JDK: 1.8.40
OS: CentOS Linux release 7.1.1503 (Core)
The local cluster:
Spark Master: spark://local1:7077
The remote cluster:
Spark Master: spark://remote1:7077
I want to do the following:
Write code (just a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark master URL to spark://local1:7077 or spark://remote1:7077, and then run my code from IntelliJ IDEA. That is, I don't want to use spark-submit to submit a job.
But I got some problem:
When I use the local cluster, everything goes well. Running the code from IntelliJ IDEA or using spark-submit both submit the job to the cluster and finish it.
But when I use the remote cluster, I get a warning log:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Note that it says sufficient resources, not sufficient memory!
This log keeps printing and nothing further happens. Both spark-submit and running the code from IntelliJ IDEA give the same result.
I want to know:
Is it possible to submit code from IntelliJ IDEA to the remote cluster?
If so, what configuration is needed?
What are the possible causes of my problem?
How can I handle this problem?
Thanks a lot!
Update
There is a similar question here, but I think my situation is different. When I run my code from IntelliJ IDEA with the Spark master set to the local virtual machine cluster, it works. It is with the remote cluster that I get the Initial job has not accepted any resources;... warning.
I want to know whether a security policy or firewall could cause this.
Submitting code programmatically (e.g. via SparkSubmit) is quite tricky. At the least there is a variety of environment settings and considerations, handled by the spark-submit script, that are quite difficult to replicate within a Scala program. I am still uncertain how to achieve it, and there have been a number of long-running threads within the Spark developer community on the topic.
My answer here is about a portion of your post, specifically this:
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
The reason is typically a mismatch between the memory and/or number of cores requested by your job and what is available on the cluster. Possibly, when submitting from IJ, the settings in
$SPARK_HOME/conf/spark-defaults.conf
did not properly match the parameters required for your task on the existing cluster. You may need to update:
spark.driver.memory 4g
spark.executor.memory 8g
spark.executor.cores 8
You can check the Spark UI on port 8080 to verify that the resources you requested are actually available on the cluster.
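If the cluster UI does show free resources, another thing to check when driving the job from the IDE is the configuration built in code. A minimal sketch of that with the Java API (the master URL, sizes, and jar path are placeholders to adapt; note that spark.driver.memory generally cannot be raised this way, because the driver JVM is already running):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class RemoteMasterSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("word-count-from-ide")
                .setMaster("spark://remote1:7077")      // the remote standalone master
                .set("spark.executor.memory", "2g")     // must fit what the workers offer
                .set("spark.executor.cores", "2")       // likewise for cores
                // Ship the compiled application jar so the remote executors can load your classes.
                .setJars(new String[] { "target/word-count.jar" });

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println(sc.parallelize(Arrays.asList(1, 2, 3)).count());
        sc.stop();
    }
}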

OpenSplice DDS across a network

I am completely new to the DDS world. I understand basic concepts like publish and subscribe, and the things that can be gleaned from the documentation. I am attempting to use OpenSplice DDS and am able to get through the tutorial without much difficulty. However, I want to get two different computers on the same network to talk to each other, which seems like a relatively simple task, but I can find no documentation on it.
For example, in the message chat room tutorial, how would I get the message board running on one machine and the chatter on another machine?
Thanks!
Found it! http://opensplice.org/pipermail/developer/2009-July/000094.html.
To summarize from the link:
1) Set up your environment on node 1 by running the release file in the OSPL_HOME directory (release.bat).
2) Start the OpenSplice daemon on node 1 (ospl start).
3) Run the messageboard application on node 1.
4) Set up your environment on node 2 by running the release file in the OSPL_HOME directory (release.bat).
5) Start the OpenSplice daemon on node 2 (ospl start).
6) Run the chatter application on node 2.

tomcat7 clickstack not finding Config params

I am testing the tomcat7 clickstack for our application, which has some config parameters set using the built-in Config features of CloudBees. The tomcat7 clickstack does not find them, but the standard tomcat6 container does. I have double-checked them and reset them through the CloudBees SDK, and they are there and correct, but they come back as null on tomcat7.
The switch to clickstacks requires us to refactor how the servlet container gets configured so that the injection points such as cloudbees-web.xml and jvm system properties behave consistently across all the servlet container clickstacks.
Some of that refactoring has been committed but some of the work is still in my backlog... Assuming none of the other bees steal that task from my backlog before I get to it ;-)
If I recall correctly, the parameters should be available as environment variables (suboptimal, I know, but all containers should offer this as a consistent UX for all clickstacks, i.e. both non-Java-based and Java-based ones) and may already be available as system properties (again suboptimal, but the Java container refactoring should provide this as a consistent UX for all Java-based clickstacks). The consistent Java servlet UX has not been committed yet but should be available soon.
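In the meantime, a defensive pattern in application code is to check both channels, e.g. (the parameter name here is just an example; the real names are whatever you set through the CloudBees config/SDK):

public final class ConfigLookup {
    // Check the system property first, then the environment variable, then fall back.
    static String get(String name, String fallback) {
        String value = System.getProperty(name);
        if (value == null) {
            value = System.getenv(name);
        }
        return (value != null) ? value : fallback;
    }

    public static void main(String[] args) {
        System.out.println(get("MY_PARAM", "default-value")); // hypothetical parameter name
    }
}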